0

I have 5 lists of 50 unique elements each and a Pandas DF of 6 columns and 2000 rows.

I need to create a new DF with the same cols as in the original + add 2 new columns 'A' and 'B'. I also need rows with the original values multiplied by the total number of elements in the 5 lists (so 500K rows and 8 cols in total) .

100 rows of each group of 250 multiplied rows will have values in column 'A' (unique values from two of the lists) and 150 in column 'B' (unique values from three of the lists).

I have the result I want from combining itertuples() and nested for loops but processing takes too long.

Is there a more efficient way of doing this?

Example:

import pandas as pd

original_df = pd.read_csv('source.csv')

list_one = [elem1, elem2, ..., elem50]
list_two = [elem1, elem2, ..., elem50]
list_three = [elem1, elem2, ..., elem50]
... etc

final_df = pd.DataFrame(columns=['one', 'two', 'three', 'four', 'five', 'six', 'A', 'B'])

for row in original_df.itertuples():

    for e in list_one:
        temp_df = pd.DataFrame([[row.col_one, row.col_two, row.col_three, row.col_four, row.col_five, row.col_six, e, 'x']])
        final_df = final_df.append(temp_df)

    for e in list_two:
        temp_df = pd.DataFrame([[row.col_one, row.col_two, row.col_three, row.col_four, row.col_five, row.col_six, e, 'x']])
        final_df = final_df.append(temp_df)

    for e in list_three:
        temp_df = pd.DataFrame([[row.col_one, row.col_two, row.col_three, row.col_four, row.col_five, row.col_six, 'x', e]])
        final_df = final_df.append(temp_df)
.
.
.
etc
Leinad
  • 1
  • 2
  • Put a reproducible example in code in your question (like this https://stackoverflow.com/a/20159305/463796), and a fully working slow solution to compare against. You should use the same shape that you load from the .csv, and can use `range` or `np.random` to populate your DataFrame with test data. – w-m Oct 05 '21 at 08:49

0 Answers0