I have 5 lists of 50 unique elements each and a Pandas DF of 6 columns and 2000 rows.
I need to create a new DF with the same cols as in the original + add 2 new columns 'A' and 'B'. I also need rows with the original values multiplied by the total number of elements in the 5 lists (so 500K rows and 8 cols in total) .
100 rows of each group of 250 multiplied rows will have values in column 'A' (unique values from two of the lists) and 150 in column 'B' (unique values from three of the lists).
I have the result I want from combining itertuples() and nested for loops but processing takes too long.
Is there a more efficient way of doing this?
Example:
import pandas as pd
original_df = pd.read_csv('source.csv')
list_one = [elem1, elem2, ..., elem50]
list_two = [elem1, elem2, ..., elem50]
list_three = [elem1, elem2, ..., elem50]
... etc
final_df = pd.DataFrame(columns=['one', 'two', 'three', 'four', 'five', 'six', 'A', 'B'])
for row in original_df.itertuples():
for e in list_one:
temp_df = pd.DataFrame([[row.col_one, row.col_two, row.col_three, row.col_four, row.col_five, row.col_six, e, 'x']])
final_df = final_df.append(temp_df)
for e in list_two:
temp_df = pd.DataFrame([[row.col_one, row.col_two, row.col_three, row.col_four, row.col_five, row.col_six, e, 'x']])
final_df = final_df.append(temp_df)
for e in list_three:
temp_df = pd.DataFrame([[row.col_one, row.col_two, row.col_three, row.col_four, row.col_five, row.col_six, 'x', e]])
final_df = final_df.append(temp_df)
.
.
.
etc