
I have 180,000 pandas Series that I need to combine into one DataFrame. Adding them one by one takes a long time, apparently because appending gets increasingly slower as the DataFrame grows. The same problem persists even with NumPy, which is otherwise faster than pandas at this.

What could be an even better way to create a DataFrame from the Series?

Edit: Some more background info. The Series were stored in a list. It is sports data, and the list was called player_library, with 180,000+ items. I didn't realise that it is enough to write just

pd.concat(player_library, axis=1) 

instead of listing all the individual items. Now it works fast and nicely.
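For reference, a minimal sketch of the pattern that ended up working (only the name player_library comes from my actual code; the Series contents here are invented purely for illustration):

import pandas as pd

# Build all the Series in a plain Python list first...
player_library = [
    pd.Series({"goals": i, "assists": 2 * i}, name=f"player_{i}")
    for i in range(180_000)
]

# ...then concatenate once; each Series becomes one column.
df = pd.concat(player_library, axis=1)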

MattiH
  • Must you have them as a DataFrame? Could you modify your later code to take separate Series? Or perhaps go back to where the Series were assigned and try to populate a DataFrame from the start. – RichieV Sep 05 '20 at 18:20

1 Answer


You could try pd.concat instead of append.

If you want each Series to be a column, then:

df = pd.concat(list_of_series_objects, axis=1)

(Note that pd.concat already takes a list, so you pass the list itself rather than wrapping it in another pair of brackets.)

For more detail on why it is expensive to iterate and append, read this question.
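As a rough sketch of the difference (the Series here are tiny dummies, just to show the pattern):

import pandas as pd

series_list = [pd.Series(range(3), name=f"s{i}") for i in range(10_000)]

# Slow: inserting columns one by one forces pandas to reallocate
# and copy repeatedly as the DataFrame grows.
df_slow = pd.DataFrame()
for s in series_list:
    df_slow[s.name] = s

# Fast: a single concat builds the result in one pass.
df_fast = pd.concat(series_list, axis=1)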

RichieV