179

I'm using Pandas data frames. I have an initial data frame, say D. I extract two data frames from it like this:

A = D[D.label == k]
B = D[D.label != k]

I want to combine A and B so I can have them as one DataFrame, something like a union operation. The order of the data is not important. However, when we sample A and B from D, they retain their indexes from D.

DevLoverUmar
MKoosej
  • Does this answer your question? [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) – Gonçalo Peres Nov 02 '20 at 14:41
  • From `pandas v1.4.1`: The `frame.append` method is deprecated and will be removed from pandas in a future version. Use `pandas.concat` instead. – Trenton McKinney Apr 28 '22 at 17:06

5 Answers

224

Deprecation notice: DataFrame.append and Series.append were deprecated in pandas v1.4.0 and removed in v2.0. On current versions, use pd.concat instead.

I believe you can use the append method:

bigdata = data1.append(data2, ignore_index=True)

To keep the original indexes, simply omit the ignore_index keyword.
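For readers on a pandas version where append is no longer available, here is a minimal sketch of the same two behaviours using pd.concat. The names D, A, B, k and label follow the question; the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the names D, A, B, k and label come from the question.
D = pd.DataFrame({"label": [0, 1, 0, 2], "value": [10, 20, 30, 40]})
k = 0

A = D[D.label == k]   # keeps row labels 0 and 2 from D
B = D[D.label != k]   # keeps row labels 1 and 3 from D

# Counterpart of A.append(B): the row labels inherited from D are preserved.
kept = pd.concat([A, B])

# Counterpart of A.append(B, ignore_index=True): rows are renumbered from 0.
renumbered = pd.concat([A, B], ignore_index=True)

print(kept.index.tolist())        # [0, 2, 1, 3]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

Both calls return a new DataFrame and leave A and B unchanged.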

Henry Ecker
Joran Beasley
  • 1
    This works. It creates a new DataFrame though. Is there a way to do it inline? That would be nice for when I'm loading huge amounts of data from a database in batches so I could iteratively update the DataFrame without creating a copy each time. – Andrew Nov 05 '13 at 17:36
  • 1
    Yes, that's possible, see: https://stackoverflow.com/a/46661368/5717580 – martin-martin Oct 10 '17 at 07:55
  • 1
    From `pandas v1.4.1`: The `frame.append` method is deprecated and will be removed from pandas in a future version. Use `pandas.concat` instead. – Trenton McKinney Apr 28 '22 at 17:06
148

You can also use pd.concat, which is particularly helpful when you are joining more than two dataframes:

bigdata = pd.concat([data1, data2], ignore_index=True, sort=False)
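As a rough illustration of the "more than two dataframes" case, the sketch below concatenates three frames whose columns only partly overlap. The frames and the third name, data3, are invented for this example. With sort=False the combined columns should keep their order of first appearance, and cells with no source value are filled with NaN.

```python
import pandas as pd

# Made-up frames with partly overlapping columns (data3 is a hypothetical third input).
data1 = pd.DataFrame({"b": [1], "a": [2]})
data2 = pd.DataFrame({"a": [3], "c": [4]})
data3 = pd.DataFrame({"b": [5], "c": [6]})

# One call handles any number of frames passed in the list.
bigdata = pd.concat([data1, data2, data3], ignore_index=True, sort=False)

print(bigdata.columns.tolist())  # ['b', 'a', 'c'] (order of first appearance)
print(bigdata.index.tolist())    # [0, 1, 2]
print(bigdata["c"].isna().tolist())  # [True, False, False]
```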
vinzee
ostrokach
73

I thought I'd add this here in case someone finds it useful. @ostrokach already showed how to merge the data frames across rows (stacking them vertically), which is:

df_row_merged = pd.concat([df_a, df_b], ignore_index=True)

To merge across columns, you can use the following syntax:

df_col_merged = pd.concat([df_a, df_b], axis=1)
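One thing worth knowing about axis=1 is that rows are paired by index label, not by position. The sketch below uses made-up frames whose indexes do not overlap (as happens with the A and B slices in the question) and shows one possible workaround, resetting the indexes first.

```python
import pandas as pd

# Made-up frames with non-overlapping row labels.
df_a = pd.DataFrame({"x": [1, 2]}, index=[0, 2])
df_b = pd.DataFrame({"y": [3, 4]}, index=[1, 3])

# Rows are matched by label, so every row ends up with one missing value.
by_label = pd.concat([df_a, df_b], axis=1)
print(by_label.shape)  # (4, 2)

# Dropping the old labels first pairs the rows by position instead.
by_position = pd.concat(
    [df_a.reset_index(drop=True), df_b.reset_index(drop=True)],
    axis=1,
)
print(by_position.shape)  # (2, 2)
```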
vinzee
pelumi
28

If you're working with big data and need to concatenate multiple datasets, calling concat many times can become expensive, because each call copies all the data.

If you don't want to create a new df each time, you can instead aggregate the changes and call concat only once:

frames = [df_A, df_B]  # Or perform operations on the DFs
result = pd.concat(frames)

This is pointed out in the pandas docs, in a note at the bottom of the section on concatenating objects:

Note: It is worth noting however, that concat (and therefore append) makes a full copy of the data, and that constantly reusing this function can create a significant performance hit. If you need to use the operation over several datasets, use a list comprehension.
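For the batch-loading scenario, the pattern might look like the sketch below. The load_batch function is a hypothetical stand-in for whatever produces each piece (a database query, a file read, and so on); it is not part of pandas.

```python
import pandas as pd


def load_batch(i):
    """Hypothetical stand-in for reading one batch, e.g. from a database."""
    return pd.DataFrame({"batch": [i, i], "value": [i * 10, i * 10 + 1]})


frames = []
for i in range(3):
    # Appending to a Python list only stores a reference; no data is copied here.
    frames.append(load_batch(i))

# The data is copied once, in this single concat call.
result = pd.concat(frames, ignore_index=True)

print(result.shape)  # (6, 2)
```

Whether the list is built with a loop or a list comprehension makes no difference; what matters is that concat runs once at the end.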

martin-martin
  • 2
    I think there should be `pd.concat(frames)` since pandas doesn't have `append` method. – My Work Jan 04 '21 at 09:37
  • 1
    I don't fully understand the list "comprehension" focus. What's important here is not calling append every time and hence gathering all the dataframes into a list first. Whether that list is established through a list comprehension or not is completely irrelevant. – MrR Apr 27 '21 at 19:06
  • Thanks for the very relevant comments, I updated the answer to address them. – martin-martin May 14 '21 at 07:55
  • what is the intended definition of the process_file(f) function? – lrthistlethwaite Sep 14 '21 at 17:57
  • That was meant as an example for performing operations on the individual DFs before concatenating them, but I see it's less helpful than I initially thought. Updated the answer, thanks. – martin-martin Sep 15 '21 at 09:10
5

If you want to update or replace the values of the first dataframe, df1, with the values of a second dataframe, df2, you can do it with the following steps:

Step 1: Set index of the first dataframe (df1)

df1 = df1.set_index('id')  # set_index returns a new frame, so assign it back

Step 2: Set index of the second dataframe (df2)

df2 = df2.set_index('id')  # same here: assign the result back

Finally, update the first dataframe using the following snippet:

df1.update(df2)
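Here is a small end-to-end sketch of these steps. The id and score columns and their values are made up for illustration. Note that set_index returns a new frame, so its result has to be assigned back, while update modifies df1 in place and returns None.

```python
import pandas as pd

# Made-up data: df2 carries newer scores for ids 2 and 3.
df1 = pd.DataFrame({"id": [1, 2, 3], "score": [10.0, 20.0, 30.0]})
df2 = pd.DataFrame({"id": [2, 3], "score": [25.0, 35.0]})

# Index both frames by id so that update can align rows on it.
df1 = df1.set_index("id")
df2 = df2.set_index("id")

# Overwrites matching cells of df1 with the non-missing values from df2.
df1.update(df2)

print(df1["score"].tolist())  # [10.0, 25.0, 35.0]
```

Rows of df2 whose id does not exist in df1 are ignored by update, so this replaces values but does not add rows.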
Mohsin Mahmood