How do I combine two dataframes?

Question

I'm using Pandas data frames. I have an initial data frame, say D. I extract two data frames from it like this:

A = D[D.label == k]
B = D[D.label != k]

I want to combine A and B so I can have them as one DataFrame, something like a union operation. The order of the data is not important. However, when we sample A and B from D, they retain their indexes from D.

Does this answer your question? [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) — Gonçalo Peres, Nov 02 '20 at 14:41
From `pandas v1.4.1`: The `frame.append` method is deprecated and will be removed from pandas in a future version. Use `pandas.concat` instead. — Trenton McKinney, Apr 28 '22 at 17:06

score 224 · Accepted Answer · edited Mar 03 '22 at 03:39

Deprecation Notice: DataFrame.append and Series.append were deprecated in pandas v1.4.0 and removed in v2.0. Use pd.concat instead.

I believe you can use the append method:

bigdata = data1.append(data2, ignore_index=True)

To keep the original indexes, just leave out the ignore_index keyword.
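
For readers on a pandas version where `append` is no longer available, here is a minimal sketch of the same operation with `pd.concat`, applied to the split from the question. The sample frame `D` and the value of `k` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's D and k.
D = pd.DataFrame({"label": [0, 1, 0, 2], "value": [10, 20, 30, 40]})
k = 0

A = D[D.label == k]
B = D[D.label != k]

# Keep the index labels that A and B inherited from D.
kept = pd.concat([A, B])

# Discard those labels and renumber the rows from 0.
renumbered = pd.concat([A, B], ignore_index=True)

print(kept.index.tolist())        # [0, 2, 1, 3]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

Either way the result has every row of `D` exactly once; only the index labels differ.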

edited Mar 03 '22 at 03:39 by Henry Ecker
answered Oct 12 '12 at 00:07 by Joran Beasley

This works. It creates a new DataFrame though. Is there a way to do it inline? That would be nice for when I'm loading huge amounts of data from a database in batches so I could iteratively update the DataFrame without creating a copy each time. – Andrew Nov 05 '13 at 17:36
Yes, that's possible, see: https://stackoverflow.com/a/46661368/5717580 – martin-martin Oct 10 '17 at 07:55

score 148 · Answer 2 · edited Jun 18 '20 at 16:50

You can also use pd.concat, which is particularly helpful when you are joining more than two dataframes:

bigdata = pd.concat([data1, data2], ignore_index=True, sort=False)
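
As a rough illustration of the "more than two dataframes" case, the sketch below concatenates three small frames whose columns do not fully match. The frames and their column names are invented for the example.

```python
import pandas as pd

# Hypothetical frames; data3 lacks column "b" and adds column "c".
data1 = pd.DataFrame({"b": [1, 2], "a": [3, 4]})
data2 = pd.DataFrame({"b": [5], "a": [6]})
data3 = pd.DataFrame({"a": [7], "c": [8]})

# sort=False leaves the columns in order of first appearance
# instead of sorting them alphabetically.
bigdata = pd.concat([data1, data2, data3], ignore_index=True, sort=False)

print(bigdata.columns.tolist())  # ['b', 'a', 'c']
print(len(bigdata))              # 4
```

Columns missing from a given input are filled with NaN in the rows that came from it.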

edited Jun 18 '20 at 16:50 by vinzee
answered May 31 '15 at 11:47 by ostrokach

I want to use this, but I'm trying to concatenate two columns of the same name o_O – lifelonglearner Apr 01 '20 at 02:13

score 73 · Answer 3 · edited Jun 18 '20 at 16:50

Adding this here in case someone finds it useful. @ostrokach already mentioned how to merge data frames across rows:

df_row_merged = pd.concat([df_a, df_b], ignore_index=True)

To merge across columns, you can use the following syntax:

df_col_merged = pd.concat([df_a, df_b], axis=1)
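
One thing to watch for with `axis=1`: rows are matched by index label, not by position. This matters for frames like A and B in the question, which keep their indexes from D. A small sketch with made-up frames:

```python
import pandas as pd

# Hypothetical frames whose index labels only partly overlap.
df_a = pd.DataFrame({"x": [1, 2, 3]}, index=[0, 1, 2])
df_b = pd.DataFrame({"y": [10, 20, 30]}, index=[1, 2, 3])

# Rows are aligned on index labels; labels present in only one frame get NaN.
aligned = pd.concat([df_a, df_b], axis=1)

# Resetting both indexes first pairs the rows by position instead.
positional = pd.concat(
    [df_a.reset_index(drop=True), df_b.reset_index(drop=True)], axis=1
)

print(aligned.shape)     # (4, 2)
print(positional.shape)  # (3, 2)
```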

edited Jun 18 '20 at 16:50 by vinzee
answered Sep 22 '16 at 08:38 by pelumi

score 28 · Answer 4 · edited Sep 15 '21 at 09:08

If you're working with big data and need to concatenate multiple datasets, calling concat many times can get expensive, because each call copies all the data.

If you don't want to create a new df each time, you can instead aggregate the changes and call concat only once:

frames = [df_A, df_B]  # Or perform operations on the DFs
result = pd.concat(frames)

This is pointed out in the pandas docs (under "concatenating objects", at the bottom of the section):

Note: It is worth noting however, that concat (and therefore append) makes a full copy of the data, and that constantly reusing this function can create a significant performance hit. If you need to use the operation over several datasets, use a list comprehension.
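
For the batch-loading scenario raised in the comments, the pattern might look like the sketch below. `load_batch` is a hypothetical stand-in for whatever reads one chunk from your file or database; it is not a pandas function.

```python
import pandas as pd


def load_batch(i):
    """Hypothetical stand-in for reading one batch from a file or database."""
    return pd.DataFrame({"batch": [i, i], "value": [i * 10, i * 10 + 1]})


frames = []
for i in range(3):
    # Appending to a Python list is cheap; no DataFrame data is copied here.
    frames.append(load_batch(i))

# A single concat at the end copies the data once.
result = pd.concat(frames, ignore_index=True)

print(len(result))  # 6
```

How the list is built (a loop, a list comprehension, a generator) does not matter; the point is that `pd.concat` runs once rather than once per batch.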

edited Sep 15 '21 at 09:08
answered Oct 10 '17 at 07:53 by martin-martin

I think there should be `pd.concat(frames)` since pandas doesn't have `append` method. – My Work Jan 04 '21 at 09:37
I don't fully understand the list "comprehension" focus. What's important here is not calling append every time and hence gathering all the dataframes into a list first. Whether that list is established through a list comprehension or not is completely irrelevant. – MrR Apr 27 '21 at 19:06
Thanks for the very relevant comments, I updated the answer to address them. – martin-martin May 14 '21 at 07:55
what is the intended definition of the process_file(f) function? – lrthistlethwaite Sep 14 '21 at 17:57
That was meant as an example for performing operations on the individual DFs before concatenating them, but I see it's less helpful than I initially thought. Updated the answer, thanks. – martin-martin Sep 15 '21 at 09:10

score 5 · Answer 5 · answered Jan 09 '20 at 22:45

If you want to update or replace the values in the first dataframe, df1, with the values from the second dataframe, df2, follow these steps:

Step 1: Set the index of the first dataframe (df1). set_index returns a new DataFrame, so assign the result back.

df1 = df1.set_index('id')

Step 2: Set the index of the second dataframe (df2) in the same way.

df2 = df2.set_index('id')

Finally, update the first dataframe in place:

df1.update(df2)
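
Note that this is an overwrite, not a union: `update` only touches rows whose index already exists in df1. A small end-to-end sketch, with an invented `id`/`score` layout:

```python
import pandas as pd

# Hypothetical frames keyed by an "id" column.
df1 = pd.DataFrame({"id": [1, 2, 3], "score": [10.0, 20.0, 30.0]}).set_index("id")
df2 = pd.DataFrame({"id": [2, 3, 4], "score": [99.0, None, 40.0]}).set_index("id")

# update() modifies df1 in place and returns None.
# NaN values in df2 are skipped, and ids missing from df1 are not added.
df1.update(df2)

print(df1["score"].tolist())  # [10.0, 99.0, 30.0]
```

If you also need the rows that exist only in df2, one of the `pd.concat` approaches from the other answers is a better fit.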
