27

I have the following data (2 columns, 4 rows):

Column 1: A, B, C, D

Column 2: E, F, G, H

I am attempting to combine the columns into one column to look like this (1 column, 8 rows):

Column 3: A, B, C, D, E, F, G, H

I am using a pandas DataFrame and have tried several functions (append, concat, etc.) with no success. Any help would be most appreciated!

smci
user2929063

4 Answers

35

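The trick is to use stack(). Note that it walks the frame row by row, so the values come out interleaved (A, E, B, F, ...) rather than column by column: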

df.stack().reset_index()
    
   level_0   level_1  0
0        0  Column 1  A
1        0  Column 2  E
2        1  Column 1  B
3        1  Column 2  F
4        2  Column 1  C
5        2  Column 2  G
6        3  Column 1  D
7        3  Column 2  H
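
The output above interleaves the two columns row by row. If the column-by-column order from the question (A, B, C, D, E, F, G, H) is needed, one possible sketch is to transpose before stacking, or to use melt(). The names `stacked` and `melted` and the 'Column 3' label are illustrative choices, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'Column 1': ['A', 'B', 'C', 'D'],
                   'Column 2': ['E', 'F', 'G', 'H']})

# Transposing first makes stack() walk each original column in turn
stacked = df.T.stack().reset_index(drop=True)

# melt() also unpivots column by column; keep only the values
melted = df.melt(value_name='Column 3')['Column 3']

print(stacked.tolist())
print(melted.tolist())
```

Both variants should give a single Series of eight values in column order; which one reads better is largely a matter of taste.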
Henry Ecker
Nickpick
  • Aren't the values in the rightmost column of this answer in a wrong order compared to a column asked for by the OP? – Martin Jan 11 '22 at 19:30
16

Update

pandas has a built-in method for this, stack(), which does what you want; see the other answer.

This was my original answer, written many years ago before I knew about stack():

In [227]:

import pandas as pd
df = pd.DataFrame({'Column 1':['A', 'B', 'C', 'D'],'Column 2':['E', 'F', 'G', 'H']})
df
Out[227]:
  Column 1 Column 2
0        A        E
1        B        F
2        C        G
3        D        H

[4 rows x 2 columns]

In [228]:

df['Column 1'].append(df['Column 2']).reset_index(drop=True)
Out[228]:
0    A
1    B
2    C
3    D
4    E
5    F
6    G
7    H
dtype: object
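
Series.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the call above fails on current versions. A minimal sketch of the same idea with pd.concat, assuming the same two-column frame (the name `combined` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Column 1': ['A', 'B', 'C', 'D'],
                   'Column 2': ['E', 'F', 'G', 'H']})

# ignore_index=True renumbers the result 0..7, like reset_index(drop=True)
combined = pd.concat([df['Column 1'], df['Column 2']], ignore_index=True)
print(combined.tolist())
```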
Henry Ecker
EdChum
9

You can flatten the values in column-major order using ravel('F'), which is much faster.

In [1238]: df
Out[1238]:
  Column 1 Column 2
0        A        E
1        B        F
2        C        G
3        D        H

In [1239]: pd.Series(df.values.ravel('F'))
Out[1239]:
0    A
1    B
2    C
3    D
4    E
5    F
6    G
7    H
dtype: object
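
To make the role of the 'F' argument concrete, here is a small sketch contrasting the two flattening orders. It uses DataFrame.to_numpy() in place of .values, and the names `row_major` and `col_major` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Column 1': ['A', 'B', 'C', 'D'],
                   'Column 2': ['E', 'F', 'G', 'H']})

values = df.to_numpy()  # 4x2 array of objects

# 'C' (the default) reads across each row; 'F' reads down each column
row_major = values.ravel(order='C').tolist()
col_major = values.ravel(order='F').tolist()

print(row_major)
print(col_major)
```

Only the 'F' order matches the layout asked for in the question; the default order interleaves the columns.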

Details

Medium

In [1245]: df.shape
Out[1245]: (4000, 2)

In [1246]: %timeit pd.Series(df.values.ravel('F'))
10000 loops, best of 3: 86.2 µs per loop

In [1247]: %timeit df['Column 1'].append(df['Column 2']).reset_index(drop=True)
1000 loops, best of 3: 816 µs per loop

Large

In [1249]: df.shape
Out[1249]: (40000, 2)

In [1250]: %timeit pd.Series(df.values.ravel('F'))
10000 loops, best of 3: 87.5 µs per loop

In [1251]: %timeit df['Column 1'].append(df['Column 2']).reset_index(drop=True)
100 loops, best of 3: 1.72 ms per loop
Zero
  • 1
    `df.values` is thunking out to the underlying array, and calling `numpy.ravel()` on it. But pandas offers `stack()`. – smci Apr 01 '20 at 16:59
  • 1
    `DataFrame.to_numpy()` is preferred to `DataFrame.values`. – Frank May 20 '20 at 14:36
4

What you appear to be asking for is simply another view of your data. If there is no reason for the data to be in two columns in the first place, just create one column. If, however, you need to combine them for presentation in some other tool, you can do something like:

import itertools as it
import pandas as pd

df = pd.DataFrame({1: ['a', 'b', 'c', 'd'], 2: ['e', 'f', 'g', 'h']})
# chain() walks the values row by row; sorted() then orders them alphabetically
sorted(it.chain(*df.values))
# -> ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
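
The sorted() call above only reproduces the column order because the sample data happens to be alphabetical. If the values are not sorted, a sketch along the following lines might preserve each column's own order instead. The reordered sample data and the name `flat` are assumptions made for illustration.

```python
import itertools as it
import pandas as pd

# Deliberately unsorted data, to show that no sorting takes place
df = pd.DataFrame({1: ['d', 'b', 'c', 'a'], 2: ['h', 'f', 'g', 'e']})

# Chain the columns one after another, keeping each column's own order
flat = list(it.chain.from_iterable(df[col] for col in df.columns))
print(flat)
```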
mechanical_meat