
Suppose I have a dataframe looking something like this:

  col1 col2 col3 col4
0    A    B    F    O
1    A         G    Q
2    A    C    G    P
3    A         H     
4    A    D    I     
5    A    D    I     
6    A         J    U
7    A    E         J

How can I shift the values in each row to the left whenever a cell is empty, so that the empty cells end up on the right?

  col1 col2 col3 col4
0    A    B    F    O
1    A    G    Q
2    A    C    G    P
3    A    H
4    A    D    I
5    A    D    I
6    A    J    U
7    A    E    J

I thought I could check each column: if a value is empty, take the value from the next column and make that next cell empty instead.

for col in df.columns:    
    df[col] = np.where((df[col] == ''), df[f'col{int(col[-1])+1}'], df[col])
    df[f'col{int(col[-1])+1}'] = np.where((df[col] == ''), '', df[col])

But it fails somewhere. Sample df below:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'col1': ['A','A','A','A','A','A','A','A'],
        'col2': ['B','','C','','D','D','','E'],
        'col3': ['F','G','G','H','I','I','J',''],
        'col4': ['O','Q','P','','','','U','J']
    }
)
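The loop above has three problems: on the last column it looks up `col5`, which does not exist (a `KeyError`); the second `np.where` copies `df[col]` into the next column instead of keeping that column's own values, and it tests `df[col] == ''` after `df[col]` has already been overwritten; and a single pass moves a value at most one slot left, so rows with two consecutive blanks need more passes. A minimal sketch of the same column-by-column idea with those issues addressed (it re-creates the sample DataFrame so it runs on its own):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'col1': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'],
        'col2': ['B', '', 'C', '', 'D', 'D', '', 'E'],
        'col3': ['F', 'G', 'G', 'H', 'I', 'I', 'J', ''],
        'col4': ['O', 'Q', 'P', '', '', '', 'U', 'J'],
    }
)

cols = list(df.columns)
# Each pass moves a value at most one slot left, so repeat len(cols) - 1 times
for _ in range(len(cols) - 1):
    # Pair each column with its right-hand neighbour; the last column has none
    for left, right in zip(cols[:-1], cols[1:]):
        # Compute the mask once, before either column is modified
        empty = df[left] == ''
        # Pull the neighbour's value into the empty cell...
        df[left] = np.where(empty, df[right], df[left])
        # ...and blank the neighbour only where a value was pulled from it
        df[right] = np.where(empty, '', df[right])

print(df)
```

This stays close to the question's approach; for larger frames a vectorized per-row reordering avoids the nested Python loops.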
destinychoice

3 Answers


One way is to use np.argsort:

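s = df.to_numpy()
# Stable sort of the "is empty" mask: non-empty cells (False) come first, in their original order
orders = np.argsort(s == '', axis=1, kind='mergesort')

# Reorder each row's values using that row's own column order
df[:] = s[np.arange(len(s))[:, None], orders]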

Output:

  col1 col2 col3 col4
0    A    B    F    O
1    A    G    Q     
2    A    C    G    P
3    A    H          
4    A    D    I     
5    A    D    I     
6    A    J    U     
7    A    E    J     

Note: A very similar approach can be found in this question.
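To see why a stable sort (`kind='mergesort'`) is used, here is the same idea applied to row 1 of the sample data on its own. Sorting the boolean mask puts `False` (non-empty) before `True` (empty), and stability keeps the non-empty cells in their original left-to-right order. This sketch uses `kind='stable'`, NumPy's generic request for a stable algorithm:

```python
import numpy as np

# Row 1 of the sample data: col2 is empty
row = np.array(['A', '', 'G', 'Q'])
mask = row == ''
# Non-empty cells (False) sort first; stability preserves their relative order
order = np.argsort(mask, kind='stable')
shifted = row[order]

print(order)    # [0 2 3 1]
print(shifted)  # ['A' 'G' 'Q' '']
```

With an unstable sort, the three `False` entries could come back in any order, which would scramble the non-empty values.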

Quang Hoang

Replace empty strings with NaN:

df = df.replace('', np.nan)

Then apply dropna row-wise, packing each row's remaining values to the left:

odf = df.apply(lambda x: pd.Series(x.dropna().values), axis=1)

To keep the original column names:

odf.columns = df.columns

Note: it is generally better to represent missing data with NaN than with empty strings.

Output:

  col1 col2 col3 col4
0    A    B    F    O
1    A    G    Q  NaN
2    A    C    G    P
3    A    H  NaN  NaN
4    A    D    I  NaN
5    A    D    I  NaN
6    A    J    U  NaN
7    A    E    J  NaN
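One caveat with `pd.Series(x.dropna().values)`: each row's Series is only as long as that row's count of non-missing values, so `odf` has four columns here only because row 0 happens to be complete. If no row were complete, `odf` would have fewer columns than `df` and `odf.columns = df.columns` would raise a length-mismatch error. A sketch of a variant that labels each row's values with the original column names up front (`shift_left` is a hypothetical helper name):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'col1': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'],
        'col2': ['B', '', 'C', '', 'D', 'D', '', 'E'],
        'col3': ['F', 'G', 'G', 'H', 'I', 'I', 'J', ''],
        'col4': ['O', 'Q', 'P', '', '', '', 'U', 'J'],
    }
).replace('', np.nan)


def shift_left(row: pd.Series) -> pd.Series:
    """Pack a row's non-missing values to the left, padding with NaN."""
    values = row.dropna().to_numpy()
    # Label the packed values with the leftmost column names...
    packed = pd.Series(values, index=row.index[:len(values)])
    # ...then reindex to the full set of columns so the width never shrinks
    return packed.reindex(row.index)


odf = df.apply(shift_left, axis=1)
print(odf)
```

Because every returned Series already carries the full column index, no column renaming is needed afterwards.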
Vishnudev

You can count the number of empty values in each column, sort the columns by that count, and then reorder the DataFrame accordingly:

counts = {}
for col in df.columns.to_list():
    counts[col] = (df[col] == '').sum()  # count empty strings, as in the example data
# Sort the dictionary by count, smallest first
counts = dict(sorted(counts.items(), key=lambda item: item[1]))
# Reorder the DataFrame's columns in that order
df = df[[*counts.keys()]]
df

  col1 col3 col2 col4
0    A    F    B    O
1    A    G         Q
2    A    G    C    P
3    A    H          
4    A    I    D     
5    A    I    D     
6    A    J         U
7    A         E    J
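The counting loop can also be written with pandas methods directly. A sketch, assuming the original sample `df`: `DataFrame.eq('')` marks empty cells, `.sum()` counts them per column, and sorting with `kind='mergesort'` (a stable sort) keeps tied columns in their original order, as Python's `sorted` does in the loop version. Like the loop, this reorders whole columns rather than moving values within rows.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'col1': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'],
        'col2': ['B', '', 'C', '', 'D', 'D', '', 'E'],
        'col3': ['F', 'G', 'G', 'H', 'I', 'I', 'J', ''],
        'col4': ['O', 'Q', 'P', '', '', '', 'U', 'J'],
    }
)

# Number of empty strings per column (a Series indexed by column name)
counts = df.eq('').sum()
# Stable sort keeps tied columns (col2 and col4 here) in their original order
order = counts.sort_values(kind='mergesort').index
df = df[order]
print(df)
```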
ThePyGuy