
I am able to add a new column in pandas by defining a user function and then using apply. However, I want to do this using a lambda; is there a way to do that?

For example, df has two columns, a and b. I want to create a new column c equal to the longer of the lengths of the values in a and b.

Something like:

df['c'] = df.apply(lambda x, len(df['a']) if len(df['a']) > len(df['b']) or len(df['b']) )

One approach:

df = pd.DataFrame({'a':['dfg','f','fff','fgrf','fghj'], 'b' : ['sd','dfg','edr','df','fghjky']})

df['c'] = df.apply(lambda x: max([len(x) for x in [df['a'], df['b']]]))
print df
      a       b   c
0   dfg      sd NaN
1     f     dfg NaN
2   fff     edr NaN
3  fgrf      df NaN
4  fghj  fghjky NaN
jezrael
piyush sharma
  • This will work once you fix the syntax errors. `lambda x` needs a colon after it, and your expression lacks `else` (maybe it should go instead of `or`). – Lev Levitsky Nov 12 '15 at 20:29
  • Thanks for the quick response; however, it still does not work. Here is the code and the error message. I would appreciate any help you can provide. `df = pd.DataFrame({'a':['dfg','f','fff','fgrf','fghj'], 'b' : ['sd','dfg','edr','df','fghjky']})` `df['c'] = df.apply(lambda x: len(x['a']) if len(x['a']) > len(x['b']) else len(x['b']))` `KeyError: ('a', u'occurred at index a')` – piyush sharma Nov 12 '15 at 21:18
  • Please don't put code in comments, [edit] the question instead. – Lev Levitsky Nov 12 '15 at 21:20
  • Sorry, this is my first time here. I tried to edit my question, but it is still not coming out nicely formatted. – piyush sharma Nov 12 '15 at 21:33
  • In the edit mode, there is a button that opens formatting help. First off, you can select the code and press Ctrl-K, that will indent it by 4 spaces. – Lev Levitsky Nov 12 '15 at 21:36

1 Answer


You can use map to compute the string lengths and np.where to select between them:

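import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['aaa', 'bb', 'ccc'], 'b': ['rrrr', 'k', 'e']})
print(df)
#     a     b
#0  aaa  rrrr
#1   bb     k
#2  ccc     e

# If len(a) > len(b), take the length of column a, otherwise the length of column b
df['c'] = np.where(df['a'].map(len) > df['b'].map(len), df['a'].map(len), df['b'].map(len))
print(df)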
#     a     b  c
#0  aaa  rrrr  4
#1   bb     k  2
#2  ccc     e  3
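
As a self-contained sketch of the np.where approach, here it is applied to the data from the question. The names `len_a` and `len_b` are introduced only for illustration; they store each column's lengths so that `map(len)` runs once per column rather than twice.

```python
import numpy as np
import pandas as pd

# Data taken from the question
df = pd.DataFrame({'a': ['dfg', 'f', 'fff', 'fgrf', 'fghj'],
                   'b': ['sd', 'dfg', 'edr', 'df', 'fghjky']})

# String lengths of each column, computed once (illustrative names)
len_a = df['a'].map(len)
len_b = df['b'].map(len)

# Where a is longer take len(a), otherwise take len(b)
df['c'] = np.where(len_a > len_b, len_a, len_b)
print(df['c'].tolist())  # [3, 3, 3, 4, 6]
```

This stays vectorized, so no Python-level function is called per row beyond `len` itself.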

Another solution is apply with the parameter axis=1, which the documentation describes as:

axis = 1 or ‘columns’: apply function to each row

df['c'] = df.apply(lambda x: max(len(x['a']), len(x['b'])), axis=1)
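
For a runnable version with the data from the question, a minimal sketch follows. The column name `c_cond` is hypothetical and only there to show that the conditional expression the question was aiming for gives the same result as `max`.

```python
import pandas as pd

# Data taken from the question
df = pd.DataFrame({'a': ['dfg', 'f', 'fff', 'fgrf', 'fghj'],
                   'b': ['sd', 'dfg', 'edr', 'df', 'fghjky']})

# axis=1 passes each row to the lambda as a Series indexed by column name
df['c'] = df.apply(lambda row: max(len(row['a']), len(row['b'])), axis=1)

# Same result written as a conditional expression (c_cond is an illustrative name)
df['c_cond'] = df.apply(
    lambda row: len(row['a']) if len(row['a']) > len(row['b']) else len(row['b']),
    axis=1,
)
print(df['c'].tolist())  # [3, 3, 3, 4, 6]
```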
jezrael
  • Map might work, but mainly I am looking for a way to use a lambda with two columns to create a new column, if possible. – piyush sharma Nov 12 '15 at 21:34
  • Why do you want to use lambda? – jezrael Nov 12 '15 at 21:36
  • The reason for using lambda is less typing and for me the code is more readable – piyush sharma Nov 12 '15 at 22:57
  • For future readers: the mistake was forgetting axis=1. Without it, apply passes each column to the lambda, so x['a'] is looked up in the row index [0,1,2,3,4] rather than among the column names, which caused the KeyError 'a'. Also, jezrael's solution #2 is a bit neater, since apply with axis=1 already loops through the rows. – Fed May 24 '20 at 01:37