11

I have a pandas DataFrame and would like to use apply to generate two new columns from the existing data. I am getting this error: `ValueError: Wrong number of items passed 2, placement implies 1`

import pandas as pd
import numpy as np

def myfunc1(row):
    C = row['A'] + 10
    D = row['A'] + 50
    return [C, D]

df = pd.DataFrame(np.random.randint(0,10,size=(2, 2)), columns=list('AB'))

df['C', 'D'] = df.apply(myfunc1 ,axis=1)

Starting DF:

   A  B
0  6  1
1  8  4

Desired DF:

   A  B  C   D
0  6  1  16  56
1  8  4  18  58
user2242044

6 Answers

13

Based on your latest error, you can avoid it by returning the new columns as a Series, which apply expands into one column per value:

def myfunc1(row):
    C = row['A'] + 10
    D = row['A'] + 50
    return pd.Series([C, D])

df[['C', 'D']] = df.apply(myfunc1 ,axis=1)
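A self-contained sketch of the same idea, using the fixed values from the question's "Starting DF" instead of random data so the result can be checked by eye. The function name `add_offsets` is illustrative and not part of the original answer.

```python
import pandas as pd

def add_offsets(row):
    # Returning a Series (rather than a list) lets apply build
    # a DataFrame with one column per returned value.
    return pd.Series([row['A'] + 10, row['A'] + 50])

df = pd.DataFrame({'A': [6, 8], 'B': [1, 4]})

# The list key ['C', 'D'] names two separate target columns.
df[['C', 'D']] = df.apply(add_offsets, axis=1)
print(df)
```

The printed frame should match the "Desired DF" from the question.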
oim
5

Be aware that the accepted answer uses a lot of memory and is slow: https://ys-l.github.io/posts/2015/08/28/how-not-to-use-pandas-apply/

Following the suggestion presented there, a faster version would look like this:

def run_loopy(df):
    Cs, Ds = [], []
    for _, row in df.iterrows():
        c, d = myfunc1(row['A'])
        Cs.append(c)
        Ds.append(d)
    # Return a DataFrame aligned to df so that each list becomes one column.
    return pd.DataFrame({'C': Cs, 'D': Ds}, index=df.index)

def myfunc1(a):
    c = a + 10
    d = a + 50
    return c, d

df[['C', 'D']] = run_loopy(df)
tobyvd
Federico Dorato
3

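df['C','D'] is treated as a single column keyed by the tuple ('C', 'D'), not as two columns. To assign two columns, index with a list of names: df[['C','D']]. On pandas 0.23 and later, the applied function must also return a pd.Series (or apply must be called with result_type='expand'); otherwise apply returns a single Series of lists rather than two columns.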

df[['C', 'D']] = df.apply(myfunc1 ,axis=1)

    A  B   C   D
0  4  6  14  54
1  5  1  15  55

Or, with myfunc1 returning a list, you can unpack the per-row results into the two columns:

df['C'], df['D'] = zip(*df.apply(myfunc1, axis=1))
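To see the difference between the two kinds of key, here is a minimal sketch that assigns with each and inspects the resulting column labels. The variable names `single` and `double` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [6, 8], 'B': [1, 4]})

# A comma-separated key is a tuple, so this creates ONE new column
# whose label is the tuple ('C', 'D').
single = df.copy()
single['C', 'D'] = single['A'] + 10
print(list(single.columns))

# A list key addresses two separate columns.
double = df.copy()
double[['C', 'D']] = pd.DataFrame({'C': df['A'] + 10, 'D': df['A'] + 50})
print(list(double.columns))
```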
Bharath
  • 1
    This worked on my example dataset (so upvoted), but does not work on my real dataset despite identical code. Error: `KeyError: "['C' 'D'] not in index"` – user2242044 Dec 25 '17 at 15:10
  • 1
    I need to see how you are assigning the data. Your actual code perhaps. – Bharath Dec 25 '17 at 15:10
  • 1
    Same way, the only code that is different is reading in a dataframe from CSV vs using numpy to generate fake data `df[['C', 'D']] = df.apply(myfunc1 ,axis=1)` – user2242044 Dec 25 '17 at 15:11
  • 1
    Your myfunc1 is same as the above? – Bharath Dec 25 '17 at 15:11
  • 1
    @user2242044. Your error message shows that there is a missing comma between ‘C’ and ‘D’. – Goose Dec 25 '17 at 15:37
  • @Goose you know if you don't pass a comma it will be considered as a single string like `'CD'`. Sometimes assignment won't work. Hard to remember the cases. – Bharath Dec 25 '17 at 15:38
3

It works for me:

def myfunc1(row):
    C = row['A'] + 10
    D = row['A'] + 50
    return C, D

df = pd.DataFrame(np.random.randint(0,10,size=(2, 2)), columns=list('AB'))

df[['C', 'D']] = df.apply(myfunc1, axis=1, result_type='expand')
df

The key change is adding result_type='expand' to the apply call.
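A short sketch of what result_type='expand' changes, assuming pandas 0.23 or later (where the parameter exists). The three-row frame and the name `offsets` are made up for illustration.

```python
import pandas as pd

def offsets(row):
    return row['A'] + 10, row['A'] + 50

df = pd.DataFrame({'A': [6, 8, 3], 'B': [1, 4, 7]})

# Default behaviour: a single Series whose elements are tuples.
as_series = df.apply(offsets, axis=1)

# With result_type='expand': each tuple is spread across columns,
# giving a DataFrame with one column per returned value.
as_frame = df.apply(offsets, axis=1, result_type='expand')

df[['C', 'D']] = as_frame
print(type(as_series).__name__, as_frame.shape)
```

Only the expanded form has the two columns that the assignment to df[['C', 'D']] needs.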

regards!

Marcelo
  • Just had this problem and adding `, result_type='expand'` was the only way I could get this to work, thank you – a11 May 06 '22 at 18:35
1

Add extra brackets when assigning to multiple columns.

import pandas as pd
import numpy as np

def myfunc1(row):
    C = row['A'] + 10
    D = row['A'] + 50
    return [C, D]

df = pd.DataFrame(np.random.randint(0,10,size=(2, 2)), columns=list('AB'))

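# result_type='expand' (pandas 0.23+) spreads each returned list across columns
df[['C', 'D']] = df.apply(myfunc1, axis=1, result_type='expand')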
gabe_
1

I believe you can achieve similar results to @Federico Dorato's answer without a for loop: return a list rather than a Series and use apply + to_list() to expand the results.

It's cleaner code and, in the timings below (25 runs on a 2-row frame of random integers below 10,000,000), it performs as well or faster.

Federico's code

import time

run_time = []

for i in range(0,25):
    df = pd.DataFrame(np.random.randint(0,10000000,size=(2, 2)), columns=list('AB'))
    def run_loopy(df):
        Cs, Ds = [], []
        for _, row in df.iterrows():
            c, d, = myfunc1(row['A'])
            Cs.append(c)
            Ds.append(d)
        # Return a DataFrame aligned to df so that each list becomes one column.
        return pd.DataFrame({'C': Cs, 'D': Ds}, index=df.index)

    def myfunc1(a):
        c = a / 10
        d = a + 50
        return c, d

    start = time.time()
    df[['C', 'D']] = run_loopy(df)
    end = time.time()

    run_time.append(end-start) 
print(np.average(run_time)) # 0.001240386962890625

Using lambda and to_list

run_time = []

for i in range(0,25):
    df = pd.DataFrame(np.random.randint(0,10000000,size=(2, 2)), columns=list('AB'))

    def myfunc1(a):
        c = a / 10
        d = a + 50
        return [c, d]

    start = time.time()
    df[['C', 'D']] = df['A'].apply(lambda x: myfunc1(x)).to_list()
    end = time.time()
    run_time.append(end-start)
print(np.average(run_time)) #output 0.0009996891021728516
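The frames timed above have only two rows. Below is a sketch of the same to_list() approach on a larger frame, checking the result rather than timing it. The row count of 100,000 and the fixed seed are arbitrary choices made here, not values from the original answer.

```python
import numpy as np
import pandas as pd

def myfunc1(a):
    return [a / 10, a + 50]

# Fixed seed so the example is repeatable (an assumption for illustration).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 10_000_000, size=(100_000, 2)),
                  columns=list('AB'))

# Passing the function directly is equivalent to wrapping it in a lambda.
# to_list() yields a list of [c, d] pairs, one pair per row.
df[['C', 'D']] = df['A'].apply(myfunc1).to_list()

print(df.head())
```

Because each pair mixes a float and an integer, both new columns may end up with a float dtype.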
born_naked