
I have two pd.DataFrame objects (read from .csv files), say,

1, 2
1, 3
2, 4

and

2, 1
1, 2
3, 3

Suppose the DataFrames are named data1 and data2. I can easily count the number of unique values in each column of data1 and data2 individually using

 uniques = data.apply(pd.Series.nunique)

where data is replaced by data1 and data2 respectively. This gives 2, 3 for data1 and 3, 3 for data2. Is there a way (other than concatenating the DataFrames) to get the number of unique values when the two DataFrames are combined? I want to get 3, 4.
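
For concreteness, a minimal sketch of the setup (the column names a and b are an assumption, taken from the answer output below):

import pandas as pd

data1 = pd.DataFrame({'a': [1, 1, 2], 'b': [2, 3, 4]})
data2 = pd.DataFrame({'a': [2, 1, 3], 'b': [1, 2, 3]})

# per-frame counts: a=2, b=3 for data1 and a=3, b=3 for data2
print(data1.apply(pd.Series.nunique))
print(data2.apply(pd.Series.nunique))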

Bishwajit Purkaystha

3 Answers


I think not. You need to concat first:

df = pd.concat([df1, df2]).apply(pd.Series.nunique)
print(df)
a    3
b    4
dtype: int64
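
A slightly shorter equivalent, as a sketch (assuming pandas 0.20+, where DataFrame.nunique is available):

import pandas as pd

df1 = pd.DataFrame({'a': [1, 1, 2], 'b': [2, 3, 4]})
df2 = pd.DataFrame({'a': [2, 1, 3], 'b': [1, 2, 3]})

# DataFrame.nunique counts distinct values per column directly
print(pd.concat([df1, df2]).nunique())
# a    3
# b    4
# dtype: int64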
jezrael
# Use numpy's unique to count uniques after combining the same column from both DataFrames.
import numpy as np

len(np.unique(np.c_[df1.iloc[:, 0], df2.iloc[:, 0]]))
Out[1398]: 3

len(np.unique(np.c_[df1.iloc[:, 1], df2.iloc[:, 1]]))
Out[1399]: 4
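
The same idea applied to every column at once, as a sketch (assuming both frames share the same column order):

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': [1, 1, 2], 'b': [2, 3, 4]})
df2 = pd.DataFrame({'a': [2, 1, 3], 'b': [1, 2, 3]})

# for each column position, stack the two columns side by side and count distinct values
counts = {col: len(np.unique(np.c_[df1.iloc[:, i], df2.iloc[:, i]]))
          for i, col in enumerate(df1.columns)}
print(counts)
# {'a': 3, 'b': 4}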
Allen Qin

Another alternative that will work for any number of data frames:

import numpy as np

dfs = [df1, df2]
print([
    len(set(np.concatenate([df[colname].unique() for df in dfs])))
    for colname in dfs[0]
])
[3, 4]

Note that this will only work if all the data frames have the same column names.

I think that concat is the best option, unless your data frames already fill your local memory: concatenating will copy the data.

oLas