Getting KeyError: 'Only the Series name can be used for the key in Series dtype mappings.' when running SMOTE on a pandas DataFrame

Question

My data is slightly imbalanced, so I am trying to apply SMOTE before fitting a logistic regression model. When I do, I get the error: KeyError: 'Only the Series name can be used for the key in Series dtype mappings.' Could someone help me figure out why? Here is the code:

from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split
import pandas as pd

X = dummies.loc[:, dummies.columns != 'Count']
y = dummies.loc[:, dummies.columns == 'Count']
os = SMOTE(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
columns = X_train.columns
os_data_X, os_data_y = os.fit_sample(X_train, y_train)  # here is where it errors (fit_resample() in newer imbalanced-learn)
os_data_X = pd.DataFrame(data=os_data_X, columns=columns)
os_data_y = pd.DataFrame(data=os_data_y, columns=['Count'])

Thank you!

@QuangHoang thank you for the suggestion, but unfortunately it did not fix my error, since the error was on the fit_sample() line. — devdon, Dec 15 '20 at 18:53

score 13 · Answer 1 · answered Dec 15 '20 at 23:34


I just encountered this problem myself. As it turned out, two columns in my dataset had the same name. Perhaps double-check that this is not the case for your dataset.
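A minimal pandas sketch for that check, assuming the features live in a DataFrame such as the question's `dummies`; the helper names `duplicate_columns` and `drop_duplicate_columns` are hypothetical. The last line illustrates a plausible link to the error message: when imbalanced-learn returns a DataFrame it appears to restore the input dtypes by column name, and with a repeated label that lookup yields a Series instead of a single dtype. That reading of the library internals is an assumption and may differ between versions.

```python
import pandas as pd


def duplicate_columns(df: pd.DataFrame) -> list:
    """Return each column label that occurs more than once in df."""
    return df.columns[df.columns.duplicated()].unique().tolist()


def drop_duplicate_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the first column for each repeated label."""
    return df.loc[:, ~df.columns.duplicated()]


df = pd.DataFrame([[1, 0.5, 1]], columns=["a", "b", "a"])
print(duplicate_columns(df))                     # ['a']
print(list(drop_duplicate_columns(df).columns))  # ['a', 'b']

# With a repeated label, a by-name dtype lookup returns a Series, not one dtype.
print(type(df.dtypes["a"]).__name__)             # Series
```

Dropping is only safe when the repeated columns hold identical data; if they differ, renaming them (for example, at the step that built the dummy columns) keeps the information.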


Maxime

Thank you, I just checked, and there is no duplicate column. – devdon Dec 16 '20 at 17:58
It was also my case. Double-check for duplicated column names. – Amine Jallouli Jan 06 '21 at 05:29
Same problem and your solution fixed it! Thanks. – user2205916 Jan 20 '21 at 18:03
I had the same problem. Thanks! – igorkf Aug 29 '21 at 01:04

score 1 · Answer 2 · answered Dec 16 '20 at 18:11


I actually just fixed this problem! I converted the DataFrames to NumPy arrays before resampling:

os_data_X, os_data_y = os.fit_sample(X_train.as_matrix(), y_train.as_matrix())


devdon


as_matrix is deprecated and was removed in pandas 1.0. This thread https://stackoverflow.com/questions/13187778/convert-pandas-dataframe-to-numpy-array recommends to_numpy or values instead. – Evelin Amorim Feb 01 '21 at 17:12

score 1 · Answer 3 · answered Mar 23 '21 at 06:19


Convert your X features into a NumPy array first, then feed them to SMOTE:

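import numpy as np
from imblearn.over_sampling import SMOTE

sm = SMOTE()
X = np.array(X)  # a plain array carries no column labels
X, y = sm.fit_resample(X, np.ravel(y))  # fit_sample() in older imbalanced-learn; np.ravel also accepts a one-column DataFrame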
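The `ravel` step matters because scikit-learn-style estimators expect y with shape (n,), while the question's `y = dummies.loc[:, dummies.columns == 'Count']` is a one-column DataFrame with shape (n, 1), and a DataFrame has no `.ravel()` method of its own. A small sketch of a safe flattening step; the helper name `as_1d_labels` is hypothetical.

```python
import numpy as np
import pandas as pd


def as_1d_labels(y) -> np.ndarray:
    """Flatten a Series, one-column DataFrame, or array of labels to shape (n,)."""
    arr = np.asarray(y)
    if arr.ndim == 2 and arr.shape[1] != 1:
        raise ValueError(f"expected a single label column, got shape {arr.shape}")
    return arr.ravel()


# Shaped like dummies.loc[:, dummies.columns == 'Count'] in the question.
y_frame = pd.DataFrame({"Count": [0, 0, 0, 1]})
print(np.asarray(y_frame).shape)    # (4, 1)
print(as_1d_labels(y_frame).shape)  # (4,)
```

Selecting the label with `dummies['Count']` gives a 1-D Series directly, which avoids the flattening step altogether.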


Muhammad Imran Zaman
