5

My data is slightly unbalanced, so I am trying to do a SMOTE algorithm before doing the logistic regression model. When I do, I get the error: KeyError: 'Only the Series name can be used for the key in Series dtype mappings.' Could someone help me figure out why? Here is the code:

X = dummies.loc[:, dummies.columns != 'Count']
y = dummies.loc[:, dummies.columns == 'Count']
#from imblearn.over_sampling import SMOTE
os = SMOTE(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
columns = X_train.columns
os_data_X,os_data_y=os.fit_sample(X_train, y_train) # here is where it errors
os_data_X = pd.DataFrame(data=os_data_X,columns=columns )
os_data_y= pd.DataFrame(data=os_data_y,columns=['Count'])

Thank you!

devdon
  • 61
  • 1
  • 3

3 Answers3

13

I just encountered this problem myself. As it turned out, I had a duplicate column in my dataset. Perhaps double check that this is not the case for your dataset.

Maxime
  • 131
  • 2
1

I actually just fixed this problem! I made them matrices: os_data_X,os_data_y=os.fit_sample(X_train.as_matrix(), y_train.as_matrix())

devdon
  • 61
  • 1
  • 3
  • 1
    as_matrix is deprecated for more recent versions of pandas. This thread https://stackoverflow.com/questions/13187778/convert-pandas-dataframe-to-numpy-array recommends to_numpy or values. – Evelin Amorim Feb 01 '21 at 17:12
1

100% correct solution.

Try to convert your X features into an array first and then feed to SMOTE:

sm = SMOTE()

X=np.array(X)

X, y = sm.fit_sample(X, y.ravel())