My Y variable (output) is binary (0 or 1). I have 10 input variables in total: 3 of them are scaled (continuous) variables, and 2 of them are ordinal, so they are written with C() in the model formula. Rather than running forward and backward selection by adding or deleting variables one by one, is there any automated code that could suggest the best input combination? The best combination should achieve the highest Recall. Here is my current code, but it still results in 0 Precision and 0 Recall. Please kindly help. (A sketch of the kind of automated search I have in mind is at the end of this post.)
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, recall_score, precision_score, f1_score
from sklearn.feature_selection import RFE
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import OneHotEncoder
# Load your dataset into a pandas DataFrame
data = pd.read_csv('Churn_Modelling.csv')
# Preprocess the data
scaler = MinMaxScaler()
var_names = ['CreditScore', 'Balance', 'EstimatedSalary']
# Scale the continuous columns
data_scaled = scaler.fit_transform(data[var_names])
data_scaled = pd.DataFrame(data_scaled, columns=var_names)
data_scaled = data_scaled.rename(columns={'CreditScore': 'ScaledCS',
'Balance': 'ScaledBalance',
'EstimatedSalary': 'ScaledES'})
# Combine the scaled columns with the original data
data = pd.concat([data, data_scaled], axis=1)
data = data.drop(columns=['CreditScore', 'Balance', 'EstimatedSalary'])
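# Note: the scaler above was fit on the full dataset before the train/test
# split, so test-set statistics leak into training. A cleaner pattern is to
# split first, fit the scaler on the training rows only, and then apply
# scaler.transform to the test rows.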
# Define the target variable and features
target_variable = 'Exited'
features = ['Geography', 'Gender', 'Age', 'Tenure', 'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'ScaledCS', 'ScaledBalance', 'ScaledES']
# Separate the target variable and features
X = data[features] # Features
y = data[target_variable] # Target variable
# One-hot encode the categorical variables with pd.get_dummies
# (the OneHotEncoder imported above ends up unused)
categorical_columns = ['Geography', 'Gender']
X = pd.get_dummies(X, columns=categorical_columns, drop_first=True)
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize the logistic regression model with the max_iter parameter
model = LogisticRegression(max_iter=1000)
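# Note: if 'Exited' is heavily imbalanced (typical for churn data), an
# unweighted logistic regression can score well by predicting 0 for everyone,
# which produces exactly 0 recall and 0 precision; class_weight='balanced'
# is one common counter-measure (see the sketch after the code).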
# Use Recursive Feature Elimination (RFE). Note: RFE eliminates features
# backward from the full set; it is not forward selection. Also,
# n_features_to_select=1 keeps only ONE feature, which on imbalanced data
# easily yields a model that predicts only 0s.
selector = RFE(model, n_features_to_select=1, step=1)
selector = selector.fit(X_train, y_train)
# Get the selected features
selected_features = X.columns[selector.support_]
# Train a logistic regression model with the selected features
model.fit(X_train[selected_features], y_train)
y_pred = model.predict(X_test[selected_features])
# Evaluate the model
conf_matrix = confusion_matrix(y_test, y_pred)
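# If the second column of the confusion matrix is all zeros, the model never
# predicts class 1; that alone forces both recall and precision to 0.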
recall = recall_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
# Print the results
print("Selected Features:", selected_features)
print("Confusion Matrix:\n", conf_matrix)
print("Recall:", recall)
print("Precision:", precision)
print("F1 Score:", f1)