
My Y variable (output) is binary (0 or 1). I have 10 input variables in total: 3 of them are scaled (continuous) variables, and 2 of them are ordinal and are therefore wrapped in C( ). Rather than running forward and backward selection by adding or deleting variables one at a time, is there any automated code that could suggest the best input combination? The best combination should achieve the highest recall (one possible approach is sketched after the code below). Here is my current code, but it still results in 0 precision and 0 recall. Please kindly help.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, recall_score, precision_score, f1_score
from sklearn.feature_selection import RFE
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import OneHotEncoder

# Load your dataset into a Pandas DataFrame

data = pd.read_csv('Churn_Modelling.csv')

# Preprocessing the data

scaler = MinMaxScaler()
var_names = ['CreditScore', 'Balance', 'EstimatedSalary']

# Scaling the columns

data_scaled = scaler.fit_transform(data[var_names])
data_scaled = pd.DataFrame(data_scaled, columns=var_names)
data_scaled = data_scaled.rename(columns={'CreditScore': 'ScaledCS', 'Balance': 'ScaledBalance', 'EstimatedSalary': 'ScaledES'})

# Combining the scaled columns with the original data

data = pd.concat([data, data_scaled], axis=1)
data = data.drop(columns=['CreditScore', 'Balance', 'EstimatedSalary'])
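# Note: fitting the scaler on the full dataset before the train/test split
# leaks test-set information into the scaling. A safer pattern (a sketch,
# assuming the raw columns are kept until after the split below) is:
#     scaler.fit(X_train[var_names])
#     X_train.loc[:, var_names] = scaler.transform(X_train[var_names])
#     X_test.loc[:, var_names] = scaler.transform(X_test[var_names])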

# Define the target variable and features

target_variable = 'Exited'
features = ['Geography', 'Gender', 'Age', 'Tenure', 'NumOfProducts', 'HasCrCard', 'IsActiveMember', 'ScaledCS', 'ScaledBalance', 'ScaledES']

# Separate the target variable and features

X = data[features]  # Features
y = data[target_variable]  # Target variable

# One-hot encode the categorical variables

categorical_columns = ['Geography', 'Gender']
X = pd.get_dummies(X, columns=categorical_columns, drop_first=True)

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
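# Note: churn targets like 'Exited' are often imbalanced (an assumption about
# this dataset); a stratified split keeps the class ratio equal in both sets:
#     X_train, X_test, y_train, y_test = train_test_split(
#         X, y, test_size=0.2, random_state=42, stratify=y)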

# Initialize the logistic regression model with max_iter parameter

model = LogisticRegression(max_iter=1000)
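# Note: with an imbalanced target, plain logistic regression can predict the
# majority class for every row, which yields exactly 0 precision and 0 recall.
# One hedge (again assuming imbalance) is to re-weight the classes:
#     model = LogisticRegression(max_iter=1000, class_weight='balanced')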

# Use Recursive Feature Elimination (RFE) for feature selection
# (RFE eliminates features backward; it is not forward selection)

selector = RFE(model, n_features_to_select=1, step=1)
selector = selector.fit(X_train, y_train)
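# Note: n_features_to_select=1 keeps only a single feature, which is usually
# far too little signal and can itself produce the 0-precision/0-recall
# result; try a larger value, or search over it as sketched after the code.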

# Get the selected features

selected_features = X.columns[selector.support_]

# Train a logistic regression model with the selected features

model.fit(X_train[selected_features], y_train)
y_pred = model.predict(X_test[selected_features])

# Evaluate the model

conf_matrix = confusion_matrix(y_test, y_pred)
recall = recall_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
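# Note: when no positives are predicted at all, precision is undefined and
# scikit-learn returns 0 with a warning; precision_score, recall_score and
# f1_score accept a zero_division argument to control that behavior.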

# Print the results

print("Selected Features:", selected_features) print("Confusion Matrix:\n", conf_matrix) print("Recall:", recall) print("Precision:", precision) print("F1 Score:", f1)
