I have a logistic regression model whose goal is to predict the winner of a sports match (a two-player, heads-up game). As features I use the differences between the two players' statistics rather than the raw values. For example, Person A's average running time minus Person B's average running time could be a feature.
After implementing logistic regression and cross-validation, I get an odd result: when I use scikit-learn's built-in probability function, the results are asymmetrical. For example, if I use my model to predict Person A's chances of beating Person B, using Person A features - Person B features, this value is not equal to the complement of the prediction for Person B's chances of beating Person A, using Person B features - Person A features instead.
How should I interpret the results of my model? I do not know which probability to use: {Person A's probability of beating Person B} or {1 - Prob(Person B beating Person A)}. I would say the average difference between the two probabilities is around 10-20%.
Here's an example.
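import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Toy data: two difference features (Person A - Person B) and the match result
df = pd.DataFrame([[1, 2, 1], [1, 3, 0], [4, 6, 1]], columns=['Diff_Average_Running Time', 'Diff_Average_Max_Speed', 'Result'])
selected_features = ['Diff_Average_Running Time', 'Diff_Average_Max_Speed']
X_train, X_test, y_train, y_test = train_test_split(df[selected_features], df['Result'])
y_train = y_train.astype(int)
y_test = y_test.astype(int)
# Fit the scaler on the training split only, then reuse it for the test split
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
model = LogisticRegression(fit_intercept=True, random_state=42)
model.fit(X_train_scaled, y_train)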
Then I create a new DataFrame with the statistics of the matchup I want to project and scale it using the same scaler. For example:
predict_df = pd.DataFrame([[-1,2]], columns = ['Diff_Average_Running Time', 'Diff_Average_Max_Speed'])
predict_df = scaler.transform(predict_df)
and use scikit-learn's prediction function to project the probabilities:
model.predict_proba(predict_df)
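For reference, a minimal sketch of how the output of predict_proba can be read. It assumes synthetic data and that label 1 means "Person A wins"; the names clf, win_col and p_a_wins are hypothetical and only for illustration. predict_proba returns one column per class, ordered as in the fitted model's classes_ attribute, so the column for label 1 is looked up explicitly rather than assumed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic, purely illustrative data (not the real match statistics).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

clf = LogisticRegression().fit(X, y)

# One row per sample, one column per class; columns follow clf.classes_.
proba = clf.predict_proba(np.array([[-1.0, 2.0]]))

# Assumption: label 1 means "Person A wins", so pick that column explicitly.
win_col = list(clf.classes_).index(1)
p_a_wins = proba[0, win_col]
```

Each row of the result sums to 1, so the other column is simply 1 - p_a_wins for the same input.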
The opposite projection (Person B's chances of beating Person A) then simply uses the negated features:
opposite_predict_df = pd.DataFrame([[1,-2]], columns = ['Diff_Average_Running Time', 'Diff_Average_Max_Speed'])
opposite_predict_df = scaler.transform(opposite_predict_df)
When projecting the probability of B beating A using the same method, I expect
model.predict_proba(opposite_predict_df) = 1 - model.predict_proba(predict_df)
However, this does not hold. Often the complement of Person B's probability is larger than Person A's probability by a noticeable margin.
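A minimal sketch that reproduces the comparison on synthetic data and measures the gap between P(A beats B) and 1 - P(B beats A). The data-generating process, the helper symmetry_gap, and the second variant (scaling without centering and fitting without an intercept) are assumptions added for illustration, not part of my actual pipeline. The variant is included because a logistic model of the form sigmoid(w·x + b) can only give exactly complementary probabilities for x and -x when the constant term is zero, and both the fitted intercept and the scaler's mean subtraction contribute a constant term.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic difference features with a nonzero mean and a nonzero true
# intercept. Purely illustrative, not the real match data.
rng = np.random.default_rng(0)
X = rng.normal(loc=[0.5, -0.3], scale=1.0, size=(2000, 2))
logits = 1.2 * X[:, 0] - 0.8 * X[:, 1] + 1.0
y = (rng.random(2000) < 1 / (1 + np.exp(-logits))).astype(int)

x_ab = np.array([[-1.0, 2.0]])  # Person A features - Person B features
x_ba = -x_ab                    # Person B features - Person A features


def symmetry_gap(model, scaler):
    """Return |P(A beats B) - (1 - P(B beats A))| for the matchup above."""
    p_ab = model.predict_proba(scaler.transform(x_ab))[0, 1]
    p_ba = model.predict_proba(scaler.transform(x_ba))[0, 1]
    return abs(p_ab - (1 - p_ba))


# Setup as in the question: mean-centering scaler plus fitted intercept.
scaler_a = StandardScaler().fit(X)
model_a = LogisticRegression(fit_intercept=True).fit(scaler_a.transform(X), y)

# Variant: scale without centering and fit without an intercept.
scaler_b = StandardScaler(with_mean=False).fit(X)
model_b = LogisticRegression(fit_intercept=False).fit(scaler_b.transform(X), y)

gap_with_intercept = symmetry_gap(model_a, scaler_a)
gap_without_intercept = symmetry_gap(model_b, scaler_b)
print(gap_with_intercept, gap_without_intercept)
```

In this sketch the second gap is zero up to floating-point error, while the first is not. Whether dropping the constant term is appropriate for the real data is a separate question, since it forces the model to treat the order of the two players as irrelevant.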