
We are trying to use SelectKBest with the f_regression scoring function on a pool of 1000 numerical features to solve a regression problem. We also wanted to parallelize the execution of SelectKBest and succeeded in doing so, since, as I understand it, f_regression is a univariate approach.

The major challenge, however, is understanding exactly how SelectKBest computes the f_regression score. I understand that it performs ANOVA and obtains an F-value, but is that computed F-value the same as the f_regression score reported by SelectKBest?

I went through the mentioned article in detail and tried to write a function that implements the same formula in order to match the result, but the computed F-score doesn't match the f_regression score that SelectKBest gives.

x_bar = float((sum(tag1.iloc[:,0].values) + sum(target.iloc[:,0].values))/(len(tag1) + len(target)))

tag1_mean = tag1.mean()
target_mean = target.mean()
n = len(tag1)
m = len(target)
numerator = n*(tag1_mean - x_bar) + m*(target_mean - x_bar)

ssw_tag = np.sum(((tag1-tag1_mean)**2))
ssw_target = np.sum(((target-target_mean)**2))
degree_freedom = n-1+m-1
denominator = (ssw_tag + ssw_target)/degree_freedom

F_Val = numerator/denominator

F_Val

The sample dataset I am using looks something like this:

F1 F2 Target
2137.03417969 2247.9690 343.7083
2202.64135742 2249.1404 343.7735
2147.74707031 2243.414 343.9496
2131.01513672 2249.7673 344.0170
2177.02587891 2242.8867 343.9583
2202.58325195 2242.8474 343.8483
2163.75610352 2248.8467 343.6372
2138.95410156 2251.7893 343.5075
2246.29736328 2248.6138 343.4942
2235.34008789 2247.8184 343.5491
2162.52905273 2257.0894 343.6237

My requirement is to match the f_regression scores for each feature with the score obtained from SelectKBest.
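
For reference, here is a minimal sketch of the comparison I am after, under the assumption (based on my reading of the scikit-learn documentation) that f_regression does not run a two-group ANOVA between the feature and the target, but instead converts the Pearson correlation r between each feature and the target into an F-value as r^2 / (1 - r^2) * (n - 2). The helper name manual_f_regression is hypothetical and only for illustration; the data are the sample rows above.

```python
import numpy as np

# Sample rows from the question: columns F1, F2 and Target.
data = np.array([
    [2137.03417969, 2247.9690, 343.7083],
    [2202.64135742, 2249.1404, 343.7735],
    [2147.74707031, 2243.414, 343.9496],
    [2131.01513672, 2249.7673, 344.0170],
    [2177.02587891, 2242.8867, 343.9583],
    [2202.58325195, 2242.8474, 343.8483],
    [2163.75610352, 2248.8467, 343.6372],
    [2138.95410156, 2251.7893, 343.5075],
    [2246.29736328, 2248.6138, 343.4942],
    [2235.34008789, 2247.8184, 343.5491],
    [2162.52905273, 2257.0894, 343.6237],
])
X, y = data[:, :2], data[:, 2]


def manual_f_regression(X, y):
    """Hypothetical helper: per-feature F-score from the Pearson correlation.

    Assumes the score is r^2 / (1 - r^2) * (n - 2), with both the
    features and the target centered before computing r.
    """
    n_samples = X.shape[0]
    X_centered = X - X.mean(axis=0)
    y_centered = y - y.mean()
    # Pearson correlation of every column of X with y.
    r = (X_centered.T @ y_centered) / (
        np.linalg.norm(X_centered, axis=0) * np.linalg.norm(y_centered)
    )
    degrees_of_freedom = n_samples - 2
    return r ** 2 / (1.0 - r ** 2) * degrees_of_freedom


scores = manual_f_regression(X, y)
print(scores)  # one F-score per feature (F1, F2)
```

If that assumption holds, these values should agree with `f_regression(X, y)[0]` from `sklearn.feature_selection`, and with the `scores_` attribute of a fitted `SelectKBest(score_func=f_regression)`.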
