What does "weighted logistic regression" mean?
I came across this term "weighted logistic regression"in this paper.
I have read the paper a lot of times throughly. But I still can't get the idea of the author. Hope you can help me!
What does "weighted logistic regression" mean?
I came across this term "weighted logistic regression"in this paper.
I have read the paper a lot of times throughly. But I still can't get the idea of the author. Hope you can help me!
Let's begin with a weighted average, which slightly modifies the formula for an ordinary average:
$$\bar{x}^w=\frac{\sum_i w_i x_i}{\sum_i w_i}$$
An unweighted average would correspond to using $w_i=1$ (though any other constant would do as well).
Why would we do that?
Imagine, for example, that each value occurred multiple times ("We have 15 ones, 23 twos, 19 threes, 8 fours and 1 six"); then we could use weights to reflect the multiplicity of each value ($w_1=15$, etc.). The weighted average is then a faster way to calculate the average you'd get if you wrote "1" fifteen times, "2" twenty-three times, and so on, and then calculated the ordinary average.
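As a quick check, here is a minimal Python sketch (using numpy, with the counts from the example above) showing that the weighted average matches the ordinary average of the fully written-out data:

```python
import numpy as np

values = np.array([1, 2, 3, 4, 6])
weights = np.array([15, 23, 19, 8, 1])   # "15 ones, 23 twos, 19 threes, 8 fours, 1 six"

# weighted average: sum_i w_i x_i / sum_i w_i
weighted_mean = np.sum(weights * values) / np.sum(weights)

# ordinary average of the expanded data gives the same number
expanded = np.repeat(values, weights)
print(weighted_mean, expanded.mean())    # both ~2.364
```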
For another possible example, imagine that each observation was itself an average. Each average is not equally informative -- the ones based on larger samples should carry more weight (other things being equal).
(In that case, if we set each observation's weight to its underlying sample size, we recover the overall average of all the data that went into the component averages.)
There are many other reasons one might weight observations differently, though (e.g. if the observation values are not all equally precise).
In a somewhat similar fashion, we can modify the estimator in ordinary regression to incorporate weights for the observations. It reproduces a weighted average when the regression is intercept-only.
The usual multiple regression estimator is $\hat{\beta}=(X^\top X)^{-1}X^\top y$. The weighted regression estimator is $\hat{\beta}=(X^\top W X)^{-1}X^\top W y$, where $W$ is a diagonal matrix, with weights on the diagonal, $W_{ii} = w_i$.
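Here is a minimal numpy sketch of that estimator on made-up data, including a check that an intercept-only design reproduces the weighted average:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 2 + 3 * x + rng.normal(size=n)
w = rng.uniform(0.5, 2.0, size=n)     # illustrative positive weights

X = np.column_stack([np.ones(n), x])  # design matrix with an intercept column
W = np.diag(w)

# weighted least squares: beta_hat = (X'WX)^{-1} X'W y
beta_hat = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# with an intercept-only design, the estimator is just the weighted average of y
X0 = np.ones((n, 1))
beta0 = np.linalg.solve(X0.T @ W @ X0, X0.T @ W @ y)
print(beta0[0], np.sum(w * y) / np.sum(w))  # the two numbers agree
```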
Weighted logistic regression works similarly, except that, unlike weighted linear regression, it has no closed-form solution; the weighted likelihood is maximized iteratively (e.g. by iteratively reweighted least squares).
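For illustration, here is a short sketch using scikit-learn's `LogisticRegression`, which accepts per-observation weights through the `sample_weight` argument of `fit` (the data and weights below are simulated purely for the example):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_beta = np.array([1.0, -2.0])
p = 1 / (1 + np.exp(-(0.5 + X @ true_beta)))  # true success probabilities
y = rng.binomial(1, p)
w = rng.uniform(0.5, 2.0, size=200)           # illustrative observation weights

# each observation's log-likelihood term is multiplied by its weight;
# the weighted likelihood is then maximized iteratively (no closed form)
model = LogisticRegression(C=1e6)  # large C: effectively no regularization
model.fit(X, y, sample_weight=w)
print(model.intercept_, model.coef_)
```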
Weighted logistic regression is often used when you have an imbalanced dataset. Let's understand it with an example.
Assume you have a dataset of patient details and you need to predict whether each patient has cancer. Such datasets are generally imbalanced: say 10,000 data points have cancer and 1,000,000 don't. One approach is to downsample the majority class, keeping, say, only 100,000 of the 1,000,000 non-cancer points.
You then give weight = 10 to each sampled non-cancer point, so that the effect of the 100,000 sampled data points is the same as that of the original 1,000,000. This is one of the techniques you can use when you have an imbalanced dataset; a minimal sketch follows.
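As one possible illustration (not the only way to do this), here is a minimal scikit-learn sketch of the downsample-and-reweight idea on simulated data, scaled to one tenth of the counts above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# simulated stand-in for the cancer example, at 1/10 the scale:
# 1,000 positive (cancer) and 100,000 negative (no cancer) points
rng = np.random.default_rng(0)
n_pos, n_neg = 1_000, 100_000
X = rng.normal(size=(n_pos + n_neg, 3))
y = np.r_[np.ones(n_pos, dtype=int), np.zeros(n_neg, dtype=int)]

# keep all positives; sample 1 in 10 of the negatives
neg_idx = np.flatnonzero(y == 0)
keep = np.r_[np.flatnonzero(y == 1),
             rng.choice(neg_idx, size=n_neg // 10, replace=False)]

# weight = original count / sampled count = 100,000 / 10,000 = 10
w = np.where(y[keep] == 0, 10.0, 1.0)

model = LogisticRegression(max_iter=1000).fit(X[keep], y[keep], sample_weight=w)
```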
The calculation depends on your sample size: in the case above the sampled size is 100,000 but the original sample size is 1,000,000, hence weight = 1,000,000 / 100,000 = 10.
– Parimal Roy Dec 31 '19 at 05:12