
I have been trying to figure out the implementation in KNIME. The tool says it uses Fisher scoring (FS). I understand the Newton-Raphson method from http://www.win-vector.com/blog/2011/09/the-simpler-derivation-of-logistic-regression/, but can someone explain the exact difference between Fisher scoring and the Newton-Raphson method?

This is my understanding of FS:

1) There is a score function $V(\theta)$, which is the gradient (first derivative) of the log-likelihood function. // reference: Wikipedia

2) For the weight updates, the Hessian of the log-likelihood is used.

Both of the above steps are also done in the Newton-Raphson method; the score function is not mentioned by name there, but the method likewise takes the first derivative and obtains the Hessian.

It is said that the Fisher information is the variance of the score, or equivalently the expected value of the observed information (the negative of the Hessian of the log-likelihood). So in the final weight-update equation using the Fisher information, I don't understand how to take the expected value of the Hessian. Is it something like subtracting each field's column mean to obtain a final matrix, which is then multiplied with the score to obtain the second term on the RHS of the weight-update equation?
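To make the comparison concrete, here are the two update equations as far as I can piece them together (standard textbook forms, not KNIME's code, so I may well have this wrong):

$$\theta^{(t+1)} = \theta^{(t)} - \big[H(\theta^{(t)})\big]^{-1}\, V(\theta^{(t)}) \qquad \text{(Newton-Raphson)}$$

$$\theta^{(t+1)} = \theta^{(t)} + \big[\mathcal{I}(\theta^{(t)})\big]^{-1}\, V(\theta^{(t)}), \qquad \mathcal{I}(\theta) = -\operatorname{E}\big[H(\theta)\big] \qquad \text{(Fisher scoring)}$$

so the only difference should be whether the observed Hessian $H$ or its expectation $-\mathcal{I}$ is used.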

I know my understanding of the algorithm is cluttered... Can someone detail the step-by-step procedure for calculating the Fisher information?

  • https://stats.stackexchange.com/questions/130110/maximize-log-likelihood-of-logistic-regression ... (in the question) or https://stats.stackexchange.com/questions/235514/how-do-i-get-cost-function-of-logistic-regression-in-scikit-learn-from-log-likel (ditto); the log-likelihood is in several other questions. It's not clear what the point of posting a slab of code is; we're not a "translate code into mathematics" site -- and what if the code is wrong? Better to ask what the log-likelihood is (and then you can try to see if it's correctly implemented) – Glen_b May 29 '17 at 09:32
  • Also see Elements of Statistical Learning, Eqn 4.20 or http://www.stat.cmu.edu/~cshalizi/uADA/12/lectures/ch12.pdf eq 12.7 to 12.10 – Glen_b May 29 '17 at 09:37
  • @Glen_b The code is implemented in the KNIME tool. See the link posted along with it. I have gone through many sites, but I couldn't connect the formula here with any of them, hence the question. – Devi May 29 '17 at 09:45
  • This doesn't seem to be a suitable question here. We're not a code review site or a code-translator site. "What is the log-likelihood for logistic regression" is on topic (but covered). "What is this code doing?" is off topic. What it's implemented in and what you link to doesn't alter that. – Glen_b May 29 '17 at 09:47
  • If you edit your question to ask something that doesn't appear to be about translating code, it may do better. – Glen_b May 29 '17 at 11:00
  • @Glen_b : I have edited the question. – Devi Jun 05 '17 at 06:40
  • Thanks, that's more clearly on topic. See the posts that are found for example by https://stats.stackexchange.com/search?q=fisher+scoring+newton -- in particular see the information in this question which states what the difference is. The first part of the answer here gives an explicit example. You may wish to revise your question so that it's not asking just those things – Glen_b Jun 05 '17 at 06:46
  • See also here in the question, just above "Questions" (near the bottom), where explicit update formulas are given for Newton-Raphson and then Fisher scoring. Note the difference between the two is the difference between H and E(H) in the NR line and the first of the two FS lines. – Glen_b Jun 05 '17 at 07:00
  • It is said the Fisher information is the expectation of the Hessian. So on finding E(X), will the dimension of a (2x3) matrix not reduce to (2x1)? https://math.stackexchange.com/questions/694426/how-do-i-calculate-the-expectation-value. Am I correct? – Devi Jun 05 '17 at 07:29
  • I'm not sure what you're asking. Taking expectation of a random variable doesn't change its dimension. Please clarify what you need to know by editing to produce a clear question above – Glen_b Jun 05 '17 at 09:30
  • @Glen_b I have dropped the idea of taking the expectation of a Hessian matrix to calculate the Fisher information. I have found that taking the covariance matrix of the first derivative of the log-likelihood can also yield the Fisher information. So suppose I find the covariance matrix; its dimension will be nxn, right? – Devi Jun 08 '17 at 08:08
  • The variance-covariance matrix of a $p$-vector is $p\times p$ – Glen_b Jun 08 '17 at 09:14
  • Yes, where p is the no. of attributes. So in the β update for Fisher scoring, will I have to multiply the inverse of the pxp covariance matrix with my mxp matrix V(β)? – Devi Jun 08 '17 at 09:42
  • Have a look in http://gen.lib.rus.ec/search.php?req=Generalized%20Linear%20Models%20and%20Extensions&lg_topic=libgen&open=0&view=simple&res=25&phrase=1&column=def for a detailed comparison of the 2 algorithms... – Tom Wenseleers Aug 27 '19 at 21:01

1 Answer


Logistic regression is a generalized linear model with a canonical link, which means the expected information matrix (EIM, i.e. the Fisher information) is the same as the observed information matrix (OIM), so Fisher scoring and Newton-Raphson produce identical updates. The information matrix is computed as the negative of the Hessian of the log-likelihood evaluated at the parameter estimates; its inverse gives the estimated covariance matrix of the coefficients.
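To make that concrete, here is a minimal NumPy sketch of Fisher scoring for logistic regression (my own illustration, not the KNIME implementation; all names and the toy data are invented). Because the logit link is canonical, the matrix `info` below is simultaneously the negative Hessian and the expected (Fisher) information $X^{\mathsf T} W X$ with $W = \operatorname{diag}\big(p_i(1-p_i)\big)$:

```python
import numpy as np

def fisher_scoring_logistic(X, y, n_iter=25, tol=1e-10):
    """Fit logistic regression by Fisher scoring.

    With the canonical logit link this coincides with Newton-Raphson,
    because the observed and expected information matrices are identical.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        eta = X @ beta                        # linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta))       # fitted probabilities
        score = X.T @ (y - mu)                # V(beta): gradient of the log-likelihood
        w = mu * (1.0 - mu)                   # variance weights p_i (1 - p_i)
        info = X.T @ (w[:, None] * X)         # Fisher information = -Hessian = X' W X  (p x p)
        step = np.linalg.solve(info, score)   # I(beta)^{-1} V(beta)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    cov = np.linalg.inv(info)                 # inverse information: est. covariance of beta-hat
    return beta, cov

# Toy usage with simulated data (purely illustrative)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
true_prob = 1.0 / (1.0 + np.exp(-(0.5 + 1.2 * X[:, 1])))
y = (rng.uniform(size=200) < true_prob).astype(float)

beta_hat, cov_hat = fisher_scoring_logistic(X, y)
print("coefficients:", beta_hat)
print("std. errors: ", np.sqrt(np.diag(cov_hat)))
```

Note that the weights $p_i(1-p_i)$ do not involve the responses $y_i$ at all, which is exactly why the observed and expected information coincide here and no separate "expectation step" is needed.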