I am wondering why we do not use least squares instead of maximum likelihood.

For example, suppose we have 3 choices, $k = 1, 2, 3$.

Why not estimate the coefficients by minimizing $\left(\frac{e^{\beta_{i} X}}{1+\sum_{j} e^{\beta_{j} X}} - Y\right)^{2}$ for $i = 1, 2, 3$?

sherek_66

1 Answer

The short answer is that least squares is not maximum likelihood estimation for this model, so it is not optimal. Maximum likelihood solves for the $\beta$ that makes the observed data most likely to have been observed. The likelihood function for Bernoulli random variables ($Y = 0, 1$) involves exponents in $Y$, not squares: each observation contributes $p^{Y}(1-p)^{1-Y}$, where $p$ is the model's probability that $Y = 1$.
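
To make the contrast concrete, here is a minimal sketch in Python that evaluates both criteria for a single Bernoulli observation. The function names and the probabilities tried are my own choices for illustration, not part of any library. It assumes the single-observation Bernoulli likelihood $p^{Y}(1-p)^{1-Y}$, with $p$ the predicted probability that $Y = 1$.

```python
import math

def bernoulli_likelihood(p, y):
    """Likelihood of one observation y (0 or 1) when P(Y=1) = p.

    Note that y appears as an exponent, not inside a square.
    """
    return p ** y * (1 - p) ** (1 - y)

def neg_log_likelihood(p, y):
    """Criterion minimized by maximum likelihood (one observation)."""
    return -math.log(bernoulli_likelihood(p, y))

def squared_error(p, y):
    """Criterion minimized by least squares (one observation)."""
    return (p - y) ** 2

# Illustrative values only: the observed outcome is y = 1 and the
# predicted probability moves further and further from it.
y = 1
for p in (0.5, 0.1, 0.01):
    nll = neg_log_likelihood(p, y)
    se = squared_error(p, y)
    print(f"p={p}: negative log-likelihood={nll:.3f}, squared error={se:.4f}")
```

The two criteria are different functions of $p$, so in general they are minimized by different $\beta$. For $Y = 1$, the squared error can never exceed 1, while the negative log-likelihood grows without bound as $p$ approaches 0.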

Frank Harrell