I want to know whether my interpretation of GLM weights is correct.

The R documentation for glm says:

Non-NULL weights can be used to indicate that different observations have different dispersions (with the values in weights being inversely proportional to the dispersions); or equivalently, when the elements of weights are positive integers $w_i$, that each response $y_i$ is the mean of $w_i$ unit-weight observations.

I would like to know whether I can therefore say that using weights changes the log-likelihood function that is maximized in the following way: \begin{align*} \sum_{i} \log f(X_i) \to \sum_{i} w_i \log f(X_i) \end{align*}

If yes, does this hold only if the weights are positive integers?

EDIT: If not, how can I modify the log-likelihood so that this holds? \begin{align*} \sum_{i} \log f(X_i) \to \sum_{i} w_i \log f(X_i) \end{align*}

1 Answer

As described in detail here, you can think of the weights as pseudo-counts of observations (hence they cannot be negative). If your likelihood function is

$$ \mathcal{L}(\theta|x_1,x_2,\dots,x_N) = f_\theta(x_1) f_\theta(x_2) \dots f_\theta(x_N) $$

and your data are recorded as tuples $(x_i, n_i)$, where $x_i$ is the value and $n_i$ is the number of times this value was observed, then the likelihood becomes

$$\begin{align} \mathcal{L}(\theta|x_1,x_2,\dots,x_N) &= \underbrace{f_\theta(x_1) \dots f_\theta(x_1)}_{n_1\,\text{times}} \underbrace{f_\theta(x_2) \dots f_\theta(x_2)}_{n_2\,\text{times}} \dots \underbrace{f_\theta(x_N) \dots f_\theta(x_N)}_{n_N\,\text{times}} \\ &= f_\theta(x_1)^{n_1} f_\theta(x_2)^{n_2} \dots f_\theta(x_N)^{n_N} \end{align}$$

By the properties of logarithms, the log-likelihood is then just

$$ \log \mathcal{L}(\theta|x_1,x_2,\dots,x_N) = n_1 \log f_\theta(x_1) + n_2 \log f_\theta(x_2) + \dots + n_N \log f_\theta(x_N) $$
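As a quick numerical check of this identity, here is a minimal Python sketch. It assumes a Poisson density for $f_\theta$ purely for illustration; the data values, counts, and the parameter value `lam` are made up, and any other density would work the same way.

```python
import math

def poisson_logpmf(x, lam):
    """Log of the Poisson pmf f_lam(x) = exp(-lam) * lam**x / x!."""
    return -lam + x * math.log(lam) - math.lgamma(x + 1)

# Hypothetical data: distinct values x_i and how often each was observed (n_i).
values = [0, 1, 2, 5]
counts = [3, 4, 2, 1]
lam = 1.7  # arbitrary parameter value at which both log-likelihoods are evaluated

# Expanded data set: each x_i repeated n_i times.
expanded = [x for x, n in zip(values, counts) for _ in range(n)]

# Unweighted log-likelihood of the expanded data ...
loglik_expanded = sum(poisson_logpmf(x, lam) for x in expanded)
# ... versus the count-weighted log-likelihood of the distinct values.
loglik_weighted = sum(n * poisson_logpmf(x, lam) for x, n in zip(values, counts))

print(loglik_expanded, loglik_weighted)
```

The two printed numbers should agree up to floating-point rounding, since the second sum only groups identical terms of the first.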

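The same holds if the weights are not counts but are non-negative and proportional to the counts: multiplying every weight by a constant $c > 0$ multiplies the log-likelihood by $c$, which does not change the value of $\theta$ that maximizes it.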
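The proportionality claim can also be checked numerically. The sketch below again assumes a Poisson model and made-up data, and uses a crude grid search (not what any GLM routine actually does) to find the maximizer of the weighted log-likelihood for non-integer weights and for the same weights multiplied by 10.

```python
import math

def weighted_loglik(lam, values, weights):
    """Weighted Poisson log-likelihood: sum_i w_i * log f_lam(x_i)."""
    return sum(
        w * (-lam + x * math.log(lam) - math.lgamma(x + 1))
        for x, w in zip(values, weights)
    )

def grid_argmax(values, weights, grid):
    """Return the grid point with the largest weighted log-likelihood."""
    return max(grid, key=lambda lam: weighted_loglik(lam, values, weights))

# Hypothetical data and weights, chosen only for illustration.
values = [0, 1, 2, 5]
weights = [0.3, 0.4, 0.2, 0.1]       # non-integer weights
scaled = [10 * w for w in weights]   # same weights multiplied by c = 10

grid = [k / 1000 for k in range(1, 5001)]  # candidate lambda values in (0, 5]

lam_hat = grid_argmax(values, weights, grid)
lam_hat_scaled = grid_argmax(values, scaled, grid)

# For a Poisson model the weighted maximizer is the weighted mean of the data.
weighted_mean = sum(w * x for x, w in zip(values, weights)) / sum(weights)

print(lam_hat, lam_hat_scaled, weighted_mean)
```

Both grid searches should land on the same value, close to the weighted mean, because rescaling the weights rescales the whole objective without moving its maximum.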

Tim
  • Thank you very much! – Anja Krause Jan 09 '23 at 09:07
  • @AnjaKrause if this answers your question, you can mark the answer as accepted so others know it is "done". – Tim Jan 09 '23 at 09:10
  • Did it. Thanks for reminding me! – Anja Krause Jan 09 '23 at 10:36
  • In addition, you can upvote it! An answer which can be accepted also deserves an upvote! – kjetil b halvorsen Jan 09 '23 at 15:39
  • See https://stats.stackexchange.com/questions/369611/can-an-optimal-weighted-average-ever-have-negative-weights/369733#369733 for an example with negative weights – kjetil b halvorsen Jan 09 '23 at 15:42
  • @Tim Do you also know whether this works in R's glm? https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/glm Or do the weights need to be integers there? I am not sure I understand the description correctly, because it says "equivalently" but then refers to the case where the weights are positive integers. – Anja Krause Jan 09 '23 at 20:29
  • @AnjaKrause weights in GLM do not need to be integers; they can be any non-negative numbers. – Tim Jan 09 '23 at 20:40