
I'm trying to grok how the standard errors for a marginal effect are computed.

I know that if $\eta_i = x_i^T\beta$, then the square of the standard error of $\hat\eta_i$ is $x_i^T \Sigma x_i$, where $\Sigma$ is the covariance matrix of the estimated coefficients from the regression.
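For a single observation this is just a quadratic form, which is easy to sanity-check by hand. A minimal numpy sketch, where the values of x and Sigma are made up for illustration (they are not taken from the model below):

```python
import numpy as np

# Hypothetical coefficient-length covariate row and coefficient covariance matrix
x = np.array([1.0, 0.0, 1.0])
Sigma = np.array([[ 0.04, 0.00, -0.01],
                  [ 0.00, 0.09,  0.00],
                  [-0.01, 0.00,  0.06]])

var_eta = x @ Sigma @ x      # x' Sigma x
se_eta = np.sqrt(var_eta)    # standard error of the linear predictor
print(var_eta, se_eta)
```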

Suppose, then, that I want to estimate the marginal effect (on the link scale) for a Poisson regression model. If my data are housed in a dataframe df and I want a point estimate of the marginal effect of txt (a binary indicator), I can simply do:

```python
control = df.assign(txt='Control')
treatment = df.assign(txt='Treatment')
```

Here, I'm using statsmodels.

The predict method returns estimates on the natural scale:

```python
treat_y = fit.predict(treatment)
control_y = fit.predict(control)
```

Take logs to get things back onto the scale of the linear predictor:

```python
np.log(treat_y).mean() - np.log(control_y).mean()
```
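To make the target explicit (writing $\widehat{\Delta}$ for this average marginal effect, my own notation): since the log link gives $\log\hat\mu_i = \hat\eta_i = x_i^T\hat\beta$,

$$
\widehat{\Delta} \;=\; \frac{1}{n}\sum_{i=1}^{n}\log\hat\mu_i^{(\mathrm{Treatment})} \;-\; \frac{1}{n}\sum_{i=1}^{n}\log\hat\mu_i^{(\mathrm{Control})}
\;=\; \frac{1}{n}\sum_{i=1}^{n}\bigl(x_i^{(T)} - x_i^{(C)}\bigr)^T \hat\beta .
$$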

The point estimate agrees with {marginaleffects}, a very good R library for computing these quantities. However, I can't seem to get the standard error right.

My first step is to construct design matrices for the counterfactual datasets in which everyone has txt='Control' and in which everyone has txt='Treatment':

```python
X_control = patsy.dmatrix(fit.model.data.design_info.builder, control, return_type="matrix")
X_treat = patsy.dmatrix(fit.model.data.design_info.builder, treatment, return_type="matrix")
```

Now I can compute the variance of each prediction on the link scale:

```python
cov_mat = fit.cov_params().values

var_control = ((X_control @ cov_mat) * X_control).sum(1)
var_treat = ((X_treat @ cov_mat) * X_treat).sum(1)
```
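To spell out what that row-wise multiply-and-sum trick computes: for every counterfactual row $x_i$ it evaluates the same quadratic form as above,

$$
\widehat{\operatorname{Var}}(\hat\eta_i) = x_i^T \,\Sigma\, x_i ,
$$

so var_control and var_treat each hold one link-scale prediction variance per observation.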

Finally, since I am taking the mean of the predictions, I take the mean of the variances:

```python
se_T = var_treat.mean()
se_C = var_control.mean()
```

Since I want the standard error of the difference, the variances should add:

```python
se = np.sqrt(se_T + se_C)
```

This returns a standard error of 0.12, whereas marginaleffects returns a standard error of 0.067. The code required to generate the data is below. I'd much appreciate some help in computing the standard error for a marginal effect correctly.
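For convenience, here is the computation described above collected into one block. This is just my attempt, not a corrected version; it assumes test.csv from the code below, and it passes the stored design_info to patsy directly (of which .builder is a deprecated alias):

```python
import numpy as np
import pandas as pd
import patsy
from statsmodels.formula.api import poisson

df = pd.read_csv('test.csv')
fit = poisson('y ~ txt * g', data=df).fit(cov_type='HC0')

# Counterfactual copies of the data: everyone Control, everyone Treatment
control = df.assign(txt='Control')
treatment = df.assign(txt='Treatment')

# Point estimate of the marginal effect on the link (log) scale
treat_y = fit.predict(treatment)
control_y = fit.predict(control)
point = np.log(treat_y).mean() - np.log(control_y).mean()

# Design matrices for the two counterfactual datasets
design_info = fit.model.data.design_info
X_control = patsy.dmatrix(design_info, control, return_type="matrix")
X_treat = patsy.dmatrix(design_info, treatment, return_type="matrix")

# Row-wise quadratic forms x_i' Sigma x_i
cov_mat = fit.cov_params().values
var_control = ((X_control @ cov_mat) * X_control).sum(1)
var_treat = ((X_treat @ cov_mat) * X_treat).sum(1)

# The standard error I computed (0.12, versus 0.067 from marginaleffects)
se = np.sqrt(var_treat.mean() + var_control.mean())
print(point, se)
```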

Code for Data

```r
library(tidyverse)
library(marginaleffects)
library(sandwich)

set.seed(0)
N <- 1000
txt <- sample(c('Treatment', 'Control'), size=N, replace=T)
g <- sample(letters[1:3], size=N, replace=T)
X <- model.matrix(~txt*g)
beta <- c(log(0.4), 0.02, 0.2, 0.25, 0, 0)
eta <- X %*% beta
lam <- exp(eta)
y <- rbinom(N, 1, lam)

tibble(txt, g, y) %>% write_csv('test.csv')
```

The model I am fitting in R is fit <- glm(y ~ txt*g, data=d, family = poisson), using a robust covariance. In Python, the model is:

```python
import numpy as np
import patsy
from statsmodels.formula.api import poisson
import pandas as pd

df = pd.read_csv('test.csv')

fit = poisson('y ~ txt * g', data=df).fit(cov_type='HC0')

fit.summary()
```
