If I have some simulated standard normally distributed data: $$\mu_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \cdots + \beta_kX_{ik}$$ where $$Y_i \sim N(\mu_i, 1)$$ created with the following function (in Python):
import numpy as np

def simulate_normal(dummy, k, n, p, k1, k2, seed):
    # Set seed for reproducibility
    np.random.seed(seed)
    # Simulate regression coefficients
    P = np.random.randn(k, p)
    Ym = dummy @ P
    # Loadings (v) of the first k principal components of Ym
    _, v, _ = PCA(Ym, k)
    # Structured noise: spans the same subspace as Ym
    Esn = np.random.randn(n*k, k) @ v.T
    # White noise
    Ewn = np.random.randn(n*k, p)
    # Blend the two noise sources and add to the signal
    E = k1*Esn + k2*Ewn
    Y = Ym + E
    return Y
where dummy is a dummy matrix of the linear predictors (contrast-coded), k is the number of factor levels, n is the number of observations per factor level, p is the number of variables, and k1 and k2 scale the structured and white noise, respectively.
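The `PCA` helper used inside `simulate_normal` is not shown. A minimal SVD-based stand-in matching the assumed call pattern `_, v, _ = PCA(Ym, k)` (scores, loadings, singular values; this is a hypothetical reconstruction, not the original implementation) could look like:

```python
import numpy as np

def PCA(X, k):
    """Hypothetical PCA helper: returns scores, loadings, and singular
    values for the first k principal components."""
    Xc = X - X.mean(axis=0)                   # centre the columns
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :k] * s[:k]                 # projections onto the first k PCs
    loadings = Vt[:k].T                       # (p, k) matrix, the v in the caller
    return scores, loadings, s[:k]
```

With a loadings matrix of shape (p, k), the line `Esn = np.random.randn(n*k, k) @ v.T` then produces noise of shape (n*k, p) that lies in the same k-dimensional subspace as the signal Ym.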
And I make it into binomial (Bernoulli) data with the logistic function: $$p_i = \frac{1}{1 + e^{-Y_i}} = \frac{1}{1 + e^{-(x_i^T\beta + E_i)}}$$
def make_binomial(Y):
    # Logistic function: squash Y into (0, 1)
    Y_log = 1/(1 + np.exp(-Y))
    # One Bernoulli draw per probability (n = 1 trial each)
    Y_bin = np.random.binomial(1, Y_log)
    return Y_bin
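As a quick sanity check of the Bernoulli step, here is a self-contained sketch (the latent Y is just illustrative, not the output of `simulate_normal`): each entry of the result is 0 or 1, and the empirical mean of the draws tracks the mean of the probabilities.

```python
import numpy as np

np.random.seed(0)
Y = np.random.randn(10000)            # illustrative latent normal values
p = 1 / (1 + np.exp(-Y))              # logistic squashing into (0, 1)
Y_bin = np.random.binomial(1, p)      # one Bernoulli draw per probability
```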
If I then fit two models to the binomial data Y_bin, one with OLS (even though that does not make sense for binary data) and one as a GLM with a logit link, I would assume the linear predictor of the GLM, $$\eta_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \cdots + \beta_kX_{ik},$$ is the same as the Ym from the simulate_normal() function.
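That assumption can be probed numerically. The sketch below (self-contained, with made-up coefficients; it uses a minimal hand-rolled IRLS loop rather than a library GLM routine) fits a binomial GLM with logit link to Bernoulli data and compares the estimated linear predictor with the true one:

```python
import numpy as np

np.random.seed(1)
n = 5000
X = np.column_stack([np.ones(n), np.random.randn(n, 3)])  # intercept + 3 predictors
beta = np.array([0.5, 1.0, -2.0, 0.8])                    # illustrative true coefficients
eta = X @ beta                                            # true linear predictor
y = np.random.binomial(1, 1 / (1 + np.exp(-eta)))

# IRLS for the binomial GLM with logit link
b = np.zeros(X.shape[1])
for _ in range(25):
    mu = 1 / (1 + np.exp(-(X @ b)))           # fitted probabilities
    W = np.clip(mu * (1 - mu), 1e-6, None)    # IRLS weights (clipped for stability)
    z = X @ b + (y - mu) / W                  # working response
    b = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))

eta_hat = X @ b   # estimated linear predictor, tracks eta up to sampling noise
```

With enough observations, `eta_hat` is nearly collinear with the true `eta`, which is exactly the sense in which the GLM's linear predictor recovers Ym.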
I would also assume that the linear predictor (i.e., the fitted values) from the OLS model is the same as the Y_log from the make_binomial() function, or $$\mu_i = p_i$$
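The OLS side can be checked the same way. Regressing the 0/1 outcomes on X by least squares is a linear probability model, so its fitted values approximate the p_i only roughly: OLS fits a line to a sigmoid, and the fitted values are not confined to (0, 1). A self-contained single-predictor sketch (coefficients are illustrative):

```python
import numpy as np

np.random.seed(2)
n = 5000
X = np.column_stack([np.ones(n), np.random.randn(n)])  # intercept + one predictor
eta = X @ np.array([0.0, 1.0])                         # true linear predictor
p = 1 / (1 + np.exp(-eta))                             # true probabilities
y = np.random.binomial(1, p)

b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
p_hat = X @ b_ols    # OLS "linear predictor" = fitted values on the probability scale
```

With an intercept in the model, the mean of `p_hat` matches the mean of `y` exactly, and `p_hat` correlates strongly with `p`, but the pointwise equality `mu_i = p_i` only holds approximately, in the mid-range of the sigmoid.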
If I chose to look at only $\beta_1X_{i1}$ from the GLM model, I would expect to get the same values as $\beta_1X_{i1}$ from the simulated normally distributed data (up to the error).
So my question:
How would I interpret $\beta_1X_{i1}$ of the OLS model if I chose to look at the contribution of only one linear predictor?


