I have an odd scenario in my data analysis and I'm not sure what is causing it. I have a large set of tuples $(Y_1, X_1), \dots, (Y_N, X_N)$ where $Y_i$ is a random vector from some arbitrary distribution (it need not be normal) and $X_i$ is a vector denoting group membership, with entries in $\{0, 1, 2\}$. Each vector has length 200. I shuffle these vectors independently (breaking any association between $Y_i$ and $X_i$, so the null holds) and observe uniform analytical OLS p-values for $\hat{\beta}$, as expected. Then I rerun the same regressions with robust standard errors on the same shuffled data and the p-values become severely non-uniform, with a badly inflated type I error rate. I am using the statsmodels package in Python to compute the robust errors as follows:
```python
import statsmodels.api as sm

model_ols = sm.OLS(y, sm.add_constant(x)).fit(cov_type='HC3')
```
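The conventional run is the same call without `cov_type`. A minimal sketch of the two runs side by side (the beta parameters and group sizes below are made up placeholders; in my case `y` and `x` are one pair of independently shuffled vectors from my dataset):

```python
import numpy as np
import statsmodels.api as sm

# Placeholder data standing in for one pair of independently shuffled vectors
rng = np.random.default_rng(0)
y = rng.beta(2, 2, size=200)                               # outcome, beta-distributed
x = rng.permutation(np.repeat([0, 1, 2], [150, 30, 20]))   # imbalanced group labels

X = sm.add_constant(x)
fit_ols = sm.OLS(y, X).fit()                    # analytical (nonrobust) SEs
fit_hc3 = sm.OLS(y, X).fit(cov_type='HC3')      # HC3 robust SEs

print(fit_ols.params, fit_hc3.params)           # point estimates are identical
print(fit_ols.bse, fit_hc3.bse)                 # only the SEs (and hence p-values) differ
print(fit_ols.pvalues, fit_hc3.pvalues)
```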
I'm not sure why this would be the case. Here are some observations about my data:
- The distribution of each vector $Y_i$ roughly follows a beta distribution: it can be Gaussian-like when the mean is around 0.5 and skewed when the mean is around 0.9 or 0.1.
- The groups denoted by $X_i$ can be heavily imbalanced.
- The variance per group is highly variable (i.e., the data are heteroskedastic).
- The estimates $\hat{\beta}$ are identical for both runs, as expected: the point estimator is the same OLS estimator in both cases, and only the covariance estimator changes.
- I've confirmed that the robust standard errors reported by statsmodels are identical (up to negligible numerical imprecision) to the delete-one jackknife standard errors; a sketch of the kind of check I mean is below.
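For reference, a minimal sketch of that check with placeholder data, assuming a plain delete-one jackknife with the usual $(n-1)/n$ variance factor (my actual data and code differ in the details):

```python
import numpy as np
import statsmodels.api as sm

def jackknife_se(y, X):
    """Delete-one jackknife standard errors for the OLS coefficients."""
    n = len(y)
    betas = np.empty((n, X.shape[1]))
    for i in range(n):
        keep = np.arange(n) != i
        betas[i] = sm.OLS(y[keep], X[keep]).fit().params
    # Jackknife variance: (n-1)/n * sum of squared deviations from the mean
    return np.sqrt((n - 1) / n * ((betas - betas.mean(axis=0)) ** 2).sum(axis=0))

# Placeholder data
rng = np.random.default_rng(1)
x = rng.integers(0, 3, size=200)
y = rng.beta(2, 2, size=200)
X = sm.add_constant(x)

print(jackknife_se(y, X))
print(sm.OLS(y, X).fit(cov_type='HC3').bse)  # HC3 SEs, close to the jackknife SEs
```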
To put the question in broader terms: when would conventional OLS standard errors control the type I error rate under the null, while robust (HC3) standard errors give poor type I error control on the same data?
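In case a reproducible example helps, below is a minimal simulation sketch of the setup. The group sizes, beta parameters, and number of replications are invented and only meant to mimic the skew, imbalance, and heteroskedasticity described above, not my actual data.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n_rep = 2000
group_sizes = [170, 20, 10]            # heavily imbalanced groups, n = 200
x = np.repeat([0, 1, 2], group_sizes)

# Group-specific beta distributions: skewed means and very different variances
beta_params = [(18, 2), (2, 2), (0.5, 0.5)]

reject_ols = 0
reject_hc3 = 0
for _ in range(n_rep):
    # Heteroskedastic, beta-distributed outcomes per group...
    y = np.concatenate([rng.beta(a, b, size=m)
                        for (a, b), m in zip(beta_params, group_sizes)])
    # ...then shuffle y and x independently so there is no true association
    y_perm = rng.permutation(y)
    x_perm = rng.permutation(x)
    X = sm.add_constant(x_perm)
    reject_ols += sm.OLS(y_perm, X).fit().pvalues[1] < 0.05
    reject_hc3 += sm.OLS(y_perm, X).fit(cov_type='HC3').pvalues[1] < 0.05

print("Rejection rate at 0.05, analytical OLS SEs:", reject_ols / n_rep)
print("Rejection rate at 0.05, HC3 robust SEs:    ", reject_hc3 / n_rep)
```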