Appropriate Models for Non-Normal Distributions

Question

I want to model, separately, the relationships between two distinct but closely related dependent variables and a small set of independent variables.

My 'n' is quite low: 55, to be precise. It is, however, complete population data (though I do wish to generalise forward in time, so I am still concerned with statistical significance).

The distribution of my dependent variables is shown in the following graphs. I can tell they're not normally distributed, so I shouldn't be using OLS or any such equivalent, but I was wondering if anyone might help me pick an appropriate modelling technique to use to explain the variance?

The data are proportions, and are hierarchical (11 clusters). I do, however, also have the data in count form and have generated an offset. I have around 5 independent variables I'd like to include.

I've been looking into using generalised linear models, zero-inflation models, and so on, but can't settle on which to use.

I am using both R and Stata.

The distribution of your variables is irrelevant for OLS regression. What is important (for inference) is the distribution of the residuals. But if your DVs are proportions you should use a GLM, possibly a GLMM (I'm not sure what you mean by "11 clusters"). Maybe try beta regression. — Roland, Nov 23 '16 at 11:21
Thanks very much for your reply.
I was under the impression that non-normally distributed dependent variables shouldn't be used in OLS models - have I gotten that wrong?

By 11 clusters, I mean specifically that my 55 proportions are nested in 11 groups - principally, parliamentary seats within geographical regions. — Patrick English, Nov 23 '16 at 11:32
Yes, you have gotten that wrong. And due to your clusters you should use a mixed effects model. — Roland, Nov 23 '16 at 11:47
For a demonstration of Roland's point about bulk vs. residuals distribution see my answer here. — GeoMatt22, Dec 17 '16 at 18:59
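
A minimal sketch of the point made in the comments above, that the residuals matter for OLS inference rather than the marginal distribution of the dependent variable. It uses simulated data, not the asker's data: the sample size, the coefficients, and the helper names `skewness` and `ols_fit` are illustrative assumptions. A strongly skewed predictor produces a strongly skewed dependent variable, yet the residuals from the fitted line stay roughly symmetric because the errors were drawn from a normal distribution.

```python
import random
import statistics


def skewness(values):
    """Sample skewness (third standardized moment, population form)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def ols_fit(x, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


rng = random.Random(42)  # fixed seed so the run is reproducible
n = 5000                 # assumed size, chosen for a stable illustration

# Right-skewed predictor (exponential), so y inherits the skew.
x = [rng.expovariate(1.0) for _ in range(n)]
# Errors are normal by construction; coefficients are arbitrary.
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.5) for xi in x]

intercept, slope = ols_fit(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

skew_y = skewness(y)
skew_resid = skewness(residuals)

print(f"slope estimate:        {slope:.3f}")
print(f"skewness of y:         {skew_y:.3f}")
print(f"skewness of residuals: {skew_resid:.3f}")
```

This only illustrates the normality question. It does not address the other points raised in the comments: proportions are bounded, so a GLM or beta regression may still fit better, and the 11 groups call for a mixed effects model.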
