
I want to model, separately, the relationships between two distinct but closely related dependent variables and a small set of independent variables.

My n is quite low: 55, to be precise. It is complete population data, but I do wish to generalise forward in time, so I am still concerned with statistical significance.

The distributions of my dependent variables are shown in the following graphs. I can tell they're not normally distributed, so I believe I shouldn't be using OLS or anything equivalent. Could anyone help me pick an appropriate modelling technique to explain the variance?

The data are proportions and are hierarchical (11 clusters). However, I also have the data in count form and have generated an offset. I have around 5 independent variables I'd like to include.

I've been looking into using generalised linear models, zero-inflation models, and so on, but can't settle on which to use.

I am using both R and Stata.

Dependent Variable 1

Dependent Variable 2

  • The distribution of your variables is irrelevant for OLS regression. What is important (for inference) is the distribution of the residuals. But if your DVs are proportions you should use a GLM, possibly a GLMM (I'm not sure what you mean by "11 clusters"). Maybe try beta regression. – Roland Nov 23 '16 at 11:21
  • Thanks very much for your reply.

    I was under the impression that non-normally distributed dependent variables shouldn't be used in OLS models - have I gotten that wrong?

    By 11 clusters, I mean that my 55 proportions are nested in 11 groups: parliamentary seats within geographical regions.

    – Patrick English Nov 23 '16 at 11:32
  • Yes, you have gotten that wrong. And due to your clusters you should use a mixed effects model. – Roland Nov 23 '16 at 11:47
  • For a demonstration of Roland's point about bulk vs. residuals distribution see my answer here. – GeoMatt22 Dec 17 '16 at 18:59
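
Roland's point about the distribution of the dependent variable versus the distribution of the residuals can be illustrated with a small simulation. The sketch below is a hypothetical example in plain Python, not the data from the question: it assumes one binary predictor `x`, a true linear relationship with slope 5, and normal errors with standard deviation 0.5, all of which are made-up values chosen only for illustration. The helper names `excess_kurtosis` and `ols_fit` are likewise invented here. Under those assumptions the response `y` is clearly bimodal, yet the OLS residuals should look roughly normal.

```python
import random


def excess_kurtosis(values):
    """Sample excess kurtosis: near 0 for normal data, strongly negative for bimodal data."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0


def ols_fit(x, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Simulated data (assumed values, unrelated to the question's 55 observations).
rng = random.Random(42)
n = 5000
x = [rng.choice([0.0, 1.0]) for _ in range(n)]          # binary predictor
y = [1.0 + 5.0 * xi + rng.gauss(0.0, 0.5) for xi in x]  # bimodal response

intercept, slope = ols_fit(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

kurt_y = excess_kurtosis(y)
kurt_resid = excess_kurtosis(residuals)

print(f"estimated slope: {slope:.2f}")
print(f"excess kurtosis of y: {kurt_y:.2f}")
print(f"excess kurtosis of residuals: {kurt_resid:.2f}")
```

The marginal distribution of `y` is far from normal only because `x` shifts its mean; once the predictor is accounted for, what remains is the normal error term. This says nothing about whether OLS suits bounded proportions or clustered data, which is why the comments still point towards a GLM or a mixed model.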
