Statistically significant differences and relationships for binary variables

Question

I'm trying to determine whether there are significant differences between two populations across several attributes.

Population 1 = 81 observations

Population 2 = 621 observations

For both of these populations I have two variables:

v1 = rating: low, below, average, above, high

v2 = score: low, below, average, above, high

For each of these variables I have created binary indicators (1 = yes, 0 = no), so that, for example, observation 1 from population 1 has [0 0 0 1 0] --> v1 = above and [0 0 0 0 1] --> v2 = high. I have these indicators for all of the 702 observations. I now want to find significant relationships/differences both within each population and between the populations.

For example:

In population 1 there is a significant relationship between v1 = average and v2 = high.

And population 1 has a significantly higher v1 (high) than population 2. The problem here is that I have different population sizes, and I'm not sure how to handle the fact that the data aren't normally distributed.

Does anyone have any idea how to construct such a test? Or do I have too few observations to be able to say anything about the data at all?

Thanks in advance!

What do you mean that there is a "significant relationship between v1 = average and v2 = high", eg? Usually, we test to determine if there is a relationship between 2 variables. I'm not sure what it would mean for there to be a relationship b/t 1 level of 1 variable & 1 level of another. — gung - Reinstate Monica, Apr 05 '16 at 14:40
@gung Thanks for the response. What I mean is that I would like to check whether or not there is a relationship between having an average score in attribute v1 (quality) and attribute v2 (price). I understand I will have to run this for each and every one of the levels, i.e. checking the relationship between v1 = low and each of v2 = low/below/.../high separately. But I'm not sure which tests are most appropriate for this kind of data. — Kev1000, Apr 05 '16 at 15:28
I.e. if there is some sort of relationship, high quality always results in high price, then I would like to check this with significance. @gung — Kev1000, Apr 05 '16 at 15:39
I have read a bit into the subject and understand that, since the scales for v1 & v2 are actually ordered scores from low --> high, one should treat them as ordinal variables? What would be the difference if I still coded them as binary? I know that there are fewer cars rated as "high"/"low" than rated in between these two grades, so the intervals of the scale are not of equal length. — Kev1000, Apr 05 '16 at 18:32

score 1 · Answer 1 · answered Apr 05 '16 at 20:08

Sorry for the brief response, but I have to run soon and wanted to leave you some information because it has been a while since somebody has responded.

If I were you I would go with the ordinal approach you mentioned in your last comment. Look into the Cochran-Mantel-Haenszel class of test statistics, specifically the $M^2$ statistic, which tests for independence between two ordinal variables. If $M^2$ is significant, you can go on to look at marginal relationships to gain some insight as to which levels are related to others.
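To make the $M^2$ idea concrete, here is a minimal sketch in plain Python. It assumes the form of the statistic I believe Agresti uses for two ordinal variables, $M^2 = (n-1)r^2$, where $r$ is the correlation between row and column scores, compared against a chi-squared distribution with 1 degree of freedom. The equally spaced scores 1..5, the function names, and the toy table are my own assumptions for illustration, not something taken from your data.

```python
import math

LEVELS = ["low", "below", "average", "above", "high"]

def one_hot_to_index(indicators):
    """Map a one-hot row such as [0, 0, 0, 1, 0] to its level index (here 3)."""
    if sum(indicators) != 1:
        raise ValueError("expected exactly one 1 per row")
    return list(indicators).index(1)

def cross_tabulate(v1_rows, v2_rows, n_levels=len(LEVELS)):
    """Build an n_levels x n_levels table of counts from paired one-hot rows."""
    table = [[0] * n_levels for _ in range(n_levels)]
    for a, b in zip(v1_rows, v2_rows):
        table[one_hot_to_index(a)][one_hot_to_index(b)] += 1
    return table

def linear_by_linear_test(table, row_scores=None, col_scores=None):
    """Return (r, M^2, p-value) with M^2 = (n - 1) * r^2 on 1 degree of freedom.

    Scores default to 1, 2, ..., which is an assumption (equal spacing).
    """
    n_rows, n_cols = len(table), len(table[0])
    u = list(range(1, n_rows + 1)) if row_scores is None else list(row_scores)
    v = list(range(1, n_cols + 1)) if col_scores is None else list(col_scores)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    u_bar = sum(u[i] * row_tot[i] for i in range(n_rows)) / n
    v_bar = sum(v[j] * col_tot[j] for j in range(n_cols)) / n
    cov = sum((u[i] - u_bar) * (v[j] - v_bar) * table[i][j]
              for i in range(n_rows) for j in range(n_cols))
    var_u = sum((u[i] - u_bar) ** 2 * row_tot[i] for i in range(n_rows))
    var_v = sum((v[j] - v_bar) ** 2 * col_tot[j] for j in range(n_cols))
    if var_u == 0 or var_v == 0:
        raise ValueError("a variable with a single observed level has no correlation")
    r = cov / math.sqrt(var_u * var_v)
    m2 = (n - 1) * r * r
    # For 1 degree of freedom, P(chi2 > x) equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(m2 / 2))
    return r, m2, p_value

# Toy counts, invented purely for illustration (rows = v1, columns = v2).
toy_table = [
    [4, 2, 1, 0, 0],
    [2, 5, 3, 1, 0],
    [1, 3, 6, 3, 1],
    [0, 1, 3, 5, 2],
    [0, 0, 1, 2, 4],
]
r, m2, p_value = linear_by_linear_test(toy_table)
print(f"r = {r:.3f}, M^2 = {m2:.2f}, p = {p_value:.4g}")
```

You would run this once per population on the 5 x 5 table of v1 against v2. Since you noted that the levels are not equally spaced, it may be worth rerunning it with different `row_scores` and `col_scores` to see how sensitive the result is to that choice.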

If you have access to it, chapter 3 of Alan Agresti's Categorical Data Analysis book is a good reference for this.

http://www.amazon.com/Categorical-Data-Analysis-Alan-Agresti/dp/0470463635/ref=sr_1_1?ie=UTF8&qid=1459886523&sr=8-1&keywords=agresti+categorical+data+analysis

You're welcome. I hope it helps. — BazookaDave, Apr 06 '16 at 19:45
