What is the relationship between two variables that the Chi-Squared Test for Independence is testing for?

Question

I am trying to understand how the Chi-Squared test for independence between predictor and response variables applies to feature selection in machine learning and exploratory data analysis. I understand that there are a few types of Chi-Squared test for independence.

However, I do not understand what exactly the relationship is that the Chi-Squared test measures. Is it simply a measure of correlation between the distributions of categorical variables?

I would prefer an intuitive explanation over mathematical proof.

Thank you!

Just to be clear -- are you primarily interested in the 2x2 case or the general r $\times$ c case, or something else? If the r x c case, what do you mean by "correlation"? — Glen_b, Dec 19 '16 at 05:54
Correlation as a linear relationship between two variables. I am not sure what you mean by 2x2, I am interested in getting a general understanding. — verkter, Dec 19 '16 at 06:26
A chi-squared test for independence is conducted on data that falls into two (or more) categorical variables. How are you defining "linear" between things falling into categories? — Glen_b, Dec 19 '16 at 11:58
Linear is probably not correct. I assumed that this is the relationship that the Chi-Squared test is measuring. Looking at the definition of what the chi-squared test for independence does: "It is used to determine whether there is a significant association between the two variables." (http://stattrek.com/chi-square-test/independence.aspx?Tutorial=AP) What is this "significant association", actually? — verkter, Dec 20 '16 at 00:13
To return to the question about 2x2 vs r x c (since it impacts the possible ways of interpreting an idea of linear association)... how many categories do you have in each variable? — Glen_b, Dec 20 '16 at 00:16
I don't have a specific example for you. I am mostly interested in its application and use in feature selection, or in how variables can be related to each other. You can be very general and high level. I don't understand what you mean by "2x2" and "r x c". Thanks. — verkter, Dec 20 '16 at 00:27
Categorical variables are usually displayed in a table of counts. The first number in the product is the number of rows in the table (number of levels of the categorical row-variable) and the second number is the number of columns in the table. So if you have two binary variables in your chi-square, you display them as a 2x2 table, showing the counts in each combination. Go here, scroll down to "Finding Expected Counts from Observed Counts" and you'll see such a table (labelled "Observed table"). ... ctd — Glen_b, Dec 20 '16 at 01:02
ctd... In that case it has two rows and three columns (not counting headings or totals), and so is a 2 x 3 table. — Glen_b, Dec 20 '16 at 01:08
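
To make the table-of-counts idea from the comments concrete, here is a minimal sketch in plain Python. The counts are invented purely for illustration (they are not from the linked page), and the function names `expected_counts` and `chi_squared_statistic` are hypothetical helpers, not any library's API. It builds a 2 x 3 observed table, works out the counts one would expect if the row and column variables were independent (row total times column total, divided by the grand total), and sums the squared gaps between observed and expected counts, each scaled by the expected count.

```python
# Hypothetical 2 x 3 table of counts, invented for illustration only.
# Rows are the levels of one categorical variable, columns the levels of the other.
observed = [
    [20, 30, 50],
    [30, 30, 40],
]


def expected_counts(table):
    """Counts expected in each cell if the row and column variables were independent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Under independence: expected = row total * column total / grand total.
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_squared_statistic(table):
    """Sum over all cells of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


expected = expected_counts(observed)
statistic = chi_squared_statistic(observed)
# Degrees of freedom for an r x c table: (r - 1) * (c - 1).
dof = (len(observed) - 1) * (len(observed[0]) - 1)

print("expected counts:", expected)
print("chi-squared statistic:", round(statistic, 4), "with", dof, "degrees of freedom")
```

With these made-up counts the expected table is 25, 30, 45 in both rows, and the statistic comes out at about 3.11 on 2 degrees of freedom, which is below the usual 5% critical value of 5.991. Read this way, the statistic only says how far the observed counts sit from what independence would predict. It carries no direction or notion of linearity, which is one reason it does not behave like a correlation coefficient.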


0 Answers