I want to run a correlation analysis on two nominal columns, "advocates" and "company". In this case the advocates are processing the companies. The data looks like this:

Advocate Company
Adv 1 Comp A
Adv 1 Comp A
Adv 2 Comp C
Adv 3 Comp B
Adv 3 Comp B
Adv 2 Comp D
Adv 3 Comp E
Adv 1 Comp A

So, I want a calculation that shows, for every pair, whether there's a strong correlation between advocate X and company Y.

I tried to use Cramer's V method but I couldn't make it work properly.

The result I want to achieve is something similar to a correlation matrix of advocate vs company.

Thanks for any help!

utobi
ThiagoM
  • You cannot obtain a correlation matrix here since your variables are factors. You can only build a contingency table. – utobi May 19 '23 at 20:03
  • @utobi Many people loosely refer to any assessment of association in a contingency table as "correlation." That's actually justifiable theoretically, because almost every measure of association in $2\times 2$ tables is a form of correlation coefficient -- some are the standard Pearson or Spearman coefficients. – whuber May 19 '23 at 21:29
  • @whuber I see your point, but I remember my (now old) professor of Social Statistics telling us to keep the two concepts distinct, with 'correlation' being a superior form of association concerning numerical variables. But perhaps not everyone agrees with his view. – utobi May 26 '23 at 07:55
  • @utobi I agree with you. My comment was based on long experience with questions here on CV, where many (especially newcomers) have not been trained in statistics and frequently have no rigorous understanding of any statistical terms. We need to bear that in mind when interpreting questions. – whuber May 26 '23 at 11:26

1 Answer

Turn it into a contingency table:

$$\begin{array}{c|ccccc} & \text{Comp A} & \text{Comp B} & \text{Comp C} & \text{Comp D} & \text{Comp E} \\ \hline \text{Adv 1}&3&0&0&0&0\\ \text{Adv 2}&0&0&1&1&0\\ \text{Adv 3}&0&2&0&0&1\\ \end{array}$$

Then apply the usual treatment of contingency tables (see, e.g., "Most appropriate statistical test for count data (2x2 contingency)"). Which treatment is appropriate depends on several considerations about how the data were generated.
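As a rough illustration of one such treatment, here is a minimal Python sketch that rebuilds the table from the eight rows in the question, then computes the chi-square statistic, Cramér's V, and a Pearson residual for each advocate/company cell. Using the residuals as a "matrix" of per-pair association is my assumption about what the question is after, not something the linked thread prescribes, and the variable names are illustrative. With only eight observations the expected counts are far too small for chi-square p-values to be trusted, so treat the numbers as descriptive only.

```python
from collections import Counter
from math import sqrt

# The eight (advocate, company) rows from the question.
pairs = [
    ("Adv 1", "Comp A"), ("Adv 1", "Comp A"), ("Adv 2", "Comp C"),
    ("Adv 3", "Comp B"), ("Adv 3", "Comp B"), ("Adv 2", "Comp D"),
    ("Adv 3", "Comp E"), ("Adv 1", "Comp A"),
]

counts = Counter(pairs)                   # observed cell counts
row_tot = Counter(a for a, _ in pairs)    # rows handled per advocate
col_tot = Counter(c for _, c in pairs)    # rows per company
n = len(pairs)

chi2 = 0.0
residuals = {}  # Pearson residual per (advocate, company) cell
for a in sorted(row_tot):
    for c in sorted(col_tot):
        expected = row_tot[a] * col_tot[c] / n  # count expected under independence
        observed = counts[(a, c)]               # Counter returns 0 for absent pairs
        residuals[(a, c)] = (observed - expected) / sqrt(expected)
        chi2 += (observed - expected) ** 2 / expected

# Cramér's V: one overall measure of association for the whole table.
k = min(len(row_tot), len(col_tot))
cramers_v = sqrt(chi2 / (n * (k - 1)))

print(f"chi2 = {chi2:.3f}, Cramér's V = {cramers_v:.3f}")
for (a, c), r in residuals.items():
    print(f"{a} / {c}: residual {r:+.2f}")
```

A positive residual means the pair occurs more often than independence would predict, and a negative one means less often. Cramér's V summarizes the whole table in a single number, so by itself it cannot give a per-pair matrix.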