I'm trying to calculate a correlation matrix for ordinal variables in R. Kendall's rank correlation coefficient seems a good option, as it "is a statistic used to measure the ordinal association between two measured quantities" (emphasis added).
Since my variables have different numbers of ordinal levels, I'm planning to use Stuart-Kendall Tau-c, which accounts for ties and for the unequal table dimensions when calculating the coefficient: "Tau-c (also called Stuart-Kendall Tau-c) is more suitable than Tau-b for the analysis of data based on non-square (i.e. rectangular) contingency tables."
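For reference, the standard definition of Stuart's Tau-c for an r × c contingency table of n observations, with P concordant pairs and Q discordant pairs (pairs tied on either variable count as neither), is:

$$
\tau_c = \frac{2m\,(P - Q)}{n^2\,(m - 1)}, \qquad m = \min(r, c)
$$

The denominator depends only on n and on the smaller table dimension m, not on the row or column totals.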
The R package DescTools has a function StuartTauC which calculates "Stuart's Tau-c statistic, a measure of association for ordinal factors in a two-way table."
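To see what StuartTauC computes, here is a from-scratch base-R sketch of the same formula, 2m(P − Q) / (n²(m − 1)) with m = min(r, c). The name tau_c_manual is hypothetical; it is only meant for checking small examples, and it assumes both inputs are factors whose levels are already in ordinal order (as the ordered factors in diamonds are).

```r
# Hypothetical from-scratch Stuart's Tau-c for two ordinal factors,
# intended only for checking small examples against DescTools::StuartTauC.
tau_c_manual <- function(x, y) {
  tab <- table(x, y)                                # r x c table, rows/cols in level order
  tab <- matrix(as.numeric(tab), nrow = nrow(tab))  # plain double matrix (avoids integer overflow)
  nr <- nrow(tab)
  nc <- ncol(tab)
  n <- sum(tab)
  m <- min(nr, nc)                                  # assumes m >= 2
  P <- 0  # concordant pairs
  Q <- 0  # discordant pairs
  for (i in seq_len(nr)) {
    for (j in seq_len(nc)) {
      if (tab[i, j] == 0) next
      # cells below and to the right are higher on both variables: concordant
      if (i < nr && j < nc) P <- P + tab[i, j] * sum(tab[(i + 1):nr, (j + 1):nc])
      # cells below and to the left are higher on x but lower on y: discordant
      if (i < nr && j > 1) Q <- Q + tab[i, j] * sum(tab[(i + 1):nr, 1:(j - 1)])
    }
  }
  # pairs sharing a row or a column (ties) were never counted
  2 * m * (P - Q) / (n^2 * (m - 1))
}
```

Each unordered pair of observations is counted once, from the cell that comes first in row order. On the ordered factors from diamonds, tau_c_manual(diamonds$cut, diamonds$color) is expected to agree with StuartTauC(diamonds$cut, diamonds$color).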
As an example, I will use three ordinal variables from the diamonds dataset in ggplot2:
# Import "diamonds" dataset from ggplot2
library( ggplot2 )
head( diamonds[2:4] )
# A tibble: 6 x 3
  cut       color clarity
  <ord>     <ord> <ord>
1 Ideal     E     SI2
2 Premium   E     SI1
3 Good      E     VS1
4 Premium   I     VS2
5 Good      J     SI2
6 Very Good J     VVS2
My implementation in R is as follows (I'm open to better implementations for calculating the matrix):
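library( DescTools )
df <- diamonds[2:4]
# Empty square matrix with one row and one column per variable
cor_matrix <- matrix(
  nrow = ncol( df ),
  ncol = ncol( df )
)
rownames( cor_matrix ) <- names( df )
colnames( cor_matrix ) <- names( df )
# Fill each cell with Stuart's Tau-c for the corresponding pair of variables
for( row in seq_len( ncol( df ) ) ){
  for( col in seq_len( ncol( df ) ) ){
    cor_matrix[row, col] <- StuartTauC( df[[row]], df[[col]] )
  }
}
cor_matrix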
Result:
                cut       color   clarity
cut      0.89458402 -0.01356334 0.1464609
color   -0.01356334  0.97953628 0.0232527
clarity  0.14646089  0.02325270 0.9405563
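On the request for a better implementation: Tau-c is symmetric in its two arguments (swapping them only transposes the contingency table, which leaves the counts of concordant and discordant pairs and min(r, c) unchanged), and the result above is indeed symmetric. The double loop therefore computes every off-diagonal value twice. A minimal sketch of a generic helper, here called pairwise_matrix (a hypothetical name, not a DescTools function), that evaluates only the upper triangle and mirrors it:

```r
# Hypothetical helper: apply a pairwise measure fun(x, y) to every pair of
# columns of df. Assumes fun is symmetric, i.e. fun(x, y) == fun(y, x).
pairwise_matrix <- function(df, fun) {
  k <- ncol(df)
  out <- matrix(NA_real_, nrow = k, ncol = k,
                dimnames = list(names(df), names(df)))
  for (i in seq_len(k)) {
    for (j in i:k) {                 # upper triangle, including the diagonal
      val <- fun(df[[i]], df[[j]])
      out[i, j] <- val
      out[j, i] <- val               # mirror into the lower triangle
    }
  }
  out
}
```

With ggplot2 and DescTools loaded as in the question, pairwise_matrix(diamonds[2:4], StuartTauC) should return the same matrix as the loop while calling StuartTauC six times instead of nine. The mirroring step would be wrong for an asymmetric measure such as Somers' d.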
My question is: shouldn't the diagonal values be 1, or is this a feature of the Tau-c statistic (or of the StuartTauC function)?
A = factor(c("Low", "Low", "Low", "High", "High", "High"))
B = factor(c("Low", "Low", "Low", "High", "High"))
StuartTauC(A, A)
StuartTauC(B, B)
– Sal Mangiafico Sep 12 '22 at 16:59
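Working the Tau-c formula through for a factor paired with itself explains the comment's two examples. The table is diagonal: every pair of observations from different levels is concordant and no pair is discordant. With k levels holding counts n_1, ..., n_k (so m = k), the formula 2m(P − Q)/(n²(m − 1)) reduces to k(n² − Σ n_i²)/(n²(k − 1)). Since Σ n_i² ≥ n²/k, this is at most 1, with equality only when every level has the same count. This reduction is derived here from the formula, not taken from the DescTools documentation, and the helper name self_tau_c below is hypothetical:

```r
# Hypothetical helper: Stuart's Tau-c of a factor with itself, computed only
# from the level counts n_i (assumes at least two levels).
self_tau_c <- function(x) {
  counts <- as.numeric(table(x))  # n_i per level; unused levels count as 0
  k <- length(counts)             # number of levels = min(r, c) of table(x, x)
  n <- sum(counts)
  k * (n^2 - sum(counts^2)) / (n^2 * (k - 1))
}

A <- factor(c("Low", "Low", "Low", "High", "High", "High"))  # balanced: 3 vs 3
B <- factor(c("Low", "Low", "Low", "High", "High"))          # unbalanced: 3 vs 2

self_tau_c(A)  # 1
self_tau_c(B)  # 0.96
```

Applied to diamonds$cut, whose five levels are far from equally frequent, the same expression gives about 0.8946, matching the cut/cut entry of the matrix above.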