
I am studying lexical semantics. I have 65 pairs of synonyms with their sense relatedness. The dataset is derived from the paper:

Rubenstein, Herbert, and John B. Goodenough. "Contextual correlates of synonymy." Communications of the ACM 8.10 (1965): 627-633.

I extract sentences containing those synonyms, convert the neighbouring words appearing in those sentences into vectors, calculate the cosine distance between the vectors, and finally compute the Pearson correlation between the sense relatedness given by Rubenstein and Goodenough and our distances.

Suppose, for example, that the Pearson correlation is 0.79 for Method 1 and 0.78 for Method 2. How do I test whether Method 1 is significantly better than Method 2?

  • You can try the R package cocor, which provides statistical tests for comparing correlation coefficients. The cocor.dep.groups.overlap() function looks like it fits your case. – Leo Yu Sep 20 '21 at 07:34

1 Answer


Remember that there is statistical significance and there is practical significance. The latter is difficult to assess algorithmically; it usually requires thought and subject-matter or context-specific expertise. The former can be assessed via a test comparing the strength of correlation coefficients using Fisher's Z transformation.
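As a rough illustration, here is a minimal Python sketch of the textbook Fisher Z test for two correlation coefficients, using the figures from the question (r = 0.79 and r = 0.78, n = 65 pairs). The function name `fisher_z_test` is made up for this example. Note the assumption: this version treats the two correlations as coming from independent samples, which is a simplification here, since both methods are scored against the same 65 pairs. A test for dependent, overlapping correlations, such as the cocor function mentioned in the comments, would also need the correlation between the two methods' distances.

```python
import math
from statistics import NormalDist


def fisher_z_test(r1, n1, r2, n2):
    """Two-sided Fisher Z test for the difference between two Pearson
    correlations. ASSUMPTION: the two samples are independent."""
    # Fisher's transformation: z = atanh(r) is approximately normal
    # with variance 1 / (n - 3).
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    # Standard error of the difference of the two transformed values.
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z_stat = (z1 - z2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z_stat)))
    return z_stat, p_value


# Values taken from the question: both methods evaluated on 65 word pairs.
z_stat, p_value = fisher_z_test(0.79, 65, 0.78, 65)
print(f"z = {z_stat:.3f}, p = {p_value:.3f}")
```

With correlations this close and only 65 pairs, the test statistic is small and the p-value is far above the usual 0.05 threshold, so under these assumptions the difference would not count as statistically significant.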

rolando2
  • Good point. Suppose the context is manufacturing and an alternative process is proposed where manually changing machines costs hundreds of thousands of dollars. A 0.2% increase in efficiency might be statistically significant yet not worth the investment, because it could take decades for ROI to be realized at such a small marginal increase in productivity. – jbuddy_13 Mar 27 '24 at 14:00