This question might be silly, but how do I know which category (value) within a feature is significant?

Say I have exam result data for students from four different countries: [image of the data table, not reproduced here]

How do I know which country (A, B, C or D) is significant for a student passing? That is, can I say that students from countries A, B and C are more likely to pass, while students from country D are more likely to fail? Or, because countries B and C have so much less data, can I only draw conclusions for countries A and D? How do I determine the threshold for this?

One idea I have is to run a chi-square test on each country and see which countries are significant (at a confidence level of, say, 95%). However, this idea does not seem to work if one country has significantly more data than the others, since the expected values will skew towards the distribution of the country with more data.
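To make the per-country idea concrete, here is a minimal sketch that compares each country against all the others pooled together as a 2x2 table. The counts are made up for illustration (the original table is an image and is not reproduced here), and `one_vs_rest_chi2` is a hypothetical helper name, not a library function.

```python
import math

# Hypothetical (pass, fail) counts per country; NOT the data from the question.
counts = {"A": (480, 320), "B": (18, 2), "C": (20, 10), "D": (300, 500)}

def one_vs_rest_chi2(counts, country):
    """Pearson chi-square test (1 df, no continuity correction): one country vs all others."""
    a, b = counts[country]  # pass, fail inside the country
    c = sum(p for k, (p, f) in counts.items() if k != country)  # pass elsewhere
    d = sum(f for k, (p, f) in counts.items() if k != country)  # fail elsewhere
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row = [a + b, c + d]
    col = [a + c, b + d]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence comes from the pooled margins.
            expected = row[i] * col[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

for country in counts:
    stat, p = one_vs_rest_chi2(counts, country)
    print(f"{country}: chi2 = {stat:.2f}, p = {p:.4g}")
```

Since the expected counts are built from the pooled margins, the largest country does dominate them, which is the concern raised above. Running one test per country also means several tests at once, so a multiple-comparison adjustment (Bonferroni, for example) may be worth considering.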

Any thoughts will be appreciated.


Edit: I am asking because my next step is to find out which country's students have more influence on pass/fail in my data. I will be computing some sort of "influence measure" for each country. From my very limited statistics knowledge, I guess my null hypothesis will be that country has no influence, i.e. independence between country and the pass/fail result?
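For the null hypothesis of independence between country and result, one possible starting point is a single chi-square test on the whole 4x2 table rather than one test per country. The sketch below uses made-up counts (the real table is in an image), and `chi2_independence` and `chi2_sf_3df` are hypothetical helper names; the tail formula is the closed form that holds only for 3 degrees of freedom.

```python
import math

# Hypothetical (pass, fail) counts per country; NOT the data from the question.
counts = {"A": (480, 320), "B": (18, 2), "C": (20, 10), "D": (300, 500)}

def chi2_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    stat = sum(
        (obs - rt * ct / n) ** 2 / (rt * ct / n)
        for r, rt in zip(table, row_totals)
        for obs, ct in zip(r, col_totals)
    )
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof

def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

table = [list(pf) for pf in counts.values()]
stat, dof = chi2_independence(table)
assert dof == 3  # four countries, two outcomes
print(f"chi2 = {stat:.2f}, dof = {dof}, p = {chi2_sf_3df(stat):.4g}")
```

A small p-value here only says that pass rates differ somewhere among the countries; it does not say which country is responsible, so a follow-up comparison per country would still be needed.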

(For reference, the measure I want to compute is shown below, using country A as an example. It tells me how many times more likely a student from country A is to pass than a student from the other countries. Based on the data above, we can already see that country B will have a much higher "influence" on "pass" in my data than the other countries. But country B has much less data. How can we test whether country B is significant or not?)

[image of the influence measure formula for country A, not reproduced here]
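The formula itself is in an image, so the following is only a guess at its shape: it assumes the measure is the ratio of the pass rate in one country to the pass rate in all other countries (a relative risk). Under that assumption, a confidence interval on the log scale gives a rough sense of how much the ratio could move with so little data. The counts are made up, and `relative_risk` is a hypothetical helper name.

```python
import math

# Hypothetical (pass, fail) counts per country; NOT the data from the question.
counts = {"A": (480, 320), "B": (18, 2), "C": (20, 10), "D": (300, 500)}

def relative_risk(a, b, c, d, z=1.96):
    """Ratio of pass rates a/(a+b) vs c/(c+d) with an approximate 95% interval.

    Uses the usual large-sample standard error of log(RR); all four cells must be > 0.
    """
    rr = (a / (a + b)) / (c / (c + d))
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

for country, (a, b) in counts.items():
    # Pool every other country into the comparison group.
    c = sum(p for k, (p, f) in counts.items() if k != country)
    d = sum(f for k, (p, f) in counts.items() if k != country)
    rr, lower, upper = relative_risk(a, b, c, d)
    print(f"{country}: RR = {rr:.2f}, approx. 95% CI = ({lower:.2f}, {upper:.2f})")
```

An interval that excludes 1 suggests the pass rate in that country differs from the rest. The width of the interval depends on the cell counts, so a country with few students is handled through a wider interval instead of a separate data threshold. The approximation is rough when cells are very small.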

J.Cheuk
  • Before performing the statistical tests, the null hypothesis / alternative hypothesis are needed, i.e., what do you want to test? – user158565 May 03 '19 at 02:56
  • Hi @user158565. Thanks for answering. Please see the edit above. – J.Cheuk May 03 '19 at 09:10
  • For "Null Hypothesis will be that country has no influence i.e. independence between a country and the pass/fail result", Fisher's exact test is a good one. – user158565 May 04 '19 at 21:01
  • You seem to be asking about how to interpret a significant chi-square of a contingency table. Search this site, there are many relevant posts, some: https://stats.stackexchange.com/questions/147721/which-is-the-best-visualization-for-contingency-tables/577033#577033 https://stats.stackexchange.com/questions/157597/how-to-interpret-a-two-dimensional-contingency-table/157972#157972 – kjetil b halvorsen Jul 17 '23 at 22:56
  • I'd strongly urge you to avoid the term "significant" in your question. Sometimes it means an effect (difference or ratio) is "big enough to matter". Sometimes it means the p-value is smaller than a preset threshold. Because of the ambiguity, it often leads to confusing questions. – Harvey Motulsky Jul 18 '23 at 00:54
  • @HarveyMotulsky I once got a referee report saying that “significant” in that journal should always be accompanied by an $\alpha$-level or changed to a word like “marked” to indicate practical significance. – Dave Jul 18 '23 at 01:07
  • @Dave. Sure, with careful wording there is no ambiguity. But use of the word "significant" is often a sign of muddled thinking in beginners, so I think it is best to avoid that word entirely. – Harvey Motulsky Jul 18 '23 at 14:12
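As a sketch of the Fisher's exact test suggested in the comments, here is a 2x2 version (one country vs all others pooled) written with the standard library only. It is a simplification: the comment may have meant the test on the full country-by-result table. The counts are made up, and `fisher_exact_2x2` is a hypothetical helper name.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Add up every table that is at least as unlikely as the observed one
    # (small relative tolerance guards against floating-point ties).
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)

# Hypothetical counts: country B (18 pass, 2 fail) vs all other countries (800 pass, 830 fail).
p_value = fisher_exact_2x2(18, 2, 800, 830)
print(f"Fisher exact p-value for B vs rest: {p_value:.4g}")
```

Because the test is exact, it does not rely on large expected counts, which is why it is often suggested when one group is small.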
