Determine performance in which subject improves overall performance

Question

I have a dataset in .csv format as shown:

NRC_CLASS,L1_MARKS_FINAL,L2_MARKS_FINAL,L3_MARKS_FINAL,S1_MARKS_FINAL,S2_MARKS_FINAL,S3_MARKS_FINAL,
FAIL,7,12,12,24,4,30,
PASS,49,36,46,51,31,56,
FAIL,59,35,42,18,18,45,
PASS,61,30,51,33,30,52,
PASS,68,30,35,53,45,54,
2,82,77,75,32,36,56,
FAIL,18,35,35,32,21,35,
2,86,56,46,44,37,60,
1,94,45,62,70,50,59,

The first column gives the overall grade:

FAIL - Fail
PASS - Pass class
1 - First class
2 - Second class
D - Distinction

This is followed by each student's marks in six subjects.

Is there any way I can find out which subject's performance makes a difference in the overall outcome?

I am using Weka and have used J48 to build a tree.

/* UPDATE */

The summary of the J48 classifier is:

=== Summary ===

Correctly Classified Instances       30503               92.5371 %
Incorrectly Classified Instances      2460                7.4629 %
Kappa statistic                          0.902 
Mean absolute error                      0.0332
Root mean squared error                  0.1667
Relative absolute error                 10.8867 %
Root relative squared error             42.7055 %
Total Number of Instances            32963

I also discretized the marks into 10 bins with useEqualFrequency set to true. The J48 summary is now:

=== Summary ===

Correctly Classified Instances       28457               86.3301 %
Incorrectly Classified Instances      4506               13.6699 %
Kappa statistic                          0.8205
Mean absolute error                      0.0742
Root mean squared error                  0.2085
Relative absolute error                 24.3328 %
Root relative squared error             53.4264 %
Total Number of Instances            32963
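
For readers unfamiliar with equal-frequency discretization, the following Python sketch shows the general idea: cut points are placed at quantiles so that each of the 10 bins receives roughly the same number of students. It is an illustration only, not Weka's implementation; the helper names and the `marks` column are hypothetical, and Weka's Discretize filter may place its cut points slightly differently, especially when many students share the same mark.

```python
import bisect
import statistics
from collections import Counter


def equal_frequency_cuts(values, n_bins=10):
    """Cut points that split `values` into n_bins groups of roughly equal size."""
    return statistics.quantiles(values, n=n_bins, method="inclusive")


def discretize(values, cuts):
    """Map each value to the index of the bin it falls in (0 .. len(cuts))."""
    return [bisect.bisect_left(cuts, v) for v in values]


# Hypothetical marks column (the integers 1..100), used only for illustration.
marks = list(range(1, 101))
cuts = equal_frequency_cuts(marks)
bins = discretize(marks, cuts)
bin_counts = Counter(bins)
print(sorted(bin_counts.items()))
```

After binning, the tree can only split at bin boundaries rather than at arbitrary marks, which is one plausible reason for the lower accuracy in the second summary.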

Would linear modelling (e.g. using lm in R) not work to find which subject's marks make a difference in the overall outcome? The outcome could be converted to an ordinal variable (e.g. 0-4). — rnso, Nov 17 '14 at 17:44
Read up on how decision trees are built, and what they can tell you... how is the first attribute chosen? — Has QUIT--Anony-Mousse, Nov 18 '14 at 06:48
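
To make the comment about the first attribute concrete, here is a minimal Python sketch (not Weka's code) that scores each subject column by the information gain of its best single threshold split, which is roughly the criterion a tree learner uses to pick its root. It runs on the nine sample rows from the question. The function names are made up for illustration, and J48 itself uses the gain ratio with further corrections, so treat the numbers as indicative only.

```python
import csv
import io
import math
from collections import Counter

# The nine sample rows from the question (trailing commas dropped).
SAMPLE = """\
NRC_CLASS,L1_MARKS_FINAL,L2_MARKS_FINAL,L3_MARKS_FINAL,S1_MARKS_FINAL,S2_MARKS_FINAL,S3_MARKS_FINAL
FAIL,7,12,12,24,4,30
PASS,49,36,46,51,31,56
FAIL,59,35,42,18,18,45
PASS,61,30,51,33,30,52
PASS,68,30,35,53,45,54
2,82,77,75,32,36,56
FAIL,18,35,35,32,21,35
2,86,56,46,44,37,60
1,94,45,62,70,50,59
"""


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_split(values, labels):
    """Return (gain, threshold) for the best binary split `value <= threshold`."""
    base = entropy(labels)
    distinct = sorted(set(values))
    best = (0.0, None)
    # Candidate thresholds are midpoints between consecutive distinct marks.
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        gain = base - remainder
        if gain > best[0]:
            best = (gain, threshold)
    return best


def rank_subjects(csv_text, target="NRC_CLASS"):
    """Rank subject columns by the information gain of their best split."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    labels = [row[target] for row in rows]
    subjects = [name for name in rows[0] if name != target]
    ranking = []
    for name in subjects:
        gain, threshold = best_split([float(row[name]) for row in rows], labels)
        ranking.append((name, gain, threshold))
    return sorted(ranking, key=lambda item: item[1], reverse=True)


for name, gain, threshold in rank_subjects(SAMPLE):
    print(f"{name}: gain={gain:.3f} at <= {threshold}")
```

On these nine rows several subjects tie for the best gain, so the ranking only becomes meaningful on the full dataset. There, the attribute at the root of the J48 tree is the one the learner found most informative by its own criterion.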

score 2 · Answer 1 · answered Jan 04 '24 at 18:47

You seem to have an ordinal response variable. The ordering of the classes is not clear to me (I don't know this grading system), but you will know it. One modern approach (probably less well known when you wrote the question 9 years ago) is ordinal regression. Try perusing the tags ordinal-data and ordered-logit, and search this site; there are many good posts.
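
As a sketch of what such a model looks like, here is the usual proportional-odds (cumulative logit) form. The class ordering FAIL < PASS < 2 < 1 < D, coded as j = 0, ..., 4, is an assumption on my part, and x_1, ..., x_6 stand for the six subject marks.

```latex
\operatorname{logit} P(Y \le j \mid x)
  = \theta_j - \sum_{k=1}^{6} \beta_k x_k ,
\qquad j = 0, 1, 2, 3,
\qquad \theta_0 < \theta_1 < \theta_2 < \theta_3 .
```

Under this parametrization a positive \beta_k means that higher marks in subject k shift probability toward the higher grade classes, and \exp(\beta_k) is the odds ratio per additional mark. If the six subjects are marked on the same scale, comparing the estimated \beta_k (with their standard errors) indicates which subject's marks matter most for the overall outcome.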
