I am using the R caret package to build a random forest classifier for plant data.

The dataset has 7 variables, all numeric, which are used to predict whether a plant will "grow" or "not grow".

This is a very simple model.

In my training dataset, 70% of observations are classified as "grow" and 30% as "not grow".

I have trained the model on this data and obtained an accuracy of 93% and a kappa of 86%.

My question is about McNemar's test: I have a p-value of 0.8231, which I understand usually means I reject the null hypothesis.

I have tried to research this test, and it appears to be about whether a proportion changes before and after an event.

Would I be correct in saying this has something to do with the imbalanced class proportions in my dependent variable?

Could anyone interpret this p-value?

Thank you

Will.S89

1 Answer

First, a p-value of 0.83 means you do not reject the null.

Second, McNemar's test is about whether the row and column marginals are equal or, equivalently, whether the two "off-diagonal" elements are equal. Since your p-value is quite high, you cannot reject the null hypothesis that they are equal. It's not clear from your question what was in the four cells of the cross-tabulation.
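To make the "off-diagonal" point concrete, here is a minimal sketch of the textbook McNemar chi-squared statistic with continuity correction, written in plain Python. The counts in the usage lines are hypothetical, since the question does not give the confusion matrix, and the function name `mcnemar_test` is my own. As far as I know, caret's `confusionMatrix` gets its value from R's `mcnemar.test`, so treat this as an illustration of the idea rather than a reproduction of that output.

```python
import math


def mcnemar_test(b, c, correct=True):
    """McNemar chi-squared test from the two off-diagonal counts of a 2x2 table.

    b, c    : counts in the two "error" cells (e.g. top right and bottom left)
    correct : apply the usual continuity correction of 1
    Returns (statistic, p_value), using a chi-squared distribution with 1 df.
    """
    if b + c == 0:
        raise ValueError("no off-diagonal counts; the test is undefined")
    adjustment = 1 if correct else 0
    statistic = (abs(b - c) - adjustment) ** 2 / (b + c)
    # Upper tail of chi-squared with 1 df: P(X > s) = erfc(sqrt(s / 2))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: 5 errors of one kind, 3 of the other (nearly balanced)
stat_balanced, p_balanced = mcnemar_test(5, 3)

# Hypothetical counts: 20 errors of one kind, 5 of the other (lopsided)
stat_lopsided, p_lopsided = mcnemar_test(20, 5)

print(stat_balanced, p_balanced)
print(stat_lopsided, p_lopsided)
```

Note that only the two error cells enter the statistic. The diagonal cells, and therefore the 70/30 class split by itself, do not appear in the formula. A high p-value just says the model does not make one kind of error noticeably more often than the other.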

Peter Flom
  • Thank you for your reply. I really do appreciate it.

    The confusion matrix is a simple one. It's a 2x2 matrix consisting of the true and false classifications for "grown" and "not grown".

    – Will.S89 May 20 '18 at 23:37
  • So, there are two cells that are "errors" (top right and bottom left). McNemar's test checks whether they are equal. Since p is high, you cannot reject the null that they are equal. – Peter Flom May 20 '18 at 23:39
  • You're welcome. The usual thing to do on this site when an answer meets your needs is to "accept" it. – Peter Flom May 21 '18 at 11:18