
I ran LDA on the Iris Species data in both SPSS and R, but the scalings are different. Why?

SPSS results:

Canonical Discriminant Function Coefficients        
                  Function  
                   1      2
SepalLengthCm    -.819   .033
SepalWidthCm    -1.548  2.155
PetalLengthCm    2.185  -.930
PetalWidthCm     2.854  2.806
(Constant)      -2.119 -6.639

R results:

Coefficients of linear discriminants:
                LD1 LD2
Sepal.Length  0.8293776  0.02410215
Sepal.Width   1.5344731  2.16452123
Petal.Length -2.2012117 -0.93192121
Petal.Width  -2.8104603  2.83918785

I know that the signs of the discriminant coefficients are arbitrary (just a matter of coding), but even after allowing for sign, the magnitudes differ by roughly 0.01 in most cases.
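A quick check that sign alone does not explain the gap: for each discriminant function, flip the sign of the R column wherever that brings it closer to SPSS, then take the largest remaining difference. Below is a minimal Python sketch using the coefficients from the two tables above. The helper name `sign_aligned_diffs` is hypothetical.

```python
def sign_aligned_diffs(a, b):
    """For each discriminant function (column), choose the sign s in {+1, -1}
    that minimises max |a - s*b| and return (s, that maximum) per function."""
    results = []
    for col_a, col_b in zip(zip(*a), zip(*b)):  # iterate over columns
        best = None
        for s in (1, -1):
            d = max(abs(x - s * y) for x, y in zip(col_a, col_b))
            if best is None or d < best[1]:
                best = (s, d)
        results.append(best)
    return results

# Rows: SepalLength, SepalWidth, PetalLength, PetalWidth; columns: LD1, LD2.
spss = [[-0.819, 0.033], [-1.548, 2.155], [2.185, -0.930], [2.854, 2.806]]
r = [[0.8293776, 0.02410215], [1.5344731, 2.16452123],
     [-2.2012117, -0.93192121], [-2.8104603, 2.83918785]]

for name, (sign, max_diff) in zip(("LD1", "LD2"), sign_aligned_diffs(spss, r)):
    print(name, "sign", sign, "max abs diff", round(max_diff, 4))
```

LD1 needs a sign flip and LD2 does not. The largest remaining differences are about 0.044 and 0.033. Both are far above the 0.0005 that SPSS's three-decimal rounding could account for.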

Does anyone know which estimators SPSS and R use to solve LDA?
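Both programs report raw (unstandardized) canonical discriminant coefficients. Below is a minimal NumPy sketch of that computation. It makes two assumptions:

- **Scaling:** it uses the convention documented for R's MASS::lda, in which the pooled within-group covariance of the scores is the identity. The thread reports that SPSS's unstandardized coefficients match R's on the same data, which suggests SPSS uses the same convention.
- **Constant:** the constant places the grand mean at score zero.

`canonical_lda` is a hypothetical helper, not the actual code of either program. Column signs come out arbitrary, just as they do between SPSS and R.

```python
import numpy as np

def canonical_lda(X, y):
    """Raw canonical discriminant coefficients (p x d) and constants (d,).

    Assumed scaling: pooled within-group covariance of the scores (divisor
    n - g) is the identity; constants centre the grand mean at zero.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    groups = np.unique(y)
    n, p = X.shape
    g = len(groups)
    grand_mean = X.mean(axis=0)

    # Pooled within-group covariance W and between-group scatter B.
    W = np.zeros((p, p))
    B = np.zeros((p, p))
    for k in groups:
        Xk = X[y == k]
        centred = Xk - Xk.mean(axis=0)
        W += centred.T @ centred
        m = (Xk.mean(axis=0) - grand_mean)[:, None]
        B += len(Xk) * (m @ m.T)
    W /= n - g

    # Whiten with the Cholesky factor of W (W = L L^T), then solve a
    # symmetric eigenproblem; this is equivalent to eig(W^-1 B).
    L = np.linalg.cholesky(W)
    Linv = np.linalg.inv(L)
    evals, evecs = np.linalg.eigh(Linv @ B @ Linv.T)
    order = np.argsort(evals)[::-1][: min(p, g - 1)]  # largest first

    coef = Linv.T @ evecs[:, order]  # so that coef.T @ W @ coef = I
    const = -grand_mean @ coef       # scores average to zero overall
    return coef, const
```

Any implementation that follows this convention should agree up to sign when given identical data. If the numbers still differ after sign alignment, the inputs differ.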

  • Possibly this http://stats.stackexchange.com/q/166942/3277 is a duplicate question? Check it. – ttnphns Nov 05 '16 at 08:40
  • @ttnphns - I had already checked that question, but there it is clear that the only difference is that SPSS rounds to 3 decimal places while R doesn't. In my example some of the values differ by 0.01 or 0.02, so I need to check whether the two programs use different estimators. – Annalise Azzopardi Nov 05 '16 at 08:51
  • In R, there are several packages that do LDA. Which one are you using? Also, in the link above there is a comment with a link to an example of LDA with the iris data. Did you read it and try to reproduce the example results? – ttnphns Nov 05 '16 at 09:05
  • Yes, I have seen the LDA example with the iris data, and my R results are exactly the same as in it, BUT in SPSS the coefficients are different. My concern is whether R and SPSS use different estimators. – Annalise Azzopardi Nov 05 '16 at 09:52
  • In the example, the principal results (canonical correlations, discriminant coefficients) were the same in SPSS and in R's lda. And in your question, the "R results" for the iris data are identical to the SPSS "Unstandardized coefficients" output. Where did you get your "SPSS results" from? Check your data: could there be missing values? Post your SPSS command syntax, after all. – ttnphns Nov 05 '16 at 10:19
  • Some of the values under "R results" and "SPSS results" in my question differ by 0.01 or 0.02; does that really not make a difference? The syntax is below. In SPSS, the menu path is Analyze > Classify > Discriminant > Statistics (tick Unstandardized). – Annalise Azzopardi Nov 05 '16 at 10:53
  • DISCRIMINANT /GROUPS=Species(1 3) /VARIABLES=SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm /ANALYSIS ALL /PRIORS EQUAL /STATISTICS=RAW /CLASSIFY=NONMISSING POOLED. – Annalise Azzopardi Nov 05 '16 at 10:55
  • This syntax just gave me the correct results (the "R results"). There must be something wrong with your data in SPSS (missing values? case filtering? wrong values in the data?). I don't think your SPSS has a bug (what version are you using?). Take the iris data from http://stats.stackexchange.com/q/82497/3277 and test again. – ttnphns Nov 05 '16 at 11:08
  • @ttnphns Is the Iris dataset built into SPSS? I know at least one place where it is reproduced with two errors (http://archive.ics.uci.edu/ml/datasets/Iris). The errors are acknowledged right there on the description page but will not be corrected. Confusingly, the developers of the scikit-learn package in Python took the Iris data from there, so the Python copy is wrong, and any analysis run on it will produce slightly different results. This was pointed out to me once by usεr11852. – amoeba Nov 05 '16 at 11:24
  • @amoeba, SPSS has no iris data included. I used the data I linked just above, which in turn was taken from Wikipedia. I recommend that the OP test on that dataset. – ttnphns Nov 05 '16 at 11:34
  • @ttnphns I can confirm that when LDA is done in Python using Python's version of the Iris data (taken from the UCI website), the results are identical to the OP's results in SPSS (example: http://sebastianraschka.com/Articles/2014_python_lda.html). So this must be the issue: the OP most likely downloaded the data from the UCI repository. If Annalise confirms this, it would be useful to write it up as an answer for future reference. – amoeba Nov 05 '16 at 12:40
  • I did use the dataset from the UCI website in the question above. With the Wikipedia data that ttnphns used, the R and SPSS results are now exactly the same :). Thank you for your help. – Annalise Azzopardi Nov 05 '16 at 13:02
  • SAS' Proc Discrim examples leverage the Fisher iris data heavily. Why not do a further check with the available online SAS output? https://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_discrim_sect025.htm – user78229 Nov 05 '16 at 13:40

1 Answer


R and SPSS gave me different results because I used different data in each. For the SPSS analysis I used the UCI version of the iris data (downloaded from https://www.kaggle.com/uciml/iris), while for the R analysis I took the data from Wikipedia. The two versions differ in a few values, and the Wikipedia version is considered the correct, original one. Hence the results also differ.
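When the same analysis gives different numbers in two tools, diffing the inputs cell by cell is a quick way to confirm a data mismatch. Below is a minimal Python sketch. It assumes both versions are already loaded as lists of numeric rows in the same order, with the label column dropped. The name `diff_cells` is hypothetical.

```python
def diff_cells(a, b, tol=1e-9):
    """Return (row, column, value_a, value_b) for each cell that differs.

    Rows and columns are 1-based, matching the "35th sample ... fourth
    feature" wording of the UCI note. Both inputs must have the same
    number of rows.
    """
    if len(a) != len(b):
        raise ValueError(f"row counts differ: {len(a)} vs {len(b)}")
    out = []
    for i, (row_a, row_b) in enumerate(zip(a, b), start=1):
        for j, (va, vb) in enumerate(zip(row_a, row_b), start=1):
            if abs(va - vb) > tol:
                out.append((i, j, va, vb))
    return out
```

Run on the UCI and Wikipedia copies, this should list only the cells named in the UCI note quoted below.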

See http://archive.ics.uci.edu/ml/datasets/Iris:

This data differs from the data presented in Fishers article [...]. The 35th sample should be: 4.9,3.1,1.5,0.2,"Iris-setosa" where the error is in the fourth feature. The 38th sample: 4.9,3.6,1.4,0.1,"Iris-setosa" where the errors are in the second and third features.
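If only the UCI copy is at hand, the two documented errors can be patched before running LDA. Below is a minimal Python sketch. It assumes the rows are in UCI order as `[sepal_length, sepal_width, petal_length, petal_width, label]`. The corrected values are exactly those quoted from the UCI page above. The names `FISHER_CORRECTIONS` and `apply_fisher_corrections` are hypothetical.

```python
# Corrections quoted on the UCI Iris page, keyed by 1-based sample number.
FISHER_CORRECTIONS = {
    35: (4.9, 3.1, 1.5, 0.2),  # error is in the fourth feature
    38: (4.9, 3.6, 1.4, 0.1),  # errors are in the second and third features
}

def apply_fisher_corrections(rows):
    """Return a corrected copy of the 150 UCI iris rows; the input is not modified."""
    if len(rows) != 150:
        raise ValueError(f"expected 150 rows, got {len(rows)}")
    fixed = [list(r) for r in rows]
    for sample, values in FISHER_CORRECTIONS.items():
        row = fixed[sample - 1]
        if row[4] != "Iris-setosa":
            raise ValueError(f"sample {sample} is not Iris-setosa; check row order")
        row[:4] = values  # replace the four numeric features, keep the label
    return fixed
```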

All credit should go to @amoeba, as can be seen in the comments below.

  • @amoeba Since you provided the correct answer in a comment, you should post it officially as an answer so that Annalise Azzopardi can delete her own answer and accept yours. This would not only give you proper credit (which may not matter much, given all the reputation you have earned) but also show everyone searching that there is an accepted answer. – Bernhard Nov 05 '16 at 13:40
  • @amoeba - Sorry for any inconvenience caused; I am new to this... all credit goes to you. – Annalise Azzopardi Nov 05 '16 at 13:43
  • @Bernhard: I am actually happy if Annalise accepts her own answer, which I already upvoted! :-) No problem with me at all. I was only pedantically suggesting that she gives a proper credit where it's due. – amoeba Nov 05 '16 at 13:44
  • @amoeba - Thanks once again :) I will mention you in the answer to make things clear :). – Annalise Azzopardi Nov 05 '16 at 13:48