I want to use the iris dataset provided by scikit-learn for a paper. But I don't know what the standard for referencing datasets is. What citation should I use for this dataset in my paper? Should I reference scikit-learn? Ronald Fisher for having introduced the dataset? Edgar Anderson for having collected the data? All of the above?
9
2 Answers
6
I would cite both papers (Anderson, 1936; Fisher, 1936), but I would not cite scikit-learn as the source of the dataset: the dataset is simply bundled with the library and is not unique to it (for example, the same iris dataset is also bundled with the R environment). That said, if you use scikit-learn, you should certainly cite it as well, just not on account of the dataset.
Aleksandr Blekh
- It is worth mentioning that Fisher's 1936 paper, "The use of multiple measurements in taxonomic problems", was published in the Annals of Eugenics. Although this paper is commonly cited for the methods employed in creating the Iris dataset, it is important to be aware of potential sensitivities that may arise, especially in academic settings where politically frustrated lecturers might be present. – Florian Fasmeyer Jun 23 '23 at 14:11
- More context: when discussing Iris, we are talking about flowers, not the human eye. – Florian Fasmeyer Jun 23 '23 at 14:13
-3
I think that citing scikit-learn is sufficient. According to the scikit-learn documentation, you should cite their paper. You can always add a reference to the Iris dataset in scikit-learn by providing a link to the page.
EDIT - I stand corrected. The accepted answer is spot on.
martino
- Part of the purpose of a citation is to allow others to find the same data source, and a link would serve this purpose very well. But another purpose of citation is to provide academic credit, which in this case belongs less to the developers and more to the researchers who collated (and perhaps popularised) the data set. It's in this second aspect that I find this answer less convincing. – Silverfish Nov 27 '14 at 07:28
- scikit-learn, as the dataset is simply bundled with the library, but is not unique to it (for example, the same iris dataset is bundled with R environment, as well). – Aleksandr Blekh Oct 13 '14 at 13:58
- scikit-learn certainly has to be cited, if used. However, the OP's question was in regard to citing the iris dataset, which calls for an independent citation. This is because the dataset is an independent entity, which is included in many software packages and is not unique to scikit-learn. (By the way, it wasn't me who downvoted your answer, in case you are curious.) – Aleksandr Blekh Oct 14 '14 at 04:44