I want to determine whether my assumption that the dataset I'm using is i.i.d. is in fact valid (for an arbitrary dataset, perhaps made of images). I have done quite a bit of research already and looked through various papers on Google Scholar, but haven't found what I'm looking for. Is there any half-decent measure of the independence of a dataset? Even if I can't guarantee independence, are there any common modern methods used to test for this assumption?
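For concreteness, here is a minimal sketch of one such check, under the assumption that each sample can be reduced to a scalar summary (e.g. an image's mean intensity) and that the suspected dependence is *ordering* (samples collected close together being similar). It is a permutation test on lag-1 autocorrelation: it can detect this one specific violation of independence, not independence in general. All function names here are illustrative, not from any particular library.

```python
import numpy as np

def lag1_autocorr(x):
    """Lag-1 sample autocorrelation of a 1-D array."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

def ordering_permutation_test(x, n_perm=2000, seed=0):
    """Permutation p-value for serial dependence along the dataset order.

    Under the null (exchangeable, hence order-independent, samples),
    shuffling the order should not change the autocorrelation much.
    """
    rng = np.random.default_rng(seed)
    observed = abs(lag1_autocorr(x))
    count = 0
    for _ in range(n_perm):
        if abs(lag1_autocorr(rng.permutation(x))) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)  # add-one smoothing avoids p = 0

# Illustration: a random walk is strongly order-dependent,
# while white noise is approximately i.i.d.
rng = np.random.default_rng(1)
trend = np.cumsum(rng.normal(size=300))   # dependent samples
noise = rng.normal(size=300)              # independent samples
p_trend = ordering_permutation_test(trend)  # expect a very small p-value
p_noise = ordering_permutation_test(noise)  # typically not small
print(p_trend, p_noise)
```

A rejection here only tells you the samples are not exchangeable with respect to their collection order; a non-rejection does not certify i.i.d.-ness, since dependence can hide in structure this scalar summary ignores.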
- Since you bring up independence and identically distributed, you're working with the idea that the samples, using your example of images, are randomly produced, correct? – A rural reader Nov 02 '23 at 00:01
- Welcome to Cross Validated! You want to know about the independence of what? – Dave Nov 03 '23 at 15:34
- i.i.d. is usually treated as a theoretical assumption which rarely, if ever, holds with empirical data. I, too, would be interested in a test of this assumption. – user78229 Nov 03 '23 at 15:44
- Why do you want to determine whether your dataset meets the i.i.d. assumption? What will you do if it does, and what will you do if it doesn't? One way an image dataset could violate this assumption is if the images were taken by different people, with each person taking multiple similar pictures. Or if the pictures were taken in some order over time, and pictures taken nearby in time are more highly correlated. In these cases you may want to modify your train/test strategy. – Adrian Nov 03 '23 at 15:47
- If my comment above feels relevant, you may be interested in https://stats.stackexchange.com/questions/564063/references-on-data-partitioning-cross-validation-train-val-test-set-constructi – Adrian Nov 03 '23 at 15:49
- You will need to tell us some more context and details for this to be answerable. As it stands, it is too abstract. – kjetil b halvorsen Nov 10 '23 at 14:40