Before I start, go easy on me: I am not a stats person, I am just on a quest for information :-)
I was tasked with generating many graphs for someone who wanted to see whether the data showed a bell-shaped curve or a graph with more than one peak. That was no biggie, but silly me, I thought it might be easier to run the data through a stats package like R that would flag the populations that need to be looked at with a graph. The quest has led me to read about the Shapiro-Wilk test, the Anderson–Darling test, etc., but various people with more knowledge than I have still pointed to graphing the data as the only way to verify a normal distribution.
So my question is: is there any statistical method I can run the following data through to flag it for further investigation, instead of graphing everything?
Instead of counts of patients by day, should I just look at the full list of DaysInHospital values?
Thanks
The data is plotted as one graph per disease, with:
x-axis - Days in Hospital
y-axis - Cnt of Patients
DiseasePopulation #1
( DaysinHospital,Cnt of Patients) = {(1,1),(2,1),(3,3),(4,6),(5,2),(6,1),(7,1),(8,1)}
DiseasePopulation #2
( DaysinHospital,Cnt of Patients) = {(1,1),(2,1),(3,1),(4,2),(5,6),(6,3),(7,1),(8,1)}
If you just used the raw observation list 'x' with only the DaysinHospital values you should be able to easily apply suitable functions - e.g. 'shapiro.test(x)' from '{stats}' or 'dip(x, ...)' from '{diptest}'.
– Jan 14 '14 at 16:28
'qqplot' to compare with a standard normal dist? – Carl Witthoft Jan 14 '14 at 16:31
'shapiro.test' in R just as easily. – Jan 14 '14 at 16:37
'DaysinHospital' is not a continuous variable. A discrete distribution should be used. – Roland Jan 14 '14 at 16:37
'diptest::dip'), but "is it normal?" is the vocabulary they're working with – Ben Bolker Jan 14 '14 at 16:51
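For anyone who wants to try the raw-observation suggestion on the two populations in the question, here is a minimal sketch in R. The variable names ('days', 'cnt1', 'x1', and so on) are just illustrative and not from the original post. It expands the (DaysinHospital, Cnt of Patients) pairs into one value per patient with 'rep()', then runs 'shapiro.test()' from '{stats}'. The dip test part only runs if the '{diptest}' package happens to be installed. Keep in mind that Shapiro-Wilk checks normality while the dip test checks for a single peak, and that whole-day counts are discrete, so treat the results as a flag for a closer look rather than a verdict.

```r
# Counts of patients per length of stay, copied from the question.
days <- 1:8
cnt1 <- c(1, 1, 3, 6, 2, 1, 1, 1)  # DiseasePopulation #1
cnt2 <- c(1, 1, 1, 2, 6, 3, 1, 1)  # DiseasePopulation #2

# Expand each (day, count) pair into one observation per patient,
# e.g. (3, 3) becomes 3, 3, 3.
x1 <- rep(days, times = cnt1)
x2 <- rep(days, times = cnt2)

# Shapiro-Wilk test from base R; a small p-value suggests the
# sample is unlikely to come from a normal distribution.
sw1 <- shapiro.test(x1)
sw2 <- shapiro.test(x2)
print(sw1)
print(sw2)

# Hartigan's dip test for unimodality, run only when the optional
# diptest package is available (an assumption about your setup).
if (requireNamespace("diptest", quietly = TRUE)) {
  print(diptest::dip.test(x1))
  print(diptest::dip.test(x2))
}
```

To screen many diseases at once, the same steps could be wrapped in a function and applied per population, keeping only the p-values to decide which graphs are worth opening.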