Data unable to meet assumptions for ANOVA, chi-square, or Fisher's exact test

Question

This is for a student-selected class project. I am examining incidents in which wildlife strikes an aircraft over the last three decades. The data were selected at random from the population data (N = 392,341) in Excel using the Data Analysis sampling tool. I was originally planning to run an ANOVA until I realized my data are frequencies. I recalled that I could use a chi-square test, but again several cells are < 5, and I have more groups than the 2x2 table I believed Fisher's exact test was limited to. What would be the best way to proceed?
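The usual rule of thumb for the chi-square test (Cochran's) concerns *expected* counts rather than observed ones: no more than about 20% of cells with an expected count below 5, and none below 1. A quick way to see how badly a table violates this is sketched below. It assumes Python with NumPy and SciPy, and the table is entirely invented for illustration (rows are strike-count categories, columns are the three decades); the two middle-to-upper categories are hypothetical placeholders, not the poster's actual categories.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Invented counts for illustration only: rows are strike-count categories,
# columns are decades. Not the poster's data.
table = np.array([
    [180, 175, 190],  # 1 strike
    [40, 38, 45],     # 2-10 strikes
    [3, 2, 4],        # hypothetical higher category A
    [1, 0, 2],        # hypothetical higher category B
    [15, 20, 14],     # Unknown
])

def sparse_cell_report(observed):
    """Return expected counts, the fraction of cells with expected < 5,
    and whether any expected count is below 1 (Cochran-style check)."""
    _, _, _, expected = chi2_contingency(observed)
    frac_below_5 = float(np.mean(expected < 5))
    any_below_1 = bool(np.any(expected < 1))
    return expected, frac_below_5, any_below_1

expected, frac_below_5, any_below_1 = sparse_cell_report(table)
print(np.round(expected, 2))
print(f"cells with expected < 5: {frac_below_5:.0%}; any expected < 1: {any_below_1}")
```

Sparse rows (here the two made-up upper categories) are what push the fraction over the threshold, which is why collapsing or dropping categories, or switching to a simulated p-value, are the usual options.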

Should I remove the unknown incidents and run another simple random selection? Even then, I could still end up with cells < 5.

An a priori power analysis was done initially for ANOVA: n = 729, alpha = 0.001.
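The post does not say which effect size or target power the a priori analysis used, so the sketch below cannot reproduce n = 729. It only illustrates how a one-way ANOVA power calculation of this kind works, using the noncentral F distribution with noncentrality f²·N (Cohen's f, total sample size N). The three groups (one per decade), f = 0.25 and the 0.95 target power are assumptions chosen for illustration; Python with SciPy is assumed.

```python
from scipy import stats

def anova_power(n_total, k_groups, effect_f, alpha):
    """Power of a one-way fixed-effects ANOVA via the noncentral F distribution."""
    df1 = k_groups - 1
    df2 = n_total - k_groups
    noncentrality = effect_f ** 2 * n_total
    f_crit = stats.f.ppf(1 - alpha, df1, df2)  # rejection threshold under H0
    return stats.ncf.sf(f_crit, df1, df2, noncentrality)

def required_n(k_groups, effect_f, alpha, target_power):
    """Smallest total N (searched one at a time) whose power reaches the target."""
    n = k_groups + 2
    while anova_power(n, k_groups, effect_f, alpha) < target_power:
        n += 1
    return n

# Assumed inputs: 3 decades, Cohen's f = 0.25, alpha = 0.001, power = 0.95.
print(anova_power(729, 3, 0.25, 0.001))
print(required_n(3, 0.25, 0.001, 0.95))
```

Note that this calculation is for ANOVA on a continuous outcome; once the analysis becomes a test of association on counts, a power analysis for that test would be the relevant one.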

My first question is: why sample from the data? Why not analyze all 400,000 observations? Second, are you thinking about Decade as an independent variable? And if so, would you want to treat the decades as groups, as in ANOVA? — Sal Mangiafico, Dec 16 '22 at 14:38
@SalMangiafico, I am not able to use the population for this assignment; it's strictly raw sample data, and the professor needs to be able to run the test themselves if needed. Right now I believe the dependent variable is wildlife strikes, and the independent variable will be the safety policies and Bird/Wildlife Aircraft Strike Hazard Programs implemented over the last three decades. — Logan Innes, Dec 17 '22 at 02:11
Please add the "self-study" tag to your post and, if you would, mention that this is for a class assignment (if it is). See also: stats.stackexchange.com/tags/self-study. — Sal Mangiafico, Dec 18 '22 at 02:47
Here's my opinion. How you approach the analysis for this problem depends on how you see the question practically. 1) You could do as you suggest and use a chi-square test of association. Probably about 40% of the cells in your table have counts < 5. Because of this, a standard chi-square test may not be the best approach. In this case, I would probably recommend computing the p-value by Monte Carlo simulation, which is easy in some software packages. — Sal Mangiafico, Dec 18 '22 at 16:52
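A Monte Carlo chi-square test keeps the row and column totals fixed, repeatedly shuffles which decade each observation belongs to, and asks how often a shuffled table gives a chi-square statistic at least as large as the observed one. No large-sample approximation is needed, so small expected counts are not a problem. In R this is `chisq.test(x, simulate.p.value = TRUE)` (default `B = 2000`). The Python sketch below is an assumed, hand-rolled equivalent using NumPy, run on an invented table that does not come from the poster's data.

```python
import numpy as np

def chi2_stat(observed):
    """Pearson chi-square statistic for a contingency table."""
    observed = np.asarray(observed, dtype=float)
    row = observed.sum(axis=1, keepdims=True)
    col = observed.sum(axis=0, keepdims=True)
    expected = row * col / observed.sum()
    return float(((observed - expected) ** 2 / expected).sum())

def monte_carlo_chi2(observed, n_sim=2000, seed=0):
    """Chi-square test of association with a permutation (Monte Carlo) p-value.
    Assumes no row or column of the table sums to zero."""
    observed = np.asarray(observed, dtype=int)
    n_rows, n_cols = observed.shape
    rng = np.random.default_rng(seed)
    counts = observed.ravel()
    # One (row label, column label) pair per individual observation.
    row_labels = np.repeat(np.repeat(np.arange(n_rows), n_cols), counts)
    col_labels = np.repeat(np.tile(np.arange(n_cols), n_rows), counts)
    obs_stat = chi2_stat(observed)
    exceed = 0
    for _ in range(n_sim):
        # Shuffling column labels keeps both sets of margins fixed.
        shuffled = rng.permutation(col_labels)
        sim = np.bincount(row_labels * n_cols + shuffled,
                          minlength=n_rows * n_cols).reshape(n_rows, n_cols)
        # Small tolerance so ties with the observed statistic are counted.
        if chi2_stat(sim) >= obs_stat * (1 - 1e-9):
            exceed += 1
    return obs_stat, (exceed + 1) / (n_sim + 1)

# Invented counts: rows = strike-count categories (incl. Unknown), columns = decades.
table = np.array([[180, 175, 190], [40, 38, 45], [3, 2, 4], [1, 0, 2], [15, 20, 14]])
stat, p = monte_carlo_chi2(table, n_sim=2000, seed=1)
print(f"chi-square = {stat:.2f}, Monte Carlo p = {p:.4f}")
```

The p-value uses the (exceed + 1) / (n_sim + 1) form so it can never be exactly zero; increasing `n_sim` reduces its simulation error.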
Of course, how you choose to approach the problem depends on the context of the course. — Sal Mangiafico, Dec 18 '22 at 16:59
Practically speaking, I would look at the proportions for the cells within each column. That is, for every decade, the 1 Strike category accounts for > 70% of the observations and 2-10 Strikes for > 15%, leaving few observations for the higher strike categories. ... And then what do you say about Unknown? — Sal Mangiafico, Dec 18 '22 at 17:25
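Column proportions are simply each cell divided by its column (decade) total, so every column sums to 1 and decades of different sizes can be compared directly. A minimal NumPy sketch follows; the counts are invented, chosen only so the pattern resembles the one described in the comment, and the optional `drop_rows` argument (a hypothetical helper parameter) shows the same proportions with the Unknown row excluded.

```python
import numpy as np

# Invented counts: rows are strike-count categories, columns are decades.
row_labels = ["1 Strike", "2-10 Strikes", "higher category A (hypothetical)",
              "higher category B (hypothetical)", "Unknown"]
table = np.array([[180, 175, 190], [40, 38, 45], [3, 2, 4], [1, 0, 2], [15, 20, 14]])

def column_proportions(observed, drop_rows=()):
    """Proportion of each row category within each column.
    Rows whose indices are in drop_rows are removed before normalizing."""
    observed = np.asarray(observed, dtype=float)
    dropped = set(drop_rows)
    keep = [i for i in range(observed.shape[0]) if i not in dropped]
    sub = observed[keep]
    return sub / sub.sum(axis=0, keepdims=True)

with_unknown = column_proportions(table)
without_unknown = column_proportions(table, drop_rows=(4,))  # index 4 = Unknown
for label, row in zip(row_labels, with_unknown):
    print(f"{label:34s}", np.round(row, 3))
```

Comparing `with_unknown` and `without_unknown` shows how much the conclusions depend on how the Unknown category is handled.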

