
I haven't done hypothesis testing in a long time, so please bear with me. I am trying to understand whether my results are statistically relevant, both per category and for the values aggregated across categories, and to find a 95% confidence interval.

I have different categories of inventory that were shipped, and part of the inventory in each category was damaged. How do I know whether this damage is significant, and whether the damaged sample is big enough to draw a conclusion from? I am not sure how to proceed, as there is only one value per category, and I don't know how to run a test on a single number or how to use the approaches described in this post.

EDIT: I am adding more data as requested in the comments. I want to test whether there is a significant difference between Damaged_A % and Damaged_B %, first at the category level and then in total. A and B stand for the different types of packaging in which similar inventory was sent.

Category  Shipped_A  Damaged_A  Damaged_A %  Shipped_B  Damaged_B  Damaged_B %
Books          4722         65        1.38%       3400         45         1.4%
Kitchen       53832        129        0.24%      27534        340         1.2%
Total         58554        194         1.3%      30934        385         1.2%
lucija
  • Statistical tests require a hypothesis to test. You didn't specify that, so it is not clear what you want to test. Neither is it clear what you want to have a confidence interval for, which applies to statistical parameters, but you haven't specified which. There is no such thing as "statistical relevance" and the term "significant" doesn't apply to the data per se without specification of a hypothesis. – Christian Hennig Apr 19 '23 at 16:36
  • Welcome to CV, lucija. In addition to the information requested by @Christian, could you explain what your table means, and in particular why the total value of "Damaged" is not the sum of the individual values? – whuber Apr 19 '23 at 16:44
  • How many categories are there in reality? And how many A/B's? Is the main focus comparing, e.g., A to B *within*, e.g., Books? Would you want to totally ignore Kitchen when investigating Books? – Sal Mangiafico Apr 19 '23 at 17:57
  • I would compare A vs. B within each category and then at the Total level. – lucija Apr 19 '23 at 21:17
  • The term "significant" has two distinct meanings so can tend to get in the way of clear thinking about data analysis. Please rephrase the question(s) without this ambiguous word. – Harvey Motulsky Apr 19 '23 at 23:51

1 Answer


If the goal is to compare A vs. B for one category only, the hypothesis test is straightforward: use either a chi-square test of association or a test of proportions.

Note that the table for a chi-square test uses Damaged / Not-damaged, whereas the test of proportions, here, uses Damaged / Shipped.

### Books
Damaged = c(65, 45)
Shipped = c(4722, 3400)

prop.test(Damaged, Shipped, correct=FALSE)

    2-sample test for equality of proportions without continuity correction

data:  Damaged out of Shipped
X-squared = 0.04157, df = 1, p-value = 0.8384
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.004549325  0.005609444
sample estimates:
    prop 1     prop 2
0.01376535 0.01323529

### Books, as a 2 x 2 table: rows are packaging A and B, columns are Damaged and Not-damaged
Books = matrix(c(65, 4657, 45, 3355), nrow=2, byrow=TRUE)

chisq.test(Books, correct=FALSE)

    Pearson's Chi-squared test

data:  Books
X-squared = 0.04157, df = 1, p-value = 0.8384
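The two calls report the same X-squared because, for a 2 x 2 table without continuity correction, the Pearson chi-square statistic is the square of the pooled two-proportion z statistic. As a worked sketch for Books (the symbols $x_i$, $n_i$, $\hat p_i$ and $\hat p$ are notation introduced here, not part of the R output):

$$
\hat p_1 = \frac{x_1}{n_1} = \frac{65}{4722} \approx 0.01377, \qquad
\hat p_2 = \frac{x_2}{n_2} = \frac{45}{3400} \approx 0.01324, \qquad
\hat p = \frac{x_1 + x_2}{n_1 + n_2} = \frac{110}{8122} \approx 0.01354
$$

$$
z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p\,(1 - \hat p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \approx \frac{0.000530}{0.002600} \approx 0.204, \qquad
z^2 \approx 0.0416 \approx X^2
$$

The confidence interval printed by prop.test uses the unpooled standard error instead, $\sqrt{\hat p_1(1-\hat p_1)/n_1 + \hat p_2(1-\hat p_2)/n_2} \approx 0.00259$, which gives $0.00053 \pm 1.96 \times 0.00259$ and matches the interval shown above.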

The same tests can be run on the totals.
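As a minimal sketch of the totals comparison, using the Total row of the question's table (the object name total_test is just an illustrative choice):

```r
### Totals across both categories, from the Total row of the table
Damaged = c(194, 385)       # damaged units for packaging A and B
Shipped = c(58554, 30934)   # shipped units for packaging A and B

total_test = prop.test(Damaged, Shipped, correct=FALSE)

total_test$estimate   # damaged proportion for A and for B
total_test$conf.int   # 95% confidence interval for the difference (A minus B)
total_test$p.value
```

Keep in mind that the totals pool two categories with quite different damage rates and different A/B shipping volumes, so the pooled comparison is dominated by Kitchen and may not describe Books.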

There are more sophisticated approaches, namely logistic regression, which could include all the data simultaneously, and then allow for investigating each category.
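One way this could look is a binomial logistic regression on the grouped counts, sketched below under the assumption that the two categories in the table are all the data. The data frame name inv and the column NotDamaged are hypothetical names introduced for illustration.

```r
### Data from the question's table, one row per category x packaging cell
inv = data.frame(
  Category  = factor(c("Books", "Books", "Kitchen", "Kitchen")),
  Packaging = factor(c("A", "B", "A", "B")),
  Damaged   = c(65, 45, 129, 340),
  Shipped   = c(4722, 3400, 53832, 27534)
)
inv$NotDamaged = inv$Shipped - inv$Damaged

### Logistic regression on grouped counts; the interaction term
### lets the A-vs-B effect differ between categories
fit = glm(cbind(Damaged, NotDamaged) ~ Packaging * Category,
          family = binomial, data = inv)
summary(fit)

### Sequential likelihood-ratio tests for each term
anova(fit, test = "Chisq")

### Odds ratio of damage for B vs. A within Books (the reference category)
exp(coef(fit)["PackagingB"])
```

With only two categories and two packaging types, this model is saturated (four cells, four coefficients), so it reproduces the observed proportions exactly. The approach becomes more useful when there are more categories than shown here.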

Also, be sure to present the proportions, as you have done. With relatively large sample sizes, you might find significant differences that don't have practical importance because the difference in proportions is relatively small.
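To make the size of each difference visible alongside the p-values, the proportions can be tabulated with a simple difference and ratio. This is only a sketch; the names effect, diff_pts and risk_ratio are illustrative.

```r
### Counts per category, from the question's table
Damaged_A = c(Books = 65, Kitchen = 129)
Shipped_A = c(Books = 4722, Kitchen = 53832)
Damaged_B = c(Books = 45, Kitchen = 340)
Shipped_B = c(Books = 3400, Kitchen = 27534)

p_A = Damaged_A / Shipped_A
p_B = Damaged_B / Shipped_B

### Effect sizes: difference in percentage points and ratio of proportions
effect = data.frame(
  pct_A      = round(100 * p_A, 2),
  pct_B      = round(100 * p_B, 2),
  diff_pts   = round(100 * (p_B - p_A), 2),
  risk_ratio = round(p_B / p_A, 2)
)
effect
```

Whether a given difference in percentage points matters in practice is a business judgment about the cost of damage, not something the test decides.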

Sal Mangiafico