details of my dataset

summary(lm(visits ~ health1 + age, data = Medicaid1986))

But it gives this output:

enter image description here

health1: The first principal component (divided by 1000) of three health-status variables: functional limitations, acute conditions, and chronic conditions.


  • Agree that in principle it seems unlikely that the number of visits increases with better health, but (1) the relationship, although significant at conventional levels, is very weak, and (2) we can tell you nothing about your data on health status. You might be better off telling us more about that variable or plotting number of visits versus health status. – Nick Cox Apr 19 '22 at 09:50
  • Hi @Daisy. With your ordinal 'health' variable, do high values represent better or worse health? Moreover, is there a potential bias in the health variable? For example, are the persons surveyed in a hospital setting? – EB3112 Apr 19 '22 at 10:55
  • You should consider whether your results are a by-product of outliers – Nick Cox Apr 19 '22 at 10:55
  • I wouldn't mush together health status like that but use the original health variables. The sign of the (scores of the) first PC is arbitrary in any case, as negated PCs would have yielded equally good solutions to the underlying eigenvector-eigenvalue problem. – Nick Cox Apr 19 '22 at 10:57
  • You have already estimated a relationship between health status and doctor visits. What you need is to see that relationship. Draw a scatterplot matrix of the variables involved: age, health status, and doctor visits. (Put the visits on a square root scale--it will show more detail.) Inspect some of your data if that's not clear enough. – whuber Apr 19 '22 at 17:04
  • A main point about square roots is that they allow zeros, as zero is the square root of zero. – Nick Cox Apr 19 '22 at 17:26
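
The plotting advice in the comments above could be sketched roughly as follows. This is only an illustration on simulated data: the data frame `dat` and the way `visits`, `health1`, and `age` are generated here are assumptions, not the asker's dataset.

```r
# Hypothetical data standing in for visits, health1, and age.
set.seed(2)
n <- 300
age     <- runif(n, 18, 90)
health1 <- rnorm(n)
visits  <- rpois(n, lambda = exp(0.3 + 0.2 * health1))  # counts, zeros allowed

# Put visits on a square root scale; sqrt(0) is 0, so zero counts are kept.
dat <- data.frame(sqrt_visits = sqrt(visits), health1 = health1, age = age)

# Scatterplot matrix with a smoother in each panel.
pairs(dat, panel = panel.smooth)
```

With the real data, the same call would use the actual columns in place of the simulated ones.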

1 Answer


You appear to be proceeding on a false premise about your variable

Based on the description of the dataset in the package, it is far from clear that a higher value of health1 represents better health. This variable is stated to be "[t]he first principal component (divided by 1000) of three health-status variables: functional limitations, acute conditions, and chronic conditions". All of these underlying health-status variables appear to represent worse health, and the process of taking a principal component would give some linear combination of these health-status variables. Unless the principal component involves negative weighting on each of the underlying health-status variables, it does not unambiguously represent better health.
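
One way to check the direction of a principal component is to look at its loadings. The sketch below is illustrative only: it uses simulated data, and the names `limits`, `acute`, and `chronic` are hypothetical stand-ins for the three health-status variables, not the columns actually used to construct health1.

```r
# Simulated stand-ins for three "worse health" measures (hypothetical data).
set.seed(1)
n <- 500
frailty <- rnorm(n)                        # latent "poor health" factor
limits  <- frailty + rnorm(n, sd = 0.5)
acute   <- frailty + rnorm(n, sd = 0.5)
chronic <- frailty + rnorm(n, sd = 0.5)
X <- cbind(limits, acute, chronic)

pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Loadings of the first component. With positively correlated inputs they
# share one sign, but whether that sign is + or - is arbitrary.
load1 <- pca$rotation[, 1]
print(load1)

# Negating the scores gives an equally valid component with the same variance.
score1  <- pca$x[, 1]
flipped <- -score1
print(c(var(score1), var(flipped)))

# Orient the score so that higher values mean worse health, then confirm
# by correlating it with the original variables.
worse_health <- sign(sum(load1)) * score1
print(cor(worse_health, X))
```

If the loadings on the real variables are all positive, higher scores indicate worse health; if they are all negative, the reverse holds.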

Ben
  • @AslıKaya No, we can't say that unless we inspect the relationship of PC1 and the original variables, a.k.a. component loadings. – Nick Cox Apr 19 '22 at 17:28