I have the following answers to the question: How many times a day do you smoke?
- 0 times
- 3 times
- 4 - 6 times
- 7 - 10 times
- 10+ times
Can I use these as ordinal variables, or do I have to use them as numeric? They likely aren't nominal?
If the only options we consider are from Stevens' typology of level of measurement then:
Clearly those groupings can be treated as categories.
Clearly the sequence of categories is in order.
Ergo, it's clearly possible to treat them as ordered categorical.
However, there is information about numerical values there. Specifically, those categories are intervals of values. This is, strictly speaking, an interval-valued* random variable -- indeed there's an underlying variable (the actual daily count) which we're observing as interval-censored (well, the first category is a single value, but we can encompass that as an interval whose endpoints coincide).
* beware! "interval-valued" is NOT the same thing as "interval-scale" on Stevens' typology; the typology isn't really adequate and would force you to either treat it as ordered-categorical or to construct values to use for a ratio scale (0 is a real 0 and the underlying scale from which these groups are composed is discrete but ratio-scale). While ordered-categorical is a possible way to look at these data, it uses less than the full information in the data.
[In short, Stevens' typology is inadequate - it fails to properly capture all the kinds of data we actually get - as well as being overly prescriptive about how one can use data to make useful inferences. It frequently leads people into making very poor choices of analysis.]
So if we treat those as interval-valued (interval-censored) variables we can use more of the information in the data.
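As a minimal sketch of what "interval-valued" means in practice, each answer can be stored as a pair of bounds on the underlying daily count rather than as a label or a single number. The names `INTERVALS`, `to_interval` and `is_right_censored` are hypothetical, and the bounds are assumptions: the second option is read here as 1 to 3, and "10+" as 11 or more with no upper limit.

```python
import math

# Hypothetical encoding of the answer options as (lower, upper) bounds on the
# underlying daily count. Assumptions: the second option means 1-3, and "10+"
# means 11 or more (open-ended, so its upper bound is infinity).
INTERVALS = {
    "0 times": (0, 0),          # a single value: lower == upper
    "1-3 times": (1, 3),
    "4-6 times": (4, 6),
    "7-10 times": (7, 10),
    "10+ times": (11, math.inf),
}


def to_interval(answer):
    """Return the (lower, upper) bounds for one survey answer."""
    return INTERVALS[answer]


def is_right_censored(answer):
    """True when the answer only gives a lower bound (open-ended category)."""
    return math.isinf(INTERVALS[answer][1])


print(to_interval("4-6 times"))
print(is_right_censored("10+ times"))
```

Keeping both bounds retains the ordering of the categories and the numerical information about where each one sits.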
It might be feasible in some situations to treat the intervals as numeric values (e.g. taking the center of the interval as "the value"), but there are a number of difficulties involved in that -- in general the distribution within each interval won't even be close to symmetric (it may be better to bias the value down from the center, but we don't have information from the data itself on how much -- using a model for the underlying distribution would help there, but it would be an assumption).
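To illustrate the asymmetry point, here is a small sketch comparing the midpoint of the 7-10 category with the mean of the count within that category under an assumed model. The Poisson distribution and its mean of 4 are purely illustrative assumptions, not something the data tell us; `poisson_pmf` and `conditional_mean` are hypothetical helper names.

```python
import math


def poisson_pmf(k, mean):
    """Poisson probability of observing exactly k events."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def conditional_mean(lower, upper, mean):
    """Mean of a Poisson count given that it falls in [lower, upper]."""
    ks = range(lower, upper + 1)
    weights = [poisson_pmf(k, mean) for k in ks]
    return sum(k * w for k, w in zip(ks, weights)) / sum(weights)


midpoint = (7 + 10) / 2
# Assumed underlying distribution: Poisson with mean 4 (illustrative only).
model_based = conditional_mean(7, 10, mean=4.0)

print(midpoint, model_based)
```

Under this assumed model the probability falls off across the interval, so the within-category mean sits below the midpoint. How far below depends entirely on the assumed distribution.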
There's also the issue of what to do with the last category -- one possibility is to treat those values as right-censored rather than making some assumption and lumping them into one specific value. (Essentially, it's top-coded.)
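A rough sketch of how the interval-censored and right-censored views combine in a likelihood: each answer contributes the probability that the underlying count falls in its interval, and the open-ended category contributes an upper-tail probability. Everything specific here is an assumption made for illustration: the Poisson model, the interval bounds (second option read as 1-3, "10+" as 11 or more), the made-up `counts`, and the crude grid search standing in for a proper optimizer.

```python
import math


def poisson_cdf(k, mean):
    """P(X <= k) for a Poisson count; handles k < 0 and k = infinity."""
    if k < 0:
        return 0.0
    if math.isinf(k):
        return 1.0
    term = math.exp(-mean)  # P(X = 0)
    total = term
    for i in range(1, int(k) + 1):
        term *= mean / i    # P(X = i) from P(X = i - 1)
        total += term
    return min(total, 1.0)


def log_likelihood(mean, intervals, counts):
    """Interval-censored log-likelihood: sum of n * log P(lower <= X <= upper)."""
    ll = 0.0
    for (lower, upper), n in zip(intervals, counts):
        p = poisson_cdf(upper, mean) - poisson_cdf(lower - 1, mean)
        ll += n * math.log(max(p, 1e-300))  # floor guards against log(0)
    return ll


# Assumed bounds; the last interval is right-censored (no upper limit).
intervals = [(0, 0), (1, 3), (4, 6), (7, 10), (11, math.inf)]
# Invented respondent counts per category, for illustration only.
counts = [40, 25, 20, 10, 5]

# Crude grid search over candidate means, in place of a real optimizer.
grid = [0.05 * i for i in range(1, 400)]
mle = max(grid, key=lambda m: log_likelihood(m, intervals, counts))

print(mle)
```

No value ever has to be invented for the top category: it enters only through the probability of being 11 or more. A plain Poisson is probably too simple for smoking data (many zeros, heavy tail), so a real analysis would likely swap in a more flexible distribution while keeping the same likelihood structure.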
In part what you might do depends on whether this variable is a dependent variable (response) or an independent variable (predictor/covariate) in a model and what kinds of hypotheses or other inference you're interested in.