Is this variable ordinal, numeric or nominal?

Question

I have the following answers to the question: How many times a day do you smoke?

0 time
3 times
4 - 6 times
7 - 10 times
10+ times

Can I treat these as ordinal, or do I have to treat them as numeric? I assume they aren't nominal.

Glen_b · Answer 1 · 2017-04-21T11:42:22.167

If the only options we consider are from Stevens' typology of level of measurement then:

Clearly those groupings can be treated as categories.
Clearly the sequence of categories is in order.

ergo it's clearly possible to treat them as ordered categorical.

However, there is information about numerical values there. Specifically, those categories are intervals of values. This is, strictly speaking, an interval-valued* random variable -- indeed there's an underlying variable which we're observing as interval-censored (the first category is a single value, but that can be handled as an interval of width zero).

* beware! "interval-valued" is NOT the same thing as "interval-scale" on Stevens' typology; the typology isn't really adequate and would force you to either treat it as ordered-categorical or to construct values to use for a ratio scale (0 is a real 0 and the underlying scale from which these groups are composed is discrete but ratio-scale). While ordered-categorical is a possible way to look at these data, it uses less than the full information in the data.

[In short, Stevens' typology is inadequate - it fails to properly capture all the kinds of data we actually get - as well as being overly prescriptive about how one can use data to make useful inferences. It frequently leads people into making very poor choices of analysis.]

So if we treat those as interval-valued (interval-censored) variables we can use more of the information in the data.

It might be feasible in some situations to treat the intervals as numeric values (e.g. taking the center of the interval as "the value"), but there are a number of difficulties involved in that -- in general the distribution within each interval won't even be close to symmetric (it may be better to bias down, but we don't have information from the data itself on how much -- using a model for the underlying distribution would help there, but it would be an assumption).
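To illustrate why the midpoint can be a poor stand-in, here is a minimal sketch that compares the midpoint of the "7 - 10 times" interval with the mean of the values inside that interval under an assumed model. The Poisson distribution and the mean of 5 are assumptions chosen purely for illustration; nothing in the question says the counts follow that model.

```python
import math

def poisson_pmf(k: int, mean: float) -> float:
    """P(X = k) for a Poisson variable, computed on the log scale."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

def conditional_mean(lower: int, upper: int, mean: float) -> float:
    """E[X | lower <= X <= upper] under the assumed Poisson model."""
    ks = range(lower, upper + 1)
    weights = [poisson_pmf(k, mean) for k in ks]
    return sum(k * w for k, w in zip(ks, weights)) / sum(weights)

ASSUMED_MEAN = 5.0      # hypothetical value, not estimated from any data
LOWER, UPPER = 7, 10    # the "7 - 10 times" category

midpoint = (LOWER + UPPER) / 2
model_based = conditional_mean(LOWER, UPPER, ASSUMED_MEAN)
print(f"midpoint: {midpoint}, model-based conditional mean: {model_based:.2f}")
```

Because the assumed distribution is decreasing across this interval, the model-based value falls below the midpoint. How far below depends entirely on the assumed model, which is the difficulty described above.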

There's also the issue of what to do with the last category -- one possibility is to treat those values as right-censored rather than making some assumption and lumping them into one specific value. (Essentially, it's top-coded)
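A minimal sketch of what treating the categories as interval-censored (with the top category right-censored) could look like. Everything beyond the category labels is an assumption: the Poisson model is used only because it is simple, the respondent counts are made up, the labels are taken literally as listed in the question, and "10+ times" is read as "more than 10" so that it does not overlap with "7 - 10 times".

```python
import math

# (lower, upper) bounds per category; upper=None marks a right-censored category.
# Bounds follow the labels as listed; "10+" read as 11 or more is an assumption.
INTERVALS = {
    "0 time": (0, 0),
    "3 times": (3, 3),
    "4 - 6 times": (4, 6),
    "7 - 10 times": (7, 10),
    "10+ times": (11, None),
}

# Hypothetical respondent counts per category, for illustration only.
COUNTS = {"0 time": 40, "3 times": 25, "4 - 6 times": 20,
          "7 - 10 times": 10, "10+ times": 5}

def poisson_pmf(k, mean):
    """P(X = k) for a Poisson variable, computed on the log scale."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

def interval_prob(lower, upper, mean, tail_terms=200):
    """P(lower <= X <= upper); an open upper bound sums a long finite tail."""
    last = lower + tail_terms if upper is None else upper
    return sum(poisson_pmf(k, mean) for k in range(lower, last + 1))

def log_likelihood(counts, mean):
    """Each respondent contributes the log-probability of their interval."""
    return sum(n * math.log(interval_prob(*INTERVALS[label], mean))
               for label, n in counts.items())

# Crude grid search for the mean that maximizes the censored likelihood.
grid = [i * 0.05 for i in range(1, 401)]
best_mean = max(grid, key=lambda m: log_likelihood(COUNTS, m))
print(f"grid-search estimate of the Poisson mean: {best_mean:.2f}")
```

The point of the sketch is the likelihood contribution: an interval contributes the probability of the whole interval, and the top category contributes a tail probability rather than being lumped into one specific value. A real analysis would use a better-suited distribution and, with predictors, a regression model built on the same contributions.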

In part what you might do depends on whether this variable is a dependent variable (response) or an independent variable (predictor/covariate) in a model and what kinds of hypotheses or other inference you're interested in.

Hello and thank you for the reply! It is the dependent variable, and I'd like to test whether there is a connection between it and "does your family smoke", "do your friends smoke", "how often do your friends smoke". — Gugniro, Apr 21 '17 at 10:32
Then yes, dealing with the censoring as discussed makes sense; you might like to investigate survival models. — Glen_b, Apr 21 '17 at 11:40
