3

It seems obvious that weight is a continuous variable, since it can be quantified with decimal precision, e.g. 10.2 kg and 3.0122 kg.

If we were to round it to an integer, like 10 kg and 3 kg, would this weight variable still be considered a continuous numerical variable, or a discrete one?

I am asking because my statistics textbooks say that only continuous numerical variables can be used with certain statistical analyses (regression, ANCOVA, etc.). I wanted to make sure that I can still use "continuous variables" whose measurements have been rounded to integers.

mkt
  • 5
    Might I suggest you throw those books away? Asserting that the measurement type of a variable determines or severely constrains how you can analyze it is a huge abuse of the concepts and theory of variable types and statistics. – whuber Apr 17 '18 at 15:37
  • 1
    I'm sure you mean mass, but I digress. In any case, whether you treat something as a continuous variable or discrete has more to do with how many unique values of that variable you have in your dataset. If all of your observations are either 2 or 8 kg, then I'd probably treat that variable as having a binomial (discrete) distribution. But if you have values ranging from 0 to 100 kg and the histogram of the data series looks pretty "smooth," I think a continuous distribution would be more appropriate. Regardless, regression can be performed on both discrete and continuous data series. – Josh Apr 17 '18 at 15:37
  • @whuber I rather disagree; I find that a thorough understanding of my data type helps to frame the problem and narrow down potential approaches. Categorical data simply can't be handled the same as ordinal discrete data, which itself may or may not be a decent approximation of continuous data. Virtually all statistical tests have some underlying assumptions about the data, and to not appropriately match your method to your data is, IMO, a bigger abuse of the statistics. – Nuclear Hoagie Apr 17 '18 at 16:02
  • 2
    @Nuclear If it helps you, that's fine. But perceptive authors--John Tukey, Lee Wilkinson, and others--have argued very strongly that this only blinds you concerning other opportunities to analyze the data. This is unrelated to the question of underlying distributional assumptions. Historically, the most prominent challenge to your point of view is Lord's (1953) paper on football numbers -- and it still gets attention, indicating the question cannot be considered wholly resolved. – whuber Apr 17 '18 at 16:07
  • Note that the same argument that applies to claiming that 3kg is a discrete value also applies to 3.0122 kg; it's an integer number of milligrams, after all. If you have measurements to n decimal places, then all of your measurements are an integer number of 10^-n kg (with some quibbles about base-10 vs. base-2). – Acccumulation Apr 17 '18 at 16:43

3 Answers

3

The normal distribution is a theoretical continuous function.

In practice all data is, to some extent (if you look far enough behind the decimal point), discrete.

In the end, the normal distribution is only used as a convenient approximation to the distribution of the data (this also applies to deviations from the theoretical distribution other than discreteness, e.g. a different skew or kurtosis). The discreteness often goes unnoticed because the difference is so small, or in other words, because the approximation is so good (at least as far as discreteness is concerned).

Only when there are few discrete levels do people consciously think about how the theoretical Gaussian distribution deviates from the real distribution because of the discreteness of the data, for instance when using a continuity correction in a normal approximation to binomial or Poisson distributed data.
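
As a rough illustration of the continuity correction mentioned above, here is a minimal sketch using only the Python standard library. The values n = 20, p = 0.5 and k = 8 are arbitrary choices for illustration, not taken from the question.

```python
import math

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_cdf(x, mu, sigma):
    """P(Y <= x) for Y ~ Normal(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Illustrative values only (assumptions, not from the thread).
n, p, k = 20, 0.5, 8
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

exact = binom_cdf(k, n, p)
plain = normal_cdf(k, mu, sigma)            # no continuity correction
corrected = normal_cdf(k + 0.5, mu, sigma)  # shift by half a unit

print(f"exact      P(X <= {k}) = {exact:.4f}")
print(f"normal, no correction  = {plain:.4f}")
print(f"normal, with correction = {corrected:.4f}")
```

With so few discrete levels, the half-unit shift brings the normal approximation much closer to the exact binomial probability. With many levels the shift would hardly matter.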

  • 2
    Something subtle and interesting has happened in this answer: you have implicitly conflated the concepts of (1) a categorical measurement type, which is defined in terms of a group of allowable transformations of its possible values; (2) a discrete set of numbers (which is a topological property); and (3) a discrete random variable, which refers to a property of a mathematical model for the data. Although in English "categorical" and "discrete" are practically synonyms, these are completely different concepts. It is only in sense (2) that your assertions about "in practice" are true. – whuber Apr 17 '18 at 15:56
  • I have edited my question a few times so maybe it is gone now, but I did not mean to speak about a categorical variable, I currently can't see this in my text. – Sextus Empiricus Apr 17 '18 at 15:59
  • 1
    The question itself concerns a categorical or ordinal variable, whereas your answer discusses topologically discrete data. I wanted to call attention to the distinction between those, which I think is important for coming to a correct and useful understanding of the issues in this thread. – whuber Apr 17 '18 at 16:01
  • 1
    The question discusses continuous variables that are rounded (such as weight measurements), doesn't it? Where does it implicitly refer to categorical data? – Sextus Empiricus Apr 17 '18 at 16:02
  • 1
    The issue, quite clearly, is whether the conversion of one measurement type to a "lower" type, such as rounding to a whole value--essentially conversion to an apparent ordinal or categorical variable--should affect how one can analyze data. Discussing the resulting discretization, in the topological sense, seems to be less germane and less useful than focusing on the implicit misunderstandings of the role of measurement type in statistical analysis (as exemplified in some comments to the question). – whuber Apr 17 '18 at 16:11
  • I think the topological sense is useful (maybe less useful, I don't know about that) since it shows that every measurement is in essence a conversion/discretization (and, if you like, every weight/mass measurement of an object is in theory discretized by being a sum of the weights of individual atoms or, if you dig deeper, quantum energy levels). The practical question boils down to 'how much harm' the discretization did and, if it did harm, how one can correct for it, if necessary. I am still puzzled about the point where this answer is conflating concepts. – Sextus Empiricus Apr 17 '18 at 16:17
1

That assumption concerns the distribution of the sample/population mean more than how your data are recorded. In a way, 10 kg is also 0.01 metric ton; no magic is introduced by rounding your readings to whole numbers here.

What you should be asking, however, is why round to begin with. You lose some information that way, so if rounding is not absolutely necessary, keep the values as they are and round only at the last stage, when you present the results.
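
To get a feel for how little information is lost in a case like this, here is a small simulation sketch. Everything in it is an assumption made for illustration: the weight range of 3 to 100 kg, the linear model, the noise level and the sample size are all invented, and `ols_slope` is a hypothetical helper, not a library function.

```python
import random

def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (simple linear regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(0)  # fixed seed so the run is reproducible

# Simulated weights in kg and a made-up response with true slope 0.5.
weights = [rng.uniform(3.0, 100.0) for _ in range(2000)]
response = [2.0 + 0.5 * w + rng.gauss(0.0, 5.0) for w in weights]

# The same weights rounded to whole kilograms.
rounded = [float(round(w)) for w in weights]

slope_raw = ols_slope(weights, response)
slope_rounded = ols_slope(rounded, response)

print(f"slope with raw weights:     {slope_raw:.4f}")
print(f"slope with rounded weights: {slope_rounded:.4f}")
```

When the weights span a range much wider than the rounding step, the two slopes are nearly identical. The effect of rounding would be more noticeable if all weights fell within only a few kilograms of each other.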

0

Yes, in that case you can still apply methods that work on continuous data.

One piece of the puzzle is that the positive integers are a subset of the reals ($\mathbb{Z}^{+} \subset \mathbb{R}$). They also follow the same order.

A lazy intuitive argument for this would be to simply consider the input being continuous measurements that take integer values "by chance".

Denwid