Suppose I have a dataset of temperature data sampled every 5 minutes and I want to find its mean. If we assume that the data were sampled from a discrete process, we can use the arithmetic mean:

$\frac{1}{n}\sum_{i=1}^{n}{x_i}$

However, if we assume that the underlying process is continuous, the mean would be the definite integral divided by the length of the interval:

$\frac{1}{t_n-t_1}\int_{t_1}^{t_n}{f(t)\,dt}$

where $t$ represents the time and $f(t)$ the corresponding temperature at that time.

My question is: assuming I can approximate $f(t)$ quite well, is it more reasonable to assume a continuous process and calculate the mean accordingly, or to assume a discrete process and use the arithmetic mean?

T. Tim
  • 47

1 Answer

Well, you have to decide which model you want to assume behind your discrete data-points!

If you simply draw straight lines between your points, then averaging the discrete data-points is almost exactly the same as calculating area/width. The only difference is that, in the area calculation, the two outermost data-points get half weight.
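To make that comparison concrete, here is a minimal sketch that computes both means for the same samples. The temperature series is synthetic and purely hypothetical (a daily cosine cycle plus a slow drift), and the function names are made up for illustration. The second function integrates the piecewise-linear interpolant with the trapezoidal rule and divides by the total time span.

```python
import math


def arithmetic_mean(values):
    """Plain average of the samples (discrete-process view)."""
    return sum(values) / len(values)


def trapezoid_time_mean(times, values):
    """Time average of the piecewise-linear curve through the samples.

    The area under each straight segment is the average of its two
    endpoint values times the segment width (trapezoidal rule).
    """
    area = 0.0
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        area += 0.5 * (values[i] + values[i + 1]) * dt
    return area / (times[-1] - times[0])


# Hypothetical data: one day sampled every 5 minutes (times in minutes).
times = [5.0 * i for i in range(289)]
temps = [15.0 + 5.0 * math.cos(2 * math.pi * t / 1440.0) + 0.002 * t
         for t in times]

mean_discrete = arithmetic_mean(temps)
mean_continuous = trapezoid_time_mean(times, temps)
print(mean_discrete, mean_continuous, mean_discrete - mean_continuous)
```

With evenly spaced samples, the two results differ only through the half weight on the first and last points, so the gap shrinks as the number of samples grows. With uneven spacing, the trapezoidal version weights each sample by the time it represents, and the two means can differ more.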

So it's the method of fitting that makes the difference!

Maybe read this post, where people discuss probability driven fitting of discrete data-points.

KaPy3141
  • 787
  • Yes, indeed: when we simply assume a line between the points, the mean will be very similar. However, this is not the case for the variance, for example. "Well, you have to decide which model you want to assume behind your discrete data-points!" This is exactly the question. In the case of my temperature data, I would say it is continuous and I would assume a line between the points. However, most people just assume a discrete process and use discrete methods, so I wonder whether there is a deeper reason for this. And thanks for the post, I'll read it. – T. Tim Feb 25 '20 at 11:51