Some help to get started with Spatio-temporal data analysis on different engines

Question

I will soon receive experimental data from $n$ engines ($n$ is small, say 10), each instrumented with sensors at the locations shown in the figure:

As you can see, each engine is equipped with $5\times8=40$ sensors, regularly spaced along the radial and circumferential directions. Each sensor records temperature over time. I think I could model this in one of two ways:

either I consider that, for each engine, I have 40 stochastic processes indexed by time only, i.e., $\{T_i(t,\omega)\}_{i=1,\dots,40}$
or I consider that for each engine I have a single stochastic process, indexed by time, radius and angle, i.e., $T(r,\theta,t,\omega)$, which I sample at regular positions in space and time.

The second model seems superior to me, because it lets me model explicitly the fact that random variables that are close in radius and angle should be more correlated than random variables that are further apart.

My goal is to predict the averaged radial temperature profile (i.e., averaged in time and along the circumferential direction) of a new, untested engine, "identical" to those already tested, together with an estimate of the prediction uncertainty. The uncertainty concerns the prediction of the averaged radial temperature profile, not the prediction of the temperature at one of the 40 sensors. It should therefore be smaller than it would be if I had to predict the temperature at a generic sensor.

"Identical" is in quotes because of course even if the design is the same, each engine coming from the production line is different, because of manufacturing process variability, which can be controlled but not eliminated. And even if two engines were identical down to a tenth of a millimeter, there would always be testing variability. I have never analyzed spatio-temporal data, so I'm having a lot of difficulty getting started. Any suggestions? Note that different engines may have a different number of samples in time. Let's say that, approximately, the engine with the most samples may have 4 times as many as the engine with the fewest.

EDIT: I have a couple of ideas on how to start. If you can show me how to refine them and get something actually meaningful, that's great. But if you prefer a different approach, please don't feel constrained by my ideas: I'm a newbie in the field, so it's possible that my suggestions don't make any sense.

One approach may be to define, for each engine, a Gaussian Process in time and space. The covariance function would be the product of three terms plus a nugget term:

$$k(\theta,r,t,\theta',r',t')=k_{per}(\theta-\theta')k_r(r-r')k_t(t-t')+\sigma^2\delta(\theta-\theta',r-r',t-t')$$

where $k_{per}$ would be a periodic covariance function, depending on $\Delta\theta$ only, $k_r$ a classic Gaussian covariance function, depending on $\Delta r$ only, and finally $k_t$ would be a (Gaussian?) covariance function, depending on $\Delta t$. Each covariance function would have a correlation length to be estimated.
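
A minimal NumPy sketch of this covariance function might look as follows. It assumes the standard exp-sine-squared form for $k_{per}$ with period $2\pi$, squared-exponential forms for $k_r$ and $k_t$, and no repeated sample locations, so that the nugget reduces to $\sigma^2$ on the diagonal. The function names, the grid and the hyperparameter values are illustrative only, not estimates.

```python
import numpy as np

def k_periodic(dtheta, ell_theta):
    # Exp-sine-squared kernel with period 2*pi (assumed form of k_per)
    return np.exp(-2.0 * np.sin(dtheta / 2.0) ** 2 / ell_theta ** 2)

def k_gauss(d, ell):
    # Gaussian (squared-exponential) kernel with correlation length ell
    return np.exp(-0.5 * (d / ell) ** 2)

def covariance(X, ell_theta, ell_r, ell_t, sigma2):
    """X is an (N, 3) array of (theta, r, t) points; returns the (N, N) matrix."""
    theta, r, t = X[:, 0], X[:, 1], X[:, 2]
    dtheta = theta[:, None] - theta[None, :]
    dr = r[:, None] - r[None, :]
    dt = t[:, None] - t[None, :]
    K = k_periodic(dtheta, ell_theta) * k_gauss(dr, ell_r) * k_gauss(dt, ell_t)
    # Nugget: with distinct points the delta term is the identity matrix
    return K + sigma2 * np.eye(len(X))

# Illustrative grid: 8 angles x 5 radii x 3 time samples (made-up values)
thetas = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
radii = np.linspace(0.2, 1.0, 5)
times = np.array([0.6, 0.7, 0.8])
X = np.array([(th, r, t) for th in thetas for r in radii for t in times])

K = covariance(X, ell_theta=1.0, ell_r=0.3, ell_t=0.1, sigma2=0.01)
```

Even this toy grid gives a $120\times120$ matrix. The separable product structure is what would make Kronecker-type factorizations possible on a full regular grid, although that is beyond this sketch.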

I shudder at the very thought of fitting a Gaussian Process with so many samples: 40 probes with easily hundreds (if not thousands) of time samples each. However, I'm only interested in the steady-state temperature profile, so I could retain only the time samples after any transient is finished. For example, for the plot below, I would retain only the samples with time > 0.5 (approximately):
Of course I would need some way to define the "end" of a transient... Using MLE I could estimate the correlation lengths and the noise terms for the Gaussian process. I could also use a mean function which is a polynomial in the radial direction. However, I have no idea how to "combine" the GPs trained on each engine to get an average radial profile and a measure of uncertainty for the new (untested) engine. Should I give more weight to the GPs of engines for which there are more time samples?

(Super-naive approach) For each engine, I compute the time-averaged temperature for each of the 40 probes. The average must be taken after any transient is finished, as before, so some criterion to choose the time window is needed. Once I choose the time window, I get a steady-state temperature for each probe and a matrix of standard deviations

$$\Sigma=\{\sigma(\theta_i,r_j)\}=\{\sigma_{i,j}\}$$

Then, for each radius, I may average the temperatures of the 8 sensors at that radius and get an average radial profile $T(r)$. But now I need a measure of variation in the circumferential direction, which will likely be a function of $r$. I guess $\Sigma$ already contains this estimate of variability, but I don't know how to use it.

Finally, I could average the average radial profiles among different engines. I would get an average radial profile of temperature, averaged over all the engines. I would also need a measure of variation. This should "combine" the matrices $\{\Sigma_k\}_{k=1,\ldots,n}$ with the variation in average radial profiles across different engines, but again I don't know how to do it. I wonder whether I should give more weight to the engines with more time samples, or not.
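
To make the steps of the naive approach concrete, here is a rough sketch on synthetic data. Everything about the data is made up (the temperature model, the noise levels, the fixed cut-off `t_start=0.5` for the transient), and the engines are weighted equally only because that is the simplest choice, not because I know it is the right one. It computes the per-engine radial profile and the spread of the profiles across engines, and it does not yet use the matrices $\Sigma_k$.

```python
import numpy as np

rng = np.random.default_rng(0)
n_engines, n_r, n_theta = 10, 5, 8

def radial_profile(T, t, t_start):
    """T: (n_t, n_r, n_theta) temperatures, t: (n_t,) sample times.

    Returns the radial profile (mean over time and angle) and the
    per-sensor standard deviation in time, both after t_start.
    """
    steady = T[t > t_start]                 # drop the transient
    sensor_mean = steady.mean(axis=0)       # (n_r, n_theta)
    sensor_sd = steady.std(axis=0, ddof=1)  # the matrix Sigma for this engine
    return sensor_mean.mean(axis=1), sensor_sd

profiles = []
for k in range(n_engines):
    n_t = int(rng.integers(200, 800))       # engines differ in sample count
    t = np.linspace(0.0, 1.0, n_t)
    # Synthetic data: radial trend + engine-to-engine offset + sensor noise
    base = 300.0 + 50.0 * np.linspace(0.0, 1.0, n_r)[None, :, None]
    offset = rng.normal(0.0, 5.0)
    T = base + offset + rng.normal(0.0, 2.0, size=(n_t, n_r, n_theta))
    profile, sigma_k = radial_profile(T, t, t_start=0.5)
    profiles.append(profile)

profiles = np.array(profiles)               # (n_engines, n_r)
mean_profile = profiles.mean(axis=0)        # equal weight per engine
between_sd = profiles.std(axis=0, ddof=1)   # engine-to-engine spread per radius
```

Here `between_sd` only captures the variation between engines. How to fold in the within-engine variability and whether to weight engines by their number of samples are exactly the parts I don't know how to do.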

As you can see, it's all very cloudy.

These lecture notes might provide a good start for spatio-temporal analyses: http://web.stanford.edu/class/stats253/lectures.html. For a more detailed but still practical treatment, you might like the book http://www.asdar-book.org/. Note that these materials require a good foundation in 'traditional' statistical modelling, and you might encounter some pretty high-level math in there. — Jacek Podlewski, Jul 19 '16 at 22:43
@JacekPodlewski, wow, the lectures are cool stuff! It looks like there is a big difference between the 2014 and 2015 lectures. The book is also a nice reference, but I'm not sure management would agree to buying it; I will check anyway. After skimming through the 2015 lectures, it seems that my two initial ideas, while insufficient, are actually not as terrible as I feared. I will add them to the question, hoping that doesn't stifle creativity in the answers. — DeltaIV, Jul 20 '16 at 08:27
The book website contains lots of R code which might be useful even without the text itself. — Jacek Podlewski, Jul 20 '16 at 20:06
This is a reference that I have found useful: Cressie & Wikle, Statistics for Spatio-Temporal Data. — Sycorax, Jul 28 '16 at 02:46
@GeneralAbrial thanks! Looks like an excellent text, starting from the basics and getting to the state of the art. A pity it doesn't include R code. This one does, but it has only one chapter on spatio-temporal data. Also, from what I can see, Cressie & Wikle does a better job of teaching the underlying principles. They could probably complement each other. — DeltaIV, Jul 28 '16 at 08:34
PS you seem to be knowledgeable in the matter. What about trying to write an answer? :) — DeltaIV, Jul 28 '16 at 08:36
