
I apologize in advance if this question is poorly posed: I'm an astronomer, not a statistician. My question is specifically aimed at helping me figure out whether Gaussian processes are an appropriate technique for my problem.

Using a telescope and a fiber-fed spectrograph, my project has taken optical spectra of a galaxy at many locations. The sampling pattern for a single pointing is shown in the first image; it is repeated three times in total, with different spatial offsets, to fill in the gaps (second image). Ideally, I would like to construct estimates of certain quantities over a grid covering the galaxy.

[Figures: sampling pattern for a single telescope pointing; multi-pointing offset pattern]

My naive method would be to analyze each fiber's spectrum separately, so that I had $3 N_{fibers}$ point estimates of the quantities of interest, and then construct a Gaussian process to estimate those quantities everywhere. Alternatively, I could construct a Gaussian process for the spectra themselves, then evaluate the GP on my grid of choice to find the quantities I'm interested in. However, I'm not sure this is even a valid approach, since my observations are not discrete points but rather overlap one another spatially.

Unlike, for example, soil scientists, who might sample dirt at one very discrete location and then move 50 meters away and repeat, my observations overlap spatially: each fiber integrates over all the light the galaxy gives off within its footprint. It's not obvious to me that I can neglect any spatial variation that may exist within a given measurement. In other words, is a Gaussian process even valid when the individual sampling locations are not small? Can I build in an additional spatial term to account for the light "mixing" within a single fiber?


Addendum: Traditionally, spectra are just interpolated, resampled on a grid, and then analyzed, which also strikes me as extremely wrong--but if I'm going to rain on colleagues' parades, I want to at least present an alternative method.

2 Answers


I think your two questions nail the issue down. It sounds like you can use GPs for part of the problem, but you might need to do more. To explain the issues I see, I will first translate my understanding of your problem into more mathematical language:

  1. The problem

You are interested in some physical quantity $f(x)$ ("spectra"?) where $x$ is a point in some domain of the plane (your image). $f$ is scalar, i.e. a single number for each point of the plane. You can't observe $f$ directly; you can only observe some spatial average of it, $F$, at points $s_k$ of a grid. That is, you observe $$ F(s_k) = \int_{D_k} f(x)\,dx.$$ The $D_k$ are the various overlapping disks in your image. You did not mention it, but there may also be some measurement noise in your observations; if so, you would need to add a noise term $\epsilon$ on the right-hand side.
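To make the setup concrete, here is a minimal Python sketch of how such disk-averaged observations arise; the test field `f_true`, the fiber radius, and the grid of centers are all illustrative placeholders, not part of the original problem:

```python
import numpy as np

rng = np.random.default_rng(0)

def f_true(x):
    # Hypothetical smooth field standing in for the unknown f(x); x has shape (n, 2).
    return np.sin(3 * x[:, 0]) * np.cos(2 * x[:, 1])

def observe_disk_average(center, radius, n_mc=2000):
    """Monte Carlo estimate of the average of f over the disk D_k.

    Note: this computes the *average* over the disk; multiply by the disk
    area (pi * radius**2) if the instrument integrates rather than averages.
    """
    # Uniform sampling inside a disk: radius ~ sqrt(U), angle ~ U(0, 2*pi).
    r = radius * np.sqrt(rng.uniform(size=n_mc))
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n_mc)
    pts = center + np.column_stack([r * np.cos(theta), r * np.sin(theta)])
    return f_true(pts).mean()

# Placeholder fiber centers s_k; a real layout would follow the pointing pattern.
centers = np.array([[0.5 * i, 0.5 * j] for i in range(5) for j in range(5)], dtype=float)
F_obs = np.array([observe_disk_average(c, radius=0.3) for c in centers])
```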

  2. What about GPs?

It is absolutely OK to fit a GP to your observations, and you will get a valid GP approximation or interpolation of $F$. The GP really does not care that your $F$ is made from overlapping disks; it will note and reflect just the right amount of correlation between values sufficiently close to each other. The problem, of course, is that this produces a GP for $F$, not one for $f$. And $F$ will not be a (good/reasonable) approximation of $f$ unless $f$ is more or less constant on the $D_k$.
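As a sketch of that first step (continuing the toy setup above), one could fit an off-the-shelf GP to the fiber values, treating each fiber center as a point location; scikit-learn is just one convenient choice, and the kernel and its hyperparameters are assumptions:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Kernel for F: an RBF term plus white noise for measurement error.
kernel = 1.0 * RBF(length_scale=0.5) + WhiteKernel(noise_level=1e-3)
gp_F = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp_F.fit(centers, F_obs)  # centers, F_obs from the sketch above

# Predict F (not f!) on a regular grid covering the field.
xx, yy = np.meshgrid(np.linspace(0.0, 2.0, 40), np.linspace(0.0, 2.0, 40))
grid = np.column_stack([xx.ravel(), yy.ravel()])
F_mean, F_std = gp_F.predict(grid, return_std=True)
```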

  3. How to recover $f$?

There are different ways to recover $f$ from $F$. What is doable, or maybe even "best", depends on your specific requirements and the details of the problem. Since you know the mean function $m_F$ of $F$ explicitly, you might try some form of numeric deconvolution.

A more GP-spirited way is to assume that $f$ is a GP with mean function $m$ and covariance function $K$. Mathematical theory then tells you that $F$ is a GP as well, with mean function $$m_F(s) = \int_{D_s} m(x)\,dx$$ and covariance $$ K_F(s_1,s_2) = \int_{D_{s_1}}\int_{D_{s_2}} K(x_1,x_2)\,dx_1\,dx_2. $$
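Here is a sketch of how $K_F$ could be approximated for a chosen point kernel $K$, here a squared-exponential, with the double integral estimated by Monte Carlo over the two disks; the disk radius is again a placeholder, and averages are used instead of unnormalized integrals:

```python
import numpy as np

rng = np.random.default_rng(1)

def K_point(x1, x2, ell=0.5):
    # Squared-exponential point kernel K(x1, x2); inputs broadcast over (n, 2).
    d2 = ((x1 - x2) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / ell ** 2)

def sample_disk(center, radius, n):
    # Uniform samples inside the disk of the given radius around `center`.
    r = radius * np.sqrt(rng.uniform(size=n))
    t = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return center + np.column_stack([r * np.cos(t), r * np.sin(t)])

def K_F(s1, s2, radius=0.3, n_mc=500):
    """Monte Carlo estimate of the disk-averaged kernel:
    K_F(s1, s2) ~ E[K(X1, X2)] with X1 ~ U(D_{s1}), X2 ~ U(D_{s2}).

    This uses *averages*; multiply by the two disk areas to match the
    unnormalized double integral written above.
    """
    x1 = sample_disk(np.asarray(s1, dtype=float), radius, n_mc)
    x2 = sample_disk(np.asarray(s2, dtype=float), radius, n_mc)
    return K_point(x1, x2).mean()
```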

The representer theorem for the mean of a GP then tells you that $m_F(s) = \sum_k \alpha_k K_F(s_k,s)$, and by comparing coefficients you can conclude that $$ m(s) = \sum_k \alpha_k \int_{D_k} K(x,s)\, dx. $$

You can also derive the predictive distribution at a point $s^*$ by noting that $f(s^*)$ and the observations of $F$ have a joint normal distribution, so you can condition on the observations of $F$. The formulas get complicated, but they are straightforward to derive (see Equations (8) and (9) of this paper).
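Putting the pieces together, here is a minimal sketch of that conditioning, reusing `K_point`, `sample_disk`, and `K_F` from above; note that the Monte Carlo Gram matrix is noisy and may need the jitter term to stay positive definite, and all radii and defaults are assumptions:

```python
def predict_f(s_star, centers, F_obs, radius=0.3, n_mc=500, jitter=1e-6):
    """Posterior mean and variance of f(s*) given disk-averaged observations
    F(s_k), under a zero-mean GP prior on f (everything in averaged form)."""
    s_star = np.asarray(s_star, dtype=float)
    n = len(centers)
    # Gram matrix of the averaged process F at the fiber centers, plus jitter.
    K_FF = np.array([[K_F(centers[i], centers[j], radius, n_mc)
                      for j in range(n)] for i in range(n)])
    K_FF += jitter * np.eye(n)
    # Cross-covariance Cov(f(s*), F(s_k)): average of K(x, s*) over D_k.
    k_star = np.array([K_point(sample_disk(centers[k], radius, n_mc),
                               s_star[None, :]).mean() for k in range(n)])
    # Standard Gaussian conditioning on the observed F values.
    alpha = np.linalg.solve(K_FF, F_obs)
    mean = k_star @ alpha
    var = K_point(s_star[None, :], s_star[None, :])[0] \
          - k_star @ np.linalg.solve(K_FF, k_star)
    return mean, var

f_mean, f_var = predict_f([1.0, 1.0], centers, F_obs)
```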

The problem with this is on the practical side: you either need to find the point kernel $K$ from your choice of $K_F$, which is probably difficult, or you start with a $K$ such that (i) you can calculate $K_F$, AND (ii) $K_F$ works reasonably well for your observations, AND (iii) $K$ makes sense as a model for your astronomical data.

g g
  • Great discussion. Could we imagine instead a procedure like: 1) Expand F on chosen basis functions, 2) Estimate the vector of parameters and construct $\hat{F}$, 3) Take the derivative of $\hat{F}$ to recover $\hat{f}$? – dv_bn Apr 30 '16 at 12:17
  • Yes, but step 3 works only in one dimension, not in two as is the case here. – g g Apr 30 '16 at 13:57
  • Even if you take a directional derivative? – dv_bn Apr 30 '16 at 14:04
  • Thanks for this extremely thorough discussion. It has given me a lot to think about! – DathosPachy May 02 '16 at 04:42

There is a topic in geostatistics called exact downscaling. The main goal is to estimate a property at a smaller scale than the observations. Also, the observations may or may not overlap (it does not really matter). Please take a look at this paper: http://www.ccgalberta.com/ccgresources/report07/2005-101-exact_reproduction.pdf

In this paper, they show a method to downscale observations using geostatistical techniques. They show that by correctly calculating the cross-covariances between the different data scales (point vs. block), the kriging estimate is still valid, in the sense that the average of the estimated values at the smaller scale equals the larger-scale input data. Basically, in order to calculate estimated values at any scale, you just need to calculate the covariances between the input data and the target scale, and the cross-covariances, correctly. In a plain Gaussian process, the assumption is that estimation is done at the same scale as the input observations.

So these are the steps:

1. Calculate the experimental variogram from your data.

2. Fit a variogram model to your experimental variogram. You may need to account for directional anisotropy here. This is the covariance function that, in a GP, is usually estimated by maximum likelihood.

3. Calculate all the covariances and cross-covariances between the input data and the target scale. There are numerical recipes for this step: by discretizing the blocks into finitely many points, you can calculate the average covariance. The overlapping data should be taken into account here. A sketch of this step is given after the list.

4. Perform kriging and calculate the estimated values.
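Here is a sketch of step 3 under those assumptions: discretize each block (here, each fiber disk) into a finite set of points and average the point covariances; the exponential covariance model and all parameter values are illustrative, not from the paper:

```python
import numpy as np

def cov_model(h, sill=1.0, a=1.0):
    # Exponential covariance model fitted to the variogram: C(h) = sill * exp(-3h/a).
    return sill * np.exp(-3.0 * h / a)

def discretize_disk(center, radius, n_side=6):
    """Regular grid of points covering the block (here a fiber disk)."""
    g = np.linspace(-radius, radius, n_side)
    xx, yy = np.meshgrid(g, g)
    pts = np.column_stack([xx.ravel(), yy.ravel()])
    pts = pts[(pts ** 2).sum(axis=1) <= radius ** 2]
    return np.asarray(center, dtype=float) + pts

def block_block_cov(c1, c2, radius):
    """Average covariance between two blocks; overlap is handled naturally,
    since the average runs over all point pairs."""
    p1, p2 = discretize_disk(c1, radius), discretize_disk(c2, radius)
    h = np.linalg.norm(p1[:, None, :] - p2[None, :, :], axis=-1)
    return cov_model(h).mean()

def point_block_cov(x, c, radius):
    """Average covariance between a point-scale target location and a block."""
    p = discretize_disk(c, radius)
    h = np.linalg.norm(p - np.asarray(x, dtype=float), axis=-1)
    return cov_model(h).mean()
```

These block-to-block and point-to-block covariances are exactly what fills the kriging system in step 4, with the point-scale covariance model used on the target side.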

GPs are closely related to geostatistics. However, geostatistics is not limited to Gaussian processes; there are many other methods to estimate or simulate a random process.

Behrang
    Welcome to the site. We are trying to build a permanent repository of high-quality statistical information in the form of questions & answers. Thus, we're wary of link-only answers, due to linkrot. Can you post a full citation & a summary of the information at the link, in case it goes dead? – gung - Reinstate Monica Apr 30 '16 at 17:38