Very briefly, we can turn any deterministic model into a statistical model by making some assumptions about the distribution of errors (which requires thinking carefully about the causes of those errors), and those assumptions in turn guide us towards certain metrics and even statistical tests. However, this is an involved, highly iterative process. Some examples are in order.
Example 1: Statistics Motivated by the Underlying Physics
For particle physics experiments, such as the search for the Higgs Boson, experimental data is collected by particle detectors placed at different angles around the collision point. Particle detector data is a count of events where the probability of an event is independent of the time since the last event. Therefore we know that the count data follow a Poisson distribution. Our theory makes a prediction for what kinds of events we will detect at which angles, which gives us the mean number of events we would expect at each detector. We can tell whether the experiment agrees with theory by asking whether the observed count falls within the confidence interval of a Poisson distribution whose mean is our theoretical prediction.
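Here is a minimal sketch of that check, assuming SciPy is available; the theoretical mean and the observed count are made-up placeholders, not real detector data:

```python
# Sketch: is an observed event count consistent with a theoretical Poisson mean?
# The numbers below are placeholders for illustration only.
from scipy.stats import poisson

theoretical_mean = 52.3   # mean event count predicted by theory at this detector
observed_count = 61       # events actually recorded

# Central 95% interval of a Poisson distribution with the theoretical mean.
low, high = poisson.interval(0.95, theoretical_mean)

if low <= observed_count <= high:
    print(f"Observed {observed_count} is consistent with theory ({low:.0f}-{high:.0f}).")
else:
    print(f"Observed {observed_count} falls outside the 95% interval ({low:.0f}-{high:.0f}).")
```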
Example 2: Using Statistics to Model Measurement Error
If we measure some quantity with an instrument, say using a scale to measure the weight of a precipitate after some chemical reaction, the source of error is "measurement error". Measurement error is often normally distributed, perhaps because it is the accumulation of many other small discrepancies throughout the experiment, such as small variations in the amounts of the reagents or the exact temperature of the reaction. Measurement errors are also very likely to be independent from measurement to measurement (although there are also techniques for dealing with repeated measures). Under these assumptions, the differences between our theoretical predictions and our observations will be normally distributed, and we can use statistics like Mean Square Error (MSE) to quantify the agreement between theory and experiment.
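A minimal sketch of this case, with placeholder data, assuming NumPy and SciPy are available; the Shapiro-Wilk test is just one quick way to spot-check the normality assumption:

```python
# Sketch: quantify theory/experiment agreement under the assumption of
# independent, normally distributed measurement error. Data are placeholders.
import numpy as np
from scipy import stats

predicted = np.array([1.02, 2.05, 2.98, 4.01, 5.03])   # theoretical predictions
observed  = np.array([1.00, 2.10, 2.95, 4.07, 4.99])   # measured values

residuals = observed - predicted
mse = np.mean(residuals ** 2)

# Quick (low-power on small samples) check of the normality assumption.
stat, p_normal = stats.shapiro(residuals)

print(f"MSE = {mse:.4f}, Shapiro-Wilk p = {p_normal:.3f}")
```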
Example 3: Using Statistical Modeling to Create Empirical Models
We often have cases where the errors are not so well behaved. Consider the ballistic trajectory of a bullet modeled by a simple parabolic trajectory. There is measurement error, certainly, but it is dwarfed by other sources: wind speed, the Coriolis effect, air friction. If you plot observed versus theoretical predictions, you will see two curves that diverge more and more over time. The residuals (the differences between observed and predicted values) will not be independent and normally distributed with mean zero; they will be biased (mean of the error not equal to zero), correlated (not independent), heteroscedastic (variance changes over time), and often skewed (not normally distributed). There are three options: 1) introduce a statistical model that captures these complications, 2) add atheoretical empirical corrections to the model and leave them to be explained later, or 3) eschew formal statistical testing for the time being and report a rough-and-ready metric.
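To make the diagnosis concrete, here is a rough sketch of how one might spot the bias, autocorrelation, and growing variance described above; the residuals are simulated placeholders mimicking unmodeled drag and wind, not real data:

```python
# Sketch: quick residual diagnostics for bias, correlation, and heteroscedasticity.
import numpy as np

t = np.linspace(0, 2.0, 50)                       # time points (s)
rng = np.random.default_rng(0)
# Placeholder residuals that diverge over time, mimicking unmodeled drag/wind.
residuals = -0.5 * t**2 + rng.normal(scale=0.05 * (1 + t), size=t.size)

bias = residuals.mean()                                    # far from zero -> biased
lag1 = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]    # near 1 -> correlated
var_early = residuals[: t.size // 2].var()
var_late = residuals[t.size // 2 :].var()                  # growing -> heteroscedastic

print(f"mean residual = {bias:.3f}")
print(f"lag-1 autocorrelation = {lag1:.3f}")
print(f"variance early/late = {var_early:.4f} / {var_late:.4f}")
```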
Guidelines
So, where does that leave you? It depends on where you are in the process. Here is a rough roadmap:
- Establish qualitative agreement (what the Wikipedia article calls "face validity"). This can be as simple as noting that the model is directionally correct: when one quantity goes up, so does the other.
- Check that your model is unbiased by verifying that the mean of the differences between actual and predicted values, $\frac{1}{n}\sum_{i=1}^n (y_i - \hat{y}_i)$, is small. Very little can be said about badly biased models, and in particular rough-and-ready metrics like Mean Square Error (MSE) or Mean Absolute Deviation (MAD) make little sense for them.
- Assume a very simple model, such as additive, normally distributed errors, and verify the assumptions of normality, homoscedasticity, etc. If this works, great! We're in Example 2 territory, with mostly well-behaved measurement error to worry about, and you can use the model to report goodness-of-fit, confidence intervals, and p-values.
- If the simple model fails, either give up on statistical inference and report MSE, MAD, or similar metrics in a rough-and-ready way, or
- start adding empirical terms to the model and performing model selection on them. For the ballistic example, we might add a cubic correction and verify that it reduces AIC by a meaningful amount (a sketch of such a comparison follows this list). This generally requires considerably more data, but if it works you have an empirical fit to the data for theory to explain later. For example, Planck's law was known empirically, from fitting data, before any theoretical explanation could be offered.
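As a sketch of that last option, here is one way to add an empirical cubic term to a quadratic (parabolic) model and compare AIC, using statsmodels; the "observed" trajectory is simulated placeholder data, not real ballistics:

```python
# Sketch: model selection via AIC after adding an atheoretical cubic correction.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
t = np.linspace(0, 2.0, 200)
# Placeholder "observed" drop: parabolic term plus an unmodeled cubic effect.
y = 4.9 * t**2 + 0.3 * t**3 + rng.normal(scale=0.05, size=t.size)

# Base model: quadratic in t (still linear in its parameters).
X_base = sm.add_constant(np.column_stack([t, t**2]))
# Extended model: add an empirical cubic correction term.
X_cubic = sm.add_constant(np.column_stack([t, t**2, t**3]))

fit_base = sm.OLS(y, X_base).fit()
fit_cubic = sm.OLS(y, X_cubic).fit()

print(f"AIC without cubic term: {fit_base.aic:.1f}")
print(f"AIC with cubic term:    {fit_cubic.aic:.1f}")
# A substantially lower AIC for the cubic model supports keeping the
# empirical correction, even before any theoretical justification exists.
```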
By far the most commonly used statistical model for Example 2 / Step 3 is the ordinary linear model. It is a common misconception that "linear" means "a linear relationship between the independent and dependent variables" when it really means "linear in the parameters". For example, we could model a non-linear relationship between volume and length as $V = \beta L^3 + \epsilon$, $\epsilon \sim \mathcal{N}(0, \sigma^2)$, and this is still a linear model. It is also not necessary to fit a linear model to the data: many metrics like MSE or the log-likelihood can still be computed even if the model was not fit. (Other metrics, such as the z-scores/p-values for individual parameters reported by most statistical software, do rely on the model being fit, and those you can't use.)
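A short sketch of that point, with placeholder data: the model $V = \beta L^3 + \epsilon$ is linear in $\beta$, and MSE and a Gaussian log-likelihood can be computed with $\beta$ fixed by theory rather than fitted (here the error scale is estimated from the residuals purely for illustration):

```python
# Sketch: score a theory-specified (unfitted) model with MSE and log-likelihood.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
L = np.linspace(1.0, 5.0, 40)                        # lengths (placeholder data)
V = 1.0 * L**3 + rng.normal(scale=2.0, size=L.size)  # "observed" volumes

# Theory fixes beta = 1 (a cube), so no fitting is needed to score the model.
beta_theory = 1.0
predicted = beta_theory * L**3
residuals = V - predicted

mse = np.mean(residuals**2)
# Error scale taken from the residuals here; it could also come from instrument specs.
sigma_hat = residuals.std(ddof=0)
log_lik = norm.logpdf(residuals, loc=0.0, scale=sigma_hat).sum()

print(f"MSE = {mse:.3f}, log-likelihood = {log_lik:.1f}")
```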
I know the above answer is kind of high-level, but it's difficult to do justice to the topic in this format. Oberkampf has written a couple of papers on this: a more introductory one that introduces the concepts and a more quantitative one that discusses the statistics of validation metrics. The second paper contains all the detail needed to flesh out step (3) above, for example. Wikipedia also has a very brief introduction to the question.