2

I found this to be an interesting post, but I want to home in on the definition of a model.

It defines a model as:

the function (or pooled set of functions) that you may accept or reject as being representative of your phenomenon.

I've likewise seen a model defined as a mathematical artifact that describes a random process.

Both of these definitions look backwards at the sample or data-generating process a model describes. But how would we describe a model in terms of its output?

My question is: How would we describe it in terms of both its inputs and its outputs?

Can we augment a definition of a model as:

A mathematical representation of a data generating process with randomness

to:

A mathematical representation of a data generating process with randomness that produces statistics and/or estimates of true parameters?

Is this too limiting? How else can a model be accurately defined in terms of its outputs?

  • 1
    In what sense do you believe that a model "produces estimates of true parameters", or that these are the outputs of the model? – Arya McCarthy Apr 12 '22 at 03:54
  • @AryaMcCarthy My thinking was: Let's say your model learns parameter estimates. For example, in regression, we learn parameters describing the conditional expectation of a dependent variable given a predictor. My thinking is that these parameter estimates are simply estimates of the true parameters -- here, describing the true relationship between two population variables. – Estimate the estimators Apr 12 '22 at 12:34

1 Answer

2

The question of "what is a model" is generally context-dependent, but it is useful to examine the standard case in statistics and then branch out from there. For this purpose, it is important to note that statistical analysis sometimes uses "parametric models" (i.e., models defined by a class of distributions indexed by a parameter vector) and sometimes uses "non-parametric models" (i.e., models defined by a broader class of distributions that are not indexed by a parameter).

In a statistical context, a "parametric model" will typically be a specification of a parameterised class of distributions for a set of observable values. Usually the model will be general enough to describe the joint distribution of a sample of any allowable size, and it will parameterise the distributions based on a parameter in some set (that is usually independent of the sample size).

For example, if we have a sequence of observable values $x_1,x_2,x_3,...$ then the model would be a mapping from a parameter $\theta \in \Theta$ and a sample size $n \in \mathbb{N}$ to the joint distribution of a sample of that size conditional on that parameter value:

$$F_{n,\theta}(\mathbf{x}_n) = \mathbb{P}(X_1 \leqslant x_1,...,X_n \leqslant x_n | \theta).$$

In most statistical models the parameter space is fixed irrespective of the sample size. In this case, if we let $\mathscr{F}_n \subset [0,1]^{\mathbb{R}^n}$ denote the class of distribution functions with input dimension $n$ then we can write the model as a sequence of functions:

$$\mathscr{M} \equiv \{ \mathscr{M}_n | n \in \mathbb{N} \} \quad \quad \quad \quad \quad \mathscr{M}_n :\Theta \rightarrow \mathscr{F}_n.$$
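To make this concrete, here is a minimal sketch of a parametric model as a mapping from a parameter $\theta$ and a sample size $n$ to a joint distribution function. I use an IID normal model purely for illustration; the function names are my own, not standard API:

```python
import numpy as np
from scipy import stats

def normal_model(theta, n):
    """Map theta = (mu, sigma) and sample size n to the joint CDF
    of an IID N(mu, sigma^2) sample of size n (i.e., M_n(theta))."""
    mu, sigma = theta

    def joint_cdf(x):
        # For IID observations the joint CDF factors into a product of
        # marginal CDFs: P(X_1 <= x_1, ..., X_n <= x_n | theta).
        x = np.asarray(x, dtype=float)
        assert x.shape == (n,), "input must be a point in R^n"
        return float(np.prod(stats.norm.cdf(x, loc=mu, scale=sigma)))

    return joint_cdf

# M_n maps a parameter to a distribution function on R^n:
F = normal_model((0.0, 1.0), 3)
F([0.0, 0.0, 0.0])  # 0.5 ** 3 = 0.125
```

Note that the whole family $\{ \text{normal\_model}(\theta, n) : \theta \in \Theta \}$, not any single output distribution, plays the role of the model $\mathscr{M}_n$.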

It is important to note that the purpose of a parametric model is to specify the probabilistic behaviour of the sample values (parameterised by some unknown parameter). The model does not generally include a specification of the estimators used to estimate the parameters, nor the kinds of other statistics that might be useful in the modelling process. Those things are important to investigate when making inferences from a model, but they are not considered to be "part of" the model. (So, for example, the OLS estimator is not strictly part of the linear regression model; it is just an estimation method that is commonly used with it.)
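To illustrate the separation, here is a rough sketch (using simple linear regression; the function names are hypothetical) in which the model specifies the distribution of the data, while the OLS estimator is a separate procedure applied to data from that model:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_linear_model(theta, x, sigma=1.0):
    """The MODEL: specifies the conditional distribution of y given x
    and theta = (intercept, slope); here we simulate a draw from it."""
    a, b = theta
    return a + b * x + rng.normal(scale=sigma, size=len(x))

def ols_estimator(x, y):
    """An ESTIMATOR: a function of the data alone, not part of the
    model specification -- other estimators could be used instead."""
    X = np.column_stack([np.ones_like(x), x])
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_hat

x = np.linspace(0.0, 10.0, 200)
y = simulate_linear_model((1.0, 2.0), x)  # data generated by the model
beta_hat = ols_estimator(x, y)            # estimate of (1.0, 2.0)
```

Swapping `ols_estimator` for, say, a robust or Bayesian estimator changes nothing about the model itself, which is the point: the model fixes the probabilistic behaviour of the data, and the estimator is a choice made on top of it.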

Ben