Questions tagged [modeling]

This tag describes the process of creating a statistical or machine learning model. Always add a more specific tag.

A statistical model is a formalization of relationships between variables in the form of mathematical equations. It describes how one or more random variables are related to one or more other random variables. The model is statistical because the variables are related stochastically rather than deterministically.

2583 questions
89 votes, 14 answers

What is the meaning of "All models are wrong, but some are useful"?

"Essentially, all models are wrong, but some are useful." --- Box, George E. P.; Norman R. Draper (1987). Empirical Model-Building and Response Surfaces, p. 424, Wiley. ISBN 0471810339. What exactly is the meaning of the above phrase?
gpuguy
17 votes, 6 answers

What exactly is building a statistical model?

What exactly is building a statistical model? These days, as I apply for research and consulting jobs, the term "building a model" or "modelling" often comes up. The term sounds cool, but what exactly does it refer to? How do you…
user13985
11 votes, 10 answers

Reasons besides prediction to build models?

Joshua Epstein wrote a paper titled "Why Model?", available at http://www.santafe.edu/media/workingpapers/08-09-040.pdf, in which he gives 16 reasons: Explain (very distinct from predict); Guide data collection; Illuminate core dynamics; Suggest dynamical…
David J.
7 votes, 3 answers

Where to find mathematical modeling help on low-budget project?

(Not sure how to formulate this question, and I also didn't find any suitable tags.) Are there any online sites where one can find professionals who do mathematical modeling, data analysis, and so on? More generally, where can you turn if you need help…
murrekatt
6 votes, 1 answer

Modelling the effect of a 2 by 4 mixed design on a three-level nominal dependent variable

A colleague just asked me this question. Context: a psychological study had 2 groups of participants (between subjects) and 4 contexts (within subjects); each participant provided a response in each of the four contexts; there were three categorically…
Jeromy Anglim
5 votes, 1 answer

Fitted values for a log-normal model

I assume a simple model $\log(y_i) \sim \mathcal{N}(\mu_i,\sigma)$ with $\mu_i=\alpha + \beta x_i$. Now, if I have the estimates $\hat{\alpha}$ and $\hat{\beta}$, how do I calculate the fitted values $\hat{y}$? (Using $\hat{y}= \exp(\hat{\alpha} +…
teucer
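For reference on the model in this excerpt: if $\log(y) \sim \mathcal{N}(\mu, \sigma^2)$, then $\exp(\mu)$ is the median of $y$ and $\exp(\mu + \sigma^2/2)$ is its mean, so the naive back-transform $\exp(\hat\alpha + \hat\beta x)$ estimates the conditional median rather than the conditional mean. A minimal sketch, assuming $\sigma$ denotes the standard deviation on the log scale and that an estimate $\hat\sigma$ is available; the function name `lognormal_fitted` is hypothetical:

```python
import math
from typing import Tuple


def lognormal_fitted(alpha_hat: float, beta_hat: float, sigma_hat: float, x: float) -> Tuple[float, float]:
    """Return (median, mean) fitted values of y at x for log(y) ~ N(alpha + beta*x, sigma^2).

    Assumption: sigma_hat is the standard deviation of log(y), not its variance.
    """
    mu = alpha_hat + beta_hat * x               # fitted value on the log scale
    median = math.exp(mu)                       # naive back-transform: conditional median of y
    mean = math.exp(mu + sigma_hat ** 2 / 2)    # conditional mean of a log-normal variable
    return median, mean
```

Which of the two is the appropriate "fitted value" depends on whether the median or the mean of $y$ is the quantity of interest.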
5 votes, 1 answer

Non-linear model fitting in many dimensions

I am interested in comparing a non-linear model with up to 12 parameters to many datasets. However, each instance of the model takes a significant amount of time to compute (~1 hour), so I am pre-computing instances of the model for various…
astrofrog
5 votes, 1 answer

Relationship between model overfitting and number of parameters

I want to know the relationship between overfitting of a statistical model (e.g., a regression model) and the number of parameters to be estimated. This may be a fundamental question, but I would appreciate it if someone could explain.
Lank
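One concrete part of that relationship: when least-squares models are nested, adding parameters can never increase the in-sample residual sum of squares (RSS), and a model with as many parameters as observations can fit the data exactly. In-sample fit therefore always rewards extra parameters, which is why overfitting has to be judged on held-out data or with a complexity penalty. A minimal pure-Python sketch using polynomial regression; the six data points and the helper names `solve` and `poly_rss` are made up for illustration:

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def poly_rss(x: Sequence[float], y: Sequence[float], degree: int) -> float:
    """Least-squares polynomial fit of the given degree; returns the in-sample RSS."""
    p = degree + 1  # number of parameters (intercept included)
    X = [[xi ** k for k in range(p)] for xi in x]
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    coef = solve(xtx, xty)  # normal equations: (X^T X) coef = X^T y
    return sum((yi - sum(c * v for c, v in zip(coef, row))) ** 2 for row, yi in zip(X, y))


# Made-up data: roughly y = 2x plus noise, n = 6 observations.
x = [-1.0, -0.6, -0.2, 0.2, 0.6, 1.0]
y = [-2.1, -1.1, -0.5, 0.3, 1.3, 1.9]
for d in range(len(x)):
    print(f"degree {d}: in-sample RSS = {poly_rss(x, y, d):.6f}")
```

Evaluating the same fits on points not used for fitting would show the other side, where the error typically rises again once the extra parameters start fitting noise.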
4 votes, 1 answer

Probability model vs statistical model vs stochastic model

I understand that a statistical model is a model that accounts for the uncertainty in the model. E.g., the demand-price equation $\text{demand}_i = a + b\,\text{price}_i + u_i$, where $u_i$ is the error term. How can the other two models be differentiated from…
Harry
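To make the demand-price example concrete: in the statistical model $\text{demand}_i = a + b\,\text{price}_i + u_i$, the parameters $a$ and $b$ are estimated from data, and the random term $u_i$ is what makes the relationship stochastic rather than deterministic. A minimal sketch fitting it by ordinary least squares; the prices and demands are made-up illustrative numbers and the function names are hypothetical:

```python
from typing import List, Sequence, Tuple


def fit_demand_ols(price: Sequence[float], demand: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares estimates (a_hat, b_hat) for demand_i = a + b*price_i + u_i."""
    n = len(price)
    p_bar = sum(price) / n
    d_bar = sum(demand) / n
    s_pd = sum((p - p_bar) * (d - d_bar) for p, d in zip(price, demand))
    s_pp = sum((p - p_bar) ** 2 for p in price)
    b_hat = s_pd / s_pp
    a_hat = d_bar - b_hat * p_bar
    return a_hat, b_hat


def residuals(price: Sequence[float], demand: Sequence[float], a_hat: float, b_hat: float) -> List[float]:
    """Estimated errors u_hat_i = demand_i - (a_hat + b_hat * price_i)."""
    return [d - (a_hat + b_hat * p) for p, d in zip(price, demand)]


# Made-up data: demand = 10 - 2*price plus a small random-looking disturbance.
price = [1.0, 2.0, 3.0, 4.0]
demand = [8.1, 5.9, 3.9, 2.1]
a_hat, b_hat = fit_demand_ols(price, demand)
u_hat = residuals(price, demand, a_hat, b_hat)
```

The residuals $\hat u_i$ are estimates of the unobserved errors $u_i$, not the errors themselves.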
4 votes, 2 answers

Problem with calculating $R^2$

I believe I have a rather simple question, but I would like to get it right. I have already asked a question; however, I am not sure whether I did everything correctly or there is a mistake in the answer (probably the former, but I still cannot find…
mkk
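The excerpt does not show the calculation being questioned, so the following is only the standard definition, $R^2 = 1 - SS_{\text{res}}/SS_{\text{tot}}$, as a minimal sketch (the name `r_squared` is hypothetical):

```python
from typing import Sequence


def r_squared(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares about the mean
    return 1.0 - ss_res / ss_tot
```

This value equals the squared correlation between $y$ and $\hat y$ only for a least-squares fit with an intercept, evaluated on its own training data; for other fits, or on held-out data, it can even be negative.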
3 votes, 1 answer

Detecting a peak in an averaged profile

I have a very long time series (about a gigabyte in ASCII format) that looks like this:
1, 0.5
2, 0.52
3, 0.3
...
The points occur at integer time points and are predominantly 1 second apart. A small proportion are missing. The series is known…
Henry B.
2 votes, 2 answers

How can I measure which function is a better fit to a set of data points?

The problem is from here. Namely, I have a set of data points data = [[90.00, 2.0], [97.40, 5.0], [104.8, 14.0], [112.2, 12.0], [119.6, 11.0], [127.0, 6.0], [134.4, 3.0], [141.8, 1.0], [149.2, 2.0], [156.6, 1.0]]. I have to fit a curve of the form…
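One common approach, sketched here as an assumption rather than the intended solution: compare candidate curves by their residual sum of squares (RSS) on the same data, and when the curves have different numbers of fitted parameters $k$, penalize complexity, for example with the Gaussian-error AIC $n \ln(\text{RSS}/n) + 2k$ (additive constants dropped). The candidate curve forms are elided in the excerpt, so the sketch scores only a constant baseline; the helper names are hypothetical:

```python
import math
from typing import Callable, Sequence


def rss(data: Sequence[Sequence[float]], f: Callable[[float], float]) -> float:
    """Residual sum of squares of a curve f over (x, y) pairs."""
    return sum((y - f(x)) ** 2 for x, y in data)


def gaussian_aic(rss_value: float, n: int, k: int) -> float:
    """AIC of a least-squares fit with k fitted parameters, assuming i.i.d. Gaussian errors.

    Constants shared by all models on the same data are dropped, so only differences
    between models matter. Requires rss_value > 0.
    """
    return n * math.log(rss_value / n) + 2 * k


data = [[90.00, 2.0], [97.40, 5.0], [104.8, 14.0], [112.2, 12.0], [119.6, 11.0],
        [127.0, 6.0], [134.4, 3.0], [141.8, 1.0], [149.2, 2.0], [156.6, 1.0]]

# Hypothetical baseline: a constant curve at the mean of y (k = 1). Each candidate
# curve would be fitted first, then scored the same way; lower AIC is preferred.
y_mean = sum(y for _, y in data) / len(data)
baseline_aic = gaussian_aic(rss(data, lambda x: y_mean), len(data), k=1)
```

When the candidates have the same number of parameters, comparing RSS alone gives the same ranking.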
2 votes, 1 answer

How to model whether or not a city is thriving?

I'm working on a concept for a game that requires some statistical inference, and I'm not sure how to go about it. My issue is that I'm trying to come up with a way to calculate whether a city (in the game) could evolve into a thriving city. I have a bunch of…
webber
2 votes, 1 answer

Is altering predictions before model evaluation allowed?

I am analyzing a data set for tree volume prediction. I'm using regularized least squares as my prediction model, and RMSE with cross-validation to evaluate it. Currently, I have simply used cross-validation for…
jjepsuomi
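A minimal sketch of the evaluation described in this excerpt: k-fold cross-validated RMSE for a regularized (ridge) least-squares fit. To stay short it assumes a single predictor with an unpenalized intercept; the fold scheme (every k-th point) and the names `fit_ridge_1d` and `cv_rmse` are hypothetical choices, and real data would usually be shuffled before splitting:

```python
import math
from typing import Sequence, Tuple


def fit_ridge_1d(x: Sequence[float], y: Sequence[float], lam: float) -> Tuple[float, float]:
    """Ridge fit of y = a + b*x with penalty lam * b**2; the intercept a is not penalized."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / (sxx + lam)  # shrinks toward 0 as lam grows; lam = 0 gives ordinary least squares
    return y_bar - b * x_bar, b


def cv_rmse(x: Sequence[float], y: Sequence[float], lam: float, k: int) -> float:
    """k-fold cross-validated RMSE; fold j holds out indices j, j + k, j + 2k, ..."""
    n = len(x)
    sq_err = 0.0
    for j in range(k):
        held_out = set(range(j, n, k))
        train = [i for i in range(n) if i not in held_out]
        a, b = fit_ridge_1d([x[i] for i in train], [y[i] for i in train], lam)
        sq_err += sum((y[i] - (a + b * x[i])) ** 2 for i in held_out)  # score unseen points only
    return math.sqrt(sq_err / n)  # every point is held out exactly once
```

The penalty `lam` would typically be chosen by comparing `cv_rmse` over a grid of values.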
2 votes, 2 answers

What to do when independent variables explain one another, but both explain the response?

This is a hypothetical question; I am asking out of curiosity. Say we have data on how fast a person can run 100 meters. We have also measured this person's weight, as well as how cold it was or what season it was (let's say we…
Marke