Tags - Statistical Analysis Stack Exchange

regression

Techniques for analyzing the relationship between one (or more) "dependent" variables and "independent" variables.

29789 questions
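As an illustration of the [regression] tag, here is a minimal sketch of the simplest case: ordinary least squares with one independent variable, using the closed-form slope and intercept. The function name `fit_simple_ols` and the toy data are hypothetical, chosen only for illustration.

```python
# Simple (one-predictor) ordinary least squares via the closed-form solution.
def fit_simple_ols(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


if __name__ == "__main__":
    b0, b1 = fit_simple_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
    print(f"intercept={b0:.3f}, slope={b1:.3f}")  # intercept=2.200, slope=0.600
```

With several predictors the same least-squares idea is usually solved with matrix methods rather than this scalar formula.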

r

Use this tag for any *on-topic* question that (a) involves `R` either as a critical part of the question or expected answer, and (b) is not *just* about how to use `R`.

29737 questions

machine-learning

Machine learning algorithms build a model of the training data. The term "machine learning" is vaguely defined; it includes what is also called statistical learning, reinforcement learning, unsupervised learning, etc. ALWAYS ADD A MORE SPECIFIC TAG.

20090 questions

time-series

Time series are data observed over time (either in continuous time or at discrete time periods).

14308 questions
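A quantity that distinguishes time series from independent observations is autocorrelation: values close together in time tend to be related. A minimal sketch of the usual (biased) sample autocorrelation at a single lag; the function name `sample_acf` is hypothetical.

```python
def sample_acf(series, lag):
    """Sample autocorrelation at a given lag (standard biased estimator)."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    dev = [v - mean for v in series]
    # Denominator: total sum of squared deviations (n times the lag-0 autocovariance)
    denom = sum(d * d for d in dev)
    # Numerator: products of deviations that are `lag` steps apart
    num = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return num / denom


if __name__ == "__main__":
    print(sample_acf([1, 2, 3, 4, 5], 1))
```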

probability

A probability provides a quantitative description of the likely occurrence of a particular event.

12556 questions

hypothesis-testing

Hypothesis testing assesses whether data are inconsistent with a given hypothesis, that is, whether an observed pattern is unlikely to be explained by random fluctuations alone.

10751 questions
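One concrete instance of the [hypothesis-testing] idea is an exact permutation test. The sketch below assumes the null hypothesis that group labels are exchangeable (no difference between groups), enumerates every relabelling of the pooled data, and reports the fraction at least as extreme as the observed difference in means. It is one test among many, chosen here for transparency; the function name is hypothetical.

```python
from itertools import combinations


def permutation_test_mean_diff(a, b):
    """Exact two-sided permutation test for a difference in means.

    Enumerates every split of the pooled data into groups of the original
    sizes and returns the fraction of splits whose absolute mean difference
    is at least as large as the observed one (the p-value).
    """
    pooled = list(a) + list(b)
    n_a = len(a)
    observed = abs(sum(a) / n_a - sum(b) / len(b))
    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        g1 = [pooled[i] for i in idx]
        g2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = abs(sum(g1) / len(g1) - sum(g2) / len(g2))
        total += 1
        # Small tolerance guards against floating-point ties
        if diff >= observed - 1e-12:
            extreme += 1
    return extreme / total


if __name__ == "__main__":
    print(permutation_test_mean_diff([12, 14, 15, 16], [8, 9, 10, 11]))
```

Here only the observed split and its mirror image are as extreme, so the p-value is 2/70 out of the 70 possible splits. Full enumeration is only feasible for small samples; larger problems typically use random permutations instead.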

distributions

A distribution is a mathematical description of probabilities or frequencies.

9560 questions

self-study

A routine exercise designed to test one's knowledge; often from a textbook, course, or test used for a class or self-study. This community's policy is to "provide helpful hints" for such questions rather than complete answers.

8136 questions

neural-networks

Artificial neural networks (ANNs) are a broad class of computational models loosely based on biological neural networks. They encompass feedforward NNs (including "deep" NNs), convolutional NNs, recurrent NNs, etc.

8062 questions

bayesian

Bayesian inference is a method of statistical inference that relies on treating the model parameters as random variables and applying Bayes' theorem to deduce subjective probability statements about the parameters or hypotheses, conditional on the observed dataset.

8003 questions
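A standard small example of applying Bayes' theorem to a parameter is the conjugate Beta-Binomial model: with a Beta(alpha, beta) prior on a success probability and k successes in n trials, the posterior is Beta(alpha + k, beta + n - k). A minimal sketch; the function names are hypothetical.

```python
def beta_binomial_update(alpha, beta, successes, trials):
    """Posterior Beta parameters after observing binomial data.

    With a Beta(alpha, beta) prior on p and `successes` out of `trials`,
    Bayes' theorem gives a Beta(alpha + successes, beta + trials - successes)
    posterior.
    """
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return alpha + successes, beta + trials - successes


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


if __name__ == "__main__":
    # Uniform Beta(1, 1) prior, then 7 successes in 10 trials
    a_post, b_post = beta_binomial_update(1, 1, 7, 10)
    print(a_post, b_post, beta_mean(a_post, b_post))  # 8 4 0.6666666666666666
```

Conjugacy keeps the update in closed form; non-conjugate models generally need numerical methods such as MCMC.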

logistic

Refers generally to statistical procedures that utilize the logistic function, most commonly various forms of logistic regression.

7842 questions
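The logistic function maps any real number to (0, 1), and its inverse, the logit, maps a probability to log-odds; in logistic regression the linear predictor is passed through the logistic function to give a probability. A minimal, numerically careful sketch (the function names are ordinary, not taken from a specific library):

```python
import math


def logistic(z):
    """Standard logistic function 1 / (1 + exp(-z)), mapping reals to (0, 1)."""
    # Branch on sign so that exp() is never called with a large positive argument
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logit(p):
    """Inverse of the logistic function: the log-odds log(p / (1 - p))."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


if __name__ == "__main__":
    # Hypothetical coefficients b0 = -1.0, b1 = 0.5 evaluated at x = 4
    print(logistic(-1.0 + 0.5 * 4))
```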

mathematical-statistics

Mathematical theory of statistics, concerned with formal definitions and general results.

7765 questions

classification

Statistical classification is the problem of identifying the sub-population to which new observations belong, where the identity of the sub-population is unknown, on the basis of a training set of data containing observations whose sub-population is known. Because a classifier is learned from a sample, its assignments are subject to sampling variability and error, which can be studied with statistical methods.

6881 questions

mixed-model

Mixed (aka multilevel or hierarchical) models are linear models that include both fixed effects and random effects. They are used to model longitudinal or nested data.

6418 questions

statistical-significance

Statistical significance is a characteristic of a statistic viewed in light of a null hypothesis and a given significance level. It reflects whether the statistic belongs to the rejection region (is statistically significant) or the acceptance region (is not statistically significant).

6388 questions

correlation

A measure of the degree of association between a pair of variables.

6327 questions
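The most common such measure is Pearson's r: the covariance of the two variables divided by the product of their standard deviations. It captures linear association only; rank-based measures such as Spearman's rho also fall under this tag. A minimal sketch with a hypothetical function name:

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # The 1/(n-1) factors in covariance and variances cancel out
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    print(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```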

normal-distribution

The normal, or Gaussian, distribution has a density function that is a symmetrical bell-shaped curve. It is one of the most important distributions in statistics. Use the [normality] tag for asking about testing for normality.

6091 questions
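The bell-shaped density can be written directly from its formula, exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi)). A minimal sketch, cross-checked against `statistics.NormalDist` from the Python standard library (available since Python 3.8); `normal_pdf` itself is a hypothetical helper.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    # Compare the hand-written density with the standard library's
    for x in (-2.0, 0.0, 1.5):
        print(x, normal_pdf(x, 1.0, 2.0), NormalDist(1.0, 2.0).pdf(x))
```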

multiple-regression

Regression that includes two or more non-constant independent variables.

5564 questions

anova

ANOVA stands for ANalysis Of VAriance, a statistical model and set of procedures for comparing multiple group means. The independent variables in an ANOVA model are categorical, but an ANOVA table can be used to test continuous variables as well.

5330 questions
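The core of a one-way ANOVA is the F statistic: the between-group mean square divided by the within-group mean square, with k - 1 and n - k degrees of freedom for k groups and n observations. A minimal sketch that computes only the statistic (the p-value needs the F distribution, which the standard library lacks; SciPy's `scipy.stats.f_oneway` is one option). The function name is hypothetical.

```python
def one_way_anova_f(*groups):
    """One-way ANOVA F statistic: between-group mean square / within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


if __name__ == "__main__":
    print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```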

python

Python is a programming language commonly used for machine learning. Use this tag for any *on-topic* question that (a) involves `Python` either as a critical part of the question or expected answer, and (b) is not *just* about how to use `Python`.

4791 questions

confidence-interval

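A confidence interval is an interval computed from the data by a procedure that, over repeated samples, covers an unknown parameter $100(1-\alpha)\%$ of the time; this long-run coverage rate is the interval's confidence level. Confidence intervals are a frequentist concept. They are often confused with credible intervals, which are the Bayesian analog.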

4581 questions
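The "confidence" refers to the long-run coverage of the procedure, not to any single computed interval. A minimal simulation sketch, assuming normally distributed data with a *known* standard deviation (so the simple z-interval applies): repeatedly draw samples, build a 95% interval each time, and count how often it contains the true mean. Function names, parameters, and the seed are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def z_interval(sample, sigma, alpha=0.05):
    """Interval mean +/- z_{1-alpha/2} * sigma / sqrt(n), assuming sigma is known."""
    n = len(sample)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / n ** 0.5
    m = fmean(sample)
    return m - half_width, m + half_width


def coverage(mu=10.0, sigma=2.0, n=25, reps=2000, seed=1):
    """Fraction of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lo, hi = z_interval(sample, sigma)
        hits += lo <= mu <= hi
    return hits / reps


if __name__ == "__main__":
    print(coverage())  # should be close to 0.95
```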

generalized-linear-model

A generalization of linear regression allowing for nonlinear relationships via a "link function" and for the variance of the response to depend on the predicted value. (Not to be confused with "general linear model" which extends the ordinary linear model to general covariance structure and multivariate response.)

4578 questions

variance

The expected squared deviation of a random variable from its mean; or, the average squared deviation of data about their mean.

4239 questions
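The two definitions in the tag differ only in the divisor: dividing the sum of squared deviations by n gives the average squared deviation of the data themselves, while dividing by n - 1 gives the usual unbiased estimate of a population variance from a sample. A minimal sketch (hypothetical function name), checked against the standard library's `statistics.pvariance` and `statistics.variance`:

```python
import statistics


def variance(data, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or n (population)."""
    n = len(data)
    if n < (2 if sample else 1):
        raise ValueError("not enough data points")
    m = sum(data) / n
    ss = sum((x - m) ** 2 for x in data)
    return ss / (n - 1 if sample else n)


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]
    print(variance(data, sample=False), statistics.pvariance(data))
    print(variance(data), statistics.variance(data))
```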

clustering

Cluster analysis is the task of partitioning data into subsets of objects according to their mutual "similarity," without using preexisting knowledge such as class labels. [Clustered-standard-errors and/or cluster-samples should be tagged as such; do NOT use the "clustering" tag for them.]

4021 questions
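k-means is one widely used clustering method (among many). A minimal sketch of Lloyd's algorithm on one-dimensional data, assuming the initial centers are supplied by the caller: alternate between assigning each point to its nearest center and moving each center to the mean of its points, until the centers stop changing. The function name is hypothetical.

```python
def kmeans_1d(points, centers, max_iter=100):
    """Lloyd's algorithm for k-means on 1-D data, starting from the given centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        new_centers = [
            sum(c) / len(c) if c else centers[j]  # leave an empty cluster's center in place
            for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


if __name__ == "__main__":
    print(kmeans_1d([1.0, 1.5, 2.0, 10.0, 10.5, 11.0], [0.0, 5.0]))
```

The result depends on the starting centers; practical implementations usually try several initializations.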

forecasting

Prediction of future events. It is a special case of [prediction], in the context of [time-series].

3878 questions
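One of the simplest forecasting methods is simple exponential smoothing: a level is updated as l_t = alpha * y_t + (1 - alpha) * l_{t-1}, and the final level is the forecast for future periods. A minimal sketch, assuming the level is initialized at the first observation (other initializations exist); the function name is hypothetical.

```python
def ses_forecast(series, alpha):
    """One-step-ahead forecast from simple exponential smoothing."""
    if not series:
        raise ValueError("series must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # assumption: initialize the level at the first observation
    for y in series[1:]:
        # Weighted average of the newest observation and the previous level
        level = alpha * y + (1.0 - alpha) * level
    return level


if __name__ == "__main__":
    print(ses_forecast([10, 12, 11, 13], 0.5))
```

With alpha = 1 the method reduces to the naive forecast (the last observed value); smaller alpha averages over more history.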

t-test

A test for comparing the means of two samples, or the mean of one sample (or even parameter estimates) with a specified value; also known as "Student's t-test", after "Student", the pseudonym of its inventor, William Sealy Gosset.

3674 questions
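For the one-sample case the statistic is t = (mean - mu0) / (s / sqrt(n)), where s is the sample standard deviation, with n - 1 degrees of freedom. A minimal sketch computing only the statistic; the p-value requires the t distribution (for example via SciPy's `scipy.stats.ttest_1samp`). The function name is hypothetical.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least 2 observations")
    s = statistics.stdev(sample)  # sample SD, divides by n - 1
    t = (statistics.fmean(sample) - mu0) / (s / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    print(one_sample_t([5.1, 4.9, 5.3, 5.0, 5.2], 5.0))
```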

categorical-data

Categorical (also called nominal) data can take on a limited number of possible values called categories. Categorical values "label"; they do not "measure". Please use the [ordinal-data] tag for discrete but ordered data types.

3575 questions

cross-validation

Repeatedly withholding subsets of the data during model fitting in order to quantify the model performance on the withheld data subsets.

3473 questions
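A minimal k-fold cross-validation sketch: shuffle the indices, split them into k folds, fit on k - 1 folds, and score on the withheld fold. To stay self-contained, the "model" here is a deliberately trivial hypothetical one that predicts the training mean; in practice any fitting routine would take its place. Function names and the seed are hypothetical.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_mse_of_mean_model(y, k=5, seed=0):
    """k-fold CV estimate of squared error for a model that predicts the training mean."""
    errors = []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train = [y[i] for i in range(len(y)) if i not in held_out]
        prediction = sum(train) / len(train)  # "fit" uses the training folds only
        errors.extend((y[i] - prediction) ** 2 for i in test_idx)
    return sum(errors) / len(errors)


if __name__ == "__main__":
    print(cv_mse_of_mean_model([3.1, 2.9, 3.4, 3.0, 2.7, 3.3, 3.2, 2.8, 3.5, 3.0]))
```

The key design point is that the held-out fold never influences the fit it is used to evaluate.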

lme4-nlme

lme4 and nlme are R packages used for fitting linear, generalized linear and nonlinear mixed effects models. For general questions about mixed models use the [mixed-model] tag.

3457 questions

pca

Principal component analysis (PCA) is a linear dimensionality reduction technique. It reduces a multivariate dataset to a smaller set of constructed variables preserving as much information (as much variance) as possible. These variables, called principal components, are linear combinations of the input variables.

3419 questions
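For two variables, PCA can be done by hand: the principal components are the eigenvectors of the 2x2 sample covariance matrix, and each eigenvalue is the variance along its component. A minimal sketch using the closed-form eigen-decomposition of a symmetric 2x2 matrix; real analyses typically use an SVD-based routine (for example `numpy.linalg.svd` or scikit-learn's `PCA`). The function name is hypothetical, and the sign of the returned direction is arbitrary.

```python
import math


def pca_2d(points):
    """Return ((lam1, lam2), first_component) for 2-D data, lam1 >= lam2."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least 2 points")
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance matrix [[a, b], [b, c]]
    a = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    c = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix: half-trace +/- radius
    half_trace = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b * b)
    lam1, lam2 = half_trace + radius, half_trace - radius
    # Eigenvector for lam1 solves (A - lam1 I) v = 0; axis-aligned if b == 0
    if abs(b) > 1e-12:
        v = (lam1 - c, b)
    else:
        v = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(*v)
    return (lam1, lam2), (v[0] / norm, v[1] / norm)


if __name__ == "__main__":
    (l1, l2), pc1 = pca_2d([(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0)])
    print(pc1, "share of variance:", l1 / (l1 + l2))
```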

maximum-likelihood

A method of estimating the parameters of a statistical model by choosing the parameter values that maximize the probability (or probability density) of observing the given sample.

3367 questions
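A minimal sketch for the simplest case: k successes in n Bernoulli trials. The log-likelihood is k log p + (n - k) log(1 - p), and maximizing it numerically over a grid recovers the closed-form estimate k / n. The grid search is only for illustration (practical code would use calculus or an optimizer), and the function names are hypothetical.

```python
import math


def bernoulli_log_likelihood(p, successes, trials):
    """log L(p) = k log p + (n - k) log(1 - p), omitting the constant binomial coefficient."""
    return successes * math.log(p) + (trials - successes) * math.log(1.0 - p)


def mle_by_grid(successes, trials, steps=1000):
    """Maximize the log-likelihood over the grid p = 1/steps, ..., (steps-1)/steps."""
    grid = (i / steps for i in range(1, steps))
    return max(grid, key=lambda p: bernoulli_log_likelihood(p, successes, trials))


if __name__ == "__main__":
    print(mle_by_grid(7, 10), 7 / 10)
```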

survival

Survival analysis models time-to-event data, typically time to death or time to failure. Censored observations are a common complication in survival analyses.

3306 questions
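The Kaplan-Meier estimator shows how censoring is handled: at each distinct event time, survival is multiplied by (1 - deaths / number at risk), and right-censored subjects count as at risk up to their censoring time and then drop out. A minimal sketch; the function name and data are hypothetical, and subjects censored at the same time as an event are treated as still at risk at that time (the usual convention).

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  observed time for each subject
    events: 1 if the event (e.g. death) was observed, 0 if right-censored
    Returns a list of (event_time, S(t)) at each distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have equal length")
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Subjects still under observation just before t
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
    return curve


if __name__ == "__main__":
    print(kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]))
```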

estimation

This tag is too general; please provide a more specific tag. For questions about the properties of specific estimators, use the [estimators] tag instead.

3251 questions

sampling

Creating samples from a well-specified population using a probabilistic method and/or producing random numbers from a specified distribution. As this tag is ambiguous, please consider [survey-sampling] for the former and [monte-carlo] or [simulation] for the latter. For questions regarding creating random samples from known distributions, please consider using the [random-generation] tag.

3244 questions

predictive-models

Predictive models are statistical models whose primary purpose is to predict other observations of a system optimally, as opposed to models whose purpose is to test a particular hypothesis or explain a phenomenon mechanistically. As such, predictive models place less emphasis on interpretability and more emphasis on performance.

3152 questions

data-visualization

Constructing and interpreting meaningful and useful graphical representations of data. (If your question is only about how to get particular software to produce a specific effect, then it is likely not on topic here.)

3068 questions