Tags
A tag is a keyword or label that categorizes your question with other, similar questions.
Techniques for analyzing the relationship between one (or more) "dependent" variables and "independent" variables.
29789 questions
Use this tag for any *on-topic* question that (a) involves `R` either as a critical part of the question or expected answer, & (b) is not *just* about how to use `R`.
29737 questions
Machine learning algorithms build a model of the training data. The term "machine learning" is vaguely defined; it includes what is also called statistical learning, reinforcement learning, unsupervised learning, etc. ALWAYS ADD A MORE SPECIFIC TAG.
20090 questions
Time series are data observed over time (either in continuous time or at discrete time periods).
14308 questions
A probability provides a quantitative description of the likely occurrence of a particular event.
12556 questions
Hypothesis testing assesses whether data are inconsistent with a given hypothesis, or whether the observed discrepancy could plausibly be an effect of random fluctuations.
10751 questions
A distribution is a mathematical description of probabilities or frequencies.
9560 questions
A routine exercise designed to test one's knowledge; often from a textbook, course, or test used for a class or self-study. This community's policy is to "provide helpful hints" for such questions rather than complete answers.
8136 questions
Artificial neural networks (ANNs) are a broad class of computational models loosely based on biological neural networks. They encompass feedforward NNs (including "deep" NNs), convolutional NNs, recurrent NNs, etc.
8062 questions
Bayesian inference is a method of statistical inference that relies on treating the model parameters as random variables and applying Bayes' theorem to deduce subjective probability statements about the parameters or hypotheses, conditional on the observed dataset.
8003 questions
Refers generally to statistical procedures that utilize the logistic function, most commonly various forms of logistic regression.
7842 questions
Mathematical theory of statistics, concerned with formal definitions and general results.
7765 questions
Statistical classification is the problem of identifying the sub-population to which new observations belong, where the identity of the sub-population is unknown, on the basis of a training set of data containing observations whose sub-population is known. Because the resulting classifications vary from sample to sample, their behavior can be studied statistically.
6881 questions
Mixed (aka multilevel or hierarchical) models are linear models that include both fixed effects and random effects. They are used to model longitudinal or nested data.
6418 questions
Statistical significance is a characteristic of a statistic viewed in light of a null hypothesis and a given significance level. It reflects whether the statistic belongs to the rejection region (is statistically significant) or the acceptance region (is not statistically significant).
6388 questions
The normal, or Gaussian, distribution has a density function that is a symmetrical bell-shaped curve. It is one of the most important distributions in statistics. Use the [normality] tag for asking about testing for normality.
6091 questions
Regression that includes two or more non-constant independent variables.
5564 questions
ANOVA stands for ANalysis Of VAriance, a statistical model and set of procedures for comparing multiple group means. The independent variables in an ANOVA model are categorical, but an ANOVA table can be used to test continuous variables as well.
5330 questions
Python is a programming language commonly used for machine learning. Use this tag for any *on-topic* question that (a) involves `Python` either as a critical part of the question or expected answer, & (b) is not *just* about how to use `Python`.
4791 questions
A confidence interval is an interval that covers an unknown parameter with $100(1-\alpha)\%$ confidence. Confidence intervals are a frequentist concept. They are often confused with credible intervals, which are the Bayesian analog.
4581 questions
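As a rough illustration of the confidence interval description above, here is a minimal sketch of a $100(1-\alpha)\%$ interval for a mean. It assumes a normal approximation (a z quantile rather than a t quantile), which is only reasonable for fairly large samples; the function name `mean_confidence_interval` and the sample values are made up for this example.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(xs, alpha=0.05):
    """Approximate 100(1 - alpha)% confidence interval for the mean.

    Assumption: uses a standard normal quantile, so it is a large-sample
    approximation, not an exact small-sample (Student t) interval.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)       # two-sided critical value
    half_width = z * stdev(xs) / sqrt(len(xs))    # z times the standard error
    centre = mean(xs)
    return centre - half_width, centre + half_width

# Hypothetical data, for illustration only.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
low, high = mean_confidence_interval(data)               # 95% interval
low99, high99 = mean_confidence_interval(data, 0.01)     # 99% interval
```

A smaller `alpha` demands higher confidence and therefore produces a wider interval.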
A generalization of linear regression allowing for nonlinear relationships via a "link function" and for the variance of the response to depend on the predicted value. (Not to be confused with "general linear model" which extends the ordinary linear model to general covariance structure and multivariate response.)
4578 questions
The expected squared deviation of a random variable from its mean; or, the average squared deviation of data about their mean.
4239 questions
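The two readings of variance in the description above (average squared deviation of data about their mean, versus an estimate for a random variable) can be sketched as follows. This is an illustrative example with made-up data; `population_variance` is a hypothetical helper, while `statistics.pvariance` and `statistics.variance` are Python standard-library functions.

```python
import statistics

def population_variance(xs):
    """Average squared deviation of the data about their mean (divides by n)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Hypothetical data, for illustration only.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

pop_var = population_variance(data)        # divides by n
lib_pop_var = statistics.pvariance(data)   # same quantity from the library
sample_var = statistics.variance(data)     # divides by n - 1 (sample variance)
```

The sample version divides by n - 1 rather than n, so it is always slightly larger than the population version on the same data.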
Cluster analysis is the task of partitioning data into subsets of objects according to their mutual "similarity," without using preexisting knowledge such as class labels. [Clustered-standard-errors and/or cluster-samples should be tagged as such; do NOT use the "clustering" tag for them.]
4021 questions
Prediction of future events. It is a special case of [prediction], in the context of [time-series].
3878 questions
A test for comparing the means of two samples, or the mean of one sample (or even parameter estimates) with a specified value; also known as the "Student t-test" after the pseudonym of its inventor.
3674 questions
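For the one-sample case mentioned in the t-test description above, the test statistic can be sketched as below. This only computes the statistic; turning it into a p-value requires the Student t distribution with n - 1 degrees of freedom, which is not included here. The function name and data are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t_statistic(xs, mu0):
    """t = (sample mean - mu0) / (sample standard deviation / sqrt(n))."""
    standard_error = stdev(xs) / sqrt(len(xs))
    return (mean(xs) - mu0) / standard_error

# Hypothetical data, for illustration only.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
t_at_mean = one_sample_t_statistic(data, mu0=5.0)   # mu0 equals the sample mean
t_shifted = one_sample_t_statistic(data, mu0=4.0)
```

When the hypothesized value equals the sample mean the statistic is zero, and it grows in magnitude as the hypothesized value moves away from the sample mean.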
Categorical (also called nominal) data can take on a limited number of possible values called categories. Categorical values "label"; they do not "measure". Please use the [ordinal-data] tag for discrete but ordered data types.
3575 questions
Repeatedly withholding subsets of the data during model fitting in order to quantify the model performance on the withheld data subsets.
3473 questions
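The withholding procedure described above can be sketched as a plain k-fold loop. This is a simplified illustration, not a reference implementation: the "model" is just the mean of the training values, the folds are contiguous and unshuffled, and both function names are hypothetical.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validated_mse(ys, k):
    """Average held-out squared error of a mean-only predictor over k folds."""
    fold_errors = []
    for held_out in k_fold_indices(len(ys), k):
        held = set(held_out)
        # Fit on everything except the withheld fold.
        train = [y for i, y in enumerate(ys) if i not in held]
        prediction = sum(train) / len(train)
        # Score only on the withheld fold.
        errors = [(ys[i] - prediction) ** 2 for i in held_out]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / k

# Hypothetical data, for illustration only.
score = cross_validated_mse([1.0, 2.0, 3.0, 4.0], k=2)
```

In practice the data are usually shuffled before splitting, since contiguous folds can be misleading when the observations are ordered.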
lme4 and nlme are R packages used for fitting linear, generalized linear and nonlinear mixed effects models. For general questions about mixed models use the [mixed-model] tag.
3457 questions
Principal component analysis (PCA) is a linear dimensionality reduction technique. It reduces a multivariate dataset to a smaller set of constructed variables preserving as much information (as much variance) as possible. These variables, called principal components, are linear combinations of the input variables.
3419 questions
A method of estimating the parameters of a statistical model by choosing the parameter value that maximizes the probability of observing the given sample.
3367 questions
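To make the maximum likelihood description above concrete, here is a small sketch for a Bernoulli model, where the estimate is found by a brute-force grid search over candidate values. The counts (7 successes in 10 trials), the grid resolution, and the function name are assumptions chosen for illustration; for this model the maximizer is known in closed form as successes / trials, which the search should recover.

```python
from math import log

def bernoulli_log_likelihood(p, successes, trials):
    """Log-probability of the observed sample for success probability p."""
    return successes * log(p) + (trials - successes) * log(1 - p)

# Hypothetical sample, for illustration only.
successes, trials = 7, 10

# Candidate parameter values strictly between 0 and 1.
grid = [i / 1000 for i in range(1, 1000)]

# The maximum likelihood estimate is the candidate with the highest likelihood.
p_hat = max(grid, key=lambda p: bernoulli_log_likelihood(p, successes, trials))
closed_form = successes / trials
```

Working with the log-likelihood instead of the likelihood does not change the maximizer and avoids numerical underflow for larger samples.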
Survival analysis models time to event data, typically time to death or failure time. Censored data are a common problem for survival analyses.
3306 questions
This tag is too general; please provide a more specific tag. For questions about the properties of specific estimators, use the [estimators] tag instead.
3251 questions
Creating samples from a well-specified population using a probabilistic method and/or producing random numbers from a specified distribution. As this tag is ambiguous, please consider [survey-sampling] for the former and [monte-carlo] or [simulation] for the latter. For questions regarding creating random samples from known distributions, please consider using the [random-generation] tag.
3244 questions
Predictive models are statistical models whose primary purpose is to predict other observations of a system optimally, as opposed to models whose purpose is to test a particular hypothesis or explain a phenomenon mechanistically. As such, predictive models place less emphasis on interpretability and more emphasis on performance.
3152 questions
Constructing and interpreting meaningful and useful graphical representations of data. (If your question is only about how to get particular software to produce a specific effect, then it is likely not on topic here.)
3068 questions