## Tuple

In mathematics, a tuple is a finite sequence of objects. Sometimes, the finite sequence is also called an ordered list. This means that the order of the objects matters. In a tuple, the objects are either enclosed within parentheses ...
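
Most programming languages have tuples too. A minimal Python sketch of the idea (the variable names are just for illustration):

```python
# Tuples in Python: ordered, fixed-length sequences. Order matters,
# so (3, 4) and (4, 3) are different tuples.
point = (3, 4)      # a 2-tuple, i.e. an ordered pair
swapped = (4, 3)

print(point == swapped)  # → False, because order matters
print(point[0])          # → 3, elements are read by position
```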

## Uncountable set

An uncountable set is an infinite set that is impossible to count. If we try to count the elements, we will always skip some. It does not matter what size step we take. The set of real numbers, often written as ℝ, is ...

## Venn diagram

A Venn diagram is a diagram that shows the logical relation between sets. They were popularised by John Venn in the 1880s, and are now widely used. They are used to teach elementary set theory, and to illustrate simple set relationships in probab ...

## Zermelo–Fraenkel set theory

Zermelo–Fraenkel set theory is a system of axioms used to describe set theory. When the axiom of choice is added to ZF, the system is called ZFC. It is the system of axioms used in set theory by most mathematicians today. After Russell's paradox w ...

## Regression toward the mean

Regression toward the mean simply means that, following an extreme random event, the next random event is likely to be less extreme. Regression toward the mean was first described by Francis Galton. He found that offspring of tall parents tended ...

## Central limit theorem

In probability theory and statistics, the central limit theorems, abbreviated as CLT, are theorems about the limiting behaviors of aggregated probability distributions. They say that given a large number of independent random variables, their sum ...
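
A small simulation can illustrate this. The sketch below (a made-up example using sums of uniform random numbers, with an arbitrary seed) shows the mean and variance of the aggregated sums settling near the values the CLT predicts:

```python
import random

random.seed(42)  # arbitrary seed, for reproducibility

# Sum n independent uniform(0, 1) variables many times. By the CLT the
# sums are approximately normal with mean n/2 and variance n/12.
n, trials = 100, 10_000
sums = [sum(random.random() for _ in range(n)) for _ in range(trials)]

mean = sum(sums) / trials
var = sum((s - mean) ** 2 for s in sums) / trials
print(round(mean, 1))  # close to n/2 = 50
print(round(var, 2))   # close to n/12 ≈ 8.33
```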

## Chi-squared test

A chi-squared test is a statistical hypothesis test. It usually tests the null hypothesis that "the experimental data does not differ from untreated data". The distribution of the test statistic is a chi-squared distribution w ...
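
As a sketch, the test statistic can be computed by hand; the die counts below are made up for illustration:

```python
# Pearson's chi-squared statistic: sum of (observed - expected)^2 / expected.
# Made-up example: 100 rolls of a die, tested against a fair-die null.
observed = [16, 18, 16, 14, 12, 24]
expected = [100 / 6] * 6

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2, 2))  # → 5.12
```

With 5 degrees of freedom, 5.12 is below the usual 5%-level cutoff of about 11.07, so this made-up sample would not lead to rejecting the fair-die hypothesis.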

## Confidence interval

In statistics, a confidence interval, abbreviated as CI, is a special interval for estimating a certain parameter, such as the population mean. With this method, a whole interval of acceptable values for the parameter is given instead of a single ...

## Coupling constant

Every force has a coupling constant, which is a measure of its strength in an interaction. The coupling constant determines the chances of one particle to emit or absorb another particle. In electromagnetism, for example, the coupling constant is ...

## Dunn index

The Dunn Index is a metric for judging a clustering algorithm. A higher DI implies better clustering. It assumes that better clustering means that clusters are compact and well-separated from other clusters. There are many ways to define the size ...
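
A minimal sketch of the definition, using made-up one-dimensional clusters and one common choice of distance (plain absolute difference) among the many possible definitions of separation and size:

```python
# Dunn index = (minimum distance between points in different clusters)
# divided by (maximum distance between points in the same cluster).
clusters = [[1.0, 1.2, 1.4], [5.0, 5.3], [9.1, 9.4, 9.6]]

min_sep = min(abs(a - b)
              for i, ca in enumerate(clusters)
              for cb in clusters[i + 1:]
              for a in ca for b in cb)
max_diam = max(abs(a - b) for c in clusters for a in c for b in c)

dunn = min_sep / max_diam
print(round(dunn, 2))  # → 7.2 for these compact, well-separated clusters
```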

## Errors and residuals in statistics

Statistical errors and residuals occur because measurement is never exact. It is not possible to do an exact measurement, but it is possible to say how accurate a measurement is. One can measure the same thing again and again, and collect all the ...

## Expected value

In probability theory and statistics, the expected value of a random variable X, written E(X) or E[X], is the average value the variable will take, that is, assuming that the experiment is repeated an infinite number of times, and t ...

## Frequency distribution

In statistics, a frequency distribution is a list of the values that a variable takes in a sample. It is usually a list, ordered by quantity. It will show the number of times each value appears. For example, if 100 people rate a five-point Likert ...
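
In Python, a frequency distribution can be sketched with `collections.Counter`; the ratings below are invented for illustration:

```python
from collections import Counter

# Made-up ratings on a five-point Likert scale.
ratings = [5, 4, 4, 3, 5, 5, 2, 4, 5, 3, 4, 5]
freq = Counter(ratings)

# List each value with the number of times it appears, ordered by count.
for value, count in freq.most_common():
    print(value, count)
```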

## Gini coefficient

The Gini coefficient is a measure of differences in income. It was developed by the Italian statistician Corrado Gini in 1912.
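
One common formula computes the Gini coefficient from the mean absolute difference between all pairs of incomes. A small Python sketch with made-up incomes:

```python
# Gini coefficient from the mean absolute difference:
# G = (sum of |x_i - x_j| over all ordered pairs) / (2 * n^2 * mean income).
def gini(incomes):
    n = len(incomes)
    mean = sum(incomes) / n
    mad = sum(abs(a - b) for a in incomes for b in incomes) / n ** 2
    return mad / (2 * mean)

print(gini([10, 10, 10, 10]))  # perfect equality → 0.0
print(gini([0, 0, 0, 100]))    # one person has everything → 0.75
```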

## Grouped data

Grouped data is a statistical term used in data analysis. Raw data can be organized by grouping together similar measurements in a table. This frequency table is also called grouped data.

## Histogram

A histogram is a concept from statistics. It is a graphical display that shows the distribution of the samples involved. It is commonly a picture made from a table with many categories. The table tells how many samples there are in ea ...

## Independence (statistics)

Probability theory talks about events which occur with a given probability. Usually, when it talks about several events occurring, it assumes that if one event occurs, this does not change the probability of the other events occurring. More speci ...

## Infant mortality

Infant mortality is a measure of how many babies die during the first 12 months after birth. It is usually measured as being a number of deaths for every thousand births. The rate of infant mortality in a given place is the total number of babies ...

## Inference (statistics)

Statistical inference is the process of drawing conclusions from data that is subject to random variation. Examples would be observational errors or sampling variation.

## Interquartile range

In statistics, the interquartile range is a number that indicates how spread out the data are, and tells us what the range is in the middle of a set of scores. The interquartile range (IQR) is defined as IQR = Q3 − Q1 ...
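
A quick Python sketch using the standard library's quartiles. Note that several quartile conventions exist, so other software may give slightly different numbers; the scores are made up:

```python
import statistics

# IQR = Q3 - Q1. statistics.quantiles with n=4 returns the three
# quartile cut points, using the "exclusive" method by default.
scores = [1, 3, 4, 5, 5, 6, 7, 11]
q1, _, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1
print(q1, q3, iqr)  # → 3.25 6.75 3.5
```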

## Lady tasting tea

Lady tasting tea is the name of a famous randomized experiment designed by Ronald Fisher in 1935. The experiment is the original exposition of Fisher's notion of a null hypothesis. Fisher's description is less than ten pages long and is notable for ...

## Law of large numbers

The law of large numbers, or LLN for short, is a theorem from statistics. It states that if a random process is repeatedly observed, then the average of the observed values will be stable in the long run. This means that as the number of observat ...
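
A die-rolling simulation sketches the idea (arbitrary seed; the true mean of a fair die is 3.5):

```python
import random

random.seed(0)  # arbitrary seed, for reproducibility

# The running average of fair-die rolls settles near the true mean 3.5
# as the number of observations grows.
rolls = [random.randint(1, 6) for _ in range(100_000)]
for n in (10, 1_000, 100_000):
    print(n, sum(rolls[:n]) / n)
```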

## Linear regression

Linear regression is a way to explain the relationship between a dependent variable and one or more explanatory variables using a straight line. It is a special case of regression analysis. Linear regression was the first type of regression analy ...
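
For a single explanatory variable, the least-squares line has a simple closed form. A small sketch with invented data that is roughly y = 2x:

```python
# Ordinary least squares for one explanatory variable:
# slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x).
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.0, 8.1, 9.9]  # invented data, roughly y = 2x
slope, intercept = fit_line(xs, ys)
print(round(slope, 2), round(intercept, 2))  # → 1.98 0.06
```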

## Logistic regression

Logistic regression, also known as logit regression or logit model, is a mathematical model used in statistics to estimate the probability of an event occurring having been given some previous data. Logistic regression works with binary data, whe ...

## Mean

In mathematics and statistics, the mean is a kind of average. Besides the mean, there are other kinds of average, and there are also a few kinds of mean. The most common mean is the arithmetic mean, which is calculated by adding all of the values ...

## Median

In probability theory and statistics, the median of a data set X, sometimes written as X̃, is a number describing the data set. This number has the property that it divides a set of observed val ...

## Method of moments (statistics)

Suppose that the problem is to estimate k unknown parameters θ₁, θ₂, …, θₖ describing the distribution f_W(w; θ) of the random variable W ...

## Null hypothesis

In statistics, a null hypothesis, often written as H₀, is a statement assumed to be true unless it can be shown to be incorrect beyond a reasonable doubt. The idea is that the null hypothesis generally assumes that there is ...

## P-value

In statistics, a p-value is the probability that the null hypothesis assigns to a result at least as extreme as the one observed. The p-value is also called the probability value. If the p-value is low, the null hypothesis is unlikely, and the experiment has s ...
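
As a sketch, an exact one-sided p-value for a made-up fair-coin example can be computed directly from the binomial distribution:

```python
from math import comb

# One-sided p-value under a fair-coin null hypothesis: the probability,
# assuming H0 (p = 0.5), of seeing at least `heads` heads in `flips` tosses.
def p_value(heads, flips):
    return sum(comb(flips, k) for k in range(heads, flips + 1)) / 2 ** flips

# 60 heads in 100 flips gives p ≈ 0.028, below the usual 0.05 level,
# so this result would commonly be called statistically significant.
print(round(p_value(60, 100), 4))
```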

## Parametric statistics

Parametric statistics is a branch of statistics. It assumes that in the unknown population, the observations follow a probability distribution of a known form, described by a fixed set of parameters. Most methods of statistical analys ...

## Percentile

A percentile is a measure in statistics. It shows the value below which a given percentage of observations falls. For example, the 20th percentile is the value or score below which 20% of the observations may be found. The 35th percentile is the ...

## Poisson point process

A Poisson process is a stochastic process. It counts the number of occurrences of an event up to a specified time. This is a counting process whose increments over disjoint time intervals are independent of one another.

## Population (statistics)

In statistics, a population is a set of things from which samples may be drawn. This allows statistical inferences to be drawn, or estimates made of the total population. For example, if we were interested in crows, then we would sample the set o ...

## Population without double counting

Population sans doubles comptes is a phrase in French that means population without double counting in English. In France, because of the census, the INSEE has allowed people who live in one place and study in a different place to be counted twic ...

## Probability density function

In probability and statistics, a probability density function is a function that characterizes any continuous probability distribution. For a random variable X, the probability density function of X is sometimes written as f_X ...

## Rank correlation

A rank correlation is any statistic that measures the relationship between rankings. A "ranking" is the assignment of "first", "second", "third", etc. to different observations of a variable. A rank correlation coefficient measures the degree of ...
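
One well-known rank correlation coefficient is Spearman's rho. A sketch for data without ties, using the classic formula 1 − 6·Σd²/(n(n² − 1)):

```python
def ranks(values):
    # rank 1 goes to the smallest value (no ties assumed)
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def spearman(xs, ys):
    # rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))

print(spearman([1, 2, 3, 4], [10, 20, 30, 40]))  # identical order → 1.0
print(spearman([1, 2, 3, 4], [40, 30, 20, 10]))  # reversed order → -1.0
```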

## Sample

In statistics, a sample is part of a population. The sample is carefully chosen. It should represent the whole population fairly, without bias. When treated as a data set, a sample is often represented by capital letters such as X ...

## Selection bias

Selection bias is a kind of bias that is introduced by the selection of individuals, groups or data for analysis in such a way that proper randomization is not achieved. This means that the sample may no longer represent the population to be anal ...

## Simpson's paradox

Simpson's paradox is a paradox from statistics. It is named after Edward H. Simpson, a British statistician who first described it in 1951. The statistician Karl Pearson described a very similar effect in 1899; Udny Yule's description dates from 190 ...

## Standard deviation

Standard deviation is a number used to tell how measurements for a group are spread out from the average. A low standard deviation means that most of the numbers are close to the average, while a high standard deviation means that the numbers are ...
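
A short sketch using Python's standard library (`statistics.pstdev` is the population standard deviation), with two made-up groups sharing the same mean of 5:

```python
import statistics

# Two made-up groups with the same mean (5) but different spread.
tight = [4, 5, 5, 5, 6]
wide = [1, 3, 5, 7, 9]

print(round(statistics.pstdev(tight), 3))  # → 0.632, values hug the mean
print(round(statistics.pstdev(wide), 3))   # → 2.828, values are spread out
```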

## Standard error

The standard error, sometimes abbreviated as SE, is the standard deviation of the sampling distribution of a statistic. The term may also be used for an estimate of that standard deviation taken from a sample of the whole grou ...

## Statistical hypothesis test

A statistical hypothesis test is a method used in statistics. It helps you describe the results you get from an experiment. The hypothesis test tells you the likelihood that a specific result would happen by chance. Statistical hypothesis tests a ...

## Statistical parameter

A statistical parameter or population parameter is a quantity that indexes the probability distribution of a statistic or a random variable. It can be thought of as a numerical characteristic of a statistical population or a statistical model. A sta ...

## Statistical significance

Statistics uses variables to describe a measurement. Such a variable is called statistically significant if under a certain status quo assumption, the probability of obtaining its outcome is less than a given value. Statistical significance is he ...

## Statistical survey

Statistical surveys are collections of information about items in a population. Surveys can be grouped into numerical and categorical types. A numerical survey asks for numbers as replies. For example: How many minutes, on average, do you spend ...

## Statistics

Statistics is a branch of applied mathematics dealing with data collection, organization, analysis, interpretation and presentation. Descriptive statistics summarize data. Inferential statistics make predictions. Statistics helps in the study of ...

## Student's t-test

A t-test is a statistical hypothesis test. People use it when they want to compare a mean of a measurement from one group A to some theoretical, expected value. People also use it when they want to compare the mean of a measurement of two groups ...

## Type I and type II errors

In statistics, type I and type II errors are errors that happen when a coincidence occurs while doing statistical inference, which leads to one making the wrong conclusion. One makes a Type I error when the original hypothesis is rejected, when i ...

## Variance

In probability theory and statistics, the variance is a way to measure how far a set of numbers is spread out. Variance describes how much a random variable differs from its expected value. The variance is defined as the average of the squares of ...

## Zipf's law

Zipf's law is an empirical law, formulated using mathematical statistics, named after the linguist George Kingsley Zipf, who first proposed it. Zipf's law states that given a large sample of words used, the frequency of any word is inversely propor ...