Binomial distributions appear in many real-world contexts. If a situation meets all four of the following criteria, chances are you’re looking at a binomial distribution:
- There are only two possible and mutually exclusive outcomes — for example, yes or no, customer or not, etc. (The bi in binomial.)
- There is a predefined, finite, and constant number of repeated experiments or trials.
- All trials/experiments are identical in that they are all conducted in the same fashion as the others, yet are independent in that one trial’s outcome does not affect others’.
- The probability of success is the same in each of the trials.
Consider, for example, a company that wants to predict the chance that a customer will purchase a product after being exposed to identical advertisements. After determining p and the maximum number of ads the company would like to run (n), the company can use the binomial distribution’s probability mass function to determine how many advertisements are worth running in its marketing campaign.
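As a sketch of how that calculation might look, here is the binomial PMF in plain Python. The conversion probability and ad count below are made-up illustrative numbers, not figures from the example:

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical numbers: each ad converts a viewer with probability 0.1,
# and the company runs 20 ads for a given customer.
p, n = 0.1, 20
probabilities = {k: binomial_pmf(k, n, p) for k in range(n + 1)}
```

Summing `probabilities` over all k from 0 to n gives 1, as any probability mass function must.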
The Bernoulli distribution is the binomial distribution with only one trial. It describes an experiment with only two outcomes: success with probability p and failure with probability q = 1 – p.
The Bernoulli distribution isn’t so much a separate distribution as a special case of the binomial distribution, but it’s useful jargon to understand.
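A quick check that the Bernoulli case really is just the binomial PMF with n = 1 (the success probability here is arbitrary):

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p = 0.3
# With n = 1, the binomial PMF collapses to the Bernoulli distribution:
# P(success) = p and P(failure) = q = 1 - p.
assert binomial_pmf(1, 1, p) == p
assert binomial_pmf(0, 1, p) == 1 - p
```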
A Poisson distribution models the probability of a given number of events occurring in a fixed interval of time or space, provided these events occur at a constant mean rate and independently of the time since the last event. The distribution’s single parameter is λ, the mean rate.
For instance, suppose you keep track of the number of emails you receive every day and notice that you receive an average of 14 a day. If receiving an email does not affect the arrival times of future emails, then the number of emails you receive per day probably obeys a Poisson distribution. There are many other examples in other scenarios: the number of phone calls received by a call center per day and the number of decay events per second from a radioactive source have both been shown to follow a Poisson distribution.
The distribution follows the probability mass function:

P(X = k) = (λ^k · e^(−λ)) / k!

where k is the number of events observed in the interval.
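The Poisson PMF can be computed directly with the Python standard library; the numbers below reuse the email example from the text:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of observing exactly k events when the mean rate is lam."""
    return lam**k * exp(-lam) / factorial(k)

# Using the email example: an average of 14 emails per day.
lam = 14
p_exactly_14 = poisson_pmf(14, lam)             # chance of exactly 14 emails
p_at_most_5 = sum(poisson_pmf(k, lam) for k in range(6))  # chance of 5 or fewer
```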
A bimodal distribution has two peaks (hence the name, bimodal). It usually arises as a mixture of two unimodal distributions (distributions with a single peak, such as a normal or Poisson distribution), combining two random variables X and Y with a mixture coefficient α.
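A mixture like this can be simulated by drawing from one of two normal components according to α. The component means and spreads below are illustrative assumptions, chosen well apart so the two peaks are visible in a histogram:

```python
import random

def sample_mixture(n, alpha, mu_x, sigma_x, mu_y, sigma_y, seed=0):
    """Draw n samples from a two-component Gaussian mixture:
    with probability alpha draw from X ~ N(mu_x, sigma_x),
    otherwise from Y ~ N(mu_y, sigma_y)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        if rng.random() < alpha:
            samples.append(rng.gauss(mu_x, sigma_x))
        else:
            samples.append(rng.gauss(mu_y, sigma_y))
    return samples

# Two well-separated unimodal components produce two peaks in a histogram.
data = sample_mixture(10_000, alpha=0.5, mu_x=-3, sigma_x=1, mu_y=3, sigma_y=1)
```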
This means that when a bimodal distribution arises in a scenario where the data would normally be unimodal, there may be an external force at play. In fact, in 2010 professor Richard Quinn caught his students at the University of Central Florida in a cheating scandal based on the distribution of midterm scores. Test scores tend to be approximately normally distributed, whether for midterms at a university or for standardized tests like the ACT and SAT. The Fall 2010 midterm scores, however, had two peaks.
Because unnatural bimodal distributions are almost always mixtures of two distinct unimodal distributions, the professor realized that the two peaks represented two groups of students: those who didn’t cheat, and those who did. The lower-performing unimodal distribution bore a stark resemblance to the Summer 2010 midterm distribution. After noticing this, the professor investigated further and found that the test bank had been leaked to a group of students.
Although bimodal (or multimodal) distributions can be revealing of systematic biases or issues, they often occur naturally as well. These naturally bimodally distributed variables include:
- the time between eruptions of geysers
- the color of galaxies
- the size of worker weaver ants
- the age of incidence of Hodgkin’s lymphoma
- traffic analysis
- water demand
The most standard (and hence ‘normal’) distribution is the normal distribution, also known as the bell curve after its shape.
Its most standard and well-known properties include the relationship between percentiles of data and standard deviations.
- 68.3% of the data falls within one standard deviation of the mean.
- 95.4% of the data falls within two standard deviations of the mean.
- 99.7% of the data falls within three standard deviations of the mean.
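These percentages can be recovered from the normal distribution itself: the fraction within k standard deviations of the mean is erf(k/√2). A short check using the Python standard library:

```python
from math import erf, sqrt

def within_k_sigma(k):
    """Fraction of a normal distribution within k standard deviations
    of the mean: P(|Z| <= k) = erf(k / sqrt(2))."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    # Prints roughly 68.3%, 95.4%, and 99.7% respectively.
    print(f"{k} sigma: {within_k_sigma(k):.1%}")
```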
The equation exp(-x²) gives the bell curve’s basic shape. While this is not the formal probability density function (which adds a normalizing constant and mean and spread parameters), the normal distribution is built from this equation. A good way to think about it is that exp(x) decays toward the left and grows toward the right, whereas exp(-x) decays toward the right and grows toward the left. Squaring the input and negating it ‘combines’ these two behaviors, yielding a curve that decays in both directions away from its peak.
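A minimal sketch of the relationship between the raw exp(-x²) shape and the actual normal density, which is the same shape shifted, scaled, and normalized:

```python
from math import exp, pi, sqrt

def bell(x):
    """The bare bell-curve shape exp(-x^2) described above."""
    return exp(-x**2)

def normal_pdf(x, mu=0.0, sigma=1.0):
    """The normal density: a shifted, scaled, normalized exp(-x^2) shape."""
    z = (x - mu) / sigma
    return exp(-z**2 / 2) / (sigma * sqrt(2 * pi))

# The shape is symmetric and decays in both directions:
assert bell(2) == bell(-2)
assert bell(0) > bell(1) > bell(2)
```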
Within statistics and machine learning, the normal distribution plays a significant role, such as in the assumptions of machine learning models. Linear regression models, for instance, assume that the residuals (the errors of data points from the line’s fit) are normally distributed. It is the most studied and well-known distribution, with various theorems, properties, and applications to other areas well documented and explored.
In nature, physical characteristics of plants and animals (like height or weight) tend to follow a normal distribution. Stock returns are often modeled as approximately normal, and common prediction and measurement errors tend to be distributed normally. Natural epidemics have been observed to follow the path of a bell curve. Other measures of performance, such as IQ, musical ability, and standardized test scores, are normally distributed. This has to do with the Central Limit Theorem, which essentially states that many independent random signals combine together to form a normal distribution.
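The Central Limit Theorem can be seen in a quick simulation: averages of many independent uniform draws cluster into a bell shape around the underlying mean, even though each individual draw is flat, not bell-shaped. The term and sample counts below are arbitrary choices:

```python
import random

def clt_sample(n_terms, n_samples, seed=0):
    """Averages of n_terms uniform(0, 1) draws; by the Central Limit
    Theorem these averages are approximately normally distributed."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n_terms)) / n_terms
            for _ in range(n_samples)]

averages = clt_sample(n_terms=30, n_samples=10_000)
# The averages cluster tightly around 0.5, the mean of a uniform(0, 1) draw.
mean = sum(averages) / len(averages)
```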
A uniform distribution is one in which the probability is the same for every value of x: its probability mass (or density) function is constant.
This distribution is the natural model for equally likely outcomes, such as throwing a fair die. A uniform distribution, while it may seem boring, has many statistical properties that make it useful. For instance, consider the German tank problem. In World War II, the Allies needed to estimate how many tanks the Germans were producing and realized that they could use the sequential serial numbers on captured tanks to estimate the total number of tanks.
In a uniform distribution from 1 to n, captured serial numbers can be expected to be roughly evenly spaced. Using this property, the Allies derived a formula to estimate the total number of tanks n in production. More generally, when the true distribution is unknown, it can often, depending on the context, be assumed to be uniform; many principles of statistical inference are based on this premise.
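One common frequentist estimator for this problem is N ≈ m + m/k − 1, where m is the largest observed serial number and k is the number of tanks captured. The captured serial numbers below are hypothetical:

```python
def estimate_total(serials):
    """Frequentist estimate for the German tank problem:
    N ~ m + m/k - 1, where m is the largest serial number seen
    and k is the number of tanks captured."""
    m, k = max(serials), len(serials)
    return m + m / k - 1

# Hypothetical captured serial numbers:
captured = [19, 40, 42, 60]
print(estimate_total(captured))  # 60 + 60/4 - 1 = 74.0
```

Intuitively, the average gap between observed serials estimates the gap between the largest observation and the true maximum, which is why the estimate sits above m.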