A Complete Guide to Hypothesis Testing

Assessing Normality

The main assumption of the t-test is that the data is normally distributed. To be more precise, we need the sampling distribution of the sample mean to be approximately normal, but let's set this aside for now, since it requires some knowledge of bootstrap distributions, and simply consider approximate normality of the data. For a large sample size, the central limit theorem typically makes the sampling distribution of the mean approximately normal even when the data itself is not. However, if our sample is small, it is a good idea to assess normality before proceeding with a t-test.

Some simple ways to check whether data is normally distributed are to examine a boxplot or a histogram of the data. A quick way to get such plots in Python is through the Seaborn library. For example, the following code draws both plots side by side, where x represents the sample we want to evaluate.

import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
# generate normal data
x = np.random.normal(size=100)
# put plots side by side
fig, ax = plt.subplots(1, 2, figsize=(15, 6))
# histogram (histplot replaces the deprecated distplot)
sns.histplot(x, ax=ax[0]).set_title("Histogram")
# vertical boxplot
sns.boxplot(y=x, ax=ax[1]).set_title("Boxplot")
plt.show()

Assessing normality from each of the above two plots is quite simple. For a histogram, we want to see symmetry about the mean and fairly even tails. For a boxplot, we want fairly even whisker lengths (which are associated with the tails of a normal distribution). In addition, in both cases, we do not want to see outliers or skewed data. If the data is right-skewed, most of the bars of the histogram will be concentrated on the left, and the boxplot will have a longer upper whisker. In our example, since the data used to generate the above plots was sampled directly from a normal distribution, it makes sense that the plots look approximately normal.
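To make the skewness description concrete, here is a small sketch that measures boxplot whisker lengths numerically for a normal sample and a right-skewed (exponential) sample. It assumes the common 1.5 × IQR whisker rule (Seaborn's default `whis=1.5`); `whisker_lengths` is a hypothetical helper, and the simulated samples and seed are made up for illustration.

```python
import numpy as np

def whisker_lengths(sample):
    """Hypothetical helper: (lower, upper) whisker lengths under the 1.5 * IQR rule."""
    q1, q3 = np.percentile(sample, [25, 75])
    iqr = q3 - q1
    # Whiskers reach the most extreme data points within 1.5 * IQR of the box
    lower_end = sample[sample >= q1 - 1.5 * iqr].min()
    upper_end = sample[sample <= q3 + 1.5 * iqr].max()
    return q1 - lower_end, upper_end - q3

rng = np.random.default_rng(0)
normal_sample = rng.normal(size=500)
skewed_sample = rng.exponential(size=500)  # right-skewed

for name, sample in [("normal", normal_sample), ("right-skewed", skewed_sample)]:
    lower, upper = whisker_lengths(sample)
    print(f"{name}: lower whisker = {lower:.2f}, upper whisker = {upper:.2f}, "
          f"mean - median = {sample.mean() - np.median(sample):.2f}")
```

For the right-skewed sample, the upper whisker should come out several times longer than the lower one, and the mean should sit above the median; for the normal sample, the two whiskers should be of similar length.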

Assessing Variance

Once we have concluded that our data is approximately normal, we need to decide which two-sample t-test to use, as there are further assumptions to check. This decision depends on whether the two groups being compared have equal spreads (variances). To evaluate this, we can look at boxplots of both groups: if the spreads are the same, the interquartile ranges (IQRs), i.e., the heights of the boxes, will be approximately the same.
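The IQR comparison can also be done numerically. The sketch below computes each group's IQR with `scipy.stats.iqr` and, as an optional formal check that is not part of this guide's method, runs Levene's test (`scipy.stats.levene`, whose null hypothesis is equal variances). The two simulated groups are assumptions for illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(loc=0, scale=1, size=200)
group_b = rng.normal(loc=0, scale=3, size=200)  # three times the spread

# IQR = Q3 - Q1, i.e. the height of the box in each boxplot
iqr_a = stats.iqr(group_a)
iqr_b = stats.iqr(group_b)
print(f"IQR A = {iqr_a:.2f}, IQR B = {iqr_b:.2f}, ratio = {iqr_b / iqr_a:.2f}")

# Optional formal check: Levene's test, H0 = the groups have equal variances
stat, p_value = stats.levene(group_a, group_b)
print(f"Levene statistic = {stat:.2f}, p-value = {p_value:.3g}")
```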

The most common hypothesis test is the t-test. However, a t-test (like other tests) comes with assumptions, and if these assumptions are not met, its results may be misleading. We must choose the appropriate test based on the distribution of the data and on the number of groups being compared. Most of the tests mentioned below are available in Python's SciPy library.
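The choices described in this guide can be summarized as a small decision function. `choose_test` is a hypothetical helper that only encodes the rules stated in this article and returns a test name; it does not run any test, and it refuses cases the guide does not cover (such as non-normal paired data).

```python
def choose_test(normal, n_groups, equal_var=True, paired=False):
    """Hypothetical helper encoding this guide's decision rules; returns a test name."""
    if paired:
        if n_groups != 2:
            raise ValueError("a paired comparison involves exactly two groups")
        if not normal:
            raise ValueError("non-normal paired data is not covered in this guide")
        return "paired t-test"
    if not normal:
        # Median-based tests: no normality assumption
        return "sign test" if n_groups == 1 else "Mood's median test"
    if n_groups == 1:
        return "one-sample t-test"
    if n_groups == 2:
        return "pooled t-test" if equal_var else "Welch-Satterthwaite t-test"
    # Three or more independent, approximately normal groups
    return "one-way ANOVA" if equal_var else "Welch ANOVA"

print(choose_test(normal=True, n_groups=2, equal_var=False))
```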

T-test

If we conclude that the data is approximately normal, we can use a t-test. There is the one-sample t-test and two types of two-sample t-test: the pooled t-test (for equal variances) and the Welch-Satterthwaite t-test (for unequal variances). Another case arises when the two samples are related to each other, i.e., the matched-pairs situation, where we use a paired t-test.

For a one-sample t-test, the null hypothesis H0 is "the mean is equal to/greater than/less than θ" vs. the alternative hypothesis Ha that "the mean is not equal to/less than/greater than θ", in those respective orders.

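from scipy import stats
# group1: 1-D array of observations; null_mean: the hypothesized mean θ
# one-sample t-test (two-sided); pass alternative="less" or "greater" for a one-sided test
stats.ttest_1samp(group1, popmean=null_mean)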
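To see what `ttest_1samp` computes, here is a sketch of the one-sample statistic t = (x̄ - θ) / (s / √n) with n - 1 degrees of freedom, checked against SciPy for both the two-sided and a one-sided alternative. The simulated data, seed and θ value are assumptions for illustration; the `alternative` keyword requires SciPy 1.6 or newer.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
group1 = rng.normal(loc=5.3, scale=1.0, size=25)
null_mean = 5.0  # θ, the hypothesized mean

# t = (sample mean - θ) / (s / sqrt(n)), compared with a t distribution on n - 1 df
n = len(group1)
t_manual = (group1.mean() - null_mean) / (group1.std(ddof=1) / np.sqrt(n))
p_two_sided = 2 * stats.t.sf(abs(t_manual), df=n - 1)
p_greater = stats.t.sf(t_manual, df=n - 1)  # Ha: mean > θ

res = stats.ttest_1samp(group1, popmean=null_mean)
res_greater = stats.ttest_1samp(group1, popmean=null_mean, alternative="greater")
print(t_manual, res.statistic)
print(p_two_sided, res.pvalue)
print(p_greater, res_greater.pvalue)
```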

For a two-sample t-test (where the samples are independent), H0 is "the mean of group A is equal to/greater than/less than the mean of group B". In the two-sample case, each group needs to be approximately normal. Moreover, we need to assess the variances of both groups to decide which two-sample t-test to use. Note that SciPy's ttest_ind runs the pooled test by default (equal_var=True); set equal_var=False for the Welch-Satterthwaite test.

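from scipy import stats
# group1, group2: 1-D arrays of independent observations
# two-sample t-test (two-sided)
stats.ttest_ind(group1, group2, equal_var=True)   # pooled (equal variances)
stats.ttest_ind(group1, group2, equal_var=False)  # Welch-Satterthwaite (unequal variances)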
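The difference between the two tests lies in how the standard error and degrees of freedom are computed. The sketch below computes the pooled statistic (one shared variance estimate) and Welch's statistic with the Welch-Satterthwaite degrees of freedom by hand and compares them with SciPy. The simulated groups and seed are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
group1 = rng.normal(loc=10, scale=1, size=20)
group2 = rng.normal(loc=11, scale=3, size=35)

n1, n2 = len(group1), len(group2)
m1, m2 = group1.mean(), group2.mean()
v1, v2 = group1.var(ddof=1), group2.var(ddof=1)

# Pooled t-test: a single variance estimate shared by both groups, n1 + n2 - 2 df
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_pooled = (m1 - m2) / np.sqrt(sp2 * (1 / n1 + 1 / n2))

# Welch's t-test: separate variances, Welch-Satterthwaite degrees of freedom
se1, se2 = v1 / n1, v2 / n2
t_welch = (m1 - m2) / np.sqrt(se1 + se2)
df_welch = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
p_welch = 2 * stats.t.sf(abs(t_welch), df_welch)

pooled = stats.ttest_ind(group1, group2, equal_var=True)
welch = stats.ttest_ind(group1, group2, equal_var=False)
print(t_pooled, pooled.statistic)
print(t_welch, welch.statistic, df_welch)
print(p_welch, welch.pvalue)
```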

For a matched-pairs t-test, H0 is "the means of both groups are equal" or, equivalently, "the true mean difference between paired observations is 0". Here, the paired differences must be approximately normal.

from scipy import stats
# group1, group2: paired measurements of equal length (e.g., the same subjects before and after)
# matched pairs t-test
stats.ttest_rel(group1, group2)
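The normality requirement applies to the differences because a paired t-test is a one-sample t-test on those differences against 0. The sketch below checks this equivalence on simulated before/after measurements (illustrative data, not from the article).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
before = rng.normal(loc=70, scale=8, size=15)
after = before + rng.normal(loc=2, scale=1.5, size=15)  # same subjects measured twice

# ttest_rel(a, b) works with the differences a - b
differences = after - before
paired = stats.ttest_rel(after, before)
one_sample = stats.ttest_1samp(differences, popmean=0)
print(paired.statistic, one_sample.statistic)
print(paired.pvalue, one_sample.pvalue)
```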

ANOVA

When we compare three or more groups of independent observations and each group is approximately normal, we must use an analysis of variance (ANOVA). ANOVA has H0 "all groups have the same mean" vs. Ha "at least one group has a different mean from the others".

As with the two-sample t-test, we need to check for equal variance between the groups. If the spreads of the groups are equal, we can proceed with the regular one-way ANOVA. If they are not, we need to use Welch's ANOVA, which is available through Python's Pingouin library (pingouin.welch_anova).

from scipy import stats
# one-way ANOVA; pass each group as a separate argument (three or more groups)
stats.f_oneway(group1, group2, group3)
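To show what separates the regular and Welch versions, here is a from-scratch sketch of both F statistics. `classic_anova` and `welch_anova` are hypothetical helpers written for illustration (not Pingouin's API): the classic F pools the within-group variance, while Welch's F weights each group by n_i / s_i² and adjusts the denominator degrees of freedom. The classic result is checked against `scipy.stats.f_oneway`; for two groups, Welch's F equals the squared Welch t statistic. The simulated groups are assumptions.

```python
import numpy as np
from scipy import stats

def classic_anova(*groups):
    """One-way ANOVA F and p-value (assumes equal variances)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = np.concatenate(groups).mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    f = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    return f, stats.f.sf(f, k - 1, n_total - k)

def welch_anova(*groups):
    """Welch's ANOVA F and p-value (does not assume equal variances)."""
    k = len(groups)
    n = np.array([len(g) for g in groups])
    means = np.array([g.mean() for g in groups])
    variances = np.array([g.var(ddof=1) for g in groups])
    w = n / variances  # precision weights
    weighted_mean = (w * means).sum() / w.sum()
    a = (w * (means - weighted_mean) ** 2).sum() / (k - 1)
    lam = ((1 - w / w.sum()) ** 2 / (n - 1)).sum()
    b = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    f = a / b
    df2 = (k ** 2 - 1) / (3 * lam)
    return f, stats.f.sf(f, k - 1, df2)

rng = np.random.default_rng(5)
g1 = rng.normal(0.0, 1.0, 30)
g2 = rng.normal(0.5, 1.0, 25)
g3 = rng.normal(1.0, 3.0, 40)

print(classic_anova(g1, g2, g3), stats.f_oneway(g1, g2, g3))
print(welch_anova(g1, g2, g3))
```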

Notice that this test does not tell us where the difference is, so we must do further testing to find which group differs from the rest, if any. With equal variances, we continue the analysis with Tukey's HSD test; with unequal variances, the Games-Howell test is the way to go.
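Here is a sketch of both follow-up tests. Tukey's HSD uses `scipy.stats.tukey_hsd` (available in SciPy 1.8 and newer). For Games-Howell, Pingouin offers `pairwise_gameshowell`; to stay within SciPy, the block instead includes a hypothetical from-scratch `games_howell` helper that, for each pair, divides the mean difference by √((s_i²/n_i + s_j²/n_j)/2) and refers it to the studentized range distribution with Welch-Satterthwaite degrees of freedom. The simulated groups are illustrative assumptions.

```python
import itertools
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
groups = {
    "A": rng.normal(0.0, 1.0, 30),
    "B": rng.normal(0.2, 1.0, 30),
    "C": rng.normal(1.5, 1.0, 30),
}

# Equal variances: Tukey's HSD
tukey = stats.tukey_hsd(*groups.values())
print(tukey)  # pairwise differences, p-values and confidence intervals

# Unequal variances: a from-scratch Games-Howell sketch
def games_howell(groups):
    k = len(groups)
    results = {}
    for (name_i, x), (name_j, y) in itertools.combinations(groups.items(), 2):
        vi, vj = x.var(ddof=1) / len(x), y.var(ddof=1) / len(y)
        diff = x.mean() - y.mean()
        # Welch-Satterthwaite degrees of freedom for this pair
        df = (vi + vj) ** 2 / (vi ** 2 / (len(x) - 1) + vj ** 2 / (len(y) - 1))
        # Studentized range statistic and its upper-tail probability
        q = abs(diff) / np.sqrt((vi + vj) / 2)
        p = stats.studentized_range.sf(q, k, df)
        results[(name_i, name_j)] = (diff, p)
    return results

gh = games_howell(groups)
for pair, (diff, p) in gh.items():
    print(pair, f"diff = {diff:.2f}, p = {p:.3g}")
```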

Sign Test & Mood’s Median Test

If we conclude that the data is not approximately normal, the results of a t-test may not be reliable. We must then resort to a test that does not rely on normality. For a one-sample test, we can use the sign test; when testing two or more groups, we can use Mood's median test. These tests compare groups based on the median of the data instead of the mean. It is important to note that these median-based tests do not assume any particular distribution (such as the normal) for the data.

The sign test has H0 "the median is equal to/greater than/less than θ", while Mood's median test has H0 "the groups have the same (grand) median" vs. Ha "at least one of the groups has a median different from the grand median".
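SciPy does not appear to expose a dedicated sign-test function, but the test reduces to an exact binomial test: under H0, each observation that differs from θ is above it with probability 0.5. Below is a minimal sketch using `scipy.stats.binomtest` (SciPy 1.7 or newer); `sign_test` is a hypothetical helper, and dropping observations equal to θ is one common convention, assumed here.

```python
import numpy as np
from scipy import stats

def sign_test(sample, theta, alternative="two-sided"):
    """Hypothetical helper: sign test for H0 median == theta via an exact binomial test."""
    sample = np.asarray(sample)
    n_above = int((sample > theta).sum())
    n_below = int((sample < theta).sum())
    # Observations equal to theta carry no sign information and are dropped
    n = n_above + n_below
    # alternative="greater" corresponds to Ha: median > theta
    return stats.binomtest(n_above, n, p=0.5, alternative=alternative).pvalue

rng = np.random.default_rng(7)
skewed = rng.exponential(scale=2.0, size=40)  # true median = 2 * ln(2), about 1.39
print(sign_test(skewed, theta=1.0, alternative="greater"))
```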

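from scipy import stats
# Mood's median test; pass each group as a separate argument (two or more groups)
# ties sets how values equal to the grand median are counted ("below" is the default)
stats.median_test(group1, group2, group3, ties="below")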
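`median_test` returns the test statistic, the p-value, the grand median and a 2 × k contingency table. The sketch below rebuilds that table by hand (counts above vs. at-or-below the grand median, matching `ties="below"`) and shows that the statistic is a chi-square test of independence on it via `scipy.stats.chi2_contingency`. The simulated groups are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
g1 = rng.exponential(1.0, 30)
g2 = rng.exponential(1.0, 35)
g3 = rng.exponential(2.0, 25)

stat, p_value, grand_median, table = stats.median_test(g1, g2, g3, ties="below")

# The same test by hand: count values above vs. at-or-below the grand median
pooled_median = np.median(np.concatenate([g1, g2, g3]))
manual_table = np.array([
    [(g > pooled_median).sum() for g in (g1, g2, g3)],   # above the grand median
    [(g <= pooled_median).sum() for g in (g1, g2, g3)],  # below (ties counted here)
])
chi2, p_manual, dof, _ = stats.chi2_contingency(manual_table)
print(stat, chi2)
print(p_value, p_manual)
```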

Mood's median test can compare two or more groups. So, if we are comparing more than two groups, then, as with ANOVA, we need to find where the difference in medians occurs (if any). We do so with pairwise median tests.
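A minimal sketch of pairwise median tests: run `scipy.stats.median_test` on every pair of groups. Because several tests are run at once, the sketch applies a Bonferroni adjustment to the p-values; that correction is an assumption added here, not something prescribed above. `pairwise_median_tests` is a hypothetical helper and the simulated groups are illustrative.

```python
import itertools
import numpy as np
from scipy import stats

def pairwise_median_tests(groups, alpha=0.05):
    """Hypothetical helper: Mood's median test on every pair, Bonferroni-adjusted."""
    pairs = list(itertools.combinations(groups, 2))
    results = {}
    for a, b in pairs:
        _, p, _, _ = stats.median_test(groups[a], groups[b])
        p_adj = min(p * len(pairs), 1.0)  # Bonferroni correction (an assumption here)
        results[(a, b)] = (p_adj, p_adj < alpha)
    return results

rng = np.random.default_rng(9)
groups = {
    "A": rng.exponential(1.0, 40),
    "B": rng.exponential(1.0, 40),
    "C": rng.exponential(4.0, 40),
}
results = pairwise_median_tests(groups)
for pair, (p_adj, significant) in results.items():
    print(pair, f"adjusted p = {p_adj:.3g}",
          "different" if significant else "no evidence of a difference")
```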