Assessing normality from each of the above two plots is quite simple. For a histogram, we want to see symmetry about the mean and fairly even tails. For a boxplot, we want fairly even whisker lengths (which correspond to the tails of a normal distribution). In both cases, we also do not want to see any outliers or skewed data. If the data is right-skewed, most of the bars on the histogram will be condensed to the left, and the boxplot will have a longer upper whisker. In our example, since the data used to generate the above plots was sampled directly from a normal distribution, it makes sense that the plots look normal.
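These visual checks can be complemented with a quick numeric look at the sample’s skewness, which should be close to 0 for symmetric data. A minimal sketch with simulated data (the seed and distribution parameters here are illustrative assumptions, not values from the article):

```python
import numpy as np
from scipy import stats

# simulate data directly from a normal distribution,
# mirroring the example described in the text
rng = np.random.default_rng(42)
data = rng.normal(loc=10, scale=2, size=1000)

# skewness near 0 suggests the symmetry we look for in the histogram
skewness = stats.skew(data)
print(f"sample skewness: {skewness:.3f}")
```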

Once we have concluded that our data is approximately normal, we need to decide which two-sample t-test to use, as there are further assumptions to check. This decision is based on whether the two groups being compared have equal spreads (variances) or not. To evaluate this, we look at boxplots of both groups. If the spreads are the same, the interquartile ranges (IQRs) of the two boxplots will be approximately equal.
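The IQR comparison can also be done numerically. A sketch, assuming two hypothetical simulated groups (names and parameters are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# two hypothetical groups drawn with the same spread
group1 = rng.normal(loc=5, scale=1, size=200)
group2 = rng.normal(loc=6, scale=1, size=200)

# scipy.stats.iqr computes Q3 - Q1, the box length in a boxplot
iqr1, iqr2 = stats.iqr(group1), stats.iqr(group2)
print(f"IQR group1: {iqr1:.2f}, IQR group2: {iqr2:.2f}")
```

If the two IQRs are roughly equal, as here, the equal-variance assumption looks reasonable.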

The most common hypothesis test is a t-test. However, a t-test (like other tests) comes with assumptions, and if these assumptions are not met, the results will not make sense. We must choose the appropriate test based on the distribution of the data, as well as the number of groups being compared. Most of the tests mentioned below are accessible through Python’s SciPy library.

T-Tests
If we conclude that the data has approximate normality, we can then use a t-test. There is the one-sample t-test and two types of two-sample t-tests: the pooled t-test (for equal variances) and the Welch-Satterthwaite t-test (for unequal variances). Another case of a t-test is when we have two samples that are related to each other, i.e. the matched-pairs situation, where we use a paired t-test.

For a one-sample t-test, the null hypothesis H₀ is “the mean is equal to/greater than/less than θ₀” vs. the alternative hypothesis Ha that “the mean is not equal to/less than/greater than θ₀”, in those respective orders.

```python
from scipy import stats

# one-sample t-test (two-sided)
stats.ttest_1samp(group1, null_mean)
```
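As a hypothetical worked example (simulated data; the seed and parameters are assumptions, not from the article), testing whether a sample drawn from a normal distribution with mean 5 has mean 0 should strongly reject the null:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group1 = rng.normal(loc=5, scale=1, size=200)

# two-sided one-sample t-test of H0: mean = 0
result = stats.ttest_1samp(group1, 0)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3g}")
```

A tiny p-value here means we reject H₀ at any common significance level.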
For a two-sample t-test (where the samples are independent), H₀ is “the mean of group A is equal to/greater than/less than the mean of group B”. In the two-sample case, each of the groups needs to meet approximate normality. Moreover, we need to assess the variances of both groups to decide which two-sample t-test to use. Note that SciPy’s ttest_ind assumes equal variances by default (equal_var=True); set equal_var=False for the Welch-Satterthwaite version.

```python
from scipy import stats

# two-sample t-test (two-sided)
stats.ttest_ind(group1, group2, equal_var=True)   # pooled (equal variances)
stats.ttest_ind(group1, group2, equal_var=False)  # Welch-Satterthwaite
```
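As a hypothetical worked example (simulated groups; names, seed, and parameters are assumptions), a pooled t-test on two equal-variance groups with clearly different means should reject the null:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
group1 = rng.normal(loc=5, scale=1, size=100)
group2 = rng.normal(loc=8, scale=1, size=100)

# pooled two-sample t-test; the large mean difference should be detected
result = stats.ttest_ind(group1, group2, equal_var=True)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3g}")
```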
For a matched-pairs t-test, H₀ is “the means of both groups are equal”, or equivalently “the true difference between the means of both groups is 0”. Here, the difference between the two groups must be approximately normal.

```python
from scipy import stats

# matched-pairs t-test
stats.ttest_rel(group1, group2)
```
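A hypothetical matched-pairs example (simulated before/after measurements on the same subjects; all values here are assumptions for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
before = rng.normal(loc=100, scale=10, size=50)
# same subjects measured again after a consistent +5 shift plus noise
after = before + 5 + rng.normal(loc=0, scale=2, size=50)

# paired t-test of H0: true mean difference is 0
result = stats.ttest_rel(before, after)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3g}")
```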
ANOVA
When each independent group is approximately normal and we are comparing three or more groups of independent observations, we use an analysis of variance (ANOVA). ANOVA has as H₀ “all groups have the same mean” vs. Ha “at least one group has a mean different from the others”.

Again, as with the t-test, we need to check for equal variances between the groups. If the spreads between the groups are equal, we can proceed with a regular one-way ANOVA. If the spreads are not equal, we will need to use Welch’s ANOVA. This latter test is available through Python’s Pingouin library.

```python
from scipy import stats

# one-way ANOVA
stats.f_oneway(group1, group2, group3, ...)
```
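A hypothetical worked example with three simulated groups, one of which has a clearly shifted mean (group names, seed, and parameters are assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
group1 = rng.normal(loc=5, scale=1, size=60)
group2 = rng.normal(loc=5, scale=1, size=60)
group3 = rng.normal(loc=9, scale=1, size=60)  # clearly shifted group

# one-way ANOVA; the shifted group should make us reject H0
result = stats.f_oneway(group1, group2, group3)
print(f"F = {result.statistic:.2f}, p = {result.pvalue:.3g}")
```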
Notice that this test will not tell us where the difference is, so we must do further testing to find which group differs from the rest, if any. In the case of equal variances, we continue the analysis with Tukey’s HSD test. In the case of unequal variances, Games-Howell is the way to go.
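A sketch of the equal-variance follow-up using scipy.stats.tukey_hsd (an assumption about your SciPy version: this function only exists in recent releases; statsmodels’ pairwise_tukeyhsd is an older alternative). The simulated groups below are the same illustrative setup as above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
group1 = rng.normal(loc=5, scale=1, size=60)
group2 = rng.normal(loc=5, scale=1, size=60)
group3 = rng.normal(loc=9, scale=1, size=60)  # clearly shifted group

# Tukey's HSD compares every pair of groups; result.pvalue is a
# matrix whose entry [i, j] is the p-value for groups i and j
result = stats.tukey_hsd(group1, group2, group3)
print(result.pvalue)
```

The pairs involving the shifted third group should get small p-values, locating the difference that ANOVA only detected in aggregate.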

Sign Test & Mood’s Median Test
If we conclude that the data is not approximately normal, we cannot get any significant or reliable results from a t-test. We must then resort to a test that does not rely on the symmetry of the distribution. For a one-sample test, we use the sign test; when testing two or more groups, we use Mood’s median test. These tests evaluate association between groups based on the median of the data instead of the mean. It is important to note that these median-based tests do not assume anything about the underlying distribution of the data.

The sign test has H₀ “the median is equal to/greater than/less than θ₀”, while the null hypothesis for Mood’s median test is “the groups have the same (grand) median” vs. the alternative hypothesis “at least one group has a median different from the grand median”.
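SciPy does not ship a dedicated sign test, but it can be built from a binomial test: under H₀, each observation is equally likely to fall above or below the hypothesized median θ₀. A sketch on simulated skewed data (the distribution, seed, and null value are illustrative assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
# right-skewed (exponential) data, where a t-test would be unreliable
data = rng.exponential(scale=10, size=80)

null_median = 0.01
above = int(np.sum(data > null_median))
n = int(np.sum(data != null_median))  # drop exact ties with the null median

# two-sided sign test: count of values above theta_0 ~ Binomial(n, 0.5) under H0
result = stats.binomtest(above, n, p=0.5)
print(f"p = {result.pvalue:.3g}")
```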

```python
from scipy import stats

# Mood's median test; the ties keyword controls how values equal
# to the grand median are counted ('below' is the default)
stats.median_test(group1, group2, ..., ties="below")
```
Mood’s median test can be used to compare two or more groups. So if we are comparing more than two groups, then similarly to ANOVA we need to find where the difference in medians occurs (if any). We do so with pairwise median tests.
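The pairwise follow-up can be sketched by running Mood’s median test on every pair of groups (the group names, seed, and distributions below are illustrative assumptions; in practice the pairwise p-values should also be corrected for multiple comparisons):

```python
import numpy as np
from itertools import combinations
from scipy import stats

rng = np.random.default_rng(7)
groups = {
    "A": rng.exponential(scale=5, size=60),
    "B": rng.exponential(scale=5, size=60),
    "C": rng.exponential(scale=5, size=60) + 10,  # clearly shifted median
}

# Mood's median test on each pair; it returns the test statistic,
# the p-value, the grand median, and the contingency table
pvalues = {}
for (name1, g1), (name2, g2) in combinations(groups.items(), 2):
    stat, p, grand_median, table = stats.median_test(g1, g2)
    pvalues[(name1, name2)] = p
    print(f"{name1} vs {name2}: p = {p:.3g}")
```

The pairs involving the shifted group C should come out significant, locating the difference that the overall test only detected in aggregate.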