Contents

- 1 How a Confusion Matrix Behaves Under Distributions of Prediction Scores
- 2 Prediction Score
- 3 Example: Apple Snacks
- 4 Assessing Performance
- 5 Prediction Bias
- 6 Real World Data: Titanic Survival
- 7 Real World Data: Broward County Recidivism
- 8 Independent Predictive Models for Broward County Recidivism
- 9 Conclusion
- 10 References

## How a Confusion Matrix Behaves Under Distributions of Prediction Scores

As algorithms increasingly make decisions about human affairs, it is important that these algorithms and the data they rely on be fair and unbiased. One of the diagnostics for algorithmic bias is the *Confusion Matrix*, a table that shows what kinds of errors are made in predictions. While everyone who works with data knows what a Confusion Matrix *is*, it is a more subtle matter to gain intuition for how it *behaves* under different kinds of distributions of predictions and outcomes and the range of possible decision thresholds.

In this article, I walk through an interactive **Confusion Matrix Dashboard** that you can play with to explore different data sets and prediction models, and watch how the Confusion Matrix behaves. You can load your own data. Two of the included data sets are purely synthetic distributions with knobs that you can adjust. Another data set contains synthesized examples that illustrate how algorithmic bias can be distinguished from ambient imbalances. By ambient imbalances, I mean that different groups can inherently hold different distributions of features that lead to different distributions of outcomes. I propose a novel measure for prediction bias, called the Positive Prediction Ratio Score (PPRS), that is independent of the Confusion Matrix, but instead compares curves of positive outcome ratios across the range of prediction scores.
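The PPRS itself is defined later, but the curves it compares can be sketched now. Below is a minimal illustration, under my own assumptions, of computing a positive outcome ratio curve: bin the prediction scores, and for each bin take the fraction of cases whose observed outcome was Positive. The function name and binning scheme are illustrative, not the dashboard's actual implementation.

```python
def positive_outcome_ratio_curve(scores, outcomes, n_bins=10):
    """For each prediction-score bin, the fraction of cases with a
    Positive outcome. scores lie in [0, 1]; outcomes are 1 (Positive)
    or 0 (Negative). Empty bins yield None."""
    counts = [0] * n_bins
    positives = [0] * n_bins
    for score, outcome in zip(scores, outcomes):
        b = min(int(score * n_bins), n_bins - 1)  # clip score == 1.0 into last bin
        counts[b] += 1
        positives[b] += outcome
    return [p / c if c else None for p, c in zip(positives, counts)]

# Three cases: one in bin 0 (Positive), two in bin 1 (one Positive, one Negative)
curve = positive_outcome_ratio_curve([0.05, 0.15, 0.15], [1, 1, 0])
# curve[0] == 1.0, curve[1] == 0.5, remaining bins are None
```

Comparing such curves between two groups, rather than comparing confusion-matrix cells, is the idea behind the PPRS.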

The Confusion Matrix Dashboard also includes two sets of real data about serious matters, accompanied by a few prediction models. One model of interest is the COMPAS model that is used to predict criminal recidivism. The COMPAS model has come under fire for alleged algorithmic bias, a claim based on the way that False Positive and False Negative rates show up in the Confusion Matrix for different racial groups. There is, however, no single consistent way to define algorithmic bias. The Confusion Matrix Dashboard allows us to explore ways that the underlying data distributions and prediction models can give rise to allegations of bias that might be misguided.

To accompany this article, I prepared some videos that walk through the main concepts. The 2-minute promo for this article is:

**Algorithmic Bias and the Confusion Matrix Dashboard — Promo**

## Prediction Score

An initial appreciation for the Confusion Matrix can be gained by considering a process that produces a *prediction score* for a binary outcome variable. After assigning the score, we perform an experiment and make an observation. The observation outcome is tallied as either Positive or Negative. Doing this many times, we can build two distributions of observed outcomes, one distribution for the Positive outcomes, and one for the Negative outcomes.
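The tallying process described above can be written in a few lines. This is a minimal sketch, with an illustrative function name: given prediction scores, observed outcomes, and a decision threshold, count the four Confusion Matrix cells.

```python
def confusion_matrix(scores, outcomes, threshold):
    """Tally the four Confusion Matrix cells at a decision threshold.

    scores   : prediction scores in [0, 1]
    outcomes : observed outcomes, 1 = Positive, 0 = Negative
    Returns (true_pos, false_pos, false_neg, true_neg).
    """
    tp = fp = fn = tn = 0
    for score, outcome in zip(scores, outcomes):
        predicted_positive = score >= threshold
        if predicted_positive and outcome == 1:
            tp += 1          # predicted Positive, observed Positive
        elif predicted_positive:
            fp += 1          # predicted Positive, observed Negative
        elif outcome == 1:
            fn += 1          # predicted Negative, observed Positive
        else:
            tn += 1          # predicted Negative, observed Negative
    return tp, fp, fn, tn

# Four scored cases at threshold 0.5: one of each cell
# (0.9, 1) -> TP, (0.6, 0) -> FP, (0.4, 1) -> FN, (0.2, 0) -> TN
cells = confusion_matrix([0.9, 0.6, 0.4, 0.2], [1, 0, 1, 0], 0.5)
# cells == (1, 1, 1, 1)
```

Sweeping the threshold across the score axis and re-tallying is exactly the experiment the dashboard lets you perform interactively.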

The Confusion Matrix Dashboard allows you to experiment with two different kinds of interactive synthetic distributions. One synthetic distribution defines Positive and Negative outcomes to fall in bumps, or “bell curves”. You can play with the height, width, and locations of the Positive and Negative outcome bumps. A second kind of distribution places the Positive and Negative outcomes more uniformly along the prediction score axis. You can play with the rise and fall of these distributions as score increases.
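To make the first kind of synthetic distribution concrete, here is one way such two-bump data might be generated: draw Positive and Negative scores from separate Gaussian bumps, clipped to [0, 1]. The function name and default knob settings are my own assumptions for illustration, not the dashboard's parameters.

```python
import random

def sample_bump_scores(n_pos, n_neg, pos_center=0.7, neg_center=0.3,
                       width=0.1, seed=0):
    """Synthetic prediction scores where Positive and Negative outcomes
    each fall in a Gaussian 'bump' (bell curve), clipped to [0, 1].
    The centers and width play the role of the dashboard's knobs."""
    rng = random.Random(seed)
    clip = lambda x: min(1.0, max(0.0, x))
    pos = [clip(rng.gauss(pos_center, width)) for _ in range(n_pos)]
    neg = [clip(rng.gauss(neg_center, width)) for _ in range(n_neg)]
    return pos, neg

pos, neg = sample_bump_scores(1000, 1000)
# With these centers, the Positive bump sits well to the right of the
# Negative bump, so most thresholds between them separate the classes well.
```

Moving the bump centers closer together, or widening the bumps so they overlap, is what makes the Confusion Matrix's behavior under a sweeping threshold interesting.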