6 Ways DS/ML Gods Boost Accuracy Without Trying | by Emmett Boudreau | Jul, 2020

Another surprisingly easy way to boost a model’s accuracy is to get rid of any outliers in your data. There are several ways to go about removing outliers; one great way is to use the z-score. Another is to get rid of any values above the third quartile of your data. We do this because these values can, of course, distort mathematical summaries of the data such as the mean or standard deviation. That in turn can cause the model, which likely works off of values like the mean and standard deviation, to predict low or high, depending on where your stray data is.
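To make the z-score idea concrete, here is a minimal sketch in plain Python. The function name `zscore_outlier_mask`, the sample `ages` list, and the cutoff of 3 standard deviations are all assumptions chosen for illustration; they are not taken from any particular library, and the right threshold depends on your data.

```python
import statistics

def zscore_outlier_mask(values, threshold=3.0):
    """Return a list of booleans, True where |z-score| exceeds threshold."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    if sigma == 0:
        # All values are identical, so nothing can be an outlier.
        return [False] * len(values)
    return [abs((v - mu) / sigma) > threshold for v in values]

# Hypothetical sample: twelve plausible ages and one stray value.
ages = [23, 25, 22, 27, 24, 26, 25, 23, 24, 26, 25, 24, 120]

mask = zscore_outlier_mask(ages, threshold=3.0)
# Keep only the values that were not flagged.
cleaned = [v for v, is_outlier in zip(ages, mask) if not is_outlier]
```

One caveat with this approach: the outlier itself inflates the mean and standard deviation used to compute the z-scores, so on very small samples a stray value may not clear a high threshold.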

The easiest way to remove outliers from data like this is to replace any problematic values with the mean. To start, we will get the third quartile of the data and treat it as the cutoff. Alternatively, you could take the mean of all of the data above the mean and use that as the cutoff instead.
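As a rough sketch of that replacement step, the snippet below uses only the Python standard library. The function name `replace_above_q3_with_mean`, the sample `ages` list, and the choice of the "inclusive" quantile method are assumptions made for illustration; other quantile definitions can give slightly different cutoffs.

```python
import statistics

def replace_above_q3_with_mean(values):
    """Replace every value above the third quartile with the overall mean."""
    # quantiles(n=4) returns the three cut points [Q1, Q2, Q3].
    _, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    # The mean is computed over all values, stray ones included.
    mean = statistics.mean(values)
    return [mean if v > q3 else v for v in values]

# Hypothetical sample with one stray value at the end.
ages = [21, 22, 23, 24, 25, 26, 27, 28, 95]
result = replace_above_q3_with_mean(ages)
```

Keep in mind that a plain third-quartile cutoff touches roughly the top quarter of the values, not just the extreme ones. A more conservative cutoff that is commonly used is Q3 plus 1.5 times the interquartile range.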

For this example, we will switch over to Python. I use both languages in this article because they are rather similar, and using them together makes the article more accessible to developers on both sides of the spectrum. Consider the following DataFrame: