
# 5 Powerful Visualisation with Pandas for Data Preprocessing | by Kaushik Choudhury | Aug, 2020

Autocorrelation plot

Autocorrelation plots are a quick litmus test of whether the data points are random. If the data points follow a certain trend, one or more of the autocorrelations will be significantly non-zero. The dashed line in the plot marks the 99% confidence band.

In the code below, we check whether the total_bill amounts in the “tips” dataset are random.

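```python
import matplotlib.pyplot as plt
from pandas.plotting import autocorrelation_plot

autocorrelation_plot(MealDatabase.total_bill)  # MealDatabase: the "tips" DataFrame
plt.show()
```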

The autocorrelation stays very close to zero at all time lags, which suggests that the total_bill data points are random.

When we draw the autocorrelation plot for data points that follow a particular order, the autocorrelations are significantly non-zero.

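```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import autocorrelation_plot

data = pd.Series(np.arange(12, 7000, 16.3))  # steadily increasing, non-random
autocorrelation_plot(data)
plt.show()
```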
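To see what the autocorrelation plot is measuring, here is a minimal NumPy sketch. It assumes the usual sample autocorrelation (the lag-k autocovariance divided by the lag-0 autocovariance, both normalised by n) and a 99% band of ±z/√n around zero with z ≈ 2.576. We believe this is what pandas draws, but treat it as an illustration; the helper names `autocorrelation` and `band_99` are hypothetical.

```python
import numpy as np

Z99 = 2.5758293035489004  # two-sided z-score for a 99% band (assumed band definition)


def autocorrelation(values, lag):
    """Hypothetical helper: lag-k sample autocorrelation, i.e. the autocovariance
    at `lag` divided by the lag-0 autocovariance, both normalised by n."""
    x = np.asarray(values, dtype=float)
    n = len(x)
    mean = x.mean()
    c0 = np.sum((x - mean) ** 2) / n                          # lag-0 autocovariance
    ck = np.sum((x[: n - lag] - mean) * (x[lag:] - mean)) / n  # lag-k autocovariance
    return ck / c0


def band_99(n):
    """Hypothetical helper: half-width of the 99% band around zero for n points."""
    return Z99 / np.sqrt(n)


trend = np.arange(12, 7000, 16.3)  # the ordered series used above
print(f"99% band: +/-{band_99(len(trend)):.3f}")
for lag in (1, 10, 100):
    print(lag, round(autocorrelation(trend, lag), 3))
```

For an ordered series like this one, the low-lag autocorrelations land far outside the band, which is what makes the plot "significantly non-zero".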

Lag Plots

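Lag plots also help verify whether a dataset is a random set of values or follows a certain trend. A lag plot scatters each value y(t) against the value that follows it, y(t + lag), with lag = 1 by default.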

The lag plot of the total_bill values from the “tips” dataset agrees with the autocorrelation plot: the points are scattered all over the place, which suggests random data.

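```python
import matplotlib.pyplot as plt
from pandas.plotting import lag_plot

lag_plot(MealDatabase.total_bill)  # MealDatabase: the "tips" DataFrame
plt.show()
```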

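When we lag plot a non-random data series, such as the evenly spaced values in the code below, we get a smooth line: consecutive values differ by exactly 10, so every point satisfies y(t + 1) = y(t) + 10 and the points fall on a straight line.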

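```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import lag_plot

data = pd.Series(np.arange(-12 * np.pi, 300 * np.pi, 10))  # evenly spaced, non-random
lag_plot(data)
plt.show()
```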
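A lag plot is just a scatter of consecutive pairs. The sketch below builds those pairs with `Series.shift` (the helper `lag_pairs` is hypothetical, not part of pandas) and summarises each cloud with a Pearson correlation, a number that `lag_plot` itself does not report. The evenly spaced series lies exactly on a line, while simulated random noise gives a correlation near zero.

```python
import numpy as np
import pandas as pd


def lag_pairs(series, lag=1):
    """Hypothetical helper: the (y(t), y(t + lag)) pairs a lag plot scatters."""
    s = pd.Series(series).reset_index(drop=True)
    pairs = pd.DataFrame({"y(t)": s, "y(t+lag)": s.shift(-lag)})
    return pairs.dropna()  # the last `lag` rows have no successor


# The evenly spaced series from the lag_plot example above
ordered = pd.Series(np.arange(-12 * np.pi, 300 * np.pi, 10))
pairs = lag_pairs(ordered)
print(pairs.head())
print("ordered:", pairs["y(t)"].corr(pairs["y(t+lag)"]))

# Simulated random noise for comparison (seeded so the run is repeatable)
rng = np.random.default_rng(42)
noise_pairs = lag_pairs(rng.normal(size=1000))
print("noise:", noise_pairs["y(t)"].corr(noise_pairs["y(t+lag)"]))
```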

Parallel coordinates

It is always a challenge to wrap our heads around, and visualize, data with more than three dimensions. Parallel coordinates are very useful for plotting such higher-dimensional datasets: each dimension is represented by a vertical line.

In parallel coordinates, “N” equally spaced vertical lines represent the “N” dimensions of the dataset. Each data point is drawn as a polyline that crosses every axis, and the position of its vertex on the n-th axis corresponds to the n-th coordinate of the point.
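As a small sketch of that mapping, assuming axis n is drawn at horizontal position n (pandas' default layout), each row turns into a list of (axis, value) vertices. The data frame and the `polyline_vertices` helper below are hypothetical, for illustration only.

```python
import pandas as pd

# Hypothetical toy data: two points with three features each
points = pd.DataFrame(
    {"f1": [1.0, 4.0], "f2": [3.0, 2.0], "f3": [5.0, 0.5]},
    index=["p", "q"],
)


def polyline_vertices(row):
    """Hypothetical helper: axis n sits at x = n, and the vertex height on it
    is the point's n-th coordinate."""
    return [(n, float(value)) for n, value in enumerate(row)]


for name, row in points.iterrows():
    print(name, polyline_vertices(row))
# p [(0, 1.0), (1, 3.0), (2, 5.0)]
# q [(0, 4.0), (1, 2.0), (2, 0.5)]
```

`pandas.plotting.parallel_coordinates(frame, class_column)` draws one such polyline per row, coloured by the values in `class_column`.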

Confusing!

Let us consider a small sample dataset with five features for small and large widgets.