Skip to content
Search
Generic filters
Exact matches only

4 Pandas Plotting Function You Should Know | by Cornellius Yudha Wijaya | Aug, 2020

A lag plot is a scatter plot for a time series and the same data lagged. Lag itself is a fixed amount of passing time; for example, lag 1 is a day 1 (Y1) with a 1-day time lag (Y1+1 or Y2).

A lag plot is used to checks whether the time series data is random or not, and if the data is correlated with themselves. Random data should not have any identifiable patterns, such as linear. Although, why we bother with randomness or correlation? This is because many Time Series models are based on the linear regression, and one assumption is no correlation (Specifically is no Autocorrelation).

Let’s try with an example data. In this case, I would use a specific package to scrap stock data from Yahoo Finance called yahoo_historical.

pip install yahoo_historical

With this package, we could scrap a specific stock data history. Let’s try it.

from yahoo_historical import Fetcher#We would scrap the Apple stock data. I would take the data between 1 January 2007 to 1 January 2017 
data = Fetcher("AAPL", [2007,1,1], [2017,1,1])
apple_df = data.getHistorical()
#Set the date as the index
apple_df['Date'] = pd.to_datetime(apple_df['Date'])
apple_df = apple_df.set_index('Date')

Above is our Apple stock dataset with the date as the index. We could try to plot the data to see the pattern over time with a simple method.

apple_df['Adj Close'].plot()

We can see the Adj Close is increasing over time but is the data itself shown any pattern in with their lag? In this case, we would use the lag_plot.

#Try lag 1 day
pd.plotting.lag_plot(apple_df['Adj Close'], lag = 1)

As we can see in the plot above, it is almost near linear. It means there is a correlation between daily Adj Close. It is expected as the daily price of the stock would not be varied much in each day.

How about a weekly basis? Let’s try to plot it

#The data only consist of work days, so one week is 5 dayspd.plotting.lag_plot(apple_df['Adj Close'], lag = 5)

We can see the pattern is similar to the lag 1 plot. How about 365 days? would it have any differences?

pd.plotting.lag_plot(apple_df['Adj Close'], lag = 365)

We can see right now the pattern becomes more random, although the non-linear pattern still exists.