Analyzing seasonality using Fourier transforms

Analyzing 911 phone call seasonality

Today, let’s analyze 911 phone call data from Montgomery County, PA. We’re looking to answer whether there are higher or lower levels of 911 calls during:

  • Certain hours of the day?
  • Certain days of the week?
  • Certain months of the year?

Based on the results, we can make decisions on how to staff our 911 call center. For example, if we find that call volume is highest on Friday evenings, we can offer more shifts on Friday evenings so our call center can handle the higher call volume.

What does the Fourier transform do?

The Fourier transform breaks a signal down into the frequencies that make it up. On the left, we graphed the sum of two sine waves: one with a period of 5 (frequency 1/5 = 0.2) and the other with a period of 10 (frequency 1/10 = 0.1). In the Fourier transform, we can clearly see that the signal contains two waves with frequencies of 0.2 and 0.1 by reading off the frequencies at the peaks.
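To make this concrete, here is a minimal sketch of the two-wave example. The sample count (1,000) and the unit spacing between samples are illustrative assumptions, not values taken from the original chart.

```python
import numpy as np
from scipy.fft import fft, fftfreq

n = 1000                 # number of samples (illustrative choice)
t = np.arange(n)         # one sample per time unit

# Sum of two sine waves: period 5 (frequency 0.2) and period 10 (frequency 0.1).
signal = np.sin(2 * np.pi * t / 5) + np.sin(2 * np.pi * t / 10)

amplitude = np.abs(fft(signal))   # strength of each frequency component
freqs = fftfreq(n, d=1.0)         # frequency of each FFT bin, in cycles per time unit

# A real-valued signal has a mirrored spectrum, so only look at positive frequencies.
positive = freqs > 0
pos_freqs, pos_amp = freqs[positive], amplitude[positive]

# The two largest peaks sit at the two component frequencies.
top_two = np.sort(pos_freqs[np.argsort(pos_amp)[-2:]])
print(top_two)
```

The two frequencies printed are, up to floating-point rounding, the 0.1 and 0.2 we built into the signal.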

Real data often contains noise, and the Fourier transform lets us see through it to the frequencies that actually matter.

We took the signal from before and added random noise, and we can still clearly see the same frequencies of the signal in the Fourier transform. This is how the Fourier transform separates signal from noise.
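The noisy case can be sketched the same way. The noise here is Gaussian with a standard deviation of 1 and a fixed seed, both of which are assumptions made for illustration; the original chart may have used different noise.

```python
import numpy as np
from scipy.fft import fft, fftfreq

rng = np.random.default_rng(seed=0)   # fixed seed so the sketch is repeatable
n = 1000
t = np.arange(n)

clean = np.sin(2 * np.pi * t / 5) + np.sin(2 * np.pi * t / 10)
noisy = clean + rng.normal(scale=1.0, size=n)   # assumed noise level

amplitude = np.abs(fft(noisy))
freqs = fftfreq(n, d=1.0)
positive = freqs > 0

# The two strongest positive frequencies are still the original components.
order = np.argsort(amplitude[positive])
top_two_noisy = np.sort(freqs[positive][order[-2:]])

# Compare the tallest peak with a typical (median) noise bin.
peak = amplitude[positive].max()
noise_floor = np.median(amplitude[positive])
print(top_two_noisy, peak / noise_floor)
```

The noise is spread thinly across every frequency bin, while each sine wave concentrates its energy in a single bin, which is why the peaks still stand out.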

This article won’t delve into the mathematics and derivation of the Fourier transform. If you’re interested, I recommend watching 3Blue1Brown’s Visual Introduction to the Fourier Transform after completing this exercise. I recommend doing this exercise first because the Fourier transform is one of those concepts where starting with a practical example helps you appreciate the mathematics behind it.

Data preparation

For the data prep, let’s transform the raw data to count the number of calls each hour. We’re aggregating call counts at the hour level because call volume at the minute level is too low and we’re not expecting to see any seasonality below the hour level. As a rule of thumb, you want your sampling frequency to be at least twice the highest component frequency you’re expecting to find in the signal. If your sampling frequency is any lower, a condition called aliasing occurs and distorts your results. The minimum sampling frequency that meets the “2× highest component frequency” rule is referred to as the Nyquist rate. Intuitively, this concept makes sense because we can’t count phone calls per day to answer how the hour of the day impacts phone call volume.
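Aliasing is easier to believe once you see it. The sketch below uses a made-up sine wave with an 8-hour period (not the 911 data) and samples it once per hour and once every 6 hours. The helper name `dominant_period_hours` and all of its parameters are hypothetical, chosen only for this illustration.

```python
import numpy as np
from scipy.fft import fft, fftfreq

def dominant_period_hours(step_hours, n_hours=24 * 40, true_period_hours=8):
    """Sample a sine wave every `step_hours` and return the period of its FFT peak."""
    t = np.arange(0, n_hours, step_hours)            # sample times, in hours
    samples = np.sin(2 * np.pi * t / true_period_hours)
    amplitude = np.abs(fft(samples))
    freqs = fftfreq(len(samples), d=step_hours)      # cycles per hour
    positive = freqs > 0
    peak_freq = freqs[positive][np.argmax(amplitude[positive])]
    return 1 / peak_freq

# An 8-hour cycle needs a sample more often than every 4 hours.
hourly = dominant_period_hours(step_hours=1)   # fast enough: recovers the 8-hour period
coarse = dominant_period_hours(step_hours=6)   # too slow: the cycle shows up at a false period
print(hourly, coarse)
```

Sampled every 6 hours, the 8-hour cycle masquerades as a 24-hour cycle, so an undersampled signal can report seasonality that isn’t really there.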

Also, we need to make sure we fill in any missing hours (where there were no 911 calls) with zeros. Finally, for the signal, let’s chart the difference from the average call count instead of the call count itself. This way, our signal is centered on zero, so the transform isn’t dominated by a large spike at a frequency of zero.
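Here is a rough sketch of these preparation steps in pandas. The data is randomly generated as a stand-in for the real call log, and the column name `timestamp` is an assumption; substitute whatever your raw file actually uses.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the raw data: one row per call, spread over one week.
rng = np.random.default_rng(seed=0)
start = pd.Timestamp("2020-01-01")
offsets = rng.integers(0, 7 * 24 * 3600, size=200)   # seconds after `start`
calls = pd.DataFrame({"timestamp": start + pd.to_timedelta(offsets, unit="s")})

# Count the calls in each hour.
hourly_counts = calls.set_index("timestamp").resample("60min").size()

# Make sure every hour in the window is present, filling silent hours with zero.
full_range = pd.date_range(start, periods=7 * 24, freq="60min")
hourly_counts = hourly_counts.reindex(full_range, fill_value=0)

# Use the difference from the average call count as the signal.
signal = hourly_counts - hourly_counts.mean()
print(signal.head())
```

The explicit `reindex` matters because `resample` only covers the span between the first and last call, so quiet hours at either end of the window would otherwise be dropped.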

The first week of data is plotted on the right. Definitely seeing some seasonality here, so it looks like our analysis will be promising.

Fourier transform

Just pass your signal into an FFT function (scipy.fft.fft is the natural partner of the scipy.fft.fftfreq helper used here) and it’ll output the results of the transform as complex numbers. For the amplitude, take the absolute value of the results. To get the corresponding frequencies, we use scipy.fft.fftfreq. We can then chart the amplitude against the frequency. The frequencies with the highest amplitudes are indicative of seasonal patterns, while frequencies with low amplitudes are noise. Let’s mark the frequencies where we clearly see spikes in amplitude.
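As a sketch of this step, the code below runs the transform on a synthetic hourly signal, since the real call counts aren’t reproduced here. The daily, 8-hour, and weekly components, their strengths, and the noise level are all assumptions chosen to resemble the kind of result described in this article.

```python
import numpy as np
from scipy.fft import fft, fftfreq

rng = np.random.default_rng(seed=0)
n_hours = 24 * 7 * 52            # one year of hourly samples (illustrative)
t = np.arange(n_hours)

# Synthetic stand-in for the hourly call signal.
signal = (
    3.0 * np.sin(2 * np.pi * t / 24)           # daily cycle
    + 1.0 * np.sin(2 * np.pi * t / 8)          # 8-hour cycle
    + 1.0 * np.sin(2 * np.pi * t / (24 * 7))   # weekly cycle
    + rng.normal(scale=1.0, size=n_hours)      # noise
)
signal = signal - signal.mean()   # difference from the average

amplitude = np.abs(fft(signal))   # amplitude = absolute value of the transform
freqs = fftfreq(n_hours, d=1.0)   # d=1.0 hour between samples, so cycles per hour

# Keep the positive frequencies and rank them by amplitude.
positive = freqs > 0
pos_freqs, pos_amp = freqs[positive], amplitude[positive]
top = np.argsort(pos_amp)[::-1][:3]
top_freqs = pos_freqs[top]
print(top_freqs)
```

Setting `d` to the spacing between samples is what gives the frequencies a physical unit. With hourly data and `d=1.0`, each value is in cycles per hour.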

If we look at those frequencies with the highest amplitudes and convert them into hours and days, we see that the top seasonal pattern has a daily frequency (the period is ~1 day).

After that, the amplitude sharply drops off and we see seasonality at 8 hours and 7 days. The former suggests there’s a spike in call volume 3 times a day (potentially morning, evening, and late-night?). The latter suggests that call volume spikes one day out of the week.
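Converting a frequency into something readable is just a reciprocal. In this small sketch the frequencies are in cycles per hour, the three inputs are the periods named above expressed as frequencies, and the helper name `describe_period` is hypothetical.

```python
def describe_period(freq_per_hour):
    """Convert a frequency in cycles per hour into (hours, days, cycles per day)."""
    period_hours = 1.0 / freq_per_hour
    period_days = period_hours / 24.0
    cycles_per_day = 24.0 / period_hours
    return period_hours, period_days, cycles_per_day

# Frequencies corresponding to the daily, 8-hour, and 7-day patterns.
for freq in (1 / 24, 1 / 8, 1 / 168):
    hours, days, per_day = describe_period(freq)
    print(f"{freq:.5f} cycles/hour -> {hours:.1f} hours, {days:.2f} days, {per_day:.2f} cycles/day")
```

The cycles-per-day figure is where the “3 times a day” reading of the 8-hour pattern comes from.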

The other frequencies are difficult to contextualize, but they’re not very important given their low amplitudes.

Inverse Fourier transform

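We can filter the signal by keeping only the frequencies we care about and running the inverse Fourier transform on what remains. Here’s what that looks like if we chart the filtered signal over the original signal for the first 5 days of data.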
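One way to produce a filtered signal like this is to zero out every frequency except the strongest one and then apply the inverse transform. Keeping only a single frequency is an assumption on my part for this sketch, and the signal is synthetic (a daily sine wave plus noise) rather than the real call data.

```python
import numpy as np
from scipy.fft import fft, ifft, fftfreq

rng = np.random.default_rng(seed=0)
n_hours = 24 * 28                  # four weeks of hourly samples (illustrative)
t = np.arange(n_hours)

# Synthetic stand-in: a daily cycle buried in noise.
signal = 3.0 * np.sin(2 * np.pi * t / 24) + rng.normal(scale=1.0, size=n_hours)
signal = signal - signal.mean()

spectrum = fft(signal)             # complex values: amplitude and phase per frequency
freqs = fftfreq(n_hours, d=1.0)    # cycles per hour

# Find the strongest positive frequency.
positive = freqs > 0
peak_freq = freqs[positive][np.argmax(np.abs(spectrum[positive]))]

# Keep that frequency and its negative mirror, zero everything else.
keep = np.isclose(np.abs(freqs), peak_freq)
filtered_spectrum = np.where(keep, spectrum, 0)

# Back to the time domain; the imaginary part is only rounding error.
filtered = ifft(filtered_spectrum).real
print(1 / peak_freq)
```

The negative mirror has to be kept along with the positive frequency, otherwise the inverse transform of a real signal comes back complex.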

Looks promising! The peaks in the filtered signal line up with the original signal around 5 pm. The problem is that when we stretch this out to the last week of data, the peaks start occurring at 11 am instead.

So what gives? The problem is that our frequency wasn’t exactly once every 24 hours. It was actually once every 23.996 hours, and over the course of the entire dataset, that small deviation adds up.
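The arithmetic behind that drift is simple enough to sketch. The 23.996-hour period is the figure quoted above; the helper name `drift_after` is hypothetical.

```python
estimated_period_hours = 23.996   # period found by the transform
nominal_period_hours = 24.0       # the true daily cycle

def drift_after(n_cycles):
    """Hours by which the modeled peak runs ahead of the true peak after n_cycles."""
    return n_cycles * (nominal_period_hours - estimated_period_hours)

# How many cycles does it take to slip by six hours (for example, 5 pm to 11 am)?
cycles_for_six_hours = 6.0 / (nominal_period_hours - estimated_period_hours)

print(drift_after(7))             # drift after one week
print(cycles_for_six_hours)       # cycles needed for a six-hour shift
```

Each cycle comes up only 0.004 hours (about 14 seconds) short, so the error is invisible over a few days, but it takes roughly 1,500 cycles, a little over four years of daily cycles, to slip by six hours.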

What can we do?

A regression model can pick up where the Fourier transform leaves off. It will help us figure out when seasonality spikes by trying different inputs inspired by our Fourier results. Additionally, it will allow us to combine seasonality with other variables so we can predict future call volume more accurately. We’ll also see how seasonality is often used as a way to explain the residuals in a regression model.

I’m currently working on a companion article on how to use seasonality in your regression models, and I will link it here once it’s complete. Be sure to follow me so you’ll be notified when it’s out.

Helpful links