# Time Series Analysis Part 1: Regression with a Twist

We’re surrounded by time series. It’s one of the more common plots we see in day-to-day life. Finance and economics are full of them – stock prices, GDP over time, and 401(k) value over time, to name a few. The plot looks deceptively simple: just a nice univariate squiggle. No crazy vectors, no surfaces, just one predictor – time. It turns out time is a tricky and fickle explanatory variable, which makes the analysis of time series more nuanced than it appears at first glance. This nuance is obscured by how easy it is to fit time series models automatically in languages like R.^{1} As nice as this is for practitioners, the mathematics behind the analysis gets lost, and ignoring the mathematics can lead to improper use of these tools. This series will examine some of the mathematics behind stationarity and what is known as ARIMA (**A**uto-**R**egressive **I**ntegrated **M**oving **A**verage) modeling. Part 1 will examine the very basics, showing that time series modeling is really just regression with a twist.

## Back to Basics: Regression Line

We all know the basic equation for a line: y = mx + b. Now, if I change the independent, or explanatory, variable^{2} to time and denote it t, then a basic equation for some phenomenon y that depends linearly on time can be written as

y = mt + b

Plugging in some numbers, perhaps y = 2t + 4. Next step: make it regular linear regression. That means there are some additional **error terms** that cause our line to be imperfect. These error terms can be due to all sorts of things, but are typically attributed to natural variation. For each point in time at which we take a measurement, we get an error term \epsilon_{t}. Let’s denote by y_{t} the value of y at time t. Then, with our error terms, the regression equation becomes

y_{t} = mt + b + \epsilon_{t}

Traditional regression analysis typically uses the method of least squares to estimate m and b, and assumes that the error terms \epsilon_{t} are drawn randomly and independently from the same distribution (typically a normal distribution) with constant variance. That is, it’s assumed that the \epsilon_{t} are i.i.d. and don’t form a **random process** in which each term depends on previous error terms.
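As a rough illustration of the setup above, here is a minimal sketch that simulates y_{t} = 2t + 4 + \epsilon_{t} with i.i.d. normal error terms and recovers m and b with the closed-form least squares formulas. The function names, sample size, noise level, and seed are all assumptions made for this example, not anything prescribed by the method.

```python
import random

def simulate_line(m, b, n, sigma, seed=0):
    """Return times t = 0..n-1 and observations y_t = m*t + b + eps_t,
    where the eps_t are i.i.d. normal with mean 0 and std dev sigma."""
    rng = random.Random(seed)
    ts = list(range(n))
    ys = [m * t + b + rng.gauss(0.0, sigma) for t in ts]
    return ts, ys

def least_squares(ts, ys):
    """Closed-form ordinary least squares estimates of slope and intercept."""
    n = len(ts)
    t_bar = sum(ts) / n
    y_bar = sum(ys) / n
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    sxx = sum((t - t_bar) ** 2 for t in ts)
    m_hat = sxy / sxx
    b_hat = y_bar - m_hat * t_bar
    return m_hat, b_hat

# Illustrative values only: the line y = 2t + 4 from the text, unit-variance noise.
ts, ys = simulate_line(m=2.0, b=4.0, n=200, sigma=1.0)
m_hat, b_hat = least_squares(ts, ys)

# Residuals are the observed values minus the fitted line.
residuals = [y - (m_hat * t + b_hat) for t, y in zip(ts, ys)]
print(f"m_hat = {m_hat:.3f}, b_hat = {b_hat:.3f}")
```

Because the simulated error terms really are i.i.d., the estimates should land close to the true slope and intercept. The rest of the article is about what changes when that assumption fails.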

## Time for the Twist

So what happens when those error terms aren’t exactly just being drawn randomly out of a hat? As I mentioned in the introduction, time is a bit tricky. We can’t assume that the residuals at each point in time are actually truly independent of each other. In time series analysis, we replace that \epsilon_{t} with a **random process** we’ll call X_{t}. That is, now we assume there is an actual order to the residuals, and there is some kind of process that governs them. A random process \{X_{t}, t \geq 0\} is a sequence of random variables that may or may not be identically distributed, or even independent. To proceed further, we’ll need to define a couple of things: the autocovariance function and the notion of stationarity^{3}.

### Autocovariance Function

If we take a random process \{X_{t}\}, it’s really just a collection of random variables with an ordering or indexing. Just like with regular random variables, we can calculate the covariance between them, measuring the joint variability. Here, we call it the **autocovariance** between two random variables in the time series, and the formula is defined exactly as regular covariance. We denote by \gamma_{X} the autocovariance function of the random process \{X_{t}\}, and define it for two points in the sequence X_{r}, X_{s} as

\gamma_{X}(r,s) = \text{Cov}(X_{r},X_{s}) = E[(X_{r} - E[X_{r}])(X_{s} - E[X_{s}])]

where E[\cdot] is the expectation (or mean) of the random variable, and r,s are indices. (See this post for an explanation of expectation.)
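To make the definition concrete, the sketch below estimates \gamma_{X}(r,s) by averaging over many independent realizations of a process. It assumes the simplest possible case, a process of i.i.d. normal variables, so the estimate should be near the variance when r = s and near 0 otherwise. The helper names, path count, and seed are hypothetical choices for illustration.

```python
import random

def simulate_paths(n_paths, length, sigma=1.0, seed=1):
    """Generate independent realizations of an i.i.d. N(0, sigma^2) process."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, sigma) for _ in range(length)]
            for _ in range(n_paths)]

def autocovariance(paths, r, s):
    """Estimate gamma_X(r, s) = E[(X_r - E[X_r]) (X_s - E[X_s])]
    by replacing each expectation with an average across realizations."""
    n = len(paths)
    mean_r = sum(p[r] for p in paths) / n
    mean_s = sum(p[s] for p in paths) / n
    return sum((p[r] - mean_r) * (p[s] - mean_s) for p in paths) / n

paths = simulate_paths(n_paths=5000, length=10)
print(autocovariance(paths, 3, 3))  # estimate of Var(X_3), true value sigma^2 = 1
print(autocovariance(paths, 3, 7))  # estimate of Cov(X_3, X_7), true value 0
```

In practice we usually observe only one realization of a time series, which is part of why the stationarity assumption discussed next matters so much.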

### Stationarity

Stationarity is an important concept in the study of random processes, as its existence yields many mathematical properties we need to make inferences and forecasts later. It was important in discussing Poisson processes, and also shows up again here. The definition of stationarity for time series looks a little different, but the meaning is inherently the same.

**Definition: Stationarity**

A time series \{X_{t}, t \in \mathbb{Z}\} is said to be **stationary** if the following hold:

(i) E[|X_{t}|^{2}] < \infty for all t

(ii) E[X_{t}] = m for all t

(iii) \gamma_{X}(r,s) = \gamma_{X}(r+t, s+t) for all r,s,t

We’ll pick apart each piece of the definition to understand the notion of stationarity. The first part just ensures we have a finite variance for all points in time. We prefer not to deal with infinity. The second part means that all variables in the random process must have the same mean. If the mean varies with time, it’s not a stationary process.

Finally, the third requirement of the definition may be viewed this way. Two random variables in the sequence, X_{r} and X_{s}, that are r-s apart^{4} have a certain autocovariance. If we shift both variables by the same amount of time t, then the two new random variables X_{r+t} and X_{s+t} should have the same autocovariance as the first pair X_{r}, X_{s}. That is, in a stationary process, the autocovariance depends only on how far apart the variables are in the sequence, not on their location in time.

This means we can actually write the autocovariance function of a stationary process in a special way, one that only shows the distance between the current point X_{t} and some **lag** X_{t+h}:

\gamma_{X}(h) = \gamma_{X}(t+h, t) = \text{Cov}(X_{t+h},X_{t})

for all t,h.

Let’s test an example of a random process for stationarity. If a random process is indeed stationary, then all parts of the definition should be satisfied, in particular that \gamma_{X}(h) does not have any dependence on t – only on h.

**Example:** Let’s assume A and B are two uncorrelated random variables (that is, \text{Cov}(A,B) = 0). Assume the mean of both A and B is 0 (E[A] = E[B] = 0), and the variance of both A and B is 1 (\text{Var}[A] = \text{Var}[B] = 1). Suppose \theta \in [-\pi, \pi] is a fixed angle, and the random process is given by X_{t} = A\cos(\theta t) + B\sin(\theta t). Is \{X_{t}\} stationary?

First, note that A and B are the only random variables here. Both have finite variance, and both cosine and sine stay between \pm 1, so E[|X_{t}|^{2}] is finite and part (i) of our definition definitely holds. Next, we have to make sure the mean is the same for all X_{t}:

\begin{aligned}E[X_{t}] &= E[A\cos(\theta t) + B\sin(\theta t)]\end{aligned}

Now, \cos(\theta t) and \sin(\theta t) aren’t random at all, so the expectation of them is just…well…them. We also know that the expectation is a linear operator^{5}, so

\begin{aligned}E[X_{t}] &= \cos(\theta t)E[A] + \sin(\theta t)E[B]\\&= 0\end{aligned}

because E[A] = E[B] = 0. Good, part (ii) is satisfied.

Finally, we have to see if \gamma_{X}(h) has any t in it:

\begin{aligned}\gamma_{X}(h) &= \text{Cov}(X_{t+h},X_{t})\\&=\text{Cov}[A\cos(\theta (t+h)) + B\sin(\theta(t+h)), A\cos(\theta t) + B\sin(\theta t)]\end{aligned}

OK, stick with me. We’re going to use the fact that \text{Cov}(u+v, w+z) =\text{Cov}(u,w) +\text{Cov}(u,z) +\text{Cov}(v,w) +\text{Cov}(v,z). Now,

\begin{aligned}\gamma_{X}(h) &= \text{Cov}(X_{t+h},X_{t})\\&=\text{Cov}[A\cos(\theta (t+h))+B\sin(\theta(t+h)), A\cos(\theta t)+B\sin(\theta t)]\\&=\text{Cov}(A\cos(\theta(t+h)),A\cos(\theta t))+\text{Cov}(A\cos(\theta(t+h)),B\sin(\theta t))\\&\qquad+\text{Cov}(B\sin(\theta(t+h)),A\cos(\theta t))+\text{Cov}(B\sin(\theta(t+h)),B\sin(\theta t))\end{aligned}

Now, remember that the cosines and sines aren’t random. That means they are just constants as far as the covariance is concerned, so we can pull them out, multiplying them together. That is,

\text{Cov}(A\cos(\theta(t+h)), A\cos(\theta t)) = \cos(\theta(t+h))\cos(\theta t)\text{Cov}(A,A)

for the first term. We do the same with the other three terms:

\begin{aligned}\gamma_{X}(h) &=\text{Cov}(X_{t+h},X_{t})\\&=\text{Cov}[A\cos(\theta (t+h)) + B\sin(\theta(t+h)), A\cos(\theta t)+B\sin(\theta t)]\\&=\text{Cov}(A\cos(\theta(t+h)),A\cos(\theta t))+\text{Cov}(A\cos(\theta(t+h)),B\sin(\theta t))\\&\qquad+\text{Cov}(B\sin(\theta(t+h)),A\cos(\theta t))+\text{Cov}(B\sin(\theta(t+h)),B\sin(\theta t))\\&=\cos(\theta(t+h))\cos(\theta t)\text{Cov}(A,A)+\cos(\theta(t+h))\sin(\theta t)\text{Cov}(A,B)\\&\qquad+\sin(\theta(t+h))\cos(\theta t)\text{Cov}(B,A)+\sin(\theta(t+h))\sin(\theta t)\text{Cov}(B,B)\end{aligned}

Now, we already know that \text{Cov}(A,B) = \text{Cov}(B,A) = 0, and \text{Cov}(A,A) = \text{Var}(A) = 1 = \text{Var}(B) = \text{Cov}(B,B). Then we get

\begin{aligned}\gamma_{X}(h) &=\text{Cov}(X_{t+h},X_{t})\\&=\text{Cov}[A\cos(\theta (t+h))+B\sin(\theta(t+h)), A\cos(\theta t)+B\sin(\theta t)]\\&=\text{Cov}(A\cos(\theta(t+h)),A\cos(\theta t))+\text{Cov}(A\cos(\theta(t+h)),B\sin(\theta t))\\&\qquad+\text{Cov}(B\sin(\theta(t+h)),A\cos(\theta t))+\text{Cov}(B\sin(\theta(t+h)),B\sin(\theta t))\\&=\cos(\theta(t+h))\cos(\theta t)\text{Cov}(A,A)+\cos(\theta(t+h))\sin(\theta t)\text{Cov}(A,B)\\&\qquad+\sin(\theta(t+h))\cos(\theta t)\text{Cov}(B,A)+\sin(\theta(t+h))\sin(\theta t)\text{Cov}(B,B)\\&=\cos(\theta (t+h))\cos(\theta t)+\sin(\theta(t+h))\sin(\theta t)\end{aligned}

Finally, we can recognize a trigonometric identity^{6}: \cos(u-v) = \cos(u)\cos(v)+\sin(u)\sin(v), where u = \theta t and v = \theta(t+h). Then we end with

\begin{aligned}\gamma_{X}(h) &= \cos(\theta t - \theta(t+h))\\&= \cos(-\theta h)\\&= \cos(\theta h)\end{aligned}

because cosine is an even function. Notice that the autocovariance function doesn’t depend on t, so our process is indeed stationary.
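The result can be sanity-checked numerically. The sketch below is a Monte Carlo estimate of \text{Cov}(X_{t+h}, X_{t}) for several values of t, compared against \cos(\theta h). It assumes A and B are independent standard normal variables, which is stronger than the "uncorrelated, mean 0, variance 1" condition in the example, and the values of \theta, h, the draw count, and the seed are arbitrary choices.

```python
import math
import random

def estimate_gamma(theta, t, h, n_draws=20000, seed=2):
    """Monte Carlo estimate of Cov(X_{t+h}, X_t) for
    X_t = A cos(theta t) + B sin(theta t).

    Assumption: A and B are independent standard normals, one choice that
    satisfies the example's conditions (uncorrelated, mean 0, variance 1).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        a = rng.gauss(0.0, 1.0)
        b = rng.gauss(0.0, 1.0)
        x_t = a * math.cos(theta * t) + b * math.sin(theta * t)
        x_th = a * math.cos(theta * (t + h)) + b * math.sin(theta * (t + h))
        # E[X_t] = 0 for every t, so the covariance reduces to E[X_{t+h} X_t].
        total += x_th * x_t
    return total / n_draws

theta, h = 0.7, 3  # arbitrary illustrative values
for t in (0, 5, 50):
    print(t, round(estimate_gamma(theta, t, h), 3), round(math.cos(theta * h), 3))
```

Up to sampling noise, the estimate should agree with \cos(\theta h) no matter which t is used, which is what stationarity requires.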

## Conclusion

Now that we understand some terminology we’ll need later, we can go to the most high-level definition of a time series. A time series \{Y_{t}, t \geq 0\} is a random process given by the classical decomposition

Y_{t} = m_{t} + s_{t} + X_{t}

where m_{t} is called the **trend component** (like our line y = mt + b), s_{t} is called the **seasonal component**^{7}, and X_{t} is a **stationary process**.
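As a small illustration of the decomposition, the sketch below builds a synthetic series from its three pieces and then subtracts the known trend and seasonal parts to get X_{t} back. Everything here is an assumption chosen for the example: a linear trend 2t + 4, a sine wave with period 12 as the seasonal component, and i.i.d. normal noise standing in for the stationary process.

```python
import math
import random

def simulate_series(n, slope=2.0, intercept=4.0, amplitude=5.0,
                    period=12, sigma=1.0, seed=3):
    """Build Y_t = m_t + s_t + X_t from hypothetical components."""
    rng = random.Random(seed)
    # Trend component m_t: a straight line in t.
    trend = [slope * t + intercept for t in range(n)]
    # Seasonal component s_t: a sine wave that repeats every `period` steps.
    seasonal = [amplitude * math.sin(2 * math.pi * t / period) for t in range(n)]
    # X_t: i.i.d. normal noise, the simplest example of a stationary process.
    noise = [rng.gauss(0.0, sigma) for _ in range(n)]
    series = [m + s + x for m, s, x in zip(trend, seasonal, noise)]
    return trend, seasonal, noise, series

trend, seasonal, noise, series = simulate_series(n=120)

# With the trend and seasonal components known, subtracting them leaves X_t.
recovered = [y - m - s for y, m, s in zip(series, trend, seasonal)]
print(max(abs(r - x) for r, x in zip(recovered, noise)))  # tiny rounding error
```

With real data the trend and seasonal components are not known and have to be estimated first, so the leftover series is only an approximation of X_{t}.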

There is one particular family of models for X_{t} that we can study, called ARIMA (**A**uto-**R**egressive **I**ntegrated **M**oving **A**verage) models. After removing the trend and seasonal components, we must still estimate X_{t} and get some sort of equation for it (as it’s no longer guaranteed to just be a set of i.i.d. normal random variables). The next articles in this series will take a deeper dive into ARIMA modeling.

Time series analysis can get quite mathematically heavy. However, it is important to have at least a working familiarity with the mathematics behind these models. The easier the implementation, the greater the danger of misuse, because practitioners (and even some instructors) aren’t required to understand the mathematical nuances that can pepper something as complicated as time series.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

#### Footnotes

1. For example, decomposition of time series into trend, seasonal, and noise is done automatically, and ARIMA modeling of that noise can also be done with a simple function call of auto.arima().
2. Sometimes called the input.
3. We have discussed the notion of stationarity in terms of the Poisson process before, but we’ll go over it again here in this context.
4. WLOG (without loss of generality), we can assume r > s.
5. If you don’t know what this is, don’t worry. It just means that the expectation of a sum of stuff equals the sum of the expectations of the individual stuffs.
6. All math is connected somehow. You never know when this shows up.
7. We didn’t discuss this bit in this article. This is just another function that has a seasonal period. It’s typically used when we have to account for something like sales peaking at the beginning of a month, as an example.