Basic Steps In Time Series Analysis Finance Essay


In this chapter, we discuss the methodology that will be used to conduct this project. The purpose of this chapter is to provide details of the research methods implemented in this project. First, the basic steps in time series analysis are proposed. Secondly, the different time series models used are explained. Then, it is followed by a description of the methods employed to select potential models before estimation. Finally, the forecasting accuracy methods are briefly discussed.

Basic steps in time series analysis

A time series is a set of observations made at equally spaced time intervals. Time series analysis provides tools for selecting a model prior to forecasting future events. Below are some basic steps that need to be followed before fitting any model to a time series:

Gather raw data

Plot the observations against time.

Examine the important features of the graph such as trend, seasonality, outliers and smooth changes in structure such as turning points or sudden discontinuities.

Remove the trend and seasonal components to obtain a stationary series; if the series is still not stationary, apply a transformation (for example, differencing) to make it so.

Identify the time series model that generates the series and analyse the residuals. This can be done by using information criteria or analysing the ACF and PACF plot of the series.

When a good model is found, use it for forecasting exercises.


A time series is said to be stationary if its properties do not depend on the time at which the series is observed: its mean is constant over time, its volatility around that mean does not change over time, and it has a constant and finite covariance with leading or lagged values. For example, time series with trends or with seasonality are not stationary, since the trend and seasonality affect the value of the series at different times. On the other hand, a white noise series, which is defined below, is stationary. (Athanasopoulos and Hyndman, 2012)

White noise

A time series is called white noise if it is a sequence of independent, identically distributed (i.i.d.) random variables with finite mean (usually assumed to be zero) and finite variance. In economic time series, the white noise series usually represents innovations or shocks.
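As an illustration, the sketch below (plain Python, illustrative names only) draws i.i.d. N(0, 1) values and checks that the sample mean and standard deviation are close to the assumed values:

```python
import random
import statistics

def white_noise(n, seed=0):
    """Draw n i.i.d. N(0, 1) values: a Gaussian white noise series."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

wn = white_noise(10000)
print(round(statistics.mean(wn), 2))    # near the assumed mean of zero
print(round(statistics.pstdev(wn), 2))  # near the assumed unit std. dev.
```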

Unit root test

These are statistical hypothesis tests designed to determine whether differencing is needed. A number of unit root tests are available, each based on different assumptions. One of the most popular is the Augmented Dickey-Fuller (ADF) test.

Augmented Dickey-Fuller test

The augmented Dickey-Fuller test is a version of the Dickey-Fuller test used for larger and more complicated time series models, and it is left-tailed. The augmented Dickey-Fuller statistic used in the test is a negative number; thus, the more negative the test statistic, the stronger the rejection of the hypothesis that there is a unit root at some level of confidence. The following ADF model is estimated:

\Delta X_t = \alpha + \beta t + \gamma X_{t-1} + \sum_{i=1}^{p} \delta_i \Delta X_{t-i} + \varepsilon_t

where Δ is the first difference operator; t is the time trend; p denotes the number of lags used; ε_t is the error term; and α, β, γ and δ_i are parameters. The null hypothesis of the augmented Dickey-Fuller t-test, H0: γ = 0 (i.e. the series X_t is not stationary), is tested against the alternative hypothesis H1: γ < 0.
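The mechanics of the test can be illustrated with a minimal sketch. The helper below (a hypothetical name) implements only the basic Dickey-Fuller regression, with no augmentation lags and no trend term, so it is a simplification of the ADF model above:

```python
import random

def dickey_fuller_t(x):
    """t-statistic for gamma in dx_t = alpha + gamma * x_{t-1} + e_t.
    This is the basic Dickey-Fuller regression: no augmentation lags,
    no deterministic trend. Strongly negative values are evidence
    against the unit-root null hypothesis."""
    y = [x[t] - x[t - 1] for t in range(1, len(x))]  # first differences
    z = x[:-1]                                       # lagged levels
    n = len(y)
    zbar = sum(z) / n
    ybar = sum(y) / n
    szz = sum((zi - zbar) ** 2 for zi in z)
    szy = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
    gamma = szy / szz
    alpha = ybar - gamma * zbar
    resid = [yi - alpha - gamma * zi for zi, yi in zip(z, y)]
    s2 = sum(e * e for e in resid) / (n - 2)         # residual variance
    return gamma / (s2 / szz) ** 0.5

rng = random.Random(1)
walk = [0.0]                      # random walk: has a unit root
stat_ar = [0.0]                   # stationary AR(1) with phi = 0.5
for _ in range(500):
    walk.append(walk[-1] + rng.gauss(0, 1))
    stat_ar.append(0.5 * stat_ar[-1] + rng.gauss(0, 1))
print(dickey_fuller_t(walk))      # modest in magnitude for a unit root
print(dickey_fuller_t(stat_ar))   # strongly negative for a stationary series
```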


Differencing

If the time series is not stationary, some transformations can be used to make it stationary. One way to do this is to compute the differences between consecutive observations until the series becomes stationary; this method is known as differencing. The original series is then replaced by a new one. For example, if a series is first-order differenced, any observation at time t now takes the form Y_t = X_t - X_{t-1}, where X_t and X_{t-1} are the respective observations at times t and t-1 in the original series.
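Differencing itself is a one-line operation; a minimal sketch (the function name is illustrative):

```python
def difference(series, d=1):
    """Apply first differencing d times: each pass replaces the series
    with y_t = x_t - x_{t-1}, shortening it by one observation."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

quad = [1, 4, 9, 16, 25]          # quadratic trend
print(difference(quad))           # linear trend remains: [3, 5, 7, 9]
print(difference(quad, d=2))      # trend removed: [2, 2, 2]
```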


Time Series Models

In this project, only some univariate linear and non-linear time series models are considered. In this section, the models employed are explained.

Autoregressive (AR) Models

An autoregressive model is one in which the current value of a variable at time t is a linear regression against the values of the variable in previous periods together with an error term. An AR model of order p (AR (p)), where p is the number of lags, can be written as:

X_t = c + \sum_{i=1}^{p} \phi_i X_{t-i} + \varepsilon_t

where {X_t} is the time series data; c is a constant; φ_i are the parameters of the model; and ε_t is a white noise error term with mean zero and variance σ².
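As an illustration, the sketch below simulates an AR(1) process and checks that the sample lag-1 autocorrelation is close to φ_1, as theory predicts for a stationary AR(1) (function and variable names are illustrative):

```python
import random

def simulate_ar(phi, n, c=0.0, seed=0):
    """Simulate an AR(p) process X_t = c + sum_i phi_i * X_{t-i} + eps_t
    with standard normal innovations."""
    rng = random.Random(seed)
    p = len(phi)
    x = [0.0] * p                               # start-up values
    for _ in range(n):
        mean = c + sum(phi[i] * x[-1 - i] for i in range(p))
        x.append(mean + rng.gauss(0, 1))
    return x[p:]                                # drop the start-up values

ar_path = simulate_ar([0.7], 5000)
# For a stationary AR(1), the lag-1 autocorrelation should be near phi = 0.7.
m = sum(ar_path) / len(ar_path)
c0 = sum((v - m) ** 2 for v in ar_path)
c1 = sum((ar_path[t] - m) * (ar_path[t - 1] - m) for t in range(1, len(ar_path)))
r1 = c1 / c0
print(round(r1, 2))
```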

Moving Average (MA) Models

A moving average (MA) model follows the same approach as the autoregressive model except that the current observation is now computed as a linear regression against past (unobserved) innovations. MA (q) refers to a moving average model of order q, where q is the number of lags and is given by:

X_t = c + \sum_{i=0}^{q} \theta_i \varepsilon_{t-i}

where c is a constant; θ_i are the parameters of the model, with θ_0 = 1; and {ε_t} are the white noise error terms.
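A small simulation illustrates the defining property of an MA(q) process: its autocorrelations are zero beyond lag q. The helper names below are illustrative:

```python
import random

def simulate_ma(theta, n, c=0.0, seed=0):
    """Simulate an MA(q) process X_t = c + eps_t + sum_{i=1}^q theta_i * eps_{t-i},
    i.e. theta_0 = 1 on the current shock."""
    rng = random.Random(seed)
    q = len(theta)
    eps = [rng.gauss(0, 1) for _ in range(n + q)]
    return [c + eps[t] + sum(theta[i] * eps[t - 1 - i] for i in range(q))
            for t in range(q, n + q)]

def sample_autocorr(x, k):
    """Sample lag-k autocorrelation."""
    m = sum(x) / len(x)
    c0 = sum((v - m) ** 2 for v in x)
    ck = sum((x[t] - m) * (x[t - k] - m) for t in range(k, len(x)))
    return ck / c0

ma_path = simulate_ma([0.5], 20000)
print(round(sample_autocorr(ma_path, 1), 2))  # theory: 0.5/(1+0.25) = 0.4
print(round(sample_autocorr(ma_path, 2), 2))  # theory: 0 beyond lag q = 1
```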

Autoregressive Moving Average (ARMA) Models

As the name suggests, the autoregressive moving average (ARMA) model is a combination of both the autoregressive (AR (p)) and moving average (MA (q)) models. In other words, it is a linear combination of previous values of the current variable and previous (unobserved) error terms plus the current error term. An ARMA (p, q) model is thus represented as:

X_t = c + \sum_{i=1}^{p} \phi_i X_{t-i} + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i} + \varepsilon_t
where {X_t} is the time series data; c is a constant; φ_i and θ_i are the parameters of the model; and {ε_t} are the white noise error terms.

The aforementioned models, also known as conditional mean models, are needed for the introduction of the models that follow. The following models are often called conditional variance models since they allow the variance of the white noise series (i.e. the error term) to vary with time (conditional heteroscedasticity). The error term, ε_t, is now defined throughout as ε_t = σ_t z_t, where z_t is a standardised error term with mean zero and variance one (i.e. z_t ~ i.i.d.(0, 1)) and σ_t² denotes the conditional variance at time t.

ARCH Models

The Autoregressive Conditional Heteroscedastic (ARCH) model proposed by Engle (1982) allows the conditional variance to change over time as a function of past squared residuals. An ARCH (q) model is one where the error variance depends on q lags of squared errors:

\sigma_t^2 = c + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2
where c is a constant; α_i are parameters; and {ε_t} is the series of error terms.

In empirical applications of the ARCH model, a relatively long lag is usually needed to capture all of the dependence of the conditional variance and the parameters are required to be non-negative. (Bollerslev, 1986)
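The recursion behind an ARCH(1) process can be sketched as follows; starting the recursion at the unconditional variance c/(1 - α_1) is a standard device, and the function name is illustrative:

```python
import random

def simulate_arch1(c, a1, n, seed=0):
    """Simulate eps_t = sigma_t * z_t with sigma_t^2 = c + a1 * eps_{t-1}^2."""
    rng = random.Random(seed)
    eps = []
    prev_sq = c / (1 - a1)       # start at the unconditional variance
    for _ in range(n):
        sigma2 = c + a1 * prev_sq
        e = sigma2 ** 0.5 * rng.gauss(0, 1)
        eps.append(e)
        prev_sq = e * e
    return eps

arch_eps = simulate_arch1(c=0.5, a1=0.5, n=50000)
sample_var = sum(e * e for e in arch_eps) / len(arch_eps)
print(round(sample_var, 2))      # theory: c / (1 - a1) = 1.0
```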

GARCH Models

To overcome the former problems encountered with ARCH models, Bollerslev (1986) introduced an extension of the usual ARCH (q) model known as the Generalised ARCH (GARCH) model. In the GARCH model, the conditional variance is computed as a linear function of the variance in previous periods together with past squared residuals. The general GARCH (p, q) model, which depends on p lags of the conditional variance and q lags of squared error, is represented as:

\sigma_t^2 = c + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2
where c > 0; α_i ≥ 0 for i = 1, ..., q; and β_j ≥ 0 for j = 1, ..., p. For covariance stationarity, Σ α_i + Σ β_j < 1 is also required.
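A GARCH(1,1) simulation illustrates volatility clustering: the simulated errors are roughly uncorrelated, while their squares are positively autocorrelated. This is a minimal sketch with invented parameter values:

```python
import random

def simulate_garch11(c, a1, b1, n, seed=0):
    """Simulate eps_t = sigma_t * z_t with
    sigma_t^2 = c + a1 * eps_{t-1}^2 + b1 * sigma_{t-1}^2."""
    rng = random.Random(seed)
    eps = []
    sigma2 = c / (1 - a1 - b1)   # start at the unconditional variance
    e_prev = 0.0
    for _ in range(n):
        sigma2 = c + a1 * e_prev ** 2 + b1 * sigma2
        e_prev = sigma2 ** 0.5 * rng.gauss(0, 1)
        eps.append(e_prev)
    return eps

def acf1(x):
    """Sample lag-1 autocorrelation."""
    m = sum(x) / len(x)
    c0 = sum((v - m) ** 2 for v in x)
    c1 = sum((x[t] - m) * (x[t - 1] - m) for t in range(1, len(x)))
    return c1 / c0

garch_eps = simulate_garch11(c=0.05, a1=0.1, b1=0.85, n=50000)
print(round(acf1(garch_eps), 2))                        # levels: near zero
print(round(acf1([e * e for e in garch_eps]), 2))       # squares: clearly positive
```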


EGARCH Models

The Exponential GARCH (EGARCH) model (Nelson (1991)) is an asymmetric GARCH model in which the natural logarithm of the conditional variance is allowed to vary over time as a function of time and the lagged error terms:

\ln \sigma_t^2 = \omega + \sum_{j=1}^{p} \beta_j \ln \sigma_{t-j}^2 + \sum_{i=1}^{q} \alpha_i\, g(z_{t-i})

where ω, β_j and α_i are deterministic coefficients and

g(z_t) = \theta z_t + \gamma \left( |z_t| - E|z_t| \right)

The value of g(z_t) depends on several elements. As Nelson (1991, p.351) said, "to accommodate the asymmetric relation between stock returns and volatility changes, the value of g(z_t) must be a function of both the magnitude and the sign of z_t". Henceforth, θz_t represents the sign effect while γ(|z_t| - E|z_t|) represents the magnitude effect.

Thus, the parameter θ measures the asymmetric or leverage effect [2] . If θ = 0, then the model is symmetric. When θ < 0, positive shocks (good news) generate less volatility than negative shocks (bad news). On the other hand, when θ > 0, positive innovations are more destabilising than negative innovations. Moreover, the parameter γ represents the magnitude effect or the symmetric effect of the model; in other words, this parameter accounts for the "GARCH" effect.

As such, the function g(z_t) allows the conditional variance to respond asymmetrically to rises and falls (positive and negative lagged values of ε_t).

The expectation E|z_t| in the g(·) function changes accordingly with the distribution of z_t:

When z_t is assumed to be normally distributed, the expectation is given by:

E|z_t| = \sqrt{2/\pi}

When z_t is assumed to be Student-t(υ) distributed, the expectation is given by:

E|z_t| = \frac{2\sqrt{\upsilon - 2}\;\Gamma\!\left((\upsilon + 1)/2\right)}{(\upsilon - 1)\,\Gamma(\upsilon/2)\sqrt{\pi}}

When z_t is assumed to be GED(υ) distributed, the expectation is given by:

E|z_t| = \frac{\lambda\, 2^{1/\upsilon}\,\Gamma(2/\upsilon)}{\Gamma(1/\upsilon)}

where λ is the scale constant defined in the GED density below.

In contrast to the standard GARCH model, in the EGARCH model:

Volatility can react asymmetrically to good and bad news.

The parameters are not restricted to positive values.

Volatility, measured by the conditional variance σ_t², is an explicit multiplicative function of the lagged innovations; in the standard GARCH model, by contrast, volatility is an additive function of the lagged error terms. (Härdle, 2004a)


TGARCH Models

In 1994, Zakoian presented the Threshold GARCH (TGARCH) model, another asymmetric GARCH model, in which the conditional standard deviation σ_t is a linear combination of past innovations and past standard deviations. It allows different reactions of the current volatility to the sign of past innovations: the effect of a shock on σ_t is a function of both its magnitude and its sign. This important feature contributes to the model's asymmetric character. A TGARCH (p, q) model can be written as:

\sigma_t = c + \sum_{i=1}^{q} \left( \alpha_i^{+} \varepsilon_{t-i}^{+} - \alpha_i^{-} \varepsilon_{t-i}^{-} \right) + \sum_{j=1}^{p} \beta_j \sigma_{t-j}

where ε_t⁺ = max(ε_t, 0) is the positive part of ε_t and ε_t⁻ = min(ε_t, 0) is the negative part of ε_t.

Positivity constraints: c ≥ 0; α_i⁺ ≥ 0; α_i⁻ ≥ 0; β_j ≥ 0 for all i, j.

However, this model will not be used in this project.


GJR-GARCH Models

Glosten et al. (1993) developed the Glosten, Jagannathan and Runkle GARCH (GJR-GARCH) model, which allows good and bad news to have different effects on volatility. The conditional variance is now written as a linear function of the squared positive and negative parts of the innovations. Following Glosten et al. (1993) and Rabemananjara and Zakoian (1993), the model assumes the form:

\sigma_t^2 = c + \sum_{i=1}^{q} \left( \alpha_i + \gamma_i I_{t-i} \right) \varepsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2

where I_{t-i} = 1 if ε_{t-i} < 0 and I_{t-i} = 0 otherwise; c, α_i, γ_i and β_j are non-negative parameters.

From the above model, it can be seen that good news (ε_{t-i} > 0) contributes α_i ε²_{t-i} to the conditional variance, whereas bad news (ε_{t-i} < 0) has a larger impact, (α_i + γ_i) ε²_{t-i}, provided γ_i > 0. The asymmetric effect of the model is therefore measured through the dummy variable I_{t-i}. The GJR-GARCH model differs from the EGARCH in that it considers the error term ε_t instead of the standardised error term z_t.

Model Identification

Prior to forecasting time series data with any model, some preliminary tests should be carried out. These tests allow us to select the most suitable model for our data. At the model identification stage, our aim is to determine the model that best captures the dynamics of the data. Two general approaches are available for determining the correct model that fits our time series data: the first uses an information criterion function and the second uses the correlogram.

Information Criteria

The two most commonly used information criteria in model identification are the Akaike Information Criterion (AIC) (Akaike (1973)) and the Schwarz Information Criterion (SIC) (Schwarz (1978)). Both are based on the likelihood function. In practice, the model with the minimum AIC or SIC is selected.

AIC = -2\ln L + 2k

SIC = -2\ln L + k\ln T

where L is the maximised value of the likelihood function, k is the number of parameters and T is the sample size.

Cavanaugh and Neath (1999) write that in large-sample settings, the fitted model favoured by the SIC ideally corresponds to the candidate model that is a posteriori most probable, i.e. the model rendered most plausible by the data at hand. Moreover, the AIC is not consistent and tends to overestimate the order of an AR (p) model, while the SIC is consistent and corrects this overfitting tendency of the AIC (Brockwell and Davis, 2002, p.173). Hence, we select our models based on the SIC.
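The two criteria can be compared on hypothetical log-likelihood values; the numbers below are invented purely to illustrate how the k ln T penalty of the SIC outweighs the 2k penalty of the AIC once T is moderately large:

```python
import math

def aic(log_l, k):
    """Akaike Information Criterion: AIC = -2 ln L + 2k."""
    return -2 * log_l + 2 * k

def sic(log_l, k, t):
    """Schwarz Information Criterion: SIC = -2 ln L + k ln T."""
    return -2 * log_l + k * math.log(t)

# Invented log-likelihoods: the larger model gains one unit of fit
# at the cost of one extra parameter (T = 500).
print(aic(-700.0, 2), round(sic(-700.0, 2, 500), 2))  # smaller model
print(aic(-699.0, 3), round(sic(-699.0, 3, 500), 2))  # larger model
# The AIC is indifferent here, while the SIC's ln(500) penalty per
# parameter favours the smaller model.
```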

Another way to determine which AR or MA model is appropriate for our data is to examine the ACF and PACF of the time series. The following definitions are given according to EViews 7.0.

Autocorrelation Function (ACF)

In time series, the concept of autocorrelation is used to explain the linear dependence between a random variable X_t and its past values X_{t-k}. The correlation coefficient between X_t and X_{t-k} is called the lag-k autocorrelation of X_t and is commonly denoted by ρ_k. For any stationary process {X_t}, only the first two moments are known, namely its mean μ and its variance σ², which are constants. The covariance of X_t and X_{t-k}, denoted by γ_k, is a measure of how these two random variables move together.

The sample autocovariance function of a time series is defined as:

\hat{\gamma}_k = \frac{1}{T} \sum_{t=k+1}^{T} \left( X_t - \bar{X} \right)\left( X_{t-k} - \bar{X} \right)

where X_t stands for the time series; \bar{X} is the sample mean of the series; T is the sample size; and k represents the time lag.

The sample autocorrelation function is defined as:

\hat{\rho}_k = \frac{\hat{\gamma}_k}{\hat{\gamma}_0}
The autocovariance function represents the covariance of X_t and X_{t-k}, and the autocorrelation function shows the correlation between X_t and X_{t-k}. The graph of the autocorrelation function (ACF) is called the correlogram. This is essential for the analysis of the series since it describes the time dependence of the observed series.

The autocorrelation plot is formed by displaying the autocorrelation coefficients on the vertical axis and the time lag on the horizontal axis. In addition, two standard error bounds, computed as ±2/√T, where T is the sample size, are included in the plot.

The autocorrelation plot is commonly used for checking randomness of a data set. The idea is that if these autocorrelations are near zero for any and all time lags then the data set is random. Moreover, this plot is also used to identify the order of an AR (p) and a MA (q) process. The table below summarises how we use the ACF plot for model identification.

Shape of ACF plot → Indicated model

Exponential decay to zero → Autoregressive model; the PACF plot is used to identify the order

Alternating positive and negative values (damped sine curve), decaying to zero → Autoregressive model; the PACF plot is used to identify the order

One or more spikes, the rest essentially zero → Moving average model; the order is identified by where the plot becomes zero

Decay starting after a few lags → Mixed autoregressive and moving average model

All zero or close to zero → Data are essentially random

High values at fixed intervals → Include a seasonal autoregressive term

No decay to zero → Series is not stationary

Table 2.3: Shape of the autocorrelation function plot for model identification
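The sample ACF and the ±2/√T error bounds described above can be sketched as follows (helper names are illustrative):

```python
import math
import random

def sample_acf(x, max_lag):
    """Sample autocorrelations rho_k = gamma_k / gamma_0, k = 1..max_lag."""
    t = len(x)
    m = sum(x) / t
    gamma0 = sum((v - m) ** 2 for v in x) / t
    acf = []
    for k in range(1, max_lag + 1):
        gk = sum((x[i] - m) * (x[i - k] - m) for i in range(k, t)) / t
        acf.append(gk / gamma0)
    return acf

rng = random.Random(2)
noise = [rng.gauss(0, 1) for _ in range(2000)]
bound = 2 / math.sqrt(len(noise))        # the +/- 2/sqrt(T) error bounds
inside = sum(abs(r) < bound for r in sample_acf(noise, 20))
print(inside, "of 20 autocorrelations fall inside the bounds")
```

For a random series, nearly all autocorrelations should fall inside the bounds, matching the "data are essentially random" row of the table.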

Partial Autocorrelation Function (PACF)

In addition to the autocorrelation between X_t and X_{t-k}, another way to measure the relationship between these two random variables is to remove the linear dependence of the random variables that lie in between, X_{t-1}, X_{t-2}, ..., X_{t-k+1}, and then calculate the correlation of the transformed random variables. This is called partial autocorrelation. In time series analysis, this conditional correlation is often referred to as the partial autocorrelation.


The partial autocorrelation at lag k (φ_kk) is defined as:

\phi_{kk} = \mathrm{Corr}\!\left( X_t,\, X_{t-k} \mid X_{t-1}, \ldots, X_{t-k+1} \right)

where ρ_k, the autocorrelation at lag k, is used to compute φ_kk in practice (the partial autocorrelations are obtained from ρ_1, ..., ρ_k).

The partial autocorrelation function (PACF) is defined as the partial correlation between X_t and X_{t-k}. The partial autocorrelation plot, or partial correlogram, is also commonly used in model identification. In the same way, it displays the partial autocorrelation coefficients on the y-axis and the time lag on the x-axis, and it also contains the two aforementioned standard error bounds.

The partial autocorrelation plot is usually used to identify the order of an AR (p) model while the autocorrelation plot is used to identify that of a MA (q) model. For an AR (p) model, the ACF tails off while the PACF cuts off after p lags. Conversely, for a MA (q) model, the PACF tails off while the ACF cuts off after q lags. If the ACF tails off after lag (q-p) and the PACF tails off after lag (p-q), then an ARMA (p, q) is required.
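In practice, the partial autocorrelations are computed from the autocorrelations; the sketch below uses the Durbin-Levinson recursion (one standard way to do this, not necessarily the one used by EViews) and confirms that the PACF of an AR(1) cuts off after lag 1:

```python
def pacf_from_acf(rho):
    """Partial autocorrelations phi_kk for k = 1..len(rho), computed from
    the autocorrelations rho[0] = rho_1, rho[1] = rho_2, ... via the
    Durbin-Levinson recursion."""
    result = [rho[0]]
    phi = [rho[0]]                               # phi_{1,1}
    for k in range(2, len(rho) + 1):
        num = rho[k - 1] - sum(phi[j] * rho[k - 2 - j] for j in range(k - 1))
        den = 1 - sum(phi[j] * rho[j] for j in range(k - 1))
        phi_kk = num / den
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)]
        phi.append(phi_kk)
        result.append(phi_kk)
    return result

# For an AR(1) with phi = 0.6, rho_k = 0.6**k, so the PACF should be
# 0.6 at lag 1 and zero afterwards.
theoretical_rho = [0.6 ** k for k in range(1, 6)]
print([round(v, 6) for v in pacf_from_acf(theoretical_rho)])
```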

Residual Diagnostics

Before using any GARCH family model to estimate the time series, the residuals of the mean equations should be examined. First, we should check whether the residuals exhibit ARCH effects (i.e. whether heteroscedasticity is present in them). This can be done by studying the correlogram of the squared residuals or by using Engle's ARCH test for heteroscedasticity. Secondly, we should check whether the residuals come from a particular distribution. Q-Q plots are often used to check whether the residuals come from a normal distribution; other descriptive statistics, such as the kurtosis value, the skewness value and the Jarque-Bera test statistic, can also be used.

Engle's ARCH test

Engle's ARCH test is performed to determine whether the residuals of an estimated mean model exhibit some form of heteroscedasticity. It is a test in which the squared residuals are regressed against a constant and q lagged values of the squared residuals:

\hat{\varepsilon}_t^2 = \alpha_0 + \alpha_1 \hat{\varepsilon}_{t-1}^2 + \cdots + \alpha_q \hat{\varepsilon}_{t-q}^2 + u_t

The null hypothesis of 'no ARCH effects',

H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_q = 0

is tested against the alternative hypothesis

H_1: \alpha_i \neq 0 \text{ for at least one } i \in \{1, \ldots, q\}

Thus, Engle's test checks whether all q lags of the squared residuals have coefficient values significantly different from zero. The null hypothesis is rejected if the test statistic, computed as TR² from the auxiliary regression, exceeds the critical value from a χ² distribution with q degrees of freedom.
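A minimal sketch of the test with q = 1, in which case the auxiliary regression is a simple regression and the LM statistic is TR² (function names are illustrative):

```python
import random

def arch_lm_test(resid):
    """Engle's ARCH LM statistic with q = 1: regress e_t^2 on a constant
    and e_{t-1}^2, then return T * R^2 (compare with chi-squared, 1 df)."""
    sq = [e * e for e in resid]
    y, z = sq[1:], sq[:-1]
    n = len(y)
    zbar = sum(z) / n
    ybar = sum(y) / n
    szz = sum((zi - zbar) ** 2 for zi in z)
    szy = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
    b = szy / szz
    a = ybar - b * zbar
    ss_res = sum((yi - a - b * zi) ** 2 for zi, yi in zip(z, y))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return n * (1 - ss_res / ss_tot)

rng = random.Random(3)
iid_resid = [rng.gauss(0, 1) for _ in range(2000)]   # no ARCH effects
arch_resid, prev = [], 0.0
for _ in range(2000):                                # ARCH(1) errors
    prev = (0.5 + 0.5 * prev * prev) ** 0.5 * rng.gauss(0, 1)
    arch_resid.append(prev)
print(arch_lm_test(iid_resid))    # typically well below the 5% value of 3.84
print(arch_lm_test(arch_resid))   # far above: reject 'no ARCH effects'
```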


Error Distributions

In this project, the following error distributions are considered:

Normal Distribution

The normal distribution is a symmetric distribution with density function:

f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)

where μ is the expectation and σ² is the variance of the variable X, so that X ~ N(μ, σ²). The standard normal distribution is obtained by taking μ = 0 and σ² = 1. The kurtosis of the normal distribution is 3.

Student-t Distribution

The Student-t distribution, or t distribution, has the following density function:

f(x) = \frac{\Gamma\!\left((\upsilon+1)/2\right)}{\sqrt{\upsilon\pi}\,\Gamma(\upsilon/2)} \left( 1 + \frac{x^2}{\upsilon} \right)^{-(\upsilon+1)/2}

where υ is the degree of freedom (υ > 2). The t distribution is symmetric like the normal distribution. The mean, variance and kurtosis of the distribution are:

E(X) = 0, \qquad \mathrm{Var}(X) = \frac{\upsilon}{\upsilon-2}, \qquad K = \frac{3(\upsilon-2)}{\upsilon-4} \quad (\upsilon > 4)

The Student-t distribution with unit variance has the following density function:

f(x) = \frac{\Gamma\!\left((\upsilon+1)/2\right)}{\sqrt{(\upsilon-2)\pi}\,\Gamma(\upsilon/2)} \left( 1 + \frac{x^2}{\upsilon-2} \right)^{-(\upsilon+1)/2}
Generalised Error Distribution (GED)

The GED is a symmetric distribution that can be both leptokurtic and platykurtic depending on the degree of freedom υ (υ > 1). The GED has the following density function:

f(x) = \frac{\upsilon}{2\lambda\,\Gamma(1/\upsilon)} \exp\!\left( -\left| \frac{x}{\lambda} \right|^{\upsilon} \right)

where λ is a scale parameter and Γ(·) is the gamma function.

The GED with unit variance has the following density function:

f(x) = \frac{\upsilon \exp\!\left( -\tfrac{1}{2} \left| x/\lambda \right|^{\upsilon} \right)}{\lambda\, 2^{1+1/\upsilon}\, \Gamma(1/\upsilon)}, \qquad \lambda = \left[ \frac{2^{-2/\upsilon}\,\Gamma(1/\upsilon)}{\Gamma(3/\upsilon)} \right]^{1/2}

For υ = 2, the GED is a standard normal distribution, whereas the tails are thicker than in the normal case when υ < 2 and thinner when υ > 2.


Skewness

Skewness defines the degree of asymmetry of the distribution of a series around its mean. The skewness is given as:

S = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{x_i - \bar{x}}{\hat{\sigma}} \right)^{3}

where N denotes the sample size; x_i is any observation in the series, i = 1, ..., N; \bar{x} represents the mean of the series; and \hat{\sigma} is an estimator for the standard deviation of the series.

The skewness of a symmetric distribution, such as the normal distribution, is zero. A positive skewness value indicates that the data have a distribution skewed to the right; correspondingly, a negative skewness value implies that the data follow a distribution skewed to the left.


Kurtosis

Kurtosis is the degree of peakedness or flatness of a distribution and is defined as:

K = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{x_i - \bar{x}}{\hat{\sigma}} \right)^{4}

where N denotes the sample size; x_i is any observation in the series, i = 1, ..., N; \bar{x} represents the mean of the series; and \hat{\sigma} is an estimator for the standard deviation of the series.

The kurtosis of the normal distribution is 3. If the kurtosis of a distribution exceeds 3, the distribution is peaked (leptokurtic) relative to the normal; when the kurtosis is less than 3, the distribution is flat (platykurtic) relative to the normal.

Jarque-Bera Test

Jarque-Bera is a test statistic for testing whether the series is normally distributed. The test statistic measures the difference of the skewness and kurtosis of the series from those of the normal distribution. The statistic is computed as:

JB = \frac{N}{6} \left( S^2 + \frac{(K-3)^2}{4} \right)

where N is the sample size and S and K are the respective skewness and kurtosis of the distribution.

Under the null hypothesis of a normal distribution, the Jarque-Bera statistic is distributed as χ2 with 2 degrees of freedom. The reported probability is the probability that the Jarque-Bera statistic exceeds (in absolute value) the observed value under the null hypothesis. A small probability value leads to the rejection of the null hypothesis of a normal distribution.
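The three statistics can be computed together; a minimal sketch using the biased 1/N moment estimators described above, on invented samples:

```python
import random

def jarque_bera(x):
    """Return (S, K, JB) with JB = N/6 * (S^2 + (K - 3)^2 / 4),
    using the biased 1/N estimators for the moments."""
    n = len(x)
    m = sum(x) / n
    sd = (sum((v - m) ** 2 for v in x) / n) ** 0.5
    s = sum(((v - m) / sd) ** 3 for v in x) / n
    k = sum(((v - m) / sd) ** 4 for v in x) / n
    return s, k, n / 6 * (s ** 2 + (k - 3) ** 2 / 4)

rng = random.Random(4)
normal_sample = [rng.gauss(0, 1) for _ in range(5000)]
skewed_sample = [rng.expovariate(1.0) for _ in range(5000)]  # right-skewed
print(jarque_bera(normal_sample))  # S near 0, K near 3, JB typically small
print(jarque_bera(skewed_sample))  # S near 2, K near 9, JB very large
```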

Out-Of-Sample Forecasts Models

Previous studies have used a variety of statistics to evaluate and compare forecast errors [3] . Consistent with these studies, we compare the forecast performance of each time series model using three forecast error statistics, namely the Root Mean Squared Error (RMSE), the Mean Absolute Error (MAE) and the Mean Absolute Percentage Error (MAPE).

Let X_t and X̂_t denote the respective actual and forecast values of any observation at time t, and suppose that the forecast sample is t = n+1, n+2, ..., n+k, where n is the forecast origin. Then the out-of-sample forecasts X̂_{n+1}, X̂_{n+2}, ..., X̂_{n+k}, which use data available from time t = 1 to t = n, are compared using the following forecast error statistics:

Root Mean Squared Error (RMSE)

The Root Mean Squared Error (RMSE) computes the square root of the averaged squared errors [4] . Moreover, it depends on the scale of the dependent variable, so it is useful when comparing forecasts from different models on the same series; it is not, however, a suitable relative measure when comparing across data sets with different scales.

RMSE = \sqrt{ \frac{1}{k} \sum_{t=n+1}^{n+k} \left( \hat{X}_t - X_t \right)^2 }
Mean Absolute Error (MAE)

The Mean Absolute Error (MAE) is the average of the magnitudes of the errors in the forecast sample, without considering their signs. Like the RMSE, it depends on the scale of the dependent variable, but it is less sensitive to large deviations than the squared loss used by the RMSE.

MAE = \frac{1}{k} \sum_{t=n+1}^{n+k} \left| \hat{X}_t - X_t \right|
Mean Absolute Percentage Error (MAPE)

Another commonly used forecast error statistic is the Mean Absolute Percentage Error (MAPE), in which the absolute values of all the percentage errors are added and averaged:

MAPE = \frac{100}{k} \sum_{t=n+1}^{n+k} \left| \frac{\hat{X}_t - X_t}{X_t} \right|

It has the advantage of being scale-independent, so it can be used to compare forecast performance across different data sets. However, the MAPE is often criticised for some significant disadvantages: it is infinite or undefined if X_t = 0 in the period of interest, and it has an extremely skewed distribution when any X_t is close to zero. Another disadvantage is that it puts a heavier penalty on positive errors than on negative errors. Moreover, large percentage errors occur when the value in the original series is small, and outliers may distort comparisons in empirical studies.
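The three forecast error statistics can be sketched together (the function name and data below are invented for illustration):

```python
def forecast_errors(actual, forecast):
    """Return (RMSE, MAE, MAPE) over a forecast sample of k observations.
    The MAPE requires every actual value to be non-zero."""
    k = len(actual)
    errs = [f - a for a, f in zip(actual, forecast)]
    rmse = (sum(e * e for e in errs) / k) ** 0.5
    mae = sum(abs(e) for e in errs) / k
    mape = 100 / k * sum(abs(e / a) for a, e in zip(actual, errs))
    return rmse, mae, mape

actual = [100.0, 102.0, 101.0, 105.0]      # invented illustrative values
forecast = [101.0, 101.0, 103.0, 104.0]
rmse, mae, mape = forecast_errors(actual, forecast)
print(round(rmse, 3), round(mae, 3), round(mape, 3))
```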


The criterion for all three forecast error statistics is the same: the smaller the value, the better the forecasting ability of the model. Nevertheless, owing to the disadvantages of the MAPE, the model with the smallest RMSE and MAE is often favoured. In this project, all three forecast error statistics will be used for comparison purposes, while only the RMSE will be used to select the best model [5] .