Comparison of ARIMA and SVR Stock Price Forecasting Methods

Info: 2681 words (11 pages) Example Research Project
Published: 1st Dec 2021

Time Series Forecasting

This chapter reviews the theories and research findings related to the research topic. Time series forecasting is an analysis that forecasts future values from past performance. Many methods can be used for stock price forecasting, but different methods produce different predicted values. This paper compares the forecast values of the ARIMA model and the SVR model. In theory, ARIMA is the most general class of models for forecasting a time series that can be made stationary by differencing and taking logarithms. SVR, meanwhile, has become a subject of intensive study following its successful application to classification and regression tasks. Because SVR can solve nonlinear regression estimation problems, it has also been applied successfully to time series forecasting.
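The idea of making a series stationary by logging and differencing can be illustrated with a minimal sketch. The function name `log_difference` and the price values are hypothetical and chosen only for illustration; the sketch covers the transformation step alone and does not fit an ARIMA model.

```python
import math

def log_difference(prices, order=1):
    """Take natural logs, then difference the series `order` times.

    This mirrors the "integrated" step of ARIMA: the transformed series
    is modelled instead of the raw price level.
    """
    series = [math.log(p) for p in prices]
    for _ in range(order):
        series = [later - earlier for earlier, later in zip(series, series[1:])]
    return series

# Hypothetical prices that grow by 10% per period. The price level trends
# upward, but the log-differenced series is flat, so the trend is removed.
prices = [100.0 * 1.1 ** t for t in range(6)]
log_returns = log_difference(prices)
print(log_returns)
```

Each differencing step shortens the series by one observation, which is one reason short series leave little data for model estimation.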

2.1 Definition of Time Series

“Time series is a set of observations measured sequentially through time” (Chatfield 2001, p.11). Typically, a time series represents a set of historical data plotted in chronological order; the data may be daily, weekly, monthly or yearly. A time series consists of four important components: seasonal variations, trend variations, cyclical variations, and random variations. Time series forecasting uses a series of past data to forecast future values, such as long-term trends and seasonal fluctuations. It rests on the assumption that the historical behaviour of a variable will continue to recur in the future. In general, a time series model explains the underlying theory through a mathematical representation. According to Frohn (1995), an econometric model should have features such as theoretical plausibility, reliable parameter estimation, good adjustment (i.e. the process that generates the data should be captured), good forecasts and simplicity (i.e. a model with fewer variables should be preferred). As a result, it is important to be clear about what the term model means.

2.2 Time Series Forecasting

2.2.1 ARIMA Model

Because ARIMA models can be applied to both univariate and multivariate data, they are a convenient choice for researchers who need forecasts. Unfortunately, ARIMA forecasts are limited by high standard errors, which produce wide confidence intervals around the forecast values (Biondi, & Panaro 1998). Large errors indicate a less accurate forecast. Another study found that multivariate forecasts, or at least half of the individual forecasts, tend to outperform Bayesian vector autoregressive models in most cases. The same study indicates that univariate ARIMA models generally give the weakest forecasts (Zamowitz 1992) because of their larger errors.

Furthermore, in research on tourism demand in Australia, the ARIMA model performed better than the seasonal ARIMA model for Hong Kong and Malaysia (McAleer 2001). However, those forecasts were not as accurate as the forecasts of tourist arrivals from Singapore. In contrast to the previous study, seasonal ARIMA and regression models have been found useful for forecasting electricity sales, which businesses can use for planning purposes (Ugiliweneza 2006).

Model selection is another important issue in ARIMA forecasting. Selection is often unstable and may cause unnecessarily high variability in model estimation or prediction. For this reason, the AFTER method in the study updates the combination weights after each additional observation (Zou, & Yang 2004). The study therefore finds that combining models with AFTER has an advantage over selecting a single model in terms of forecasting accuracy.

Experimental results show that ARIMA models tend to produce good forecasts from the historical patterns in crude oil prices, yet they are not the better model for daily forecasts (Siti, Lee, & Noryanti 2011). This is because ARIMA models cannot capture the volatility that arises from a non-constant conditional variance. In contrast, a study of oil palm prices in Thailand by Nochai, & Nochai (2006) found ARIMA to be the appropriate model for forecasting farm prices, with the smallest Mean Absolute Percentage Error (MAPE).

According to Nitin (2010), the ARIMA model on its own fails to forecast future values. However, it performs better as part of a hybrid model that combines an Artificial Neural Network (ANN) with ARIMA to forecast the index value of the Indian stock market. Similarly, the SARIMABP model, which combines a Back Propagation (BP) neural network with the seasonal ARIMA (SARIMA) model, was able to predict certain significant turning points of the test time series (Tseng, Yu, & Tzeng, 2002). In the results, the SARIMABP model had the lowest Mean Squared Error (MSE), Mean Absolute Error (MAE) and MAPE. A combination of models can therefore forecast more precisely than a single model.
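Since the studies in this chapter are ranked by MSE, MAE and MAPE, it may help to see how these measures are computed. The following is a minimal sketch using the standard textbook definitions; the function names and the sample values are hypothetical, and RMSE is included because it appears later in the chapter.

```python
import math

def mse(actual, forecast):
    """Mean Squared Error: average of squared forecast errors."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root Mean Squared Error: square root of MSE, in the units of the data."""
    return math.sqrt(mse(actual, forecast))

def mae(actual, forecast):
    """Mean Absolute Error: average of absolute forecast errors."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent.

    Undefined when any actual value is zero, so this sketch assumes
    strictly non-zero actual values (reasonable for prices).
    """
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

# Hypothetical actual prices and forecasts, for illustration only.
actual = [100, 110, 120, 130]
forecast = [102, 108, 123, 127]
print(mse(actual, forecast), rmse(actual, forecast),
      mae(actual, forecast), mape(actual, forecast))
```

MSE and RMSE penalise large errors more heavily than MAE, while MAPE is scale-free, which is why studies often report several measures together.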

For forecasting problems, a Bayesian model generates point and interval forecasts by combining all the information and sources of uncertainty into a predictive distribution. The forecaster can then evaluate the forecast against a loss function. According to Alba & Mendoza (2007), a Bayesian model can forecast more accurately than an ARIMA model for short time series, provided the seasonality is assumed to be stable. The reason is that a short series does not give the ARIMA model enough observations to verify the seasonality.

2.2.2 SVR Model

The Support Vector Regression (SVR) method, first suggested by Vapnik (1995), has been used in a wide range of applications such as classification, regression, data mining and time series forecasting (Cao et al., 2001; Flake et al., 2002; Zhao et al., 2006). According to Andre, Wechselberger, & Zhao (2007), SVR implements the Structural Risk Minimization principle, which seeks to minimize an upper bound on the generalization (testing) error rather than only the training error.
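One common way to express this trade-off is the regularized objective of linear SVR with the epsilon-insensitive loss, which balances model flatness against training errors larger than epsilon. The sketch below evaluates that objective for a given linear model; it does not train an SVR, and the function names, parameter values and data are hypothetical.

```python
def epsilon_insensitive_losses(y_true, y_pred, epsilon):
    """Per-sample loss: errors inside the epsilon tube cost nothing."""
    return [max(0.0, abs(y - p) - epsilon) for y, p in zip(y_true, y_pred)]

def svr_objective(weights, bias, X, y, C=1.0, epsilon=0.1):
    """0.5 * ||w||^2 + C * sum of epsilon-insensitive losses.

    The first term keeps the function flat (controls complexity); the
    second term penalises training errors that exceed epsilon.
    """
    predictions = [sum(w * x for w, x in zip(weights, row)) + bias for row in X]
    flatness = 0.5 * sum(w * w for w in weights)
    penalty = C * sum(epsilon_insensitive_losses(y, predictions, epsilon))
    return flatness + penalty

# Hypothetical one-feature data and a candidate model f(x) = 2x.
X = [[1.0], [2.0], [3.0]]
y = [2.05, 4.5, 5.0]
objective = svr_objective(weights=[2.0], bias=0.0, X=X, y=y, C=1.0, epsilon=0.1)
print(objective)
```

A larger `C` puts more weight on fitting the training data, while a larger `epsilon` tolerates more error before any penalty applies.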

Shen (2009) surveys the theory of stock yield prediction models. According to the author, there are two kinds of model: traditional prediction models based on statistical theory, and innovative prediction models based on theories such as support vector machines (SVM) and grey theory. Innovative models such as SVR can overcome many problems, such as large fluctuations or incomplete data. However, the author regards integrated prediction models as the better direction for Chinese stock yield prediction. In addition, a study of forecasting stock market movement direction (Huang, Nakamori, & Wang 2004) found that SVR outperformed the other classification models, although combining SVR with other methods gave the best results because each method has its own strengths and weaknesses.

Another paper, from Italy, studied an online version of the algorithm for training support vector machines for regression and how parameter estimation can be made more flexible (Parrella 2007). The results indicate that the errors were reduced by dynamically changing SVR parameters such as the cost of error C and the kernel type. Unfortunately, a few problems remain, such as the quadratic complexity of the algorithm, which does not allow the number of training samples to grow without a large loss of performance.

Forecasting stock price movements involves difficulties such as non-stationary data and complex time series. SVR modelling addresses these by separating the data into training and testing sets. In research using technical indicators with both data-driven and non-parametric techniques, SVR models performed better than the Multi Layer Perceptron (MLP) when risk premium was the comparison criterion (Ince, Trafalis, & Theodore 2008). SVR with polynomial kernels also provides an alternative tool for Kuala Lumpur Stock Exchange (KLSE) stock market prediction (Lung 2006). Again, SVR has proved itself in time series prediction thanks to the structural risk minimization principle and its generalization ability. In a comparison of SVR and ANN by Samsudin, Saad, & Shabri (2010), SVR significantly outperformed ANN in terms of MAE and Root Mean Squared Error (RMSE). The SVR approach also gave better results than Maximum Likelihood (ML) as an estimation procedure (Nalbantow, Patrick, & Bioch 2007). Furthermore, the SVR technique helps to avoid over-fitting and improves qualities such as robustness to outliers. When Neural Networks (NN) and linear SVR models were used without data preprocessing, the results were the most accurate across all patterns and clearly outperformed SVR with a Radial Basis Function (RBF) kernel on trended time series (Crone, Guajardo, & Weber, 2006).
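The separation of data into training and testing sets can be sketched as follows. For time series, the split is usually chronological so that the model is tested only on observations that come after its training data. The helper names, the lag length and the toy series are assumptions made for illustration; they are not taken from the cited studies.

```python
def make_lagged(series, n_lags):
    """Turn a series into (inputs, target) pairs.

    Each input is the previous `n_lags` values and the target is the
    value that follows them.
    """
    X, y = [], []
    for i in range(n_lags, len(series)):
        X.append(series[i - n_lags:i])
        y.append(series[i])
    return X, y

def chronological_split(X, y, train_fraction=0.8):
    """Split without shuffling: earlier samples train, later samples test."""
    cut = int(len(X) * train_fraction)
    return X[:cut], y[:cut], X[cut:], y[cut:]

# Toy series 1..12 standing in for a price history.
series = list(range(1, 13))
X, y = make_lagged(series, n_lags=3)
X_train, y_train, X_test, y_test = chronological_split(X, y, train_fraction=0.8)
print(len(X_train), len(X_test), X_test[0], y_test[0])
```

The resulting `X_train` and `y_train` could then be passed to any regression learner, including an SVR implementation, with `X_test` and `y_test` held back to estimate out-of-sample error.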

Hybrid models, such as those involving neural networks, outperformed the traditional models and significantly reduced root-mean-squared forecasting errors (RMSE) (Huang, 2006). In that study, the Extended Kalman Filter (EKF) is used as a state estimator and predictor based on the Black-Scholes (BS) formula, while SVR is used to capture the nonlinear price characteristics that the BS model cannot capture.

In China, Yang & Yang (2005) forecast the Shanghai composite index using multiple-step prediction. Zhou, Fayao, & Peng (2006) used SVR for short-term stock market prediction. According to Jinjing (2007), experimental results indicated that SVR had higher prediction precision than the time sequence method and the Neural Network method.

An SVM classifier performs well in oil price forecasting when measured by the percentage of correct guesses, as illustrated by King, Vandrot, & Weng (2009). The SVR model has also been found highly accurate and effective in short-term power load forecasting (Niu, Wang, Duan, & Xing, 2009). The authors suggested that there are further advantages in combining SVR with a chaotic time series model.

Cao, & Francis (2001) found that SVR can forecast better than Back Propagation (BP), with a smaller number of free parameters and faster training. However, the effect of the free parameters on the generalization error remains a minor weakness of SVR that needs improvement. Although SVR forecasting models can predict prices accurately, the data contain high noise and non-stationarity that SVR models find difficult to detect and remove. In this setting, an approach based on Independent Component Analysis (ICA) performed better than an SVR model with non-filtered forecasting variables and a random walk model (Lu, Lee, & Chiu, 2009).

Another major problem that has not yet received proper attention is how to update forecasting models when new data arrive. One paper investigates a strategy for updating SVR forecasting models for time series with seasonal patterns. The results show that the proposed strategy outperforms static models, which are not updated as each cycle of new data arrives (Guajardo, Weber, & Miranda, 2010).

Experimental results show that setting the width of the margin with either a Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model or a Momentum model improves forecast performance compared with the standard SVR and a benchmark model (Yang 2003). By reducing the predictive downside risk, or keeping it at zero, the up margin can be increased.

Experimental results have shown that the SVM approach is advantageous for time series forecasting (Sap, & Awan 2005). However, some difficulties remain to be resolved before more accurate forecasts can be made. Combining other techniques with SVR makes forecasting better and more efficient (Nalbantow et al., 2007; Samsudin 2010; Crone et al., 2006).

2.2.3 Comparison between ARIMA Model and SVR Model

Allen & Singh (2010) analysed and compared the applicability of SVR and ARIMA modelling to oil price forecasting. SVR is a supervised learning method, while ARIMA is a widely used method based on lag and momentum effects. The paper's findings clearly showed that SVR is more efficient and accurate in predicting future oil price levels. The reason given is that the support vector approach is trained on one set of data, tested on another, and separates categories by a clear gap. However, further studies are needed to improve the forecast, using more sophisticated optimization routines to tune the SVR parameters and changing the frequency of the data used.

According to Lahane (2008), Feed Forward Neural Network (FFNN) models perform better than ARIMA and SVR models in value forecasting, whereas ARIMA models perform better than FFNN and SVR models in directional forecasting.
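The difference between value forecasting and directional forecasting can be made concrete with a small sketch. Here value accuracy is measured by the mean absolute error, and directional accuracy by the share of periods in which the forecast moves the same way as the actual series relative to the previous actual value. These definitions, the function names and the data are assumptions for illustration and are not necessarily the measures used in the cited study.

```python
def value_error(actual, forecast):
    """Mean absolute error over the periods that have a previous actual value."""
    pairs = list(zip(actual[1:], forecast[1:]))
    return sum(abs(a - f) for a, f in pairs) / len(pairs)

def directional_accuracy(actual, forecast):
    """Share of periods where the predicted move has the same sign as the actual move.

    Both moves are measured from the previous actual value.
    """
    hits = 0
    for t in range(1, len(actual)):
        actual_move = actual[t] - actual[t - 1]
        predicted_move = forecast[t] - actual[t - 1]
        if actual_move * predicted_move > 0:
            hits += 1
    return hits / (len(actual) - 1)

# Hypothetical series that alternates up and down by one unit.
actual = [100.0, 101.0, 100.0, 101.0, 100.0]
# Model A stays close in value but always predicts the wrong direction.
model_a = [100.0, 99.9, 101.1, 99.9, 101.1]
# Model B always predicts the right direction but overshoots in value.
model_b = [100.0, 103.0, 98.0, 103.0, 98.0]

for name, forecast in (("A", model_a), ("B", model_b)):
    print(name, value_error(actual, forecast), directional_accuracy(actual, forecast))
```

In this constructed case model A has the smaller value error while model B has the better directional accuracy, which shows how one method can lead on one criterion and trail on the other.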

In addition, a study of commodity price forecasting with classification methods found that Artificial Neural Network (ANN) and SVM models give more accurate forecasts (Fernandez, 2006). ARIMA models outperformed ANN and SVM at short time horizons, whereas SVM and ANN were more accurate at long time horizons. These results contrast with the standard findings.

Support Vector Regression (SVR) and Partial Least Squares (PLS) provided more accurate forecasts of daily volume than other prediction models such as ARIMA (Alvim, Milidiu, & Santos, 2010). SVR and artificial Neural Network (NN) models also showed promising forecasting accuracy, depending on the choice of parameters (Crone, Lessmann, & Pietsch, 2006). The results indicated that both NN and SVR outperformed the statistical benchmark methods of ARIMA and Exponential Smoothing across time series patterns, noise levels and non-linear series, with the smallest MAE, RMSE and MAPE.

Because SVR is a relatively new forecasting tool, multiplicative seasonal Autoregressive Integrated Moving Average (ARIMA) and Unobserved Components (UC) techniques are in general more likely to outperform wavelet and SVM techniques (Fernandez 2005). Yet SVM forecasts appear most valuable when linear combinations of SVM and wavelet forecasts are considered.

The AutoRegressive Moving Average (ARMA) model is one of the most widely used linear models in time series forecasting. However, it cannot fully capture the non-linear patterns of a series. Recent studies used ANN in place of the traditional methods, but the forecasting results were not very promising. Researchers have since used SVR to forecast the stock market. According to Zhang, Song, & Chen (2008), SVR gives promising stock market forecasts compared with ARMA and ANN. The hybrid model in that study was also able to forecast well.

Empirical results show that the SVM model performed better than ARIMA, NN and random forest regression models in forecasting and trading the S&P CNX NIFTY index return (Kumar, & Thenmozhi, 2007). The reason is that the SVR model minimizes an upper bound on the generalization error rather than the training error, which results in more accurate forecasts.

In conclusion, each method has its own strengths and weaknesses. To obtain the most accurate prediction, a combination of techniques is required. The most important goal is to minimize the forecast error: the smaller the forecasting error, the more accurate the predicted value.