Artificial Neural Networks

Abstract

This dissertation examines the use of Artificial Neural Networks (ANN) to forecast the London Stock Exchange. Specifically, it demonstrates the importance of ANN in predicting the future trends and value of the financial market. The study makes several contributions to this area. The first is to identify, from the various candidate input variables, the best subset of interrelated local and international factors that affect the London Stock Exchange, for use in future studies.

The approach is novel in that the forecast is based on both fundamental and technical analysis. The second contribution is to provide a well-defined methodology that can be used to create financial models in future studies. In addition, the study gives theoretical arguments in support of the approaches used to construct the forecasting model, by comparing the results of previous studies and by modifying and testing some of the existing approaches. The study also compares the performance of statistical methods and ANN on the forecasting problem. The main contribution of this thesis lies in comparing the performance of five different types of ANN by constructing an individual forecasting model for each.

The accuracy of the models is compared using different evaluation criteria, and separate forecasting models are developed based on the direction accuracy and the value accuracy of the forecast. The fourth contribution is to investigate whether a hybrid approach that combines individual forecasting models can outperform those individual models, and to compare the performance of different hybrid approaches. Three hybrid approaches are used in this study: two are existing approaches, and the third, the mixed combined neural network, is an original approach proposed in this study for forecasting the stock exchange. The last contribution lies in modifying an existing trading strategy to increase the investor’s profitability, and in supporting the argument that the investor earns more profit when the forecasting model is developed using direction accuracy rather than value accuracy.

The best forecasting accuracy obtained is 93% direction accuracy and a value accuracy of 0.0000831 (MSE), both better than the accuracies reported in previous academic studies. Moreover, this research validates the finding of existing studies that the hybrid approach outperforms the individual forecasting models. In addition, the rate of return attained in this thesis using the modified trading strategy is 120.14%, a significant improvement over the 10.8493% rate of return of the existing trading strategy in other academic studies. The difference in the rate of return could be due to either a good forecasting model or a better trading strategy.

The experimental results show that our method not only improves the accuracy rate but also meets short-term investors’ expectations. The results of this thesis also support the claim that some financial time series are not entirely random and that, contrary to the predictions of the efficient markets hypothesis (EMH), a trading strategy could be based solely on historical data. It is concluded that ANN do have good capabilities to forecast financial markets and that, if they are properly trained, the investor could benefit from this forecasting tool and trading strategy.

Chapter 1

1 Introduction

1.1 Background to the Research

Financial time series forecasting has attracted the interest of academic researchers and has been studied since the 1980s. It is a challenging problem because financial time series exhibit complex behavior resulting from various economic, psychological, and political factors, and because they are non-stationary, noisy, and deterministically chaotic.

In today’s world, almost every individual is affected by fluctuations in the stock market. Nowadays people prefer to invest money in diversified financial funds or shares, because of their high returns, rather than deposit it in banks. However, the stock market carries a lot of risk because of its high uncertainty and volatility. To mitigate such risks, one of the main challenges for researchers over many years has been to develop financial models that can describe the movements of the stock market, and so far no optimum model has been found.

The complexity and difficulty of forecasting the stock exchange, together with the emergence of data mining and computational intelligence techniques as better-performing alternatives to conventional statistical regression and Bayesian models, have paved the way for the increased use of these techniques in finance and economics. As a result, traders and investors have to rely on various types of intelligent systems to make trading decisions (Hameed, 2008). Computational intelligence systems such as neural networks, fuzzy logic, and genetic algorithms are a well-established research area in the field of information systems. They have been used extensively in forecasting financial markets and have been successful to some extent. Although the number of proposed methods for financial time series is very large, no single technique has consistently been able to “beat the market”.

For the last three decades, the academic community and traders have held opposing views on the “Random Walk Theory” and the “Efficient Market Hypothesis (EMH)”, owing to the complexity of financial time series, and many publications by different researchers have gathered evidence both for and against them. Lehman (1990), Haugen (1999) and Lo (2000) gave evidence of deficiencies in the EMH. In addition, investors such as Warren Buffett have consistently beaten the stock market over long periods. Market efficiency, or the “Random Walk Theory”, in terms of stock trading in the financial market means that it is impossible to earn excess returns using any historical information.

In essence, then, new information is the only variable that causes the price of the index to change, and the arrival and timing of that information cannot be predicted. Bruce James Vanstone (2005) stated that in an efficient market, security prices should appear to be randomly generated. Both sides of this argument are supported by empirical results from different markets across the globe. This thesis does not wish to enter the theoretical argument over whether to accept or reject the EMH. Instead, it concentrates on the methodologies for developing financial models using artificial neural networks (ANN), compares the forecasting capabilities of various ANN and hybrid-based models, develops a trading strategy that can help the investor, and leaves its findings to stand alongside the published work of other researchers that documents ways to predict the stock market. Since its inception, and particularly in recent years, ANN has gained momentum and has been widely used as a viable computational intelligence technique for forecasting the stock market.

The main challenge for traders is to recognize the signals that indicate when the stock market deviates and to take advantage of such situations. The data that traders use to reduce uncertainty in the stock market and to decide whether to buy or sell a stock is “noisy”. Information not contained in the known information subset used for forecasting is considered noise, and such an environment is characterized by a low signal-to-noise ratio. Refenes et al. (1993) and Thawornwong and Enke (2004) noted that the relationship between a security’s price (or return) and the variables that determine it changes over time, a fact widely accepted in the academic community.

In other words, the stock market’s structural mechanics may change over time, so the effect of those variables on the index also changes. Ferreira et al. (2004) described the relationship between the variables and the predicted index as non-linear, and artificial neural networks (ANN) are able to represent such complex non-linear relationships. This thesis presents a mechanical London Stock Market trading system that uses an ANN forecasting model to extract rules from daily index movements and to generate signals telling investors and traders whether to buy, sell or hold a stock. Figures 1 and 2 represent the stock exchange and the ANN forecasting model. The stock exchange can be viewed as a financial market that takes historical and current data or information as input, to which investors react based on their understanding, speculation, analysis, and so on.

It would therefore seem very difficult to predict the stock market, which is characterized by high noise and non-linearities, using only high-frequency (weekly or daily) historical prices. Surprisingly, though, there are anomalies in the behavior of the stock market that cannot be explained under the existing paradigm of market efficiency. Studies discussed in the literature review have been able to predict the stock market accurately to some extent, and it seems that the forecasting models they developed were able to pick up some of the hidden patterns in the inherently non-linear price series. Nevertheless, a forecasting model needs to be designed and optimized with care in order to produce accurate results.

Further, this thesis aims to contribute knowledge that will one day lead to a standard or optimum model for predicting the stock exchange. To that end, it presents a well-defined methodology that can be used to create forecasting models, and it is hoped that this thesis can address many of the deficiencies of the published research in this area. In the last decade, a plethora of ANN models have been developed in the absence of a well-defined methodology; they were difficult to compare because little work was published, and some of them showed superior results in their domains. Moreover, this study also compares the predictive power of ANN with that of statistical models. The approach normally used by academic researchers in forecasting relies on technical analysis, and some researchers also include fundamental analysis. Technical analysis uses only historical data (past prices) to determine the movement of the stock exchange, whereas fundamental analysis is based on external information (such as interest rates and the prices and returns of other assets) that comes from the economic system surrounding the financial market.

Building a trading system around a forecasting model and testing it against the evaluation criteria is the only practical way to evaluate the forecasting model. There has been a great deal of prior research on identifying the appropriate trading strategy for the forecasting problem. This thesis does not wish to enter the argument over which strategy is best. Although the importance of the trading strategy should not be underestimated, this thesis concentrates on taking one existing strategy, modifying it, and comparing the returns achieved by the forecasting models. There has, however, always been debate in academic studies over how to benchmark an ANN model for trading effectively. Some academic researchers state that predicting the direction of the stock exchange may lead to higher profits, while others support the view that predicting its value may lead to a higher rate of return. Azoff (1994) and Thawornwong and Enke (2004) discuss this debate in their studies.

In essence, there is a need for a formalized development methodology for ANN financial models that can be used as a benchmark for trading systems. This thesis addresses all of these needs.

1.2 Problem Statement and Research Question

The studies mentioned above have generally indicated that ANN, as used in the stock market, can be a valuable tool for the investor. Because of some of the problems discussed above, however, we are still not able to answer the question:

Can ANNs be used to develop an accurate forecasting model that can be used in trading systems to earn a profit for the investor?

From the variety of academic research summarized in the literature review, it is clear that a great deal of research has taken place in this area and that researchers have gathered evidence both for and against it. This lack of consensus directly threatens the applicability of ANN to the financial industry.

Apart from the previous question, this research addresses various other problems:

1. Which of the five types of ANN widely used in academic research performs best in forecasting the London Stock Exchange?

2. Which subset of the potential input variables from 2002-08 affects the LSE?

3. Do international stock exchanges, currency exchange rates and other macroeconomic factors affect the LSE?

4. How much is the performance of the forecasting model improved by using regression analysis in factor selection?

5. Can the use of technical indicators improve the performance of the forecasting model?

6. Which learning algorithm gives the best performance when training the ANN?

7. Do hybrid-based forecasting models give better performance than the individual ANN forecasting models?

8. Which hybrid-based models perform best, and what are the limitations of using them?

9. Does a forecasting model developed on the basis of percentage (direction) accuracy give a higher rate of return than one developed on the basis of value accuracy?

10. Does a forecasting model with better accuracy increase the investor’s profit when applied to the trading strategy?

In addition to the questions outlined above, this research addresses questions regarding the design of the ANN, such as:

• Are there approaches for resolving the various issues in designing an ANN, such as the number of hidden layers and the choice of activation functions?

This thesis will attempt to answer the above questions within the constraints and scope of the 6-year sample period (from 2002-2008), using historical data for the various variables that affect the LSE. It will also attempt to answer them within the practical constraints of transaction costs and money management imposed by real-world trading systems. Although a formal statement of the methodology and its steps is left until section 3, it makes sense to outline here how this thesis will address these questions.

In this thesis, various types of ANN will be trained on fundamental and technical data according to both direction accuracy and value accuracy. A better trading system development methodology will be defined, and the performance of the forecasting model will be checked using the rate of return as the evaluation criterion. In this way, the benefits of incorporating ANN into trading strategies in the stock market can be exposed and quantified. Once this process has been completed, it will be possible to answer all of the thesis questions.
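The three evaluation criteria named above can be illustrated with a minimal sketch. It assumes that direction accuracy is the share of days on which the forecast and the index move the same way relative to the previous close, that value accuracy is the mean squared error (MSE), and that the rate of return comes from a toy long-only rule that ignores transaction costs. The function names, the trading rule and the sample numbers are hypothetical and are not the thesis's own implementation.

```python
def direction_accuracy(actual, predicted):
    """Share of days where the forecast and the index move the same way.

    Both moves are measured against the previous actual close; a zero
    move is treated as "not up".
    """
    hits = 0
    for t in range(1, len(actual)):
        actual_up = actual[t] - actual[t - 1] > 0
        predicted_up = predicted[t] - actual[t - 1] > 0
        if actual_up == predicted_up:
            hits += 1
    return hits / (len(actual) - 1)


def mean_squared_error(actual, predicted):
    """Value accuracy: average squared gap between forecast and actual."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rate_of_return(actual, predicted):
    """Toy long-only rule: hold the index on day t only when the forecast
    for day t is above the previous close. Transaction costs are ignored."""
    wealth = 1.0
    for t in range(1, len(actual)):
        if predicted[t] > actual[t - 1]:
            wealth *= actual[t] / actual[t - 1]
    return wealth - 1.0


# Hypothetical index closes and forecasts, for illustration only.
actual = [100.0, 102.0, 101.0, 105.0, 104.0]
predicted = [100.0, 101.0, 101.5, 104.0, 106.0]

print(direction_accuracy(actual, predicted))
print(mean_squared_error(actual, predicted))
print(rate_of_return(actual, predicted))
```

A model can score well on one criterion and poorly on another, which is why the choice between direction accuracy and value accuracy matters for the trading results.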

1.3 Motivation of the Research

The stock market has always appealed to researchers and financial investors, who have studied it repeatedly to extract useful patterns for predicting its movement. The reason is that if researchers can build an accurate forecasting model, they can beat the market and gain excess profit by applying the best trading strategy.

Numerous financial investors have suffered heavy losses in the stock market because they were not aware of its behavior. Their problem was that they could not decide when to sell or buy a stock to make a profit. Finding the best time for an investor to buy or sell has remained a very difficult task because too many factors may influence stock prices. If investors have an accurate forecasting model, they can predict the future behavior of the stock exchange and make a profit. This solves the investors’ problem to some extent, as it helps them avoid financial losses. It does not, however, guarantee that an investor will earn a better profit or rate of return than other investors unless the forecasting model is combined with a better trading strategy for investing in the share market. This thesis tries to solve this problem by providing the investor with a better forecasting model and trading strategies that can be applied to real-world trading systems.

1.4 Justification of Research

Several features of this research distinguish it from previous academic research. First, the time frame chosen for investigating ANN on the London Stock Exchange (2002-08) has not been tested in previous academic work. The period is important because it contains two opposing forces. On the one hand, the economies of the UK and other countries improved over this period as a whole following the 2001 financial crisis. On the other hand, the period also includes the decline in the stock markets from January 2008 to December 2008. It is therefore important to test the forecasting model in bull, stable and bear markets.

Second, some of the research questions listed in the previous section have not been investigated much in academic studies; in particular, there is hardly any study that has addressed all of these problems. Moreover, an original hybrid-based mixed neural network, a better trading strategy and other modified approaches are described and successfully used in this study.

Finally, there is a significant lack of work in this area on the LSE. As such, this thesis draws heavily on academic results published mainly in the United States and other countries. One interesting aspect of this thesis is seeing how much of the published research on the application of ANN to stock market anomalies is applicable to the UK market. This is important because some academic studies (Pan et al. (2005)) state that each stock market in the world is different.

1.5 Delimitations of Scope

The thesis concerns itself with historical data for the variables that affect the London Stock Exchange during the period 2002-2008.

1.6 Outline of the Report

The remainder of the thesis is organized into the following five chapters.

The second chapter, the background and literature review, gives a brief introduction to the domain and reviews the pertinent literature, discussing the contribution and content of previously published work on stock exchange prediction, which serves as the building block for much of this research. The literature review also justifies why a particular set of ANN inputs is selected, an important step according to Thawornwong and Enke (2004), and introduces some concepts from finance.

The third chapter, the methodology, describes in detail the steps, the data, and the mechanics or techniques used in the thesis, along with the empirical evidence. It also discusses the literature relevant to each step. Formulas and diagrams are used to explain the techniques where necessary, and the chapter also covers issues such as the software and hardware used in the study.

The fourth chapter, the implementation, discusses in detail the approaches used in the implementation, based on the third chapter. It also covers issues such as the software and hardware used in the study.

The fifth chapter, the results and analysis, presents the results according to the performance and benchmark measures used in this study for comparison with other models. It describes the choices that had to be made in building the model and justifies them in terms of the literature.

The sixth chapter, conclusions and further work, restates the thesis hypothesis, discusses the conclusions drawn from the project, and puts the thesis findings into perspective. Finally, the next steps for improving the model’s performance are considered.

Chapter 2

2 Background and Literature Review

This section of the thesis explores the theory of three relevant fields, Financial Time Series, the Stock Market, and Artificial Neural Networks, which together form the conceptual framework of the thesis, as shown in figure 1. This framework allows the trader to make quantitative and qualitative judgments about future stock exchange movements. The three fields are reviewed in historical context, sketching out the development of each discipline and reviewing its academic credibility and its application to this thesis. In the case of neural networks, the review covers the portion of the literature that deals with applying neural networks to stock exchange prediction and the various techniques and types of neural network used; in addition, an existing prediction model is extended to allow a more detailed analysis of the area than would otherwise have been possible.

2.1 Financial Time Series

2.1.1 Introduction

Financial time series prediction is a highly complex task for the following reasons:

1. Financial time series frequently behave like a random-walk process, and the predictability of such series is a controversial issue that has been questioned within the scope of the EMH.

2. The statistical properties of financial time series shift over time (Hellström and Holmström [1998]).

3. Financial time series are usually noisy, and the models that have been able to reduce such noise have been the better models for forecasting the value and direction of the stock exchange.

4. In the long run, a new forecasting technique becomes part of the process to be forecast, i.e. it influences the process to be forecast (Hellström and Holmström [1998]).

The first point is explained later in this section in the discussion of the EMH (Page). The second point is illustrated by the graphs of the volatility time series of the FTSE 100 index from 14 June 1993 to 29 December 1998 and of the Dow Jones from 1928 to 2000, by Nelson Areal (2008) and Negrea Bogdan Cristian (2007) [2.1.r], in figures 2.1.1 and 2.2.2. These figures also show that volatility changes from period to period: in some periods the FTSE 100 index value fluctuates strongly, and in others it remains calm.

The third point is explained by the fact that events on a particular date affect the financial time series of the index; for example, the volatility of stocks or of the index increases before the announcement of major stock-specific news (Donders and Vorst [1996]). These events are random and contribute noise to the time series, which may make it difficult to compare two forecasting models, since a random model can also produce results. The fourth point can be explained by an example. Suppose a company develops a model or technique that outperforms all other models or techniques. The company will make large profits as long as the model is available to only a few people. If, however, the technique becomes available to everyone over time because of its popularity, the company’s profits will decrease, as it will no longer hold an advantage from the technique. This argument is described in Hellström and Holmström [1998] and Swingler [1994].

2.1.2 Efficient Market Hypothesis (EMH)

The EMH has been a controversial issue for many years, and academic researchers have not reached agreement on whether it is possible to predict stock prices. People who believe that prices follow a “random walk” and cannot be predicted are usually supporters of the EMH. Academic researchers (Tino et al. [2000]) have shown that a profit can be made using historical information, although they also found it difficult to verify the strong form because not all private and public data are available.

The EMH was developed in 1965 by Fama (Fama [1965], Fama [1970]) and has been widely accepted (Anthony and Biggs [1995], Malkiel [1987], White [1988], Lowe and Webb [1991]) in the academic community (Lawrence et al. [1996]). It states that the future value of an index or stock is completely unpredictable given its historical information. There are three forms of the EMH: weak, semi-strong, and strong. The weak form rules out any forecasting based on the stock’s price history, since stock prices follow a random walk in which successive changes have zero correlation (Hellström and Holmström [1998]). The semi-strong form considers all publicly available information, such as volume data and fundamental data. The strong form considers all publicly and privately available information.
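The weak-form claim that successive changes of a random walk have zero correlation can be checked numerically. The sketch below is only an illustration under stated assumptions: it simulates a hypothetical price series with independent Gaussian steps (the seed, step size and starting level are arbitrary, and no real market data is used) and computes the lag-1 autocorrelation of the levels and of the changes.

```python
import random


def lag1_autocorrelation(xs):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den


def simulate_random_walk(n_steps, start=100.0, step_std=1.0, seed=0):
    """Hypothetical price path: each change is an independent Gaussian draw."""
    rng = random.Random(seed)
    prices = [start]
    for _ in range(n_steps):
        prices.append(prices[-1] + rng.gauss(0.0, step_std))
    return prices


prices = simulate_random_walk(5000)
changes = [b - a for a, b in zip(prices, prices[1:])]

# The price levels are strongly autocorrelated, but the changes are not.
print("levels :", lag1_autocorrelation(prices))
print("changes:", lag1_autocorrelation(changes))
```

Under the weak form, the autocorrelation of the changes should be statistically indistinguishable from zero, so past changes carry no information about the next one. A real index series whose changes show a clear non-zero autocorrelation would be evidence against this form of the hypothesis.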

Another argument against the EMH is that different investors or traders react differently when a stock suddenly drops in value. These different time perspectives cause unexpected changes in the stock exchange even when no new information has entered the scene. It may be possible to identify these situations and actually predict future changes (Hellström and Holmström [1998]).

Although model developers have sought to prove the EMH wrong by building forecasting models, the issue remains an interesting area of research. The controversy comes down largely to the word “immediately” in the definition. Studies in support of the EMH rely on statistical tests and show that technical indicators and the tested models cannot forecast. Studies against it, however, exploit the time delay between the point when new information enters the system and the point when the information has spread across the globe and the stock market has reached equilibrium at a new market price.

2.1.3 Financial Time Series Forecasting

Financial time series forecasting aims to find underlying patterns and trends and to forecast future index values using historical and current data or information. The historical values are continuous, equally spaced over time, and represent various types of data. The main aim of forecasting is to find an approximate mapping function between the input variables and the forecast (output) value. According to Kalekar (2004), time series forecasting assumes that a time series is a combination of a pattern and some error. The goal of a time series model is to separate the pattern from the error by understanding the trend of the pattern and its seasonality. Several methods are used in time series forecasting, such as moving averages (section ) and linear regression with time. Time series forecasting differs from technical analysis (section) in that it is based on samples and treats the values as a non-chaotic time series. Many academic researchers have applied time series analysis in their forecasting models, but without major success. [1a]
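The idea of separating a pattern from the error can be sketched with the two methods named above. This is a minimal illustration only: the series is a made-up example, the window length is arbitrary, and the helper names are hypothetical. The moving average smooths the series, while a least-squares regression on time estimates a linear trend whose residuals play the role of the error.

```python
def moving_average(series, window):
    """Simple moving average over the last `window` observations."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]


def linear_trend(series):
    """Ordinary least-squares fit of value = intercept + slope * t,
    where t = 0, 1, ..., n - 1 is the time index."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope


# Made-up series: an upward pattern plus some irregular error.
series = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0, 16.0]

smoothed = moving_average(series, window=3)
intercept, slope = linear_trend(series)

# Pattern = fitted trend; error = what the trend does not explain.
residuals = [y - (intercept + slope * t) for t, y in enumerate(series)]
next_value = intercept + slope * len(series)  # one-step-ahead trend forecast

print(smoothed)
print(round(slope, 4), round(intercept, 4), round(next_value, 4))
```

Both methods assume the pattern is smooth and stable, which is one reason they struggle with noisy, non-stationary financial series.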

2.2 Stock Market

2.2.1 Introduction

Let us consider the basics of the stock market.

• What are stocks?

A stock is a share in the ownership of a corporation or company. Stocks represent the owner’s claim on the company’s earnings and assets, and buying more stock increases the owner’s stake. In the United States, stocks are often referred to as shares, whereas in the UK the term is also used as a synonym for bonds, shares and equities.

• Why does a company issue stock?

The main reason for issuing stock is that the company wants to raise money by selling part of itself. A company can raise money in two ways: “debt financing” (borrowing money by issuing bonds or taking a bank loan) and “equity financing” (raising money by issuing stock). Raising money by issuing stock is advantageous because the company does not have to pay the money back to the stock owners, although it does have to share its profit with them in the form of dividends.

• What is a stock price?

A stock price is the price of a single share out of the saleable shares of a company. A company issues stock at a fixed price, and the price may then rise or fall as the stock is traded. Normally, the price of a stock in the stock market is determined by the supply/demand equilibrium.

• What is a stock market?

The stock market, or equity market, is a public market where company stock and derivatives are issued and traded, either through a stock exchange or privately in over-the-counter markets. It is a vital part of the economy, as it gives companies the opportunity to raise money and gives investors the potential to gain by buying or selling shares. The stock market in the US includes the NYSE, NASDAQ and the AMEX, as well as many regional exchanges. The London Stock Exchange is the major stock exchange in the UK and Europe. As mentioned in Chapter 1, this study forecasts the London Stock Exchange (Section 2.2.2).

Investing in the stock market is very risky because the market is uncertain and unsteady. The main aim of the investor is to obtain the maximum return on the money invested, for which he has to study the performance and price history of the company’s stock. Stock market prediction is therefore a broad field, and according to Hellstrom (1997) there are four main ways to predict the stock market:

1. Fundamental analysis (section 2.2.3)

2. Technical analysis (section 2.2.4)

3. Time series forecasting (section 2.1)

4. Machine learning (ANN) (section 2.3)

2.2.2 London Stock Exchange

The London Stock Exchange is one of the oldest and largest stock exchanges in the world. It began operating in 1698, when John Castaing began issuing “at this Office in Jonathan’s Coffee-house” a list of stock and commodity prices called “The Course of the Exchange and other things” [2]. The London Stock Exchange was officially established on March 3, 1801; it currently lists over 3,200 companies and has existed, in one form or another, for more than 300 years. In 2000 it decided to become a public company, and it listed its shares on its own exchange in 2001. The London Stock Exchange consists of the Main Market and the Alternative Investment Market (AIM), plus EDX London (an exchange for equity derivatives).

The Main Market is mainly for established, high-performing companies, whereas AIM trades small caps and new enterprises with high growth potential.[1] Since its launch in 1995, AIM has become the most successful growth market in the world, with over 3000 companies from across the globe having joined. To track the London Stock Exchange, the autonomous FTSE Group (owned by the Financial Times and the London Stock Exchange) maintains a series of indices comprising the FTSE 100 Index, FTSE 250 Index, FTSE 350 Index, FTSE All-Share, FTSE AIM-UK 50, FTSE AIM 100, FTSE AIM All-Share, FTSE SmallCap, FTSE Tech Mark 100, and FTSE Tech Mark All-Share.[4] The FTSE 100 is the best known of these; it is a composite index calculated from the 100 largest companies whose shares are listed on the London Stock Exchange.

The base date for the calculation of the FTSE 100 index is 1984. [2] In the UK, the FTSE 100 is frequently used by large investors, financial experts and stockbrokers as a guide to stock market performance. The FTSE index is calculated from the following formula:
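As a hedged illustration of how a capitalization-weighted index of this kind is computed, the sketch below assumes the general free-float-adjusted form, index = sum(price × shares in issue × free-float factor) / divisor. The constituents, share counts, free-float factors, base level and divisor are all hypothetical, and the code is not the exact FTSE specification.

```python
def cap_weighted_index(constituents, divisor):
    """Free-float-adjusted, capitalization-weighted index level (assumed form)."""
    market_value = sum(c["price"] * c["shares"] * c["free_float"]
                       for c in constituents)
    return market_value / divisor


# Hypothetical constituents; a real index would have 100 of them.
constituents = [
    {"name": "A", "price": 10.0, "shares": 1000, "free_float": 1.0},
    {"name": "B", "price": 20.0, "shares": 500, "free_float": 0.5},
    {"name": "C", "price": 5.0, "shares": 2000, "free_float": 0.75},
]

# Choose the divisor so that the index equals an assumed base level of 1000
# on the base date.
base_level = 1000.0
base_market_value = sum(c["price"] * c["shares"] * c["free_float"]
                        for c in constituents)
divisor = base_market_value / base_level

print(cap_weighted_index(constituents, divisor))  # index level on the base date

# A price move in one constituent changes the index in proportion to its weight.
constituents[0]["price"] = 11.0
print(cap_weighted_index(constituents, divisor))
```

The divisor is what keeps the index comparable over time: when constituents or share counts change, it is adjusted so that the index level does not jump for reasons unrelated to price movements.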

2.2.3 Fundamental Analysis

Fundamental analysis focuses on evaluating future stock exchange movements and the expected returns from the index by analyzing the market and the factors that affect the index. This section discusses these factors and how investors can use them to estimate the price or direction of the stock exchange in order to earn significant profits.

In 1928, Benjamin Graham, commonly referred to within the field of finance as the “Father of Value Investing”, introduced the discipline of value investing, which seeks to discover the “intrinsic value” of a security or index through fundamental analysis. This investment approach to forecasting the stock exchange gained wide recognition through the success of Warren Buffett. There are many credible definitions of intrinsic value offered by various investors and researchers; for example, Cottle et al. (1988) defined intrinsic value as “the value which is justified by assets, earnings, dividends, definite prospects, and the factor of management”. In the various editions of his book “The Intelligent Investor”, Graham emphasized many concepts that should be used to predict a stock or stock index, for example the price-earnings ratio, the size of the firm and the capitalization of stocks within the index. These concepts were tested by Oppenheimer and Schlarbaum (1981), Banz (1981) and Reinganum (1981) to determine their usefulness. Following Graham's success, many researchers identified better and more reliable ways to determine the value of a stock or index. The remainder of this section on fundamental analysis considers some of this research and presents it according to the specific fundamental factors considered.

Basu (1977) investigated the use of the P/E ratio to determine the stock price, an approach that can be used similarly to forecast the index value. Banz (1981) studied the relationship between the market capitalization of a firm and its return. De Bondt and Thaler (1987) presented evidence that stock returns follow seasonal patterns. Kanas (2001) investigated the non-linear relation between stock returns and the fundamental variables of dividends and trading volume. According to the Dow Dividend Policy, a reduction in the dividend payments of a stock (in the case of large companies) lowers the stock price, which in turn reduces the value of the index. Olson and Mossman (2002) showed that ANN are superior to Ordinary Least Squares (OLS) and logistic regression techniques and concluded that fundamental analysis adds value in forecasting within the Canadian market. This portion of the literature review has presented key ideas and developments in the use of fundamental analysis in forecasting.

Applicability

From the viewpoint of this thesis, the discussion and works above reveal that a number of persistent anomalies are used in forecasting the stock exchange. The anomalies described above are related to fundamental variables. As such, we use fundamental variables as input variables in this project.

2.2.4 Technical Analysis

In 1884, Charles Dow drew up an average of the daily closing prices of 11 important stocks, which marked the beginning of technical analysis in the prediction of the stock exchange (Edwards et al. (2001)). Dow's work was updated in various books and journals by researchers such as S.A. Nelson in 1903, William Peter Hamilton in 1922, Robert Rhea in 1922, Richard W. Schabacker in the 1930s, and Robert D. Edwards and John Magee in 1948. In 1978, Wilder introduced new technical indicators, many of which are still in use today. In modern technical analysis, traders use a large number of technical indicators. In the 1980s, technical analysis was criticized for various esoteric approaches, such as relating the “Super Bowl Indicator” or the length of women's skirts to stock price movements. The principles of technical analysis state that the index value usually moves in trends and that a trend, once established, tends to persist. A number of studies support the use of technical analysis in forecasting, and an equal number oppose it. The remainder of this section presents evidence both for and against the use of technical analysis.

In 1965, Fama presented a considerable amount of evidence supporting the random walk hypothesis, which says that the series of price changes has no memory. In contrast, Wilder (1978), Kamara (1982) and Laderman (1987) showed the effectiveness of technical analysis and that future values can be predicted from historical data. Neftci and Policano (1984) used moving averages and slopes (trends) to forecast various gold and T-bill futures contracts and concluded that there is a significant relationship between future prices and moving averages.

This relationship was supported by LeBaron (1997) in his study of the prediction of foreign exchange rates. Murphy (1988) demonstrated that the different sectors of the market are related to one another. White (1998) used neural networks to predict future values but could not find any evidence that contradicts the efficient markets hypothesis. In the late 1980s, acceptance of technical analysis by the academic community was very low, until the studies of Lehmann (1990), Jegadeesh (1990), Neftci (1991), Brock et al. (1992), Taylor and Allen (1992), Levich and Thomas (1993), Osler and Chang (1995), Neely et al. (1997) and Mills (1997) concluded in its favour. Lee and Swaminathan (2000) showed that forecasting results improve when price momentum and trading volume are used. Su and Huang (2003) obtained better results in trend prediction by combining various technical indicators (Moving Average, Stochastic Line [KD], Moving Average Convergence and Divergence [MACD], Relative Strength Index [RSI] and Moving Average of Exchanged Volume [EMA]).
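To make two of the indicators named above concrete, the following is a minimal sketch of a simple moving average and a Relative Strength Index. It is an illustration only: the function names and the sample prices are hypothetical, and the RSI here uses plain averages of gains and losses, a simplification of the smoothed averages in Wilder's original definition.

```python
def simple_moving_average(prices, window):
    """Average of the last `window` prices at each step (None until enough data)."""
    out = []
    for t in range(len(prices)):
        if t + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[t + 1 - window : t + 1]) / window)
    return out


def relative_strength_index(prices, window):
    """RSI over the last `window` price changes, using plain (unsmoothed) averages."""
    changes = [b - a for a, b in zip(prices, prices[1:])]
    recent = changes[-window:]
    avg_gain = sum(c for c in recent if c > 0) / window
    avg_loss = sum(-c for c in recent if c < 0) / window
    if avg_loss == 0:
        return 100.0  # no losses in the window: RSI saturates at 100
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


prices = [10.0, 11.0, 12.0, 11.0, 13.0]  # made-up closing values
sma3 = simple_moving_average(prices, 3)
rsi4 = relative_strength_index(prices, 4)
```

Values such as `sma3` and `rsi4` are the kind of derived technical inputs that can be fed to a forecasting model alongside the raw closing prices.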

In addition to the academic sources, a brief review of professional journals was also conducted. Although some of the work published in these sources is not of academic quality, this search helped in finding the technical variables that are used by professionals in this field. Reverre (2000), Sharp (2000), Ehlers (2000), Pring (2000), Fries (2001), Ehlers (2001), Dormeier (2001), Boomers (2001), Schaap (2004) and Yoder (2002) discussed the different techniques and combinations that can be used with the moving average. Similarly, Levey (2000), Gustafson (2001) and Pezzutti (2002) investigated the use of volatility. The studies of Pring (2000), Ehrlich (2000), Tanksley (2000), Peterson (2001), Bulkowski (2004), Katsanos (2004), Castleman (2003), Peterson (2003) and Gimelfarb (2004) explained the significance of volume in price movements. The importance of using the Average Directional Index (ADX) in forecasting was explained by Boot (2000), Star (2003) and Gujral (2004). In addition, Steckler (2000) and Steckler (2004) studied the use of the stochastic indicator in the prediction model. These variables, along with those identified above, are used in this thesis.

Applicability

Essentially, the discussion above shows that some technical variables have academic acceptance in forecasting the stock exchange; for example, Siegel (2002) supported the use of the moving average. Although academic researchers have mixed views about the use of technical analysis, some work clearly argues that it cannot be ignored. As such, we use technical variables as inputs for the ANNs in this thesis.

2.3 Artificial Neural Networks (ANN)

2.3.1 Introduction

In the last decade, artificial neural networks have been constructed with different techniques for forecasting the stock market. In this section, we give a brief presentation of artificial neural networks. We focus on the structure of feed forward networks, radial basis function neural networks, time delay neural networks, probabilistic neural networks and recurrent networks, which are widely used in forecasting the stock market.

What is ANN?

An Artificial Neural Network (ANN), usually called a “Neural Network” (NN), is an information-processing paradigm: a mathematical or computational model inspired by the way the biological nervous system (the brain) processes information. One unique and important property of this paradigm is the structure of the information processing system. It is built out of a densely interconnected set of processing elements, analogous to neurons, joined by weighted connections; each element takes a number of real-valued inputs and produces a real-valued output.

To develop a feel for this analogy, let us understand the basic principle of the neuron, which is the basic building block of any neural network.
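As a minimal sketch of a single processing element, the code below forms a weighted sum of real-valued inputs and passes it through an activation function. The sigmoid activation, the function names and the numbers are assumptions chosen for illustration, not values taken from this study.

```python
import math


def sigmoid(z):
    """Logistic activation, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias):
    """Weighted sum of the real-valued inputs, followed by the activation."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(net)


# Three inputs with hypothetical connection weights and bias.
out = neuron_output([0.5, -1.0, 2.0], [0.4, 0.3, -0.2], 0.1)
```

A network is obtained by wiring many such units together, so that the output of one unit becomes an input of the units in the next layer.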

The simplest neural network is the multilayer perceptron (MLP). It consists of several layers of processing nodes. The first layer is an input layer whose neurons receive the input values. After processing these input values, the neurons forward their output values to the neurons in the hidden layer. After processing in the hidden layer, the values are passed on to the next layer until they reach the output layer. Figure 1 shows an example of an MLP with one hidden layer.

For the causal forecasting problem, the relationship between the input and output values is given by $y = f(x_1, x_2, \dots, x_p)$, where $x_1, x_2, \dots, x_p$ are the input or independent variables and $y$ is the output or dependent variable. This equation has the same form as the non-linear regression analysis model (section 3.2). For time series forecasting the equation can be rewritten as $y_{t+1} = f(y_t, y_{t-1}, \dots, y_{t-p})$, where $y_t$ is the closing value of the stock market at time $t$. So the ANN used in forecasting is equivalent to a non-linear autoregressive model for time series forecasting of the stock market.
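In the time series form described above, the next closing value is treated as a function of the most recent closing values. A minimal sketch of how such training pairs could be built from a series is shown below; the helper name `make_lagged_samples`, the lag count `p` and the sample prices are hypothetical.

```python
def make_lagged_samples(series, p):
    """Return (inputs, targets): each input holds the p + 1 most recent values
    up to time t, and the matching target is the value at time t + 1."""
    inputs, targets = [], []
    for t in range(p, len(series) - 1):
        inputs.append(series[t - p : t + 1])
        targets.append(series[t + 1])
    return inputs, targets


closes = [100.0, 101.5, 101.0, 102.2, 103.0, 102.5]  # made-up closing values
X, y = make_lagged_samples(closes, p=2)
```

Each row of `X` would be presented to the input layer, and the corresponding entry of `y` would serve as the target output during training.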

Back Propagation Using Gradient Descent Technique

Usually, when the learning algorithm called “backpropagation” is used to train an MLP, the result is referred to simply as a “neural network”. Let the error function used by the ANN be

$$E(\vec{w}) = \frac{1}{2} \sum_{d \in N} \sum_{k \in outputs} (t_{kd} - o_{kd})^2$$

where $N$ denotes the set of all training patterns, i.e. $E(\vec{w})$ measures the error produced over all training examples, $t_{kd}$ is the target output that the $k$th output neuron should produce for pattern $d$, $o_{kd}$ is the actual output produced by the $k$th output neuron for that pattern, and the weights of the ANN are represented by the vector $\vec{w}$.

Normally the learning procedure uses a set of pairs of input and output patterns. The model produces an output pattern from the input pattern and compares it with the target pattern. If there is a difference, the weights are changed to reduce it, and the change in the weights is proportional to the gradient of the error surface, taken in the negative direction. This method is called the gradient descent method. The step function of the perceptron is discontinuous and therefore has no usable derivative, so the gradient descent method cannot be applied to the perceptron. For this reason we use sigmoid neurons in the FFNN. The basic aim of the gradient descent algorithm is to minimize $E(\vec{w})$.

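The gradient $\nabla f$ of a function $f$ is the vector of its first partial derivatives (Dimitri Pissarenko, 2000).

In our case, $\nabla E(\vec{w}) = \left[ \frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, \dots, \frac{\partial E}{\partial w_n} \right]$. Interpreted as a vector in weight space, the gradient specifies the direction that produces the steepest increase in $E$ (Mitchell, 1997, p. 91).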

Figure 4 shows the behavior of $E$ with respect to a weight $w$: to decrease the value of $E$, we should move in the negative (reverse) direction of the slope. We repeat this procedure, moving downhill as shown in figure 4, until we reach a minimum of $E$, as shown in figure 5.
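The downhill procedure can be sketched on a toy one-dimensional error surface. The quadratic error $E(w) = (w - 3)^2$, the learning rate and the number of steps below are assumptions chosen only to illustrate the idea; they are not settings used in this study.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w


# Toy error surface E(w) = (w - 3)^2, whose gradient is dE/dw = 2 (w - 3).
w_min = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

Starting from `w0 = 0.0`, each step moves `w` a fraction of the way towards the minimum at 3, so `w_min` ends up very close to 3.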

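The algorithm shown in figure 6 explains the procedure of gradient descent. Let $w_{ji}$ denote the weight on the connection from unit $i$ to unit $j$, $x_{ji}$ the corresponding input, $net_j = \sum_i w_{ji} x_{ji}$ the weighted sum of the inputs of unit $j$, $o_j$ its output, $t_j$ its target, $\eta$ the learning rate and $E_d = \frac{1}{2} \sum_{k \in outputs} (t_k - o_k)^2$ the error on training example $d$. By using equation (1) (Dimitri Pissarenko, 2000), we get

$$\Delta w_{ji} = -\eta \frac{\partial E_d}{\partial w_{ji}} \qquad (2)$$

In this equation the weight $w_{ji}$ can influence the ANN only through $net_j$. So we can use the chain rule to write (Mitchell, 1997, p. 102)

$$\frac{\partial E_d}{\partial w_{ji}} = \frac{\partial E_d}{\partial net_j} \frac{\partial net_j}{\partial w_{ji}} = \frac{\partial E_d}{\partial net_j} x_{ji} \qquad (3)$$

By using equation (3) (Dimitri Pissarenko, 2000), equation (2) reduces to equation (4):

$$\Delta w_{ji} = -\eta \frac{\partial E_d}{\partial net_j} x_{ji} \qquad (4)$$

From equations (3) and (4), we see that $w_{ji}$ can influence the network only through $net_j$, and $net_j$ can influence the network only through $o_j$. If we use the chain rule again, we get the equation

$$\frac{\partial E_d}{\partial net_j} = \frac{\partial E_d}{\partial o_j} \frac{\partial o_j}{\partial net_j} \qquad (5)$$

The first term in equation (5) can be rewritten as

$$\frac{\partial E_d}{\partial o_j} = \frac{\partial}{\partial o_j} \frac{1}{2} \sum_{k \in outputs} (t_k - o_k)^2 \qquad (6)$$

which, because the derivative is zero for every $k \neq j$, reduces to (Dimitri Pissarenko, 2000)

$$\frac{\partial E_d}{\partial o_j} = -(t_j - o_j) \qquad (7)$$

The second term in equation (5) can be written as

$$\frac{\partial o_j}{\partial net_j} = \frac{\partial \sigma(net_j)}{\partial net_j} = o_j (1 - o_j) \qquad (8)$$

In equation (8) we use the fact that the derivative of the sigmoid function $\sigma$ is $\sigma(net_j)(1 - \sigma(net_j))$. Combining the results of equations (1) to (8), the weight update $\Delta w_{ji}$ for an output node can be written as

$$\Delta w_{ji} = \eta \, (t_j - o_j) \, o_j (1 - o_j) \, x_{ji}$$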

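Now we will discuss how the weights of hidden nodes are updated.

By using equations (1) to (7), we get the sub-expression of the weight update rule as $\Delta w_{ji} = -\eta \frac{\partial E_d}{\partial net_j} x_{ji}$; in this equation only $\frac{\partial E_d}{\partial net_j}$ differs between the output and hidden nodes (Dimitri Pissarenko, 2000). So we need to derive only this term, as all the other expressions are the same.

The set of all units immediately downstream of unit $j$ in the network is denoted by $Downstream(j)$; $net_j$ can influence the network outputs only through these units. Writing $\delta_k = -\frac{\partial E_d}{\partial net_k}$, we can therefore write (Dimitri Pissarenko, 2000)

$$\frac{\partial E_d}{\partial net_j} = \sum_{k \in Downstream(j)} \frac{\partial E_d}{\partial net_k} \frac{\partial net_k}{\partial o_j} \frac{\partial o_j}{\partial net_j} = -o_j (1 - o_j) \sum_{k \in Downstream(j)} \delta_k w_{kj}$$

So the weight update rule for hidden nodes is

$$\Delta w_{ji} = \eta \, \delta_j \, x_{ji}, \qquad \delta_j = o_j (1 - o_j) \sum_{k \in Downstream(j)} \delta_k w_{kj}$$

Although there are many improvements over the gradient descent algorithm, such as adding a momentum term, gradient descent is still the most popular algorithm to combine with the MLP when designing an ANN (Dimitri Pissarenko, 2000).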
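The update rules for output and hidden nodes can be sketched for a small network with one hidden layer of sigmoid units and a single sigmoid output. This is an illustrative sketch under stated assumptions, not the implementation used in this study: the network size, the initial weights, the learning rate `eta` and the toy data set (the logical OR function) are all hypothetical, and the bias of each unit is stored at index 0 of its weight list.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Return the hidden outputs and the network output (index 0 of a row is the bias)."""
    h = [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) for w in w_hidden]
    o = sigmoid(w_out[0] + sum(wi * hi for wi, hi in zip(w_out[1:], h)))
    return h, o


def train_step(x, t, w_hidden, w_out, eta):
    """One stochastic gradient descent update for a single example (x, t)."""
    h, o = forward(x, w_hidden, w_out)
    delta_o = (t - o) * o * (1.0 - o)  # error term of the output node
    # Error terms of the hidden nodes, computed before any weight is changed.
    delta_h = [hj * (1.0 - hj) * delta_o * w_out[j + 1] for j, hj in enumerate(h)]
    w_out[0] += eta * delta_o
    for j, hj in enumerate(h):
        w_out[j + 1] += eta * delta_o * hj
    for j, w in enumerate(w_hidden):
        w[0] += eta * delta_h[j]
        for i, xi in enumerate(x):
            w[i + 1] += eta * delta_h[j] * xi


def total_error(data, w_hidden, w_out):
    return 0.5 * sum((t - forward(x, w_hidden, w_out)[1]) ** 2 for x, t in data)


# Toy data: logical OR. Two hidden units with small hand-picked initial weights.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
w_hidden = [[0.1, 0.2, -0.1], [-0.1, 0.1, 0.3]]
w_out = [0.05, 0.2, -0.2]

error_before = total_error(data, w_hidden, w_out)
for _ in range(2000):
    for x, t in data:
        train_step(x, t, w_hidden, w_out, eta=0.5)
error_after = total_error(data, w_hidden, w_out)
```

Comparing `error_before` with `error_after` shows whether the repeated weight updates have reduced the summed squared error on the training patterns.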

2.3.2 Feed Forward Neural Networks (FFNN)

Feed forward neural networks are the most popular and most widely used models in the forecasting problems. They are also known as "multi-layer perceptrons."

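The figure shows a one-hidden-layer FFNN with inputs $x_1, \dots, x_n$ and output $\hat{y}$. An FFNN is divided into layers, and every node in each layer is connected to the nodes in the previous layer. These connections may have different weights. It is called a feed forward network because there is no feedback in the network. The hidden layer consists of $h$ neurons, and the functionality of the $j$th hidden neuron is

$$z_j = f\left( \sum_{i=1}^{n} w_{ji} x_i + b_j \right), \qquad j = 1, \dots, h$$

The output of the network is given by

$$\hat{y} = g\left( \sum_{j=1}^{h} v_j z_j + c \right)$$

where $f$ and $g$ are the activation functions of the hidden and output layers, $h$ is the number of neurons in the hidden layer, $n$ is the number of inputs and the variables $w_{ji}$, $b_j$, $v_j$ and $c$ are the parameters of the network.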

2.3.3 Time Delay Neural Networks (TDNN)

The Time Delay Neural Network was developed by Waibel and Lang in 1987. This network helps to introduce “memory” into the neural network to deal with the connections. The architecture has continuous inputs that arrive at the hidden units at different points in time, and the inputs are stored in the memory (figure).

The response of the TDNN at time $t$ is based on the inputs at times $(t-1), (t-2), \dots, (t-k)$. The output function at time $t$ is given by $y(t) = f(x(t-1), x(t-2), \dots, x(t-k))$, where $x(t-i)$ is the input at time $(t-i)$ and $k$ is the maximum adopted time-delay.
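The “memory” of delayed inputs can be sketched as a tapped delay line that keeps the $k$ most recent inputs. The class `TappedDelayLine`, the linear unit `tdnn_unit` standing in for the network function, and all weights below are hypothetical names and values used only for illustration.

```python
from collections import deque


class TappedDelayLine:
    """Keeps the k most recent inputs so a network can see x(t-1), ..., x(t-k)."""

    def __init__(self, k):
        self.memory = deque(maxlen=k)

    def push(self, value):
        self.memory.append(value)  # the oldest value drops out once k are stored

    def ready(self):
        return len(self.memory) == self.memory.maxlen

    def delayed_inputs(self):
        """Most recent first: [x(t-1), x(t-2), ..., x(t-k)]."""
        return list(reversed(self.memory))


def tdnn_unit(delayed, weights, bias):
    """Hypothetical linear unit standing in for the network function f."""
    return bias + sum(w * x for w, x in zip(weights, delayed))


line = TappedDelayLine(k=3)
outputs = []
for x in [1.0, 2.0, 3.0, 4.0, 5.0]:
    if line.ready():
        # The output at time t uses only inputs from earlier time steps.
        outputs.append(tdnn_unit(line.delayed_inputs(), [0.5, 0.3, 0.2], 0.0))
    line.push(x)
```

No output is produced until three inputs have been stored, after which each new output depends on the three preceding inputs.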

2.3.4 Radial Basis Function Neural Networks (RBFNN)

Radial basis function neural networks (RBFNN) are non-linear hybrid networks that have attracted researchers because of their simplicity, fast training and high prediction precision; they have been used in many applications such as pattern recognition (Krzyzak, Linder, & Lugosi, 1996), spline interpolation and function approximation (Poggio & Girosi, 1990). RBF networks emerged as a part of ANN research in the late 1980s. However, they are not very popular for dynamic systems because of their disadvantage in approximating the limits of non-smooth functions.

An RBFNN consists of one input layer, one single hidden layer of processing elements (PEs), and one output layer, as shown in the figure. The input layer takes an n-dimensional input vector, which is connected to the hidden layer via unit weights. Radial basis functions (RBF) are the activation functions of the neurons of the hidden layer; they attenuate symmetrically in the radial direction away from their centres, and the maximum value of an RBF is one. The hidden layer uses the Gaussian transfer function rather than the sigmoid transfer function used in feed-forward back-propagation and recurrent neural networks. $\phi_i$ is the RBF acting on the $i$th hidden neuron, which usually adopts the Gauss function

$$\phi_i(x) = \exp\left( -\frac{\lVert x - c_i \rVert^2}{2 \sigma_i^2} \right)$$

where $\sigma_i$ is the RBF bandwidth, $m$ is the number of hidden neurons, $c_i$ is the centre of $\phi_i$ and $w_i$ is the connecting weight between the $i$th hidden neuron and the output neuron. The output layer [222] is given by

$$y(x) = \sum_{i=1}^{m} w_i \phi_i(x)$$
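A forward pass through such a network can be sketched as follows, assuming Gaussian hidden units and a linear output neuron. The centres, bandwidths and weights are hypothetical values; in practice they would be fitted to the training data.

```python
import math


def gaussian_rbf(x, centre, bandwidth):
    """Gaussian basis function: equals 1 at the centre and decays with distance."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def rbf_network(x, centres, bandwidths, weights):
    """Weighted sum of the hidden-layer activations (linear output neuron)."""
    return sum(
        w * gaussian_rbf(x, c, s) for w, c, s in zip(weights, centres, bandwidths)
    )


centres = [[0.0, 0.0], [1.0, 1.0]]  # hypothetical centres
bandwidths = [1.0, 1.0]             # hypothetical bandwidths
weights = [2.0, -1.0]               # hypothetical output weights
y = rbf_network([0.0, 0.0], centres, bandwidths, weights)
```

Because the input coincides with the first centre, the first hidden unit responds with its maximum value of one, while the second unit contributes a smaller, distance-attenuated response.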

2.3.5 Probabilistic Neural Networks (PNN)

Probabilistic neural networks have been widely used by researchers in forecasting and classification problems. The architecture is shown in figure 3. When a financial time series is presented to the network, the first layer (radial basis layer) computes the distances from the input vector to the training input vectors and produces a vector whose elements indicate how close the input is to each training input. The second layer (competitive layer) sums these contributions for each class of inputs to produce, as its net output, a vector of probabilities. A compete transfer function on the output of the competitive layer selects the maximum of these probabilities and produces a 1 for that class and a 0 for the other classes (MATLAB).

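The probability density function of category $B$ is estimated as

$$f_B(X) = \frac{1}{(2\pi)^{p/2} \sigma^p} \cdot \frac{1}{m} \sum_{i=1}^{m} \exp\left( -\frac{(X - X_{Bi})^T (X - X_{Bi})}{2 \sigma^2} \right)$$

where $m$ is the number of training samples of category $B$, $p$ is the number of dimensions of the input pattern $X$, $X_{Bi}$ is the $i$th training sample of category $B$ and $\sigma$ is the smoothing parameter (K. Schierholt, 1996).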
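The two layers described above can be sketched with a Gaussian kernel density estimate per class followed by a maximum selection. This is a simplified illustration: the class labels, the training samples and the smoothing parameter `sigma` are hypothetical, and the function returns the winning label rather than a 0/1 vector.

```python
import math


def class_density(x, samples, sigma):
    """Average Gaussian kernel between x and the training samples of one class."""
    p = len(x)
    norm = (2.0 * math.pi) ** (p / 2.0) * sigma ** p
    total = 0.0
    for s in samples:
        sq_dist = sum((xi - si) ** 2 for xi, si in zip(x, s))
        total += math.exp(-sq_dist / (2.0 * sigma ** 2))
    return total / (norm * len(samples))


def pnn_classify(x, training_sets, sigma=0.5):
    """Return the class with the largest estimated density, plus all densities."""
    densities = {
        label: class_density(x, samples, sigma)
        for label, samples in training_sets.items()
    }
    return max(densities, key=densities.get), densities


training_sets = {  # made-up two-dimensional patterns for two classes
    "up": [[1.0, 1.2], [0.9, 1.0], [1.1, 0.8]],
    "down": [[-1.0, -0.9], [-1.2, -1.1], [-0.8, -1.0]],
}
label, densities = pnn_classify([0.8, 0.9], training_sets)
```

The smoothing parameter controls how far the influence of each training sample reaches; a smaller `sigma` makes the classifier behave more like a nearest-neighbour rule.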

2.3.6 Recurrent Networks (RNN)

An RNN is defined as a network in which the activity patterns of the input layer or the activations of the hidden units pass through the network more than once, and output values are fed back into the network as inputs before a new output pattern is generated. Recurrent neural networks are appropriate for building forecasting models of the financial market because the feedback allows them to acquire state representations. The architecture consists of two separate components: the temporal context (short-term memory) and the predictor (the feed forward part). The temporal context retains the features of the input financial time series that are relevant to the forecasting task and captures the history of the prior activations of the RNN.

Normally, most studies build three different types of RNN and then compare their performance. In this study, however, it is not feasible to vary so many parameters across three RNN, because a single RNN-based experiment takes a long time to run (roughly 5-20 hours), so we rely on the results of previous studies. Tenti (1996) concluded that the RNN with hidden layer feedback performs better than the RNN with input layer feedback and the RNN with output layer feedback. So we use the RNN with hidden layer feedback in this study.

In the RNN with hidden layer feedback (figure 8), the hidden layer is fed back into itself through a layer of recurrent neurons. Both the input layer and the recurrent layer, as shown in the figure, feed forward to activate the hidden layer, and this hidden layer then feeds forward to activate the output layer. In this way the features of the previous patterns are fed back into the network (Tenti (1996)). The output of an RNN is a function of the current input together with its previous inputs and outputs, as given by $y(t) = f(x(t), x(t-1), \dots, y(t-1), y(t-2), \dots)$, where $x(t)$ is the input at time $t$.
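Hidden layer feedback can be sketched as a network whose hidden units receive both the current input and a copy of their own previous activations. The sketch below is a minimal illustration with two hidden units and a linear output; the tanh activation, the weight values and the function names are assumptions, not the configuration used in this study.

```python
import math


def rnn_step(x, context, w_in, w_rec, w_out):
    """One time step: hidden units see the current input and the previous hidden state."""
    hidden = []
    for j in range(len(w_in)):
        net = sum(w * xi for w, xi in zip(w_in[j], x))
        net += sum(w * c for w, c in zip(w_rec[j], context))
        hidden.append(math.tanh(net))
    output = sum(w * h for w, h in zip(w_out, hidden))
    return output, hidden  # the hidden activations become the next context


def run_rnn(series, w_in, w_rec, w_out):
    context = [0.0] * len(w_in)  # empty memory before the first input
    outputs = []
    for value in series:
        y, context = rnn_step([value], context, w_in, w_rec, w_out)
        outputs.append(y)
    return outputs


w_in = [[0.5], [-0.3]]             # hypothetical input-to-hidden weights
w_rec = [[0.1, 0.2], [0.0, 0.4]]   # hypothetical recurrent weights
w_out = [1.0, -1.0]                # hypothetical hidden-to-output weights
same_input_twice = run_rnn([1.0, 1.0], w_in, w_rec, w_out)
```

Although the same input value is presented twice, the two outputs differ, because the second step also sees the hidden state left behind by the first.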

Theoretically, RNN have an advantage over FFNN in modelling dynamic relationships, since the output of a neuron is a function of the current input as well as of the previous inputs. In the literature review, we discuss previous studies in which academic researchers concluded that RNN are better than FFNN. Their disadvantage is that they require substantially more memory and more connections between nodes in simulation than FFNN and TDNN.

2.4 Literature Survey

In recent years, ANN have been used in various applications such as pattern classification and pattern recognition. For the last two decades, ANN have been used extensively in forecasting problems and have provided traders with an alternative tool to predict the stock market. According to Zekic (1998), ANN are used in the financial market for predicting stock performance, making trading recommendations, forecasting price changes of stock indexes, classifying stocks, forecasting stock prices and modeling stock performance. They have several features that are valuable in forecasting the stock market. First, they are self-driven, i.e. they need no prior assumptions about the model and learn from examples. Second, after training, ANN can generalize correctly to unseen parts of a time series, even if the series contains noise. Third, ANN can approximate any continuous function to any desired accuracy. Fourth, ANN are non-linear.

In this section, we discuss various open issues that have been the subject of debate among academic researchers for many years, and we support the use of non-linear models and ANN in this study with examples from previous studies. There has been an avalanche of studies on forecasting the stock market. A number of important academic research papers are reviewed below, chosen because they are either representative of current research directions or represent a novel approach to forecasting the stock market. We present a very brief review of related and recent studies and also compare the performance of the various ANN reported in previous studies.

Why consider Non-Linear Models

In recent years, non-linear models have become more common in forecasting than linear (statistical) models, which had dominated forecasting for many years. Linear models have the advantage that they are easier to analyze, understand and implement than non-linear models. Traditional approaches to time series prediction (the ARIMA or Box-Jenkins method) assume that the time series is generated by a linear process. However, this assumption is often wrong, as the real world is often non-linear (Granger, C.W.J., 1993).

During the last decade, many non-linear time series models have been developed, such as the autoregressive conditional heteroscedastic (ARCH) model, the threshold autoregressive (TAR) model and the bilinear model. Zhang et al. (1998) stated that these non-linear models are still limited in that an explicit relationship for the data series has to be hypothesized. Moreover, because there are so many possible non-linear patterns, formulating a non-linear model for a particular time series is a very difficult task. ANN are able to solve this problem, as they can build non-linear forecasting models without prior knowledge of the relationship between the input and the forecasted time series.

Research on the comparison of ANN and statistical models is considerable, and the literature is vast and growing. This thesis does not wish to enter into the argument about whether to accept or reject linear models. Based on the results of previous studies (Granger, C.W.J. (1993), Zhang et al. (1998), Tang et al. (1991), Kohzadi et al. (1996)), it concentrates on non-linear methodologies (ANN) for the development of the financial models. The review presented above is comprehensive and supports the use of non-linear models (ANN) in this study.

2.4.1 Relative Performance of ANN in forecasting

In this section we compare the performance of ANN with widely used statistical methods. The academic literature contains many inconsistent reports on the performance of ANN. This may be due to a large number of reasons, such as the wrong choice of network (a non-ideal network structure), the training method, or the use of linear data in forecasting. In this section, we attempt to provide a comprehensive view of the current status of the research. Several academic papers are devoted to comparing ANN with conventional methods; they are described below:

1. Sharda and Patil (1990), (1992) concluded that simple ANN models are comparable to the Box-Jenkins method. They used 75 and 111 M-Competition time series to make the comparison.

2. Tang et al. (1991) concluded that for time series with more irregularity and short memory, ANN outperform the Box-Jenkins (ARIMA) method, but for long memory the performance of both models is the same. They reexamined the same 3 time series from Sharda and Patil (1990).

3. Kang (1991) used 50 M-Competition series to compare ANN and Box-Jenkins (ARIMA). He concluded that the best ANN model is always better than Box-Jenkins.

4. Hill et al. (1994), (1996) forecasted 50 M-Competition time series with ANN and statistical methods; the results of the ANN were slightly better than those of the statistical methods.

5. Kohzadi et al. (1996) used monthly live cattle and wheat prices to compare ANN and ARIMA and concluded that ANN find more turning points and are consistently better.

6. Bruce et al. (1991) forecasted 8 electric load data series and concluded that ANN models are inferior to statistical models.

7. Caire et al. (1992), using one electric consumption data series, concluded that ANN are more reliable for longer step-ahead forecasts but hardly better than ARIMA for 1-step-ahead forecasts.

8. Foster et al. (1992) concluded that the performance of ANN is not as good as that of linear regression and a simple average of exponential smoothing methods.

9. Nelson (1994) concluded from his results that ANN are unable to learn seasonality. He used 68 monthly time series from the M-Competition. His results contradict those of Sharda and Patil (1992), who found that the performance of ANN is not affected by the seasonality of the time series.

10. Tang et al. (1991) and Tang and Fishwick (1993) studied the conditions under which ANN are superior to traditional time series forecasting methods such as Box-Jenkins models. They concluded:

(a) ANN perform better when the forecast horizon increases. This was confirmed by the studies of Kang (1991), Caire et al. (1992) and Hill et al. (1994), (1996).

(b) In the case of short memory, ANN perform better. This was confirmed by the study of Sharda and Patil (1992).

(c) With more input nodes, ANN give better results.

The review presented above shows that a considerable amount of academic research has examined whether ANN are better than statistical methods. While there is no final word on this issue among academics, the prevalent view in most studies favours ANN for forecasting the stock market. There is now strong evidence that ANN can forecast the stock market and that stock market returns are not independent of past changes. So the studies discussed in this literature review support the use of ANN in this study. However, the scarcity of studies favouring statistical methods over ANN does not rule out the possibility that statistical methods can be better than ANN in forecasting. Some of the tests used may have been inappropriate, and some conclusions could be questionable.

2.4.2 Prior research on stock market prediction using ANN

As mentioned in the previous section, the potential of ANN in forecasting has been explored in recent years. Various studies have used ANN in forecasting the stock exchange. The most important findings are described below.

1. One of the earliest studies was by Kimoto et al. (1990), who used ANN and several other prediction methods to develop a prediction system for the Tokyo Stock Exchange Index. They evaluated the model using the correlation coefficient and concluded that the correlation coefficient produced by multiple regression was much lower than that of the ANN model. However, the correlation coefficient may not be a proper measure of the performance of a forecasting model.

2. Kamijo and Tanigawa (1990) employed a recurrent neural network to analyze candlestick charts, which are used to study patterns in the stock market.

3. Choi, Lee and Lee (1995) and Trippi and DeSieno (1992) forecasted the daily direction of change in the S&P 500 index futures using ANN.

4. Mizuno et al. (1998) again applied ANN to the Tokyo Stock Exchange to predict buying and selling signals, with an overall prediction rate of 63%.

5. Phua et al. (2000) applied a neural network with a genetic algorithm to the Singapore stock market and predicted the market direction with an accuracy of 81%.
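Direction-based results such as the prediction rates quoted above are typically reported as the share of periods in which the forecast move has the same sign as the realised move. The sketch below shows one possible way to compute such a rate; the exact definitions used in the cited studies may differ, and the function name and sample data are hypothetical.

```python
def direction_accuracy(actual, predicted):
    """Share of steps where the predicted move from the previous actual value
    has the same sign as the realised move."""
    hits = 0
    steps = 0
    for t in range(1, len(actual)):
        actual_move = actual[t] - actual[t - 1]
        predicted_move = predicted[t] - actual[t - 1]
        if actual_move * predicted_move > 0:
            hits += 1
        steps += 1
    return hits / steps


actual = [100.0, 102.0, 101.0, 103.0, 104.0]     # made-up index values
predicted = [100.0, 101.0, 102.5, 102.0, 103.5]  # made-up forecasts
rate = direction_accuracy(actual, predicted)
```

In this toy example three of the four moves are forecast in the right direction, so `rate` is 0.75, even though none of the forecast values matches the realised value exactly.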

2.4.3 Relative Performance of Various ANN in Finance

Although the body of literature on the application of ANN is substantial, there is still a great deal of inconsistency in the findings. This is particularly the case for the relation between spot and futures prices. While most studies agree on the importance of futures prices for financial markets, only a few studies, if any, agree on how and why they are important. Furthermore, the vast majority of the literature is based on analytical models. A major shortcoming of econometric models is that they make strong assumptions about the problem; if the assumptions are not correct, the model can generate misleading results.

1. Connor & Atlas (1993) and Adam et al. (1994) have confirmed the superiority of RNN over feed forward networks in non-linear time series prediction.

2.4.4 Prior Research on the Various Trading Strategy

The relation between a stock price prediction model and its use as an investment tool under different trading strategies has been the centre of attention for a large number of studies, and the literature covers a range of aspects of this relationship. As the main aim of a trading strategy is to assist an investor in making accurate financial decisions, the application should be based on a profitable trading strategy. Numerous trading strategies are discussed in the futures-spot literature, such as Buy and Hold (B&H), the Stop and Objective strategy (S&O), Neural Network Buy and Hold (NN B&H) and the Neural Network Stop and Objective strategy, or Buy and Sell (NN S&O) (Atiya & Talaat, 1997).

Amir Atiya and Noha Talaat (1997) tested four different trading strategies and concluded that the neural network results are consistently superior, especially NN S&O. P. B. Patel (T. Marwala & Patel, October 8-11, 2006) also compared the “buy low, sell high” and “buy and hold” trading strategies and found the former to be better than the latter [1,a]. The buy low, sell high trading strategy, as the name suggests, involves an investor purchasing certain stocks at a low price and selling them when the price is high. H. Pan et al. (2003) tested the “buy low, sell high” strategy and obtained a maximum rate of return of 10.8493%. The main aim of the thesis is to develop a forecasting model using ANN and then to show the profit to the investor using one of the trading strategies used in academic work. In this study, we use the trading strategy of H. Pan et al.
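The difference between a passive and a forecast-driven strategy can be sketched with a toy simulation. The rule below (hold the index for the next period only when the model forecasts a rise) is a hypothetical simplification for illustration; it is not the strategy of H. Pan et al., and it ignores transaction costs. The prices and forecasts are made-up numbers.

```python
def buy_and_hold_return(prices):
    """Return from buying at the first price and selling at the last."""
    return prices[-1] / prices[0] - 1.0


def forecast_strategy_return(prices, forecasts):
    """forecasts[t] is the forecast, made at time t, of prices[t + 1].
    Hold the index for the next period only when a rise is forecast."""
    wealth = 1.0
    for t in range(len(prices) - 1):
        if forecasts[t] > prices[t]:
            wealth *= prices[t + 1] / prices[t]
    return wealth - 1.0


prices = [100.0, 110.0, 99.0, 108.9]  # made-up index values
forecasts = [105.0, 100.0, 104.0]     # made-up one-step-ahead forecasts
bh = buy_and_hold_return(prices)
active = forecast_strategy_return(prices, forecasts)
```

In this example the forecasts get every direction right but every value wrong, and the forecast-driven rule still avoids the one falling period, which illustrates why direction accuracy matters for profitability.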

2.5 Conclusion

Chapter 3. Methodology

3 Methodology

In this section, we discuss the design and methodologies used to build the different types of ANN and the hybrid forecasting models, based on direction and value accuracy, for different trading strategies. These are built under the guidance of the literature survey and the theoretical framework of the ANN employed in this thesis. The development process of the forecasting model is divided into eight steps (figure 1), each of which is discussed in detail together with a literature review to suggest the optimum choice for that step. In addition, we modify various existing approaches to find the optimum solution for each step.

3.1 Variable Selection

The most important step in the construction of the forecasting model is the selection of the data. As mentioned earlier in the thesis, a wrong selection of data can degrade the performance of the forecasting model by increasing the noise in it. Even if all other steps are implemented efficiently, the performance of the model will not improve if we select the wrong data. So it is necessary to know which input variables are important and influence the stock market. Economic theory can help in choosing the variables that are likely to be the most important predictors.

As discussed in Section 2, we use both technical and fundamental data as potential input data in this study. Technical inputs are defined as lagged values of the FTSE 100 (the dependent variable) or indicators calculated from the lagged values. Fundamental inputs are economic variables that influence the FTSE 100. A popular approach, widely used in academic work, is to choose as many input variables as possible that may affect the stock exchange. Günsel et al. (2007) suggested that the market is influenced by the interest rate, and it is commonly accepted that the interest rate should be included in the forecasting model as an input variable. Günsel et al. (2007) also stated that, with the considerable increase in economic globalization, all businesses are directly or indirectly affected by international activities. So exchange rates, other important international stock exchanges and the stocks listed on them may affect the stock market. Most academic studies state that the dividends of stocks also influence the stock market: if the dividend of a stock is higher, more people will buy the stock, which may affect the stock market. However, Günsel et al. (2007) reported that Shah & Wadhwani found little evidence that dividends affect the stock markets of countries other than the US. So we do not include dividends as input data.

Kaastra and Boyd [1996] stated that the intermarket data such as the Dollar/Yen and Pound cross rates and interest rate differentials could be used as an input data when forecasting the stock exchange. Fundamental information such as the current account balance, GDP, unemployment, money supply or wholesale price index may also be used. Izumi and Ueda (1999) stated that macroeconomic factors such as inflation and short-term interest rate have direct impacts on the stock returns.

Moreover, the choice of input variables is also affected by the forecasting horizon, i.e. whether the forecast is short term or long term. A trader on the trading floor would like to use daily data in designing the forecasting model, while an investor with a long-term investment horizon might use weekly or monthly data.

3.2 Collection of Data

The academic researcher or investor should consider the cost and availability of the historical information for the input variables chosen in the previous subsection 3.1. Technical data is readily available from many vendors or websites at no or nominal cost, while fundamental data is more difficult to obtain. Kaastra and Boyd [1996] stated that the website or vendor should have a reputation for high-quality data. However, the data collected should still be checked for errors against other possible sources to confirm that the data passed to the next step is free from errors.

In this study, we do not include any variable for which more than 5% of the data is missing. As mentioned in the previous section, we take both technical and fundamental data in this study.

Technical Data

Technical Data used in the financial time series prediction is normally:

• Close Value

• Highest Price during the day

• Lowest Price during the day

• Volume (total number of stocks being traded)

Siddhivinayak et al. (1999) stated that weekly and monthly data are preferred for the forecasting horizon because they are less noisy. Some studies use daily data while others use intraday data for forecasting. Hellström and Holmström [1998] stated that intraday data is not often used for modeling the financial market and that the most obvious choice is the closing price of the time series, although they also noted a few disadvantages of this choice. So, in this study we use the closing value of the variables.

Fundamental Data

Fundamental data describes the current economic situation of the market as well as macroeconomic parameters. Analysis of the stock exchange is usually carried out on a regular basis by researchers, which helps in evaluating the accurate value of the stock exchange. Hellström and Holmström [1998] stated that the following factors are considered by fundamental analysts:

• Factors such as inflation and the interest rate, which measure the state of the economy.

• The condition of the market to which the stock index belongs: stock price indices (Dow Jones, DAX, FTSE 300, S&P 100 etc.), the exchange rate of the market currency (the pound in this study) against other currencies, and the value of the major stocks in the index.

• The condition of the companies in the stock index, measured by factors such as the P/E (price/earnings) ratio, debt ratio etc.

Technical Indicators

Technical indicators are extensively used in the prediction of the stock exchange. As discussed in Section 2, technical indicators improve the performance of the model. So, we use commonly used technical indicators such as the RSI (Relative Strength Index) and the Moving Average (section 2.34.), which have been widely used in academic studies.

3.3 Data Pre- and Post Processing

The performance of the neural network depends on the quality of the input data. Therefore, this is a crucial step in the construction of forecasting models from the neural network. Additional data does not necessarily improve the performance of the forecasting model, as additional input variables increase the dimension of the training data, which can result in the phenomenon known as the curse of dimensionality (Dimitri Pissarenko, 2000). So, a reduction of the dimensionality is necessary, which is done in the data preprocessing stage.

The remainder of this section elaborates on the various stages of this procedure, which include filling in missing data, normalizing the input data, and calculating moving averages.

3.3.1 Missing Data

The main problem with the data collected from the various websites is that there are many missing values, and the performance of the model or neural network would decrease if we did not treat them, as they would increase the noise in the model. Moreover, if we perform calculations on a missing value, which is represented by NaN in MATLAB, the NaN value will be propagated to the result.

Various researchers have stated alternative ways to handle missing values. C. D. Tilakaratne and S. A. Morris [a] stated that the rate of change of the price or index value should be zero in such cases, and that a missing close price should be replaced by the close price of the last trading day. Heinkel and Kraus [b] stated that there are three ways to deal with days with no trading. The first is to ignore the days with no trading and use data for trading days only. The second is to assign a zero value to the days on which there is no trading. The third is to build a linear model which can be used to estimate the missing value. Olivier Coupelon [c] stated that missing values should be filled by interpolation and concluded that the results are much better with interpolation. There are many ways of interpolating, such as linear or cubic spline interpolation, but we use linear interpolation as we are using financial time series as input in the project. Linear interpolation takes two known points $(x_0, y_0)$ and $(x_1, y_1)$ and builds an approximating line for the sequence, which is the line connecting the two points, giving the piecewise approximation a “smooth” look. However, such an approximation can look incoherent on some data. The formula for calculating $x$ for a given value $y$ (which is the date variable) is given below:

$x = x_0 + (y - y_0) \frac{x_1 - x_0}{y_1 - y_0}$
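The interpolation step can be illustrated with a minimal Python sketch. It assumes that missing closing values are marked as NaN and that observations are equally spaced in time; the function name `fill_missing_linear` and the sample prices are hypothetical and are not taken from the study's MATLAB implementation.

```python
import numpy as np

def fill_missing_linear(values):
    """Fill NaN entries by linear interpolation between the nearest known neighbours."""
    values = np.asarray(values, dtype=float)
    idx = np.arange(len(values))        # position in the series stands in for the date
    known = ~np.isnan(values)           # mask of observed (non-missing) values
    # np.interp draws a straight line between consecutive known points;
    # outside the known range it holds the first/last known value constant.
    return np.interp(idx, idx[known], values[known])

# Made-up closing prices with three missing days
closes = [100.0, np.nan, 104.0, np.nan, np.nan, 110.0]
filled = fill_missing_linear(closes)
print(filled)
```

Gaps at the very start or end of a series are not really interpolated here, since they are filled with the nearest known value; whether that is acceptable depends on the data.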

3.3.2 Normalization

Normalization is a technique to rescale the input variables into a range such as [0, 1] or [-1, 1]. In Section 2, we discussed that some of the non-linear activation functions (such as the logistic function) have an output range of [0, 1] or [-1, 1]. Even if a linear output transfer function is used, it is advantageous to normalize the outputs as well as the inputs to avoid computational problems (Zhang et al., 1998). Moreover, the typical values of different input variables can differ significantly. So, it is necessary to perform normalization before the training process begins.

Azoff (1994) described the four methods of the data normalization which are summarized below.

1. Along Channel Normalization: A channel is defined as a set of elements in the same position over all input vectors in the training or test set. The along channel normalization is performed column by column if the input vectors are put into a matrix, i.e. it normalizes each input variable individually.

2. Across channel normalization: Across channel normalization is performed for each input vector independently, i.e. normalization is across all the elements in a data pattern.

3. Mixed channel normalization: Mixed channel normalization method uses some kind of combinations of along and across normalization.

4. External normalization: All the training data are normalized into a specific range (Zhang, 1998).

For time series forecasting problems, external normalization is the most commonly used method. Although academic researchers have also used channel normalization in forecasting time series, it can cause problems because every channel or pattern is normalized separately, and hence information may be lost from the time series. In our study, we use external normalization.

Four different methods of normalization are summarized by Zhang (1998):

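1. Linear Transformation to [0, 1]: $x_n = (x_0 - x_{min}) / (x_{max} - x_{min})$

2. Linear Transformation to [a, b]: $x_n = (b - a)(x_0 - x_{min}) / (x_{max} - x_{min}) + a$

3. Statistical Normalization: $x_n = (x_0 - \bar{x}) / s$

4. Simple Normalization: $x_n = x_0 / x_{max}$

where $x_n$ and $x_0$ represent the normalized and original data; $x_{min}$, $x_{max}$, $\bar{x}$ and $s$ are the minimum, maximum, mean, and standard deviation along the columns or rows (Zhang, 1998).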

It is still debated whether normalization is needed at all, because the weights of the neural network can perform the scaling. Shanker et al. (1996) investigated this topic and concluded that normalization is useful in terms of the classification rate and the mean square error, but that its usefulness decreases as the sample size used to train the network increases. As their results demonstrated that normalization slows down training but increases the accuracy of the result, we use normalization in our study. However, the results obtained from the network will be in the normalized range, so we need to rescale them to the original range, and the accuracy of the model is assessed on the rescaled data.
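The four transformations named above, together with the rescaling of network outputs back to the original range, can be sketched in Python as follows. The function names are hypothetical, and the formulas are the usual textbook definitions of these normalizations, assumed here rather than copied from the study's own code.

```python
import numpy as np

def linear_01(x):
    """Linear transformation to [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())

def linear_ab(x, a, b):
    """Linear transformation to [a, b]."""
    return (b - a) * (x - x.min()) / (x.max() - x.min()) + a

def statistical(x):
    """Statistical normalization: zero mean, unit standard deviation."""
    return (x - x.mean()) / x.std()

def simple(x):
    """Simple normalization: divide by the maximum."""
    return x / x.max()

def inverse_linear_ab(x_n, x_min, x_max, a, b):
    """Rescale values from [a, b] back to the original range."""
    return (x_n - a) * (x_max - x_min) / (b - a) + x_min

x = np.array([2.0, 4.0, 6.0, 10.0])           # made-up input series
scaled = linear_ab(x, -1.0, 1.0)              # range suited to a tanh-type hidden layer
restored = inverse_linear_ab(scaled, x.min(), x.max(), -1.0, 1.0)
print(scaled, restored)
```

In a forecasting setting the minimum and maximum would normally be taken from the training data only and then reused for the validation and test data, so that no information from the test period leaks into the model.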

3.3.3 Treatment of discrete data

Continuous data and ordinal discrete data (those which have a numerical meaning, such as the interest rate) can be fed into the forecasting model directly without further encoding. But categorical discrete values cannot be fed directly into the network and need to be encoded. In this study, we use 1-of-c coding. The principle of this encoding is that each discrete value of the variable has exactly one corresponding node.

For example, the day of the week (“Monday”, “Tuesday”, “Wednesday”, “Thursday”, “Friday”) can be represented by five nodes, one for each day, of which exactly one is set to 1.

If the day of the week is Thursday, it will be represented by (0,0,0,1,0).
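A minimal Python sketch of 1-of-c coding, assuming the five trading days listed above are the only categories; the names `DAYS` and `one_of_c` are illustrative.

```python
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

def one_of_c(value, categories):
    """Return a tuple with a single 1 at the position of `value`, 0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return tuple(1 if c == value else 0 for c in categories)

print(one_of_c("Thursday", DAYS))
```

Each position of the tuple corresponds to one input node of the network, so a variable with c categories adds c inputs.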

3.3.4 Moving Averages

The moving average is an indicator used in technical analysis to forecast the future outcomes of the index using the average of the actual historical data. In addition, it is one of the best-known approaches for removing noise from the data by smoothing a data series, making it easier to identify trends. Moving averages are best suited to trend identification, not to prediction. Generally, most traders use four types of moving averages: Simple Moving Average (SMA), Exponential Moving Average (EMA), Smoothed Moving Average (SMMA), and Linear Weighted Moving Average (LWMA).

The Simple Moving Average (SMA) is calculated by summing the past closing prices over the specified number of time periods and dividing by the number of periods. The formula for calculating the SMA is

$SMA = \frac{1}{n} \sum_{i=1}^{n} P_i$

where $P_i$ is the historical data in period $i$ and $n$ is the number of periods in the moving average.

The Smoothed Moving Average (SMMA) is similar to the simple moving average, but the average is updated by subtracting the previous SMMA value rather than the oldest value. The first value of the SMMA is calculated in the same way as the SMA. The second and succeeding values are calculated by the formula $SMMA_i = (SUM_{i-1} - SMMA_{i-1} + P_i) / n$, where $SUM_{i-1} = n \cdot SMMA_{i-1}$ is the previous smoothed sum (for the second value, $SUM_1$ is the total sum of closing prices for the first $n$ periods and $SMMA_1$ is the smoothed moving average of the first bar), and $P_i$ is the current closing price.

The Exponential Moving Average (EMA) uses a smoothing factor (W) to place a higher weight on the most recent value. Technicians often use the EMA to reduce the lag in the simple moving average. The formula for calculating the EMA is given below:

$EMA_t = W \cdot P_t + (1 - W) \cdot EMA_{t-1}$

where $P_t$ is the closing price in period $t$ and $W$ is given by $W = 2 / (1 + N)$, where $N$ is the number of days for which the EMA is being calculated.

The Linear Weighted Moving Average (LWMA) is a weighted average of the last $n$ periods which attaches greater weight to the most recent data and less weight to older data, i.e. the weight decreases by 1 with each previous value. For example, in a six-day linear weighted average, today's closing price is multiplied by six, yesterday's by five, and so on until the first day in the period range is reached; the weighted sum is then divided by the sum of the weights.

The problem that arises when using moving averages in the prediction of the stock market is choosing the number of periods, how many moving averages to use, and which type of moving average to use. A moving average can be of any period, from as low as 10 days to as high as 200 days. The shorter the period of the moving average (the trader's view), the more sensitive it is and the earlier it identifies new trends; the longer the period (the investor's view), the more reliable and less responsive it is, and it identifies big trends.

Some traders also use the Fibonacci numbers 5, 8, 13 and 21. Moreover, the periods vary according to the current price volatility, trendiness, and personal preferences. The EMA is closer to the actual or current market prices than the SMA, and some traders and investors prefer to work with the EMA as it identifies changes more quickly over shorter time periods and is more representative of current market prices. In addition, the SMA is generally used by traders to identify long-term changes over long periods. Because of the complex interaction of the various factors affecting moving averages, we use the 5-day and 10-day moving averages of each type of moving average.
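The four moving averages can be sketched in Python as below. This is an illustrative sketch rather than the study's implementation: the function names are hypothetical, the EMA is seeded with the first price (one common convention, assumed here), and the SMMA uses the recursive form in which the previous SMMA value is subtracted from the previous smoothed sum.

```python
import numpy as np

def sma(prices, n):
    """Simple moving average over each window of n prices."""
    prices = np.asarray(prices, dtype=float)
    return np.convolve(prices, np.ones(n) / n, mode="valid")

def lwma(prices, n):
    """Linear weighted moving average: weights 1..n, newest price weighted n."""
    prices = np.asarray(prices, dtype=float)
    weights = np.arange(1, n + 1, dtype=float)
    return np.correlate(prices, weights, mode="valid") / weights.sum()

def ema(prices, n):
    """Exponential moving average with smoothing factor W = 2 / (1 + n)."""
    w = 2.0 / (1.0 + n)
    out = [float(prices[0])]                  # assumption: seed with the first price
    for p in prices[1:]:
        out.append(w * p + (1.0 - w) * out[-1])
    return np.array(out)

def smma(prices, n):
    """Smoothed moving average: the first value is the SMA of the first n prices."""
    out = [float(np.mean(prices[:n]))]
    for p in prices[n:]:
        prev = out[-1]
        out.append((prev * n - prev + p) / n)  # (previous sum - previous SMMA + close) / n
    return np.array(out)

prices = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]       # made-up closing prices
print(sma(prices, 3), lwma(prices, 3), ema(prices, 3), smma(prices, 3))
```

For the 5-day and 10-day averages used in this study, the same functions would be called with n = 5 and n = 10.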

Momentum

Volatility

3.3.5 Feature Selection

Earlier in this section we discussed the curse of dimensionality, which can decrease the performance of the model. This problem is addressed by feature selection, which discards the variables which do not carry much information. To forecast the FTSE 100 trend and price accurately, we need to select the potential inputs so as to avoid interference in the training process, which may increase the error or decrease the explanatory ability of the model. Such interference may come from inputs which themselves have a low impact on the model, or from the interaction of two inputs which together decrease the performance of the model.

The criterion for selecting a subset from the whole set should measure the performance of that subset compared with other subsets. Ideally, this would involve training the ANN on all possible subsets. But it is not feasible to train the neural network with every possible subset of more than 200 input variables. For example, if we have 200 input variables, we would have to train the neural network $2^{200} - 1$ times (once for every non-empty subset) to find the best subset. Therefore, we use another approach used by other academic researchers in their studies.

Regression analysis is a technique that involves finding the relationship between variables to explain how the variation in the outcome or dependent variable depends upon the variation of the specified independent variables. In the past, different researchers have used different forms of regression analysis, depending upon the theory underlying their work. Moreover, regression analysis is also used by analysts to predict the stock market. Regression analysis in financial forecasting can be done in two ways: linear regression (used by Hossein Abdoh Tabrizi) and logistic regression (used by Hakan Aksoy (2004)). In this project, we use linear regression rather than logistic regression.

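In matrix form, the linear regression model can be written as

$Y = X\beta + \varepsilon$   (1)

where $Y$ is an $n \times 1$ vector of observations, $\beta$ is a $(p+1) \times 1$ vector of parameters, $X$ is an $n \times (p+1)$ matrix of observations whose first column is a column of ones, and $\varepsilon$ is an $n \times 1$ vector of independent normal random variables with $E(\varepsilon) = 0$ and $Var(\varepsilon) = \sigma^2 I$, where $I$ is the $n \times n$ identity matrix (Paul G. et al., 2008). In our case we have more than one independent variable, so we need to use multiple regression (the term was first used by Pearson, 1908). Equation (1) can then be written as

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon$

where $i = 1, 2, \dots, p$, $\beta_0$ is the intercept, and $\beta_i$ are the regression coefficients, which determine the relative significance of the corresponding independent variables when the variables are measured on a comparable scale. The sign of each coefficient determines whether the relationship is positive or negative. The overall strength of the relationship is measured by the multiple correlation coefficient $R$ (the square root of R-square), which takes a value between 0 and 1: if the value is 0, there is no linear relationship between the variables, and if the value is 1, there is a perfect linear relationship between them.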
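A multiple regression of this kind can be fitted by ordinary least squares. The sketch below uses NumPy on synthetic data; the data, the true coefficient values and the variable names are made up for illustration and have no connection to the FTSE 100 inputs used in the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic example: three candidate inputs, of which the third has no effect
X = rng.normal(size=(200, 3))
y = 1.5 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Add a column of ones so that the first coefficient is the intercept
A = np.column_stack([np.ones(len(y)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

# R-square: share of the variance of y explained by the fitted model
residuals = y - A @ beta
ss_res = float(residuals @ residuals)
ss_tot = float((y - y.mean()) @ (y - y.mean()))
r_squared = 1.0 - ss_res / ss_tot

print(beta, r_squared)
```

The estimated coefficient of the third input should be close to zero, which is the kind of evidence a selection procedure uses to discard a variable.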

Basically, there are five methods of introducing variables into a regression model: simultaneous entry, stepwise entry, forward selection, backward deletion, and hierarchical regression (xxx). Most academic studies have used stepwise regression for factor selection and have obtained good results with it. This study does not claim that other methods are unsuitable for factor selection in the forecasting of financial time series, and this thesis does not wish to enter into the argument of whether stepwise regression is the best method. Based on the results of previous studies (Pei et al. (1996)), it concentrates on stepwise regression for factor selection in the development of the financial models.

Stepwise regression

In stepwise regression, we start from all the potential explanatory variables, sort them, and keep the more influential variables in the model. In this operation we add variables in the forward direction or remove them in the backward direction to find the best-fitting combination for analyzing the index. The criterion used in this study to add or remove a potential variable is a sequence of F-tests on the decrease in the sum of squared errors. Other criteria such as t-tests, adjusted R-square, the Akaike information criterion, Mallows' Cp, or the false discovery rate can also be used. But F-tests are widely used by academic researchers, and we are not aware of an academic study which has compared the performance of all of these criteria. So, this study does not claim that the other criteria are unsuitable. Based on the results of previous studies (Pei et al. (1996)), it concentrates on using F-tests as the criterion.

In stepwise regression, we increase the number of variables step by step after the first variable has entered the model. Once a variable has been removed from the model, it cannot be entered again. The critical values Fe (F-to-enter) and Fo (F-to-remove) and the level of significance have to be determined before variables are selected and inserted. Then, at each step, we compare the F value of the candidate variable with Fe and Fo. If F is greater than Fe, we add the potential variable to the model. If F is less than Fo, we remove the variable from the model.
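The add and remove rules described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the procedure actually run in the study: the thresholds `f_enter` and `f_remove` are arbitrary example values, the partial F statistic is computed from the change in the residual sum of squares, and the data are synthetic.

```python
import numpy as np

def sse(X, y, cols):
    """Residual sum of squares of a least-squares fit on `cols` plus an intercept."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return float(r @ r)

def stepwise(X, y, f_enter=4.0, f_remove=3.9):
    n, p = X.shape
    selected, removed = [], set()
    while True:
        changed = False
        # Forward step: add the candidate with the largest partial F, if above Fe
        sse_cur = sse(X, y, selected)
        best_f, best_c = 0.0, None
        for c in range(p):
            if c in selected or c in removed:   # removed variables never re-enter
                continue
            sse_new = sse(X, y, selected + [c])
            f = (sse_cur - sse_new) / (sse_new / (n - len(selected) - 2))
            if f > best_f:
                best_f, best_c = f, c
        if best_c is not None and best_f > f_enter:
            selected.append(best_c)
            changed = True
        # Backward step: drop the variable with the smallest partial F, if below Fo
        if len(selected) > 1:
            sse_full = sse(X, y, selected)
            df = n - len(selected) - 1
            f_vals = [(sse(X, y, [s for s in selected if s != c]) - sse_full)
                      / (sse_full / df) for c in selected]
            worst = int(np.argmin(f_vals))
            if f_vals[worst] < f_remove:
                removed.add(selected.pop(worst))
                changed = True
        if not changed:
            return selected

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))                   # five made-up candidate inputs
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=200)
chosen = stepwise(X, y)
print(sorted(chosen))
```

The two inputs that actually drive `y` (columns 0 and 2) are selected; an irrelevant column may occasionally enter by chance, which is why the choice of Fe and Fo matters.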

3.4 Data Partitioning

3.4.1 Partitioning of the Data

Any study that uses an ANN for forecasting must separate the data into sets used for training, validation and testing. Refenes et al. (1993) described that the relationship between security prices (the index) and the variables that affect them changes with time. That is why we partition the data vertically instead of horizontally. Although horizontal partitioning can give better accuracy, it is unrealistic. A vertical partition divides the data set into three partitions: one for training, one for validation and one for testing.

There are no standard theories or approaches that determine the ratio in which the data should be split between training and testing; it depends upon the data and the methodology. There is a wide variety of competing guidelines from academic researchers on the split ratio. Ruggiero (1997) and Kim and Lee (2004) suggested an 80:20 split, Kaufman (1998) suggested a 70:30 split, and Gatley suggested a 90:10 split. These studies do not include a validation set, but a validation set should be chosen so that it strikes a balance between obtaining enough data to evaluate the model and having enough observations for both training and validation (Dimitri Pissarenko, 2001-02). Moreover, the validation set should not be included in the training set. In essence, the main aim should be to capture as much of the variety of the stock market as possible with the training set, while keeping the testing window as small as possible. As this study covers a period of 6 years, it is reasonable to split the remaining data 80:10:10 into training, validation and testing data and then evaluate the model by forecasting the next-day value of the FTSE 100. This split provides a reasonable compromise between the guidelines above.
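An 80:10:10 split can be sketched in Python as below. The sketch assumes that the observations are ordered by date and that the split preserves this order, so that the validation and test sets always lie after the training set in time; this reading of the partitioning, and the function name, are assumptions made for illustration.

```python
def chronological_split(series, train=0.8, val=0.1):
    """Split an ordered series into training, validation and test parts."""
    n = len(series)
    i = round(n * train)              # end of the training part
    j = round(n * (train + val))      # end of the validation part
    return series[:i], series[i:j], series[j:]

observations = list(range(100))       # stand-in for 100 daily observations
train_set, val_set, test_set = chronological_split(observations)
print(len(train_set), len(val_set), len(test_set))
```

No shuffling is applied, so the three parts never overlap and the test set always contains the most recent observations.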

3.4.2 Sliding Moving Window Approach

Most studies that use ANNs for time series forecasting use a fixed window approach. The data is usually divided into three sets: training data, validation data and test data. The neural network is trained and developed using the training and validation data and is tested on the test data. As mentioned in the Data Partitioning section, different academic researchers divide the data differently. In addition, the size of each set remains constant after the data has been split. But this approach is not well suited to financial time series, as the characteristics of the time series keep changing with time, so a forecasting model built with this approach will give poorer results [Weighted]. The fixed window approach is useful when the time series is relatively stationary and the parameters that affect it are constant. But even if a financial time series appears stable, we cannot rely on a fixed window, because the parameters and the noise may keep changing with time. So, we need to include new observations as time passes. Normally two approaches are used by academic researchers in the case of financial data: the rolling window and the moving window approach.

In the rolling window approach, the sample size increases with time, as new data keeps being added. In the moving window approach the sample size remains constant: when we add a new observation, we delete the oldest observation. The rolling window approach suffers from the same disadvantage as the fixed window approach when the characteristics of the time series keep changing. So, we use the moving window approach, although it has the disadvantage that it is difficult to find the right window size. We find the best window size by experiment in our implementation. In this section, we explore the advantages and the algorithm of the moving or sliding window approach and its use by various academic researchers to improve the performance of forecasting models.

The moving window procedure starts at the left end of the time series, i.e. at its first data point, and moves to the right through the time series according to the window size. Normally, the window is advanced by N days after each round of training, validation and testing, where N is the number of observations in each test period.

In the window approach, the window is composed of two sub-windows which are cascaded together:

where,

 is a  matrix , which contains the set of past input vectors as the window in this study.

 is a  matrix, which contains the set of new input data vectors

But in this case, we modified the moving window approach by taking N =1. So  becomes

If we define the input matrix  as then for given (unitary separator), the corresponding output can be written in a matrix , which can be written as

As mentioned earlier in this section, the problem with the moving window approach is to ascertain the window size. If the window size is too small, it will not be able to represent the trend of the time series and will give poorer results. But if the window size is too large, it will increase the computational complexity of the network. Mayhew (2002) uses a window of 1000 daily observations to show evidence of decreasing autocorrelation for the US stock market between the early 1970s and 2000. Some other academic researchers use a small window size. How we address this problem in our study is explained in chapter 4.

From the viewpoint of this thesis, the discussion and the works above indicate that the moving window approach should be used for forecasting the stock exchange. As such, we use the moving window approach for this project.
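The difference between the moving window (fixed size, oldest observation dropped) and the rolling window (growing size) can be sketched in Python with index pairs. The function names and the small sizes are hypothetical; a step of 1 corresponds to the choice N = 1 described above.

```python
def moving_windows(n_obs, window, step=1):
    """Yield (start, end) index pairs of a fixed-size window sliding to the right."""
    start = 0
    while start + window <= n_obs:
        yield start, start + window
        start += step

def rolling_windows(n_obs, window, step=1):
    """Yield (start, end) index pairs of a window that grows as new data arrives."""
    end = window
    while end <= n_obs:
        yield 0, end
        end += step

moving = list(moving_windows(6, 4))
rolling = list(rolling_windows(6, 4))
print(moving)
print(rolling)
```

Each pair marks the slice of the series that would be used for one round of training, validation and testing before the window is advanced.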

3.5 Neural Network Design

In an earlier section, we discussed the theoretical benefits of using ANNs over other statistical models in forecasting problems: they are non-linear, good at detecting non-linearities, and universal function approximators. In the broadest sense, the main requirements for any successful ANN forecasting model are convergence, the ability of the model to perform with new information (generalization), and stability of the network output. To ensure these points in the design of the forecasting model, a large number of issues need to be taken into account: the size and frequency of the data, the architecture of the network, the activation function, and the number of hidden neurons, amongst others. Although some rules for designing ANNs exist, there is no evidence that these rules work for every forecasting problem. Therefore, designing an ANN can be a challenging task. We chose to follow a systematic theoretical approach in this section to find rules for the optimal ANN design for this problem by discussing the studies of previous academic researchers.

3.5.1 Hidden layers and Number of neurons

There are no standard rules or generally accepted theories which determine the optimal number of hidden layers and the number of neurons in each hidden layer. Basically, they depend upon the number of inputs and outputs, the complexity and design of the problem, the amount of noise, the training algorithm and the number of training cases. Researchers have proposed several approaches or general rules of thumb for determining the number of hidden layers and neurons when designing an ANN, although for better generalization, smaller networks are usually preferred.

Shih (1994) suggested that the number of neurons and hidden layers can be inferred by constructing nets that have a pyramidal topology. Azoff (1994) suggested that finding the optimum value is problem dependent and a matter of experimentation. As another alternative, some researchers, for example Kim & Lee (2004) and Versace et al. (2005), suggested using genetic algorithms for input selection and processing. Table 1 shows the approaches of other researchers for determining the number of neurons, as retrieved from the literature.

In this study, we use the approach described by Tan (2001) with some modification. Tan's approach is to start training the network with one hidden layer containing the square root of N hidden nodes, where N is the number of inputs, and to increase the number of neurons gradually until the most accurate network is found. We alter this approach because it may not lead to the most accurate result, as Tan keeps the number of hidden neurons in each layer constant. So, we also train and test the network after increasing and after decreasing the number of neurons in each layer by one. Whichever direction gives the superior result in terms of the metric being used, we continue in that direction, increasing or decreasing the number of neurons until the next network is inferior to the previous one. After the optimum number of neurons with one hidden layer has been found, we increase the hidden nodes by 1 and repeat the training and in-sample testing process. If the new network is superior to the previous network in terms of the metric being used, we repeat the whole process of increasing the number of hidden nodes until the next network shows inferior performance compared with the previous network.
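The search over the number of neurons in one hidden layer can be sketched in Python as follows. This is an illustrative sketch only: `score` is a hypothetical callable that would train a network with the given number of hidden neurons and return a metric where higher is better, and the toy score used here simply peaks at 7 neurons instead of training anything.

```python
import math

def search_hidden_neurons(n_inputs, score):
    """Start at sqrt(n_inputs), probe one step in both directions, then keep
    moving in the better direction until the next network is inferior."""
    start = max(1, round(math.sqrt(n_inputs)))
    best_n, best_score = start, score(start)

    # Probe one neuron fewer and one neuron more
    neighbours = [(score(n), n) for n in (start - 1, start + 1) if n >= 1]
    top_score, top_n = max(neighbours)
    if top_score <= best_score:
        return best_n                       # the starting size is already the best

    direction = 1 if top_n > start else -1
    best_n, best_score = top_n, top_score
    while best_n + direction >= 1:
        candidate = score(best_n + direction)
        if candidate <= best_score:         # next network is inferior: stop
            break
        best_n, best_score = best_n + direction, candidate
    return best_n

def toy_score(n):
    """Stand-in for a validation metric; real use would train and test a network."""
    return -(n - 7) ** 2

print(search_hidden_neurons(16, toy_score), search_hidden_neurons(100, toy_score))
```

Because each call to `score` would involve a full training run in practice, this kind of local search keeps the number of trained networks small, at the cost of possibly stopping at a local optimum.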

Normally, the number of neurons should be somewhat higher for complex problems related to decision areas, and if it is too low the network may under-fit. The Tan (2001) approach preserves the generalization capabilities of the ANN: it is observed that the training error, validation error and generalization error usually decrease when the training process starts, while increasing the network size too far is risky, as the network can start to fit noise, which results in over-fitting.

3.5.2 Activation or Transfer Functions

Transfer functions, also called activation functions, are mathematical expressions that determine the relationship between the input and output of a network (model) or node. In general, any differentiable function can be used as an activation function, but only a small number of them are used in practice. The table shows some of the activation functions used in implementing networks. Linear transfer functions are not appropriate for the financial market. This statement is supported by the studies of R. M. Levich and L. R. Thomas [1993] and Kao and Ma [1992], who concluded that non-linear transfer functions are more appropriate for the financial market because financial markets are non-linear and have memory. In this section, we explore various possible non-linear transfer functions and their use by various academic researchers to improve the performance of forecasting models.

There are some heuristic rules which researchers use for the selection of transfer functions. Generally, transfer functions such as the sigmoid are commonly used for financial time series because they are non-linear and continuously differentiable. Klimasauskas (1993) stated that the sigmoid transfer function should be used for classification problems in which the network has to learn average behavior, while the hyperbolic tangent works best for problems which involve learning about deviations from the average. However, this statement does not address the effect of different transfer functions on the performance of the forecasting model, which is explained later in the section.

Generally, a network may use different transfer functions at different layers and nodes, but the majority of networks use the same transfer function for all nodes in the same layer. Most studies use a sigmoid transfer function for the hidden nodes. A sigmoid layer can use one of two transfer functions: the tan sigmoid (tansig) or the log sigmoid (logsig) function. The logsig function gives output in the range 0 to 1 and the tansig function gives output in the range -1 to 1, both taking input over a wide range. Pissarenko (2000) noted a problem with sigmoid functions: the gradient changes very little at the extremes, which causes the outputs to change very little even for inputs which are quite different in value. Networks with multiple sigmoid layers suffer from the same problem. So, it is necessary to restrict the number of sigmoid layers and, if necessary, increase the number of linear layers. Moreover, a transfer function with a smaller output range performs rather poorly compared with a transfer function with a wider range. This was verified experimentally by Pissarenko (2000), who found that the logsig transfer function gives poorer results than the tansig transfer function, which has double the output range.
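The two sigmoid transfer functions and the flattening of their gradients at the extremes can be illustrated with a short Python sketch. The function names mirror the logsig and tansig names used in the text; the implementation is the standard logistic function and hyperbolic tangent, written here with NumPy for illustration.

```python
import numpy as np

def logsig(x):
    """Log-sigmoid: output in the range 0 to 1."""
    return 1.0 / (1.0 + np.exp(-x))

def tansig(x):
    """Tan-sigmoid (hyperbolic tangent): output in the range -1 to 1."""
    return np.tanh(x)

def logsig_grad(x):
    """Derivative of the log-sigmoid."""
    s = logsig(x)
    return s * (1.0 - s)

def tansig_grad(x):
    """Derivative of the tan-sigmoid."""
    return 1.0 - np.tanh(x) ** 2

for x in (0.0, 2.0, 10.0):
    print(x, logsig(x), tansig(x), logsig_grad(x), tansig_grad(x))
```

At x = 0 the gradients are at their largest (0.25 for logsig and 1 for tansig), while at x = 10 both are close to zero, which is the saturation effect described above.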

There has also been academic research investigating the relative performance of different types of linear and non-linear transfer functions on the output nodes. Various academic researchers (Chakraborty et al. (1992)) have used the logistic activation function for all hidden and output nodes. Zhang, X., Hutchinson, J. (1993) use hyperbolic tangent transfer functions in both the hidden and output layers. Since the actual output from such a network normally has a value in the range [0 1] or [-1 1], the target values need to be normalized when non-linear activation functions are used in the output layer. Schoenburg, E. (1990) uses mixed logistic and sine hidden nodes and a logistic output node. Rumelhart et al. (1995) gave theoretical evidence for using linear activation functions for output nodes and showed the benefit of linear output nodes for forecasting problems with a probabilistic model. Many academic studies have used linear transfer functions in the output node for forecasting problems. Conventionally, problems which involve forecasting a continuous value or time series should use a linear activation function for the output nodes, while classification problems should use logistic activation functions.

From the viewpoint of this thesis, the discussion and the works above indicate that the tansig and linear transfer functions should be used for the hidden and output layers, respectively, when forecasting the stock exchange. As such, we use tansig and linear as the transfer functions for the hidden and output layers in this project.

3.5.3 Number of Output Neurons

Deciding the number of output neurons is straightforward, as many studies use only one output neuron. Dimitri Pissarenko (2001-02) stated that if the outputs are widely spaced, a network with multiple outputs will produce inferior results. Moreover, an ANN is trained by selecting weights such that the average error over all output neurons is minimized. As we are forecasting only one value, we use one output neuron, i.e. the closing value of the FTSE 100 on the next day.

3.5.4 Training Algorithms

Training refers to the process by which the weights (parameters) of the ANN are driven towards their optimal values. During training, the weights are modified iteratively to minimize the error between the network output and the actual value. Many training algorithms are used extensively in academic work, but no algorithm is guaranteed to be optimal for non-linear optimization problems. All training algorithms suffer from the problem of local optima. As such, no true global solution is guaranteed, but we can use the training algorithms that are widely used by academic researchers for forecasting problems. In this section, we discuss the learning algorithms for the different ANNs and how researchers have used them to improve the performance of forecasting models.

The steepest (gradient) descent algorithm (Section 2) is used extensively as the training method for back-propagation, but it suffers from slow convergence, inefficiency and a lack of robustness. Figure 6 illustrates the problem by analogy: a ball has to be thrown from position X to position Y. Applying too much force (a learning rate that is too high) causes the ball to oscillate between X and Y, or it may never return to X. Applying too little force (a learning rate that is too low) means the ball does not escape from point A, so the learning process does not improve.

This problem is addressed by adding a momentum parameter, which allows a larger learning rate to be used while damping the tendency to oscillate, and so speeds up training. The modified back-propagation training rule is:

$$\Delta w_{ij}(n+1) = \eta\,\delta_j\,o_i + \alpha\,\Delta w_{ij}(n)$$

where $\eta$ is the learning rate, $\alpha$ is the momentum term, $\Delta w_{ij}(n)$ is the change of weight at learning epoch $n$, $\delta_j$ is the error signal of neuron j, and $o_i$ is the ith input to neuron j (Kaastra and Boyd [1996]).

The problem now is to choose the learning rate and the momentum simultaneously. There are no standard values; the best values are chosen by experimentation. Both can take values between 0 and 1. In this study we follow Sharda and Patil (1992) and try the nine combinations of three learning rates (0.1, 0.5, 0.9) and three momentum values (0.1, 0.5, 0.9). Tang et al. (1991) stated that a high learning rate is good for less complex data, and that a low learning rate with high momentum should be used for more complex data series. This algorithm is used extensively by academic researchers with feed-forward and time delay neural networks for forecasting problems.
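The nine-combination experiment can be sketched as follows. This is only an illustration under stated assumptions: the one-weight quadratic error surface E(w) = 2(w - 1)^2 is a toy stand-in for the network error, and the function name train and the 50-epoch budget are hypothetical choices, not taken from the thesis.

```python
from itertools import product

def train(lr, momentum, epochs=50, w0=0.0):
    """Gradient descent with momentum on the toy error E(w) = 2 * (w - 1)**2."""
    w, delta = w0, 0.0
    for _ in range(epochs):
        grad = 4.0 * (w - 1.0)                  # dE/dw
        delta = -lr * grad + momentum * delta   # momentum update rule
        w += delta
    return 2.0 * (w - 1.0) ** 2                 # final error

rates = (0.1, 0.5, 0.9)
# Final error for each of the nine (learning rate, momentum) combinations.
results = {(lr, m): train(lr, m) for lr, m in product(rates, rates)}
best = min(results, key=results.get)

for combo in sorted(results):
    print(combo, results[combo])
print("best combination:", best)
```

On this toy surface some combinations converge quickly, some oscillate and some diverge, which is why the values have to be chosen by experiment rather than fixed in advance.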

However, many algorithms that improve on gradient descent have been proposed, such as quasi-Newton methods (e.g. BFGS), Levenberg-Marquardt and conjugate gradient methods. Levenberg-Marquardt has been used by academic researchers in time series forecasting because of its faster convergence, robustness, and ability to find good local minima. De Groot and Wurtz (1991) stated that using Levenberg-Marquardt gives a significant improvement in training time and accuracy for time series forecasting.

Levenberg-Marquardt belongs to the class of non-linear least squares algorithms. It is considered to be the most efficient algorithm for training purposes and is often the fastest back-propagation algorithm in MATLAB. Its only disadvantage is that it requires a lot of memory. It works by repeatedly replacing the non-linear model with a linear approximation. The algorithm optimizes the estimation function by combining an objective log-likelihood function, a conditional least squares estimation, a modified Gauss-Newton method of iterative linearization, a steepest directional supervisor and a step-wise governor to enhance efficiency. [asd]

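The performance function, in sum-of-squares form, is

$$F(\mathbf{w}) = \sum_{p=1}^{P}\sum_{k=1}^{K}\left(d_{kp} - o_{kp}\right)^{2} \qquad (1)$$

where $d_{kp}$ and $o_{kp}$ are the desired and actual values of output $k$ for training pattern $p$, and the weights of the network are given by

$$\mathbf{w} = [w_1, w_2, \ldots, w_N]^{T}$$

The sum of squared error function is given by

$$F(\mathbf{w}) = \mathbf{e}^{T}\mathbf{e} \qquad (2)$$

where $\mathbf{e}$ is the vector of all errors $e_{kp} = d_{kp} - o_{kp}$. Using the recurrence formula, Newton's method for minimizing the objective function is

$$\mathbf{w}_{t+1} = \mathbf{w}_{t} - \mathbf{H}_{t}^{-1}\,\mathbf{g}_{t}$$

Using equation (1), with $\mathbf{g}$ the gradient of $F$, the Hessian matrix can be approximated as

$$\mathbf{H} \approx 2\,\mathbf{J}^{T}\mathbf{J}$$

The gradient can be computed using equation (2):

$$\mathbf{g} = 2\,\mathbf{J}^{T}\mathbf{e}$$

where $\mathbf{J}$ is the Jacobian matrix of $\mathbf{e}$ with respect to $\mathbf{w}$. The update requires $\mathbf{J}^{T}\mathbf{J}$ to be positive definite; if it is not, the equation has to be modified to make it so. Thus,

$$\mathbf{w}_{t+1} = \mathbf{w}_{t} - \left(\mathbf{J}_{t}^{T}\mathbf{J}_{t} + \mu\,\mathbf{I}\right)^{-1}\mathbf{J}_{t}^{T}\,\mathbf{e}_{t}$$

where $\mu$ is the learning parameter, which ensures that $\mathbf{J}^{T}\mathbf{J} + \mu\mathbf{I}$ is positive definite (the factor of 2 cancels between the gradient and the Hessian). Initially the learning parameter is set large, and it is decreased as the iterative process approaches a minimum. These equations show that Levenberg-Marquardt involves the inversion of a square matrix. The large memory requirement arises because the Jacobian matrix and the approximate Hessian matrix must be stored, and the approximate Hessian, of order $N \times N$, must be inverted in each iteration [Syed Muhammad Aqil Burney et al. (2005)].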
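The update rule can be made concrete with a minimal sketch. The example below is an assumption-laden toy, not MATLAB's trainlm: it fits a single tansig neuron with two parameters (w, b) to noise-free data, so that the N x N system reduces to a 2 x 2 system solved directly. The starting values, the initial learning parameter and the factor-of-ten adjustment are all hypothetical choices.

```python
import math

def sse(params, xs, ys):
    """Sum of squared errors of the neuron tanh(w*x + b) on the data."""
    w, b = params
    return sum((math.tanh(w * x + b) - y) ** 2 for x, y in zip(xs, ys))

def errors_and_jacobian(params, xs, ys):
    """Errors e = output - target and the rows (de/dw, de/db) of the Jacobian."""
    w, b = params
    e, jac = [], []
    for x, y in zip(xs, ys):
        out = math.tanh(w * x + b)
        slope = 1.0 - out * out
        e.append(out - y)
        jac.append((slope * x, slope))
    return e, jac

def levenberg_marquardt(xs, ys, params, mu=1.0, iterations=100):
    current = sse(params, xs, ys)
    for _ in range(iterations):
        if current < 1e-20:
            break
        e, jac = errors_and_jacobian(params, xs, ys)
        # Entries of J^T J + mu * I and of J^T e.
        a11 = sum(r[0] * r[0] for r in jac) + mu
        a12 = sum(r[0] * r[1] for r in jac)
        a22 = sum(r[1] * r[1] for r in jac) + mu
        g1 = sum(r[0] * ei for r, ei in zip(jac, e))
        g2 = sum(r[1] * ei for r, ei in zip(jac, e))
        det = a11 * a22 - a12 * a12
        dw = (a22 * g1 - a12 * g2) / det
        db = (a11 * g2 - a12 * g1) / det
        trial = (params[0] - dw, params[1] - db)
        trial_sse = sse(trial, xs, ys)
        if trial_sse < current:
            # Step accepted: move and reduce mu (closer to Gauss-Newton).
            params, current = trial, trial_sse
            mu = max(mu / 10.0, 1e-12)
        else:
            # Step rejected: increase mu (closer to small gradient steps).
            mu *= 10.0
    return params, current

xs = [-2.0 + 0.5 * i for i in range(9)]
ys = [math.tanh(1.5 * x - 0.5) for x in xs]   # targets from a known neuron
start = (0.1, 0.0)
fitted, final_sse = levenberg_marquardt(xs, ys, start)
print(fitted, final_sse)
```

Even in this two-parameter case, the matrix built from the Jacobian has to be formed and inverted on every iteration, which is the source of the memory cost noted above for networks with many weights.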

From the viewpoint of this thesis, the discussion and the works above indicate that gradient descent with momentum back-propagation should be used for the feed-forward network and that Levenberg-Marquardt should be used for all ANNs. In this study, we also compare the performance of Levenberg-Marquardt with that of the gradient descent algorithm by constructing the FNN with both learning algorithms.

3.6 Training the ANN

An ANN is normally trained so that it learns the patterns in the historical data: it is iteratively presented with a set of examples together with their known correct answers. As mentioned earlier, the main aim of training is to find the set of weights between the neurons that gives the global minimum of the error function. This set of weights should provide good generalization unless the ANN is over-fitted (Dimitri Pissarenko (2001-02)). The most important issue in training is the number of iterations.

Number of iterations

Dimitri Pissarenko (2001-02) stated that there are two schools of thought on when training should be stopped. The first holds that the researcher should stop training when there is no further improvement in the error function; this point is called convergence. The second holds that training should be stopped after a predetermined number of iterations, the network's ability evaluated, and training then resumed.

The second school of thought has been criticized on the grounds that additional train-test interruptions can cause the error to fall further instead of increasing, and that there is no way to know whether additional training could improve the generalization ability of the network. The two schools differ on the notion of over-training versus over-fitting. The first says that there is no such thing as over-training; only over-fitting exists, and over-fitting can be solved by reducing the number of hidden neurons. Both have advantages and disadvantages, and we do not discuss them in detail, as the main aim of this section is to find the optimal method. Because our computational resources are very limited, we try a different approach that saves both time and computation. In this study, we plot the sum of squared errors for each iteration and stop at the point where the improvement becomes negligible. Using this approach, the researcher can choose the maximum number of iterations from the point in the graph where the sum of squared errors stops decreasing and flattens. This method addresses the problem of over-training.
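The rule of stopping where the error curve flattens can be sketched in code. In the thesis the point is read off a plot; the numeric version below is an assumption, and the function name stopping_epoch together with the tolerance and patience parameters are hypothetical.

```python
def stopping_epoch(sse_history, tolerance=1e-3, patience=3):
    """Return the index of the last epoch that still improved the SSE.

    An epoch counts as flat when its relative improvement over the previous
    epoch is below `tolerance`; training is considered flattened after
    `patience` flat epochs in a row.
    """
    flat = 0
    for epoch in range(1, len(sse_history)):
        previous, current = sse_history[epoch - 1], sse_history[epoch]
        improvement = (previous - current) / previous if previous > 0 else 0.0
        flat = flat + 1 if improvement < tolerance else 0
        if flat >= patience:
            return epoch - patience
    return len(sse_history) - 1   # the curve never flattened

history = [100.0, 50.0, 25.0, 12.5, 12.49, 12.489, 12.4889, 12.48889]
print(stopping_epoch(history))
```

For the example history the function returns epoch 3, the last point before the curve goes flat, which would then be used as the maximum number of iterations.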

3.7 Evaluation Metrics

Forecasting models are typically evaluated and compared using evaluation metrics by academic researchers, traders and investors. The first step in designing a forecasting model is to choose how to measure the performance of each design step and of the overall system. There is a set of commonly used metrics for comparing the forecasting ability of each model, each designed for a particular type of approach. The forecasting models can also be compared using various statistical methods, which are used in this thesis. The evaluation metrics, their importance and their significance are explained in this section.

The first evaluation metric is the Mean Square Error (MSE), which is commonly used to measure the performance of the overall system in terms of the amount by which the predicted value differs from the actual value. The formula for the MSE is given by

$$\mathrm{MSE} = \frac{1}{N}\sum_{t=1}^{N}\left(P_t - A_t\right)^{2}$$

where $N$ is the number of predictions, $P_t$ is the predicted value for time t, and $A_t$ is the actual value at time t.

The second evaluation criterion is the Absolute Mean Error (AME), which measures the average error of each prediction made by the forecasting model:

$$\mathrm{AME} = \frac{1}{N}\sum_{t=1}^{N}\left|P_t - A_t\right|$$

The third evaluation metric is the Mean Absolute Percentage Error (MAPE), which is similar to the AME except that the error made by the model is measured as a percentage:

$$\mathrm{MAPE} = \frac{100}{N}\sum_{t=1}^{N}\left|\frac{P_t - A_t}{A_t}\right|$$
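The three magnitude-based metrics can be computed as in the following sketch, which uses the standard definitions of MSE, mean absolute error and mean absolute percentage error. The function names and the small example series are illustrative assumptions.

```python
def mse(actual, predicted):
    """Mean square error between actual and predicted values."""
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def ame(actual, predicted):
    """Absolute mean error: the average absolute prediction error."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((p - a) / a) for a, p in zip(actual, predicted)) / len(actual)

actual = [100.0, 102.0, 101.0, 105.0]
predicted = [101.0, 101.0, 103.0, 104.0]
print(mse(actual, predicted), ame(actual, predicted), mape(actual, predicted))
```

Note that all three functions ignore the sign of the error, which is why they cannot tell whether the direction of the move was predicted correctly.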

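These three criteria compare the forecasting ability of the models in terms of the magnitude of the error, but they say nothing about how accurately a model predicts the direction of change and the turning points. A forecast of direction is correct when the predicted change and the actual change have the same sign:

$$\Delta A_t \cdot \Delta P_t > 0$$

where $\Delta A_t$ and $\Delta P_t$ are the changes in the actual and the predicted value between time $t-1$ and time $t$.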

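The other metric, used to compare the accuracy of predicting turning points, is obtained from the evaluation method developed by Cumby and Modest [111], which is a version of the Merton test [112].

The Merton test is defined as follows:

$$M_t = \begin{cases} 1 & \text{if } \Delta A_t > 0 \\ 0 & \text{if } \Delta A_t \le 0 \end{cases} \qquad N_t = \begin{cases} 1 & \text{if } \Delta P_t > 0 \\ 0 & \text{if } \Delta P_t \le 0 \end{cases} \qquad (1)$$

where $\Delta A_t$ is the amount of change in the actual variable between time $t-1$ and $t$, and $\Delta P_t$ is the amount of change in the predicted value for the same period [113].

The conditional probability matrix for the accuracy of the system in predicting turning points is as follows:

$$p_1 = \Pr[N_t = 1 \mid M_t = 1], \qquad 1 - p_1 = \Pr[N_t = 0 \mid M_t = 1]$$

$$p_2 = \Pr[N_t = 0 \mid M_t = 0], \qquad 1 - p_2 = \Pr[N_t = 1 \mid M_t = 0]$$

where $p_1$ and $p_2$ are the probabilities that the forecasting model correctly predicts turning points in the upward and downward direction, respectively.

The probability that the forecasting model predicts the overall direction correctly is given by

$$p = \Pr[N_t = M_t]$$

So $p_1$, $p_2$ and $p$ are further evaluation metrics used in this study.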
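The direction-based metrics can be estimated from data as in the sketch below. It assumes that a change greater than zero counts as an upward move and anything else as a downward move, and that the inputs are already series of changes; the function name direction_metrics and the example values are hypothetical.

```python
def direction_metrics(actual_changes, predicted_changes):
    """Estimate upward, downward and overall direction accuracy.

    Returns (p1, p2, p): the share of actual upward moves predicted as
    upward, the share of actual downward moves predicted as downward,
    and the overall share of correctly predicted directions.
    """
    up_total = up_hits = down_total = down_hits = 0
    for actual, predicted in zip(actual_changes, predicted_changes):
        predicted_up = predicted > 0
        if actual > 0:
            up_total += 1
            up_hits += predicted_up
        else:
            down_total += 1
            down_hits += not predicted_up
    p1 = up_hits / up_total if up_total else float("nan")
    p2 = down_hits / down_total if down_total else float("nan")
    p = (up_hits + down_hits) / (up_total + down_total)
    return p1, p2, p

actual_changes = [1.0, -2.0, 3.0, -1.0, 2.0, -3.0]
predicted_changes = [2.0, -1.0, -1.0, -2.0, 1.0, 1.0]
p1, p2, p = direction_metrics(actual_changes, predicted_changes)
print(p1, p2, p, p1 + p2 > 1.0)
```

In this example two of the three upward moves and two of the three downward moves are predicted correctly, so the two conditional accuracies sum to more than one.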

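Furthermore, Merton stated that a necessary condition for market timing ability (on average, the model should predict correctly more than half of the time) is

$$p_1 + p_2 > 1$$

So the hypothesis to be tested is

$$H_0: p_1 + p_2 = 1 \quad \text{against} \quad H_1: p_1 + p_2 > 1$$

Cumby and Modest [111] stated that the Merton hypothesis can be tested through the regression equation [113]:

$$N_t = \alpha_0 + \alpha_1 M_t + \varepsilon_t$$

where $M_t$ and $N_t$ are the indicators of an upward move in the actual and the predicted value defined in equation (1), $\varepsilon_t$ is the error term, and $\alpha_1$ is:

$$\alpha_1 = p_1 + p_2 - 1$$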

For evaluating the rate of return of the trading strategy , we use the following formula

where TC for the first strategy ,

where TC for the second strategy ,

The terms M, ,S,,P, are discussed in Section 4.

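The regression analysis is evaluated by the root mean square error (RMSE),

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(P_t - A_t\right)^{2}}$$

where $N$ is the number of observations, $P_t$ is the fitted value and $A_t$ is the actual value at time t.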

[Figure: flowchart of the methodology. Steps: Choice of performance or evaluation metrics; Variable Selection; Data Collection; Data Pre-processing; Neural Network Design; Data Partitioning; Training ANN; Evaluation of Individual Models; Linear Combined Neural Networks]

3.8 Linear Combined Neural Network (LCNN)

The LCNN is employed to merge the forecasts of the individual forecasting models into the final prediction of the index. We build four LCNN models: LCNN1, LCNN2, LCNN3 and LCNN4, which merge the outputs of the best two, three, four and five individual models, respectively. The figure shows the LCNN4 network.

3.9 Weight Combined Neural Network (WCNN)

The WCNN is employed to merge the forecasts of the individual forecasting models, according to their weights, into the final prediction of the index. We build four WCNN models: WCNN1, WCNN2, WCNN3 and WCNN4, which merge the outputs of the best two, three, four and five individual models, respectively. The weights are assigned according to the accuracy of each model.
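The idea of weighting each model by its accuracy can be sketched as follows. The exact weight formulas of this study are not reproduced here; the sketch assumes, purely for illustration, weights proportional to direction accuracy and, for value accuracy, weights proportional to the inverse of the error. All function names and numbers are hypothetical.

```python
def accuracy_weights(accuracies):
    """Weights proportional to each model's accuracy, normalized to sum to 1."""
    total = sum(accuracies)
    return [a / total for a in accuracies]

def inverse_error_weights(errors):
    """Weights proportional to 1/error, so a smaller error gives a larger weight."""
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    return [v / total for v in inverse]

def combine(forecasts, weights):
    """Weighted combination of the individual model forecasts."""
    return sum(w * f for w, f in zip(weights, forecasts))

direction_accuracy = [0.60, 0.55, 0.50]   # three individual models
forecasts = [5200.0, 5150.0, 5300.0]      # their next-day index forecasts
weights = accuracy_weights(direction_accuracy)
combined = combine(forecasts, weights)
print(weights, combined)
```

Because the weights are normalized, the combined forecast always lies between the smallest and the largest individual forecast.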

For the case of the direction accuracy, the weights are determined by

For the case of the value accuracy, the weights are determined by

3.10 Mixed Combined Neural Network (MCNN)

This is a new approach developed in this study to improve the forecasting accuracy of the model. The MCNN merges the forecasts of the individual forecasting models with the original inputs and retrains the best individual forecasting model on the combined set. We build four MCNN models: MCNN1, MCNN2, MCNN3 and MCNN4, which merge the outputs of the best two, three, four and five individual models, respectively, with the inputs.

We assume that the forecasting performance of this model will improve on that of the individual models, because we assume that the final output of the model has some linear relationship with the forecasts of the individual models. This assumption is based on the fact that the performance of each individual forecasting model is better than 50%. The figure shows MCNN4, and the output function is given by

3.11 Trading Strategies

As mentioned earlier, one aim of this work is to determine whether an ANN can realistically deliver a good rate of return from trading. As discussed in the literature review, developing an ANN forecasting model that gives highly accurate signals is not sufficient to enable a trader to make economically significant profits. For this reason, the ANNs trained on the fundamental and technical data are benchmarked both for the accuracy of their signals and for their ability to capture profit on the LSE from those signals. The latter is tested through the modified trading system, which includes risk control, transaction costs and money management. In this section, we discuss the trading strategy used in this project.

This study assumes that trading in the stock market always incurs a transaction cost, which has been ignored by many academic studies, and that a low transaction cost will lead investors to invest even when the profit is small. This argument applies to most financial instruments, such as bonds and stocks, but it is most relevant to the stock market. The reason is that when new information related to the stock market arrives, investors have three options: buy, sell or hold the stock.

Most previous studies have not included the option of buying and selling in the stock market for a small profit. I agree that this is not always the best way of reacting to new information, because the transaction costs (such as brokerage fees) are wasted if the profit is less than the transaction cost. However, if the investor compares the expected profit with the transaction cost and a small profit remains, the transaction should be made, because for a short-term investor many small profits add up to a large profit by the end of the year. So, in this study we use the strategy of trading in the market even for a small profit.

In this study, we modify the trading strategy used by H. Pan et al. (2003) to obtain a better return. As in H. Pan et al. (2003), two types of trading strategy are used in this project:

1. Respond to the forecasted trading signals, which may be “Buy”, “Sell” or “Hold”.

2. Keep the money in hand until the end of the period, i.e. do not participate in trading. This strategy is used as a benchmark to compare the relative return.

3.11.1 First Trading Strategy

This study assumes that an index can be traded like a stock in the stock market. Let the money in the hand of the investor be M. The number of shares is S = M/W, where W is the closing price of the FTSE 100 on the day before the first day of the period.

Let $M_t$, $S_t$, $P_t$ and $V_t$ be the money in the hand of the investor, the number of shares, the closing price of the FTSE 100 and the value of the shares on day t (t = 1, 2, ..., T).

We assume that a fixed amount of money is used in the market for every signal, whether it is a buy or a sell. Let this fixed amount be denoted by F, with F = M*L, where L = 0.1, 0.2, ..., 1.

Suppose the trading signal at the beginning of day t is a “Buy” signal. Then the investor spends FB = min{F, $M_{t-1}$} to buy shares at the previous day's FTSE 100 closing price.

If the trading signal is a “Hold” signal, then the money in hand and the number of shares are unchanged.

If the trading signal at the beginning of day t is a “Sell” signal, then the investor sells amount of shares

In this study, we buy stock even if a “Buy” signal immediately follows another “Buy” signal.
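The mechanics of the first strategy can be sketched as a small simulation. This is only a sketch under explicit assumptions, not the exact rules of this study: the investor starts with cash and no shares, a “Sell” signal sells shares worth the fixed amount F (limited by the shares held), the transaction cost is a proportional rate on the traded value, and the holding is valued at the final closing price. The names simulate, rate_of_return, fraction and cost_rate are hypothetical.

```python
def simulate(signals, prev_closes, final_close, cash=1000.0, fraction=0.5, cost_rate=0.0):
    """Follow buy/sell/hold signals, trading at the previous day's close.

    signals[t] is the signal at the start of day t and prev_closes[t] is
    the closing price of the day before. Returns the final wealth.
    """
    fixed = cash * fraction            # fixed amount F = M * L
    shares = 0.0
    for signal, price in zip(signals, prev_closes):
        if signal == "buy":
            spend = min(fixed, cash)   # cannot spend more than the cash in hand
            cash -= spend
            shares += spend * (1.0 - cost_rate) / price
        elif signal == "sell":
            sold = min(fixed / price, shares)   # cannot sell more than is held
            shares -= sold
            cash += sold * price * (1.0 - cost_rate)
        # "hold": cash and shares stay unchanged
    return cash + shares * final_close

def rate_of_return(final_wealth, initial_wealth):
    """Simple rate of return over the whole period."""
    return (final_wealth - initial_wealth) / initial_wealth

signals = ["buy", "buy", "hold", "sell"]
prev_closes = [100.0, 110.0, 120.0, 125.0]
wealth = simulate(signals, prev_closes, final_close=130.0)
print(wealth, rate_of_return(wealth, 1000.0))
```

The second “buy” is executed even though it follows a “buy”, as stated above; with a non-zero cost_rate the same run shows how transaction costs erode the return.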

3.11.2 Second Trading Strategy

In this case the trader does not participate, so the holdings are unchanged throughout the period. Because the price changes every day, the value of the holding still changes from day to day. We compare the rates of return of the two strategies; if the return of the first strategy is higher than that of the second, it indicates that the investor can profit from short-term investing using the forecasting model.

4 Implementation

In this chapter, the implementation steps used to construct the forecasting models and the trading strategy with the different types of ANN and the hybrid approaches are described and explained. The construction follows the procedure set out in Chapter 3. First the development environment is presented, and then the steps are explained. All these processes are implemented in the MATLAB programming language.

4.1 Development Environment

In Section 3, we discussed and compared the possible procedures for each step (as shown in Figure 1) that academic studies have used to implement an accurate ANN forecasting model and then to use that model with a trading strategy to obtain a high rate of return. Since the major aim of this study is to show the use of ANNs in implementing a forecasting model, and to compare the performance of different ANNs, we need to construct and simulate the forecasting models in addition to comparing the approaches theoretically. It was difficult to choose the best procedure for each step, as there is no fixed procedure that serves as the standard in academic studies. Moreover, we also need a suitable environment in which to implement the forecasting model and fulfil the aims of this project.

Why MATLAB?

Jamshid Nazari and Okan K. Ersoy (1992) compared the performance of neural networks implemented in MATLAB with implementations in C and concluded that the MATLAB implementation is 4.5 to 7 times faster than the C programs. Moreover, packages such as C and JOONE (Java Object Oriented Neural Engine) are large; they need to be compiled, and modifying their code can take a great deal of time, because one needs to understand a large code base and learn additional low-level programming before making changes. MATLAB also has graphics capabilities, so the user can plot network parameters to understand how each neural network works. In addition, it is easy to make additions and modifications in MATLAB. From the viewpoint of this thesis, the discussion and the works above indicate that MATLAB should be used to construct the ANNs. As such, we use MATLAB as the development environment for this thesis.

The forecasting models and the trading strategy in this study are implemented and tested using MATLAB (Version 7.9.0.3522 (R2009b)) and its associated tools: Time Series Tools, Financial Toolbox, Curve Fitting Toolbox and Neural Network Toolbox.

A great many individual programs were also written outside MATLAB to implement the data transformations, calculations and merges needed for building the statistical models and for computing the values of technical indicators such as moving averages. These were written as scripts in Microsoft Access.

A stand-alone PC with an AMD Athlon(tm) dual-core processor (5200G, 2.69 GHz) and 2 GB of RAM was used for all neural network training and testing. On this configuration, networks such as the recurrent and time delay neural networks took approximately 3-6 days to train on the noisy data and 8-24 hours to train on the data without noise.

4.2 Variable Selection

In this section, we discuss the variables selected as potential input variables in this study. The selection is based on the discussion in Section 3 of the technical and fundamental variables used by other academic researchers. As mentioned in Section 1, no academic study has been able to forecast the FTSE 100 accurately, so we have no prior information about which historical data should be used to forecast it. Initially, therefore, we take as much historical data as possible on the variables that may affect the LSE, identified from journals and academic papers, to get the best possible forecasting performance from the model. In addition, the more data there is on the relevant variables that affect the LSE, the more effective the training and the better the forecasting performance.

In this study, we have selected almost all the indexes of the United States (US) financial market as potential input variables, as the UK stock market has historically been strongly affected by the US stock market. We have also selected the stock market indexes of various European and Asian countries. The indexes selected as potential input variables are:

CAC 40 (FCHI), Madrid General (SMSI), Swiss Market (SSMI), BSE (BSESN), Hang Seng (HSI), Nikkei 225 (N225), FTSE 100, FTSE 250, FTSE 350, FTSE Techmark, FTSE All-Share, Dow Jones Industrial Average (DJI), Dow Jones Composite Average (DJA), Dow Jones Transportation Average (DJT), Dow Jones Utility Average (DJU), S&P 500, NASDAQ, S&P 100, Shanghai Composite (SSEC).

We also tried the historical data of other stock exchange indexes, such as CMA, TA-100, ATX, BEL-20, DAX, AEX General, OSE All Share, MIBTel, Taiwan Weighted and NZSE50, as inputs, but the individual regression analysis of each index against the FTSE 100 showed that they do not affect the next-day value of the FTSE 100 as much as the indexes selected in this study.

The exchange rates that are used as a potential input variable in this study are:

GBP/USD, GBP/INR, GBP/JPY, GBP/CAD, GBP/EUR, GBP/CHF, GBP/AUD, GBP/HKD, GBP/NZD, GBP/KRW, and GBP/MXN.

The historical data of the metals that are used as potential input variables in this study are:

GBP/XAU, GBP/XAG, GBP/XPT, Silver, Gold, FTSE goldmine.

In addition, we selected the following fundamental variables as potential input variables in this study:

Bank of England interest rate, Federal bank effective rate, interest rate on eurodollar deposits of the Federal bank, UK GDP, UK unemployment.

The stocks selected as potential input variables in this study are:

RDSA (Royal Dutch Shell, LSE), Standard Chartered (LSE), HSBC HLDG (LSE), GlaxoSmithKline (LSE), AstraZeneca (NYSE), IBM (NYSE), Exxon Mobil Corp (NYSE), Chevron Corp (NYSE), 3M Co. (NYSE), McDonald's Corp (MCD, NYSE), United Technologies Corp (UTX, NYSE), Procter & Gamble (PG), Wal-Mart Stores Inc (WMT, NYSE), KO (The Coca-Cola Company, NYSE), PHLX Gold/Silver Sector (PHLX, NYSE), AMEX Oil (XOI, NYSE).

where NYSE stands for New York Stock Exchange. We selected the top stocks by market capitalisation on both the LSE and the NYSE. As mentioned in Section 3.2, we do not select as a potential input any variable with more than 5% of its values missing, except for the fundamental variables, which are only reported monthly or yearly. This is why we could not select the other top stocks of the LSE: no historical information on them before 2003 is available on the websites.

4.3 Data Collection

The historical financial time series for this thesis were obtained from the internet and cover the 6-year period from the first day of trading in 2002 to the last day of trading in 2008, with 1826 observations. The data are sourced from the following websites: Yahoo Finance, Google Finance, oanda.com, lbma.org, federalreserve.gov, mortgages.co.uk, data.bls.gov, statistics.gov.uk, fx.sauder.ubc.ca, and ftse.com. Programs were written to assess the consistency of the technical data obtained from the websites. Specifically, every row for every security was checked to ensure that the following conditions were met: Open <= High, Close <= High, Open >= Low, Close >= Low, Low <= High, Open > 0, Low > 0, High > 0, Close > 0, Volume > 0.
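The consistency conditions listed above translate directly into a row check. The sketch below is illustrative only: the original programs are not shown in this chapter, and the dictionary layout and the function name is_consistent are assumptions.

```python
def is_consistent(row):
    """Check one daily price row against the consistency conditions."""
    o, h, l, c, v = row["open"], row["high"], row["low"], row["close"], row["volume"]
    return (
        o <= h and c <= h          # open and close do not exceed the high
        and o >= l and c >= l      # open and close are not below the low
        and l <= h
        and o > 0 and l > 0 and h > 0 and c > 0
        and v > 0
    )

rows = [
    {"open": 5200.0, "high": 5250.0, "low": 5180.0, "close": 5230.0, "volume": 1000},
    {"open": 5200.0, "high": 5190.0, "low": 5180.0, "close": 5185.0, "volume": 1000},  # open > high
    {"open": 5200.0, "high": 5250.0, "low": 5180.0, "close": 5230.0, "volume": 0},     # no volume
]
bad_rows = [i for i, row in enumerate(rows) if not is_consistent(row)]
print(bad_rows)
```

Rows that fail the check can then be inspected by hand or treated as missing values.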

Selection of Technical Variable

Selection of Fundamental Variable

4.4 Data Pre- and Post-Processing Function

As mentioned in the previous chapter, before constructing the model we need to fill in the missing data and normalize the data. We fill in the missing data using linear interpolation. Except for the linear regression analysis, we used the Financial Time Series Toolbox in MATLAB for the pre- and post-processing steps.

Converting Data into the Financial Time Series

As mentioned in Section 3, we need to convert the data into a financial time series before using it as an input to the forecasting model. The financial data is converted into a time series using the Financial Time Series Toolbox. We open the GUI (Figure 1) by typing the command “ident” in the MATLAB Command Window.

Missing Values

As mentioned in Section 3, we need to fill in the missing values of the financial time series. In this study, we rely on the result of Olivier Coupelon [c], who concluded that results are much better when linear interpolation is used, so we use linear interpolation in this thesis. The Financial Time Series Toolbox in MATLAB is used to fill in the missing values. We open the GUI (Figure 1) by typing the command “tstool” in the MATLAB Command Window.
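Linear interpolation of missing values can be sketched in a few lines. In the thesis this step is done through the MATLAB GUI; the Python function below is an illustrative stand-in, and its name and the choice to leave leading or trailing gaps untouched are assumptions.

```python
def fill_linear(series):
    """Fill None entries that lie between two known values by linear interpolation.

    Missing values before the first or after the last known value are
    left as None, because there is nothing to interpolate between.
    """
    filled = list(series)
    known = [i for i, value in enumerate(filled) if value is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for i in range(left + 1, right):
            step = (filled[right] - filled[left]) * (i - left) / gap
            filled[i] = filled[left] + step
    return filled

closes = [10.0, None, None, 16.0, None, 20.0]
print(fill_linear(closes))
```

Each missing point is placed on the straight line joining its nearest known neighbours, so a two-day gap between 10 and 16 is filled with 12 and 14.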

Regression Analysis

In this project, regression analysis is used to determine the relationship between the various independent variables and the FTSE 100 closing value. As mentioned in Section 3, regression analysis is needed to reduce the noise in the forecasting model.

It is an initial step in every forecasting model, because we cannot use all the technical and fundamental indicators (229 variables in total in this study). In this thesis, all 1825 days of data are used to expose a general relationship between the variables. We open the GUI (Figure 1) for regression analysis by typing the command “stepwise(input, target)” in the MATLAB Command Window, where input holds the input variables of the forecasting model and target is the target value, i.e. the next-day closing value of the FTSE 100.

The regression analysis is done twice in this project: first with the initial 229 input variables, and again after moving averages are added to the output of the first regression analysis.

4.5 Data Partitioning

As mentioned in Section 3, we use the 80/10/10 approach to separate the training, validation and testing data in this study. After training, validating and testing on a data set equal in size to the window, we forecast the next-day value of the FTSE 100, rather than overlapping the test data across consecutive training cycles and averaging over the data set. Figure 1 shows the time-indexed series partitioned into historical data, used for training and validation, and latest data, used for testing.

Historical Data (H): This set consists of a data set equal in size to the window. As mentioned earlier, we try different window sizes in this study and select the one that gives the best performance according to the evaluation metrics (Section 3). To make sure that the models with different window sizes are all tested on the same data, we fix the last day of training at the 500th observation (1st Dec, 2003). The figure explains the approach used in the implementation.

Latest Data (L): This set consists of 1326 observations (the 501st to the 1826th). It is used to test the forecasting performance of the model after training. As the main aim of the thesis is to forecast the next-day value, we use one observation for each training cycle.

The model is trained and validated using the historical data, and the latest data is then used for forecasting. Before either data set is used as input to the forecasting model, it must be converted into vector sets, which is a requirement of supervised learning. The historical data is therefore converted into pairs of an input vector and the corresponding output. After this step, the vector sets look as shown in Table 1.

MATLAB was used for this step.

As mentioned earlier in the study, we train the forecasting model with window sizes of 5, 10, 15, 25, 30, 50, 75, 100, 200, 300 and 500, as there is no standard answer for the size of the window in the sliding window approach.
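The conversion into supervised pairs and the 80/10/10 split of one window can be sketched as follows. This is a simplified illustration, not the MATLAB code of the thesis: the function names make_pairs and split_window are hypothetical, and a single input per day stands in for the full input vector.

```python
def make_pairs(inputs, closes):
    """Pair the inputs of day t with the closing value of day t + 1."""
    return [(inputs[t], closes[t + 1]) for t in range(len(closes) - 1)]

def split_window(pairs, end, window):
    """Take the `window` pairs ending just before index `end` and split them 80/10/10."""
    chunk = pairs[end - window:end]
    n_train = window * 8 // 10
    n_val = window // 10
    train = chunk[:n_train]
    validation = chunk[n_train:n_train + n_val]
    test = chunk[n_train + n_val:]
    return train, validation, test

closes = [100 + i for i in range(21)]   # toy closing values
inputs = [[c] for c in closes]          # toy one-variable input vectors
pairs = make_pairs(inputs, closes)
train, validation, test = split_window(pairs, end=20, window=10)
print(len(train), len(validation), len(test))
```

Sliding the window forward by one observation and repeating the split gives one training cycle, and one next-day forecast, per step.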

4.6 Construction of the Forecasting Model

After the pre- and post-processing stage, we move on to the neural network design stage. In Section 3, we discussed theoretically the various challenges in constructing a forecasting model with the different ANNs. In this section, we use the results of that discussion to construct the optimal forecasting model.

We first construct the forecasting model with all 229 variables and find, for each ANN, the best forecasting model according to the direction of the index; this model is called the “Noisy Data Model”. We do not construct a Noisy Data Model for the value of the index, because its only purpose is to allow a comparison with the “Non-Noisy Data Model”, even though the literature review already indicated that the performance of a model increases when the variables that cause noise are removed. The Non-Noisy Data Model is the forecasting model that takes as input the 18 variables obtained after the second regression analysis. It is so named because we assume that it contains the least noise. For the Non-Noisy Data Model, we construct forecasting models for both the direction and the value of the index.

To check whether the proposed approach for finding the number of hidden layers and neurons gives correct results, we run experiments varying the number of hidden layers from 1 to 5 and the number of hidden nodes in each layer from 1 to 20.

4.6.1 FNN Forecasting Model

As discussed in Section3, we initially construct one hidden layer FNN network and then increase the number of layers, when we vary the network parameters to get better performance .We intially train the Non-Noisy model by both the learning algorithms: gradient descent with the momentum backpropogation and Levenberg-Marquardt according to the direction accuracy and compare the result from both the algorithms.Then we develop the model with the best learning algorithm according to the value accuracy .In additon, we also develop the Noisy Data Model with the best learning algorithm , to show the comparison of the Non-Noisy Data Model with the Noisy Data Model according to the direction accuracy.

We choose and vary the network parameters based on the theoretical discussion in Section 3.

The code for this model (Appendix 4) is implemented in MATLAB.

4.6.2 TDNN Forecasting Model

As discussed in Section 3, we initially construct a TDNN with one hidden layer. We train the model with the Levenberg-Marquardt learning algorithm and choose the network parameters based on the theoretical discussion in Section 3. The code for this model (Appendix 4) is implemented in MATLAB.

4.6.3 RNN Forecasting Model

As discussed in Section 3, we initially construct an RNN with one hidden layer. We train the model with the Levenberg-Marquardt learning algorithm and choose the network parameters based on the theoretical discussion in Section 3. The code for this model (Appendix 4) is implemented in MATLAB.

4.6.4 PNN Forecasting Model

As discussed in Section 3, we construct a two-layer PNN. The first layer has radbas neurons; it calculates its weighted inputs with dist and its net input with netprod. Only the first layer has biases. The second layer has compet neurons; it calculates its weighted input with dotprod and its net input with netsum. In the code (Appendix B), newpnn sets the first-layer weights to P' and all first-layer biases to 0.8326/spread, resulting in radial basis functions that cross 0.5 at weighted inputs of +/- spread. The second-layer weights W2 are set to T. (MATLAB)
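The role of the constant 0.8326 can be checked numerically. The sketch below is written in Python for illustration (the thesis code is MATLAB) and the function names are hypothetical. It assumes the radial basis transfer function is exp(-n^2) and that the net input is the distance multiplied by the bias, which is what combining dist with netprod amounts to. Since 0.8326 is approximately sqrt(ln 2), the output is about 0.5 when the distance equals the spread.

```python
import math


def radbas(n):
    """Radial basis transfer function: exp(-n^2)."""
    return math.exp(-n * n)


def first_layer_output(distance, spread):
    """Output of one first-layer neuron for a given input-to-weight distance.

    The bias 0.8326 / spread scales the distance before radbas is applied.
    """
    bias = 0.8326 / spread
    return radbas(distance * bias)
```

For any positive spread, `first_layer_output(spread, spread)` is close to 0.5, while a distance of zero gives exactly 1.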

In this experiment, we vary the spread factor and other parameters until we find the best forecasting model. The larger the spread, the smoother the function approximation. If the spread is too large, a large number of neurons is required to fit a fast-changing function. If the spread is too small, many neurons are required to fit a smooth function, and the forecasting network might not generalize well (MATLAB). The advantage of this neural network is that only one parameter, the spread factor, has to be varied.

4.6.5 RBNN Forecasting Model

As discussed in Section 3, we construct a two-layer radial basis neural network. The first layer has radbas neurons; it calculates its weighted inputs with dist and its net input with netprod. In this network both layers have biases. The second layer has purelin neurons; it calculates its weighted input with dotprod and its net input with netsum. In the code (Appendix B), newrbe sets the first-layer weights to P' and all first-layer biases to 0.8326/spread, resulting in radial basis functions that cross 0.5 at weighted inputs of +/- spread. Hidden neurons are added to the hidden layer until the network meets the specified mean squared error goal. (MATLAB)

The second-layer weights IW{2,1} and biases b{2} are calculated by simulating the first-layer outputs A{1} and then solving the following linear expression (MATLAB):

[W{2,1} b{2}] * [A{1}; ones] = T

In this experiment, we vary the spread factor and other parameters until we find the best forecasting model.

For the moving and weighted window approaches, the architecture specification also includes the number of observations used for training, which is difficult to specify in advance and is generally data dependent. To find the best window size, we vary it from 50 to the highest integer multiple of 50 possible within the training set. The choice of 50 is somewhat arbitrary but follows the general recommendation in the time series forecasting literature that at least 50 observations are needed to build a successful forecasting model (Box and Jenkins, 1976). The code was written in MATLAB.
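The window-size search can be illustrated with a short Python sketch (the thesis code is MATLAB; the function names here are hypothetical). It lists the candidate window sizes for a training set of a given length and shows one assumed form of a moving window, in which the most recent `size` observations are used to predict the next one.

```python
def window_sizes(n_train, step=50):
    """Candidate window sizes: step, 2*step, ... up to the largest multiple
    of step that still fits inside the training set."""
    return list(range(step, n_train + 1, step))


def moving_windows(series, size):
    """Yield (training_window, next_value) pairs for a moving window.

    Each window holds the `size` observations immediately before the value
    that is to be forecast.
    """
    for t in range(size, len(series)):
        yield series[t - size:t], series[t]
```

For example, a training set of 230 observations gives the candidate sizes 50, 100, 150 and 200.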

For evaluation, we focus mainly on out-of-sample performance, as it matters most in financial time series forecasting. We use the Root Mean Square Error (RMSE) to measure out-of-sample prediction performance. In addition, we use the statistic proposed by Pesaran and Timmermann, SR (PT) [11], which evaluates the correctness of the predicted signs. This statistic is often used in the financial literature because a predicted positive change gives a buy signal and a predicted negative change gives a sell signal, which allows a trading strategy to be evaluated. The Pesaran-Timmermann statistic is based on the null hypothesis that a given model has no economic value in forecasting direction, and it is approximately normally distributed. In other words, we test the null hypothesis that the signs of the forecasts and the signs of the actual values are independent. If the signs are statistically dependent, we have a good forecasting model with economic significance.
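The two statistics can be sketched in Python as follows. This is an illustration rather than the thesis's MATLAB implementation, and the function names are hypothetical. The directional statistic follows the standard Pesaran-Timmermann form: the observed hit rate is compared with the hit rate expected under independence, and the difference is scaled by the difference of the two variance estimates. Treating a zero change as a non-positive sign is an assumption of this sketch.

```python
import math


def rmse(actual, forecast):
    """Root mean square error of the forecasts."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)


def pesaran_timmermann(actual_changes, forecast_changes):
    """Pesaran-Timmermann sign test statistic (approximately N(0, 1) under
    the null hypothesis that forecast and actual signs are independent)."""
    n = len(actual_changes)
    p_y = sum(a > 0 for a in actual_changes) / n    # share of actual rises
    p_x = sum(f > 0 for f in forecast_changes) / n  # share of forecast rises
    # Observed share of correctly predicted signs.
    p_hat = sum((a > 0) == (f > 0)
                for a, f in zip(actual_changes, forecast_changes)) / n
    # Expected share of correct signs if the two series were independent.
    p_star = p_y * p_x + (1 - p_y) * (1 - p_x)
    var_hat = p_star * (1 - p_star) / n
    var_star = (((2 * p_y - 1) ** 2) * p_x * (1 - p_x)
                + ((2 * p_x - 1) ** 2) * p_y * (1 - p_y)) / n \
        + 4 * p_y * p_x * (1 - p_y) * (1 - p_x) / n ** 2
    denom = var_hat - var_star
    if denom <= 0:
        raise ValueError("statistic undefined: signs show no variation")
    return (p_hat - p_star) / math.sqrt(denom)
```

A value above roughly 1.64 rejects independence at the 5% level in a one-sided test, which is the usual reading of a model with directional forecasting value.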

In this study, 6 ANN models were applied to the system model, using an ANN software package. The performance of the ANN models can be measured by the coefficient of determination (R2) or by the mean relative percentage error. The coefficient of determination measures the prediction accuracy of the trained network models; higher R2 values indicate better prediction. The mean relative percentage error also measures prediction accuracy, by representing the degree of scatter. For each prediction model, Eq. 6 was used to calculate the relative error for each case in the testing set. The calculated values were then averaged and multiplied by 100 to express them as percentages.
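Both measures are straightforward to compute. The Python sketch below is illustrative only, with hypothetical function names. Because Eq. 6 is not reproduced here, the relative error of a case is assumed to be |actual - predicted| / |actual|; the averaging and the factor of 100 follow the description above.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mean_relative_percentage_error(actual, predicted):
    """Average of the per-case relative errors, expressed in percent.

    Assumes relative error = |actual - predicted| / |actual| (an assumption,
    since the thesis's Eq. 6 is not shown) and that no actual value is zero.
    """
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)
```

For actual values 1, 2, 3, 4 and predictions 1.1, 1.9, 3.2, 3.8, the sketch gives an R2 of 0.98 and a mean relative percentage error of about 6.67%.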

5 Results and Analysis

In this section, the results of each step in constructing the forecasting models and the trading strategy are presented with analysis and discussion. The construction process follows the steps described in Chapter 4. First, the results of the individual forecasting models are presented with and without noisy data, starting with the FNN and followed by the TDNN, RNN, PNN and RBFNN. Then the results of the hybrid forecasting models, built by combining the individual forecasting models with different approaches, are presented. Finally, we show the results of the trading model built on the best forecasting model under different strategies. To analyse the forecasting performance of the models on the Latest Data (L), we use the evaluation metrics discussed in Chapter 4. The model with the maximum direction accuracy (L) is considered better at forecasting the direction of movement of the series, while the model with the minimum MSE (L) is considered better at forecasting its value. We also consider a further evaluation metric when forecasting models have the same direction accuracy (L) or MSE (L). The model with the maximum rate of return is considered the better trading model in terms of profit. All of these processes are implemented in MATLAB.

5.1 Result of Regression Analysis

Figure 1 and Table 1 show the result of the regression analysis on the 229 input variables. 56 different models were built in this process to determine the subsets of input variables that are strongly correlated for forecasting the FTSE 100 closing value. The evaluation metric used to determine the most important subset of indicators is the Root Mean Square Error (RMSE). The smaller the RMSE, the better the subset of variables is as input to the forecasting model.

From these results, we conclude that regression analysis is a necessary step in forecasting the stock exchange, as it reduced the RMSE from 858.767 (Model 1) to 51.8232 (Model 56). We therefore select the subset of variables in Model 56, which has the lowest RMSE, as the input variables for the next stage. Moreover, we found that the variables with higher coefficients (stronger linear relationships) are not always selected in the forecasting model, which supports the statement that a higher coefficient does not guarantee selection.

There were 27 variables selected in the final subset of the regression analysis: CAC PARIS High, CAC PARIS Low, CAC PARIS Close, BSE High, BSE Adj Close, GBP/CHF, GBP/AUD, GBP/MXN, FTSE 100 Open, FTSE 100 Close, FTSE 350 Close, GlaxoSmithKline Close, AstraZeneca plc Open, AstraZeneca plc Close, Open DJI, Open DJT, High DJT, Low DJT, Close DJT, Open DJU, S&P 500 Index Open, S&P Close, S&P 100 Open, IBM Adj Close, PHLX Open, PHLX High, PG Adj Close.

We then apply each of the technical indicators discussed in Section 3, with 5-day and 10-day periods, to the subset of variables selected in the first regression analysis, and carry out the second regression analysis.
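As an illustration of this step, the Python sketch below derives two kinds of indicator series from a price series: a simple moving average and a k-day lag, each usable with periods of 5 and 10 days. The function names are hypothetical, and the other indicators used in the thesis (the SMMA, LMMA and EM variants defined in Section 3) are not reproduced here. Positions without enough history are filled with None.

```python
def simple_moving_average(values, window):
    """Simple moving average over the last `window` observations."""
    out = []
    for t in range(len(values)):
        if t + 1 < window:
            out.append(None)  # not enough history yet
        else:
            out.append(sum(values[t - window + 1:t + 1]) / window)
    return out


def lag(values, k):
    """Series shifted by k days, so position t holds the value from t - k."""
    values = list(values)
    n = len(values)
    return [None] * min(k, n) + values[:max(n - k, 0)]
```

Applying both functions with periods of 5 and 10 to each selected variable yields the enlarged candidate set on which a second regression analysis can be run.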

Figure 2 and Table 2 show the result of the second regression analysis (after the addition of moving averages) on the input variables. 35 different models were built in this process to determine the subsets of input variables that are strongly correlated for forecasting the FTSE 100 closing value.

From these results, we again conclude that regression analysis is a necessary step in forecasting the stock exchange, as it reduced the RMSE from 701.869 (Model 1) to 40.0667 (Model 35). We therefore select the subset of variables in Model 35, which has the lowest RMSE, as the input variables for the forecasting model.

In this study, the following set of input variables was considered to ultimately affect the FTSE 100 value.

1. Previous day’s CAC PARIS High

2. Previous day’s CAC PARIS Moving Average 10

3. Previous day’s BSE High

4. Previous day’s BSE Adj Close

5. Previous day’s BSE Adj Close 10 day lag

6. Previous day’s GBP/MXN exchange rate

7. Previous day’s GBP/MXN exchange rate 5 day lag

8. Previous day’s FTSE 100 Open LMMA5

9. Previous day’s FTSE 100 Close

10. Previous day’s FTSE 100 Close SMMA5

11. Previous day’s FTSE 100 Close 10 day lag

12. Previous day’s GlaxoSmithKline Close

13. Previous day’s GlaxoSmithKline Close SMMA5

14. Previous day’s Open DJU EM 10

15. Previous day’s S&P 500 Index Open

16. Previous day’s S&P Close

17. Previous day’s S&P Close 5 day lag

18. Previous day’s S&P Close LMMA5

Taking these input variables as Fun1, Fun2, ..., Fun18 at time t, the following system model was considered for predicting the stock exchange market index value:

5.2 Results For FNN Forecasting models

As discussed in Section 4.3, FNN models with different network parameters were created, trained and tested for each series for the Noisy Data Model and the Non-Noisy Data Model. The detailed results and discussion are presented in this section.

Table 1 and Table 2 show the results of the FNN with the gradient descent momentum backpropagation (GDMB) and Levenberg-Marquardt (LM) learning algorithms in terms of direction accuracy. The accuracy of the two models is compared by the probability of a correct direction forecast, which equals the overall direction accuracy when multiplied by 100. Based on our empirical investigation, we find that the LM algorithm is effective for financial time series forecasting and is significantly better than the GDMB learning algorithm in overall prediction accuracy. Tables 1 and 2 show that the direction accuracy improves from 59.66% with GDMB to 89.66% with Levenberg-Marquardt.

As Tables 1 and 2 show, we developed a forecasting model for each window size. The best result with Levenberg-Marquardt was obtained with a window size of 200, whereas the best result with gradient descent momentum backpropagation was obtained with a window size of 300. This supports our argument that there is no standard window size for every network and that the window size always has to be varied to obtain the optimum result. The figure shows that the direction accuracy of the FNN follows almost the same pattern across window sizes for both algorithms.

Table 3 and Table 4 show the results of the FNN with the GDMB and LM learning algorithms in terms of value accuracy. The accuracy of the two models is compared by the MSE. Table 2 shows that the MSE improves significantly, from 0.1700 to 0.000124, when we use Levenberg-Marquardt. We therefore conclude that Levenberg-Marquardt gives better results than the GDMB algorithm when predicting the value of the stock exchange.

Moreover, Tables 1 and 3 show that the performance of the model in predicting the direction of the stock exchange improves by 33.66% with the regression analysis and technical indicators.

Moreover, the results of the experiment support the approach proposed in this study for finding the optimum number of hidden layers and hidden neurons. The number of hidden layers was varied from 1 to 5 and the number of hidden nodes from 1 to 30. We find that, for each window size, the performance of the model increases with the number of hidden neurons up to a point, after which it decreases steadily. We also find that performance decreases as the number of hidden layers increases, and the best performance is obtained with one hidden layer.

For each ANN architecture tested, the model parameters are estimated on the training sample, while the best model is selected on the validation sample.

Finally, the forecasting models considered as candidates for the benchmark from FNN are FNN 42 in terms of direction accuracy and FNN 55 in terms of value accuracy. FNN 42 and FNN 55 are feedforward networks with one hidden layer of 12 and 3 neurons respectively. The networks were trained for 100 iterations or until one of the stopping criteria was met. The learning rate is 1.0 for both FNN 42 and FNN 55, the momentum rate is 1.0 for both FNN 42 and FNN 55, and the training algorithm is Levenberg-Marquardt.

5.3 Results For TDNN Forecasting models

As discussed in Section 4.3, TDNN models with different network parameters were created, trained and tested for each series with and without noisy data. The detailed results and discussion are presented in this section. For the TDNN Non-Noisy Data Model, the models with the minimum MSE (L) were not the models with the maximum direction accuracy (L). Hence this section is divided into three parts.

In general, as can be seen from Table 8, the results of the TDNN Noisy Data Model were around 50%, which is not unusual for noisy data. The main reason was too much noise in the data of the Noisy Data Model. The maximum direction accuracy achieved with the Noisy Data Model was 55%, with a window size of 500. As mentioned earlier, the noise issue can be addressed through regression analysis and a noise filter such as a moving average, which gives the Non-Noisy Data Model. Moreover, we found that TDNN 1, TDNN 2, TDNN 6 and TDNN 7 do not satisfy the Merton criterion, so they are not good forecasting models.

Table 9 shows the results of the TDNN Non-Noisy Data Model in terms of direction accuracy. The direction accuracy of the Non-Noisy Data Model improved by about 36.33% compared with the Noisy Data Model. TDNN 32 has a conditional probability of 0.9059 of predicting the upward direction, 0.9180 of predicting the downward direction, and an overall probability of 0.9133 of predicting the turning points in the LSE. Moreover, all models with different window sizes in Table 9 satisfy the Merton test. The results in Tables 8 and 9 support our argument that there is no standard window size for every network and that the window size always has to be varied to obtain the optimum result.
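The conditional probabilities reported above can be computed as in the following Python sketch (illustrative only, with a hypothetical function name). It assumes that the "Merton test" refers to the market-timing condition of Merton (1981), under which a forecast has value only if the probability of a correct forecast given a rise plus the probability of a correct forecast given a fall exceeds 1. Treating a zero change as a fall is an assumption of this sketch.

```python
def direction_report(actual_changes, forecast_changes):
    """Return (p_up, p_down, overall, passes_merton).

    p_up   : share of actual rises that were forecast as rises
    p_down : share of actual falls that were forecast as falls
    overall: share of all days with a correctly forecast direction
    """
    up_total = up_hit = down_total = down_hit = 0
    for a, f in zip(actual_changes, forecast_changes):
        if a > 0:
            up_total += 1
            up_hit += f > 0
        else:
            down_total += 1
            down_hit += f <= 0
    if up_total == 0 or down_total == 0:
        raise ValueError("need at least one actual rise and one actual fall")
    p_up = up_hit / up_total
    p_down = down_hit / down_total
    overall = (up_hit + down_hit) / (up_total + down_total)
    return p_up, p_down, overall, p_up + p_down > 1
```

With the values reported for TDNN 32, the sum 0.9059 + 0.9180 = 1.8239 is well above 1, which is consistent with the model satisfying the criterion.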

Table 10 shows the results of the TDNN Non-Noisy Data Model in terms of value accuracy. TDNN 22, with an MSE of 0.0001432, is the best forecasting model among all models in Table 10. Moreover, all models with different window sizes in Table 9 satisfy the Merton test.

Moreover, the results of the experiment (Tables 9 and 10) support the approach proposed in this study for finding the optimum number of hidden layers, but not the optimum number of hidden neurons. The number of hidden layers was varied from 1 to 5 and the number of hidden nodes from 1 to 20. Figure 7 shows that the MSE first decreases as the number of hidden neurons increases, then increases significantly, then decreases again up to a point, after which it starts increasing. Direction accuracy follows a zigzag pattern, so it is difficult to use any approach or “magic” formula to find the optimum number of hidden neurons; the best choice has to be found by trying alternatives on the data. This pattern was observed for each window size. We also find that performance decreases as the number of hidden layers increases, and the best performance is obtained with one hidden layer.

Finally, the forecasting models considered as candidates for the benchmark from TDNN are TDNN 32 in terms of direction accuracy and TDNN 22 in terms of value accuracy. TDNN 32 and TDNN 22 are time delay feedforward networks with one hidden layer of 11 and 15 neurons respectively. The networks were trained for 100 iterations or until one of the stopping criteria was met. The learning rate is 0.6 for TDNN 32 and 1.0 for TDNN 22, the momentum rate is 1.0 for both, and the training algorithm is Levenberg-Marquardt.

5.4 Results For RNN Forecasting models

As discussed in Section 4.3, RNN models with different network parameters were created, trained and tested for each series with and without noisy data. The detailed results and discussion are presented in this section. For the RNN Non-Noisy Data Model, the models with the minimum MSE (L) were not the models with the maximum direction accuracy (L). Hence this section is divided into three parts.

In general, as can be seen from Table 11, the results of the RNN Noisy Data Model ranged from 42% to 58%, which is not unusual for noisy data. The main reason was too much noise in the data of the Noisy Data Model. The maximum direction accuracy achieved with the Noisy Data Model was 58%, with a window size of 30. As mentioned earlier, the noise issue can be addressed through regression analysis and a noise filter such as a moving average, which gives the Non-Noisy Data Model. Moreover, we found that RNN 2, RNN 6, RNN 7, RNN 8 and RNN 9 do not satisfy the Merton criterion, so they are not good forecasting models.

Table 12 shows the results of the RNN Non-Noisy Data Model in terms of direction accuracy. The direction accuracy of the Non-Noisy Data Model improved by about 34% compared with the Noisy Data Model. RNN 20 has a conditional probability of 0.9344 of predicting the upward direction, 0.8974 of predicting the downward direction, and an overall probability of 0.9200 of predicting the turning points in the LSE. Moreover, all models with different window sizes in Table 12 satisfy the Merton test. The results in Tables 11 and 12 support our argument that there is no standard window size for every network and that the window size always has to be varied to obtain the optimum result.

Table 13 shows the results of the RNN Non-Noisy Data Model in terms of value accuracy. RNN 33, with an MSE of 0.000122, is the best forecasting model among all models in Table 13. Moreover, all models with different window sizes in Table 12 satisfy the Merton test.

Moreover, the results of the experiment (Tables 12 and 13) support the approach proposed in this study for finding the optimum number of hidden layers and hidden neurons. The number of hidden layers was varied from 1 to 5 and the number of hidden nodes from 1 to 30. We find that, for each window size, the performance of the model increases with the number of hidden neurons up to a point, after which it decreases steadily. We also find that performance decreases as the number of hidden layers increases, and the best performance is obtained with one hidden layer.

Finally, the forecasting models considered as candidates for the benchmark from RNN are RNN 20 in terms of direction accuracy and RNN 33 in terms of value accuracy. RNN 20 and RNN 33 are recurrent networks with one hidden layer of 8 neurons each. The networks were trained for 100 iterations or until one of the stopping criteria was met. The learning rate is 1.0 for both RNN 20 and RNN 33, the momentum rate is 1.0 for both, and the training algorithm is Levenberg-Marquardt.

5.5 Results For PNN Forecasting models

As discussed in Section 4.3, PNN models with different network parameters were created, trained and tested for each series with and without noisy data. The detailed results and discussion are presented in this section. For the PNN Non-Noisy Data Model, the models with the minimum MSE (L) were not the models with the maximum direction accuracy (L). Hence this section is divided into three parts.

In general, as can be seen from Table 14, the results of the PNN Noisy Data Model ranged around %, which is not unusual for noisy data. The main reason was too much noise in the data of the Noisy Data Model. The maximum direction accuracy achieved with the Noisy Data Model was 58%, with a window size of 30. As mentioned earlier, the noise issue can be addressed through regression analysis and a noise filter such as a moving average, which gives the Non-Noisy Data Model. Moreover, we found that PNN 2, PNN 6, PNN 7, PNN 8 and PNN 9 do not satisfy the Merton criterion, so they are not good forecasting models.

Table 15 shows the results of the PNN Non-Noisy Data Model in terms of direction accuracy. The direction accuracy of the Non-Noisy Data Model improved by about % compared with the Noisy Data Model. The results show that PNN 13, PNN 14, PNN 15 and PNN 16 have the same direction accuracy, so we compare these models on the other evaluation metrics and select the best model accordingly. The table shows that the AME (L), MSE (L) and MAPE (L) are lowest for PNN 16. PNN 16 has a conditional probability of 0.9016 of predicting the upward direction, 0.8974 of predicting the downward direction, and an overall probability of 0.9000 of predicting the turning points in the LSE. Moreover, all models with different window sizes in Table 15 satisfy the Merton test. The results in Tables 14 and 15 support our argument that there is no standard window size for every network and that the window size always has to be varied to obtain the optimum result.

Table 16 shows the results of the PNN Non-Noisy Data Model in terms of value accuracy. PNN 33, with an MSE of 9.95E-05, is the best forecasting model among all models in Table 16. Moreover, all models with different window sizes in Table 15 satisfy the Merton test.

Moreover, the results of the experiment (Tables 15 and 16) support the approach proposed in this study for finding the optimum spread factor. The spread factor was varied from 0.1 to 20 in the experiment. We find that, for each window size, the performance of the model increases with the spread factor up to a point, after which it decreases steadily.

Finally, the forecasting models considered as candidates for the benchmark from PNN are PNN 16 in terms of direction accuracy and PNN 33 in terms of value accuracy. PNN 16 and PNN 33 are probabilistic neural networks with spread factors of 6 and 9 respectively.

5.6 Results For RBFNN Forecasting models

As discussed in Section 4.3, RBFNN models with different network parameters were created, trained and tested for each series with and without noisy data. The detailed results and discussion are presented in this section. For the RBFNN Non-Noisy Data Model, the models with the minimum MSE (L) were also the models with the maximum direction accuracy (L). Hence this section is divided into two parts.

In general, as can be seen from Table 17, the results of the RBFNN Noisy Data Model ranged from 38% to 57%, which is not unusual for noisy data. The main reason was too much noise in the data of the Noisy Data Model. The maximum direction accuracy achieved with the Noisy Data Model was 57%, with a window size of 500. As mentioned earlier, the noise issue can be addressed through regression analysis and a noise filter such as a moving average, which gives the Non-Noisy Data Model. Moreover, we found that RBFNN 1, RBFNN 2, RBFNN 4, RBFNN 5, RBFNN 9 and RBFNN 11 do not satisfy the Merton criterion, so they are not good forecasting models.

Table 18 shows the results of the RBFNN Non-Noisy Data Model in terms of direction accuracy. The direction accuracy of the Non-Noisy Data Model improved by about 30% compared with the Noisy Data Model. RBFNN 16 has a conditional probability of 0.90 of predicting the upward direction, 0.82 of predicting the downward direction, and an overall probability of 0.87 of predicting the turning points in the LSE. Moreover, all models with different window sizes in Table 18 satisfy the Merton test except RBFNN 13, RBFNN 20 and RBFNN 21. The results in Tables 17 and 18 support our argument that there is no standard window size for every network and that the window size always has to be varied to obtain the optimum result.

Table 18 also shows the results of the RBFNN Non-Noisy Data Model in terms of value accuracy. RBFNN 19, with an MSE of 0.00044, is the best forecasting model among all models in Table 18.

Moreover, the results of the experiment (Table 18) support the approach proposed in this study for finding the optimum spread factor. The spread factor was varied from 0.1 to 20 in the experiment. We find that, for each window size, the performance of the model increases with the spread factor up to a point, after which it decreases steadily.

Finally, the forecasting model considered as the candidate for the benchmark from RBFNN in terms of both direction and value accuracy is RBFNN 19. The network architecture of RBFNN 19 is a radial basis neural network with 6 and 9 as a spread factor.

5.7 Comparison of Individual Models

In this section, we compare the performance of the various ANN models. Table 19 compares the different types of ANN according to their direction and value accuracy in forecasting the LSE.

The comparison of the forecasting models shows that RNN and PNN have the best direction and value accuracy respectively. However, there is an almost infinite number of configurations and network parameters for the other ANN forecasting models, so we cannot say that it would be impossible to find one that yields better results than RNN and PNN.

In terms of direction accuracy, the different ANNs can be arranged in descending order as RNN > TDNN > PNN > FNN > RBFNN.

In terms of value accuracy, the different ANNs can be arranged in descending order as PNN > RNN > FNN > TDNN > RBFNN.

5.8 Result of Linear Combined Neural Network Forecasting Model

In this section, we compare the performance of the various LCNN forecasting models. Tables 20 and 21 compare the different LCNNs according to their direction and value accuracy in forecasting the LSE.

Four LCNNs were created in this thesis according to direction accuracy. They are summarized below:

f1={RNN20,TDNN32}

f2={RNN20,TDNN32,PNN16}

f3={RNN20,TDNN32,PNN16,FNN42}

f4={RNN20,TDNN32,PNN16,FNN42,RBNN19}

The table shows that Models f2 and f3 have the same direction accuracy. Considering the other evaluation criteria, we find that f3 performs better than f2. f3 is therefore considered the candidate for the benchmark from LCNN in terms of direction accuracy.

Four LCNNs were created in this thesis according to value accuracy. They are summarized below:

f5={PNN33,RNN33}

f6={PNN33,RNN33,FNN55}

f7={PNN33,RNN33,FNN55,TDNN22}

f8={PNN33,RNN33,FNN55,TDNN22,PNN33}

The table shows that Model f6 has better value accuracy than all the other models. f6 is therefore considered the candidate for the benchmark from LCNN in terms of value accuracy.

5.9 Result of Weight Combined Neural Network Forecasting Model

In this section, we compare the performance of the various WCNN forecasting models. Tables 22 and 23 compare the different WCNNs according to their direction and value accuracy in forecasting the LSE.

Four WCNNs were created in this thesis according to direction accuracy. They are summarized below:

W1={RNN20,TDNN32}

W2={RNN20,TDNN32,PNN16}

W3={RNN20,TDNN32,PNN16,FNN42}

W4={RNN20,TDNN32,PNN16,FNN42,RBNN19}

The table shows that Models W2 and W3 have the same direction accuracy. Considering the other evaluation criteria, we find that W3 performs better than W2. W3 is therefore considered the candidate for the benchmark from WCNN in terms of direction accuracy.

Four WCNNs were created in this thesis according to value accuracy. They are summarized below:

W5={PNN33,RNN33}

W6={PNN33,RNN33,FNN55}

W7={PNN33,RNN33,FNN55,TDNN22}

W8={PNN33,RNN33,FNN55,TDNN22,RBNN33}

The table shows that Model W6 has better value accuracy than all the other models. W6 is therefore considered the candidate for the benchmark from WCNN in terms of value accuracy.
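The idea of combining individual forecasts can be illustrated with the Python sketch below. It is not the thesis's implementation: the function name is hypothetical, and it assumes that a linear combination is an equal-weight average of the member forecasts and that a weighted combination uses caller-supplied weights normalized to sum to one. How the weights are actually chosen in the WCNN is defined in the methodology chapter and is not reproduced here.

```python
def combine_forecasts(forecasts, weights=None):
    """Combine several forecast series into one.

    forecasts: list of equally long forecast series, one per member model
    weights  : optional list of non-negative weights, one per member model;
               if omitted, every member gets the same weight
    """
    k = len(forecasts)
    if weights is None:
        weights = [1.0] * k
    if len(weights) != k:
        raise ValueError("one weight per member model is required")
    total = sum(weights)
    norm = [w / total for w in weights]  # weights now sum to one
    horizon = len(forecasts[0])
    return [sum(norm[i] * forecasts[i][t] for i in range(k))
            for t in range(horizon)]
```

For a two-member set such as W5, calling the function with the two members' forecast series and no weights gives their simple average, while passing weights shifts the combined forecast towards the member with the larger weight.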

5.10 Results of Mixed Combined Neural Network Forecasting Model

In this section, we compare the performance of the various MCNN forecasting models. Tables 24 and 25 compare the different MCNNs according to their direction and value accuracy in forecasting the LSE.

Four MCNNs were created in this thesis according to direction accuracy. They are summarized below:

M1={RNN20,TDNN32}

M2={RNN20,TDNN32,PNN16}

M3={RNN20,TDNN32,PNN16,FNN42}

M4={RNN20,TDNN32,PNN16,FNN42,RBNN19}

The table shows that Model M3 has better direction accuracy than all the other models. M3 is therefore considered the candidate for the benchmark from MCNN in terms of direction accuracy.

Four MCNNs were created in this thesis according to value accuracy. They are summarized below:

M5={PNN33,RNN33}

M6={PNN33,RNN33,FNN55}

M7={PNN33,RNN33,FNN55,TDNN22}

M8={PNN33,RNN33,FNN55,TDNN22,PNN33}

The table shows that Model M6 has better value accuracy than all the other models. M6 is therefore considered the candidate for the benchmark from MCNN in terms of value accuracy.

5.11 Comparison of All Forecasting Models

In this section, we compare the performance of the individual models with that of the hybrid based models. Table 26 shows the comparison of the performance of all benchmark models according to the direction and value accuracy of forecasting the LSE.

Comparison of the different forecasting models shows that W3 and W6 have the best direction and value accuracy, respectively. Table 26 shows that none of the individual ANN forecasting models was able to outperform the benchmarks of the hybrid based models. The results therefore show that the hybrid based approaches perform better than the individual models. However, the results do not establish that WCNN models are better than MCNN models in general. In this study we took the best individual forecasting model for developing the WCNN (figure 3.1), and there is an almost unlimited number of configurations and network parameters for the individual forecasting models that could improve the performance of the WCNN. We therefore cannot rule out that a differently configured network would change the ranking between WCNN and MCNN.

5.12 Results of Trading Strategies

In this section, we compare the rate of return for both strategies. Table 27 shows the comparison of the rate of return of the forecasting models based on direction accuracy and on value accuracy. The results clearly demonstrate that an investor can earn more money by investing with the forecasting model based on direction accuracy.

In addition, the rate of return attained in this thesis with the modified trading strategy is 120.14%, a dramatic change compared with the 10.8493% rate of return of the existing trading strategy from the academic literature.
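To make the link between direction forecasts and rate of return concrete, the sketch below computes the return of a simple long-or-cash rule. It is only an illustration under stated assumptions: the rule, the function name `rate_of_return`, and the prices are hypothetical, and it is neither the existing nor the modified trading strategy evaluated in this thesis.

```python
def rate_of_return(prices, predicted_up):
    """Percentage return of a hypothetical long-or-cash rule.

    prices: closing values for days 0..n-1.
    predicted_up[t]: forecast made on day t that day t+1 closes higher.
    The investor holds the index for one day when the forecast is 'up'
    and stays in cash otherwise; transaction costs are ignored.
    """
    wealth = 1.0
    for t, go_long in enumerate(predicted_up):
        if go_long:
            wealth *= prices[t + 1] / prices[t]
    return (wealth - 1.0) * 100.0

# Invented prices and signals, for illustration only.
prices = [100.0, 110.0, 99.0, 108.9]
signals = [True, False, True]          # the fall on day 2 is avoided
strategy = rate_of_return(prices, signals)
buy_and_hold = rate_of_return(prices, [True, True, True])
```

Under such a rule the return depends only on whether the sign of each move is forecast correctly, not on how close the forecast value is, which is one way to read the advantage of models selected on direction accuracy.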

6 Conclusions

6.1 Introduction

A great deal of the research discussed in the literature review focused on the application of ANNs to forecast stock market trends, prices and returns. Although the literature review found that ANNs are suitable for forecasting the stock market, virtually no academic research has attempted to place the various types of ANN and hybrid based neural networks in the context of a real LSE trading system and to determine whether they are economically practical.

This thesis has attempted to do just that. In doing so, it has laid down a well-structured approach for developing a forecasting model using ANN and then using it to create trading systems. It has also defined a trading strategy that can give better profit than the existing strategy, and tested these trading systems on out-of-sample data.

Thousands of neural networks were created during this thesis, focusing on different sets of network parameters, and they were trained and tested over the whole period from 2002-08.

The purpose of this section of the thesis is to summarize why the ANNs and the trading strategy achieved the results they did, and then to formally draw conclusions about the research question from the results.

6.2 Discussion of Results

The results that we obtained are very promising, and we have been able to answer all the questions addressed in section 1.4.

Ferreira et al. (2004) described the relationship between the variables and the predicted index as non-linear, and artificial neural networks (ANN) have the ability to represent such complex non-linear relationships.

The results of the five different types of ANN and the three hybrid based neural network forecasting models trained in this thesis demonstrated this ability very well. These models effectively determined the relationships inherent in their underlying datasets. They were able to approximate an underlying equation or function that embodied the relationship of the potential input variables to the underlying structural mechanics of the LSE.

The individual neural network “Noisy Data Models” trained using all the potential input variables performed poorly, despite the fact that they were using the same inputs in their neural networks. A number of possible explanations for this result were proposed, and the major reason was too much noise in the forecasting model.

Another possible explanation is that some of the Noisy Data Models did not actually perform poorly: they had 58% direction accuracy and gave good signals, but these signals were not focused enough to allow an investor to use the models profitably in a trading system. A popular saying in the financial markets is that ‘a rising tide lifts all the boats’. It implies that in a bull market many investors tend to benefit by investing in stocks that are rising. Naturally, the converse is also true. The main limitation becomes apparent when considering the signals generated by the ANN model: these models are not ‘contextual’. In other words, for a given predicted value, the average return from these models is equal to the average accuracy of the model. It is nevertheless quite feasible for an investor to obtain a good rate of return using the Noisy Data Model. One guideline holds consistently: investors increase their chances of success when they trade with a good forecasting model.

Taking this as the limitation of the Noisy Data Model, the signals from the poorly performing forecasting model can be improved by using regression analysis. The comparison of graphs 2 and 3 in Section 4 shows that the performance of the system improved significantly. To improve the results further, we added the technical variables, as the studies in the literature review state that results improve when they are used. This is supported by the results of both regression analyses, in which RMSE improved by 22.68%. The analysis of the final 18 variables reveals that 9 of them come from the technical analysis.

The results of all the individual forecasting models show that direction accuracy improves by an average of 28% relative to the Noisy Data Models. This supports our point that removing noise improves the model; a 100% accurate model would require removing all the noise from the system, which does not seem achievable in practice. The results of all the individual forecasting models show that RNN has the best direction and value accuracy. However, the results do not rule out that PNN, RBNN, FNN or TDNN could achieve better accuracy than RNN, as the difference in accuracy among them is very small. Although we tried a very wide range of parameters for developing each neural network, we cannot rule out that some configuration of one of these networks would perform better than the RNN.

The results of the hybrid based approaches show that WCNN performs better than the other hybrid approaches. However, we cannot rule out that MCNN could perform better than WCNN; this is discussed under future work in section 6.5. The better results of the hybrid approach may be due to the fact that combining the results of all models reduces the overall noise.
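The noise-reduction argument for combining models can be made explicit with a short derivation. It is a textbook sketch under simplifying assumptions that are not tested in this thesis: each of the $n$ member forecasts $\hat{y}_i$ equals the true value $y$ plus a zero-mean error $\varepsilon_i$ with common variance $\sigma^2$ and common pairwise correlation $\rho$, and the members are combined by a plain average.

```latex
\hat{y}_i = y + \varepsilon_i, \qquad
\mathrm{E}[\varepsilon_i] = 0, \quad
\mathrm{Var}(\varepsilon_i) = \sigma^2, \quad
\mathrm{Corr}(\varepsilon_i, \varepsilon_j) = \rho \ (i \neq j)

\bar{y} = \frac{1}{n} \sum_{i=1}^{n} \hat{y}_i
        = y + \frac{1}{n} \sum_{i=1}^{n} \varepsilon_i

\mathrm{Var}(\bar{y} - y)
  = \frac{1}{n^2} \left( n \sigma^2 + n (n - 1) \rho \sigma^2 \right)
  = \rho \sigma^2 + \frac{(1 - \rho)\, \sigma^2}{n}
```

For uncorrelated errors ($\rho = 0$) the error variance of the combined forecast falls to $\sigma^2 / n$, while for perfectly correlated errors ($\rho = 1$) combining brings no reduction. If the equal-variance assumption fails, a member with a much larger error variance can raise the variance of an equally weighted average instead of lowering it.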

Comparing the results of all forecasting models, we find that the hybrid based WCNN model forecasts better than the individual models. In addition, the results clearly show that all hybrid based models outperform the individual forecasting models. An interesting finding, however, is that including the RBNN in a hybrid based forecasting model decreases its performance, i.e. increases noise. This problem may be solvable and is left for future work.

The results of the trading strategy show that we get a better return if we develop the forecasting models based on direction accuracy, and that the modified trading strategy gives a better rate of return than the existing strategy. This observation suggests that the investor should trade in the stock market even when the profit is very small.

In addition, we find that the proposed modified technique for finding the number of hidden nodes works well for all the ANNs except the TDNN.

6.3 Conclusions regarding Research Question

The results of the trading systems created using the ANN models can be used to answer the research question posed at the start of the thesis.

Yes, ANNs can be used to develop an accurate forecasting model that can be used in trading systems to earn profit for the investor.

The other research questions are answered in the light of the thesis results.

1. RNN has better performance than all the other ANNs in forecasting.

2. The 18 variables used in the Non-Noisy Data Model are the potential input variables that affected the LSE over 2002-08.

3. Yes, international stock exchanges, currency exchange rates and other macroeconomic factors affect the LSE, as shown by the variables selected after the regression analysis.

4. The performance of the forecasting model improves by 28% when regression analysis is used in the factor selection.

5. Yes, technical indicators improve the performance of the forecasting model, as shown by the results of the regression analysis and of the individual forecasting models.

6. Levenberg-Marquardt gives better performance in the training of the ANN.

7, 8. The WCNN hybrid based forecasting models give better performance than all the other hybrid based models, and they perform better than the individual ANN forecasting models.

9. Yes, a forecasting model developed on the basis of percentage accuracy gives more accuracy than one developed on the basis of value accuracy.

10. Yes, a forecasting model with better performance in terms of accuracy increases the profit of the investor when applied to the trading strategy.

The various approaches that can be used in the construction of the ANN were shown in Chapter 3.

6.4 Implications for theory

It is clear that the outcomes of this thesis do not support the EMH. A great number of other researchers have shown evidence against the EMH, and this thesis adds to their results.

Azoff (1994) stated that an ANN with a high degree of predictive accuracy may not give better results when applied to a trading system. This point is supported by the studies of Thawornwong and Enke (2004) and Chande (1997). However, this study provides evidence against this statement, as the better forecasting model was also the most successful in the trading system, i.e. it gave the higher rate of return.

It is hoped that the forecasting model and the trading system development methodology presented in this thesis might encourage other academic researchers to pursue neural network research in finance. There is still a long way to go before a methodology can be defined for developing an accurate ANN forecasting model that predicts stock market pricing behavior. However, a great deal of depth can be added to existing methodologies for constructing ANNs by detecting and documenting the persistent anomalies that exist, as these will help to produce better forecasting models.

6.5 Future Research

This thesis has attempted to create an accurate forecasting model using ANN and then to apply the modified trading strategy to obtain a better rate of return. To some extent, it has been successful in meeting its objectives. However, we have not tested other computational intelligence techniques such as fuzzy logic, genetic algorithms and modern statistical methods, so we cannot say that ANN is the optimal choice for forecasting the stock market. The limitations of this model can be addressed by research on the following points:

1. In this study, we selected the best forecasting model (RNN) for the MCNN. This is a limitation of the study, as we cannot be sure that the previously best network parameters will also be best for other inputs. The performance of the model could be improved by varying the network parameters of the RNN. We could also try another ANN, which might give better performance. In addition, all the steps of the regression analysis and the technical indicators should be repeated, since the input variables are new and these processes can eliminate noise.

2. We restricted the moving averages to 5 and 10 days. They should be extended to 15, 20 or 30 days.

3. We should also try genetic algorithms for the factor selection.