A Stock Prediction System Computer Science Essay

Data mining can be described as "making better use of data". Predicting changes in the stock market has always had a certain appeal to researchers. Data mining is well founded on the theory that the historic data holds the essential memory for predicting the future direction. This technology is designed to help investors discover hidden patterns from the historic data that have probable predictive capability in their investment decisions.

The objective of our project is to design and implement a stock market prediction system that finds hidden patterns in a particular stock and supports the right investment decision. It is an attempt to improve the prediction of financial stock markets using data mining techniques.


The scope of the project is to apply the knowledge gained in the Data Mining & Pattern Recognition class and design a system that can predict the hidden trend of a stock. The term data mining is usually applied to techniques that can be used to find underlying structure and relationships in large amounts of data. Data mining techniques are being used successfully in many diverse applications, one of which is stock market prediction. The approach taken in this project is to combine several methods of analyzing stocks and use them to automatically generate a prediction of the stock's trend on the next day. The project requires the use of data mining techniques and, depending on the characteristics of the problem domain, the selection of features that are simple to extract, invariant and insensitive to noise. These features can prove helpful in making an optimal decision regarding the stock market trend.


There are a variety of techniques for predicting whether a particular area of investment will be profitable in the future. Data analysis mainly deals with feature extraction. Financial securities are generally evaluated by two methods, and the same two methods are followed when the objective of the analysis is to determine which stock to buy and at what price. Fundamental analysis and technical analysis are the two basic techniques used to analyze securities in the stock market.

Fundamental analysis: Fundamental analysis is concerned with an asset's intrinsic value and what its price should be. It involves studying the company physically in terms of its product sales, manpower, quality, infrastructure, etc., in order to understand its standing in the market and thereby its profitability as an investment. Fundamental analysis includes:

Economic analysis

Industry analysis

Company analysis

On the basis of these three analyses, the intrinsic value of the share is determined, and it is considered to be the true value of the share. The prediction signals in this analysis work as follows: if the intrinsic value is greater than the market price, buy the share; if it is equal to the market price, hold the share; and if it is less than the market price, sell the share.

Technical analysis: Technical analysis is a security analysis discipline for forecasting the direction of prices through the study of past market data, primarily price and volume. Technical analysis focuses more on pattern recognition. In contrast to fundamental analysis, it does not study the physical nature of the company. Rather, it evaluates securities based on their trends and movements in the market. It assumes that the market movement of stock prices is a reflection of the company's fundamental components. Therefore, to understand the company and its profitability through its stock prices in the market, some parameters need to be evaluated that can guide an investor in making a judicious decision. These parameters are called indicators and oscillators, for example the Percentage Price Oscillator (PPO), the Percentage Volume Oscillator (PVO) and the Relative Strength Index (RSI). When using technical analysis to predict stock values, a few assumptions are made:

It is assumed that market moves in trends.

History repeats itself, i.e. under similar kinds of inputs the stock values behave in a similar manner.

Prices have tendency to go with the trend rather than against it.


A stock is a certificate of proof of your ownership of a small fraction of a corporation. When we buy a stock, we are paying for a small percentage of everything that that company owns such as buildings, chairs, computers, etc. When we own a stock, we are referred to as a shareholder or a stockholder. In essence, a stock is a representation of the amount of a company that we own. The benefit of owning stock in a corporation is that whenever the corporation profits, you profit as well. A stock also gives you the right to make decisions that may influence the company.

There are different kinds of stock we can purchase:

Penny stocks: These are the lowest levels of stock. Penny stocks are small companies that have almost no chance of making it big, and they are usually of no value. These stocks could be a local chain of stores, or a company that does not provide anything desirable.

Growth stocks: These are stocks in new companies that have a lot of potential for success, but are not stable, and do not always become successful. Growth stocks provide the fastest earnings since the company is usually growing due to an advantage such as a new product. Growth stocks are not always a safe investment since they are not well established.

Blue chip stocks: These are the highest level of stocks you can buy. The older companies usually are blue chip, such as International Business Machines (IBM), AT&T, and Coca Cola. These stocks are the safest investment you can make, but they also take a lot more time to profit with.

Stock Market:

A stock market or equity market is a public entity for the trading of company stock (shares) and derivatives at an agreed price; these are securities listed on a stock exchange as well as those only traded privately. Stocks are listed and traded on stock exchanges, which are corporations or mutual organizations specialized in the business of bringing buyers and sellers of stocks and securities together.

Stock Market Index

The movements of the prices in a market or section of a market are captured in price indices called stock market indices e.g., the S&P, the FTSE and the Euronext indices. The constituents of the index are reviewed frequently to include/exclude stocks in order to reflect the changing business environment.

Buying and selling

The first step when buying stocks is to decide what company to buy stock in. You can buy stock in any publicly held corporation but cannot buy in a privately held or closely held corporation. There are different methods to choose a company for investment:

Fundamental analysis 

Technical analysis 

One popular method is just throwing darts at the stock page

After you decide what company to invest in, you need to find a broker. After you find a broker and buy the stocks, the broker does the rest of the work. You just have to call him up and place an order with him. A broker is the only person that can make an order to buy or sell stocks.

Stockbroker - researches investments, helps set goals and gives advice on investing.

Discount brokers - act only as middlemen in the transactions.

Floor brokers - do all the actual buying and selling.

There are also websites such as Scottrade on which you can buy and sell stocks yourself. Those who understand the stock market generally use these sites. Just as you would pay a broker, each trade has a fee on the site. One trade on average is $7.00.

Investment strategies

There are many different investment strategies; two basic methods are classified as either fundamental analysis or technical analysis. Fundamental analysis refers to analyzing companies by their financial statements found in SEC Filings, business trends, and general economic conditions, etc. Technical analysis studies price actions in markets through the use of charts and quantitative techniques to attempt to forecast price trends regardless of the company's financial prospects.


To define the movement of the prices of the stocks we need to understand the concepts of momentum of stocks and the trends that follow. Momentum, as the name suggests, reflects the amount of change experienced by the stock price over the previous period.

Momentum = close today − close N days ago
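As a rough illustration, the momentum formula above could be sketched in Python as follows. The function name `momentum` and the sample closing prices are illustrative assumptions, not part of the project code.

```python
def momentum(closes, n):
    """Return close[i] - close[i - n] for every day that has a close n days earlier.

    `closes` is a list of daily closing prices, oldest first (an assumed layout).
    """
    if n <= 0:
        raise ValueError("n must be a positive number of days")
    return [closes[i] - closes[i - n] for i in range(n, len(closes))]


# Made-up closing prices, used only to show the call.
sample_closes = [10, 11, 13, 12, 15]
print(momentum(sample_closes, 2))  # [3, 1, 2]
```

A positive value means the price is higher than it was N days ago, and a negative value means it is lower.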

Uptrend and Downtrend

"The trend is your friend" is a quote often used by traders, and the theory behind it is simple: it is perceived as easy to make money trading in the same direction as the trend. A trend is basically defined as the general direction of a market or of the price of an asset. In a trend, each successive peak or trough in a market's price chart is higher or lower than the preceding one.

A trend, however, reflects the kind of movement patterns that are observed in the values under similar market conditions. Without knowledge of these factors, predicting future stock values is very difficult. Broadly speaking, there are uptrends (when prices are going up) and downtrends (when prices are going down).

To determine the trend just follow the basic concept:

Uptrend: An uptrend shows a series of higher peaks (highs) and higher troughs (lows). To determine an uptrend, connect the low to the high, the high to the higher low, the higher low to the higher high, the higher high to the next higher low, and so on.

Downtrend: A downtrend shows a series of lower peaks and lower troughs. To determine a downtrend, connect the low to the high, the high to the lower low, the lower low to the lower high, the lower high to the next lower low, and so on.
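The definitions of uptrend and downtrend above can be sketched in code. This is a minimal sketch under stated assumptions: a peak or trough is taken to be a day whose price is strictly above or below both neighbouring days, and at least two peaks and two troughs are required before a trend is reported. The names `turning_points` and `classify_trend` are hypothetical.

```python
def turning_points(prices):
    """Collect local peaks and troughs (assumed: strict comparison with both neighbours)."""
    peaks, troughs = [], []
    for i in range(1, len(prices) - 1):
        if prices[i] > prices[i - 1] and prices[i] > prices[i + 1]:
            peaks.append(prices[i])
        elif prices[i] < prices[i - 1] and prices[i] < prices[i + 1]:
            troughs.append(prices[i])
    return peaks, troughs


def rising(values):
    return all(a < b for a, b in zip(values, values[1:]))


def falling(values):
    return all(a > b for a, b in zip(values, values[1:]))


def classify_trend(prices):
    """Return 'uptrend', 'downtrend', 'sideways' or 'undetermined'."""
    peaks, troughs = turning_points(prices)
    if len(peaks) < 2 or len(troughs) < 2:
        return "undetermined"  # too few turning points to compare
    if rising(peaks) and rising(troughs):
        return "uptrend"       # higher peaks and higher troughs
    if falling(peaks) and falling(troughs):
        return "downtrend"     # lower peaks and lower troughs
    return "sideways"


# Made-up prices, used only to show the call.
print(classify_trend([1, 3, 2, 4, 3, 5, 4, 6]))  # uptrend
```

On real data the raw prices would probably need smoothing first, since daily noise produces many small peaks and troughs.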


Data preprocessing is a major step of the development process. This step transforms the data into its final form for input to a model. Preprocessing is often the key to developing a highly successful pattern recognition system. Typical preprocessing steps include normalizing the data and reducing the number of input variables. One of the primary reasons for preprocessing data is that it can often reduce noise and inconsistency and enhance the signal.

In the preprocessing phase, we select and extract intrinsic features that represent the properties of one class and distinguish it from the other classes. The main goal of preprocessing is to transform the raw data into a simpler representation for the design of the decision-making system.

There are many features that can be extracted and used in a stock market prediction system. One of them is that when the stock price moves sharply up or down, the trading volume also changes markedly, which would indicate either a buying or a selling signal. In our project, we have selected the most important and common features, which are directly related to the price and the volume of the stock.

In our system, we use a moving average to smooth the data whenever required. The PVO, which is a volume indicator, is also used to provide stock volume information.

In the Stock Prediction system we have chosen three Indicators or oscillators:

Percentage Price Oscillator (Moving average)

Percentage Volume Oscillator

Relative Strength Index

Percentage Price Oscillator:

The Percentage Price Oscillator (PPO) is a momentum oscillator that measures the difference between two moving averages as a percentage of the longer moving average. It represents two moving averages relative to each other. Similar to Moving Average Convergence-Divergence (MACD), the PPO is shown with a signal line, a histogram and a centerline. It generates a buy signal when the shorter-term moving average crosses above the longer-term moving average. Signals are generated by signal line crossovers, centerline crossovers and divergences. These signals are no different from those associated with MACD; however, there are a few differences between the two indicators. First, PPO readings are not subject to the price level of the security. Second, PPO readings for different securities can be compared even when there are large differences in price.

PPO uses the concept of moving average:

Moving average: Moving averages smooth the price data to form a trend following indicator. They do not predict price direction, but rather define the current direction with a lag. Moving averages lag because they are based on past prices. Despite this lag, moving averages help smooth price action and filter out the noise. They also form the building blocks for many other technical indicators and overlays, such as Bollinger Bands, MACD etc. The two most popular types of moving averages are the Simple Moving Average (SMA) and the Exponential Moving Average (EMA).

SMA Calculation:

A simple moving average is formed by calculating the average price of a stock over a specific number of periods. Most moving averages are based on closing prices. A 5-day simple moving average is the five-day sum of closing prices divided by five. As its name implies, a moving average is an average that moves. Old data is dropped as new data becomes available. This causes the average to move along the time scale. Below is an example of a 5-day moving average evolving over three days:

Daily Closing Prices: 11,12,13,14,15,16,17

First day of 5-day SMA: (11 + 12 + 13 + 14 + 15) / 5 = 13

Second day of 5-day SMA: (12 + 13 + 14 + 15 + 16) / 5 = 14

Third day of 5-day SMA: (13 + 14 + 15 + 16 + 17) / 5 = 15

The first day of the moving average simply covers the last five days. The second day of the moving average drops the first data point (11) and adds the new data point (16). The third day of the moving average continues by dropping the first data point (12) and adding the new data point (17). In the example above, prices gradually increase from 11 to 17 over a total of seven days. Notice that the moving average also rises from 13 to 15 over a three day calculation period. Also notice that each moving average value is just below the last price. For example, the moving average for day one equals 13 and the last price is 15. Prices the prior four days were lower and this causes the moving average to lag.
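The worked example above can be reproduced with a few lines of Python. This is only a sketch; the function name `sma` is an assumption, and the input is the same list of seven closing prices used in the example.

```python
def sma(values, period):
    """Simple moving average: one value per full window of `period` prices."""
    if period <= 0:
        raise ValueError("period must be positive")
    return [
        sum(values[i - period:i]) / period
        for i in range(period, len(values) + 1)
    ]


closes = [11, 12, 13, 14, 15, 16, 17]
print(sma(closes, 5))  # [13.0, 14.0, 15.0]
```

Each output value corresponds to the last day of its window, which is why every average sits below the most recent price while prices are rising.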

EMA Calculation

Exponential moving averages reduce the lag by applying more weight to recent prices. The weighting applied to the most recent price depends on the number of periods in the moving average. There are three steps to calculate an exponential moving average. First, calculate the simple moving average. An exponential moving average (EMA) has to start somewhere so a simple moving average is used as the previous period's EMA in the first calculation. Second, calculate the weighting multiplier. Third, calculate the exponential moving average. The formula below is for a 10-day EMA.

SMA: 10 period sum / 10

Multiplier: (2 / (Time periods + 1) ) = (2 / (10 + 1) ) = 0.1818 (18.18%)

EMA: {Close - EMA(previous day)} x multiplier + EMA(previous day).

A 10-period exponential moving average applies an 18.18% weighting to the most recent price. A 10-period EMA can also be called an 18.18% EMA. A 20-period EMA applies a 9.52% weighting to the most recent price (2/(20+1) = .0952). Notice that the weighting for the shorter time period is more than the weighting for the longer time period. In fact, the weighting drops by roughly half every time the moving average period doubles.
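The three steps described above (seed with a simple moving average, compute the multiplier, then apply the recursive formula) might be sketched as follows. The function name `ema` and the sample data are assumptions made for illustration.

```python
def ema(values, period):
    """Exponential moving average seeded with the SMA of the first `period` values.

    Returns one value per day from day `period` onwards, or [] if there is
    not enough data to form the seed.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    if len(values) < period:
        return []
    multiplier = 2 / (period + 1)           # e.g. 2 / (10 + 1) = 0.1818 for 10 days
    result = [sum(values[:period]) / period]  # step 1: the SMA is the first "previous EMA"
    for close in values[period:]:
        previous = result[-1]
        result.append((close - previous) * multiplier + previous)
    return result


# Ten flat closes followed by one jump, chosen so the arithmetic is easy to follow:
# seed = 10, then (21 - 10) * 2/11 + 10 = 12.
print(ema([10] * 10 + [21], 10))
```

The jump from 10 to 21 moves the 10-day EMA by only about 18.18% of the difference, which is the weighting discussed above.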

PPO Calculation:

To calculate the PPO, subtract the 26-day Exponential Moving Average (EMA) from the twelve-day Exponential Moving Average EMA, and then divide this difference by the 26-day EMA. The end result is a percentage that tells the trader where the short-term average is relative to the longer-term average. 


Percentage Price Oscillator : {(12-day EMA - 26-day EMA)/26-day EMA} x 100

Signal Line: 9-day EMA of PPO

PPO Histogram: PPO - Signal Line

The PPO histogram shows the difference between the PPO and the 9-day exponential moving average (EMA) of the PPO. An increase in the PPO histogram indicates that bullish momentum is strengthening, and a decline indicates that bearish momentum is strengthening. A crossover above or below the centerline indicates that the shorter moving average has crossed above or below the longer moving average, which is a buy or sell signal respectively.
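Putting the PPO, signal line and histogram formulas together gives a sketch like the one below. It assumes the default (12, 26, 9) settings from the text and EMAs seeded with a simple moving average; the function names and the way the two EMA series are aligned by date are assumptions of this sketch, not the project's actual code.

```python
def ema(values, period):
    """EMA seeded with the SMA of the first `period` values ([] if too little data)."""
    if len(values) < period:
        return []
    k = 2 / (period + 1)
    out = [sum(values[:period]) / period]
    for v in values[period:]:
        out.append((v - out[-1]) * k + out[-1])
    return out


def ppo(closes, fast=12, slow=26, signal=9):
    """Return (ppo_line, signal_line, histogram) as lists, oldest value first."""
    # The fast EMA starts (slow - fast) days earlier than the slow EMA,
    # so drop its first values to line the two series up by date.
    fast_ema = ema(closes, fast)[slow - fast:]
    slow_ema = ema(closes, slow)
    line = [(f - s) / s * 100 for f, s in zip(fast_ema, slow_ema)]
    signal_line = ema(line, signal)
    # The signal line only exists from the `signal`-th PPO value onwards.
    histogram = [p - s for p, s in zip(line[signal - 1:], signal_line)]
    return line, signal_line, histogram


# Made-up, steadily rising closes: the 12-day EMA stays above the 26-day EMA,
# so every PPO value is positive.
line, signal_line, histogram = ppo([float(p) for p in range(1, 61)])
print(line[-1] > 0, len(line), len(signal_line), len(histogram))
```

Because `ppo` only needs a list of numbers, the same function applied to daily volumes instead of closing prices yields the Percentage Volume Oscillator.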

Percentage Volume Oscillator

The Percentage Volume Oscillator (PVO) is a momentum oscillator for volume. PVO measures the difference between two volume-based moving averages as a percentage of the longer moving average. As with MACD and the Percentage Price Oscillator (PPO), it is shown with a signal line, a histogram and a centerline. PVO is positive when the shorter volume EMA is above the longer volume EMA and negative when the shorter volume EMA is below. This indicator can be used to define the ups and downs for volume, which can then be used to confirm or refute other signals.

Percentage Volume Oscillator (PVO):

((12-day EMA of Volume - 26-day EMA of Volume)/26-day EMA of Volume) x 100

Signal Line: 9-day EMA of PVO

PVO Histogram: PVO - Signal Line

The default settings for the PVO are (12,26,9), which is the same as MACD or the PPO. This means PVO is positive when the 12-day Volume EMA moves above the 26-day Volume EMA. PVO is negative when the 12-day Volume EMA moves below the 26-day Volume EMA. The PVO-Histogram acts just like the MACD and PPO histograms. The PVO-Histogram is positive when the PVO is trading above its signal line (9-day EMA). The PVO-Histogram is negative when the PVO is below its signal line. Note that the PVO is multiplied by 100 to move the decimal point two places.

Relative Strength Index (RSI):

Relative Strength Index (RSI) is a momentum oscillator that measures the speed and change of price movements. RSI oscillates between zero and 100. RSI is an extremely popular momentum indicator that has been featured in a number of articles, interviews and books over the years. 



RSI = 100 - 100 / (1 + RS)

RS = Average Gain / Average Loss

To simplify the explanation of the calculation, RSI has been broken down into its basic components: RS, Average Gain and Average Loss. This RSI calculation is based on 14 periods. Losses are expressed as positive values, not negative values.

The very first calculations for average gain and average loss are simple 14 period averages.

First Average Gain = Sum of Gains over the past 14 periods / 14.

First Average Loss = Sum of Losses over the past 14 periods / 14

The second and subsequent calculations are based on the prior averages and the current gain and loss:

Average Gain = [(previous Average Gain) x 13 + current Gain] / 14.

Average Loss = [(previous Average Loss) x 13 + current Loss] / 14.

Combining the prior value with the current value is a smoothing technique similar to that used in the exponential moving average calculation. This also means that RSI values become more accurate as the calculation period extends. SharpCharts uses at least 250 data points prior to the starting date of any chart (assuming that much data exists) when calculating its RSI values. To exactly replicate the SharpCharts RSI numbers, a formula will need at least 250 data points.

Wilder's formula normalizes RS and turns it into an oscillator that fluctuates between zero and 100. In fact, because RSI is an increasing function of RS, a plot of RS rises and falls at the same points as a plot of RSI. The normalization step makes it easier to identify extremes because RSI is range bound. RSI is 0 when the Average Gain equals zero. Assuming a 14-period RSI, a zero RSI value means prices moved lower all 14 periods. There were no gains to measure. RSI is 100 when the Average Loss equals zero. This means prices moved higher all 14 periods. There were no losses to measure.
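The RSI steps described in this section (first averages, smoothed averages, RS, then RSI) could be sketched as follows. The function names are assumptions, and returning 100 when the average loss is zero follows the description above; applying the same value to a completely flat price series is a convention chosen for this sketch.

```python
def rsi_value(avg_gain, avg_loss):
    """Turn the two averages into an RSI reading between 0 and 100."""
    if avg_loss == 0:
        return 100.0  # no losses to measure (also used for a flat series, by assumption)
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


def rsi(closes, period=14):
    """Return one RSI value per day, starting once `period` price changes exist."""
    changes = [today - prev for prev, today in zip(closes, closes[1:])]
    if len(changes) < period:
        return []
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]   # losses as positive values

    # First averages: simple averages over the first `period` changes.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    result = [rsi_value(avg_gain, avg_loss)]

    # Subsequent averages: prior average x (period - 1) plus the current value.
    for gain, loss in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + gain) / period
        avg_loss = (avg_loss * (period - 1) + loss) / period
        result.append(rsi_value(avg_gain, avg_loss))
    return result


# Made-up series: prices rising for 14 straight periods give an RSI of 100.
print(rsi(list(range(1, 16))))  # [100.0]
```

With equal average gains and losses RS is 1, so the reading sits at the midpoint of 50.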



There are many approaches that can be considered when designing a system. One approach is to use the samples to estimate the probability densities. The other approach is to assume that the structure of the probability density is known, so that only the parameters of the function need to be estimated.

We carried out extensive research and brainstorming to come up with an approach that allows the stock prediction system to make the optimal decision. In addition, we conducted a literature survey, in which we learned the basic terminology related to the stock market and the behavior of the stock market, and found out how other prediction systems are implemented.

Building Ground Truth Rules:

The ground truth we followed in the stock market prediction system is:

The system design has two major parameters. The first is the period, which sets the range of time over which the system performs its analysis. The second is the pre-defined difference between the maximum price and the minimum price.

For each period, the system calculates the minimum price and the maximum price of the particular stock, which leads to two cases:

1) If the day of the minimum is before the day of the maximum, label the day of the minimum BUY and the day of the maximum SELL.

1.1) Then find the day whose price is closest to the minimum, and similarly the day whose price is closest to the maximum, subject to the condition that their price range = (max - min) x (100% - ratio of difference) / 2.

The other days in that period are set to HOLD. Continue to the next period, starting at the day after the day of the maximum price.

2) If the day of the minimum is after the day of the maximum, there are two cases:

2.1) Find another minimum day before the maximum day. If it exists, label that day BUY and the day of the maximum SELL. Proceed to 1.1.

2.2) Otherwise, find another minimum before another maximum in this period, then go to 1.1.
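One possible reading of the labelling rules above is sketched below. It is a simplified interpretation, not the project's implementation: step 1.1 (the extra days close to the minimum or maximum) is left out, ties are resolved by taking the earliest day, and a period in which prices only fall is left entirely as HOLD. The function names `label_period` and `label_series` are hypothetical.

```python
def label_period(prices):
    """Label one period; return (labels, index of the SELL day or None)."""
    labels = ["HOLD"] * len(prices)
    start = 0
    while start < len(prices) - 1:
        # Day of the maximum among the days still under consideration.
        hi = max(range(start, len(prices)), key=lambda i: prices[i])
        if hi > start:
            # Cases 1 and 2.1: lowest price on a day before the maximum.
            lo = min(range(start, hi), key=lambda i: prices[i])
            labels[lo] = "BUY"
            labels[hi] = "SELL"
            return labels, hi
        # Case 2.2: no day precedes the maximum, so look for a later maximum.
        start += 1
    return labels, None


def label_series(prices, period):
    """Label a whole series; each new period starts the day after the last SELL."""
    labels = ["HOLD"] * len(prices)
    start = 0
    while start < len(prices):
        window_labels, sell = label_period(prices[start:start + period])
        if sell is None:
            start += period  # nothing to trade in this period (assumed behaviour)
            continue
        for offset, label in enumerate(window_labels[:sell + 1]):
            labels[start + offset] = label
        start += sell + 1
    return labels


# Made-up prices, used only to show the call.
print(label_series([5, 3, 4, 8, 6, 7, 2, 9], period=4))
```

For the sample series the sketch buys at 3 and sells at 8 in the first period, then buys at 2 and sells at 9 in the second.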

The project has undergone mainly three phases:

Phase 1: Initial Phase:

In our system design we have employed several approaches. The initial approach is based on a very basic idea: it only compares the stock price with the prices on the previous and following days. If the stock price is greater than on the previous day, the system issues a sell signal. If it is lower than on the next day, the system issues a buy signal. This approach resulted in inaccurate decisions because it could not correctly predict the trend.

Phase 2: Extended Period.

In the next approach, we extended the period over which the algorithm compares the stock price.


Phase 3: Decision:

Our current approach tries to exploit more knowledge in order to get more accurate results than the initially implemented approaches. It tries to find the lowest stock price and issues a buy signal there. The algorithm then tries to find the next highest point during a specific period and issues a sell signal.


There are two types of approach being used in this project.

Histogram Approach to estimate probability density

Gaussian Classifier

Histogram approach: A histogram is a graphical display of tabulated frequencies. It represents the number of sample events that occurred in a defined historical time period or space.
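To show how a histogram can serve as a probability density estimate, here is a minimal sketch for a single feature. The bin range, the number of bins and the function name `histogram_density` are assumptions chosen for illustration; the sample values are made up.

```python
def histogram_density(samples, low, high, bins):
    """Estimate a density on [low, high] using `bins` equal-width bins.

    Each bin height is count / (total * bin width), so the heights
    times the bin width sum to 1.
    """
    width = (high - low) / bins
    counts = [0] * bins
    for x in samples:
        if low <= x <= high:
            # The top edge falls into the last bin.
            index = min(int((x - low) / width), bins - 1)
            counts[index] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * bins
    return [count / (total * width) for count in counts]


# Made-up feature values in [0, 1], split evenly between two bins.
print(histogram_density([0.1, 0.2, 0.6, 0.9], low=0.0, high=1.0, bins=2))  # [1.0, 1.0]
```

In a classifier, one such histogram would be built per class, and a new sample would be assigned to the class whose histogram gives it the highest density, weighted by how common that class is.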

Gaussian classifier: Another classifier that we have used in the project is Gaussian Classifier. The unimodal Gaussian algorithm is a relatively simple parametric model for pattern classification. The basis for this model relies on the assumption that the probability distribution for input vectors of each class is Gaussian.
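A unimodal Gaussian classifier for a single feature, such as the PPO value, could look like the sketch below. The class labels BUY, SELL and HOLD are taken from the ground truth rules; the training values, the function names and the small variance floor are assumptions made for this illustration.

```python
import math


def fit_gaussians(samples_by_class):
    """Estimate (mean, variance, prior) for each class from 1-D samples."""
    total = sum(len(values) for values in samples_by_class.values())
    model = {}
    for label, values in samples_by_class.items():
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        variance = max(variance, 1e-9)  # assumed floor to avoid division by zero
        model[label] = (mean, variance, len(values) / total)
    return model


def log_score(x, mean, variance, prior):
    """log(prior) + log of the Gaussian density at x."""
    return (
        math.log(prior)
        - 0.5 * math.log(2 * math.pi * variance)
        - (x - mean) ** 2 / (2 * variance)
    )


def classify(model, x):
    """Pick the class with the highest prior-weighted Gaussian likelihood."""
    return max(model, key=lambda label: log_score(x, *model[label]))


# Made-up PPO-like feature values per class, for illustration only.
training = {
    "BUY": [-2.0, -1.5, -1.0],
    "HOLD": [-0.2, 0.0, 0.2],
    "SELL": [1.0, 1.5, 2.0],
}
model = fit_gaussians(training)
print(classify(model, -1.6), classify(model, 0.05), classify(model, 1.4))
```

Working with log scores instead of raw densities avoids numerical underflow when a sample lies far from a class mean.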

KNN classifier

Testing Results and Discussion

With PPO feature only using Gaussian classifier

[Results for HP stock and Wal-Mart stock: content missing in source]

KNN classifier

[Results: content missing in source]


The stock market is among the most volatile institutions in the financial world, and this volatility tends to be its biggest problem. Almost any reason, real or imagined, can cause extreme fluctuations that often affect the stock market's credibility. Real factors such as the weather, political instability, political decisions, war, terrorist threats, boycotts and strikes, economic trends, international trade and even company scandals all contribute to the stock market's problems. Bad weather such as hurricanes affects certain industries, such as oil production, which in turn affects stock prices. It was therefore a challenge to determine which indicators and input data to use, and to gather enough training data to train the system appropriately.


Although we have designed a system, there are still many improvements to be made before the system gives optimal and profitable decisions.

Improve the training data so that future training data gives good signals

Grab the data feed directly from a source such as Yahoo Finance

Find more features


The project was aimed at finding the optimal decision for the prediction of the stock market trend.


[1] K. Senthamarai Kannan, P. Sailapathi Sekar, M. Mohamed Sathik and P. Arumugam (2010), "Financial Stock Market Forecast using Data Mining Techniques"

[2] Samarth Agrawal, Manoj Jindal and G. N. Pillai (2010), "Momentum Analysis based Stock Market Prediction using Adaptive Neuro-Fuzzy Inference System (ANFIS)"

[3] http://stockcharts.com/school/doku.php?id=chart_school:technical_indicators

[4] http://finance.yahoo.com