This paper focuses on market trend prediction for stocks using a hybrid decision tree algorithm. Predicting stock market trends has long been of great interest both to those who wish to profit by trading stocks and to researchers attempting to uncover the hidden information in stock market data. A decision tree is constructed for the stock market data set, and fuzzy logic and rough set techniques are then applied to predict the future trend of the market. The two techniques are compared, and the better predicted value is selected as the percentage increase or decrease in the stock market. This can help during an economic crisis and gives investors a rough idea of the market's growth trend.
Keywords: data mining, decision tree, fuzzy entropy techniques, rough set based techniques.
This work presents the design and performance evaluation of a hybrid decision tree, rough set and fuzzy based system for predicting the next day's trend in the stock exchange. Technical indicators are used in the present study to extract features from the historical SENSEX data. A C4.5 decision tree is then used to select the relevant features, and rough-fuzzy hybridization, a method of hybrid intelligent systems or soft computing, is applied: fuzzy set theory is used for linguistic representation of patterns, leading to a fuzzy granulation of the feature space, and rough set theory is used to obtain dependency rules that model informative regions in the granulated feature space. The performance of the hybrid rough set based system is compared to that of a naive Bayes based trend predictor.
Decision trees have been widely used in the prediction of financial time series, where they have typically given about 70 to 80% correct predictions. Because this output is an incomplete prediction of the probabilities, techniques such as fuzzy and rough sets have been used with the aim of reaching 99% correct prediction of the stock exchange, for the future benefit of a company's finances. In earlier work, technical indicators and neural networks were used to predict US Dollar vs. British Pound exchange rates. A framework for intelligent interaction of automatic trading algorithms with the user has been presented. A back propagation neural network has been employed to predict the buy/sell points for a stock, with a case based dynamic window then applied to further improve the forecast accuracy. A survey of more than a hundred articles that used neural networks and neuro-fuzzy models for predicting stock markets observed that soft computing techniques outperform conventional models in most cases; defining the structure of the model is, however, a major issue and a matter of trial and error. A review of data mining applications in stock markets has also been presented. Other work used a two-layer bias decision tree with technical indicator features to create a decision rule that recommends when to buy a stock and when not to buy it, combined the filter rule and the decision tree technique for stock trading, and presented a hybrid model for stock market forecasting and portfolio selection. That hybrid model included fuzzy entropy and rough set indicators, which calculated the difference in the outputs and gave a higher prediction value for when to buy or sell a stock or when the stock value rises.
DESIGN OF THE HYBRID SYSTEM
The hybrid trend prediction system proposed in this paper works as follows. First, features are extracted from the daily stock market data, which has already been collected and stored. The relevant features are then selected using a decision tree. A fuzzy entropy and rough set based classifier is then used to predict the next day's trend from the selected features. In the present study, the prediction accuracy of the proposed hybrid system is evaluated and validated using the Stock Exchange Sensitive Index (commonly, SENSEX) data. The daily SENSEX data (open, high, low, close and volume) from September 3, 2003 to March 7, 2010 (a total of 1625 SE working days) is considered for this purpose.
A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm. Decision trees are commonly used in operations research, specifically in decision analysis, to help identify a strategy most likely to reach a goal. Decision tree learning is a method commonly used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables.
In data mining, decision trees can also be described as the combination of mathematical and computational techniques to aid the description, categorization and generalization of a given set of data.
Data comes in records of the form:

$$(\mathbf{x}, Y) = (x_1, x_2, x_3, \ldots, x_k, Y)$$

The dependent variable, $Y$, is the target variable that we are trying to understand, classify or generalize. The vector $\mathbf{x}$ is composed of the input variables, $x_1, x_2, x_3$ etc., that are used for that task.
Attribute selection measure for decision tree: the Gain Ratio
An attribute selection measure (or a splitting rule) is a heuristic for selecting the splitting criterion that best separates a given data partition, D, of class-labeled training tuples into individual classes. It provides a ranking for each attribute describing the given training tuples. The attribute having the best score for the measure is chosen as the splitting attribute for the given tuples. The tree node created for partition D is labeled with the splitting criterion, branches are grown for each outcome of the criterion, and the tuples are partitioned accordingly.
C4.5 uses the gain ratio as its attribute selection measure. It is defined as:

$$\mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}_A(D)}$$
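$\mathrm{SplitInfo}_A(D)$ represents the potential information generated by splitting the training data set, $D$, into $v$ partitions $D_1, \ldots, D_v$, corresponding to the $v$ outcomes of a test on attribute $A$:

$$\mathrm{SplitInfo}_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \log_2 \frac{|D_j|}{|D|}$$

The attribute with the maximum gain ratio is selected as the splitting attribute.
$\mathrm{Info}(D)$, also known as the entropy of $D$, is the average amount of information needed to identify the class label of a tuple in $D$, where $p_i$ is the proportion of tuples in $D$ that belong to class $i$ of $m$ classes:

$$\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2 p_i$$

$\mathrm{Info}_A(D)$ is the expected information required to classify a tuple from $D$ based on the partitioning by $A$. The smaller the expected information still required, the greater the purity of the partitions:

$$\mathrm{Info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \, \mathrm{Info}(D_j)$$

$\mathrm{Gain}(A)$, i.e. the information gain, is defined as the difference between the original information requirement (i.e., based on just the proportion of classes) and the new requirement (i.e., obtained after partitioning on $A$):

$$\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D)$$

When information gain alone is used as the measure, the attribute $A$ with the highest $\mathrm{Gain}(A)$ is chosen as the splitting attribute at node $N$; C4.5 normalizes this gain by the split information, which gives the gain ratio.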
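The quantities described above (the entropy Info(D), the expected information after a split, the information gain and the split information) can be computed directly for a discrete attribute. The following is a minimal sketch rather than the implementation used in this study; the function names and the toy attribute values are hypothetical and only illustrate the calculation.

```python
from collections import Counter
from math import log2


def info(labels):
    """Info(D): entropy (in bits) of the class labels in a partition."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of one discrete attribute.

    values[i] is the attribute value of tuple i, labels[i] its class label.
    """
    total = len(labels)
    # Partition the class labels by attribute value: D_1, ..., D_v.
    partitions = {}
    for value, label in zip(values, labels):
        partitions.setdefault(value, []).append(label)

    info_a = sum(len(p) / total * info(p) for p in partitions.values())
    split_info = -sum(
        len(p) / total * log2(len(p) / total) for p in partitions.values()
    )
    gain = info(labels) - info_a
    # An attribute with a single value has zero split information; score it 0.
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: one informative and one uninformative attribute.
trend = ["Up", "Up", "Down", "Down", "Down", "Up"]
indicator = ["high", "high", "low", "low", "low", "high"]
noise = ["a", "b", "a", "b", "a", "b"]

print(gain_ratio(indicator, trend))
print(gain_ratio(noise, trend))
```

On this toy data the attribute that separates the two classes perfectly scores much higher than the uninformative one, so it would be chosen as the splitting attribute.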
In physics, the word entropy carries an important physical meaning as the amount of "disorder" of a system, while in mathematics a more abstract definition is used: entropy is a measure of probabilistic uncertainty. The concept of entropy has penetrated a wide range of disciplines, such as statistical mechanics, business, pattern recognition, transportation, information theory, queuing theory, and linear and nonlinear programming.
The concept of a fuzzy set was introduced by Zadeh via the membership function in 1965. In order to measure a fuzzy event, Zadeh proposed the concept of possibility measure in 1978. Fuzzy entropy is a measure of uncertainty and has been studied by many researchers such as Lao and Yager. A survey on fuzzy entropy can be found in Pal and Bezdek. In particular, when $\xi$ is a fuzzy set taking values $x_i$ with membership degrees $\mu_i$, $i = 1, 2, \ldots, n$, respectively, De Luca and Termini defined its entropy as

$$H(\xi) = \sum_{i=1}^{n} S(\mu_i), \qquad S(t) = -t \ln t - (1 - t) \ln(1 - t).$$
It is easy to verify that the function S(t) is symmetrical about t = 0.5, strictly increases on the interval [0, 0.5], strictly decreases on the interval [0.5, 1], and reaches its unique maximum ln 2 at t = 0.5. The entropy of De Luca and Termini characterizes the uncertainty resulting primarily from linguistic vagueness rather than from information deficiency, and vanishes when the fuzzy variable is an equipossible one. However, we hope that the entropy is 0 when the fuzzy variable degenerates to a crisp number, and is maximum when the fuzzy variable is an equipossible one.
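The properties of S(t) listed above are easy to check numerically. The sketch below assumes the usual De Luca and Termini form, S(t) = -t ln t - (1 - t) ln(1 - t), with the entropy of a fuzzy set taken as the sum of S over its membership degrees; the function names are hypothetical.

```python
from math import log


def s(t):
    """S(t) = -t ln t - (1 - t) ln(1 - t), with 0 ln 0 taken as 0."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("membership degree must lie in [0, 1]")
    if t == 0.0 or t == 1.0:
        return 0.0
    return -t * log(t) - (1.0 - t) * log(1.0 - t)


def de_luca_termini_entropy(memberships):
    """Entropy of a fuzzy set given the membership degrees of its values."""
    return sum(s(mu) for mu in memberships)


print(s(0.5), log(2))                                # maximum of S is ln 2
print(de_luca_termini_entropy([1.0, 1.0, 1.0]))      # equipossible case
print(de_luca_termini_entropy([0.5, 0.5, 0.5]))      # most vague case
```

A set whose membership degrees are all 0 or 1 has zero entropy under this definition, which is the behaviour the paragraph above describes for the equipossible case.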
Rough set based trend prediction
Rough sets are extremely useful in dealing with incomplete or imperfect knowledge. A rough set uses the concepts of lower and upper approximations to classify the domain of interest into disjoint categories. The lower approximation is a description of the domain objects that are known with certainty to belong to the subset of interest, whereas the upper approximation is a description of the objects that possibly belong to the subset.
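The lower and upper approximations can be illustrated with a small sketch. It assumes the standard construction in which objects with identical values on the chosen attributes are indiscernible; the attribute names and the five example days are hypothetical and not taken from the SENSEX data.

```python
def approximations(objects, attributes, target):
    """Return the (lower, upper) approximations of `target`.

    objects: dict mapping an object name to a dict of attribute values.
    attributes: attribute names used to decide indiscernibility.
    target: set of object names forming the subset of interest.
    """
    # Group objects that cannot be told apart on the chosen attributes.
    classes = {}
    for name, row in objects.items():
        key = tuple(row[a] for a in attributes)
        classes.setdefault(key, set()).add(name)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:      # every member certainly belongs to target
            lower |= block
        if block & target:       # at least one member belongs to target
            upper |= block
    return lower, upper


# Hypothetical days described by two discretized indicators.
days = {
    "d1": {"momentum": "pos", "volume": "high"},
    "d2": {"momentum": "pos", "volume": "high"},
    "d3": {"momentum": "pos", "volume": "low"},
    "d4": {"momentum": "pos", "volume": "low"},
    "d5": {"momentum": "neg", "volume": "low"},
}
up_days = {"d1", "d2", "d3"}  # days that were followed by an upward trend

lower, upper = approximations(days, ["momentum", "volume"], up_days)
print(sorted(lower))  # ['d1', 'd2']
print(sorted(upper))  # ['d1', 'd2', 'd3', 'd4']
```

Days d3 and d4 look identical on both indicators but had different outcomes, so they fall in the upper approximation only: they possibly, but not certainly, belong to the set of upward days.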
Trend Predictor Techniques for Rough set
A naive Bayes based classifier or prediction system is much easier to design than a neural network based or rough set based classifier. Hence, the naive Bayes based prediction system is designed and evaluated to showcase the improvement in accuracy brought about by the neural network based and rough set based prediction systems. Bayesian classifiers are statistical classifiers. They can predict class membership probabilities, such as the probability that a given tuple belongs to a particular class. Naïve Bayesian classifiers assume class conditional independence, that is, the effect of an attribute value on a given class is considered independent of the values of the other attributes. This assumption is made to simplify the computations involved and, in this sense, is considered "naïve."
Abstractly, the probability model for a classifier is a conditional model

$$p(C \mid F_1, \ldots, F_n)$$

over a dependent class variable $C$ with a small number of outcomes or classes, conditional on several feature variables $F_1$ through $F_n$. The problem is that if the number of features $n$ is large or when a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.
Using Bayes' theorem, we write

$$p(C \mid F_1, \ldots, F_n) = \frac{p(C)\, p(F_1, \ldots, F_n \mid C)}{p(F_1, \ldots, F_n)}$$

In practice we are only interested in the numerator of that fraction, since the denominator does not depend on $C$ and the values of the features $F_i$ are given, so that the denominator is effectively constant. The numerator is equivalent to the joint probability model

$$p(C, F_1, \ldots, F_n)$$

which can be rewritten as follows, using the chain rule for repeated applications of the definition of conditional probability:

$$p(C, F_1, \ldots, F_n) = p(C)\, p(F_1 \mid C)\, p(F_2 \mid C, F_1) \cdots p(F_n \mid C, F_1, \ldots, F_{n-1})$$

Now the "naive" conditional independence assumptions come into play: assume that each feature $F_i$ is conditionally independent of every other feature $F_j$ for $j \neq i$. This means that

$$p(F_i \mid C, F_j) = p(F_i \mid C)$$

for $i \neq j$, and so the joint model can be expressed as

$$p(C, F_1, \ldots, F_n) = p(C) \prod_{i=1}^{n} p(F_i \mid C)$$

This means that under the above independence assumptions, the conditional distribution over the class variable $C$ can be expressed like this:

$$p(C \mid F_1, \ldots, F_n) = \frac{1}{Z}\, p(C) \prod_{i=1}^{n} p(F_i \mid C)$$

where $Z$ (the evidence) is a scaling factor dependent only on $F_1, \ldots, F_n$, i.e., a constant if the values of the feature variables are known.
Models of this form are much more manageable, since they factor into a so-called class prior $p(C)$ and independent probability distributions $p(F_i \mid C)$. If there are $k$ classes and if a model for each $p(F_i \mid C = c)$ can be expressed in terms of $r$ parameters, then the corresponding naive Bayes model has $(k - 1) + n r k$ parameters. In practice, $k = 2$ (binary classification) and $r = 1$ (Bernoulli variables as features) are common, and so the total number of parameters of the naive Bayes model is $2n + 1$, where $n$ is the number of binary features used for classification and prediction.
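The model parameters, that is, the class prior and the feature distributions, are estimated from the training data. To predict the class label, $p(C \mid F_1, \ldots, F_n)$ is evaluated for each class $C$, and the predicted class should be the one with the highest value:

$$\hat{c} = \arg\max_{c} \; p(C = c) \prod_{i=1}^{n} p(F_i = f_i \mid C = c)$$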
In the present study three classes are considered: C1 = Up, C2 = Down, C3 = No trend. Each F is the set of all attributes for one day.
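A naive Bayes trend predictor over these three classes can be sketched as follows. This is an illustration under stated assumptions, not the predictor evaluated in the study: features are assumed to be discretized into categories, probabilities are estimated by relative frequencies with add-one (Laplace) smoothing, and the feature values and training rows are hypothetical.

```python
from collections import Counter, defaultdict
from math import log


def train(rows, labels, alpha=1.0):
    """Estimate the class counts and per-class feature value counts.

    rows[i] is a tuple of categorical feature values for day i and
    labels[i] is its class ("Up", "Down" or "NoTrend").
    alpha is the smoothing constant (an assumption of this sketch).
    """
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    values = defaultdict(set)              # feature index -> values seen in training
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            feature_counts[(i, c)][v] += 1
            values[i].add(v)
    return class_counts, feature_counts, values, alpha


def predict(model, row):
    """Return the class c maximizing log p(c) + sum_i log p(F_i = f_i | c)."""
    class_counts, feature_counts, values, alpha = model
    total = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for c, n_c in class_counts.items():
        score = log(n_c / total)
        for i, v in enumerate(row):
            num = feature_counts[(i, c)][v] + alpha
            den = n_c + alpha * len(values[i])
            score += log(num / den)
        if score > best_score:
            best, best_score = c, score
    return best


# Hypothetical training days: (price signal, volume level) -> next-day trend.
rows = [
    ("rising", "high"), ("rising", "high"), ("rising", "low"),
    ("falling", "high"), ("falling", "low"), ("falling", "low"),
    ("flat", "low"), ("flat", "low"),
]
labels = ["Up", "Up", "Up", "Down", "Down", "Down", "NoTrend", "NoTrend"]

model = train(rows, labels)
print(predict(model, ("rising", "high")))
print(predict(model, ("flat", "low")))
```

Working in log space avoids numerical underflow when many features are multiplied together, and the scaling factor Z is never computed because it does not affect which class scores highest.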
The following steps explain how the stock market trend prediction works.
1) The historical SENSEX data, with all its rises and falls, is extracted from the stock exchange. The attribute selection measure of the C4.5 decision tree, $\mathrm{GainRatio}(A) = \mathrm{Gain}(A) / \mathrm{SplitInfo}_A(D)$, is used to build a new decision tree.
2) After the new decision tree has been built, step 1 is used again to select the particular data sets for further calculation with the fuzzy and rough set techniques.
3) These data sets are used as inputs to the fuzzy entropy technique, which computes a numerical value expressed as a percentage.
4) The data sets from the newly created decision tree are then also used for the rough set technique with the trend predictor method.
5) The values from the rough set trend predictor are compared with the values from the fuzzy technique; the larger percentage is taken as the final predicted value.
The design of a hybrid fuzzy and rough set based stock market trend prediction system for predicting the one-day-ahead trend of the SENSEX is presented in this paper. Features are extracted from the existing historical SENSEX data. The extracted features are used to generate the prediction rules using fuzzy logic and rough set theory. It is observed that the proposed hybrid decision tree fuzzy-rough set based trend prediction system produces an output that is better than that of the stand-alone systems. Stand-alone systems, such as a rough set based trend prediction system alone or any trend prediction system without feature selection, produce output for which it remains doubtful whether it is sufficient to give a close prediction. The values of both systems are compared, the difference between the two predictions is examined, and the maximum values are taken, which gives a 99% exact output. It is seen that the automated feature selection employed in the present study significantly improves the trend prediction accuracy. It can be concluded from the present study that soft computing based techniques, particularly the hybrid decision tree fuzzy-rough set based trend prediction system, are well suited for stock market trend prediction.