# Classification Of Text Using Decision Learning And Fuzzy Logic Computer Science Essay


This article applies decision-tree classification to build a model that predicts the value of a target variable from several input variables, refines the result with fuzzy logic, and identifies the best-fitting algorithm. The system is in the development stage; this article presents the proposed system, whose implementation we have begun. The first aim of the system is to apply different classification algorithms to a training data set and then to remove some of the algorithms' drawbacks with the help of fuzzy logic. Many data mining algorithms aim to predict the future state of the data. Classification is the technique of mapping target data to predefined groups or classes; it is supervised learning because the classes are defined before the target data is examined. The classification algorithm learns from the training set and builds a model, which is then used to classify new objects. Such systems take as input a collection of cases, each belonging to one of a small number of classes and described by its values for a fixed set of attributes, and output a classifier that can accurately predict the class to which a new case belongs.

Keywords:- KDD (Knowledge Discovery in Databases), ID3 (Iterative Dichotomiser 3), FL (fuzzy logic)

Data mining, popularly known as Knowledge Discovery in Databases (KDD), is the nontrivial extraction of implicit, previously unknown and potentially useful information from data in databases[1].

## 1.2 The data mining models:

The data mining models are of two types[1]:

## 1.2.1 Predictive and Descriptive.

The predictive model makes prediction about unknown data values by using the known values. Ex. Classification, Regression, Time series analysis, Prediction etc[3].

The descriptive model identifies the patterns or relationships in data and explores the properties of the data examined. Ex. Clustering, Summarization, Association rule, Sequence discovery etc[3].

## Problem Definition

Applying different classification models of data mining to the result produced from text mining and, thus, make the information contained in the text accessible to the various data mining (statistical and machine learning) algorithms and then giving the best fitted algorithm for the same with the help of fuzzy logic.

## 2. Text mining:

Text mining, also called intelligent text analysis, text data mining, or knowledge discovery in text, uncovers previously invisible patterns in existing resources. To perform analysis, decision-making, and knowledge management tasks, information systems use an increasing amount of unstructured information in the form of text[5].

This influx of data has spawned a need to improve the text mining technologies required for information retrieval, filtering, and classification.

Text mining itself is not a single function; it comprises various functions which, when combined, can be called text mining functions. The main functions include Searching, Information Extraction (IE), Categorization, Summarization, Prioritization, Clustering, Information Monitoring and Question and Answers[5].

Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf[6].

A tree can be "learned" by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner called recursive partitioning. The recursion is completed when the subset at a node all has the same value of the target variable, or when splitting no longer adds value to the predictions[16,18].

In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation and generalisation of a given set of data.

Data comes in records of the form:

(\textbf{x},Y) = (x_1, x_2, x_3, ..., x_k, Y)

The dependent variable, Y, is the target variable that we are trying to understand, classify or generalise. The vector x is composed of the input variables, x1, x2, x3 etc., that are used for that task.
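As a small illustration of this record form, the vector x and target Y can be held as simple tuples; the attribute values here are invented for the example and the planned system itself targets Java, so this is a Python sketch only:

```python
# Hypothetical records of the form (x, Y): x is a tuple of input
# attribute values, Y is the target class to be predicted.
records = [
    (("sunny", "hot"), "no"),
    (("overcast", "mild"), "yes"),
    (("rain", "cool"), "yes"),
]

x, y = records[0]
assert len(x) == 2      # two input variables x1, x2
assert y == "no"        # the dependent variable Y
```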

## 1) ID3 (Iterative Dichotomiser 3)

In decision tree learning, ID3 (Iterative Dichotomiser 3) is an algorithm used to generate a decision tree invented by Ross Quinlan. ID3 is the precursor to the C4.5 algorithm[3].

The ID3 algorithm can be summarized as follows[3]:

1. Take all unused attributes and compute their entropy with respect to the test samples.

2. Choose the attribute for which entropy is minimum (or, equivalently, information gain is maximum).

3. Make a node containing that attribute.

The algorithm is based on Occam's razor: it prefers smaller decision trees (simpler theories) over larger ones. However, it does not always produce the smallest tree, and is therefore a heuristic. Occam's razor is formalized using the concept of information entropy[2]:

## Entropy

E(S) = - \sum^{n}_{j=1} f_{S}(j) \log_{2} f_{S}(j)

Where:

E(S) is the information entropy of the subset S;

n is the number of different values of the attribute in S (entropy is computed for one chosen attribute);

f_{S}(j) is the frequency (proportion) of the value j in the subset S;

log_{2} is the binary logarithm.

An entropy of 0 identifies a perfectly classified subset, while an entropy of 1 (for a two-valued attribute) indicates a totally random composition.

Entropy is used to determine which node to split next in the algorithm: the higher the entropy, the higher the potential to improve the classification at that node[3].
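The entropy formula above can be computed directly from the value proportions. The following is a minimal Python sketch for illustration (the system itself is planned in Java, and the function name is invented):

```python
import math
from collections import Counter

def entropy(values):
    """E(S) = -sum_j f_S(j) * log2 f_S(j), where f_S(j) is the
    proportion of value j in the subset S."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(values).values())

# A perfectly classified subset has entropy 0;
# an even binary split has entropy 1.
assert entropy(["yes", "yes", "yes"]) == 0.0
assert entropy(["yes", "no"]) == 1.0
```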

## Gain

Gain is computed to estimate the improvement produced by a split over an attribute:

G(S, A) = E(S) - \sum^{m}_{i=1} f_{S}(A_{i}) E(S_{A_{i}})

Where:

G(S,A) is the gain of the subset S after a split over attribute A;

E(S) is the information entropy of the subset S;

m is the number of different values of attribute A in S;

f_{S}(A_{i}) is the frequency (proportion) of the items possessing A_{i} as value for A in S;

A_{i} is the ith possible value of A;

S_{A_{i}} is the subset of S containing all items where the value of A is A_{i}.

Gain quantifies the entropy improvement by splitting over an attribute: higher is better.
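Putting the two formulas together, information gain is the entropy of S minus the weighted entropy of the subsets produced by the split. A Python sketch under the same definitions (the row layout and function names are invented for the example):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(labels).values())

def information_gain(rows, attr, target="label"):
    """G(S, A) = E(S) - sum_i f_S(A_i) * E(S_{A_i})."""
    labels = [r[target] for r in rows]
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder

# Toy set where 'windy' splits the labels perfectly,
# so the gain equals the full entropy E(S) = 1.0.
rows = [
    {"windy": "yes", "label": "no"},
    {"windy": "yes", "label": "no"},
    {"windy": "no",  "label": "yes"},
    {"windy": "no",  "label": "yes"},
]
assert information_gain(rows, "windy") == 1.0
```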

## 3.0 Proposed System:-

The decision tree learning algorithm has been successfully used in expert systems to capture knowledge. The main task performed in these systems is applying inductive methods to the given attribute values of an unknown object to determine the appropriate classification according to decision tree rules. I examine the decision tree learning algorithm ID3 and will implement it in Java. I first implement basic ID3, which deals with target functions that have discrete output values. I then extend the domain of ID3 to real-valued output, such as numeric data, rather than simply Boolean values, with the help of fuzzy logic. Finally, I add a pruning technique that deletes the unused branches of the tree.

Here I use fuzzy logic's membership function to calculate real values; the next section gives a brief idea of fuzzy logic and the part I will require for my system.

## 3.1 Fuzzy logic

It is a form of many-valued logic derived from fuzzy set theory to deal with reasoning that is fluid or approximate rather than fixed and exact. In contrast with "crisp logic", where binary sets have two-valued logic, fuzzy logic variables may have a truth value that ranges in degree between 0 and 1. In simple words, fuzzy logic is a superset of conventional (Boolean) logic that has been extended to handle the concept of partial truth: truth values between completely true and completely false[7].

Fuzzy logic and probabilistic logic are mathematically similar - both have truth values ranging between 0 and 1 - but conceptually distinct, due to different interpretations; see interpretations of probability theory. Fuzzy logic corresponds to "degrees of truth", while probabilistic logic corresponds to "probability, likelihood"; as these differ, fuzzy logic and probabilistic logic yield different models of the same real-world situations[7].

Both degrees of truth and probabilities range between zero and one and hence may seem similar at first. For example, let a 100 ml glass contain 30 ml of water. Then we may consider two concepts: Empty and Full. The meaning of each of them can be represented by a certain fuzzy set. Then one might define the glass as being 0.7 empty and 0.3 full. Note that the concept of emptiness would be subjective and thus would depend on the observer or designer. Another designer might equally well design a set membership function where the glass would be considered full for all values down to 50 ml. It is essential to realize that fuzzy logic uses truth degrees as a mathematical model of the vagueness phenomenon, while probability is a mathematical model of ignorance. The same could be achieved using probabilistic methods, by defining a binary variable "full" that depends on a continuous variable that describes how full the glass is. There is no consensus on which method should be preferred in a specific situation.
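The glass example above amounts to choosing a membership function. A minimal Python sketch, assuming a linear membership design (one designer's subjective choice, as the text notes; the function name is invented):

```python
def empty_degree(volume_ml, capacity_ml=100.0):
    """Degree of membership in the fuzzy set 'Empty' for a glass,
    using a simple linear membership function clamped to [0, 1]."""
    return max(0.0, min(1.0, 1.0 - volume_ml / capacity_ml))

# 30 ml of water in a 100 ml glass: 0.7 empty, hence 0.3 full.
assert abs(empty_degree(30) - 0.7) < 1e-9
assert abs((1.0 - empty_degree(30)) - 0.3) < 1e-9
```

A different designer could replace the linear ramp with, for instance, a function that returns 0.0 (not empty) for all volumes above 50 ml, matching the alternative design mentioned above.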

## 3.1.1 Decidability issues for fuzzy logic

The notions of a "decidable subset" and "recursively enumerable subset" are basic ones for classical mathematics and classical logic. The question therefore arises of a suitable extension of such concepts to fuzzy set theory. A first proposal in this direction was made by E.S. Santos through the notions of fuzzy Turing machine, Markov normal fuzzy algorithm and fuzzy program. Subsequently, L. Biacino and G. Gerla showed that such a definition is not adequate and therefore proposed the following one.

Ü denotes the set of rational numbers in [0,1]. A fuzzy subset s : S \rightarrow [0,1] of a set S is recursively enumerable if a recursive map h : S \times N \rightarrow Ü exists such that, for every x in S, the function h(x,n) is increasing with respect to n and s(x) = lim h(x,n). We say that s is decidable if both s and its complement -s are recursively enumerable. An extension of such a theory to the general case of L-subsets is proposed in Gerla 2006. The proposed definitions are well related with fuzzy logic. Indeed, the following theorem holds true (provided that the deduction apparatus of the fuzzy logic satisfies some obvious effectiveness property)[8].

Theorem: Any axiomatizable fuzzy theory is recursively enumerable. In particular, the fuzzy set of logically true formulas is recursively enumerable, in spite of the fact that the crisp set of valid formulas is not recursively enumerable in general. Moreover, any axiomatizable and complete theory is decidable.

## 3.1.2 Fuzzy classification

Fuzzy classification is the process of grouping elements into a fuzzy set (Zadeh 1965) whose membership function is defined by the truth value of a fuzzy propositional function. It has been discussed, for example, by (Zimmermann H.-J., 2000), (Meier, Schindler, & Werro, 2008) and (Del Amo, Montero, & Cutello, 1999).

A fuzzy class ~C = { i | ~Π(i) } is defined as a fuzzy set ~C of individuals i satisfying a fuzzy classification predicate ~Π, which is a fuzzy propositional function. The domain of the fuzzy class operator ~{ . | . } is the set of variables V and the set of fuzzy propositional functions ~PF, and the range is the fuzzy powerset (the set of fuzzy subsets) of this universe, ~P(U):

~{ . | . } : V × ~PF ⟶ ~P(U)

A fuzzy propositional function is, analogous to (Russell, 1919, p. 155), an expression containing one or more variables, such that, when values are assigned to these variables, the expression becomes a fuzzy proposition in the sense of (Zadeh 1975).

Accordingly, fuzzy classification is the process of grouping individuals having the same characteristics into a fuzzy set. A fuzzy classification corresponds to a membership function μ that indicates whether an individual is a member of a class, given its fuzzy classification predicate ~Π.

μ : ~PF × U ⟶ ~T

Here, ~T is the set of fuzzy truth values (the interval between zero and one). The fuzzy classification predicate ~Π corresponds to a fuzzy restriction "i is R" (Zadeh, Calculus of fuzzy restrictions, 1975) of U, where R is a fuzzy set defined by a truth function. The degree of membership of an individual i in the fuzzy class ~C is defined by the truth value of the corresponding fuzzy predicate.

μ~C(i) := τ(~Π(i))
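The definitions above can be made concrete with a small Python sketch: a fuzzy propositional function returns a truth degree in [0, 1], and the fuzzy class maps each individual to that degree. The predicate "tall", its linear ramp, and all names here are hypothetical examples, not part of the theory:

```python
def tall_predicate(height_cm):
    """Fuzzy propositional function: truth degree of 'i is tall'.
    Rises linearly from 160 cm to 190 cm (an invented design)."""
    if height_cm <= 160:
        return 0.0
    if height_cm >= 190:
        return 1.0
    return (height_cm - 160) / 30.0

def fuzzy_class(predicate, individuals):
    """~C = { i | ~P(i) }: each individual mapped to its degree
    of membership, i.e. the truth value of the predicate."""
    return {i: predicate(i) for i in individuals}

members = fuzzy_class(tall_predicate, [150, 175, 195])
assert members[150] == 0.0 and members[195] == 1.0
assert abs(members[175] - 0.5) < 1e-9
```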

## 3.2 System Architecture

Decision tree learning is a method for approximating discrete-valued target functions, in which the learned function is represented by a decision tree[6].

Decision tree learning is one of the most widely used and practical methods for inductive inference. The decision tree learning algorithm has been successfully used in expert systems to capture knowledge. The main task performed in these systems is applying inductive methods to the given attribute values of an unknown object to determine the appropriate classification according to decision tree rules[6].

Decision trees classify instances by traversing from the root node to a leaf node. We start from the root node of the decision tree, test the attribute specified by this node, then move down the tree branch according to the attribute value in the given set. This process is then repeated at the sub-tree level.
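The root-to-leaf traversal can be sketched in a few lines of Python. The tree representation here (a label for a leaf, a `(attribute, branches)` tuple for an internal node) is one invented choice among many:

```python
def classify(tree, instance):
    """Walk from the root to a leaf: at each internal node, test the
    node's attribute and follow the branch for the instance's value."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[instance[attr]]
    return tree  # a leaf, i.e. the predicted class label

# Toy tree: split on 'outlook', then on 'windy' under 'rain'.
tree = ("outlook", {
    "sunny": "no",
    "overcast": "yes",
    "rain": ("windy", {"yes": "no", "no": "yes"}),
})
assert classify(tree, {"outlook": "overcast"}) == "yes"
assert classify(tree, {"outlook": "rain", "windy": "yes"}) == "no"
```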

Figure 3.1 System Architecture (block diagram with the steps: apply decision tree learning method; apply fuzzy logic to calculate the targeted value; extend attribute values to continuous-valued data; classify instances (attribute values) by traversing from root node to leaf node; determine the appropriate classification according to decision tree rules)

## 3.3 Feasibility Study

Feasibility study for this system includes the following points

## 1) System Feasibility:

Text mining, also called intelligent text analysis, text data mining, or knowledge discovery in text, uncovers previously invisible patterns in existing resources[1].

To perform analysis, decision-making, and knowledge management tasks, information systems use an increasing amount of unstructured information in the form of text. This data influx, in turn, has spawned a need to improve the text mining technologies required for information retrieval, filtering, and classification.

People involved in research can systematically analyze multiple research papers, e-books and other documents, and then swiftly determine what they contain. For example, in an HR department, finding a CV that matches a particular job specification among a million CVs in a database may not be the simplest of tasks. All this not only makes it easy to determine what to focus on in a particular document, but also where to find it and how important it is compared to other similar documents. With its extensible knowledge base and generic algorithms, the system can be brought to use in just about any field or industry.

## 2) Operational Feasibility:

What the decision tree learning algorithm is suited for:

1. Instances are represented as attribute-value pairs. For example, the attribute 'Temperature' has the values 'hot', 'mild' and 'cool'. We also aim to extend attribute values to continuous-valued data (numeric attribute values) in the system.

2. The target function has discrete output values. It can easily deal with instances assigned to a Boolean decision, such as 'true' and 'false', or 'p (positive)' and 'n (negative)', although it is possible to extend the target to real-valued outputs. The training data may contain errors; this can be dealt with by a pruning technique.
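One standard way to extend attribute values to continuous-valued data, as point 1 above proposes, is to binarize a numeric attribute at a threshold chosen to minimize the weighted entropy of the resulting split (the approach used by C4.5, ID3's successor). A Python sketch with invented names and toy data:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(labels).values())

def best_threshold(values, labels):
    """Try thresholds midway between consecutive sorted values and
    keep the one whose <=/> split has the lowest weighted entropy."""
    pairs = sorted(zip(values, labels))
    best_t, best_w = None, float("inf")
    for i in range(1, len(pairs)):
        t = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        w = (len(left) * entropy(left)
             + len(right) * entropy(right)) / len(pairs)
        if w < best_w:
            best_t, best_w = t, w
    return best_t

# Temperatures below ~75 play 'yes', above play 'no':
t = best_threshold([64, 68, 70, 80, 85],
                   ["yes", "yes", "yes", "no", "no"])
assert t == 75.0
```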

## 4.0 Applications OR Benefits

For inductive learning, decision tree learning is attractive for three reasons:

1. A decision tree is a good generalization for unobserved instances, but only if the instances are described in terms of features that are correlated with the target concept.

2. The methods are computationally efficient, with cost proportional to the number of observed training instances.

3. The resulting decision tree provides a representation of the concept that appeals to humans because it renders the classification process self-evident.

## 5.0 Requirement Analysis

Requirement analysis for this project includes the following points

1) Architectural Requirement:- Decision tree learning is a method for approximating discrete-valued target functions, in which the learned function is represented by a decision tree.

Decision tree learning is one of the most widely used and practical methods for inductive inference. The decision tree learning algorithm has been successfully used in expert systems to capture knowledge. The main task performed in these systems is applying inductive methods to the given attribute values of an unknown object to determine the appropriate classification according to decision tree rules.

Decision trees classify instances by traversing from the root node to a leaf node. We start from the root node of the decision tree, test the attribute specified by this node, and then move down the tree branch according to the attribute value in the given set. This process is then repeated at the sub-tree level.

## 2) Functional Requirement:-

As shown in the block diagram of the system architecture, the functional requirements for the system are forming the decision tree, decision tree learning, and classifying instances.

The basic idea of the ID3 algorithm is to construct the decision tree by employing a top-down, greedy search through the given sets to test each attribute at every tree node. In order to select the attribute that is most useful for classifying a given set, we introduce a metric: information gain.
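The top-down, greedy construction described above can be sketched compactly: at each node, pick the attribute with the highest information gain, split the set on its values, and recurse until the subsets are pure or the attributes are exhausted. This is a minimal illustrative Python version of basic ID3 (the row format and names are invented; the planned system implements this in Java):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(labels).values())

def id3(rows, attributes, target="label"):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:           # pure subset: make a leaf
        return labels[0]
    if not attributes:                   # no attributes left: majority leaf
        return Counter(labels).most_common(1)[0][0]

    def gain(attr):
        remainder = 0.0
        for v in {r[attr] for r in rows}:
            subset = [r[target] for r in rows if r[attr] == v]
            remainder += len(subset) / len(rows) * entropy(subset)
        return entropy(labels) - remainder

    best = max(attributes, key=gain)     # greedy choice of split
    rest = [a for a in attributes if a != best]
    return (best, {
        v: id3([r for r in rows if r[best] == v], rest, target)
        for v in {r[best] for r in rows}
    })

rows = [
    {"outlook": "sunny",    "windy": "no",  "label": "yes"},
    {"outlook": "sunny",    "windy": "yes", "label": "no"},
    {"outlook": "overcast", "windy": "no",  "label": "yes"},
    {"outlook": "overcast", "windy": "yes", "label": "yes"},
]
tree = id3(rows, ["outlook", "windy"])
```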

## 3) Behavioral Requirement:-

To find an optimal way to classify a learning set, we need to minimize the number of questions asked (i.e., minimize the depth of the tree). Thus, we need some function which can measure which questions provide the most balanced splitting. The information gain metric is such a function.

## 4) Nonfunctional Requirement:-

The non-functional requirements for the project are given below:

Load learning sets

Create decision tree root node

Add learning set into root node as its subset.

## 5.1 Objectives

The objectives of the implementation of the project are:

To study the text mining concept and then retrieve the data values by applying the Latent Semantic Indexing word transformation technique.

To use the extracted information in a data mining model with predictive classification algorithms.

To compare the performance and give the best-fitted algorithm.

## 5.2 Constraints

## 1) Shortcoming of ID3

A significant shortcoming of ID3 is that the space of legal splits at a node is impoverished. A split is a partition of the instance space that results from placing a test at a decision tree node. ID3 and its descendants only allow testing a single attribute and branching on the outcome of that test.

## 2) Extending ID3 to real-valued data:

ID3 is quite efficient in dealing with target functions that have discrete output values. It can easily deal with instances assigned to a Boolean decision, such as 'true' and 'false', or 'p (positive)' and 'n (negative)'. It is possible to extend the target to real-valued outputs.

## 5.4 Assumptions

The main task of this project is the decision tree learning algorithm, so it will use a simple database format. Each database is actually a text file. The first line is the name of the database. The second line contains the attribute names, each immediately followed by its type: numerical or symbolic. A numerical attribute can be real or integer. A symbolic attribute denotes any discrete attribute; its value can be symbols, numbers or both. Each subsequent line corresponds to one example and contains the values of the attributes.
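The file format described above can be parsed with a short Python sketch. The exact separators are not specified in the text, so the `name:type` token encoding and whitespace-separated values below are assumptions for illustration:

```python
def load_database(text):
    """Parse the assumed format: line 1 is the database name, line 2
    holds 'attr:type' tokens (type is 'numerical' or 'symbolic'),
    and each following line is one example."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    name = lines[0]
    attrs = []
    for token in lines[1].split():
        attr, kind = token.split(":")
        attrs.append((attr, kind))
    examples = []
    for ln in lines[2:]:
        row = {}
        for (attr, kind), raw in zip(attrs, ln.split()):
            # Numerical values (real or integer) become floats;
            # symbolic values stay as strings.
            row[attr] = float(raw) if kind == "numerical" else raw
        examples.append(row)
    return name, attrs, examples

sample = """weather
outlook:symbolic temperature:numerical play:symbolic
sunny 85 no
rain 65 yes
"""
name, attrs, rows = load_database(sample)
assert name == "weather"
assert rows[0]["temperature"] == 85.0
assert rows[1]["play"] == "yes"
```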