Chapter 1 Intelligent Software Agent

1.1 Intelligent Agent

An Agent can be defined as follows: “An Agent is a software thing that knows how to do things that you could probably do yourself if you had the time” (Ted Seller of IBM Almaden Research Centre). Another definition is: “A piece of software which performs a given task using information gleaned from its environment to act in a suitable manner so as to complete the task successfully. The software should be able to adapt itself based on changes occurring in its environment, so that a change in circumstances will still yield the intended results” (G.W.Lecky - Thompson). [1] [2] [3] [4]

An Intelligent Agent can be divided into weak and strong notations.

Table 1.1 shows the properties for both the notations.

Weak notation

Strong notation

Autonomy

Mobility

Social ability

Benevolence

Reactivity & Proactivity

Rationality

Temporal continuity

Adaptivity

Goal oriented

Collaboration

Table 1.1

1.1.1 Intelligency

Intelligence refers to the ability of the agent to capture and apply domain specific knowledge and processing to solve problems. An Intelligent Agent uses knowledge, information and reasoning to take reasonable actions in pursuit of a goal. It must be able to recognise events, determine the meaning of those events and then take actions on behalf of a user. One central element of intelligent behaviour is the ability to adopt or learn from experience. Any Agent that can learn has an advantage over one that cannot. Adding learning or adaptive behaviour to an intelligent agent elevates it to a higher level of ability. In order to construct an Intelligent Agent, we have to use the following topics of Artificial Intelligence:

* Knowledge Representation

* Reasoning

* Learning [5]

1.1.2 Operation

The functionality of a mobile agent is illustrated in 1.1. Computer A and Computer B are connected via a network. In step 1 a mobile Agent is going to be dispatched from Computer A towards Computer B. In the mean time Computer A will suspend its execution. Step 2 shows this mobile Agent is now on network with its state and code. In step 3 this mobile Agent will reach to its destination, computer B, which will resume its execution. [7]

1.1.3 Strengths and Weaknesses

Many researchers are now developing methods for improving the technology, with more standardisation and better programming environments that may allow mobile agents to be used in products.

It is obvious that the more an application gets intelligent, the more it also gets unpredictable and uncontrollable. The main drawback of mobile agents is the security risk involved in using them. [8] [9]

The following table shows the major strengths and weaknesses of Agent technology:

Strengths

Weakness

Overcoming Network Latency

Security

Reducing Network traffic

Performance

Asynchronous Execution and Autonomy

Lack of Applications

Operating in Heterogeneous Environments

Limited Exposure

Robust and Fault-tolerant Behavior

Standardization

Table 1.2

1.2 Applications

The followings are the major and most widely applicable areas of Mobile Agent:

* Distributed Computing: Mobile Agents can be applied in a network using free resources for their own computations.

* Collecting data: A mobile Agent travels around the net. On each computer it processes the data and sends the results back to the central server.

* Software Distribution and Maintenance: Mobile agents could be used to distribute software in a network environment or to do maintenance tasks.

* Mobile agents and Bluetooth: Bluetooth is a technology for short range radio communication. Originally, the companies Nokia and Ericsson came up with the idea. Bluetooth has a nominal range of 10 m and 100 m with increased power. [38]

* Mobile agents as Pets: Mobile agents are the ideal pets. Imagine something like creatures. What if you could have some pets wandering around the internet, choosing where they want to go, leaving you if you don't care about them or coming to you if you handle them nicely? People would buy such things won't they? [38]

* Mobile agents and offline tasks:

1. Mobile agents could be used for offline tasks in the following way:

a- An Agent is sent out over the internet to do some task.

b- The Agent performs its task while the home computer is offline.

c- The Agent returns with its results.

2. Mobile agents could be used to simulate a factory:

a- Machines in factory are agent driven.

b- Agents provide realistic data for a simulation, e.g. uptimes and efficiencies.

c- Simulation results are used to improve real performance or to plan better production lines. [10[ [11] [12]

1.3 Life Cycle

An intelligent and autonomous Agent has properties like Perception, Reasoning

and Action which form the life cycle of an Agent as shown in 1.2. [6]

The agent perceives the state of its environment, integrates the perception in its knowledge base that is used to derive the next action which is then executed. This generic cycle is a useful abstraction as it provides a black-box view on the Agent and encapsulates specific aspects. The first step is the Agent initialisation. The Agent will then start to operate and may stop and start again depending upon the environment and the tasks that it tried to accomplish. After the Agent finished all the tasks that are required, it will end at the completing state. [13] Table 1.3 shows these states.

Name of Step

Description

Initialize

Performs one-time setup activities.

Start

Start its job or task.

Stop

Stops jobs, save intermediate results, joins all threads and stops.

Complete

Performs one-time termination activities.

Table 1.3

1.4 Agent Oriented Programming (AOP)

It is a programming technique which deals with objects, which have independent thread of control and can be initiated. We will elaborate on the three main components of the AOP.

a- Object: Grouping data and computation together in a single structural unit called an ‘Object'. Every Agent looks like an object.

b- Independent Thread of control: This means when this developed Agent which is an object, when will be implemented in Boga server, looks like an independent thread. This makes an Agent different from ordinary object.

c- Initiation: This deals with the execution plan of an Agent, when implemented, that Agent can be initiated from the server for execution. [14] [15] [16] [17]

1.5 Network paradigms

This section illustrates the traditional distributed computing paradigms like Simple Network Management Protocol (SNMP) and Remote Procedure Call (RPC).

1.5.1 SNMP

Simple Network Management Protocol is a standard for gathering statistical data about network traffic and the behavior of network components. It is an application layer protocol that sits above TCP/IP stack. It is a set of protocols for managing complex networks. It enables network administrators to manage network performance, find and solve network problems and plan for network growth. It is basically a request or response type of protocol, communicating management information between two types of SNMP entities: Manager (Applications) and Agents. [18]

Agents: They are compliant devices; they store data about themselves in Management Information Base (MIB) (Each agent in SNMP maintain a local database of information relevant to network management is known as the Management Information Base) and return this data to the SNMP requesters. An agent has properties like: Implements full SNMP protocol, Stores and retrieves managed data as defined by the Management Information Base and can asynchronously signal an event to the manager.

Manager (Application): It issues queries to get information about the status, configuration and performance of external network devices. A manager has the following properties: Implemented as a Network Management Station (the NMS), implements full SNMP Protocol, able to Query Agents, get responses from Agents, set variables in agents and acknowledge asynchronous events from Agents. [18] 1.3 illustrates an interaction between a manager and an Agent.

The agent is software that enables a device to respond to manager requests to view or update MIB data and send traps reporting problems or significant events. It receives messages and sends a response back. An Agent does not have to wait for order to act, if a serious problem arises or a significant event occurs, it sends a TRAP (a message that reports a problem or a significant event) to the manager (software in a network management station that enables the station to send requests to view or update MIB variables, and to receive traps from an agent). The Manager software which is in the management station sends message to the Agent and receives a trap and responses. It uses User Data Protocol (UDP, a simple protocol enabling an application to send individual message to other applications. Delivery is not guaranteed, and messages need not be delivered in the same order as they were sent) to carry its messages. Finally, there is one application that enables end user to control the manager software and view network information. [19]

Table 1.4 comprises the Strengths and Weaknesses of SNMP.

Strengths

Weaknesses

Its design and implementation are simple.

It may not be suitable for the management of truly large networks because of the performance limitations of polling.

Due to its simple design it can be expanded and also the protocol can be updated to meet future needs.

It is not well suited for retrieving large volumes of data, such as an entire routing table.

All major vendors of internetwork hardware, such as bridges and routers, design their products to support SNMP, making it very easy to implement.

Its traps are unacknowledged and most probably not delivered.

Not applicable

It provides only trivial authentication.

Not applicable

It does not support explicit actions.

Not applicable

Its MIB model is limited (does not support management queries based on object types or values).

Not applicable

It does not support manager-to-manager communications.

Not applicable

The information it deals with neither detailed nor well-organized enough to deal with the expanding modern networking requirements.

Not applicable

It uses UDP as a transport protocol. The complex policy updates require a sequence of updates and a reliable transport protocol, such as TCP, allows the policy update to be conducted over a shared state between the managed device and the management station.

Table 1.4

1.5.2 RPC

A remote procedure call (RPC) is a protocol that allows a computer program running on one host to cause code to be executed on another host without the programmer needing to explicitly code for this. When the code in question is written using object-oriented principles, RPC is sometimes referred to as remote invocation or remote method invocation. It is a popular and powerful technique for constructing distributed, client-server based applications. An RPC is initiated by the caller (client) sending a request message to a remote system (the server) to execute a certain procedure using arguments supplied. A result message is returned to the caller. It is based on extending the notion of conventional or local procedure calling, so that the called procedure need not exist in the same address space as the calling procedure. The two processes may be on the same system, or they may be on different systems with a network connecting them. By using RPC, programmers of distributed applications avoid the details of the interface with the network. The transport independence of RPC isolates the application from the physical and logical elements of the data communications mechanism and allows the application to use a variety of transports. A distributed computing using RPC is illustrated in 1.4.

Local procedures are executed on Machine A; the remote procedure is actually executed on Machine B. The program executing on Machine A will wait until Machine B has completed the operation of the remote procedure and then continue with its program logic. The remote procedure may have a return value that continuing program may use immediately.

It intercepts calls to a procedure and the following happens:

* Packages the name of the procedure and arguments to the call and transmits them over network to the remote machine where the RPC server id running. It is called “Marshalling”. [20]

* RPC decodes the name of the procedure and the parameters.

* It makes actual procedure call on server (remote) machine.

* It packages returned value and output parameters and then transmits it over network back to the machine that made the call. It is called “Unmarshalling”. [20]

1.6 Comparison between Agent technology and network paradigms

Conventional Network Management is based on SNMP and often run in a centralised manner. Although the centralised management approach gives network administrators a flexibility of managing the whole network from a single place, it is prone to information bottleneck and excessive processing load on the manager and heavy usage of network bandwidth.

Intelligent Agents for network management tends to monitor and control networked devices on site and consequently save the manager capacity and network bandwidth. The use of Intelligent Agents is due to its major advantages e.g. asynchronous, autonomous and heterogeneous etc. while the other two contemporary technologies i.e. SNMP and RPC are lacking these advantages. The table below shows the comparison between the intelligent agent and its contemporary technologies:

Property

RPC

SNMP

Intelligent Agent

Communication

Synchronous

Asynchronous

Asynchronous

Processing Power

Less Autonomy

More Autonomous but less than Agent

More Autonomous

Network support

Distributed

Centralised

Heterogeneous

Network Load Management

Heavy usage of Network Bandwidth

Load on Network traffic and heavy usage of bandwidth

Reduce Network traffic and latency

Transport Protocol

TCP

UDP

TCP

Packet size Network

Only address can be sent for request and data on reply

Only address can be sent for request and data on reply

Code and execution state can be moved around network. (only code in case of weak mobility)

Network Monitoring

This is not for this purpose

Network delays and information bottle neck at centralised management station

It gives flexibility to analyse the managed nodes locally

Table 1.5

Indeed, Agents, mobile or intelligent, by providing a new paradigm of computer interactions, give new options for developers to design application based on computer connectivity.

20

Chapter 2 Learning Paradigms

2.1 Knowledge Discovery in Databases (KDD) and Information Retrieval (IR)

KDD is defined as “the nontrivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data” (Fayyad, Piatetsky-Shapiro and Smith (1996)). A closely related process of IR is defined as “the methods and processes for searching relevant information out of information systems that contain extremely large numbers of documents” (Rocha (2001)).

KDD and IR are, in fact, highly complex processes that are strongly affected by a wide range of factors. These factors include the needs and information seeking characteristics of system users as well as the tools and methods used to search and retrieve the structure and size of the data set or database and the nature of the data itself. The result, of course, was increasing numbers of organizations that possessed very large and continually growing databases but only elementary tools for KD and IR. [21] Two major research areas have been developed in response to this problem:

* Data warehousing:

It is defined as: “Collecting and ‘cleaning' transactional data to make it available for online analysis and decision support”. (Fayyad 2001, p.30)

· Data Mining:

It is defined as: “The application of specific algorithms to a data set for purpose of extracting data patterns”. (Fayyad p. 28)

2.2 Data Mining

Data mining is a statistical term. In Information Technology it is defined as a discovery of useful summaries of data.

2.2.1 Applications of Data Mining

The following are examples of the use of data mining technology:

* Pattern of traveller behavior mined: Manage the sale of discounted seats in planes, rooms in hotels.

* Diapers and beer: Observation those customers who buy diapers are more likely to buy beer than average allowed supermarkets to place beer and diapers nearby, knowing many customers would walk between them. Placing potato chips between increased sales of all three items.

* Skycat and Sloan Sky Survey: Clustering sky objects by their radiation levels in different bands allowed astronomers to distinguish between galaxies, nearby stars, and many other kinds of celestial objects.

* Comparison of genotype of people: With/without a condition allowed the discovery of a set of genes that together account for many case of diabetes. This sort of mining will become much more important as the human genome is constructed. [22] [23] [24]

2.2.2 Communities of Data Mining

As data mining has become recognised as a powerful tool, several different communities have laid claim to the subject:

* Statistics

* Artificial Intelligence (AI) where it is called “Machine Learning”

* Researchers in clustering algorithms

* Visualisation researchers

* Databases: When data is large and the computations is very complex, in this context, data mining can be thought of as algorithms for executing very complex queries on non-main-memory data.

2.2.3 Stages of data mining process

The following are the different stages of data mining process, sometimes called as a life cycle of data mining as shown in 2.1:

1- Data gathering: Data warehousing, web crawling.

2- Data cleansing: Eliminate errors and/or bogus data e.g.

Patient's fever = 125oC.

3- Feature extraction: Obtaining only the interesting attributes of the data e.g. “data acquired” is probably not useful for clustering celestial objects as in skycat.

4- Pattern extraction and discovery: This is the stage that is often thought of as “data mining” and is where we shall concentrate our efforts.

5- Visualisation of the data:

6- Evaluation of results: Not every discovered fact is useful, or even true! Judgment is necessary before following the software's conclusions. [22] [23] [24]

2.3 Machine Learning

There are five major techniques of machine learning in Artificial Intelligence (AI), which are discussed in the following sections.

2.3.1 Supervised Learning

It relies on a teacher that provides the input data as well as the desired solution. The learning agent is trained by showing it examples of the problem state or attributes along with the desired output or action. The learning agent makes a prediction based on the inputs and if the output differs from the desired output, then the agent is adjusted or adapted to produce the correct output. This process is repeated over and over until the agent learns to make accurate classifications or predictions e.g. Historical data from databases, sensor logs or trace logs is often used as training or example data. The example of supervised learning algorithm is the ‘Decision Tree', where there is a pre-specified target variable. [25] [5]

2.3.2 Unsupervised Learning

It depends on input data only and makes no demands on knowing the solution. It is used when learning agent needs to recognize similarities between inputs or to identify features in the input data. The data is presented to the Agent, and it adapts so that it partitions the data into groups. This process continues until the Agents place the same group on successive passes over the data. An unsupervised learning algorithm performs a type of feature detection where important common attributes in the data are extracted. The example of unsupervised learning algorithm is “the K-Means Clustering algorithm”. [25] [5]

2.3.3 Reinforcement Learning

It is a kind of supervised learning, where the feedback is more general. On the other hand, there are two more techniques in the machine learning, and these are: on-line learning and off-line learning. [25] [5]

2.3.4 On-line and Off-line Learning

On-line learning means that the agent is adapting while it is working. Off-line involves saving data while the agent is working and using the data later to train the agent. [25] [5]

In an intelligent agent context, this means that the data will be gathered from situations that the agents have experienced. Then augment this data with information about the desired agent response to build a training data set. Once this database is ready it can be used to modify the behaviour of agents. These approaches can be combined with any two or more into one system.

In order to develop Learning Intelligent Agent(LIAgent) we will combine unsupervised learning with supervised learning. We will test LIAgents on Iris dataset, Vote dataset about the polls in USA and two medical datasets namely Breast and Diabetes. [26] See Appendix A for all these four datasets.

2.4 Supervised Learning (Decision Tree ID3)

Decision trees and decision rules are data mining methodologies applied in many real world applications as a powerful solution to classify the problems. The goal of supervised learning is to create a classification model, known as a classifier, which will predict, with the values of its available input attributes, the class for some entity (a given sample). In other words, classification is the process of assigning a discrete label value (class) to an unlabeled record, and a classifier is a model (a result of classification) that predicts one attribute-class of a sample-when the other attributes are given. [40]

In doing so, samples are divided into pre-defined groups. For example, a simple classification might group customer billing records into two specific classes: those who pay their bills within thirty days and those who takes longer than thirty days to pay. Different classification methodologies are applied today in almost every discipline, where the task of classification, because of the large amount of data, requires automation of the process. Examples of classification methods used as a part of data-mining applications include classifying trends in financial market and identifying objects in large image databases. [40]

A particularly efficient method for producing classifiers from data is to generate a decision tree. The decision-tree representation is the most widely used logic method. There is a large number of decision-tree induction algorithms described primarily in the machine-learning and applied-statistics literature. They are supervised learning methods that construct decision trees from a set of input-output samples. A typical decision-tree learning system adopts a top-down strategy that searches for a solution in a part of the search space. It guarantees that a simple, but not necessarily the simplest tree will be found. A decision tree consists of nodes, where attributes are tested. The outgoing branches of a node correspond to all the possible outcomes of the test at the node. [40]

Decision trees are used in information theory to determine where to split data sets in order to build classifiers and regression trees. Decision trees perform induction on data sets, generating classifiers and prediction models. A decision tree examines the data set and uses information theory to determine which attribute contains the information on which to base a decision. This attribute is then used in a decision node to split the data set into two groups, based on the value of that attribute. At each subsequent decision node, the data set is split again. The result is a decision tree, a collection of nodes. The leaf nodes represent a final classification of the record. ID3 is an example of decision tree. It is kind of supervised learning. We used ID3 in order to print the decision rules as its output. [40]

2.4.1 Decision Tree

Decision trees are powerful and popular tools for classification and prediction. The attractiveness of decision trees is due to the fact that, in contrast to neural networks, decision trees represent rules. Rules can readily be expressed so that humans can understand them or even directly used in a database access language like SQL so that records falling into a particular category may be retrieved. Decision tree is a classifier in the form of a tree structure, where each node is either:

Leaf node - indicates the value of the target attribute (class) of examples, or Decision node - specifies some test to be carried out on a single attribute value, with one branch and sub-tree for each possible outcome of the test. Decision tree induction is a typical inductive approach to learn knowledge on classification.

The key requirements to do mining with decision trees are:

· Attribute value description: Object or case must be expressible in terms of a fixed collection of properties or attributes. This means that we need to discretise continuous attributes, or this must have been provided in the algorithm.

· Predefined classes (target attribute values): The categories to which examples are to be assigned must have been established beforehand (supervised data).

· Discrete classes: A case does or does not belong to a particular class, and there must be more cases than classes.

* Sufficient data: Usually hundreds or even thousands of training cases. A decision tree is constructed by looking for regularities in data. [27] [5]

2.4.2 ID3 Algorithm

J. Ross Quinlan originally developed ID3 at the University of Sydney. He first presented ID3 in 1975 in a book, Machine Learning, vol. 1, no. 1. ID3 is based on the Concept Learning System (CLS) algorithm. [28]

function ID3

Input: (R: a set of non-target attributes,

C: the target attribute,

2.4.3 Functionality of ID3

ID3 searches through the attributes of the training instances and extracts the attribute that best separates the given examples. If the attribute perfectly classifies the training sets then ID3 stops; otherwise it recursively operates on the m (where m = number of possible values of an attribute) partitioned subsets to get their "best" attribute.

The algorithm uses a greedy search, that is, it picks the best attribute and never looks back to reconsider earlier choices. If the dataset has no such attribute which will be used for the decision then the result will be the misclassification of data.

Entropy - a measure of homogeneity of the set of examples. [5]

Entropy(S) = - pplog2 pp - pnlog2 pn (1)

(2)

2.4.4 Decision Tree Representation

A decision tree is an arrangement of tests that prescribes an appropriate test at every step in an analysis. It classifies instances by sorting them down the tree from the root node to some leaf node, which provides the classification of the instance. Each node in the tree specifies a test of some attribute of the instance, and each branch descending from that node corresponds to one of the possible values for this attribute. This is illustrated in 2.3. The decision rules can also be obtained from ID3 in the form of if-then-else, which can be use for the decision support systems and classification.

Given m attributes, a decision tree may have a maximum height of m. [29][5]

2.4.5 Challenges in decision tree

Following are the issues in learning decision trees:

* Determining how deeply to grow the decision tree.

* Handling continuous attributes.

* Choosing an appropriate attribute selection measure.

* Handling training data with missing attribute values.

* Handling attributes with differing costs and

* Improving computational efficiency.

2.4.6 Strengths and Weaknesses

Following are the strengths and weaknesses in decision tree:

Strengths

Weaknesses

It generates understandable rules.

It is less appropriate for estimation tasks where the goal is to predict the value of a continuous attribute.

It performs classification without requiring much computation.

It is prone to errors in classification problems with many class and relatively small number of training examples.

It is suitable to handle both continuous and categorical variables.

It can be computationally expensive to train. The process of growing a decision tree is computationally expensive. At each node, each candidate splitting field must be sorted before its best split can be found. Pruning algorithms can also be expensive since many candidate sub-trees must be formed and compared.

It provides a clear indication of which fields are most important for prediction or classification.

It does not treat well non-rectangular regions. It only examines a single field at a time. This leads to rectangular classification boxes that may not correspond well with the actual distribution of records in the decision space.

Table 2.1

2.4.7 Applications

Decision tree is generally suited to problems with the following characteristics:

a. Instances are described by a fixed set of attributes (e.g., temperature) and their values (e.g., hot).

b. The easiest situation for decision tree learning occurs when each attribute takes on a small number of disjoint possible values (e.g., hot, mild, cold).

c. Extensions to the basic algorithm allow handling real-valued attributes as well (e.g., a floating point temperature).

d. A decision tree assigns a classification to each example.

i- Simplest case exists when there are only two possible classes (Boolean classification).

ii- Decision tree methods can also be easily extended to learning functions with more than two possible output values.

e. A more substantial extension allows learning target functions with real-valued outputs, although the application of decision trees in this setting is less common.

f. Decision tree methods can be used even when some training examples have unknown values (e.g., humidity is known for only a fraction of the examples). [30]

Learned functions are either represented by a decision tree or re-represented as sets of if-then rules to improve readability.

2.5 Unsupervised Learning (K-Means Clustering)

Cluster analysis is a set of methodologies for automatic classification of samples into a number of groups using a measure of association, so that the samples in one group are similar and samples belonging to different groups are not similar. The input for a system of cluster analysis is a set of samples and a measure of similarity (or dissimilarity) between two samples. The output from cluster analysis is a number of clusters that form a partition, or a structure of partitions, of the data set. One additional result of cluster analysis is a generalised description of every cluster, and this is especially important for a deeper analysis of the data set's characteristics.

K-Means clustering algorithm is the most commonly used algorithm employing a square-error criterion. It is used for finding the similar patterns due to its simplicity and fast execution. It starts with a random, initial partition and keeps re-assigning the samples to clusters, based on the similarity between samples and clusters, until a convergence criterion is met. Typically, this criterion is met when there is no re-assignment of any sample from one cluster to another that will cause a decrease of the total squared error. K-Means algorithm is popular because it is easy to implement, and its time and space complexity is relatively small. A major problem with this algorithm is that it is sensitive to the selection of the initial partition and may converge to a local minimum of the criterion function if the initial partition is not properly chosen.

Clustering in data mining is a useful technique for discovering interesting data distribution and pattern in the data sets. It is actually a way to organize similar patterns into groups. It appears extensively in machine learning literature and in most data mining suits. It is probably the father of the entire family of clustering algorithms. This algorithm was introduced by J.B.MacQueen in 1967. [31]

2.5.1 Cluster and Clustering

Clustering of data is a method by which a large set of data is grouped into clusters of smaller sets of similar data. It is concerned with finding structure in a large set of data. It is defined as: The process of organising objects into groups whose members are similar in some way. A cluster is therefore a collection of objects which are “similar” between them and “dissimilar” to objects belonging to other clusters. Similarity criterion is a distance and this is called distance-based clustering.

2.5.2 Applications of Clustering

Clustering techniques are used for combining observed examples into clusters (groups) which satisfy two main criteria. Firstly, each group or cluster is homogeneous; examples that belong to the same group are similar to each other and secondly, each group or cluster should be different from other clusters, i.e., examples that members belong to one cluster should be different from the examples of other clusters.

2.5.3 Clusters Analysis

Depending on the clustering technique, clusters can be expressed in different ways:

* Identified clusters may be exclusive, so that any example belongs to only one cluster.

* They may be overlapping; an example may belong to several clusters.

* They may be probabilistic, whereby an example belongs to each cluster with a certain probability.

* Clusters might have hierarchical structure, having crude division of examples at highest level of hierarchy, which is then refined to sub-clusters at lower levels. [31]

2.5.4 K-Means Algorithm

The algorithm has as an input a predefined number of clusters that is the ‘k' from its name. Means stands for an average, an average location of all the members of a particular cluster. When dealing with clustering techniques, one has to adopt a notion of a high dimensional space, or space in which orthogonal dimensions are all attributes from the table of data we are analysing. The value of each attribute of an example represents a distance of the example from the origin along the attribute axes. In order to use this geometry efficiently, the values in the data set must all be numeric (categorical data must be transformed into numeric ones) and should be normalised in order to allow fair computation of the overall distances in a multi-attribute space. The basic mechanism of all clustering algorithms is represented in 2.4.

K-Means algorithm is an iterative procedure, where final results depend on the values selected for initial centroids A Centroid is an artificial point in the space of records which represents an average location of the particular cluster. The coordinates of this point are averages of attribute values of all examples that belong to the cluster. Finding useful pattern in large datasets has attracted considerable interest recently and of the most widely studied problems in this area is the identification of clusters or densely populated regions, in a multi-dimensional data set.

Given the desired number of clusters ‘k' and a dataset of ‘N' points and a distance-based measurement function, two major problems with this K-means Clustering Algorithm remain:

Wrong choice of initial centroids: This problem deals with the production of erroneous results. The algorithm depends upon this selection. If the selection is correct or accurate then the result will be fine otherwise it may fail.
Selection of initial partition: The algorithm is sensitive in selection of the initial partition and may converge to a local minimum of the criterion function if the initial partition is not properly chosen.

It has two primary steps first, the assignment step where the instances are placed in the closest class and the second one, re-assignment step where the class centroids are recalculated from the instances to the class. [33] [22]

The K-means-clustering algorithm is a popular approach to finding clusters due to simplicity of implementation and fast execution. It appears extensively in the machine learning literature and in most data mining suites of tools. However, its implementation makes an algorithm whose behaviour is complex. The following are very important points to understand this behaviour:

* How do we prepare the data for clustering?

* How do we formulate the problem and use the clustering tool?

* How do we interpret the clusters and use them? [34] [22]

Centroid of a cluster

Centroid is an artificial point in the space of records which represents an average location of the particular cluster. The coordinates of this point are averages of attribute values of all examples that belong to the cluster.

The basic steps of K-means Algorithm

K-Means algorithm's basic steps are shown in 2.5.

The interpretation of this flow chart in 2.5 is follows:

Enter the number of clusters and number of iterations, which are the required and basic inputs of the K-Means algorithm, then compute the initial centroids by providing the formula shown in equation 3 and 4 below. Calculate the distance either by Euclidean's distance or City Block (Manhattan) distance formulae (equations 5 and 6). On the basis of these distances, generate the partition by assigning each sample to the closet cluster. Compute new cluster centers as centroids of the clusters, again compute the distances and generate the partition. Repeat this until the cluster memberships stabilises.

Now the initial centroid will be C(ci, cj).Where: max X, max Y, min X and min Y represent maximum and minimum values of X and Y attributes respectively. k represents number of clusters and i,j and n vary from 1 to k where k is an integer.

In this way, we can calculate the initial centroids; this will be the starting point of this algorithm by using the square-errors.

Where d(xi, xj) is the distance between xi and xj. xi and xj are the attributes of a given object, where i and j vary from 1 to N where N is total number of attributes of a given object. i,j and N are integers.

There are different techniques available to use the initial points (it is required as an input). The following are the different methods to calculate the initial points:

* Corner: In this method all the values in the data sets are scaled to be in [-1, 1], the set of all the clusters close to the vertices (-1,…,-1) is considered. This is usually a ‘bad' set of initial points since it lies on the boundary of the data, and can be considered an outlier.

* Bins: This method consists in dividing the space in bin and then takes random points inside each bin, this assures that the set of initial points are distributed covering the entire dataset.

* Centroid: This method consists of choosing all the starting clusters close to the mass centroid of the dataset. Each cluster center is calculated by adding a small random perturbation to the centroid of the dataset.

* Spread: The cluster centers are distributed randomly and trying to cover the entire space. This method is similar to bins.

* PCA: The data points are projected in the space of the principal component, a clustering procedure is applied to this one-dimensional set. The cluster centres are calculated depending on the obtained clusters in the one-dimensional space.

Among the above mentioned methods, in order to calculate the initial points, we will opt for the ‘centroid' method because it has been advocated throughout the literature that the centroid method produces the best results.

Stages of Clustering Applications:

Typically, a clustering application involves the following stages as illustrated in 2.6:

* Develop a data set. (In order to develop a data set one needs to define a substantive problem or issue).

* Data pre-processing. (In this stage missing and unreliable entries are checked).

* Clustering data. (After Applying the Algorithm, the data set will be clustered into the required number of clusters).

* Interpretation of clusters. (The most important stage, after the clustering, the interpretation of these clusters).

* Drawing conclusions. (The final stage is drawing conclusions with respect to the issue in question from the result).

The view is supported by the fact that, typically, clustering results are not supported to solve the entire substantive problem but rather contribute to an aspect of it. On the other hand, clustering algorithms are supposed to be applied to situations and issues at which the user's knowledge of the domain is not quite deep but rather developing. K-Means is implemented in statistical packages like Statistical Package for Social Science (SPSS) and data mining programs such as Clementine and MineSet. [35]

2.5.5 Strengths and Weaknesses

One of the most basic, straightforward clustering techniques is the classic K-means algorithm, in use for several decades. It is probably the father of this entire family of algorithms. K-Means forms clusters in numeric domains, partitioning instances into disjoint clusters. Variations of K-Means where the Euclidean distance function is replaced by another distance have been proposed. It has been shown that the performance of the method is strongly related to the distance used. In summary, only the K-means algorithm and its equivalent in an artificial neural networks domain-the Kohonen network have been applied for clustering on large data sets. Other approaches have been tested, typically, on small data sets. The reasons behind the popularity and weaknesses of K-Means algorithms are as follows:

Strengths

Weaknesses

Its time complexity is O(nkl), where n is the number of samples, k is the number of clusters, and 1 is the number of iterations taken by the algorithm to converge. Typically, k and 1 are fixed in advance and so the algorithm has linear time complexity in the size of the data set.

Although this algorithm is easy to implement, it has the drawback of being greedy in the sense that it tends to get stuck in a local minimum that usually depends on the initial centre provided. This makes this algorithm very sensitive to initial points. As an alternative to the problem of local minimum, stochastic methods with a close relation to physics, such as variations of simulated annealing, have been applied to clustering optimisation problems.

Its space complexity is O(k + n), and if it is possible to store all the data in the primary memory, access time to all elements is very fast and the algorithm is very efficient.

If an obvious distance measure does not exist we must define it, which is not always easy, especially in multidimensional spaces.

It is an order-independent algorithm. For a given initial distribution of clusters, it generates the same partition of the data at the end of the partitioning process irrespective of the order in which the samples are presented to the algorithm.

The Results of the clustering algorithm (that in many cases can be arbitrary itself) can be interpreted in different ways.

Not applicable

Current clustering techniques do not address all the requirements adequately (and concurrently).

Not applicable

Dealing with large number of dimensions and large number of data items can be problematic because of time complexity.

Not applicable

The effectiveness of the method depends on the definition of “distance” (for distance-based clustering).

Table 2.2

2.5.6 Challenges in Clustering

Most of the issues related to automatic cluster detection are connected to the kinds of questions wanted to answer in the data mining project, or data preparation for their successful application.

* Distance measure: Most clustering techniques use for the distance measure is the Euclidean distance formula (square root of the sum of the squares of distances along each attribute axes). Non-numeric variables must be transformed and scaled before the clustering can take place. Depending on these transformations, the categorical variables may dominate clustering results or they may be even completely ignored.

* Choice of the right number of clusters: If the number of clusters k, in the K-Means method, is not chosen to match the natural structure of the data, the results will not be good. The proper way to alleviate this is to experiment with different values for k. In principle, the best k value will exhibit the smallest intra-cluster distances and largest inter-cluster distances. More sophisticated techniques measure these qualities automatically, and optimise the number of clusters in a separate loop (Auto Class).

* Cluster interpretation: Once the clusters are discovered they have to be interpreted in order to have some value for the data mining project. There are different ways to utilise clustering results:

i. Cluster membership can be used as a label for the separate classification problem. Some descriptive data mining technique (like decision trees) can be used to find descriptions of clusters.

ii. Clusters can be visualised using 2D and 3D scatter graphs or some other visualisation technique.

iii. Differences in attribute values among different clusters can be examined, one attribute at a time.

* Application issues: Clustering techniques are used when we expect natural groupings in examples of the data. Clusters should then represent groups of items (products, events, customers) that have a lot in common. Creating clusters prior to application of some other data mining technique (decision trees, neural networks) might reduce the complexity of the problem by dividing the space of examples. This space partitions can be mined separately and such two-step procedure might exhibit improved results (descriptive or predictive) as compared to data mining without using clustering.

2.5.7 Applications

The followings are the possible application rather than areas of this K-Means Clustering Algorithms:

* Marketing: Finding groups of customers with similar behaviour given a large database of customer containing their properties and past records.

* Biology: Classification of plants and animals given their features.

* Libraries: Book ordering.

* Insurance: Identifying groups of motor insurance policy holders with a high average claim cost; identifying frauds.

* City-planning: Identifying groups of houses according to their house type, value and geographically location.

* Earthquake studies: Clustering observed earthquake epicenters to identify dangerous zones.

* WWW: Document classification; clustering web log data to discover groups of similar access patterns.

* Medical Sciences: Classification of medicines; patient records according to their doses etc. [36]

Chapter 3 Results and Discussion

3.1 Results and Discussion

The data mining algorithm has been implemented in the form of a multiagent system, comprising two agents. The first agent implements K-means algorithm with proposed initial centroids formula and the second one is based on ID3 algorithm for the interpretation of its results, under the ‘Kaariboga” framework, an experimental Java-based mobile agent framework [Annex]. Choosing a clustering algorithm, however, can be a difficult task; even finding just the most relevant approaches for a given dataset is not obvious. Finding a good similarity function depends strongly on the dataset and the determinant elements are the nature of the data and the desired number of clusters. Another element is the type of input and tools that the algorithm requires because some algorithms only handle the numeric inputs and others categorical. For simplicity the tests were performed on datasets which have numerical attributes values. The chosen datasets are:

* Iris data set: This data set contains information about the flowers. There are 3 different classes of flowers. It has 150 records with 5 attributes. After applying the process of normalisation, only 100 are selected.

* Vote data set: This data set is about the polls in USA. There are 2 different classes of polls i.e. Republicans or Democrats. There are 300 records and 17 attributes. The data set is normalised and the same number of records for clustering is chosen.

* Breast data set: This is a medical data set, containing information about the Breast's diseases. There are 2 different classes of Breast's diseases. It has total of 700 records and 11 attributes. After applying the normalisation process, only 233 records are selected for clustering.

* Diabetes data set: This is also a medical data set, containing information about Diabetes' diseases. We have divided this data set into 5 different classes. It has total 513 records and 8 attributes. Similarly, we normalise this dataset and select only 256 records for clustering.

The vertical partitions of each datasets by selecting the proper number of attributes have been created which are suitable for this clustering algorithm. The clusters are created on the basis of the classes in these four datasets, that is, for dataset ‘iris' 3 clusters, for dataset ‘vote' 2 clusters, for dataset ‘breast' 2 clusters and for dataset ‘diabetes' 5 clusters; which is also the value of ‘k' in this algorithm. Another required input is the number of iterations, which is ‘n', for these datasets the value of ‘n = 100' is chosen. These datasets are illustrated in Appendix A.

There is no predefined rule, how many clusters will be created and what will be the number of iterations? It always depends upon the user. If the result is not according to the requirements, the values of ‘k' (number of clusters) and ‘n' (number of iterations) can be changed every time until the required and better results are found. This is a weakness of this algorithm. Once the clusters are discovered, they have to be interpreted in order to have some value; this is another major issue in this algorithm. There are different ways to utilise clustering results; clusters membership can be used as a label for the separate classification problems, some descriptive data mining techniques like ID3 (decision tree) can be used to find descriptions of clusters and clusters can be visualised using 2D or 3D scattered graphs. We have used 2D scattered graphs and ID3 for visualizing as well as interpreting the results of K-means clustering algorithm.

The K-means algorithm with the proposed initial centroids formula was tested on both peer-to-peer and client-server network. There is no difference between results obtained from both modes. The results are demonstrated in tables 3.1, 3.2, 3.3, 3.4 and 3.5.

K-Means

Number of Clusters

Number of Iterations

Record/cluster

Percentage record/cluster

Euclidean's distance

3

100

1 = 38

2 = 50

3 = 12

38

50

12

City Block (Manhattan) distance

3

100

1 = 27

2 = 38

3 = 35

27

38

35

Table 4.1-Iris a flower dataset

K-Means

Number of Clusters

Number of Iterations

Record/cluster

Percentage record/cluster

Euclidean's distance

2

100

1 = 181

2 = 119

60

39

City Block (Manhattan) distance

2

100

1 = 115

2 = 185

38

61

Table 3.2 - Vote a polls dataset

K-Means

Number of Clusters

Number of Iterations

Record/cluster

Percentage record/cluster

Euclidean's distance

2

100

1 = 142

2 = 91

60

39

City Block (Manhattan) distance

2

100

1 = 178

2 = 55

76

23

Table 3.3 - Breast a medical dataset

K-Means

Number of Clusters

Number of Iterations

Record/cluster

Percentage record/cluster

Euclidean's distance

5

100

1 = 52

2 = 26

3 = 11

4 = 91

5 = 76

20

10

4

35

29

City Block (Manhattan) distance

5

100

1 = 83

2 = 11

3 = 58

4 = 81

5 = 23

32

4

22

31

8

Table 3.4 - Diabetes a medical dataset

K-Means/Data sets

Iris

Vote

Breast

Diabetes

Euclidean's distance

C1 = 38

C2 = 50

C3 = 12

C1=181

C2=119

C1 = 142

C2 = 91

C1 = 52

C2 = 26

C3 = 11

C4 = 91

C5 = 76

City-Block (Manhattan) distance

C1 = 27

C2 = 38

C3 = 35

C1=115

C2=185

C1 = 178

C2 = 55

C1 = 83

C2 = 11

C3 = 58

C4 = 81

C5 = 23

Table 3.5 - Summary of datasets

We have noticed in all the four datasets the results obtained from this proposed formula are satisfactory and consistent. So, the further tests have been demonstrated, the formula is transcendental, that is, independent of size and nature of the data.

For visualising the obtained clusters we will first apply scattered graph technique only on dataset ‘iris', by taking ‘petal_length', ‘petal_width' and ‘class' attributes of this dataset. 3.1 shows a scattered graph of all data points in ‘iris' dataset before applying the K-means algorithm.

References

[1] Cawsey, Alison. The Essence of Artificial Intelligence. ISBN: 0135717795,1997.

[2] Hermans, Bjorn. Intelligent Software Agents on the Internet: an inventory of Currently offered functionality in the information society & a prediction of (near-) future developments. The Netherlands, 9th July, 1996.

[3] Kiniry, J., and Zimmerman, Daniel. A Hands-On look at Java Mobile Agents, IEEE, 1997.

[4] Robert, S.G. Mobile Agents: Overcoming Early Hype and a Bad Name. Dartmouth College, Thayer School of Engineering 8000 Cummings Hall, Hanover, New Hampshire URL: http://csdl.computer.org/comp/proceedings/mdm/2004/2070/00/20700302.pdf., 2004

[5] Bigus, J.P. Constructing Intelligent Agents using JAVA - 2nd ed. ISBN: 0- 71- 39601-X., 2001.

[6] Mohammadian (ed), Masoud. Intelligent Agents for Data Mining and Information Retrieval. ISBN: 1591401941 Idea Group Publishing., 2004.

[7] Lang, D.B. Mobile Agents: Environments, Technologies, and Applications. In proceedings of the Third International Conference on the practical Application of Intelligent Agents and Multi-Agent Technology, 11-14. The Practical Application Company., 1998.

[8] Gray, R., Cybenko, G., Kotz, D., and Rus, D. Mobile Agents: Motivations and State of the Art, Handbook of Agent Technology, AAAI/MIT Press, 2002. URL: http://citeseer.ist.psu.edu/context/1209666/0, 2002.

[9] Ichiro, S. (2004). Selection of Mobile Agents, 24th IEEE, ICDCS 2004.

[10] Sutandiyo, W., Chhetri, M. B., Krishnaswamy, Shonali., and Loke, S.Wai. Experiences with Software Engineering of Mobile Agent Applications, Australian Software Engineering Conference (ASWEC) 2004.

[11] Wayne, Jansen., Peter, Mell., Tom, Karygiannis., and Don, Marks. (1999). Applying Mobile Agents to Intrusion Detection and Response. National Institute of Standards and Technology Computer Security Division NIST Interim Report (IR) - 6416 October 1999 at http://csrc.nist.gov/publications/nistir/ir6416.pdf , 1999.

[12] Suna, Alexandru., Fallah-Seghrouchni, Amal El., and Klein, Giles. Using Mobile Agents for resource sharing, IEEE/WIC/ACM, IAT 2004.

[13] Sundsted, T. Agents on the move, Bolster your client apps by adding agent mobility. URL:http://www.javaworld.com/javaworld/jw-07-1998/jw-07- howto.html, 1998.

[14] Suna, Alexandru., and Fallah-Seghrouchni, Amal El. A Programming language for Autonomous and Mobile Agents, IEEE/WIC, IAT 2003.

[15] Wang, Ji., Shen, Rui., and Zhu, Hong. Scenario Mechanism in Agent-Oriented Programming, IEEE, APSEC 2004.

[16] Suna, Alexandru., and Fallah-Seghrouchni, Amal El. Programming Mobile Intelligent Agents: an Operational Semantics, IEEE/WIC/ACM, IAT 2004.

[17] Tripathi, A.R., Ahmed, T., and Karnik, N.M. Experiences and future challenges in mobile agent programming. Microprocessors and Microsystems, Vol 25. March 2003, pp.121, 129. , 2003.

[18] Stallings, W. SNMP, SNMPv2 and RMON: Practical Network Management. Addison-Wesley, 1999.

[19] Stallings, W. SNMP and SNMPv2: The Infrastructure for Network Management. IEEE Communications Magazine, vol. 36, n_3, p. 37-43, March 1998.

[20] Darby, Chad., Griffin, John., Haan, Psacal den., Haan, Peter den.,Konstantinou, A.V., Li, Sing., MacLean. Sean., Mitchell II, G. E., Peach, Joel.,Wansch, Peter., and Wright, William. Beginning Java Networking. Wrox Press. ISBN: 81-7366-399-8, 2001.

[21] Donovan, Anne Maric. Knowledge Discovery in Databases and Information Retrieval in Knowledge Management Systems. Knowledge Management System, LIS385T, The University of Austin, April 22, 2003.

[22] Kantardzic, Mehmed. Data Mining: Concepts, Models, Methods, and Algorithms. ISBN: 471228524 John Wiley & Sons., 2003.

[23] Ullman, J. D. Data Mining: A knowledge discovery in databases. URL: http://www-db.stanford.edu/~ullman/mining

[24] Threaling, Kurt. An Introduction to Data Mining, http://www.threaling.com/dmintro/dmintro.htm. , 2003.

[25] Mannila, Heikki. Data mining: Machine Learning, Statistics and Databases, Proceedings of the 8th IEEE, SSDBM 96., 1996.

[26] US Census Bureau. Iris, Diabetes, Vote and Breast datasets. Publicly available at URL: www.sgi.com/tech/mlc/db.

[27] Keedwell, Ed., Bessler, Florian., Narayanan, Ajit., and Savic, Dragan. From Data Mining to Rule Refining, A new tool for post data mining rule optimization, IEEE, ICTAI 2000.

[28] Quinlan, J.R. C4.5: Programs for Machine Learning. San Francisco: Morgan Kanfmann. ISBN: 1-55860-238-0., 1993.

[29] Bentayeb, Fadila., Darmont, Jerome., and Udrea, Cedric. Efficient Integration of Data Mining Techniques in Database Management Systems,IEEE, IDEAS 2004.

[30] Zhang, Jianping., Bloedorn, Eric., Rosen, Lowell., and Venese, Daniel. Learning Rules from Highly Unbalanced Data Sets, IEEE, ICDM 2004.

[31] MacQueen, J.B. "Some Methods for classification and Analysis of Multivariate Observations, Proceedings of 5-th Berkeley Symposium on Mathematical Statistics and Probability", Berkeley, University of California Press, 1:281-297

[32] Fung, Glenn. A Comprehensive Overview of Basic Clustering Algorithms. June 22, 2001.

[33] Davidson, Ian. Understanding K-Means Non-hierarchical Clustering, SUNY Albany - Technical Report 2002.

[34] Kanungo, T., Mount, D. M., Netanyahu, N. S., Piatko, C. D., Silverman, R.,and Wu, A.Y. An Efficient k-Means Clustering Algorithm: Analysis and Implementation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, No. 7, July 2002.

[35] Mirkin, Boris. Towards Comprehensive Clustering of Mixed Scale Data with K-Means. School of Computer Science and Information Systems, Birkbeck College, University of London, Mallet Street, London, WC1E 7HX., 1998.

[36] Skrypnik, Irina., Terziyan, Vagan., Puuronen, Seppo., and Tsymbal, Alexey. Learning Feature Selection for Medical Databases. CBMS 1999.

[37] Horvat, H., Cvetkovi, D., Milutinovi, D., Kovi, P., and Kovaevi, V. Mobile Agents and Java Mobile Agent Toolkits. Proceedings of the 33rd Hawaii International Conference on System Sciences (HICSS-33). CD, ROM, IEEE Computing Society, Los Alamos, CA, p.10. , 2000.

[38] Kaariboga. A Java- Based Experimental Mobile Agent Framework. URL: http//www.projectory.de/kaariboga. , 2004.

[39] Pakistan. Map of Pakistan. URL: www.mideastweb.org/pakistan.htm., 2004.

[40] http://dms.irb.hr/tutorial/tut_dtrees.php

Appendix A

Appendix B

An agent program for the ID3 algorithm under kaariboga agent framework package org.kaariboga.agents;

import java.lang.*;

import java.io.*;

import java.util.*;

import java.awt.*;

import java.awt.event.*;

import javax.swing.*;

import javax.swing.event.*;

import org.kaariboga.core.Kaariboga;

import org.kaariboga.core.KaaribogaEvent;

public class dtAgent1 extends Kaariboga

{ String filename1;

String outputfile;

public dtAgent1(String name)

{ super("DECISIONTREE2_"+name);

}

public void onArrival()

{ System.out.println("This Agent will print the decision rules");

}

public void onDestroy() throws ArrayIndexOutOfBoundsException

{ System.out.println("The Agent is going to be destroyed:");

}

public void KaaribogaDestroyRequest(KaaribogaEvent e)

{ System.out.println("The agent Destroyed");

}

public void run()

{

JOptionPane dialog = new JOptionPane();

filename1 = dialog.showInputDialog (null, "Enter Data File Name");

dtmain1 dtm = new dtmain1();

long startTime = System.currentTimeMillis();

try

{ String in = dtm.inputFile(filename1);

}

catch(Exception e)

{ System.out.println("Unable to open this Data File:"+e);

}

System.out.println("The Decision Rules are...");

dtm.decisionRules(); // actual program which will print the decision rules

long endTime = System.currentTimeMillis();

long totalTime = (endTime - startTime)/1000;

System.out.println("Total Time to Build a Decision Tree is: "+totalTime+" Second");

}

}

Appendix C

A class diagram for ID3 algorithm.

A class diagram for K-means clustering algorithm using City Block formula.

A class diagram for K-means clustering algorithm using Euclidean's distance.

Message exchange between agents on different servers.

Message exchange between Agents on same server

Glossary

Agent Oriented Programming (AOP): It is a programming technique which deals with an object, which has independent thread of control and can be initiated. Centroid of a cluster: The centroid of multi-dimensional data points is the data point that is the mean of the values in each dimension. For XY data, the centroid is the point at (mean of the X values, mean of the Y values). Average or mean value of the objects contained in the cluster on each variable.

Clustering: Clustering of data is a method by which large a set of data is grouped into clusters of smaller sets of similar data. It is concerned with finding structure in a large set of data. It is defined as: The process of organizing objects into groups whose members are similar in some way. Cluster: A cluster is therefore a collection of objects which are “similar” between them and a “dissimilar” to the objects belonging to other clusters. Similarity criterion is a distance this is called distance-based clustering. Information Retrieval (IR): The methods and processes for searching relevant information out of information systems that contain extremely large numbers of documents.

Intelligent Agent: A piece of software which performs a given task using information gleaned from its environment to act in a suitable manner so as to complete the task successfully. The software should be able to adapt itself based on changes occurring in its environment, so that a change in circumstances will still yield the intended results.

Knowledge Discovery in Databases (KDD): The nontrivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data.

Learning Intelligent Agent (LIAgent): An agent which is capable of to do classification, clustering and prediction using learning algorithms is called a Learning Intelligent Agent (LIAgent).