The Multiagent System for the Prediction, Classification, Interpretation and Visualization of Diabetes, a Medical Database using Data Mining Algorithms
The k-means clustering algorithm is used to find similar patterns in large databases. It can be applied in a number of areas, for example marketing, libraries, insurance, city planning, earthquake studies, the WWW and medical sciences. In medical sciences, tasks such as the classification of medicines or the grouping of patient records according to their doses can be performed by applying the k-means clustering algorithm. The issue is how to interpret the resulting clusters. Not every user can interpret these clusters and extract the required results from them unless visualization tools are used. In this case study we use 'Diabetes', a medical dataset with the following attributes:
a) Number of times pregnant,
b) Plasma glucose concentration at 2 hours in an oral glucose tolerance test,
c) Diastolic blood pressure (mm Hg),
d) Triceps skin fold thickness (mm),
e) 2-Hour serum insulin (mu U/ml),
f) Body mass index (weight in kg/(height in m)^2),
g) Diabetes pedigree function,
h) Age (years),
i) Class (whether diabetes is +ive or -ive).
There are two kinds of data source: centralized and distributed. A distributed data source can be partitioned in two ways. The first is horizontal partitioning, where the same set of attributes is held on each node; this is also called the homogeneous case. The second is vertical partitioning, where different attributes are observed at different nodes; this is also called the heterogeneous case. In a vertical partition, each node must contain a unique identifier to facilitate the matching of records across nodes.
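The difference between the two partitioning approaches can be sketched in a few lines of Python. This is only an illustration: the `id` field is a hypothetical unique identifier added for the example, the attribute names are shortened, and the helper functions `horizontal_partition`, `vertical_partition` and `rejoin` are assumptions, not part of the system described in this paper.

```python
# Three toy records; "id" is a hypothetical unique identifier added for illustration.
records = [
    {"id": 1, "pregnancies": 4, "glucose": 148, "blood_pressure": 72, "skin_fold": 35},
    {"id": 2, "pregnancies": 2, "glucose": 85, "blood_pressure": 66, "skin_fold": 29},
    {"id": 3, "pregnancies": 2, "glucose": 185, "blood_pressure": 64, "skin_fold": 0},
]


def horizontal_partition(rows, n_nodes):
    """Homogeneous case: each node holds all attributes but only some of the rows."""
    return [rows[i::n_nodes] for i in range(n_nodes)]


def vertical_partition(rows, attribute_groups, key="id"):
    """Heterogeneous case: each node holds all rows but only some attributes, plus the key."""
    return [
        [{key: row[key], **{name: row[name] for name in group}} for row in rows]
        for group in attribute_groups
    ]


def rejoin(partitions, key="id"):
    """Match the vertical partitions back together using the shared identifier."""
    merged = {}
    for partition in partitions:
        for row in partition:
            merged.setdefault(row[key], {}).update(row)
    return [merged[k] for k in sorted(merged)]


h_nodes = horizontal_partition(records, 2)
v_nodes = vertical_partition(
    records, [["pregnancies", "glucose"], ["blood_pressure", "skin_fold"]]
)
```

In this sketch, `rejoin(v_nodes)` can recover the full records only because every vertical partition carries the same identifier, which is why the identifier is required in the heterogeneous case.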
In this paper we use vertical partitioning of the data. The following four vertical partitions of the 'Diabetes' dataset are created; the attribute 'Class' is used as the unique identifier in all the partitions:
Table - 1 Vertically distributed Diabetes dataset at node 1
Number of times pregnant | Plasma glucose concentration at 2 hours in an oral glucose tolerance test | Class
4 | 148 | -ive
2 | 85 | +ive
2 | 185 | -ive
Table - 2 Vertically distributed Diabetes dataset at node 2
Diastolic blood pressure (mm Hg) | Triceps skin fold thickness (mm) | Class
72 | 35 | -ive
66 | 29 | +ive
64 | 0 | -ive
Table - 3 Vertically distributed Diabetes dataset at node 3
2-Hour serum insulin (mu U/ml) | Body mass index (weight in kg/(height in m)^2) | Class
0 | 33.6 | -ive
94 | 28.1 | +ive
168 | 43.1 | -ive
Table - 4 Vertically distributed Diabetes dataset at node 4
Diabetes pedigree function | Age | Class
0.627 | 50 | -ive
0.351 | 31 | +ive
2.288 | 33 | -ive
Each partitioned table is a dataset of 1500 records; only three records are shown in each table as examples. The partitioned datasets are placed on different nodes of the distributed network, as shown in figure 1. Traditional centralized data analysis does not scale well in distributed applications. In a distributed environment, analyzing distributed data is a non-trivial problem because of constraints such as limited bandwidth, privacy-sensitive data and distributed compute nodes.
Because of their adaptive and deliberative reasoning features, intelligent mobile agents are well suited to cope with the problems of distributed systems. An intelligent, learning and autonomous agent is capable of capturing and applying domain-specific knowledge, learning, information and reasoning in order to take actions in pursuit of a goal. A distributed problem-solving environment fits well with a multiagent system (MAS), since the solution requires autonomous behavior, collaboration and reasoning. The agents perform the underlying data analysis tasks efficiently in a distributed manner. A MAS offers an architecture for distributed problem solving and deals with complex applications that require it. Since a MAS is itself a distributed system, combining data mining algorithms with a MAS for data analysis further enhances the processing power of the application.
A multiagent system is used to extract results from these nodes. The agents can roam freely from one node to another and can be stored at any node in the distributed network. The results produced by the agents can also be stored anywhere in the network. The architecture of the mobile intelligent agents is shown in figure 2. This multiagent system is capable of performing classification, interpretation and visualization of large datasets.
This multiagent system comprises three intelligent mobile agents. The first agent performs the classification of the given dataset using the k-means clustering algorithm and provides clusters as output. The second and third agents perform the interpretation and visualization of these clusters using the decision tree algorithm and the data visualization techniques of data mining. The user can directly access the clusters produced by the k-means algorithm and can interpret them using 2D graphs from data visualization and decision rules derived from the decision tree algorithm.
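As a rough illustration of the clustering and interpretation steps, the following Python sketch groups a few two-attribute points with a plain k-means loop and then searches for a single threshold rule that separates the resulting clusters. The sample points are made up for the example and are not taken from the dataset, the function names `k_means` and `best_stump` are hypothetical, and the one-level "stump" is a simplified stand-in for a full decision tree algorithm rather than the method used by the agents.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    return tuple(sum(column) / len(points) for column in zip(*points))


def k_means(points, k, max_iter=100, seed=0):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids are k distinct data points
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_point(members)
    return labels, centroids


def best_stump(points, labels, names):
    """Depth-one rule: the (errors, attribute, threshold) that best splits the labels."""
    best = None
    for j, name in enumerate(names):
        values = sorted({p[j] for p in points})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [lab for p, lab in zip(points, labels) if p[j] <= threshold]
            right = [lab for p, lab in zip(points, labels) if p[j] > threshold]
            # Count the points that disagree with the majority label on each side.
            errors = sum(
                len(side) - max(side.count(lab) for lab in set(side))
                for side in (left, right)
            )
            if best is None or errors < best[0]:
                best = (errors, name, threshold)
    return best


# Made-up (glucose, BMI) points for illustration only.
points = [(85, 28.1), (90, 27.0), (88, 29.5), (185, 43.1), (180, 41.0), (190, 44.2)]
labels, centroids = k_means(points, k=2)
rule = best_stump(points, labels, names=("glucose", "bmi"))
```

Here `rule` holds the number of misassigned points, the attribute name and the threshold, which can be read as a decision rule of the form "if the attribute is at most the threshold, the record belongs to one cluster, otherwise to the other". A real decision tree would repeat this kind of split recursively on each side.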
The study could be extended to large-scale distributed databases in order to validate the effectiveness of the proposed methodology. Further investigation in this direction will have to take into account parameters such as data caching and the validity of the agent framework.