Implementation Of Id3 In Java Computer Science Essay

Published:

This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

Due to vast availability of data on web, the challenges in retrieving the web data is also increasing. In our paper we discussed about web and how to mould unstructured web documents into structural format. We have also developed a java program for ID3 to classify web documents.

Keywords: Java, ID3, Web documents, Title, classification, Moulding.

Web is popular due to the availability of vast amount of data. It contains several billions of HTML documents, pictures and other multi media files. These documents reside on Internet servers and the information-exchange can be done through HTTP protocols. Individual HTML files having unique address is called Web page and collection of such web pages having same electronic address is called as Web site. The quality of the web page is decided by its title. It is the name of a web page or web site. So giving an appropriate and good title to a web page is very important. To retrieve the required web pages based on the title of the web page, the title of the web page has to be parsed (i.e.) the key words has to be extracted from the given titles. The unstructured web documents have to be moulded into structural format for classifying them using these keywords so as to retrieve the documents efficiently.

2. Moulding Web Documents

We represent the Web data in the binary format where all of the keywords derived from the schema (Title of the web page) posited in the columns and some frequent schemas are posited in the rows. If a keyword is in a frequent schema, a 1 is stored in related cell and otherwise a 0 is stored in it. In the following you can see an example of this form. The attributes of frequent schemas are stated below.

QI1: Deep Web Content mining = {Deep, Web, content, mining}

QI2: Extracting structured data from Web pages = {Extracting, structured, data, Web, pages}

QI3: Page content Rank: An approach to the web content mining = {Page, content, rank, web, content, mining}

QI4: Web mining: Information and pattern discovery on the World Wide Web = {Mining, information, pattern, discovery, World Wide Web}

We represent the web data in binary format as in below table.

Table 1: A Sample Array with Input Information.

Deep

Web

Content

Mining

Extracting

Other Keywords

Category

QI1

1

1

1

1

0

0

1

QI2

0

1

0

0

1

1

2

QI3

0

1

1

1

0

1

3

QI4

0

0

0

1

0

1

1

3. Java Program

To attain knowledge out of retrieved data we classify the documents by converting the unstructured documents into structural format. For classification of web documents we have many methods like classification by Decision tree induction, Bayesian classification, Rule-Based classification, Support vector machine etc. During late 1970s and 1980s J.Ross Quinlan, a researcher in machine learning has developed a decision tree algorithm known as ID3 [1] (Iterative Dichotomiser). ID3 uses information gain as its attribute selection. Information gain Gain(A) is given as

Gain(A) = Info(D) -InfoA(D)

We felt very difficult to calculate this information gain manually. So we have developed a java program to calculate information gain. Though accuracy wise it is not encouraging, methodology wise it is promising. We divided the program into two parts. The first part calculates Info(D) and the second part calculates the Gain(A). The web documents which we represented in binary format, constitutes as input array. This program is described as below.

/*ID3 uses information gain as its attribute selection measure. Gain(A) = Info(D) -InfoA(D). (This part of the program deals with Info(D). */

import java.math.*;

class Entropy1

{

public static void main(String arg[])

{

int a[]={/* Enter values in class columns */};

int b[]= new int[10],count=1;

int n=a.length,temp=0,i ;

for( i=0; i<n;i++)

{

for(int j=i+1;j<n;j++)

{

if(a[i]>a[j])

{

temp=a[i];

a[i]=a[j];

a[j]=temp;

}

}

}

for( i=0;i<n;i++)

{

System.out.print(a[i]);

}

int k=0;

System.out.println(„n-1="+(n-1));

for( i=0;i<n-1;i++)

{

if (a[i]==a[i+1])

{

count=count+1;

}

else

{

b[k]=count;

k=k+1;

count=1;

}

}

if(a[i-1]!=a[i])

{

if(i+1==n)

b[k]=1;

}

else if((a[i-1]==a[i]) && (i+1==n))

{

b[k]=count;

}

System.out.println("k="+k);

for( i=0;i<=k;i++)

System.out.println (b[i]);

double sum=0;

for( i=0;i<=k;i++)

{

float x = (float)b[i]/a.length;

double x1 = Math.log10(x);

double ent =-x*(x1/0.301);

sum=sum+ent;

}

/* Value of Info(D)*/

System.out.println(" info(x) = "+sum);

}

}

Output:-

Info(x) = 2.8156. (When some input array is given (extension of above sample array))

/*Second part deals with Gain(A).*/

import java.util.*;

class Ent1

{

public static void main(String arg[])

{

int[][] a={/* Enter data item values*/};

int b[] = new int[20];

int c[] = new int[20];

int b1[] = new int[20];

int c1[] = new int[20];

double in[] = new double[50];

int sum=0,in1=0;

int count=0,len,k,k1,sum1,count1,k3=0,i3;

double info,info1,info2;

System.out.println(a.length);

len=(a[0].length)-1;

System.out.println(len+1);

for(int i=0;i<len;i++)

{

sum=0;

count=0;

k=0;k1=0;

for(int j=0;j<a.length;j++)

{

if( a[j][i]==1)

{

b[k]=a[j][len];

k++;

sum=sum+1;

}

else

{

c[k1]=a[j][len];

k1++;

count=count+1;

}

}

Arrays.sort(b,0,sum);

Arrays.sort(c,0,count);

/* For array b */

sum1=1;

count1=1;

k3=0;

b1[0]=1;

for( i3=0;i3<sum-1;i3++)

{

if (b[i3]==b[i3+1])

{

sum1=sum1+1;

}

else

{

b1[k3]=sum1;

k3=k3+1;

sum1=1;

}

}

if(i3!=0)

{

if(b[i3-1]!=b[i3])

{

if(i3+1==sum)

b1[k3]=1;

}

else

if((b[i3-1]==b[i3]) && (i3+1==sum))

{

b1[k3]=sum1;

}

}

info1=0;

for( i3=0;i3<=k3;i3++)

{

float x = (float)b1[i3]/sum;

double x1 = Math.log10(x);

double ent =-x*(x1/0.301);

info1=info1+ent;

}

info1=((float)sum/a.length)*info1;

/* For array c */

for( i3=0;i3<count-1;i3++)

{

if (c[i3]==c[i3+1])

{

count1=count1+1;

}

else

{

c1[k3]=count1;

k3=k3+1;

count1=1;

}

}

if(i3!=0)

{

if(c[i3-1]!=c[i3])

{

if(i3+1==count)

c1[k3]=1;

}

else

if((c[i3-1]==c[i3]) && (i3+1==count))

{

c1[k3]=count1;

}

}

info2=0;

for( i3=0;i3<=k3;i3++)

{

float x2 = (float)c1[i3]/count;

if(x2==0)

{

info2=0;

}

else

{

double x3 = Math.log10(x2);

double ent1 =-x2*(x3/0.301);

info2=info2+ent1;

}

}

info2=((float)count/a.length)*info2;

info=/*Info(x) (from Ist program)*/ -(info1+info2);

in[in1]=info;

in1++;

}

for(int i=0;i<len;i++)

{

System.out.println("i="+i);

System.out.println(in[i]);

}

double max=in[0];

int inx=0;

/* To find max data item in the array */

for (int i=1;i<len;i++)

{

if(in[i]>max)

{

max=in[i];

inx = i;

}

}

System.out.println("max="+max+"index="+inx);

}

}

Output:-

Max= 0.489

Index = 39 (which means the value at index 39 will become root value in the decision tree).

Conclusions

In our paper we discussed about web and how to mould web documents into a structural format. Out of many methods available in data mining we have used ID3 algorithm for classification and we implemented a java program for finding information gain. This program is for better understanding of the algorithm.

Writing Services

Essay Writing
Service

Find out how the very best essay writing service can help you accomplish more and achieve higher marks today.

Assignment Writing Service

From complicated assignments to tricky tasks, our experts can tackle virtually any question thrown at them.

Dissertation Writing Service

A dissertation (also known as a thesis or research project) is probably the most important piece of work for any student! From full dissertations to individual chapters, we’re on hand to support you.

Coursework Writing Service

Our expert qualified writers can help you get your coursework right first time, every time.

Dissertation Proposal Service

The first step to completing a dissertation is to create a proposal that talks about what you wish to do. Our experts can design suitable methodologies - perfect to help you get started with a dissertation.

Report Writing
Service

Reports for any audience. Perfectly structured, professionally written, and tailored to suit your exact requirements.

Essay Skeleton Answer Service

If you’re just looking for some help to get started on an essay, our outline service provides you with a perfect essay plan.

Marking & Proofreading Service

Not sure if your work is hitting the mark? Struggling to get feedback from your lecturer? Our premium marking service was created just for you - get the feedback you deserve now.

Exam Revision
Service

Exams can be one of the most stressful experiences you’ll ever have! Revision is key, and we’re here to help. With custom created revision notes and exam answers, you’ll never feel underprepared again.