Comparison of Data Mining Methods Information Technology Essay


In the modern world, population growth is a smaller issue than the growth in the amount of data. The amount of data in the world roughly doubles every twenty months, and day-to-day improvements in hardware let users store that data properly and cost-effectively. A few years ago data was measured in megabytes and gigabytes; today it is measured in terabytes, which shows how fast the world's data is growing. Stored data is only worth keeping if we can take advantage of it, and in data mining, data is the main resource. To take advantage of it we must understand it, and here understanding means retrieving the patterns hidden in the data. Analyzing data is still difficult, however, because our ability to use data has not kept pace with its growth, so data mining is still an emerging field.

What does it mean for data to be non-understandable? Raw data is like an encrypted paragraph whose meaning cannot be read. We need to decrypt it into an understandable form; the data then becomes information, which we can use to meet our needs. Modern business strategies use several kinds of information, gained from patterns in data, to satisfy customer requirements and to promote products. In this context the patterns are drawn from user comments, so the resource is the comments that users give about a particular thing. In my field of interest, I would like to compare methods for classifying the user comments about a particular product and turn those comments into information that can be used for various purposes in the future.
The following example shows one area where user comments are classified. In Deegalla (2009), users can comment on particular films in a top-hundred movie collection; those comments can then be collected and categorized into groups such as Action, Romance and Horror according to particular criteria. The categorized data can be used in further activities, such as informing users about new products and services that match their interests. User comment classification can be used in several other areas in the same way. Classifying user comments requires a powerful classification methodology, and the need to find a good one gives rise to the research question stated below. Before turning to that question, it is vital to understand the key points of the research, so the following sections explain the main terms related to the proposed research.
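As a rough illustration of the categorization described above, the sketch below assigns each comment to a genre by counting keyword matches. It then collects the genres each user has commented on, which is the information needed to notify users about items matching their interests. The keyword lists, the function names `categorize` and `interests_by_user`, and the sample comments are hypothetical. This is a deliberately simple rule-based stand-in, not the method used by Deegalla (2009).

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical keyword lists, for illustration only; a real system would
# normally use a trained classifier rather than hand-picked words.
GENRE_KEYWORDS: Dict[str, Set[str]] = {
    "Action": {"explosion", "fight", "chase", "stunts"},
    "Romance": {"love", "romantic", "couple", "kiss"},
    "Horror": {"scary", "ghost", "terrifying", "creepy"},
}


def categorize(comment: str) -> Optional[str]:
    """Return the genre whose keywords occur most often in the comment, or None."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    hits = {genre: sum(t in words for t in tokens)
            for genre, words in GENRE_KEYWORDS.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


def interests_by_user(comments: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Collect the genres each user has commented on, e.g. for targeted notifications."""
    interests: Dict[str, Set[str]] = defaultdict(set)
    for user, text in comments:
        genre = categorize(text)
        if genre is not None:
            interests[user].add(genre)
    return dict(interests)


# Hypothetical sample comments as (user, comment) pairs.
comments = [
    ("alice", "The car chase and the fight scenes were great!"),
    ("bob", "A creepy, scary ghost story."),
    ("alice", "Such a romantic love story."),
]
print(interests_by_user(comments))
```

Comments that match no keyword are simply left uncategorized, and they do not add an interest to the user's profile.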

Artificial Intelligence and Data Mining

In the modern world most people have some idea of the field of Artificial Intelligence, although that image is sometimes limited to the robots and machines seen in science fiction films. The field is more powerful than we think. Artificial intelligence can be defined as follows:

Computational devices and systems, made by human beings, that act as intelligent agents. Berkeley (1997)

Such intelligent systems, ranging from small robots to large space shuttles that gather information from the universe, make up the field of Artificial Intelligence. Another application of Artificial Intelligence is data mining.

Data mining is an emerging field that retrieves patterns in data by making use of AI technologies. So what is data mining? According to Chakrabarti et al. (2009), data mining is the retrieval of patterns from electronically stored data in an automated or semi-automated way.

In most cases data mining uses an automated approach to find the patterns hidden in the data. In the past this kind of analysis was done by statisticians and forecasters; today, data mining is always performed with the aid of computers.


An algorithm is a predefined series of steps that takes some input, processes it, and produces an output. User comment classification methods use various kinds of algorithms.

User Comments as a data type in Data Mining

The main scenario in this research is retrieving patterns from data. One of the main features of Web 2.0 is that it allows users to post comments about a particular product. Those comments carry valuable information about the product, which makes them a valuable data type for data mining.

Every product needs a raw material; here, the raw material for pattern recognition is the comments that users give about a particular thing.

Research Question

As mentioned earlier, analyzing stored data is important for expanding businesses and for improving the quality of the services given to customers. For that reason, many studies have been carried out in the data mining field to improve the performance of data mining activities. Data mining strategies are already used in fields such as opinion mining, hydroinformatics, marketing opportunities in the health care sector, and tracking customer loyalty in online and traditional businesses. Data mining is also used to track online fraud and anonymous activities in cyberspace. These applications use various data mining methodologies; some perform well, while others have shortcomings, often related to the performance of their algorithms. Comparing these methodologies is therefore vital for selecting the best methodology for user comment classification, and it is essential to find out which data mining methodologies perform well on this task.

Reviewing this research gave me an interest in carrying out a study that compares data mining methods for user comment classification. As mentioned earlier, this kind of classification can later be used to automate activities related to customer satisfaction. A good classification method is therefore vital for achieving the goals above, and to guide the choice of such a method I will answer the following research question:

"How can data mining methods for classifying user comments be compared?"

By carrying out this research, I hope to help various fields benefit from it, such as businesses, social networks and any other area that needs to keep track of user comments about a particular thing. With a proper user comment classification method, they can choose a suitable method for their own classification activity, which will improve the efficiency of the classification job.

In addition, by investigating the good and bad behaviors of existing methodologies, this kind of research will help develop better methodologies for user comment classification.

As an introduction to a new field of study, and as a way to help improve an emerging strategy of the new business era, I selected this as my MSc dissertation topic.

Objectives of the research

User comment classification uses various methods to achieve its tasks, and those methods rely on different algorithms, so their performance varies. It is therefore necessary to compare the methods to arrive at a good solution. The research will pursue the following objectives in order to identify the best user comment classification method:

Summarize existing user comment classification methods

Understand the differences between the methods in terms of their algorithms and behaviors

Understand why different methods have different performance

Understand how these differences affect the classification methods

By achieving the above objectives, I will be able to identify a good user comment classification method.
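To make the idea of comparing classification methods concrete, the following minimal sketch evaluates two methods on the same labelled comments with leave-one-out cross-validation. The first method is a fixed sentiment lexicon and the second is a multinomial naive Bayes classifier. Both methods, the word lists, the six sample comments and all function names are assumptions chosen for illustration. They are not necessarily the methods this research will compare, and accuracy on such a tiny set says nothing general.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z']+", text.lower())


# Method A: a fixed sentiment lexicon (hypothetical word lists, no training).
POSITIVE = {"great", "excellent", "love", "good", "fast"}
NEGATIVE = {"bad", "poor", "broken", "slow", "hate"}


def lexicon_classify(text: str) -> str:
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    # Ties (score 0) are arbitrarily treated as positive in this sketch.
    return "positive" if score >= 0 else "negative"


# Method B: multinomial naive Bayes with add-one (Laplace) smoothing.
class NaiveBayes:
    def fit(self, texts: List[str], labels: List[str]) -> "NaiveBayes":
        self.labels = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.labels}
        self.counts = {c: Counter() for c in self.labels}
        for text, label in zip(texts, labels):
            self.counts[label].update(tokenize(text))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.labels}
        return self

    def predict(self, text: str) -> str:
        def log_posterior(c: str) -> float:
            denom = self.totals[c] + len(self.vocab)
            # Words never seen during training are ignored.
            return self.log_prior[c] + sum(
                math.log((self.counts[c][t] + 1) / denom)
                for t in tokenize(text) if t in self.vocab)
        return max(self.labels, key=log_posterior)


Classifier = Callable[[str], str]


def leave_one_out_accuracy(texts: List[str], labels: List[str],
                           train: Callable[[List[str], List[str]], Classifier]) -> float:
    """Hold out each comment in turn, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(texts)):
        predict = train(texts[:i] + texts[i + 1:], labels[:i] + labels[i + 1:])
        correct += predict(texts[i]) == labels[i]
    return correct / len(texts)


# Tiny hypothetical labelled data set, for illustration only.
texts = [
    "Great battery and fast delivery",
    "I love this phone, excellent screen",
    "Good value, works great",
    "Arrived broken, very poor quality",
    "Slow and bad customer service",
    "I hate the camera, poor photos",
]
labels = ["positive"] * 3 + ["negative"] * 3

methods = {
    "lexicon": lambda x, y: lexicon_classify,  # ignores the training data
    "naive Bayes": lambda x, y: NaiveBayes().fit(x, y).predict,
}
for name, train in methods.items():
    print(f"{name}: {leave_one_out_accuracy(texts, labels, train):.2f}")
```

The lexicon needs no training, while naive Bayes learns only from the comments in each training fold, so the two methods can score differently on the same data. The same loop works unchanged for any other classifier with the shape `train(texts, labels) -> predict`, and that shared shape is what makes a like-for-like comparison possible.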

Supported Literature

One of the leading researchers in this emerging field of data mining is Bing Liu. I used his white paper "Opinion Mining" as one of the supporting documents for my research. In this paper the author covers a series of data mining topics:

"Opinion Mining"

"Document Level Sentiment Analysis"

"Feature based Opinion Mining and Summarization", etc.

Here the author explains all the related concepts in a well-organized way, which is very helpful for new readers who want an idea of the above topics. The way he explains those concepts led me to do my research in this field. In the opinion mining section, he gives a clear and strong account of how the research question arises from the current state of data mining. A main idea of the paper, Liu (2010), is that we are not gaining the full benefit of web content data. He elaborates that comments and reviews in web pages, blogs and anything else on the web related to a particular product are a good source for further activities. By analyzing such data, he says, they can be used for self-ranking purposes, and also to carry out a self-survey of the particular product; these points define the research question. Furthermore, he notes that modern search engines search for the content a user gives them and focus mainly on a set of keywords, so there is no way to find opinions on the web about a particular topic. From this point the author defined the research question more strongly.

This is a totally new field, so others interested in the area are likely to continue the idea; since it is also a new business strategy, it should gain good recognition in online business promotion and survey activities. Because the literature underlying this research is clearly explained, further work related to this article can easily build on the document.

This research takes the traditional data mining subject a step further and expands it to meet modern user requirements and business trends. It uses an inductive, bottom-up approach: it first observes the shortcomings of current search engines and other needs for improving the reliability of web-based activities, and then uses those problems to propose a solution or theory that overcomes them.

The research method used here is experimentation, because all of these technologies were still at the experimental stage at that time. Through those experiments, the author proposes a series of solutions to improve the reliability of the web.

This document introduced me to my research question, and reviewing its literature gave me a focused idea of this new area. I then searched for reviews of the terms used in the data mining field. As mentioned earlier, this is a totally new area of data mining research. In my search I found a journal article on the terms and algorithms of data mining methodologies, written by Sang Jun Lee and Keng Siau, in which the authors summarize all the key factors needed to perform a data mining activity. These terms form the basis of the document and are clearly explained, and the document sets out all the requirements for arriving at a good data mining solution. From it I also understood the possible challenges of data mining. These ideas were easy to gain because the article is written in a simple, understandable way. After reading it, I had a rough idea of how data mining algorithms work to perform a particular task.

Approach and Research Method

This research evaluates existing theory in order to understand the good and bad behaviors of existing methods. To conduct it I use a top-down approach, moving from the theory to an analysis of the sub-parts of those methods. My interest is in investigating the methods used to classify user comments, which have already been introduced in theory, and in testing them as hypotheses through several kinds of activities.

Next, a research method must be selected to carry out the research. This will be an experimental study, because comparing the methods requires performing experiments. All supporting data should be gathered in a controlled way, and all outcomes relevant to each method need to be reported. It will therefore be a kind of laboratory experiment. I expect the following difficulties in carrying out this research:

Few experts in the problem area from whom to obtain support

Limited background knowledge of the area, as mentioned earlier

Few supporting documents and references available

Tackling these difficulties requires preparation before the research begins, so I decided to conduct focus interviews where needed to overcome these shortcomings. These focus interviews will mainly target the difficulties above, and the ideas and recommendations gathered will also be used as data for the research.

For data analysis I plan to use a qualitative method. Qualitative methods are often associated with research methods such as experiments. Here, the qualitative data will be gained through deep observation of written documents and web content and, as mentioned earlier, through focus interviews where needed. Of these methods, the main way of gathering data will be deep observation, that means

Outcomes

The main personal objective behind the research question is to gain a good understanding of the fields of AI and data mining. There is currently a trend toward developing many AI-related applications and services in computer science, and my limited knowledge of this area led me to learn and discover more about AI and data mining. From my BSc through to my MSc I have never done anything related to AI, so gaining broader knowledge of IT and experience in performing an experiment is one of my major objectives in doing this research.

By analyzing the existing algorithms related to data mining and user comment classification, the research aims to introduce the best solution for user comment classification.

The outcome of the research can be used directly in many areas, and it will directly affect the performance of the methods, because using a proper method improves the applications built on it. For example, when customers comment on a particular product, the method can categorize the comments into groups such as positive, negative or neutral. Those categories can then be used to rank products automatically. The same method can also be used to keep track of user interests, so that other products can be promoted to users according to their interests. In this way, a proper method of user comment classification will improve the quality of these applications.
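The automatic ranking step described above could be sketched as follows, assuming each comment has already been labelled positive, negative or neutral by some classifier. The net-sentiment score `(positive - negative) / total` and the name `rank_products` are hypothetical choices for illustration. They are not a ranking method proposed in this research.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def rank_products(labelled: List[Tuple[str, str]]) -> List[Tuple[str, float]]:
    """Rank products by net sentiment: (positive - negative) / total comments.

    Each item is (product, label), where label is "positive", "negative" or
    "neutral" as produced by some comment classifier.
    """
    tallies: Dict[str, Counter] = defaultdict(Counter)
    for product, label in labelled:
        tallies[product][label] += 1
    scores = {
        product: (c["positive"] - c["negative"]) / sum(c.values())
        for product, c in tallies.items()
    }
    # Highest net sentiment first; neutral comments dilute the score toward 0.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical classifier output, for illustration only.
labelled = [
    ("phone A", "positive"), ("phone A", "positive"), ("phone A", "neutral"),
    ("phone B", "negative"), ("phone B", "positive"), ("phone B", "negative"),
]
for product, score in rank_products(labelled):
    print(f"{product}: {score:+.2f}")
```

With the sample data this prints `phone A: +0.67` followed by `phone B: -0.33`.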

