Result Analysis using Fast Clustering Algorithm

3373 words (13 pages) Essay

27th Mar 2018 Computer Science

Disclaimer: This work has been submitted by a university student.


Result Analysis using Fast Clustering Algorithm and Query Processing using Localized Servers.


Abstract—This paper identifies student records that produce compatible results using the Fast Clustering Selection Algorithm. A selection algorithm may be evaluated from both the efficiency and the effectiveness points of view: efficiency concerns the time required to find a record, while effectiveness concerns the quality of the record. The selection algorithm fetches each result with the help of the student's register number, and it works in two steps. In the first step, the sender sends a request containing the register number to the server, and the record for each individual is obtained by a hit method. In the second step, the most representative record, the one most strongly related to the target classes, is fetched from the database by register number. The string generation algorithm is guaranteed to generate the k optimal result candidates. We analyze the results of students using the selection algorithm, for which we define compatible operation analogs by introducing a max-min operation and a min-max operation. The system also collects data from the web automatically to enrich the result. Analyzing the results of a large number of students takes considerable time, and the accuracy of the results must also be considered; fetching each result individually by register number is time-inefficient. In the proposed system, we therefore obtain the results for a group of students. The selection method fetches the result of every student whose register number lies within an entered range: each result is fetched automatically from the server and then stored in the client database. We then sort the results of the students as a group. This increases accuracy and efficiency, reduces the burden on the people who analyze the results, and allows the analysis to be completed within a short period. Reports can then be generated based on the grade system.
Our experimental evaluation shows that our approach generates superior results.

Extensive experiments on large real data sets demonstrate the efficiency and effectiveness of the approach. Finally, we sort the results of students using the FAST clustering selection algorithm.

Index Terms– FAST, Minmax & Maxmin Operation.


Students play a major role in the educational field. They are evaluated under different categories: their choice of institution, how well they study, the knowledge they gain, and the marks they obtain. Result analysis of each student paves the way for their higher education as well as their future improvement. Percentage marks awarded before the grade scheme was introduced were converted into grades for ease of comparison.

The reliability of the new scheme was again studied using statistical analysis of data obtained from both the old and new schemes. Some assessment schemes use a grading category index (GCI) instead of the actual mark for each assessment criterion. GCIs usually offer a smaller number of options to choose from when awarding results. For example, a GCI may have eight levels, with the highest awarded to exceptional students and the lowest to students of inadequate performance. This reduced number of categories has been shown to result in less variability between assessors than systems that use marks ranging from 0 to 100. The results of the students are analyzed using the Fast Clustering Selection Algorithm.
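Converting percentage marks into a small number of grade categories can be sketched as a table lookup. The eight bands and their letter labels below are hypothetical, chosen only to illustrate an eight-level GCI; they are not the scheme used in the study.

```python
# Hypothetical eight-level grade category index (GCI): (lower bound in %, grade).
# Bands are checked from the highest lower bound downwards.
GCI_BANDS = [
    (90, "O"), (80, "A+"), (70, "A"), (60, "B+"),
    (50, "B"), (45, "C"), (40, "P"), (0, "F"),
]


def to_grade(percentage: float) -> str:
    """Map a percentage mark (0-100) to its grade category."""
    if not 0 <= percentage <= 100:
        raise ValueError(f"percentage out of range: {percentage}")
    for lower_bound, grade in GCI_BANDS:
        if percentage >= lower_bound:
            return grade
    raise AssertionError("unreachable: the lowest band starts at 0")


print([to_grade(m) for m in (95, 72.5, 40, 39)])  # ['O', 'A', 'P', 'F']
```

Fewer categories mean fewer boundaries where two assessors can disagree, which is the effect the paragraph describes.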


In this paper, we analyze the results of students using clustering methods, with filtering based on a max-min operation and a min-max operation. The filter method is usually a good choice when the number of records is very large. The selection algorithm works in two steps.
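The paper does not define the max-min and min-max operations. One common reading, sketched below under that assumption, applies them to a matrix of marks with one row per student and one column per subject: max-min takes the largest of the row minima (the best "weakest subject" score), and min-max takes the smallest of the row maxima. The function names and sample marks are hypothetical.

```python
from typing import Sequence


def max_min(matrix: Sequence[Sequence[float]]) -> float:
    """Largest of the row minima (best worst-case score across students)."""
    return max(min(row) for row in matrix)


def min_max(matrix: Sequence[Sequence[float]]) -> float:
    """Smallest of the row maxima (lowest best-case score across students)."""
    return min(max(row) for row in matrix)


# Rows are students, columns are subjects (hypothetical marks).
marks = [
    [72, 65, 80],
    [55, 90, 60],
    [68, 70, 75],
]
print(max_min(marks), min_max(marks))  # 68 75
```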

In the first step, the sender sends a request containing the register number to the server, and the record for each individual is obtained by a hit method. In the second step, the most representative record, the one most strongly related to the target classes, is fetched from the database. The approach consists of three components: query generation, data selection, and presentation. It determines the required information automatically and then collects the data from the web. Because it processes a large set of data, it is able to deal with more complex queries. To collect results, we need to generate informative queries. The queries have to be generated for every individual student, which increases the time needed to fetch the results and makes the process inefficient. To overcome this, the queries are generated along with each student's unique identification number, i.e., the register number. Based on the generated queries, we vertically collect image data with multimedia search engines. We then perform reranking and duplicate removal to obtain a set of accurate and representative results.
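Generating one query per register number from a user-entered range might look like the following. The register-number layout (a college code followed by a zero-padded four-digit student code) and the `regno=` query format are assumptions for illustration only.

```python
def register_numbers_in_range(college_code: str, first: int, last: int,
                              width: int = 4) -> list[str]:
    """Expand an inclusive range of student codes into full register numbers.

    Hypothetical layout: college code followed by a zero-padded student code.
    """
    if first > last:
        raise ValueError("first student code must not exceed the last")
    return [f"{college_code}{code:0{width}d}" for code in range(first, last + 1)]


def queries_for_range(college_code: str, first: int, last: int) -> list[str]:
    """Build one query string per register number (hypothetical query format)."""
    return [f"regno={reg_no}"
            for reg_no in register_numbers_in_range(college_code, first, last)]


print(queries_for_range("7110", 1, 3))
# ['regno=71100001', 'regno=71100002', 'regno=71100003']
```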


Selection can be viewed as the process of identifying and removing as many irrelevant and redundant records as possible, because (i) irrelevant records do not contribute to predictive accuracy, and (ii) redundant records do not help produce a better predictor, since they mostly provide information that is already present. Selection therefore focuses on searching for relevant records: irrelevant data, along with redundant data, severely affect accuracy.

Thus, selection should be able to identify and remove as much of the irrelevant and redundant information as possible.
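A minimal sketch of this selection step at the record level, assuming that a record is irrelevant when it lies outside the target classes (here, a set of departments) or carries no marks, and redundant when a record with the same register number has already been kept. The record fields are hypothetical.

```python
def select_records(records, target_departments):
    """Keep relevant, non-redundant records in their original order.

    records: iterable of dicts with hypothetical keys 'reg_no', 'dept', 'marks'.
    """
    seen = set()
    selected = []
    for record in records:
        # Irrelevant: outside the target classes, or no marks to analyze.
        if record["dept"] not in target_departments or not record["marks"]:
            continue
        # Redundant: this register number has already been selected.
        if record["reg_no"] in seen:
            continue
        seen.add(record["reg_no"])
        selected.append(record)
    return selected


records = [
    {"reg_no": "71100001", "dept": "CSE", "marks": [72, 65]},
    {"reg_no": "71100001", "dept": "CSE", "marks": [72, 65]},  # duplicate
    {"reg_no": "71100002", "dept": "MECH", "marks": [80, 70]},  # not a target class
    {"reg_no": "71100003", "dept": "CSE", "marks": []},         # no marks
    {"reg_no": "71100004", "dept": "CSE", "marks": [55, 90]},
]
print([r["reg_no"] for r in select_records(records, {"CSE"})])
# ['71100001', '71100004']
```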


To collect results from the web, we need to generate appropriate queries before performing the search. We accomplish this in two steps. The first step is query extraction, in which we extract a set of informative keywords from the query. The second step is query selection, which is needed because different queries can be generated: one from retrieve, one from display, and one from the combination of retrieve and display.

In query generation, given an input string Qi, we aim to generate the k most likely output strings that can be transformed from Qi, i.e., those with the largest probabilities.
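The paper does not describe its string generation algorithm, so the following is only a simplified illustration of the stated goal: given an input string Qi and a hypothetical list of weighted rewrite rules, apply each rule once and keep the k candidates with the largest probabilities. It does not reproduce the optimality guarantee claimed for the paper's algorithm.

```python
import heapq


def top_k_candidates(query: str, rules: list[tuple[str, str, float]],
                     k: int) -> list[tuple[str, float]]:
    """Return up to k (candidate, probability) pairs, most probable first.

    rules: hypothetical (pattern, replacement, probability) triples; each rule
    rewrites the first occurrence of its pattern in the query.
    """
    scores: dict[str, float] = {}
    for pattern, replacement, prob in rules:
        if pattern in query:
            candidate = query.replace(pattern, replacement, 1)
            # If two rules yield the same string, keep the higher probability.
            scores[candidate] = max(prob, scores.get(candidate, 0.0))
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


rules = [
    ("CSE", "computer science", 0.6),
    ("result", "marks", 0.3),
    ("2018", "year 2018", 0.8),
]
print(top_k_candidates("result 2018 CSE", rules, k=2))
# [('result year 2018 CSE', 0.8), ('result 2018 computer science', 0.6)]
```

`heapq.nlargest` avoids sorting every candidate when k is much smaller than the number of candidates.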


We perform the search using the generated queries to collect each student's result. The result is fetched from the server in three processes. Before query generation, the register numbers of the students are fetched from the database and grouped by department. The register numbers of each group are partitioned and stored as arrays of objects. In query generation, the register number is appended to the query, and the query is sent as a request to the server. The results are built upon text-based indexing, so reranking is essential to reorder the initial text-based search results. A query-adaptive reranking approach is used to select the result: we first decide whether a query is text related or image related, and then use different features for reranking accordingly.
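The preparation before query generation (group register numbers by department, partition each group into fixed-size batches, and attach each register number to a query) can be sketched as follows. The batch size, the input shape, and the `regno=` query format are hypothetical.

```python
from collections import defaultdict


def build_query_batches(students, batch_size, prefix="regno="):
    """Group register numbers by department and split each group into batches.

    students: iterable of (register_number, department) pairs (hypothetical shape).
    Returns {department: [[query, ...], ...]} with register numbers in sorted order.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    by_department = defaultdict(list)
    for reg_no, department in students:
        by_department[department].append(reg_no)
    batches = {}
    for department, numbers in by_department.items():
        numbers.sort()
        batches[department] = [
            [prefix + reg_no for reg_no in numbers[i:i + batch_size]]
            for i in range(0, len(numbers), batch_size)
        ]
    return batches


students = [("1002", "CSE"), ("1001", "CSE"), ("2001", "ECE"), ("1003", "CSE")]
print(build_query_batches(students, batch_size=2))
# {'CSE': [['regno=1001', 'regno=1002'], ['regno=1003']], 'ECE': [['regno=2001']]}
```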

Here we regard the prediction of whether a query is text related as a classification task. One option is to match each query term against a result list, but a complete list is not easy to find, and it would be difficult to keep such a list up to date.

We adopt a method that analyzes the results and performs a duplicate removal step to avoid information redundancy. Fetching results from the server can take a long time when there is a large amount of data, so to improve time efficiency the queries are processed differently: the results are grouped with the help of a group id.


The generated query is first passed to the server as a string. The server searches for the result by register number and, once the result for that register number is found, sends the response to the querying client. The result received for a particular student is stored in the client database under the student's register number. The results for a group of students can then be printed by simply selecting them from the database by group id. The group id is assigned to a group of students based on their department id, and the department id serves as a unique constraint for identifying the record. In query generation, the records are fetched from the server and stored in the client database by department id and group id.
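The client-side storage described above can be sketched with Python's built-in `sqlite3` module: each result is stored under its register number, and a group's results are retrieved by group id. The table and column names are hypothetical, and an in-memory database stands in for the client database.

```python
import sqlite3


def create_client_db() -> sqlite3.Connection:
    """Create a hypothetical client-side results table in memory."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE results (
               reg_no   TEXT PRIMARY KEY,   -- one row per student
               dept_id  TEXT NOT NULL,
               group_id TEXT NOT NULL,
               grade    TEXT
           )"""
    )
    return conn


def save_results(conn, rows):
    """Insert (reg_no, dept_id, group_id, grade) rows; a re-fetched result replaces the old one."""
    with conn:  # commits on success
        conn.executemany("INSERT OR REPLACE INTO results VALUES (?, ?, ?, ?)", rows)


def results_for_group(conn, group_id):
    """Return (reg_no, grade) pairs for one group, ordered by register number."""
    return conn.execute(
        "SELECT reg_no, grade FROM results WHERE group_id = ? ORDER BY reg_no",
        (group_id,),
    ).fetchall()


conn = create_client_db()
save_results(conn, [
    ("71100001", "CSE", "G1", "A"),
    ("71100002", "CSE", "G1", "B+"),
    ("71100003", "ECE", "G2", "O"),
    ("71100001", "CSE", "G1", "A+"),  # later fetch for the same student
])
print(results_for_group(conn, "G1"))  # [('71100001', 'A+'), ('71100002', 'B+')]
```

Keying the table on the register number means a repeated fetch overwrites rather than duplicates a student's result.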


We use query-adaptive ranking to perform query classification and then adopt query-adaptive reranking accordingly. This is our proposed approach, denoted “proposed”. After reranking, we remove duplicate and irrelevant results.


The proposed FAST algorithm logically consists of two steps: (i) removing irrelevant records and (ii) removing redundant records.

1) Irrelevant records have no or weak correlation with the target concept;

2) Redundant records are assembled in a cluster, and a representative record can be taken out of the cluster.
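The two steps can be illustrated on numeric columns, such as per-subject marks, with a target column such as the overall result. This is a simplified sketch, not the paper's implementation: it assumes absolute Pearson correlation as the measure of both relevance and redundancy, and replaces the clustering step with a greedy grouping in which each surviving column either becomes the representative of a new cluster or is absorbed by a more relevant representative it is strongly correlated with. Both thresholds are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def fast_select(columns, target, relevance_th=0.3, redundancy_th=0.9):
    """Return one representative column name per cluster of relevant columns."""
    # Step 1: remove irrelevant columns (no or weak correlation with the target).
    relevance = {name: abs(pearson(values, target)) for name, values in columns.items()}
    relevant = [name for name in columns if relevance[name] >= relevance_th]
    # Step 2: visit the most relevant columns first; a column strongly correlated
    # with an existing representative is redundant and joins that cluster.
    relevant.sort(key=lambda name: relevance[name], reverse=True)
    representatives = []
    for name in relevant:
        if all(abs(pearson(columns[name], columns[rep])) < redundancy_th
               for rep in representatives):
            representatives.append(name)
    return representatives


target = [1, 2, 3, 4, 5]
columns = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],  # redundant with "a"
    "c": [2, 1, 2, 1, 2],   # uncorrelated with the target
    "d": [1, 3, 2, 5, 4],
}
print(fast_select(columns, target))
```

Here "c" is dropped as irrelevant, one of "a"/"b" represents their cluster, and "d" survives as a separate representative.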


For every result
    Calculate the average queue size (avg)
    Calculate probability pa
    With probability pa:
        if register no. is valid and
           the result is not already fetched
            Mark the result
            Send the request to the server and save the result
        else
            Drop the request to the server
    else if maxth <= avg
        Store the result in database
        Send acknowledgment to the server.
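A deterministic Python sketch of this loop follows. Because the paper does not say how the probability pa is computed, the sketch omits the probabilistic step and processes every request. It also assumes that avg is an exponentially weighted average of the number of fetched-but-unsaved results (as in queue-management schemes), that maxth is a tunable threshold, and that a dictionary stands in for the remote server. All names are hypothetical.

```python
def is_valid(reg_no: str) -> bool:
    """Hypothetical format check: exactly eight ASCII digits."""
    return len(reg_no) == 8 and all(ch in "0123456789" for ch in reg_no)


def fast_fetch(register_numbers, server, max_th=3.0, weight=0.5):
    """Deterministic sketch of the FAST request loop (probability pa omitted).

    server: dict mapping register number -> result, standing in for the remote server.
    Returns (database, dropped, acknowledgments).
    """
    database = {}   # client database keyed by register number
    buffer = {}     # results fetched but not yet stored ("the queue")
    dropped = []    # requests that were dropped
    acks = 0        # acknowledgments sent to the server
    avg = 0.0       # weighted average of the queue size
    for reg_no in register_numbers:
        # Assumed definition of avg: exponentially weighted moving average.
        avg = (1 - weight) * avg + weight * len(buffer)
        if max_th <= avg:
            # Queue is long: store buffered results and acknowledge the server.
            database.update(buffer)
            buffer.clear()
            acks += 1
        already_fetched = reg_no in database or reg_no in buffer
        if is_valid(reg_no) and reg_no in server and not already_fetched:
            buffer[reg_no] = server[reg_no]  # mark the result and save it
        else:
            dropped.append(reg_no)           # invalid, unknown, or duplicate request
    if buffer:
        # Store whatever is still queued at the end of the batch.
        database.update(buffer)
        acks += 1
    return database, dropped, acks


server = {"11110001": "A", "11110002": "B", "11110003": "C"}
requests = ["11110001", "11110002", "bad", "11110002", "11110003"]
database, dropped, acks = fast_fetch(requests, server, max_th=1.0)
print(sorted(database), dropped, acks)
# ['11110001', '11110002', '11110003'] ['bad', '11110002'] 2
```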

Fig. 1 gives the flowchart of the algorithm.

FAST Algorithm

The FAST algorithm fetches the result of the student with the help of the register number.



Fig. 1. Flowchart of the FAST algorithm

The algorithm checks whether the given register number is valid. The register number is composed of a college code and a student code.

The college code identifies the college to which a result belongs.
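Validating a register number, splitting it into its college code and student code, and grouping results by college could look like this. The fixed widths (four digits each) are an assumption; the paper does not specify the format.

```python
from collections import defaultdict


def parse_register_number(reg_no: str, college_len: int = 4, student_len: int = 4):
    """Return (college_code, student_code), or None if reg_no is not valid.

    Assumed format: exactly college_len + student_len ASCII digits.
    """
    if len(reg_no) != college_len + student_len:
        return None
    if not all(ch in "0123456789" for ch in reg_no):
        return None
    return reg_no[:college_len], reg_no[college_len:]


def group_by_college(reg_nos):
    """Group valid register numbers by college code; invalid ones are skipped."""
    groups = defaultdict(list)
    for reg_no in reg_nos:
        parsed = parse_register_number(reg_no)
        if parsed is not None:
            groups[parsed[0]].append(reg_no)
    return dict(groups)


print(parse_register_number("71100042"))  # ('7110', '0042')
print(parse_register_number("7110ABCD"))  # None
```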

The FAST algorithm calculates the probability of finding the student's result on the server. It then retrieves the result from the server using the request-response method. The avg

SELECTIVITY OF RANGE QUERIES

Selectivity estimation of range queries is a much harder problem. Several methods are available, but they can only estimate the number of records in the range; none can be efficiently adapted to estimate the number of results in the range. One naive solution is to treat information as records after removing the irrelevant information. This clearly increases space consumption significantly (and affects efficiency), since the number of points is typically much larger than the number of existing nodes. When generating the query workload for our datasets, we had to address two main challenges. First, we had to generate a workload with an attribute distribution that represents user interests realistically. Second, we had to create queries of the form attribute-value.
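For context, one standard way to estimate how many records fall in a range without scanning them is an equi-width histogram that assumes values are spread uniformly inside each bucket. The sketch below applies it to numeric keys such as register numbers; it illustrates the general estimation problem and is not a method proposed in this paper. Dividing the estimate by the total number of records gives the selectivity.

```python
def build_histogram(values, low, high, buckets):
    """Count values of [low, high] in `buckets` equal-width buckets."""
    width = (high - low) / buckets
    counts = [0] * buckets
    for v in values:
        if low <= v <= high:
            index = min(int((v - low) / width), buckets - 1)  # high goes in the last bucket
            counts[index] += 1
    return counts, width


def estimate_range_count(counts, low, width, a, b):
    """Estimate how many values lie in [a, b) assuming uniformity within buckets."""
    estimate = 0.0
    for i, count in enumerate(counts):
        bucket_low = low + i * width
        bucket_high = bucket_low + width
        overlap = max(0.0, min(b, bucket_high) - max(a, bucket_low))
        estimate += count * overlap / width
    return estimate


values = list(range(100))
counts, width = build_histogram(values, 0, 100, 10)
print(estimate_range_count(counts, 0, width, 5, 15))  # 10.0
```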


Query reformulation involves rewriting the original query with similar queries to enhance the effectiveness of search. Most existing methods mine transformation rules from pairs of queries in the search logs, in which one query is the original and the other is a similar query.

1) Select the length of the query l by sampling from a uniform probability distribution with lengths varying from 1 to 3.

2) Select an attribute A1 according to its popularity in the vector.

3) Select the next attribute A2 using the co-occurrence ratio with the previous attribute A1.

4) Repeat from Step 2, until we get l different attributes.
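Steps 1 to 4 can be sketched as a small sampler. The popularity and co-occurrence tables are hypothetical inputs. This sketch reads step 4 as repeating the co-occurrence step for each further attribute, and when the previous attribute has no unused co-occurring attribute it falls back to popularity, which is an added assumption.

```python
import random


def generate_query(popularity, cooccurrence, rng):
    """Sample one attribute query following steps 1-4 (sketch).

    popularity:   hypothetical {attribute: positive popularity weight}
    cooccurrence: hypothetical {attribute: {other attribute: co-occurrence ratio}}
    """
    attributes = list(popularity)
    # Step 1: query length drawn uniformly from 1..3 (capped by the attribute count).
    length = min(rng.randint(1, 3), len(attributes))
    # Step 2: first attribute chosen in proportion to its popularity.
    query = rng.choices(attributes, weights=[popularity[a] for a in attributes])
    # Steps 3-4: add attributes by co-occurrence with the previous one until l distinct.
    while len(query) < length:
        previous = query[-1]
        options = {a: r for a, r in cooccurrence.get(previous, {}).items()
                   if a not in query and r > 0}
        if not options:
            # Assumed fallback: no unused co-occurring attribute, so use popularity.
            options = {a: popularity[a] for a in attributes if a not in query}
        names = list(options)
        query.append(rng.choices(names, weights=[options[n] for n in names])[0])
    return query


popularity = {"name": 5, "department": 3, "grade": 2, "year": 1}
cooccurrence = {
    "name": {"department": 0.6, "grade": 0.4},
    "department": {"grade": 0.7, "year": 0.3},
}
rng = random.Random(42)
print([generate_query(popularity, cooccurrence, rng) for _ in range(3)])
```

Seeding a private `random.Random` instance keeps a generated workload reproducible without touching the global generator.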


We check the effect of database size on the precision of attribute suggestions and on the number of query matches, using subsets of the document database of different sizes. As expected, the quality of the proposed strategies increases as the data size increases. The size of the result depends on how it is stored. We store the data retrieved from the server in the client database, which improves time efficiency and minimizes the storage required. Storing the results by student register number requires less storage and makes accessing the information more efficient.


In this paper, we have presented a clustering-based selection algorithm for result analysis. The algorithm involves (i) removing irrelevant records and (ii) removing redundant records. Result analysis is possible without it, but getting the result of every student individually takes considerable time. We therefore use a selection algorithm that removes redundancy from the results and lets us fetch the results of a large group of students. We have adopted a method to remove duplicates, but in many cases more diverse results may be better. In our future work, we will further improve the scheme, for example by developing a better query generation method and investigating the relevant segments of the result.


