A Comparison of Clustering Techniques in Data Mining

Abstract

Clustering is an important tool in data analysis: as a data set grows, its properties and interrelationships also change. There are different types of cluster model: connectivity models, distribution models, centroid models, subspace models, group models and graph-based models. Clustering algorithms can be categorized by the model they use. Traditionally, clustering techniques are broadly divided into hierarchical and density-based clustering. So many clustering methods exist because the notion of a cluster cannot be easily defined. Data mining deals with large data sets and their relationships, and applying clustering to such large volumes of data poses additional challenges. This creates a need for efficient and broadly applicable clustering methods. This paper discusses some of these clustering techniques.

Published In : IJCAT Journal Volume 1, Issue 4

Date of Publication : 31 May 2014

Pages : 42 - 46

Figures : 03

Tables : 01

Publication Link : IJCAT-2014/1-4/A Comparison of Clustering Techniques in Data Mining

Rahumath Beevi A : Received her bachelor's degree in Computer Science and Engineering from Cochin University of Science and Technology, Kerala, in 2012. She is presently pursuing her M.Tech in the Department of Computer Science and Engineering at Cochin University of Science and Technology, Kerala. Her research interests include data mining.

Remya R : Received her bachelor's degree in Information Technology from University College, Trivandrum, Kerala, in 2004 and her master's degree in Computer Science and Engineering from Anna University, Coimbatore, in 2008. She is currently working as an Assistant Professor in the Information Technology department of the College of Engineering Perumon, under Cochin University of Science and Technology. She has eight years of teaching experience. Her research interests include data mining.

Data mining

Hard and soft clustering

Fuzzy clustering

Sentence-level clustering

Concept analysis

Graph centrality

We have reviewed numerous clustering algorithms. All of them, however, require the number of clusters c to be assumed in advance, so a method for finding the optimal c is very important. The analysis of the various methods makes clear that each has its own advantages and disadvantages, and that the quality of the clusters depends on the particular application. When the inter-object relationships have no metric characteristics, ARCA is a better choice. Among the different fuzzy clustering techniques, the FRECCA algorithm is superior to the others: it is able to overcome the problems of sentence-level clustering. When time is a critical factor, however, fuzzy-based approaches cannot be adopted. A good clustering of text requires effective feature selection and a proper choice of algorithm for the task at hand. The above analysis shows that fuzzy-based clustering approaches perform well, but they have certain drawbacks that remain to be eliminated.
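The dependence on a pre-assumed c can be illustrated with a minimal sketch. It uses plain fuzzy c-means (not ARCA or FRECCA) and scores each candidate c with the partition coefficient, a common cluster-validity heuristic. The function names, the toy data and the parameter defaults (fuzzifier m = 2, 100 iterations) are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means on a list of equal-length numeric tuples.

    Returns (centers, u), where u[i][k] is the membership degree of
    point i in cluster k and every row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(iters):
        # Each center is the mean of all points weighted by membership^m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)
            ))

        # Membership is proportional to distance^(-2 / (m - 1)).
        for i in range(n):
            dist = [max(math.dist(points[i], ctr), 1e-12) for ctr in centers]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
            total = sum(inv)
            u[i] = [v / total for v in inv]

    return centers, u


def partition_coefficient(u):
    """Mean squared membership: 1/c for a fully fuzzy partition, 1 for a crisp one."""
    return sum(v * v for row in u for v in row) / len(u)


# Hypothetical toy data: three small, well-separated groups in the plane.
offsets = [(-0.3, 0.0), (0.3, 0.0), (0.0, 0.3), (0.0, -0.3), (0.0, 0.0)]
group_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(cx + dx, cy + dy) for cx, cy in group_centers for dx, dy in offsets]

# Score several candidate values of c instead of fixing one in advance.
for c in range(2, 6):
    _, memberships = fuzzy_c_means(points, c)
    print(c, round(partition_coefficient(memberships), 3))
```

One might then keep the c with the highest coefficient. This is only a heuristic: the partition coefficient tends to favor small c, so in practice it is usually checked against other validity measures and against knowledge of the application.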
