Improved Algorithm Applied in Text Clustering  
  Authors : Roslina; Muhammad Zarlis


As more documents such as journals, e-books, bulletins and news are stored digitally, the information they contain becomes blurred or lost because too many documents are held in storage. This paper reviews the common algorithms used in text clustering: hierarchical clustering, partitional clustering, density-based algorithms and the self-organizing map algorithm. It also reviews improved text clustering algorithms: TGSOM and Fuzzy K-Means Algorithm, PLSA Factors, and the Hierarchical Fuzzy Relational Eigenvector Centrality-based Clustering Algorithm (HFRECCA). These algorithms were developed to improve the precision of text clustering for natural language. The clustering algorithms developed so far have not yet achieved maximum performance. To improve text clustering performance, density-based algorithms can be combined with hierarchical clustering. Existing algorithms can also be developed for video or image data.


Published In : IJCAT Journal Volume 2, Issue 7

Date of Publication : July 2015

Pages : 247 - 252

Figures : 02

Tables : 01

Roslina : received her Master's degree in Information Technology in 1999 from the National University of Malaysia (UKM), Malaysia, and her Bachelor's degree in Computer Science in 1994 from STMIK YPTK Padang, Indonesia. She is a lecturer at Politeknik Negeri Medan. Her current interests are data mining and artificial intelligence. She is currently a student in the Doctoral Program in Computer Science at the University of Sumatera Utara (USU), Medan, Indonesia.

Muhammad Zarlis : He is a Professor of Computer Science at the University of Sumatera Utara (USU), Medan, Indonesia.












PLSA Factor


Clustering text data requires several processes. Text preprocessing consists of case folding, tokenizing, filtering, stemming, tagging and analyzing. This is followed by feature selection, using filter feature selection and wrapper feature selection, and then by the clustering process. The algorithms were developed to provide text clustering solutions with a high degree of precision (more effective and efficient) and multiple clusters. In future, existing algorithms can be developed for data in the form of video or image.
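The preprocessing and filter feature selection steps named above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The stop-word list, the crude suffix-stripping stemmer and the document-frequency threshold `min_df` are all assumptions chosen for illustration. Tagging, analyzing and wrapper feature selection are omitted.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption, not from the paper).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def case_fold(text):
    """Case folding: map every character to lower case."""
    return text.lower()


def tokenize(text):
    """Tokenizing: split case-folded text into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text)


def filter_tokens(tokens):
    """Filtering: drop stop words."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Stemming: crude suffix stripping, a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are not destroyed.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run case folding, tokenizing, filtering and stemming in order."""
    return [stem(t) for t in filter_tokens(tokenize(case_fold(text)))]


def select_features(docs, min_df=2):
    """Filter feature selection: keep terms found in at least min_df documents."""
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    return sorted(term for term, count in doc_freq.items() if count >= min_df)


if __name__ == "__main__":
    docs = [
        preprocess("Clustering news documents"),
        preprocess("Clustered documents in journals"),
    ]
    print(docs)
    print(select_features(docs))
```

The selected terms would then form the vocabulary of the document vectors passed to a clustering algorithm. A stemmer this crude makes mistakes (it turns "news" into "new", for example), so a real system would normally use an established stemmer and a complete stop-word list.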










