Improved Algorithm Applied in Text Clustering  
  Authors : Roslina; Muhammad Zarlis

 

The number of documents stored digitally, such as journals, e-books, bulletins and news, keeps growing. As a result, the available information becomes blurred or lost because too many documents are held in storage. This paper reviews the common algorithms used in text clustering: hierarchical clustering, partitional clustering, density-based algorithms and self-organizing maps. It also reviews improved text clustering algorithms: TGSOM and Fuzzy K-Means, PLSA Factors, and the Hierarchical Fuzzy Relational Eigenvector Centrality-based Clustering Algorithm (HFRECCA). These algorithms were developed to improve the precision of text clustering for natural language. The clustering algorithms developed so far have not yet reached maximum performance. To improve text clustering performance, density-based algorithms can be combined with hierarchical clustering. Existing algorithms can also be extended to video or image data.

 

Published In : IJCAT Journal Volume 2, Issue 7

Date of Publication : July 2015

Pages : 247 - 252

Figures :02

Tables : 01

Publication Link :Improved Algorithm Applied in Text Clustering

Roslina : received the Master's degree in Information Technology in 1999 from the National University of Malaysia (UKM), Malaysia, and the Bachelor's degree in Computer Science in 1994 from STMIK YPTK Padang, Indonesia. She is a lecturer at Politeknik Negeri Medan. Her current interests are in data mining and artificial intelligence. She is currently a student in the doctoral program in Computer Science at the University of Sumatera Utara (USU), Medan, Indonesia.

Muhammad Zarlis : He is a Professor in Computer Science at the University of Sumatera Utara (USU), Medan, Indonesia.

Text

Clustering

Algorithm

TGSOM

PLSA Factor

HFRECCA

Clustering text data requires several processes: text preprocessing, consisting of case folding, tokenizing, filtering, stemming, tagging and analyzing; then feature selection, using filter feature selection and wrapper feature selection; and finally the clustering process, where the developed algorithms provide text clustering solutions with a high degree of precision (more effective and efficient) and multiple clusters. In future, existing algorithms can be developed for data in the form of video or image.
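
The preprocessing and filter feature selection steps named above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the stop-word list, the suffix-stripping stemmer, the document-frequency threshold `min_df`, and the function names `naive_stem`, `preprocess` and `select_features` are all assumptions made for this example. Tagging, analyzing and wrapper feature selection are left out.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "for"}

# Assumed suffixes for a naive stemmer; this is not the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(token):
    """Strip one common suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Case folding -> tokenizing -> filtering -> stemming."""
    folded = text.lower()                               # case folding
    tokens = re.findall(r"[a-z]+", folded)              # tokenizing on letter runs
    kept = [t for t in tokens if t not in STOP_WORDS]   # stop-word filtering
    return [naive_stem(t) for t in kept]                # stemming


def select_features(docs_tokens, min_df=2):
    """Filter-style selection: keep terms found in at least min_df documents."""
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))                    # count each term once per document
    return sorted(term for term, count in doc_freq.items() if count >= min_df)


docs = [
    "Clustering algorithms group similar documents.",
    "Documents are clustered by the algorithm.",
    "News and journals are stored digitally.",
]
processed = [preprocess(d) for d in docs]
vocabulary = select_features(processed, min_df=2)
print(vocabulary)
```

The selected vocabulary would then be used to build document vectors for whichever clustering algorithm is applied. A document-frequency threshold is used here only because it is among the simplest filter criteria.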
