The number of documents stored digitally, such as journals,
e-books, bulletins and news, keeps growing. As a result, the
information they contain becomes blurred or lost because too
many documents are held in storage. This paper reviews the
common algorithms used in text clustering: hierarchical
clustering, partitional clustering, density-based algorithms
and self-organizing maps (SOM). It also reviews improved text
clustering algorithms: TGSOM and Fuzzy K-Means, PLSA
Factors, and the Hierarchical Fuzzy Relational Eigenvector
Centrality-based Clustering Algorithm (HFRECCA). These
algorithms were developed to improve the precision of text
clustering for natural language. The clustering algorithms
developed so far have not yet reached maximum performance.
To improve the performance of text clustering, density-based
algorithms can be combined with hierarchical clustering.
Existing algorithms can also be extended to video or image
data.
Published In : IJCAT Journal Volume 2, Issue 7
Date of Publication : July 2015
Pages : 247 - 252
Figures : 02
Tables : 01
Publication Link : Improved Algorithm Applied in Text Clustering
Roslina : She received her Master's degree in Information
Technology from the National University of Malaysia (UKM),
Malaysia, in 1999 and her Bachelor's degree in Computer
Science from STMIK YPTK Padang, Indonesia, in 1994. She is a
lecturer at Politeknik Negeri Medan. Her current interests
are data mining and artificial intelligence. She is
currently a student in the doctoral program in Computer
Science at the University of Sumatera Utara (USU), Medan,
Indonesia.
Muhammad Zarlis : He is a Professor of Computer Science at
the University of Sumatera Utara (USU), Medan, Indonesia.
Keywords : Text Clustering Algorithm, TGSOM, PLSA Factor, HFRECCA
Clustering text data requires several processes: text
preprocessing, consisting of case folding, tokenizing,
filtering, stemming, tagging and analyzing; feature
selection, by filter feature selection or wrapper feature
selection; and the clustering process itself. The developed
algorithms aim to provide text clustering solutions with a
high degree of precision (more effective and efficient) and
multiple clusters. In future, existing algorithms can be
developed for data in the form of video or image.
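
The preprocessing and filter feature selection steps named above can be
illustrated with a minimal Python sketch. It is not the method of any of
the reviewed algorithms. The stop-word list, the suffix-stripping rule and
the document-frequency threshold (min_df) are assumptions made for
illustration only, and the tagging and analyzing steps are left out.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "is", "are", "to"}


def case_fold(text):
    """Case folding: map all characters to lower case."""
    return text.lower()


def tokenize(text):
    """Tokenizing: split case-folded text into alphabetic tokens."""
    return re.findall(r"[a-z]+", text)


def filter_stop_words(tokens):
    """Filtering: drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Stemming: naive suffix stripping (an assumption, not a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run case folding, tokenizing, filtering and stemming in order."""
    return [stem(t) for t in filter_stop_words(tokenize(case_fold(text)))]


def select_features(docs_tokens, min_df=2):
    """Filter feature selection: keep terms found in at least min_df documents."""
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))  # count each term once per document
    return sorted(term for term, count in doc_freq.items() if count >= min_df)


docs = ["The Clustering of Documents", "Clustering text in storage"]
tokens_per_doc = [preprocess(d) for d in docs]
features = select_features(tokens_per_doc, min_df=2)
```

The selected terms would then serve as the vocabulary for whichever
clustering algorithm is applied; a wrapper approach would instead score
candidate term subsets by the quality of the resulting clusters.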