A Pragmatic Application of Clustering-Based Feature Subset Selection Algorithm for High Dimensional Data


With the rapid growth of computational biology and e-commerce applications, high-dimensional data has become common. Mining high-dimensional data is therefore an urgent problem of great practical importance. Dimensionality reduction is a vital step when working with such data, and to that end this paper proposes a clustering-based feature subset selection algorithm. The features are clustered based on the class labels. The relevance of the clustered features is evaluated, and then the correlation among the relevant clustered features is evaluated. The technique is improved by combining the cluster-based FAST algorithm with fuzzy logic. The FAST algorithm identifies and removes irrelevant features. It works in two steps: the features are divided into clusters using graph-theoretic clustering methods, and a representative feature is then selected from each cluster. Feature subset selection research has so far focused on searching for relevant features. The proposed fuzzy logic focuses on minimizing redundant data and improving the accuracy of the feature subset.

Published In : IJCAT Journal Volume 1, Issue 7

Date of Publication : 31 August 2014

Pages : 349 - 353

Figures : 01

Tables : --

Publication Link : A Pragmatic Application of Clustering-Based Feature Subset Selection Algorithm for High Dimensional Data

N. Narendra Reddy : Post-Graduate Student, Department of Computer Science and Engineering, SIT, PUTTUR, India

K. Narayana : Head & Associate Professor, Department of Computer Science and Engineering, SIT, PUTTUR, India

Clustering

Fuzzy logic

Biology

E-commerce

In this paper, we have proposed a clustering-based feature subset selection algorithm for high-dimensional data. The algorithm involves 1) removing irrelevant features, 2) constructing a minimum spanning tree (MST) from the relevant ones, and 3) partitioning the MST and selecting representative features. The proposed feature subset selection algorithm, FAST, was tested, and the experimental results demonstrate that, compared with other types of feature subset selection algorithms, it not only reduces the number of features but also improves the performance of well-known classifiers of different types.
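The three steps listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it does not cover the fuzzy logic extension. Several details are assumptions made only for illustration, because this page does not specify them: symmetric uncertainty (SU) as the correlation measure, an edge weight of 1 - SU for the MST, the rule that cuts an MST edge when its feature-to-feature SU is below the class relevance of both endpoints, and the names `select_features`, `symmetric_uncertainty`, and `threshold`.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """Assumed correlation measure: 2 * I(X;Y) / (H(X) + H(Y)), between 0 and 1."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_information = hx + hy - entropy(list(zip(x, y)))
    return max(0.0, 2.0 * mutual_information / (hx + hy))


def find(parent, node):
    """Union-find lookup with path halving."""
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node


def select_features(columns, labels, threshold=0.0):
    """columns maps a feature name to its list of discrete values."""
    # Step 1: keep only features whose relevance to the class exceeds the threshold.
    relevance = {name: symmetric_uncertainty(col, labels) for name, col in columns.items()}
    relevant = [name for name in columns if relevance[name] > threshold]

    # Step 2: Kruskal's MST over the complete graph of relevant features.
    # The edge weight 1 - SU is an assumption of this sketch.
    edges = []
    for i, a in enumerate(relevant):
        for b in relevant[i + 1:]:
            su = symmetric_uncertainty(columns[a], columns[b])
            edges.append((1.0 - su, a, b, su))
    parent = {name: name for name in relevant}
    mst = []
    for _, a, b, su in sorted(edges):
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_a] = root_b
            mst.append((a, b, su))

    # Step 3: cut MST edges whose feature-to-feature SU is weaker than both
    # endpoints' relevance, then keep the most relevant feature of each tree.
    parent = {name: name for name in relevant}
    for a, b, su in mst:
        if not (su < relevance[a] and su < relevance[b]):
            parent[find(parent, a)] = find(parent, b)
    clusters = {}
    for name in relevant:
        clusters.setdefault(find(parent, name), []).append(name)
    return sorted(max(members, key=relevance.get) for members in clusters.values())


if __name__ == "__main__":
    a = [0, 0, 0, 0, 1, 1, 1, 1]
    b = [0, 0, 1, 1, 0, 0, 1, 1]
    labels = [2 * x + y for x, y in zip(a, b)]
    columns = {"a": a, "a_copy": list(a), "b": b, "const": [0] * 8}
    print(select_features(columns, labels))
```

In the toy data, `const` carries no information about the class and is dropped in step 1, while `a_copy` duplicates `a` and lands in the same cluster, so only one of the two is kept. Real-valued features would need to be discretized before this sketch applies.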
