Feature selection in clustering extracts the relevant features from a large collection of data by analyzing patterns of similarity among the data. Clustering accuracy and efficiency degrade as the number of features grows, which is a major issue. Feature selection can remedy this issue and thus enhance prediction accuracy while minimizing the computational overhead associated with classification algorithms. Irrelevant features usually do not contribute to predictive accuracy, and redundant features usually do not help in obtaining a better predictor, because the information they provide is mostly already contained in other feature(s).
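The standard way to quantify both relevance (feature vs. class) and redundancy (feature vs. feature) in this line of work is symmetric uncertainty, a normalized mutual information. The sketch below is illustrative, not the paper's implementation; the threshold idea in the final comment is an assumption.

```python
# Sketch of symmetric uncertainty (SU), a common relevance/redundancy
# measure in feature selection. SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)),
# normalized to [0, 1]: 0 means independent, 1 means one variable fully
# determines the other.
import math
from collections import Counter

def entropy(values):
    """Shannon entropy H(X) of a discrete sequence, in bits."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def symmetric_uncertainty(x, y):
    """Symmetric uncertainty between two discrete sequences."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    hxy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = hx + hy - hxy      # I(X; Y) = H(X) + H(Y) - H(X, Y)
    return 2.0 * mutual_info / (hx + hy)

# An irrelevant feature is one whose SU with the class falls below a
# chosen threshold (the threshold itself is an assumption here).
feature = [0, 0, 1, 1, 0, 1]
target  = [0, 0, 1, 1, 0, 1]   # a perfectly informative feature
print(symmetric_uncertainty(feature, target))  # -> 1.0
```

The same function serves double duty: computed against the class label it measures relevance (T-Relevance), and computed between two features it measures redundancy (F-Correlation).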
Published In : IJCAT Journal, Volume 1, Issue 10
Date of Publication : 30 November 2014
Pages : 525 - 529
Figures : 02
Tables : --
Publication Link : Feature Subset Selection Algorithms for Irrelevant Removal Using Minimum Spanning Tree Construction
Asifa Akthar Shaik : M.Tech 2nd Year, Department of CSE, SEAT, Tirupati, AP, India
M.Purushottam : Assistant Professor, Department of CSE, SEAT, Tirupati, AP, India
Keywords : MST, SWIFT, Symmetric uncertainty, T-Relevance, F-Correlation, PCA
The feature selection method is an efficient way to improve classifier accuracy, reduce dimensionality, and remove both irrelevant and redundant data. The SWIFT algorithm selects only a few relevant features, which improves classifier accuracy compared with PCA, as shown in Table 1. For future work, we plan to explore different types of correlation measures and study some formal properties of feature space.
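The minimum-spanning-tree step named in the title can be sketched as follows: treat each feature as a graph node, weight each edge by a correlation-based distance, build the MST, and cut its heaviest edges so that each remaining component is a cluster of mutually redundant features. The distance values and cluster interpretation below are illustrative assumptions, not the paper's data.

```python
# Minimal sketch of MST-based feature clustering using Kruskal's
# algorithm with union-find. Edge weights are assumed to be a
# correlation-based distance (low weight = highly correlated features).
import itertools

def kruskal_mst(n, weight):
    """Return the MST edge list of the complete graph on nodes 0..n-1,
    where weight(i, j) gives the edge weight."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path compression
            a = parent[a]
        return a

    edges = sorted(itertools.combinations(range(n), 2),
                   key=lambda e: weight(*e))
    mst = []
    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:             # edge connects two separate components
            parent[ri] = rj
            mst.append((i, j))
    return mst

# Toy distance matrix over 4 features: features 0 and 1 are
# near-duplicates, as are 2 and 3, while the two pairs are only
# weakly correlated with each other.
dist = [[0.0, 0.1, 0.9, 0.8],
        [0.1, 0.0, 0.85, 0.9],
        [0.9, 0.85, 0.0, 0.05],
        [0.8, 0.9, 0.05, 0.0]]
mst = kruskal_mst(4, lambda i, j: dist[i][j])
print(mst)  # -> [(2, 3), (0, 1), (0, 3)]
```

Removing the single heaviest MST edge, (0, 3), leaves two components, {0, 1} and {2, 3}; keeping one representative feature per component yields the reduced feature subset.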