Feature selection in clustering extracts the relevant features from a large collection of data by analyzing patterns of similarity among the data. Clustering accuracy and efficiency degrade as the number of features grows, which is a major issue. Feature selection can remedy this issue and thus enhance prediction accuracy while minimizing the computational overhead associated with classification algorithms. Irrelevant features usually do not contribute to predictive accuracy, and redundant features usually do not help in obtaining a better predictor, because the information they provide is mostly already contained in other feature(s).
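The standard way to quantify both relevance (feature vs. class) and redundancy (feature vs. feature) in this line of work is symmetric uncertainty, a normalized mutual information. The sketch below is illustrative, not the paper's implementation; the threshold idea in the final comment is an assumption.

```python
# Sketch of symmetric uncertainty (SU), a common relevance/redundancy
# measure in feature selection. SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)),
# normalized to [0, 1]: 0 means independent, 1 means one variable fully
# determines the other.
import math
from collections import Counter

def entropy(values):
    """Shannon entropy H(X) of a discrete sequence, in bits."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def symmetric_uncertainty(x, y):
    """Symmetric uncertainty between two discrete sequences."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    hxy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = hx + hy - hxy      # I(X; Y) = H(X) + H(Y) - H(X, Y)
    return 2.0 * mutual_info / (hx + hy)

# An irrelevant feature is one whose SU with the class falls below a
# chosen threshold (the threshold itself is an assumption here).
feature = [0, 0, 1, 1, 0, 1]
target  = [0, 0, 1, 1, 0, 1]   # a perfectly informative feature
print(symmetric_uncertainty(feature, target))  # -> 1.0
```

The same function serves double duty: computed against the class label it measures relevance (T-Relevance), and computed between two features it measures redundancy (F-Correlation).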
Published In : IJCAT Journal, Volume 1, Issue 10
Date of Publication : 30 November 2014
Pages : 525 - 529
Figures : 02
Tables : --
Publication Link : Feature Subset Selection Algorithms for Irrelevant Removal Using Minimum Spanning Tree Construction
Asifa Akthar Shaik : M.Tech 2nd Year, Department of CSE, SEAT, Tirupati, AP, India
M.Purushottam : Assistant Professor, Department of CSE, SEAT, Tirupati, AP, India
Keywords : MST, SWIFT, Symmetric uncertainty, T-Relevance, F-Correlation, PCA
The feature selection method is an efficient way to improve classifier accuracy, reduce dimensionality, and remove both irrelevant and redundant data. The SWIFT algorithm selects only a few relevant features, which improves classifier accuracy compared with PCA, as shown in Table 1. For future work, we plan to explore different types of correlation measures and study some formal properties of feature space.
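The minimum-spanning-tree step named in the title can be sketched as follows: treat each feature as a graph node, weight each edge by a correlation-based distance, build the MST, and cut its heaviest edges so that each remaining component is a cluster of mutually redundant features. The distance values and cluster interpretation below are illustrative assumptions, not the paper's data.

```python
# Minimal sketch of MST-based feature clustering using Kruskal's
# algorithm with union-find. Edge weights are assumed to be a
# correlation-based distance (low weight = highly correlated features).
import itertools

def kruskal_mst(n, weight):
    """Return the MST edge list of the complete graph on nodes 0..n-1,
    where weight(i, j) gives the edge weight."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path compression
            a = parent[a]
        return a

    edges = sorted(itertools.combinations(range(n), 2),
                   key=lambda e: weight(*e))
    mst = []
    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:             # edge connects two separate components
            parent[ri] = rj
            mst.append((i, j))
    return mst

# Toy distance matrix over 4 features: features 0 and 1 are
# near-duplicates, as are 2 and 3, while the two pairs are only
# weakly correlated with each other.
dist = [[0.0, 0.1, 0.9, 0.8],
        [0.1, 0.0, 0.85, 0.9],
        [0.9, 0.85, 0.0, 0.05],
        [0.8, 0.9, 0.05, 0.0]]
mst = kruskal_mst(4, lambda i, j: dist[i][j])
print(mst)  # -> [(2, 3), (0, 1), (0, 3)]
```

Removing the single heaviest MST edge, (0, 3), leaves two components, {0, 1} and {2, 3}; keeping one representative feature per component yields the reduced feature subset.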