This paper proposes an ensemble model for the classification of a cancer dataset. Ensemble models improve the
classification accuracy of a system by combining the outcomes of individual classifiers. In this paper, several data mining classifiers,
namely C5.0 (C5.0 Decision Tree), CART (Classification and Regression Tree), CHAID (Chi-squared Automatic Interaction Detection),
QUEST (Quick, Unbiased, Efficient Statistical Tree), ANN (Artificial Neural Network) and SVM (Support Vector Machine), are used as
individual classifiers. The outcomes of the individual classifiers are evaluated using performance measures such as
accuracy, specificity, sensitivity, gain charts and response charts, and a comparative analysis is carried out among them.
To further improve the classification accuracy of the system, the outcomes of the individual classifiers are combined using a confidence-weighted voting
scheme to develop the ensemble model. The performance of the ensemble model is evaluated and compared with that of the individual
classifiers. Experimental results show that the ensemble model performs better than the individual classifiers.
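The abstract names the combination rule only as a confidence-based voting scheme. The sketch below assumes the common confidence-weighted form: each classifier votes for its predicted class with a weight equal to its confidence, and the class with the largest summed confidence wins. The function names, the class labels "malignant"/"benign", the confidence values and the alphabetical tie-break are all hypothetical illustrations, not the author's implementation.

```python
from collections import defaultdict


def confidence_weighted_vote(predictions):
    """Combine (label, confidence) pairs, one per classifier, into one label.

    Each vote is weighted by the classifier's confidence; the label with the
    largest summed confidence wins. Ties go to the alphabetically first label
    (a hypothetical tie-breaking rule).
    """
    totals = defaultdict(float)
    for label, confidence in predictions:
        totals[label] += confidence
    # max() keeps the first maximal item, so sorting fixes the tie-break order
    return max(sorted(totals), key=lambda label: totals[label])


def majority_vote(predictions):
    """Plain (unweighted) majority vote over the same pairs, for comparison."""
    counts = defaultdict(int)
    for label, _ in predictions:
        counts[label] += 1
    return max(sorted(counts), key=lambda label: counts[label])


# Hypothetical outputs of six classifiers (e.g. C5.0, CART, CHAID, QUEST, ANN, SVM)
# for a single test record.
record_predictions = [
    ("malignant", 0.90),
    ("benign", 0.60),
    ("malignant", 0.70),
    ("benign", 0.95),
    ("benign", 0.55),
    ("malignant", 0.80),
]

print(majority_vote(record_predictions))             # benign (3-3 tie, broken alphabetically)
print(confidence_weighted_vote(record_predictions))  # malignant (2.40 vs 2.10)
```

With an even split, the unweighted vote must fall back on an arbitrary tie-break, whereas the confidence weights let the more certain classifiers decide the record.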
Published in: IJCAT Journal, Volume 7, Issue 6
Date of Publication: June 2020
Pages: 72-80
Figures: 04
Tables: 05
Pushpalata Pujari
is working as an Assistant Professor
and Head in the CSIT department, Guru Ghasidas Vishwavidyalaya,
Central University, Bilaspur, Chhattisgarh, India. She received
her M.C.A. degree from Berhampur University, Berhampur, Odisha,
India in 1998. She received her Ph.D. degree from the Department
of Computer Science and Information Technology, Guru Ghasidas
Vishwavidyalaya, Central University, Bilaspur, Chhattisgarh, India.
Her areas of interest include Character Recognition, Pattern
Recognition, Soft Computing, Evolutionary Computing and Data
Mining.
Keywords: Classification, C5.0, CART, CHAID, QUEST, ANN, SVM, Ensemble model
The main goal of this study is to show the effectiveness of
an ensemble model. The performance of the individual classifiers
C5.0, CART, CHAID, QUEST, ANN and SVM and of the ensemble
model is analyzed on the cancer dataset. The performance of
all models is investigated using statistical performance
measures such as accuracy, specificity and sensitivity. The
performance of each classifier is also investigated with the
help of gain charts and response charts for both the training
and the testing set. On the test dataset, the accuracies of
C5.0, CART, CHAID, QUEST, ANN and SVM were found to be
94.52%, 95.89%, 93.15%, 89.04%, 97.26% and 91.78%
respectively, while the ensemble model built from the
individual classifiers achieved an accuracy of 98.63%. The
ensemble therefore outperforms every individual model,
including the best individual classifier, ANN (97.26%).
Thus, the proposed ensemble model can be a competitive
technique for the classification of the cancer dataset.
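The measures named above can be illustrated for a binary problem with a short sketch. It assumes the usual definitions (accuracy = (TP+TN)/N, sensitivity = TP/(TP+FN), specificity = TN/(TN+FP)) and treats the gain and response charts as cumulative curves over records ranked by the model's confidence for the positive class. The labels "M"/"B", the scores and the 4-bin split are hypothetical toy values, not data from this study.

```python
def confusion_counts(actual, predicted, positive):
    """Return (TP, TN, FP, FN) for a binary problem with the given positive label."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def classification_metrics(actual, predicted, positive):
    tp, tn, fp, fn = confusion_counts(actual, predicted, positive)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def gain_and_response(actual, scores, positive, n_bins=10):
    """Cumulative gain and response (both in %) for each percentile bin.

    Records are ranked by descending score for the positive class. For the
    top b/n_bins of records, gain is the share of all positives captured and
    response is the share of those top records that are positive.
    """
    ranked = sorted(range(len(actual)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(a == positive for a in actual)
    rows = []
    for b in range(1, n_bins + 1):
        cutoff = max(1, round(len(actual) * b / n_bins))
        hits = sum(actual[i] == positive for i in ranked[:cutoff])
        rows.append((100 * b / n_bins, 100 * hits / total_pos, 100 * hits / cutoff))
    return rows


# Hypothetical toy data: "M" = malignant (positive class), "B" = benign.
actual = ["M", "M", "M", "B", "B", "B", "B", "B"]
predicted = ["M", "M", "B", "B", "B", "B", "M", "B"]
scores = [0.9, 0.8, 0.3, 0.2, 0.1, 0.4, 0.7, 0.05]  # model confidence for "M"

for name, value in classification_metrics(actual, predicted, "M").items():
    print(f"{name}: {value:.2%}")  # accuracy 75.00%, sensitivity 66.67%, specificity 80.00%

for pct, gain, response in gain_and_response(actual, scores, "M", n_bins=4):
    print(f"top {pct:5.1f}%: gain {gain:6.2f}%, response {response:6.2f}%")
```

A steep gain curve (most positives captured in the first bins) and a high early response indicate that the model ranks cancer cases ahead of non-cancer cases well.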