Ensemble of Data Mining Classifiers for Classification of Cancer Dataset  
  Authors : Pushpalata Pujari

 

This paper proposes an ensemble model for classification of a cancer dataset. Ensemble models improve the classification accuracy of a system by combining the outcomes of individual classifiers. In this paper, six data mining classifiers are used as individual classifiers: C5.0 (C5.0 Decision Tree), CART (Classification and Regression Tree), CHAID (Chi-squared Automatic Interaction Detection), QUEST (Quick Unbiased Efficient Statistical Tree), ANN (Artificial Neural Network) and SVM (Support Vector Machine). The outcomes of the individual classifiers are evaluated using performance measures such as accuracy, specificity, sensitivity, gain charts and response charts, and a comparative analysis is carried out among them. To further improve the classification accuracy of the system, the outcomes of the individual classifiers are combined using a confidential voting scheme to develop the ensemble model. The performance of the ensemble model is evaluated and compared with that of the individual classifiers. The experiments show that the ensemble model performs better than the individual classifiers.
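The voting step described in the abstract can be illustrated with a minimal sketch. It assumes that the "confidential voting scheme" refers to confidence-weighted voting, in which each classifier adds its confidence to the label it predicts and the label with the largest total wins. The function name, the class labels and the confidence values below are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def confidence_weighted_vote(predictions):
    """Combine (label, confidence) pairs from several classifiers.

    Each classifier adds its confidence to the label it predicts;
    the label with the largest summed confidence is returned together
    with its share of the total confidence.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    totals = defaultdict(float)
    for label, confidence in predictions:
        totals[label] += confidence
    winner = max(totals, key=totals.get)
    return winner, totals[winner] / sum(totals.values())


# Hypothetical outputs of six base classifiers for a single record.
votes = [
    ("positive", 0.91),
    ("negative", 0.55),
    ("positive", 0.80),
    ("negative", 0.60),
    ("positive", 0.75),
    ("negative", 0.52),
]
label, share = confidence_weighted_vote(votes)
print(label, round(share, 3))
```

In this example the six classifiers split three to three on the label, so a simple majority vote would tie; weighting by confidence breaks the tie in favour of the label predicted with higher confidence.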

 

Published In : IJCAT Journal Volume 7, Issue 6

Date of Publication : June 2020

Pages : 72-80

Figures :04

Tables :05

Pushpalata Pujari is working as an Assistant Professor and Head in the CSIT department, Guru Ghasidas Vishwavidyalaya, Central University, Bilaspur, Chhattisgarh, India. She received her M.C.A. degree from Berhampur University, Berhampur, Odisha, India in 1998. She received her Ph.D. degree from the Department of Computer Science and Information Technology, Guru Ghasidas Vishwavidyalaya, Central University, Bilaspur, Chhattisgarh, India. Her areas of interest include Character Recognition, Pattern Recognition, Soft Computing, Evolutionary Computing and Data Mining.

Classification, C5.0, CART, CHAID, QUEST, ANN, SVM and Ensemble model

The main goal of this study is to show the effectiveness of the ensemble model. The performance of the individual classifiers C5.0, CART, CHAID, QUEST, ANN and SVM and of the ensemble model is analyzed on the cancer dataset. The performance of all models is investigated using statistical performance measures such as accuracy, specificity and sensitivity. The performance of each classifier is also investigated with the help of gain charts and response charts for both the training and testing sets. The accuracy of C5.0, CART, CHAID, QUEST, ANN and SVM is found to be 94.52%, 95.89%, 93.15%, 89.04%, 97.26% and 91.78% respectively on the test dataset. The accuracy of the ensemble model built from the individual classifiers is found to be 98.63% on the test dataset. The ensemble thus achieves higher accuracy than every individual model, so the proposed ensemble model can be a competitive technique for the classification of the cancer dataset.
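For reference, the three statistical measures named above can be computed from the confusion matrix of a binary classifier, as in the following minimal sketch. It uses the standard definitions of accuracy, sensitivity and specificity; the function name and the counts are hypothetical and do not reproduce the paper's confusion matrices.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, sensitivity and specificity as percentages.

    tp, tn, fp, fn are the true positive, true negative, false positive
    and false negative counts of a binary classifier.
    """
    total = tp + tn + fp + fn
    return {
        "accuracy": 100.0 * (tp + tn) / total,    # correct / all records
        "sensitivity": 100.0 * tp / (tp + fn),    # true positive rate
        "specificity": 100.0 * tn / (tn + fp),    # true negative rate
    }


# Hypothetical counts for a test set of 100 records.
metrics = classification_metrics(tp=45, tn=50, fp=2, fn=3)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Sensitivity and specificity are reported alongside accuracy because accuracy alone can hide poor detection of one class when the classes are imbalanced.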
