Enhancing Usability of See5 (Incorporating C5 Algorithm) for Prediction of HPF from SDF


Predicting the molecular class of an unknown protein is highly relevant to research on disease detection and the corresponding drug discovery processes, and it remains a difficult task. Several approaches have been used in the past to increase the accuracy of Human Protein Function (HPF) prediction. This research concentrates on one such approach: HPF prediction from sequence derived features (SDF) using decision trees and their variants, implemented with the C5 algorithm. More sequence derived features were identified and incorporated, and the training data (sequence data drawn from HPRD, the Human Protein Reference Database) was enhanced in terms of both the number of sequences and the features used to relate a sequence to a specific class. Multiple techniques were tested for prediction accuracy, and a comprehensive comparison was made among them and against previous research results.
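
As an illustration of what a sequence derived feature looks like in practice, the sketch below computes two of the features named in this paper, GRAVY and Thr, directly from an amino acid sequence. It is a minimal sketch under stated assumptions, not the authors' extraction pipeline: GRAVY is taken to be the standard grand average of hydropathy on the Kyte-Doolittle scale, Thr is assumed to mean the percentage of threonine residues, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue.

    Characters that are not standard amino acids are skipped.
    """
    residues = [r for r in sequence.upper() if r in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return sum(KYTE_DOOLITTLE[r] for r in residues) / len(residues)


def residue_percent(sequence: str, residue: str) -> float:
    """Percentage of sequence positions occupied by one residue type."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * seq.count(residue.upper()) / len(seq)


def sequence_features(sequence: str) -> dict:
    """Collect a few sequence derived features into one record.

    Assumption: 'Thr' is interpreted as threonine composition in percent.
    """
    return {
        "length": len(sequence),
        "GRAVY": gravy(sequence),
        "Thr": residue_percent(sequence, "T"),
    }
```

Each record produced this way would correspond to one training case, with the protein's known functional class attached as the label.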

Published In : IJCAT Journal Volume 3, Issue 3

Date of Publication : April 2016

Pages : 255-260

Figures : 04

Tables : 01

Publication Link : Enhancing Usability of See5 (Incorporating C5 Algorithm) for Prediction of HPF from SDF

Sunny Sharma : Department of Computer Science, Guru Nanak Dev University, Amritsar, India

Amritpal Singh : Department of Computer Science, Guru Nanak Dev University, Amritsar, India

Dr. Rajinder Singh : Department of Computer Science, Guru Nanak Dev University, Amritsar, India

Keywords : HPF, C5, See5, Decision Tree, SDF

The present work focuses on the usability of the See5 tool in HPF prediction and also demonstrates the impact of choosing the right training data. The detailed analysis shows that increasing the number of features (5 features) of the HPF data increases the accuracy of the prediction process (by about 16%) but does not necessarily involve all parameters in the decision-making process. Some parameters were more dominant than others (such as GRAVY 13%, Solubility 8%, Thr 4%) and hence decide the course of prediction. Activities such as advanced pruning and winnowing (17 attributes winnowed) help minimize computation time and also help identify the most important parameters in the prediction process (ExpAA emerged as the most important parameter after winnowing). In future, more features can be extracted from more sequences and their relative impact on the prediction process examined, which can lead to greater precision in HPF identification. Including a comparison feature in the See5 tool would be of great value, as it would help researchers identify the correct ruleset and the role of each newly incorporated feature in the HPF prediction scenario.
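
To try further features or sequences in See5, the feature records have to be written in the tool's input format: a `.names` file declaring the attributes and classes, and a `.data` file with one comma-separated case per line. The sketch below shows one possible way to generate both texts. The attribute names GRAVY, Thr and ExpAA come from this paper; the target name `hpf_class`, the class labels and the helper names are hypothetical placeholders, and all features are assumed to be continuous.

```python
from typing import Dict, Optional, Sequence


def names_text(target: str, attributes: Sequence[str], classes: Sequence[str]) -> str:
    """Build the contents of a See5 .names file.

    The first entry names the attribute to predict; '|' starts a comment.
    Assumption: every feature is declared as continuous.
    """
    lines = [f"{target}.    | attribute to predict", ""]
    lines += [f"{name}: continuous." for name in attributes]
    lines.append(f"{target}: {', '.join(classes)}.")
    return "\n".join(lines) + "\n"


def format_value(value: Optional[object]) -> str:
    """See5 marks an unknown value with '?'."""
    return "?" if value is None else str(value)


def data_text(rows: Sequence[Dict[str, object]], attributes: Sequence[str], target: str) -> str:
    """Build the contents of a See5 .data file, one case per line.

    Values are emitted in the same order the attributes are declared in
    the .names text, with the class label last.
    """
    lines = []
    for row in rows:
        values = [format_value(row.get(name)) for name in attributes]
        values.append(str(row[target]))
        lines.append(",".join(values))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    attrs = ["GRAVY", "Thr", "ExpAA"]
    # Hypothetical feature values and class labels, for illustration only.
    cases = [
        {"GRAVY": -0.42, "Thr": 5.1, "ExpAA": 0.0, "hpf_class": "class_A"},
        {"GRAVY": 0.31, "Thr": None, "ExpAA": 44.7, "hpf_class": "class_B"},
    ]
    print(names_text("hpf_class", attrs, ["class_A", "class_B"]))
    print(data_text(cases, attrs, "hpf_class"))
```

Saving the two texts under a common file stem (for example `hpf.names` and `hpf.data`) lets See5 load them as one application, after which options such as winnowing and pruning can be selected in the tool itself.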
