This paper proposes to grade essays and other literary materials automatically using feature extraction techniques
from Natural Language Processing (NLP) and a Support Vector Machine (SVM), a powerful machine learning algorithm for
classification, modelled on the Educational Testing Service's GRE Analytical Writing scoring guidelines. We extracted
features such as word count, TF-IDF score, number of paragraphs, part-of-speech tags and number of spelling mistakes from an essay
dataset sourced from Kaggle [1]. After extracting the features using NLP, there were two possible approaches to the problem:
a regression model or a classification model. We took the classification-based approach, training our model on essays
with grades normalized to a scale of 1 to 6. On predicting the grades for the essays in the test set, we found that the accuracy of our
model stood at 0.52, and at 0.89 with a tolerance of one point, the tolerance permitted by ETS, which uses automated essay grading [2]. If individual
essay sets can be graded with an automated grading framework, considerable human effort could be saved and literary pieces could be
graded transparently.
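The simpler surface features listed above can be sketched in plain Python. This is a minimal illustration assuming raw text input; TF-IDF, POS tagging and spell-checking are omitted because they need external resources, and the function name is ours, not taken from the authors' code [7].

```python
import re

def extract_features(essay):
    """Extract simple surface features of an essay (a sketch of the kind of
    features described in the paper: word count, paragraph count, etc.)."""
    words = re.findall(r"[A-Za-z']+", essay)
    # Sentences: split on terminal punctuation, keep non-empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    # Paragraphs: assume blank-line separation in the raw text.
    paragraphs = [p for p in essay.split("\n\n") if p.strip()]
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "paragraph_count": len(paragraphs),
        "avg_word_length": sum(len(w) for w in words) / max(len(words), 1),
    }

sample = "Good essays are organized.\n\nThey use varied vocabulary and clear structure."
print(extract_features(sample))
```

A feature dictionary like this would then be turned into a numeric vector and fed, alongside the TF-IDF and POS features, to the SVM classifier.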
Abhishek Suresh received his B.Tech. in Mechanical
Engineering from Manipal Institute of Technology, Karnataka,
India. He worked as a Trainee Decision Scientist at Mu Sigma
Business Solutions Pvt. Ltd., Bengaluru, India after graduation.
Currently, he is pursuing a Master's in Computational Linguistics,
Analytics, Search and Informatics at the University of Colorado
Boulder. His interests include computational linguistics, NLP,
machine learning and text analytics.
Manuj Jha received his B.E. in Telecommunication from R.V.
College of Engineering, Bengaluru, India. He worked as a
Trainee Decision Scientist at Mu Sigma Business Solutions Pvt.
Ltd., Bengaluru, India after graduation. Currently, he is pursuing
a Master's in Data Science at Texas Tech University. His interests
include NLP, machine learning and descriptive analytics.
Keywords: Natural Language Processing, Essay Grading, Machine Learning, Support Vector Machine
In this paper, we identified a classification-based
approach to the problem of grading literary
materials manually. We used Natural Language
Processing to extract various features that are
characteristic of good writing. The accuracy of the model
could be further improved by adding a metric for the similarity
between an essay and its topic or problem statement.
The topic on which each essay was written was not provided
in the dataset we used, but it is generally known in
exams and standardized tests. The model we designed
performed reasonably well within a tolerance of one
point and could be used to grade
written essays and other literary materials.
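The one-point tolerance metric used throughout (exact agreement versus adjacent agreement, as ETS reports for its own scoring engines) can be computed as below. The grades shown are made-up illustrative values on the 1-6 scale, not the paper's data.

```python
def accuracy_within(pred, gold, tolerance=0):
    """Fraction of essays whose predicted grade is within `tolerance`
    points of the human grade (tolerance=1 gives adjacent agreement)."""
    assert len(pred) == len(gold)
    hits = sum(1 for p, g in zip(pred, gold) if abs(p - g) <= tolerance)
    return hits / len(pred)

# Hypothetical predicted vs. human grades on a 1-6 scale.
predicted = [3, 4, 2, 5, 6, 1, 4, 3, 5, 2]
human     = [3, 5, 2, 4, 6, 2, 4, 4, 3, 2]

print(accuracy_within(predicted, human))     # exact agreement
print(accuracy_within(predicted, human, 1))  # agreement within one point
```

With a tolerance of one point, only predictions that miss the human grade by two or more points count as errors, which is why the tolerant accuracy (0.89 in our experiments) is much higher than the exact accuracy (0.52).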
[1] Kaggle. The Hewlett Foundation: Automated Essay
Scoring. Available from:
https://www.kaggle.com/c/asap-aes.
[2] Valenti, S., F. Neri, and A. Cucchiarelli, An overview
of current research on automated essay grading.
Journal of Information Technology Education:
Research, 2003. 2(1): p. 319-330.
[3] Shermis, M.D., et al., Automated essay scoring:
Writing assessment and instruction. International
Encyclopedia of Education, 2010. 4(1): p. 20-26.
[4] Drolia, S., et al., Automated Essay Rater using
Natural Language Processing. International Journal
of Computer Applications, 2017. 163(10).
[5] Ramineni, C., et al., Evaluation of the e-rater®
Scoring Engine for the GRE® Issue and Argument
Prompts. ETS Research Report Series, 2012.
2012(1).
[6] Drucker, H., D. Wu, and V.N. Vapnik, Support
vector machines for spam categorization. IEEE
Transactions on Neural Networks, 1999. 10(5): p.
1048-1054.
[7] Abhishek Suresh and Manuj Jha, Automated Essay
Grading Python Code. Available from:
https://github.com/absu5530/AES/blob/master/AES.
py
[8] Durgesh, K.S. and B. Lekha, Data classification
using support vector machine. Journal of Theoretical
and Applied Information Technology, 2010. 12(1): p.
1-7.
[9] Li, Y., K. Bontcheva, and H. Cunningham, Adapting
SVM for data sparseness and imbalance: a case
study in information extraction. Natural Language
Engineering, 2009. 15(2): p. 241-271.