Impact of Using Preprocessing in Data Mining and Knowledge Discovery Process


Data mining extracts information that is not known in advance from enormous quantities of data, and that information can lead to knowledge and support good decisions. Its effectiveness lies in reaching that knowledge: the goal is to discover the hidden facts contained in databases through the use of multiple technologies. Unfortunately, real-world databases are strongly affected by negative factors such as noise, inconsistent and superfluous data, and huge sizes in dimensions, examples, and features. Low-quality data therefore leads to low-quality data mining performance. Data preprocessing is a first step of data mining in the knowledge discovery in databases (KDD) process; it reduces the complexity of the data and supports better analysis and artificial neural network (ANN) training. With data collected from the field as well as from a soil testing laboratory, data analysis is performed more accurately and efficiently. This paper studies the impact of preprocessing on data mining: preparing the data (cleaning, transforming, and integrating it) produces good data that leads to high-quality data mining performance.
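As a rough illustration of the three preparation steps named in the abstract (clean, integrate, transform), the following minimal Python sketch uses only the standard library. The record layout, the column names (`plot_id`, `moisture`, `ph`), the sample values, and the helper functions are all hypothetical and are not taken from the paper; min-max scaling is just one possible choice of transformation.

```python
# Hypothetical records from two sources: a field survey and a soil-testing lab.
field_records = [
    {"plot_id": 1, "moisture": 0.30},
    {"plot_id": 2, "moisture": None},   # missing value
    {"plot_id": 3, "moisture": 0.50},
    {"plot_id": 3, "moisture": 0.50},   # exact duplicate
    {"plot_id": 4, "moisture": 0.10},
]
lab_records = [
    {"plot_id": 1, "ph": 6.5},
    {"plot_id": 3, "ph": 7.0},
    {"plot_id": 4, "ph": 5.5},
]


def clean(rows):
    """Drop rows with missing values and exact duplicates, keeping order."""
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if None in row.values() or key in seen:
            continue
        seen.add(key)
        out.append(row)
    return out


def integrate(left, right, key):
    """Inner-join two record lists on a shared key column."""
    index = {r[key]: r for r in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]


def min_max_scale(rows, column):
    """Rescale one numeric column to the range [0, 1]."""
    values = [r[column] for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant column
    return [{**r, column: (r[column] - lo) / span} for r in rows]


# Clean the field data, integrate it with the lab data, then transform it.
prepared = min_max_scale(
    integrate(clean(field_records), lab_records, "plot_id"), "moisture"
)
for row in prepared:
    print(row)
```

In this sketch, the row with the missing moisture reading and the duplicated row are discarded before the join, so only plots present in both sources reach the scaling step.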

Published In : IJCAT Journal Volume 3, Issue 10

Date of Publication : December 2016

Pages : 524-527

Figures : 01

Tables : --

Publication Link : Impact of Using Preprocessing in Data Mining and Knowledge Discovery Process

Abdelrahman Elsharif Karrar : College of Computer Science and Engineering, Taibah University Al Madina, Saudi Arabia.

Nafeesa Hassan Mohammed : College of Computer Science and Information Technology, Al-Neelain University Khartoum, Sudan.

Moez Mutasim Ali : College of Computer Science and Information Technology, University of Science and Technology Omdurman, Sudan.

Preprocessing, Data Mining, Knowledge Discovery, Data Preparation

Machine learning and data mining algorithms automatically extract knowledge from machine-readable information. Unfortunately, their success usually depends on the quality of the data they operate on. If the data is inadequate, or contains extraneous and irrelevant information, these algorithms may produce less accurate and less understandable results, or may fail to discover anything of use at all.

[1] Gregory Piatetsky, “From Data Mining to Knowledge Discovery: An Introduction”, 2012.
[2] Jiawei Han MK, Jian Pei, “Data Mining - Concepts and Techniques”, 2012.
[3] C. Lemnaru, “Strategies for Dealing with Real World Classification Problems”, 2012.
[4] J. Laurikkala, “Instance-based data reduction for improved identification of difficult small classes”, 2002.
[5] R. Kumar, V.K. Jayaraman, B.D. Kulkarni, “An SVM classifier incorporating simultaneous noise reduction and feature selection”, 2005.