Data mining extracts information that is not known in
advance from enormous quantities of data, information that
can lead to knowledge and that helps in making good
decisions. Its effectiveness lies in achieving this goal:
discovering the hidden facts contained in databases through
the use of multiple technologies. Unfortunately, real-world
databases are strongly affected by negative factors such as
noise, inconsistent and superfluous data, and very large
sizes in terms of dimensions, examples, and features.
Low-quality data therefore leads to low-quality data mining
performance. Data preprocessing is the first step of data
mining in the knowledge discovery (KDD) process; it reduces
the complexity of the data and enables better analysis and
ANN training. Based on data collected from the field as well
as from a soil testing laboratory, data analysis is
performed more accurately and efficiently. This paper
studies the impact of preprocessing on data mining:
preparing the data (cleaning, transforming, and integrating
it) produces good data that leads to high-quality data
mining performance.
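The three preparation steps named above (cleaning, transforming, integrating) can be illustrated with a minimal sketch. It is not the authors' implementation: the record layout, the field names (`id`, `moisture`, `ph`), and the specific choices of mean imputation, min-max scaling, and an inner join are assumptions made only for illustration.

```python
from statistics import mean


def clean(records, field):
    """Cleaning: replace missing values of `field` with the mean of the observed values."""
    observed = [r[field] for r in records if r.get(field) is not None]
    fill = mean(observed)
    return [
        {**r, field: r[field] if r.get(field) is not None else fill}
        for r in records
    ]


def transform(records, field):
    """Transformation: min-max scale `field` into the range [0, 1]."""
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    return [{**r, field: (r[field] - lo) / span} for r in records]


def integrate(left, right, key):
    """Integration: join two sources on `key`, keeping records present in both."""
    lookup = {r[key]: r for r in right}
    return [{**l, **lookup[l[key]]} for l in left if l[key] in lookup]


# Hypothetical data: field measurements with one missing value, plus lab results.
field_samples = [
    {"id": 1, "moisture": 10.0},
    {"id": 2, "moisture": None},
    {"id": 3, "moisture": 30.0},
]
lab_results = [{"id": 1, "ph": 6.5}, {"id": 3, "ph": 7.1}]

cleaned = clean(field_samples, "moisture")
scaled = transform(cleaned, "moisture")
merged = integrate(scaled, lab_results, "id")
```

Each step returns new records rather than modifying its input, so the stages can be reordered or swapped for other techniques (for example, median imputation or standardization) without affecting the rest.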
Published In : IJCAT Journal, Volume 3, Issue 10
Date of Publication : December 2016
Pages : 524-527
Figures : 01
Tables : --
Publication Link : Impact of Using Preprocessing in Data Mining and Knowledge Discovery Process
Abdelrahman Elsharif Karrar : College of Computer Science and Engineering, Taibah University, Al Madina, Saudi Arabia.
Nafeesa Hassan Mohammed : College of Computer Science and Information Technology, Al-Neelain University, Khartoum, Sudan.
Moez Mutasim Ali : College of Computer Science and Information Technology, University of Science and Technology, Omdurman, Sudan.
Keywords : Preprocessing, Data Mining, Knowledge Discovery, Data Preparation
Machine learning and data mining algorithms
automatically extract knowledge from machine-readable
information. Unfortunately, their success usually
depends on the quality of the data that they operate on.
If the data is inadequate, or contains extraneous and
irrelevant information, machine learning and data mining
algorithms may produce less accurate and less
understandable results, or may fail to discover anything of
use at all.
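One simple way to remove information that cannot help a learning algorithm is to drop features whose value never changes. The sketch below is only an illustration under assumptions: the zero-variance criterion is a stand-in chosen here, not a method taken from the paper, and the function name, feature names, and data are hypothetical.

```python
from statistics import pvariance


def drop_constant_features(rows, names):
    """Remove columns with zero variance, since they cannot distinguish any two examples."""
    columns = list(zip(*rows))  # transpose rows into columns
    keep = [i for i, col in enumerate(columns) if pvariance(col) > 0.0]
    reduced = [[row[i] for i in keep] for row in rows]
    return reduced, [names[i] for i in keep]


# Hypothetical dataset: the middle feature has the same value in every example.
rows = [
    [1.0, 5.0, 0.2],
    [2.0, 5.0, 0.4],
    [3.0, 5.0, 0.1],
]
names = ["x1", "site_code", "x2"]

reduced, kept = drop_constant_features(rows, names)
```

A constant feature is the clearest case of extraneous data; features that vary but are unrelated to the target need a relevance measure instead, which this sketch does not attempt.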