Volume 7, Issue 3, June 2021, Pages: 89-97
Separation of Data Cleansing Concept from EDA
Khanjan Purohit, Data Science and Analytics, Jain University, Bangalore, India
Received: May 25, 2021; Accepted: Jun. 8, 2021; Published: Jun. 22, 2021
DOI: 10.11648/j.ijdsa.20210703.16
Abstract
Available datasets, whether structured, semi-structured, or unstructured, are used for various purposes. They are mostly used to solve a problem with techniques such as visualization, descriptive analysis, and algorithms. This process has many stages, two of which are exploratory data analysis (EDA) and data cleansing, both major operations in any data mining or machine learning study. After the data is collected from various sources, data cleansing is performed to make the dataset more accurate, more useful, and less redundant. It deals with null, duplicate, multiple, inconsistent, and inaccurate values, which fill a dataset with errors and in turn affect analysis and decision making. Performing data cleansing avoids misleading results such as inaccurate output, inaccurate machine learning models, and failure to find the hidden patterns in the dataset. The purpose of this paper is to study existing research on data cleansing and EDA and to state why the data cleansing process is not part of exploratory data analysis (EDA).
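As an illustration only, the distinction drawn in the abstract can be sketched in Python: cleansing returns a corrected copy of the data, whereas EDA only summarizes what it is given. The records, field names, and the function names `cleanse` and `explore` are hypothetical and do not come from the paper. Dropping records with null values is just one possible policy, assumed here for brevity.

```python
from statistics import mean

# Hypothetical raw records; field names and values are invented for illustration.
RAW = [
    {"id": 1, "city": "Bangalore", "age": 34},
    {"id": 2, "city": "bangalore ", "age": None},  # inconsistent text, null age
    {"id": 1, "city": "Bangalore", "age": 34},     # exact duplicate of the first record
    {"id": 3, "city": "Mysore", "age": 29},
]


def cleanse(records):
    """Data cleansing: return a corrected copy of the records."""
    seen = set()
    cleaned = []
    for rec in records:
        if rec["age"] is None:
            continue  # assumed policy: drop records with a null value
        # Normalise inconsistent spelling and stray whitespace.
        fixed = {**rec, "city": rec["city"].strip().title()}
        key = tuple(sorted(fixed.items()))
        if key in seen:
            continue  # drop duplicate records
        seen.add(key)
        cleaned.append(fixed)
    return cleaned


def explore(records):
    """EDA: summarise the records without changing them."""
    ages = [r["age"] for r in records]
    cities = sorted({r["city"] for r in records})
    return {"rows": len(records), "mean_age": mean(ages), "cities": cities}


cleaned = cleanse(RAW)
summary = explore(cleaned)
print(summary)
```

In this sketch `cleanse` changes what the dataset contains while `explore` leaves its input untouched, which is one way to read the separation the paper argues for.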
Keywords
Data Cleansing, Exploratory Data Analysis (EDA), Data Mining, Normalization, Visualization, Big Data
To cite this article
Khanjan Purohit, Separation of Data Cleansing Concept from EDA, International Journal of Data Science and Analysis. Vol. 7, No. 3, 2021, pp. 89-97. doi: 10.11648/j.ijdsa.20210703.16
Copyright
Copyright © 2021 Authors retain the copyright of this article.
This article is an open access article distributed under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.