Separation of Data Cleansing Concept from EDA

Khanjan Purohit

doi:10.11648/j.ijdsa.20210703.16

Peer-Reviewed

Published in International Journal of Data Science and Analysis (Volume 7, Issue 3)

Received: 25 May 2021 Accepted: 8 June 2021 Published: 22 June 2021

Abstract

Available data sets, whether structured, semi-structured, or unstructured, are used for various purposes. They are mostly used to solve a problem with techniques such as visualization, descriptive methods, and algorithms. This process has many steps, two of which are exploratory data analysis (EDA) and data cleansing, and both are major operations in any data mining or machine learning study. After the data have been collected from various sources, data cleansing is performed to make the data set more accurate, more useful, and less redundant. Data cleansing helps extract accurate information from the data set by dealing with null values, duplicate values, multiple values, inconsistent values, inaccurate values, and similar problems. Left in place, such values fill the data set with errors, which in turn affects analysis and decision making. Performing data cleansing avoids many misleading outcomes, such as inaccurate output, inaccurate machine learning models, and failure to uncover the hidden patterns in the data set. The purpose of this paper is to study existing research on data cleansing and EDA and to state why the data cleansing process is not part of exploratory data analysis (EDA).
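The distinction drawn in the abstract can be illustrated with a small sketch. It is not taken from the paper: the records, the field names (`id`, `city`, `age`), and the functions `cleanse` and `explore` are all hypothetical, chosen only to show a cleansing step that changes the data set (dropping null values, normalizing inconsistent values, removing duplicates) kept apart from an EDA step that only reads and summarizes it.

```python
from statistics import mean

# Hypothetical raw records (not from the paper); None marks a null value.
RAW = [
    {"id": 1, "city": "Pune", "age": 34},
    {"id": 2, "city": "pune ", "age": 29},    # inconsistent spelling
    {"id": 2, "city": "pune ", "age": 29},    # duplicate record
    {"id": 3, "city": "Delhi", "age": None},  # null value
    {"id": 4, "city": "Delhi", "age": 41},
]


def cleanse(records):
    """Cleansing step: returns a corrected copy of the data set."""
    seen = set()
    cleaned = []
    for rec in records:
        # Drop records that contain a null value.
        if any(value is None for value in rec.values()):
            continue
        # Normalize inconsistent text values.
        fixed = {**rec, "city": rec["city"].strip().title()}
        # Skip records already seen after normalization (duplicates).
        key = tuple(sorted(fixed.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(fixed)
    return cleaned


def explore(records):
    """EDA step: read-only summaries, the records are never modified."""
    counts = {}
    for rec in records:
        counts[rec["city"]] = counts.get(rec["city"], 0) + 1
    return {
        "rows": len(records),
        "mean_age": mean(rec["age"] for rec in records),
        "rows_per_city": counts,
    }


cleaned = cleanse(RAW)
summary = explore(cleaned)
print(summary)
```

In this sketch `cleanse` is the only function that alters what the data set contains, while `explore` could be run on the raw or the cleaned records without changing either. Dropping records with null values is just one possible policy; imputation would be an equally valid assumption here.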

Published in	International Journal of Data Science and Analysis (Volume 7, Issue 3)
DOI	10.11648/j.ijdsa.20210703.16
Page(s)	89-97
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

Data Cleansing, Exploratory Data Analysis (EDA), Data Mining, Normalization, Visualization, Big Data

Cite This Article

APA Style

Khanjan Purohit. (2021). Separation of Data Cleansing Concept from EDA. International Journal of Data Science and Analysis, 7(3), 89-97. https://doi.org/10.11648/j.ijdsa.20210703.16

ACS Style

Khanjan Purohit. Separation of Data Cleansing Concept from EDA. Int. J. Data Sci. Anal. 2021, 7(3), 89-97. doi: 10.11648/j.ijdsa.20210703.16

AMA Style

Khanjan Purohit. Separation of Data Cleansing Concept from EDA. Int J Data Sci Anal. 2021;7(3):89-97. doi: 10.11648/j.ijdsa.20210703.16

@article{10.11648/j.ijdsa.20210703.16,
  author = {Khanjan Purohit},
  title = {Separation of Data Cleansing Concept from EDA},
  journal = {International Journal of Data Science and Analysis},
  volume = {7},
  number = {3},
  pages = {89-97},
  doi = {10.11648/j.ijdsa.20210703.16},
  url = {https://doi.org/10.11648/j.ijdsa.20210703.16},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijdsa.20210703.16},
  abstract = {Available dataset whether it is structured, semi structured or unstructured data, is used for various purposes. These data sets are mostly used for solving an issue using different kinds of techniques like visualization, descriptive, algorithms etc. This data process includes many levels, two of those steps are exploratory data analysis (EDA) and data cleansing. Data cleansing and exploratory data analysis (EDA) are two major operations of any data mining or machine learning study. After collecting the data from various sources, Data cleansing is done to make the data set more accurate, useful and less redundant. Data cleansing is useful to get the accurate information from the dataset and It is used to deal with null values, duplicate values, multiple values, inconsistent value, inaccurate value etc, Which are existing in that data set and It can make our data set filled with error which also affects the analysis and decision making process. By performing data cleansing, we can get rid of many types of misleadings like getting inaccurate output, inaccurate model of machine learning, not getting the hidden patterns behind that data set etc. The purpose of this paper is to study existing research of Data cleansing and EDA and state why Data cleansing process is not part of exploratory data analysis (EDA).},
  year = {2021}
}

TY  - JOUR
T1  - Separation of Data Cleansing Concept from EDA
AU  - Khanjan Purohit
Y1  - 2021/06/22
PY  - 2021
N1  - https://doi.org/10.11648/j.ijdsa.20210703.16
DO  - 10.11648/j.ijdsa.20210703.16
T2  - International Journal of Data Science and Analysis
JF  - International Journal of Data Science and Analysis
JO  - International Journal of Data Science and Analysis
SP  - 89
EP  - 97
PB  - Science Publishing Group
SN  - 2575-1891
UR  - https://doi.org/10.11648/j.ijdsa.20210703.16
AB  - Available dataset whether it is structured, semi structured or unstructured data, is used for various purposes. These data sets are mostly used for solving an issue using different kinds of techniques like visualization, descriptive, algorithms etc. This data process includes many levels, two of those steps are exploratory data analysis (EDA) and data cleansing. Data cleansing and exploratory data analysis (EDA) are two major operations of any data mining or machine learning study. After collecting the data from various sources, Data cleansing is done to make the data set more accurate, useful and less redundant. Data cleansing is useful to get the accurate information from the dataset and It is used to deal with null values, duplicate values, multiple values, inconsistent value, inaccurate value etc, Which are existing in that data set and It can make our data set filled with error which also affects the analysis and decision making process. By performing data cleansing, we can get rid of many types of misleadings like getting inaccurate output, inaccurate model of machine learning, not getting the hidden patterns behind that data set etc. The purpose of this paper is to study existing research of Data cleansing and EDA and state why Data cleansing process is not part of exploratory data analysis (EDA).
VL  - 7
IS  - 3
ER  -

Author Information

Khanjan Purohit

Data Science and Analytics, Jain University, Bangalore, India
