Peer-Reviewed

Separation of Data Cleansing Concept from EDA

Received: 25 May 2021    Accepted: 8 June 2021    Published: 22 June 2021
Abstract

Available data, whether structured, semi-structured or unstructured, are used for various purposes. Data sets are mostly used to solve a problem with techniques such as visualization, descriptive methods and algorithms. This process includes many steps, two of which are exploratory data analysis (EDA) and data cleansing; both are major operations in any data mining or machine learning study. After data are collected from various sources, data cleansing is performed to make the data set more accurate, more useful and less redundant. Data cleansing helps extract accurate information from the data set by dealing with null values, duplicate values, multiple values, inconsistent values, inaccurate values and similar problems present in the data. Left untreated, these problems fill the data set with errors, which in turn affect analysis and decision making. Performing data cleansing avoids many misleading outcomes, such as inaccurate output, inaccurate machine learning models and failure to uncover the hidden patterns in the data set. The purpose of this paper is to study existing research on data cleansing and EDA and to state why the data cleansing process is not part of exploratory data analysis (EDA).
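The distinction drawn in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the records, the field names and the functions `cleanse` and `explore` are hypothetical, and dropping records that contain null values is only one possible policy among several (imputation is another). The sketch only shows that a cleansing step changes the data set, while an EDA step reads it and summarizes it.

```python
from statistics import mean, median

# Hypothetical raw records; field names and values are invented for illustration.
RAW_RECORDS = [
    {"id": 1, "city": "Bangalore", "age": 34},
    {"id": 2, "city": "bangalore ", "age": None},  # null value, inconsistent spelling
    {"id": 3, "city": "Mumbai", "age": 29},
    {"id": 3, "city": "Mumbai", "age": 29},        # duplicate record
    {"id": 4, "city": "Mumbai", "age": 41},
]


def cleanse(records):
    """Data cleansing: returns a corrected copy of the data set."""
    seen = set()
    cleaned = []
    for record in records:
        # Assumed policy: drop any record that contains a null value.
        if any(value is None for value in record.values()):
            continue
        # Normalize inconsistent text values (stray whitespace, letter case).
        normalized = {**record, "city": record["city"].strip().title()}
        # Drop exact duplicates, keeping the first occurrence.
        key = tuple(sorted(normalized.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(normalized)
    return cleaned


def explore(records):
    """EDA: read-only summary of the data set; nothing is modified."""
    ages = [record["age"] for record in records]
    city_counts = {}
    for record in records:
        city_counts[record["city"]] = city_counts.get(record["city"], 0) + 1
    return {
        "rows": len(records),
        "mean_age": mean(ages),
        "median_age": median(ages),
        "city_counts": city_counts,
    }


cleaned = cleanse(RAW_RECORDS)
summary = explore(cleaned)
print(summary)
```

Keeping the two steps in separate functions mirrors the separation the paper argues for: `cleanse` is the only place where records are removed or altered, and `explore` can be run on either the raw or the cleansed data without side effects.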

Published in International Journal of Data Science and Analysis (Volume 7, Issue 3)
DOI 10.11648/j.ijdsa.20210703.16
Page(s) 89-97
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

Data Cleansing, Exploratory Data Analysis (EDA), Data Mining, Normalization, Visualization, Big Data

Cite This Article
  • APA Style

    Khanjan Purohit. (2021). Separation of Data Cleansing Concept from EDA. International Journal of Data Science and Analysis, 7(3), 89-97. https://doi.org/10.11648/j.ijdsa.20210703.16


    ACS Style

    Khanjan Purohit. Separation of Data Cleansing Concept from EDA. Int. J. Data Sci. Anal. 2021, 7(3), 89-97. doi: 10.11648/j.ijdsa.20210703.16


    AMA Style

    Khanjan Purohit. Separation of Data Cleansing Concept from EDA. Int J Data Sci Anal. 2021;7(3):89-97. doi: 10.11648/j.ijdsa.20210703.16


  • @article{10.11648/j.ijdsa.20210703.16,
      author = {Khanjan Purohit},
      title = {Separation of Data Cleansing Concept from EDA},
      journal = {International Journal of Data Science and Analysis},
      volume = {7},
      number = {3},
      pages = {89-97},
      doi = {10.11648/j.ijdsa.20210703.16},
      url = {https://doi.org/10.11648/j.ijdsa.20210703.16},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijdsa.20210703.16},
     year = {2021}
    }
    


  • TY  - JOUR
    T1  - Separation of Data Cleansing Concept from EDA
    AU  - Khanjan Purohit
    Y1  - 2021/06/22
    PY  - 2021
    N1  - https://doi.org/10.11648/j.ijdsa.20210703.16
    DO  - 10.11648/j.ijdsa.20210703.16
    T2  - International Journal of Data Science and Analysis
    JF  - International Journal of Data Science and Analysis
    JO  - International Journal of Data Science and Analysis
    SP  - 89
    EP  - 97
    PB  - Science Publishing Group
    SN  - 2575-1891
    UR  - https://doi.org/10.11648/j.ijdsa.20210703.16
    VL  - 7
    IS  - 3
    ER  - 


Author Information
  • Data Science and Analytics, Jain University, Bangalore, India
