Peer-Reviewed

A Topological Approach of Principal Component Analysis

Received: 10 January 2021    Accepted: 29 January 2021    Published: 20 April 2021
Abstract

Large datasets are increasingly widespread in many disciplines. The exponential growth of data requires the development of more data analysis methods to process information more efficiently. To better visualize the data, methods such as Principal Component Analysis (PCA) and MultiDimensional Scaling (MDS) extract a low-dimensional structure from a high-dimensional dataset. The proposed approach, called Topological Principal Component Analysis (TPCA), is a multidimensional descriptive method which studies a homogeneous set of continuous variables defined on the same set of individuals. It is a topological method of data analysis that compares and classifies some of the most widely used proximity measures for continuous data. Proximity measures play an important role in many areas of data analysis, and the results strongly depend on the proximity measure chosen. So, among the many existing measures, which one is most useful? Are they all equivalent? How can one identify the measure most appropriate for analyzing the correlation structure of a set of quantitative variables? TPCA proposes an appropriate adjacency matrix, associated with an unknown proximity measure, according to the data under consideration, and then analyzes and visualizes, with graphical representations, the relationship structure of the variables, which is the well-known PCA problem. It uses the concept of neighborhood graphs and compares a set of proximity measures for continuous data that may be more or less equivalent. A topological equivalence criterion between two proximity measures is defined and statistically tested according to the topological correlation between the variables considered. An example on real data illustrates the proposed approach.
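The abstract describes building, for each proximity measure, the adjacency matrix of a neighborhood graph on the variables and then comparing measures through those matrices. The following is a minimal Python sketch under stated assumptions, not the paper's implementation: it uses the relative neighborhood graph as the neighborhood graph, two illustrative correlation-based dissimilarities between variables (sqrt(2(1 - r)) and 1 - |r|), and, as the comparison, the share of variable pairs on which the two adjacency matrices agree together with Cohen's kappa. All function names are hypothetical, and the paper's exact measures, graph, and statistical test may differ.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_distance(columns):
    """Illustrative measure 1 (assumption): d(j, k) = sqrt(2 * (1 - r_jk)), sign-sensitive."""
    p = len(columns)
    # max(0.0, ...) guards against tiny negative values caused by rounding when r is ~1.
    return [[0.0 if j == k else
             math.sqrt(max(0.0, 2.0 * (1.0 - pearson(columns[j], columns[k]))))
             for k in range(p)] for j in range(p)]


def abs_correlation_distance(columns):
    """Illustrative measure 2 (assumption): d(j, k) = 1 - |r_jk|, which ignores the sign of r."""
    p = len(columns)
    return [[0.0 if j == k else 1.0 - abs(pearson(columns[j], columns[k]))
             for k in range(p)] for j in range(p)]


def rng_adjacency(d):
    """Adjacency matrix of the relative neighborhood graph built from dissimilarity matrix d.

    Variables j and k are linked unless some third variable m satisfies
    max(d[j][m], d[k][m]) < d[j][k]. The diagonal is left at 0.
    """
    p = len(d)
    adj = [[0] * p for _ in range(p)]
    for j, k in combinations(range(p), 2):
        blocked = any(max(d[j][m], d[k][m]) < d[j][k]
                      for m in range(p) if m not in (j, k))
        if not blocked:
            adj[j][k] = adj[k][j] = 1
    return adj


def topological_agreement(a, b):
    """Agreement rate and Cohen's kappa between two adjacency matrices over pairs j < k.

    Requires at least two variables so that there is at least one pair.
    """
    pairs = list(combinations(range(len(a)), 2))
    m = len(pairs)
    po = sum(a[j][k] == b[j][k] for j, k in pairs) / m   # observed agreement
    pa = sum(a[j][k] for j, k in pairs) / m              # edge density of graph a
    pb = sum(b[j][k] for j, k in pairs) / m              # edge density of graph b
    pe = pa * pb + (1 - pa) * (1 - pb)                   # agreement expected by chance
    # pe == 1 only when both graphs are all-0 or both all-1, i.e. they agree perfectly.
    kappa = 1.0 if pe == 1.0 else (po - pe) / (1 - pe)
    return po, kappa


if __name__ == "__main__":
    # Made-up toy data (not from the paper): 4 variables, each observed on 6 individuals.
    X = [
        [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        [2.1, 3.9, 6.2, 8.1, 9.8, 12.2],
        [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
        [1.0, 3.0, 2.0, 5.0, 4.0, 6.0],
    ]
    g1 = rng_adjacency(correlation_distance(X))
    g2 = rng_adjacency(abs_correlation_distance(X))
    po, kappa = topological_agreement(g1, g2)
    print(f"agreement = {po:.2f}, kappa = {kappa:.2f}")
```

Because 1 - |r| treats strongly negatively correlated variables as close while sqrt(2(1 - r)) treats them as far apart, the two measures can induce different neighborhood graphs; an agreement rate or kappa well below 1 is the kind of non-equivalence a topological criterion is meant to detect.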

Published in International Journal of Data Science and Analysis (Volume 7, Issue 2)
DOI 10.11648/j.ijdsa.20210702.11
Page(s) 20-31
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

Proximity Measure, Neighborhood Graph, Adjacency Matrix, Topological Equivalence, Correlation Matrix, MDS Graphical Representation

Cite This Article
    APA Style

    Rafik Abdesselam. (2021). A Topological Approach of Principal Component Analysis. International Journal of Data Science and Analysis, 7(2), 20-31. https://doi.org/10.11648/j.ijdsa.20210702.11

    ACS Style

    Rafik Abdesselam. A Topological Approach of Principal Component Analysis. Int. J. Data Sci. Anal. 2021, 7(2), 20-31. doi: 10.11648/j.ijdsa.20210702.11

    AMA Style

    Rafik Abdesselam. A Topological Approach of Principal Component Analysis. Int J Data Sci Anal. 2021;7(2):20-31. doi: 10.11648/j.ijdsa.20210702.11

    @article{10.11648/j.ijdsa.20210702.11,
      author = {Rafik Abdesselam},
      title = {A Topological Approach of Principal Component Analysis},
      journal = {International Journal of Data Science and Analysis},
      volume = {7},
      number = {2},
      pages = {20-31},
      doi = {10.11648/j.ijdsa.20210702.11},
      url = {https://doi.org/10.11648/j.ijdsa.20210702.11},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijdsa.20210702.11},
      year = {2021}
    }

    TY  - JOUR
    T1  - A Topological Approach of Principal Component Analysis
    AU  - Rafik Abdesselam
    Y1  - 2021/04/20
    PY  - 2021
    N1  - https://doi.org/10.11648/j.ijdsa.20210702.11
    DO  - 10.11648/j.ijdsa.20210702.11
    T2  - International Journal of Data Science and Analysis
    JF  - International Journal of Data Science and Analysis
    JO  - International Journal of Data Science and Analysis
    SP  - 20
    EP  - 31
    PB  - Science Publishing Group
    SN  - 2575-1891
    UR  - https://doi.org/10.11648/j.ijdsa.20210702.11
    VL  - 7
    IS  - 2
    ER  - 

Author Information
    Department of Economics and Management, University Lumière of Lyon 2, Lyon, France
