Package overview

The Archetypes package provides a set of algorithms for performing Archetypal Analysis (AA), with the goal of making AA more accessible and easier to implement for Python developers. It supports a variety of data types, including NumPy arrays, pandas DataFrames, and PyTorch tensors, and it includes tools for visualizing the results of AA.

What is Archetypal Analysis?

Archetypal Analysis is a data analysis method that aims to identify a small number of archetypes, which are extreme examples of the data set. These archetypes can be used to describe the data set and to perform dimensionality reduction, outlier detection, and clustering.

AA is based on the concept of the convex hull, which is the smallest convex set that contains all the observations in a data set. The archetypes are extreme points that lie on or near the boundary of this hull; in the standard formulation, each archetype is itself a convex combination of observations, so it need not coincide with an actual data point. For a chosen number of archetypes, AA seeks the set whose convex hull best approximates the data.

Each observation in the data set can be approximated as a convex combination of the archetypes, that is, a weighted sum whose coefficients are non-negative and sum to one. These coefficients describe the observation in terms of the archetypes, and they can be used to reconstruct the observation from the archetypes.
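The convex-combination idea can be illustrated with plain NumPy. This is a minimal sketch rather than the package's API: the archetypes and coefficients below are hypothetical values chosen by hand, not the output of any fitting routine.

```python
import numpy as np

# Three hypothetical archetypes in a 2-D feature space (one archetype per row).
archetypes = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
])

# Hypothetical coefficients for a single observation.
coefficients = np.array([0.5, 0.2, 0.3])

# A convex combination requires non-negative weights that sum to one.
assert np.all(coefficients >= 0)
assert np.isclose(coefficients.sum(), 1.0)

# Reconstruct the observation as a weighted sum of the archetypes.
reconstruction = coefficients @ archetypes
print(reconstruction)
```

Because the weights are non-negative and sum to one, the reconstructed point always lies inside the convex hull of the archetypes (here, a triangle).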

Benefits of using Archetypal Analysis

Archetypal Analysis has several benefits when it comes to describing data. Here are some examples:

  • Dimensionality Reduction: AA can reduce the dimensionality of the data by identifying a small set of archetypes that can explain most of the variability in the data set.

  • Interpretability: The archetypes generated by AA can be interpreted as characteristic examples of the data set, which can provide insights into the underlying patterns and structures in the data.

  • Outlier Detection: AA can be used to identify outliers in the data set by detecting observations that cannot be well represented by a convex combination of the archetypes, that is, observations with a large reconstruction error.

  • Clustering: AA can be used for clustering by partitioning the data set into groups that are well-represented by different subsets of the archetypes.

Overall, the “archetypes” package aims to provide a powerful and flexible tool for exploring and understanding complex data sets using Archetypal Analysis.
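The outlier detection and clustering uses listed above can be sketched with plain NumPy. The example makes several assumptions: the archetypes are fixed by hand instead of being learned, the helper names `project_to_simplex` and `fit_coefficients` are hypothetical and not part of the package, and the coefficients are estimated with a simple projected gradient descent, which is only one of several possible solvers.

```python
import numpy as np

def project_to_simplex(V):
    """Euclidean projection of each row of V onto the probability simplex."""
    n, k = V.shape
    U = np.sort(V, axis=1)[:, ::-1]            # each row sorted in descending order
    css = np.cumsum(U, axis=1) - 1.0
    ranks = np.arange(1, k + 1)
    rho = (U - css / ranks > 0).sum(axis=1)    # number of non-zero entries per row
    theta = css[np.arange(n), rho - 1] / rho   # per-row shift
    return np.maximum(V - theta[:, None], 0.0)

def fit_coefficients(X, Z, n_iter=500):
    """Minimize 0.5 * ||A @ Z - X||^2 with each row of A on the simplex."""
    n, k = X.shape[0], Z.shape[0]
    A = np.full((n, k), 1.0 / k)               # start from uniform weights
    step = 1.0 / np.linalg.norm(Z @ Z.T, 2)    # inverse Lipschitz constant of the gradient
    for _ in range(n_iter):
        gradient = (A @ Z - X) @ Z.T
        A = project_to_simplex(A - step * gradient)
    return A

# Hypothetical archetypes (triangle corners) and three observations:
# two inside the triangle and one far outside it.
Z = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
X = np.array([[0.2, 0.3], [0.9, 0.05], [2.0, 2.0]])

A = fit_coefficients(X, Z)

# Outlier detection: a large reconstruction error flags a poorly represented point.
errors = np.linalg.norm(X - A @ Z, axis=1)

# Simple clustering: assign each observation to its dominant archetype.
dominant = A.argmax(axis=1)

print(np.round(A, 3))
print(np.round(errors, 3))
print(dominant)
```

In this sketch the two points inside the triangle are reconstructed almost exactly, while the point at (2, 2) can only be mapped to the nearest point on the triangle's edge, so its reconstruction error stands out. Assigning observations by dominant archetype is a deliberately crude grouping rule; the coefficients themselves carry more information than a hard assignment.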