This repository is the author's implementation of the paper Fuzzy c-Means Clustering for Persistence Diagrams.
We develop an algorithm that fuzzy-clusters datasets based on their topology. The figure shows nine datasets with their corresponding persistence diagrams, and three cluster centres produced by our algorithm. The cluster centres have zero, one, or two significant off-diagonal points, corresponding to zero, one, or two rings in the original datasets.
To install requirements:
pip install -r requirements.txt
In `clustering.py` we provide a function `fpd_cluster` that accepts a list of datasets and the number of clusters as input, and returns membership values and cluster centres.
To use it, place `clustering.py` in the same folder as your project and add `from clustering import fpd_cluster` at the top of your file.
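As a rough illustration of the call described above, the sketch below builds three toy point clouds (no ring, one ring, two rings) and passes them to `fpd_cluster`. The helper `noisy_ring` is hypothetical and not part of this repository. Representing each dataset as a list of `(x, y)` tuples, the argument order, and the order of the returned values are all assumptions based on the description above, so check `clustering.py` before relying on them.

```python
import math
import random


def noisy_ring(centre, radius=1.0, n_points=50, noise=0.05, rng=random):
    """Sample 2D points near a circle (hypothetical helper, not in the repository)."""
    points = []
    for _ in range(n_points):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        x = centre[0] + radius * math.cos(angle) + rng.gauss(0.0, noise)
        y = centre[1] + radius * math.sin(angle) + rng.gauss(0.0, noise)
        points.append((x, y))
    return points


rng = random.Random(0)  # fixed seed so the toy data is reproducible

# Three toy datasets with zero, one, and two rings.
no_rings = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(50)]
one_ring = noisy_ring((0.0, 0.0), rng=rng)
two_rings = noisy_ring((0.0, 0.0), rng=rng) + noisy_ring((3.0, 0.0), rng=rng)
datasets = [no_rings, one_ring, two_rings]

try:
    # Requires clustering.py to sit next to this script.
    from clustering import fpd_cluster
except ImportError:
    fpd_cluster = None

if fpd_cluster is not None:
    # Assumed call: datasets first, then the number of clusters.
    # Assumed return order: membership values, then cluster centres.
    # The function may expect arrays rather than lists of tuples.
    memberships, centres = fpd_cluster(datasets, 3)
    print(memberships)
```

If the function expects arrays rather than lists of tuples, converting each dataset before the call should be enough.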
If you already have a list of persistence diagrams, you can cluster them directly using the `pd_fuzzy` function in `clustering.py`.
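To make "membership values" concrete, here is a minimal sketch of the standard fuzzy c-means membership update, applied to a matrix of distances between diagrams and cluster centres. It is the textbook update rule, not code taken from `clustering.py`. The function name `fuzzy_memberships`, the `fuzziness` parameter, and the example distances are illustrative assumptions.

```python
def fuzzy_memberships(distances, fuzziness=2.0, eps=1e-12):
    """Standard fuzzy c-means membership update (illustrative, not the repo's code).

    distances[j][k] is the distance from diagram j to cluster centre k.
    Returns a matrix of the same shape whose rows each sum to 1.
    """
    exponent = 2.0 / (fuzziness - 1.0)
    memberships = []
    for row in distances:
        # Clamp tiny distances so a diagram sitting on a centre does not divide by zero.
        safe = [max(d, eps) for d in row]
        inverse = [d ** -exponent for d in safe]
        total = sum(inverse)
        memberships.append([value / total for value in inverse])
    return memberships


# Made-up distances: diagram 0 is close to centre 0, diagram 1 is equidistant.
example_distances = [
    [0.1, 1.0, 2.0],
    [1.0, 1.0, 1.0],
]
example_memberships = fuzzy_memberships(example_distances)
for row in example_memberships:
    print([round(value, 3) for value in row])
```

A diagram close to one centre gets a membership near 1 for that cluster, while a diagram equidistant from all centres is shared equally between them.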
Our algorithm successfully clusters cubic structures and carbon allotropes from atomic coordinates, regardless of the transformations applied to the data. Comparable Wasserstein barycentre clustering algorithms fail on every transformed carbon allotrope dataset, and on every transformed cubic structure dataset except the reflected one.
Our algorithm also runs at least an order of magnitude faster than comparable Wasserstein barycentre clustering algorithms. The table below shows timing results in seconds per iteration.
Points | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900 | 1000 |
---|---|---|---|---|---|---|---|---|---|---|
FPDCluster | 0.01552 | 0.1975 | 0.9358 | 2.229 | 5.694 | 12.29 | 19.27 | 34.50 | 53.20 | 77.81 |
ADMM | 5.622 | 34.86 | 161.3 | 617.6 | - | - | - | - | - | - |
BADMM | 0.2020 | 2.188 | 26.38 | 112.6 | - | - | - | - | - | - |
SubGD | 0.4217 | 2.273 | 22.17 | 103.4 | - | - | - | - | - | - |
IterBP | 0.3825 | 2.226 | 21.57 | 108.9 | - | - | - | - | - | - |
LP | 0.3922 | 2.031 | 22.32 | 117.3 | - | - | - | - | - | - |
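
The "order of magnitude" claim can be checked directly from the table. The short script below copies the timings for 100 to 400 points (the sizes at which every method has an entry) and computes how much faster FPDCluster is than the fastest competing method at each size. The variable names are illustrative only.

```python
# Seconds per iteration, copied from the table above (100 to 400 points).
points = [100, 200, 300, 400]
timings = {
    "FPDCluster": [0.01552, 0.1975, 0.9358, 2.229],
    "ADMM": [5.622, 34.86, 161.3, 617.6],
    "BADMM": [0.2020, 2.188, 26.38, 112.6],
    "SubGD": [0.4217, 2.273, 22.17, 103.4],
    "IterBP": [0.3825, 2.226, 21.57, 108.9],
    "LP": [0.3922, 2.031, 22.32, 117.3],
}

speedups = []
for i, n in enumerate(points):
    # Fastest competitor at this dataset size.
    name, fastest = min(
        ((method, times[i]) for method, times in timings.items() if method != "FPDCluster"),
        key=lambda pair: pair[1],
    )
    speedup = fastest / timings["FPDCluster"][i]
    speedups.append(speedup)
    print(f"{n} points: {speedup:.1f}x faster than {name}")
```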
To reproduce the synthetic data results and figures:
python src/synthetic_data.py
To reproduce the timing results:
python src/timing_experiments.py
To reproduce the cubic structures and carbon allotropes results:
python src/lattice_structures.py
N.B. The data paths are relative to the repository root, so run these commands from the root directory.
We provide examples of face-centred and body-centred cubic structures from the Materials Project, and diamond and cis-hinged polydiacetylene from the SACADA Carbon Allotrope Database.
These are provided as `cif` files, which we export to `xyz` format using VESTA, then convert to `csv` using a custom function in `file_utils.py`.
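For readers unfamiliar with the `xyz` format, the sketch below shows what such a conversion could look like. It is not the function in `file_utils.py`: the name `xyz_to_csv_text`, the CSV column layout, and the sample coordinates are all assumptions for illustration. It relies only on the usual `xyz` layout, which is an atom count on the first line, a comment on the second, then one `element x y z` line per atom.

```python
import csv
import io


def xyz_to_csv_text(xyz_text):
    """Convert xyz-formatted text to CSV text (hypothetical helper, not the repo's function)."""
    lines = xyz_text.strip().splitlines()
    n_atoms = int(lines[0].split()[0])   # line 1: number of atoms
    atom_lines = lines[2:2 + n_atoms]    # line 2 is a comment; atoms follow
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["element", "x", "y", "z"])  # assumed column layout
    for line in atom_lines:
        element, x, y, z = line.split()[:4]
        writer.writerow([element, float(x), float(y), float(z)])
    return out.getvalue()


# Illustrative input only; not coordinates from the provided data files.
sample_xyz = """2
two carbon atoms (illustrative coordinates)
C 0.0 0.0 0.0
C 0.89 0.89 0.89
"""
csv_text = xyz_to_csv_text(sample_xyz)
print(csv_text)
```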
The Wasserstein barycentre code that we used has a custom `d2` file format. `file_utils.py` also includes a function, `save_as_d2`, that saves the data structures used in our code to `d2` format.
All content in this repository is licensed under the MIT license.