
build an ordination plot (NMDS? PCoA?) on top of sourmash compare distance matrices #2900

Open
ctb opened this issue Jan 4, 2024 · 2 comments


@ctb
Contributor

ctb commented Jan 4, 2024

No description provided.

@ctb
Contributor Author

ctb commented Apr 16, 2024

Script, via @bluegenes and @mr-eyes:

#! /usr/bin/env python
import sys
import argparse
import numpy as np
from sklearn.manifold import MDS
import matplotlib.pyplot as plt
from scipy.sparse import lil_matrix

def create_sparse_similarity_matrix(tuples, num_objects):
    # Initialize matrix in LIL format for efficient setup
    similarity_matrix = lil_matrix((num_objects, num_objects))

    for obj1, obj2, similarity in tuples:
        similarity_matrix[obj1, obj2] = similarity
        if obj1 != obj2:
            similarity_matrix[obj2, obj1] = similarity

    # Ensure diagonal elements are 1
    similarity_matrix.setdiag(1)

    # Convert to CSR format for efficient operations later
    return similarity_matrix.tocsr()


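def plot_mds_sparse(matrix):
    # MDS needs a dense dissimilarity matrix. Densify scipy sparse input
    # first, since scipy does not support subtracting a sparse matrix from 1.
    if hasattr(matrix, 'toarray'):
        matrix = matrix.toarray()
    # Convert similarity (1 = identical) to dissimilarity (0 = identical)
    dissimilarities = 1 - matrix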
    mds = MDS(n_components=2, dissimilarity='precomputed', random_state=42)
    mds_coords = mds.fit_transform(dissimilarities)
    plt.scatter(mds_coords[:, 0], mds_coords[:, 1])
    plt.title('MDS Plot')
    plt.xlabel('Dimension 1')
    plt.ylabel('Dimension 2')


def main():
    p = argparse.ArgumentParser()
    p.add_argument('comparison_matrix')
    p.add_argument('-o', '--output-figure', required=True)
    args = p.parse_args()

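    # Load the numpy matrix saved by `sourmash compare -o`; plot_mds_sparse
    # assumes it holds similarities (the default), not distances.
    with open(args.comparison_matrix, 'rb') as f:
        mat = np.load(f)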

    # Alternative: build a sparse similarity matrix from
    # (index1, index2, similarity) tuples, using integer indices
    # instead of names for simplicity:
    #similarity_tuples = [(0, 1, 0.7), (0, 2, 0.4), (1, 2, 0.5)]
    #num_objects = 3  # the total number of objects must be known in advance
    #sparse_matrix = create_sparse_similarity_matrix(similarity_tuples, num_objects)
    plot_mds_sparse(mat)

    plt.savefig(args.output_figure)


if __name__ == '__main__':
    sys.exit(main())
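The title also asks about PCoA. For comparison with the script above (scikit-learn's MDS, which fits coordinates iteratively with SMACOF), here is a minimal numpy-only sketch of PCoA, also called classical MDS. It assumes the same symmetric `1 - similarity` matrix as input. The function name `pcoa` is hypothetical and is not part of sourmash or the betterplot plugin.

```python
import numpy as np


def pcoa(dissimilarities, n_components=2):
    """Hypothetical helper: principal coordinates analysis (classical MDS).

    Takes a symmetric (n x n) dissimilarity matrix and returns
    (coords, explained): coords is (n x n_components), and explained is
    the fraction of the positive eigenvalue sum carried by each axis.
    """
    d = np.asarray(dissimilarities, dtype=float)
    n = d.shape[0]
    # Double-center the squared dissimilarities: B = -1/2 * J D^2 J
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # B is symmetric, so use eigh; it returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Non-Euclidean dissimilarities (such as 1 - Jaccard) may produce
    # negative eigenvalues; clip them so the square root is defined
    top = np.clip(eigvals[:n_components], 0, None)
    coords = eigvecs[:, :n_components] * np.sqrt(top)
    explained = top / eigvals[eigvals > 0].sum()
    return coords, explained
```

To try it in the script, `mds_coords = pcoa(dissimilarities)[0]` could stand in for the `mds.fit_transform(...)` call. Unlike SMACOF, PCoA is deterministic and needs no `random_state`, although the sign of each axis is arbitrary. The `explained` fractions also give a quick sense of how much of the structure the 2D plot captures.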

@ctb
Contributor Author

ctb commented May 18, 2024

MDS plots are now available in the betterplot plugin via the sourmash scripts mds command.
