pairwise/clustering downstream-analysis research-driven thoughts #252

Open
mr-eyes opened this issue Feb 27, 2024 · 1 comment
@mr-eyes
Member

mr-eyes commented Feb 27, 2024

  • Graph construction was implemented with rustworkx in "MRG: Add graph-based clustering" (#234). The rustworkx Python interface is remarkably well optimized, so once the graph has been created (I assume an undirected graph), all downstream graph analyses can be built in Python.
  • Bipartite: As visualized in DBRetina bipartite, this can be useful in many applications, for example:
    • Metagenome compositional analysis.
    • Relating pangenome-like signatures to genomes.
    • Host-pathogen interactions.
    • Strain-level analysis.
  • Community Detection: The current clustering algorithm is weakly_connected_component. @bluegenes tried it before with kSpider and, as far as I remember, it did a great job in the ANI-based clustering of GTDB-207. Here, I propose adopting community detection methods, which have proven very useful in DBRetina, although I haven't tried them on DNA data.
    • Note: rustworkx currently offers less variety in graph algorithms than NetworkX.
    • Suggested algorithms to explore:
  • k-mer graph 🌟: Here the graph will consist of k-mer hashes as nodes and genomes/metagenomes/etc. as edges, with abundance as the edge weight. This could be useful for a great many applications, for example:
    • Biomarker detection
    • Evolutionary and taxonomy analysis
    • Low-complexity k-mers detection and removal
    • and more ...
  • Interactive Dashboard: In DBRetina, I implemented a JS-based dashboard that loads the graphs and allows interactive exploration by filtering and querying the graph with many features, thresholds, etc. It was super helpful. Previously, this was done by exporting the graph to a graph database like Neo4j or Memgraph, but that approach will not help users of the software.
  • (Maybe odds ratio & p-value): In the pairwise script, we could allow an optional calculation of similarity significance by computing an odds ratio and a p-value. I will need to think more about it in this context.
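
A minimal sketch of what the odds ratio and p-value idea could look like, using only the Python standard library. Everything here is an assumption for illustration and not part of the pairwise script: the function names (`contingency_table`, `odds_ratio`, `enrichment_p_value`), the choice of a 2x2 table over k-mer hashes, the `universe_size` parameter (the total number of distinct hashes considered possible, which would have to be chosen somehow), the 0.5 continuity correction, and the use of a one-sided Fisher exact test.

```python
from math import comb


def contingency_table(a, b, universe_size):
    """Build a 2x2 table of hash membership for two sketches.

    `universe_size` is a hypothetical parameter: the number of distinct
    hashes that could have been observed (e.g. all hashes in the database).
    """
    shared = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    neither = universe_size - shared - only_a - only_b
    if neither < 0:
        raise ValueError("universe_size is smaller than the union of a and b")
    return shared, only_a, only_b, neither


def odds_ratio(shared, only_a, only_b, neither, correction=0.5):
    """Odds ratio with a continuity correction so empty cells do not divide by zero."""
    c = correction
    return ((shared + c) * (neither + c)) / ((only_a + c) * (only_b + c))


def enrichment_p_value(shared, only_a, only_b, neither):
    """One-sided Fisher exact test: P(overlap >= shared) under the
    hypergeometric null that the two hash sets are independent."""
    n_a = shared + only_a
    n_b = shared + only_b
    total = shared + only_a + only_b + neither
    upper = min(n_a, n_b)
    tail = sum(
        comb(n_a, k) * comb(total - n_a, n_b - k)
        for k in range(shared, upper + 1)
    )
    return tail / comb(total, n_b)


# Toy example with made-up hash values.
sketch_a = {1, 2, 3, 4}
sketch_b = {3, 4, 5, 6}
table = contingency_table(sketch_a, sketch_b, universe_size=20)
print(table, odds_ratio(*table), enrichment_p_value(*table))
```

The exact binomial sums get expensive for large sketches, so a real implementation would probably rely on a statistics library or an approximation, and the choice of universe strongly affects the result.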
@mr-eyes
Member Author

mr-eyes commented Feb 27, 2024

Notes regarding visualization and clustering:

  • UMAP, t-SNE, and other MDS algorithms usually require tweaking the parameters many times to get the expected output.
  • Constructing an MDS embedding and then running k-means or another clustering algorithm on it can be super useful.
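
The "MDS, then k-means" idea can be sketched in pure Python as below. This is only an illustration under stated assumptions: it uses classical (Torgerson) MDS via power iteration, which may not be the MDS variant kSpider used; the input is assumed to be a symmetric distance matrix (for example 1 - similarity from the pairwise output, which is itself an assumption); and the function names (`classical_mds`, `kmeans`, `sq_dist`) and the farthest-first initialization are hypothetical choices. In practice a numerical library would be the better tool.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two coordinate sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def classical_mds(dist, dims=2, iters=300, seed=0):
    """Embed n items in `dims` dimensions from an n x n distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        # Power iteration for the dominant eigenvector of B.
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = sum(x * x for x in w) ** 0.5
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigval = sum(x * y for x, y in zip(v, bv))  # Rayleigh quotient
        # Non-Euclidean distances can give negative eigenvalues; clamp them.
        scale = max(eigval, 0.0) ** 0.5
        for i in range(n):
            coords[i][d] = v[i] * scale
        # Deflate so the next pass finds the next eigenvector.
        b = [[b[i][j] - eigval * v[i] * v[j] for j in range(n)]
             for i in range(n)]
    return coords


def kmeans(points, k, iters=50):
    """Lloyd's k-means with deterministic farthest-first initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(farthest))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy usage: two well-separated groups described only by their distances.
pts = [(0, 0), (1, 0), (0, 1.5), (10, 10), (11, 10), (10, 12)]
dist = [[sq_dist(p, q) ** 0.5 for q in pts] for p in pts]
embedding = classical_mds(dist, dims=2)
print(kmeans(embedding, k=2))
```

Unlike UMAP or t-SNE, classical MDS has no perplexity or neighbor parameters to tune, but the number of clusters `k` still has to be chosen.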

Examples for MDS visualizations done by kSpider: https://farm.cse.ucdavis.edu/~mhussien/hmp_bacterial_plots/

ref #248
