Accurate and fast cell marker gene identification with COSG

Overview

COSG is a cosine similarity-based method for more accurate and scalable marker gene identification.

COSG is a general method for cell marker gene identification across different data modalities, e.g., scRNA-seq, scATAC-seq, and spatially resolved transcriptome data.
Marker genes or genomic regions identified by COSG are more indicative and have greater cell-type specificity.
COSG is ultrafast for large-scale datasets and is capable of identifying marker genes for one million cells in less than two minutes.

The method and benchmarking results are described in Dai et al. (2022).
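
As a rough illustration of the cosine-similarity idea, the sketch below compares a gene's expression vector with a binary indicator vector for one cell group. This is only the basic building block, not COSG's full scoring. The helper names (`cosine_similarity`, `group_indicator`) and the toy values are made up for this example and are not part of the cosg package.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def group_indicator(labels, group):
    """Binary vector over cells: 1.0 for cells in `group`, 0.0 otherwise (hypothetical helper)."""
    return [1.0 if label == group else 0.0 for label in labels]

# Toy data: four cells, two per group.
labels = ["A", "A", "B", "B"]
specific_gene = [5.0, 4.0, 0.0, 0.0]  # expressed only in group A
broad_gene = [1.0, 1.0, 1.0, 1.0]     # expressed equally in every cell

indicator_a = group_indicator(labels, "A")
specific_sim = cosine_similarity(specific_gene, indicator_a)
broad_sim = cosine_similarity(broad_gene, indicator_a)

print(f"group-specific gene vs. group A: {specific_sim:.3f}")
print(f"broadly expressed gene vs. group A: {broad_sim:.3f}")
```

The gene restricted to group A points in nearly the same direction as the group A indicator, so its similarity is close to 1, while the broadly expressed gene scores lower.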

Additionally, the R version of COSG is available here.

Note: we have recently released our Python toolkit, PIASO, in which some methods build upon COSG. Please try out PIASO, thank you!

Documentation

COSG documentation.

Release notes

Release v1.0.3 (March 11, 2025)

Fixed an incompatibility between the adata.write function and the multiple index columns of adata.uns['cosg']['COSG']
Enhanced plotMarkerDendrogram function with several new capabilities:
- Implemented support for customized cell type-gene pairs
- Added color control for nodes and edges
- Added cell type filtering functionality
- Integrated support for curved edges in visualization

Release v1.0.2 (March 5, 2025)

Added plotMarkerDotplot and plotMarkerDendrogram for enhanced marker gene visualization.
Introduced support for batch_key to compute cosine similarities separately across different batches.
Enabled calculation of normalized COSG scores for comparing gene expression specificity across cell types or datasets.
Resolved a SciPy version deprecation issue related to .A attribute usage.
Fixed a DataFrame manipulation warning.
Added verbosity control, allowing users to adjust log output levels.

Release v1.0.1 (June 15, 2021)

First release on PyPI.

Installation

Stable version:

pip install cosg

Development version:

pip install git+https://github.com/genecell/COSG.git

Example

Run COSG:

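import cosg

# adata is assumed to be an AnnData object with cell group labels in adata.obs['CellTypes']
groupby='CellTypes'
cosg.cosg(
    adata,
    key_added='cosg',  # results are stored in adata.uns['cosg']
    # use_raw=False, layer='log1p', ## e.g., if you want to use the log1p layer in adata
    mu=100,  # penalty on expression in non-target groups; larger values are stricter
    expressed_pct=0.1,  # minimum fraction of target-group cells expressing the gene
    remove_lowly_expressed=True,  # exclude genes below expressed_pct from the candidates
    n_genes_user=100,  # number of top genes reported per group
    groupby=groupby
)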

Draw the dot plot:

import pandas as pd
import scanpy as sc

sc.tl.dendrogram(adata, groupby=groupby, use_rep='X_pca') ## Change use_rep to the cell embeddings key you'd like to use
# Take the top 3 marker genes per group and order the groups as in the dendrogram
df_tmp=pd.DataFrame(adata.uns['cosg']['names'][:3]).T
df_tmp=df_tmp.reindex(adata.uns['dendrogram_'+groupby]['categories_ordered'])
marker_genes_list={idx: list(row.values) for idx, row in df_tmp.iterrows()}
# Drop groups whose marker entries are missing (NaN)
marker_genes_list = {k: v for k, v in marker_genes_list.items() if not any(isinstance(x, float) for x in v)}

sc.pl.dotplot(
    adata,
    marker_genes_list,
    groupby=groupby,
    dendrogram=True,
    swap_axes=False,
    standard_scale='var',
    cmap='Spectral_r'
)
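
To see what the marker dictionary passed to the dot plot looks like, the steps above can be tried on a toy stand-in for adata.uns['cosg']['names']. This sketch assumes the result is a NumPy structured array with one field per cell group and rows ordered by rank. The group names, gene names, and the `dendrogram_order` list are placeholders, and `pd.isna` is used here in place of the `isinstance(x, float)` check.

```python
import numpy as np
import pandas as pd

# Toy stand-in for adata.uns['cosg']['names'] (assumed layout: one field per group).
names = np.rec.fromarrays(
    [
        ["GeneA1", "GeneA2", "GeneA3", "GeneA4"],
        ["GeneB1", "GeneB2", "GeneB3", "GeneB4"],
    ],
    names=["GroupA", "GroupB"],
)

# Placeholder for adata.uns['dendrogram_' + groupby]['categories_ordered'].
# "GroupC" has no markers in `names`, so it shows how missing groups are handled.
dendrogram_order = ["GroupB", "GroupC", "GroupA"]

# Top 3 genes per group, one row per group, ordered as in the dendrogram.
top = pd.DataFrame(names[:3]).T.reindex(dendrogram_order)

# Build {group: [genes]} and skip groups whose row is all missing values.
marker_genes = {
    group: list(row.values)
    for group, row in top.iterrows()
    if not row.isna().any()
}

print(marker_genes)
```

The resulting dictionary keeps the dendrogram order and omits the group without markers, which is the shape sc.pl.dotplot accepts for grouped gene lists.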

Output the marker list as pandas dataframe:

marker_gene=pd.DataFrame(adata.uns['cosg']['names'])
marker_gene.head()

You could also check the COSG scores:

marker_gene_scores=pd.DataFrame(adata.uns['cosg']['scores'])
marker_gene_scores.head()
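
If a single table is more convenient than two wide ones, the names and scores can be combined into a long format. The sketch below uses toy structured arrays in place of adata.uns['cosg']['names'] and adata.uns['cosg']['scores'], assuming both share the same group fields and row order. The group names, gene names, and score values are placeholders, not real COSG output.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for adata.uns['cosg']['names'] and adata.uns['cosg']['scores'].
names = np.rec.fromarrays(
    [["GeneA1", "GeneA2"], ["GeneB1", "GeneB2"]],
    names=["GroupA", "GroupB"],
)
scores = np.rec.fromarrays(
    [[0.9, 0.7], [0.8, 0.5]],
    names=["GroupA", "GroupB"],
)

names_df = pd.DataFrame(names)
scores_df = pd.DataFrame(scores)

# melt stacks the columns one after another, so both tables stay aligned row by row.
long_table = names_df.melt(var_name="group", value_name="gene")
long_table["score"] = scores_df.melt(value_name="score")["score"]
# Rank within each group, starting at 1 for the top-scoring gene.
long_table["rank"] = long_table.groupby("group").cumcount() + 1

print(long_table)
```

Each row then holds one group, one gene, its score, and its rank, which is easier to filter or export than the two separate tables.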

Question

For questions about the code and tutorial, please contact Min Dai, dai@broadinstitute.org.

Citation

If COSG is useful for your research, please consider citing Dai et al. (2022).
