different TFs are discovered from different cistarget databases #526

gilgolan73 · 2024-12-15T12:25:05Z

Describe the bug
Hi, I am using scenicplus to find eRegulons in my sc-multiomic data (mouse).
I ran scenicplus twice, each time using a different set of cistarget databases:

custom databases - created by create_cistarget_databases (according to https://scenicplus.readthedocs.io/en/latest/human_cerebellum_ctx_db.html). As a consensus peak set, I used a list of peaks identified by MACS2 (this set includes all the peaks identified by MACS2 in at least one of the clusters in the dataset).
precomputed databases - the mouse databases suggested by the tutorial.

When I looked at the results, I noticed that many of the TFs found using the custom databases are missing from the analysis based on the precomputed databases (and vice versa). For example, with the custom databases I found 100 TFs with regulons (among these, 51 were not found in the precomputed-based analysis). With the precomputed databases I found 94 TFs with regulons (among these, 46 were not found in the custom-based analysis).
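A comparison like the one described above can be reproduced with plain set operations. The following is a minimal sketch, assuming the TF names from each run have already been collected into Python lists; the function name `compare_tf_sets` and the toy TF names are hypothetical and not part of scenicplus.

```python
def compare_tf_sets(custom_tfs, precomputed_tfs):
    """Return TFs shared by both runs and TFs unique to each run."""
    custom = set(custom_tfs)
    precomputed = set(precomputed_tfs)
    return {
        "shared": sorted(custom & precomputed),
        "custom_only": sorted(custom - precomputed),
        "precomputed_only": sorted(precomputed - custom),
    }


# Toy TF names (hypothetical); replace them with the TFs from each run.
custom_run = ["Sox2", "Pax6", "Neurod1", "Ascl1"]
precomputed_run = ["Sox2", "Pax6", "Olig2"]

result = compare_tf_sets(custom_run, precomputed_run)
for label, tfs in result.items():
    print(label, len(tfs), tfs)
```

Listing the TFs unique to each run makes it easier to check them one by one against the motif annotation and the databases.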

These different sets of TFs lead to very different results (in terms of the eRegulons found and the ability to perform in-silico perturbation assays for the TFs). Why do I get such different results? Is there a way to include more TFs in the custom-based analysis (which I guess is the more accurate one)?

Thank you,
Gil Golan

SeppeDeWinter · 2024-12-16T07:25:35Z

Hi @gilgolan73

This is expected.

The precomputed database uses ENCODE SCREEN regions. These regulatory regions are based on data from a variety of cell lines. These cell lines might not be too similar to your data.

For this reason I would trust the results from the custom database more.

Best,

Seppe

gilgolan73 · 2024-12-16T08:41:48Z

Hello @SeppeDeWinter ,
I understand that the analysis based on the custom cistarget databases (and hence on more accurate regulatory regions) is more trustworthy.
However, the results of that analysis lack many TFs that seem important in my dataset (according to a previous SCENIC analysis and in-vitro studies). I would like them to have eRegulons as well (to predict their targets). Is there a way to change the Snakemake parameters (maybe to make the filtering less strict?) so that these TFs are included as well?

Thank you,
Gil

gilgolan73 · 2024-12-19T14:50:08Z

Hi @SeppeDeWinter ,
Is there a way to run the analysis using the combination of the pre-computed and the custom cistarget databases?

An additional question- when I look at the TF_names.txt file, I can see that some TFs are missing (in both analyses). Is it because their motif does not appear in the database? If so, can I add their motif somehow?

Thank you,
Gil

SeppeDeWinter · 2025-01-06T07:29:21Z

Hi @gilgolan73

> Is there a way to run the analysis using the combination of the pre-computed and the custom cistarget databases?

Yes, this is possible in theory. It would involve generating a new database containing the union of the SCREEN regions (https://screen.encodeproject.org/) and your consensus peak set.
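As an illustration of what such a union could look like, here is a minimal sketch that combines two region lists into one non-overlapping set. It assumes regions are given as `(chrom, start, end)` tuples in BED-style coordinates and that overlapping or directly adjacent regions should be merged; whether merging is appropriate (rather than simply keeping both sets side by side) is an assumption, and `union_regions` is a hypothetical helper, not a scenicplus function.

```python
from collections import defaultdict


def union_regions(*region_sets):
    """Combine lists of (chrom, start, end) and merge overlapping regions."""
    by_chrom = defaultdict(list)
    for regions in region_sets:
        for chrom, start, end in regions:
            by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        current_start, current_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if current_end is None:
                # First region on this chromosome
                current_start, current_end = start, end
            elif start > current_end:
                # Gap: close the current region and start a new one
                merged.append((chrom, current_start, current_end))
                current_start, current_end = start, end
            else:
                # Overlapping or adjacent: extend the current region
                current_end = max(current_end, end)
        merged.append((chrom, current_start, current_end))
    return merged


# Toy coordinates (hypothetical), standing in for SCREEN cCREs and consensus peaks.
screen = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 10, 50)]
consensus = [("chr1", 150, 250), ("chr1", 700, 800)]
union = union_regions(screen, consensus)
print(union)
```

The resulting region list would then serve as the input region set when building the new database.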

> An additional question- when I look at the TF_names.txt file, I can see that some TFs are missing (in both analyses). Is it because their motif does not appear in the database? If so, can I add their motif somehow?

This file is generated based on your motif enrichment results. If a TF does not appear, it means that no motif for that TF is enriched. It is of course possible that our collection does not contain any motif for your TF of interest. To rule this out, please check whether your TF of interest occurs in the motif-to-TF annotation table: https://resources.aertslab.org/cistarget/motif2tf/
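The check against the annotation table can be scripted. Below is a minimal sketch, assuming the table is tab-separated with a header line and that the motif and TF columns are named `#motif_id` and `gene_name`; these column names are an assumption and should be verified against the header of the downloaded file. The helper `motifs_per_tf` and the toy table are hypothetical.

```python
import csv
import io
from collections import defaultdict


def motifs_per_tf(tbl_handle, tfs_of_interest,
                  motif_col="#motif_id", tf_col="gene_name"):
    """Map each TF of interest to the motif IDs annotated to it."""
    wanted = set(tfs_of_interest)
    found = defaultdict(set)
    for row in csv.DictReader(tbl_handle, delimiter="\t"):
        if row[tf_col] in wanted:
            found[row[tf_col]].add(row[motif_col])
    # TFs without any annotated motif get an empty list
    return {tf: sorted(found[tf]) for tf in tfs_of_interest}


# Toy table (hypothetical content). For a real file, pass an open file handle:
#   with open("path/to/annotation.tbl") as handle: motifs_per_tf(handle, [...])
toy_tbl = io.StringIO(
    "#motif_id\tgene_name\n"
    "motif_A\tSox2\n"
    "motif_B\tSox2\n"
    "motif_C\tPax6\n"
)
annotated = motifs_per_tf(toy_tbl, ["Sox2", "Olig2"])
print(annotated)
```

A TF that maps to an empty list has no motif in the table, so it cannot appear in the enrichment results regardless of the region sets used.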

I hope this helps.

Best,

S

gilgolan73 · 2025-01-07T10:10:23Z

Hi,
Thank you for the help.

Which regions did you use from the SCREEN database? All cCREs?
My TFs of interest do appear in the motif-to-TF annotation table (motifs-v10nr_clust-nr.mgi-m0.001-o0.0.tbl). However, they do not appear in my results. Do you filter these motifs in any way when running create_cistarget_databases? Is there a way to check whether their motifs appear in the cistarget databases?
I ask because it does not make sense for them to be excluded from the analysis: they are expressed at high levels in certain cell clusters and are known to bind the promoters of target genes in the dataset.

Thanks,
Gil

SeppeDeWinter · 2025-01-13T08:54:20Z

Hi

> Which regions did you use from the SCREEN database? all cCREs?
Yes, all of them

We don't do any filtering, but you can check using this code if you want.

from ctxcore.rnkdb import FeatherRankingDatabase
db = FeatherRankingDatabase("<PATH_TO_REGIONS_V_MOTIF_DB>", name="test")

# are any motifs annotated to your TF of interest in the database?
db.genes

I think the region sets you are using for motif enrichment might not represent well the state in which the TF is expressed, and that this is the reason why you don't find any of its motifs enriched.

Best,

Seppe

gilgolan73 · 2025-01-13T09:48:37Z

Hello @SeppeDeWinter ,
I tried to take a look at db.genes but I got a list of coordinates rather than TF names.

Additionally, from my understanding, the region sets that I am using for motif enrichment are derived from the pycistopic analysis. Are there any parameters or steps in this analysis that you recommend changing?

Thank you,
Gil

SeppeDeWinter · 2025-01-13T12:33:06Z

Hi @gilgolan73

Yes, my mistake.
Can you check the column names of the full table instead?

df = db.load_full()
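Since `db.genes` returned coordinates rather than motif names, it may be worth checking both axes of the loaded table. The following is a minimal sketch that takes the row labels, the column labels, and a list of motif IDs and reports where each motif is found; it assumes `df` is a pandas DataFrame as returned by `db.load_full()`, and `locate_motifs` is a hypothetical helper, not part of ctxcore.

```python
def locate_motifs(index_labels, column_labels, motif_ids):
    """Report which motif IDs occur in the row labels, column labels, or neither."""
    motifs = set(motif_ids)
    in_index = motifs & {str(label) for label in index_labels}
    in_columns = motifs & {str(label) for label in column_labels}
    return {
        "index": sorted(in_index),
        "columns": sorted(in_columns),
        "missing": sorted(motifs - in_index - in_columns),
    }


# With the real table this would be called as:
#   locate_motifs(df.index, df.columns, motif_ids_for_my_tf)
# Toy labels (hypothetical) are used here so the example runs on its own.
rows = ["motif_A", "motif_B"]
cols = ["chr1:100-200", "chr1:500-600"]
report = locate_motifs(rows, cols, ["motif_A", "motif_Z"])
print(report)
```

Motifs listed under `missing` are absent from the database itself, while motifs that are present but never enriched point to the region sets rather than the database.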

All the best,

Seppe
