different TFs are discovered from different cistarget databases #526

Open
gilgolan73 opened this issue Dec 15, 2024 · 5 comments

@gilgolan73

Describe the bug
Hi, I am using scenicplus to find eRegulons in my sc-multiomic data (mouse).
I ran scenicplus twice, each time with a different set of cistarget databases:

  1. custom databases - created by create_cistarget_databases (according to https://scenicplus.readthedocs.io/en/latest/human_cerebellum_ctx_db.html). As a consensus peak set, I used a list of peaks identified by MACS2 (this set includes all the peaks identified by MACS2 in at least one of the clusters in the dataset).
  2. precomputed databases for mouse, as suggested by the tutorial.

When I looked at the results, I noticed that many of the TFs found using the custom databases are missing from the precomputed-database analysis (and vice versa). For example, with the custom databases I found 100 TFs with regulons (51 of which were not found in the precomputed-database analysis). With the precomputed databases I found 94 TFs with regulons (46 of which were not found in the custom-database analysis).
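
A quick way to quantify this kind of discrepancy is to compare the two TF lists directly as sets. The sketch below assumes each run's TF list is available as a plain-text file with one TF name per line (the file layout and paths are assumptions). Deriving shared and unique TFs from the same two sets keeps the counts consistent: the number of shared TFs must come out identical from either side (for instance, 100 − 51 and 94 − 46 should be equal).

```python
from pathlib import Path


def read_tf_names(path):
    """Read one TF name per line, skipping blank lines (assumed file layout)."""
    text = Path(path).read_text()
    return {line.strip() for line in text.splitlines() if line.strip()}


def compare_tf_sets(custom, precomputed):
    """Summarise the overlap between two collections of TF names."""
    custom, precomputed = set(custom), set(precomputed)
    shared = custom & precomputed
    union = custom | precomputed
    return {
        "custom_total": len(custom),
        "precomputed_total": len(precomputed),
        "shared": len(shared),
        "custom_only": sorted(custom - precomputed),
        "precomputed_only": sorted(precomputed - custom),
        # Jaccard index: 1.0 means identical TF sets, 0.0 means no overlap.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Example usage (hypothetical paths for the two runs):
# summary = compare_tf_sets(read_tf_names("custom_run/TF_names.txt"),
#                           read_tf_names("precomputed_run/TF_names.txt"))
```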

These different sets of TFs lead to very different results (in terms of the eRegulons found and of which TFs can be used in in-silico perturbation assays). Why do I get such different results? Is there a way to include more TFs in the custom-database analysis (which I assume is more accurate)?

Thank you,
Gil Golan

@SeppeDeWinter
Collaborator

Hi @gilgolan73

This is expected.

The precomputed database uses ENCODE SCREEN regions. These regulatory regions are based on data from a variety of cell lines, which may not closely resemble the cells in your data.

For this reason I would trust the results from the custom database more.
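
One way to gauge how well the SCREEN regions match a given dataset is to compute the fraction of consensus peaks that overlap at least one SCREEN cCRE. This is a minimal sketch, assuming both region sets are BED files on the same genome assembly with 0-based, half-open coordinates; the file paths are hypothetical.

```python
import bisect
from collections import defaultdict


def read_bed(path):
    """Read (chrom, start, end) tuples from the first three BED columns."""
    regions = []
    with open(path) as fh:
        for line in fh:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.rstrip("\n").split("\t")[:3]
            regions.append((chrom, int(start), int(end)))
    return regions


def build_index(regions):
    """Per chromosome: sorted starts plus a running maximum of end coordinates."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        max_end, running = [], float("-inf")
        for _, e in intervals:
            running = max(running, e)
            max_end.append(running)
        index[chrom] = (starts, max_end)
    return index


def overlaps_any(index, chrom, start, end):
    """True if [start, end) overlaps at least one indexed region."""
    if chrom not in index:
        return False
    starts, max_end = index[chrom]
    i = bisect.bisect_left(starts, end)  # regions [0, i) start before `end`
    return i > 0 and max_end[i - 1] > start


def fraction_overlapping(peaks, reference):
    """Fraction of peaks overlapping any region in `reference`."""
    if not peaks:
        return 0.0
    index = build_index(reference)
    hits = sum(overlaps_any(index, c, s, e) for c, s, e in peaks)
    return hits / len(peaks)


# Example usage (hypothetical paths):
# frac = fraction_overlapping(read_bed("consensus_peaks.bed"), read_bed("screen_ccres.bed"))
```

The running maximum of end coordinates makes the check correct even when reference regions overlap each other.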

Best,

Seppe

@gilgolan73
Author

Hello @SeppeDeWinter ,
I understand that the analysis based on custom cistarget databases (and hence on more accurate regulatory regions) is more trustworthy.
However, the results of such an analysis lack many TFs that seem important in my dataset (according to a previous SCENIC analysis and in-vitro studies). I would like them to have eRegulons as well (to predict their targets). Is there a way to change the Snakemake parameters (maybe to make the filtering less strict) so that these TFs are included too?
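
Whether a TF is kept depends, roughly, on whether at least one motif annotated to it passes the motif enrichment cut-offs. The toy sketch below shows that mechanism for a single normalized enrichment score (NES) threshold; the motif IDs, TF names, NES values and input format are all made up, and this is not SCENIC+'s actual filtering code. In the Snakemake config.yaml this kind of cut-off appears to be exposed through parameters such as `ctx_nes_threshold`, but the exact names and defaults should be checked against your own config file.

```python
def tfs_passing(enrichment, motif_to_tfs, nes_threshold):
    """Return the TFs with at least one annotated motif whose NES >= nes_threshold.

    enrichment: iterable of (motif_id, nes) pairs (hypothetical input format)
    motif_to_tfs: dict mapping motif_id -> list of TF names
    """
    kept = set()
    for motif_id, nes in enrichment:
        if nes >= nes_threshold:
            kept.update(motif_to_tfs.get(motif_id, ()))
    return kept


# Toy data: lowering the cut-off from 3.0 to 2.5 additionally keeps Neurod1.
enrichment = [("motif_a", 4.2), ("motif_b", 3.1), ("motif_c", 2.6)]
motif_to_tfs = {"motif_a": ["Sox2"], "motif_b": ["Pax6", "Pax7"], "motif_c": ["Neurod1"]}
strict = tfs_passing(enrichment, motif_to_tfs, 3.0)
relaxed = tfs_passing(enrichment, motif_to_tfs, 2.5)
```

The trade-off is the usual one: a lower threshold recovers more TFs but also admits weaker, more likely spurious motif enrichments.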

Thank you,
Gil

@gilgolan73
Author

Hi @SeppeDeWinter ,
Is there a way to run the analysis using the combination of the pre-computed and the custom cistarget databases?

An additional question: when I look at the TF_names.txt file, I can see that some TFs are missing (in both analyses). Is it because their motif does not appear in the database? If so, can I add their motif somehow?

Thank you,
Gil

@SeppeDeWinter
Collaborator

Hi @gilgolan73

> Is there a way to run the analysis using the combination of the pre-computed and the custom cistarget databases?

Yes, this is possible in theory. It would involve generating a new database containing the union of the SCREEN regions (https://screen.encodeproject.org/) and your consensus peak set.
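
Building such a union region set is essentially a concatenate, sort and merge of two BED files. Below is a minimal pure-Python sketch, assuming both inputs use the same genome assembly and 0-based, half-open coordinates; it is not part of the SCENIC+ tooling, and the `chrom:start-end` name column is an assumed naming convention. Merging overlapping regions avoids near-duplicate entries but replaces the original SCREEN boundaries, so `merge_overlapping=False` keeps every distinct input region instead. The resulting BED file would then presumably go through the same FASTA extraction and database creation steps used for the custom consensus peaks.

```python
from collections import defaultdict


def union_regions(*region_sets, merge_overlapping=True):
    """Union of several lists of (chrom, start, end) regions, sorted per chromosome."""
    by_chrom = defaultdict(list)
    for regions in region_sets:
        for chrom, start, end in regions:
            by_chrom[chrom].append((start, end))
    out = []
    for chrom in sorted(by_chrom):
        intervals = sorted(set(by_chrom[chrom]))  # drop exact duplicates
        if not merge_overlapping:
            out.extend((chrom, s, e) for s, e in intervals)
            continue
        cur_s, cur_e = intervals[0]
        for s, e in intervals[1:]:
            if s < cur_e:  # overlapping (book-ended regions stay separate)
                cur_e = max(cur_e, e)
            else:
                out.append((chrom, cur_s, cur_e))
                cur_s, cur_e = s, e
        out.append((chrom, cur_s, cur_e))
    return out


def write_bed(regions, path):
    """Write regions as BED with an assumed chrom:start-end name column."""
    with open(path, "w") as fh:
        for chrom, start, end in regions:
            fh.write(f"{chrom}\t{start}\t{end}\t{chrom}:{start}-{end}\n")
```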

> An additional question: when I look at the TF_names.txt file, I can see that some TFs are missing (in both analyses). Is it because their motif does not appear in the database? If so, can I add their motif somehow?

This file is generated based on your motif enrichment results. If a TF does not appear, it means that no motif annotated to that TF was enriched. It is of course possible that our collection does not contain any motif for your TF of interest. To check this, please look up whether your TF of interest occurs in the motif-to-TF annotation table: https://resources.aertslab.org/cistarget/motif2tf/
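
The annotation table can also be searched programmatically. A minimal sketch, assuming the `.tbl` file is tab-separated with a header row whose first column holds the motif ID and which includes a `gene_name` column (check the header of the file you actually downloaded). Matching is exact, so mouse symbols must be written as in the table (e.g. `Sox2`, not `SOX2`).

```python
import csv


def motifs_for_tf(tbl_lines, tf_name):
    """Return the sorted motif IDs annotated to tf_name in a motif-to-TF table.

    tbl_lines: any iterable of text lines, e.g. an open file handle.
    Assumes a tab-separated header whose first field is the motif ID column
    and which contains a 'gene_name' column.
    """
    reader = csv.DictReader(tbl_lines, delimiter="\t")
    motif_col = reader.fieldnames[0]
    return sorted({row[motif_col] for row in reader if row.get("gene_name") == tf_name})


# Example usage (table file name as mentioned in this thread):
# with open("motifs-v10nr_clust-nr.mgi-m0.001-o0.0.tbl") as fh:
#     print(motifs_for_tf(fh, "Sox2"))
```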

I hope this helps.

Best,

S

@gilgolan73
Author

Hi,
Thank you for the help.

  1. Which regions did you use from the SCREEN database? All cCREs?
  2. My TFs of interest do appear in the motif-to-TF annotation table (motifs-v10nr_clust-nr.mgi-m0.001-o0.0.tbl). However, they do not appear in my results. Do you filter these motifs in some way when running create_cistarget_databases? Is there a way to check whether their motifs appear in the cistarget databases?
    It does not make sense that they would be excluded from the analysis: they are expressed at high levels in certain cell clusters and are known to bind the promoters of target genes in the dataset.
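
Once the motifs annotated to a TF are known, the remaining question is at which step they drop out. A minimal sketch, assuming you already have three collections of motif IDs: those annotated to the TF, those present in the cistarget database (how to read them from the `.feather` file depends on the database format and is not shown here), and those reported as enriched. The function name and stage labels are hypothetical.

```python
def diagnose_tf(tf_motifs, database_motifs, enriched_motifs):
    """Report the first step at which a TF's motifs are lost.

    Returns (stage, motifs), where motifs are those surviving up to that stage.
    """
    annotated = set(tf_motifs)
    in_database = annotated & set(database_motifs)
    enriched = in_database & set(enriched_motifs)
    if not annotated:
        return "no motif annotated to this TF", set()
    if not in_database:
        return "annotated motifs missing from the database", annotated
    if not enriched:
        return "motifs in the database, but none enriched", in_database
    return "at least one motif enriched", enriched
```

If a TF fails only at the last step, its motifs are in the database but not enriched in the tested region sets, which would point at the enrichment thresholds or the region sets rather than at database creation.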

Thanks,
Gil
