Error when running SnakeMake #468

Closed
gilgolan73 opened this issue Sep 17, 2024 · 23 comments

@gilgolan73

Describe the bug
Hello, I'm trying to run SCENIC+ with Snakemake on a Linux machine (CentOS 9), using the tutorial dataset.
I ran the scATAC-seq preprocessing in Python with pycisTopic, following the tutorial: https://pycistopic.readthedocs.io/en/latest/notebooks/human_cerebellum.html
Then I ran the scRNA-seq preprocessing in Python, following the tutorial: https://scenicplus.readthedocs.io/en/latest/human_cerebellum_scRNA_pp.html#Preprocessing-the-scRNA-seq-data
I'm using the default config.yml file for Snakemake and only changed the location of the input data.
One to two minutes after starting Snakemake (as in the tutorial: https://scenicplus.readthedocs.io/en/latest/human_cerebellum.html#Running-SCENIC+), I receive an error (see below), which I believe is related to the cell names differing between the scATAC-seq and scRNA-seq datasets. Please let me know how to solve this issue. Thank you!

To Reproduce
"snakemake --cores 20"

Error output
"Assuming unrestricted shared filesystem usage for local execution.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 20
Rules claiming more threads will be scaled down.
Job stats:
job                            count
---------------------------  -------
AUCell_direct                      1
AUCell_extended                    1
all                                1
download_genome_annotations        1
eGRN_direct                        1
eGRN_extended                      1
get_search_space                   1
motif_enrichment_cistarget         1
motif_enrichment_dem               1
prepare_GEX_ACC_multiome           1
prepare_menr                       1
region_to_gene                     1
scplus_mudata                      1
tf_to_gene                         1
total                             14

Select jobs to execute...
Execute 2 jobs...

[Tue Sep 17 16:12:50 2024]
localrule prepare_GEX_ACC_multiome:
input: /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/outs/cistopic_obj.pkl, /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/scRNAseq/adata.h5ad
output: ACC_GEX.h5mu
jobid: 2
reason: Missing output files: ACC_GEX.h5mu
resources: tmpdir=/tmp

[Tue Sep 17 16:12:50 2024]
localrule download_genome_annotations:
output: genome_annotation.tsv, chromsizes.tsv
jobid: 8
reason: Missing output files: chromsizes.tsv, genome_annotation.tsv
resources: tmpdir=/tmp

2024-09-17 16:12:52,879 SCENIC+ INFO Reading cisTopic object.
2024-09-17 16:12:53,289 SCENIC+ INFO Reading gene expression AnnData.
Traceback (most recent call last):
  File "/home/gilgolan/.local/bin/scenicplus", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/cli/scenicplus.py", line 1137, in main
    args.func(args)
  File "/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/cli/scenicplus.py", line 46, in command_prepare_GEX_ACC
    prepare_GEX_ACC(
  File "/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/cli/commands.py", line 96, in prepare_GEX_ACC
    mdata = process_multiome_data(
            ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/data_wrangling/adata_cistopic_wrangling.py", line 73, in process_multiome_data
    raise Exception(
Exception: No cells found which are present in both assays, check input and consider using bc_transform_func!
[Tue Sep 17 16:12:53 2024]
Error in rule prepare_GEX_ACC_multiome:
jobid: 2
input: /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/outs/cistopic_obj.pkl, /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/scRNAseq/adata.h5ad
output: ACC_GEX.h5mu
shell:

        scenicplus prepare_data prepare_GEX_ACC \
            --cisTopic_obj_fname /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/outs/cistopic_obj.pkl \
            --GEX_anndata_fname /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/scRNAseq/adata.h5ad \
            --out_file ACC_GEX.h5mu \
            --bc_transform_func "lambda x: f'{x}'"
        
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

2024-09-17 16:13:38,876 Download gene annotation INFO Using genome: GRCh38.p14
2024-09-17 16:13:38,878 Download gene annotation INFO Found corresponding genome Id 51 on NCBI
2024-09-17 16:13:39,381 Download gene annotation INFO Found corresponding assembly Id 11968211 on NCBI
2024-09-17 16:13:39,884 Download gene annotation INFO Downloading assembly information from: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.40_GRCh38.p14/GCF_000001405.40_GRCh38.p14_assembly_report.txt

Unhandeled exception occured
<urlopen error [Errno 110] Connection timed out>
Returning gene annotation without subestting for assembled chromosomesand converting to UCSC style. Please make sure that the chromosome namesin the returned object match with the chromosome names in the scplus_obj.Chromosome sizes will not be returned
2024-09-17 16:15:50,496 SCENIC+ INFO Chrosomome sizes was not found, please provide this information manually.
2024-09-17 16:15:50,497 SCENIC+ INFO Saving genome annotation to: genome_annotation.tsv
Waiting at most 5 seconds for missing files.
MissingOutputException in rule download_genome_annotations in file /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/scplus_pipeline/Snakemake/workflow/Snakefile, line 221:
Job 8 completed successfully, but some output files are missing. Missing files after 5 seconds. This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait:
chromsizes.tsv
Removing output files of failed job download_genome_annotations since they might be corrupted:
genome_annotation.tsv
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-09-17T161250.453181.snakemake.log
WorkflowError:
At least one job did not complete successfully."

Expected behavior
I expect SnakeMake to run successfully on the tutorial pre-processed dataset.

Version (please complete the following information):
Python: 3.11.9
SCENIC+: 1.0a1
pyscenic 0.12.1+8.gd2309fe

Additional context
When I look at the cell names in the adata object (scRNA-seq) and the cisTopic object (scATAC-seq), they are different, and the two objects also contain different numbers of cells:
"adata.obs
Out[216]:
                    VSN_cell_type  ...  pct_counts_mt
CCCTCATAGACACTTA-1             GC  ...       0.083148
GCCATTACACCTGCCT-1           ASTP  ...       0.132626
ATTGCAGGTTGTGACA-1            MGL  ...       1.447368
CTGTTGGAGGCATTAC-1          MOL_B  ...       0.095579
CGAATCTAGCTTAGCG-1          MOL_B  ...       0.061851
...                           ...  ...            ...
TACCTTAGTTACTAGG-1          MOL_B  ...       0.058480
GTAGCTGTCATTACAG-1        AST_CER  ...       0.188088
AGGCAGGTCGCGACAC-1          MOL_A  ...       0.044703
GCATTGCCAAGACTCC-1          MOL_B  ...       0.145530
ACATTAGTCCGCAAGC-1        AST_CER  ...       0.068552
[2313 rows x 13 columns]

cistopic_obj.cell_data
Out[218]:
                                       cisTopic_nr_frag  ...  pycisTopic_cca_Seurat_cell_type
CACCTCAGTTGTAAAC-1-10x_multiome_brain             18300  ...                              AST
TGACTCCTCATCCACC-1-10x_multiome_brain            100055  ...                               BG
TTTCTCACATAAACCT-1-10x_multiome_brain             32192  ...                               GP
GTCCTCCCACACAATT-1-10x_multiome_brain             88443  ...                               BG
CTCCGTCCAGTTTGTG-1-10x_multiome_brain            131110  ...                             ENDO
...                                                 ...  ...                              ...
GCAGGTTGTCCAAATG-1-10x_multiome_brain              2770  ...                              MOL
AAGCTCCCAGCACCAT-1-10x_multiome_brain              2180  ...                              MOL
CAGAATCTCCTCATGC-1-10x_multiome_brain              1744  ...                              MOL
TAGCCGGGTAACAGGG-1-10x_multiome_brain              2674  ...                         INH_SNCG
GTGCGCAGTGCTTAGA-1-10x_multiome_brain              5859  ...                               GP
[2845 rows x 42 columns]"

@ruicatxiao

I'm running into this exact error as well. Curious to see if there is a potential fix.

@kennethho04

kennethho04 commented Oct 5, 2024

I got the same error in rule prepare_GEX_ACC_multiome and was able to resolve it by setting bc_transform_func under params_data_preparation in config.yaml to "\"lambda x: f'{x}-10x_multiome_brain'\"". You can see this on the tutorial page, where the config.yaml for the pipeline is shown.

I'm not sure about the error with download_genome_annotations, though. I ran into a similar problem with download_genome_annotations while going through the tutorial and have not been able to resolve it yet.
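
For readers hitting the same exception, the effect of bc_transform_func can be checked in plain Python before rerunning the pipeline. The sketch below is only an illustration: it assumes the function is applied to the scRNA-seq barcodes so that they match the scATAC-seq barcodes, it uses barcodes copied from the printouts in this issue, and it adds one hypothetical ATAC barcode so that the example has a match. The helper count_shared is not part of SCENIC+.

```python
# Barcodes copied from the adata.obs and cistopic_obj.cell_data printouts in this issue.
rna_barcodes = ["CCCTCATAGACACTTA-1", "GCCATTACACCTGCCT-1", "ATTGCAGGTTGTGACA-1"]
atac_barcodes = [
    "CACCTCAGTTGTAAAC-1-10x_multiome_brain",
    "TGACTCCTCATCCACC-1-10x_multiome_brain",
    # Hypothetical entry (not from the issue): an RNA barcode with the ATAC suffix,
    # added so that the demo contains exactly one matching cell.
    "CCCTCATAGACACTTA-1-10x_multiome_brain",
]


def count_shared(rna, atac, transform):
    """Count RNA barcodes that match an ATAC barcode after applying transform."""
    return len({transform(bc) for bc in rna} & set(atac))


# Default from the failing command: the barcode is returned unchanged.
identity = lambda x: f"{x}"
# Suggested fix: append the sample suffix used by the cisTopic object.
with_suffix = lambda x: f"{x}-10x_multiome_brain"

shared_default = count_shared(rna_barcodes, atac_barcodes, identity)
shared_fixed = count_shared(rna_barcodes, atac_barcodes, with_suffix)
print(shared_default, shared_fixed)
```

With real data, the two lists would come from adata.obs_names and cistopic_obj.cell_data.index; a count of zero with the default transform corresponds to the "No cells found which are present in both assays" exception.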

@SeppeDeWinter
Collaborator

Hi @ruicatxiao and @gilgolan73

Related to the barcode error, indeed make use of the bc_transform_func as mentioned by @kennethho04 (let me know if you need help with this).

Related to the chromsizes issue, looks like the pipeline was not able to download these files automatically. You can download them from: https://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/hg38.chrom.sizes
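
As a rough sketch of the manual route, the UCSC file could be fetched and rewritten with the Python standard library. Two things here are assumptions rather than facts from this thread: that the pipeline accepts a tab-separated chromsizes.tsv with a Chromosome/Start/End header, and that Start is 0 for every chromosome. The function names are hypothetical, so please compare the result with a file produced by a successful download_genome_annotations run before relying on it.

```python
import csv
import io
import urllib.request

# URL given in the comment above.
UCSC_URL = "https://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/hg38.chrom.sizes"


def parse_chrom_sizes(text):
    """Parse UCSC 'name<TAB>size' lines into (chromosome, start, end) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, size = line.split()[:2]
        rows.append((name, 0, int(size)))
    return rows


def write_chromsizes(rows, handle):
    """Write rows as a tab-separated table; the header names are an assumption."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Chromosome", "Start", "End"])
    writer.writerows(rows)


def download_chromsizes(out_path="chromsizes.tsv", url=UCSC_URL):
    """Download the UCSC file and save it in the assumed layout (needs network access)."""
    with urllib.request.urlopen(url, timeout=60) as response:
        text = response.read().decode("utf-8")
    with open(out_path, "w", newline="") as handle:
        write_chromsizes(parse_chrom_sizes(text), handle)


# Offline demonstration using the hg38 sizes of chr1 and chr2.
sample = "chr1\t248956422\nchr2\t242193529\n"
buffer = io.StringIO()
write_chromsizes(parse_chrom_sizes(sample), buffer)
print(buffer.getvalue())
```

Note that the UCSC file also lists unplaced and alternative contigs, whereas the log in this issue shows the pipeline keeping only assembled chromosomes, so filtering the rows may be needed.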

All the best,

Seppe

@gilgolan73
Author

Hello @SeppeDeWinter, indeed this solution (changing bc_transform_func) solved the issue.
For the chromsizes issue, changing data_wrangling/gene_search_space.py as mentioned in #357 solved it.

However, I now encounter another issue when running Snakemake. It is related to the DARs; I checked, and none of the DAR region set BED files are empty (as suggested in #183).
Thank you for the help.
Gil

The error is attached:
"(scenicplus) [gilgolan@localhost Snakemake]$ snakemake --cores 20
Assuming unrestricted shared filesystem usage for local execution.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 20
Rules claiming more threads will be scaled down.
Job stats:
job                            count
---------------------------  -------
AUCell_direct                      1
AUCell_extended                    1
all                                1
download_genome_annotations        1
eGRN_direct                        1
eGRN_extended                      1
get_search_space                   1
motif_enrichment_cistarget         1
motif_enrichment_dem               1
prepare_GEX_ACC_multiome           1
prepare_menr                       1
region_to_gene                     1
scplus_mudata                      1
tf_to_gene                         1
total                             14

Select jobs to execute...
Execute 2 jobs...

[Tue Oct 8 15:58:49 2024]
localrule prepare_GEX_ACC_multiome:
input: /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/outs/cistopic_obj.pkl, /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/scRNAseq/adata.h5ad
output: ACC_GEX.h5mu
jobid: 2
reason: Missing output files: ACC_GEX.h5mu
resources: tmpdir=/tmp

[Tue Oct 8 15:58:49 2024]
localrule download_genome_annotations:
output: genome_annotation.tsv, chromsizes.tsv
jobid: 8
reason: Missing output files: chromsizes.tsv, genome_annotation.tsv
resources: tmpdir=/tmp

2024-10-08 15:58:51,599 SCENIC+ INFO Reading cisTopic object.
2024-10-08 15:58:51,975 SCENIC+ INFO Reading gene expression AnnData.
2024-10-08 15:58:52,056 Ingesting multiome data INFO Found 1963 multiome cells.
2024-10-08 15:58:52,196 cisTopic INFO Imputing region accessibility
2024-10-08 15:58:52,196 cisTopic INFO Impute region accessibility for regions 0-20000
2024-10-08 15:58:52,402 cisTopic INFO Impute region accessibility for regions 20000-40000
2024-10-08 15:58:52,603 cisTopic INFO Impute region accessibility for regions 40000-60000
2024-10-08 15:58:52,804 cisTopic INFO Impute region accessibility for regions 60000-80000
2024-10-08 15:58:52,994 cisTopic INFO Impute region accessibility for regions 80000-100000
2024-10-08 15:58:53,185 cisTopic INFO Impute region accessibility for regions 100000-120000
2024-10-08 15:58:53,384 cisTopic INFO Impute region accessibility for regions 120000-140000
2024-10-08 15:58:53,577 cisTopic INFO Impute region accessibility for regions 140000-160000
2024-10-08 15:58:53,777 cisTopic INFO Impute region accessibility for regions 160000-180000
2024-10-08 15:58:53,966 cisTopic INFO Impute region accessibility for regions 180000-200000
2024-10-08 15:58:54,160 cisTopic INFO Impute region accessibility for regions 200000-220000
2024-10-08 15:58:54,367 cisTopic INFO Impute region accessibility for regions 220000-240000
2024-10-08 15:58:54,587 cisTopic INFO Impute region accessibility for regions 240000-260000
2024-10-08 15:58:54,793 cisTopic INFO Impute region accessibility for regions 260000-280000
2024-10-08 15:58:55,005 cisTopic INFO Impute region accessibility for regions 280000-300000
2024-10-08 15:58:55,196 cisTopic INFO Impute region accessibility for regions 300000-320000
2024-10-08 15:58:55,385 cisTopic INFO Impute region accessibility for regions 320000-340000
2024-10-08 15:58:55,593 cisTopic INFO Impute region accessibility for regions 340000-360000
2024-10-08 15:58:55,808 cisTopic INFO Impute region accessibility for regions 360000-380000
2024-10-08 15:58:56,007 cisTopic INFO Impute region accessibility for regions 380000-400000
2024-10-08 15:58:56,196 cisTopic INFO Impute region accessibility for regions 400000-420000
2024-10-08 15:58:56,398 cisTopic INFO Impute region accessibility for regions 420000-440000
2024-10-08 15:58:56,561 cisTopic INFO Done!
... storing 'sample_id' as categorical
... storing 'VSN_cell_type' as categorical
... storing 'VSN_leiden_res0.3' as categorical
... storing 'VSN_leiden_res0.6' as categorical
... storing 'VSN_leiden_res0.9' as categorical
... storing 'VSN_leiden_res1.2' as categorical
... storing 'VSN_sample_id' as categorical
... storing 'Seurat_leiden_res0.6' as categorical
... storing 'Seurat_leiden_res1.2' as categorical
... storing 'Seurat_cell_type' as categorical
... storing 'Chromosome' as categorical
[Tue Oct 8 15:59:03 2024]
Finished job 2.
1 of 14 steps (7%) done
Select jobs to execute...
2024-10-08 15:59:18,979 Download gene annotation INFO Using genome: GRCh38.p14
2024-10-08 15:59:18,982 Download gene annotation INFO Found corresponding genome Id 51 on NCBI
2024-10-08 15:59:19,487 Download gene annotation INFO Found corresponding assembly Id 11968211 on NCBI
2024-10-08 15:59:19,991 Download gene annotation INFO Downloading assembly information from: http://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.40_GRCh38.p14/GCF_000001405.40_GRCh38.p14_assembly_report.txt
2024-10-08 15:59:21,279 Download gene annotation INFO Found following assembled molecules (chromosomes):
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
X
Y
MT
2024-10-08 15:59:21,289 Download gene annotation INFO Converting chromosomes names to UCSC style as follows:
Original UCSC
1 chr1
2 chr2
3 chr3
4 chr4
5 chr5
6 chr6
7 chr7
8 chr8
9 chr9
10 chr10
11 chr11
12 chr12
13 chr13
14 chr14
15 chr15
16 chr16
17 chr17
18 chr18
19 chr19
20 chr20
21 chr21
22 chr22
X chrX
Y chrY
MT chrM
2024-10-08 15:59:21,297 SCENIC+ INFO Saving chromosome sizes to: chromsizes.tsv
2024-10-08 15:59:21,298 SCENIC+ INFO Saving genome annotation to: genome_annotation.tsv
[Tue Oct 8 15:59:21 2024]
Finished job 8.
2 of 14 steps (14%) done
Execute 1 jobs...

[Tue Oct 8 15:59:21 2024]
localrule get_search_space:
input: ACC_GEX.h5mu, genome_annotation.tsv, chromsizes.tsv
output: search_space.tsv
jobid: 11
reason: Missing output files: search_space.tsv; Input files updated by another job: ACC_GEX.h5mu, chromsizes.tsv, genome_annotation.tsv
resources: tmpdir=/tmp

2024-10-08 15:59:23,995 SCENIC+ INFO Reading data
/home/gilgolan/.local/lib/python3.11/site-packages/anndata/_core/anndata.py:522: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
warnings.warn(
/home/gilgolan/.local/lib/python3.11/site-packages/anndata/_core/anndata.py:522: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
warnings.warn(
2024-10-08 15:59:26,016 Get search space INFO Extending promoter annotation to 10 bp upstream and 10 downstream
2024-10-08 15:59:26,116 Get search space INFO Extending search space to:
150000 bp downstream of the end of the gene.
150000 bp upstream of the start of the gene.
2024-10-08 15:59:26,516 Get search space INFO Intersecting with regions.
2024-10-08 15:59:27,792 Get search space INFO Calculating distances from region to gene
2024-10-08 16:00:09,179 Get search space INFO Imploding multiple entries per region and gene
2024-10-08 16:01:45,857 SCENIC+ INFO Writing search space to: search_space.tsv
[Tue Oct 8 16:01:47 2024]
Finished job 11.
3 of 14 steps (21%) done
Select jobs to execute...
Execute 1 jobs...

[Tue Oct 8 16:01:47 2024]
localrule region_to_gene:
input: ACC_GEX.h5mu, search_space.tsv
output: region_to_gene_adj.tsv
jobid: 10
reason: Missing output files: region_to_gene_adj.tsv; Input files updated by another job: ACC_GEX.h5mu, search_space.tsv
threads: 20
resources: tmpdir=/tmp

2024-10-08 16:01:51,690 SCENIC+ INFO Reading multiome MuData.
/home/gilgolan/.local/lib/python3.11/site-packages/anndata/_core/anndata.py:522: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
warnings.warn(
/home/gilgolan/.local/lib/python3.11/site-packages/anndata/_core/anndata.py:522: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
warnings.warn(
2024-10-08 16:01:53,068 SCENIC+ INFO Reading search space
2024-10-08 16:01:53,646 R2G INFO Calculating region to gene importances, using GBM method
Running using 20 cores: 100%|██████████| 18565/18565 [19:06<00:00, 16.19it/s]
2024-10-08 16:21:06,511 R2G INFO Calculating region to gene correlation, using SR method
Running using 20 cores: 0%| | 0/18565 [00:00<?, ?it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 2%|█ | 280/18565 [00:00<01:05, 278.20it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 2%|█ | 313/18565 [00:00<01:03, 285.64it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 2%|█ | 360/18565 [00:01<01:19, 229.80it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 2%|█▉ | 386/18565 [00:01<01:19, 228.68it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 3%|██ | 508/18565 [00:02<01:59, 150.90it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 4%|███ | 715/18565 [00:04<03:07, 95.32it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 4%|███▉ | 783/18565 [00:05<02:20, 126.41it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 5%|████ | 838/18565 [00:05<01:43, 170.62it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 5%|████ | 878/18565 [00:06<03:39, 80.51it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 5%|████ | 900/18565 [00:06<02:59, 98.28it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 100%|██████████| 18565/18565 [02:44<00:00, 113.04it/s]
2024-10-08 16:24:01,456 R2G INFO Done!
2024-10-08 16:24:01,568 SCENIC+ INFO Saving region to gene adjacencies to region_to_gene_adj.tsv
[Tue Oct 8 16:24:07 2024]
Finished job 10.
4 of 14 steps (29%) done
Select jobs to execute...
Execute 1 jobs...

[Tue Oct 8 16:24:07 2024]
localrule motif_enrichment_dem:
input: /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/outs/region_sets, /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/cistarget_db/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.scores.feather, genome_annotation.tsv, /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl
output: dem_results.hdf5, dem_results.html
jobid: 7
reason: Missing output files: dem_results.hdf5; Input files updated by another job: genome_annotation.tsv
threads: 20
resources: tmpdir=/tmp

2024-10-08 16:24:11,789 SCENIC+ INFO Reading region sets from: /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/outs/region_sets
2024-10-08 16:24:11,789 SCENIC+ INFO Reading all .bed files in: Topics_otsu
2024-10-08 16:24:12,109 SCENIC+ INFO Reading all .bed files in: Topics_top_3k
2024-10-08 16:24:12,194 SCENIC+ INFO Reading all .bed files in: DARs_cell_type
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/gilgolan/.local/lib/python3.11/site-packages/joblib/externals/loky/process_executor.py", line 463, in _process_worker
    r = call_item()
        ^^^^^^^^^^^
  File "/home/gilgolan/.local/lib/python3.11/site-packages/joblib/externals/loky/process_executor.py", line 291, in __call__
    return self.fn(*self.args, **self.kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gilgolan/.local/lib/python3.11/site-packages/joblib/parallel.py", line 589, in __call__
    return [func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gilgolan/.local/lib/python3.11/site-packages/joblib/parallel.py", line 589, in <listcomp>
    return [func(*args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/cli/commands.py", line 320, in _run_dem_single_region_set
    dem_db = DEMDatabase(
             ^^^^^^^^^^^^
  File "/home/gilgolan/.local/lib/python3.11/site-packages/pycistarget/motif_enrichment_dem.py", line 147, in __init__
    self.db_regions = pr.PyRanges(region_names_to_coordinates(list(self.genes)))
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gilgolan/.local/lib/python3.11/site-packages/pycistarget/utils.py", line 35, in region_names_to_coordinates
    regiondf.columns=['Chromosome', 'Start', 'End']
    ^^^^^^^^^^^^^^^^
  File "/home/gilgolan/.local/lib/python3.11/site-packages/pandas/core/generic.py", line 5920, in __setattr__
    return object.__setattr__(self, name, value)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pandas/_libs/properties.pyx", line 69, in pandas._libs.properties.AxisProperty.__set__
  File "/home/gilgolan/.local/lib/python3.11/site-packages/pandas/core/generic.py", line 822, in _set_axis
    self._mgr.set_axis(axis, labels)
  File "/home/gilgolan/.local/lib/python3.11/site-packages/pandas/core/internals/managers.py", line 228, in set_axis
    self._validate_set_axis(axis, new_labels)
  File "/home/gilgolan/.local/lib/python3.11/site-packages/pandas/core/internals/base.py", line 70, in _validate_set_axis
    raise ValueError(
ValueError: Length mismatch: Expected axis has 0 elements, new values have 3 elements
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/gilgolan/.local/bin/scenicplus", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/cli/scenicplus.py", line 1137, in main
    args.func(args)
  File "/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/cli/scenicplus.py", line 588, in motif_enrichment_dem
    run_motif_enrichment_dem(
  File "/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/cli/commands.py", line 466, in run_motif_enrichment_dem
    dem_results: List[DEM] = joblib.Parallel(
                             ^^^^^^^^^^^^^^^^
  File "/home/gilgolan/.local/lib/python3.11/site-packages/joblib/parallel.py", line 1952, in __call__
    return output if self.return_generator else list(output)
                                                ^^^^^^^^^^^^
  File "/home/gilgolan/.local/lib/python3.11/site-packages/joblib/parallel.py", line 1595, in _get_outputs
    yield from self._retrieve()
  File "/home/gilgolan/.local/lib/python3.11/site-packages/joblib/parallel.py", line 1699, in _retrieve
    self._raise_error_fast()
  File "/home/gilgolan/.local/lib/python3.11/site-packages/joblib/parallel.py", line 1734, in _raise_error_fast
    error_job.get_result(self.timeout)
  File "/home/gilgolan/.local/lib/python3.11/site-packages/joblib/parallel.py", line 736, in get_result
    return self._return_or_raise()
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gilgolan/.local/lib/python3.11/site-packages/joblib/parallel.py", line 754, in _return_or_raise
    raise self._result
ValueError: Length mismatch: Expected axis has 0 elements, new values have 3 elements
[Tue Oct 8 16:24:31 2024]
Error in rule motif_enrichment_dem:
jobid: 7
input: /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/outs/region_sets, /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/cistarget_db/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.scores.feather, genome_annotation.tsv, /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl
output: dem_results.hdf5, dem_results.html
shell:

            scenicplus grn_inference motif_enrichment_dem \
                --region_set_folder /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/outs/region_sets \
                --dem_db_fname /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/cistarget_db/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.scores.feather \
                --output_fname_dem_result dem_results.hdf5 \
                --temp_dir /tmp \
                --species homo_sapiens \
                --fraction_overlap_w_dem_database 0.4 \
                --max_bg_regions 500 \
                --genome_annotation genome_annotation.tsv \
                --balance_number_of_promoters \
                --promoter_space 1000 \
                --adjpval_thr 0.05 \
                --log2fc_thr 1.0 \
                --mean_fg_thr 0.0 \
                --motif_hit_thr 3.0 \
                --path_to_motif_annotations /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl \
                --annotation_version v10nr_clust \
                --motif_similarity_fdr 0.001 \
                --orthologous_identity_threshold 0.0 \
                --annotations_to_use Direct_annot Orthology_annot \
                --write_html \
                --output_fname_dem_html dem_results.html \
                --seed 666 \
                --n_cpu 20
        
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-10-08T155849.153427.snakemake.log
WorkflowError:
At least one job did not complete successfully."

@kennethho04

Hi @gilgolan73

It seems you ran into the same problem as in issue #432; rolling your Python version back from 3.11.9 to 3.11.8 should resolve the error.

@gilgolan73
Author

gilgolan73 commented Oct 10, 2024

Hello @kennethho04, thank you for the quick reply.
How do you suggest reverting the Python version? Do I need to re-install the packages?

Gil

@kennethho04

@gilgolan73 doing conda install python==3.11.8 in your conda env should suffice. You don't need to re-install the packages.

@gilgolan73
Author

Hello @kennethho04, I tried it (reverting to Python 3.11.8 as suggested), but I am still getting the same error (attached). I'm also attaching the list of installed packages.

Thank you,
Gil
error 141024_python_3.11.8.txt
list packages 141024.txt

@yojetsharma

It looks like an issue with parallel processing. Have you tried installing/downgrading dask?

@gilgolan73
Author

Hi @yojetsharma @kennethho04 , I am using Dask version 2024.5.0 with python 3.11.8.
Which version do you recommend I use? (I have a CentOS 9 Linux machine; it is a virtual machine on OracleVM.)

Thank you,
Gil

@yojetsharma

I also faced an issue with parallel processing, and, as someone suggested in one of the other issues, downgrading Dask to 2024.5.0 helped. But it looks like you are already using that version.
The Python version I am using is 3.11.10.

@gilgolan73
Author

@yojetsharma @kennethho04 I tried Python 3.11.8 with both Dask 2024.2.1 and 2024.5.0.
I am still receiving the same error. Do you think I need to install Python 3.11.10?

Thanks

@yojetsharma

Does reducing the number of cores help?
Also, do the region_sets and genome_annotation.tsv look fine?
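
One quick way to check the region sets is to scan the folder for empty BED files. The sketch below is a minimal example using only the standard library. The helper name find_empty_bed_files is hypothetical, the folder layout (subfolders such as DARs_cell_type containing .bed files) is taken from the log above, and the coordinates in the demo file are made up.

```python
from pathlib import Path
import tempfile


def find_empty_bed_files(region_set_folder):
    """Return relative paths of .bed files that are empty or contain only whitespace."""
    folder = Path(region_set_folder)
    return sorted(
        path.relative_to(folder).as_posix()
        for path in folder.rglob("*.bed")
        if path.stat().st_size == 0 or not path.read_text().strip()
    )


# Self-contained demo on a temporary folder that mimics the region_sets layout.
with tempfile.TemporaryDirectory() as tmp:
    dars = Path(tmp) / "DARs_cell_type"
    dars.mkdir()
    (dars / "AST.bed").write_text("chr1\t100\t600\n")  # made-up region
    (dars / "BG.bed").write_text("")  # deliberately empty
    empty = find_empty_bed_files(tmp)

print(empty)
```

On real data, the function would be pointed at the region_sets folder passed to the pipeline; an empty list means every BED file has at least one non-blank line.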

@gilgolan73
Author

@yojetsharma Hi, I tried reducing the number of cores to 10 and to 1. It still doesn't help.
The files look OK; I'm attaching them.
genome_annotation.txt
PURK.txt
OPC.txt
NFOL.txt
MOL.txt
MGL.txt
MG.txt
INH_VIP.txt
INH_SST.txt
INH_SNCG.txt
INH_PVALB.txt
GP.txt
GC.txt
ENDO.txt
COP.txt
BG.txt
AST.txt

Gil

@yojetsharma

My last resort would be to try reinstalling the conda env and see if that fixes the issue.

@gilgolan73
Author

Hi @yojetsharma @kennethho04
I tried reinstalling the conda env (with both Python 3.11.10 and Python 3.11.8), but it still did not resolve the issue. Do you have any other suggestions?

Thank you,
Gil

@brianysoong

@gilgolan73 Did you end up finding a way to resolve your issue? I think I am experiencing a similar issue, where the cisTarget motif enrichment works fine but DEM does not identify any motifs.

@gilgolan73
Author

Hi @brianysoong, unfortunately I am still stuck on this issue. I tried many different Dask and Python versions, but the same issue persists. Do you have any suggestions?
@yojetsharma @kennethho04

Gil

@brianysoong

@gilgolan73 In my case, it ended up being a dumb mistake where I used human genome / annotations for mouse data!

@gilgolan73
Author

Hi @brianysoong, can you please elaborate on exactly which file was wrong, and where you downloaded the correct file from?

Thank you,
Gil

@gilgolan73
Author

Hi,
I finally managed to solve the issue by downloading the cisTarget databases (rankings and scores) from this location:
https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/
(according to this issue - #231).
@SeppeDeWinter @yojetsharma @kennethho04 can you please confirm that this is the right location to download these files from?

Thanks,
Gil
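
For anyone who lands here with the same "Length mismatch: Expected axis has 0 elements" error: the database in the failing command had genes_vs_motifs in its file name, while the working one comes from the region_based folder. A plausible reading of the traceback (an assumption, not something confirmed in this thread) is that the feature names in the gene-based file could not be parsed as chr:start-end coordinates. The sketch below shows a simple name check; the helper fraction_region_like and both example name lists are hypothetical, and reading the names out of the feather file is left out.

```python
import re

# Matches names such as "chr1:10000-10500" (chromosome:start-end).
REGION_NAME = re.compile(r"^[^:\s]+:\d+-\d+$")


def fraction_region_like(feature_names):
    """Return the fraction of names that look like chr:start-end regions."""
    names = list(feature_names)
    if not names:
        return 0.0
    return sum(bool(REGION_NAME.match(name)) for name in names) / len(names)


# Hypothetical feature names, for illustration only.
gene_based = ["TP53", "GAPDH", "SOX2"]
region_based = ["chr1:10000-10500", "chr2:20000-20500"]

gene_fraction = fraction_region_like(gene_based)
region_fraction = fraction_region_like(region_based)
print(gene_fraction, region_fraction)
```

A fraction near zero for the database's feature names would suggest that a gene-based database was downloaded where a region-based one is expected.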

@SeppeDeWinter
Collaborator

Hi @gilgolan73

That's the correct location to download the database from.

S

@gilgolan73
Author

Thank you @SeppeDeWinter .
I believe it would be helpful for other users if this were mentioned in the tutorial (so they do not make the same mistake I did).
