Error when running SnakeMake #468

gilgolan73 · 2024-09-17T17:53:02Z

Describe the bug
Hello, I'm trying to run SCENIC+ using Snakemake on a Linux machine (CentOS 9), on the tutorial dataset.
I ran the scATAC-seq preprocessing in Python (using pycisTopic, following the tutorial: https://pycistopic.readthedocs.io/en/latest/notebooks/human_cerebellum.html).
Then I ran the scRNA-seq preprocessing in Python (following the tutorial: https://scenicplus.readthedocs.io/en/latest/human_cerebellum_scRNA_pp.html#Preprocessing-the-scRNA-seq-data).
I'm using the default config.yaml file for Snakemake; I only changed the location of the input data.
1-2 minutes after starting Snakemake (as in the tutorial: https://scenicplus.readthedocs.io/en/latest/human_cerebellum.html#Running-SCENIC+), I receive an error (see below), which I believe is related to the cell names differing between the scATAC-seq and scRNA-seq datasets. Please let me know how to solve this issue. Thank you!

To Reproduce
"snakemake --cores 20"

Error output
"Assuming unrestricted shared filesystem usage for local execution.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 20
Rules claiming more threads will be scaled down.
Job stats:
job count

AUCell_direct 1
AUCell_extended 1
all 1
download_genome_annotations 1
eGRN_direct 1
eGRN_extended 1
get_search_space 1
motif_enrichment_cistarget 1
motif_enrichment_dem 1
prepare_GEX_ACC_multiome 1
prepare_menr 1
region_to_gene 1
scplus_mudata 1
tf_to_gene 1
total 14

Select jobs to execute...
Execute 2 jobs...

[Tue Sep 17 16:12:50 2024]
localrule prepare_GEX_ACC_multiome:
input: /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/outs/cistopic_obj.pkl, /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/scRNAseq/adata.h5ad
output: ACC_GEX.h5mu
jobid: 2
reason: Missing output files: ACC_GEX.h5mu
resources: tmpdir=/tmp

[Tue Sep 17 16:12:50 2024]
localrule download_genome_annotations:
output: genome_annotation.tsv, chromsizes.tsv
jobid: 8
reason: Missing output files: chromsizes.tsv, genome_annotation.tsv
resources: tmpdir=/tmp

2024-09-17 16:12:52,879 SCENIC+ INFO Reading cisTopic object.
2024-09-17 16:12:53,289 SCENIC+ INFO Reading gene expression AnnData.
Traceback (most recent call last):
File "/home/gilgolan/.local/bin/scenicplus", line 8, in
sys.exit(main())
^^^^^^
File "/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/cli/scenicplus.py", line 1137, in main
args.func(args)
File "/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/cli/scenicplus.py", line 46, in command_prepare_GEX_ACC
prepare_GEX_ACC(
File "/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/cli/commands.py", line 96, in prepare_GEX_ACC
mdata = process_multiome_data(
^^^^^^^^^^^^^^^^^^^^^^
File "/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/data_wrangling/adata_cistopic_wrangling.py", line 73, in process_multiome_data
raise Exception(
Exception: No cells found which are present in both assays, check input and consider using bc_transform_func!
[Tue Sep 17 16:12:53 2024]
Error in rule prepare_GEX_ACC_multiome:
jobid: 2
input: /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/outs/cistopic_obj.pkl, /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/scRNAseq/adata.h5ad
output: ACC_GEX.h5mu
shell:

        scenicplus prepare_data prepare_GEX_ACC                 --cisTopic_obj_fname /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/outs/cistopic_obj.pkl                 --GEX_anndata_fname /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/scRNAseq/adata.h5ad                 --out_file ACC_GEX.h5mu                 --bc_transform_func "lambda x: f'{x}'"
        
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

2024-09-17 16:13:38,876 Download gene annotation INFO Using genome: GRCh38.p14
2024-09-17 16:13:38,878 Download gene annotation INFO Found corresponding genome Id 51 on NCBI
2024-09-17 16:13:39,381 Download gene annotation INFO Found corresponding assembly Id 11968211 on NCBI
2024-09-17 16:13:39,884 Download gene annotation INFO Downloading assembly information from: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.40_GRCh38.p14/GCF_000001405.40_GRCh38.p14_assembly_report.txt

Unhandeled exception occured
<urlopen error [Errno 110] Connection timed out>
Returning gene annotation without subestting for assembled chromosomesand converting to UCSC style. Please make sure that the chromosome namesin the returned object match with the chromosome names in the scplus_obj.Chromosome sizes will not be returned
2024-09-17 16:15:50,496 SCENIC+ INFO Chrosomome sizes was not found, please provide this information manually.
2024-09-17 16:15:50,497 SCENIC+ INFO Saving genome annotation to: genome_annotation.tsv
Waiting at most 5 seconds for missing files.
MissingOutputException in rule download_genome_annotations in file /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/scplus_pipeline/Snakemake/workflow/Snakefile, line 221:
Job 8 completed successfully, but some output files are missing. Missing files after 5 seconds. This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait:
chromsizes.tsv
Removing output files of failed job download_genome_annotations since they might be corrupted:
genome_annotation.tsv
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-09-17T161250.453181.snakemake.log
WorkflowError:
At least one job did not complete successfully."

Expected behavior
I expect Snakemake to run successfully on the pre-processed tutorial dataset.

Version (please complete the following information):
Python: 3.11.9
SCENIC+: 1.0a1
pyscenic 0.12.1+8.gd2309fe

Additional context
The cell names in the adata object (scRNA-seq) and the cisTopic object (scATAC-seq) are different, and the two objects also contain different numbers of cells:
"adata.obs
Out[216]:
VSN_cell_type ... pct_counts_mt
CCCTCATAGACACTTA-1 GC ... 0.083148
GCCATTACACCTGCCT-1 ASTP ... 0.132626
ATTGCAGGTTGTGACA-1 MGL ... 1.447368
CTGTTGGAGGCATTAC-1 MOL_B ... 0.095579
CGAATCTAGCTTAGCG-1 MOL_B ... 0.061851
... ... ... ...
TACCTTAGTTACTAGG-1 MOL_B ... 0.058480
GTAGCTGTCATTACAG-1 AST_CER ... 0.188088
AGGCAGGTCGCGACAC-1 MOL_A ... 0.044703
GCATTGCCAAGACTCC-1 MOL_B ... 0.145530
ACATTAGTCCGCAAGC-1 AST_CER ... 0.068552
[2313 rows x 13 columns]

cistopic_obj.cell_data
Out[218]:
cisTopic_nr_frag ... pycisTopic_cca_Seurat_cell_type
CACCTCAGTTGTAAAC-1-10x_multiome_brain 18300 ... AST
TGACTCCTCATCCACC-1-10x_multiome_brain 100055 ... BG
TTTCTCACATAAACCT-1-10x_multiome_brain 32192 ... GP
GTCCTCCCACACAATT-1-10x_multiome_brain 88443 ... BG
CTCCGTCCAGTTTGTG-1-10x_multiome_brain 131110 ... ENDO
... ... ... ...
GCAGGTTGTCCAAATG-1-10x_multiome_brain 2770 ... MOL
AAGCTCCCAGCACCAT-1-10x_multiome_brain 2180 ... MOL
CAGAATCTCCTCATGC-1-10x_multiome_brain 1744 ... MOL
TAGCCGGGTAACAGGG-1-10x_multiome_brain 2674 ... INH_SNCG
GTGCGCAGTGCTTAGA-1-10x_multiome_brain 5859 ... GP
[2845 rows x 42 columns]"

ruicatxiao · 2024-10-01T18:35:06Z

I'm running into this exact error as well. Curious to see whether there is a fix.

kennethho04 · 2024-10-05T21:13:46Z

I got the same error in rule prepare_GEX_ACC_multiome and was able to resolve it by setting bc_transform_func under params_data_preparation in config.yaml to "\"lambda x: f'{x}-10x_multiome_brain'\"". You can see this on the tutorial page, where the config.yaml for the pipeline is shown.

I'm not sure about the download_genome_annotations error, though. I ran into a similar problem with download_genome_annotations while going through the tutorial and have not been able to resolve it yet.
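
For the barcode fix described above, the effect of bc_transform_func can be checked outside the pipeline. The sketch below is an illustration only: it assumes the function is applied to each scRNA-seq barcode before it is compared with the scATAC-seq barcodes (which is what the fix implies), and the helper count_shared and the scATAC-seq barcode set are made up for the example rather than taken from SCENIC+ or from the real objects.

```python
# Barcodes in the two layouts shown in this issue. The scATAC-seq set is
# hypothetical: it is built so that two of the three cells are shared.
rna_barcodes = [
    "CCCTCATAGACACTTA-1",
    "GCCATTACACCTGCCT-1",
    "ATTGCAGGTTGTGACA-1",
]
atac_barcodes = {
    "CCCTCATAGACACTTA-1-10x_multiome_brain",
    "GCCATTACACCTGCCT-1-10x_multiome_brain",
    "CACCTCAGTTGTAAAC-1-10x_multiome_brain",
}


def count_shared(rna, atac, bc_transform_func):
    """Count RNA barcodes that match an ATAC barcode after transformation."""
    transformed = {bc_transform_func(barcode) for barcode in rna}
    return len(transformed & set(atac))


# Default transform from the failing command: barcodes are left unchanged.
identity = lambda x: f"{x}"
# Transform from the fix: append the sample suffix used by the cisTopic object.
add_suffix = lambda x: f"{x}-10x_multiome_brain"

shared_default = count_shared(rna_barcodes, atac_barcodes, identity)
shared_fixed = count_shared(rna_barcodes, atac_barcodes, add_suffix)
print(shared_default, shared_fixed)
```

With the identity transform no barcode matches, which corresponds to the "No cells found which are present in both assays" exception; with the suffix appended, the shared cells are found.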

SeppeDeWinter · 2024-10-09T10:04:37Z

Hi @ruicatxiao and @gilgolan73

Regarding the barcode error, do indeed use bc_transform_func as mentioned by @kennethho04 (let me know if you need help with this).

Regarding the chromsizes issue, it looks like the pipeline was not able to download these files automatically. You can download them from: https://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/hg38.chrom.sizes

All the best,

Seppe
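
If the automatic download fails, the UCSC file linked above can be converted by hand. The following is a rough sketch, not the pipeline's own code: it assumes that chromsizes.tsv is a tab-separated table with Chromosome, Start and End columns (worth checking against a chromsizes.tsv from a successful run), and the helper names are hypothetical. It parses the two-column chrom.sizes layout (name, length) and keeps only the assembled chromosomes.

```python
import csv
import io


def parse_chrom_sizes(text):
    """Parse chrom.sizes text into (chromosome, start, end) rows."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, size = line.split()
        # Skip unplaced or alternate contigs, whose names contain "_".
        if "_" in name:
            continue
        rows.append((name, 0, int(size)))
    return rows


def rows_to_tsv(rows):
    """Serialise rows as TSV text with an assumed three-column header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["Chromosome", "Start", "End"])  # assumed column names
    writer.writerows(rows)
    return buffer.getvalue()


# A short excerpt in the chrom.sizes layout, used here for illustration.
example = "chr1\t248956422\nchr1_KI270706v1_random\t175055\nchrM\t16569\n"
rows = parse_chrom_sizes(example)
tsv_text = rows_to_tsv(rows)
print(tsv_text)
```

In practice the text would come from the downloaded hg38.chrom.sizes file, and the result would be written to the chromsizes.tsv path that the Snakemake rule expects.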

gilgolan73 · 2024-10-09T11:40:31Z

Hello @SeppeDeWinter, indeed this solution (changing bc_transform_func) solved the issue.
For the chromsizes issue, changing data_wrangling/gene_search_space.py as mentioned in #357 solved it.

However, I now encounter another issue when running Snakemake. It appears to be related to the DARs; I checked, and none of the DAR region set BED files is empty (as suggested in #183).
Thank you for the help.
Gil

The error output is below:
"(scenicplus) [gilgolan@localhost Snakemake]$ snakemake --cores 20
Assuming unrestricted shared filesystem usage for local execution.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 20
Rules claiming more threads will be scaled down.
Job stats:
job count

AUCell_direct 1
AUCell_extended 1
all 1
download_genome_annotations 1
eGRN_direct 1
eGRN_extended 1
get_search_space 1
motif_enrichment_cistarget 1
motif_enrichment_dem 1
prepare_GEX_ACC_multiome 1
prepare_menr 1
region_to_gene 1
scplus_mudata 1
tf_to_gene 1
total 14

Select jobs to execute...
Execute 2 jobs...

[Tue Oct 8 15:58:49 2024]
localrule prepare_GEX_ACC_multiome:
input: /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/outs/cistopic_obj.pkl, /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/scRNAseq/adata.h5ad
output: ACC_GEX.h5mu
jobid: 2
reason: Missing output files: ACC_GEX.h5mu
resources: tmpdir=/tmp

[Tue Oct 8 15:58:49 2024]
localrule download_genome_annotations:
output: genome_annotation.tsv, chromsizes.tsv
jobid: 8
reason: Missing output files: chromsizes.tsv, genome_annotation.tsv
resources: tmpdir=/tmp

2024-10-08 15:58:51,599 SCENIC+ INFO Reading cisTopic object.
2024-10-08 15:58:51,975 SCENIC+ INFO Reading gene expression AnnData.
2024-10-08 15:58:52,056 Ingesting multiome data INFO Found 1963 multiome cells.
2024-10-08 15:58:52,196 cisTopic INFO Imputing region accessibility
2024-10-08 15:58:52,196 cisTopic INFO Impute region accessibility for regions 0-20000
2024-10-08 15:58:52,402 cisTopic INFO Impute region accessibility for regions 20000-40000
2024-10-08 15:58:52,603 cisTopic INFO Impute region accessibility for regions 40000-60000
2024-10-08 15:58:52,804 cisTopic INFO Impute region accessibility for regions 60000-80000
2024-10-08 15:58:52,994 cisTopic INFO Impute region accessibility for regions 80000-100000
2024-10-08 15:58:53,185 cisTopic INFO Impute region accessibility for regions 100000-120000
2024-10-08 15:58:53,384 cisTopic INFO Impute region accessibility for regions 120000-140000
2024-10-08 15:58:53,577 cisTopic INFO Impute region accessibility for regions 140000-160000
2024-10-08 15:58:53,777 cisTopic INFO Impute region accessibility for regions 160000-180000
2024-10-08 15:58:53,966 cisTopic INFO Impute region accessibility for regions 180000-200000
2024-10-08 15:58:54,160 cisTopic INFO Impute region accessibility for regions 200000-220000
2024-10-08 15:58:54,367 cisTopic INFO Impute region accessibility for regions 220000-240000
2024-10-08 15:58:54,587 cisTopic INFO Impute region accessibility for regions 240000-260000
2024-10-08 15:58:54,793 cisTopic INFO Impute region accessibility for regions 260000-280000
2024-10-08 15:58:55,005 cisTopic INFO Impute region accessibility for regions 280000-300000
2024-10-08 15:58:55,196 cisTopic INFO Impute region accessibility for regions 300000-320000
2024-10-08 15:58:55,385 cisTopic INFO Impute region accessibility for regions 320000-340000
2024-10-08 15:58:55,593 cisTopic INFO Impute region accessibility for regions 340000-360000
2024-10-08 15:58:55,808 cisTopic INFO Impute region accessibility for regions 360000-380000
2024-10-08 15:58:56,007 cisTopic INFO Impute region accessibility for regions 380000-400000
2024-10-08 15:58:56,196 cisTopic INFO Impute region accessibility for regions 400000-420000
2024-10-08 15:58:56,398 cisTopic INFO Impute region accessibility for regions 420000-440000
2024-10-08 15:58:56,561 cisTopic INFO Done!
... storing 'sample_id' as categorical
... storing 'VSN_cell_type' as categorical
... storing 'VSN_leiden_res0.3' as categorical
... storing 'VSN_leiden_res0.6' as categorical
... storing 'VSN_leiden_res0.9' as categorical
... storing 'VSN_leiden_res1.2' as categorical
... storing 'VSN_sample_id' as categorical
... storing 'Seurat_leiden_res0.6' as categorical
... storing 'Seurat_leiden_res1.2' as categorical
... storing 'Seurat_cell_type' as categorical
... storing 'Chromosome' as categorical
[Tue Oct 8 15:59:03 2024]
Finished job 2.
1 of 14 steps (7%) done
Select jobs to execute...
2024-10-08 15:59:18,979 Download gene annotation INFO Using genome: GRCh38.p14
2024-10-08 15:59:18,982 Download gene annotation INFO Found corresponding genome Id 51 on NCBI
2024-10-08 15:59:19,487 Download gene annotation INFO Found corresponding assembly Id 11968211 on NCBI
2024-10-08 15:59:19,991 Download gene annotation INFO Downloading assembly information from: http://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.40_GRCh38.p14/GCF_000001405.40_GRCh38.p14_assembly_report.txt
2024-10-08 15:59:21,279 Download gene annotation INFO Found following assembled molecules (chromosomes):
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
X
Y
MT
2024-10-08 15:59:21,289 Download gene annotation INFO Converting chromosomes names to UCSC style as follows:
Original UCSC
1 chr1
2 chr2
3 chr3
4 chr4
5 chr5
6 chr6
7 chr7
8 chr8
9 chr9
10 chr10
11 chr11
12 chr12
13 chr13
14 chr14
15 chr15
16 chr16
17 chr17
18 chr18
19 chr19
20 chr20
21 chr21
22 chr22
X chrX
Y chrY
MT chrM
2024-10-08 15:59:21,297 SCENIC+ INFO Saving chromosome sizes to: chromsizes.tsv
2024-10-08 15:59:21,298 SCENIC+ INFO Saving genome annotation to: genome_annotation.tsv
[Tue Oct 8 15:59:21 2024]
Finished job 8.
2 of 14 steps (14%) done
Execute 1 jobs...

[Tue Oct 8 15:59:21 2024]
localrule get_search_space:
input: ACC_GEX.h5mu, genome_annotation.tsv, chromsizes.tsv
output: search_space.tsv
jobid: 11
reason: Missing output files: search_space.tsv; Input files updated by another job: ACC_GEX.h5mu, chromsizes.tsv, genome_annotation.tsv
resources: tmpdir=/tmp

2024-10-08 15:59:23,995 SCENIC+ INFO Reading data
/home/gilgolan/.local/lib/python3.11/site-packages/anndata/_core/anndata.py:522: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
warnings.warn(
/home/gilgolan/.local/lib/python3.11/site-packages/anndata/_core/anndata.py:522: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
warnings.warn(
2024-10-08 15:59:26,016 Get search space INFO Extending promoter annotation to 10 bp upstream and 10 downstream
2024-10-08 15:59:26,116 Get search space INFO Extending search space to:
150000 bp downstream of the end of the gene.
150000 bp upstream of the start of the gene.
2024-10-08 15:59:26,516 Get search space INFO Intersecting with regions.
2024-10-08 15:59:27,792 Get search space INFO Calculating distances from region to gene
2024-10-08 16:00:09,179 Get search space INFO Imploding multiple entries per region and gene
2024-10-08 16:01:45,857 SCENIC+ INFO Writing search space to: search_space.tsv
[Tue Oct 8 16:01:47 2024]
Finished job 11.
3 of 14 steps (21%) done
Select jobs to execute...
Execute 1 jobs...

[Tue Oct 8 16:01:47 2024]
localrule region_to_gene:
input: ACC_GEX.h5mu, search_space.tsv
output: region_to_gene_adj.tsv
jobid: 10
reason: Missing output files: region_to_gene_adj.tsv; Input files updated by another job: ACC_GEX.h5mu, search_space.tsv
threads: 20
resources: tmpdir=/tmp

2024-10-08 16:01:51,690 SCENIC+ INFO Reading multiome MuData.
/home/gilgolan/.local/lib/python3.11/site-packages/anndata/_core/anndata.py:522: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
warnings.warn(
/home/gilgolan/.local/lib/python3.11/site-packages/anndata/_core/anndata.py:522: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
warnings.warn(
2024-10-08 16:01:53,068 SCENIC+ INFO Reading search space
2024-10-08 16:01:53,646 R2G INFO Calculating region to gene importances, using GBM method
Running using 20 cores: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 18565/18565 [19:06<00:00, 16.19it/s]
2024-10-08 16:21:06,511 R2G INFO Calculating region to gene correlation, using SR method
Running using 20 cores: 0%| | 0/18565 [00:00<?, ?it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 2%|█ | 280/18565 [00:00<01:05, 278.20it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 2%|█ | 313/18565 [00:00<01:03, 285.64it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 2%|█ | 360/18565 [00:01<01:19, 229.80it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 2%|█▉ | 386/18565 [00:01<01:19, 228.68it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 3%|██ | 508/18565 [00:02<01:59, 150.90it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 4%|███ | 715/18565 [00:04<03:07, 95.32it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 4%|███▉ | 783/18565 [00:05<02:20, 126.41it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 5%|████ | 838/18565 [00:05<01:43, 170.62it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 5%|████ | 878/18565 [00:06<03:39, 80.51it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 5%|████ | 900/18565 [00:06<02:59, 98.28it/s]
/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/enhancer_to_gene.py:158: ConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
correlation_result = np.array([correlator(x, exp) for x in acc.T])
Running using 20 cores: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 18565/18565 [02:44<00:00, 113.04it/s]
2024-10-08 16:24:01,456 R2G INFO Done!
2024-10-08 16:24:01,568 SCENIC+ INFO Saving region to gene adjacencies to region_to_gene_adj.tsv
[Tue Oct 8 16:24:07 2024]
Finished job 10.
4 of 14 steps (29%) done
Select jobs to execute...
Execute 1 jobs...

[Tue Oct 8 16:24:07 2024]
localrule motif_enrichment_dem:
input: /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/outs/region_sets, /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/cistarget_db/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.scores.feather, genome_annotation.tsv, /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl
output: dem_results.hdf5, dem_results.html
jobid: 7
reason: Missing output files: dem_results.hdf5; Input files updated by another job: genome_annotation.tsv
threads: 20
resources: tmpdir=/tmp

2024-10-08 16:24:11,789 SCENIC+ INFO Reading region sets from: /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/outs/region_sets
2024-10-08 16:24:11,789 SCENIC+ INFO Reading all .bed files in: Topics_otsu
2024-10-08 16:24:12,109 SCENIC+ INFO Reading all .bed files in: Topics_top_3k
2024-10-08 16:24:12,194 SCENIC+ INFO Reading all .bed files in: DARs_cell_type
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/gilgolan/.local/lib/python3.11/site-packages/joblib/externals/loky/process_executor.py", line 463, in _process_worker
r = call_item()
^^^^^^^^^^^
File "/home/gilgolan/.local/lib/python3.11/site-packages/joblib/externals/loky/process_executor.py", line 291, in call
return self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/gilgolan/.local/lib/python3.11/site-packages/joblib/parallel.py", line 589, in call
return [func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File "/home/gilgolan/.local/lib/python3.11/site-packages/joblib/parallel.py", line 589, in
return [func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/cli/commands.py", line 320, in _run_dem_single_region_set
dem_db = DEMDatabase(
^^^^^^^^^^^^
File "/home/gilgolan/.local/lib/python3.11/site-packages/pycistarget/motif_enrichment_dem.py", line 147, in init
self.db_regions = pr.PyRanges(region_names_to_coordinates(list(self.genes)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/gilgolan/.local/lib/python3.11/site-packages/pycistarget/utils.py", line 35, in region_names_to_coordinates
regiondf.columns=['Chromosome', 'Start', 'End']
^^^^^^^^^^^^^^^^
File "/home/gilgolan/.local/lib/python3.11/site-packages/pandas/core/generic.py", line 5920, in setattr
return object.setattr(self, name, value)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "pandas/_libs/properties.pyx", line 69, in pandas._libs.properties.AxisProperty.set
File "/home/gilgolan/.local/lib/python3.11/site-packages/pandas/core/generic.py", line 822, in _set_axis
self._mgr.set_axis(axis, labels)
File "/home/gilgolan/.local/lib/python3.11/site-packages/pandas/core/internals/managers.py", line 228, in set_axis
self._validate_set_axis(axis, new_labels)
File "/home/gilgolan/.local/lib/python3.11/site-packages/pandas/core/internals/base.py", line 70, in _validate_set_axis
raise ValueError(
ValueError: Length mismatch: Expected axis has 0 elements, new values have 3 elements
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/gilgolan/.local/bin/scenicplus", line 8, in
sys.exit(main())
^^^^^^
File "/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/cli/scenicplus.py", line 1137, in main
args.func(args)
File "/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/cli/scenicplus.py", line 588, in motif_enrichment_dem
run_motif_enrichment_dem(
File "/home/gilgolan/.local/lib/python3.11/site-packages/scenicplus/cli/commands.py", line 466, in run_motif_enrichment_dem
dem_results: List[DEM] = joblib.Parallel(
^^^^^^^^^^^^^^^^
File "/home/gilgolan/.local/lib/python3.11/site-packages/joblib/parallel.py", line 1952, in call
return output if self.return_generator else list(output)
^^^^^^^^^^^^
File "/home/gilgolan/.local/lib/python3.11/site-packages/joblib/parallel.py", line 1595, in _get_outputs
yield from self._retrieve()
File "/home/gilgolan/.local/lib/python3.11/site-packages/joblib/parallel.py", line 1699, in _retrieve
self._raise_error_fast()
File "/home/gilgolan/.local/lib/python3.11/site-packages/joblib/parallel.py", line 1734, in _raise_error_fast
error_job.get_result(self.timeout)
File "/home/gilgolan/.local/lib/python3.11/site-packages/joblib/parallel.py", line 736, in get_result
return self._return_or_raise()
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/gilgolan/.local/lib/python3.11/site-packages/joblib/parallel.py", line 754, in _return_or_raise
raise self._result
ValueError: Length mismatch: Expected axis has 0 elements, new values have 3 elements
[Tue Oct 8 16:24:31 2024]
Error in rule motif_enrichment_dem:
jobid: 7
input: /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/outs/region_sets, /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/cistarget_db/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.scores.feather, genome_annotation.tsv, /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl
output: dem_results.hdf5, dem_results.html
shell:

            scenicplus grn_inference motif_enrichment_dem                     --region_set_folder /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/outs/region_sets                     --dem_db_fname /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/cistarget_db/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.scores.feather                     --output_fname_dem_result dem_results.hdf5                     --temp_dir /tmp                     --species homo_sapiens                     --fraction_overlap_w_dem_database 0.4                     --max_bg_regions 500                     --genome_annotation genome_annotation.tsv                     --balance_number_of_promoters                     --promoter_space 1000                     --adjpval_thr 0.05                     --log2fc_thr 1.0                     --mean_fg_thr 0.0                     --motif_hit_thr 3.0                     --path_to_motif_annotations /home/gilgolan/bioinfo_analysis/scenicplus/tutorial2/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl                     --annotation_version v10nr_clust                     --motif_similarity_fdr 0.001                     --orthologous_identity_threshold 0.0                     --annotations_to_use Direct_annot Orthology_annot                     --write_html                     --output_fname_dem_html dem_results.html                     --seed 666                     --n_cpu 20
        
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-10-08T155849.153427.snakemake.log
WorkflowError:
At least one job did not complete successfully."
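
The ValueError in the traceback above is raised in region_names_to_coordinates while the feature names of the DEM database are being turned into coordinates, and "Expected axis has 0 elements" suggests that none of the names could be parsed as a region. Note that the file passed via --dem_db_fname is named hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.scores.feather; whether that is the cause is not established at this point in the thread. A minimal sketch of a sanity check, assuming region names follow a chr:start-end layout; the helper and the example names are hypothetical and not part of SCENIC+ or pycistarget:

```python
import re

# Assumed region-name layout: chromosome, colon, start, dash, end.
REGION_PATTERN = re.compile(r"^[^:\s]+:\d+-\d+$")


def fraction_region_like(names):
    """Return the fraction of names that look like chr:start-end regions."""
    names = list(names)
    if not names:
        return 0.0
    hits = sum(1 for name in names if REGION_PATTERN.match(name))
    return hits / len(names)


# Made-up feature names: one list in region layout, one of gene symbols.
region_names = ["chr1:1000-1500", "chr2:200-700", "chrX:50-550"]
gene_names = ["GAPDH", "SOX10", "OLIG2"]

region_fraction = fraction_region_like(region_names)
gene_fraction = fraction_region_like(gene_names)
print(region_fraction, gene_fraction)
```

A fraction near zero for the database's feature names would point to a database that is not region based, or to a naming scheme that differs from the one assumed here.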

kennethho04 · 2024-10-09T17:30:24Z

Hi @gilgolan73

It seems you ran into the same problem as issue #432; rolling your Python version back from 3.11.9 to 3.11.8 should resolve the error.

gilgolan73 · 2024-10-10T06:03:07Z

Hello @kennethho04, thank you for the quick reply.
How do you suggest I revert the Python version? Do I need to reinstall the packages?

Gil

kennethho04 · 2024-10-13T16:10:31Z

@gilgolan73 running conda install python==3.11.8 in your conda env should suffice. You don't need to reinstall the packages.
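
After changing the interpreter, it can help to confirm which versions the active environment actually picks up. This is only a small sketch using the standard library; the helper names are hypothetical, and the package list is an assumption that uses the distribution names as they would appear in pip (adjust as needed).

```python
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_report(packages):
    """Collect the interpreter version and the versions of the given packages."""
    report = {"python": ".".join(str(part) for part in sys.version_info[:3])}
    for package in packages:
        report[package] = installed_version(package)
    return report


# Assumed list of packages relevant to this thread.
report = environment_report(["dask", "joblib", "pandas", "scenicplus"])
for name, version in report.items():
    print(f"{name}: {version if version is not None else 'not installed'}")
```

Running this inside the activated environment shows whether the downgrade took effect for the interpreter that Snakemake will call.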

gilgolan73 · 2024-10-14T13:45:33Z

Hello @kennethho04, I tried reverting to Python 3.11.8 as suggested, but I am still getting the same error (attached). I'm also attaching the list of installed packages.

Thank you,
Gil
error 141024_python_3.11.8.txt
list packages 141024.txt

yojetsharma · 2024-10-18T19:45:38Z

It looks like an issue with parallel processing. Have you tried installing/downgrading dask?

gilgolan73 · 2024-10-19T09:20:09Z

Hi @yojetsharma @kennethho04 , I am using Dask version 2024.5.0 with python 3.11.8.
Which version do you recommend I use? (I have a CentOS 9 Linux machine; it is a virtual machine on OracleVM.)

Thank you,
Gil

yojetsharma · 2024-10-19T09:30:48Z

I also faced an issue with parallel processing, and, as someone suggested in one of the other issues, downgrading Dask to 2024.5.0 helped. But it looks like you are already using that version.
The python version that I am using is 3.11.10.

gilgolan73 · 2024-10-21T09:00:42Z

@yojetsharma @kennethho04 I tried Python 3.11.8 with both Dask 2024.2.1 and 2024.5.0.
I am still receiving the same error. Do you think I need to install Python 3.11.10?

Thanks

yojetsharma · 2024-10-21T10:36:01Z

Does reducing the number of cores help?
Also, do the region_sets and genome_annotation.tsv look fine?
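
One way to check whether the region sets "look fine" is to validate them mechanically. The sketch below assumes the region sets are tab-separated BED-like files with a `.bed` extension somewhere below a single folder; the function names are hypothetical helpers, not part of SCENIC+.

```python
from pathlib import Path


def check_bed_lines(lines):
    """Return a list of (line_number, problem) for malformed BED-like records."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines, comments and header lines
        fields = line.split("\t")
        if len(fields) < 3:
            problems.append((number, "fewer than 3 tab-separated columns"))
            continue
        start, end = fields[1], fields[2]
        if not (start.isdecimal() and end.isdecimal()):
            problems.append((number, "start/end are not non-negative integers"))
        elif int(start) >= int(end):
            problems.append((number, "start is not smaller than end"))
    return problems


def check_region_sets(folder):
    """Check every *.bed file below `folder`; return {path: problems} for bad files."""
    report = {}
    for path in sorted(Path(folder).rglob("*.bed")):
        with path.open() as handle:
            problems = check_bed_lines(handle)
        if problems:
            report[str(path)] = problems
    return report
```

An empty result from `check_region_sets` only means the files are structurally well-formed; it says nothing about whether the coordinates match the genome build used elsewhere in the pipeline.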

gilgolan73 · 2024-10-21T14:20:39Z

@yojetsharma Hi, I tried reducing the number of cores to 10 and to 1. It still doesn't help.
The files look OK; I'm attaching them.
genome_annotation.txt
PURK.txt
OPC.txt
NFOL.txt
MOL.txt
MGL.txt
MG.txt
INH_VIP.txt
INH_SST.txt
INH_SNCG.txt
INH_PVALB.txt
GP.txt
GC.txt
ENDO.txt
COP.txt
BG.txt
AST.txt

Gil

yojetsharma · 2024-10-21T15:23:42Z

My last resort would be to try reinstalling the conda env and see if that fixes the issue.

gilgolan73 · 2024-10-27T08:59:57Z

Hi @yojetsharma @kennethho04
I tried reinstalling the conda env (with both Python 3.11.10 and Python 3.11.8), but it still did not resolve the issue. Do you have any other suggestions?

Thank you,
Gil

brianysoong · 2024-11-15T19:57:31Z

@gilgolan73 Did you end up finding a way to resolve your issue? I think I am experiencing a similar issue, where the cistarget motif enrichment works fine but DEM does not identify any motifs.

gilgolan73 · 2024-11-18T12:06:07Z

Hi @brianysoong, unfortunately I am still stuck with this issue. Tried many different Dask and python versions but the same issue persists. Do you have any suggestions?
@yojetsharma @kennethho04

Gil

brianysoong · 2024-11-21T18:30:51Z

@gilgolan73 In my case, it ended up being a dumb mistake where I used human genome / annotations for mouse data!

gilgolan73 · 2024-11-24T07:42:39Z

Hi @brianysoong, can you please elaborate on which file exactly was wrong, and where you downloaded the correct file from?

Thank you,
Gil

gilgolan73 · 2024-11-26T09:58:56Z

Hi,
I finally managed to solve the issue by downloading the cistarget databases (rankings and scores) from this location:
https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/
(following issue #231).
@SeppeDeWinter @yojetsharma @kennethho04 can you please confirm that this is the right location to download these files from?

Thanks,
Gil

SeppeDeWinter · 2024-11-28T07:20:56Z

Hi @gilgolan73

That's the correct location to download the database from.

S
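
Because the fix in this thread came down to which database files the pipeline was pointed at, a quick pre-flight check before launching Snakemake may save a long debugging cycle. The sketch below assumes the rankings and scores databases are `.feather` files on local disk. `check_database_files` is a hypothetical helper, and the labels and paths in the usage example are placeholders to be replaced with the values from your own config.

```python
from pathlib import Path


def check_database_files(paths):
    """Return {label: problem} for entries that are missing, empty, or not .feather."""
    problems = {}
    for label, raw_path in paths.items():
        path = Path(raw_path)
        if not path.is_file():
            problems[label] = f"file not found: {path}"
        elif path.stat().st_size == 0:
            problems[label] = f"file is empty: {path}"
        elif path.suffix != ".feather":
            problems[label] = f"unexpected extension (expected .feather): {path}"
    return problems


if __name__ == "__main__":
    # Placeholder labels and paths; substitute the values from your own config.
    issues = check_database_files({
        "rankings": "/path/to/rankings.feather",
        "scores": "/path/to/scores.feather",
    })
    for label, problem in issues.items():
        print(f"{label}: {problem}")
```

This only catches missing or obviously wrong files. It cannot tell whether a database matches the species or genome build of the data, which was the cause reported by another user above.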

gilgolan73 · 2024-11-28T07:48:03Z

Thank you @SeppeDeWinter .
I believe it would be helpful for other users to mention this in the tutorial (so they do not make the same mistake I did).

yojetsharma mentioned this issue Oct 17, 2024

Chromsize file issue, snakemake doesn't proceed #482

Closed

gilgolan73 mentioned this issue Nov 18, 2024

Running Scenic+ without Snakemake #510

Open

gilgolan73 closed this as completed Nov 28, 2024
