Error in rule get_search_space: jobid: 11 second #492

Open
vcleon88 opened this issue Oct 24, 2024 · 4 comments

Comments

@vcleon88

I created my own chromsizes.tsv. It looks like this:
  Chromosome  Start        End
0       chr1      0  248956422
1       chr2      0  242193529
2       chr3      0  198295559
3       chr4      0  190214555
4       chr5      0  181538259
Index(['Chromosome', 'Start', 'End'], dtype='object')

The genome_annotation.tsv, with chromosomes filtered, looks like this:
  Chromosome  Start   End Strand     Gene  Transcription_Start_Site
0       chrM   3307  4262      +   MT-ND1                      3307
1       chrM   4470  5511      +   MT-ND2                      4470
2       chrM   5904  7445      +   MT-CO1                      5904
3       chrM   7586  8269      +   MT-CO2                      7586
4       chrM   8366  8572      +  MT-ATP8                      8366

  Transcript_type
0  protein_coding
1  protein_coding
2  protein_coding
3  protein_coding
4  protein_coding
Index(['Chromosome', 'Start', 'End', 'Strand', 'Gene',
       'Transcription_Start_Site', 'Transcript_type'],
      dtype='object')

However, when I run Snakemake, the error comes up again:

:~/scplus_pipeline/Snakemake$ Assuming unrestricted shared filesystem usage for local execution.
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 40
Rules claiming more threads will be scaled down.
Job stats:
job                     count
--------------------  -------
AUCell_direct               1
AUCell_extended             1
all                         1
eGRN_direct                 1
eGRN_extended               1
get_search_space            1
motif_enrichment_dem        1
prepare_menr                1
region_to_gene              1
scplus_mudata               1
tf_to_gene                  1
total                      11

Select jobs to execute...
Execute 1 jobs...

[Thu Oct 24 15:57:03 2024]
localrule get_search_space:
input: /home/gu/scecis/plusout/ACC_GEX.h5mu, /home/gu/scecis/plusout/genome_annotation.tsv, /home/gu/scecis/plusout/chromsizes.tsv
output: /home/gu/scecis/plusout/search_space.tsv
jobid: 11
reason: Missing output files: /home/gu/scecis/plusout/search_space.tsv
resources: tmpdir=/tmp

2024-10-24 15:57:08,201 SCENIC+ INFO Reading data
(scenicplus) gu@s166:~/scplus_pipeline/Snakemake$ /home/gu/miniconda3/envs/scenicplus/lib/python3.11/site-packages/anndata/_core/anndata.py:522: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
warnings.warn(
/home/gu/miniconda3/envs/scenicplus/lib/python3.11/site-packages/anndata/_core/anndata.py:522: FutureWarning: The dtype argument is deprecated and will be removed in late 2024.
warnings.warn(
Traceback (most recent call last):
File "/home/gu/miniconda3/envs/scenicplus/bin/scenicplus", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/gu/miniconda3/envs/scenicplus/lib/python3.11/site-packages/scenicplus/cli/scenicplus.py", line 1137, in main
args.func(args)
File "/home/gu/miniconda3/envs/scenicplus/lib/python3.11/site-packages/scenicplus/cli/scenicplus.py", line 208, in search_space
get_search_space_command(
File "/home/gu/miniconda3/envs/scenicplus/lib/python3.11/site-packages/scenicplus/cli/commands.py", line 661, in get_search_space_command
search_space = get_search_space(
^^^^^^^^^^^^^^^^^
File "/home/gu/miniconda3/envs/scenicplus/lib/python3.11/site-packages/scenicplus/data_wrangling/gene_search_space.py", line 294, in get_search_space
pr_regions = pr.PyRanges(region_names_to_coordinates(scplus_region))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/gu/miniconda3/envs/scenicplus/lib/python3.11/site-packages/scenicplus/utils.py", line 223, in region_names_to_coordinates
regiondf.columns = ['Chromosome', 'Start', 'End']
^^^^^^^^^^^^^^^^
File "/home/gu/miniconda3/envs/scenicplus/lib/python3.11/site-packages/pandas/core/generic.py", line 5920, in __setattr__
return object.__setattr__(self, name, value)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "pandas/_libs/properties.pyx", line 69, in pandas._libs.properties.AxisProperty.__set__
File "/home/gu/miniconda3/envs/scenicplus/lib/python3.11/site-packages/pandas/core/generic.py", line 822, in _set_axis
self._mgr.set_axis(axis, labels)
File "/home/gu/miniconda3/envs/scenicplus/lib/python3.11/site-packages/pandas/core/internals/managers.py", line 228, in set_axis
self._validate_set_axis(axis, new_labels)
File "/home/gu/miniconda3/envs/scenicplus/lib/python3.11/site-packages/pandas/core/internals/base.py", line 70, in _validate_set_axis
raise ValueError(
ValueError: Length mismatch: Expected axis has 0 elements, new values have 3 elements
[Thu Oct 24 15:57:18 2024]
Error in rule get_search_space:
jobid: 11
input: /home/gu/scecis/plusout/ACC_GEX.h5mu, /home/gu/scecis/plusout/genome_annotation.tsv, /home/gu/scecis/plusout/chromsizes.tsv
output: /home/gu/scecis/plusout/search_space.tsv
shell:

    scenicplus prepare_data search_spance             --multiome_mudata_fname /home/gu/scecis/plusout/ACC_GEX.h5mu             --gene_annotation_fname /home/gu/scecis/plusout/genome_annotation.tsv             --chromsizes_fname /home/gu/scecis/plusout/chromsizes.tsv             --out_fname /home/gu/scecis/plusout/search_space.tsv             --upstream 1000 150000             --downstream 1000 150000             --extend_tss 10 10

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-10-24T155703.261548.snakemake.log
WorkflowError:
At least one job did not complete successfully.

Does anyone have an idea what causes this issue?

Thanks in advance.

@kennethho04

Hi @vcleon88

I had the same issue and was able to resolve it. Your problem is similar to the one described in issue #426.

The problem is likely the format of the var_names in your mudata.
You can check the format by running the following:

import mudata

# Replace the placeholder with the path to your own .h5mu file.
mdata = mudata.read("<PATH_TO_ACC_GEX.h5mu>")
print(mdata["scATAC"].var_names)

The format should be "chr:start-end". In my case it was "chr-start-end", so I reformatted mdata["scATAC"].var_names to "chr:start-end" and saved the result as a new mudata file, replacing the old one in my out folder. Hope that helps.
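
The renaming step could look roughly like the sketch below. It is only an illustration, not code from SCENIC+: the helper name to_region_name and the sample region names are made up, and it assumes every name ends in two integer coordinates separated by "-" or "_".

```python
import re

# Accepts "chr:start-end" as well as "chr-start-end" or "chr_start_end".
# The chromosome part is greedy, so contig names that themselves contain
# "-" or "_" are kept whole.
_REGION = re.compile(r"^(?P<chrom>.+)[:_-](?P<start>\d+)[_-](?P<end>\d+)$")


def to_region_name(name: str) -> str:
    """Return `name` rewritten as "chr:start-end" (hypothetical helper)."""
    match = _REGION.match(name)
    if match is None:
        raise ValueError(f"cannot parse region name: {name!r}")
    return f"{match['chrom']}:{match['start']}-{match['end']}"


# Made-up example names, not taken from the data in this issue.
names = ["chr1-100-200", "chr2:300-400", "chrUn_KI270742v1_10_20"]
fixed = [to_region_name(n) for n in names]
print(fixed)
```

With a real object, the same helper would presumably be applied as mdata["scATAC"].var_names = [to_region_name(n) for n in mdata["scATAC"].var_names], followed by mdata.update() and mdata.write(...) to a new .h5mu file. I have not run that part against the data in this issue, so treat it as a starting point.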

@vcleon88
Author

vcleon88 commented Nov 2, 2024

Hi @kennethho04

Thank you so much! That solved the problem!

@Melody-cell

Hi @vcleon88, @kennethho04, @SeppeDeWinter, I got the same error,
but my mudata looks fine and I don't know why.
Does anyone know how to solve it?

@SeppeDeWinter
Collaborator

Hi @Melody-cell

Could you post the entire error log, please? Just copy and paste the output.

All the best,

Seppe
