
setting directories #23

Open
etr-asu opened this issue Feb 20, 2023 · 5 comments

etr-asu commented Feb 20, 2023

Hi,

I am trying to run the DB creation part of the workflow, and I am struggling to work out all of the locations I need to point to in order to run it locally on my system.
https://github.com/functional-dark-side/agnostos-wf/wiki#db-creation

Running this command:
snakemake --conda-frontend conda --use-conda -j 100 --config module="creation" --cluster-config config/cluster.yaml --cluster "sbatch --export=ALL -t {cluster.time} -c {threads} --ntasks-per-node {cluster.ntasks_per_node} --nodes {cluster.nodes} --cpus-per-task {cluster.cpus_per_task} --job-name {rulename}.{jobid} --partition {cluster.partition}" -R --until creation_workflow_report

With my attached YAML files (renamed to .txt so I can upload them).

Results in this error:

rule gene_prediction:
    input: /data/etrembat/agnostos_test/db_creation_data/TARA_039_041_SRF_0.1-0.22_5K_contigs.fasta
    output: /vol/cloud/agnostos_test/db_creation/gene_prediction/orf_seqs.fasta, /vol/cloud/agnostos_test/db_creation/gene_prediction/orf_partial_info.tsv
    log: logs/gene_stdout.log, logs/gene_stderr.err
    jobid: 11
    benchmark: benchmarks/gene_prediction.tsv
    reason: Missing output files: /vol/cloud/agnostos_test/db_creation/gene_prediction/orf_partial_info.tsv, /vol/cloud/agnostos_test/db_creation/gene_prediction/orf_seqs.fasta
    resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>

I am unsure where these files should be, since the download for the creation data only contains the contigs.fasta files.

Thanks!

config_yaml.txt
config_communities_yaml.txt
cluster_yaml.txt


genomewalker commented Feb 21, 2023

Hi @etr-asu

The error seems to be related to the fact that you need to define paths that exist on your system here in config.yaml:

rdir: "/vol/cloud/agnostos_test/db_creation"
idir: "/vol/cloud/agnostos_test/db_creation_data"
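
For example, on a local system these entries might look roughly like the sketch below. The paths are only illustrative; the point is that idir must be an existing directory holding the downloaded contigs fasta files, and rdir an existing, writable directory where the results will be written:

# a minimal sketch, assuming /home/user/agnostos_test is a writable location on your machine
rdir: "/home/user/agnostos_test/db_creation"        # results are written here, e.g. gene_prediction/orf_seqs.fasta
idir: "/home/user/agnostos_test/db_creation_data"   # must contain the downloaded TARA_039_041_SRF_0.1-0.22_5K_contigs.fasta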

Antonio


etr-asu commented Feb 21, 2023

Thanks for responding! The config.yaml file now reads:

# This file should contain everything to configure the workflow on a global scale.
# In case of sample based data, it should be complemented by a samples.tsv file that contains
# one row per sample. It can be parsed easily via pandas.
wdir: "/data/etrembat/agnostos-wf/workflow"
rdir: "/data/etrembat/agnostos_test/db_creation"
idir: "/data/etrembat/agnostos_test/db_creation_data"

And I get the same error:

rule gene_prediction:
    input: /data/etrembat/agnostos_test/db_creation_data/TARA_039_041_SRF_0.1-0.22_5K_contigs.fasta
    output: /data/etrembat/agnostos_test/db_creation/gene_prediction/orf_seqs.fasta, /data/etrembat/agnostos_test/db_creation/gene_prediction/orf_partial_info.tsv
    log: logs/gene_stdout.log, logs/gene_stderr.err
    jobid: 2
    benchmark: benchmarks/gene_prediction.tsv
    reason: Missing output files: /data/etrembat/agnostos_test/db_creation/gene_prediction/orf_seqs.fasta, /data/etrembat/agnostos_test/db_creation/gene_prediction/orf_partial_info.tsv
    resources: mem_mb=1000, mem_mib=954, disk_mb=1000, disk_mib=954, tmpdir=<TBD>


genomewalker commented

Hi @etr-asu
Can you send the log files? The message you pasted doesn't show an error, only the reason why the rule is being executed.


etr-asu commented Feb 22, 2023

Hi @genomewalker, I think the issue is that the cluster.yaml file asks for more than what my university HPC allows by default (for example, I can't request a 1000-hour job). Is there a minimum set of requirements you would recommend? Or some other approach for users adapting the workflow to a shared HPC system? Thanks!


genomewalker commented Feb 24, 2023

Yes, you should adapt the cluster.yaml settings to your HPC system. The time will depend on the size of your dataset; you can use the maximum time allowed for the partition you will use, which you can check with the sinfo command.
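
For example, adapted to a shared HPC the top of cluster.yaml could look roughly like this. This is only a sketch: it assumes the usual Snakemake cluster-config layout with a __default__ entry, and the partition name and numbers are placeholders that you should replace with your own cluster's limits (the sinfo format string in the comment is one way to see them):

# a minimal sketch of cluster.yaml, assuming a standard Snakemake cluster-config layout
# the partition name and limits below are placeholders; check your system's limits with e.g.
#   sinfo -o "%P %l %c %m"   (partition, time limit, CPUs per node, memory per node)
__default__:
  partition: "general"        # replace with a partition that exists on your HPC
  time: "24:00:00"            # keep at or below the partition's TIMELIMIT (instead of 1000 hours)
  nodes: 1
  ntasks_per_node: 1
  cpus_per_task: 4            # scale to your dataset and your per-user limits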
