The Bash tutorial is missing a step #365

Open
8 tasks done
lyisrae1 opened this issue Oct 23, 2024 · 1 comment

lyisrae1 commented Oct 23, 2024

User checklist

  • Are you using the latest release? Yes
  • Are you using python 3? Yes
  • Did you check previous issues to see if this has already been mentioned? Yes
  • Are you using a Mac or Linux machine? Linux machine

Description

Hello there,
I am trying to learn how to use autometa from your tutorial posted on ReadTheDocs, but there is a piece missing in Step 4 - Single Copy Markers. There is not a step detailing how we create a hmmscan.tsv file. Can you provide this information to me please?

Expected Behavior

I checked the rest of the document, but there is no other mention of how the learners are supposed to make the hmmscan.tsv file.

System Environment

  • Operating System: Linux
  • RAM: A ton. Our university's cluster has over 45k nodes
  • Disk: N/A

Tasks/Command(s)

  • Task 1
  • Task 2
  • Task 3
  • etc.
Log/Error information generated by Autometa.

Hello,

I appreciate you looking at my inquiry. I noticed that there is a step missing from the tutorial on your ReadTheDocs page. No step shows how to create the hmmscan.tsv file before it is needed in Step 4 - Single Copy Markers.

For example, I followed the tutorial exactly, but I keep getting an error telling me that the hmmscan.tsv file does not exist. I will paste the directions for Step 4 here:

Create a markers directory to hold the marker genes

mkdir -p $HOME/Autometa/autometa/databases/markers

Change the default download path to the directory created above

autometa-config \
    --section databases \
    --option markers \
    --value $HOME/Autometa/autometa/databases/markers

Download single-copy marker genes

autometa-update-databases --update-markers

hmmpress the marker genes

hmmpress -f $HOME/Autometa/autometa/databases/markers/bacteria.single_copy.hmm
hmmpress -f $HOME/Autometa/autometa/databases/markers/archaea.single_copy.hmm

autometa-markers \
    --orfs $HOME/tutorial/78mbp_metagenome.orfs.faa \
    --kingdom bacteria \
    --hmmscan $HOME/tutorial/78mbp_metagenome.hmmscan.tsv \
    --out $HOME/tutorial/78mbp_metagenome.markers.tsv \
    --parallel \
    --cpus 4 \
    --seed 42

When I follow this code, I get this error:
ERROR:
[10/23/2024 04:39:10 PM DEBUG] autometa.common.external.hmmscan: hmmscan --seed 42 --cpu 0 --tblout /vast/agnanad1/Leone/autometa_tutorial/78mbp_metagenome.hmmscan.tsv /vast/agnanad1/Leone/autometa_tutorial/markers/bacteria.single_copy.hmm /vast/agnanad1/Leone/autometa_tutorial/78mbp_metagenome.orfs.faa
[10/23/2024 04:39:10 PM WARNING] autometa.common.external.hmmscan: Make sure your hmm profiles are pressed! hmmpress -f /vast/agnanad1/Leone/autometa_tutorial/markers/bacteria.single_copy.hmm
Traceback (most recent call last):
File "/home/lyisrae1/.conda/envs/autometa/bin/autometa-markers", line 10, in <module>
sys.exit(main())
^^^^^^
File "/home/lyisrae1/.conda/envs/autometa/lib/python3.12/site-packages/autometa/common/markers.py", line 266, in main
get(
File "/home/lyisrae1/.conda/envs/autometa/lib/python3.12/site-packages/autometa/common/markers.py", line 162, in get
scans = hmmscan.run(
^^^^^^^^^^^^
File "/home/lyisrae1/.conda/envs/autometa/lib/python3.12/site-packages/autometa/common/external/hmmscan.py", line 174, in run
annotate_sequential(
File "/home/lyisrae1/.conda/envs/autometa/lib/python3.12/site-packages/autometa/common/external/hmmscan.py", line 106, in annotate_sequential
raise err
File "/home/lyisrae1/.conda/envs/autometa/lib/python3.12/site-packages/autometa/common/external/hmmscan.py", line 101, in annotate_sequential
subprocess.run(
File "/home/lyisrae1/.conda/envs/autometa/lib/python3.12/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['hmmscan', '--seed', '42', '--cpu', '0', '--tblout', '/vast/agnanad1/Leone/autometa_tutorial/78mbp_metagenome.hmmscan.tsv',

Additionally, I have had a few syntax issues in Step 5 - Taxonomy, but those were very easy to fix, so they are not the issue here. Could I please get some clarification on how to finish Step 4 on ReadTheDocs? I cannot finish the tutorial properly without that step.
autometa_tutorial.txt

Here is the process I did without the markers:

autometa-binning \
    --kmers /vast/agnanad1/Leone/autometa_tutorial/78mbp_metagenome.bacteria.kmers.embedded.tsv \
    --coverages /vast/agnanad1/Leone/autometa_tutorial/78mbp_metagenome.coverages.tsv \
    --gc-content /vast/agnanad1/Leone/autometa_tutorial/78mbp_metagenome.gc_content.tsv \
    --output-binning /vast/agnanad1/Leone/autometa_tutorial/78mbp_metagenome.binning.tsv \
    --output-main /vast/agnanad1/Leone/autometa_tutorial/78mbp_metagenome.main.tsv \
    --clustering-method dbscan \
    --completeness 20 \
    --purity 90 \
    --cov-stddev-limit 25 \
    --gc-stddev-limit 5 \
    --taxonomy /vast/agnanad1/Leone/autometa_tutorial/78mbp_metagenome.taxonomy.tsv \
    --starting-rank superkingdom \
    --rank-filter superkingdom \
    --rank-name-filter bacteria

And here is the error message:
usage: autometa-binning [-h] --kmers filepath --coverages filepath
--gc-content filepath --markers filepath
--output-binning filepath [--output-main filepath]
[--clustering-method {dbscan,hdbscan}]
[--completeness 0 < float <= 100]
[--purity 0 < float <= 100] [--cov-stddev-limit float]
[--gc-stddev-limit float] [--taxonomy filepath]
[--starting-rank {superkingdom,phylum,class,order,family,genus,species}]
[--reverse-ranks]
[--rank-filter {superkingdom,phylum,class,order,family,genus,species}]
[--rank-name-filter RANK_NAME_FILTER] [--verbose]
[--cpus int]
autometa-binning: error: the following arguments are required: --markers

https://autometa.readthedocs.io/en/latest/bash-step-by-step-tutorial.html#single-copy-markers

Thank you for your time,
Leone

chasemc (Member) commented Oct 25, 2024

It looks like the documentation needs to be fixed, but this is mostly an issue with file paths.

1

At the start, the tutorial says to download metagenome.fna.gz to $HOME/tutorial/test_data/,
but later steps refer to the file by a different name: $HOME/tutorial/test_data/78mbp_metagenome.fna

So, to start, you should download metagenome.fna.gz and save it as $HOME/tutorial/test_data/78mbp_metagenome.fna

2

There is a separate issue in the ORF creation step.

Current:

autometa-orfs \
    --assembly $HOME/tutorial/78mbp_metagenome.filtered.fna \
    --output-nucls $HOME/tutorial/78mbp_metagenome.orfs.fna \
    --output-prots $HOME/tutorial/a78mbp_metagenome.orfs.faa \
    --cpus 40

Should be:

autometa-orfs \
    --assembly $HOME/tutorial/78mbp_metagenome.filtered.fna \
    --output-nucls $HOME/tutorial/78mbp_metagenome.orfs.fna \
    --output-prots $HOME/tutorial/78mbp_metagenome.orfs.faa \
    --cpus 40

That should fix the error.
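
A pre-flight check along these lines can catch this kind of path mismatch before autometa-markers is launched. This is only a minimal sketch: `require_file` is a hypothetical helper, not part of Autometa, and the paths in the usage comment are the ones quoted in this thread.

```shell
# require_file is a hypothetical helper (not an Autometa command).
require_file() {
    # Succeed only if the path is a regular file with a size greater than zero.
    if [ ! -f "$1" ] || [ ! -s "$1" ]; then
        echo "Missing or empty input: $1" >&2
        return 1
    fi
}

# Example usage with the tutorial's ORF output path:
#   require_file "$HOME/tutorial/78mbp_metagenome.orfs.faa" \
#       && echo "ORF file found, safe to run autometa-markers"
```

With the misspelled `a78mbp_metagenome.orfs.faa` output name, a check like this on the expected `78mbp_metagenome.orfs.faa` path would report the missing file directly instead of failing later inside hmmscan.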


CC @shaneroesemann @jason-c-kwan: the documentation needs to be updated accordingly. Also, the error message generated by autometa-markers is not helpful. The subprocess stderr should be captured and printed, rather than only warning "Make sure your hmm profiles are pressed!", which wasn't the issue here.
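
Until the error reporting is improved, the underlying message can be surfaced by running the failing command by hand with a small wrapper. The following bash sketch illustrates the idea of capturing and printing stderr; `run_with_stderr` is a hypothetical name, and this is not how Autometa itself invokes hmmscan.

```shell
# run_with_stderr is a hypothetical wrapper (not part of Autometa or HMMER).
run_with_stderr() {
    # Run the given command, keep its stderr in a temporary file,
    # and print that stderr only if the command exits non-zero.
    local errfile status=0
    errfile="$(mktemp)"
    "$@" 2>"$errfile" || status=$?
    if [ "$status" -ne 0 ]; then
        echo "Command failed (exit $status): $*" >&2
        cat "$errfile" >&2
    fi
    rm -f "$errfile"
    return "$status"
}

# Example usage: prefix the hmmscan command shown in the DEBUG log line with
#   run_with_stderr hmmscan ...
```

The wrapper returns the original exit status, so it can be dropped in front of a command without changing how a calling script reacts to failure.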
