Automatically re-annotate PFAM domains if they are incompatible #58

Open
susheelbhanu opened this issue Jul 6, 2021 · 3 comments
Labels
bug Something isn't working

Comments

@susheelbhanu

susheelbhanu commented Jul 6, 2021

Hi,

I'm running version 0.1.26 and I'm getting the same error as #47.

Part of the error message is below:

ERROR   06/07 22:04:58   pfam_id
Traceback (most recent call last):
  File "/mnt/data/sbusi/antismash/.snakemake/conda/bf12a359/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2656, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'pfam_id'

Thank you for your help!

@prihoda
Collaborator

prihoda commented Jul 7, 2021

Hi @susheelbhanu, are you able to share the input file and the input command you are using?

@susheelbhanu
Author

Hi,

Sure, the command is below, and two of the files that show the same issue are attached.
GL_R1_GL5_UP_1_C1.3.1_sub.merged.gbk.txt
GL_R80_GL56_UP_1_maxbin_res.049.fasta_sub.merged.gbk.txt

I added the .txt extension to get past the upload file type limitations.

(date && deepbgc pipeline /mnt/data/sbusi/antismash/results/gbk/GL_R80_GL56_UP_1_maxbin_res.049.fasta_sub/GL_R80_GL56_UP_1_maxbin_res.049.fasta_sub.merged.gbk -o $(dirname /mnt/data/sbusi/antismash/results/merged_deepbgc/GL_R80_GL56_UP_1_maxbin_res.049.fasta_sub/GL_R80_GL56_UP_1_maxbin_res.049.fasta_sub.bgc.tsv) && date)

Thank you!

@prihoda
Collaborator

prihoda commented Jul 7, 2021

OK, this seems to be caused by incompatible PFAM_domain annotations in the GenBank file. Can you convert the GenBank file to a nucleotide FASTA file and run deepbgc pipeline on that? DeepBGC will then re-annotate the Pfam domains itself, which should fix the issue.

I will rename this ticket. Hopefully we can fix this issue later; in the meantime, please try the FASTA input.
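
For anyone hitting the same error, the FASTA workaround suggested above could be sketched roughly as follows. This is a minimal, standard-library-only illustration and not part of DeepBGC: the function name `genbank_to_fasta` is hypothetical, and the parser assumes a conventional GenBank flat file (a `LOCUS` line, an `ORIGIN` sequence block, and a `//` record terminator). It keeps only the record name and the nucleotide sequence, so all existing feature annotations, including the incompatible PFAM_domain entries, are dropped.

```python
import re


def genbank_to_fasta(genbank_text: str) -> str:
    """Convert GenBank text to nucleotide FASTA, discarding all annotations.

    Hypothetical helper for illustration; assumes each record has a LOCUS
    line, an ORIGIN block holding the sequence, and a closing '//' line.
    """
    fasta_lines = []
    record_id = None
    in_origin = False
    seq_chunks = []

    for line in genbank_text.splitlines():
        if line.startswith("LOCUS"):
            # The second whitespace-separated field is the record name.
            fields = line.split()
            record_id = fields[1] if len(fields) > 1 else "unnamed"
            seq_chunks = []
            in_origin = False
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            # End of record: emit the header and the sequence wrapped at 60 columns.
            if record_id is not None:
                sequence = "".join(seq_chunks).upper()
                fasta_lines.append(f">{record_id}")
                fasta_lines.extend(
                    sequence[i:i + 60] for i in range(0, len(sequence), 60)
                )
            record_id = None
            in_origin = False
        elif in_origin:
            # Sequence lines look like '        1 atgcatgcat gc': keep letters only.
            seq_chunks.append(re.sub(r"[^A-Za-z]", "", line))

    return "".join(line + "\n" for line in fasta_lines)
```

The resulting text can be written to a `.fasta` file and passed to `deepbgc pipeline` in place of the `.gbk` path, with the same `-o` output directory as in the command above. If Biopython is installed, `SeqIO.convert(input_path, "genbank", output_path, "fasta")` should do the same conversion with a more robust parser.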

prihoda changed the title from "KeyError: pfam_id" to "Automatically re-annotate PFAM domains if they are incompatible" on Jul 7, 2021
prihoda added the bug label on Jul 7, 2021