Ancillary files not found when trying to remove #1

Open
npavlovikj opened this issue Apr 1, 2020 · 6 comments

@npavlovikj

Hi,

I am trying to install "cenote-taker2" on our HPC system for a local research group.
I downloaded all databases and dependencies using the commit from 03/26/2020.

I ran the test you provide on the Wiki page using the command:

run_cenote-taker2.0.1.py --contigs testcontigs_DNA_ct2.fasta --run_title test_DNA_ct --template_file template.sbt --prune_prophage True --mem 58 --cpu 8 --filter_out_plasmids False --enforce_start_codon False --handle_contigs_without_hallmark sketch_all --known_strains blast_knowns --blastn_db /work/HCC/BCRF/BLAST/nt

Most of the programs finished ok I think, but I am getting the following error at the end:

9783
 Summary file made: test_DNA_ct.tsv 
removing ancillary files
rm: cannot remove '*.comb.tbl': No such file or directory
rm: cannot remove '*.remove_hypo.txt': No such file or directory
rm: cannot remove '*.out.hhr': No such file or directory
rm: cannot remove '*.out.hhr': No such file or directory
rm: cannot remove 'bt2_indices/': No such file or directory
rm: cannot remove 'other_contigs/*.dat': No such file or directory
rm: cannot remove 'no_end_contigs_with_viral_domain/*.remove_hypo.txt': No such file or directory
rm: cannot remove 'no_end_contigs_with_viral_domain/*.trans.fasta': No such file or directory
rm: cannot remove 'no_end_contigs_with_viral_domain/test_DNA_ct3_vs1.AA.called_hmmscan2.txt': No such file or directory
rm: cannot remove 'no_end_contigs_with_viral_domain/test_DNA_ct4.AA.called_hmmscan2.txt': No such file or directory
rm: cannot remove 'no_end_contigs_with_viral_domain/test_DNA_ct4_vs1.AA.called_hmmscan2.txt': No such file or directory

These files indeed do not exist in the output directory, thus the message.
I was wondering if this type of error message is familiar to you, and whether you have some suggestions on how to fix it.
I am using the "testcontigs_DNA_ct2.fasta" file you have provided, and a dummy "template.sbt" file.
Please find the complete log here, cenote-taker2.log

I am looking forward to hearing from you, and if you need any additional information, please let me know.

Thank you,
Natasha

@mtisza1
Owner

mtisza1 commented Apr 1, 2020

Hi Natasha,

Thanks for letting me know about this and for providing a detailed issue. Basically, these errors are of no concern. My code works as far as I can tell, but parts of it are a bit ugly: Cenote-Taker2 can generate a range of ancillary files, and not all of them are present at the end of every run. That said, I've updated the code to eliminate this message. Also, I've made a couple of debugging updates since 3/26/2020, so please 'git pull' the repo.
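For readers who hit the same messages on an older commit, here is a minimal sketch of one common way to make such a cleanup step quiet. It is not necessarily the change that was made in the repo. The function name cleanup_ancillary is hypothetical; the file patterns are taken from the error output above.

```sh
# Sketch: delete optional by-products without complaining when none exist.
cleanup_ancillary() {
  dir=$1
  # -f tells rm to ignore operands that do not exist, which includes
  # glob patterns that matched nothing and were passed through literally.
  rm -f "$dir"/*.comb.tbl "$dir"/*.remove_hypo.txt "$dir"/*.out.hhr
  # -r is required for a directory; -f again keeps a missing one silent.
  rm -rf "$dir/bt2_indices"
}
```

Calling cleanup_ancillary . in the output directory would remove whichever of these files are present and print nothing for the ones that are absent.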

Going through your log file, I did see that the IRF (inverted repeat finder) program isn't found.

/home/deogun/npavlovikj/.conda/envs/ct2/bin/cenote-taker2.0.1.sh: /home/deogun/npavlovikj/.conda/envs/ct2/bin/irf307.linux.exe: /lib/ld-linux.so.2: bad ELF interpreter: No such file or directory

IRF was a bit trickier to include in the install as it's not available on Anaconda and is meant for 32 bit systems. A google search supports the idea that this type of error arises from 64 bit/32 bit incompatibility. You might try to resolve this, but this program is not strictly necessary for the rest of the code. You just won't know which contigs are flanked by inverted terminal repeats.
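A quick way to confirm this kind of mismatch is to look at the ELF header of the binary. The sketch below assumes a POSIX shell with od and tr available; the function name elf_class is hypothetical. It relies on the fifth byte of an ELF file (EI_CLASS) being 1 for 32-bit and 2 for 64-bit binaries.

```sh
# Sketch: report whether a file is a 32-bit or 64-bit ELF binary.
elf_class() {
  # Read the first 5 bytes as hex: 4 magic bytes (7f 45 4c 46) plus EI_CLASS.
  hdr=$(od -An -tx1 -N5 "$1" | tr -d ' \n')
  case "$hdr" in
    7f454c4601) echo 32 ;;
    7f454c4602) echo 64 ;;
    *) echo unknown ;;
  esac
}
```

If elf_class reports 32 for irf307.linux.exe while uname -m reports x86_64, the binary needs the 32-bit loader (/lib/ld-linux.so.2), and that loader is the file the error message says is missing.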

I hope this was at least a little bit helpful and please let me know if you run into any other issues.

Best,

Mike

@npavlovikj
Author

Hi @mtisza1 ,

Thank you so much for your prompt reply and fixes, I highly appreciate it!
I got the newest commit from the repo, and I can confirm that the messages do not appear anymore and the test job finished fine.

When we install tools on our HPC cluster, we try to package them as conda packages if that is feasible.
Therefore, I created a conda package for cenote-taker2 and added it to our HCC conda channel, https://anaconda.org/hcc/cenote-taker2.
I followed the dependencies you have used in your conda environment and your code.
Everything with the exception of phanotate was available in bioconda, and I added phanotate to our channel.
You can see the full recipe here, https://github.com/unlhcc/hcc-conda-recipes/tree/master/recipes/cenote-taker2/2020.04.01.
It is always good to keep the data and the script directories separate, so I use a variable CT2_DIR to point to the data directory, and I patched the code accordingly.
Briefly, anyone who uses the conda package for cenote-taker2 needs to set the CT2_DIR variable to point to the data directory, and follow the instructions in the post-link.sh script to download the databases.
The download of the databases is not part of the conda recipe, since bundling databases is not common practice, mostly because of their size. Instead, the download is handled by the download-db.sh script, which the user needs to run after the environment is created.
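A minimal sketch of a check that could be run before starting a job, assuming a POSIX shell. CT2_DIR is the variable named above; the function name check_ct2_dir is hypothetical and is not part of the package.

```sh
# Sketch: fail early if CT2_DIR is unset or does not point at a directory.
check_ct2_dir() {
  if [ -z "${CT2_DIR:-}" ]; then
    echo "CT2_DIR is not set" >&2
    return 1
  fi
  if [ ! -d "$CT2_DIR" ]; then
    echo "CT2_DIR is not a directory: $CT2_DIR" >&2
    return 1
  fi
  return 0
}
```

A typical use would be to export CT2_DIR in the job script, call check_ct2_dir, and only then launch the pipeline, so that a missing database directory is reported before any compute time is spent.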

Regarding irf307.linux.exe - you are right that it works only on 32-bit systems, and ours are 64-bit.
Unfortunately, it looks like the provided binary is only for 32-bit systems.
In order to make IRF reproducible, I created a Docker image for it.
On our clusters we support Singularity, so replacing
irf307.linux.exe $NONCIR 2 3 5 80 10 40 500000 10000 -d -h
with
singularity exec docker://unlhcc/irf irf307.linux.exe $NONCIR 2 3 5 80 10 40 500000 10000 -d -h
works fine.
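The substitution can also be wrapped in a small shell function so that the call site stays the same on both kinds of system. This is only a sketch: the function name run_irf and the IRF_VIA_SINGULARITY switch are hypothetical, while the singularity exec command line is the one quoted in this comment.

```sh
# Sketch: run IRF natively, or through the Singularity image when asked to.
run_irf() {
  if [ "${IRF_VIA_SINGULARITY:-0}" = "1" ]; then
    # Same binary name, executed inside the container image.
    singularity exec docker://unlhcc/irf irf307.linux.exe "$@"
  else
    irf307.linux.exe "$@"
  fi
}
```

With IRF_VIA_SINGULARITY=1 exported, run_irf "$NONCIR" 2 3 5 80 10 40 500000 10000 -d -h would expand to the containerized command quoted in this comment.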

Using the mentioned conda package for cenote-taker2 with the Singularity image for IRF worked well with the test data on our clusters.
I am letting you know about this in case some users of cenote-taker2 notice the conda package.
I do not plan to maintain the recipe unless you update the code and a user requests the changes.
However, you are more than welcome to modify and re-use the conda recipe if you want, or add your changes to our repo.

I am sorry for the long reply, and I hope what I wrote makes sense.
If you have any questions, please let me know.

Thank you,
Natasha

@mtisza1
Owner

mtisza1 commented Apr 3, 2020

Hey Natasha,

This is really interesting! I'm a bit new to the whole packaging/making images thing, but I will try to keep this conda package updated if possible.

However, I tried to install your package with conda install -c hcc cenote-taker2 and got an error:

Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: | 
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed                                                                                    

UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

I noticed a few differences from my .yaml file, including:

yours:

tbl2asn-forever=25.7.1f
bbmap=37.62
coreutils
perl
phanotate=2019.08.09
matplotlib-base=3.1.3

mine:

tbl2asn=25.7
bbtools=37.62
cmake=3.14.0
matplotlib=3.1.3
Also I didn't see the channels that I listed in my .yaml in yours. I don't know if I'm doing something wrong, but I wanted to make you aware.

Let me know if you have ideas. Thanks!

@npavlovikj
Author

Hi @mtisza1 ,

I am sorry, I forgot to mention that our HCC conda packages depend on packages from conda-forge, bioconda and defaults, the main and well-maintained conda channels.
So, the conda command for installing should actually be:
conda create -n cenote-taker2 -c hcc -c conda-forge -c bioconda -c defaults cenote-taker2=2020.04.01
Also, I would suggest that you use conda create instead of conda install.
The first command creates a completely new environment with cenote-taker2, while the latter installs packages into an already existing environment, which may cause incompatibility issues like the ones you have observed.

When we create conda packages for our channel, we try to follow the infrastructure of bioconda and install packages that only depend on the channels conda-forge, bioconda and defaults.
I saw you used the AgBiome channel.
You can definitely add packages from this channel - I didn't have the need to since all the dependencies are available in the main channels listed above.

Regarding the difference in the packages you listed:

  • "bbtools" in "bioconda" is named "bbmap" (thus, "bbtools"=="bbmap")
  • the main "tbl2asn" package expires every year, although the version is not changed. To avoid this, the "bioconda" contributors created the same package "tbl2asn" with the addition of a utility that fakes the time, so users will not get the expiration message anymore (thus, "tbl2asn-forever"=="tbl2asn")
  • You can either use "matplotlib-base" or "matplotlib". The rule is, if one needs the full Qt functionality, then one needs "matplotlib" as a dependency; otherwise, if one needs "matplotlib" to just render images to files, which I assume is the case here, one needs to use "matplotlib-base". Both are available in "conda-forge", just "matplotlib" takes a lot of space and needs to be used only when needed. If my assumption here is wrong, please let me know, and I can use "matplotlib" in the recipe instead
  • I believe you use "cmake" to compile "last". Since I am using a conda package for "last" and no dependencies need to be compiled, there is no need to use "cmake" in the recipe
  • "cenote-taker2" uses "phanotate" as a dependency, and I created a conda package for it
  • There are a few "perl" scripts in the repo, thus the "perl" dependency
  • Some of the scripts in "cenote-taker2" are shell scripts. While most systems should have the standard command-line utilities already installed, that is not always the case. To provide a more robust distribution of "cenote-taker2", I added the "coreutils" package, which provides these utilities, https://anaconda.org/bioconda/coreutils
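As a rough illustration of the robustness concern in the list above, the sketch below reports which required commands are not on PATH in the current environment. The function name missing_tools is hypothetical, and the tool names in the usage note are only examples, since the exact executable names shipped by each package are not stated in this thread.

```sh
# Sketch: print each command name that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

For example, missing_tools perl rm sort would print nothing inside an environment that provides all three, and would list the absent ones otherwise.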

As you can see, the idea with the dependencies in the recipe is to not use https://github.com/mtisza1/Cenote-Taker2/blob/master/install_scripts/cenote_install1.sh anymore. With conda recipes, it is preferable to have a conda package for each dependency. So, all the packages you are downloading and installing in the install script are now conda packages that can just be used.

I hope this makes sense.
Please try installing cenote-taker2 again, and let me know if you have any other questions or issues.

Thank you,
Natasha

@mtisza1
Owner

mtisza1 commented Apr 6, 2020

OK I've finally gotten around to trying this. It seems to work! I'll make a note on the repo about this option. I appreciate that you took the time to explain this to me, and I think this is a valuable addition.

Best regards,

Mike

P.S. I'm closing this issue.

@mtisza1
Owner

mtisza1 commented May 10, 2022

Hi Natasha,

It's been a couple years and there have been many updates to Cenote-Taker 2, now at v2.1.5. Occasionally, I'm getting questions from people who are using the bioconda package you created, and it's difficult for me to give guidance about updates to them.

You are probably quite busy, but I was wondering if you'd be able to help me update the bioconda package. Overall, the install is much easier than it was 2 years ago.

Best,

Mike

@mtisza1 mtisza1 reopened this May 10, 2022