Skip to content

Commit

Permalink
Merge pull request #55 from hoelzer-lab/issue/filter_aln_failed
Browse files Browse the repository at this point in the history
Hot fix for issue #54, skipping tree calc if no core gene MSA
  • Loading branch information
hoelzer authored Aug 23, 2023
2 parents 6643ea4 + e9b8d0b commit 0131fc2
Show file tree
Hide file tree
Showing 3 changed files with 30 additions and 10 deletions.
31 changes: 21 additions & 10 deletions modules/filter_alignment.nf
Original file line number Diff line number Diff line change
Expand Up @@ -16,22 +16,33 @@ process filter_alignment {
for file in ${aln}; do
NUM=\$(grep -c '>' \$file)
STRAINS=\$(cat ${strain_ids} | wc -l)
if [ \$NUM -eq \$STRAINS ]; then
# calculate cutoff for how many species are needed to define a core gene
# default is 1.0, but can be lowered by the user
t=\$(echo \$STRAINS*$params.core_perc | bc)
# round, everything equal or below .5 will be rounded down, otherwise up
# printf "%.0f" 26.4 == 26
# printf "%.0f" 26.52 == 27
STRAINS_CUTOFF=\$(printf "%.0f" \$t)
if [ \$NUM -ge \$STRAINS_CUTOFF ]; then
cp \${file} "\$(basename \${file} .aln)"_core.aln
sed -r -i '/>/ s/_[^_]+\$//' "\$(basename \${file} .aln)"_core.aln
sed -r -i '/^[^>]/ s/-/X/g' "\$(basename \${file} .aln)"_core.aln
fi
done
for file in *_core.aln; do
cd-hit -i \$file -o "\$file"_TMP -c 1.0
RECORDS=\$(grep -c ">" "\$file"_TMP)
if [ \$RECORDS -ne 1 ]; then
cp \$file FINAL_\$file
sed -r -i '/^[^>]/ s/X/-/g' FINAL_\$file
fi
done
# it can happen that there is no MSA with all input species!
if ls ./ | grep '_core.aln' > /dev/null; then
for file in *_core.aln; do
cd-hit -i \$file -o "\$file"_TMP -c 1.0
RECORDS=\$(grep -c ">" "\$file"_TMP)
if [ \$RECORDS -ne 1 ]; then
cp \$file FINAL_\$file
sed -r -i '/^[^>]/ s/X/-/g' FINAL_\$file
fi
done
fi
"""
}

1 change: 1 addition & 0 deletions nextflow.config
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ params {
width = 10
tree = false
bootstrap = 1000
core_perc = 1.0
sets = false
tmlim = 240 // ILP solve time limit in seconds
chunks = 8 // how many ILP chunks for parallel computing
Expand Down
8 changes: 8 additions & 0 deletions ribap.nf
Original file line number Diff line number Diff line change
Expand Up @@ -263,6 +263,14 @@ def helpMSG() {
Sure thing, We will use RAxML for this.
Be aware, this will take a lot of time. [default: $params.tree]
--bootstrap Bootstrap value for tree building (increases time!). Must be >=1000 for IQ-TREE ultra-fast bootstraps [default: $params.bootstrap]
--core_perc Define how many species are required so that a gene is considered a core gene for tree calculation.
Per default, RIBAP will only consider genes that were found in all input genomes (100%).
However, this can cause tree calculation to stop when there are no such core genes.
You can lower the threshold to include more homologous genes into the tree caclulation.
The total input genome number will be multiplied by this value and rounded down when the
results is <= x.5 (e.g., 28*0.9=25.2 --> 25) and up otherwise (e.g., 28*0.95=26.6 --> 27).
All RIBAP groups that are composed of genes from different species equal or greater this number
will be considered in the core gene MSA and tree [default: $params.core_perc]
${c_yellow}UpSet plot:${c_reset}
--sets FASTA simpleNames for genomes that should be
Expand Down

0 comments on commit 0131fc2

Please sign in to comment.