The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
- Added notes on HTTP(s) server on the HiC page and on the need to move dynamically loaded content when moving the report's HTML file #183
- Fixed an issue where PLOTSR crashed due to a mismatch in the ordering of `syri.out` files when `synteny_plotsr_assembly_order` was not specified #184
- Fixed an issue where a path to HiC FastQ file pairs from the current directory was considered an SRR ID #179
- Fixed edges and input/output arrows in the flowchart #178
- Nextflow!>=24.04.2
- nf-schema@2.1.1
- Added Gfastats #126
- Updated nf-core/template to 3.0.2 #149
- Updated `samtools faidx` to 1.21
- Now using nf-test for pipeline level testing #153
- Added `text/html` as content mime type for the report file #146
- Added a sequence labels table below the HiC contact map #147
- Added parameter `hic_samtools_ext_args` and set its default value to `-F 3852` #159
- Added the HiC QC report to the final report so that users don't have to navigate to the results folder #162
- Added the fastp log to the final report #163
- Updated the tube map along with the tool list #166
- Added Orthofinder #167
- Changed order of tool options in the `nextflow.config` file
- Updated PFR's Kraken 2 database to `k2_pluspfp_20240904` #170
- Increased memory requirement for Kraken 2 to `256.GB`
- Fixed a bug where Gene score distribution graph did not appear correctly #125
- Increased memory requirement for `DNADIFF` to avoid SLURM OOM kills with exit code 2 #141
- Documented the explicit use of the `-revision` parameter #160
- Now using `_JAVA_OPTIONS` in module `RUNASSEMBLYVISUALIZER` to avoid user preferences related errors
- Nextflow!>=24.04.2
- nf-schema@2.1.1
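
The default `-F 3852` for `hic_samtools_ext_args` noted above is a SAM flag bitmask. As a minimal sketch (the `decode_flag` helper is hypothetical and not part of the pipeline; the bit meanings follow the SAM format specification), the value can be decoded to see which alignments such a filter would exclude:

```python
# Bit meanings as defined in the SAM format specification.
SAM_FLAGS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails platform/vendor quality checks",
    0x400: "read is PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(value):
    """Return the names of all SAM flag bits set in `value` (hypothetical helper)."""
    return [name for bit, name in SAM_FLAGS.items() if value & bit]


# 3852 = 0x4 + 0x8 + 0x100 + 0x200 + 0x400 + 0x800
excluded = decode_flag(3852)
for name in excluded:
    print(name)
```

With `samtools view -F`, an alignment is dropped if any of the listed bits is set, so this default would remove unmapped reads, reads with unmapped mates, secondary and supplementary alignments, QC failures, and duplicates.
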
- Reduced the GenomeTools stats figures to 300 DPI #142
- Now `synteny_mummer_min_bundle_size` is set to `1000000` by default #142
- `results` is not the default output directory anymore
- Removed a number of unnecessary parameters: `monochromeLogs`, `config_profile_contact`, `config_profile_url`, `validationFailUnrecognisedParams`, `validationLenientMode`, `validationSchemaIgnoreParams`, `validationShowHiddenParams`, `validate_params`
- Resource parameters have been removed: `max_memory`, `max_cpus`, `max_time`
- Configured nf-test for function testing
- Made the `hic` param pattern more flexible as `^SR\w+$|^\S+\{1,2\}[\w\.]*\.f(ast)?q\.gz$` #130
- Fixed flowchart syntax to remove '\n' #132
- Updated modules to remove Bioconda `defaults` channel #135
- Now gff files for circular molecules can have end coordinates greater than the sequence length #129
- Fixed the branch protection GitHub action
- Nextflow!>=23.04.0
- nf-validation@1.1.3
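
The `hic` param pattern listed above can be explored outside the pipeline. The following is a minimal sketch using Python's `re` module (the pipeline itself validates the parameter through its schema, so treat the Python semantics as an approximation); the helper name `is_valid_hic` and all example values are hypothetical:

```python
import re

# Pattern copied verbatim from the changelog entry (#130).
# First alternative: an SRA-style ID starting with "SR".
# Second alternative: a path containing the literal glob "{1,2}"
# and ending in .fq.gz or .fastq.gz.
HIC_PATTERN = re.compile(r"^SR\w+$|^\S+\{1,2\}[\w\.]*\.f(ast)?q\.gz$")


def is_valid_hic(value):
    """Return True if `value` matches the hic pattern (hypothetical helper)."""
    return HIC_PATTERN.search(value) is not None


# Hypothetical example values, for illustration only.
examples = [
    "SRR8238190",
    "hic/sample_R{1,2}.fastq.gz",
    "sample_{1,2}.trimmed.fq.gz",
    "sample_1.fastq.gz",  # no {1,2} glob, so it is rejected
]
for value in examples:
    print(value, is_valid_hic(value))
```

Note that `\{1,2\}` matches the literal characters `{1,2}`, not a repetition count, so a paired FastQ path is expected to be written as a single glob covering both mates.
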
- Created summary presence/absence tables for NCBI FCS modules #88
- Added min. system requirements #91
- Added a test to verify the fix for the bug which resulted in a pipeline crash for assemblies without LTRs
- Updated NCBI FCS GX to 0.5.4 #93
- Updated `SYRI` to 1.7.0 #104
- Added a script to automatically check for updates on GitHub/GitLab and post issues
- Updated modules: `UNTAR`, `MERYL_COUNT`, `GUNZIP`, `MINIMAP2_ALIGN`, `FASTQC`
- Fixed a bug where `intron_length_distribution` was used instead of `cds_length_distribution` when creating the CDS Length Distribution Graph #95
- Fixed a bug where 'Subsequent pipeline modules are skipped.' was printed in the `report.html` even when `contamination_stops_pipeline` was set to false
- Now NCBI FCS GX module uses all the cores available from the Nextflow task
- Fixed a bug which caused `PLOTSR` to fail for certain assembly names #102
- Now `LTRRETRIEVER_LTRRETRIEVER` does not crash when the input assembly does not contain any LTRs #92
- Now `LTRRETRIEVER_LTRRETRIEVER` does not crash when the input assembly is not writable #98
- Now soft masked regions are unmasked before computing LAI #117
- Fixed a bug in `ASSEMBLATHON_STATS` which caused it to fail on MMC executor due to multiple binds of the `bin` directory
- Changed `NextFlow` to `Nextflow`
- Updated citation to Bioinformatics
- Nextflow!>=23.04.0
- nf-validation@1.1.3
- Changed default branch name from `master` to `main` in nf-core template files
- Moved `version_check.sh` to `.github/version_checks.sh`
- Moved `docs/contributors.sh` to `.github/contributors.sh`
- Removed dependency on https://github.com/PlantandFoodResearch/nxf-modules.git
- Replaced `nf-core/fastq_trim_fastp_fastqc` with `nf-core/fastq_fastqc_umitools_fastp` which has nf-test unit tests
- Removed version check on README.md
- Updated nf-core/template to 2.14.1
- Removed release-announcements GitHub workflow
- Added a list of nf-core contributors
- Added a launcher script for local testing `local_assemblyqc`
- Added a custom `BUNDLELINKS` module which respects direction when bundling `DNADIFF` links #82
- Added the ability to create linear synteny plot in addition to the circos plot #74
- Updated modules and sub-workflows: `BWA/INDEX`, `BWA/MEM`, `CAT/CAT`, `CUSTOM/RESTOREGFFIDS`, `CUSTOM/SHORTENFASTAIDS`, `GT/GFF3`, `GT/GFF3VALIDATOR`, `GT/STAT`, `LTRFINDER`, `LTRHARVEST`, `LTRRETRIEVER/LAI`, `LTRRETRIEVER/LTRRETRIEVER`, `SAMBLASTER`, `FASTA_LTRRETRIEVER_LAI`, `FASTQ_BWA_MEM_SAMBLASTER`, `GFF3_VALIDATE`, `CUSTOM/SRATOOLSNCBISETTINGS`, `FASTP`, `FASTQC`, `UNTAR`, `SEQKIT/SEQ`, `SEQKIT/SORT`, `FASTA_EXPLORE_SEARCH_PLOT_TIDK`
- Now the `contamination_stops_pipeline` flag allows the pipeline to continue if contamination is detected. Its default value is `true` #54
- Now fasta ids are sorted in natural order for the HiC module #76
- Now using `FASTQ_DOWNLOAD_PREFETCH_FASTERQDUMP_SRATOOLS` for SRA downloads
- Added `MERQURY` module #85
- Replaced `GFF3_VALIDATE` sub-workflow with `GFF3_GT_GFF3_GFF3VALIDATOR_STAT`
- Replaced local `BUSCO` module with `FASTA_GXF_BUSCO_PLOT` sub-workflow #75
- Replaced local `NCBI_FCS_ADAPTOR` with nf-core module and updated to 0.5.0 which includes additional adaptors for PacBio and Nanopore technologies #55
- Added PLOTSR #77
- Added JADWOS01 assembly to xrefsheet for successfully running PLOTSR
- Now detecting duplicate sequences with `SEQKIT/RMDUP` #64
- Fixed a bug which caused NCBI_FCS_GX to not resume #80
- Restored the original version of `nf-core/subworkflows/fastq_trim_fastp_fastqc`
- Fixed nf-core linting
- Updated `tower.yml`
- Updated LICENSE copyright to Copyright (c) 2024 The New Zealand Institute for Plant and Food Research Limited #81
- `RUNASSEMBLYVISUALIZER` is now single threaded for successful execution on both Linux and MacOS
- Fixed java memory overflow issues in `RUNASSEMBLYVISUALIZER`
- Updated `FASTA_LTRRETRIEVER_LAI` to fix a pipeline crash when `ch_monoploid_seqs` was `[ meta, [] ]` #83
- Improved input assembly documentation #86
- Added assembly tag to synteny warning message regarding missing `synteny_labels` file
- Now copying files in `NCBI_FCS_GX_SETUP_SAMPLE` rather than symlinking in an attempt to support Nextflow Fusion
- Nextflow!>=23.04.0
- nf-validation@1.1.3
- Removed `CIRCOS_BUNDLELINKS` module
- Now the default value of `synteny_plot_1_vs_all` is false
- Replaced module `CUSTOM/CHECKGFF3FASTACORRESPONDENCE` with a local groovy function in `GFF3_GT_GFF3_GFF3VALIDATOR_STAT` sub-workflow
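
Natural-order sorting of FASTA ids, as mentioned above for the HiC module, places `chr10` after `chr9` rather than after `chr1`. The pipeline's own implementation is not shown in this changelog; the following is only a generic sketch of the technique, with a hypothetical `natural_key` helper and made-up ids:

```python
import re


def natural_key(name):
    """Split `name` into text and digit runs so digit runs compare as integers."""
    # re.split with a capturing group keeps the digit runs in the result,
    # e.g. "chr10" -> ["chr", "10", ""].
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r"(\d+)", name)]


# Hypothetical sequence ids, for illustration only.
ids = ["chr10", "chr2", "chr1", "chr9"]

print(sorted(ids))                   # ['chr1', 'chr10', 'chr2', 'chr9']
print(sorted(ids, key=natural_key))  # ['chr1', 'chr2', 'chr9', 'chr10']
```

Because the split always alternates text and digit runs starting with a (possibly empty) text run, keys from different ids compare position by position without mixing strings and integers.
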
- Now it is possible to skip FASTP and FASTQC for the HIC module
- Renamed ASSEMBLY_QC workflow to ASSEMBLYQC
- Now using nf-core/FASTA_EXPLORE_SEARCH_PLOT_TIDK
- Now redirecting validation errors to AssemblyQC report
- Simplified layout of CITATIONS.md file
- Now using pfr/gff3_validate sub-workflow for gff3 validation
- Now listing software versions from the versions.yml file
- Replaced custom GUNZIP module with nf-core/gunzip
- Replaced custom gt/stat with pfr/gt/stat
- Replaced custom fasta_validator with nf-core/fastavalidator
- Added pre-commit version checking
- Now gt/stat reports extended stats and multiple distribution plots have been added to the report
- Added a tools tab to the report which lists the tools used by the pipeline to create the report
- Refactored and cleaned data flows for all the custom sub-workflows
- Started using nf-core template
- Started using semantic versioning
- Moved all Python-dependent packages to 'docker.io/gallvp/python3npkgs:v0.6'
- All modules are now emitting versioning information
- Fixed a bug which caused LAI to run with null assembly fasta
- Fixed FASTA_LTRRETRIEVER_LAI sub-workflow so that it respects the `monoploid_ids` parameter.
- Nextflow!>=23.04.0
- nf-validation@1.1.3
- Removed BIOCODE GFF3 STATS owing to its frequent failures
- Docker engine is now also supported
- Added Amazon Genomics CLI project file and a minimal test params file: ./docs/test_params/test_agc.json
- Downgraded to Nextflow 22.04.3
- Removed container setup process from NCBI_FCS_ADAPTOR workflow
- The pipeline does not download the kraken database anymore
- Fixed a bug in SYNTENY/DNADIFF module which caused failure on AWS Batch
- Now tar zipped database can be directly used with Kraken2
- Removed `db_manifest_url` parameter for the NCBI_FCS_GX workflow
- Now using parallel version of LTRHARVEST from the EDTA package
- BWA_INDEX_AND_MEM can now run for two days
- Now using FASTQ_BWA_MEM_SAMBLASTER subworkflow to optimize SAM file transfer on AWS
- Switched to apptainer from singularity
- Now requiring Nextflow/23.04.4
- Simplified output directory from `outdir.main` to `outdir`
- Changed profile name from slurm to pfr
- Now using APPTAINER_BINDPATH to provide TMPDIR
- Integrated and tested FASTA_LTRRETRIEVER_LAI to replace EDTA_LAI sub-workflow
- Corrected LAI version to beta3.2
For a ~600 MB assembly, EDTA (without the sensitive flag) takes ~25 hours of compute time, whereas the FASTA_LTRRETRIEVER_LAI sub-workflow (LTRHARVEST + LTRFINDER -> LTRRETRIEVER) takes ~2.5 hours. LAI estimates for four plant assemblies are listed below.
Assembly | EDTA_LAI | FASTA_LTRRETRIEVER_LAI |
---|---|---|
ck6901m/v2 | 18.43 | 16.19 |
donghong/v1 | 19.03 | 16.85 |
red5/v2.1 | 18.75 | 16.59 |
tair/v10 | 18.06 | 17.42 |
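
To put the table above in perspective, the per-assembly differences and the approximate compute-time ratio can be worked out directly from the reported numbers. This is only an arithmetic sketch over the four values listed; it makes no claim about other assemblies:

```python
# (EDTA_LAI, FASTA_LTRRETRIEVER_LAI) pairs copied from the table above.
lai = {
    "ck6901m/v2": (18.43, 16.19),
    "donghong/v1": (19.03, 16.85),
    "red5/v2.1": (18.75, 16.59),
    "tair/v10": (18.06, 17.42),
}

# Difference between the two estimates for each assembly.
diffs = {name: round(edta - ltr, 2) for name, (edta, ltr) in lai.items()}
mean_diff = sum(diffs.values()) / len(diffs)

# Approximate compute times quoted for a ~600 MB assembly, in hours.
speedup = 25 / 2.5

for name, diff in diffs.items():
    print(f"{name}: EDTA_LAI is higher by {diff:.2f}")
print(f"mean difference: {mean_diff:.3f}")
print(f"approximate speed-up: {speedup:.0f}x")
```

For these four assemblies, FASTA_LTRRETRIEVER_LAI is lower than EDTA_LAI by between 0.64 and 2.24 LAI units, in exchange for roughly a tenfold reduction in compute time.
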
- Now running kraken2 with a single cpu.
- Now pulling containers from https://depot.galaxyproject.org/singularity/
- Now pipeline timeline, report, and trace are enabled by default.
- Included `procps` package where needed to allow Nextflow to collect system statistics.
Same as v1rc6c
- Added logic for the `-mono` parameter in LAI. This parameter allows correct LAI calculation for polyploid assemblies.
- Fixed the typo in `assemblathon_stats` in nextflow.config.
- Fixed the test_full.config example config and docs to exclude the mitochondrion genome from synteny and LAI modules.
- Now saving `*.EDTA.TEanno.gff3` and `*.EDTA.intact.gff3` with original fasta ids.
- Removed comments from the ID lines of the FASTA file before running LAI.
- Now presenting the PARAMS page as formatted JSON rather than a table.
- Now SAMBLASTER can run up to 20 hours.
- (RC6b) NCBI FCS GX taxonomy is now presented as a Krona plot. (RC6c) No hits are included. Sequence length is used when calculating abundance.
- (RC6c) Krona plot for Kraken2 now uses sequence length for abundance calculation.
- Made ASSEMBLATHON_STATS robust to missing paths declared in the PATH variable.
- Updated README in accordance with SPO Editor.
- Added a note on LTR sequence identity in the nextflow.config.
- Split MATLOCK_BAM2_JUICER module into MATLOCK_BAM2_JUICER and JUICER_SORT, and now using `--parallel` with `sort`.
- Fixed a bug in the BIOCODE GFF3 STATS module which resulted in a cramped up plot of CDS vs mRNA counts.
- Fixed a bug in the BIOCODE GFF3 STATS module which prevented it from processing valid gff3 files.
- Added labels to the pipeline flowchart.
- Updated the README based on team feedback.
- Added validation for fasta and gff3 files.
- Added support for compressed files (fasta.gz, gff3.gz).
- Added BIOCODE GFF3 STATS.
- Added correspondence checks between gff3 and fasta files.
- Now using standard mode as default for LAI.
- Added information regarding LAI:EDTA time requirements for various genome sizes.
- Added information regarding influence of LAI:EDTA:is_sensitive flag on LAI scores.
- Added a params summary page.
- Now the default config file (nextflow.config) is designed to run out-of-the-box at PFR. There is no need to do any setup.
- "report" is now the default results folder.
- Added documentation and configuration files for examples based on publicly accessible data from NCBI.
- Added test configurations for Fungal, Bacterial, and Viral assemblies.
- Added test configuration for a Transcriptome of a Nematode.
- Now allowed up to 7 days for SYNTENY::DNADIFF based on recent evidence from two ~2.5 GB genomes.
- CRITICAL: Fixed a bug in LAI::EDTA which prevented it from renaming fasta ids in case they were longer than 13 characters.
- Now NCBI FCS Adaptor and NCBI FCS GX both run in parallel so that both contamination checks are part of the final report even if there is adaptor contamination.
- CRITICAL: Fixed a bug in LAI::EDTA which prevented it from renaming fasta ids in case they were longer than 13 characters.
- Now the HiC module does not require the storage_server parameter and the HiC contact map does not disappear when the report is moved across folders.
- Further developed the tutorials section.
- Improved presentation of tables for BUSCO and LAI in the report.
- CRITICAL: Fixed a bug in LAI::EDTA which prevented it from renaming fasta ids in case they were longer than 13 characters.
- CRITICAL: Fixed a bug in LAI::EDTA which prevented it from accessing the tmp directory.
- BREAKING: Merged the max_resources config file into the main config file. Slight modifications are required when using the same config file across versions.
- Now using a central location for assembly_qc singularity containers (/workspace/assembly_qc/singularity) so that individual users don't have to download these containers.
- Increased resources for the nextflow process so that it can run child processes effectively.
- Now using nf-core's convention for resource allocation and error strategy.
- Removed the option to enable hyper-threading.
- Now only saving the renamed.ids.tsv instead of the whole fasta file from EDTA.
- Now also saving the EDTA.intact.gff3 file as EDTA sometimes does not store all the annotations in the EDTA.TEanno.gff3 file.
- CRITICAL: Fixed a bug in RUN_ASSEMBLY_VISUALIZER, HIC_QC introduced by the specification of the temporary directory in version 0.10.4.
- MATLOCK_BAM2_JUICER now has two hours time limit.
- Removed dependency on conda. Instead the pipeline now requires vanilla python > 3.7. No specific python packages are required.
- Started adding detailed tutorials.
- Now TIDK supports a filter by size parameter to filter out small contigs from its output. By default this filter is turned off.
- Moved the main workflow into `workflows/assembly_qc.nf` so that it can be imported by other Nextflow pipelines.
- Fixed a bug in synteny due to which the pipeline did not resume properly sometimes.
- The included binaries now have unique versions to avoid collision with binaries with same names already present on local PATH.
- Now using a unique name for the conda environment to have better interoperability across pipelines.
- Merged configuration files for compiled and max_resources.
- CRITICAL: Now explicitly setting the temporary directory to avoid "No space left" errors. This problem may have affected container build and NCBI FCS Adaptor/GX modules in the past.
- Now reporting max_gap and min_bundle size in the report for improved readability.
- Improved annotation of the config file.
- Now using natural sort in the synteny color generator so that chr10's color is assigned after chr9's color.
- Removed global variable definitions in the synteny module in the hope of improving resume-ability.
- Now all the processes have unique tags. This ensures traceability and resume-ability.
- CRITICAL: Fixed a bug in the HIC module due to which the pipeline failed to resume properly in some cases. This bug may have also caused mislabelling of the output hic file such that `hap1.hic` may be labelled as `hap2.hic` and vice versa.
- Added GPLv3 license.
- Now assembly tags in the dropdown menus of the report are in natural sort order.
- Allowed 2 hours for DNADIFF and CIRCOS_BUNDLE_LINKS modules.
- Contigs are now ordered by number on the synteny plot.
- Added `color_by_contig` option to the synteny module along with a maximum contrast color generator.
- Fixed a bug in the TIDK module which resulted in genome fasta file emptying in some cases.
- Added a contributors section to README.md
- Generalized and simplified configuration parameters and annotations.
- Fixed a bug in synteny analysis where `between_target_asm` flag had no effect.
- Updated Juicebox.js to 2.4.3 so that HIC module works behind a VPN.
- Sorted the list of synteny plots.
- Removed auto-capitalization of text in the first column of report tables.
- Fixed a bug in the synteny module which resulted in incorrect inclusion of target sequences in 1-vs-all synteny maps.
- In the synteny plot, label font size and ticks are now responsive to the number of sequences.
- Added the `plot_1_vs_all` option in the synteny module.
- Added `max_gap` and `min_bundle_size` options to the synteny module.
- Added Synteny Analysis.
- Added "-q" and "-qq" option to LAI. "-qq" is the default.
- Now copying the *.TElib.fa file from EDTA work dir to the results folder.
- Fixed the n_limit bug in assemblathon_stats.pl.
- Now using 4-hour time limit for FASTP and FASTQC.
- Added references for all the tools in the README.
- Now the conda environment is saved in the user's home directory so that it can be shared across pipeline runs.
- Updated Juicebox.js to 2.4.1.
- Allowed 8 hours for BWA MEM.
- Fixed a bug in LAI where the output was not parsed correctly due to file name mismatch.
- Added NCBI FCS GX module.
- Added additional annotation to config file.
- Removed unnecessary species argument in BUSCO module.
- Moved NCBI FCS Adaptor/GX scripts to user home directory for sharing across pipeline downloads to different directories.
- Now using system-wide DBs for BUSCO and KRAKEN2.
- Added HiC Contact Map module.
- Further simplified and annotated the config file.
- Fixed a potential bug in ncbi fcs adaptor.
- Fixed rm -f bug in KRAKEN2.
- Added additional info for LAI
- Fixed a few typos in the config file.
- Fixed a bug in the slurm job submission script.
- Fixed a bug in the ASSEMBLATHON_STATS module.
- Fixed a bug in SETUP_KRAKEN2_DB module.
- Now using uniform naming in the TIDK sub-workflow.
- Max time for LAI now set to 2 hours.
- Added Kraken2 and NCBI FCS Adaptor tools.
- Added Assemblathon stats.
- Added `Genometools gt stat` statistics for gff3 files.
- Added both a priori and a posteriori sequence search in TIDK.
- Simplified pipeline flow chart.
- Simplified conda environment.
- Fixed css styling browser conflicts
- TIDK process now uses a container instead of conda.
- Included results_dict and dependencies dict (without html formatting) to json.
- Removed completed items in readme.
- Fixed json dump repeating image url.
- Added LAI.
- Now sorting sequences by size before feeding to TIDK.
- Added skip switches for all the tools.
- Added configuration annotations.
- Optimised resource allocation.
- Changed report parsers to allow alphanumeric ([a-zA-Z0-9_]) characters in the haplotype names.
- Added TIDK
- Added ability to run BUSCO for multiple augustus species simultaneously
- Formatted tabs into a drop down list for ease of navigation
- Summary page has been added
- BUSCO plots are now rendered on the summary page
- Styling has been changed for better user experience
- Added ability to run BUSCO for multiple haplotypes simultaneously
- Updated README for new functionality
- Adjusted styling for easier comparisons between reports
- Incorporated conda instead of python venv
- Added ability to run BUSCO for multiple lineages simultaneously
- Removed intermediary outputDir
- Standardised naming conventions across the tool
- Updated README for new functionality
- Changed report.html layout to tab view