Skip to content
This repository has been archived by the owner on Aug 31, 2022. It is now read-only.

Commit

Permalink
Sloika 2.1 release
Browse files Browse the repository at this point in the history
  • Loading branch information
tmassingham-ont committed Jul 24, 2018
0 parents commit 5874bbd
Show file tree
Hide file tree
Showing 89 changed files with 8,790 additions and 0 deletions.
13 changes: 13 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
*.pyc
*~
*.sw?
*.so
*.egg-info
build
dist
sloika/viterbi_helpers.c
sloika/version.py
run
.cache
.pytest_cache
.tox
136 changes: 136 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,136 @@
Release 2.0
===========

* Migrate to python3 in full and stop supporting python2
* Open-source project


Release 1.2 brown bag 1
=======================

* Bug fixes to chunkify from raw data
* Take the `--drop` into account when reporting label accuracy during training


Release 1.2
===========

* Can train and basecall off raw data
* Examples in `models/` directory use temporal convolution to extract features from the raw signal
* Training data may be labelled from events or remapped on the fly using a pretrained raw data model
* The interfaces for most scripts have changed to support raw data:
* `./bin/basecall_network` and `./bin/train_network.py` provide `raw` and `events` routes
* `./bin/chunkify.py` provides `identity`, `remap`, `raw_identity` and `raw_remap` routes
* `./bin/basecall_network.py` may be used via `./bin/basecall_network` entry point that sets up its environment
* Full builds are performed on CI only for `master`, `release` and branches with names having `ci_` prefix
* Layers:
* All layers now have `insize` and `size` attributes; models pickled with previous versions of Sloika will not be unpicklabe in Sloika 1.2
* Added `Convolution` and `MaxPool` layers
* Improvements to `pekarnya.py`:
* Updated database schema
* Runs are marked with time stamps and git commits
* Uniquely generated output directories facilitate restart of failed jobs
* New script `./bin/extract_reference.py` for extraction of references from a directory of fast5 files


### Minor changes

* Minimum required numpy version bumped to 1.9.0
* `verify_network.py` tests network execution on random inputs
* `align.py` outputs reference coordinates
* `align.py` fixed under python3
* Randomly choosing chunk size during training is now the default


Release 1.1 brown bag 3
=======================

Addresses the following issue:
* Make Sloika 1.1 avoid h5py version 2.7.0. This version works in Dev environment, but we are unable to import it on CI nodes
https://github.com/h5py/h5py/issues/803


Release 1.1 brown bag 2
=======================

Addresses the following issues:
* Brown bag 1 did not update changelog
https://git.oxfordnanolabs.local/algorithm/sloika/issues/43
* Fixes TypeError in json dump script when invoked with `--out_file` option
https://git.oxfordnanolabs.local/algorithm/sloika/issues/52


Release 1.1 brown bag 1
=======================

Addresses the following issues:
* Remapping doesn't appear to work with 2d reads
https://git.oxfordnanolabs.local/algorithm/sloika/issues/40
* Can't change segmentation in chunkify remap
https://git.oxfordnanolabs.local/algorithm/sloika/issues/41
* Work around Isilon problems
https://git.oxfordnanolabs.local/algorithm/sloika/issues/42


Release 1.1
===========

* Activation functions have been separated into their own module and many new functions have been added
See https://wiki/display/~tmassingham/2016/10/17/Activation+functions
Note: this rearrangement breaks compatibility with older model pickle files
* Refactoring of `NBASE` constant
Now a single source of responsibility `sloika/variables.py`
Models importing `_NBASE` from `sloika/module_tools.py` should now import `NBASE` instead
* Default for training and basecalling are transducer based models
* Compilation of networks is handled automatically by `basecall_network.py`
* Compiled network may be saved for future use
* `compile_network.py` executable has been removed
* Recurrent layers
* New recurrent unit types have been added
* Detailed tests to ensure recurrent layers work
* Type of gate function is now an option on layer initialisation
* Pekarnya server for scheduling model training jobs
https://wiki/display/RES/Pekarnya
* Considerable work on the building and testing infrastructure
* Stable and development branches were created
* Binary artefacts are built for each commit in development branch
* Artefacts are automatically versioned in development branch
* Unit and acceptance tests are exercising artefact before it is marked as a release candidate
* Remapping using RNN from fast5 directly to chunks
* `chunkify.py`
* `chunkify.py identity` has similar behaviour to `chunk_hdf5.py`
* `chunkify.py remap` will remap a directory of fast5 files using a transducer RNN before chunking
* `remap_hdf5.py` and `chunk_hdf5.py` removed in favour of `chunkify.py`
* Per chunk normalisation optional `--normalisation`
* Default is still to normalise over entire read
* Chunk size for training can be randomly selected from batch to batch
* `--chunk_len_range min max`
* Default is to always train with maximum possible chunk size
* Chunk size chosen uniformly in specified interval
* Edge events are not used when assessing loss function
* `--drop n`
* Default to drop 20 events from start and end before assessing loss


### Minor changes

* Changed default trimming from ends of sequence
* Fix to allow trimming of zero events
* Minimum read length (in events) for chunking to take place
* Removed vestigial `networks.py` file that has been replaced by the contents of the `models/` directory
* Seed for random number generator can be set on command line of `train_network.py`
* Enable HDF5 compression
* Fix to ensure every chunk starts with a non-zero (not stay) label
* Trim first and last events from loss function calculation (burn-in)
* Fix bug in how kmers are merged into sequence in low complexity regions
* Increased PEP8 compliance
* Default location of segmentation information has changed (see Untangled 0.5.1)
* Location of segmentation information can now be given as commandline option in many programs
* Trainer copies logging information to stdout. May be silenced with `--quiet`
* JSON may be dumped to file rather than stdout


Release 1.0
===========

Initial release
Loading

0 comments on commit 5874bbd

Please sign in to comment.