Releases: sourmash-bio/sourmash
v4.4.2
Minor fixes and performance improvements:
- circumvent a very slow MinHash.remove_many(...) call in sourmash gather (#2123)
Developer updates:
- substantial refactoring of CounterGather and related Index code. (#2116)
- update Index protocol tests to include tests for peek and consume (#2111)
- Bump pypa/cibuildwheel from 2.7.0 to 2.8.0 (#2118)
- test insert after downsample for LCA_Database (#2117)
- update release notes & pyproject.toml after v4.4.1 (#2114)
v4.4.1
Major new features:
- less stringent size accuracy parameters for ANI accuracy reporting (#2074)
- only skip dist est if containment/jaccard are 0 or 1 (#2060)
- emit fewer warnings about potential ANI estimation issues (#2061)
Minor new features:
- fix lca summarize to support general collections for queries (#2107)
- add compare --avg-containment (#2056)
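As a rough illustration of what an average containment measure computes, here is a minimal sketch on plain Python sets of hashes. It assumes that "average containment" means the mean of the containment taken in both directions; the function names are hypothetical and this is not sourmash's implementation.

```python
def containment(a: set, b: set) -> float:
    """Fraction of the hashes in `a` that are also present in `b`."""
    return len(a & b) / len(a) if a else 0.0


def avg_containment(a: set, b: set) -> float:
    """Mean of the two directional containments (assumed definition)."""
    return (containment(a, b) + containment(b, a)) / 2.0


# Toy hash sets standing in for two sketches.
sketch_a = {1, 2, 3, 4}
sketch_b = {3, 4, 5, 6, 7, 8, 9, 10}

result = avg_containment(sketch_a, sketch_b)
print(f"avg containment: {result}")
```

Unlike one-directional containment, this value is symmetric, which is what a pairwise comparison matrix needs.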
Documentation updates:
- fix search and gather docs (#2105)
- fix CITATION.cff YAML and add a test for parseability and content. (#2103)
Developer updates:
- move setup.cfg into pyproject.toml (#2097)
- Fix downsample_scaled in core (#2108)
- add picklist tests; support for allow_empty (#2106)
- remove LazyLoadedIndex (#2104)
- Bump web-sys from 0.3.57 to 0.3.58 (#2092)
- Bump getrandom from 0.2.6 to 0.2.7 (#2090)
- Bump wasm-bindgen-test from 0.3.30 to 0.3.31 (#2093)
- Bump pypa/cibuildwheel from 2.6.1 to 2.7.0 (#2089)
- Build: nix updates (#2088)
- CI: split wheel building (#2087)
- rust version bumps (#2086)
- Update sphinx requirement from <5,>=4.4.0 to >=4.4.0,<6 (#2068)
- Bump actions/setup-python from 3 to 4 (#2080)
- Bump myst-parser from 0.17.2 to 0.18.0 (#2081)
- Bump pypa/cibuildwheel from 2.5.0 to 2.6.1 (#2079)
- remove unnecessary object from class definitions (#2077)
v4.4.0
This release contains many new features! Of particular note:
- sourmash now estimates and outputs average nucleotide identity (ANI) based on k-mer measures;
- sourmash sketch translate is no longer unusably slow;
- we provide Mac OS 'arm64' wheels for the new M1 Macs;
- we've added a number of support features for managing large collections of signatures and building very large databases;
- and we've added support for SQLite databases that can be used for storing and searching signatures and doing Kraken-style LCA analysis of genomes and metagenomes.
In addition, we have built updated Genbank genome databases (with contents from March 2022) as well as GTDB R07-RS207 databases; see the prepared databases page. We've also made some benchmarks available for these databases, so you can get some idea of the necessary computational resources for your searches.
Last but by no means least, we have begun providing a number of examples and recipes for using sourmash - see the new sourmash examples Web site!
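To give a rough sense of how a k-mer containment value can be turned into an ANI estimate, here is a minimal sketch. It assumes the simple point estimate ANI ≈ C^(1/k), which follows from treating a k-mer as shared only when all k of its bases match. The function name `containment_to_ani` is hypothetical and is not sourmash's API; the estimator sourmash actually ships (including its accuracy checks and warnings) may differ.

```python
def containment_to_ani(containment: float, ksize: int) -> float:
    """Point estimate of ANI from k-mer containment (illustrative only)."""
    if not 0.0 <= containment <= 1.0:
        raise ValueError("containment must be in [0, 1]")
    if ksize < 1:
        raise ValueError("ksize must be a positive integer")
    # If each base matches independently with probability ANI,
    # a whole k-mer matches with probability ANI**k, so ANI ~ C**(1/k).
    return containment ** (1.0 / ksize)


# Toy hash sets standing in for a query sketch and a match sketch.
query_hashes = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
match_hashes = {1, 2, 3, 4, 5, 6, 7, 8, 20, 30}

containment = len(query_hashes & match_hashes) / len(query_hashes)
ani = containment_to_ani(containment, ksize=31)
print(f"containment={containment:.2f}, estimated ANI={ani:.4f}")
```

Because of the 1/k exponent, even a modest containment corresponds to a high ANI at large k, which is why the estimate becomes unreliable when containment is exactly 0 or 1.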
Major new features:
- add ANI output to search, prefetch, and gather (#1934, #1952, #1955, #1966, #1967, #2011, #2031, #2032)
- new GTDB and Genbank database releases (#2013, #2038)
- provide macos arm64 wheels (#1935)
- support for SQLite databases (#1808)
- implement sourmash sketch fromfile (#1884, #1885, #1886, #2009)
- add sourmash sig check for comparing picklists and databases (#1907, #1915, #1917)
- add sig collect command (#2036) for building standalone manifests from many databases
- Add direct loading of manifest CSVs as sourmash indices (#1891)
- add -A/--abundance-from to sig subtract & add sig inflate (#1889)
- advanced database format documentation (#2025)
Minor new features:
- add -d/--debug to sourmash sig describe; upgrade output errors. (#1782)
- add sum_hashes to sourmash sig describe output. (#1882)
Bug fixes:
- catch TypeError in search w/abund vs flat at the command line (#1928)
- speed up SeqToHashes translate (#1938, #1946)
Cleanup and documentation fixes:
- better handle some pickfile errors (#1924)
- remove unnecessary downsampling warnings (#1971)
- use same wording for dayhoff/hp as for dna/protein (#1929)
- rename covered_bp property to better reflect function (#2050)
Developer updates:
- provide "protocol" tests for Index, CollectionManifest, and LCA_Database classes (#1936)
- remove khmer CI tests (#1950)
- Benchmarks for seq_to_hashes in protein mode (#1944)
- add some tests for Jaccard output ordering (#1926)
- Oxidize ZipStorage (#1909)
- cleanup and commenting of test_index.py tests. (#1898, #1900)
- rationalize _signatures_with_internal (#1896)
- Convert nix to flakes (#1904)
- fix docs build (#1897)
- Fix build/CI and unused imports papercuts (#1974)
- fix hypothesis CI (#2028)
- dependabot version updates (#1977, #1978, #1979, #1980, #1981, #1982, #1983, #1984, #1985, #1986, #1987, #1988, #1989, #1991, #1993, #1994, #1995, #1996, #1997, #1998, #2017, #2019, #2020, #2021, #2022, #2023, #2042)
v4.3.0
New features:
- add sourmash sig grep (#1864)
- add sourmash sig summarize (#1837, #1863)
- add --include-db-pattern and --exclude-db-pattern to many commands (#1871)
- update lca summarize output to output total counts (#1838)
Bug fixes:
- fix sourmash prefetch to work when db scaled is larger than query scaled (#1870)
- fix sourmash prefetch for multiple ksizes in database (#1866)
- allow missing columns in tax CSV files (#1869)
- fix containment calculation for nodegraphs (#1862)
- fix tax prepare SQL code for empty/blank taxonomic ranks (#1843)
Cleanup and documentation fixes:
- clean up 'describe' a little bit, add a test (#1861)
- add --output-dir as alias for every --outdir (#1817)
- fix doc titles in command-line.md and update description a bit (#1874)
Developer updates:
v4.2.4
Medium bug fixes:
- fix bug where sourmash sketch ... --singleton -o output.sig drops signatures (#1810)
- fix sourmash search --containment with two abund signatures (#1780)
- fix plot/labels/CSV ordering with sourmash plot --csv (#1821)
Small bug fixes:
- fix Index.search_abund downsampling and filename output (#1820)
- check to make sure that .zip files exist before trying to load from them (#1777)
- fix and test and refactor output information during signature creation (#1826)
Minor new functionality:
- adjust text output of gather to indicate weighted/unweighted results (#1819)
- update sourmash multigather to save hash abundances to .unassigned.sig (#1720)
- re-inflate prefetch output sketches (#1827)
Cleanup and documentation fixes:
- fix 'sketch' output info (#1794)
- fix PMID for mock metagenome (#1811)
- check to make sure that = is in param strings where necessary (#1775)
Developer updates:
v4.2.3
Minor new features:
- Save prefetch csv directly from prefetch-gather with --save-prefetch-csv (#1765)
- Added brief descriptions and -h/--help text to sourmash gather, search, and compare (#1735)
- Adding bounds checking for --scaled and --num in sourmash sketch (#1711)
Documentation updates:
- update release notes with -m for git tag (#1754)
- update coverage from 10x to 20x per description in documentation page (#1736)
Development updates:
- Update tests to use runtmp fixture instead of utils.TempDirectory() (#1718)
- Refactor ZipFileLinearCollection and SaveSignatures_ZipFile to use ZipStorage (#1598)
- Clippy fixes for 1.57 beta (#1760)
- CI: Update cibuildwheel usage (#1759)
- Replace notify format usage with f-strings instead (#1723)
- CI: Fix build errors with cbindgen (#1713)
- Change sourmash compute to sourmash sketch in test files (#1712)
- Update tests to use runtmp fixture instead of utils.TempDirectory()
v4.2.2
Major new features:
- added functionality to recover original k-mers given hashes - sourmash sig kmers et al. (#1653, #1695, #1701)
Documentation updates:
Minor new features:
- Adjusted dayhoff and hp encodings to tolerate stop codons in the protein sequence (#1673)
Bug fixes and performance improvements:
- Fixed panic bug in sourmash sketch dna with bad input and --check-sequence (#1702)
Refactoring and cleanup:
v4.2.1
This is a bug-fix and performance release of sourmash.
There are no major new features.
git log --oneline v4.2.0..latest
Minor new features:
- new picklist coltypes for directly using gather, prefetch, and manifest outputs without specifying column name (#1660)
- add --from-file to sig cat (#1657)
- implement a lazy/on-demand Index loading class to support low memory tracking of a large index (#1661)
- add sourmash tax prepare to build SQLite taxonomy databases for use with tax commands (#1651)
- Support manifests in MultiIndex (#1654)
- tax summarization additions and fixes, including reporting bp and unclassified (#1667)
- add --from-file, improved sig selection to most sig commands (#1672)
Bug fixes and performance improvements:
- fix bug in gather when run with scaled=1 (#1670)
Documentation updates:
- Add sourmash-bio/community Gitter badge to README (#1658)
Refactoring and cleanup:
- add tests for sourmash tax --containment-threshold arg (#1666)
- fix sourmash tax usage string (#1655)
- add bounds checking for --scaled (#1650)
Rust interface:
- Rust Core update (tag: r0.11.0) (#1643)
v4.2.0
This release adds several significant features: first, we've added a set of taxonomy command-line functionality for combining sourmash gather output with taxonomy databases, and we've also added a new "picklist" feature that enables flexible selection of subsets of databases. Finally, we've added manifests to databases to support picklists as well as faster database loading and signature selection.
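The picklist idea described above can be sketched in a few lines: read one column of identifiers from a CSV file and keep only the database entries that match. This is a conceptual illustration, not sourmash's implementation; the function names, the CSV layout, and the dictionary standing in for a database are all hypothetical.

```python
import csv
import io


def load_picklist(csv_text: str, column: str) -> set:
    """Collect the non-empty values of one CSV column into a set."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column] for row in reader if row.get(column)}


def select(database: dict, picklist: set) -> dict:
    """Keep only the database entries whose name is in the picklist."""
    return {name: sketch for name, sketch in database.items() if name in picklist}


# Hypothetical picklist CSV and toy database (name -> set of hashes).
PICK_CSV = "name,note\ngenomeA,keep\ngenomeC,keep\n"
DATABASE = {"genomeA": {1, 2, 3}, "genomeB": {4, 5}, "genomeC": {6}}

picked = select(DATABASE, load_picklist(PICK_CSV, "name"))
print(sorted(picked))
```

Holding the picklist as a set makes each membership check constant-time, so selection cost scales with the number of database entries rather than with the size of the picklist.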
As of this release, we've also formally moved development over to the sourmash-bio organization on GitHub, and we've created a new gitter support channel, sourmash-bio/community. Please join us there if you have any questions, comments, or feature requests!
Major new features:
- add tax/taxonomy submodule (#1543, #1628, #1630, #1648)
- add picklists for subsetting databases and results (#1587, #1588, #1623, #1590, #1639)
- Add manifests to support fast Index.select(...) and lazy loading (#1590)
Documentation updates:
- Add new GTDB databases description to docs and start legacy databases page (#1581)
- Change dib-lab/ URLs to new sourmash-bio/ URLs. (#1629)
- Add notice for sustainable open source study (#1580)
Minor new features:
- alias --nucleotide, --no-nucleotide for moltype args. (#1632)
- add signature names to known/unknown hash sigs output by sourmash prefetch (#1646)
Bug fixes and performance improvements:
- Speed up sourmash gather with prefetch by ignoring unidentifiable hashes (#1613)
- Check for MinHash compatibility in MinHash.intersection_and_union(...) (#1627)
- Fix selection w/abund and manifest column type conversions (#1645)
Refactoring and cleanup:
v4.1.2
This is a bug-fix and performance release of sourmash.
There are no major new features.
Minor new features:
- add query info to gather CSV output (#1565)
Bug fixes and performance improvements:
- Improved MinHash.remove_many(...) performance by five orders of magnitude (#1571)
- Fix SBT index saving bug that arbitrarily replaced names (but not content) of identical signatures in .sbt.zip files (#1568)
- Empty zipfiles should not cause AssertionError (#1546)
Major refactoring and new internal functionality:
- update MinHash.set_abundances to remove hash if 0 abund; handle negative abundances (#1575)
Refactoring and cleanup: