- Fixed bug in the kekulization of molecules with radicals (thanks Olabisi-Aishat-Bello for reporting, thanks Robert Pollice for fixing)
- Fixed constraints for validity of molecules with charged C, P or S, to align with RDKit's definition of validity.
- Fixed recursion bug for very long molecules (thanks haydn-jones)
- Added a warning when the dot symbol (`.`) appears in peculiar cases (thanks vandrw)
- Fixed index bug in attribution
- Dropped support for Python 3.5-3.6 and will continue to support only current Python versions.
- Optional attribution to map the encoder/decoder output string back to the input string (issues #48, #79).
- Improved SMILES parsing (by using adjacency lists internally), with tighter error handling (e.g. issues #62 and #60).
- Faster and improved kekulization algorithm (issue #55 fixed).
- Support for symbols that are constrained to 0 bonds (e.g., `[CH4]`) or >8 bonds (users can now specify custom bond constraints with over 8 bonds).
- New `strict=True` flag to `selfies.encoder`, which raises an error if the input SMILES violates the current bond constraints. `True` by default; can be set to `False` for a speed-up (if the SMILES are guaranteed to be correct).
- Added bond constraints for B (max. 3 bonds) to the default and preset constraints.
- Updated the syntax of SELFIES symbols to be cleaner and more readable.
  - Removing `expl` from atomic symbols, e.g., `[C@@Hexpl]` becomes `[C@@H]`
  - Cleaner branch symbols, e.g., `[BranchL_2]` becomes `[=BranchL]`
  - Cleaner ring symbols, e.g., `[Expl=RingL]` becomes `[=RingL]`
  - If you want to use the old symbols, use the `compatible=True` flag to `selfies.decoder`, e.g., `sf.decoder('[C][C][Expl=Ring1]', compatible=True)` (not recommended!)
- More logically consistent behaviour of `[Ring]` symbols.
- Standardized SELFIES alphabet, i.e., no two symbols stand for the same atom/ion (issue #58), e.g., `[N+1]` and `[N+]` are equivalent now.
- Indexing symbols are now included in the alphabet returned by `selfies.get_semantic_robust_alphabet`.
- Removed `constraints` flag from `selfies.decoder`; please use `selfies.set_semantic_constraints()` and pass in `"hypervalent"` or `"octet_rule"` instead.
- Removed `print_error` flag in `selfies.encoder` and `selfies.decoder`, which now raise errors `selfies.EncoderError` and `selfies.DecoderError`, respectively.
- Potential chirality inversion of atoms making ring bonds (e.g. `[C@@H]12CCC2CCNC1`): fixed by inverting their chirality in `selfies.encoder` such that they are decoded with the original chirality preserved.
- Failure to represent mismatching stereochemical specifications at ring bonds (e.g. `F/C=C/C/C=C\C`): fixed by adding new ring symbols (e.g. `[-/RingL]`, `[\/RingL]`, etc.).
- Decoder option for relaxed hypervalence rules, `decoder(..., constraints='hypervalent')`
- Decoder option for strict octet rules, `decoder(..., constraints='octet_rule')`
- Fixed constraint for Phosphorus
- Support for aromatic Si and Al (not officially supported by Daylight SMILES, but RDKit supports them and examples exist in PubChem).
- Support for aromatic Te and triple bonds.
- Built-in conversion from SELFIES to one-hot encoding, and from one-hot encoding back to SELFIES.
- Added default semantic constraints for charged atoms (single positive/negative charge of `[C]`, `[N]`, `[O]`, `[S]`, `[P]`).
- Raised the bond capacity of `P` to 7 bonds (from 5 bonds).
- Fixed bug: `selfies.decoder` did not terminate for malformed SELFIES that are missing the closing bracket `]`.
- Made the code compatible with Python >= 3.5.
- More descriptive error messages.
- Minor bug fixes in the encoder for SMILES ending in branches (e.g. `C(Cl)(F)`), and SMILES with ring numbers between branches (e.g. `C(Cl)1(Br)CCCC1`).
- Minor bug fix with ring ordering in decoder (e.g. `C1CC2CCC12` vs `C1CC2CCC21`).
- Added semantic handling of aromaticity / delocalization (by kekulizing SMILES with aromatic symbols before they are translated into SELFIES by `selfies.encoder`).
- Added semantic handling of charged species (e.g. `[CH+]1CCC1`).
- Added semantic handling of radical species (e.g. `[CH]1CCC1`) or any species with explicit hydrogens (e.g. `CC[CH2]`).
- Added semantic handling of isotopes (e.g. `[14CH2]=C` or `[235U]`).
- Improved semantic handling of explicit atom symbols in square brackets, e.g. carbene (`[C]=C`).
- Improved semantic handling of chirality (e.g. `O=C[Co@@](F)(Cl)(Br)(I)S`).
- Improved semantic handling of double-bond configuration (e.g. `F/C=C/C=C/C`).
- Added new functions to the library, such as `selfies.len_selfies` and `selfies.split_selfies`.
- Added advanced-user functions to the library to customize the SELFIES semantic constraints, e.g. `selfies.set_semantic_constraints`. This allows encoding, for instance, diborane, `[BH2]1[H][BH2][H]1`.
- Introduced a new padding symbol, `[nop]` (no operation).
- Optimized the indexing alphabet (it is base-16 now).
- Optimized the behaviours of rings and branches to fix an issue with specific non-standard molecules that could not be translated.
- Changed behaviour of Ring/Branch, such that states `X9991`-`X9993` are no longer necessary.
- Significantly improved encoding and decoding algorithms; they are much faster now.
- Added function `get_alphabet()`, which returns a list of 29 SELFIES symbols whose arbitrary combination produces >99.99% valid molecules.
- Fixed a bug that occurs when three rings start at one node, and two of them form a double ring.
- Enabled rings with sizes of up to 8000 SELFIES symbols.
- Bug fix for tiny ring to RDKit syntax conversion, spanning multiple branches.
We thank Kevin Ryan (LeanAndMean@github), Theophile Gaudin and Andrew Brereton for suggestions and bug reports.
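Several entries in the list above (`selfies.split_selfies`, `selfies.len_selfies`, the `[nop]` padding symbol, one-hot conversion, and the fix for strings missing a closing `]`) rely on treating a SELFIES string as a sequence of bracketed symbols. The following is a minimal, dependency-free sketch of that idea, assuming a toy vocabulary; `split_symbols`, `symbol_length`, `to_one_hot` and `from_one_hot` are hypothetical helpers written for illustration, not the library's own functions or signatures.

```python
import re
from typing import Dict, List

# A SELFIES string is a concatenation of bracketed symbols, e.g. "[C][=C][O]".
_SYMBOL_RE = re.compile(r"\[[^\[\]]*\]")


def split_symbols(selfies: str) -> List[str]:
    """Split a SELFIES string into bracketed symbols (hypothetical helper)."""
    symbols = _SYMBOL_RE.findall(selfies)
    # Reject stray characters or an unclosed bracket instead of guessing.
    if "".join(symbols) != selfies:
        raise ValueError(f"malformed SELFIES string: {selfies!r}")
    return symbols


def symbol_length(selfies: str) -> int:
    """Length in symbols, not characters (hypothetical helper)."""
    return len(split_symbols(selfies))


def to_one_hot(selfies: str, vocab: Dict[str, int], pad_to: int) -> List[List[int]]:
    """Pad with [nop] to `pad_to` symbols, then one-hot encode each symbol."""
    symbols = split_symbols(selfies)
    if len(symbols) > pad_to:
        raise ValueError("string has more symbols than pad_to")
    symbols += ["[nop]"] * (pad_to - len(symbols))
    rows = []
    for symbol in symbols:
        row = [0] * len(vocab)
        row[vocab[symbol]] = 1  # KeyError if the symbol is not in the vocabulary
        rows.append(row)
    return rows


def from_one_hot(rows: List[List[int]], vocab: Dict[str, int]) -> str:
    """Invert to_one_hot, dropping the [nop] padding."""
    index_to_symbol = {i: s for s, i in vocab.items()}
    symbols = [index_to_symbol[row.index(1)] for row in rows]
    return "".join(s for s in symbols if s != "[nop]")


vocab = {"[nop]": 0, "[C]": 1, "[=C]": 2, "[O]": 3}
rows = to_one_hot("[C][=C][O]", vocab, pad_to=5)
print(symbol_length("[C][=C][O]"), len(rows))  # 3 5
print(from_one_hot(rows, vocab))  # [C][=C][O]
```

Dropping `[nop]` on the way back is this sketch's choice; the list above only states that `[nop]` is a no-operation padding symbol.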
- Enabled `[C@]`, `[C@H]`, `[C@@]`, `[C@@H]`, `[H]` to be used in a semantically constrained way.
We thank Andrew Brereton for suggestions and bug reports.
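The earlier entry stating that the indexing alphabet is now base-16 means that the index symbols following a ring or branch symbol are read as hexadecimal digits. A minimal sketch of that arithmetic follows; the placeholder symbols `[Q0]` to `[QF]`, the helper names, and the treatment of unknown symbols are assumptions for illustration only, not the library's actual index alphabet or API.

```python
from typing import Dict, List, Sequence

# Hypothetical index alphabet: 16 placeholder symbols, one per hex digit.
# The real package defines its own ordered list of index symbols.
INDEX_ALPHABET: List[str] = [f"[Q{d:X}]" for d in range(16)]
INDEX_CODE: Dict[str, int] = {s: d for d, s in enumerate(INDEX_ALPHABET)}


def index_from_symbols(symbols: Sequence[str]) -> int:
    """Read symbols as a big-endian base-16 number.

    Symbols outside the alphabet count as digit 0, a choice made in this
    sketch so that any symbol can serve as an index digit.
    """
    value = 0
    for symbol in symbols:
        value = value * 16 + INDEX_CODE.get(symbol, 0)
    return value


def symbols_from_index(value: int, n_digits: int) -> List[str]:
    """Write `value` as exactly `n_digits` base-16 index symbols."""
    if not 0 <= value < 16 ** n_digits:
        raise ValueError(f"{value} does not fit in {n_digits} base-16 digits")
    digits = []
    for _ in range(n_digits):
        value, d = divmod(value, 16)
        digits.append(INDEX_ALPHABET[d])
    return digits[::-1]  # most significant digit first


print(symbols_from_index(31, 2))  # ['[Q1]', '[QF]']
print(index_from_symbols(["[Q1]", "[QF]"]))  # 31
```

With one, two, or three index symbols, this scheme covers 16, 256, or 4096 distinct values.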
- Decoder: added optional argument to restrict nitrogen to 3 bonds; use `decoder(..., N_restrict=False)` to allow for more bonds (default: `N_restrict=True`).
- Decoder: added optional argument to make the ring function bi-local (i.e. it confirms the bond number at the target). Use `decoder(..., bilocal_ring_function=False)` to disable the bi-local ring function (default: `bilocal_ring_function=True`). The bi-local ring function allows >99.99% of random molecules to be valid.
- Decoder: made double-bond ring syntax conform to RDKit.
- Decoder: added state X5 and X6 for having five and six bonds free.
- Decoder + Encoder: allowing for explicit brackets for organic atoms, for instance `[I]`.
- Encoder: fixed issue with explicit single/double bonds for non-canonical SMILES input.
- Decoder: bug fix for `[Branch*]` in state X1.
We thank Benjamin Sanchez-Lengeling, Theophile Gaudin and Zhenpeng Yao for suggestions and bug reports.
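The early decoder entries above (restricting nitrogen to 3 bonds, and states X5 and X6 for five and six free bonds) rest on one idea: each new bond is capped by the number of bonds the atoms involved can still form. The sketch below illustrates this for a linear chain only, assuming requested bond orders of 1 to 3; the token format, capacity table and function names are hypothetical, not the decoder's actual API, and branches and rings are omitted.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical bond capacities for illustration only; the real decoder's
# defaults changed across versions (see the N_restrict and constraints entries).
DEFAULT_CAPACITY: Dict[str, int] = {"C": 4, "N": 3, "O": 2, "F": 1}
BOND_CHAR = {1: "", 2: "=", 3: "#"}


def decode_chain(
    tokens: Sequence[Tuple[int, str]],
    capacity: Optional[Dict[str, int]] = None,
) -> Tuple[List[str], List[int]]:
    """Decode (requested bond order, element) tokens into a linear chain.

    `free` plays the role of the derivation state X_n: the number of bonds
    the most recently placed atom can still form.
    """
    cap = DEFAULT_CAPACITY if capacity is None else capacity
    atoms: List[str] = []
    bonds: List[int] = []
    free = 0
    for order, element in tokens:
        if not atoms:
            # The first atom has no incoming bond, so its requested order is ignored.
            atoms.append(element)
            free = cap[element]
            continue
        if free == 0:
            # The previous atom is saturated; this sketch stops decoding here.
            break
        # Never exceed what either the previous or the new atom can accept.
        bond = min(order, free, cap[element])
        atoms.append(element)
        bonds.append(bond)
        free = cap[element] - bond
    return atoms, bonds


def to_smiles(atoms: List[str], bonds: List[int]) -> str:
    """Join a linear chain into a SMILES string."""
    if not atoms:
        return ""
    parts = [atoms[0]]
    for bond, atom in zip(bonds, atoms[1:]):
        parts.append(BOND_CHAR[bond] + atom)
    return "".join(parts)


print(to_smiles(*decode_chain([(1, "C"), (3, "N"), (2, "O")])))  # C#N
relaxed = {**DEFAULT_CAPACITY, "N": 5}
print(to_smiles(*decode_chain([(1, "C"), (3, "N"), (2, "O")], relaxed)))  # C#N=O
```

The `relaxed` table loosely mirrors the idea behind `N_restrict=False`; it does not reproduce that option's actual behaviour.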
- initial release