Potential bug in kekulization #55
Comments
Thank you, I can confirm this bug and we will look into it ASAP.
Dear adamoyoung,
Interesting! Thanks for the fix and the invite! The string in question was one I found from PubChem,
Hi @adamoyoung, we have implemented a more flexible SMILES parser in
import selfies as sf
from rdkit import Chem

original = "NC(=O)c1cccc2c1-c1ccc(cc1)-n-c-2=O"
decoded = sf.decoder(sf.encoder(original))
print(Chem.CanonSmiles(original) == Chem.CanonSmiles(decoded))  # now True!
Thanks for the bug report!
I was using selfies.encoder with a non-kekulized SMILES string,
NC(=O)c1cccc2c1-c1ccc(cc1)-n-c-2=O
and I got the error `Encoding error 'NC(=O)c1cccc2c1-c1ccc(cc1)-n-c-2=O': kekulization algorithm failed`. However, I am able to kekulize the string with RDKit, using
rdkit.Chem.Kekulize(mol); rdkit.Chem.MolToSmiles(mol, kekuleSmiles=True)
The resulting SMILES string is
NC(=O)C1=CC=CC2=C1C1=CC=C(C=C1)NC2=O
which can then be encoded as the SELFIES string
[N][C][Branch1_2][C][=O][C][=C][C][=C][C][=C][Ring1][Branch1_2][C][=C][C][=C][Branch1_1][Branch1_1][C][=C][Ring1][Branch1_2][N][C][Ring1][Branch2_3][=O]
without error. I am just wondering whether this is expected behaviour or possibly a bug; I understand that kekulization algorithms can sometimes produce different results.
I am using Python 3.7.10, selfies 1.0.3, and RDKit 2018.09.3.
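For readers wondering why kekulization can give different results, or fail outright, here is a minimal toy sketch. It is not the algorithm used by selfies or RDKit. It assumes a simplified model in which every aromatic atom must receive exactly one double bond, and it ignores valence, heteroatoms, hydrogens, and charges. The function name `kekule_structures` and the atom numbering are hypothetical, chosen only for illustration. Under that assumption, each Kekulé structure is a perfect matching of the aromatic-bond graph.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Bond = Tuple[int, int]


def kekule_structures(n_atoms: int, aromatic_bonds: List[Bond]) -> List[FrozenSet[Bond]]:
    """Enumerate every placement of double bonds in which each aromatic atom
    gets exactly one double bond (every perfect matching of the bond graph).

    Toy model only: valence, heteroatoms, and charges are ignored.
    """
    neighbours: Dict[int, Set[int]] = {i: set() for i in range(n_atoms)}
    for a, b in aromatic_bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)

    results: List[FrozenSet[Bond]] = []

    def extend(unmatched: FrozenSet[int], chosen: Tuple[Bond, ...]) -> None:
        if not unmatched:
            # Every atom has a double bond: one complete Kekule structure.
            results.append(frozenset(chosen))
            return
        # Always pair the lowest-numbered free atom next, so each
        # matching is generated exactly once.
        atom = min(unmatched)
        for other in sorted(neighbours[atom]):
            if other in unmatched:
                extend(unmatched - {atom, other}, chosen + ((atom, other),))

    extend(frozenset(range(n_atoms)), ())
    return results


# Benzene: six aromatic atoms in a ring, numbered 0..5.
benzene = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)]
structures = kekule_structures(6, benzene)
for s in structures:
    print(sorted(s))

# A five-membered ring in which all five atoms demand a double bond.
five_ring = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)]
print(kekule_structures(5, five_ring))
```

In this toy model benzene has two valid assignments, so two correct kekulizers can legitimately return different strings for the same molecule. The five-membered ring has none, which is one way a "kekulization failed" error can arise when a parser assumes every aromatic atom needs a double bond. Whether that is what happened with the string in this issue is not something the sketch can establish.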