Standardizer output with "mysterious" single bonds

ChemAxon 587f88acea

23-09-2004 14:14:47

When using the Standardizer I discovered another problem by accident. When I standardize an already standardized SMILES string a second time, the double bond stereo markers (/ and \) still change. I first thought the reason might be my standardize.xml file, which has certain reaction tags that supply their own input as output... but I was wrong: my reaction tags shouldn't affect stereochemistry, and even when I leave only the aromatize action tag in the config file I get this phenomenon on certain structures. The structure itself is not affected, but two different strings are produced, which makes them hard to handle in a DBMS.





Examples:





origin:           O=C\1C4=C(NC1=C\3NC2=CC=CC=C2C3=O)C=CC=C4


normalized once:  O=C1\C(Nc2ccccc12)=C3/Nc4ccccc4C3=O


normalized twice: O=C1C(\Nc2ccccc12)=C3/Nc4ccccc4C3=O





origin:           C2(=CC=C(NC1=CC=CC=C1)C=C2)\C(C3=CC(=C(C=C3)N)C)=C/4C=CC(/C=C4)=N\C5=CC=CC=C5


normalized once:  Cc1cc(ccc1N)C(\c2ccc(Nc3ccccc3)cc2)=C4/C=CC(\C=C4)=N/c5ccccc5


normalized twice: Cc1cc(ccc1N)\C(c2ccc(Nc3ccccc3)cc2)=C4/C=CC(\C=C4)=N/c5ccccc5








This problem occurs fairly often: in the datasets I am working with, about 1.5% of the molecules are affected.
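To measure how many structures in a dataset are affected, a minimal sketch runs each input through two passes and counts the strings that change on the second pass. It assumes a hypothetical `standardize` callable wrapping whatever Standardizer configuration and SMILES export you use; it is not a ChemAxon API, and the toy mapping below only stands in for it.

```python
from typing import Callable, Iterable


def unstable_fraction(smiles: Iterable[str], standardize: Callable[[str], str]) -> float:
    """Fraction of inputs whose standardized string changes on a second pass.

    `standardize` is a hypothetical callable (Standardizer + SMILES export).
    """
    total = 0
    unstable = 0
    for s in smiles:
        once = standardize(s)
        twice = standardize(once)
        total += 1
        if once != twice:  # e.g. a '/' or '\' marker moved between passes
            unstable += 1
    return unstable / total if total else 0.0


# Toy mapping standing in for a real standardizer:
toy = {"x": "y", "y": "z", "z": "z", "p": "q", "q": "q"}
print(unstable_fraction(["x", "p"], toy.__getitem__))  # prints 0.5
```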





Any ideas on how to avoid this?





cheers





Friedemann

User f359e526a1

24-09-2004 14:59:01

Hello, this has nothing to do with the Standardizer; it concerns SMILES export in general. First, we would like to know why you want unique SMILES, since the name "unique" can sometimes be misleading for compounds with stereo centres. According to the SMILES specification http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html (3.1. SMILES Specification Rules) there are…

Marvin (and the Standardizer) generates unique SMILES using the algorithm described in the aforementioned article, but it is not guaranteed to give absolute SMILES for every structure. Generating absolute SMILES can be very CPU-demanding and is not recommended; currently we use an approximation to make the result as good as possible. In practice, the workaround for generating absolute SMILES is to make two runs, as you realised, but it is not always possible to run SMILES generation twice.
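The two-run workaround can be generalized to re-applying the export until its output stops changing. The sketch below assumes a hypothetical `standardize` callable (not a ChemAxon API) and caps the number of passes. A stable string only means that repeating the export no longer changes it; it does not by itself guarantee that different input representations of the same molecule converge to the same string, which is what absolute SMILES would provide.

```python
from typing import Callable


def standardize_to_fixpoint(
    smiles: str, standardize: Callable[[str], str], max_passes: int = 5
) -> str:
    """Re-apply `standardize` until its output no longer changes.

    `standardize` is a hypothetical stand-in for Standardizer + SMILES export.
    Raises ValueError if no stable string appears within `max_passes` calls.
    """
    if max_passes < 2:
        raise ValueError("at least two passes are needed to confirm stability")
    current = standardize(smiles)
    for _ in range(max_passes - 1):
        following = standardize(current)
        if following == current:  # a further pass changes nothing
            return current
        current = following
    raise ValueError(f"output still changing after {max_passes} passes")


toy = {"x": "y", "y": "z", "z": "z"}
print(standardize_to_fixpoint("x", toy.__getitem__))  # prints z
```

The pass cap guards against an export that oscillates between two strings instead of settling.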


So, please describe the exact difficulties you have with the database and which handling issues we could help to solve.





Hope it helps,


Szilva

ChemAxon 43e6884a7a

28-09-2004 05:35:56

For correct, exact (perfect) structure searching, the MolSearch and JChemSearch classes of JChem Base or the jc_equals SQL operator of the JChem Cartridge are recommended instead of using unique SMILES directly.

ChemAxon 587f88acea

28-09-2004 07:35:40

Our primary concern was to have a comparison value to determine whether two structures are equal before actually putting them into the database. For further work in this area I will bear in mind to use your cartridge instead of doing it "manually".


Nevertheless, calling the standardizer twice has proven to be reliable.
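For the pre-insertion equality check described above, a minimal sketch of the "manual" approach keys each structure by its twice-standardized string and skips keys already seen. `standardize`, `two_pass_key` and `new_structures` are hypothetical names, not ChemAxon APIs; for exact structure matching, the reply above recommends MolSearch/JChemSearch or the jc_equals operator instead.

```python
from typing import Callable, Iterable, List, Set


def two_pass_key(smiles: str, standardize: Callable[[str], str]) -> str:
    """Comparison key: standardize, then standardize the result once more."""
    return standardize(standardize(smiles))


def new_structures(
    incoming: Iterable[str], known_keys: Set[str], standardize: Callable[[str], str]
) -> List[str]:
    """Return inputs whose key is not yet in known_keys; adds their keys to it."""
    fresh: List[str] = []
    for s in incoming:
        key = two_pass_key(s, standardize)
        if key not in known_keys:
            known_keys.add(key)
            fresh.append(s)
    return fresh


toy = {"x": "y", "y": "z", "z": "z", "w": "z", "p": "q", "q": "q"}
keys: Set[str] = set()
print(new_structures(["x", "w", "p"], keys, toy.__getitem__))  # prints ['x', 'p']
```

In the toy data, "x" and "w" only receive the same key because of the second pass; after a single pass their keys would be "y" and "z".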


I will look into the unique SMILES (uSMILES) algorithm to understand it better.





Friedemann