Molecule.toFormat("smiles") problem

User 941c2467a3

25-06-2010 20:43:00

Dear Chemaxon developers,

We have a SMILES string,
CC(C)(C)C(\C([O-])=O)=C\C=C(\C([O-])=O)C([O-])=O


When we import it into a Molecule and export it again as a SMILES string, we get
CC(C)(C)C(\C([O-])=O)=C\C=C(C([O-])=O)C([O-])=O

by the following process,

Molecule mol = MolImporter.importMol(smiles); // smiles holds the input string above
String out = mol.toFormat("smiles");

The output SMILES string has lost part of its stereochemistry information.

We found this using JChem 5.3.4.

Thanks for any suggestions.

Best,
Jeff Gao



ChemAxon 25dcd765a3

28-06-2010 16:08:35

Hi,


What is your output SMILES?


I have the following output: CC(C)(C)C(\C([O-])=O)=C\C=C(C([O-])=O)C([O-])=O


And this exactly matches the input.

Andras

User 941c2467a3

28-06-2010 16:22:54

Andras,


There is a slight difference between them.



CC(C)(C)C(\C([O-])=O)=C\C=C(\C([O-])=O)C([O-])=O - input
CC(C)(C)C(\C([O-])=O)=C\C=C(C([O-])=O)C([O-])=O - output
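
The difference is easy to miss by eye, so here is a minimal sketch that locates it. It is plain Java string handling and not part of JChem; the class and method names are made up for illustration only.

```java
// Illustrative helper (hypothetical, not a JChem class): find where two SMILES strings diverge.
class SmilesDiffSketch {

    /** Returns the index of the first differing character, or -1 if the strings are equal. */
    static int firstDifference(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        // Same common prefix: equal strings, or one string is a prefix of the other.
        return a.length() == b.length() ? -1 : n;
    }

    public static void main(String[] args) {
        // Backslashes are doubled because these are Java string literals.
        String input  = "CC(C)(C)C(\\C([O-])=O)=C\\C=C(\\C([O-])=O)C([O-])=O";
        String output = "CC(C)(C)C(\\C([O-])=O)=C\\C=C(C([O-])=O)C([O-])=O";
        int i = firstDifference(input, output);
        System.out.println("first difference at index " + i
                + ": '" + input.charAt(i) + "' vs '" + output.charAt(i) + "'");
    }
}
```

The first mismatch is the bond marker in front of the first carboxylate on the last double-bond atom: the input has a backslash there and the output does not.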


ChemAxon 25dcd765a3

28-06-2010 16:50:04

Hi,


You are right, but that information (the backslash) is superfluous: two identical ligands, the carboxylate groups (C([O-])=O), are attached to the same atom of the double bond, so that bond has no E/Z isomers.


Andras

User 941c2467a3

28-06-2010 17:26:41

They look the same, but when I run them through a reaction they give different products:


./react -v -r ../molecules/bt0051_3573 "CC(C)(C)C(\C([O-])=O)=C\C=C(\C([O-])=O)C([O-])=O"
CC(C)(C)C(\C([O-])=O)=C\C=C/C([O-])=O
CC(C)(C)C(\C([O-])=O)=C\C=C\C([O-])=O


./react -v -r ../molecules/bt0051_3573 "CC(C)(C)C(\C([O-])=O)=C\C=C(C([O-])=O)C([O-])=O"
CC(C)(C)C(\C([O-])=O)=C\C=CC([O-])=O


The first set of products is what we want because it carries the stereochemistry information; the product of the second example does not.


I used JChem 5.3.4, and the reaction file is attached. Thank you for any further insights!

ChemAxon e08c317633

01-07-2010 23:21:37

To understand this better, let's look at the products without duplicate filtering ("-w" option; see attached image):


1. The input reactant contains superfluous stereochemistry information:


react -r bt0051_3573.mrv "CC(C)(C)C(\C([O-])=O)=C\C=C(\C([O-])=O)C([O-])=O" -t reaction -M changing -w
CC(C)(C)C(=C\C=[C:1]([C:2]([O-:4])=[O:3])C([O-])=O)\C([O-])=O>>CC(C)(C)C(=C\C=[CH:1]/C([O-])=O)\C([O-])=O
CC(C)(C)C(=C\C=[C:1](C([O-])=O)[C:2]([O-:4])=[O:3])\C([O-])=O>>CC(C)(C)C(=C\C=[CH:1]\C([O-])=O)\C([O-])=O

2. The input reactant does not contain superfluous stereochemistry information:


react -r bt0051_3573.mrv "CC(C)(C)C(\C([O-])=O)=C\C=C(C([O-])=O)C([O-])=O" -t reaction -M changing -w
CC(C)(C)C(=C\C=[C:1]([C:2]([O-:4])=[O:3])C([O-])=O)\C([O-])=O>>CC(C)(C)C(=C\C=[CH:1]C([O-])=O)\C([O-])=O
CC(C)(C)C(=C\C=[C:1](C([O-])=O)[C:2]([O-:4])=[O:3])\C([O-])=O>>CC(C)(C)C(=C\C=[CH:1]C([O-])=O)\C([O-])=O

Both reactions generate two product sets, but when the reactant carries no superfluous stereochemistry information the generated products are identical and one of them is filtered out. It seems the input molecule (imported by MolImporter) retains the superfluous stereochemistry information, which is removed only at export.
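
The effect of the duplicate filtering can be sketched with a plain Java set. This is only an illustration on the exported product strings listed above; how react actually compares products internally is not shown in this thread, and string equality is an assumption made here for simplicity. The class name is hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: mimics duplicate filtering by comparing product SMILES as strings.
class DuplicateFilterSketch {

    /** Keeps the first occurrence of each product string, preserving order. */
    static Set<String> distinct(List<String> products) {
        return new LinkedHashSet<>(products);
    }

    public static void main(String[] args) {
        // Case 1: reactant with the superfluous marker; products differ in '/' vs '\'.
        List<String> withMarker = Arrays.asList(
                "CC(C)(C)C(=C\\C=[CH:1]/C([O-])=O)\\C([O-])=O",
                "CC(C)(C)C(=C\\C=[CH:1]\\C([O-])=O)\\C([O-])=O");

        // Case 2: reactant without the marker; both products are the same string.
        List<String> withoutMarker = Arrays.asList(
                "CC(C)(C)C(=C\\C=[CH:1]C([O-])=O)\\C([O-])=O",
                "CC(C)(C)C(=C\\C=[CH:1]C([O-])=O)\\C([O-])=O");

        System.out.println("case 1 distinct products: " + distinct(withMarker).size());
        System.out.println("case 2 distinct products: " + distinct(withoutMarker).size());
    }
}
```

In case 1 both products survive the filter; in case 2 they collapse into one, which matches the single product line seen without the "-w" option.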


Solution: please use the stereoisomer plugin to generate E/Z stereoisomers of the products with unspecified double bonds.


Example:


react -r bt0051_3573.mrv "CC(C)(C)C(\C([O-])=O)=C\C=C(C([O-])=O)C([O-])=O" | cxcalc doublebondstereoisomers --protectdoublebondstereo -f smiles
CC(C)(C)C(=C\C=C\C([O-])=O)\C([O-])=O
CC(C)(C)C(=C\C=C/C([O-])=O)\C([O-])=O

The output of reactor is piped to cxcalc. The doublebondstereoisomers calculation generates the required stereoisomers.


I hope this helps.


Zsolt

User 941c2467a3

02-07-2010 01:46:20

Thank you very much Zsolt.


Yes, the superfluous stereochemistry information of the reactant was lost during export (mol.toFormat("smiles")). I searched the export parameters and found no way to keep such superfluous information during export. We will either modify our rules or give the cxcalc tool a try.


Best regards,


Jeff Gao