Tetrahedral stereocenters in CML and atom ordering issue

User 4df9fb85ce

06-06-2014 22:17:00

Hello,


I've been working with a set of molecules without coordinates in CML format. The dataset contains both a SMILES string and a CML file for each molecule, but Marvin interprets some stereocenters in these CML files differently from other cheminformatics packages. In my dataset the CML and SMILES fields should describe the same molecule, but Marvin treats them differently. I tried to localize the problem and found that, for some reason, the atom ordering in the CML file influences stereocenter parity.


I took the following CML file:


<?xml version="1.0" ?>
<cml>
<molecule>
<atomArray>
<atom id="a0" elementType="C"/>
<atom id="a1" elementType="C"/>
<atom id="a2" elementType="C" isotope="13" isotopeNumber="13">
<atomParity atomRefs4="a3 a1 a4 a2">1</atomParity>
</atom>
<atom id="a3" elementType="C"/>
<atom id="a4" elementType="C" isotope="14" isotopeNumber="14">
<atomParity atomRefs4="a2 a8 a7 a4">1</atomParity>
</atom>
<atom id="a5" elementType="N"/>
<atom id="a6" elementType="C"/>
<atom id="a7" elementType="O"/>
<atom id="a8" elementType="C"/>
<atom id="a9" elementType="O"/>
</atomArray>
<bondArray>
<bond atomRefs2="a0 a1" order="1"/>
<bond atomRefs2="a1 a2" order="1"/>
<bond atomRefs2="a2 a3" order="1"/>
<bond atomRefs2="a3 a5" order="1"/>
<bond atomRefs2="a5 a6" order="1"/>
<bond atomRefs2="a6 a7" order="1"/>
<bond atomRefs2="a7 a4" order="1"/>
<bond atomRefs2="a2 a4" order="1"/>
<bond atomRefs2="a4 a8" order="1"/>
<bond atomRefs2="a8 a9" order="2"/>
</bondArray>
</molecule>
</cml>

In that file I swapped the order of the atoms a7 and a8. I changed


<atom id="a7" elementType="O"/>
<atom id="a8" elementType="C"/>

to

<atom id="a8" elementType="C"/>
<atom id="a7" elementType="O"/>

Marvin gave me two different molecules: {mol_order1.png} and {mol_order2.png}
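
For anyone who wants to reproduce the reordering, here is a minimal sketch using only the Python standard library. It assumes a CML file without namespaces (as in the example above) and works on a trimmed fragment of the atom array; the helper names `swap_atoms`, `atom_ids` and `parity_refs` are made up for this illustration. It swaps two `<atom>` elements and shows that the `atomParity` content itself is untouched, so only the document order changes.

```python
import xml.etree.ElementTree as ET

# Trimmed fragment of the atom array from the post (no namespaces assumed).
CML_FRAGMENT = """<cml><molecule><atomArray>
<atom id="a4" elementType="C" isotope="14" isotopeNumber="14">
<atomParity atomRefs4="a2 a8 a7 a4">1</atomParity>
</atom>
<atom id="a5" elementType="N"/>
<atom id="a6" elementType="C"/>
<atom id="a7" elementType="O"/>
<atom id="a8" elementType="C"/>
<atom id="a9" elementType="O"/>
</atomArray></molecule></cml>"""


def swap_atoms(cml_text, id_a, id_b):
    """Return a parsed CML tree in which the two <atom> elements trade places."""
    root = ET.fromstring(cml_text)
    atom_array = root.find(".//atomArray")
    atoms = list(atom_array)
    ids = [atom.get("id") for atom in atoms]
    i, j = ids.index(id_a), ids.index(id_b)
    atoms[i], atoms[j] = atoms[j], atoms[i]
    atom_array[:] = atoms  # replace the children in the new order
    return root


def atom_ids(root):
    """List atom ids in document order."""
    return [atom.get("id") for atom in root.find(".//atomArray")]


def parity_refs(root):
    """Map each stereocenter id to its (atomRefs4, parity value) pair."""
    result = {}
    for atom in root.find(".//atomArray"):
        parity = atom.find("atomParity")
        if parity is not None:
            result[atom.get("id")] = (parity.get("atomRefs4"), parity.text.strip())
    return result


if __name__ == "__main__":
    swapped = swap_atoms(CML_FRAGMENT, "a7", "a8")
    print(atom_ids(swapped))
    print(parity_refs(swapped))
```

The two files therefore carry identical parity statements; any difference in the resulting molecule has to come from how the reader uses the document order.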


I compared Marvin's results with OpenBabel and Indigo, and they both produce the same SMILES and image for both files:


CC[13C@@H]1CNCO[14C@@H]1C=O

Marvin produces two different molecules for {mol_order1.cml} and {mol_order2.cml}:


CC[13C@@H]1CNCO[14C@H]1C=O
CC[13C@@H]1CNCO[14C@@H]1C=O
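
To pinpoint which center differs between the two Marvin outputs, a small sketch like the following can help. It compares the bracket atoms of two SMILES strings textually, which is only meaningful when both strings list the atoms in the same order (as they do here); it is not a general equivalence check. The function name `differing_bracket_atoms` is made up for this illustration.

```python
import re

# Matches bracket atoms such as [13C@@H] or [14C@H].
BRACKET_ATOM = re.compile(r"\[[^\]]+\]")


def differing_bracket_atoms(smiles_a, smiles_b):
    """Return (index, atom_a, atom_b) for bracket atoms that differ.

    Assumes both SMILES strings use the same atom order; raises ValueError
    if the number of bracket atoms does not match.
    """
    atoms_a = BRACKET_ATOM.findall(smiles_a)
    atoms_b = BRACKET_ATOM.findall(smiles_b)
    if len(atoms_a) != len(atoms_b):
        raise ValueError("SMILES strings have different numbers of bracket atoms")
    return [
        (index, a, b)
        for index, (a, b) in enumerate(zip(atoms_a, atoms_b))
        if a != b
    ]


if __name__ == "__main__":
    order1 = "CC[13C@@H]1CNCO[14C@H]1C=O"
    order2 = "CC[13C@@H]1CNCO[14C@@H]1C=O"
    print(differing_bracket_atoms(order1, order2))
```

Only the 14C center differs; the 13C center is the same in both outputs.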

I looked at the CML specification, but couldn't find anything that would explain the difference between these two CML files. Could you check whether this is a bug?
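
One possible explanation, offered purely as a hypothesis and not as a description of Marvin's actual implementation: if a reader interprets the parity value relative to the document order of the neighbor atoms rather than the order given in `atomRefs4`, then swapping two neighbors in the file changes the sign of the permutation and therefore flips the center. The sketch below computes that permutation sign for both stereocenters under both atom orders; the function name `permutation_sign` is made up for this illustration.

```python
def permutation_sign(refs, document_order):
    """Sign (+1 or -1) of the permutation that sorts refs into document order."""
    ranks = [document_order.index(atom_id) for atom_id in refs]
    inversions = sum(
        1
        for i in range(len(ranks))
        for j in range(i + 1, len(ranks))
        if ranks[i] > ranks[j]
    )
    return -1 if inversions % 2 else 1


# Atom order in the original file and after swapping a7 and a8.
ORDER1 = ["a0", "a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8", "a9"]
ORDER2 = ["a0", "a1", "a2", "a3", "a4", "a5", "a6", "a8", "a7", "a9"]

# atomRefs4 values taken from the CML file in the post.
REFS_13C = ["a3", "a1", "a4", "a2"]  # stereocenter a2
REFS_14C = ["a2", "a8", "a7", "a4"]  # stereocenter a4

if __name__ == "__main__":
    for name, refs in (("a2 (13C)", REFS_13C), ("a4 (14C)", REFS_14C)):
        print(name, permutation_sign(refs, ORDER1), permutation_sign(refs, ORDER2))
```

Under this assumption the sign for the 13C center is the same for both orders, while the sign for the 14C center flips, which matches the observation that only the 14C center changes between the two files.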


PS: I found a remark in your documentation about CML: http://www.chemaxon.com/marvin/help/formats/cml-doc.html


Attention: When a cml file containing parity information is imported to Marvin older than 5.8, the parity information will be displayed wrongly!

I'm using Marvin 6.3, but the parity information still seems to be displayed wrongly. Or do you mean that versions older than 5.8 behaved the way I expect, and that you later decided that behavior was a bug? If so, could you explain why the molecules {mol_order1.cml} and {mol_order2.cml} are different?


Best regards,
Michael

ChemAxon fc046975bc

10-06-2014 13:07:04

Hello Michael,


You are right, there is a bug on our side. Marvin reads the molecule with the opposite chirality value. We will fix this.


The other problem, that changing the order of the rows results in different molecules, is also a bug. How big a problem is it for you?



Best Regards,


Peter

User 4df9fb85ce

10-06-2014 18:55:13



Hello Peter, 



Thank you for a quick reply.



Good that we localized the problem. I'd like to ask what the best way to work with these files is. The original structures were actually in MRV format, because they contained MDocument tags. I do not know where they came from, but they do not contain any version string or namespaces. I see that the latest Marvin saves a version string and specifies XML namespaces explicitly for both CML and MRV. Maybe those files were created with an older version, or that information was omitted at the stage when the coordinates were discarded. The SMILES data were stored for each structure and were created using Marvin Beans.



The problem is that Marvin treats the CML/MRV and SMILES representations in each pair as the same molecule, while other tools say that they are different.



I am thinking of converting the original structures into Molfile format using ChemAxon tools, because that should give me correct Molfiles that yield the same SMILES and that work correctly both in Marvin and in other tools. What do you think of this solution?



The row-changing problem by itself is not a problem for me. I just wanted to understand the reason for this difference, and found that the row-changing bug could explain it. I think that CML is rarely used without coordinates, which is why the bug has not been noticed before.



Michael

ChemAxon fc046975bc

11-06-2014 06:14:58

Using MOL files is a good workflow. There should be no problem with it.


Peter