Incorrect interpretation of CML bondStereo

User bd69837856

16-06-2010 15:08:59

Tested on Marvin 5.3.4 (also occurs on 5.3.2)


MarvinSketch interprets the attached CML as the Z isomer, whilst the structure is the E isomer.


The CML bondStereo references the carbon of the nitrile and the carbon on the dimethoxyphenyl and indicates they are trans to each other, whilst Marvin has them cis.


Thanks,


Daniel

User bd69837856

18-06-2010 14:28:36

Just to add: the bug appears to be caused by the atomRefs4 attribute, which specifies which atoms the cis/trans label refers to, being ignored.

ChemAxon 5433b8e56b

22-06-2010 13:55:31

Hi,


Unfortunately, Marvin currently does not support the atomRefs4 attribute on the bondStereo tag.


I have to discuss with my colleagues how we can fix it; I'll get back to you with some news soon.


Regards,
Istvan

User bd69837856

07-10-2010 14:55:05

Has any further progress been made on this issue?


One possibility, although obviously not my favoured one, could be to report undefined stereochemistry when the atomRefs4 attribute is present. The current behaviour relies on the order of atoms in the molecule matching those in the ignored atomRefs4 attribute, which gives cases like this one where incorrect stereochemistry is generated.
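
A minimal Python sketch of that fallback, purely for illustration. It is not Marvin's code: the function name and the `reader_supports_atom_refs4` flag are hypothetical, and the only thing taken from the CML is the `atomRefs4` attribute on `bondStereo`.

```python
import xml.etree.ElementTree as ET


def bond_stereo_or_undefined(xml_text, reader_supports_atom_refs4=False):
    """Return the C/T label, or "undefined" if the reader cannot honour atomRefs4.

    Hypothetical helper: a reader that ignores atomRefs4 cannot know which
    atoms the label refers to, so it reports "undefined" instead of guessing.
    """
    elem = ET.fromstring(xml_text)
    if elem.get("atomRefs4") is not None and not reader_supports_atom_refs4:
        return "undefined"
    return (elem.text or "").strip()


cml = '<bondStereo atomRefs4="a17 a18 a19 a2">T</bondStereo>'
without_support = bond_stereo_or_undefined(cml)
with_support = bond_stereo_or_undefined(cml, reader_supports_atom_refs4=True)
```

The point of the design is only that a wrong isomer is worse than an unspecified one.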

ChemAxon 5433b8e56b

07-10-2010 23:20:30

Hi Daniel,


Yes, there was some progress. We tried to push it into the 5.4 release, but unfortunately we did not have enough capacity to implement the handling, so we decided to implement it in the next major release after 5.4, together with the reorganization, refactoring, and simplification of the import/export in Marvin. We will notify you in this topic when it is ready.


Thank you for your patience, and sorry for the inconvenience.


Regards,
Istvan

ChemAxon 25dcd765a3

25-10-2011 11:12:39

Hi Daniel,


Your topic finally reached me and I think the given structure is the Z isomer.


Calculating the priorities of the ligands on the double bond (between atoms 29-30) according to the CIP rules, I find that the CN group has priority over the substituted benzene ring. This means that atoms 28-29=30-6 are in a cis arrangement, from which I conclude that the double bond is the Z isomer.


Please correct me if I'm wrong.

User bd69837856

25-10-2011 12:11:39

The structure that MarvinSketch (currently 5.6.0.0) generates from that CML file is indeed the Z isomer.


The CML however states:


<bondStereo atomRefs4="a17 a18 a19 a2">T</bondStereo>


where a17 is the carbon of the cyanide and a2 is the carbon atom in the benzene ring with two methoxys and T means these two atoms are trans to each other.


(a18 and a19 are the two atoms in the double bond)


i.e. MarvinSketch's CIP implementation is working correctly; the problem is the CML reader ignoring the atomRefs4 when generating the structure.


The generated Z isomer corresponds to: <bondStereo atomRefs4="a11 a18 a19 a2">T</bondStereo>


so I would hypothesise that Marvin is implicitly picking the atoms with the lowest ids on either end of the double bond and assigning them to be trans.
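
To make the reasoning about reference atoms concrete, here is a small Python sketch. It is an illustration only, not how Marvin or any CML library works internally; the helper names are made up, and it assumes (as the comparison above implies) that a11 is the other substituent on a18.

```python
import xml.etree.ElementTree as ET


def parse_bond_stereo(xml_text):
    """Return (atom_refs, label) from a CML <bondStereo> element."""
    elem = ET.fromstring(xml_text)
    refs = elem.attrib["atomRefs4"].split()
    if len(refs) != 4:
        raise ValueError("atomRefs4 must list exactly four atoms")
    return refs, (elem.text or "").strip()


def swap_reference(refs, label, end, other_substituent):
    """Describe the same geometry using the other substituent at one end.

    `end` is 0 (first reference atom) or 3 (last reference atom).
    Exchanging a reference atom for the other substituent on the same
    double-bond carbon flips the label: C becomes T and T becomes C.
    """
    flipped = {"C": "T", "T": "C"}[label]
    new_refs = list(refs)
    new_refs[end] = other_substituent
    return new_refs, flipped


refs, label = parse_bond_stereo(
    '<bondStereo atomRefs4="a17 a18 a19 a2">T</bondStereo>'
)
# Assumption: a11 is the second substituent on a18.
same_geometry = swap_reference(refs, label, 0, "a11")
print(same_geometry)  # (['a11', 'a18', 'a19', 'a2'], 'C')
```

So the file's statement is equivalent to a11/a2 being cis, and reading it as a11/a2 trans gives the opposite isomer.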















ChemAxon 25dcd765a3

27-10-2011 13:02:32

Hi Daniel,


Yes you are right.


We are fixing this issue.

ChemAxon 25dcd765a3

27-10-2011 14:56:11

Hi Daniel!


We have fixed this issue. Thank you for the report.


Marvin 5.8 will contain the fix. There will be a 5.7 release before 5.8, but its commit deadline has passed, so this fix cannot go into it.

User bd69837856

27-10-2011 15:13:31

Thanks for fixing this


Only took a year


On the topic of CML reading, I was just testing whether any similar problems exist with tetrahedral stereochemistry.


For tetrahedral stereochemistry the atomRefs4 seems to be taken into account; however, for a really simple example I generated, the stereochemistry always seemed to be the opposite of what it should be (see attached).


Let me know what you think.

ChemAxon 25dcd765a3

28-10-2011 09:53:25

Hi Daniel,


Things are changing, you are in the right hands now, that's why it is fixed :-).


It seems to me that you are right about tetrahedral stereochemistry with atomRefs4. I'm just working out whether we are always the opposite or whether the problem is bigger.


We will fix this issue as well.


Thank you for the report.

User bd69837856

04-04-2012 16:13:49

I noticed in 5.9 you introduced:


"Parity value in CML format was stored the opposite way as the CML
standard defines it. Now the correct value is stored which leads to
backward inconsistency with CML files generated by ChemAxon applications
in previous versions, but correct values are read from standard CML
files. To avoid running into similar problems in the future, now a
version information is stored both in CML and MRV formats. This
information contains a version number of the file format and the
application version by which the file was generated. The parity value is
changed to opposite in MRV format as well to be consistent with the CML
format but it does not lead to backward inconsistency in this case:
parity values stored in not versioned MRV files are converted to their
opposite while parity values from versioned MRV files are not."


Despite the wording "correct values are read from standard CML files", it looks like you are checking for an attribute of value "version" (although not actually checking the content of the attribute) to toggle whether to invert the behaviour. Is this the intended mechanism of enabling the fixed support? ;-)


I think there are also still cases where Marvin's interpretation of atom parity differs from CML's. The attached file I believe to be 1R,3S but depending on whether or not I add a version attribute is treated as S,S or R,R.


The ordering of the atomRefs4 in the file is identical to the clockwise ordering of the ligands around the two stereocentres starting from the wedge atom and ending on the hatch atom in Marvin's depiction. The atom parity however is opposite so this shouldn't be possible. Let me know what you think.
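
For reference, a small Python sketch of why the atomRefs4 ordering and the parity sign have to agree. It assumes the determinant-based reading of atomParity: the sign of the 4x4 matrix whose first row is all ones and whose remaining rows are the x, y and z coordinates of the four atomRefs4 atoms, taken in order. The coordinates are made up for illustration and are not taken from the attached file.

```python
def det(matrix):
    """Determinant by cofactor expansion along the first row."""
    if len(matrix) == 1:
        return matrix[0][0]
    total = 0
    for col, value in enumerate(matrix[0]):
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        total += (-1) ** col * value * det(minor)
    return total


def atom_parity(coords):
    """Sign (+1, -1 or 0) of the parity determinant.

    coords: four (x, y, z) tuples, listed in atomRefs4 order.
    """
    matrix = [[1, 1, 1, 1]] + [[c[axis] for c in coords] for axis in range(3)]
    d = det(matrix)
    return (d > 0) - (d < 0)


# Hypothetical ligand positions at the corners of a tetrahedron.
ligands = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]

original = atom_parity(ligands)
# Exchanging two atoms in atomRefs4 negates the parity.
swapped = atom_parity([ligands[1], ligands[0], ligands[2], ligands[3]])
# Cycling three atoms is an even permutation and leaves it unchanged.
rotated = atom_parity([ligands[0], ligands[2], ligands[3], ligands[1]])
```

Under this reading, two stereocentres whose atomRefs4 lists follow the same sense of rotation but carry opposite parity values cannot both be given the same R/S label by a consistent reader.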


If anyone asks, I should probably just recommend they use SMILES.

ChemAxon 25dcd765a3

05-04-2012 10:05:46

Hi,


dan2097 wrote:

Despite the wording "correct values are read from standard CML files" it looks like you are checking for an attribute of value "version" (although not actually checking the content of the attribute) to toggle whether to invert the behaviour. Is this the intended mechanism of enabling the fixed support ;-)


I would read the text more thoroughly: it does not state that we check the version for CML files, only that "now a version information is stored both in CML and MRV formats".


This version information is used only in the case of the MRV format, for backward compatibility reasons, but not in the case of the CML format.




dan2097 wrote:

I think there are also still cases where Marvin's interpretation of atom parity differs from CML's.

Thank you for the report, I'll check this issue. If it is a bug we will fix it.


dan2097 wrote:

If anyone asks I should probably just recommend they use SMILES.

You are right, our SMILES IO is much more tested due to the fact it is used more often and we could find plenty of test structures. Could you suggest a warehouse of CML structures which we could use for testing? That would be useful and hopefully improve our CML IO code quality.

User bd69837856

05-04-2012 14:42:21

volfi wrote:

This version information is used only in the case of the MRV format, for backward compatibility reasons, but not in the case of the CML format.


I think there may be a bug then, as in my testing (on CML files) the addition of a version attribute was clearly affecting the interpretation of atom parity.


volfi wrote:

You are right, our SMILES IO is much more tested due to the fact it is used more often and we could find plenty of test structures. Could you suggest a warehouse of CML structures which we could use for testing? That would be useful and hopefully improve our CML IO code quality.



http://opsin.ch.cam.ac.uk/ can generate CML from most IUPAC names. If you wanted a large set of CML structures you could probably throw one of the sets of IUPAC names that Daniel B is sure to have through it.


At the time that I started this thread OPSIN did not output SMILES natively, hence some of the complaints I got about stereochemistry precision were just due to Marvin reading it incorrectly from the CML. I don't get that sort of complaint anymore, presumably because most users now just use the SMILES output. Beyond fixing these obvious bugs, to be honest I probably wouldn't go out of your way to look for more bugs in CML reading; it isn't an especially common format...

ChemAxon 25dcd765a3

10-04-2012 10:56:20

dan2097 wrote:

I think there may be a bug then, as in my testing (on CML files) the addition of a version attribute was clearly affecting the interpretation of atom parity.


Could you please tell me how to reproduce this effect? If you were able to get a different chirality value based on the version information in the CML file, this is definitely a bug which we would like to fix.


Thank you for suggesting OPSIN to generate CML from IUPAC names for test molecules, we will give it a try.

User bd69837856

10-04-2012 12:49:35

I attach a copy of the CML file I uploaded previously but with a version attribute manually added. On Marvin 5.9.1 the files are interpreted as enantiomers i.e. opposite interpretation of stereochemistry.


I should emphasise though that this is a different problem to my main problem with the interpretation which is that in either interpretation both are assigned the same R/S label whilst the stereocentres should have opposite labels. This is not a bug in Marvin's R/S assignment.

ChemAxon 25dcd765a3

12-04-2012 15:41:56

Thank you for the attachment.


So we have two distinct bugs:


1) The CML format is misinterpreted (R,R instead of R,S)


2) CML reading somehow takes the version information into account (R,R becomes S,S)


I could reproduce both problems, thank you.

User bd69837856

12-04-2012 21:45:35

Exactly.


I thought I would also add that OpenBabel is another good source of CML (or for verifying the stereochemistry in CML)

ChemAxon 25dcd765a3

16-04-2012 13:13:16

dan2097 wrote:

Exactly.


I thought I would also add that OpenBabel is another good source of CML (or for verifying the stereochemistry in CML)



Yes, I have tried that; however, it does not always work as expected:


obabel ~/Downloads/\(1R,3S\)-3-amino-3-methylcyclohexanecarboxylic\ acid.cml -o sdf -O test.sdf --gen2D

The generated SDF does not contain any stereo information.