Stereoisomer names in cxcalc and marvin different?

User 677b9c22ff

04-11-2008 19:35:08

Hi,


I generated all stereoisomers of an inositol derivative, OC1C(O)C(O)C(OC2C(O)C(O)C(O)C(O)C2O)C(O)C1O, and got 532. That number is wrong; the correct count is lower (528). I will open a ticket for that later.





If I calculate the names with cxcalc name inositols-532.smi I get:








Code:



O[C@H]1[C@@H](O)[C@@H](O)[C@@H](O[C@H]2[C@@H](O)[C@H](O)[C@H](O)[C@@H](O)[C@H]2O)[C@H](O)[C@@H]1O


O[C@H]1[C@@H](O)[C@@H](O)[C@H](O[C@H]2[C@@H](O)[C@H](O)[C@H](O)[C@@H](O)[C@H]2O)[C@H](O)[C@@H]1O











Code:



527     (1R,2R,4R,5R)-6-{[(1S,2R,3R,4S,5R,6S)-2,3,4,5,6-pentahydroxycyclohexyl]oxy}cyclohexane-1,2,3,4,5-pentol


528     (1R,2R,4R,5R)-6-{[(1S,2R,3R,4S,5R,6S)-2,3,4,5,6-pentahydroxycyclohexyl]oxy}cyclohexane-1,2,3,4,5-pentol








which is the same name for both entries (possible if the naming algorithm is a strong canonicalizer).





But in Marvin 5.0.1 (which has confirmed stereogen issues) I get different names:





Code:



Preferred IUPAC Name = (1R,2R,3S,4R,5S,6S)-6-{[(2R,3R,5R,6R)-2,3,4,5,6-pentahydroxycyclohexyl]oxy}cyclohexane-1,2,3,4,5-pentol





Preferred IUPAC Name = (1R,2R,4R,5R)-6-{[(1S,2R,3R,4S,5R,6S)-2,3,4,5,6-pentahydroxycyclohexyl]oxy}cyclohexane-1,2,3,4,5-pentol








I also tried different options (single mode, non-IUPAC, etc.), but the names from cxcalc still differ from those in Marvin.


I attached two files, one with the smiles and one with the generated names.


Tobias

ChemAxon e7b9408ca1

05-11-2008 14:25:13

Dear Tobias,





As far as I can see the situation is as follows. The molecules are the same, in particular they have the same chirality, they are just represented differently. Both names generated by Marvin 5.0 are correct (they depend on which cycle is chosen to be the parent). For implementation reasons, Marvin 5.1 generates the same name in both cases.





In general, do not expect that the "same" molecule will be given only one name. The IUPAC standard specifically makes several names acceptable in countless cases. There is a draft specification from IUPAC that tries to assign a preferred name in such cases, but

1. it is only a draft

2. it does not cover all cases (yet). For instance, I could not find any rule saying which of those two names is preferred.





Is this satisfactory?

User 677b9c22ff

05-11-2008 18:12:50

Hi Daniel,


The substances are not the same; this is a (possibly severe) error in the naming algorithm. See also Inositol naming bug.





Unique SMILES (are different):


O[C@H]1[C@@H](O)[C@@H](O)[C@@H](O[C@H]2[C@@H](O)[C@H](O)[C@H](O)[C@@H](O)[C@H]2O)[C@H](O)[C@@H]1O


O[C@H]1[C@@H](O)[C@@H](O)[C@H](O[C@H]2[C@@H](O)[C@H](O)[C@H](O)[C@@H](O)[C@H]2O)[C@H](O)[C@@H]1O





Code:



cxcalc name "O[C@H]1[C@@H](O)[C@@H](O)[C@@H](O[C@H]2[C@@H](O)[C@H](O)[C@H](O)[C@@H](O)[C@H]2O)[C@H](O)[C@@H]1O"


id      Preferred IUPAC Name


1       (1R,2R,4R,5R)-6-{[(1S,2R,3R,4S,5R,6S)-2,3,4,5,6-pentahydroxycyclohexyl]oxy}cyclohexane-1,2,3,4,5-pentol


cxcalc name "O[C@H]1[C@@H](O)[C@@H](O)[C@H](O[C@H]2[C@@H](O)[C@H](O)[C@H](O)[C@@H](O)[C@H]2O)[C@H](O)[C@@H]1O"


id      Preferred IUPAC Name


1       (1R,2R,4R,5R)-6-{[(1S,2R,3R,4S,5R,6S)-2,3,4,5,6-pentahydroxycyclohexyl]oxy}cyclohexane-1,2,3,4,5-pentol








Now from Marvin: after the SMILES are canonicalized, the SMILES strings have different lengths, which means they are not the same.
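
A minimal sketch of that comparison in Python, using the two unique SMILES quoted above. The helper name first_difference is only illustrative. Note the assumption behind the argument: comparing strings only tells structures apart if both strings come from the same canonicalizer.

```python
# The two unique SMILES from this thread, copied verbatim.
smiles_a = "O[C@H]1[C@@H](O)[C@@H](O)[C@@H](O[C@H]2[C@@H](O)[C@H](O)[C@H](O)[C@@H](O)[C@H]2O)[C@H](O)[C@@H]1O"
smiles_b = "O[C@H]1[C@@H](O)[C@@H](O)[C@H](O[C@H]2[C@@H](O)[C@H](O)[C@H](O)[C@@H](O)[C@H]2O)[C@H](O)[C@@H]1O"


def first_difference(a, b):
    """Return the index of the first differing character, or None if a == b."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One string is a prefix of the other (or they are identical).
    return None if len(a) == len(b) else min(len(a), len(b))


pos = first_difference(smiles_a, smiles_b)
print("identical strings:", smiles_a == smiles_b)
print("length difference:", len(smiles_a) - len(smiles_b))
print("first difference at index:", pos)
print("context A:", smiles_a[pos - 3:pos + 4])
print("context B:", smiles_b[pos - 3:pos + 4])
```

The strings differ in a single stereo marker ([C@@H] versus [C@H]) on the ring carbon that carries the ether oxygen, which is why the lengths differ by one character.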





However, the generated names are now the same. So either the SMILES canonicalizer is broken, which would be really bad, or the name generator has the bug mentioned above.
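
One way to spot such collisions across a whole file of generated names is to group the names by string and flag any name shared by more than one distinct unique SMILES. This is only a sketch: find_name_collisions is a hypothetical helper, and the records are the two structures and the one name from this thread, not the full 532-entry file.

```python
from collections import defaultdict


def find_name_collisions(records):
    """Map each name to its SMILES, keeping only names used by 2+ distinct SMILES.

    records is an iterable of (unique_smiles, name) pairs.
    """
    by_name = defaultdict(set)
    for smiles, name in records:
        by_name[name].add(smiles)
    return {name: sorted(group) for name, group in by_name.items() if len(group) > 1}


# Data copied from the thread: two different unique SMILES, one generated name.
SMILES_1 = "O[C@H]1[C@@H](O)[C@@H](O)[C@@H](O[C@H]2[C@@H](O)[C@H](O)[C@H](O)[C@@H](O)[C@H]2O)[C@H](O)[C@@H]1O"
SMILES_2 = "O[C@H]1[C@@H](O)[C@@H](O)[C@H](O[C@H]2[C@@H](O)[C@H](O)[C@H](O)[C@@H](O)[C@H]2O)[C@H](O)[C@@H]1O"
NAME = ("(1R,2R,4R,5R)-6-{[(1S,2R,3R,4S,5R,6S)-2,3,4,5,6-pentahydroxycyclohexyl]oxy}"
        "cyclohexane-1,2,3,4,5-pentol")

collisions = find_name_collisions([(SMILES_1, NAME), (SMILES_2, NAME)])
for name, group in collisions.items():
    print(f"{len(group)} distinct SMILES share the name {name}")
```

A collision reported this way does not say which tool is at fault; it only shows that the canonicalizer and the name generator disagree about whether the structures are distinct.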





Code:



Preferred IUPAC Name = (1R,2R,4R,5R)-6-{[(1S,2R,3R,4S,5R,6S)-2,3,4,5,6-pentahydroxycyclohexyl]oxy}cyclohexane-1,2,3,4,5-pentol


Preferred IUPAC Name = (1R,2R,4R,5R)-6-{[(1S,2R,3R,4S,5R,6S)-2,3,4,5,6-pentahydroxycyclohexyl]oxy}cyclohexane-1,2,3,4,5-pentol








The substances are not the same. See attached picture.





For whatever reason I always thought that IUPAC names are canonical names, so wherever the algorithm starts, it should find the same name. That is not the case here.





Actually, the structure name could always be generated from the unique (canonicalized) SMILES. That way such mix-ups would be avoided in the future. I am not sure if this is feasible.
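
A rough sketch of that idea, assuming two callables are supplied from outside: canonicalize (input SMILES to unique SMILES) and generate_name (unique SMILES to name). Both are placeholders for whatever toolkit is used, not ChemAxon API calls, and the stand-ins in the usage part are toy functions for ethanol only.

```python
from typing import Callable, Dict


def make_canonical_namer(
    canonicalize: Callable[[str], str],
    generate_name: Callable[[str], str],
) -> Callable[[str], str]:
    """Build a namer that always names the canonical form of its input."""
    cache: Dict[str, str] = {}

    def name_for(smiles: str) -> str:
        key = canonicalize(smiles)  # every spelling of a molecule maps to one key
        if key not in cache:
            cache[key] = generate_name(key)  # the name generator only sees the key
        return cache[key]

    return name_for


# Toy stand-ins (assumptions for illustration): two spellings of ethanol.
toy_canonical = {"OCC": "CCO", "CCO": "CCO"}
calls = []


def toy_name(smiles: str) -> str:
    calls.append(smiles)
    return {"CCO": "ethanol"}[smiles]


name_for = make_canonical_namer(toy_canonical.__getitem__, toy_name)
print(name_for("OCC"), name_for("CCO"), "generator calls:", len(calls))
```

This would only remove the dependence of the name on how the input was written. It would not help when the name generator itself gives one name to two different unique SMILES, as in the case reported here.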





Cheers


Tobias

ChemAxon e7b9408ca1

07-11-2008 15:00:10

Dear Tobias,





You are absolutely right: the molecules are indeed different and should have different names. They currently get the same name because Marvin in general does not detect those topological differences as chiralities 'r' and 's' (the way it does for the other atoms with 'R' and 'S'). Once it does, which is currently planned for release 5.2, these chiralities will also be included in the generated name, which will in particular solve the issue you reported.





Best regards,





Daniel

User 677b9c22ff

17-11-2008 20:28:19

Hi,


Thanks, Daniel. It must be quite complicated, given the mess with names one can find in a plethora of publications. Therefore InChI might be a good solution.


Tobias

ChemAxon e7b9408ca1

18-11-2008 11:13:50

Yes, InChI (or SMILES) are easier ways to generate unique strings identifying structures. But names can be better at conveying a human-readable sense of the nature of the structure. In this case, as soon as Marvin supports (r) and (s) stereo information, the name will be unique as well.

ChemAxon e7b9408ca1

18-10-2013 12:15:18

Tobias,


This comes late, but I'm happy to report that since version 5.11 our stereochemistry engine fully supports these cases. The generated names are now:


(1R,2R,3S,4R,5R,6S)-6-{[(1S,2R,3R,4S,5R,6S)-2,3,4,5,6-pentahydroxycyclohexyl]oxy}cyclohexane-1,2,3,4,5-pentol


(1R,2R,3R,4R,5R,6R)-6-{[(1S,2R,3R,4S,5R,6S)-2,3,4,5,6-pentahydroxycyclohexyl]oxy}cyclohexane-1,2,3,4,5-pentol



User 677b9c22ff

08-11-2013 05:25:05

Hi,


Rome wasn't built in a day. :-)


Thanks and Cheers


Tobias