Different results depending on the format

User 918876d6ff

13-03-2013 17:23:24

I have run into a really strange problem.


I have two chemical compounds, in 2 different databases, that are supposed to be the same:


- kegg:C03516 http://www.genome.jp/dbget-bin/www_bget?C03516


- CHEBI:15431 http://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:15431


 


If I load these two compounds with MolImporter using their respective molfiles ( http://www.genome.jp/dbget-bin/www_bget?-f+m+compound+C03516 for kegg:C03516
and
http://www.ebi.ac.uk/chebi/saveStructure.do;jsessionid=4803C46E7EC8609A093ABB7548715337?defaultImage=true&chebiId=15431&imageId=0 for CHEBI:15431) and ask for the InChI and the SMILES of each, JChem returns the same InChI and the same SMILES for both, which is correct:


InChI returned for kegg:C03516


InChI=1S/C34H34N4O4.Mg/c1-7-21-17(3)25-13-26-19(5)23(9-11-33(39)40)31(37-26)16-32-24(10-12-34(41)42)20(6)28(38-32)15-30-22(8-2)18(4)27(36-30)14-29(21)35-25;/h7-8,13-16H,1-2,9-12H2,3-6H3,(H4,35,36,37,38,39,40,41,42);/q;+2/p-2/b25-13-,26-13-,27-14-,28-15-,29-14-,30-15-,31-16-,32-16-;    


SMILES returned for kegg:C03516


CC1=C(CCC(O)=O)C2=N\C\1=C/C1=C(C)C(C=C)=C3\C=C4/N=C(/C=C5\N([Mg]N13)/C(=C\2)C(CCC(O)=O)=C5C)C(C=C)=C4C

InChI returned for chebi:15431   


InChI=1S/C34H34N4O4.Mg/c1-7-21-17(3)25-13-26-19(5)23(9-11-33(39)40)31(37-26)16-32-24(10-12-34(41)42)20(6)28(38-32)15-30-22(8-2)18(4)27(36-30)14-29(21)35-25;/h7-8,13-16H,1-2,9-12H2,3-6H3,(H4,35,36,37,38,39,40,41,42);/q;+2/p-2/b25-13-,26-13-,27-14-,28-15-,29-14-,30-15-,31-16-,32-16-;    


SMILES returned for chebi:15431


CC1=C(CCC(O)=O)C2=N\C\1=C/C1=C(C)C(C=C)=C3\C=C4/N=C(/C=C5\N([Mg]N13)/C(=C\2)C(CCC(O)=O)=C5C)C(C=C)=C4C


 


But now, if I ask for the major microspecies at pH 7.3, I get different results depending on whether I used the KEGG or the ChEBI molfile, which is strange:


InChI returned for major microspecies of kegg:C03516 at pH 7.3


InChI=1S/C34H34N4O4.Mg/c1-7-21-17(3)25-13-26-19(5)23(9-11-33(39)40)31(37-26)16-32-24(10-12-34(41)42)20(6)28(38-32)15-30-22(8-2)18(4)27(36-30)14-29(21)35-25;/h7-8,13-16H,1-2,9-12H2,3-6H3,(H4,35,36,37,38,39,40,41,42);/q;+2/p-4/b25-13-,26-13-,27-14-,28-15-,29-14-,30-15-,31-16-,32-16-;    


SMILES returned for major microspecies of kegg:C03516 at pH 7.3


CC1=C(CCC([O-])=O)C2=N\C\1=C/C1=C(C)C(C=C)=C3\C=C4/N=C(/C=C5\N([Mg]N13)/C(=C\2)C(CCC([O-])=O)=C5C)C(C=C)=C4C


InChI returned for major microspecies of chebi:15431 at pH 7.3


InChI=1S/C34H34N4O4.Mg/c1-7-21-17(3)25-13-26-19(5)23(9-11-33(39)40)31(37-26)16-32-24(10-12-34(41)42)20(6)28(38-32)15-30-22(8-2)18(4)27(36-30)14-29(21)35-25;/h7-8,13-16,36-37H,1-2,9-12H2,3-6H3,(H,39,40)(H,41,42);/q-2;+2/p-2/b25-13-,26-13-,27-14-,28-15-,29-14-,30-15-,31-16-,32-16-; 


SMILES returned for major microspecies of chebi:15431 at pH 7.3


CC1=C2NC(\C=C3/N4[Mg]N5\C(=C/2)C(C)=C(C=C)\C\5=C\C2=C(C)C(C=C)=C(N2)\C=C4\C(C)=C3CCC([O-])=O)=C1CCC([O-])=O


Is there any explanation for this?
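To see where the two microspecies InChIs actually diverge, they can be compared layer by layer. The following is only a minimal sketch in plain Python; it does not use JChem, and the helper names `inchi_layers` and `differing_layers` are made up for illustration. It assumes each layer prefix occurs once, which holds for the standard InChIs quoted above. The connectivity and stereo parts are written once and reused because they are identical in both strings.

```python
def inchi_layers(inchi):
    """Map each InChI layer prefix ('c', 'h', 'q', 'p', 'b', ...) to its body."""
    parts = inchi.strip().split("/")
    layers = {"version": parts[0], "formula": parts[1]}
    for part in parts[2:]:
        layers[part[0]] = part[1:]
    return layers


def differing_layers(inchi_a, inchi_b):
    """Return {prefix: (body_a, body_b)} for every layer that is not identical."""
    a, b = inchi_layers(inchi_a), inchi_layers(inchi_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# Pieces shared by both microspecies InChIs quoted above.
HEAD = (
    "InChI=1S/C34H34N4O4.Mg/c1-7-21-17(3)25-13-26-19(5)23(9-11-33(39)40)"
    "31(37-26)16-32-24(10-12-34(41)42)20(6)28(38-32)15-30-22(8-2)18(4)"
    "27(36-30)14-29(21)35-25;"
)
STEREO = "/b25-13-,26-13-,27-14-,28-15-,29-14-,30-15-,31-16-,32-16-;"

# Major microspecies at pH 7.3, from the KEGG and the ChEBI molfile.
kegg_ms = HEAD + "/h7-8,13-16H,1-2,9-12H2,3-6H3,(H4,35,36,37,38,39,40,41,42);/q;+2/p-4" + STEREO
chebi_ms = HEAD + "/h7-8,13-16,36-37H,1-2,9-12H2,3-6H3,(H,39,40)(H,41,42);/q-2;+2/p-2" + STEREO

for prefix, (kegg_body, chebi_body) in differing_layers(kegg_ms, chebi_ms).items():
    print(f"/{prefix}\n  KEGG : {kegg_body}\n  ChEBI: {chebi_body}")
```

Only the hydrogen (/h), charge (/q) and proton (/p) layers are reported. The connectivity (/c) and double-bond stereo (/b) layers agree, so the two results disagree about where hydrogens and charges sit, not about the skeleton.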


 


Even stranger: if, instead of importing the KEGG compound from the molfile, I import it using its InChI and ask JChem to give me back the InChI of this molecule, I get a different value:


InChI used for the import:


InChI=1S/C34H34N4O4.Mg/c1-7-21-17(3)25-13-26-19(5)23(9-11-33(39)40)31(37-26)16-32-24(10-12-34(41)42)20(6)28(38-32)15-30-22(8-2)18(4)27(36-30)14-29(21)35-25;/h7-8,13-16H,1-2,9-12H2,3-6H3,(H4,35,36,37,38,39,40,41,42);/q;+2/p-2/b25-13-,26-13-,27-14-,28-15-,29-14-,30-15-,31-16-,32-16-;   


InChI returned by jchem:


InChI=1S/C34H34N4O4.Mg/c1-7-21-17(3)25-13-26-19(5)23(9-11-33(39)40)31(37-26)16-32-24(10-12-34(41)42)20(6)28(38-32)15-30-22(8-2)18(4)27(36-30)14-29(21)35-25;/h7-8,13-16H,1-2,9-12H2,3-6H3,(H4,35,36,37,38,39,40,41,42);/q;+2/p-2/b25-13-,26-13?,27-14?,28-15-,29-14-,30-15?,31-16-,32-16?;  


So why is some information, encoded in the InChI used for the import, lost during the process?
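The two strings differ only in the double-bond stereo layer (/b). As a rough illustration, the sketch below parses the /b layer bodies copied from the two InChIs above and lists the bonds whose parity changed. It is plain Python with no JChem involved, and the function names `bond_parities` and `changed_parities` are hypothetical helpers, not part of any library.

```python
import re


def bond_parities(b_layer):
    """Parse the body of an InChI /b layer into {(atom1, atom2): parity}."""
    return {
        (int(first), int(second)): parity
        for first, second, parity in re.findall(r"(\d+)-(\d+)([-+?u])", b_layer)
    }


def changed_parities(before, after):
    """Return {bond: (old_parity, new_parity)} for bonds whose parity changed."""
    old, new = bond_parities(before), bond_parities(after)
    return {
        bond: (old[bond], new.get(bond))
        for bond in old
        if old[bond] != new.get(bond)
    }


# /b layer of the InChI used for the import and of the InChI returned by JChem.
imported = "25-13-,26-13-,27-14-,28-15-,29-14-,30-15-,31-16-,32-16-;"
returned = "25-13-,26-13?,27-14?,28-15-,29-14-,30-15?,31-16-,32-16?;"

for bond, (old, new) in sorted(changed_parities(imported, returned).items()):
    print(f"bond {bond[0]}-{bond[1]}: '{old}' became '{new}'")
```

Four of the eight double-bond descriptors (26-13, 27-14, 30-15 and 32-16) change from a defined parity to `?`, so the lost information is the double-bond stereo on those bonds.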


SMILES returned by jchem:


 [Mg++].[H]\C1=C2\[N-]\C(=C([H])/C3=N/C(=C([H])\C4=N\C(=C([H])/C5=C(C=C)C(C)=C1N5)\C(C)=C4CCC(O)=O)/C(CCC([O-])=O)=C3C)C(C)=C2C=C


which differs from the one obtained when importing from the molfile...


 


And if I now ask for the major microspecies of this compound (the one imported from the InChI) at pH 7.3, the information about the Mg is lost:


InChI returned by jchem for the major microspecies at pH 7.3:


InChI=1S/C34H34N4O4/c1-7-21-17(3)25-13-26-19(5)23(9-11-33(39)40)31(37-26)16-32-24(10-12-34(41)42)20(6)28(38-32)15-30-22(8-2)18(4)27(36-30)14-29(21)35-25/h7-8,13-16,35-36H,1-2,9-12H2,3-6H3,(H,39,40)(H,41,42)/p-2  


SMILES returned by jchem for the major microspecies at pH 7.3:


CC1=C(CCC([O-])=O)C2=NC1=CC1=C(C=C)C(C)=C(N1)C=C1NC(=CC3=NC(=C2)C(CCC([O-])=O)=C3C)C(C)=C1C=C


 


The same kind of inconsistency occurs if I import from the SMILES.


SMILES used for the import:


CC1=C(CCC(O)=O)C2=N\C\1=C/C1=C(C)C(C=C)=C3\C=C4/N=C(/C=C5\N([Mg]N13)/C(=C\2)C(CCC(O)=O)=C5C)C(C=C)=C4C


InChI returned by jchem:


InChI=1S/C34H34N4O4.Mg/c1-7-21-17(3)25-13-26-19(5)23(9-11-33(39)40)31(37-26)16-32-24(10-12-34(41)42)20(6)28(38-32)15-30-22(8-2)18(4)27(36-30)14-29(21)35-25;/h7-8,13-16H,1-2,9-12H2,3-6H3,(H4,35,36,37,38,39,40,41,42);/q;+2/p-2/b25-13?,26-13-,27-14-,28-15-,29-14?,30-15?,31-16?,32-16-;    


SMILES returned by jchem:


CC1=C(CCC(O)=O)C2=N\C\1=C/C1=C(C)C(C=C)=C3\C=C4/N=C(/C=C5\N([Mg]N13)/C(=C\2)C(CCC(O)=O)=C5C)C(C=C)=C4C


This time the SMILES used for the import and the one returned are consistent.


InChI returned by jchem for the major microspecies at pH 7.3:


InChI=1S/C34H34N4O4.Mg/c1-7-21-17(3)25-13-26-19(5)23(9-11-33(39)40)31(37-26)16-32-24(10-12-34(41)42)20(6)28(38-32)15-30-22(8-2)18(4)27(36-30)14-29(21)35-25;/h7-8,13-16H,1-2,9-12H2,3-6H3,(H4,35,36,37,38,39,40,41,42);/q;+2/p-4  


SMILES returned by jchem for the major microspecies at pH 7.3:


CC1=C(CCC([O-])=O)C2=CC3=C(CCC([O-])=O)C(C)=C4C=C5N=C(C=C6N([Mg]N34)C(=CC1=N2)C(C)=C6C=C)C(C)=C5C=C    
This SMILES differs from the one obtained when importing from the molfile and asking for the major microspecies at pH 7.3:


CC1=C(CCC([O-])=O)C2=N\C\1=C/C1=C(C)C(C=C)=C3\C=C4/N=C(/C=C5\N([Mg]N13)/C(=C\2)C(CCC([O-])=O)=C5C)C(C=C)=C4C


 


So my question is: why do the results differ depending on the format used for the import?


 


PS: I use JChem release 5.12.

ChemAxon a202a732bf

14-03-2013 11:00:30

Dear Thomas,


I have tried to reproduce your problem, but I could not, because it seems that the two mol files are different. The one from the ChEBI database contains coordinate bonds and the one from KEGG does not. Major microspecies cannot even be calculated for the molecule from the ChEBI database because of the coordinate bonds. Could you please check the two molecules in the mol files?


The SMILES strings do not differ because SMILES does not store coordinate bonds.


Regards,


Zsuzsa

User 918876d6ff

14-03-2013 15:22:41

Dear Zsuzsa


I may not have expressed myself properly. I agree that the two molfiles are not exactly the same, as the one from ChEBI contains coordinate bonds that the one from KEGG does not include. However, what I do not understand is why these two molfiles give exactly the same InChI but not the same major tautomer at pH 7.3 (the InChIs of the major tautomers differ). (Remark: the coordinate bonds in the ChEBI molfile do not prevent the MajorMicrospeciesPlugin from running and giving a result.)


The second thing I don't understand is why importing the KEGG compound from its InChI gives different results than importing it from its molfile. And especially, why is the InChI returned by JChem not the same as the one used to import the molecule?


    Thanks for your answers


Regards,


     Thomas


 


 


 



ChemAxon a202a732bf

18-03-2013 17:56:12

Dear Thomas,


I have checked the issues you raised and found the following, which may be causing the problems you have experienced.







I hope this helps, best regards,


Zsuzsa

ChemAxon fc046975bc

24-02-2014 12:47:18

Dear Thomas,


Our next major release, 6.3, will contain the fix. InChI export will no longer export coordinate bonds as single bonds, but will omit them.


Best Regards,
Peter