2D clean results in bond angles that confuse InChI algorithm

User a11e9761d6

28-03-2011 18:14:57

We have two structures with the following CXSMILES:


CC(C)NC(=O)N(C)C[C@@H]1OCCCC[C@@H](C)Oc2ccc(NC(=O)Nc3cccc4ccccc34)cc2C(=O)N(C[C@@H]1C)[C@@H](C)CO


and 


CC(C)NC(=O)N(C)C[C@@H]1OCCCC[C@@H](C)Oc2ccc(NC(=O)Nc3cccc4ccccc34)cc2C(=O)N(C[C@@H]1C)[C@H](C)CO


We were surprised to discover that when we used the ChemAxon API to convert these to molfiles and generated InChIs from them, the InChIs were the same. According to several people on the InChI discussion list, this is because one of the stereocenters is drawn with very unrealistic bond angles, so the InChI algorithm considers it unspecified.


InChIs and InChIKeys are an important part of our application, so we would definitely like a solution for this issue.


 


Thanks in advance,


Krishna

ChemAxon 0a9e2a55e1

29-03-2011 11:09:08

Dear Krishna,



I have tested these 2 smiles in Marvin 5.4.


With the following API code:



    import chemaxon.formats.MolImporter;
    import chemaxon.struc.Molecule;

    // Import each SMILES, then export it as InChI without the AuxInfo part
    Molecule m1 = MolImporter.importMol("CC(C)NC(=O)N(C)C[C@@H]1OCCCC[C@@H](C)Oc2ccc(NC(=O)Nc3cccc4ccccc34)cc2C(=O)N(C[C@@H]1C)[C@@H](C)CO");
    Molecule m2 = MolImporter.importMol("CC(C)NC(=O)N(C)C[C@@H]1OCCCC[C@@H](C)Oc2ccc(NC(=O)Nc3cccc4ccccc34)cc2C(=O)N(C[C@@H]1C)[C@H](C)CO");
    System.out.println(m1.toFormat("inchi:AuxNone"));
    System.out.println(m2.toFormat("inchi:AuxNone"));

I get different InChIs:


InChI=1S/C37H51N5O6/c1-24(2)38-37(46)41(6)22-34-25(3)21-42(26(4)23-43)35(44)31-20-29(17-18-33(31)48-27(5)12-9-10-19-47-34)39-36(45)40-32-16-11-14-28-13-7-8-15-30(28)32/h7-8,11,13-18,20,24-27,34,43H,9-10,12,19,21-23H2,1-6H3,(H,38,46)(H2,39,40,45)/t25-,26-,27+,34-/m0/s1
InChI=1S/C37H51N5O6/c1-24(2)38-37(46)41(6)22-34-25(3)21-42(26(4)23-43)35(44)31-20-29(17-18-33(31)48-27(5)12-9-10-19-47-34)39-36(45)40-32-16-11-14-28-13-7-8-15-30(28)32/h7-8,11,13-18,20,24-27,34,43H,9-10,12,19,21-23H2,1-6H3,(H,38,46)(H2,39,40,45)/t25-,26+,27+,34-/m0/s1


The two differ in the /t layer: center 26 is 26- in the first InChI and 26+ in the second.
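The comparison of the /t layers can also be done programmatically. Below is a minimal Java sketch that extracts the first /t layer from each InChI and lists the centers whose parity differs. The class and method names (`InchiStereoDiff`, `stereoLayer`, `parities`, `differingCenters`) are made up for this illustration and are not part of any ChemAxon or IUPAC API. It assumes a single-component InChI with a simple comma-separated /t layer, as in the two strings above.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative helper (hypothetical names): compares the tetrahedral stereo
// layers of two InChI strings. Assumes one component and one /t layer.
final class InchiStereoDiff {

    /** Returns the content of the first /t layer without the "t", or "" if there is none. */
    static String stereoLayer(String inchi) {
        for (String layer : inchi.split("/")) {
            if (layer.startsWith("t")) {
                return layer.substring(1);
            }
        }
        return "";
    }

    /** Maps each atom number in the /t layer to its parity mark (e.g. '+' or '-'). */
    static Map<Integer, Character> parities(String inchi) {
        Map<Integer, Character> result = new TreeMap<>();
        String layer = stereoLayer(inchi);
        if (layer.isEmpty()) {
            return result;
        }
        for (String entry : layer.split(",")) {
            int last = entry.length() - 1;
            result.put(Integer.parseInt(entry.substring(0, last)), entry.charAt(last));
        }
        return result;
    }

    /** Returns the atom numbers whose parity differs, with both marks as text. */
    static Map<Integer, String> differingCenters(String a, String b) {
        Map<Integer, Character> pa = parities(a);
        Map<Integer, Character> pb = parities(b);
        Set<Integer> atoms = new TreeSet<>(pa.keySet());
        atoms.addAll(pb.keySet());
        Map<Integer, String> diff = new TreeMap<>();
        for (Integer atom : atoms) {
            if (!Objects.equals(pa.get(atom), pb.get(atom))) {
                diff.put(atom, pa.get(atom) + " vs " + pb.get(atom));
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        // Shared part of the two InChIs quoted above, up to the /h layer.
        String common = "InChI=1S/C37H51N5O6/c1-24(2)38-37(46)41(6)22-34-25(3)21-42(26(4)23-43)"
                + "35(44)31-20-29(17-18-33(31)48-27(5)12-9-10-19-47-34)39-36(45)40-32-16-11-14-"
                + "28-13-7-8-15-30(28)32/h7-8,11,13-18,20,24-27,34,43H,9-10,12,19,21-23H2,1-6H3,"
                + "(H,38,46)(H2,39,40,45)";
        String first = common + "/t25-,26-,27+,34-/m0/s1";
        String second = common + "/t25-,26+,27+,34-/m0/s1";
        System.out.println(differingCenters(first, second)); // prints {26=- vs +}
    }
}
```

An empty result for two structures that are known to be different stereoisomers would indicate that a center was dropped or treated as unspecified somewhere in the conversion.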


I also tried both SMILES with molconvert, and on our site by importing them into the applet and reading the InChI from Edit/Source. Both gave the same results as above. Note that on the site, 2D clean runs before the export when the SMILES is displayed.


Maybe you have found a bug in a specific configuration. Which Marvin version and operating system do you use? Do molconvert and the applet give the same result? Did you use importMol and toFormat in the API as I did in the code above? If not, can you send me the import and export lines from your code?


We are currently rewriting the InChI converter in our code to use the API of the IUPAC code rather than the executable. It will be faster and will no longer need a clean. I have checked your SMILES in the new converter as well, and it also gives the different results shown above.


Best Regards,


Peter

User a11e9761d6

05-04-2011 19:03:49

Hi Peter,


Thank you for your very detailed reply!  We do the following to produce InChIs:


1) convert the structure to a molfile using the code below. I also tried running a 2-d clean in the Marvin applet, and the coordinates did not change at all, so the issue appears to occur with the applet as well. (we are running MarvinSketch 5.3.6; how can I find out what JChem version we're running?)


2) pass this molfile to the IUPAC binary


 


Code used to generate the molfile (happens to be called through Ruby):


 


    # We have to use MolConverter because MolHandler cannot produce a molfile with coordinate data


    MolConverter.new_with_sig(MOL_CONVERTER_CONSTRUCTOR_SIG, input, output, "mol:-a", false).convert


 


Does calling toFormat("inchi:AuxNone") produce exactly the same result as the IUPAC 1.03 InChI library? We may switch to this if so.


Thanks again,


Krishna


 


 

ChemAxon 0a9e2a55e1

07-04-2011 08:09:10

Dear Krishna,


Within one release the JChem and Marvin versions are the same, so you most probably have JChem 5.3.6.


I do not know Ruby, but from the code I think you use it like our molconvert command scripts, which convert between file formats, so it is a good solution.


In all previous releases Marvin called the original IUPAC code and produced the same result in most cases. That approach had several problems: there was a memory leak, so Marvin could throw exceptions after a large number of structures; 0D molfiles cannot store stereo information, so we had to clean before export, which was slow; and we could not import stereo information from 0D molecules. An InChI string without AuxInfo is 0D, and many users asked us to improve this. The latest release (5.4.1.1) used InChI 1.02.


The next release (5.5) will convert InChI with the jni-inchi SourceForge project and use InChI 1.03 through its API. The limitations are that we do not support allenes and cumulenes yet, that there are minor differences when writing out some specially projected 2D molecules (these are mostly wrongly drawn structures, for example with unrealistic wedge information), and that we do not automatically add hydrogens to metals. So the result is not exactly the same as that of the 1.03 executable for all molecules, but it differs only in some very special cases.


Best Regards,


Peter

User a11e9761d6

07-04-2011 18:27:47

Hi Peter,


Thanks for your clarifications. Based on your description I am reluctant to switch to ChemAxon 5.5's InChI generation because we do have allenes in our database, and it sounds like there are other compounds that would also fail to produce InChIs. Do you have any suggestions for how we could export/convert structures in a way that would avoid the bond angle problem we are currently having between Marvin and the IUPAC binary?


Krishna

ChemAxon 0a9e2a55e1

08-04-2011 10:51:26

Dear Krishna,


After your first post I wrote that when I checked the SMILES I got different InChIs. From your post it seems that you got the same InChI for both; I just cannot reproduce it. Is it one of the InChIs I got, or a totally different one? (That is really strange: in the previous versions Marvin converts to mol, then calls the IUPAC executable...)


I do not know who said on that list that the bond angles are not realistic, or whether they used Marvin, but I do not know of any complaints about our bond angles in 2D or 3D. If I open the structure you sent in MSketch, it also does not seem to be wrongly drawn in 2D or 3D.


Best Regards,


Peter

User a11e9761d6

11-04-2011 21:35:59

Hi Peter,


Hopefully the attached SDF will answer your questions. In it you will see:


1) the CXSMILES that we start with, which are the same as what I posted


2) the molfiles that we generate from the CXSMILES, using the MolConverter command I posted. When I paste these molfiles into Marvin, the stereocenter that distinguishes the two molecules is drawn with a bond angle less than 45 degrees. According to multiple users on the inchi-discuss mailing list, this bond angle is what causes the IUPAC executable to consider that stereocenter undefined. Performing a 2-D clean does not change the bond angle. What molfiles do you get when you convert these CXSMILES?


3) the InChIKeys that we get by passing the molfiles to the IUPAC executable, which you can see are the same.
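For reference, the bond angle described in point 2 can be measured directly from the 2D coordinates in a molfile atom block. The following is a minimal Java sketch under stated assumptions: the class and method names (`BondAngleCheck`, `angleDegrees`, `smallestAngle`) are invented for this example, the coordinates in `main` are made up and not taken from the attached SDF, and the 45 degree figure is only the value mentioned in this thread, not a confirmed threshold of the InChI software.

```java
// Illustrative helper (hypothetical names): measures the angles between the
// bonds around one atom, given 2D coordinates as {x, y} pairs.
final class BondAngleCheck {

    /** Angle in degrees (0 to 180) at center between the bonds to a and b. */
    static double angleDegrees(double[] center, double[] a, double[] b) {
        double v1x = a[0] - center[0];
        double v1y = a[1] - center[1];
        double v2x = b[0] - center[0];
        double v2y = b[1] - center[1];
        double dot = v1x * v2x + v1y * v2y;
        double cross = v1x * v2y - v1y * v2x;
        // atan2 on |cross| and dot avoids the rounding problems of acos
        return Math.toDegrees(Math.atan2(Math.abs(cross), dot));
    }

    /**
     * Smallest angle between any two bonds around the center.
     * Returns 180 when there are fewer than two neighbours.
     */
    static double smallestAngle(double[] center, double[][] neighbours) {
        double min = 180.0;
        for (int i = 0; i < neighbours.length; i++) {
            for (int j = i + 1; j < neighbours.length; j++) {
                min = Math.min(min, angleDegrees(center, neighbours[i], neighbours[j]));
            }
        }
        return min;
    }

    public static void main(String[] args) {
        // Made-up coordinates for a center with three drawn neighbours.
        double[] center = {0.0, 0.0};
        double[][] neighbours = {{1.0, 0.0}, {0.0, 1.0}, {1.0, 0.5}};
        double min = smallestAngle(center, neighbours);
        // 45.0 is the angle mentioned in this thread, used here only as an assumption.
        String verdict = min < 45.0 ? "narrow angle, worth inspecting" : "looks fine";
        System.out.printf("smallest angle: %.1f degrees (%s)%n", min, verdict);
    }
}
```

Running such a check on the stereocenters of generated molfiles before passing them to the IUPAC executable would flag drawings like the one described above.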


 


Krishna




 

ChemAxon 0a9e2a55e1

13-04-2011 07:58:14

Dear Krishna,


I cannot see any SD file in this forum. Could you upload it again?


Best Regards,


Peter

User a11e9761d6

13-04-2011 22:12:19

Oops! How about this time...

ChemAxon 0a9e2a55e1

15-04-2011 07:59:44

Dear Krishna,


I have checked the structures in 5.3.6, the current version (5.4.1), and the upcoming version (5.5), which is currently in alpha. I could not clean the structure in 2D in 5.3.6, but it was cleaned in the others, and as far as I can see the bond angles are normal in 5.4.1 and 5.5. (I have checked the full Marvin change list and it seems we have not corrected any bond angle bug since 5.3.0, and I could not find any reports about problematic bond angles.)


Please check one of our applet examples:


http://www.chemaxon.com/marvin/sketch/index.php


Insert this structure, clean it in 2D, and check the problematic bonds in the result. If it is good, then I think it is a 5.3.6-specific bug, and simply upgrading to 5.4.1 or the upcoming 5.5 will fix your problem.


Best Regards,


Peter

User a11e9761d6

18-04-2011 19:38:38

Hi Peter,


 


Yes, confirmed-- I get different InChIs when I use molfiles generated from http://www.chemaxon.com/marvin/sketch/index.php . Thanks for your help.


 


Krishna