[URGENT PROBLEM] Stereochemistry with InChI and Jcsearch.

User 779e37e0e6

22-02-2012 18:28:28

Hi,


I am currently working on substructure searching and use molconvert/jcsearch to search for compounds.


I take SMILES strings as input (canonical and/or isomeric). When I run jcsearch to look for a substructure in one or more target structures, the SMILES strings of the matching structures (superstructures) returned by jcsearch do not always match the input SMILES strings. As an example:


 


>> $ jcsearch -q '[$(N.O)]' -f smiles "NC.C.O"


      C.O.CN    (<-----Results)


As you can see, the input is "NC.C.O" and the output is "C.O.CN", so it is difficult to match an output SMILES to the corresponding input SMILES.


To work around this, I decided to convert the input SMILES to InChIs, store them, and then have jcsearch return InChI instead of SMILES.


In this case, the corresponding InChI returned by molconvert would be:


InChI=1S/CH5N.CH4.H2O/c1-2;;/h2H2,1H3;1H4;1H2


AuxInfo=1/0/N:2,1;3;4/rA:4nNCCO/rB:s1;;;/rC:.4125,0,0;-.4125,0,0;2.6518,0,0;0,-1.65,0;


PS: I usually keep only the first line (InChI=1S/CH5N.CH4.H2O/c1-2;;/h2H2,1H3;1H4;1H2) and store it with the corresponding SMILES.



If I run  jcsearch -q '[$(N.O)]' -f inchi "NC.C.O", I get the following:

   InChI=1S/CH5N.CH4.H2O/c1-2;;/h2H2,1H3;1H4;1H2

   AuxInfo=1/0/N:2,1;3;4/rA:4nNCCO/rB:s1;;;/rC:;;;;
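
A minimal Python sketch of this bookkeeping, assuming the output of each tool has been captured as text. It keeps only the "InChI=" line, because the AuxInfo lines differ between the molconvert and jcsearch outputs shown above, and uses that line as the lookup key for the input SMILES. The function and variable names are illustrative and not part of JChem.

```python
def inchi_lines(tool_output):
    """Return only the 'InChI=' lines of a molconvert/jcsearch text output."""
    return [line.strip() for line in tool_output.splitlines()
            if line.strip().startswith("InChI=")]


# molconvert output for the input SMILES "NC.C.O", copied from the post.
molconvert_output = """InChI=1S/CH5N.CH4.H2O/c1-2;;/h2H2,1H3;1H4;1H2
AuxInfo=1/0/N:2,1;3;4/rA:4nNCCO/rB:s1;;;/rC:.4125,0,0;-.4125,0,0;2.6518,0,0;0,-1.65,0;
"""

# jcsearch -f inchi output for the same structure, copied from the post.
jcsearch_output = """   InChI=1S/CH5N.CH4.H2O/c1-2;;/h2H2,1H3;1H4;1H2

   AuxInfo=1/0/N:2,1;3;4/rA:4nNCCO/rB:s1;;;/rC:;;;;
"""

# Store each input SMILES under its InChI line. A list is used because
# several input SMILES can describe the same structure.
inchi_to_smiles = {}
for key in inchi_lines(molconvert_output):
    inchi_to_smiles.setdefault(key, []).append("NC.C.O")

# Map every InChI returned by jcsearch back to the stored input SMILES.
matched = []
for key in inchi_lines(jcsearch_output):
    matched.extend(inchi_to_smiles.get(key, []))

print(matched)
```

The AuxInfo line is deliberately ignored here, so the differing coordinate block at the end of it does not affect the lookup.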



You can see that the first line of the jcsearch output, "InChI=1S/CH5N.CH4.H2O/c1-2;;/h2H2,1H3;1H4;1H2", is the same as the InChI I stored, so it is easy to identify the corresponding input SMILES. However, this does not solve everything: there is a problem if the input SMILES string is isomeric. For example:


The InChI returned by molconvert for the isomeric SMILES "O[C@H]1CN(CC[C@H](NCC[C@H](O)C(O)=O)C(O)=O)[C@@H]1C(O)=O" is: InChI=1S/C12H20N2O8/c15-7(11(19)20)1-3-13-6(10(17)18)2-4-14-5-8(16)9(14)12(21)22/h6-9,13,15-16H,1-5H2,(H,17,18)(H,19,20)(H,21,22)/t6-,7-,8-,9-/m0/s1



If I run  jcsearch -q '[$(C(O)=O)]' -f inchi  "O[C@H]1CN(CC[C@H](NCC[C@H](O)C(O)=O)C(O)=O)[C@@H]1C(O)=O"  to check for the presence of a carboxylic acid group, I get the following InChI: InChI=1S/C12H20N2O8/c15-7(11(19)20)1-3-13-6(10(17)18)2-4-14-5-8(16)9(14)12(21)22/h6-9,13,15-16H,1-5H2,(H,17,18)(H,19,20)(H,21,22), which, as I understand it, corresponds to the non-isomeric (canonical) version of the input SMILES string: the stereo layers (/t, /m, /s) are missing.

As you can see, I cannot match the result to the input SMILES, because the InChI I stored for it is different from the one I get from jcsearch.


How do I solve this? One way would be to have jcsearch output the InChI with the stereochemical configuration taken into account, so that jcsearch returns the same InChI as molconvert. Is there a jcsearch option for this? Another option would be to convert the SMILES to InChI while ignoring stereochemistry, so that molconvert gives the same InChI that jcsearch returns. Either of the two solutions would help, although the second would cost me the stereochemistry information, which would hurt later when I have to differentiate molecules with different configurations.
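
One possible middle ground, sketched in Python below, is to keep the full InChI from molconvert as the stored value and to derive a second, stereo-free key from it by dropping the stereo layers (/b, /t, /m, /s). A stereo-free InChI coming back from jcsearch can then be matched on that key while the stereochemistry stays stored alongside. This is plain string handling, not a JChem feature, and it assumes that removing those layers from a standard InChI gives the same string a tool would produce without stereo information. That holds for the example above but should be checked on real data. All names are illustrative.

```python
import re

# Stereo layers of an InChI string: /b (double bond), /t (tetrahedral),
# /m and /s (stereo type flags). Each layer runs up to the next '/'.
STEREO_LAYER = re.compile(r"/[btms][^/]*")


def strip_stereo(inchi):
    """Return the InChI string with its stereo layers removed."""
    return STEREO_LAYER.sub("", inchi)


# Full InChI from molconvert and the isomeric input SMILES, from the post.
full = ("InChI=1S/C12H20N2O8/c15-7(11(19)20)1-3-13-6(10(17)18)2-4-14-5-8(16)"
        "9(14)12(21)22/h6-9,13,15-16H,1-5H2,(H,17,18)(H,19,20)(H,21,22)"
        "/t6-,7-,8-,9-/m0/s1")
smiles = "O[C@H]1CN(CC[C@H](NCC[C@H](O)C(O)=O)C(O)=O)[C@@H]1C(O)=O"

# Index by the stereo-free key, but keep the full InChI and SMILES as values.
by_flat_inchi = {}
by_flat_inchi.setdefault(strip_stereo(full), []).append((full, smiles))

# Stereo-free InChI as returned by jcsearch in the post.
from_jcsearch = ("InChI=1S/C12H20N2O8/c15-7(11(19)20)1-3-13-6(10(17)18)2-4-14-5-8(16)"
                 "9(14)12(21)22/h6-9,13,15-16H,1-5H2,(H,17,18)(H,19,20)(H,21,22)")

candidates = by_flat_inchi.get(strip_stereo(from_jcsearch), [])
print(candidates)
```

Stereoisomers of the same skeleton share one stereo-free key, so the lookup can return several candidates. With one structure per jcsearch call, as in the commands above, the input is already known and the ambiguity does not arise.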


Could you please help me here?




Regards,



Yannick

ChemAxon 9c0afc9aaf

23-02-2012 19:30:47

There might be other options, just one quick tip:


You could try converting your input to unique (canonical) SMILES ("smiles:u") and using the same format for the jcsearch output (-f) as well.
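
A rough Python sketch of how this tip could be scripted. It only uses the options that appear in this thread (-s, -q, -f and the "smiles:u" format), assumes both tools are on the PATH, and skips the call if they are not. The helper names are hypothetical, and whether both tools emit identical unique SMILES for every structure is something to verify on your own data.

```python
import shutil
import subprocess


def molconvert_unique_cmd(smiles):
    """Command line that asks molconvert for unique SMILES of one structure."""
    return ["molconvert", "smiles:u", "-s", smiles]


def jcsearch_unique_cmd(query, target):
    """Command line that asks jcsearch to report hits as unique SMILES."""
    return ["jcsearch", "-q", query, "-f", "smiles:u", target]


def run(cmd):
    """Run the command and return its stripped stdout, or None if the tool is missing."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    stored_key = run(molconvert_unique_cmd("NC.C.O"))
    hit = run(jcsearch_unique_cmd("[$(N.O)]", "NC.C.O"))
    # If both calls succeed, equal strings mean the hit maps back to the input.
    print(stored_key, hit, stored_key is not None and stored_key == hit)
```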


Best,


Szilard

User 779e37e0e6

28-02-2012 21:22:21

Thanks Szilard,


This is also one option I tried. 


Regards,


Yannick

ChemAxon 25dcd765a3

29-02-2012 09:44:49

Strange, I get a different result:


jcsearch result:


jchemsite/bin/jcsearch -q '[$(C(O)=O)]' -f inchi  "O[C@H]1CN(CC[C@H](NCC[C@H](O)C(O)=O)C(O)=O)[C@@H]1C(O)=O"
InChI=1S/C12H20N2O8/c15-7(11(19)20)1-3-13-6(10(17)18)2-4-14-5-8(16)9(14)12(21)22/h6-9,13,15-16H,1-5H2,(H,17,18)(H,19,20)(H,21,22)/t6-,7-,8-,9-/m0/s1

AuxInfo=1/1/N:10,6,9,5,3,7,11,2,19,16,13,20,8,4,12,1,17,18,14,15,21,22/E:(17,18)(19,20)(21,22)/it:im/rA:22OC.oCNCCC.eNCCC.eOCOOCOOC.eCOO/rB:s1;s2;s3;s4;s5;s6;s7;s8;s9;s10;s11;s11;s13;d13;s7;s16;d16;s2s4;s19;s20;d20;/rC:;;;;;;;;;;;;;;;;;;;;;;


molconvert generates exactly the same InChI:



marvin/trunk> molconvert inchi -s "O[C@H]1CN(CC[C@H](NCC[C@H](O)C(O)=O)C(O)=O)[C@@H]1C(O)=O"
InChI=1S/C12H20N2O8/c15-7(11(19)20)1-3-13-6(10(17)18)2-4-14-5-8(16)9(14)12(21)22/h6-9,13,15-16H,1-5H2,(H,17,18)(H,19,20)(H,21,22)/t6-,7-,8-,9-/m0/s1
AuxInfo=1/1/N:10,6,9,5,3,7,11,2,19,16,13,20,8,4,12,1,17,18,14,15,21,22/E:(17,18)(19,20)(21,22)/it:im/rA:22OC.oCNCCC.eNCCC.eOCOOCOOC.eCOO/rB:s1;s2;s3;s4;s5;s6;s7;s8;s9;s10;s11;s11;s13;d13;s7;s16;d16;s2s4;s19;s20;d20;/rC:;;;;;;;;;;;;;;;;;;;;;;






So it seems that we somehow got different results.

Which version are you using?

Could you also check whether you get the same result as I do?

ChemAxon a9ded07333

29-02-2012 09:59:58

Hi Yannick,


Do I understand correctly that you have tried converting your structures to unique SMILES and also using that format for the jcsearch output? Was there any problem with that method?


Could you also let us know your environment and JChem version number?


Best regards,
Tamás