Hi,
I am currently working on substructure searching and use molconvert/jcsearch to search for compounds.
I take SMILES strings as input (canonical and/or isomeric). When I run jcsearch to search for a substructure in one or more input structures, the SMILES strings of the superstructure(s) returned by jcsearch do not always match the input SMILES strings. As an example:
>> $ jcsearch -q '[$(N.O)]' -f smiles "NC.C.O"
C.O.CN (<-----Results)
As you can see, the input is "NC.C.O" and the output is "C.O.CN", so it is difficult to match the output SMILES to the corresponding input SMILES.
As a workaround, I decided to convert the input SMILES to InChIs, store them, and then run jcsearch so that it returns InChI instead of SMILES.
So in this case, the corresponding InChI returned by molconvert would be:
InChI=1S/CH5N.CH4.H2O/c1-2;;/h2H2,1H3;1H4;1H2
AuxInfo=1/0/N:2,1;3;4/rA:4nNCCO/rB:s1;;;/rC:.4125,0,0;-.4125,0,0;2.6518,0,0;0,-1.65,0;
PS: I usually consider only the first line (InChI=1S/CH5N.CH4.H2O/c1-2;;/h2H2,1H3;1H4;1H2), which is stored together with the corresponding SMILES.
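To make the bookkeeping concrete, here is a minimal Python sketch of how I keep that lookup. The function and variable names are hypothetical, and the molconvert output is pasted in as a string rather than produced by calling the tool.

```python
def first_inchi_line(tool_output: str) -> str:
    """Return the first line starting with 'InChI=' and ignore the AuxInfo line."""
    for line in tool_output.splitlines():
        line = line.strip()
        if line.startswith("InChI="):
            return line
    raise ValueError("no InChI line found in the output")


# Output of molconvert for the input SMILES "NC.C.O", copied from above.
molconvert_output = (
    "InChI=1S/CH5N.CH4.H2O/c1-2;;/h2H2,1H3;1H4;1H2\n"
    "AuxInfo=1/0/N:2,1;3;4/rA:4nNCCO/rB:s1;;;/rC:.4125,0,0;-.4125,0,0;2.6518,0,0;0,-1.65,0;\n"
)

# Lookup table: InChI line -> input SMILES exactly as I supplied it.
inchi_to_smiles = {first_inchi_line(molconvert_output): "NC.C.O"}

# Later, any tool output carrying the same InChI line maps back to the input.
print(inchi_to_smiles["InChI=1S/CH5N.CH4.H2O/c1-2;;/h2H2,1H3;1H4;1H2"])  # prints NC.C.O
```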
If I run jcsearch -q '[$(N.O)]' -f inchi "NC.C.O", I get the following:
InChI=1S/CH5N.CH4.H2O/c1-2;;/h2H2,1H3;1H4;1H2
AuxInfo=1/0/N:2,1;3;4/rA:4nNCCO/rB:s1;;;/rC:;;;;
You can see that the first line of the output, "InChI=1S/CH5N.CH4.H2O/c1-2;;/h2H2,1H3;1H4;1H2", is the same as the InChI I stored, so it is easy to identify the corresponding input SMILES. HOWEVER, I noticed that this does not solve everything. There is a problem if the input SMILES string is isomeric. For example:
The InChI returned by molconvert for the isomeric SMILES "O[C@H]1CN(CC[C@H](NCC[C@H](O)C(O)=O)C(O)=O)[C@@H]1C(O)=O" is: InChI=1S/C12H20N2O8/c15-7(11(19)20)1-3-13-6(10(17)18)2-4-14-5-8(16)9(14)12(21)22/h6-9,13,15-16H,1-5H2,(H,17,18)(H,19,20)(H,21,22)/t6-,7-,8-,9-/m0/s1
If I run jcsearch -q '[$(C(O)=O)]' -f inchi "O[C@H]1CN(CC[C@H](NCC[C@H](O)C(O)=O)C(O)=O)[C@@H]1C(O)=O" to check for the presence of a carboxylic acid group, I get the following InChI: InChI=1S/C12H20N2O8/c15-7(11(19)20)1-3-13-6(10(17)18)2-4-14-5-8(16)9(14)12(21)22/h6-9,13,15-16H,1-5H2,(H,17,18)(H,19,20)(H,21,22). The stereo layers (/t6-,7-,8-,9-/m0/s1) are missing, so as I understand it this corresponds to the canonical (non-isomeric) version of the input SMILES string.
As you can see, I cannot match the result to the input SMILES, because the InChI I stored for it is different from the one I get from jcsearch.
How do I solve this? One way would be for jcsearch to output the InChI taking the stereochemical configuration into account, so that jcsearch returns the same InChI as molconvert. Is there a jcsearch option for this? Another option would be to convert the SMILES to InChI without stereochemistry, so that molconvert gives the same InChI that jcsearch returns. Either of the two solutions would be helpful, but with the second I would pay the price of losing the stereochemistry information. That would penalize me down the road when I have to differentiate molecules with different configurations.
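For what it is worth, here is a rough Python sketch of the second option done on my side by string processing instead of with a molconvert option. It assumes standard InChI strings in which the stereo layers are exactly those starting with /b, /t, /m or /s. The names strip_stereo_layers and candidates are hypothetical. To limit the loss of information, the full stereo InChI is kept next to each SMILES, but two stereoisomers would still end up under the same key.

```python
from collections import defaultdict

# Layer prefixes treated as stereo layers (assumption: standard InChI layout).
STEREO_PREFIXES = ("b", "t", "m", "s")


def strip_stereo_layers(inchi: str) -> str:
    """Drop the /b, /t, /m and /s layers from an InChI string."""
    head, *layers = inchi.split("/")
    # layers[0] is the formula layer and is always kept.
    kept = [
        layer
        for i, layer in enumerate(layers)
        if i == 0 or not layer.startswith(STEREO_PREFIXES)
    ]
    return "/".join([head, *kept])


smiles = "O[C@H]1CN(CC[C@H](NCC[C@H](O)C(O)=O)C(O)=O)[C@@H]1C(O)=O"
molconvert_inchi = (
    "InChI=1S/C12H20N2O8/c15-7(11(19)20)1-3-13-6(10(17)18)2-4-14-5-8(16)"
    "9(14)12(21)22/h6-9,13,15-16H,1-5H2,(H,17,18)(H,19,20)(H,21,22)"
    "/t6-,7-,8-,9-/m0/s1"
)
jcsearch_inchi = (
    "InChI=1S/C12H20N2O8/c15-7(11(19)20)1-3-13-6(10(17)18)2-4-14-5-8(16)"
    "9(14)12(21)22/h6-9,13,15-16H,1-5H2,(H,17,18)(H,19,20)(H,21,22)"
)

# Stereo-free InChI -> list of (input SMILES, full InChI) candidates.
candidates = defaultdict(list)
candidates[strip_stereo_layers(molconvert_inchi)].append((smiles, molconvert_inchi))

# The jcsearch result now finds the stored entry.
print(strip_stereo_layers(molconvert_inchi) == jcsearch_inchi)  # prints True
print(candidates[jcsearch_inchi][0][0] == smiles)               # prints True
```

This makes the lookup work for the example above, but it does not tell me which stereoisomer was actually hit when several share the same stereo-free InChI, which is why I would prefer an option on the jcsearch side.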
Could you please help me here?
Regards,
Yannick