Substructure search does not return the correct result set.

User 478d103dc9

14-06-2013 21:57:44



I’m running the example below for a sulphonamide (SearchMol.mrv attached), but the search does not return as many records as it should.


import chemaxon.jchem.db.JChemSearch;
import chemaxon.sss.SearchConstants;
import chemaxon.sss.search.JChemSearchOptions;

// connHandler: an already connected chemaxon.util.ConnectionHandler
String mol = "CN(C)S(C)(=O)=O"; // Query structure
String structureTableName = "tblStructures";

JChemSearch searcher = new JChemSearch(); // Create searcher object
searcher.setQueryStructure(mol);
searcher.setConnectionHandler(connHandler);
searcher.setStructureTable(structureTableName);

JChemSearchOptions searchOptions = new JChemSearchOptions(SearchConstants.SUBSTRUCTURE);
searcher.setSearchOptions(searchOptions);

searcher.run();


These are just two (Mol1.mrv and Mol2.mrv) of the many molecules missing from the result set.


Any ideas why? 


Thank you.

ChemAxon 4a2fc68cd1

17-06-2013 07:44:17

Hi,


Note that the JChemSearch.setQueryStructure() method interprets the given string as SMARTS, not SMILES. This means that in the case of "CN(C)S(C)(=O)=O", the terminal carbon atoms are treated as aliphatic, so the query will not match the target structure in Mol2.mrv. However, as far as I can see, it should still match Mol1.mrv.


The SMARTS representation of the desired query would be "[#6]N([#6])S([#6])(=O)=O". Could you try this one in your code? Do you still have missing hits (e.g. Mol1.mrv) with this modified query?
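To make the SMILES/SMARTS difference concrete, here is a toy Java model of the three carbon atom primitives involved. In standard SMARTS, C matches only an aliphatic carbon, c only an aromatic carbon, and [#6] any carbon by atomic number. The class name SmartsCarbonDemo is hypothetical; it is not part of JChem and ignores everything else a real SMARTS matcher handles (bonds, charges, hydrogen counts, rings).

```java
// Toy illustration only: models three SMARTS carbon primitives, nothing more.
// SmartsCarbonDemo is a hypothetical name, not a JChem class.
final class SmartsCarbonDemo {

    // Returns true if the SMARTS atom token matches a target atom described
    // by its atomic number and aromaticity flag.
    static boolean matches(String smartsAtom, int atomicNumber, boolean aromatic) {
        switch (smartsAtom) {
            case "C":
                return atomicNumber == 6 && !aromatic; // aliphatic carbon only
            case "c":
                return atomicNumber == 6 && aromatic;  // aromatic carbon only
            case "[#6]":
                return atomicNumber == 6;              // any carbon, aromatic or not
            default:
                throw new IllegalArgumentException("Unsupported token: " + smartsAtom);
        }
    }

    public static void main(String[] args) {
        // A methyl carbon vs. an aromatic ring carbon.
        System.out.println(matches("C", 6, false));   // true
        System.out.println(matches("C", 6, true));    // false
        System.out.println(matches("[#6]", 6, true)); // true
    }
}
```

So a target in which the sulphonamide N or S carries an aromatic ring carbon would be missed by the C-based query but matched by the [#6]-based one.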


Best regards,
Peter

User 478d103dc9

17-06-2013 14:54:26

Hi Peter,


Regarding "it should still match Mol1.mrv": you are correct, somehow I missed that one.


Yes, when I use the SMARTS representation "[#6]N([#6])S([#6])(=O)=O", all hits are returned.


We run all our test cases using SMILES, and they seem to work fine.
Is there an option in JChemSearch that can be used for searches regardless of the string format (SMARTS or SMILES)?


Thank you



User 478d103dc9

18-06-2013 12:48:17

Any update on this?

ChemAxon d4fff15f08

18-06-2013 14:27:02

Hi,


Sorry for the late answer.


Unfortunately, there is no general calling procedure that handles both SMARTS and SMILES at the same time. As Peter wrote, any string is interpreted as SMARTS (since SMARTS supports several extra properties that SMILES does not), except in the -t:d or --tautomer cases, for which jcsearch expects SMILES instead of SMARTS in older versions of JChem.


We recommend using SMARTS for such searches, if your environment allows it.
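For test suites that already store their queries as SMILES, one possible stop-gap (a suggestion, not a JChem feature) is to rewrite each bare aliphatic carbon C as [#6], which turns "CN(C)S(C)(=O)=O" into exactly the query Peter proposed. The helper toAnyCarbonSmarts below is hypothetical and only a sketch: it skips bracket atoms and Cl, leaves aromatic c and every other atom as written, and is not a general SMILES-to-SMARTS converter.

```java
// Hypothetical helper, not a JChem API: broadens bare aliphatic carbons in a
// simple SMILES string so the resulting SMARTS also matches aromatic carbons.
final class SmilesToSmartsSketch {

    static String toAnyCarbonSmarts(String smiles) {
        StringBuilder out = new StringBuilder();
        boolean inBracket = false; // atoms inside [...] are left untouched
        for (int i = 0; i < smiles.length(); i++) {
            char ch = smiles.charAt(i);
            if (ch == '[') {
                inBracket = true;
            } else if (ch == ']') {
                inBracket = false;
            }
            // "Cl" is chlorine, not carbon followed by something else.
            boolean isChlorine = ch == 'C' && i + 1 < smiles.length() && smiles.charAt(i + 1) == 'l';
            if (ch == 'C' && !inBracket && !isChlorine) {
                out.append("[#6]"); // any carbon, by atomic number
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints [#6]N([#6])S([#6])(=O)=O
        System.out.println(toAnyCarbonSmarts("CN(C)S(C)(=O)=O"));
    }
}
```

The rewrite deliberately only widens carbon aromaticity; queries that rely on other SMILES-specific behaviour would still need to be reviewed by hand.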


Regards,


Norbert

ChemAxon 9c0afc9aaf

18-06-2013 14:44:45

Hi,


JChemSearch can also accept a Molecule object as query:


http://www.chemaxon.com/jchem/doc/dev/java/api/chemaxon/jchem/db/JChemSearch.html#setQueryStructure(chemaxon.struc.Molecule)


MolImporter treats ambiguous strings as SMILES if possible, and only treats them as SMARTS if SMARTS-specific features are present.


http://www.chemaxon.com/marvin/help/developer/beans/api/chemaxon/formats/MolImporter.html#importMol(java.lang.String)


I hope this helps.


Best regards,


Szilard

User 478d103dc9

21-06-2013 12:58:43

Hi Norbert,


I’ve switched to SMARTS and started getting more accurate results. However, I’m now seeing an issue with SIMILARITY searching.


I ran a SIMILARITY search (80%) for sildenafil citrate (see attached Sildenafil.mrv) and got only 10 hits out of the 126 expected (see attached SearchResults.sdf).


Please advise.

ChemAxon d4fff15f08

24-06-2013 13:39:21

Hi!


Sorry for the delayed answer.


Unfortunately, we couldn't open the attached SearchResults.sdf file. Could you please resend it? Alternatively, you may send it to the [email protected] mail address.


Could you also tell us how you created the .sdf file? (If it was created by ChemAxon software, we would like to investigate why an invalid .sdf file was saved.)


Thank you in advance.


Norbert

User 478d103dc9

24-06-2013 17:29:39

Hi Norbert,


I’ve just re-sent the file to the email address you mentioned.


Thanks

ChemAxon d4fff15f08

25-06-2013 07:14:06

Hi!


Thank you for the corrected .sdf file; in the meantime we also found
the problem with the exported file.



I tried to reproduce the behaviour you described, but could not. My results with JCB versions 5.12.0 and 6.0.0 are:

- with a threshold of 0.8 I found 126 structures (All)

- with a threshold of 0.75 I found 126 structures (All)

- with a threshold of 0.7 I found 124 structures

- with a threshold of 0.6 I found 109 structures
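The counts above shrink as the threshold is lowered, which is how a dissimilarity cutoff behaves (a target is kept when its dissimilarity to the query is at or below the threshold); a similarity cutoff would behave the other way round. The sketch below assumes Tanimoto similarity on binary fingerprints and dissimilarity = 1 - Tanimoto. Under that assumption, an "80% similarity" search corresponds to a dissimilarity threshold of 0.2, not 0.8. TanimotoSketch and its methods are illustrative names, not JChem API, and the fingerprints and metric actually used by a JChem table may be configured differently.

```java
import java.util.BitSet;

// Illustrative only: Tanimoto similarity and an assumed dissimilarity cutoff.
final class TanimotoSketch {

    // Tanimoto coefficient of two binary fingerprints: |A AND B| / |A OR B|.
    static double tanimoto(BitSet a, BitSet b) {
        BitSet and = (BitSet) a.clone();
        and.and(b);
        BitSet or = (BitSet) a.clone();
        or.or(b);
        int union = or.cardinality();
        // Convention chosen here: two empty fingerprints count as identical.
        return union == 0 ? 1.0 : (double) and.cardinality() / union;
    }

    // Assumed convention: a target is a hit when 1 - Tanimoto <= threshold.
    static boolean isHit(BitSet query, BitSet target, double dissimilarityThreshold) {
        return 1.0 - tanimoto(query, target) <= dissimilarityThreshold;
    }

    // Builds a fingerprint with the given bit positions set.
    static BitSet bits(int... indices) {
        BitSet set = new BitSet();
        for (int i : indices) {
            set.set(i);
        }
        return set;
    }

    public static void main(String[] args) {
        BitSet query = bits(0, 1, 2, 3, 4);
        BitSet target = bits(0, 1, 2, 3, 5); // 4 shared bits out of 6 set in total
        System.out.println(isHit(query, target, 0.2)); // false: dissimilarity 1/3 > 0.2
        System.out.println(isHit(query, target, 0.6)); // true: dissimilarity 1/3 <= 0.6
    }
}
```

With this convention, raising the threshold admits more targets, so a run at 0.8 returning everything and a run at 0.2 returning only a handful would not contradict each other.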

I also imported the content into a DB and searched against the DB, with the same result.

Searching with the SMARTS string against both the DB and the provided .sdf file also gave the same result.



Could you please give me some more information on how you initiate the search (e.g. which version of JCB you are using, whether you search with a SMARTS string or with a file, and what type of DB you use: Oracle, PostgreSQL, etc.)?



Thank you for your help.

Best regards,

Norbert