jc_contains returns subset of correct struct search result

User 104a68add6

20-04-2005 20:03:21

I am experiencing retrieval inconsistencies between a JChem webapp which we have written and the JChem Cartridge. I am not sure if the problem is with the type of structure being searched for, or the total number of structures (or structure ids) that can be returned from the search.





Searches that return fewer than 1000 hits in the webapp return exactly the same number through the Cartridge. When the webapp result set is larger, the Cartridge returns only about one third of what it should.





We are using Oracle 9i on a Windows server, with the JChem Cartridge and Tomcat on a separate Linux box. We're running JChem 3.0.5.





The webapp search is using chemaxon.jchem.db.JChemSearch.





I have compared the JChem webapp search with an MDL ISIS search, using a table of structures that is fully synchronized with the JChem structure table. I understand that ISIS may perform a search somewhat differently from chemaxon.jchem.db.JChemSearch, and in some cases the results do differ slightly. But even so, the Cartridge returns fewer values.





Our one guess at what might be wrong is that we simply created the index with the defaults:





CREATE INDEX jc_structure_idx ON structure(cd_smiles) INDEXTYPE IS jc_idxtype;





The results are in an attached file. They show the SMILES string searched and the number of structures found.





I have marked the obviously disturbing results with ***.





The Cartridge queries are all jc_contains queries of the format:


select ct_number from structure where jc_contains(cd_smiles, 'C1OC1') =1





Thanks very much,


Julie

ChemAxon a3d59b832c

21-04-2005 06:56:37

Hi Julie,





I think the problem is in the format of the query. jc_contains always interprets the query as SMARTS, but for JChemSearch it may be SMILES or SMARTS, depending on the way you imported the molecule.





I attach a picture showing the difference for your smiles "c1[nH0]c[nH0]c2[nH]c[nH0]c12". It can be seen that the SMARTS version contains further hydrogen count restrictions on the nitrogens.








If you would like the cartridge to interpret the query as SMILES, I suggest using the jc_compare function with the queryType option:





select ct_number from structure where jc_compare(cd_smiles, 'C1OC1', 'queryType:l') =1
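
A quick way to check whether the query interpretation accounts for the discrepancy is to run both forms side by side for the same query string and compare the counts. This is only a sketch: it assumes the table and column names from your post, and that the queryType option is accepted exactly as written above.

```sql
-- Query string interpreted as SMARTS (what jc_contains does)
select count(*) from structure
 where jc_contains(cd_smiles, 'C1OC1') = 1;

-- Same query string, interpreted as SMILES via the queryType option
select count(*) from structure
 where jc_compare(cd_smiles, 'C1OC1', 'queryType:l') = 1;
```

If the second count matches the number returned by the webapp, the difference comes from how the query string is interpreted rather than from the index or the size of the result set.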





See documentation:





http://www.jchem.com/doc/guide/cartridge/cartapi.html#jc_compare





And if you are interested in how to import a string as SMARTS in the API, check out this link:





http://www.chemaxon.hu/forum/ftopic236.html





Please tell us if it helped,





All the best,


Szabolcs