Multiple SMILES search?

User 77bcbbb206

03-08-2005 16:32:38

If I have multiple SMILES strings (from user input) and I need to return their cd_ids, do I have to search on each string separately, or is there a way to have jc_compare accept a SQL string?





Example:





Input


Cc1ccc(F)cc1F


CC(=O)F





My current approach is to loop through the input and execute this command for each string:





SELECT cd_id FROM STRUCTURE WHERE jc_compare(cd_smiles, 'SMILE_STRING', 't:p')=1





where SMILE_STRING is replaced by the actual SMILES string.
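
A minimal sketch of that per-string loop, written in Python purely for illustration. The `run_query` callable is hypothetical and stands in for whatever database driver the application uses. Passing the SMILES as a bind variable (`:smiles`) rather than splicing it into the statement is an assumption about what the cartridge accepts, not something confirmed in this thread.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical statement text: same query as above, but with the SMILES
# supplied as a bind variable instead of being pasted into the SQL string.
EXACT_MATCH_SQL = (
    "SELECT cd_id FROM STRUCTURE "
    "WHERE jc_compare(cd_smiles, :smiles, 't:p') = 1"
)


def search_each(
    smiles_list: Iterable[str],
    run_query: Callable[[str, Dict[str, str]], Iterable[int]],
) -> Dict[str, List[int]]:
    """Run one exact-match query per SMILES string.

    run_query is a hypothetical wrapper around the database driver: it takes
    the SQL text and a dict of bind values and yields the matching cd_ids.
    Returns a mapping from each input SMILES to the cd_ids it matched.
    """
    hits: Dict[str, List[int]] = {}
    for smiles in smiles_list:
        # One database round trip (and one cartridge call) per input string.
        hits[smiles] = list(run_query(EXACT_MATCH_SQL, {"smiles": smiles}))
    return hits
```

Because the result is keyed by input string, each cd_id stays tied to the structure that produced it, at the cost of one call per structure.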





My question is can I do the following?





SELECT cd_id FROM STRUCTURE WHERE jc_compare(cd_smiles, 'select ''Cc1ccc(F)cc1F'',''CC(=O)F'' from dual', 't:p')=1





There is no way to link the input structures to one another (for example, as related molecules or through structural similarity).





I realize this example looks trivial, but if the user input is 100 structures rather than 2, there are 100 different calls that need to be made to the cartridge, etc.





Any ideas would really help,


Jim

ChemAxon aa7c50abf8

03-08-2005 17:08:00

Jim,
Quote:
My question is can I do the following?
SELECT cd_id FROM STRUCTURE WHERE jc_compare(cd_smiles, 'select ''Cc1ccc(F)cc1F'',''CC(=O)F'' from dual', 't:p')=1

It will not work. The second argument to jc_compare is expected to be a molecular structure.





Peter

User 77bcbbb206

03-08-2005 17:10:51

Peter,





Thanks for the quick response.





Is there any way to do what I'm asking other than repeatedly calling the jc_compare function?





Thanks,


Jim

ChemAxon aa7c50abf8

04-08-2005 10:27:52

What is your main motivation? Avoiding the need to manually enter a SQL statement multiple times, or improving performance?

ChemAxon aa7c50abf8

04-08-2005 12:49:50

With the statement you suggested (where multiple structures can be specified as queries to the jc_compare operator at a time), would it not be a problem that you would not be able to establish the correspondence between the individual cd_ids returned and the individual structures in the "query-set"? Could you please describe the context in which you need this feature, so that we can understand your requirements more fully?





Peter
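
One way to keep the correspondence between hits and query structures without any change to the cartridge is to combine the individual statements with UNION ALL and tag each branch with the position of its query structure. The Python sketch below only builds the SQL text and bind values. The column alias `query_no`, the bind names `s0`, `s1`, ... and both function names are hypothetical, and the use of bind variables as the jc_compare query argument is an assumption. Note that this still invokes jc_compare once per structure, so it would presumably save client round trips rather than calls inside the cartridge.

```python
from typing import Dict, List


def build_tagged_query(n_queries: int) -> str:
    """Build one SQL statement with n_queries exact-match branches.

    Each branch returns (query_no, cd_id), so every hit can be traced back
    to the input structure that produced it.
    """
    if n_queries < 1:
        raise ValueError("need at least one query structure")
    branches = [
        f"SELECT {i} AS query_no, cd_id FROM STRUCTURE "
        f"WHERE jc_compare(cd_smiles, :s{i}, 't:p') = 1"
        for i in range(n_queries)
    ]
    return "\nUNION ALL\n".join(branches)


def bind_params(smiles_list: List[str]) -> Dict[str, str]:
    """Map the hypothetical bind names s0, s1, ... to the input SMILES."""
    return {f"s{i}": smiles for i, smiles in enumerate(smiles_list)}


if __name__ == "__main__":
    queries = ["Cc1ccc(F)cc1F", "CC(=O)F"]
    print(build_tagged_query(len(queries)))
    print(bind_params(queries))
```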

User 77bcbbb206

04-08-2005 14:03:22

You make a very good point about not being able to link the cd_id with the structure queried. But I think, in my particular case, the benefit outweighs the limitation.





The web application I've developed allows users to upload lists of structure criteria for a search: SMILES strings, molecule names, CAS numbers, etc. Some of this data is part of the cartridge table and some of it is part of a separate table. The search, upon completion, returns a pageable list of the structure hits for the user to do with as they please. The criteria that relate to the external table (not JChem) are easy to search on with a simple WHERE ... IN clause: "select id from table where cas_num in (...)". The JChem table is a little more complicated in that I have to parse the SMILES strings, which I pass in as a comma-delimited list to a package, and then I have to call the jc_compare function multiple times in a dynamic SQL statement. Now, this isn't a problem, at least not functionally per se. The problem is the search time needed. Each jc_compare call on its own works pretty well, but run serially, they start taking a lot of time. I'm not positive, but it seems to me that taking the HTTP calls out of the loop might speed up multiple calls. It also seems to me that getting Oracle into the loop with regard to multiple calls would really improve performance.





This is just an idea. I think people developing applications using JChem would find this useful, but the average user probably wouldn't.





Does this make any sense? Am I doing what I described above correctly?

ChemAxon aa7c50abf8

05-08-2005 09:35:43

Jim,





The HTTP call (within JChem Cartridge) represents a fixed overhead of about 36 milliseconds on a 2x3GHz Xeon machine. A perfect search ('t:p') on a table containing 3 million structures may take as little as 51 milliseconds. If I pick 10 thousand SMILES randomly and retrieve their cd_ids, the average retrieval time is 136 milliseconds. The variation seems to be significant even on an otherwise idle machine, yet the overhead of the HTTP call is of the same order of magnitude as the time of the perfect search proper. It follows that (as you suggest) reducing the number of HTTP calls is likely to give noticeable performance improvements in many cases. We will implement support for jc_compare to accept multiple queries, probably in version 3.1.1.





Thank you for bringing up this issue.





Peter
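
As a rough back-of-the-envelope check of the figures quoted above, the Python sketch below estimates the cost of 100 serialized lookups. It assumes that the 136 ms average already includes the 36 ms HTTP overhead and that a batched call would pay that overhead only once with nothing else changing. Both are assumptions, not measurements, and the function names are hypothetical.

```python
HTTP_OVERHEAD_MS = 36   # fixed per-call HTTP overhead quoted above
AVG_LOOKUP_MS = 136     # average per-SMILES retrieval time quoted above


def serialized_cost_ms(n_queries: int) -> tuple:
    """Return (total time, HTTP overhead share) for n one-at-a-time lookups."""
    total = n_queries * AVG_LOOKUP_MS
    overhead = n_queries * HTTP_OVERHEAD_MS
    return total, overhead


def batched_estimate_ms(n_queries: int) -> int:
    """Estimate the total if all queries shared a single HTTP call.

    Assumption: only the repeated HTTP overhead disappears; the search
    work per structure stays the same.
    """
    total, _ = serialized_cost_ms(n_queries)
    return total - (n_queries - 1) * HTTP_OVERHEAD_MS


if __name__ == "__main__":
    total, overhead = serialized_cost_ms(100)
    print(f"serialized: {total} ms, of which {overhead} ms is HTTP overhead")
    print(f"batched estimate: {batched_estimate_ms(100)} ms")
```

Under these assumptions, 100 lookups take about 13.6 seconds, of which 3.6 seconds (roughly a quarter) is HTTP overhead, and a single batched call would come to about 10 seconds.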

User 77bcbbb206

05-08-2005 15:43:59

That's great! Thank you.