runCompleteSearch fails when molfile is used

User 873a9ae9d0

08-05-2013 07:38:51

Hi,


I have a WCF service that uses the JChemSearch web service to perform substructure searches in a JChem table (residing in a MySQL db). The service is hosted on a LAMP server.


I use the 'runCompleteSearch' operation, and as long as I use a SMILES string for the search, everything works fine.


If I use a MOLFILE (V2000) I get the error:


"Error in deserializing body of reply message for operation 'runCompleteSearch'."


Our db is pretty big (15 million structures). On a subset of the db, the search using molfiles works!


I assume the error is due to a problem in the big db, but since I cannot get any reply from calling runCompleteSearch, I have no idea what might cause the error in the big db.


First question: How can I get an interpretable error when I run a substructure search on the big db with a molfile and the reply of runCompleteSearch cannot be deserialized?


Second question: Since using SMILES works on the big db but using a molfile does not, I assume that the mechanism for performing a substructure (or similarity) search with a SMILES is DIFFERENT from the one used with a molfile.


Is that true, and if so, what exactly IS the difference between using SMILES and using molfiles?


Hope to hear back from you.


Best regards


Hans-Juergen


AKos GmbH

ChemAxon e07e2a364b

08-05-2013 08:55:02


Hi,


Memory can be a bottleneck at two levels: a) during the search and b) during the export of the results.


a) The search builds an in-memory fingerprint database (the structure cache), whose size is independent of the molecule structure format. With 15 million molecules and an average SMILES length of 50, it should be around 1.5 gigabytes. For a better estimate, use the formula at: http://www.chemaxon.com/jchem/doc/admin/Performance.html#cacheSize


b) The SOAP protocol has problems with large messages, and the classic web services were not optimized for large result sets. The results are held in memory as objects, which is far from efficient (a benzene molecule needs approximately 1400 bytes in Molecule object form). So if your query is *, the system tries to load all of the molecules into memory before serializing them.
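To put a number on point b), here is a rough back-of-the-envelope sketch. It assumes every hit costs about as much as the benzene figure quoted above (roughly 1400 bytes per Molecule object) and uses the 15 million structures mentioned in the thread; real molecules are usually larger, so treat the result as a loose lower bound rather than a measurement. The function name is made up for illustration.

```python
# Assumed per-molecule cost: the approximate benzene figure quoted in this post.
BYTES_PER_MOLECULE = 1400
# Table size mentioned earlier in the thread.
N_STRUCTURES = 15_000_000


def export_memory_bytes(n_hits, bytes_per_molecule=BYTES_PER_MOLECULE):
    """Rough memory needed to hold n_hits Molecule objects before serialization."""
    return n_hits * bytes_per_molecule


# Worst case: a query that matches every structure in the table.
full_table = export_memory_bytes(N_STRUCTURES)
print(f"{full_table / 1e9:.1f} GB")  # 21.0 GB
```

By the same estimate, a result set of 10 hits needs only about 14 kB, which is why a query with very few hits is a useful test.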


Judging by the error message, the problem most probably occurs during the export of the molecules. To verify this hypothesis, you can test with queries that you expect to return only a few molecules (1-10).


If this hint does not help, could you send some more specific information about the problem: the version number and the Tomcat log?


User 873a9ae9d0

08-05-2013 20:24:46

Hi,


Thanks a lot for your reply!


I use outputformat="empty" and dataFieldnames="cd_id" in my call to runCompleteSearch, i.e. the response should contain no molecule information.


What I want is to get the complete hit count (even for cases such as benzene), but in the response from runCompleteSearch I only want to retrieve, say, 500 cd_ids.


I THOUGHT the 'count' parameter of runCompleteSearch was meant for that, but when I specify 500 as the count parameter together with outputformat="empty" and dataFieldname="cd_id", I get the response
<Complete><SearchId>searchID1164376460</SearchId><ResultCount>2149</ResultCount><Rows></Rows></Complete>


I even tried count=10 and I get the same result.


The 'count' parameter seems to be ignored.


Question: What do I need to do to BOTH get the information that 2149 hits were found AND get the first, say, 500 cd_ids (and ONLY cd_ids) in the response?


Hope to hear back from you.


Best regards
Hans-Juergen

ChemAxon e07e2a364b

09-05-2013 12:29:52

Hi Hans-Juergen,


Generally, what you thought is right! It must be some minor issue (column name capitalization, etc.). This works for me:


<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:web="http://webservice.jchem.chemaxon">
   <soapenv:Header/>
   <soapenv:Body>
      <web:runCompleteSearch>
         <!--Replace values with valid parameters: connectionHandler, tableName, etc. -->
         <web:connectionHandlerId>connectionHandlerID565478961</web:connectionHandlerId>
         <web:tableName>DEMO_TABLE</web:tableName>
         <web:queryMolecule>ccc</web:queryMolecule>
         <web:queryOptions>t:s</web:queryOptions>
         <web:beginIndex>2</web:beginIndex>
         <web:count>3</web:count>
         <web:outputFormat>empty</web:outputFormat>
         <web:dataFieldNames>cd_id</web:dataFieldNames>
         <web:hitColorAndAlignmentOptions xsi:nil="true" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>
      </web:runCompleteSearch>
   </soapenv:Body>
</soapenv:Envelope>
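

If it helps to compare against what your WCF client actually sends, the request above can also be generated programmatically. This is only a minimal sketch using the Python standard library: the function names build_request and summarize_reply are made up for illustration, the parameter values are the demo values from the request above, and the reply shape is assumed to be the <Complete> document quoted earlier in this thread. The inner format of <Rows> is not shown anywhere in the thread, so the sketch only checks whether it is empty. Sending the request is left out on purpose.

```python
import xml.etree.ElementTree as ET

SOAPENV = "http://schemas.xmlsoap.org/soap/envelope/"
WEB = "http://webservice.jchem.chemaxon"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Keep the same prefixes as in the hand-written request.
for prefix, uri in (("soapenv", SOAPENV), ("web", WEB), ("xsi", XSI)):
    ET.register_namespace(prefix, uri)


def build_request(handler_id, table, query, begin_index, count):
    """Build a runCompleteSearch envelope like the one above, as a string."""
    envelope = ET.Element(f"{{{SOAPENV}}}Envelope")
    ET.SubElement(envelope, f"{{{SOAPENV}}}Header")
    body = ET.SubElement(envelope, f"{{{SOAPENV}}}Body")
    op = ET.SubElement(body, f"{{{WEB}}}runCompleteSearch")
    params = [
        ("connectionHandlerId", handler_id),
        ("tableName", table),
        ("queryMolecule", query),  # SMILES or molfile text; ElementTree escapes it
        ("queryOptions", "t:s"),
        ("beginIndex", begin_index),
        ("count", count),
        ("outputFormat", "empty"),
        ("dataFieldNames", "cd_id"),
    ]
    for name, value in params:
        ET.SubElement(op, f"{{{WEB}}}{name}").text = str(value)
    # xsi:nil="true", as in the hand-written request.
    ET.SubElement(op, f"{{{WEB}}}hitColorAndAlignmentOptions", {f"{{{XSI}}}nil": "true"})
    return ET.tostring(envelope, encoding="unicode")


def summarize_reply(reply_xml):
    """Return (hit count, whether <Rows> has any content) for a <Complete> reply."""
    root = ET.fromstring(reply_xml)
    rows = root.find("Rows")
    has_rows = rows is not None and (len(rows) > 0 or bool((rows.text or "").strip()))
    return int(root.findtext("ResultCount")), has_rows


# The reply quoted earlier in the thread: hits were found, but no rows came back.
reply = ("<Complete><SearchId>searchID1164376460</SearchId>"
         "<ResultCount>2149</ResultCount><Rows></Rows></Complete>")
print(summarize_reply(reply))  # (2149, False)
```

Comparing the element names and values in the generated envelope with the one your client produces is a quick way to spot a capitalization difference.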