Command Line Searching Slow on Linux/MySQL vServer

User 773d472e7f

06-05-2013 15:27:21

Dear CHEMAXON:

I have an issue/question regarding the performance of Substructure searching:

JCHEM Molecule database on a Virtual Private Server.

15 Million + molecule table.

Linux Ubuntu vServer / MySQL

There are some duplicate structures

There are some Empty molecules

Duplicate searching returns the result within 1 or 2 seconds, e.g.:

./jcsearch -t:d -q "CCOc1cc(C=O)ccc1OCCCN(CC)CC"  DB:AKOS_MOLTABLE

CCOC1=C(OCCCN(CC)CC)C=CC(C=O)=C1

CCOC1=C(OCCCN(CC)CC)C=CC(C=O)=C1

Substructure searching takes a very long time, e.g. the following takes over 1 hour to return the results:

root@v14908:~/ChemAxon/JChem/bin# ./jcsearch -Xmx4000M -server -t:s -q "CCOc1cc(C=O)ccc1OCCCN(CC)CC"  DB:AKOS_MOLTABLE

CCOC1=C(OCCCN2CCCCC2C)C=CC(C=O)=C1

CCOC1=C(OCCCN2CCC(C)CC2)C=CC(C=O)=C1

CCOC1=C(OCCCN2CCN(CCO)CC2)C=CC(C=O)=C1

CCOC1=C(OCCCN2CC(C)CC(C)C2)C=CC(C=O)=C1

...etc

Subsequent searches also take the same amount of time.


My understanding is that the first search can take a long time to load the cache but subsequent searches should be much faster.


This is not happening in this case. Please would you advise what I should look at to improve the performance, for example checking the state of the cache.


I am looking forward to your reply

Kind Regards

Bernard D'Alwis



ChemAxon 9c0afc9aaf

06-05-2013 17:43:15

Hi,


"My understanding is that the first search can take a long time to load the cache but subsequent searches should be much faster."


This is true as long as the JVM (the application) is not restarted.


Unfortunately running "jcsearch" means starting a new process and ending it at every run, therefore the structure cache is lost and has to be loaded each time.


We only recommend "jcsearch" for small databases.


For larger databases we recommend continuously running solutions:


GUI: Instant JChem GUI (supports MySQL)


API: Oracle Cartridge, JChem Web Services, or Java/.NET API


Best regards,


Szilard


PS: The duplicate search is fast because it does not use the cache.
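
One way to observe this from the command line is to time the same call several times in a row: since every jcsearch call is a fresh process, the timings should stay roughly the same from run to run. The following is only a minimal sketch, assuming a POSIX shell with date +%s; time_runs is a hypothetical helper written for illustration and is not part of JChem.

```shell
#!/bin/sh
# time_runs N CMD [ARGS...]: run CMD N times and print the wall-clock
# seconds taken by each run. Hypothetical helper, not part of JChem.
time_runs() {
    n=$1
    shift
    i=1
    while [ "$i" -le "$n" ]; do
        start=$(date +%s)
        "$@" > /dev/null          # discard the hits, only the timing matters
        end=$(date +%s)
        echo "run $i: $((end - start))s"
        i=$((i + 1))
    done
}

# Usage with the search from this thread (each call starts a new process):
# time_runs 2 ./jcsearch -Xmx4000M -server -t:s -q "CCOc1cc(C=O)ccc1OCCCN(CC)CC" DB:AKOS_MOLTABLE
```

If the second run is not noticeably faster than the first, the structure cache is not being reused between runs, which matches the explanation above.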



User 773d472e7f

07-05-2013 07:12:18

Thanks for clarifying the difference between the jcsearch and the other forms of running a search. 


We are also testing the performance of the JCHEM webservices on this same database.


In this case I have added a JAVA_OPTS setting to the startup script, JChemWebServices/tomcat/bin/startup.sh, as follows:



#!/bin/sh
#JAVA_HOME=/location/of/my/jdk/home/
#export JAVA_HOME
JAVA_OPTS='-Xmx3000M'
export JAVA_OPTS
CATALINA_HOME=/root/JChemWebServices/tomcat
export CATALINA_HOME
${CATALINA_HOME}/bin/startup.sh $1
exit
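
Before restarting Tomcat it can be useful to confirm which heap limit JAVA_OPTS actually carries. The following is a minimal sketch only; xmx_from_opts is a hypothetical helper written for illustration, and it simply reports the last -Xmx flag found in the string.

```shell
#!/bin/sh
# xmx_from_opts OPTS: print the value of the last -Xmx flag in OPTS
# (e.g. 3000M), or nothing if no -Xmx flag is present.
# Hypothetical helper for illustration only.
xmx_from_opts() {
    result=""
    for opt in $1; do             # intentional word splitting on whitespace
        case "$opt" in
            -Xmx*) result=${opt#-Xmx} ;;
        esac
    done
    if [ -n "$result" ]; then
        echo "$result"
    fi
}

JAVA_OPTS='-Xmx3000M'
echo "max heap requested: $(xmx_from_opts "$JAVA_OPTS")"
# prints: max heap requested: 3000M
```

Once Tomcat is running, ps -ef | grep -- -Xmx should show whether the flag reached the java process.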


We will now test this by running the same search via the web services.


My expectation is that the first search will take two hours (same as jcsearch) and that subsequent searches should then be faster.


Please confirm this approach.


Kind regards.


Bernard D'Alwis


ChemAxon e07e2a364b

07-05-2013 12:35:03

Hi,


The JAVA_OPTS looks fine. I would suggest trying the new Web Services; the "classic" version may have a problem with buffering the query results in memory.


Download site: https://www.chemaxon.com/download.php?d=/data/download/webservices2/0.9.0-developer-preview


(We are going to release the first official version in a week.) If you can be more specific about the size/type of the molecules in the database, we may be able to help more with memory optimization and query performance.


Gabor