Command Line Searching Slow on Linux/MySQL vServer

User 773d472e7f

06-05-2013 15:27:21

Dear CHEMAXON:
I have an issue/question regarding the performance of Substructure searching:
JCHEM Molecule database on a Virtual Private Server.
	15 Million + molecule table.
	Linux Ubuntu vServer / MySQL
	There are some duplicate structures
	There are some Empty molecules
Duplicate searching returns the result within 1 or 2 seconds, e.g.:
./jcsearch -t:d -q "CCOc1cc(C=O)ccc1OCCCN(CC)CC"  DB:AKOS_MOLTABLE
CCOC1=C(OCCCN(CC)CC)C=CC(C=O)=C1
CCOC1=C(OCCCN(CC)CC)C=CC(C=O)=C1
Substructure searching takes a very long time; e.g. the following takes over 1 hour to return the results:
root@v14908:~/ChemAxon/JChem/bin# ./jcsearch -Xmx4000M -server -t:s -q "CCOc1cc(C=O)ccc1OCCCN(CC)CC"  DB:AKOS_MOLTABLE
CCOC1=C(OCCCN2CCCCC2C)C=CC(C=O)=C1
CCOC1=C(OCCCN2CCC(C)CC2)C=CC(C=O)=C1
CCOC1=C(OCCCN2CCN(CCO)CC2)C=CC(C=O)=C1
CCOC1=C(OCCCN2CC(C)CC(C)C2)C=CC(C=O)=C1
...etc
Subsequent searches also take the same amount of time.

My understanding is that the first search can take a long time to load the cache but subsequent searches should be much faster.

This is not happening in this case. Please would you advise what I should look at to improve the performance, for example checking the state of the cache.


I am looking forward to your reply
Kind Regards
Bernard D'Alwis

ChemAxon 9c0afc9aaf

06-05-2013 17:43:15

Hi,

"My understanding is that the first search can take a long time to load the cache but subsequent searches should be much faster."

This is true as long as the JVM (the application) is not restarted.

Unfortunately running "jcsearch" means starting a new process and ending it at every run, therefore the structure cache is lost and has to be loaded each time.
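One way to see this effect is to time the same command twice in a row. The following is only a minimal sketch: `time_runs` is a hypothetical helper name, not part of JChem, and it relies on `date +%s`, which is available on Linux but is not strictly POSIX. If every run takes roughly the same time, nothing is being kept between runs.

```shell
#!/bin/sh
# time_runs N CMD [ARGS...]
# Hypothetical helper (not part of JChem): runs CMD N times, each as a
# separate process, and prints the wall-clock seconds each run took.
# The output of CMD itself is discarded.
time_runs() {
    n=$1
    shift
    i=1
    while [ "$i" -le "$n" ]; do
        start=$(date +%s)
        "$@" >/dev/null 2>&1
        end=$(date +%s)
        echo "run $i: $((end - start))s"
        i=$((i + 1))
    done
}

# Example with the command from this thread (run from ChemAxon/JChem/bin):
# time_runs 2 ./jcsearch -Xmx4000M -server -t:s -q "CCOc1cc(C=O)ccc1OCCCN(CC)CC" DB:AKOS_MOLTABLE
```

With jcsearch, both runs would be expected to take about equally long, since each run is a fresh JVM that has to load the structure cache again.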

We only recommend "jcsearch" for small databases.

For larger databases we recommend continuously running solutions:

GUI: Instant JChem GUI (supports MySQL)

API: Oracle Cartridge, JChem Web Services, or Java/.NET API

Best regards,

Szilard

PS: The duplicate search is fast because it does not use the cache.

User 773d472e7f

07-05-2013 07:12:18

Thanks for clarifying the difference between the jcsearch and the other forms of running a search.

We are also testing the performance of the JCHEM webservices on this same database.

In this case I have added the following option to the startup.sh file in:

JChemWebServices/tomcat/bin/startup.sh

as follows:

#!/bin/sh
#JAVA_HOME=/location/of/my/jdk/home/
#export JAVA_HOME
JAVA_OPTS='-Xmx3000M'
export JAVA_OPTS
CATALINA_HOME=/root/JChemWebServices/tomcat
export CATALINA_HOME
${CATALINA_HOME}/bin/startup.sh $1
exit
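Since this is a vServer, it may be worth checking that the heap size given in -Xmx is actually below the physical memory of the machine before starting Tomcat. The following is a minimal sketch under that assumption: `heap_fits` is a hypothetical helper name, and it only compares the requested heap with MemTotal from /proc/meminfo; it says nothing about how much memory the structure cache itself needs.

```shell
#!/bin/sh
# heap_fits HEAP_MB [MEMINFO_FILE]
# Hypothetical helper: compares a requested JVM heap size (in MB) with
# MemTotal from a meminfo-style file (default: /proc/meminfo).
# Prints a line starting with "ok" if the heap is smaller than total RAM,
# otherwise a line starting with "too large".
heap_fits() {
    heap_mb=$1
    meminfo=${2:-/proc/meminfo}
    # MemTotal is reported in kB, e.g. "MemTotal:  4194304 kB"
    total_kb=$(awk '/^MemTotal:/ {print $2}' "$meminfo")
    total_mb=$((total_kb / 1024))
    if [ "$heap_mb" -lt "$total_mb" ]; then
        echo "ok (heap ${heap_mb} MB, RAM ${total_mb} MB)"
    else
        echo "too large (heap ${heap_mb} MB, RAM ${total_mb} MB)"
    fi
}

# Usage for the setting above:
# heap_fits 3000
```

A result of "too large" would suggest that the JVM may end up swapping once the heap fills, which could hide any benefit from the cache.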

We will now test this by calling the same search via the web services.

My expectation is that the first search will take two hours (same as jcsearch) and then subsequent searches should be faster.

Please confirm this approach.

Kind regards.

Bernard D'Alwis

ChemAxon e07e2a364b

07-05-2013 12:35:03

Hi,

The JAVA_OPTS setting looks fine. I would suggest trying the new Web Services; the "classic" version may have a problem with buffering the query results in memory.

Download site: https://www.chemaxon.com/download.php?d=/data/download/webservices2/0.9.0-developer-preview

(We are going to release the first official version in a week.) If you can be more specific about the size/type of the molecules in the database, we may be able to help more with memory optimization and query performance.

Gabor