Slow substructure search with explicit [H]

User 8139ea8dbd

24-10-2011 15:30:22

Using the cartridge, substructure search with


[H]C1=C(C)C2=C([H])C(=C([H])N=C2N1)C1=C([H])C([H])=C([H])C([H])=C1[H]


is very slow.


If you remove the [H] atoms and use (s*), performance is normal. It seems that explicit hydrogens are not handled optimally.
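The workaround of dropping the explicit [H] atoms from the query can be automated. Below is a minimal, naive text-based sketch in Python. `strip_explicit_h` is a hypothetical helper, not part of JChem, and it only handles the `([H])` branch form and chain-end `[H]` atoms that appear in this query. Other placements are rejected rather than guessed. Removing explicit hydrogens can also change (typically broaden) the hit set of a substructure query, so the results may differ from the original search.

```python
import re

# Forms this naive sketch cannot safely remove: an [H] joined by an explicit
# bond symbol, or an [H] followed by a ring-closure digit or "%".
_UNSUPPORTED_H = re.compile(r"[-=#:/\\]\[H\]|\[H\][-=#:/\\0-9%]")


def strip_explicit_h(smiles: str) -> str:
    """Remove explicit [H] atoms from a SMILES string by text substitution.

    Hypothetical helper: handles only hydrogens written as a bare branch,
    "([H])", and bare [H] atoms at either end of a chain, as in the query
    in this thread. Isotopic or charged hydrogens ([2H], [H+]) are left as-is.
    """
    if _UNSUPPORTED_H.search(smiles):
        raise ValueError("unsupported [H] placement for this sketch: " + smiles)
    # Remove branch hydrogens first, e.g. C([H])C -> CC, so no empty "()"
    # branch is left behind.
    smiles = smiles.replace("([H])", "")
    # Then remove the remaining chain-end hydrogens, e.g. [H]C1... and ...C1[H].
    return smiles.replace("[H]", "")


query = ("[H]C1=C(C)C2=C([H])C(=C([H])N=C2N1)"
         "C1=C([H])C([H])=C([H])C([H])=C1[H]")
print(strip_explicit_h(query))  # C1=C(C)C2=CC(=CN=C2N1)C1=CC=CC=C1
```

For anything beyond simple cases like this one, a cheminformatics toolkit that parses the structure is more robust than string substitution.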

ChemAxon 8407015329

26-10-2011 11:27:59

Hi,


We have started checking possible scenarios (default settings with the latest JCB) for the issue you experienced. In the meantime, could you please send us some additional information, such as:


- which version you are using


- what kind of table you are searching and how many structures it contains


- whether you used any specific search options


- the exact command you executed


 


Regards,


Vencel

User 4cd5052280

03-11-2011 16:45:22

JCHEM_CORE_PKG.GETENVIRONMENT()
------------------------------------------------------------
Oracle environment:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 -
64bi
PL/SQL Release 10.2.0.3.0 - Production
CORE    10.2.0.3.0      Production
TNS for Solaris: Version 10.2.0.3.0 - Production
NLSRTL Version 10.2.0.3.0 - Production

JChem Server environment:
Java VM vendor: Sun Microsystems Inc.
Java version: 1.6.0_26
Java VM version: 20.1-b02
JChem version: 5.5.1.0
JChem Index version: 5050100
JDBC driver version: 11.1.0.7.0-Production


 


SQL> select count(*) from cpd;

  COUNT(*)
----------
   5482273


select cpd_sid from gnf_imc.CPD where jc_compare(jc_smiles, '[H]C1=C(C)C2=C([H])C(=C([H])N=C2N1)C1=C([H])C([H])=C([H])C([H])=C1[H]', 't:s')=1;


 

ChemAxon 8407015329

04-11-2011 19:51:30

Hi,


Unfortunately, we were unable to reproduce the slowdown, even with that many structures in the target table. We only observed a slowdown of about 1.2 times, which is reasonable considering that the number of hydrogen atoms in the query structure can affect the atom-by-atom search.


Is the issue you are experiencing reproducible every time you search? What slowdown factor do you see? Is the issue present if you query only part of the table (for example, using a filter query)?


Regards,


Vencel

User 8139ea8dbd

04-11-2011 21:36:26

This looks like a total mystery. The SQL basically never returns (something is stalled in the backend).


I first tried


select cpd_sid,jc_smiles from cpd where jc_compare(jc_smiles, '[H]C1=C(C)C2=C([H])C(=C([H])N=C2N1)C1=C([H])C([H])=C([H])C([H])=C1[H]', 't:na')=1


and got 461 structures that passed the initial screening.


Then, for each of the 461 candidates, I ran


select jc_compare(<smiles>, '[H]C1=C(C)C2=C([H])C(=C([H])N=C2N1)C1=C([H])C([H])=C([H])C([H])=C1[H]', 't:s') from dual


and it went through all of them without any problem.


But if I run


select cpd_sid,jc_smiles from cpd where jc_compare(jc_smiles, '[H]C1=C(C)C2=C([H])C(=C([H])N=C2N1)C1=C([H])C([H])=C([H])C([H])=C1[H]', 't:s maxTime:1000 maxHitCount:5')=1;


it never returns (maxTime has no effect).
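The per-candidate check above can be scripted so that each jc_compare call is also timed. If no single candidate is slow, the hang is more likely in the indexed search path than in matching any one structure. A minimal sketch, assuming a Python DB-API 2.0 driver for Oracle (such as python-oracledb) that accepts named bind variables. `time_candidates`, `CHECK_SQL` and the bind names are hypothetical. Passing the jc_compare arguments as bind variables is also an assumption that has not been tested against JChem 5.5.

```python
import time
from typing import Iterable, List, Tuple

# Same query SMILES as in the thread.
QUERY = ("[H]C1=C(C)C2=C([H])C(=C([H])N=C2N1)"
         "C1=C([H])C([H])=C([H])C([H])=C1[H]")

# Hypothetical per-candidate check using named bind variables (:target, :query).
CHECK_SQL = "select jc_compare(:target, :query, 't:s') from dual"


def time_candidates(cursor, candidates: Iterable[str],
                    query: str = QUERY) -> List[Tuple[float, str, int]]:
    """Run the per-candidate substructure check through a DB-API cursor.

    Returns (elapsed_seconds, target_smiles, jc_compare_result) tuples,
    sorted slowest first.
    """
    timings = []
    for smiles in candidates:
        start = time.perf_counter()
        cursor.execute(CHECK_SQL, {"target": smiles, "query": query})
        (result,) = cursor.fetchone()
        timings.append((time.perf_counter() - start, smiles, result))
    # Slowest candidates first, so outliers are easy to spot.
    timings.sort(key=lambda t: t[0], reverse=True)
    return timings
```

The candidate list would be the SMILES returned by the 't:na' screening query (461 rows here), and the first few entries of the result show whether any single target is unusually slow.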


(Side note: when I run
select * from cpd where jc_compare(jc_smiles, '[H]C1=C(C)C2=C([H])C(=C([H])N=C2N1)C1=C([H])C([H])=C([H])C([H])=C1[H]', 't:na haltOnError:y')=1;
I got an exception saying: ORA-29902: error in executing ODCIIndexStart() routine, ORA-20102: Invalid search option: error: uknown option name: haltonerror Use -h for help. ORA-06512: at "JCHEM_CART.JCHEM_CORE_PKG", line 34  ORA-06512: at "JCHEM_CART.JC_IDXTYPE_IM" line 483  ORA-06512: at line 1
It seems haltOnError is no longer a valid option; maybe the documentation needs to be updated?)


What do you suggest we do next?


Thanks

ChemAxon aa7c50abf8

04-11-2011 23:24:55

Would it be possible to temporarily increase the log level by adding the following lines to the jchem/cartridge/conf/logging.properties file?


chemaxon.jchem.db.level = FINEST
chemaxon.jchem.cartridge.level = FINEST

For these changes to take effect, the JChem Cartridge server currently needs to be restarted.


It would also be very helpful if you could execute the following command a few times, at intervals of a few seconds, while the problematic search is running or hanging:


bash server.sh thread-dump 2>> thread-dump.log

It needs to be executed in the same directory where the JChem Cartridge server is started and stopped (i.e., in jchem/cartridge).
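Collecting several dumps a few seconds apart is easy to script. Below is a minimal sketch in Python. It assumes the script is started from jchem/cartridge or is given that directory as `cwd`. `collect_thread_dumps`, the default of 5 dumps, and the 5-second interval are illustrative choices, not values ChemAxon specified. As with `2>> thread-dump.log`, only the command's stderr is appended to the log file.

```python
import subprocess
import time
from pathlib import Path
from typing import Sequence

# Command from the thread; it must run in the jchem/cartridge directory.
DEFAULT_CMD = ("bash", "server.sh", "thread-dump")


def collect_thread_dumps(count: int = 5, interval_s: float = 5.0,
                         log_path: str = "thread-dump.log",
                         cmd: Sequence[str] = DEFAULT_CMD,
                         cwd: str = ".") -> None:
    """Run the thread-dump command `count` times, `interval_s` seconds apart,
    appending its stderr to log_path (the equivalent of `2>> thread-dump.log`).
    """
    log = Path(cwd) / log_path
    for i in range(count):
        # Open in append mode each time so earlier dumps are kept.
        with open(log, "ab") as fh:
            subprocess.run(list(cmd), cwd=cwd, stderr=fh, check=False)
        # No need to wait after the last dump.
        if i < count - 1:
            time.sleep(interval_s)
```

Calling `collect_thread_dumps()` while the search is hanging produces a single thread-dump.log containing all the dumps, which can then be sent together with the cartridge logs.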


Could you please send the log files (from jchem/cartridge/logs) as well as the thread-dump.log file to pkovacs at chemaxon dot com?


(The haltOnError search option was added in 5.6.0.0. It was not available in 5.5.x. The ineffectiveness of maxTime may be related to the problem at hand.)


Thank you,


Peter

ChemAxon aa7c50abf8

05-11-2011 13:09:35

PS:


If the FINEST log level results in excessively large log files, FINER should also be sufficient for this first round.

ChemAxon 8407015329

10-11-2011 10:15:59

Hi,


We were finally able to reproduce the issue. It is caused by faulty query enumeration in the database search mechanism, which makes the query with explicit H atoms about 20 times slower.


This issue has been fixed in version 5.7, which is due for release in a couple of days. If the workaround with s* is not a suitable alternative for you, we suggest upgrading to version 5.7.


Thanks for all the help in detecting the issue,


Vencel