Substructure search performance tuning

ChemAxon 9c0afc9aaf

14-04-2006 11:06:33

Multiple processors





Since JChem 3.0, searches run on multiple threads (in cached mode).


The easiest way to speed up searches is to add more processors.


(The available processors are automatically detected and utilized.)
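
To check how many processors the JVM itself reports (which can differ from the physical count, e.g. inside a virtual machine), a minimal sketch using only the standard library may help. The class name CpuCheck is made up for illustration and is not part of JChem; whether JChem's auto-detection relies on this exact call is an assumption.

```java
// Hypothetical helper, not part of the JChem API.
class CpuCheck {
    // Number of processors the JVM reports as available to it.
    static int availableProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("Processors visible to the JVM: " + availableProcessors());
    }
}
```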





Server JVM mode





Another recommendation is to use the "-server" JVM option.


This enables more aggressive run-time optimization, with much of the optimization work done in the early stages of a run.


As a result, JVM start-up and the first few searches will be slower, but once the code has reached its optimized state you can achieve higher performance.
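
To verify which VM mode an application ended up in after being started with "-server", one option is to inspect the standard "java.vm.name" system property. This is only a sketch: the class name JvmModeCheck is hypothetical, and the assumption that the name contains the word "Server" holds for typical HotSpot-based JVMs but is not guaranteed for every vendor.

```java
// Hypothetical helper, not part of the JChem API.
class JvmModeCheck {
    // Name of the running virtual machine, as reported by the JVM.
    static String vmName() {
        return System.getProperty("java.vm.name");
    }

    // Heuristic (assumption): HotSpot-style VM names include "Server" in server mode.
    static boolean looksLikeServerVm() {
        String name = vmName();
        return name != null && name.contains("Server");
    }

    public static void main(String[] args) {
        System.out.println("VM name: " + vmName());
        System.out.println("Looks like a server VM: " + looksLikeServerVm());
    }
}
```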





Java version





Later Java versions perform better than earlier ones. (Currently we recommend the latest update of Sun Java 1.5.)





Fingerprint settings





You should also make sure that the fast screening phase eliminates as many non-hits as possible.


Use "JChemSearch.setInfoToStdError(true)" to get statistics about the number of screened structures and the number of hits.


(The number of screened structures should be close to the number of hits for most queries.)


If your fingerprint gets too dark, too many structures pass screening (compared to the number of hits), and these have to be processed in vain by the more CPU-demanding graph search phase.


(This usually occurs if a significant proportion of your molecules are very large.)
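
Why a dark fingerprint screens poorly can be illustrated with a toy model. The sketch below assumes that "darkness" means the fraction of bits set and that a target passes screening when every bit set in the query fingerprint is also set in the target fingerprint. The class and method names are hypothetical and do not come from the JChem API.

```java
import java.util.BitSet;

// Toy model of fingerprint screening; not JChem code.
class FingerprintSketch {
    // Fraction of bits set in a fingerprint of the given length ("darkness").
    static double darkness(BitSet fingerprint, int lengthInBits) {
        return (double) fingerprint.cardinality() / lengthInBits;
    }

    // A target passes screening when it has every bit that the query has.
    static boolean passesScreen(BitSet query, BitSet target) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(target); // bits the query needs but the target lacks
        return missing.isEmpty();
    }

    // Convenience factory: builds a fingerprint from the given bit positions.
    static BitSet bits(int... positions) {
        BitSet b = new BitSet();
        for (int p : positions) {
            b.set(p);
        }
        return b;
    }

    public static void main(String[] args) {
        BitSet query = bits(1, 5);
        BitSet sparse = bits(1, 2);                // lacks bit 5, so it is screened out
        BitSet dark = bits(0, 1, 2, 3, 4, 5, 6, 7); // every bit set: passes any screen
        System.out.println(passesScreen(query, sparse)); // false
        System.out.println(passesScreen(query, dark));   // true
        System.out.println(darkness(dark, 8));           // 1.0
    }
}
```

A fully dark target passes every screen regardless of the query, so it always reaches the graph search phase even when it is not a hit.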





Since JChem 3.1.6, you can calculate and view fingerprint statistics by executing:





Code:
jcman s <table>






If necessary you may try different fingerprint settings and check the performance again.


Please see the documentation about the options:





http://www.chemaxon.com/jchem/doc/user/fingerprint.html





The easiest way to make a fingerprint less dark is to increase its length.


Please note that a longer fingerprint also increases the cache size.
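
As a rough feel for the trade-off, the following back-of-the-envelope sketch estimates the memory needed just to hold one fingerprint per row. It assumes the fingerprints are stored as plain bit arrays and ignores everything else the cache may hold, so treat the result as a lower bound rather than JChem's actual footprint. The class name and the example row counts and lengths are made up.

```java
// Back-of-the-envelope estimate; not JChem code.
class FingerprintCacheEstimate {
    // Bytes needed to hold one fingerprint per row, ignoring all other cached data.
    static long fingerprintBytes(long rows, int fingerprintBits) {
        long bytesPerRow = (fingerprintBits + 7) / 8; // round up to whole bytes
        return rows * bytesPerRow;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L; // example table size
        System.out.println("512 bits:  " + fingerprintBytes(rows, 512) + " bytes");
        System.out.println("1024 bits: " + fingerprintBytes(rows, 1024) + " bytes");
    }
}
```

Under these assumptions, doubling the fingerprint length doubles this part of the cache.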





Limiting the number of hits





If you do not need all hits, you can limit their number with "JChemSearch.setMaxResultCount()".


(For example, it makes no sense to provide a hit list of millions of compounds for browsing.)





http://www.chemaxon.com/jchem/doc/api/chemaxon/jchem/db/JChemSearch.html#setMaxResultCount(int)





Since search time is linear in the number of screened structures (which is approximately proportional to the number of hits), the search time will remain about constant even for huge databases if you use this feature.


(This is because the screening time is usually negligible.)
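
The effect of a hit limit can be sketched with a generic two-phase search loop that stops as soon as enough hits are found. This is a toy model, not how JChem is implemented: the class LimitedSearchSketch, its predicates and the integer "targets" are all hypothetical stand-ins, and the code needs Java 8 or later for the lambdas.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Toy two-phase search with a hit limit; not JChem code.
class LimitedSearchSketch {
    // Number of expensive graph-search calls made by the most recent search.
    static int graphSearchCalls;

    static <T> List<T> search(List<T> targets, Predicate<T> screen,
                              Predicate<T> graphMatch, int maxResultCount) {
        graphSearchCalls = 0;
        List<T> hits = new ArrayList<>();
        for (T target : targets) {
            if (hits.size() >= maxResultCount) {
                break; // enough hits: the rest of the table is never examined
            }
            if (!screen.test(target)) {
                continue; // cheap screening rejects most non-hits
            }
            graphSearchCalls++; // only screened targets reach the expensive phase
            if (graphMatch.test(target)) {
                hits.add(target);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Integer> targets = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            targets.add(i);
        }
        // Stand-ins: even numbers pass screening, multiples of 4 are real hits.
        List<Integer> hits = search(targets, n -> n % 2 == 0, n -> n % 4 == 0, 100);
        System.out.println(hits.size() + " hits after " + graphSearchCalls
                + " graph-search calls");
    }
}
```

In this toy run the number of graph-search calls depends on the hit limit and the screening quality, not on the one million rows in the list.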





Structural Keys





Although our fingerprint is a chemical hashed fingerprint, we also have a concept similar to MDL keys, which can be handy in certain cases.


The chemical hashed fingerprint can be extended with Structural Keys.


Each key represents a query structure and the bit is set to 1 if the target contains it.


If a query structure perfectly matches one of the structural keys, this is recognized at the start of the search, and substructure results are returned almost instantaneously.


(We only have to check whether the bit is set.)
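
The shortcut can be pictured with a small toy model. In the sketch below each key gets a bit position, each target stores which keys it contains, and a query that equals a key is answered by reading that bit. Plain string equality stands in for "perfectly matches", and the class, its methods and the example strings are all hypothetical rather than taken from JChem.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of structural keys; not JChem code.
class StructuralKeySketch {
    // Maps each key (here just a string) to its bit position.
    private final Map<String, Integer> keyIndex = new LinkedHashMap<>();
    // One set of key bits per target row, in insertion order.
    private final List<BitSet> targetKeyBits = new ArrayList<>();

    // Keys are fixed at construction, mirroring "specified at table creation".
    StructuralKeySketch(List<String> keys) {
        for (String key : keys) {
            keyIndex.put(key, keyIndex.size());
        }
    }

    // Registers a target together with the (known) keys it contains.
    void addTarget(String... containedKeys) {
        BitSet bits = new BitSet(keyIndex.size());
        for (String key : containedKeys) {
            bits.set(keyIndex.get(key)); // assumes every name is a registered key
        }
        targetKeyBits.add(bits);
    }

    // Returns hit row indices if the query is a key, or null when no
    // shortcut applies and a normal search would be needed.
    List<Integer> lookup(String query) {
        Integer bit = keyIndex.get(query);
        if (bit == null) {
            return null;
        }
        List<Integer> hits = new ArrayList<>();
        for (int row = 0; row < targetKeyBits.size(); row++) {
            if (targetKeyBits.get(row).get(bit)) {
                hits.add(row);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        StructuralKeySketch table = new StructuralKeySketch(Arrays.asList("C(=O)O", "N"));
        table.addTarget("C(=O)O");
        table.addTarget("N");
        table.addTarget("C(=O)O", "N");
        System.out.println(table.lookup("C(=O)O"));   // [0, 2]
        System.out.println(table.lookup("c1ccccc1")); // null
    }
}
```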





This is useful if you have a fixed set of very frequently used queries (e.g. functional groups).





Currently these keys can only be specified at table creation and cannot be changed afterwards, but we are planning to work on this.


(It is a relatively new feature.)





For more information please visit:





http://www.chemaxon.com/jchem/doc/admin/#create

ChemAxon a3d59b832c

21-09-2012 09:16:17

There is a lot of collected information about optimizing JChem Cartridge in the FAQ:


http://www.chemaxon.com/jchem/doc/admin/CartridgeFAQ.html#jcart_faster


 


(Tips on Oracle-level optimizations.)