[Q] Request for best practice

User bf3dbc99cf

19-02-2014 14:31:30

Dear ChemAxon,

I am using the following DB server:

Server: 20-core CPU, 96 GB RAM, 256 GB SATA-3 SSD and 1 TB PCI-E SSD
OS and database: CentOS 6.5 64-bit, Oracle 11gR2

I created all of the Oracle database files on the PCI-E SSD.

Using the JCMan command-line interface, I was able to load the zinc_2013 database (37 million structures) into a single COMPOUND table in 15 hours, which is about 2.4 million structures per hour.
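As a quick sanity check of that loading rate, the arithmetic can be reproduced in a few lines of Python. The 37 million count is the approximate figure given above, so the result is approximate too.

```python
# Back-of-the-envelope check of the bulk-load throughput.
# Inputs are the approximate figures from this post: ~37 million structures in 15 hours.
structures = 37_000_000
hours = 15

per_hour = structures / hours   # structures loaded per hour
per_second = per_hour / 3600    # structures loaded per second

print(f"{per_hour / 1e6:.2f} million structures/hour")  # 2.47 million structures/hour
print(f"{per_second:.0f} structures/second")            # 685 structures/second
```

That is roughly 685 structures per second over the whole load, in line with the hourly figure quoted.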

For substructure searching, I was able to find hundreds of thousands of hits in less than 10 seconds.

During searching, I found that Oracle uses 1 CPU core for screening, while Java uses 6-10 cores for matching.

Do you think this is the best achievable performance for data loading and screening?

If not, can you recommend a better setup (Oracle RAC, parallel_server, partitioned tables or similar)?

Regards,

Chong Hak Chae

ChemAxon d4fff15f08

20-02-2014 12:11:24

Hi,

Your configuration looks pretty OK.

The import and search times look reasonable.

I just want to mention a few ideas, along with some explanation of how JChem works, so you can fine-tune your system (but this is really fine-grained tuning and would not dramatically affect your present performance).

 - There are two options you can use to optimize your search (a detailed description can be found here: https://www.chemaxon.com/jchem/doc/dev/cartridge/cartapi.html ):

- You can limit the number of hits via the "maxHitCount" option. However, please be aware of the aspect discussed here: https://www.chemaxon.com/forum/viewtopic.php?p=54489#54489

- You can retrieve the hits continuously, as soon as the search engine finds them and while the search is still running, using the "earlyResults" option.
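The idea behind these two options can be illustrated with a small, library-independent Python sketch. This is not the JChem API: the function `search`, its `max_hit_count` parameter and the `matches` predicate are hypothetical stand-ins. A generator hands back each hit as soon as it is found (the idea behind "earlyResults") and stops scanning once a hit limit is reached (the idea behind "maxHitCount").

```python
from typing import Callable, Iterable, Iterator, Optional


def search(candidates: Iterable[int],
           matches: Callable[[int], bool],
           max_hit_count: Optional[int] = None) -> Iterator[int]:
    """Hypothetical search loop: yield each hit cd_id as soon as it is found,
    and stop scanning once max_hit_count hits have been yielded."""
    if max_hit_count is not None and max_hit_count <= 0:
        return
    found = 0
    for cd_id in candidates:
        if matches(cd_id):
            yield cd_id  # the caller receives this hit before the scan finishes
            found += 1
            if max_hit_count is not None and found >= max_hit_count:
                return  # hit limit reached: the remaining candidates are never examined


# Toy data: pretend every cd_id divisible by 7 is a hit.
for hit in search(range(1, 1000), lambda cd_id: cd_id % 7 == 0, max_hit_count=5):
    print(hit)  # printed one by one: 7, 14, 21, 28, 35
```

Note that in this sketch, with a hit limit in place, which hits are returned depends on the order in which the candidates are scanned.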

 - During a standard cartridge search (I believe you are using the cartridge), the JChem server carries most of the load. At the first search, a cache containing all the structures in the table together with their fingerprints is built on the JChem server side; this cache is then reused by subsequent searches, including the screening step. The search returns the cd_ids of the hits, so the database has to be accessed afterwards to retrieve the structures for those cd_ids, so that the user gets the hit structures and not only their IDs. Only this last step puts load on Oracle; during the search itself the database is normally not accessed (unless the search needs data that is not cached, which is a very rare case). In this sense there is no real need for a multithreaded Oracle setup, since Oracle would not be loaded that much. However, you can still set it up as described here: https://www.chemaxon.com/forum/viewpost54534.html&highlight=#54534
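The division of work described above (screening against an in-memory cache, parallel matching of the remaining candidates, then a database lookup only for the hits) can be sketched in plain Python. This is a toy model under stated assumptions, not JChem code: fingerprints are integer bitmasks, a "structure" is a set of feature labels so that set inclusion stands in for the atom-by-atom substructure match, and dictionaries stand in for the JChem cache and the Oracle table. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cache, built once at the first search:
# cd_id -> (fingerprint as an int bitmask, toy structure as a set of features).
Cache = dict[int, tuple[int, frozenset[str]]]


def screen(cache: Cache, query_fp: int) -> list[int]:
    """Fingerprint screening: keep a candidate only if every bit set in the
    query fingerprint is also set in the candidate's fingerprint."""
    return [cd_id for cd_id, (fp, _) in cache.items() if (fp & query_fp) == query_fp]


def match(cache: Cache, cd_id: int, query_features: frozenset[str]) -> bool:
    """Toy stand-in for the expensive atom-by-atom match (feature-set inclusion)."""
    return query_features <= cache[cd_id][1]


def substructure_search(cache: Cache, query_fp: int,
                        query_features: frozenset[str], workers: int = 8) -> list[int]:
    """Screen first, then match the surviving candidates on a worker pool.
    Threads only illustrate the fan-out; CPython would need processes for real CPU parallelism."""
    candidates = screen(cache, query_fp)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        flags = list(pool.map(lambda cd_id: match(cache, cd_id, query_features), candidates))
    return [cd_id for cd_id, ok in zip(candidates, flags) if ok]


def fetch_structures(table: dict[int, str], hit_ids: list[int]) -> dict[int, str]:
    """Only this final step touches the 'database': look up structures for the hit cd_ids."""
    return {cd_id: table[cd_id] for cd_id in hit_ids}


cache: Cache = {
    1: (0b1011, frozenset({"C", "O", "ring"})),
    2: (0b0011, frozenset({"C", "O"})),
    3: (0b1110, frozenset({"C", "N"})),  # passes screening but fails the match
}
table = {1: "structure-1", 2: "structure-2", 3: "structure-3"}  # placeholder strings

hits = substructure_search(cache, query_fp=0b1010, query_features=frozenset({"C", "ring"}))
print(fetch_structures(table, hits))  # {1: 'structure-1'}
```

Record 3 shows why screening alone is not enough: its fingerprint contains all the query bits, so it survives the screen, but the subsequent match rejects it.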


Concerning the disk allocation of the servers, I think you are OK. It was not clear whether the JChem server runs on SSD or HDD; you can gain some speed by putting it on SSD.

Oracle RAC is indeed a good way to balance the load among the nodes of a cluster. Do you expect a heavy load on your system?

I hope this helps.

Best regards,

Norbert