Understanding structure searching performance

User 7f33ec9a5c

26-06-2012 02:46:58

We are currently exploring the capabilities of JChem Cartridge when working with a table of 17,683,823 structures, with an average SMILES length of 45 characters.


From your document on calculating memory requirements and cache refresh times, http://www.chemaxon.com/jchem/doc/admin/Performance.html#cacheSize, we calculate that we need a 2121 MB cache which will have an estimated 15.62 minute cache refresh time.
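As a quick sanity check on these two figures, the sketch below works out what they imply per structure. It uses only the numbers quoted above, plus one labeled assumption: that "MB" means 2**20 bytes. Under that assumption the cache works out to about 126 bytes per structure, or about 120 bytes if MB means 10**6 bytes. The refresh time implies a load rate of roughly 18,900 structures per second. This does not reproduce ChemAxon's formula. It only restates the same estimate in per-structure terms.

```python
# Figures quoted in the post above.
STRUCTURES = 17_683_823
CACHE_MB = 2121
REFRESH_MINUTES = 15.62

# Assumption: "MB" means 2**20 bytes (the post does not say which).
BYTES_PER_MB = 2**20


def bytes_per_structure(cache_mb: float, n: int, bytes_per_mb: int = BYTES_PER_MB) -> float:
    """Average cache bytes per structure implied by a total cache size."""
    return cache_mb * bytes_per_mb / n


def load_rate(n: int, minutes: float) -> float:
    """Structures loaded per second implied by a full cache refresh time."""
    return n / (minutes * 60)


if __name__ == "__main__":
    print(f"{bytes_per_structure(CACHE_MB, STRUCTURES):.1f} bytes/structure")  # 125.8
    print(f"{load_rate(STRUCTURES, REFRESH_MINUTES):,.0f} structures/s")       # 18,869
```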


My first question is whether these numbers look reasonably accurate to you.


Second, the document http://www.chemaxon.com/jchem/doc/dev/cartridge/index.html contains a section (included below) suggesting that the cache is updated on every search that follows a structure insertion or deletion. Can you please confirm whether I understand the interaction between the Cartridge and JChem Server correctly, and whether that section really means the entire cache must be rebuilt every time a structure is inserted and a search is then run?


Thank you.


~mike


Database Access Modes

[Figure: Dual-Session Database Access Example]

  1. The Search Engine checks (over the JDBC connection) to see if the database table involved in the search has been modified since the corresponding structure cache was loaded the last time.

  2. The structure cache is refreshed (over the JDBC connection), if necessary.
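The two quoted steps describe a check-then-refresh pattern. Here is a minimal sketch of that pattern, for illustration only. `CachedTable`, `prepare_search` and the integer "version" are hypothetical names, not JChem APIs, and this thread does not say what the real engine compares in step 1. Note also that step 2 on its own does not say whether the refresh reloads the whole table or only the changes.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CachedTable:
    """Hypothetical stand-in for the cache kept for one structure table."""
    name: str
    loaded_version: Optional[int] = None  # table version the cache reflects; None = never loaded
    refreshes: int = 0                    # how many times step 2 has run


def prepare_search(table: CachedTable, current_version: Callable[[str], int]) -> bool:
    """Run the two pre-search steps; return True if the cache was refreshed."""
    # Step 1: has the table changed since the cache was last loaded?
    # A version number stands in for whatever JChem actually checks.
    version = current_version(table.name)
    if table.loaded_version == version:
        return False  # unchanged: search the cache as-is
    # Step 2: refresh the cache (full vs. incremental is not decided here).
    table.refreshes += 1
    table.loaded_version = version
    return True


if __name__ == "__main__":
    versions = {"compounds": 1}
    cache = CachedTable("compounds")
    print(prepare_search(cache, lambda name: versions[name]))  # True: first search loads
    print(prepare_search(cache, lambda name: versions[name]))  # False: nothing changed
    versions["compounds"] = 2  # simulate an INSERT into the table
    print(prepare_search(cache, lambda name: versions[name]))  # True: change detected
```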

ChemAxon 9c0afc9aaf

26-06-2012 03:32:19

Hi Mike,


The semi-empirical formula refers to the SMILES stored internally after standardization; I think we should clarify this in the documentation.


http://www.chemaxon.com/jchem/doc/dev/cartridge/index.html#index_stats


This is basically a rough estimate, intended to help those who wish to consider different fingerprint sizes.


Anyway, your estimate of the memory consumption seems reasonable.


The cache loading time was measured many years ago; I would expect better results in a more up-to-date environment.


The cache is only loaded fully at the first search after a JChem Server (re)start, which is normally very rare.


Additional changes are updated incrementally at every search.


If the number of inserted/updated structures between searches is reasonably low it is usually not even noticeable.
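The difference between a full load and an incremental update can be illustrated with a minimal sketch. The model does one full read on the first search, then applies only the changes made since the previous search. `FakeTable`, its change log and `IncrementalCache` are hypothetical; this thread does not describe how JChem actually tracks changes. The return value counts the rows or changes read, so the work per search grows with the number of changes rather than with the table size.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical change-log entry: (sequence number, operation, structure id, SMILES or None)
Change = Tuple[int, str, int, Optional[str]]


@dataclass
class FakeTable:
    """Hypothetical table with an append-only change log; not a JChem API."""
    rows: Dict[int, str] = field(default_factory=dict)
    log: List[Change] = field(default_factory=list)

    def insert(self, sid: int, smiles: str) -> None:
        self.rows[sid] = smiles
        self.log.append((len(self.log) + 1, "insert", sid, smiles))

    def delete(self, sid: int) -> None:
        del self.rows[sid]
        self.log.append((len(self.log) + 1, "delete", sid, None))


@dataclass
class IncrementalCache:
    structures: Dict[int, str] = field(default_factory=dict)
    last_seq: Optional[int] = None  # None until the first full load

    def before_search(self, table: FakeTable) -> int:
        """Bring the cache up to date; return how many rows or changes were read."""
        if self.last_seq is None:
            # First search after a (re)start: read the whole table once.
            self.structures = dict(table.rows)
            self.last_seq = len(table.log)
            return len(table.rows)
        # Later searches: apply only the changes logged since the last search.
        # (A real system would query just the new entries; the scan is for brevity.)
        pending = [c for c in table.log if c[0] > self.last_seq]
        for _seq, op, sid, smiles in pending:
            if op == "insert":
                self.structures[sid] = smiles
            else:
                self.structures.pop(sid, None)
        self.last_seq = len(table.log)
        return len(pending)


if __name__ == "__main__":
    table = FakeTable()
    for i, smi in enumerate(["C", "CC", "CCC", "CCCC", "c1ccccc1"]):
        table.insert(i, smi)
    cache = IncrementalCache()
    print(cache.before_search(table))  # 5: full load
    table.insert(10, "CCO")
    table.delete(0)
    print(cache.before_search(table))  # 2: only the two changes
```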


Best regards,


Szilard

ChemAxon aa7c50abf8

26-06-2012 07:50:49

The facility provided by JChem Cartridge to monitor memory usage is described here: http://www.chemaxon.com/jchem/doc/admin/cartridge.html#tracking_server_memory_utilization .

User 7f33ec9a5c

24-08-2012 16:28:56

You might want to consider adding Szilard's information to the documentation describing the indexes that I referenced in my original post. It would be a very helpful addition to step 2 on that page:


"The cache is only loaded fully at the first search after a JChem Server (re)start, which is normally very rare.


Additional changes are updated incrementally at every search.


If the number of inserted/updated structures between searches is reasonably low it is usually not even noticeable."  


ChemAxon aa7c50abf8

25-08-2012 12:44:06

Sure.