Memory consumption of queries

User f5e6ccf034

03-08-2009 11:43:23

With an in-process JChem database, do queries have a more or less constant memory footprint (for a given database), or does it vary? If it varies, is there a rule of thumb?


Thanks in advance,


-- O.L.

ChemAxon 9c0afc9aaf

04-08-2009 01:31:22

Hi,


 


In a typical scenario with a large database and relatively few parallel users, the largest share of the memory consumption is usually taken by the Structure Cache.


The cache is loaded at the beginning of the first search on that table. Its size is proportional to the table size and does not vary with the number of users.


You can find more information here:


http://www.chemaxon.com/jchem/FAQ.html#cacheSize


 


 


The other part of the memory consumption is the memory needed for the search operations themselves.


(I think this is probably what you were asking about, but I thought it would be useful to mention the cache too.)


Obviously, this memory need scales with the number of simultaneous search operations.


However, the exact memory need of these operations depends on the type of search (e.g. substructure, similarity ...), the query and target structures, the number of hits returned (even storing the hit list can be significant with millions of hits for numerous users), the property calculations performed (e.g. Chemical Terms filter), the filter query SQL, 64-bit or 32-bit Java, etc.


The number of variables is so high that no single value or formula can be accurate enough.


So we recommend a more empirical approach. We advise you to:


- stress test the system with typical queries started at the same time, and see where it fails


(please note that, statistically, it is very unlikely that all users will start their searches at the same time)


- monitor the system from time to time (structure and user activity may change over time)


- be a bit generous with memory: we advise allowing some overhead (the price of memory is probably not significant compared to the total software and hardware cost)


- if the number of users might get very high (e.g. a popular website), the number of concurrent searches can be limited by the application code (new requests wait)
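As an illustration of the last point, here is a minimal sketch of limiting concurrent searches in application code with a standard java.util.concurrent.Semaphore. The SearchGate class and its method names are hypothetical and not part of the JChem API; the search itself is passed in as an arbitrary Supplier, so the sketch makes no assumption about how the search is actually started.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical helper (not part of JChem): caps how many searches run at
// once, so the memory used by search operations stays bounded.
final class SearchGate {
    private final Semaphore permits;

    SearchGate(int maxConcurrentSearches) {
        // Fair semaphore: waiting requests are served in arrival order.
        this.permits = new Semaphore(maxConcurrentSearches, true);
    }

    // Blocks until a permit is free, runs the search, then always
    // returns the permit, even if the search throws.
    <T> T run(Supplier<T> search) {
        permits.acquireUninterruptibly();
        try {
            return search.get();
        } finally {
            permits.release();
        }
    }

    // Number of searches that could start right now without waiting.
    int availablePermits() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        SearchGate gate = new SearchGate(4); // assumed limit, tune by stress testing
        String result = gate.run(() -> "placeholder for a real search call");
        System.out.println(result);
    }
}
```

The limit passed to the constructor is something to determine from the stress test described above; the value 4 in main is only a placeholder.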


Finally, to give a number anyway: ~25 MB may be enough for a simple search.
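To show how the pieces above might be combined into a first guess, here is a rough sketch in Java. The formula (cache size plus per-search memory times concurrent searches, plus some overhead) is only an illustrative assumption, not an official ChemAxon formula, and the HeapEstimate class and all input values are hypothetical. The cache size should come from the FAQ entry linked above or from measurement, and the per-search value can be far above 25 MB for heavier searches.

```java
// Hypothetical sizing helper: a starting point for stress testing,
// not a substitute for it.
final class HeapEstimate {

    // structureCacheMb:   measured or FAQ-derived cache size for the table(s)
    // concurrentSearches: expected number of simultaneous searches
    // perSearchMb:        memory per search (~25 for a simple search)
    // overheadFraction:   extra headroom, e.g. 0.25 for 25%
    static long estimateMb(long structureCacheMb, int concurrentSearches,
                           long perSearchMb, double overheadFraction) {
        long base = structureCacheMb + (long) concurrentSearches * perSearchMb;
        return Math.round(base * (1.0 + overheadFraction));
    }

    public static void main(String[] args) {
        // Assumed example inputs: 400 MB cache, 8 concurrent simple
        // searches at 25 MB each, 25% headroom.
        System.out.println(estimateMb(400, 8, 25, 0.25) + " MB"); // prints "750 MB"
    }
}
```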


 


Best regards,


 


Szilard