In-memory caching of molecule structures/objects for search

User 05a0a0ba3d

14-04-2006 22:27:04

Earlier, I read this:

> Although it still takes some time to create Molecule objects from the
> cached cxsmiles strings, this compromise is necessary to allow the
> caching of big tables with minimal memory footprint. (For 1 million
> typical structures the memory consumption is 100 MB or less.) At the
> moment there is no way to store the structures in another format, as
> it would either consume a lot of memory or require the slow DB access.
> (We may implement some compact binary in-memory form in the future
> as an option, though.)

Suppose our application is a webapp, so multiple users can start a search at the same time. Assuming 1 million structures are cached per search, the cache (or RAM) requirement is now 100 MB x #searches.

In short, it is relatively easy to construct a hypothetical test case for which the required cache/memory usage will exceed server RAM.

Question: do you have a rough functional relation between the number of molecules searched and memory usage?

Question: if our webapp needs to support multiple simultaneous searches, is there anything special we need to do through the ChemAxon API?

Thanks.

ChemAxon a3d59b832c

17-04-2006 13:55:19

Hello,
creadcdd wrote:
> Earlier, I read this:
>
> > Although it still takes some time to create Molecule objects from the
> > cached cxsmiles strings, this compromise is necessary to allow the
> > caching of big tables with minimal memory footprint. (For 1 million
> > typical structures the memory consumption is 100 MB or less.) At the
> > moment there is no way to store the structures in another format, as
> > it would either consume a lot of memory or require the slow DB access.
> > (We may implement some compact binary in-memory form in the future
> > as an option, though.)
>
> Suppose our application is a webapp, so multiple users can start a search at the same time. Assuming 1 million structures are cached per search, the cache (or RAM) requirement is now 100 MB x #searches.

No worries, the cache is stored only once, and the separate search threads can access it simultaneously. The memory consumption per search thread is much smaller and is, of course, recyclable.

creadcdd wrote:
> In short, it is relatively easy to construct a hypothetical test case for which the required cache/memory usage will exceed server RAM.
>
> Question: do you have a rough functional relation between the number of molecules searched and memory usage?

It depends on the size of the molecules currently searched, but this temporary memory is usually well below 10 MB per search. (It consists of molecule and search objects, hit lists, etc.)
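
To put the figures from this thread together, here is a rough back-of-the-envelope sketch in Java. It assumes that the cache grows linearly at about 100 MB per million structures (the quoted upper bound for typical structures) and takes 10 MB as a pessimistic per-search figure. The class and method names are made up for illustration and are not part of the JChem API.

```java
/** Rough memory estimate for one shared structure cache plus concurrent searches. */
class MemoryEstimate {
    // Assumption: about 100 MB per 1 million typical structures (upper bound quoted in this thread).
    static final double CACHE_MB_PER_MILLION = 100.0;
    // Assumption: temporary memory per running search ("usually well below 10 MB").
    static final double TEMP_MB_PER_SEARCH = 10.0;

    /** The cache is stored once, so only the temporary part scales with the number of searches. */
    static double estimateMb(long structures, int concurrentSearches) {
        double cacheMb = structures / 1_000_000.0 * CACHE_MB_PER_MILLION;
        double tempMb = concurrentSearches * TEMP_MB_PER_SEARCH;
        return cacheMb + tempMb;
    }

    public static void main(String[] args) {
        // 1 million structures, 20 simultaneous searches
        System.out.println(estimateMb(1_000_000, 20) + " MB");
    }
}
```

Under these assumptions, 1 million structures and 20 simultaneous searches come to roughly 300 MB, not the 2 GB that 100 MB x #searches would suggest.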
creadcdd wrote:
> Question: if our webapp needs to support multiple simultaneous searches, is there anything special we need to do through the ChemAxon API?

No, you do not have to do anything special; everything is handled internally. Moreover, if your server has multiple processors, more than one computing thread is started to fully exploit your hardware.

Best regards,
Szabolcs

User 05a0a0ba3d

02-05-2006 23:57:38

We observe a memory leak that is initiated by a search operation and worsened by successive searches.

My conclusion is that either (a) our software persists session-based data that is not released after each search, or (b) the ChemAxon search software does not release certain data obtained during the search.

You previously asserted that ChemAxon searches cache data in memory; does this mean each search will cause additional memory to be cached?

Have you observed memory leaks that scale with the number of searches?

Aside from system-wide garbage collection, is there a method in the ChemAxon API that we can use to release the search cache? At least this might help us ascertain whether the ChemAxon search is involved in the leakage.

Thanks.

ChemAxon 9c0afc9aaf

03-05-2006 08:06:10

Quote:
> You previously asserted that ChemAxon searches cache data in memory; does this mean each search will cause additional memory to be cached?

During the very first search, the relevant data from the structure table is cached (stored in memory). Subsequent searches use the same cache, so the memory consumption of the cache remains the same.

Of course, every search needs a certain amount of memory for calculations and for storing the results. This temporary memory will be recycled if you release the JChemSearch object for garbage collection after the search. So again, the memory need will not increase.
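
As an illustration of what "releasing the search object" can look like in a webapp, here is a minimal Java sketch. `FakeSearch` is a hypothetical stand-in for a search object and is not the JChemSearch API; the `SESSION` map simulates per-user session storage. The idea is simply to keep the search object in a local variable and store only the hit list, so that nothing long-lived references the search once the method returns.

```java
import java.util.HashMap;
import java.util.Map;

class SearchScopeSketch {
    /** Hypothetical stand-in for a search object that holds temporary working memory. */
    static final class FakeSearch {
        private final int[] workingBuffer = new int[1024]; // simulates per-search scratch memory

        int[] run(int hitCount) {
            int[] hits = new int[hitCount];
            for (int i = 0; i < hitCount; i++) {
                hits[i] = i + 1; // fake structure IDs
            }
            return hits;
        }
    }

    /** Simulated session storage, keyed by user name. */
    static final Map<String, Object> SESSION = new HashMap<>();

    static int[] searchAndRelease(String user, int hitCount) {
        FakeSearch search = new FakeSearch(); // referenced only inside this method
        int[] hits = search.run(hitCount);
        SESSION.put(user, hits);              // store the hit list, not the search object
        return hits;                          // 'search' becomes unreachable after return
    }

    public static void main(String[] args) {
        int[] hits = searchAndRelease("alice", 3);
        System.out.println(hits.length + " hits stored for alice");
    }
}
```

If the search object itself were put into the session (or into any static collection), its temporary memory would stay reachable for as long as the session lives, which would look like a leak that grows with the number of searches.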
Quote:
> Have you observed memory leaks that scale with the number of searches?

No.

Quote:
> Aside from system-wide garbage collection, is there a method in the ChemAxon API that we can use to release the search cache? At least this might help us ascertain whether the ChemAxon search is involved in the leakage.

You can try JChemSearch.clearCache():
http://www.chemaxon.com/jchem/doc/api/chemaxon/jchem/db/JChemSearch.html#clearCache()

Or you can search without the cache right from the start by calling setStructureCaching(false):
http://www.chemaxon.com/jchem/doc/api/chemaxon/jchem/db/JChemSearch.html#setStructureCaching(boolean)

Please note that both calls will drastically slow down the searches, so they are not recommended in general.

Perhaps you could comment out some parts of the code instead (e.g. generate an arbitrary hit list instead of calling the search), but of course that depends on your debugging practices.
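
One possible way to try this is sketched below in plain Java, with no JChem calls. `arbitraryHits` is a hypothetical stub that stands in for the real search, and `usedHeapBytes` reads the heap usage through `java.lang.Runtime`. Note that `Runtime.gc()` is only a hint to the JVM, so the numbers are indicative rather than exact.

```java
import java.util.Random;

class LeakProbe {
    /** Hypothetical stub: returns an arbitrary hit list instead of running a real search. */
    static int[] arbitraryHits(int count, long seed) {
        Random random = new Random(seed);
        int[] hits = new int[count];
        for (int i = 0; i < count; i++) {
            hits[i] = random.nextInt(1_000_000) + 1; // fake structure IDs in 1..1,000,000
        }
        return hits;
    }

    /** Heap currently in use, measured after asking the JVM to collect garbage. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        rt.gc(); // a request only; the JVM may ignore it
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        for (int round = 1; round <= 5; round++) {
            int[] hits = arbitraryHits(10_000, round); // swap in the real search to compare
            // ... pass 'hits' through the rest of the application code here ...
            System.out.printf("round %d: %d hits, %d KB used%n",
                    round, hits.length, usedHeapBytes() / 1024);
        }
    }
}
```

If the used heap keeps growing from round to round with the stub in place, the leak is likely in the application code; if it only grows once the real search is swapped back in, the search path deserves a closer look.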





Best regards,
Szilard

ChemAxon a3d59b832c

03-05-2006 08:22:49

I have split this discussion into a separate topic.