Memory problem when generating library

ChemAxon 60ee1f1328

23-12-2005 14:07:46

And finally, slightly later than anticipated, we are looking at Reactor again...

I have just tried to run 50000 of one reactant against 1 other reactant. Instead of the expected licence-key constraint error, I received a java.lang.OutOfMemoryError. This leads me to wonder whether my machine (2.4 GHz processor, 512 MB RAM, XP) will be capable of completing the enumeration of approximately 35 million molecules, i.e. 700 * 50000. Obviously the answer is no.

However, should I consider a higher-specification machine (which is possible) or a grid/parallelisation strategy for this size of expected output? We would expect to be running many similar reactions at once, so in reality both may be required! (One of your benchmarks is the enumeration of 1 million molecules in 3 hours on a P 1.4 GHz machine with 512 MB RAM, so I can probably work it out for myself.) Still, I thought it would be good to get your advice on enumerating very large data sets, so that I get a new T-shirt every so often!
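As a rough check of the sizing question, the quoted benchmark (1 million products in 3 hours) can be extrapolated to the 700 * 50000 library. The Python sketch below is a minimal one. It assumes throughput is constant and the work divides evenly across identical nodes. It ignores the faster 2.4 GHz CPU, I/O and scheduling overhead, so treat the result as an order of magnitude only. The names `library_size` and `estimate_hours` are hypothetical.

```python
# Rough wall-clock estimate for a two-component combinatorial enumeration.
# Assumption: throughput is constant (time is linear in library size) and
# the work splits evenly across identical nodes with no overhead.

BENCH_PRODUCTS = 1_000_000  # products in the quoted benchmark
BENCH_HOURS = 3.0           # hours taken for the benchmark run


def library_size(n_reactants_a: int, n_reactants_b: int) -> int:
    """Number of products from a full combination of two reactant lists."""
    return n_reactants_a * n_reactants_b


def estimate_hours(n_products: int, n_nodes: int = 1) -> float:
    """Extrapolate the benchmark rate to n_products spread over n_nodes."""
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    single_node_hours = n_products / BENCH_PRODUCTS * BENCH_HOURS
    return single_node_hours / n_nodes


if __name__ == "__main__":
    size = library_size(700, 50000)
    print(f"{size} products")
    for nodes in (1, 4, 16):
        print(f"{nodes:>2} node(s): ~{estimate_hours(size, nodes):.1f} h")
```

Under these assumptions, 35 million products take about 105 hours on one node. This is why spreading the work over several machines looks attractive.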

So is the approach to partition the inputs accordingly and distribute them to similar nodes?
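The partitioning idea can be sketched as bookkeeping. Split the large reactant list into contiguous chunks, pair each chunk with the whole small list, and give one chunk to each node. Every product comes from exactly one chunk, so the per-node library sizes add up to the full library. This Python sketch is illustrative only. The helper name `partition` and the choice of 8 nodes are assumptions, and actually running Reactor on each node is not shown.

```python
# Bookkeeping for splitting a two-component combinatorial library across
# nodes by partitioning the larger reactant list into contiguous chunks.
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def partition(items: Sequence[T], n_parts: int) -> List[List[T]]:
    """Split items into n_parts contiguous chunks whose sizes differ by at most one."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    base, extra = divmod(len(items), n_parts)
    chunks: List[List[T]] = []
    start = 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)  # first `extra` chunks get one more
        chunks.append(list(items[start:start + size]))
        start += size
    return chunks


if __name__ == "__main__":
    big = range(50000)  # stand-in indices for the 50000-member reactant list
    n_small = 700       # the other reactant list is used whole on every node
    chunks = partition(big, 8)
    per_node = [len(c) * n_small for c in chunks]
    print(per_node)
    # Each product is generated by exactly one node, so the totals match.
    assert sum(per_node) == 50000 * n_small
```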

We would like to further discuss licensing terms for Reactor and I have written a separate email to Sales _at_ Chemaxon.com to this end.

Many thanks and Merry Xmas,

Daniel.

ChemAxon d76e6e95eb

23-12-2005 16:20:00

I think that generating huge libraries should not be a problem with Reactor as long as you do not store all molecules in memory. Running combichem reactions in file mode or database mode (with Synthesizer) lets you generate large libraries. However, the largest one I have tested was only a 6-million-member library, on a system similar to yours.

How do you currently run Reactor?

ChemAxon 60ee1f1328

23-12-2005 16:43:32

Hi Gyuri,

In the longer term, I intend to use the Reactor API in Java in a similar way to (and likely based on) the wrapper class examples you make available, i.e. the examples from the conference literature.

In the immediate term, I intend to run Reactor from the command line, so it sounds like I should investigate either file or database mode to control memory consumption.

In fact, we have only had a trial licence so far and have not done any such large-scale work yet. We are now looking to start.

Incidentally, what is to stop an individual from setting up a batch-processing approach in Java and processing many millions of molecules with a trial licence? Presumably your 200-call limitation for this licence is based on a time period, similar to the searching constraint in the JChem licence?

Cheers + merry Xmas to all at ChemAxon,

Daniel.

ChemAxon d76e6e95eb

23-12-2005 16:59:51

We will investigate the problem. The developers will probably contact you right after the holiday.

(For your information, we are building a desktop Java GUI for more user-friendly access.)

ChemAxon fb166edcbd

24-12-2005 15:48:31

inhibox wrote:

I have just tried to run 50000 of one reactant against 1 other reactant. Instead of the expected licence-key constraint error, I received a java.lang.OutOfMemoryError. This leads me to wonder whether my machine (2.4 GHz processor, 512 MB RAM, XP) will be capable of completing the enumeration of approximately 35 million molecules, i.e. 700 * 50000. Obviously the answer is no. However, should I consider a higher-specification machine (which is possible) or a grid/parallelisation strategy for this size of expected output?

Actually, only the input is stored in molecule sets. The output is written to the output stream molecule by molecule. Before the reaction is processed, all input is loaded into molecule sets, which is why you run into this OutOfMemoryError before any output is generated.
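The memory pattern described here, with inputs held in memory and products streamed out one at a time, can be illustrated with a small Python sketch. This is not Reactor's implementation. Molecules are plain strings, and `combine` is a hypothetical stand-in for applying the reaction. The point is that memory use grows with the size of the input lists. The output never accumulates, because each product is written as soon as it is made.

```python
# Combinatorial enumeration with in-memory inputs and streamed output.
# Molecules are plain strings and `combine` is a hypothetical stand-in
# for applying a reaction; this only illustrates the memory pattern.
import io
import itertools
from typing import Callable, Iterable, List, TextIO


def enumerate_library(
    reactants_a: Iterable[str],
    reactants_b: Iterable[str],
    combine: Callable[[str, str], str],
    out: TextIO,
) -> int:
    # Both inputs are materialised in memory first, like the molecule sets;
    # this is the step that can run out of memory for large inputs.
    set_a: List[str] = list(reactants_a)
    set_b: List[str] = list(reactants_b)
    count = 0
    for a, b in itertools.product(set_a, set_b):
        # Each product is written immediately and not kept afterwards.
        out.write(combine(a, b) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    buf = io.StringIO()
    n = enumerate_library(["A1", "A2"], ["B1", "B2", "B3"],
                          lambda a, b: f"{a}-{b}", buf)
    print(n)
    print(buf.getvalue(), end="")
```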

The default JVM memory (64 MB) can hold 1000-2000 molecules of average size. You can increase the JVM memory by setting the command line option -Xmx, e.g.:

Code:

react -Xmx256M -m comb -r reaction.mrv reactants1.sdf reactants2.sdf

will run Reactor with 256 MB of JVM memory. This option is available in our Linux scripts from JChem 3.1 onward. In earlier versions, and for the Windows .bat file, you should edit the 'react' script file (bin/react on Linux or bin/react.bat on Windows) and hard-code the -Xmx option there.
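The capacity figures in this reply follow from a simple proportion. If a 64 MB heap holds roughly 1000-2000 molecules of average size, then a 256 MB heap (as in -Xmx256M) holds roughly four times as many. The Python sketch below scales that rule of thumb linearly. It assumes memory per molecule is constant, so it is only as accurate as the 64 MB figure it starts from. The function name `capacity_range` is hypothetical.

```python
# Linear scaling of the rule of thumb that a 64 MB JVM heap holds about
# 1000-2000 molecules of average size. Assumes constant memory per molecule.
from typing import Tuple

DEFAULT_HEAP_MB = 64
DEFAULT_CAPACITY = (1000, 2000)  # (low, high) molecules at 64 MB


def capacity_range(heap_mb: int) -> Tuple[int, int]:
    """Approximate (low, high) number of molecules that fit in a heap of heap_mb."""
    scale = heap_mb / DEFAULT_HEAP_MB
    low, high = DEFAULT_CAPACITY
    return int(low * scale), int(high * scale)


if __name__ == "__main__":
    for heap in (64, 256):
        low, high = capacity_range(heap)
        print(f"-Xmx{heap}M: roughly {low}-{high} molecules")
```

By the same proportion, 50000 input molecules would need a heap of well over a gigabyte, far more than half of a 512 MB machine.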

As a rule of thumb, no more than half of your RAM should be given to the JVM, which in your case is 256 MB. This is sufficient for approximately 4000-8000 molecules, so a memory-based molecule set of 50000 molecules is not feasible. You should either split your input into 10-20 parts and process each of them separately, or use file mode:

Code:

react -e -m comb -r reaction.mrv reactants1.sdf reactants2.sdf

This slows down the process a bit because of the molfile I/O.
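For the split-input alternative, an SD file can be cut into parts at its `$$$$` record delimiters, and each part can then be run through Reactor separately. The Python sketch below is a minimal one. It assumes a well-formed SD file in which each record ends with a `$$$$` line. It streams line by line, so the file is never loaded whole. Records are dealt round-robin, so part sizes differ by at most one. The names `sdf_records` and `split_sdf`, and any part file names, are hypothetical.

```python
# Split an SD file into n parts at the "$$$$" record delimiters, streaming
# line by line so the whole file is never held in memory. Records are dealt
# round-robin, so part sizes differ by at most one.
import io
from typing import Iterable, Iterator, List, TextIO


def sdf_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield one SD record at a time, including its closing $$$$ line."""
    record: List[str] = []
    for line in lines:
        record.append(line)
        if line.strip() == "$$$$":
            yield "".join(record)
            record = []
    if any(ln.strip() for ln in record):
        # Trailing record without a $$$$ terminator: keep it rather than drop it.
        yield "".join(record)


def split_sdf(src: TextIO, parts: List[TextIO]) -> List[int]:
    """Deal records from src round-robin into the output streams; return counts."""
    counts = [0] * len(parts)
    for i, rec in enumerate(sdf_records(src)):
        k = i % len(parts)
        parts[k].write(rec)
        counts[k] += 1
    return counts


if __name__ == "__main__":
    # Tiny in-memory example; for real data, pass open() file objects instead,
    # e.g. one writable file per part.
    sample = "".join(f"mol{i}\n  body\nM  END\n$$$$\n" for i in range(5))
    outs = [io.StringIO() for _ in range(2)]
    print(split_sdf(io.StringIO(sample), outs))
```

Round-robin dealing changes the order of molecules across parts. If contiguous ranges are preferred, the records can be counted in a first pass and then split by index.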

Database mode is only available in Synthesizer, but you can just as well use Synthesizer for a one-step synthesis in database mode. Note that Synthesizer is part of the Reactor-Professional package.

inhibox wrote:

Incidentally, what is to stop an individual from setting up a batch-processing approach in Java and processing many millions of molecules with a trial licence? Presumably your 200-call limitation for this licence is based on a time period, similar to the searching constraint in the JChem licence?

No, the 200-call limitation refers to each invocation of Reactor (through the 'react' script or through its API). This means that in demo mode you can process 200 different reactant settings per invocation. There is no time limit here.