CXCALC crashing with large files
I'm using cxcalc to calculate properties for SD files. The command line format I'm using is:
$CHEMAXON/cxcalc file.sdf -o file.prop acceptor donor logp mass psa rotatablebondcount
The SD files being used are created by MOLCONVERT from a SMILES file. It works fine for a small file (a few thousand structures), but for bigger files I get Java exceptions:
(for file 1....)
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 3
at chemaxon.formats.MolConverter.parseOutFile(Unknown Source)
at chemaxon.formats.MolConverter.createMolConverter(Unknown Source)
at chemaxon.formats.MolConverter.main(Unknown Source)
(for file 2...)
Exception in thread "main" java.lang.OutOfMemoryError: Java
Any idea what might be happening or how to correct it? The machine I'm using has 640MB of RAM as well as swap space set up.
I tested this with a 10,000-molecule SDF file and it worked OK, although it was very slow (about 3 hours). Could you upload some test files for both problems?
Since the molecules are read and processed one by one (only one molecule and its corresponding data are kept in memory at a time), I suspect that your files contain some specific molecules that cause these problems by themselves, and that it is not the number of molecules that matters.
I started to test it with this file and I could process 47597 molecules without problem but again it took me hours. Could you check your output file for the index of the last molecule processed before the OutOfMemoryError? (The molecule index is written in the first output column.) I am thinking of extracting the relevant part of this huge file to reproduce the problem faster.
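In case it helps with narrowing things down, here is a rough Python sketch of how the relevant part of a large SD file could be cut out. It is not part of cxcalc or any ChemAxon tool. It relies on SD file records ending with a `$$$$` line, and it assumes the cxcalc output is whitespace- or tab-separated with the integer molecule index in the first column (possibly after a header line). The function names are made up for illustration.

```python
from typing import Iterable, Iterator, List, Optional


def iter_sdf_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one SD-file record at a time; a record ends with a '$$$$' line."""
    record: List[str] = []
    for line in lines:
        record.append(line)
        if line.strip() == "$$$$":
            yield record
            record = []
    # Keep a trailing record that lacks the '$$$$' terminator, if it has content.
    if any(item.strip() for item in record):
        yield record


def extract_records(lines: Iterable[str], first: int, last: int) -> List[str]:
    """Return the lines of records first..last (1-based, inclusive)."""
    selected: List[str] = []
    for index, record in enumerate(iter_sdf_records(lines), start=1):
        if index > last:
            break  # stop reading once we are past the requested range
        if index >= first:
            selected.extend(record)
    return selected


def last_processed_index(output_lines: Iterable[str]) -> Optional[int]:
    """Return the last integer found in the first column, or None.

    Assumption: the molecule index is the first whitespace-separated field;
    lines whose first field is not an integer (e.g. a header) are skipped.
    """
    last: Optional[int] = None
    for line in output_lines:
        fields = line.split()
        if fields and fields[0].isdigit():
            last = int(fields[0])
    return last
```

To use it, open the property file and pass the file object to `last_processed_index`, then open the SD file and call `extract_records` with a small window around that index (for example, from two records before it to two records after it) and write the returned lines to a new file. Both functions read line by line, so the whole SD file is never held in memory.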
I just tried running it again. It ran for a few hours, then came up with the error:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
When the error occurred, the output file was 77,651 lines long, indicating the problem is with structure 77,650:
Just in case I got that wrong, here is the same line with the two structures above and below it:
Running cxcalc on these structures seems to work fine for me. I should add that I'm using cxcalc v3.5.5.
We could reproduce the out-of-memory error with molecule 77671 (m.smiles):
cxcalc acceptor donor logp mass psa rotatablebondcount m.smiles
Exception in thread "main" java.lang.OutOfMemoryError
and it seems that our ring-search algorithm fails in this case.
We will investigate this problem.
OK, look forward to hearing back - thanks for looking into it!