CXCALC crashing with large files

User 19b92665cf

06-04-2005 12:29:43

I'm using cxcalc to calculate properties for SD files. The command line I'm using is:

$CHEMAXON/cxcalc file.sdf -o file.prop acceptor donor logp mass psa rotatablebondcount

The SD files are created by MOLCONVERT from a SMILES file. This works fine for a small file (a few thousand structures), but for bigger files I get Java exceptions:

(for file 1...)

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 3
    at chemaxon.formats.MolConverter.parseOutFile(Unknown Source)
    at chemaxon.formats.MolConverter.createMolConverter(Unknown Source)
    at chemaxon.formats.MolConverter.main(Unknown Source)

(for file 2...)

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

Any idea what might be happening or how to correct it? The machine I'm using has 640MB of RAM as well as swap space set up.

Thanks!

David

ChemAxon fb166edcbd

07-04-2005 12:55:00

I tested this with a 10000-molecule SDF file and it worked OK, although it was very slow (about 3 hours). Could you upload some test files for both problems?

Since the molecules are read and processed one by one (only one molecule and its corresponding data are kept in memory at a time), I suspect that your files contain specific molecules that cause these problems by themselves, and that the number of molecules is not what matters.
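
To test that idea on a subset of a large file, the records can be streamed one at a time and a slice written to a smaller file. The following is a minimal Python sketch, not ChemAxon code: it relies only on the SD file convention that each record ends with a line containing `$$$$`, and the function names `iter_sd_records` and `extract_records` are hypothetical helpers made up for this illustration.

```python
from typing import Iterator, TextIO


def iter_sd_records(handle: TextIO) -> Iterator[str]:
    """Yield one SD record at a time, so only one record is held in memory."""
    buffer = []
    for line in handle:
        buffer.append(line)
        # An SD record is terminated by a line consisting of "$$$$".
        if line.rstrip("\r\n") == "$$$$":
            yield "".join(buffer)
            buffer = []
    # Yield a trailing record that lacks the terminator, if it has any content.
    if any(part.strip() for part in buffer):
        yield "".join(buffer)


def extract_records(src: TextIO, dst: TextIO, first: int, last: int) -> int:
    """Copy records first..last (1-based, inclusive) from src to dst.

    Returns the number of records written.
    """
    written = 0
    for index, record in enumerate(iter_sd_records(src), start=1):
        if index > last:
            break  # stop early; the rest of the file is never read
        if index >= first:
            dst.write(record)
            written += 1
    return written
```

For example, opening the large SD file for reading and a new file for writing, then calling `extract_records(src, dst, 77000, 78000)`, would produce a small file that can be passed to cxcalc on its own. The record range here is only an example.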

User 19b92665cf

07-04-2005 13:42:29

OK, thanks. The gzipped file is 16MB (it's the NCI dataset), available for download from:

http://www.informatics.indiana.edu/djwild/files/nci_struct.sdf.gz

This is the one which is giving the "out of memory" error. I'm running under Red Hat 9.0 with a Celeron processor and 640MB memory.

David

ChemAxon fb166edcbd

08-04-2005 01:04:53

I started testing with this file and could process 47597 molecules without problems, but again it took hours. Could you check your output file for the index of the last molecule processed before the OutOfMemoryError? (The molecule index is written in the first output column.) I am thinking of extracting the relevant part of this huge file to reproduce the problem faster.
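
One way to read that index without opening a very large output file in an editor is a small script. This is a rough Python sketch under stated assumptions: the output is assumed to be tab-separated with the molecule index in the first column and a non-numeric header row, and `last_molecule_index` is a hypothetical helper name, not part of cxcalc.

```python
from typing import Optional, TextIO


def last_molecule_index(handle: TextIO) -> Optional[int]:
    """Return the index in the first column of the last data row, or None.

    Assumption: columns are tab-separated and the first column of each
    data row is an integer molecule index. Rows whose first column is not
    a plain integer (such as a header row) are skipped.
    """
    last = None
    for line in handle:
        first_column = line.split("\t", 1)[0].strip()
        if first_column.isdigit():
            last = int(first_column)
    return last
```

Calling it on the open output file returns the index of the last molecule that was written before the run stopped, so the molecule that triggered the error is most likely the next one in the input.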

User 19b92665cf

09-04-2005 02:35:31

I just tried running it again. It ran for a few hours, then came up with the error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

When the error occurred, the output file was 77,651 lines long, indicating the problem is with structure 77,650:

O=P(CC1=CC=CC=C1)(CC2=CC=CC=C2)P(CC3=CC=CC=C3)CC4=CC=CC=C4 98940

Just in case I got that wrong, here is the same line with the structures above and below it:

CC(=NNC(N)=N)C1=NC(=CC=C1)C(C)=NNC(N)=N.O[N+]([O-])=O 98939
O=P(CC1=CC=CC=C1)(CC2=CC=CC=C2)P(CC3=CC=CC=C3)CC4=CC=CC=C4 98940
CP1CCC(O)(CC1)C2=CC=CC=C2 98941

Running cxcalc on these structures by themselves seems to work fine for me. I should add that I'm using cxcalc v3.5.5.

Any ideas?

Thanks

David

ChemAxon fb166edcbd

11-04-2005 12:25:44

We could reproduce the out-of-memory error with molecule 77671 (m.smiles):

B12B34B56B378B149B2%10B79%11B58%12B6%13%14B%15%16B%17%18B%19B%20%21B%17%19%22B%15%18%23B%20%22%24B%13%16%23B%10%11%12%14%21%24

cxcalc acceptor donor logp mass psa rotatablebondcount m.smiles
Exception in thread "main" java.lang.OutOfMemoryError

It seems that our ring-search algorithm fails in this case. We will investigate this problem.

User 19b92665cf

11-04-2005 16:52:15

OK, look forward to hearing back - thanks for looking into it!