Memory leak and crash during ASA and VDWSA calculations (3D)

User 677b9c22ff

04-02-2009 09:40:03

Hi,

the following SMILES cause a memory leak (memory use grows until the process dies) when used with cxcalc. Under the Marvin GUI they seem fine. The problem is independent of whether the input is a SMILES string or a 3D-optimized structure: the calculation never finishes and eventually crashes.








Clc1cccc(Cl)c1


Clc1cccc(Cl)c1Cl





COC(=O)[C@H]1C2[C@]3(C)C=C[C@@H](O[Pb](C)(C)C)[C@@]2(OC3=O)C2CC[C@@H]3C[C@]12CC3=C














Code:
cxcalc  maximalprojectionarea  maximalprojectionradius  minimalprojectionarea  minimalprojectionradius  vdwsa  wateraccessiblesurfacearea "Clc1cccc(Cl)c1"








If you replace the (Cl) with Br, or remove the Cl, it works. Calculation time is usually milliseconds; with the above molecule it is multiple seconds.





There are multiple other issues with other molecules from the NCI2000 test set. Possible causes may be aromatization and all kinds of metallo-organics (remember the borane hair net?).





The other issue here was that it is so slow because it needs to recreate the 3D structure again and again (10 times in the above example). I think the main problem, as before, is validation with a diverse test set. This could be the NCI2000 set or any subsample from PubChem. The API and cxcalc are not rigorously tested (I assume), because I already had those tests running with each and every cxcalc command some years ago to find out where it got stuck or why it is sometimes so slow. Also, when the code gets safer it sometimes also gets slower, and sometimes the code also gets faster (speaking of selective perception) :-)
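A batch check of the kind described above (run every input, find out where it gets stuck) could be sketched as follows. This is plain Java and not ChemAxon code: `TimeoutHarness`, `runWithTimeout` and the placeholder task are hypothetical names, and the real per-molecule call (cxcalc or the API) would have to be plugged in by whoever runs the test.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimeoutHarness {

    /**
     * Runs the task on a worker thread and returns its result,
     * or null if it does not finish within timeoutMillis.
     */
    static <T> T runWithTimeout(Callable<T> task, long timeoutMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<T> future = pool.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // ask the stuck worker to stop via interruption
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        String[] smiles = {"Clc1cccc(Cl)c1", "Clc1cccc(Cl)c1Cl"};
        for (String s : smiles) {
            long start = System.nanoTime();
            // Placeholder task: a real test would invoke the calculation here.
            String result = runWithTimeout(() -> "ok:" + s, 5000);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println(s + "\t" + (result == null ? "TIMEOUT" : result) + "\t" + ms + " ms");
        }
    }
}
```

A task that ignores interruption keeps running in the background, so for real hangs or an OutOfMemoryError it is probably more robust to start each calculation in its own process and kill that process on timeout.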





(just for reference the test file was punish-jchem.zip)


This is Marvin 5.1.0 and JAVA 1.6





Tobias














(I used the JAVA server here with 600 MByte; usually 200 are enough.)





This is not a memory limit issue, it is a leak: you can increase to the maximum memory size (here 2 GB) and it will still crash. Weird for such a small molecule. I also don't understand why it is so fast in the GUI but extremely slow in cxcalc. Is that a threading issue? It almost seems that the code sits around waiting for something and then, after some milliseconds, decides it is finished.








------------------------------------------------------------------------------








Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
        at chemaxon.calculations.MoleculeProjector.calculateCrossSections(MoleculeProjector.java:481)
        at chemaxon.calculations.MoleculeProjector.calculateArea(MoleculeProjector.java:223)
        at chemaxon.calculations.MoleculeProjector.run(MoleculeProjector.java:770)
        at chemaxon.calculations.Geometry.calculateMoleculeProjection(Geometry.java:287)
        at chemaxon.marvin.calculations.GeometryPlugin.run(GeometryPlugin.java:629)
        at chemaxon.marvin.plugin.concurrent.PluginWorkUnit.call(PluginWorkUnit.java:84)
        at chemaxon.marvin.plugin.concurrent.ReusablePluginWorkUnit.call(ReusablePluginWorkUnit.java:62)
        at chemaxon.util.concurrent.marvin.CompositeWorkUnit.call(CompositeWorkUnit.java:70)
        at chemaxon.util.concurrent.processors.SingleThreadedProcessor.getNext(SingleThreadedProcessor.java:67)
        at chemaxon.marvin.Calculator.run(Calculator.java:1086)
        at chemaxon.marvin.Calculator.run(Calculator.java:1045)
        at chemaxon.marvin.Calculator.main(Calculator.java:1502)





---------------------------------------------------------------------------





Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
        at chemaxon.calculations.MoleculeProjector.calculateCrossSections(MoleculeProjector.java:495)
        at chemaxon.calculations.MoleculeProjector.calculateArea(MoleculeProjector.java:223)
        at chemaxon.calculations.MoleculeProjector.run(MoleculeProjector.java:703)
        at chemaxon.calculations.Geometry.calculateMoleculeProjection(Geometry.java:287)
        at chemaxon.marvin.calculations.GeometryPlugin.run(GeometryPlugin.java:629)
        at chemaxon.marvin.plugin.concurrent.PluginWorkUnit.call(PluginWorkUnit.java:84)
        at chemaxon.marvin.plugin.concurrent.ReusablePluginWorkUnit.call(ReusablePluginWorkUnit.java:62)
        at chemaxon.util.concurrent.marvin.CompositeWorkUnit.call(CompositeWorkUnit.java:70)
        at chemaxon.util.concurrent.processors.SingleThreadedProcessor.getNext(SingleThreadedProcessor.java:67)
        at chemaxon.marvin.Calculator.run(Calculator.java:1086)
        at chemaxon.marvin.Calculator.run(Calculator.java:1045)
        at chemaxon.marvin.Calculator.main(Calculator.java:1502)

ChemAxon 8b644e6bf4

24-02-2009 16:20:38

Dear Tobias,














Thank you for the report. We will check it and notify you in this topic.

Sorry for the late answer.














Regards,





Gabor

ChemAxon efa1591b5a

30-07-2009 11:55:08

Hi Tobias, 


Could you please check it with the most recent release, v5.2.3.1? It should resolve the problem you encountered with the previous version:



id      Maximal projection area Maximal projection radius       Minimal projection area Minimal projection radius  Van der Waals surface area (3D) ASA     ASA+    ASA-    ASA_H   ASA_P


1       42.51   4.60    16.28   3.63    168.54  283.51  114.53  168.99  283.51  0.00



Kind regards,

Miklos


User 677b9c22ff

30-07-2009 16:57:37

Hi Miklos,


I tried with version 5.2.3_2


Clc1cccc(Cl)c1


Clc1cccc(Cl)c1Cl




COC(=O)[C@H]1C2[C@]3(C)C=C[C@@H](O[Pb](C)(C)C)[C@@]2(OC3=O)C2CC[C@@H]3C[C@]12CC3=C


but it takes seconds up to minutes to finish, which is still problematic.

Aehm, actually via mview webstart the first molecule never finishes.

I selected Geometry - selected all ASA options - calc energy (always) - very strict.


Tobias



ChemAxon 8b644e6bf4

03-08-2009 17:33:38

Dear Tobias,


 


Thanks for the reply. It seems that the "calculate lowest energy conformer" option is buggy; we are working on it. As a workaround you can generate 3D structures and use them in the geometry plugin. (We will notify you about the fix.)


Your last structure seems to contain an invalid stereo specification which cannot be satisfied (atoms 17 and 7 cannot have this parity combination specified). This conflict can be resolved by removing the parity specification from one of them:


COC(=O)[C@H]1C2[C@]3(C)C=C[C@@H](O[Pb](C)(C)C)C2(OC3=O)C2CC[C@@H]3C[C@]12CC3=C


(I would like to note that we are planning the implementation of a filter for such situations in the future.)


This will make the 3D coordinate generation possible. (I would like to note that a bug around stereo criteria processing still exists, which produced the error in http://www.chemaxon.com/forum/viewpost19178.html#19178 . We are working on it; the fix will be released soon.)


Using the command line tool works:


$ time ./cxcalc  maximalprojectionarea  maximalprojectionradius  minimalprojectionarea  minimalprojectionradius  vdwsa  wateraccessiblesurfacearea "Clc1cccc(Cl)c1"
Starting areas:
 42.51265439480811 16.279825531070177 42.51265439480811
Starting areas:
 42.51265439480811 16.279825531070177 42.51265439480811
Starting areas:
 42.51265439480811 16.279825531070177 42.51265439480811
Starting areas:
 42.51265439480811 16.279825531070177 42.51265439480811
id      Maximal projection area Maximal projection radius       Minimal projection area Minimal projection radius       Van der Waals surface area (3D)  ASA     ASA+    ASA-    ASA_H   ASA_P
1       42.51   4.60    16.28   3.63    168.54  283.51  114.53  168.99  283.51  0.00

real    0m1.943s
user    0m0.258s
sys     0m0.198s


$ time ./cxcalc maximalprojectionarea maximalprojectionradius minimalprojectionarea minimalprojectionradius vdwsa wateraccessiblesurfacearea "Clc1cccc(Cl)c1"
Starting areas:
42.51265439480811 16.279825531070177 42.51265439480811
Starting areas:
42.51265439480811 16.279825531070177 42.51265439480811
Starting areas:
42.51265439480811 16.279825531070177 42.51265439480811
Starting areas:
42.51265439480811 16.279825531070177 42.51265439480811
id Maximal projection area Maximal projection radius Minimal projection area Minimal projection radius Van der Waals surface area (3D) ASA ASA+ ASA- ASA_H ASA_P
1 42.51 4.60 16.28 3.63 168.54 283.51 114.53 168.99 283.51 0.00

real 0m1.898s
user 0m0.258s
sys 0m0.214s


Using the "fixed" last structure:

$ time ./cxcalc  maximalprojectionarea  maximalprojectionradius  minimalprojectionarea  minimalprojectionradius  vdwsa  wateraccessiblesurfacearea "COC(=O)[C@H]1C2[C@]3(C)C=C[C@@H](O[Pb](C)(C)C)C2(OC3=O)C2CC[C@@H]3C[C@]12CC3=C"
Starting areas:
 85.72993559023487 56.154617639072235 83.16021823176978
Starting areas:
 85.72993559023487 56.154617639072235 83.16021823176978
Starting areas:
 85.72993559023487 56.154617639072235 83.16021823176978
Starting areas:
 85.72993559023487 56.154617639072235 83.16021823176978
id      Maximal projection area Maximal projection radius       Minimal projection area Minimal projection radius       Van der Waals surface area (3D)  ASA     ASA+    ASA-    ASA_H   ASA_P
1       88.22   7.29    51.88   5.63    634.01  629.99  486.81  143.18  582.87  47.12

real    0m20.575s
user    0m0.289s
sys     0m0.152s



Regards,


Gabor

ChemAxon efa1591b5a

19-03-2010 13:10:25

Hi Tobias,


A brand new method for the prediction of the minimal and maximal projection area will be released soon in version 5.3.2. It is already available for online tryout here.


According to recent tests (by users and by ourselves) the new calculation is more accurate and much faster than previous versions. Besides the projected area calculations, the new plugin also predicts the size (length) of the molecule perpendicular to the projection plane.


All feedback and suggestions are welcome.


Regards


Miklos

User 677b9c22ff

19-03-2010 21:02:38

Hi,


thanks, I like Sandboxes a lot. Nice concept.


I get results but don't see the molecule. Have to wait until it's fixed.


Cheers


Tobias

ChemAxon efa1591b5a

19-03-2010 21:09:30

Hi Tobias,


Which structure is not displayed, the 2d or the 3d? (or both...)


Does any other MView applet work in your browser (there are many examples available here)?


Which web browser do you use?


 


Miklos

User 677b9c22ff

20-03-2010 03:47:39

Actually the above error with the applet not loading is forwarded to forum5.


----------------------------------------------------------------------------------------------


I can run it on a WIN7 machine and I must say very nice job! The calculations are fast and the engine will accept all kinds of tricky structures: bridged compounds, metallo-organic compounds etc. Even the good old friend, the borane hair net, and vanadium and Hg structures.


http://discoverygroup.chemaxon.com/MGSandbox/g3ddemoplain.jsp?molsource=Cl[Hg][Mo]1234%28C%23[O]%29%28C%23[O]%29%28C%23[O]%29C5C1C2C3C45&task=ok


Are you running it on a 48-core Magny Cours Opteron (4x Model 6174, 12-core) or a dual-socket Xeon 5680 hex-core machine? :-) Or is it an older system? Or an improved algorithm?


 


Cheers and thanks


Tobias

ChemAxon efa1591b5a

23-03-2010 14:19:41

I wish we could run these tools on the hardware you mentioned in your post... But the truth is far from that: the machine is actually our webserver, and the hardware is an Intel(R) Core(TM)2 CPU  6600  @ 2.40GHz.


Generate3D was significantly improved in terms of accuracy and robustness in recent releases. We are not yet satisfied with its performance though, and we keep working on that. It is often compared against Corina, but Generate3D is not a mere 2D-to-3D structure generator: it optimises structures over a forcefield, so the geometry can be trusted.


Molecular volume and the minimal projected surface area are brand new calculations. Do you find them useful?


 


Regards,


Miklos

User 677b9c22ff

27-03-2010 05:14:38

Hi,


yes, I find them useful. I started using the surface areas, 3D structures and electronic properties for QSPR models, but got stuck with some of the compound errors as discussed above.


 


I think speed is currently not an issue. The prices for 12 or 24 core systems will drop soon, due to cloud computing and virtual machine demand, therefore multiple compound calculations can be parallelized.
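Since each compound is independent, spreading a batch over the available cores could look roughly like this. It is a generic Java sketch, not part of Marvin: `ParallelBatch`, `mapParallel` and the string-length "descriptor" are placeholders I made up, and the real per-molecule calculation would replace the placeholder function.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ParallelBatch {

    /** Applies calc to every input on a fixed-size thread pool, keeping the input order. */
    static <I, O> List<O> mapParallel(List<I> inputs, Function<I, O> calc, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<O>> futures = new ArrayList<>();
            for (I input : inputs) {
                Callable<O> job = () -> calc.apply(input);
                futures.add(pool.submit(job));
            }
            List<O> results = new ArrayList<>();
            for (Future<O> f : futures) {
                results.add(f.get()); // blocks until this compound is done
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException("batch calculation failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> smiles = Arrays.asList("Clc1cccc(Cl)c1", "Clc1cccc(Cl)c1Cl");
        int cores = Runtime.getRuntime().availableProcessors();
        // Placeholder "descriptor": string length stands in for a real calculation.
        List<Integer> out = mapParallel(smiles, String::length, cores);
        System.out.println(out);
    }
}
```

The calculation passed in has to be safe to call from several threads at once; whether that holds for a given plugin is something I have not checked.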


 


I also would not be obsessed with Corina, but would rather publish a validation study with some substances from PDB, CrystalEye (open experimental 3D crystal structures) or the Gold Set. In this way users would be convinced that the algorithm works sufficiently well.


 


Cheers


Tobias

User 677b9c22ff

31-03-2010 18:42:50

Hi Miklos and Gabor,


regarding the 3D coordinate test set:

There is a publication in BMC Bioinformatics which also provides supplementary 3D structures. How do the statistics look with the new Marvin 3D algorithm? The publication is from last year, so there should be improvements.

Cyndi: a multi-objective evolution algorithm based method for bioactive molecular conformational generation


http://www.biomedcentral.com/1471-2105/10/101


The validation set can be found here in the first file, statistics in the other ones.


http://www.biomedcentral.com/1471-2105/10/101/additional/


 


Can you guide me to a short JAVA snippet to calculate the RMS of a molecule alignment? Or can I use old code from the forum (see the quatfit thread some time ago)?


The idea was to:

1) use MolImporter to read the 3D structures,
2) save the original molecule in 3D,
3) perform a 2D conversion and subsequent MARVIN 3D generation (fast or slow),
4) overlap or align the original 3D structure with the MARVIN 3D structure,
5) calculate the RMSE of the alignment and print it with the time used for the calculation,
6) go to the next molecule.
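For steps 4 and 5, a minimal sketch of the RMSD after optimal superposition, in plain Java with no ChemAxon classes. It assumes both coordinate arrays list the same atoms in the same order, and it uses Horn's quaternion formulation with a small Jacobi eigenvalue solver, which is a textbook technique swapped in here and neither the quatfit code nor a Marvin API. `RmsdFit` and its method names are made up for illustration; reading the structures and generating the 3D geometry are left out.

```java
class RmsdFit {

    /** Minimal RMSD between two equally ordered N x 3 coordinate sets after optimal superposition. */
    static double rmsd(double[][] a, double[][] b) {
        int n = a.length;
        if (n == 0 || n != b.length) throw new IllegalArgumentException("need equal, non-empty atom lists");
        double[] ca = centroid(a), cb = centroid(b);
        double[][] s = new double[3][3]; // correlation matrix of the centered coordinates
        double g = 0.0;                  // sum of squared norms of both centered sets
        for (int i = 0; i < n; i++) {
            double[] x = new double[3], y = new double[3];
            for (int k = 0; k < 3; k++) {
                x[k] = a[i][k] - ca[k];
                y[k] = b[i][k] - cb[k];
                g += x[k] * x[k] + y[k] * y[k];
            }
            for (int r = 0; r < 3; r++)
                for (int c = 0; c < 3; c++) s[r][c] += x[r] * y[c];
        }
        // Horn's symmetric 4x4 matrix; its largest eigenvalue measures the best achievable overlap.
        double[][] m = {
            {s[0][0] + s[1][1] + s[2][2], s[1][2] - s[2][1], s[2][0] - s[0][2], s[0][1] - s[1][0]},
            {s[1][2] - s[2][1], s[0][0] - s[1][1] - s[2][2], s[0][1] + s[1][0], s[2][0] + s[0][2]},
            {s[2][0] - s[0][2], s[0][1] + s[1][0], -s[0][0] + s[1][1] - s[2][2], s[1][2] + s[2][1]},
            {s[0][1] - s[1][0], s[2][0] + s[0][2], s[1][2] + s[2][1], -s[0][0] - s[1][1] + s[2][2]}
        };
        double lambda = largestEigenvalue(m);
        return Math.sqrt(Math.max(0.0, (g - 2.0 * lambda) / n));
    }

    static double[] centroid(double[][] p) {
        double[] c = new double[3];
        for (double[] q : p) for (int k = 0; k < 3; k++) c[k] += q[k] / p.length;
        return c;
    }

    /** Largest eigenvalue of a symmetric matrix by cyclic Jacobi rotations (m is modified in place). */
    static double largestEigenvalue(double[][] m) {
        int n = m.length;
        for (int sweep = 0; sweep < 50; sweep++) {
            double off = 0.0;
            for (int p = 0; p < n; p++)
                for (int q = p + 1; q < n; q++) off += m[p][q] * m[p][q];
            if (off < 1e-22) break; // off-diagonal part is negligible
            for (int p = 0; p < n; p++) {
                for (int q = p + 1; q < n; q++) {
                    if (m[p][q] == 0.0) continue;
                    double theta = (m[q][q] - m[p][p]) / (2.0 * m[p][q]);
                    double t = (theta >= 0 ? 1.0 : -1.0) / (Math.abs(theta) + Math.sqrt(theta * theta + 1.0));
                    double c = 1.0 / Math.sqrt(t * t + 1.0), sn = t * c;
                    for (int k = 0; k < n; k++) { // rotate columns p and q
                        double kp = m[k][p], kq = m[k][q];
                        m[k][p] = c * kp - sn * kq;
                        m[k][q] = sn * kp + c * kq;
                    }
                    for (int k = 0; k < n; k++) { // rotate rows p and q
                        double pk = m[p][k], qk = m[q][k];
                        m[p][k] = c * pk - sn * qk;
                        m[q][k] = sn * pk + c * qk;
                    }
                }
            }
        }
        double max = m[0][0];
        for (int i = 1; i < n; i++) max = Math.max(max, m[i][i]);
        return max;
    }

    public static void main(String[] args) {
        double[][] ref = {{0, 0, 0}, {1, 0, 0}, {0, 2, 0}, {0, 0, 3}};
        // Same shape rotated 90 degrees about z and shifted, so the RMSD should be close to 0.
        double[][] gen = {{5, -1, 2}, {5, 0, 2}, {3, -1, 2}, {5, -1, 5}};
        System.out.println("RMSD = " + rmsd(ref, gen));
    }
}
```

Only proper rotations are considered, so the mirror image of a chiral molecule will generally not superimpose to zero. Timing for step 5 can be taken around the 3D generation call with `System.nanoTime()`.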


 


I think Adrian Kalaszi had an API example, but at that time there was no RMS output.


Cheers


Tobias


 

ChemAxon 1b9e90b2e7

13-04-2010 10:06:26

Hi Tobias,


we are testing the forcefield(s) used by the Generate3D in a quite similar manner.


I will provide a jar file that works as you expect. It should be ready quite soon.


Cheers,


Adrian

ChemAxon 1b9e90b2e7

14-04-2010 11:28:51













The test environment is ready; I will contact you by email.


Adrian