Large SMILES input & Standardised Molsearch ->Java ex

User d6c1b7eb8c

10-09-2009 12:47:39

 


Hi, I am aware that inputting huge SMILES is not necessarily what JChem is designed to be able to cope with. However, I have come across a situation where JChem not only fails to complete a substructure search, but also throws a fatal Java OutOfMemoryError. I am looking for a way to keep my script running smoothly no matter what junk people put into it, so I am not too concerned whether it actually performs the search on ridiculous molecules.

There is no problem when running a MolSearch or counting the number of heavy atoms of these SMILES; the problem only arises when performing a StandardizedMolSearch. I have also noticed that I cannot reproduce the problem by placing one of the huge SMILES in a file on its own and querying it. The only way I can reproduce the error is by feeding in a large SMILES file containing hundreds of thousands of molecules, which includes a section of these huge ones where it starts to fail.


Do you know why this may be occurring, or any way to make it continue more gracefully?

Thanks,
Ben


 




 


The following code lets the script move on gracefully to the next SMARTS query/SMILES target for a period of time:


 # assumed imports at the top of the script (not shown in the original post)
 import sys
 from java.lang import Error
 from chemaxon.sss.search import SearchException

 # inside the loop over SMILES targets
 try:
     isMatching = subSearch.isMatching()
 except SearchException, e:
     sys.stderr.write(e.getMessage())
     sys.stderr.write("Search Exception on SMILES %s and SMARTS %s \n" % (smiles, Name))
 except Error, e:  # java.lang.Error, which includes OutOfMemoryError
     if errorCounter == 2:
         skipSMARTSdueToError = True
         break  # or continue etc. to get it to move to next SMILES/SMARTS


Eventually, however, an unavoidable error occurs because the Java VM runs out of memory while writing the error messages:


Java heap spaceTraceback (most recent call last):
  File "./JMasterGrep_err3.jy", line 212, in <module>
    sys.stderr.write("Java failed on SMILES %s and SMARTS %s \n" % (smiles, Name))
java.lang.OutOfMemoryError: Java heap space


java.lang.OutOfMemoryError: java.lang.OutOfMemoryError: Java heap space




The error stack that is produced is as follows:


java.lang.OutOfMemoryError: Java heap space
            at chemaxon.struc.CGraph.clonecopy(CGraph.java:1255)
            at chemaxon.struc.MoleculeGraph.clonecopy(MoleculeGraph.java:1817)
            at chemaxon.struc.Molecule.clonecopyWithoutSgroups(Molecule.java:1115)
            at chemaxon.struc.Molecule.clonecopy(Molecule.java:971)
            at chemaxon.struc.Molecule.cloneMolecule(Molecule.java:1218)
            at chemaxon.struc.Molecule.clone(Molecule.java:1227)
            at chemaxon.sss.search.MolSearch.standardizeTarget(MolSearch.java:969)
            at chemaxon.sss.search.MolSearch.standardize(MolSearch.java:941)
            at chemaxon.sss.search.MolSearch.standardizeTarget(MolSearch.java:1331)
            at chemaxon.sss.search.MolSearch.initSearch(MolSearch.java:1159)
            at chemaxon.sss.search.MolSearch.isMatching(MolSearch.java:601)
            at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
            at java.lang.reflect.Method.invoke(Method.java:597)
            at org.python.core.PyReflectedFunction.__call__(PyReflectedFunction.java:175)
            at org.python.core.PyObject.__call__(PyObject.java:355)
            at org.python.core.PyMethod.__call__(PyMethod.java:215)
            at org.python.core.PyMethod.instancemethod___call__(PyMethod.java:221)
            at org.python.core.PyMethod.__call__(PyMethod.java:206)
            at org.python.core.PyObject.__call__(PyObject.java:381)
            at org.python.core.PyObject.__call__(PyObject.java:385)
            at org.python.pycode._pyx0.f$0(./JMasterGrep_err3.jy:233)
            at org.python.pycode._pyx0.call_function(./JMasterGrep_err3.jy)
            at org.python.core.PyTableCode.call(PyTableCode.java:165)
            at org.python.core.PyCode.call(PyCode.java:18)
            at org.python.core.Py.runCode(Py.java:1197)
            at org.python.util.PythonInterpreter.execfile(PythonInterpreter.java:166)
            at org.python.util.jython.run(jython.java:229)
            at org.python.util.jython.main(jython.java:117)



An example of one of the huge SMILES I am inputting is:


Cc1cn([C@H]2C[C@H](OP(=O)(O)OC[C@H]3O[C@H](C[C@@H]3O)n4cnc5c(=O)[nH]c(N)nc45)[C@@H](COP(=O)(O)O[C@H]6C[C@@H](O[C@@H]6COP(=O)(O)O[C@H]7C[C@@H](O[C@@H]7COP(=O)(O)O[C@H]8C[C@@H](O[C@@H]8COP(=O)(O)O[C@H]9C[C@@H](O[C@@H]9COP(=O)(O)O[C@H]%10C[C@@H](O[C@@H]%10COP(=O)(O)O[C@H]%11C[C@@H](O[C@@H]%11COP(=O)(O)O[C@H]%12C[C@@H](O[C@@H]%12COP(=O)(O)O[C@H]%13C[C@@H](O[C@@H]%13COP(=O)(O)O[C@H]%14C[C@@H](O[C@@H]%14COP(=O)(O)O[C@H]%15C[C@@H](O[C@@H]%15COP(=O)(O)O[C@H]%16C[C@@H](O[C@@H]%16COP(=O)(O)O[C@H]%17C[C@@H](O[C@@H]%17COP(=O)(O)O[C@H]%18C[C@@H](O[C@@H]%18COP(=O)(O)O[C@H]%19C[C@@H](O[C@@H]%19COP(=O)(O)O[C@H]%20C[C@@H](O[C@@H]%20COP(=O)(O)O[C@H]%21C[C@@H](O[C@@H]%21COP(=O)(O)O[C@H]%22C[C@@H](O[C@@H]%22COP(=O)(O)O[C@H]%23C[C@@H](O[C@@H]%23COP(=O)(O)O[C@H]%24C[C@@H](O[C@@H]%24COP(=O)(O)O[C@H]%25C[C@@H](O[C@@H]%25COP(=O)(O)O[C@H]%26C[C@@H](O[C@@H]%26COP(=O)(O)O[C@H]%27C[C@@H](O[C@@H]%27COP(=O)(O)O[C@H]%28C[C@@H](O[C@@H]%28COP(=O)(O)O[C@H]%29C[C@@H](O[C@@H]%29COP(=O)(O)O[C@H]%30C[C@@H](O[C@@H]%30COP(=O)(O)O[C@H]%31C[C@@H](O[C@@H]%31COP(=O)(O)O[C@H]%32C[C@@H](O[C@@H]%32COP(=O)(O)O[C@H]%33C[C@@H](O[C@@H]%33COP(=O)(O)O[C@H]%34C[C@@H](O[C@@H]%34COP(=O)(O)O[C@H]%35C[C@@H](O[C@@H]%35COP(=O)(O)O[C@H]%36C[C@@H](O[C@@H]%36COP(=O)(O)O[C@H]%37C[C@@H](O[C@@H]%37COP(=O)(O)O[C@H]%38C[C@@H](O[C@@H]%38COP(=O)(O)O[C@H]%39C[C@@H](O[C@@H]%39COP(=O)(O)O[C@H]%40C[C@@H](O[C@@H]%40COP(=O)(O)O[C@H]%41C[C@@H](O[C@@H]%41COP(=O)(O)O[C@H]%42C[C@@H](O[C@@H]%42COP(=O)(O)O[C@H]%43C[C@@H](O[C@@H]%43COP(=O)(O)O[C@H]%44C[C@@H](O[C@@H]%44COP(=O)(O)O[C@H]%45C[C@@H](O[C@@H]%45COP(=O)(O)O[C@H]%46C[C@@H](O[C@@H]%46COP(=O)(O)O[C@H]%47C[C@@H](O[C@@H]%47COP(=O)(O)O[C@H]%48C[C@@H](O[C@@H]%48COP(=O)(O)O[C@H]%49C[C@@H](O[C@@H]%49COP(=O)(O)O[C@H]%50C[C@@H](O[C@@H]%50COP(=O)(O)O[C@H]%51C[C@@H](O[C@@H]%51COP(=O)(O)O[C@H]%52C[C@@H](O[C@@H]%52COP(=O)(O)O[C@H]%53C[C@@H](O[C@@H]%53COP(=O)(O)O[C@H]%54C[C@@H](O[C@@H]%54COP(=O)(O)O[C@H]%55C[C@@H](O[C@@H]%55COP(=O)(O)O[C@H]%56C[C@@H](O[C@@H]%56COP(=O)(O)O[C@H]%57C[C@@H](O[C@@H]%57COP(=O)(
O)O[C@H]%58C[C@@H](O[C@@H]%58COP(=O)(O)O[C@H]%59C[C@@H](O[C@@H]%59COP(=O)(O)O[C@H]%60C[C@@H](O[C@@H]%60COP(=O)(O)O[C@H]%61C[C@@H](O[C@@H]%61COP(=O)(O)O[C@H]%62C[C@@H](O[C@@H]%62COP(=O)(O)O[C@H]%63C[C@@H](O[C@@H]%63COP(=O)(O)O[C@H]%64C[C@@H](O[C@@H]%64CO)n%65ccc(N)nc%65=O)n%66cnc%67c(=O)[nH]c(N)nc%66%67)n%68cnc%69c(N)ncnc%68%69)n%70cnc%71c(N)ncnc%70%71)n%72cc(C)c(=O)[nH]c%72=O)n%73cc(C)c(=O)[nH]c%73=O)n%74ccc(N)nc%74=O)n%75cnc%76c(=O)[nH]c(N)nc%75%76)n%77cnc%78c(N)ncnc%77%78)n%79cnc%80c(=O)[nH]c(N)nc%79%80)n%81cc(C)c(=O)[nH]c%81=O)n%82ccc(N)nc%82=O)n%83ccc(N)nc%83=O)n%84cnc%85c(=O)[nH]c(N)nc%84%85)n%86ccc(N)nc%86=O)n%87cc(C)c(=O)[nH]c%87=O)n%88ccc(N)nc%88=O)n%89cnc%90c(N)ncnc%89%90)n%91cnc%92c(=O)[nH]c(N)nc%91%92)n%93cc(C)c(=O)[nH]c%93=O)n%94cc(C)c(=O)[nH]c%94=O)n%95ccc(N)nc%95=O)n%96ccc(N)nc%96=O)n%97ccc(N)nc%97=O)n%98cnc%99c(=O)[nH]c(N)nc%98%99)n3cnc4c(N)ncnc34)n5cnc6c(N)ncnc56)n7ccc(N)nc7=O)n8cc(C)c(=O)[nH]c8=O)n9cnc%10c(=O)[nH]c(N)nc9%10)n%11cnc%12c(=O)[nH]c(N)nc%11%12)n%13cc(C)c(=O)[nH]c%13=O)n%14cnc%15c(N)ncnc%14%15)n%16ccc(N)nc%16=O)n%17cnc%18c(N)ncnc%17%18)n%19cc(C)c(=O)[nH]c%19=O)n%20ccc(N)nc%20=O)n%21cc(C)c(=O)[nH]c%21=O)n%22ccc(N)nc%22=O)n%23ccc(N)nc%23=O)n%24cnc%25c(N)ncnc%24%25)n%26ccc(N)nc%26=O)n%27ccc(N)nc%27=O)n%28cc(C)c(=O)[nH]c%28=O)n%29ccc(N)nc%29=O)n%30ccc(N)nc%30=O)n%31ccc(N)nc%31=O)n%32cnc%33c(N)ncnc%32%33)n%34cnc%35c(=O)[nH]c(N)nc%34%35)n%36cnc%37c(=O)[nH]c(N)nc%36%37)n%38ccc(N)nc%38=O)n%39cc(C)c(=O)[nH]c%39=O)n%40cnc%41c(=O)[nH]c(N)nc%40%41)n%42cnc%43c(N)ncnc%42%43)n%44cnc%45c(=O)[nH]c(N)nc%44%45)n%46cnc%47c(N)ncnc%46%47)n%48cnc%49c(N)ncnc%48%49)n%50ccc(N)nc%50=O)n%51cnc%52c(N)ncnc%51%52)O2)c(=O)[nH]c1=O






Thanks for any help,

Ben


 

ChemAxon 9c0afc9aaf

10-09-2009 21:55:57

 I am aware that inputting huge SMILES is not necessarily what JChem is designed to be able to cope with


There is no problem with the size of your SMILES: there is no limit on the length of input SMILES strings.


The problem is that not enough memory is provided for the application.


Most calculations have some temporary memory requirement.


In this case:


- the memory requirement is larger for huge molecules


- StandardizedMolSearch needs somewhat more memory than MolSearch, because the huge Molecule is also duplicated:


 "The query and target molecules are cloned before standardization, so the original objects are not modified during the use of this class."


(In fact, the stack trace suggests that the cloning step is where your application fails.)


Solution:


More memory (heap) should be provided to the Java Virtual Machine, so it will not be close to running out of resources.


Please also see:


http://www.chemaxon.com/jchem/FAQ.html#outofmemory
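
For a Jython script like the one in the question, the heap limit is set with the standard JVM option -Xmx. The Jython launcher usually forwards JVM options given with a -J prefix (for example jython -J-Xmx1024m script.jy, where the 1024m value is only an illustration). A minimal sketch for checking which limit is actually in effect, assuming a hypothetical helper name max_heap_mb and using the standard java.lang.Runtime.maxMemory() call:

```python
def max_heap_mb():
    """Return the JVM's maximum heap size in MB, or None when not running on a JVM.

    max_heap_mb is a hypothetical helper name, not part of JChem or Jython.
    """
    try:
        # java.lang is only importable when the script runs under Jython.
        from java.lang import Runtime
    except ImportError:
        return None
    # Runtime.maxMemory() reports the heap ceiling (the -Xmx value) in bytes.
    return Runtime.getRuntime().maxMemory() // (1024 * 1024)


limit = max_heap_mb()
if limit is None:
    print("Not running on a JVM; no heap limit to report.")
else:
    print("JVM max heap: %d MB" % limit)
```

Printing this value once at the start of the script makes it easy to confirm that a changed -Xmx setting has really reached the JVM.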


 


There may be other, application-specific solutions:


- limiting searches to a certain molecule size (Molecule.getAtomCount())


- limiting the number of concurrent users


- using MolSearch instead of StandardizedMolSearch


- using less memory in other parts of the program


- etc.
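
The first suggestion in the list above, limiting searches by molecule size, could be sketched as follows. This is an illustration only: within_size_limit and the 500-atom threshold are hypothetical, and atom_count is whatever callable returns the size of a molecule (in a JChem script, something like lambda mol: mol.getAtomCount()). The demo uses plain strings and len() as stand-ins so that it runs without JChem.

```python
def within_size_limit(molecules, atom_count, max_atoms=500):
    """Yield only the molecules whose atom count does not exceed max_atoms.

    molecules  -- any iterable of molecule objects (read lazily, one at a time)
    atom_count -- callable returning the number of atoms in a molecule
    max_atoms  -- hypothetical threshold; tune it to the available heap
    """
    for mol in molecules:
        if atom_count(mol) <= max_atoms:
            yield mol
        # Oversized molecules are silently skipped; log them here if needed.


# Stand-in demo: strings play the role of molecules, len() the role of
# Molecule.getAtomCount().
targets = ["CCO", "c1ccccc1", "C" * 1000]
kept = list(within_size_limit(targets, len, max_atoms=500))
print(kept)
```

Because the filter is a generator, it can wrap a reader over hundreds of thousands of molecules without holding them all in memory, and oversized targets should never reach the cloning step inside isMatching().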


 


Best regards,


Szilard