Error during JChemSearch on a very large structure table

User fca35de8d7

18-06-2013 10:56:01

Hello,


We are running Pipeline Pilot (PP) 8.5 and PP 9.0.2 Server with ChemAxon 6.0.

During a JChemSearch on a very large structure table, we get the following error:

GC overhead limit exceeded
JChem version : 6.0.0
Component collection version : 2.6.2_j60
JVM: Oracle Corporation  Java HotSpot(TM) 64-Bit Server VM  1.7.0_15
Memory:  7878.0 MB maximum  7878.0 MB total  2499.0 MB free
OS: amd64 Linux 2.6.35.12-90.fc14.x86_64
Exception ID: E74799659
Current input structure:
c12c(cccc1O)cccc2
Please find detailed report in log file: "/data/scitegicadmin/.chemaxon/logs/pp_error_2013_06_17.txt"
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at chemaxon.jchem.db.index.MolIndexNIOStore.get(MolIndexNIOStore.java:236)
    at chemaxon.jchem.db.cache.JChemCache.getIndexEntry(JChemCache.java:1123)
    at chemaxon.jchem.db.cache.JChemCache$1.next(JChemCache.java:1232)
    at chemaxon.jchem.db.cache.JChemCache$1.next(JChemCache.java:1220)
    at chemaxon.jchem.base.persist.impl.sql.search.SqlTargetCollection.getContexts(SqlTargetCollection.java:1157)
    at chemaxon.jchem.base.persist.impl.FilteredTargetCollection.getContexts(FilteredTargetCollection.java:296)
    at chemaxon.jchem.base.search.MolSetSearchImpl.getCandidates(MolSetSearchImpl.java:3097)
    at chemaxon.jchem.base.search.MolSetSearchImpl.searchCore(MolSetSearchImpl.java:1806)
    at chemaxon.jchem.base.search.MolSetSearchImpl.enumeratedSearch(MolSetSearchImpl.java:3768)
    at chemaxon.jchem.base.search.MolSetSearchImpl.search1(MolSetSearchImpl.java:1701)
    at chemaxon.jchem.base.search.MolSetSearchImpl.search(MolSetSearchImpl.java:1595)
    at chemaxon.jchem.base.search.MolSetSearchImpl.run(MolSetSearchImpl.java:1534)
    at chemaxon.jchem.db.JChemSearch.search(JChemSearch.java:1453)
    at chemaxon.jchem.db.JChemSearch.run(JChemSearch.java:1411)
    at chemaxon.pp.JChemSearch.search(JChemSearch.java:336)
    at chemaxon.pp.JChemSearch.onProcessBody(JChemSearch.java:303)
    at chemaxon.pp.ChemAxonComponent.onProcess(ChemAxonComponent.java:53)
    at com.scitegic.pilot.Pilot.callOnProcess(Pilot.java:331)
CComponentJavaPlugin::onProcess: Pipeline Pilot exception rethrown
CProtocolStd::onProcess: Pipeline Pilot exception rethrown

In the log file I found:


Component collection version : 2.6.2_j60
JVM: Oracle Corporation  Java HotSpot(TM) 64-Bit Server VM  1.7.0_15
Memory:  7563.0 MB maximum  7563.0 MB total  2125.0 MB free
OS: amd64 Linux 2.6.35.12-90.fc14.x86_64
Exception ID: E49841013

Current input structure:
c12c(cccc1O)cccc2


Stack trace:
java.lang.OutOfMemoryError: GC overhead limit exceeded
        at chemaxon.jchem.base.search.SContext.semiClone(SContext.java:41)
        at chemaxon.jchem.base.persist.impl.sql.search.SqlTargetCollection.getContexts(SqlTargetCollection.java:1158)
        at chemaxon.jchem.base.persist.impl.FilteredTargetCollection.getContexts(FilteredTargetCollection.java:296)
        at chemaxon.jchem.base.search.MolSetSearchImpl.getCandidates(MolSetSearchImpl.java:3097)
        at chemaxon.jchem.base.search.MolSetSearchImpl.searchCore(MolSetSearchImpl.java:1806)
        at chemaxon.jchem.base.search.MolSetSearchImpl.enumeratedSearch(MolSetSearchImpl.java:3768)
        at chemaxon.jchem.base.search.MolSetSearchImpl.search1(MolSetSearchImpl.java:1701)
        at chemaxon.jchem.base.search.MolSetSearchImpl.search(MolSetSearchImpl.java:1595)
        at chemaxon.jchem.base.search.MolSetSearchImpl.run(MolSetSearchImpl.java:1534)
        at chemaxon.jchem.db.JChemSearch.search(JChemSearch.java:1453)
        at chemaxon.jchem.db.JChemSearch.run(JChemSearch.java:1411)
        at chemaxon.pp.JChemSearch.search(JChemSearch.java:336)
        at chemaxon.pp.JChemSearch.onProcessBody(JChemSearch.java:303)
        at chemaxon.pp.ChemAxonComponent.onProcess(ChemAxonComponent.java:53)
        at com.scitegic.pilot.Pilot.callOnProcess(Pilot.java:331)

CTAB of current structure:

  SciTegic06181321532D

 11 12  0  0  0  0            999 V2000
   -0.3830    0.2063    0.0000 C   0  0
   -1.0975    0.6188    0.0000 C   0  0
   -1.8120    0.2063    0.0000 C   0  0
   -1.8120   -0.6187    0.0000 C   0  0
   -1.0975   -1.0312    0.0000 C   0  0
   -0.3830   -0.6187    0.0000 C   0  0
    0.3314   -1.0312    0.0000 C   0  0
    1.0459   -0.6187    0.0000 C   0  0
    1.0459    0.2063    0.0000 C   0  0
    0.3314    0.6188    0.0000 C   0  0
    0.3314    1.4438    0.0000 O   0  0
  1  2  1  0
  1  6  2  0
  1 10  1  0
  2  3  2  0
  3  4  1  0
  4  5  2  0
  5  6  1  0
  6  7  1  0
  7  8  2  0
  8  9  1  0
  9 10  2  0
 10 11  1  0
M  END
$$$$

---------------------------------

************************************************


 


Thanks for your help.


 


Bernd

ChemAxon 9c0afc9aaf

19-06-2013 10:53:53

Hi,


 


From the error log it seems you have successfully increased the JVM heap size, as described in the README.txt of the package.


There is only one Java memory size to set: the maximum heap size.
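The "Memory:" line in the error report shows the heap the JVM actually received. To check this independently of Pipeline Pilot, a tiny class can print the same three figures from `java.lang.Runtime`. This is a sketch: it is an assumption that ChemAxon's report is built from these calls, and the truncation to whole MB is chosen here only to mimic the log's format. The maximum heap itself is set with the standard HotSpot `-Xmx` option (for example `-Xmx32g`); where that option goes in a Pipeline Pilot installation is covered by the package's README.txt.

```java
import java.util.Locale;

/** Prints the JVM heap figures in the style of the ChemAxon error report. */
public final class HeapReport {

    private static final long BYTES_PER_MB = 1024L * 1024L;

    /** Converts bytes to whole mebibytes (truncated). */
    static long toMb(long bytes) {
        return bytes / BYTES_PER_MB;
    }

    /** Formats max/total/free heap sizes as one report line. */
    static String format(long maxBytes, long totalBytes, long freeBytes) {
        return String.format(Locale.ROOT,
                "Memory:  %.1f MB maximum  %.1f MB total  %.1f MB free",
                (double) toMb(maxBytes), (double) toMb(totalBytes), (double) toMb(freeBytes));
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): upper bound set by -Xmx (may be slightly below it, depending on the GC)
        // totalMemory(): heap currently reserved; freeMemory(): unused part of totalMemory()
        System.out.println(format(rt.maxMemory(), rt.totalMemory(), rt.freeMemory()));
    }
}
```

Running `java -Xmx32g HeapReport` should report a maximum at or slightly below 32768 MB; if the number printed by the protocol's own error report does not move after a change, the setting did not reach the JVM.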


It seems the current limit is still not enough. How many structures do you have in all the tables you search?


The "GC overhead limit exceeded" error may indicate that you are close to the minimum memory size that would work.


I suggest increasing it so that the cache fits comfortably in memory.


Please see the following links on how to estimate the structure cache size and the total memory size:


http://www.chemaxon.com/jchem/doc/admin/cartridge.html#server_memory_app


http://www.chemaxon.com/jchem/doc/admin/Performance.html#cacheSize


 


Best regards,


 


Szilard


 

User fca35de8d7

20-06-2013 08:57:20

Hi,


It is a standard JChem structure table with 38 million structures.


I will test a bigger cache size and let you know how it works.


 


Regards,


 


Bernd

ChemAxon 9c0afc9aaf

21-06-2013 13:16:12

Hi,


Any luck with a bigger heap size?


Please note that if you are searching multiple tables, all of them must fit into the cache.
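Because every searched table has to be cached, a sizing check should sum the structures over all of them before comparing with the heap. The sketch below does that under loudly stated assumptions: `bytesPerStructure` is a hypothetical input that must come from the cache-size documentation linked above (the 300 used here is arbitrary, not a ChemAxon figure), the table name is hypothetical, and the 25% headroom for "comfortably" is an illustrative choice.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Rough check of whether the cache for all searched tables fits in the heap. */
public final class CacheSizing {

    /** Total structures over every table that will be searched (all must be cached). */
    static long totalStructures(Map<String, Long> structuresPerTable) {
        long total = 0;
        for (long n : structuresPerTable.values()) {
            total += n;
        }
        return total;
    }

    /** Estimated cache size in bytes for an assumed per-structure cost. */
    static long estimatedCacheBytes(long structures, long bytesPerStructure) {
        return Math.multiplyExact(structures, bytesPerStructure);
    }

    /** True if the cache leaves at least headroomFraction of the heap for everything else. */
    static boolean fitsComfortably(long cacheBytes, long maxHeapBytes, double headroomFraction) {
        return cacheBytes <= (long) (maxHeapBytes * (1.0 - headroomFraction));
    }

    public static void main(String[] args) {
        Map<String, Long> tables = new LinkedHashMap<>();
        tables.put("STRUCTURES", 38_000_000L);   // hypothetical table name; count from this thread
        long bytesPerStructure = 300;            // ASSUMPTION: replace with a documented estimate
        long cache = estimatedCacheBytes(totalStructures(tables), bytesPerStructure);
        long heap = Runtime.getRuntime().maxMemory();
        System.out.printf("estimated cache: %d MB, max heap: %d MB, fits: %b%n",
                cache >> 20, heap >> 20, fitsComfortably(cache, heap, 0.25));
    }
}
```

Adding a second table to the map immediately raises the estimate, which is the point of the note above: the heap must cover the sum, not the largest single table.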


Best regards,


Szilard


 

User fca35de8d7

21-06-2013 13:36:39

Hi Szilard,


32 GB was sufficient for the search.
The protocol ran for 2 hours 35 minutes and 33 seconds in Pipeline Pilot 9.0.2!
In Pipeline Pilot I have a run time of 2 hours 45 minutes, and it is still running.
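The figures in this thread allow a back-of-the-envelope bound on the heap needed per structure for this table: the failing run had about 7878 MB for 38 million structures, and the working run had 32 GB (assumed here to mean 32 GiB, as with `-Xmx32g`). The sketch only divides heap by structure count; it covers everything on the heap, not the cache alone, so it brackets the requirement rather than measuring it.

```java
/** Heap budget per structure, using the numbers reported in this thread. */
public final class HeapPerStructure {

    static final long MIB = 1024L * 1024L;
    static final long GIB = 1024L * MIB;

    /** Heap bytes available per structure (truncated to whole bytes). */
    static long bytesPerStructure(long heapBytes, long structures) {
        return heapBytes / structures;
    }

    public static void main(String[] args) {
        long structures = 38_000_000L;   // "38 million structures"
        long failedHeap = 7878L * MIB;   // "7878.0 MB maximum" in the first error report
        long workingHeap = 32L * GIB;    // "32 GB was sufficient" (assumed to be 32 GiB)
        System.out.println("failed: " + bytesPerStructure(failedHeap, structures) + " bytes/structure");
        System.out.println("worked: " + bytesPerStructure(workingHeap, structures) + " bytes/structure");
    }
}
```

This prints 217 bytes/structure for the failing heap and 904 for the working one, so for this table the total need lies somewhere between those two values per structure.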


 


Best regards


 


Bernd

ChemAxon 9c0afc9aaf

24-06-2013 08:23:54

Hi Bernd,


Most of this time is not search time but cache loading time; the search itself is much, much faster.


If you run multiple searches in the same protocol, you may notice that only the first search is slow; subsequent searches can be orders of magnitude faster.
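The effect described above is ordinary lazy caching: an expensive structure is built on first use and reused for as long as its holder lives. The sketch below is an illustration of that pattern, not ChemAxon's implementation; the `Lazy` holder and the in-memory array standing in for loaded fingerprints are hypothetical.

```java
import java.util.function.Supplier;

/** Illustration of why only the first search in a process pays the loading cost. */
public final class LazyCacheDemo {

    /** Computes a value once, on first access, then returns the same instance. */
    static final class Lazy<T> {
        private final Supplier<T> loader;
        private T value;
        private int loads;

        Lazy(Supplier<T> loader) {
            this.loader = loader;
        }

        synchronized T get() {
            if (value == null) {
                value = loader.get();   // the expensive step, e.g. loading a structure table
                loads++;
            }
            return value;
        }

        synchronized int loadCount() {
            return loads;
        }
    }

    public static void main(String[] args) {
        Lazy<long[]> cache = new Lazy<>(() -> {
            long[] fingerprints = new long[1_000_000];   // stand-in for cached search data
            for (int i = 0; i < fingerprints.length; i++) {
                fingerprints[i] = i * 31L;
            }
            return fingerprints;
        });
        for (int search = 1; search <= 3; search++) {
            long start = System.nanoTime();
            int n = cache.get().length;                  // only the first call builds the cache
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("search " + search + ": " + n + " entries, " + micros + " us");
        }
        // Prints "cache loads: 1". Once the holder is gone (e.g. the process ends),
        // the next run has to build the cache again.
        System.out.println("cache loads: " + cache.loadCount());
    }
}
```

Timings vary by machine, but the first search is the one that includes the load, which mirrors why a protocol that runs a single search pays the full cache loading cost every time.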


Unfortunately, the cache is not preserved in Pipeline Pilot between protocol runs.


Because of this, the JChemSearch component is not optimal for very large datasets (the same applies to the "jcsearch" command-line tool).


(To add insult to injury, cache loading has also become slower lately in JChem because of a new feature. It is expected to speed up again with further development, bringing some improvement in such cases, but not by orders of magnitude.)


One solution could be to connect to the JChem Cartridge, which permanently holds the cache centrally and also performs the calculations.


Currently, this can be done by writing SQL for the standard ODBC components in PP.
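As a rough illustration of what such SQL looks like, the sketch below issues a substructure query through JDBC using the cartridge's `jc_compare` operator with the `t:s` (substructure) option; the same SQL text is what would go into an ODBC component. Assumptions: the table `MY_STRUCTURES`, its `STRUCTURE` column and the `cd_id` result column are hypothetical, the column is assumed to be indexed by the JChem Cartridge, and the operator and option syntax should be checked against the cartridge documentation for the JChem version in use. An Oracle JDBC driver is needed only for the actual connection.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Sketch: substructure search delegated to the JChem Cartridge via plain SQL. */
public final class CartridgeSearchSketch {

    /** Builds the query; table and column names are hypothetical placeholders. */
    static String substructureSql(String table, String structureColumn) {
        return "SELECT cd_id FROM " + table
                + " WHERE jc_compare(" + structureColumn + ", ?, 't:s') = 1";
    }

    public static void main(String[] args) throws SQLException {
        // args: JDBC URL, user, password (Oracle JDBC driver on the classpath)
        String sql = substructureSql("MY_STRUCTURES", "STRUCTURE");
        try (Connection con = DriverManager.getConnection(args[0], args[1], args[2]);
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, "c12c(cccc1O)cccc2");   // query structure from this thread
            try (ResultSet rs = ps.executeQuery()) {
                int hits = 0;
                while (rs.next()) {
                    hits++;
                }
                System.out.println("hits: " + hits);
            }
        }
    }
}
```

Because the cartridge keeps its cache in the database server, repeated runs of this query do not reload the structures on the client side, which is the advantage described above.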


We understand that this is not for the average user, so we are planning to provide a cartridge search component in the future.


JChem Web Services may also be a candidate for a search component to connect to, if there is demand.
It has the advantage of not requiring Oracle.


Best regards,


Szilard