API JChemSearch and multiThreading - ChemAxon Forum Archive

User dfeb81947d

21-06-2011 12:46:35

Dear support,

How does JChemSearch deal with multithreading? I tried two methods. In both, I read an SD file and run a structure search for each molecule.

First method: I run each search in its own thread, for example once all structures have been loaded into a Map.
I also tried using the AWT thread (SwingUtilities.invokeAndWait(Runnable)).

for(Entry<Integer, String> entry : map.entrySet()) {
   final String molfile = entry.getValue();
   new Thread(new Runnable() {
         @Override
         public void run() {
              // get new Connection from Pool and initialize ConnectionHandler
              JChemSearch searcher = new JChemSearch();
              searcher.setConnectionHandler(connectionHandler);
              searcher.setStructureTable(csmolTable);
              searcher.setResultTableMode(JChemSearch.NO_RESULT_TABLE);
              searcher.setInfoToStdError(false);
              searcher.setRunMode(JChemSearch.RUN_MODE_SYNCH_COMPLETE);
              searcher.setQueryStructure(molfile);
              searcher.setSearchOptions(searchOptions);
              try {
                 searcher.run();
              } catch (Exception e) {
                  e.printStackTrace();
              }
         }
   }).start();

}

The other solution is to run the searches one after another (the following code is also only an example):

for(Entry<Integer, String> entry : map.entrySet()) {
   final String molfile = entry.getValue();
   // get new Connection from Pool and initialize ConnectionHandler
   JChemSearch searcher = new JChemSearch();
   searcher.setConnectionHandler(connectionHandler);
   searcher.setStructureTable(csmolTable);
   searcher.setResultTableMode(JChemSearch.NO_RESULT_TABLE);
   searcher.setInfoToStdError(false);
   searcher.setRunMode(JChemSearch.RUN_MODE_SYNCH_COMPLETE);
   searcher.setQueryStructure(molfile);
   searcher.setSearchOptions(searchOptions);
   try {
        searcher.run();
   } catch (Exception e) {
       e.printStackTrace();
   }
}

I use a set of 100 molecules as queries against a database containing 800,000 structures.

I tried both substructure search and similarity search with the following procedure: run a blank search to load all structures into the cache, search for the 100 structures with multithreading and record the elapsed time once all searches have finished, then search for the same 100 structures in a single thread.

It appears that the multithreaded search is not faster than the sequential search, and is sometimes even slower.

How does JChemSearch manage structure searches through the API when it is accessed from multiple threads?

I'm using JChem 5.2.6 with JDK 1.6 on Oracle 10g.

Thank you very much for your hints and explanations.

Best Regards,

Jacques

ChemAxon 8407015329

22-06-2011 13:25:24

Hi Jacques,

I looked into the issue you experienced with JChem 5.2.6.

JChemSearch uses multithreaded execution under the hood. It determines the available/optimal number of threads, searches in parallel with that many threads, and finally merges the results.

Of the two cases you implemented and tested, the sequential (second) one uses the resources optimally. The first solution, searching with 100 JChemSearch instances at once, starts at least 200 threads (possibly many more on a multicore computer), so there is a large overhead from switching between threads and from synchronizing access to the structure cache and any other shared resources.

If you would like to be sure that all your resources are used to the maximum, I would suggest running various numbers of parallel JChemSearch instances (2, 3, 4, etc.) and monitoring system usage during the search. Because JChemSearch already uses multiple threads internally, I doubt that you could improve your execution time significantly.
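
A minimal sketch of such an experiment, using a fixed-size thread pool to cap the number of searches running at the same time. The class name ParallelSearchBenchmark and the methods runOneSearch and timeSearches are hypothetical names introduced here for illustration; runOneSearch is only a placeholder so the sketch runs without JChem or a database, and its body would have to be replaced with the JChemSearch setup and run() call from your own code (with one connection per task).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class ParallelSearchBenchmark {

    // Hypothetical stand-in for one search. Assumption: the real body would
    // configure a JChemSearch instance and call run(), as in the question.
    static int runOneSearch(String molfile) {
        return molfile.length(); // placeholder work only
    }

    // Runs every query on a pool limited to 'parallelSearches' worker threads
    // and returns the elapsed wall-clock time in milliseconds.
    static long timeSearches(List<String> queries, int parallelSearches) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parallelSearches);
        try {
            List<Callable<Integer>> tasks = new ArrayList<Callable<Integer>>();
            for (final String query : queries) {
                tasks.add(new Callable<Integer>() {
                    @Override
                    public Integer call() {
                        return runOneSearch(query);
                    }
                });
            }
            long start = System.nanoTime();
            // invokeAll blocks until all submitted tasks have completed.
            for (Future<Integer> result : pool.invokeAll(tasks)) {
                result.get(); // rethrows any exception raised inside a task
            }
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> queries = new ArrayList<String>();
        for (int i = 0; i < 100; i++) {
            queries.add("query-" + i); // placeholder for the 100 molfiles
        }
        // Try 1, 2, 3 and 4 simultaneous searches and compare the timings.
        for (int n = 1; n <= 4; n++) {
            System.out.println(n + " parallel searches: " + timeSearches(queries, n) + " ms");
        }
    }
}
```

With the placeholder body the timings are meaningless; the point is only that the pool size, not the number of queries, bounds how many searches compete for the structure cache at once. Watching CPU usage while stepping through the pool sizes should show whether anything beyond a single JChemSearch at a time helps.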

Best regards,

Vencel