Search performance with many concurrent requests

User 0261d34ad7

02-08-2012 13:47:28

Hi,

I have a few questions about the scalability of structure searching. Any help you can provide would be greatly appreciated.

We expose structure searching via a web service. Requests are received through a RESTful interface and passed to JChemSearch (in asynchronous mode); progress information is returned to the client, and the results can then be retrieved.
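The request flow described above (submit, report progress, fetch results) can be sketched with the JDK's `CompletableFuture`. This is a minimal, hypothetical sketch: `AsyncSearchService` and `runSearch` are illustrative names, the "search" is a placeholder that returns fixed IDs, and a real service would call JChemSearch where the placeholder is.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of the asynchronous request flow: submit, poll, retrieve.
final class AsyncSearchService {
    // One worker thread per in-flight request (a cached pool grows on demand).
    private final ExecutorService requests = Executors.newCachedThreadPool();

    // Placeholder for the real search call; returns fixed, made-up hit IDs.
    private List<Integer> runSearch(String query) {
        return List.of(1, 2, 3);
    }

    // Step 1: accept the request and return a handle immediately.
    CompletableFuture<List<Integer>> submit(String query) {
        return CompletableFuture.supplyAsync(() -> runSearch(query), requests);
    }

    // Step 2: report coarse progress to a polling client.
    static String progress(CompletableFuture<?> handle) {
        return handle.isDone() ? "done" : "running";
    }

    // Step 3: fetch the results, blocking until the search has finished.
    static List<Integer> results(CompletableFuture<List<Integer>> handle) {
        return handle.join();
    }

    void shutdown() {
        requests.shutdown();
    }
}
```

Note that each submitted request occupies its own worker thread here, which is the situation the reply below describes: every concurrent search runs in a separate thread.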

This seems to work fine when a single user is accessing the server, but when we scale up, search performance degrades linearly. That is, if it takes ten seconds to perform a search for one user, it will take sixty seconds to perform six simultaneous searches. We also get Java out-of-memory errors when the number of concurrent requests reaches ~10.

So the question is, what can we do to improve scalability, performance, and reliability in our application?

We've already reviewed the "search performance sticky" and will be testing different "max hits" parameter values. We'll also review the screening data provided on the stderr stream, and may increase the number of cores on each server.

What we don't know is:

- How does JChemSearch handle multiple concurrent requests? Are all requests processed by the same set of threads?

- Are there any search related resources for which access is serialized, e.g. the structure cache? Could something like this be slowing things down?

- How much memory is needed per request, given our database size (12m structures)?

- Are there any configuration settings that influence performance when running multiple concurrent requests?

- And finally, will the "max hits" parameter make any difference to duplicate or exact searches? If we set a value of 1, for example, would that actually have an effect?

Again any help is appreciated!

Kind regards,

Jim

ChemAxon 9c0afc9aaf

02-08-2012 16:42:25

but when we scale up, search performance degrades linearly. That is, if it takes ten seconds to perform a search for one user, it will take sixty seconds to perform six simultaneous searches.

Since even a single search already uses all CPU cores, this is the expected, and in fact ideal, behavior.

For 6x the work, you need 6x the time on the same resources.
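As a worked version of this argument, assume each search is CPU-bound, scales perfectly across the cores, and needs W core-seconds of work. With C cores, one search takes t_1, and n searches sharing the same cores take T(n):

```latex
t_1 = \frac{W}{C}, \qquad
T(n) = \frac{n\,W}{C} = n\,t_1, \qquad
T(6) = 6 \times 10\,\mathrm{s} = 60\,\mathrm{s}
```

Under this idealized model, adding cores shortens both t_1 and T(n) proportionally, but T(n) stays linear in n.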

- How does JChemSearch handle multiple concurrent requests? Are all requests processed by the same set of threads?

With parallel requests, each JChemSearch instance already runs in its own thread, and each one spawns its own processing threads, as many as there are visible CPU cores.
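To put numbers on this, the thread counts can be modelled as below. It is a back-of-the-envelope sketch, assuming every request uses the default of one processing thread per core and that the visible cores are those reported by `Runtime.availableProcessors()`; `ThreadModel` is a hypothetical name.

```java
// Rough, hypothetical model: each concurrent request runs in its own thread
// and spawns one processing thread per visible core.
final class ThreadModel {
    // Total threads doing search work across all concurrent requests.
    static int totalProcessingThreads(int concurrentRequests, int visibleCores) {
        return concurrentRequests * visibleCores;
    }

    // Processing threads per core; values well above 1 mean the cores are
    // time-sliced between the concurrent searches.
    static double oversubscription(int concurrentRequests, int visibleCores) {
        return (double) totalProcessingThreads(concurrentRequests, visibleCores) / visibleCores;
    }

    // Per-request default, taken from the JVM's view of the available cores.
    static int defaultThreadsPerRequest() {
        return Runtime.getRuntime().availableProcessors();
    }
}
```

With 6 concurrent requests on an 8-core server this gives 48 processing threads, i.e. each core is shared by about 6 searches, which matches the 6x time observed.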

- Are there any search related resources for which access is serialized, e.g. the structure cache? Could something like this be slowing things down?

Normally not. The exception is when the cache needs to be loaded or updated after a change to the table; all threads will wait for that to finish.

There does not seem to be any slowdown in your case (6x the computation is done in 6x the time).
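The one serialized step mentioned above, a cache (re)load that all search threads wait for, follows a familiar lazy-loading pattern. The sketch below is a generic illustration of that pattern, not JChem's structure cache: `StructureCache`, its loader, and `invalidate()` are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical cache: while it is (re)loaded, every caller waits for the load.
final class StructureCache<T> {
    private final Supplier<T> loader;                 // stand-in for reading the table
    private final AtomicInteger loads = new AtomicInteger();
    private volatile T data;                          // null until loaded or after invalidation

    StructureCache(Supplier<T> loader) {
        this.loader = loader;
    }

    T get() {
        T current = data;
        if (current != null) {
            return current;                           // fast path: no locking once loaded
        }
        synchronized (this) {                         // concurrent callers block here during a load
            if (data == null) {                       // re-check: another thread may have loaded it
                data = loader.get();
                loads.incrementAndGet();
            }
            return data;
        }
    }

    // Called after the underlying table changes; the next get() reloads.
    synchronized void invalidate() {
        data = null;
    }

    int loadCount() {
        return loads.get();
    }
}
```

The re-check inside the `synchronized` block means threads that arrive during a load wait and then reuse the freshly loaded data instead of loading it again, so only the load itself is serialized.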

- How much memory is needed per request, given our database size (12m structures)?

It is hard to give an exact figure, as it depends on a range of factors, including memory consumption in your part of the code.

Probably the simplest approach is to test the worst-case scenario: parallel searches that return the whole database (e.g. with an "any" atom query). Then see roughly how much the minimum memory size that lets 1 search complete differs from the minimum that lets, say, 6 searches complete.
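A rough in-process complement to that experiment is to run N whole-database searches at once and compare the heap growth for N = 1 and N = 6. In this sketch `MemoryProbe` and `wholeDatabaseSearch` are hypothetical; the "search" just allocates one `int` per row and should be replaced with the real call. The figures are approximate because `System.gc()` is only a hint, so restarting the JVM with different `-Xmx` values, as described above, remains the more reliable check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class MemoryProbe {
    // Hypothetical worst-case "search" that returns every row as a hit.
    static int[] wholeDatabaseSearch(int databaseSize) {
        int[] hits = new int[databaseSize];
        for (int i = 0; i < hits.length; i++) {
            hits[i] = i;
        }
        return hits;
    }

    // Currently used heap, in bytes, as reported by the JVM.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Runs `parallel` searches at once, holds all results, and returns the
    // approximate heap growth in bytes (noisy, since GC may run at any time).
    static long heapGrowth(int parallel, int databaseSize) {
        System.gc(); // only a hint, so the baseline is approximate
        long before = usedHeapBytes();
        ExecutorService pool = Executors.newFixedThreadPool(parallel);
        try {
            List<CompletableFuture<int[]>> running = new ArrayList<>();
            for (int i = 0; i < parallel; i++) {
                running.add(CompletableFuture.supplyAsync(
                        () -> wholeDatabaseSearch(databaseSize), pool));
            }
            List<int[]> results = new ArrayList<>();
            for (CompletableFuture<int[]> f : running) {
                results.add(f.join());
            }
            long growth = usedHeapBytes() - before;
            // Use the results after measuring so they stay reachable until here.
            if (results.size() != parallel) {
                throw new IllegalStateException("missing results");
            }
            return growth;
        } finally {
            pool.shutdown();
        }
    }
}
```

For example, comparing `MemoryProbe.heapGrowth(1, n)` with `MemoryProbe.heapGrowth(6, n)` for the same `n` gives a rough per-request increment in this toy model.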

My colleagues may have additional comments on this.

- Are there any configuration settings that influence performance when running multiple concurrent requests?

If the total number of threads is orders of magnitude higher than the number of cores (extreme parallel usage), the "slowdown" may be worse than linear because of the overhead of managing too many threads.

In this case it might be beneficial to reduce the number of processing threads allowed per request (the default equals the number of detected CPU cores).

http://www.chemaxon.com/jchem/doc/dev/java/api/chemaxon/jchem/db/JChemSearch.html#setNumberOfProcessingThreads(int)

I think this situation is probably very rare.
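If you do hit that case, one simple (hypothetical) policy is to divide the visible cores among the expected number of concurrent requests and pass the result to `JChemSearch.setNumberOfProcessingThreads(int)`, the method linked above. The helper below only computes the number; `ThreadBudget` and the policy itself are assumptions, not a ChemAxon recommendation.

```java
// Hypothetical helper: split the visible cores between the expected number of
// concurrent requests so the total processing-thread count stays near the core count.
final class ThreadBudget {
    static int threadsPerRequest(int visibleCores, int concurrentRequests) {
        if (visibleCores < 1 || concurrentRequests < 1) {
            throw new IllegalArgumentException("cores and requests must be >= 1");
        }
        // Integer division rounds down; never go below one thread per request.
        return Math.max(1, visibleCores / concurrentRequests);
    }
}
```

On an 8-core server expecting 6 concurrent searches this gives 1 thread per request, which would then be applied with `search.setNumberOfProcessingThreads(n)` on each `JChemSearch` instance (`search` and `n` being your instance and the computed value).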

- And finally, will the "max hits" parameter make any difference to duplicate or exact searches? If we set a value of 1, for example, would that actually have an effect?

Yes, it can make a difference (depending on the number of duplicates present, etc.).
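A generic illustration of why a hit limit can help even for exact or duplicate searches: once the limit is reached, the remaining candidates need not be compared. This linear scan is a simplification, not how JChem executes exact searches (which also involve screening); `MaxHitsDemo`, `exactSearch`, and the convention that `maxHits <= 0` means "no limit" are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class MaxHitsDemo {
    // Hits found plus how many rows had to be compared to find them.
    static final class Result {
        final List<Integer> hitRows;
        final int rowsCompared;

        Result(List<Integer> hitRows, int rowsCompared) {
            this.hitRows = hitRows;
            this.rowsCompared = rowsCompared;
        }
    }

    // Exact-match scan over a table of structure keys (e.g. canonical SMILES).
    // maxHits <= 0 means "no limit" in this sketch.
    static Result exactSearch(List<String> table, String query, int maxHits) {
        List<Integer> hits = new ArrayList<>();
        int compared = 0;
        for (int row = 0; row < table.size(); row++) {
            compared++;
            if (table.get(row).equals(query)) {
                hits.add(row);
                if (maxHits > 0 && hits.size() >= maxHits) {
                    break; // limit reached: the remaining rows are never compared
                }
            }
        }
        return new Result(hits, compared);
    }
}
```

With three copies of `CCO` in a five-row toy table, `maxHits = 1` stops after the first row instead of scanning all five; for a structure that occurs once near the end, the saving is small. That is the "depending on the number of duplicates" caveat in miniature.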

Best regards,

Szilard