User 0261d34ad7
02-08-2012 13:47:28
Hi,
I have a few questions about the scalability of structure searching. Any help you can provide would be greatly appreciated.
We expose structure searching via a web service. Requests arrive through a RESTful interface and are handed off to JChemSearch (in async mode); progress information is returned while the search runs, and the results can then be retrieved.
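For clarity, here is roughly how our service wraps the search: submit, poll for completion, then fetch results. This is a minimal pure-JDK sketch; `AsyncSearchFlow`, `submit`, and the `Callable` stand-in for the actual JChemSearch invocation are our own illustrative names, not ChemAxon API.

```java
import java.util.List;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of our REST layer's async job handling. The Callable passed to
// submit() stands in for the real JChemSearch call, which runs on a worker
// thread while the client polls.
public class AsyncSearchFlow {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    private final ConcurrentHashMap<Long, Future<List<Integer>>> jobs = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    // POST /search -> returns a job id the client can poll.
    public long submit(Callable<List<Integer>> search) {
        long id = nextId.incrementAndGet();
        jobs.put(id, pool.submit(search));
        return id;
    }

    // GET /search/{id}/status -> has the search finished?
    public boolean isDone(long id) {
        return jobs.get(id).isDone();
    }

    // GET /search/{id}/results -> blocks until done, then returns hit ids.
    public List<Integer> results(long id) {
        try {
            return jobs.get(id).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }

    public void shutdown() {
        pool.shutdown();
    }
}
```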
This works fine when a single user is accessing the server, but as we scale up, search performance degrades linearly: if a single search takes ten seconds, six simultaneous searches take sixty seconds each. We also hit Java OutOfMemoryError exceptions once the number of concurrent requests reaches ~10.
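One workaround we could apply on our side of the service, sketched below under the assumption that capping concurrency would prevent the OOM (the `SearchGate` class, the permit count, and the reject-with-503 behaviour are all our own illustration, not anything from JChem):

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Sketch: bound how many searches run at once, so a burst of requests
// queues briefly (or is rejected) instead of exhausting the heap.
// The permit count would need tuning against heap size and the
// per-search memory footprint.
public class SearchGate {
    private final Semaphore permits;

    public SearchGate(int maxConcurrent) {
        permits = new Semaphore(maxConcurrent);
    }

    // Returns null if no permit frees up within waitMillis, in which case
    // our REST layer would answer with HTTP 503 rather than start a search.
    public <T> T runGated(Supplier<T> search, long waitMillis) throws InterruptedException {
        if (!permits.tryAcquire(waitMillis, TimeUnit.MILLISECONDS)) {
            return null;
        }
        try {
            return search.get();
        } finally {
            permits.release();
        }
    }
}
```

This only shields the JVM from bursts, though; it doesn't answer whether JChemSearch itself serializes anything internally, which is what we'd really like to understand.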
So the question is, what can we do to improve scalability, performance, and reliability in our application?
We've already reviewed the "search performance sticky" and will be testing different "max hits" parameter values. We'll also be reviewing the screening data provided on the stderr stream, and potentially increasing the number of cores on each server.
What we don't know is:
- How does JChemSearch handle multiple concurrent requests? Are all requests processed by the same set of threads?
- Are there any search-related resources for which access is serialized, e.g. the structure cache? Could something like this be slowing things down?
- How much memory is needed per request, given our database size (12 million structures)?
- Are there any configuration settings that influence performance when running multiple concurrent requests?
- And finally, will the "max hits" parameter make any difference to duplicate or exact searches? If we set a value of 1, for example, would that actually have an effect?
Again, any help is appreciated!
Kind regards,
Jim