I built a test database with 1.2 million structures as a local database. The good news is that building such a large database works. However, the substructure search (SSS) takes about 4 minutes, which is about as fast (or slow) as a plain text string search. It seems the structures are not specifically indexed. Or do I need to make adjustments?
There should be no need to make any adjustments, so the time you see is about right. I'm afraid it cannot be sped up easily. We have some customers using MySQL with several million structures, and the search takes several minutes to run...
We also provide a solution based on the Oracle Cartridge, which is particularly suitable for large databases. The performance difference compared to the local database is enormous. Please refer to the JChem Cartridge product if you are interested.
Sorry, I was not precise about the search times. Only the first search may take minutes; subsequent searches should finish within seconds once the structures are in the cache.
Yes - only the first search is not so fast!
I am working on a 64-bit Windows 8 machine (modified to a Windows 7 look and feel) with 8 GB of RAM. Would it make a difference to install 64-bit Java? Presently the 32-bit version is installed.
We are also testing speed on a 64-bit Linux server with MySQL as the database.
I have no benchmarks available regarding your question right now.
Basically, I would not expect a big difference in the first search. It is limited by the Derby database engine (for the local database) and its I/O operations.
What could help in the case of a really big database is increasing the Java heap memory available to Instant JChem. However, this only affects subsequent searches from the cache (a larger heap lets the cache hold more structures). The Java heap size for 32-bit Java is limited to roughly 1300 MB, whereas with 64-bit Java you can set much more.
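To verify how much heap your JVM actually gets (for example, after switching to 64-bit Java and raising the `-Xmx` setting), a small check like the following can help. The class name `HeapCheck` is just illustrative; `Runtime.maxMemory()` is the standard Java API for the maximum heap the JVM will attempt to use:

```java
// Print the maximum heap available to the current JVM.
// On 32-bit Java this typically tops out around 1.3 GB;
// on 64-bit Java it can be raised much higher via -Xmx.
public class HeapCheck {
    public static void main(String[] args) {
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + (maxBytes / (1024 * 1024)) + " MB");
    }
}
```

Running this inside the same JVM as the application (or with the same `-Xmx` value) shows whether the larger heap has actually taken effect.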
Search times are roughly similar with a remote MySQL database running on a Linux server with plenty of RAM.
On a table of 9.8 million compounds, the initial search takes around 40 minutes in our setup. That is about the time needed to build a new index, so I guess that is what happens (probably client-side). Consecutive searches use the same calculated index (or whatever the term is) and are therefore very fast.
If you then change something essential in the search criteria (not just adding more restrictions, but for example changing the scaffold structure of your query), this whole process takes place all over again.
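The behaviour described in this thread (a slow first search that populates a cache, followed by fast in-memory searches) can be sketched roughly as below. This is not the actual Instant JChem implementation; the class, the `Predicate`-based query, and the use of strings in place of real chemical structures are all illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of the caching pattern: the first search pays the
// cost of loading every record from the database, later searches scan
// the in-memory copy and are therefore much faster.
public class CachedSearch {
    private List<String> cache;            // in-memory copy of the records
    private final List<String> diskTable;  // stands in for the Derby/MySQL table

    public CachedSearch(List<String> diskTable) {
        this.diskTable = diskTable;
    }

    public List<String> search(Predicate<String> query) {
        if (cache == null) {
            // First search: slow, reads everything from the database.
            cache = new ArrayList<>(diskTable);
        }
        // Subsequent searches: fast scan over the cached records.
        List<String> hits = new ArrayList<>();
        for (String record : cache) {
            if (query.test(record)) hits.add(record);
        }
        return hits;
    }
}
```

In this simplified picture a new query reuses the same cache, which matches the observation that only the first search is slow; any query-specific screening index (as speculated above for the scaffold change) would be an additional layer on top of this.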