Since regenerating the 5.4 tables was so slow, I decided to recreate my 5 million compound database (in 5.3 this was an overnight import job from a SMILES file with duplicate and tautomer checking on). I have 1 GB of RAM allocated at startup and a maximum of 2 GB. Previously this gave extremely good performance for both importing and searching.
In 5.4 I find importing to be extremely slow. My first attempt was an import from a local machine into a remote MySQL database. After a weekend only 27% had been imported, and at that point the speed was about 1 compound per second. I cancelled the import and restarted it with what remained (I removed the already imported compounds from the SMILES file). Even from the start it never exceeded 1 compound per second.
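For anyone who needs to resume a partial import the same way, here is a minimal sketch of trimming a SMILES file down to the records not yet imported. It assumes each line has the layout `SMILES ID` separated by whitespace and that the IDs of the imported compounds can be exported from the database; the function name and the layout are illustrative assumptions, not anything IJC provides.

```python
from typing import Iterable, Iterator, Set


def remaining_records(lines: Iterable[str], imported_ids: Set[str]) -> Iterator[str]:
    """Yield SMILES lines whose ID has not been imported yet.

    Assumption: each line is "SMILES ID" separated by whitespace.
    Lines without an ID column are kept, since they cannot be matched.
    """
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        record_id = fields[1] if len(fields) > 1 else None
        if record_id is None or record_id not in imported_ids:
            yield line.rstrip("\n")


if __name__ == "__main__":
    sample = ["CCO id1\n", "c1ccccc1 id2\n", "CC(=O)O id3\n"]
    for record in remaining_records(sample, {"id1", "id3"}):
        print(record)
```

Holding the imported IDs in a set keeps each lookup constant time, which matters for a file with millions of lines.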
To rule out network-related problems, I installed IJC 5.4 directly on the server itself and started again from scratch (again with 1 GB initial and 2 GB maximum RAM allocated). The first few hundred compounds go reasonably fast, but once more than 1000 compounds have been imported, things slow down to about 10 cpds/sec. I checked the load on the server, which was 30%, from Java (IJC, I reckon) only. I used the same server and setup when I created the database in 5.3.
What is going on here? Is it the duplicate and tautomer checking that slows things down so much?
We'll check this out, but it is certainly true that both duplicate checking and tautomer detection will slow import down as extra checks need to be performed prior to inserting the structure. But we're not aware of any reason why this would be slower with 5.4 than with 5.3, and we'll check this.
In JChem 220.127.116.11 (the JChem version used in IJC 5.4) we have made changes in the tautomer generator to improve accuracy. We knew that it impacted the performance, although we did not anticipate this large slowdown.
Now it seems that there are certain individual structures where performance is severely affected, while most are OK.
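One generic way to spot such outlier structures is to time the per-structure step and report only the records that exceed a threshold. The sketch below is not tied to JChem: `process` is a hypothetical placeholder for whatever per-structure call is being measured (for example a tautomer or duplicate check), and the threshold is an arbitrary choice.

```python
import time
from typing import Callable, Iterable, List, Tuple


def find_slow_items(
    items: Iterable[str],
    process: Callable[[str], object],
    threshold_s: float,
    clock: Callable[[], float] = time.perf_counter,
) -> List[Tuple[str, float]]:
    """Return (item, seconds) pairs for items that took at least threshold_s.

    `process` is a hypothetical stand-in for the per-structure work.
    The result is sorted with the slowest item first.
    """
    slow: List[Tuple[str, float]] = []
    for item in items:
        start = clock()
        process(item)
        elapsed = clock() - start
        if elapsed >= threshold_s:
            slow.append((item, elapsed))
    slow.sort(key=lambda pair: pair[1], reverse=True)
    return slow


if __name__ == "__main__":
    # Dummy workload: pretend structures longer than 8 characters are slow.
    def fake_check(smiles: str) -> None:
        time.sleep(0.05 if len(smiles) > 8 else 0.0)

    for smiles, seconds in find_slow_items(["CCO", "c1ccc2ccccc2c1"], fake_check, 0.01):
        print(f"{smiles}: {seconds:.3f} s")
```

A short list of the slowest structures is usually more useful in a bug report than an average rate, since the average hides the few records that dominate the run time.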
My colleagues are checking the situation.
Unfortunately, there is currently no workaround in IJC 5.4. I suggest you keep using version 5.3.X for the moment.
I am sorry for the inconvenience.
OK, thanks for letting me know. Unfortunately I cannot revert to 5.3, but I can turn off tautomer checking instead, as this is not crucial for this particular database.
It must be the tautomer checker, because if I switch it off the import goes at 100 cpds/sec, which is what it used to be.
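To put the rates quoted in this thread into perspective, here is a small back-of-the-envelope calculation for a 5 million compound import. It assumes a constant rate over the whole run, which is a simplification; the function name is just for illustration.

```python
def import_hours(n_compounds: int, rate_per_sec: float) -> float:
    """Estimated wall-clock hours for an import at a constant rate."""
    if rate_per_sec <= 0:
        raise ValueError("rate_per_sec must be positive")
    return n_compounds / rate_per_sec / 3600.0


if __name__ == "__main__":
    # Rates mentioned in the thread: 100, 10 and 1 compounds per second.
    for rate in (100, 10, 1):
        print(f"{rate:>3} cpds/sec: {import_hours(5_000_000, rate):.1f} h")
```

At 100 cpds/sec the estimate is about 14 hours, which fits the overnight import seen in 5.3. At 10 cpds/sec it grows to almost 6 days, and at 1 cpd/sec to nearly 2 months.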
My colleague has fixed the speed, and 5.4.1 will be as fast as the 5.3 version, with better accuracy.
Thanks for your input.