User e05b1833aa
25-01-2011 07:10:52
Hi Tim,
Since regeneration of the 5.4 tables was so slow, I decided to recreate my 5 million compound database (in 5.3 this was an overnight import job from a SMILES file with duplicate and tautomer checking on). I have 1 GB of RAM allocated at startup and a maximum of 2 GB. Previously this gave extremely good performance, both when importing and when searching.
In 5.4 I find importing to be extremely slow. My first attempt was an import from a local machine into a remote MySQL database. After a weekend only 27% had been imported, and at that point the speed was about 1 compound per second. I cancelled the import and restarted it to import what remained (having removed the already imported compounds from the SMILES file). From the start it never exceeded 1 compound per second.
In order to rule out network-related problems, I installed IJC 5.4 directly on the server itself and started from scratch (also with 1 GB initial and 2 GB maximum RAM). The first few hundred compounds go reasonably fast, but once more than 1000 compounds have been imported things slow down to about 10 cpds/sec. I checked the load on the server, which was only 30% for Java (IJC, I reckon). I used the same server and setup when I created the database in 5.3.
What is going on here? Is it the duplicate and tautomer checking that slows things down so much?
ChemAxon fa971619eb
25-01-2011 07:24:15
We'll check this out. It is certainly true that both duplicate checking and tautomer detection will slow the import down, as extra checks need to be performed before each structure is inserted. But we're not aware of any reason why this would be slower in 5.4 than in 5.3.
ChemAxon a3d59b832c
26-01-2011 06:54:51
In JChem (the JChem version used in IJC 5.4) we have made changes in the tautomer generator to improve accuracy. We knew that this would affect performance, although we did not anticipate such a large slowdown.
Now it seems that there are certain individual structures where performance is severely affected, while most are OK.
My colleagues are checking the situation.
Unfortunately, there is currently no workaround in IJC 5.4. I suggest you keep using version 5.3.X for the moment.
I am sorry for the inconvenience.
Best regards,
User e05b1833aa
26-01-2011 08:14:36
OK, thanks for letting me know. Unfortunately I cannot revert to 5.3, but I can turn off tautomer checking instead, since it is not crucial for this particular database.
User e05b1833aa
26-01-2011 11:26:40
It must be the tautomer checker, because if I switch it off the import goes at 100 cpds/sec, which is what it used to be.
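To put the rates quoted in this thread in perspective, here is a rough back-of-the-envelope sketch. It assumes a constant import rate over the whole run, which is a simplification, and the function name `import_hours` is made up for illustration only.

```python
def import_hours(n_compounds: int, rate_per_sec: float) -> float:
    """Estimated import time in hours, assuming a constant rate."""
    return n_compounds / rate_per_sec / 3600


TOTAL = 5_000_000  # database size mentioned in the thread

# Rates reported in the thread: 1, 10 and 100 compounds per second.
for rate in (1, 10, 100):
    hours = import_hours(TOTAL, rate)
    print(f"{rate:>3} cpds/sec: {hours:8.1f} h ({hours / 24:5.1f} days)")
```

At 100 cpds/sec the full set takes roughly 14 hours, which fits the overnight job seen in 5.3, while 1 cpd/sec would mean close to two months.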
ChemAxon a3d59b832c
28-01-2011 09:53:51
Hi Evert,
My colleague has fixed the performance problem, and 5.4.1 will be as fast as the 5.3 version, with better accuracy.
Thanks for your input.
Best regards,