Rebuild index timing

User c5c63b5c6a

01-02-2010 14:38:45

Since the release of 530, the time taken to rebuild the JChem indexes has increased dramatically.

The PRD index takes 25 minutes to build from scratch using 526; the rebuild on the test database took 55 minutes. What has changed so much in 530 to increase the timing? I'm going to create the index from scratch to see if there is a difference there as well.

ChemAxon aa7c50abf8

01-02-2010 20:06:14

Rachel,


Using the first 4.5 million structures from pubchem, I measured the following execution times (same machine, same 10gR2 Oracle instance):

Command          JCC 5.2.6   JCC 5.3.0
CREATE INDEX     25 mins     28 mins
INDEX REBUILD    28 mins     32 mins

There appears to be a minor slow-down from 5.2.6 to 5.3.0, but far from the order of magnitude you reported.
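
To put "a minor slow-down" in numbers, the percentages can be computed directly from the table. The small Python sketch below is plain arithmetic on the reported minutes; the last line contrasts them with the ratio implied by the original report (25 minutes for CREATE INDEX on 5.2.6 in PRD vs 55 minutes for the rebuild on 5.3.0 in test), which mixes versions, servers and operations and is therefore only indicative.

```python
# Execution times in minutes from the table above
# (first 4.5 million PubChem structures, same machine, same Oracle 10gR2 instance).
timings = {
    "CREATE INDEX": {"5.2.6": 25, "5.3.0": 28},
    "INDEX REBUILD": {"5.2.6": 28, "5.3.0": 32},
}


def slowdown_pct(old: float, new: float) -> float:
    """Percentage increase in run time when going from old to new."""
    return (new - old) / old * 100.0


for command, t in timings.items():
    pct = slowdown_pct(t["5.2.6"], t["5.3.0"])
    print(f"{command}: {t['5.2.6']} -> {t['5.3.0']} min ({pct:+.1f}%)")

# Ratio implied by the original report; not a like-for-like comparison.
print(f"Originally reported ratio: {55 / 25:.1f}x")
```

On Peter's figures both commands slow down by roughly 12-14%, whereas the originally reported numbers differ by a factor of about 2.2.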


Let me know whether your further tests confirm or refute your initial observation. (It is not entirely impossible that your input significantly differs from mine and that there has been a major slow-down in processing the particular kind of structures which is prevalent in your input. We're not aware of any input-specific degradation of this magnitude, though.)


Thanks


Peter

User c5c63b5c6a

02-02-2010 13:25:50

Dropped the index and re-created it from scratch. Index creation on the same database took 45 minutes, compared to 50 minutes for the rebuild.

ChemAxon aa7c50abf8

02-02-2010 14:01:54

Thank you, Rachel, for the feed-back!


These latest results come from your test system, correct? Were they obtained using JChem 5.2.6 or 5.3.0?


Are your production and test hardware equally powerful? Did they have the same workload while the measurements were being done?


Thanks


Peter

User c5c63b5c6a

03-02-2010 09:39:07

Hi Peter,

Here's the info you requested, let me know if you need anything else.

The tests for uktst605 were performed using 5.3.0.

In ukprd605 (production) the table contains 4.6 million rows; in uktst605 (test) it contains 3.8 million rows, copied from PRD. Workload-wise, the PRD server would have been in use, while TST is unlikely to have had any use other than my running these tests.

PRD Server Spec:
Solaris 10, UltraSPARC-IV, 8 single-core CPUs, processor speed 1050 MHz, 32 GB RAM

TST Server Spec:
Solaris 10, UltraSPARC-IV+, 2 multi-core CPUs, 4 cores, processor speed 2150 MHz, 32 GB RAM
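
Because the two tables differ in size, one rough way to line the reported runs up is rows indexed per minute. The Python sketch below assumes that the current table sizes (4.6 million rows in PRD, 3.8 million in TST) also applied at the time of each run; it cannot factor out the hardware, JChem version and server-load differences listed above, so it is illustrative only and no substitute for a same-machine comparison.

```python
# Runs reported in this thread: (description, rows, minutes).
# Assumption: row counts are the current table sizes quoted above.
runs = [
    ("PRD, CREATE INDEX, 5.2.6", 4_600_000, 25),
    ("TST, INDEX REBUILD, 5.3.0 (first test)", 3_800_000, 55),
    ("TST, CREATE INDEX, 5.3.0", 3_800_000, 45),
    ("TST, INDEX REBUILD, 5.3.0 (second test)", 3_800_000, 50),
]


def rows_per_minute(rows: int, minutes: float) -> float:
    """Average indexing throughput for one run."""
    return rows / minutes


for label, rows, minutes in runs:
    print(f"{label}: {rows_per_minute(rows, minutes):,.0f} rows/min")
```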

Thanks,
Rachel

User 9f6f294e9f

03-02-2010 11:18:15

Hi Peter


Yes, I'm back! The temptation of working on ChemAxon products again was too much for me to resist.


My memory may be playing tricks on me, but I seem to recall that, in previous versions of the cartridge, the rebuild option ran far faster than a full index creation. Given the current timings, there seems little value in rebuilding an index rather than dropping and re-creating it, other than avoiding the need to re-grant privileges to use the index.


Regards


Ant

ChemAxon aa7c50abf8

03-02-2010 12:33:25

Hi Ant,


Cool to have you back on board.


It depends on the purpose you want to use INDEX REBUILD for. If you want to use it for disk space reclamation/consolidation, there is no time benefit. (Actually, in this case, INDEX REBUILD and CREATE INDEX should take the same time, so I'll need to check why REBUILD takes longer than CREATE.)


But if you want to use it for upgrading indexes as part of a JChem Cartridge upgrade (with the upgradeOnly:y parameter), chances are (especially between minor version upgrades) that the rebuild/upgrade will be instantaneous when changes in implementation don't require recalculation of index data.


But aren't we digressing from the original issue which was that there is a perceived dramatic slow-down in REBUILD from 5.2.6 to 5.3.0?


Wouldn't it be possible to run the REBUILD test with the two JChem versions in question on the same machine? I have the haunting feeling that we're comparing apples with oranges when running them on two different configurations. (I ran them on the same machine and didn't find any major difference between the two versions.)


Thanks,


Peter

User c5c63b5c6a

03-02-2010 12:53:05

Hi Peter,


I added a comment to my earlier post above:

Post subject: Timings
Posted: Tue Feb 02, 2010 2:25 pm 

These timings were on the same database (uktst605): 45 minutes for the index creation and 50 for the index rebuild, so the rebuild was slightly slower than the creation.


We have another two databases that we're looking to upgrade to 530, from 522 (ustst620) and from 526 (ukint605); we could compare the rebuild and index creation timings on those.


Regards,


Rachel

User 9f6f294e9f

03-02-2010 13:34:20

Hi again


I think that I see what has happened here. In your previous reply, Peter, you said "that the rebuild/upgrade will be instantaneous when changes in implementation don't require recalculation of index data." I guess that the 5.3 upgrade does, though, require recalculation of index data?


Is there any way of knowing in advance whether the index data will be recalculated ? I'm guessing that this is only likely to occur when moving between major releases.


If we knew in advance that a rebuild would take about the same time as dropping and re-creating do you see any benefit in going for the drop and re-create rather than the rebuild ?


Regards


Ant

ChemAxon aa7c50abf8

03-02-2010 18:18:08

Hi Ant,


Yes, upgrading to JChem 5.3 requires regeneration. In the list of changes, there is a separate section describing whether regeneration is required after upgrading from the previous JChem version.


Yes, regeneration is likely to be required only between major version changes. Between minor versions, we try to avoid making changes in the implementation which would require regeneration.


As you mentioned previously, rebuilding will save you having to re-grant the privileges on new database objects created as part of re-creating the index. Other than that I am not aware of any benefit or disadvantage.


Regards,


Peter

User 9f6f294e9f

03-02-2010 19:20:08

That's useful to know, thanks, Peter.

ChemAxon aa7c50abf8

09-02-2010 18:15:14

I found out the reason for the speed difference between CREATE INDEX and INDEX REBUILD. One of the regular indices on the index table is created only at the final stage of CREATE INDEX, after the index table is fully populated. During INDEX REBUILD, however, the same regular index already exists before and during population, which makes inserts into the index table 10-20% slower than during CREATE INDEX.


I changed the implementation of INDEX REBUILD so that it drops that index before the index table is populated and re-creates it after population is complete. The change will appear in JChem version 5.3.1.
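
The fix described here follows a common bulk-load pattern: drop a secondary index before a large insert and build it once the table is populated, instead of maintaining it row by row. The sketch below illustrates that pattern with Python's built-in sqlite3 module as a stand-in; it is not Oracle or JChem Cartridge code, the table and index names (idx_table, idx_table_fp) and the data are hypothetical, and the relative timings will depend on the engine, data volume and hardware.

```python
import random
import sqlite3
import time


def load(conn: sqlite3.Connection, rows, drop_index_first: bool) -> float:
    """Re-create a table that has a secondary index, then bulk-insert rows.

    drop_index_first=False: the secondary index exists during the inserts and
    is updated row by row (analogous to the INDEX REBUILD behaviour above).
    drop_index_first=True: the index is dropped before the inserts and
    re-created once the table is fully populated.
    Returns the elapsed wall-clock seconds for the load phase.
    """
    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS idx_table")  # also drops its indexes
    cur.execute("CREATE TABLE idx_table (id INTEGER PRIMARY KEY, fp INTEGER)")
    cur.execute("CREATE INDEX idx_table_fp ON idx_table (fp)")
    start = time.perf_counter()
    if drop_index_first:
        cur.execute("DROP INDEX idx_table_fp")
    cur.executemany("INSERT INTO idx_table (id, fp) VALUES (?, ?)", rows)
    if drop_index_first:
        # Build the index in a single pass over the fully populated table.
        cur.execute("CREATE INDEX idx_table_fp ON idx_table (fp)")
    conn.commit()
    return time.perf_counter() - start


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: random 32-bit integers standing in for index values.
    data = [(i, rng.getrandbits(32)) for i in range(200_000)]
    conn = sqlite3.connect(":memory:")
    maintained = load(conn, data, drop_index_first=False)
    rebuilt = load(conn, data, drop_index_first=True)
    print(f"index maintained during inserts: {maintained:.2f} s")
    print(f"index dropped, then re-created:  {rebuilt:.2f} s")
    conn.close()
```

Both variants end with the same rows and the same index; only the point at which the index is built differs.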


Peter

ChemAxon 990acf0dec

25-02-2010 18:47:08

Hi Ant,

I would like to inform you that we had to make an urgent patch release that was named 5.3.1, therefore the indexing change promised in this topic will be included only in the patch release coming at the end of March (probably named 5.3.2).

Best regards,

Akos

User 9f6f294e9f

26-02-2010 08:28:54

Thanks Akos


I saw the release notes for 5.3.1 yesterday and had realised that it probably wasn't the 5.3.1 I had been expecting! Not a problem, and thanks for letting me know when the changes that I am looking for are likely to be released.


Regards


Ant