Question Index for JChem table .Net

User 761d9eb85b

20-08-2013 11:55:40

Hello,


We use .NET version of JChem 5.8 with sql server 2008 to search for both reactions and molecules.


Recently we found the search speed decrease significantly when the number of row in base reach 500000.


Then we have found a significant improvement of duplicate searching speed when we add an index on the cd_hash column :


CREATE NONCLUSTERED INDEX [JChemMolecule_cdhash_Index]


ON [dbo].[JChem_Molecule] ([cd_hash])


My question are :


Is there any document about this optimization procedure ?


Should I add more index to other colume like cd_fp1 ... to increase substructure searching speed ?


Thank your for help !


 


Ning

ChemAxon 61b4fee994

21-08-2013 07:44:43

Dear Ning,


The index you specified manually should have existed on the table. In fact, JChem creates this index immediately after creating the new table. This index indeed have some dramatic effects on the speed of duplicate search.


I can't exactly tell why this index is missing and even if it is a JChem bug or not. My first guess is that somewhere during a database upgrade or export it has been dropped from the table. If you have some more information or guess about this, please tell us about it.


There are two other indexes that are created by JChem which have less importance on the CD_PRE_CALCULATED and the CD_SORTABLE_FORMULA columns, but they have less importance. If they are also missing, I recommend you create it the same way you did on the CD_HASH columns. These help the precalcaulation during regeneration at upgrade and querying the sortable formula of the structures respecively.


There are no other indexes that would speed up other search types, since only the duplicate search uses the table columns directly, the other search types uses the cache in memory during search.


Regards,


Tamas

User 761d9eb85b

21-08-2013 13:32:06

The table have been upgraded from JChem 5.4 so maybe there was a bug which causes indexes missing.


I can confirm that the other 2 indexes are missing too.


Thank you for the clear response.