JChem search exact search speed issue

User 8201cee929

18-12-2007 15:26:26

If we switch off caching mode, exact structure search becomes about 10 times faster. (I know that switching it off is deprecated.)


With caching mode, a search takes 700-800 ms for one molecule; we would expect about 100 ms.


We have 7 000 000 molecules in our structure table.





Here is the log from JChem:


with cache:


Tue Dec 18 16:01:28 CET 2007
Search mode: EXACT
Structure table: DBO.MOLECULES
Query: [#7]
Screened: 1
Hits: 1
Total time: 727 ms Screening: 696 ms
Processing threads: 2
Current / peak / maximum searches per minute: 9 / 9 / Unlimited





no cache:


Tue Dec 18 16:30:59 CET 2007
Search mode: EXACT
Structure table: DBO.MOLECULES
Query: [#8-]
Screened: 1
Hits: 1
Total time: 93 ms Screening: 23 ms
Processing threads: 2
Current / peak / maximum searches per minute: 9 / 9 / Unlimited





Any idea?


Thanks


Gabor

ChemAxon 9c0afc9aaf

18-12-2007 18:35:55

Hi,





The difference is in the screening time: 696 ms with cache versus 23 ms without. Screening is the phase that selects hit candidates for the slower graph search.
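
To make the breakdown explicit, here is a minimal Python sketch that splits the two "Total time" lines from the logs above into screening time and everything else. The function name parse_timing and the regular expression are my own helpers for illustration, not part of JChem.

```python
import re

# Matches the timing line as it appears in the JChem log quoted in this thread.
TIMING = re.compile(r"Total time: (\d+) ms Screening: (\d+) ms")


def parse_timing(line):
    """Return (total_ms, screening_ms, other_ms) for one log line."""
    match = TIMING.search(line)
    if match is None:
        raise ValueError("not a timing line: " + line)
    total, screening = int(match.group(1)), int(match.group(2))
    return total, screening, total - screening


cached = parse_timing("Total time: 727 ms Screening: 696 ms")
uncached = parse_timing("Total time: 93 ms Screening: 23 ms")

for label, (total, screening, other) in (("with cache", cached), ("no cache", uncached)):
    share = 100.0 * screening / total
    print(f"{label}: total {total} ms, screening {screening} ms ({share:.0f}%), other {other} ms")

# Overall and screening-only ratios between the two runs.
print(f"total ratio: {cached[0] / uncached[0]:.1f}x")
print(f"screening ratio: {cached[1] / uncached[1]:.1f}x")
```

Note that the two logged searches use different queries, so this is only a rough comparison, but it shows that nearly all of the cached-mode time is spent in screening.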





The discrepancy is due to a "trick" we apply in this phase:


The cd_hash column in the database table is normally used for speeding up duplicate filtering (PERFECT search).


The hash cannot be used for EXACT search in general, because the hits are not necessarily identical to the query (e.g. a "single-or-double" bond should match both bond types, and an "any atom" can match any atom).


If the query does not have such features, we can "cheat" and use the hash code to select the candidates.
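
The idea can be sketched roughly as follows. This is a toy Python model under stated assumptions, not the JChem implementation: structures are plain strings, toy_hash stands in for the real cd_hash, and the marker list ("*", "~", ",") is a simplified, hypothetical way to detect generic query features.

```python
from collections import defaultdict

# Toy markers for query features that can match more than one structure.
# SMARTS-style tokens are used for illustration only: "*" (any atom),
# "~" (any bond), "," (alternatives such as single-or-double).
GENERIC_MARKERS = ("*", "~", ",")


def toy_hash(structure):
    """Hypothetical stand-in for cd_hash: equal structures give equal values."""
    return sum(ord(ch) * (pos + 1) for pos, ch in enumerate(structure))


def has_generic_features(query):
    """True if the query contains any of the toy generic markers."""
    return any(marker in query for marker in GENERIC_MARKERS)


def build_hash_index(table):
    """Map each hash value to the list of row ids that carry it."""
    index = defaultdict(list)
    for cd_id, structure in table.items():
        index[toy_hash(structure)].append(cd_id)
    return index


def screen(query, table, index):
    """Return candidate row ids that still need the slower graph search."""
    if has_generic_features(query):
        # No shortcut: hits need not be identical to the query, so the hash
        # cannot narrow the candidates. A real system would apply its regular
        # screening here; this toy simply keeps every row.
        return sorted(table)
    # Specific query: only rows with the same hash value can be hits.
    return sorted(index.get(toy_hash(query), []))


table = {1: "CCO", 2: "CCN", 3: "c1ccccc1"}
index = build_hash_index(table)

print(screen("CCO", table, index))  # specific query: hash lookup
print(screen("C*O", table, index))  # generic query: all rows stay candidates
```

A hash match only yields a candidate; the graph search still has to confirm it, since different structures could share a hash value.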





This speedup is currently not applied in cached mode, which explains the discrepancy in the search times.


We are planning to improve on this in the future.





By the way, do you use EXACT search for finding duplicate structures?


In that case I recommend PERFECT search mode, which is specifically designed for handling this.


Please see the chemistry differences in our Query Guide:


http://www.chemaxon.com/jchem/doc/user/Query.html#otherSearchTypes


With PERFECT search, the search time should be similar to your faster measurement.





Best regards,





Szilard

ChemAxon 9c0afc9aaf

28-05-2008 15:13:04

Hi,





In future releases, EXACT search will use the hash code for screening whenever possible, which will greatly improve the screening time.





Szilard