Speeding up similarity search to get only 10-100 analogs

User 93ffa33d02

20-02-2015 15:20:52

We use similarity search when other, “more exact”
search types are unsuccessful. We run the searches with JChemBase for Java.

Which options can we use to speed up similarity
search? We actually need only a small set of the most similar compounds (from 10 to
1000 compounds), but the search will be performed on a really large database (more than
20 million compounds). According to the documentation, we cannot use the option
“setMaxResultCount” because “In case of similarity searches, the full search is
performed and the maxResultCount most similar results will be given back. In
this case this option does not mean speedup.” Is this still true?
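The documentation's point can be illustrated with a plain top-k scan. The Java sketch below is not JChem's implementation. It assumes that fingerprints are `java.util.BitSet` objects and that the Tanimoto coefficient is the similarity metric, and the class name `TopKSimilarity` is hypothetical. Because every fingerprint has to be scored before the k best are known, a small result limit shortens the output list but not the scan itself.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration, not the JChem API: brute-force top-k similarity search.
public class TopKSimilarity {

    /** One scored hit: position of the compound in the list plus its similarity to the query. */
    public static final class Hit {
        public final int id;
        public final double similarity;

        Hit(int id, double similarity) {
            this.id = id;
            this.similarity = similarity;
        }
    }

    /** Tanimoto coefficient |A and B| / |A or B|; returns 0 when both fingerprints are empty. */
    public static double tanimoto(BitSet a, BitSet b) {
        BitSet and = (BitSet) a.clone();
        and.and(b);
        BitSet or = (BitSet) a.clone();
        or.or(b);
        int union = or.cardinality();
        return union == 0 ? 0.0 : (double) and.cardinality() / union;
    }

    /**
     * Returns the k most similar fingerprints, best first.
     * Every fingerprint is still scored, so the work is O(n log k) for n fingerprints:
     * a small k reduces the size of the result, not the number of comparisons.
     */
    public static List<Hit> topK(BitSet query, List<BitSet> fingerprints, int k) {
        List<Hit> result = new ArrayList<>();
        if (k <= 0) {
            return result;
        }
        // Min-heap on similarity: the root is the weakest hit currently kept.
        PriorityQueue<Hit> heap =
                new PriorityQueue<>((x, y) -> Double.compare(x.similarity, y.similarity));
        for (int id = 0; id < fingerprints.size(); id++) {
            double s = tanimoto(query, fingerprints.get(id));
            if (heap.size() < k) {
                heap.add(new Hit(id, s));
            } else if (s > heap.peek().similarity) {
                heap.poll();                // drop the current weakest hit
                heap.add(new Hit(id, s));
            }
        }
        result.addAll(heap);
        // Sort best first for the caller.
        result.sort((x, y) -> Double.compare(y.similarity, x.similarity));
        return result;
    }
}
```

With 20 million fingerprints and k = 100, the loop above still runs 20 million times; only the heap stays small.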

Can the option “setFilterQuery” increase speed? Is the filtering done before
or after the search? For example, if out of 40 million compounds we run an
SQL query to filter cd_id and keep only 20 million, will the similarity search
be faster than for the full 40 million?

We currently use JChem 6.2.1 but are going to update to the latest version in the near future.

ChemAxon abe887c64e

24-02-2015 12:52:53

Unfortunately, setMaxResultCount really cannot be applied in similarity search.

In older JChem versions (like 6.2.1), the filterQuery process always runs before the chemical structure search. From version 15.1.26 onward, we execute combined searches depending on the structural and non-structural parts of the query, as described here.
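Why a filter that runs first can help is easiest to see in a simplified model. The Java sketch below is hypothetical and is not the JChem API: `allowedIds` stands in for the cd_id values a filter query might return, `fingerprintsById` for the stored fingerprints, and Tanimoto on `java.util.BitSet` is assumed as the metric. Only filtered ids are scored, so the number of comparisons follows the size of the filtered set rather than the whole table.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration, not the JChem API: filter first, then rank by similarity.
public class FilteredSimilarity {

    /** Top hits (ids, best first) plus how many fingerprints were actually compared. */
    public static final class Result {
        public final List<Integer> ids;
        public final int comparisons;

        Result(List<Integer> ids, int comparisons) {
            this.ids = ids;
            this.comparisons = comparisons;
        }
    }

    /** Tanimoto coefficient |A and B| / |A or B|; returns 0 when both fingerprints are empty. */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet and = (BitSet) a.clone();
        and.and(b);
        BitSet or = (BitSet) a.clone();
        or.or(b);
        int union = or.cardinality();
        return union == 0 ? 0.0 : (double) and.cardinality() / union;
    }

    /** Scores only the ids that passed the non-structural filter and returns the k best. */
    public static Result search(BitSet query, Map<Integer, BitSet> fingerprintsById,
                                Set<Integer> allowedIds, int k) {
        Map<Integer, Double> scores = new HashMap<>();
        for (int id : allowedIds) {
            BitSet fp = fingerprintsById.get(id);
            if (fp == null) {
                continue; // id passed the filter but has no stored fingerprint
            }
            scores.put(id, tanimoto(query, fp));
        }
        List<Integer> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((x, y) -> Double.compare(scores.get(y), scores.get(x)));
        int limit = Math.max(0, Math.min(k, ranked.size()));
        return new Result(new ArrayList<>(ranked.subList(0, limit)), scores.size());
    }
}
```

In this model, halving the candidate set halves the comparisons. How much of that carries over to a real JChem search depends on how the structural and non-structural parts of the query are actually combined.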

Our guess is that your searches combined with a filterQuery condition might run faster with the latest JChem (any version from 15.1.26 onward). Please let us know your findings.

An additional question: are your similarity searches based on the default chemical hashed fingerprints stored in JChem tables, or on other fingerprints/descriptors?

Best regards

Krisztina