Pharmacophoric similarity search too slow

User 86810cf9fa

23-04-2008 07:28:40

Dear support,





I tried to perform a pharmacophoric similarity search using JChem Base 3.1.6.


I simply added searcher.setDescriptorName("PF"); to specify that it is a pharmacophoric similarity search. However, the pharmacophoric similarity search is very slow compared to the normal similarity search. Could you please tell me how to speed it up?





thank you very much


best regards


Severine

ChemAxon 9c0afc9aaf

24-04-2008 16:19:03

Hi,





For "normal" similarity search the Chemical Hashed Fingerprint is used, which is already cached in memory to provide a rapid pre-filtering for structural searches.
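
To illustrate why the in-memory case is fast, here is a minimal Java sketch of screening a list of cached bit fingerprints by Tanimoto dissimilarity. It is not JChem's implementation: the class name InMemoryScreenSketch, the use of java.util.BitSet and the tiny example fingerprints are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical illustration, not JChem code.
class InMemoryScreenSketch {

    /** Tanimoto dissimilarity of two bit fingerprints: 1 - |A and B| / |A or B|. */
    static double dissimilarity(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone();
        common.and(b);
        BitSet union = (BitSet) a.clone();
        union.or(b);
        int unionBits = union.cardinality();
        if (unionBits == 0) {
            return 0.0; // assumption: two empty fingerprints count as identical
        }
        return 1.0 - (double) common.cardinality() / unionBits;
    }

    /** Returns the positions of all cached fingerprints within the threshold. */
    static List<Integer> screen(BitSet query, List<BitSet> cache, double threshold) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < cache.size(); i++) {
            if (dissimilarity(query, cache.get(i)) <= threshold) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        BitSet query = BitSet.valueOf(new long[] {0b1111L});
        // The "cache" is just a list already held in memory; no database access.
        List<BitSet> cache = Arrays.asList(
                BitSet.valueOf(new long[] {0b1111L}),
                BitSet.valueOf(new long[] {0b0111L}),
                BitSet.valueOf(new long[] {0b110000L}));
        System.out.println(screen(query, cache, 0.3));
    }
}
```

Every comparison here is a few bit operations on data that is already loaded, which is why this kind of screen can run over hundreds of thousands of rows in seconds.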





In the case of Molecular Descriptors, each descriptor is stored in a separate database table. (Several descriptors, and therefore several such tables, may exist for a single structure table.)





During descriptor similarity search this data must be fetched from the database (from a BLOB column or equivalent).





This type of similarity search is therefore inherently slower, but it is more intelligent and flexible.





Is it unreasonably slow for you? What is the size of the table?





Suggestions for speedup:





1. The application and the RDBMS should have a good network connection to ease data transfer.





2. You may try having the database cache the relevant columns.


For example, if you are using Oracle, you can specify LOB columns for caching according to this article (possibly only in 11g, though):








http://www.oracle.com/technology/pub/articles/oracle-database-11g-top-features/11g-securefiles.html





Code:



alter table <table> modify lob(<blob_column>) (cache)






Best regards,





Szilard

User 86810cf9fa

28-04-2008 13:03:11

Hi Szilard,





Thank you for your answer. We are using an Oracle database.
Quote:
Is it unreasonably slow for you? What is the size of the table?

The 2D pharmacophoric fingerprint is stored in a LONG RAW column. I changed the cache mode for this table, but I don't know how to do that specifically for the md_data (LONG RAW) column.





After this change, the 2D pharmacophoric similarity search takes at least 5 minutes for the structure "OC(=O)c1ccccc1O" with a dissimilarity threshold of 0.2. That is too long for us, especially because the normal similarity search (using the same structure and the same dissimilarity threshold) takes a few seconds on a table containing 458452 rows.





I took a look at the Oracle session, and this query:





Code:
SELECT csmol.cd_id, csmol_md_pf.md_data
    FROM csmol, csmol_md_pf
    WHERE csmol.cd_id = csmol_md_pf.cd_id








takes a long time because there is a full scan of the csmol_md_pf table.





Is it possible to decrease the search duration?





Thank you.





Best regards,


Severine

ChemAxon 9c0afc9aaf

29-04-2008 08:20:34

Hi,





In the current implementation, similarity search with descriptors will always be orders of magnitude slower (see the details in my previous post). You cannot expect it to reach the same speed as normal similarity search, which operates on data already in memory.





You are right: in the case of Oracle, LONG RAW columns are indeed created.


(In the long run we will probably change this to BLOB, according to Oracle's recommendation.)





I might be wrong, but LONG RAW columns probably do not need special tricks to be cached, just a sufficiently big (huge, depending on the table size) buffer cache setting.





Reading the content of the whole table is perfectly normal; we do need all of that data.





I recommend trying a huge buffer cache setting and running the descriptor similarity search at least twice to see if it gets faster.


(The first run may still be slow, as the data is not in the cache yet.)
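
A simple way to check the effect of the buffer cache is to time the same search several times in a row. The helper below is only a sketch: RepeatTimingSketch and timeRuns are made-up names, and the Runnable stands in for whatever code performs your descriptor search.

```java
// Hypothetical timing helper; not part of JChem.
class RepeatTimingSketch {

    /** Runs the task the given number of times and returns each run's wall-clock time in ms. */
    static long[] timeRuns(Runnable task, int runs) {
        long[] millis = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return millis;
    }

    public static void main(String[] args) {
        // Placeholder workload: replace the body with the real descriptor search call.
        Runnable search = () -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        long[] t = timeRuns(search, 2);
        System.out.println("first run: " + t[0] + " ms, second run: " + t[1] + " ms");
    }
}
```

If the second run is not noticeably faster than the first, the descriptor data is probably not staying in the database cache between runs.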





In the long run we might implement optional in-memory caching of descriptor data if it turns out to be a popular request.
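
To give an idea of what such optional caching could look like, here is a rough sketch. It is not an existing JChem feature. The class DescriptorCacheSketch, the int[] count representation of a descriptor and the min/max form of the Tanimoto dissimilarity are all assumptions for illustration; the real pharmacophore fingerprint format and metric may differ. The loader stands for one full read of the descriptor table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical illustration of in-memory descriptor caching; not JChem code.
class DescriptorCacheSketch {
    private final Supplier<Map<Integer, int[]>> loader; // e.g. one full table read
    private Map<Integer, int[]> cache;                  // cd_id -> descriptor counts
    int loads = 0;                                      // number of times the loader ran

    DescriptorCacheSketch(Supplier<Map<Integer, int[]>> loader) {
        this.loader = loader;
    }

    /** Assumed metric: 1 - sum(min(a_i, b_i)) / sum(max(a_i, b_i)) over count vectors. */
    static double dissimilarity(int[] a, int[] b) {
        long minSum = 0;
        long maxSum = 0;
        int n = Math.max(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = i < a.length ? a[i] : 0;
            int y = i < b.length ? b[i] : 0;
            minSum += Math.min(x, y);
            maxSum += Math.max(x, y);
        }
        return maxSum == 0 ? 0.0 : 1.0 - (double) minSum / maxSum;
    }

    /** Loads the descriptors on first use only, then searches the in-memory copy. */
    List<Integer> search(int[] query, double threshold) {
        if (cache == null) {
            cache = loader.get();
            loads++;
        }
        List<Integer> hits = new ArrayList<>();
        for (Map.Entry<Integer, int[]> row : cache.entrySet()) {
            if (dissimilarity(query, row.getValue()) <= threshold) {
                hits.add(row.getKey());
            }
        }
        return hits;
    }
}
```

In this sketch only the first search pays for the table read; later searches scan memory. It is not thread-safe and does not handle rows that change after loading, both of which a real implementation would have to address.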





Best regards,





Szilard

User 86810cf9fa

30-04-2008 07:45:33

Hi Szilard,





Thank you for your answer.





This kind of search now takes around 3 minutes using the cache. I must consult my colleagues to see whether we can integrate this tool into our application.





Best regards,


Severine