Cartridge - ordering by similarity

User 4e4b708dbd

09-01-2008 09:46:33

This must have been covered in the documentation somewhere or discussed in the forums, but I have been unable to find an answer. The problem is as follows:





I want to perform a similarity search and display results ordered by similarity. What is the best way to do it with the cartridge?

ChemAxon aa7c50abf8

09-01-2008 11:38:27

Currently, the only way to do it is to use an ORDER BY clause, similar to the following:

Code:
select id, jc_tanimoto(structure, 'Brc1ccccc1') sim from nci_1k order by sim
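
Note that a plain ORDER BY sorts in ascending order, so the lowest values come first. Assuming jc_tanimoto returns a similarity score where a higher value means a closer match (the thread implies this but does not state it), a sketch that lists the best matches first would add DESC:

```sql
-- Sketch: same table and query structure as the statement above;
-- DESC is the only change.
-- Assumption: a higher jc_tanimoto value means a more similar structure.
select id, jc_tanimoto(structure, 'Brc1ccccc1') sim
from nci_1k
order by sim desc
```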






If you also need to apply a filter based on the similarity value:

Code:
select id, sim from (select id, jc_tanimoto(structure, 'Brc1ccccc1') sim from nci_1k) where sim > 0.2 order by sim






Note that the Oracle optimizer will be smart enough to use a domain-index scan in the latter case.

User 276402c609

30-01-2008 11:26:59




Is there a way to use jc_tanimoto with BLOBs (the structure is stored in a BLOB, and the drawn query structure is also a BLOB)? In some cases we can easily exceed the maximum string length in PL/SQL (for example, with buckyballs).

ChemAxon aa7c50abf8

30-01-2008 11:34:44

jc_tanimotob is implemented for this purpose.

User 276402c609

30-01-2008 12:04:37

pkovacs wrote:
jc_tanimotob is implemented for this purpose.
Thanks for the fast response, but it seems I still have problems.

I have two SQL statements:





select m.cd_id, jc_tanimotob(m.cd_structure, TO_BLOB(UTL_RAW.CAST_TO_RAW('c'))) from molecules m where m.cd_id = 41146





select m.cd_id, jc_tanimotob(m.cd_structure, UTL_RAW.CAST_TO_RAW('c')) from molecules m where m.cd_id = 41146





Both of them give the following error:

ORA-06502: PL/SQL: numeric or value error: character to number conversion error
ORA-06512: at TANIMATO_FUNCB, line 28





The code executed in this function is:





return jchem_blob_pkg.getSimilarityValue(
    'tanimoto', query, target,
    indexctx.Rid,
    indexctx.IndexInfo.IndexSchema,
    indexctx.IndexInfo.IndexName,
    indexctx.IndexInfo.IndexPartition,
    indexctx.IndexInfo.IndexCols(1).TableSchema,
    indexctx.IndexInfo.IndexCols(1).TableName, null);





Am I missing something, or doing something incorrectly?

ChemAxon aa7c50abf8

30-01-2008 12:11:58

In an earlier post in this thread (http://www.chemaxon.com/forum/viewpost14460.html#14460), I suggested using a sub-select when filtering is needed. I suggest trying the same here.
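
As an illustration only, the sub-select pattern from that earlier post might be applied to the BLOB variant roughly as follows. This is a sketch: it reuses the table, columns and query expression from the failing statements in this thread, the 0.2 threshold is an arbitrary example value, and nothing in the thread confirms that it avoids the ORA-06502 error.

```sql
-- Sketch only: compute the similarity in an inner query,
-- then filter and sort on the alias in the outer query.
select cd_id, sim
from (select m.cd_id,
             jc_tanimotob(m.cd_structure,
                          TO_BLOB(UTL_RAW.CAST_TO_RAW('c'))) sim
      from molecules m)
where sim > 0.2   -- example threshold, not taken from the original question
order by sim
```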