filterQuery option not returning expected results

User 348e44f632

24-04-2016 17:19:31

Hello,


We are upgrading JOC to the latest version on Oracle 12c.  In our current production environment we execute search statements that utilize filterQuery clause as follows:


SELECT  STRUCTURE_ID  FROM STRUCTURE WHERE jc_compare(SMILES, 'NC(=O)C1=CC=CC=C1','sep=! t:s!filterQuery:SELECT  STRUCTURE.rowid FROM STRUCTURE,TABLE2 WHERE   TABLE2.ID = mySetId and structure_id = TABLE2.SID') = 1


Using a set of PubChem molecules in the attachment as an example of a subset that would be returned by the pre-filter clause, this search returns 8 of the 49 molecules in our current production environment.


In the upgraded test environment, it returns zero rows.


CURRENT PRODUCTION ENVIRONMENT:


Oracle environment:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE 11.2.0.3.0 Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production

JChem owner: JCHEM

JChem Server environment:
Java VM vendor: Oracle Corporation
Java version: 1.7.0_04
Java VM version: 23.0-b21
JChem version: 5.11.3
JChem Index version: 5110000
JDBC driver version: 11.1.0.7.0-Production


UPGRADED TEST ENVIRONMENT:


Oracle environment:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
PL/SQL Release 12.1.0.1.0 - Production
CORE 12.1.0.1.0 Production
TNS for Linux: Version 12.1.0.1.0 - Production
NLSRTL Version 12.1.0.1.0 - Production

JChem owner: JCHEM

JChem Server environment:
Java VM vendor: Sun Microsystems Inc.
Java version: 1.6.0_21
Java VM version: 17.0-b16
JChem version: 16.4.11.0
JChem index version: 16022200
JDBC driver version: 11.2.0.3.0
JChem Server start directory: /home/oracle/ChemAxon/jchem/cartridge
Maximum memory: 9 GB
JChem Server host:port : db2:1099


Can you explain what is causing this behavior?


Thank you,


maya

ChemAxon abe887c64e

26-04-2016 08:36:42

Hi Maya,


Unfortunately, we cannot reproduce the missing hit result.


Have you checked the result of the inner SELECT itself ?


Would you write the output of


DESC structure
DESC table2

and the CREATE INDEX parameters of the jc_idxtype index, and the standardizer configuration file, if there were any custom standardization applied.


Thank you,


Krisztina

User 348e44f632

26-04-2016 12:36:07

Hi Krisztina,


when you say you cannot reproduce the problem, do you mean that query returns 8 results with the latest cartridge /Oracle 12c; or that you cannot set up a test environment equivalent to ours?


The inner pre-filter query returns 49 rows.


maya

ChemAxon abe887c64e

26-04-2016 14:46:47

Hi Maya,


I tested with the same Oracle and JChem version as yours, (I could not apply the same column types and index parameters, standardizer config because of missing information) and the relevant hits are returned.


What are your results with the following two separate select statements?


SELECT  STRUCTURE.rowid FROM STRUCTURE,TABLE2 WHERE   TABLE2.ID = mySetId and structure_id = TABLE2.SID;

SELECT  STRUCTURE_ID  FROM STRUCTURE WHERE
jc_compare(SMILES, 'NC(=O)C1=CC=CC=C1','t:s') = 1;


 


Krisztina


 

User 348e44f632

26-04-2016 19:38:11

Hi Krisztina,


Attached are the files to create and populate my test tables.  


Originally, i ran the query directly on the tables that have been upgraded "in place", although we did rebuild the domain index.


These test tables behave a little differently.  I do sometimes get the correct result of 8 rows.  If i run my query multiple times, it returns different recordset on different tries.  It's usually 8 or 0 rows, but sometimes it's 1 or 2.  


SELECT  STRUCTID   FROM A_TEST_STRUCT2  WHERE jc_compare(SMILES, 'NC(=O)C1=CC=CC=C1','sep=! t:s!filterQuery:SELECT  A_TEST_STRUCT2.rowid FROM A_TEST_STRUCT2 ,A_TEST_JOIN  WHERE STRUCTID  = tj_sid') = 1


We use these arguments for standardizer:


removeexplicitH..automerize..aromatize~outFormat:smiles:u


Can you see if you can reproduce it?


Thank you for your help,


maya

ChemAxon abe887c64e

27-04-2016 09:47:44

Dear Maya,


Thank you for the sql files you sent, now we could reproduce the alternating search result (8 hits, 0 hits).


We continue examining this issue and will be back when we have more information.


Best regards,


Krisztina

ChemAxon abe887c64e

27-04-2016 11:27:59

As a workaround we recommend to combine the two select conditions Instead of embedding filterQuery into the the parameters of jc_compare.


SELECT  STRUCTID FROM A_TEST_STRUCT2,A_TEST_JOIN WHERE jc_compare(SMILES, 'NC(=O)C1=CC=CC=C1','t:s')=1 and STRUCTID  = tj_sid;


For combined queries in case of bigger tables, it is worth to use our cost estimation and calibration procedure in order to reach better performance.


Krisztina

User 348e44f632

28-04-2016 00:31:39

Thank you, Krisztina.


It appears that filterQuery gets translated into SQL that looks something like this:


select fqres.rowid from ( SELECT STRUCTURE.rowid FROM CHEMICAL_LISTS, STRUCTURE WHERE   CL_CHEMICAL_LIST_ID = 398452 and s_structure_id = cl_structure_id  ) fqres where fqres.rowid in ('AAAEbF ABgAABZQeAAR', 'AAAEbFABhAABbOzAAM', 'AAAEbFABgAABZJdAAB', 'AAAE bFABgAABQ4IAAK', 'AAAEbFABhAABbOzAAJ', 'AAAEbFABgAABQoOAAA', 'AA AEbFABhAABRFJAAN', 'AAAEbFABhAABRFJAAA', 'AAAEbFABgAABQnOAAI')


This is not entirely correct, since fqres.rowid is not the same as STRUCTURE.rowid returned by the fqres sub-query. The correct version should be:


select fqres.rowid from ( SELECT STRUCTURE.rowid as RID FROM CHEMICAL_LISTS, STRUCTURE WHERE   CL_CHEMICAL_LIST_ID = 398452 and s_structure_id = cl_structure_id  ) fqres where fqres.RID in ('AAAEbF ABgAABZQeAAR', 'AAAEbFABhAABbOzAAM', 'AAAEbFABgAABZJdAAB', 'AAAE bFABgAABQ4IAAK', 'AAAEbFABhAABbOzAAJ', 'AAAEbFABgAABQoOAAA', 'AA AEbFABhAABRFJAAN', 'AAAEbFABhAABRFJAAA', 'AAAEbFABgAABQnOAAI')


Best,


maya


 

ChemAxon abe887c64e

02-05-2016 13:27:00

Hi Maya,


Thank you for the helping feedback. In the meantime we investigated this issue, went through the advances and hindrances of the use of filterQuery, and at the moment we are near to the decision about the deprecation of filterQuery from the supported jc_compare parameters. 


Can you help us to find any disadvantage of the statement where the conditions are joined by AND (jc_compare(....)=1 AND any other filter condition) compared to the use of filterQuery embedded in jc_compare's 3rd parameter.


Thank you,


Krisztina


 



User 348e44f632

03-05-2016 16:08:40

Hi Krisztina,


I think you are right - filterQuery has no advantage over simply constructing a query that joins the tables.  I found that the most performant query will pre-filter the structure set using a with clause or inline query, and will return a subset of rowids  to which the domain index will then be applied (which, i believe, is what filterQuery   parameter is interpreted as). Eg:


with fqres as( SELECT STRUCTURE.rowid rid, S_STRUCTURE_ID FROM some_table, STRUCTURE WHERE  some_table.id = 5 and s_structure_id = some_table.structure_id  ) select  fqres.S_STRUCTURE_ID from fqres where fqres.rid in (SELECT STRUCTURE.rowid FROM STRUCTURE WHERE jc_compare(S_SMILES, 'NC=O', 't:s')=1) 


Does this answer your question?


Best,


maya

ChemAxon abe887c64e

04-05-2016 08:09:17

Hi Maya,


Thank you for sharing your opinion with us, and yes, my you answered my question. You correctly see the point of the filterQuery parameter, but we cannot fix the bug you discovered, so very possibly, we will deprecate it in the near future. 


We recommend using our cost estimation and calibration procedure  - linked in previously - to optimize the performance of the searches when chemical structure search condition is combined with other filter conditions.


Best regards,


Krisztina