User 325f2762fd
18-04-2008 16:43:51
We are running the following query and it is taking 15 minutes. How we can improve the performance
COMPOUND table contains smiles and it has 2.2 million records
PROJECT_COMPOUNDS contains ID and has 6 million records
The other tables are small in the range of 1000. When I run without jchem search it is very quick.
SELECT DISTINCT COMPOUND.compound_id, COMPOUND.smiles,
COMPOUND.h_bond_acceptors, COMPOUND.h_bond_donors,
COMPOUND.rotatable_bonds, COMPOUND.Clogp,
COMPOUND.molecular_weight, COMPOUND.synthesisable
FROM COMPOUND,
PROJECT_COMPOUNDS,
eseek_projects,
evoseek.VM_USER_PROJECT_ACCESS
WHERE COMPOUND.compound_id = PROJECT_COMPOUNDS.compound_id
AND PROJECT_COMPOUNDS.project_id = eseek_PROJECTS.project_id
AND jc_contains (COMPOUND.smiles, 'c1ccccc1') = 1
AND evoseek.VM_USER_PROJECT_ACCESS.project_id = eseek_projects.project_id
AND evoseek.VM_USER_PROJECT_ACCESS.user_id = 434
User 325f2762fd
18-04-2008 17:10:18
Oracle is already doing that see the explain plan below
Code: |
Operation Object Name Rows Bytes Cost Object Node In/Out PStart PStop
SELECT STATEMENT Optimizer Mode=ALL_ROWS 2 K 8004
HASH UNIQUE 2 K 330 K 8004
HASH JOIN 2 K 330 K 7925
TABLE ACCESS BY INDEX ROWID EVOSEEK.COMPOUND 22 K 2 M 7080
DOMAIN INDEX EVOSEEK.IXJC_COMPOUND_SMILES
NESTED LOOPS 103 K 2 M 558
NESTED LOOPS 4 44 1
INDEX FULL SCAN EVOSEEK.PK_PROJECTS 308 1 K 1
INDEX UNIQUE SCAN EVOSEEK.PK_VM_USER_PROJECT_ACCESS 1 7 0
INDEX RANGE SCAN EVOSEEK.UNQ_PROJECT_COMPS_PROJ_COMPID 24 K 238 K 139
|
ChemAxon aa7c50abf8
18-04-2008 17:41:28
Based on this explain plan, Oracle doesn't do what I meant. Oracle does a full domain index scan -- meaning that the substructure search is applied to every structure in the table. Looking at your query structure, JC_CONTAINS probably finds quite a lot of hits, which take quite a long time to return with most of them probably being subsequently discarded as the two legs of the execution plan are hash joined. (Returning hits from a domain index scan is a fairly time consuming operation and its duration increases linearly with the number of the hits returned.)
What I meant was to effectively reduce the input (and output) of the substructure search part of your query. I suggest to use the
filterQuery option of the JC_COMPARE operator:
http://www.chemaxon.com/jchem/doc/guide/cartridge/cartapi.html#jc_compare_filterQuery
This basically means that you pack the non-structure-search conditions (the project and user specific conditions) in a SELECT statement and then make this SELECT statement the value of the
filterQuery option. This way, even though JC_COMPARE will still be executed in domain index scan mode, internally it will apply substructure search only to the structures returned by the
filterQuery. This will result both in reduced time spent on substructure search and in (typically) substantially less hits returned by the domain index scan.
User 325f2762fd
18-04-2008 18:06:37
Fantastic. The query takes 3 minutes now after changing the query as per your suggestion.
Thanks for your help. See below the change made to the query
Code: |
SELECT COMP.compound_id, COMP.smiles,
COMP.h_bond_acceptors, COMP.h_bond_donors,
COMP.rotatable_bonds, COMP.clogp,
COMP.molecular_weight, COMP.synthesisable
FROM evoseek.COMPOUND comp
WHERE jc_compare (COMP.smiles, 'c1ccccc1','sep=! t:s!filterQuery:select compound.rowid
FROM evoseek.COMPOUND,
evoseek.PROJECT_COMPOUNDS,
evoseek.PROJECTS,
evoseek.VM_USER_PROJECT_ACCESS
WHERE COMPOUND.compound_id = PROJECT_COMPOUNDS.compound_id
AND PROJECT_COMPOUNDS.project_id = PROJECTS.project_id
AND evoseek.VM_USER_PROJECT_ACCESS.project_id = PROJECTS.project_id
AND evoseek.VM_USER_PROJECT_ACCESS.user_id = 434') = 1 |
What is sep=! mean and also ! sign after t:s mean?
ChemAxon aa7c50abf8
18-04-2008 18:15:13
sep=! means that the options in the option list (the third argument of the
jc_compare operator) will be separated by the '!' string (which consists in this case of just one character). By default, the options in the list are separated by space(s) which is not appropriate for option values which themselves may contain spaces (such as a SQL statemet). For further details please, see
http://www.chemaxon.com/jchem/doc/guide/cartridge/index.html#option_list_parameter.