User 325f2762fd
		18-04-2008 16:43:51
	 
	
	
	We are running the following query and it is taking 15 minutes. How we can improve the performance
COMPOUND table contains smiles and it has 2.2 million records
PROJECT_COMPOUNDS contains ID and has 6 million records
The other tables are small in the range of 1000. When I run without jchem search it is very quick.
SELECT DISTINCT COMPOUND.compound_id, COMPOUND.smiles,
                COMPOUND.h_bond_acceptors, COMPOUND.h_bond_donors,
                COMPOUND.rotatable_bonds, COMPOUND.Clogp,
                COMPOUND.molecular_weight, COMPOUND.synthesisable
           FROM COMPOUND,
                PROJECT_COMPOUNDS,
                eseek_projects,
                evoseek.VM_USER_PROJECT_ACCESS
          WHERE COMPOUND.compound_id = PROJECT_COMPOUNDS.compound_id
            AND PROJECT_COMPOUNDS.project_id = eseek_PROJECTS.project_id
            AND  jc_contains (COMPOUND.smiles, 'c1ccccc1') = 1
            AND evoseek.VM_USER_PROJECT_ACCESS.project_id = eseek_projects.project_id
            AND evoseek.VM_USER_PROJECT_ACCESS.user_id = 434	
	
 
	
		User 325f2762fd
		18-04-2008 17:10:18
	 
	
	
	Oracle is already doing that see the explain plan below
 	  | Code: | 
		  | Operation   Object Name   Rows   Bytes   Cost   Object Node   In/Out   PStart   PStop 
 
 
 
 
 SELECT STATEMENT Optimizer Mode=ALL_ROWS      2 K       8004
 
 
 HASH UNIQUE      2 K   330 K   8004
 
 
 HASH JOIN      2 K   330 K   7925
 
 
 TABLE ACCESS BY INDEX ROWID   EVOSEEK.COMPOUND   22 K   2 M   7080
 
 
 DOMAIN INDEX   EVOSEEK.IXJC_COMPOUND_SMILES
 
 
 NESTED LOOPS      103 K   2 M   558
 
 
 NESTED LOOPS      4     44     1
 
 
 INDEX FULL SCAN   EVOSEEK.PK_PROJECTS   308     1 K   1
 
 
 INDEX UNIQUE SCAN   EVOSEEK.PK_VM_USER_PROJECT_ACCESS   1     7     0
 
 
 INDEX RANGE SCAN   EVOSEEK.UNQ_PROJECT_COMPS_PROJ_COMPID   24 K   238 K   139
 
 
 
 | 
	
	 
	
		ChemAxon aa7c50abf8
		18-04-2008 17:41:28
	 
	
	
	Based on this explain plan, Oracle doesn't do what I meant. Oracle does a full domain index scan -- meaning that the substructure search is applied to every structure in the table. Looking at your query structure, JC_CONTAINS probably finds quite a lot of hits, which take quite a long time to return with most of them probably being subsequently discarded as the two legs of the execution plan are hash joined. (Returning hits from a domain index scan is a fairly time consuming operation and its duration increases linearly with the number of the hits returned.)
What I meant was to effectively reduce the input (and output) of the substructure search part of your query. I suggest to use the 
filterQuery option of the JC_COMPARE operator: 
http://www.chemaxon.com/jchem/doc/guide/cartridge/cartapi.html#jc_compare_filterQuery
This basically means that you pack the non-structure-search conditions (the project and user specific conditions) in a SELECT statement and then make this SELECT statement the value of the 
filterQuery option. This way, even though JC_COMPARE will still be executed in domain index scan mode, internally it will apply substructure search only to the structures returned by the 
filterQuery. This will result both in reduced time spent on substructure search and in (typically) substantially less hits returned by the domain index scan.	
	
 
	
		User 325f2762fd
		18-04-2008 18:06:37
	 
	
	
	Fantastic. The query takes 3 minutes now after changing the query as per your suggestion.
Thanks for your help. See below the change made to the query
 	  | Code: | 
		  | SELECT          COMP.compound_id, COMP.smiles, 
 
 COMP.h_bond_acceptors, COMP.h_bond_donors,
 
 
 COMP.rotatable_bonds, COMP.clogp,
 
 
 COMP.molecular_weight, COMP.synthesisable
 
 
 FROM evoseek.COMPOUND comp
 
 
 WHERE jc_compare (COMP.smiles, 'c1ccccc1','sep=! t:s!filterQuery:select compound.rowid
 
 
 FROM evoseek.COMPOUND,
 
 
 evoseek.PROJECT_COMPOUNDS,
 
 
 evoseek.PROJECTS,
 
 
 evoseek.VM_USER_PROJECT_ACCESS
 
 
 WHERE COMPOUND.compound_id = PROJECT_COMPOUNDS.compound_id
 
 
 AND PROJECT_COMPOUNDS.project_id = PROJECTS.project_id
 
 
 AND evoseek.VM_USER_PROJECT_ACCESS.project_id = PROJECTS.project_id
 
 
 AND evoseek.VM_USER_PROJECT_ACCESS.user_id = 434') = 1
 | 
What is sep=! mean and also ! sign after t:s mean?	
	 
	
		ChemAxon aa7c50abf8
		18-04-2008 18:15:13
	 
	
	
	sep=! means that the options in the option list (the third argument of the 
jc_compare operator) will be separated by the '!' string (which consists in this case of just one character). By default, the options in the list are separated by space(s) which is not appropriate for option values which themselves may contain spaces (such as a SQL statemet). For further details please, see 
http://www.chemaxon.com/jchem/doc/guide/cartridge/index.html#option_list_parameter.