Compare structure between two tables in oracle

User 325f2762fd

29-04-2008 15:39:52

We are running the following query to find out any structural discrepencies. But having looked into the explain plan oracle is not using jchem domain index to retrieve data. Is there any operator available that can compare structures held in two tables?





Example : The parent smiles is molecule A. The version smiles is molecule A. molecule B. Here we are trying to identify rows where parent smiles is not included in version smiles.





SELECT *


FROM r_v_smiles a, r_p_smiles b


WHERE jc_compare (a.smiles, b.smiles, 't:ef') <> 1


AND b.db_no = a.parent_db_no








Rajeev & Catherine

ChemAxon aa7c50abf8

30-04-2008 09:51:33

Rajeev & Catherine,





Let me re-formulate the requirement and see if I understood it correctly:





You have two tables continaing chemical structures. You want to join the tables on some condition(s) (typically "non-chemical" condition(s)), then compare the structures at each juncture point. Expressing this in a function signature would give:


Code:



jc_compare_tables(<table1-name>, <struct-col1-name>, <table2-name>, <struct-col2-name>, <join-condition>, <search-options>) RETURN NUMBER






Is this more or less what you mean?





Thanks


Peter

User 325f2762fd

30-04-2008 10:33:53

Yes and search-options in case would be structure column in one table with the structure column in another table. And to be me more precise, "comparing with" is not a literal but dynamic can vary as it is a structure column in a table

ChemAxon aa7c50abf8

30-04-2008 18:46:19

Not entirely sure I understand your previous comment, but what I meant by <search-options> is a subset of the options accepted by jc_compare as its third argument. For search types where a distinction has to be made between target and query (such as substructure search), the first table would hold the target structures, the second the query structures.





Also, I've realized that you will probably want to retrieve all of the non-matches from one single call of the function. The most generic solution for this purpose is probably to store the non-matches in a table. To handle this, I'd add another paramater to the signature: <other-options>. So the new signature proposed is:
Code:
jc_compare_tables(<table1-name>, <struct-col1-name>, <table2-name>, <struct-col2-name>, <join-condition>, <search-options>, <other-options>) RETURN NUMBER






Applying this to your concrete example, the actual call would look something like:
Code:
jc_compare_tables('r_v_smiles', 'smiles', 'r_p_smiles', 'smiles', 'r_v_smiles.db_no = r_p_smiles.parent_db_no', 't:ef', 'outputTable:rvrp_nonmatch outputColumns:r_v_smiles.db_no')






The call would return the number of non-matching structures and store the non-matching structures in the rvrp_nonmatch table. The outputColumns options (a comma separated list of qualified column names) would tell which fields of each non-matching record to store in the outputTable.





Does this makes sense?





Peter

ChemAxon aa7c50abf8

30-04-2008 19:23:41

Another question which came to my mind when thinking about the options available with the current JChem Cartridge API for making this particular query more efficient:





I have the impression that the r_v_smiles and the r_p_smiles entities have a many-to-one relationship (when related through their parent_db_no and db_no columns respectively). Is this correct?





Thanks


Peter

User 325f2762fd

01-05-2008 08:08:34

Yes r_p_smiles and r_v_smiles have 1 to many relationship. Is JC_COMPARE_TABLES available in new jchem version?





We have two kind of requirement here


1) To find structure search between two tables that do not match


2) To find structure search between two tables that do match


For example for option 2 source and target could be same table and condition would limit the source. The aim is to find structural duplicates with in the same table.





Catherine will discuss it further at the user group conference meeting next week.





Thanks


Rajeev & Catherine

ChemAxon aa7c50abf8

02-05-2008 09:53:22

Quote:
We have two kind of requirement here


1) To find structure search between two tables that do not match


2) To find structure search between two tables that do match
So one of the <other-options> should be something like "lookForMatches" with a default value of "n" meaning that non-matching structures will be reported by default. Fair enough.
Quote:
For example for option 2 source and target could be same table and condition would limit the source. The aim is to find structural duplicates with in the same table.
In order for the syntax to support "self-joins" in a flexible and unambiguous way, the implementation can probably not get around supporting table aliases. (And this is probably not too hard to support.) So you should be able to specify something like the following:





Code:
jc_compare_tables('r_v_smiles a', 'smiles', 'r_v_smiles b', 'smiles', 'a.parent_db_no = b.parent_db_no', 't:ef', 'lookForMatches:y outputTable:rvrv_match outputColumns:a.db_no,b.db_no')






(And this makes me realize that in my previous example I inadvertantly swapped the column names between the two tables in the <join-condition> parameter. I will correct it.)





Does this make sense? Am I overlooking something?
Quote:
Is JC_COMPARE_TABLES available in new jchem version?
It is not. We are just in the phase of defining the requirements. Thank you for helping us do it!
Quote:
Yes r_p_smiles and r_v_smiles have 1 to many relationship.
I am still thinking about ways to implement your original use case with the currently available JChem Cartridge API until a dedicated function is implemented.
Quote:
Catherine will discuss it further at the user group conference meeting next week.
I am looking forward to it.





Thanks


Peter

ChemAxon aa7c50abf8

11-09-2009 12:50:16

Returning to the original problem, the solution appears to be turning out a simple one: instead of the "<> 1" predicate, use "= 0" like this:


SELECT * FROM r_v_smiles a, r_p_smiles b
WHERE jc_compare (a.smiles, b.smiles, 't:ef earlyResults:0') = 0
AND b.db_no = a.parent_db_no


In order to further improve performance,  you can try and move the db_no equality condition into a filterQuery option like this:


SELECT * FROM r_v_smiles a, r_p_smiles b
WHERE jc_compare (a.smiles, b.smiles,
'sep=! earlyResults:0!t:ef!filterQuery:select rowid from
r_v_smiles where parent_db_no = '
|| to_char(b.
db_no)) = 0

Let me know if this helps,


Peter