jc_compare filterquery problems

User f698d0529d

25-08-2005 10:05:23

Hi


Just to advise you that the problems I am having are preventing me from migrating one of our main applications to use the JChem cartridge. I am prepared to completely change my queries to make them JChem compatible. The problem is that working out what works with JChem is very difficult.





My latest bug discovery:





the filterQuery option in jc_compare does not work properly. And it produces inconsistent results.





Code:



A. select v.VERSION_ID, v.isosmiles from eoaiacd_v15.version v where


v.MOLECULAR_WEIGHT >= 464 and v.MOLECULAR_WEIGHT <= 470 and v.MOLECULAR_FORMULA = 'C16H11N3Na2O7S2' and jchem.jc_contains(v.ISOSMILES, 'Nc1ccccc1') = 1;


- 1 row, version id 94401, isosmiles [Na+].[Na+].Nc1cc(cc2cc(c(N=Nc3ccccc3)c(O)c12)S(=O)(=O)[O-])S(=O)(=O)[O-]





B. select rowid, v.VERSION_ID from eoaiacd_v15.version v where


v.MOLECULAR_WEIGHT >= 464 and v.MOLECULAR_WEIGHT <= 470 and jchem.jc_contains(v.ISOSMILES, 'Nc1ccccc1') = 1; - 815 rows





C. select v.Parent_ID, v.VERSION_ID, v.ISOSMILES, v.MOLECULAR_FORMULA from version v where


jchem.jc_compare(v.ISOSMILES, 'Nc1ccccc1',


'sep=! t:s!filterQuery:


select rowid from eoaiacd_v15.version v where


v.MOLECULAR_WEIGHT >= 464 and v.MOLECULAR_WEIGHT <= 470'


) = 1; - 815 rows.





D select v.Parent_ID, v.VERSION_ID, v.ISOSMILES, v.MOLECULAR_FORMULA from version v where


jchem.jc_compare(v.ISOSMILES, 'Nc1ccccc1',


'sep=! t:s!filterQuery:


select rowid from eoaiacd_v15.version v where


v.MOLECULAR_WEIGHT >= 460 and v.MOLECULAR_WEIGHT <= 470'


) = 1; - 1405 results.








So far so good. Queries B and C are equivalent, and produce the same results, although C is much faster. Since Jchem does not interact with the optimiser, this is understandable.





But





Code:



E. select v.Parent_ID, v.VERSION_ID, v.ISOSMILES, v.MOLECULAR_FORMULA from version v where


jchem.jc_compare(v.ISOSMILES, 'Nc1ccccc1',


'sep=! t:s!filterQuery:


select rowid from eoaiacd_v15.version v where


v.MOLECULAR_WEIGHT >= 464 and v.MOLECULAR_WEIGHT <= 470


and v.MOLECULAR_FORMULA = ''C16H11N3Na2O7S2'''


) = 1;





no rows returned. 








This should be equivalent to query A





However, it gets worse.





The result of query E seems to depend on which query you have run before. If you run C, then E, E gives 815 rows. You can see in the results that the molecular formulae do not match what was asked for. If you run D, then E, you get no rows again.





I notice that you have to precede the filterQuery SQL tablenames with the name of the owner schema, otherwise you get a “table or view does not exist” error. I figured this was because the filterQuery operation was connecting back to the database as the Jchem cartridge owner schema.





If you connect as jchem, and run this,





Code:



select v.ISOSMILES, v.MOLECULAR_FORMULA, v.MOLECULAR_WEIGHT from eoaiacd_v15.version v where


v.MOLECULAR_FORMULA = 'C16H11N3Na2O7S2';








You get one row.





ISOSMILES MOLECULAR_FORMULA MOLECULAR_WEIGHT


[Na+].[Na+].Nc1cc(cc2cc(c(N=Nc3ccccc3)c(O)c12)S(=O)(=O)[O-])S(=O)(=O)[O-] C16H11N3Na2O7S2 467.389435

User f698d0529d

25-08-2005 10:41:07

Hi


The ctFilter option does not seem to work properly either.





A select v.ISOSMILES, v.VERSION_ID from version v where jchem.jc_contains(v.ISOSMILES, 'Nc1ccccc1') = 1 and


jchem.jc_contains(v.ISOSMILES, 'Oc2ccccc2') = 1 and jchem.jc_contains(v.ISOSMILES, 'S') = 1;





--several rows are returned. eg


Nc1cc2ccccc2c(c1O)S(=O)(=O)O 2859


[O-][N+](=O)c1cccc(Cl)c1Oc2ccc(cc2)S(=O)(=O)Cl 5974


S=C=Nc1ccc(OCc2ccccc2)cc1 6066





B select v.ISOSMILES, v.VERSION_ID from version v where


jchem.jc_compare(v.ISOSMILES, 'Nc1ccccc1',


'sep=! t:s!ctFilter:(match(''Oc2ccccc2'')&&match(''S''))'


) = 1;





This should be equivalent to A, but it suffers from a similar problem as the filterQuery option, above


--sometimes no rows are returned, sometimes a set of spurious results is returned, eg


Oc1ccc(cc1[N+](=O)[O-])[N+](=O)[O-] 2919


NNc1ccccc1 2925


OC(=O)c1ccc(CBr)c(c1)[N+](=O)[O-] 2961





I forgot to mention my system specifications.





Oracle 9.2.0.6


Red Hat Enterprise Linux ES release 3 (Taroon Update 4)


Kernel 2.4.21-27.0.2.ELsmp on an i686





Jchem 3.0.14


Tomcat 4.1.31


Java 1.5.0_04


Tomcat is using the latest JDBC driver, ojdbc14.jar, as that seems to be faster than classes12.jar, and was the outcome of the TAR I had with Oracle about the hanging Jchem queries.

User f698d0529d

25-08-2005 10:46:12

Hi


I am just going to check if the ojdbc14.jar is compatible with this Oracle version and if the problem goes away if I revert to classes12.jar.


Mark

ChemAxon aa7c50abf8

25-08-2005 11:46:35

Mark,





I could identify the bug in JChem Cartridge causing the result set corruption in case of statements like:





Code:
select v.ISOSMILES, v.VERSION_ID from version v where jchem.jc_contains(v.ISOSMILES, 'Nc1ccccc1') = 1 and


jchem.jc_contains(v.ISOSMILES, 'Oc2ccccc2') = 1 and jchem.jc_contains(v.ISOSMILES, 'S') = 1;






or





Code:
SELECT v.version_ID FROM VERSION v WHERE jc_contains(v.ISOSMILES, 'FC(F)F') = 1


intersect 


SELECT v.version_ID FROM VERSION v WHERE jc_contains(v.ISOSMILES, 'COc1cccc(OC)c1') = 1






The problem is that the individual result sets from each jc_contains (or jc_compare) index scan in the same SQL statement are interfering with each other.





I am still investigating the issue with "single-jc_compare" SELECTs using ctFilter or filterQuery.





Peter

User f698d0529d

25-08-2005 12:11:52

Peter


Two points.





1. It is definitely not the upgraded driver that is causing the problem I reported here. I reverted to classes12.jar from the Oracle installation and there is no difference in behaviour, apart from speed (classes12.jar is slower).





Also, Oracle say that the driver/RDBMS combination is supported





http://www.oracle.com/technology/tech/java/sqlj_jdbc/htdocs/jdbc_faq_0.htm#02_02





2. You said that you have "identified a bug causing result set corruption is case of statements like... " and gave two statements. As far as I am aware, only the second statement, using the intersect, has a problem. The first statement works OK (although it is too slow for my use).





Thanks for your help.


Mark

ChemAxon aa7c50abf8

25-08-2005 12:30:03

Quote:
2. You said that you have "identified a bug causing result set corruption is case of statements like... " and gave two statements. As far as I am aware, only the second statement, using the intersect, has a problem. The first statement works OK (although it is too slow for my use).
Thank you for pointing this out.

ChemAxon aa7c50abf8

25-08-2005 12:49:20

I am having difficulty reproducing the problems. Though I am trying it on Oracle 10.1.0.4 Standard. (I also have 9.2.0.7 on Windows, but that is also slightly different from your environment.)





Do you think that if I repeat the statements in your first posting (A. through E.) randomly, the problem should occur? Do you execute these statements sequentially in various orders? Or do you also have now and then other statements between them?

ChemAxon aa7c50abf8

25-08-2005 12:56:04

I was trying with a different JChem version. I am going to try with the same as you.

User f698d0529d

25-08-2005 13:09:52

Run them in this order, waiting for the results each time, and you should reproduce the problem.


A, E, C, E, D, E





The E query should vary between giving no results and giving spurious results. It never gives the right result, which is one row only.

ChemAxon aa7c50abf8

25-08-2005 14:34:33

With JChem 3.0.14, statement E. invariably gives me wrong results which are invariably "no rows selected". This is due to a bug in the code which parses the options for jc_compare. This affects all options whose value is case-sensitive. This bug has been fixed in JChem version 3.1.





I am still unable to reproduce the spurious nature of the problem, not even with the sequence you suggested.

User f698d0529d

25-08-2005 15:38:49

I am investigating if it is just an artefact of using TOAD, or if it it occurs also when I write some java code. I will let you know.


Thanks


Mark

User f698d0529d

25-08-2005 15:53:42

I have some new information for you. The problem is not specific to TOAD. It can be reproduced using the SQL worksheet function in JDeveloper.





It is simpler than I thought. If you execute query C, then E, the problem happens. However, if you execute query C, but traverse to the bottom of the cursor (815 rows, in my case) and then execute query E, the problem does not happen.





The problem cannot be reproduced using SQL Plus, because every time you execute a query in that application, the entire cursor is read. I think by default in applications like TOAD and JDeveloper, only the first few rows are read until you do something to make it read more.





I will write some vb.net code to see if I can reproduce the problem there. The application I wish to migrate is a vb.net application, so it would be best to try that first.





Mark

User f698d0529d

25-08-2005 16:26:16

I am afraid the problem can be reproduced using vb.net





The first code (posted below as code 1) executes query C, traverses to the end of the cursor, closes the cursor and then executes query E.





It produces this output





select v.Parent_ID, v.VERSION_ID, v.ISOSMILES, v.MOLECULAR_FORMULA from version


v where jchem.jc_compare(v.ISOSMILES, 'Nc1ccccc1', 'sep=! t:s!filterQuery:select


rowid from eoaiacd_v15.version v where v.MOLECULAR_WEIGHT >= 464 and v.MOLECULA


R_WEIGHT <= 470') = 1


Retrieved 815 rows


select v.Parent_ID, v.VERSION_ID, v.ISOSMILES, v.MOLECULAR_FORMULA from version


v where jchem.jc_compare(v.ISOSMILES, 'Nc1ccccc1', 'sep=! t:s!filterQuery:select


rowid from eoaiacd_v15.version v where v.MOLECULAR_WEIGHT >= 464 and v.MOLECULA


R_WEIGHT <= 470 and v.MOLECULAR_FORMULA = ''C16H11N3Na2O7S2''') = 1


Retrieved 0 rows





If you change the code (code 2 below) so that it executes query C, but does not traverse the cursor, and then executes query E, then the spurious result occurs in vb.net as well.





This gives the output





select v.Parent_ID, v.VERSION_ID, v.ISOSMILES, v.MOLECULAR_FORMULA from version


v where jchem.jc_compare(v.ISOSMILES, 'Nc1ccccc1', 'sep=! t:s!filterQuery:select


rowid from eoaiacd_v15.version v where v.MOLECULAR_WEIGHT >= 464 and v.MOLECULA


R_WEIGHT <= 470 and v.MOLECULAR_FORMULA = ''C16H11N3Na2O7S2''') = 1


Retrieved 815 rows





Code 1


Code:



    Sub Main()


        Dim cn As New OleDbConnection("Provider=""MSDAORA.1"";User ID=eoaiacd;Password=eoaiacd;Data Source=ccdtest")


        Dim cmd As New OleDbCommand("select v.Parent_ID, v.VERSION_ID, v.ISOSMILES, v.MOLECULAR_FORMULA from version v where jchem.jc_compare(v.ISOSMILES, 'Nc1ccccc1', 'sep=! t:s!filterQuery:select rowid from eoaiacd_v15.version v where v.MOLECULAR_WEIGHT >= 464 and v.MOLECULAR_WEIGHT <= 470') = 1", cn)


        Dim iCounter As Integer = 0


        Try


            cn.Open()


            Dim dr As OleDbDataReader = cmd.ExecuteReader


            While dr.Read


                iCounter += 1


            End While


            dr.Close()


            Console.WriteLine(cmd.CommandText)


            Console.WriteLine("Retrieved " & iCounter & " rows")


            cmd.CommandText = "select v.Parent_ID, v.VERSION_ID, v.ISOSMILES, v.MOLECULAR_FORMULA from version v where jchem.jc_compare(v.ISOSMILES, 'Nc1ccccc1', 'sep=! t:s!filterQuery:select rowid from eoaiacd_v15.version v where v.MOLECULAR_WEIGHT >= 464 and v.MOLECULAR_WEIGHT <= 470 and v.MOLECULAR_FORMULA = ''C16H11N3Na2O7S2''') = 1"


            dr = cmd.ExecuteReader


            iCounter = 0


            While dr.Read


                iCounter += 1


            End While


            dr.Close()


            Console.WriteLine(cmd.CommandText)


            Console.WriteLine("Retrieved " & iCounter & " rows")


        Catch ex As Exception


            Console.WriteLine("Error occured during processing of command")


            Console.WriteLine(cmd.CommandText)


            Console.WriteLine(ex.ToString)


        Finally


            cmd.Dispose()


            cn.Close()


        End Try


        Console.ReadLine()


    End Sub








Code2


Code:



  Sub Main()


        Dim cn As New OleDbConnection("Provider=""MSDAORA.1"";User ID=eoaiacd;Password=eoaiacd;Data Source=ccdtest")


        Dim cmd As New OleDbCommand("select v.Parent_ID, v.VERSION_ID, v.ISOSMILES, v.MOLECULAR_FORMULA from version v where jchem.jc_compare(v.ISOSMILES, 'Nc1ccccc1', 'sep=! t:s!filterQuery:select rowid from eoaiacd_v15.version v where v.MOLECULAR_WEIGHT >= 464 and v.MOLECULAR_WEIGHT <= 470') = 1", cn)


        Dim iCounter As Integer = 0


        Try


            cn.Open()


            Dim dr As OleDbDataReader = cmd.ExecuteReader


            'While dr.Read


            'iCounter += 1


            'End While


            dr.Close()


            'Console.WriteLine(cmd.CommandText)


            'Console.WriteLine("Retrieved " & iCounter & " rows")


            cmd.CommandText = "select v.Parent_ID, v.VERSION_ID, v.ISOSMILES, v.MOLECULAR_FORMULA from version v where jchem.jc_compare(v.ISOSMILES, 'Nc1ccccc1', 'sep=! t:s!filterQuery:select rowid from eoaiacd_v15.version v where v.MOLECULAR_WEIGHT >= 464 and v.MOLECULAR_WEIGHT <= 470 and v.MOLECULAR_FORMULA = ''C16H11N3Na2O7S2''') = 1"


            dr = cmd.ExecuteReader


            iCounter = 0


            While dr.Read


                iCounter += 1


            End While


            dr.Close()


            Console.WriteLine(cmd.CommandText)


            Console.WriteLine("Retrieved " & iCounter & " rows")


        Catch ex As Exception


            Console.WriteLine("Error occured during processing of command")


            Console.WriteLine(cmd.CommandText)


            Console.WriteLine(ex.ToString)


        Finally


            cmd.Dispose()


            cn.Close()


        End Try


        Console.ReadLine()


    End Sub








Mark

ChemAxon aa7c50abf8

26-08-2005 08:11:12

Mark,





Thank you very much for your extensive research and your precise definition of the problem!!!





In fact the problem occurs with every query, which returns no results and is immediately preceded by another query whose results have not been fully fetched. This problem can be reproduced with jc_contains as well. The behavior you have seen with your original queries is related to the jc_compare options only to the extent that it was the combination of two unrelated bugs: one already known bug in the option parser, which is already fixed in JChem 3.1 (but not yet in 3.0.14) and another new bug. This latter will be fixed in the next JChem release.





Thank you again!





Peter