Mass structure registration hangs on some computers

User 77bcbbb206

10-10-2005 22:29:43

I have a very confusing problem that has been quite difficult to track down (or replicate consistently), but I'm going to explain it, and hopefully it will make sense.





JCHEM_CORE_PKG.GETENVIRONMENT()


--------------------------------------------------------------------------------


Oracle9i Enterprise Edition Release 9.2.0.5.0 - Production


PL/SQL Release 9.2.0.5.0 - Production


CORE 9.2.0.6.0 Production


TNS for 32-bit Windows: Version 9.2.0.5.0 - Production


NLSRTL Version 9.2.0.5.0 - Production


NLSRTL Version 9.2.0.5.0 - Production


JChem version in the database: 3.1.1


JChem version in the Tomcat server: 3.1.1


java.vm.version: 1.4.2_09-b05


java.vm.vendor: Sun Microsystems Inc.


Apache Tomcat/4.1.31


Major JDBC version in Tomcat: 10


Minor JDBC version in Tomcat: 2





The problem I'm having is during the initial setup of the application schema I'm using, there is a mass-registration of structures (~ 450K) into a JChem table. The registration hangs indefinitely at different structures on different registration attempts. Let me attempt to explain further. Initial installation includes: JDK install, Apache install, Oracle install, and JChem install. Everything works fine from an application install perspective. Once JChem is installed, the application schema is installed. The application schema contains a structure table which is registered into the JChem table using jc_insert and then the structure images are generated in PNG format using jc_molconvertb and stored in the non-JChem structure table. This registration process is the part described above that hangs.





There is some good news and some bad news. This process only hangs on certain servers not all servers. That is the bad news. The good news is that it always hangs on the servers that have the problems and never hangs on the servers that don't. I've attempted (for almost five full days now), to figure out what the differences are between the working and non-working server to no avail. I am hoping someone has attempted this before and had the same problems I'm having.





The mass registration doesn't seem to be the problem. It's the mass generation of the binary PNG images that, I think, is causing the disruption. If the process hangs, I can open a command prompt and use the "netstat -a" command and see that on the local server's 8080 port there is a "CLOSE_WAIT" status instead of "ESTABLISHED" or "LISTENING". I think this is where the hanging is coming from. Somehow the TCP port is not being closed on both sides of the connection and the mass image generation is not able to continue. This, in turn, does not allow the registration to continue.





I've attempted to upgrade the JDBC driver to 10g (which you should see above). I've attempted to upgrade to Tomcat 4.1, 5.0, and 5.5. Nothing has helped.





There are no errors thrown and no entries in any log file I can see (Apache or Oracle).





Please ask any questions you need. I'm really running out of ideas here.





-Jim

ChemAxon aa7c50abf8

11-10-2005 12:17:45

Jim,





My working assumption is that some kind of resource exhaustion causes the registration process to fail (or wait indefinitely) on certain servers. Those servers may have less of the critical resource (e. g. temporary tablespace) than the others (or there are more consumers of the resource).





How do you call the jc_molconvertb function? In a PL/SQL procedure? (If so, you probably use jcf_molconvertb function rather than the jc_molconvertb operator.) What is your SQL statement?





After you're done with inserting the image, you call dbms_lob.freetemporary for the blob returned by jc_molconvertb, I presume. You can check the number of existing temporary lobs with:


Code:
select * from sys.v$temporary_lobs



Does this statement show any increase in temporary blobs?





Please, could you check with the following sql statement to see which event the hanging Oracle session is waiting for (after it gets stuck)?:


Code:
select swait.* from v$session sess, v$session_wait swait


    where sess.username = 'JCHEMUSER31HEAD'


                and sess.sid = swait.sid;



(Replace JCHEMUSER31HEAD in the above statement with your username.)





Peter

User 77bcbbb206

11-10-2005 17:00:28

Peter,





Thanks for replying so quickly.





I've reduced my procedure down to a small anonymous block for testing purposes:





Code:
set serveroutput on


declare


   v_cursor NUMBER;


   v_return NUMBER;


   --


   v_structure_smile CHEM_STRUCTURE.STRUCTURE_SMILES%TYPE;


   v_structure_img CHEM_STRUCTURE.STRUCTURE_IMG%TYPE;


   --


begin


   v_cursor := DBMS_SQL.OPEN_CURSOR;


   DBMS_SQL.PARSE( v_cursor, 'select structure_smiles from chem_structure', DBMS_SQL.native );


      --


   DBMS_SQL.DEFINE_COLUMN( v_cursor, 1, v_structure_smile, 2048 );


      --


      v_return := DBMS_SQL.EXECUTE( v_cursor );


      --


      LOOP


         IF DBMS_SQL.FETCH_ROWS( v_cursor ) > 0 THEN


            --


            DBMS_SQL.COLUMN_VALUE( v_cursor, 1, v_structure_smile );


            --


            v_structure_img := jcf_molconvertb(v_structure_smile, 'png');


            dbms_lob.freetemporary(v_structure_img);


         ELSE -- no more rows, exit


            EXIT;


         END IF;


      END LOOP;


      DBMS_SQL.CLOSE_CURSOR( v_cursor );


end;


/








There are about 20K rows in the table "CHEM_STRUCTURE". I'm calling jcf_molconvertb and immediately freeing the blob returned.





Code:
SQL> select * from sys.v$temporary_lobs;





       SID CACHE_LOBS NOCACHE_LOBS


---------- ---------- ------------


        14          0          967






The NOCACHE_LOBS increased from 0 to 967 and have remained (I am letting the procedure run).








Code:
SID, SEQ#,                       EVENT,      P1TEXT,         P1,    P1RAW


  8,  233, SQL*Net message from client,   driver id, 1413697536, 54435000


 14, 1100,           direct path write, file number,        201, 000000C9





    P2TEXT,     P2,    P2RAW,    P3TEXT, P3,    P3RAW, WAIT_TIME, SECONDS_IN_WAIT, STATE


    #bytes,      1, 00000001,          ,  0,       00,         0,             163, WAITING


 first dba, 511825, 0007CF51, block cnt,  1, 00000001,        -1,              49, WAITED KNOWN TIME






I've tested this a few times and the NOCACHE_LOBS max value is ALWAYS 967, which I'm assuming is the resource limit. The problem remains though? I realize this is asking a bit much, but do you know why the LOBS would not be freed? I will continue to investigate.





Again, thanks for all your help,


Jim

ChemAxon aa7c50abf8

11-10-2005 19:09:29

Jim,





What you have found is very similar to the symptoms I was able to reproduce. The only difference is that my session is waiting for direct path read (instead of direct path write) and the "limit" on my test machine is 1441 temporary BLOBS. The "limit" probably depends on the size of your temporary BLOBs and on the size of the free temporary tablespace. (If you have a large enough temporary tablespace, you simply do not notice.) The problem of the dbms_lob.freetemporary stored procedure having no effect occurs therefore not only on some of your servers but everywhere -- even on 10g databases.





The source of the problem is probably due to the fact that the temp BLOB returned by jc_molconvertb is created in a Java stored procedure. I have been using a Java client to test jc_molconvertb. When the temporary BLOBs are freed within the Java client, the number of temp blobs, as seen in v$temporary_lobs, never goes above 10.





I did not find anything in the Oracle docs suggesting that temp LOBs should be freed in the same runtime environment where they were created. Regular LOBs (or more accurately: LOB locators) can be passed around with no problem between runtimes. So can locators for temporary LOBs -- except that freeing does not seem to work across runtime environments.





This last statement is also defective, since my Java client which successfully frees tmp BLOBs runs definitely in an environment distinct from the Oracle Java stored procedure where jc_molconvertb is implemented. On the other hand, passing the temp BLOB from PL/SQL to a Java stored proc whose sole function is to free the tmp blobs does not seem to help.





The only approach which promises success is adding a BLOB parameter to jc_molconvertb which can be used to pass a tmp BLOB created in PL/SQL to the Java stored proc (instead of the tmp blob being created in the Java stored proc). (This way the tmp blob is both created and freed by the caller PL/SQL.)





The weird thing is that:





Code:
select dbms_lob.getlength(jc_molconvertb(structure, 'png')) from a_large_table;






does not produce any increase in tmp lobs (as per v$temporary_lobs) when executed on the sqlplus command line, even though there is apparently nothing which would free the tmp blobs returned by jc_molconvertb.





Peter

ChemAxon aa7c50abf8

12-10-2005 08:10:18

Quote:
The weird thing is that:





Code:


select dbms_lob.getlength(jc_molconvertb(structure, 'png')) from a_large_table;








does not produce any increase in tmp lobs (as per v$temporary_lobs) when executed on the sqlplus command line, even though there is apparently nothing which would free the tmp blobs returned by jc_molconvertb.
For the above, there is an explanation here:


http://download-west.oracle.com/docs/cd/B10501_01/appdev.920/a96591/adl07mds.htm#135757





It says:
Quote:
Returned temporary LOBs automatically get freed at the end of a PL/SQL program block.
Peter

ChemAxon aa7c50abf8

12-10-2005 12:15:40

Quote:
The "limit" probably depends on the size of your temporary BLOBs and on the size of the free temporary tablespace. (If you have a large enough temporary tablespace, you simply do not notice.)
I have added datafiles to both the temporary tablespace and the system tablespace and the image generation still hangs after the same number of calls to jcf_molconvertb.





More importantly, I have tested the solution (outlined above) by which the number of non-freed temporary BLOBs can be kept under limits and the session still hangs at the same number of iterations. This means that it is not the number of temporary BLOBs which causes this problem.





Peter

ChemAxon aa7c50abf8

12-10-2005 16:07:58

Jim,





It looks like upgrading to Tomcat 5.5.12 and JDK 1.5.0_05 solves the problem. I am at the beginning of the test (I am going to leave it running for the night), but it's already been running much further than before.





Peter

User 77bcbbb206

12-10-2005 16:09:42

That makes me feel a little better. As a "Quick Fix" solution, I also increased the Temp and System tablespace size. I actually got the limit to increase to a upper value of ~27K from 967, but an upper limit still remains.





It kind of seems like the temp lob extents are not being reached and increased...





Are we absolutely sure that the NOCACHE_LOBS increasing within a certain session is a very bad thing? Bad enough to hang an entire session?

User 77bcbbb206

12-10-2005 17:49:14

I just tried the Tomcat 5.5 implementation again and came to the same conclusion. The upper limit is increased, but still remains. After a few minutes, all the tcp connections are closed except for:





Code:
Proto  Local Address          Foreign Address        State


TCP    kdev2:4457             kdev2.kelaroo.com:8080  CLOSE_WAIT





(kdev2 is my computer name)






which never actually completely closes. (kdev2 is my computer name) Everytime a session hangs, another entry is made in the tcp connection table and remains until the computer is restarted (not just Tomcat).





The registration process got a read timeout though, maybe this will help?





Code:
java.net.SocketTimeoutException: Read timed out


   at java.net.SocketInputStream.socketRead0(Native Method)


   at java.net.SocketInputStream.read(Unknown Source)


   at org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:747)


   at org.apache.coyote.http11.InternalInputBuffer$InputStreamInputBuffer.doRead(InternalInputBuffer.java:777)


   at org.apache.coyote.http11.filters.IdentityInputFilter.doRead(IdentityInputFilter.java:115)


   at org.apache.coyote.http11.InternalInputBuffer.doRead(InternalInputBuffer.java:712)


   at org.apache.coyote.Request.doRead(Request.java:427)


   at org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:283)


   at org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:346)


   at org.apache.catalina.connector.InputBuffer.readByte(InputBuffer.java:292)


   at org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:92)


   at chemaxon.jchem.cartridge.servlets.JChemServletBase.checkJChemVersions(JChemServletBase.java:269)


   at chemaxon.jchem.cartridge.servlets.JChemServletBase.doPost(JChemServletBase.java:228)


   at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)


   at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)


   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)


   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)


   at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)


   at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)


   at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)


   at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)


   at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)


   at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)


   at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:856)


   at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.processConnection(Http11Protocol.java:744)


   at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)


   at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)


   at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)


   at java.lang.Thread.run(Unknown Source)

ChemAxon aa7c50abf8

13-10-2005 13:40:54

Jim,
Quote:
As a "Quick Fix" solution, I also increased the Temp and System tablespace size. I actually got the limit to increase to a upper value of ~27K from 967, but an upper limit still remains.


Enlarging these tablespaces did not help in my case.
Quote:
Are we absolutely sure that the NOCACHE_LOBS increasing within a certain session is a very bad thing? Bad enough to hang an entire session?


Not at all any more. My attention has completely shifted from Oracle's life cycle handling of the temporary LOBs to the Oracle-Tomcat communication. I was misled by the description of the "direct path read/write (lob)" events in the Oracle documentation. I was further pushed in the direction of LOBs by the discovery of what I consider a strong evidence of a bug in Oracle's handling of LOB life cycle.





Presently, I believe that the hanging session's waiting for "direct path (lob)" events means in our case waiting for networking I/O and not disk I/O: reading/writing from/to the network connection with Tomcat is probably involved -- instead of writing from/to disks or LOBs as the Oracle documentation implies.
Quote:
I just tried the Tomcat 5.5 implementation again and came to the same conclusion. The upper limit is increased, but still remains.
In my environment, upgrading both Tomcat and the JDK does seem to solve the problem. My test program currently iterates over the 236k-th call to jcf_molconvertb, which makes me optimistic when I compare this to the 1.4k iteration before the upgrade. (My target is 400k iterations. There was a power outage in our offices last night so I had to restart the test this morning.)





Did you use the most recent JDK (version 1.5.0_05) with Tomcat 5.5? Actually, I think the JDK version is much more important here than the Tomcat version. Also, when Oracle and Tomcat are on the same machine, I suggest to use "localhost" in the JChem Streams URL configured for Oracle instead of the domain name of the host (kdev2.kelaroo.com).
Quote:
The registration process got a read timeout though, maybe this will help?
I think this is basically the same problem as the one described in http://www.chemaxon.com/forum/ftopic485.html . There is some kind of incompatibility in the TCP/IP protocol handling between Oracle 9i and JDK 1.4. Oracle 10g with JDK 1.4 does not show this problem and now it seems that Oracle 9i with JDK 1.5 also works. (For some reason, there is a much higher likelihood for this problem to occur on Windows than on Linux, but I saw it occur on Linux as well.)





Peter

User 77bcbbb206

13-10-2005 18:29:45

Peter,





Thanks for working so diligently on this.





My problem still remains. I am currently using the most up to date of everything (I think).





Code:
JCHEM_CORE_PKG.GETENVIRONMENT()


-----------------------------------------------------------------------


Oracle9i Enterprise Edition Release 9.2.0.5.0 - Production


PL/SQL Release 9.2.0.5.0 - Production


CORE    9.2.0.6.0       Production


TNS for 32-bit Windows: Version 9.2.0.5.0 - Production


NLSRTL Version 9.2.0.5.0 - Production


NLSRTL Version 9.2.0.5.0 - Production


JChem version in the database: 3.1.1


JChem version in the Tomcat server: 3.1.1


java.vm.version: 1.5.0_05-b05


java.vm.vendor: Sun Microsystems Inc.


Apache Tomcat/5.5.9


Major JDBC version in Tomcat: 10


Minor JDBC version in Tomcat: 2






I have gotten to about 17K calls to jcf_molconvertb and the session seems to have hung itself. I changed all references of my machine name (kdev2) to localhost in hopes that some type of network communication would be avoided, but that didn't help. I've checked all of the log files from Tomcat and Oracle and nothing seems out of the ordinary.





If I open another session and attempt to run the procedure again, everything works, but the same limit is encountered. I can do this again and again with the same results.





I've been looking around a bit a am going to check about the PGA_AGGREGATE_TARGET size and see if the LOB access or table sort size is causing the block.





-Jim

ChemAxon aa7c50abf8

14-10-2005 16:24:28

Jim,





It turned out that the successful completion of my test case was not due the JDK-Tomcat upgrade, but rather to the fact that I left the connectionTimeout Tomcat-property at its default value: 20 seconds.





Inspecting the TCP/IP packets exchanged between Oracle and Tomcat I found the following:





Assume that the jcf_molconvertb test tends to hang at iteration 900. When connection timeout is disabled in Tomcat, Oracle sends an acknowledgment for the data containing the 899th image to Tomcat and stays there without sending the request for the 900th image.





When connection timeout is set to 20 seconds (e.g.), then Oracle sends nothing for the 900th image -- as in the case with disabled timeout. However, now, after 20 seconds idly passed by, Tomcat closes the connection and this seems to kick the Oracle side out of its state of apparent paralysis. Oracle sends acknowledgment to Tomcat about the connection being closed, opens a new connection and continues to send requests for the remaining images.





I am going to remove from my test the creation and reading of the temporary BLOBs to see if it makes any difference.





Peter

User 77bcbbb206

09-12-2005 17:13:00

Peter,





I'm sorry for bringing this topic back to life, but I got really busy and it's priority waned a bit. Now I have time, I'd like to get back to it...?





I have recently tested the setup you mentioned with the 20s connectionTimeout and still have the registration process freeze on me.





Code:
JCHEM_CORE_PKG.GETENVIRONMENT()


--------------------------------------------------------------------------------


Oracle9i Enterprise Edition Release 9.2.0.5.0 - Production


PL/SQL Release 9.2.0.5.0 - Production


CORE    9.2.0.6.0       Production


TNS for 32-bit Windows: Version 9.2.0.5.0 - Production


NLSRTL Version 9.2.0.5.0 - Production


NLSRTL Version 9.2.0.5.0 - Production


JChem version in the database: 3.1.1


JChem version in the Tomcat server: 3.1.1


java.vm.version: 1.5.0_05-b05


java.vm.vendor: Sun Microsystems Inc.


Apache Tomcat/5.5.12





JCHEM_CORE_PKG.GETENVIRONMENT()


--------------------------------------------------------------------------------


Major JDBC version in Tomcat: 10


Minor JDBC version in Tomcat: 2






Any thoughts or ideas? I'm going to refresh my memory a bit with options we've tried and see if I can come up with anything. Are you sure the connetion timeout solved your problems?

User 77bcbbb206

09-12-2005 19:34:13

I dropped my connectionTimeout value to 5s and was able to get at least one timeout (though I don't think it has anything to do with the freezing problem, just a connection that took longer than 5s).





Code:
Creating FpMaker...


2005-12-09 10:29:47.934 [null]: JDBC driver version: 10.2.0.1.0


java.net.SocketTimeoutException: Read timed out


   at java.net.SocketInputStream.socketRead0(Native Method)


   at java.net.SocketInputStream.read(Unknown Source)


   at org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:747)


   at org.apache.coyote.http11.InternalInputBuffer$InputStreamInputBuffer.doRead(InternalInputBuffer.java:777)


   at org.apache.coyote.http11.filters.IdentityInputFilter.doRead(IdentityInputFilter.java:115)


   at org.apache.coyote.http11.InternalInputBuffer.doRead(InternalInputBuffer.java:712)


   at org.apache.coyote.Request.doRead(Request.java:427)


   at org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:283)


   at org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:346)


   at org.apache.catalina.connector.InputBuffer.readByte(InputBuffer.java:292)


   at org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:104)


   at chemaxon.jchem.cartridge.servlets.JChemServletBase.checkJChemVersions(JChemServletBase.java:269)


   at chemaxon.jchem.cartridge.servlets.JChemServletBase.doPost(JChemServletBase.java:228)


   at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)


   at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)


   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)


   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)


   at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)


   at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)


   at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)


   at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)


   at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)


   at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)


   at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:868)


   at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:663)


   at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)


   at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)


   at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)


   at java.lang.Thread.run(Unknown Source)






I agree with you when you said Oracle seems to be in a paralyzed state. Is there any way for me to check that the connection gets closed on the Tomcat side or see any type of open connection state in Oracle. I've tried using a packet sniffer but have yet to make any sense of the communication. If you have any recommendations, I'd gladly take them.

ChemAxon aa7c50abf8

10-12-2005 20:31:30

Quote:
Are you sure the connetion timeout solved your problems?
With a time-out (say 20 seconds), any number of jc_molconvertb iterations seem to complete in my environment. However another problem surfaces: after the Oracle connection has been idle longer than the time-out value (e.g. 20 seconds), Tomcat drops the connection as expected, but Oracle seems to be unable to come up with the right sequence to establish a new TCP connection and the current database session can never be used again with JChem Cartridge. So in my environment there are actually two problems both apparently related to Oracle's implementation of the TCP state machine: one problem surfaces when the connection time-out is set in Tomcat, the other visits me when the time-out is disabled.
Quote:
I dropped my connectionTimeout value to 5s and was able to get at least one timeout (though I don't think it has anything to do with the freezing problem, just a connection that took longer than 5s).
In my environment, I can consistently reproduce a time-out error with Oracle 9i on Windows with a connection time-out value set in Tomcat by going through the following steps:


1.) Open a new database session


2.) Execute a structure search


3.) Leave the session idle for longer than the connection time-out value set in Tomcat


4.) Execute another structure search





In step 4., I always get the time-out error (and the structure search is aborted).
Quote:
Is there any way for me to check that the connection gets closed on the Tomcat side or see any type of open connection state in Oracle.
Unfortunately (???), this low level stuff is hidden by the APIs we are using. The most promising idea which occurred to me was that we could try to have Oracle reconnect to Tomcat each time an HTTP call is made. If Oracle does a better job re-connecting than keeping a connection alive, we would win. The Java API for HTTPConnections does not (probably intentionally) include a close() method. The only way I see to get Oracle to reconnect each time would be to force HTTP/1.0 instead of HTTP/1.1 (reusing the same connection seems to be a feature introduced with HTTP/1.1), but this did not seem to work: there is a way in the Java API to set the HTTP header (so I can insert/change the supported HTTP-version manually) but the Oracle side seems to ignore this (and maintains [as we see not very successfully] a single connection to Tomcat).





Is it not option for you to either upgrade Oracle to 10g or migrate your server away from Windows? I have extensively tested this problem on various combinations of Oracle and and Windows vs. Linux and it seems that this problem only affects this specific combination of Oracle 9i on Windows. Also, I am aware of at least one customer who has been using JChem Cartridge with Oracle 9i on Solaris for quite some time and this customer have not reported any problem of this kind so far.





By the way, which version of Windows do you use? I have only tested so far with Widnows XP. There is a chance (although I guess it is very slim) that the problem does not occur on, say, Windows Advanced Server. (I expect that I will have access to more Windows platforms some time next year.)





Peter

User 77bcbbb206

14-12-2005 04:54:57

Peter,





Please forgive me if I'm not completely clear, but I've been at this problem for what feels like a long time today.





I have a few questions and as my java knowledge is limited, they may seem simple but they may help. First, I finally think we've found our way to the root problem, which is good. It seems that Tomcat is closing its connections (or not depending on the connectionTimeout property). Oracle, or more correctly, Java in Oracle (I think), is not closing tcp connections that should time out. I can see this by looking at a "netstat -a" once the registration process has stalled and see that the connection state stays in CLOSE_WAIT even when Tomcat has been restarted. My question is, is Oracle's JVM running the java used in the JChem cartridge or is the JDK installed on the server running the functionality, or are they the same thing? Can the java end be configured at a server level or are any changes really application specific? Can we change the http client java is using to use whatever Oracle 10g is using without having to upgrade (if we know 10g doesn't have this problem)? I'm currently using the 10g JDBC drivers. Is the same basic trick available for this?





Is there any way something within the JChem http module is, once bombarded with connections, not handling their closure correctly or is this totally handled within the Java runtimes?





I have found this problem to exist on Windows 2003 server and Windows 2000 Professional (some and not all). You have found it to be on Windows XP. I think this problem is Microsoft Operating System agnostic.





I have not tested windows 2000 with Oracle 10g yet, as this is really not an option for our production environment, but I am quite curious. The application we are using is written in Microsoft's .NET and I am more versed in Microsoft's technology than Sun's so shifting to different operating systems is probably not going to happen without a large reason.





Last thought. This is a java problem...? I think, which means someone, somewhere must have run into it before. This leads me to believe there is a bug report or open incident somewhere that can be found. Any ideas on where this might be?

ChemAxon 9c0afc9aaf

14-12-2005 14:00:43

Jim,





Peter took a day off for today.


Until then I will try to answer some of your questions.





The internal Oracle JVM that runs Java stored procedures is a special Java implementation.


It has nothing to do with the JVM that runs Tomcat.


It is also different from the external Sun JVM that ships with Oracle, e.g.:


"C:\Program Files\Oracle\jre\1.3.1\".


In fact the whole point of sending out data to Tomcat for processing is speeding up calculations: the Oracle JVM is much slower than Sun's HotSpot JVM.


The Oracle JVM is tightly integrated with the Oracle version, and can only be replaced by upgrading the Oracle version.
Quote:



Is there any way something within the JChem http module is, once bombarded with connections, not handling their closure correctly or is this totally handled within the Java runtimes?
As Peter wrote this seems to be a platform-specific bug, which has nothing to do with our code (otherwise it would be present on more platform-combinations).


Of course we always try to find some workarounds for bugs like these whenever possible.
Quote:
The application we are using is written in Microsoft's .NET and I am more versed in Microsoft's technology than Sun's so shifting to different operating systems is probably not going to happen without a large reason.
It should be possible to run your .NET application on a Windows server communicating with Oracle 9i on a Linux machine.


Would it be a reasonable alternative for you ?
Quote:



Last thought. This is a java problem...?
It is certainly not a general Java problem.


I would call it an Oracle problem (maybe it has something to do with the Java in Oracle, but this is a special Oracle-specific Java implementation).





Peter, please correct me and add your further thoughts if necessary.





Best regards,





Szilard

User 77bcbbb206

14-12-2005 17:08:22

Szilard,





Thanks for clearing that up. I appreciate the time and effort.





Why would this problem happen so sporadically? I mean, I have two setups as identical as you can get (Windows 2000, Oracle 9.2.0.5, JVM installed identically though script, Tomcat 4.1, JChem 3.1.1, j2sdk1.4.2_05) and this problem shows up on one machine, but not another?????





-Jim

User 77bcbbb206

14-12-2005 22:29:40

All,





I'm upgrading to 10g. I'll let you know the results.





-Jim

ChemAxon aa7c50abf8

15-12-2005 10:03:37

Jim,
Quote:
Why would this problem happen so sporadically? I mean, I have two setups as identical as you can get (Windows 2000, Oracle 9.2.0.5, JVM installed identically though script, Tomcat 4.1, JChem 3.1.1, j2sdk1.4.2_05) and this problem shows up on one machine, but not another?????
This is an important piece of information. Is it so that the problem always occurs on one Windows machine and never on the other? If it is so, we just need to find out the difference(s) between the two machines/configurations.





I do not have many machines with Oracle 9i for Windows on it. In fact, I currently have just one (a Windows XP machine). However, in the past I have tested earlier (prior to 3.0) versions of JChem Cartridge on several machines with Oracle 9i for Windows and all of them showed problems very similar to this.





Peter

User 77bcbbb206

15-12-2005 17:14:52

Peter et all,





Upgrading my development machine to Oracle 10g has solved the problem.





Sorry. I guess when I re-introduced this problem I should have re-explained the finer points as well. If you look at my first post from October you'll see:
Quote:
This process only hangs on certain servers not all servers.
Hmm. Maybe I need to clarify even further. There are two problems we've found.





1) If the connectionTimeout is set to 20s in Tomcat, and a connection that has used JChem sits idle for longer than 20s, that connection is in an unusable state and must be disconnected before it can use JChem again.





This happens on any server and can be remedied by disabling the connectionTimeout property.





2) During loop query of JChem (say jc_molconvert), the process will hang after an unknown period of time and never timeout or error out.





This happens on only some of the servers, all of which have exactly the same setup and currently the only remedy is upgrading the server to 10g (which I really don't want to do yet).





This second problem is the problem I'm trying to solve and have tracked down to one of two things. This problem is tied to either Oracle's JVM or Oracle's JRE, both of which are within ORACLE_HOME and do not depend on the java products you have installed. Just as I write this, I realize that the JRE is standard and shipped with Oracle, but the JVM is installed after Oracle is installed. The JVM installation has to be the problem. I'm going to continue to look around. I'll let you know.





-Jim

User 77bcbbb206

16-12-2005 01:04:04

Peter et all,





I've found a bug report on Oracle's MetaLink site (https://metalink.oracle.com) that seems to be the problem we've been dealing with. Do you have access to this site? If so, the bug number is 3300400. Can you please take a look and see if this makes sense. Can you also verify that there is no patch or fix, that the report states that it has been fixed in 10.2.0.





Thanks,


Jim

ChemAxon aa7c50abf8

16-12-2005 11:21:30

Jim,





Thank you for so diligently helping us sort out this issue.
Quote:
2) During loop query of JChem (say jc_molconvert), the process will hang after an unknown period of time and never timeout or error out.





This happens on only some of the servers, all of which have exactly the same setup and currently the only remedy is upgrading the server to 10g (which I really don't want to do yet).
My question still stands: Is this problem consistently reproducible on one of your Windows 2000 servers and consistently not reproducible on the other? In other words: do you have a Windows 2000 server where everything works fine? On this properly working server, have you tested the case where you resume JChem Cartridge searches after you have left the session idle for more than tomcat-connection-timeout period of time?
Quote:
[...] the bug number is 3300400. Can you please take a look and see if this makes sense.


This bug might be related to the hanging loop query problem, but I am not sure. We do not directly use the Socket Java interface (for performance reasons). We use the Http/Url Java interfaces. It is possible that Oracle's implementation of the Http/Url Java interfaces uses Oracle's Java implementation of the Socket Java interface, but it is also possible that they use some direct native implementation.
Quote:
Can you also verify that there is no patch or fix, that the report states that it has been fixed in 10.2.0.
According to Note:211268.1 ( https://www.metalink.oracle.com/metalink/plsql/ml2_documents.showFrameDocument?p_database_id=NOT&p_id=211268.1 ) a fix for this bug was included in "ORACLE 9I 9.2.0.5 PATCH 2 FOR WINDOWS NT, 2000 , XP AND 2003". Recently, I used patch set 9.2.0.7. The documentation of this patch set says:
Quote:
Patch sets are cumulative. Patch set release 9.2.0.7 includes all fixes in patch sets 9.2.0.7 and earlier as well as new fixes for patch set 9.2.0.7. This means that unless the patch set documentation indicates otherwise, you can apply this patch set to any earlier release 9.2 installation. You do not have to install intermediate patch sets.
So my Oracle version theoretically includes the fix for this bug. The problem is still present nonetheless. I am not sure about your Oracle version. CORE is said to be 9.2.0.6 but all other components are shown to be 9.2.0.5, so it could eventually make sense to apply Patch 3668346. (Raise an ITAR with Oracle, if you are unsure of the applicability of this patch to your Oracle installation.)





If all else fails and some negative performance impact is acceptable for you, I can try to work around this problem using the DBMS_PIPE package for communication between Oracle and Tomcat. I have no experience with this package, so I do not know the degree of the performance impact this workaround will have. (I expect this impact to be at least "noticeable".)





Peter

User 77bcbbb206

19-12-2005 18:24:43

Ahh... The sweet smell of victory.





The patch to 9.2.0.7 has solved the mass structure registration problem completely. I'm not 100% sure the bug mentioned in my last post was the bug causing my problem, but I'm 99% sure. Either way, the problem was specifically with Windows and Oracle 9.2.0.5's socket connection in the JVM. If the Tomcat connectionTimeout is disabled, everything works just the ways it is supposed to.





Please log and note this so no one ever has to go through this problem ever again.





Thank you guys for all your help. I really appreciated it.





-Jim

ChemAxon aa7c50abf8

19-12-2005 20:57:55

I am glad that your problem is solved. Strange that 9.2.0.7 does not solve this problem on Windows XP. Do I have to apply the patch in a special way? What I did was simply to install the patch over a 9.2.0.1 Oracle installation.





Peter

User 77bcbbb206

21-12-2005 17:47:14

Peter,





That's a good question.





The registration process hanging is solved with this patch. I still receive the connection timeout problem (if I query the cartridge, wait longer than the Tomcat timeout period, and then query again).





I've testing this patch on windows 2000 professional and windows 2003 and it seems to work?





I've applied the patch onto a 9.2.0.1 and 9.2.0.6 Oracle server and both seems to have worked? Very strange.

ChemAxon aa7c50abf8

10-01-2008 13:26:23

Jim,





JChem 5.0 has been released. It includes a workaround for this problem.





Cheers,


Peter