Mass structure registration hangs on some computers

11-10-2005 12:17:45

Does this statement show any increase in temporary blobs?

Could you please run the following SQL statement to see which event the hanging Oracle session is waiting for (after it gets stuck)?:

11-10-2005 17:00:28

There are about 20K rows in the table "CHEM_STRUCTURE". I'm calling jcf_molconvertb and immediately freeing the blob returned.

11-10-2005 19:09:29

does not produce any increase in tmp lobs (as per v$temporary_lobs) when executed on the sqlplus command line, even though there is apparently nothing which would free the tmp blobs returned by jc_molconvertb.

Peter

12-10-2005 08:10:18

For the above, there is an explanation here:

http://download-west.oracle.com/docs/cd/B10501_01/appdev.920/a96591/adl07mds.htm#135757

It says:

12-10-2005 12:15:40

I have added datafiles to both the temporary tablespace and the system tablespace and the image generation still hangs after the same number of calls to jcf_molconvertb.

More importantly, I have tested the solution (outlined above) by which the number of non-freed temporary BLOBs can be kept under limits and the session still hangs at the same number of iterations. This means that it is not the number of temporary BLOBs which causes this problem.

Peter

12-10-2005 17:49:14

which never actually completely closes. (kdev2 is my computer name.) Every time a session hangs, another entry is made in the TCP connection table and remains until the computer is restarted (not just Tomcat).

The registration process got a read timeout though, maybe this will help?

13-10-2005 13:40:54

Enlarging these tablespaces did not help in my case.

13-10-2005 18:29:45

I have gotten to about 17K calls to jcf_molconvertb and the session seems to have hung itself. I changed all references of my machine name (kdev2) to localhost in hopes that some type of network communication would be avoided, but that didn't help. I've checked all of the log files from Tomcat and Oracle and nothing seems out of the ordinary.

If I open another session and attempt to run the procedure again, everything works, but the same limit is encountered. I can do this again and again with the same results.

I've been looking around a bit and am going to check the PGA_AGGREGATE_TARGET size to see whether LOB access or table sort size is causing the block.

-Jim

09-12-2005 17:13:00

Any thoughts or ideas? I'm going to refresh my memory a bit with the options we've tried and see if I can come up with anything. Are you sure the connection timeout solved your problems?

09-12-2005 19:34:13

I agree with you when you said Oracle seems to be in a paralyzed state. Is there any way for me to check that the connection gets closed on the Tomcat side, or to see any type of open connection state in Oracle? I've tried using a packet sniffer but have yet to make any sense of the communication. If you have any recommendations, I'd gladly take them.

10-12-2005 20:31:30

With a time-out (say 20 seconds), any number of jc_molconvertb iterations seem to complete in my environment. However, another problem surfaces: after the Oracle connection has been idle longer than the time-out value (e.g. 20 seconds), Tomcat drops the connection as expected, but Oracle seems to be unable to come up with the right sequence to establish a new TCP connection, and the current database session can never be used again with JChem Cartridge. So in my environment there are actually two problems, both apparently related to Oracle's implementation of the TCP state machine: one surfaces when the connection time-out is set in Tomcat, the other when the time-out is disabled.
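The idle time-out behaviour described above can be reproduced in isolation with plain sockets. The following is only a minimal sketch, not Tomcat, Oracle, or JChem code: the class `IdleTimeoutDemo` and all of its methods are hypothetical names, and the echo server merely stands in for a server that drops connections that have been idle too long. It shows what a well-behaved client is expected to do: notice that the peer closed the old connection, then open a fresh one.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical stand-in for a server that drops idle connections.
class IdleTimeoutDemo {

    // Starts an echo server on an ephemeral loopback port. Each connection
    // is closed once no byte arrives within idleMillis.
    static ServerSocket startEchoServer(int idleMillis) throws IOException {
        ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        Thread acceptor = new Thread(() -> {
            while (!server.isClosed()) {
                try {
                    Socket s = server.accept();
                    Thread worker = new Thread(() -> serve(s, idleMillis));
                    worker.setDaemon(true);
                    worker.start();
                } catch (IOException e) {
                    return; // server socket was closed
                }
            }
        });
        acceptor.setDaemon(true);
        acceptor.start();
        return server;
    }

    // Echoes bytes until the client disconnects or stays idle too long.
    static void serve(Socket s, int idleMillis) {
        try (Socket socket = s) {
            socket.setSoTimeout(idleMillis);
            InputStream in = socket.getInputStream();
            OutputStream out = socket.getOutputStream();
            int b;
            while ((b = in.read()) != -1) {
                out.write(b);
                out.flush();
            }
        } catch (SocketTimeoutException e) {
            // Idle for too long: try-with-resources closes the socket.
        } catch (IOException e) {
            // Connection failed; nothing more to do in this sketch.
        }
    }

    // Sends one byte and returns the echoed byte.
    static int roundTrip(Socket socket, int value) throws IOException {
        socket.getOutputStream().write(value);
        socket.getOutputStream().flush();
        return socket.getInputStream().read();
    }

    // Returns true if the peer has closed the connection (read sees EOF).
    static boolean droppedByPeer(Socket socket, int waitMillis) throws IOException {
        socket.setSoTimeout(waitMillis);
        try {
            return socket.getInputStream().read() == -1;
        } catch (SocketTimeoutException e) {
            return false; // nothing to read, but the connection is still open
        }
    }

    // Uses a connection, lets it go idle, then verifies that a NEW
    // connection works after the old one has been dropped by the server.
    static boolean reconnectAfterIdleDrop(int idleMillis) {
        try (ServerSocket server = startEchoServer(idleMillis)) {
            int port = server.getLocalPort();
            try (Socket first = new Socket(InetAddress.getLoopbackAddress(), port)) {
                if (roundTrip(first, 42) != 42) return false;
                Thread.sleep(idleMillis * 3L); // stay idle past the time-out
                if (!droppedByPeer(first, idleMillis)) return false;
            }
            try (Socket second = new Socket(InetAddress.getLoopbackAddress(), port)) {
                return roundTrip(second, 7) == 7;
            }
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("reconnect after idle drop succeeded: "
                + reconnectAfterIdleDrop(200));
    }
}
```

The 200 ms time-out is chosen only to keep the demonstration short; the thread discusses a 20 second value.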

14-12-2005 14:00:43

As Peter wrote, this seems to be a platform-specific bug that has nothing to do with our code (otherwise it would be present on more platform combinations).

Of course we always try to find some workarounds for bugs like these whenever possible.

15-12-2005 10:03:37

This is an important piece of information. Does the problem always occur on one Windows machine and never on the other? If so, we just need to find out the difference(s) between the two machines/configurations.

I do not have many machines with Oracle 9i for Windows on it. In fact, I currently have just one (a Windows XP machine). However, in the past I have tested earlier (prior to 3.0) versions of JChem Cartridge on several machines with Oracle 9i for Windows and all of them showed problems very similar to this.

Peter

15-12-2005 17:14:52

Hmm. Maybe I need to clarify even further. There are two problems we've found.

1) If the connectionTimeout is set to 20s in Tomcat, and a connection that has used JChem sits idle for longer than 20s, that connection is in an unusable state and must be disconnected before it can use JChem again.

This happens on any server and can be remedied by disabling the connectionTimeout property.

2) When JChem is queried in a loop (say jc_molconvert), the process hangs after an unpredictable period of time and never times out or errors out.

This happens on only some of the servers, all of which have exactly the same setup, and currently the only remedy is upgrading the server to 10g (which I really don't want to do yet).
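For a hang like the one in 2), which never times out on its own, a generic client-side guard is to run the blocking call under a deadline so that at least the caller regains control. The sketch below is an illustration under stated assumptions, not a fix for the Oracle-side hang: `DeadlineDemo` and `callWithDeadline` are hypothetical names, the "stuck" task is simulated with a sleep, and whether a genuinely hung database call would react to the cancellation is not something this thread establishes.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs a task and gives up after a deadline.
class DeadlineDemo {

    // Returns the task's result, or Optional.empty() if the deadline passed.
    static <T> Optional<T> callWithDeadline(Callable<T> task, long timeoutMillis) {
        // Daemon thread, so a task that ignores interruption cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
        Future<T> future = executor.submit(task);
        try {
            return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker; the task may or may not react
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        Optional<String> fast = callWithDeadline(() -> "done", 1000);
        // Simulated hang: sleeps far longer than the 200 ms deadline.
        Optional<String> stuck = callWithDeadline(() -> {
            Thread.sleep(60_000);
            return "never";
        }, 200);
        System.out.println("fast call: " + fast.orElse("timed out"));
        System.out.println("stuck call: " + stuck.orElse("timed out"));
    }
}
```

Run as is, this prints `fast call: done` followed by `stuck call: timed out`. In a registration loop, the same wrapper could report the iteration at which a call stopped returning.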

This second problem is the one I'm trying to solve, and I have tracked it down to one of two things: either Oracle's JVM or Oracle's JRE, both of which live within ORACLE_HOME and do not depend on the Java products you have installed. Just as I write this, I realize that the JRE is standard and shipped with Oracle, but the JVM is installed after Oracle is installed. The JVM installation has to be the problem. I'm going to continue to look around. I'll let you know.

-Jim

16-12-2005 11:21:30

My question still stands: Is this problem consistently reproducible on one of your Windows 2000 servers and consistently not reproducible on the other? In other words: do you have a Windows 2000 server where everything works fine? On this properly working server, have you tested the case where you resume JChem Cartridge searches after leaving the session idle for longer than the Tomcat connection time-out period?