handle leak

User 4140faeba5

07-12-2011 13:34:52

Hi.


I'm running Jchem 5.4.1.1 on Oracle 11.2.0.2 on Windows Server 2008 R2. I'm seeing a lot of handles not being closed and was wondering if you had a bug in this version of the cartridge with regards to OCI handles ?


Regards,


Micke

ChemAxon aa7c50abf8

07-12-2011 16:19:31

Hi Micke,


Thank you for sharing this observation with us.


Did you get an error message about resources having been exhausted or did you make this observation as part of regular application monitoring/health check? Can you observe a constant increase in open handles over time? If so, at what rate?


Do I understand correctly that it is "OCI handles" which appear to be leaking in particular? If these are "OCI handles", can you see a matching increase in Oracle server resources (such as open cursors, allocated PGA or SGA)?


Please, could you also briefly describe how you made this observation (using what kind of tools)?


Thanks


Peter

User 4140faeba5

08-12-2011 06:24:22

Hi Peter.


The strange thing is that there is no error in any log or trace file that I have found so far.  Oracle does not report anything in the alert log. I have turned up tracing of the cartridge to level 4 now.


What lead me to investigate was that Oracle suddenly crashed. The crash was due to no virtual memory free.


I increased the virtual memory to 8x the size of the physical. After a few months, Oracle crashed again.


No leak is shown if I check the counters inside Oracle, but using Process Explorer or handle utility shows me that Oracle is leaking handles (more than 150000 so far and still increasing).


On that machine, I have 2 instances of the same version of Oracle, but it is only the one with the cartridge that is consuming virtual memory.


I have a support ticket also running at Oracle. I cannot turn of all calls to the cartridge, that would make the application using it to stop. I have turned off all inserts and deletes, so only select are using the cartridge now to see if that will stop the leaks.


There are no specific OCI handles that I can see. Attached is an output from the handle utility that lists all open handles.


Also attached is a screenshot from Process Explorer which shows in green the attempt to open a handle but fails.


I thought I'd check with you if this was something you have encountered.


Regards,


Micke

User 4140faeba5

08-12-2011 06:25:48

The screenshot attached.

ChemAxon aa7c50abf8

08-12-2011 09:12:58

Hi Micke,


Thank you for the input. I assume the only sizable process running in this VM is the Oracle server and this is from this fact that you infer Oracle is leaking the host of handles showing up in handle.txt. Otherwise, I cannot see from this output to which process these resource are attributable. (The JChem Cartridge server is running in a different VM, correct?)


We haven't been reported memory leaks in Oracle yet, but the manifestation of this kind of errors greatly depend on specific usage patterns, so it is entirely possible that you are the unlucky one to first hit it...


We did find and reported a memory leak in Oracle (Bug 6408917), but it was supposed to be fixed in 11.1.0.7 and in 10.2.0.5. Actually, I am not entirely sure about the status of this bug in the 11.2 line, but it is presumably fixed there as well.


Prompted by you enquiry, I revised the relevant portions of the Java Stored Procedure codes used by JChem Cartridge and found one place potentially causing resource leaks. (This particular code part has been there since about 8 years and, paradoxically, it was the result of working around a memory leak in 9i Java Stored Procedures.) While I'd expect a leak from this code part to also result in increasing number of open cursors, it possible that client memory is leaked at a far more greater rate. Possible workaround: starting with JChem version 5.5.1.0, you can try turning off statement caching in JCC Java Stored Procedures: http://www.chemaxon.com/jchem/doc/dev/cartridge/cartapi.html#jchem_misc_pkg.cache_stmts . This call operates in the session scope, so you need to modify application code. We will work to eliminate this leak potential in future JChem versions (e.g. by turning off statement caching by default).


Let me know if this helps.


Peter

User 4140faeba5

12-12-2011 09:58:42

Hi Peter.


Before we had the memory leak, the application did all the transactions against the cartridge per session. I had to change the behaviour becuase of a bug in the Oracle version that was caused by simultanious calls to java procedure. All inserts and deletes are now stored in normal tables and then done against the cartridge by a stored procedure that is run by a job. If I stop the job, the memory leak stops growing.  Attached is a file that contains the stored procedures that are used. The one launched by a job is handlequeuejchem. Do you think that upgrading the cartridge to the latest release will fix the leak ?


Regards,


Micke

ChemAxon aa7c50abf8

12-12-2011 15:10:12

Hi Micke,


Thank you for this input!


There will be some resource management related improvements/bugfixes in 5.8, but until we know what specifically causes the problem you're experiencing, it is difficult to say whether these changes are relevant or not to your problem. (I am more inclined to say that they are not relevant to this problem...)


I started to extensively test the problem, but I may have gone in the wrong direction: I started to work on the assumption that your problem is related to JDBC resources allocated for searching, but your last input makes me suspect that this is about something else. (Also, my tests with searches seem to indicate that even though the number of handles is constantly increasing, memory use stops increasing after a certain time, drops and starts increasing again: https://docs.google.com/spreadsheet/pub?hl=en_US&key=0An8GVXkzvW-qdDVjY3BwTEs1QmU4VzNoTmY4SkJRLXc&hl=en_US&gid=7 . Also, even this increase can be ovserved only when new connections are opened to do the searches every now and then. When the entire batch of 10k searches is executed over a single connection, handle count and memory use appears to be fairly constant: https://docs.google.com/spreadsheet/pub?hl=en_US&key=0An8GVXkzvW-qdDVjY3BwTEs1QmU4VzNoTmY4SkJRLXc&hl=en_US&gid=0)


I have the impression that your problem might be related to this other problem: https://www.chemaxon.com/forum/ftopic6967.html&start=0&postdays=0&postorder=asc&highlight= . Are the stored procedures in question executed in an Oracle job? If so, would it be possible to execute them in an operating system job (using Task Scheduler for example)? The problem may be related to the way Oracle jobs handle connection...


Thanks


Peter

User 4140faeba5

16-12-2011 07:24:23

Hi Peter.


I tried to launch the procedure as a scheduled OS job, but sadly the leak continues.


Looking at the memory consumption graph after each restart of the instance, it looks like everything works ok up to a certain point when something happens in the cartridge (or something related to it) which creates the leak and causes all calls to that function also leaking.


I will now attach a debugger to the Oracle process, run the procedure and send that dump to Oracle to see what is causing the leak.


I'll keep you updated on the progress.


Regards,


Micke

ChemAxon aa7c50abf8

16-12-2011 11:01:45

Hi Micke,


Many thanks for the feedback. Interesting findings.


I assume that the scheduled OS job opens a new database connection each time it is executed, correct? Is the job typically running for a long time? Does the memory leak appear to kick in already at the first run of the job after Oracle instance restart? (I am asking all this, because I would naively assume that resources acquired by a given database connection are released -- or at least marked as releasable -- after the connection is closed. Something nontrivial must be going on here, if resources held by Oracle get accumulated after connections are relased.)


Also, please, could you tell us with what input parameters is the affected JChem index created?


Other than that we are eagerly awaiting what comes out of the debug information.


Many thanks


Peter

User 4140faeba5

19-04-2012 09:59:22

Hi Peter.


After many turns with Oracle support and their development teams, they have filed this as a bug.


13773444 - ORACLE LEAKING INTO THE VIRTUAL MEMORY


I don't know if they will continue to work on the issue since I could not supply them with the cartridge itself and the underlying code.


 


Regards,


Micke

ChemAxon aa7c50abf8

19-04-2012 10:31:25

Hi Micke,


Many thanks for the feedback! Let us know if we can help solve the issue.


I think that "normally" Oracle shouldn't need the JCC source code to fix a bug in Oracle server. They just need a reproducible test case. Part of the test case is likely to be the JCC (binary) distribution which makes it easy to reproduce the problem. They have the source that needs to be fixed.


In my experience, once an Oracle bug is filed based on a reproducible test case (even if it complex to setup), the major factor in how much effort Oracle is willing to put into fixing the bug is the stated impact of the bug on your business. We already requested/initiated several bug fixes from Oracle over the years and they never required JCC source code for the fix.


If JCC license is needed for the test case, we may be able to arrange to issue a time-limited license for this purpose.


Peter