User 7b0ee04e66
27-06-2006 10:14:31
Hi, we are experiencing intermittent problems with our "JChem" enabled Oracle server, in that the whole server appears to hang due to lack of memory. This happens about once every two weeks and forces us to reboot the machine.
On the odd occasions that we are able to get a connection to the server and run "top", we see many java processes consuming large amounts of memory.
Below is an example of such a process. Currently our server shows 37 such processes, all with a memory size of 799M.
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU Cmd
3458 oracle 16 0 799M 796M 5324 S 0.0 10.5 0:57 3 java
Java v1.5.0.4 is installed under the Oracle account and is only installed for Tomcat/JChem. The Tomcat version is v4.1 and the version of the JChem cartridge is v3.1.1.
We believe that only Tomcat/JChem is creating these Java processes, and we were wondering whether there are any known issues with Java processes not being killed?
ChemAxon aa7c50abf8
27-06-2006 14:49:38
Hi,
It appears that the primary symptom of the problem is that the response time of your machine is drastically reduced. How did you establish that the reason was free memory shortage?
P.
User 7b0ee04e66
27-06-2006 15:07:05
Hi,
I'm afraid I don't understand your response. Why do you say that the "primary symptom of the problem is that the response time of your machine is drastically reduced"?
We began investigating this issue because users were complaining that they could not get connections to the database. We think it might be related to a lack of free memory because, when we looked at top, we saw many processes (in this case Java) appearing to consume much of the machine's available memory.
Also, I tried mounting a CD-ROM on the machine (when it was in this state) and got an "unable to allocate enough memory" error.
Unfortunately we are not Linux experts so we could well believe that what we are seeing is perfectly normal. But to us, it does look strange.
User 7b0ee04e66
27-06-2006 16:13:33
Hi
Sorry, no, you are right. Most of the time, we only know there is an issue when we cannot connect to the machine. Below is the list of current java processes that top shows. We have added the parent process ID column to the output.
This shows all 37 Java processes, each with a different process Id. However when you look at parent process Id there are only 3 values and all are linked.
I don't know therefore whether we are seeing 37 separate Java processes or 3 processes and many threads within the 3rd process....
PID PPID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMM
3456 1 oracle 25 0 799M 799M 5340 S 0.0 10.6 0:10 1 java
3457 3456 oracle 15 0 799M 799M 5340 S 0.0 10.6 0:07 2 java
3458 3457 oracle 16 0 799M 799M 5340 S 0.0 10.6 1:01 1 java
3459 3457 oracle 16 0 799M 799M 5340 S 0.0 10.6 1:00 1 java
3460 3457 oracle 16 0 799M 799M 5340 S 0.0 10.6 1:01 2 java
3461 3457 oracle 16 0 799M 799M 5340 S 0.0 10.6 1:01 0 java
3462 3457 oracle 15 0 799M 799M 5340 S 0.0 10.6 0:26 2 java
3463 3457 oracle 15 0 799M 799M 5340 S 0.0 10.6 0:00 2 java
3464 3457 oracle 15 0 799M 799M 5340 S 0.0 10.6 0:00 1 java
3465 3457 oracle 20 0 799M 799M 5340 S 0.0 10.6 0:00 1 java
3466 3457 oracle 16 0 799M 799M 5340 S 0.0 10.6 0:00 2 java
3467 3457 oracle 17 0 799M 799M 5340 S 0.0 10.6 0:47 1 java
3468 3457 oracle 16 0 799M 799M 5340 S 0.0 10.6 0:46 1 java
3469 3457 oracle 20 0 799M 799M 5340 S 0.0 10.6 0:00 1 java
3470 3457 oracle 15 0 799M 799M 5340 S 0.0 10.6 5:53 3 java
3474 3457 oracle 15 0 799M 799M 5340 S 0.0 10.6 0:51 0 java
3475 3457 oracle 15 0 799M 799M 5340 S 0.0 10.6 0:00 0 java
3477 3457 oracle 15 0 799M 799M 5340 S 0.0 10.6 0:00 2 java
3502 3457 oracle 15 0 799M 799M 5340 S 0.0 10.6 0:00 0 java
3503 3457 oracle 15 0 799M 799M 5340 S 0.0 10.6 0:00 1 java
3504 3457 oracle 15 0 799M 799M 5340 S 0.0 10.6 0:00 2 java
3505 3457 oracle 15 0 799M 799M 5340 S 0.0 10.6 0:00 2 java
3506 3457 oracle 15 0 799M 799M 5340 S 0.0 10.6 0:00 2 java
3507 3457 oracle 15 0 799M 799M 5340 S 0.0 10.6 0:56 1 java
3513 3457 oracle 15 0 799M 799M 5340 S 0.0 10.6 0:00 0 java
3514 3457 oracle 20 0 799M 799M 5340 S 0.0 10.6 0:00 0 java
3515 3457 oracle 20 0 799M 799M 5340 S 0.0 10.6 0:00 0 java
3516 3457 oracle 20 0 799M 799M 5340 S 0.0 10.6 0:00 0 java
3517 3457 oracle 20 0 799M 799M 5340 S 0.0 10.6 0:00 0 java
3518 3457 oracle 20 0 799M 799M 5340 S 0.0 10.6 0:00 0 java
3519 3457 oracle 15 0 799M 799M 5340 S 0.0 10.6 0:00 2 java
12839 3457 oracle 15 0 799M 799M 5340 S 0.0 10.6 0:01 0 java
14051 3457 oracle 15 0 799M 799M 5340 S 0.0 10.6 0:00 0 java
14053 3457 oracle 15 0 799M 799M 5340 S 0.0 10.6 0:00 3 java
15054 3457 oracle 15 0 799M 799M 5340 S 0.0 10.6 0:00 3 java
15536 3457 oracle 15 0 799M 799M 5340 S 0.0 10.6 0:00 2 java
15537 3457 oracle 15 0 799M 799M 5340 S 0.0 10.6 0:00 0 java
15538 3457 oracle 15 0 799M 799M 5340 S 0.0 10.6 0:00 1 java
15539 3457 oracle 15 0 799M 799M 5340 S 0.0 10.6 0:00 1 java
ChemAxon aa7c50abf8
27-06-2006 17:49:48
Hi,
Despite the PIDs being different, we still believe that the processes belong to the same Tomcat-Java process. (Whether the displayed PIDs of threads are different or not may also depend on the operating system version and/or the version of the "top" program.)
We are not aware of any JChem component running in Tomcat which would spawn other processes. This implies that if you start Tomcat just once, there should be only one Tomcat-Java process running on your machine.
Also, it is clearly impossible for each of the 37 processes to take up 10.6% of the memory, as the "top" output indicates on every line. That would add up to 392.2% memory consumption, which does not make sense. If you are looking for the memory hog, I suggest executing something like
Code: |
ps -e -o "pid,ppid,lwp,pcpu,size,pmem,rss,cmd" |
and looking at the SZ and %MEM columns to find out which process it is.
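The arithmetic behind that objection can be checked with a short shell sketch. It assumes the rows have the column order of the "top" listing posted earlier, so %MEM is field 11 and SIZE, RSS and SHARE are fields 6 to 8; the function names and the three-row sample are illustrative only. Identical footprints on every row are a hint, not proof, that the rows are threads sharing one address space.

```shell
#!/bin/sh
# Three sample rows in the column order of the "top" listing in this thread:
# PID PPID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMM
SAMPLE='3456 1 oracle 25 0 799M 799M 5340 S 0.0 10.6 0:10 1 java
3457 3456 oracle 15 0 799M 799M 5340 S 0.0 10.6 0:07 2 java
3458 3457 oracle 16 0 799M 799M 5340 S 0.0 10.6 1:01 1 java'

# Hypothetical helper: add up %MEM (field 11) as if every row were a
# separate process with its own private memory.
naive_total_pmem() {
    awk '{ sum += $11 } END { printf "%.1f\n", sum }'
}

# Hypothetical helper: count distinct SIZE/RSS/SHARE triples. A result of 1
# means every row reports exactly the same footprint.
distinct_footprints() {
    awk '{ seen[$6 " " $7 " " $8] = 1 }
         END { n = 0; for (k in seen) n++; print n }'
}

printf '%s\n' "$SAMPLE" | naive_total_pmem
printf '%s\n' "$SAMPLE" | distinct_footprints
```

Feeding every row of the listing through naive_total_pmem instead of the three-row sample shows the same overcounting on the full data.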
Do you expect the next lock-up to occur soon?
Please, could you post the output of the above command?
P.
User 7b0ee04e66
28-06-2006 08:47:36
Here are the results from running the following command:
ps -e -o "pid,ppid,pcpu,size,pmem,rss,cmd"
(Note: our version of ps did not accept the lwp column.)
ChemAxon aa7c50abf8
28-06-2006 09:06:35
Please, could you also execute a "free" and post the output?
Could you please also post the -Xmx parameter for Tomcat? It should have been in the output of the "ps" command, but the long lines were somehow trimmed. Alas, this is another respect in which your Linux tools differ. (Or maybe you copy-pasted the output from a terminal window.)
Thanks
P.
User 7b0ee04e66
28-06-2006 09:12:17
[oracle@uksap12 oracle]$ free -t -m
             total       used       free     shared    buffers     cached
Mem:          7554       7530         24          0         61       6927
-/+ buffers/cache:        541       7013
Swap:        16998        170      16827
Total:       24553       7701      16852
java -server -Xmx512M
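As a rough reading aid for the figures above, the following shell sketch recomputes the "-/+ buffers/cache" line from the "Mem:" line. It assumes the older procps layout shown here, with separate buffers and cached columns; the function names are hypothetical. Buffers and page cache are memory the kernel can normally reclaim on demand, so the sketch counts them as available rather than used.

```shell
#!/bin/sh
# Sample copied from the "free -t -m" output above (values in MB).
FREE_SAMPLE='             total       used       free     shared    buffers     cached
Mem:          7554       7530         24          0         61       6927
-/+ buffers/cache:        541       7013
Swap:        16998        170      16827'

# Hypothetical helper: memory used by applications, that is
# used minus buffers minus cached on the "Mem:" line.
app_used_mb() {
    awk '$1 == "Mem:" { print $3 - $6 - $7 }'
}

# Hypothetical helper: memory that is free or reclaimable, that is
# free plus buffers plus cached on the "Mem:" line.
reclaimable_mb() {
    awk '$1 == "Mem:" { print $4 + $6 + $7 }'
}

printf '%s\n' "$FREE_SAMPLE" | app_used_mb
printf '%s\n' "$FREE_SAMPLE" | reclaimable_mb
```

The results can differ by a megabyte or so from the "-/+ buffers/cache" line that free prints, presumably because free converts each column to megabytes separately.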
ChemAxon aa7c50abf8
28-06-2006 10:27:41
Did you, or your users, observe a gradual slowdown in the machine's responsiveness a day (or two) before the hang occurs, or does the problem just kick in without any prior sign?
Please, could you point your web browser to the "Tomcat Web Application Manager" page (something like http://localhost:8090/manager/html; host and port may differ) and check how many sessions are displayed in the "Session" column? (Accessing this page requires the user ID and password of a user with the "manager" role in the realm configured for the manager web app. The default realm is mapped to the <tomcat-home>/conf/tomcat-users.xml user database. It is possible that you have not yet configured a manager user. If you have not, I can give you further instructions on how to do it.)
I wish we could get your version of "ps" to display the entire command line of the Tomcat-Java programs. Is the end of the lines cut off even if you pipe the ps command's output into less?
Code: |
ps -e -o "pid,ppid,pcpu,size,pmem,rss,stime,cmd" | less |
(Please note that I added a new column, "stime", to the column list. Knowing when a given process was started might come in handy.)
Does "ps -ef | less" also trim the lines?
Thanks
P.
User 7b0ee04e66
28-06-2006 10:37:56
You are right in that we have not configured the manager user. Below is the content of the tomcat-users.xml file.
<?xml version='1.0' encoding='utf-8'?>
<tomcat-users>
<role rolename="tomcat"/>
<role rolename="role1"/>
<user username="tomcat" password="tomcat" roles="tomcat"/>
<user username="role1" password="tomcat" roles="role1"/>
<user username="both" password="tomcat" roles="tomcat,role1"/>
</tomcat-users>
Below is the full line for the Java entry from ps (piping through less)
3458 3457 0.0 2267012 10.1 782464 Jun20 /home/oracle/jdk1.5.0_04/bin/java -server -Xmx2000m -Djava.awt.headless=true -Djava.endorsed.dirs=/home/oracle/tomcat/jakarta-tomcat-4.1.31/common/endorsed -classpath /home/oracle/jdk1.5.0_04/lib/tools.jar:/home/oracle/tomcat/jakarta-tomcat-4.1.31/bin/bootstrap.jar -Dcatalina.
All entries are the same (except for the process and parent process IDs).