Many Java Processes running

User 7b0ee04e66

27-06-2006 10:14:31

Hi, we are experiencing intermittent problems with our "JChem" enabled Oracle server, in that the whole server appears to hang due to lack of memory. This happens about once every two weeks and forces us to reboot the machine.

On the odd occasions that we are able to get a connection to the server and run "top", we see many Java processes consuming large amounts of memory.

Below is an example of such a process. Currently our server shows 37 such processes, all with a memory size of 799M.

 PID USER   PRI NI SIZE  RSS SHARE STAT %CPU %MEM TIME CPU Cmd
3458 oracle  16  0 799M 796M  5324 S     0.0 10.5 0:57   3 java

Java v1.5.0.4 is installed under the Oracle account and is only installed for Tomcat/JChem. The Tomcat version is v4.1 and the version of the JChem cartridge is v3.1.1.

We believe that only Tomcat/JChem is creating these Java processes, and we were wondering whether there are any known issues with Java processes not being killed.

ChemAxon aa7c50abf8

27-06-2006 14:49:38

Hi,

It appears that the primary symptom of the problem is that the response time of your machine is drastically reduced. How did you establish that the reason was free memory shortage?

P.

User 7b0ee04e66

27-06-2006 15:07:05

Quote:
It appears that the primary symptom of the problem is that the response time of your machine is drastically reduced. How did you establish that the reason was free memory shortage?

Hi,

I'm afraid I don't understand your response. Why do you say that the "primary symptom of the problem is that the response time of your machine is drastically reduced"?

We began investigating this issue because users were complaining that they could not connect to the database. We think it might be related to a lack of free memory because, when we looked at top, we saw many processes (in this case Java) appearing to consume much of the machine's available memory.

Also, I tried mounting a CD-ROM on the machine while it was in this state and got an "unable to allocate enough memory" error.

Unfortunately we are not Linux experts, so we could well believe that what we are seeing is perfectly normal. But to us, it does look strange.

ChemAxon aa7c50abf8

27-06-2006 15:49:55

Quote:
I'm afraid I don't understand your response. Why do you say that the "primary symptom of the problem is that the response time of your machine is drastically reduced"?
Because this was the only way I could explain what you wrote:
Quote:
On the odd occasions that we are able to get a connection to the server and are able to run "top"...
I interpreted this (maybe mistakenly) to mean that it was very difficult to connect to your machine. Let me know if my interpretation is wrong.

Sorry, but I have to take a step-by-step approach in order to avoid mistaking assumptions for facts.

First off, if the only Java program you run on your machine is Tomcat, there should be just one Java process. It is possible that in the "top" output you are seeing the threads (37 of them) of a single Java process. (I cannot say this for sure, because on my Linux, "top" consolidates threads into the owning process and displays only the process. But other (mainly older) Linux distributions may ship a "top" which displays each thread of a process on a separate line.) Do all of the 37 Java processes you mention have a PID (process ID) equal to 3458? If yes, you have only one Java process, and the total virtual memory consumption of that process is the 799M shown in the SIZE column. (Since the threads of a given process share the same address space, I strongly suspect that they acquire memory on behalf of the process, so the operating system probably does not keep track of memory allocation at the thread level -- it simply displays the virtual memory size of the entire process for each thread listed.)

Is the PID for all of the 37 Java processes the same?

P.

User 7b0ee04e66

27-06-2006 16:13:33

Hi,

Sorry, no, you are right. Most of the time, the way we know there is an issue is that we cannot connect to the machine. Below is the list of current Java processes that top shows. We have added the parent process ID (PPID) column to the output.

This shows all 37 Java processes, each with a different process ID. However, when you look at the parent process ID, there are only 3 values and all are linked.

I therefore don't know whether we are seeing 37 separate Java processes, or 3 processes with many threads within the 3rd process.

  PID  PPID USER   PRI NI SIZE  RSS SHARE STAT %CPU %MEM TIME CPU COMM
 3456     1 oracle  25  0 799M 799M  5340 S     0.0 10.6 0:10   1 java
 3457  3456 oracle  15  0 799M 799M  5340 S     0.0 10.6 0:07   2 java
 3458  3457 oracle  16  0 799M 799M  5340 S     0.0 10.6 1:01   1 java
 3459  3457 oracle  16  0 799M 799M  5340 S     0.0 10.6 1:00   1 java
 3460  3457 oracle  16  0 799M 799M  5340 S     0.0 10.6 1:01   2 java
 3461  3457 oracle  16  0 799M 799M  5340 S     0.0 10.6 1:01   0 java
 3462  3457 oracle  15  0 799M 799M  5340 S     0.0 10.6 0:26   2 java
 3463  3457 oracle  15  0 799M 799M  5340 S     0.0 10.6 0:00   2 java
 3464  3457 oracle  15  0 799M 799M  5340 S     0.0 10.6 0:00   1 java
 3465  3457 oracle  20  0 799M 799M  5340 S     0.0 10.6 0:00   1 java
 3466  3457 oracle  16  0 799M 799M  5340 S     0.0 10.6 0:00   2 java
 3467  3457 oracle  17  0 799M 799M  5340 S     0.0 10.6 0:47   1 java
 3468  3457 oracle  16  0 799M 799M  5340 S     0.0 10.6 0:46   1 java
 3469  3457 oracle  20  0 799M 799M  5340 S     0.0 10.6 0:00   1 java
 3470  3457 oracle  15  0 799M 799M  5340 S     0.0 10.6 5:53   3 java
 3474  3457 oracle  15  0 799M 799M  5340 S     0.0 10.6 0:51   0 java
 3475  3457 oracle  15  0 799M 799M  5340 S     0.0 10.6 0:00   0 java
 3477  3457 oracle  15  0 799M 799M  5340 S     0.0 10.6 0:00   2 java
 3502  3457 oracle  15  0 799M 799M  5340 S     0.0 10.6 0:00   0 java
 3503  3457 oracle  15  0 799M 799M  5340 S     0.0 10.6 0:00   1 java
 3504  3457 oracle  15  0 799M 799M  5340 S     0.0 10.6 0:00   2 java
 3505  3457 oracle  15  0 799M 799M  5340 S     0.0 10.6 0:00   2 java
 3506  3457 oracle  15  0 799M 799M  5340 S     0.0 10.6 0:00   2 java
 3507  3457 oracle  15  0 799M 799M  5340 S     0.0 10.6 0:56   1 java
 3513  3457 oracle  15  0 799M 799M  5340 S     0.0 10.6 0:00   0 java
 3514  3457 oracle  20  0 799M 799M  5340 S     0.0 10.6 0:00   0 java
 3515  3457 oracle  20  0 799M 799M  5340 S     0.0 10.6 0:00   0 java
 3516  3457 oracle  20  0 799M 799M  5340 S     0.0 10.6 0:00   0 java
 3517  3457 oracle  20  0 799M 799M  5340 S     0.0 10.6 0:00   0 java
 3518  3457 oracle  20  0 799M 799M  5340 S     0.0 10.6 0:00   0 java
 3519  3457 oracle  15  0 799M 799M  5340 S     0.0 10.6 0:00   2 java
12839  3457 oracle  15  0 799M 799M  5340 S     0.0 10.6 0:01   0 java
14051  3457 oracle  15  0 799M 799M  5340 S     0.0 10.6 0:00   0 java
14053  3457 oracle  15  0 799M 799M  5340 S     0.0 10.6 0:00   3 java
15054  3457 oracle  15  0 799M 799M  5340 S     0.0 10.6 0:00   3 java
15536  3457 oracle  15  0 799M 799M  5340 S     0.0 10.6 0:00   2 java
15537  3457 oracle  15  0 799M 799M  5340 S     0.0 10.6 0:00   0 java
15538  3457 oracle  15  0 799M 799M  5340 S     0.0 10.6 0:00   1 java
15539  3457 oracle  15  0 799M 799M  5340 S     0.0 10.6 0:00   1 java
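
The parent/child relationship in the listing can be checked mechanically. The following Python sketch is illustrative only: it uses a handful of (PID, PPID) pairs copied from the table, and the helper names `build_children` and `find_roots` are made up for this example. It shows that every entry hangs off a single root (3456), which is consistent with, but does not by itself prove, one Java process whose threads are listed as separate entries.

```python
from collections import defaultdict

# (PID, PPID) pairs copied from a few rows of the listing above;
# every row not shown here also has PPID 3457.
rows = [
    (3456, 1),
    (3457, 3456),
    (3458, 3457),
    (3459, 3457),
    (3470, 3457),
    (15539, 3457),
]


def build_children(pairs):
    """Map each parent PID to the sorted list of its child PIDs."""
    children = defaultdict(list)
    for pid, ppid in pairs:
        children[ppid].append(pid)
    return {ppid: sorted(pids) for ppid, pids in children.items()}


def find_roots(pairs):
    """Return the PIDs whose parent does not itself appear in the listing."""
    listed = {pid for pid, _ in pairs}
    return sorted(pid for pid, ppid in pairs if ppid not in listed)


children = build_children(rows)
roots = find_roots(rows)

print("roots:", roots)
for ppid in sorted(children):
    print(ppid, "->", children[ppid])
```

Only three distinct PPID values appear (1, 3456 and 3457), matching the observation that "there are only 3 values and all are linked".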

ChemAxon aa7c50abf8

27-06-2006 17:49:48

Hi,

Despite the PIDs being different, we still believe that the entries listed are threads of the same Tomcat Java process. (Whether the displayed PIDs of threads are different or not may also depend on the operating system version and/or the version of the "top" program.)

We are not aware of any JChem component running in Tomcat which would spawn other processes. This implies that if you start Tomcat just once, there should be only one Tomcat Java process running on your machine.

Also, it is obviously impossible that each of the 37 processes takes up 10.6% of the memory, as indicated on each line of the "top" output. That would add up to 392.2% memory consumption, which does not make sense. If you are looking for the memory hog, I suggest executing something like

Code:
ps -e -o "pid,ppid,lwp,pcpu,size,pmem,rss,cmd"

and looking at the SZ and %MEM columns to find it.
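
The arithmetic behind the 392.2% figure can be spelled out in a few lines. This is only a sketch in Python using the two numbers quoted from the "top" output (37 lines, 10.6% each); the variable names are invented for the example. It contrasts summing %MEM per line with counting the memory once, which is the appropriate reading if the lines are threads sharing one address space.

```python
# Values taken from the "top" listing quoted in this thread.
entries = 37           # number of java lines reported
pmem_per_entry = 10.6  # %MEM shown on every line

# Naive reading: every line is an independent process with its own memory.
naive_total = entries * pmem_per_entry
print(f"naive sum of %MEM: {naive_total:.1f}%")

# If the lines are threads of one process, the memory is shared and
# should be counted once, not once per line.
counted_once = pmem_per_entry
print(f"counted once: {counted_once:.1f}%")
```

A total above 100% is the signal that the per-line figures cannot be independent.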





Do you expect the next lock-up to occur soon?

Please, could you post the output of the above command?

P.

User 7b0ee04e66

28-06-2006 08:47:36

Here are the results from running the following command:

ps -e -o "pid,ppid,pcpu,size,pmem,rss,cmd"

(Note: our version of ps did not accept the lwp column.)

ChemAxon aa7c50abf8

28-06-2006 09:06:35

Please, could you also execute "free" and post the output?

Could you please also post the -Xmx parameter for Tomcat? It should have been in the output of the "ps" command, but the long lines were somehow trimmed. (Alas, this is another aspect in which your Linux tools differ. Or maybe you copy-pasted the output from a terminal window.)

Thanks

P.

User 7b0ee04e66

28-06-2006 09:12:17

[oracle@uksap12 oracle]$ free -t -m
             total       used       free     shared    buffers     cached
Mem:          7554       7530         24          0         61       6927
-/+ buffers/cache:        541       7013
Swap:        16998        170      16827
Total:       24553       7701      16852

java -server -Xmx512M
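
For readers unfamiliar with this older "free" layout, the "-/+ buffers/cache" line is derived from the "Mem:" line by moving buffers and page cache from "used" to "free". The Python sketch below redoes that subtraction with the figures posted above; the variable names are made up for the example, and the results can differ from the posted line by 1 MB because "free -m" rounds each column separately. It describes only this snapshot and says nothing about the state of the machine during a hang.

```python
# Figures (in MB) copied from the "Mem:" line of the free output above.
total_mb, used_mb, free_mb, buffers_mb, cached_mb = 7554, 7530, 24, 61, 6927

# Memory held by applications, excluding buffers and page cache.
used_by_apps_mb = used_mb - buffers_mb - cached_mb

# Memory the kernel could hand to applications by shrinking the caches.
available_mb = free_mb + buffers_mb + cached_mb

print("used by applications (MB):", used_by_apps_mb)
print("available to applications (MB):", available_mb)
print(f"share of RAM used by applications: {100 * used_by_apps_mb / total_mb:.1f}%")
```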

ChemAxon aa7c50abf8

28-06-2006 10:27:41

Did you, or your users, observe a gradual slowdown in the machine's responsiveness a day (or two) before the hang occurs, or does the problem just kick in without any prior sign?

Please, could you point your web browser to the "Tomcat Web Application Manager" page (something like http://localhost:8090/manager/html; host and port may differ) and check how many sessions are displayed in the "Session" column? (Accessing this page requires the user ID and password of a user having the "manager" role in the realm configured for the manager web app. The default realm is mapped to the <tomcat-home>/conf/tomcat-users.xml user database. It is possible that you have not yet configured a manager user. If you have not, I can give you further instructions on how to do it.)

I wish we could see the entire command line of the Tomcat Java programs with your version of "ps". Is the end of the lines cut off even if you pipe the ps command's output into less?

Code:
ps -e -o "pid,ppid,pcpu,size,pmem,rss,stime,cmd" | less

(Please note that I added a new column, "stime", to the column list. Knowing when a given process was started might come in handy.)

Does "ps -ef | less" also trim the lines?

Thanks

P.

User 7b0ee04e66

28-06-2006 10:37:56

You are right in that we have not configured the manager user. Below is the content of the tomcat-users.xml file.

<?xml version='1.0' encoding='utf-8'?>
<tomcat-users>
  <role rolename="tomcat"/>
  <role rolename="role1"/>
  <user username="tomcat" password="tomcat" roles="tomcat"/>
  <user username="role1" password="tomcat" roles="role1"/>
  <user username="both" password="tomcat" roles="tomcat,role1"/>
</tomcat-users>

Below is the full line for the Java entry from ps (piped through less):

3458 3457 0.0 2267012 10.1 782464 Jun20 /home/oracle/jdk1.5.0_04/bin/java -server -Xmx2000m -Djava.awt.headless=true -Djava.endorsed.dirs=/home/oracle/tomcat/jakarta-tomcat-4.1.31/common/endorsed -classpath /home/oracle/jdk1.5.0_04/lib/tools.jar:/home/oracle/tomcat/jakarta-tomcat-4.1.31/bin/bootstrap.jar -Dcatalina.

All entries are the same (except for the process and parent process IDs).
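
To put the numbers on that ps line side by side, here is a small Python sketch. It is an illustration only: the helper `parse_xmx_mb` is invented for this example, the command line is shortened to its first few arguments, and it assumes that ps reports the SIZE and RSS columns in kilobytes. Note that -Xmx limits only the Java heap, so the figures are not directly comparable with the total process size.

```python
import re

# Start of the command line copied from the ps output above
# (the original line is cut off after "-Dcatalina.").
cmd = "/home/oracle/jdk1.5.0_04/bin/java -server -Xmx2000m -Djava.awt.headless=true"
size_kb = 2267012  # SIZE column, assumed to be in kilobytes
rss_kb = 782464    # RSS column, assumed to be in kilobytes


def parse_xmx_mb(command_line):
    """Return the -Xmx value in MB, or None if the flag is absent."""
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)", command_line)
    if match is None:
        return None
    value, unit = int(match.group(1)), match.group(2).lower()
    # A bare number means bytes; k, m and g are binary multiples.
    factor = {"": 1 / (1024 * 1024), "k": 1 / 1024, "m": 1, "g": 1024}[unit]
    return value * factor


xmx_mb = parse_xmx_mb(cmd)
rss_mb = rss_kb / 1024
size_mb = size_kb / 1024
print(f"-Xmx: {xmx_mb:.0f} MB, RSS: {rss_mb:.0f} MB, SIZE: {size_mb:.0f} MB")
```

On these figures the resident size is well below the configured maximum heap, so the JVM would still be allowed to grow.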