Linux Health - CPU not found
Communication between the agent and ServersCheck is working. However, the check returns a status of DOWN because the CPU is not found, even though it is reporting a CPU utilization?
Any thoughts on what's going on here?
SCv7.5.8, CPU is 2-way Intel Xeon DC
The check has been performed.
Status: DOWN?
Error returned: No CPU found
Value returned: PING 0ms - 4 % CPU usage - sda2.68 % Free memory - 67 % free on sda1 - 100 % free on none - 96 % free on sda6 -
Comments
If you telnet to the agent on port 5555, what do you get as data returned?
When I telnet into the Linux server from my Windows box on port 5555, I get nothing. It just connects. Is there something I need to do to see the data?
Let's try this then (a bit trickier):
1/ Download the linucproc.conf file from the following URL:
http://files.serverscheck.net/debug/linucproc.conf
2/ Open it with Notepad and, on the first line, replace the IP address with your IP
3/ Open the Windows Command window (Start > Run > 'cmd')
and go to the /agents subdirectory
4/ In there, type the following command:
linuxproc_check.exe linucproc
It will generate a file called linucproc.log. Reply with the content of that file.
--- Process info ---
PID TTY TIME CMD
1 ? 00:00:04 init
2 ? 00:00:00 keventd
3 ? 00:00:00 ksoftirqd/0
6 ? 00:00:00 bdflush
4 ? 00:00:00 kswapd
5 ? 00:00:00 kscand
7 ? 00:00:00 kupdated
18 ? 00:00:00 vmnixhbd
27 ? 00:00:00 vmkdevd
41 ? 00:00:00 scsi_eh_0
50 ? 00:00:01 kjournald
101 ? 00:00:00 khubd
216 ? 00:00:00 kjournald
217 ? 00:00:00 kjournald
1061 ? 00:00:00 syslogd
1065 ? 00:00:00 klogd
1093 ? 00:00:00 sshd
1145 ? 00:00:00 vmklogger
1153 ? 00:00:00 scsi_eh_1
1155 ? 00:00:00 vmkiscsid
1203 ? 00:00:00 xinetd
1212 ? 00:00:00 gpm
1232 ? 00:00:00 vmware-watchdog
1237 ? 00:00:07 webAccess
1260 ? 00:00:00 crond
1271 ? 00:00:00 vmkload_app
1289 ? 00:00:00 vmware-watchdog
1291 ? 00:00:00 logger
1295 ? 01:43:46 vmware-hostd
1325 ? 00:00:00 vmware-watchdog
1330 ? 00:00:01 cimserver
1348 ? 00:00:00 vmware-watchdog
1353 ? 00:33:16 vpxa
1389 tty1 00:00:00 mingetty
1390 tty2 00:00:00 mingetty
1391 tty3 00:00:00 mingetty
1392 tty4 00:00:00 mingetty
1393 tty5 00:00:00 mingetty
1394 tty6 00:00:00 mingetty
1428 ? 00:00:00 cimservera
1837 ? 00:00:00 ftbb
1846 ? 00:00:00 ftbackbone
1877 ? 00:00:15 ftAgent
1934 ? 00:00:00 ftStateMon
1940 ? 00:00:04 ftProcMon
1944 ? 00:00:00 ftRuleManager
1987 ? 00:00:00 VMap
4231 ? 00:00:00 vmkiscsid
26273 ? 00:00:53 vmkload_app
26284 ? 00:00:52 vmkload_app
26288 ? 00:00:54 vmkload_app
26290 ? 00:00:56 vmkload_app
32543 ? 00:00:00 serv
6441 ? 00:00:00 sshd
6443 ? 00:00:00 sftp-server
6553 ? 00:00:00 sshd
6555 pts/1 00:00:00 bash
25769 ? 00:00:00 sh
25770 ? 00:00:00 ps
;--- Disk info ---
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda2 5036316 1504184 3276300 32% /
/dev/sda1 101089 31258 64612 33% /boot
none 134112 0 134112 0% /dev/shm
/dev/sda6 2008108 77048 1829052 5% /var/log
--- Memory info ---
total used free shared buffers cached
Mem: 268228 260060 8168 0 41996 90560
-/+ buffers/cache: 127504 140724
Swap: 554200 0 554200
--- CPU states ---
Connection to host lost.
On Linux you can use the SNMP checks instead to obtain disk space information as well as memory availability.
I've found the problem, or rather two problems. First, 'CPU' is capitalized in the output and the code looks for 'Cpu'. Second, the CPU stats come as two lines instead of one: a header and then the data.
Are there any Linux gurus who can modify the grep command to support this output?
top output:
CPU states: cpu user nice system irq softirq iowait idle
total 3.4% 0.0% 0.4% 0.0% 2.0% 0.0% 94.2%
current command:
"sh -c \"top b n 2 | grep 'Cpu'\"\n"}
FYI, this is a VMware ESX 3 box running 2.4.21-37.0.2.ELvmnix #1 Mon Sep 25 22:18:34 PDT 2006 i686 i686 i386 GNU/Linux
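To make the two problems concrete, here is a small sketch that runs the grep patterns against the two lines of top output quoted above instead of a live `top`. The function name `esx_top_sample` is made up for this illustration, and the sample is only the excerpt from this post, not a full top dump.

```shell
#!/bin/sh
# Hypothetical helper: prints the two CPU lines quoted in this thread.
esx_top_sample() {
  printf '%s\n' \
    'CPU states: cpu user nice system irq softirq iowait idle' \
    'total 3.4% 0.0% 0.4% 0.0% 2.0% 0.0% 94.2%'
}

# Case-sensitive 'Cpu' matches neither line (the header has 'CPU' and 'cpu').
# grep exits non-zero when nothing matches, hence the '|| true'.
esx_top_sample | grep -c 'Cpu' || true

# A case-insensitive match finds the header only; the numbers are on the next line.
esx_top_sample | grep -ci 'cpu'

# The data row itself can only be found by its 'total' label.
esx_top_sample | grep 'total'
```

So neither changing the case of the pattern nor adding `-i` is enough on its own: the percentages sit on a line that does not contain the word "cpu" at all.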
I tried this:
top b n 2 | grep 'total' | sed 's# total#Cpu(s):#g'
No real luck either ....
--- CPU states ---
Cpu(s): 44.5%us, 0.5%sy, 0.0%ni, 54.2%id, 0.8%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu(s): 47.3%us, 0.7%sy, 0.0%ni, 52.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
The check isolates the idle-time value: "54.2%id"
Multiple CPUs are OK. The average is computed automatically.
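As a rough illustration of the behaviour described above (not ServersCheck's actual code, which is not shown in this thread), the idle value could be isolated and averaged like this. The function names `idle_values` and `average` are hypothetical, and the pattern assumes the "NN.N%id" form with no space before "id".

```shell
#!/bin/sh
# Hypothetical helper: print the number in front of "%id" for each input line.
idle_values() {
  sed -n 's/.*[ ,]\([0-9.][0-9.]*\)%id.*/\1/p'
}

# Hypothetical helper: average one number per line, rounded to one decimal.
average() {
  awk '{ sum += $1; n++ } END { if (n) printf "%.1f\n", sum / n }'
}

# The two sample lines from this post.
printf '%s\n' \
  'Cpu(s): 44.5%us, 0.5%sy, 0.0%ni, 54.2%id, 0.8%wa, 0.0%hi, 0.0%si, 0.0%st' \
  'Cpu(s): 47.3%us, 0.7%sy, 0.0%ni, 52.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st' \
  | idle_values | average
# prints 53.1
```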
My server has a CPU %idle of 99%.
The rule wants me to enter "CPU% greater than" a parameter. If I understand it correctly, in my case it should trigger when the %idle is less than the parameter instead of greater than.
Current command is:
top b n 2 | grep 'total' | sed 's/total/Cpu(s)/g' | sed 's/$/ id/'
so it will generate:
Cpu(s) 0.9% 0.0% 0.0% 0.0% 0.0% 0.0% 99.0% id
Cpu(s) 0.7% 0.0% 0.1% 0.0% 0.5% 0.0% 98.4% id
Any help on this ?
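One possible way to get closer to the labelled "Cpu(s): ...%id" format shown earlier in the thread is to relabel each column with awk rather than only appending " id". This is an untested sketch: the function name `esx_to_cpu_line` is made up, the mapping of the ESX columns irq and softirq to the labels hi and si is an assumption, and whether the check accepts the result is not known.

```shell
#!/bin/sh
# Hypothetical helper: turn the ESX "total" row into a labelled Cpu(s) line.
# Assumed column order after "total": user nice system irq softirq iowait idle
esx_to_cpu_line() {
  awk '$1 == "total" {
    printf "Cpu(s): %sus, %ssy, %sni, %sid, %swa, %shi, %ssi\n",
           $2, $4, $3, $8, $7, $5, $6
  }'
}

# Sample input taken from the top output quoted in this thread.
printf '%s\n' \
  'CPU states: cpu user nice system irq softirq iowait idle' \
  'total 3.4% 0.0% 0.4% 0.0% 2.0% 0.0% 94.2%' \
  | esx_to_cpu_line
# prints: Cpu(s): 3.4%us, 0.4%sy, 0.0%ni, 94.2%id, 0.0%wa, 0.0%hi, 2.0%si
```

On the real box the input would come from `top b n 2` instead of the canned sample.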
If you cannot produce it in the same format, you will need to use SNMP instead. See the knowledge base on OIDs.
*UPDATE* A change request has been submitted to development to support your output too.
Thanks a lot
Please check release 7.8.5 to see if it works for you with the changes made.
I very much appreciate the willingness to put effort into resolving this issue, but at the moment the release does not fix it. It actually breaks the way Linux Health was calculated on a SUSE box.
Here is the case. The linuxhealth test was working well on a SUSE box. When testing the linuxhealth rule, the TEST SETTINGS button gave the following results:
Status: OK
Value returned: PING 3ms - 0.19 % CPU usage - 16 % Free memory - 84 % free on hda3 - 99 % free on tmpfs -
Default command in server.c
top b n 2 | grep -i cpu
Results:
Cpu(s): 0.1% us, 0.0% sy, 0.0% ni, 99.9% id, 0.0% wa, 0.0% hi, 0.0% si
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
Cpu(s): 0.0% us, 0.3% sy, 0.0% ni, 99.7% id, 0.0% wa, 0.0% hi, 0.0% si
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
You mentioned in earlier conversations that ServersCheck filters on %id.
So somehow you are calculating the CPU usage as 100% - CPU %id = CPU % usage.
The SUSE Box was doing well in v7.8.4
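The calculation I am assuming here can be sketched as follows. This is my guess at the logic, not the real ServersCheck code: the function name `cpu_usage` is made up, only lines starting with "Cpu(s):" are used (so the "PID USER ... %CPU" header lines that `grep -i cpu` also returns are skipped), and the pattern accepts both "99.9% id" and "99.9%id".

```shell
#!/bin/sh
# Hypothetical helper: average the idle value over all Cpu(s) lines and
# print 100 minus that average.
cpu_usage() {
  awk '
    /^Cpu\(s\):/ && match($0, /[0-9.]+% ?id/) {
      idle = substr($0, RSTART, RLENGTH)   # e.g. "99.9% id"
      sub(/% ?id$/, "", idle)              # keep only the number
      sum += idle; n++
    }
    END { if (n) printf "%.2f\n", 100 - sum / n }
  '
}

# The SUSE output quoted above, including the header lines.
printf '%s\n' \
  'Cpu(s): 0.1% us, 0.0% sy, 0.0% ni, 99.9% id, 0.0% wa, 0.0% hi, 0.0% si' \
  'PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND' \
  'Cpu(s): 0.0% us, 0.3% sy, 0.0% ni, 99.7% id, 0.0% wa, 0.0% hi, 0.0% si' \
  'PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND' \
  | cpu_usage
# prints 0.20
```

The 0.19 % reported by TEST SETTINGS came from a different sample, so the two numbers are not expected to match exactly.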
On VMware I am capturing the CPU %id as well, using:
top b n 2 | grep 'total' | sed 's/total/Cpu(s)/g' | sed 's/$/ id/'
Result:
Cpu(s) 0.0% 0.0% 0.9% 0.0% 0.0% 0.0% 99.0% id
Cpu(s) 0.2% 0.0% 0.0% 0.0% 0.8% 0.0% 98.9% id
Status: OK
Value returned: PING 0ms - 100 % CPU usage - 4 % Free memory - 33 % free on sda2 - 69 % free on sda1 - 100 % free on none - 96 % free on sda6 -
At this point the calculation does not seem to be right. In ServersCheck 7.8.5 my SUSE box is reported down due to the CPU being 100% utilized, which it is not, and my VMware box is still reported down due to 100% utilization as well.
If you need more info, I would be happy to provide it.
Thank you, Menno
http://files.serverscheck.net/fixes/monitoring_rule.zip
http://files.serverscheck.net/fixes/monitoring_thread2.zip
Thanks for the quick reply.
However, when I do "TEST SETTINGS" the application hangs, and the Event Viewer on the SC server shows:
Faulting application monitoring_thread2.exe, version 7.8.0.0, faulting module perl58.dll, version 5.8.8.820, fault address 0x00085c78.
So .... still no luck.
Best regards
We need the error message to be able to locate it.
It works fine on our Suse Linux box with the default agent running.
Try the following builds:
http://files.serverscheck.net/fixes/monitoring_rule.zip
http://files.serverscheck.net/fixes/monitoring_thread2.zip
All is working ok now.
Much appreciated. You are doing a good job.