Linux Health - CPU not found

October 2007

I have communication between the agent and ServersCheck. However, the check returns a status of DOWN because no CPU is found, yet it still reports a CPU utilization.

Any thoughts on what's going on here?

SCv7.5.8, CPU is 2-way Intel Xeon DC

The check has been performed.

Status: DOWN?

Error returned: No CPU found

Value returned: PING 0ms - 4 % CPU usage - sda2.68 % Free memory - 67 % free on sda1 - 100 % free on none - 96 % free on sda6 -

October 2007

It seems not to return the data in the correct structure.

If you telnet to the agent on port 5555, what do you get as data returned?
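For anyone who would rather script this than use telnet, here is a minimal Python sketch of the same test. It assumes the agent only answers after it receives a line ending (the equivalent of pressing Enter) and that it closes the connection when it is done; the function name and the timeout value are made up for illustration.

```python
import socket


def fetch_agent_output(host: str, port: int = 5555, timeout: float = 30.0) -> str:
    """Connect to the agent, send a line ending, and read until the agent closes."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        # Assumption: the agent starts replying once it receives a line ending.
        conn.sendall(b"\r\n")
        while True:
            data = conn.recv(4096)
            if not data:  # empty read means the agent closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")
```

Calling `fetch_agent_output("your-linux-host")` (replace with your own address) should return whatever the agent sends back as one string, which you can then paste into a reply.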

October 2007

Quote: Originally posted by Administrator on 30 October 2007

It seems not to return the data in the correct structure.

If you telnet to the agent on port 5555, what do you get as data returned?

When I telnet to the Linux server from my Windows box on port 5555, I get nothing. It just connects. Is there something I need to do to see it?

October 2007

When you hit enter you do not get any output returned?

Let's try this then (a bit trickier):

1/ Download the linucproc.conf file from the following URL

2/ Open it with Notepad and replace the IP address on the first line with your IP

3/ Open the Windows Command window (Start > Run > 'cmd') and go to the /agents subdirectory

4/ In there, type the following command:

linuxproc_check.exe linucproc

It will generate a file called linucproc.log. Reply with the content of that file.

October 2007

I hit Enter and had to wait a little bit. Here is the telnet session output. Let me know if the log file will provide more and I'll do that.

--- Process info ---
  PID TTY      TIME     CMD
    1 ?        00:00:04 init
    2 ?        00:00:00 keventd
    3 ?        00:00:00 ksoftirqd/0
    6 ?        00:00:00 bdflush
    4 ?        00:00:00 kswapd
    5 ?        00:00:00 kscand
    7 ?        00:00:00 kupdated
   18 ?        00:00:00 vmnixhbd
   27 ?        00:00:00 vmkdevd
   41 ?        00:00:00 scsi_eh_0
   50 ?        00:00:01 kjournald
  101 ?        00:00:00 khubd
  216 ?        00:00:00 kjournald
  217 ?        00:00:00 kjournald
 1061 ?        00:00:00 syslogd
 1065 ?        00:00:00 klogd
 1093 ?        00:00:00 sshd
 1145 ?        00:00:00 vmklogger
 1153 ?        00:00:00 scsi_eh_1
 1155 ?        00:00:00 vmkiscsid
 1203 ?        00:00:00 xinetd
 1212 ?        00:00:00 gpm
 1232 ?        00:00:00 vmware-watchdog
 1237 ?        00:00:07 webAccess
 1260 ?        00:00:00 crond
 1271 ?        00:00:00 vmkload_app
 1289 ?        00:00:00 vmware-watchdog
 1291 ?        00:00:00 logger
 1295 ?        01:43:46 vmware-hostd
 1325 ?        00:00:00 vmware-watchdog
 1330 ?        00:00:01 cimserver
 1348 ?        00:00:00 vmware-watchdog
 1353 ?        00:33:16 vpxa
 1389 tty1     00:00:00 mingetty
 1390 tty2     00:00:00 mingetty
 1391 tty3     00:00:00 mingetty
 1392 tty4     00:00:00 mingetty
 1393 tty5     00:00:00 mingetty
 1394 tty6     00:00:00 mingetty
 1428 ?        00:00:00 cimservera
 1837 ?        00:00:00 ftbb
 1846 ?        00:00:00 ftbackbone
 1877 ?        00:00:15 ftAgent
 1934 ?        00:00:00 ftStateMon
 1940 ?        00:00:04 ftProcMon
 1944 ?        00:00:00 ftRuleManager
 1987 ?        00:00:00 VMap
 4231 ?        00:00:00 vmkiscsid
26273 ?        00:00:53 vmkload_app
26284 ?        00:00:52 vmkload_app
26288 ?        00:00:54 vmkload_app
26290 ?        00:00:56 vmkload_app
32543 ?        00:00:00 serv
 6441 ?        00:00:00 sshd
 6443 ?        00:00:00 sftp-server
 6553 ?        00:00:00 sshd
 6555 pts/1    00:00:00 bash
25769 ?        00:00:00 sh
25770 ?        00:00:00 ps

;--- Disk info ---
Filesystem  1K-blocks     Used  Available  Use%  Mounted on
/dev/sda2     5036316  1504184    3276300   32%  /
/dev/sda1      101089    31258      64612   33%  /boot
none           134112        0     134112    0%  /dev/shm
/dev/sda6     2008108    77048    1829052    5%  /var/log

--- Memory info ---
                     total    used    free  shared  buffers  cached
Mem:                268228  260060    8168       0    41996   90560
-/+ buffers/cache:          127504  140724
Swap:               554200       0  554200

--- CPU states ---

Connection to host lost.
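To make it easier to see which part of the reply is missing, here is a rough Python sketch that splits the agent output on its "--- name ---" header lines. The helper name and the regular expression are my own guesses based on the session above, not the parser ServersCheck actually uses.

```python
import re
from typing import Dict, List

# Header lines look like "--- CPU states ---"; one of them has a leading ";".
SECTION_RE = re.compile(r"^;?\s*-{3}\s*(.+?)\s*-{3}\s*$")


def split_sections(raw: str) -> Dict[str, List[str]]:
    """Group the non-empty lines of the agent reply under their section header."""
    sections: Dict[str, List[str]] = {}
    current = None
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        match = SECTION_RE.match(line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


sample = """--- Memory info ---
Mem: 268228 260060 8168 0 41996 90560
-/+ buffers/cache: 127504 140724
Swap: 554200 0 554200
--- CPU states ---
"""

sections = split_sections(sample)
print(sections["CPU states"])  # an empty list: the header arrived, the data did not
```

With the session above, the "CPU states" section is present but has no lines under it, which matches the "No CPU found" error.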

October 2007

For some reason the code on your Linux system does not return the CPU value.

You can use the SNMP checks on Linux instead to obtain disk space information as well as memory availability.

October 2007

I was hoping to use a single rule.

I've found the problem, two really. 'CPU' is capitalized in my top output, while the code greps for 'Cpu'. Also, the CPU stats come as two lines instead of one: a header and then the data.

Are there any Linux gurus who can modify the grep command to support this output?

top output:

CPU states: cpu user nice system irq softirq iowait idle

total 3.4% 0.0% 0.4% 0.0% 2.0% 0.0% 94.2%

current command:

"sh -c "top b n 2 | grep 'Cpu'"n"}

FYI, this is a VMware ESX 3 box running 2.4.21-37.0.2.ELvmnix #1 Mon Sep 25 22:18:34 PDT 2006 i686 i686 i386 GNU/Linux
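To illustrate both problems, here is a small Python sketch that mimics the grep step on the two ESX lines quoted above. The helper names are invented for this example. It shows that a case-sensitive match on 'Cpu' returns nothing, that a case-insensitive match only returns the header, and that the values can be recovered by pairing the header columns with the line that follows.

```python
def grep(lines, pattern, ignore_case=False):
    """Mimic `grep PATTERN` (or `grep -i PATTERN`) for a fixed string."""
    if ignore_case:
        return [line for line in lines if pattern.lower() in line.lower()]
    return [line for line in lines if pattern in line]


def pair_header_with_data(header_line, data_line):
    """Zip the column names after 'CPU states:' with the values on the next line."""
    names = header_line.split(":", 1)[1].split()
    values = data_line.split()
    return dict(zip(names, values))


esx_top = [
    "CPU states: cpu user nice system irq softirq iowait idle",
    "total 3.4% 0.0% 0.4% 0.0% 2.0% 0.0% 94.2%",
]

print(grep(esx_top, "Cpu"))                    # nothing matches
print(grep(esx_top, "cpu", ignore_case=True))  # header only, no values
print(pair_header_with_data(*esx_top)["idle"])
```

So switching to `grep -i` alone would not be enough here, because the line with the numbers does not contain the word "cpu" at all.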

November 2007

You have the same problem as me. I reported it as well.

I tried this:

top b n 2 | grep 'total' | sed 's# total#Cpu(s):#g'

No real luck either ....

November 2007

Data needs to be received in the following format:

--- CPU states ---

Cpu(s): 44.5%us, 0.5%sy, 0.0%ni, 54.2%id, 0.8%wa, 0.0%hi, 0.0%si, 0.0%st

Cpu(s): 47.3%us, 0.7%sy, 0.0%ni, 52.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

It isolates the value for idle time: "54.2%id"

Multiple CPUs are OK. It computes the average automatically.
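As a rough illustration of that description (not the actual ServersCheck code), the Python sketch below pulls the idle figure out of each "Cpu" line, averages the values, and reports usage as 100 minus the average idle. The regular expression, the function name and the choice to average every matching line are assumptions.

```python
import re

# Matches the number in front of "%id", with or without a space ("54.2%id", "99.9% id").
IDLE_RE = re.compile(r"(\d+(?:\.\d+)?)%\s*id\b")


def cpu_usage_from_lines(lines):
    """Return average CPU usage in percent, or None if no idle value was found."""
    idle_values = []
    for line in lines:
        if not line.startswith("Cpu"):
            continue
        match = IDLE_RE.search(line)
        if match:
            idle_values.append(float(match.group(1)))
    if not idle_values:
        return None  # avoid dividing by zero when nothing matched
    average_idle = sum(idle_values) / len(idle_values)
    return round(100.0 - average_idle, 2)


expected_format = [
    "Cpu(s): 44.5%us, 0.5%sy, 0.0%ni, 54.2%id, 0.8%wa, 0.0%hi, 0.0%si, 0.0%st",
    "Cpu(s): 47.3%us, 0.7%sy, 0.0%ni, 52.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st",
]

print(cpu_usage_from_lines(expected_format))
```

A line that does not carry an "id" marker right after the idle percentage would simply be skipped by this sketch, which is one way a "No CPU found" result could come about.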

November 2007

One question though: you are saying that it isolates %id.

My server has a CPU %idle of 99%.

And the rule wants me to enter a parameter for CPU % greater than. If I understand it correctly, in my case it should trigger when %idle is less than the parameter instead of greater than.

Current command is:

top b n 2 | grep 'total' | sed 's/total/Cpu(s)/g' | sed 's/$/ id/'

so it will generate:

Cpu(s) 0.9% 0.0% 0.0% 0.0% 0.0% 0.0% 99.0% id

Cpu(s) 0.7% 0.0% 0.1% 0.0% 0.5% 0.0% 98.4% id

Any help on this ?
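For reference, here is a hedged Python sketch of the rewrite that would be needed to turn the ESX "total" line into the labelled layout the software expects. The column mapping (irq to hi, softirq to si, iowait to wa) is my assumption from the header names. The st field is left out because ESX top does not report it. Whether ServersCheck accepts the result has not been verified.

```python
def esx_total_to_cpu_line(total_line: str) -> str:
    """Rewrite 'total <user> <nice> <system> <irq> <softirq> <iowait> <idle>'
    as a labelled 'Cpu(s): ...' line."""
    fields = total_line.split()
    if len(fields) != 8 or fields[0] != "total":
        raise ValueError("unexpected ESX top line: %r" % total_line)
    user, nice, system, irq, softirq, iowait, idle = fields[1:]
    # Assumed mapping of ESX column names onto the labels used by newer top versions.
    return (
        f"Cpu(s): {user}us, {system}sy, {nice}ni, {idle}id, "
        f"{iowait}wa, {irq}hi, {softirq}si"
    )


print(esx_total_to_cpu_line("total 3.4% 0.0% 0.4% 0.0% 2.0% 0.0% 94.2%"))
# Cpu(s): 3.4%us, 0.4%sy, 0.0%ni, 94.2%id, 0.0%wa, 0.0%hi, 2.0%si
```

On the threshold question: if the check reports usage as 100 minus idle, as the "% CPU usage" wording in the returned value suggests, then an idle of 99% shows up as 1% usage, and a "greater than" rule on usage is equivalent to a "less than" rule on idle.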

November 2007

This is not the format accepted by the software.

If you cannot produce it in the same format, you will need to use SNMP instead. See the knowledge base on OIDs.

*UPDATE* A change request has been submitted to development to support your output too.

November 2007

That would be great !

Thanks a lot

November 2007

Jacobsenm,

Please check release 7.8.5 to see if it works for you with the changes made.

November 2007

Hi,

I very much appreciate the willingness to put effort into resolving this issue. But at the moment it does not fix the issue. It actually breaks the way Linux Health was calculated on a SUSE box.

Here is the case. The Linux Health check was working well on a SUSE box. When testing the Linux Health rule, the TEST SETTINGS button gave the following results:

Status: OK

Value returned: PING 3ms - 0.19 % CPU usage - 16 % Free memory - 84 % free on hda3 - 99 % free on tmpfs -

Default command in server.c

top b n 2 | grep -i cpu

Results:

Cpu(s): 0.1% us, 0.0% sy, 0.0% ni, 99.9% id, 0.0% wa, 0.0% hi, 0.0% si

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

Cpu(s): 0.0% us, 0.3% sy, 0.0% ni, 99.7% id, 0.0% wa, 0.0% hi, 0.0% si

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

You mentioned in an earlier post that ServersCheck filters on %id.

So somehow you are calculating the CPU usage as 100% - CPU %id = CPU % usage.

The SUSE box was doing well in v7.8.4.

On VMware I am capturing the CPU %id as well, using

top b n 2 | grep 'total' | sed 's/total/Cpu(s)/g' | sed 's/$/ id/'

Result:

Cpu(s) 0.0% 0.0% 0.9% 0.0% 0.0% 0.0% 99.0% id

Cpu(s) 0.2% 0.0% 0.0% 0.0% 0.8% 0.0% 98.9% id

Status: OK

Value returned: PING 0ms - 100 % CPU usage - 4 % Free memory - 33 % free on sda2 - 69 % free on sda1 - 100 % free on none - 96 % free on sda6 -

At this point your calculation does not seem to work correctly. In ServersCheck 7.8.5 my SUSE box is reported down due to the CPU being 100% utilized, which it is not, and my VMware box is also still down due to 100% utilization.

If you need more info, I would be happy to provide it.

Thank you, Menno

November 2007

Try with following builds:

November 2007

Hi,

Thanks for the quick reply.

However, when I click "TEST SETTINGS" the application hangs, and the Event Viewer on the SC server shows:

Faulting application monitoring_thread2.exe, version 7.8.0.0, faulting module perl58.dll, version 5.8.8.820, fault address 0x00085c78.

So .... still no luck.

Best regards

November 2007

Run s-server.exe in debug mode and then repeat the test. When it hangs, reply with the content of the debug.log file.

We need the error message to be able to locate it.

It works fine on our Suse Linux box with the default agent running.

November 2007

We found a coding error with variables that could have resulted in a division-by-zero error.

Try with following builds:

November 2007

Looks great !

All is working ok now.

Much appreciated. You are doing a good job

November 2007

OK. We are going to release it as 7.8.6.
