Linux Health - CPU not found

jbastian
I have communication between the agent and ServersCheck. However, the check returns a status of DOWN because no CPU is found, yet it still reports a CPU utilization value.

Any thoughts on what's going on here?

ServersCheck v7.5.8; the CPU is a 2-way Intel Xeon DC.



The check has been performed.

Status: DOWN?

Error returned: No CPU found

Value returned: PING 0ms - 4 % CPU usage - sda2.68 % Free memory - 67 % free on sda1 - 100 % free on none - 96 % free on sda6 -

Comments

  • Administrator
    It seems the data is not returned in the correct structure.

    If you telnet to the agent on port 5555, what data do you get back?
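The telnet test can also be scripted. Below is a minimal Python 3 sketch, assuming (as the telnet session later in this thread suggests) that the agent sends its report after receiving a newline and then closes the connection. `read_agent_report` is a hypothetical helper name, and the 30-second timeout is an arbitrary choice because the agent took a while to answer.

```python
import socket


def read_agent_report(host: str, port: int = 5555, timeout: float = 30.0) -> str:
    """Connect to the agent, send a newline, and read until the agent closes the socket."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"\n")  # the equivalent of pressing Enter in telnet
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the agent closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")


# Usage (replace with the Linux server's address):
# print(read_agent_report("192.0.2.10"))
```

If the returned text ends right after "--- CPU states ---", the agent's CPU command produced nothing the parser can use.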
  • jbastian
    Quote: Originally posted by Administrator on 30 October 2007

    It seems not to return the data in the correct structure.

    If you telnet to the agent on port 5555, what do you get as data returned?

    When I telnet into the Linux server from my Windows box on port 5555, I get nothing; it just connects. Is there something I need to do to see the data?
  • Administrator
    Do you get no output at all when you press Enter?

    Let's try this then (a bit trickier):

    1/ Download the linucproc.conf file from the following URL:
    http://files.serverscheck.net/debug/linucproc.conf

    2/ Open it with Notepad and replace the IP address on the first line with your IP.

    3/ Open a Windows command prompt (Start > Run > 'cmd') and go to the /agents subdirectory.

    4/ There, type the following command:
    linuxproc_check.exe linucproc

    It will generate a file called linucproc.log. Reply with the contents of that file.
  • jbastian
    I pressed Enter and had to wait a little while. Here is the telnet session output. Let me know if the log file would provide more information and I'll generate it.



    --- Process info ---
    PID TTY TIME CMD
    1 ? 00:00:04 init
    2 ? 00:00:00 keventd
    3 ? 00:00:00 ksoftirqd/0
    6 ? 00:00:00 bdflush
    4 ? 00:00:00 kswapd
    5 ? 00:00:00 kscand
    7 ? 00:00:00 kupdated
    18 ? 00:00:00 vmnixhbd
    27 ? 00:00:00 vmkdevd
    41 ? 00:00:00 scsi_eh_0
    50 ? 00:00:01 kjournald
    101 ? 00:00:00 khubd
    216 ? 00:00:00 kjournald
    217 ? 00:00:00 kjournald
    1061 ? 00:00:00 syslogd
    1065 ? 00:00:00 klogd
    1093 ? 00:00:00 sshd
    1145 ? 00:00:00 vmklogger
    1153 ? 00:00:00 scsi_eh_1
    1155 ? 00:00:00 vmkiscsid
    1203 ? 00:00:00 xinetd
    1212 ? 00:00:00 gpm
    1232 ? 00:00:00 vmware-watchdog
    1237 ? 00:00:07 webAccess
    1260 ? 00:00:00 crond
    1271 ? 00:00:00 vmkload_app
    1289 ? 00:00:00 vmware-watchdog
    1291 ? 00:00:00 logger
    1295 ? 01:43:46 vmware-hostd
    1325 ? 00:00:00 vmware-watchdog
    1330 ? 00:00:01 cimserver
    1348 ? 00:00:00 vmware-watchdog
    1353 ? 00:33:16 vpxa
    1389 tty1 00:00:00 mingetty
    1390 tty2 00:00:00 mingetty
    1391 tty3 00:00:00 mingetty
    1392 tty4 00:00:00 mingetty
    1393 tty5 00:00:00 mingetty
    1394 tty6 00:00:00 mingetty
    1428 ? 00:00:00 cimservera
    1837 ? 00:00:00 ftbb
    1846 ? 00:00:00 ftbackbone
    1877 ? 00:00:15 ftAgent
    1934 ? 00:00:00 ftStateMon
    1940 ? 00:00:04 ftProcMon
    1944 ? 00:00:00 ftRuleManager
    1987 ? 00:00:00 VMap
    4231 ? 00:00:00 vmkiscsid
    26273 ? 00:00:53 vmkload_app
    26284 ? 00:00:52 vmkload_app
    26288 ? 00:00:54 vmkload_app
    26290 ? 00:00:56 vmkload_app
    32543 ? 00:00:00 serv
    6441 ? 00:00:00 sshd
    6443 ? 00:00:00 sftp-server
    6553 ? 00:00:00 sshd
    6555 pts/1 00:00:00 bash
    25769 ? 00:00:00 sh
    25770 ? 00:00:00 ps

    ;--- Disk info ---
    Filesystem 1K-blocks Used Available Use% Mounted on
    /dev/sda2 5036316 1504184 3276300 32% /
    /dev/sda1 101089 31258 64612 33% /boot
    none 134112 0 134112 0% /dev/shm
    /dev/sda6 2008108 77048 1829052 5% /var/log

    --- Memory info ---
    total used free shared buffers cached
    Mem: 268228 260060 8168 0 41996 90560
    -/+ buffers/cache: 127504 140724
    Swap: 554200 0 554200

    --- CPU states ---

    Connection to host lost.
  • Administrator
    For some reason, the code on your Linux system does not return the CPU value.

    Instead, you can use the SNMP checks on Linux to obtain disk space and memory availability.
  • jbastian
    I was hoping to use a single rule.

    I've found the problem; two problems, really. First, 'CPU' is capitalized in this output, while the code greps for 'Cpu'. Second, the CPU stats take two lines instead of one: a header and then the data.

    Are there any Linux gurus who can modify the grep command to support this output?



    top output:

    CPU states: cpu user nice system irq softirq iowait idle
    total 3.4% 0.0% 0.4% 0.0% 2.0% 0.0% 94.2%



    Current command:

    sh -c "top b n 2 | grep 'Cpu'"



    FYI, this is a VMware ESX 3 box running 2.4.21-37.0.2.ELvmnix #1 Mon Sep 25 22:18:34 PDT 2006 i686 i686 i386 GNU/Linux
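What jbastian asks for boils down to a text transformation: find the "CPU states:" header, read the "total" row on the next line, and rewrite it in the single-line "Cpu(s): ...%id" layout. The Python 3 sketch below shows that transformation on captured top output. It is not the agent's code; `convert_cpu_states`, `ABBREV`, and `ORDER` are hypothetical names, the irq to hi and softirq to si mapping is an assumption, and on the ESX console itself an equivalent shell or awk pipeline would still be needed.

```python
from typing import List

# Assumed mapping from the ESX 3 column names to the abbreviations in the
# "Cpu(s): ...%us, ...%id" format (irq -> hi and softirq -> si are assumptions).
ABBREV = {
    "user": "us",
    "system": "sy",
    "nice": "ni",
    "idle": "id",
    "iowait": "wa",
    "irq": "hi",
    "softirq": "si",
}
# Field order copied from the example format in this thread (no "st" column here).
ORDER = ["us", "sy", "ni", "id", "wa", "hi", "si"]


def convert_cpu_states(top_output: str) -> List[str]:
    """Rewrite each 'CPU states:' header plus its 'total' row as one 'Cpu(s):' line."""
    lines = top_output.splitlines()
    converted = []
    for i, line in enumerate(lines):
        if not line.strip().startswith("CPU states:"):
            continue
        # Column names after the colon: ['cpu', 'user', 'nice', ..., 'idle'].
        columns = line.split(":", 1)[1].split()
        if i + 1 >= len(lines):
            break
        # The data sits on the next line, which starts with 'total'.
        tokens = lines[i + 1].split()
        if not tokens or tokens[0] != "total":
            continue
        values = dict(zip(columns[1:], tokens[1:]))  # e.g. 'idle' -> '94.2%'
        fields = {ABBREV[name]: value for name, value in values.items() if name in ABBREV}
        parts = [fields[key] + key for key in ORDER if key in fields]
        converted.append("Cpu(s): " + ", ".join(parts))
    return converted


sample = """CPU states: cpu user nice system irq softirq iowait idle
total 3.4% 0.0% 0.4% 0.0% 2.0% 0.0% 94.2%"""
print(convert_cpu_states(sample))
# ['Cpu(s): 3.4%us, 0.4%sy, 0.0%ni, 94.2%id, 0.0%wa, 0.0%hi, 2.0%si']
```

Reading the column names from the header line, rather than hard-coding positions, keeps the sketch working if the ESX top output orders its columns differently.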
  • jacobsenm
    You have the same problem as I do; I reported it as well.

    I tried this:

    top b n 2 | grep 'total' | sed 's# total#Cpu(s):#g'

    No luck with that either.
  • Administrator
    The data needs to be received in the following format:

    --- CPU states ---
    Cpu(s): 44.5%us, 0.5%sy, 0.0%ni, 54.2%id, 0.8%wa, 0.0%hi, 0.0%si, 0.0%st
    Cpu(s): 47.3%us, 0.7%sy, 0.0%ni, 52.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

    It isolates the idle-time value, e.g. "54.2%id".

    Multiple CPUs are OK; it computes the average automatically.
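A minimal sketch of the calculation described in this reply and later in the thread (CPU usage = 100% minus the average of the %id values). It is not ServersCheck's implementation: `cpu_usage_percent` is a hypothetical name, considering only lines that begin with "Cpu(s):" is an assumption, and so is accepting an optional space before "id". Returning `None` when nothing matches stands in for the "No CPU found" case and avoids dividing by zero.

```python
import re
from statistics import mean
from typing import Optional

# Idle value such as "54.2%id"; tolerating "54.2% id" is an assumption.
IDLE_RE = re.compile(r"(\d+(?:\.\d+)?)%\s*id\b")


def cpu_usage_percent(cpu_states: str) -> Optional[float]:
    """Return 100 minus the average %id of all 'Cpu(s):' lines, or None if none match."""
    idle_values = []
    for line in cpu_states.splitlines():
        line = line.strip()
        if not line.startswith("Cpu(s):"):  # assumed: only these lines are used
            continue
        match = IDLE_RE.search(line)
        if match:
            idle_values.append(float(match.group(1)))
    if not idle_values:
        # Nothing to average: report "no CPU found" instead of dividing by zero.
        return None
    return round(100.0 - mean(idle_values), 2)


sample = """--- CPU states ---
Cpu(s): 44.5%us, 0.5%sy, 0.0%ni, 54.2%id, 0.8%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu(s): 47.3%us, 0.7%sy, 0.0%ni, 52.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st"""
print(cpu_usage_percent(sample))  # 46.9
```

Because the rule compares usage rather than idle time, a box that is 99% idle ends up as roughly 1% usage under this formula.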
  • jacobsenm
    One question, though: you say that it isolates %id.

    My server has a CPU %idle of 99%, but the rule asks me to enter a value that CPU% must be greater than. If I understand it correctly, in my case it should report when %idle is less than the value instead of greater than.

    The current command is:

    top b n 2 | grep 'total' | sed 's/total/Cpu(s)/g' | sed 's/$/ id/'

    so it generates:

    Cpu(s) 0.9% 0.0% 0.0% 0.0% 0.0% 0.0% 99.0% id
    Cpu(s) 0.7% 0.0% 0.1% 0.0% 0.5% 0.0% 98.4% id

    Any help on this?
  • Administrator
    This is not the format accepted by the software.

    If you cannot produce it in the same format, you will need to use SNMP instead; see the knowledge base on OIDs.

    *UPDATE* A change request has been submitted to development to support your output as well.
  • jacobsenm
    That would be great!

    Thanks a lot.
  • Administrator
    Jacobsenm,

    Please check release 7.8.5 to see whether it works for you with the changes made.
  • jacobsenm
    Hi,

    I very much appreciate your willingness to put effort into resolving this issue, but at the moment it does not fix it. In fact, it breaks the way Linuxhealth was calculated on a SUSE box.

    Here is the case. The linuxhealth test was working well on a SUSE box. When testing the linuxhealth rule, the TEST SETTINGS button gives the following results:



    Status: OK

    Value returned: PING 3ms - 0.19 % CPU usage - 16 % Free memory - 84 % free on hda3 - 99 % free on tmpfs -



    Default command in server.c

    top b n 2 | grep -i cpu



    Results:



    Cpu(s): 0.1% us, 0.0% sy, 0.0% ni, 99.9% id, 0.0% wa, 0.0% hi, 0.0% si

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

    Cpu(s): 0.0% us, 0.3% sy, 0.0% ni, 99.7% id, 0.0% wa, 0.0% hi, 0.0% si

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND



    You mentioned in earlier posts that ServersCheck filters on %id.

    So somehow you calculate the CPU usage as 100% - CPU %id = CPU %usage.

    The SUSE box was working well in v7.8.4.

    On the VMware box I capture the CPU %id as well, using:

    top b n 2 | grep 'total' | sed 's/total/Cpu(s)/g' | sed 's/$/ id/'



    Result:

    Cpu(s) 0.0% 0.0% 0.9% 0.0% 0.0% 0.0% 99.0% id

    Cpu(s) 0.2% 0.0% 0.0% 0.0% 0.8% 0.0% 98.9% id



    Status: OK

    Value returned: PING 0ms - 100 % CPU usage - 4 % Free memory - 33 % free on sda2 - 69 % free on sda1 - 100 % free on none - 96 % free on sda6 -



    At this point your calculation does not seem to be right. In ServersCheck 7.8.5, my SUSE box is reported down because its CPU is supposedly 100% utilized, which it is not, and my VMware box is also still down because of 100% utilization.

    If you need more information, I would be happy to provide it.



    Thank you, Menno
  • jacobsenm
    Hi,

    Thanks for the quick reply.

    However, when I click "TEST SETTINGS", the application hangs, and the Event Viewer on the SC server shows:

    Faulting application monitoring_thread2.exe, version 7.8.0.0, faulting module perl58.dll, version 5.8.8.820, fault address 0x00085c78.

    So, still no luck.

    Best regards
  • Administrator
    Run s-server.exe in debug mode and then repeat the test. When it hangs, reply with the contents of the debug.log file.

    We need the error message to locate the problem.

    It works fine on our SUSE Linux box with the default agent running.
  • Administrator
    We found a coding error with variables that could have resulted in a division-by-zero error.

    Try the following builds:

    http://files.serverscheck.net/fixes/monitoring_rule.zip

    http://files.serverscheck.net/fixes/monitoring_thread2.zip
  • jacobsenm
    Looks great!

    Everything is working OK now.

    Much appreciated. You are doing a good job.
  • Administrator
    OK. We are going to release it as 7.8.6.


This discussion has been closed.