WINDOWSHEALTH rule alert not correct

mabu
Hi,

The alert for free memory in the WINDOWSHEALTH rule does not work correctly.

The memory value is always below the DOWN threshold, and the rule frequency is set to 60 when DOWN. But the last alert mail was sent roughly 5 hours ago.



Yesterday it sent out recovery messages for the memory, but the memory value was still below the DOWN threshold.

Comments

  • Administrator
    This rule is used by thousands of users around the world and is reported to be working fine by all others.



    Without knowing the error message, the following could be reasons:

    - server was unreachable

    - mail server delay causing the alert to be sent out late



    The rule activity for status changes can be viewed in the rule log history.



    Run in debug mode as per the knowledge base and simulate the error. You will then be able to track down the behavior in detail.
  • The server is reachable, and I do receive alerts for other checks.



    I have this issue with two WINDOWSHEALTH rules, in both cases for the memory check.



    The last entry in the rule log history is from today at 2:53 am, when it changed from DOWN to OK. But the memory value is still below the DOWN value.

    For information: I did not acknowledge the last DOWN state from 2:52 am and



    I will run the monitoring in debug mode today and send you the log file.
  • Administrator
    Also post the exact error message as returned for the rule. A remark is not an error message. We need the actual error message returned. It is shown when you hover your mouse over the text "details", or in the log file of the rule.
  • Here is the Last Error:

    MEMORY Error level: 4 (returned value) less than 10



    But the debug log is too big to paste here. Could I send it to you by mail?
  • I forgot the error from the second rule, which has the same problem.



    Rule name

    "DEBACKUP WINDOWS HEALTH (11918618541731)"

    Last Error

    MEMORY Warning level: 25 (returned value) less than 30
  • Administrator
    The Warning level is different from the DOWN level.



    Isolate the data in the debug log related to the WINDOWSHEALTH rule, showing where it exceeds the data in the log file, and post it.
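To illustrate the distinction between the two levels, here is a minimal Python sketch. It is not ServersCheck code: the function name and parameters are hypothetical, and the thresholds (warning below 30, error below 10) are simply the numbers quoted in the two error messages above, combined for illustration.

```python
def classify_free_memory(value, warning_below, error_below):
    """Classify a free-memory reading against two 'less than' thresholds.

    Hypothetical helper for illustration only; the real rule logic
    is not shown anywhere in this thread.
    """
    if value < error_below:
        # e.g. "MEMORY Error level: 4 (returned value) less than 10"
        return "DOWN"
    if value < warning_below:
        # e.g. "MEMORY Warning level: 25 (returned value) less than 30"
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    for reading in (4, 25, 50):
        print(reading, classify_free_memory(reading, warning_below=30, error_below=10))
```

Under these assumed thresholds, a reading of 25 only reaches the Warning level, while a reading of 4 crosses the Error level that corresponds to DOWN.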
  • Here is part of the debug log:





    # S-12 Tue Oct 9 09:31:50 2007 1190909412WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 6 (returned value) less than 10 -

    # S-12 Tue Oct 9 09:31:53 2007 1190909412WINDOWSHEALTH - Starting check - 514

    # S-12 Tue Oct 9 09:31:56 2007 1190909412WINDOWSHEALTH - s:OK - e:MEMORY Error level: 6 (returned value) less than 10 - v:0|0|6|C.40|D.60|E.54 - t:6

    # S-12 Tue Oct 9 09:46:56 2007 1190909412WINDOWSHEALTH - Starting check - 515

    # S-12 Tue Oct 9 09:47:00 2007 1190909412WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 6 (returned value) less than 10 - v:0|0|6|C.40|D.60|E.54 - t:910
  • bedee
    I may have a similar problem with serverhealth.



    I set the limit for a DOWN message to 300 ms (ping). Now I got a DOWN message with the following description:



    PING Error level: 163 (returned value) greater than 10



    I checked the settings again in the servershealth rule. The value is still 300 ms for the DOWN level. I can't find any setting of 10 ms.
  • Administrator
    Mabu & Bedee,



    Thank you for the post. Our engineers are looking at it.



    Update 4:11pm CET: we found the issue and are working on a fix right now.
  • Administrator
    Download the fix from the following URL:

    http://files.serverscheck.net/fixes/monitoring_thread2.zip
  • Administrator
    We have tested it with other users and our test platforms and the reported bug is gone.



    I assume you upgraded to 7.6.4.



    Run in debug mode again and send the output.



    The last known error is always shown.
  • So I did a reinstallation with version 7.6.4 and afterwards ran debug mode. You will find the log below.



    The rule that I recreated (CCmarketss07 WinHEALTH (1192035878)) sends alerts; the other one, which I did not recreate (DEBACKUP WINDOWS HEALTH (11918618541731)), does not.



    For rule 1192035878 I set "Interval when status is down" to 60, but it didn't send an alert every minute.



    # S-30 Thu Oct 11 16:34:00 2007 1192035878WINDOWSHEALTH - Starting check - 6

    # S-30 Thu Oct 11 16:34:04 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|4|9|C.40|D.60|E.54 - t:1192113244

    # S-30 Thu Oct 11 16:34:07 2007 1192035878WINDOWSHEALTH - Starting check - 6

    # S-30 Thu Oct 11 16:34:11 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:7

    # S-30 Thu Oct 11 16:34:14 2007 1192035878WINDOWSHEALTH - Starting check - 6

    # S-30 Thu Oct 11 16:34:18 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|2|9|C.40|D.60|E.54 - t:14

    # S-30 Thu Oct 11 16:34:21 2007 1192035878WINDOWSHEALTH - Starting check - 6

    # S-30 Thu Oct 11 16:34:24 2007 1192035878WINDOWSHEALTH - s:DOWN - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:20

    # S-30 Thu Oct 11 16:35:24 2007 1192035878WINDOWSHEALTH - Starting check - 6

    # S-30 Thu Oct 11 16:35:28 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:84

    # S-30 Thu Oct 11 16:35:31 2007 1192035878WINDOWSHEALTH - Starting check - 6

    # S-30 Thu Oct 11 16:35:35 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|2|9|C.40|D.60|E.54 - t:7

    # S-30 Thu Oct 11 16:35:38 2007 1192035878WINDOWSHEALTH - Starting check - 6

    # S-30 Thu Oct 11 16:35:42 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|2|9|C.40|D.60|E.54 - t:14

    # S-30 Thu Oct 11 16:35:45 2007 1192035878WINDOWSHEALTH - Starting check - 6

    # S-30 Thu Oct 11 16:35:48 2007 1192035878WINDOWSHEALTH - s:DOWN - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:20

    # S-30 Thu Oct 11 16:36:48 2007 1192035878WINDOWSHEALTH - Starting check - 6

    # S-30 Thu Oct 11 16:36:52 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:84

    # S-30 Thu Oct 11 16:36:55 2007 1192035878WINDOWSHEALTH - Starting check - 6

    # S-30 Thu Oct 11 16:36:59 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|2|9|C.40|D.60|E.54 - t:7

    # S-30 Thu Oct 11 16:37:02 2007 1192035878WINDOWSHEALTH - Starting check - 6

    # S-30 Thu Oct 11 16:37:06 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:14

    # S-30 Thu Oct 11 16:37:09 2007 1192035878WINDOWSHEALTH - Starting check - 6

    # S-30 Thu Oct 11 16:37:12 2007 1192035878WINDOWSHEALTH - s:DOWN - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:20

    # S-30 Thu Oct 11 16:38:12 2007 1192035878WINDOWSHEALTH - Starting check - 6

    # S-30 Thu Oct 11 16:38:16 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:84

    # S-30 Thu Oct 11 16:38:19 2007 1192035878WINDOWSHEALTH - Starting check - 6

    # S-30 Thu Oct 11 16:38:23 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|2|9|C.40|D.60|E.54 - t:7

    # S-30 Thu Oct 11 16:38:26 2007 1192035878WINDOWSHEALTH - Starting check - 6

    # S-30 Thu Oct 11 16:38:31 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|31|9|C.40|D.60|E.54 - t:15

    # S-30 Thu Oct 11 16:38:34 2007 1192035878WINDOWSHEALTH - Starting check - 6

    # S-30 Thu Oct 11 16:38:37 2007 1192035878WINDOWSHEALTH - s:DOWN - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:21

    # S-30 Thu Oct 11 16:39:37 2007 1192035878WINDOWSHEALTH - Starting check - 6

    # S-30 Thu Oct 11 16:39:41 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:64

    # S-30 Thu Oct 11 16:39:44 2007 1192035878WINDOWSHEALTH - Starting check - 6

    # S-30 Thu Oct 11 16:39:48 2007 1192035878WINDOWSHEALTH # M Thu Oct 11 16:39:55 2007 import rules

    # S-30 Thu Oct 11 16:39:51 2007 1192035878WINDOWSHEALTH - Starting check - 6

    # S-30 Thu Oct 11 16:39:55 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:14

    # S-30 Thu Oct 11 16:39:58 2007 1192035878WINDOWSHEALTH - Starting check - 6

    # S-30 Thu Oct 11 16:40:02 2007 1192035878WINDOWSHEALTH - s:DOWN - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|5|9|C.40|D.60|E.54 - t:21

    # S-30 Thu Oct 11 16:41:02 2007 1192035878WINDOWSHEALTH - Starting check - 6

    # Thu Oct 11 16:41:06 2007 skipping Availability graphs - only value graphs plotted

    # Thu Oct 11 16:41:06 2007 skipping SLA graphs - only value graphs plotted

    # S-30 Thu Oct 11 16:41:06 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:64

    # S-30 Thu Oct 11 16:41:09 2007 1192035878WINDOWSHEALTH - Starting check - 6

    # S-30 Thu Oct 11 16:41:13 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:7

    # S-30 Thu Oct 11 16:41:16 2007 1192035878WINDOWSHEALTH - Starting check - 6

    # S-30 Thu Oct 11 16:41:20 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|2|9|C.40|D.60|E.54 - t:14

    # S-30 Thu Oct 11 16:41:23 2007 1192035878WINDOWSHEALTH - Starting check - 6

    # S-30 Thu Oct 11 16:41:26 2007 1192035878WINDOWSHEALTH - s:DOWN - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:20

    # S-30 Thu Oct 11 16:42:26 2007 1192035878WINDOWSHEALTH - Starting check - 6

    # S-30 Thu Oct 11 16:42:30 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:84

    # S-30 Thu Oct 11 16:42:33 2007 1192035878WINDOWSHEALTH - Starting check - 6

    # S-30 Thu Oct 11 16:42:37 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:7

    # S-30 Thu Oct 11 16:42:40 2007 1192035878WINDOWSHEALTH - Starting check - 6

    # S-30 Thu Oct 11 16:42:45 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|2|9|C.40|D.60|E.54 - t:15

    # S-30 Thu Oct 11 16:42:48 2007 1192035878WINDOWSHEALTH - Starting check - 6

    # S-30 Thu Oct 11 16:42:52 2007 1192035878WINDOWSHEALTH - s:DOWN - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:22

    # S-30 Thu Oct 11 16:43:52 2007 1192035878WINDOWSHEALTH - Starting check - 6

    # S-30 Thu Oct 11 16:43:56 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|2|9|C.40|D.60|E.54 - t:64

    # S-30 Thu Oct 11 16:43:59 2007 1192035878WINDOWSHEALTH - Starting check - 6

    # S-30 Thu Oct 11 16:44:03 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:7

    # S-30 Thu Oct 11 16:44:06 2007 1192035878WINDOWSHEALTH - Starting check - 6

    # S-30 Thu Oct 11 16:44:10 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:1|0|9|C.40|D.60|E.54 - t:14

    # S-30 Thu Oct 11 16:44:13 2007 1192035878WINDOWSHEALTH - Starting check - 6

    # S-30 Thu Oct 11 16:44:16 2007 1192035878WINDOWSHEALTH - s:DOWN - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:20

    # S-30 Thu Oct 11 16:45:16 2007 1192035878WINDOWSHEALTH - Starting check - 6

    # S-30 Thu Oct 11 16:45:20 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 6 (returned value) less than 10 - v:0|4|6|C.40|D.60|E.54 - t:84

    # S-30 Thu Oct 11 16:45:23 2007 1192035878WINDOWSHEALTH - Starting check - 6

    # S-30 Thu Oct 11 16:45:27 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 7 (returned value) less than 10 - v:0|0|7|C.40|D.60|E.54 - t:7

    # S-30 Thu Oct 11 16:45:30 2007 1192035878WINDOWSHEALTH - Starting check - 6
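For anyone wanting to check the spacing of the confirmed DOWN results in a log like this one, here is a rough Python sketch. The regular expression and the field names (`ts`, `rule`, `status`, `error`, `values`, `t`) are my own guesses based on the lines pasted above, not a documented log format, and the meaning of the `v:` and `t:` fields is not assumed.

```python
import re
from datetime import datetime

# Pattern inferred from the pasted lines; the field names are assumptions.
LINE_RE = re.compile(
    r"^#\s+S-\d+\s+(?P<ts>\w{3} \w{3}\s+\d+ \d\d:\d\d:\d\d \d{4})\s+"
    r"(?P<rule>\d+)WINDOWSHEALTH - s:(?P<status>\S+) - e:(?P<error>.*?)"
    r" - v:(?P<values>\S+) - t:(?P<t>\d+)\s*$"
)


def parse_line(line):
    """Return a dict for a status line, or None for any other line."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    # Normalise whitespace, then parse e.g. "Thu Oct 11 16:34:24 2007".
    rec["ts"] = datetime.strptime(" ".join(rec["ts"].split()), "%a %b %d %H:%M:%S %Y")
    return rec


def confirmed_down_gaps(lines):
    """Seconds between consecutive confirmed DOWN results (s:DOWN, not s:DOWN?)."""
    times = [r["ts"] for r in map(parse_line, lines) if r and r["status"] == "DOWN"]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


SAMPLE = [
    "# S-30 Thu Oct 11 16:34:21 2007 1192035878WINDOWSHEALTH - Starting check - 6",
    "# S-30 Thu Oct 11 16:34:24 2007 1192035878WINDOWSHEALTH - s:DOWN - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:20",
    "# S-30 Thu Oct 11 16:35:28 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:84",
    "# S-30 Thu Oct 11 16:35:48 2007 1192035878WINDOWSHEALTH - s:DOWN - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:20",
]

if __name__ == "__main__":
    print(confirmed_down_gaps(SAMPLE))
```

The sample lines are copied from the log above; lines that do not match the pattern (such as "Starting check") are simply skipped.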
  • Administrator
    There are no errors.



    DOWN was reported when memory was 6 or 9, which is less than your threshold value of 10.



    If you have the STARTER edition, then checks are done one after the other and not simultaneously.



    When the interval for DOWN is 60 seconds, it will wait at least a minute before performing the check again. If the check is not OK, it enters a retry mode until the retries are completed. Depending on how long the retries take, the total may be more than a minute. The interval is the minimum time between the last check of one cycle and the first check of the next.



    Based on the debug log all is normal and fine.
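The timing described above can be sketched with a little arithmetic. This is only an illustration: the function and parameter names are hypothetical, and the retry count (3), check duration (about 4 seconds) and pause between retries (about 3 seconds) are read off the timestamps in the posted log, not taken from any documented setting.

```python
def down_cycle_seconds(down_interval, retries, check_duration, retry_pause):
    """Seconds from the start of one first check to the start of the next.

    Illustrative model only: one first check, then `retries` retries,
    each preceded by a pause, then the configured DOWN interval.
    """
    first_check = check_duration
    retry_phase = retries * (retry_pause + check_duration)
    return first_check + retry_phase + down_interval


if __name__ == "__main__":
    # Values estimated from the log timestamps (assumptions, not settings).
    print(down_cycle_seconds(down_interval=60, retries=3, check_duration=4, retry_pause=3))
```

With these estimates the model gives 85 seconds per cycle, close to the roughly 84 seconds between first checks in the log (16:34:00 to 16:35:24), which is why a 60-second DOWN interval does not produce a confirmed DOWN every minute.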
  • Sorry, it seems that it was my fault.

    I have now set "Alert when:" in the "General alert options" to "All DOWN and on each status change", and it sends out alerts every 1 to 4 minutes.



    Thanks very much for your effort.
This discussion has been closed.