WINDOWSHEALTH rule alert not correct
Hi,
The alert for free memory in the WINDOWSHEALTH rule does not work correctly.
The memory value is always below the DOWN threshold, and the rule frequency is set to 60 when DOWN. But the last alert mail was sent about 5 hours ago.
Yesterday it sent out recovery messages for the memory, although the memory value was still below the DOWN threshold.
This discussion has been closed.
Comments
Without knowing the error message, the following could be reasons:
- the server was unreachable
- a mail server delay caused the alert to be sent out late
The rule activity for status changes can be viewed in the rule log history.
Run in debug mode as described in the knowledge base and simulate the error. You will then be able to track down the behavior in detail.
I have this issue with two WINDOWSHEALTH rules and in both cases for the memory check.
The last entry in the rule log history is from today at 2:53 am, when it changed from DOWN to OK. But the memory value is still below the DOWN value.
As information I did not acknowledge the last DOWN state from 2:52 am and
I will run the monitoring in debug mode today and send you the log file.
MEMORY Error level: 4 (returned value) less than 10
But the debug log is too big to paste here. Could I send it to you by mail?
Rule name
"DEBACKUP WINDOWS HEALTH (11918618541731)"
Last Error
MEMORY Warning level: 25 (returned value) less than 30
In the debug data, isolate the part related to the WINDOWSHEALTH rule that shows where it exceeds the data in the log file, and post it.
# S-12 Tue Oct 9 09:31:50 2007 1190909412WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 6 (returned value) less than 10 -
# S-12 Tue Oct 9 09:31:53 2007 1190909412WINDOWSHEALTH - Starting check - 514
# S-12 Tue Oct 9 09:31:56 2007 1190909412WINDOWSHEALTH - s:OK - e:MEMORY Error level: 6 (returned value) less than 10 - v:0|0|6|C.40|D.60|E.54 - t:6
# S-12 Tue Oct 9 09:46:56 2007 1190909412WINDOWSHEALTH - Starting check - 515
# S-12 Tue Oct 9 09:47:00 2007 1190909412WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 6 (returned value) less than 10 - v:0|0|6|C.40|D.60|E.54 - t:910
I set the limit for a DOWN message to 300 ms (ping). Now I got a DOWN message with the following description:
PING Error level: 163 (returned value) greater than 10
I checked the settings in the servershealth rule again. The value for the DOWN level is still 300 ms. I can't find any setting of 10 ms.
Thank you for the post. Our engineers are looking at it.
Update 4:11pm CET: we found the issue and are working on a fix right now.
http://files.serverscheck.net/fixes/monitoring_thread2.zip
I assume you upgraded to 7.6.4
Run in debug mode again and send output.
Last known error is always shown.
The rule that I recreated (CCmarketss07 WinHEALTH (1192035878)) sends alerts; the other one, which I did not recreate (DEBACKUP WINDOWS HEALTH (11918618541731)), does not.
For rule 1192035878 I have set "Interval when status is down" to 60, but it did not send an alert every minute.
# S-30 Thu Oct 11 16:34:00 2007 1192035878WINDOWSHEALTH - Starting check - 6
# S-30 Thu Oct 11 16:34:04 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|4|9|C.40|D.60|E.54 - t:1192113244
# S-30 Thu Oct 11 16:34:07 2007 1192035878WINDOWSHEALTH - Starting check - 6
# S-30 Thu Oct 11 16:34:11 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:7
# S-30 Thu Oct 11 16:34:14 2007 1192035878WINDOWSHEALTH - Starting check - 6
# S-30 Thu Oct 11 16:34:18 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|2|9|C.40|D.60|E.54 - t:14
# S-30 Thu Oct 11 16:34:21 2007 1192035878WINDOWSHEALTH - Starting check - 6
# S-30 Thu Oct 11 16:34:24 2007 1192035878WINDOWSHEALTH - s:DOWN - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:20
# S-30 Thu Oct 11 16:35:24 2007 1192035878WINDOWSHEALTH - Starting check - 6
# S-30 Thu Oct 11 16:35:28 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:84
# S-30 Thu Oct 11 16:35:31 2007 1192035878WINDOWSHEALTH - Starting check - 6
# S-30 Thu Oct 11 16:35:35 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|2|9|C.40|D.60|E.54 - t:7
# S-30 Thu Oct 11 16:35:38 2007 1192035878WINDOWSHEALTH - Starting check - 6
# S-30 Thu Oct 11 16:35:42 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|2|9|C.40|D.60|E.54 - t:14
# S-30 Thu Oct 11 16:35:45 2007 1192035878WINDOWSHEALTH - Starting check - 6
# S-30 Thu Oct 11 16:35:48 2007 1192035878WINDOWSHEALTH - s:DOWN - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:20
# S-30 Thu Oct 11 16:36:48 2007 1192035878WINDOWSHEALTH - Starting check - 6
# S-30 Thu Oct 11 16:36:52 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:84
# S-30 Thu Oct 11 16:36:55 2007 1192035878WINDOWSHEALTH - Starting check - 6
# S-30 Thu Oct 11 16:36:59 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|2|9|C.40|D.60|E.54 - t:7
# S-30 Thu Oct 11 16:37:02 2007 1192035878WINDOWSHEALTH - Starting check - 6
# S-30 Thu Oct 11 16:37:06 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:14
# S-30 Thu Oct 11 16:37:09 2007 1192035878WINDOWSHEALTH - Starting check - 6
# S-30 Thu Oct 11 16:37:12 2007 1192035878WINDOWSHEALTH - s:DOWN - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:20
# S-30 Thu Oct 11 16:38:12 2007 1192035878WINDOWSHEALTH - Starting check - 6
# S-30 Thu Oct 11 16:38:16 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:84
# S-30 Thu Oct 11 16:38:19 2007 1192035878WINDOWSHEALTH - Starting check - 6
# S-30 Thu Oct 11 16:38:23 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|2|9|C.40|D.60|E.54 - t:7
# S-30 Thu Oct 11 16:38:26 2007 1192035878WINDOWSHEALTH - Starting check - 6
# S-30 Thu Oct 11 16:38:31 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|31|9|C.40|D.60|E.54 - t:15
# S-30 Thu Oct 11 16:38:34 2007 1192035878WINDOWSHEALTH - Starting check - 6
# S-30 Thu Oct 11 16:38:37 2007 1192035878WINDOWSHEALTH - s:DOWN - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:21
# S-30 Thu Oct 11 16:39:37 2007 1192035878WINDOWSHEALTH - Starting check - 6
# S-30 Thu Oct 11 16:39:41 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:64
# S-30 Thu Oct 11 16:39:44 2007 1192035878WINDOWSHEALTH - Starting check - 6
# S-30 Thu Oct 11 16:39:48 2007 1192035878WINDOWSHEALTH # M Thu Oct 11 16:39:55 2007 import rules
# S-30 Thu Oct 11 16:39:51 2007 1192035878WINDOWSHEALTH - Starting check - 6
# S-30 Thu Oct 11 16:39:55 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:14
# S-30 Thu Oct 11 16:39:58 2007 1192035878WINDOWSHEALTH - Starting check - 6
# S-30 Thu Oct 11 16:40:02 2007 1192035878WINDOWSHEALTH - s:DOWN - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|5|9|C.40|D.60|E.54 - t:21
# S-30 Thu Oct 11 16:41:02 2007 1192035878WINDOWSHEALTH - Starting check - 6
# Thu Oct 11 16:41:06 2007 skipping Availability graphs - only value graphs plotted
# Thu Oct 11 16:41:06 2007 skipping SLA graphs - only value graphs plotted
# S-30 Thu Oct 11 16:41:06 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:64
# S-30 Thu Oct 11 16:41:09 2007 1192035878WINDOWSHEALTH - Starting check - 6
# S-30 Thu Oct 11 16:41:13 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:7
# S-30 Thu Oct 11 16:41:16 2007 1192035878WINDOWSHEALTH - Starting check - 6
# S-30 Thu Oct 11 16:41:20 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|2|9|C.40|D.60|E.54 - t:14
# S-30 Thu Oct 11 16:41:23 2007 1192035878WINDOWSHEALTH - Starting check - 6
# S-30 Thu Oct 11 16:41:26 2007 1192035878WINDOWSHEALTH - s:DOWN - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:20
# S-30 Thu Oct 11 16:42:26 2007 1192035878WINDOWSHEALTH - Starting check - 6
# S-30 Thu Oct 11 16:42:30 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:84
# S-30 Thu Oct 11 16:42:33 2007 1192035878WINDOWSHEALTH - Starting check - 6
# S-30 Thu Oct 11 16:42:37 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:7
# S-30 Thu Oct 11 16:42:40 2007 1192035878WINDOWSHEALTH - Starting check - 6
# S-30 Thu Oct 11 16:42:45 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|2|9|C.40|D.60|E.54 - t:15
# S-30 Thu Oct 11 16:42:48 2007 1192035878WINDOWSHEALTH - Starting check - 6
# S-30 Thu Oct 11 16:42:52 2007 1192035878WINDOWSHEALTH - s:DOWN - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:22
# S-30 Thu Oct 11 16:43:52 2007 1192035878WINDOWSHEALTH - Starting check - 6
# S-30 Thu Oct 11 16:43:56 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|2|9|C.40|D.60|E.54 - t:64
# S-30 Thu Oct 11 16:43:59 2007 1192035878WINDOWSHEALTH - Starting check - 6
# S-30 Thu Oct 11 16:44:03 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:7
# S-30 Thu Oct 11 16:44:06 2007 1192035878WINDOWSHEALTH - Starting check - 6
# S-30 Thu Oct 11 16:44:10 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:1|0|9|C.40|D.60|E.54 - t:14
# S-30 Thu Oct 11 16:44:13 2007 1192035878WINDOWSHEALTH - Starting check - 6
# S-30 Thu Oct 11 16:44:16 2007 1192035878WINDOWSHEALTH - s:DOWN - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:20
# S-30 Thu Oct 11 16:45:16 2007 1192035878WINDOWSHEALTH - Starting check - 6
# S-30 Thu Oct 11 16:45:20 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 6 (returned value) less than 10 - v:0|4|6|C.40|D.60|E.54 - t:84
# S-30 Thu Oct 11 16:45:23 2007 1192035878WINDOWSHEALTH - Starting check - 6
# S-30 Thu Oct 11 16:45:27 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 7 (returned value) less than 10 - v:0|0|7|C.40|D.60|E.54 - t:7
# S-30 Thu Oct 11 16:45:30 2007 1192035878WINDOWSHEALTH - Starting check - 6
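To check how often the rule actually reaches a confirmed DOWN state, the debug lines above can be measured with a small script. This is only a sketch: the regular expression is inferred from the excerpt posted here and is not an official log format, it assumes that "s:DOWN?" is an unconfirmed state and "s:DOWN" a confirmed one, and the names parse_line and confirmed_down_gaps are made up for illustration. Month and weekday names are parsed with the default (English) locale.

```python
import re
from datetime import datetime

# Pattern inferred from the debug excerpt in this thread (an assumption,
# not a documented format). Lines such as "Starting check" do not match.
LINE_RE = re.compile(
    r"^# S-\d+ (?P<ts>\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4}) "
    r"(?P<rule>\d+)WINDOWSHEALTH - s:(?P<state>[A-Z]+\??) - "
    r"e:MEMORY Error level: (?P<value>\d+) \(returned value\) "
    r"less than (?P<limit>\d+)"
)


def parse_line(line):
    """Return a dict for a MEMORY result line, or None for any other line."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "time": datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y"),
        "rule": m.group("rule"),
        "state": m.group("state"),
        "value": int(m.group("value")),
        "limit": int(m.group("limit")),
    }


def confirmed_down_gaps(lines):
    """Seconds between consecutive lines whose state is exactly DOWN."""
    parsed = (parse_line(line) for line in lines)
    downs = [p["time"] for p in parsed if p and p["state"] == "DOWN"]
    return [int((b - a).total_seconds()) for a, b in zip(downs, downs[1:])]


# A few lines copied from the log above.
sample = [
    "# S-30 Thu Oct 11 16:34:21 2007 1192035878WINDOWSHEALTH - Starting check - 6",
    "# S-30 Thu Oct 11 16:34:24 2007 1192035878WINDOWSHEALTH - s:DOWN - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:20",
    "# S-30 Thu Oct 11 16:35:28 2007 1192035878WINDOWSHEALTH - s:DOWN? - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:84",
    "# S-30 Thu Oct 11 16:35:48 2007 1192035878WINDOWSHEALTH - s:DOWN - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:20",
    "# S-30 Thu Oct 11 16:37:12 2007 1192035878WINDOWSHEALTH - s:DOWN - e:MEMORY Error level: 9 (returned value) less than 10 - v:0|0|9|C.40|D.60|E.54 - t:20",
]

print(confirmed_down_gaps(sample))  # [84, 84]
```

On these sample lines the confirmed DOWN states are 84 seconds apart, which is longer than the configured 60 seconds.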
DOWN was when memory was 6 or 9, which is less than your threshold value of 10.
If you have the STARTER edition, then checks are done one after the other and not simultaneously.
When interval for DOWN is 60 secs then it will wait at least a minute before performing the check again. If the check is not OK then it enters into a retry mode until the retries are completed. Depending on how long the retries take, it may be more than a minute. The interval is the minimum time between the last check and the first next check.
Based on the debug log all is normal and fine.
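The interval explanation above can be put into numbers with a rough model. This is a sketch under assumptions, not the product's actual scheduler: the function name cycle_seconds is made up, and the check durations (4, 4, 4 and 3 seconds) and the 3 second pause between retries are simply read off the timestamps in the debug log posted earlier.

```python
def cycle_seconds(down_interval, check_durations, retry_pause):
    """Seconds from one confirmed DOWN to the next in a simplified model.

    down_interval   -- configured "Interval when status is down"
    check_durations -- duration of the first check and of each retry
    retry_pause     -- pause between two consecutive attempts
    """
    pauses = retry_pause * (len(check_durations) - 1)
    return down_interval + sum(check_durations) + pauses


# Values read off the log above (assumed to be typical, not guaranteed).
cycle = cycle_seconds(60, [4, 4, 4, 3], 3)
print(cycle)           # 84
print(3600 // cycle)   # 42 confirmed DOWN states per hour at most
```

With these numbers a rule set to 60 seconds confirms DOWN roughly every 84 seconds, which fits the statement that the interval is a minimum and the retries come on top of it.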
Now I have set "Alert when:" in the "General alert options" to "All DOWN and on each status change", and it sends out alerts every 1 to 4 minutes.
Thanks very much for your effort