Runaway Job
I'm having problems with the EventLog check. I've set it up to look for a specific string in the Windows EventLog and have activated it in the ServersCheck web interface. However, the check never seems to run. I've put ServersCheck into debug mode and I'm only running the EventLog check. It appears that monitoring_rule.exe continuously dies and restarts. Also, my EventLog check is labeled as a runaway job. Here are a few lines of the debug output:
# M Mon Dec 5 10:16:00 2005 Monitoring_manager instances allowed: 1
# M Mon Dec 5 10:16:00 2005 Monitoring_manager instances counted: 1
#
# M Mon Dec 5 10:16:00 2005 Loading language file: EN.lang
# M Mon Dec 5 10:16:05 2005 ServersCheck Monitoring Manager
# M Mon Dec 5 10:16:05 2005 ENTERPRISE version 5.11.4
# M Mon Dec 5 10:16:05 2005 Started OK
# 1 Mon Dec 5 10:16:06 2005 Starting Monitoring Rule Thread 1
# 1 Mon Dec 5 10:16:06 2005 ServersCheck Monitoring Component
# 1 Mon Dec 5 10:16:06 2005 ENTERPRISE version 5.12.0
# 1 Mon Dec 5 10:16:06 2005 monitoring_rule instances allowed: 2
# 2 Mon Dec 5 10:16:06 2005 Starting Monitoring Rule Thread 2
# 2 Mon Dec 5 10:16:06 2005 ServersCheck Monitoring Component
# 2 Mon Dec 5 10:16:06 2005 ENTERPRISE version 5.12.0
# 2 Mon Dec 5 10:16:06 2005 monitoring_rule instances allowed: 2
# 1 Mon Dec 5 10:16:06 2005 monitoring_rule instances counted: 2
# 2 Mon Dec 5 10:16:06 2005 monitoring_rule instances counted: 2
# M Mon Dec 5 10:16:22 2005 Zanni_App_EventlogEVENTLOG job queued
# 2 Mon Dec 5 10:16:22 2005 Skipping D:Program FilesServersCheck_MonitoringjobsAZanni_App_EventlogEVENTLOG.0.31658935546875
# 1 Mon Dec 5 10:16:22 2005 Zanni_App_EventlogEVENTLOG - Starting check
# M Mon Dec 5 10:16:25 2005 Runaway job: AZanni_App_EventlogEVENTLOG.0.5950927734375
# M Mon Dec 5 10:16:59 2005 Cleaning job: AZanni_App_EventlogEVENTLOG.0.5950927734375
# M Mon Dec 5 10:17:01 2005 Zanni_App_EventlogEVENTLOG job queued
# 2 Mon Dec 5 10:17:02 2005 Zanni_App_EventlogEVENTLOG - Starting check
# M Mon Dec 5 10:17:04 2005 Runaway job: AZanni_App_EventlogEVENTLOG.0.17095947265625
# M Mon Dec 5 10:17:37 2005 Cleaning job: AZanni_App_EventlogEVENTLOG.0.17095947265625
# M Mon Dec 5 10:17:39 2005 Zanni_App_EventlogEVENTLOG job queued
# M Mon Dec 5 10:17:55 2005 Monitoring Rule Process Watcher: 0 found
# M Mon Dec 5 10:17:55 2005 Monitoring_rule.exe seems to have died; it will now be restarted
# R2 Mon Dec 5 10:17:55 2005 Starting Monitoring Rule Thread R2
# R2 Mon Dec 5 10:17:56 2005 ServersCheck Monitoring Component
# R2 Mon Dec 5 10:17:56 2005 ENTERPRISE version 5.12.0
# R2 Mon Dec 5 10:17:56 2005 monitoring_rule instances allowed: 2
# R2 Mon Dec 5 10:17:56 2005 monitoring_rule instances counted: 1
# R2 Mon Dec 5 10:17:56 2005 keyfile1 AZanni_App_EventlogEVENTLOG.0.233154296875 Zanni_App_EventlogEVENTLOG 1
# R2 Mon Dec 5 10:17:56 2005 Zanni_App_EventlogEVENTLOG - Starting check
# R3 Mon Dec 5 10:18:01 2005 Starting Monitoring Rule Thread R3
# R3 Mon Dec 5 10:18:01 2005 ServersCheck Monitoring Component
# R3 Mon Dec 5 10:18:01 2005 ENTERPRISE version 5.12.0
# R3 Mon Dec 5 10:18:01 2005 monitoring_rule instances allowed: 2
# R3 Mon Dec 5 10:18:01 2005 monitoring_rule instances counted: 1
# M Mon Dec 5 10:18:02 2005 Runaway job: AZanni_App_EventlogEVENTLOG.0.233154296875
# M Mon Dec 5 10:18:35 2005 Cleaning job: AZanni_App_EventlogEVENTLOG.0.233154296875
# M Mon Dec 5 10:18:37 2005 Zanni_App_EventlogEVENTLOG job queued
# R3 Mon Dec 5 10:18:38 2005 Zanni_App_EventlogEVENTLOG - Starting check
# M Mon Dec 5 10:18:41 2005 Runaway job: AZanni_App_EventlogEVENTLOG.0.365753173828125
# M Mon Dec 5 10:19:14 2005 Cleaning job: AZanni_App_EventlogEVENTLOG.0.365753173828125
# M Mon Dec 5 10:19:16 2005 Zanni_App_EventlogEVENTLOG job queued
# M Mon Dec 5 10:19:34 2005 Monitoring Rule Process Watcher: 0 found
# M Mon Dec 5 10:19:34 2005 Monitoring_rule.exe seems to have died; it will now be restarted
....
....
....
Any ideas on what might be causing this?
Thanks
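One way to put numbers on the symptom is to measure how long after each "Starting check" line the manager reports a "Runaway job". The following is only a rough sketch: it assumes the debug output keeps the exact line format quoted above, and the helper names (parse, runaway_delays) are made up for illustration and are not part of ServersCheck.

```python
import re
from datetime import datetime

# Matches lines such as:
# "# M Mon Dec 5 10:16:25 2005 Runaway job: ..."
LINE = re.compile(r"^# (\S+) (\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4}) (.*)$")

# Four lines copied from the debug output above.
SAMPLE = [
    "# 1 Mon Dec 5 10:16:22 2005 Zanni_App_EventlogEVENTLOG - Starting check",
    "# M Mon Dec 5 10:16:25 2005 Runaway job: AZanni_App_EventlogEVENTLOG.0.5950927734375",
    "# 2 Mon Dec 5 10:17:02 2005 Zanni_App_EventlogEVENTLOG - Starting check",
    "# M Mon Dec 5 10:17:04 2005 Runaway job: AZanni_App_EventlogEVENTLOG.0.17095947265625",
]


def parse(line):
    """Split a debug line into (source, timestamp, message), or None if it does not match."""
    m = LINE.match(line.strip())
    if not m:
        return None
    source, stamp, message = m.groups()
    return source, datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y"), message


def runaway_delays(lines):
    """Seconds between each 'Starting check' line and the next 'Runaway job' line."""
    started = None
    delays = []
    for line in lines:
        parsed = parse(line)
        if parsed is None:
            continue
        _, when, message = parsed
        if message.endswith("- Starting check"):
            started = when
        elif message.startswith("Runaway job:") and started is not None:
            delays.append((when - started).total_seconds())
            started = None
    return delays


if __name__ == "__main__":
    print(runaway_delays(SAMPLE))
```

To run it against the full output, pass the lines of the saved debug file to runaway_delays instead of SAMPLE.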
Comments
In the current version, the timeout cannot be set.
The component in question can be downloaded here:
http://www.serverscheck.com/files/monitoring_manager.zip
Let us know if that solves the issue
# M Mon Dec 5 13:49:08 2005 Monitoring_rule.exe seems to have died; it will now be restarted
# R17 Mon Dec 5 13:49:09 2005 Starting Monitoring Rule Thread R17
# R17 Mon Dec 5 13:49:09 2005 ServersCheck Monitoring Component
# R17 Mon Dec 5 13:49:09 2005 ENTERPRISE version 5.12.0
# R17 Mon Dec 5 13:49:09 2005 monitoring_rule instances allowed: 2
# R17 Mon Dec 5 13:49:09 2005 monitoring_rule instances counted: 1
# R18 Mon Dec 5 13:49:14 2005 Starting Monitoring Rule Thread R18
# R18 Mon Dec 5 13:49:14 2005 ServersCheck Monitoring Component
# R18 Mon Dec 5 13:49:14 2005 ENTERPRISE version 5.12.0
# R18 Mon Dec 5 13:49:14 2005 monitoring_rule instances allowed: 2
# R18 Mon Dec 5 13:49:14 2005 monitoring_rule instances counted: 2
# R19 Mon Dec 5 13:49:19 2005 Starting Monitoring Rule Thread R19
# R19 Mon Dec 5 13:49:19 2005 ServersCheck Monitoring Component
# R19 Mon Dec 5 13:49:19 2005 ENTERPRISE version 5.12.0
# R19 Mon Dec 5 13:49:19 2005 monitoring_rule instances allowed: 2
# R19 Mon Dec 5 13:49:19 2005 monitoring_rule instances counted: 3
# M Mon Dec 5 13:49:50 2005 Cleaning job: AZanni_App_EventlogEVENTLOG.0.928741455078125
# M Mon Dec 5 13:49:52 2005 Zanni_App_EventlogEVENTLOG job queued
# R17 Mon Dec 5 13:49:52 2005 Zanni_App_EventlogEVENTLOG - Starting check
# M Mon Dec 5 13:49:55 2005 Runaway job: AZanni_App_EventlogEVENTLOG.0.597259521484375
# M Mon Dec 5 13:50:54 2005 Monitoring Rule Process Watcher: 1 found
# M Mon Dec 5 13:50:54 2005 Monitoring_rule.exe seems to have died; it will now be restarted
Also, I can see from the security event logs on the target system (the one whose EventLogs I'm trying to check) that the domain user is successfully logging on and logging off the machine during the checks. Usually the id is logged in for only a second. This occurs about once a minute.
The ~ServersCheckchecklogsZanni_app_eventlog.log contains the following entries which seem to indicate it is actually getting a "DOWN" status, which is what I'm expecting. (The check is looking for the Symantec startup message in the application event log.)
Mon Dec 5 13:48:48 2005 DOWN - Information event on zanni at 12/5/2005 12:15:39 PM (GMT-5) - Source: "Symantec AntiVirus". Event: Symantec AntiVirus services startup was successful.
Mon Dec 5 13:49:53 2005 DOWN - Information event on zanni at 12/5/2005 12:15:39 PM (GMT-5) - Source: "Symantec AntiVirus". Event: Symantec AntiVirus services startup was successful.
Mon Dec 5 13:51:07 2005 DOWN - Information event on zanni at 12/5/2005 12:15:39 PM (GMT-5) - Source: "Symantec AntiVirus". Event: Symantec AntiVirus services startup was successful.
Mon Dec 5 13:52:14 2005 DOWN - Information event on zanni at 12/5/2005 12:15:39 PM (GMT-5) - Source: "Symantec AntiVirus". Event: Symantec AntiVirus services startup was successful.
One last thing: the "All Rules View" of ServersCheck shows the status as "DOWN?", while the "Group View" shows it as "DOWN".
D:\Program Files\ServersCheck_Monitoring>monitoring_manager > debug2.txt
# Error: More monitoring_rule instances running then licensed for. Killing current.
Socket could not be created : Unknown error
Socket could not be created : Unknown error
I'm not receiving any messages from the s-alerts.exe command window.
ServersCheck fails because of a Winsock issue, meaning that it cannot create a socket for communication.
What is your OS and Service Pack?
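To check independently whether the machine can create sockets at all, a small test like the one below may help. It is a minimal sketch, assuming Python is available on the monitoring server; it is unrelated to ServersCheck's own code, and the function name can_create_socket is purely illustrative.

```python
import socket


def can_create_socket(family=socket.AF_INET, kind=socket.SOCK_STREAM):
    """Return (True, None) if the OS allows creating a socket, else (False, error text)."""
    try:
        s = socket.socket(family, kind)
    except OSError as exc:
        # On Windows this is where a broken Winsock stack would surface.
        return False, str(exc)
    s.close()
    return True, None


if __name__ == "__main__":
    for label, kind in (("TCP", socket.SOCK_STREAM), ("UDP", socket.SOCK_DGRAM)):
        ok, err = can_create_socket(kind=kind)
        print(f"{label} socket: {'OK' if ok else 'FAILED: ' + err}")
```

If this also fails, the problem probably lies with the Winsock configuration on the machine rather than with the monitoring software.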
It's attempting to read the event logs of a Windows XP Pro SP2 machine.
I'm not sure if this helps, but in an attempt to figure out where things are going wrong, I created an additional rule to check Ping status. So I had just the eventlog and ping rule running. Both log files (in the checklogs directory) for the eventlog and ping are updated continuously with the correct information, but the status is not reflected on the ServersCheck page.
"Socket could not be created : Unknown error"
Please don't use the PING check at this stage. Run only the EventLog check with the new build of the monitoring manager. Let it run for 5 minutes in debug mode and send me the output again.
I will forward it to the development team