Monitoring Checks - Stopped
Hi,
I'm running Serverscheck 5.12.0 which over the past few months was running fine...
However yesterday when I came into the office, all the checks had stopped over the weekend.
I rebooted the server and it appeared to return to its working state.
Overnight, however, we did have a real server failure, but we were not notified because (like before) Serverscheck had just stopped running the checks!
I've checked it today (and haven't rebooted it, in case you need some logs or need me to verify anything) and found the following:
1. The time stamp on the main rule page shows the current time.
2. If I choose a manual 'Test Settings' on each rule, it DOES reply with the correct check details.
The checks that are down do reply with a status of down, but only on the manual check.
3. The CPU and Pagefile etc on the monitoring server are running at normal thresholds, and there are no excessive processes.
4. The Serverscheck Services are both started.
5. The Monitoring Manager is on the desktop and showing the current time and 'Process watching 6 rules'
6. It's not an IE caching issue. Caching is disabled, and even from the server console the rules and alerts are not working.
7. There are no Event Log entries reporting any Serverscheck problems.
(There are some relating to DCOM connections, but I'm sure they are unrelated.)
This has happened several times in the past.
Thanks
This discussion has been closed.
Comments
As with any Windows-based computer, it is recommended to reboot the machine periodically. The steps below schedule an automatic weekly reboot:
* Start the Task Scheduler. Under Windows 2000/XP this is located in Start Menu > Programs > Accessories > System Tools.
* Check that the Task Scheduler is running. You can do this by opening the Advanced menu and seeing whether it lists 'Stop Using Task Scheduler'. If it does, the Task Scheduler is running. If it says 'Start Using Task Scheduler', the Task Scheduler is not running; click on that option to start it.
* Now create a new task by selecting File > New > Scheduled Task. Rename the new task to something like Reboot, then double-click on it.
* First let's set the time. Click on the Schedule tab and select Weekly for 'Schedule Task'. Now select the day and time, normally a weekend day (say Sunday) and a time at night (say 1am) to ensure no one is using the system at the time.
* Now to set what is to run. Click back on the tab Task. Click set password and enter the password you use to log-on to Windows. This will ensure the reboot happens even if you are not logged on at the time.
* In the field 'Run' you now enter:
For Windows 2000/XP/2003 Server - SHUTDOWN.EXE -r -f -t 01
The shutdown utility can be downloaded from the following URL:
http://www.serverscheck.net/files/shutdown.zip
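For anyone who prefers to dry-run the command before putting it in the 'Run' field, here is a minimal sketch in Python. It assumes the shutdown utility is on the PATH and uses only the switches quoted above (-r restart, -f force applications to close, -t delay in seconds). The function names are made up for illustration and are not part of ServersCheck.

```python
import subprocess


def build_reboot_command(delay_seconds=1, force=True):
    """Return the argument list equivalent to: SHUTDOWN.EXE -r -f -t 01."""
    if delay_seconds < 0:
        raise ValueError("delay_seconds must not be negative")
    args = ["shutdown", "-r"]          # -r: restart after shutting down
    if force:
        args.append("-f")              # -f: force running applications to close
    args += ["-t", "%02d" % delay_seconds]  # -t: delay before the restart
    return args


def reboot(dry_run=True):
    """Print the command by default; only restart when dry_run is False."""
    cmd = build_reboot_command()
    if dry_run:
        print(" ".join(cmd))
        return cmd
    subprocess.check_call(cmd)
    return cmd


if __name__ == "__main__":
    reboot(dry_run=True)
```

Keeping dry_run as the default means the script can be tested on the monitoring server without actually restarting it.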
A new service is being developed that will watch the refresh status of checks and if needed restart the service to overcome Windows related issues.
Thanks for the reply - however planning a scheduled reboot of our monitoring server doesn't exactly fill me with confidence.
This server (and our other servers) should be able to run for weeks and months without a reboot. If this software requires regular reboots, then surely there is an issue with the software or the processes it's running that needs to be resolved.
Currently the monitoring screen is showing incorrect checks but no notification has indicated to me that the actual monitoring server has a problem.
DCOM is a Windows component that ServersCheck uses for Windows based checks (transport layer for WMI).
The reboot option is a tip only.
We have an optional fail-over module which monitors a primary installation; if that fails, the backup module takes over.
As you know within the software there are already quite a few watchers to detect potential issues and to correct them in order to have the software continue normally.
The issue you described cannot be tackled inside the service. Therefore the free add-on "ServersCheck Monitoring Watcher" is going to be released. This service will watch the monitoring service and the configuration service. If the built-in rules fail, the affected service will be restarted automatically.
In terms of release schedule: this will be part of the 6.0.3 release and is planned for end of next week.
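Until that add-on is available, the same idea can be approximated with a small external script. The sketch below is not the ServersCheck Monitoring Watcher, only an illustration of the approach. It assumes you have some way to obtain the time of the last completed check (not shown here), and the service name is a placeholder that must be replaced with the real one. Only the standard Windows commands `net stop` and `net start` are used.

```python
import subprocess
from datetime import datetime, timedelta

# Placeholder: replace with the actual name of the monitoring service.
SERVICE_NAME = "ExampleMonitoringService"


def checks_are_stale(last_check_time, now, max_age=timedelta(minutes=15)):
    """True when no check has completed within max_age."""
    return now - last_check_time > max_age


def restart_commands(service_name):
    """Commands that stop and then start a Windows service."""
    return [["net", "stop", service_name], ["net", "start", service_name]]


def watch_once(last_check_time, now=None, run=subprocess.call):
    """Restart the service if the checks look stale; report whether we did."""
    now = now or datetime.now()
    if not checks_are_stale(last_check_time, now):
        return False
    for cmd in restart_commands(SERVICE_NAME):
        run(cmd)
    return True
```

Testing for stale check results rather than for a stopped service matters here, because in the failure described above both services were still reported as started.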
I have even more problems today!
I installed the Windows Server 2003 SP1 on my monitoring server last night and since the reboot ALL my performance monitor checks now fail.
Each check states: "Performance counter retrieval failed with error code: ERROR: 800007D3 - The data item has been added to the Query but has not been validated"
Any ideas how I can resolve this?
Thanks
Can you access the performance counters through the Perfmon Monitor? Add the one you want to monitor in there and then try again.
Indeed, when I select the object through Performance Monitor I do get a valid response.
I've even triple-checked the check's 'Performance counter' setting and it is correct.
If I deliberately enter the wrong counter I get 'ERROR C0000BC0 - Performance counter retrieval failed'.
As soon as I enter the correct counter, it returns to ERROR 800007D3.
I've also upgraded my Serverscheck to 6.0.2 hoping it would help...it didn't but the new interface looks cool.
Any more ideas?
Thanks
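As an aside, the two codes quoted in this post look like status values from the Windows Performance Data Helper (PDH) library. The sketch below decodes them in Python. It assumes the names from the PDH header (pdhmsg.h) and the usual convention that the top two bits carry the severity; only the two codes seen in this thread are listed, and the function name is illustrative.

```python
# Names as defined in pdhmsg.h; only the codes from this thread are included.
PDH_STATUS_NAMES = {
    0x800007D3: "PDH_CSTATUS_ITEM_NOT_VALIDATED",
    0xC0000BC0: "PDH_CSTATUS_BAD_COUNTERNAME",
}

SEVERITIES = {0: "success", 1: "informational", 2: "warning", 3: "error"}


def describe_pdh_error(code_text):
    """Turn a hex string such as '800007D3' into a readable description."""
    code = int(code_text, 16)
    severity = SEVERITIES[code >> 30]   # the top two bits hold the severity
    name = PDH_STATUS_NAMES.get(code, "unknown")
    return "0x%08X %s (%s)" % (code, name, severity)


if __name__ == "__main__":
    for text in ("800007D3", "C0000BC0"):
        print(describe_pdh_error(text))
```

Read this way, C0000BC0 means the counter path itself could not be parsed, while 800007D3 means the counter was accepted into the query but no data was collected for it. That is consistent with the correct counter name getting further than the deliberately wrong one.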
Has SP1 changed anything in your security settings?
Some progress which I hope will help us diagnose the problem.
Every check I run through Windows Performance Monitor runs fine - whether the check is to a local or remote machine.
When I choose the same check through Serverscheck; the LOCAL check returns a value - so is successful.
The REMOTE check returns the 800007D3 error - so fails.
I've tried different logins (Domain Admin, Local Admin) and it makes no difference.
I've also tried different checks; they fail as well.
One interesting point is... If I choose the '% Processor Time' counter for a REMOTE machine - the check fails.
However, if I choose the built-in CPU check, the same machines report back successfully.
I have checked and re-checked the format for typing mistakes or incorrect spacing, and it is all fine.
What is different between the serverscheck CPU check and the PERFMON check?
I really want to get this working again.
Thanks
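To rule out formatting differences between the local and remote counter, the counter path can be generated rather than typed by hand and then tried from the command line. The sketch below builds a path in the standard Windows form \\Machine\Object(Instance)\Counter and a matching `typeperf` command (the command-line counterpart of Performance Monitor). The machine name SERVER01 and the helper names are examples only, and the exact format ServersCheck expects in its 'Performance counter' field is an assumption.

```python
def counter_path(obj, counter, instance=None, machine=None):
    """Build a performance counter path; omit machine for a local counter."""
    path = ""
    if machine:
        path += "\\\\" + machine       # remote paths start with \\Machine
    path += "\\" + obj
    if instance is not None:
        path += "(" + instance + ")"   # e.g. (_Total)
    path += "\\" + counter
    return path


def typeperf_command(path, samples=1):
    """Command that samples the counter a fixed number of times."""
    return ["typeperf", path, "-sc", str(samples)]


if __name__ == "__main__":
    local = counter_path("Processor", "% Processor Time", "_Total")
    remote = counter_path("Processor", "% Processor Time", "_Total", "SERVER01")
    print(local)
    print(remote)
    print(" ".join(typeperf_command(remote)))
```

If the generated remote path works in typeperf when run under the service account but still fails inside the check, the path format can be ruled out as the cause.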
The performance counters are accessed through a different Windows layer. I need to check with development on what protocols are involved for the performance counters.
\\Computername\ipc$
Verify whether you can make the above connection to the remote computer under the account that the service runs as.
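One way to run this connection test from a script is sketched below. It assumes the standard `net use` command and a placeholder computer name, and the helper names are illustrative. Note that the script connects as whichever account launches it, so it needs to be started under the service account for the result to be meaningful.

```python
import subprocess


def ipc_share(computer):
    """Return the UNC path of the IPC$ share, e.g. \\\\SERVER01\\ipc$."""
    return "\\\\" + computer.strip("\\") + "\\ipc$"


def net_use_command(computer):
    """Command that attempts a connection to the remote IPC$ share."""
    return ["net", "use", ipc_share(computer)]


def can_connect(computer, run=subprocess.call):
    """True when 'net use' exits with code 0, i.e. the connection succeeded."""
    return run(net_use_command(computer)) == 0


if __name__ == "__main__":
    # SERVER01 is an example name; replace it with the remote computer.
    print(" ".join(net_use_command("SERVER01")))
```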
Connection to the IPC$ is not enabled on Windows Server 2003 by default, it has to be enabled by a registry change.
Anyway, having tried your suggestion and found that connecting to IPC$ failed to servers both with and without SP1, I decided to remove the Service Pack from my monitoring server.
After the uninstall was complete and I'd rebooted the server ALL PERFCOUNT checks have started to work again!
I've verified them against the System Monitor graph, and indeed I'm back in business.
Whether or not 'Stronger defaults and privilege reduction on services' that SP1 applies had anything to do with the DCOM connections to the remote servers I don't know - but maybe your support desk can look into these issues?
Surely I'm not the only person wanting to run Serverscheck on a Win2K3 server with SP1...?
Finally, even though my PERFCOUNT checks are now all running again, I still cannot connect to the IPC$ share of each remote server.
Has Serverscheck been tested with Windows Server 2003 + SP1?
Thanks
(it's the one you can see when going to http://www.serverscheck.com/livecapture.asp)
ServersCheck also runs on Windows Vista (though we still verify it against every Candidate Release)