Monitoring Performance Configuration

February 2009

"By default the software runs in 'half speed' mode. In full speed, the number of monitors running at the same time is doubled. This will result in more CPU and memory usage."

Now, what exactly does this mean? I take it that the throttling only affects the monitors if many monitors are running at the same time and if the hardware (or network) used cannot keep up.

Hypothetically:

Consider 50 monitors running at 30-second intervals and 50 monitors running at 5-minute intervals on rather limited hardware. Does this mean that at half speed, monitors configured to run every 30 seconds will actually run at 1-minute intervals (while the 5-minute monitors will likely remain largely unaffected)?

I presume it's not quite as simple as that.

So is there any way to inspect the actual ServersCheck schedule (as it is actually running, to allow correct interpretation of the running checks), rather than the user configured schedule (which won't actually be followed in case it needs to throttle the monitors)?

February 2009

As said: it affects the number of monitors executed at the same time, which impacts parallel processing. Monitors are queued in a cycle, and the next monitor in the cycle runs once all previous monitors have been performed. Configured intervals are minimum intervals; the actual intervals can differ depending on hardware performance, the configured interval, and the response time of the monitored systems.
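To illustrate the point about minimum intervals, here is a rough simulation sketch. It is not the product's actual scheduler: the function name `simulate`, the idea of a fixed number of worker slots, the check durations, and the choice to count the next interval from the start of the previous run are all assumptions made for the example. It only shows how actual intervals stretch once the checks no longer fit in the available slots, and how doubling the slots can bring them back to the configured value.

```python
import heapq


def simulate(monitors, slots, horizon):
    """Return the mean observed interval (seconds) for each monitor.

    monitors: list of (configured_interval_s, check_duration_s) tuples
    slots:    number of checks allowed to run at the same time (assumed model)
    horizon:  length of the simulated period in seconds
    """
    # Each worker slot is represented by the time at which it becomes free.
    free_at = [0.0] * slots
    heapq.heapify(free_at)
    # Queue of (time the monitor is due, monitor index), earliest first.
    due = [(0.0, i) for i in range(len(monitors))]
    heapq.heapify(due)
    starts = [[] for _ in monitors]

    while True:
        due_time, i = heapq.heappop(due)
        slot_free = heapq.heappop(free_at)
        # A check starts when it is due AND a slot is available.
        start = max(due_time, slot_free)
        if start >= horizon:
            break
        interval, duration = monitors[i]
        starts[i].append(start)
        heapq.heappush(free_at, start + duration)
        # Assumption: the next run is due one interval after this run started.
        heapq.heappush(due, (start + interval, i))

    means = []
    for s in starts:
        gaps = [b - a for a, b in zip(s, s[1:])]
        means.append(sum(gaps) / len(gaps) if gaps else None)
    return means


# Two monitors configured for 30 s, each check taking 20 s (made-up values).
print("1 slot: ", simulate([(30, 20), (30, 20)], slots=1, horizon=400))
print("2 slots:", simulate([(30, 20), (30, 20)], slots=2, horizon=400))
```

With one slot the two 20-second checks cannot both fit inside 30 seconds, so each monitor ends up running less often than configured; with two slots they run in parallel and keep their configured interval. Real behaviour will depend on the factors listed above.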

February 2009

Well yes, I gathered as much, but is there any way of actually seeing the frequency at which these monitors are executed?

We're still in a testing phase, but if I configure a highly important monitor to run every 30 seconds, I'd like to know at which interval it actually gets performed instead (since this might have important implications). Preferably without having to make the monitor fail on purpose to check response times.

February 2009

By running the monitoring service in debug mode, as described in the knowledge base article, you can see everything it does in the debug log, including the timing intervals.

February 2009

Ok, thanks, I'll give it a shot.

February 2009

The tests run in debug mode, as suggested, proved inconclusive for me.

The tests ran as they should (at one point they got bumped by 2 seconds, but that cannot be considered an issue at all).

However, while running in debug mode, I noticed 2 things that jeopardized the entire testing premise:

1) For some reason the memory used by the monitoring thread was a quarter of what it is under normal service mode.

2) For some reason rules were being halted without any obvious reason (monitoring rules they were dependent on were NOT failing).

I can only conclude that this attempted debug mode test does not reflect what happens in my normal operating environment, and that I'll need to validate the monitoring schedule by making the rules fail one by one (letting a rule fail, making it work, and then letting it fail again, thus measuring the time between the OK and the second DOWN). I suspect it'll be fine (at least under normal circumstances), but it is still quite a gamble to leave this untested for critical monitors.
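However the execution times end up being collected (by hand from status changes, or from timestamps recorded on the monitored system), comparing them with the configured interval is simple arithmetic. A minimal sketch follows, assuming the timestamps are available as ISO 8601 strings; the helper names `observed_intervals` and `summarize` and the sample values are made up for illustration and are not taken from any ServersCheck log.

```python
from datetime import datetime


def observed_intervals(timestamps):
    """Return the gaps in seconds between consecutive execution times."""
    times = sorted(datetime.fromisoformat(t) for t in timestamps)
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


def summarize(timestamps, configured_s):
    """Compare observed gaps with the configured (minimum) interval."""
    gaps = observed_intervals(timestamps)
    return {
        "min": min(gaps),
        "max": max(gaps),
        "mean": sum(gaps) / len(gaps),
        # Number of gaps that were longer than the configured interval.
        "late": sum(1 for g in gaps if g > configured_s),
    }


# Hypothetical sample: a 30-second monitor whose third run is 2 seconds late.
sample = [
    "2009-02-10T12:00:00",
    "2009-02-10T12:00:30",
    "2009-02-10T12:01:02",
]
print(summarize(sample, configured_s=30))
```

A handful of observations per critical monitor, taken under normal load, should already show whether the actual interval stays close to the configured one.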
