Slashdot Mirror


User: ltcmdr

ltcmdr's activity in the archive.

Stories
0
Comments
1

Comments · 1

  1. Monitor and alert based on services / applications on Network Monitoring and Alerting? · Score: 1

    I think you've highlighted an important notion: a centralized alerting system should not only alert you to problems but also help cut down your response time when one occurs.

    The key here is to move beyond just instrumentation and add a layer of service mapping and correlation. The challenge is to map the data and events you get from Nagios (or another monitoring tool or agent) to the services those systems provide and to your service objectives (performance/availability). The goal is to correlate alarms and add context to them so that you are reliably alerted to real problems, can prioritize your response, and can perform root cause analysis efficiently. Otherwise you are likely to find yourself chasing redundant problem indications and false alarms.
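    To make the mapping idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the host names, the `SERVICE_MAP` and `SERVICE_PRIORITY` tables, and the `Alarm` shape are hypothetical and are not Nagios's data model or any vendor's API. It only shows the general shape of grouping raw host alarms by the service they affect, dropping duplicates, and ordering by service priority.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical host-to-service map; in practice this might come from an
# asset database or a hand-maintained service model.
SERVICE_MAP = {
    "web01": ["storefront"],
    "web02": ["storefront"],
    "db01": ["storefront", "reporting"],
}

# Hypothetical priorities derived from service objectives (lower = more urgent).
SERVICE_PRIORITY = {"storefront": 1, "reporting": 3}


@dataclass(frozen=True)
class Alarm:
    host: str
    check: str
    state: str  # e.g. "CRITICAL" or "WARNING"


def correlate(alarms):
    """Group raw host alarms by affected service, most urgent service first."""
    by_service = defaultdict(set)
    for alarm in alarms:
        # Hosts missing from the map land in an "unmapped" bucket.
        for service in SERVICE_MAP.get(alarm.host, ["unmapped"]):
            by_service[service].add(alarm)  # the set drops duplicate alarms
    return sorted(
        by_service.items(),
        key=lambda item: SERVICE_PRIORITY.get(item[0], 99),
    )
```

    Calling `correlate()` on a list that contains the same `db01` alarm twice plus one `web01` alarm would yield the storefront group first, with the duplicate collapsed, so you respond once per affected service rather than once per raw event.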

    A number of the larger systems management vendors (HP, BMC, etc.) have begun selling this approach. I'm not a big fan of most of their solutions, though, because they often require multiple poorly integrated products and a lot of consulting dollars to implement. I recommend checking out Managed Objects (http://www.managedobjects.com). The product lets you easily bring in data from multiple systems and map it to custom-defined service maps and service objectives. It also integrates with change management systems, trouble ticketing systems, and asset databases, so you can create trouble tickets automatically, automatically ignore alarms for servers that are down for scheduled maintenance, and so on.
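    As a rough illustration of the maintenance and ticketing logic (not how Managed Objects or any other product implements it), the sketch below suppresses events for hosts inside a scheduled maintenance window and turns the rest into ticket records. The `MAINTENANCE` table, the event tuple layout, and the ticket fields are all hypothetical stand-ins for whatever your change management and ticketing systems expose.

```python
from datetime import datetime

# Hypothetical maintenance windows, as might be exported from a change
# management system: host -> (start, end).
MAINTENANCE = {
    "db01": (datetime(2024, 1, 6, 22, 0), datetime(2024, 1, 7, 2, 0)),
}


def in_maintenance(host, when):
    """Return True if `when` falls inside the host's maintenance window."""
    window = MAINTENANCE.get(host)
    return window is not None and window[0] <= when <= window[1]


def triage(events):
    """Split (host, message, timestamp) events into tickets and suppressed alarms."""
    tickets, suppressed = [], []
    for host, message, when in events:
        if in_maintenance(host, when):
            suppressed.append((host, message))
        else:
            # Hypothetical ticket record; a real integration would call the
            # ticketing system here instead of building a dict.
            tickets.append(
                {"host": host, "summary": message, "opened": when.isoformat()}
            )
    return tickets, suppressed
```

    Keeping the suppressed events in their own list rather than discarding them leaves an audit trail in case a maintenance window was entered incorrectly.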