Slashdot Mirror


Agent-based or Agent-less Network Monitoring

An anonymous reader writes "ITO has published an interesting article on agent-based and agent-less network monitoring approaches: "Agents can monitor the status (availability and performance) of applications, servers, and network components in significantly more depth than generic management tools, since they are able to gather data through application-specific interfaces, exercise the full application functionality, and perform localised aggregation and summarisation of high volume metrics for example.""

34 comments

  1. I wonder . . . . by aneeshm · · Score: 0, Interesting

    . . . . how much of this overlaps with conventional AI ?

  2. I advise a mixed approach by Limburgher · · Score: 3, Interesting

    I use agents on the few machines where it's really critical that I be alerted to adverse conditions, say, low disk space, high load, etc. For the rest I can just check TCP services and be done with it.
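A minimal sketch of the "just check TCP services" half of that approach, using only the Python standard library. The function name, host, and port values are illustrative assumptions, not anything from the poster's setup.

```python
import socket


def tcp_service_up(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical target; replace with a host and port you actually monitor.
    print(tcp_service_up("127.0.0.1", 22))
```

A successful connect only shows that something is listening; it says nothing about disk space or load, which is where the agents come in.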

    --

    You are not the customer.

    1. Re:I advise a mixed approach by Total_Wimp · · Score: 2, Informative

      Mixed works well for us as well. I know less about large data centers, but on our medium-sized network (a couple hundred servers), the performance and installation costs don't really matter, as long as we restrict the agents to machines that really need them. In actual practice, this works out to about a couple dozen servers. We may add more in the future, but this is totally manageable at the moment.

      I know this article doesn't really cover it, but we feel very differently about client computer agents. Deployment is the kicker here. When you have a quarter of your workforce highly mobile, some of whom almost never come into the office, then installing agents is a real headache. Sometimes you can't help it because the agents you need to install are the very ones that will make managing large numbers of mobile users practical. But where we can avoid it, we do.

      TW

    2. Re:I advise a mixed approach by Maljin+Jolt · · Score: 1

      Seems you still haven't invented an installation agent.

      --
      There you are, staring at me again.
  3. Why the hassle by mecanicaz · · Score: 1

    Inventing, reinventing and re-reinventing agents, paying for extra management tools only to discover they're broken or don't fulfill your needs (although advertised otherwise), and of course the cross-platform headache; all this, and we simply forget the standards. The keyword is SNMP.
    OK, it's not secure, but then what is secure if we don't give it enough research and care? It is simple to implement and it's integrated into most of the equipment that needs monitoring, but hey, we ignore it; as long as we didn't pay zillions for it, it's ignorable.
    Don't you just hate it when vendors force you to think outside the open standard and charge you more for a less featured solution? People, please wake up.

    1. Re:Why the hassle by m1ndrape · · Score: 2, Insightful

      SNMPv3 is supposed to be more secure, but then again how many products out there really support v3?

      --
      Donald Ray Moore Jr. (mindrape)
      Suspected Terrorist
    2. Re:Why the hassle by mecanicaz · · Score: 0, Flamebait

      And why aren't there products supporting v3? The obvious answer: pressure from greedy vendors.

    3. Re:Why the hassle by arivanov · · Score: 3, Informative

      I admire your optimism regarding the availability of SNMP and its capabilities. The reality is considerably bleaker.

      First of all, as far as hosts are concerned only a small fraction of people writing an application bother to define a MIB and register OIDs. The fraction that has bothered to read the proxy agent specs and plug themselves correctly into the SNMP agent is even smaller. Even really trivial things like RAID status are simply not present on most OS-es. Plenty of things in the MIB are still 32 bit counters while the OS-es have moved on to 64 bit internally. SNMP on a Unix (or Winhoze for that matter) platform is a disaster area.

      Second, SNMP is too inflexible for large network applications like modern access boxes and high end routers. These nowadays discard most SNMP functionality and replace it with proprietary protocols or XML. Cisco HFR and the ex-Uniphase (now Juniper) boxes are prime examples.

      Third, SNMP has never been the favourite, due to its inflexibility, for applications related to deep telco nuts and bolts like element management, mobile comms systems, etc. The reasons are too long for a slashdot rant, but they are there and they are real. This is mostly CORBA territory with some web services sprinkled in a few places. SNMP does not play there.

      Overall, SNMP is used only in places where minimal surface level monitoring is required and the requirement for reliable transfer of alarms and data is not present. It is either discarded or supplemented by custom agents in nearly all cases where people need to look into the guts of the system.
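To make the 32-bit counter complaint concrete: an SNMP Counter32 wraps back to zero after 2**32 - 1, so a poller has to compute deltas modulo 2**32. The sketch below is an illustration with hypothetical function names, and it is only correct if the counter wrapped at most once between two polls.

```python
COUNTER32_MODULUS = 2 ** 32  # a Counter32 wraps to 0 after 2**32 - 1


def counter_delta(previous: int, current: int,
                  modulus: int = COUNTER32_MODULUS) -> int:
    """Increase between two samples, assuming at most one wrap in between."""
    return (current - previous) % modulus


def rate_per_second(previous: int, current: int,
                    interval_seconds: float) -> float:
    """Average per-second rate over one polling interval."""
    return counter_delta(previous, current) / interval_seconds


if __name__ == "__main__":
    print(counter_delta(4294967000, 200))  # counter wrapped: prints 496
    print(rate_per_second(100, 700, 60))   # no wrap: prints 10.0
```

The "at most one wrap" assumption is the weak point. A 32-bit octet counter on a saturated 1 Gbit/s link wraps roughly every 34 seconds, so a 60-second poll can silently lose a full wrap, which is why 64-bit counters matter.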

      --
      Baker's Law: Misery no longer loves company. Nowadays it insists on it
      http://www.sigsegv.cx/
    4. Re:Why the hassle by kjs3 · · Score: 1

      Obviously, because customers aren't asking for it.

    5. Re:Why the hassle by Pii · · Score: 2, Informative
      It is actually more secure, in that it supports encryption (if you enable it). Currently, it's only single DES encryption, but efforts within the standards community are looking to add support for 3DES and AES.

      In the interim, however, you can always use IPSEC to provide the security that SNMP lacks, providing your equipment supports it.

      On the NMS front, there are a number of platforms that support SNMPv3. NetCool and Spectrum are a couple of examples, and Concorde will have it by 3rd Q this year.

      --
      For those that would die defending it, Freedom
      has a sweet taste that the protected will never know.
    6. Re:Why the hassle by Anonymous Coward · · Score: 0

      I'd say "hassle" and SNMP go together very well. I have Windows boxes that, for whatever reason, renumber their CPUs every time they reboot. So, I have all my Nagios checks set up for hrProcessorLoad.1, hrProcessorLoad.2, etc. Then a box reboots (believe it or not, our Windows boxes need to be rebooted occasionally!), and its CPUs get renumbered to .3, .4, .5, & .6. Then I get to go fix Nagios.
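One way around the renumbering, sketched under assumptions: walk the whole hrProcessorLoad column and aggregate whatever indices come back, instead of hard-coding .1, .2 and so on. The code below only does the parsing and averaging; how the walk rows are obtained (for example by parsing snmpwalk output) is left out, and the function names are hypothetical.

```python
from statistics import mean

# hrProcessorLoad column in HOST-RESOURCES-MIB
HR_PROCESSOR_LOAD = "1.3.6.1.2.1.25.3.3.1.2"


def processor_loads(walk_rows):
    """Map processor index -> load from (oid, value) rows of an SNMP walk."""
    prefix = HR_PROCESSOR_LOAD + "."
    loads = {}
    for oid, value in walk_rows:
        oid = oid.lstrip(".")  # tolerate a leading dot in the OID string
        if oid.startswith(prefix):
            index = int(oid[len(prefix):])
            loads[index] = int(value)
    return loads


def average_load(walk_rows):
    """Average load across all processors, whatever their indices are."""
    loads = processor_loads(walk_rows)
    if not loads:
        raise ValueError("no hrProcessorLoad rows in walk output")
    return mean(loads.values())


if __name__ == "__main__":
    before = [(HR_PROCESSOR_LOAD + ".1", 10), (HR_PROCESSOR_LOAD + ".2", 30)]
    after = [(HR_PROCESSOR_LOAD + ".3", 10), (HR_PROCESSOR_LOAD + ".4", 30)]
    print(average_load(before), average_load(after))
```

The check no longer cares which indices the box picked after a reboot, at the cost of one walk per poll instead of a couple of targeted GETs.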

      Then there's net-snmp, which I've spent 2 years trying to figure out why it stops responding to queries after about 8 months. I eventually gave up and scheduled a cron job to restart it every 6 months. Looks like this might be fixed in 5.3 (Thanks net-snmp guys!).

    7. Re:Why the hassle by kjs3 · · Score: 1

      I could not have possibly said it better. Definitely consistent with my involvement with large scale platform management.

  4. In a nutshell: speed of alerts vs. footprint by mildness · · Score: 3, Insightful
    Please know that my perspective is systems monitoring, as opposed to the mostly NW orientation of the original (very good) article.

    The main difference for my company's application is that an agent can tell you immediately of service degradation while an agent-less solution must wait for the next polling interval. As the article mentions, another important consideration is that agents can drill much deeper.

    Importantly, agents require less NW overhead but take up more, often cheaper, RAM, disk and CPU resources.

    In my current situation, my approach is to deploy agents wherever possible.

    Cheers,

    Bill

    --
    bamph
    1. Re:In a nutshell: speed of alerts vs. footprint by Anonymous Coward · · Score: 0

      Oh yeah waiting is such a hassle especially when the polling intervals can usually be set down to 60 seconds and the network overhead is something like a few K for quite a large number of systems. Add to that the fact that most enterprises can accumulate 6-8 agents for various applications and eventually your computers can be quite full of these little to no cpu usage agents. That little to no cpu usage adds up somewhere.

    2. Re:In a nutshell: speed of alerts vs. footprint by mildness · · Score: 1
      ... polling intervals can usually be set down to 60 seconds and the network overhead is something like a few K for quite a large number of systems. Add to that the fact that most enterprises can accumulate 6-8 agents for various applications and eventually your computers can be quite full of these little to no cpu usage agents.

      Work for a vendor of "agent less" monitoring solutions there AC? (:-{)}

      Unless you are blessed with tons of unused bandwidth, cranking the polling interval to 60 seconds for thousands of devices will decimate your NW, leading to a lynch mob of NW Admins, managers and end users outside your door.
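Whether a 60-second interval is affordable depends on the numbers, so here is a back-of-the-envelope sketch. Every input is an assumption to be replaced with your own measurements: the device count, the metrics per device, and especially the bytes on the wire per metric.

```python
def polling_bandwidth_bps(devices: int, metrics_per_device: int,
                          bytes_per_metric: int,
                          interval_seconds: float) -> float:
    """Average bits per second spent on polling, spread evenly over the interval."""
    total_bytes = devices * metrics_per_device * bytes_per_metric
    return total_bytes * 8 / interval_seconds


if __name__ == "__main__":
    # Hypothetical estate: 5000 devices, 50 metrics each,
    # 200 bytes per metric (request plus response), polled every 60 seconds.
    bps = polling_bandwidth_bps(5000, 50, 200, 60)
    print(f"{bps / 1e6:.2f} Mbit/s")  # prints 6.67 Mbit/s
```

This is an average. A poller that fires everything at the top of the minute produces a much higher burst, and slow WAN links to remote sites feel it long before the core does.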

      To your second point, if you need 6-8 agents on each computer you have selected the wrong monitoring vendor. I find one per server is just fine.

      Finally, I'll point out that you've wisely avoided the subject of monitoring depth.

      Other than that, thank you for your input,

      Bill

      --
      bamph
    3. Re:In a nutshell: speed of alerts vs. footprint by NeutronCowboy · · Score: 1

      Since I have been installing monitoring for a while, I have a question for you - how do you deal with new devices coming onboard? Here's the other question: how fast do you need your alert? Keep in mind that before an alert gets acted upon, it has to be received, read and investigated. These are the two areas where agent-less monitoring really shines, and I'm curious what you think of the tradeoffs here.

      --
      Those who can, do. Those who can't, sue.
  5. To Agent, or Not To Agent, That is the Question by AndrewStephens · · Score: 3, Interesting
    It all really depends on how important the service is. If you can stand a few minutes' delay in getting the information, then pinging the service remotely every 2 minutes is going to suit you fine. If not, then a specific agent will be required to send out the alert. To be really safe you really need to do both, in case the whole data centre blows up and takes out your agent as well.

    A lot of Windows software that claims to be agentless really just remotely installs a small stub using a domain account behind the scenes to do the task. Microsoft is actually making a decent stab at the problem with WMI, a sort of big brother to SNMP. Unfortunately the implementation is complex, non-standard, and up until now nobody has really used it for the type of remote instrumentation that this article talks about. Even Microsoft's own software has not really been instrumented properly.

    --
    sheep.horse - does not contain information on sheep or horses.
    1. Re:To Agent, or Not To Agent, That is the Question by Jalthar · · Score: 1
      Microsoft is actually making a decent stab at the problem with WMI, a sort of big brother to SNMP. Unfortunately the implementation is complex, non-standard, and up until now nobody has really used it for the type of remote instrumentation that this article talks about.
      Actually, they WERE making a good stab at this - five or so years ago. Since then, the nature of where they're trying to go with this has changed, nearly the entire project-team has disbanded (and was reformed with a different focus), and they've stopped pushing for WMI-related WHQL requirements - thereby nipping the WMI-centric management idea in the bud.

      To correct a few other misconceptions you stated... It really isn't all that complex; it's essentially an object-oriented database, with object instantiation, linkage, inheritance, an event system, and various other spiffy things. It IS most definitely standard; see the DMTF and their specifications for CIM and WBEM. And yes, SOME companies actually do make use of WMI for remote "agent" type system-monitoring; the company I work for makes extensive use of WMI for monitoring of remote servers, and proactively initiating Alerts that are sent out to our own servers letting us know when critical components of our customers' fault-tolerant servers have gone awry.

      Don't get me wrong, Microsoft's WMI has had a LOT of problems in the past - the service crashing, memory leaks, lost notifications, etc. - but it's relatively stable now. And it's a fricking great idea in theory, as per the CIM/WBEM standards...
      --
      Need a break? Check out CircusIrata

    2. Re:To Agent, or Not To Agent, That is the Question by ocbwilg · · Score: 2, Interesting

      A lot of Windows software that claims to be agentless really just remotely installs a small stub using a domain account behind the scenes to do the task. Microsoft is actually making a decent stab at the problem with WMI, a sort of big brother to SNMP. Unfortunately the implementation is complex, non-standard, and up until now nobody has really used it for the type of remote instrumentation that this article talks about. Even Microsoft's own software has not really been instrumented properly.

      Which makes you wonder what the difference is between a really big, complex agent welded into your OS and WMI.

      But seriously, I use both agents and agentless monitoring with WMI. I use Insight Manager running on my servers to warn me of hardware issues, and use a VBScript that I wrote to connect to WMI on my servers to measure things that IM doesn't get. It also pings the servers at regular intervals. Between the two I think that we've got it pretty well covered.

      The interesting thing is that IM basically plugs into WMI itself, though it does have new WMI classes that are HP/Compaq specific. If you have a decent engineer with good scripting skills (VBS, Perl, Jscript, Python, whatever) then it's really easy to use WMI to monitor and manage just about everything related to Windows servers.

    3. Re:To Agent, or Not To Agent, That is the Question by AndrewStephens · · Score: 1
      WMI is standard, in that the object model was set by an industry wide body, but Microsoft went their own way with the programming interfaces and network transport parts, using DCOM. Remember that the W stands for Windows. I always thought this was a shame, it would be cool to have a single console that managed all the Windows and Unix (and Mac?) machines on the network - which is exactly what Microsoft didn't want, I suppose.

      I am not sure that WMI really counts as agentless anyway; one of its great features is plugging in custom providers, which are essentially mini-agents in their own right.

      I agree that WMI has not been widely used in the past. Microsoft are having another go at the problem with their MOM console (although it might be called something else this week). It's not clear to me whether it builds on top of WMI providers or uses its own scheme; I haven't kept up with it.

      --
      sheep.horse - does not contain information on sheep or horses.
  6. "Agentless" monitoring does not exist by thesandbender · · Score: 5, Informative

    "Agentless" monitoring is a misnomer dreamt up by marketing and sales types to differentiate their product as "better". All monitoring is agent based, the only difference is if the agent you are using is bundled with the system or a 3rd party agent. Most "agentless" monitoring systems acquire their data through SNMP, sar, netstat, iostat, WMI, etc. All these providers will consume system resources in some manner or another so the argument that agents incur more overheard is usually nonsense (unless the agent is very poorly written). In most cases the monitoring packages bundled with the system can be disabled so the new agents will consume resources that would have been used by the system utilities. And poorly conceived/written monitoring schemes will be a drag on any system. The only real differentiation is:

    a) specific metrics gathered
    b) frequency of update
    c) "agent" based requires distribution and control of a 3rd-party piece of software

    Performance and resource utilization are a red herring.

    1. Re:"Agentless" monitoring does not exist by Onan · · Score: 2, Interesting


      This is not correct.

      It is absolutely true that snmpd, sar, and whathaveyou count as "agents" as much as anything else. However, you've artificially limited the discussion to only the range of monitoring approaches that use such tools; of course when you only discuss types of monitoring that use agents, there is no such thing as agentless monitoring.

      However, many (and arguably many of the best) monitoring approaches simply observe the behaviour of the actual running services, without using any additional tools on the monitored systems.

      eg, want to know whether your webserver is up? Don't rely on a tool running on the webserver machine to look for the process and tell you whether it thinks it's up; just give it a request. Want to know how quickly your webserver serves requests? Just give it the request you care about and time how long the interesting bits of it take. This approach is often referred to as "black box" or "end to end" monitoring, though the latter can be something of a misnomer.
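A minimal sketch of that kind of black-box check, using only the Python standard library. The function name and the URL are illustrative assumptions; a real check would probably also verify something in the response body.

```python
import http.client
import time
import urllib.request


def probe_http(url: str, timeout: float = 5.0):
    """Fetch url once; return (ok, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()  # include the body transfer in the timing
            ok = 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError are OSError subclasses, so this covers
        # refused connections, timeouts, DNS failures and 4xx/5xx responses.
        ok = False
    return ok, time.monotonic() - start


if __name__ == "__main__":
    # Hypothetical URL; point it at the request you actually care about.
    ok, elapsed = probe_http("http://127.0.0.1:8080/")
    print(ok, f"{elapsed:.3f}s")
```

Nothing runs on the web server beyond the service itself, and the measured time is what a client would see from wherever the probe runs.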

      I would argue that such approaches not only exist, they have decided advantages. Asking a tool on a monitored machine whether it is correctly handling requests will never be as authoritative as simply asking it to handle a request and confirming the results.

    2. Re:"Agentless" monitoring does not exist by NeutronCowboy · · Score: 1

      Another differentiation is management. Do you want to install new agents on every new machine in your infrastructure? Unless of course you're just talking about network devices, for which SNMP is fine.

      --
      Those who can, do. Those who can't, sue.
  7. This is great! by Anonymous Coward · · Score: 0

    We run into this ALL THE TIME. I work at a small software startup http://www.certalertsoftware.com/ and every time we present to or install for a customer this very issue comes up. Our software performs network monitoring of several security metrics including the validity of SSL certificates. So, from a scanning perspective, agent-less monitoring gets us 90%+ of what we need and is fully acceptable to many of the customers.

    Where it gets interesting is when the customer finds out that our software has the ability to UPDATE those remote certificates in an agent/agentless fashion. This is why it gets interesting, because there are SO MANY policies in corporate America that we never run into the same rationalization twice for doing it either way.

    Lastly, and most interesting to me, some of our customers have suggested that we champion a standard for Agent Monitors, where the agent monitor can be installed once and provide properties like polling, summation, and communication. Then all the different network monitoring packages they have could adopt this spec and be far more manageable for them, easier for us to implement, and standards-based for everyone.

    Seems like a great idea, and we are exploring how to do this, but I ask Slashdotters .... would this be of any benefit to you?

  8. Agent servers by Spazmania · · Score: 3, Interesting

    At a previous job, the lead engineers used to joke that our email servers were actually agent servers that also ran email. It would have been funnier if it wasn't true.

    Most monitoring agents go overboard. They monitor everything under the sun, even things that require a significant amount of computing power to wrangle in to useful data.

    Even lightweight agents like Nagios' nrpe do stupid things like an expensive forking scan of the process table once for each monitored process. God help you if you're running HP's Openview.

    --
    Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
    1. Re:Agent servers by Anonymous Coward · · Score: 0

      oh man!

      One of the largest telecom operators in the WORLD uses HP OpenView to monitor their ADSL-TVoIP infrastructure and believe me, it's not a small network.

  9. Why the hassle-Cats and dogs. by Anonymous Coward · · Score: 0

    Cynic, meet optimist. In Mr. Cynic's world, all things happen for the wrong reasons, e.g. greed. In the optimist's world, things happen for more practical reasons, e.g. customer demand. Tune in next week as we see if we can't get cats and dogs to live together.

  10. Remote Nagios by Anonymous Coward · · Score: 0

    Configuring Nagios to execute remote commands via SSH and then deposit the results on a central server has worked for us. I have reduced the CPU/memory footprint that many other "agent-based" monitoring systems have consumed. By making the central server do all the work and keeping the data encrypted, I have let the servers we monitor do what they need to do, which is run critical applications only, without allowing anyone to see or alter the data in mid-stream. The only thing this setup really uses is the underutilized bandwidth on the GB connection to the server.

  11. Doesn't matter... by Anonymous Coward · · Score: 0

    I don't think it really matters. We use a combination of technologies including Nagios (OSS), and some of the offerings from Klir Technologies. The Nagios stuff we do relies heavily on agents, and the stuff from Klir relies heavily on SNMP and is completely agentless. For ease of deployment the Klir stuff wins easily and it offers sufficient information for many of our servers, for those where we want something special or more specific to our application we create custom scripts and use the Nrpe agent under Nagios.

    What ultimately matters is are you getting the data you need? Often you may find you do get this data from agentless solutions, sometimes you may not. I'd recommend going with some of the agentless stuff to start, see where it doesn't quite meet your needs and then supplement it.

  12. You end up needing agents to scale... by Mike+Kirk · · Score: 1

    Note: I work for a company that makes agents + their upgrades.

    Others already mentioned you need agents to do a deep dive... lots of companies are running at least 2 of them (one from the vendor to handle the OS + hardware, one from a 3rd party to do "everything else").

    To monitor and manage a large number of systems you need to push the "smarts" of the system as far down as possible. Pure agentless/polling systems either run into network issues (saturate links with polling) or CPU issues (what do I do with these alarms?)... usually both. With an agent on each box a lot of intelligence about when to trigger an alarm and what to do about it is baked in, resulting in lower CPU use by whatever the "server" is, and network traffic only generated when something actually goes wrong.
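A rough sketch of what "smarts pushed down to the agent" can look like: the agent samples locally and only emits a message when the alarm state changes. The class name, thresholds, and event strings are hypothetical and not taken from any particular product.

```python
class ThresholdAgent:
    """Local alarm logic with hysteresis; reports state changes only."""

    def __init__(self, raise_at: float, clear_at: float) -> None:
        if clear_at > raise_at:
            raise ValueError("clear_at must not exceed raise_at")
        self.raise_at = raise_at
        self.clear_at = clear_at
        self.alarming = False

    def observe(self, value: float):
        """Return 'RAISE' or 'CLEAR' on a state change, otherwise None."""
        if not self.alarming and value >= self.raise_at:
            self.alarming = True
            return "RAISE"
        if self.alarming and value <= self.clear_at:
            self.alarming = False
            return "CLEAR"
        return None  # nothing worth sending over the network


if __name__ == "__main__":
    agent = ThresholdAgent(raise_at=90, clear_at=80)
    samples = [50, 95, 97, 85, 79, 60]  # e.g. disk usage percentages
    events = [agent.observe(s) for s in samples]
    print(events)  # [None, 'RAISE', None, None, 'CLEAR', None]
```

Six samples produce two messages. A central poller would have fetched all six values and evaluated the same rule itself, for every box it watches.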

    You do end up with agentless tech built in anyways.. since synthetic transactions are so useful, and you'll always need simple periodic polls to make sure the agents are alive and healthy.

    Mike

  13. The biggest downside, overlooked. by Onan · · Score: 2, Insightful

    I'd say that the biggest drawback to the whole category of approaches involving cooperative monitoring is that it adds complexity. And of course added complexity increases the chances that a system will fail to behave in the way that you expect, or indeed fail to work at all.

    Monitoring systems really should be a couple orders of magnitude more reliable than the things which they monitor. One of the most effective ways to ensure that is by having them be far clearer and simpler; an advantage that cooperative monitoring forgoes.

  14. Hard to say. by poppy+fresh · · Score: 1

    I write agentless network inventory software for a living.
    http://www.bdnacorp.com/index.shtml
    That said, my opinions here are not those of my employer. (I'm an engineer - why else would I be reading slashdot non-main-page article?) My opinions also aren't specifically about our product because it does inventory, not monitoring.

    It's hard to say agent or agentless. Someone in a previous comment said there is no such thing as "agentless" and mentioned SNMP, WMI, sar, etc. Naturally, there needs to be *something* giving values. However, the moniker "agentless" (usually quoted :P) implies that there will be no 3rd party application installation overhead. If you are working in IT and are the one who would have to roll out an agent, you can understand why the distinction is important. Using the device's own management interface is typically seen as safer than an agent as well, because it was written by that device's OEM, not the management software company.

    There are downsides to agentless as well, indeed the polling issues are somewhat true. Some technologies, like SNMP Traps, are capable of notifications for monitoring, but some are not. In the case when a machine only has some kind of shell interface, the only choice is to attempt to use that as periodically as desired.

    I also can't break any NDAs, but to those who think that agentless "doesn't scale"
    http://www.bdnacorp.com/customers.shtml
    http://www.gcn.com/print/24_24/36708-1.html
    We work with several of the largest IT installations in the *world*, and we do it with relative ease. However, we're not a monitoring solution - we do inventory. YMMV with monitoring, and with different vendors.

    1. Re:Hard to say. by NMagic · · Score: 1

      I actually work for one of the biggest software companies out there (Not sure if I'm allowed to say who ;) ). We have a lot of different monitoring processes and systems. We have 55,000 employees, and in the datacenter I physically work at, 5,000+ servers (about the 3rd largest in my co).
        We run at least 3 different monitors at this site that I know of. Some are hardware specific, some are 3rd party. It all comes down to what the different lines of business need. Our QA testers have to have a clean system that won't be interfered with for testing. There can't be any installed programs that aren't going to be on customers' systems. That group has to use agentless. Others need more monitoring for server asset management. Our developers take their servers from a pool. In order to effectively manage that pool, we need to have them managed closely to ascertain if the server is being used to its full potential. We get that from agent-based monitoring.
      Basically, it comes down to the universal truth of all things... One isn't "better" than the other. It's just suited for a different purpose.

  15. The best agent based i found... by williamyf · · Score: 1

    Is the Q.3 Based Approach.

    I worked in telecomms, and used/administered both a Nokia NMS2000 and a Siemens OMC-S and OMC-B

    While it is WAY more complex than SNMP (remember, the S is for Simple), it is extremely reliable and has many advantages over SNMP:

    Atomic transactions: In Q.3 you can specify a complex configuration change and be certain that, in case of a failure mid-process, your system will be either in the initial state, or the final one, but not in an intermediate state (the lack of this feature, plus the security issues are the reason no one uses SET in SNMP).

    In Q.3 traps can be acknowledged, therefore, no need to send and resend time and time again.

    In Q.3 the concept of trap clearance is part of the standard, and not something that each vendor implements on its own.
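The atomic-transaction property described above can be illustrated in a few lines. This is not the Q.3 protocol or its data model, only the all-or-nothing behaviour: changes are applied to a copy, and the live configuration is replaced only if every step succeeds. All names here are hypothetical.

```python
import copy


def apply_atomically(config: dict, changes) -> dict:
    """Apply every change to a copy; return the copy only if all succeed.

    If any change raises, the exception propagates and the original
    config is left exactly as it was (the initial state).
    """
    candidate = copy.deepcopy(config)
    for change in changes:
        change(candidate)
    return candidate


def set_mtu(cfg: dict) -> None:
    cfg["mtu"] = 9000


def failing_step(cfg: dict) -> None:
    cfg["vlan"] = 20
    raise RuntimeError("device rejected the change")


if __name__ == "__main__":
    live = {"vlan": 10, "mtu": 1500}
    try:
        live = apply_atomically(live, [set_mtu, failing_step])
    except RuntimeError:
        pass  # live still holds the initial state
    print(live)  # {'vlan': 10, 'mtu': 1500}
```

A sequence of independent SNMP SETs has no equivalent of the copy: if the second SET fails, the first has already taken effect.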

    So, more than only monitoring, Q.3 provides REAL management. It is sad that telecom equipment makers are moving away from it to an inferior standard (SNMP).
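For readers who have not worked with acknowledged and clearable alarms, here is a small sketch of the bookkeeping involved. It is a plain data structure, not the Q.3 wire format, and the class and method names are assumptions made for the example.

```python
class AlarmTable:
    """Tracks active alarms, which ones are acknowledged, and clearance."""

    def __init__(self) -> None:
        self._alarms = {}  # alarm_id -> {"text": str, "acked": bool}

    def raise_alarm(self, alarm_id: str, text: str) -> None:
        # Raising an alarm that is already active does not reset its ack.
        self._alarms.setdefault(alarm_id, {"text": text, "acked": False})

    def acknowledge(self, alarm_id: str) -> None:
        if alarm_id in self._alarms:
            self._alarms[alarm_id]["acked"] = True

    def clear(self, alarm_id: str) -> None:
        # The fault is gone, so the alarm leaves the active list.
        self._alarms.pop(alarm_id, None)

    def needs_resend(self) -> list:
        """Alarms the manager has not acknowledged yet."""
        return sorted(a for a, s in self._alarms.items() if not s["acked"])

    def active(self) -> list:
        return sorted(self._alarms)


if __name__ == "__main__":
    table = AlarmTable()
    table.raise_alarm("link-1", "link down")
    table.raise_alarm("fan-2", "fan failed")
    table.acknowledge("link-1")
    print(table.needs_resend())  # ['fan-2']
    table.clear("link-1")
    print(table.active())        # ['fan-2']
```

With unacknowledged SNMP traps the sender has no `needs_resend` list at all, since it never learns whether the manager received anything.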

    Just my 0,02

    --
    *** Suerte a todos y Feliz dia!