Slashdot Mirror


Ask Slashdot: Remote Server Support and Monitoring Solution?

New submitter Crizzam writes: I have about 500 clients which have my servers installed in their data centers as a hosted solution for time & attendance (employee attendance / vacation / etc). I want to actively monitor all the client servers from my desktop, so I know when a server failure has occurred. I am thinking I need to trap SNMP data and collect it in a dashboard. I'd also like to have each client connect to my server via an HTTP tunnel using something like OpenVPN. That way I keep a site-to-site tunnel open, so if I need to access my server remotely, I can. Any suggestions as to the technology stack I should put together to pull off this task? I was looking at Zabbix / Nagios for SNMP monitoring and OpenVPN for the other part. What else should I include? How does one put together a good remote monitoring / access solution that clients can live with and will still allow me to offer great proactive service to my servers located on-site?

137 comments

  1. Reverse-SSH tunnel phone-home from remote device by Anonymous Coward · · Score: 2, Informative

    Set up a script to initiate a reverse-SSH tunnel from the remote device back to a monitoring server. Give the tunnel account no login shell, but distribute keys for the monitoring user to the remote devices.

    You should then be able to log in without a password from the monitoring box over a completely secure link that doesn't require port forwarding at the remote site.
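
    A minimal sketch of the phone-home side, assuming OpenSSH on the remote device and key-only authentication to the monitoring server. MONITOR_HOST, TUNNEL_USER and REMOTE_PORT are hypothetical placeholders, and each device needs its own REMOTE_PORT. Once the tunnel is up, the monitoring box reaches the device by running ssh against localhost on that port.

```python
import subprocess
import time

# Hypothetical placeholders: monitoring host, tunnel account, and the port
# reserved for THIS device on the monitoring server (unique per device).
MONITOR_HOST = "monitor.example.com"
TUNNEL_USER = "tunnel"
REMOTE_PORT = 2201


def build_tunnel_command(host, user, remote_port, local_port=22):
    """ssh command that exposes local_port on this device as remote_port
    on the monitoring server (a reverse tunnel)."""
    return [
        "ssh",
        "-N",                               # no remote command, forwarding only
        "-o", "ExitOnForwardFailure=yes",   # give up if the port is taken
        "-o", "ServerAliveInterval=30",     # notice dead links
        "-o", "ServerAliveCountMax=3",
        "-R", f"{remote_port}:localhost:{local_port}",
        f"{user}@{host}",
    ]


def run_forever(retry_delay=30):
    """Restart ssh whenever it exits, similar in spirit to autossh."""
    cmd = build_tunnel_command(MONITOR_HOST, TUNNEL_USER, REMOTE_PORT)
    while True:
        subprocess.run(cmd)  # blocks until the connection drops
        time.sleep(retry_delay)


if __name__ == "__main__":
    run_forever()
```

    The restart loop is deliberately dumb. autossh, mentioned further down the thread, does the same job with better link-failure detection.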

  2. Scratch my back by Anonymous Coward · · Score: 2, Interesting

    Will you do my job if I tell you the answer? You've already gotten your start. What more do you need?

    1. Re:Scratch my back by Anonymous Coward · · Score: 0

      You beat me to it...

  3. Central server by Anonymous Coward · · Score: 0

    Install one central server at a colo facility. Set up a VPN from each of your servers to it. You'll call your server 'Support.'

    Now you can use SSH to connect to the remote systems as needed to solve problems. Capture SNMP traps, syslog, etc. from each one via the VPN connection to Support. Filter those and have notifications emailed to you.

    How much data is created on each system? If not much, and you use MySQL, you can run one MySQL instance per client on its own port on Support, each as a slave of that client's master. That way you'll be backing up their data live. At night, dump each one to a file and compress it. Now you have a snapshot. Use these to recover from hardware failures where the client has not done their own backups.
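
    A rough sketch of the nightly snapshot step. It assumes one replica instance per client listening on its own local port on Support, with credentials kept in ~/.my.cnf. The REPLICAS mapping and BACKUP_DIR are hypothetical. It would be run once a night from cron.

```python
import gzip
import shutil
import subprocess
from datetime import date
from pathlib import Path

# Hypothetical: client name -> port of its local replica on "Support".
REPLICAS = {"client-a": 3307, "client-b": 3308}
BACKUP_DIR = Path("/var/backups/replicas")


def dump_command(port, host="127.0.0.1"):
    """mysqldump invocation for one replica (credentials from ~/.my.cnf)."""
    return ["mysqldump", f"--host={host}", f"--port={port}",
            "--single-transaction", "--all-databases"]


def snapshot_path(client, day):
    return BACKUP_DIR / f"{client}-{day.isoformat()}.sql.gz"


def snapshot_all(day=None):
    day = day or date.today()
    BACKUP_DIR.mkdir(parents=True, exist_ok=True)
    for client, port in REPLICAS.items():
        out = snapshot_path(client, day)
        # Stream mysqldump's output straight into a compressed file.
        with subprocess.Popen(dump_command(port), stdout=subprocess.PIPE) as proc, \
                gzip.open(out, "wb") as gz:
            shutil.copyfileobj(proc.stdout, gz)
        # Popen's context manager has waited for the process by now.
        if proc.returncode != 0:
            out.unlink()  # don't keep a truncated snapshot
            raise RuntimeError(f"dump failed for {client}")


if __name__ == "__main__":
    snapshot_all()
```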

    1. Re:Central server by Noah+Haders · · Score: 2

      Would this centralized server be your universal remote server?

    2. Re:Central server by danknight48 · · Score: 1

      Would this centralized server be your universal remote server?

      Is this a serious question? lol

      It would be a Desktop PC, constantly mobile. Works only in 64bit with local mouse and keyboard inputs.

    3. Re:Central server by mlts · · Score: 1

      I would elaborate on that a bit. I would have in the colo facility a Cisco ASA or other hardened appliance, and use that for the VPN connection.

      I would then build a hardened server that accepts what the parent mentions (SNMP traps, and syslog over both TCP and UDP), and I would recommend a tool like Splunk or something similar to handle it. Splunk has served me well in my dealings. Once that is in place, I'd set up Splunk forwarders on critical machines for more detailed monitoring.
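
      To show what "accepting syslog" involves at the lowest level, here is a minimal UDP listener that decodes the syslog priority field (PRI = facility * 8 + severity). It is a sketch for understanding only and not a substitute for Splunk or rsyslog. The port is an arbitrary unprivileged choice, and TCP syslog would need a separate stream listener.

```python
import re
import socketserver

# Syslog messages start with "<PRI>", where PRI = facility * 8 + severity.
PRI_RE = re.compile(rb"^<(\d{1,3})>")
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def parse_pri(message):
    """Return (facility, severity_name, rest) for a raw syslog datagram."""
    match = PRI_RE.match(message)
    if not match:
        return None, None, message
    pri = int(match.group(1))
    return pri // 8, SEVERITIES[pri % 8], message[match.end():]


class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self):
        data, _sock = self.request  # for UDP servers: (bytes, socket)
        facility, severity, rest = parse_pri(data)
        # Placeholder action: print. A real collector would index or alert.
        print(self.client_address[0], facility, severity,
              rest.decode(errors="replace").strip())


if __name__ == "__main__":
    # 514 is the standard port but needs root; 5514 is an arbitrary stand-in.
    with socketserver.UDPServer(("0.0.0.0", 5514), SyslogHandler) as server:
        server.serve_forever()
```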

      From there, I'd create a dashboard for realtime reporting, and a daily report detailing notable events from the past 24 hours. One can customize this to their liking. You can even have the reports mailed to you via the VPN to an internal site.

      The Splunk server will need to be locked down, but if one is in IT, this is an assumed part of the skillset. I would at least leave SELinux enabled, enroll the Splunk server's SSL key in your PKI, and for the OS, enable SSH keys and two factor authentication. I might even consider placing the Splunk indexes on an encrypted filesystem so if the hardware is physically stolen, the data on your machines is protected.

      Again, the thing to be careful about is the fact that so much sensitive data is on this machine, so it needs a separate firewall, and the box itself needs to be hardened.

  4. NSA by Anonymous Coward · · Score: 1

    Should ask them!

  5. I just discovered NewRelic ... by WayneDV · · Score: 4, Interesting

    Check out www.newrelic.com - even their free service tier offers great features and it's easy to deploy on all servers

    1. Re:I just discovered NewRelic ... by astro · · Score: 3, Informative

      NewRelic is pretty sweet, as the parent says, even at the free tier. They will definitely bombard your email and phone with hard-sales pitches, though, and there's a giant cost leap from free to the next tier.

    2. Re:I just discovered NewRelic ... by Anonymous Coward · · Score: 1

      We have NewRelic deployed and pay for it. It is worth the money for us because we not only get the "Is it up" but get to see the software stack interact with the hardware. We had one client who had feature creep and we watched their VM start to die because of memory creep and it justified putting them on another box. When we showed them the reports, they were quite happy to write a check for the upgrade.

    3. Re:I just discovered NewRelic ... by Anonymous Coward · · Score: 0

      Thanks, I'm taking NewRelic off the list. My time is too valuable to deal with salesdroids every other minute. Other ideas?

    4. Re:I just discovered NewRelic ... by Anonymous Coward · · Score: 0

      We evaluated www.newrelic.com along with a couple of other similar services. www.copperegg.com was our pick. Not free, but easily manageable cost wise...and it's easy to ramp up & down when needed.

    5. Re:I just discovered NewRelic ... by Anonymous Coward · · Score: 0

      A slashdotter knows better than to fall victim to ad spammers. It's faster to write a couple of lines of code to accomplish what OP requested than to fill in some garbage company's registration forms and figure out whether their proprietary software can even be trusted.

    6. Re:I just discovered NewRelic ... by WayneDV · · Score: 1

      To that point ... I installed it on 11 servers during the 14-day "Pro" trial period. A sales guy contacted me by email and we exchanged 3 emails, since I will be subscribing in the future, but when I told him that I'm happy with the free tier for now, there was no further push from their side. Since then I've gone up to 30 servers and I'm loving it.

      FYI: Server monitoring is a side product of theirs. Their main product is app stack monitoring - great for finding failures and bottlenecks in PHP, Ruby, Java apps, etc.

  6. Ping? by danknight48 · · Score: 2

    For server active status (e.g. "am I dead?"), inside a while loop, or with sleep() if you can't be bothered:
    for(int i=0;i<MAX_SERVERS;i++)
    {
              IcmpSendEcho(..........); // send one ICMP echo request to server i
    }

    For everything else monitoring related. Employ someone to make a custom monitoring application ,or, Google "server monitoring software".
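
    Raw ICMP from a script usually needs elevated privileges, and many sites block ping at the firewall anyway. A swapped-in alternative is a TCP connect check against a port you know is open. SSH on port 22 is assumed here, and the host names are hypothetical. This is a sketch, not a replacement for a real monitoring tool.

```python
import socket


def is_reachable(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, DNS failure...
        return False


def sweep(servers, port=22):
    """Check every server once; returns {host: reachable}."""
    return {host: is_reachable(host, port) for host in servers}


if __name__ == "__main__":
    # Hypothetical host list; run this from cron or a while/sleep loop.
    for host, ok in sweep(["server1.example.com", "server2.example.com"]).items():
        print(host, "UP" if ok else "DOWN")
```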

    1. Re: Ping? by Anonymous Coward · · Score: 0

      I used to use something like this for my servers.

      You could do something like the following:

      1) Run netcat listening and logging all connections to a file ($nc -L ... > out.log) (this could/should be replaced by a more specifically coded server eventually)

      2) On your servers, maybe in Perl or something, have a script connect to your server every 30 seconds or so, send "OK" or something, and close.

      This will let you check the log file or parse it to see if any servers may be down or if they missed any status reports. They can also send over other information like RAM usage, server load, whatever.

      3) The Perl script (or whatever lang) could, after sending OK, check for any commands from somewhere. Maybe get a response from http://yourserver/status.php and if the status is like "remote-server-name","NEEDACCESS" then the Perl script can remotely connect to you that way via (reverse) ssh or whatever.
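
      The heartbeat part of this can be sketched in Python in place of netcat plus Perl. The "name OK" line format, the port, and the two-missed-reports threshold are all assumptions for illustration. Names are taken on trust here, so a real setup would want authentication.

```python
import socket
import socketserver
import time

INTERVAL = 30   # seconds between reports, per the comment above
LAST_SEEN = {}  # server name -> time of its last "OK"


def send_heartbeat(collector, port, name):
    """Runs on each remote server (e.g. from cron): connect, report, close."""
    with socket.create_connection((collector, port), timeout=10) as sock:
        sock.sendall(f"{name} OK\n".encode())


class HeartbeatHandler(socketserver.StreamRequestHandler):
    """Runs on the monitoring box instead of netcat: record who checked in."""
    def handle(self):
        fields = self.rfile.readline().decode(errors="replace").split()
        if len(fields) == 2 and fields[1] == "OK":
            LAST_SEEN[fields[0]] = time.time()


def missed_reports(last_seen, now, interval=INTERVAL, allowed_misses=2):
    """Names whose last report is older than allowed_misses intervals."""
    cutoff = interval * allowed_misses
    return sorted(name for name, ts in last_seen.items() if now - ts > cutoff)


if __name__ == "__main__":
    # Hypothetical port. missed_reports(LAST_SEEN, time.time()) would be
    # called periodically from another thread or process to raise alerts.
    with socketserver.ThreadingTCPServer(("0.0.0.0", 9999), HeartbeatHandler) as srv:
        srv.serve_forever()
```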

      You could alternatively code this all in C or something; then you don't need netcat and can have a few small interactions between client and server during each status report (that's what I did).

      I didn't explain this all very well (sorry about that) but there are tons of ways of doing this in theory because the concept is quite simple, and it can be quite robust depending on how you implement it.

      That said, these days getting a dedicated software solution for this is probably best, then you can call the software provider if you have any problems. You can also somewhat use them (rightfully) as a scapegoat if you have any problems with your customers ("sorry, software x is having some problems, they are working to fix it now.")

    2. Re:Ping? by Enry · · Score: 3, Informative

      For some reason, disabling ping is considered a security feature, so a lot of places block it at the firewall. Cloud services (I'm looking at you, Azure) also either don't allow it or can't do it.

    3. Re:Ping? by Anonymous Coward · · Score: 0

      For some reason, disabling ping is considered a security feature, so a lot of places block it at the firewall. Cloud services (I'm looking at you, Azure) also either don't allow it or can't do it.

      Ye olde "security by obscurity". Most scripts won't portscan what they don't know exists, and ping is where things start. You can't even "pass" Gibson's firewall portscan test when all your ports are stealthed if your router returns a ping. If that's the masters' thinking, then what are we mere mortals to do when everyone else above us follows misleading leads?

      Haha, captcha: "Probed"

  7. reverse ssh by Marqis · · Score: 1

    Have the clients connect with ssh to your server and open a reverse port. They'll each have to pick a different port on your server.

    Use something like autossh (http://www.harding.motd.ca/aut...) to make sure the ssh connection is always open.

    Having said all that, sounds like a great security hole if your server is ever breached. Plus lots of potential privacy violations.

    Marqis

    1. Re:reverse ssh by aheath · · Score: 2

      I agree that this creates a huge security hole that could compromise the privacy of all of the employees at 500 companies. The consequences of a breach might be worse if there is a connection between his servers and a payroll system or any point-of-sale system. I also wonder whether his clients are willing to open up the ports required to support remote access to their data centers.

    2. Re:reverse ssh by Anonymous Coward · · Score: 0

      Yeah I suppose any central control solution is going to be a great big security hole. Do you think dashboards and automation might be security holes, too? I wonder if the login shell is a security hole, holy crap it just might be.

    3. Re:reverse ssh by Anonymous Coward · · Score: 0

      Read the damned comments you are replying to. A reverse-SSH tunnel would not require opening any ports at the client site. Maybe you should figure out what a reverse-SSH tunnel is?

  8. Keeping track.. by Rigel47 · · Score: 1

    500 OpenVPN connections is going to be a bit of a headache to keep straight. Obviously you won't have 500 tun devices so it'll be a multi-client to server config. You'll need a means of knowing that 10.20.20.x is client x and 10.20.20.y is client y. Of course OpenVPN allows you to do this but maintaining that table by hand could be a bit of a pain.

    HTTPS solutions like NewRelic aren't an option because you want to be able to ssh back into the host.

    Assuming all clients will allow it, the only approach I can think of is an out-of-band registration process whereby the clients do something like an HTTPS POST to a URL you manage. The POST would contain some identifying information, which your system would then use to configure a new OpenVPN client config.

    1. Re:Keeping track.. by fearlezz · · Score: 2

      You'll need a means of knowing that 10.20.20.x is client x and 10.20.20.y is client y. Of course OpenVPN allows you to do this but maintaining that table by hand could be a bit of a pain.

      You mean like the common name of the SSL certificate used to connect in the first place? Combine this with a client-connect script to update DNS and/or the ifconfig-pool-persist option and you've got a great solution.
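
      A sketch of such a client-connect hook, assuming the server config contains "script-security 2" and a client-connect line pointing at the script. OpenVPN passes the certificate name in the common_name environment variable and the tunnel address in ifconfig_pool_remote_ip. HOSTS_FILE is a hypothetical hosts-format file that a local DNS server could be configured to read.

```python
#!/usr/bin/env python3
# Hypothetical OpenVPN client-connect hook: keep a name -> VPN IP mapping.
import os
import sys
from pathlib import Path

HOSTS_FILE = Path("/etc/openvpn/clients.hosts")  # hypothetical location


def update_hosts(text, name, ip):
    """Return hosts-file text with `name` mapped to `ip`, dropping any
    older line for the same name."""
    kept = [line for line in text.splitlines()
            if line.strip() and line.split()[1:2] != [name]]
    kept.append(f"{ip} {name}")
    return "\n".join(kept) + "\n"


def main():
    # Both variables are set by OpenVPN for client-connect scripts.
    name = os.environ["common_name"]
    ip = os.environ["ifconfig_pool_remote_ip"]
    old = HOSTS_FILE.read_text() if HOSTS_FILE.exists() else ""
    HOSTS_FILE.write_text(update_hosts(old, name, ip))
    return 0  # a non-zero exit status makes OpenVPN reject the client


if __name__ == "__main__":
    sys.exit(main())
```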

      --
      .sig: No such file or directory
    2. Re:Keeping track.. by dskoll · · Score: 2

      Managing the OpenVPN connections is not that bad. You give each client its own key and certificate and you use OpenVPN's ccd/ directory to assign VPN IP addresses.
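
      Generating the ccd/ files can be scripted. This sketch assumes "topology subnet" and client-config-dir in the server config, so each per-client file holds an "ifconfig-push <ip> <netmask>" line. The subnet and offset are arbitrary. The numeric client ids must stay stable across runs (for example, taken from a registration database) or clients will change addresses.

```python
import ipaddress
from pathlib import Path

# Assumed server config: "topology subnet" and "client-config-dir /etc/openvpn/ccd".
VPN_NET = ipaddress.ip_network("10.8.0.0/16")  # arbitrary choice
FIRST_OFFSET = 10  # leave the low addresses for the server itself
CCD_DIR = Path("/etc/openvpn/ccd")


def ccd_entry(client_id, net=VPN_NET):
    """ifconfig-push line giving client number `client_id` a fixed address."""
    ip = net.network_address + FIRST_OFFSET + client_id
    if ip not in net or ip == net.broadcast_address:
        raise ValueError("VPN subnet exhausted")
    return f"ifconfig-push {ip} {net.netmask}\n"


def write_ccd(assignments, ccd_dir=CCD_DIR):
    """assignments: {certificate common name: stable numeric client id}.
    OpenVPN looks for a file named exactly after the common name."""
    ccd_dir.mkdir(parents=True, exist_ok=True)
    for common_name, client_id in assignments.items():
        (ccd_dir / common_name).write_text(ccd_entry(client_id))
```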

      We use the following tools to monitor our servers, but we're only monitoring about 30, not 500:

      • OpenVPN for accessing the remote servers. SSH if we need to log on to the server to do something. Some of our more important servers include built-in KVM-over-IP ability which can be very handy if the OS locks up.
      • Xymon (formerly known as Hobbit) for monitoring the health of remote servers. We include some custom Xymon plugins to monitor SNMP variables. I find Xymon much easier to configure than Nagios, though it's not quite as flexible.
      • Munin for tracking performance and ensuring we have baseline data.

      I'm not sure how well this would scale to 500 boxes, though Xymon claims to be able to monitor "lots of systems".

    3. Re:Keeping track.. by Rigel47 · · Score: 1

      Right, but how do you know which connection belongs to which client without setting it all up by hand? Presumably he'll have to initiate the connection via script or manually the first time, so I suppose that's the proper time to build out the mapping.

    4. Re:Keeping track.. by Anonymous Coward · · Score: 0

      This is not difficult. Create a MariaDB table with columns for MAC address, public IP, first check-in, and latest check-in.

      Create a REST API that accepts a MAC address and hands out a local tunnel port. This API checks to see if the device is already assigned a port, and if so, it hands it back out. Otherwise it assigns a new one in a range of your choosing.
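
      The allocation logic behind that API might look like the following sketch. sqlite3 stands in for MariaDB so the example is self-contained. The schema mirrors the columns suggested above, PORT_RANGE is an arbitrary choice, and the HTTP layer around checkin() is left out.

```python
import sqlite3
import time

PORT_RANGE = range(20000, 30000)  # arbitrary tunnel port range

SCHEMA = """
CREATE TABLE IF NOT EXISTS devices (
    mac TEXT PRIMARY KEY,
    public_ip TEXT,
    tunnel_port INTEGER UNIQUE,
    first_checkin REAL,
    latest_checkin REAL
)"""


def open_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(SCHEMA)
    return db


def checkin(db, mac, public_ip, now=None):
    """Return the tunnel port for `mac`, assigning the next free one if new."""
    now = time.time() if now is None else now
    with db:  # one transaction: commit on success, roll back on error
        row = db.execute("SELECT tunnel_port FROM devices WHERE mac = ?",
                         (mac,)).fetchone()
        if row:
            db.execute("UPDATE devices SET public_ip = ?, latest_checkin = ? "
                       "WHERE mac = ?", (public_ip, now, mac))
            return row[0]
        (highest,) = db.execute("SELECT MAX(tunnel_port) FROM devices").fetchone()
        port = PORT_RANGE.start if highest is None else highest + 1
        if port not in PORT_RANGE:
            raise RuntimeError("tunnel port range exhausted")
        db.execute("INSERT INTO devices VALUES (?, ?, ?, ?, ?)",
                   (mac, public_ip, port, now, now))
        return port
```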

      This API needs to be capable of modifying ~/.ssh/config to create aliases for your remote devices. You can build a little simple dashboard that allows you to "capture" devices that have phoned home, and configure their details, like a hostname/alias.

      This should take you about 4 hours to accomplish.

      If you want to do provisioning over the link, the REST API should also modify /etc/ansible/hosts. That's all you need, you can then run playbooks like so:

      # ansible-playbook provisioning.yml --extra-vars "target=my_alias"

      Done.
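
      Rather than editing ~/.ssh/config and /etc/ansible/hosts in place, one option is to regenerate both from the devices table each time. The SSH side could live in a separate file pulled in with OpenSSH's Include directive (available in newer OpenSSH releases). The "root" user is a hypothetical default. ansible_host, ansible_port and ansible_user are standard inventory variables.

```python
def ssh_config_stanza(alias, port, user="root", host="localhost"):
    """ssh_config block so that `ssh <alias>` goes through the reverse tunnel."""
    return (f"Host {alias}\n"
            f"    HostName {host}\n"
            f"    Port {port}\n"
            f"    User {user}\n")


def inventory_line(alias, port, user="root", host="localhost"):
    """INI-style Ansible inventory entry for the same device."""
    return f"{alias} ansible_host={host} ansible_port={port} ansible_user={user}\n"


def render(devices):
    """devices: iterable of (alias, tunnel_port) pairs, e.g. from the DB.
    Returns (ssh_config_text, inventory_text)."""
    devices = sorted(devices)
    ssh_config = "\n".join(ssh_config_stanza(a, p) for a, p in devices)
    inventory = "".join(inventory_line(a, p) for a, p in devices)
    return ssh_config, inventory
```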

    5. Re:Keeping track.. by Anonymous Coward · · Score: 0

      In order to support Ansible provisioning make sure your remote devices are imaged with the ansible and python-apt packages installed.

    6. Re:Keeping track.. by dbraden · · Score: 2

      There's no need to install Ansible on the remote systems, only on the machine running the playbooks. All Ansible activity is run over SSH and has no remote dependencies.

    7. Re:Keeping track.. by Anonymous Coward · · Score: 0

      python-apt is a dependency for package installation/dist-upgrades. I've seen many playbooks fail on this.

    8. Re:Keeping track.. by mlts · · Score: 1

      I personally have used Xymon with more than that many systems. It takes time to classify them, but it is doable.

      The price is right on Xymon; however, if I were to recommend a monitoring solution for both real-time "oh shit" monitoring, such as a drive array about to fail, and a historical log (for security and finding a baseline), I'd go with Splunk if possible, due to the tools available and the fact that you can send management-friendly reports about the health of the enterprise up the chain.

      Again, a monitoring server is one of the most sensitive boxes you can have (and usually one that isn't secure), so take the time to harden it and do it right.

  9. PRTG by Anonymous Coward · · Score: 0

    For monitoring, check out PRTG (http://www.paessler.com). I think I'd purchased it by the second or third day of the free trial. Nagios worked but we spent a lot of time fiddling with it.

    1. Re:PRTG by chipperdog · · Score: 2

      NAV has very similar functionality to PRTG, but is completely open source.

    2. Re:PRTG by chipperdog · · Score: 1

      Network Administration Visualized is a good alternative to PRTG

  10. observium.org works well for me. by Anonymous Coward · · Score: 0

    ditto.

  11. Or you could by kilodelta · · Score: 1

    Just download JFFNMS - it's a network monitoring system more than capable of watching 500+ servers. It can also be configured to do email and text alerting. It monitors CPU, memory, disk, etc. It's pretty much the open source version of Nagios.

    1. Re:Or you could by Idimmu+Xul · · Score: 3, Informative

      Nagios is Open Source.. GPL V2 specifically..

      --
      The problem with slashdot is that most of its users were bullied and stuffed into lockers as kids!
    2. Re:Or you could by Anonymous Coward · · Score: 1

      Actually, forget Nagios. Lately it has been turning into an NIH-syndrome/copyright/ego-clash fight. Go with Shinken instead: a drop-in replacement for Nagios that scales and does not have childish problems.

    3. Re:Or you could by Anonymous Coward · · Score: 0

      I would like to mention the excellent Nagios frontend/extension Check_MK, which can be found at http://mathias-kettner.com/che... .
      It is 100% GPL and comes with agents for all major OSes.
      You can even set up multiple instances and feed them into your main installation.
      It takes some time to set up properly, but once you're finished, you have yourself an excellent monitoring solution.

  12. Security? by Anonymous Coward · · Score: 0

    Nagios monitoring servers over OpenVPN? Works like a charm.

    BUT before you set this up, be damn sure that you don't punch a hole in your customers' firewalls by having a VPN to your monitoring server. Having 500+ VPN connections from one Linux box to servers located in customers' internal networks might backfire at some point if it's implemented incorrectly.

    I did something similar by having a cron script on the server in the customer's network POST some statistics over HTTPS to my server. The firewall in the customer's network blocked pretty much everything else. On my end it was relatively easy to get the periodically received statistics from 100+ servers into Nagios.
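
    A sketch of both halves of that setup, under these assumptions: COLLECTOR_URL is hypothetical, the stats are just load averages (Unix-only), and the receiving side turns each report into a Nagios passive check result. That result is a PROCESS_SERVICE_CHECK_RESULT external command, which would be written to Nagios' command file. It also assumes a passive service named "Load" is defined for each host.

```python
import json
import os
import time
import urllib.request

COLLECTOR_URL = "https://monitor.example.com/stats"  # hypothetical endpoint


def collect_stats():
    """A few cheap numbers; os.getloadavg() and os.uname() are Unix-only."""
    load1, load5, load15 = os.getloadavg()
    return {"host": os.uname().nodename, "time": int(time.time()),
            "load1": load1, "load5": load5, "load15": load15}


def post_stats(stats, url=COLLECTOR_URL):
    """Client side (run from cron): POST the stats as JSON over HTTPS."""
    req = urllib.request.Request(url, data=json.dumps(stats).encode(),
                                 headers={"Content-Type": "application/json"},
                                 method="POST")
    with urllib.request.urlopen(req, timeout=15) as resp:
        return resp.status


def nagios_passive_result(stats, warn=4.0, crit=8.0):
    """Server side: turn one report into a Nagios external command line.
    Return codes follow the plugin convention 0=OK, 1=WARNING, 2=CRITICAL."""
    load = stats["load5"]
    code = 2 if load >= crit else 1 if load >= warn else 0
    label = ["OK", "WARNING", "CRITICAL"][code]
    return (f"[{stats['time']}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{stats['host']};Load;{code};{label} - load5 is {load:.2f}")


if __name__ == "__main__":
    post_stats(collect_stats())
```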

    1. Re:Security? by Anonymous Coward · · Score: 1

      "BUT before you set this up, be damn sure that you don't punch a hole in your customers' firewalls by having a VPN to your monitoring server. Having 500+ VPN connections from one Linux box to servers located in customers' internal networks might backfire at some point if it's implemented incorrectly."

      Just disable client-to-client in the OpenVPN server (which routes all activity through the tun device) and set up iptables to accept only established (reply) traffic coming in on the tun device. Only allow the server to create new OUTPUT connections on the tun device (and only to ssh/snmpd/nagios-nrpe, for example).

      Use certificates with a decent parsable name to figure out which client is where. Configure static ips for the clients to make it easy.
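
      The firewall policy described above could be generated as iptables commands. This sketch only prints them, so they can be reviewed before being applied. The interface name tun0 and the port list (ssh 22/tcp, snmpd 161/udp, NRPE 5666/tcp) are assumptions. Rule order matters because iptables evaluates -A rules top to bottom.

```python
TUN = "tun0"  # hypothetical interface name
ALLOWED_OUT = [("tcp", 22), ("udp", 161), ("tcp", 5666)]  # ssh, snmpd, nrpe


def tunnel_rules(tun=TUN, allowed=ALLOWED_OUT):
    """iptables commands for the monitoring server's VPN interface:
    clients may only answer connections that the server opened."""
    rules = [
        f"iptables -A INPUT -i {tun} -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT",
        f"iptables -A INPUT -i {tun} -j DROP",
        f"iptables -A FORWARD -i {tun} -j DROP",  # no client-to-client via the kernel
    ]
    for proto, port in allowed:
        rules.append(f"iptables -A OUTPUT -o {tun} -p {proto} --dport {port} "
                     "-m conntrack --ctstate NEW,ESTABLISHED -j ACCEPT")
    rules.append(f"iptables -A OUTPUT -o {tun} -m conntrack --ctstate ESTABLISHED -j ACCEPT")
    rules.append(f"iptables -A OUTPUT -o {tun} -j DROP")
    return rules


if __name__ == "__main__":
    print("\n".join(tunnel_rules()))
```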

    2. Re:Security? by Anonymous Coward · · Score: 0

      And of course, set up firewalls on the clients to restrict access/forwarding.

  13. 2X works by Anonymous Coward · · Score: 0

    2X works

  14. PRTG by Anonymous Coward · · Score: 0

    You can install remote probes and monitor any number of things, not only SNMP (Apache's server-status page restricted to your PRTG server's IP, for example, is great).

    www.paessler.com/PRTG

  15. Saltstack by Anonymous Coward · · Score: 0

    Saltstack is a framework designed to accomplish precisely this kind of thing. They don't quite have the fancy dashboards yet, but it has the remote control and the framework for monitoring all there. It's free, open, cross-platform, and documented. It's lightweight and doesn't require any VPNs, and it scales to any size.

    Learn about it here:
    http://docs.saltstack.com/en/latest/

  16. Spiceworks, OpManager by Anonymous Coward · · Score: 0

    Spiceworks is free and easy to set up and maintain. You get some pretty good reporting and monitoring capabilities. It's not as robust as New Relic; we use NR for our "real systems" and SW for our internal servers/desktops. NR really shines at monitoring applications. SW is definitely more of an inventory tool, but you get the basics: heartbeat, RAM, and hard drive stats. I've also used OpManager. SNMP is good if you are able to get into the network, but if you aren't, you probably want something that has an agent. SW has a remote agent that collects stats and sends them to a central point over HTTP or HTTPS, and it is pretty much made for Windows, so depending on the server OS at the client site it may be a good fit. SSH is good for the connections; just use fingerprints and keys for authentication and plan to swap out the keys three or four times a year.

    Hope that helps.

  17. Openvpn and x11vnc by Wycliffe · · Score: 2

    I do something similar. I use openvpn and x11vnc. I have a cron job on each client that runs a small Perl script that grabs the output of several programs like top, uptime, and sensors, and then saves the results in an easy-to-parse file that my server periodically grabs, so that I have stuff like CPU temperature, CPU usage, memory usage, etc. I also grab a screenshot through x11vnc using vnccapture. I also have a way to remotely activate reverse ssh if for some reason openvpn fails. My only problem with openvpn is key management. Creating and distributing unique keys to each client is kind of a pain.
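
    A swapped-in variant of that cron script, assuming Linux: it reads /proc directly instead of scraping top and uptime output, and writes a JSON file for the server to fetch. OUTPUT is a hypothetical path. CPU temperature is omitted here, since it would come from sensors or /sys/class/thermal depending on the hardware.

```python
import json
import time
from pathlib import Path

OUTPUT = Path("/var/tmp/status.json")  # hypothetical file the server fetches


def parse_loadavg(text):
    """/proc/loadavg, e.g. '0.12 0.34 0.56 1/234 5678' -> three floats."""
    return [float(x) for x in text.split()[:3]]


def parse_meminfo(text):
    """/proc/meminfo lines like 'MemTotal:  16314956 kB' -> {key: number}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key] = int(fields[0])
    return info


def parse_uptime(text):
    """/proc/uptime: 'seconds_up seconds_idle' -> seconds up."""
    return float(text.split()[0])


def snapshot(proc=Path("/proc")):
    mem = parse_meminfo((proc / "meminfo").read_text())
    return {
        "time": int(time.time()),
        "load": parse_loadavg((proc / "loadavg").read_text()),
        "uptime_s": parse_uptime((proc / "uptime").read_text()),
        "mem_total_kb": mem.get("MemTotal"),
        "mem_available_kb": mem.get("MemAvailable"),  # absent on very old kernels
    }


if __name__ == "__main__":
    OUTPUT.write_text(json.dumps(snapshot()))
```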

  18. I'm sure by Anonymous Coward · · Score: 0

    I'm sure that there's a Microsoft Solution for you.

    Now, let's discuss the licensing terms . . .

  19. Access denied by Anonymous Coward · · Score: 0

    We are not going to allow a permanent remote access into our network.

  20. There are plenty by Anonymous Coward · · Score: 0

    But there is really only one.

    www.n-able.com.

    It is not free but is designed to manage thousands of nodes. If you are looking for free then you really need some more experience and to change your mindset.

    1. Re:There are plenty by Anonymous Coward · · Score: 0

      IMO, the free tools are just as good as N-able's tools.

  21. Xymon/Hobbit to the rescue by Anonymous Coward · · Score: 0

    I tried out multiple options, but Hobbit (now called Xymon) fit the bill. It's simple enough but also has the features I wanted. I found some of the other systems felt like I was configuring the system more than just monitoring the servers.

  22. Hopefully this goes without saying by 93+Escort+Wagon · · Score: 3, Insightful

    Make damn sure your clients are aware of exactly what you're doing. They probably don't care about the specifics (e.g. openvpn, reverse ssh); but they need to know you can remotely access the boxes.

    It's probably a good idea to have some sort of document to give them that does spell out all the specifics - something they need to acknowledge/sign, with both of you keeping copies.

    --
    #DeleteChrome
    1. Re:Hopefully this goes without saying by Anonymous Coward · · Score: 0

      By asking this type of question, with its "roll your own" flavor, I would say that, ipso facto, no it does not go without saying.

      Furthermore, he can't seem to make up his mind about whether this is SaaS or licensed software. If you are selling this as a service then why do they have their own servers living in their network? If not, why can you remotely administer their servers?

      As far as I can tell, the only reason this wouldn't be a cloudy SaaS-y thing is that his clients are concerned about security/employee/payroll data and thus want it to be stored on servers in their possession and control. If that's the case, why the hell would they allow you to remotely administer them?

      Pick one model or the other, or at least tell us your software product name so we can strongly exhort our purchasing managers to stay the hell away.

    2. Re:Hopefully this goes without saying by dskoll · · Score: 4, Informative

      Actually, the model of remotely-managed on-premise appliances is not that crazy. Assuming it's done securely, you get the best of both worlds:

      If the customer's Internet access goes down, they're not dead in the water as they would be with a cloud solution.

      If you manage everything for them, then the box is completely hands-off... just like a cloud solution.

      There's an entire business category called "Managed Service Providers" whose vendors do exactly this: Remotely manage all aspects of your IT infrastructure so you don't need to worry about anything. For mom-and-pop non-technical businesses, it's an excellent model.

    3. Re:Hopefully this goes without saying by Anonymous Coward · · Score: 0

      So, instead of a multiply redundant cloud based system you are trading that for a single point of failure that is on locally-tended hardware? And this is supposed to be a reliability value-add?

      What's more likely to happen: the loss of access to Amazon cloud services/internet, or a local box getting cacked because it's a single point of failure running on a single physical machine and you have to roll out your software upgrades to n sites, each with their own quirky firewall rules and local net admins?

    4. Re:Hopefully this goes without saying by dskoll · · Score: 2

      The fact that a well-managed cloud service is multiply-redundant is of little consolation if your crappy DSL line goes down for 6 hours and your salespeople cannot access the CRM tool.

      What's more likely to happen: the loss of access to Amazon cloud services/internet, or a local box getting cacked

      Unequivocally for us: Loss of Internet access happens far more often than a server failure.

    5. Re:Hopefully this goes without saying by Anonymous Coward · · Score: 0

      I guess I'm more used to shitty software running on servers that needs to be kicked and/or remotely administered daily. Especially when we're talking about some guy coding this out of his basement and now asking how best to run rampant inside his customers' networks.

      You should get your DSL looked at.

    6. Re:Hopefully this goes without saying by dskoll · · Score: 1

      Our DSL is not particularly unreliable. However, our servers are spectacularly reliable. They run Linux on decent hardware and we almost never have a server failure. Our most common cause of a server failure over the last 10 years has been power failures long enough for the UPS to decide we'd better shut down.

  23. OpenVPN + Nagios by 8083 · · Score: 1

    is the solution I use, and it is working well. Routers are 1U mini-ATX boards with pfSense. Nagios mostly with NRPE, and SNMP for devices on which I can't install packages. It has worked well for the last ... 8 years or so.
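
    NRPE simply runs a local command on the client and relays its output line and exit code to Nagios. A minimal plugin following that convention might look like this. The disk metric and the 80%/90% thresholds are assumptions. On the client it would be registered in nrpe.cfg with a line such as command[check_disk]=/usr/local/bin/check_disk.py /, where the path is hypothetical.

```python
#!/usr/bin/env python3
"""Minimal Nagios-style disk check. Exit codes follow the plugin
convention: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN."""
import shutil
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(used_pct, warn, crit):
    if used_pct >= crit:
        return CRITICAL
    if used_pct >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=80.0, crit=90.0):
    """Return (exit_code, output_line) for the filesystem holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    used_pct = 100.0 * usage.used / usage.total
    code = classify(used_pct, warn, crit)
    # Text after "|" is performance data that Nagios add-ons can graph.
    return code, (f"DISK {LABELS[code]} - {path} {used_pct:.1f}% used"
                  f" | used_pct={used_pct:.1f}%;{warn};{crit};0;100")


if __name__ == "__main__":
    code, line = check_disk(*sys.argv[1:2])  # optional path argument
    print(line)
    sys.exit(code)
```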

    1. Re:OpenVPN + Nagios by sirsnork · · Score: 1

      Icinga rather than Nagios... always... the simple basic changes in Icinga make it so much nicer to work with, even the v1 branch, which is just a fork with some updates.

      --

      Normal people worry me!
  24. zabbix is NOT an snmp manager by TheGratefulNet · · Score: 2

    not really. snmp is an afterthought for them and it's clumsy as hell to add snmp to it. I tried and gave up. instead, I picked hobbit (uhm, the new name is 'xymon').

    xymon has its quirks but it was not hard to modify to add more snmp features to, and its code was not too bad to get thru. it's not written in a lot of 'strange' languages, and that's a plus, to me, too.

    personally, I usually just write snmp code fresh, from scratch, using net-snmp mgr tools. it's not hard, and you get just what you want without getting bogged down in lots of 'infrastructure' that someone else thought was good but is useless to you (like zabbix).
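
    A small illustration of the "just use the net-snmp tools" approach: a wrapper around the snmpget command-line tool that polls two standard MIB-II objects. The community string "public" and the VPN addresses are placeholders. -Oqv asks snmpget to print only the value.

```python
import subprocess

# Standard MIB-II scalars.
SYS_UPTIME = "1.3.6.1.2.1.1.3.0"  # sysUpTime.0
SYS_NAME = "1.3.6.1.2.1.1.5.0"    # sysName.0


def snmpget_command(host, oid, community="public", version="2c", timeout=2):
    # -Oqv: print only the value, without OID or type prefix.
    return ["snmpget", "-v", version, "-c", community, "-t", str(timeout),
            "-Oqv", host, oid]


def snmp_get(host, oid, **kwargs):
    """Return the value as a string, or None if the agent didn't answer
    (or snmpget isn't installed)."""
    try:
        result = subprocess.run(snmpget_command(host, oid, **kwargs),
                                capture_output=True, text=True)
    except OSError:
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    for host in ["10.8.0.10", "10.8.0.11"]:  # hypothetical VPN addresses
        print(host, snmp_get(host, SYS_NAME), snmp_get(host, SYS_UPTIME))
```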

    --
    "It is now safe to switch off your computer."
  25. BMC Patrol by snowsnoot · · Score: 1

    Excellent monitoring solution; it can generate KPI-based reports and email/SMS/SNMP notifications, comes with a bunch of out-of-the-box server monitoring modules, and you can build your own with scripts or SNMP GETs. I swear by it.

    1. Re:BMC Patrol by Anonymous Coward · · Score: 0

      This was years ago, but I walked away from the highest paying job I ever had working on Patrol as a contractor because it was such an extreme whip-me, beat-me, make-me-write-bad code situation. I've had some recent confirmation that they haven't changed their approach. It was practically designed to leak memory, which is particularly bad for server agents. I find it hard to believe that there is a satisfied customer out there. Maybe they eventually got enough lipstick on the pig or re-wrote the crap they made us write.

    2. Re: BMC Patrol by snowsnoot · · Score: 1

      I'm not using the agents :)

  26. KISS by Anonymous Coward · · Score: 0

    1) Set up your own VPN server elsewhere, reachable on the net. Make sure OpenVPN (or the like) supports client isolation/encapsulation, and be ready to enforce it with your firewall (at the very least) or some authentication besides the VPN certificates.
    2) Deploy to each client's machine its own VPN access, and let them connect at boot and reconnect if the link goes down.
    3) Set up a .php script or the like to read the VPN active user database, display it nicely, and place it on your VPN server's webserver (with authentication, of course).

    If a host is up you know it, and you can use the very same vpn to reach each single client. Anyway, each client can't and won't reach anything besides your vpn server. The same VPN should help with nagios, collectd or anything in between.
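
    The "read the VPN active user database" step could be sketched like this, in Python rather than PHP. It assumes the server writes a status file with "status-version 2", whose CLIENT_LIST lines are comma-separated. Only the first four fields are used, because later columns differ between OpenVPN versions. The list of expected clients is hypothetical, and the web page around it is left out.

```python
import csv
import io


def connected_clients(status_text):
    """Parse an OpenVPN status file written with "status-version 2".
    Returns {common_name: (real_address, virtual_address)}."""
    clients = {}
    for row in csv.reader(io.StringIO(status_text)):
        # Only the first four fields are relied on here.
        if len(row) >= 4 and row[0] == "CLIENT_LIST":
            clients[row[1]] = (row[2], row[3])
    return clients


def report(expected, status_text):
    """Any expected client missing from the connected list is reported DOWN."""
    connected = connected_clients(status_text)
    return {name: ("UP" if name in connected else "DOWN")
            for name in sorted(expected)}
```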

  27. Kaseya? by Anonymous Coward · · Score: 0

    An agent on each box creates outbound connections to your central server...
    I know of a bunch of little IT shops that use it, so it's not overly expensive... $2.50 or less per agent.

  28. RHQ (JOPR/JON) by Anonymous Coward · · Score: 0

    You might take a look at the RHQ project. It can likely do what you need. You install a server part, plus an agent on each client machine. The agent uses various plugins to monitor various aspects of the server ranging from OS parameters (disk space, I/O, CPU usage, memory usage), to specific pieces of software (JBoss, Tomcat, Apache, MySQL, Oracle, etc). You can define alerts to monitor using any of the metrics gathered by the client agents.

    Oh, and it's open-source, though Red Hat would be more than happy to sell you a license with support. https://docs.jboss.org/author/display/RHQ/Getting+Started

  29. Kaseya by Anonymous Coward · · Score: 0

    We have used Kaseya to monitor our servers. It seems like it may be worth looking into for your situation.

  30. I like doing it quick and simple by Anonymous Coward · · Score: 0

    a very basic php script on each remote server that returns a status code, a db on my server with a list of the various urls, and a jquery page that turns the codes into pretty colored lights or something
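
    The polling side of that idea, sketched in Python. The URL list would normally come from the database mentioned above and is hypothetical here, as is the mapping of HTTP status codes to light colors. The jQuery page would just render the resulting dictionary.

```python
import urllib.error
import urllib.request

# Hypothetical list; in the setup above this lives in a database.
STATUS_URLS = {
    "client-a": "https://client-a.example.com/status.php",
    "client-b": "https://client-b.example.com/status.php",
}


def color_for(status):
    """Map an HTTP status (None = no answer at all) to a dashboard light."""
    if status is None:
        return "red"
    if 200 <= status < 300:
        return "green"
    return "yellow"  # the server answered, but not with success


def fetch_status(url, timeout=5):
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx: reachable but unhappy
        return err.code
    except OSError:  # URLError, timeouts, connection resets
        return None


def poll(urls=STATUS_URLS):
    return {name: color_for(fetch_status(url)) for name, url in urls.items()}
```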

  31. SecureLink, Bomgar by Anonymous Coward · · Score: 0

    I was researching this exact situation. Two great remote server support solutions are SecureLink and Bomgar. Both allow access to unattended machines, both have strong auditing for customers that want to track that, and both allow command-line access. SecureLink is cheaper, easier, and more barebones; Bomgar is way more expensive and has a more involved setup, but has a lot more features.

  32. Checkout r-u-on by Anonymous Coward · · Score: 0

    The idea is that you have a public location and an outgoing protocol, which removes the VPN issue.

  33. logic monitor by Anonymous Coward · · Score: 0

    I came across a service called logic monitor last year, though I didn't start to evaluate it until about a month ago. I've been doing custom monitoring (mainly graph/trending stuff) for 15 years, and nothing I've ever seen comes close to what this service offers. The price is really cheap too, with costs starting at $20/server/month (costs go down with volume, of course).

    If you have a vCenter server, for example, you can monitor all of the metrics for all of your VMs on that vCenter server, and it only counts as 1 server. For my org, anyway, it paid for itself in the first few days of using it, since we can consolidate load balancer, firewall, switch, VMware, MySQL, and other metrics into very easy to use dynamic dashboards. They do alerting and reporting too, but I have not had time to mess with them yet.

    The service works on agents, you deploy one or more agents per network segment and they communicate back to their SaaS platform. Then you configure things and view the graphs/dashboards/etc on their platform with your desktop or mobile browser.

    They have another feature that I have not tried that allows you to SSH and remote desktop in through their platform (to their agents which then proxy the connection to the destination system). I suspect ssh wouldn't work for me since all of our servers use ssh keys and I wager the java ssh client they are using doesn't support them (but I haven't tried, don't really need that feature). You can disable this functionality if you wish as well.

    I use it to monitor Sonicwall firewalls, Citrix Netscalers, VMware vCenter (w/600 VMs) & ESX/ESXi hosts, mysql, network switches (SNMP & sFlow), power strips, fibre channel switches, memcache servers, varnish servers, and rabbitmq servers, and I will be adding custom HP 3PAR monitoring soon as well. See more info here - http://www.logicmonitor.com/monitoring/ . I got it mainly for the infrastructure end of things and less for the Linux end of things, though it does that well. Literally probably 20,000 data points a minute are being collected (probably 15k of those are coming from vCenter).

    It is secure, unlike a massive set of OpenVPN connections. Because it is push-based, it doesn't need any VPN, it will keep your clients isolated from each other, and it will allow you to consolidate monitoring across clients (for example, a graph of the top 10 servers by CPU usage across all clients).

    It is way overkill if all you're looking for is simple availability checks. But if you need more sophisticated monitoring, again, I haven't found anything that comes close. I've spent literally thousands of hours working on monitoring stuff over the years, and this product/platform makes it so easy I want to cry. The only knock I have against it is that it is SaaS. I would prefer to host it myself, but since my choice is between using SaaS and using some other product that can't do this, I have to use SaaS.

    I am not affiliated with this company at all. As another poster mentioned, New Relic is good too (I am a customer of theirs as well), though IMO New Relic is more of a developer's platform; their main value-add is real-time code instrumentation. They try to do the ops thing as well; they just aren't as attractive a platform, for me anyway.

    They have a free 2 week trial available as well.

    I've posted only maybe 5 times on slashdot in the past 15 years so I don't have an account, if you wish to get more info from me personally on this you can reach me temporarily anyways at slashdot .@t. linuxpowered .dot. net (I will kill that email address in coming days to avoid spam)

    nate

  34. Zenoss is awesome by Anonymous Coward · · Score: 1

    Zenoss is awesome and as your business scales so can it. Our organization monitors 5000+ servers worldwide in all sorts of places. Zenoss lets you do everything you'd want. Setup notifications for one or more servers, types of errors, and filters within filters. It's a rocking platform and if you're big enough, they'll set it all up for you for a fee.

    1. Re:Zenoss is awesome by Anonymous Coward · · Score: 0

      Yeah been there done that. Tried their software back in 2008 and it appeared nice at first but then sucked major balls.

      ZODB would get corrupted causing things to be missing or lost.
      The "plugin" framework was a terribly complex ZOPE template combined with odd python scripts they force you to use in the plugin for reporting.
      The import/export of servers and monitors never properly worked.

      We even paid for their support and developed free plugins on their website for people, yet it never really matured or panned out. There were excuses about every rebuild, and it was far easier to dump it, write a custom app in 2 months, and be done with it.

      Seriously. Don't fall for this trap.

    2. Re:Zenoss is awesome by Anonymous Coward · · Score: 0

      If you are going to use commercial monitoring software, you may want to take a look at ScienceLogic.

      As far as remote connections go, reverse SSH tunnels are definitely your best bet if you need to forward SSH or any other TCP port.

  35. Stunnel for secure connection. by Serpent6877 · · Score: 1

    I would need more information on the locations. Are they running Linux, Windows, Solaris? I personally use Zenoss for all of my monitoring. It is handling around 1800 devices right now and monitors all aspects of the network and servers. Zabbix uses agents, so you could run the server at your location, and of course the agents connect to it for monitoring. People talk about needing a VPN connection to be safe, but another solution I would use is stunnel for encryption. I do run a large OpenVPN setup as well. With a VPN setup this large, I would look at possibly using Quagga and running RIP. It will be easier to manage all the routes and netblocks.

    --
    When all else fails, hire me!
  36. NAV works great by chipperdog · · Score: 1

    NAV is a great network and server monitoring suite...I have it monitoring much stuff connected over VPN.

  37. Re:Reverse-SSH tunnel phone-home from remote devic by Anonymous Coward · · Score: 0

    I once did this and it worked like a charm. I had a central server via which I established a connection to the remote sites. You don't need to write more than a few lines and add a cronjob to make it functional.
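    Here is a minimal Python sketch of the "few lines plus a cronjob" approach. It assumes OpenSSH on the remote box, key-based login to a dedicated `tunnel` account on a monitoring host, and a unique remote port per site. The host name, user, port, and lock path are all hypothetical. The lock file makes each cron run a no-op while a previous run still holds the tunnel open; when the link drops, ssh exits and the next cron run reconnects.

```python
from __future__ import annotations

import fcntl
import subprocess


def tunnel_command(monitor_host: str, remote_port: int,
                   user: str = "tunnel", local_port: int = 22) -> list[str]:
    """Build an ssh command that exposes this box's sshd on monitor_host:remote_port."""
    return [
        "ssh",
        "-N",                                  # no remote command, forwarding only
        "-o", "BatchMode=yes",                 # key auth only, never prompt
        "-o", "ExitOnForwardFailure=yes",      # exit if the remote port can't be bound
        "-o", "ServerAliveInterval=30",        # probe the link every 30 s ...
        "-o", "ServerAliveCountMax=3",         # ... and drop it after 3 missed replies
        "-R", f"{remote_port}:localhost:{local_port}",
        f"{user}@{monitor_host}",
    ]


def run_if_not_running(cmd: list[str], lock_path: str = "/tmp/phone-home.lock") -> bool:
    """Run the tunnel unless a previous cron invocation still holds it open.

    Returns False immediately if another instance holds the lock; otherwise
    blocks until ssh exits (link dropped) and returns True.
    """
    with open(lock_path, "w") as lock:
        try:
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False
        subprocess.run(cmd, check=False)
        return True
```

    For example, a crontab entry such as `* * * * * /usr/bin/python3 /usr/local/bin/phone_home.py` could call `run_if_not_running(tunnel_command("monitor.example.com", 20001))`. You would then reach that site from the monitoring box with `ssh -p 20001 someuser@localhost`. autossh does the same job if you would rather not maintain a script.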

  38. Look at the ELK Stack by SpzToid · · Score: 1

    The ELK Stack (ElasticSearch, Logstash, Kibana) are great tools for capturing logs from *anything*, indexing and massaging of the data captured, and then offering up visualization, searches, and dashboards (that refresh). Built with Angular.js so the speed happens.

    We could be talkin' web server logs of the NY Times servers, centralized and displaying dashboards in real-time, or maybe 24/7 sensor data streaming from the ocean floor. The ELK Stack can do it.

    First googled citation, and there's plenty more where this came from: http://thepracticalsysadmin.co...

    --
    You can't be ahead of the curve, if you're stuck in a loop.
    1. Re:Look at the ELK Stack by silas_moeckel · · Score: 1

      ELK works, but frankly its defaults do just about nothing. As a stack, sure, it's great, but it needs to be added as an adjunct to a real monitoring system, and it needs useful defaults and/or some sort of add-on repository. The OpenNMS boys are working on shoving RRD data into ES.

      Pretty much, you set up ELK and go "great, my logs are all in one place," but it does nothing by default, nor is it easy to do anything useful with it. Ad hoc searching of logs is great and all, but you're basically replacing ssh cat | grep. Take a common thing like percolating up an alert when a bit of redundant hardware fails and pushing that event into a ticketing system with the correct group and priority: ELK needs a lot of customization to do anything useful there. Sure, you can put a search in a window somewhere and make a human look at it, but that is frankly going back two decades in sysadmin space. Devs seem to like it, but it's pretty much an ad hoc reporting tool for them.

      --
      No sir I dont like it.
  39. Re:Reverse-SSH tunnel phone-home from remote devic by BitZtream · · Score: 4, Insightful

    Or, do the right thing and hire a network admin so someone with a clue is involved.

    If you have to ask this question on slashdot, you need to change the question to something appropriate. Based on exactly what was posted, he doesn't have any idea what his requirements are. He knows the conceptual goals, but not the actual goals or requirements. Unless he is trying to change careers from whatever he is to a full time network infrastructure person he is going to be wasting a lot of time getting a clue. That means time he won't be spending doing whatever his actual job is.

    He needs someone who can look at his actual setup, figure out what actually needs to be monitored, and who knows the appropriate ways to do it.

    Short of multiple Bennett Haselton-length posts and many in-depth discussions, no answer coming from Slashdot, or all of them combined, is going to be useful.

    Everyone here posting solutions has their own, certainly incorrect idea of what he wants but no one actually knows. No one so far has even started by asking the right questions. It's the blind leading the blind at best.

    --
    Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
  40. Re:Reverse-SSH tunnel phone-home from remote devic by Anonymous Coward · · Score: 0

    The reverse-SSH tunnel is the correct way to "phone home". Maintaining a VPN is a shit show.

    Pure Storage does it this way, and they are quite the experts.

  41. Re: Reverse-SSH tunnel phone-home from remote devi by Redbehrend · · Score: 1

    How does he not have it in the first place? Explorer with 500 client servers... As soon as I had 5 servers I set up central mobile monitoring, lol. He needs to hire someone who knows what they are doing, for sure. Google it and the top open source monitor comes up as a start...

  42. GFI MAX by DigiShaman · · Score: 2

    Problem solved. Next topic please.

    http://www.gfimax.com/

    --
    Life is not for the lazy.
  43. Re:Reverse-SSH tunnel phone-home from remote devic by Anonymous Coward · · Score: 0

    Just because you're unfamiliar with networking administration doesn't mean this needs to be blown up into "hire a network guy". That's just ignorance and (I suspect) trying to make yourself sound important on an anonymous message board.

    The solution is not complicated and has been pointed out numerous times in this thread: ping NewRelic, set up the system, and you're done.

    As my granddaddy used to say, if you don't know what you're talking about, it's best to not open your mouth and prove it. So no need to apologize, just take the advice and consider it a lesson learned. Best of luck.

  44. Look no further: www.n-able.com by Anonymous Coward · · Score: 0

    You must check out N-central from N-able. It is really great and gives you all kinds of monitoring features.

    Admin console is web based and agents push data over ssh back to your central server.

    You really must check it out.

  45. Call me silly by Princeofcups · · Score: 1, Insightful

    But shouldn't this have been part of the design BEFORE you rolled out 500 servers?

    --
    The only thing worse than a Democrat is a Republican.
    1. Re:Call me silly by thegarbz · · Score: 1

      I'll bite, and I'll call you silly.

      Many projects evolve over their lifetimes. This isn't just an IT thing. In many cases during the construction / commissioning stage you'll come out of the end with a wishlist of things and features to add in the future. Many such things would be impossibly expensive (both in money and lost time) to add during the project stage, and many projects which demand everything from the very beginning end up turning into an unmanageable behemoth.

      If the primary goal was to get 500 servers operational then adding this after the go-live is perfectly legitimate.

  46. Re: Reverse-SSH tunnel phone-home from remote devi by GTO44 · · Score: 1

    Maybe he should hire someone considering he has 500 servers and he is just now thinking of implementing a monitoring solution. And this board is only anonymous if you post as AC ;) also, fuck your granddaddy

  47. Nagios with NSCA? by z3r0w8 · · Score: 1

    I would write a wrapper, though, to make the whole thing a bit more robust. Groundwork does this with their GDMA agent, and it allows you to centrally configure it and have the client pick up its configuration.
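    With NSCA, the remote box pushes passive check results to the Nagios server instead of Nagios polling it, which suits boxes sitting behind customer firewalls. Below is a rough Python sketch of such a wrapper. It is not the Groundwork GDMA agent. It makes three assumptions:
    - `send_nsca` is installed, with its config at `/etc/send_nsca.cfg`.
    - `send_nsca` reads a tab-separated host, service, return code, and output from stdin. Check this against your NSCA version.
    - The host and service names match passive services defined in Nagios.
    The sketch sends one result per call to keep delimiter handling simple.

```python
from __future__ import annotations

import subprocess

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def passive_result(host: str, service: str, code: int, output: str) -> str:
    """Format one passive service check as host<TAB>service<TAB>code<TAB>output."""
    if code not in (OK, WARNING, CRITICAL, UNKNOWN):
        raise ValueError(f"invalid Nagios return code: {code}")
    # Embedded tabs or newlines would break the record, so flatten them.
    fields = [f.replace("\t", " ").replace("\n", " ") for f in (host, service, output)]
    return f"{fields[0]}\t{fields[1]}\t{code}\t{fields[2]}\n"


def submit(record: str, nagios_host: str, config: str = "/etc/send_nsca.cfg") -> bool:
    """Pipe a single record to send_nsca; True if it exited successfully.

    Raises FileNotFoundError if send_nsca is not installed.
    """
    proc = subprocess.run(
        ["send_nsca", "-H", nagios_host, "-c", config],
        input=record, text=True, capture_output=True,
    )
    return proc.returncode == 0
```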

    --
    -----
  48. That paid product looks like shit by Anonymous Coward · · Score: 0

    That paid product looks like shit

    1. Re: That paid product looks like shit by DigiShaman · · Score: 1

      It does the job and fulfills all the requirements of the OP.

      I use it for this purpose, so I should know. For example, if the Information Store service stops or the drive reaches a free space threshold, I'm going to be notified immediately!

      --
      Life is not for the lazy.
  49. check_mk by Anonymous Coward · · Score: 0

    Check_mk works like a charm. We have over 2,000 servers and 100,000 items monitored, all done by phone-home autossh. And yes, this is my day job.

  50. You could outsource this. by rspott · · Score: 1

    Let me know if you want to do something like this and we can work something out. Reply to this and we can connect.

  51. I would do exactly what you outlined by maas15 · · Score: 1

    A place I worked for did exactly that. There are a few details you should attend to. First, give out IP addresses based on the SSL certificate used by the OpenVPN client (and make sure you don't deploy the same SSL cert to two servers!). Second, have a method of restarting OpenVPN every time it crashes or disconnects (and exits); you'd be surprised how flaky enterprise Internet connections can be. From there, my workplace kept a database of all the OpenVPN servers and used it to generate a Nagios config. Honestly, I've never loved Nagios, since it frequently doesn't QUITE do what I want, but it's good enough. If your clients are all Internet-accessible, I've been using a slightly expensive commercial service called Monitis, which I really like. Contrary to what a number of people here have said, I don't think you need a network admin at all. If you can get the VPN stuff working with a simple ACL (to keep clients' interns from bothering each other), then you should be set.
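    The "database of clients generates the config" step might look roughly like this Python sketch. It assumes:
    - a hypothetical SQLite table `clients(common_name, vpn_ip)`;
    - an OpenVPN server running with `topology subnet`, with `client-config-dir` pointing at `ccd_dir`. OpenVPN looks up each client's file by its certificate common name, which is how the address follows the certificate.
    - a Nagios `linux-server` host template like the one in the sample configs.
    The duplicate-IP check is there because two boxes sharing an address would silently mask each other.

```python
from __future__ import annotations

import ipaddress
import sqlite3
from pathlib import Path

# Assumed server-side setting: "topology subnet" with "server 10.8.0.0 255.255.0.0".
NETMASK = "255.255.0.0"


def load_clients(db: sqlite3.Connection) -> list[tuple[str, str]]:
    """Return (certificate common name, pinned VPN IP) pairs from a hypothetical table."""
    return db.execute(
        "SELECT common_name, vpn_ip FROM clients ORDER BY common_name"
    ).fetchall()


def ccd_entry(vpn_ip: str) -> str:
    """Contents of an OpenVPN client-config-dir file pinning one client's address."""
    ipaddress.ip_address(vpn_ip)  # raises ValueError on a malformed address
    return f"ifconfig-push {vpn_ip} {NETMASK}\n"


def nagios_host(common_name: str, vpn_ip: str) -> str:
    """A Nagios host object monitored via the client's VPN address."""
    return (
        "define host {\n"
        "    use        linux-server\n"
        f"    host_name  {common_name}\n"
        f"    address    {vpn_ip}\n"
        "}\n"
    )


def generate(db: sqlite3.Connection, ccd_dir: Path, nagios_cfg: Path) -> None:
    """Write one ccd file per client plus a single Nagios hosts file."""
    clients = load_clients(db)
    ips = [ip for _, ip in clients]
    if len(set(ips)) != len(ips):
        raise ValueError("two clients share a VPN IP")
    ccd_dir.mkdir(parents=True, exist_ok=True)
    for common_name, ip in clients:
        # OpenVPN looks up this file by the client certificate's common name.
        (ccd_dir / common_name).write_text(ccd_entry(ip))
    nagios_cfg.write_text("".join(nagios_host(cn, ip) for cn, ip in clients))
```

    You would rerun it whenever a client is added and then reload Nagios. A UNIQUE constraint on `common_name` in the table would cover the "same cert on two servers" half of the advice.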

  52. SolarWinds Server & Application Monitor? by zmq503o1 · · Score: 1

    Have you considered SolarWinds Server & Application Monitor? The latest version, currently in beta, adds an optional agent that negates the need for VPN tunnels. It supports overlapping IP address space, NAT traversal, and passing through authenticated proxy servers, and communications are fully encrypted. These agents report back to a single, centralized server at your location or in the cloud, such as Amazon EC2, Azure, RackSpace, etc. More information can be found at the following links. https://thwack.solarwinds.com/... https://thwack.solarwinds.com/... https://thwack.solarwinds.com/... If that doesn't fit the bill, you should consider taking a look at N-able, which is a purpose-built solution designed specifically with MSPs in mind. More information on N-able can be found at the following link. http://www.n-able.com/

  53. Lemme rephrase by Munchr · · Score: 1

    I want to drastically increase my clients' exposure to attack by opening remote holes in their network firewalls through my equipment. How can I best go about doing so?

  54. Re:Reverse-SSH tunnel phone-home from remote devic by BitZtream · · Score: 1

    The reverse-SSH tunnel is the correct way to "phone home". Maintaining a VPN is a shit show.

    A blanket statement like this shows your cluelessness and shear ignorance.

    Without considerably more information neither you nor I nor anyone else can make such a statement.

    Pure Storage does it this way, and they are quite the experts.

    Oh well, since a company that's barely 5 years old does it this way, and since their primary business line is selling flash drive arrays ... not network administration and monitoring ... they must be the most qualified and perfect example to follow.

    IS IT the right way for THEM? Maybe. Maybe not. To pretend that just because they do it that way, they are experts again just shows your ignorance. Let me guess, you work for them on their monitoring team, don't you?

    --
    Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
  55. bash by LordThyGod · · Score: 1

    ... is your friend. Run a simple shell script from cron every few minutes to test each server, and have it text, email, or raise an alarm if there's no answer. I'd do this from at least two locations, to allow for transient network issues or for one of the monitoring systems having a hardware issue and tanking. And don't use Windows for critical stuff. A couple of low-end Linux systems on Amazon or similar would work. Low cost, efficient, and very manageable.
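    Here is the same logic as a Python sketch; Python is swapped in for bash. It does a plain TCP connect check with a few retries to ride out transient blips, plus a rule for combining the verdicts from the monitoring locations. How each location reports its verdict, and how the text or email actually gets sent, is left out. The host and port are whatever service your server exposes.

```python
from __future__ import annotations

import socket
import time


def tcp_check(host: str, port: int, timeout: float = 5.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def is_down(host: str, port: int, attempts: int = 3, delay: float = 10.0) -> bool:
    """Declare a server down only after several consecutive failed attempts."""
    for attempt in range(attempts):
        if tcp_check(host, port):
            return False
        if attempt < attempts - 1:
            time.sleep(delay)
    return True


def should_alert(down_votes: list[bool]) -> bool:
    """Combine 'is it down?' verdicts from the monitoring locations that reported in.

    Alert only when every reporting location agrees the server is down, so a
    network problem near one monitor doesn't page anyone. A monitor that has
    itself died simply doesn't contribute a vote.
    """
    return len(down_votes) > 0 and all(down_votes)
```

    Requiring every reporting location to agree trades a little sensitivity for fewer false alarms. If you would rather be paged whenever any single location loses contact, use `any` instead of `all`.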

    1. Re:bash by Anonymous Coward · · Score: 0

      Or just use haproxy with some sort of heartbeat script?

      Why do people consistently use Bash for MONITORING tasks. This is beyond stupid. Unless your script feeds a remotable dashboard you have no idea what's going on with your nodes.

      By all means, use Bash. But Curl something to a REST API for by-second updates or you are just creating more administration nightmares.

    2. Re:bash by i.r.id10t · · Score: 1

      Duct-tapey, but it works. Let's add some baling wire and include a phone with a limited data plan that you can pair with via Bluetooth or USB. That way, when both their Internet connection *and* the box you are monitoring go tits up at the same time, you can be notified as well.

      --
      Don't blame me, I voted for Kodos
  56. Re:Reverse-SSH tunnel phone-home from remote devic by BitZtream · · Score: 2, Funny

    Just because you're unfamiliar with networking administration doesn't mean this needs to be blown up into "hire a network guy". That's just ignorance and

    As someone who's been a network admin for a few years, I'm fairly confident in my statements. Do you do even minor surgery on yourself if you're not a surgeon? If you come to slashdot to ask how to do something for your business, you already fucked up and the only valid responses you should be getting from slashdot are help on finding someone who can help you. If he asked 'how do I find someone, like a consultant for a short term project, like this' that would be one thing. He didn't, he came here expecting a solution which illustrates his complete lack of understanding of the problem, THAT IS WHY he needs to hire a network guy.

    He is, by definition, ignorant, which is why he is asking for help ... clearly you are as well as your choice of words indicates. I suggest you learn what the word ignorant means before you brandish it about like an insult as you just end up insulting yourself through your own ignorance.

    (I suspect) trying to make yourself sound important on an anonymous message board.

    I have no need to make myself sound important, and I certainly don't need your approval ... and if you bother to google my nick, you'll find it's not even a little difficult to link it to a real name, address, and everything else. I'm not in the least bit anonymous. People have been able to recognize that nick and its association with me for 20+ years. On the other hand ... your post ... is from ... anonymous coward. Do you know the meaning of the word ironic?

    As my granddaddy used to say, if you don't know what you're talking about, it's best to not open your mouth and prove it. So no need to apologize, just take the advice and consider it a lesson learned. Best of luck.

    Your grand daddy said that to you a lot, didn't he? Did you ever wonder WHY he said it to you so much? Maybe he was trying to get some sort of point across to you ... Go look in the mirror and repeat those words until you get the point of them and who he was talking about. Hint: It's the guy in the mirror.

    You're an absolutely shitty troll. You just suck at it. Nothing you've said did anything other than show how stupid YOU are, not me.

    --
    Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
  57. Re:Reverse-SSH tunnel phone-home from remote devic by Anonymous Coward · · Score: 0

    Pure is the premiere brand in SAN storage at the moment. Their engineers are simply doing it better than the competition from EMC etc. They understand Linux and they understand network storage.

    They also happen to understand the cloud and the Internet of Things. And apparently they understand how to use the networking Swiss Army knife of our time, SSH.

    You can take your non-technology-related quibbles to your management.

  58. Easy monitor solution by Anonymous Coward · · Score: 0

    Nagios + NRPE (+ iptables).
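    To illustrate the NRPE side, here is a small check as a Python sketch. It follows the usual Nagios plugin convention: print one status line and exit 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN. The thresholds, file path, and command name are assumptions.

```python
from __future__ import annotations

import shutil

# Standard Nagios plugin exit codes; NRPE relays them to the Nagios server.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(free_pct: float, warn: float, crit: float) -> int:
    """Lower free space is worse, so test the critical threshold first."""
    if free_pct <= crit:
        return CRITICAL
    if free_pct <= warn:
        return WARNING
    return OK


def check_disk(path: str = "/", warn: float = 20.0, crit: float = 10.0) -> tuple[int, str]:
    """Return (exit code, one-line status text) in the usual plugin style."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero size"
    free_pct = 100.0 * usage.free / usage.total
    code = classify(free_pct, warn, crit)
    return code, f"DISK {LABELS[code]} - {free_pct:.1f}% free on {path}"


def main(argv: list[str]) -> int:
    """Print the status line and return the exit code for sys.exit()."""
    code, message = check_disk(argv[0] if argv else "/")
    print(message)
    return code
```

    The script would end with `raise SystemExit(main(sys.argv[1:]))` and be registered in nrpe.cfg with a line such as `command[check_disk_py]=/usr/bin/python3 /usr/local/lib/nagios/check_disk_py.py /` (the path and name are hypothetical). For the iptables part, NRPE listens on TCP 5666 by default, so only that port needs to be opened, and only to your Nagios server. The `allowed_hosts` setting in nrpe.cfg adds a second layer.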

  59. Re:Reverse-SSH tunnel phone-home from remote devic by Pentium100 · · Score: 1

    As someone who's been a network admin for a few years, I'm fairly confident in my statements. Do you do even minor surgery on yourself if you're not a surgeon?

    I am a network (and Linux) admin by profession, but I can also repair my audio equipment and do some repairs on my car, even though I do not work as a car mechanic or electronics repair guy. While I could find a mechanic to repair my car (and sometimes I do), a lot of the time it is cheaper and faster to do it myself.

    So, if the OP wants to create a monitoring solution himself (assuming he knows something about monitoring systems), more power to him. I probably would ask a similar question if I had to monitor 500 remote servers that are in different locations (if they were all in the same place, I would just use a VPN). It would be possible to use a VPN, SSH tunnels, or something else, but sometimes one may need advice from others as to which option is best.

  60. Re:Reverse-SSH tunnel phone-home from remote devic by Anonymous Coward · · Score: 0

    Smug prick.

  61. Oh Please. by Anonymous Coward · · Score: 0

    Frankly, I never understood why ANYONE would ask Slashdot contributors a question. You get nothing but grief from arrogant puds who never give a helpful answer.

    1. Re:Oh Please. by NemoinSpace · · Score: 0

      Don't forget the other side of the coin, as demonstrated in this story's responses. Anyone who has ever seen a freaking SNMP message swears it's the best product ever. I'm pretty sure EVERY product ever hacked together has been mentioned.
      Forget about monitoring. Your IT people route all those alerts to /dev/null. FOR THE LOVE OF GOD! PLEASE STOP THESE ALERTS! Nobody is going to do anything about the problem until your customer calls twice anyway. Oh, and by the way, if you don't want your entire business to drop dead, try just once NOT to buy the most absolute cheapcrap hardware you can find. Whatever you do, don't use chinese drywall in your data center.

  62. since we're suggesting RMM... by Anonymous Coward · · Score: 0

    LabTech Software is lightweight, paid software for monitoring servers. It has minimal RAM usage and is good at sleeping until needed. The Managed Service Provider industry is pretty big now, so you've got choices for professional tools specifically designed for this kind of job.

    Disclaimer: I work for LabTech Software.

  63. Re:Reverse-SSH tunnel phone-home from remote devic by Anonymous Coward · · Score: 1

    I agree. A meeting needs to be held with the technical team to determine what exactly needs to be monitored.

    With that being said, ask yourself a few questions:

    Are you looking for a heartbeat?

    Are you actually more concerned for the applications running on the servers?

    Are you looking to monitor individual pieces of hardware, e.g. CPU, RAM, etc.?

    Are you trying to determine if there is a network hardware failure as well, e.g. router, switch, etc.? (Did a switchport die, and did I lose a particular subnet or cluster?)

    All or none of these things can be important, but BitZtream is correct. Without a lot more knowledge of what is needed there is no way of giving OP a method of accurately monitoring the required infrastructure.

  64. Check out "The Assimilation Project" by mnemotronic · · Score: 1

    Take a look at The Assimilation Project : What we do: Continually discover and monitor systems, services, switches and dependencies with very low human and network overhead.

    --
    The Russians have won. They have made the world a cesspool of distrust, greed, fear and hate.
  65. Re:Reverse-SSH tunnel phone-home from remote devic by pspahn · · Score: 2

    You sound like a Windows admin for a gov't entity.

    You spend a lot of energy telling people they do it wrong without having any real insight or advice on how to do it correctly.

    A blanket statement like this shows your cluelessness and shear ignorance.

    What does his knowledge of a specific cutting tool have to do with anything?

    --
    Someone flopped a steamer in the gene pool.
  66. xymon is good by Anonymous Coward · · Score: 0

    xymon (aka hobbit) grew out of bigbrother, one of the early monitoring systems. It has extensive extensions, is highly customisable, and doesn't require a client. The only weakness is the Windows client, bbwin, but it does work.

  67. Re:Zenoss is awesome - Zenoss Core + OpenVPN by JayTech · · Score: 1

    Anon - Why base your opinion on an experience back in 2008? This is six years later and the product has matured since then. The Zenoss Core (http://www.zenoss.org) open source project is bigger than it's ever been, it is very reliable, and is used by many large corporations today.

    OP - For what it's worth, any open source monitoring software should play just fine with OpenVPN. However, the monitoring feature set should be simplified into a single interface, you don't want to have to be fixing scripts and maintaining the software all the time.

    I actually used to deploy OpenVPN + Zenoss for remote site monitoring. In my case I needed to monitor multiple systems at the customer premises (using Zenoss Enterprise/Service Dynamics for the remote collector integration), but you should have it a bit easier since you only have one server to monitor. I found configuring OpenVPN to be a bit of a challenge, but once that part was done the rest was a piece of cake. It will be a lot of work with the sheer volume of 500 clients (with that amount of traffic you might even need to break it into two OpenVPN endpoints) but I'm sure you are already aware of that.

    I would say definitely take a closer look at Zenoss Core. As a side note, Zenoss Service Dynamics is their enterprise product with advanced features, but for you the "technology stack" needs only to consist of Zenoss Core (free) + OpenVPN.
    Set up OpenVPN as you described, so that the clients deployed on your remote servers can connect back through https. As long as they have an Internet connection, no holes need to be poked through your customers' firewalls. Drop Zenoss on the OpenVPN endpoint box(es), then use the OpenVPN IPs to monitor the servers. For each individual server, configure the SNMP community string if Linux, or set up WMI if Windows. There is no need to configure traps; Zenoss polls the boxes at specific intervals. Use the wizard in the Zenoss web interface to add the host and model it.
    Away you go: you can now see the events in the Zenoss console for everything from ping status to CPU utilization. Events go to the console, which you can monitor, or you can easily set up e-mail alerts to trigger. For example, say one of the disks throws a SMART error; trigger an e-mail to you so you can ship the customer a new disk to install, just like NetApp does.

    As I mentioned, you can definitely use Zabbix or some other variant to do the monitoring part. I researched and played with many monitoring solutions (commercial and free) before I settled on Zenoss. What made the difference for me was that I found I was spending way too much time learning the quirks of the software (e.g. Nagios - config file to add a client, really! SolarWinds - Agent installation required, really!) and not enough time actually deploying monitoring to the targets. Good luck, hopefully this info helps you find the right fit for your environment!

  68. Ping is not reliable by mveloso · · Score: 1

    Ping is almost the worst way to check to see if your server is up. In fact, certain machines will return an ICMP response even after you've broken into their bios-equivalent (hello, Solaris).

    Do a service-level check. It's not that hard to do a curl instead of a ping. A curl's results can show you whether the service is present and functioning; a ping just shows you whether the network interface is responding.
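    Here is a rough Python equivalent of "curl instead of ping", as a sketch. It fetches a page from the application and checks both the HTTP status and a string that only a working app would return. The URL and the marker text (for example, the title of your login page) are assumptions you would replace.

```python
from __future__ import annotations

import urllib.error
import urllib.request


def evaluate(status: int, body: str, expect: str) -> tuple[bool, str]:
    """Decide health from the HTTP status and whether a known marker appears."""
    if status != 200:
        return False, f"HTTP {status}"
    if expect not in body:
        return False, f"HTTP 200 but {expect!r} not found in the page"
    return True, "OK"


def service_check(url: str, expect: str, timeout: float = 10.0) -> tuple[bool, str]:
    """Fetch `url` and check both the status and the content, like `curl | grep`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read(65536).decode("utf-8", errors="replace")
            return evaluate(resp.status, body, expect)
    except urllib.error.HTTPError as exc:
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        return False, f"unreachable: {exc}"
```

    Take a call such as `service_check("http://10.8.0.10/login", "Time & Attendance")` against a hypothetical address. It would report a box as unhealthy if the box answers but serves an error page or a default web server page, something a ping cannot tell you.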

    People disable ping because if you don't know a server is there, you can't attack it. It's like enabling MAC address filtering: it doesn't really help that much, but in a specific set of circumstances it helps a bit.

    1. Re:Ping is not reliable by Enry · · Score: 1

      People disable ping because if you don't know a server is there, you can't attack it. It's like enabling MAC address filtering: it doesn't really help that much, but in a specific set of circumstances it helps a bit.

      If there's no other services presented to the world, yes. But a simple port scan will tell you it's up and that doesn't take long to do.

  69. Security and liability: think Target by mveloso · · Score: 1

    The media says Target was breached due to a compromise at their HVAC vendor. Do you want to be the vendor that gets hit with a liability suit because someone broke in through your network?

    It's obvious from your question that you're not really sure what you're doing. SNMP? That's for network crap, not for server and application level stuff. Why would you even talk about SNMP? Why would you even want a VPN into the customer network?

    If you need access to your server, write it into your support contract, and ask the vendor for a VPN login. Then the vendor can turn that login on and off when an outage occurs. Then just use NewRelic for monitoring (assuming your machine can get out).

    If you need continuous access to your server, write it into your support contract, then make sure that (1) you really need it, and (2) your security is better than your customers' security.

    Or, if you want to screw everyone, just run a TeamViewer instance on it and connect to it on the sly. I'm sure your customers would love that, but that's what you're basically asking them to allow you to do.

  70. Status updates by AndyCanfield · · Score: 1

    I manage a hub server and a backup server. Every 60 seconds, the backup server's crontab (wget) fetches a 'web page' from the hub server, which as a side effect records the caller's IP address into a file. Even though the backup server has a dynamic IP address, I can always find it by going to the hub server and looking into that file.

    I have a page I can go to on the hub server which checks the timestamp on the file BackupServer.ip. If it is suspiciously old, then that web page turns red and tells me that things are cut off. If all is OK, the background stays green. You can see it at http://gregor/ServerCheck.php. I check it every time I start my browser.

    It would be trivial to support more than one call-in server, and it would be easy to add more complex status information. From your notebook computer anywhere in the world, you can go to that web page and see that all is OK, or, if it is not, which remote server has a problem.
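    Extending this to many call-in servers might look like the following Python sketch; Python is swapped in for the poster's PHP page. It assumes the hub keeps one `<name>.ip` file per server in a single directory and uses each file's modification time as that server's last check-in. The names and the 180-second threshold (three missed 60-second check-ins) are hypothetical.

```python
from __future__ import annotations

import re
import time
from pathlib import Path

# Only simple names are accepted, so a caller can't write outside the directory.
NAME_RE = re.compile(r"[A-Za-z0-9_-]+")


def record_checkin(directory: Path, name: str, caller_ip: str) -> None:
    """Hub side: called when a server fetches the check-in page."""
    if not NAME_RE.fullmatch(name):
        raise ValueError(f"bad server name: {name!r}")
    # Rewriting the file also bumps its mtime, which serves as the check-in time.
    (directory / f"{name}.ip").write_text(caller_ip + "\n")


def last_seen(directory: Path) -> dict[str, float]:
    """Map server name -> time of its most recent check-in (file mtime)."""
    return {p.stem: p.stat().st_mtime for p in directory.glob("*.ip")}


def stale_servers(seen: dict[str, float], now: float | None = None,
                  max_age: float = 180.0) -> list[str]:
    """Servers whose last check-in is older than `max_age` seconds (the red lights)."""
    now = time.time() if now is None else now
    return sorted(name for name, ts in seen.items() if now - ts > max_age)
```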

  71. Whats Up by bev_tech_rob · · Score: 1

    Our company uses 'WhatsUp' by Ipswitch, currently monitoring over 2500 devices such as servers, routers, and temperature sensors. You can ping devices and monitor SNMP events, logged events on Windows, AIX, and Linux, WMI, services, and tasks. You can script custom monitors in VBScript, PowerShell, or JavaScript, and script custom actions for WhatsUp to take when it detects a condition: it can restart services on *nix or Windows boxes if they go down, launch applications if needed, and raise audio, visual, email, and SMS alerts. They license on a per-device basis, as opposed to a per-port basis like SolarWinds.

    The only thing I don't care for in this software is that you can only use Microsoft SQL Server for the database; you can't use any open source solution. The default install uses the desktop version of MS-SQL, but its database size is limited. If you need to go bigger, you have to do a full install of SQL Server on the server, or connect to a remote SQL server on your network to host your database (as we do). My two cents...

    --
    You're messin' with my Zen Thing, man.....
  72. Re: Reverse-SSH tunnel phone-home from remote devi by Anonymous Coward · · Score: 0

    Your spelling of sheer incorrectly shows your sheer ignorance.

  73. Re: Reverse-SSH tunnel phone-home from remote devi by Anonymous Coward · · Score: 0

    Depending on the composition of your client base, this could easily run afoul of their security policies and trigger all kinds of intrusion detection alerts. Make sure that your clients clearly understand what you are implementing and that they are OK with it. Preferably, get it in writing.

  74. Cacti - It's GNU by Tyr07 · · Score: 1

    I've used Cacti to monitor servers before, and it works quite well.

    It supports many SNMP functions and is easy to set up.

  75. Re:Reverse-SSH tunnel phone-home from remote devic by Tyr07 · · Score: 0

    Everyone who is posting suggestions is actually being useful. These are called 'Ideas' and the original poster can read these 'ideas' and see if something suitable comes to mind. It may also provide him with the valuable insight after researching these solutions that maybe he in fact does need a network admin.

    It's also possible that one of the solutions offered will actually suit his needs. He may not require Slashdot's "help" in any form; he may simply be reaching out to see if there is something he hasn't thought of yet. Communal advice and information. Welcome to the digital age.

    For all you know, he could be an exceptionally skilled network admin, more advanced than the people around him, so a conversation with them may be less likely to yield results. So he thought he'd throw it out there and see if anyone with knowledge would come forward and offer some ideas.

    Being thorough doesn't mean you're "screwed" because you're at the point where you "need" help from people who browse slashdot.

  76. Assuming security responsibility by Anonymous Coward · · Score: 0

    You are asking for two things. Both of these tasks are easy to implement, but a more important issue is security. You are interfacing with a system that holds very sensitive information: employee SSNs, bank details, full names, etc. You may also have access to sensitive company information: employee names, employee pay, etc. This is the information you need to protect, or you will lose a customer.

    Setting up a notification system should be easy. Sending SNMP is one way of doing this, but you could also use HTTP to POST information to your website. You would not need to modify firewall rules or set up a VPN tunnel. By documenting the information you send and limiting it to generic data (what happened, and who did it, so you know what happened), you can make your customer feel comfortable that you are taking security seriously. Note that the "who did it" should be a unique ID identifying who performed the task that caused the issue, not the user's SSN or name; keep it cryptic so that if your system is compromised, you can assure your customers that their information is safe.
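    A sketch of the HTTP POST approach in Python, using only the standard library. The event names, the per-site secret, and the endpoint URL are hypothetical. The "who did it" field here is an HMAC-SHA256 of the internal user ID under a secret that stays on the customer's server, which is one way (not the only one) to get an ID that is stable per user but reveals nothing if your side is compromised.

```python
import hashlib
import hmac
import json
import urllib.request

# Hypothetical per-customer secret; it should live only on the customer's server.
SITE_SECRET = b"replace-with-a-random-per-site-secret"


def opaque_user_id(internal_user_id: str, secret: bytes = SITE_SECRET) -> str:
    """Keyed hash of the internal user ID: stable per site, meaningless elsewhere."""
    digest = hmac.new(secret, internal_user_id.encode(), hashlib.sha256)
    return digest.hexdigest()[:16]


def build_event(site_id: str, event: str, internal_user_id: str) -> bytes:
    """Generic event body: what happened plus an opaque 'who', never an SSN or name."""
    payload = {
        "site": site_id,
        "event": event,
        "actor": opaque_user_id(internal_user_id),
    }
    return json.dumps(payload).encode()


def post_event(url: str, body: bytes, timeout: float = 10.0) -> int:
    """POST outward from the customer's server and return the HTTP status code."""
    req = urllib.request.Request(url, data=body, method="POST",
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

    For example, `post_event("https://status.example.com/events", build_event("site-042", "payroll_export_failed", "emp-1187"))` would send one event. Because the connection is outbound HTTPS from the customer's network, no inbound firewall rule is needed.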

    The second part, remote access, raises the bar significantly. As you will have access to all the customer's information and network, you might want to consult a network and security expert on best practices. To give your customer the ultimate level of control, you might want to provide some hardware that they can plug in to give you access and unplug to cut off all access. This could be as simple as a VPN router or a POTS modem.

  77. www.pulseway.com Ultimate Flexibility by Megabyte · · Score: 1

    This is an amazing product. I've used this in the past and LOVE it! Need to run a remote powershell command from your android? It does that. Dashboards for all the things? Has that covered.

    Check it out: http://www.pulseway.com/

  78. Another option to reverse SSH tunnels and OpenVPN by mejustme · · Score: 1

    If you're using Linux or BSD, another alternative to reverse SSH tunnels or OpenVPN is EPS Conduits: http://eps-conduits.sourceforg...

    It was written with the goal of having a large number of remote devices form a virtual network for ease of management/maintenance.

  79. Cacti by fuzzywig · · Score: 1

    Cacti is a FOSS monitoring service that can give you a big dashboard showing up/down status, and you can drill down to view graphs of pretty much anything you can monitor over SNMP. You can also have emails sent on up/down events and on reaching thresholds (e.g. "$host has reached threshold of 75% full on /var/" or whatever).
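    The threshold alert in that example boils down to a simple comparison. Here is a sketch of the logic in Python, not Cacti's own implementation; in Cacti the used and total figures would come from SNMP polling, whereas the usage note reads them locally for illustration.

```python
from typing import Optional

THRESHOLD_PERCENT = 75.0  # same example threshold as in the comment


def percent_used(total: int, used: int) -> float:
    """Used space as a percentage of total; 0 for an empty or unknown volume."""
    return 100.0 * used / total if total else 0.0


def check_threshold(host: str, mount: str, total: int, used: int,
                    threshold: float = THRESHOLD_PERCENT) -> Optional[str]:
    """Return an alert message when usage is at or above the threshold, else None."""
    pct = percent_used(total, used)
    if pct >= threshold:
        return (f"{host} has reached threshold of {threshold:.0f}% full "
                f"on {mount} ({pct:.1f}% used)")
    return None
```

    For example, `u = shutil.disk_usage("/var")` followed by `check_threshold("db1", "/var/", u.total, u.used)` returns None below 75% and an alert string at or above it ("db1" is a hypothetical host name). Emailing the string is left out here.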

    We have VPNs to each data centre and client site and generally administer them over SSH. For some systems (e.g. ones dealing with customer details like credit cards), we have a single external-facing host with YubiKey authentication to reach that network, and we use SSH port tunnelling to reach other hosts.
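    A sketch in Python of two OpenSSH invocations such a setup typically relies on, built as argument lists so they can be handed to subprocess. Host names and ports are hypothetical. `-L [bind:]port:host:hostport` forwards a local port through the external-facing host, and `-J` (OpenSSH 7.3 and later) hops through it for an interactive login. The YubiKey step happens during the SSH login to the external-facing host and is not modeled here.

```python
import shlex
from typing import List


def tunnel_command(gateway: str, target_host: str, target_port: int,
                   local_port: int, bind: str = "127.0.0.1") -> List[str]:
    """ssh args that forward bind:local_port, via the gateway, to target_host:target_port."""
    return [
        "ssh",
        "-N",                               # no remote command; forwarding only
        "-o", "ExitOnForwardFailure=yes",   # fail fast if the local port is taken
        "-L", f"{bind}:{local_port}:{target_host}:{target_port}",
        gateway,
    ]


def jump_command(gateway: str, inner_host: str) -> List[str]:
    """ssh args for an interactive session on inner_host, hopping through the gateway."""
    return ["ssh", "-J", gateway, inner_host]


if __name__ == "__main__":
    # Print the commands instead of running them; subprocess.run(cmd) would run them.
    print(shlex.join(tunnel_command("admin@gateway.example.net", "db01.internal", 5432, 15432)))
    print(shlex.join(jump_command("admin@gateway.example.net", "admin@app02.internal")))
```

    With the tunnel up, a client on the admin machine would connect to 127.0.0.1:15432 and reach db01.internal:5432 through the gateway.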

  80. Re: Reverse-SSH tunnel phone-home from remote devi by Anonymous Coward · · Score: 0

    Can you rebuild the transmission and improve shift firmness while doing it? Can you replace a damaged quarter panel and color-match the paint? That's the analogy for developing a full remote monitoring solution. He's probably already doing basic "replace the plugs" type work on the network, but there's a big jump to what he's asking, and he doesn't seem to know enough to even ask the right questions.

  81. Re: Reverse-SSH tunnel phone-home from remote devi by Anonymous Coward · · Score: 0

    No, an "exceptionally advanced admin" would not ask the question this way. We do know that's not the case here. :)

  82. PRTG is the most cost effective and feature rich by bdwebb · · Score: 1

    About 7 years ago I tested Nagios, WhatsUp Gold, Cacti, Zabbix, SolarWinds Orion, and a variety of other monitoring software, and the problem we had with almost all of them was that they either required heavy customization or were incredibly expensive when they included more pre-built configuration such as device discovery, included templates, etc. (à la SolarWinds). We finally settled on PRTG (www.paessler.com) because it had some of the industry-standard devices templated already in a basic fashion, has an easy-to-use interface, and can be heavily customized.

    Another feature we really needed was remote monitoring for our customers, as we are an MSP. Every PRTG Remote Probe creates an encrypted SSL tunnel between itself and your core server installation at your office or colocation. This requires no customization at all, except that if you deny certain ports outbound from the probe server, you need to allow port 23560 (or whatever you've customized it to) outbound to your core server's public NAT IP. This does not necessarily give you remote control of servers, but it does provide a channel for all locally monitored data to be sent upstream to your location without requiring OpenVPN or anything like that. (If you wanted remote access, you could have PRTG's remote probe piggyback across an OpenVPN link as well, and you would then also have the ability to remote control.) You can deploy as many remote probes as you like, and can therefore centralize all your monitoring data as well as create reports and custom maps, and even provide customer access via nested Access Rights dependencies.
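    When a probe will not connect, a quick first check is whether the probe machine can open a TCP connection to the core server on that port at all. A Python sketch; the core server name in the usage note is a placeholder, and a successful connect only shows that the path through the firewall is open, not that PRTG itself is healthy.

```python
import socket

PRTG_PROBE_PORT = 23560  # default mentioned in the comment; change it if customized


def can_reach(host: str, port: int = PRTG_PROBE_PORT, timeout: float = 5.0) -> bool:
    """True if a TCP connection to host:port succeeds from this machine."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timed out
        return False
```

    Run on the probe server, `can_reach("prtg-core.example.net")` returns True if port 23560 is allowed outbound to the core server.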

    One thing I will mention: SNMP trap monitoring is a wasted effort. I know there are many proponents of it out there, but if you are not actively polling your data and gathering graphable results, then you have no troubleshooting ability, no trending reports, no data utilization analysis for service management, etc. You should configure templates for your devices to standardize them and actively monitor all of your critical data, so you can then use the historical information to say "OK... this server just went down. Why? Check CPU utilization. Oh, it looks like all cores on this CPU jumped to 100% utilization just before the device went unresponsive. Let me check individual process utilization. Oh, there's the process causing the problem." Troubleshooting done. Now imagine relying on a trap for this device instead: if the device is already unresponsive by the time the trap would be sent, the trap never reaches your monitoring server and everything still looks hunky-dory. You may also have ICMP monitoring in place, so you know the device is offline, but is the ISP down? Is some LAN resource like a router, firewall, or switch down? Is the server down? Why? Most of these questions can be answered by historical monitoring data, and I cannot say enough that SNMP traps are useless 95% of the time.
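    To illustrate why polled history matters, here is a minimal Python sketch (not how PRTG stores data) of keeping recent samples per device and looking back from an outage to the last good reading. The 95% CPU cutoff and the one-hour window (120 samples at a 30-second polling interval) are assumptions.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional


@dataclass
class Sample:
    timestamp: float
    reachable: bool
    cpu_percent: Optional[float] = None  # None when the poll got no answer


class PollHistory:
    """Rolling window of recent polls for one device."""

    def __init__(self, max_samples: int = 120) -> None:  # 1 hour at 30 s polls
        self.samples: Deque[Sample] = deque(maxlen=max_samples)

    def add(self, sample: Sample) -> None:
        self.samples.append(sample)

    def last_good_before_outage(self) -> Optional[Sample]:
        """If the latest poll failed, return the most recent successful one."""
        if not self.samples or self.samples[-1].reachable:
            return None
        for sample in reversed(self.samples):
            if sample.reachable:
                return sample
        return None


def explain_outage(history: PollHistory, cpu_alarm: float = 95.0) -> str:
    """Turn the look-back into a first troubleshooting hint."""
    last = history.last_good_before_outage()
    if last is None:
        return "no outage, or no successful poll on record"
    if last.cpu_percent is not None and last.cpu_percent >= cpu_alarm:
        return f"CPU at {last.cpu_percent:.0f}% just before the device went unresponsive"
    return "no CPU spike before the outage; check the ISP, router, firewall, or switch"
```

    A trap-only setup has nothing equivalent to `last_good_before_outage()` to consult, which is the commenter's point.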

    To back up my claims and experience with SNMP: I have been a Principal Network Engineer at an MSP in LA for over 9 years, and we currently operate a PRTG install for our MSP customer monitoring with over 18,000 sensors actively monitored, polled every 30 seconds.

  83. Re:Reverse-SSH tunnel phone-home from remote devic by Anonymous Coward · · Score: 0

    Really, it's time for your medication.

  84. Continuum by bitty · · Score: 1

    Check out http://www.continuum.net/. I've been using their services for over 5 years, and they've been steadily improving it since they split from Zenith Infotech. No, it's not free, but it's quite cheap per unit and you get a lot of bang for your buck. Remote monitoring and alerts on any service, remote access, at-a-glance dashboard, etc. With 500 clients, I'm guessing you'd rather spend your time monitoring the situation than putting together a custom solution.

  85. chef, nagios, collectd, graphite and logstash by Anonymous Coward · · Score: 0

    Similar situation: servers (appliances, really) all over the world in customers' networks. We use Chef to manage the systems (or Puppet, pick your poison). Each system connects back to our management network using OpenVPN, with certs managed by Chef. collectd runs on all servers, with some custom plugins for our own stuff, plus statsd for instrumenting our own code. All collectd metrics and logs (via the syslog protocol) are sent back to our management network and stored in Graphite and Elasticsearch, respectively.

    Nagios is configured using the Chef nagios cookbook plus our own custom layer, which dynamically adds servers and metrics as they appear in Chef and removes them if they are deleted. A good chunk of the checks request metrics from Graphite instead of going out to the servers directly; some are passive checks, with Logstash triggering alerts based on log patterns. We're still in a beta-ish stage, and I currently have 3K checks in Nagios, which would be impossible to manage by hand. This entire setup would be impossible without completely trusting Chef.
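    Checks that request metrics from Graphite instead of going out to the servers can be written as ordinary Nagios plugins. A Python sketch, assuming Graphite's render API with format=json (a list of series, each with [value, timestamp] datapoints) and the standard plugin exit codes 0/1/2/3; the argument order and thresholds are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional, Sequence, Tuple

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # Nagios plugin exit codes


def latest_value(datapoints: Sequence[Sequence]) -> Optional[float]:
    """Most recent non-null value; Graphite fills gaps with null."""
    for value, _timestamp in reversed(datapoints):
        if value is not None:
            return float(value)
    return None


def evaluate(value: Optional[float], warn: float, crit: float) -> Tuple[int, str]:
    """Map a value onto Nagios states (higher is worse)."""
    if value is None:
        return UNKNOWN, "UNKNOWN - no recent datapoints"
    if value >= crit:
        return CRITICAL, f"CRITICAL - value {value}"
    if value >= warn:
        return WARNING, f"WARNING - value {value}"
    return OK, f"OK - value {value}"


def fetch_datapoints(graphite_url: str, target: str, window: str = "-5min") -> List:
    """Query /render for one target and return its datapoints ([] if no series)."""
    query = urllib.parse.urlencode({"target": target, "from": window, "format": "json"})
    with urllib.request.urlopen(f"{graphite_url}/render?{query}", timeout=10) as resp:
        series = json.load(resp)
    return series[0]["datapoints"] if series else []


def main(argv: Sequence[str]) -> int:
    """argv: GRAPHITE_URL TARGET WARN CRIT (a hypothetical argument order)."""
    try:
        url, target, warn, crit = argv[0], argv[1], float(argv[2]), float(argv[3])
        code, message = evaluate(latest_value(fetch_datapoints(url, target)), warn, crit)
    except Exception as exc:  # bad args, network or parse errors all become UNKNOWN
        code, message = UNKNOWN, f"UNKNOWN - {exc}"
    print(message)
    return code
```

    Saved as a script ending in `sys.exit(main(sys.argv[1:]))`, it could be registered as a Nagios command, e.g. `check_graphite.py http://graphite.internal appliance01.cpu.user 80 90` (host and metric path hypothetical). Reading from Graphite keeps the Nagios server from polling every appliance directly.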

    It took a lot of automation work to get to this point, but I'm confident I can easily scale the number of systems out in the wild and everything will continue to work with the exception of needing larger servers for nagios/graphite/logstash.
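    For the statsd side of that instrumentation, the wire format is just short UDP text lines of the form name:value|type. A Python sketch; the metric names are hypothetical, and 8125 is statsd's conventional default port.

```python
import socket
from typing import Tuple, Union

# statsd's conventional default UDP port; the host is an assumption.
STATSD_ADDR: Tuple[str, int] = ("127.0.0.1", 8125)
VALID_TYPES = ("c", "ms", "g")  # counter, timer (milliseconds), gauge


def format_metric(name: str, value: Union[int, float], kind: str = "c") -> bytes:
    """statsd line protocol: <name>:<value>|<type>."""
    if kind not in VALID_TYPES:
        raise ValueError(f"unsupported statsd type: {kind!r}")
    return f"{name}:{value}|{kind}".encode()


def send_metric(name: str, value: Union[int, float], kind: str = "c",
                addr: Tuple[str, int] = STATSD_ADDR) -> None:
    """Fire-and-forget over UDP: no reply is awaited, so the app never waits on statsd."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_metric(name, value, kind), addr)
```

    For example, `send_metric("appliance.export.duration", 250, "ms")` records one timing. statsd aggregates such lines and flushes them onward to Graphite.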

  86. Check out GroundWork Software by Anonymous Coward · · Score: 0

    We've worked with plenty of open source monitoring solutions, and in your situation www.gwos.com might be useful. Essentially it bundles Nagios, Cacti, RRDTool, etc., and you can create custom monitoring for specific services or pick from the services that are already configured (the list is quite extensive). Hope that helps.

  87. Foglight by Anonymous Coward · · Score: 0

    I used to support Foglight, which is an enterprise monitoring tool. At 500 systems you are just big enough to maybe justify using it. It's a bit expensive, but it's pretty nice once everything is set up.