Slashdot Mirror


Ask Slashdot: Remote Server Support and Monitoring Solution?

New submitter Crizzam writes I have about 500 clients which have my servers installed in their data centers as a hosted solution for time & attendance (employee attendance / vacation / etc). I want to actively monitor all the client servers from my desktop, so I know when a server failure has occurred. I am thinking I need to trap SNMP data and collect it in a dashboard. I'd also like to have each client connect to my server via HTTP tunnel using something like OpenVPN. In this way I maintain a site-to-site tunnel open so if I need to access my server remotely, I can. Any suggestions as to the technology stack I should put together to pull off this task? I was looking at Zabbix / Nagios for SNMP monitoring and OpenVPN for the other part. What else should I include? How does one put together a good remote monitoring / access solution that clients can live with and will still allow me to offer great proactive service to my servers located on-site?

73 of 137 comments

  1. Reverse-SSH tunnel phone-home from remote device by Anonymous Coward · · Score: 2, Informative

    Set up a script to initiate a reverse-SSH tunnel from the remote device back to a monitoring server, set up no-login on the tunnel but distribute keys for the monitoring user on the remote devices.

    You should be able to passwordless login from the monitoring box over a completely secure link that doesn't require port-forwarding at the remote site.
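A minimal sketch of the phone-home tunnel described above, assuming the remote device runs OpenSSH and the monitoring server accepts key-based logins for a restricted account. The host name, user name, and port used below are hypothetical placeholders, not values from the thread. The function only builds the `ssh` argument list; running it (for example from a supervisor or cron wrapper) is left to the deployment.

```python
def reverse_tunnel_command(monitor_host, monitor_user, remote_port, local_port=22):
    """Build the ssh argv that exposes this device's local_port on the monitoring server."""
    return [
        "ssh",
        "-N",                                # no remote command, port forwarding only
        "-o", "ExitOnForwardFailure=yes",    # exit if the forward cannot be established
        "-o", "ServerAliveInterval=30",      # send keepalives so a dead link is noticed
        "-o", "ServerAliveCountMax=3",
        "-R", f"{remote_port}:localhost:{local_port}",
        f"{monitor_user}@{monitor_host}",
    ]
```

With hypothetical values such as `reverse_tunnel_command("monitor.example.com", "tunnel", 20001)`, the monitoring box could then reach the device's SSH daemon through its own port 20001 on localhost, without any inbound port-forwarding at the remote site.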

  2. Scratch my back by Anonymous Coward · · Score: 2, Interesting

    Will you do my job if I tell you the answer? You've already gotten your start. What more do you need?

  3. NSA by Anonymous Coward · · Score: 1

    Should ask them!

  4. I just discovered NewRelic ... by WayneDV · · Score: 4, Interesting

    Check out www.newrelic.com - even their free service tier offers great features and it's easy to deploy on all servers

    1. Re:I just discovered NewRelic ... by astro · · Score: 3, Informative

      NewRelic is pretty sweet, as the parent says, even at the free tier. They will definitely bombard your email and phone with hard-sales pitches, though, and there's a giant cost leap from free to the next tier.

    2. Re:I just discovered NewRelic ... by Anonymous Coward · · Score: 1

      We have NewRelic deployed and pay for it. It is worth the money for us because we not only get the "Is it up" but get to see the software stack interact with the hardware. We had one client who had feature creep and we watched their VM start to die because of memory creep and it justified putting them on another box. When we showed them the reports, they were quite happy to write a check for the upgrade.

    3. Re:I just discovered NewRelic ... by WayneDV · · Score: 1

      To that point ... I installed it on 11 servers in 14 day "Pro" trial period. Sales guy contacted me by email, we exchanged 3 emails since I will be subscribing in the future but when I told him that I'm happy with free tier for now, there was no further push from their side. Since then I'm up to 30 servers and loving it.

      FYI: Server monitoring is a side product of theirs. Their main product is app stack monitoring - great for finding failures and bottlenecks in PHP, Ruby, Java apps etc

  5. Re:Central server by Noah+Haders · · Score: 2

    Would this centralized server be your universal remote server?

  6. Ping? by danknight48 · · Score: 2

    For Server active status (eg: am i dead?)
    Inside a while loop or sleep() if you cant be bothered.
    for(int i=0;i<MAX_SERVERS;i++)
    {
              IcmpSendEcho(..........);
    }

    For everything else monitoring related. Employ someone to make a custom monitoring application ,or, Google "server monitoring software".

    1. Re:Ping? by Enry · · Score: 3, Informative

      For some reason, disabling ping is considered a security feature, so a lot of places block it at the firewall. Cloud services (I'm looking at you, Azure) also either don't allow it or can't do it.
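Where ICMP is blocked, one common fallback is to test whether a TCP service port accepts connections. The sketch below is an illustration in Python rather than the C-style loop above, and the choice of port to probe is an assumption about what each server exposes.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_all(servers, timeout=3.0):
    """Return {(host, port): bool} for every (host, port) pair in servers."""
    return {(host, port): tcp_port_open(host, port, timeout) for host, port in servers}
```

This only shows that something is listening on the port; it says nothing about whether the application behind it is healthy.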

  7. reverse ssh by Marqis · · Score: 1

    Have the clients connect with ssh to your server and open a reverse port. They'll each have to pick a different port on your server.

    Use something like autossh ( http://www.harding.motd.ca/aut... ) to make sure the ssh connection is always open.

    Having said all that, sounds like a great security hole if your server is ever breached. Plus lots of potential privacy violations.

    Marqis
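Since every client needs its own port on the central server, some bookkeeping is required. The following is a sketch of one way to hand out ports; the base port of 20000 and the in-memory dictionary are assumptions for illustration, and a real setup would persist the table somewhere durable.

```python
def assign_port(allocations, client_id, base_port=20000, max_port=65535):
    """Return the port for client_id, allocating the lowest free one if it is new."""
    if client_id in allocations:
        return allocations[client_id]
    used = set(allocations.values())
    for port in range(base_port, max_port + 1):
        if port not in used:
            allocations[client_id] = port
            return port
    raise RuntimeError("no free ports left in the range")
```

Because existing entries are never reassigned, adding client number 501 does not disturb the ports of the first 500.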

    1. Re:reverse ssh by aheath · · Score: 2

      I agree that this creates the potential for a huge security hole that could compromise the privacy of all of the employees at 500 companies. The consequences of this breach might be worse if there is a connection between his servers and a payroll system or any point of sale system. I also wonder whether his clients are willing to open up the ports required to support remote access to their data centers.

  8. Keeping track.. by Rigel47 · · Score: 1

    500 OpenVPN connections is going to be a bit of a headache to keep straight. Obviously you won't have 500 tun devices so it'll be a multi-client to server config. You'll need a means of knowing that 10.20.20.x is client x and 10.20.20.y is client y. Of course OpenVPN allows you to do this but maintaining that table by hand could be a bit of a pain.

    HTTPS solutions like NewRelic aren't an option because you want to be able to ssh back into the host..

    Assuming all clients will allow it I can only think to create an out-of-band registration process whereby the clients do something like HTTPS POST to a URL you manage. The POST would contain some degree of identifying information which your system would then use to configure a new OpenVPN client config.

    1. Re:Keeping track.. by fearlezz · · Score: 2

      You'll need a means of knowing that 10.20.20.x is client x and 10.20.20.y is client y. Of course OpenVPN allows you to do this but maintaining that table by hand could be a bit of a pain.

      You mean like the common name of the ssl certificate used to connect in the first place? Combine this with a client-connect script to update dns and/or the ifconfig-pool-persist option and you've got a great solution.

      --
      .sig: No such file or directory
    2. Re:Keeping track.. by dskoll · · Score: 2

      Managing the OpenVPN connections is not that bad. You give each client its own key and certificate and you use OpenVPN's ccd/ directory to assign VPN IP addresses.

      We use the following tools to monitor our servers, but we're only monitoring about 30, not 500:

      • OpenVPN for accessing the remote servers. SSH if we need to log on to the server to do something. Some of our more important servers include built-in KVM-over-IP ability which can be very handy if the OS locks up.
      • Xymon (formerly known as Hobbit) for monitoring the health of remote servers. We include some custom Xymon plugins to monitor SNMP variables. I find Xymon much easier to configure than Nagios, though it's not quite as flexible.
      • Munin for tracking performance and ensuring we have baseline data.

      I'm not sure how well this would scale to 500 boxes, though Xymon claims to be able to monitor "lots of systems".
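The ccd/ approach mentioned in this comment can be sketched as a small generator. This assumes the OpenVPN server uses `client-config-dir ccd` together with `topology subnet`, so that `ifconfig-push` takes an address and a netmask; the network range and the convention that the server holds the first host address are illustrative assumptions.

```python
import ipaddress


def ccd_entries(common_names, network="10.20.20.0/24"):
    """Return {common_name: ccd file content} with one fixed VPN address per client."""
    net = ipaddress.ip_network(network)
    hosts = list(net.hosts())[1:]  # skip the first host address, assumed to be the server
    names = list(dict.fromkeys(common_names))  # drop duplicates, keep input order
    if len(names) > len(hosts):
        raise ValueError("network too small for the number of clients")
    return {
        name: f"ifconfig-push {addr} {net.netmask}\n"
        for name, addr in zip(names, hosts)
    }
```

Each value would be written to a file named after the certificate's common name inside the ccd/ directory. Note that a /24 holds at most 253 clients under these assumptions, so 500 clients would need a larger range such as a /23.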

    3. Re:Keeping track.. by Rigel47 · · Score: 1

      Right but how do you know which connection belongs to which client without setting it all up by hand? Presumably he'll have to initiate the connection via script or manually on the first go-round so I suppose that's the proper time to build out the mapping.

    4. Re:Keeping track.. by dbraden · · Score: 2

      There's no need to install Ansible on the remote systems, only on the machine running the playbooks. All Ansible activity is run over SSH and has no remote dependencies.

    5. Re:Keeping track.. by mlts · · Score: 1

      I personally have used Xymon with more than that many systems. It takes time to classify them, but it is doable.

      The price is right on Xymon, however, if I were to recommend a monitoring solution for both real time, "oh shit" monitoring such as a drive array about to fail as well as a historical log (for security and finding a baseline), I'd go with Splunk if possible due to the tools available, and the fact that you can send management-friendly reports about the health of the enterprise up the chain.

      Again, a monitoring server is one of the most sensitive boxes you can have (and usually one that isn't secure), so take the time to harden it and do it right.

  9. Or you could by kilodelta · · Score: 1

    Just download JFFNMS - it's a Net Monitoring system more than capable enough to watch 500+ servers. It can also be configured to do email and text alerting. It monitors CPU, Memory, Disk etc. It's pretty much the open source version of Nagios.

    1. Re:Or you could by Idimmu+Xul · · Score: 3, Informative

      Nagios is Open Source.. GPL V2 specifically..

      --
      The problem with slashdot is that most of its users were bullied and stuffed into lockers as kids!
    2. Re:Or you could by Anonymous Coward · · Score: 1

      Actually, forget Nagios. Lately It has been turning into a NIH syndrome/Copyright/ego_clash fight. Go with Shinken instead. Drop in replacement for nagios that scales and does not have childish problems.

  10. Re:Central server by danknight48 · · Score: 1

    Would this centralized server be your universal remote server?

    Is this a serious question? lol

    It would be a Desktop PC, constantly mobile. Works only in 64bit with local mouse and keyboard inputs.

  11. Openvpn and x11vnc by Wycliffe · · Score: 2

    I do something similar. I use openvpn and x11vnc. I have a cron on each client that runs a
    small perl script that grabs the output of several programs like top, uptime, and sensors
    and then saves the results in an easy to parse file that my server periodically grabs so that
    I have stuff like cpu temperature, cpu usage, memory usage, etc...
    I also grab a screenshot of x11vnc using vnccapture.
    I also have a way to remotely activate reverse ssh if for some reason openvpn fails.
    My only problem with openvpn is key management. Creating and distributing unique keys
    to each client is kind of a pain.
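A rough illustration of the cron-driven collector idea, written in Python instead of the Perl script the poster describes. It gathers only load average and disk usage from the standard library; CPU temperature and the other values mentioned would need tools such as `sensors`, which are not covered here. The output file name is up to the caller.

```python
import json
import os
import shutil
import time


def collect_metrics(path="/"):
    """Gather a few basic health numbers into a flat, easy-to-parse dict."""
    load1, load5, load15 = os.getloadavg()  # available on Unix-like systems only
    disk = shutil.disk_usage(path)
    return {
        "timestamp": int(time.time()),
        "load1": load1,
        "load5": load5,
        "load15": load15,
        "disk_total_bytes": disk.total,
        "disk_free_bytes": disk.free,
    }


def write_metrics(filename, metrics):
    """Write to a temp file and rename it, so a reader never sees a half-written file."""
    tmp = filename + ".tmp"
    with open(tmp, "w") as fh:
        json.dump(metrics, fh)
    os.replace(tmp, filename)
```

The central server can then fetch the JSON file over the VPN on its own schedule and compare the timestamp against the current time to spot clients whose collector has stopped running.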

  12. Hopefully this goes without saying by 93+Escort+Wagon · · Score: 3, Insightful

    Make damn sure your clients are aware of exactly what you're doing. They probably don't care about the specifics (e.g. openvpn, reverse ssh); but they need to know you can remotely access the boxes.

    It's probably a good idea to have some sort of document to give them that does spell out all the specifics - something they need to acknowledge/sign, with both of you keeping copies.

    --
    #DeleteChrome
    1. Re:Hopefully this goes without saying by dskoll · · Score: 4, Informative

      Actually, the model of remotely-managed on-premise appliances is not that crazy. Assuming it's done securely, you get the best of both worlds:

      If the customer's Internet access goes down, they're not dead in the water as they would be with a cloud solution.

      If you manage everything for them, then the box is completely hands-off... just like a cloud solution.

      There's an entire business category called "Managed Service Providers" whose vendors do exactly this: Remotely manage all aspects of your IT infrastructure so you don't need to worry about anything. For mom-and-pop non-technical businesses, it's an excellent model.

    2. Re:Hopefully this goes without saying by dskoll · · Score: 2

      The fact that a well-managed cloud service is multiply-redundant is of little consolation if your crappy DSL line goes down for 6 hours and your salespeople cannot access the CRM tool.

      What's more likely to happen: the loss of access to Amazon cloud services/internet, or a local box getting cacked

      Unequivocally for us: Loss of Internet access happens far more often than a server failure.

    3. Re:Hopefully this goes without saying by dskoll · · Score: 1

      Our DSL is not particularly unreliable. However, our servers are spectacularly reliable. They run Linux on decent hardware and we almost never have a server failure. Our most common cause of a server failure over the last 10 years has been power failures long enough for the UPS to decide we'd better shut down.

  13. OpenVPN + Nagios by 8083 · · Score: 1

    is the solution I use and is working well. Routers are 1U mini atx boards with pfSense. Nagios mostly with NRPE, SNMP for devices on which I can't install packages. Has worked well for the last ... 8 years or so.

    1. Re:OpenVPN + Nagios by sirsnork · · Score: 1

      Icinga rather than nagios... always... the simple basic changes to Icinga make it so much nicer to work with, even the v1 branch which is just a fork with some updates

      --

      Normal people worry me!
  14. zabbix is NOT an snmp manager by TheGratefulNet · · Score: 2

    not really. snmp is an afterthought for them and its clumsy as hell to add snmp to it. I tried and gave up. instead, I picked hobbit (uhm, the new name is 'xymon').

    xymon has its quirks but it was not hard to modify to add more snmp features to and its coding was not too bad to get thru. its not written in a lot of 'strange' languages, and that's a plus, to me, too.

    personally, I usually just write snmp code fresh, from scratch, using net-snmp mgr tools. its not hard and you get just what you want and you are not muddled down in lots of 'infrastructure' that someone else thought was good but useless to you (like zabbix).

    --
    "It is now safe to switch off your computer."
  15. BMC Patrol by snowsnoot · · Score: 1

    Excellent monitoring solution; it can generate KPI-based reports, email/sms/snmp notifications etc., comes with a bunch of out-of-box server monitoring modules, and you can build your own with scripts or SNMP GETs. I swear by it.

    1. Re: BMC Patrol by snowsnoot · · Score: 1

      Im not using the agents :)

  16. Re:Security? by Anonymous Coward · · Score: 1

    "BUT before you set this up, be damn sure that you don't punch a hole in your customers' firewalls by having a VPN to your monitoring server. Having 500+ VPN connections from one Linux box to servers located in customers' internal networks might backfire at some point if it's implemented incorrectly."

    Just disable client-to-client in the OpenVPN server (which routes all activity through the tun device) and setup iptables to accept only incoming/established connections on the tun device. Only allow the server to create new OUTPUT connections on the tun device (and only to ssh/snmpd/nagios-nrpe for example).

    Use certificates with a decent parsable name to figure out which client is where. Configure static ips for the clients to make it easy.
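One possible reading of the firewall policy above, sketched as a Python function that only generates iptables command strings and does not apply them. The interface name `tun0` and the default ports (22 for ssh, 161/udp for snmpd, 5666 for NRPE) are assumptions, and the rule set has not been validated against any particular deployment.

```python
ALLOWED_SERVICES = [("tcp", 22), ("udp", 161), ("tcp", 5666)]  # ssh, snmpd, nagios-nrpe


def tun_firewall_rules(dev="tun0", services=ALLOWED_SERVICES):
    """Return iptables commands (as strings) for the monitoring server's VPN interface."""
    rules = [
        # Replies to connections the server itself opened are allowed back in.
        f"iptables -A INPUT -i {dev} -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT",
        # Anything else arriving from a client over the tunnel is dropped.
        f"iptables -A INPUT -i {dev} -j DROP",
        # With client-to-client disabled, inter-client traffic would be forwarded here.
        f"iptables -A FORWARD -i {dev} -o {dev} -j DROP",
    ]
    for proto, port in services:
        # The server may open new connections only to the listed services.
        rules.append(
            f"iptables -A OUTPUT -o {dev} -p {proto} --dport {port} "
            "-m conntrack --ctstate NEW,ESTABLISHED -j ACCEPT"
        )
    rules.append(f"iptables -A OUTPUT -o {dev} -j DROP")
    return rules
```

Printing the list first and reviewing it by hand is safer than piping it straight into a shell, since a mistake here can lock the server out of all 500 clients at once.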

  17. Re:Central server by mlts · · Score: 1

    I would elaborate on that a bit. I would have in the colo facility a Cisco ASA or other hardened appliance, and use that for the VPN connection.

    I would then build a hardened server that accepts the stuff the parent points out, SNMP traps, syslog (both TCP and UDP), but I would recommend a tool like Splunk or a similar item. Splunk has served me well in my dealings. Once that is in place, I'd set up Splunk forwarders on critical machines for more detailed monitoring.

    From there, I'd create a dashboard for realtime reporting, and a daily report detailing notable events from the past 24 hours. One can customize this to their liking. You can even have the reports mailed to you via the VPN to an internal site.

    The Splunk server will need locked down, but if one is in IT, this is an assumed part of the skillset. I would at least leave SELinux enabled, enroll the Splunk server's SSL key in your PKI, and for the OS, enable SSH keys and two factor authentication. I might even consider placing the Splunk indexes on an encrypted filesystem so if the hardware is physically stolen, the data on your machines is protected.

    Again, the thing to be careful about is the fact that so much sensitive data is on this machine, so it needs a separate firewall, and the box itself needs to be hardened.

  18. Zenoss is awesome by Anonymous Coward · · Score: 1

    Zenoss is awesome and as your business scales so can it. Our organization monitors 5000+ servers worldwide in all sorts of places. Zenoss lets you do everything you'd want. Setup notifications for one or more servers, types of errors, and filters within filters. It's a rocking platform and if you're big enough, they'll set it all up for you for a fee.

  19. Stunnel for secure connection. by Serpent6877 · · Score: 1

    Would need more information on the locations. Running Linux, Windows, Solaris? I personally use Zenoss for all of my monitoring. It is handling around 1800 devices right now and monitors all aspects of the network and servers. Zabbix uses agents. So you could run the server at your location and of course the agents connect to it for monitoring. People talk about needing a VPN connection to be safe. But another solution that I would do is use stunnel for encrypting. I do run a large openvpn setup as well. With this large of a VPN setup I would look at possibly using Quagga and doing RIP. It will be easier to manage all the routes and netblocks.

    --
    When all else fails, hire me!
  20. NAV works great by chipperdog · · Score: 1

    NAV is a great network and server monitoring suite...I have it monitoring much stuff connected over VPN.

  21. Re:PRTG by chipperdog · · Score: 2

    NAV has very similar functionality to prtg, but is completely open source.

  22. Re:PRTG by chipperdog · · Score: 1

    Network Administration Visualized is a good alternative to PRTG

  23. Look at the ELK Stack by SpzToid · · Score: 1

    The ELK Stack (ElasticSearch, Logstash, Kibana) are great tools for capturing logs from *anything*, indexing and massaging of the data captured, and then offering up visualization, searches, and dashboards (that refresh). Built with Angular.js so the speed happens.

    We could be talkin' web server logs of the NY Times servers, centralized and displaying dashboards in real-time, or maybe 24/7 sensor data streaming from the ocean floor. The ELK Stack can do it.

    First googled citation, and there's plenty more where this came from: http://thepracticalsysadmin.co...

    --
    You can't be ahead of the curve, if you're stuck in a loop.
    1. Re:Look at the ELK Stack by silas_moeckel · · Score: 1

      ELK works but frankly its defaults do just about nothing. As a stack sure it's great but it needs to be added as an adjunct to a real monitoring system and it needs useful defaults and/or some sort of add on repository. The opennms boys are working on showing rrd data into ES.

      Pretty much you set up ELK and go "great, my logs are all in one place," but it does nothing by default, nor is it easy to do anything useful with it. Ad hoc searches of logs are great and all, but you're basically replacing ssh cat | grep. Take a common thing like percolating up an alert when a bit of redundant hardware fails and pushing that event into a ticketing system to the correct group and priority: ELK needs a lot of customization to do anything useful there. Sure, you can put a search in a window somewhere and make a human look, but that is frankly going back two decades in sysadmin space. Devs seem to like it, but it's pretty much an ad hoc reporting tool for them.

      --
      No sir I dont like it.
  24. Re:Reverse-SSH tunnel phone-home from remote devic by BitZtream · · Score: 4, Insightful

    Or, do the right thing and hire a network admin so someone with a clue is involved.

    If you have to ask this question on slashdot, you need to change the question to something appropriate. Based on exactly what was posted, he doesn't have any idea what his requirements are. He knows the conceptual goals, but not the actual goals or requirements. Unless he is trying to change careers from whatever he is to a full time network infrastructure person he is going to be wasting a lot of time getting a clue. That means time he won't be spending doing whatever his actual job is.

    He needs someone who can look at his actual setup, figure out what actually needs monitored, and knows the appropriate ways to do it.

    Short of multiple Bennett Haselton-length posts, and many discussions in depth, no answer coming from slashdot or all of them combined is going to be useful.

    Everyone here posting solutions has their own, certainly incorrect idea of what he wants but no one actually knows. No one so far has even started by asking the right questions. It's the blind leading the blind at best.

    --
    Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
  25. Re: Reverse-SSH tunnel phone-home from remote devi by Redbehrend · · Score: 1

    How does he not have it in the first place? Explorer with 500 client servers.. Soon as I had 5 servers I setup central mobile monitoring lol He needs to hire someone that knows what they are doing for sure. Google it and the top open source monitor comes up as a start...

  26. GFI MAX by DigiShaman · · Score: 2

    Problem solved. Next topic please.

    http://www.gfimax.com/

    --
    Life is not for the lazy.
  27. Call me silly by Princeofcups · · Score: 1, Insightful

    But shouldn't this have been part of the design BEFORE you rolled out 500 servers?

    --
    The only thing worse than a Democrat is a Republican.
    1. Re:Call me silly by thegarbz · · Score: 1

      I'll bite, and I'll call you silly.

      Many projects evolve over their lifetimes. This isn't just an IT thing. In many cases during the construction / commissioning stage you'll come out of the end with a wishlist of things and features to add in the future. Many such things would be impossibly expensive (both in money and lost time) to add during the project stage, and many projects which demand everything from the very beginning end up turning into an unmanageable behemoth.

      If the primary goal was to get 500 servers operational then adding this after the go-live is perfectly legitimate.

  28. Re: Reverse-SSH tunnel phone-home from remote devi by GTO44 · · Score: 1

    Maybe he should hire someone considering he has 500 servers and he is just now thinking of implementing a monitoring solution. And this board is only anonymous if you post as AC ;) also, fuck your granddaddy

  29. Nagios with NSCA? by z3r0w8 · · Score: 1

    I would write a wrapper though to make the whole thing a bit more robust. Groundwork does this with their GDMA agent and it allows you to centrally configure and have the client pick up its configuration.

    --
    -----
  30. You could outsource this. by rspott · · Score: 1

    Let me know if you want to do something like this and we can work something out. Reply to this and we can connect.

  31. I would do exactly what you outlined by maas15 · · Score: 1

    A place I worked for did exactly that. There are a few details that you should attend to - give out ip addresses based on the ssl certificate used by the openvpn client (and make sure you don't deploy the same ssl cert to two servers!), and have a method of restarting openvpn every time it crashes/disconnects (and exits). You'd be surprised how flaky enterprise internet connections can be. From there my work kept a database of all the openvpn servers and used it to generate a nagios config. Honestly, I've never loved nagios since it frequently doesn't QUITE do what I want, but it's good enough. If your clients are all internet accessible, I've been using a slightly expensive commercial service called Monitis which I really like. Contrary to what a number of people here have said, I don't think you need a network admin at all, if you can get the vpn stuff working with a simple acl (to keep clients' interns from bothering each other) then you should be set.
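The "database of servers used to generate a nagios config" step might look roughly like the sketch below. It assumes a `generic-host` template (as shipped in the Nagios sample configuration) supplies the check settings, and the client tuples stand in for whatever the real database returns; neither detail comes from the poster's setup.

```python
HOST_TEMPLATE = """define host {{
    use        generic-host
    host_name  {name}
    alias      {alias}
    address    {address}
}}
"""


def nagios_hosts(clients):
    """Render one Nagios host definition per (name, alias, vpn_address) tuple."""
    return "\n".join(
        HOST_TEMPLATE.format(name=name, alias=alias, address=address)
        for name, alias, address in clients
    )
```

Regenerating the whole file from the database on every change keeps the Nagios config and the VPN address table from drifting apart, which matters more as the client count grows.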

  32. SolarWinds Server & Application Monitor? by zmq503o1 · · Score: 1

    Have you considered SolarWinds Server & Application Monitor? The latest version, currently in beta adds an optional agent that negates the need for VPN tunnels. It supports overlapping IP address space, NAT traversal, passing through authenticated proxy servers, and communications are fully encrypted. These agents report back to a single, centralized server at your location, or in the cloud, such as Amazon EC2, Azure, RackSpace, etc.. More information can be found at the following links. https://thwack.solarwinds.com/... https://thwack.solarwinds.com/... https://thwack.solarwinds.com/... If that doesn't fit the bill, you should consider taking a look at N-able, which is a purpose built solution designed specifically with MSP's in mind. More information on N-able can be found at the following link. http://www.n-able.com/

  33. Re: That paid product looks like shit by DigiShaman · · Score: 1

    It does the job and fulfills all the requirements of the OP.

    I use it for this purpose, I should know. For example, if the Information Store service stops or the drive reaches a free space threshold, I'm going to be notified immediately!

    --
    Life is not for the lazy.
  34. Lemme rephrase by Munchr · · Score: 1

    I want to drastically increase my clients exposure to attack by opening remote holes in their network firewall through my equipment. How can I best go about doing so?

  35. Re:Reverse-SSH tunnel phone-home from remote devic by BitZtream · · Score: 1

    The reverse-SSH tunnel is the correct way to "phone home". Maintaining a VPN is a shit show.

    A blanket statement like this shows your cluelessness and shear ignorance.

    Without considerably more information neither you nor I nor anyone else can make such a statement.

    Pure Storage does it this way, and they are quite the experts.

    Oh well, since a company thats barely 5 years old does it this way, and since their primary business line is selling flash drive arrays ... not network administration and monitoring ... they must be the most qualified and perfect example to follow.

    IS IT the right way for THEM? Maybe. Maybe not. To pretend that just because they do it that way, they are experts again just shows your ignorance. Let me guess, you work for them on their monitoring team, don't you?

    --
    Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
  36. bash by LordThyGod · · Score: 1

    ... is your friend. A simple shell script run from cron every so many minutes to test to each server, and then text / email / raise an alarm if no answer. I'd do this from at least 2 locations to allow for transient network issues or the monitoring systems have hardware issue and tank. And don't use windows for critical stuff. A couple of low end linux systems on amazon or similar would work. Low cost, efficient and very manageable.
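The point about checking from at least two locations can be expressed as a small decision rule. This sketch uses Python rather than bash, and the location names and the threshold of two failing vantage points are illustrative assumptions.

```python
def should_alert(reports, min_failures=2):
    """Decide whether to raise an alarm from per-location check results.

    reports maps a monitoring location name to True (server answered) or False.
    """
    failures = [location for location, ok in reports.items() if not ok]
    return len(failures) >= min_failures
```

With two monitors, a single failing report is treated as a likely problem on the monitoring side, while both failing points at the server or its uplink.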

    1. Re:bash by i.r.id10t · · Score: 1

      Duct-tapey but it works. Lets add some bailing wire and include a phone with a limited data plan that you can pair with via bluetooth or usb. That way, when both their internet connection *and* the box you are monitoring go tits up at the same time you can be notified as well.

      --
      Don't blame me, I voted for Kodos
  37. Re:Reverse-SSH tunnel phone-home from remote devic by BitZtream · · Score: 2, Funny

    Just because you're unfamiliar with networking administration doesn't mean this needs to blown up into "hire a network guy". That's just ignorance and

    As someone who's been a network admin for a few years, I'm fairly confident in my statements. Do you do even minor surgery on yourself if you're not a surgeon? If you come to slashdot to ask how to do something for your business, you already fucked up and the only valid responses you should be getting from slashdot are help on finding someone who can help you. If he asked 'how do I find someone, like a consultant for a short term project, like this' that would be one thing. He didn't, he came here expecting a solution which illustrates his complete lack of understanding of the problem, THAT IS WHY he needs to hire a network guy.

    He is, by definition, ignorant, which is why he is asking for help ... clearly you are as well as your choice of words indicates. I suggest you learn what the word ignorant means before you brandish it about like an insult as you just end up insulting yourself through your own ignorance.

    (I suspect) trying to make yourself sound important on an anonymous message board.

    I have no need to make myself sound important, I certainly don't need your approval ... and if you bother to google for my nick, you'll find its not even a little difficult to link to a real name, address, and everything else. I'm not in the least bit anonymous. People have been able to recognize that nick and its association with me for 20+ years. On the other hand ... your post ... is from ... anonymous coward. Do you know the meaning of the word ironic?

    As my granddaddy used to say, if you don't know what you're talking about, it's best to not open your mouth and prove it. So no need to apologize, just take the advice and consider it a lesson learned. Best of luck.

    Your grand daddy said that to you a lot, didn't he? Did you ever wonder WHY he said it to you so much? Maybe he was trying to get some sort of point across to you ... Go look in the mirror and repeat those words until you get the point of them and who he was talking about. Hint: It's the guy in the mirror.

    You're an absolutely shitty troll. You just suck at it. Nothing you've said did anything other than show how stupid YOU are, not me.

    --
    Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
  38. Re:Reverse-SSH tunnel phone-home from remote devic by Pentium100 · · Score: 1

    As someone who's been a network admin for a few years, I'm fairly confident in my statements. Do you do even minor surgery on yourself if you're not a surgeon?

    I am a network (and Linux) admin by profession, but I can also repair my audio equipment and do some repairs on my car, even though I do not work as a car mechanic or electronics repair guy. While I could find a mechanic to repair my car (and sometimes I do), a lot of the time it is cheaper and faster to do it myself.

    So, if the OP wants to create a monitoring solution himself (assuming he knows something about the monitoring systems) more power to him. I probably would ask a similar question if I had to monitor 500 remote servers that are in different locations (if they are all in the same place I would just use VPN). It would be possible to use VPN or SSH tunnels or something else, but sometimes one may need advice from others as to which option is the best.

  39. Re:Reverse-SSH tunnel phone-home from remote devic by Anonymous Coward · · Score: 1

    I agree. A meeting needs to be held with the technical team to determine what exactly needs to be monitored.

    With that being said, ask yourself a few questions:

    Are you looking for a heartbeat?

    Are you actually more concerned for the applications running on the servers?

    Are you looking to monitor individual pieces of hardware, e.g. CPU, RAM, etc.

    Are you trying to determine if there is a network hardware failure as well, e.g. router, switch, etc. (did a switchport die and did I lose a particular subnet or cluster?)

    All or none of these things can be important, but BitZtream is correct. Without a lot more knowledge of what is needed there is no way of giving OP a method of accurately monitoring the required infrastructure.

  40. Check out "The Assimilation Project" by mnemotronic · · Score: 1

    Take a look at The Assimilation Project : What we do: Continually discover and monitor systems, services, switches and dependencies with very low human and network overhead.

    --
    The Russians have won. They have made the world a cesspool of distrust, greed, fear and hate.
  41. Re:Reverse-SSH tunnel phone-home from remote devic by pspahn · · Score: 2

    You sound like a Windows admin for a gov't entity.

    You spend a lot of energy telling people they do it wrong without having any real insight or advice on how to do it correctly.

    A blanket statement like this shows your cluelessness and sheer ignorance.

    What does his knowledge of a specific cutting tool have to do with anything?

    --
    Someone flopped a steamer in the gene pool.
  42. Re:Zenoss is awesome - Zenoss Core + OpenVPN by JayTech · · Score: 1

    Anon - Why base your opinion on an experience back in 2008? This is six years later and the product has matured since then. The Zenoss Core (http://www.zenoss.org) open source project is bigger than it's ever been, it is very reliable, and is used by many large corporations today.

    OP - For what it's worth, any open source monitoring software should play just fine with OpenVPN. However, the monitoring feature set should be simplified into a single interface, you don't want to have to be fixing scripts and maintaining the software all the time.

    I actually used to deploy OpenVPN + Zenoss for remote site monitoring. In my case I needed to monitor multiple systems at the customer premises (using Zenoss Enterprise/Service Dynamics for the remote collector integration), but you should have it a bit easier since you only have one server to monitor. I found configuring OpenVPN to be a bit of a challenge, but once that part was done the rest was a piece of cake. It will be a lot of work with the sheer volume of 500 clients (with that amount of traffic you might even need to break it into two OpenVPN endpoints) but I'm sure you are already aware of that.

    I would say definitely take a closer look at Zenoss Core. A side note, Zenoss Service Dynamics is their enterprise product with advanced features, but for you the "technology stack" needs only to consist of Zenoss Core (free) + OpenVPN. Set up OpenVPN as you described so that the clients deployed on your remote servers can connect back through HTTPS - as long as they have an internet connection no holes need to be poked through your customers' firewalls. Drop Zenoss on the OpenVPN endpoint box(es). Then use the OpenVPN IPs to monitor the servers. For each individual server, configure the SNMP community string if Linux, or set up WMI if Windows (no need to configure traps, Zenoss polls the boxes at specific intervals). Use the wizard on the Zenoss web interface to add the host and model it. Away you go, you can now see the events in the Zenoss console for everything from ping status to CPU utilization. Events go to the console which you can monitor, or you can easily set up e-mail alerts to trigger. For example, say one of the disks throws a SMART error; trigger an e-mail to you so you can ship the customer a new disk to install just like NetApp does.
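
    As a rough illustration of what "polling the boxes over the OpenVPN IPs" means, here is a minimal sketch. It is not how Zenoss does it internally; it simply shells out to the net-snmp `snmpget` command, which is assumed to be installed on the monitoring box. The address 10.8.0.6 and the community string "public" are placeholders, and `snmpget_command` and `poll_host` are hypothetical helper names.

```python
import subprocess

# sysUpTime.0 from the standard SNMPv2-MIB; a cheap "is the agent alive" query.
SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0"


def snmpget_command(host, community, oid=SYS_UPTIME_OID, timeout_s=2, retries=1):
    """Build the argument list for a net-snmp `snmpget` SNMPv2c query."""
    return [
        "snmpget", "-v2c", "-c", community,
        "-t", str(timeout_s), "-r", str(retries),
        host, oid,
    ]


def poll_host(host, community):
    """Run one poll; return the raw snmpget output, or None if the poll failed."""
    try:
        result = subprocess.run(
            snmpget_command(host, community),
            capture_output=True, text=True, timeout=10,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # snmpget is not installed, or the call hung past our own deadline.
        return None
    return result.stdout.strip() if result.returncode == 0 else None


if __name__ == "__main__":
    # Placeholder VPN address and community string; substitute your own.
    print(poll_host("10.8.0.6", "public"))
```

    Run on a schedule against each client's VPN address, a None result is the "host did not answer" event that a console or e-mail alert would be built on.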

    As I mentioned, you can definitely use Zabbix or some other variant to do the monitoring part. I researched and played with many monitoring solutions (commercial and free) before I settled on Zenoss. What made the difference for me was that I found I was spending way too much time learning the quirks of the software (e.g. Nagios - config file to add a client, really! SolarWinds - Agent installation required, really!) and not enough time actually deploying monitoring to the targets. Good luck, hopefully this info helps you find the right fit for your environment!

  43. Ping is not reliable by mveloso · · Score: 1

    Ping is almost the worst way to check to see if your server is up. In fact, certain machines will return an ICMP response even after you've broken into their bios-equivalent (hello, Solaris).

    Do a service level check. It's not that hard to do a curl instead of a ping. A curl's results can show you if it's present and functioning. A ping just shows you that the network interface is responding or not.

    People disable ping because if you don't know a server is there you can't attack it. It's like enabling MAC address filtering - it doesn't really help that much, but in a specific set of circumstances it helps a bit.
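
    A minimal sketch of such a service-level check, using Python's standard library in place of curl. The URL and the "healthy" marker text are assumptions for illustration, and `evaluate_response` and `check_service` are hypothetical helper names.

```python
import urllib.error
import urllib.request


def evaluate_response(status, body, expect=None):
    """Decide whether an HTTP response looks like a working service."""
    if status != 200:
        return False, f"unexpected status {status}"
    if expect is not None and expect not in body:
        return False, "expected marker not found in response"
    return True, "ok"


def check_service(url, timeout=5.0, expect=None):
    """Fetch url and return (ok, detail); a ping cannot tell you any of this."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return evaluate_response(resp.status, body, expect)
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 500.
        return evaluate_response(exc.code, "", expect)
    except (urllib.error.URLError, OSError) as exc:
        # DNS failure, refused connection, timeout, and similar.
        return False, f"unreachable: {exc}"


if __name__ == "__main__":
    # Placeholder URL; point it at a page your application actually serves.
    print(check_service("http://localhost:8080/status", expect="healthy"))
```

    Checking for a marker string in the body catches the case where the web server is up but the application behind it is returning an error page with a 200 status.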

    1. Re:Ping is not reliable by Enry · · Score: 1

      People disable ping because if you don't know a server is there you can't attack it. It's like enabling MAC address filtering - it doesn't really help that much, but in a specific set of circumstances it helps a bit.

      If there are no other services presented to the world, yes. But a simple port scan will tell you it's up and that doesn't take long to do.

  44. Security and liability: think Target by mveloso · · Score: 1

    The media says Target was breached due to a compromise at their HVAC vendor. Do you want to be the vendor that gets hit with a liability suit because someone broke in through your network?

    It's obvious from your question that you're not really sure what you're doing. SNMP? That's for network crap, not for server and application level stuff. Why would you even talk about SNMP? Why would you even want a VPN into the customer network?

    If you need access to your server, write it into your support contract, and ask the vendor for a VPN login. Then the vendor can turn that login on and off when an outage occurs. Then just use NewRelic for monitoring (assuming your machine can get out).

    If you need continuous access to your server, write it into your support contract, then make sure that (1) you really need it, and (2) your security is better than your customers' security.

    Or, if you want to screw everyone, just run a TeamViewer instance on it and connect to it on the sly. I'm sure your customers would love that, but that's what you're basically asking them to allow you to do.

  45. Status updates by AndyCanfield · · Score: 1

    I manage a hub server and a backup server. Every 60 seconds the backup server crontab (wget) fetches a 'web page' from the hub server which as a side effect records the caller's IP address into a file. Even though the backup server has a dynamic IP address I can always find it by going to the hub server and looking into that file.

    I have a page I can go to on the hub server which checks the timestamp on the file BackupServer.ip. If it is suspiciously old then that web page turns red and tells me that things are cut off. If all is OK the background stays green. You can see it at http://gregor/ServerCheck.php. I check it every time I start my browser.

    It would be trivial to support more than one call-in server. It would be easy to add more complex status information. From your notebook computer anywhere in the world you can go to that web page and see that all is OK, or, if it is not, which remote server has a problem.
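
    The multi-server version of this check-in scheme could look roughly like the sketch below. The original setup uses a PHP page and a file per server; this keeps the same idea in memory with a dictionary. The names `record_checkin`, `stale_servers` and `page_colour`, and the 180-second cutoff (three missed 60-second check-ins), are assumptions for illustration.

```python
import time


def record_checkin(registry, server_name, caller_ip, now=None):
    """Store the caller's IP and the time it last phoned home."""
    registry[server_name] = {
        "ip": caller_ip,
        "last_seen": time.time() if now is None else now,
    }


def stale_servers(registry, max_age_seconds=180, now=None):
    """Return the names of servers whose last check-in is suspiciously old."""
    now = time.time() if now is None else now
    return sorted(
        name for name, entry in registry.items()
        if now - entry["last_seen"] > max_age_seconds
    )


def page_colour(registry, max_age_seconds=180, now=None):
    """Red if any server has gone quiet, green otherwise."""
    return "red" if stale_servers(registry, max_age_seconds, now) else "green"


if __name__ == "__main__":
    registry = {}
    record_checkin(registry, "client-001", "203.0.113.10", now=1000.0)
    record_checkin(registry, "client-002", "203.0.113.20", now=1290.0)
    # At t=1300 client-001 has been silent for 300 seconds.
    print(page_colour(registry, now=1300.0), stale_servers(registry, now=1300.0))
```

    Because the remote servers initiate the connection, this works through customer firewalls that allow outbound HTTP, and the recorded IP doubles as a locator for hosts with dynamic addresses.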

  46. Whats Up by bev_tech_rob · · Score: 1

    Our company uses 'Whats Up' by Ipswitch. Currently monitoring over 2500 devices such as servers, routers, temperature sensors. You can ping devices, monitor for SNMP events, logged events in Windows, AIX, Linux, WMI monitoring, services, tasks.... You can script custom monitors either via VBscript, Powershell, or JavaScript. You can script custom actions for Whats Up to take upon detecting a condition. Can restart services on either *nix or Windows boxes if they go down. Can launch applications if needed if a condition is detected. Can create audio, visual, and email alerts, as well as SMS. They license on a per-device basis as opposed to a per-port basis like SolarWinds. Only thing I don't care for on this software is you can only run Microsoft SQL for a database. Can't use any open source solutions. The default install uses MS-SQL desktop version, but the db size is limited. If you need to go bigger, you have to install a full install of SQL on the server, or connect to a remote SQL server on your network to host your database (as we are). My .02 cents...

    --
    You're messin' with my Zen Thing, man.....
  47. Cacti - It's GNU by Tyr07 · · Score: 1

    I've used cacti to monitor servers before, works quite well.

    Supports many SNMP functions, easy to set up.

  48. www.pulseway.com Ultimate Flexibility by Megabyte · · Score: 1

    This is an amazing product. I've used this in the past and LOVE it! Need to run a remote PowerShell command from your Android? It does that. Dashboards for all the things? Has that covered.

    Check it out:

    http://www.pulseway.com/

  49. Another option to reverse ssh tunnels and openvpn by mejustme · · Score: 1

    If you're using linux or BSD, another option to reverse ssh tunnels or openvpn would be EPS Conduits: http://eps-conduits.sourceforg...

    It was written with the goal of having a large number of remote devices form a virtual network for ease of management/maintenance.

  50. Cacti by fuzzywig · · Score: 1

    Cacti is a FOSS monitoring service that can give you a big dashboard showing up/down status, and you can drill down to view graphs of pretty much anything you can monitor over SNMP. Oh, and you can have emails on up/down and reaching thresholds (eg "$host has reached threshold of 75% full on /var/" or whatever).
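
    To show the shape of such a threshold alert, here is a minimal sketch. Note that it is not Cacti and does not use SNMP: it reads the local filesystem with Python's `shutil.disk_usage` as a stand-in for the polled value. The host name and the helper names `threshold_alert` and `check_path` are hypothetical.

```python
import shutil


def threshold_alert(host, path, used_percent, threshold=75.0):
    """Return an alert message if usage is at or above the threshold, else None."""
    if used_percent >= threshold:
        return (
            f"{host} has reached threshold of {threshold:.0f}% full on {path} "
            f"(currently {used_percent:.1f}%)"
        )
    return None


def check_path(host, path, threshold=75.0):
    """Measure a local filesystem and apply the threshold to it."""
    usage = shutil.disk_usage(path)
    return threshold_alert(host, path, 100.0 * usage.used / usage.total, threshold)


if __name__ == "__main__":
    # "server01" is a placeholder host name; "/" is used so the sketch runs anywhere.
    message = check_path("server01", "/")
    print(message if message else "below threshold")
```

    Keeping the comparison separate from the measurement means the same rule can be applied to a value that arrived over SNMP instead of from the local disk.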

    We have VPNs to each data centre and client site and administer them over SSH generally. Some systems (eg ones dealing with customer details like credit cards) we have a single external facing host with Yubikey authentication to reach that network, and we use SSH port tunnelling to reach other hosts.

  51. PRTG is the most cost effective and feature rich by bdwebb · · Score: 1

    So about 7 years ago I tested out Nagios, What's Up Gold, Cacti, Zabbix, SolarWinds Orion, and a variety of other software monitoring solutions and the problem that we had for almost all of them is that they required heavy customization or that they were incredibly expensive when they included more initial customization regarding device discovery, included templates, etc. (a la SolarWinds). We finally settled on PRTG (www.paessler.com) because it had some of the industry standard devices templated already in a basic fashion, has an easy to use interface, and has the ability to be heavily customized.

    Another feature that we really needed was remote monitoring for our customers as we are an MSP. All Remote Probe agents with PRTG will create an encrypted SSL tunnel between the Remote Probe and your core server installation at your office or colocation. This requires no customization at all unless you are denying certain ports outbound from the probe server, in which case you simply need to allow port 23560 (or whatever you've customized it to) outbound to your core server's public NAT IP. This does not give you remote control of servers necessarily but it does provide a channel for all locally monitored data to be sent upstream to your location without requiring an OpenVPN or anything like that (except if you wanted remote access you could have PRTG's remote probe piggyback across there as well and you would then also have the ability to remote control). You can deploy as many remote probes as you would like and can therefore centralize all your monitoring data as well as create reports, custom maps, and even provide customer access via nested Access Rights dependencies.

    One thing I will mention - SNMP trap monitoring is a wasted effort. I know there are many proponents of it out there but if you are not actively polling your data and gathering graphable results then you have no troubleshooting abilities, no trending reports, no data utilization analysis for service management, etc. You should configure templates for your devices to standardize them and monitor all of your critical data actively so you can then use the historical information to say "Ok...this server just went down - why? Check CPU utilization - OH it looks like all cores on this CPU jumped to 100% CPU utilization just before this device went unresponsive. Let me check my individual process utilization - OH there's the process causing the problem." Troubleshooting done. Imagine receiving a trap for this device - if the device is already unresponsive by the time the trap is sent, the trap never reaches your monitoring server and everything is still hunky-dory. You may also have ICMP monitoring in place so you know the device is offline but is the ISP down? Is some LAN resource like a Router/Firewall/Switch down? Is the server down? Why? Most of these questions can be answered by historical monitoring data and I cannot say enough that SNMP traps are useless 95% of the time.
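
    The argument for polled history can be sketched in a few lines. This is not PRTG code, just an illustration of keeping recent samples so that an outage can be explained after the fact. The class `PollHistory`, the function `explain_outage`, and the 95% alarm level are assumptions; a sample value of None stands for a poll that got no answer.

```python
from collections import deque


class PollHistory:
    """Fixed-size history of (timestamp, cpu_percent) samples for one host."""

    def __init__(self, max_samples=120):
        # 120 samples at a 30-second interval is one hour of history.
        self.samples = deque(maxlen=max_samples)

    def record(self, timestamp, cpu_percent):
        """Store one poll result; cpu_percent is None if the host did not answer."""
        self.samples.append((timestamp, cpu_percent))

    def is_down(self):
        """True if the most recent poll got no answer."""
        return bool(self.samples) and self.samples[-1][1] is None

    def last_known(self, count=3):
        """The most recent samples where the host did answer."""
        answered = [s for s in self.samples if s[1] is not None]
        return answered[-count:]


def explain_outage(history, cpu_alarm=95.0):
    """Give a first guess at why a host went quiet, based on its own history."""
    if not history.is_down():
        return None
    recent = history.last_known()
    if recent and all(cpu >= cpu_alarm for _, cpu in recent):
        return "CPU was saturated just before the host stopped answering"
    return "no CPU saturation beforehand; check the network path or ISP"


if __name__ == "__main__":
    history = PollHistory()
    for t, cpu in [(0, 20.0), (30, 99.0), (60, 100.0), (90, 100.0), (120, None)]:
        history.record(t, cpu)
    print(explain_outage(history))
```

    A trap-only setup has nothing to consult at this point, which is the practical difference being described.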

    For validation of my claims & experience with SNMP, I have been a Principal Network Engineer for an MSP in LA for over 9 years and we currently operate a PRTG install for our MSP customer monitoring with over 18,000 sensors monitored actively, polled every 30 seconds.

  52. Continuum by bitty · · Score: 1

    Check out http://www.continuum.net/. I've been using their services for over 5 years, and they've been steadily improving it since they split from Zenith Infotech. No, it's not free, but it's quite cheap per unit and you get a lot of bang for your buck. Remote monitoring and alerts on any service, remote access, at-a-glance dashboard, etc. With 500 clients, I'm guessing you'd rather spend your time monitoring the situation than putting together a custom solution.