Ask Slashdot: Best Way To Block Web Content?
First time accepted submitter willoughby writes "Many routers today have the capability to block web content. And you all know about browser addons like noscript & adblock. But where is the 'proper' place for such content blocking? Is it best to have the router only route packets & do the content blocking on each machine? If using the content blocking feature in the router, will performance degrade if the list of blocked content grows large? Where is the best place to filter/block web content?"
Unplug your modem. Internet is now filtered. Enjoy your day!
Or, perhaps, sitting down with your users and discussing with them how to surf intelligently and safely.
And you all know about browser addons like noscript & adblock. But where is the 'proper' place for such content blocking?
If you're talking about adblocking, the 'proper' place is at your visual cortex where images are processed -- and I know I'm alone in that unpopular view. Blocking ads is like throwing a soda can out a car window in that if one person does it, it's not a problem and it appears to benefit them modestly. But if everyone does it, it ruins the very thing you're enjoying. I can understand why you'd do it if the ad was a massive flash blob but many ads by Google or just images aren't resource intensive.
I've clicked on ads and purchased something twice in my life from ads on a site. Once it was cheap shirts with funny designs on them (I needed new gym shirts) and the other was an eBay auction with a Buy It Now price lower than what I was looking at on that site (not sure how that works). I consider myself a pretty sophisticated person who is "above" advertising but anecdote-wise it's worked on me twice that I can think of. Removing that rare occurrence completely ruins the revenue model.
My work here is dung.
I prefer at the proxy level. Dansguardian/Squid/ClamAV is pretty easy to set up on your distro of choice.
I hate sigs.
I envisage an HTML feature where you can click on something and have it labelled spam at the ISP.
Allowing this info back to the scum that served it would be a privacy invasion of the worst kind.
Perhaps some enlightened ISPs could charge people double for serving shit. They would get my business for sure!
I truly believe that if the ads were not so horribly intrusive and bandwidth hogging, they could/would be ignored or even watched. Just last night, I watched a really great advert on TV - way better than the program it was embedded in - watched the ad to the end, and then ditched the actual program! However, I have stopped visiting certain websites because the amount of flash they serve makes it impossible to actually scroll through the content!
Please feel welcome to give me the standard spam prevention review form ;-)
Sent from my ASR33 using ASCII
Precisely.
There is no "proper", or "best practice" place. Your two questions are entirely dependent on your use-case scenarios. If you want to block flash scripts on your kids' browsers, do it at the host level, in the OS. If you are dealing with a gigantic 2000-employee office campus, then you'd probably want to handle that on a giant honking appliance/router designed for it where you can centrally manage policy.
But ... you can flip the blocking mechanisms in both scenarios I just mentioned and they'd still work. "Proper" can be entirely subjective based on what you're trying to accomplish and other factors involved.
I use OpenDNS... works well and works regardless of browser.
http://www.kickstarter.com/projects/600284081/adtrap-the-internet-is-yours-again?ref=search
If you want to filter web content, use a web proxy and advertise it by default on the network. See http://en.wikipedia.org/wiki/Proxy_auto-config and http://en.wikipedia.org/wiki/Web_Proxy_Autodiscovery_Protocol. GlimmerBlocker is a very good ad blocker for Mac that works as a proxy with stunning results.
Bragi Ragnarson Lawful Good (I change the law when it's not good)
How would you like to filter out SSL traffic on an intermediate device? Do you have access to fake CA certificates recognized by the majority of web browsers?
No problem if you use active directory group policies and a squid proxy with ssl-bump and dynamic generated certificates.
Simply use a group policy to push the proxy's cert out to the workstations as a trusted root certificate. Problem solved.
Now you can filter out naughty HTTPS sites. Also, anyone with root access to the squid proxy can extract all kinds of interesting info from the users' HTTPS sessions and manipulate them in interesting ways. And the only way the users would know is by manually checking the certificate. "What's this Google certificate doing being signed by '*'?"
When you do this using Microsoft TMG there's a big red warning: "You may want to check the legal implications of what you are about to do".
In the free world the media isn't government run; the government is media run.
According to the EFF, Google has removed Adblock Plus from Google Play, saying that it violates Google's terms and conditions, which stipulate that apps will not interfere with any other app on the store. This only affects Android so far, but now that Google has decided that content blocking is a bad thing, I imagine that the Chrome and Firefox extensions will follow. And, sadly, it's probably only a matter of time before Google turns its considerable talents to making sure that any method will fail. I'm not interested in starting a flame war here; I'm just pointing out that when the pre-eminent search engine on the planet weighs in on content blocking in such a heavy-handed way, it can't bode well for any of us.
Blocking content at the router/firewall is the best place to block it inside your network. Otherwise you're dealing with keeping several machines up to date. As IT infrastructure becomes more diverse (Mac, Windows flavors, guests, etc.), keeping individual machines updated will be harder than maintaining a centralized point. Another option is to force users to use a specific DNS server (e.g. http://www.opendns.com/business-security/). Then all you do is block DNS traffic destined for any other DNS servers.
I'd avoid the $50 Walmart router and look at some standalone firewall/routers with good filtering options: IPCop (http://ipcop.org/) + URLFILTER (http://www.urlfilter.net/) or Cop+ (http://home.earthlink.net/~copplus/) or UnTangle (https://www.untangle.com/store/lite-package.html).
Will it slow down your connection? It can if you do not use fast enough equipment, but in general the price of CPU cycles isn't an issue when using PC based solutions.
"Science is about ego as much as it is about discovery and truth " - I said it, so sue me.
I do it on the /etc/hosts level on my dns server. You can find large lists of ad domains that can be added to your hosts file with 127.0.0.1 or 0.0.0.0 to cause them to fail. This covers all machines on your network that use your dns server. The one I use is http://winhelp2002.mvps.org/hosts.txt however they have become slow with updating it. You might want to invest some time in looking for one that is updated more frequently.
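For anyone wondering what those lists boil down to, here is a minimal Python sketch that reads hosts-format text and collects the names pointed at 127.0.0.1 or 0.0.0.0. The sample entries and the function name are made up for illustration, and real lists may have quirks this does not handle.

```python
def parse_hosts_blocklist(text):
    """Return the set of hostnames that a hosts-format blocklist sinkholes."""
    sink_addresses = {"127.0.0.1", "0.0.0.0"}
    blocked = set()
    for raw_line in text.splitlines():
        # Drop anything after '#', then surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        if address in sink_addresses:
            # Keep 'localhost' out of the result; it is not an ad domain.
            blocked.update(n.lower() for n in names if n != "localhost")
    return blocked


# Hypothetical entries, not taken from any real list.
SAMPLE = """\
# comment line
127.0.0.1 localhost
0.0.0.0 ads.example.com
127.0.0.1 tracker.example.net  # inline comment
192.168.1.10 fileserver
"""

blocked = parse_hosts_blocklist(SAMPLE)
print(sorted(blocked))
```

Having the entries as a set makes it easy to merge several lists or to count how many domains you are actually blocking before you load them anywhere.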
It lives in C:\windows\system32\drivers\etc\hosts on windows systems at least up till win7.
Here is an ad-block hosts file: http://pgl.yoyo.org/as/serverlist.php?showintro=0;hostformat=hosts
This info is brought to you by a Linux user... :-)
I have FreeNAS set up on a fairly modest box, originally intended to just host a few files. Then I got curious about just this thing, and installed squid in transparent mode with squidGuard. I want to block tracking and ad content at the network level as a security and privacy concern. I installed a blacklist from squidGuard's website and enabled the appropriate domain and url lists.
After about a week, I must say I'm rather impressed. Caching all http traffic while simultaneously blocking ads and trackers noticeably improved website response times, both for cached and non-cached pages. This improvement is even more dramatic on slower connections. So far, no false positives and only first-party ads aren't blocked. Even better, the transparent proxy means no client-side configuration.
As far as lists affecting speed, squidGuard stores domains in a Berkeley-DB optimized database format that does not degrade performance with even huge blacklists (I think my blacklists are running over 1M domains right now). The real speed hit comes from using regex. However, my simple domain-based blacklist works so well I feel no need to go that route. Besides, I don't want to block first-party ads.
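To illustrate why a keyed domain list scales where a regex list does not, here is a rough Python sketch. It is not how squidGuard is implemented; a plain Python set stands in for the Berkeley DB file, and the domains are invented. The idea is that each request costs a handful of exact-key lookups (one per label in the hostname), no matter how many entries the list holds, whereas a list of regular expressions has to be tried pattern by pattern.

```python
def is_blocked(hostname, blocked_domains):
    """True if hostname or any parent domain is in blocked_domains."""
    labels = hostname.lower().rstrip(".").split(".")
    # For "img.ads.example.com" this checks "img.ads.example.com",
    # "ads.example.com", "example.com" and "com" in turn.
    for start in range(len(labels)):
        if ".".join(labels[start:]) in blocked_domains:
            return True
    return False


# Hypothetical blacklist entries.
BLOCKED = {"ads.example.com", "tracker.example.net"}

print(is_blocked("img.ads.example.com", BLOCKED))
print(is_blocked("www.example.com", BLOCKED))
```

Matching on whole labels also avoids the classic substring mistake: "notads.example.com" is not treated as a subdomain of "ads.example.com".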
One solution is a service that filters domains at the DNS level, such as OpenDNS.
But does anyone know of a similar service on the IP level? Malware attackers may not cooperate by using domain names; IP addresses are less hassle for them, less attention-getting from the average end-user (who knows somewebsite.ru is wrong, but not 134.14.215.12), and they bypass DNS-level security. The IP-level filter would have to be either,
* Something like an RBL, but for all attacks not just for spam.
* A proxy to a service that scans Internet content for attacks, again like their email equivalent (MessageLabs, Postini, etc.). This would be like the malware scanning on some firewalls, but I find those slow down connections too much (especially for fiber-level bandwidth). A datacenter would have much greater bandwidth capacity and much greater scanning capability than the local firewall.
Does anyone provide these services?
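For reference, the email RBLs mentioned above are conventionally queried over DNS: reverse the IPv4 octets, append the list's zone, and look up an A record, where an answer means "listed". A minimal Python sketch of building that query name follows. The zone "rbl.example.org" is hypothetical, and whether any list covers all attack traffic rather than just spam is exactly the open question here. The sketch does not perform the lookup itself.

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the DNS name used to ask a DNSBL-style list about an IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


# Hypothetical zone; substitute the zone of whatever list you actually use.
print(dnsbl_query_name("134.14.215.12", "rbl.example.org"))
```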
In your pseudocode, how would the program determine which fixed-position block elements within a page are "these pop-ups" and which are essential navigation?
If you have a job where you work with a computer, you can almost certainly afford to carry your own personal computer in your pocket so that you do not need to expose your work network to malware
Someone who brings in a computer would be exposing his work network to whatever malware is installed on the personal computer in his pocket.
Don't like it? Don't work there.
If you grew up in a town with one dominant employer, and this employer had a policy with which you did not agree, where would you find the money to relocate to another town?
I assume you are trying to increase the convenience of browsing and not to restrict anyone's access to information (the latter I don’t think is possible). Any blocking will have some unintended effect. Router DNS poisoning works relatively well. I have had it for a long time and enjoy it. I like that all my machines, including any mobile clients connected to my wi-fi, have fewer ads displayed. My main purpose is to block tracking sites, rather than disable the ads. I also like the fact that the page content does not change, no scripts get inserted or modified, only the third party sites are blocked.
But... There were cases when I had to disable or modify the blocking. Hulu detects that the ads are blocked and takes a couple of minutes for a timeout to happen. It might be OK to allow a 30 second ad to show in that instance. A checkout in a few online shops may not work at all if the tracking is blocked. Yes, it is a problem with the sites, but I had to enable tracking a couple of times so that I could complete the checkout. Many referral links stop working when you click the products directly, as is the case with Google Shopping.
While doing some investigation I was shocked to see how much data is shared with third parties even by the big name stores. Every single product you view on a shopping site may generate notifications to facebook, twitter, pinterest, etc. Everything that gets placed in a shopping cart may generate “likes” behind the scenes if you have another instance of the browser open with a logged-in profile. The amount of tracking is phenomenal, and it is my right to restrict it.
There's no such thing as "illegal download"
Agreed, and generally you should think carefully what you want to block. It's unethical to cut the main revenue stream of a website. Of course at some point ads can become unbearably annoying, but at that point you shouldn't visit that website at all.
In my opinion, as a network engineer, routers should never be used for security functions as it just isn't scalable from a support and management perspective (i.e. keeping settings the same across a large number of sites). If you need to block traffic then you need to buy a Firewall and/or a Proxy server. If you can just afford one device, buy a firewall. Most Firewalls can also support routing and routing protocols plus they are optimized to handle the additional overhead of security services.
Unless this is a small environment (less than 30 people) you also do not want to perform security functions on the client as it also doesn't scale well. Granted, you could probably do something with AD group policies and login scripts, but it eventually becomes more difficult to manage in comparison to a Firewall/Proxy solution. In addition, if your clients have Admin access then they can bypass your security by changing the local client settings.
Finally, the organization of your company will also influence how content filtering is deployed. I work in a large organization where network security is a separate group from the WAN group. In this type of organization, it makes sense to keep the security devices separate from the WAN and Internet network routing devices. In smaller organizations, these two support services may be combined.
The thing I don't like about it is that it ruins the certificate trust system. With every site signed by the same certificate, even bad ones are accepted by the browser and there is no way to tell them apart.
Obviously, the best place to get rid of annoying web content is at the source, by not posting it in the first place.
[Before anybody gives a response about Internet freedom, that's well and all, but for certain applications, you only need to have employees access a few websites--like say a corp HQ information system.]
There are many routers that have a way to blacklist certain sites and keywords, though that's basically useless (a few mL vs the ocean?).
Whitelisting would be much more handy, but most routers don't support it.
Not only that, but custom Linux router firmware doesn't (easily) support it. Not DDWrt or Tomato. OpenWrt: you're looking at compiling a lot of stuff yourself. Gargoyle does, but you're giving up a lot of OpenWrt features.
Not only that, but custom Linux router distros (meant for running on x86) like ClearOS and the like don't offer an easy whitelist solution, either. Easy would be something like offering an HTML setup page for the whitelist, and optionally, showing a "This page isn't allowed. 1) OK, 2) Request adding to whitelist" when someone requests a non-whitelisted page, and then the admin can easily click through the whitelist requests.
NOT easy: users having to call you up and then you have to vi the squid file.
Somebody must have figured this out by now?
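The workflow described above (check, request, admin approves) is small enough to sketch in Python. This is only an illustration of the bookkeeping, with hypothetical class and method names; it is not tied to any router firmware or to squid, and it leaves out the HTML pages entirely.

```python
class Whitelist:
    """Allow-list with a queue of user requests awaiting admin approval."""

    def __init__(self, allowed):
        self.allowed = {host.lower() for host in allowed}
        self.pending = []  # hosts users have asked the admin to allow

    def check(self, host):
        """True if the host may be visited."""
        return host.lower() in self.allowed

    def request(self, host):
        """Record a user's 'request adding to whitelist' click, once per host."""
        host = host.lower()
        if host not in self.allowed and host not in self.pending:
            self.pending.append(host)

    def approve(self, host):
        """Admin clicks through a request: move it from pending to allowed."""
        host = host.lower()
        if host in self.pending:
            self.pending.remove(host)
        self.allowed.add(host)


wl = Whitelist(["intranet.example.com"])  # hypothetical corp HQ site
wl.request("docs.example.org")
wl.approve("docs.example.org")
print(wl.check("docs.example.org"), wl.pending)
```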
I'm not a lawyer, but I play one on the Internet.
I for one would not want to pay for the router powerful enough to parse every webpage that passes through it.
Also it would be a far bigger pain to update and modify.
Troll is not a replacement for I disagree.
to live in Iran
Somewhere along the way, the internet isn't meant to be 'free'.
Somebody has to pay for the bandwidth, the infrastructure, etc.
Then comes along content. Content can't always be 'free'. Someone has to place it on the web, someone has to maintain it, someone creates it and depending on the complexity of the content, there are 1 or more content creators and associates/affiliates getting involved and eventually people need to make a living.
Here's the point I'm making with the following example:
My wife plays 'Wordsmith', the free version, on her Android phone and must suffer advertising. I, however, paid for my Wordsmith and thus I'm ad free.
So, what I believe is very important is that users should KNOW if there are ads in a site prior to entering. Just like users know that the 'free' version of Wordsmith will display ads.
Ads should not be forced onto users, but users should know that there will be ads, suffer them or get out. Or, pay a modest fee and never get bothered. That would make sense in a fair world.
What started as Dynamically Loaded Zones has now morphed in to Response Policy Zones which are useful for sinkholing malware domains by feeding multiple sources. This is more effective than trying to manage all your clients by forcing Adblock & subscriptions to malware filters and has the added bonus of working with all browsers & apps regardless of OS or device. A good write up may be found here.
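As a rough idea of what feeding an RPZ looks like, the Python sketch below turns a list of bad domains into policy records. In RPZ, a record of the form "name CNAME ." tells the resolver to answer NXDOMAIN for that name, and a "*.name" entry covers its subdomains. The domain here is invented, and the zone's SOA/NS header and the resolver's response-policy configuration are deliberately left out.

```python
def rpz_records(domains):
    """Return RPZ lines that make each domain and its subdomains NXDOMAIN."""
    lines = []
    # Normalise case and trailing dots, and drop duplicates.
    for domain in sorted({d.lower().rstrip(".") for d in domains}):
        lines.append(f"{domain} CNAME .")    # the domain itself
        lines.append(f"*.{domain} CNAME .")  # every name beneath it
    return lines


# Hypothetical feed entry.
for line in rpz_records(["Malware.example.com."]):
    print(line)
```

Because the output is plain text, several feeds can be merged into one list of domains before the records are generated.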
http://sourceforge.net/projects/loic/
Give some thought to blocking at different levels. Blockers in browsers are obviously very limited to that browser's traffic. The hosts file can be effective for all traffic from a single machine. DNS blocking can be quite effective. For example, OpenDNS allows 25 domains to be blocked with their free plan, more with their nominal cost paid plan. Their Umbrella product works well for mobile and cellular devices that don't go through your own router. These are all very easy solutions, and either free or low-cost, with little setup required. Defense in depth allows blocking at the appropriate level for a given threat. Regarding the ethics of blocking ad content, I suspect most people wouldn't object to unobtrusive ads per se, but unfortunately most major sites incorporate numerous tracking services, so ads come with a serious sacrifice to online privacy.
drinkypoo wrote: "If you have a job where you work with a computer, you can almost certainly afford to carry your own..."
JazzLad wrote: "...smartphone, typically with dedicated internet."
That's still at least a $420 per year expense (source: virginmobileusa.com), especially for someone who's currently paying about one-fifth of that. Have circumstances finally changed such that a smartphone with a data plan, in addition to what one is already paying for Internet at home, is no longer a luxury but now a necessity?
Take a look at the devices from Fortinet ... decent AV/Malware as well as webfilter with "the usual" load of different categories (and the ability to filter based on groups defined e.g. by SSO info from an ADS). Add to that many additional security firewall features, IPS, security scanner, ... to top it off, it's a lot more affordable with better throughput than many (all well-known?) competitors ...
1. If you're a business: institute a policy and simply fire those that violate it; it's much cheaper than a router. Log things, peek every now and then.
2. If you're a parent: use parenting? Keep an eye on internet usage, disallow internet after hours. Or, you know, be an American: find a piece of software to help raise your kids, and blame government, education systems, and anything else for why your kids turn out to be fat lazy unemployed pieces of shit.
Are you a parent trying to keep your kids from porn? Are you a business trying to keep your workers on task? Are you a government trying to control the eyeballs of your citizens? Are you just trying to keep ads away from your personal eyeballs, malware from your personal devices?
If it's for your own personal use there are two approaches:
1) Do it on the device. This has the advantage of being easy to pause if it causes a web site or service to stop working. It has the down side of not being centrally managed. You'll have to set it up on all of your devices/browsers. It may not be available for certain mobile platforms.
2) Do it centralized through a proxy. You only have one place to set it up and you run all of your devices through the proxy. More of a pain to self tune, and you have the added overhead of running a proxy.
If you're one of the other use cases and you want to keep your users from accessing certain kinds of content, there's really only one answer: Do it as far upstream from your users as you can get. Because the users are not going to be happy with it and some will do everything they can to circumvent it. Ideally you're on a network where you can filter all of their (non-wireless) traffic through a single controlled point where you need physical access (lock and key) and a passcode to make changes. If you can remote admin it, or if people can access the 'net at large without going through that point, you've lost the battle.
Blocking at the web browser level, where the blocking program has an idea of what's going on, works best. Blocking at the IP level will stall out some sites. It's technically possible to block in the browser in such a way that the site can't figure out that it's being blocked. Few sites detect ad blockers yet, but more could. It may be worthwhile to delay loads of ad sites and see if this stalls the loading of the real content. For mobile, it would be amusing to have an ad-blocking proxy site which reads the ads into the proxy machine but never sends them over the air link.
We need a new level of popup-blocking technology, one that understands HTML layers and decides which ones get to appear. Anybody working on this? Also, most of the existing ad blockers run off of big lists of regular expressions, which are manually updated. That's rather retro technology. They should be using classifiers.
Blocking tracking sites is usually a win. For this page, I'm blocking Google Analytics and Comscore Beacon, using Abine's DoNotTrackMe Firefox add-on. This blocking has the amusing side effect that CBS shows will run without showing any ads.
Of course, with "apps", it's much tougher to block. It may be necessary to run apps under a virtual machine that prevents the app from doing certain things. An ad-hostile version of Flash might be worth constructing.
Should some ads get through? We offer Ad Limiter, which declutters Google search result pages by removing all but one ad. We pick the one ad based on our ratings of site legitimacy. Interestingly, most users of that add-on seem to be business sites - usage is high on weekdays and drops off on weekends. There may be a market for business-based ad blocking products.
That kind of element should not be blocked. A popup-like div does a fine job of alerting the user to something
Something in this case being a "special offer".
Even if it's modal to the window it still dies when you navigate away from the spawning page.
If the majority of ad-supported web sites switched to using a pop-up-like div for advertisement, and you were to navigate away from pages that use a pop-up-like div for advertisement, you'd be navigating away from most pages that aren't amateur or subscription. So what would the web be for?
Internet Control Message Protocol is used to control and diagnose network components. DNS values are data, so they use User Datagram Protocol.
42. That's the answer to one question. If you choose not to ask some other specific question, "42" is as good an answer as you can get.
Being uninformed about a subject, and therefore needing help figuring out which questions to ask, I can understand. People who expect a correct answer, while obstinately refusing to decide what the question is, baffle me with their stupidity.
I tend to think it's unethical to have every move I make tracked by hundreds of different companies.
Fully agree. Although that's more about datamining than advertising...but unfortunately they are often bound together these days.
I have no need to block static ads. I get annoyed at ads with motion though, but they're easy to block. Animated gifs, just hit ESC in Firefox, they stop.
Then I use flashblock which disables all flash-based content. I can selectively choose any content to view it, such as youtube videos and the rest of the flash ads are still blocked.
Ads still get through, and I'm not annoyed at all the flashing/blinking and bandwidth-hogging ads as they are blocked or stopped. Easy.
Normal people fund their own website if they want people to see them. If you need ads, then take it offline.
This is true, but if we start to talk about large websites you obviously can't fund them from some guy's pocket.
The numbers vary by a couple orders of magnitude depending on the traffic. MOST of the value in advertising is passively seeing ads, though, building brand recognition rather than immediate action like clicks. You don't click on Metlife Stadium or FedEx Field, but Reliant paid $320 million to put their name on Reliant Stadium. Ever clicked a Coca-Cola commercial? Coke spends $3 BILLION per year for you to see their ads, to build brand awareness.
The internet allows you to track clicks, but still most of the value isn't in clicks, but in impressions - brand awareness.
Coke spends $3 billion on advertising every year to build brand awareness. That's the difference between Coke and generic soda. You can't click their TV ads, but most people go to the store and buy Coke, not "cola soda" because having customers see ads works, whether they click the ads or not. Nobody ever clicked a TV ad.
DNS can use udp/53, but it also supports tcp/53 (and even requires it for responses too large to fit in a UDP reply). You'll want to block both just to be sure.
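Put as a decision rule, "block both" looks something like the Python sketch below. It is only the logic a firewall rule would encode, not firewall configuration; the resolver address is a placeholder from the documentation range and the function name is made up.

```python
# Hypothetical approved resolver (an address from the documentation range).
ALLOWED_RESOLVERS = {"192.0.2.53"}


def should_drop(protocol, dest_ip, dest_port):
    """True for DNS traffic (UDP or TCP port 53) not aimed at an approved resolver."""
    is_dns = dest_port == 53 and protocol.lower() in {"udp", "tcp"}
    return is_dns and dest_ip not in ALLOWED_RESOLVERS


print(should_drop("udp", "8.8.8.8", 53))     # DNS to an unapproved resolver
print(should_drop("tcp", "8.8.8.8", 53))     # same, over TCP
print(should_drop("udp", "192.0.2.53", 53))  # DNS to the approved resolver
```

Checking only UDP would leave the TCP path open, which is the gap the comment above warns about.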
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
I do it on the /etc/hosts level on my dns server.
What kind of DNS server software are you using?
I haven't seen yet a DNS server configured to read /etc/hosts. I am using BIND and I do not know if you can even make it read /etc/hosts.
Everything I write is lies, read between the lines.
bind reads /etc/hosts. As far as I know any DNS server I've ever used on Linux reads /etc/hosts
Close the Browser.
It has the advantage of being extremely easy to do (just add a domain to the file), and i have noticed no slowdowns at all on my old netbook.
You should actually notice a speed up! Host file lookups are negligible compared to DNS lookups and HTTP queries...
... an academic/government network of devices that moved bits from place to place in a store-and-forward ("packet routed," vs. "circuit routed") system in a way that, by design, was able to route around circuit failure. This all happened in and around 1969.
If "freedom and idealism" are or were ever part of the "Internet" I would say that came later.
Remember, before the early 1990s, you had to be a "special person" or "special organization" - i.e. typically connected with the US Government, a university, or a company doing work with the government or a university to have access to "The Internet" or its predecessor network(s). That's not exactly what I would call "freedom."
By the way, I know what you are trying to say, I'm just saying you are mixing apples and oranges and, with respect to the Internet itself (the "IPv4" and now "IPv6" network that came into being in the early 1980s) you are technically not correct.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
The thing I don't like about it is that it ruins the certificate trust system. With every site signed by the same certificate, even bad ones are accepted by the browser and there is no way to tell them apart.
Counterpoint: If you're in an environment where you're using AD/Group Policy and a squid proxy, you're probably dealing with a group of users that require that sort of network control. Implicitly, they're not checking their certs anyway and wouldn't be able to meaningfully tell the good from the bad even if they had access to that information. If users were doing that, MITM SSL Cert signing wouldn't be necessary in the first place.
Route your traffic through China. Anything bad or even remotely offensive will be filtered out, and I hear they are on top of their shit keeping that stuff up to date.
No one posts ads there. If you can find the content, that is. Or even know what gopherspace is.
The correct place to do this is with some kind of in-line web appliance if you want to do things 'hands off'. You can delegate what users should be able to view, according to group policy or IP range or something, and all your web traffic will be handled via that, preferably between your main switch and your modem. As for what performance impact you will get off running it on a home router... who knows, but the service will probably be rubbish unless it hooks into some large OSS database.
The problem you will always have is 'what should be blocked'. In the past, I've found most 3rd party filters to be a little 'hyperactive', and do more harm blocking content than allowing users to do their damn job. A good one is 'chat sites'. A lot of filters will consider any URL with 'forum' in it to be a 'chat site'. A legit example is MrExcel, and hints on how to write working proprietary VB into your spreadsheet. If you can switch it to minimal settings and just block porn and gambling, life becomes a bit easier... but then you always get people going to golfing or football websites to play hypothetical games which break those filters as well.
I have to ask myself: why else did MS make it possible to add trusted root certs at the OS level, and why do all the browsers (that I've tested so far) totally trust and respect the OS level trusted root cert list? Isn't it possible to get, say, Chrome to use its own trusted root certs instead?
In the environment where I'm doing this, the users totally require that sort of control; otherwise it's going to bring the business down. No kidding.
Mind you I do have to explain to the CEO/VP who asked for this, how someone with access to the proxy could mess with their online bank pages to make it look like they had no money, or endless amounts of money. And I'm not sure that THEY understand this risk properly...
Yet so many wrong answers. If the question is "where do you filter", the answer is "where it makes sense".
You place the filter as "low" on the network diagram as possible while achieving your objectives. To put it another way, as close to your end users as possible.
Mod me down with all of your hatred and your journey towards the dark side will be complete!
I block at the browser level - er, machine level (in the case of non-web based content). The reason for this is that if you suddenly find that something is being blocked that you need access to, it is much easier to adjust it at the machine level than having to log into a router or proxy and change settings.
Of course, this is for a home network, with no wife or kids.
I also usually use VPN tunnels, so blocking at the router or proxy level would be pointless anyways.
Have circumstances finally changed such that accessing sites that are not work related at work is no longer a luxury but a necessity?
You mentioned academics. In addition, when the proxy ends up blocking access to the official web page for a software library that an in-house application uses or may use in the near future, and the rest of the IT department is counterproductively obstinate against allowing necessary access, then yes, a segregated guest net for the break room PC is a necessity.
Most of it was paid for by TAXES, you mean.
Oh, but since the Reagan Revolution we don't believe in taxes being spent to benefit taxpayers any more. Saint Ron taught us to give all the money and infrastructure to corporations who are above the law (like telcos of course) so they can charge us for the use of taxpayer-built infrastructure, because AMERICA.
Don't be a commie, remember Obedience to US Corporations Is Freedom!
America! America! America!
I’d like to run Squid as a proxy with a local hosts file that blocks ads. Could someone recommend a low-powered Linux-based system that I could run 24x7 as a proxy? I don’t keep any of my machines on 24x7, and although I have a few old desktops and laptops which would be suitable, they would suck way too much power. I have a hackintosh on a Samsung NC10 that I was planning on using, however that’s got a 40W power brick. Is there something small, powerful enough to run Squid and not going to add too much to global warming? I hate ads with a vengeance; however, I don’t want the polar bears to suffer because of this.
Hypertalk is a lot like that.
With the first link, the chain is forged.