Thousands of Sites Wrongly Blocked
Ben Edelman writes: "In the context of the ACLU's pending challenge to the Children's Internet Protection Act (PDF), I recently prepared a list of some 6000+ web sites that, by and large, fail to meet the category definitions of popular Internet filtering programs yet are blocked by at least one such program. This topic may be old hat, but my work is new: I have prepared an unusually large list of sites (including police departments, libraries, home-schooling sites, candidates for political office, and on and on), and I have retested these sites over a period of several months."
Where are all the comments? Did everyone just go home? OK, the submission really should have included some more comment-provoking remarks. This is sad.
Troller_Park_Trash, if you're already knowledgeable about the means of operation of filtering software, you may find that the newest and most interesting part of the http://cyber.law.harvard.edu/people/edelman/mul-v-us/ page is the Appendices listing specific sites that have been, by and large, wrongly classified by filtering programs.
For example, http://cyber.law.harvard.edu/people/edelman/mul-v-us/index-subset.html ("Blocked Site Archives - Subset with Linked Pages - Appendix A") gives information about 395 such URLs. You'll likely find yourself surprised that many of these are blocked -- I know I was.
Regarding the blacking out of certain text from my report: As http://cyber.law.harvard.edu/people/edelman/mul-v-us/ mentions, a protective order (from the court in which the underlying case is pending) limits distribution of certain portions of my report -- namely anything I learned from reviewing confidential documents from filtering companies, or from attending confidential portions of depositions of their employees. But the work you, and most others here, are likely to find of greatest interest is the listings of specific sites blocked. (I'm presently adding a bit of text and formatting to help folks find this content more quickly and easily.)
Ben Edelman
[Originally sent to a mailing-list]
In honor of the censorware material just released by ACLU, I thought I'd try a little experiment in distributed verification.
I took one example from Edelman's report:
16. Southern Alberta Fly Fishing Outfitters #6809
http://www.albertaflyfish.com
Blocked by: N2H2 (Pornography - Sep 11, Oct 7), Websense (Sex - Jul 5, Aug 18, Sep 11)
Yahoo: /Regional/Countries/Canada/Business and Economy/Shopping and Services/Outdoors/Fishing/Fly Fishing/Lodges/
Google: /Regional/North America/Canada/Alberta/Recreation and Sports/Fishing
Fly fishing in Alberta Canada on the world famous Bow River.
Now, what does censorware have against this site? Maybe it doesn't like too many 'Fly' references in one place? No, it turns out that this site has the misfortune to be virtually hosted and share an internet address with:
http://clubexoticx.com - Club Exoticx
There's a bunch of other completely innocuous sites suffering the same collective guilt of the censorware blacklist. I'd like people to go to N2H2's lookup, at http://database.n2h2.com/cgi-perl/catrpt.pl and *verify* this for themselves by testing the following sites:
http://albertaflyfish.com - Southern Alberta Fly Fishing Outfitters
http://alistairbrown.com - Alistair Brown Folksinger
http://eclothing.com - 'The Game Is On Sportswear Company Ltd.'
http://effectivemanagementsolutions.com - Effective Solutions
http://eligh.com - Springboard Consulting
http://eyepowered.com - E Y E P O W E R E D - 360 Degree Panoramas
http://friendlyfacesonline.com - Create personalized family cartoon
http://gear4pickups.com - Gear4Trucks: HitchHoist Portable Truck Crane,
http://informationonhold.com - Information On-Hold
http://letsmakewine.com - Let's Make Wine
http://planetregister.com - Planet Registe
http://ppt-slides.com - 35mm Slides from your computer file
http://proteach.net - Pro Teach Main Page - Baseball instruction
http://rosiedonovan.com - Rosie Donovan Photography
http://springboardtoinnovation.com - Springboard Consulting
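The shared-address claim above can also be checked outside of any censorware lookup, by resolving each name and grouping the names by the address that comes back. What follows is only a rough sketch: the function names are made up for illustration, it looks at IPv4 only, and the results depend on whatever DNS says today, which may be nothing like what it said when these sites were tested.

```python
import socket
from collections import defaultdict


def group_by_address(hostnames, resolve=socket.gethostbyname):
    """Map each resolved IPv4 address to the hostnames that returned it.

    `resolve` is injectable so the grouping can be tried without a network;
    the default does a real DNS lookup.
    """
    groups = defaultdict(list)
    for host in hostnames:
        try:
            groups[resolve(host)].append(host)
        except OSError:
            # The name does not resolve (socket.gaierror is an OSError);
            # file it under None rather than aborting the whole run.
            groups[None].append(host)
    return dict(groups)


def shared_hosts(hostnames, resolve=socket.gethostbyname):
    """Keep only the addresses that more than one hostname resolves to."""
    groups = group_by_address(hostnames, resolve)
    return {ip: hosts for ip, hosts in groups.items()
            if ip is not None and len(hosts) > 1}


# Example call (needs network access, so it is left commented out):
# print(shared_hosts(["albertaflyfish.com", "clubexoticx.com", "letsmakewine.com"]))
```

Names that land in the same group are candidates for being virtual hosts on one machine; that is a hint, not proof, since one address can also front several machines.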
Here, I'll make this easy. Just click these URLs:
http://database.n2h2.com/cgi-perl/catrpt.pl?req_URL=http://albertaflyfish.com
http://database.n2h2.com/cgi-perl/catrpt.pl?req_URL=http://alistairbrown.com
http://database.n2h2.com/cgi-perl/catrpt.pl?req_URL=http://eclothing.com
http://database.n2h2.com/cgi-perl/catrpt.pl?req_URL=http://effectivemanagementsolutions.com
http://database.n2h2.com/cgi-perl/catrpt.pl?req_URL=http://eligh.com
http://database.n2h2.com/cgi-perl/catrpt.pl?req_URL=http://eyepowered.com
http://database.n2h2.com/cgi-perl/catrpt.pl?req_URL=http://friendlyfacesonline.com
http://database.n2h2.com/cgi-perl/catrpt.pl?req_URL=http://gear4pickups.com
http://database.n2h2.com/cgi-perl/catrpt.pl?req_URL=http://informationonhold.com
http://database.n2h2.com/cgi-perl/catrpt.pl?req_URL=http://letsmakewine.com
http://database.n2h2.com/cgi-perl/catrpt.pl?req_URL=http://planetregister.com
http://database.n2h2.com/cgi-perl/catrpt.pl?req_URL=http://ppt-slides.com
http://database.n2h2.com/cgi-perl/catrpt.pl?req_URL=http://proteach.net
http://database.n2h2.com/cgi-perl/catrpt.pl?req_URL=http://rosiedonovan.com
http://database.n2h2.com/cgi-perl/catrpt.pl?req_URL=http://springboardtoinnovation.com
You should get
The Site: [all sites above]
is categorized by N2H2 as:
Pornography
If there's some error-message text in a red font, that means the N2H2 program itself wasn't working; try again.
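For anyone who would rather generate the lookup links than click through them, here is a minimal sketch. It assumes the query parameter is req_URL, as the links above suggest, and it only builds the strings; it makes no request, and nothing here says the lookup page will keep answering. The names LOOKUP, SITES, lookup_url and lookup_urls are invented for this example.

```python
# Address of the N2H2 lookup page as given in the post above.
LOOKUP = "http://database.n2h2.com/cgi-perl/catrpt.pl"

# The fifteen hostnames listed for verification.
SITES = [
    "albertaflyfish.com",
    "alistairbrown.com",
    "eclothing.com",
    "effectivemanagementsolutions.com",
    "eligh.com",
    "eyepowered.com",
    "friendlyfacesonline.com",
    "gear4pickups.com",
    "informationonhold.com",
    "letsmakewine.com",
    "planetregister.com",
    "ppt-slides.com",
    "proteach.net",
    "rosiedonovan.com",
    "springboardtoinnovation.com",
]


def lookup_url(site, base=LOOKUP):
    """Build one lookup link in the same unencoded form used in the post."""
    return f"{base}?req_URL=http://{site}"


def lookup_urls(sites=SITES):
    """Build the lookup link for every site, in list order."""
    return [lookup_url(site) for site in sites]


for url in lookup_urls():
    print(url)
```

The target URL is deliberately left unescaped so the output matches the hand-written links; a stricter client would percent-encode it.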
Now, since I've publicized this, I expect it'll be changed very rapidly for this one item. I have a saying: "Alacrity varies directly with publicity". But this is just one example in a HUGE blacklist. What else is lurking in there?
Sig: What Happened To The Censorware Project (censorware.org)
But it seems that someone else disagreed with me, and now it is categorized as 'satire'. Exactly how a site with such poor standards of journalistic integrity is allowed in that category amazes me.
I have now added adequacy.org to my junkbuster file, so I never have to see it again.
I've visited some of the sites on the two lists, and can attest that many of them are flagged properly. Bottom line? I don't think the author of the paper spidered the sites as well as the censorware did. As an example, it takes some work to find the pr0n links at mulletsgalore.com, but they're there.
My company uses Websense, and I must say they are very reasonable. They have a human-controlled database, with loads of categories for companies to block or not block, and they accept web submissions to change it. Changes occur in less than a week if they do occur. I myself got cmdrtaco.net removed from their database (it was marked adult content). Also they don't block Russian porno :).
If a company blocks my site as pornography, can I sue for damages under current slander laws?
All Troll + "offtopic" mods are meta moderated as "Unfair", because you abused the system.
Now, what does censorware have against this site? Maybe it doesn't like too many 'Fly' references in one place? No, it turns out that this site has the misfortune to be virtually hosted and share an internet address with:
http://clubexoticx.com - Club Exoticx
There's a bunch of other completely innocuous sites suffering the same collective guilt of the censorware blacklist.
Seth, let me make absolutely sure I understand you here. Are you trying to say that these people are categorizing the worth of a web site by the content posted on a different virtual host? The only thing I can imagine that would be more ridiculous would be banning entire IP subnets by the content originating from one address on that subnet. That concept is patently absurd. I can't believe people are being paid to do things like that.
I mean, I understand that Censorware must at some point rely on heuristics, but creating sweeping bans like this based on incomplete information... it's outrageous.
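To make the complaint concrete, here is a toy comparison of a blacklist keyed on address with one keyed on hostname. Everything in it is an assumption for illustration: the addresses come from the reserved documentation range, elsewhere.example is a made-up name, and nothing here claims to show how any vendor actually stores its list.

```python
# Hypothetical name-to-address table. 192.0.2.x is a documentation range,
# standing in for the real shared address, which is not given in the thread.
HOST_TO_IP = {
    "clubexoticx.com": "192.0.2.10",
    "albertaflyfish.com": "192.0.2.10",
    "letsmakewine.com": "192.0.2.10",
    "elsewhere.example": "192.0.2.77",
}


def blocked_by_ip(host, bad_ips, host_to_ip=HOST_TO_IP):
    """Address-keyed list: a host is blocked if its address is listed."""
    return host_to_ip.get(host) in bad_ips


def blocked_by_name(host, bad_names):
    """Name-keyed list: a host is blocked only if it is itself listed."""
    return host in bad_names


def collateral(bad_names, host_to_ip=HOST_TO_IP):
    """Hosts an address-keyed list would block that a name-keyed list allows."""
    bad_ips = {host_to_ip[name] for name in bad_names if name in host_to_ip}
    return sorted(host for host, ip in host_to_ip.items()
                  if ip in bad_ips and host not in bad_names)


print(collateral({"clubexoticx.com"}))
# prints ['albertaflyfish.com', 'letsmakewine.com']
```

Listing one name by address sweeps in every other name on that address, which is the collective-guilt effect described above.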
For another example discussed, see http://sethf.com/anticensorware/cyberpatrol/247for1.php
Regarding the topic of "banning entire IP subnets", MAPS and other spam blacklists don't do that as an implementation effect. They do it as a deliberate tactic. I don't want to get into that topic too much here, but it's a social issue, not a technological one.
Sig: What Happened To The Censorware Project (censorware.org)
our school uses CyberPatrol... i've actually found myself blocked for legit. sites... so i used their webpage to ask the database -why- the site was blocked? the answer, a category that isn't even listed as one of the possible categories, even less one that our school would have subscribed to... the category? computer-related. amusing, eh?
It's a shame you don't want to get into that topic much here, because it might put some pressure on Jamie to remove the system he checked into slashcode that bans an entire subnet when one person receives too many down-mods.