Slashdot Mirror


Thousands of Sites Wrongly Blocked

Ben Edelman writes: "In the context of the ACLU's pending challenge to the Children's Internet Protection Act (PDF), I recently prepared a list of some 6000+ web sites that, by and large, fail to meet the category definitions of popular Internet filtering programs yet are blocked by at least one such program. This topic may be old hat, but my work is new: I have prepared an unusually large list of sites (including police departments, libraries, home-schooling sites, candidates for political office, and on and on), and I have retested these sites over a period of several months."

21 comments

  1. Hello? by MindStalker · · Score: 1

    Where are all the comments? Did everyone just go home? Ok the submission should really have included some more comment provoking remarks. This is sad.

    1. Re:Hello? by Mr.Phil · · Score: 2

      I wonder if this was the way Slashdot once was long ago.

      To stay on topic, it's a law in Michigan that all publicly funded internet access terminals (schools, libraries, etc.) have content filtering software installed, except when said terminal is in a college environment. All K-12 schools and public libraries have to have this filtering, though, to continue to receive public funding. Tossed quite a panic in the ISD that I work with from time to time.

    2. Re:Hello? by troller_park_trash · · Score: 2, Interesting

      I don't remember seeing this on the homepage, so once FP was achieved, why would anyone waste their time with posting to this thread....

      But, seriously, I quit reading the report after I got to the "I received $XXX per hour for my work..." A quick scroll through the rest of the document revealed block upon block of deleted text, and I'm really only interested in that if the deleted text is worth figuring out.

      --
      Is Slashdot
  2. The Most Interesting Parts; Protective Order by bedelman · · Score: 4, Informative

    Troller_Park_Trash, if you're already knowledgeable about the means of operation of filtering software, you may find that the most new & interesting part of the http://cyber.law.harvard.edu/people/edelman/mul-v-us/ page is the Appendices listing specific sites that have been, by and large, wrongly classified by filtering programs.

    For example, http://cyber.law.harvard.edu/people/edelman/mul-v-us/index-subset.html ("Blocked Site Archives - Subset with Linked Pages - Appendix A") gives information about 395 such URLs. You'll likely find yourself surprised that many of these are blocked -- I know I was.

    Regarding the blacking out of certain text from my report: As http://cyber.law.harvard.edu/people/edelman/mul-v-us/ mentions, a protective order (from the court in which the underlying case is pending) limits distribution of certain portions of my report -- namely anything I learned from reviewing confidential documents from filtering companies, or from attending confidential portions of depositions of their employees. But the work you, and most others here, are likely to find of greatest interest is the listings of specific sites blocked. (I'm presently adding a bit of text and formatting to help folks find this content more quickly and easily.)

    Ben Edelman

  3. Experiment - censorware collateral damage verify by Seth+Finkelstein · · Score: 5, Informative
    Well, maybe it's safe for me to post this to Slashdot ...

    [Originally sent to a mailing-list]

    In honor of the censorware material just released by ACLU, I thought I'd try a little experiment in distributed verification.

    I took one example from Edelman's report:

    16. Southern Alberta Fly Fishing Outfitters #6809
    http://www.albertaflyfish.com
    Blocked by: N2H2 (Pornography - Sep 11, Oct 7), Websense (Sex - Jul 5,
    Aug 18, Sep 11)
    Yahoo: /Regional/Countries/Canada/Business and Economy/Shopping and
    Services/Outdoors/Fishing/Fly Fishing/Lodges/
    Google: /Regional/North America/Canada/Alberta/Recreation and
    Sports/Fishing
    Fly fishing in Alberta Canada on the world famous Bow River.

    Now, what does censorware have against this site? Maybe it doesn't like too many 'Fly' references in one place? No, it turns out that this site has the misfortune to be virtually hosted and share an internet address with:

    http://clubexoticx.com - Club Exoticx

    There's a bunch of other completely innocuous sites suffering the same collective guilt of the censorware blacklist. I'd like people to go to N2H2's lookup, at http://database.n2h2.com/cgi-perl/catrpt.pl and *verify* this for themselves by testing the following sites:

    http://albertaflyfish.com - Southern Alberta Fly Fishing Outfitters
    http://alistairbrown.com - Alistair Brown Folksinger
    http://eclothing.com - 'The Game Is On Sportswear Company Ltd.'
    http://effectivemanagementsolutions.com - Effective Solutions
    http://eligh.com - Springboard Consulting
    http://eyepowered.com - E Y E P O W E R E D - 360 Degree Panoramas
    http://friendlyfacesonline.com - Create personalized family cartoon
    http://gear4pickups.com - Gear4Trucks: HitchHoist Portable Truck Crane,
    http://informationonhold.com - Information On-Hold
    http://letsmakewine.com - Let's Make Wine
    http://planetregister.com - Planet Registe
    http://ppt-slides.com - 35mm Slides from your computer file
    http://proteach.net - Pro Teach Main Page - Baseball instruction
    http://rosiedonovan.com - Rosie Donovan Photography
    http://springboardtoinnovation.com - Springboard Consulting

    Here, I'll make this easy. Just click these URLs:

    http://database.n2h2.com/cgi-perl/catrpt.pl?req_URL=http://albertaflyfish.com
    http://database.n2h2.com/cgi-perl/catrpt.pl?req_URL=http://alistairbrown.com
    http://database.n2h2.com/cgi-perl/catrpt.pl?req_URL=http://eclothing.com
    http://database.n2h2.com/cgi-perl/catrpt.pl?req_URL=http://effectivemanagementsolutions.com
    http://database.n2h2.com/cgi-perl/catrpt.pl?req_URL=http://eligh.com
    http://database.n2h2.com/cgi-perl/catrpt.pl?req_URL=http://eyepowered.com
    http://database.n2h2.com/cgi-perl/catrpt.pl?req_URL=http://friendlyfacesonline.com
    http://database.n2h2.com/cgi-perl/catrpt.pl?req_URL=http://gear4pickups.com
    http://database.n2h2.com/cgi-perl/catrpt.pl?req_URL=http://informationonhold.com
    http://database.n2h2.com/cgi-perl/catrpt.pl?req_URL=http://letsmakewine.com
    http://database.n2h2.com/cgi-perl/catrpt.pl?req_URL=http://planetregister.com
    http://database.n2h2.com/cgi-perl/catrpt.pl?req_URL=http://ppt-slides.com
    http://database.n2h2.com/cgi-perl/catrpt.pl?req_URL=http://proteach.net
    http://database.n2h2.com/cgi-perl/catrpt.pl?req_URL=http://rosiedonovan.com
    http://database.n2h2.com/cgi-perl/catrpt.pl?req_URL=http://springboardtoinnovation.com

    You should get

    The Site: [all sites above]
    is categorized by N2H2 as:
    Pornography

    If there's some error-message text in a red font, that means the N2H2 program itself wasn't working; try again.
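
    For anyone who would rather generate the lookup links than click them one by one, here is a minimal sketch. It assumes only the lookup address and the req_URL parameter shown above; the helper name lookup_url is made up for illustration, and nothing here fetches the page or assumes the service still answers.

```python
from urllib.parse import urlencode

# Lookup endpoint quoted in the post; this sketch only builds links to it.
LOOKUP_BASE = "http://database.n2h2.com/cgi-perl/catrpt.pl"

# A few of the domains from the list above.
DOMAINS = ["albertaflyfish.com", "alistairbrown.com", "eclothing.com"]


def lookup_url(domain: str) -> str:
    """Build the category-report link for one domain (hypothetical helper)."""
    # safe=":/" leaves the embedded URL unescaped, matching the hand-written links.
    query = urlencode({"req_URL": "http://" + domain}, safe=":/")
    return f"{LOOKUP_BASE}?{query}"


for domain in DOMAINS:
    print(lookup_url(domain))
```

    Each printed line has the same shape as the links listed above, so the output can be pasted into a browser or fed to whatever fetch tool you prefer.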

    Now, since I've publicized this, I expect it'll be changed very rapidly for this one item. I have a saying: "Alacrity varies directly with publicity". But this is just one example in a HUGE blacklist. What else is lurking in there?

    Sig: What Happened To The Censorware Project (censorware.org)

  4. Its a real problem. by Anton+Anatopopov · · Score: 2, Informative
    There is the reverse problem as well. I complained to Websense about that hateful adequacy.org site. They were kind enough to classify it as 'obscene/tasteless without any redeeming qualities' or something.

    But it seems that someone else disagreed with me, and now it is categorized as 'satire'. Exactly how a site with such poor standards of journalistic integrity is allowed in that category amazes me.

    I have now added adequacy.org to my junkbuster file, so I never have to see it again.

    1. Re:Its a real problem. by SanLouBlues · · Score: 2, Informative

      For reference (Database date doesn't necessarily mean the entry changed on that date):
      Websense URL Lookup Tool Results

      The URL:
      adequacy.org

      is classified by Websense under the category:
      Alternative Journals

      According to:
      Database version: 01215
      Database date: 2001-10-30

  5. Many of those sites are *NOT* wrongly blocked by sh00z · · Score: 2, Informative

    I've visited some of the sites on the two lists, and can attest that many of them are flagged properly. Bottom line? I don't think the author of the paper spidered the sites as well as the censorware did. As an example, it takes some work to find the pr0n links at mulletsgalore.com, but they're there.

    1. Re:Many of those sites are *NOT* wrongly blocked by bedelman · · Score: 3, Insightful

      Sh00z, two thoughts:

      1) I agree that some portions of content on some of the sites on my list have been correctly categorized. But in the instance you described, it sounds like the specific URL on my list doesn't contain content meeting filtering programs' category definitions. As a result, even if there's reason to categorize other content on that same server, there's no need to categorize this specific page.

      (To put this a different way: Many of the filtering programs seem to classify entire sites -- all content on an entire domain name, for example. But there's no reason why pages couldn't instead be rated on a page-by-page basis [and indeed some filtering companies report that they do this, too, in at least some instances]. To the extent that programs fail to review and separately categorize every individual page, they may overblock pages without content meeting their criteria.)

      2) There's no doubt that some URLs on my lists actually do meet filtering companies' category definitions. I'm no librarian, and neither am I otherwise trained in content categorization, so it wasn't my job to identify this content. (Plus, as you can imagine, it's a large task to view many thousands of sites!) Instead, librarians reviewed certain of the sites (including a random sample of the entire list) to attempt to estimate the proportion of sites from my lists that are, in their professional opinions, suitable for use within a library. It's my understanding that the results of their study are forthcoming.
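
      The site-level versus page-level distinction in point 1 can be sketched in a few lines. This is only an illustration under stated assumptions: the ratings tables, the example.com URLs, and both function names are hypothetical, and it does not describe how any particular filtering product stores its list.

```python
from urllib.parse import urlsplit

# Hypothetical ratings; example.com and its paths are placeholders.
SITE_RATINGS = {"example.com": "Pornography"}
PAGE_RATINGS = {"http://example.com/gallery/adult.html": "Pornography"}


def blocked_site_level(url: str) -> bool:
    """Block every URL whose host name carries a rating."""
    return urlsplit(url).hostname in SITE_RATINGS


def blocked_page_level(url: str) -> bool:
    """Block only the exact pages that carry a rating."""
    return url in PAGE_RATINGS


innocuous = "http://example.com/fishing/index.html"
print(blocked_site_level(innocuous))  # the whole domain is rated, so this page is caught
print(blocked_page_level(innocuous))  # only the rated page itself would be caught
```

      Under the site-level rule the fishing page is blocked along with everything else on the domain; under the page-level rule it is not.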

    2. Re:Many of those sites are *NOT* wrongly blocked by sh00z · · Score: 1

      I guess that because the article said "sites," I never expected a page-by-page examination, and assumed that a site would be included or excluded based on all linked content within the same domain name/IP number.

    3. Re:Many of those sites are *NOT* wrongly blocked by bedelman · · Score: 1

      Good point. I've tweaked some of the language to make this more clear. Thanks for the suggestion.

    4. Re:Many of those sites are *NOT* wrongly blocked by Anonymous Coward · · Score: 0

      Page by page may seem extreme, but blocking by IP is definitely extreme. Say I share a site with my family, and my portion of the site has content on 18 hobbies. You don't need to require that my blockable content be all the way down in art->photography->museums->nudes to consider the rest of the content harmless.

    5. Re:Many of those sites are *NOT* wrongly blocked by sh00z · · Score: 1

      That's why I said linked content. If your Danni Ashe portfolio is just a click or two away from your little sister's Beanie Baby collection, don't be shocked if censorware is blocking the beanies! It's just common sense, folks.

  6. Websense by SanLouBlues · · Score: 2

    My company uses Websense, and I must say they are very reasonable. They have a human-controlled database, with loads of categories for companies to block or not block, and they accept web submissions to change it. Changes occur in less than a week if they do occur. I myself got cmdrtaco.net removed from their database (it was marked adult content). Also they don't block Russian porno :).

  7. Slander? by Unknown+Poltroon · · Score: 2

    If a company blocks my site as pornography, can I sue for damages under current slander laws?

    --
    All Troll + "offtopic" mods are meta moderated as "Unfair", because you abused the system.
  8. Re:Experiment - censorware collateral damage verif by Anonymous Coward · · Score: 0

    Now, what does censorware have against this site? Maybe it doesn't like too many 'Fly' references in one place? No, it turns out that this site has the misfortune to be virtually hosted and share an internet address with:

    http://clubexoticx.com - Club Exoticx

    There's a bunch of other completely innocuous sites suffering the same collective guilt of the censorware blacklist.


    Seth, let me make absolutely sure I understand you here. Are you trying to say that these people are categorizing the worth of a web site by the content posted on a different virtual host? The only thing I can imagine that would be more ridiculous would be banning entire IP subnets by the content originating from one address on that subnet. That concept is patently absurd. I can't believe people are being paid to do things like that.

    I mean, I understand that Censorware must at some point rely on heuristics, but creating sweeping bans like this based on incomplete information... it's outrageous.

  9. Re:Experiment - censorware collateral damage verif by Seth+Finkelstein · · Score: 2
    Seth, let me make absolutely sure I understand you here. Are you trying to say that these people are categorizing the worth of a web site by the content posted on a different virtual host?
    Yes, in effect. What's happening behind-the-scenes is that some sites are on the blacklist by their domain name, and some sites are on the blacklist by their IP address. When a site is on the blacklist by IP address, and it's a virtually hosted site, then all the virtually hosted sites on that IP address share the same fate in the censorware.
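
    Here is a minimal sketch of that behavior, for anyone who wants to see it spelled out. Everything in it is an assumption made for illustration: the address is from the reserved documentation range rather than the real address of either site, and the tables and the is_blocked function are hypothetical, not N2H2's actual data or code.

```python
# Hypothetical hosting table: 192.0.2.x is a documentation-only address range.
HOSTING = {
    "clubexoticx.com": "192.0.2.10",
    "albertaflyfish.com": "192.0.2.10",  # virtually hosted on the same address
    "example.org": "192.0.2.77",
}

DOMAIN_BLACKLIST = {"badsite.example"}  # entries listed by name
IP_BLACKLIST = {"192.0.2.10"}           # entry listed by address


def is_blocked(domain: str) -> bool:
    """Return True if the domain is listed by name or sits on a listed address."""
    if domain in DOMAIN_BLACKLIST:
        return True
    # Every virtual host resolving to a listed address shares its fate.
    return HOSTING.get(domain) in IP_BLACKLIST


for name in ("clubexoticx.com", "albertaflyfish.com", "example.org"):
    print(name, is_blocked(name))
```

    The fly fishing site is never listed by name in this sketch; it is blocked only because it shares an address with a listed site.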

    For another example discussed, see

    http://sethf.com/anticensorware/cyberpatrol/247for1.php

    Regarding the topic of "banning entire IP subnets", MAPS and other spam blacklists don't do that as an implementation effect. They do it as a deliberate tactic. I don't want to get into that topic too much here, but it's a social issue, not a technological one.
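
    For readers unfamiliar with what a subnet ban means mechanically, a rough sketch follows. It is not how MAPS or any other list is implemented; the network shown is a documentation-only range, and the list and function names are hypothetical.

```python
import ipaddress

# Hypothetical ban list holding one whole /24 network (documentation range).
BANNED_NETS = [ipaddress.ip_network("192.0.2.0/24")]


def is_banned(addr: str) -> bool:
    """Return True if the address falls inside any banned network."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in BANNED_NETS)


print(is_banned("192.0.2.5"))     # a neighbor on the banned network
print(is_banned("198.51.100.5"))  # an address outside it
```

    Every address in the listed range is treated alike, whichever single address prompted the listing.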

    Sig: What Happened To The Censorware Project (censorware.org)

  10. categories by Unordained · · Score: 1

    our school uses CyberPatrol... i've actually found myself blocked for legit. sites... so i used their webpage to ask the database -why- the site was blocked? the answer, a category that isn't even listed as one of the possible categories, even less one that our school would have subscribed to... the category? computer-related. amusing, eh?

  11. Re:Experiment - censorware collateral damage verif by An+Ominous+Coward · · Score: 2

    It's a shame you don't want to get into that topic much here, because it might put some pressure on Jamie to remove the system he checked into slashcode that bans an entire subnet when one person receives too many down-mods.