Thousands of Sites Wrongly Blocked
Ben Edelman writes: "In the context of the ACLU's pending challenge to the Children's Internet Protection Act (PDF), I recently prepared a list of some 6000+ web sites that, by and large, fail to meet the category definitions of popular Internet filtering programs yet are blocked by at least one such program. This topic may be old hat, but my work is new: I have prepared an unusually large list of sites (including police departments, libraries, home-schooling sites, candidates for political office, and on and on), and I have retested these sites over a period of several months."
Sh00z, two thoughts:
1) I agree that some portions of content on some of the sites on my list have been correctly categorized. But in the instance you described, it sounds like the specific URL on my list doesn't contain content meeting filtering programs' category definitions. As a result, even if there's reason to categorize other content on that same server, there's no need to categorize this specific page.
(To put this a different way: Many of the filtering programs seem to classify entire sites -- all content on an entire domain name, for example. But there's no reason why pages couldn't instead be rated on a page-by-page basis [and indeed some filtering companies report that they do this, too, in at least some instances]. To the extent that programs fail to review and separately categorize every individual page, they may overblock pages without content meeting their criteria.)
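The difference between domain-level and page-level rating can be shown with a minimal sketch. Everything in it is hypothetical: the URLs, the "adult" label, and the function names are made up for illustration and do not describe how any particular filtering program works.

```python
from urllib.parse import urlsplit

# Hypothetical page-level ratings: only pages that were actually reviewed
# and found to meet a category definition appear here.
PAGE_RATINGS = {
    "http://example.org/adult/gallery.html": "adult",
}

# Hypothetical domain-level blocklist: a whole host is blocked as soon as
# any single page on it has been rated.
BLOCKED_DOMAINS = {urlsplit(url).hostname for url in PAGE_RATINGS}


def blocked_by_domain(url):
    """Block every page on a host that has at least one rated page."""
    return urlsplit(url).hostname in BLOCKED_DOMAINS


def blocked_by_page(url):
    """Block only the exact pages that were individually rated."""
    return url in PAGE_RATINGS


def overblocked(urls):
    """Pages caught by the domain rule that have no rating of their own."""
    return [u for u in urls if blocked_by_domain(u) and not blocked_by_page(u)]
```

Under these assumptions, a page such as http://example.org/library/hours.html is blocked by the domain rule even though nothing on that page was ever rated, while the page-level rule leaves it alone.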
2) There's no doubt that some URLs on my lists actually do meet filtering companies' category definitions. I'm no librarian, and neither am I otherwise trained in content categorization, so it wasn't my job to identify this content. (Plus, as you can imagine, it's a large task to view many thousands of sites!) Instead, librarians reviewed certain of the sites (including a random sample of the entire list) to attempt to estimate the proportion of sites from my lists that are, in their professional opinions, suitable for use within a library. It's my understanding that the results of their study are forthcoming.
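As a rough illustration of estimating a proportion from a random sample, here is a minimal sketch. It is not the librarians' method, which is not described here: the function name, the `is_suitable` reviewer callback, the fixed seed, and the use of a normal-approximation 95% interval are all assumptions made for the example.

```python
import math
import random


def estimate_proportion(sites, is_suitable, sample_size, seed=0):
    """Estimate the share of suitable sites from a simple random sample.

    `is_suitable` stands in for a human reviewer's judgment (hypothetical).
    Returns (estimate, lower bound, upper bound) using a normal-approximation
    95% interval, clipped to the range [0, 1].
    """
    rng = random.Random(seed)                 # seeded so the sample is repeatable
    sample = rng.sample(sites, sample_size)   # sampling without replacement
    hits = sum(1 for site in sample if is_suitable(site))
    p = hits / sample_size
    margin = 1.96 * math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - margin), min(1.0, p + margin)
```

The point of sampling is that reviewers need to look at only a few hundred sites, rather than all of them, to get an estimate with a known margin of error for the whole list.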