How About an Intelligent Open Source Filter?
If we must censor content on the Internet, I would feel better about it (but not much) knowing that the censorship was done by the people rather than some bureaucrat in Congress.
Comment by michael: Many anti-censorship folks have been pushing this line for a long time: that the blacklists used in public institutions should, at a minimum, be open for public inspection. This would, no doubt, help cure some of the more egregious errors. But the above poster is making an error in his reasoning. Computers do the searching because there is no other way to do it: you simply cannot categorize hundreds of millions of pages by hand, period, end of sentence. And an algorithmic approach can never fully characterize the range of human expression present on the Web. Even assuming, for the moment, that you could get people to agree on what should or should not be censored, there's no way to write rules that will pick out those pages with 100% accuracy, or even anything close to that. Doing so would require true artificial intelligence, which isn't even on the horizon. Calling something "open source" doesn't magically let it achieve a breakthrough in artificial intelligence. Add the fact that three people in a room will hold four different opinions on what should and should not be censored, and it should become clear that slapping an open source label on something is not going to produce an easy solution.
There is GroupLens, which applies something like this to USENET. Check out their work.
Unfortunately, the only work I know of in this area is by a few professors I had in college. You can look at their web pages: Joseph Konstan, John Riedl. The latter site has a lot more information (i.e., a link to something).
It is Usenet-only, but I think it's a way to start. Then we just need a way to rate pages that everyone contributes to and can partially agree on. If 100 people call a web site porn and 10 call it interesting, I'd not want my children to see it. If 5 call it porn and 100 call it interesting, it might be interesting, but not viewable by my children (whom I want to protect more than most /. readers do) without a parent to decide whether the child needs to know about breast cancer in that level of detail. (To give an example of where useful information and porn can cross.)
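To make the porn-vs-interesting rule concrete, here is a minimal sketch that turns a site's community tag counts into "block", "parent" (let a parent decide), or "allow". The tag names and the BLOCK_RATIO cut-off are hypothetical; the poster only gave the two example tallies.

```python
from collections import Counter

# Hypothetical threshold: block outright once "porn" tags outnumber
# "interesting" tags by at least this factor.
BLOCK_RATIO = 2

def child_verdict(tags: Counter) -> str:
    """Map community tag counts for one site to a verdict for a child."""
    porn, interesting = tags["porn"], tags["interesting"]  # missing tags count as 0
    if porn == 0:
        return "allow"
    if porn >= BLOCK_RATIO * interesting:
        return "block"    # the crowd mostly calls it porn
    return "parent"       # mixed: useful but explicit, so a parent decides

print(child_verdict(Counter(porn=100, interesting=10)))  # block
print(child_verdict(Counter(porn=5, interesting=100)))   # parent
```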
Then there is violence. I wouldn't want to view any "violent" web site myself if the site were movies and pictures of one person murdering another. However, if the website were hunting videos, I personally consider that normal content and would let anyone of any age see it. A collaborative filter allows me (with time) to build up a personal database cross-referenced with other people's. Then it can say "100 other parents who tend to think like you have called this [murder] site bad" versus "Many people call this [hunting] site violent, but those who tend to agree with you recommend it." If the entire world rated every site it visited out of habit, this could be useful.
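The "people who tend to think like you" idea can be sketched as a tiny user-based collaborative filter. This is an illustration, assuming +1/-1 ratings and a simple agreement score as the similarity measure, not whatever GroupLens actually uses; all users and site names are made up.

```python
from typing import Dict, Optional

# user -> {site: +1 (fine / recommend) or -1 (bad)}
Ratings = Dict[str, Dict[str, int]]

def similarity(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Agreement in [-1, 1] over the sites both users have rated."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    return sum(1 if a[s] == b[s] else -1 for s in shared) / len(shared)

def predict(me: str, site: str, ratings: Ratings) -> Optional[float]:
    """Similarity-weighted average of like-minded users' ratings of `site`."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == me or site not in theirs:
            continue
        w = similarity(ratings[me], theirs)
        if w > 0:  # only count people who tend to think like me
            num += w * theirs[site]
            den += w
    return num / den if den else None

ratings = {
    "me":    {"a.example": 1, "b.example": -1},
    "alice": {"a.example": 1, "b.example": -1, "hunting.example": 1, "murder.example": -1},
    "bob":   {"a.example": -1, "b.example": 1, "hunting.example": -1},
    "carol": {"a.example": -1, "b.example": 1, "hunting.example": -1},
}
print(predict("me", "hunting.example", ratings))  # 1.0, though most raters said -1
print(predict("me", "murder.example", ratings))   # -1.0
```

Bob and Carol call the hunting site bad, but they disagree with "me" on everything else, so only Alice's recommendation counts.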
Note that above I intentionally stuck to single issues. The hunting site with naked hunters is unacceptable (to me), even though the hunting content may be good. Filtering software must be complex enough to handle all these issues, and yet simple enough that people use it.
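One simple way to handle the naked-hunters case is a per-user veto list: a site is rejected if it touches any vetoed issue, however good its other content is. A minimal sketch; the issue names and the veto set are hypothetical.

```python
from typing import AbstractSet

# Hypothetical personal policy: issues that veto a site no matter what else it offers.
MY_VETOES = frozenset({"nudity", "real_murder"})

def acceptable(site_issues: AbstractSet[str], vetoes: AbstractSet[str] = MY_VETOES) -> bool:
    """Reject a site if it touches any vetoed issue, even when its other
    issues (e.g. 'hunting') would be fine on their own."""
    return not (site_issues & vetoes)

print(acceptable({"hunting"}))            # True
print(acceptable({"hunting", "nudity"}))  # False
```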
The idea here is that you can control whose database you search, and exactly how you search. For example, if you want to eliminate all sites that mention Libertarianism and have a 10 in their IP address, you should be able to do so.
This would let users pick and choose whose database best met their screening requirements, AND control what sort of screening was being done.
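User-controlled screening could be expressed as small composable exclusion rules. This sketch assumes a hypothetical Site record with lower-cased keywords and a dotted IPv4 address, and reads "has a 10 in its IP address" as "some octet equals 10"; that interpretation is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

@dataclass
class Site:
    url: str
    ip: str                # dotted IPv4 string
    keywords: frozenset    # lower-case words found on the site

# A rule returns True when a site should be excluded.
Rule = Callable[[Site], bool]

def mentions(word: str) -> Rule:
    return lambda s: word.lower() in s.keywords

def ip_has_octet(n: int) -> Rule:
    # One reading of "has a 10 in its IP address": some octet equals n.
    return lambda s: str(n) in s.ip.split(".")

def all_of(*rules: Rule) -> Rule:
    return lambda s: all(r(s) for r in rules)

def screen(sites: Iterable[Site], exclude: Rule) -> List[Site]:
    return [s for s in sites if not exclude(s)]

my_rule = all_of(mentions("Libertarianism"), ip_has_octet(10))
sites = [
    Site("a.example", "10.0.0.5", frozenset({"libertarianism", "tax"})),
    Site("b.example", "192.168.1.1", frozenset({"libertarianism"})),
    Site("c.example", "10.1.2.3", frozenset({"gardening"})),
]
print([s.url for s in screen(sites, my_rule)])  # ['b.example', 'c.example']
```

Because rules are plain functions, a user can combine them however they like and apply the result to whichever database they chose.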
My suggestion for the database format would be a simple one:
Main Record:
The owners of the database would be responsible for how values are assigned to each site. Some may use automated searches, others may elect to screen pages themselves, and yet others may ask contributors to visit the sites and rate them.
The idea of having so many different scores is that you can include sites that would otherwise be excluded. For example, medical sites dealing with AIDS issues are going to mention sex. On the other hand, the information is also clearly both medical and educational, and so would have a rating in both of those categories, too.
So, to search for AIDS stuff, you'd probably want a filter that you could instruct to exclude any sites with a sex score > (medical + educational).
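The field list for the main record didn't survive here, so the record below is hypothetical: a URL plus a map of category scores (assumed 0-10). It just shows the "exclude if sex > medical + educational" rule applied to such records.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class MainRecord:
    url: str
    # Hypothetical: category name -> score on a 0-10 scale
    scores: Dict[str, int] = field(default_factory=dict)

    def score(self, category: str) -> int:
        return self.scores.get(category, 0)  # unrated category counts as 0

def aids_search_filter(rec: MainRecord) -> bool:
    """Keep the site unless its sex score exceeds medical + educational."""
    return rec.score("sex") <= rec.score("medical") + rec.score("educational")

records = [
    MainRecord("aids-info.example", {"sex": 6, "medical": 8, "educational": 7}),
    MainRecord("adult.example", {"sex": 9, "educational": 1}),
]
print([r.url for r in records if aids_search_filter(r)])  # ['aids-info.example']
```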
This doesn't deal with bogus positives produced by people cloning the title page and then using a redirect to send you to the "real" site. However, you can filter those out by adding the following set of records:
Index record:
Fields 1 and 2 form the combined key for the record.
The database then computes the values not just of that page, but of all pages down to a user-specified link depth (capped at some sensible maximum, or you'd end up with a DoS attack). The values used would then be some weighted average of the values obtained. This would remove the risk of fraudulent title pages and other forms of deliberate deception.
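Here is a minimal sketch of the depth-limited weighted average, applied to one score category at a time. The link map, the MAX_DEPTH cap, and the geometric decay weighting (decay**d, so nearer pages count more) are all assumptions; the comment only asks for "some weighted average".

```python
from collections import deque
from typing import Dict, List, Optional

MAX_DEPTH = 3  # hypothetical server-side cap so nobody can request an unbounded crawl

def site_score(start: str, links: Dict[str, List[str]], page_score: Dict[str, float],
               depth: int, decay: float = 0.5) -> Optional[float]:
    """Breadth-first walk from `start`, at most `depth` links deep, averaging
    page scores with weight decay**d so nearer pages count more."""
    depth = min(depth, MAX_DEPTH)
    seen = {start}
    queue = deque([(start, 0)])
    total = weight_sum = 0.0
    while queue:
        page, d = queue.popleft()
        score = page_score.get(page)
        if score is not None:  # pages nobody has scored are skipped
            total += decay ** d * score
            weight_sum += decay ** d
        if d < depth:
            for nxt in links.get(page, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, d + 1))
    return total / weight_sum if weight_sum else None

links = {"clone.example/": ["clone.example/real"]}
sex_score = {"clone.example/": 0, "clone.example/real": 10}
print(site_score("clone.example/", links, sex_score, depth=0))  # 0.0
print(site_score("clone.example/", links, sex_score, depth=1))  # about 3.33
```

A cloned "clean" title page scores 0 on its own, but looking one link deeper pulls in the real page behind it.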
Because this system is so much under the control of the user, it is not, in any sense of the word, blocking free speech or censoring anyone. The user has specifically selected the database and the criteria for exclusion. Anything blocked, then, is blocked with the full knowledge and by the deliberate hand of the user, who should have every right to decide what they don't experience.
Also, because the databases are networked, anyone can set one up. If someone sets up a system that others feel is unfair, biased, or unreasonably censoring, they're free to set up their own database. If users agree with their opinion, they'll switch. If they don't, well, too bad. They're entitled to their opinions, too, even if that means disagreeing.
This system could be extended, with a very simple front end, to act as a search engine as well as a filter. It works both ways. And, because of its design, you will get far fewer bogus hits: instead of a few thousand hits, most of which are irrelevant sites and/or scams, you'll get just what you want, because you've screened out everything else.