Slashdot Mirror


How About an Intelligent Open Source Filter?

GlitchZ28 asks: "It seems to me that the problem with Internet filters is their blanket approach: searching for words in pages and URLs and, of course, the ole blacklist. It's the same as WWII blanket bombing. Drop 500 bombs at one target and chances are you'll get it (along with a lot of things that weren't targets). Has anyone considered starting a little project to create a simple, very easily modified open source Internet filtering program? It would allow library officials to decide what tactics fit their needs, such as a blacklist of the obvious porno sites. I really wouldn't mind a filtering system IN public libraries if it could be scrutinized BY the public and then changed."

If we must censor content on the Internet, I would feel better about it (but not much) knowing that the censorship was done by the people rather than some bureaucrat in Congress.

Comment by michael : Many anti-censorship folks have been pushing this line for a long time: that the blacklists used in public institutions should, at a minimum, be open for public inspection. This would, no doubt, help cure some of the more egregious errors. But the above poster is making an error in his reasoning. Computers do the searching because there is no other way to do it - you simply cannot categorize hundreds of millions of pages by hand, period, end of sentence. And an algorithmic approach can never fully characterize the range of human expression present on the Web - even assuming, for the moment, that you could get people to agree on what should or should not be censored, there's no way to make rules which will pick out those pages with 100% accuracy, or even anything close to that. Doing so would require the development of true artificial intelligence, which isn't even on the horizon. Calling something "open source" doesn't make it magically able to achieve a breakthrough in artificial intelligence. When you add in the fact that with three people in a room you have four different opinions on what should and should not be censored, it should become clear that throwing an open source label at something is not going to result in an easy solution.

9 of 17 comments

  1. Forget filtering, how about a web moderation system by copito · · Score: 2

    I don't really care about seeing porn from time to time and I don't have any kids. What I would like though is a good distributed web moderation system.

    This is already done to some degree with some search engines, especially specialized ones for books or movies. But I would love to see something built into my browser that would allow me (securely and privately of course) to set my preferences
    (intelligent > 2 | funny > 4 ) & (commercial 1)
    so that links to other stuff are colored differently. Of course obtaining moderator status would be hard, so it would probably have to be a Firefly kind of thing where your views are matched with those of others who share them (securely and privately of course). Consider it the poor man's intelligent agent.
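For what it's worth, a preference rule like the one above could be sketched roughly as follows. This is only an illustration: the function names and the dictionary of scores are made up here, and because the comparison operator in the last clause did not survive in the post, the commercial test is left as a caller-supplied function rather than guessed at.

```python
from typing import Callable, Dict

# Hypothetical moderation scores for one link, e.g. {"intelligent": 3, "funny": 1}.
Scores = Dict[str, float]


def matches_preferences(scores: Scores, commercial_ok: Callable[[float], bool]) -> bool:
    """Evaluate (intelligent > 2 | funny > 4) & (commercial ...).

    The commercial clause is supplied by the caller because its operator
    is missing from the original expression.
    """
    interesting = scores.get("intelligent", 0) > 2 or scores.get("funny", 0) > 4
    return interesting and commercial_ok(scores.get("commercial", 0))


def link_style(scores: Scores, commercial_ok: Callable[[float], bool]) -> str:
    """Pick a display style for a link; the style names are placeholders."""
    return "highlight" if matches_preferences(scores, commercial_ok) else "plain"


if __name__ == "__main__":
    # Assumption for the demo only: "commercial" must be at most 1.
    low_commercial = lambda c: c <= 1
    print(link_style({"intelligent": 3, "commercial": 0}, low_commercial))
    print(link_style({"funny": 5, "commercial": 4}, low_commercial))
```

The hard part, as noted, is where trustworthy scores come from, not how the rule is evaluated.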
    --
    "L'IT c'est moi!"
  2. Commentator Michael doesn't "Get It." (tm) by marcus · · Score: 2

    "You can't categorize millions of pages by hand, period."

    Huh? What happened to the Open Ideal of "many hands, many eyes, many minds"? How many millions of lines of code have been "categorized" by how many programmers? How many hours of thought are behind each one?

    How many CD ID numbers and associated track lists have been categorized by users of the free (and non-free) CD databases?

    What experiences and thought processes make you so sure that it can't be done, period?

    --
    Good judgement comes from experience, and experience comes from bad judgement.
    - W. Wriston, former Citibank CEO
  3. Junkbuster wouldn't work in the long run by Pseudonymus+Bosch · · Score: 2

    When Junkbuster achieves popularity, it will be subverted.
    Besides, Junkbuster works because many of the ads include in their URL obvious clues like *ad*, *banner*, *promo* and are sent from centralized sites like DoubleClick and other media brokers.
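A rough sketch of that kind of URL-clue matching, to show both why it works for ads and why it is blunt. This is not Junkbuster's actual blocklist syntax or code; the pattern list comes from the clues named above, the host list and function name are assumptions, and the URLs are placeholders.

```python
from fnmatch import fnmatch
from urllib.parse import urlsplit

# Path clues mentioned above, written as shell-style wildcards.
BLOCK_PATTERNS = ["*ad*", "*banner*", "*promo*"]
# Centralized ad brokers; only this one is named in the post.
BLOCK_HOSTS = {"doubleclick.net"}


def is_blocked(url: str) -> bool:
    """Return True if the host is a listed broker or the path matches a clue."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if any(host == h or host.endswith("." + h) for h in BLOCK_HOSTS):
        return True
    path = parts.path.lower()
    return any(fnmatch(path, pattern) for pattern in BLOCK_PATTERNS)


if __name__ == "__main__":
    for url in (
        "http://ad.doubleclick.net/image.gif",      # blocked by host
        "http://example.com/banner/top.gif",        # blocked by path clue
        "http://example.com/downloads/index.html",  # blocked too: "downloads" contains "ad"
        "http://example.com/news.html",             # allowed
    ):
        print(url, is_blocked(url))
```

The "downloads" case is the blanket-bombing problem in miniature, and a site that wants to evade the filter only has to stop using the obvious words.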

    Porn and racist URLs can be much more diverse, and they usually lead to same-site places. It would be difficult to list every site that runs censurable pages.
    --
    __
    Men with no respect for life must never be allowed to control the ultimate instruments of death.
    GW Bu
  4. Not through URLs by Pseudonymus+Bosch · · Score: 2

    That seems similar to PICS (not that I know so much about PICS).

    But you would be blocking entire sites that use dynamic links like Slashdot.
    Slashdot can contain censurable comments everywhere and they can have lots of URLs, because they embed parameters into the URL.

    As Tim Berners-Lee says somewhere in W3C, URLs should not be stuffed with representation stuff; that should be negotiated.
    --
    __
    Men with no respect for life must never be allowed to control the ultimate instruments of death.
    GW Bu
  5. Why filters? by cr0sh · · Score: 2

    Why do we call these programs "filters"? Because they "clean" the internet. However, that makes one grand assumption - that the internet contains an abundance of dirtiness that, for some reason, should not be viewed by human eyes.

    The fallacy in this reasoning is the idea that information can be dirty or clean, impure or proper, amoral or moral. I submit that all information is neither - that it only becomes so in the mind of the viewer.

    One thing I have noticed in my 27 years of life (which I realize is by no means a long life view, but it is what I have to work with) is that those people who are the most vocal about being and living "clean" tend to be the same ones who are secretly "dirty" behind the back of society. Those people who oppose or don't care about the issue are labeled as "dirty" by these same people, because these "unclean" people represent a mirror to them of the way they really are. They feel that if they could rid the world of "dirtiness", they themselves might become clean, and could thus dispense with their secret.

    This idea is a perversion of logic. Those people that do this generally have failed to be honest with themselves and others. They usually don't realize that by being truthful (though it may be painful) they can rid themselves of the issue. The rest of the "dirty" world has already realized this.

    We cannot pretend to protect children and adults from things which, even if severely impeded by all technical means (short of brain modification - which I am sure is coming), would still arise in their thoughts anyway (show me a child above the age of six who has never thought about sex - we are kidding ourselves if we believe that children don't). I am amused at the number of people who rail and rise against the whole issue of porn - who never consider that the porn will always be there, because it is a matter of supply and demand. Do these people really think they can stop mankind from viewing sex - when it is the thought of sex and the pleasure which is hard wired into our minds to make us procreate? If sex wasn't important, why would it be pleasurable?

    I don't think we need filters. I think we need more rationality and intelligence.

    --
    Reason is the Path to God - Anon
  6. Yes... by cr0sh · · Score: 2

    But in the minds of those who want to impose filters onto the rest of the population, filters form a way for them to delude themselves into thinking that a filter will make themselves "clean", by "removing" the "mirror" that stares back at them, reminding them of their own secrets.

    I sympathize with your issues on "bait-and-switch" sites - I have experienced similar sites, but I wouldn't want to impose upon myself a safeguard against such sites, as they only "surprise" me rarely (it would be akin to keeping a small boat in the middle of a desert, just in case it floods). You mentioned you got surprised by a troll. Without knowing the link to the site, did you by any means check to see where the link was pointing to before clicking on it? Also, did you suspect the individual was a troll prior to clicking on the link (sometimes I click on trolls' links, just to see where they take me - but I am never surprised by where they go)?

    The only time I could see that you would want a filter for yourself would be in a work environment - one which is so draconian they monitor every move you make (and log all internet traffic). If that was a problem, I would wonder why you would continue to work there. No job is worth that kind of paranoia...

    --
    Reason is the Path to God - Anon
  7. Might be a way to force filters OUT of libraries. by Tau+Zero · · Score: 2
    There is one answer which should never be accepted when a citizen goes to a government agency with a question, and that is "You don't need to know that." When the question is "What sites am I not allowed to see from the library workstations, and why?", the only answer that is acceptable in a free society is a list of site URLs and criteria. Call it a card anti-catalog, a list of things you can't find in the library, but it's all the same. When the library installs filters on the computers, they just made it your business.

    When a library buys a filter from a company which keeps its filter list secret, this principle of public accountability is violated. This looks to me like it could be grounds for a citizen lawsuit against a library, demanding that the filter list be published or the filtering be removed. This could serve both ways; people who don't want legitimate information restricted could hammer on the filter companies for their abuses, and people who do want porn, violence, hate speech or whatnot blocked could make certain that there aren't any critical omissions in the lists either. But I think the most likely outcome is that the filter companies would stop selling to libraries, along with a small flurry of lawsuits by people and organizations like Peacefire against companies like Mattel, for libelling them in their filter classifications (Peacefire, pornographic and violent? HAH!). To keep their filter lists "secret" (as if they'll be safe from the amateur cryptographers and hackers!) they'll have to stop selling to public organizations. Voila, the problem is stamped out at the source.
    --
    Time is Nature's way of keeping everything from happening at once... the bitch.
  8. collaborative filtering by bluGill · · Score: 3

    There is GroupLens to apply something like this to USENET. Check out their work.

    Unfortunately, the only work I know of on this is being done by a few professors that I had in college. You can look at their web pages at: Joseph Konstan, John Riedl. The latter site has a lot more information (i.e., a link to something).

    It is Usenet only, but I think this is a way to start. Then we just need a way to rate pages that everyone works on, and can partially agree to. If 100 people call a web site porn, and 10 call it interesting, I'd not want my children to see it. If 5 call it porn, and 100 call it interesting, it might be interesting, but not viewable (by my children, who I want to protect more than most /. readers) without a parent to decide if the child needs to know about breast cancer in that level of detail (to give an example of where useful information and porn can cross).

    Then there is violence. I wouldn't want to view any "violent" web site myself, if the site was movies and pictures of one person murdering another. However, if the website was hunting videos, I personally consider that normal content and would let anyone of any age see it. A collaborative filter allows me (with time) to build up a personal database cross-referenced with others. Then it can say "100 other parents who tend to think like you have called this [murder] site bad" vs "Many people call this [hunting] site violent, but those who tend to agree with you recommend it." If the entire world rates every site they visit out of habit, this could be useful.
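To make the "people who tend to think like you" idea concrete, here is a minimal sketch. It is not GroupLens's actual algorithm, just a simplified agreement-weighted vote; the user names, site names, and +1/-1 rating scale are all invented for the illustration.

```python
from typing import Dict

# ratings[user][site] is +1 (recommend) or -1 (object); all names are made up.
Ratings = Dict[str, Dict[str, int]]


def agreement(mine: Dict[str, int], theirs: Dict[str, int]) -> float:
    """Average agreement on sites both users rated, from -1.0 to 1.0."""
    shared = set(mine) & set(theirs)
    if not shared:
        return 0.0
    return sum(mine[s] * theirs[s] for s in shared) / len(shared)


def predict(me: str, site: str, ratings: Ratings) -> float:
    """Predict my opinion of a site from users who tend to agree with me."""
    mine = ratings[me]
    total = 0.0
    weight_sum = 0.0
    for user, theirs in ratings.items():
        if user == me or site not in theirs:
            continue
        weight = agreement(mine, theirs)
        if weight <= 0:
            continue  # skip users with no overlap or who usually disagree with me
        total += weight * theirs[site]
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


if __name__ == "__main__":
    ratings = {
        "me":    {"gardening": 1, "murder-clips": -1},
        "alice": {"gardening": 1, "murder-clips": -1, "hunting-videos": 1},
        "bob":   {"gardening": -1, "murder-clips": 1, "hunting-videos": -1},
        "carol": {"gardening": -1, "murder-clips": 1, "hunting-videos": -1},
    }
    # Most raters object to the hunting site, but the one who agrees with me likes it.
    print(predict("me", "hunting-videos", ratings))
```

A plain majority vote would have blocked the hunting site here; weighting by agreement is what lets the filter differ from person to person.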

    Note that above I intentionally stuck to single issues. The hunting site with naked hunters is unacceptable (to me), even though the hunting content may be good. Filtering software must be complex enough to handle all these issues, and yet be simple enough that people use it.

  9. Simple Approach by jd · · Score: 3
    Have a filter which searches a networked database, according to either default or user-entered search criteria. Each database filters by URL. In the case of FTP, Gopher, WAIS or Telnet, this would involve having the filter translate these into URL format.

    The idea here is that you can control whose database you search, and exactly how you search. E.g., if you want to eliminate all sites that mention Libertarianism and have a 10 in their IP address, you should be able to do so.

    This would let the user pick and choose whose database best meets their screening requirements, AND control what sort of screening is being done.

    My suggestion for the database format would be a simple one:

    Main Record:

    • Field 1: The URL of the site
    • Field 2: Keyword list for site
    • Field 3: Sex content (1-100)
    • Field 4: Violence content (1-100)
    • Field 5: Graphics content (1-100)
    • Field 6: Commercial content (1-100)
    • Field 7: Religious content (1-100)
    • Field 8: Political content (1-100)
    • Field 9: Educational content (1-100)
    • Field 10: Medical content (1-100)
    • Field 11: Military content (1-100)

    The owners of the database would be responsible for how the values are assigned to each site. Some may use automated searches, others may elect to screen pages themselves, yet others may elect to ask contributors to visit the sites and rate them.

    The idea of having so many different scores is so that you can include sites that would otherwise be excluded. E.g., medical sites dealing with AIDS issues are going to mention sex. On the other hand, the information is also clearly both medical and educational and so would have a rating in both of those categories, too.

    So, to search for AIDS stuff, you'd probably want a filter that you could instruct to exclude any sites with a sex score > (medical + educational).
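As a rough sketch of the main record and the AIDS rule above: the field names follow the list, but the class name, function names, URLs, and sample scores are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SiteRecord:
    """One main record; every content score runs from 1 to 100."""
    url: str
    keywords: List[str] = field(default_factory=list)
    sex: int = 1
    violence: int = 1
    graphics: int = 1
    commercial: int = 1
    religious: int = 1
    political: int = 1
    educational: int = 1
    medical: int = 1
    military: int = 1


def aids_search_exclude(rec: SiteRecord) -> bool:
    """The rule from the text: exclude if sex score > (medical + educational)."""
    return rec.sex > rec.medical + rec.educational


def apply_filter(records: List[SiteRecord],
                 exclude: Callable[[SiteRecord], bool]) -> List[SiteRecord]:
    """Keep only the records the user's exclusion rule does not reject."""
    return [rec for rec in records if not exclude(rec)]


if __name__ == "__main__":
    records = [
        # Placeholder URLs and invented scores.
        SiteRecord("http://clinic.example/aids", ["aids"], sex=40, medical=80, educational=60),
        SiteRecord("http://porn.example/", ["aids"], sex=95, medical=1, educational=1),
    ]
    for rec in apply_filter(records, aids_search_exclude):
        print(rec.url)
```

Since the rule is just a function over the record, the IP-address and keyword criteria mentioned earlier would slot in the same way.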

    This doesn't deal with bogus positives, produced by people cloning the title page, and then using a redirect to send you to the "real" site. However, you can filter out those, by simply adding the following set of records:

    Index record:

    • Field 1: URL of source page
    • Field 2: URL of destination page

    Fields 1 and 2 form the combined key for the record.

    The database then computes the values, not just of that page, but of all pages to a user-specified depth (not exceeding some sensible value, or you'd end up with a DoS attack). The values used would then be some weighted average of the values obtained. This would remove the risk of fraudulent title pages, or other forms of deliberate deception.
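One way the depth-limited weighted average could look, as a sketch only. The index records are represented as (source, destination) pairs; the halving of the weight per hop, the hard depth cap of 3, and all names and URLs are assumptions made for the example, since the post does not specify a weighting.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Tuple

HARD_DEPTH_CAP = 3  # assumed "sensible value" so a deep crawl cannot be forced


def build_links(index_records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Turn (source URL, destination URL) index records into an adjacency map."""
    links: Dict[str, List[str]] = {}
    for source, destination in index_records:
        links.setdefault(source, []).append(destination)
    return links


def blended_score(start: str, scores: Dict[str, float], links: Dict[str, List[str]],
                  max_depth: int = 1, decay: float = 0.5) -> Optional[float]:
    """Weighted average of one score over pages reachable within max_depth hops.

    A page at distance d gets weight decay**d (an assumed weighting).
    Returns None if no reachable page has a score.
    """
    depth_limit = min(max_depth, HARD_DEPTH_CAP)
    seen = {start}
    queue = deque([(start, 0)])
    total = 0.0
    weight_sum = 0.0
    while queue:
        url, depth = queue.popleft()
        if url in scores:
            weight = decay ** depth
            total += weight * scores[url]
            weight_sum += weight
        if depth < depth_limit:
            for destination in links.get(url, []):
                if destination not in seen:
                    seen.add(destination)
                    queue.append((destination, depth + 1))
    return total / weight_sum if weight_sum else None


if __name__ == "__main__":
    # A cloned, innocent-looking title page that redirects to the "real" site.
    links = build_links([("http://clone.example/", "http://real.example/")])
    sex_scores = {"http://clone.example/": 5, "http://real.example/": 95}
    print(blended_score("http://clone.example/", sex_scores, links, max_depth=0))
    print(blended_score("http://clone.example/", sex_scores, links, max_depth=1))
```

With depth 0 the cloned page looks harmless; with depth 1 the destination's score pulls the average up, which is the effect described above. How heavily to weight the linked pages is a judgment call for whoever runs the database.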

    Because this system is so much under the control of the user, it is not, in any sense of the word, blocking free speech, or censoring anyone. The user has specifically selected the database and the criteria for exclusion. Anything blocked, then, is blocked in full knowledge and by the deliberate hand of the user, who should (by rights) have every right to say what they don't experience.

    Also, by having networked databases, anyone can set a database up. If someone sets up a system that others feel is unfair, biased or unreasonably censoring, they're free to set up their own database. If the users agree with their opinion, they'll switch. If they don't, well, just too bad. They're entitled to their opinions, too, even if that means disagreeing.

    This system could be extended, with a very simple front-end, to be a search engine as well as a filter. It works both ways. And, because of its design, you will get far fewer bogus hits. Instead of a few thousand hits, most of which are irrelevant sites and/or scams, you'll get just what you want, because you've screened out everything else.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)