Slashdot Mirror


Semantic Web Under Suspicion

Dr Occult writes "Much of the talk at the 2006 World Wide Web conference has been about the technologies behind the so-called semantic web. The idea is to make the web intelligent by storing data such that it can be analyzed better by our machines, instead of the user having to sort and analyze the data from search engines. From the article: 'Big business, whose motto has always been time is money, is looking forward to the day when multiple sources of financial information can be cross-referenced to show market patterns almost instantly.' However, concern is also growing about the misuses of this intelligent web as an affront to privacy and security."

8 of 79 comments

  1. All Talk by eldavojohn · · Score: 5, Informative

    So I know a lot of people that get all excited when they read articles on the "semantic web."

    I think that we are all missing some very important aspects of what it takes to make something capable of what they speak of. In all the projects I have worked on, to create something geared toward this sort of solution, you need two things: training data & a robust taxonomy.

    First things first, how would we define or even agree on a taxonomy? By taxonomy, I mean something with breadth & depth that has been used and verified. By breadth I mean that it must be capable of normalization (pharmaceutical concoctions, drugs & pills are all the same concept) and stemming (go & went are the same action, dog & dogs are the same concept); also important is how many tokens wide a concept can be. By depth I mean that we must be able to define specificity and use it to our advantage (a site about 747s is scored higher than a site about airline jets which is scored higher than a site about planes). By used and verified I mean that it must be tried and true ... you start with a corpus of documents to "seed" it and have experts (or web surfers) contribute little by little until it is accurate. Oh, it must also be able to adapt quickly and stay current.
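To make the breadth and depth requirements above concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the concept names, the `PARENT` and `SURFACE_FORMS` tables, and the crude plural-stripping `stem` helper are hypothetical and not taken from any real taxonomy. Irregular forms such as go/went would need a proper lemmatizer, which this sketch does not attempt.

```python
# Toy taxonomy; every name here is hypothetical and chosen for illustration.
# Each concept maps to its parent, and depth in the tree stands in for specificity.
PARENT = {
    "vehicle": None,
    "plane": "vehicle",
    "airline jet": "plane",
    "747": "airline jet",
    "tank (armor)": "vehicle",
    "container": None,
    "tank (storage)": "container",
    "drug": None,
}

# Normalization: several surface forms (some two tokens wide) map to one concept.
SURFACE_FORMS = {
    "pharmaceutical concoction": "drug",
    "drug": "drug",
    "pill": "drug",
    "plane": "plane",
    "airline jet": "airline jet",
    "747": "747",
    "panzer tank": "tank (armor)",
    "water tank": "tank (storage)",
}


def stem(token):
    """Crude plural stripping only; a real system would use a proper stemmer."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def concepts_in(text, max_width=2):
    """Greedy longest-match lookup of concepts up to max_width tokens wide."""
    tokens = [stem(t) for t in text.lower().split()]
    found, i = [], 0
    while i < len(tokens):
        for width in range(max_width, 0, -1):
            if i + width > len(tokens):
                continue  # not enough tokens left for a phrase this wide
            phrase = " ".join(tokens[i:i + width])
            if phrase in SURFACE_FORMS:
                found.append(SURFACE_FORMS[phrase])
                i += width
                break
        else:
            i += 1  # no concept starts at this token
    return found


def depth(concept):
    """Number of steps from the concept up to its root; deeper means more specific."""
    steps = 0
    while PARENT[concept] is not None:
        concept = PARENT[concept]
        steps += 1
    return steps
```

With these tables, `concepts_in("water tanks")` and `concepts_in("panzer tanks")` resolve to different concepts, and `depth("747")` exceeds `depth("airline jet")`, which exceeds `depth("plane")`, so a scorer could rank the more specific match higher. The hard part the comment describes, agreeing on and maintaining the tables themselves, is exactly what this sketch leaves out.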

    Without a taxonomy, how will we index sites and be able to tell the difference between "water tanks" and "panzer tanks"? I think that this is one of the great things that Google is missing to really improve its searching abilities. If you suggest an ontology to replace it, the problems encountered in developing it only multiply.

    Where is the training data? Well, one may argue that the web content out there will suffice as training data but I think that more importantly, they need collections of traffic for these sites and user behavioral patterns to quickly and adequately deduce what the surfer is in need of.

    I feel that these two aspects are missing and the taxonomy may be impossible to achieve.

    Why are we even concerned with security if we can't even lay the foundations for the semantic web? I would argue that once we plan it out and determine it's viable, then we concern ourselves with everyone's rights.

    --
    My work here is dung.
    1. Re:All Talk by RobotWisdom · · Score: 2, Informative

      I agree. My own (universally ignored) proposal for the taxonomy problem starts with person, place, and thing as 'elements' and builds complex ideas as compounds of these: [faq]

    2. Re:All Talk by Temposs · · Score: 2, Informative
      As another reply mentions, WordNet is a promising avenue for creating a taxonomy and an ontology for the web (just read a paper on ontologizing semantic relations using WordNet, actually). In fact, it already is a taxonomy of sorts (and a multi-dimensional one at that), although a generalized one. And there are multitudinous other projects building off of WordNet and paralleling WordNet.

      There's VerbNet, FrameNet, Arabic WordNet, and probably others I don't know about.

      WordNet has become a standard for working with semantic relations computationally these days. It works by storing all known senses of every dictionary word, and each sense has links to other words based on how it's semantically related (synonym, antonym, hyper/hyponym, meronym, troponym, cause, is_a, morphological derivative, etc.).
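The structure described above (words with several senses, and typed links between senses) can be sketched in a few lines of Python. This is a hypothetical miniature, not WordNet's real data or API: the sense identifiers, the `relate` helper, and the relation names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical miniature of a WordNet-style store. Sense ids and relation
# names are made up for this sketch and are not WordNet's real identifiers.
SENSES = {
    "tank": ["tank.vehicle", "tank.container"],
    "dog": ["dog.animal"],
}

# (sense, relation) -> list of related senses
RELATIONS = defaultdict(list)


def relate(source, relation, target):
    """Record a typed link from one sense to another."""
    RELATIONS[(source, relation)].append(target)


relate("tank.vehicle", "hypernym", "military_vehicle.n")
relate("military_vehicle.n", "hypernym", "vehicle.n")
relate("tank.container", "hypernym", "vessel.n")
relate("vessel.n", "hypernym", "container.n")
relate("dog.animal", "hypernym", "canine.n")
relate("tank.vehicle", "meronym", "turret.n")


def hypernym_chain(sense):
    """Follow the first hypernym link upward until none is left."""
    chain = []
    while RELATIONS.get((sense, "hypernym")):
        sense = RELATIONS[(sense, "hypernym")][0]
        chain.append(sense)
    return chain
```

Here the two senses of "tank" climb to different ancestors, which is the property that lets a sense inventory double as a rough taxonomy.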

      There's not any model that can compete with it currently, and it's widely accessible and very easy to use. As this tool improves, so will the semantic web.

      --
      Knowledge is just opinion that you trust enough to act upon. -Orson Scott Card
  2. Semantic Web ~- evil by tbriggs6 · · Score: 5, Informative
    The article does a pretty bad job at explaining the situation. The idea behind the Semantic Web is simply to provide a framework for information to be marked up for machines rather than human eyes. The idea is that using an agreed-upon frame of reference for the symbols contained in the page (an ontology), agents are able to make use of the information contained there. Further, an agent can collect data from several different ontologies and (hopefully) perform basic reasoning tasks over that data, and (even better) complete some advanced tasks for the agent's user.

    The article would have us believe that this is going to expose everyone to massive amounts of privacy invasion. This is not necessarily the case. It is already the case that there are privacy mechanisms to protect information in the SW (e.g. require agents to authenticate to a site to retrieve restricted information). Beyond simple mechanisms, there is a lot of research being conducted on the idea of trust in the semantic web - e.g. how does my agent know to trust a slashdot article as absolute truth and a wikipedia article as outright fabrication (or vice versa).

    As for making the content of the internet widely available, some researchers feel this will never happen. As another commenter noted, it is essential that there is agreement in the definition of concepts (ontologies) to enable the SW to work (if my agent believes the symbol "apple" refers to the concept Computer, and your agent believes it refers to "garbage", we may have some interesting but less than useful results). I am researching ontology generation using information extraction / NLP techniques, and it is certainly a difficult problem, and one that isn't likely to have a trivial solution (in some respects, this goes back to the origins of AI in the 1950s, and we're still hacking at it today).

    For some good references on the Semantic Web (beyond Wikipedia), check out some of these links

    1. Re:Semantic Web ~- evil by tbriggs6 · · Score: 2, Informative

      Ontologies for the Semantic Web are based on description logics (OWL DL) or first-order logics (OWL Full). We define classes and their relationships (T-Box definitions), and we define instance assertions (A-Box definitions).

      For example, we could define the Apple domain as:

      Classes: Computer, Garbage, ComputerMfg
      Roles: makesComputer, computerMadeBy

      We can assign the domain of makesComputer to be a ComputerMfg, and the range to be a Computer (the inverse would be flipped).

      Class rdf:ID="Computer"

      Class rdf:ID="Garbage"

      Class rdf:ID="ComputerMfg"

      ObjectProperty rdf:ID="computerMadeBy", domain rdf:resource="#Computer", range rdf:resource="#ComputerMfg", inverseOf rdf:ID="makesComputer"

      ObjectProperty rdf:about="#makesComputer", domain rdf:resource="#ComputerMfg", range rdf:resource="#Computer", inverseOf rdf:resource="#computerMadeBy"

      Nothing about Apple yet. So,

      We can assert that "APPLE" is a ComputerMfg (not Garbage), and that it is related to the symbol PowerBook by the makesComputer / computerMadeBy relationship.

      Computer rdf:ID="PowerBook", computerMadeBy ComputerMfg rdf:ID="APPLE"
                      makesComputer rdf:resource="#PowerBook"
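The same T-Box / A-Box split can be mimicked in plain Python to show what the definitions buy you. This is only a sketch under loud assumptions: it is not an OWL reasoner, the dictionary layout and the `check` and `with_inverses` helpers are hypothetical, and it treats domain and range as closed-world type checks, whereas OWL would use them to infer types under the open world assumption.

```python
# Plain-Python sketch, not an OWL reasoner. Class and role names follow the
# example above; the data layout and helper names are hypothetical.
CLASSES = {"Computer", "Garbage", "ComputerMfg"}

# T-Box: role -> (domain, range, inverse role)
ROLES = {
    "makesComputer": ("ComputerMfg", "Computer", "computerMadeBy"),
    "computerMadeBy": ("Computer", "ComputerMfg", "makesComputer"),
}

# A-Box: instance typing and one asserted relationship
TYPES = {"APPLE": "ComputerMfg", "PowerBook": "Computer"}
FACTS = {("APPLE", "makesComputer", "PowerBook")}


def check(fact):
    """Closed-world check that a fact respects the role's domain and range."""
    subject, role, obj = fact
    domain, range_, _inverse = ROLES[role]
    return TYPES.get(subject) == domain and TYPES.get(obj) == range_


def with_inverses(facts):
    """Add the fact implied by each role's declared inverse."""
    closed = set(facts)
    for subject, role, obj in facts:
        closed.add((obj, ROLES[role][2], subject))
    return closed
```

Asserting only that APPLE makesComputer PowerBook is enough for `with_inverses(FACTS)` to also contain the computerMadeBy fact in the other direction, which is the kind of crisp inference the definitions are meant to support.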

      So, using the Semantic Web (as it stands) requires crisp description logics, and admits (almost) no ambiguity. For those who want to pick at me, yes, OWA and UNA make things a little strange.

      Given that natural language is fraught with uncertainty, this is the root of the automatic ontology generation problem (and the beginning of my research).

  3. Healthcare? by Anonymous Coward · · Score: 1, Informative

    The guy claims that health records are public data? Well, that's a BBC site, but in the U.S. they decidedly are not, since HIPAA was passed.

    But all this semantic web stuff makes me giggle when they start talking about healthcare, anyway. I worked in that industry up until a couple years ago. Semantic web people want to move everybody away from EDI...while the healthcare people are struggling to upgrade to EDI. In 2003 I was setting up imports of fixed-length mainframe records. By the time healthcare is exchanging RDF over the Web, we'll all have nanobots in our blood and won't need doctors anymore anyway.

  4. Well by aftk2 · · Score: 2, Informative

    The semantic web would have to be feasible before it posed some sort of threat, so I wouldn't get too up in arms about this.

    --
    concrete5: a cms made for marketing, but strong enough for geeks.
  5. I have a chapter on SW in my new book by MarkWatson · · Score: 2, Informative

    I am both a sceptic and a fan of the SW. I dislike XML serialization of RDF (and RDFS and OWL) - to me the SW is a knowledge engineering task and frankly 20-year-old Lisp systems seem to offer a friendlier notation and a much better working environment. If you are a Lisp-er, check out the (now) open source Loom system that supplies a description logic reasoner and other goodies.

    The Protege project provides a good (and free) editor for working on ontologies - you might want to grab a copy and work through a tutorial.

    I think that the SW will take off, but its success will be a grassroots effort: simple ontologies will be used in an ad hoc sort of way and the popular ones might become de facto standards. I don't think that a top-down standards approach is going to work.

    I added a chapter on the SW to my current Ruby book project, but it just has a few simple examples because I wanted to only use standard Ruby libraries -- no dependencies makes it easier to play with.

    I had a SW business idea a few years ago, hacked some Lisp code, but never went anywhere with it (I tend to stop working on my own projects when I get large consulting jobs): define a simple ontology for representing news stories and write an intelligent scraper that could create instances, given certain types of news stories. Anyway, I have always intended to get back to this idea someday.