Slashdot Mirror


Using the Semantic Web to Enhance Search

RobMcCool writes "At Stanford KSL, we really like the Semantic Web. So we've taken many of our favorite web sites, scraped them, and put together a huge pile of RDF, which we'll let you download. We've used that RDF to create a search application, in the spirit of Google Q & A or Microsoft's recently announced MSN Search extensions. Our search can answer simple factual queries like the previously discussed population of Portugal, but it can also answer some more complex ones. We also have a smart autocomplete system; type "tom hanks birth" slowly to see it in action (best with Firefox). We're looking for people to be a part of this search system by running their own search sites and by putting their data on the Semantic Web. Come check it out!"
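The post doesn't say how the autocomplete works. As a hypothetical sketch only, a basic version can be a prefix lookup over a sorted list of known query phrases, using binary search (Python's `bisect`) to jump to the first candidate. The phrase list and the `complete` function are made up for illustration; they are not the Stanford system's data or code.

```python
import bisect


def complete(phrases, prefix, limit=5):
    """Return up to `limit` phrases that start with `prefix`.

    `phrases` must be sorted; bisect finds the first phrase >= prefix,
    and all matches are contiguous from there.
    """
    start = bisect.bisect_left(phrases, prefix)
    out = []
    for phrase in phrases[start:]:
        if not phrase.startswith(prefix):
            break  # past the block of matching phrases
        out.append(phrase)
        if len(out) == limit:
            break
    return out


# Hypothetical phrase list; a real system might derive these from RDF labels.
PHRASES = sorted([
    "tom hanks",
    "tom hanks birth date",
    "tom hanks birth place",
    "tom cruise",
    "portugal population",
])

print(complete(PHRASES, "tom hanks birth"))
```

Typing more characters narrows the result: "tom" matches four phrases, while "tom hanks birth" matches only the two birth-related ones.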

9 of 150 comments

  1. This won't work by holyshitholyshit · · Score: 2, Interesting
    Firstly, scraping is the same as what Google does, which is fine, but only a fool would trust the scraper not to censor its output.

    Secondly, scraping doesn't always work, and you will surely have low-grade porno and get-rich-quick schemes/scams littering your semantic data.

    But let us suppose that the main benefits of a semantic web are (A) access to reference data [which may be falsified, oops], and (B) access to product availability data [which may be falsified, oops, like mail order companies that pretend they have something in stock but don't and yet still charge your credit card].

    It just won't work.

    It will always be a rough approximation of reality.

    It's just a bad way of caching the results of scraping.

  2. A tale of two technologies.... by Crimson+Dragon · · Score: 3, Interesting

    The Semantic Web appears to be a budding server-side solution to the paradigm of information glut online. Social bookmarking appears to be a client-side solution to the paradigm of information glut online.

    It is refreshing to see exciting new solutions to the problems we have at present with targeted information retrieval on the internet. I can remember years of stagnation in this field (read: the early '90s), and any change from today's google-and-pray searching mentality among the majority of end users will be welcome.

    --
    The Crimson Dragon
  3. Semantic Horse shit by Anonymous Coward · · Score: 1, Interesting
    I hate to say it, but the Semantic Web blows chunks. No business is ever going to tag all its data so that anyone can use it. Businesses prefer to build specific web services to integrate and charge customers. Get a clue, W3C: RDF is fertilizer. So far, all the RDF rule engines out there suck from a scalability and performance perspective. There are two RDF rule engines that claim to implement RETE, but several people have analyzed them and shown that neither Jena2.2 nor pychinko implements RETE.

    The best part is that the W3C looks down on the business rules world and openly snubs it. For a long time, the W3C camp snubbed the RETE algorithm, claiming RDF graphs are better. Once people saw how horribly RDF engines perform as rule count and data increase, they rushed to hack together junk and label it RETE. Sorry, but you have to understand RETE before you can implement it. A clueless bunch of impractical daydreamers.

  4. My question by News+for+nerds · · Score: 4, Interesting

    Does it have a countermeasure against 'semantic spam'?

    1. Re:My question by smartdreamer · · Score: 2, Interesting

      There is no such thing as semantic spam. What you refer to is disinformation or information junk. Like the current web, the Semantic Web is about freedom, openness and accessibility, so everybody can publish (I'm not referring to government laws, repression, etc.). But the Semantic Web has an answer to this flood of information in something called the web of trust, which proposes assigning trust rankings to information and introduces inference engines to compute which links/sites may interest you and why. But this is not for today. ;)

  5. Re:Google watch out... by Anonymous Coward · · Score: 1, Interesting

    Note to self: the world tagging all its data isn't going to happen, no matter how much you dream about it. It takes too much damn time. Semantics-driven search using Google's technique works. Producing an RDF graph is crap. Nothing to watch here.

  6. Slashdotting Google bomb? by bcmm · · Score: 2, Interesting

    That second link goes to http://www.google.com/url?sa=U&start=1&q=http://www.w3.org/2001/sw/&e=9707
    How is that different to linking to http://www.w3.org/2001/sw/?
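The link is a Google redirect: the real destination travels in the `q` query parameter, alongside other parameters (`sa`, `start`, `e`) whose meaning isn't explained here. Assuming the redirect simply forwards to whatever `q` holds, a short sketch with Python's standard `urllib.parse` module recovers the destination; `redirect_target` is a hypothetical helper name.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def redirect_target(url: str) -> Optional[str]:
    """Return the destination carried in the `q` parameter of a
    Google-style redirect URL, or None if there is no `q`."""
    params = parse_qs(urlsplit(url).query)  # {'sa': ['U'], 'start': ['1'], ...}
    values = params.get("q")
    return values[0] if values else None


link = "http://www.google.com/url?sa=U&start=1&q=http://www.w3.org/2001/sw/&e=9707"
print(redirect_target(link))  # http://www.w3.org/2001/sw/
```

So the reader ends up on the same W3C page either way; the difference is that the click passes through google.com first.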

    Is Slashdot trying to improve someone's Google ranking?

    (Also, did Slashdot always linkify URLs entered as plaintext? I didn't write any "a href" for those two.)

    --
    # cat /dev/mem | strings | grep -i llama
    Damn, my RAM is full of llamas.
  7. Re:RSS is not Semantic Web by Anonymous Coward · · Score: 1, Interesting
    As for RSS, it is limited, but it took off rapidly. RSS v1.0 introduced RDF. That is another step in the right direction. BTW, RDF isn't that complicated. Think of it as a triplet: Subject Verb Object.

    I don't think the evidence on the RDF mailing list supports that opinion. Look at the literature about the Semantic Web in bookstores. If anything, it is full of confusion, and the specification is poorly written compared to the HTML and XML specifications.

    Triplet does not equal (Subject verb object). What the RDF spec describes is closer to Natural Language parsing concepts. There are many similarities between what the RDF describes as RDF Model graph and dependency grammar techniques http://w3.msi.vxu.se/~nivre/research/sdg.html.
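For concreteness: in the RDF data model, a statement is a triple of subject, predicate and object, where the subject and predicate are typically URIs and the object is a URI or a literal value. The sketch below stores triples as Python named tuples and answers pattern queries in which `None` acts as a wildcard. The `example.org` URIs and the `match` helper are hypothetical; real data would use shared vocabularies and a proper RDF library.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


# Hypothetical vocabulary namespace, for illustration only.
EX = "http://example.org/"

store = {
    Triple(EX + "TomHanks", EX + "name", "Tom Hanks"),
    Triple(EX + "TomHanks", EX + "birthDate", "1956-07-09"),
}


def match(store, s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None):
    """Return (sorted) triples matching the pattern; None matches anything."""
    return sorted(
        t for t in store
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.object == o)
    )


# "When was Tom Hanks born?" becomes a pattern with a wildcard object.
print(match(store, s=EX + "TomHanks", p=EX + "birthDate")[0].object)
```

A factual query then reduces to filling in two positions of the triple and reading off the third.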

    Anyone remotely interested in NLP knows the problem is very hard to solve using dependency grammar techniques. Statistical approaches have been shown to perform much better.

    The Semantic Web is essentially repeating the same mistakes already made in the AI world with NLP. The W3C seems blind to these facts, and that's why the Semantic Web is doomed to fail.

  8. Re:Google watch out... by ShinmaWa · · Score: 2, Interesting
    However, it does place a lot of demand on the content provider to provide metadata-rich content

    This statement is why I was wondering why this was considered such a wonderful thing. For a while now, there's been a research project at IBM called WebFountain that not only does everything the Semantic Web attempts to do, but doesn't require any special markup either. Its goal is to work with completely unstructured data of any type, including web pages, PowerPoint documents, Word docs, PDFs, etc. Based on the article I linked above (which is 18 months old), the Semantic Web actually seems much more primitive.

    More to the point, this blog had an article on WebFountain, and its comments section included this mention of WebFountain in an RDF/OWL environment:
    if everyone were to agree on a tag set and apply it consistently, and tag everything of possible business interest, then yes, WebFountain would not be so relevant...and people would also need to tag for things that they don't even know will be businesses in 50 years [...] We'll see if that pans out!
    To me, that hits the nail on the head as to why a markup-based semantic engine is doomed to failure. While the remark was made in a business context, I think it's just as valid in any context.
    --
    The /. Effect: Thousands of users simultaneously accessing a site to not read its content.