Slashdot Mirror


Using the Semantic Web to Enhance Search

RobMcCool writes "At Stanford KSL, we really like the Semantic Web. So we've taken many of our favorite web sites, scraped them, and put together a huge pile of RDF, which we'll let you download. We've used that RDF to create a search application, in the spirit of Google Q & A or Microsoft's recently announced MSN Search extensions. Our search can answer simple factual queries like the previously discussed population of Portugal, but can also answer some more complex ones. We also have a smart autocomplete system: type "tom hanks birth" slowly to see it in action (best with Firefox). We're looking for people to be a part of this search system by running their own search sites, and by putting their data on the Semantic Web. Come check it out!"

22 of 150 comments

  1. Google watch out... by jason718 · · Score: 5, Insightful

    Semantic-driven search engines have awesome potential. However, it does place a lot of demand on the content provider to provide metadata-rich content - or to be able to provide intelligent mining tools to create metadata from existing sites.

    This is definitely one to watch...

    1. Re:Google watch out... by ShinmaWa · · Score: 2, Interesting
      However, it does place a lot of demand on the content provider to provide metadata-rich content

      This statement is why I wondered why this is considered such a wonderful thing. For a while now, there's been a research project at IBM called WebFountain that not only does everything that the Semantic Web attempts to do, but also doesn't require any special markup. Its goal is to work with completely unstructured data of any type, including web pages, PowerPoint documents, Word docs, PDFs, etc. Based on the article I linked above (which is 18 months old), it seems the Semantic Web is actually much more primitive.

      More to the point, in this blog there was an article on WebFountain. In the comments section there was this mention of WebFountain in an RDF/OWL environment:
      if everyone were to agree on a tag set and apply it consistently, and tag everything of possible business interest, then yes, WebFountain would not be so relevant...and people would also need to tag for things that they don't even know will be businesses in 50 years [...] We'll see if that pans out!
      To me, that hit the nail on the head, and it is why a markup-based semantic engine is doomed to failure. While the remark was made in a business context, I think it's just as valid in any context.
      --
      The /. Effect: Thousands of users simultaneously accessing a site to not read its content.
    2. Re:Google watch out... by Metasquares · · Score: 2, Informative

      As one who has written semantic web pages, it's also rather difficult. OWL is a real pain to write, and most interpreters don't support "OWL Full", which means I'm stuck writing for either "OWL Lite" (now with only half the calories!) or "OWL DL". Forget (X)HTML, too - you need to use XML+RDF to use OWL, which means that if you want content you either need a parser or you need to code two documents for each one: one for human readability, and one that contains the metadata. There used to be a language called SHOE that embedded metadata into HTML via meta tags, but that seems to have been supplanted by DAML+OIL and OWL.

      If it's made easier to write (like SHOE was, actually), I can see widespread adoption, because the idea of adding machine-searchable metadata to a document is very good; the implementation is just very poor. Otherwise, expect to be paying your web developers a lot more, both rate and timewise, in the future!

  2. From the check it out link... by Anonymous Coward · · Score: 2, Funny
    "Search on TAP was built to answer the following types of queries: There are also two actors named Harrison Ford: the one who played Han Solo, and a silent film star from the 1920's."

    That's nice and all, but who shot first, and is there a mash-up of both scenes with crazy alien bar music mixed with sinister 1920s piano?

  3. autocomplete by cryptoz · · Score: 5, Insightful

    Autocomplete is a useless feature that nobody wants to see when they type "a"...and see it load everything that begins with "a". The user is not interested in items starting with "a". Perhaps they're interested in terms beginning with "anon" or something, which has many fewer items to load, therefore making the load time much faster and not annoying the user in the process.

    Or, even better, never have any autocomplete turned on automatically. Do a VB-like idea, where if you want to see possibilities at a certain point, hit a specific key that will register for the list to pop down.
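
    A minimal sketch of the two ideas above, assuming a sorted in-memory term list; the names suggest, MIN_PREFIX_LEN, and forced are hypothetical and are not taken from the Stanford system. Suggestions stay hidden until the prefix is long enough, unless the user explicitly asks for them with a key press:

```python
from bisect import bisect_left

MIN_PREFIX_LEN = 3  # assumed threshold; tune to the size of the term list

# Example data only; terms must be lowercase and sorted for bisect to work.
TERMS = sorted([
    "anonymity", "anonymous", "apple",
    "tom hanks", "tom hanks birth date", "tomato",
])

def suggest(terms, prefix, min_len=MIN_PREFIX_LEN, limit=10, forced=False):
    """Return up to `limit` terms starting with `prefix`.

    Returns nothing for short prefixes unless `forced` is True, which
    models the user pressing an explicit "show completions" key.
    """
    prefix = prefix.lower()
    if not forced and len(prefix) < min_len:
        return []
    # Binary search for the first term that is >= prefix.
    i = bisect_left(terms, prefix)
    matches = []
    while i < len(terms) and len(matches) < limit and terms[i].startswith(prefix):
        matches.append(terms[i])
        i += 1
    return matches

if __name__ == "__main__":
    print(suggest(TERMS, "a"))               # too short, nothing is loaded
    print(suggest(TERMS, "anon"))            # long enough, few matches
    print(suggest(TERMS, "a", forced=True))  # explicit request by the user
```

    Capping the result count with limit keeps the response small even when a forced lookup matches many terms.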

    1. Re:autocomplete by davegust · · Score: 2, Insightful

      Have you tried Google Suggest? Auto complete is very useful when it doesn't slow down the typing, and when the results are in a useful order.

  4. Semantic Web? by DoctoRoR · · Score: 4, Informative

    The Stanford research is interesting, but I'm still trying to make up my mind about the Semantic Web, learning about RDF, and deciding whether I need to bake in ways of handling these kinds of assertions in my web app. The Stanford group writes, "Our hope is that our search application spurs development of the Semantic Web, and leads to sites publishing their data in this format so that we don't have to." It obviously takes more work to encode such information and to get user contributions auto-marked for the semantic web. For a counter viewpoint, take a look at some of Clay Shirky's work -- in particular:

    Will the semantic web be supported by future versions of Drupal, phpBB, and other grass-roots content management web apps? Not sure. Since a lot of the content is visitor generated, you would have to build in ways of providing easy markup. Would be interested to hear /. thoughts on the matter.

  5. Re:best with firefox by Timesprout · · Score: 4, Insightful

    No, 'works best with Firefox' is just as bad as 'works best with IE'. What would be nice would be to see 'works best with any standards compliant browser'.

    --
    Do not try to read the dupe, that's impossible. Instead, only try to realize the truth
    What truth?
    There is no dupe
  6. slashdotted by maharg · · Score: 2, Funny
    --

    $ strings FTP.EXE | grep Copyright
    @(#) Copyright (c) 1983 The Regents of the University of California.
  7. Semantic Web Pitfalls by aftk2 · · Score: 3, Insightful

    While the idea of the semantic web has been legitimately lambasted, I think it's a bit far from DOA. While I agree that it's not exactly practical, I think that if you get enough sites displaying their content in such a manner, you'll eventually reach a point at which others will do the same.

    I mean, think about it this way - while laziness or inertia might initially win out, once someone's competitors start to explore the idea of the semantic web, interest in it will grow, especially once it becomes profitable to do so.

    --
    concrete5: a cms made for marketing, but strong enough for geeks.
  8. This won't work by holyshitholyshit · · Score: 2, Interesting
    Firstly, scraping is the same as what Google does, which is fine, but only a fool would trust the scraper not to censor their output.

    Secondly, scraping doesn't always work, and you will surely have low-grade porno and get-rich-quick schemes/scams littering your semantic data.

    But let us suppose that the main benefits of a semantic web are (A) access to reference data [which may be falsified, oops], and (B) access to product availability data [which may be falsified, oops, like mail order companies that pretend they have something in stock but don't and yet still charge your credit card].

    It just won't work.

    It will always be a rough approximation of reality.

    It's just a bad way of caching the results of scraping.

  9. A tale of two technologies.... by Crimson Dragon · · Score: 3, Interesting

    The Semantic Web appears to be a budding server-side solution to the paradigm of information glut online. Social bookmarking appears to be a client-side solution to the paradigm of information glut online.

    It is refreshing to see exciting new solutions to the problems we have at present with targeted information retrieval on the internet. I can remember years of stagnation in this field (read: early 90's), and any change from today's Google-and-pray searching mentality among the majority of end-users will be welcome.

    --
    The Crimson Dragon
  10. awesome! by Anonymous Coward · · Score: 3, Funny

    ...now I can finally search for "images of women with breasts larger than 36D"!

  11. Might actually help by Artifakt · · Score: 3, Insightful

    This looks like it will broaden the volume of useful searches. Right now, there are at least two limits that show up when searching:

    1. For really popular subjects, the useful links are swamped in the noise of sites trying to make a buck off of getting you to look at their ads before directing you to somewhere else, that might have the actual content or might not.

    2. For many less popular subjects, there is some oddity, like an unusual term being borrowed by some other field, so that it is something most people have never heard of, but people in two or more specialties use it frequently, in very different ways, resulting in strangeness. (E.g. the search engine throws up 23,003 links for a search on "Sator Resartus". 30% are esoteric literary criticism, 20% relate to apoptosis (cell biology), 20% relate to building moral inhibitions into A.I., 10% to Keith Laumer novels, and the rest are probably noise.)

    (I'm sure there are more than these two limits. Someone else may want to comment on some others).

    This is likely to help with the second case, oddities in the data set grouping. (It could sort links into the larger sub-categories, ask the user which one(s) seem most applicable, and maybe even sort out a small set of links that explain, for the previous example, how a highbrow literary term got borrowed by the other fields.)
    It's not as likely it would help with the first case, though, as sites that don't have actual content are actively duplicitous. Something that is actively trying to fool humans is still likely to be very successful at fooling our tools.

    --
    Who is John Cabal?
  12. My question by News for nerds · · Score: 4, Interesting

    Does it have a countermeasure against 'semantic spam'?

    1. Re:My question by smartdreamer · · Score: 2, Interesting

      There is no such thing as semantic spam. What you refer to is disinformation or information junk. Like the actual web, the semantic web is about freedom, openness and accessibility. So, everybody can publish (I don't refer to government laws, repression, etc.). But the semantic web has a solution to this wave of information in a thing called the web of trust, which proposes giving trust rankings to information and introducing inference engines to compute which links/sites may interest you and why. But this is not for today. ;)

  13. Slashdotting Google bomb? by bcmm · · Score: 2, Interesting

    That second link goes to http://www.google.com/url?sa=U&start=1&q=http://www.w3.org/2001/sw/&e=9707
    How is that different to linking to http://www.w3.org/2001/sw/?

    Is Slashdot trying to improve someone's Google ranking?

    (Also, did Slashdot always linkify URLs entered as plaintext? I didn't write any "a href" for those two.)

    --
    # cat /dev/mem | strings | grep -i llama
    Damn, my RAM is full of llamas.
  14. Re:Semantic Horse shit by Anonymous Coward · · Score: 2, Insightful

    Nice straw man argument. How many people making their own personal site are going to dedicate 2/3 of their time to tagging their content? The only people who are going to tag their content are those looking to abuse the system. No sane individual is going to spend 3 months going back and editing all their pages with tags. Even then, you still have the problem of conflicting categories (aka ontologies). There will never be a globally accepted set of ontologies. It's all a pipe dream. Why should users spend hours and hours tagging their site when Google is already doing a good job of indexing pages?

  15. I'm still trying to figure out... by rah1420 · · Score: 2, Funny

    ...not only what the Semantic Web is about, but more pragmatically why this is in "Hardware." :)

    --
    Mit der Dummheit kämpfen Götter selbst vergebens.
  16. You missed the point! by holygoat · · Score: 2, Insightful

    The Semantic Web is about describing resources, not tagging pages.

    Indeed, you might output RDF from your processing of Web pages.

    Extracting information from semi-structured text is very different to making logical assertions about resources.
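
    To make the distinction concrete, here is a rough sketch of what "logical assertions about resources" can look like, assuming plain Python tuples stand in for RDF triples. The ex: identifiers and the match helper are hypothetical, chosen for illustration, and do not come from the Stanford data set. The point is that the two actors named Harrison Ford are separate resources that happen to share a name:

```python
# Each assertion is a (subject, predicate, object) triple about a resource.
# All "ex:" identifiers are made up for this example.
TRIPLES = {
    ("ex:HarrisonFord_1", "ex:name", "Harrison Ford"),
    ("ex:HarrisonFord_1", "ex:played", "ex:HanSolo"),
    ("ex:HarrisonFord_2", "ex:name", "Harrison Ford"),
    ("ex:HarrisonFord_2", "ex:activeIn", "silent film"),
}

def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern, sorted; None is a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

if __name__ == "__main__":
    # A name lookup finds two distinct resources, not one ambiguous string.
    for subject, _, _ in match(TRIPLES, p="ex:name", o="Harrison Ford"):
        print(subject, match(TRIPLES, s=subject))
```

    A keyword index over page text would see only the string "Harrison Ford"; the triples keep the two people apart and let a query ask which one played Han Solo.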

  17. Re:Bashers watch out... by Dasch · · Score: 2, Insightful
    So, where do you find the business case that justifies web designers all over the world spending even 10 % extra time to specify the information needed by the Semantic Web???

    If it meant that their sites would rank higher in the search results, I'd say that they all would...
  18. How is this different from HTML? by klatty · · Score: 2, Insightful
    The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries.


    Isn't this basically what HTML is supposed to do, kind of?