Slashdot Mirror


Semantic Web Getting Real

BlueSalamander writes "Tim O'Reilly just did an interview with Devin Wenig, the CEO-designate of Reuters. With no great enthusiasm I started to read yet another interview on how the semantic web was going to make everything great for everybody. Wenig made some good points about the end of the latency wars in news and the beginning of the battle for automatically detecting linkages and connections in the news. Smart news, not just fast news. Great stuff — but just more words? Nope — a little searching revealed that Reuters just opened access to their corporate semantic technology crown jewels. For free. For anyone. Their Calais API lets you turn unstructured text into a formal RDF graph in about one second. I ran about 5,000 documents through it and played with a subset of them in RDF-Gravity. The results were impressive overall. Is this the start of the semantic web getting real? When big names and big money start to act, not just talk, it may be time to pay attention. Semantic applications anyone? The foundation appears to be here."
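The summary says Calais turns unstructured text into an RDF graph. The sketch below does not call Calais, and it does not reproduce Calais's API or output schema. It only shows, under that assumption, what an RDF-style graph of subject-predicate-object triples looks like once entities have been extracted, and how a simple wildcard pattern query walks it. The predicate names (type, ceoDesignateOf, interviewed) and the helper match() are hypothetical.

```python
from typing import Optional

# A triple is (subject, predicate, object), the basic unit of an RDF graph.
Triple = tuple[str, str, str]

# Hypothetical triples an entity extractor might emit for the interview story.
graph: set[Triple] = {
    ("Reuters", "type", "Company"),
    ("Devin Wenig", "type", "Person"),
    ("Devin Wenig", "ceoDesignateOf", "Reuters"),
    ("Tim O'Reilly", "interviewed", "Devin Wenig"),
}

def match(pattern: tuple[Optional[str], Optional[str], Optional[str]]) -> list[Triple]:
    """Return every triple whose fields equal the non-None parts of the pattern (None = wildcard)."""
    return sorted(
        t for t in graph
        if all(p is None or p == v for p, v in zip(pattern, t))
    )

# Which entities are linked to Reuters, and how?
for s, p, o in match((None, None, "Reuters")):
    print(s, p, o)
```

A real triple store would index each position rather than scanning the whole set, and would query with SPARQL rather than a hand-rolled matcher.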

14 of 135 comments

  1. Where's the Money? by Blakey+Rat · · Score: 2, Interesting

    I've never understood what the financial benefits of joining the semantic web are supposed to be for a site. Reuters may be one thing, but how would you sell this technology to Amazon? Or Newegg? If commercial sites can't or won't use it, how is it supposed to gain critical mass?

    1. Re:Where's the Money? by pereric · · Score: 3, Interesting
      If I have a business selling, for example, bicycle pedals, being well listed at www.bike-pedal-finder.com or by users of some yellow pages could certainly help my business. If search engines could use information like the following, it would probably help:

      <dealer name="my company">
        <in-stock>
          <pedal model="M525" price="20E"/>
          <pedal model="M324" price="10E" status="pre-owned"/>
        </in-stock>
        <location> ... </location>
        <shipping> ... </shipping>
      </dealer>
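To show why a listing like this is useful to a search engine, here is a minimal Python sketch that parses a well-formed version of it with the standard library's xml.etree.ElementTree and pulls out new (not pre-owned) pedals under a price cap. The element and attribute names (in-stock, pedal, model, price, status) follow the poster's made-up markup rather than any real schema, the trailing "E" on prices is assumed to mean euros, and new_pedals_under() is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Well-formed version of the dealer listing; element names are the poster's invention.
LISTING = """
<dealer name="my company">
  <in-stock>
    <pedal model="M525" price="20E"/>
    <pedal model="M324" price="10E" status="pre-owned"/>
  </in-stock>
  <location>...</location>
  <shipping>...</shipping>
</dealer>
"""

def new_pedals_under(xml_text: str, max_euros: float) -> list[tuple[str, float]]:
    """Return (model, price) for pedals that are not pre-owned and cost at most max_euros."""
    root = ET.fromstring(xml_text.strip())
    results = []
    for pedal in root.iter("pedal"):
        # "20E" -> 20.0, assuming the trailing E means euros.
        price = float(pedal.attrib["price"].rstrip("E"))
        if pedal.get("status") == "pre-owned":
            continue
        if price <= max_euros:
            results.append((pedal.attrib["model"], price))
    return results

print(new_pedals_under(LISTING, 25))  # [('M525', 20.0)]
```

The same filter over free-form prose would need the engine to guess which numbers are prices and which words mark a used item; with explicit attributes it is a dictionary lookup.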
  2. Yawn... by icebike · · Score: 4, Interesting

    So I need this WHY?

    Most websites have little to say, and take all day to say it.
    Having a detailed graphical analysis of the blather seems unlikely to improve the situation. GIGO.

    It would seem spending just a tad more time writing for HUMANS would be way more productive than writing for machines. Having a thousand computers watching your 100 monkeys seems unlikely to bring enlightenment or useful knowledge out of a pile of garbage and human blathering that passes for information on the web these days.

    People used to write web pages.
    Now they write software to write web pages.
    It's not surprising they now need to write software to understand the web pages.
    What's the point?

    --
    Sig Battery depleted. Reverting to safe mode.
    1. Re:Yawn... by QuantumG · · Score: 4, Interesting

      Writing AI that can read English (and all the other languages) and figure out the meaning is just, well, taking too long. But let's say it wasn't. What would be the point? Would you say there was no point? Or would you say it was freakin' awesome, and look forward to the day when you can actually ask a question and get a sensible answer from a machine?

      Well, if we are very forgiving, we can get this kind of thing happening with current technology; we just have to supply all the "content" in a form that our primitive algorithms can handle. The Semantic Web is that. Maybe around the third generation of these algorithms we might be ready to do the translation to machine form automatically, maybe not, but at least the Semantic Web people are talking about translation again. There was a time when they all said it was a fruitless path and the best way was just to supply applications for creating machine-readable content easily.

      --
      How we know is more important than what we know.
    2. Re:Yawn... by QuantumG · · Score: 3, Interesting

      Uh huh.

      When is the next shuttle launch?

      This is the first hit, not shuttle launch info.

      This is the second hit.. ah hah! The next launch is on Feb 7.. wait a minute, it's Feb 10! Was it delayed or something? Oh, I see, it says "Launched".. great, when's the next one.. March 11 +.. hmm.. wtf does + mean? Apparently I need to read this and hmm.. nothing there about what the + means.. I guess it means it might get delayed, they do that.

      See all that reasoning I had to do? See how long that took me? That's what the Semantic Web is for.

      --
      How we know is more important than what we know.
    3. Re:Yawn... by daigu · · Score: 4, Interesting

      I'll tell you why you need it. It provides another layer of abstraction. Let's try a few illustrative examples.

      1. Let's say you work for a Fortune 500 company and you get over 10,000 emails a day from customers complaining. Do you think it is better to read each one or have a tool that abstracts it to graphically display key concepts that they are complaining about so management can do something about it today?

      2. You are a clinical cancer researcher with a terabyte of unstructured patient data. Can you think how text descriptions in pathology reports might be displayed graphically against outcomes to suggest some interesting insights?

      There's a lot of useful information that isn't on blogs - although it would be useful for them too. You need to exercise a bit more imagination.
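daigu's first example boils down to summarising a large pile of text by the topics it mentions most. Real tools (Calais included) do entity and concept extraction; the sketch below swaps in plain word counting with a small hand-picked stopword list, purely to illustrate the "abstract 10,000 emails into a ranked list of complaint topics" step. The stopword list, the helper top_terms(), and the sample emails are all made up for illustration.

```python
import re
from collections import Counter

# Tiny, hand-picked stopword list; a real system would use a proper one.
STOPWORDS = {"the", "a", "an", "my", "is", "was", "and", "i", "it", "to", "of", "again", "no", "not"}

def top_terms(emails: list[str], n: int = 3) -> list[tuple[str, int]]:
    """Count each non-stopword term once per email and return the n most common."""
    counts = Counter()
    for text in emails:
        words = set(re.findall(r"[a-z']+", text.lower()))
        # Sorted so that tied terms keep a reproducible order.
        counts.update(sorted(w for w in words if w not in STOPWORDS))
    return counts.most_common(n)

emails = [
    "My order arrived late and the box was damaged.",
    "Late delivery again. Order still not here.",
    "Refund not processed, and delivery was late.",
]
print(top_terms(emails))
```

Counting each term once per email measures how many customers raise a topic rather than how often one angry customer repeats it; a semantic tool would go further and merge "late" and "delivery" into a single concept.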

    4. Re:Yawn... by MightyYar · · Score: 2, Interesting
      It's a damn good point, but I'm better at it than a computer. Though to tell you the truth, Google's spam filter on Gmail is darned close to perfect (once trained), so I can see how they would be able to filter the information using something akin to their spam filter. And they'd still use something like PageRank to rank the results, so that might go a long way toward nailing the spammers.

      But I wonder whether that approach is going to be any simpler or more effective than just developing better or more intelligent search algorithms? Then they don't have to determine whether or not the information is bullshit, because chances are that I'm not searching for herbal Viagra so my search terms aren't in the page.

      It's not just spammers that will throw a wrench into the semantic web... what if I accidentally leave out the metadata for a page? Or make a cut-and-paste error and forget to edit the metadata so that it is completely wrong for a page? The answer, as I see it, is computer-generated metadata... at which point, why not just build that functionality into your search engine?

      By the way, if you instead search for "Space Shuttle Launch Schedule", the first result on Google is very apropos. I often find that Google rarely leads you astray once you learn to think like a search engine (which isn't very hard - they are dumb). But I'll grant you that a more natural language for search queries would be a boon for beginners.

      Oh, and the plus after March 11? There is a legend at the top of the page:

      Legend: + Targeted For | * No Earlier Than (Tentative) | ** To Be Determined :)
      --
      W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
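QuantumG's point is that the reasoning he did by hand (skip launches that have already flown, then decode the "+" marker) is trivial for a program once the schedule is published as data rather than prose. A minimal sketch, assuming a hypothetical schedule in which each entry carries its status explicitly. The status labels come from the legend MightyYar quotes; the mission names are placeholders, the year is omitted as in the posts, and Launch and next_launch() are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Status markers from the legend quoted in the thread.
LEGEND = {"+": "Targeted For", "*": "No Earlier Than (Tentative)", "**": "To Be Determined"}

@dataclass
class Launch:
    mission: str               # placeholder names, not real mission designations
    month_day: tuple[int, int]
    marker: Optional[str]      # legend key; None once the launch has happened
    launched: bool

SCHEDULE = [
    Launch("Mission A", (2, 7), None, True),
    Launch("Mission B", (3, 11), "+", False),
]

def next_launch(schedule: list[Launch], today: tuple[int, int]) -> Optional[str]:
    """Describe the earliest launch that has not flown and is not already in the past."""
    upcoming = [item for item in schedule if not item.launched and item.month_day >= today]
    if not upcoming:
        return None
    nxt = min(upcoming, key=lambda item: item.month_day)
    month, day = nxt.month_day
    return f"{nxt.mission}: {month}/{day} ({LEGEND.get(nxt.marker, 'status unknown')})"

print(next_launch(SCHEDULE, today=(2, 10)))  # Mission B: 3/11 (Targeted For)
```

The "what does + mean?" step disappears because the status travels with the date instead of living in a legend at the top of the page.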
  3. Command line vs GUI all over again by EmbeddedJanitor · · Score: 3, Interesting
    This looks like command line vs. GUI wars all over again. GUIs are fine for rapidly hitting easy-to-find targets, but sometimes typing is far easier and faster. Lumbering crap GUIs are really hard to drive (e.g. MS Visual Studio).

    Semantic webs might be OK for small document sets where you can visually search tags and click them. Want to look up something about monkeys? Look for the tag that says monkeys (or maybe find primates first, then monkeys) and click it.

    But for huge data sets this sucks. After a smallish number of documents & subjects it must be far easier to type monkeys in search box and have Google etc do the search.

    This might work for handling some queries, but will suck supremely for complex queries over large data sets (e.g. the whole WWW).

    --
    Engineering is the art of compromise.
    1. Re:Command line vs GUI all over again by smurgy · · Score: 3, Interesting

      I really think you're forgetting about the power of booleans over indexed content and the weakness of string searching. You could argue that a tag-dense web, where autoindexers crunch tags for every page, would return an overabundance of hits compared to string searching, but what tag searching actually provides is a far more meaningful range of hits. There might or might not be more, but it's better.

      We need to couple the proposed "semantic web" with more than the single-box search page: put the current advanced search on the front end, and let users who can't cope with anything beyond a single box, or with learning to use operators, keep their good old Google-style search interface as a second option.

      Pie in the sky I know, but I like to think that the drive to search simplicity is reflective of the needs of the last generation (scared of information density) and not of the potential of the future ones (growing up searching).

      I can handle a search pretty well, and I'd enjoy getting more of a chance to search for meaning not just strings. Think of a search page with a theoretically infinite number of boxes - each box drops down to a specific type of search (tags, headers, content etc.) and operator, each box I can put an importance rating (so pages with matching tags are vital, pages with matching strings rank higher but aren't necessary etc. etc. etc. depending on my needs) and under the bottom box is a spawn new box button. If I don't like my results I customise my search, search-in-results, change my elements.

      Professionally I work with custom-indexed databases all the time, and it's a pain in the behind to know the amount of information available on the net but be faced time and again with its limitations. Every criticism you make of semantic searching here applies ten times over to string searching. Should tag-creation software be able to match human tagging in accuracy, it would easily surpass it in coverage. As to accuracy, look at the tags assigned here. The article references the OS release of Reuters' Calais, and someone has assigned the tag "vaporware". Given that vaporware is by definition unreleased (and never to be released) software, I'd say human tagging is running at 33% failure on this article at least.
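smurgy's "theoretically infinite number of boxes" describes a fielded boolean search: some criteria are required (the "vital" tag matches), others only raise a page's rank. A minimal Python sketch of that scoring over a toy in-memory index; the field names, weights, pages, and the names Criterion and fielded_search() are hypothetical, and a real engine would use an inverted index rather than scanning every page.

```python
from dataclasses import dataclass

@dataclass
class Criterion:
    field: str       # e.g. "tags", "headers", "content" (hypothetical field names)
    term: str
    weight: float    # added to the score when the criterion matches
    required: bool   # if True, pages that miss it are dropped entirely

# Toy index: each page maps field names to sets of lowercase terms.
PAGES = {
    "calais-review": {"tags": {"rdf", "reuters"}, "headers": {"calais"}, "content": {"api", "rdf", "free"}},
    "rdf-primer": {"tags": {"rdf"}, "headers": {"primer"}, "content": {"triples", "graph"}},
    "vapor-news": {"tags": {"vaporware"}, "headers": {"calais"}, "content": {"calais"}},
}

def fielded_search(criteria: list[Criterion]) -> list[tuple[str, float]]:
    """Return (page, score) pairs that satisfy every required criterion, best first."""
    results = []
    for page, fields in PAGES.items():
        hits = [c.term in fields.get(c.field, set()) for c in criteria]
        # Boolean AND over the required boxes.
        if any(c.required and not hit for c, hit in zip(criteria, hits)):
            continue
        # Weighted OR over everything that matched.
        score = sum(c.weight for c, hit in zip(criteria, hits) if hit)
        results.append((page, score))
    return sorted(results, key=lambda r: (-r[1], r[0]))

query = [
    Criterion("tags", "rdf", 3.0, required=True),
    Criterion("headers", "calais", 2.0, required=False),
    Criterion("content", "api", 1.0, required=False),
]
print(fielded_search(query))  # [('calais-review', 6.0), ('rdf-primer', 3.0)]
```

"Spawn new box" is just appending another Criterion, and "search-in-results" amounts to adding a required one.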

  4. Re:What? by owlnation · · Score: 2, Interesting

    Yes -- essentially.

    And the only reason we moved from Web 1.0 to Web 2.0, and the only reason we need to move from Web 2.0 to Web 3.0, is...

    We are still stuck on Search 1.0

    Well, ok, to be fair to Google -- Search 1.5

    Sorry, but we won't see much improvement in utility until someone rolls out Search 2.0. That is a product LONG overdue.

  5. hype, waste of time, or big mess by globaljustin · · Score: 3, Interesting
    the wiki article you linked to says:

    For example, a computer might be instructed to list the prices of flat screen HDTVs larger than 40 inches (1,000 mm) with 1080p resolution at shops in the nearest town that are open until 8pm on Tuesday evenings. Today, this task requires search engines that are individually tailored to every website being searched. The semantic web provides a common standard (RDF) for websites to publish the relevant information in a more readily machine-processable and integratable form

    On first read, I like what they are trying to do, but I see so many problems with what they are thinking, and I am not a web designer in any sense.

    First, I don't have a problem finding things to buy on the internet. The problem is the signal-to-noise ratio. There are TOO MANY Google results for something like 'plasma tv.' No matter what kind of RDF is used, it will be abused by people who want their URL to show up in your search for whatever reason. I think someone touched on this a little earlier in this thread, but it deserves repeating.

    Second, can you imagine a scenario where, say, Best Buy or Fry's uses some 'semantic web' application to do real-time, web-searchable updates of their inventory? That's what would have to happen for this to work and do something that isn't already possible.

    Right now, I can search for 'plasma tv' in google or ebay. Then I can call my local retailers to see if they carry that item, and have it in stock. In order for this system to make any kind of tangible change in the example given, retail chains would have to update their inventories online, whenever a purchase is made, or new items delivered to the store.

    It's an interesting idea. I wonder whether the retailers would go for it. All it means for them is fewer people coming into their stores, and that sounds like it would hurt sales.

    I also hate internet hype. It really fouls things up, more than some want to acknowledge. I try to keep my 64-year-old dad educated enough to buy coffee beans on eBay, check email, look at news, etc. Every time he sees 'semantic web' or 'Web 2.0' in the media, it just confuses him, and I imagine it confuses people like him who just use the net for basics like online bill pay, eBay, etc. He doesn't need a new buzzword to motivate him to shop online or whatever.

    He has the motivation already; silly, contrived 'new media' buzzwords just waste time and confuse people.
    --
    Thank you Dave Raggett
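The Wikipedia example quoted in that post is, at heart, a filter over structured records. The sketch below assumes shops had already published their listings and Tuesday opening hours as data (exactly the real-time inventory step globaljustin doubts retailers will take). The shop names, towns, prices, the Offer record, and matching_prices() are invented for illustration; they are plain Python dataclasses standing in for RDF, and "nearest town" is reduced to a fixed town name.

```python
from dataclasses import dataclass
from datetime import time

@dataclass
class Offer:
    shop: str
    town: str
    diagonal_in: float
    resolution: str
    price: float
    tuesday_close: time  # closing time on Tuesdays

# Invented listings; in the Wikipedia scenario each shop would publish its own.
OFFERS = [
    Offer("Shop A", "Springfield", 42, "1080p", 1299.0, time(21, 0)),
    Offer("Shop B", "Springfield", 46, "720p", 999.0, time(21, 0)),
    Offer("Shop C", "Springfield", 50, "1080p", 1799.0, time(18, 0)),
    Offer("Shop D", "Shelbyville", 52, "1080p", 1599.0, time(22, 0)),
]

def matching_prices(offers, town, min_diag=40, resolution="1080p", open_until=time(20, 0)):
    """(price, shop) for TVs larger than min_diag inches in `town`, at shops open until open_until on Tuesdays."""
    return sorted(
        (o.price, o.shop) for o in offers
        if o.town == town
        and o.diagonal_in > min_diag
        and o.resolution == resolution
        and o.tuesday_close >= open_until
    )

print(matching_prices(OFFERS, "Springfield"))  # [(1299.0, 'Shop A')]
```

The query itself is easy; the hard part, as the post argues, is getting every retailer to publish accurate, current records in a shared format.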
  6. Re:Semantic Spam by nwbvt · · Score: 2, Interesting

    It does seem like we are in a cycle. Way back in the days when dinosaurs like Lycos and HotBot ruled the search engine world, information on the net was categorized by tagging. Those of you over the age of 17 remember it: back then, if you did a search for "American Revolution", half your results would be porn sites that put meta tags containing the phrase "American Revolution" on their pages (although I can say those were great days to be a teenager). Then Google came along with its new "PageRank" system, which was much harder to fool (though still not impossible; look up Google bombing or the Church of Scientology's use of Google for more details). Now all of a sudden we hear talk of going back to a world of tags that are being advertised as more "democratic", and of this more sophisticated (but similarly flawed) scheme known as the "semantic web". Who wants to bet this new system won't last more than a year or two at most?

    --
    Mathematics is made of 50 percent formulas, 50 percent proofs, and 50 percent imagination.
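nwbvt contrasts self-declared meta tags, which the page's author controls, with PageRank, which scores a page by the links other pages point at it. A minimal power-iteration sketch of the published PageRank idea, using the commonly cited damping factor of 0.85, on a hypothetical four-page link graph. Google's production ranking is far more involved, and this sketch says nothing about it; the page names and pagerank() helper are illustrative only.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85, iterations: int = 50) -> dict[str, float]:
    """Power iteration: each page splits its rank evenly among the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the same baseline share from the "random jump" term.
        new = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new[target] += damping * rank[page] / n
        rank = new
    return rank

# Hypothetical graph: "spam" can stuff its own meta tags, but nobody links to it.
LINKS = {
    "history-site": ["encyclopedia"],
    "encyclopedia": ["history-site", "museum"],
    "museum": ["encyclopedia"],
    "spam": ["encyclopedia"],
}
ranks = pagerank(LINKS)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page:13} {score:.3f}")
```

The spam page is stuck at the baseline share no matter what it claims about itself, which is why PageRank was harder to fool than meta tags; link farms and Google bombing attack the link graph instead.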
  7. natural language processing in search? by pbhj · · Score: 2, Interesting

    timeOday >>> "evolution of natural language processing in search (rather than manual tagging) will solve the problem"

    But then, if you're creating an add-on for Joomla (or any template elements, really) to display event listings, why not add a semantic tag so that a search engine could limit the domain by "tag:events"? The extra effort involved is pretty minimal, especially when, if you code well, each event is probably in a <div class="event eventtype"> ... anyway.

    Once people realise that search engines can do semantic filtering then it will be worth it.

    As for tag spamming, surely Google et al. won't filter on tags first; they'll do their usual contextual/quantitative analyses first and then limit based on tags. So we wouldn't be gaining any spam over what we have now?
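pbhj's proposal is to let the engine rank pages by its usual analysis and only then narrow the result set with a self-declared tag such as "tag:events", so a tag can shrink the results but never raise a page's position. A minimal sketch of that ordering; the "tag:" query syntax handling, the Page record, the score values, the example.org URLs, and tag_filtered_search() are hypothetical, and the score field simply stands in for whatever contextual and link analysis the engine already does.

```python
from dataclasses import dataclass, field

@dataclass
class Page:
    url: str
    score: float  # stand-in for the engine's own contextual/link analysis
    tags: set[str] = field(default_factory=set)  # self-declared by the page author

def tag_filtered_search(pages: list[Page], query: str) -> list[str]:
    """Rank by the engine's score first, then keep only pages carrying every tag:<name> in the query."""
    wanted = {t.removeprefix("tag:") for t in query.split() if t.startswith("tag:")}
    ranked = sorted(pages, key=lambda p: p.score, reverse=True)
    return [p.url for p in ranked if wanted <= p.tags]

INDEXED = [
    Page("example.org/concerts", 0.8, {"events"}),
    Page("example.org/spam", 0.1, {"events", "news", "shopping"}),  # tag-stuffed, low score
    Page("example.org/about", 0.9),
]
print(tag_filtered_search(INDEXED, "tag:events"))  # ['example.org/concerts', 'example.org/spam']
```

The tag-stuffed page still appears, but below the honest one, because its tags only let it through the filter; they add nothing to its score.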

  8. Re:Semantic Spam by soxos · · Score: 2, Interesting

    The whole semantic web thing offloads categorization to the content source, the very party that is most likely to try to abuse the system.
    That's the same criticism leveled at Wikipedia or unmoderated Slashdot. Consider the semantic web for discovery combined with moderation, and you can see there could be something to this.