Slashdot Mirror


Semantic Web Getting Real

BlueSalamander writes "Tim O'Reilly just did an interview with Devin Wenig, the CEO-designate of Reuters. With no great enthusiasm I started to read yet another interview on how the semantic web was going to make everything great for everybody. Wenig made some good points about the end of the latency wars in news and the beginning of the battle for automatically detecting linkages and connections in the news. Smart news, not just fast news. Great stuff — but just more words? Nope — a little searching revealed that Reuters just opened access to their corporate semantic technology crown jewels. For free. For anyone. Their Calais API lets you turn unstructured text into a formal RDF graph in about one second. I ran about 5,000 documents through it and played with a subset of them in RDF-Gravity. The results were impressive overall. Is this the start of the semantic web getting real? When big names and big money start to act, not just talk, it may be time to pay attention. Semantic applications anyone? The foundation appears to be here."

16 of 135 comments

  1. Re:Semantic Spam by Reverend528 · · Score: 4, Funny

    Well, as long as the spammers stick to the spec and use the type for their content, then it should be pretty easy to filter.

  2. Content? by Walzmyn · · Score: 4, Insightful

    What good are fancy links if the content still sucks?

  3. Yawn... by icebike · · Score: 4, Interesting

    So I need this WHY?

    Most websites have little to say, and take all day to say it.
    Having a detailed graphical analysis of the blather seems unlikely to improve the situation. GI,GO.

    It would seem spending just a tad more time writing for HUMANS would be way more productive than writing for machines. Having a thousand computers watching your 100 monkeys seems unlikely to bring enlightenment or useful knowledge out of a pile of garbage and human blathering that passes for information on the web these days.

    People used to write web pages.
    Now they write software to write web pages.
    It's not surprising they now need to write software to understand the web pages.
    What's the point?

    --
    Sig Battery depleted. Reverting to safe mode.
    1. Re:Yawn... by QuantumG · · Score: 4, Interesting

      Writing AI that can read English (and all the other languages) and figure out the meaning is just, well, taking too long. But let's say it wasn't.. what would be the point? Would you say there was no point? Or would you say it was freakin' awesome and look forward to the day when you can actually ask a question and get a sensible answer from a machine?

      Well, if we are very forgiving we can get this kind of thing happening with current technology, we just have to supply all the "content" in a form that our primitive algorithms can handle. The Semantic Web is that. Maybe around the 3rd generation of these algorithms we might be ready to do the translation to machine form automatically.. maybe not.. but at least the Semantic Web people are again talking about translation.. was a time when they all said it was a fruitless path and the best way was to just supply applications for creating machine readable content easily.

      --
      How we know is more important than what we know.
    2. Re:Yawn... by daigu · · Score: 4, Interesting

      I'll tell you why you need it. It provides another layer of abstraction. Let's try a few illustrative examples.

      1. Let's say you work for a Fortune 500 company and you get over 10,000 emails a day from customers complaining. Do you think it is better to read each one or have a tool that abstracts it to graphically display key concepts that they are complaining about so management can do something about it today?

      2. You are a clinical researcher in Cancer and have a terabyte of unstructured patient data. Can you think how text descriptions of pathology reports might be displayed graphically against outcomes to suggest some interesting insights?

      There's a lot of useful information that isn't on blogs - although it would be useful for them too. You need to exercise a bit more imagination.

    3. Re:Yawn... by QuantumG · · Score: 4, Insightful

      Ok, you seem to be of the belief that I'm still talking about search.. in the classical "give me a web page about" sense. I'm not.. and the Semantic Web people are not. "next" has a meaning.. everyone knows what it is. "shuttle launch" has an almost unique meaning.. although some concept of our culture and common sense is needed to disambiguate it. Asking when the next shuttle launch is has a unique answer: a date and a statement of the confidence in that date. For example "March 12, depending on weather and other things that might scrub the launch." I don't expect this to be "webpages that are kept up-to-date with information specific to the next shuttle launch"... I expect the answer to my question to be synthesized in real time from a dynamic pool of knowledge which is obtained from reading the web. I want a brain in a jar that is at my beck and call to answer every little question like this that I have throughout the day.. on everything from spacecraft launches to what the soup of the day is at the five closest restaurants to my office. There doesn't need to be some web page that is updated daily by some guy who works near me and enjoys soup.. there just needs to be information on soup and location posted by restaurants in my area.

      So am I talking about search? Well, yes, but it's an algorithm that uses search to answer my questions.. instead of me having to do it.

      Think about that soup question.. how would you do it now? I'd go to Google maps.. enter the location of my office, search businesses for restaurants, click on one of the top 5 to see if they have a daily updated menu, note the soup of the day, go back to Google maps, click on the next one, etc, until I had the answer I wanted. That's a pretty simple algorithm.. it's something a machine learning system could come up with.
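
      A minimal sketch of that loop, assuming each restaurant already publishes its location and soup of the day in machine-readable form. The Restaurant record, the soups_near function, and all of the data are made up for illustration; nothing here talks to Google Maps or any real service.

```python
from dataclasses import dataclass
from math import hypot
from typing import Optional


@dataclass
class Restaurant:
    name: str
    x_km: float  # distance east of the office in km (made-up data)
    y_km: float  # distance north of the office in km (made-up data)
    soup_of_the_day: Optional[str]  # None if nothing is published


# Hypothetical pool of facts, as if gathered from the restaurants' own sites.
RESTAURANTS = [
    Restaurant("Alpha", 0.1, 0.0, "tomato"),
    Restaurant("Bravo", 0.0, 0.3, "minestrone"),
    Restaurant("Charlie", 2.0, 2.0, "pumpkin"),
    Restaurant("Delta", 0.5, 0.5, None),
    Restaurant("Echo", 1.0, 0.0, "leek"),
    Restaurant("Foxtrot", 0.0, 1.5, "miso"),
    Restaurant("Golf", 5.0, 5.0, "borscht"),
]


def soups_near(office, restaurants, count=5):
    """Return (name, soup) for the `count` restaurants closest to `office`.

    The soup is None when a restaurant has not published one.
    """
    ox, oy = office
    by_distance = sorted(
        restaurants, key=lambda r: hypot(r.x_km - ox, r.y_km - oy)
    )
    return [(r.name, r.soup_of_the_day) for r in by_distance[:count]]


if __name__ == "__main__":
    for name, soup in soups_near((0.0, 0.0), RESTAURANTS):
        print(name, "->", soup or "unknown")
```

      The point of the sketch is that the hard part is getting the facts into a structure like RESTAURANTS; once they are there, the question itself is a sort and a slice.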

      --
      How we know is more important than what we know.
  4. Re:Where's the Money? by QuantumG · · Score: 5, Insightful

    Yeah, it won't matter until Google starts getting in on the act. When you can search for "a website where I can get free kittens and other pets" and get exactly that, instead of just sites that have those keywords in them (like this message in a day or so), then it will be valuable for people to RDF their site and maybe even look at the mess that the translator makes and clean it up.

    --
    How we know is more important than what we know.
  5. Great, just great ... by ScrewMaster · · Score: 4, Funny

    Semantic Web Getting Real

    Just what we need. Yet another version of RealPlayer.

    --
    The higher the technology, the sharper that two-edged sword.
  6. Re:Symantec Web? AHHHHHHH!!! by bane2571 · · Score: 4, Funny

    I read it like this:
    Semantic web getting real [player]
    and immediately thought "it was bad enough when the original web got it"

  7. In case you have no clue what they're talking abou by WK2 · · Score: 4, Informative

    If you are like me, and have absolutely positively no dang fucking clue what the summary is talking about: http://en.wikipedia.org/wiki/Semantic_Web

    According to the Wikipedia history, this concept has been around since at least 2001.

    --
    Write your own Choose Your Own Adventure. http://www.freegameengines.org/gamebook-engine/
  8. Not the Semantic Web by timeOday · · Score: 5, Insightful

    IMHO this is not the semantic web. The primary representation is still (just) natural language. Anything in addition to that is really just search engine technology under a different banner. Is that a bad thing? No! I've always said the semantic web was bound to fail because people don't want to spend a lot of extra effort tagging their information so others can slice and dice it; instead, the evolution of natural language processing in search (rather than manual tagging) will solve the problem. Maybe the Reuters idea of exposing the "inferred" metadata will be useful (as opposed to normal search engines like Google, which simply keep this metadata in their own indices), though as yet I don't see why.

  9. Re:Semantic Spam by fonik · · Score: 4, Insightful

    And this seems to be a major problem of the whole semantic web buzz. Search engines like Google can cut down on abuse because they're a third party that is unrelated to the content. The whole semantic web thing offloads categorization to the content source, the very party that is most likely to try to abuse the system.

    It just doesn't seem like the best idea in the world to me.

  10. Re:Semantic Spam by Necrobruiser · · Score: 5, Funny

    Of course you realize that this will just lead to a bunch of neo-netzis with their anti-semantic remarks....

    --
    "I planned within my means and got a fixed rate mortgage, so where's MY bailout?" -cafepress
  11. A Little too Cynical by Gregory Arenius · · Score: 4, Insightful

    I understand being jaded about internet hype and buzzwords but I'm still surprised that after nearly eighty comments there doesn't seem to be anyone who has anything to say other than "vaporware" and "it won't work because of the spammers." Yes, maybe it has been overhyped and yes it is taking a while for the envisioned ideas to come to fruition but that doesn't mean that those ideas aren't worthwhile.

    I'll use the following example because I recently had to do this with non-semantic tools. Let's say you wanted to see how good or bad a job a transit agency is doing in its city in comparison to other similar cities. A few metrics you might use to find similar cities would be population size, population density and land area. Google doesn't do a good job with something like that. You end up needing to search for cities individually and then finding their data points. Or you can find a list of cities ranked by population or population density. If you search on Google for something like that you end up at one of the Wikipedia lists. These lists are helpful but... still lacking. They don't contain all the cities you need or they don't provide a way to look at multiple data sets at the same time. The lists are also compiled by hand and aren't automatically updated when the information on the city page is changed. The data is in Wikipedia though. Every city page lists that information in a little box near the start of the article. But how do I take this data that is in Wikipedia from the form that it's in into a form that I can use to find what I need to know? Enter the semantic web.

    Let's say that Wikipedia, or at least the parts dealing with geography, were semantic. Now, there are tens of thousands of pages describing countries, regions, states, counties, parishes, cities, towns and villages. Then those pages are translated into many other languages. Some of the data that these pages contain is of the same type. They all contain the name of the locality, latitude, longitude, size, population size and elevation. For data such as this it would be pretty easy to have a form to enter the data into, as opposed to using the usual markup, and the form could put the data into the proper markup for the page and the proper RDF. Once the data is in proper RDF form it would be easy to automate the process of updating translations of that page with the new data as well as updating any pertinent lists. It would also make it easier for people who want to analyze or use the data because they would be able to access it much more easily.
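
    To make the transit comparison concrete, here is a rough sketch of what "similar cities" could look like once the infobox data is available as subject/predicate/object statements. The city names, predicate names, and numbers are invented for illustration and are not real Wikipedia data; the 25% tolerance is an arbitrary assumption as well.

```python
# Made-up (subject, predicate, object) statements standing in for RDF triples.
TRIPLES = [
    ("city:Aville", "population", 800_000),
    ("city:Aville", "area_km2", 400),
    ("city:Btown", "population", 750_000),
    ("city:Btown", "area_km2", 380),
    ("city:Cburg", "population", 90_000),
    ("city:Cburg", "area_km2", 60),
    ("city:Dport", "population", 820_000),
    ("city:Dport", "area_km2", 2_000),
]


def collect(triples):
    """Group triples into one {predicate: value} record per subject."""
    records = {}
    for subject, predicate, value in triples:
        records.setdefault(subject, {})[predicate] = value
    return records


def density(record):
    """People per square kilometre."""
    return record["population"] / record["area_km2"]


def similar_cities(triples, target, tolerance=0.25):
    """Cities whose population and density are both within `tolerance`
    (as a fraction of the target's values) of the target city."""
    records = collect(triples)
    ref = records[target]
    matches = []
    for city, rec in records.items():
        if city == target:
            continue
        if "population" not in rec or "area_km2" not in rec:
            continue  # incomplete data, cannot compare
        pop_ok = abs(rec["population"] - ref["population"]) <= tolerance * ref["population"]
        dens_ok = abs(density(rec) - density(ref)) <= tolerance * density(ref)
        if pop_ok and dens_ok:
            matches.append(city)
    return sorted(matches)


if __name__ == "__main__":
    print(similar_cities(TRIPLES, "city:Aville"))
```

    With the sample data, Dport matches on population but not on density, which is exactly the kind of multi-metric filter the hand-compiled lists can't do.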

    But nobody really wants machine readable access to this information, you might say, except for the random geek and researcher. I would disagree. Let's say you're using a program like Marble which is similar to Google Earth in some ways but is completely open source. If they wanted to display the population of a city when you hover over it they would currently have to create and maintain their own dataset or they'd have to write a parser to extract it from Wikipedia. Neither of those options is particularly easy at the moment but if the information was in semantic form on Wikipedia it would be a piece of cake.

    The strength of the semantic web isn't, in my opinion, going to be AI-like personal agents or anything like that. It'll be things that in many ways are already here. Like Yelp putting geotags on the restaurants they review and apps like Google Earth taking that data that's available in machine readable (Semantic!) form to overlay that data on a map so that you can see what's nearby. It'll be applications doing the same with the geotags from Flickr. It's really useful mashups like http://www.housingmaps.com/. It's the transit agency putting realtime bus data up in semantic form so you can see on your iPhone's Google map how far away the bus is. So yeah, maybe the semantic web is overhyped but that doesn't mean there isn't a lot of substance there, too.

    Cheers,
    Greg

  12. Re:Semantic Spam by SolitaryMan · · Score: 4, Informative

    And this seems to be a major problem of the whole semantic web buzz. Search engines like Google can cut down on abuse because they're a third party that is unrelated to the content. The whole semantic web thing offloads categorization to the content source, the very party that is most likely to try to abuse the system. It just doesn't seem like the best idea in the world to me.

    I think you are missing the point of Semantic Web: you can refer or link to an object, not just document.

    The company declares its URI. Now, if you are writing an article about this company, you can uniquely identify it and every web crawler knows *exactly* which company you are talking about. If the URI for the company is a hyperlink to its web site, then it can't be abused: the company itself declares what it is. The unique URI will in fact be a link to some file with information about the company (maybe an RDF file -- doesn't really matter for the concept)
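
    A small sketch of the difference this makes to a crawler, assuming each document carries a list of the URIs it is about. The URIs, the "about" field, and the documents are all hypothetical; this is not any particular RDF vocabulary.

```python
from collections import defaultdict

# Hypothetical URI that the company itself declares on its own site.
ACME = "http://acme.example/#company"

# Made-up crawled documents; "about" lists the URIs each one refers to.
DOCUMENTS = [
    {"url": "http://news.example/1",
     "text": "Acme posts record profit",
     "about": [ACME]},
    {"url": "http://blog.example/acme-cafe",
     "text": "Acme Cafe opens downtown",
     "about": ["http://acmecafe.example/#business"]},
    {"url": "http://news.example/2",
     "text": "ACME Corp. hires new CEO",
     "about": [ACME]},
]


def keyword_matches(documents, keyword):
    """Classic search: any document whose text contains the keyword."""
    return [d["url"] for d in documents if keyword.lower() in d["text"].lower()]


def index_by_entity(documents):
    """Semantic index: map each URI to the documents that link to it."""
    index = defaultdict(list)
    for doc in documents:
        for uri in doc["about"]:
            index[uri].append(doc["url"])
    return dict(index)


if __name__ == "__main__":
    print(keyword_matches(DOCUMENTS, "acme"))  # includes the unrelated cafe
    print(index_by_entity(DOCUMENTS)[ACME])    # only the company's articles
```

    The keyword search can't tell the company from the cafe; the URI index can, because the link points at the object rather than at a word.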

    The system can (and will) be abused in the same way as the old web: irrelevant links, words, concepts -- nothing new for the crawler, and it can be defeated with existing techniques.

    Again, Semantic Web = Links between concepts, not just documents, so please do not bury the good idea under the pile of misunderstanding.

    --
    May Peace Prevail On Earth
  13. Vapourware my arse by theno23 · · Score: 4, Insightful

    The company I work for, Garlik, has two products that are run off semantic web technology: DataPatrol (for pay) and QDOS (free, in beta).

    We use RDF stores instead of databases in some places as they are very good at representing graph structures, which are a real pain to deal with in SQL. You often hear the "what can RDF do that SQL can't" type arguments, which are all just nonsense. What can SQL do that a field database, or a bunch of flat files can't? It's all about what you can do easily enough that you will be bothered to do it.

    A fully normalised SQL database has many of the attributes of an RDF store, but
    a) when was the last time you saw one in production use?
    b) how much of a pain was it to write big queries with outer joins?

    RDF + SPARQL makes that kind of thing trivial, and has other fringe side benefits (better standardisation, data portability) that you don't get with SQL.
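
    For anyone who hasn't used a triple store, here is a toy illustration of the idea in plain Python. It is not SPARQL and not how any real RDF store is implemented; the graph, the match helper, and the two query functions are invented for this sketch. The second query plays the role that OPTIONAL plays in SPARQL, or an outer join in SQL.

```python
# Made-up graph: who knows whom, plus an email address for some people.
TRIPLES = {
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "email", "alice@example.org"),
    ("carol", "email", "carol@example.org"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def friends_of_friends(triples, person):
    """Follow two 'knows' edges starting from `person`."""
    found = set()
    for _, _, friend in match(triples, s=person, p="knows"):
        for _, _, fof in match(triples, s=friend, p="knows"):
            found.add(fof)
    return sorted(found)


def people_with_optional_email(triples):
    """Everyone appearing in a 'knows' edge, paired with an email or None."""
    people = set()
    for s, _, o in match(triples, p="knows"):
        people.update((s, o))
    rows = []
    for person in sorted(people):
        emails = match(triples, s=person, p="email")
        rows.append((person, emails[0][2] if emails else None))
    return rows


if __name__ == "__main__":
    print(friends_of_friends(TRIPLES, "alice"))
    print(people_with_optional_email(TRIPLES))
```

    Adding a new kind of fact (say, a phone number) means adding triples, not altering a schema, which is a large part of why graph-shaped data is less painful this way.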

    I guess it shouldn't be a surprise to see the comments consisting of the usual round of more-or-less irrelevant jokes and snide commentary - this is Slashdot after all - but I can't help responding.