Slashdot Mirror


Semantic Web Getting Real

BlueSalamander writes "Tim O'Reilly just did an interview with Devin Wenig, the CEO-designate of Reuters. With no great enthusiasm I started to read yet another interview on how the semantic web was going to make everything great for everybody. Wenig made some good points about the end of the latency wars in news and the beginning of the battle for automatically detecting linkages and connections in the news. Smart news, not just fast news. Great stuff — but just more words? Nope — a little searching revealed that Reuters just opened access to their corporate semantic technology crown jewels. For free. For anyone. Their Calais API lets you turn unstructured text into a formal RDF graph in about one second. I ran about 5,000 documents through it and played with a subset of them in RDF-Gravity. The results were impressive overall. Is this the start of the semantic web getting real? When big names and big money start to act, not just talk, it may be time to pay attention. Semantic applications anyone? The foundation appears to be here."

135 comments

  1. Semantic Spam by Rog7 · · Score: 2, Insightful

    Next up, semantic spam.

    Actually, I think it's beaten the rest of the content to the punch. =(

    1. Re:Semantic Spam by Reverend528 · · Score: 4, Funny

      Well, as long as the spammers stick to the spec and use the type for their content, then it should be pretty easy to filter.

    2. Re:Semantic Spam by smittyoneeach · · Score: 1

      Much of the output of the various news sources today is, arguably, spam.
      So the question I would have liked to pose is:
      Since we can't filter out bias, how can the technology help to make the news biases more transparent and quantifiable?
      For example, work like this about VP Cheney deserves to be bagged, tagged, and ignored, for it is a blemish on the face of legitimate journalism.

      --
      Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
    3. Re:Semantic Spam by fonik · · Score: 4, Insightful

      And this seems to be a major problem of the whole semantic web buzz. Search engines like Google can cut down on abuse because they're a third party that is unrelated to the content. The whole semantic web thing offloads categorization to the content source, the very party that is most likely to try to abuse the system.

      It just doesn't seem like the best idea in the world to me.

    4. Re:Semantic Spam by Necrobruiser · · Score: 5, Funny

      Of course you realize that this will just lead to a bunch of neo-netzis with their anti-semantic remarks....

      --
      "I planned within my means and got a fixed rate mortgage, so where's MY bailout?" -cafepress
    5. Re:Semantic Spam by msuarezalvarez · · Score: 2, Informative

      This is slashdot and all, I know. But you seem not to have read even the summary: this is about someone exposing an API which lets you turn text into an RDF graph independently of the text producer. If you want, this is something like someone giving you access to a tool like the one used by Google.

    6. Re:Semantic Spam by UbuntuDupe · · Score: 0

      *fighting urge not to say it...*

      I don't think they're being anti-semantic when they say things like that, they're just saying that newish tag systems should be segregated off to a special "neighborhood" of Web, maybe marked with a special star or something, so that they can easily be deleted if they turn out to cause trouble.

      (Admit it, you smirked...)

    7. Re:Semantic Spam by recharged95 · · Score: 1

      Google is tied to all their content because that's how they make money (it IS related). Also, it's no different if a 3rd party chooses not to cut down on the abuse: now you have 2 parties to convince not to abuse. "Do no evil" is subjective, remember?

    8. Re:Semantic Spam by fonik · · Score: 1

      Yeah, I did read that and I was speaking generally about the whole semantic web buzz and not specifically about the article. This is a case of a single third party categorizing a large amount of data. Since they are all categorized in the same way the potential for abuse is low. But is that an improvement over current search algorithms?

    9. Re:Semantic Spam by SolitaryMan · · Score: 4, Informative

      And this seems to be a major problem of the whole semantic web buzz. Search engines like Google can cut down on abuse because they're a third party that is unrelated to the content. The whole semantic web thing offloads categorization to the content source, the very party that is most likely to try to abuse the system. It just doesn't seem like the best idea in the world to me.

      I think you are missing the point of Semantic Web: you can refer or link to an object, not just document.

      The company declares its URI. Now, if you are writing an article about this company, you can uniquely identify it and every web crawler knows *exactly* what company you are talking about. If the URI for the company is a hyperlink to its web site, then it can't be abused: the company itself declares what it is. The unique URI will in fact be a link to some file with information about the company (maybe an RDF file -- doesn't really matter for the concept)

      The system can (and will) be abused in the same way as the old web: irrelevant links, words, concepts -- nothing new for the crawler, and it can be defeated with existing techniques.

      Again, Semantic Web = Links between concepts, not just documents, so please do not bury the good idea under the pile of misunderstanding.
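To make "links between concepts" concrete, here is a minimal sketch in plain Python rather than real RDF tooling. Every URI and the `#mentions` predicate are hypothetical, made up for illustration. The idea is that two unrelated documents can point at the same company URI, so a crawler can tell they are about the same thing.

```python
# Hypothetical URIs and vocabulary: none of these come from the thread.
COMPANY = "http://example.com/#company"
MENTIONS = "http://example.org/vocab#mentions"

# Each statement is a (subject, predicate, object) triple.
triples = [
    ("http://news.example.org/story/1", MENTIONS, COMPANY),
    ("http://blog.example.net/post/7", MENTIONS, COMPANY),
    ("http://blog.example.net/post/9", MENTIONS, "http://other.example/#company"),
]


def documents_about(uri, statements):
    """Return the sorted documents whose 'mentions' link points at `uri`."""
    return sorted(s for s, p, o in statements if p == MENTIONS and o == uri)


print(documents_about(COMPANY, triples))
```

Because the match is on the URI and not on the company's name, a document that merely uses the same words does not show up.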

      --
      May Peace Prevail On Earth
    10. Re:Semantic Spam by nwbvt · · Score: 2, Interesting

      It does seem like we are in a cycle. Way back in the days when dinosaurs like Lycos and Hotbot ruled the search engine world, information on the net was categorized by tagging. Those of you over the age of 17 remember it: back then, if you did a search for "American Revolution" half your results would end up being porn sites that put meta tags containing the phrase "American Revolution" on their page (although I can say those were great days to be a teenager). Then Google came about with their new "Page Rank" system which was much harder (though still not impossible, look up Google-bombing or the Church of Scientology's use of Google for more details) to fool. Now all of a sudden we hear talk of going back into a world of tags that are being advertised as more "democratic" and this more sophisticated (but similarly flawed) scheme known as the "semantic web". Who wants to bet this new system won't last more than at most a year or two?

      --
      Mathematics is made of 50 percent formulas, 50 percent proofs, and 50 percent imagination.
    11. Re:Semantic Spam by ultranova · · Score: 1

      In Soviet Russia, the system abuses you !

      --

      Forget magic. Any technology distinguishable from divine power is insufficiently advanced.

    12. Re:Semantic Spam by semanticsearch · · Score: 1

      The idea is that there is also an identity and trust infrastructure. Take this and mix with OpenID and you can marginalize spam (as we know it). I know Slashdotters are fond of some kinds of spam.

    13. Re:Semantic Spam by soxos · · Score: 2, Interesting

      The whole semantic web thing offloads categorization to the content source, the very party that is most likely to try to abuse the system.
      That's the same criticism given to Wikipedia or unmoderated Slashdot. Consider Semantic web for discovery combined with moderation and see that there could be something to this.
    14. Re:Semantic Spam by msuarezalvarez · · Score: 1

      Well, this is not a search algorithm, this is an (API giving access to an) algorithm which constructs an RDF graph from plain text data. While such a thing can be used for searching, your question is not very different from comparing apples and oranges.

    15. Re:Semantic Spam by master_p · · Score: 1

      What if the URI points to a link which redirects the user to the company's page but also adds spam in the page?

    16. Re:Semantic Spam by aevans · · Score: 1

      It's already not lasted for 7 or 8 years.

    17. Re:Semantic Spam by Anonymous Coward · · Score: 0

      Of course you realize that this will just lead to a bunch of neo-netzis with their anti-semantic remarks....


      Could this mean we've invoked Gideon's Law?
    18. Re:Semantic Spam by tpz · · Score: 1

      Same problem/solution as the non-semantic web: the page wouldn't have the official URL and would be excluded by engines due to lack of pagerank/whuffie/whatever.

      What the semantic web does above and beyond the normal web is best summed up in two parts:

      1. Allows for relations between concepts, identities, etc., not just documents.

      2. Allows for the relations to be unambiguously typed (eg. "employed by" versus "employer of" for a quick off-the-top example.) Think "rel" on steroids.

      Lots of interesting things can then be built on these two things (including, among other things, inferring likely information and relations from known information) but those two are the fundamental distinguishing elements.
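The two points above can be sketched in a few lines of Python. This is only an illustration under assumptions: the relation names and the `INVERSE` table are hypothetical, and real systems would use an ontology language rather than a dict. It shows typed relations between identities and one simple inference (deriving "employer of" from "employed by").

```python
# Hypothetical vocabulary: relation names and their inverses are illustrative only.
INVERSE = {"employed_by": "employer_of", "employer_of": "employed_by"}


def infer_inverses(triples):
    """Return the input triples plus the inverse of every typed relation."""
    known = set(triples)
    for subject, relation, obj in triples:
        inverse = INVERSE.get(relation)
        if inverse is not None:
            known.add((obj, inverse, subject))
    return known


facts = [("alice", "employed_by", "acme")]
closure = infer_inverses(facts)
print(sorted(closure))
```

An untyped link could only say that "alice" and "acme" are related; the typed version says how, which is what makes the inferred statement possible.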

    19. Re:Semantic Spam by nwbvt · · Score: 1

      Oh no, tagging certainly wasn't big back in 2000, 2001. And no, early adopters don't count, as we are talking about popular trends when discussing cycles.

      --
      Mathematics is made of 50 percent formulas, 50 percent proofs, and 50 percent imagination.
    20. Re:Semantic Spam by altek · · Score: 1

      Of course, we could keep this at Layer 3, if programmers would just start properly implementing the Evil Bit!

      --
      THE MAGIC WORDS ARE SQUEAMISH OSSIFRAGE
  2. What? by TubeSteak · · Score: 0, Offtopic

    Is the semantic web supposed to be one of those Web 3.0 things?

    --
    [Fuck Beta]
    o0t!
    1. Re:What? by owlnation · · Score: 2, Interesting

      Yes -- essentially.

      And the only reason we moved from Web 1.0 to web 2.0, and the only reason we need to move from Web 2.0 to Web 3.0 is...

      We are still stuck on Search 1.0

      Well, ok, to be fair to Google -- Search 1.5

      Sorry, but we won't see much improvement in utility until someone rolls out Search 2.0. That is a product LONG overdue.

    2. Re:What? by STrinity · · Score: 2, Insightful

      Is the semantic web supposed to be one of those Web 3.0 things?


      If by that you mean "a collection of buzz-words that everyone uses without having Clue 1 what the hell they're talking about," yes.
      --
      Les Miserables Volume 1 now up with my reading of
    3. Re:What? by AuMatar · · Score: 1

      Why? Quite frankly I've never had a search that didn't find what I wanted in the top 10 links of google, within 1-2 tries. 90%+ of the time it's within the top 5 links in 1 try.

      --
      I still have more fans than freaks. WTF is wrong with you people?
    4. Re:What? by Cederic · · Score: 1


      Google fails to provide useful search results for a lot of searches. Some of them may well have no internet content available. For many others the content is swamped by a plethora of other site aggregators, link farms, or even genuine vendors selling an item you're searching for, but not giving you the information you're after.

      Sure, I can refine my Google searches to cut out all these distractions. But I'm lazy; I want a two word search to give me the link I need straight away.

    5. Re:What? by AuMatar · · Score: 1

      Show me an example of one of those searches. I hear people claim this, but I've never, ever found one. The most I've found is too much content around my keywords and needing to be more specific. I've never seen a link farm or an aggregator in one of my searches.

      Then again, less than 1% of my searches have to do with buying something. Maybe that's the difference.

      --
      I still have more fans than freaks. WTF is wrong with you people?
  3. Content? by Walzmyn · · Score: 4, Insightful

    What good are fancy links if the content still sucks?

  4. Where's the Money? by Blakey+Rat · · Score: 2, Interesting

    I've never understood what the financial benefits for a site joining the semantic web are supposed to be. Reuters may be one thing, but how would you sell this technology to Amazon? Or NewEgg? If commercial sites can't/won't use it, how is it supposed to gain critical mass?

    1. Re:Where's the Money? by QuantumG · · Score: 5, Insightful

      Yeah, it won't matter until Google starts getting in on the act. When you can search for "a website where I can get free kittens and other pets" and get exactly that, instead of just sites that have those keywords in it (like this message in a day or so), then it will be valuable for people to RDF their site and maybe even look at the mess that the translator makes and clean it up.

      --
      How we know is more important than what we know.
    2. Re:Where's the Money? by ushering05401 · · Score: 2, Insightful

      Feeding Proxies is one potentially lucrative use of semantic technology.

      Here is a basic scenario for ten years down the line:

      1. You build a profile probably through a combination of allowing your online activities to be profiled, filling out in-depth surveys, and rating certain types of web-content on a semi-regular basis.

      2. A proxy identity is imbued with a 'personality' based on both your preferences as represented in step one, and ongoing analysis of content that causes you to register a strong reaction.

      3. The proxy consumes content and delivers what it believes to be desirable content to your device of choice.

      Given this business model we could see a return to the old 'portal' style of doing web business - though the portal itself would be largely invisible to the subscriber. Anything as simple as changing diction of a news item could vastly alter the interest of the proxy public.

    3. Re:Where's the Money? by pereric · · Score: 3, Interesting
      If I have a business selling - for example - bicycle pedals, being well listed at www.bike-pedal-finder.com, or by users of some yellow pages could certainly help my business. If the search engines could use information like below, it will probably help:

      <dealer name="my company">
        <in stock>
          <pedal model=M525 price=20E>
          <pedal model=M324 price=10E status=pre-owned>
        </in stock>
        <location> ... </location>
        <shipping> ... </shipping>
      </dealer>
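The listing above is informal markup rather than valid XML. As a rough sketch of how a search engine might consume such data, here is a well-formed variant parsed with Python's standard library. The element name `in-stock`, the `currency` attribute, and the reading of "20E" as 20 euros are all assumptions made for this example, not an existing vocabulary.

```python
import xml.etree.ElementTree as ET

# A well-formed variant of the listing: "in-stock" replaces the name containing
# a space, attribute values are quoted, and empty elements are self-closed.
# Element and attribute names are illustrative, not a real vocabulary.
LISTING = """
<dealer name="my company">
  <in-stock>
    <pedal model="M525" price="20" currency="EUR"/>
    <pedal model="M324" price="10" currency="EUR" status="pre-owned"/>
  </in-stock>
</dealer>
"""


def pedals_in_stock(xml_text):
    """Return (model, price, status) per pedal; status defaults to 'new'."""
    root = ET.fromstring(xml_text)
    return [
        (p.get("model"), int(p.get("price")), p.get("status", "new"))
        for p in root.iter("pedal")
    ]


print(pedals_in_stock(LISTING))
```

With data in this shape, a price comparison becomes a sort over structured values instead of a guess from page text.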
    4. Re:Where's the Money? by sime0n · · Score: 1

      I've been wondering the same thing, actually, and found this post on ReadWriteWeb on Dapper's plans to use semantic data to drive an advertising network pretty interesting: http://www.readwriteweb.com/archives/dapper_funding_the_semantic_web.php For a company like Reuters, I could see them driving ads for country, market, or industry reports using the tags embedded in their stories, or let other businesses further down the information analysis pipeline do the same.

    5. Re:Where's the Money? by Simon+(S2) · · Score: 2, Funny
      --
      I just don't trust anything that bleeds for five days and doesn't die.
    6. Re:Where's the Money? by Anonymous Coward · · Score: 0

      Interposable methodologies and massive multiplayer online role-playing games have garnered profound interest from both system administrators and end-users in the last several years. Despite the fact that it at first glance seems counterintuitive, it is supported by previous work in the field. Furthermore, The notion that cryptographers agree with peer-to-peer communication is largely outdated. Therefore, the simulation of digital-to-analog converters and flexible archetypes are never at odds with the refinement of symmetric encryption.

    7. Re:Where's the Money? by Anonymous Coward · · Score: 0

      ... how the hell do you figure that is a valid XML element name? heh.

  5. Anti- Semantic comments in 3 ... 2 ... 1 ... by Anonymous Coward · · Score: 1, Funny

    And now for a host of Anti-Semantic comments in 3 ... 2 ... 1 ...

    Well, I am sure the authors will just call them Anti-Zio[a]ntic comments.

  6. Yawn... by icebike · · Score: 4, Interesting

    So I need this WHY?

    Most websites have little to say, and take all day to say it.
    Having a detailed graphical analysis of the blather seems unlikely to improve the situation. GI,GO.

    It would seem spending just a tad more time writing for HUMANS would be way more productive than writing for machines. Having a thousand computers watching your 100 monkeys seems unlikely to bring enlightenment or useful knowledge out of a pile of garbage and human blathering that passes for information on the web these days.

    People used to write web pages.
    Now they write software to write web pages.
    It's not surprising they now need to write software to understand the web pages.
    What's the point?

    --
    Sig Battery depleted. Reverting to safe mode.
    1. Re:Yawn... by InsurgentGeek · · Score: 2, Informative

      You're a little unclear on the concept of an RDF graph. It's not a graph like your intro algebra class - it's an RDF (that's Resource Description Framework) representation of the semantics of a document. Check Wikipedia for Semantic Web or RDF.

    2. Re:Yawn... by QuantumG · · Score: 4, Interesting

      Writing AI that can read English (and all the other languages) and figure out the meaning is just, well, taking too long. But let's say it wasn't.. what would be the point? Would you say there was no point? Or would you say it was freakin' awesome and look forward to the day when you can actually ask a question and get a sensible answer from a machine?

      Well, if we are very forgiving we can get this kind of thing happening with current technology, we just have to supply all the "content" in a form that our primitive algorithms can handle. The Semantic Web is that. Maybe around the 3rd generation of these algorithms we might be ready to do the translation to machine form automatically.. maybe not.. but at least the Semantic Web people are again talking about translation.. was a time when they all said it was a fruitless path and the best way was to just supply applications for creating machine readable content easily.

      --
      How we know is more important than what we know.
    3. Re:Yawn... by InsurgentGeek · · Score: 1

      Perfect! A concise reasonable explanation. Thanks.

    4. Re:Yawn... by jlarocco · · Score: 1

      I can already ask Yahoo or Google a question and get a sensible answer. I guess I'm missing how this "semantic web" thing equates with AI that understands the meaning of English.

      Besides that, if you rely on the "content providers" to provide the meta-data the system is less than useless. Legitimate sites won't use it or update it, and illegitimate sites will abuse the system.

    5. Re:Yawn... by QuantumG · · Score: 3, Interesting

      Uh huh.

      When is the next shuttle launch?

      This is the first hit, not shuttle launch info.

      This is the second hit.. ah hah! The next launch is on Feb 7.. wait a minute, it's Feb 10! Was it delayed or something? Oh, I see, it says "Launched".. great, when's the next one.. March 11 +.. hmm.. wtf does + mean? Apparently I need to read this and hmm.. nothing there about what the + means.. I guess it means it might get delayed, they do that.

      See all that reasoning I had to do? See how long that took me? That's what the Semantic Web is for.

      --
      How we know is more important than what we know.
    6. Re:Yawn... by tm2b · · Score: 1

      The point is that sophisticated enough tools can help you find the websites that do have something useful to say.

      The amount of garbage out there only makes these tools more necessary.

      --
      "It is our blasphemy which has made us great, and will sustain us, and which the gods secretly admire in us." - Zelazny
    7. Re:Yawn... by MightyYar · · Score: 1

      You are pretty knowledgeable about this stuff, so I'm going to ask you:

      How does this stuff handle abuse? I mean, what's to stop Senior Spamalot from marking up all his machine-readable stuff for shuttle launches, but actually dishing you to a Viagra page? I don't understand how the "Semantic Web" won't be terribly abused.

      --
      W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
    8. Re:Yawn... by QuantumG · · Score: 3, Insightful

      How do *you* know when information is bullshit?

      How does Google's pagerank algorithm?

      --
      How we know is more important than what we know.
    9. Re:Yawn... by chthonicdaemon · · Score: 1

      I must say I don't think it's quite as "freakin' awesome" as you seem to. I believe that natural language is not only hard to handle correctly, but also hard to use correctly. There is a reason why we have formal specifications and legal language -- "natural" language is just too vague. Now in some niche areas where you don't have your hands available I can see the allure of voice recognition, but I honestly think that speaking to computers to have them do stuff in anything resembling natural language will be harder to use to get to a specific goal than what we have now. I suppose if you just want some kind of result, that's not so bad, but I kinda like getting exactly what I ask for. A much better argument here. I know it's about programming, but that's basically what we do with computers on any level of use.

      --
      Languages aren't inherently fast -- implementations are efficient
    10. Re:Yawn... by QuantumG · · Score: 1

      So ask in a formal language.. point is, we can't even ask questions now.

      We can't even ask questions about systems which are designed to be machine readable. Look at software debuggers.

      --
      How we know is more important than what we know.
    11. Re:Yawn... by Dan+East · · Score: 1

      Obviously the current searches are not semantic, so the key is searching for the right thing. At first glance, your query sounds simple enough. However, the problem is that there simply may not be any webpages dedicated to providing the exact information you asked for. In this case, are there webpages that are kept up-to-date with information specific to the next shuttle launch? What you really need to search for is not the "next" shuttle launch, whose definition is always changing, but "shuttle launch schedule", or even simply "shuttle schedule".

      Should it be easier to search than that? Sure, that would be nice. My biggest concern is that since the semantic engine is trying to infer meaning to your query (specifically, display pages that don't explicitly match your query - in this case, not when the next shuttle launch is, but simply the current shuttle launch schedule), it would be open to even more abuse through spamming and PageRank type abuse.

      Dan East

      --
      Better known as 318230.
    12. Re:Yawn... by daigu · · Score: 4, Interesting

      I'll tell you why you need it. It provides another layer of abstraction. Let's try a few illustrative examples.

      1. Let's say you work for a Fortune 500 company and you get over 10,000 emails a day from customers complaining. Do you think it is better to read each one or have a tool that abstracts it to graphically display key concepts that they are complaining about so management can do something about it today?

      2. You are a clinical researcher in Cancer and have a terabyte of unstructured patient data. Can you think how text descriptions of pathology reports might be displayed graphically against outcomes to suggest some interesting insights?

      There's a lot of useful information that isn't on blogs - although it would be useful for them too. You need to exercise a bit more imagination.

    13. Re:Yawn... by QuantumG · · Score: 4, Insightful

      Ok, you seem to be of the belief that I'm still talking about search.. in the classical "give me a web page about" sense. I'm not.. and the Semantic Web people are not. "next" has a meaning.. everyone knows what it is. "shuttle launch" has an almost unique meaning.. although some concept of our culture and common sense is needed to disambiguate it. Asking when the next shuttle launch is has a unique answer: a date and a statement of the confidence in that date. For example "March 12, depending on weather and other things that might scrub the launch." I don't expect this to be "webpages that are kept up-to-date with information specific to the next shuttle launch"... I expect the answer to my question to be synthesized in real time from a dynamic pool of knowledge which is obtained from reading the web. I want a brain in a jar that is at my beck and call to answer every little question like this that I have throughout the day.. on everything from spacecraft launches to what the soup of the day is at the five closest restaurants to my office. There doesn't need to be some web page that is updated daily by some guy who works near me and enjoys soup.. there just needs to be information on soup and location posted by restaurants in my area.

      So am I talking about search? Well, yes, but it's an algorithm that uses search to answer my questions.. instead of me having to do it.

      Think about that soup question.. how would you do it now? I'd go to Google maps.. enter the location of my office, search businesses for restaurants, click on one of the top 5 to see if they have a daily updated menu, note the soup of the day, go back to Google maps, click on the next one, etc, until I had the answer I wanted. That's a pretty simple algorithm.. it's something a machine learning system could come up with.
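The manual procedure described above can be written down as a short sketch, assuming the restaurants already publish location and menu data in machine-readable form. The records, field names, and distances below are invented for illustration; no real mapping or menu API is being called.

```python
# Hypothetical data: restaurant names, distances, and soups are made up.
RESTAURANTS = [
    {"name": "A", "distance_km": 0.2, "soup_of_the_day": "tomato"},
    {"name": "B", "distance_km": 0.5},  # publishes no daily menu
    {"name": "C", "distance_km": 1.1, "soup_of_the_day": "minestrone"},
    {"name": "D", "distance_km": 0.1, "soup_of_the_day": "lentil"},
    {"name": "E", "distance_km": 3.0, "soup_of_the_day": "pea"},
    {"name": "F", "distance_km": 0.9, "soup_of_the_day": "leek"},
]


def soups_nearby(restaurants, limit=5):
    """Map each of the `limit` closest restaurants to its soup of the day."""
    closest = sorted(restaurants, key=lambda r: r["distance_km"])[:limit]
    return {r["name"]: r.get("soup_of_the_day", "unknown") for r in closest}


print(soups_nearby(RESTAURANTS))
```

The loop itself is trivial; the hard part the comment points at is getting every restaurant to expose the two fields the loop reads.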

      --
      How we know is more important than what we know.
    14. Re:Yawn... by Brandybuck · · Score: 1

      You will need it because it will take far more than porn downloads to fill up the harddrives of tomorrow. Indexed links between every word in every file to every other word in every file will take care of that nagging empty space.

      --
      Don't blame me, I didn't vote for either of them!
    15. Re:Yawn... by martin-boundary · · Score: 2, Insightful
      You think that if we feed weak AI algorithms a lot of cleaned up, pre-tagged data, that's going to help overcome the weakness of the algorithms and produce something worthwhile?

      Sorry, there's a flaw in your reasoning: Who gets to pre-tag the data? Everybody. But you can't trust everybody on the net. So you'll get a lot of data that's specifically designed to confuse and subvert the weak algorithms, and by definition such algorithms aren't strong enough to rise to the challenge.

      The Semantic Web people will get a nasty shock when they realize that what they've really got is the Spamantic Web.

    16. Re:Yawn... by QuantumG · · Score: 1

      Blah, vetting the quality of your inputs is necessary but it's a completely different algorithm to answering queries. This is already true of search engines... and we have good ways of handling it. But hey, you're the kind of person who gives up looking for a job because you're sure no-one will hire you.

      --
      How we know is more important than what we know.
    17. Re:Yawn... by martin-boundary · · Score: 1
      1) "vetting the quality of your inputs" is not AI. It's just putting in what you want to see coming out, assuming you understand sufficiently the way the particular algorithm you're tweaking works.

      2) "we have good ways of handling it" is a euphemism for human beings. Yes, just throw people at the problem and let them censor the bits of data that they don't like. Again, you're just letting in what you want to see coming out. Search engines have teams who get paid to scrub their data. It's not AI. We still get tons of garbage in searches.

      3) I'm the kind of person who doesn't like being swindled with big words which hide thin deliverables. The problem with claiming AI power which depends on human power behind the scenes is that human power on the net just doesn't scale.

    18. Re:Yawn... by QuantumG · · Score: 1

      You must be living in some other world to me. Google search results are not vetted by humans. It's this little algorithm called pagerank.. you might have heard of it.

      --
      How we know is more important than what we know.
    19. Re:Yawn... by martin-boundary · · Score: 1
      Perhaps you should read up on it? PageRank proper is only a small factor in Google's index sorting method. Other factors are ad hoc things like weights for whether words appear in headings or paragraphs, whether the page is full of hidden keywords, whether the word "homepage" appears prominently etc.

      PageRank itself is merely about counting links, which is entirely independent of content, and not as useful on its own as you might think. For example, there's no guarantee that an index page will appear before a subordinate page if all you use is PageRank, so PageRank is simply overruled. There's special code just to try and make sure people's homepage appears first when their name is put in to the search box.
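For readers who have not seen it, here is a minimal sketch of the textbook PageRank iteration, which looks only at the link graph and never at page content. It is a simplified illustration, not Google's production ranking; the damping factor of 0.85 is the commonly cited default, and the three-page graph is invented.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute simplified PageRank; `links` maps a page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank


# Hypothetical three-page site.
web = {"home": ["about"], "about": ["home"], "blog": ["home"]}
ranks = pagerank(web)
print(sorted(ranks.items(), key=lambda kv: -kv[1]))
```

Nothing in the function inspects words on a page, which is the sense in which the score is independent of content.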

      Google's search results are vetted by teams of humans all the time. That's also the only way so called spam pages can be identified. Once it becomes clear there's a trend, an ad hoc censoring algorithm can be written to hide those kinds of spam pages from the returned results. And if someone complains, some more ad hoc code might be written to fix the bugs in the censoring algorithm.

      In any event, there's a whole lot of human oriented massaging of results to comply with criteria that simple algorithms can't discover on their own. And still Google's search results are full of dupes, they aren't clustered properly, and are often out of date, or haven't you noticed?

    20. Re:Yawn... by QuantumG · · Score: 0

      Hehe, no, maybe *you* should read up on it.

      --
      How we know is more important than what we know.
    21. Re:Yawn... by martin-boundary · · Score: 1

      Right, whatever.

    22. Re:Yawn... by ThePromenader · · Score: 1

      Page rank is a gauge of popularity, not content, more than anything. It's a factor that only comes into play when google's algorithm judges your content at the same level of that of another page as an answer to a query - only then the most popular page gets top spot.

      I like the concept of a semantic web, but frankly, I don't like its present trend of implication. It seems so "chunky" (metacruft), and still has to be managed by humans if it hopes to attain any level of accuracy.

      If we can't mimic human thought, perhaps we can make a search method that can take into account the results of its reasoning. Boolean searches are quite powerful - why not develop a system along those lines? With added functionality - say, "bob" -5 "Ralph" would turn up pages that have those two words within five words of each other, with results ordered by relevance (matching boolean '-5', matching boolean 'AND', matching one word, etc.). Have the boolean markup generated by a GUI, if you will. As for prices, I'm sure these could be recognised by any search engine if it is programmed correctly.
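The proximity operator suggested above is easy to sketch. The following assumes that "within five words" means the word positions differ by at most five, and that matching is case-insensitive; the function name `within` is made up for this example.

```python
import re


def within(text, first, second, max_gap):
    """True if `first` and `second` occur at most `max_gap` word positions apart."""
    words = [w.lower() for w in re.findall(r"\w+", text)]
    first_pos = [i for i, w in enumerate(words) if w == first.lower()]
    second_pos = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) <= max_gap for i in first_pos for j in second_pos)


print(within("Bob met his old friend Ralph yesterday", "bob", "Ralph", 5))
print(within("Bob went home. Much later that same long evening Ralph called",
             "bob", "Ralph", 5))
```

Ranking by relevance, as proposed, could then order pages by the smallest gap found rather than returning a plain yes or no.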

      Even this solution does not seem "complete" to me - somehow we're going to have to find a way to recreate human reasoning (to a certain level) before we can have a semantic web that is of any widespread (www) use to anyone.

      --

      No, no sig. Really.

      ThePromenader
    23. Re:Yawn... by Anonymous Coward · · Score: 0

      So basically you're talking about dumbing down something that people already know how to use and do not find complicated, by rehashing the same old Star Trek interface ideas nobody cared about in their prior iterations.
      MS Help had this, Ask Jeeves had this, people simply don't like it. It's not useful. Can you Semantickers move on with your life and stop bothering us already? Thank you.

    24. Re:Yawn... by Arancaytar · · Score: 1

      People used to write web pages.
      Now they write software to write web pages.


      We also have software to write software (see [[Compiler]]). Now that is just lazy and decadent.
    25. Re:Yawn... by MightyYar · · Score: 2, Interesting
      It's a damn good point, but I'm better at it than a computer. Though to tell you the truth, Google's spam filter on gmail is darned close to perfect (once trained) - so I can see how they would be able to filter the information using something akin to their spam filter. And they'd still use something like pagerank to rank the results, so that might go a long way toward nailing the spammers.

      But I wonder whether that approach is going to be any simpler or more effective than just developing better or more intelligent search algorithms? Then they don't have to determine whether or not the information is bullshit, because chances are that I'm not searching for herbal Viagra so my search terms aren't in the page.

      It's not just spammers that will throw a wrench into the semantic web... what if I accidentally leave out the metadata for a page? Or make a cut-and-paste error and forget to edit the metadata so that it is completely wrong for a page? The answer, as I see it, is computer-generated metadata... at which point, why not just build that functionality into your search engine?

      By the way, if you instead search for "Space Shuttle Launch Schedule", the first result on Google is very apropos. I often find that Google rarely leads you astray once you learn to think like a search engine (which isn't very hard - they are dumb). But I'll grant you that a more natural language for search queries would be a boon for beginners.

      Oh, and the plus after March 11? There is a legend at the top of the page:

      Legend: + Targeted For | * No Earlier Than (Tentative) | ** To Be Determined :)
      --
      W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
    26. Re:Yawn... by dkf · · Score: 1

      The answer, as I see it, is computer-generated metadata... at which point, why not just build that functionality into your search engine?
      Yahoo are already doing that. If you go to their search page, enter some search term (e.g. "linux") and search. Now, on the results page there should be a little arrow down at the bottom of the top bar; click on that and it will open up a panel that includes concepts linked to the search terms (and also possible refinements of the search). I know (from talking to the people at Yahoo) that they're deriving the concepts automatically from their spidered data, and it works really well.

      How resistant is it to spam? No idea, to be honest!
      --
      "Little does he know, but there is no 'I' in 'Idiot'!"
    27. Re:Yawn... by MightyYar · · Score: 1

      Thanks, neat link - I haven't used Yahoo search in a long time.

      It's a cool trick, but it doesn't really do much useful right now. I tried the space shuttle example, and it didn't really add any value over and above what google does. On the other hand, it does a pretty good job when your search is not very specific - like just typing "Britney Spears".

      They should make it more obvious that you need to push that little arrow! I never would have tried that!

      --
      W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
    28. Re:Yawn... by Lally+Singh · · Score: 2, Informative

      It's the difference between having all of your customer data in a set of text files vs a database. The database is structured, which lets the computer do more analysis on it. It can also index that data more effectively.

      Here's one example, say I want to do a little semi-political research. I ask semantic google (which, for the sake of argument, has a more advanced query language) for the relationship between the price of RAM and the price of oil.

      Right now, google could at best look for an article on that specifically.

      With a semantic web, it can find data points for the price of RAM & oil in various places and give me back a table. Why? because the pages would be marked with those datapoints specifically.

      Or, which years have wars with total dead > some threshold. A summation query over the lifetime of the war can do that. I don't have to find a single webpage where someone's done that by hand. Or some specialized data service for it. Google (or some other search agent) could correlate that data for me from blogs, newspaper articles, UN reports, etc. Combine them all together (b/c it knows that they're all data points for the same thing), and give me a report. It could even show me a comparison of which data sources give which numbers, letting me see report bias right there.
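
      To make that concrete, here is a minimal Python sketch of what a search agent could do once pages expose their numbers as tagged data points. The tuple layout, the quantity names, the source names and every value below are hypothetical placeholders, not real prices.

```python
from collections import defaultdict

# Hypothetical machine-readable data points harvested from marked-up pages:
# (quantity, year, value, source). All values are placeholders.
POINTS = [
    ("ram_price", 2006, 100.0, "site-a"),
    ("ram_price", 2007, 50.0, "site-a"),
    ("oil_price", 2006, 60.0, "site-b"),
    ("oil_price", 2007, 70.0, "site-b"),
    ("oil_price", 2007, 74.0, "site-c"),
]


def table(points, quantities):
    """Build year -> {quantity: mean of all reported values}."""
    buckets = defaultdict(lambda: defaultdict(list))
    for quantity, year, value, _source in points:
        if quantity in quantities:
            buckets[year][quantity].append(value)
    return {
        year: {q: sum(vals) / len(vals) for q, vals in row.items()}
        for year, row in sorted(buckets.items())
    }


def by_source(points, quantity, year):
    """Show which source reports which number for one quantity and year."""
    return {src: value for q, y, value, src in points
            if q == quantity and y == year}
```

      table(POINTS, {"ram_price", "oil_price"}) gives the side-by-side table, and by_source(POINTS, "oil_price", 2007) is the "which source says what" comparison. The hard part, getting pages to publish data points in a shared form, is exactly what the code takes for granted.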

      Give you a little bit of a chubby? Definitely gives me one. Add this to a smart voice-operated query agent and you have some star-trek stuff going on.

      --
      Care about electronic freedom? Consider donating to the EFF!
    29. Re:Yawn... by Anonymous Coward · · Score: 0

      Glad to see you've got some sort of understanding of the advantages of databases.. LOL

      I hope you told your NASA/NSA/MI5 friends all about it. Make sure you add it to your CV as proof of the valuable learning experience your MS Excel data entry job under the auspices of Mr Walter Mitty 007 was.

    30. Re:Yawn... by chthonicdaemon · · Score: 1

      We can't ask questions? I suppose asking for the value of a variable in gdb doesn't count. If you are referring to 'why is this program not working?' in the debugger reference, I assume you know that getting answers for these questions effectively requires strong AI, or at least a better specification of what it is that we are trying to do with the program in the first place.

      Back to search and the semantic web, I think that we are using formal languages to ask questions in search every day. I would love to see some examples of questions that you would like to ask (you don't need to use a formal language). Many (if not most) of the ones I can think of are answered as well as can be expected by Google, even though they don't have explicit semantic search. For the ones that require better connection mapping, I think people are making good progress with larger databases and more efficient indexing methods. I suspect, however, that the real questions you want answered require an enormous amount of context. Now, if this is the case, then your original rant about not being able to ask questions is a bit obtuse. If I am missing some meaning in your words, I suppose that's just a great proof of how hard it is to be clear in natural language.

      --
      Languages aren't inherently fast -- implementations are efficient
    31. Re:Yawn... by cromar · · Score: 1

      They had/have this to a very small extent. Their implementations are still based around keyword search and extracting keywords out of human readable questions. A truly semantic information structure would be much different, and would allow all kinds of new ways to interact with content beyond simple search. Imagine being able to not only ask English language questions, but also to be able to use a scripting language or something like a semantic regular expression to find customized content. Hell, even tagging is somewhat semantic, and it has truly made a lot of changes to the way we interact with content.

      Until the data is there to be queried, one can only imagine the implications of such richly connected data.

  7. I beat off on my router... by Anonymous Coward · · Score: 0

    ... now I can surf the sementic web.

  8. Command line vs GUI all over again by EmbeddedJanitor · · Score: 3, Interesting
    This looks like command line vs GUI wars all over again. GUIs are fine for rapidly hitting easy-to-find targets but sometimes typing is far easier and faster. Lumbering crap GUIs are really hard to drive (eg. MS Visual Studio).

    Semantic webs might be OK for small document sets where you can visually search tags and click them. Want to look up something about monkeys? Look for the tag that says monkeys (or maybe find primates first, then monkeys) and click it.

    But for huge data sets this sucks. After a smallish number of documents & subjects it must be far easier to type monkeys in search box and have Google etc do the search.

    This might work for handling some queries, but will suck supremely for complex queries over large data sets (eg. the whole www).

    --
    Engineering is the art of compromise.
    1. Re:Command line vs GUI all over again by smurgy · · Score: 3, Interesting

      I really think you're forgetting about the power of booleans over indexed content and the weakness of string searching. Positing a tag-dense web search in which autoindexers crunch tags for every page as one containing an overabundance of hits compared to string searching is arguable, but in fact what tag searching does is provide a far more meaningful range of hits. There might or might not be more, but it's better.

      We need to couple the proposed "semantic web" with more than the single-box search page or rather, allow users who can't cope with anything beyond single-box and/or learning to use operators to have their good old search google interface as a second option and put the current advanced search on the front end.

      Pie in the sky I know, but I like to think that the drive to search simplicity is reflective of the needs of the last generation (scared of information density) and not of the potential of the future ones (growing up searching).

      I can handle a search pretty well, and I'd enjoy getting more of a chance to search for meaning not just strings. Think of a search page with a theoretically infinite number of boxes - each box drops down to a specific type of search (tags, headers, content etc.) and operator, each box I can put an importance rating (so pages with matching tags are vital, pages with matching strings rank higher but aren't necessary etc. etc. etc. depending on my needs) and under the bottom box is a spawn new box button. If I don't like my results I customise my search, search-in-results, change my elements.

      Professionally I work with custom-indexed databases all the time and it's a pain in the behind to know the amount of information available on the net but be faced time and again with its limitations. Every criticism you make of semantic searching here applies ten times over to string searching. Should the tag creation software be able to match human tagging in accuracy it would easily override it in coverage. As to accuracy, look at the tags assigned here. The article references the OS release of Reuters' Calais, and someone's assigned the tag "vaporware". Given that vaporware is by definition unreleased (and never to be released) software I'd say human tagging is running at 33% failure on this article at least.

  9. Great, just great ... by ScrewMaster · · Score: 4, Funny

    Semantic Web Getting Real

    Just what we need. Yet another version of RealPlayer.

    --
    The higher the technology, the sharper that two-edged sword.
  10. Advertising..... by IHC+Navistar · · Score: 0, Offtopic

    If online news outlets cut out the advertising promos that precede every video news clip, it would be a million times more popular than it already is.

    I mean, nobody wants to see an advertisement that is twice as long as the video clip itself. People will especially be turned off when they realize they took the time to view a 30 second mattress or advertisement just to view a 45sec-1min news clip about a story that is either boring, uninformative as print, or just plain crappy.

    Advertising is the Black Plague of all media. Its consumer-repellent ability can't be denied, and the number of good ideas that have been ruined by ads is unimaginable.

    --
    Knowing Google's lust for data collection, the Soviet Union is still alive and well inside the psyche of Sergey Brin....
  11. Wordpress Plugin by Anonymous Coward · · Score: 0

    I think the bounty for a WordPress plugin is a neat idea. Having seen how poor the performance is for existing tag suggestion tools for WordPress, maybe Calais can do a better job?

  12. long way to go.. by emj · · Score: 1

    There is something similar at http://www.powerset.com/ they are still in Beta though, and it's not working that great. We will never get perfect matches from computers, but the question is if semantics will ever be better than just keywords.

    1. Re:long way to go.. by QuantumG · · Score: 2, Insightful

      blah, search is great and all, but that shouldn't really be the ultimate purpose of the Semantic Web.

      Asking a question and getting a sensible answer, that's the killer app.

      --
      How we know is more important than what we know.
  13. Symantec Web? AHHHHHHH!!! by Akaihiryuu · · Score: 1

    Am I the only one who misread that?

  14. pfft... by djupedal · · Score: 3, Funny

    "Wenig made some good points about the end of the latency wars..."

    Mr. Wenig must not be all that familiar with /.'s 'editorial' habits :\

  15. Re:Symantec Web? AHHHHHHH!!! by bane2571 · · Score: 4, Funny

    I read it like this:
    Semantic web getting real [player]
    and immediately thought "it was bad enough when the original web got it"

  16. Re:Symantec Web? AHHHHHHH!!! by gotzero · · Score: 3, Funny

    "Please note this environment may not be completely safe, so we are going to prevent you from entering. We have also initiated so many system processes that it will simulate a virus on this system."

    The links in that article are neat. I am looking forward to watching the maturity of this!

  17. Oops... by Aaron5367 · · Score: 1

    The first time I read the title, I thought it said 'Symantec Getting Real'. Well, I was planning to leave a smart comment about how Symantec and Real don't belong in the same sentence.

  18. In case you have no clue what they're talking abou by WK2 · · Score: 4, Informative

    If you are like me, and have absolutely positively no dang fucking clue what the summary is talking about: http://en.wikipedia.org/wiki/Semantic_Web

    According to the Wikipedia history, this concept has been around since at least 2001.

    --
    Write your own Choose Your Own Adventure. http://www.freegameengines.org/gamebook-engine/
  19. Not the Semantic Web by timeOday · · Score: 5, Insightful

    IMHO this is not the semantic web. The primary representation is still (just) natural language. Anything in addition to that is really just search engine technology under a different banner. Is that a bad thing? No! I've always said the semantic web was bound to fail because people don't want to spend a lot of extra effort tagging their information so others can slice and dice it; instead, the evolution of natural language processing in search (rather than manual tagging) will solve the problem. Maybe the Reuters idea of exposing the "inferred" metadata will be useful (as opposed to normal searches like google who simply keep this metadata in their own indices), though as yet I don't see why.

  20. Why can't AI get the semantics from the plain text by presidenteloco · · Score: 2, Insightful

    When you start aggregating as much text as google does, the semantics just starts popping out, in the form of word relationship statistics.
    The massive corpus size, when measured carefully, acts to filter semantic signal from expressive difference "noise".

    Combine that kind of latent semantic analysis of global human text with conceptual knowledge representation and inference
    technologies (which would use a combination of higher-order logic, bayesian probability, etc) and it should be possible to
    create a software program that could start to get a basic semantic understanding of documents and document relationships
    in the ordinary "dumb" web.

    Could the proponents of the semantic web please tell me what it will add to this?

    My basic proposition is that if an averagely intelligent human can infer the semantic essence (the gist, shall we say), of
    individual documents, and relationships between documents on the web, why can't we build AI software that does
    the same thing, and then reports its results out to people who ask.
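
    As a toy illustration of "word relationship statistics", here is a minimal Python sketch that scores word pairs by pointwise mutual information (PMI) over document-level co-occurrence. This is a deliberately simple stand-in chosen for brevity, not latent semantic analysis proper (which factorizes a term-document matrix), and the function name and whitespace tokenization are assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_pmi(docs):
    """Map each co-occurring word pair to its PMI across the documents.

    PMI(a, b) = log( P(a and b in same doc) / (P(a) * P(b)) ).
    Positive values mean the words appear together more often than
    chance would predict, which is the "signal" a large corpus exposes.
    """
    n = len(docs)
    word_df = Counter()   # number of documents containing each word
    pair_df = Counter()   # number of documents containing each word pair
    for doc in docs:
        words = set(doc.lower().split())
        word_df.update(words)
        pair_df.update(frozenset(p) for p in combinations(sorted(words), 2))

    pmi = {}
    for pair, count in pair_df.items():
        a, b = tuple(pair)
        p_ab = count / n
        pmi[pair] = math.log(p_ab / ((word_df[a] / n) * (word_df[b] / n)))
    return pmi
```

    On a handful of documents the numbers mean little; the argument in the post is that with a corpus the size of a web index, scores like these start to separate real associations from noise.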

    --

    Where are we going and why are we in a handbasket?
  21. Re:In case you have no clue what they're talking a by InsurgentGeek · · Score: 1

    Ummh, I think that's the point. The concept - first advocated by Tim Berners-Lee - has been around for a long time. The technology to make it real has not. This is a big step in that direction. It's not the whole answer - but services like this will help overcome one of the key constraining factors: ubiquitous metadata tagging of content.

  22. Re:Why can't AI get the semantics from the plain t by Anonymous Coward · · Score: 1, Insightful

    [...] if an averagely intelligent human can [do X], why can't we build AI software that does the same thing [...]
    Because wetware is still ahead of machines in a few domains. Be thankful for that because when we can build AI software for everything, we won't be needed anymore.
  23. OpenCalais by lenzg · · Score: 3, Funny

    Finally, Reuters released OpenCalais as free open-source software. OpenDover will appear any time soon. (someone may then connect both using a Channel, SSH perhaps)

    1. Re:OpenCalais by Zoxed · · Score: 1

      > (someone may then connect both using a Channel, SSH perhaps)

      Trains-on-rails, tunneled, would be the most secure: less chance of someone seeing your bytes ferried across, and a man-in-the-middle attack would be much more difficult !!

  24. Actually... by Anonymous Coward · · Score: 0

    Instead of that, I misread Calais as Cialis, which wasn't helped by the first post being about spam...

  25. Really real this time? by jfengel · · Score: 0, Flamebait

    The best indicator of vaporware seems to be continual postings on Slashdot that something is real.

    Given that the Semantic Web is neither Semantic nor Web, I think we've got another data point for that theory.

    1. Re:Really real this time? by msuarezalvarez · · Score: 1

      Dude, you forgot the ending `Discuss'.

      Kids...

  26. Re:Symantec Web? AHHHHHHH!!! by webmaster404 · · Score: 1

    Nope, I did too, and I was wondering... does this mean that Norton won't crash and slow down Windows computers more than most spyware/viruses?

    --
    There is no "disagree" moderation, and troll, flamebait and overrated are not valid substitutes
  27. Re:Why can't AI get the semantics from the plain t by msuarezalvarez · · Score: 2, Informative

    Could the proponents of the semantic web please tell me what it will add to this?

    Actually, the story is about a tool which does (a part of) what you are describing.

  28. Vaporware? by TheBrutalTruth · · Score: 0, Flamebait
    Uhh, maybe we need to get rid of the tags, Slashdot. Or get rid of the ignorant assholes who tag erroneously. Or those who tag things that exist (RTFA!), as vaporware, intentionally merely because they don't like/agree with the idea.

    Probably easier to get rid of tags...


    --
    Enlightenment is a pipe dream. So where's the pipe?
    1. Re:Vaporware? by smurgy · · Score: 2, Insightful

      I noticed that too... I was looking at the tags to provide an example of what machine-created tagging has to go up against to beat human tagging for a rant up above. I guess I have to thank that idiot for proving my point. Humans do hostile tags, they haven't yet written a subroutine to make a machine act like a jerk.

  29. hype, waste of time, or big mess by globaljustin · · Score: 3, Interesting
    the wiki article you linked to says:

    For example, a computer might be instructed to list the prices of flat screen HDTVs larger than 40 inches (1,000 mm) with 1080p resolution at shops in the nearest town that are open until 8pm on Tuesday evenings. Today, this task requires search engines that are individually tailored to every website being searched. The semantic web provides a common standard (RDF) for websites to publish the relevant information in a more readily machine-processable and integratable form
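
    Once every shop publishes the same fields, that query boils down to a filter. A minimal Python sketch, assuming a hypothetical Offer record that stands in for whatever a shop's RDF would be parsed into; the field names and defaults are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    """Hypothetical record parsed from one shop's machine-readable listing."""
    shop: str
    town: str
    diagonal_inches: float
    resolution: str
    price: float
    # weekday name -> closing hour on a 24h clock
    closing_hour: dict = field(default_factory=dict)


def find_tvs(offers, town, min_inches=40, resolution="1080p",
             weekday="Tuesday", open_until=20):
    """Offers in `town`, larger than `min_inches`, with the given resolution,
    from shops still open at `open_until` on `weekday`; cheapest first."""
    hits = [
        o for o in offers
        if o.town == town
        and o.diagonal_inches > min_inches
        and o.resolution == resolution
        and o.closing_hour.get(weekday, 0) >= open_until
    ]
    return sorted(hits, key=lambda o: o.price)
```

    The filter itself is trivial. What the quoted passage really asks for is that every retailer agree on, publish and keep current the data behind those fields.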

    On first read, I like what they are trying to do, but I see so many problems with what they are thinking, and I am not a web designer in any sense.

    First, I don't have a problem finding things to buy on the internet. The problem is, signal to noise ratio. There are TOO MANY google results for something like 'plasma tv.' No matter what kind of RDF is used, it will be abused by people who want their URL to show up in your search for whatever reason. I think someone touched on this earlier a little in this thread, but it deserves repeating.

    Second, can you imagine a scenario where, say, best buy or fry's uses some 'semantic web' application to do real time web searchable updates of their inventory? That's what would have to happen for this to work, and do something that isn't already possible.

    Right now, I can search for 'plasma tv' in google or ebay. Then I can call my local retailers to see if they carry that item, and have it in stock. In order for this system to make any kind of tangible change in the example given, retail chains would have to update their inventories online, whenever a purchase is made, or new items delivered to the store.

    It's an interesting idea. I wonder if the retailers would go for it? All it means for them is fewer people coming into their stores...sounds like that would hurt sales.

    I also hate internet hype. It really fouls things up, more than some want to acknowledge. I try to keep my 64 year old dad educated enough to buy coffee beans on ebay, check email, look at news, etc. Every time he sees 'semantic web' or 'web 2.0' in the media, it just confuses him, and I imagine, people like him who just use the net for basics like online bill pay, ebay, etc. He doesn't need a new buzzword to be motivated to shop online or whatever.

    he has the motivation already...silly contrived 'new media' buzzwords just waste time and confuse people
    --
    Thank you Dave Raggett
    1. Re:hype, waste of time, or big mess by mdwh2 · · Score: 1

      It's an interesting idea. I wonder if the retailers would go for it? All it means for them is fewer people coming into their stores...sounds like that would hurt sales.

      You might as well say the same thing about the Web though - why would all these companies go to the trouble of having websites, especially if it means fewer people in their stores?

      Because it means more sales. And sales with fewer people in the stores is a good thing - less costs.

  30. Confusing terms. by v(*_*)vvvv · · Score: 1

    The semantic web refers to a specific attempt/vision put forth by w3c.

    http://www.w3.org/2001/sw/

    This article is about a news organization using semantic tools to help extract and manipulate certain data. Sure, they are related a little maybe, but if related meant equal, then every computer would break.

    Just because the word "semantic" matches, they've confused the two domains, and if humans can't even do it, I wonder what our automated semantic web would look like with robots trying to make connections. I cannot even begin to imagine how hackable that would be.

    1. Re:Confusing terms. by Joosy · · Score: 1

      Regardless of whether or not this is the "real" semantic web, the concept will never fly until they rename it. Most people don't grasp what semantic is supposed to mean in this context, but if they called it something like the data web then the lightbulb would click on for a lot more people.

      --
      I'm sick and tired of these hip, "ironic" sigs. This is an actual, honest-to-goodness no-nonsense sig!
  31. In all seriousness... by v(*_*)vvvv · · Score: 1
    It is because our best AI is still extremely stupid compared to even a dumb dog.

    In response to:

    My basic proposition is that if an averagely intelligent human can ... , why can't we build AI software that does
    the same thing, ...
    1. Re:In all seriousness... by Gazzonyx · · Score: 1

      It's not a function of stupidity, it's a function of the limited fanout factor of a computer. The brain has a fanout factor of 10,000 whereas a computer has a fanout factor of 10, IIRC. Our mind can grasp details, isolate them and compare them to other 'things' (experiences, objects, people, sights, sounds, etc.) without explicit instructions to do so, whereas a computer cannot (and I highly doubt ever will) do this. This is as I understand it from talking to someone, somewhere down the line - please correct me if I'm off base.

      --

      If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.

  32. Re:Why can't AI get the semantics from the plain t by The+Master+Control+P · · Score: 2, Insightful

    Why should I be thankful about spending my adult life working because machines aren't up to the task? I'll be thankful when machines take the work and leave us free to do what we want.

  33. So does that mean... by Anonymous Coward · · Score: 0

    we're going to see a lot of semantic goosestepping and .sig <h1><i>l!</i></h1>s?

  34. Not the Social Web by Anonymous Coward · · Score: 1, Insightful

    "No! I've always said the semantic web was bound to fail because people don't want to spend a lot of extra effort tagging their information so others can slice and dice it"

    And yet we have social sites.

  35. you're not the only one who misread by Laebshade · · Score: 1

    I misread it as "Symantec Web Getting Real" and I was like, "wtf? The maker of Norton's website is buying Real?"

  36. Re:Why can't AI get the semantics from the plain t by Anonymous Coward · · Score: 1, Insightful

    I really don't see that happening. The transition to this sort of economy is basically where the problem is now. As human labor is replaced by robotic arms in factories, those employees are left to find another job. Only, their entire skill set has now been replaced, so they are back to square one... They don't receive pay for the rest of their lives just because their job was replaced with a machine that does it better.

  37. Oblig. Matrix by SeaFox · · Score: 1

    Semantic Web Getting Real

    "If real is what you can feel, smell, taste and see, then 'real' is simply electrical signals interpreted by your brain."
  38. "Free" for "anyone"? Not so fast. by janbjurstrom · · Score: 2, Informative

    Reuters just opened access to their corporate semantic technology crown jewels. For free. For anyone. Their Calais API lets you turn unstructured text into a formal RDF graph in about one second. ...
    It's "free" for "anyone" for loose definitions of the terms. Glancing at their terms of use (emphasis added):

    You understand that Reuters will retain a copy of the metadata submitted by you or that generated by the Calais service. By submitting or generating metadata through the Calais service, you grant Reuters a non-exclusive perpetual, sublicensable, royalty-free license to that metadata. From a privacy standpoint, Reuters use of this metadata is governed by the terms of the Reuters and Calais Privacy Statements.
    So you pay with your metadata. One can say you're doing that with Google too. Nevertheless, that's not entirely free.

    Also, it's not yet for "anyone." According to the Calais roadmap, only English documents are accepted: "Calais R3 [July 2008] begins ... to incorporate a number of additional languages... Japanese, Spanish and French with additional languages coming in the future."
    --
    668.5
  39. A Little too Cynical by Gregory+Arenius · · Score: 4, Insightful

    I understand being jaded about internet hype and buzzwords but I'm still surprised that after nearly eighty comments there doesn't seem to be anyone who has anything to say other than "vaporware" and "it won't work because of the spammers." Yes, maybe it has been overhyped and yes it is taking a while for the envisioned ideas to come to fruition but that doesn't mean that those ideas aren't worthwhile.

    I'll use the following example because I recently had to do this with non semantic tools. Let's say you wanted to see how good or bad a job a transit agency is doing in its city in comparison to other similar cities. A couple of metrics you might use to find similar cities would be population size, population density and land area. Google doesn't do a good job with something like that. You end up needing to search for cities individually and then finding their data points. Or you can find a list of cities ranked by population or population density. If you search on Google for something like that you end up at one of the Wikipedia lists. These lists are helpful but....still lacking. They don't contain all the cities you need or they don't provide a way to look at multiple data sets at the same time. The lists are also compiled by hand and aren't automatically updated when the information on the city page is changed. The data is in wikipedia though. Every city page lists that information in a little box near the start of the article. But how do I take this data that is in Wikipedia from the form that it's in into a form that I can use to find what I need to know? Enter the semantic web.

    Let's say that wikipedia, or at least the parts dealing with geography, were semantic. Now, there are tens of thousands of pages describing countries, regions, states, counties, parishes, cities, towns and villages. Then those pages are translated into many other languages. Some of the data that these pages contain is of the same type. They all contain the name of the locality, latitude, longitude, size, population size and elevation. For data such as this it would be pretty easy to have a form to enter the data into as opposed to using the usual markup and the form could put the data into the proper markup for the page and the proper RDF. Once the data is in proper RDF form it would be easy to automate the process of updating translations of that page with the new data as well as updating any pertinent lists. It would also make it easier for people who want to analyze or use the data because they would be able to access it much more easily.
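
    A minimal Python sketch of that form-to-data step, using plain (subject, predicate, object) tuples as a stand-in for real RDF. The example.org URIs, the property names and the helper functions are all hypothetical; an actual setup would use an agreed vocabulary and an RDF library.

```python
BASE = "http://example.org"  # hypothetical namespace, for illustration only


def city_triples(name, fields):
    """Turn one filled-in city form into (subject, predicate, object) triples."""
    subject = f"<{BASE}/city/{name.replace(' ', '_')}>"
    return [(subject, f"<{BASE}/prop/{key}>", value)
            for key, value in fields.items()]


def ranked_by(triples, prop):
    """Regenerate a 'cities ranked by <prop>' list straight from the triples."""
    predicate = f"<{BASE}/prop/{prop}>"
    rows = [(s, o) for s, p, o in triples if p == predicate]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def density(triples):
    """Derive population density per city from population and area_km2."""
    facts = {}
    for s, p, o in triples:
        facts.setdefault(s, {})[p.rsplit("/", 1)[-1].rstrip(">")] = o
    return {s: f["population"] / f["area_km2"]
            for s, f in facts.items()
            if "population" in f and "area_km2" in f}
```

    Because the ranked list and the density figures are computed from the same triples the city pages carry, they never go stale when a page is edited, which is the point being made about the hand-compiled lists.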

    But nobody really wants machine readable access to this information, you might say, except for the random geek and researcher. I would disagree. Let's say you're using a program like Marble which is similar to Google Earth in some ways but is completely open source. If they wanted to display the population of a city when you hover over it they would currently have to create and maintain their own dataset or they'd have to write a parser to extract it from wikipedia. Neither of those options is particularly easy at the moment but if the information was in semantic form on wikipedia it would be a piece of cake.

    The strength of the semantic web isn't, in my opinion, going to be AI-like personal agents or anything like that. It'll be things that in many ways are already here. Like Yelp putting geotags on the restaurants they review and apps like Google Earth taking that data that's available in machine readable (Semantic!) form to overlay that data on a map so that you can see what's nearby. It'll be applications doing the same with the geotags from flickr. It's really useful mashups like http://www.housingmaps.com/. It's the transit agency putting realtime bus data up in semantic form so you can see on your iPhone's Google map how far away the bus is. So yeah, maybe the semantic web is overhyped but that doesn't mean there isn't a lot of substance there, too.

    Cheers,
    Greg

    1. Re:A Little too Cynical by CSLarsen · · Score: 1

      Just a simple thing like right-clicking on a dollar amount in your browser and choosing "Convert to local currency" would be a huge improvement over what's already available. Or being able to have your browser dynamically recognize dates and format them from American to European format, client-side.
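
      The date half of this could be sketched roughly as follows, in Python rather than browser script for brevity. It assumes the page has already marked the text as a date in US month/day/year order, which is exactly the piece of information plain text lacks; the function name is made up.

```python
from datetime import datetime


def us_to_european(date_text):
    """Convert 'MM/DD/YYYY' to 'DD.MM.YYYY'; raises ValueError on bad input."""
    parsed = datetime.strptime(date_text, "%m/%d/%Y")
    return parsed.strftime("%d.%m.%Y")


print(us_to_european("02/10/2008"))
```

      Without the markup, a browser can't tell whether 02/10/2008 means February 10 or October 2, so the conversion itself is the easy part.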

      --
      Claiming to be pedantic on Slashdot is asking for trouble
  40. Because "AI" is a misnomer by melted · · Score: 2, Informative

    There's no more "intelligence" in AI than in a can of Campbell soup. It's basically statistics, linear algebra and (sometimes) handcoded rules for reasoning. It doesn't evolve. It doesn't build upon what it "knows". It has no self-awareness or consciousness and its reasoning capabilities, if present, are extremely weak compared to even children.

    We're so early in the development of this field that no one can even define what "self awareness" or "consciousness" really is, let alone how to create it or scale it. Folks try. There's Cycorp, there's Powerset, there are a lot of people in academia who work on NLP, Machine Vision, classification, neuroscience, etc. There is, however, no unifying vision or theory/understanding of what it is we're trying to build, and the current methods have nothing in common with "intelligence" per se. They do learn, in the sense that they figure out the hidden structure of a given set of data by approximating it using a mathematical model. Even though this model sometimes closely matches what a human brain does (e.g. in multilayer neural nets), they don't come anywhere close to what one would call "intelligence". What they lack is scale (and speed), and advanced cognitive mechanisms required to become self-learning.

    It's also interesting to note that at this point humans know on a high level how their brain works. The neocortex is a six layer neural net with links going cross-layer and neurons organized into columns. Trouble is, there are a hundred billion neurons. We sorta know how vision works, too. Trouble is, we can't work with it in real time (because, naturally, you'd need a chunk of those hundred billion neurons). Heck, even human language is a pain in the ass if you don't have advanced cognition (AKA strong AI), with ability to understand euphemisms, sarcasm and idioms, paraphrase, generalize and specialize. Heck, even anaphora resolution is not solved yet (i.e. what does he/she/it in the current sentence refer to in the previous text). It's as if you had a bunch of parts and no manual and someone asked you to assemble a spaceship out of what you have, warning you that some parts are broken and may require you to make your own replacements. Without blueprints. Blindfolded. With your hands tied behind your back.

    I do believe that in 50 years we will have strong AI, though. I work in a science lab, however, and many researchers don't share my optimism.

  41. Vapourware my arse by theno23 · · Score: 4, Insightful

    The company I work for, Garlik, has two products that are run off semantic web technology: DataPatrol (for pay) and QDOS (free, in beta).

    We use RDF stores instead of databases in some places as they are very good at representing graph structures, which are a real pain to deal with in SQL. You often hear the "what can RDF do that SQL can't" type arguments, which are all just nonsense. What can SQL do that a field database, or a bunch of flat files can't? It's all about what you can do easily enough that you will be bothered to do it.

    A fully normalised SQL database has many of the attributes of an RDF store, but
    a) when was the last time you saw one in production use?
    b) how much of a pain was it to write big queries with outer joins?

    RDF + SPARQL makes that kind of thing trivial, and has other fringe side benefits (better standardisation, data portability) that you don't get with SQL.
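
    For anyone who hasn't seen it, the outer-join case looks roughly like this. The following is a toy Python sketch over an in-memory list of triples, not how a real RDF store (ours or anyone's) is implemented; the data and the `ex:` names are invented, and the SPARQL in the comment is the kind of query it imitates.

```python
# Toy triple store: a list of (subject, predicate, object) tuples.
# The data and the "ex:" names are invented for illustration.
TRIPLES = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:mbox", "alice@example.org"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:alice", "foaf:knows", "ex:bob"),
]

# Roughly the SPARQL this mimics:
#   SELECT ?name ?mbox WHERE {
#     ?p foaf:name ?name .
#     OPTIONAL { ?p foaf:mbox ?mbox }
#   }


def objects(subject, predicate):
    """All objects for a given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def names_with_optional_mbox():
    """Every name, paired with a mailbox if one exists, else None."""
    rows = []
    for s, p, name in TRIPLES:
        if p != "foaf:name":
            continue
        # OPTIONAL: keep the row even when there is no mailbox.
        for mbox in objects(s, "foaf:mbox") or [None]:
            rows.append((name, mbox))
    return rows


print(names_with_optional_mbox())
```

    Bob has no mailbox but still shows up, with None in that column, which is what the OPTIONAL clause gives you without spelling out a LEFT OUTER JOIN per property.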

    I guess it shouldn't be a surprise to see the comments consisting of the usual round of more-or-less irrelevant jokes and snide commentary - this is Slashdot after all - but I can't help responding.

    1. Re:Vapourware my arse by Aram+Fingal · · Score: 1

      We use RDF stores instead of databases in some places as they are very good at representing graph structures, which are a real pain to deal with in SQL. You often hear the "what can RDF do that SQL can't" type arguments, which are all just nonsense. What can SQL do that a field database, or a bunch of flat files can't? It's all about what you can do easily enough that you will be bothered to do it.
      Without knowing the details of your circumstances, it sounds like, maybe, the real point is that what you want is an object oriented database rather than relational one. RDF allows for much more of an object oriented design than a traditional RDBMS does.

      A fully normalised SQL database has many of the attributes of an RDF store, but a) when was the last time you saw one in production use?
      I've seen lots of poorly normalized databases and even situations, with a database of my own design, where I realized later that I should have done things differently. Still, there always seems to be a way to work around the shortcomings. My question is: Is it really easier to follow the best practices of the Semantic Web than the best practices of relational database design?

      b) how much of a pain was it to write big queries with outer joins?
      My experience has been that you only need to solve a few of those big query problems once and use the solutions to create views. Then you can have a very well normalized database but still be able to query it easily.
    2. Re:Vapourware my arse by Scarblac · · Score: 1

      I work for a research lab in the Netherlands; we've also finished quite a few projects using Semantic Web technology. Our use case is large heterogenous data sets in agrotech, like representing all knowledge on growing tomatoes and tomato quality in the Dutch agro sector.

      Finally a comment that compares Semantic Web technology to RDBMS technology. It's very unfortunate that it has "Web" in the name. Makes the clueless think it's supposed to be a try for WWW 3.0, or something...

      --
      I believe posters are recognized by their sig. So I made one.
    3. Re:Vapourware my arse by Wastl · · Score: 1

      For the record; I am a researcher working in the Semantic Web area, and I am primary developer of the system IkeWiki and the reasoning language Xcerpt. Since this discussion seems to pop up again and again on Slashdot, I didn't want to add comments to the same issues (trust, search) again. But your comment might add something new to the discussion:

      Without knowing the details of your circumstances, it sounds like, maybe, the real point is that what you want is an object oriented database rather than relational one. RDF allows for much more of an object oriented design than a traditional RDBMS does.

      In principle, you are right. But there is an important difference between RDF and Object Oriented Databases: while OO DBMS require that the data always conforms to a strict, pre-defined schema, RDF data is semi-structured and can be very flexibly extended. To give an example: in an OO DBMS, it is a problem if a person is defined only by first name and last name, and someone else wants to add a "friend" relationship to this person that is not foreseen in the schema. With RDF, this is not an issue: programs and repositories that were designed just for first name/last name will equally well work in the presence of a "friend" property. In a Web environment, chaotic as it is, this is a crucial property.
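
      A small Python sketch of that last point, using plain tuples as stand-in triples. The property names (`ex:firstName`, `ex:lastName`, `ex:friend`) and the people are hypothetical; the function is meant to play the role of a program written before anyone thought of a "friend" relationship.

```python
def full_names(triples):
    """Written knowing only about first and last names."""
    people = {}
    for s, p, o in triples:
        if p == "ex:firstName":
            people.setdefault(s, {})["first"] = o
        elif p == "ex:lastName":
            people.setdefault(s, {})["last"] = o
        # Any other property is simply never matched, so it cannot break us.
    return sorted(
        f"{v.get('first', '')} {v.get('last', '')}".strip()
        for v in people.values()
    )


# Invented example data.
original = [
    ("ex:p1", "ex:firstName", "Ada"),
    ("ex:p1", "ex:lastName", "Example"),
    ("ex:p2", "ex:firstName", "Ben"),
    ("ex:p2", "ex:lastName", "Sample"),
]

# Someone else later adds a relationship the first program never anticipated.
extended = original + [("ex:p1", "ex:friend", "ex:p2")]

print(full_names(original) == full_names(extended))
```

      The old program gives the same answer on the extended data, with no schema migration in between.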

      Greetings, Sebastian

    4. Re:Vapourware my arse by tpz · · Score: 1

      It is just as unfortunate that it has "Semantic" in the name. I spent a good few years working in the RDF space (we were frankly too far ahead of our time to be commercially viable, as the potential clients could see the value but the VC's didn't "get it") and if there is one thing that I have seen hurting semantic web proponents again and again, it is the damn name "semantic web", which is so ridiculously overloaded as to be outright dangerous to anyone trying to talk about it.

      As a random aside, the other thing that needs to stop is people showing anything even remotely resembling a visualization of the raw graph, as it completely throws readers off and sends them into tangents about "these graphics will be useless" and "clicking on tags will suck", etc.

  42. Jane Jones has a developer key by dugeen · · Score: 1

    I clicked 'here' for a developer key and was told that it had been despatched to jane.jones@gmail.com. Good news for Jane Jones.

  43. natural language processing in search? by pbhj · · Score: 2, Interesting

    timeOday >>> "evolution of natural language processing in search (rather than manual tagging) will solve the problem"

    But then if you're creating an addon for joomla (or any template elements really) to display event listings why not add a semantic tag so that a search engine could limit the domain by "tag:events". The extra effort involved is pretty minimal, especially when, if you code well, each event is probably in a "<div class="event eventtype"> ..." anyway.
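
    As a rough idea of how little a consumer needs once that class is there, here is a Python sketch using only the standard library's `html.parser`. The page snippet and the `EventExtractor` name are made up, and a real search engine would obviously do far more than this.

```python
from html.parser import HTMLParser


class EventExtractor(HTMLParser):
    """Collect the text of every <div> whose class list includes 'event'."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._depth = 0  # div nesting depth inside the current event; 0 = outside
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1  # nested div inside an event
        elif "event" in classes:
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Join the pieces and normalize whitespace.
                self.events.append(" ".join("".join(self._chunks).split()))

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


# Invented page fragment for illustration.
PAGE = """
<div class="sidebar">Not an event</div>
<div class="event concert">Jazz night, <span>Friday</span> 8pm</div>
<div class="event talk">RDF for beginners</div>
"""

parser = EventExtractor()
parser.feed(PAGE)
print(parser.events)
```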

    Once people realise that search engines can do semantic filtering then it will be worth it.

    As for tag-spamming, well, surely Google et al. won't accept based on tag first but will do their usual contextual/quantitative analyses first and then limit based on tags. So we wouldn't be gaining any spam over what we have now?

  44. Re:Why can't AI get the semantics from the plain t by semanticsearch · · Score: 1

    Actually, NLP software does generally use those statistical methods. RDF is a storage and sharing mechanism - that's the big deal.

  45. Fallacy: Designing for Old People by EgoWumpus · · Score: 2, Insightful

    I try to keep my 64 year old dad educated enough to buy coffee beans on ebay, check email, look at news, etc. Every time he sees 'semantic web' or 'web 2.0' in the media, it just confuses him, and I imagine, people like him who just use the net for basics like online bill pay, ebay, etc.

    I'm afraid whenever I see this argument I immediately tend to discredit all the rest that I've read in that post. Designing technology for those who are least able to uptake it is a losing proposition at best; at worst a total disaster. Technology has always been utilized by those less set in their ways first, less invested in the capital and experience of doing it the 'old way', and is only more broadly adopted once it proves out as a better way to do things. Universal acceptance tends to only come after a generation; when those who are poorly situated to utilize it have passed on.

    This speaks to your other concern rather tellingly. Fry's may not put their inventory online. But if Best Buy does, and reaps more rewards, then you can bet eventually all companies will do this as standard practice. Far more likely - a company that is smaller and more mobile will do it first, and then get bought out by a larger company that will adopt its practices in order to stay potent in a changing marketplace.

    But the successful online inventory app is not going to design for Best Buy first. They're going to design for Mom and Pop shop, and scale up to whatever customer they can find. When it proves out or doesn't there will be tangible evidence for others to act on - rather than meaningless hype.

    Finally, I think the thing that the semantic web provides is more of the ability of the end user to control results. As we perfect our ability to parse machine language, we perfect our ability to hear clear signal amongst all the noise. I look forward to the day when we have this technology in more than a nascent stage, and think it's silly to dismiss it before then.

    Also, I look forward to the day when people stop designing for me. Because presumably I'll be happy with what I have!

    --

    [Ego]out

  46. Semweb @ NASA by mhermans · · Score: 1

    NASA's another big name who recently started using semweb-technology "for real": "Last week POPS--the expertise location service we built for NASA--went into production as an Agency-wide application; it's thought to be the first "institutional" (that is, business) Semantic Web app deployed Agency-wide at NASA." http://clarkparsia.com/weblog/2008/02/07/our-babys-all-grows-up/

  47. Slashdot me, please! by Eric+Coleman · · Score: 1
    In the registration process for getting an API key there is the following question and choices:

    How many people do you anticipate will use your application?
    1-10 (Just me and mine.)
    10-100 (Intranet, protected access.)
    100-1,000 (Slashdot me, please!)
    1,000-10,000+ (Everyone, I hope.)

    I'm sure there is some sort of semantic joke in there somewhere but I can't find it.
  48. I actually tried Calais. Here are my results. by glebleu · · Score: 1

    I didn't try 5000 documents, but only two, one general text and one financial news item. The results I got were promising, but at the same time IMO not reliable and actionable enough that I would use this technology today to buy/sell stocks automatically. - http://lebleu.org/blog/2008/02/10/kicking-the-tires-with-opencalais/ - http://lebleu.org/blog/2008/02/03/microsoft-offer-to-buy-yahoo-semantic-analysis-by-opencalais/

    1. Re:I actually tried Calais. Here are my results. by K-Man · · Score: 1

      That's about what I expected. I've worked with a similar entity-extraction tool, and it's largely a process of twiddling the rules until it gets most of what you think is in there. The regexes, etc., used to find the entities are so large that the engine can easily bog down, and it's a complete pain to have minor changes balloon into 2x or worse performance hits.

      One shortcoming is the lack of interactivity on large datasets. Most web searchers iterate through a few queries until they get what they want, but in this case the iterations could take days.

      --
      ---- "If we have to go on with these damned quantum jumps, then I'm sorry that I ever got involved" - Erwin Schrodinger
  49. Re:In case you have no clue what they're talking a by phpWebber · · Score: 1

    And back then, we talked about trying to implement it. Then I read this:

    http://www.well.com/~doctorow/metacrap.htm

    Put a wet blanket on the whole idea (thankfully).

  50. Semantic MediaWiki already exists by lennier · · Score: 1

    http://meta.wikimedia.org/wiki/Semantic_MediaWiki

    It's basically just a matter of tweaking it and putting some real data in.

    --
    You are not a brain: http://books.google.com/books?id=2oV61CeDx-YC
  51. Fallacy: Not reading comment by globaljustin · · Score: 1

    Designing technology for those who are least able to uptake it


    I never mentioned Design! You didn't read my post very well, did you? I said that the HYPE of buzzwords like 'semantic web' or 'web 2.0' is lame, unnecessarily confusing, and annoying. The word hype was the first word in the subject of my post!

    Here, I've copied the paragraph from my post that you read incorrectly, emphasis mine

    I also hate internet hype. It really fouls things up, more than some want to acknowledge. I try to keep my 64 year old dad educated enough to buy coffee beans on ebay, check email, look at news, etc. Every time he sees 'semantic web' or 'web 2.0' in the media, it just confuses him, and I imagine, people like him who just use the net for basics like online bill pay, ebay, etc. He doesn't need a new buzzword to be motivated to shop online or whatever.


    You are a troll...either that, or you are not the sharpest knife in the drawer.
    --
    Thank you Dave Raggett
    1. Re:Fallacy: Not reading comment by EgoWumpus · · Score: 1

      I think that I spoke to the point you made, if not to the point you think you made. This 'discussion' - which I will leave vaguely defined, as it is - be it the design, the hype, or whatever, is taking place between people who are actively seeking this technology. It has really little to do with those people who due to some circumstance (of which age may be one) could care less until it's matured.

      It is therefore a fallacy to use your father as an example of why things are 'too confusing'. There IS a lot of confusion over what the semantic web is; people are straining for something that is poorly defined at present. But it is foolish to bring those people who aren't even remotely interested into it, and expect that to somehow have a bearing on the subject at hand.

      --

      [Ego]out

    2. Re:Fallacy: Not reading comment by globaljustin · · Score: 1

      you are definitely a troll...this 'discussion' is over

      --
      Thank you Dave Raggett
  52. Re:In case you have no clue what they're talking a by indig0 · · Score: 1

    What does the Semantic Web offer, why do we care?

    I'm going to try to add some personal perspective in addition to the worthy Wikipedia article linked in the parent, because I see a lot of criticism in these threads and not for the traditionally criticism-worthy issues. In case you're wondering, I was involved in a non-trivial Semantic Web related project in 2005: a learning experience, I won't mention it further. That said, I could be _totally_ wrong; but this is how I see things.

    The first question: Why do we want the Semantic Web? Sure, it sounds fancy, but why should _you_ (the average Slashdot reader) be excited about it? Well, let me explain why _I'm_ excited and maybe you'll agree... I tend to move data around, through systems and people, changing formats as necessary, making logical decisions based on the data as appropriate. Often there's a convenient library or tool or API for assisting me in doing this, making my life easier by abstracting the process of getting at that data and mashing it around into something more immediately useful. From my perspective, the Semantic Web will give me that power at a new level of convenience. Semantic markup, formats, ontologies, etc, allow data-centric code to be written more quickly and with less reinventing of wheels. Ever written a screen scraper? A perl script to pull data out of a proprietary log format? The Semantic Web will not be a panacea for these kinds of problems, but if we can convince people to mark up more data in reasonably common/standard ways then hopefully things like software mashups should become easier than ever! When software is better able to understand what data _is_, without huge amounts of domain-specific programmer effort, making decisions based on that data should be easier. Take a look at Firefox, microformats, and SPARQL, for example. Do users care about the Semantic Web? I don't think so, because all they should see is basically the same old browser-rendered Web. However, our ability (as software developers and general geeks) to produce useful tools and websites using Semantic Web data may result in even better websites and dynamic services.

    Second question: What does the Semantic Web look like? Not like Gravity, in my opinion. Dynamic graphs are handy visualization tools for some kinds of data, but definitely not all! In fact, they're pretty brittle and they don't scale at all. There are a lot of interesting proposed solutions to the visualization problem (see SIMILE and MIT's Haystack), but I don't think it really matters. Within a specific domain, there will always be better visualization tools than a generalized visualization method (written by those familiar with the domain). So, the Semantic Web will look basically the same as the current web. In fact, if you start looking carefully, I think you'll see it all around you...

    Third question: Why is "open data" exciting and what's the difference between just opening a MySQL database to the public and the Semantic Web vision? Well, if a site is exposing its "database" in RDF using a common ontology, then you can make use of their data just as you'd use their services via an API. A data provider may not foresee all the potentially useful ways to use their data just as they may not foresee ways to make use of their API, but a clever programmer can take from their surroundings what is needed and make of it something more. If you think this is random, note that /. and k5 have been serving up RDF of their frontpages for years and that today we regularly use RSS feeds and some black magic to do similar things.

    As I said, I could be way off the mark here. This is just the simplified perspective I've adopted after thinking about it for a while and reading the common sources. Please don't take this as gospel or thorough, comments or corrections are very welcome.

  53. Re:In case you have no clue what they're talking a by WK2 · · Score: 1

    I read that article, and it convinced me of nothing. All it says is that meta-data is not perfect, and will not create a utopia. Duh.

    --
    Write your own Choose Your Own Adventure. http://www.freegameengines.org/gamebook-engine/
  54. Calm down! by EgoWumpus · · Score: 1

    I don't know what provoked your vitriol. I'm not a troll - but the moderators are welcome to disagree. Since they haven't yet, I'm currently disposed to thinking you're overreacting to my disagreement with your viewpoint. I'm as happy as you seem to be to let it lay, however.

    --

    [Ego]out

  55. how open is opencalais? by Anonymous Coward · · Score: 0