Slashdot Mirror


Nepomuk Brings Semantic Web To the Desktop, Instead

An anonymous reader writes "Technology Review has a story looking at Nepomuk — the semantic tool that is bundled with the latest version of KDE. It seems that some Semantic Web researchers believe the tool will prove a breakthrough for semantic technology. By encouraging people to add semantic meta-data to the information stored on their machines they hope it could succeed where other semantic tools have failed."

32 of 140 comments

  1. Um, no thanks by Anonymous Coward · · Score: 5, Funny

    I've tried Symantec products in the past, and they are worse than actually having a virus. They slow your PC to a crawl, get their claws into every part of your computer, and are extremely difficult to purge when you finally give up on them.

    1. Re:Um, no thanks by Wonko+the+Sane · · Score: 2, Insightful

      Whoosh

      Didn't you get the memo?
      That should read:

      "You may have Frontotemporal Dementia. Please see your physician."

    2. Re:Um, no thanks by pitchpipe · · Score: 3, Funny

      You sound like an anti-Semitic asshole to me. ;^)

      --
      Look where all this talking got us, baby.
  2. Care to explain? by Leafheart · · Score: 2, Interesting

    What exactly is semantic web, and why haven't I ever heard of it?

    --
    --- "When you gotta do something wrong. You gotta do it right. (Fighter)"
    1. Re:Care to explain? by orkybash · · Score: 2, Informative

      It describes the ability to add metadata to web content (tags, etc), and you haven't heard of it because web 2.0 is the more popular term. ;)

    2. Re:Care to explain? by mcgrew · · Score: 4, Informative

      I'm dubious

      I have yet to see "semantic web" fully explained, but Wikipedia is giving some good insight into it, especially into its nebulousness. It is supposed to make web (or in this case, desktop) documents machine-readable.

      TFA deals not with the Semantic Web, but rather the "semantic desktop". As it says, "Semantic Web researchers believe the tool will prove a breakthrough for semantic technology. By encouraging people to add semantic meta-data to the information stored on their machines they hope it could succeed where other semantic tools have failed".

      HTML had "semantic tools" built in - keywords.
      <meta name="description" content="Auto Mechanics">
      <meta name="keywords" content="auto, mechanics, wrench, sex, penis, tits, clit, boobs">

      You see how it was abused. Any more advanced semantic tools will be similarly abused.

      There are other problems, as the wikipedia article explains:

      Practical feasibility
      Critics question the basic feasibility of a complete or even partial fulfillment of the semantic web. Some develop their critique from the perspective of human behavior and personal preferences, which ostensibly diminish the likelihood of its fulfillment (see e.g., metacrap). Other commentators object that there are limitations that stem from the current state of software engineering itself (see e.g., Leaky abstraction).

      Where semantic web technologies have found a greater degree of practical adoption, it has tended to be among core specialized communities and organizations for intra-company projects.[12] The practical constraints toward adoption have appeared less challenging where domain and scope is more limited than that of the general public and the World-Wide Web.[12]

      An unrealized idea
      The original 2001 Scientific American article by Berners-Lee described an expected evolution of the existing Web to a Semantic Web.[13] Such an evolution has yet to occur. Indeed, a more recent article from Berners-Lee and colleagues stated that: "This simple idea, however, remains largely unrealized."[14]

      Censorship and privacy
      Enthusiasm about the semantic web could be tempered by concerns regarding censorship and privacy. For instance, text-analyzing techniques can now be easily bypassed by using other words, metaphors for instance, or by using images in place of words. An advanced implementation of the semantic web would make it much easier for governments to control the viewing and creation of online information, as this information would be much easier for an automated content-blocking machine to understand. In addition, the issue has also been raised that, with the use of FOAF files and geo location meta-data, there would be very little anonymity associated with the authorship of articles on things such as a personal blog.

      Doubling output formats
      Another criticism of the semantic web is that it would be much more time-consuming to create and publish content because there would need to be two formats for one piece of data: one for human viewing and one for machines. However, many web applications in development are addressing this issue by creating a machine-readable format upon the publishing of data or the request of a machine for such data. The development of microformats has been one reaction to this kind of criticism.

      Specifications such as eRDF and RDFa allow arbitrary RDF data to be embedded in HTML pages. The GRDDL (Gleaning Resource Descriptions from Dialects of Language) mechanism allows existing material (including microformats) to be automatically interpreted as RDF, so publishers only need to use a single format, such as HTML.

    3. Re:Care to explain? by Anonymous Coward · · Score: 2, Insightful

      "Semantics" is information about meaning (whereas syntax is information about form). Semantic tools try to provide meaning by describing relationships between information atoms. The goal is to create systems which can answer questions like "how old is the president's oldest child?" with just the age, instead of listing all documents which contain the words "old" "president" "oldest" and "child".
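
      A minimal sketch of that idea in Python, assuming the facts are already stored as (subject, relation, object) triples. Every name, age and relation below is invented for illustration; no real semantic tool or vocabulary is implied.

```python
# Hypothetical facts stored as (subject, relation, object) triples.
# All names and ages are invented for illustration.
TRIPLES = [
    ("country", "president", "Pat"),
    ("Pat", "child", "Alex"),
    ("Pat", "child", "Sam"),
    ("Alex", "age", 31),
    ("Sam", "age", 27),
]


def objects(subject, relation):
    """Return every object linked to `subject` by `relation`."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]


def age_of_presidents_oldest_child():
    """Follow president -> child -> age and return only the largest age."""
    president = objects("country", "president")[0]
    ages = [objects(child, "age")[0] for child in objects(president, "child")]
    return max(ages)


print(age_of_presidents_oldest_child())
```

      The answer comes from following relationships, not from matching the words "old", "president" and "child" in documents.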

    4. Re:Care to explain? by radtea · · Score: 4, Interesting

      The Semantic Web is a failed attempt to extend the WWW via "semantic markup", which allows users/editors/etc to tag content (text, images, data) using a standard format that can be read, processed and exchanged by machines which can then give users more useful pointers to stuff that they care about.

      The Semantic Web has failed for a bunch of reasons, with many people tending to blame the tools. However, those of us of a particular epistemological bent believe that it is doomed in principle as currently conceived because "meaning" is a verb, not an adjective.

      "These data mean X" is completely incoherent on this view of meaning, like saying "This smell of orange blossoms has Republican leanings." "Meaning" is simply not an attribute of data, any more than political tendencies are an attribute of scents.

      The Semantic Web fails to capture almost everything about the entities that do the meaning (people) but instead is based on the belief that meaning is a property of data. Data inspires meaning, but meaning is something that humans do, and the Semantic Web has no effective mechanism for capturing this, although with sufficient markup by many individuals on the same data it should be possible to do something similar to ROC evaluation of the ways people mean, which would greatly enhance the utility of the Semantic Web.

      A colleague who works in GIS pointed out a consequence of this phenomenon to me many years ago when he described an experiment involving a bunch of geologists mapping a particular terrain. At the end of the day, after integrating all their inputs, he could tell who mapped where, but not what anybody mapped.

      --
      Blasphemy is a human right. Blasphemophobia kills.
    5. Re:Care to explain? by thermian · · Score: 2, Insightful

      It describes the ability to add metadata to web content (tags, etc), and you haven't heard of it because web 2.0 is the more popular term. ;)

      Personally I think that metadata/tag based systems are the wrong road for semantic analysis of web pages. As soon as the semantics of a thing is decided by additional information added to describe that thing, it's open to abuse.

      The only advantage is it's faster than what should be done, which is using good old maths to extract the true 'meaning' of a document or object.

      It's not hard. Well, ok, it's a little hard. Oh ok, it's really rather difficult, but there are plenty of places you can get example code or libraries to make things easier.
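
      One very small example of the "maths" route is TF-IDF term weighting. That technique is an assumption picked here for illustration, not something named above, and the three toy documents are made up. It scores a word highly when it is frequent in one document but rare across the rest, with no author-supplied tags involved.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration.
DOCS = {
    "garage": "wrench engine repair engine oil",
    "bakery": "bread oven flour bread oil",
    "garden": "soil seeds water soil",
}


def tf_idf(docs):
    """Score each word per document: term frequency times log inverse document frequency."""
    tokenized = {name: text.split() for name, text in docs.items()}
    n_docs = len(tokenized)
    # Number of documents each word appears in.
    doc_freq = Counter(w for words in tokenized.values() for w in set(words))
    scores = {}
    for name, words in tokenized.items():
        counts = Counter(words)
        scores[name] = {
            w: (c / len(words)) * math.log(n_docs / doc_freq[w])
            for w, c in counts.items()
        }
    return scores


def top_term(scores, name):
    """Return the highest-scoring word for one document."""
    return max(scores[name], key=scores[name].get)


SCORES = tf_idf(DOCS)
print({name: top_term(SCORES, name) for name in DOCS})
```

      Real systems go well beyond this, but the point stands: the signal is computed from the content itself, so there is no tag field to stuff.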

      --
      A learning experience is one of those things that say, 'You know that thing you just did? Don't do that.' - D. Adams
    6. Re:Care to explain? by Dynedain · · Score: 4, Insightful

      I've got a better reason why it failed that doesn't require delving into first year philosophy.

      People are lazy. Look at any image database and figure out why it's difficult to find something. Because people don't want to spend 20 minutes filling in tags for a single image they just want to show off to their friends.

      Now expand that to every other form of data type, and it's easy to see why the semantic web never did, and never will take off without significant AI involvement.

      --
      I'm out of my mind right now, but feel free to leave a message.....
    7. Re:Care to explain? by giuntag · · Score: 2, Interesting

      Explained better than I could ever do: http://www.well.com/~doctorow/metacrap.htm

    8. Re:Care to explain? by anomalous+cohort · · Score: 2, Interesting

      I disagree. First of all, the semantic web is just about allowing content creators to associate context with their content to facilitate a context sensitive search. The semantic web has lackluster adoption because google does a great job at context sensitive search without the context providing meta-data markup.

      A more limited version of semantic web has achieved some notable traction. Microformats are another way of associating context with content that is more agreeable with content providers.

      A more compelling technology offering than Nepomuk for advancing semantic web would be Reuters' OpenCalais project. That's the one you should be watching. Another interesting trend to watch is how semantic web is affecting the more popular collective intelligence movement.

    9. Re:Care to explain? by Chelloveck · · Score: 2, Informative

      People are lazy. Look at any image database and figure out why it's difficult to find something. Because people don't want to spend 20 minutes filling in tags for a single image they just want to show off to their friends.

      And even when they do fill in the tags, they're sloppy about it. Things get misspelled and mislabeled all the time. Most people are very inconsistent about labeling even when they're trying their best to do an honest, thorough job. Okay, let me tag this photo "wife", because it has my wife as the subject. And "boat" because she's standing on a boat. And "ocean", because that's where the boat is. Better make that "Atlantic Ocean". Let's add the month, year, and day, too. And the time of day. There. Now I can query for "all pictures of my wife in the Atlantic on a dark and stormy night". Oh, wait, I forgot to tag the weather...

      Of course, this doesn't even touch on the problem of people just plain lying about their data to make it more appealing to possible viewers. I want the picture to show in search engines, so I'll tag it "nude", "pr0n", and "teen". Those tags have nothing to do with the picture, of course, but they'll get it noticed.

      I don't expect a Semantic Revolution to happen as long as fallible, inconsistent, lying, cheating humans are in the loop.

      --
      Chelloveck
      I give up on debugging. From now on, SIGSEGV is a feature.
    10. Re:Care to explain? by hey! · · Score: 2, Insightful

      Actually, I'd say it's too early to say that the Semantic Web has failed. What has clearly failed for now is the vision for how the technology was to be used.

      For one thing, it turned out that really, really clever textual matching is a lot more powerful than anybody thought possible. Twenty years or so ago, you'd have thought that you'd need to have some kind of sophisticated metadata to do the kinds of stuff we take for granted in Google today. It turns out that a technology that turns a needle in a haystack into a box of needles with some straw mixed in is pretty darned useful. Human intelligence picks the needle of meaning from the straw of superficial matches pretty effectively.

      But what about non-human intelligence?

      Well, here is another failure of the vision. Clearly, a semantic web is much more friendly to non-human agents. However, the whole agent philosophy of software design is extremely failure prone. A project which makes a resource easier to use for people is a safer bet than one which tries to replace human reason.

      That said, you have the wrong end of the stick, philosophically. It is because meaning is not an attribute of data that we need semantic technology. It might be less contentious and pretentious if we simply call it "metadata".

      If I want to find the rate of a certain disease in each county, the numerator is quite easy: I count all the instances of the disease. But the denominator turns out to be tricky, because of what I call the curious case of the dog barking in the night: some counties don't report any cases because they don't have any, others lack the technical capability to detect it.

      Consider a county that can't detect the disease. I ought to exclude that county from the denominator in my rate calculations. On the other hand, a county which can detect ought to be included in the denominator, even if it reports no cases. However, since it found no cases, what we usually have is an absence of data which looks identical to the absence in counties that aren't capable.

      You have to have the metadata to tell these cases apart. You have to have a model saying such and such a lab protocol is capable of detecting such and so set of infectious agents, and then you need metadata linking each data set to the appropriate model. You can do it by hand, manually discarding the data for counties you know you can't use, but this is really quite awkward when you consider that the situation can change from year to year, or even within a year.
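
      A rough sketch of the denominator problem in Python. The counties, populations and case counts are all made up, and the `can_detect` flag is a hypothetical stand-in for the metadata that links each data set to a lab-capability model.

```python
# Hypothetical per-county records; every number here is made up.
# "can_detect" is the metadata: can the county's lab protocol detect the disease?
COUNTIES = [
    {"name": "A", "population": 10_000, "cases": 5, "can_detect": True},
    {"name": "B", "population": 20_000, "cases": 0, "can_detect": True},
    {"name": "C", "population": 30_000, "cases": 0, "can_detect": False},
]


def rate_per_100k(counties, use_metadata):
    """Cases per 100,000 people, optionally dropping counties that cannot detect."""
    if use_metadata:
        counties = [c for c in counties if c["can_detect"]]
    cases = sum(c["cases"] for c in counties)
    population = sum(c["population"] for c in counties)
    return 100_000 * cases / population


print(round(rate_per_100k(COUNTIES, use_metadata=False), 2))  # C counted as disease-free
print(round(rate_per_100k(COUNTIES, use_metadata=True), 2))   # C excluded from denominator
```

      Counties B and C both report zero cases, and only the flag tells them apart. With these invented numbers, ignoring it halves the apparent rate.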

      The model aspect presents a considerable can of worms. For any purpose, you want enough model, but no more than that. This is akin to the situation of novice designers who set out to create object frameworks before they have defined the software application. For us to share data we have to have some common model of things (although our terminology may differ). On the other hand it is certain our models disagree with each other; we want enough shared model to work together without forcing our entire model on each other, which is impractical.

      The point is that you can't guess all the kinds of uses that future users as yet unknown might want to put data to, what kind of meaning they might extract from it. That's why search engine technology works so well: you put your stuff on the web and it gets spidered by Google: no guesswork needed. The Semantic Web, on the other hand, requires anticipating how the data will be used, which limits its usefulness. The "limits" here are, however, ones of scope; the Semantic Web can't do everything, it certainly can't take the place of Google. Within the scope of its potential applications, it could be very useful indeed.

      --
      Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
    11. Re:Care to explain? by grcumb · · Score: 2, Interesting

      It describes the ability to add metadata to web content (tags, etc), and you haven't heard of it because web 2.0 is the more popular term. ;)

      Wrong and wrong. Sort of. 8^)

      The Semantic Web is the term coined by Tim Berners-Lee, describing the ability to associate data using inference (rather than explicit reference). In his conception, it relies on XML data formats and the ability to use common elements to translate between one and the other.

      It's not a terribly easy concept to grok at first, but the basic premise is that in data transformation, you only need to know the two steps closest to you in order to translate (and process) data from numerous other sources. As long as we know how to get from A to B and B to C, we can go straight from A to C.

      The vision of the Semantic Web, therefore, is of a Web that is completely transparent, because XML data encapsulation and transformation is ultimately universal, albeit with the presence of an unknown number of intermediate steps.
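
      The A-to-B, B-to-C premise can be sketched as plain function composition. This is Python rather than XML transforms, and the three record formats and their field names are hypothetical.

```python
# Hypothetical record formats A, B and C; all field names are invented.
def a_to_b(record):
    """Format A uses 'fullname' and 'birth_year'; format B uses 'name' and 'born'."""
    return {"name": record["fullname"], "born": record["birth_year"]}


def b_to_c(record):
    """Format C wants 'label' and 'year_of_birth'."""
    return {"label": record["name"], "year_of_birth": record["born"]}


def compose(*steps):
    """Chain translators so the caller never handles the intermediate formats."""
    def translate(record):
        for step in steps:
            record = step(record)
        return record
    return translate


# Nobody wrote an A-to-C translator; it falls out of the two known steps.
a_to_c = compose(a_to_b, b_to_c)
print(a_to_c({"fullname": "Ada Lovelace", "birth_year": 1815}))
```

      Each party only has to publish the translation to its nearest neighbour; longer chains are built the same way.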

      --
      Crumb's Corollary: Never bring a knife to a bun fight.
    12. Re:Care to explain? by eihab · · Score: 2, Insightful

      If web content is readable and meaningful to me then it already has inherent meaning. Semantic tagging duplicates effort.

      Semantic web is also about accessibility. Take, for example, a blind person surfing the web using a screen reader: do you have any idea how horrible his/her browsing experience would be on the web today?

      [Robotic voice]
      Document Title - Slashdot | Nepomuk Brings Semantic ..
      Document Body
      Stories - Anchor link
      Slash boxes - Anchor link
      Comments - Anchor link
      Search
      Form field - Text - Query
      Submit button - Search
      News for nerds, stuff that matters
      Hello eihab! - Link
      Help & Preferences - Link ....

      Click on a different page, and there you go listening to the same headers _again_. It can get very frustrating.

      Without semantics there's no easy way for a screen reader (or other accessibility enabling devices) to successfully translate a document to something intelligible and usable.

      You can see how a photograph (or even worse, an image with text that conveys something) can be completely hidden from a blind user without proper meta data that describe it. Or how a mildly complicated table would be read completely out of order if the reader couldn't distinguish between header rows and content rows, etc. (That's why designing using tables is a horrible idea).

      The example I gave above is solved *cough*hacked*cough* today by adding two anchor links at the top of the page (skip to contents and skip to navigation) then hiding these links from regular browsers using CSS (Note: It happens to also be a valid [not hack] solution to the problem of scrolling past long navigation links on mobile devices).

      I think you can see how a reader could easily identify which parts of the document are important and what should be skipped over or highlighted had it been served a semantic and valid [x]HTML document.

      --
      If you can't mod them join them.
  3. As a KDE 4 user... by orkybash · · Score: 5, Informative

    I've tried out Nepomuk and, while I have to say that it's promising, it's got miles to go before it's even near ready. The main problem is application support. Sure, you can rate and tag and describe your files in the Dolphin file browser. So what? You can do the same in Vista. This doesn't mean anything if applications don't hook into this and make use of it. Of the apps I've used, Gwenview (a photo viewer) has Nepomuk partially implemented but it's buggy and you need to compile it yourself with it explicitly enabled (this will apparently change in KDE 4.2). Digikam, which allows you to rate, tag, and describe photos already, says that they have no plans of integrating with Nepomuk anytime soon. Amarok 2 has work towards a Nepomuk collection, but the devs say that this will always run along side the main, MySql-based collection and it's nowhere near ready yet. My email is in the cloud so I can't even begin to talk about KDE-PIM's support or lack thereof.

    The other problem at the moment is a lack of ability to query your semantic data. Can I get anything to show all photos with my wife in them that I've rated four or above? Not at the moment. Hopefully this is coming in KDE 4.2, but as it stands at the moment it makes Nepomuk a case of write-only memory.
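
    To be concrete about the kind of query I mean, here is a sketch in Python over a made-up annotation store. It is not Nepomuk's real storage format or API; the file names, tags and ratings are hypothetical.

```python
# Hypothetical file annotations, not Nepomuk's real storage or API.
ANNOTATIONS = {
    "beach.jpg": {"tags": {"wife", "ocean"}, "rating": 5},
    "boat.jpg": {"tags": {"wife", "boat"}, "rating": 3},
    "cat.jpg": {"tags": {"cat"}, "rating": 4},
}


def find(annotations, tag, min_rating):
    """Return sorted file names carrying `tag` and rated at least `min_rating`."""
    return sorted(
        name
        for name, meta in annotations.items()
        if tag in meta["tags"] and meta["rating"] >= min_rating
    )


print(find(ANNOTATIONS, "wife", 4))
```

    The data to answer this is already being written by Dolphin; what's missing is any way to ask.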

    So, maybe something to get excited about in the future, but not quite yet.

  4. Horrible name. by haeger · · Score: 2, Insightful

    NepoMUCK? Anything ending in "MUCK" doesn't sound like a good product. The concept is very interesting but the name isn't the best I've seen.

    I'm glad that they don't prefix everything with K though.

    Yes, I know that Nepomuk means "Networked Environment for Personalized, Ontology-based Management of Unified Knowledge" as stated in the article.

    .haeger

    --
    You are not entitled to your opinion. You are entitled to your informed opinion. -- Harlan Ellison
    1. Re:Horrible name. by Cornwallis · · Score: 4, Funny

      Agreed. They ought to call it NepoGIMP. Now that's a name.

    2. Re:Horrible name. by shadwstalkr · · Score: 4, Funny

      Yes, I know that Nepomuk means "Networked Environment for Personalized, Ontology-based Management of Unified Knowledge" as stated in the article.

      I assumed it was KumOpen (come open) backwards. I think the real acronym is even stupider than that.

    3. Re:Horrible name. by Znork · · Score: 3, Insightful

      Yep, that was my first thought as well. Quickly followed by wondering if 'into a collaboration environment which supports both the personal information management and the sharing and exchange across social and organizational relations' was some kind of euphemism for, eh, group pr0n of some kind.

      Oh, well, either they have much less dirty minds than mine, or someone's desire for well-indexed pr0n browsing has gotten slightly out of hand.

  5. On the brighter side... by Cyberax · · Score: 4, Funny

    It's not as bad as GIMP :)

  6. I doubt it will catch on... by Angostura · · Score: 2, Insightful

    And I'll tell you why.

    The Nepomuk Web site wants to make me chew my own arm off. Now, I'm familiar with the Semantic Web, I'm excited by the idea of semantic organisation. But this site is the epitome of grim, lifeless European research-ese. It completely fails to convey the technological approach, how it works, or why you should give a damn. I get the impression that the team was more interested in the EC funding than actually developing a disruptive technology.

    Why why can't researchers spend 15 minutes thinking about how to convey the importance and excitement of what they are trying to do in terms of practical examples.

    I'm afraid you'll probably have to wait for some enterprising 3rd party to grab the source and build some of the technology into a different product.

    1. Re:I doubt it will catch on... by contra_mundi · · Score: 2, Informative
    2. Re:I doubt it will catch on... by leobard · · Score: 2, Informative

      The Nepomuk Web site wants to make me chew my own arm off.

      ha, good one.

      Why why can't researchers spend 15 minutes thinking about how to convey the importance and excitement of what they are trying to do in terms of practical examples.

      There are some, but they are not very elegant:

      http://dev.nepomuk.semanticdesktop.org/wiki/UsingNepomuk
      http://dev.nepomuk.semanticdesktop.org/wiki/UsingDropBox
      Or check out the KDE stuff:
      http://nepomuk.kde.org/discover/user
      also in cute little moving pictures:
      http://www.youtube.com/watch?v=_8oavLQeAjM

  7. Redundant by bjourne · · Score: 2, Insightful

    All information is semantic. This slashdot post is information encoded using English semantics. Unfortunately for the machines, the English semantics are way too complicated for them to understand. So they need a simpler set of grammar rules to be able to parse it. But why would anyone want to waste time marking it up just for the benefit of machine readability when google basically can accomplish the same thing without all that metadata markup cruft?

  8. Metacrap rant from Cory Doctorow by pzs · · Score: 2, Interesting

    There's a good rant from Cory Doctorow about this. I think the best phrase that summarises people's high hopes for the semantic web is "nerd hubris".

  9. Re:giorgio@elementi.ws by maxwell+demon · · Score: 2, Funny

    1) Everything must be tagged.

    Easy: Just have a bot add "untagged" tags to everything not yet tagged. Then it's tagged, because it's tagged "untagged".

    2) Information must be TRUE (otherwise you will get bad deductions).

    Also easy: Just remove all wrong information before making your deduction. OK, so how is the computer to know what is wrong? Well, that's of course again semantic information, so just tag anything wrong as "wrong". If some "wrong" tagging happens to be wrong, you can still tag that as "wrong" as well.

    3) Ontologies, that is schemas stating what IT IS, should be shared (please don't die laughing)

    Just upload them onto any p2p network. Sharing is what they are for, aren't they?

    4) Not all "SCHEMAS" can be deductible (the complexity of what you state is a huge COMPUTATIONAL problem).

    Well, if the software gets stuck, it still can ask a human.

    Note to the humour impaired: Imagine a smiley after each sentence!

    --
    The Tao of math: The numbers you can count are not the real numbers.
  10. You got that exactly backwards by MarkusQ · · Score: 4, Interesting

    The Semantic Web is a failed attempt to extend the WWW via "semantic markup", which allows users/editors/etc to tag content (text, images, data) using a standard format that can be read, processed and exchanged by machines which can then give users more useful pointers to stuff that they care about.

    You got that exactly backwards.

    The WWW was an earlier doomed attempt at semantic markup, and up until the summer of '93 or so it looked like it might work. That's when the early rants about people using the tags to control layout instead of to convey meta information (e.g. using em to get italics in a bibliography, dt/dd to make roman numeral lists, etc.) started--or at least when I first became aware of them. In fact, pretty much the entire history of HTML has been a tension between the language's designers and purists, who want users to care about what markup means, even if it does nothing, and the vast majority of users who only care about what it does regardless of the "meaning" that may be ascribed to it. Once you can get your head around both perspectives some of the goofier things in the whole tawdry history (the Table Wars, XML, CSS) make a lot more sense.

    Ok, a little more sense. But only if you already knew what people are like.

    --MarkusQ

  11. Semantic Web Article in CACM by raddan · · Score: 2, Insightful

    There's actually a pretty good introduction to the semantic web in this month's Communications of the ACM. You're right when you say that the semantic web is, as yet, mostly unrealized. But it has huge potential.

    Relational databases were in the same position in the late 60's/early 70's. We needed ways to combine and extract information automatically with a simple and expressive language. Relational database management systems, combined with SQL were the result of that, and they were a smashing success. They are now a standard business tool. The key to that success is essentially the role that the database's ontology plays in an RDBMS.

    Having spent a lot of time professionally and academically working with and studying database technologies, most of the work is in understanding your data. Specifically, building a data model. A well-built data model is essentially an ontology. There are various techniques used to make sure that your data can be handled automatically, mainly by normalization. This requires a tremendous amount of work on the part of the database designer, but the end result is that the end-user can query this data in fairly simple terms and get an enormous richness of data, sometimes in ways that even the database designer did not foresee. I think the success of database systems is what is driving a lot of the work in building the semantic web.
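
    As a small illustration, here is a sketch using Python's built-in sqlite3 module. The `person` and `photo` tables and their contents are hypothetical. Each fact is stored once (a person's region lives only in `person`), and a short query then combines the tables in a way the schema designer need not have planned for.

```python
import sqlite3

# In-memory database with a tiny normalized schema (hypothetical tables and rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE photo  (id INTEGER PRIMARY KEY, filename TEXT,
                         sender_id INTEGER REFERENCES person(id));
    INSERT INTO person VALUES (1, 'Anna', 'Europe'), (2, 'Bob', 'Asia');
    INSERT INTO photo  VALUES (1, 'alps.jpg', 1), (2, 'tokyo.jpg', 2),
                              (3, 'paris.jpg', 1);
""")

# "Which photos were sent by people living in Europe?"
rows = conn.execute("""
    SELECT photo.filename
    FROM photo
    JOIN person ON person.id = photo.sender_id
    WHERE person.region = 'Europe'
    ORDER BY photo.filename
""").fetchall()
filenames = [row[0] for row in rows]
print(filenames)
```

    The hard part is agreeing on the tables, not writing the query, which is the same problem ontologies face on the web.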

    So you can see-- the big problem with the web is not just that data is unstructured, but that there are no standardized ontologies out there. RDF is an attempt to solve some of these problems simply, because you can embed your ontology, but it may be well off. On the other hand, if new tools make structuring data very easy or natural, people may be motivated to do the extra work because they'll personally benefit from it. For example, many people annotate or organize their photo collections naturally, so that they can share them with others. A smart photo gallery software writer may be able to come along and take advantage of that behavior to further enhance the meaning of that data.

  12. That's the idea. by Balinares · · Score: 4, Informative

    > ... the semantic web never did, and never will take off without significant AI involvement.

    I understand that the point of Nepomuk is to allow for automated tagging by the standard tools of the KDE desktop. For instance, say you receive a picture from an IM contact who KDE also knows (through the address book framework, Akonadi) lives in Europe.

    Then Nepomuk would allow you to make search queries such as "Bring up all the pictures that people living in Europe sent me last week". Well, that's the theoretical goal anyway; we will see if they ever get there.

    There's one nifty application already: you can create a Folder View plasmoid on your desktop, and instead of making it display ~/Desktop/ as usual, you can make it display the result of a query through the Nepomuk KIO slave. See here how it works.

    --

    -- B.
    This sig does in fact not have the property it claims not to have.
  13. Re:This indexing fad should curl up and die by lennier · · Score: 3, Interesting

    "Everybody and his uncle tries to make systems that will index every piece of crap on your PC and it invariably results in a useless and horrible waste of resources."

    On the contrary, we should seriously be asking ourselves *why*, when all our data is sitting there on our PCs, we've let ourselves get into such a state of disorganisation at the operating system level that a class of program called 'indexer' exists as a third-party tool in the first place.

    How come it's not already taken as given that the primary thing an operating system *does* is, you know, *know where all its data is*?

    It's as if we're living in an age before 'directories' were invented - or before databases had 'indexes' and 'queries' - and we have to manually write down and key in raw sector numbers every time we open a file. And we're okay with that, because we think - and teach - that that's 'just how computers work'. We've accepted that there's a whole class of things our computers can't do 'because there's no application to do that'.

    Something is wrong with this picture.

    --
    You are not a brain: http://books.google.com/books?id=2oV61CeDx-YC