Slashdot Mirror


Tim Berners-Lee and the Semantic Web

An anonymous reader writes "As we all know, Tim Berners-Lee is the hero of the Web's creation story--he conjured up this system and chose not to capitalize on it commercially. It turns out that Sir Tim (he was knighted by Queen Elizabeth II in July) had a much grander plan in mind all along--a little something he calls the Semantic Web that would enable computers to extract meaning from far-flung information as easily as today's Internet links individual documents. In an interview with Technology Review, the Web-maestro explains his vision of 'a single Web of meaning, about everything and for everyone.'"

66 of 250 comments

  1. What Does 42 Mean for Privacy? by Allen Zadr · · Score: 3, Interesting
    'a single Web of meaning, about everything and for everyone.'

    So, once this is off the ground, who wants to bet that the answer really is, 42?

    Seriously though, this could be really cool, but I imagine that this could have some very adverse effects on privacy given the amount of information that finds itself on the web. Items that are linked by obscurity in disparate places would be easily linked into a single profile (if the stuff he's talking about isn't primarily smoke and mirrors). Either way, like any powerful technology, it will have both good and bad consequences. Here's hoping for the good...

    --
    Kinetic stupidity has a new brand leader: Allen Zadr.
    1. Re:What Does 42 Mean for Privacy? by cynic10508 · · Score: 2, Insightful

      Seriously though, this could be really cool, but I imagine that this could have some very adverse effects on privacy given the amount of information that finds itself on the web. Items that are linked by obscurity in disparate places would be easily linked into a single profile (if the stuff he's talking about isn't primarily smoke and mirrors). Either way, like any powerful technology, it will have both good and bad consequences. Here's hoping for the good...

      People would do well to note the principle: Security by obscurity isn't.

    2. Re:What Does 42 Mean for Privacy? by Allen Zadr · · Score: 4, Insightful
      Ah, but what constitutes privacy but an obscurity of your own behaviors in certain circles?

      That is to say, I may be an item scammer in online gaming realms, or in Diablo, but not in EverQuest. However, I may be one of the most honest people I know in the real world. Perhaps I have a second account that I use to Troll on Slashdot, but otherwise have this account where I try to post insightful information. You have the right to link these things, you may even have the right to link these to real world data like where I work and where I park my car. However, if I jilted someone in Diablo, do I want them to so easily find me and take it out on my car (as some people would)?

      Do I want my employer having instant access to all of my online transactions, regardless if I'm on shift or off shift at the time? Individually, these are not things that have been considered something you would even want to 'secure', yet they may be valuable to someone.

      --
      Kinetic stupidity has a new brand leader: Allen Zadr.
    3. Re:What Does 42 Mean for Privacy? by cynic10508 · · Score: 2, Insightful

      Ah, but what constitutes privacy but an obscurity of your own behaviors in certain circles?

      I would disagree. I would say privacy is more like cryptography, in that privacy is the ability to control who knows certain information. So privacy is confidentiality.

      That is to say, I may be an item scammer in online gaming realms, or in Diablo, but not in EverQuest. However, I may be one of the most honest people I know in the real world. Perhaps I have a second account that I use to Troll on Slashdot, but otherwise have this account where I try to post insightful information. You have the right to link these things, you may even have the right to link these to real world data like where I work and where I park my car. However, if I jilted someone in Diablo, do I want them to so easily find me and take it out on my car (as some people would)?

      Well, this goes off on a tangent. I would argue that you're making an incorrect metaphysical and/or epistemological distinction in dividing your "virtual" and "real" personas. What is ethical in one is ethical in another and vice-versa.

      Do I want my employer having instant access to all of my online transactions, regardless if I'm on shift or off shift at the time? Individually, these are not things that have been considered something you would even want to 'secure', yet they may be valuable to someone.

      Kind of another tangent. If you're using your employer's network then legally you've pretty much given up the right to privacy. My suggestion would be not to use company computers to do anything that you wouldn't want them looking at.

    4. Re:What Does 42 Mean for Privacy? by Allen Zadr · · Score: 2, Interesting
      The Semantic Web is for chasing tangents. Sorry if this seems marginal to you.

      My point in the virtual vs. real persona is that you cannot expect the same behavior patterns from the same people given totally different situations. My killing your character in an online death-match does not mean I would be unethical enough to kill you. Likewise, if I pick up trinkets from the monsters you have slain (clearly, they are not my spoils to take), this does not mean that I will take tips off of tables at a restaurant.

      Similarly, most of my 'online' activity is done from home. That does not mean that a semantic web is designed to tell the difference. In fact, just the opposite. It's designed to merge all data that's available on me into a single profile. Again, this could be misleading. If I spend 3 hours (average) per day gaming, does this make me less capable of doing my job? Maybe, maybe not. Would this change the way my employer perceives my performance? Probably, yes.

      The other point which I think you are trying to make is that if the data is out there, then it can already be found by other means. This may be the case, but not necessarily.

      Given a much more personal example: If my cross-identity is posted by a friend on an obscure site, Google may pick that up. If you then trace my cross-identity into the online world, you will find many, many postings - as well as political views (mostly by the name you see me posting under now). My politics definitely don't agree with those who pay my salary. Would they hold these politics against me if they were easily traced? I don't know. I honestly don't want to find out. Point being, the semantic web (if working) would quickly link me with my politics.

      My greater fear is that an advertiser could do this too (not that they don't already, to some extent); it would just be even easier. The only benefit? I may stop getting ads for things I don't need.

      --
      Kinetic stupidity has a new brand leader: Allen Zadr.
    5. Re:What Does 42 Mean for Privacy? by crschmidt · · Score: 2, Informative

      There are several solutions to the problems you describe. I'll address the few I'm most comfortable with responding to - not because the others are unsolvable, simply because I don't want to provide inadequate information.

      All information on the web should be taken with, as they say, a grain of salt. Depending on what you are looking at, it has more or less value. For example, something on Wikipedia can probably be assumed to be relatively accurate, whereas something on Joe Schmo's website on Geocities will probably be considered to be less accurate in general. The semantic web allows for you to see who is saying something in a number of ways, and to verify this information:

      • URI Source - If the source of data about Chevy Trucks is at chevy.com/trucks.rdf, you'll probably have a pretty good reason to trust it.
      • dc:creator - a self-assigned name for the creator of the document
      • Most importantly, wot:assurance: a signature, using standard public/private key encryption, of a document, assuring that the signer indeed did create the information

      Each of these methods of determining where information is coming from has its own special place in assigning credence to the document in question. Thus, if a document signed by crschmidt@crschmidt.net says that the person "Christopher Schmidt" owns the email address crschmidt@crschmidt.net - it's probably safe to trust that person.

      Once the data is available on the web, it is easy to find other data: one of the basic terms is "seeAlso" - a way of providing other URLs at which to look for data. Once the web starts, it is easy to link it, and to do so is to increase the data. You don't need something smart or intelligent - simply wander around, collect all the rdfs:seeAlso links, and download those - and continue from there. This process, known as "scuttering", is an easy way to start creating a relatively large data store.
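
      A minimal sketch of the scuttering loop described above, in Python. The URLs and the SEE_ALSO table are made up for illustration and stand in for documents a real scutter would download and parse; only the crawl order is shown.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> rdfs:seeAlso targets found in that document.
# A real scutter would fetch and parse each RDF document instead.
SEE_ALSO = {
    "http://example.org/a.rdf": ["http://example.org/b.rdf", "http://example.org/c.rdf"],
    "http://example.org/b.rdf": ["http://example.org/c.rdf", "http://example.org/a.rdf"],
    "http://example.org/c.rdf": ["http://example.org/d.rdf"],
    "http://example.org/d.rdf": [],
}


def scutter(start, fetch_see_also, limit=100):
    """Breadth-first walk over seeAlso links, visiting each URL at most once."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue and len(order) < limit:
        url = queue.popleft()
        order.append(url)
        for target in fetch_see_also(url):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


visited = scutter("http://example.org/a.rdf", lambda url: SEE_ALSO.get(url, []))
print(visited)
```

      The "seen" set is what keeps the walk from looping when documents point back at each other, which they will.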

      Using descriptions of when information is updated allows tools to understand when they should check back for more information. Similar to the way RSS feeds (which are a part of the Semantic Web) can inform tools that they will be updated in 2, 4, 6, 24 hours, general RDF documents can do the same thing - saying "check me again in a week" or more.

      There are currently tools for working with the semantic web on a small scale. Although this is nothing like the big dream - having almost everything described, so that computers can really understand the world around them - these tools do have their usefulness. I can now ask "What is the name of the person whose aim name is cr5chmidt", and be told the answer. Although it's not perfect - very little about the semantic web is perfect yet - it doesn't need to be. For more information, see my post on the bot I created to spider semweb data in my blog.
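
      As a rough illustration of that kind of question, here is a Python sketch over a handful of in-memory triples. The subject identifiers and the second person are invented for the example; foaf:name and foaf:aimChatID are FOAF vocabulary terms, and a real tool would query a proper RDF store rather than a list.

```python
# Triples are (subject, predicate, object). The "_:p1"/"_:p2" identifiers and
# the second person are illustrative assumptions, not real data.
TRIPLES = [
    ("_:p1", "foaf:name", "Christopher Schmidt"),
    ("_:p1", "foaf:aimChatID", "cr5chmidt"),
    ("_:p2", "foaf:name", "Someone Else"),
    ("_:p2", "foaf:aimChatID", "someone"),
]


def names_for_aim(triples, aim_id):
    """Return the names of every subject whose AIM chat ID matches aim_id."""
    subjects = {s for s, p, o in triples if p == "foaf:aimChatID" and o == aim_id}
    return sorted(o for s, p, o in triples if p == "foaf:name" and s in subjects)


answer = names_for_aim(TRIPLES, "cr5chmidt")
print(answer)
```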

      As you said, it won't be easy. However, it is possible, and it seems to me more and more likely each day that working on these tools and increasing the amount of semantic data in every little way can help.

      --
      -- Christopher Schmidt YouTube Quality of Experience
    6. Re:What Does 42 Mean for Privacy? by blue trane · · Score: 2, Insightful

      Didn't they come up with a few viruses for it though?

  2. The Semantic Web is the next big thing by Anonymous Coward · · Score: 2, Funny

    and has been for over a decade (or more).

  3. 'Twas a happy day on SemWebCentral... by tcopeland · · Score: 3, Interesting

    ...when the man himself signed up for a user account. w00t!

  4. What is the semantic web? by Anonymous Coward · · Score: 5, Informative

    Well, beyond the "knowledge management"-type mumbo jumbo, anyway. Some basic definitions are here, here, and .

  5. You don't want a "single" web... by Pig Hogger · · Score: 3, Insightful
    You don't want a "single" web... You want a multitude of them, and carefully isolate them (beyond normal information reading and referencing).

    This is to insure against a monoculture that is so disastrous in computer circles as demonstrated by the numerous security failings of Windows...

    1. Re:You don't want a "single" web... by JimDabell · · Score: 3, Insightful

      This is to insure against a monoculture that is so disastrous in computer circles as demonstrated by the numerous security failings of Windows...

      Windows executes stuff. The semantic web is just data. Your warnings about a monoculture apply to the semantic web about as much as they apply to text files.

    2. Re:You don't want a "single" web... by JimDabell · · Score: 3, Insightful

      Remember when you couldn't get a virus just by reading an e-mail?

      Yes, and again, the problem is when the stuff that executes has a monoculture. It's not like you see Pine users or KMail users infected by emails with Outlook viruses in.

  6. Duplicate Posting by Anonymous Coward · · Score: 5, Funny

    See the original here.

    Actually Slashdot posts this article over and over again every few months, with basically the same headline (sometimes "and" sometimes "on" sometimes "Tim" sometimes not). Kinda bizarre really. :-) I've never read any of them, I only know this Berners-Lee fellow from the headlines.

  7. Dang CERNopeans! by Anonymous Coward · · Score: 4, Funny

    As we all know, Al Gore is the hero of the Web's creation story.

  8. "Where's some semantic web software?" by tcopeland · · Score: 4, Informative

    This always gets asked - and a partial answer is right here.

    Eclipse plugins, visualization tools... there's some good stuff there.

    1. Re:"Where's some semantic web software?" by Schwarzchild · · Score: 2, Interesting
      Yeah but is it anything that you'd want to use?

      The God Emperor of XML, Tim Bray, doesn't seem to know of any such software so he posted a challenge.

      --

      "sweet dreams are made of this..."

  9. about everything and for everyone... by over_exposed · · Score: 4, Funny

    Except for China, they get their own semantic web with special semantic filters in place that semantically keep their citizens under semantic control.

    --
    "The object of war is not to die for your country, but to make the other bastard die for his." - Patton
    1. Re:about everything and for everyone... by Anonymous Coward · · Score: 5, Funny

      I hope you're not anti-semantic?

  10. Opposing view by Psychic Burrito · · Score: 5, Informative

    If you'd like an opposing view, make sure to read Clay Shirky's take on the semantic web.

    1. Re:Opposing view by david.given · · Score: 2, Interesting
      If you'd like an opposing view, make sure to read Clay Shirky's take on the semantic web.

      Having just read quite a lot of his article before becoming far too annoyed to go any further, I really wouldn't take him very seriously. The bulk of his complaint is that although the Semantic Web is about drawing conclusions from widely disparate pieces of data, people don't think like that. I have no complaint with this.

      However, he attempts to illustrate his point with lots of syllogisms. Unfortunately, he doesn't seem to understand them. For example, he uses this one:

      1. Count Dracula is a Vampire
      2. Count Dracula lives in Transylvania
      3. Transylvania is a region of Romania
      4. Vampires are not real

      ...to illustrate that despite the fact that all the above statements are correct, the only conclusion you can draw is that Romania is not real.

      Huh?

      The only way you can come to that conclusion is if you assume that statement 2 implies that, if X lives in Y and X is not real, then Y is not real. Which is an invalid assumption. Therefore his conclusion is not valid.

      The entire essay is full of things like this. When he's talking in generalities, he makes a small amount of sense, but as soon as he starts using specifics, he stops making sense. There may be something to his basic point, but I'm not inclined to trust someone's opinions on a fundamentally logic-based concept who seems to be so inept at using logic. Treat with caution.

    2. Re:Opposing view by mr_majestyk · · Score: 2, Insightful

      semantic web allows people to publish their own ontologies, and the best tools should be those that learn to extract interesting info from various sources.

      That's right. More to the point, the system supports many ontologies, and allows the best ontologies to rise to the top.

    3. Re:Opposing view by Allen Zadr · · Score: 2, Insightful
      Having read both of your articles, I do not see them as opposite, but rather complementary.

      All information that is subjective is a poor candidate for the semantic web. All information that is quickly subject to change is a poor candidate for the semantic web. When mixing subjective (verb) pointers to a given truth on a large scale, modified by objective pointers, where even one of many thousands is false (or mis-keyed), the overall meaning can become quickly subverted.

      In other words, if I get enough people to post somewhere that Allen Zadr lives in New Mexico, the multiple verbs that would otherwise point to the actual fact -- there is no Allen Zadr -- would be subverted. That is, unless you could syntactically link Allen Zadr to an actual human being.

      Even more simply, the semantic web is only as good as the data. It's not very difficult to get a well trusted source to make an assertion of a truth while avoiding the linking details - thus presenting the users with a subverted view of reality. It has many flaws, and many promises. It won't fail, but it will never be better or worse than the existing systems, just different.

      --
      Kinetic stupidity has a new brand leader: Allen Zadr.
    4. Re:Opposing view by Sique · · Score: 3, Insightful

      No, computers don't need meaning to handle data. Computers need syntax and rules for how to act on syntactic structures. The semantic web is founded on the hope that enough syntax thrown at huge amounts of data turns magically into semantics.

      It's based on the assumption that all semantics can be explained by syntax. So far this has not been proven, and all attempts to get there got stuck somewhere and turned out something different, sometimes useful (Chomsky's grammars), sometimes not so useful.

      The semantic web would have to deal with the laziness of people who can't be bothered to write meaningful ALT attributes to tags. It can try to guess on some of the semantics, but it can also easily be fooled. Everyone who ever tried to use content filters for an internet connection knows what I am talking about. There are lots of false positives rejected and hundreds of questionable sites run through, because the syntax of a site alone doesn't help with evaluating the semantics (the meaning) of this site.

      --
      .sig: Sique *sigh*
    5. Re:Opposing view by null etc. · · Score: 2, Interesting
      I don't really find value in Clay Shirky's arguments against syllogisms, which serve as the basis of value within a semantic web.

      In order to prove that syllogisms are flawed, Clay presents examples of common English statements, and attempts to arrive at flawed deductions. Such flaws only work for Shirky due to the ambiguity of the English language.

      In reality, a semantic web would neither store nor organize data according to the loose ambiguities of English. Rather, such information would need to be highly structured, using a formal system, in order for the accuracy of syllogisms to work.

      As an example, let me examine a sentence that appears within a technical specification of a project I'm working on:

      A financial institution may offer customers the ability to download account statements from its web site.

      If this sentence were to be placed on the semantic web, it would be useless, given the ambiguity of several words and contexts. Instead, the meaning of each phrase, clause, and word would need to be made fully explicit using a formal semantic representation. Such a representation might be based on a hierarchical data structure such as XML.

      If the above sentence were to be fully clarified, it would appear as:

      From amongst the entire set of financial institutions actual or theoretical, a set of one or more such financial institutions may exist that offers each customer, from a set of one or more of the financial institution's actual customers if the financial institution is actual, or theoretical customers if the financial institution is theoretical or the financial institution is actual and may theoretically have customers, the ability for the customer to download each account statement from a set of one or more of the customer's actual account statements if the customer is actual, or theoretical account statements if the customer is theoretical or the customer is actual and may theoretically have account statements, from the web site owned and administered by the financial institution.

      Obviously this structure is much larger, but it contains all of the information necessary to resolve the sentence's ambiguities.

      The above structure could also be expressed simply in XML. To examine a fragment of the above structure:

      From amongst the entire set of financial institutions actual or theoretical

      This would most likely appear using a structured representation such as:

      (target)
      (set)
      (scope)entire(/scope)
      (members)
      (membertype)financial institution(/membertype)
      (instancetypes)
      (type)actual(/type)
      (type)theoretical(/type)
      (/instancetypes)
      (/members)
      (/set)
      (/target)

      The Slashdot "comments" field is extremely broken, so I've been forced to use parentheses and omit indentation.
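
      For reference, a sketch of how that fragment might look with angle brackets and indentation restored, read back with Python's standard xml.etree.ElementTree. The element names are taken from the parenthesised version above; the indentation and the choice of what to extract are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# The parenthesised fragment above, rewritten as ordinary XML.
FRAGMENT = """
<target>
  <set>
    <scope>entire</scope>
    <members>
      <membertype>financial institution</membertype>
      <instancetypes>
        <type>actual</type>
        <type>theoretical</type>
      </instancetypes>
    </members>
  </set>
</target>
""".strip()

root = ET.fromstring(FRAGMENT)
scope = root.findtext("set/scope")
member_type = root.findtext("set/members/membertype")
instance_types = [t.text for t in root.iter("type")]
print(scope, member_type, instance_types)
```

      Once the structure is explicit, pulling out the scope, the member type and the instance types takes a path lookup each, with no guessing about English grammar.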

      Isn't it funny how the English sentence fragment is so much easier for humans to understand, even though both representations contain the same information? It's amazing what our brains do "automatically" by operating under certain contexts. Similarly, a machine will have much greater ease in understanding and processing the formalized structure, in cases where it wouldn't even be able to guess at the corresponding English fragment (Well, it would be able to guess, but with hilarious results. What's that, a piece of toast rules over Utah?)

      No doubt, translating normal human English sentences into a semantic web will be a lengthy and complicated process. But some mitigating factors:

      • As "prefab" semantic units are constructed, such units could be reused without reconstructing them.
      • The reuse of units will allow the full value of the unit to be achieved, without introducing unnecessary and confusing variance between instances of identical units.
      • Units may be constructed in such a way to semantically avoid the ambiguities of every language, not just English. Such a conversion p
    6. Re:Opposing view by Fnkmaster · · Score: 2, Insightful
      While I understand where you are coming from, let me present the parts of his arguments that do seem to hold water to me.


      1. The Semantic Web (or rather, ontology construction and construction of relationships between your local ontology and other ontologies) is complicated and time-consuming, and requires you to decipher lots of other people's stuff to connect your stuff to it. Ultimately, any new technology, especially one that requires widespread adoption to be useful, must be easy enough to adopt that people actually adopt it. RSS, HTML and other successful technologies allow you to focus your effort on the local endeavour and don't require tons of formalized, structured organization of data, which runs somewhat counter to human nature. They are thus substantially less labor intensive to implement, and have therefore been taken up quite rapidly. This argument I consider to be perfectly valid and fairly strong.


      2. Trust of ontological data is a critical issue because lots of false assertions and mediocre data will inevitably creep into a large, distributed "semantic web". This is a problem with the web currently, and you definitely have to take everything you read with a grain of salt, trust certain sources more than others, and so on. I think this argument holds some water, but I think this problem is addressable.


      Personally, I think it will ultimately be easier to implement something like Cyc to build structured knowledge networks from information in human grokkable form. The internal representation of a Cyc-like machine will probably look quite similar to the semantic web, including the ability to adjust world view, evaluate source material reliability, etc. Getting a machine to build this knowledge representation, despite all the ambiguities of human expression, is more likely to succeed and be useful to humanity (IMHO) than getting lots of humans to interact with computers and technology in a structured, logical fashion. This is not to say that there aren't applications where structured ontological data would work well.


      I particularly like the idea of auto-translation between different structured data formats, but I do agree with Clay that it's more likely that businesses will construct isolated "island" ontologies (such as a specific XML schema for describing formatted data) and deal with translation to other formats on an ad-hoc basis, for simple resource allocation and cost reasons.


      Your argument (pro) seems to rely on the idea that tools will make things easier. I can't help but think of 4GL programming, SQL and attempts to make programming accessible to "average" people. The fact is good tools make things easier, but only certain people or people trained to do so can really think in a structured, logical fashion and express that in a way that a computer understands. No effort to handwave that issue away with "tools" has ever succeeded. Tools can help, but they are not a panacea. HTML is so successful and widespread because it's simple to edit, as it only requires basic visual thinking to understand - and tools let you skip the intermediate step and edit the visual representation directly.


      The concept of editing semantic information is fundamentally not so simple, because humans don't formalize their thinking about relationships on a day-to-day basis. Like visual mapping tools for XML, they may make things slightly easier, but I wouldn't expect any magic. Like I said, I think that we will ultimately end up there, but I believe it will be approached from the other direction.

    7. Re:Opposing view by Thuktun · · Score: 4, Insightful
      If you'd like an opposing view, make sure to read Clay Shirky's take on the semantic web.

      His writings appear to have some uncorrected logical fallacies.
      Consider the following assertions:
      • Count Dracula is a Vampire
      • Count Dracula lives in Transylvania
      • Transylvania is a region of Romania
      • Vampires are not real
      You can draw only one non-clashing conclusion from such a set of assertions -- Romania isn't real.
      You can conclude the following from those statements:
      • Count Dracula is not real
      • Count Dracula lives in a region of Romania
      I'd like to see the mystery step that combines these to conclude that Romania isn't real; at most, you could say that Romania houses something that isn't real. The conclusion he makes isn't supported by any logic.

      More importantly, these are dumbed-down semantics. The assertion that a fictional character lives somewhere real needs to be qualified that this occurs in a certain set of fictional stories, not real life. The fact that these unqualified statements are represented in this example ontology means that the ontology is insufficient, not that this method isn't useful.

      Another example in that article:
      • US citizens are people
      • The First Amendment covers the rights of US citizens
      • Nike is protected by the First Amendment
      You could conclude from this that Nike is a person, and of course you would be right.
      This is even factually incorrect. The First Amendment doesn't actually say anything about US citizens; it restricts the US Congress from certain actions, period, not for certain people.

      Ignoring this, you can make one conclusion and reduce this to the following:
      • the First Amendment covers the rights of people
      • Nike is protected by the First Amendment
      Concluding that Nike is a person from this is a logical fallacy. (Nothing in these logical statements says the First Amendment might not also cover the disposition of small peanut butter sandwiches with blueberry jam, which set Nike might then be an element of.)
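
      One way to see the fallacy is to build a tiny model in which both premises hold and the conclusion does not. A Python sketch with sets; the individuals "alice" and "bob" and the contents of "protected" are invented purely for the counterexample.

```python
# Hypothetical toy world: two people, plus everything the First Amendment
# protects in this model (the people and, by assumption, Nike as well).
people = {"alice", "bob"}
protected = people | {"nike"}

premise_1 = people <= protected   # the First Amendment covers the rights of people
premise_2 = "nike" in protected   # Nike is protected by the First Amendment
conclusion = "nike" in people     # Nike is a person

print(premise_1, premise_2, conclusion)
```

      Both premises come out true and the conclusion false, so the inference is not valid: being in the protected set does not put you in the people subset.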

      I find it hard to treat this article with much weight, given its fast-and-loose treatment of logic and ontological assertions.
    8. Re:Opposing view by david.given · · Score: 2, Insightful
      The conclusion is invalid because YOU happen to know that it's invalid. It certainly could be valid given only the rules presented. As an example, if you used Superman and Metropolis in the above example, it would work fine.

      Rule 2 does not provide any information about the reality of its parameters. Stating things a bit more formally:

      1. isA(dracula, vampire)
      2. locatedIn(dracula, transylvania)
      3. locatedIn(transylvania, romania)
      4. ~isReal(vampire)

      These aren't rules, they're statements providing one-way inferences. You may only create forward logic chains. There aren't really any interesting conclusions you can come up with from this, apart from being able to state that some unreal things live in Romania.
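
      To make that concrete, here is a small forward-chaining sketch in Python over the four statements. The two inference rules are assumptions added for the example (locatedIn is transitive; an instance of an unreal kind is unreal), since the statements themselves carry no rules at all.

```python
# The four statements, as tuples. ("notReal", x) stands for ~isReal(x).
facts = {
    ("isA", "dracula", "vampire"),
    ("locatedIn", "dracula", "transylvania"),
    ("locatedIn", "transylvania", "romania"),
    ("notReal", "vampire"),
}


def step(known):
    """Apply both assumed rules once and return only the newly derived facts."""
    new = set()
    for f in known:
        for g in known:
            # Assumed rule: locatedIn is transitive.
            if f[0] == "locatedIn" and g[0] == "locatedIn" and f[2] == g[1]:
                new.add(("locatedIn", f[1], g[2]))
            # Assumed rule: an instance of an unreal kind is itself unreal.
            if f[0] == "isA" and g[0] == "notReal" and f[2] == g[1]:
                new.add(("notReal", f[1]))
    return new - known


def closure(known):
    """Keep applying the rules until nothing new can be derived."""
    known = set(known)
    while True:
        new = step(known)
        if not new:
            return known
        known |= new


derived = closure(facts)
print(sorted(derived - facts))
```

      The closure contains "Dracula is not real" and "Dracula is located in Romania", but nothing in it says Romania is not real; that would need an extra rule passing unreality from an inhabitant to its location.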

      Shirky gives examples of some of Dodgson's syllogisms (and Dodgson is a master among logicians). Dodgson's syllogisms are interesting because they're based around rules. Take the one about poems:

      1. No interesting poems are unpopular among people of real taste.
      2. No modern poetry is free from affectation.
      3. All your poems are on the subject of soap-bubbles.
      4. No affected poetry is popular among people of real taste.
      5. No ancient poetry is on the subject of soap-bubbles.

      He uses generic statements, rather than absolute statements. You can see this if I restate it:

      1. isInteresting(X) IMPLIES isPopular(X)
      2. isModern(X) IMPLIES isAffected(X)
      3. isYours(X) IMPLIES isAboutBubbles(X)
      4. isAffected(X) IMPLIES ~isPopular(X)
      5. ~isModern(X) IMPLIES ~isAboutBubbles(X)

      Notice that all these rules have to be specified in generic terms. We have equations we can manipulate. This means we can use them. There's a rule that (A IMPLIES B) == (~B IMPLIES ~A), which lets us restate rules 1 and 5 as follows:

      1. ~isPopular(X) IMPLIES ~isInteresting(X)
      2. isModern(X) IMPLIES isAffected(X)
      3. isYours(X) IMPLIES isAboutBubbles(X)
      4. isAffected(X) IMPLIES ~isPopular(X)
      5. isAboutBubbles(X) IMPLIES isModern(X)

      And from here it's just a matter of substituting in, since (A IMPLIES B) and (B IMPLIES C) together give (A IMPLIES C). Chaining 3, 5, 2, 4 and 1 means that we can prove that your poems are modern, affected, unpopular and uninteresting.
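
      The substitution step can be checked mechanically. A Python sketch that encodes Dodgson's five premises as quoted in the list above (premise 1 read as "interesting implies popular", "ancient" read as "not modern"), adds the contrapositive of each, and follows the chain from "yours". The short literal names are shorthand invented for the example.

```python
def neg(lit):
    """Negate a literal written as 'x' or '~x'."""
    return lit[1:] if lit.startswith("~") else "~" + lit


# Dodgson's premises as (antecedent, consequent) pairs.
premises = [
    ("interesting", "popular"),   # no interesting poems are unpopular
    ("modern", "affected"),       # no modern poetry is free from affectation
    ("yours", "bubbles"),         # all your poems are about soap-bubbles
    ("affected", "~popular"),     # no affected poetry is popular
    ("~modern", "~bubbles"),      # no ancient poetry is about soap-bubbles
]

# (A IMPLIES B) is equivalent to (~B IMPLIES ~A), so add every contrapositive.
rules = set(premises) | {(neg(b), neg(a)) for a, b in premises}


def consequences(start):
    """Everything reachable from start by chaining implications."""
    reached, frontier = set(), [start]
    while frontier:
        lit = frontier.pop()
        for a, b in rules:
            if a == lit and b not in reached:
                reached.add(b)
                frontier.append(b)
    return reached


result = consequences("yours")
print(sorted(result))
```

      Starting from "yours", the chain reaches bubbles, modern, affected, ~popular and finally ~interesting, which is Dodgson's intended conclusion.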

      You need the statements to provide the fundamental information, and the rules to let you manipulate that information. (Dodgson avoids needing a statement by using rule 3 instead; it would work just as well had rule 3 been isAboutBubbles(yourPoem), but that would only let you prove that yourPoem was uninteresting, not that all your poems are uninteresting.)

      Shirky's trying to discredit the Semantic Web by using a syllogism of his own, that goes like this:

      1. Syllogisms that don't contain rules are useless.
      2. The Semantic Web is constructed out of syllogisms.

      From this he's trying to draw the erroneous conclusion that the Semantic Web is useless. I leave the problem with this as an exercise to the reader.

      Seeing as he is apparently trained in this stuff, which I am not, this makes me think that he is either (a) incompetent or (b) deliberately trying to mislead people. Either way, I don't trust his logic.

  11. Semantic Web by null etc. · · Score: 3, Informative

    A topic I posted a few years ago is perfectly relevant to this submission: http://slashdot.org/comments.pl?sid=92504&cid=7953441

  12. interesting technology... by LiquidMind · · Score: 2, Interesting

    "...enabling computers to extract meaning from far-flung information as easily as today's Internet simply links individual documents."

    I wonder if this could be used for a computer's local file system as well. I know Microsoft is working on this (WinFS or OFS or whatever it's supposed to be called), but it would be damn awesome to apply this not just to the internet.

    --
    This sig contains repetition and redundancy.
    1. Re:interesting technology... by KjetilK · · Score: 2, Interesting

      The RDF geeks are already discussing a marriage of Reiser4 and RDF.

      --
      Employee of Inrupt, Project Release Manager and Community Manager for Solid
  13. Two major problems to a semantic web by levram2 · · Score: 5, Insightful

    The extra work required to put data into a standard data format won't be done. People can't be bothered to make their pages W3C compliant (even Slashdot). The second problem is that data formats can rarely be agreed upon by a large community. Look at how many calendar event and news feed formats there are.

    1. Re:Two major problems to a semantic web by jilles · · Score: 2, Insightful

      The reason people don't bother with W3C-compliant webpages is that there is no obvious advantage. Slashdot works fine in all modern browsers and aside from some bandwidth that could be saved by going fully XHTML/CSS there is little to be gained (well, there are a number of advantages but they're obviously lost on the editors).
      With data it is different; just look at how quickly RSS & ATOM are being adopted. There's an obvious advantage because having a feed on your site makes it easier for readers to learn about new content on your site. It doesn't matter that there are multiple competing standards because the tools that matter are standards neutral (most feed readers can handle most RSS and ATOM variants). If there is a sufficiently large group of people using a particular (open) format, it is worthwhile to program functionality to do stuff with this data.

      The RSS world is also spawning some interesting semantic things such as track back links and perma links. Not all of these things will survive but there already are these mini semantic webs emerging. These networks are growing in size and scope. People write tools to search and navigate them in various and sometimes unexpected ways. Whenever one tool involves multiple networks, effectively a larger one emerges.

      IMHO the semantic web is not something that will be released by some big software company or standards body like the W3C but rather something that will emerge out of the chaos of different standards and formats that are out there today. There will not be some monolithic ontology that explains everything but rather there will be many domain-specific, simple ontologies that may be abstracted from by tools so that relations between datasets may be established and explored without requiring many changes to the data. Where meaningful relations exist, tools and standards will emerge to exploit these relations.

      --

      Jilles
  14. Re:The rest of us call this... by BigGerman · · Score: 2, Insightful

    Exactly.
    And here is the problem: what are "the rest of us" going to do when Google goes south? Either it collapses under its own weight or is finally broken by its corporate overlords?
    Can't put all the eggs in one basket. The only sane future is the one with unified, object-driven search and retrieval methods distributed amongst information consumers and producers.

  15. Re:The rest of us call this... by mr_majestyk · · Score: 3, Informative

    The rest of us call this... GOOGLE.

    Google identifies relationships between data using only the links between pages containing the data.

    The Semantic web represents relationships between data based on metadata (i.e. data about data). This is a far more powerful way to describe the meaning of data.

    works for me.

    Maybe, but that doesn't mean it's the best way to accomplish what you are trying to do.

  16. This burns me up!!! by octaene · · Score: 5, Funny

    I'm so tired of Semantic trying to take over all the security tools. Are they now trying to take over the Internet? I mean really, Semantic Antivirus totally sucks ass big-time!!! And don't get me started on Semantic's SystemWorks tool and how bad it blows!

    Oh, wait a minute...

  17. Meanwhile... by genixia · · Score: 2, Funny

    ...a team in Redmond is tasked to make sure that Microsoft own the "single Web of meaning, about everything and for everyone."

  18. Obvious candidate for massive abuse by gammelby · · Score: 2, Insightful
    How is the semantic web going to handle abuse like pr0nn g_annotation>...? I mean, anybody can put up bogus annotations to promote their filthy business, as we saw in the days before Google and PageRank.

    Ulrik

    1. Re:Obvious candidate for massive abuse by KjetilK · · Score: 2, Insightful
      I suspect the answer to that one is immense social networks, user participation and webs of trust.

      The WWW also has Annotea, to allow for people to submit annotations. Now, you can imagine lots of people having a simple way to rate pages, a rating option could for example be "Supplied metadata are bad/fraudulent", or something like that.

      You would first and foremost make decisions based on ratings from people you trust. That is, people who are close to you in your FOAF-based social network.

      When every Internet user becomes a reviewer, and people are well connected in a social network, so that there is a review available of most pages, there is going to be a very strong incentive for authors to supply accurate metadata. Think of it as moderation.

      Face it, although it happens that you stumble upon pr0n involuntarily, the vast majority of pr0n surfers do it on purpose. Pr0n0graphers (this is getting a bit too leet for me...) then will have a strong incentive to refrain from such tactics: they will be modded into oblivion anyway, and accurate metadata is going to bring them traffic, since they are modded up by those who actually surf pr0n.

      So, unless the goatse guy is a friend of yours, I don't think it is a big worry.

      Provided SW becomes a reality that is.

      FOAF is a really good start, though, go create it now!

      --
      Employee of Inrupt, Project Release Manager and Community Manager for Solid
  19. Why is a hero? by Gothmolly · · Score: 3, Interesting

    Because he chose not to capitalize commercially on the Web? How is the measure of your altruism the measure of your heroism? I understand that many people DO feel that way, but nobody has ever really explained WHY heroism is a necessary consequence of altruism. Why is someone who makes a profit necessarily evil? The man who invented a corrugated-cardboard coffee-cup holder holds a patent on it; every Starbucks coffee sold puts a penny in his pocket. Why is that wrong?

    --
    I want to delete my account but Slashdot doesn't allow it.
  20. Statistical text analysis killed semweb by Ars-Fartsica · · Score: 5, Insightful

    As has been stated many times, content producers will spoof semantic data just like they used to with the META tag...which is why no one uses the META tag anymore. Relevance algorithms take into account link analysis and statistical text analysis to provide a much more truthful representation of what data is there. Sorry Tim.

    1. Re:Statistical text analysis killed semweb by Ars-Fartsica · · Score: 2, Interesting
      And who's to say that the Semantic Web metadata will not be populated with statistical text analysis and hyper-text analysis?

      Statistical methods excel at query relevance, not ontological interpretation. If the latter were the case, Google would be auto-constructing DMOZ instead of seeding page rank with it.

  21. The next "web"? by daveschroeder · · Score: 2, Informative
    Croquet

    ...from the minds of Alan Kay, David Smith, David Reed, and others...

  22. Ontology by dodongo · · Score: 5, Interesting

    I want to offer an alternative, as proposed by Victor Raskin at Purdue. I speak for neither Sergei Nirenburg nor Victor (who does enough talking for himself).

    While this idea for more thorough, concise, and accurate searches is a good one, I would question whether embedding semantic tags into web pages is the way to go.

    As outlined in Ontological Semantics, there is an automated system of semantic processing already underway. Basically, it takes a text, then runs it through a parser, which looks up meanings in a lexicon, then reduces whatever translation it comes up with to a text-meaning representation (TMR), by pushing the concepts from the lexicon through an ontology / onomasticon / world-knowledge library. The TMR is basically the "pulp" of the semantics of the article, web page, book, or whatever it's been fed. It just contains the ideas, the things involved, and other relevant concepts, stripped of all other linguistic information.

    TMR is great, because it can then be used, by reversing the process and using the lexicon of another language, to translate a text from one language to another.

    However, it seems to me that with the bits and pieces of the TMR stored in a search engine's index, this could be a huge boon for the search engine.

    Instead of just trying to match keywords, by parsing the TMR of web pages and by parsing TMR of search strings, you no longer search for keywords, but keyconcepts.

    The advantage to semantic searches / indexes by this implementation is manifold:

    -Searches (and the web as a whole) will gain the richness Mr. Berners-Lee is advocating.

    -Web authors will not be able to lie in their semantic tags, or otherwise misinform spiders what the page is about (remember META tags?)

    -No extra work is required in the actual construct of the web or *ML standards. The TMR is only generated and stored by the sites / processes that need it.

    -Others?

    Just an alternative solution, for fun :)

    1. Re:Ontology by Feynman · · Score: 2, Funny
      OK. How does it do with this sentence: "Time flies like an arrow?"

      It returns: "Fruit flies like a banana."

    2. Re:Ontology by dodongo · · Score: 2, Insightful

      Well... I actually wrote a paper lambasting the ontology for precisely what you bring up here. Specifically, I wrote working from a draft of Adele Goldberg & Ray Jackendoff's paper "The English resultative as a family of constructions" (_Language_ vol. 80 no. 3, September 2004). It deals with strange things like

      "The trolley rumbled through the city"

      and led me to believe Victor's ontological approach would have some serious problems encoding this if it didn't have a more attuned syntax processor. It wasn't a good paper, but I made my point, and you bring up a similar idea on a more basic (and thus, even more problematic) level.

      Anything remotely "idiomatic" (specifically, where the combinatoriality of semantics fails, as it does in your example, where time does not "fly" in the sense that it does not move through the air held aloft by differences in air pressure) starts to generate serious problems.

      Your problem could be solved if the lexicon had in it information about common idioms, which it presumably would, to be functional on any level more colloquial than academic writing. Most linguists would tell you the lexicon really does encode idioms in some fashion too, so this wouldn't be some sort of computational stop-gap.

      So the lexicon has in it "time flies" or something. The parser (or some sublevel of it) would then identify "like" as a metaphorical comparison to the following predicate "an arrow."

      Thus, the TMR would have something to do with time moving briskly towards a target, perhaps.

      I'm not saying this is an entirely feasible option, but read what Tim Berners-Lee is proposing, and see if you find it much more plausible. The amount of information out there people would have to manually encode would preclude the system from having any real functionality beyond keyword search. While I'm not a huge fan of the current implementation of the ontology, I do think future generations could start to sort things out. Its advantage is that once the concept database, the onomasticon, is complete, it should be mostly self-trainable, which is what Berners-Lee's solution lacks.

  23. Not doing it right by vigyanik · · Score: 4, Insightful

    The fact that Tim has been trying for 15 years to sell this idea with little success indicates that his approach is insufficient. He is pitching the idea just like a startup would, giving cool examples and everything. But in practice, all he is doing is proposing and overseeing standards. Developing standards for an idea is not what is required to prove that an idea works. Standards should follow successful technology, not vice versa. You need to have companies that make products professionally and offer complete solutions (i.e. make it work in real-life situations). Doing it for a very simple example that he quotes ("find pictures taken on sunny days") itself is a big, big deal. Perhaps Tim should get involved with companies in this field as an advisor/consultant. You know, there are enough smart people out there who could develop the standards. But there are very few people with his name and recognition to truly ignite commercial interest in his ideas.

    1. Re:Not doing it right by dubious9 · · Score: 4, Insightful

      Perhaps Tim should get involved with companies in this field as an advisor/consultant.

      Um... he invented the WWW and started the W3C. I'd say he's had some experience with companies as an advisor. Take a look at some of the W3C recommendations and look for corporate involvement.

      But in practice, all he is doing is proposing and overseeing standards.

      That's kinda what the W3C *does*.

      Standards should follow successful technology, not vice versa.

      XHTML, XML, XSLT and a lot of other recommendations started as standards that *later* had robust implementations. Technology that starts without standards is often not fully thought out and awkward, and at worst, proprietary. Waiting for technology before standards will only inhibit interoperability and adoption of the standard.

      The fact that Tim has been trying for 15 years to sell this idea with little success indicates that his approach is insufficient.

      I suppose that it has nothing to do with the fact that it's a tremendously difficult and ambitious project. You're right. Anything that takes 15 years to develop should be scrapped.

      --
      Why, o why must the sky fall when I've learned to fly?
  24. Google can leverage its search by PineHall · · Score: 4, Informative

    Here is an account that predicts that Google will leverage its search results to create a Semantic Web. I see this as a distinct possibility. Especially Google leveraging its search results to help people buy and sell stuff.

    1. Re:Google can leverage its search by NoOneInParticular · · Score: 2, Interesting
      The people at Google are probably too smart to buy into yet another failure-to-be from GOFAI (Good Old Fashioned Artificial Intelligence). Next to automatic translation (60s) and expert systems (80s), the semantic web (00s) will soon be found on the garbage heap of technology. Whenever the real world kicks in, crisp logic and deductive reasoning fail simply because they cannot account for uncertainty in the basis of their reasoning: their assumptions. There is no formal way to assert the truthfulness of assumptions (or if you want, ontologies); they are either true or false. That's it, there is no 'maybe' or 'could be' or 'pretty likely': true or false.

      Any form of information found on the web, from whatever trusted source, needs to be evaluated on the likeliness that it is true. From this likeliness, you can start reasoning and finally come up with a conclusion plus a degree of belief in that conclusion, but you will not be able to state that an assumption is absolutely true or false. As crisp logic only leads to valid conclusions assuming absolute truth or falsehood of its assumptions, any conclusion drawn from that meta-assumption is invalid, or at best unqualified.

      No, the aberration called fuzzy logic is no solution.

      Enter the world of Bayesian reasoning. Here a proposition is never absolutely true or false; there are only degrees of belief, a system of systematic and consistent calculations to derive the likeliness of conclusions in the presence of uncertainty, and a method to add new evidence to the calculations. Take a simple crisp assumption: 'The sun always comes up in the morning'. For a semantic webber this statement is either true or false, and whenever two trusted sites claim two opposing views on the matter, the human operator needs to fix the inconsistency. A Bayesian webber might start to reason first: okay, in the absence of any information I will assign the assertion one observation of true and one observation of false. This is my informationless prior and makes the likelihood 50%. Then I'm going to count: every time the sun has come up in the morning I count one for truth of the assertion. If it didn't I count one for falsehood. As I don't remember it ever not happening (and I would have noticed!) I can credit about the number of days I have lived to the truth of the assertion. That's about four 9's of truth. Now I can ask someone else if they ever saw the sun not coming up. Assuming that I trust them 95% to give me the correct answer, I can easily add a few extra nines to my belief in the assertion. Also reading some physics books adds to my belief, up to the point that it will take quite a lot of conflicting evidence to make me doubt that particular assertion.
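
      A rough sketch of that counting argument, under stated assumptions: the "one true, one false" prior is read here as one pseudo-observation each way, the 10,000 days and the helper names are made up for illustration, and the 95% trust figure is applied as a plain Bayes' rule update.

```python
from fractions import Fraction


def posterior_belief(true_count, false_count):
    """Belief in a yes/no claim, starting from one pseudo-observation each way."""
    return Fraction(true_count + 1, true_count + false_count + 2)


def bayes_update(prior, p_report_if_true, p_report_if_false):
    """Apply Bayes' rule after hearing a report that supports the claim."""
    numerator = prior * p_report_if_true
    return numerator / (numerator + (1 - prior) * p_report_if_false)


DAYS_OBSERVED = 10_000  # assumed: roughly 27 years of sunrises, none missed

no_information = posterior_belief(0, 0)       # the 50% starting point
belief = posterior_belief(DAYS_OBSERVED, 0)   # after counting sunrises

# A witness trusted at 95% confirms the claim.
updated = bayes_update(belief, Fraction(95, 100), Fraction(5, 100))

# A belief of exactly 1 cannot be moved, even by a contradicting report.
stuck = bayes_update(Fraction(1), Fraction(5, 100), Fraction(95, 100))

print(float(no_information), float(belief), float(updated), float(stuck))
```

      Exact fractions are used so the numbers can be checked by hand. The last line illustrates why a prior of exactly 1 or 0 is a dead end: no amount of evidence changes it.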

      It might be interesting to note that from this strong belief in the assertion I can actually deduce that somebody who tells me otherwise is very likely lying to me, and I should watch whatever the person says. A semantic web will fall flat on its face when there are conflicting pieces of information or outright lies on 'trusted' webpages.

      Note that the two approaches are completely at odds: for the crisp logic approach everything is either true or false, for the Bayesian logic approach nothing is purely true or false(*). The Bayesian approach is well-known, but can easily lead to computational explosions. However, it seems to be the only way to reason in a world where evidence can (will) be contradictory and assumptions cannot be trusted. Without a consistent framework of reasoning with uncertainty (and the Bayesian framework is provably the only consistent one), the semantic web will be yet another failure of AI.

      (*) Bayesian probabilities can be completely true or false (1 or 0), but no-one in his right mind would do that, because from there no mathematical way exists to change your belief. 20 9's should be enough for anybody.

  25. Re:The rest of us call this... by bongoras · · Score: 4, Insightful
    The Semantic web represents relationships between data based on metadata (i.e. data about data). This is a far more powerful way to describe the meaning of data.

    And this is what makes me wonder if this will amount to much more than an interesting research project for grad students. In order for the SemWeb to amount to anything useful, everyone is going to have to include the metadata necessary to integrate their data into the Semantic Web. How's that going to work? Who's going to make it work?

  26. Will the "splash screen"... by jbarr · · Score: 2, Funny

    ...have the words "Don't Panic" prominently displayed?

    --
    My mom always said, "Jim, you're 1 in a million." Given the current population, there are 7000 of me. God help us all!
  27. Tagging vs. Understanding Context by saddino · · Score: 2, Interesting

    The common thread to the Semantic Web is that there's lots of information out there--financial information, weather information, corporate information--on databases, spreadsheets, and websites that you can read but you can't manipulate. The key thing is that this data exists, but the computers don't know what it is and how it interrelates. You can't write programs to use it.

    IMHO, the problem with the Semantic Web is the same problem that evolved the Web from a linked knowledge store to a commercial-driven directory.

    Yes, it would be nice if all data were tagged and understandable, but let's be honest: the commercialization (and its result: exploitation by marketers) of the web would certainly spill into the Semantic Web, and so Berners-Lee's vision would be once again ruined by 1) incorrect/misleading tagging, 2) competing standards and 3) out and out fraud.

    I assume what Berners-Lee really wants is for a machine to truly understand that, using his example: something is a calendar, and that you are interested in it, and that you should add the event to your schedule and then book a flight for it.

    But the chances are -- one day -- machines will be able to understand how data is typed by understanding the context around it (just as a human would go through the aforementioned process manually).

    Obviously, this type of reading "comprehension" is a long way off, but the "search engine wars" are resulting in a lot of mind power thrown at the problem of understanding context. And I'm guessing it'll be a reality before anything as pure as the vision for the Semantic Web is realized.

    (and to throw in a plug for my own company's attempt at understanding web context: theConcept).

  28. Second System Effect by xleeko · · Score: 4, Insightful

    I've been hearing noise about the semantic web, RDF, and what not for years now, and every time I do, the first thing that pops into my head is "Second System Effect".

    He got lucky once, because he put together some tools that were simple and straightforward enough for people to pick it up quickly, thereby avoiding the fate of the dozens of other hypertext systems going back to the late 1980's.

    Now, like all second systems, he wants to "do it right", over-engineering away all of the things that made the first one take off ...

    Just my opinionated rant ...

  29. Re:The rest of us call this... by j1m+5n0w · · Score: 3, Interesting

    Google identifies relationships between data using only the links between pages containing the data.

    The Semantic web represents relationships between data based on metadata (i.e. data about data). This is a far more powerful way to describe the meaning of data.

    This is an important point. Google computes the PageRank of a page based on the eigenvector of the web link matrix, which is a clever and usually effective approach. Unfortunately, each link only conveys a little bit of information. A link from page A to page B is assumed to be an endorsement of page B's relevance by page A. But what if you could add extra metadata to the links? Not just a URL and a human-readable text label, but a machine-readable label as well, like this?

    <a href="http://slashdot.org" relevance="0.3" novelty="0.8" accuracy="-0.2" funny="0.2"> slashdot </a>

    If you could apply arbitrary attributes to web pages, Google would have much better information to go on, and a user could specify the importance of certain attributes depending on what he/she is looking for.
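
    For the plain, unlabelled case, the eigenvector idea can be sketched with power iteration. This is a toy version, not Google's implementation: the three-page web, the damping factor of 0.85 and the function name are all assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    `links` maps each page to the list of pages it links to; every link
    target must itself appear as a key.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical three-page web: A and C both link to B, B links back to A.
LINKS = {"A": ["B"], "B": ["A"], "C": ["B"]}
ranks = pagerank(LINKS)
print(ranks)
```

    Each link here carries the same weight. Per-link attributes like the ones above could, in principle, be used to scale the share a link passes on, but that part is left out of the sketch.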

    -jim

  30. Re:The rest of us call this... by JimDabell · · Score: 3, Interesting

    Google's a hack. No, really, it tries to extract meaning from web pages that really aren't engineered to store that kind of information.

    Google is also an application. The Semantic Web is all about building the infrastructure so applications like Google don't have to chase the holy grail of AI to become more than a hack. Think of the Semantic Web as the layer underneath Google.

  31. Actually, Google is a search engine by wombatmobile · · Score: 4, Informative

    The rest of us call this... GOOGLE.

    Google searches undifferentiated text. In contrast, the semantic web is all about differentiating text by adding meta tags.

    For example, the word "Hilton" on a web page is ambiguous. It could be a hotel, or a celebrity. Which is it? With the semantic web we'd know:

    <hotel>
    Hilton
    </hotel>

    <celebrity>
    Hilton
    </celebrity>

    Of course, this is a fairly trivial example. A more meaningful example:

    <partnumber>
    LHMJ67523119900012
    </partnumber>
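
    A program could then pick out the right "Hilton" by element name instead of guessing from the surrounding text. A minimal sketch using Python's standard XML parser; the wrapping <page> element and the element names are illustrative only, not a real vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the element names are illustrative, not a real vocabulary.
PAGE = """
<page>
  <hotel>Hilton</hotel>
  <celebrity>Hilton</celebrity>
  <partnumber>LHMJ67523119900012</partnumber>
</page>
"""


def find_by_kind(xml_text, kind):
    """Return the text of every element whose tag equals `kind`."""
    root = ET.fromstring(xml_text.strip())
    return [(element.text or "").strip() for element in root.iter(kind)]


hotels = find_by_kind(PAGE, "hotel")
celebrities = find_by_kind(PAGE, "celebrity")
parts = find_by_kind(PAGE, "partnumber")
print(hotels, celebrities, parts)
```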
    1. Re:Actually, Google is a search engine by wdavies · · Score: 2, Interesting

      Perhaps I meant generalized.

      JVN quote. That was his exact point. You can model very restricted subsets successfully, but the whole thing is too much to encode. I've no problem with designing data structures. It's just when someone says data structures solve the grand AI problem that I have an issue.

      Sure, you want to do an XML schema for books - go ahead. For CDs, sure. In fact, for any domain. Although bear in mind the documentation of the API is going to get bigger and bigger until it is unmanageable (or you end up with natural language, and we are back where we started, using IR techniques...).

  32. Why this is a bad idea - it's a taxonomy by Animats · · Score: 4, Insightful
    The big problem with the so-called "semantic web" is that trying to taxonomize ideas doesn't work very well. Full-text search works much better.

    In the beginning, we had library card catalogs, with their painful attempts to index and cross-reference books. That works well in some areas, typically ones where names of people are significant. Attempts to apply the same approaches to technical papers worked less well.

    There's a very elaborate classification system for patents. When you had to look through patents on paper or microfilm, it was essential. Now that we have full text search, it's used less and less.

    A modern example of this approach is the ACM Taxonomy, a structure into which all computer science can be fitted. (As an exercise, try to put the current Slashdot stories into that taxonomy.) Nobody actually uses that taxonomy to find anything.

    As to data interchangeability, that's a separate issue, and more of a standards one. The big problem for publicly available data is that the cost of encoding the data is borne by different people than those who benefit from the encoding. Many companies don't like having all their product and pricing information easily searchable by price. (Froogle may change this, because Google has so much clout.)

    I've spent some time dealing with public financial reporting. There's opposition to detailed disclosure in a standardized format. Many companies don't want their detailed information to be too easily analyzed. Embarrassing results show up.

    The future is better search engines, not user-created indexing data. As we've painfully learned, a search engine must look at the same data a human reader would, or it will be lied to. Lied to to the point of uselessness.

  33. And the bigger problem: Trust by SoTuA · · Score: 2, Interesting
    Standards for metadata have been implemented, and it's true that people can't be bothered to mark up their pages, but the bigger problem is trust: how do you know that the metadata is true? It is the same as on the web right now: with no other references you can't know whether the data is right, although, being a human being, you can judge the quality of the data (i.e. a properly written study that states that X is better than Y will garner more trust/respect than a document written in "OMFG X is tEh r0x Y is the Zux0rz!!!!111!1111!!1one and onety-one" style). A computer reading the metadata is another matter entirely.

    Trust is one of the major stumbling blocks of semantic applications and automatic knowledge management issues.

  34. Re:No, there's something there by Allen+Zadr · · Score: 2, Insightful
    Your faith in computational logic is astounding. Not to say that you may not be right, but you simply dismiss the possibility that 'shady' logic relationships such as this one would occur, especially when there are billions of similar relationships.

    By your declaring such functionality to be an error of logic does not (in my view) make it less likely.

    Back to my very example... the 'scams and cheats' property assertion of an online gamer against my account number is, by definition, a semantic inference, unless a human jumps to the various links that make up the conclusion. Couple this with the very fact that my fictional search would be along the lines of 'transaction trust', and the property does apply to the query.

    Basically that is the point. It is broken beyond usable functionality. It cannot make the conclusions advertised. It can link to points to help a human create valid conclusions.

    --
    Kinetic stupidity has a new brand leader: Allen Zadr.
  35. Re:The rest of us call this... by mr_majestyk · · Score: 3, Informative

    And what happens when people start misusing the metadata like the current meta tags?

    The Semantic Web just provides a method for expressing metadata. Maintaining the integrity of those expressions involves a different set of problems. Some of the solutions include trust metrics like Slashdot's own distributed moderation (PDF) or Advogato.

  36. The need for information management pops up again. by master_p · · Score: 4, Interesting

    If you have followed this little crazy guy that is me, you may have seen that most of today's computer problems are because modern operating systems offer nothing in the information management department.

    Remember the CVS story a couple of days ago? It's information management: http://slashdot.org/comments.pl?sid=123076&cid=10347565

    WinFS is also about information management: http://slashdot.org/comments.pl?sid=121101&cid=10199083

    The story that the Evolution e-mail client offers the e-mail data as a data model separate from the application? another information management issue.

    The web? information management issue.

    Distributed databases? information management issue.

    Web search engines? information management issue.

    Windows search tool? information management issue.

    The Windows registry? information management issue.

    The unix etc directory? information management issue.

    Enterprise workflows? again, an information management issue. That's why there is no general workflow solution accepted and used worldwide.

    Dynamic web site contents? information management issue.

    The semantic web? another information management issue!

    As you can see from the numerous examples given above, what an operating system should do, but none does, is manage information instead of files. If that is coupled with a distributed networked environment, 90% of the world's software would be considered obsolete overnight and the productivity and fun from using computers would increase tenfold.

    If any open source developer is reading this, you may contact me for a private discussion on the idea. THIS IS OPEN SOURCE'S BIGGEST CHANCE TO LEAD THE TECHNOLOGICAL RACE!

  37. Nice Try, Tim by Master+of+Transhuman · · Score: 2, Insightful

    As you do note in your comments, however, it's not really doable without a good simulation of conceptual processing.

    Still, every little bit helps. Certainly a "Semantic Web" would be more useful than the current one.

    --
    Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
  38. SemWeb == Huge Prolog program by calambrac · · Score: 2, Interesting

    The semantic web sounds a little like a massively distributed Prolog program, with each separate semweb component defining a rule or relation, and each semweb-aware program just being a query into the environment... Other questions: how do you avoid redundancies, or pulling data you don't want, or keeping data confined to specific locales or interpretations, or keeping labels synced with the actual data? What prevents someone from declaring something foo when it's actually bar?
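
    To make the Prolog comparison concrete, here is a toy sketch in Python rather than Prolog: facts are plain (subject, relation, object) tuples and a query is a pattern in which None plays the role of an unbound variable. The facts and names are invented for illustration, and the sketch does nothing about the trust or redundancy questions raised above.

```python
# Hypothetical facts published by different sites, as (subject, relation, object).
FACTS = [
    ("hilton", "is_a", "hotel"),
    ("acme_widget", "is_a", "product"),
    ("acme_widget", "sold_by", "host1"),
]


def query(facts, subject=None, relation=None, obj=None):
    """Return facts matching the pattern; None acts as a wildcard."""
    return [
        fact
        for fact in facts
        if (subject is None or fact[0] == subject)
        and (relation is None or fact[1] == relation)
        and (obj is None or fact[2] == obj)
    ]


everything_typed = query(FACTS, relation="is_a")
who_sells_widget = query(FACTS, subject="acme_widget", relation="sold_by")
print(everything_typed, who_sells_widget)
```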

  39. need standardization? by yonyonson · · Score: 2, Insightful

    for data to be shared and recognized as distinct fields of information, won't there need to be standardization across all hosts in order to use the data in any comprehensible way?

    e.g.
    <product>
    Acme(tm) xxxxx
    </product>
    on host #1
    while on host #2 the same item is recognized as:
    <saleitem>
    Acme(tm) xxxxx
    </saleitem>
    how will the semantic web describe and relate items which are recognized as an item for sale but under different labels?
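
    One conceivable answer, sketched under assumptions: somebody publishes an explicit mapping that declares the two labels equivalent, and tools consult it before comparing items. The mapping, the shared concept name and the helper functions below are hypothetical.

```python
# Hypothetical mapping from each host's local label to a shared concept name.
EQUIVALENT_LABELS = {
    "product": "item_for_sale",
    "saleitem": "item_for_sale",
}


def canonical_label(label):
    """Map a host-specific label to the shared concept, or keep it unchanged."""
    return EQUIVALENT_LABELS.get(label, label)


def same_concept(label_a, label_b):
    """True when two labels resolve to the same shared concept."""
    return canonical_label(label_a) == canonical_label(label_b)


print(same_concept("product", "saleitem"))
print(same_concept("product", "celebrity"))
```

    This only moves the standardization problem into the mapping itself: someone still has to write it and keep it honest.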