Tim Berners-Lee and the Semantic Web
An anonymous reader writes "As we all know, Tim Berners-Lee is the hero of the Web's creation story--he conjured up this system and chose not to capitalize on it commercially. It turns out that Sir Tim (he was knighted by Queen Elizabeth II in July) had a much grander plan in mind all along--a little something he calls the Semantic Web that would enable computers to extract meaning from far-flung information as easily as today's Internet links individual documents. In an interview with Technology Review, the Web-maestro explains his vision of 'a single Web of meaning, about everything and for everyone.'"
This is to insure against a monoculture that is so disastrous in computer circles as demonstrated by the numerous security failings of Windows...
The rest of us call this... GOOGLE.
works for me.
The extra work required to put data into a standard data format won't be done. People can't be bothered to make their pages W3C compliant (even Slashdot). The second problem is that a large community can rarely agree on data formats. Look at how many calendar event and news feed formats there are.
Ulrik
As has been stated many times, content producers will spoof semantic data just like they used to with the META tag...which is why no one uses the META tag anymore. Relevance algorithms take into account link analysis and statistical text analysis to provide a much more truthful representation of what data is there. Sorry Tim.
The fact that Tim has been trying for 15 years to sell this idea with little success indicates that his approach is insufficient. He is pitching the idea just like a startup would, giving cool examples and everything. But in practice, all he is doing is proposing and overseeing standards. Developing standards for an idea is not what is required to prove that the idea works. Standards should follow successful technology, not vice versa. You need companies that make products professionally and offer complete solutions (i.e. make it work in real-life situations). Doing it even for the very simple example he quotes ("find pictures taken on sunny days") is a big, big deal. Perhaps Tim should get involved with companies in this field as an advisor/consultant. You know, there are enough smart people out there who could develop the standards. But there are very few people with his name and recognition who could truly ignite commercial interest in his ideas.
I've been hearing noise about the semantic web, RDF, and whatnot for years now, and every time I do, the first thing that pops into my head is "Second System Effect".
He got lucky once, because he put together some tools that were simple and straightforward enough for people to pick it up quickly, thereby avoiding the fate of the dozens of other hypertext systems going back to the late 1980's.
Now, like all second systems, he wants to "do it right", over-engineering away all of the things that made the first one take off ...
Just my opinionated rant ...
semantic web allows people to publish their own ontologies, and the best tools should be those that learn to extract interesting info from various sources.
That's right. More to the point, the system supports many ontologies, and allows the best ontologies to rise to the top.
All information that is subjective is a poor candidate for the symantec web. All information that is quickly subject to change is a poor candidate for the symantec web. When mixing subjective (verb) pointers to a given truth on a large scale, modified by objective pointers, where even one of many thousands is false (or mis-keyed), the overall meaning can become quickly subverted.
In other words, if I get enough people to post somewhere that Allen Zadr lives in New Mexico, the multiple verbs that would otherwise point to the actual fact -- there is no Allen Zadr -- would be subverted. That is, unless you could syntactically link Allen Zadr to an actual human being.
Even more simply, the semantic web is only as good as its data. It's not very difficult to get a well-trusted source to assert a truth while avoiding the linking details, thus presenting users with a subverted view of reality. It has many flaws, and many promises. It won't fail, but it will never be better or worse than the existing systems, just different.
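To make the subversion point concrete, here is a minimal sketch, not anything from the Semantic Web specs. The source names, the trust scores and the `resolve` helper are all made up for illustration. It shows how a simple vote count over assertions is swamped by repeated false claims, and how weighting by source trust changes the answer.

```python
from collections import defaultdict

# Hypothetical claims: (source, subject, predicate, value).
claims = [
    ("forum_post_1", "Allen Zadr", "lives_in", "New Mexico"),
    ("forum_post_2", "Allen Zadr", "lives_in", "New Mexico"),
    ("forum_post_3", "Allen Zadr", "lives_in", "New Mexico"),
    ("civil_registry", "Allen Zadr", "lives_in", "nowhere (no such person)"),
]

# Hypothetical trust scores; sources not listed get DEFAULT_TRUST.
trust = {"civil_registry": 5.0}
DEFAULT_TRUST = 1.0


def resolve(claims, subject, predicate, weights=None):
    """Return the value with the highest total weight for (subject, predicate).

    With weights=None every claim counts as one vote.
    """
    totals = defaultdict(float)
    for source, s, p, value in claims:
        if s == subject and p == predicate:
            w = 1.0 if weights is None else weights.get(source, DEFAULT_TRUST)
            totals[value] += w
    return max(totals, key=totals.get)


naive = resolve(claims, "Allen Zadr", "lives_in")            # one claim, one vote
weighted = resolve(claims, "Allen Zadr", "lives_in", trust)  # trust-weighted
print(naive)
print(weighted)
```

Of course the trust table is itself just more data, so it is open to the same kind of subversion.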
Kinetic stupidity has a new brand leader: Allen Zadr.
Seriously though, this could be really cool, but I imagine that it could have some very adverse effects on privacy, given the amount of information that finds its way onto the web. Items that are kept apart only by obscurity, in disparate places, would be easily linked into a single profile (if the stuff he's talking about isn't primarily smoke and mirrors). Either way, like any powerful technology, it will have both good and bad consequences. Here's hoping for the good...
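A rough sketch of the linking worry, assuming records from different sites share some machine-readable identifier. The sites, the `handle` field and the `link_profiles` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records scattered across unrelated sites.
records = [
    {"source": "game_forum", "handle": "azadr", "note": "accused of item scamming"},
    {"source": "news_site", "handle": "azadr", "note": "posts insightful comments"},
    {"source": "parking_registry", "handle": "azadr", "note": "parks in lot B"},
    {"source": "game_forum", "handle": "someone_else", "note": "trades fairly"},
]


def link_profiles(records, key):
    """Group records that share the same value for `key` into one profile."""
    profiles = defaultdict(list)
    for record in records:
        if key in record:
            profiles[record[key]].append(record)
    return dict(profiles)


profiles = link_profiles(records, "handle")
for entry in profiles["azadr"]:
    print(entry["source"], "->", entry["note"])
```

Once the shared key exists, the join is trivial; today it is mostly the lack of such a key that keeps these records apart.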
People would do well to note the principle: Security by obscurity isn't.
No, computers don't need meaning to handle data. Computers need syntax and rules for how to act on syntactic structures. The semantic web is founded on the hope that enough syntax thrown at huge amounts of data magically turns into semantics.
It's based on the assumption that all semantics can be explained by syntax. So far this has not been proven, and all attempts to get there got stuck somewhere and turned into something different, sometimes useful (Chomsky's grammars), sometimes not so useful.
The semantic web would have to deal with the laziness of people who can't be bothered to write meaningful ALT attributes for their image tags. It can try to guess at some of the semantics, but it can also easily be fooled. Everyone who has ever tried to use content filters on an internet connection knows what I am talking about. Lots of harmless sites are rejected as false positives and hundreds of questionable sites get through, because the syntax of a site alone doesn't help with evaluating the semantics (the meaning) of that site.
That is to say, I may be an item scammer in online gaming realms, or in Diablo, but not in EverQuest. However, I may be one of the most honest people I know in the real world. Perhaps I have a second account that I use to Troll on Slashdot, but otherwise have this account where I try to post insightful information. You have the right to link these things, you may even have the right to link these to real world data like where I work and where I park my car. However, if I jilted someone in Diablo, do I want them to so easily find me and take it out on my car (as some people would)?
Do I want my employer having instant access to all of my online transactions, regardless if I'm on shift or off shift at the time? Individually, these are not things that have been considered something you would even want to 'secure', yet they may be valuable to someone.
Ah, but what constitutes privacy but an obscurity of your own behaviors in certain circles.
I would disagree. I would say privacy is more like cryptography, in that privacy is the ability to control who knows certain information. So privacy is confidentiality.
That is to say, I may be an item scammer in online gaming realms, or in Diablo, but not in EverQuest. However, I may be one of the most honest people I know in the real world. Perhaps I have a second account that I use to Troll on Slashdot, but otherwise have this account where I try to post insightful information. You have the right to link these things, you may even have the right to link these to real world data like where I work and where I park my car. However, if I jilted someone in Diablo, do I want them to so easily find me and take it out on my car (as some people would)?
Well, this goes off on a tangent. I would argue that you're making an incorrect metaphysical and/or epistemological distinction in dividing your "virtual" and "real" personas. What is ethical in one is ethical in the other, and vice versa.
Do I want my employer having instant access to all of my online transactions, regardless if I'm on shift or off shift at the time? Individually, these are not things that have been considered something you would even want to 'secure', yet they may be valuable to someone.
Kind of another tangent. If you're using your employer's network then legally you've pretty much given up the right to privacy. My suggestion would be not to use company computers to do anything that you wouldn't want them looking at.
1. The Semantic Web (or rather, constructing an ontology and the relationships between your local ontology and other ontologies) is complicated and time consuming, and requires you to decipher lots of other people's stuff in order to connect your own to it. Ultimately, any new technology, especially one that needs widespread adoption to be useful, must be easy enough to adopt that people actually adopt it. RSS, HTML and other successful technologies allow you to focus your effort on the local endeavour and don't require tons of formalized, structured organization of data, which runs somewhat counter to human nature. They are thus substantially less labor intensive to implement, and have therefore been taken up quite rapidly. This argument I consider to be perfectly valid and fairly strong.
2. Trust of ontological data is a critical issue because lots of false assertions and mediocre data will inevitably creep into a large, distributed "semantic web". This is a problem with the web currently, and you definitely have to take everything you read with a grain of salt, trust certain sources more than others, and so on. I think this argument holds some water, but I think this problem is addressable.
Personally, I think it will ultimately be easier to implement something like Cyc to build structured knowledge networks from information in human grokkable form. The internal representation of a Cyc-like machine will probably look quite similar to the semantic web, including the ability to adjust world view, evaluate source material reliability, etc. Getting a machine to build this knowledge representation, despite all the ambiguities of human expression, is more likely to succeed and be useful to humanity (IMHO) than getting lots of humans to interact with computers and technology in a structured, logical fashion. This is not to say that there aren't applications where structured ontological data would work well.
I particularly like the idea of auto-translation between different structured data formats, but I do agree with Clay that it's more likely that businesses will construct isolated "island" ontologies (such as a specific XML schema for describing formatted data) and deal with translation to other formats on an ad-hoc basis, for simple resource allocation and cost reasons.
Your argument (pro) seems to rely on the idea that tools will make things easier. I can't help but think of 4GL programming, SQL and other attempts to make programming accessible to "average" people. The fact is that good tools make things easier, but only certain people, or people trained to do so, can really think in a structured, logical fashion and express that in a way a computer understands. No effort to handwave that issue away to "tools" has ever succeeded. Tools can help, but they are not a panacea. HTML is so successful and widespread because it's simple to edit, as it only requires basic visual thinking to understand, and tools let you skip the intermediate step and edit the visual representation directly.
The concept of editing semantic information is fundamentally not so simple, because humans don't formalize their thinking about relationships on a day-to-day basis. Like visual mapping tools for XML, they may make things slightly easier, but I wouldn't expect any magic. Like I said, I think that we will ultimately end up there, but I believe it will be approached from the other direction.
In the beginning, we had library card catalogs, with their painful attempts to index and cross-reference books. That works well in some areas, typically ones where names of people are significant. Attempts to apply the same approaches to technical papers worked less well.
There's a very elaborate classification system for patents. When you had to look through patents on paper or microfilm, it was essential. Now that we have full text search, it's used less and less.
A modern example of this approach is the ACM Taxonomy, a structure into which all computer science can be fitted. (As an exercise, try to put the current Slashdot stories into that taxonomy.) Nobody actually uses that taxonomy to find anything.
As to data interchangeability, that's a separate issue, and more of a standards one. The big problem for publicly available data is that the cost of encoding the data is borne by different people than those who benefit from the encoding. Many companies don't like having all their product and pricing information easily searchable by price. (Froogle may change this, because Google has so much clout.)
I've spent some time dealing with public financial reporting. There's opposition to detailed disclosure in a standardized format. Many companies don't want their detailed information to be too easily analyzed. Embarrassing results show up.
The future is better search engines, not user-created indexing data. As we've painfully learned, a search engine must look at the same data a human reader would, or it will be lied to. Lied to to the point of uselessness.
Your declaring such functionality to be an error of logic does not (in my view) make it any less likely.
Back to my very example... the 'scams and cheats' property assertion of an online gamer against my account number is, by definition, a semantic inference, unless a human follows the various links that make up the conclusion. Couple this with the fact that my fictional search would be along the lines of 'transaction trust', and the property does apply to the query.
Basically that is the point. It is broken beyond usable functionality. It cannot make the conclusions advertised. It can link to points to help a human create valid conclusions.
As you do note in your comments, however, it's not really doable without a good simulation of conceptual processing.
Still, every little bit helps. Certainly a "Semantic Web" would be more useful than the current one.
Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
His writings appear to have some uncorrected logical fallacies.
You can conclude the following from those statements:
- Count Dracula is not real
- Count Dracula lives in a region of Romania
I'd like to see the mystery step that combines these to conclude that Romania isn't real; at most, you could say that Romania houses something that isn't real. The conclusion he makes isn't supported by any logic.
More importantly, these are dumbed-down semantics. The assertion that a fictional character lives somewhere real needs to be qualified that this occurs in a certain set of fictional stories, not real life. The fact that these unqualified statements are represented in this example ontology means that the ontology is insufficient, not that this method isn't useful.
Another example in that article: This is even factually incorrect. The First Amendment doesn't actually say anything about US citizens; it restricts the US Congress from certain actions, period, not for certain people.
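As a rough illustration of the qualification point, here is a sketch of a tiny fact store where every statement carries a context. The tuple layout, the context names and the single inference rule are my own assumptions, and "Transylvania" just stands in for the unnamed region. Forward chaining derives that Dracula lives in Romania within the fiction, and nothing at all about whether Romania is real.

```python
# Each fact is (subject, predicate, object, context). The context labels are
# hypothetical: "world" for real life, "fiction:Dracula" for the stories.
facts = {
    ("Count Dracula", "is_real", False, "world"),
    ("Count Dracula", "lives_in", "Transylvania", "fiction:Dracula"),
    ("Transylvania", "region_of", "Romania", "world"),
}


def infer(facts):
    """Forward-chain one rule until nothing new appears:

    lives_in(X, R) in context ctx, and region_of(R, C)
        => lives_in(X, C) in the same context ctx.
    """
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for (x, p1, r, ctx) in list(derived):
            if p1 != "lives_in":
                continue
            for (r2, p2, c, _ctx2) in list(derived):
                if p2 == "region_of" and r2 == r:
                    new_fact = (x, "lives_in", c, ctx)
                    if new_fact not in derived:
                        derived.add(new_fact)
                        changed = True
    return derived


closure = infer(facts)
# Any derived statement about Romania's reality would show up here.
romania_reality = [f for f in closure if f[0] == "Romania" and f[1] == "is_real"]
print(("Count Dracula", "lives_in", "Romania", "fiction:Dracula") in closure)
print(romania_reality)
```

The only way to get "Romania is not real" out of this is to add a rule saying that unreality spreads from resident to residence, and nobody would write that rule on purpose.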
Ignoring this, you can make one conclusion and reduce this to the following:
- the First Amendment covers the rights of people
- Nike is protected by the First Amendment
Concluding that Nike is a person from this is a logical fallacy. (Nothing in these logical statements says the First Amendment might not also cover the disposition of small peanut butter sandwiches with blueberry jam, which set Nike might then be an element of.)
I find it hard to treat this article with much weight, given its fast-and-loose treatment of logic and ontological assertions.
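The fallacy here is affirming the consequent, and it can be checked mechanically. Below is a small brute-force truth-table checker, only a sketch; `implies` and `is_valid` are helper names I made up. P stands for "X is a person" and Q for "X is protected by the First Amendment".

```python
from itertools import product


def implies(a, b):
    """Material implication: a -> b."""
    return (not a) or b


def is_valid(premises, conclusion, n_vars):
    """Valid means: no truth assignment makes every premise true
    while the conclusion is false."""
    for values in product([False, True], repeat=n_vars):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False
    return True


# Affirming the consequent: from (P -> Q) and Q, conclude P.
affirming_consequent = is_valid(
    [lambda p, q: implies(p, q), lambda p, q: q],
    lambda p, q: p,
    2,
)

# Modus ponens, for comparison: from (P -> Q) and P, conclude Q.
modus_ponens = is_valid(
    [lambda p, q: implies(p, q), lambda p, q: p],
    lambda p, q: q,
    2,
)

print(affirming_consequent)  # False: P false, Q true is a counterexample
print(modus_ponens)          # True
```

The counterexample row (P false, Q true) is exactly the peanut butter sandwich case: something covered by the amendment that is not a person.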
for data to be shared and recognized as distinct fields of information, won't there need to be standardization across all hosts in order to use the data in any comprehensible way?
i.e. on host #1 while on host #2 the same item is recognized as:
how will the semantic web describe and relate items which are recognized as an item for sale but under different labels?
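One common answer, sketched here under my own assumptions, is that nobody standardizes the labels themselves; instead each host (or a third party) publishes a mapping from its local labels to a shared vocabulary. The record fields, host names and `normalize` helper below are hypothetical. In OWL the same idea is expressed with constructs such as `owl:equivalentProperty`.

```python
# Hypothetical records: two hosts describe the same item for sale
# using different labels.
host1_record = {"product_name": "Widget", "price_usd": "9.99"}
host2_record = {"item": "Widget", "cost": "9.99", "currency": "USD"}

# Hypothetical per-host mappings from local labels to shared terms.
MAPPINGS = {
    "host1": {"product_name": "name", "price_usd": "price"},
    "host2": {"item": "name", "cost": "price"},
}


def normalize(host, record):
    """Rewrite a record's keys into the shared vocabulary,
    dropping any field the mapping does not cover."""
    mapping = MAPPINGS[host]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


a = normalize("host1", host1_record)
b = normalize("host2", host2_record)
print(a == b)
```

Someone still has to write and maintain each mapping, which is the cost the other posters are complaining about.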
Rule 2 does not provide any information about the reality of its parameters. Stating things a bit more formally:
These aren't rules, they're statements providing one-way inferences. You may only create forward logic chains. There aren't really any interesting conclusions you can come up with from this, apart from being able to state that some unreal things live in Romania.
Shirky gives examples of some of Dodgson's syllogisms (and Dodgson is a master among logicians). Dodgson's syllogisms are interesting because they're based around rules. Take the one about poems:
He uses generic statements, rather than absolute statements. You can see this if I restate it:
Notice that all these rules have to be specified in generic terms. We have equations we can manipulate. This means we can use them. There's a rule that ~A IMPLIES ~B == B IMPLIES A which lets us restate as follows:
And from here it's just a matter of substituting in, since (A IMPLIES B) AND (B IMPLIES C) gives (A IMPLIES C). This means that we can prove that your poems are modern, affected and uninteresting, but popular.
You need the statements to provide the fundamental information, and the rules to let you manipulate that information. (Dodgson avoids needing a statement by using rule 2 instead; it would work just as well had rule 2 been ~isInteresting(yourPoem), but that would only let you prove that yourPoem was uninteresting, not that all your poems are uninteresting.)
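For anyone who wants to check the two manipulation rules rather than take them on faith, here is a brute-force sketch; the helper names are mine. It tests the standard contrapositive law, (~A IMPLIES ~B) is equivalent to (B IMPLIES A), and the chaining law, (A IMPLIES B) AND (B IMPLIES C) entails (A IMPLIES C), over every truth assignment.

```python
from itertools import product


def implies(a, b):
    """Material implication: a -> b."""
    return (not a) or b


def equivalent(f, g, n_vars):
    """True if f and g agree on every truth assignment."""
    return all(f(*v) == g(*v) for v in product([False, True], repeat=n_vars))


def tautology(f, n_vars):
    """True if f holds on every truth assignment."""
    return all(f(*v) for v in product([False, True], repeat=n_vars))


# Contrapositive: (~A -> ~B) == (B -> A)
contrapositive_ok = equivalent(
    lambda a, b: implies(not a, not b),
    lambda a, b: implies(b, a),
    2,
)

# Chaining (hypothetical syllogism): ((A -> B) and (B -> C)) -> (A -> C)
chaining_ok = tautology(
    lambda a, b, c: implies(implies(a, b) and implies(b, c), implies(a, c)),
    3,
)

print(contrapositive_ok, chaining_ok)
```

Both come out true, which is all a rule engine needs in order to rewrite and chain generic rules of this shape.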
Shirky's trying to discredit the Semantic Web by using a syllogism of his own, that goes like this:
From this he's trying to draw the erroneous conclusion that the Semantic Web is useless. I leave the problem with this as an exercise to the reader.
Seeing as he is apparently trained in this stuff, which I am not, this makes me think that he is either (a) incompetent or (b) deliberately trying to mislead people. Either way, I don't trust his logic.
Well... I actually wrote a paper lambasting the ontology for precisely what you bring up here. Specifically, I was working from a draft of Adele Goldberg & Ray Jackendoff's paper "The English resultative as a family of constructions" (_Language_ vol. 80 no. 3, September 2004). It deals with strange things like
"The trolley rumbled through the city"
and led me to believe Victor's ontological approach would have some serious problems encoding this if it didn't have a more attuned syntax processor. It wasn't a good paper, but I made my point, and you bring up a similar idea on a more basic (and thus, even more problematic) level.
Anything remotely "idiomatic" (specifically, where the combinatoriality of semantics fails, as it does in your example, where time does not "fly" in the sense that it does not move through the air held aloft by differences in air pressure) starts to generate serious problems.
Your problem could be solved if the lexicon had in it information about common idioms, which it presumably would, to be functional on any level more colloquial than academic writing. Most linguists would tell you the lexicon really does encode idioms in some fashion too, so this wouldn't be some sort of computational stop-gap.
So the lexicon has in it "time flies" or something. The parser (or some sublevel of it) would then identify "like" as a metaphorical comparison to the following predicate "an arrow."
Thus, the TMR would have something to do with time moving briskly towards a target, perhaps.
I'm not saying this is an entirely feasible option, but read what Tim Berners-Lee is proposing, and see if you find it much more plausible. The amount of information out there people would have to manually encode would preclude the system from having any real functionality beyond keyword search. While I'm not a huge fan of the current implementation of the ontology, I do think future generations could start to sort things out. Its advantage is that once the concept database, the onomasticon, is complete, it should be mostly self-trainable, which is what Berners-Lee's solution lacks.
Didn't they come up with a few viruses for it though?