Tim Berners-Lee and the Semantic Web
An anonymous reader writes "As we all know, Tim Berners-Lee is the hero of the Web's creation story--he conjured up this system and chose not to capitalize on it commercially. It turns out that Sir Tim (he was knighted by Queen Elizabeth II in July) had a much grander plan in mind all along--a little something he calls the Semantic Web that would enable computers to extract meaning from far-flung information as easily as today's Internet links individual documents. In an interview with Technology Review, the Web-maestro explains his vision of 'a single Web of meaning, about everything and for everyone.'"
Well, beyond the "knowledge management"-type mumbo jumbo, anyway. Some basic definitions are here, here, and .
See the original here.
:-) I've never read any of them, I only know this Berners-Lee fellow from the headlines.
Actually Slashdot posts this article over and over again every few months, with basically the same headline (sometimes "and" sometimes "on" sometimes "Tim" sometimes not). Kinda bizarre really.
As we all know, Al Gore is the hero of the Web's creation story.
This always gets asked - and a partial answer is right here.
Eclipse plugins, visualization tools... there's some good stuff there.
The Army reading list
Except for China, they get their own semantic web with special semantic filters in place that semantically keep their citizens under semantic control.
"The object of war is not to die for your country, but to make the other bastard die for his." - Patton
If you'd like an opposing view, make sure to read Clay Shirky's take on the semantic web.
The extra work required to put data into a standard data format won't be done. People can't be bothered to make their pages W3C compliant (even Slashdot). The second problem is that data formats can rarely be agreed upon by a large community. Look at how many calendar event and news feed formats there are.
I'm so tired of Semantic trying to take over all the security tools. Are they now trying to take over the Internet? I mean really, Semantic Antivirus totally sucks ass big-time!!! And don't get me started on Semantic's SystemWorks tool and how bad it blows!
Oh, wait a minute...
As has been stated many times, content producers will spoof semantic data just like they used to with the META tag...which is why no one uses the META tag anymore. Relevance algorithms take into account link analysis and statistical text analysis to provide a much more truthful representation of what data is there. Sorry Tim.
I want to offer an alternative, as proposed by Victor Raskin at Purdue. I speak for neither Sergei Nirenburg nor Victor (who does enough talking for himself).
:)
While this idea for more thorough, concise, and accurate searches is a good one, I would question whether embedding semantic tags into web pages is the way to go.
As outlined in Ontological Semantics, there is an automated system of semantic processing already underway. Basically, it takes a text, runs it through a parser, which looks up meanings in a lexicon, then reduces whatever translation it comes up with to a text-meaning representation (TMR) by pushing the concepts from the lexicon through an ontology / onomasticon / world-knowledge library. The TMR is basically the "pulp" of the semantics of the article, web page, book, or whatever it's been fed. It just contains the ideas, the things involved, and other relevant concepts, stripped of all other linguistic information.
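To make the pipeline a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the lexicon entries, the ontology, and the function names are made up, the "parser" is just a word splitter, and a real TMR is far richer than a dictionary of concepts. It only shows the shape of the idea: words go in, language-independent concepts come out.

```python
import re

# Hypothetical toy lexicon: surface word -> concept name (all entries invented).
LEXICON = {
    "car": "AUTOMOBILE",
    "automobile": "AUTOMOBILE",
    "bought": "BUY",
    "purchased": "BUY",
    "dealer": "MERCHANT",
}

# Hypothetical toy ontology: concept -> parent concept.
ONTOLOGY = {
    "AUTOMOBILE": "VEHICLE",
    "VEHICLE": "OBJECT",
    "BUY": "COMMERCE-EVENT",
    "COMMERCE-EVENT": "EVENT",
    "MERCHANT": "HUMAN",
    "HUMAN": "OBJECT",
}


def parse(text):
    """Stand-in for a real parser: lowercase the text and split it into words."""
    return re.findall(r"[a-z]+", text.lower())


def ancestors(concept):
    """Walk up the toy ontology from a concept to its root."""
    chain = []
    while concept in ONTOLOGY:
        concept = ONTOLOGY[concept]
        chain.append(concept)
    return chain


def text_to_tmr(text):
    """Reduce a text to the concepts it mentions, each with its ancestors."""
    tmr = {}
    for word in parse(text):
        concept = LEXICON.get(word)
        if concept is not None:  # words missing from the lexicon are dropped
            tmr[concept] = ancestors(concept)
    return tmr


tmr_a = text_to_tmr("She bought a car.")
tmr_b = text_to_tmr("She purchased an automobile.")
print(tmr_a)
print(tmr_a == tmr_b)  # different wording, same toy TMR
```

The two sentences use different words but reduce to the same representation, which is the property the rest of the argument relies on.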
TMR is great, because it can then be used, by reversing the process with the lexicon of another language, to translate a text from one language to another.
However, it seems to me that with the bits and pieces of the TMR stored in a search engine's index, this could be a huge boon for the search engine.
By generating the TMR of web pages and the TMR of search strings, you no longer search for keywords, but for key concepts.
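A rough sketch of what key-concept search could look like, in Python. The lexicon, the page texts, and the `.example` addresses are all invented for illustration, and a plain word-to-concept lookup stands in for real TMR generation. The point is only that the index is keyed by concept rather than by the literal word.

```python
from collections import defaultdict

# Hypothetical word -> concept lexicon (invented for this sketch).
LEXICON = {
    "car": "AUTOMOBILE",
    "auto": "AUTOMOBILE",
    "automobile": "AUTOMOBILE",
    "hotel": "LODGING",
    "inn": "LODGING",
}


def concepts(text):
    """Map a text to the set of concepts its words stand for."""
    return {LEXICON[w] for w in text.lower().split() if w in LEXICON}


def build_index(pages):
    """Build an inverted index from concept to the pages that mention it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for concept in concepts(text):
            index[concept].add(url)
    return index


def search(index, query):
    """Return the pages that contain every concept found in the query."""
    wanted = concepts(query)
    if not wanted:
        return set()
    return set.intersection(*(index.get(c, set()) for c in wanted))


# Made-up pages for illustration.
PAGES = {
    "a.example": "cheap automobile for sale",
    "b.example": "used car parts",
    "c.example": "hotel rooms in paris",
}

index = build_index(PAGES)
print(sorted(search(index, "car")))
print(sorted(search(index, "inn")))
```

A plain keyword match on "car" would only find the second page; the concept index also returns the page that says "automobile", and a query for "inn" finds the hotel page.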
The advantages of semantic searches and indexes under this implementation are manifold:
-Searches (and the web as a whole) will gain the richness Mr. Berners-Lee is advocating.
-Web authors will not be able to lie in their semantic tags, or otherwise mislead spiders about what the page is about (remember META tags?)
-No extra work is required in the actual construct of the web or *ML standards. The TMR is only generated and stored by the sites / processes that need it.
-Others?
Just an alternative solution, for fun
The fact that Tim has been trying for 15 years to sell this idea with little success indicates that his approach is insufficient. He is pitching the idea just like a startup would, giving cool examples and everything. But in practice, all he is doing is proposing and overseeing standards. Developing standards for an idea is not what is required to prove that an idea works. Standards should follow successful technology, not vice versa. You need to have companies that make products professionally and offer complete solutions (i.e. make it work in real-life situations). Doing it even for the very simple example he quotes ("find pictures taken on sunny days") is itself a big, big deal. Perhaps Tim should get involved with companies in this field as an advisor/consultant. You know, there are enough smart people out there who could develop the standards. But there are very few people with his name and recognition who could truly ignite commercial interest in his ideas.
Here is an account that predicts that Google will leverage its search results to create a Semantic Web. I see this as a distinct possibility. Especially Google leveraging its search results to help people buy and sell stuff.
And this is what makes me wonder if this will amount to much more than an interesting research project for grad students. In order for the SemWeb to amount to anything useful, everyone is going to have to include the metadata necessary to integrate their data into the Semantic Web. How's that going to work? Who's going to make it work?
I've been hearing noise about the semantic web, RDF, and what not for years now, and every time I do, the first thing that pops into my head is "Second System Effect".
He got lucky once, because he put together some tools that were simple and straightforward enough for people to pick them up quickly, thereby avoiding the fate of the dozens of other hypertext systems going back to the late 1980s.
Now, like all second systems, he wants to "do it right", over-engineering away all of the things that made the first one take off ...
Just my opinionated rant ...
The rest of us call this... GOOGLE.
Google searches undifferentiated text. In contrast, the semantic web is all about differentiating text by adding meta tags.
For example, the word "Hilton" on a web page is ambiguous. It could be a hotel, or a celebrity. Which is it? With the semantic web we'd know:
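A minimal sketch of the idea in Python, not actual Semantic Web markup: each statement is a (subject, predicate, object) triple, loosely in the spirit of RDF. The `example.org` addresses and the vocabulary (`type`, `locatedIn`, `occupation`) are made up for illustration.

```python
# Each statement is a (subject, predicate, object) triple.
# The URIs and the vocabulary are hypothetical, chosen only for this sketch.
TRIPLES = [
    ("http://example.org/page1#Hilton", "type", "Hotel"),
    ("http://example.org/page1#Hilton", "locatedIn", "Paris"),
    ("http://example.org/page2#Hilton", "type", "Person"),
    ("http://example.org/page2#Hilton", "occupation", "Celebrity"),
]


def subjects_of_type(triples, wanted_type):
    """Return every subject that is declared to be of the given type."""
    return [s for s, p, o in triples if p == "type" and o == wanted_type]


print(subjects_of_type(TRIPLES, "Hotel"))
print(subjects_of_type(TRIPLES, "Person"))
```

Both pages contain the word "Hilton", but a query for hotels returns only the first one, because the type is stated explicitly instead of being guessed from the surrounding text.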
Of course, this is a fairly trivial example. A more meaningful example:
That is to say, I may be an item scammer in online gaming realms, or in Diablo, but not in EverQuest. However, I may be one of the most honest people I know in the real world. Perhaps I have a second account that I use to Troll on Slashdot, but otherwise have this account where I try to post insightful information. You have the right to link these things, you may even have the right to link these to real world data like where I work and where I park my car. However, if I jilted someone in Diablo, do I want them to so easily find me and take it out on my car (as some people would)?
Do I want my employer having instant access to all of my online transactions, regardless of whether I'm on shift or off shift at the time? Individually, these are not things that anyone has considered worth 'securing', yet they may be valuable to someone.
Kinetic stupidity has a new brand leader: Allen Zadr.
In the beginning, we had library card catalogs, with their painful attempts to index and cross-reference books. That works well in some areas, typically ones where names of people are significant. Attempts to apply the same approaches to technical papers worked less well.
There's a very elaborate classification system for patents. When you had to look through patents on paper or microfilm, it was essential. Now that we have full text search, it's used less and less.
A modern example of this approach is the ACM Taxonomy, a structure into which all computer science can be fitted. (As an exercise, try to put the current Slashdot stories into that taxonomy.) Nobody actually uses that taxonomy to find anything.
As to data interchangeability, that's a separate issue, and more of a standards one. The big problem for publicly available data is that the cost of encoding the data is borne by different people than those who benefit from the encoding. Many companies don't like having all their product and pricing information easily searchable by price. (Froogle may change this, because Google has so much clout.)
I've spent some time dealing with public financial reporting. There's opposition to detailed disclosure in a standardized format. Many companies don't want their detailed information to be too easily analyzed. Embarrassing results show up.
The future is better search engines, not user-created indexing data. As we've painfully learned, a search engine must look at the same data a human reader would, or it will be lied to. Lied to to the point of uselessness.
If you have followed the posts of this crazy little guy that is me, you may have seen my argument that most of today's computer problems exist because modern operating systems offer nothing in the information management department.
Remember the CVS story from a couple of days ago? It's information management: http://slashdot.org/comments.pl?sid=123076&cid=10
WinFS is also about information management: http://slashdot.org/comments.pl?sid=121101&cid=10
The story about the Evolution e-mail client offering its e-mail data as a data model separate from the application? Another information management issue.
The web? information management issue.
Distributed databases? information management issue.
Web search engines? information management issue.
Windows search tool? information management issue.
The Windows registry? information management issue.
The Unix /etc directory? information management issue.
Enterprise workflows? again, an information management issue. That's why there is no general workflow solution accepted and used worldwide.
Dynamic web site contents? information management issue.
The semantic web? another information management issue!
As you can see from the numerous examples above, what an operating system should do, but none does, is manage information instead of files. If that were coupled with a distributed networked environment, 90% of the world's software would be considered obsolete overnight, and the productivity and fun of using computers would increase tenfold.
If any open source developer is reading this, you may contact me for a private discussion on the idea. THIS IS OPEN SOURCE'S BIGGEST CHANCE TO LEAD THE TECHNOLOGICAL RACE!