The Semantic Web Going Mainstream
Jamie found a story about a new web tool that is trying to break new ground in the semantic web. It's called Twine, and it supposedly will intelligently aggregate your data, be it YouTube videos, emails, or whatever you accumulate in your travels. Not the first, not the last, but here's hoping something comes out of the ideas someday.
I really don't like this idea. One good hack from the Russian MAFIA and the game would be over. All your eggs are belong to us, as it were.
The simple truth is that interstellar distances will not fit into the human imagination
- Douglas Adams
"access and use the Site and electronically copy, (except where prohibited without a license) and print to hard copy portions of the Site Materials for your informational, non-commercial and personal use only"
Can't use their service for commercial purposes; how mainstream can it be?
My turnips listen for the soft cry of your love
Here's a futuristic tailored semantic search example!
Sure, I understand that managing expectations is important, but let's not lose sight of what this article really is.
"Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
Unless I've missed some whole new sub-branch, semantic web to me means marking a page up properly, giving meaning to the various page elements via correct tags and microformats. This is just an overgrown aggregator.
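To make "meaning via tags and microformats" concrete, here is a rough sketch of how a program could pull named fields out of marked-up HTML. The snippet, the person, and the company are made up; the `fn` and `org` class names follow the hCard microformat convention, and the extractor is my own illustration, not anything Twine is known to do.

```python
from html.parser import HTMLParser

# Hypothetical hCard-style snippet; the person and company are invented.
SNIPPET = (
    '<div class="vcard"><span class="fn">Jane Doe</span> works at '
    '<span class="org">Example Corp</span></div>'
)

class ClassTextExtractor(HTMLParser):
    """Collect the text of elements whose class attribute names a wanted field."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.current = None   # field name of the element we are inside, if any
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        hits = sorted(self.wanted.intersection(classes))
        self.current = hits[0] if hits else None

    def handle_endtag(self, tag):
        # Text after a closing tag no longer belongs to the field.
        self.current = None

    def handle_data(self, data):
        if self.current and data.strip():
            self.fields[self.current] = data.strip()

def extract(html, wanted=("fn", "org")):
    parser = ClassTextExtractor(wanted)
    parser.feed(html)
    parser.close()
    return parser.fields

print(extract(SNIPPET))
```

The point is that the markup, not the program, carries the meaning: the code only knows that `fn` is a name because the page author said so.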
I want a list of atrocities done in your name - Recoil
Even better... On their site they say
http://www.twine.com/about and there's a great section about Web 3.0 here
It's great for a laugh... until you realize that by this time next year we'll probably be on Web 10.0
While I am a fan of the "esoteric field of machine learning", as the article mentions, I am also well aware of the countless disappointments so far (thus no AI..). There have been many designs that can tackle toy problems, but nothing yet that has been able to handle large corpora of text. The big problem is that to really do proper categorization, the program must understand what it's reading. Which, again, requires some type of intelligence.
While methods are available to do categorization based on either static or learned heuristics, they are less than perfect (think about SafeSearch in Google Images: it works decently, but definitely not perfectly). In fact, just parsing a single English sentence can be a difficult task for computers (if the sentence doesn't fall into a context-free grammar). So the best we can probably hope for from Twine is categorization based on word frequency (okay, they probably use some higher-order stats).
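For anyone wondering what categorizing on word frequency looks like, here is a minimal sketch, assuming a bag-of-words model with cosine similarity. The two categories and their training snippets are invented for illustration, and nothing here is claimed to be what Twine actually does.

```python
import math
import re
from collections import Counter

def word_counts(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical, hand-written training snippets; real data would be far larger.
TRAINING = {
    "cooking": "simmer the sauce and season the soup with salt and pepper",
    "computing": "compile the code and run the program on the server",
}
PROFILES = {label: word_counts(text) for label, text in TRAINING.items()}

def categorize(text):
    """Return the label whose word-count profile is most similar to the text."""
    counts = word_counts(text)
    return max(PROFILES, key=lambda label: cosine(counts, PROFILES[label]))

print(categorize("run the code on the server"))
print(categorize("season the soup with pepper"))
```

Note that the program never understands either sentence. Common words like "the" contribute to the score as much as meaningful ones, which is exactly why the higher-order stats are needed.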
Whenever I read about a new semantic technology, I always think of WordNet (developed by Miller, who is the same guy responsible for the study showing we can hold about seven, plus or minus two, items in short-term memory). WordNet was developed as a database for the hierarchy of all words. Words are defined by their relationship to other words.
While it's a great idea, and useful for some projects, it is also far from perfect, as words do not in the end have a static relationship to each other. The semantic web in the end relies on a static relationship between words (either through common usage or through relationships defined in terms of other words).
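A toy version of that kind of hierarchy, as a sketch only: the table below is hand-made for illustration and is not taken from the real WordNet database, which is far richer than a single parent per word.

```python
# Toy hypernym ("is a kind of") table; entries are illustrative, not from WordNet.
HYPERNYMS = {
    "poodle": "dog",
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "twine": "string",
    "string": "artifact",
}

def hypernym_chain(word):
    """Walk up the table and return every ancestor of the word, nearest first."""
    chain = []
    seen = {word}
    while word in HYPERNYMS:
        word = HYPERNYMS[word]
        if word in seen:  # guard against accidental cycles in the table
            break
        seen.add(word)
        chain.append(word)
    return chain

def is_a(word, ancestor):
    """True if the ancestor appears anywhere above the word in the hierarchy."""
    return ancestor in hypernym_chain(word)

print(hypernym_chain("poodle"))
print(is_a("twine", "artifact"))
```

The weakness shows up right away: the table says "twine" is a kind of string, and it has no way to notice that in this thread the word means a web service. The relationship is fixed when the table is written, whatever the context.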
Let's see if it works on Slashdot.
It's well-known in linguistics and philosophy that "You don't get semantics from syntax." It's well-known in computer science that computers are syntactical. It's well-known in recent business history that all startups claiming they'd produce "expert systems" or "artificial intelligence" in which computer systems would, despite these accepted truths, perform semantic feats have miserably failed to live up to their claims.
So why don't we give PR puff pieces like this the same warm reception we give to the latest announcement of a perpetual motion machine? It's the kind of project only plausible to those who know very little of the basic background well-accepted by experts in the pertinent adjacent fields. That one or two big names from the success of the syntactical www either aren't familiar with or don't accept core knowledge from linguistics and philosophy of language is finally no different than Thomas Edison working for years on a machine to talk to ghosts: brilliance in one area most often doesn't translate into other areas in which you have no background - and even more rarely into areas where nobody knows how it would be done.
"with their freedom lost all virtue lose" - Milton
Search engines are currently still mostly syntactic. Look for a word, see pages matching that word, in a more or less relevant order... This means you have to play tricks on the search engine in order to find what you want...
Imagine you could simply query things like: Find me an appointment with a dentist who takes my insurance, has good ratings and is located near where I live. From your personal information (your calendar, where you live) and public information (consumer ratings of the dentists, maps, information from the dentist's office, from your insurance etc.), a semantic web search engine could provide you with an answer.
All it takes is for the data published on the internet to be *structured*.
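Once the data is structured, the dentist query becomes an ordinary filter. A rough sketch, assuming every record already carries insurers, a rating and a location; all names, insurers, ratings and coordinates below are made up, and the flat kilometre grid is a simplification.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Dentist:
    name: str
    insurers: frozenset   # insurance plans the office accepts
    rating: float         # consumer rating, 0 to 5
    location: tuple       # (x, y) in km on a hypothetical flat grid

# Hypothetical records standing in for structured data published on the web.
DENTISTS = [
    Dentist("Dr. A", frozenset({"Acme Health"}), 4.6, (1.0, 2.0)),
    Dentist("Dr. B", frozenset({"Acme Health"}), 3.1, (0.5, 0.5)),
    Dentist("Dr. C", frozenset({"Other Mutual"}), 4.9, (0.2, 0.1)),
    Dentist("Dr. D", frozenset({"Acme Health", "Other Mutual"}), 4.2, (2.0, 3.0)),
]

def find_dentists(dentists, insurer, home, min_rating=4.0, max_km=5.0):
    """Return dentists matching insurer, rating and distance, nearest first."""
    matches = [
        d for d in dentists
        if insurer in d.insurers
        and d.rating >= min_rating
        and math.dist(d.location, home) <= max_km
    ]
    return sorted(matches, key=lambda d: math.dist(d.location, home))

for dentist in find_dentists(DENTISTS, "Acme Health", home=(0.0, 0.0)):
    print(dentist.name, dentist.rating)
```

The hard part is not this query. It is getting dentists, insurers and rating sites to publish those fields in an agreed format in the first place.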
☭ = 卐
Hold on, so you are saying that any hosted service is unsafe then? What about all the people who use hosted email, or hosted collaboration, or hosted file servers? Sure, if a hacker gets into anything it's unsafe. Heck, even enterprise software that is locally hosted is at risk. Geez, if we're that terrified, let's not even use computers or the Internet at all then. Twine is no more at risk than Gmail, Facebook, Salesforce or any other online service that holds information that is not all public. Get real.