The Semantic Web Going Mainstream
Jamie found a story about a new web tool that is trying to break ground in the semantic web. It's called Twine, and it supposedly will intelligently aggregate your data, be it YouTube videos, emails, or whatever you accumulate in your travels. Not the first, not the last, but here's hoping something comes out of the ideas someday.
I really don't like this idea. One good hack from the Russian MAFIA and the game would be over. All your eggs are belong to us, as it were.
The simple truth is that interstellar distances will not fit into the human imagination
- Douglas Adams
http://www.technologyreview.com/printer_friendly_article.aspx?id=19627 for those who don't want the ads
But even without ads the article is very shallow. How is it "semantic" web exactly?
"access and use the Site and electronically copy, (except where prohibited without a license) and print to hard copy portions of the Site Materials for your informational, non-commercial and personal use only"
Can't use their service for commercial purposes; how mainstream can it be?
My turnips listen for the soft cry of your love
Not the first, not the last, not groundbreaking... not newsworthy.
Overall though I like the idea!
Sorry folks, but twine just isn't gonna cut it. We need something sturdier. Someone needs to start a similar project called 'ducttape'.
"Written with the Semantic Web Standards, called W3C, in mind."
Yikes. That's horrible.
I want to delete my account but Slashdot doesn't allow it.
... Russian Mafia eggs you!
Let us not become the evil that we deplore.
Sure, I understand that managing expectations is important, but let's not lose sight of what this article really is.
"Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
While reading TFA I had a flashback to reading a 90's era ASP press release. "Ohhh... Shiny and pointless!"
Unless I've missed some whole new sub-branch, semantic web to me means marking it up properly to give meaning to the various page elements via correct tags and microformats. This is just an overgrown aggregator.
I want a list of atrocities done in your name - Recoil
This article is crap - however, the idea is not entirely hot air, even though it is being touted as 'the next, big thing', which I very much doubt it will be. I think the 'semantic web' is trying to solve a non-existent problem; we're not suffering from 'information overload' - the net has just been filled up with useless rubbish, like adverts, SPAM, entertainment and adverts. And did I mention adverts? Fortunately it is not necessary to 'manage' any of that - all you need is to be able to avoid it, which existing SPAM filters and ad-blockers already do reasonably well.
Apart from that, I think using a tool like the one proposed (however vapidly) in the article presents its own dangers. Letting a machine manage and 'understand' information that is important to you is not wise. Think of the spellchecker deathtrap: You misspell words in such a way that they become correctly spelled words with another meaning - like 'them' vs 'then', or 'than', or 'there' vs 'their'. Sometimes you stumble over texts where the author has clearly relied on the spellchecker without proofreading afterwards, and the meaning has become garbled, or even worse, it has changed to something the author didn't intend, but which seems plausible enough. Just imagine if you were an amateur ornithologist who collects some articles mentioning 'cock pheasants' and 'blue tits' - and suddenly your collection of articles is tagged 'pornography'. Perhaps not the most catastrophic of scenarios, but certainly an example of the kind of surprises you can expect from the 'semantic web'.
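To make the ornithologist scenario concrete, here is a minimal sketch of a context-blind keyword tagger. The word list and the function name are invented for illustration; nothing here is claimed to be how Twine actually tags content.

```python
import re

# Hypothetical blocklist (an assumption for this sketch); a real filter
# would be larger, but the failure mode is the same.
FLAGGED_WORDS = {"cock", "tits"}

def naive_tags(text):
    """Tag a text 'pornography' if any flagged word appears, ignoring context."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return {"pornography"} if words & FLAGGED_WORDS else set()

article = "Cock pheasants and blue tits were seen at the feeder this morning."
print(naive_tags(article))
```

The birdwatching sentence gets the 'pornography' tag because the tagger only sees isolated words, never the phrases 'cock pheasants' or 'blue tits'.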
While I am a fan of the "esoteric field of machine learning", as the article mentions, I am also well aware of the countless disappointments so far (thus no AI...). There have been many designs that can tackle toy problems, but nothing yet that has been able to handle large corpora of text. The big problem is that to do proper categorization, the program must understand what it's reading. Which, again, requires some type of intelligence.
While methods are available to do categorization based on either static or learned heuristics, they are less than perfect (think about Safe Search in Google images -- it works decently, but definitely not perfectly). In fact, just parsing a single English sentence can be a difficult task for computers (if the sentence doesn't fall into a context-free grammar). So the best we can probably hope for Twine to do is categorize based on word frequency (okay, they probably use some higher-order stats).
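For anyone wondering what "categorize based on word frequency" looks like in practice, here is a rough sketch using cosine similarity between word-count vectors. The categories and their sample words are made up, and this is only one simple frequency-based technique, not a claim about what Twine does.

```python
import math
import re
from collections import Counter

def word_counts(text):
    """Count lowercase alphabetic words in a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Made-up sample vocabulary per category (an assumption for illustration).
CATEGORIES = {
    "birds": word_counts("pheasant tit feeder nest wing bird bird feather"),
    "computing": word_counts("web server parser grammar code bug code markup"),
}

def categorize(text):
    """Pick the category whose word counts are most similar to the text's."""
    counts = word_counts(text)
    return max(CATEGORIES, key=lambda name: cosine(counts, CATEGORIES[name]))
```

Note that `categorize` always returns some label, even for a text that shares no words with any category, which is exactly the kind of confident guess without understanding described above.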
Whenever I read about a new semantic technology, I always think of WordNet (developed by Miller, who is the same guy responsible for the study showing we can hold about seven, plus or minus two, items in short-term memory). WordNet was developed as a database for the hierarchy of all words. Words are defined by their relationship to other words.
While it's a great idea, and useful for some projects, it is also far from perfect, as words do not in the end have a static relationship to each other. The semantic web in the end relies on a static relationship between words (either through common usage or through relationships defined via other words).
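A toy version of such a word hierarchy shows both the appeal and the limitation. The links below are invented for illustration and are not taken from the actual WordNet database.

```python
# Toy "is a kind of" links, invented for this sketch; not real WordNet data.
HYPERNYM = {
    "pheasant": "bird",
    "tit": "bird",
    "bird": "animal",
    "dog": "animal",
    "animal": "entity",
}

def ancestors(word):
    """Follow the links upward until the chain ends."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain

def is_a(word, category):
    """True if category appears anywhere above word in the hierarchy."""
    return category in ancestors(word)

print(ancestors("pheasant"))  # ['bird', 'animal', 'entity']
```

The table gives every word exactly one fixed place, so 'tit' is always a bird here, whatever the surrounding text says. That is the static-relationship problem in miniature.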
I always misread it as the "Sementic Web" and I get really excited that more interesting ways to look at pr0n are on their way to the internets.
Let's see if it works on Slashdot.
It's well-known in linguistics and philosophy that "You don't get semantics from syntax." It's well-known in computer science that computers are syntactical. It's well-known in recent business history that all startups claiming they'd produce "expert systems" or "artificial intelligence" in which computer systems would, despite these accepted truths, perform semantic feats have miserably failed to live up to their claims.
So why don't we give PR puff pieces like this the same warm reception we give to the latest announcement of a perpetual motion machine? It's the kind of project only plausible to those who know very little of the basic background well-accepted by experts in the pertinent adjacent fields. That one or two big names from the success of the syntactical www either aren't familiar with or don't accept core knowledge from linguistics and philosophy of language is finally no different than Thomas Edison working for years on a machine to talk to ghosts: brilliance in one area most often doesn't translate into other areas in which you have no background - and even more rarely into areas where nobody knows how it would be done.
"with their freedom lost all virtue lose" - Milton
So if you want to have it used in the industry you just have to say "The semantic web will revolutionize the industry." Maybe they can integrate it into Web 2.0...
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
It's funny that the posters bash semantic web, saying it's useless and will never materialize... all while adding semantic annotations to the Slashdot posting (tagging beta)!! ROFL.
I do agree that the article is crap though.
Lojban is unambiguous and much better suited to this than Spanish or English. Sure, you can make English as precise as possible--but most of written English is not. I don't see how you can have a semantic web when the semantics aren't clearly defined. Lojban, parsable like Java, makes it truly possible.
I went to the Twine site to find out what it was all about but I just got bombarded with meaningless buzzwords and technodrivel. This is what you tend to get from people who want to sound cutting edge but haven't got a clue, so I concluded that they didn't really know what it was all about either.
I for one wait with Clay Shirky to welcome our "devastatingly intelligent" machine searching overlords!
Beware of bugs in the above code; I have only proved it correct, not tried it.
I looked at the Twine web site, and I can't figure out what they're actually doing. It's all buzzwords. There's a video of the Twine guy speaking at the "Web 2.0 Summit". The video is useless; the guy is doing a demo, but the video only shows the face of the speaker, not the demo.
Apparently the "natural language recognition" seems to consist of recognizing names of people, products, and companies. The examples were "Tim Berners-Lee" and "Google", which are so unique that they're easy. But would it work for "Robert Smith" and "Joe's Plumbing"? There was no indication that it uses context to disambiguate the non-trivial cases. It still requires manual tagging for most data.
There's a scheme for tracking document changes. There's a system that builds up a profile of the user based on what they store, which sounds like a targeted advertising engine. There's a personalized search engine. There are "collaboration features". There are contact lists.
But from the available information, it's not yet possible to tell if this is useful.
The major difficulty the semantic web faces with adoption is that it is very hard to get people to tag things consistently and thoroughly, and it is also hard to have machines do the tagging. It looks like Radar Networks has made some progress in getting machines to do the tagging: you give them information, they or their machines tag it.
If this is true, it could really help get the semantic web off the ground. The guys at Radar Networks are not clueless amateurs as some commentators above have suggested; you might say they've been around the block a few times.
Even then, if we somehow put in measures to detect ornithology pages, my cock pheasant pornography site could be misclassified, too.
Some very cool apps have already been built on top of it like http://newsatseven.com/, http://www.squadinfo.com/, http://www.optevi.net/newstracker and many others.
It's not the "real" semantic web - but it's an open-access starting point.
They also have a Firefox plugin at http://gnosis.clearforest.com/ that does semantic analysis in real time as you browse. I use this constantly while reading business news or browsing Wikipedia.
What they clearly don't have is Twine's marketing budget.
You can express vagueness all you want (zo'e).
1. On my site www.mortality.com, I write, "All men are mortal". On site www.men.com, someone writes, "Socrates is a man".
2. The semantic spider finds both these pages, and rather than indexing the words "All men are...", it adds to its knowledgebase:
mortal(X) :- man(X).
man(socrates).
noting the sources of information, of course.
3. In the search engine, I write: Is Socrates mortal?
4. The engine translates the query to:
?- mortal(socrates).
5. The Prolog inference engine finds this to be true, by the rules in (2).
6. The UI answers that yes, using www.mortality.com and www.men.com, one can conclude that Socrates is mortal.
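The steps above can be sketched in a few lines of Python that carry the source of each fact and rule along with the inference. The data layout and the `prove` function are assumptions made for this sketch, and the genuinely hard part (turning English sentences into facts and rules in the first place) is taken as already done.

```python
# Facts as the spider might store them, each tagged with its source site.
FACTS = {("man", "socrates"): "www.men.com"}

# One rule per conclusion: conclusion(X) holds if premise(X) holds.
# Maps conclusion -> (premise, source site). Hypothetical layout.
RULES = {"mortal": ("man", "www.mortality.com")}

def prove(predicate, subject):
    """Return the list of sources supporting predicate(subject), or None."""
    if (predicate, subject) in FACTS:
        return [FACTS[(predicate, subject)]]
    if predicate in RULES:
        premise, rule_source = RULES[predicate]
        support = prove(premise, subject)
        if support is not None:
            return [rule_source] + support
    return None

# "Is Socrates mortal?" becomes prove("mortal", "socrates").
print(prove("mortal", "socrates"))  # ['www.mortality.com', 'www.men.com']
```

A `None` result means "cannot be concluded from what was crawled", not "false". This sketch handles only single-premise rules and would loop forever on circular rules, which a real inference engine has to guard against.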
The horribly naive proximity-based methods of today's search engines are the result of computer engineers throwing up their arms with a cry of "this problem is too hard, let's design something that looks clever but really has no clue". Straight statistics should be relegated to judging conflicting information - for example, a Bible thumper will want to weight answers in favour of religious sites, while an evolutionary biologist would want to filter out knowledge extracted from same. Since Wikipedia is a secondary source, one would want this weighted a lot lower than, say, knowledge inferred directly from an archive of academic papers. Sensible, reconfigurable defaults.
Hold on, so you are saying that any hosted service is unsafe then? What about all the people who use hosted email, or hosted collaboration, or hosted file servers? Sure, if a hacker gets into anything it's unsafe. Heck, even enterprise software that is locally hosted is at risk. Geez, if we're that terrified, let's not even use computers or the Internet at all then. Twine is no more at risk than Gmail, Facebook, Salesforce or any other online service that holds information that is not all public. Get real.
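As a rough illustration of "sensible, reconfigurable defaults" for conflicting answers, each source type can carry a trust weight that the user is free to override. The source categories and the numbers are invented for this sketch.

```python
from collections import defaultdict

# Hypothetical default trust weights per source type (assumed values).
DEFAULT_WEIGHTS = {"academic_archive": 1.0, "wikipedia": 0.4, "unknown": 0.1}

def resolve(claims, weights=DEFAULT_WEIGHTS):
    """Given (answer, source_type) pairs, return the answer with the most total weight."""
    fallback = weights.get("unknown", 0.0)
    totals = defaultdict(float)
    for answer, source_type in claims:
        totals[answer] += weights.get(source_type, fallback)
    return max(totals, key=totals.get)

claims = [("yes", "wikipedia"), ("yes", "wikipedia"), ("no", "academic_archive")]
print(resolve(claims))  # no
```

With the defaults, one academic source outweighs two Wikipedia mentions. Passing a different `weights` dictionary flips the outcome, which is the reconfigurable part.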
Now I'm going to take a dog turd, a cat turd, and a goat turd and mash them together into a vaguely upright shape. Look, the leaning tower of pisa!
Is this just a tool to undo all the damage / barriers to entry that Google has placed on the internet via its AdWords and AdSense programs?
This is a way for the gov'ment to get all the info in one place and to get you to put it there voluntarily. I bet there is a clause somewhere in the user agreement that gives them permission to access and use your content. You know, in order to properly categorize it.
Good overview of semantic web for novices. Why do I never have mod points when I need them?