Why the Semantic Web Will Fail
Jack Action writes "A researcher at Canada's National Research Council has a provocative post on his personal blog predicting that the Semantic Web will fail. The researcher notes the rising problems with Web 2.0 — MySpace blocking outside widgets, Yahoo ending Flickr identities, rumors Google will turn off its search API — and predicts these will also cripple Web 3.0." From the post: "The Semantic Web will never work because it depends on businesses working together, on them cooperating. There is no way they: (1) would agree on web standards (hah!) (2) would adopt a common vocabulary (you don't say) (3) would reliably expose their APIs so anyone could use them (as if)."
Thank God for Web4.1!
One of the problems is lack of standardization, and one of the symptoms is Yahoo! normalizing Flickr's user accounts with its own?
The semantic web will fail because it is too complex and no one outside the academic community working on it really understands it. The ad-hoc tagging systems and microformats Web 2.0 has brought are good enough for most people, and much simpler for the casual web developer to understand.
Doesn't Web 2.0 reach a "critical mass" at some point, where businesses will no longer be able to not cooperate? Of course, it all gets very fragile even then...
...says the guy who's blogging this opinion...
ilovegeorgebush
The researcher is just annoyed because no one sent him invites to Gmail.
It was created to solve a problem we had when everyone was using Hotbot and Altavista, but people are trying to introduce it into a world where everyone is using Google. (And Wikipedia. And all that Web 2.0 junk.)
I don't need you to mark "This page is a REVIEW of a CELL PHONE that has the NAME iPhone" anymore. All I need to do is Google "iPhone review" or hop on over to Amazon. Problem pretty freaking solved from my perspective.
Help poke pirates in the eyepatch, arr.
The only way to set an industry standard is to get so big so fast in a new market/technology that everybody has to follow.
Problem is, when you get so big so fast, there are almost necessarily major flaws in the designs.
Problem is, you never get rid of them again.
Just because I can imagine doing a hippopotamus, doesn't mean I'd like to do it.
It might fail for the reasons given (no I've not read the full article yet - naturally) but personally I think it will fail simply because it's too much work for the amount of payback. It would be great if one day magically overnight all our data was semantically marked up but that's not going to happen. The reality of it is that we will have to mark up the majority of content by hand. Even then inter-ontology mappings are so difficult that I'm not sure the system would be much use.
Perhaps worse than that though is the prospect of semantic spamming. It would be impossible to trust the semantic mark up in a document unless you could actually process the document and understand it. What would be the point in the mark up in that case?
I used to have a better sig but it broke.
So what is this semantic web / web 2.0 thing anyway?
Sure, we're all seeing community sites, blogs, tagging, etc. But each of those sites is an individual site, and their only connections seem to be plain HTML links. Community sites don't really allow collaboration, blogs are standardized personal web pages and who here uses tags to actually find information? All these things might warrant a "Web 1.0 patch 3283" label, but is it really a new type of web? Is it the type and magnitude of paradigm shift that the first web was? It only seems like people are just becoming more aware of the possibilities of the same web it was 10 years ago.
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
Go to Wikipedia (for example) and look up the definition. Then tell me you understand it.
... which is marked up as being about Mini Coopers. I'm looking for stuff about 1964 Cooper S inlet manifold modifications. This page looks like it might be interesting to my client, but quite a lot of people get confused between the different models of SU carburettor which were used that year. Does this page refer to the model with the No.4 Red needle or not?'
See? Not a hope that a concept which includes 'collaborative working groups' as part of its definition can ever succeed.
I mean these are the people who gave us HTML and CSS, god help us.
Meaning is derived by humans from the interaction between data, knowledge and dialogue. What the semantic web will give us is:
1) Data
2) Limited knowledge to the extent that common, sufficiently rich models of relationships, taxonomies and ontologies are applied to the data.
3) No dialogue. When Google can say 'hello Mr www.fountainofallknowledge.com. I see you have a page called
And get a sensible reply.
Which it understands.
Then I'll be interested. Until then all it will be is tagging but with a poncy name and a load of spurious academic nonsense being spouted around it to make it sound exciting.
The thing the academics who push the semantic web fail to consider (most of the time) is that the Real World does not function like their Ideal World. In the Ideal World, everybody cooperates and works together to produce something of value for all mankind. So we get lots of correctly and appropriately marked up pages that give useful information on what's stored therein.
But in the Real World, any online system that is used by a large enough number of people will eventually become attractive for spammers and scammers to defile and twist to their own purposes. So you'll get a deluge of pages that appear to be useful reviews of digital cameras (and are marked up as such) but in fact simply go to a useless "search" page that has lots of link farm references.
And if you say "Ok, so we don't trust the author of the page, we have someone else do it"... then who? Who's going to do all the work? Answer: Nobody. AI is nowhere near being smart enough for this. Keyword searching is, unfortunately, here to stay. If you trust the author to do the markup, then the spammers have a field day. If you say "Only trusted authors" then the system will still fail, due to laziness on most people's part - if a system isn't trivial to implement and involves some kind of "authentication" or "authorization" then nobody will use it, period. The Web succeeded in the first place because anybody anywhere could just stick up a Web server and publish pages, and it was immediately visible to the whole world.
The Semantic Web will fail for the same reason that the "meta" tag failed in HTML: Any system that can be abused by spammers, will be abused.
So, the Semantic Web, which is all about helping people find stuff, will fail. Not because of any technological shortcomings (it's all very nice in theory), but simply because we as people won't work together to make it work. Well, a small number of people could work together, but as that number grows toward the point of being useful, it will automatically hit the tipping point where it becomes worthwhile for the spammers to jump in and foul it all up.
The Semantic Web is a solution in search of a problem.
No matter how cool your RDF/OWL ontologies are, the real world is perfectly happy with plain XML/CSV. If there isn't an obvious benefit, people won't switch.
This sig is intentionally left blank
Maybe these things will fail in the public world of free service bureaus with which this guy is familiar, but the concept of webservice API is exploding in the vertical market spaces. In only the last two or three years virtually every single vendor my company works with in the financial industry has launched fully WSE compliant webservices to tie into their products. Previously you would have to work in batch by uploading a file to a secure FTP site and wait for results to appear as another file in that same FTP site. Now the results are real-time.
Companies are certainly embracing the new standards (and yes, there are standards) and they are certainly using them to replace existing older protocols and there is a lot of money to be made in this field.
No, this is about the SOAP API being replaced by a less flexible AJAX API. Never used either of them to be honest, but that's because I don't have any real need for them. When it comes to the content of my own websites (or rather my customers' websites), I'd much rather rely on my own database than an index Google made.
Lewis Carroll had it right, and George Orwell agreed with him: "Which is to be master" is the question that matters.
In free societies, everyone is master, and our language is conditioned only by the minimal need to communicate approximately with others. Beyond that, we are free to impose whatever semantics we want, and we do this to a far greater extent than most people realize. As a friend who works in GIS once said, "If I send out a bunch of geologists to map a site and collate their data at the end of the day, I can tell you who mapped where, but not what anyone mapped." Individual meanings of terms as simple as "granite" or "schist" are sufficiently variable that even extremely concrete tasks are very difficult.
Imposing uniform ontologies on any but the most narrowly defined fields is impossible, and even within those fields nominally standard vocabularies will be used differently by rapidly-dividing "cultural" subgroups within the workers in the field.
The semantic web is doomed to fail because language is far more highly personalized than anyone wants to believe. I think this is a good thing, because the only way to impose standardized meanings on terms would be to impose standardized thinking on people, and if that were possible someone would have done it by now. Whereas we know, despite millennia of attempts, no such standardization is possible, except in very small groups over a very specialized range of concepts.
Blasphemy is a human right. Blasphemophobia kills.
Best essay on the topic I have come across: http://www.well.com/~doctorow/metacrap.htm
This is the real world, most things aren't total successes or total failures.
Most likely the semantic web will fail to achieve all its objectives but achieve some of them, and may eventually rise again after it's failed. This is the nature of progress. Good ideas that fail are usually resurrected later. However the blogger is probably right: as long as the semantic web is going to be "handed" to us by a group of established corporations it will most likely never succeed; there's too much incentive for back stabbing in that top-down implementation. For it to succeed it needs to be so obvious that there's more money and power available by playing nice that all but the most black-hearted capitalists will play nice. We have to be aware that people like spammers exist, though, and anything that could potentially be used to generate advantage will be abused to death.
Fanatically anti-fanatical
But there are three ways to get that.
1) A search service that indexes all of Romario's goals.
2) A manually built asset that aggregates all of Romario's goals.
3) A standard system of semantic tags that self-identifies all Romario goal assets.
#1 is Google. As you point out now it relies primarily on keywords but you oversell the problem in two ways. First of all most video hosting sites already provide author and/or community tagging--thus providing a way for keywords to be assigned. Second, you're comparing a future semantic Web against the Google of today.
#2 can be provided by commercial video companies now ("1,000 Great Man U Goals," etc). It's also possible that a fan site could do the manual labor to find, upload, and keyword the videos.
#3 is the "semantic Web" approach, wherein all content providers follow a standard for self-identifying their content in a computer-parsable way.
The thing that distinguishes 1 and 2 from 3 is the scope of work required. #1 and #2 rely on a small team of dedicated people to accomplish the task. #3 relies on a very broad group of people of varying levels of dedication.
If you're talking practically about the solution, none of those approaches are going to get to 100%. As others have pointed out there is a real human semantic problem in identifying which goals of Romario to count, how far back to look, etc.
But the key is that #1 and #2 are approaches of a scope that we know can work. #3 seems unlikely to get the buy-in and effort required.
Build a man a fire, he's warm for one night. Set him on fire, and he's warm for the rest of his life.
One of the features of the W3C's model (based on RDF) is that it doesn't push the idea that everyone should adopt the same vocabulary (or ontology) for a topic or domain. Instead it offers a way to publish vocabularies with some semantics, including how terms in one vocabulary relate to terms in another. In addition, the framework makes it trivial to publish data in which you mix vocabularies, making statements about a person, for example, using terms drawn from FOAF, Dublin Core and others.
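To make the vocabulary-mixing point concrete, here is a minimal sketch in plain Python (no RDF library, just tuples standing in for triples). The FOAF, Dublin Core and RDFS namespace URIs are the published ones; the example.org resources and the in-house `fullName` term are made up for illustration, and the one-level `subPropertyOf` lookup is a simplification of what a real reasoner would do.

```python
# Published namespace URIs for FOAF, Dublin Core elements and RDF Schema.
FOAF = "http://xmlns.com/foaf/0.1/"
DC = "http://purl.org/dc/elements/1.1/"
RDFS_SUBPROPERTY = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"

# Hypothetical in-house vocabulary and resources, invented for this example.
EX = "http://example.org/staff#"
person = "http://example.org/people/alice"
doc = "http://example.org/docs/report-1"

# Data: statements about the same things, freely mixing three vocabularies.
triples = {
    (person, FOAF + "name", "Alice"),
    (person, EX + "fullName", "Alice Example"),
    (doc, DC + "creator", person),
    (doc, DC + "title", "Quarterly Report"),
}

# Published alongside the in-house vocabulary: how its term relates to FOAF.
mappings = {
    (EX + "fullName", RDFS_SUBPROPERTY, FOAF + "name"),
}


def values(subject, predicate):
    """Return the objects for (subject, predicate), also accepting any
    property declared as a direct sub-property of `predicate`."""
    wanted = {predicate} | {
        s for (s, p, o) in mappings
        if p == RDFS_SUBPROPERTY and o == predicate
    }
    return sorted(o for (s, p, o) in triples if s == subject and p in wanted)


if __name__ == "__main__":
    # Asking for foaf:name also finds the value stored under the in-house term.
    print(values(person, FOAF + "name"))
    print(values(doc, DC + "title"))
```

Nobody had to agree on one vocabulary here: the in-house term and foaf:name coexist, and the mapping is just one more published statement.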
The RDF approach was designed with interoperability and extensibility in mind, unlike many other approaches. RDF is showing increasing adoption, showing up in products by Oracle, Adobe and Microsoft, for example.
If this approach doesn't continue to flourish and help realize the envisioned "web of data", and it might not after all, it will have left some key concepts, tested and explored, on the table for the next push. IMHO, the 'semantic web' vision -- a web of data for machines and their users -- is inevitable.
His second point is just a common misconception and FAQ. It doesn't require that people do that.
I have just accepted a position with a consultancy that does a fair amount of work for those cut-throat businesses. And they are interested, very interested, in fact. Which is also why Oracle, IBM, HP, even Microsoft are interested.
Typical use case for them is: So, you bought your competitor, and each of the companies sits on big valuable databases that are incompatible. You have a huge data integration problem that needs solving fast. So, throw in an RDF model, which is actually a pretty simple model. Use the SPARQL query language. Now all employees have access to the data they need. Problem solved. Lots of money saved. Good.
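A rough sketch of that merger scenario, in plain Python rather than a real triple store: two incompatible customer tables are mapped onto one shared set of predicates, and a tiny pattern matcher stands in for SPARQL. All table layouts, column names and example.org URIs are invented for illustration, and a real deployment would use an actual RDF store and SPARQL engine instead of the `match` helper.

```python
EX = "http://example.org/schema#"  # hypothetical shared vocabulary

# Two legacy "databases" with incompatible column names (made-up data).
company_a = [{"cust_id": 1, "cust_name": "Acme", "town": "Oslo"}]
company_b = [
    {"id": "B7", "name": "Globex", "city": "Oslo"},
    {"id": "B8", "name": "Initech", "city": "Bergen"},
]


def to_triples(rows, prefix, id_key, name_key, city_key):
    """Map rows from one schema onto the shared predicates."""
    triples = set()
    for row in rows:
        subject = f"{prefix}{row[id_key]}"
        triples.add((subject, EX + "name", row[name_key]))
        triples.add((subject, EX + "city", row[city_key]))
    return triples


# Merging the two data sets is just a set union of triples.
graph = (
    to_triples(company_a, "http://example.org/a/", "cust_id", "cust_name", "town")
    | to_triples(company_b, "http://example.org/b/", "id", "name", "city")
)


def match(graph, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        (ts, tp, to) for (ts, tp, to) in graph
        if s in (None, ts) and p in (None, tp) and o in (None, to)
    ]


def customers_in(graph, city):
    """Roughly what this SPARQL query would ask:
    SELECT ?name WHERE { ?c ex:city "<city>" . ?c ex:name ?name }"""
    subjects = {s for (s, _, _) in match(graph, p=EX + "city", o=city)}
    return sorted(o for (s, _, o) in match(graph, p=EX + "name") if s in subjects)


if __name__ == "__main__":
    print(customers_in(graph, "Oslo"))
```

The point of the sketch is that the per-source work is confined to `to_triples`; once both tables are expressed as triples, one query covers both companies.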
But this is not part of the open web, you say? Indeed, you're right. So, Semantic Web technologies have already succeeded, but not on the open web. And since I'm such an idealist, I want it on the open web. So, the blog still has a valid point.
We need to make compelling reasons why they should put (some) data on the open web. It isn't easy, but then, let TimBL tell you it wasn't easy to get them on the web in the first place. It is not very different, actually. The main approach to this is to capitalise on network effects. There is a lot of public information, and we need to start with that.
So, partly, that's what I'll do. We have emergent use cases, and that's the evil part of cut-throat business. You don't talk about those before they happen. So, sorry about that. I think it will be very compelling, but it'll take a few years. If you're the risk-averse kinda developer who first and foremost has a family to feed, then I understand that you don't want to risk anything, and you can probably jump on the bandwagon a couple of years from now, having lost relatively little.
But if you, like me, like to live on the edge, and don't mind taking risks doing things that of course might fail, then I think semweb is one of the most interesting things right now.
Employee of Inrupt, Project Release Manager and Community Manager for Solid
I think the problem is in the author's head. Difficulties always exist between vendors. They are worked out when the benefits of cooperation outweigh the benefits of non-cooperation. I believe what we are calling the semantic web has other features that many consider failures but that in reality are inherent to sharing the amount of information we are trying to share; namely, universal uniformity and a single clean interface into human civilization (which is really what the semantic web is) are impossible and foolish to hope for. The semantic web will make some things vastly easier and the price we will pay is that other things will become far more difficult. This will stimulate more innovations and hence more problems, etc.
Ivan Handler
I think the big issue right now is that the computer industry doesn't even know what they mean by "Web 2.0" and the marketing departments hide their ignorance admirably by repeating buzzwords until people think they understand concepts they don't. Ok, Tim O'Reilly is careful to define such terms when he uses them (good for him!) but few others seem to do the same.
At least with things like TCP/IP, relational database theory, information theory, and the like, the concepts are well defined, not some mishmash of marketing buzzwordspeak and sloppy definition. Of course, TCP/IP as it is now often taught (via OSI) is just as muddled even though the model is (and ought to be) clear as daylight. If people are going to cover OSI and TCP/IP they ought to cover the entire protocol ideas, design criteria, etc. That way people will *understand* why OSI protocols (like H.323) are so awkward when run on TCP/IP. [/rant]
The big thing is, instead of having a vague marketing buzzword about something, it is helpful if we divide things into usable and practical concepts. Social networking, web services, service oriented architectures, semantic markup, etc. rather than lumping it all together into a vague term that doesn't really mean anything.
LedgerSMB: Open source Accounting/ERP