Semantic Web Getting Real
BlueSalamander writes "Tim O'Reilly just did an interview with Devin Wenig, the CEO-designate of Reuters. With no great enthusiasm I started to read yet another interview on how the semantic web was going to make everything great for everybody. Wenig made some good points about the end of the latency wars in news and the beginning of the battle for automatically detecting linkages and connections in the news. Smart news, not just fast news. Great stuff — but just more words? Nope — a little searching revealed that Reuters just opened access to their corporate semantic technology crown jewels. For free. For anyone. Their Calais API lets you turn unstructured text into a formal RDF graph in about one second. I ran about 5,000 documents through it and played with a subset of them in RDF-Gravity. The results were impressive overall. Is this the start of the semantic web getting real? When big names and big money start to act, not just talk, it may be time to pay attention. Semantic applications anyone? The foundation appears to be here."
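The summary says Calais turns unstructured text into an RDF graph. The Calais response format isn't described here, so the following is only a minimal sketch. It assumes a hypothetical extractor that has already produced (entity type, name) pairs, and shows what writing those out as N-Triples, one plain-text RDF serialization, could look like. The `http://example.org/` namespace and the `mentions`/`name` predicates are made up for illustration.

```python
# Sketch: serialize hypothetical entity-extraction output as RDF N-Triples.
# The (entity_type, name) input format and the example.org namespace are
# assumptions for illustration, not the Calais response format.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/"  # hypothetical namespace


def escape_literal(text: str) -> str:
    """Escape backslashes, quotes and newlines for an N-Triples string literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def slug(text: str) -> str:
    """Make a simple IRI-safe local name by replacing non-alphanumerics with '_'."""
    return "".join(ch if ch.isalnum() else "_" for ch in text)


def entities_to_ntriples(doc_id: str, entities):
    """Turn (entity_type, name) pairs found in one document into N-Triples lines."""
    doc = f"<{BASE}doc/{slug(doc_id)}>"
    lines = []
    for etype, name in entities:
        ent = f"<{BASE}entity/{slug(name)}>"
        lines.append(f"{ent} <{RDF_TYPE}> <{BASE}type/{slug(etype)}> .")
        lines.append(f'{ent} <{BASE}name> "{escape_literal(name)}" .')
        lines.append(f"{doc} <{BASE}mentions> {ent} .")
    return lines


if __name__ == "__main__":
    for line in entities_to_ntriples("story-1", [("Company", "Reuters")]):
        print(line)
```

Each entity gets a type, a human-readable name, and a link back to the document that mentions it, which is enough for a graph viewer to start drawing connections between stories.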
Next up, semantic spam.
Actually, I think it's beaten the rest of the content to the punch. =(
Is the semantic web supposed to be one of those Web 3.0 things?
[Fuck Beta]
o0t!
What good are fancy links if the content still sucks?
I've never understood what the financial benefits for a site joining the semantic web are supposed to be. Reuters may be one thing, but how would you sell this technology to Amazon? Or NewEgg? If commercial sites can't/won't use it, how is it supposed to gain critical mass?
Comment of the year
And now for a host of Anti-Semantic comments in 3 ... 2 ... 1 ...
Well, I am sure the authors will just call them Anti-Zio[a]ntic comments.
So I need this WHY?
Most websites have little to say, and take all day to say it.
Having a detailed graphical analysis of the blather seems unlikely to improve the situation. GI,GO.
It would seem spending just a tad more time writing for HUMANS would be way more productive than writing for machines. Having a thousand computers watching your 100 monkeys seems unlikely to bring enlightenment or useful knowledge out of a pile of garbage and human blathering that passes for information on the web these days.
People used to write web pages.
Now they write software to write web pages.
It's not surprising they now need to write software to understand the web pages.
What's the point?
Sig Battery depleted. Reverting to safe mode.
... now I can surf the semantic web.
Semantic webs might be OK for small document sets where you can visually search tags and click them. Want to look up something about monkeys? Look for the tag that says monkeys (or maybe find primates first, then monkeys) and click it.
But for huge data sets this sucks. After a smallish number of documents & subjects it must be far easier to type monkeys in search box and have Google etc do the search.
This might work for handling some queries, but will suck supremely for complex queries over large data sets (eg. the whole www).
Engineering is the art of compromise.
Just what we need. Yet another version of RealPlayer.
The higher the technology, the sharper that two-edged sword.
If online news outlets cut out the advertising promos that precede every video news clip, it would be a million times more popular than it already is.
I mean, nobody wants to see an advertisement that is twice as long as the video clip itself. People will especially be turned off when they realize they took the time to view a 30-second mattress or advertisement just to view a 45-second to one-minute news clip about a story that is either boring, uninformative as print, or just plain crappy.
Advertising is the Black Plague of all media. Its consumer-repellent ability can't be denied, and the number of good ideas that have been ruined by ads is unimaginable.
Knowing Google's lust for data collection, the Soviet Union is still alive and well inside the psyche of Sergey Brin....
I think the bounty for a WordPress plugin is a neat idea. Having seen how poor the performance of existing tag suggestion tools for WordPress is, maybe Calais can do a better job?
There is something similar at http://www.powerset.com/, though they are still in beta and it's not working that great. We will never get perfect matches from computers, but the question is whether semantics will ever be better than just keywords.
Am I the only one who misread that?
"Wenig made some good points about the end of the latency wars..."
/.'s 'editorial' habits :\
Mr. Wenig must not be all that familiar with
I read it like this:
Semantic web getting real [player]
and immediately thought "it was bad enough when the original web got it"
"Please note this environment may not be completely safe, so we are going to prevent you from entering. We have also initiated so many system processes that it will simulate a virus on this system."
The links in that article are neat. I am looking forward to watching the maturity of this!
The first time I read the title, I thought it said 'Symantec Getting Real'. Well, I was planning to leave a smart comment about how Symantec and Real don't belong in the same sentence.
If you are like me, and have absolutely positively no dang fucking clue what the summary is talking about: http://en.wikipedia.org/wiki/Semantic_Web
According to the Wikipedia history, this concept has been around since at least 2001.
Write your own Choose Your Own Adventure. http://www.freegameengines.org/gamebook-engine/
IMHO this is not the semantic web. The primary representation is still (just) natural language. Anything in addition to that is really just search engine technology under a different banner. Is that a bad thing? No! I've always said the semantic web was bound to fail because people don't want to spend a lot of extra effort tagging their information so others can slice and dice it; instead, the evolution of natural language processing in search (rather than manual tagging) will solve the problem. Maybe the Reuters idea of exposing the "inferred" metadata will be useful (as opposed to normal searches like google who simply keep this metadata in their own indices), though as yet I don't see why.
When you start aggregating as much text as google does, the semantics just starts popping out, in the form of word relationship statistics.
The massive corpus size, when measured carefully, acts to filter semantic signal from expressive difference "noise".
Combine that kind of latent semantic analysis of global human text with conceptual knowledge representation and inference technologies (which would use a combination of higher-order logic, bayesian probability, etc) and it should be possible to create a software program that could start to get a basic semantic understanding of documents and document relationships in the ordinary "dumb" web.
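The claim that semantics "pops out" of word relationship statistics can be illustrated at toy scale. This is a minimal sketch, assuming whitespace tokenization and a fixed context window: it only counts which words appear near each other and compares words by cosine similarity. Real latent semantic analysis would add a dimensionality-reduction step (typically a truncated SVD) over a vastly larger corpus, which is left out here.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=2):
    """For each word, count the words appearing within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    corpus = [
        "the monkey eats a banana",
        "the ape eats a banana",
        "the monkey climbs a tree",
        "the ape climbs a tree",
        "the bank approves a loan",
    ]
    vecs = cooccurrence_vectors(corpus)
    print(cosine(vecs["monkey"], vecs["ape"]), cosine(vecs["monkey"], vecs["bank"]))
```

Nobody told the program that "monkey" and "ape" are related; they come out similar only because they keep the same company. At web scale the same effect is what lets statistics stand in for hand-made tags.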
Could the proponents of the semantic web please tell me what it will add to this?
My basic proposition is that if an averagely intelligent human can infer the semantic essence (the gist, shall we say) of individual documents, and relationships between documents on the web, why can't we build AI software that does the same thing, and then reports its results out to people who ask?
Where are we going and why are we in a handbasket?
Ummh, I think that's the point. The concept - first advocated by Tim Berners-Lee - has been around for a long time. The technology to make it real has not. This is a big step in that direction. It's not the whole answer - but services like this will help overcome one of the key constraining factors: ubiquitous metadata tagging of content.
Finally, Reuters released OpenCalais as free open-source software. OpenDover will appear any time soon. (someone may then connect both using a Channel, SSH perhaps)
Instead of that, I misread Calais as Cialis, which wasn't helped by the first post being about spam...
The best indicator of vaporware seems to be continual postings on Slashdot that something is real.
Given that the Semantic Web is neither Semantic nor Web, I think we've got another data point for that theory.
Nope, I did too, and I was wondering... does this mean that Norton won't crash and slow down Windows computers more than most spyware/viruses?
There is no "disagree" moderation, and troll, flamebait and overrated are not valid substitutes
Actually, the story is about a tool which does (a part of) what you are describing.
Probably easier to get rid of tags...
Enlightenment is a pipe dream. So where's the pipe?
On first read, I like what they are trying to do, but I see so many problems with what they are thinking, and I am not a web designer in any sense.
First, I don't have a problem finding things to buy on the internet. The problem is the signal-to-noise ratio. There are TOO MANY Google results for something like 'plasma tv.' No matter what kind of RDF is used, it will be abused by people who want their URL to show up in your search for whatever reason. I think someone touched on this a little earlier in this thread, but it deserves repeating.
Second, can you imagine a scenario where, say, Best Buy or Fry's uses some 'semantic web' application to do real-time, web-searchable updates of their inventory? That's what would have to happen for this to work and do something that isn't already possible.
Right now, I can search for 'plasma tv' on Google or eBay. Then I can call my local retailers to see if they carry that item and have it in stock. In order for this system to make any kind of tangible change in the example given, retail chains would have to update their inventories online whenever a purchase is made or new items are delivered to the store.
It's an interesting idea. I wonder if the retailers would go for it? All it means for them is fewer people coming into their stores... sounds like that would hurt sales.
I also hate internet hype. It really fouls things up, more than some want to acknowledge. I try to keep my 64 year old dad educated enough to buy coffee beans on ebay, check email, look at news, etc. Every time he sees 'semantic web' or 'web 2.0' in the media, it just confuses him, and, I imagine, people like him who just use the net for basics like online bill pay, ebay, etc. He doesn't need a new buzzword to motivate him to shop online or whatever.
He has the motivation already... silly contrived 'new media' buzzwords just waste time and confuse people.
Thank you Dave Raggett
The semantic web refers to a specific attempt/vision put forth by w3c.
http://www.w3.org/2001/sw/
This article is about a news organization using semantic tools to help extract and manipulate certain data. Sure, they are related a little maybe, but if related meant equal, then every computer would break.
Just because the word "semantic" matches, they've confused the two domains, and if humans can't even do it, I wonder what our automated semantic web would look like with robots trying to make connections. I cannot even begin to imagine how hackable that would be.
In response to: My basic proposition is that if an averagely intelligent human can
the same thing,
Why should I be thankful about spending my adult life working because machines aren't up to the task? I'll be thankful when machines take the work and leave us free to do what we want.
we're going to see a lot of semantic goosestepping and .sig <h1><i>l!</i></h1>s?
"No! I've always said the semantic web was bound to fail because people don't want to spend a lot of extra effort tagging their information so others can slice and dice it"
And yet we have social sites.
I misread it as "Symantec Web Getting Real" and I was like, "wtf? The maker of Norton's website is buying Real?"
I really don't see that happening. The transition to this sort of economy is basically where the problem is now. As human labor is replaced by robotic arms in factories, those employees are left to find another job. Only, their entire skill set has now been replaced, so they are back to square one... They don't receive pay for the rest of their lives just because their job was replaced with a machine that does it better.
"If real is what you can feel, smell, taste and see, then 'real' is simply electrical signals interpreted by your brain."
Also, it's not yet for "anyone." According to the Calais roadmap, only English documents are accepted: "Calais R3 [July 2008] begins
I understand being jaded about internet hype and buzzwords but I'm still surprised that after nearly eighty comments there doesn't seem to be anyone who has anything to say other than "vaporware" and "it won't work because of the spammers." Yes, maybe it has been overhyped and yes it is taking a while for the envisioned ideas to come to fruition but that doesn't mean that those ideas aren't worthwhile.
I'll use the following example because I recently had to do this with non-semantic tools. Let's say you wanted to see how good or bad a job a transit agency is doing in its city in comparison to other similar cities. A couple of metrics you might use to find similar cities would be population size, population density and land area. Google doesn't do a good job with something like that. You end up needing to search for cities individually and then finding their data points. Or you can find a list of cities ranked by population or population density. If you search on Google for something like that you end up at one of the Wikipedia lists. These lists are helpful but... still lacking. They don't contain all the cities you need or they don't provide a way to look at multiple data sets at the same time. The lists are also compiled by hand and aren't automatically updated when the information on the city page is changed. The data is in Wikipedia, though. Every city page lists that information in a little box near the start of the article. But how do I take this data that is in Wikipedia from the form that it's in into a form that I can use to find what I need to know? Enter the semantic web.
Let's say that Wikipedia, or at least the parts dealing with geography, were semantic. Now, there are tens of thousands of pages describing countries, regions, states, counties, parishes, cities, towns and villages. Then those pages are translated into many other languages. Some of the data that these pages contain is of the same type: they all contain the name of the locality, latitude, longitude, size, population size and elevation. For data such as this it would be pretty easy to have a form to enter the data into, as opposed to using the usual markup, and the form could put the data into the proper markup for the page and the proper RDF. Once the data is in proper RDF form it would be easy to automate the process of updating translations of that page with the new data as well as updating any pertinent lists. It would also make it easier for people who want to analyze or use the data because they would be able to access it much more easily.
But nobody really wants machine-readable access to this information, you might say, except for the random geek and researcher. I would disagree. Let's say you're using a program like Marble, which is similar to Google Earth in some ways but is completely open source. If they wanted to display the population of a city when you hover over it, they would currently have to create and maintain their own dataset or they'd have to write a parser to extract it from Wikipedia. Neither of those options is particularly easy at the moment, but if the information was in semantic form on Wikipedia it would be a piece of cake.
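The city-comparison example fits a short sketch. Assuming infobox-style records (the field names and every number below are placeholders, not real city statistics), flattening them into subject/predicate/object triples turns "which cities have a similar population density?" into a few lines of code. Plain Python tuples stand in for RDF here; a real setup would presumably use an RDF store and SPARQL.

```python
def record_to_triples(record):
    """Flatten one infobox-style dict into (subject, predicate, object) triples."""
    subject = record["name"]
    return [(subject, key, value) for key, value in record.items() if key != "name"]


def density(triples, city):
    """Population per square kilometre, read back out of the triples."""
    facts = {p: o for s, p, o in triples if s == city}
    return facts["population"] / facts["area_km2"]


def similar_cities(triples, city, tolerance=0.25):
    """Cities whose density is within `tolerance` (a fraction) of `city`'s density."""
    target = density(triples, city)
    cities = {s for s, _, _ in triples}
    return sorted(c for c in cities
                  if c != city and abs(density(triples, c) - target) <= tolerance * target)


if __name__ == "__main__":
    # Placeholder records for illustration only; not real statistics.
    records = [
        {"name": "CityA", "population": 600_000, "area_km2": 300.0},
        {"name": "CityB", "population": 500_000, "area_km2": 270.0},
        {"name": "CityC", "population": 900_000, "area_km2": 150.0},
    ]
    triples = [t for r in records for t in record_to_triples(r)]
    print(similar_cities(triples, "CityA"))
```

The point is less the arithmetic than the shape: once every city page publishes the same predicates, the hand-compiled list becomes a query that is always up to date.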
The strength of the semantic web isn't, in my opinion, going to be AI-like personal agents or anything like that. It'll be things that in many ways are already here. Like Yelp putting geotags on the restaurants they review, and apps like Google Earth taking that data that's available in machine-readable (Semantic!) form to overlay it on a map so that you can see what's nearby. It'll be applications doing the same with the geotags from Flickr. It's really useful mashups like http://www.housingmaps.com/. It's the transit agency putting real-time bus data up in semantic form so you can see on your iPhone's Google Map how far away the bus is. So yeah, maybe the semantic web is overhyped, but that doesn't mean there isn't a lot of substance there, too.
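Several of these examples (Yelp and Flickr geotags, the real-time bus position) boil down to "what is near this point?" Once locations are published as machine-readable latitude/longitude, answering that is straightforward. Below is a minimal sketch using the haversine great-circle formula, which treats the Earth as a sphere; the place names and coordinates in the usage are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(places, lat, lon, radius_km):
    """Names of (name, lat, lon) places within radius_km of a point, closest first."""
    hits = sorted((haversine_km(lat, lon, p_lat, p_lon), name) for name, p_lat, p_lon in places)
    return [name for dist, name in hits if dist <= radius_km]


if __name__ == "__main__":
    # Hypothetical geotagged places.
    places = [("cafe", 0.0, 0.01), ("bus stop", 0.0, 0.05), ("far away", 1.0, 1.0)]
    print(nearby(places, 0.0, 0.0, 2.0))
```

For a city-sized radius the spherical approximation is more than good enough; the hard part has always been getting the coordinates published in a form a program can find.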
Cheers,
Greg
There's no more "intelligence" in AI than in a can of Campbell's soup. It's basically statistics, linear algebra and (sometimes) handcoded rules for reasoning. It doesn't evolve. It doesn't build upon what it "knows". It has no self-awareness or consciousness, and its reasoning capabilities, if present, are extremely weak compared to those of even children.
We're so early in the development of this field that no one can even define what "self awareness" or "consciousness" really is, let alone how to create it or scale it. Folks try. There's Cycorp, there's Powerset, there are a lot of people in academia who work on NLP, Machine Vision, classification, neuroscience, etc. There is, however, no unifying vision or theory, no understanding of what it is we're trying to build, and the current methods have nothing in common with "intelligence" per se. They do learn, in the sense that they figure out the hidden structure of a given set of data by approximating it using a mathematical model. Even though this model sometimes closely matches what a human brain does (e.g. in multilayer neural nets), they don't come anywhere close to what one would call "intelligence". What they lack is scale (and speed), and the advanced cognitive mechanisms required to become self-learning.
It's also interesting to note that at this point humans know, at a high level, how their brains work. The neocortex is a six-layer neural net with links going cross-layer and neurons organized into columns. Trouble is, there are a hundred billion neurons. We sorta know how vision works, too. Trouble is, we can't work with it in real time (because, naturally, you'd need a chunk of those hundred billion neurons). Heck, even human language is a pain in the ass if you don't have advanced cognition (AKA strong AI), with the ability to understand euphemisms, sarcasm and idioms, paraphrase, generalize and specialize. Heck, even anaphora resolution is not solved yet (i.e. what does he/she/it in the current sentence refer to in the previous text). It's as if you had a bunch of parts and no manual and someone asked you to assemble a spaceship out of what you have, warning you that some parts are broken and may require you to make your own replacements. Without blueprints. Blindfolded. With your hands tied behind your back.
I do believe that in 50 years we will have strong AI, though. I work in a science lab, however, and many researchers don't share my optimism.
The company I work for, Garlik, has two products that are run off semantic web technology: DataPatrol (for pay) and QDOS (free, in beta).
We use RDF stores instead of databases in some places as they are very good at representing graph structures, which are a real pain to deal with in SQL. You often hear the "what can RDF do that SQL can't" type arguments, which are all just nonsense. What can SQL do that a field database, or a bunch of flat files, can't? It's all about what you can do easily enough that you will be bothered to do it.
A fully normalised SQL database has many of the attributes of an RDF store, but
a) when was the last time you saw one in production use?
b) how much of a pain was it to write big queries with outer joins?
RDF + SPARQL makes that kind of thing trivial, and has other fringe side benefits (better standardisation, data portability) that you don't get with SQL.
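To make the RDF-versus-SQL point concrete: a SPARQL query is built from triple patterns that share variables, and each shared variable is effectively a join. The toy matcher below is a hypothetical sketch, not a SPARQL engine (no OPTIONAL, FILTER or indexing). It answers a friends-of-friends question with three patterns; over a single SQL triples table the same question would need two self-joins.

```python
def match(pattern, triple, binding):
    """Extend `binding` so `pattern` matches `triple`; return None on conflict.

    Pattern terms starting with '?' are variables; anything else must match exactly.
    """
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if p in new and new[p] != t:
                return None
            new[p] = t
        elif p != t:
            return None
    return new


def query(triples, patterns):
    """Return every variable binding that satisfies all patterns (a basic graph pattern)."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


if __name__ == "__main__":
    triples = [
        ("alice", "knows", "bob"),
        ("bob", "knows", "carol"),
        ("bob", "name", "Bob"),
        ("carol", "name", "Carol"),
    ]
    # Roughly: SELECT ?n WHERE { alice knows ?x . ?x knows ?y . ?y name ?n }
    print(query(triples, [("alice", "knows", "?x"), ("?x", "knows", "?y"), ("?y", "name", "?n")]))
```

Adding one more hop is one more pattern in the list, rather than one more aliased join in a growing SQL statement, which is the "easy enough that you will be bothered to do it" argument in miniature.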
I guess it shouldn't be a surprise to see the comments consisting of the usual round of more-or-less irrelevant jokes and snide commentary - this is Slashdot after all - but I can't help responding.
I clicked 'here' for a developer key and was told that it had been despatched to jane.jones@gmail.com. Good news for Jane Jones.
timeOday >>> "evolution of natural language processing in search (rather than manual tagging) will solve the problem"
..." anyway.
But then, if you're creating an add-on for Joomla (or any template elements, really) to display event listings, why not add a semantic tag so that a search engine could limit the domain by "tag:events"? The extra effort involved is pretty minimal, especially when, if you code well, each event is probably already in a <div class="event eventtype">.
Once people realise that search engines can do semantic filtering then it will be worth it.
As for tag spamming, surely Google et al. won't accept results based on tags first, but will do their usual contextual/quantitative analyses first and then limit based on tags. So we wouldn't be gaining any spam over what we have now?
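The rank-then-filter idea can be sketched in a few lines. Assuming hypothetical pages that each have a text body and a set of tags, relevance is scored from the text first (a crude term count stands in for a real search engine's ranking), and tags only narrow results that were already relevant. A spam page tagged "events" but containing nothing relevant never shows up for the query.

```python
def score(text, query_terms):
    """Crude relevance: how many times the query terms occur in the text."""
    words = text.lower().split()
    return sum(words.count(term) for term in query_terms)


def search(docs, query, tag=None):
    """Rank docs by text relevance, drop irrelevant ones, then optionally filter by tag."""
    terms = query.lower().split()
    scored = [(score(d["text"], terms), d) for d in docs]
    ranked = [d for s, d in sorted(scored, key=lambda pair: pair[0], reverse=True) if s > 0]
    if tag is not None:
        ranked = [d for d in ranked if tag in d["tags"]]
    return [d["url"] for d in ranked]


if __name__ == "__main__":
    # Hypothetical pages for illustration.
    docs = [
        {"url": "/jazz-night", "text": "jazz night at the club friday jazz", "tags": {"events"}},
        {"url": "/jazz-history", "text": "a short history of jazz", "tags": {"articles"}},
        {"url": "/spam", "text": "cheap pills", "tags": {"events"}},
    ]
    print(search(docs, "jazz", tag="events"))
```

Because the tag is applied after ranking, lying in a tag can only get a page dropped from results it was already relevant for, not promoted into results it has nothing to do with.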
Actually, NLP software does generally use those statistical methods. RDF is a storage and sharing mechanism - that's the big deal.
I try to keep my 64 year old dad educated enough to buy coffee beans on ebay, check email, look at news, etc. Every time he sees 'symantic web' or 'web 2.0' in the media, it just confuses him, and I imagine, people like him who just use the net for basics like online bill pay, ebay, etc.
I'm afraid whenever I see this argument I immediately tend to discredit all the rest that I've read in that post. Designing technology for those who are least able to uptake it is a losing proposition at best; at worst a total disaster. Technology has always been utilized by those less set in their ways first, less invested in the capital and experience of doing it the 'old way', and is only more broadly adopted once it proves out as a better way to do things. Universal acceptance tends to only come after a generation; when those who are poorly situated to utilize it have passed on.
This speaks to your other concern rather tellingly. Fry's may not put their inventory online. But if Best Buy does, and reaps more rewards, then you can bet eventually all companies will do this as standard practice. Far more likely - a company that is smaller and more mobile will do it first, and then get bought out by a larger company that will adopt its practices in order to stay potent in a changing marketplace.
But the successful online inventory app is not going to design for Best Buy first. They're going to design for the mom-and-pop shop, and scale up to whatever customer they can find. When it proves out or doesn't, there will be tangible evidence for others to act on - rather than meaningless hype.
Finally, I think what the semantic web provides is more the ability of the end user to control results. As we perfect our ability to parse machine language, we perfect our ability to hear clear signal amongst all the noise. I look forward to the day when we have this technology in more than a nascent stage, and think it's silly to dismiss it before then.
Also, I look forward to the day when people stop designing for me. Because presumably I'll be happy with what I have!
[Ego]out
NASA is another big name that recently started using semweb technology "for real": "Last week POPS--the expertise location service we built for NASA--went into production as an Agency-wide application; it's thought to be the first "institutional" (that is, business) Semantic Web app deployed Agency-wide at NASA." http://clarkparsia.com/weblog/2008/02/07/our-babys-all-grows-up/
I'm sure there is some sort of semantic joke in there somewhere but I can't find it.
I didn't try 5,000 documents, only two: one general text and one financial news story. The results I got were promising, but at the same time, IMO, not reliable and actionable enough that I would use this technology today to buy/sell stocks automatically. - http://lebleu.org/blog/2008/02/10/kicking-the-tires-with-opencalais/ - http://lebleu.org/blog/2008/02/03/microsoft-offer-to-buy-yahoo-semantic-analysis-by-opencalais/
And back then, we talked about trying to implement it. Then I read this:
http://www.well.com/~doctorow/metacrap.htm
Put a wet blanket on the whole idea (thankfully).
http://meta.wikimedia.org/wiki/Semantic_MediaWiki
It's basically just a matter of tweaking it and putting some real data in.
You are not a brain: http://books.google.com/books?id=2oV61CeDx-YC
I never mentioned design! You didn't read my post very well, did you? I said that the HYPE of buzzwords like 'semantic web' or 'web 2.0' is lame, unnecessarily confusing, and annoying. The word hype was the first word in the subject of my post!
Here, I've copied the paragraph from my post that you read incorrectly, emphasis mine
You are a troll...either that, or you are not the sharpest knife in the drawer.
What does the Semantic Web offer, why do we care?
/. and k5 have been serving up RDF of their frontpages for years and that today we regularly use RSS feeds and some black magic to do similar things.
I'm going to try to add some personal perspective in addition to the worthy Wikipedia article linked in the parent, because I see a lot of criticism in these threads and not for the traditionally criticism-worthy issues. In case you're wondering, I was involved in a non-trivial Semantic Web related project in 2005: a learning experience, I won't mention it further. That said, I could be _totally_ wrong; but this is how I see things.
The first question: Why do we want the Semantic Web? Sure, it sounds fancy, but why should _you_ (the average Slashdot reader) be excited about it? Well, let me explain why _I'm_ excited and maybe you'll agree... I tend to move data around, through systems and people, changing formats as necessary, making logical decisions based on the data as appropriate. Often there's a convenient library or tool or API for assisting me in doing this, making my life easier by abstracting the process of getting at that data and mashing it around into something more immediately useful. From my perspective, the Semantic Web will give me that power at a new level of convenience. Semantic markup, formats, ontologies, etc, allow data-centric code to be written more quickly and with less reinventing of wheels. Ever written a screen scraper? A perl script to pull data out of a proprietary log format? The Semantic Web will not be a panacea for these kinds of problems, but if we can convince people to mark up more data in reasonably common/standard ways then hopefully things like software mashups should become easier than ever! When software is better able to understand what data _is_, without huge amounts of domain-specific programmer effort, making decisions based on that data should be easier. Take a look at Firefox, microformats, and SPARQL, for example. Do users care about the Semantic Web? I don't think so, because all they should see is basically the same old browser-rendered Web. However, our ability (as software developers and general geeks) to produce useful tools and websites using Semantic Web data may result in even better websites and dynamic services.
Second question: What does the Semantic Web look like? Not like Gravity, in my opinion. Dynamic graphs are handy visualization tools for some kinds of data, but definitely not all! In fact, they're pretty brittle and they don't scale at all. There are a lot of interesting proposed solutions to the visualization problem (see SIMILE and MIT's Haystack), but I don't think it really matters. Within a specific domain, there will always be better visualization tools (written by those familiar with the domain) than a generalized visualization method. So, the Semantic Web will look basically the same as the current web. In fact, if you start looking carefully, I think you'll see it all around you...
Third question: Why is "open data" exciting and what's the difference between just opening a MySQL database to the public and the Semantic Web vision? Well, if a site is exposing its "database" in RDF using a common ontology, then you can make use of their data just as you'd use their services via an API. A data provider may not foresee all the potentially useful ways to use their data just as they may not foresee ways to make use of their API, but a clever programmer can take from their surroundings what is needed and make of it something more. If you think this is random, note that
As I said, I could be way off the mark here. This is just the simplified perspective I've adopted after thinking about it for a while and reading the common sources. Please don't take this as gospel or as thorough; comments and corrections are very welcome.
I read that article, and it convinced me of nothing. All it says is that meta-data is not perfect, and will not create a utopia. Duh.
I don't know what provoked your vitriol. I'm not a troll - but the moderators are welcome to disagree. Since they haven't yet, I'm currently disposed to thinking you're overreacting to my disagreement with your viewpoint. I'm as happy as you seem to be to let it lay, however.
http://developer.opencalais.com/forum/read/14617