SPARQL Graduates to W3C Recommendation
KjetilK writes "The W3C just gave SPARQL the stamp of approval. SPARQL is a query language for the Semantic Web, and differs from other query languages in that it is usable across different data sources. There are already 14 implementations of the spec available. Most of them are free software. There are also billions of relations out there that are query-able, thanks to the Linking Open Data project. The structured data of Wikipedia is now query-able at DBpedia. Also, have a look at Ivan Herman's presentations on this topic."
A query language for the semantic web...
A what for the what now?
I'd always assumed the semantic web was some meaningless and faded buzzword designed to keep the W3C away from useful stuff. Is it back again with a vengeance?
THE SEMANTIC WEB II: THIS TIME IT'S FOLKSONOMY
Eek.
Mr Sparquru! You have very lucky dishes!
"Sometimes, I doubt your commitment to SPARQL Motion! "
...
With apologies to Donnie Darko
A Human Right
The Semantic Web is not important for the casual user (I think Google is pretty good now), but for a machine trying to converse with a human being, it is a great advance. I myself have an open source project on Google Code that had a placeholder for just this item. Thank god it's coming along.
Every time there is a story about the Semantic Web here, people trot out the old "It's utopian vaporware" nonsense. The technologies that stand behind the term "Semantic Web" have existed for nearly a decade now and have produced much fruit. Just see Visualizing the Semantic Web by Geroimenko & Chen (Springer-Verlag, 2nd ed. 2005) which has plenty of real-world examples of using these technologies to get real work done.
Sure, the average joe isn't producing semantically meaningful markup when he uses his whizbang Web 2.0 sites, but then again what the average joe produces isn't worth all that much anyway. Even if the Semantic Web doesn't expand to include all Internet activity, it has and continues to do much good.
I think Berners-Lee just shot his load.
Sure, the average joe isn't producing semantically meaningful markup when he uses his whizbang Web 2.0 sites, but then again what the average joe produces isn't worth all that much anyway. Even if the Semantic Web doesn't expand to include all Internet activity, it has and continues to do much good.
Cutting a swathe through your charmingly misplaced snobbery for a second, the ideal thing would be for you to provide a useful example or two of this human thing called SEMANTIWEB, and explain to silly old me how it has already changed my life but I'm just too gosh darned ordinary to have noticed.
Yet another web markup thingy I have to learn...
Web development surely is a bitch.
The OP did already refer you to a book, so why are you asking for examples?
Oh, it is actually really simple. See, first thing is that you link two documents. That's good old HTML. Then, you realise that you would want to link anything. Like persons. So, you give those persons a URI. You can't retrieve a person over the Internet, that's why it is a URI, not a URL, but you can get a description of the person. And then you realise that you want to say something about the nature of the relationship. So you put in a third URI that says something about the relationship. For example that the person knows that other person, or is his son, or something.
so
<http://www.kjetil.kjernsmo.net/foaf#me> <http://xmlns.com/foaf/0.1/knows> <http://www.w3.org/People/Berners-Lee/card#i>
simply says that I know timbl. I hope you're less stumped than you used to be.
If you grok this, you've grokked 90% of RDF.
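For anyone who prefers code to prose, here is a minimal Python sketch of the idea above: a statement is just a (subject, predicate, object) tuple of URIs, and a query is a pattern with wildcards. The `match` helper is a made-up name for illustration, not part of any RDF library.

```python
# A tiny in-memory sketch of RDF-style statements as
# (subject, predicate, object) tuples. The URIs are the ones from the post;
# the match() helper is hypothetical, not a real library API.
ME = "http://www.kjetil.kjernsmo.net/foaf#me"
KNOWS = "http://xmlns.com/foaf/0.1/knows"
TIMBL = "http://www.w3.org/People/Berners-Lee/card#i"

triples = {(ME, KNOWS, TIMBL)}


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# "Whom do I know?" is the pattern (ME, KNOWS, anything).
known_by_me = [o for (_, _, o) in match(triples, s=ME, p=KNOWS)]
print(known_by_me)
```

A real store adds indexing, literals and named graphs, but the pattern-with-wildcards idea is roughly what a SPARQL triple pattern expresses.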
Employee of Inrupt, Project Release Manager and Community Manager for Solid
Well, no, it hasn't changed your life just yet, but you could check out a few links in the story; there is a lot of potential there. I'm not going to run off on conspiracy theories, but it is pretty clear that many big players like to keep things under lock and key, and that's a hurdle that makes this take slightly longer.
In my submission, I gave an example query, which you can run at DBPedia with their standard prefixes:
SELECT ?name ?birth ?death ?person WHERE
{ ?person skos:subject ;
    dbpedia2:birth ?birth ;
    foaf:name ?name .
  OPTIONAL { ?person dbpedia2:death ?death }
  FILTER (?birth "1945-01-01"^^xsd:date) . }
ORDER BY ?name
What this says is: "give me the name, birth date and death date of a person that has the following properties: it is a computer scientist, it has a birth date and a name, and optionally a death date; then filter based on the birth date and order the results by name."
There are now billions of such statements you can query, and if you're open-minded, it could indeed change your life.
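To make the moving parts of that query concrete, here is a rough Python sketch that evaluates the same shape of query over a handful of in-memory triples. Everything in it is an assumption for illustration: the people, names and dates are made up, the short property names stand in for the prefixed ones, and since the comparison operator in the FILTER did not survive in the post, the sketch assumes "born before the cutoff".

```python
from datetime import date

# Hypothetical data: every identifier, name and date here is made up.
triples = [
    ("p1", "subject", "Computer_scientists"),
    ("p1", "name", "Ada Example"),
    ("p1", "birth", date(1930, 5, 1)),
    ("p1", "death", date(2001, 2, 3)),
    ("p2", "subject", "Computer_scientists"),
    ("p2", "name", "Bob Example"),
    ("p2", "birth", date(1960, 7, 9)),
    ("p3", "subject", "Computer_scientists"),
    ("p3", "name", "Aaron Example"),
    ("p3", "birth", date(1940, 1, 1)),
    ("p4", "subject", "Painters"),
    ("p4", "name", "Cy Example"),
    ("p4", "birth", date(1920, 1, 1)),
]


def values(subject, predicate):
    """All objects stated for a given subject and predicate."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]


def query(category, cutoff):
    rows = []
    # Required patterns: category, name and birth must all be present.
    people = [s for (s, p, o) in triples if p == "subject" and o == category]
    for person in people:
        for name in values(person, "name"):
            for birth in values(person, "birth"):
                # FILTER: assumed to mean "born before the cutoff".
                if not birth < cutoff:
                    continue
                # OPTIONAL: keep the row even when no death date is stated.
                for death in values(person, "death") or [None]:
                    rows.append((name, birth, death, person))
    # ORDER BY ?name
    return sorted(rows, key=lambda row: row[0])


rows = query("Computer_scientists", date(1945, 1, 1))
for row in rows:
    print(row)
```

The point of OPTIONAL shows up in the row for the person without a death date: the row is still returned, with the death column left empty.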
I read the FAQ, and once again we are reminded that graph theory, so fundamental to computer science, is not about making charts. But man, it's a terrible word, because one does want to think about graph as in graphic, when it's really about the data. I think instead of graphs, we should call them something different, like:
schmoo
zenny
budka
dango
chumpy
or something. anything but graph.
This is my sig.
Maybe because he wants a quick example that doesn't require him buying a book.
Oh, come on. I can't be the only one that noticed that example doesn't work...
Am I the only person who looked at that name 'SPARQL' and went 'Is that Sun's new name for MySQL?'
My blog
The thing that bugs me a lot about this so-called semantic web is its reliance on humans to be accurate. Our minds do not operate on the same clear-cut logic as a machine, in other words we are able to make inferences from semantics.
To use your current example, what if your person was classified as a "programmer" or "software engineer" rather than a computer scientist? I understand that there are varying meanings for that word; my computer science teacher used to call first-year students "computer scientists" although they were inexperienced n00bs w/o a good understanding of the subject.
... not the semantic web
relations out there that are query-able
Gee, if only there was a general theory of data management that would allow us to query RELATIONS. Wouldn't it be nice if this theory was complete, could handle any data management task, and had a closed algebra to build queries.
Too bad. I guess we'll have to wait until 1970 for someone to describe it. Until then, we can stick with more limited pointer-based data representations like trees and graphs, and relations that only have exactly three attributes. And everybody can create their own ad-hoc query languages and give them funny-sounding names. That would, like, totally rock.
I think I'll name mine FUQLBOLU - Fucked-Up Query Language Based On Limited Understanding
(That's pronounced FUKKEL-BOW-LOO, buddy.)
I suppose it's cool to emphasize the semantic web use of SPARQL. But, at its core SPARQL is a query language for RDF data stores. It takes some learning, but using SPARQL against an RDF data store feels much cleaner than using SQL against a relational database. It's slower though. Much slower. That's why it works best for small data sets.
My company stores the schema for our objects in RDF and uses SPARQL to query against that schema. The actual data is saved to a relational database (our experiments with an all-RDF system concluded that it's just too slow for large data sets).
The RDF data stores can exist in arbitrary places (they don't need to be local), but I wonder how slow that would be to query.
Nevertheless, I encourage people to at least learn about this stuff. It's good for the same reason that learning about Ruby and Python is a good thing even if you only ever program in Java or C++. RDF and SPARQL make you start thinking about inferences and ways of storing data which allow you to derive more information from your information. When I first learned about RDF I had the same type of aha moments that I had when I first learned a dynamic language (FWIW, it was TADS3) after years of using static languages.
Cow Cube
SUNSHINE!
Ubuntu is an African word meaning 'I can't configure Debian'
In my submission, I gave an example query, which you can run at DBPedia with their standard prefixes:
Maybe my own search skills are rusty, but I couldn't find actual documents anywhere in the site, just various gibberish examples. In other words, is there actual documentation - especially a list of properties - anywhere ?
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
Free mod points to anyone who can give me the "So What" summary. The summary is useless and the linked articles failed to inform. Usually this just means a circle jerk, but who knows, there might be something useful or important in there.
I think the semantic web would be incredible, once it is widely implemented by content providers - a great example is dbpedia's query
"Soccer player with tricot number 11 from club with stadium with >40000 seats born in a country with more than 10M inhabitants"
but, as far as I can see, it's just too tedious to implement. There should be something in between full-fledged semantics and stuff like RSS, which exposes information in a rather un-semantic way.
I just ran into a problem trying to unify various "Event" feeds using RSS from various websites in a central calendar. Very stupid work, since it just feels wrong writing various parsers for such simple information as "date, subject, text". And if people can't even get around to using readily available standards for stuff like this, how will they ever implement the semantic web?
My 'solution' would be a gigantic awareness campaign on how cool it would be for everybody involved to use some kind of standard (even something like microformats)....
Guess we'll finally find out now. The Semantic Web remains Tim Berners-Lee's vanity project: well-intended but poorly thought out and unfortunately unwanted.
And after reading the standard, most of the articles, and looking at a couple of implementations, not only have I hit the arbitrary but fairly high limit on what I will put up with before my eyes glaze over, I've also hit the 'don't give a s---' limit as well.
One of their projects reports having indexed and interlinked "over two billion RDF triples, which are interlinked by around 3 million RDF links (October 2007)". Well frabulous. And so what??? Even looking at their graphic for all of the different databits that they have linked in, I don't find myself all that interested for a simple reason: to use the information with any kind of speed, I still have to take the data I can acquire and convert it into something that I can structure into a high speed high power database locally.
Which means secondarily that really what I want more is for other smart people to take the interlinked documents and the associated data and mine it and put it out in some location where I can use the data en masse and at high speed for my own purposes. Not to learn yet another SQL variant that on its best day will still be dog-slow.
I guess what I am saying is that -- while I understand the goals and purposes of the semantic web and the tools including SPARQL that are being developed for it-- I don't know when or even if the glaze factor will ever be low enough to capture my interest enough vs. looking for or maybe even buying the data that someone else has aggregated from the Semantic web. Thoughts?
...Open Source isn't the only answer -- but it's almost always a better value than the alternatives...
SPARQL is both a recursive acronym and contains other acronyms! SPARQL Protocol and RDF Query Language.
I vote this worst acronym ever!
http://www.openlinksw.com/index.htm
The guy mentioned turns out to be the founder and CEO, and he keeps a personal blog space with a lot of stuff about SPARQL, but man, protect your eyeballs from the vision gouging link clutter. Has all the visual appeal of a rental car insurance application form.
http://www.openlinksw.com/blog/~kidehen/
Even includes a link to the Zitgist data viewer. Amazingly, that domain was still available.
http://www.zitgist.com/
"Zitgist (pronounced "zeitgeist") is an industry standards compliant Semantic Web Query Service. Its goal is to help Web users locate data, information, and knowledge on the Web."
My god, hope springs eternal. The only occasion I'd pronounce "zit" as "zeit" is to rhyme it with "shite". Also, that's capital Web, pronounced "veeb".
Except you've already posted in this thread. Good luck modding me up.
Looks like there's a SPARQL grammar from which JavaCC can generate a parser (and, since it's a JJTree grammar, an abstract syntax tree). Nice to have that piece of work available and BSD licensed....
The Army reading list
After working in Semantic Web technologies for the last few years, and trying to integrate them into the current web, I can verify there are a lot of valid reasons people are skeptical.
1. True, RDF stores tend to be slow. Triple/RDF stores are sometimes built on top of SQL databases, and (for example) the database has to do a million inner joins per query. Column stores and other native graph stores offer some hope for this problem.
2. True, SPARQL isn't that hard to learn, and it's simpler than expressing the same kind of query in SQL.
3. True, SQL will be around for quite a while. Tables and RDBMSs are well understood and sufficient for many kinds of data which can be structured. Genetic data, NLP and other more atomic data needs a more generalized storage/query structure.
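To illustrate point 1, here is a small sketch using Python's built-in sqlite3 module. It assumes the simplest possible layout, a single three-column table of triples (real stores use more elaborate schemas and indexes), and the rows are made up. Note how every additional property in the query costs another self-join on the same table.

```python
import sqlite3

# Assumed layout: one table holding every statement as (s, p, o).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        # Made-up rows for illustration only.
        ("p1", "name", "Ada Example"),
        ("p1", "birth", "1930-05-01"),
        ("p1", "death", "2001-02-03"),
        ("p2", "name", "Bob Example"),
        ("p2", "birth", "1960-07-09"),
    ],
)

# One alias of the table per property: name (n), birth (b), death (d).
# The LEFT JOIN plays the role that OPTIONAL plays in SPARQL.
sql = """
SELECT n.o AS name, b.o AS birth, d.o AS death
FROM triples AS n
JOIN triples AS b ON b.s = n.s AND b.p = 'birth'
LEFT JOIN triples AS d ON d.s = n.s AND d.p = 'death'
WHERE n.p = 'name'
ORDER BY name
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

Three properties already mean three passes over the same table; a query touching ten properties means ten, which is roughly where the join explosion comes from.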
But the W3C is doing good work by preempting the problems that come from not standardizing. (Looking at Iraq, pre-emptive strategies aren't always the best.) It's hard now because no one like Apple or Sun is throwing huge dollars at the standard. But HP has done a lot of work with Jena. There's OpenRDF, and YARS from DERI Ireland. There's ARC2, which runs in PHP and is easy to work with.
Yes, ANOTHER technology. But my job isn't just making web pages for money. The vision is to bring information to people and improve the stability and standard of life. Technology like this is already used by those with a lot of resources - so it's important to have some level of knowledge like this "for the people". (And no, SPARQL is not communist.... I think.)
Alas, "dango" is already taken. Calling them "meshes" or "networks" seems reasonable to me, though I suppose current usage is already well established.
Theoretically, classification is not a singleton value but a list of values.
My classifications could include "league bowler" "husband" "programmer" "database programmer" "texas resident" etc.
Layne
Oracle, Franz and OpenLink (the latter having a free software version), are about to show that in practice. I think we're getting there.
And people have been doing really large databases with great performance on in-house applications. I haven't seen them live, but I have read their papers.
So, it is an important point that you raise, but I think it has a solution in the near future too.
I have experimented with RDF for many years (best toolkit for experimenting, I think, is the Swi-prolog semantic web library: http://www.swi-prolog.org/packages/semweb.html).
:-)
I much prefer the higher level OWL representation with description logic, but the problem is that support for lower level RDF is much better. There are commercial and open source OWL + description logic reasoner packages, but there is much better coverage for RDF tools. In any case, with the exception of the expensive (commercial) Lisp-based AllegroGraph and RacerPro tools, and the open source Swi-prolog semantic web library, almost all of the tools I have tried (Sesame, Jena, Pellet, etc.) are written in Java. Fortunately, most of the Java tools are open source, so playing/experimenting only costs your time.
Yeah, but this would just be tagging :)
In RDF, since you can relate any resource (i.e. concept) to any other, you can also relate "tags" (e.g. rdf:type properties) to each other. RDF features some simple inference-enabling vocabulary for creating taxonomies out of these, OWL offers even more and this is not the limit.
Then you can easily discover similar "tags" by analysing the number of common instances and semantic distance between them both in the taxonomy created by RDF/OWL vocabulary as well as any other.
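A rough Python sketch of both ideas, with a made-up taxonomy and made-up individuals: subclass links are followed transitively (the kind of inference a subclass vocabulary enables), and two "tags" are compared by the overlap of their instances. The Jaccard measure used here is one simple choice of mine, not something RDF or OWL prescribes, and taxonomy distance is not covered.

```python
# Hypothetical taxonomy: each tag maps to its direct parent tags.
sub_class_of = {
    "database programmer": {"programmer"},
    "programmer": {"person"},
    "league bowler": {"person"},
}

# Hypothetical directly asserted instances per tag.
instances = {
    "database programmer": {"alice"},
    "programmer": {"alice", "bob"},
    "league bowler": {"alice"},
}


def ancestors(tag):
    """All tags reachable by following subclass links transitively."""
    seen = set()
    stack = [tag]
    while stack:
        for parent in sub_class_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_tags(individual):
    """Directly asserted tags plus everything implied by the taxonomy."""
    direct = {t for t, members in instances.items() if individual in members}
    result = set(direct)
    for tag in direct:
        result |= ancestors(tag)
    return result


def overlap(a, b):
    """Jaccard similarity of two tags based on their common instances."""
    union = instances[a] | instances[b]
    return len(instances[a] & instances[b]) / len(union) if union else 0.0


print(inferred_tags("bob"))
print(overlap("programmer", "database programmer"))
```

Nobody tagged bob as a "person", yet the taxonomy lets a machine conclude it, which is the difference from plain tagging.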
A great example of this is the results for the sample query "Mayors of US cities higher than 1000m" -- of the ten results, Roger Reed, mayor of Fredericktown, Ohio, is mayor of a city that is 1090 feet above sea level.
"They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
But, you know someone out there is thinking to themselves: "How can I use this new technology to spam people."
Just like everything else, somehow someone is going to try to shove their advertising down it.
Fascism trolls keeping me up every night. When I starts a preachin', he HITS ME WITH HIS REICH!
Or, to generalize: the problem with the "semantic web" is that Good Old-Fashioned AI failed, and somebody seems to have failed to get the memo. The "semantic web" really is just "expert systems, now with XML! (but don't call them that!)." Somebody failed to read or understand late Wittgenstein.
Are you adequate?
Yesterday, I searched for myself on DBPedia and found one row :|
containing my name, birth date and death date which seems to be today!
WTF!
Too bad it's not XML. I really like to generate/modify my XSLT scripts using XSLT.
I recall reading that Chris Date has indicated that RDF can be used as a foundation for relational theory (it's in "sixth normal form").... the major difference is that most RDF processors make an open-world assumption vs. most databases assuming a "closed world".
-Stu
So what's the big improvement of your example over, say
SELECT name, birth, death from person
WHERE yada, yada, yada (or perhaps OUTER JOIN depending on the structure)
AND birth = '1945-01-01'
ORDER BY name
I really can't see that the query syntax will change anyone's life. I'm sure that data sets that are non-relational and 2D will be a great thing and that the query language for it won't.
Oh, I can't help quoting you because everything that you said rings true
Perhaps I'm being naive, but can't you use XQuery for this? I was thinking yesterday, after the Sun/MySQL announcement, that if I had to pick something today likely to follow the same trajectory as Monty's little project, it would be eXist-db, or some other NXD.
I, for one, shall kneel before our great querying overlords.
The hurdle I have always seen with this kind of meta-data collection is that it takes a huge amount of largely manual effort to convert the plentiful textual data into "semantic" meta-data. For example, it's easy for a person to know that something like "DOB: 1/1/01", "foo was born on Jan, 1, 1901", "foo born son of bar in the late 16th century" and a picture of a family tree with dates on it all represent the same semantic data, but how can we extract that with little effort?
As far as I can tell, the semantic web solution is to meta-tag by hand, which is going to lead to problems such as interpretation. For example, imagine classifying musical genres and you already run into problems: are acoustic guitar duo Rodrigo y Gabriela "metal", "pop", "flamenco", "acoustic", all of the above? That is of course not to mention the enormous amount of manual (or at best semi-automated) work involved in the tagging.
Further, the space of semantic relations between an arbitrary set of objects is enormous (certainly lower bounded by n!, and given you can have an arbitrary selection of semantic relation types, the total space is essentially infinite). Your example isn't actually very useful if you are researching computer scientists, where DOB and DOD aren't really very relevant. Can I, for example, make the much more useful query on computer scientists who have published at least 10 journal articles and who are also behaviourists but not connectionists? It requires a domain expert to give judgement on exactly what denotes a behaviourist or a connectionist, and some of that information might be inferred. For example, if a person has published a paper on novel back propagation techniques for neural nets, they are probably a connectionist.
All this basically means is that by necessity the semantic space is (and always will be) sparsely populated until we have solved the natural language semantics problem. Which means that most of the time the data that someone would want to query for just won't be there.
I have discovered a truly remarkable sig which this post is too small to contain.
THIS
IS
SPARQL!!!
A person could be classified as a programmer AND a Software Engineer AND a computer scientist, and/or the taxonomy could have those terms as synonyms. OWL/RDF supports multiple classification.
It also works on OpenWorld reasoning, which means that even if the terms programmer/software engineer/computer scientist aren't synonymous in your system, and even if an individual belongs to only one of those classes, it isn't assumed that they DON'T belong to the other two classes. Not unless the three classes are declared disjoint, so that individuals in one class can't belong to the other two.
Accuracy still counts, but individuals aren't just one thing, and therefore don't have to be unnaturally pigeonholed into just one class.
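A toy Python sketch of that open-world behaviour, under loud assumptions: the individual and the `belongs` helper are made up, and this covers only direct assertions and pairwise disjointness, nothing like what a real OWL reasoner does. The answer is three-valued: True, False, or None for "not known either way".

```python
def belongs(individual, cls, asserted, disjoint_pairs):
    """Open-world membership test: True, False, or None (unknown).

    asserted maps an individual to the classes stated for it;
    disjoint_pairs holds frozensets of two classes declared disjoint.
    """
    known = asserted.get(individual, set())
    if cls in known:
        return True
    # Only an explicit disjointness declaration lets us answer "no".
    for other in known:
        if frozenset((cls, other)) in disjoint_pairs:
            return False
    # Absence of a statement is not evidence of absence.
    return None


# Hypothetical individual with a single asserted class.
asserted = {"pat": {"programmer"}}

# No disjointness declared: membership in other classes stays unknown.
print(belongs("pat", "computer scientist", asserted, set()))

# With the classes declared disjoint, the answer becomes a definite no.
disjoint = {frozenset(("programmer", "computer scientist"))}
print(belongs("pat", "computer scientist", asserted, disjoint))
```

A closed-world system, by contrast, would collapse the unknown case into False, which is the pigeonholing the parent is worried about.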
This is an oft-repeated argument against the Semantic Web, but it doesn't hold up to close examination. The Semantic Web doesn't rely on human accuracy any more than computer applications in general do, and the Semantic Web also provides a platform on which one can establish distributed trust systems, etc., to address problems associated with source unreliability.
That's what defined ontologies exist to deal with.