Slashdot Mirror


Practical RDF

briandonovan writes "World Wide Web Consortium (W3C) Director Tim Berners-Lee and his compatriots would like to transform the current Web into a 'Semantic Web' where 'software agents roaming from page to page can readily carry out sophisticated tasks for users' using 'structured collections of information and sets of inference rules.' The Resource Description Framework (RDF), designed as a language for expressing information about resources on the Web, and allied technologies are the result to date of ongoing efforts at the W3C to furnish Semantic Web proponents with the requisite tools. While it's far too early to predict whether TimBL's grand vision will be realized, RDF/XML (the XML serialization of RDF) is already in widespread use, having been incorporated into a surprising array of applications." Read on below for briandonovan's link-stuffed review of O'Reilly's Practical RDF.

Practical RDF: Solving Problems with the Resource Description Framework
author: Shelley Powers
pages: 331
publisher: O'Reilly & Associates
rating: 9/10
reviewer: Brian Donovan
ISBN: 0596002637
summary: Great introduction to RDF, an assortment of tools and utilities for working with RDF, and some real-world applications.

RDF first hit my radar screen a couple of years ago while I was working on a barebones tool to manage my personal website. I was writing the code to generate RSS feeds ("What is RSS?") for my site and had to choose whether to support RSS 0.9x (non-RDF) or RSS 1.0 (RDF-based) or both. Long story short: I went with RSS 1.0 and was able to implement the feeds, but never got any further into RDF afterwards. I couldn't make headway through the RDF-related working drafts rapidly enough to justify the time that I was spending, there weren't any worthwhile-looking books available at the time, and the few online tutorials that I found were sorely lacking -- possibly because the specs themselves were still evolving as the RDF Core Working Group hashed out some remaining issues.

Fast forward a few years: the dust in RDF-land seems to be settling a bit (although new working drafts of all of the current RDF specs were released on September 5th, most of the changes from previous versions appear to be relatively minor) and, with the publication of Shelley Powers' Practical RDF: Solving Problems with the Resource Description Framework, there's finally a good book available on the subject.

Overview

After an introductory chapter that touches on the history of RDF and some applications of RDF/XML (the preferred, W3C-blessed serialization of RDF), the book is divided into three broad sections. In the first, the reader is guided through the raft of documentation produced by the RDF Core WG, including: Resource Description Framework (RDF): Concepts and Abstract Data Model, RDF/XML Syntax Specification, RDF Model Theory (formerly Semantics), and RDF Vocabulary Description Language 1.0: RDF Schema. Before moving on to Part II, where she surveys programming language support and tools available for working with RDF (with code snippets where appropriate), Powers spends a chapter developing an RDF vocabulary, "PostCon," that's used throughout the remainder of the book for demo purposes.

Chapter 7, the first in the tools-focused portion of Practical RDF, is dedicated to (mostly Java-based) editors, parsers, validators, browsers, etc. for desktop use. Next, she dives into Jena, the Java RDF toolkit that began life as the labor of love of HP Labs researcher Brian McBride before being elevated to the status of a formal HP Labs project under their Semantic Web Research umbrella. Another HP Labs Semantic Web project, Damian Steer's BrownSauce, a slick little Java-based RDF browser, was introduced back in Chapter 7. Means for manipulating RDF/XML in Perl (RDF::Core, part of Ginger Alliance's PerlRDF project), PHP (RAP, the RDF API for PHP), and Python (RDFLib) are addressed in Chapter 9. RDF query engines/languages are taken up next -- rdfDB QL, the query language of R.V. Guha's rdfDB (written in C); SquishQL, implemented in the Java-based Inkling query engine (built atop PostgreSQL); RDQL, used within Jena; and Sesame, a JSP/Servlet querying engine that supports both RDQL and its own query language, RQL, and can be deployed atop MySQL or PostgreSQL. Powers rounds out this part of her book with a chapter that deals briefly with the leftovers. Drive, an RDF API for C#, is briefly discussed along with RDF APIs for less fashionable programming languages: Nokia's Wilbur for CLOS, XOTcl for Tcl, and RubyRDF for Ruby. Redland, an RDF toolkit written in C with Java, Perl, PHP, Python, Ruby, and Tcl wrappers, is covered at some length (about half a dozen pages) and a couple more are given over to Redfoot, a Python RDF framework consisting of RDFLib (mentioned earlier in the Perl/PHP/Python chapter), a small-footprint HTTP server (according to the changelog at redfoot.net, they're using Medusa), and a native scripting language called Hypercode that lives within CDATA blocks in RDF/XML.

The last third of Practical RDF is devoted to uses of RDF and begins with a chapter on the OWL Web Ontology Language, an extension to RDF that's designed to supply more constraints for RDF vocabularies than can be provided by RDF Schema alone. This chapter would have been better situated after Chapter 5, which addresses RDF Schema, and feels a bit out of place here. RSS 1.0, the RDF-based syndication format, gets a chapter all of its own, beginning with a short synopsis of the evolution of RSS and the rift between the RSS 0.9x/2.0 and RSS 1.0 camps, progressing through descriptions of the RSS elements, some discussion of the use of modules, RSS autodiscovery, and aggregators (Amphetadesk, Meerkat, and NetNewsWire are mentioned), and finishing with an example RSS file (a syndicated list of book recommendations), producing RSS 1.0 using the Informa RSS Library (a set of Java classes), and merging two RSS 1.0 files using the XML::RSS Perl module. Two "Applications Based on RDF" (commercial and noncommercial) chapters top off the book. Noncommercial applications of RDF are visited first: Mozilla, where history and bookmarks, among other classes of information, are stored in RDF; the Creative Commons licensing scheme, whose proponents encourage content creators to embed RDF snippets into their documents and applications to provide information about the work itself and the restrictions placed on its reuse under the particular CC license that they've chosen; a Java and PostgreSQL based digital library system jointly developed by MIT and HP that uses RDF; and FOAF (Friend-of-a-Friend), an RDF vocabulary designed to express personal information and interpersonal relationships. Among the commercial applications utilizing RDF that make up the final chapter in the book is Chandler, the same as-yet very-alpha personal information manager that's managed to garner multiple mentions on this site.

The Verdict

The real meat of Practical RDF, for me, was in Chapters 1 through 6 (plus the OWL chapter, Chapter 12). This is not to say that the material in the last 2/3 of the book isn't useful or interesting. The section on RDF software tools is a great annotated survey of what's out there right now ... and I would imagine that installing and test-driving each of the software applications featured in those chapters must have been an extremely time-consuming process. The chapters describing real-world applications of RDF could be useful to someone trying to convince a manager that RDF is a viable, widely-used technology. Given a choice, though, I would rather have seen those pages spent on additional coverage of RDF, RDFS, and OWL with more example RDF vocabularies developed (like PostCon, which the author formulated, then refined through RDFS and OWL). The displaced material could have been made available online at the author's site for the book. A lot of that information will become less accurate over time anyway, as the software evolves and people come up with more applications for RDF.

All nitpicking aside, though, if you're looking for a book on RDF, then you can't go wrong with Shelley Powers' Practical RDF.


18 of 120 comments

  1. RDF is quite practical by stonebeat.org · · Score: 4, Informative

    RDF is quite practical - with or without the book. There are several hundred websites explaining how to use RDF in your application. There are classes for Java, PHP, etc. for this purpose. An interesting use of RSS is to integrate it with IMAP, and get the latest email to show up on your portal page.

  2. Inside RDF is a smaller language... by Googol · · Score: 2, Interesting


    RDF is a great idea. But it needs to lose the Java and the XML. People who are attracted to those have no use for RDF--they want messages they can read without documentation. I know XML is more than that, but in the corporate world its attraction is "configuration files I can read after the author was outsourced".

    There are two XML movements--one creating a kludgy layer of application bureaucracy and the other visionary. RDF presently combines the worst of both. Neither "side" really wants it. AI is happy with ontologies and the corporate world is happy with messages 100 times larger than the underlying network protocol. (Could be worse: ASN.1 anyone?)

    *BUT* the underlying idea of RDF (ontologies for your metadata) is a good one. RDF schema is really more important than RDF syntax. The idea is a simple model for describing metamodels. This fits in the same space as UML metamodels, and the Common Warehouse metamodels, only it is much more lightweight and you can implement it with existing tools (you do have to use XML--eeeewww).

    XML serves one good purpose--it makes s-expressions socially respectable in the corporate world and for that I am grateful. They almost got Scheme in too (DSSSL), but the angle-bracket police got them. Too bad.

    RDF can sneak in metaprogramming if you let it.

    =googol=

    1. Re:Inside RDF is a smaller language... by Googol · · Score: 2, Informative
      > Ahem. Could you please elaborate more on this? If I am not mistaken RDF is good for creating links to physical resources with a certain kind of criteria that can be shared between different kind of applications.

      What you are describing is the most common and original application of RDF--streaming content. RDF itself is also an XML-compatible syntax and a schema (box and arrow diagram) for that syntax. The intended interpretation of the syntax is the description of "models". That is, the RDF is a metamodel and its schema is a meta-metamodel.

      Think of it like a four-layer model. You have your message (data), the structure of your message (metadata), the structure of your metadata (RDF), and the structure of RDF (RDF schema). Why 4 layers? Because everyone gets tired after 3--you don't need more than pointer to pointer to void in C: **mydata. Data plus two levels of abstraction is enough--you reach closure since a pointer to a pointer is still a pointer.

      Of course data describing data is also data, so you could stop at three levels. But everyone likes to think about models, not data, so you get three levels of models. RDF Schema is just a model for describing metamodels. Nothing to do with content at all, except as an application.

      =googol=
  3. RDF is not RDF/XML Was: Stop the XML madness by wdebruij · · Score: 2, Informative

    For a research project I was actually doing a bit of reading about RDF and OWL yesterday. When you do, you occasionally come across these types of remarks.

    > java + XML = demand for 4+ghz CPUs

    Let's make one thing clear: RDF is not an instantiation of the XML syntax. You can use XML to transfer RDF statements, but for reasoning other, internal, representations are to be preferred.
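To make the distinction concrete, here is a minimal sketch of an internal representation that never touches XML: RDF statements held as plain (subject, predicate, object) tuples with one index per position. The `TripleStore` class, its method names, and the `example.org` URIs are hypothetical, chosen only for illustration; this is not how the SWI-Prolog library or any other toolkit mentioned here is implemented.

```python
from collections import defaultdict

# The Dublin Core element namespace is real; the example.org URIs are
# hypothetical placeholders.
DC = "http://purl.org/dc/elements/1.1/"
EX = "http://example.org/"


class TripleStore:
    """Hypothetical in-memory store: a set of triples plus three indexes."""

    def __init__(self):
        self.triples = set()
        self.by_subject = defaultdict(set)
        self.by_predicate = defaultdict(set)
        self.by_object = defaultdict(set)

    def add(self, s, p, o):
        """Record one statement and index it by each of its positions."""
        triple = (s, p, o)
        self.triples.add(triple)
        self.by_subject[s].add(triple)
        self.by_predicate[p].add(triple)
        self.by_object[o].add(triple)

    def match(self, s=None, p=None, o=None):
        """Return all triples matching the pattern; None is a wildcard."""
        candidates = self.triples
        if s is not None:
            candidates = candidates & self.by_subject.get(s, set())
        if p is not None:
            candidates = candidates & self.by_predicate.get(p, set())
        if o is not None:
            candidates = candidates & self.by_object.get(o, set())
        return set(candidates)


store = TripleStore()
store.add(EX + "article1", DC + "title", "Practical RDF review")
store.add(EX + "article1", DC + "creator", "Brian Donovan")
store.add(EX + "article2", DC + "creator", "Brian Donovan")

# Which resources have "Brian Donovan" as their dc:creator?
by_brian = store.match(p=DC + "creator", o="Brian Donovan")
subjects = sorted(s for s, _, _ in by_brian)
print(subjects)
```

RDF/XML would only come into play when these statements are read from or written to a file; the pattern matching itself works on the tuples.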

    As I'm working on a Prolog project that needs RDF I use the SWI-Prolog RDF library, which, according to this recent paper (pdf)*, speeds up processing 22-fold compared to using the RDF/XML serialization syntax. Please note that Mozilla uses Prolog+RDF as well.

    (*) here's google's html version of the paper

  4. Re:Beautiful by Shimbo · · Score: 3, Funny

    > XML hasn't been widely adapted yet

    True, but would you want to see 'XML the Movie' ?

  5. Dave Winer... by antic · · Score: 5, Funny


    $50 says Dave Winer is pissed off that he didn't get mentioned in this write-up...

    Double that if I do a gleeful dance when it turns out that he is. Weeee!

    --
    'Thats they exact same thing a banana wrench monkey.'
  6. Spoilers. by Captain+Large+Face · · Score: 2, Funny

    Damn it.. I hate it when reviewers give away the ending.. :(

  7. Re:Stop the XML madness by julesh · · Score: 4, Insightful

    > java + XML = demand for 4+ghz CPUs

    Err.. OK.

    1. Java runs perfectly adequately for me on my 400 MHz machine. Typical application startup times are ~1 second which is generally acceptable, and once the application is running there's not normally a noticeable difference between it and a 'native' application (whatever that might mean for you...). (Note the distinction between noticeable and measurable, also please bear in mind that I'm not talking about AWT/Swing apps here, those really are slow, but that's the library not the language that's responsible, IMHO).

    2. XML might be a little slower to process than other similarly expressive data formats (eg s-expressions, ASN.1 and similar). Maybe by a factor of 10, even. However, the data formats I am comparing it to were considered acceptable for use on 4 MHz processors, and even then the I/O time was a lot more significant than processor time for such operations. Processor speed growth has substantially outpaced IO speed growth over that period.

    AFAICT the only people "demanding 4 GHz CPUs" are the "I've got a better PC than you" crowd, serious gamers, and people who are doing really demanding applications, like video editing or scientific applications (or who want to do a lot of work on ).

  8. It Needs More Vocabulary Descriptions by Erisian+Pope · · Score: 4, Interesting

    I just finished skimming the whole book and reading about half. My biggest complaint is there isn't much guidance as to where you should go and define your own vocabulary and where you should use an existing one. The only vocabulary discussed besides the RDF core is Dublin Core. To make things worse, most of the examples show a custom vocabulary that unnecessarily defines 'Author' and 'Title' instead of using Dublin Core's 'creator' and 'title'.

    I like RDF a lot, it's really a great tool, but without some serious guidance and discipline when defining vocabularies it's going to descend into babble and become pretty useless.
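As a rough illustration of the point about reusing Dublin Core, the sketch below rewrites statements that use custom 'Author' and 'Title' properties so that they use the shared `creator` and `title` terms instead. The Dublin Core namespace is the real one; the `example.org` custom namespace, the `EQUIVALENTS` table, and the `prefer_shared_terms` function are assumptions made up for this example and do not reflect the book's actual PostCon vocabulary.

```python
# Dublin Core element set namespace (real); CUSTOM is a hypothetical
# stand-in for a home-grown vocabulary.
DC = "http://purl.org/dc/elements/1.1/"
CUSTOM = "http://example.org/custom#"

# Hypothetical table mapping custom properties to established equivalents.
EQUIVALENTS = {
    CUSTOM + "Author": DC + "creator",
    CUSTOM + "Title": DC + "title",
}


def prefer_shared_terms(triples, equivalents=EQUIVALENTS):
    """Swap custom predicates for shared ones; leave everything else alone."""
    return {(s, equivalents.get(p, p), o) for s, p, o in triples}


custom_graph = {
    ("http://example.org/post/1", CUSTOM + "Author", "Shelley Powers"),
    ("http://example.org/post/1", CUSTOM + "Title", "Practical RDF"),
    ("http://example.org/post/1", CUSTOM + "Status", "draft"),
}

shared_graph = prefer_shared_terms(custom_graph)
for triple in sorted(shared_graph):
    print(triple)
```

The third statement keeps its custom predicate, which matches the comment's distinction: define your own terms only where no established vocabulary already covers the concept.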

    Does anyone know of a good resource for finding emerging standards for RDF vocabularies so we don't all go out and reinvent the wheel?

  9. Serious flaws in the current semantic web model by Anonymous Coward · · Score: 4, Insightful

    I've been working in this area. First off, the reviewer is wrong. There are very few production systems using RDF. In fact most of it right now is pure academic research. The commercial implementations of RDF graft on a whole bunch of things to make it useful. One critical flaw of the current thinking is the assumption that a URI is authoritative and persistent. In other words, a URI uniquely identifies a domain and does not change. That is a fallacy which does not hold for commercial sites. URIs/URLs are rarely persistent or authoritative. RuleML in my opinion is a much better approach to building a semantic web. As far as OWL goes, it is horribly broken and the commercial industry is moving towards other models of ontology. Most are actually going with a web services model, rather than a strict ontology. There are numerous issues and problems which the current semantic web doesn't address. For example, the whole concept of binding is poorly addressed and is not flexible. Many of the researchers believe RDF should be the object model, but companies are using schema, RELAX NG and XMI. Semantic web holds a lot of promise if only they work out these critical issues.

  10. Re:Scary page by danbri · · Score: 2, Interesting

    Yeah, it was staring into their cold dead eyes that had me generate this one for the FOAFCorp,
    http://www.foaf-project.org/images/foaflets.corp.png

    (foafcorp: http://rdfweb.org/foafcorp/intro.html -- reworking of theyrule.net data in rdf and svg)

  11. RDF Tools by MarkWatson · · Score: 2, Informative
    I enjoyed this book review - useful, and the links to tools are useful.

    One tool not mentioned: the semantic web library for SWI-Prolog that provides a high level toolkit for dealing with RDF, OWL, etc. Since the hoped-for use of RDF is applications that make logical inferences, Prolog seems like a good language to use :-)

    The Jena and Sesame packages are written in Java and also are very good tools.

    The big problem is getting people to use RDF - this technology can only be useful if enough people use it (think FAX machines).

    I believe that the earliest large scale adoption of Semantic Web technologies will really be on company LANs and be used for organizing company/organizational information.

    Think of shifting from information technology to knowledge management technology.

    -Mark

  12. Re:What kinds of advanced searches? by gatekeep · · Score: 2, Insightful

    You're not thinking outside the box. What the world really needs is searching along the line of "All the pr0n since yesterday with red-haired women."

  13. Where're the Semantics? by plasticmillion · · Score: 2, Insightful
    I have to admit that I haven't been following RDF closely for a year or so, but I did spend a lot of time investigating the standardization effort from its inception (in like 1996... no joke). At the time I was struck by the appallingly obfuscated specification and syntax.

    It seems like a lot of progress has been made since then, but personally I still don't see the point. If you buy into XML as the "lingua franca" of semantic data interchange, then great. I do too. But what exactly is RDF useful for? If we can agree on an XML schema for our data, we can exchange it directly without the need for yet another layer of abstraction on top of it.

    The really hard part is agreeing on the schemas, and this has nothing to do with RDF. Having worked in one XML vocabulary standardization effort (Universal Business Language), I can only stress that the technical and political challenges of getting any group of individuals and companies to agree on any common data format are enormous. For example, it would be great if Amazon and B&N used the same schema for their book descriptions, but imagine trying to make this happen (particularly as they are likely to feel that the specificities of their formats represent some kind of competitive advantage).

    So until proven wrong I continue to believe that RDF is nothing but smoke and mirrors. The easy stuff is done by XML right out of the box, and the hard stuff has nothing whatsoever to do with data structures and wire serialization formats.

    1. Re:Where're the Semantics? by danbri · · Score: 2, Insightful

      The RDF design addresses the concerns you raise, by virtue of RDF's focus on data merging. You can't take two arbitrary XML documents and (without domain knowledge) reliably merge the information they encode. You can with RDF; just merge the sets of triples that constitute the two RDF graphs. This has knock-on effects in the real world: the granularity of "mixing and matching" between independent vocabularies is much finer. Instead of picking whole document formats, you can use just some parts of another's RDF vocabulary. This gets us away from a situation where you have to decide to use, or not use, an entire XML vocabulary.

      For example, FOAF documents often contain bits of markup designed in other fora, alongside terms from the core FOAF vocabulary: markup that describes places (lat/long/alt etc.), documents (Dublin Core), syndication (RSS), 1000s of noun terms (Wordnet), and various others (blood type, food preferences, biographical details).

      RDF makes it cheaper to put together this sort of composite information, since the groups (formal and informal) who came up with these vocabularies didn't need to sit around a table together and agree a single common DTD or XML schema. They each did what they do best, and RDF glues it all together.
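The merging argument can be sketched in a few lines, assuming each graph is modelled as a Python set of (subject, predicate, object) tuples. The FOAF, Dublin Core, and WGS84 geo namespaces below are the published ones; the person, document, and place URIs and all the data values are hypothetical. A full RDF merge also has to keep blank nodes from different graphs apart, which this sketch sidesteps by using only URIs and literals.

```python
# Published vocabulary namespaces.
FOAF = "http://xmlns.com/foaf/0.1/"
DC = "http://purl.org/dc/elements/1.1/"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"

# Hypothetical resources used only for this example.
PERSON = "http://example.org/people#dan"
DOC = "http://example.org/docs/intro.html"
PLACE = "http://example.org/places#bristol"

# One source describes the person and what they made (FOAF terms only).
graph_a = {
    (PERSON, FOAF + "name", "Dan"),
    (PERSON, FOAF + "made", DOC),
}

# An independent source describes the document (Dublin Core) and a place
# (geo terms), and happens to repeat one statement from graph_a.
graph_b = {
    (PERSON, FOAF + "name", "Dan"),
    (DOC, DC + "title", "Introduction"),
    (PERSON, FOAF + "based_near", PLACE),
    (PLACE, GEO + "lat", "51.45"),
}

# Merging is set union: no domain knowledge, no schema negotiation, and
# the duplicated statement collapses into one.
merged = graph_a | graph_b

# The merged graph answers a question neither source could answer alone:
# titles of documents made by people named "Dan".
titles = {
    title
    for person, p1, name in merged
    if p1 == FOAF + "name" and name == "Dan"
    for maker, p2, doc in merged
    if maker == person and p2 == FOAF + "made"
    for subject, p3, title in merged
    if subject == doc and p3 == DC + "title"
}
print(titles)
```

Two arbitrary XML documents have no comparable generic merge step, because the meaning of nesting and element order depends on each document's schema.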

    2. Re:Where're the Semantics? by plasticmillion · · Score: 2, Insightful
      Perhaps I am playing devil's advocate here, but not intentionally. I really don't get it. Let's say I design a set of XML schemas using XSD, along the lines that you mention (i.e. places, documents, syndication, etc.). Each one has its own namespace.

      Why couldn't I just make an FOAF schema that pulls in the element types from the appropriate "component" schemas, qualifying the types with the correct namespaces?

      It still strikes me that RDF is simply an alternative to XSD, and it's not clear to me why it is a better one.

  14. / Because providers always tell the truth... / by *weasel · · Score: 3, Interesting

    /(...)/ == sarcasm

    On our staggeringly democratic web, anyone can be a publisher, and as Meta tags have shown - not everyone has the truth in mind.

    I find it odd to note that it is never discussed how RDF will be kept from rapidly degenerating into Meta-tag style abuse.

    Will there be an authority that will verify content descriptors, or at least handle complaints of abuse?

    I would honestly like someone to prove me wrong, to show me where the technology prevents, handles and/or reduces abuse. Because I'm genuinely excited about what is possible with a trustworthy intelligent network. However, I'm just not seeing it here.

    Even normally trustworthy hosts tend to have some disingenuous information in their RSS feeds when they think it will benefit their business.

    (E.g. altering post dates or posting phantom or questionable updates to get more hits from feed subscribers, broadly labelling their content to avoid being properly categorized to expand their exposure, etc.)

    So is it accounted for?

    --
    // "Can't clowns and pirates just -try- to get along?"
  15. Re:OWL is not for what the reviewer thinks by SpammersAreScum · · Score: 2, Informative

    Sorry, I have to disagree with this, having developed ontologies in both DAML and OWL. Both build on RDF and RDF Schema. An OWL ontology uses subClassOf, subPropertyOf, domain, range, et al out of the rdfs (RDF Schema) namespace.
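A small sketch of what that layering looks like in practice, again with a graph modelled as a set of tuples: classes are typed as `owl:Class`, while the hierarchy itself is stated with `rdfs:subClassOf` from the RDF Schema namespace. The three namespace URIs are the W3C ones; the `example.org` class names and the `superclasses` helper are hypothetical, and a real reasoner does far more than this transitive walk.

```python
# W3C namespaces for RDF, RDF Schema, and OWL.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Hypothetical ontology namespace.
EX = "http://example.org/onto#"

ontology = {
    (EX + "Document", RDF + "type", OWL + "Class"),
    (EX + "Article", RDF + "type", OWL + "Class"),
    (EX + "Review", RDF + "type", OWL + "Class"),
    # The hierarchy uses the RDFS term, not an OWL-specific one.
    (EX + "Article", RDFS + "subClassOf", EX + "Document"),
    (EX + "Review", RDFS + "subClassOf", EX + "Article"),
}


def superclasses(graph, cls):
    """Collect every class reachable from cls via rdfs:subClassOf."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if s == current and p == RDFS + "subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


review_supers = superclasses(ontology, EX + "Review")
print(sorted(review_supers))
```

The `found` check also keeps the walk from looping forever if an ontology happens to contain a subclass cycle.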