Slashdot Mirror


Google Open Sources Its Data Interchange Format

A number of readers have noted Google's open sourcing of their internal data interchange format, called Protocol Buffers (here's the code and the doc). Google elevator statement for Protocol Buffers is "a language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more." It's the way data is formatted to move around inside of Google. Betanews spotlights some of Protocol Buffers' contrasts with XML and IDL, with which it is most comparable. Google's blogger claims, "And, yes, it is very fast — at least an order of magnitude faster than XML."

7 of 332 comments (clear)

  1. compare to thrift ( from facebook) by Anonymous Coward · · Score: 5, Informative

    both really from the same design sheet, but thrift has been opensource'd for over a year, and has many more language bindings. its been in use in several opensource projects (thrudb comes to mind), and has much more extant articles/documentation.

    http://developers.facebook.com/thrift/

  2. Re:No PERL API ??!!?? by yknott · · Score: 5, Informative

    According to Brad Fitzpatrick's(of LiveJounral fame) blog, He's working on Perl support.

  3. Re:WTF am I missing by jandrese · · Score: 5, Informative

    They open sourced the compiler (for C++, Java, and Python) that lets you actually use the data interchange format. If you follow the link you can download the code and start using it today. The code is open source.

    --

    I read the internet for the articles.
  4. Re:JSON by Temporal · · Score: 4, Informative

    Structurally Protocol Buffers are similar to JSON, yes. In fact, you could use the classes generated by the Protocol Buffer compiler together with some code that encodes and decodes them in JSON. This is something some Google projects do internally since it's useful for communicating with AJAX apps. Writing a custom encoding that operates on arbitrary protocol buffer classes is actually pretty easy since all protocol message objects have a reflection interface (even in C++).

    The advantage of using the protocol buffer format instead of JSON is that it's smaller and faster, but you sacrifice human-readability.

  5. Re:Why another encoding scheme? by Abcd1234 · · Score: 4, Informative

    You think? Take BigTable. Wikipedia describes it as: '"a sparse, distributed multi-dimensional sorted map", sharing characteristics of both row-oriented and column-oriented databases'. Sounds, to me, like a specialized solution to a very specialized problem, a problem that, I presume, didn't fit with any existing solution. Same goes with GFS. After all, do you really think they didn't evaluate existing solutions before embarking on building an entirely new distributed filesystem? Do you really think they're that stupid?

    As for Protocol Buffers, given the existing solutions out there (such as ASN.1 and CORBA) are generally ugly and/or over-engineered, it sounds to me like they're simply addressing a gap in the industry... after all, XML and SOAP aren't the end-all and be-all of generic object-passing protocols.

  6. Re:An order of magnitude over XML? by jd · · Score: 4, Informative
    Technically, you are correct - platform-agnostic data transfer has been possible since Sun's earliest RPC implementations. However, this seems to be considerably lighter-weight (although so is Mount Everest) and because order is specified, it's going to be much simpler to pluck specific data out of a data stream. You don't need to have an order-agnostic structure and then an ordering layer in each language-specific library.

    There have been all kinds of attempts to produce this sort of stuff. RPC, DCE, Corba, DCOM, etc, are programmatic interfaces and handle function calls, synchronization, etc. OPeNDAP is probably the closest to Google's architecture in that it is ONLY data. It's more sophisticated, as it handles much more complex data types than mere structures, but it has its own overheads issues. It isn't designed to scale to terabyte databases, although it DOES scale extremely well and is definitely the preferred method of delivering high-volume structured scientific data - at least when compared to the RPC family of methods, or indeed the XML family. I wouldn't use it for the kind of volume of data Google handles, though, you'd kill the servers.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  7. Re:Likely story! by cnettel · · Score: 4, Informative

    The problem is that, in my experience, it is easy to write a 99 % XML-compliant parser that is 10 times faster. That last percent, though...