Google Spanner: First Globally Scalable Database With External Consistency
vu1986 writes with this bit from GigaOm: "Google has made public the details of its Spanner database technology, which allows a database to store data across multiple data centers, millions of machines and trillions of rows. But it's not just larger than the average database, Spanner also allows applications that use the database to dictate where specific data is stored so as to reduce latency when retrieving it. Making this whole concept work is what Google calls its True Time API, which combines an atomic clock and a GPS clock to timestamp data so it can then be synched across as many data centers and machines as needed."
Original paper. The article focuses heavily on the TrueTime API, but external consistency on a global scale seems to be the big deal here. From the paper: "Even though many projects happily use Bigtable, we have also consistently received complaints from users that Bigtable can be difficult to use for some kinds of applications: those that have complex, evolving schemas, or those that want strong consistency in the presence of wide-area replication. ... Many applications at Google have chosen to use Megastore because of its semi-relational data model and support for synchronous replication, despite its relatively poor write throughput. As a consequence, Spanner has evolved from a Bigtable-like versioned key-value store into a temporal multi-version database. Data is stored in schematized semi-relational tables; data is versioned, and each version is automatically timestamped with its commit time; old versions of data are subject to configurable garbage-collection policies; and applications can read data at old timestamps. Spanner supports general-purpose transactions, and provides a SQL-based query language." Update: 09/20 17:57 GMT by T: Also in a story at Slash BI.
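To make the quoted data model concrete (every version stamped with its commit time, reads at old timestamps, old versions garbage-collected), here is a minimal in-memory sketch. It is a hypothetical illustration, not Spanner's API: the class `VersionedStore`, its methods, and the "keep the newest N versions" GC policy are all assumptions chosen for the example.

```python
import bisect


class VersionedStore:
    """Hypothetical multi-version map: writes add a (commit_ts, value) version
    instead of overwriting, so older snapshots stay readable until GC."""

    def __init__(self):
        self._ts = {}    # key -> sorted list of commit timestamps
        self._vals = {}  # key -> values, parallel to self._ts[key]

    def write(self, key, value, commit_ts):
        ts = self._ts.setdefault(key, [])
        vals = self._vals.setdefault(key, [])
        # Insert the new version so timestamps stay in ascending order.
        i = bisect.bisect_right(ts, commit_ts)
        ts.insert(i, commit_ts)
        vals.insert(i, value)

    def read_at(self, key, t):
        # Snapshot read: newest version whose commit timestamp is <= t.
        ts = self._ts.get(key, [])
        i = bisect.bisect_right(ts, t)
        return self._vals[key][i - 1] if i > 0 else None

    def gc_keep_latest(self, n):
        # Assumed GC policy for illustration: keep only the n newest versions.
        if n < 1:
            raise ValueError("must keep at least one version")
        for key in self._ts:
            self._ts[key] = self._ts[key][-n:]
            self._vals[key] = self._vals[key][-n:]
```

For example, after writing "a" at timestamp 10 and "b" at 20, `read_at("x", 15)` returns "a" and `read_at("x", 25)` returns "b". Once GC has discarded the version at 10, a read at 15 finds nothing; a real system would reject such a too-old read rather than silently return None.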
until Google decides to make it (and all your data) go away...
Spanner has two features that are difficult to implement in a distributed database: it provides externally consistent reads and writes, and globally-consistent reads across the database at a timestamp.
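The paper ties external consistency to TrueTime plus a "commit wait": pick a commit timestamp no earlier than the latest possible current time, then hold the commit until that timestamp has definitely passed. A hypothetical single-process sketch follows. The fixed uncertainty `EPSILON`, the helpers `tt_now` and `commit`, and the use of the local monotonic clock as a stand-in for true time are all assumptions for illustration.

```python
import time

EPSILON = 0.005  # assumed clock uncertainty in seconds (illustrative only)


def tt_now():
    """Return (earliest, latest), an interval assumed to contain true time.
    The local monotonic clock stands in for true time here."""
    t = time.monotonic()
    return (t - EPSILON, t + EPSILON)


def commit(apply_writes):
    # Pick commit timestamp s no smaller than any clock could read right now.
    s = tt_now()[1]
    # Commit wait: block until even the earliest possible time is past s.
    while tt_now()[0] <= s:
        time.sleep(EPSILON / 10)
    apply_writes()  # only now do the writes become visible
    return s
```

If T2 starts after T1's `commit` has returned, T2's timestamp is guaranteed to be larger than T1's, so timestamp order matches real-time order. The price is a wait of roughly twice the uncertainty on every commit, which is why shrinking the error bound matters.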
One of the issues with large distributed data systems has been that reads at different nodes could return data in different (though each internally consistent) states. I have seen this on Google: one search shows a recent news item, another doesn't, and only later does the item reach all nodes and become generally available.
"Making this whole concept work is what Google calls its True Time API, which combines an atomic clock and a GPS clock to timestamp data so it can then be synched across as many data centers and machines as needed."
I'm guessing there's a little more to it than reinventing and installing ntp on your DBMS server. That little bit more is the actual interesting part.
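One piece of that "little bit more", per the Spanner paper, is that each machine's time daemon polls several time masters and uses a variant of Marzullo's algorithm to reject "liars" whose intervals disagree with the rest. Below is the textbook form of Marzullo's algorithm, not Spanner's variant; the function name and sample intervals are hypothetical.

```python
def marzullo(intervals):
    """Find the smallest interval consistent with the most sources.

    intervals: list of (low, high) time estimates, one per source.
    Returns (count, low, high).
    """
    edges = []
    for low, high in intervals:
        edges.append((low, -1))   # -1 marks the start of an interval
        edges.append((high, +1))  # +1 marks the end of an interval
    # Sort by offset; at equal offsets starts (-1) sort before ends (+1),
    # so intervals that merely touch still count as overlapping.
    edges.sort()
    best, count = 0, 0
    best_low = best_high = None
    for i, (offset, kind) in enumerate(edges):
        count -= kind  # a start raises the count, an end lowers it
        if count > best:
            # A start just raised the count, so a later end edge must exist.
            best = count
            best_low, best_high = offset, edges[i + 1][0]
    return best, best_low, best_high
```

With three masters reporting (100.0, 102.0), (101.0, 103.0) and (150.0, 151.0), the result is (2, 101.0, 102.0): the outlier is outvoted and the clock is steered toward the interval the other two agree on.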
"Spanner also allows applications that use the database to dictate where specific data is stored ..."
Would this put data out of legal reach of the Patriot Act if the data were stored in Sweden or the Cayman Islands?
I wonder what Doug will be up to for the next year or so.
From the original paper linked at the summary post above:
Spanner’s data model is not purely relational, in that rows must have names. More precisely, every table is required to have an ordered set of one or more primary-key columns.
OK, relational keys should not be ordered. But the fact that each table must have a key makes it a relation, at least in principle, so Spanner at first looks like it is in fact more relational than SQL. Am I missing anything?
Leandro Guimarães Faria Corcete DUTRA
DA, DBA, SysAdmin, Data Modeller
GNU Project, Debian GNU/Lin
What happens when governments decide it's time to tamper with or block GPS signals?
"Spanner" can mean "peeping tom" in German. Go figure.
The press coverage of this in Germany should be interesting. In German, "Spanner" is a derogatory term for a voyeur. Given that many Germans already feel like Google is watching everything they do, the irony won't be lost on them!
...and I thought, these Android apps are really getting out of hand....
I can imagine a lot of people misreading "Google Spanner" as "Google Spammer".
In section 4, first paragraph, last sentence: a read at t will see "every transaction that has committed as of t". It seems like it's really "every transaction that has committed with timestamp <= t"?
TT (TrueTime) is pretty neat because it knows that somewhere there is an absolutely correct time, but nowhere in the DB do they actually know what the time is, only its value within an error bound. That said, there is no way the DB can know that something was committed at an exact time t, only that it was labeled with a timestamp close to the exact time, which is good enough.
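The paper exposes that error bound directly: TT.now() returns an interval [earliest, latest] guaranteed to contain the absolute time, TT.after(t) is true only if t has definitely passed, and TT.before(t) is true only if t has definitely not arrived. A hypothetical Python model of those three calls follows; `TTInterval`, `TrueTime`, the injected clock and the fixed epsilon are assumptions, and a fake clock keeps the behavior deterministic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


class TrueTime:
    """Hypothetical model of the TrueTime calls described in the Spanner paper.
    `clock` returns the local clock reading; `epsilon` is its assumed error bound."""

    def __init__(self, clock, epsilon):
        self.clock = clock
        self.epsilon = epsilon

    def now(self):
        t = self.clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, t):
        # True only if t has definitely passed.
        return t < self.now().earliest

    def before(self, t):
        # True only if t has definitely not arrived yet.
        return t > self.now().latest
```

With a clock stuck at 1000 ms and epsilon = 4 ms, now() is [996, 1004]; after(995) and before(1005) are true, while both after(1000) and before(1000) are false, which is exactly the "within an error bound" uncertainty described above.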
So can you read the next row by primary key, i.e. the next record greater than the one you are at?
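Since every Spanner table has an ordered set of primary-key columns, rows can be kept sorted by the key tuple, and "the next record after this one" becomes a successor search. A hypothetical in-memory sketch follows; it is not Spanner's read API, and the table contents and `next_row` are invented for illustration.

```python
import bisect

# Rows keyed by a composite primary key (user_id, album_id); hypothetical data.
rows = {
    (1, 10): "Holiday",
    (1, 12): "Wedding",
    (2, 3): "Cats",
}
# Tuples compare column by column, which matches an ordered composite key.
sorted_keys = sorted(rows)


def next_row(key):
    """Return (key, value) for the smallest key strictly greater than `key`,
    or None if there is no such key."""
    i = bisect.bisect_right(sorted_keys, key)
    if i == len(sorted_keys):
        return None
    k = sorted_keys[i]
    return k, rows[k]
```

`next_row((1, 10))` gives `((1, 12), "Wedding")`, and `next_row((1, 12))` crosses into the next user and gives `((2, 3), "Cats")`. The key passed in need not exist: `next_row((1, 11))` also returns the (1, 12) row.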
I wonder how this compares in performance to the best Oracle can offer? Should they be looking over their shoulder, or looking ahead and figuring out how to catch up?
Two 100 ppm crystals might give you a 200 µs/s error, but if you measure how they are drifting with respect to the timeservers, the change in frequency offset should be more than an order of magnitude smaller. This might lower the crystal's contribution to the error bar by 10-100x.
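The arithmetic can be made concrete. If the local clock may drift at up to a given rate between synchronizations, the error it contributes grows linearly from zero right after a sync to rate × interval just before the next one (1 ppm is 1 µs per second). The 30 s poll interval below is an assumption for illustration (the Spanner paper cites 30 s polling with a 200 µs/s applied drift rate, giving its 0 to 6 ms sawtooth); the 10x and 100x factors are the estimate above, not measurements.

```python
def max_drift_error_ms(drift_us_per_s, poll_interval_s):
    """Worst-case clock error (ms) just before the next sync, assuming the
    error grows linearly from zero after each sync."""
    return drift_us_per_s * poll_interval_s / 1000.0


# Two crystals each off by up to 100 ppm can disagree by 200 ppm = 200 us/s.
raw_drift = 100 + 100  # us per second
poll_interval = 30     # seconds (assumed sync interval)

print(max_drift_error_ms(raw_drift, poll_interval))        # 6.0
# If tracking the drift shrinks the residual 10-100x:
print(max_drift_error_ms(raw_drift / 10, poll_interval))   # 0.6
print(max_drift_error_ms(raw_drift / 100, poll_interval))  # 0.06
```

So at a 30 s interval, the crystal's share of the error bar would drop from about 6 ms to somewhere between 0.6 ms and 0.06 ms, which would directly shorten a commit wait that scales with the uncertainty.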
Nice job, thanks for publishing it.
In German, "ein Spanner" is a peeping Tom.