Slashdot Mirror


The 32-Bit Dog Ate 16 Million Kids' CS Homework (code.org)

"Any student progress from 9:19 to 10:33 a.m. on Friday was not saved..." explained the embarrassed CTO of the educational non-profit Code.org, "and unfortunately cannot be recovered." Slashdot reader theodp writes: Code.org CTO Jeremy Stone gave the kids an impromptu lesson on the powers of two with his explanation of why The Cloud ate their homework. "The way we store student coding activity is in a table that until today had a 32-bit index... The database table could only store 4 billion rows of coding activity information [and] we didn't realize we were running up to the limit, and the table got full. We have now made a new student activity table that is storing progress by students. With the new table, we are switching to a 64-bit index which will hold up to 18 quintillion rows of information.
The issue also took the site offline, temporarily making the work of 16 million K-12 students who have used the nonprofit's Code Studio disappear. "On the plus side, this new table will be able to store student coding information for millions of years," explains the site's CTO. But besides Friday's missing saves, "On the down side, until we've moved everything over to the new table, some students' code from before today may temporarily not appear, so please be patient with us as we fix it."
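The two limits quoted in the summary are plain powers of two. A minimal Python sketch of the arithmetic, assuming unsigned counters (which matches the "4 billion" and "18 quintillion" figures); the insert rate is a made-up number used only to give a sense of scale, not something Code.org reported:

```python
# Largest ids available to unsigned 32-bit and 64-bit counters.
MAX_U32 = 2**32 - 1   # about 4 billion
MAX_U64 = 2**64 - 1   # about 18 quintillion

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

def years_to_fill(capacity: int, rows_per_second: float) -> float:
    """Years needed to use up `capacity` row ids at a steady insert rate."""
    return capacity / rows_per_second / SECONDS_PER_YEAR

# HYPOTHETICAL insert rate, chosen for illustration only.
ASSUMED_ROWS_PER_SECOND = 1_000

print(f"32-bit limit: {MAX_U32:,} rows")
print(f"64-bit limit: {MAX_U64:,} rows")
print(f"32-bit fills in {years_to_fill(MAX_U32, ASSUMED_ROWS_PER_SECOND):.2f} years")
print(f"64-bit fills in {years_to_fill(MAX_U64, ASSUMED_ROWS_PER_SECOND):,.0f} years")
```

At that assumed rate the 32-bit range lasts well under a year, while the 64-bit range lasts hundreds of millions of years, which is in line with the CTO's "millions of years" remark.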

3 of 161 comments

  1. Re:Using the cloud is so safe and secure... by halivar · Score: 4, Informative

    How do you back up data that was never stored? Or logs for transactions that never completed? And how, even if you had those transactions, would you meaningfully restore them when the restoration process itself would simply repeat the result of overflowing the available indexes?

    This isn't a typical disaster recovery scenario. The architecture itself is at fault, and the data is lost.

  2. Re:Using the cloud is so safe and secure... by Anonymous Coward · Score: 2, Informative

    They didn't lose all data. They lost every insert into a table that occurred after its index reached its maximum value. As the database insert was the method of storing the data, there's nothing to recover.

  3. 64bit by ledow · Score: 4, Informative

    Honestly don't get why everything these days isn't just 64-bit by default.

    You can hit 32-bit limits just buying a memory chip, or bog-standard storage. 4 billion is not a big number in those terms.

    32-bit times are dead.
    32-bit filesizes are dead.
    32-bit memory sizes are dead.
    32-bit file counters are dead.
    Hell, it's not inconceivable that in some things 32-bit user counters could die - with account recreation and spam accounts, surely the big people are having to deal with that.

    Just stop faffing about and use 64-bit for everything, by default, from the start. 8 bytes isn't a huge amount of overhead nowadays.

    But starting with the assumption "4 billion is enough" when some people have more than 4bn in their bank account, some services have more than 4bn users, and people can buy 4bn-whatevers in their local electronics store is stupid.

    But 4 billion lots of 4 billion is not a limit that you will hit for a very, very, very long time. Even 128-bit isn't unseen - IPv6, ZFS, GPUs - and that's 4 billion lots of 4 billion 64-bit numbers each of which is capable of holding 4 billion lots of 4 billion.

    Supercomputer architectures did this a long time ago, translating and assuming everything is 128-bit so that you never have to worry about a limit.

    Why does it take so long for basics like web servers and databases to get there? 64-bit by default, MINIMUM. Anything that incurs a performance hit on that is old, and up to the user to resolve.
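The failure mode described in the comments (every insert after the index reached its maximum was simply never stored) can be sketched with a toy model. This is not Code.org's schema or any real database's behavior; `ToyTable`, `IndexExhausted`, and `save_progress` are hypothetical names, and a 3-bit index stands in for the 32-bit one so the limit is reached quickly:

```python
class IndexExhausted(Exception):
    """Raised when the auto-increment id has no values left."""


class ToyTable:
    """In-memory stand-in for a table with a fixed-width auto-increment id."""

    def __init__(self, index_bits: int):
        self.max_id = 2**index_bits - 1  # largest id the index can hold
        self.next_id = 1
        self.rows = {}

    def insert(self, payload: str) -> int:
        if self.next_id > self.max_id:
            # The table is "full": no id can be assigned, so nothing is stored.
            raise IndexExhausted(f"no ids left after {self.max_id}")
        row_id = self.next_id
        self.rows[row_id] = payload
        self.next_id += 1
        return row_id


def save_progress(table: ToyTable, items: list[str]) -> list[str]:
    """Try to store each item; return the ones that could not be stored."""
    lost = []
    for item in items:
        try:
            table.insert(item)
        except IndexExhausted:
            # Tracked here only for illustration; in the incident described
            # above, the failed writes were not kept anywhere.
            lost.append(item)
    return lost


table = ToyTable(index_bits=3)  # ids 1..7 only
lost = save_progress(table, [f"save-{n}" for n in range(10)])
print(len(table.rows), "rows stored; lost:", lost)
```

Widening `index_bits` to 64 in this sketch makes the same ten saves succeed, which is the whole of the fix at this level of abstraction; migrating the existing rows is a separate problem.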