Slashdot Mirror


The 32-Bit Dog Ate 16 Million Kids' CS Homework (code.org)

"Any student progress from 9:19 to 10:33 a.m. on Friday was not saved..." explained the embarrassed CTO of the educational non-profit Code.org, "and unfortunately cannot be recovered." Slashdot reader theodp writes: Code.org CTO Jeremy Stone gave the kids an impromptu lesson on the powers of two with his explanation of why The Cloud ate their homework. "The way we store student coding activity is in a table that until today had a 32-bit index... The database table could only store 4 billion rows of coding activity information [and] we didn't realize we were running up to the limit, and the table got full. We have now made a new student activity table that is storing progress by students. With the new table, we are switching to a 64-bit index which will hold up to 18 quintillion rows of information.
The issue also took the site offline, temporarily making the work of 16 million K-12 students who have used the nonprofit's Code Studio disappear. "On the plus side, this new table will be able to store student coding information for millions of years," explains the site's CTO. But besides Friday's missing saves, "On the down side, until we've moved everything over to the new table, some students' code from before today may temporarily not appear, so please be patient with us as we fix it."
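The row limits quoted in the summary follow directly from the index width. A quick check in Python is sketched below; the fill rate used for the lifetime estimate is a made-up assumption for illustration, not a Code.org figure.

```python
# Row capacity of an unsigned index of a given bit width.
def max_rows(bits: int) -> int:
    return 2 ** bits

ROWS_32 = max_rows(32)  # 4,294,967,296: the "4 billion" in the summary
ROWS_64 = max_rows(64)  # 18,446,744,073,709,551,616: the "18 quintillion"

# Hypothetical sustained insert rate, chosen only for illustration.
ASSUMED_ROWS_PER_SECOND = 1_000
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

def years_to_fill(bits: int, rows_per_second: int = ASSUMED_ROWS_PER_SECOND) -> float:
    """Years until an index of the given width runs out at a constant rate."""
    return max_rows(bits) / (rows_per_second * SECONDS_PER_YEAR)

if __name__ == "__main__":
    for bits in (32, 64):
        print(f"{bits}-bit: {max_rows(bits):,} rows, "
              f"{years_to_fill(bits):,.2f} years at the assumed rate")
```

At that assumed rate the 64-bit figure comes out in the hundreds of millions of years, which is consistent with the CTO's "millions of years" remark.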

31 of 161 comments

  1. Well then. by Anonymous Coward · · Score: 5, Funny

    That doesn't inspire a whole lot of trust in the system. Who did they get to code this thing, elementary school kids?!?

    1. Re: Well then. by Anonymous Coward · · Score: 4, Insightful

      Are you kidding, or do you just have no memory of the past? Sites used to be incredibly fragile. Outages were the norm and you could take down most weak, terribly written PHP sites by sneezing at them the wrong way. Maybe you haven't been here long, but we used to have this thing called "Slashdotting" which would take websites down just by being linked to by this webpage. Nothing had any ability to scale, and if it did, 99% of the time that hardware was wildly overprovisioned.

      This has nothing to do with your "hur Hur millennials" bullshit. 32 bit and 64 bit numbers, and problems with picking the right one, have been around since the beginning of time. The person that picked 32 bit instead of 64 bit was more likely to be some grizzled old-timer used to drive and memory space being the main constraint. The evil millennial caricature you hate so much would have made it 128 bit and wasted all that space because in this day and age, why the fuck not?

  2. Using the cloud is so safe and secure... by QuietLagoon · · Score: 3, Insightful

    At least there was a back-up... Or not... Not even a 24-hour transaction log... Or not... Way to go code.org... set that example...

    1. Re:Using the cloud is so safe and secure... by halivar · · Score: 4, Informative

      How do you back up data that was never stored? Or logs for transactions that never completed? And how, even if you had those transactions, would you meaningfully restore them when the restoration process itself would simply repeat the result of overflowing the available indexes?

      This isn't a typical disaster recovery scenario. The architecture itself is at fault, and the data is lost.

    2. Re:Using the cloud is so safe and secure... by Anonymous Coward · · Score: 2, Informative

      They didn't lose all data. They lost every insert into a table that occurred after its index reached its maximum value. As the database insert was the method of storing the data, there's nothing to recover.

    3. Re: Using the cloud is so safe and secure... by Anonymous Coward · · Score: 2

      Shouldn't whatever interface these students were using have told them the save failed?
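One way to picture the point about failed saves: when the insert fails, the caller can see it and tell the user. The sketch below uses SQLite as a stand-in, since the thread does not say which database Code.org runs. SQLite's `AUTOINCREMENT` refuses new rows once the largest possible id has been used, which imitates an exhausted index. The table name `activity` and the function `save_progress` are hypothetical.

```python
import sqlite3

# Largest id SQLite can assign; it stands in here for the 32-bit limit.
MAX_ROWID = 2 ** 63 - 1

def make_full_table() -> sqlite3.Connection:
    """Build an in-memory table whose auto-assigned id space is already used up."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE activity (id INTEGER PRIMARY KEY AUTOINCREMENT, code TEXT)"
    )
    with conn:
        # Occupy the last id so the next auto-assigned insert has nowhere to go.
        conn.execute(
            "INSERT INTO activity (id, code) VALUES (?, ?)", (MAX_ROWID, "last row")
        )
    return conn

def save_progress(conn: sqlite3.Connection, code: str) -> bool:
    """Return True only if the row was actually stored."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO activity (code) VALUES (?)", (code,))
        return True
    except sqlite3.Error:
        return False

if __name__ == "__main__":
    conn = make_full_table()
    if not save_progress(conn, "print('hello')"):
        print("Save failed: your work was NOT stored. Keep a local copy.")
```

Whether Code.org's client received an error it could have shown is not stated in the thread; the sketch only shows that the database layer can report the failure.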

  3. And a valuable lesson learned: by aix+tom · · Score: 5, Insightful

    Don't trust the cloud as the only place you store your work.

    1. Re:And a valuable lesson learned: by Tablizer · · Score: 2

      Don't trust the cloud as the only place you store your work.

      A generalized version is don't trust any one system. Put copies on different servers/devices.

      Of course there's a break-even point where the labor to manage backups exceeds that lost on average to failures.

  4. Don't look at it that way... by mmell · · Score: 5, Insightful

    Consider this a real-world lesson for our youth in the ways that design choices can have unanticipated effects on implementation, manageability and viability of software in the long haul. For extra credit, the kids that are affected should be encouraged to explore what they could have done to mitigate the risk caused by some grown-up's oversight.

    1. Re:Don't look at it that way... by Anonymous Coward · · Score: 2

      The people that originally made the decision to go with a 32 bit table are probably long gone. The real lesson here is don't waste time worrying about things you won't be around to have to deal with.

    2. Re:Don't look at it that way... by Dog-Cow · · Score: 3, Insightful

      Y2K wasn't a "bug". It was a reasonable design decision made when storage (both RAM and long-term) was expensive and scarce. Computer systems were new, and no one had any idea how long programs would be running.

      On the flip-side, 64 bit ints have been cheap for ages now. Code.org programmers were just lazy fucks.

    3. Re:Don't look at it that way... by TheRaven64 · · Score: 2

      For a lot of things, a 64-bit id is overkill, but if you do use a smaller int, then always set something up to notify you when any of the top few bits is set so that you have a nice long time to migrate the data.

      --
      I am TheRaven on Soylent News
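The "top few bits" alert suggested above can be sketched in a few lines. The function name and the default of two guard bits are assumptions for illustration; how the current id is read from the database is left out.

```python
def top_bits_set(current_id: int, width_bits: int = 32, guard_bits: int = 2) -> bool:
    """True once any of the top `guard_bits` bits of a `width_bits`-wide id is set."""
    if not 0 <= current_id < 2 ** width_bits:
        raise ValueError("id does not fit in the declared width")
    # Shift away the low bits; anything left over lives in the guard bits.
    return (current_id >> (width_bits - guard_bits)) != 0

if __name__ == "__main__":
    for sample in (2 ** 30 - 1, 2 ** 30, 2 ** 32 - 1):
        print(sample, top_bits_set(sample))
```

With two guard bits on a 32-bit id the check first fires at 2^30, when a quarter of the id space has been used. With one guard bit it fires at the halfway point.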
    4. Re: Don't look at it that way... by reanjr · · Score: 3, Interesting

      For a lot of things, 32 bit is overkill, but you don't see people storing 24 bit numbers. This is a fundamental problem with premature optimization. You should always use the largest precise integer available unless you have a compelling, evidence-based reason not to. The onus should be on the 32bit users to demonstrate their choice is better.

  5. We will never learn by Xarin · · Score: 5, Funny

    4 billion rows of coding activity is all we will ever need

    1. Re:We will never learn by newcastlejon · · Score: 4, Insightful

      What sort of DBMS are they using that doesn't notify the admin when a table is nearly full? What sort of client are they using that doesn't tell the user when an attempt to write to a DB fails?

      --
      If God forks the Universe every time you roll a die, he'd better have a damned good memory.
    2. Re:We will never learn by Mashiki · · Score: 2

      If you can't make it fit onto a 8-bit eeprom chip you're doing it wrong?

      --
      Om, nomnomnom...
  6. the people dont care. by nimbius · · Score: 4, Insightful

    "The way we store student coding activity is in a table that until today had a 32-bit index... The database table could only store 4 billion rows of coding activity information

    if it can only store four billion rows, it isnt "the cloud." its just a KVM instance running on a shared hosting facility then, isnt it.

    we didn't realize we were running up to the limit, and the table got full.

    so not only were you incapable of scaling your infrastructure or your program to handle four billion rows --something every sysadmin on the planet is capable of-- you weren't even competent enough to set up monitoring for it.

    We have now made a new student activity table that is storing progress by students.

    the ones that lost all their data dont care. the students will leave to try something else, the educators will fall back on lesson plans that werent written by a corporate think tank, and your 'hour of code' will remain just another hour of minecraft in a kids life.

    With the new table, we are switching to a 64-bit index which will hold up to 18 quintillion rows of information.

    you dont get it. no one fucking cares about your SQL table limits but you, and youre oblivious to the fact that a table with eighteen quintillion rows would never load. code.org will be no different than the spanish or french class in a kids life. a fractional percentage of them will actually go on to use it as a career.

    --
    Good people go to bed earlier.
    1. Re: the people dont care. by Dracos · · Score: 4, Funny

      09:19 to 10:33 is 74 minutes, not 20. Did you learn arithmetic on code.org?

  7. More important lesson by saboosh · · Score: 5, Insightful

    Thank you for teaching the kids the importance of taking responsibility and being honest and open about your mistakes. It's okay to make mistakes as long as we learn from them. Too many people today are afraid of making mistakes and cover them up.

    1. Re:More important lesson by saboosh · · Score: 5, Insightful

      I find people like anecdotes here so please allow me to add: I was raised by very "tough" parents with a very "tough" form of discipline. Mistakes meant punishment. Today I have a 9 year old daughter who, like any other human being, makes mistakes. A few years ago I noticed a very strange phenomenon with regards to "dealing" with her mistakes. When I would get upset with her and punish her for spilling on the couch or forgetting to clean her room I would see her make it again and as time went on she would get, either, more defensive about it or try to lie about it. At some point my fiancee asked that I try a different approach: Try being kind and loving with my response and take time with her to show empathy, to share that I'm not perfect either and to figure out another way of handling whatever the mistake was.... the taking-time part is probably the toughest for me because it means work, I'm sure many can relate.... but, strangely, I noticed that she was making the mistakes I handled the new way a lot less... and she seemed to be ok with handling them a new way. She started to clean her room on her own and even though her coordination did not allow her to stop from spilling she was more careful about where she took her drinks and cleaned them up more quickly.... it's really ass backwards to me... and to top it off, she seems less anxious around me and my responses and seems less defensive... I'm not a psychologist and won't pretend to understand the how or why of it, I just know she seems less distracted and anxious and I seem to get more hugs from her and I will take that over trying to "force" her to learn any day.

    2. Re:More important lesson by Dutch+Gun · · Score: 4, Insightful

      Caveat: It's okay to make mistakes as long as no one was hurt or killed by easily preventable errors. Obviously, that doesn't apply here, so I definitely agree. Sharing your experience and turning it into a teachable moment ensures others learn from it as well.

      It would have been less embarrassing for them to just make up some excuse about a temporary outage, or blame a DDOS attack, or Russian hackers. It's good to remember that when lambasting them about what idiots they are for not noticing this before their DB puked on them. It's tempting to do, but really does nothing but stroke your own ego while at the same time encouraging people to try to hide their mistakes to avoid this sort of public shaming.

      So, yeah, kudos for them for owning up to their own mistake.

      --
      Irony: Agile development has too much inertia to be abandoned now.
  8. Wasn't any Code.org dev around for Slashdot's fail by El+Cubano · · Score: 4, Interesting

    Seriously, was not a single developer or architect from Code.org around when Slashdot overflowed its 24-bit index? I know it has been a few years now, but I'm sure there are folks here who remember threading breaking and all other sorts of problems when it happened. Remember: https://slashdot.org/story/06/11/09/1534204/slashdot-posting-bug-infuriates-haggard-admins

    Granted, that was Slashdot, and while annoying, it was hardly the end of the world. This problem with Code.org clearly reinforces "cloud bad" to people who are already fearful of putting their data in the cloud.

    I am guessing that Code.org didn't bother tracking things like how close to various limits they were getting, but I bet that they are now. In any event, when this happened to Slashdot 10+ years ago, I suppose you could argue that we weren't as advanced. In 2016-2017 there is no excuse for such a critical architectural flaw. To me, it completely undermines my confidence in their entire platform. What other time bombs are ticking under the surface there?

  9. Well duh by cyber-vandal · · Score: 5, Funny

    It's code.org not databasedesign.org

  10. 64bit by ledow · · Score: 4, Informative

    Honestly don't get why everything these days isn't just 64-bit by default.

    You can hit 32-bit limits just buying a memory chip, or bog-standard storage. 4 billion is not a big number in those terms.

    32-bit times are dead.
    32-bit filesizes are dead.
    32-bit memory sizes are dead.
    32-bit file counters are dead.
    Hell, it's not inconceivable that in some things 32-bit user counters could die - with account recreation and spam accounts, surely the big people are having to deal with that.

    Just stop faffing about and use 64-bit for everything, by default, from the start. 8 bytes isn't a huge amount of overhead nowadays.

    But starting with the assumption "4 billion is enough" when some people have more than 4bn in their bank account, some services have more than 4bn users, and people can buy 4bn-whatevers in their local electronics store is stupid.

    But 4 billion lots of 4 billion is not a limit that you will hit for a very, very, very long time. Even 128-bit isn't unseen - IPv6, ZFS, GPUs - and that's 4 billion lots of 4 billion 64-bit numbers each of which is capable of holding 4 billion lots of 4 billion.

    Supercomputer architectures did this a long time ago, translating and assuming everything is 128-bit so that you never have to worry about a limit.

    Why does it take so long for basics like web servers and databases to get there? 64-bit by default, MINIMUM. Anything that incurs a performance hit on that is old, and up to the user to resolve.
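To put a rough number on the overhead claim, here is the raw cost of widening one key column from 4 to 8 bytes for a table sitting at the 32-bit limit. This is a back-of-the-envelope sketch: it ignores row headers, secondary indexes and foreign-key copies of the id, all of which would add to the real figure.

```python
GIB = 2 ** 30

def key_column_gib(rows: int, key_bytes: int) -> float:
    """Raw size of one integer key column, ignoring all per-row and index overhead."""
    return rows * key_bytes / GIB

ROWS = 2 ** 32  # a table at the 32-bit limit

size_32 = key_column_gib(ROWS, 4)  # 4-byte keys
size_64 = key_column_gib(ROWS, 8)  # 8-byte keys
extra = size_64 - size_32

if __name__ == "__main__":
    print(f"4-byte keys: {size_32:.0f} GiB, 8-byte keys: {size_64:.0f} GiB, "
          f"difference: {extra:.0f} GiB")
```

Whether 16 GiB of extra key storage across four billion rows counts as cheap is the judgment call the comment is making.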

    1. Re:64bit by thegarbz · · Score: 2, Insightful

      But starting with the assumption "4 billion is enough" when some people have more than 4bn in their bank account

      Yep, I should bog down my computer processes because someone else is rich. Incidentally how many bits does it take to represent the number 4bn? While we're at it do you realise that the number of planets that humans have colonised is 1? Let's build a database with a 25 year life expectancy, how many bits would you assign to the index? 64bits? Your approach is the reason computers are frigging slow. It's the reason why I wait for ages to open up Chrome on a Quad 1.4Ghz Snapdragon.

      How about instead of just blindly wasting resources you actually learn about statements of requirements and project scopes.

      Supercomputer architectures did this a long time ago, translating and assuming everything is 128-bit so that you never have to worry about a limit.

      Didn't you just say we should use 64bit for everything by default?

      Why does it take so long for basics like web servers and databases to get there? 64-bit by default, MINIMUM.

      No thanks. I'll target 8bit minimum and scale up as needed.
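The requirements-driven sizing argued for here can be made concrete: estimate the largest value the index must hold over the system's lifetime, then pick the smallest standard width that fits. The daily row rate below is a hypothetical number, not something taken from the thread.

```python
def bits_needed(max_value: int) -> int:
    """Minimum number of bits for an unsigned integer holding max_value."""
    return max(1, max_value.bit_length())

def smallest_standard_width(max_value: int, widths=(8, 16, 32, 64)) -> int:
    """Smallest common integer width that can hold max_value."""
    need = bits_needed(max_value)
    for width in widths:
        if need <= width:
            return width
    raise ValueError("value does not fit in any listed width")

def projected_rows(rows_per_day: int, years: int) -> int:
    """Total rows over the lifetime, assuming a constant daily rate."""
    return rows_per_day * 365 * years

if __name__ == "__main__":
    print(bits_needed(4_000_000_000))  # bits for "4bn"
    # Hypothetical load: 5 million rows a day over a 25-year lifetime.
    total = projected_rows(5_000_000, 25)
    print(total, bits_needed(total), smallest_standard_width(total))
```

Representing 4 billion takes 32 bits, so it only just fits an unsigned 32-bit field. Under the assumed load, the 25-year total needs 36 bits, which already rules out a 32-bit index.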

  11. Oh the irony by luis_a_espinal · · Score: 3, Insightful

    Code.org CTO Jeremy Stone gave the kids an impromptu lesson on the powers of two with his explanation of why The Cloud ate their homework. "The way we store student coding activity is in a table that until today had a 32-bit index... The database table could only store 4 billion rows of coding activity information [and] we didn't realize we were running up to the limit, and the table got full. We have now made a new student activity table that is storing progress by students. With the new table, we are switching to a 64-bit index which will hold up to 18 quintillion rows of information.

    The irony of seeing a programming education site using 32-bit indexes without any form of index space monitoring is both hilarious and surreal.

    Who the hell runs a cloud-based, massively accessible operation with 32-bit indexes? And who the hell runs a production system without database monitoring?

  12. Deja vu by jdavidb · · Score: 3, Funny

    I remember when Slashdot had this exact same problem with comment ids!

  13. Periodic testing and data reviews anyone? by gb7djk · · Score: 2

    Perhaps this will kick someone into looking at the database, as a whole, on a periodic basis to check other limits. Maybe do the odd test transaction or spot trends in other tables which are unexpected? Maybe run some regression tests? Then use this information to tweak the data model in controlled fashion before it breaks.

    You know, like grown ups do...
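Spotting the trend before it breaks can be as simple as extrapolating from periodic samples of a table's highest id. The sketch below uses a straight line between the oldest and newest sample. The function name and the sample figures are invented for illustration.

```python
from datetime import date, timedelta

def projected_exhaustion(samples, limit):
    """Estimate the date an id column hits `limit`.

    `samples` is a list of (date, max_id) pairs, oldest first. Returns None
    when the samples show no growth, since no date can be projected.
    """
    (first_day, first_id), (last_day, last_id) = samples[0], samples[-1]
    days = (last_day - first_day).days
    growth = last_id - first_id
    if days <= 0 or growth <= 0:
        return None
    per_day = growth / days
    remaining = limit - last_id
    return last_day + timedelta(days=int(remaining / per_day))

if __name__ == "__main__":
    # Hypothetical readings taken ten days apart.
    readings = [
        (date(2017, 1, 1), 4_000_000_000),
        (date(2017, 1, 11), 4_100_000_000),
    ]
    print(projected_exhaustion(readings, 2 ** 32 - 1))
```

A linear fit is the crudest possible model. A site with seasonal or accelerating traffic would run out sooner than this predicts, so the result is best read as an upper bound on the time left.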

  14. Re:fucking idiots by tepples · · Score: 2

    Let me guess: Your C89 compiler defined an int as 16-bit and a long as 32-bit, and it provided no 64-bit type. The C type "long long" wasn't part of the standard until 1999.

  15. Just the other day - posts about not needing math by dbIII · · Score: 2

    Not long ago there were some posts here about programmers not needing to know any mathematics.
    It didn't take very long for an article to appear that showed the consequences of not cracking open some books.

    Who would have thought - Knuth seems to have a bit more of a point than the guy who taught himself PHP.

  16. Why a singular index? by phorm · · Score: 2

    A singular index seems like a weird thing to have in this case anyways. Wouldn't it be better to have a multi-column index on something like userid+item rather than an index of all items?
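A composite key along the lines suggested here might look like the following sketch, again using SQLite as a stand-in. The table and column names (`progress`, `user_id`, `level_id`) are hypothetical. It also assumes the table stores only the latest state per user and item, whereas the summary describes a table of "coding activity", which may be an append-only log that this design would not suit.

```python
import sqlite3

def make_progress_table() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE progress (
            user_id  INTEGER NOT NULL,
            level_id INTEGER NOT NULL,
            code     TEXT,
            PRIMARY KEY (user_id, level_id)  -- no single running row counter
        )
        """
    )
    return conn

def save(conn: sqlite3.Connection, user_id: int, level_id: int, code: str) -> None:
    """Store the latest code for a (user, level) pair, replacing any earlier row."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO progress (user_id, level_id, code) VALUES (?, ?, ?)",
            (user_id, level_id, code),
        )

if __name__ == "__main__":
    conn = make_progress_table()
    save(conn, 1, 7, "first attempt")
    save(conn, 1, 7, "second attempt")
    print(conn.execute("SELECT * FROM progress").fetchall())
```

The key space is then bounded by users times items rather than by total saves, although `user_id` and `level_id` still each need a wide enough type.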