The 32-Bit Dog Ate 16 Million Kids' CS Homework (code.org)
"Any student progress from 9:19 to 10:33 a.m. on Friday was not saved..." explained the embarrassed CTO of the educational non-profit Code.org, "and unfortunately cannot be recovered."
Slashdot reader theodp writes:
Code.org CTO Jeremy Stone gave the kids an impromptu lesson on the powers of two with his explanation of why The Cloud ate their homework. "The way we store student coding activity is in a table that until today had a 32-bit index... The database table could only store 4 billion rows of coding activity information [and] we didn't realize we were running up to the limit, and the table got full. We have now made a new student activity table that is storing progress by students. With the new table, we are switching to a 64-bit index which will hold up to 18 quintillion rows of information."
The issue also took the site offline, temporarily making the work of 16 million K-12 students who have used the nonprofit's Code Studio disappear. "On the plus side, this new table will be able to store student coding information for millions of years," explains the site's CTO. But besides Friday's missing saves, "On the down side, until we've moved everything over to the new table, some students' code from before today may temporarily not appear, so please be patient with us as we fix it."
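The "4 billion" and "18 quintillion" figures in the CTO's explanation are just powers of two. A minimal sketch of the arithmetic follows; the function name `max_id` is made up for illustration, and whether Code.org's column was signed or unsigned is not stated in the summary (the quoted figures match the unsigned limits).

```python
def max_id(bits: int, signed: bool = False) -> int:
    """Largest value an integer index of the given width can hold."""
    # Assumption: a plain two's-complement integer column; real databases
    # may reserve values or start counting at 1.
    return 2 ** (bits - 1) - 1 if signed else 2 ** bits - 1


print(f"{max_id(32):,}")               # 4,294,967,295 (about 4 billion)
print(f"{max_id(64):,}")               # 18,446,744,073,709,551,615 (about 18 quintillion)
print(f"{max_id(64, signed=True):,}")  # 9,223,372,036,854,775,807
```

Going from 32 to 64 bits does not double the capacity; it multiplies it by 2^32, roughly another factor of 4 billion.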
That doesn't inspire a whole lot of trust in the system. Who did they get to code this thing, elementary school kids?!?
At least there was a back-up... Or not... Not even a 24-hour transaction log... Or not... Way to go code.org... set that example...
Don't trust the cloud as the only place you store your work.
Consider this a real-world lesson for our youth in the ways that design choices can have unanticipated effects on implementation, manageability and viability of software in the long haul. For extra credit, the kids that are affected should be encouraged to explore what they could have done to mitigate the risk caused by some grown-up's oversight.
4 billion rows of coding activity is all we will ever need
It is no surprise to me that the ones creating and operating this platform are just as incompetent as the "graduates" they produce. Mediocrity breeds mediocrity...
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
"The way we store student coding activity is in a table that until today had a 32-bit index... The database table could only store 4 billion rows of coding activity information
if it can only store four billion rows, it isnt "the cloud." its just a KVM instance running on a shared hosting facility then, isnt it.
we didn't realize we were running up to the limit, and the table got full.
so not only were you incapable of scaling your infrastructure or your program to handle four billion rows --something every sysadmin on the planet is capable of-- you weren't even competent enough to set up monitoring for it.
We have now made a new student activity table that is storing progress by students.
the ones that lost all their data dont care. the students will leave to try something else, the educators will fall back on lesson plans that werent written by a corporate think tank, and your 'hour of code' will remain just another hour of minecraft in a kids life.
With the new table, we are switching to a 64-bit index which will hold up to 18 quintillion rows of information.
you dont get it. no one fucking cares about your SQL table limits but you, and youre oblivious to the fact that a table with eighteen quintillion rows would never load. code.org will be no different than the spanish or french class in a kids life. a fractional percentage of them will actually go on to use it as a career.
Good people go to bed earlier.
Thank you for teaching the kids the importance of taking responsibility and being honest and open about your mistakes. It's okay to make mistakes as long as we learn from them. Too many people today are afraid of making mistakes and cover them up.
Seriously, was not a single developer or architect from Code.org around when Slashdot overflowed its 24-bit index? I know it has been a few years now, but I'm sure there are folks here who remember threading breaking and all other sorts of problems when it happened. Remember: https://slashdot.org/story/06/11/09/1534204/slashdot-posting-bug-infuriates-haggard-admins
Granted, that was Slashdot, and while annoying, it was hardly the end of the world. This problem with Code.org clearly reinforces "cloud bad" to people who are already fearful of putting their data in the cloud.
I am guessing that Code.org didn't bother tracking things like how close to various limits they were getting, but I bet that they are now. In any event, when this happened to Slashdot 10+ years ago, I suppose you could argue that we weren't as advanced. In 2016-2017 there is no excuse for such a critical architectural flaw. To me, it completely undermines my confidence in their entire platform. What other time bombs are ticking under the surface there?
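Tracking "how close to various limits" can be a very small check. What follows is only a sketch under assumptions: the function names, the 25% threshold and the idea of feeding it the table's current highest id (for example from a periodic `SELECT MAX(id)`) are all hypothetical, not anything Code.org or Slashdot is known to run.

```python
def index_headroom(current_max_id: int, bits: int = 32, signed: bool = False) -> float:
    """Fraction of the id space still unused (1.0 = empty, 0.0 = full)."""
    limit = 2 ** (bits - 1) - 1 if signed else 2 ** bits - 1
    if not 0 <= current_max_id <= limit:
        raise ValueError("current_max_id is outside the range of the column type")
    return 1.0 - current_max_id / limit


def should_alert(current_max_id: int, bits: int = 32, threshold: float = 0.25) -> bool:
    """True once less than `threshold` of the id space remains."""
    return index_headroom(current_max_id, bits) < threshold


# Hypothetical readings of the highest id in a 32-bit table:
print(should_alert(1_000_000_000))  # False, roughly 77% of the space is left
print(should_alert(3_500_000_000))  # True, roughly 19% of the space is left
```

Run from a cron job or an existing monitoring agent, a check like this turns a surprise outage into a ticket filed months ahead.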
It's code.org not databasedesign.org
I admit, I've mostly done it for speed purposes, but my understanding is that the record limit is per partition, so you could also use it to deal with record limits.
They could either partition based on user IDs (might be faster to select by for the bulk of the queries), or by date (making it easier to manage autonumber fields).
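Both partitioning choices above can be sketched as simple routing functions. This is an application-level illustration only: the partition count, the table-name pattern and the function names are invented, and whether a row limit really applies per partition depends on the database in use, so treat the parent's "my understanding" with the same caution.

```python
from datetime import date

NUM_PARTITIONS = 16  # hypothetical; chosen only for the example


def partition_by_user(user_id: int, num_partitions: int = NUM_PARTITIONS) -> int:
    """Route all of one user's rows to the same partition number."""
    return user_id % num_partitions


def partition_by_date(day: date) -> str:
    """Route rows to one table per calendar month (the name is illustrative)."""
    return f"activity_{day.year:04d}_{day.month:02d}"


print(partition_by_user(35))                  # 3
print(partition_by_date(date(2017, 1, 20)))   # activity_2017_01
```

Partitioning by user keeps one student's history together for lookups; partitioning by date makes old data easy to archive and gives each new table a fresh counter.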
Build it, and they will come^Hplain.
The people who run code.org don't know how to code.
Oh wait, that wasn't news.
Honestly don't get why everything these days isn't just 64-bit by default.
You can hit 32-bit limits just buying a memory chip, or bog-standard storage. 4 billion is not a big number in those terms.
32-bit times are dead.
32-bit filesizes are dead.
32-bit memory sizes are dead.
32-bit file counters are dead.
Hell, it's not inconceivable that in some things 32-bit user counters could die - with account recreation and spam accounts, surely the big people are having to deal with that.
Just stop faffing about and use 64-bit for everything, by default, from the start. 8 bytes isn't a huge amount of overhead nowadays.
But starting with the assumption "4 billion is enough" when some people have more than 4bn in their bank account, some services have more than 4bn users, and people can buy 4bn-whatevers in their local electronics store is stupid.
But 4 billion lots of 4 billion is not a limit that you will hit for a very, very, very long time. Even 128-bit isn't unseen - IPv6, ZFS, GPUs - and that's 4 billion lots of 4 billion 64-bit numbers each of which is capable of holding 4 billion lots of 4 billion.
Supercomputer architectures did this a long time ago, translating and assuming everything is 128-bit so that you never have to worry about a limit.
Why does it take so long for basics like web servers and databases to get there? 64-bit by default, MINIMUM. Anything that incurs a performance hit on that is old, and up to the user to resolve.
Code.org CTO Jeremy Stone gave the kids an impromptu lesson on the powers of two with his explanation of why The Cloud ate their homework. "The way we store student coding activity is in a table that until today had a 32-bit index... The database table could only store 4 billion rows of coding activity information [and] we didn't realize we were running up to the limit, and the table got full. We have now made a new student activity table that is storing progress by students. With the new table, we are switching to a 64-bit index which will hold up to 18 quintillion rows of information."
Seeing a programming education site using 32-bit indexes without any form of index space monitoring is both hilarious and surreal.
Who the hell runs a cloud-based, massively accessible operation with 32-bit indexes? And who the hell runs a production system without database monitoring?
All homework for all students stored in a single table with a 32-bit limit, and these guys are teaching kids how to code?
If they'd split girls from boys in the database ( so they could continue their disgusting policy of de-funding teachers who taught boys ), it would have lasted a bit (sic) longer.
I remember when Slashdot had this exact same problem with comment ids!
Secession is the right of all sentient beings.
For trusting the "cloud".
I would say having a 32-bit number as some kind of ID for activity is not even a database design issue, it's almost a pure programming issue. Any programmer should know better than to keep a unique ID in some kind of 32-bit value... heck the "fix" to move to a 64-bit value is better but not as good as using a for-real UUID which is really more of a standard (and even larger than the 64-bit value), and also something any programmer should know about.
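For anyone who hasn't used one, a UUID is a 128-bit identifier, and most languages can generate one from the standard library. A minimal sketch in Python (the variable name `activity_id` is just for illustration):

```python
import uuid

# Version 4 UUIDs are generated from random bits, so no central counter
# is needed and there is no "next id" to run out of.
activity_id = uuid.uuid4()

print(activity_id)             # different on every run
print(len(activity_id.bytes))  # 16 bytes, i.e. 128 bits
print(activity_id.version)     # 4
```

The trade-off is that each key takes 16 bytes instead of 4 or 8, and random values do not arrive in insertion order the way an auto-increment counter does.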
"There is more worth loving than we have strength to love." - Brian Jay Stanley
According to TFS, nothing was lost. They just can't access their stuff until it's moved over to the new database. No disaster. No lesson. No dog. Just off line for a few days.
BFD
The only thing worse than a Democrat is a Republican.
Even back in the 1990s I was taught to use a long rather than an int for database keys.
And people who have been on /. long enough know why as well.
Why to avoid trusting cloud services with any data that you can't afford to lose.
The cloud has nothing to do with choosing the bit size of an index column..that's a total design fail.
See kiddies...just because you can "code" and feel like the cool kids doesn't mean your shit can run in production....
SysOps baby.....that's who rules your shit..
I know the cloud is this big bogeyman, and there are lots of real reasons not to trust it, but this problem would have occurred on pretty much any database, cloud or not. Cloud ate my homework is a cute line, but this would have happened even if this DB server was some space heater sitting under someone's desk.
it was a 32-bit doge*
Perhaps this will kick someone into looking at the database, as a whole, on a periodic basis to check other limits. Maybe do the odd test transaction or spot trends in other tables which are unexpected? Maybe run some regression tests? Then use this information to tweak the data model in controlled fashion before it breaks.
You know, like grown ups do...
Code.org correctly reinforces "cloud bad" to people who should be fearful of putting their data in the cloud.
FTFY.
Anons need not reply. Questions end with a question mark.
Not long ago there were some posts here about programmers not needing to know any mathematics.
It didn't take very long for an article to appear that showed the consequences of not cracking open some books.
Who would have thought - Knuth seems to have a bit more of a point than the guy who taught himself PHP.
Recently, a lot of technology allows people to know less, and do a worse job.
Docker lets you ignore how to integrate your application into an operating system.
Solr lets you ignore how your data will be accessed; it will build indexes for every access pattern you could want.
Eventually a programmer will need to know nothing, and will produce the kinds of results that come from knowing nothing.
Computer Science degrees used to be the baseline for the rank and file programmer, then the EE's realized they could get a better job programming than being an EE. Now it's open for just about anyone. Enjoy the results.
It's just someone else's computer.
Glad to see it would never happen to slashdot.
http://michaelsmith.id.au
exposed themselves to kids as completely incompetent rank amateur coders.
Got it.
The guys at code.org should clearly never be allowed to code on any system that matters. No work on avionics, finance, medical equipment, weapons systems, spacecraft, etc. Just stick to meaningless stupid internet-centric time wasting shiny baubles and cell phone apps, guys, because you're gonna kill somebody with your incompetence if you get near anything truly important. It really does not take much more effort to write good, solid code than to shovel crap, but it DOES require you to know what you are doing and think it through.
With the new table, we are switching to a 64-bit index which will hold up to 18 quintillion rows of information.
Is that bigger than a bajillion?
I've calculated my velocity with such exquisite precision that I have no idea where I am.
in a few million years, the table will be full again. And then nobody expected it, again.
To me, it completely undermines my confidence in their entire platform
So do you avoid all companies who have ever had a free product down one time for at least 74 minutes and were completely open and honest about it?
A singular index seems like a weird thing to have in this case anyways. Wouldn't it be better to have a multi-column index on something like userid+item rather than an index of all items?
Please use GUID identifiers or a composite key if there is to be a lot of data in a table.
If 9,223,372,036,854,775,807 rows is enough, use the 8-byte bigint instead.
GUID is good if you need to spread the hot data around as neighboring values may have vastly different ids.
An integer value is better suited to partitioning the data.
Contrary to popular belief, don't use integers for primary indexes. Multi column "natural" indexes can handle way more rows.
Just because the old databases used "record numbers", doesn't mean you have to... ;-)
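A multi-column "natural" key of the kind suggested above might look like the sketch below. It uses SQLite from the Python standard library purely so the example runs anywhere; the table name, column names and `save_progress` helper are invented, and nothing here reflects Code.org's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE activity (
        user_id  INTEGER NOT NULL,
        level_id INTEGER NOT NULL,
        code     TEXT    NOT NULL,
        PRIMARY KEY (user_id, level_id)   -- composite key, no row counter
    ) WITHOUT ROWID
""")


def save_progress(user_id: int, level_id: int, code: str) -> None:
    """Insert a student's work for a level, overwriting any earlier save."""
    conn.execute(
        "INSERT OR REPLACE INTO activity (user_id, level_id, code) VALUES (?, ?, ?)",
        (user_id, level_id, code),
    )


save_progress(1, 10, "moveForward()")
save_progress(1, 10, "moveForward(); turnLeft()")  # replaces the first save
save_progress(2, 10, "turnRight()")

rows = conn.execute(
    "SELECT user_id, level_id, code FROM activity ORDER BY user_id"
).fetchall()
print(rows)  # [(1, 10, 'moveForward(); turnLeft()'), (2, 10, 'turnRight()')]
```

Note the design choice: keyed this way the table keeps only the latest save per student and level, so it cannot overflow a counter, but it is not an append-only activity log either.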
This is the sort of thing that happens when engineers (especially software engineers) don't think outside of the box and consider the consequences of the code they write.
Sometimes, real fast is almost as good as real-time.