Slashdot Posting Bug Infuriates Haggard Admins
Last night we crossed over 16,777,216 comments in the database. The wise amongst you might note that this number is 2^24, or in MySQLese an unsigned mediumint. Unfortunately, like 5 years ago we changed our primary keys in the comment table to unsigned int (32 bits, or 4.1 billion) but neglected to change the index that handles parents. We're awesome! Fixing is a simple ALTER TABLE statement... but on a table that is 16 million rows long, our system will take 3+ hours to do it, during which time there can be no posting. So today, we're disabling threading and will enable it again later tonight. Sorry for the inconvenience. We shall flog ourselves appropriately. Update: 11/10 12:52 GMT by J : It's fixed.
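For anyone who wants the arithmetic spelled out, here is a minimal sketch of the two column ranges the post mentions. The constant and function names are made up for illustration; only the bit widths come from the post.

```python
# Ranges of the two MySQL unsigned integer types named in the post.
MEDIUMINT_UNSIGNED_MAX = 2**24 - 1   # 16,777,215 (24 bits)
INT_UNSIGNED_MAX = 2**32 - 1         # 4,294,967,295 (32 bits)


def fits(value: int, max_value: int) -> bool:
    """Return True if value is representable in an unsigned column with this max."""
    return 0 <= value <= max_value


first_bad_cid = 2**24  # the 16,777,216 mentioned in the post

# The primary key column (unsigned int) can hold it ...
print(fits(first_bad_cid, INT_UNSIGNED_MAX))
# ... but the column that stores the parent (unsigned mediumint) cannot.
print(fits(first_bad_cid, MEDIUMINT_UNSIGNED_MAX))
```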
*Clap clap clap*
Curiosity was framed; ignorance killed the cat. -- Author unknown
Please do.
The right to offend is far more important than the right not to be offended. (Rowan Atkinson)
... roll over to be a last post?
Anyone could have made the mistake.. good to keep us all in the loop though :)
And let this be a reminder to the kids - RTFM, twice!
welcome our 2 to the power of X overlords.
Last post!
Alright, who's the joker who posted the 16,777,216th comment?
:D
Thanks for breaking slashdot, jerk
LegendMUD
It's like Y2K, only worse!
As Nelson Muntz would say "HA HA"
If it's too difficult, I can't understand it !
Does this mean that comment id#16777215 has the longest thread in history?
Can anyone actually find it to see? I tried but could only get to 16777217; it's likely to be in a journal or just a reply to an older article.
liqbase
As if a thousand geeks all made the same damn "last post!" joke at once. . . . . .
I mean, look how quick we got to 16M comments. 4.1 Gigacomments will come in hardly any time at all. I predict we'll be doing all this again in merely a few weeks!
SIGSEGV caught, terminating
wait... not that kind of sig.
I wonder who posted comment #16777216. That person should win some sort of "I borked Slashdot!" award.
Slashdot Burying Stories About Slashdot Media Owned
...why wasn't this problem discovered on the dev system in advance?
"I use a Mac because I'm just better than you are."
Taken from http://franksworld.com/blog/archive/2005/01/04/600.aspx
Chapter 2: Destructional Patterns
2.4 Detonator
The Detonator is extremely common, but often undetected. A common
example is the calculations based on a 2 digit year field. This bomb
is out there, and waiting to explode!
...comment 16777215.
Mmmm... CT, are you sure the parent index was your only problem?
"2^24 comments ought to be enough for anyone" -- CmdrTaco
Some of you are asking which comment it was that got the cid 16,777,216. The answer is that none did. For redundancy, Slashdot is now running multiple-master replication which skips values for auto-increment. Our db-1 assigns odd-numbered primary key IDs, and db-2 assigns even-numbered. Right now writes are going to db-1 so newly created rows will have only odd IDs.
The comment that got 2**24-1 was this one, if anyone cares :)
Sorry about the inconvenience, everyone.
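The odd/even scheme described above can be sketched roughly as follows. This is only a simulation in Python, not Slashdot's configuration. MySQL does have `auto_increment_increment` and `auto_increment_offset` settings for this kind of setup, but whether those exact settings were used here is an assumption, and the function names below are hypothetical.

```python
from itertools import count, islice


def id_stream(offset: int, increment: int):
    """Yield the IDs one master would hand out: offset, offset + increment, ..."""
    return count(start=offset, step=increment)


def assigned_by(cid: int) -> str:
    """Which master would have assigned this ID under the odd/even split."""
    return "db-1" if cid % 2 == 1 else "db-2"


db1_ids = list(islice(id_stream(offset=1, increment=2), 5))  # odd IDs
db2_ids = list(islice(id_stream(offset=2, increment=2), 5))  # even IDs

print(db1_ids, db2_ids)
# 2**24 is even, so while all writes go to db-1 no row ever receives it.
print(assigned_by(2**24), assigned_by(2**24 - 1))
```

The two streams never overlap, which is why both masters can accept inserts without coordinating on the next ID.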
That's cool, I'll just pretend I'm on Digg, with its 1981 Commodore 64 BBS-style threading.
Wait..sorry Commodore fans. I know it had better threading than Digg.
Haggard admins? Does this mean that the Admins will go buy some meth and get a massage?
Uh, this is a reply to the 8th post down from the top (remember to use this like an array and zero reference). Yes, I'm talking to you, admdrew.
You claim that the 16,777,216th comment would have broken it, but I contend that actually the 16,777,217th comment poster would be the culprit, since it should be able to handle that many comments if it is zero-referenced, and it would actually be the one after that one that would break it. You laugh, but these kinds of problems plague a lot of coders.
If you don't agree with me, please respond below and reference my comment ID.
My work here is dung.
... should have been enough for anyone.
We don't see the world as it is, we see it as we are.
-- Anais Nin
No threading? Welcome to Farkdot.
'Yes, firefox is indeed greater than women. Can women block pops up for you? No. Can Firefox show you naked women? Yes.'
Lars T.
To the guy who modded me down from perfect to terrible Karma - Apple haters still suck
Dupe! I TOTALLY posted this story like, last WEEK man! (I laugh, but I betcha someone might post this in seriousness)
USE colorful confetti ON heavily-armed clown
Flogging and Haggard in the same sentence? If we can get "crystal meth" in, we'll hit the trifecta!
The truth about Scientology, Xenu, and you: Operation Clambake
I used to work at Comair. Remember, that airline that stranded about 10,000 people in the airport a couple of Christmases ago? Same deal. Program was capable of handling only a certain number of changes. Hopefully your president won't have to resign.
So is the bug still in the CVS revision of Slash, or was it fixed 5 years ago and Slashdot never applied the patch?
Give a 2^0-year Slashdot subscription to the guy who hit the limit and one to the first non-administrator guy who successfully posted after the fix.
If you can find the first guy who COULDN'T reply due to the limit, give him one too. He deserves something for his trouble.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
Actually, comment 16,777,217 couldn't break it, because that comment's parent cid could have only been 16,777,215. Up until then, there wouldn't have been an overflow value put into the db.
There's no telling which comment it is, because (16,777,217 + 2n) might not have been a reply, meaning it would come up correctly.
:(){
Does this mean that Slashdot is going to degenerate into Digg now?
My blog
Reply to comment number 16786251:
...but it's probably not zero-referenced. Typically, ids in SQL start at 1.
``You claim that the 16,777,216th comment would have broken it, but I contend that actually the 16,777,217th comment poster would be the culprit, since it should be able to handle that many comments if it is zero-referenced''
Please correct me if I got my facts wrong.
I certainly admit I wasn't thinking 0-based when I wrote that. The question is, though, should we blame the person who wrote the last valid comment (therefore ruining the fun for the rest of us), or whoever wrote the first broken comment?
Also, is everyone going to add the obligatory 'parent' link on their posts today?
[ Parent ] - [ Reply to this ]
LegendMUD
Any thoughts on making the DB publicly accessible other than through teh Dot? Not sure what I'd do with all that data, but I'm sure there's a grad student somewhere who'd love the opportunity...
If brevity is the soul of wit, then how does one explain Twitter?
Brillant!
I always wondered where Paula Bean ended up...
mod parent up
The best education consists in immunizing people against systematic attempts at education. - Paul Feyerabend
Uhhhh who's your daddy?
A fool throws a stone into a well and a thousand sages can not remove it.
Slashdot being a news (for nerds) site, I would expect that the usage patterns are such that a huge majority of the content accessed by users is very recent -- say, perhaps, 90% of the database hits are for stories and comments that were posted in the last week.
So why, pray, is this usage pattern not accounted for in the database design?
Mod parent ... wait..
LOST Post!
http://slashdot.org/~themusicgod1/journal/137880 ...ok, so it was obvious...
Let the flamewars begin...
Web Sig: Eddy Currents
Take all the time you need, I'm more than willing to refrain from posting durin.... Oh shit!
At least nobody can feed the trolls now!
If Slashdot released the Slashcode more frequently, with more/better comments/docs, and encouraged some of the many of us who complain about bugs/features to help the project, then it's more likely that someone would have debugged this bug earlier.
Open source - it's not just a buzzword, it's a way of life.
--
make install -not war
And this is why you should not have arbitrary limits in your programs, ladies and gentlemen. Not even limits on the values your numbers can represent
Now this is a real Slashdotter! This guy knows how to build an infinite computer!
Terrorists can't threaten a country's freedom and democracy. Only lawmakers and voters can do that.
Why on earth does MySQL have a 24 bit integer datatype? On what platform does it even remotely make sense to use that in the first place? It's going to get cast to 32 bits for any arithmetic operations anyway, and on most platforms today alignment requirements are going to pad the extra byte in memory and disk, so you're not even saving any space. Why even give someone the option over choosing between 16 bit and 32 bit integers?
This
poot_rootbeer asks why all the comments are in one table, when the data access pattern is such that 90% of our hits are on only the most recent entries in that table.
The answer is that we used to do it this way but it's a huge pain. In 2000 we converted from having two tables for 'stories', recent and archived, and merged them together. The performance hit was not big, and it made the code so much simpler it was a no-brainer.
It's the database's job to cache properly whether we split the table or not, and the database does that just fine. The only performance problem could be when there is a rush of inserts, or updates to the same sets of rows, spanning both newer and older portions of the table, and that just doesn't happen.
If we did want to do this we wouldn't split the tables manually; the code complexity is too high a price to pay. In MySQL 5.0 we would use a MERGE engine, which has issues of its own but would involve smaller changes to our code. That's still not worth it for us.
What we're probably going to do is wait for MySQL 5.1 to get out of beta and then do some performance testing on tables partitioned by date and see if that gains us anything. For example, a SELECT on our comments table could be limited with a WHERE clause to only retrieve rows with a date >= the discussion object's date, which for 90% of our queries MySQL 5.1 could optimize to only look at the most recent partition. If the gains turn out to be significant, then since partitioning involves very limited code changes, we'll probably do that.
Generally speaking, though, database performance is not a problem for us. So far our main bottlenecks have been CPU and RAM on the webheads. As long as we don't do anything stupid our database performance has been fine, though, as today proves, we are quite capable of being stupid.
[ Parent ]
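To illustrate the date-partitioning idea in the post above, here is a toy model in Python. It is not how MySQL 5.1 implements partition pruning. The monthly partition granularity, the row layout and all function names are assumptions made for the example.

```python
from datetime import date


def partition_key(d: date) -> date:
    """Hypothetical scheme: one partition per calendar month."""
    return d.replace(day=1)


def build_partitions(rows):
    """Group (comment_date, cid) rows into their monthly partitions."""
    partitions = {}
    for comment_date, cid in rows:
        partitions.setdefault(partition_key(comment_date), []).append((comment_date, cid))
    return partitions


def select_since(partitions, since: date):
    """Return (matching cids, partitions scanned) for rows with date >= since."""
    scanned = 0
    hits = []
    for key in sorted(partitions):
        if key < partition_key(since):
            continue  # pruned: every row in this partition is older than `since`
        scanned += 1
        hits.extend(cid for d, cid in partitions[key] if d >= since)
    return hits, scanned


rows = [
    (date(2006, 9, 3), 1),
    (date(2006, 10, 20), 3),
    (date(2006, 11, 8), 5),
    (date(2006, 11, 9), 7),
]
parts = build_partitions(rows)
# A discussion created on 2006-11-08 only needs the newest partition.
print(select_since(parts, date(2006, 11, 8)))
```

With a WHERE clause on the date, two of the three partitions are never touched, which is the gain the post is hoping to measure.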
....yet another non-existent comment numbered 16777215. And another one. And another one.
Normally, accessing a non-existent comment gets you either the "nothing to see here" message or the "can't find that comment in this discussion" message. Where are the ghost comments coming from?
Any thoughts on making the DB publicly accessible other than through teh Dot? Not sure what I'd do with all that data, but I'm sure there's a grad student somewhere who'd love the opportunity...
Not just grad students; as a DBA by profession, I'd love a crack at the DB. If nothing else, it would give me a great place to play around with MySQL. Not to mention the ability to maybe extract some interesting user-level statistics.
Of course, the odds of this happening are pretty damn low - there'd have to be an awful lot of work and review done to scrub the DB of information that is entrusted to /. that people didn't plan on having released to the internet at large. Passwords, for example (even if they're stored only as hashes, getting the whole DB would make it feasible to crack them); real email addresses, real names...I assume that the subscription process doesn't involve actually storing credit card information in the DB (I don't know; I've only used PayPal), but that might be another concern.
Just the email addresses would be a huge deal - can you imagine the market value of such a targeted list of addresses?
In short, it would be fantastically cool for them to release the DB, but it would be a lot of work on their part for no particular return. Not to mention that if they released it once, they'd no doubt be pestered to keep releasing periodic updates...then there's the bandwidth issues...and, even, the potential copyright issues (/. doesn't own the copyright on posted comments, the poster does)...then the copyright issues for stuff they do own; releasing the DB would make it trivial for a bad actor to post a mirrored slash. A little bit of domain typosquatting and some ad deals, and you could be talking about real money.
If I were them, there's no way in hell I'd even think about doing it.
But it would be cool.
Parent
Reality has a conservative bias: it conserves mass, energy, momentum...
Would not have happened if Slashdot used PostgreSQL.
Let the flamewars begin...
Unthreaded flame wars are much less enjoyable.
The number 2^24 is of interest to digital computer artists, as that is the number of unique colors combined in the commonly implemented "True Color" RGB8 space. That color space is looking pretty limiting in some respects, but that is truly a lot of unique colors when you think about it. A 16 megapixel image does not need to repeat any color used.
If all the posts from the history of Slashdot were sorted into color bins, then once that were done, people could simply post their replies as a reference to existing posts. "Hey, #938D3A to you, buddy!" "Know what I think of that? #F2C2A9!"
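The color joke works because a 24-bit comment ID and an RGB8 color are the same size. A quick sketch, with hypothetical helper names:

```python
def cid_to_color(cid: int) -> str:
    """Render a 24-bit comment ID as an RGB8 hex color."""
    if not 0 <= cid < 2**24:
        raise ValueError("cid does not fit in 24 bits")
    return f"#{cid:06X}"


def color_to_cid(color: str) -> int:
    """Read an RGB8 hex color back as a comment ID."""
    return int(color.lstrip("#"), 16)


# A 4096 x 4096 image has exactly 2**24 pixels, one per RGB8 color.
pixels = 4096 * 4096
print(pixels == 2**24)
print(color_to_cid("#938D3A"), cid_to_color(color_to_cid("#F2C2A9")))
```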
Ah, MySQL. Where trying to insert a row with a column value larger than the column can actually store results in MySQL clipping it to the max value.
Fortunately, as of MySQL 5, you can fix this problem.
So, yes, early versions of MySQL had a brain-dead default SQL mode that simply "corrected" invalid column values, but MySQL 5 fixes this.
Now if only they would add column constraints...
You are in a maze of twisty little relative jumps, all alike.
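Here is a rough Python model of the three behaviors being compared in this subthread: clipping to the column maximum, rolling over, and rejecting the value outright. It only simulates the column and is not MySQL code. The function names are invented, and `store_strict` is loosely modeled on the strict SQL modes that arrived with MySQL 5.

```python
MEDIUMINT_UNSIGNED_MAX = 2**24 - 1  # 16,777,215


def store_clipped(value: int) -> int:
    """Old default described above: silently clip to the column's range."""
    return max(0, min(value, MEDIUMINT_UNSIGNED_MAX))


def store_wrapped(value: int) -> int:
    """What a plain 24-bit rollover would do instead (not the behavior described above)."""
    return value % 2**24


def store_strict(value: int) -> int:
    """Strict behavior: refuse out-of-range values instead of 'correcting' them."""
    if not 0 <= value <= MEDIUMINT_UNSIGNED_MAX:
        raise ValueError(f"out of range value {value}")
    return value


cid = 16777217
print(store_clipped(cid), store_wrapped(cid))
try:
    store_strict(cid)
except ValueError as err:
    print("rejected:", err)
```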
One day, Cmdr Taco is designing his database, and he sits down at a table with three integers on it. First, he tries the baby bear's integer, but exclaims "2 meager bytes is way too small for my appetite."
Next, he tries Papa bear's integer, but proclaims "4 bytes is way too big for my little site, I'd just end up wasting so much."
Finally, he tries Mama bear's integer, and extols "3 bytes is just right," not noticing it was really the same as Papa Bear's bowl in disguise.
This
Get your mod points ready, this is off topic, but considering the current state of discussion anyway, I don't feel so bad about it.
Regardless, while writing this post regarding why the /. admins won't (and shouldn't) consider releasing a copy of the /. DB to the public, something occurred to me.
Comments on /. are owned by the poster, according to that one line that shows up on all the comment pages (specifically, "The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.") At the same time, though, /. doesn't provide a method for having comments you've made removed from the DB.
If I own the copyright on the comments I've made, shouldn't I be able to rescind publication rights on them, and prevent /. from displaying them in future? Or is there some kind of implicit license in posting on /.? Did I clicksign an agreement covering this when I joined (this was getting on towards a decade ago, so I really don't remember the joining process at all)?
Or are publication rights, once granted, irrevocable?
Of course, I suppose asking questions when there's no way for people to hit reply is a specific form of vague insanity...still, I'm curious.
Reality has a conservative bias: it conserves mass, energy, momentum...
I'd really like to see it. I bet it goes something like, "what's this stupid web thingy anyway? I bet it'll never make it to version 2.0..."
The Kai's Semi-Updated Website Thingy
Sorry, I think index was a bad choice of words.
Whenever a post is made and it has a parent cid, that number must be stored in the table.
If MySQL saturates instead of rolling over (see this comment), then all replies after comment 16,777,215 will have the wrong parent cid, and I don't think there's any way to fix it.
:(){
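A small sketch of why saturation would be unrecoverable for threading, assuming the parent column really does clamp as suggested above. The reply and parent cids are hypothetical, chosen to be odd to match the odd-only IDs described elsewhere in the thread.

```python
from collections import defaultdict

MEDIUMINT_UNSIGNED_MAX = 2**24 - 1  # 16,777,215


def stored_parent(real_parent_cid: int) -> int:
    """Assumed saturating behavior: out-of-range parents become the column max."""
    return min(real_parent_cid, MEDIUMINT_UNSIGNED_MAX)


# Hypothetical replies: reply cid -> the cid it was really answering.
replies = {16777301: 16777217, 16777303: 16777219, 16777305: 16777221}

threads = defaultdict(list)
for reply_cid, real_parent in replies.items():
    threads[stored_parent(real_parent)].append(reply_cid)

# Three different parents collapse into one stored value, so the original
# parent can no longer be recovered from the table alone.
print(dict(threads))
```

If this is what happened, every affected reply would appear to hang off comment 16777215, which may be related to the "ghost comment" reports above.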
Ten points for honesty!
At least they didn’t try to make bullshit excuses. I respect them for being up front about the real nature of the issue.
Quantum materiae materietur marmota monax si marmota monax materiam possit materiari?
As your punishment, you should write some kind of data-mining algorithm that starts from the point you disabled threading and tries to construct intelligent threads based on the subject and the body of comments...
Twelve-and-three-quarter inches. Unyielding. This wand belonged to Bellatrix Lestrange.
I guess this is another thing to add to the MySQL gotchas page. Of course, in a decent database engine, like PostgreSQL, if you alter a column data type then the indexes are updated to reflect this.
With all teh funnae posts about it, let me be the first one to ask: why were you using 3-byte integers to begin with? Why would anybody anywhere ever use these for any reason at all? What advantage do these have? Why was this table laid out like this? This doesn't make sense to me at all. Were you really imagining that shaving a byte off each post was going to save you DB space? I can't quite believe that. But then what exactly would be the motivation for using such an odd integer size?
We're all born with nothing.
If you die in debt, you're ahead.