Schooling Microsoft On Random Browser Selection
Rob Weir got wind that a Slovakian tech site had been discussing the non-randomness of Microsoft's intended-to-be-random browser choice screen, which went into effect on European Windows 7 systems last week. He did some testing and found that indeed the order in which the five browser choices appear on the selection screen is far from random — though probably not intentionally slanted. He then proceeds to give Microsoft a lesson in random-shuffle algorithms. "This computational problem has been known since the earliest days of computing. There are 5 well-known approaches: 3 good solutions, 1 acceptable solution that is slower than necessary and 1 bad approach that doesn’t really work. Microsoft appears to have picked the bad approach. But I do not believe there is some nefarious intent to this bug. It is more in the nature of a 'naive algorithm,' like the bubble sort, that inexperienced programmers inevitably will fall upon when solving a given problem. I bet if we gave this same problem to 100 freshmen computer science majors, at least 1 of them would make the same mistake. But with education and experience, one learns about these things. And one of the things one learns early on is to reach for Knuth. ... The lesson here is that getting randomness on a computer cannot be left to chance. You cannot just throw Math.random() at a problem and stir the pot and expect good results."
Hmm, there's a nice shuffle implementation in Java that Microsoft could use... Oh, wait...
Rgds
Damon
http://m.earth.org.uk/
just requires something like:
pickabrowser() {
if (rand()>0.05) {
use IE
} else {
pickabrowser()
}
}
Nobody said anything about bias.
if we gave this same problem to 100 freshmen computer science majors, at least 1 of them would make the same mistake (as Microsoft)
Ouch, that's gotta sting.
What's the problem? It's random enough for a browser selection screen.
This isn't an application where a statistically random shuffle is required.
There were some folks a while ago who wrote up a security investigation they did of Party Poker's site.
One of the problems they had was a terrible, non-uniform shuffling algorithm. Completely different problem than what MS did, but interesting nonetheless. (I actually guessed that this is what MS did, but that's not the case.)
Showing a browser selection has been imposed on them and these geeks think MS is going to select the best approach possible for randomness. No wonder none of you are sucess in business.
Given that each user is only going to see this screen once per computer, I'd say simply using the seconds of the current minute as a random seed should be OK. Can't see why you would need more randomness that that in this particular situation. Just make sure that the distribution of browsers evens out for all seeds...
They could as well just have used the last millisecond to show the browser. I mean, it's a screen shown only once to a user. What's more random, and uniform, than the time the screen appears in milliseconds modulo 5?
Why not? Is the author suggesting that random functions in use today are somewhat deficient? What is his solution?
You know, it's really too bad that the author of the article the summary linked to didn't write up an article answering exactly that. Then maybe Slashdot could have linked to it.
(In a nutshell, the answers are, respectively: "because plopping a 'rand()' into your code doesn't mean that what you'll get out is uniform", "no", and "use a shuffling algorithm that works.")
People like the higher-ups at Microsoft (or most companies, I believe) do not care or want to hear about these issues.
:-|
If it does not involve a bonus (for the executive) or making them look good (the executive), engineers have to shut up and smile.
-Todd
p.s. Of course, this is my opinion--not what I would do, and it goes against good ethics. But, who in Silicon Valley cares about ethics?
Omne ignotum pro magnifico.
A Google search on:
gives exactly the bogus answer that Microsoft used in the top hit.
Unfortunately for Microsoft, a bing search gives the same top hit.
What the problem actually seems to be is not that they're using random, but how they're using it.
What they essentially did was use something like qsort() to sort the list, but in their comparator function, instead of returning which of the two strings comes first, they return some random crap instead. qsort() or whatever they used was designed to have results that are consistent, and with input like that it could potentially abort and leave the list entirely unsorted. Or if you're really unlucky it could sit there sorting them forever.
There are 5 well-known approaches: 3 good solutions, 1 acceptable solution that is slower than necessary and 1 bad approach that doesn’t really work. Microsoft appears to have picked the bad approach.
We are still talking here about the random selection of browsers, or something more broad?
...they should ask the Debian folks.
... nine nine nine nine nine nine ...
http://dilbert.com/strips/comic/2001-10-25/
Microsoft was so grateful, because Microsoft cares.
Fuck systemd. Fuck Redhat. Fuck Soylent, too. Wait, scratch the last one.
It is probably a combination of two things:
1) Hate for MS. MS is doing what some have said they've needed to do in giving users browser choice, and they've done so as to try not to promote any given one. While that makes proponents of choice happy, it makes MS haters mad. The more MS does to try and accommodate users and play fair, the less there is to hate on them for legitimately. As such haters are going to try and find nit picks to bitch about.
2) General geek pedantry. Many geeks seem to love to be exceedingly pedantic about every little thing. If a definition isn't 100% perfect, at least in their mind, they jump all over it. I think it is a "Look at how smart I am!" kind of move. They want to show that they noticed that it wasn't 100% perfect and thus show how clever they are.
Doesn't matter, it is what it is and as you said, random enough. This guy can whine all he likes.
"But I do not believe there is some nefarious intent to this bug"
The article states that IE is more likely than any other browser to appear at the bottom of the list. To me, this is one of two optimal positions (top or bottom being easiest to pick out).
Microsoft is so well known for dirty tricks I'm sure that this is not an accident
looking at the outcome IE comes off the worst with the current algorithm, please keep it that way. Thanks from all the Web Developers.
my band is more brutal techno punk than yours
i'm trying to figure out why they went to the trouble of screwing things up as bad as they did, a trivial decrementing for loop pulling randomInt(1,i) and placing it in slot 5-i would be the obvious off the top of the head solution, and it would work.
Snowden and Manning are heroes.
The only people where the selection and any possible bias inherent in the implementation of the random() function are the last group, which is also quite possibly the smallest of the three sets. In an ideal world, you are going to get a bellcurve with the optimum position being in the middle and the least favourable at the sides, so if Microsoft wants to implement random() so that IE is more likely to end up in the position that is joint least likely to be picked of the big five browsers, that's fine by me.
It's also apparently fine for Google's Chrome as well, so if you are anywhere near Ballmer's office in the next few days then I'd keep an eye out for any flying chairs, just in case...
UNIX? They're not even circumcised! Savages!
No, Math.random is not the problem, the problem is how it is used. They used it as random input to a sorting algorithm without considering how the sorting algorithm works. The assumption that any sorting algorithm with inconsistently random input = random order is wrong. If they had assigned a random value to each element and sorted by that value the result would have been truly random as the value associated with each element would have been consistent.
What they essentially did was use something like qsort() to sort the list, but in their comparator function, instead of returning which of the two strings comes first, they return some random crap instead.
That algorithm seems really wrong, for the reason you mentioned - that potentially you get an infinite loop.
However after further thought, I don't think that result is possible, and I find it more interesting to think about why this does not work as intended. After all, if you are randomly comparing random numbers why is the end result not still random?
First, why it is not going to ever result in an infinite loop - because pretty much any sorting algorithm is not going to engage in an infinite number of comparisons. Pretty much any technique I can think of is always going to be left with an assumption about a growing number of elements and where they fit into place, so even though the comparison result is totally random the sort will always complete (the results will just always be wrong, if you can call any ordering based on a broken comparator wrong).
I think that the non-randomness of the result is related, in that that assumption is made as to where a value falls and then the remaining elements fall into line after that result... which might explain how one element like Safari was so out of whack when the others were also out of balance but to a lesser degree, as the initial list is probably always fed in the same order and the algorithm works on one particular element first to start the sort (which might also explain the browser variation, having slightly different sorting algorithms).
"There is more worth loving than we have strength to love." - Brian Jay Stanley
Given that each user is only going to see this screen once per computer, I'd say simply using the seconds of the current minute as a random seed should be OK.
A) That was not the problem.
B) Consider the result instead of the algorithm is it OK to have your "random" list just about always present any one choice in the bottom two elements? Because that is what happened for Safari.
If you aren't going to insist on a list that's even close to random then you should not make randomness a requirement.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
One solutions takes 3 seconds, can be done by an intern, and makes the company no money. The other solution takes a little bit of time, maybe some reading or prior knowledge and still makes the company no money. The results yielded for each solution are acceptable for the situation. Given the cost to profit it seems like Microsoft chose EXACTLY the right solution.
This is like your community telling you that you must have a fenced in yard for your dog to be off the leash and then setting up a cheap 6-foot standard wooden fence and then the local anti government militia guy laughing at your ignorance because everyone who knows anything about fences knows you choose the solution that's 12 feet high with curved top to prevent climbing and a sunken base of 3 feet to prevent dog-tunneling.
Here's the problem - consider the results again. Safari will almost always (almost 50% of the time) be put in the bottom two elements. In fact depending on the algorithm used it's 40-50% chance of being put in one exact slot (either choice four or five).
When the whole point of the list is promote browser competition, it makes no sense to accept a list which is that skewed for ANY browser result from the list. You need to have it properly shuffled so that no one browser has a statistical advantage or disadvantage - if you are going to claim it doesn't matter then why not let Microsoft set an arbitrary fixed order for the list?
That is not what the legal injunction against them says they can do, therefore the randomness of the results DO matter. Just as in most things in life, correctness of results is actually important.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
Out of curiosity - why would they need to seed the random number generator?
Wouldn't -not- seeding it be better? That way MS don't have to include some seed number in their script and have everybody thinking they're manipulating the results by using seed numbers that they know would generate a certain ordering in various versions of IE.
I actually found this bit...
Although I can't verify this, as the test page provided... http://www.robweir.com/blog/shuffle.html ...won't actually run in IE8 here :D
That said.. odd method of getting random results out of an array.. I'm sure it's more efficient but KISS would dictate you grab one random result out of the array, delete the one you chose, pick a new one, delete, etc. until the array is down to 1 and you stick that last one at the end. Can't really go wrong with that unless the random number generator itself is hosed, and even non-mathematicians can understand how that one works. It's not like it's a high-performance application.
According to the description, it returns either 0 or < 0.
That's not the way the comparator function works in any implementation I've seen. Hopefully, the test code the author wrote wasn't as random as the documentation.
Excel formula:
=ROUND(RAND(),1)*10 .
This works for up to 11 different browsers (yes, 0 is a number).
kdawson obviously skimmed the article for a few seconds and attributed the flaw to Math.random and Microsoft.
Of course, hindsight is always twenty-twenty. We now know a shuffling algorithm should have been used and Microsoft could have evaded prosecution by including a random^H^H^H^H^H^Hshuffled list of browsers for shuffling the list of browsers instead of IE only to run the browser selection process.
i'm trying to figure out why they went to the trouble of screwing things up as bad as they did, a trivial decrementing for loop pulling randomInt(1,i) and placing it in slot 5-i would be the obvious off the top of the head solution, and it would work.
No, it wouldn't work. Here's a possible run of that algorithm:
Initial:
a = -----
Step 1: randomInt(1,5) -> 3
a = ----3
Step 2: randomInt(1,4) -> 1
a = ---13
Step 3: randomInt(1,3) -> 2
a = --213
Step 4: randomInt(1,2) -> 1
a = -1213
Step 5: randomInt(1,1) -> 1
a = 11213
Doesn't look like a random shuffle. Indeed, it doesn't even look like a shuffle at all.
Indeed, your algorithm has the following properties:
The Tao of math: The numbers you can count are not the real numbers.
Out of curiosity - why would they need to seed the random number generator?
Wouldn't -not- seeding it be better?
I can't speak for Javascript but in most C libraries - at least in the DOS era when I last made this mistake - the library would default to a static seed of its own during program initialisation, probably 0 or something.
As a result, if you made a program that started, output 15 random numbers and then quit, each time you ran it you would get the same 15 numbers. :P
What you're supposed to do to avoid that is seed the RNG at the program start, using the current time since the epoch as the seed. This is what I was referring to, not using a static seed
People don't take us seriously because we think that software should be implemented correctly, and that the programmers employed by the world's most successful software company should know trivial algorithms? Ri-i-i-ight.
Anywhoo... So what you're telling me is that Microsoft's programmers made a mistake for a production system that 99% of freshmen CS students wouldn't make? In this case, I think you're actually giving too much credit to Freshman CS students...
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
On the down side.. it's a webpage.
To cover the first bit.. it's a webpage - if a potential problem has been found, it can be fixed. I.e. this random selection thing? They can implement a better one.
Though I do wish they would start by fixing the page's use of javascript to sort the results -after- their display.
Turn off Javascript.. the order appears to always be: IE, FF, Opera, Chrome, Safari.
In fact - if you have a slow machine like I do.. leave javascript on, and just refresh the page.. look closely, and you'll find that -that- is the order of display *before* it gets shuffled by the javascript.
On the down side... if they were to detect the user not using javascript, and would switch server side-shuffled results instead, people would wonder how fair -those- are.. and since they can't peek inside the server, there's no way to tell.
Which leads to another down side... it's on a server out of our control - it's entirely possible, albeit unlikely, that one out of N hits favors browser X by design.
note: I don't really care much.. I don't think this should have been pushed through at all. But if they -have- to do it, I'd rather they do it well, just because it would have prevented the flurry of articles on this and other issues (such as the rollout not being instant, but stretched out through April(?) - probably to keep media attention / panicked users calling their tech support to a minimum), even in times to come.
Well unless there is something else going on Rob needs to go back to school too. A simple quicksort algorithm with the comparator replaced with a random choice becomes a recursive random partition of the array. A simple inductive argument shows that this will not produce the unbalanced results he found. So the problem is elsewhere (most likely in the RNG seeding or a bias in the pivot selection, if he is running it in a tight loop and the seed is something like, current second, then the same seed may appear several times).
A-ha... crystal clear - thanks for clarifying :)
Just to check - IE8 certainly seems to do this - whether it does it properly, I wouldn't know, but it's certainly not the same results each time I fire up IE8 with a test page (just printing out 10 random numbers).
Certainly not ;) But if MS were to seed the number, they'd still have to get that seed out of somewhere.. though I suppose they could use javascript date/time functions for that.
There's only ONE trully random-compliant function: http://xkcd.com/221/
I use it everyday. Funny, though, 4 turned out to be my good-luck number since I started using it...
Have you heard about SoylentNews?
And to follow up after re-reading the article he claims that non-termination is possible. But since in quick-sort each recursive call is to a smaller list (we removed the pivot element otherwise the algorithm has real problems) the sort must always terminate.
Now this might not be a very elegant solution and definitely violates the abstract specifications of a sort routine but the problems pointed out by Rob are not actually the real problems. Way to go slashdot, always making sure articles that slag on Microsoft are factually accurate .
I'm almost certain he's suggesting you splice out the array item chosen each time so that the remaining indices are enumerated as 1-X at all times. I'm pretty sure that would work.
I'm not convinced it is simpler, although I see a couple people on slashdot suggesting it is. Adding a random comparator to an existing sort function is dead simple and obvious (wrong, but simple and obvious), and I'd be surprised if it weren't a lot more than 1% of freshmen that come upon that one, particularly when the top search results actually suggest that broken algorithm, unless these freshmen had never encountered sort functions before.
Microsoft used bogus, non-static, comparisons of random numbers to "sort" the list, but they generated new, meaningless random numbers for every comparison. IE must have been the last entry, and had a 50% chance of the sort loop exiting the first time it was encountered (leaving it in the last position). IANAProgrammer and this is an obvious error to me. I thought TFA was going to be about a poorly seeded random number generator.
Isn't the *middle* choice going to be more important than the leftmost one?
We have a screen-centred dialog here that shows five browsers. One is going to end up smack-middle on the screen - and guess what - Windows centres the mouse pointer when it boots, which is when the screen will appear.
People often go for middle ground. Then they go to the direction they read... So in my mind, the 3rd choice is going to be the most important, followed by 4 for LTR cultures, and 2 for RTL.
But I'm talking out of my ass here... Any research into the importance of elements in a short horizontal list?
As another poster said there is a slight error. A fix would be to use a vector holding the remaining browsers to randomly select as the next item in the shuffle list. Or you could use probability theory to shift the list of randomly generated indices so only remaining indices as selectable in each iteration.
MS should have kept it simple. There's no need to overcomplicate something so trivial.
Ok in all fairness, bubblesort is arguably the best sorting algorithm because in today's modern computing power it will do the job sufficiently for 99% of your problems and is easy to implement and verify. Its especially efficient when appending an already sorted array to another already sorted array, which some other popular algorithms can't claim.
And the naive solution to a shuffling an array is similar to bubblesort, traverse through the array and generate a random number up to array size to swap the element with. If current position + number > array size then deduct the array size from the number to get the position to swap with. Maybe I should patent that because it apparently isn't obvious.
I think I have to agree that this looks like a situation where an author, likely a young one, is trying to nerd off when they dont understand how Microsoft operates. My question is why would you go through this kind of effort in the first place...
Still caring about which browser people are using? Some people just can't let the fuck go.
It falls right up there with what kind of toilet bowl cleaner I use on my list of things that concern me.
I'm a senior programmer, over 10 years experience. I don't think I would have EVER imagined a solution to randomizing that would involve a sort comparison function returning a random result. This is because I have an understanding of the implicit (and in some languages, explicitly documented) total ordering contract imposed on the comparator function.
But you know what? If Microsoft doesn't care for nice things like correctness, why would any big software development company? Excepting cases like Wolfram Research's Mathematica and Autodesk's AutoCAD, of course. And yet they have the balls to call their people software engineers?
I'm not disagreeing with the conclusion that the quick and cheap way is sufficient and satisfactory from both the business standpoint and for this problem the technical standpoint. I'm simply saying that, as a profession, the field is so compromised with crap like that this that it's no wonder most software sucks. It's easy to see how an experienced and knowledgeable developer can be replaced by a couple of best-low-bid hacks.
Of course management rarely sees or understands the difference between throwing a hacked up function at an unimportant problem like this one, and doing the same thing when designing the firmware for a car, a risk calculation algorithm for derivative investments, or the math portion of a CPU. As a result, they'll promote the hack who spewed out this junk to a position working on something important, and the same bad algorithm will get used when it matters, and off we go with the people dying and the lawsuits and the congressional investigations.
"Never ascribe to malice that which is adequately explained by incompetence" One thing I couldn't help but notice though, is that Microsoft always pops IE in the number one spot for a moment *before* shuffling the browsers and showing them in randomized order... Very visible if you visit the ballot manually in IE and hit F5 a few times: http://www.browserchoice.eu/
Both the article and the summary mixes up the concepts. Randomness and bias are related but different things. Think of a biased coin loaded in favor of heads - the heads may appear twice as often as the tails, but the distribution is still random. Here too, contrary to the summary's claim of "far from random", the results are random, just biased, and biased against IE, if I may add, which is an important fact the summary omitted.
Leaving the random number generator unseeded can sometimes produce skewed results. It used to be that you were almost guaranteed to get the same output sequence, especially in optimized languages, as the default seed was some fixed value. This isn't as much of an issue these days, as most get properly pre-seeded with the time or some other variable. But it is still good practice to properly pre-seed your code, especially for old systems / guaranteed randomness / stodgy old manness.
The ______ Agenda
Why does it have to be random? If they're not taking into account market share or other factors, evenly dividing the possible permutations of browser order should be more "fair" than a "random" order. So you could cycle the permutations in order based on serial number. The only way this wouldn't work would be if you do multiple installs/selections per OS. In that case, make every install go to the next permutation in the list. You'd have to phone home to track this, but Windows phones home constantly anyway.
Of course, they'd screw up the implementation of this approach too.
one would think that a company such as Microsoft that has been owning the software market for decades now would know how to implement a randomizing algorithm correctly.
WTF? They can't even get their slashes round the right way.
Deleted
Yes, because Microsoft is the only company that puts out software with vulnerabilities it in. I suppose that's why my Mac and Linux boxes install security updates all the time too.
Don't take life so seriously. No one makes it out alive.
1 acceptable solution that is slower than necessary
Good god, they're shuffling an array with 5 freakin elements. The "slow" solution would have been more than adequate. In fact, I would argue it is the best solution since the code would have been simpler and easier to maintain (and quicker to download).
Ummm, just how many times does this man think people will register there new copy of Windows or are people going to be given a browser choice every day/Everytime they bootup?? Sometimes people just take things that don't matter way too far.
Jack of all trades,master of none
Well unless there is something else going on Rob needs to go back to school too. A simple quicksort algorithm with the comparator replaced with a random choice becomes a recursive random partition of the array. A simple inductive argument shows that this will not produce the unbalanced results he found. So the problem is elsewhere (most likely in the RNG seeding or a bias in the pivot selection, if he is running it in a tight loop and the seed is something like, current second, then the same seed may appear several times).
You do realise that quicksort is whole family of algorithms? Most library routine writers consider it a bug if you sort the same data twice and it comes up in a different order, so they tend to avoid random pivot and use median of 3 or 9 pivot instead. These will give biased shuffling. They also also switch to insertion sort for small subarrays which will lead to biased shuffling.
Closing the walls partitioning takes 1/2 to 1/3 times as many swaps as the Lomuto partitioning (which is a simple left to right for loop) so it's usually the preferred method to partition. Lomuto also goes quadratic for equal valued elements. Most closing the walls partitions use the pivot as a sentinal, but with a random comparison the indexes can walk right past the end the subarray and generate an array index error.
I developed one of the good implementations in school... god, Microsoft is such a pile of tinkerers! Is there anything they can get right, that a highschool student could not? No, I don't mean marketing, bribing politicians and blocking competition! They exist for 35 years and - heck - they still have problems with leap years!
sometimes I get the impression that they are trying the "a thousand monkeys on a thousand typewriters" approach...
The MAFIAA is a bunch of mindless jerks who will be the first up against the wall when the revolution comes
Talking about random shuffles, does anyone know of a shuffle algorithm emulating a "real" dovetail shuffle of a deck of cards?
In the late 80's, Dave Bayer and Persi Diaconis (Trailing the dovetail shuffle to its lair. The Annals of Applied Probability) showed that 7 shuffles of a deck of 52 was sufficiently random for most purposes. The New York Times ran a story and following a number of critiques showing that the results were not sufficiently random for every kind of application (see New Age Solitaire), I heard no more of this seemingly promising NlogN approach to shuffling.
And one of the things one learns early on is to reach for Knuth
Knuth is for computer scientists. Not everybody who writes code meets that definition. A lot of us (and I include myself) don't even qualify as "engineers".
For most programmers, the best way to write good "select random x from 1..n is not to brush up on our algorithmics. That's like fabricating a car part instead of going to the auto supply. (Hey, there's a good reason the car analogy keeps popping up!) You need to rely on standard, well-tested libraries. Josh Bloch even refers to this use case as an example of why you should rely on library code.
At least based on the summary, I'd say the author also provides a really good demonstration of "how to be a smug condescending twit".
#DeleteChrome
An array can be easily randomized in JavaScript by simply assigning a fixed random number to each element of the array:
function randomizeArray(a){var i,b=[];for(i=0;i<a.length;i++)b.push([a[i],Math.random()]);b.sort(randomizeArraySort);for(i=0;i<b.length;i++)b[i]=b[i][0];return b;}
function randomizeArraySort(a,b){return a[1]<b[1]?-1:a[1]>b[1]?1:0;}
usage example:
alert(randomizeArray(["a","b","c","d","e"]));
I do not understand the problem. Why is it "bad" if one of the browsers appears on the far right side instead of the far left side? I know it would not influence my choice, what WOULD influence my choice is if I had heard from friends/colleagues etc about one or more of the browsers before. Whether Firefox is on position 2 or 3 or 4 is totally irrelevant. Maybe they could just present the browser choices in a big circle instead of a horizontal list ...
Yes I am well aware of this, in my post I state a "simple quicksort" algorithm which precludes several of the issues you mention and I discusses the possibility of problems in the pivot selection. But as you will note the article does not discuss any of these issues so we have no idea if it is one of these or not (it just claims it must be the sorting).
So we basically have an article that says I ran a benchmark and I can't explain the results so I'll just make up something that sounds good.
This is the result of a government mandate.
Why on earth did said mandate not specify what method was being used to sort the browsers? Surely it gave parameters for inclusion in the list...
Wow, did you 'graduate' from the same school as the Microsoft programmer who wrote this code?
The sort function requires that the comparison function defines a strict weak ordering. Anything else violates the precondition, and the result of calling sort with a bad comparitor is completely unspecified. Not long ago on comp.lang.c++.moderated there was a query from someone complaining about what appeared to be a bug in their standard library sort() function, which was consistently crashing. It turned out their comparitor was faulty, and the sort function was rolling off the end of the array and accessing beyond the array bound.
Simply assign each browser a random number, then sort them by each corresponding number. The current solution is over-engineered. It's till random, but lacks an even distribution of probability. They should make the very simple change to "fix" that, if for no other reason than just to be beyond reproach.
-- "I never gave these stories much credence." - HAL 9000
Is it just me, or does everyone find this extremely irrelevant...seeing as I've already made my browser choice. This only affects "first-time computer users," really....which should be less than 1% of the European population by now. Is Europe so short on teenagers to set up their (gran)parents' new computers that we need to worry about the noobs?
Most likely I graduated from a different school but I appreciate that you asked.
You are quite right the formal precondition of the sort method is violated and this implementation is quite sketchy in that respect. However, that does not tell us what the distribution errors are caused by. As Rob is making other strange statements like possible non-termination (which should be impossible in any reasonable quicksort), I am not particularly happy to just believe him without a more careful analysis (e.g. seeding, how the test harness was implemented, pivot selection, etc.).
Three things would be needed for Microsoft to implement an effective random selection:
a spec' to follow the "standard"
a competent programmer
management willingness to let the programmer follow the spec'
Starting back with C-80 (a CP/M-80 attempt at a "C" compiler), Microsoft has consistently demonstrated that their organization lacks at lest one of those attributes.
int t_array[16];
int *t_pointer = &t_array[0];
however, in C-80 (t_pointer != t_array).
The Amiga programmers guide specified that addresses were 32-bit (68000), and not to store any data (flag bits, whatever) in the upper byte. Microsoft stuffed flags into the upper byte when implementing Amiga Basic, which promptly broke everything when the 68020 bonded out the upper address byte.
Try visiting the page with javascript disabled. You'd expect to see a message along the lines of "this page requires javascript to display correctly", right? But of course, that's not what you get in this case...
I always liked Knuth's choice quote from John von Neumann on the subject:
"Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin."
- Michael
I wonder how many javascripts on the web are using this same flawed random sort implementation? Probably lots.
-- "I never gave these stories much credence." - HAL 9000
One particular web developer is less intelligent than your average CS major. News at 11!
So a billion dollar company is safing a few pennies by not using a proper routine, that they probably got developed and ready to use in any case and instead produces out of several approaches the one that is bad.
And that doesn't tell you anything.
Well, that tells me something about you. Can you guess what it is?
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
This ain't hard and MS has got (or should have with the salaries they are paying) plenty of smart people. And yet they continue to screw things up time and after time. And this is not a compatibility issue were MS has to ensure its software runs on a billion different configurations with users disabling critical stuff because they think they know better. This is just a random selection, a text-book case and they get it wrong...
And those who claim about MS not wanting to spend countless resources on this. Sorry, but any decent dev would have knocked this out in a day. Give a week of testing and we are still talking peanuts in development cost for MS. That they still didn't get it right. Well that says a lot about MS. It always seems to be a race towards the bottom for them. With every product, everything they do, you got to wonder, what did they screw up this time.
IE5 mobile browser without CSS. X-box that is underpowered. New music store for the zune so that existing stores and music don't work. New x-box with a failure rate that you couldn't get if you dropped half the boxes during transport.
And if they can't even get the right random algorithm, how can we be sure they get anything else in their code right? Would you buy a car if the car radio fell out whenever you turned it on? No, because it says something about quality. If the things you can see, don't work, then what about the things you can't see?
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
It's not modulo 5, it's modulo 5! (the number of possible permutations of N items is N!=1.2.3...N)
You are still confusing idealized algorithms with real-world implementations. Where in the javascript specification does it say that the sort() function must use QuickSort? And even more specifically, where does it say that the QuickSort implementation will coincide exactly with the idealized version you learned in CS101 ?
In practice, implementations of sort() generally use something other than QuickSort whenever the array (or partition) is smaller than some (small) fixed size. This is because using QuickSort to sort a very short array is actually quite slow, much slower than other algorithms (consider for example the sequence of operations performed by QuickSort on an array of 3 numbers, compared with a simple hand-coded direct sort). So, even if the Javascript implementations in question are using an algorithm based on QuickSort for large arrays, it is quite likely that for the array of 5 numbers it is using a completely different algorithm. How are you to know whether that algorithm is guaranteed to terminate if given ill-formed input?
Huh? The distribution errors are caused by the comparitor, that violates the precondition for the sort() function. Full stop, end of story. The actual distribution will be likely depend sensitively on the details of the sort implementation, which it almost certainly the case is since it appears to be a different distribution for every browser. If you can get hold of the source code of the javascript interpreter you might be able to see how sort() was implemented in a specific case, and understand how the actual distribution arose, but that isn't a very useful thing to do. It may well change in the next 0.01 version of the browser anyway.
Pretty much any technique I can think of
One that comes to mind that could potentially run forever with a sufficiently fucked comparison function is a naive bubble-sort that doesn't bother with the optimisation of reducing the list size with each pass and uses a pass with no changes as the terminating condition.
note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
Those of you that are computer scientists should take a moment to consider that randomness is not the same as uniformity (as an insightful reader commented in TFA and triggered me to respond there).
Just because the only way to produce an algorithm for uniformity is via a random number generator, this does not mean that there aren't other non-statistical approaches. Here's one:
"The computer upon Windows installation contacts a MS site that uses a global installation counter - each new installation would increase the counter from N to (N+1) and then present a browser order according to (N modulo 5!). This is a totally deterministic process, with no randomness at all (statistical tests for randomness would fail because of the autocorrelation), which however would lead to perfect uniformity: at any given time instant, each browser would have been placed in each of the 5 positions with a percentage of precisely 20%, as required. The same kind of uniformity could be produced by using the installation serial number (licence) of Windows: since the licence key space is well-defined, the order of browsers could be also well (uniformly) defined from the serial number itself. There might be a problem with volume licences, but VLKs are a small percentage of total installations.
However, on a single offline computer, with no knowledge of history (what ballot was presented globally) or without a licence key, programmers have to resort to mathematics in order to produce
uniform (not necessarily random) distributions. This is an application of the law of large numbers: if the ballot is uniform on the same computer, it will be uniform globally." (using quotes because I'm quoting myself).
In conclusion, we should not care if the distribution is not "random" but whether it is uniform (i.e. all possible permutations of 5 browsers appear with equal frequencies).
We now know a shuffling algorithm should have been used and Microsoft could have evaded prosecution
Wow, I didn't know the EU already brought another anti-trust suit over this!
Oh wait, they didn't, who are you to say this function isn't sufficient to protect them from further prosecution? As I understand it, all they had to do was present a list of browser choices instead of just loading their own, and not choose the order of that list ahead of time. Of course, any list is going to have an order of some sort to begin with, so a simple random function accomplishes that goal.
There is a distinction that needs to be made here, it's a bit of a semantic but it is important nonetheless: It is not Microsoft's responsibility to "promote competition" against themselves. It is Microsoft's responsibility to not abuse their position of market dominance to unfairly disadvantage their competition in order to drive them out of the market. The EU felt that installing IE as the default browser on Windows was unfairly restricting a scarce resource to gain competitive advantage. As such, IE cannot be the default browser, and users must be given a choice from a list not chosen ahead of time.
You can't honestly argue that they are unfairly limiting a scarce resource in order to gain an unfair advantage because of a 50% chance they'll be the end option among four other options, can you?
You guys seriously need to get some perspective. I know it's fun to bash the big company, but seriously, it gets old after a while. Just keep them in line, that's all that's needed.
Security is mostly a superstition... Avoiding danger is no safer in the long run than outright exposure. - Helen Keller
Is there any guarantee that the sort algorithm the browser uses will be a quicksort?
note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
That's a good point, in fact since a plain bubble sort requires a pass with no changes I think it would run for a very long time indeed...
No sort called from a library would be using Bubble Sort though, if you're going to the trouble to build an API with a comparator your sorting algorithm should be half decent.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
Safari will almost always (almost 50% of the time) be put in the bottom two elements [out of five].
Very funny, but you ignore the fact that between the two different browsers "bottom two" really equates to one exact position in each case. Sure "almost always" is probably not a good description but even 40% is a lot more than 20% no matter how you figure it.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
Wow... the bottom 2 elements out of a possible 5 ALMOST 50% of the time.
Read the results. It's really ONE out of five, it's just that the position changed depending on the browser that was performing the test. So it's not two out of five, it's one out of five - it's just which one is the chosen slot can alter based on the algorithm used.
Yes, 40% is far different than 20%.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
I agree with your general point. The particular implementation was not good. Violating method contracts leads to bugs. Yes, obviously. But issue was poor distribution in the random numbers. And the author never determined why this was the case. Was it the implementation of quicksort, an issue with seeding when run in a tight test harness, as you mention does it used different sorts for different input sizes, etc?
So yes, we have a program that is poorly written and produces the wrong answer. We also have an article that is poorly written and does not explain why the results are what they are (other than don't violate method contracts, which is good advice but not very enlightening).
No there is no guarantee and the article does not specify which sorting algorithm is actually used. But any reasonable sorting algorithm should terminate very quickly even with the incorrect comparison function. Which is one of the reasons I found the article annoying, lots of general claims but very little analysis in the end.
I usually don't post much but for some reason the article struck me as designed to have just enough technical content to seem authoritative but that the intent was simply a chance to get in some cheap shots at Microsoft.
Microsoft was one a start-up, where asking for forgiveness was the rule and the best results trumped the right way. Now it is a Fortune 100 multinational corporation, a'la Exxon. The rule is "ask for permission," and performance reviews are managed by an HR department the size of Twitter. I suspect that someone from Legal now approves the "randomness routine." Lest you think I jest, I bet they have a patent pending on this infamous algorithm.
What I find even worse than the 'non-randomness' of the choice screen, is that the choices are shown in a fixed order as the page is downloaded and prior to the script being run. This can take several seconds, depending on your connection speed etc. This is more than enough to read the first couple of entries and make your choice.
Internet explorer 8 is the first choice in the fixed order that you see while the page is downloading.
http://www.browserchoice.eu/BrowserChoice/browserchoice_en.htm
They should have the options as 'visible=false' until they are randomly sorted, then make them visible.
I.O.U One Sig.
How do you know?
Because I've not been living under a rock for the past year or so? Did you honestly read NOTHING about the whole EU case against Microsoft and the resolution?
"There is more worth loving than we have strength to love." - Brian Jay Stanley
Failure to understand the article isn't really fucking up. It's standard Slashdot practice.
There's no point in questioning authority if you aren't going to listen to the answers.
Did you actually read the article?
No, the random numbers themselves were just fine. If there was a problem with the random number generator, then the results in the section 'Fixing the Microsoft Shuffle' in the article would have not produced a uniform distribution.
The real problem is explained clearly in the article:
Garbage In, Garbage Out.
If you want to analyze the behavior of a specific implementation of an algorithm given input that violates the preconditions, then lookup the source code for your javascript interpreter. That may be a fun exercise to do, if you are bored, but it isn't very illuminating unless you want to understand why browser X produced that specific distribution. If you call sort() on data that doesn't satisfy the requirements of a comparitor, then you are not actually sorting the data. You can follow through the algorithm and see what it would do instead, but for sure it isn't sorting it. Consider for example the behavior of a typical hand-coded routine for sorting an array length 3 (sorry about intending, can't figure out how to get it to work properly in ecode):
For an exercise, try following through the probabilities of each outcome, if we replace the comparison operation with a coin toss. An unbiased random shuffle should have an equal probability of producing each permutation of the array. For 3 numbers, there are 6 possible permutations so each outcome should have probability 1/6. Is that satisfied here? (hint: no!) Try the same exercise with [insert your favorite sort algorithm here]... Try it with bubble sort too. That is probably what Rob had in mind with the 'may cause infinite loop' comment.
Unless you have some non-garbage to contribute to this discussion, I'm out of here.
Can't we go ONE WEEK without a Microsoft story hitting the front page? Seriously, how many shills do they have working here?
Microsoft didn't seem to have this much trouble generating random numbers when they wrote Excel.
Have gnu, will travel.
This error can be easily traced back to the first google result (Actually it should be the first bing result in this case).
To be perfectly honest, this is exactly what I would have done too.
It takes 5 minutes to dig out my copy of Knuth.
It takes 1 minute to pirate Knuth and search through the pdf.
But it only takes 10 seconds to copy and paste this one-liner from the first google hit.
That probably explains my lack of success in the job market for the past decade...
One thing I couldn't help but notice though, is that Microsoft always pops IE in the number one spot for a moment *before* shuffling the browsers and showing them in randomized order... Very visible if you visit the ballot manually in IE and hit F5 a few times: http://www.browserchoice.eu/
Yes, that's the initial ordering and very visible if one turns off Javascript (NoScript).
Internet Explorer 8
Mozilla Firefox
Opera Browser
Google Chrome
Safari
Maxthon
K-Meleon
Flock
Avant Browser
Sleipnir
FlashPeak SlimBrowser
GreenBrowser
Yet Socrates himself is particularly missed.
A lovely little thinker but a bugger when he's pissed.
Or at least download a copy of Linux. There. Problem solved. Time for lunch.
For very small sorts (four or less I think) bubble sort is actually more efficient than other more advanced algorithms. So it could get used under some circumstances.
Or at least, I read that it was more efficient once in a book somewhere. Perhaps that is no longer true.
I, too, was amazed. Instead of choosing some easy, if slightly tedious, solution, which would be very trustable, they chose some idiotic solution that risks infinite loops at unpredictable times. I might be moved to finally admit there is some purpose to having math education in CS -- to try to avoid this kind of really dumb programming.
Well, the code might be easier to understand if you're using a high level language that has push and delete from array commands. Or it would be if the poster hadn't screwed up the explanation...
What gives you that idea? Ideally, if you supply an invalid comparitor to a sort function, then a quality implementation will do whatever is best to aid debugging (especially if it has a debug mode activated). Some examples of what I would expect: crash with a segfault and activate a debugger; throw an exception; go into an infinite loop and wait for a programmer to attach a debugger.
The worst possible case would be to execution to continue as if there was no error, when in fact the postcondition of sort() is not satisfied and the state of the program may be logically inconsistent.
So people keep saying press F5 to see the unsorted order, and I went to try it (after allowing javascript), and when I refreshed I got: Google Chrome image - Google Chrome Firefox image - Safari Firefox image - Firefox IE - IE Opera - Opera and after about a 2-3 second pause, staring at that, the first Firefox image changed to the correct Safari one
> "99.99% random" process into a "100% random"
Did you look at the same stats I did?
Incorrect: X-squared = 13340.23, df = 16, p-value 2.2e-16
Fixed Version: X-squared = 21.814, df = 16, p-value = 0.1493
It's not 99.99% random. It's way off. But you don't know what any of those numbers mean, do you? Or are you going to go Google them?
Out of curiosity - why would they need to seed the random number generator?
Wouldn't -not- seeding it be better? That way MS don't have to include some seed number in their script and have everybody thinking they're manipulating the results by using seed numbers that they know would generate a certain ordering in various versions of IE.
If you don't seed a random number generator, you get the same sequence of values every time. As it happens, the Javascript random function seeds based on the current system time, which is usually sufficient when you don't need your randomization to be cryptographically secure. So you're right that the programmer should not need to specify a seed in this case, but the fact is that the RNG is seeded behind the scenes (by necessity). Also no competent programmer would ever hard code a "seed number" in their script, as you suggest, as it would be just as bad as having no seed--you'd get the same sequence each time. Ordinarily, if you're manually seeding (and, once again, not doing cryptography), you use the system time just as Javascript automatically does.
That said.. odd method of getting random results out of an array.. I'm sure it's more efficient...
Efficiency doesn't matter when you're doing it wrong.
Not only is this solution "good enough", from an anti-trust standpoint it is probably far better than a truly random function.
Obviously dude INAL, but he seems to think he's a judge and speaks for Europe. If the European court wished to have Microsoft innovate "better than random" they could have asked for this solution directly, or Microsoft could have proposed it formally. Anything less is flirting with contempt of court. To think that they pay their junior legal representation on the order of $300/hour just to ask the judge if they might please submit such a request, the idea that they elected to go with "good enough" to save 15 minutes of programming effort that a high school intern could have done better boggles the mind.
But thanks for the faux MBA. After Canada won the hockey game (woohoo), I hung around for the free Texas Hold'em, my first game in real life with strange drunks. In this game, you got a special chip for each beer consumed prior to the game. This chip was worth a SB on any round of play. The guy with seven of these white chips couldn't even properly count out a raise of 1000, but he did offer some excellent business advice while he fumbled around.
Another guy with a big stack of white bonus chips sees my 20 BB raise from early seat with his suited J2 (I'm holding the big slick), stays with me after I go all-in hitting an ace on the flop, then busts me out when he hits his flush draw on the river.
Then I return home to a troll paean that Microsoft's formal incompetence is "better than random". Some days you can't win.
The act of determining whether it is likely to be more efficient probably wastes more CPU cycles than are saved by switching sort algorithms. Anyone who builds in the optimisation of using a bubble sort for extremely small data sets is a clueless idiot when it comes to the overall picture.
I bet if we gave this same problem to 100 freshmen computer science majors, at least 1 of them would make the same mistake.
Well, that seems an awfully high standard for Microsoft to hold itself to.
Microsoft: Ranked 1st percentile with Freshmen CS Majors
int placeInternetExplorer() //chosen by fair dice roll
{
return 1;
}
Maybe Im wrong,, but do Apple offer a choice of browser, with a random selection on a new os? Im sure that they don't.
It seems half the things MS are criticized for, actually the competition do also.
Choice of Browser on default install
Backwards compatibility (try getting the latest mac os on a mac mini...)
Sharing drives (NTFS shared portable drive on a MAC / With a PC.
Pot calling the kettle black..http://developers.slashdot.org/comments.pl?sid=1566166&op=reply&threshold=1&commentsort=0&mode=thread&pid=
How the hell is the parent with a quote from TFA and a funny comment on that off-topic?
Looks like an MS troll did modding.
Slashdot really needs to fix the moderation and boot anybody who uses it for improper reasons.
Why Microsoft is required to put a browser selection list at all is beyond me- do we require Toyota to offer GM, Ford, or Chrysler wheels when selling new cars? I see it as the same issue.
if you never use the same random number twice. Sorting fails because if two or more elements have the same 'random' number assigned to them, they won't switch places compared to the other elements compared to whatever numbers the rest has. This means that it's less optimal.
So calling sorting a non-optimal randomization technique isn't always valid: assigning unique random numbers to the elements does work.
Never underestimate the relief of true separation of Religion and State.
SourceGuard: whenever you right-click, a message pops up. Enterprise level source-code protection for commercial websites. Compatible with their IRM(TM) technology (Education discounts available)
tons of R&D, you guys!
o hai
One simple, if cynical, explanation for how this occured is that someone at Microsoft came up with this simple, but predictably bad, implementation and then they figured out which starting position gave IE the best final position and that's the order that went out to production. It would have taken less than a programmer-day of experimentation to figure out which input array gave the optimal distribution and that kind of investment is completely worthwhile given what Microsoft sees as being at stake here.
For a country^W company with a history like Microsoft's, there's really no reason to be cutting them slack on a high-stakes 'error' that resolves in their favour -- In fact, it's a combination of such 'errors' that put them into the sights of anti-trust investigators in the first place.
Finally -- even if this was the result of massive managerial oversight, the error is now in the open, and the fix is (a)almost trivial, and (b)available on Wikipedia.
if MS doesn't fix this problem now, then I'd cite that refusal as evidence that the 'error' is really intentional.
Free Software: Like love, it grows best when given away.
Comment removed based on user account deletion
If you want to analyze the behavior of a specific implementation of an algorithm given input that violates the preconditions, then lookup the source code for your javascript interpreter. That may be a fun exercise to do, if you are bored, but it isn't very illuminating unless you want to understand why browser X produced that specific distribution.
Yeah that is exactly what I wanted because the article was about why a particular javascript implementation produced a poor distribution for the browser selection screen in the EU browser ballot. That is a specifc bug in a specific implementation.
I can tell you that failure to do array bounds checking before loads/stores will lead to bugs as well. But that does not explain why a specific bug exists. If you want to discuss poor programming practices that is fine but I did not find this a very enlightening way to do it.
And if you run bubble sort, as described in basic algorithms, it will always terminate (regardless of the comparator) since progress is always made via the loop indexing. In fact termination is often not a function of the loop body behavior. One only needs to show that there is an order (which is not infinitely decending/asscending) on some function of the program state in order to ensure termination. For bubble sort this is just the order on the loop index counter (i).
No, the bug was not in any specific implementation of Javascript. The bug was in the code that called sort() with invalid input. It is unreasonable to expect that a function will do what you want it to do, if what you want is outside of the specifications of the function. Here is a challenge for you: produce a sort function, that functions as a usual sort function when provided with a comparitor that defines a strict weak ordering, but when supplied with a comparitor that produces random coin toss output produces an unbiased random shuffle. I think you will have great difficulty doing this! It will almost certainly result in a non-optimal function.
No, that is very rare indeed. The only times where lack of array bounds checking is an actual bug is where the index into the array is not well controlled (eg, it came from user input). In 99% of cases, the index into the array is some previously computed value, and if it happens to lie outside the array bound, then an error has already occurred, somewhere prior to the array indexing. That doesn't mean that array bounds checking isn't useful: it is very useful, but the bug that led to the array index out of bounds has already happened by the time the indexing itself occurs. It is only a diagnostic tool, like a CCTV camera recording the identity of a murderer - by the time you see the recording, the murder already happened.
The usual implementation of bubble sort keeps a flag to indicate whether any swaps were performed on the current iteration. If no swaps were performed, the algorithm exits because (for a well defined comparitor) the array must be in order. Keep looping until no swaps were performed.
It is also possible to loop for exactly N iterations, because it can be proven (again, for a well defined comparitor) that at most N iterations will be performed. But this is slower on average, because it ALWAYS performs O(N^2) iterations, whereas if you check the number of swaps and terminate once the array is in order, the average case can be less then O(N^2). For example, if the array is already sorted, the algorithm can exit after 1 iteration.
Sorry, that wasn't clear: I meant to say O(N^2) operations. I'm using the convention that a single iteration of bubble sort is a loop over all the elements in the array, comparing (and optionally swapping) elements a[i] and a[i+1], so that each iteration is O(N) operations. The complete bubble sort is, for worst case, O(N^2) because it takes in worst case ~N iterations.
Look, I agree with you on mis-using the sort routine. Using a method to do something unintended does not cause bugs, it makes them more likely and is bad form. But without further analysis it is not certain if and why the sort is the source of the bug.
As for your challenge see the first example of quicksort from wikipedia:
--------
function quicksort(array)
var list less, greater
if length(array) 1
return array
select and remove a pivot value pivot from array
for each x in array
if x pivot then append x to less
else append x to greater
return concatenate(quicksort(less), pivot, quicksort(greater))
------
Note that this correctly implements quicksort. By induction on length we show the uniform distribution:
length = 1: trival.
length = 2: a 50/50 comparison has a 50% chance of producing either order.
length = k: Each element has a 50% chance of being in less and a 50% chance of greater. By induction we will get permutations of each of these uniformly. Since we have a uniform chance of putting each element in each of these (and pivot is selected uniformly at random) the combination of the three will produce all purmutations with equal probability.
So if you want to argue the effects of possible implementation details then that is fine but that was my point. These implementation details are key to determining what actaully caused the problem (which as shown above may not be the sort at all). The article never addresses these only says the problem must be there, then after changing multiple things the problem isn't.
FTFA:
I'm just off to RTFWP, but why should that stop me from questioning - wouldn't the posting of an implementation of an algorithm in language-X onto a Wikipedia page constitute "original research", and therefore require to be taken back down. (If it's not original, then it must be a copy of something from somewhere, which then becomes a copyright problem)?
OK, off to RTFWP ... and sure enough, the algorithm implementations are there, and no source is cited. So, surely this is original research and should be struck out (along with all other similar examples).
Birds are not dinosaur descendants;birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"
My god, how many times do we have to go over the same ground?
I don't know what to say here. If you really believe this, I don't think there is much more to discuss, as we are clearly operating on different wavelengths (and I hope you never ever get a job as a computer programmer!). Consider another example:
Your statement is "Using a method to do something unintended does not cause bugs, ...". Do you really think that calling sqrt(-1), given that specification above, is not a bug?!?! If you do think it is a bug, can you explain how this example is different to the one in the Browser selection box, where a function with a specification
is called with a comparitor comp that doesn't satisfy a strict weak ordering?
You might be correct, that the naive quicksort algorithm as presented on Wikipedia would actually produce a proper random shuffle, if given a `coin toss comparitor' as input. But my point still stands: the naive algorithm isn't optimal, because you can do better than choosing random pivots, and also for small arrays quicksort is inefficient anyway. In any production implementation of quicksort, it isn't going to recurse all the way down to length 1 arrays, rather it will stop at length 3, or maybe more, and use some other sort function on the short array.
If you don't choose the pivots to be uniformly distributed, the resulting shuffle is biased. For example, if you choose the middle element to be the pivot, then on an array of length 3, ABC, with B as the pivot element, you get ACB,CAB,BAC,BCA each with probability 0.125, and ABC,CBA with probability 0.25. There is nothing in the specification of QuickSort that says that this is an invalid implementation! Note that the Javascript doesn't even specify QuickSort as the algorithm to begin with. The chances of any browser implementing the Javascript sort function via the exact implementation of QuickSort that you wrote above essentially zero.
IE is #1 if you have javascript disabled. How random!
It has a CD with neary 200 MP3s on it, and for some insaaane reason it is very fond of playing Tatu's "They're not gonna get us"...which I'm not sure how got on the CD in the first place... *mumble mumble*
So I start the car, there it is. I press next a few times, find nothing I like, and hit the Random button again, and guess what. I have heard nothing but Tatu and Madness songs for two fricking weeks to and fro work, and now I'm borderline certifiable.
I can only hope the cruise control / pedal issue offs me before the CD player breaks my remaining will to live.
You do realize there is a difference between code that is dangerous (but technically correct) and code that is actually buggy. As per my quicksort example. Dangerous, but correct.
You are getting intent and results confused. Calling my quicksort routine with an invalid compare function clearly violates the intent of the contract abstractions but functionally produces the correct results. The same may happen with your sqrt(-1) example, it is a bad idea from a modularity standpoint but may be functionally correct.
If you want to talk about if that is the case for the browser selection, well, I am glad you agree that determining if the implementation of the JS quicksort routine causes the bug (or if it is something else) would be a good idea. Unfortunately the author of the article was happy to jump to premature (although not unreasonable) conclusions.
And don't worry about me being a programmer. My code is quite reliable and in fact I do a lot of work on software reliability and correctness.
I don't think that there is such a thing as "dangerous (but technically correct)". Given a specific implementation, it may just happen to do what you want it to, if you supply a specially crafted input that violates the spec. I would call that dangerous, incorrect, but may happen to work on a specific implementation. There is no reason to expect that it will continue to work on the next release of the software.
It would be fair enough, if you wrote your own algorithm, lets call it marron_sort(), that has the spec:
This algorithm serves (at least) two purposes: (1) given a comparitor defining a strict weak ordering, it sorts the array. (2) if the comparitor returns unbiased coin tosses, then the array is permuted by an unbiased random shuffle.
I hope that you can agree that this is a rather weird specification! In any event, this is nothing like the actual specification of Javascript sort(). The fact that there is one theoretical implementation that does what you want when given invalid input doesn't mean that it is in any way reasonable to expect the same behavior for random implementation X. Especially since the 'one theoretical implementation' in this case is non-optimal.
No it doesn't. The QuickSort specifies a postcondition that on algorithm exit, the array is sorted. There is no way that a non-sorted array can be considered 'correct'.
Huh what? Firstly, what does 'modularity' have to do with it? And secondly, can you explain how calling sqrt(-1) 'may be functionally correct' ? What do you expect it do to?
One more time, for the road: the bug was caused by some n00b programmer calling the Javascript sort function with an invalid comparison function. I don't understand why you seem to be trying to blame the implementor of the Javascript sort function, when the sort function is completely correct and obeys the specification! There is probably no browser in existence that would actually implement sort in the way that you apparently think it should be. If you really think that strongly about it, I suggest you talk to the W3C. I don't expect they'll pay any attention to you though.
What? Are you on drugs? Rob wrote a test program, demonstrated that the code written by the MS programmer generated a non-uniform distribution, and verified that a correctly written random shuffle algorithm produces a uniform distribution, using the same test program. Rob also went through the code and identified the specific bug that caused the problem: incorrect use of the sort() function. How can you possibly call that a premature conclusion?
During this thread, I have identified at least two sort fragments that are likely to be similar to code used in real-world implementations of the Javascript sort()
There are lots of dangerous but technically correct programs. In fact almost every program written is ill specified and most likely has methods that are used in ways that they were not initially intended.
As for my example, I agree in a "good" program one would always use it according to the given specifications. Doing anything else is breaking modularity and invites errors when making later changes. If you fail to understand the difference between modularity/abstraction (using the semantics provided by the contracts) and correctness (producing desired results) then I cannot help you.
No what Rob demonstrated was that replacing the call to sort made the error go away. But several things were changed here as well (most notably the random number generation and initial seeding were probably altered substantially). So, while he is right that calling sort to get a purmutation is a bad idea and he is (probably) right that the sort was the problem, this is never actually shown in the article. He just assumes that the bug went away so it must mean that the sort was the problem (but as I showed above this need not be the case).
As for your concern for me, I doubt if my code has appeared on TheDailyWTF, but I do know that I recently had some of my work in the ACM proceedings of "Verification Model Checking and Abstact Interpetation" on modeling the semantics of programs for later verification applications. So, I do know a bit about program correctness and specification. How about you?
If you define correctness as "producing the desired results on one particular version of one particular platform", then you have a recipe for non-robust code that won't survive in the real world.
Damn, I've been wasting my time. Now I know you didn't properly read the article. He includes the test programs he used. They are pretty short, and easy to verify that there is no difference at all in the random number generation, nor in initial seeding (whatever that is supposed to mean - you seed the random generator once at the start of the program [or better still, when the machine is powered on], and never touch it afterwards). What a waste of a couple of hours.
Not worth responding to, just reinforces that you never looked at the source code.
Conference proceedings don't count for much. I can't see any evidence that you know anything about devloping robust programs. Not that I'd expect you to, it is quite a different game to developing verification software anyway (although if the verification software itself needs tp be robust, then we have a problem).
Or if you're really unlucky it could sit there sorting them forever.
No, qsort will always abort, no matter how bad you chose your random comparisons. It may need N^2 comparisons and produce an order that is unexpected and non-uniformly distributed, but otherwise it's a pretty safe bet.
Where is the heartburn about Mac or Ubuntu with all of the stuff they load onto their systems and then prevent one from loading outside stuff??
What the hell are you talking about?
Yes a Mac ships with Safari. But it's only a click away to install Firefox - many do.
Macs hardly ship with anything, that is the appeal... and there is ZERO prevention of loading outside stuff. Same with Ubuntu (though that does ship with a lot more preloaded stuff).
The fact is that Windows is a monopoly, and for the good of the web Europe has decided Microsoft needs to allow clear browser choice. Frankly I don't give a damn which browser users choose, the important thing is that it's not all IE so that web sites have to be developed to a standard, not a browser.
I actually do not support Microsoft being forced to do this, but since they are so forced let's do it right.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
If you define correctness as "producing the desired results on one particular version of one particular platform", then you have a recipe for non-robust code that won't survive in the real world.
I think defining correctness as "producing the desired results is pretty reasonable. Sure correct doesn't mean robust, but they are different things and this discussion has been about correctness.
Damn, I've been wasting my time. Now I know you didn't properly read the article. He includes the test programs he used. ...
Not worth responding to, just reinforces that you never looked at the source code
Yeah, I looked at the code. He replaces the sort says the bug is gone, must be a bug in how the sort is used. Sure seems likely but where is the actual bug. If we are going to spend the time giving a case study of a bug I think showing exactly there the bug is and why it is matters. But I suppose reasonable people can disagree.
The main thing this discussion has shown is the existence of different metrics for program quality, i.e. an error and a bad design, one is wrong the other is undesirable but correct (at least for the current program). Both are bad but in different ways and need to be handled using different techniques.
Conference proceedings don't count for much. I can't see any evidence that you know anything about devloping robust programs. Not that I'd expect you to, it is quite a different game to developing verification software anyway (although if the verification software itself needs tp be robust, then we have a problem).
If that is your opinion. But I should note that IBM, Microsoft, Mozilla, etc. spend a fair bit of money to sponsor and send people to these things. So someone apparently thinks these proceedings count for a fair bit (and companies that care a lot about building robust systems).
I'll leave my part at that.
If they had assigned a random value to each element and sorted by that value the result would have been truly random as the value associated with each element would have been consistent.
As another poster above pointed out, that only works if your "random values" are unique.
-AC
No, from an engineering point of view it isn't reasonable at all. I can see where you are coming from, and that you want to have a definition of correctness that your verification tools can use, so you can prove that some piece of software is or isn't 'correct'. I'm sure that makes for great marketing! But without formal verification against a specification, there is no way to certify that the design is correct against even the smallest change in the system. I'm going to harp on about this because its an important point that you might not be able to see from your ivory tower[*].
Lets take a more traditional engineering example: a bridge. The bridge designer comes up with a set of specifications for each part, and a bunch of people build the parts and deliver them, and some more people put the bridge together. Now, suppose that the person making one particular cable screws it up, and instead of being able to withstand a force of 100,000N, it can only stand a force of 1,000N before breaking. According to the design specification of the bridge, this cable must be able to withstand the full 100,000N when undergoing extreme stresses (high winds, lots of traffic, etc). Now also suppose that by a lucky coincidence, the person making the support structure elsewhere in the bridge did an excellent job and over-engineered their components such that, even under high stresses, the cable is never tested beyond 1,000N, instead the forces are distributed among other parts of the bridge. The result is, the bridge doesn't fall down.
I'm betting at this point that you would say that the bridge is correct. After all, your verification tools will throw some inputs (traffic, weather, etc) at the bridge, covering all of the extremes and corner cases, and it doesn't fall down, so it must be correct!
The only problem is, that there is a good chance a whole bunch of people will die, sooner or later. Eg, suppose that later on when the bridge needs some maintenance on some of the supports, a different contractor is brought in, who looks at the spec, builds the component, and puts it in the bridge. He can do this, while still satisfying the specification, in such a way that suddenly the cable, that previously only had 1,000N force on it, now has the full design specification of 100,000N and the cable promptly snaps. You are still asking "where is the actual bug?". If you cannot see where the bug is, I think there is no hope.
The above example is a very close analogy to the Javascript code. The bridge support structure is the Javascript standard library, and the cable is the Microsoft browser choice web page. The code in the web page doesn't meet the specification, but due to a lucky accident, it happens to work. Umm, except that it doesn't: on none of the browsers tested, does it actually work. The only way it would work is if the Javascript sort function was implemented in the precise way that you seem to advocate, a naive QuickSort that just happens to have the property that if you violate the precondition in a particular way, the algorithm will violate its postcondition in a particular way that coincides with what you actually want. If you want to call that correct then I guess no one can stop you, but when you sell your verification software to customers you should make it very clear how you are defining 'correct'. I rather suspect their definitions will disagree!
This is why, in some schools of thought, it is considered a serious bug if a function does anything sensible with input beyond what is specified in the precondition. It can hide bugs elsewhere in the code, because it opens up the possibility that you accidentally misuse a function, but that misuse is hidden and is likely to surface and bite you when you least expect it. For a simple exa
I agree that correctness and robustness are different things. Robustness is about flexibility of a specification. Non-robust code (and programs) have very tight specifications and are useful only only under a narrow set of circumstances.
Corretness is (in my, and I think fairly common definition) a question of whether the specification is implemented correctly. From an architectural point of view, correctness may be applied to the specification itself, so one can talk about correctness of the specification, but that isn't what I'm talking about here. Correctness of the program is purely about bugs (or hopefully lack of!) in the software implementing the specification.
According to this definition, it is possible to have an incorrect program that still produces correct results. This may happen, for example, if the program calls a standard library function incorrectly (ie, violating a precondition), but the implementation of that function is such that it just happens to work anyway. Such a piece of software may still be useful, no denying that, but I assert that it is still not correct. A formal proof will show that it is incorrect because internal invariants will not be met. In some cases it may be possible to devise a new set of specifications (preconditions and invariants) such that the program is correct. But that means that you need to be able to modify the specifications, and obviously for something like a standard library function you have no control over the external specification.
I just put a better Python implementation in the wikipedia page for fisher-yates shuffle, adapted from the Python standard library. Can I get a code review?
http://en.wikipedia.org/wiki/Fisher–Yates_shuffle
It's exactly what's happened.
MS marketing's Mod Squad is ruining Slashdot.
Sorting requires a self-consistent definition of ordering.
The article implies the root of the problem is because it introduces inconsistency in comparisons. Could this be fixed by generating a random seed, then replacing all calls to Math.random() with Math.random(seed)?
You cannot just throw Math.random() at a problem and stir the pot and expect good results
You can expect good enough results though. This particular application isn't exactly rocket science or cryptography, slightly non-random is good enough for me.
Granted, it's a silly mistake to make. I would imagine they didn't put their best and brightest on this one though. Interns, anyone?
Porquoi?