Canada's 'Random' Immigration Lottery Uses Microsoft Excel, Which Isn't Actually Random (gizmodo.com)
An anonymous reader writes: Last year, Canada introduced a new lottery system used to extend permanent-resident status to the parents and grandparents of Canadian citizens. The process was designed to randomly select applicants in order to make the process fairer than the old first-come, first-served system. There's just one problem: the software used to run the lottery isn't actually random. The Globe and Mail reported the Immigration, Refugees and Citizenship Canada (IRCC) uses Microsoft Excel to run the immigration lottery to select 10,000 people for permanent resident status from a field of about 100,000 applications received each year. Experts warned that the random number generating function in Excel isn't actually random and may put some applicants at a disadvantage.
First, it's best to understand just how the lottery system works. An Access to Information request filed by The Globe and Mail shows that IRCC inputs the application number for every person entering the lottery into Excel, then assigns a random number to each using a variation of the program's RAND command. They then sort the list from smallest to largest based on the random number assigned and take the 10,000 applications with the lowest numbers. The system puts a lot of faith in Excel's random function, which it might not deserve. According to Université de Montréal computer science professor Pierre L'Ecuyer, Excel is "very bad" at generating random numbers because it relies on an outdated generator. He also warned that Excel doesn't pass statistical tests and is less random than it appears, which means some people in the lottery may actually have a lower chance of being selected than others.
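As a rough illustration of the procedure described above, here is a minimal Python sketch. It is not IRCC's spreadsheet: the function name, the application-number format, and the seed are made up for the example, and Python's `random` module uses the Mersenne Twister rather than whatever generator a given Excel version uses.

```python
import random

def run_lottery(application_numbers, n_winners, rng=None):
    """Assign each application a random key, sort ascending, keep the lowest n_winners."""
    rng = rng or random.Random()
    # One pseudorandom number per application, like a RAND() column in a spreadsheet.
    keyed = [(rng.random(), app) for app in application_numbers]
    # Sort from smallest to largest random key.
    keyed.sort()
    # The applications with the lowest keys are the winners.
    return [app for _, app in keyed[:n_winners]]

# Hypothetical application numbers; the seed is fixed only so the run is repeatable.
applications = [f"APP-{i:06d}" for i in range(100_000)]
winners = run_lottery(applications, 10_000, random.Random(2018))
print(len(winners), "applications selected")
```

Note that the key never depends on the application number itself, which is the property several commenters below argue is the one that matters.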
Why not just accept all the immigrants who show up? That's what they tell the US to do, right?
It doesn't matter whether the "RND" function is ideally random in a mathematical sense. It only matters whether the "random" number generated is independent of the identities of the people applying to be admitted.
The initial conditions for the Universe were set at the Big Bang, and everything that follows is deterministic. Some people will tell you that quantum uncertainties introduce randomness, but my theory discounts that.
IBM SPSS. https://www.ibm.com/analytics/...
Well, I've read before about how the RAND() function of Excel 2003 and 2007 wasn't good enough for scientific purposes. But I seriously wonder what the bias is and how it would affect a draw of 10,000 numbers. I don't think candidate #1455 has 5 times more chance of being picked than candidate #976. I guess it's more on the "0.0001 times more" scale.
Furthermore, how is the list order selected? Because if the order of the list is "kinda" random, it adds to the randomness of the process. In other words, if the list order is a "little" random and the RAND is a "little" random, then the whole thing is "better" random.
Elok
Unless I'm missing something, it doesn't matter how random the PRNG is if the selection isn't influenced by the other relevant data. Everyone got their random number generated by the same shitty PRNG, so it's a fair and equitable system.
We build this generators for MSFT Excel. Too much entropy is gathered and every numbar is equal. Multply by a special constant and producing a perfect and secure random numbar.
Kerpal
The story is about an issue that is completely irrelevant.
It doesn't matter whether the "RND" function is ideally random in a mathematical sense. It only matters whether the "random" number generated is independent of the identities of the people applying to be admitted.
It isn't even that. Just because the distribution of random numbers isn't random it doesn't mean the sort order based on that isn't random. For example, suppose my random number generator only put out numbers divisible by 1/(2^16) which is what a finite precision binary based system is going to do. This distribution isn't random because it's zero density at many possible floating point values. Yet the sort order might be perfectly random.
Some drink at the fountain of knowledge. Others just gargle.
Is it physically painful being so stupid? I sure hope so.
Quoting the original article: http://www.pages.drexel.edu/~bdm25/excel2007.pdf
The random number generator has always been inadequate. With Excel 2003, Microsoft attempted to implement the Wichmann–Hill generator and failed to implement it correctly. The fixed version appears in Excel 2007 but this fix was done incorrectly. Microsoft has twice failed to implement correctly the dozen lines of code that constitute the Wichmann–Hill generator; this is something that any undergraduate computer science major should be able to do. The Excel random number generator does not fulfill the basic requirements for a random number generator to be used for scientific purposes:
it is not known to pass standard randomness tests, e.g., L’Ecuyer and Simard’s (2007) CRUSH tests (these supersede Marsaglia’s (1996) DIEHARD tests—see Altman et al. (2004) for a comparison);
it is not known to produce numbers that are approximately independent in a moderate number of dimensions;
it has an unknown period length; and
it is not reproducible.
"Randomly" sort the list 10 times and everything will end up "randomer".
Does it help Republican children deal with their INCEL problems to lie constantly in support of a treasonous moron like Donald Jumpsuit Drumpf?
Anyone who attempts to generate random numbers by deterministic means is, of course, living in a state of sin. -- John von Neumann
"which means some people in the lottery may actually have a lower^W higher chance of being selected than others."
the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff
As long as no one knows what the biases are, there is not an actual issue. Probability, at least for these purposes, is epistemological.
That said, they should not use proprietary software. Public money, verifiability, freedom and so on.
What did they do back when Excel could handle only ~65K rows?
Regardless of Excel's poor random function, the way this is being described as being done, it sounds pretty legit and random enough. There's no bias on assigning the random number to each name, and the name itself isn't being used to generate the random number. So this should be fine.
Just because it doesn't meet some math/computer geek's standards of proper random number generation doesn't mean it's useless for this application. I say thumbs up. The RNG being perfect isn't really necessary.
"Random" in this application means the numbers assigned to each applicant are generated by the RAND function, then chosen sequentially from the resulting list.
If instead every applicant was assigned a sequential number, then the RAND function was used to pick from that list, then it is possible that certain sequential numbers would have a less equal chance of being selected, but not under the reverse.
If the RAND function assigns multiple users the same 'random' number, so what? All duplicates get selected at the same time.
Is the argument that Canada needs to print out 100,000 tickets, drop them in a (large) bucket, and pick 10,000 'winners', shuffling all remaining tickets after each 'pull'?
I'm shocked the summary didn't include a reference to President Trump - how did that slip through?
Ken
Sure, RAND() might only be pseudorandom but they're putting the numbers in the spreadsheet in a random order. It doesn't sound like they're sorting the application numbers before assigning them a pseudorandom number.
Given that your position in the spreadsheet is essentially random (it appears to be based on application number), the Excel random number generator not being perfectly random is not really a problem.
That said, why would you use Excel for this?
"When you are dead, you do not know you are dead. It's only painful & difficult for others. The same applies when you are stupid." --someone
In any case, their sons and daughters shouldn't be forced to reside in Canada. They are free to reside anywhere in the world. It is their parents who have custody of them.
in Excel a random number of times as Excel calculates a new random number on each sort. 15-20 sorts should do the trick.
That's effectively the position of Democrats.
Didn't seem to be Obama's position; he still holds the record for most immigrants deported in any presidential administration, 2.7 million: http://immigrationimpact.com/2017/01/04/deportation-numbers-2016/
(although Donald may be working on breaking that record)
TL;DR: The selection process is random enough for its purpose, the type of attack proposed would already require access to the data which could be manipulated anyway, and this story is bunk. When someone says that something is "random" what they really mean is that, given a finite number of possible valid values "N", that every attempt to predict that value will result in the correct value only 1/N times over an essentially infinite period of time.
Nominally, random numbers are generated through a true random seed that comes from sources such as radioactive decay, cosmic background radiation, ring oscillator or other effectively chaotic process. This is fed into a pseudorandom number generator which is a giant shift register with specified taps to generate what are nominally random numbers.
Are the implementations screwed up? Sure they are. Can they be influenced deterministically? Of course. Can this be done usefully? Not really given the value of the targets involved and the amount of infiltration required to get there. I emphasize this last point because these professors are indicating that someone could influence the random number generator. Well guess what guys? You would need access to the computer running the spreadsheet anyway, which means you could already do whatever you want to rearrange the results. Why would they waste their time influencing the RNG deterministically?
This story is muckraking bunk by people who again don't really want people to understand security as much as they want to stamp a name for themselves. I'd be much more concerned that this is being handled in a spreadsheet rather than in an air-gapped database infrastructure.
Just another example of vapidly pseudotechnical clickbait foisted on us by this shell of technonews. This isn't cryptography; nobody is really trying to reverse-engineer anything associated with this, least of all the arthritic boomers. As others have said, nobody gains any real advantage in the selection process just because the chance of rolling a 23 is higher than that of rolling a 4. They all were given the same opportunity for either of those ranks. This is what happens when someone with a little information acts like an expert and writes a crappy article which will cause even stupider people to start hooting like a troop of baboons. News flash, morons: nothing on your computer is truly "random", and it all requires seed numbers unless you use a true random number generator which relies on some form of sensor noise. Excel may be particularly poor at this because Microsoft sucks, but it still doesn't make the selection process unfair.
Unless they let entrants pick where they are in the list, it doesn't matter if the random number generator is not completely fair.
Maybe it's biased such that entrants 50,000 - 51,000 are much more likely to end up sorted to the top, but unless the entrants can choose where they are in the list, I don't see why that really matters. Sure, someone who controls the list could move their friends to that range to make them more likely to end up at the top, but they could also just give their friends whatever random numbers end up at the top.
Let's go play "casino roulette" ... the bank asymptotically always wins thanks to the zero (which comes up with a probability of 1/37) ...
Is it not worse than this lottery for an average player?
I built a system once to select applicants for various voluntary programs. I didn't rely on any random function at all, but built a system that gave every participant an equal, fair chance to be selected.
My system would calculate the MD5 hash of the individual's name concatenated with the name of the program. This is a completely deterministic process - the result for any individual will always be the same. I then sorted alphabetically on the resulting hash (technically, I discarded all but the last 10 characters of the hash), and selected the first N individuals.
This system is completely biased - once the program name is selected, certain people (based on their names) would be almost guaranteed to be at the top of the list, and others would be almost guaranteed to be at the bottom. In the classic Publishers Clearing House style, "You may already be a winner!" before you even choose to sign up. Every program (based on its name) is very biased towards some people and not others.
But that doesn't matter, because the bias is not DIRECTED. It's not that people with "A" names are all more likely to be selected. Or individuals that sign up first. Or people who apply close to noon on a Tuesday. It's biased towards a very arbitrary set of people, and that set of people IS statistically random.
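The hash-and-sort scheme described in this comment could look roughly like the sketch below. The details are assumptions: the comment doesn't say how the name and program were joined (plain concatenation is used here), and the function name, people, and program name are invented for the example.

```python
import hashlib

def select_by_hash(names, program, n):
    """Rank people by the tail of MD5(name + program) and keep the first n."""
    def sort_key(name):
        # Assumed: name and program name are simply concatenated, UTF-8 encoded.
        digest = hashlib.md5((name + program).encode("utf-8")).hexdigest()
        # Keep only the last 10 hex characters, as the comment describes.
        return digest[-10:]
    # Alphabetical sort on the truncated hash, then take the first n.
    return sorted(names, key=sort_key)[:n]

# Hypothetical participants and program name.
people = ["Alice Adams", "Bob Brown", "Carol Clark", "Dan Davis", "Eve Evans"]
first = select_by_hash(people, "Summer Volunteers", 2)
# Same people in a different input order: the selection does not change.
again = select_by_hash(list(reversed(people)), "Summer Volunteers", 2)
print(first == again)
```

Because nothing here is random, anyone can re-run the selection and check it, but anyone who knows the program name in advance can also work out who will win.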
Actually, I've seen exactly the opposite analysis.
Democratic politicians want more legal immigration but less illegal immigration. Legal immigrants vote and pay taxes, illegals don't.
Republican politicians want less legal immigration, but more illegal immigration. Illegal immigration depresses wages, meaning more profits for corporations. (Even if the corporations don't hire illegals, the illegals have a downward pressure on all unskilled-labor wages).
Gizmodo just discovered what a PRNG is
Long ago I watched a video of a guy selling gaming dice. He was an entertaining fellow, talking about how his dice were fairer because he hadn't destroyed the vertices. Most manufacturers erode their dice quite heavily in order to give them round edges and eliminate the blemish where they were clipped from the mold.
I grabbed a micrometer, and sure enough, my dice were measurably oblong. Thing is, I tried to determine the length of time I'd need to prove that his dice were fairer via statistics and actual rolling. I realized I'd have to pass the test along to my heirs. Random number generators are like pizza. When they're bad, they're usually still pretty good.
I still bought some of his dice though. They're quite pretty since they have their original facets.
Oh, and if you just need fair d6's you can readily get ahold of casino dice.
It only matters whether the "random" number generated is independent of the identities of the people applying to be admitted.
Indeed, if it was a true random number it would end up discriminating against unlucky people. Something the poor sod having to use an Excel spreadsheet of this magnitude probably understands all too well now.
What's wrong with first come, first serve?
Coder's Stone: The programming language quick ref for iPad
I was helping two elderly gentlemen a few years ago try to fix the problem they were having keeping bowling score averages for their group. They had a very simple spreadsheet, which had been created in Microsoft Works spreadsheet (part of the low-end, no-frills application suite Microsoft used to provide with PCs). If someone bowled 100 the first game, 100 the second game, and did not bowl the third game, Works calculated the average as 66.67, not 100 which was expected!
After searching online, it turns out that Microsoft Works spreadsheet's AVERAGE function treated empty cells as if they had a zero inserted! Completely brain-dead. I've never seen any other spreadsheet program treat empty cells as zeros.
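For what it's worth, the arithmetic behind the two answers is easy to reproduce. This is a small Python sketch of the two behaviours, not Works' actual code; `None` stands in for an empty cell and the function names are made up.

```python
def average_skipping_blanks(scores):
    """Average only the games actually bowled (what the bowlers expected)."""
    played = [s for s in scores if s is not None]
    return sum(played) / len(played) if played else None

def average_blanks_as_zero(scores):
    """Average with empty cells counted as 0 (the behaviour described above)."""
    if not scores:
        return None
    return sum(0 if s is None else s for s in scores) / len(scores)

games = [100, 100, None]  # third game not bowled
print(average_skipping_blanks(games))           # 100.0
print(round(average_blanks_as_zero(games), 2))  # 66.67
```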
We're not doing cryptography here. This is just assigning an arbitrary numerical value to a row and then sorting based on those values. Unless they can show that the sorting comes out in some non-random, predictable order, this is a non-issue.
I think you're confusing libertarians with anarchists.
That wouldn't be surprising. Libertarians often confuse libertarians with anarchists.
by Anonymous Coward on Monday June 11, 2018 @05:33PM
Divide the number of people by the number of slots. In this case it's 10 = 100K/10K.
Put 10 numbers in a hat and draw out one, say 7. Admit applicants 7, 17, 27, 37, ...
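This is what statisticians call systematic sampling. A minimal Python sketch, assuming applicants are numbered from 1 and the total divides evenly by the number of slots (the function name and seed are invented for the example):

```python
import random

def systematic_sample(applicants, n_slots, rng=None):
    """Draw one starting number, then admit every step-th applicant after it."""
    rng = rng or random.Random()
    step = len(applicants) // n_slots   # 100,000 / 10,000 = 10
    draw = rng.randint(1, step)         # the number pulled from the hat, 1..step
    # Applicant numbers start at 1, list indices at 0.
    return applicants[draw - 1::step][:n_slots]

applicants = list(range(1, 100_001))
picked = systematic_sample(applicants, 10_000, random.Random(1))
print(len(picked), "applicants admitted, starting with", picked[0])
```

Each applicant still has a 1-in-10 chance, but there are only 10 possible outcomes in total, and two applicants with adjacent numbers can never both be admitted.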
Who cares if more people named Aaron get picked than Zachary? I don't know either, so the process is random!
So, Canada just does not want unlucky immigrants...
That said, I'm shocked, shocked to find out, the most adorable country in the world accepts only about 10% of the immigrants seeking to enter (legally)...
In Soviet Washington the swamp drains you.
The bigger problem is auditability. Who's to say the guy running the lottery isn't rolling and rerolling until some friend wins?
Too many white people are winning. We need to have more (((diversity))).
Why wouldn't they build the wall to make a deal on DACA? Trump said he was willing to deal, but they just gave up.
I've read academic articles that essentially claimed, there is no such thing as a random number. Even hardware based noise generators contained the seeds of order, or biases of various sorts.
If you want to get pedantic, get a mathematician involved!
Oh wait, I also read an article that claimed the entire basis of mathematical knowledge was shaky, and nothing could be "proven for certain". And no, it was not by or about philosophy, we are talking hard math here.
Really, Really Random Number Generator
Why a Wall Full of Lava Lamps Is a Terrific Random Number Generator
Fixed it for ya
I don't see a lot of consistent application of principles from them. I've yet to meet one that turned down free medical care when they needed it. I've known a lot of libertarians who go to the VA long after they've left the military. I know a lot that work in pseudo-private-sector jobs like the defense industry. My personal favorite is a libertarian friend of mine who gets it from his dad, but has severe health problems. He's come up with some of the craziest justifications to square his libertarian ideas with the fact that he needs medicine to live but can't afford to buy it himself (and wouldn't be able to even in a perfect libertarian world since his illness is bad enough he can't work).
Even Ayn Rand took social security in her old age. Though to her credit she had to be convinced to take it rather than die in the street. Her writings weren't profitable until the Republicans decided they needed an intellectual
My experience with Libertarians is they're folks who never grew out of that phase in your teenage life where you really, really hated being told what to do. You know the one. It's when you're just starting to realize how capable you are, when you're at your peak of learning capacity and you're figuring things out faster than the adults. And you really are (teenage brains work that way).
What I find especially maddening is the libertarians who rail against coastal elites and SJW and are perfectly OK with billionaires having unlimited wealth because, hey, they earned it by virtue of having it. Never mind the fact that money is power and you can't be free in a world with that much wealth inequality. After all, you're not free if somebody controls your access to food, shelter, healthcare, education and transportation (the latter needed to access the former). You're one week's food, one winter's cold or one pill away from slavery. True freedom only arrives when everybody has their needs cared for not because they can threaten or cajole people into getting it but because they're humans, and humans have a right to those things.
Hi! I make Firefox Plug-ins. Check 'em out @ https://addons.mozilla.org/en-US/firefox/addon/youtube-mp3-podcaster/
The story is about an issue that is completely irrelevant.
It doesn't matter whether the "RND" function is ideally random in a mathematical sense. It only matters whether the "random" number generated is independent of the identities of the people applying to be admitted.
With no intention of diminishing the importance of your statement; that is blindingly obvious. There are two other excellent points raised by others in the comments here: that imperfect randomness does not make the process manipulable by immigration candidates and that sort order of assigned imperfect random numbers can itself be perfectly random.
The story is mis-reported as a scandal; there is in fact no scandal whatsoever. So who made up the fake news? Tom Cardoso is the author of the original story at the Globe and Mail which the Gizmodo article linked in the Slashdot summary cites. Cardoso quotes Université de Montréal computer-science professor Pierre L’Ecuyer as saying “Anything would be better” [than the Excel random number generator] but, crucially, Cardoso omits the context of that comment. Was L’Ecuyer referring to its suitability for this particular method and application, or was he commenting on its suitability for general use, including, for example, cryptography? In neither the Gizmodo nor Globe and Mail articles can I find any mention of an expert unambiguously expressing judgment on the immigration randomization method specifically. A close reading suggests that the criticism originates with the journalist, and that he deceptively implies it to be the opinion of experts.
Some enterprising citizen journalist should contact the cited experts and ask them: 1) Did their comments refer to general usage of the Excel random number generator or specifically to the immigration randomization methodology? 2) What is their opinion of the immigration randomization methodology? 3) Do they agree with the points made here about it being a nothingburger? 4) Have they read the Globe and Mail article, and if so, do they believe that their comments were wrongly contextualized?
If anyone does that, it would be nice to see a followup article here on Slashdot.
Ceci n'est pas une signature.
I was inordinately smug in the early 1980s, when I liked computers and my randomizer seed was the time the user pressed return, in milliseconds.
The article and embedded links talk about how bad Excel's algorithm is, but never state the Excel version that Canada's IRCC uses. In this case it matters because recent versions of Excel are OK.
Excel 2010 and later use the Mersenne Twister for the PRNG. This is good.
https://support.office.com/en-...
Excel versions before Excel 2010 use an implementation of the Wichmann-Hill generator that provides not-so-good pseudorandom numbers.
https://support.microsoft.com/...
There is no telling what it's going to do.
You may have heard of the basic income experiment in Finland, for which they chose 2000 individuals from a target group of interest (basically, unemployed people in a certain age bracket). They published the source code used to pick this random sample, and I took a look at it. Although it used a much more professional tool than Excel (I've already forgotten the details), I had to conclude that if the random number generator was seeded at all (probably with the time of day at one-second resolution), it probably had less than twenty bits of meaningful randomness in it. Proper randomness would have required tens of thousands of bits, and probably a better-designed selection routine, considering the legal consequences of this.
It is very unlikely any trouble would actually arise from this case (either for the validity of the experiment, or through intentional bias against groups or individuals), but it left me suspicious of the whole deal. There should be more trustworthy methods for these kinds of setups!
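The bit counts in this comment can be sanity-checked with a few lines of Python. This is a back-of-the-envelope sketch under stated assumptions: the seed is taken to be the second of the day, and since the size of the Finnish target group isn't given here, the subset count uses the Canadian lottery's figures from the summary (10,000 winners out of 100,000). The helper name `log2_comb` is made up.

```python
import math

def log2_comb(n, k):
    """log2 of C(n, k), computed with log-gamma to avoid enormous integers."""
    ln_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return ln_comb / math.log(2)

# A seed taken from the time of day at one-second resolution has at most
# 86,400 distinct values.
seed_bits = math.log2(24 * 60 * 60)

# Bits needed for every possible set of 10,000 winners out of 100,000
# to be reachable.
subset_bits = log2_comb(100_000, 10_000)

print(f"seed: about {seed_bits:.1f} bits")
print(f"all possible winner sets: about {subset_bits:,.0f} bits")
```

The seed comes out between 16 and 17 bits, in line with the "less than twenty bits" estimate, while covering every possible winner set takes tens of thousands of bits. Any generator seeded that way can only ever produce a vanishingly small fraction of the possible outcomes, which is a separate question from whether the outcomes it does produce are biased against anyone.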
use random.org
you're welcome.
Is it? Can you quote a single Democrat saying this? Ever?
Actions speak louder than words. They oppose all attempts to secure the border, enforce immigration law, or deport those here illegally.
Not even close. In fact the Obama administration deported 2.5 million illegal aliens, not only more than any other presidential administration, but more than all the other 20th century presidential administrations put together. They called him the "deporter in chief".
"Actions speak louder than words."
https://www.wusa9.com/article/news/local/verify-did-obama-deport-more-people-than-any-other-president/408785995
https://www.npr.org/2017/01/20/510799842/obama-leaves-office-as-deporter-in-chief
https://splinternews.com/sorry-obamas-still-deporter-in-chief-1821625282
https://abcnews.go.com/Politics/obamas-deportation-policy-numbers/story?id=41715661
https://www.migrationpolicy.org/article/obama-record-deportations-deporter-chief-or-not
Democrats generally want more ILLEGAL aliens they can later TURN INTO voting residents who are dependent on "the system" through never-ending amnesty programs.
That's silly. There aren't any "never ending" amnesty programs; there are no amnesty programs whatsoever that turn illegal aliens into voting residents, period. These don't exist.
The last time there was an amnesty program was under President Reagan, who provided amnesty for 3 million illegal immigrants. That was back in 1987, over thirty years ago. (that was the Immigration Reform and Control Act of 1986, followed by Reagan's executive action to give legal status to more illegals not covered by the Immigration act the year following that.)
I used to work at a gaming company. For years before I arrived, they used Microsoft Access to pick locations to drop winning prize packages. This was for games like cereal boxes with a winning game piece glued inside. It was just basic VBA - seed random number generator with date/time, then pick a "random" row from a table of zip codes. Apparently it had held up to audit (full disclosure, this was circa 2000).
Once I was there, for a new game project, my proposal was to use a raffle drum (which we did actually possess), purchase true random number generation equipment, or disclose that computer selection would be "pseudo-random". I literally got screamed at by a vice president for that. They wanted a shortcut to glory: quick, cheap, fast, and easy. Thankfully I got backing from our tech-savvy CIO and my proposal went forward as-is.
Never got an apology from the VP, though. Less than two years later, they were out of business. The reason?
Most of their clients fired them after a massive scandal caused by taking shortcuts on security and procedural oversight.
I first heard "Good Enough for Government Work" 45 years ago and it was old then. Why the sudden umbrage over this obvious irony?
BT
Tracy Johnson
Old fashioned text games hosted below:
http://empire.openmpe.com/
BT
excel is being used for everything you can imagine, it's crazy.
it's the hammer that makes every problem look like a nail.
On a long enough timeline, the survival rate for everyone drops to zero.
1) OMG, excel, rly?
2) Well at least it is a newer version of excel if it can handle 100,000 records...
3) Why not at the very least Access, I mean at least it is a DB of sorts...
4) OK Summary and Slashdotters, don't get in a twist about how mathematically accurate the RND function is and how truly random it is. They are not using it in a statistically relevant way to derive scientific samples or something. They are simply using it to select a bunch of people in an unbiased way not based on immigration criteria. You could do the same thing all sorts of different ways, but this was probably just the easiest thing they could come up with. I have no doubt there are all sorts of lottery-type selections that happen on all sorts of things that are likely way less random than even using an Excel function. The summary says that using the RND function "MAY" cause some people to have a lower chance, but doesn't really elaborate on exactly how that might occur, nor does the article shed any light on it either. It *MAY* work just fine. You could also use simple changes in process, say by assigning a RND number to each record, sorting by that number, assigning another RND number, then re-sorting by that number, to try to eliminate (or at least mitigate) any possible statistical relationship between number creation and order and time etc... Anyway, for the primary function of making unbiased selections it is probably a perfectly fine method to use, even if the random calculation isn't as perfect as some other methods.
5) Lastly, while the article is silent on the actual method used other than speculation, I would hope that losing applicants might get a weighted advantage in the next year's lottery, say a duplicate record for each loss, for example... Anyway, just a thought.