As counterintuitive as it may sound, it is possible to eliminate cheating in poker through cryptographic methods. Protocols designed for that purpose are called mental poker protocols. The key idea is to encrypt the `deck' and give it to someone else to shuffle it, then later reveal the key used so that people can check consistency of the deck. The full protocols are of course a bit more complicated than this.
In fact, these ideas are not limited to poker; a useful primitive is a cryptographic coin flipping protocol, from which various shuffling-related algorithms can be constructed.
Look up Joux's construction for breaking concatenated hashes. Concatenated hashes are only as strong as the strongest among them, whereas you would expect that the concatenation of an n-bit hash with an m-bit hash would provide (n+m)-bit security.
Although I'm not sure how these shortcut collision attacks fit in Joux's construction -- my recollection of it was that it was useful for collision search using the birthday paradox.
That may be a SuperPi world record, but definitely not the overall record: Steve Pagliarulo's QPi can compute 1 million digits of Pi in 6.68 seconds in a Pentium 4 1.6 GHz box. You read that right.
BTW, the same computer takes 189 seconds to compute 2^20 (~1 million) digits using SuperPi. Among the community of Pi-calculating programmers, it's well known that SuperPi is terribly slow. I don't know why overclockers still hang on to it when most programs out there for calculating Pi are faster than it.
I've been using computers for 15 or 16 years now, and been a QWERTY touch typist for at least 10. Although I never had a problem with RSI (I think guitar playing might be related to that), a couple of years ago I decided to preemptively switch to Dvorak like you, and I also appreciated the challenge in a layout switch. I was aware of the fact that I'd be typing in QWERTY everywhere except on my computer at home.
The transition period is the worst. During a certain phase of your training, you can't type quickly in either layout, so be prepared to lose some productivity -- in fact I suggest that you do this while on vacation, or at least be sure that you're not going through a stressful period in your life, or you'll give up before training is complete.
After a while I started to get up to speed with Dvorak but not QWERTY. I believe this is mostly due to the fact that I was typing in Dvorak most of the time -- it seems one actually needs some retraining to regain their old speed in QWERTY, and this is what happened.
Today I can type as fluently in Dvorak as I can in QWERTY, although most of my typing is done in Dvorak, of course. I'm periodically forced to use QWERTY, which ensures I don't forget about it. Now I can't comment on what happens if you use Dvorak all the time and QWERTY very seldom (say once a month for a few minutes), whether you lose your QWERTY skills or not. However I'd expect that the skills are indeed lost if you don't practice, so you might consider periodically switching to QWERTY if you've gone too long without using it.
A warning though: although I can easily switch from one layout to another, it's a bit more problematic if you have to do it frequently. For instance, sometimes I'm logged to a Windows box via RDP, and the RDP client forces the QWERTY layout despite the system-wide default for Dvorak. In these situations I have the expected problems (start typing on a new window and realize that you're still using the layout of the previous window), but in addition I might get confused midsentence or something like that. Seems the brain can't handle constant switchs.
Finally, something I wish I had been told before switching: the hardest part of switching are keyboard shortcuts. The way your mind processes them is not to think of the key Control plus the letter C, say, but rather something like the key at the corner of the keyboard plus the key in the bottom row that is pressed by the middle finger. Getting used to keyboard shortcuts in the new layout was probably the hardest part of the switch for me.
This is almost unbelievable. Yesterday I was still arguing with a friend that this was impossible, the world would end before Apple switched to Intel.
Although, I like the news. I've been an Intel zealot for the past few years, but got tired of Linux (hadn't used Windows either for a while) so I decided to test new waters with the Mac. It's great that I can now have my favorite processor with my new favorite OS (:
This is the best news of the year, for me at least. No longer will I (and many other developers) need to compromise when writing multiplatform C++ code -- Qt is just the natural choice for this task, and I'm very glad they decided to release Qt again under the GPL.
Again, kudos to Trolltech -- great companies like them are pretty much the exception these days.
However, it is sad that they didn't publish full details of the cipher. This goes against full disclosure principles.
I can already hear screams of `what do you want the cipher for? Are you going to steal cars and get free gas?' No. But using this excuse, researchers can prevent me and others from implementing a faster attack, or even finding an attack of smaller complexity -- this is a Feistel cipher, so it shares some structure with DES and thus some similar attacks (linear, differential cryptanalysis) might apply.
They're basically monopolizing their right to do research on this device. Sure, it's their call, particularly as they put a lot of work into it, but it's not exactly following established principles of academia.
And for God's sake, don't try to disguise this information-hiding attitude as a theft prevention device -- the amount of published details is just enough for a blackhat with a modest amount of resources to produce another working implementation, but probably no one else is going to bother. Cars will be stolen anyway, but researchers will be unable to do their job.
I've seen the privacy objections on other posts, and they're right, but everyone's overlooking a very interesting possibility -- setting up a network of trusted friends/relatives/etc. If they're really trusted
I believe if this network is going to survive, it'll be by allowing the creation of such trusted communities.
Re:Can't see how Verisign could win.. (article)
on
The Race Is On For .net
·
· Score: 2, Insightful
So Denic isn't messing about and while ICANN would love nothing more than VeriSign to lose the.net registry, it would be equally delighted to see Denic win it. Why? Because Denic is the most powerful registry outside of ICANN control.
So it appears that The Registrar thinks that DENIC eG will win the bid. This is especially apparent when contrasted with their earlier snippet about Verisign's bid:
That was just an unfortunate choice of words by The Register. Reading further one sees that Denic is aggressively opposed to ICANN, so what the article author meant was that ICANN would hate to see either Verisign or Denic winning the bid.
AMD has declared dominance in the gaming and server microprocessor market
AMD may declare what they want, but the numbers speak for themselves. I strongly doubt anyone can provide numbers showing that AMD is ahead of Intel in the server market (though I may grudgingly concede the gaming market).
Regardless, I can assure you that it'd be just as fast if about 20% not 100% of the code were written in C. I'm not familiar with emulators, so maybe those 20% are actually 30 or 40%, but never 100%. I assert this as a speed freak and optimized assembly coder (did a couple of cores for distributed.net). It's just a complete waste of time to write, say, GUI code and file handling in assembly.
Actually, I'd go so far as to hypothesize that ZSNES would be faster if it were written in C/C++ with careful assembly optimization only where needed: the higher productivity associated with a high-level language would mean more time to optimize the parts of the code where speed really matters.
I was always partial to SNES9X for some reason (perhaps it's the fact that they don't waste their time coding everything in assembly, as nobody should), and it's also open source. Whether it is GPL'd or not is just flamewar fodder -- most certainly the submitter's intention.
Re:The entire 1 gigabyte size issue....
on
The Webmail Wars
·
· Score: 2, Interesting
That has to be one of the best ideas I've read on Slashdot in a long time. Congratulations.
By the way, indexing email attachments is very simple, just do it like a P2P network would: compute a hash on the attachment, store it along with the attachment's size and check for matches.
Someone might complain about the possibility of collisions under this scheme. Now if a secure hash function were used (not MD5 as it has been broken) then the system would be, for all practical purposes, shielded from collisions: even a smallish 128-bit hash would require approx. 2^64 different attachments to be present in the system, before the probability of finding a collision by the birthday paradox would be non-negligible.
This is an unthinkable amount of messages -- even if all 6 billion people on earth were to send 10 messages a day, each with different attachments, it would take almost a million years before Google would need to worry about collisions. But of course they could check the size of attachments before declaring a match, making it even harder to find collisions (as if it wasn't hard enough). The statistical distribution of message attachment sizes is hard to predict, but it would easily add another few orders of magnitude to the amount of messages required to produce a collision.
The GMP library uses a probabilistic algorithm as documented here. This is not as bad as it sounds. For instance, Dan Bernstein suggests in this paper a probabilistic procedure which consists of some trial factoring, a strong probable prime test and a Lucas test. Quoting him:
There is no known example of a composite number p that slips past all of these tests; I'm confident that nobody will ever find one by accident.
Of course, if you absolutely have to be sure that you're dealing with a prime, you could use a probabilistic algorithm to sieve for probable primes, then apply the n-1 test to obtain a primality certificate for your probable prime (which then becomes a certified prime with 0% probability of being composite). Of course this would be a bit costly, but for small integers the n-1 test is hard to beat -- ECPP and APR-CL have just too much overhead at this range. But I'm pretty sure your application, whatever it is, can live with the small probability that a PrP is composite.
I'd venture to say I know what I'm talking about. One can't publish a paper about large-scale distributed primality proofs without knowing these basics. Here's a sample of my work. Now don't get me wrong, I'm not trying to sound like a smart-ass, but I'm not going to let someone question my knowledge in this field (which I have studied for years) and remain quiet.
Now it seems people here are not familiar with the terminology associated with Mersennes, so I'll give a crash course.
A Mersenne number M_n is of the form 2^n - 1. If 2^n - 1 is prime, then M_n is a Mersenne prime. If 2^n - 1 is composite, then M_n is a Mersenne composite. The integer n in this expression is called the exponent. When I talk about `exponents of Mersenne composites', I'm refering to some prime n for which M_n = 2^n - 1 is composite.
Now if 2^n - 1 is prime, then n must be prime, but the converse is not always true, the smallest counterexample being 2^11 - 1 = 2047 = 23*89. Turns out that for prime exponents n = 24,036,583, only in 41 cases is 2^n - 1 actually prime. A list of these cases can be found here. For the other 1,509,222 prime values of n = 24,036,583, 2^n - 1 is composite. This translates to a compositeness ratio of 99.9973%.
If I were wrong, then most of the Mersenne numbers would be composite.
Do you have reading comprehension problems? Or is it just logic that escapes you? If only 41 primes up to 24,036,583 correspond to exponents of Mersenne primes, then the remaining 1.5 million or so primes up to 24,036,583 correspond to exponents of, you guessed it, Mersenne composites. I'd say that a compositeness ratio of 99.997% would indicate that `most' Mersenne numbers are composite.
You have suggested that most of the large prime numbers are not Mersenne numbers, which is an unrelated concept
...and I'm going to list a few improvements here that I think the author missed.
In one of the early programs, the author had the idea of skipping all even numbers, which are of course divisible by 2. In the next program he found out how he could skip numbers divisible by 3. One can make a systematic method out of that -- in the literature it's known as adding a wheel to the sieve of Erathostenes. Thing is, he implemented the sieve of Erathostenes foregoing this improvement he had implemented earlier. This leads to a big improvement: when using a wheel with the first four primes 2,3,5,7, there are only 48 residue classes out of 210 = 2*3*5*7 possible that are not divisible by any of these four primes. In practice this means that, for each 210 numbers you'd consider, the sieve would only need be applied in 48 of them. That's a gain of 4.375. One can make a gain of nearly 5 using a wheel of the first five primes, but it begins to get complicated.
Another improvement would be to make the program cache-friendly. He sort of did this, down one level of the memory hierarchy, by paging the data to disk -- now he only needed to realize that, just as memory is much faster than disk, cache memory is faster than main memory. I've developed a sieve employing this optimization and it pays off indeed. This is considered a basic technique in the realm of high-performance computing. (The NFSNET, see my sig, uses a sieve including this and other optimizations, for instance).
I might get flamed for this, but he should consider assembly programming. There aren't many applications where assembly will pay off so handsomely as this one, and the `core' of the sieve is very small, so only a few lines of assembly are needed. Vector integer instructions, such as MMX and the integer instructions of SSE2, can be employed to very good advantage here. My sieve program also included this, and it gives a real nice boost -- and only about 25 lines of assembly were used, the rest was C.
Lastly there's the issue of representing the output. He stored the primes on disk by writing their values. A little compression can be realized in this representation: for instance he's printing out the primes in decimal, while hexadecimal would be more space-efficient. And instead of printing out the ASCII characters from 0 to 9, A to F, he could use a binary representation and read them through an hex dump -- that saves half the space, as a byte is used to store two hexadecimal digits instead of one.
A perfectly good representation that could save a lot of space (8 times, to be precise) would be a bitmap, which he used on some of his programs. That's where you represent primality or lack of it by setting and clearing bits of an integer. It's pretty easy to determine a prime from a bitmap value, so this representation is just as good as the previous one. One could also use the wheel idea I explained before to save even more space: a concrete example would be that for each block of 210 numbers, one needs only bitmaps for 48 of them, as the rest are divisible by the first four primes. In this example a savings of 4.375 would be realized. Of course, all these ideas would require special readers programs, instead of just opening up the files in a text editor -- a small price to pay for such a large space savings.
But in case only sequential reading of the primes is desired, there is an even more efficient representation: just write out the prime gaps, i.e. the difference between consecutive primes. One bit can be saved by realizing that, other than 3 - 2, all prime gaps are even, so one can just store the prime gap divided by two. One could really nitpick and point out that 0 is an impossible prime gap, so one could subtract two from all gaps, but this gain is too small to be worth the hassle. According to Thomas Nicely (incidentally the guy who discovered the bug on the Pentium FPU during his studies on computational number theory), one could store the prime gaps of all
Blockquoth the FA's author, regarding how to store a list of primes:
Keeping it in an array is simplest, but one must declare the array before finding primes. How big do you declare it.
Have a look here for a pretty tight upper bound on the number of primes up to a given number. Using an array, instead of a linked list, would probably lead to a small speed improvement on his code.
He could also use an std::vector from C++. As far as I can tell it's pretty easy to resize.
Yes, quantum computing can factor integers in polynomial take. However, it takes exponential time to simulate a quantum Turing machine. So in the end your algorithm is still exponential on a deterministic Turing machine, and probably with much worse asymptotics to beat. I bet my lowly Pentium 4, running an implementation of the Number Field Sieve, would be faster the Blue Gene running a simulation of Shor's quantum factoring algorithm.
Hello Phil, how are you doing? Sorry I didn't answer your email back when I received it.
However, I don't think we can easily do sieving, for instance, using the FPU. That would rule out the most powerful factoring algorithms like NFS and MPQS.
Also, GIMPS is completely different from your ordinary key cracking workload. They're doing huge FFTs on really huge numbers. You can't really compare it to a factoring program working on hundred-digits integers.
As counterintuitive as it may sound, it is possible to eliminate cheating in poker through cryptographic methods. Protocols designed for that purpose are called mental poker protocols. The key idea is to encrypt the `deck' and give it to someone else to shuffle it, then later reveal the key used so that people can check consistency of the deck. The full protocols are of course a bit more complicated than this.
In fact, these ideas are not limited to poker; a useful primitive is a cryptographic coin flipping protocol, from which various shuffling-related algorithms can be constructed.
Look up Joux's construction for breaking concatenated hashes. Concatenated hashes are only as strong as the strongest among them, whereas you would expect that the concatenation of an n-bit hash with an m-bit hash would provide (n+m)-bit security.
Although I'm not sure how these shortcut collision attacks fit in Joux's construction -- my recollection of it was that it was useful for collision search using the birthday paradox.
That may be a SuperPi world record, but definitely not the overall record: Steve Pagliarulo's QPi can compute 1 million digits of Pi in 6.68 seconds in a Pentium 4 1.6 GHz box. You read that right.
BTW, the same computer takes 189 seconds to compute 2^20 (~1 million) digits using SuperPi. Among the community of Pi-calculating programmers, it's well known that SuperPi is terribly slow. I don't know why overclockers still hang on to it when most programs out there for calculating Pi are faster than it.
I've been using computers for 15 or 16 years now, and been a QWERTY touch typist for at least 10. Although I never had a problem with RSI (I think guitar playing might be related to that), a couple of years ago I decided to preemptively switch to Dvorak like you, and I also appreciated the challenge in a layout switch. I was aware of the fact that I'd be typing in QWERTY everywhere except on my computer at home.
The transition period is the worst. During a certain phase of your training, you can't type quickly in either layout, so be prepared to lose some productivity -- in fact I suggest that you do this while on vacation, or at least be sure that you're not going through a stressful period in your life, or you'll give up before training is complete.
After a while I started to get up to speed with Dvorak but not QWERTY. I believe this is mostly due to the fact that I was typing in Dvorak most of the time -- it seems one actually needs some retraining to regain their old speed in QWERTY, and this is what happened.
Today I can type as fluently in Dvorak as I can in QWERTY, although most of my typing is done in Dvorak, of course. I'm periodically forced to use QWERTY, which ensures I don't forget about it. Now I can't comment on what happens if you use Dvorak all the time and QWERTY very seldom (say once a month for a few minutes), whether you lose your QWERTY skills or not. However I'd expect that the skills are indeed lost if you don't practice, so you might consider periodically switching to QWERTY if you've gone too long without using it.
A warning though: although I can easily switch from one layout to another, it's a bit more problematic if you have to do it frequently. For instance, sometimes I'm logged to a Windows box via RDP, and the RDP client forces the QWERTY layout despite the system-wide default for Dvorak. In these situations I have the expected problems (start typing on a new window and realize that you're still using the layout of the previous window), but in addition I might get confused midsentence or something like that. Seems the brain can't handle constant switchs.
Finally, something I wish I had been told before switching: the hardest part of switching are keyboard shortcuts. The way your mind processes them is not to think of the key Control plus the letter C, say, but rather something like the key at the corner of the keyboard plus the key in the bottom row that is pressed by the middle finger. Getting used to keyboard shortcuts in the new layout was probably the hardest part of the switch for me.
Hope that helps.
This is almost unbelievable. Yesterday I was still arguing with a friend that this was impossible, the world would end before Apple switched to Intel.
Although, I like the news. I've been an Intel zealot for the past few years, but got tired of Linux (hadn't used Windows either for a while) so I decided to test new waters with the Mac. It's great that I can now have my favorite processor with my new favorite OS (:
This is the best news of the year, for me at least. No longer will I (and many other developers) need to compromise when writing multiplatform C++ code -- Qt is just the natural choice for this task, and I'm very glad they decided to release Qt again under the GPL.
Again, kudos to Trolltech -- great companies like them are pretty much the exception these days.
However, it is sad that they didn't publish full details of the cipher. This goes against full disclosure principles.
I can already hear screams of `what do you want the cipher for? Are you going to steal cars and get free gas?' No. But using this excuse, researchers can prevent me and others from implementing a faster attack, or even finding an attack of smaller complexity -- this is a Feistel cipher, so it shares some structure with DES and thus some similar attacks (linear, differential cryptanalysis) might apply.
They're basically monopolizing their right to do research on this device. Sure, it's their call, particularly as they put a lot of work into it, but it's not exactly following established principles of academia.
And for God's sake, don't try to disguise this information-hiding attitude as a theft prevention device -- the amount of published details is just enough for a blackhat with a modest amount of resources to produce another working implementation, but probably no one else is going to bother. Cars will be stolen anyway, but researchers will be unable to do their job.
I've seen the privacy objections on other posts, and they're right, but everyone's overlooking a very interesting possibility -- setting up a network of trusted friends/relatives/etc. If they're really trusted
I believe if this network is going to survive, it'll be by allowing the creation of such trusted communities.
That was just an unfortunate choice of words by The Register. Reading further one sees that Denic is aggressively opposed to ICANN, so what the article author meant was that ICANN would hate to see either Verisign or Denic winning the bid.
Hey Slashdot editors, why not append NN: (for `not news') or some such before the story title so I can avoid wasting my time reading the summary?
AMD may declare what they want, but the numbers speak for themselves. I strongly doubt anyone can provide numbers showing that AMD is ahead of Intel in the server market (though I may grudgingly concede the gaming market).
Regardless, I can assure you that it'd be just as fast if about 20% not 100% of the code were written in C. I'm not familiar with emulators, so maybe those 20% are actually 30 or 40%, but never 100%. I assert this as a speed freak and optimized assembly coder (did a couple of cores for distributed.net). It's just a complete waste of time to write, say, GUI code and file handling in assembly.
Actually, I'd go so far as to hypothesize that ZSNES would be faster if it were written in C/C++ with careful assembly optimization only where needed: the higher productivity associated with a high-level language would mean more time to optimize the parts of the code where speed really matters.
I was always partial to SNES9X for some reason (perhaps it's the fact that they don't waste their time coding everything in assembly, as nobody should), and it's also open source. Whether it is GPL'd or not is just flamewar fodder -- most certainly the submitter's intention.
That has to be one of the best ideas I've read on Slashdot in a long time. Congratulations.
By the way, indexing email attachments is very simple, just do it like a P2P network would: compute a hash on the attachment, store it along with the attachment's size and check for matches.
Someone might complain about the possibility of collisions under this scheme. Now if a secure hash function were used (not MD5 as it has been broken) then the system would be, for all practical purposes, shielded from collisions: even a smallish 128-bit hash would require approx. 2^64 different attachments to be present in the system, before the probability of finding a collision by the birthday paradox would be non-negligible.
This is an unthinkable amount of messages -- even if all 6 billion people on earth were to send 10 messages a day, each with different attachments, it would take almost a million years before Google would need to worry about collisions. But of course they could check the size of attachments before declaring a match, making it even harder to find collisions (as if it wasn't hard enough). The statistical distribution of message attachment sizes is hard to predict, but it would easily add another few orders of magnitude to the amount of messages required to produce a collision.
So is 31337 in decimal.
Of course, if you absolutely have to be sure that you're dealing with a prime, you could use a probabilistic algorithm to sieve for probable primes, then apply the n-1 test to obtain a primality certificate for your probable prime (which then becomes a certified prime with 0% probability of being composite). Of course this would be a bit costly, but for small integers the n-1 test is hard to beat -- ECPP and APR-CL have just too much overhead at this range. But I'm pretty sure your application, whatever it is, can live with the small probability that a PrP is composite.
Sorry, forgot to link to the list of Mersenne primes: click here
I'd venture to say I know what I'm talking about. One can't publish a paper about large-scale distributed primality proofs without knowing these basics. Here's a sample of my work. Now don't get me wrong, I'm not trying to sound like a smart-ass, but I'm not going to let someone question my knowledge in this field (which I have studied for years) and remain quiet.
Now it seems people here are not familiar with the terminology associated with Mersennes, so I'll give a crash course.
A Mersenne number M_n is of the form 2^n - 1. If 2^n - 1 is prime, then M_n is a Mersenne prime. If 2^n - 1 is composite, then M_n is a Mersenne composite. The integer n in this expression is called the exponent. When I talk about `exponents of Mersenne composites', I'm refering to some prime n for which M_n = 2^n - 1 is composite.
Now if 2^n - 1 is prime, then n must be prime, but the converse is not always true, the smallest counterexample being 2^11 - 1 = 2047 = 23*89. Turns out that for prime exponents n = 24,036,583, only in 41 cases is 2^n - 1 actually prime. A list of these cases can be found here. For the other 1,509,222 prime values of n = 24,036,583, 2^n - 1 is composite. This translates to a compositeness ratio of 99.9973%.
I hope everything is clearer now.
Do you have reading comprehension problems? Or is it just logic that escapes you? If only 41 primes up to 24,036,583 correspond to exponents of Mersenne primes, then the remaining 1.5 million or so primes up to 24,036,583 correspond to exponents of, you guessed it, Mersenne composites. I'd say that a compositeness ratio of 99.997% would indicate that `most' Mersenne numbers are composite.
I can only suggest you to heed your own advice.
...and I'm going to list a few improvements here that I think the author missed.
In one of the early programs, the author had the idea of skipping all even numbers, which are of course divisible by 2. In the next program he found out how he could skip numbers divisible by 3. One can make a systematic method out of that -- in the literature it's known as adding a wheel to the sieve of Erathostenes. Thing is, he implemented the sieve of Erathostenes foregoing this improvement he had implemented earlier. This leads to a big improvement: when using a wheel with the first four primes 2,3,5,7, there are only 48 residue classes out of 210 = 2*3*5*7 possible that are not divisible by any of these four primes. In practice this means that, for each 210 numbers you'd consider, the sieve would only need be applied in 48 of them. That's a gain of 4.375. One can make a gain of nearly 5 using a wheel of the first five primes, but it begins to get complicated.
Another improvement would be to make the program cache-friendly. He sort of did this, down one level of the memory hierarchy, by paging the data to disk -- now he only needed to realize that, just as memory is much faster than disk, cache memory is faster than main memory. I've developed a sieve employing this optimization and it pays off indeed. This is considered a basic technique in the realm of high-performance computing. (The NFSNET, see my sig, uses a sieve including this and other optimizations, for instance).
I might get flamed for this, but he should consider assembly programming. There aren't many applications where assembly will pay off so handsomely as this one, and the `core' of the sieve is very small, so only a few lines of assembly are needed. Vector integer instructions, such as MMX and the integer instructions of SSE2, can be employed to very good advantage here. My sieve program also included this, and it gives a real nice boost -- and only about 25 lines of assembly were used, the rest was C.
Lastly there's the issue of representing the output. He stored the primes on disk by writing their values. A little compression can be realized in this representation: for instance he's printing out the primes in decimal, while hexadecimal would be more space-efficient. And instead of printing out the ASCII characters from 0 to 9, A to F, he could use a binary representation and read them through an hex dump -- that saves half the space, as a byte is used to store two hexadecimal digits instead of one.
A perfectly good representation that could save a lot of space (8 times, to be precise) would be a bitmap, which he used on some of his programs. That's where you represent primality or lack of it by setting and clearing bits of an integer. It's pretty easy to determine a prime from a bitmap value, so this representation is just as good as the previous one. One could also use the wheel idea I explained before to save even more space: a concrete example would be that for each block of 210 numbers, one needs only bitmaps for 48 of them, as the rest are divisible by the first four primes. In this example a savings of 4.375 would be realized. Of course, all these ideas would require special readers programs, instead of just opening up the files in a text editor -- a small price to pay for such a large space savings.
But in case only sequential reading of the primes is desired, there is an even more efficient representation: just write out the prime gaps, i.e. the difference between consecutive primes. One bit can be saved by realizing that, other than 3 - 2, all prime gaps are even, so one can just store the prime gap divided by two. One could really nitpick and point out that 0 is an impossible prime gap, so one could subtract two from all gaps, but this gain is too small to be worth the hassle. According to Thomas Nicely (incidentally the guy who discovered the bug on the Pentium FPU during his studies on computational number theory), one could store the prime gaps of all
Have a look here for a pretty tight upper bound on the number of primes up to a given number. Using an array, instead of a linked list, would probably lead to a small speed improvement on his code.
He could also use an std::vector from C++. As far as I can tell it's pretty easy to resize.
Wrong. There's more than 1.5 million primes up to 24,036,583 (the largest Mersenne exponent known), and only 41 of these are Mersenne exponents.
You just need the right software. With PARI-GP my 2.6 GHz P4 takes a little more than 13 seconds.
Yes, quantum computing can factor integers in polynomial take. However, it takes exponential time to simulate a quantum Turing machine. So in the end your algorithm is still exponential on a deterministic Turing machine, and probably with much worse asymptotics to beat. I bet my lowly Pentium 4, running an implementation of the Number Field Sieve, would be faster the Blue Gene running a simulation of Shor's quantum factoring algorithm.
Hello Phil, how are you doing? Sorry I didn't answer your email back when I received it.
However, I don't think we can easily do sieving, for instance, using the FPU. That would rule out the most powerful factoring algorithms like NFS and MPQS.
Also, GIMPS is completely different from your ordinary key cracking workload. They're doing huge FFTs on really huge numbers. You can't really compare it to a factoring program working on hundred-digits integers.