Slashdot Mirror


New Pattern Found In Prime Numbers

stephen.schaubach writes "Spanish Mathematicians have discovered a new pattern in primes that surprisingly has gone unnoticed until now. 'They found that the distribution of the leading digit in the prime number sequence can be described by a generalization of Benford's law. ... Besides providing insight into the nature of primes, the finding could also have applications in areas such as fraud detection and stock market analysis. ... Benford's law (BL), named after physicist Frank Benford in 1938, describes the distribution of the leading digits of the numbers in a wide variety of data sets and mathematical sequences. Somewhat unexpectedly, the leading digits aren't randomly or uniformly distributed, but instead their distribution is logarithmic. That is, 1 as a first digit appears about 30% of the time, and the following digits appear with lower and lower frequency, with 9 appearing the least often.'"

26 of 509 comments

  1. Re:Other bases? by Anonymous Coward · · Score: 5, Funny

    It would be bad.

  2. Re:Other bases? by hkz · · Score: 5, Informative

    Benford's Law is actually independent of the number base used. It wouldn't be much of a mathematical property if it weren't. No matter how you convert a number, you will always see the same bias.

  3. Re:Other bases? by Anonymous Coward · · Score: 5, Funny

    Bad as in "cross the streams" bad, or "according to an AC on Slashdot" bad ?

  4. Re:Other bases? by Megaweapon · · Score: 5, Funny

    base-9 or base-11?

    NEVER FORGET

    --
    I'm sure "SlashdotMedia" will improve on all the wonders that Dice Holdings blessed us all with
  5. Re:Other bases? by pdxp · · Score: 5, Informative

    It wouldn't change the logarithmic nature of the distribution of the digits, AFAIK.

    My math degree is getting dusty, but I'm pretty sure that the same pattern could be represented in another base by changing their generalization of Benford's law to include it, and the distribution would look like log(x)/log(9) or log(x)/log(11). Remember, changing the base of a logarithm is easy: for example, log(x)/log(e) = ln(x)

    So you get the same distribution, different base.
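A minimal sketch of the base change described above, assuming the classical form of Benford's law, P(d) = log_b(1 + 1/d), rather than the generalized version from the paper. It is written in Python, and the function name `benford` is only illustrative.

```python
import math

def benford(base=10):
    """Leading-digit probabilities under classical Benford's law in `base`."""
    if base < 2:
        raise ValueError("base must be at least 2")
    # P(d) = log_base(1 + 1/d) for each possible leading digit d = 1 .. base-1
    return {d: math.log(1 + 1 / d, base) for d in range(1, base)}

for b in (9, 10, 11):
    # Same logarithmic shape in every base, just with a different number of digits.
    print(b, [round(p, 3) for p in benford(b).values()])
```

Calling `benford(2)` returns a single entry for digit 1 with probability 1, which matches the base-2 observation made elsewhere in the thread.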

  6. Cryptography? by PolygamousRanchKid+ · · Score: 5, Funny

    Could this have any applications there?

    "Well, I wasn't expecting The Spanish Mathematician . . ."

    --
    Schroedinger's Brexit: The UK is both in and out of the EU at the same time!
  7. Re:Other bases? by CaseyB · · Score: 5, Funny

    All your base are belong to Benford.

  8. Re:Other bases? by Lillesvin · · Score: 5, Funny

    I'm pretty sure that in base-2 with no zero-padding, 100% will start with 1. :-p

    --
    "Live free or don't."
  9. If you're dealing with phone numbers by Ralph+Spoilsport · · Score: 5, Interesting
    It has less to do with math and more to do with physics: as in how to use an old-school phone. Phone numbers, until comparatively recently, would "prefer" lower numbers because they are EASIER TO DIAL. If a company had the phone number (909)999-9009 you would HATE dialing that thing. It would take about half a minute just to dial the damn number.

    Ssssshhhhhhik!
    diggadiggadiggadiggadiggadiggadiggadiggadigga!

    Total pain in the finger.

    1 as a first number was reserved for "other stuff" like international calls, so the lowest possible area codes (first numbers) went to places like New York City (212 - very quick to dial) or LA (213) because millions of people would be dialing that number, so it made for an overall faster dialing experience for (on average) more people.

    This is compared to the relatively few people who lived in more obscure parts of the country, like Saginaw MI (989) or Bryan TX (979).

    So, you have millions of phones in 212, thousands in 979. The result: saved effort in dialing.

    Also, to this end there was a preference for exchanges to have lower numbers as well to save on dialing effort, and phone numbers with lower (but NON-ZERO) values were sought after. You'd see advertisements like "Call RotoRooter - 213 464 1111 !" or "Call us NOW for a free analysis! 201 738 1122 !" and so on.

    So, lower numbers in phone numbers have been a product of primitive dialing technology. Now with touchtone - all that is out the window - but the historic trend is still there and quite powerful - people will pay good money for a 212 area code for the distinction of being in the "real" New York Area code...

    RS

    --
    Shoes for Industry. Shoes for the Dead.
    1. Re:If you're dealing with phone numbers by jmp_nyc · · Score: 5, Informative

      While you're absolutely right about the reasoning behind NYC, LA, and Chicago getting 212, 213, and 312, you're a little off on the 989 and 979 area codes, which are much more recent.

      In the original system design, all area codes had a middle digit of 0 or 1. The convention was that a middle digit of 1 was used for area codes that only covered part of a state, while a middle digit of 0 was used for area codes that covered entire states. Furthermore, an area code could not begin with a 1 or a 0, and an area code with a middle digit of 1 couldn't have 1 as the third digit. (This left the shortest dial time area code for a statewide code as 201, which went to New Jersey.)

      As early as the late 1950s, the idea of single area codes for some states went out the window (with NJ splitting into 201 and 609 in 1958) because of increasing population and proliferation of phone service.

      By the late 1980s, the rules were further changed to allow for area codes with middle digits other than 1 or 0. Area codes like 989 and 979 weren't introduced until the late 1980s at the very earliest, by which point very few people were still using rotary phones. At one point, I had heard that the middle digit value of 9 was reserved for the future to allow for four digit area codes, but I can't vouch for the accuracy of that recollection. There are plenty of other rules, some of which you can see summarized here...
      -JMP

  10. Re:Other bases? by dynamo52 · · Score: 5, Insightful

    I'm pretty sure that in base-2 with no zero-padding, 100% will start with 1.

    ...and all but one would end with 1 as well.

    --
    Like this comment? I accept Bitcoin! - 153sc8UUBXyp12ofQqfAWDmJrzyiKCYC1x
  11. Independent Verification by eldavojohn · · Score: 5, Interesting
    Here's what I got on my own counts using the first million primes:

    1: 415441
    2: 77025
    3: 75290
    4: 74114
    5: 72951
    6: 72257
    7: 71564
    8: 71038
    9: 70320

    Which puts the probabilities at:

    1: 0.415441
    2: 0.077025
    3: 0.075290
    4: 0.074114
    5: 0.072951
    6: 0.072257
    7: 0.071564
    8: 0.071038
    9: 0.070320

    My computer is currently crunching the first fifty million primes and I will post those as a reply to this post later today when it is done.

    These ratios on numbers 2-9 seem far too close in range for this to be a true log scale. Hopefully with more data it will be more logarithmic.
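One way to see how far these counts sit from the classical law is to put them next to Benford's expected shares, log10(1 + 1/d). The sketch below is Python rather than the poster's own code, and the names `counts` and `benford_share` are made up for illustration. Note that the counts for digits 2 through 9 are identical to the 1-10,000,000 table further down the thread, so the sample apparently stops partway through the 8-digit numbers that start with 1, which would explain the inflated first row.

```python
import math

# Leading-digit counts quoted in the comment above (first million primes).
counts = {1: 415441, 2: 77025, 3: 75290, 4: 74114, 5: 72951,
          6: 72257, 7: 71564, 8: 71038, 9: 70320}

def benford_share(d):
    """Classical base-10 Benford share for leading digit d."""
    return math.log10(1 + 1 / d)

total = sum(counts.values())
print("digit  observed  benford")
for d, c in counts.items():
    # Observed share next to the share classical Benford would predict.
    print(f"{d:>5}  {c / total:8.6f}  {benford_share(d):7.5f}")
```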

    --
    My work here is dung.
  12. Re:Other bases? by Anonymous Coward · · Score: 5, Informative

    Numbers are objects, I wish people would understand that numbers are just distinctions. The whole of mathematics is really just a language of form and structure, a system to systematize and describe structure and forms (relationships are a type of form).

  13. Re:Other bases? by Ibag · · Score: 5, Informative

    Benford's law works by the observation that, when numbers come up in certain real world contexts, the fluctuations you get in numbers should be proportional to the numbers themselves. Phrased differently, variations tend to be relative, not absolute. Because of this, if you have a very large range of random numbers from many real world measurements, then you would expect the count of numbers between t and t*(1.0001) not to vary too much for small changes in t. Let us try to use this observation very coarsely. Among the numbers with 6 digits, the count of those that look like 1xxxxx (those between 100000 and 200000) should be about the same as the count of those between 200000 and 400000. The same thing happens with the numbers with 5 digits or 7 digits or n digits (assuming that you have a wide range of random numbers, and the numbers are the kind that come from certain sorts of real world measurements). Additionally, you can get distributions for the first two digits, the first three digits, etc.
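The coarse argument above can be checked numerically if one assumes the data are log-uniform (equally likely to land in any two intervals with the same ratio between their endpoints), which is one common way to formalize "variations are relative". That assumption and the helper names `mass` and `first_digits_share` are illustrative, not the poster's.

```python
import math

def mass(a, b):
    """Share of a log-uniform sample over one decade that lands in [a, b)."""
    return math.log10(b / a)

def first_digits_share(prefix):
    """Benford share of numbers whose leading digits are `prefix` (e.g. 1, 17, 314)."""
    return math.log10(1 + 1 / prefix)

# 100000-200000 carries the same mass as 200000-400000.
print(mass(100_000, 200_000), mass(200_000, 400_000))

# The first-two-digit shares for 10..19 add up to the share of leading digit 1.
print(sum(first_digits_share(n) for n in range(10, 20)), first_digits_share(1))
```

The second print is the "first two digits" distribution mentioned above: the same formula applies with a two-digit prefix in place of a single digit.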

    This observation doesn't depend on the base that you're working with.

    Now, with the prime numbers, they have a distribution that is different from a lot of real world measurement data. The number of primes between n and n+d is approximately d/ln(n), where ln is the log with base e and d is small compared to n. So the number of primes between 500000 and 600000 is about 100000/ln(500000), and the number of primes between 600000 and 700000 is about 100000/ln(600000). By using this, and being slightly more careful, one can determine fairly easily the distribution of the leading digits of the prime numbers.
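A rough sketch of the "slightly more careful" version, assuming the prime density near t is 1/ln(t) and integrating it numerically over each block of numbers that share a leading digit. The function names and the midpoint-rule integration are choices made for this illustration, and the single-digit primes are left out for simplicity.

```python
import math

def approx_prime_count(a, b, steps=2000):
    """Midpoint-rule estimate of the integral of 1/ln(t) from a to b."""
    h = (b - a) / steps
    return sum(h / math.log(a + (i + 0.5) * h) for i in range(steps))

def estimated_shares(limit_exp):
    """Estimated leading-digit shares for primes between 10 and 10**limit_exp."""
    totals = {d: 0.0 for d in range(1, 10)}
    for k in range(1, limit_exp):
        for d in range(1, 10):
            # Every number from d*10^k up to (d+1)*10^k starts with digit d.
            totals[d] += approx_prime_count(d * 10**k, (d + 1) * 10**k)
    grand_total = sum(totals.values())
    return {d: t / grand_total for d, t in totals.items()}

shares = estimated_shares(8)
for d, s in shares.items():
    print(d, round(s, 4))
```

The estimated shares fall off slowly from digit 1 to digit 9, much more gently than classical Benford, because 1/ln(t) changes very little across a single decade.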

    This is not a hard result. I would say that any professional mathematician who knew about the basic distribution of the primes could derive the distribution of the leading digits of the prime numbers fairly easily if anybody actually asked them to. The reason nobody mentioned this before is that nobody actually cares. While Benford's law does have applications to fraud detection, this new result does not. It's one of those things that makes people say "ooh, a pattern!" but which is just an easy and somewhat mundane corollary to a well-known theorem.

  14. Re:Other bases? by Anonymous Coward · · Score: 5, Funny

    Knock knock.

    Who's there?

    9/11.

    9/11 who?

    YOU SAID YOU'D NEVER FORGET!

  15. Re:Other bases? by Anonymous Coward · · Score: 5, Funny

    Oh yeah? Well give me two minutes and check again.

  16. Re:Other bases? by Anonymous Coward · · Score: 5, Informative

    They are also distributed as Benford's law describes, providing that k is not a rational power of the base. IAAM.

  17. Re:Other bases? by tuck182 · · Score: 5, Informative

    You mean, how many are Mercene primes?

  18. Re:Other bases? by Bromskloss · · Score: 5, Funny

    I'm pretty sure that in base-2 with no zero-padding, 100% will start with 1. :-p

    100% = 100/100 = 1 = 0b1, which, by the way, looks like "Obi" and sounds like "Obi-Wan" when you say it.

    --
    Swedish plasma phys. PhD student; MSc EE; knows maths, programming, electronics; finance interest; seeks opportunities
  19. Re:Other bases? by jd · · Score: 5, Funny

    "Bad" as in you will see the Message as hinted at by Carl Sagan's "Contact". It's from God and apparently decodes to: "We apologize for the inconvenience".

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  20. Re:Other bases? by PleaseFearMe · · Score: 5, Funny

    It would be bad with binary. All numbers start with 1's.

  21. Re:Other bases? by Anonymous Coward · · Score: 5, Funny

    I will never understand how people do that. You have the link right there. Even if you didn't open it to make sure, the link itself mentions the name "Mersenne Prime", and yet you write Mercene.

  22. Re:Other bases? by spartacus_prime · · Score: 5, Funny

    You have no chance to survive make your prime.

    --
    If you can read this, it means that I bothered to log in.
  23. Some More Information by eldavojohn · · Score: 5, Interesting

    So I read the comments and see that I need to do this in ranges of 1 to 100, 1 to 1000, etc. Which is fine, I've added another R method and would post the code here if it didn't yell at me for junk characters. So here are your Benford lists:

    All Primes 1-100
    Counted Occurrences:
    4, 3, 3, 3, 3, 2, 4, 2, 1
    Frequencies:
    0.160, 0.120, 0.120, 0.120, 0.120, 0.080, 0.160, 0.080, 0.040

    All Primes 1-1,000
    Counted Occurrences:
    25, 19, 19, 20, 17, 18, 18, 17, 15
    Frequencies:
    0.149, 0.113, 0.113, 0.119, 0.101, 0.107, 0.107, 0.101, 0.089

    All Primes 1-10,000
    Counted Occurrences:
    160, 146, 139, 139, 131, 135, 125, 127, 127
    Frequencies:
    0.130, 0.119, 0.113, 0.113, 0.107, 0.110, 0.102, 0.103, 0.103

    All Primes 1-100,000
    Counted Occurrences:
    1193, 1129, 1097, 1069, 1055, 1013, 1027, 1003, 1006
    Frequencies:
    0.124, 0.118, 0.114, 0.111, 0.110, 0.106, 0.107, 0.105, 0.105

    All Primes 1-1,000,000
    Counted Occurrences:
    9585, 9142, 8960, 8747, 8615, 8458, 8435, 8326, 8230
    Frequencies:
    0.122, 0.116, 0.114, 0.111, 0.110, 0.108, 0.107, 0.106, 0.105

    All Primes 1-10,000,000
    Counted Occurrences:
    80020, 77025, 75290, 74114, 72951, 72257, 71564, 71038, 70320
    Frequencies:
    0.120, 0.116, 0.113, 0.112, 0.110, 0.109, 0.108, 0.107, 0.106

    This is the raw data so to turn that into something visual, I dumped it into a Google spreadsheet and made it public (note the scale on the y axis). Enjoy!

    It seems that the curve is flattening out the more data I collect, but the logarithmic curve may be valid. I have the data for 100,000,000 and will add that to the spreadsheet once it completes.
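The poster's R code was not included, so the following is a stand-in sketch in Python of how such counts could be produced: a sieve of Eratosthenes followed by a tally of first digits. The function names `primes_up_to` and `leading_digit_counts` are hypothetical and this is not the original method.

```python
def primes_up_to(limit):
    """Return all primes <= limit using a sieve of Eratosthenes."""
    if limit < 2:
        return []
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"  # 0 and 1 are not prime
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            # Cross out every multiple of i starting at i*i.
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [n for n, is_prime in enumerate(sieve) if is_prime]

def leading_digit_counts(limit):
    """Count primes <= limit by leading decimal digit; index 0 is digit 1."""
    counts = [0] * 10
    for p in primes_up_to(limit):
        counts[int(str(p)[0])] += 1
    return counts[1:]

for limit in (100, 1_000, 10_000):
    counts = leading_digit_counts(limit)
    total = sum(counts)
    print(limit, counts, [round(c / total, 3) for c in counts])
```

For the 1-100 range this gives 4, 3, 3, 3, 3, 2, 4, 2, 1, the same row as in the list above.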

    --
    My work here is dung.
  24. I Found a Fit! by eldavojohn · · Score: 5, Interesting
    The results for all primes between one and one hundred million:

    Counted Occurrences:
    686048, 664277, 651085, 641594, 633932, 628206, 622882, 618610, 614821
    Frequencies:
    0.119, 0.115, 0.113, 0.111, 0.110, 0.109, 0.108, 0.107, 0.107

    So there's some more data for you. I added that to this spreadsheet.

    So I hope that satisfies everyone who replied to my thread first of all. I hope 5,761,455 primes between one and one hundred million satisfies you.

    I used a very simple nonlinear least squares model to solve for a single constant on a log of these values. I think I have a fit. Using Benford's model and the nls function in R, I found:

    f(x) = 0.020814 * log(161.147689 * ((x+1)/x))

    It fits quite nicely; here's the summary:

    Formula: y ~ Const1 * log(Const2 * ((x + 1)/x))

    Parameters:
              Estimate  Std. Error  t value   Pr(>|t|)
    Const1    0.020814    0.001940  10.7292  1.343e-05 ***
    Const2  161.147689   80.222081   2.0088    0.08452 .
    ---

    Residual standard error: 0.0010413 on 7 degrees of freedom

    Number of iterations to convergence: 8
    Achieved convergence tolerance: 1.8104e-07

    Here is the list of frequencies next to what my model produced:

    Benford Prime Rates
    0.11907548
    0.11529674
    0.11300704
    0.11135972
    0.11002984
    0.10903600
    0.10811193
    0.10737045
    0.10671280

    NLS Model Results
    0.1202106
    0.11422279
    0.11177125
    0.11042794
    0.10957828
    0.10899193
    0.10856276
    0.10823497
    0.10797641

    I would wager that they are correct. Neat discovery!
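The two columns above can be reproduced by evaluating the fitted formula directly. This sketch is Python, not the R session from the comment, and it takes `log` to be the natural logarithm (R's default). The constants are the rounded estimates from the summary, so the last digits can differ slightly from the listed model results.

```python
import math

# Rounded parameter estimates from the summary above.
CONST1 = 0.020814
CONST2 = 161.147689

# Observed leading-digit frequencies for primes up to one hundred million.
observed = [0.11907548, 0.11529674, 0.11300704, 0.11135972, 0.11002984,
            0.10903600, 0.10811193, 0.10737045, 0.10671280]

def model(x):
    """Fitted curve f(x) = Const1 * ln(Const2 * (x + 1) / x)."""
    return CONST1 * math.log(CONST2 * (x + 1) / x)

residuals = []
for digit, obs in enumerate(observed, start=1):
    fit = model(digit)
    residuals.append(obs - fit)
    print(f"{digit}  observed={obs:.8f}  model={fit:.8f}  residual={obs - fit:+.6f}")
```

Since ln(Const2 * (x+1)/x) = ln(Const2) + ln(1 + 1/x), the fit amounts to a constant offset plus a scaled copy of Benford's ln(1 + 1/x) term.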

    --
    My work here is dung.
  25. Re:Other bases? by Thing+1 · · Score: 5, Funny

    IAAM.

    Wow, first use of "I am a moron" I've seen in the field!

    Hmm, or it is Mormon?

    --
    I feel fantastic, and I'm still alive.