Slashdot Mirror


Counting the World's Books

The Google Books blog has an explanation of how they attempt to answer a difficult but commonly asked question: how many different books are there? Various cataloging systems are fraught with duplicates and input errors, and only encompass a fraction of the total distinct titles. They also vary widely by region, and they haven't been around nearly as long as humanity has been writing books. "When evaluating record similarity, not all attributes are created equal. For example, when two records contain the same ISBN this is a very strong (but not absolute) signal that they describe the same book, but if they contain different ISBNs, then they definitely describe different books. We trust OCLC and LCCN number similarity slightly less, both because of the inconsistencies noted above and because these numbers do not have checksums, so catalogers have a tendency to mistype them." After refining the data as much as they could, they estimated there are 129,864,880 different books in the world.
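The summary describes a weighting scheme for matching catalog records. A shared ISBN is strong evidence of a match, differing ISBNs rule a match out, and OCLC/LCCN numbers are trusted somewhat less. Below is a minimal Python sketch of how such rules might be combined. The `Record` fields, the numeric weights, and the way the signals are combined are all hypothetical illustrations, not Google's actual method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    # Hypothetical catalog record; real records carry many more fields.
    isbn: Optional[str] = None
    oclc: Optional[str] = None
    lccn: Optional[str] = None


# Illustrative weights only: ISBN agreement counts most, OCLC/LCCN less,
# because those numbers lack checksums and are more often mistyped.
WEIGHTS = {"isbn": 0.9, "oclc": 0.6, "lccn": 0.6}


def match_score(a: Record, b: Record) -> float:
    """Score in [0, 1]; 0.0 if the ISBNs conflict or nothing matches."""
    # Different ISBNs are treated as proof of distinct books.
    if a.isbn and b.isbn and a.isbn != b.isbn:
        return 0.0
    evidence = 0.0
    for field, weight in WEIGHTS.items():
        x, y = getattr(a, field), getattr(b, field)
        if x and y and x == y:
            # Combine signals as if independent: 1 - product of "miss" chances.
            evidence = 1 - (1 - evidence) * (1 - weight)
    return evidence
```

With these made-up weights, two records sharing both an ISBN and an OCLC number score 0.96, while a single ISBN conflict overrides everything else, mirroring the asymmetry the summary describes.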

109 comments

  1. NOSE by Anonymous Coward · · Score: 0

    Don't include the copyrighted books. OPEN SOURCE!

  2. How do you define "different book"? by jonnythan · · Score: 3, Interesting

    Look at textbooks - new editions that are almost indistinguishable from the previous editions have new ISBNs. Do we count every single one as a different book?

    1. Re:How do you define "different book"? by bluefoxlucid · · Score: 1

      Same thing with any other book. Second editions and republishings (the Del Rey version versus the Pyr version, etc.) with the exact same unedited text; multiple publishers of public-domain works; etc.

    2. Re:How do you define "different book"? by Anonymous Coward · · Score: 0

      Different languages and different markets often have different ISBNs as well.

    3. Re:How do you define "different book"? by Anonymous Coward · · Score: 0

      Does it contain the exact same information? If not, I would think it's a different book.
      If one has pictures and the other doesn't, to me that is a different book. Even if one sentence or word was changed, I would consider that a revision of some sort and a totally different book, because it contains different information. One word can have a major impact on reading; just compare the thousands of "Bibles". I'd consider them all different books.

    4. Re:How do you define "different book"? by Suki+I · · Score: 1

      With the advent of self-publishing and individuals purchasing their own ISBN blocks, the possibility of different works getting the same ISBN increases greatly. Especially when they are not using a distribution service like Amazon that *might* check to see if that ISBN is already in use.

    5. Re:How do you define "different book"? by Jeng · · Score: 1

      Also, if a publisher purchases a title from another publisher it gets a new ISBN with the new publisher even though it is the same book.

      --
      Don't know something? Look it up. Still don't know? Then ask.
    6. Re:How do you define "different book"? by gpf2 · · Score: 2, Informative

      What about translations? What about bootlegged copies from the 18th century? What about languages that have no direct concept of "edition"? The International Federation of Library Associations and Institutions (IFLA) has been wrestling with this for a while. Their solution, of sorts: Functional Requirements for Bibliographic Records (FRBR). http://www.ifla.org/en/publications/functional-requirements-for-bibliographic-records It's pretty dense and not consistently adopted.
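FRBR splits a "book" into four levels: Work, Expression, Manifestation and Item. That gives one way to make "different book" precise. Below is a minimal Python sketch of counting distinct entries at the first three levels. The class and field names are simplifications of my own, not IFLA's full attribute set, and the catalog entries are made-up example data. The article's choice to count every edition of "Hamlet" separately is closest to counting manifestations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Manifestation:
    # One published edition. Field names are illustrative simplifications.
    work: str          # the abstract creation, e.g. "Hamlet"
    expression: str    # a particular text or translation of that work
    publisher: str
    year: int
    # FRBR's fourth level, Item (a physical copy), is not modeled here.


def count_levels(editions: list[Manifestation]) -> dict[str, int]:
    """Count distinct works, expressions and manifestations."""
    return {
        "works": len({m.work for m in editions}),
        "expressions": len({(m.work, m.expression) for m in editions}),
        "manifestations": len(set(editions)),
    }


# Made-up example data.
catalog = [
    Manifestation("Hamlet", "English text", "Press A", 1980),
    Manifestation("Hamlet", "English text", "Press B", 1992),
    Manifestation("Hamlet", "German translation", "Press C", 1969),
    Manifestation("Inferno", "Italian text", "Press D", 1900),
]
print(count_levels(catalog))
# {'works': 2, 'expressions': 3, 'manifestations': 4}
```

The same four records give three different "number of books" answers, depending on which level you decide to count.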

    7. Re:How do you define "different book"? by natehoy · · Score: 1

      It's a one-page article, and contains a really good explanation of what they mean by a book for the purposes of their counting, and why.

      The following sentence from the article really cuts straight to the heart of their concept of uniqueness:

      It makes sense to consider all editions of “Hamlet” separately, as we would like to distinguish between -- and scan -- books containing, for example, different forewords and commentaries.

      So, yes, if they scan textbooks they'll scan all versions they can get, and treat them as separate works.

      --
      "This post contains words, known to the State of California to cause thought. Wash brain thoroughly after reading."
    8. Re:How do you define "different book"? by jd · · Score: 1

      Also hardback vs. paperback, publishing in different regions as a distinct book, etc. Maybe ISBNs could be extended to encode all these different fields in additional digits: a component that is unique to a specific book (regardless of edition, publisher, etc.), extra information that uniquely* identifies which specific edition/version/variant of the book it is, and then yet more information that uniquely identifies which publisher circulated that book.

      *A SHA-2 or SHA-3 hash of the book's contents + cover + publisher would probably be close enough to unique, given that editions rarely run into sufficient numbers for a collision to be even remotely likely. It would also avoid any arguments over which publisher had what number or how to identify which version was what, especially for older books where there may be no unique way to determine this or the information simply no longer exists. A hash will always work.
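The content-hash identifier suggested above can be sketched with Python's standard `hashlib`. The function name `edition_id` and the choice of inputs are hypothetical. SHA-256 is used here, and `hashlib.sha3_256` could be swapped in for SHA-3. Each part's length is hashed before its bytes, so that different splits of the same bytes cannot produce the same ID.

```python
import hashlib


def edition_id(contents: bytes, cover: bytes, publisher: str) -> str:
    """Hypothetical content-based identifier for one edition."""
    h = hashlib.sha256()
    # Prefix each part with its length so b"ab" + b"c" and b"a" + b"bc"
    # hash differently.
    for part in (contents, cover, publisher.encode("utf-8")):
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()  # 64 hex characters


a = edition_id(b"To be, or not to be", b"cover scan", "Press A")
b = edition_id(b"To be, or not to be", b"cover scan", "Press B")
print(a != b)  # True: same text, different publisher, different ID
```

One design consequence: a one-character typo fix yields a completely unrelated ID. A bare hash therefore distinguishes variants well, but it records nothing about which editions descend from which.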

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    9. Re:How do you define "different book"? by jd · · Score: 1

      Again, this is why I'd like to see additional information encoded in an extension to the book's ISBN number, such as a hash of the contents. Regardless of what the extension is, the split should permit you to identify "works that descend directly from a single work" plus "works that differ in content" (regardless of what they descend from). Then there would be no problem. You would be able to extract the level of information you wanted and no information would risk getting lost because such-and-such a group didn't think it important.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    10. Re:How do you define "different book"? by icebike · · Score: 1

      And every goddamned one of them is scanned by Google and foisted on us by Barnes and Noble, Amazon and everybody else as a separate book.

      I once counted twenty different versions of the same popular (copyright lapsed) classic, all scanned by Google, many from the exact same edition found in various libraries. Some horrible, some quite readable.

      I'm not sure anything is served by having both the 1902 and the 1903 versions of any popular fiction available in ebook form. Any serious researcher would search out the physical books and not rely on a scan anyway.

      --
      Sig Battery depleted. Reverting to safe mode.
    11. Re:How do you define "different book"? by pilgrim23 · · Score: 1

      In the 1480s an edition of Dante's Divine Comedy was printed in Venice. In 1481 another was printed in Florence. Each has exactly the same text, barring printer mistakes and, if you are lucky enough to have the Florence one, its plates (illustrations). Each is also an absolute work of art in its own right and distinct from the other. Should these be recorded as one book or two?

      --
      - Minutus cantorum, minutus balorum, minutus carborata descendum pantorum.
    12. Re:How do you define "different book"? by Anonymous Coward · · Score: 0

      Oh, YOU aren't sure if anything is served, so it's bad? Thank goodness we have you here to police technological advances for us. How about, instead, we get all the data it's possible to get and the people who want to use it can, and the people who don't, they don't have to?

      The only real issue here is being able to distinguish between things which is solved by proper metadata. After that, it's a simple exercise in filtering.

    13. Re:How do you define "different book"? by Anonymous Coward · · Score: 0

      You mean they're exactly the same but with different problem sets...

  3. What about self published works? by Anonymous Coward · · Score: 0

    And what about self published books? They wouldn't have an ISBN unless they became wildly successful and then maybe not even then.

    1. Re:What about self published works? by insertwackynamehere · · Score: 1

      That's not true. Getting an ISBN isn't hard and self publishing companies will generally assign you one as part of the deal.

    2. Re:What about self published works? by Itninja · · Score: 1

      Indeed. QOOP, Blog2Print, etc. have alone printed thousands (if not tens of thousands) of different books for individuals who then publish their own work. Maybe those don't "count".

      --
      I judt got a nre Kinesis keybiartf so please excusr ant egregiou typos.
    3. Re:What about self published works? by Suki+I · · Score: 1

      That's not true. Getting an ISBN isn't hard and self publishing companies will generally assign you one as part of the deal.

      Amazon's Kindle, for example, will assign you an ISBN. However, if you bought your own ISBNs you can use them too. You are supposed to assign a different one to the eBook, paperback, audio and hardback. However, if you use the same one for all there are not many checks to stop you if you are using multiple services.

    4. Re:What about self published works? by dgatwood · · Score: 1

      That's not true. Getting an ISBN isn't hard and self publishing companies will generally assign you one as part of the deal.

      Depends on the size of the publishing house and the expected sales volume. If you're selling through a major bookstore chain, yeah, you're going to have an ISBN. For an independent author selling a few hundred copies of a book on the history of Three Way in a local bookstore, you probably won't have an ISBN---particularly if the book printing and binding was done at the Kinko's in Jackson. The single ISBN would cost as much as you'd make on the whole book.

      ISBN numbers are very much geared towards large volume commercial publishing. The system grudgingly handles smaller publishing to a point, but beyond that point, a lot of stuff falls through the cracks.

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

  4. 8 or 9-place estimate by Anonymous Coward · · Score: 2, Insightful

    estimate would be about 130 million, not 129,864,880
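Rounding the count to a chosen number of significant digits is a one-liner in Python. Below is a minimal sketch for integer counts. The helper name `round_sig` is my own. Python's built-in `round()` accepts a negative `ndigits` and returns an `int` for `int` input.

```python
def round_sig(n: int, digits: int) -> int:
    """Round a non-negative-or-negative integer to `digits` significant digits."""
    if n == 0:
        return 0
    # Position of the leading digit: 8 for 129,864,880.
    exponent = len(str(abs(n))) - 1
    # Note: round() uses round-half-to-even on exact ties.
    return round(n, digits - 1 - exponent)


count = 129_864_880
print(round_sig(count, 2))  # 130000000
print(round_sig(count, 4))  # 129900000
print(f"{count:.2g}")       # 1.3e+08
```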

    1. Re:8 or 9-place estimate by SomeJoel · · Score: 2, Insightful

      But 130 million can't possibly be right! We better assign some false precision to make our estimate believable. Significant digits are for science teachers and marriage counselors!

      --
      <Complete your profile by adding a signature!>
    2. Re:8 or 9-place estimate by langelgjm · · Score: 1

      Significant digits are for science teachers and marriage counselors!

      Ok, what am I missing here?

      --
      "Anyone who [rips a CD] is probably engaging in copyright infringement." - David O. Carson
    3. Re:8 or 9-place estimate by Anonymous Coward · · Score: 0

      Penis...the significant digit.

    4. Re:8 or 9-place estimate by dgatwood · · Score: 2, Funny

      Ring finger, presumably.

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    5. Re:8 or 9-place estimate by aynoknman · · Score: 1

      But 130 million can't possibly be right! We better assign some false precision to make our estimate believable. Significant digits are for science teachers and marriage counselors!

      Why stop at 8 or 9? 18 is much better and just as meaningful: 129,864,880.461938427

      --
      We need a "+1 -- nice sig" moderation.
    6. Re:8 or 9-place estimate by Anonymous Coward · · Score: 0

      I suspect that what the summary meant was "we identified 129,864,880 unique books".

  5. Whew....almost done! by SQLGuru · · Score: 1, Funny

    I'm almost done reading them all!

    1. Re:Whew....almost done! by Capt.DrumkenBum · · Score: 1

      Damn, I need to spend a lot more time reading.

      --
      If I were God, wouldn't I protect my churches from acts of me?
    2. Re:Whew....almost done! by Anonymous Coward · · Score: 0

      Lucky we have copyright to promote the creation of more books for you.

    3. Re:Whew....almost done! by XSpud · · Score: 1

      I'm almost done reading them all!

      That's my next challenge - once I've finished reading the web.

    4. Re:Whew....almost done! by SQLGuru · · Score: 1

      I can just ruin the ending for you....
      http://www.wwwdotcom.com/

    5. Re:Whew....almost done! by kayditty · · Score: 0

      I hope you save this for last.

  6. Foreign books? by Anonymous Coward · · Score: 0

    How about all the books printed in China, the rest of Asia, the Middle East, etc. that don't have ISBNs?

    1. Re:Foreign books? by jeffmeden · · Score: 1

      If they don't have the will to obtain an International Standard Book Number for their Internationally published book, then why bother counting it at all? After all, I wrote a book in first grade, consisting of 16 pages of poorly drawn pictures and brutal (if accurate) grammar... Should this be counted too?

    2. Re:Foreign books? by Anonymous Coward · · Score: 0

      After all, I wrote a book in first grade, consisting of 16 pages of poorly drawn pictures and brutal (if accurate) grammar... Should this be counted too?

      Absolutely!

      Your book could still be a boon to anthropologists studying pre Mayan calendar destruction societies.

    3. Re:Foreign books? by cablepuller · · Score: 1

      I'd like to look at that. I have seen many unique sketchbooks. Given that all humans who are able to write sooner or later scribble in their calendars, etc., I would estimate roughly that there are as many written pages as there were sheets of paper produced (minus drawings, test pages, and official documents). Go get your ISBN, your first grade book may be the missing puzzle piece in the evolution of mankind ;)

  7. Re:How do you define "different version"? by blair1q · · Score: 1

    I can tell this topic is going to be dominated by people who never had to deal with the internals of a revision-control system, much less a configuration-management system, because the issues are somewhat trivial once you get past your fear of the variables.

  8. Re:How do you define "different version"? by natehoy · · Score: 1

    Also by people who have never read the article, where it explains in some significant detail how they try to determine what constitutes "a book" for the purposes of their counting.

    --
    "This post contains words, known to the State of California to cause thought. Wash brain thoroughly after reading."
  9. I propose a new filesystem by TrisexualPuppy · · Score: 0

    In order to count and house all the world's books, we, of course, are going to need a new filesystem. I propose to call it TSPFS. The fundamental unit of the said filesystem is a BLoC, representing 115M books. And of course, 640K BLoCs should be enough for anyone...

    1. Re:I propose a new filesystem by Anonymous Coward · · Score: 0

      ahem if u dont get it

    2. Re:I propose a new filesystem by Anonymous Coward · · Score: 0

      given that a BLoC is a Burning Library of Congress, which from the link converts to about 4 petajoules, I'm confused as to why your units of measure are units of energy.

      I'm not being critical, I just mean, how does a file system store data as energy? Is it potential energy?

  10. Stupid estimate by xemc · · Score: 0

    That's a stupid estimate. Since they admitted there is so much uncertainty, they should have just said 130 million. (Or better, 0.13 billion to retain the significant digits)

    1. Re:Stupid estimate by hvm2hvm · · Score: 1

      no, 1.3E+8

      --
      ics
    2. Re:Stupid estimate by maxwell+demon · · Score: 1

      0.13 Gigabooks.

      --
      The Tao of math: The numbers you can count are not the real numbers.
    3. Re:Stupid estimate by Flea+of+Pain · · Score: 1

      How many Libraries of Congress is that?

      --
      Do not argue with an idiot. He will drag you down to his level and beat you with experience.
  11. adasd by Anonymous Coward · · Score: 0

    http://rlslog.in/wallpapers/3909-widescreen_40.html

  12. That's an ESTIMATE? by wealthychef · · Score: 3, Interesting

    I'm very suspicious about their numerical precision. IF it's an estimate, then they are saying it's 129,864,880 +/- 10. That is, they are pretty sure there aren't 129,864,980 books. I think they should make their estimate something like "we think there are about 130,000,000" or whatever accuracy they actually believe.

    --
    Currently hooked on AMP
    1. Re:That's an ESTIMATE? by NixieBunny · · Score: 1

      For sure. Even gravity can't be specified to that many significant digits, and it's a bit more knowable than the number of books in the world.

      --
      The determined Real Programmer can write Fortran programs in any language.
    2. Re:That's an ESTIMATE? by Caledfwlch · · Score: 1

      Also, what is the date and time of this estimate? How many books are published a day around the world?

      --
      These views express my own personal opinions, not those of the other voices in my head
    3. Re:That's an ESTIMATE? by demonbug · · Score: 1

      If you RTFA (blasphemy, I'm sure), Google doesn't say that 129,864,880 is an estimate - they say that is the number of books, total (at least until Sunday).

      The only estimate mentioned is "16 million bound serial and government document volumes".

      Surprise surprise, subby is the culprit that turned such an exact number into an "estimate".

    4. Re:That's an ESTIMATE? by city · · Score: 1

      I'm suspicious about the accuracy of numbers in general, I use 'some' for a few things and 'many' for more. I estimate there are many books in the world.

      --
      I am a v1ral sig. Plse c0py me and h3lp me spread. Thank y0u?
    5. Re:That's an ESTIMATE? by Lunix+Nutcase · · Score: 1

      That's the point. There is no way in hell that their accuracy is that great.

    6. Re:That's an ESTIMATE? by wealthychef · · Score: 1

      Google might not say in TFA, but the number they came up with includes approximations and estimates. The precision is not as given.

      --
      Currently hooked on AMP
    7. Re:That's an ESTIMATE? by Anonymous Coward · · Score: 0

      One, two, some, many... somemanysometwomanymanysome?

    8. Re:That's an ESTIMATE? by MattskEE · · Score: 1

      You lose accuracy by representing error bounds simply by the significant digits of the number. By convention (and not everyone follows it), the last sig fig is assumed to be +/- 1 (a zero being assumed non-significant unless followed by a decimal point, unless the zero is already after a decimal point). That's what I remember from high school chem. And it's a convention that makes sense for, say, reading a temperature off a thermometer. You don't know if the actual value was rounded up or down to give the instrument readout. Of course, assuming that a thermometer with 0.1 degree Celsius resolution also has 0.1 degree accuracy is not necessarily valid, but that's another topic.

      But this is an estimate of the number of books; there is no instrument being read here. This is simply their estimate, with error bounds that are obviously much greater than the last significant digit. If they had said 130. million, then the value would be assumed to be between 129 million and 131 million, based on the aforementioned convention that some people use. But if their error bound is +/- 1 million at a given confidence level, then they are more accurate saying 129,864,880 +/- 1 million than stating 130. million, even though the two are very close.

      By doing a detailed analysis of how accurately their algorithm determines the number of records based on a random sampling of records they could perhaps come up with a way to determine their error bounds. But such an analysis would probably take a great deal of effort, and I think that they just want to give us their best guess at this time.
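The random-sampling idea can be sketched roughly as follows. Assume someone hand-checks a random sample of the final book clusters and counts how many turn out to be spurious duplicates. The duplicate fraction and its 95% normal-approximation (Wald) interval then translate into bounds on the corrected count. Both the correction model (corrected = total × (1 − duplicate fraction)) and the sample numbers below are invented for illustration. The Wald interval is a swapped-in textbook method, not anything Google describes.

```python
from math import sqrt


def corrected_count_interval(
    total: int, sampled: int, bad: int, z: float = 1.96
) -> tuple[int, int, int]:
    """Return (low, estimate, high) for the corrected count.

    Hypothetical model: a fraction p of the counted books are spurious
    duplicates, estimated from `bad` errors in `sampled` hand-checked clusters.
    """
    p = bad / sampled
    half_width = z * sqrt(p * (1 - p) / sampled)
    lo_p, hi_p = max(0.0, p - half_width), min(1.0, p + half_width)
    # More duplicates means fewer real books, so the bounds swap.
    return (
        round(total * (1 - hi_p)),
        round(total * (1 - p)),
        round(total * (1 - lo_p)),
    )


# Invented sample: 1,000 clusters checked by hand, 30 found to be duplicates.
low, est, high = corrected_count_interval(129_864_880, 1_000, 30)
```

With these invented inputs, the interval comes out on the order of a million books either side of the estimate. That is roughly the kind of "+/- 1 million" statement the comment has in mind.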

  13. Mod parent up by Anonymous Coward · · Score: 0

    -=][Interesting][=-

  14. As a Data Collector... by Anonymous Coward · · Score: 0

    > Various cataloging systems are fraught with duplicates and input errors, and only encompass a fraction of the total distinct titles.

    You callin' me a liar?

  15. Wow by demonbug · · Score: 2, Insightful

    They should write a book!

    1. Re:Wow by Suki+I · · Score: 1

      I became your fan for that :)

  16. Seriously... by clickclickdrone · · Score: 1

    Who cares? Does it matter?

    --
    I want a list of atrocities done in your name - Recoil
    1. Re:Seriously... by SomeJoel · · Score: 3, Insightful

      Who cares? Does it matter?

      Does anything?

      --
      <Complete your profile by adding a signature!>
    2. Re:Seriously... by Flea+of+Pain · · Score: 1

      Mod parent up...+1 emo.

      --
      Do not argue with an idiot. He will drag you down to his level and beat you with experience.
    3. Re:Seriously... by Smauler · · Score: 1

      No... don't be an asshole, GP won't like that. Mod GP down -1 Emo.

  17. Helluva encryption ratio by Anonymous Coward · · Score: 0

    OK, so now we can represent every text in the world with a 32-bit key. We just need the world's fanciest decryption algorithm to recover the texts...
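The arithmetic behind the joke checks out. A 32-bit key can index about 4.29 billion items. An index only "encodes" a text if both sides already hold the same numbered list of books, and that shared list is the "fanciest decryption algorithm" part. Here is a quick Python check of how many bits the count actually needs:

```python
books = 129_864_880

# Indices 0 .. books-1 need this many bits.
bits_needed = (books - 1).bit_length()
print(bits_needed)     # 27
print(2**32 // books)  # 33: a 32-bit key has room for 33 such collections
```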

    1. Re:Helluva encryption ratio by dgatwood · · Score: 1

      Ooh. I've got it. We'll call it the Library of Congress crypto scheme. We could use it for encrypting other stuff, too. Any arbitrary word could be encoded as an LOC identifier, a page number, and an offset in bytes or words. Man, wouldn't that suck to decrypt?
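What is described here is essentially a classic book cipher. Below is a toy Python sketch restricted to a single shared text split into pages, with words encoded as (page, word offset) pairs. The two "pages" are the Ecclesiastes 12:12 line split in half. The function names are hypothetical. This is an illustration, not a secure scheme: book ciphers fall once the key text is identified, and this version raises `KeyError` for words not in the text.

```python
Index = dict[str, list[tuple[int, int]]]


def build_index(pages: list[str]) -> Index:
    """Map each lowercase word to every (page, word offset) where it occurs."""
    index: Index = {}
    for page_no, page in enumerate(pages, start=1):
        for offset, word in enumerate(page.lower().split()):
            index.setdefault(word, []).append((page_no, offset))
    return index


def encode(message: str, index: Index) -> list[tuple[int, int]]:
    # Always using the first occurrence is a simplification; real book
    # ciphers vary the choice so repeated words don't repeat codes.
    return [index[w][0] for w in message.lower().split()]


def decode(codes: list[tuple[int, int]], pages: list[str]) -> str:
    return " ".join(pages[p - 1].lower().split()[o] for p, o in codes)


pages = [
    "of making many books there is no end",
    "much study is a weariness of the flesh",
]
codes = encode("no study", build_index(pages))
print(codes)                 # [(1, 6), (2, 1)]
print(decode(codes, pages))  # no study
```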

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    2. Re:Helluva encryption ratio by Anonymous Coward · · Score: 0

      Yeah, hmm..... evil grin

  18. 1 in 50 people wrote a book by doconnor · · Score: 1

    If you divide the number of books by the current world population, you get one unique book for every 50 people. In other words, on average one in 50 people wrote a book, and that population includes many poor and illiterate people and children.

    Of course, some book writers have died and many have written more than one book, but I suspect that most books have been written recently and their writers are still alive.

    If you only include adults who live a comfortable Western lifestyle, it may be as high as one in 10.
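Here is the division the comment describes, as a small Python calculation. It assumes a world population of about 6.9 billion, roughly the 2010 figure, which is not stated in the thread. Note that it measures books per person, not authors per person.

```python
books = 129_864_880
world_population = 6_900_000_000  # assumption: approximate 2010 figure

people_per_book = world_population / books
print(round(people_per_book))  # 53, i.e. roughly one book per 50 people
```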

    1. Re:1 in 50 people wrote a book by SomeJoel · · Score: 2, Insightful

      I suspect that most books have been written recently and their writers are still alive.

      And I suspect that you are full of crap.

      --
      <Complete your profile by adding a signature!>
    2. Re:1 in 50 people wrote a book by longhairedgnome · · Score: 1

      I wish I had mod points for you sir.

      --
      GENERATION O98346: The first time you see this, copy it into your sig and remove a random number from the generation. T
    3. Re:1 in 50 people wrote a book by doconnor · · Score: 1

      90% of all scientists who ever lived are alive today and many of the books have been written by scientists.

      The percentage may not be as high for all authors, but I think it would be close.

    4. Re:1 in 50 people wrote a book by maxwell+demon · · Score: 1

      I suspect that most books have been written recently and their writers are still alive.

      Indeed, just yesterday I met Shakespeare. He was talking with Lewis Carroll and Douglas Adams. Unfortunately I couldn't talk to them, because Plato was just coming around the corner, arguing with Aristotle and Kant about some philosophical problem, and I would have been in their way. On the other side of the room, Mao was arguing with the evangelists about who had written the better Bible. Karl Marx didn't help Mao, because he was too busy talking to Adam Smith about whether the invisible hand was good or evil. Dante and Kafka were talking about whether hell was absurd, while Agatha Christie was arguing with Arthur Conan Doyle and Edgar Allan Poe about how to write good crime stories.

      --
      The Tao of math: The numbers you can count are not the real numbers.
    5. Re:1 in 50 people wrote a book by mcgrew · · Score: 1

      Isaac Asimov wrote over 500 books. I don't know how many Terry Pratchett has written, but the number is in the dozens. There's Clarke, Heinlein, Niven... and those are just a few science fiction writers (yes, Asimov also wrote nonfiction and Pratchett is known mainly for fantasy). Serious authors write more than one book each.

      So your average is a little meaningless.

    6. Re:1 in 50 people wrote a book by doconnor · · Score: 1
    7. Re:1 in 50 people wrote a book by SlippyToad · · Score: 1

      Given the enormous explosion in literacy and printing press technology over the last 100 years, I would say he's probably closer than you think. Also, it's estimated that human knowledge doubles every 7 years -- that would mean a doubling of the number of things written down or published.

      What would resolve this is to discover how many books existed 100 years ago, and 50 years ago.

      --
      One day I feel I'm ahead of the wheel / the next it's rolling over me / I can get back on / I can get back on
    8. Re:1 in 50 people wrote a book by Smauler · · Score: 1

      A surprisingly large proportion of the humans who have ever lived are actually alive now (most people estimate it at about 10%). It is _way_ easier to publish a book now than it was even 100 years ago.

      I'm not saying you're wrong about GP's assumptions made, but personally I'd guess he's right. That's just a guess though ;).

    9. Re:1 in 50 people wrote a book by pz · · Score: 3, Informative

      Isaac Asimov wrote over 500 books. I don't know how many Terry Pratchett has written, but the number is in the dozens. There's Clarke, Heinlein, Niven... and those are just a few science fiction writers (yes, Asimov also wrote nonfiction and Pratchett is known mainly for fantasy). Serious authors write more than one book each.

      So your average is a little meaningless.

      No, averages are very meaningful. Extremely meaningful. They are the AVERAGE (usually the mean), which means that some values will be above, and some values will be below. The idiocy comes in when people mistakenly jump to the conclusion that just because an average exists, it means that every value must be exactly the same as the average. Or, just because you can find extreme values far away from the average that again the average is not meaningful.

      If the average states that 1 in 50 people have written a book, then, by gum, it will be easy to find plenty of people who have written zero books, somewhat fewer who have written exactly one (something below 1 in 50), much fewer who have written exactly two, even fewer who have written exactly three, etc. That does not mean that example authors with hundreds of books cannot exist, it only bounds how frequent they can be.

      Of the myriad of ideas that the academic community has utterly failed in educating the general public about, it's the relationship between averages and distributions. One more time: just because an average exists, it does not mean that every datum has the same value as the average. As an example, just because the average male in the US is 5' 9", it does not mean that every single male is that tall, nor that you will not find ones that are shorter, taller, or even much shorter or much taller. The tallest man (according to my 20 seconds of research through Google) was 8' 11", and the shortest was 1' 10" ... does that lessen the meaningfulness or utility of the average male height? Rather the contrary: it provides important information as to the extent of the distribution of heights.

      Now, I suspect that the parent poster is trying to say that because -- by loosely founded speculation -- most authors are professional authors ("serious authors") and therefore will have more than one book to their name, the classification of people into authors and non-authors will be skewed against 1:50. I would not argue against that (in fact, I indirectly argued for it above). Nevertheless, using the utterly non-scientific sample of the books above my desk, most authors have only one book to their name, so the number isn't going to be much worse than 1:50, perhaps 1:55 or 1:60. That kind of pure, unadulterated speculation is exactly the sort I would love to see proved wrong with hard data.

      --

      Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
    10. Re:1 in 50 people wrote a book by mattack2 · · Score: 1

      The Straight Dope, in 1987, said:

      Demographers have come up with estimates ranging between 69 billion and 110 billion humans.

      http://www.straightdope.com/columns/read/413/how-many-people-have-lived-on-earth-since-the-dawn-of-time
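The Straight Dope range quoted above can be checked against the "about 10% are alive now" figure from earlier in the thread. The sketch below assumes about 6.9 billion people alive, roughly the 2010 figure, which is not from the thread.

```python
def share_alive(living: float, ever_lived: float) -> float:
    """Fraction of all humans who ever lived that are alive now."""
    return living / ever_lived


LIVING = 6.9e9  # assumption: approximate 2010 world population
for ever_lived in (69e9, 110e9):  # the Straight Dope range quoted above
    share = share_alive(LIVING, ever_lived)
    print(f"{ever_lived / 1e9:.0f} billion ever lived -> {share:.1%} alive now")
# 69 billion ever lived -> 10.0% alive now
# 110 billion ever lived -> 6.3% alive now
```

So "about 10%" sits at the top of that range, and the low end is closer to 6%.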

    11. Re:1 in 50 people wrote a book by mattack2 · · Score: 1

      So you're dead, and talking to us from Riverworld, right?

    12. Re:1 in 50 people wrote a book by mcgrew · · Score: 1

      One more time: just because an average exists, it does not mean that every datum has the same value as the average.

      That was the point I was trying to make. If there is one book written for each group of fifty people, the average would be one in fifty but the actual number of authors would be less than one in fifty people, probably far less. But as you say, there's no way of knowing how much less without actual data.
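The distinction drawn in this exchange shows up in a toy distribution, invented purely for illustration. In the example, the mean is exactly one book per 50 people. Because a few people write several books each, fewer than 1 in 50 people are authors.

```python
from collections import Counter

# Invented toy population of 1,000 people: books written per person.
books_per_person = [0] * 988 + [1] * 8 + [2] * 3 + [6] * 1

people = len(books_per_person)
books = sum(books_per_person)
authors = sum(1 for b in books_per_person if b > 0)

print(f"books per person: 1 in {people / books:.0f}")      # 1 in 50
print(f"authors per person: 1 in {people / authors:.0f}")  # 1 in 83
print(Counter(books_per_person))  # the distribution behind the average
```

The average alone cannot tell 1-in-50 authors apart from 1-in-83. You need the distribution, which is why real data would settle the question.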

  19. Units by Anonymous Coward · · Score: 0

    I'm not sure I follow.... How much is that in Libraries of Congress?

  20. Old News by rssrss · · Score: 1

    Qoh.12 [12] ... Of making many books there is no end,

    --
    In the land of the blind, the one-eyed man is king.
  21. Ph D Thesis ? by Anonymous Coward · · Score: 0

    Do Ph.D. thesis manuscripts (and other academic writings) count as books? If so, I bet there are many more than "only" 130e6...

  22. They could just use by kilodelta · · Score: 1

    The same checksum they use for UPC codes. Sum up the 10 significant digits. Then take that sum(S) and push up to the next tens unit(T). The difference of T-S = check digit.

    E.g. UPC code 54556 39824. Sum is 51. Next tens is 60. 60-51=9 so the check digit is 9. The same basic formula could work for ISBN numbers too.

    1. Re:They could just use by oggiejnr · · Score: 1

      ISBN check codes are designed to catch common errors back when hand entry was common -

      a run of two digits in the wrong place (e.g. 556 instead of 566)
      a mistyped digit
      two digits swapped around by one place

      The UPC code does not support the latter, in exchange for only requiring the check symbol to be one of 10 values regardless of the number of digits in the code. The ISBN algorithm requires n+1, where n is the number of data digits. Whether this is required nowadays, given that very few ISBNs are entered by hand, is another issue.
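For reference, the real UPC-A check digit is not a plain digit sum. It weights alternating digits by 3 and 1, then takes the complement modulo 10. ISBN-10 instead weights the nine data digits 10 down to 2 and works modulo 11, using "X" for a check value of 10. That catches every single-digit error and every adjacent transposition. The 3-1 scheme misses swaps of adjacent digits that differ by 5. Since 2007, ISBNs are issued as ISBN-13, which uses the EAN/UPC-style 1-3 weighting. Below is a sketch of all three in Python, checked against the commonly cited examples 036000291452, 0-306-40615-2 and 978-0-306-40615-7.

```python
def upc_check_digit(data: str) -> int:
    """UPC-A: weights 3,1,3,1,... from the left over the 11 data digits."""
    total = sum(int(d) * (3 if i % 2 == 0 else 1) for i, d in enumerate(data))
    return (10 - total % 10) % 10


def isbn10_check_digit(data: str) -> str:
    """ISBN-10: weights 10..2 on the 9 data digits, modulo 11 ('X' = 10)."""
    total = sum(int(d) * w for d, w in zip(data, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def isbn13_check_digit(data: str) -> int:
    """ISBN-13 (EAN-13): weights 1,3,1,3,... over the 12 data digits."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(data))
    return (10 - total % 10) % 10


print(upc_check_digit("03600029145"))      # 2
print(isbn10_check_digit("030640615"))     # 2
print(isbn13_check_digit("978030640615"))  # 7

# Swapping adjacent digits 1 and 6 (difference 5) fools UPC but not ISBN-10.
print(upc_check_digit("16000000000") == upc_check_digit("61000000000"))  # True
print(isbn10_check_digit("160000000") == isbn10_check_digit("610000000"))  # False
```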

  23. can't grok the numbers... by Anonymous Coward · · Score: 1, Funny

    129,864,880 different books? What is that in Libraries of Congress?

    1. Re:can't grok the numbers... by bannable · · Score: 1

      Just about one and a half.

      --
      "If you see a man on a horse, he is likely an enemy. Kill the man and eat the horse."
    2. Re:can't grok the numbers... by Anonymous Coward · · Score: 0
  24. Re:How do you define "different version"? by dgatwood · · Score: 2, Funny

    You read the article?

    Impostor! Burn the witch!

    --

    Check out my sci-fi/humor trilogy at PatriotsBooks.

  25. 129,864,880 published books, that is. by andrewagill · · Score: 4, Insightful

    How about the books that people write and spread around to friends or books published by small in-house printshops, often as promotional material? Books written before ISBN that are still in libraries but no longer published (Bodoni's type specimens come to mind, though it looks like some of these are indeed catalogued by WorldCat)? Books that were printed years ago that we know we lost to the ages (the lost Gospel of Barnabas--not the forged Gospel of Barnabas--comes to mind). What about the books that we never knew existed?

    This estimate isn't bad for published works, but it does not adequately answer the question posed, ``Just how many books are out there?''

  26. Re:How do you define "different version"? by Smauler · · Score: 3, Informative

    Look at textbooks - new editions that are almost indistinguishable from the previous editions have new ISBNs. Do we count every single one as a different book?

    From TFS: if they contain different ISBNs, then they definitely describe different books

    If they're using this method, GP's point is valid. The books are not really new books, they're essentially the same as previous editions but have different ISBNs. In essence, these new editions with new ISBNs are being counted twice (or more) for very small revisions to the same book.

  27. Stately homes by Anonymous Coward · · Score: 0

    Visit some of the stately homes of England and it will be obvious that there are lots and lots of books that are unlikely to be in very many libraries but which contain lots of fascinating historical and geographical info. Things like "The History of Our County" or "Memoirs of My Service as a Priest in This Parish". Many of these homes are operated by the National Trust, but often the home and its contents are still privately owned. It would take a lot of work to get access to scan this stuff, but I would love to see it done. There are thousands of small local museums and libraries throughout the world with lots of regional information, garnered from the estates of prominent citizens who died. Google has only scratched the surface with their scanning to date.

  28. Re:How do you define "different version"? by natehoy · · Score: 2, Informative

    From TFA: Well, it all depends on what exactly you mean by a “book.” We’re not going to count what library scientists call “works,” those elusive "distinct intellectual or artistic creations.” It makes sense to consider all editions of “Hamlet” separately, as we would like to distinguish between -- and scan -- books containing, for example, different forewords and commentaries. (emphasis mine)

    For Google's definition of what constitutes a unique work as used to derive the stated quantity, the use of ISBN as described is perfectly valid. They are OK with "almost the same work" != "the same work".

    So their counting methodology would consider "Fundamentals of Math 3rd Ed by I. M. Counting" to be a distinct work from "Fundamentals of Math 4th Ed by I. M. Counting".

    In fact, if the publisher released a paperback version, it would be considered another separate work, because the typesetting and page layouts may differ, and might include different forewords, different pages on the index, etc.

    It's a separate and distinct work, from Google's point of view, where they are trying to index the works that they want to scan.

    Remember, their goal is to capture as much as possible of the entire sum of human writing. A different foreword is a unique work to them.

    Of course, you can then disagree with Google's counting methodology, which is fine. If you do, then the number they have reached for their purposes is meaningless to you and you'd better start counting based on your own definition.

    It'll take a while, good luck, and let us know what you come up with. :)

    --
    "This post contains words, known to the State of California to cause thought. Wash brain thoroughly after reading."
  29. All You Need is ONE Book by Anonymous Coward · · Score: 0
    Dianetics by L. Ron Hubbard...

    (Bet you thought I was going to say the Bible. Wrong, I'm crazier than that!)

  30. 129,864,880 different books by Anonymous Coward · · Score: 0

    there are 129,864,880 different books in the world

    So how many library of congresses is that?

  31. ISBN sucks for digital books by bcrowell · · Score: 3, Insightful

    ISBNs suck as identifiers for digital books, especially digital books that are free. There are two problems.

    Problem number one is that they cost money. Let's say someone writes up a really nice manual documenting some open-source software. He wants the manual to be free, just like the software. But now if he wants an ISBN, he has to pay money to get the ISBN, which means expending dollars on a book that is not going to be bringing in any dollars. The fact that ISBNs cost money is out of step with the fact that we have this thing called the World Wide Web, which is basically a huge machine for letting people do publishing without the per-copy costs that are associated with print publishing.

    The other problem is that ISBNs are supposed to uniquely identify an edition of the book. This makes sense for traditional print publishing, where the economics of production forced people to make discrete editions widely spaced in time. It makes no sense for print on demand or for pure digital publishing. I've written some CC-licensed textbooks. When someone emails me to let me know about a typo or a factual error, I fix it right away in the digital version, and I usually update the print-on-demand version within about 6 months. No way am I going to assign a different ISBN every 6 months.

    We can say that ISBNs are for printed books, not for ephemeral web pages, but that doesn't really work. The two overlap. My textbooks exist simultaneously as web pages, pdf files, and printed books. Amazon sells a book for the Kindle using one ISBN, assigning a different ISBN to the printed version. Print-on-demand books share some characteristics with printed books (e.g., they're physical objects) and some with the web (they can be updated continuously).

    By the way, why do you think library catalogs don't show ISBNs? It's because ISBNs are meant as commercial tools, like the barcode on a box of cereal. If google finds ISBNs useful for other purposes than selling copies of books, it's probably because google is trying to deal with a massive number of books using a minimum amount of human labor.

  32. So ~200TB = "All The Books" by billstewart · · Score: 1

    A typical book is in the range of 1-2MB of text, assuming you're representing actual letters, as opposed to scanned images of the text, and ignoring illustrations, pictures, etc. So if there are about 130 million books, that's about 200TB to store them uncompressed, maybe 50TB compressed. If you've got multiple versions that are almost identical (e.g. Third Printing from Paperback Publisher B has a different copyright page than First Printing from Hardback Publisher A, and maybe a different cover illustration and blurbs on the back cover), then the different versions add only a percent or two.

    As a cross-check, Wikipedia says the Library of Congress has about 20 million books (in a collection of 100 million things), and The InterWebs say that the Library of Congress is about 20TB (not clear if that's just books or not). That works out to about 1MB per book, so 130 million books would be about 130TB uncompressed; it fits on the back of the same envelope.

    So for about $5000 of computer equipment, your town or school could have its own copy of The Library, with All The Books.
    So far, The Internet Archive has digitized about a million books - you could probably fit that onto 1-2 Blu-ray discs.
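    The envelope arithmetic above, as a few lines of Python so the assumptions are easy to vary. The 1-2MB per book and the 3:1 to 4:1 compression ratios are assumptions taken from this thread, not measured values, and decimal terabytes (1TB = 1,000,000MB) are assumed throughout.

```python
# Back-of-envelope storage estimate. BOOKS is the figure from the story;
# the per-book size and compression ratios are assumptions, not measurements.
BOOKS = 129_864_880
MB_PER_BOOK = (1.0, 2.0)   # assumed plain-text size range per book, in MB
RATIOS = (3.0, 4.0)        # assumed lossless ratios for natural-language text

LOC_TB = 20                # Library of Congress size quoted above, in TB
LOC_BOOKS = 20_000_000     # and its book count


def terabytes(books: int, mb_each: float) -> float:
    """Total size in decimal terabytes (1 TB = 1,000,000 MB)."""
    return books * mb_each / 1_000_000


if __name__ == "__main__":
    lo, hi = (terabytes(BOOKS, mb) for mb in MB_PER_BOOK)
    print(f"uncompressed: {lo:.0f}-{hi:.0f} TB")
    for r in RATIOS:
        print(f"at {r:.0f}:1 -> {lo / r:.0f}-{hi / r:.0f} TB")
    mb_per_loc_book = LOC_TB * 1_000_000 / LOC_BOOKS  # works out to 1 MB per book
    print(f"LoC cross-check: {terabytes(BOOKS, mb_per_loc_book):.0f} TB")
```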

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
    1. Re:So ~200TB = "All The Books" by bluefoxlucid · · Score: 1

      I just compressed a 2.0GB plain text log file to 42MB, so I imagine compressing a book would get roughly 40:1. So 200TB would compress to more like 5TB. If we used a sort of cell-shading on colorful images with flat color regions, we could turn them back into simple images that compress very well as PNG; same with black and white illustrations, although restoring colored pencil on paper would pose difficulty (really, we want to identify the paper and replace it with a paper-like background texture).

      So maybe 10TB or so for everything.

  33. what about pre-20th century works? by morethanapapercert · · Score: 1
    OK, I'm a bad little slashdotter, I actually RTFA. I noticed a few things:

    1) TFA actually acknowledges that the ISBN is very North America-centric, but the other cataloging types are also either N.A.-centric or at least western world-centric.
    2) The entire article is based on efforts to simply compile a list of books by aggregating and loosely filtering/sorting several other lists. The lists mentioned are, as far as I know, all heavily biased toward 19th and 20th century works. (The article explicitly mentions that one problem is that it doesn't include numerous works not intended for commercial consumption, such as doctoral theses and so on.)

    I would argue that the most important works to digitize first are not the low-hanging fruit of works already cataloged and, in most cases, existing in multiple copies in multiple locations. (We are at little risk of losing the works of Dan Brown, cited in the article, to the depredations of time during the scope of this project.) To me, the most important works to get digitized are those that exist in only one or two copies, are possibly hundreds of years old, and are moldering away forgotten on the back shelves of some monastery or filed and forgotten in the bowels of some museum.
    What I'd like to see is Google and a few other digital data industry leaders get together and create a bounty system for old books. Simply put: The Global Translation Movement will pay, say, a buck a page multiplied by the confirmed age of the book in question. (Similar pay scales would have to be worked out for those really old "books" that consist of wood tablets, bamboo or papyrus strips and so on.) The project would need to go out of its way to contact old monasteries, nunneries, temples, museums and so forth. A 200-page folio that is 250 years old nets $50,000 for the monastery that scans it and shares the digital copy with the world. My inspiration for this came from the Islamic Translation Movement of medieval times.
    You could do similar bounties for translations as well into four or five of the world's most widespread languages. (Chinese, English and Arabic come to mind.)
    If I were some kind of intellectual or academic authority, this is something that I'd seriously pitch at the next TED Talk...

    --
    I need a wheelchair van for my son. Help me get the word out. https://www.gofundme.com/wheelchair-van-for-jj
  34. 129,864,880 different books in the world by briniel · · Score: 1

    and there are even more in Lucien's library in the Dreaming.

  35. Re:How do you define "different version"? by Smauler · · Score: 1

    I was not proposing a new method of counting books... I was only supporting the OP in his assertion that their method contains limitations regarding repetition of works with minor differences.

    I was mainly responding to those who just said RTFA without seeing basic facts in TFS.

  36. I considered that, but there's a problem... by N0Man74 · · Score: 1

    So if I wrote a book about this, should I call it "The 129,864,880 Books That You Must Read Before You Die", or "The 129,864,881 Books That You Must Read Before You Die"?

  37. I Like 'Em (Books) by Anonymous Coward · · Score: 0

    I am curious about the characterization of ancient texts. Does the ISBN system take account of books written before the ISBN was created? After all, books have been around for a very long time. The printing press made books inexpensive and pervasive, but books existed long before.

    Take a famous example, the Gutenberg Bible. Does it have an ISBN number? Now a much more difficult one: How about the Code of Hammurabi, which was "published" on clay tablets? How about the Dead Sea Scrolls, at least the intact ones? And what about some of the Mayan books, which are incredibly rare? How about some of the Egyptian texts, written on papyrus?

    It would be interesting to know what qualifies as a "book".

  38. I'm pretty sure... by johosaphats · · Score: 0

    Stephen King has written at least that many.

  39. Boobs by nasta · · Score: 1

    Damn, I first read "Counting the World's Boobs"!

  40. READ THEM ALL by Anonymous Coward · · Score: 0

    Wish I wasn't so proficient at speed reading ... and multilingual.

    Not much left to do after reading them all except to play Grand Theft Auto for the rest of my life I guess.

  41. Define: Book by Anonymous Coward · · Score: 0

    They seem to define what a book is, but what is it really?

    That is one of the things my dad ended up defining for his book cataloging (handling books, magazines, short stories, etc).

    He defined it as a work, which is contained in a volume. (Looking at the article's link on what a work is, that seems similar to my dad's definition: for him it is the actual text, regardless of its form.)
    A volume is a container holding anywhere from one to many works (think collections of short stories, one of the reasons he designed his database that way). Sometimes the same work is in multiple volumes (think of a short story that runs in a magazine and is later placed into a collected-works book: same work, multiple volumes. Same idea with paperback versus hardback: the same thing in different forms, usually at least). Or how about the same work from two different publishers? (We have two copies of The Fellowship of the Ring, one from Ace and one from another publisher whose name escapes me at the moment. Assuming they are the same, should we count one or two? If the content is the same, does it matter that we have two physically different objects?)

    How do you handle a publisher that makes changes between printings but doesn't mark it as a new edition? (Maybe spelling corrections? That would likely keep the same ISBN.)

    Take the Hamlet example in the article, with different forewords and commentaries. Are they truly unique, or could you treat them as different containers, each containing Hamlet plus something else? (The containers will be unique, but some of the content inside will not be, assuming they don't modify Hamlet itself.)

    So I guess what I am asking is: are we counting the content (regardless of physical container), or the physical objects? (Not sure how to label digital works in this case.)
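    The work/volume split described above is a many-to-many relationship: a volume holds several works, and a work can appear in several volumes. A minimal Python sketch, assuming works are identified by title and author alone (a real catalogue would need something sturdier); the class names and the sample stories are hypothetical. Counting distinct volumes and distinct works gives two different answers to "how many books?".

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Work:
    """The text itself, regardless of the physical form it appears in.
    Frozen so it is hashable and equal works collapse in a set."""
    title: str
    author: str


@dataclass
class Volume:
    """A physical or digital container holding one or more works."""
    label: str
    works: list[Work] = field(default_factory=list)


def count(volumes: list[Volume]) -> tuple[int, int]:
    """Return (distinct volumes, distinct works) across a catalogue."""
    distinct_works = {w for v in volumes for w in v.works}
    return len(volumes), len(distinct_works)


def example_catalogue() -> list[Volume]:
    fellowship = Work("The Fellowship of the Ring", "J. R. R. Tolkien")
    story_a = Work("Story A", "Author X")  # hypothetical short stories
    story_b = Work("Story B", "Author Y")
    return [
        Volume("Ace paperback", [fellowship]),
        Volume("Second publisher's edition", [fellowship]),
        Volume("Magazine issue", [story_a]),            # story first runs here...
        Volume("Collected stories", [story_a, story_b]),  # ...then is collected
    ]


if __name__ == "__main__":
    volumes, works = count(example_catalogue())
    print(f"{volumes} volumes, {works} works")
```

    Counting volumes answers the "physical objects" question and counting works answers the "content" question; the Hamlet case fits the same model by storing each foreword as its own work alongside a shared Hamlet work.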

  42. Efficiency of lossless compression by billstewart · · Score: 1

    Log files are typically very structured low-entropy data. With random natural-language text you seldom get better than 3 or 4 to 1 lossless compression. Image compression can do better, but that's typically already been done to get the JPG/PNG/GIF/etc., and it's typically lossy, and of course video compression is much better because most of an image doesn't change much from frame to frame. But in this case they're trying to OCR the data, so much of that image compressibility has already been replaced (because you're using one byte to represent the letter instead of a bunch of bytes representing black marks on white paper.)
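    Whether a 40:1 figure measured on a log file says anything about books depends on how redundant the input is. A rough Python sketch to compare: it compresses synthetic, template-based syslog-style lines and uniformly random letters with the standard-library bz2 and lzma modules. Both inputs are made up for illustration and neither is real book text; they bracket the range, since uniformly random letters from a 27-symbol alphabet carry about log2(27) ≈ 4.75 bits per character and so cannot compress much beyond 1.7:1, while natural English falls somewhere in between.

```python
import bz2
import lzma
import random
import string


def ratio(data: bytes, compress) -> float:
    """Original size divided by compressed size."""
    return len(data) / len(compress(data))


def log_like(n_lines: int, seed: int = 0) -> bytes:
    """Synthetic syslog-style lines from a handful of templates (low entropy)."""
    rng = random.Random(seed)
    templates = [
        "Jan 12 03:14:{s:02d} host sshd[{p}]: Accepted publickey for backup from 10.0.0.{o}\n",
        "Jan 12 03:14:{s:02d} host kernel: [UFW BLOCK] IN=eth0 SRC=10.0.0.{o} DPT=23\n",
        "Jan 12 03:14:{s:02d} host CRON[{p}]: (root) CMD (run-parts /etc/cron.hourly)\n",
    ]
    # str.format ignores keyword arguments a template doesn't use.
    return "".join(
        rng.choice(templates).format(
            s=rng.randrange(60), p=rng.randrange(1000, 1100), o=rng.randrange(1, 20)
        )
        for _ in range(n_lines)
    ).encode()


def random_letters(n: int, seed: int = 0) -> bytes:
    """Lowercase letters and spaces drawn uniformly at random (high entropy)."""
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase + " "
    return "".join(rng.choice(alphabet) for _ in range(n)).encode()


if __name__ == "__main__":
    logs = log_like(10_000)
    noise = random_letters(len(logs))
    for name, fn in (("bz2", bz2.compress), ("lzma", lzma.compress)):
        print(f"{name}: logs {ratio(logs, fn):.1f}:1, random letters {ratio(noise, fn):.1f}:1")
```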

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
    1. Re:Efficiency of lossless compression by bluefoxlucid · · Score: 1

      Structure has nothing to do with compression: only content affects redundancy. In this case, a lot of it is IP and IDS logs, mixed with system logs, etc... it's a log of everything that touches syslog, basically. And bzip2 uses 900,000 byte blocks, so anything more than a meg away is irrelevant.

      English text compresses rather well in any case, as it is by nature well-structured and redundant. I used to break shitty encryption of uncompressed English text with stuff as simple as two-tuple frequency counting, which let me determine the key length; then I assumed the most frequent character in each key position was a space, worked out which key byte applied to which character and where common characters ended up under the encryption, and regenerated the key.

      A special-case lossless text compressor would be pretty spectacular, but ultimately not that much better than Lempel-Ziv-Markov for general use.
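      The "assume the most frequent character is a space" step described above can be sketched against a repeating-key XOR cipher. That cipher is an assumption (the comment doesn't say what the weak encryption was), the key length is taken as already known rather than recovered by frequency counting, and xor_repeating and recover_key are hypothetical names. The sample plaintext is a repeated pangram, chosen so that a space is the most common byte in every key position.

```python
from collections import Counter


def xor_repeating(data: bytes, key: bytes) -> bytes:
    """Toy 'encryption': XOR each byte with the key, repeated as needed.
    Applying it twice with the same key gives back the original."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def recover_key(ciphertext: bytes, key_len: int) -> bytes:
    """Assume the most common plaintext byte in every key position is a space
    (0x20): the most common ciphertext byte in that column, XOR 0x20, is then
    that position's key byte."""
    key = bytearray()
    for col in range(key_len):
        column = ciphertext[col::key_len]  # every byte encrypted with key[col]
        most_common_byte, _ = Counter(column).most_common(1)[0]
        key.append(most_common_byte ^ 0x20)
    return bytes(key)


if __name__ == "__main__":
    plaintext = b"the quick brown fox jumps over the lazy dog " * 50
    secret = b"k3y!Z"
    ct = xor_repeating(plaintext, secret)
    guess = recover_key(ct, len(secret))
    print(guess, xor_repeating(ct, guess)[:44])
```

      On real text the guess is usually right for most positions rather than all of them; the stray wrong bytes show up as garbled letters at regular intervals and can be fixed by hand.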