Slashdot Mirror


Google Is Using AI To Digitize 5 Million Historical Photos (theverge.com)

Google is working with The New York Times to digitize its huge collection of about 5 million historic images. The pictures are apparently kept in the newspaper's "morgue," which contains pictures going back to the 19th century, many of which exist nowhere else in the world. The Verge reports: That's why the company has hired Google, which will use its machine vision smarts to not only scan the hand- and type-written notes attached to each image, but categorize the semantic information they contain (linking data like locations and dates). Google says the Times will also be able to use its object recognition tools to extract even more information from the photos, making them easier to catalog and resurface for future use.

14 of 30 comments

  1. Dupe by dereference · · Score: 2

    From earlier today no less.

  2. Re:I hate c6gunner... apk by Anonymous Coward · · Score: 1

    Your whole life story consists of fucking with people and thinking that THIS time they won't fuck with you back.

  3. Dupe Dupe Dupe by Stonent1 · · Score: 2, Funny

    Dupe of URL Dupe dupe dupe of URL.

  4. How does AI work on non-digitized photos? by mark-t · · Score: 1

    Don't the photos have to already *BE* digitized for AI to process them in the first place?

  5. After digitization, correlation and AI scanning by bobstreo · · Score: 1

    the images will be given to BookFace for more friend suggestions. /s

    I hope they don't destroy the originals, the longevity of image storage on silver nitrate media beats any digital technology ever invented.

    1. Re:After digitization, correlation and AI scanning by jd · · Score: 2

      Silver nitrate and the original magnetic core memory had about the same lifespan. However, you run into problems of size, speed of access, etc. For those youngsters who never encountered it, core memory was an improvement on the Williams memory device that had been used in previous generations of machines. It used slowly-decaying magnetic fields to store information for 100+ years. This made it the world's first electronic non-volatile storage. At a density of 32 kbits per cubic foot, it was also very inefficient.

      So, whilst it's not quite correct to say that old-fashioned film is better than all digital in terms of longevity, it's better than most and those that come close in longevity don't come close in information density. High-grade medium-format film carries a LOT of information in a couple of square inches.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)

  6. Upload everything to Archive.org too by Anonymous Coward · · Score: 1

    We cannot trust Google or any other corporation to preserve those images. Please also upload a copy of all those pictures to the Internet Archive (archive.org); otherwise, when this content stops being good business for Google and/or The New York Times, it will be lost forever. Maybe someday we will get an AI for Archive.org, and at that moment we will rock all the knowledge of the world!!!

  7. Re: how is google doing the digitizing? by batukhan · · Score: 1

    Apparently Google has the AI which makes it tremendously simpler to scan and label millions of photos somehow. /s No, people will still do the scanning and labeling. But they will also use AI to generate more labels, which requires more people, but a special Google kind of people.

  8. nice article by Abhishek+ku · · Score: 1

    Excellent article..

  9. I, too, have an archive. by jd · · Score: 2

    My collection is a few tens of thousands of photographs from three families dating from 1880 or so to 1980. A proper scan that gets out the greatest amount of actual information has yielded an average of about a gigabyte per photograph, so far. That's a lot of information. So my pathetically small collection holds a few tens of terabytes of data.

    I've absolutely no idea how I'm going to store that kind of volume of data, it's not like Google will offer.

    But I bet you a dozen doughnuts that there are thousands of families in the same boat, who have vast collections of negatives that they'll destroy because they don't have room and don't see an immediate value in.

    I also bet that if those thousands of families could be persuaded to get those images scanned, if they'd be willing to contribute to the cost of the collating and storage costs, that it would seriously change the way the past is seen by historians, family history fans and archivists.

    And I'd bet that such an archive would have a profoundly greater impact than the NYT archive would.

    It won't happen because those aforementioned families will be stubborn and prefer destruction over conservation, because none of the cloud vendors would be interested in helping disseminate information of individually uncertain value, and because most people see history as someone else's problem.

    1. Re:I, too, have an archive. by Anonymous Coward · · Score: 1

      You don't need a gigabyte per photo. It sounds like that's an uncompressed 16-bit TIFF at the scanner's maximum resolution. Unless the photos were taken with a 4x5 or larger view camera and you have the negatives/glass plates or large prints, the photos don't really contain that much information. Try exporting one as a 10 megapixel JPEG and compare it side by side with the TIFF, scaled to the same size. You probably won't be able to tell the difference.

      You can also compress the TIFFs with LZW or ZIP, and for black and white photos if you don't care about preserving tinting you can convert them to grayscale. Both will dramatically decrease your file sizes.
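As a rough check on the sizes discussed above, here is a minimal sketch of the arithmetic for uncompressed scans. The frame size (2.25 in square) and the 6400 dpi setting are assumptions chosen for illustration, not figures given in the thread, and the function name is hypothetical. Real TIFF files add headers, and LZW or ZIP compression would shrink them further.

```python
def uncompressed_bytes(width_in, height_in, dpi, channels, bits_per_channel):
    """Raw pixel data size of a scan, ignoring file headers and compression."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * channels * bits_per_channel // 8


# Assumed: a 2.25 in square frame scanned at 6400 dpi as 16-bit RGB.
full_scan = uncompressed_bytes(2.25, 2.25, 6400, channels=3, bits_per_channel=16)

# The same frame stored as 8-bit grayscale (tinting discarded).
gray_scan = uncompressed_bytes(2.25, 2.25, 6400, channels=1, bits_per_channel=8)

# A 10-megapixel image decoded to 8-bit RGB, before any JPEG compression.
ten_mp_rgb = 10_000_000 * 3

print(f"16-bit RGB scan:   {full_scan / 1e9:.2f} GB")
print(f"8-bit gray scan:   {gray_scan / 1e9:.2f} GB")
print(f"10 MP 8-bit RGB:   {ten_mp_rgb / 1e6:.0f} MB")
```

Under these assumptions the 16-bit RGB scan comes to about 1.24 GB, which is in line with "about a gigabyte per photograph", and dropping to 8-bit grayscale alone cuts the raw size by a factor of six.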

    2. Re:I, too, have an archive. by kqs · · Score: 1

      I've absolutely no idea how I'm going to store that kind of volume of data, it's not like Google will offer.

      Are you sure? Google says that they'll store unlimited pictures at up to 16MP, which is better quality than most older film. So.... yeah, they did offer. Maybe not in an uncompressed format or in your preferred format, but probably with way more metadata than you can reasonably add. I don't know if there is a way to tell Google the date of an old photo, but if so, then a search for "photos from the 1930s of people at a beach" suddenly becomes trivial.

      I also bet that if those thousands of families could be persuaded to get those images scanned, if they'd be willing to contribute to the cost of the collating and storage costs, that it would seriously change the way the past is seen by historians, family history fans and archivists.

      Sounds to me like Google does offer, but thousands of families don't want to share their photos. Seems like someone should start encouraging people to do so.

    3. Re:I, too, have an archive. by Solandri · · Score: 1

      Google gives you unlimited storage of pictures up to 16 MP. You can set up the Google Photos app to make instant cloud backups of any photos you take with your cell phone, which for most people will be good enough, with them manually copying the full-res photo if they take a particularly good one. (They also give you unlimited backup of videos though I'm not sure of the size and length restrictions.)

      If you subscribe to Amazon Prime, it includes Prime Photos which gives you unlimited storage of photos of any resolution. Their Prime Photos app can also do instant cloud backups of photos taken with your cell phone. I use this to supplement my NAS and its backup (the cloud storage is off-site, in case my house burns down).

      If you subscribe to Office 365, it includes 1 TB of cloud storage on OneDrive.

      Speaking as an amateur photographer, I only considered about 1 in 30 photos to be "keepers". About 1 in 300 as standout. This ratio seems to hold even for professional photographers (National Geographic did a story on this, and their photographers said they shot about 5000-7000 photos for a story, to produce the dozen photos which made it into a magazine story). So the fact that I have over 10,000 slides and negatives in a storage box doesn't necessarily mean I'd want to scan/store 10,000 photos. It's more a logistical matter of not being able to separate the good photos from the bad on a negative strip, or from a box of slides organized by date/event. (I did scan most of them 15 years ago, but the HDD died - that was the incident which made me OCD about keeping backups.)

    4. Re:I, too, have an archive. by jd · · Score: 1

      I have the negatives, they're all medium format and in good condition. Decent quality, too. What I don't want to do is lose any information, as anything lost due to damage to those negatives is lost forever. I've a decent, if not great, scanner - an old V600 - which will do 24-bit. Some of the negatives are simply too big to scan at that quality, at the resolution at which there are sufficiently few grains showing for me to be sure there's new information.

      Once it's scanned, there are probably ways to reduce the image. I can't imagine the full dynamic range the scanner can handle appears on the film, it's far too old for that, and almost certainly no significant new information is added beyond a point well below the resolution I'm using.