Slashdot Mirror


Google Is Using AI To Digitize 5 Million Historical Photos (theverge.com)

Google is working with The New York Times to digitize its huge collection of about 7 million historic images. The pictures are apparently kept in the newspaper's "morgue," which contains pictures going back to the 19th century, many of which exist nowhere else in the world. The Verge reports: That's why the company has hired Google, which will use its machine vision smarts to not only scan the hand- and type-written notes attached to each image, but categorize the semantic information they contain (linking data like locations and dates). Google says the Times will also be able to use its object recognition tools to extract even more information from the photos, making them easier to catalog and resurface for future use.

30 comments

  1. how is google doing the digitizing? by Anonymous Coward · · Score: 0

    are they bringing equipment to the archives to scan everything themselves? and what use is google's ai for this?

    1. Re: how is google doing the digitizing? by Anonymous Coward · · Score: 0

      Yeah this sounds like something you could just hire people to do. Scanning and data entry are brain dead tasks so I guess you'll have to pay the people to be bored all day.

      Oh now I know why they want AI...they think it's free or reusable haha. An AI for doing this will only do this so bet wisely...just hire people. This is too niche and dumb of a task for AI.

    2. Re:how is google doing the digitizing? by Anonymous Coward · · Score: 0

      At the very least, you should read the summary! Half of it (!), and the whole quote (!!), is explaining what the AI will be used for. The AI will be used to analyze the photos to recognize the texts and objects in them, and to catalog and categorize them.

    3. Re: how is google doing the digitizing? by batukhan · · Score: 1

      Apparently Google has the AI which makes it tremendously simpler to scan and label millions of photos somehow. /s No, people will still do the scanning and labeling. But they will also use AI to generate more labels, which requires more people, but the special Google kind of people.

  2. Coming soon to ReCAPTCHA by Anonymous Coward · · Score: 0

    ReCAPTCHA - Outsourcing Google's 'AI' since 2007

  3. Dupe by dereference · · Score: 2

    From earlier today no less.

    1. Re:Dupe by Anonymous Coward · · Score: 0

      Not quite, the previous one was about NYT doing the digitization, this one is about Google helping to do it.

      It's a split article, for twice the ad revenue.

    2. Re:Dupe by Anonymous Coward · · Score: 0

      But you NEED to KNOW.

      It's important, dammit!

  4. Re:I hate c6gunner... apk by Anonymous Coward · · Score: 1

    Your whole life story consists of fucking with people and thinking that THIS time they won't fuck with you back.

  5. Dupe Dupe Dupe by Stonent1 · · Score: 2, Funny

    Dupe of URL Dupe dupe dupe of URL.

    1. Re:Dupe Dupe Dupe by Anonymous Coward · · Score: 0

      How do they fuck up this bad..... so often?

      Oh, and the "related stories" at the bottom are never related.

    2. Re:Dupe Dupe Dupe by Anonymous Coward · · Score: 0

      Sorry, "related links"

  6. How does AI work on non-digitized photos? by mark-t · · Score: 1

    Don't the photos have to already *BE* digitized for AI to process them in the first place?

  7. After digitization, correlation and AI scanning by bobstreo · · Score: 1

    the images will be given to BookFace for more friend suggestions. /s

    I hope they don't destroy the originals, the longevity of image storage on silver nitrate media beats any digital technology ever invented.

    1. Re:After digitization, correlation and AI scanning by jd · · Score: 2

      Silver nitrate and the original magnetic core memory had about the same lifespan. However, you run into problems of size, speed of access, etc. For those youngsters who never encountered it, core memory was an improvement on the Williams memory device that had been used in previous generations of machines. It used slowly-decaying magnetic fields to store information for 100+ years. This made it the world's first electronic non-volatile storage. At a density of 32 kbits per cubic foot, it was also very inefficient.

      So, whilst it's not quite correct to say that old-fashioned film is better than all digital in terms of longevity, it's better than most and those that come close in longevity don't come close in information density. High-grade medium film carries a LOT of information in a couple of square inches.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)

  8. Upload everything to Archive.org too by Anonymous Coward · · Score: 1

    We cannot trust Google or any other corporation to preserve those images. Please also upload a copy of all those pictures to the Internet Archive (archive.org); otherwise, when this content stops being good business for Google and/or The New York Times, it will be lost forever. Maybe someday we will get an AI for Archive.org, and at that moment we will rock all the knowledge of the world!!!

    1. Re: Upload everything to Archive.org too by Anonymous Coward · · Score: 0

      We cannot trust the NYT either. They have now repeatedly called for the assassination of Trump.

      They represent the moneychanger mafia.

  9. AI to Digitize? by Anonymous Coward · · Score: 0

    It's called a scanner. What an attempt at a puff piece gone wrong.

  10. No by Anonymous Coward · · Score: 0

    They are using algorithms to automate, just like people did before. What will it take to kill the use of the term, 'AI'?

  11. 5 Million or 7 Million? by Anonymous Coward · · Score: 0

    Or only digitizing 5 of the 7 million?

    For fuck's sake, how hard is it to summarize an article?

  12. nice article by Abhishek+ku · · Score: 1

    Excellent article..

  13. I, too, have an archive. by jd · · Score: 2

    My collection is a few tens of thousands of photographs from three families dating from 1880 or so to 1980. A proper scan that gets out the greatest amount of actual information has yielded an average of about a gigabyte per photograph, so far. That's a lot of information. So my pathetically small collection holds about ten or so terabytes of data.

    I've absolutely no idea how I'm going to store that kind of volume of data, it's not like Google will offer.

    But I bet you a dozen doughnuts that there are thousands of families in the same boat, who have vast collections of negatives that they'll destroy because they don't have room and don't see an immediate value in.

    I also bet that if those thousands of families could be persuaded to get those images scanned, if they'd be willing to contribute to the cost of the collating and storage costs, that it would seriously change the way the past is seen by historians, family history fans and archivists.

    And I'd bet that such an archive would have a profoundly greater impact than the NYT archive would.

    It won't happen because those aforementioned families will be stubborn and prefer destruction over conservation, because none of the cloud vendors would be interested in helping disseminate information of individually uncertain value, and because most people see history as someone else's problem.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)

    1. Re:I, too, have an archive. by Anonymous Coward · · Score: 1

      You don't need a gigabyte per photo. It sounds like that's an uncompressed 16-bit TIFF at the scanner's maximum resolution. Unless the photos were taken with a 4x5 or larger view camera and you have the negatives/glass plates or large prints, the photos don't really contain that much information. Try exporting one as a 10 megapixel JPEG and compare it side by side with the TIFF, scaled to the same size. You probably won't be able to tell the difference.

      You can also compress the TIFFs with LZW or ZIP, and for black and white photos if you don't care about preserving tinting you can convert them to grayscale. Both will dramatically decrease your file sizes.
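
      A rough sketch of the arithmetic behind the "gigabyte per photo" figure. The 6x6 cm negative size and the 6400 dpi scan resolution are assumptions chosen for illustration, not numbers taken from the thread; only the 16 bits per channel and the 10 megapixel comparison come from the comment above.

```python
CM_PER_INCH = 2.54


def uncompressed_scan_bytes(width_in, height_in, dpi, bits_per_channel=16, channels=3):
    """Estimate the size in bytes of an uncompressed raster scan (pixel data only)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_pixel = channels * bits_per_channel // 8
    return width_px * height_px * bytes_per_pixel


# Assumed example: a 6x6 cm medium-format negative scanned at 6400 dpi.
side_in = 6 / CM_PER_INCH

full_colour = uncompressed_scan_bytes(side_in, side_in, 6400)          # 16-bit RGB
grayscale = uncompressed_scan_bytes(side_in, side_in, 6400, channels=1)  # 16-bit gray
ten_megapixel = 10_000_000 * 3  # 10 MP of 8-bit RGB, before any JPEG compression

print(f"16-bit RGB scan:   {full_colour / 1e9:.2f} GB")
print(f"16-bit grayscale:  {grayscale / 1e9:.2f} GB")
print(f"10 MP 8-bit RGB:   {ten_megapixel / 1e6:.0f} MB uncompressed")
```

      Under these assumptions the full-colour scan comes out a little over a gigabyte, converting to grayscale cuts it to a third, and a 10 megapixel export is tens of megabytes even before JPEG compression. Lossless LZW or ZIP savings depend on image content, so they are not estimated here.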

    2. Re:I, too, have an archive. by kqs · · Score: 1

      I've absolutely no idea how I'm going to store that kind of volume of data, it's not like Google will offer.

      Are you sure? Google says that they'll store unlimited pictures at up to 16MP, which is better quality than most older film. So.... yeah, they did offer. Maybe not in an uncompressed format or in your preferred format, but probably with way more metadata than you can reasonably add. I don't know if there is a way to tell Google the date of an old photo, but if so, then a search for "photos from the 1930s of people at a beach" suddenly becomes trivial.

      I also bet that if those thousands of families could be persuaded to get those images scanned, if they'd be willing to contribute to the cost of the collating and storage costs, that it would seriously change the way the past is seen by historians, family history fans and archivists.

      Sounds to me like Google does offer, but thousands of families don't want to share their photos. Seems like someone should start encouraging people to do so.

    3. Re:I, too, have an archive. by Solandri · · Score: 1

      Google gives you unlimited storage of pictures up to 2048x2048 resolution. You can set up the Google Photos app to make instant cloud backups of any photos you take with your cell phone, which for most people will be good enough, with them manually copying the full-res photo if they take a particularly good one. (They also give you unlimited backup of videos though I'm not sure of the size and length restrictions.)

      If you subscribe to Amazon Prime, it includes Prime Photos which gives you unlimited storage of photos of any resolution. Their Prime Photos app can also do instant cloud backups of photos taken with your cell phone. I use this to supplement my NAS and its backup (the cloud storage is off-site, in case my house burns down).

      If you subscribe to Office 365, it includes 1 TB of cloud storage on OneDrive.

      Speaking as an amateur photographer, I only considered about 1 in 30 photos to be "keepers". About 1 in 300 as standout. This ratio seems to hold even for professional photographers (National Geographic did a story on this, and their photographers said they shot about 5000-7000 photos for a story, to produce the dozen photos which made it into a magazine story). So the fact that I have over 10,000 slides and negatives in a storage box doesn't necessarily mean I'd want to scan/store 10,000 photos. It's more a logistical matter of not being able to separate the good photos from the bad on a negative strip, or from a box of slides organized by date/event. (I did scan most of them 15 years ago, but the HDD died - that was the incident which made me OCD about keeping backups.)

    4. Re:I, too, have an archive. by jd · · Score: 1

      I have the negatives, they're all medium and in good condition. Decent quality, too. What I don't want to do is lose any information, as anything lost due to damage to those negatives is lost forever. I've a decent, if not great, scanner - an old V600 - which will do 24-bit. Some of the negatives are simply too big to scan at that quality, at the resolution at which there are sufficiently few grains showing for me to be sure there's new information.

      Once it's scanned, there are probably ways to reduce the image. I can't imagine the full dynamic range the scanner can handle appears on the film; it's far too old for that, and almost certainly no new significant information is added long before the resolution I'm using.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)

    5. Re:I, too, have an archive. by Anonymous Coward · · Score: 0

      because most people see history as someone else's problem.

      With autists like you trying to make out that there's a gigabyte of useful information in a family snapshot, I'm not fucking surprised they don't want to know! I bet these "aforementioned families" can't wait to get rid of this fucking whacko they let in to their attics.

      Are you even fucking related to any of them?

  14. Re:I hate c6gunner... apk by Anonymous Coward · · Score: 0

    Shut the fuck up, you hapless goober. You prance around on this site desperately trying to hawk your weak-ass crap, and it's cringeworthy.

    You're that loser at the party who's only allowed to attend so he can be laughed at and made fun of. And yet you come crawling back, "Oh please guys, please can I come, please please please?"

  15. Impersonating me AGAIN? apk by Anonymous Coward · · Score: 0

    See subject: I pity c6gunner caught impersonating me (his name's the submitter signing "APK") https://linux.slashdot.org/com... as he obviously forgot to submit as AC vs. using his registered 'lusername' instead, lol!

    * Simply because he tried to INSULT me & I made him a COMPLETELY FAIR CHALLENGE he couldn't meet or beat by showing me he's done better work in the past prior to his impersonating me there.

    APK

    P.S.=> You shouldn't throw stones when you live in a glass house boys - especially vs. me... apk