Slashdot Mirror


Google Pushes Open Source OCR

SocialWorm writes "Google has just announced work on OCRopus, which it says it hopes will 'advance the state of the art in optical character recognition and related technologies.' OCRopus will be available under the Apache 2.0 License. Obviously, there may be search and image search implications from OCRopus. 'The goal of the project is to advance the state of the art in optical character recognition and related technologies, and to deliver a high quality OCR system suitable for document conversions, electronic libraries, vision impaired users, historical document analysis, and general desktop use. In addition, we are structuring the system in such a way that it will be easy to reuse by other researchers in the field.'"

9 of 212 comments

  1. the presidential papers by User+956 · · Score: 4, Funny

    The goal of the project is to ... deliver a high quality OCR system suitable for document conversions, electronic libraries, vision impaired users, historical document analysis

    So, will it work on documents written in crayon? It would be a tragic loss for Dubya's presidential documents to get lost in the sands of time. On the scale of the Library of Alexandria. No, seriously.

    --
    The theory of relativity doesn't work right in Arkansas.
    1. Re:the presidential papers by adickerson0 · · Score: 2, Funny

      No need, Dubya turns his work in to the Secretary of Education so she can put a gold star on each page. While this may seem like a childish system, it is really the only sort of oversight he would agree to. The original plan was to scan everything and place an RFID Gold Star on each page for tracking, so that the Executive Branch's work could be preserved; however, this led to a few problems. Apparently the Sec of Ed got too busy and turned the work over to an intern. The intern decided not only to put a Gold Star on each page but actually started grading the papers. This led to the "Inbasion of Iwack Plans" scandal. Dubya's plan, which included a drawing of himself in a jet holding an American flag, was given an "A+ Good Work" stamp. This of course was given back to the President, who decided that if it was an "A+" then there was no way his plan would fail.

  2. Orcopus? by voice_of_all_reason · · Score: 4, Funny

    Orcopus:

    Level: 15
    Race: Fell Marine
    HP: 290/290
    EP: 200/200
    Water elemental
    Drops: Tentacle

  3. Re:Language? by fireboy1919 · · Score: 5, Funny

    Since the official language of the Googleplex is Googlese, and the original project was developed by the US Census Bureau (notorious for their use of no languages except Esperanto), it goes without saying (though I'm saying it anyway) that it will read only Klingon.

    Remember kids, there are no stupid questions.
    Only people who don't RTFA who ask questions.

    --
    Mod me down and I will become more powerful than you can possibly imagine!
  4. Re:The beginning of the end? by thePowerOfGrayskull · · Score: 2, Funny

    A good captcha has a nonsense string of characters in various cases, all skewed and distorted, with extra geometric elements obscuring the characters. This renders unavailable somewhere around half of the clues that an OCR system uses. Hell, if we obscure it enough it can be practically buried under geometric noise; and once we do that, we've solved the AC problem on /.!
  5. Re:From? by MightyYar · · Score: 2, Funny
    --
    W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
  6. Re:captcha's by cmacb · · Score: 4, Funny

    Captchas are not restricted to images of letters. For example, you could ask people to solve a regular text question (this would also fix accessibility issues).

    You mean as in:

    Describe what the following expression does in 30 words or less:
    {"ab", "c"}{"d", "ef"} = {"abd", "abef", "cd", "cef"}


    Man, I'll never get into forum postings if they do that!
  7. Re:captcha's by JamesTRexx · · Score: 4, Funny

    Describe what the following expression does in 30 words or less:
    {"ab", "c"}{"d", "ef"} = {"abd", "abef", "cd", "cef"}


    Answer: Makes my head hurt...

    *click* Access to MySpace granted, have a nice day.

    Which forum were you talking about again? :-)

  8. Ocropus? by 6Yankee · · Score: 2, Funny

    Is that a Chinese mispronunciation? ;)