Google Releases Tesseract as Open Source

Posted by ryuzaki0 on Monday September 4, 2006 @03:27PM from the bit-rot dept.

An anonymous reader writes "Google recently released Tesseract as open source. Originally developed at HP Labs from 1985 to 1995, it has been touted as one of the most accurate optical character recognition (OCR) programs available. The code had sat on the shelf gathering dust for years; Google cleaned up some of its more outdated portions and released it for general consumption. You can download Tesseract over at SourceForge."

5 of 251 comments

Re:As much as I like open source software ... by aweinert · 2006-09-04 15:32 · Score: 5, Informative

CAPTCHAs are specifically meant to break OCR... and if you RTFA, it says it does poorly with grayscale and color documents. Basically it's meant for reading typed text... like in a book.
Re:From the Project by kevlarman · 2006-09-04 16:10 · Score: 3, Informative

If you had bothered to browse CVS, you would have found that it has been released under the Apache License: http://tesseract-ocr.cvs.sourceforge.net/tesseract-ocr/tesseract/COPYING?view=markup

--
A mouse is a device used to point to the xterm you want to type in
Re:NFB owns you by MrNonchalant · 2006-09-04 17:08 · Score: 4, Informative

You can build accessible CAPTCHAs, using images with an audio fallback for blind users. My girlfriend is visually impaired, and inaccessible CAPTCHAs are a real problem for her; she can't register at some sites without assistance.
Re:I call bullshit by johansalk · 2006-09-04 22:25 · Score: 3, Informative

If CAPTCHAs rely on humans, wasn't there an anti-CAPTCHA trick where spammers had people solve CAPTCHAs to get into free porn sites, then used those answers to get their bots into the legitimate sites the spammers were targeting?
Re:Un-Finishable by gweeks · 2006-09-04 23:03 · Score: 3, Informative

> This is patently false. New stuff comes out of copyright every day.

This is just so untrue. In the United States (the only place Project Gutenberg worries about), nothing is entering the public domain except unpublished manuscripts whose authors died 70 years ago. Nothing else will enter the public domain until 2019. Congress has effectively frozen the public domain.
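The 2019 date comes down to simple arithmetic. The sketch below is simplified and assumes the U.S. rules as they stood in 2006: a work published between 1923 and 1977 (with notice, and renewed) is protected for 95 years from publication, an unpublished work by a known author is protected for the author's life plus 70 years, and every term runs to the end of its calendar year. It deliberately ignores special cases such as non-renewed works, anonymous works, and the 2002 minimum term for older unpublished works. The function names are hypothetical, chosen only for illustration.

```python
# Simplified sketch of U.S. copyright terms as of 2006 (assumptions, not legal advice):
#  - works published 1923-1977 with notice and renewal: 95 years from publication
#  - unpublished works by a known author: life of the author + 70 years
# Terms run through December 31, so a work enters the public domain
# on January 1 of the following year.

PUBLISHED_TERM_YEARS = 95
UNPUBLISHED_TERM_YEARS = 70


def pd_year_published(pub_year: int) -> int:
    """Hypothetical helper: year a 1923-1977 published work enters the public domain."""
    return pub_year + PUBLISHED_TERM_YEARS + 1


def pd_year_unpublished(death_year: int) -> int:
    """Hypothetical helper: year an unpublished work by a known author enters the public domain."""
    return death_year + UNPUBLISHED_TERM_YEARS + 1


if __name__ == "__main__":
    print(pd_year_published(1923))    # 2019: the first published works to come out of copyright
    print(pd_year_unpublished(1935))  # 2006: unpublished work of an author who died in 1935
```

Under these assumptions, works published in 1923 are the earliest still-protected published works, and they only become free on January 1, 2019, which matches the date given in the comment.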