Slashdot Mirror


Digitizing Your Dead Trees?

smart2000 asks: "I'm tired of lugging around dead trees. I've just moved offices and had to move over 100 pounds of 'essential' technical books. It is clear to me that the dead tree industry is never going to supply the books I want in electronic form, so it's time to do it myself. What hardware and software should I use?"

"The Plan: Take the binding of each book and cut it off. Feed into a scanner with duplex and cut-sheet feeder. Scan as a 300 DPI jpeg with compression. Then OCR them overnight. I don't expect the OCR to be perfect, just good enough to use as a searchable index.

What are the suitable scanner choices for Linux? Any recommendations for OCR software that will write in an open format? Has anyone done this before?"
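
For the "searchable index" part of the plan, here is a minimal sketch of what "good enough" OCR buys you. It assumes the OCR step has already produced plain text for each scanned page. The book titles, page numbers, and the `build_index` and `search` helpers are hypothetical illustrations, not part of any particular OCR package.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of (book, page) locations containing it.

    `pages` is assumed to be a dict of (book, page_number) -> OCR text.
    """
    index = defaultdict(set)
    for location, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(location)
    return index

def search(index, query):
    """Return sorted locations where every word of the query appears."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

# Hypothetical OCR output; "regi5ters" imitates a typical misrecognized word.
pages = {
    ("Unix Internals", 41): "The interrupt handler saves regi5ters before dispatch.",
    ("Unix Internals", 42): "Each handler runs with interrupts disabled.",
    ("TCP Guide", 7): "A retransmit timer fires when no ACK arrives.",
}
index = build_index(pages)
print(search(index, "interrupt handler"))
```

A misread word such as "regi5ters" simply fails to match a query for "registers", while the rest of the page stays findable. That is the trade-off the plan accepts: the page images remain the thing you read, and the OCR text only has to point you to the right page.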

2 of 347 comments

  1. Let me get this straight... by deacon · Score: 5, Insightful
    You are going to cut up thousands of dollars worth of your "essential" books?

    And put them into an inferior visual format you cannot read unless the computer is working and switched on?

    And you are going to spend about 100 hours to do this.. and the original books are going to be ruined.

    All this just so you don't have to make 3 trips to move your books?

    Mmmkayyy.. (backs away slowly)

    Have you ever heard of a dolly?

  2. Re:Do you really need them? by Waffle Iron · Score: 5, Insightful
    Do they actually have time to read them? Or are they more for show?

    Back before the Web when I was a hardware designer, books were a kind of currency that engineering salespeople used to entice you to meet with them. Each chip manufacturer printed stacks and stacks of data books covering their various product lines. They'd give these to the sales reps who would cart them in on dollies to hand out to the engineers who showed up to hear their latest pitch.

    In a way, huge bookshelves with hundreds of books were a status symbol, showing that you'd been around a while and a lot of people thought it was worthwhile to give you books. It was useful to have all of that info available, but few people actually used more than 1% of the data that was on their shelves.

    The instant the chip companies put their chip data on the web, all of those books became totally useless. Now I'm doing software, everything is online, and I can go for weeks on end without picking up a technical book.

    I do sometimes miss the office atmosphere you get from row after row of data books neatly segregated by the corporate logos and color schemes on their spines. It had an important look to it.