Slashdot Mirror


NASA Requests Help With Von Braun's Notes

DynaSoar writes "NASA is soliciting ideas from the public on how best to catalog and digitize the collected notes of Wernher von Braun. 'We're looking for creative ways to get it out to the public,' said project manager Jason Crusan. 'We don't always do the best with putting out large sets of data like this.' The PDF notes are those of rocket scientist Wernher von Braun, the first director of NASA's Marshall Space Flight Center in Huntsville, Alabama, and are typed with copious handwritten notes in the margin. According to the official request for information, NASA needs ideas on what format to use (PDF), how to index the notes, and how to create a useful database. The unique nature and historical value of the data, literally discovered in boxes six months ago, is what motivated NASA to ask the public for ideas."

7 of 148 comments

  1. Re:NASA by HalifaxRage · · Score: 5, Funny

    Next week: What to do with this big golden box thing? We tried opening it and some guy's face melted.

    --
    bomb the us up set someone
  2. Obligatory Tom Lehrer.. by Anonymous Coward · · Score: 5, Funny

    Gather round while I sing you of Wernher von Braun
    A man whose allegiance is ruled by expedience
    Call him a Nazi, he won't even frown
    "Ha, Nazi schmazi," says Wernher von Braun

    Don't say that he's hypocritical
    Say rather that he's apolitical
    "Once the rockets are up, who cares where they come down
    That's not my department," says Wernher von Braun

    Some have harsh words for this man of renown
    But some think our attitude should be one of gratitude
    Like the widows and cripples in old London town
    Who owe their large pensions to Wernher von Braun

    You too may be a big hero
    Once you've learned to count backwards to zero
    "In German oder English I know how to count down
    Und I'm learning Chinese," says Wernher von Braun

    1. Re:Obligatory Tom Lehrer.. by Anonymous Coward · · Score: 5, Informative

      Here he is performing it live.

    2. Re:Obligatory Tom Lehrer.. by Anonymous Coward · · Score: 5, Funny

      Looks recorded.

  3. TIFF FTW by alta · · Score: 5, Interesting

    Let's go with a format almost anyone can read. As soon as they're all scanned in as high-res TIFFs, THEN you can begin to OCR them and create hybrid PDFs which CAN be indexed. From there we have a good start with high-quality originals and searchable derivatives. Then people can start rolling whatever custom solutions they want to.

    Yes, I know that OCR is going to be very crude, especially for anything handwritten. But what it will do is get us a very good starting point. I'd like to see a wiki set up with the OCR'd text as the beginning text and a link to the document; then the public can begin to go in and correct the OCR mistakes, and fill in what just flat out couldn't be OCR'd.
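
    A minimal sketch of how each wiki page could be seeded, assuming MediaWiki-style markup. The function name, the document ID scheme, the example.org URL, and the category name are all made up for illustration; the OCR text is assumed to come from whatever engine gets used.

```python
def seed_wiki_page(doc_id: str, scan_url: str, ocr_text: str) -> str:
    """Build starter wiki text for one scanned page (MediaWiki-style markup assumed)."""
    lines = [
        f"== {doc_id} ==",
        f"[{scan_url} Original scan]",  # external link back to the master image
        "",
        "=== Transcription (OCR draft, please correct) ===",
        ocr_text.strip(),
        "",
        "[[Category:Needs proofreading]]",  # hypothetical category for the review queue
    ]
    return "\n".join(lines)


# Hypothetical document ID and URL, for illustration only.
page = seed_wiki_page(
    "WvB-0001",
    "https://example.org/scans/WvB-0001.tif",
    "  Weekly notes \n",
)
print(page)
```

    Volunteers would then edit the transcription section in place, and the link keeps the scan one click away for checking.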

    --
    Do not meddle in the affairs of sysadmins, for they are subtle, and quick to anger.
  4. Use a Wiki to Process Images to Open Format by eldavojohn · · Score: 5, Insightful

    Well, considering they host over 6,000 PDFs and the RFI is in PDF with the title of the document being "Microsoft Word - WvB RFI 6-24-09.doc" by Jason Crusan, who used Acrobat Distiller 7.0.5 (Windows), I think we know what everyone uses at NASA. Fine. I'm not going to bitch about that. Instead I'm going to point out that if you're already dependent on Adobe Acrobat Reader & Microsoft Word being around until the end of time supporting your old doctypes, you might as well release these in PDF from DOC sources too.

    But, if I were doing this: Assuming these are all in images, put the images in whatever format you want and make a generic wiki page for each of them. Then let users log in (NASA fans should pour in) and translate the pages to annotated wiki pages with the footnotes (normally references) being all the side notes that were penciled in. They can categorize them by related missions and maybe even tag them ... you will need at least one or two people on your staff to administrate. Diagrams and drawings will probably need to be cropped and retained as images. Keep those in a lossless format but distribute whatever saves you bandwidth.

    Once that's done, ideally you'd put it in some XML standards-based format (ODF or OOXML, yeah, that's another argument to be had) that you will always be able to read even if you have to build your own viewer/converter. Keep these sources indexed and provide people with the rendered PDF/PS/PNG/whocares, and then you could probably build scripts to rebuild everything from sources if you want. New technology comes out or people want to view them in HTML 5--no problem, just build a neat little XSLT for them.

    As for indexing them, I can tell you one way not to do it. Don't do the thing that curators of classical music did. Man, that's like speaking another language to me. Arrange the notes by mission or date if you can, and where natural titles arise for the favorites, add them as aliases.

    --
    My work here is dung.
  5. Re:Contact MIT and their archival department by Will.Woodhull · · Score: 5, Informative

    Let me fix that for you:

    the SECOND MOST IMPORTANT aspect of the documents is that it is easily searched.

    The FIRST is of course making a high fidelity digital copy of the original pages, that will serve as the authority on all questions of possible ambiguity in the handwriting, or whether a figure in the margin is a thumbnail sketch or a mere doodle.

    A 600 or 1200 dpi .png image of each page in full color would do as the master digital archive. The .png format is an excellent choice since it is open, well understood, and going to be around for a long, long time. Its accuracy is more than adequate for this work. That it supports lossless compression is a bonus: images of pages usually compress very well. Copies of the master digital library should be kept at various institutions and made available on request to anyone.
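
    For a rough sense of scale, here is the uncompressed size of one page at those resolutions. This is a back-of-the-envelope sketch that assumes US Letter pages (8.5 x 11 inches) and 24-bit RGB; the actual page size of the notes is not stated, and PNG compression will shrink the files by an amount that depends on the content.

```python
def raw_scan_bytes(width_in: float, height_in: float, dpi: int,
                   bytes_per_pixel: int = 3) -> int:
    """Uncompressed size in bytes of one scanned page (24-bit RGB by default)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


# US Letter is an assumption; substitute the real page dimensions.
for dpi in (600, 1200):
    size = raw_scan_bytes(8.5, 11, dpi)
    print(f"{dpi} dpi: {size / 1e6:.0f} MB uncompressed")
```

    Doubling the resolution quadruples the pixel count, so the choice between 600 and 1200 dpi matters for storage and for shipping copies to other institutions.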

    Then for public and research use, convert each page to HTML 4.01 Strict (since it is universally available, will be around for a long, long time, and Google, etc., can do the indexing for us). UTF-8 of course, especially since Wernher used some German and Greek glyphs in his handwriting.

    Suggest using OCR to handle conversion of the typed notes, and volunteers or cheap student labor to transcribe the handwritten material (use consensus of several transcribers to assure accuracy). These can be incorporated into the main pages as divs and spans inserted into the correct place in the flow (use classes like "leftmargin" and "rightmargin"). CSS can use absolute positioning to make them marginal accordions (expand from the margin on mouseover), etc.
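
    The consensus step could be as simple as a majority vote per margin note. A sketch under my own assumptions: transcriptions are compared as whole strings after collapsing whitespace, and anything without a strict majority is returned as None so a human can look at it. The function names are made up.

```python
from collections import Counter
from typing import Optional, Sequence


def normalize(text: str) -> str:
    """Collapse runs of whitespace so trivial spacing differences do not split the vote."""
    return " ".join(text.split())


def consensus(transcriptions: Sequence[str]) -> Optional[str]:
    """Return the reading given by a strict majority of transcribers, else None."""
    counts = Counter(normalize(t) for t in transcriptions)
    if not counts:
        return None
    best, votes = counts.most_common(1)[0]
    # Require more than half of all transcribers to agree.
    return best if votes * 2 > len(transcriptions) else None


readings = ["Saturn V  stage", "Saturn V stage", "Saturn 5 stage"]
print(consensus(readings))
```

    A finer-grained version could vote word by word, but whole-string voting is easier to audit and flags every disagreement for review.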

    Treat sketches like the handwriting: put an img of the sketch into a div or span at the right place in the flow, then also add a searchable text description of the sketch in that div.

    A simple script can process the final HTML fragment of each page and insert id="unique" attributes on each paragraph, etc, and <a name="unique"> targets where these would be useful.
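
    Something along these lines would do for the id pass. It is a sketch, not a real HTML parser: it assumes the fragments are clean, machine-generated markup, and the choice of tags and the "page-number" id scheme are just examples.

```python
import re
from itertools import count

# Opening tags that should receive an id; the closing "/" in </p> prevents a match.
TAG = re.compile(r"<(p|div|li)\b([^>]*)>", re.IGNORECASE)
# An existing id attribute (but not something like data-id).
HAS_ID = re.compile(r"(?<![\w-])id\s*=", re.IGNORECASE)


def add_ids(fragment: str, page: str) -> str:
    """Insert id="<page>-<n>" on each matching tag that does not already have an id."""
    counter = count(1)

    def repl(match: re.Match) -> str:
        tag, attrs = match.group(1), match.group(2)
        if HAS_ID.search(attrs):
            return match.group(0)  # leave hand-assigned ids alone
        return f'<{tag} id="{page}-{next(counter)}"{attrs}>'

    return TAG.sub(repl, fragment)


sample = '<p>One</p><p class="leftmargin">Two</p><p id="x">Three</p>'
print(add_ids(sample, "p0042"))
```

    Prefixing the id with a page label keeps ids unique across the collection and keeps them starting with a letter, which HTML 4.01 requires.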

    The finished NASA product should be a simple online database using server-side scripting to compose and serve out pages on request. It should be built with cooperation from Google and other search platforms so that spiders will have good access to the body of the work without causing excessive bandwidth problems. It should be possible for any researcher to develop his own custom search engine. Ideally, it will support not just the notes, but also concordances, wiki discussions, etc.

    I once did a lot of this kind of work in moving sermons and such that were circulated by mimeograph in the 1960s and 1970s to web pages. I digitized the pages with a Minolta Z1 camera on a reverse tripod using indirect lighting, and converted them to text via OCR with OmniScan (IIRC). The OCR came out in Word 97 format, and I used Perl scripts to transcribe to HTML. If the technical quality of the originals is good, this can go pretty fast and is highly accurate, even as a basement project. If the original notes use consistent formatting, which I would expect of Wernher, then scripting with good use of regular expressions can do the bulk of the HTML markup.
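
    To give the flavor of the regex approach (in Python here rather than the Perl I used back then), a sketch that assumes two formatting conventions I am inventing for the example: blank lines separate paragraphs, and a block written entirely in capitals is a heading. The real notes would need their own rules.

```python
import html
import re

# Hypothetical convention: a block in all caps (digits and basic punctuation allowed) is a heading.
HEADING = re.compile(r"^[A-Z0-9][A-Z0-9 .,:'-]*$")


def text_to_html(text: str) -> str:
    """Convert plain OCR text into a simple HTML fragment of <h2> and <p> elements."""
    out = []
    for block in re.split(r"\n\s*\n", text.strip()):
        line = " ".join(block.split())  # rejoin lines the OCR wrapped
        if not line:
            continue
        if HEADING.match(line) and any(c.isalpha() for c in line):
            out.append(f"<h2>{html.escape(line)}</h2>")
        else:
            out.append(f"<p>{html.escape(line)}</p>")
    return "\n".join(out)


sample = "STATUS REPORT\n\nThrust was 1.5 M lbs\n& rising.\n\n"
print(text_to_html(sample))
```

    Escaping with html.escape before wrapping in tags matters, since typed notes are full of ampersands and angle brackets that would otherwise break the markup.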

    --
    Will