Slashdot Mirror


Digitizing 100 Years of Astronomical Data

Maximum Prophet writes to mention that a collection of glass plates containing astronomical information from the late 19th century through the mid-1980s is being considered for digitization. "The accumulated result weighs heavily on its keepers on Observatory Hill, just up Garden Street from Harvard Square: more than half a million images constituting humanity's only record of a century's worth of sky. 'Besides being 25 percent of the world's total of astronomical photographic plates, this is the only collection that covers both hemispheres,' said Alison Doane, curator of a glass database occupying three floors, two of them subterranean, connected by corkscrew stairs. It weighs 165 tons and contains more than a petabyte of data. The scary thing is that there is no backup." I'm sure that anyone with a spare $5 million or so would be welcomed with open arms.

38 of 115 comments

  1. That's quite a bit. by L. VeGas · · Score: 2, Funny

    165 tons of glass plates?

    Sounds like a typical lunch clean-up after Rosie O'Donnell.

    Sorry. I'm truly sorry.

  2. Glass plates will outlive the digital"backup" by gatkinso · · Score: 4, Insightful

    now there is some irony.

    --
    I am very small, utmostly microscopic.
    1. Re:Glass plates will outlive the digital"backup" by WhatHappenedToTanith · · Score: 5, Funny

      Are you sure about the stability of glass plates? I hear a lot of people have real trouble with windows stability! Sorry, I'll go now....

    2. Re:Glass plates will outlive the digital"backup" by KokorHekkus · · Score: 4, Insightful

      now there is some irony.
      But currently that also makes them vulnerable to a single point of failure (as indirectly pointed out in the article). If you have data that has any real value to you, then having only one copy (or only one storage facility) isn't any real protection, whatever method you use. In this case we have data that would be readily accepted for backup by organisations all around the globe, and barring a worldwide upheaval the safety of the data would be much better than any single glass plate could offer.

      Of course the ideal would be if we could develop cheap, permanent digital storage with guaranteed physical longevity, say several millennia. That combination would allow easy dissemination of the data and safety through a multiplicity of sources.
    3. Re:Glass plates will outlive the digital"backup" by seaturnip · · Score: 3, Informative

      So what? Copy the digital version onto a second set of disks when it comes close to expiring.

      Lossless copying means that given a little bit of maintenance, expiration of digital media is a nonissue.
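
      The "lossless copying" point above can be checked rather than assumed. A minimal sketch, assuming the archive is a set of ordinary files: hash the source, copy it, hash the copy, and refuse the copy if the digests differ. The function names `sha256_of` and `copy_and_verify` are made up for this illustration.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src, dst):
    """Copy src to dst and confirm the copy is bit-for-bit identical.

    Returns the shared digest so it can be recorded alongside the copy
    and rechecked at the next migration.
    """
    expected = sha256_of(src)
    shutil.copyfile(src, dst)
    actual = sha256_of(dst)
    if actual != expected:
        raise IOError(f"copy of {src} is corrupt: {actual} != {expected}")
    return expected
```

      Keeping the digest next to each file is what makes the "little bit of maintenance" cheap: every later copy can be compared against the original hash instead of against the original medium.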
    4. Re:Glass plates will outlive the digital"backup" by profplump · · Score: 2, Interesting

      And your photographic copy would A) degrade over time and B) lose quality with each copy. IMHO that's not a very good archive. Moreover, in order to slow the inevitable decay that comes with time and reactive chemicals on paper/plastic/metal/whatever, you'd still need a climate-controlled facility. And you'd still need a team of operators to make the copy, and to make later copies as the earlier ones degrade. And more than anything else, you'd need someplace to store *another* 165 tons of photos, which is certainly larger than the space required to store a petabyte of data in a modern digital format.

      I'm not really seeing how your photographic archive saves money. I'm not convinced it would produce better longevity either. You might get better longevity for a single copy than with digital data, but it's a whole lot cheaper to make digital copy #2 than to make photographic copy #2.

      If you're worried about file formats you could simply leave a printed text detailing the data format. Then anyone with the ability to read the media would be able to recreate viewing software, even if none existed for then-modern computers.

      If you're worried about being able to read the media then you're really worried about ongoing funding -- someone to continue preserving the archive in the future. That's a problem that exists regardless of the format of the archive; if someone decided they didn't want to keep paying for 3 floors of a building, or to continue making copies of the photographic archive, you'd still be in trouble.

    5. Re:Glass plates will outlive the digital"backup" by Cecil · · Score: 4, Insightful

      Ever tried to maintain archival backups for a petabyte-worth of data?

      Yes, as a matter of fact. Definitely a lot of work is involved, but do you believe that you wouldn't need a team of document managers, millions of dollars worth of floor space, and expensive climate controlled facilities for archival of microfiche? You most certainly do. It's a lot of data. Period. No matter what you try to do with it, it's a lot of data. It's going to require a lot of resources. That's just a fact of life.

      Anyway, no one in their right mind would choose microfiche for that type of data. If you're only storing plain text pages it's adequate (though I still don't think it would be the "right way to do it" in this day and age), but for photographic plates? Not going to work.

      Microfiche is vastly overrated, in my opinion. My current project involves taking 2 floors worth of 30-50 year old microfiche and scanning it, OCRing it, and PDFing it. Yes it certainly does age. Quite poorly, in fact. The quality is absolutely terrible compared to the paper versions, some of it is stuck together, and indexing and cataloging it is a nightmare all of its own.

      Yes, there are challenges in the digital world too, but most are easily surmountable given a little bit of common sense in understanding that digital is not magic. It doesn't mean you can "fire and forget". The documents will still require maintenance, cataloging, protection and monitoring. Format obsolescence is very nearly a nonissue, it is blown way out of proportion. That's where the "maintenance" comes in. The key benefit of digital is that you can and should losslessly upgrade your format whenever obsolescence is becoming a concern. Formats do not disappear overnight and suddenly everyone forgets what to do with them, you have plenty of time to make your transition if you're paying attention (which you must be: again, digital is not magic).

    6. Re:Glass plates will outlive the digital"backup" by FST777 · · Score: 2, Insightful

      By that time, other techniques will be available to copy the digital archive over. Heck, it might even be possible to make a copy of the digital data on glass plates, complete with descriptions of the protocol used.

      It's true that digitized data is more prone to failure than most analog carriers. The whole point is that digitized data is much easier to copy over and over again, without loss, independent of whatever carrier is used.

      --
      Free beer is never free as in speech. Free speech is always free as in beer.
  3. This sounds like a job for Google by MDMurphy · · Score: 2, Insightful

    Google provides views of the Earth, Moon and Mars, why not stars? If the information was made available for them to deliver to their users, they might be interested.

    1. Re:This sounds like a job for Google by CanSpice · · Score: 2, Informative

      Don't worry, it's coming. I've seen previews of Google Sky at a couple of astronomical conferences so far. Also, check out partner number four for the Large Synoptic Survey Telescope.

  4. A Million People With $5 by Stranger4U · · Score: 3, Insightful

    This seems like a great opportunity for either corporate sponsorship, or a grass-roots donation drive. In all honesty, $5 million isn't a whole lot of money for the likes of any real corporation, and it probably wouldn't be that hard to raise it through small donations from individuals. Especially if you could ascribe names to some or all of it. How would it feel to be able to personally identify which plates you paid to have scanned? (this image of the Crab Nebula brought to you by John Smith) I'm surprised Paul Allen or Richard Branson aren't all over this like stink on shit.

  5. Google by blhack · · Score: 4, Insightful

    I'm sure that a company like Google would be MORE than willing to fund a project archiving these. The positive press, the proliferation of their intended "do no evil/good guy/just another bunch of geeks" image, and having their name on a major scientific project would easily be worth the investment.

    --
    NewslilySocial News. No lolcats allowed.
  6. InfiniBytes by Doc Ruby · · Score: 4, Informative

    contains more than a petabyte of data

    Glass photographic plates, especially from silver emulsion, are analog at extremely fine granularity. Effectively molecular, depending on how flat the glass surface was settled from its molten liquid state. The features of its silver oxide crystals, laid in place by individual photons arriving from vastly distant stars, could be meaningful at less than a nanometer. Especially when measuring extremely subtle influences, like the gravity from one distant star bending the light of another distant star, measured across a century in which those stars lost gravitational mass, for comparison.

    There is a practically infinite amount of data on each of those plates, limited by our precision in measuring them. It's a smaller degree of infinity than that of the sky. But the original infinite sky is lost, while the plates' lesser infinities are impossible to replace, and they are all we'll get to use to look back across all the billions of years we saw in a long century of them.
    --
    make install -not war

    1. Re:InfiniBytes by modecx · · Score: 4, Insightful

      There is a practically infinite amount of data on each of those plates, limited by our precision in measuring them.

      And limited by the lenses/mirrors, and limited by atmospheric effects, and inconsistencies in the glass, and the silver, and, and....

      I can't testify to the quality of the glass negatives, but I can testify to the fact that as much as people like to believe, even the best modern analog capture sources aren't anywhere near practically infinite, even in the best laboratory conditions.

      --
      Constitutional rights may be respected, repealed, or modified; but they must never be ignored.
    2. Re:InfiniBytes by Doc Ruby · · Score: 2, Insightful

      Well, the lenses/mirrors that are now lost to history do introduce noise. But the atmospheric effects, and inconsistencies in the glass and silver, and probably much of the "writing" noise from the optics do all hold the possibility of being filtered out. Maybe not now, with today's early signal processing tech. But in another hundred or more years, that signal info could be available. If we don't damage them in the interim.

      --
      make install -not war

    3. Re:InfiniBytes by Doc Ruby · · Score: 2, Informative

      In another hundred years that kind of data collection will probably be easy. But still extremely valuable, because the data recorded in them is irreplaceable.

      If the astronomers who recorded these plates weren't anal, then astronomy wouldn't be advanced enough by now for you to enjoy it as an amateur.

      --
      make install -not war

    4. Re:InfiniBytes by monopole · · Score: 2, Insightful

      Having worked with holographic media for decades (which is about as fine a resolution as you can get optically), I can say the maximum resolution is on par with the grain size, 40 nm (Agfa 8E75), and considerably worse in practice, due both to the wavelength of light and to the expansion of grains during exposure. To get 'molecular' resolution you'd have to go over to dichromate plates, which are far too slow.

      Due to speed considerations the grain of these plates would be much worse, but still well within the resolution of the 'scope used for recording.

      All that said, these plates are a goldmine once digitized, due to the ability to do massive searches both spatially and temporally.

  7. That's at least... by TheSHAD0W · · Score: 2, Funny

    a thousand minibuses full of magtape.

  8. Re:Why does it have to cost so much? by JohnnyGTO · · Score: 3, Insightful

    Those plates, as well as being old and delicate, contain a LOT more information than a piece of paper. Considering that something less than 1/4 the size of the period at the end of this sentence is important, you're scanning at a much higher resolution.

    --
    Si vis pacem, para bellum! For evil to succeed good men need only do nothing!
  9. Re:Backup? by TheSHAD0W · · Score: 2, Interesting

    How about we make a backup of the backup on glass plates...

    Ack! Put down that knife!

  10. sounds familar by Anonymous Coward · · Score: 4, Funny

    anyone with a spare $5 million or so would be welcomed with open arms

    That's what she said!
  11. Re:Would Google archive it, perhaps? by ScrewMaster · · Score: 2, Insightful

    Google might do it just because it would be un-evil, and worth quite a few brownie points with scientists around the globe, not to mention that it would be a cool archive to search.

    --
    The higher the technology, the sharper that two-edged sword.
  12. Re:Why does it have to cost so much? by ghostlibrary · · Score: 5, Informative

    > I spearheaded a "digital backup" of around 90 filing cabinets of papers ...
    > It took 2 years and way WAY WAY less than $5,000,000 to do it

    500,000 plates. Over 2 years, assuming 50 wks/yr means just 5000 plates need be scanned per week. 1000 plates per day. 125 plates per hour. And this is large, fragile glass with really high data density, so you have to be a) careful in handling and b) use slow high-res scanning.

    Let's take a guess that it takes only 10 minutes per plate (to fetch, tag, load, scan, and return). So we need only 20 people to scan 125 plates/hour.

    Well, assume 20 scanning people and 1 IT guy handling the sysadmin work for the petabyte storage. Also one scientist/manager. Take a low intern/grad student $35k, 1 sysadmin at $65k, 1 PM/sci at $85K. All x2.5 for overhead, for 2 years. That's $4.25 mil in salaries.

    There's also buying a redundant petabyte and all the necessary gear. I'm amazed they figure $5mil can do it.

    --
    A.
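
    The estimate above is easy to re-run with different guesses. A rough sketch that reproduces the same arithmetic; the 5-day week and 8-hour day are assumptions implied by the 5000/week, 1000/day, 125/hour figures rather than stated in the comment, and the function names are made up for illustration.

```python
import math

PLATES = 500_000
YEARS = 2
WEEKS_PER_YEAR = 50
DAYS_PER_WEEK = 5       # assumption: implied by 5000/week -> 1000/day
HOURS_PER_DAY = 8       # assumption: implied by 1000/day -> 125/hour
MINUTES_PER_PLATE = 10  # the comment's guess for fetch, tag, load, scan, return


def required_rate_per_hour(plates=PLATES, years=YEARS):
    """Plates that must be scanned per working hour to finish on time."""
    working_hours = years * WEEKS_PER_YEAR * DAYS_PER_WEEK * HOURS_PER_DAY
    return plates / working_hours


def scanner_staff_needed(rate_per_hour, minutes_per_plate=MINUTES_PER_PLATE):
    """People needed if each person handles one plate at a time."""
    plates_per_person_hour = 60 / minutes_per_plate
    return math.ceil(rate_per_hour / plates_per_person_hour)


def salary_bill(scanner_staff=20, years=YEARS, overhead=2.5):
    """Total salary cost using the comment's pay figures and overhead factor."""
    base = scanner_staff * 35_000 + 65_000 + 85_000  # scanners + sysadmin + PM
    return base * overhead * years


rate = required_rate_per_hour()     # 500,000 plates / 4,000 hours = 125.0
staff = scanner_staff_needed(rate)  # 125 / 6 per person-hour, rounded up = 21
bill = salary_bill()                # 850,000 * 2.5 * 2 = 4,250,000.0
print(rate, staff, bill)
```

    Rounding up gives 21 scanner operators rather than the comment's 20; the $4.25 million salary figure is reproduced with the comment's own headcount of 20.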
  13. Re:data/mass ratio by HappyEngineer · · Score: 3, Informative

    It depends on the number of pounds in a ton, but if it's short tons then

    165 short tons = 149,685,482 grams
    1e15 / 149,685,482 = 6,680,674 bytes per gram

    A quick check of Amazon turns up a 1TB drive which weighs 2.4 pounds.
    That's 1,089 grams which is 918,592,757 bytes per gram.

    Unless I've messed up my math, it looks like hard drives store 137 times more information per gram. That's not as large a multiple as I had imagined though. The whole thing should still be between 1 and 2 tons when put on hard drives.
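
    The math above holds up. A short sketch that repeats it, assuming short tons, a petabyte of 1e15 bytes, and the 2.4-pound 1TB drive from the comment; the conversion constants are the standard gram equivalents of the pound and the short ton, and the variable names are made up for illustration.

```python
GRAMS_PER_POUND = 453.59237
GRAMS_PER_SHORT_TON = 2000 * GRAMS_PER_POUND  # 907,184.74 g

PLATE_BYTES = 1e15   # "more than a petabyte", taken as exactly 1 PB
PLATE_TONS = 165
DRIVE_BYTES = 1e12   # 1 TB drive
DRIVE_POUNDS = 2.4


def bytes_per_gram(total_bytes, grams):
    """Storage density by mass."""
    return total_bytes / grams


plate_density = bytes_per_gram(PLATE_BYTES, PLATE_TONS * GRAMS_PER_SHORT_TON)
drive_density = bytes_per_gram(DRIVE_BYTES, DRIVE_POUNDS * GRAMS_PER_POUND)
ratio = drive_density / plate_density

# Mass of enough 1 TB drives to hold the whole collection, in short tons.
drives_needed = PLATE_BYTES / DRIVE_BYTES
drive_tons = drives_needed * DRIVE_POUNDS * GRAMS_PER_POUND / GRAMS_PER_SHORT_TON

print(f"plates: {plate_density:,.0f} bytes/g")
print(f"drives: {drive_density:,.0f} bytes/g")
print(f"ratio:  {ratio:.1f}x, {drive_tons:.2f} short tons of drives")
```

    With these inputs the drives come out roughly 137 times denser by mass, and a thousand of them weigh a little over one short ton, consistent with the "between 1 and 2 tons" figure.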

  14. All this stuff should be digitized and made public by syousef · · Score: 2, Insightful

    When I completed my Astronomy masters access to publicly available data from various sources (most notably NASA data made free to the public) was a real boon. It meant we could do analysis on actual real data instead of artificial or sanitized textbook material. A couple of the students built on this to do some original research. (Sadly that's not the way I went, as my time was more limited).

    There are also lots of amateurs out there running a wide variety of very specialized packages to do everything from discovering asteroids to keeping tabs on the brightness of stars and watching for supernovae.

    --
    These posts express my own personal views, not those of my employer
  15. Harvard can handle the burden by tchdab1 · · Score: 4, Informative

    From here: http://www.hno.harvard.edu/guide/finance/index.html,

    This:
    Harvard University's endowment, valued at $25.9 billion at the end of FY 2005, is a collection of more than 10,800 separate funds established over the years to provide scholarships; to maintain libraries, museums, and other collections; to support teaching and research activities; and to provide ongoing support for a wide variety of other activities. The great majority of these funds carry some type of restriction.

    I think they can scare up the change.

    1. Re:Harvard can handle the burden by Anonymous Coward · · Score: 2, Informative

      Absolutely correct. According to records, Harvard saw their endowment fund appreciate over 16% in a single year (FY2005). Sixteen percent of $30 billion is nearly $5 billion which would allow them to quite easily fund this project. Even if Harvard has the fund invested in an interest-bearing account at 5%, they're still seeing around $1.5 billion per year in interest income - something more than $4 million per day. This project is chump change.
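
      The figures in this reply can be rechecked in a few lines. A small sketch using the reply's round $30 billion; the parent comment quotes $25.9 billion for FY 2005, so both are shown. The $5 million project cost comes from the story summary, and the helper name `annual_income` is made up for illustration.

```python
PROJECT_COST = 5e6  # the "$5 million or so" from the story summary


def annual_income(endowment, rate):
    """Simple one-year return on an endowment at a flat rate."""
    return endowment * rate


for endowment in (30e9, 25.9e9):
    appreciation = annual_income(endowment, 0.16)  # the quoted 16% year
    interest = annual_income(endowment, 0.05)      # the hypothetical 5% account
    per_day = interest / 365
    share = PROJECT_COST / interest                # project as a fraction of one year's interest
    print(f"${endowment / 1e9:.1f}B: 16% = ${appreciation / 1e9:.2f}B, "
          f"5% = ${interest / 1e9:.3f}B/yr, ${per_day / 1e6:.2f}M/day, "
          f"project = {share:.2%} of a year's interest")
```

      On the $30 billion figure the reply's numbers check out ($4.8 billion, $1.5 billion per year, just over $4 million per day). On the $25.9 billion figure the daily number drops to about $3.5 million, which does not change the conclusion.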

    2. Re:Harvard can handle the burden by moosesocks · · Score: 2, Interesting

      They might, but I doubt it. Unless they could potentially turn it into a media blitz, I genuinely doubt that Harvard (or any private institution for that matter) would pick up this sort of project.

      If they did, they'd keep it private, and only share it amongst other institutions "prestigious" enough to be deserving of the blood and sweat of Harvard scientists.

      I'm sorry, but the Ivy League has quickly degenerated into a billionaire's playground. If they turn away thousands of "perfectly qualified" applicants per year, and have all this money lying around, there are very few legitimate reasons not to capitalize on this, build up their capacity, and start being equitable about who gets to study/work there.

      The Ivy League has become a game of prestige, and nothing more. I don't trust them with vital bits of science that could potentially go toward the public good. They've tarnished the name of academia.

      --
      -- If you try to fail and succeed, which have you done? - Uli's moose
  16. Re:Would Google archive it, perhaps? by networkBoy · · Score: 4, Funny

    Google Universe (beta)
    Searchable in Lat/Lon/time/intensity
    that would be awesome...

    --
    whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
  17. A great idea. by niktemadur · · Score: 4, Interesting

    If they manage to standardize a century of these plates, it would significantly extend the time range of data to digitally extrapolate and detect objects previously missed. Just to speak of mapping our own cosmic backyard, a significant amount of slow moving, previously undetected Kuiper Belt Objects, for example, would more easily pop into view. Surely a bunch of comets, too.

    Clyde Tombaugh captured Pluto several times during his three decades long hunt for the elusive Planet X, but failed to put the pieces together. If he had had digital technology, he would have shaved off at least a decade of effort. So imagine all the extremely useful raw data still stored in those plates.

    --
    Lil' Thindime, lilting a lacrimose lament, krashes the kwaint konfines of Kokonino Kounty
  18. Luckily glass isn't a liquid.... by Joce640k · · Score: 2, Insightful

    Luckily glass isn't a liquid so they won't distort.

    --
    No sig today...
    1. Re:Luckily glass isn't a liquid.... by ajs · · Score: 3, Interesting

      Glass *is* a liquid (sort of), but it does not flow, which is what I think you were getting at.

    2. Re:Luckily glass isn't a liquid.... by ajs · · Score: 2, Informative

      Glass does flow...a look at 80-150 year old windows will show this.

      Please follow the link in the post you were replying to, and/or look this up on snopes. This is not true. 80-150 year old glass is simply warped due to less advanced manufacturing techniques, and often thicker at the bottom because window makers tended to place the thicker edge at the bottom.

  19. Re:Why does it have to cost so much? by geekyMD · · Score: 2, Insightful

    Holy crap dude, you just won the asshat of the year prize. Do you have any idea of the magnitude, delicacy, or importance of the data you're talking about? To say nothing of the needed precision when scanning.

    "I scanzord 90 filing cabinets of paper into teh computerz"

    You know what, I used to launch model rockets. It's really easy to make stuff go up. Just buy the kit, attach a little engine and off it goes. $30 easy! Freakin NASA I bet they're spending all of our tax dollars on pr0n.

    "cheapish 20megapixel camera" - Ever hear of the Hubble? I hear people like it for more than those weird nebulae pictures. I guess we should have just given one of those astronuts a Nikon and let him go to town. Much cheaper.

    And I guess we should use lossy compression, it's just empty space out there right? I bet we could get the infinite sky down to a couple hundred GB. (JPEG, it's for astronomy too!)

  20. 165 tons? My God by fredrated · · Score: 2, Funny

    that's astronomical!

  21. More than just a flat scan by CraterGlass · · Score: 4, Informative

    There is more to this than simply scanning a flat image. The emulsion on these plates is a three dimensional medium, and different data can be extracted depending on your focal depth into the emulsion. I believe David Malin did much pioneering work on this kind of thing, including the use of different layers for unsharp masking.

    There will be information in the plates that is not yet part of human knowledge, and a simple scan of one focal plane is not going to get it all.

    Certainly it is worth taking backup images of these plates in any way we know how, but we should remain aware that, as of today, no technology exists that will make exact duplicates of them, so great care should always be taken to preserve the originals.

  22. Re:Why does it have to cost so much? by Chief Camel Breeder · · Score: 2, Informative

    You need specialized scanning machines for astronomy. Office equipment doesn't do the job.

    • The plates have to be scanned in transmission, not reflection (they are photographic negatives).
    • You have to accurately measure the darkness of the plate in order to deduce the light intensity that fell on it. Office scanners only approximately measure the light and dark - enough for visual presentation, not enough to do maths with the result.
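
    To make the second point concrete: the "darkness" measured in transmission is usually expressed as optical density, D = -log10(transmitted / incident). A minimal sketch under that standard definition; turning density into the light intensity that fell on the plate needs a per-plate calibration curve, which is not attempted here. The function names are made up for illustration, and the bit-depth limit is an idealised bound, not a description of any particular scanner.

```python
import math


def optical_density(transmitted, incident):
    """Optical density of a plate region from a transmission measurement."""
    if incident <= 0 or transmitted <= 0:
        raise ValueError("light levels must be positive")
    return -math.log10(transmitted / incident)


def max_measurable_density(bit_depth):
    """Darkest density an ideal linear sensor of this bit depth can tell from black.

    The faintest nonzero reading is 1 count out of (2**bit_depth - 1).
    """
    return math.log10(2 ** bit_depth - 1)


print(optical_density(10.0, 100.0))   # 10% transmission
print(max_measurable_density(8))      # 8-bit linear sensor
print(max_measurable_density(16))     # 16-bit linear sensor
```

    An idealised 8-bit linear sensor tops out at a density of about 2.4, while 16 bits reaches about 4.8, which is one way to see why a reading that is fine for visual presentation can be too coarse to do maths with.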

    My colleagues in the UK had such a scanner. It was ~7 tonnes of metal, glass and electronics (heavy so as to be very stable), lived in its own building and needed several clever people to keep it running. Building one of these (or cloning one you already have so as to work faster) could cost a big chunk of the $5M.

    The scanner I knew took ~ 30 minutes to scan a plate. For the Harvard collection, choose between one scanner (which they may not have; otherwise why did they wait until now to start the project?) and a long project with a big salary bill, or multiple scanners, at extra capital cost, and less money for people.

  23. GoogleSky by 12357bd · · Score: 2, Insightful

    Seriously, let Google index not only that collection, but any stellar image information and launch GoogleSky.

    --
    What's in a sig?