Hardware To Archive/Manage Large Collection Of Images?
HarpoX asks: "Technology is quickly allowing digital cameras to produce images as good as conventional film while cutting time and costs. The archiving of negatives is a long-solved problem, but the buildup of digital images poses new ones. With the potential of accumulating a couple hundred gigabytes of images, how can one most efficiently deal with the archiving, storage, retrieval, and management of these assets? Tape drives offer good storage capacity (20/40 GB DAT, 40/80 GB DLT) but seem to make managing the files burdensome or impossible. CD-ROMs offer versatile usage and management and are very cheap, but hold so little (640 MB) that they don't seem worth the time. Networked hard drive space would seem to offer the most management possibilities, but would it be less permanent than static media? Is there some combination of several media that could work together and maximize productivity?"
Oh boy.
I'm about to embark on a multi-six-figure project to *fix* a bad image archiving system for a major vendor. What they did was this:
- Content is manufactured via scanning of microfilm to TIFF images
- The TIFFs are put through a QA process. Bad scans are re-done. All of this is done by *people* (the scanning too).
- Once out of QA, the images are burned to CD
- The CD storage is via several jukeboxes of CD reader/writers (2 read heads, 1 RW head). Each jukebox holds 200 CDs, I think. These are combined into CD towers with multiple jukeboxes in them.
- Access to the images is then managed by a piece of C++ code that knows how to drive the CD jukebox (it's huge, and not like a normal CD drive) and get data off of it.
This turns out to completely suck. Getting images off the drives is slow and buggy. The media chosen was far too cheap, for one thing. Springing for the good stuff, however, suddenly makes CDs much less attractive in terms of cost. We often see read errors, stuck drives, etc.
My company contracted to put this data out onto the Web. This turned out to be a nightmare. With unreliable and very slow hardware (remember, the CDs have to be moved into a reader from their position within the jukebox!), real-time delivery is just impossible.
By contrast, spinning disk is constantly decreasing in price. Sure, you pay a lot of money for an EMC unit, but it offers much more in terms of flexibility and expansion.
My client took the wrong road. Now they are faced with several months of I/O to get the CD data (1600 CDs worth) off the buggy drives and onto an EMC unit. Then they have to pay us to update our Web server software.
CDs are great for archives. They are very very bad for any sort of access to the data. I wouldn't recommend a CD solution. Another point to mention is no one seems to know what the shelf life of CD-Rs is... be a pity to have those turn into coasters in 5-20 years.
It's a strange world -- let's keep it that way
HarpoX -
You've actually touched on a few separate issues with managing digital images electronically...
I've spent the last few months developing such a system for my company. My company shoots on average between 450 and 500 rolls of 36 exposure film every month. Most of the projects can take anywhere from 6 months to several years, and those images need to be available until the project closes.
I looked at using digital cameras instead of standard film cameras. I ran into two issues - the first is file size. In some cases we need to blow the photo up to make presentations. In order to get enough detail to make an E sized blowup of the photo, you're looking at an image of at least 18MB. Figure 36 exposures per roll, 475 rolls of film a month, etc. and that's roughly 300 GB a month, so you'll be in the terabyte range for storage requirements within a few months. Also, the cameras are expensive and our employees tend to drop them, run them over, etc.
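To make that estimate concrete, here is a rough back-of-envelope sketch in Python. The figures (18 MB per image, 36 exposures, 475 rolls a month) come from the post above; the function names and the use of decimal units (1 TB = 1,000,000 MB) are my own assumptions, not part of the original system.

```python
# Back-of-envelope storage estimate using the figures quoted in the post.
MB_PER_IMAGE = 18         # minimum size quoted for an E sized blowup
EXPOSURES_PER_ROLL = 36
ROLLS_PER_MONTH = 475     # midpoint of the 450-500 rolls mentioned earlier
MB_PER_TB = 1_000_000     # assumption: decimal units


def monthly_storage_mb(rolls=ROLLS_PER_MONTH,
                       exposures=EXPOSURES_PER_ROLL,
                       mb_per_image=MB_PER_IMAGE):
    """Megabytes of new images produced in one month."""
    return rolls * exposures * mb_per_image


def months_to_reach(target_mb, per_month_mb):
    """Whole months needed before the archive reaches target_mb."""
    return -(-target_mb // per_month_mb)  # ceiling division on integers


if __name__ == "__main__":
    per_month = monthly_storage_mb()
    print(f"{per_month} MB per month")
    print(f"{months_to_reach(MB_PER_TB, per_month)} months to pass 1 TB")
```

If projects really do stay open for six months to several years, the online store has to hold every month's output for that whole span, which is what pushes the total well past the first terabyte.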
Yes, this means we scan in negatives! This really isn't as big a deal as it may seem. Some of the better film scanners can handle bulk scanning of negatives. Even a top of the line Kodak roll scanner will only set you back about the price of two high end digital cameras.
The film system is developed using MySQL, Perl and Apache running on Slack.... Realizing that I could run into several hundred thousand images that need to be available online, I designed the system so that it could be split up. I didn't store the images within the database; instead they are simply stored in the filesystem. Right now the entire system runs on one server, but it could easily be split out to a separate web, database and image server if performance suffers.
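A minimal sketch of that layout, for anyone who wants to try the idea: the database holds only index rows and a relative path, and the image files live under a root directory that can later move to its own server. This is not the actual system (which is Perl and MySQL); it uses Python with sqlite3 as a stand-in, and the table name, columns, and path scheme are all hypothetical.

```python
import sqlite3
from pathlib import Path


def open_index(db_path=":memory:"):
    """Open (or create) the index database. sqlite3 stands in for MySQL here."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        " id INTEGER PRIMARY KEY,"
        " project TEXT NOT NULL,"
        " roll INTEGER NOT NULL,"
        " frame INTEGER NOT NULL,"
        " rel_path TEXT NOT NULL UNIQUE)"   # path only, never the image bytes
    )
    return conn


def register_image(conn, project, roll, frame):
    """Record one scanned frame and return the relative path to store it under."""
    rel_path = f"{project}/roll{roll:04d}/frame{frame:02d}.tif"  # hypothetical scheme
    conn.execute(
        "INSERT INTO images (project, roll, frame, rel_path) VALUES (?, ?, ?, ?)",
        (project, roll, frame, rel_path),
    )
    conn.commit()
    return rel_path


def locate(conn, image_root, project, roll, frame):
    """Resolve an index entry to a full path under image_root, or None if unknown."""
    row = conn.execute(
        "SELECT rel_path FROM images WHERE project = ? AND roll = ? AND frame = ?",
        (project, roll, frame),
    ).fetchone()
    return Path(image_root) / row[0] if row else None
```

Because only the relative path is stored, moving the images to a separate image server means changing `image_root` (or the URL prefix the web server uses), not rewriting the index.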
As for archiving images, we only do it when a case closes. All indexing information within the database gets moved to an archived database and the images are moved off to CDROM and cataloged. That way, anyone can look them up and go get the images when needed. This is mainly a company procedural issue; your requirements probably differ. I wouldn't be comfortable with long term storage of information on tape, especially if it's very long term - like forever. CDROM for us was the easiest solution since it's also possible to get CD writers that are jukebox-like in nature. They can make a multi CD set rather easily, simplifying long term storage.
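If you don't have a jukebox-style writer doing it for you, splitting a closed case into a multi CD set is easy enough to plan by hand. Here is a rough Python sketch that assigns files to discs using a simple first-fit-decreasing pass. It is only an illustration: the 640 MB capacity is the figure from the original question, the function name is made up, and a real burn would need to leave room for filesystem overhead.

```python
CD_CAPACITY_MB = 640  # capacity quoted in the original question


def plan_cd_set(file_sizes_mb, capacity_mb=CD_CAPACITY_MB):
    """Group files into discs. file_sizes_mb maps file name -> size in MB.

    Largest files are placed first, each on the first disc with enough room
    (first-fit decreasing). Returns a list of discs, each a list of names.
    """
    discs = []  # each entry: {"free": remaining MB, "files": [names]}
    by_size = sorted(file_sizes_mb.items(), key=lambda item: item[1], reverse=True)
    for name, size in by_size:
        if size > capacity_mb:
            raise ValueError(f"{name} ({size} MB) does not fit on one disc")
        for disc in discs:
            if disc["free"] >= size:
                disc["files"].append(name)
                disc["free"] -= size
                break
        else:
            # No existing disc has room, so start a new one.
            discs.append({"free": capacity_mb - size, "files": [name]})
    return [disc["files"] for disc in discs]
```

The returned disc-by-disc file lists are also what you would write into the catalog, so that anyone looking up an archived case knows which CD to pull.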
Cheers! Mark