Hardware To Archive/Manage Large Collection Of Images?
HarpoX asks: "Technology is quickly allowing digital cameras to produce images as good as conventional film while cutting time and costs. The archiving of negatives has long been accomplished, but the buildup of digital images is providing new problems to be solved. With the potential of accumulating a couple hundred gigabytes of images, how can one most efficiently deal with the archiving, storage, retrieval, and management of these assets? Tape drives offer good storage capabilities (20/40 GB DAT, 40/80 GB DLT) but seem to leave the management of these files burdensome or impossible. CD-ROMs offer versatile usage and management while being very cheap, but are so small in storage space (640 MB) that it doesn't seem worth the time. Networked hard drive space would seem to offer the most management possibility, but would it be less permanent and reliable than static media? Is there some combination of several media that could work together and maximize productivity?"
There is software that does this on the DB side. The company that sells it could probably suggest a good way to store it as well.
We have looked at doing a similar thing because we produce 200 GB of images and movies every few months. Our web group and marketing group want to get at this data later but don't know exactly what they want till they see it.
We are still looking at different solutions for this. Some suggestions have been SANs (storage area networks) or other such things. These support terabytes of data, and as long as the filesystem is in an order that makes sense, jumping to projects can be automated with shell scripts.
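A rough sketch of the "jump to a project" idea, written in Python rather than shell. It assumes each project lives in its own directory somewhere under a single archive root; the function name `find_project` and the layout are made up for illustration.

```python
from pathlib import Path


def find_project(root, name):
    """Return every directory under root whose name is exactly `name`.

    Assumes one directory per project; the layout above it (by year,
    by department, etc.) does not matter because the search is recursive.
    """
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_dir() and p.name == name
    )


if __name__ == "__main__":
    # Hypothetical archive root; replace with a real mount point.
    for hit in find_project("/archive", "trade_show"):
        print(hit)
```

On a very large tree a recursive walk gets slow, so a cached index of directory names may be worth having once the archive grows.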
For databases, I know that Informix has Media360. I think that Oracle might have the same. I also know that there are 3rd party apps out there, but I can't think of the name.
-I just work here... how am I supposed to know?
Look at every other question of data storage. There are simple and easy answers; storing pictures is no different from storing documents.
In your case, it appears you want:
Instant access
Lots of space
Cheap storage
In a typical situation, you use large hard drives for instant access, in a RAID mirror for data integrity and safety. You also get a large-capacity WORM drive for backups. Tape drives are really your best choice.
In any storage situation, you face four questions:
How much storage do you need online?
How much storage do you need nearline?
How much storage do you need offline?
How important is the data?
Online storage would be a hard drive. You access this data weekly. (Fast, Big)
Nearline storage would be a tape changer, reel to reel, etc. This isn't used widely now as online storage is so inexpensive. You access this data several times a year. (Slow, Big)
Offline storage would be data stored on media that is handled by a person: changing the tapes, putting in a CD, DVD-RAM, etc. This is used primarily for backup and archive. You rarely access this data. (Slow, Labor required, Big)
The main reason to use nearline and offline storage was that they were less expensive than online. This is no longer the case unless you plan an online SCSI RAID.
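The online/nearline/offline split above can be sketched as a simple rule on how often data is accessed. The thresholds here (weekly or more is online, at least a couple of times a year is nearline) are my own reading of the descriptions, not fixed definitions, and `suggest_tier` is a hypothetical name.

```python
def suggest_tier(accesses_per_year):
    """Map an expected access frequency to a storage tier.

    Thresholds are assumptions for illustration:
      - about weekly or more (>= 52 per year): online (hard drive)
      - several times a year (>= 2 per year):  nearline (tape changer)
      - anything rarer:                        offline (media on a shelf)
    """
    if accesses_per_year < 0:
        raise ValueError("accesses_per_year must not be negative")
    if accesses_per_year >= 52:
        return "online"
    if accesses_per_year >= 2:
        return "nearline"
    return "offline"


if __name__ == "__main__":
    for rate in (200, 4, 0):
        print(rate, "accesses/year ->", suggest_tier(rate))
```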
The final question is backup. As you indicate, CD is too small. You can get tape drives that manage 40 GB per tape; this would probably be your best bet. Remember to keep a complete backup off-site (fire, disaster, etc).
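To put numbers on "CD is too small", here is the media count for one full copy of the collection. It assumes the "couple hundred gigabytes" from the question means 200 GB, uses decimal units (1 GB = 1000 MB), and ignores compression and filesystem overhead.

```python
def media_needed(collection_mb, media_mb):
    """Number of media units needed to hold the collection (rounded up)."""
    return -(-collection_mb // media_mb)  # integer ceiling division


COLLECTION_MB = 200 * 1000   # assumed 200 GB collection
CD_MB = 640                  # CD capacity quoted in the question
TAPE_MB = 40 * 1000          # 40 GB tape

if __name__ == "__main__":
    print("CDs:  ", media_needed(COLLECTION_MB, CD_MB))    # 313
    print("Tapes:", media_needed(COLLECTION_MB, TAPE_MB))  # 5
```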
For your situation, I would again recommend a huge online RAID (40GB IDE drives in a mirror and striped configuration), and an automatic tape changer backup.
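For sizing the mirrored-and-striped array, the usual rule is that mirroring halves the raw capacity. A small sketch under that assumption (same-size drives, no hot spares, formatting overhead ignored); the function names are made up.

```python
def raid10_usable_gb(drive_count, drive_gb):
    """Usable capacity of a striped set of mirrored pairs."""
    if drive_count < 4 or drive_count % 2:
        raise ValueError("need an even number of drives, at least 4")
    return (drive_count // 2) * drive_gb


def raid10_drives_needed(target_gb, drive_gb):
    """Smallest even drive count whose usable capacity covers target_gb."""
    pairs = max(2, -(-target_gb // drive_gb))  # ceiling division, min 2 pairs
    return 2 * pairs


if __name__ == "__main__":
    # 200 GB of images on 40 GB IDE drives
    n = raid10_drives_needed(200, 40)
    print(n, "drives give", raid10_usable_gb(n, 40), "GB usable")
```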
-Adam
Plaid ribbon campaign against code commenting:
If it was hard to write, it should be hard to understand.
If you build your database correctly you shouldn't have a problem with blobs. They won't affect searching or caching since your queries will be against the non-blob fields. The really nice thing is that you can use the database backup utilities to save all of your data.
You would have to build a front end to fetch and insert pictures but that shouldn't be too hard and you'll probably want that anyway to support stuff like version archiving, etc.
One plus of having everything in the database is that you can fetch the data across the network without having to share a drive via Samba or NFS.
With the better databases you can even share the data via replication.
The downsides are that you would basically have to become a database administrator to take care of the system and stuff like that.
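A minimal sketch of the blob approach described above, using SQLite only because it ships with Python; the poster does not name a database, and the table and column names are invented. The point is that searches touch only the indexed non-blob columns, and the image bytes are read only when one picture is fetched.

```python
import sqlite3


def open_store(path=":memory:"):
    """Create the image table: searchable metadata plus one BLOB column."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS images (
               id      INTEGER PRIMARY KEY,
               project TEXT NOT NULL,
               caption TEXT,
               data    BLOB NOT NULL
           )"""
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_images_project ON images(project)"
    )
    return conn


def insert_image(conn, project, caption, data):
    """Store one image and return its new id."""
    cur = conn.execute(
        "INSERT INTO images (project, caption, data) VALUES (?, ?, ?)",
        (project, caption, data),
    )
    conn.commit()
    return cur.lastrowid


def search(conn, project):
    """Search on metadata only; the BLOB column is never selected here."""
    rows = conn.execute(
        "SELECT id, caption FROM images WHERE project = ? ORDER BY id",
        (project,),
    )
    return rows.fetchall()


def fetch_image(conn, image_id):
    """Return the raw bytes of one image, or None if the id is unknown."""
    row = conn.execute(
        "SELECT data FROM images WHERE id = ?", (image_id,)
    ).fetchone()
    return None if row is None else row[0]
```

A front end would call `search` to build a listing and `fetch_image` only for the picture the user actually opens.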
It's _really_ _really_ expandable (1 PCI adapter --> 8*16 devices), and configuration is relatively easy as well (unlike SCSI)...
and performance is quite nice. Stuff is more expensive (notably, the bus adapter and cables) but the disks are OK -- I've seen Seagate 9.1 GB hard drives (7200 RPM) for ~100.
willis/
disclaimer: I work for a fibre channel company
there is no thing
what else could you want?
Otherwise, I'd say DVD-ROM, since it would have true archive-type storage (read-only), and it's bigger than CD-ROM.
BLOBs are generally considered a bad thing by DB gurus. There are several reasons for this:
Of course, for a home photo album linked to a website you can probably get away with BLOBs. For the scale the question seems to be asking about (some kind of professional publication/agency) I don't think they'd be suitable.
.02k
Oh boy.
I'm about to embark on a multi-six-figure project to *fix* a bad image archiving system for a major vendor. What they did was this:
- Content is manufactured via scanning of microfilm to TIFF images
- The TIFFs are put through a QA process. Bad scans are re-done. All of this is done by *people* (the scanning too).
- Once out of QA, the images are burned to CD
- The CD storage is via several jukeboxes of CD reader/writers (2 read heads, 1 RW head). Each jukebox holds 200 CDs, I think. These are combined into CD towers with multiple jukeboxes in them.
- Access to the images is then managed by a piece of C++ code that knows how to drive the CD jukebox (it's huge, and not like a normal CD drive) and get data off of it.
This turns out to completely suck. Getting images off the drives is slow and buggy. The media chosen was far too cheap, for one thing. Springing for the good stuff, however, suddenly makes CDs much less attractive in terms of cost. We often see read errors, stuck drives, etc.
My company contracted to put this data out onto the Web. This turned out to be a nightmare. With unreliable and very slow hardware (remember, the CDs have to be moved into a reader from their position within the jukebox!), real-time delivery is just impossible.
By contrast, spinning disk is constantly decreasing in price. Sure, you pay a lot of money for an EMC unit, but it offers much more in terms of flexibility and expansion.
My client took the wrong road. Now they are faced with several months of I/O to get the CD data (1600 CDs worth) off the buggy drives and onto an EMC unit. Then they have to pay us to update our Web server software.
CDs are great for archives. They are very, very bad for any sort of access to the data. I wouldn't recommend a CD solution. Another point to mention is that no one seems to know what the shelf life of CD-Rs is... it would be a pity to have those turn into coasters in 5-20 years.
It's a strange world -- let's keep it that way
HarpoX -
You've actually touched on a few separate issues with managing digital images electronically...
I've spent the last few months developing such a system for my company. My company shoots on average between 450 and 500 rolls of 36-exposure film every month. Most of the projects can take anywhere from 6 months to several years, and those images need to be available until the project closes.
I looked at using digital cameras instead of standard film cameras. I ran into two issues. The first is file size. In some cases we need to blow the photo up to make presentations. In order to get enough detail to make an E-sized blowup of the photo, you're looking at a minimum of an 18 MB image. Figure 36 exposures per roll, 475 rolls of film, etc. and you'll be in the terabyte range for storage requirements. The second is that the cameras are expensive and our employees tend to drop them, get them run over, etc.
Yes, this means we scan in negatives! This really isn't as big a deal as it may seem. Some of the better film scanners can handle bulk scanning of negatives. Even a top-of-the-line Kodak roll scanner will only set you back about the price of two high-end digital cameras.
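The arithmetic behind the "terabyte range" figure, using the numbers above (36 exposures, 475 rolls a month, 18 MB per image). Decimal units are assumed (1 TB = 1,000,000 MB), and every frame is assumed to be kept at full size.

```python
EXPOSURES_PER_ROLL = 36
ROLLS_PER_MONTH = 475
MB_PER_IMAGE = 18
MB_PER_TB = 1_000_000  # decimal units, an assumption


def monthly_mb(rolls=ROLLS_PER_MONTH,
               exposures=EXPOSURES_PER_ROLL,
               mb_per_image=MB_PER_IMAGE):
    """Storage consumed by one month of shooting, in MB."""
    return rolls * exposures * mb_per_image


if __name__ == "__main__":
    per_month = monthly_mb()
    print("per month:", per_month, "MB")                    # 307800 MB
    print("per year: ", per_month * 12 / MB_PER_TB, "TB")   # 3.6936 TB
```

So roughly 308 GB a month, which passes a terabyte in the fourth month.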
The film system is developed using MySQL, Perl and Apache running on Slack.... Realizing that I could run into several hundred thousand images that need to be available online, I designed the system so that it could be split up. I didn't store the images within the database; instead they are simply stored in the filesystem. Right now the entire system runs on one server, but it could easily be split out to separate web, database and image servers if performance suffers.
As for archiving images, we only do it when a case closes. All indexing information within the database gets moved to an archived database and the images are moved off to CD-ROM and cataloged. That way, anyone can look them up and go get the images when needed. This is mainly a company procedural issue; your requirements probably differ. I wouldn't be comfortable with long-term storage of information on tape, especially if it's very long term, like forever. CD-ROM was the easiest solution for us, since it's also possible to get CD writers that are jukebox-like in nature. They can make a multi-CD set rather easily, simplifying long-term storage.
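A sketch of the "index in the database, images in the filesystem" layout. This is not the actual system: SQLite stands in for MySQL and Python for Perl, and the table, columns and function names are invented. The database keeps only a path relative to an image root, so the images can later move to another server by changing that root.

```python
import sqlite3
from pathlib import Path


def create_index(conn):
    """Index table: one row per photo, holding a relative file path only."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS photos (
               id       INTEGER PRIMARY KEY,
               case_no  TEXT NOT NULL,
               rel_path TEXT NOT NULL UNIQUE
           )"""
    )


def register_photo(conn, case_no, rel_path):
    """Record where a scanned image was written, relative to the image root."""
    cur = conn.execute(
        "INSERT INTO photos (case_no, rel_path) VALUES (?, ?)",
        (case_no, rel_path),
    )
    conn.commit()
    return cur.lastrowid


def photos_for_case(conn, image_root, case_no):
    """Resolve all photos of one case against the current image root."""
    rows = conn.execute(
        "SELECT rel_path FROM photos WHERE case_no = ? ORDER BY id",
        (case_no,),
    )
    return [Path(image_root) / rel for (rel,) in rows]
```

Pointing `image_root` at an NFS mount or a different machine's export is then the only change needed when the image store is split off.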
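Splitting a closed case into a multi-CD set can be sketched as below. This is an assumption about how one might do it, not a description of what the jukebox writer does: files are taken in order and a new disc is started whenever the next file would not fit. The 640 MB capacity is the figure from the original question.

```python
def plan_cd_set(files, capacity_mb=640):
    """Group (name, size_mb) pairs into discs, keeping the original order.

    Returns a list of discs, each a list of file names. Raises ValueError
    if a single file is larger than one disc.
    """
    discs = []
    current, used = [], 0
    for name, size_mb in files:
        if size_mb > capacity_mb:
            raise ValueError(f"{name} does not fit on one disc")
        if used + size_mb > capacity_mb:
            # Next file would overflow this disc: close it and start another.
            discs.append(current)
            current, used = [], 0
        current.append(name)
        used += size_mb
    if current:
        discs.append(current)
    return discs


if __name__ == "__main__":
    case_files = [("roll01.tar", 300), ("roll02.tar", 300),
                  ("roll03.tar", 100), ("roll04.tar", 500)]
    for number, disc in enumerate(plan_cd_set(case_files), start=1):
        print("disc", number, disc)
```

Keeping the files in order wastes some space compared with a packing heuristic, but it makes the catalog easier to follow because consecutive rolls stay together.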
Cheers! Mark
Remove the '_nospam' from my email address....