Large-Scale Video Archiving?
BondHeadGuy asks: "Ok, say you have 1000+ cameras emitting 30 frames/second worth of 640x480 grayscale video...and you have to store it indefinitely. What do you do? This is a real question, believe it or not. 30 frames/s * 300 KB/frame = 9 MB/s per camera. 100:1 video compression brings that down to ~90 KB/s. But 90 KB/s * 1000 cameras = 90 MB/s, or ~8 terabytes/day. Retrieval, though, can be essentially arbitrarily slow. Reliability should be good enough to not be annoying long term. Is there a solution that: has 8 TB/day storage capacity, can handle the 90 MB/s write speed, and lets you save some bucks on the (slow) read side?"
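For anyone who wants to poke at the numbers in the question, here is a quick sanity check. It is only a sketch: every input comes from the post itself, and the use of decimal units (1 MB = 1000 KB) is my assumption.

```python
# Back-of-envelope check of the figures in the question.
# Assumption: decimal units throughout (1 MB = 1000 KB, 1 TB = 1,000,000 MB).
FPS = 30
KB_PER_FRAME = 300          # 640x480 at 8 bits/pixel is roughly 300 KB raw
COMPRESSION = 100           # the 100:1 ratio assumed in the question
CAMERAS = 1000
SECONDS_PER_DAY = 24 * 60 * 60

raw_kb_s = FPS * KB_PER_FRAME                     # per camera, uncompressed
compressed_kb_s = raw_kb_s / COMPRESSION          # per camera, compressed
total_mb_s = compressed_kb_s * CAMERAS / 1000     # all cameras together
tb_per_day = total_mb_s * SECONDS_PER_DAY / 1_000_000

print(f"{raw_kb_s / 1000:.0f} MB/s raw per camera")
print(f"{compressed_kb_s:.0f} KB/s compressed per camera")
print(f"{total_mb_s:.0f} MB/s aggregate, {tb_per_day:.2f} TB/day")
```

Changing FPS, COMPRESSION or CAMERAS shows how quickly the daily total moves, which is what most of the replies below are arguing about.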
Although you are probably looking for a digital solution, don't overlook the solutions that already exist. Security camera VCRs (available at RadioShack et al.) can put 24 hours (or more) of video on a single VHS tape. Get a few VCRs (at $200 each), and a pallet of VHS tapes at Sam's Club, and you could record all the video you want!
Like a security camera in a stairwell or something? In that case you can use motion detection to start/stop recording and save well over 100:1. The choice of video codec is going to be important if it's for security (so faces, etc. can be recognised), but if not, you can crank the compression ratio up quite high on most codecs, especially the video codecs that do frame-by-frame motion differencing (i.e. not MJPEG).
300k/sec seems very excessive. You could try converting it all to mpeg4 with a DivX encoder (http://www.divx.com) and that should compress it right down. If you've got sound in there too, strip it out or at least convert to MP3.
You can do all this with a great program called VirtualDub (http://www186.pair.com/vdub/).
You've got mail. Pattern baldness. - Crow
Gonna be expensive,
How long does the data need to be stored for? Tape is good if indefinite storage is not a requirement. (Tape degrades fast... but is reusable.)
Terabyte tape libraries are fairly common. Check out any of the major datacenter manufacturers. Sun and HP both have a unit of about 7TB. But you're talking several 100k$ for a fully automated unit.
Cheapest route would be to go back to the dark ages. Buy a bunch of 100GB tape drives and lots of tape (70 tapes a day ain't bad). Hire a few minimum wage tape monkeys to change tapes on command. Set up an LED display or a big monitor for the computer to flash tape change commands on. (Old IBM trick)
Mark
What you're really looking at is some kind of Hierarchical Storage solution. Once you have determined how much data will be saved from the cameras each night, you can get a disk array big enough to hold it. That disk array is then attached to an HSM product, such as StorageTek's SAM-FS. The HSM software automatically copies the data on disk to tape and then removes it from disk, so new data can be stored on the costly disks. From then on, your OS and applications think the data is on disk, but in reality it's on tape. When the data is requested, the software automatically fetches it from tape and places it back on disk. This can be rather costly, however.
Be pragmatic and only archive 15fps. This cuts your archive media costs by ~50% no matter what solution you choose. 15fps should be adequate, although who knows your exact parameters.
Go with a FC solution - stay away from EMC, as they will try to sell you a massive Symmetrix for your needs. Sounds like you need a building block approach, one block a day. Doesn't need to be TOO fancy, eh?
Here are some options for FC disk storage:
- Sun T3
- EMC Clariion
- Compaq Storageworks
- HP VA7400 -- my fav
Just to warn you, you're looking at something on the order of 20k/day to operate this setup... now, I'm sure the price would go down QUITE a bit if you're purchasing 8-10TB a day, but even still, it's a huge cost.
I looked at a 10TB solution from the above vendors, and the cheapest I got it was $0.0425/MB!
A suitably large DLT library with a fairly large number of drives would probably do this. Couple it with some HSM (Hierarchical Storage Management software) and you're probably all set.
In terms of sizing, assuming you get 6MB/s per DLT drive, you'll need at least 15 drives. Go for 20. This gives you room to do cutovers, and the like. I'd recommend fronting this with a LARGE disk for scratch space (preferably solid state, but if that's not in the budget, a big old SCSI disk'll do.) You'll need a pretty hefty server to handle all this (at least a pair of Sun E450s for redundancy). You'll also chew through at least 200 tapes a day at a native capacity of 40G/tape.
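The sizing above is easy to redo for other drive types. A rough sketch, assuming sustained native throughput and no compression; the function name and parameters are mine, not from any vendor tool:

```python
import math

def tape_plan(total_mb_s, drive_mb_s, tape_gb):
    """Return (minimum drives, tapes per day) for a sustained write load.

    Assumes every drive streams at its native rate all day and that
    tapes are filled to native capacity (decimal GB).
    """
    drives = math.ceil(total_mb_s / drive_mb_s)
    gb_per_day = total_mb_s * 86_400 / 1000
    tapes = math.ceil(gb_per_day / tape_gb)
    return drives, tapes

# DLT figures from this post: 6 MB/s per drive, 40 GB native per tape.
dlt_drives, dlt_tapes = tape_plan(90, 6, 40)
print(dlt_drives, dlt_tapes)   # 15 drives, 195 tapes/day
```

That gives the bare minimum. The 20 drives recommended above leave five spare for cutovers, failures and the occasional restore.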
HOWEVER, this is by no means cheap. The very fact that you're talking about 8 terabytes a day should be a clue to that. The sort of tape archive, tape supply, and tape library you'll need is... vast. You're talking very high-end hardware here. You'll need a good cataloging system, and some serious software to maintain all this. You'll need to keep about 75% of your drives streaming all day every day. Tape costs alone will run to about 10k/day, let alone electricity, storage, maintenance and initial outlay. I'd venture a project like this is probably a $15 million outlay to do it right, with at least 2 full-time support staff and a budget on the order of $40k/day. But if you've got the money, go for it.
and you have to store it indefinitely.
Retrieval, though, can be essentially arbitrarily slow
Oh, so you're looking for a storage medium with infinite space but slow retrieval time?
Easy. Free-Space Medium.
Just use an extremely high gain antenna, a ton of power, and the space around us. Transmit the compressed data stream, aimed at a distant planetary body of your choosing. I would recommend something in the 100 light year range or so. Now, when the waves hit the body and are reflected back to earth, you will have what is essentially a 100 light year long piece of storage.
And when the waves get back to earth, the technology for terrestrial storage will be extremely inexpensive, and the reception equipment will be too.
I apparently forgot that sig != uptime...
Are we presuming that there will be action most of the time in all 1000+ cameras? If not, then why store the images where there is no action? I'm doing a similar digital surveillance thing, albeit much smaller. (7 cameras, 1 fps, only at night [when there's not much action anyway]) My images all go to JPEGs and I wrote a little C program to throw away all the "similar" images. My algorithm is somewhat convoluted, but more-or-less only keeps the images if they differ by more than a certain amount. I'm sure video compression schemes like MPEG would pretty much take care of this if you're storing to video format. Storing to JPEGs has the benefit of being able to easily pick out any time stamp, but I can't watch them like a video.
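The poster's C program isn't shown, so the following is not their algorithm. It is just a minimal sketch of the general idea (keep a frame only when it differs enough from the last one kept), assuming grayscale frames as flat byte strings and a made-up threshold:

```python
def mean_abs_diff(a, b):
    """Average per-pixel absolute difference between two equal-size grayscale frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def keep_changed(frames, threshold=8.0):
    """Yield (index, frame) for frames that differ enough from the last kept frame.

    The threshold is a hypothetical tuning value, not a recommendation.
    """
    last = None
    for i, frame in enumerate(frames):
        if last is None or mean_abs_diff(frame, last) > threshold:
            last = frame
            yield i, frame

# Tiny synthetic example: four 2x2 "frames" as flat bytes objects.
frames = [
    bytes([10, 10, 10, 10]),
    bytes([11, 10, 9, 10]),      # sensor noise only
    bytes([10, 200, 200, 10]),   # something moved
    bytes([10, 198, 201, 10]),   # still there, barely changed
]
kept = [i for i, _ in keep_changed(frames)]
print(kept)  # [0, 2]
```

Comparing against the last kept frame rather than the immediately previous one means slow drift (dawn, a door creeping open) eventually crosses the threshold instead of being discarded forever.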
A simpler solution might be to put a motion detector on each camera and have them only record when there is motion. Using your 100:1 image compression, you estimate 8 terabytes/day. I would expect you could get (warning: pulled from thin air) 1/10th of that by ignoring anything that isn't moving.
But then you have a quandary: was there really no motion at camera #469 at 12:30am, or was something just broken?
Are you capturing in digital format? Are you sure that your systems are even capturing at 30fps?
It's unlikely that any digital conversion device(s) would be able to handle the input from 1000+ cameras and then be able to get that data to a central storage location through a local network.... the bandwidth needed for something like that would be incredible (90 MB/s ??), perhaps even requiring its own separate gigabit fibre network. Even in a high-end server with several devices connected to it you'd be lucky to capture 20fps.
But if you were able to get it done, storage options would probably consist of some type of RAID array (with a HUGE number of disks to be able to hold 8TB/day).
Storing that much data indefinitely would require enormous rooms dedicated to storage devices, which may not be feasible. Storing data for a week or even a month would be a challenge in itself.
Having things in digital format is nice for indexing and fast retrieval, but in this case it may be too costly. Storing data on video tape may not be as fast or convenient, but it's much easier to store 2 twelve-hour tapes per day per camera than it is to set up and maintain 8 terabytes of hard drive space per day.
100:1 video compression brings that down to ~90 KB/s.
Very interesting problem, with one more very interesting challenge that hasn't been raised yet:
Because the video is streaming in 24/7, you'd have to build a real-time compression system that could handle the 9MB/s and produce a 100:1 ratio. You could perhaps distribute that across multiple machines/CPUs, or build a custom parallel hardware setup to handle the encoding, but at this scale, the overhead of everything might prevent you from reaching the essential criteria of real-time.
Does anyone know what the hardware requirements are for real-time encoding one 640x480 stream? Now, multiply by 1000.
If things aren't actually moving in most of the shots (ATM or warehouse surveillance cams, for example) then you'll be able to get far better than 100x video compression.
Also, how much of a factor is communication? Are the 1000+ cameras on a LAN or a WAN?
Any secondary logging going on here? Any metadata (ATM transactions, notes, etc.) that should be stored along with the media? Do you want to use this data for easier access? Is there any preprocessing (facial recognition)?
You mentioned recall could be arbitrarily slow, but if it's possible to speed it up with only small changes, is it worth it?
Feel free to ignore these questions. Largely I'm just curious about something you probably can't talk about, but then again as a systems engineer, I'd find it difficult to recommend a solution without knowing more factors that could impact on ways I can't think of until I know more factors...
Kevin Fox
Get a thousand TiVos. Why settle for AVI quality when you can see your terrorists and burglars in stunning MPEG-2?
Now you can make your own decision about helping him out (or not).
Since you did not state a retrieval time or storage/retention needs, I am going to offer two scenarios: one for long term, fast access storage, one for short term and/or slow access storage.
Storing 8TB/day for a long time with quick access would probably require a tape silo, which is essentially a tape library the size of a small house. StorageTek is one of the leaders in silos (And might be the only vendor making them these days.), and they make some pretty nice stuff. Their PowderHorn 9310 is a nice model for bulk storage and quick recovery. A downside to the silos is that they do not often handle DLT tapes, which can make it hard to use tapes outside of the library.
If you do not need fast access to the data, and have time to root through tapes for restores, just get a smaller tape library. Anything in the 50-100 tape range from ATL/Quantum, ADIC or Qualstar running SuperDLT drives controlled by Veritas NetBackup would give you an easy way to handle all the data. NetBackup has excellent archiving capabilities (i.e. record data, wipe data from disk), works on just about any platform out there, scales well, and keeps files in GNU tar format for easy access. As for storing the tapes themselves, if you have a small retention time just keep around a few hundred tapes to cycle through. If you need to store the data for a long time, get a few thousand tapes and a set of nice shelves to keep them on. If you do not have somewhere to store them, Iron Mountain does a great job storing data. I have worked with them before and toured one of their facilities, and I can vouch for them.
hitachi has several very large storage arrays that are very competitive with EMC last i checked. again, that is if you need it to be in digital format and need it to be online.
alex
I've looked into almost this exact problem (we had about 100 hours of full color video/day - broadcast quality)
You're going to have to get VERY friendly with your local "Storage Area Network" vendors. What we came up with as a best SHORT term solution was this - Store the video on video tape or DVD (depending on quality requirements - DVD is NOT broadcast quality), and then use multiple players - things like DVD jukeboxes/tape changers. They can either be manually loaded, or a robot. You then use a cache to store the video on a last in/last out basis if you need fast playback (assumption here - the most recently used tapes are most likely to be used again)
Encoding isn't that bad a problem - you just use multiple encoding stations - You say you have 1000 cameras - you're probably going to need better than 1000 encoding stations (don't forget spares) - you batch up 1/2 hour (for example) files and write those out to the SAN when you're done - while one station is encoding, the next is recording, and you batch the encoded file up into near-line storage, so you don't NEED real time
Storage is going to take space/money BIG MONEY - you're talking about 30 DVDs worth of data/day depending on your robots. Figure 1000s/day
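The playback cache described above sounds like what is usually called a least-recently-used (LRU) cache; that reading is mine, based on the stated assumption that recently used tapes are the ones most likely to be requested again. A minimal sketch, with a hypothetical `fetch` callback standing in for the jukebox or robot:

```python
from collections import OrderedDict

class ClipCache:
    """Keep the most recently used clips on fast disk; evict the stalest one."""

    def __init__(self, capacity, fetch):
        self.capacity = capacity
        self.fetch = fetch               # slow path: load from tape or DVD
        self._items = OrderedDict()

    def get(self, clip_id):
        if clip_id in self._items:
            self._items.move_to_end(clip_id)    # mark as most recently used
            return self._items[clip_id]
        data = self.fetch(clip_id)              # cache miss: go to the robot
        self._items[clip_id] = data
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)     # drop the least recently used
        return data

loads = []

def fake_fetch(clip_id):
    """Stand-in for a tape/DVD load; records each slow fetch."""
    loads.append(clip_id)
    return f"data-{clip_id}"

cache = ClipCache(capacity=2, fetch=fake_fetch)
for clip in ["a", "b", "a", "c", "b"]:
    cache.get(clip)
print(loads)  # ['a', 'b', 'c', 'b']
```

The second request for "a" is served from disk, while "b" has to be reloaded because it was the stalest entry when "c" arrived.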
Charlie
-- 73 de KG2V For the Children - RKBA! "You are what you do when it counts" - the Masso
Obviously this is some sort of security system that watches a large amount of space. So we are talking either a Casino or park of some kind. If not, then these are the people to ask.
:)
Also, is keeping all of the footage forever a requirement? Or just some of the footage? I would think you may want to keep the footage for a couple of days or weeks at most. If something requires footage to be kept long term then you would move that from the hard drive to CD-ROM or DVD.
This is a job for a cluster of iMacs if I ever saw one
This is not the sig you are looking for...
Plus it can store 12TB clustered.
Since I thought the problem originally specified usage of 8 TB/day, stored indefinitely, I can't really see how this solution could work, as you would quickly overrun capacity, and I suppose buying new machines every couple of days is not an option.
This is actually a pretty easy question to answer:
Don't Do It.
This is someone either playing a theoretical game (in which case, the answer is "outsource it") or it's someone who has no idea what they really want. You have, ultimately, many conflicting specs here.
You may as well ask for a space shuttle that can fly to Pluto in two minutes with no fuel.
Any system that is recording a thousand video inputs is unlikely to need 30 fps for 24/7 (I can't think of anything short of national security installations that would even desire to record 30 fps 24/7, and you'd still have trouble justifying 1000 cameras to cover every building in Washington, DC). Not to mention the logistical implications of DELIVERING 1000 full-frame video feeds to a central location -- you could saturate the entire radio spectrum for the eastern seaboard or have to build the largest gigabit LAN ever deployed.
If you have a real question, please ask it, but this is as bad as a pointy-headed boss spouting off insane specs as the "requirements" for a project because he wants to be on the cutting edge.
And BTW, you won't need 300k per frame for a grayscale 640x480 video image (except that you desire insane specs, which point we've already covered). A fine quality image could be stored in 25-50k, even less depending on the real needs (of which this project seems to lack).
Recursive: Adj. See Recursive.
Do you work for that new "Homeland Security" agency??
So what HSM means is Hierarchical Storage Management. Basically, when a file hits a threshold of time, space or whatever, it will take that file and put it to tape. Then, it will replace the original file with a stub of a file that says 'when this file is needed, it's located here!'
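To make the stub idea concrete, here is a toy sketch. It is not how any real HSM product stores its stubs: the `HSM-STUB:` marker, the function names and the use of a plain directory as the "tape" tier are all assumptions for illustration.

```python
import os
import shutil
import tempfile
import time

STUB_PREFIX = "HSM-STUB:"   # hypothetical marker, not a real product's format

def migrate(path, archive_dir, max_age_s, now=None):
    """Move a file older than max_age_s to archive_dir and leave a stub behind."""
    now = time.time() if now is None else now
    if now - os.path.getmtime(path) < max_age_s:
        return False                        # too young, stays on disk
    target = os.path.join(archive_dir, os.path.basename(path))
    shutil.move(path, target)
    with open(path, "w") as stub:
        stub.write(STUB_PREFIX + target)    # 'it's located here!'
    return True

def read_through(path):
    """Return file contents, recalling from the archive if path is a stub."""
    with open(path, "rb") as f:
        data = f.read()
    marker = STUB_PREFIX.encode()
    if data.startswith(marker):
        target = data[len(marker):].decode()
        os.remove(path)                     # drop the stub
        shutil.move(target, path)           # recall the real data to disk
        with open(path, "rb") as f:
            data = f.read()
    return data

# Demo in a throwaway directory; "tape" is just another folder here.
with tempfile.TemporaryDirectory() as root:
    disk, tape = os.path.join(root, "disk"), os.path.join(root, "tape")
    os.mkdir(disk)
    os.mkdir(tape)
    clip = os.path.join(disk, "cam0001.mpg")
    with open(clip, "wb") as f:
        f.write(b"video bytes")
    migrated = migrate(clip, tape, max_age_s=60,
                       now=os.path.getmtime(clip) + 3600)
    on_tape = os.path.exists(os.path.join(tape, "cam0001.mpg"))
    recalled = read_through(clip)
```

A real HSM does this below the filesystem so applications never see the stub; the sketch only shows the migrate/recall bookkeeping.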
Now, for tape storage, I highly recommend going with LTO as a tape format. You might consider doing SCSI LTO tape drives with a Crossroads 4450 connected to Brocade switches to make a SAN as well. By putting it on a SAN, you'll have the ability to spread it across the clusters that you'd be putting in. LTO can spool data at about 10-20 MB/sec. Hence, if you get an STK or IBM storage library with LTO, you can fit around 20 drives in there, and do 200 MB/sec. Plus, LTO has variable speed when writing to it, so it's better than DLT in that regard. Not to mention LTO's 100 Gig native capacity and a better compression ratio than DLT. (2.0 vs 2.2) Then, it's just a matter of cycling tapes through. If you're honestly talking that high amount of data to keep INDEFINITELY, then you might want to look at STK's PowderHorns, which hold around 2000 tapes. Plus, you can always add another wall of tapes if you're not getting the throughput you're expecting. Or you could look at some of the larger scale robots out there, but they don't support LTO tape format yet.
By doing the EMC SAN solution to an STK PowderHorn, you're looking at an enterprise level solution that will support you for years to come. Course, this comes from someone who's a vendor-neutral consultant with experience with similar technology, so YMMV. `8r)
Let us know how it goes!
Gonzo Granzeau
"Nothing the god of biomechanics wouldn't let you into heaven for.." -Roy Batty
/dev/null ???
Pros: Extremely high write speed
Cons: Hard to get data back out, but since "Retrieval... can be essentially arbitrarily slow" you can just re-film whatever it was that you missed. With the money you save on the video gear, you should have a nice little production budget, too...
OtakuBooty.com: Smart, funny, sexy nerds.
The casino industry is probably the most advanced in the business of surveillance... the average Vegas casino probably approaches the scale you're talking about already, however they probably don't archive indefinitely.
However, any information I've seen shows them to still be mostly analog capture for any storage, or at least digital-to-analog conversion for storage.
Although they probably won't talk about their security systems, they'd be a great resource.
MadCow.
I used to have a sig, but I set it free and it never came back.
Why 640x480? That's higher resolution than broadcast TV. Do you need that? Broadcast TV is 460x360. Capturing at that resolution will lose you detail, of course, but if it's detail you can lose, your storage requirement just dropped by about 46% (460x360 is roughly 54% of the pixels in 640x480).
And since you said retrieval can be "arbitrarily" slow, I'd look into using VHS videotapes--even if you store compressed digital on them--as a storage medium. They're slow as hell for retrieval, but the media might be cheap, especially compared to the likes of AIT and such.
That's an awful lot of data. Why exactly are you doing this? What is the application? Who are you working for? If you are working for/in the United States, does this application meet the requirements of the 4th and 1st Amendments to the Constitution?
Just a few minor questions.
sPh
Like we're going to tell you Mr. NSA/CIA/FBI/Big Brother. :)
Need Free Juniper/NetScreen Support? JuniperForum
Only 1000 cameras?
I mean 1000 cameras is only enough to put one camera into each home in a fairly small community. Most of the solutions I'm seeing posted so far don't scale up very well. What if you need multiple cameras per home? And what to do about large cities? Maybe this should be a separate Ask Slashdot question?
I'll see your senator, and I'll raise you two judges.
That'd be a storage nightmare.
I don't think so.
Let's assume one camera per VCR, full 30 fps. That's 3 8-hour tapes per day per camera, 3000 tapes a day from 1000 VCRs. 1000 VCRs should cost you $100,000 and take up one medium sized room (power and AC will need to be enhanced). 3000 tapes per day shouldn't cost more than $3000, or $1 million per year.
You'll only need a few tape monkeys at any given instant, because there'll be around one tape needing changing every 28 seconds. A day's worth of VCR tapes (assuming we pack them in boxes with NO room to spare and stack the boxes in blocks) will take up about 1.5 cubic meters or 50 cubic feet (based on 1x4x8 inches per tape, my rough estimate). That means for a year's worth of tape you need 550 cubic meters or 20,000 cubic feet, which is 3300 square feet if piled six feet high. 3300 square feet is about the floor-size of one big house.
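Those estimates hold up if you redo them. A quick check, using only the figures from this post plus standard unit conversions:

```python
# Rough check of the VHS logistics. Inputs are the poster's own estimates;
# only the unit conversion constants are added.
CAMERAS = 1000
TAPES_PER_CAMERA_PER_DAY = 3        # three 8-hour tapes
TAPE_IN3 = 1 * 4 * 8                # rough 1x4x8 inch tape size
IN3_PER_FT3 = 12 ** 3
FT3_PER_M3 = 35.3147
COST_PER_TAPE = 1.0                 # dollars, i.e. $3000 for 3000 tapes

tapes_per_day = CAMERAS * TAPES_PER_CAMERA_PER_DAY
seconds_between_changes = 86_400 / tapes_per_day
ft3_per_day = tapes_per_day * TAPE_IN3 / IN3_PER_FT3
m3_per_day = ft3_per_day / FT3_PER_M3
ft3_per_year = ft3_per_day * 365
floor_ft2 = ft3_per_year / 6        # stacked six feet high
cost_per_year = tapes_per_day * COST_PER_TAPE * 365

print(f"one tape change every {seconds_between_changes:.1f} s")
print(f"{ft3_per_day:.0f} ft^3 ({m3_per_day:.2f} m^3) of tape per day")
print(f"{ft3_per_year:,.0f} ft^3 per year, {floor_ft2:,.0f} ft^2 of floor")
print(f"${cost_per_year:,.0f} per year in tapes")
```

The exact results land slightly above the rounded figures in the post (about 56 cubic feet a day and just under 3400 square feet a year), which is close enough for a napkin.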
Question to original: Are you still sure you want to do this? If so you might be best off "spreading the load around", i.e. don't do it all in one place. There are a million convenience store cameras and VCR systems in the world, but they're not all in one place.
Off-hand I can only think of one thing that would handle 3,000 terabytes per year, and that's if the half million people using Morpheus donated 6 Gigabytes of space each year to your cause.
NASA also thought about this, all the way up to Petabytes.
Just pay everyone on Slashdot $10 to run a special version of Morpheus. Then just send everyone the feed from one camera. When you need to access it, just hope they're online.
Username taken, please choose another one.
Recently at a trade conference (Crestech) I saw a demo of a system some grad students had put together which paired a panoramic low-rez camera with a moveable hi-res camera. The panoramic would observe an area, and software would recognize areas of activity and focus the other camera on it. But this isn't really what the Ask Slashdot'er was looking for, obviously.
Freedom: "I won't!"
1000 monkeys, $4000000
1000 typewriters, $100000
1000 cameras, $30000 per day
Capturing the moment when one of the monkeys types the complete works of Shakespeare? Priceless.
"People that quote themselves in their signatures bother me" - athakur999
- Faster throughput
- Less mechanical wear on the tape
- higher density on the tape
Dumping to disk first will also mean that you can use off-the-shelf backup software (like NetBackup, mentioned before). Given the kind of money you're going to be spending, it's probably going to be worth it. If you can do it, alternate between dumping to one disk array and reading from another (better yet, go through three, so you have a bit of buffer). You get an effective increase in the efficiency of the drives if they're not seeking to write and read at the same time. Obviously there will be a real advantage to using RAID.
It would also be to your advantage to have multiple CPUs controlling the tape drives. Each one would have its own small farm. You should be able to have multiple CPUs feeding the drives on one tape library.
Given that you're going to need tape monkeys to feed the tape library, it may be worthwhile to not use a tape library, but I'd suggest that the drives be in at least small tape libraries. The reason for this is that a tape library can read the bar code on the back of each tape as it goes into the drive. Otherwise you are almost sure to have errors logging which tape was where/when. With the volume of tapes you're going through it could be absolute hell trying to track down a mis-labeled tape. A small tape library would also allow you to keep drives in more constant motion.
The last thing is to make sure that you have more drives than you 'really' need. With many drives in constant use they will break down from time to time. Make sure that you keep that in mind when you design both your hardware and your software. The horrid thing is that, if they're reasonably well built, failure could be clustered (identical drives with similar usage).
For 90MB/sec: Super DLT promises 10MB/sec, which means you'll need at least 10 drives. I'd budget for 15 drives -- it gives you some reserve capacity, and allowance for things like non-streaming and occasional drive failure (in spite of their lofty promises) and people who want to read the tapes. It's usually easier to get the budget for the extra drives when you start the project than after you run into the inevitable problems.
Sometimes boldness is in fashion. Sometimes only the brave will be bold.
Any video compression algorithms worth using for this kind of application do comparisons from one frame to the next, and only compress the differences (except for occasional reference frames.) Some of them do substantial motion compensation to model the differences, others don't. Many of them let you tweak the frequency of reference frames - is it every 10? Every 100? Do you need the ability to go backwards, or is smooth forward and clunky backwards good enough?
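The reference-frame trade-off can be put in numbers with a very crude model. The frame sizes below are hypothetical (30 KB for a reference frame, 2 KB for a difference frame) and real codecs are messier, so treat this as an illustration only:

```python
def gop_cost(interval, ref_frame_kb=30.0, diff_frame_kb=2.0):
    """Return (average KB per frame, worst-case frames decoded for a seek).

    interval is the number of frames from one reference frame to the next.
    Frame sizes are illustrative assumptions, not measurements.
    """
    avg_kb = (ref_frame_kb + (interval - 1) * diff_frame_kb) / interval
    worst_seek = interval    # decode from the reference frame forward
    return avg_kb, worst_seek

for n in (1, 10, 100):
    avg, seek = gop_cost(n)
    print(f"reference every {n:>3} frames: {avg:5.2f} KB/frame, "
          f"up to {seek} frames decoded per seek")
```

With these made-up sizes, going from every frame to every 10th frame cuts storage by a factor of about six, while going from 10 to 100 only roughly halves it again and makes random access ten times clunkier. Since retrieval can be slow here, a long interval is probably the right end of the trade.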
Very few locations actually generate much motion on a 24-hour basis, except for road traffic cameras, and I'd be extremely surprised to see an application need to store those on a long-term basis (as opposed to storing for a week or so in case there are traffic accidents - anything you need longer than that should probably be handled by license-plate recognizers.)
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks