All that will happen is that the site in question will blacklist your scraping application. I work for a media organization, and we deal with this stuff all the time. It's far more cost efficient for us to simply whack the application than to try and track down the jokers. It's actually pretty trivial to nail an automated scraper: they're obvious on the logs.
So the few times I've had someone ask me to do this sort of scraping, my response is usually that sure, fine, it works, but it's very easy to spot on the logs, and the information is very likely to become unavailable at unpredictable intervals.
In the long run, it's usually pretty futile to scrape in the first place. When you're stealing content just to drive traffic, you tend to have a crappy site. The only time I ever did a professional scraping app that was "justified" and "legal", the victim was another business unit within the same corporation, and we had every right to the data that they "couldn't" compile for us.
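For what "obvious on the logs" can look like, here is a rough sketch, not anyone's production tooling. It assumes an access log where the client address is the first space-separated field, and it flags any address with more requests than a threshold. The function name, the threshold, and the sample lines are all made up for illustration.

```python
from collections import Counter


def flag_heavy_clients(log_lines, threshold):
    """Return client addresses that appear on more than `threshold` lines.

    Assumes the client address is the first space-separated field of
    each line, as in the common access-log layout.
    """
    hits = Counter(
        line.split(" ", 1)[0] for line in log_lines if line.strip()
    )
    return sorted(addr for addr, count in hits.items() if count > threshold)


# Made-up sample: one client hammering the site, one normal visitor.
sample = ["192.0.2.10 GET /article/%d" % i for i in range(500)]
sample += ["203.0.113.7 GET /", "203.0.113.7 GET /about"]

print(flag_heavy_clients(sample, threshold=100))
```

A real scraper hunt would also look at request timing and user agents, but a plain per-client count already catches the lazy ones.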
Proof of Concept; sad, but in Securityville this is actually used often enough that it would be considered a "normal" acronym. The debate usually revolves around the fact that a lot of PoCs are completely esoteric and can't be made into actual workable mass-market exploits.
Norton is itself a virus. It hogs resources, causes errors, and can't be removed without killing the host.
For what you pay, you should get something that is better than cheaper or free products available on the web...I usually replace Norton with AVG, and while I'm not a huge fan of AVG, I've never had anyone complain.
You've got a bunch of 4 year old drives in RAID 0? Jesus. I'd be afraid to run a defragger or reboot. The average service life of a hard drive these days is considered to be 3-5 years, so it would be a good idea to make a backup of anything you care about.
Yeah, you need to duplicate them. Might be easier to just buy one modern drive that is bigger than all the older drives combined, and copy all the data to the new drive.
At that point the MTBF was longer than the projected lifespan of the system, so we didn't bother. One of the drives we put in actually does have some kind of intermittent fault; I still get a blip on my logs every now and then.
Eh. The problem isn't how they made it. The game they made was okay, and it developed a nice little niche following.
And then WoW blew up, and they decided to try and be WoW, even though the game had been pretty much designed to be NOT WoW, at which point the whole thing caught fire, imploded, and shit itself into a grotesque mockery of life.
Look at Eve...Same era, also sci-fi themed, similarly geared toward the hardcore contingent, but Eve stayed true to itself and is quietly prospering.
What Blizzard does well is figure out what they want to do, and make it into a good game. What Sony (and EA) does well is try to figure out what will make them the most money in the shortest time.
Most of that was probably reel tape; no doubt that crap went to hell in record time...It's exposed to the air in multiple places, people actually TOUCHED chunks of it at various times...And if they're having mold issues, it's sitting in a humid warehouse somewhere.
Our stuff is in a nice climate controlled safe, and it's all DDS tape and newer, the sort that doesn't get crud in it in normal usage.
Sure, if you only need 16GB of info, then almost any backup solution will meet your needs. Sign up for a couple of Gmail accounts, and mail it to yourself. Pay Amazon 2 bucks a month to store it in S3...Hell, if you trust Amazon not to lose your data (debatable) they'd only charge 1,843.20 cents a month to store your 12TB (not counting the 1,228.80 they'd charge you when you uploaded it).
It's a problem of scale. 1GB is trivial. 1,024GB is difficult. 12,288GB is obscenely difficult. Reliable, redundant, offsite storage is nearly impossible for that quantity of data for anyone except a decent sized corporation. If you put together the amount of storage I deal with at work, it's between 10 and 20TB, but the amount we back up in the hardcore offsite manner is under 100 gigs.
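The S3 figures above can be reproduced with a few lines of arithmetic. This sketch assumes 1TB = 1,024GB and per-gigabyte rates of 0.15 for a month of storage and 0.10 for upload, which are the rates the quoted totals imply for 12,288GB (the totals work out in dollars). The constant and function names are just for illustration.

```python
STORAGE_RATE = 0.15   # USD per GB per month, implied by the 1,843.20 figure
TRANSFER_RATE = 0.10  # USD per GB uploaded, implied by the 1,228.80 figure


def s3_cost(terabytes):
    """Return (monthly storage cost, one-time upload cost) in USD."""
    gigabytes = terabytes * 1024  # binary terabytes, as used in the post
    monthly = round(gigabytes * STORAGE_RATE, 2)
    upload = round(gigabytes * TRANSFER_RATE, 2)
    return monthly, upload


monthly, upload = s3_cost(12)
print(f"12TB: ${monthly:,.2f} per month, plus ${upload:,.2f} to upload")
```

Run the same function with 16/1024 of a terabyte and the "2 bucks a month" for 16GB falls out too, which is the whole point about scale.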
The vast majority of Egypt's writings were stored on perishable papyrus, not carved or painted on stone. Of all that they ever wrote or stored, we have but the tiniest fraction remaining.
If we lost technology today, there would be nothing left but paper in 20 years. In a thousand, there wouldn't even be much paper.
Sure, right now. The first hard drive I ever bought was 8 megabytes and cost 600 dollars. 4 years ago I bought a 1GB USB flash drive for 300 dollars; now they're running 10-20 bucks.
In a few years solid state will be something I'm looking at VERY seriously. It has serious potential for long term storage. Yeah, it's too expensive...right now...But in the long run it's the most promising thing out there.
Yep. The drives we had were all sequential serial numbers...They were good drives, IBM Ultrastars, which were a benchmark for reliability before Hitachi came along, and the little bastards held up. We didn't lose any data (and we had a nightly backup, so no biggie), though the whole experience probably stripped a year off my life.
But I agree completely; I can't imagine trying to convince my boss to cycle out a few thousand dollars worth of working drives a year, even though it's the way it ought to be done.
If you've got an HD camcorder you can fill that up with three hours of video. I know people whose iPods have that much data on them.
I'm not saying everyone has multiple TBs of info lying around but 1TB isn't ridiculous these days, and 1TB is pretty much impossible for joe user to back up without using another hard drive.
I've got a mainframe circa 1984 that's been using the same type of drive since 1989. Last year we pulled all the year-end financial numbers off the yearly backups dating back to that point. Zero failed tapes.
Consumer-grade CDs and DVDs use a photosensitive dye to record information. It can degrade in anywhere from 2 to 5 years...Longer if you keep it in a cool dark place, but not 20 years.
I actually have an RRRAID...A redundant redundant redundant array of inexpensive disks. I may lose 1 RAID. I may even lose 2. But I probably won't lose 3. But that solution is WAY out of reach for the average consumer, and is only possible for me because the amount of data I have on hand doesn't change very quickly.
Even 1TB is a problem, and that is within the reach of consumers these days. And if you think your external HDD is protecting your data, you're crazy. Those are a single point of failure, and a failure is more likely on an external drive that gets moved around than on any internal drive. Beyond that, I'm sure your rotational policy is lax; everyone's is, so what you're really saying is you have some of your data backed up. Depending on how often you back up, you may only lose a month or two.
Wow, how incite-ful. Doesn't matter what the discussion is, some geek is bound to weigh in with all the shortcomings of any idea.
Newsflash: there is no perfect backup! No method is foolproof, especially when it's bound to be boring as hell, and you've got an inevitable human factor. You get lazy moving the tapes offsite, you put off fixing a dead drive because there are 4 others, you wipe your main partition upgrading your distro and forget that your cron rsync script uses the handy --delete flag, and BOOM, it wipes out your backup.
Shit happens. Pointing out what we all already know doesn't do anything helpful.
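The rsync --delete scenario is easy to reproduce in miniature. The sketch below does not call rsync; it is a small stand-in, with a hypothetical `mirror` function that copies top-level files and, when `delete=True`, removes anything in the destination that is no longer in the source. The file and directory names are made up.

```python
import tempfile
from pathlib import Path


def mirror(src, dst, delete=False):
    """Copy top-level files from src to dst.

    With delete=True, also remove files in dst that no longer exist in
    src, which is the part that bites when src has just been wiped.
    """
    dst.mkdir(parents=True, exist_ok=True)
    src_names = {p.name for p in src.iterdir() if p.is_file()}
    for name in src_names:
        (dst / name).write_bytes((src / name).read_bytes())
    if delete:
        for p in dst.iterdir():
            if p.is_file() and p.name not in src_names:
                p.unlink()


def demo():
    """Return the backup's file count before and after the wipe."""
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp, "home"), Path(tmp, "backup")
        src.mkdir()
        (src / "photos.tar").write_text("irreplaceable")
        mirror(src, dst, delete=True)   # normal nightly run
        before = len(list(dst.iterdir()))
        (src / "photos.tar").unlink()   # the distro upgrade wipes the source
        mirror(src, dst, delete=True)   # the next scheduled run
        after = len(list(dst.iterdir()))
        return before, after


print(demo())
```

The second run faithfully mirrors an empty source, so the backup ends up empty as well. Nothing malfunctioned; the script did exactly what it was told.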
Yeah, but DVD is transient crap. How long will those last? A few years? You cannot rely on home-burned optical media for long term storage, and while burning 12 terabytes of information onto one set of 1446 DVDs (double layer) may not seem like a big deal, having to do it every three years for the rest of your life is bound to get old.
For any serious storage you need magnetic media, and though we all hate tape, 5 year old tape is about a million times more reliable than a hard drive that hasn't been plugged in in 5 years.
So either you need tape in the sort of quantity that the private user cannot justify, or you're going to have to spring for a hefty RAID and arrange for another one like it as a backup. Offsite if you're lucky, but it's probably just going to be out in your garage/basement/tool shed.
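The 1446-disc figure checks out if you take 12TB as 12,288GB and a double-layer DVD as holding 8.5GB. Both conversions are assumptions read back from the number itself, and the 30-year horizon in the usage line is purely hypothetical.

```python
import math


def discs_needed(terabytes, disc_gb=8.5):
    """Discs required for one full copy, rounding up to whole discs."""
    return math.ceil(terabytes * 1024 / disc_gb)


def discs_over_time(terabytes, years, interval_years=3, disc_gb=8.5):
    """Total discs burned if the full set is redone every interval."""
    rounds = math.ceil(years / interval_years)
    return rounds * discs_needed(terabytes, disc_gb)


print(discs_needed(12))                # one full set
print(discs_over_time(12, years=30))   # re-burning every three years
```

That works out to 1446 discs per set and 14,460 discs over thirty years, assuming the data never grows.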
Now, what do you do if you can't rely on RAID? No other storage is as reliable and cheap as the hard drive. ZFS and RAID-Z may solve the problem, but they may not...You can still have failures, and as hard disk sizes increase, the amount of data jeopardized by a single failure increases as well.
The real issue is one that anyone who has ever had to recover a multi-drive array can tell you instantly: if one drive fails, and the other drive was bought at the same time, and has had a nearly identical usage pattern, the odds of the other drive failing are well above average.
I once had a single drive fail in a 24-disk array. The disks were arranged, RAID 5, in groups of 3, glued together by Veritas (from back before it got bought by crappy Symantec). By the time the smoke cleared we had replaced 19 out of 24 drives. They had all been bought at the same time, and as they thrashed rebuilding their failed buddies, they started dying themselves. The remaining 5 drives we replaced anyway, just because.
That's a worst case, but multiple failures are far from uncommon, and very few people correctly cycle in new drives periodically to reduce the chance of a mass failure.
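To put rough numbers on why a rebuild is the dangerous moment: a RAID 5 group loses data if any surviving drive dies before the rebuild finishes. The sketch below assumes each survivor fails independently with the same probability during the rebuild window. That is a simplification, since same-batch drives are not really independent, so the correlation is modeled crudely by plugging in a higher per-drive probability. The 1% and 20% inputs are hypothetical, not measured.

```python
def second_failure_probability(group_size, p_fail):
    """Chance that at least one surviving drive fails during a rebuild.

    group_size: drives in the RAID 5 group, including the dead one.
    p_fail: assumed per-drive failure probability over the rebuild window.
    """
    survivors = group_size - 1
    return 1.0 - (1.0 - p_fail) ** survivors


# Hypothetical inputs: fresh mixed-age drives versus a worn same-batch set.
for label, p in [("mixed ages", 0.01), ("same batch, thrashing", 0.20)]:
    risk = second_failure_probability(3, p)
    print(f"{label}: {risk:.1%} chance of losing the group")
```

Cycling in new drives periodically is, in these terms, a way of keeping the per-drive number low for at least some of the survivors.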
Yeah, because we all back up 12TB of home data to an offsite location. Mine is my private evil island, and I've bioengineered flying death monkeys to carry the tapes for me. They make 11 trips a day. I'm hoping for 12 trips with the next generation of monkeys, but they're starting to want coffee breaks.
I'm sorry, but I'm getting seriously tired of people looking down from the pedestal of how it "ought" to be done, how you do it at work, how you would do it if you had 20k to blow on a backup solution, and trying to apply that to the home user. Even the tape comment in the summary is horseshit, because even exceptionally savvy home users are not going to pay for a tape drive and enough tapes to archive serious data, much less handle shipping the backups offsite professionally.
This is serious news. As it stands, the home user that actually sets up a RAID 5 array is in the top percentile for actually giving a crap about home data. Once that becomes a non-issue, then the point has come when a reasonable backup is out of reach of 99% of private individuals. This, at the same time as more and more people are actually needing a decent solution.
2007 isn't that bad. The effing "x" formats are a P.I.T.A., but as per usual, the next Office version is a decent incremental upgrade, which will, in due course, be adopted by the business community at large.
If they followed the same sort of incremental, professional design philosophy with Windows, they wouldn't spend so much time having their user base frothing in hatred and rage.
I suppose you have a citation that proves that Dontdropthesoap California's zip code is not 10101-1010?
I'll wait.
Inmate #3L33T3
P.M.I.T.A Prison
Dontdropthesoap, CA, 10101-1010
4 years is too long. You need to start rotating in some new drives...Even the very best drives don't offer warranty replacement past 5 years.
Years, or weeks, depending. Good MMOs are amazing cash cows. Crap MMOs are massive money pits.
My bad. I was thinking fifteen cents a gig, and just typed cents instead of dollars.
Let's hope he discovers some porn this time...