On The Preservation Of Endangered Web Resources ...
An unnamed correspondent asks: "Recently Mathworld, what many would consider one of the more valuable Web resources, fell victim to a copyright lawsuit. We've seen in the past that through sufficient mirroring the community can save such resources (DeCSS for example) from similar legal onslaughts. What Web resources do you consider most valuable and/or most vulnerable to legal attack and is there any effort under way to mirror and preserve these resources?"
"Personally, I'd like to see an official community set up to protect such resources. Call them the Information League perhaps. Set up a mailing list for members and whenever some (perhaps corporate) entity tries to snuff out a Web site a member sends an e-mail to the list and all other concerned members could mirror the site."
Kinda reminds me of the old emulator days of Asimov.net
Most of non-binary Usenet. Any www.*news*.com, slashdot.org and so forth. Software archives...
It shouldn't be anything that ftp.cdrom.com can't handle =)
---
ping -f 255.255.255.255 # if only
I'm sure this company, and any company with a page has a backup somewhere on some HD. So it's truly not gone. Oh well...
If there was such a thing in place, then the powers that be would have an easy target.
Browser? I barely know her!
Didn't somebody just set up a data haven? Perhaps something like that could be set up for saving these doomed resources, using funds from... well, I don't know where exactly (I haven't thought this out that much), perhaps popular support.
/* Of course I'm real, but can you prove it? */
The tablature / chord archive. It's been around as long as I have, and always seems to be on the verge of shutting down.
pezpunk
Internet killed the video star,
i could live a little longer in this prison
But what about in practice? I, for one, would sign up, guaranteed. But as tends to happen, when the information "bat signal" was sent out I probably wouldn't A. have the space, B. have the energy, or C. have the bandwidth to attempt to mirror one site, let alone many. Plus, all the gov has to do is get a warrant, obtain the sign-up list, and go after each one of us. Yeah, I know I'm being paranoid. And I know that there are probably some people with the time, energy, space and bandwidth to do this. But out of, say, 15000 people I bet only 2 or 3 would actually do something. Just my two cents, so don't hate me.
----------------------------------
Looking for hardware (Currently need: Large Etch-a-Sketch) Have one? See my journal!
And if such an organisation were created, which I find an excellent idea, a first place for them to host their communications could be havenCo.
This sounds ideal, but wouldn't such mirroring easily bring legal trouble upon those doing it? This "league" would need to be anonymous in some way, or the offended party may just bring suit against the league as a whole.
pGina, http://www.xpasystems.com - Making the big boys play nice.
Wouldn't you then end up with the lawyers targeting the Information League and putting them out of business (by suing the individual members or the ISPs that provide their access, if nothing else), then going back and targeting the original victim?
While I realize that "copyright" is kind of a nasty word these days, any time you talk about doing this sort of thing, you're going to run into copyright laws. If the laws are wrong, then you work to change them either by going through the system, or by being prepared to stand up and take your medicine if (when) the authorities come down on you (see any discussion of Civil Disobedience). Otherwise, you earn the distinction of being a scofflaw and get no respect from society at large.
...phil
"For a list of the ways which technology has failed to improve our quality of life, press 3."
--
You are a fucking moron.
Information cannot be wiped from it, because requests for the information only increase its popularity, and thus spread it further across more hosts.
The only problem with FreeNet is that it is not, and does not intend to be, a permanent storage repository for information. If no one requests your document, it will eventually fade off of the network.
--
This message brought to you by Colin Davis
Colin Davis
Copyright is not intrinsically evil. Anyone who uses the GPL to protect their work against carpet-baggers understands that.
IMHO you should keep:
1) OLGA (+ lyrics)
2) IMDB
3) Mathworld
Does anyone know if IMDB would fall prey to a similar lawsuit?
Matt
Personally, I think any and all information that is on the internet should be considered "protected." From the truly useful to society, to the geek interest sites, to pure trash.
To do otherwise would be to diminish the beauty of a free exchange of ideas that we currently have.
Copyright laws are to Internet knowledge bases as Caesar was to the Library of Alexandria.
Food for thought
Is Slashdot getting paid for info about copyright violators?
OK everyone, let's post our favorite illegal site.
DUH!!!!
http://Lenny.com
Obviously how well this works depends on how much stuff there is. DeCSS has gone far and wide because you can print it on a t-shirt if you need to. An entire encyclopedia/mathematics DB would be somewhat tougher, but I'm sure it's still doable.
--
Dyolf Knip
You don't have nearly the "copyright" issues at that point (e.g., news-site.com might not want you to be a public mirror while they're selling ad space and trying to live, but when they're getting their ass kicked in court, they may not be nearly as inclined to go after you for preserving their livelihood and image), and you can basically keep the "mirroring" active for a certain period of time and then drop it (e.g., if I put out an APB today to "mirror slashdot", but in 60 days slashdot is still around, drop the mirror, the crisis is over). You do that to conserve resources. Obviously if the system has been told (as in the above example) that a site IS down, then it holds on to it for as long as necessary (forever, potentially), where interested parties could then mirror it themselves.
________________
Private Essayist
The problem with mirroring such sites is (obviously) copyright. If you don't tell the site's author that you're mirroring the site, you're (probably) infringing on their copyright.
If you do tell the author, you're a target for a lawsuit as well (a la DeCSS and numerous "John Does"), because why have a mirror if you keep it secret? If you do that, it's not a mirror, it's a backup, which any intelligent site author should be doing anyway.
In order for mirroring to be successful, you've got to have an insane number of mirrors. DeCSS was small enough that this was possible for that code, but I would assume that a large resource such as Mathworld is larger than the source to DeCSS.
The very power of the deCSS mirrors is their association -- they have none. There's no authority or listing of who has what, so would-be litigators are hard-pressed to do anything about it.
In creating a 'membership', you are creating a mechanism for the dismemberment.
We'll starve 'em into better art.
If we all steal, then it makes it right........
Assumes facts not yet in evidence.
...phil
"For a list of the ways which technology has failed to improve our quality of life, press 3."
about vulnerable, but the ones I use most (and I don't have the URLs on my work computer) are sites with information on elements and compounds. This goes beyond the periodic table of elements and includes common (and not so common) compounds used for a variety of reasons. The information I usually need is along the lines of flash points, melting points, boiling points, reactions with other compounds, etc.
Yes, I am serious, and no I am not going to tell you what I use them for.
Eric Gearman
--
Atomic batteries to power! Turbines to speed!
The DeCSS experience shows that corporations and trade groups with vast financial resources and legal clout have no problem firing off unlimited barrages of form "cease-and-desist" letters to ISPs, universities, webmasters, etc.
Ultimately, I believe mirroring is a temporary solution to the copyright conundrum. It's high time a membership-based organization was formed -- kinda like the EFF of intellectual property -- to protect valuable online resources from succumbing to the profit-driven proprietarization of the Internet.
Sincerely,
Vergil
Insects and Grafitti Photos
I was always wondering why slashdot never mirrored DeCSS. Afraid of a lawsuit or what? Anyways, I think I like the sound of endangered.slashdot.org
I was just reading the FAQ on Mathworld. It's an interesting little ditty, peek here: http://mathworld.wolfram.com/docs/faq.html This type of corporate raiding always burns my arse, simply because it is greed, greed, greed. Monsters. Anyone want to bet that CRC has a similar version of Mathworld coded up and ready to go?
Am I the only one who thinks this sounds like classifying bandwidth as an endangered species?
People replying to my sig annoy me. That's why I change it all the time.
HavenCo is the ideal place to host this type of information. I can already hear a few of you screaming, so put a sock in it. This is a great idea whose time has come, and needs support.
jX [ Make everything as simple as possible, but no simpler. - Einstein ]
Of course Mathworld is still with us! Of course it is!
o rld.wolfram.com/PolynomialEquation.html
;))
What is the answer? Forget freenet, gnutella, and whatever else you're thinking of. The pages are still out there. What we need is a great big URL filter, where dead pages can be resurrected, Lazarus-like, with this simple function:
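#include <stdio.h>

/* Build a Google cache URL for dead_url; alive must hold alive_size bytes. */
void alive_url(const char *dead_url, char *alive, size_t alive_size)
{
    /* snprintf truncates rather than overflowing the caller's buffer */
    snprintf(alive, alive_size, "http://www.google.com/search?q=cache:%s", dead_url);
}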
Thus, when we ask "Where is the mathworld page on polynomials?" we get the response http://www.google.com/search?q=cache:http://mathw
Now all we need to do is convert that to cgi, and run everything through this simple filter. Presto, live site
Lord Google will look after us all.
"Elmo knows where you live!" - The Simpsons
Remember CueCat? That's pretty hard to find right now. If a content provider really decides to go all out, attacking, say, 2600 not just for DeCSS links but also plain text of HTML locations, and if they win, then the info will have to go way underground to survive. Servers offshore might help for a while, but it looks like the European Cybercrime Treaty is working on wrecking that as well. My opinion? Find servers in Mainland China, host it there, and tell Chinese authorities you're trying to undermine capitalist pigs in the U.S. and please don't shut you down. In a world where the US is this hostile towards free information, the only place to host it might be in countries hostile to the U.S.
----------
ah honey, we're all resplendent - Bill Mallonee
in that case we can sue Slashdot for copyright violation. The page clearly states all comments to be the property of their respective posters...
People replying to my sig annoy me. That's why I change it all the time.
There are agencies that are basically godzilla-sized racks of VCRs and tape recorders (well, it's probably all digital now) connected to satellite dishes, antennas, and cable. And they record EVERYTHING. And I mean EVERY channel, every radio station, everything, so that there is a "backup" of whatever was broadcast.
I almost worked at one of these places.
And if you wanted, say, to use a clip from some TV station, you could go and get appropriate copyright permission from the copyright owner, and then get the clip from the billions of tapes in the warehouse.
I'm surprised that there aren't people archiving every UseNet post. It would certainly be an interesting exercise.
--- Jump!! Fire!! Bullet time!! - Lego version of the Matrix
Deja seems to be a pretty unique resource. They've already stopped allowing us access to "older" parts of their archives that I think are relevant. What happens when they decide they no longer want to support the current system at all? IMHO, these Usenet archives need to be free and accessible to all. Damn! It bugs me that I can't even find something that I posted at the end of August... I need to find the answer again. It's like there is a black hole for about 10 days.
Fresh from the FAQ:
Q: What's this about a lawsuit?
A: In March 2000, CRC Press LLC, a subsidiary of Information Holdings Inc., filed a copyright infringement lawsuit in the Southern District of Florida, claiming that the web site mathworld.wolfram.com violates their copyright in Eric Weisstein's CRC Concise Encyclopedia of Mathematics published by CRC in November 1998.
Q: Why do they think the site violates their copyright?
A: Three and one-half years ago, Eric signed a book deal with CRC in which he agreed to provide printed, camera-ready pages for the encyclopedia. He thought he was selling them a printed snapshot of his existing web site, not the whole web site. CRC now claims that he sold them his whole web site, not just a printed book.
Q: So, did he sell them the web site or not?
A: Eric did not believe he was selling them his web site: he thought he was selling them the right to print a book and that he would be able to keep his web site up. If he had had more experience in the publishing industry, he would have insisted on a contract that made this crystal clear, but he didn't. Eric's contract, which is a standard boilerplate book contract that has probably been signed by many other CRC authors, does not give CRC explicit rights to the website. However, the court found that the contract is ambiguous on this point. What Eric intended to sell CRC is at the heart of this lawsuit.
Q: Doesn't the standard "right to reproduce in all media" clause cover the web site?
A: The web site is not based on or derived from the printed book: it existed for years beforehand. We believe and argue that the printed book is a derivative work. We don't dispute that CRC would have the right to put up a web site containing, for example, PDF files of the printed book. But we strongly object to the idea that their copyright in the printed book allows them to reach back and gain control of Eric's preexisting, ever-changing, collaborative internet community.
Q: Did Wolfram Research just cave in and yank the site to avoid trouble?
A: Absolutely not. We have kept the site up as long as we were able, but unfortunately CRC requested and was granted a preliminary injunction that orders us to take the site down until the case goes to trial. By direct order of the court, we had no choice and no alternative but to take it down.
Q: Isn't a lot more harm being caused by taking it down than leaving it up?
A: We respect the judge's well-reasoned opinion that the site should be taken down until the dispute is settled: he considered the evidence available to him in the legal record. He simply did not agree that the harm to the community at large would be enough to justify keeping the site available.
Browser? I barely know her!
In contrast, someone that goes out there and sets up a "Slimey Sex Site" has got to know that they will see some sort of opposition, whether from:
The "porn" site would seem to me to be more likely to have some funding and concern about such attacks.
In effect, it may be more likely that the "pornsters" will get attacked, in one way or another; the fact that they can expect such attacks leads to them "hardening" themselves, at least from a legal perspective.
Thus, the taxonomy may be more like:
If you're not part of the solution, you're part of the precipitate.
One of my favorite sites was the International Lyrics Server. It contained song lyrics submitted by its visitors, so essentially they were opinions of what the songs said (who can really tell what Iron Butterfly is singing in "In-A-Gadda-Da-Vida" anyway?). Nevertheless, the music industry sued, so the site shut down. It's sort of back, but much of the database has been expunged. I'd love it if someone could get their hands on the original database and mirror it.
"If I have seen further than other men, it is by stepping on their glasses." - Michael Swaine
...Napster to that list. I, for one, have been carefully backing it up for about a year now...
-jpowers
I would have to rate OLGA as one of those sites I'd love to see never go away. Luckily it is mirrored a lot and the mirrors put up whatever they want. Still, if OLGA were out of its legal troubles I bet there'd be 10x as much tab submitted. I'm actually surprised that in all my time on Slashdot this is the first time OLGA has come up; I think it is as bad as, if not worse than, the Napster/DeCSS/etc. cases.
This does bring up the related note of sites with good info on them that just go down permanently. Does anyone remember the Antics and Mayhem site with info on Fresnel lenses, frozen CO2 and links to all sorts of crazy stuff? It moved a few times but now it's gone. Guess I will just have to stick with Backyard Ballistics.
Antiporn! You do realize that if your ideas were enforced, the mere fact you posted a response to this would get you locked up, right? The mere use of the word 'Hell' could be construed as pornographic, and thus illegal under your fascist regime.
For your daily injection of perversion, visit Stile Project. Officially endorsed by AntiPorn!
jX [ Make everything as simple as possible, but no simpler. - Einstein ]
I'm just a crazy goy, but I tell you his web resources are so fantastic you could plotz!
Lawsuits and threatening letters are expensive. Massive mirroring schemes work by making so many copies of the "forbidden" data that it would be prohibitively expensive to sue all the archives. If a company thinks that having some abandonware game available for free on the net will cost them $10,000 in lost revenue, and a nasty letter from the legal department costs $100, it makes financial sense to go after 1 or 10 or 50 mirrors. However, if there are more than 100 mirrors, shutting them all down would be more expensive than forgoing the revenue lost due to downloading. Mirroring won't stop lawsuits, but it can make them too costly to use in some cases.
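The arithmetic in that example can be written down as a rough sketch in C. The function names and the whole-dollar parameters are illustrative assumptions, not anything from the post; the figures in the usage note are the ones quoted above.

```c
#include <stdbool.h>

/* Hypothetical helper: true when sending one letter to every mirror
 * costs less than the revenue the company expects to lose. */
bool worth_suing_all(long mirrors, long cost_per_letter, long lost_revenue)
{
    return mirrors * cost_per_letter < lost_revenue;
}

/* Smallest number of mirrors at which the letters cost at least as
 * much as the lost revenue (integer division, rounded up). */
long break_even_mirrors(long cost_per_letter, long lost_revenue)
{
    return (lost_revenue + cost_per_letter - 1) / cost_per_letter;
}
```

With $100 letters and $10,000 at stake, break_even_mirrors(100, 10000) gives 100, and worth_suing_all() is true for 50 mirrors but false for 101, which matches the reasoning in the post.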
0 1 - just my two bits
Erowid and The Lycaeum are in danger every time that bitch from California introduces another unconstitutional law against disseminating drug information.
Many major university computer science departments also have whole-Web archives for the purpose of running simulations of spiders and other automated information collecting and processing tools.
The main problem is that this information is not always publicly accessible and is within the long arm of the lawyers. Maybe the best way to implement this would be to arrange to have somebody like HavenCo purchase these snapshots on a monthly basis, keep them in near-line storage and move censored content that is deemed important by the Information League back "into print".
There's actually this site called afterlife.org which offers to mirror important web-sites should the actual webmaster become dead. Whether you believe in spiritual afterlife or not, your webpages can live on!
Does my bum look big in this?
Weisstein certainly wouldn't be in this predicament if his website were being sold in book form: it predates the contract with CRC, and he says that it is not derived from that work (in fact, it's more likely that the reverse is true). Why should his website be treated any differently than any other former publication?
For a VERY long time, Eric absolutely demanded NO mirrors, and would firewall off and permanently deny access to anyone who tried to mirror it.
This is why I like to mirror. That way, if a wonderful resource gets blocked/denied/taken down, I can still use it. (treasuretroves, digitalblasphemy, ....)
-- Spoken as a small contributor and as someone who tried to mirror and was firewalled off.
This is the type of problem Freenet was designed to solve. Freenet is not really ready for prime time, but in a year or so, it might be ready for putting endangered information in...
--The knowledge that you are an idiot, is what distinguishes you from one.
I agree with you that mirroring information that various political regimes would repress is important. But the difficulties faced by Chinese activists lie in getting the information to a group or groups in a first world country with liberal speech laws that will permit access to the information. With Human Rights Watch, Amnesty International, and the like, there are a large number of central locations for archiving this information in a protected manner. While this may someday become an issue for western democracies, it isn't yet.
The problem described in the original post is assuring the continued availability of information that is illegal to provide in the same liberal/western countries. If you've already decided to violate copyright and intellectual property laws on a grand scale, where do you do it? Mirroring inside the US is ineffective for any information that isn't this week's cause celebre. DeCSS may be widely available now, but the furor will die, people will become less inclined to mirror, and indexed and accessible mirrors will eventually fold under legal pressure. For larger data sets/code bases this problem will be even greater.
In this light, is there a non-US/EU country with adequate connectivity, a stable government, and a track record of lax copyright enforcement? People can shout HavenCo until they're blue in the face, but until Sealand has been hosting content without interference for at least a couple of years we need a stable fallback plan. I'm open to suggestions.
Wait... you mean you still haven't joined the ACLU?
Consumers in those countries are forced to use software created in the US because no one in those markets sees any money in producing software for their countrymen.
They are forced to use Microsoft software because no one else can make money producing software period. The advantage of one company owning the standards.
You can post your web sites on Mojo Nation (warning: this is in beta! It is not stable, but it works.). Documents posted to Mojo Nation are not deletable. (This is due to some complicated peer to peer architecture and RAID-like splitting of the data into multiple redundant shares, of which you need only a subset to reconstruct the original document. See the web site for docs.)
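As a rough illustration of the "need only a subset of the shares" idea, here is the simplest RAID-style scheme: two data shares plus one XOR parity share, where any two of the three rebuild the missing one. This is a swapped-in textbook technique, not Mojo Nation's actual splitting algorithm (which the post leaves to the web site docs), and the function name is made up for the sketch.

```c
#include <stddef.h>

/* XOR two equal-length shares into out.
 * Given data shares A and B and parity P = A ^ B, any two of the
 * three are enough to rebuild the third: A = B ^ P and B = A ^ P. */
void xor_shares(const unsigned char *x, const unsigned char *y,
                unsigned char *out, size_t len)
{
    for (size_t i = 0; i < len; i++)
        out[i] = x[i] ^ y[i];
}
```

Call xor_shares(a, b, p, len) once to make the parity share; if the host holding a disappears, xor_shares(b, p, a_again, len) recovers it from the two survivors.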
Regards,
Zooko, Evil Geniuses For A Better Tomorrow
CRC representatives will be at a number of technical conferences this year, including the Computer Security Conference in Chicago next week. I intend to visit their booth and talk to their representative about this shameful action. You should, too.
Assuming they don't just use an attractive freelancing schoolteacher, which other book companies seem to do...
In this instance the rights to the copyright were stolen from the author. The book was a collection of information that the author had compiled and made freely available on the internet. When he had the information published in book form, he did not intend to take down his web site.
about a hundred bucks in this case.
CRC Checksum Failure.
try { do() || do_not(); } catch (JediException err) { yoda(err); }
just search bn.com for "crc math". They're still $100 tho.
try { do() || do_not(); } catch (JediException err) { yoda(err); }
You can still get the MathWorld site out of the Google cache. Here's a quick and dirty hack to make the Google cache "navigable": http://net127.com/gc/index.cgi/mathworld.wolfram.com/topics/
this one, without a doubt.
wishus
---
Two thoughts:
1. It seems that if not everyone who contributed to the site signed the copyright transfer agreement or if those who did sign were not aware that the site would be removed from the web (in part or in whole--who ever heard of rotating exclusion to information access!?), then those people might instigate a class action lawsuit against CRC.
2. Perhaps a new model of web publishing should be started, in the spirit of Eric's site, but with a legally binding, non-transferable GPL type of contract which is implicitly "signed" by anyone who contributes to the site. It might be structured in such a way that the copyright may never be transferred, in part or in whole. But maybe rights could be assigned non-exclusively in the event that a print form would be useful.
So, who is going to set up the new be-all-end-all Encyclopedia?!
People mention Freenet, but Freenet only protects information that is there. You've got to make sure that if police or military forces come bursting in one night, the information is stored where they can't close it down, and distributed from there.
Also, it is important that people who support a site don't use too much bandwidth and HD space before it gets serious. Otherwise, people may not be able to give the necessary resources.
What I have in mind is a network where those providing endangered resources can issue a call for support (CFS). Those who respond to the CFS set up software to download an image of the site every now and then (say once a month or once a week), and at least after controversial information has been published.
Next, we need something that sets off an alarm when the endangered site is being attacked. This has to include the possibility that the site just goes down without warning (military forces shoot the webmaster and blow everything to pieces, to take an extreme). This could be done by checking every now and then whether the server is up; if it stays down for any extended period of time, the alarm goes off. Naturally, there must not be too many false alarms, or the system will lose credibility. This pretty much rules out Windows as platform.... :-) Also, it should be possible for the administrator to set off the alarm with a single command, so that if somebody comes bursting in, they have to act fast to stop the information from being transmitted. Other features are possible too, such as the administrator saying "if my site goes down at 12:15 and you don't hear from me, we're under attack". Also, intelligence agencies might try to fool the system into mirroring useless or bogus information, so we would have to work hard to stay one step ahead.
If a site is under attack, there are a number of things that could be done. First, put up a mirror of any information that you have stored, and dump it on Freenet. Maybe some sort of system could be set up so that nameservers are updated with information about one of the mirrors, so that the web site has very little downtime? Perhaps a global network of name servers similar to the two provided by Granite Canyon's Public DNS service, where authority can be transferred as part of an alarm. One can also attempt to keep e-mail working as well, but that's of little use if the admin has been shot.... If the alarm has been set off by the admin, one should try to download a mirror as part of the alarm response to get the latest.
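The alarm rule described here (poll now and then, tolerate short outages, let the admin trip it by hand) might look something like this in C. It is only a sketch of the decision logic: the struct, the function names and the "consecutive failed polls" threshold are assumptions, and the actual network check is left out.

```c
#include <stdbool.h>

/* Hypothetical watcher state for one endangered site. */
struct watcher {
    int  failed_checks;  /* consecutive failed polls so far       */
    int  threshold;      /* failed polls needed to raise an alarm */
    bool alarm;          /* stays set once raised                 */
};

/* Record the result of one periodic poll and return the alarm state.
 * A successful poll resets the counter, so a brief outage does not
 * cause a false alarm. */
bool watcher_poll(struct watcher *w, bool site_up)
{
    if (site_up)
        w->failed_checks = 0;
    else if (++w->failed_checks >= w->threshold)
        w->alarm = true;
    return w->alarm;
}

/* The administrator's single-command panic button. */
void watcher_manual_alarm(struct watcher *w)
{
    w->alarm = true;
}
```

A higher threshold means fewer false alarms but a slower reaction, which is the trade-off the paragraph above worries about. The alarm is latched on purpose: once raised, a later successful poll does not silently clear it.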
I have also been thinking about how to use the internet to try to keep those suppressed online using minimalist solutions, e.g. TCP/IP over ham radio. It might have low bandwidth, but perhaps sufficient for e-mails...?
Employee of Inrupt, Project Release Manager and Community Manager for Solid
Bram and Greg and Drue hacked faster download and upload, and the latest version is substantially faster (especially noticeable on larger files). Give it a try and let me know (zooko@mad-scientist.com) if you are still unsatisfied.
It definitely works on RedHat and Debian. You might have gotten a screwed up RedHat build that I accidentally posted last week. It was only up for 10 minutes... Anyway, e-mail me if it doesn't work. There's no reason for it not to run on any Linux box.
With 100 GByte hard drives coming soon (if they're not here already), why don't we just keep a copy of every page we visit (maybe just a text copy) on our hard drive in a permanent cache? Or maybe bookmark certain domain names for this treatment.
That way each person has partial mirrors of their favorite sites, and in the event that one is shut down the community can try to rebuild the site.
One advantage of this is copyright people would have a hard time criticizing this practice as web browsers and servers are already caching lots of web content. We could then make our cache searchable and shareable ala gnutella. And hey, we'd have an easier time putting up mirrors of media slip ups, like Time/Warner linking to the DeCSS code before they're "corrected".
John McDonald wrote this excellent article on the subject of Mathworld, and its important legal implications for authors and publishers over who owns the digital rights to Internet content.
http://www.oreilly.com/news/treasure_1100.html
We know that the install process and documentation need to be fixed up. The reason we haven't done so yet is because there are only a handful of hackers working on Mojo Nation and we have been busy fixing up other more pressing issues like the aforementioned slow download problem.
Our next priority is making a new improved install process and making it easier to use.
By the way, if you could send e-mail to support@mojonation.net or mojonation-users@lists.sourceforge.net telling us specifically what was impossible without reading the manual, that would help.
Mojo Nation is an open source project. This doesn't mean you can't complain about it (I hate it when people say that you aren't allowed to complain about open source projects just because they are open source.), but it does mean that if you submit a patch that corrects the documentation or auto-configures the browser proxy or whatever, we will gratefully accept the patch and put your name in the CREDITS file, where it will be praised eternally by generations of grateful Mojo Net users.
In the case of sites that use the URL Query String to fetch pages out of the database, as long as the 'bot can physically cache the resultant file to disk and maintain a table of hard-coded URL Query Strings that point to the saved file, it shouldn't be much of a problem.
E.g., an original file can be obtained thusly:
http://www.somesite.com/index.php?page=14520,25234
The resultant HTML that is fetched when the 'bot makes that request would be saved in its caching system. The actual URL string would be stored in a separate table and accessed in some way by clients coming to the caching site, maybe like this:
http://www.chachingsite.com/fetch_page.php?page=h
(Of course, the above string would be properly URL encoded, but I'm too lazy to do that for illustration purposes.)
The request at the caching site would simply trigger a process that would fetch the physical file pointed to by the record with the id "http://www.somesite.com/index.php..."
This is just an illustration. It's not meant to be the ultimate solution, just to prove that there are workarounds for handling dynamic sites.
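To make the "table of URL Query Strings that point to the saved file" a bit more concrete, here is a minimal C sketch of the lookup side. The struct, the function name and the linear search are all assumptions for illustration; a real caching site would presumably keep this in a database rather than an array.

```c
#include <stddef.h>
#include <string.h>

/* One row of the hypothetical table: original URL -> saved file. */
struct cache_entry {
    const char *url;   /* full original URL, query string included */
    const char *file;  /* where the 'bot saved the fetched HTML    */
};

/* Return the saved file for url, or NULL if the 'bot never fetched it. */
const char *cache_lookup(const struct cache_entry *table, size_t n,
                         const char *url)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].url, url) == 0)
            return table[i].file;
    return NULL;
}
```

The whole URL is the key, so index.php?page=1 and index.php?page=2 are simply two different rows. Pages the 'bot never requested are not in the table, which is the limit of this approach for forms and searches.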
-- kwashiorkor --
Leaps in Logic
should not be confused with
Jumping to Conclusions.
If the resource is popular then it gets mirrored automatically by greedy block servers who are hoping to sell copies in return for Mojo to people that download it. (Note that you earn Mojo by running a Mojo Nation client, so it is more like "trading" your bandwidth and your disk space and the blocks you've collected for the blocks that the other guy has collected.)
So as far as I can tell, mirroring useful web resources that a large community uses is a perfect use for Mojo Nation. I wouldn't recommend depending solely on Mojo Nation at this point (BETA! BETA! It's the letter that comes before Gamma which is the kind of radiation that made Spiderman and The Incredible Hulk!), but I would recommend experimenting: take a web site that you are mirroring, do a `wget -r -k' on it, then run the Mojo Nation utility "cmdpub" on the resulting directory.
Regards,
Zooko, Evil Geniuses For A Better Tomorrow
Of course, if the vested IP interests get their way and are able to get perpetual copyrights passed then all libraries and historians could close up shop. They wouldn't be able to archive anything without tracing back the descendants of the original copyright holders.
They would need to, e.g., find out who would inherit the copyrights to Shakespeare's plays or Beethoven's 5th Symphony (not to mention Homer's or Cicero's works) and pay them a fee to use the works.
A company could strike a deal with a copyright inheritor and own these works in perpetuity and sue to have them removed from all libraries and museums.
The most threatened material, like DeCSS or CueCat, should be preserved, along with various software that may be patented in some countries but not others (LAME, BladeEnc, etc.). Also, anything useful that has the potential to draw legal threats, like reverse-engineered device drivers for WinPrinters (Lexmark), WinModems, and parallel-port win-scanners. Think about it. If Digital Convergence can go after software drivers on Linux and claim that their secret protocol is protected IP, then every other dumb hardware manufacturer out there could go after all sorts of drivers for Linux or any other OS other than Windows. Some companies have been successful in squashing certain software, such as cp4break and Glide wrappers, simply because mirroring didn't happen fast enough.
It is usually considered fair use to copy for historical, archival usage only. One would have to be clear that any mirroring under fair use would not be "republished"; thus, putting it onto a backup tape, sealing that in an envelope and mailing it to oneself would be sufficient.
In the case of sites that use the URL query string to fetch pages out of the database, as long as the 'bot can physically cache the resultant file to disk and maintain a table of hard-coded URL query strings that point to the saved file, it shouldn't be much of a problem.
I really don't think it's possible. Think about this for a second. The output generated by a particular database query is dynamic, remember. The output generated by such a query is worthless for the purpose of mirroring because its content is good only for so long as the database state is the same. Think about a page with a search form. Or a submission field that changes the database state. You're screwed. Without a copy of the database, you just cannot mirror a DB-driven site.
Even leaving aside the question of database state, query strings are not the only thing that can determine what HTML a particular person using a particular browser will see. What about pages whose HTML is generated based on the client's IP or browser or whatever? There's no way you can simulate this without having the code.
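A toy Python sketch of what I mean. The render function here is a made-up stand-in for a server-side script, not any real site's code, and the "database" is just a dict. The point is that the same query string produces different HTML once the database changes or a different browser asks.

```python
def render(query, database, user_agent):
    """Hypothetical server-side script: output depends on the query string,
    the current database contents, and who is asking."""
    body = database.get(query.get("page"), "not found")
    style = "text-only" if "Lynx" in user_agent else "graphical"
    return f"<html><!-- {style} --><body>{body}</body></html>"


database = {"145": "first draft"}
url_query = {"page": "145"}

# What the mirroring 'bot saved at the time of its visit.
snapshot = render(url_query, database, "Mozilla/4.0")

# Someone submits a form and the database state changes.
database["145"] = "revised entry"

# Same query string, but the live site now answers differently...
live = render(url_query, database, "Mozilla/4.0")

# ...and differently again for another browser.
lynx = render(url_query, database, "Lynx/2.8")
```

A cache keyed only on the URL would keep serving the snapshot, which no longer matches either the live page or the Lynx variant.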
One of the first incredible databases on the web was the Lyrics database. (I've finally forgotten the domain; I think it was something like lyrics.ch.se)
:)
It was all typed in by users by hand--Lyrics to nearly any song you could imagine, and it was created by us, and stolen from us.
I wonder why they haven't shut down the CDDB databases yet. What is the difference between that data and the data in the lyrics database? Both are simply bits available on a CD, just with a different encoding mechanism.
For me, the most valuable resource on the Web that is at risk of disappearing is Dejanews.
We've already lost everything older than one year, and now what's left is being sold off to some unnamed party.
How much legal strong arming by some pro-censorship or copyright protection group will be required to remove it forever? Not to mention the COS.
I use Dejanews daily, and would sorely miss even this now diminished archive.
Face it, nothing will have its copyright ever expire again.
That's a rather simplistic view, though. Remember that the copyright expires 80 years after the death of the copyright owner, not 80 years after it was created. And if you explicitly sign away copyright to someone else (e.g. your estate), the copyright will last much longer. And, of course, if you sign the copyright to a company, it's a certain period after that company ceases trading (I'm pretty certain it's a shorter time, though).
But, you're probably right. Most stuff that's copyrighted now is owned by huge companies, and so for all intents and purposes, the copyright won't expire.
I personally think a far fairer system would be based on earning from a copyrighted work. Perhaps a company should have automatic copyright for ten years, and then have to prove that the work is still making them an arbitrary amount (say $10,000 p.a.) to have extended copyrights on. This would allow people to copy software that is no longer supported by the manufacturer, books that are out of print, music only available on old scratchy 33's, etc...
But I don't suppose for one minute that it's going to happen. The major companies concerned have far too much influence over government practices (witness the MP3 / DeCSS revolution that's getting them all really scared).
This is an ideal use for Freenet. Freenet saves you the trouble of having to run multiple mirrors. All you have to do is run a Freenet node, just like lots of people do. When someone tries to shut down a web site that you value, insert the web site into Freenet. Anyone else attempting to insert the web site will be told that it has already been inserted.
It's perfect for this sort of thing. The point, after all, is to defeat censorship.
The new version of Freenet has a web proxy interface to allow the easy viewing of web pages in Freenet. Also, there exist convenient scripts to mirror a site into Freenet.
What does matter is getting MathWorld back online. And I see that as easily done. The simple fact is that CRC is not behaving in their own best interest. The web site is not competition for the book -- it's free advertising for the book. Besides which, who will sell them web content after this incident?
So CRC is just sabotaging their own product. And drying up any further web-originated product. And creating a lot of ill will in the process. They may have the legal right to screw themselves, but if enough people point out that they are screwing themselves, they might well stop.
Slashdotters have considerable power to communicate this point. There are a lot of them, they know about web economics, and they are precisely the kind of technical audience CRC depends upon. So here's the relevant contact info, taken from their web site:
CRC Press LLC Headquarters, 2000 NW Corporate Blvd, Boca Raton, FL, USA 33431
Phone 1(800)272-7737 x6066 (561)994-0555 Fax - 1(800)374-3401 (561)989-9732
Please, make this an exercise in lobbying, not a DoS attack. One short fax or phone call per person. Anything else is self-defeating.
__________________
Nice try, but I guess you are forgetting the DeCSS affair. How many *hundreds* of sites had that code up? Now where are they?
"Don't mind me cutting myself on Occam's Razor"
Your remarks show a remarkable lack of sophistication, or perhaps deliberate ignorance, of the issues.
If only you could stick your tongue out in your posting, your ad hominem would be complete, but your argument would still be heavily flawed.
Yes, the factors you mention would contribute to some shortcomings in their software industry, but a lack of any possible revenue due to ubiquitous software piracy means that they're more than just behind. Their software industries are doomed to never develop. A lack of respect for copyrights is the most important consideration in all this, and that was my point.
As evidence, look at their hardware industries. SE Asia has a thriving hardware industry... because you can't just photocopy/bit-copy hardware.
As I mentioned before, the damage to their software industries is largely self-inflicted.
Why are you letting these clowns ruin our country?