Dealing w/ Copying of Online Articles via Open Proxies?
Creosote asks: "Concerns about piracy are no longer just for the big commercial media outfits. JSTOR, one of the major repositories and distributors of online versions of scholarly journals, has been hit by crackers taking advantage of open proxy servers to download about 51,000 articles from 11 JSTOR journals. Even nonprofit academic publishers rely on income from publications to exist, so the spectre of large-scale unauthorized copying is legitimately scary to them. In a letter to librarians and publishers, the president of JSTOR notes that while the 'threat of open proxies has been recognized for some time in the web community...it does not appear that network administrators, librarians, or content providers are aware that organized efforts are being employed to gain unauthorized access to restricted campus resources' through them. I work for a nonprofit publisher (a university press) that will soon be making peer-reviewed digital projects available online, and they can't all be given away for free, so this hits close to home. Are there better solutions than turning into an attack dog, à la the RIAA and the MPAA?"
what is an open proxy?
At least that's what I hear anyway. ;)
~~~
Piracy is wrong? (angelic expression)
Er, copyright infringement, because piracy has such a "dirty" sound to it.
And copyright infringement is either a triviality or a birthright, or so the arguments here go.
*
More seriously, I sympathize. I guess the honor system is out?
Ideally, even peer-reviewed work (or, I would hope especially peer-reviewed, because it is significantly value-added) would be in the public domain anyway. The single best approach would be to acquire grant or public funding as a one-time purchase of the data.
After honor system and public domain come the tedious closed-access or copyright-suit methods, which are vulnerable to hacking and piracy, respectively. In case there are further alternatives, I'll be lurking here to hear them.
Another retarded open proxy problem
If you can't risk that your data is copied, don't publish it, at least not digitally. Others will see an opportunity where you see a threat. We'll have to wait and see who is right in the end. "Information wants to be free" may sound like a naive romantic vision, but there is some truth to it. Think about how much free information your whole life is based on and how many people worked to create that information. Would things really work if information did not have a tendency to break free of restrictions?
The people who stole the articles probably have no intention to resell them. Probably, they were just doing it because they could. The articles will sit on some hard drive somewhere, and eventually be deleted.
It would be impossible to resell the articles without revealing who stole them. Also, would you want an article from an unknown source, that could have changed it?
Anyone with content on the web has three choices when it comes to people stealing their copyrighted materials.
1) Say nothing and absorb the losses
2) Become aforementioned "attack dog"
3) Take the materials down
Pirates would like nothing more than for content providers to pick choice 1. However, that is unlikely to last for long, and eventually the content provider will have to resort to either attacking pirates (à la the RIAA) or simply taking their ball and going home.
At least with #2 the stuff stays online and is accessible to legitimate users of the material. No one wins if the material goes offline.
I have been pwned because my
Don't have a single point of failure. Whitelisting IPs for access is great, but just like any other method of authentication, it has its weaknesses and should be used in conjunction with any number of other authentication mechanisms.
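As a rough sketch of what "in conjunction" could look like, the check below only grants access when the client address is inside an allowed range and a separate credential check has also passed. The network ranges and function names are made up for illustration (the ranges are reserved documentation addresses), and how credentials get validated is left to whatever mechanism is actually in use.

```python
import ipaddress

# Hypothetical subscriber ranges; these are reserved documentation
# addresses, not real campus networks.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def ip_is_allowed(client_ip, networks=ALLOWED_NETWORKS):
    """Return True if client_ip falls inside any allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)


def is_authorized(client_ip, credentials_valid, networks=ALLOWED_NETWORKS):
    """Require BOTH factors: an allowed address and valid credentials.

    credentials_valid is assumed to come from a separate check
    (password, Digest auth, and so on); it is just a boolean here.
    """
    return ip_is_allowed(client_ip, networks) and bool(credentials_valid)
```

With this shape, a request relayed through an open proxy inside an allowed range still fails unless it also carries valid credentials.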
I find it difficult to sympathize with people who wish to keep academic journals locked away.
I'm not sure where the focus on IP address issues has come from... but RFC 2616 and RFC 2617 explicitly discuss secure access to WWW entities, and IP addresses are not the key.
IP address restrictions are of only limited use, due to HTTP's stateless behaviour. As I've noted in another post, chains of proxies will quickly defeat any IP-based restrictions.
Some steps that JSTOR could take include adding cache-control headers (must-revalidate comes to mind) to prevent cache hits occurring without the JSTOR servers' knowledge, and thus allow them to perform partial validation on the actual client (i.e. by checking the Via header). Note that checking the Via header is less than secure, but better than simply trusting the customer's proxy to be secure.
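A minimal sketch of those two steps, not anything JSTOR actually runs: one helper returns a Cache-Control value that forces caches to check back with the origin, and another pulls the proxy names out of a Via header so they can be compared against a list of proxies the customer has declared. The function names, the header value chosen, and the host names are all assumptions for illustration. The parser is deliberately simple and will misread Via comments that contain commas.

```python
def restricted_response_headers():
    """Headers asking caches to revalidate with the origin on every use.

    max-age=0 makes the cached copy stale immediately, and
    must-revalidate forbids serving it stale without asking the origin.
    """
    return {"Cache-Control": "max-age=0, must-revalidate"}


def parse_via(via_value):
    """Split a Via header value into (protocol, received_by) pairs.

    Example value: "1.0 fred, 1.1 example.com (Apache/1.1)"
    Trailing comments in parentheses are ignored.
    """
    hops = []
    for entry in via_value.split(","):
        parts = entry.strip().split(None, 2)
        if len(parts) < 2:
            continue  # skip malformed entries
        hops.append((parts[0], parts[1]))
    return hops


def unexpected_proxies(via_value, trusted_proxies):
    """Return the received-by names that are not in trusted_proxies."""
    return [name for _, name in parse_via(via_value)
            if name not in trusted_proxies]
```

Since the Via header is supplied by the proxies themselves, a hostile proxy can omit or forge it; this only catches relays that identify themselves honestly.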
Secondly, use authentication - assign a username and password to the content, using (say) Digest authentication, which is proxy friendly. Mark the content as explicitly cacheable with revalidation, and you will get 1 If-Modified-Since request per download from proxies, and be able to check the user details each time. There would be an administrative issue with this, but I'll leave creative approaches to that as an exercise.
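For anyone unfamiliar with Digest, here is a small sketch of how the response value is computed under RFC 2617 with qop=auth and the default MD5 algorithm. The function names are my own; the server does the same calculation from its stored credentials and compares the result with what the client sent, so the password itself never crosses the proxy.

```python
import hashlib
import hmac


def _md5_hex(text):
    """Hex MD5 of a string, as used throughout RFC 2617 Digest."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri,
                    nonce, nc, cnonce, qop="auth"):
    """Compute the Digest 'response' value for qop=auth."""
    ha1 = _md5_hex(f"{username}:{realm}:{password}")  # credentials hash
    ha2 = _md5_hex(f"{method}:{uri}")                 # request hash
    return _md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


def response_matches(expected, received):
    """Compare two response strings in constant time."""
    return hmac.compare_digest(expected, received)
```

The values from the worked example in RFC 2617 (user "Mufasa", realm "testrealm@host.com", password "Circle Of Life", GET /dir/index.html) make a handy sanity check for an implementation like this.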
One can imagine various enrichments to this model (e.g., allowing a reviewer to go back and change his opinion of the article if he finds he cannot replicate the results in his laboratory), but I think you get the basic idea. Having everything in the open domain will indeed shut down the revenue for academic journals, but that doesn't mean that the time-honored system of peer review has to go down the drain, it just needs to be updated.
(Note: Reviewers who haven't yet published anything, and who do not have tenure at a recognized academic institution, will be awarded zero moderation strength; this is still a closed system for academics, even though it is based on openness. The usual disclaimers for strength of encryption - to ensure no impersonations - apply.)
For some reason I really like #3. They're really not that important and getting rid of them will allow other faces to see the light.
If these so-called publishers were interested in academic integrity, they would GPL these so-called journals and distribute Free knowledge to the entire world.
Conformity is the jailer of freedom and enemy of growth. -JFK
Stealing 51,000 articles through an open proxy? You better believe that's a paddling!
Trusting that there will be enough people who care about your material to support your publication costs?
Fact is, most of these materials are being sold to university libraries who have open/semi-open access to the material as part of their mission. These organizations will still subscribe whether or not these materials are found 'in the open'. For one, they have to behave legitimately, and for another, they know that the best way to continue the existence of these materials is to keep paying their dues.
At the same time, these dues, which constantly come in, support you currently, correct? Are you just trying to maximize profit, or is there a genuine concern that people will switch to illegitimate sources and thus threaten your continued existence? If it's the former, SHAME! If the latter, well, publicise that information. Show real data about not being able to survive and then ask Slashdot again.
And who knows, open access seems to work for some academic publishers, who know that the dues to continue their existence will come in because people care to support the content. Maybe it'll even work for you.
For a commercial example of the same, do check out the Baen Free Library and what it has done to their sales. (www.baen.com)
On google
Contrary to what the poster asks, I feel the peer review process could be best served by using an open model. Most reviewers give their time for free. Most of the cost of a journal goes to publishers and printers.
A collaborative method would seem to benefit the authors, the reviewers and the readers. In fact the only losers would be the publishers/printers.
I'd love to read the latest journals but not at the prices they are asking.
Are there better solutions than turning into an attack dog, ala the RIAA and the MPAA?
This is essentially the argument for DRM. You want to be able to provide electronic information but in a way that it cannot be duplicated at will. Both Intel and Microsoft are working hard on making this possible and within a few years better solutions will exist than exist now.
So the short answer to your question is "sort of, but in practice not for another 2 years or so". I'm sure other posters will address the "sort of" solutions. If you want to know what's coming, see the Palladium FAQ.
The more important issue as an academic press is where you are going to stand on the right to read. Academia depends on a relatively free flow of information that is inexpensive. By its very nature what you are asking to do is be able to control the downstream flow of information.
You may find that when the technology is available it is rejected by the academic community. You'll then have to decide if you are primarily a commercial agency providing digital content like Disney or Time Warner; or primarily an academic agency which supports freedom of information exchange even at the cost of lost sales.
Anyway, I suggest the following essay on the moral issues: "The Right to Read".
Are there better solutions than turning into an attack dog, ala the RIAA and the MPAA?
Here's my solution: publish the articles online with deliberate errors. Make sure that people who download them legitimately know what those errors are, so they can account for them, but people doing big bulk downloads as described in the question won't know (and probably won't care). Then they can publish it as much as they like, but they'll soon realise they've got themselves a dud.
Just an idea. Whether it's practical in your situation is another question - I don't know enough about what you're doing to answer that one.
(Spudley Strikes Again!)
"Even nonprofit academic publishers rely on income from publications to exist, so the spectre of large-scale unauthorized copying is legitimately scary to them."
So let me get this straight,
There are a large number of these publications available for download by people within certain IP blocks. These are available to be freely downloaded by anyone in those blocks. And people are using proxies to download them? There is no other security to prevent unauthorized users from accessing the site?
Am I the only one who sees something wrong here?
I especially like the use of the word "cracker". The big bad hacker used open proxies to hax0r your download page! Seriously though, if you're counting on IP-based authentication for networks you don't have control over, you're BEGGING for problems. IP-based authentication only works under the premise that the machines with those IPs can be trusted.