Ensuring Permanence Of Online Scientific Journals
"To help solve this problem, the Stanford Library is collaborating with the National Science Foundation and Sun to create a system called LOCKSS (Lots Of Copies Keep Stuff Safe). LOCKSS is an open source, Java/Linux based server system which is designed to run on cheap computers at libraries and permanently cache journals to which the libraries subscribe. The LOCKSS systems talk to each other to preserve the integrity of their caches and ensure that there is always at least a minimum number of copies of each article around the world. Read about the current alpha test at the LOCKSS homepage or in this article in the Chronicle of Higher Education."
Sounds like self-interrogating distributed file systems can be useful to people unlikely to get sued by rock bands, as if that wasn't obvious.
The immediate benefit of this kind of thing over current distributed services such as FreeNet is of course the fact that data stored on LOCKSS will be permanently available irrespective of how many times people actually request that page. On FreeNet a page is only kept on the network for as long as people are actually requesting the page - there is a "decay" of old information which makes it unsuitable for this kind of guaranteed archival.
The other advantage of the LOCKSS system is that it maintains a certain number of redundant copies across the network, and regularly checks these against each other to ensure that the integrity of each copy is undisturbed by accidents and general bit rot. This system could keep data in pristine form for an indefinite amount of time - as long as the system runs the data is available and correct.
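To make the cross-checking idea concrete, here is a minimal Python sketch of one way a group of caches could compare copies of the same article and repair a damaged one. This is not the actual LOCKSS protocol, which the story doesn't describe in detail; the majority-vote rule, the use of SHA-256 digests, and every name in it (`audit_copies`, `repair`, the cache labels) are assumptions made purely for illustration.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Fingerprint one cached copy of an article."""
    return hashlib.sha256(content).hexdigest()


def audit_copies(copies: dict[str, bytes]) -> tuple[str, list[str]]:
    """Compare every cache's copy and report which ones disagree.

    `copies` maps a (hypothetical) cache name to the bytes it holds.
    Returns the digest held by a strict majority of caches and the
    sorted names of caches whose copy differs from it.
    """
    digests = {name: digest(data) for name, data in copies.items()}
    majority, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(copies):
        # Without a strict majority we cannot tell which copy is good.
        raise ValueError("no majority among copies")
    damaged = sorted(name for name, d in digests.items() if d != majority)
    return majority, damaged


def repair(copies: dict[str, bytes]) -> dict[str, bytes]:
    """Return a new mapping where every cache holds the majority copy."""
    majority, _ = audit_copies(copies)
    good = next(data for data in copies.values() if digest(data) == majority)
    return {name: good for name in copies}
```

With three caches where one has suffered bit rot, `audit_copies` names the odd one out and `repair` overwrites it from a peer. A real system would also have to cope with caches that are offline or deliberately lying, which this sketch ignores.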
But as for its use as an archive for other kinds of content as suggested in the story? Well, given that it doesn't appear to be anonymous like FreeNet, the same problems that we're now seeing with Napster will undoubtedly occur, and given that the whole point of the system is to keep files on there no matter what happens, the people running the LOCKSS servers will want to keep a close eye on what goes onto the system since removal will be fairly difficult. I doubt that it'll take off for this kind of purpose without the guaranteed anonymity that FreeNet has.
Another related project worth a look is the Internet Archive, which provides snapshots of public Internet sites for researchers.
I'll be amazed if they can get a significant number of major publishers to agree to this. I work in a company related to the electronic publishing industry and I know that publishers are just as fussy about their copyright as any other industry, if not more so.
I would suspect that libraries participating in this kind of project leave themselves open to all kinds of action in similar ways to the Napster issue. Since most if not all libraries have a limited budget any threat from a publisher is likely to cause the software to be removed, which doesn't really produce a confident, secure archiving solution.
It is certainly true that this is one of the biggest issues in the electronic publishing industry at the moment though, if not THE biggest.
Q.
..the LOCKSS system is simply a webpage caching system with the added feature of being able to talk to other PCs and compare webcaches of the same document. Doesn't sound like a replacement for Napster or Gnutella to me.
This would be A GOOD THING (tm) if there was any real value-add to these online journals, but let's face it, most of them are just transitions from the in-house print format to pdf/html.
When 'proper' scientific online journals emerge - ones that allow online peer review, rapid publication (by which I mean hours, not weeks), and generally facilitate scientific debate and progress, whilst allowing access to all interested parties (e.g. how many medical journals do you know that accept submissions from patients?) this will be an issue. But then they won't be able to have a paper version, and the giant publishing houses will fall (Yeah, naive, I know) and it SHOULD be different.
What I want is an online journal in DocBook, or some other XML that allows me to do proper contextual searching, and ask questions like: what papers talk about knees, have been reviewed by 'respected' personages, have been cited by at least 20 other authors and are less than 2 months old?
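As a rough illustration of that kind of question, here is a small Python sketch that filters structured paper metadata by topic, reviewer, citation count and age. The `Paper` record, its fields and the `find_papers` function are all hypothetical; no existing journal exposes this exact schema, and a real system would extract the fields from the XML rather than hold them in memory.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Paper:
    # Hypothetical metadata record; field names are assumptions.
    title: str
    keywords: set[str]
    published: date
    citation_count: int
    reviewers: set[str] = field(default_factory=set)


def find_papers(papers, keyword, respected, min_citations, max_age_days, today):
    """Return papers matching all four criteria from the question above."""
    cutoff = today - timedelta(days=max_age_days)
    return [
        p for p in papers
        if keyword in p.keywords                 # talks about the topic
        and p.reviewers & respected              # at least one respected reviewer
        and p.citation_count >= min_citations    # cited often enough
        and p.published >= cutoff                # recent enough
    ]
```

The "knees" query would then be something like `find_papers(papers, "knees", respected, 20, 60, date.today())`, with `respected` being whatever set of reviewer names the reader trusts.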
At the moment this might be a good mechanism for public facilities such as schools and libraries that don't have the space/staff/cash to take the paper versions of these journals, but ensuring the permanence of online journals - there ain't really any such thing YET.
Raist@postmaster.co.uk
Methinks LOCKSS is good for safe and redundant storage of data. However, it gives *no* warranty for storage in the *long term*. Actually, I think that is a real problem for digital data. This problem is far from being solved.
A totally different point of view is the following. What good is an ideal storage system for the scientific world? Probably none, because that is not the problem.
As a PhD student, I'm deeply involved in the scientific world, not to say that I'm (going to be) a scientist myself. But a progressive one, who certainly likes the net, and electronic publishing. I've published electronically myself, at Brain Research Interactive. However, ZERO response!
The scientific culture has the following properties:
(i) Scientists have a highly conservative attitude. If 'the others' don't like it, they simply will not touch it.
(ii) Naturally, status is very important. Only scientists with very, very high status can change things. Furthermore, the status of your publications is everything. You will not publish in a low-status journal unless your data is bad. So, as long as the electronic journals don't have a high status, they will be neglected.
(iii) Apart from the aforementioned characteristics, the peer review is important. The peer review mechanism is more important than what the journal looks like, or its medium. And for the peer review, you need good editors and a good system, which of course is expensive.
In conclusion, just putting up a website with articles and calling it a journal won't work. The safety of the data is only a minor point.
But, let's keep on trying!
Jeroen
Writing about music is like dancing about words - FZ
Hence, it would be an apples-to-oranges comparison to compare LOCKSS to Napster or FreeNet, which are meant to provide a more dynamic sharing service with other users on the Internet.
In fact, from the FAQ, it seems that LOCKSS can't be accessed from outside the implementing facility (it only caches the journals libraries subscribe to), so your whole concern is really moot.
Go get your free Palm V (25 referrals needed only!)
Before I'd so much as seen a webpage I was using DATASTAR and Dialog (and a couple of other big online databases). For those who haven't seen them, they are /awesome/ -- they have complete, indexed fulltext of literally thousands of newspapers, newswires, magazines, journals (academic and popular). I was thinking about this last night in the context of searching, i.e. that I was lucky to have had some training on searching those (they had their own oh-so-user-friendly commandline search languages).
This is the biggest missed opportunity of the web/net. Searching for articles on something using standard web search engines is slow, painful, and often you end up with a random assortment of stuff. You spend ages sorting spurious hits from the real thing, following links that look like they might be relevant but actually aren't, and so on.
What I would like is a web interface to one of those databases. I'd even be willing to pay small amounts to get the fulltext of an article once located. Too often the best info you can find is a mixture of someone's personal notes, a couple of academic sites' "top level overviews" without anything specific and a bunch of lame niche sites. When I first heard about the web I naively imagined it might become something like the great free public lending library; alas, not so.
Is there any chance of digital access to the LOCKSS info? Not unless you're physically in the library, I guess. Ah well.
vila: a long and noble tradition
Camaron de la Isla 'When I sing with pleasure, my
"None are more hopelessly enslaved than those who falsely believe they are free." -- Goethe
Hmm, perhaps something along the same lines as Advogato could be applied here. It's a weblog like /. but instead of a moderation system it relies on a "trust metric" where users are certified by other users, ensuring that people with relevance to the field are given more of a voice. Of course, it's not a perfect system but it would definitely be more productive than /.'s moderation system for an online journal.
You'd also need more advanced formatting (perhaps a LaTeX to HTML converter since LaTeX seems to be the preferred choice for writing papers) so that equations, tables and graphs could be included in both the paper and responses, and a decent search engine with multiple criteria for finding articles/comments.
I think you could do this now, but it would be a very difficult project to code. Still, maybe someone out there's working on it?
All you have to do is get the tape archive of Echelon and then you've got copies of everything.
What we could do with is an online _based_ submission and review site for scientific papers; something based on the /. model (with a discussion area for online discussion and analysis of papers, some sort of versioning to allow corrections by the author, and the ability to rate papers on a scale of 1-10). Papers scoring highly (a weighted average of the scores) could then be submitted to a more formal 'classic' peer review, then see real paper (thus allowing Real World income from the process). The distilled papers that emerge from this should be of a higher quality, with the authors of papers that make good points but that have glaring holes given time to repair their mistakes, and in cases where a reader/reviewer is in a similar field and can fill in gaps the author missed, opportunities for both to produce a joint paper that neither could have competently completed alone.
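For what it's worth, the scoring step is easy to sketch. The Python below computes a weighted average of 1-10 ratings and picks out the papers that clear a threshold for formal review. What the weights represent is left open above, so treating them as per-rater weights is an assumption, as are the function names and the default threshold of 7.0.

```python
def weighted_score(ratings: list[tuple[int, float]]) -> float:
    """Weighted average of (score, weight) pairs, scores on a 1-10 scale.

    The weight is assumed to reflect how much a rater's opinion counts;
    how it would be assigned is outside this sketch.
    """
    for score, weight in ratings:
        if not 1 <= score <= 10:
            raise ValueError(f"score {score} is outside the 1-10 scale")
        if weight < 0:
            raise ValueError("weights must be non-negative")
    total_weight = sum(weight for _, weight in ratings)
    if total_weight <= 0:
        raise ValueError("at least one rating needs a positive weight")
    return sum(score * weight for score, weight in ratings) / total_weight


def shortlist(papers: dict[str, list[tuple[int, float]]],
              threshold: float = 7.0) -> list[str]:
    """Titles whose weighted score qualifies them for 'classic' peer review."""
    return sorted(
        title for title, ratings in papers.items()
        if ratings and weighted_score(ratings) >= threshold
    )
```

A paper rated 10 by a rater with weight 3 and 2 by a rater with weight 1 scores (30 + 2) / 4 = 8.0, so it would make the shortlist at the default threshold, whereas a plain average of 6 would not.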
--
-=DaveHowe=-
There is also another area for concern with online journals. When you or your library subscribe to a hard print journal you get sent a paper copy which you get to keep and refer to whenever you want -- for the rest of eternity if you keep it in good condition!
This may not be the case with an online journal. Here the publisher can license the journal to you in such a way that if you decide to stop your subscription, you don't just lose access to future editions, you also lose access to material you previously had access to.
I don't know how widespread this kind of licensing is, but I bet we are going to see more of it in the future.
This may not be too much of an issue at the moment as many journals are hard copy + online access; but eventually the hard copies are going to go.
If you use an online journal, check out its license and see where you stand.
Keep your programs tidy.
Exitzero.
It seems like most people are looking at whether this could be used to store copyrighted data illegally.
This is for SCIENTIFIC journals. While the journal does have copyright protection, they run articles written by researchers at various universities (and in industry).
The desire of more researchers to be published has resulted in additional journals being formed. Because publishing a journal on the web is dirt cheap, it makes sense that with the Internet available, more of these journals will appear.
The problem is, you need an archive of it. This system is a system to guarantee that we do not lose knowledge. It doesn't even have to be available. They could cut a deal with the publishers of the journals that they will maintain the archive, but that if the company goes out of business or stops providing old articles, the archive can show them.
This would be voluntary, but the publishers would jump at it. Why? Because this system gives them more credibility than a web page alone. The guarantee against the future loss is the best protection for their journal, which makes it more likely to get high quality entries.
While Freenet or other groups may use similar technology, this is a COMPLETELY different project. This isn't about letting people submit data and protect it, this is about preserving the body of scientific knowledge so we don't lose it when a company goes bankrupt. Digital versions are easier to duplicate than paper equivalents, but our system of copyrights is trying to discourage that. E-books, E-journals, E-magazines carry a significant risk. A copy of an article in a manila folder can be lost or destroyed, but is otherwise perfect. A bookmark to a website can disappear at the whim of a publisher, and there are legal AND technical attempts to prevent you from properly saving an article...
It's a very strange situation, and projects like this are VERY important to prevent us from losing knowledge.
I know this sounds elitist, because I'm worrying about the body of knowledge of scientists but not other people. Here is the thing: the information age has allowed more people to publish their ideas and beliefs. However, because we have all jumped onto this technology, we didn't take adequate safeguards to ensure that we don't LOSE anything in this transition.
If we archived everything that was traditionally published, we'd have the old status quo. If we archive everything traditionally published and let others publish non-archived, we have a better system than the status quo. An environment where we publish everything, maintain nothing is questionable. In some ways it is better than the status quo, more liberal publishing, and in other ways worse, more data loss.
The idea is to come up with a STRICTLY better system, where NOTHING is lost and we gain some advantages. Normally, there are tradeoffs. The goal is to avoid tradeoffs, and just make things better.
Alex
Even one of the head developers of FreeNet has said your post was BS and still you defend it.
He has also said, as of right now, that your posts are the truly pointless and stupid ones.
-
We cannot reason ourselves out of our basic irrationality. All we can do is learn the art of being irrational in a reasonable way.
Understood, the business model developed in the paper age when someone had to print & distribute academic papers. But I cannot see a good reason why firms like Elsevier should continue to be as hugely rich as they appear to be.
The web offers an easy way to take most of the cost out of the loop. What costs remain - those of web publishing, and of having journals edited & papers reviewed - should (it seems to me) be capable of being funded from academic departmental budgets, in return for the academic kudos of being a reviewer/editor/web publisher.
In this enlightened scenario, there would be very much greater dissemination of the knowledge produced, to the benefit of a very much wider set of users.
There are other dead tree archives that need saving.
Museums also hold important archives that will simply not be available to the public in the near future as older books and journals become too fragile to allow casual browsing.
For example the Natural History Museum in London contains archives dating back hundreds of years. The original diaries of Darwin's voyages are held there but the pages are so fragile now that ordinary visitors can no longer examine them.
Paper sources do not have an indefinite life and if they were reproduced electronically they could be available online as a resource to be treasured and not let to wither away only accessible to a few select researchers. A system such as LOCKSS could provide a cheap method to preserve ancient tomes and to promote wider access.
Huh?
This kind of synchronizing mechanism is already implemented in all sorts of distributed networks. Any rudimentary distributed file system (e.g. Coda) will perform such necessary synchronizations between its nodes. The point is that LOCKSS was never intended to be a truly open, free-for-all system, and it respects copyrights by staying that way.
No one in their right mind would create a Napster-like sharing program that automatically synchronizes files with many other users and implements a cache that NEVER erases (after all, isn't that your whole point about it being better than FreeNet?)
I said it before, and I'll say it again - it's an apples-to-oranges comparison. Given sufficient effort and time, any software system has the potential to do just about anything. Saying this system has the potential to be a better FreeNet is akin to saying ICQ has the potential to be a better OS, if "implemented" in such a way. The goals of FreeNet and LOCKSS are fundamentally different in every sense, and forcing one to be the other would just give you a complex and inefficient hack of a system.
My fear is that increasing amounts of resource will be poured into maintaining these academic papers for posterity, when many are nothing more than a rehash of earlier work, or turn out to be pure crap anyway.
Yes, but the "pure crap" shouldn't get past the whole peer review process to start with, so you'll only be left with the material that is worthwhile. And any further moderation is censorship really - who decides what is and what isn't worthwhile? It'd have to be someone involved in the field in order to be able to properly judge it, but on what criterion are you going to judge whether one paper is worth more than another?
In short, it needs moderating. Books and papers going out of print, or the final copies getting lost, are nature's way of moderating irrelevant crap. Because, if it wasn't irrelevant, someone would have invested in preserving it.
But what may seem to be completely irrelevant when it was first written may turn out to be essential to a later development a hundred years down the line. Especially in maths there are a lot of small developments which seem to be pointless at the time but which turn out to be a key part of a greater whole discovered later. You can't decide to throw things away on the basis of "relevance", since relevance is something which you can never tell at the time.
He was talking about scanning paper documents, i.e. those printed before computers became widespread.
...phil
"For a list of the ways which technology has failed to improve our quality of life, press 3."
I thought this was one of the reasons for the Library of Congress: to preserve information. Now, if we can just persuade the LoC to cat rectum | gunzip >head and realize the end of the 20th century is nigh we might not need things like this.
www.eFax.com are spammers
Right now we have little difficulty looking at journals (scientific or otherwise) from 150 years ago or more. The pages may be brittle, but the information is there. Obviously, documents thousands of years old still exist and provide priceless information to researchers.
The larger goal of electronic preservation is to ensure (if possible) that the electronic data of today will have a lifespan equal to, or better than, that of a physical copy. Right now, this is not possible, and it actually seems that the situation for digital information is worse than for physical data. Think about it: depending on whom you talk to, a CD will last 25-75 years, and magnetic media have obvious limitations. Then there is the problem of technology/platform change. How can we guarantee that in 150+ years, PDF, HTML, or even digital data itself will be able to be read? We could be using quantum computers with some bizarre storage medium that are completely incompatible with today's technology.
A common solution seems to be to just convert file formats to tomorrow's technology as it is created, and transfer these files from the old media to new as the old media expire. But at the rate humanity is accumulating knowledge, we could quickly be spending more of our time changing PDFs to whatever, and then that whatever to the next whatever, and so on. Another solution is to maintain the archaic platforms of today so that our files can be read. But, however well cared-for, mechanical devices will break down in time, so that is not a realistic option.
Another possible option is to write and maintain emulators for all of the platforms in existence today, so that they can run Linux or Windows or whatever on the badass machines of tomorrow. But that runs into loads of proprietary technology/patent/copyright/legal issues that slashdotters are all well familiar with. Data needs to be freely available to researchers of the future. It should be just as easy (i.e., no license required) as it is to pick up a book off the shelf.
So from what I know of digital library collection preservation, the situation at present is pretty grim. We are spending huge amounts of money to rush to publish documents in a digital format, with no assurances that 100 years from now (much less a thousand), this data will be available for general consumption. We are all hoping that it will just "work out" or a technological panacea will emerge.
For more on this topic, try this link.
- Slowing down publication is a good thing. It forces the writer to take the time to double check the data, make sure that the paper is readable, and the conclusions you are presenting are correct. (You'll be amazed at what you catch)
- Contrary to popular belief, "peer review" does not (nor should it) happen instantly. When I received a paper to review, I would first read through it once, and then again to make sure that everything made sense. Then, I would go to the library and look at some of the prominent references mentioned in the paper, to get familiar with the research and see the paper "in context". (Again, science does not happen in a vacuum.) Many times a scientific paper is very specialized, and even experienced referees may not be innately familiar with the subject matter. (Case in point: I was an experimentalist, and would often need to go find some of the latest theoretical work.)
- Paper journals force you to live outside of your specialty. Online searches are too good at giving you 'exactly what you want'. When you thumb through the paper journal (or even the online version of a paper journal), you might find a paper that is pretty applicable to what you are doing. ("Hey, that's pretty close to what I'm trying to do!") You lose this with specific topic searches.
Paper journals provide what online forums struggle with: crap control. It's not a perfect system, but forces a little thoughtfulness into the process. (Besides, I can't imagine wading through a bunch of "FIRST POST" and "Hot grits in the pants" articles in the latest copy of Phys. Rev. Letters...)
Here at UNC-CH (where I serve on the Library Administrative Board; teach Library & Information Science; also I help run Project Gutenberg - good enough?), total subscription costs for journals published by Elsevier are around US$1 million/year. Subscription costs go up every year.
The deal that Elsevier offered for access to their e-journal collection (electronic access to print journals) was a little complicated, but boiled down to:
The solution is to accept the deal (sort of Faustian, I'd say). At the same time, we've made local agreements with Duke and NC State to make sure one location keeps a print copy of every journal we're otherwise getting an electronic copy of.
This way, libraries are sure they're continuing their archival role (with paper, in this case), but at the same time trying to offer the benefits of electronic access to their constituents.
Bottom line: While we don't really know how best to maintain archives of e-journals, at least libraries can cooperate to make sure some sort of access is retained, while going forward with new e-journals.
no problem! just put all your priceless articles on an NT webserver* in Word2000 format. they'll be available for the ages to Share and Enjoy!#
* assuming the evil "hackers" don't trash your webserver (though we have no idea how, since there are no security holes in IIS)
# at least until we release the next version of Word
seriously, I don't think it's going to happen any time soon. I'm publishing a paper this summer, and there is a whole freaking PAGE I have to add in saying that SAE owns any and all rights to my paper, etc.
personally, I find that a little scary.
nor do I think that they're going to let just anyone archive them -- maybe that service that already archives about a zillion news sources. that's subscription-based, though, and I don't want to see what's going to happen once we don't have paper copies to rely on anymore!
Lea
This doesn't just touch physics journals, although the physicists are more likely to be rational about the issue than the record or movie industries. But no debate about copyright or intellectual ownership I have seen to date has looked at the long-term issues. And by long-term, I don't just mean years or decades, I mean thousands of years.
Everyone decries the burning of the library of Alexandria. Its destruction has greatly impoverished us today; it denied us access to the thoughts and works of the people who wrote those books. The burning of the library of Alexandria was a lobotomization of human culture.
But if Alexandria was a lobotomization, today's IP rules are senility. This is because, for short-term gain, they ignore the fact that the only way information survives long-term is if it's _copied_. No medium lasts forever. Pop quiz: how many books today are more than 200 years old? Euclid's "Elements" survives today because it was copied, and copied again. The image of cloistered monks painfully hand-copying books survives to this day.
We need to address the concerns not only of the artists, writers, creators, and IP corporations, not only of the IP consumers, but also the concerns of our descendants a thousand years from now. Otherwise, we risk being a culture with no history, and therefore no future.
Really, who cares? This isn't the burning issue. Fact is, any papers of any worth whatever will be downloaded and stored by others working in the same field. The University where the paper originates will certainly have a copy.
The real issue which everybody seems to be ignoring is that most scientific journals charge exorbitant subscription fees. Remember that in order to cover some very active fields (e.g. neuroscience) you need access to a dozen journals. This puts access to scientific papers completely out of most people's reach since most people don't live within a short walking distance of a University library.
Most published scientific research comes out of universities and government-funded laboratories and much of it is ultimately paid for by you the public. In any event most scientists today would surely agree that this most fertile fruit of human knowledge should belong to humanity at large, not a select few.
Us open source types often bang on about how the open source model of software development is theoretically sound because it closely resembles the peer-review model of scientific progress. But while with open source software absolutely anybody can obtain code through easily accessible channels and absolutely anybody is welcome to contribute, with mainstream scientific research not only are laymen mostly excluded from actively participating, they also generally can't even get to read the published details.
The whole system of publication in subscription-only journals is thoroughly outdated and completely inappropriate for the so-called "information age" IMHO.
Consciousness is not what it thinks it is
Thought exists only as an abstraction