Ensuring Permanence Of Online Scientific Journals
"To help solve this problem, the Stanford Library is collaborating with the National Science Foundation and Sun to create a system called LOCKSS (Lots Of Copies Keep Stuff Safe). LOCKSS is an open source, Java/Linux based server system which is designed to run on cheap computers at libraries and permanently cache journals to which the libraries subscribe. The LOCKSS systems talk to each other to preserve the integrity of their caches and ensure that there are always at least a minimum number of copies of each article around the world. Read about the current alpha test at the LOCKSS homepage or in this article in the Chronicle of Higher Education."
Sounds like self-interrogating distributed file systems can be useful to people unlikely to get sued by rock bands, as if that wasn't obvious.
This is a very serious issue. Paper is rotting away as we speak: valuable documents dating back to the first person to put pen to paper. These need to be digitally scanned for future generations.
Excluding libraries bursting into flame or being flooded, paper documents are still more reliable than magnetic media (or most other computer-based storage methods).
If an online-only journal is important enough, maybe some institutions (such as university libraries) should print out and archive the material, which would also help those without Internet access (gasp - these people exist?).
Imagine - the Slashdot collection. 23 square miles of library, every story and comment ever posted!
The immediate benefit of this kind of thing over current distributed services such as FreeNet is of course the fact that data stored on LOCKSS will be permanently available irrespective of how many times people actually request that page. On FreeNet a page is only kept on the network for as long as people are actually requesting the page - there is a "decay" of old information which makes it unsuitable for this kind of guaranteed archival.
The other advantage of the LOCKSS system is that it maintains a certain number of redundant copies across the network, and regularly checks these against each other to ensure that the integrity of each copy is undisturbed by accidents and general bit rot. This system could keep data in pristine form for an indefinite amount of time - as long as the system runs the data is available and correct.
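For what it's worth, the cross-checking idea can be sketched in a few lines of Python. This is only a rough illustration and not the actual LOCKSS protocol: it assumes each cache can hand over its copy of an article, that a simple majority of matching SHA-256 digests identifies the good version, and the cache names are made up.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Return the SHA-256 hex digest of one cached copy."""
    return hashlib.sha256(content).hexdigest()


def find_damaged_copies(copies: dict[str, bytes]) -> tuple[str, list[str]]:
    """Compare all copies by digest.

    Returns the digest held by the most caches and the (sorted) names of
    caches whose copy disagrees with it. Assumption: the majority is right.
    On a tie, the digest seen first wins, which a real system would not accept.
    """
    digests = {name: digest(data) for name, data in copies.items()}
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    damaged = sorted(n for n, d in digests.items() if d != majority_digest)
    return majority_digest, damaged


def repair(copies: dict[str, bytes]) -> dict[str, bytes]:
    """Overwrite every dissenting copy with a copy that matches the majority."""
    majority_digest, _ = find_damaged_copies(copies)
    good = next(d for d in copies.values() if digest(d) == majority_digest)
    return {name: good for name in copies}


if __name__ == "__main__":
    # Hypothetical caches; library_c has suffered some bit rot.
    caches = {
        "library_a": b"article text",
        "library_b": b"article text",
        "library_c": b"article tex\x00",
    }
    print(find_damaged_copies(caches)[1])
```

The point of comparing digests rather than full files is that caches only need to exchange a few bytes per article to notice that one of them has drifted.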
But as for its use as an archive for other kinds of content as suggested in the story? Well, given that it doesn't appear to be anonymous like FreeNet, the same problems that we're now seeing with Napster will undoubtedly occur, and given that the whole point of the system is to keep files on there no matter what happens, the people running the LOCKSS servers will want to keep a close eye on what goes onto the system since removal will be fairly difficult. I doubt that it'll take off for this kind of purpose without the guaranteed anonymity that FreeNet has.
Another related project worth a look is the Internet Archive, which provides snapshots of public Internet sites for researchers.
Finally, maybe we can have all the papers locally and search through them in a reasonable way. It is incredibly annoying when you have to visit several sites to do a full text search.
Let's see if they are going to sue every library in the world for exchanging illegal MP3s.
Jeroen
Secure messaging: http://quickmsg.vreeken.net/
I'll be amazed if they can get a significant number of major publishers to agree to this. I work in a company related to the electronic publishing industry and I know that publishers are just as fussy about their copyright as any other industry, if not more so.
I would suspect that libraries participating in this kind of project leave themselves open to all kinds of action in similar ways to the Napster issue. Since most if not all libraries have a limited budget any threat from a publisher is likely to cause the software to be removed, which doesn't really produce a confident, secure archiving solution.
It is certainly true that this is one of the biggest issues in the electronic publishing industry at the moment though, if not THE biggest.
Q.
..the LOCKSS system is simply a webpage caching system with the added feature of being able to talk to other PCs and compare webcaches of the same document. Doesn't sound like a replacement for Napster or Gnutella to me.
And because of psychology and sociology I know that your post is probably a result of some strange twist in your head caused by your direct social environment ;)
Jeroen
This would be A GOOD THING (tm) if there was any real value-add to these online journals, but let's face it, most of them are just transitions from the in-house print format to pdf/html.
When 'proper' scientific online journals emerge - ones that allow online peer review, rapid publication (by which I mean hours, not weeks), and generally facilitate scientific debate and progress, whilst allowing access to all interested parties (e.g. how many medical journals do you know that accept submissions from patients?) this will be an issue. But then they won't be able to have a paper version, and the giant publishing houses will fall (Yeah, naive, I know) and it SHOULD be different.
What I want is an online journal in DocBook, or some other XML, that allows me to do proper contextual searching and ask questions like: what papers talk about knees, have been reviewed by 'respected' personages, have been cited by at least 20 other authors and are less than 2 months old?
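To make that kind of question concrete, here is a rough Python sketch of the query, assuming the journal's markup has already been parsed into records. The field names (`keywords`, `reviewed_by_trusted`, `citation_count`, `published`) are invented for illustration and "2 months" is approximated as 60 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Paper:
    """Hypothetical metadata record extracted from a structured journal."""
    title: str
    keywords: set[str]
    reviewed_by_trusted: bool  # reviewed by a 'respected' personage
    citation_count: int
    published: date


def find_papers(papers, keyword, min_citations, max_age_days, today):
    """Return papers matching every criterion, in their original order."""
    cutoff = today - timedelta(days=max_age_days)
    return [
        p for p in papers
        if keyword in p.keywords
        and p.reviewed_by_trusted
        and p.citation_count >= min_citations
        and p.published >= cutoff
    ]


if __name__ == "__main__":
    sample = [
        Paper("Knee A", {"knees"}, True, 25, date(2000, 4, 10)),
        Paper("Knee B", {"knees"}, True, 3, date(2000, 4, 20)),
    ]
    hits = find_papers(sample, "knees", 20, 60, today=date(2000, 5, 1))
    print([p.title for p in hits])
```

None of this is possible against a pile of PDFs, which is the whole argument for publishing in a structured format in the first place.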
At the moment this might be a good mechanism for public facilities such as schools and libraries that don't have the space/staff/cash to take the paper versions of these journals, but ensuring the permanence of online journals - there ain't really any such thing YET.
Raist@postmaster.co.uk
Methinks LOCKSS is good for safe and redundant storage of data. However, it gives *no* warranty for storage in the *long term*. Actually, I think that is a real problem for digital data, and one that is far from being solved.
A totally different point of view is the following. What good is an ideal storage system for the scientific world? Probably none, because that is not the problem.
As a PhD student, I'm deeply involved in the scientific world, not to say that I'm (going to be) a scientist myself. But a progressive one, who certainly likes the net, and electronic publishing. I've published electronically myself, at Brain Research Interactive. However, ZERO response!
The scientific culture has the following properties:
(i) Scientists have a highly conservative attitude. If 'the others' don't like it, they simply will not touch it.
(ii) Naturally, status is very important. Only scientists with very, very high status can change things. Furthermore, the status of your publications is everything. You will not publish in a low-status journal unless your data is bad. So, as long as the electronic journals don't have a high status, they will be neglected.
(iii) Apart from the aforementioned characteristics, the peer review is important. The peer review mechanism is more important than what the journal looks like, or its medium. And for the peer review, you need good editors and a good system, which of course is expensive.
In conclusion, just putting up a website with articles and calling it a journal won't work. The safety of the data is only a minor point.
But, let's keep on trying!
Jeroen
Writing about music is like dancing about words - FZ
My fear is that increasing amounts of resource will be poured into maintaining these academic papers for posterity, when many are nothing more than a rehash of earlier work, or turn out to be pure crap anyway.
OK, storage and networking costs are coming down all the time. This may be so, but the infrastructure has to be provided and maintained, for an open ended volume of data. In addition, there is a cost to each researcher in the future who may have to trawl through a load of lame crap to find the paper they need.
In short, it needs moderating. Books and papers going out of print, or the final copies getting lost, are nature's way of moderating irrelevant crap. Because, if it wasn't irrelevant, someone would have invested in preserving it.
Once these papers are moderated out of usage through neglect, a future generation may indeed be interested in them. We have a name for such people - archaeologists. These are people who sift through the discarded, irrelevant crap of civilizations, to find out about the civilizations who went before them, the ones who valued these discarded things.
Stephen Hawking has written another book. It's about time as well.
Hence, it would be an apples-to-oranges comparison to compare LOCKSS to Napster or FreeNet, which are meant to provide a more dynamic sharing service with other users on the Internet.
In fact, from the FAQ, it seems that LOCKSS can't be accessed from outside the implementing facility (it only caches the journals libraries subscribe to), so your whole concern is really moot.
Go get your free Palm V (25 referrals needed only!)
Before I'd so much as seen a webpage I was using DATASTAR and Dialog (and a couple of other big online databases). For those who haven't seen them, they are /awesome/ -- they have complete, indexed fulltext of literally thousands of newspapers, newswires, magazines and journals (academic and popular). I was thinking about this last night in the context of searching, i.e. that I was lucky to have had some training on searching those (they had their own oh-so-user-friendly commandline search languages).
This is the biggest missed opportunity of the web/net. Searching for articles on something using standard web search engines is slow, painful, and often you end up with a random assortment of stuff. You spend ages sorting spurious hits from the real thing, following links that look like they might be relevant but actually aren't, and so on.
What I would like is a web interface to one of those databases. I'd even be willing to pay small amounts to get the fulltext of an article once located. Too often the best info you can find is a mixture of someone's personal notes, a couple of academic sites' "top level overviews" without anything specific and a bunch of lame niche sites. When I first heard about the web I naively imagined it might become something like the great free public lending library; alas, not so.
Is there any chance of digital access to the LOCKSS info? Not unless you're physically in the library, I guess. Ah well.
vila: a long and noble tradition
Camaron de la Isla 'When I sing with pleasure, my
"None are more hopelessly enslaved than those who falsely believe they are free." -- Goethe
There indeed is a difference, but I think you make a mistake here.
Astrology is about predicting the future or horoscopes with some mumbling about the cosmos blah blah blah.
Astronomy is a real science, and SETI is trying to prove that there is life out there. They are simply trying to provide evidence for a mathematical model that says there is a good chance that there is life outside our solar system. No astrology involved here.
The assumption that it will not be of any use is also wrong. Although not in the near future, there is a chance that theorizing about quantum theory, relativity and such will eventually provide faster computers/communication. The caveman rubbing two sticks together probably got criticized too for not hunting, but once he got fire I bet people said something different....
Jeroen
It's a good idea, but I can't see how they can 'ensure that there are always at least a minimum number of copies of each article around the world'.
First of all, according to them, only libraries which subscribe to certain journals will have the articles - therefore if only 5 libraries subscribe only 5 copies will exist.
Secondly, conversely, what happens if 5,000 libraries subscribe? That'll be a lot of redundant storage - yeah, I know it is sorta what they are aiming for, but I'm guessing that these journals will need a bit of file space per issue.
Oh, and I just love the bit about 'java/linux based server system which is designed to run on cheap computers at libraries' - most library computers I've seen are still 1986 type modules (no GUI). The 'high-spec' machines in my city library (which has around 6 public 'dumb terminals', 8 staff 'dummies' and 3 public 'PCs') have a per-hour charge for usage.
Richy C.
--
Hmm, perhaps something along the same lines as Advogato could be applied here. It's a weblog like /. but instead of a moderation system it relies on a "trust metric" where users are certified by other users, ensuring that people with relevance to the field are given more of a voice. Of course, it's not a perfect system but it would definitely be more productive than /.'s moderation system for an online journal.
You'd also need more advanced formatting (perhaps a LaTeX to HTML converter, since LaTeX seems to be the preferred choice for writing papers) so that equations, tables and graphs could be included in both the paper and responses, and a decent search engine with multiple criteria for finding articles/comments.
I think you could do this now, but it would be a very difficult project to code. Still, maybe someone out there's working on it?
It's OK to get drunk with your professors. Just don't sleep with them. Drink a tall glass of water NOW and take 2 aspirin. E-mail me in the morning.
A-men. Aaaa-men. Aaaaaaa-men. A-men! A-MEN!
I shouldn't say this since I normally don't criticize ACs, but you are so full of shit that a bowl of hot grits down your pants would actually make you smell better.
Thank you?
Yes, I read the article, but what I was responding to in part was the comment at the end of the article blurb:
Sounds like self-interrogating distributed file systems can be useful to people unlikely to get sued by rock bands, as if that wasn't obvious.
Given that this system has the potential to be implemented in such a way as to be openly accessible rather than limited to a set userbase, and that was what HeUnique seemed to be implying, that was why I wrote what I did. What it is now is not what it could be in the future, and this type of mechanism can be implemented in other distributed networks as well. And given that the content it stores does not necessarily have to be scientific journals, I think my point does apply.
All you have to do is get the tape archive of Echelon and then you've got copies of everything.
This is because ACs have brains too. The problem is a biased moderation system that
a) unfairly demotes AC often by 2 points
b) encourages karma-whoring by offering a +1 attack
The solution is not e-mail registration; that is for someone willing to have this garbage traced back to them (a liability) despite getting absolutely no compensation to post this shit. Think about it: why should I have either an e-mail or an IP address associated with this drivel?
Registration should only secure you a user name to aid continuity. E-mail and IP logging are unnecessary. User name and password - Unix got it right, so how come Slash got it wrong?
What we could do with is an online _based_ submission and review site for scientific papers; something based on the /. model (with a discussion area for online discussion and analysis of papers, some sort of versioning to allow corrections by the author, and the ability to rate papers on a scale of 1-10). Papers scoring highly (a weighted average of the scores) could then be submitted to a more formal 'classic' peer review, then see real paper (thus allowing Real World income from the process). The distilled papers that emerge from this should be of a higher quality, with the authors of papers that make good points but that have glaring holes given time to repair their mistakes, and in cases where a reader/reviewer is in a similar field and can fill in gaps the author missed, opportunities for both to produce a joint paper that neither could have competently completed alone.
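As a rough sketch of the scoring step, in Python: the comment above doesn't say what the weights would be, so this assumes each rating carries a reviewer weight (say, from moderation history) and that a made-up threshold of 7.5 decides which papers go forward to the formal review.

```python
def weighted_score(ratings: list[tuple[int, float]]) -> float:
    """Weighted average of (score, reviewer_weight) pairs.

    Scores are on the 1-10 scale; the weights are an assumption
    and could come from any reputation scheme.
    """
    for score, _ in ratings:
        if not 1 <= score <= 10:
            raise ValueError(f"score out of range: {score}")
    total_weight = sum(weight for _, weight in ratings)
    if total_weight <= 0:
        raise ValueError("ratings must carry positive total weight")
    return sum(score * weight for score, weight in ratings) / total_weight


def ready_for_formal_review(ratings, threshold: float = 7.5) -> bool:
    """True if the paper scores high enough to go to 'classic' peer review."""
    return weighted_score(ratings) >= threshold


if __name__ == "__main__":
    # One heavily weighted reviewer gave 9, one ordinary reader gave 6.
    print(weighted_score([(9, 2.0), (6, 1.0)]))
```

Weighting matters here because a plain average lets a handful of drive-by ratings drown out the few readers who actually work in the field.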
--
-=DaveHowe=-
For the record, I've not only read your drunken rant, but enjoyed it. In fact, it is "funny".
There is also another area for concern with online journals. When you or your library subscribe to a hard print journal you get sent a paper copy which you get to keep and refer to whenever you want -- for the rest of eternity if you keep it in good condition!
This may not be the case with an online journal. Here the publisher can license the journal to you in such a way that, if you decide to stop your subscription, you don't just lose access to future editions, you also lose access to material you previously had access to.
I don't know how prevalent this kind of licensing is, but I bet we are going to see more of it in the future.
This may not be too much of an issue at the moment as many journals are hard copy + online access; but eventually the hard copies are going to go.
If you use an online journal, check out its license and see where you stand.
Keep your programs tidy.
Exitzero.
It seems like most people are looking at whether this could be used to store copyrighted data illegally.
This is for SCIENTIFIC journals. While the journal does have copyright protection, they run articles written by researchers at various universities (and in industry).
The desire of more researchers to be published has resulted in additional journals being formed. Because publishing a journal on the web is dirt cheap, it makes sense that with the Internet available, more of these journals will appear.
The problem is, you need an archive of it. This system is a system to guarantee that we do not lose knowledge. It doesn't even have to be available. They could cut a deal with the publishers of the journals that they will maintain the archive, but that if the company goes out of business or stops providing old articles, the archive can show them.
This would be voluntary, but the publishers would jump at it. Why? Because this system gives them more credibility than a web page alone. The guarantee against the future loss is the best protection for their journal, which makes it more likely to get high quality entries.
While Freenet or other groups may use similar technology, this is a COMPLETELY different project. This isn't about letting people submit data and protect it, this is about preserving the body of scientific knowledge so we don't lose it when a company goes bankrupt. Digital versions are easier to duplicate than paper equivalents, but our system of copyrights is trying to discourage that. E-books, E-journals, E-magazines have a significant risk. A copy of an article in a manila folder can be lost or destroyed, but is otherwise perfect. A bookmark to a website can disappear at the whim of a publisher, and there are legal AND technical attempts to prevent you from properly saving an article...
It's a very strange situation, and projects like this are VERY important to prevent us from losing knowledge.
I know this sounds elitist, because I'm worrying about the body of knowledge of scientists but not other people. Here is the thing: the information age has allowed more people to publish their ideas and beliefs. However, because we have all jumped onto this technology, we didn't take adequate safeguards to ensure that we don't LOSE anything in this transition.
If we archived everything that was traditionally published, we'd have the old status quo. If we archive everything traditionally published and let others publish non-archived, we have a better system than the status quo. An environment where we publish everything, maintain nothing is questionable. In some ways it is better than the status quo, more liberal publishing, and in other ways worse, more data loss.
The idea is to come up with a STRICTLY better system, where NOTHING is lost and we gain some advantages. Normally, there are tradeoffs. The goal is to avoid tradeoffs, and just make things better.
Alex
The solution is not e-mail registration; that is for someone willing to have this garbage traced back to them (a liability) despite getting absolutely no compensation to post this shit. Think about it: why should I have either an e-mail or an IP address associated with this drivel?
Agreed. I mean it's only a weblog isn't it? It doesn't really matter who you are or what you post at all in the real world. And I don't really see the point in starting ACs at zero - after all, anyone can get an account and post the same stuff at 1. And WTF does an E-mail account say about who you are anyway? Nothing really.
Even one of the head developers of FreeNet has said your post was BS and still you defend it.
He has also said, as of right now, that your posts are the truly pointless and stupid ones.
-
We cannot reason ourselves out of our basic irrationality. All we can do is learn the art of being irrational in a reasonable way.
Understood, the business model developed in the paper age when someone had to print & distribute academic papers. But I cannot see a good reason why firms like Elsevier should continue to be as hugely rich as they appear to be.
The web offers an easy way to take most of the cost out of the loop. What costs remain - those of web publishing, and of having journals edited & papers reviewed - should (it seems to me) be capable of being funded from academic departmental budgets, in return for the academic kudos of being a reviewer/editor/web publisher.
In this enlightened scenario, there would be very much greater dissemination of the knowledge produced, to the benefit of a very much wider set of users.
LOCKSS is a very good idea, but perhaps Project Gutenberg or something similar would be a suitable receptacle for all the knowledge. The only obstacles would be copyright or lack of resources to transfer the data.
I am a man, not a toy.
Thanks. So I was wrong, but hey, I'm not going to lose any sleep over the issue :) And at least the replies which were more than "MODERATE THIS SHIT DOWN!" have allowed me to learn something, so in a way it was worth me making the original point anyway...
If you're worried about the archival quality of CD-ROMs, that is a reasonable concern, but there are archival issues with books too that are dealt with by all libraries - they just have to learn some new stuff.
There are other dead tree archives that need saving.
Museums also hold important archives that will simply not be available to the public in the near future as older books and journals become too fragile to allow casual browsing.
For example, the Natural History Museum in London contains archives dating back hundreds of years. The original diaries of Darwin's voyages are held there but the pages are so fragile now that ordinary visitors can no longer examine them.
Paper sources do not have an indefinite life and if they were reproduced electronically they could be available online as a resource to be treasured and not let to wither away only accessible to a few select researchers. A system such as LOCKSS could provide a cheap method to preserve ancient tomes and to promote wider access.
Huh?
This kind of synchronizing mechanism is already implemented in all sorts of distributed networks. Any rudimentary distributed file system (e.g. Coda) will perform such necessary synchronizations between its nodes. The point is that LOCKSS was never intended to be a truly open, free-for-all system, and it respects copyrights by staying that way.
No one in their right mind would create a Napster-like sharing program that automatically synchronizes files with many other users and implements a cache that NEVER erases (after all, isn't that your whole point about it being better than FreeNet?)
I said it before, and I'll say it again - it's an apples-to-oranges comparison. Given sufficient effort and time, any software system has the potential to do just about anything. Saying this system has the potential to be a better FreeNet is akin to saying ICQ has the potential to be a better OS, if "implemented" in such a way. The goals of FreeNet and LOCKSS are fundamentally different in every sense, and forcing one to be another would just give you a complex and inefficient hack of a system.
I am not associated with kuro5hin.
I visit both sites.
I point out k5 because I would like to see slashdot changed. If, before change, slashdot is replaced, so be it.
I will not develop my own site using Perl or slashcode or whatever you call it.
If that's the case, please post while registered next time. No self-respecting AC would post that drivel. That AC checkmark box is a pox on this site. Either end AC posting or end the war on AC posters. Slashdot wants to have its cake and eat it too.
Yes, very much so. It is trying to solve a huge problem, and it's a nice try.
Where people are missing the point is in that they are assuming that academic journal publishers are nice, happy people who live in the academic world and believe in sharing information. This just doesn't happen in real life.
I work on a project involving linking abstract information to full text. Some publishers won't even allow "deep-linking" to individual articles on their web sites - I'm talking about one of the major scientific journal publishers here - their URLs have an encrypted hash at the end to prevent anyone from producing deep links.
Publishers are incredibly protective of their copyright and I just don't see something like this taking off for major journals. The article I read was talking about JStor - well that's a not-for-profit organisation and they only store back issues anyhow (a very useful service, don't get me wrong, but they're not a primary publisher).
The desire of more researchers to be published has resulted in additional journals being formed. Because publishing a journal on the web is dirt cheap, it makes sense that with the Internet available, more of these journals will appear.
The reason this isn't happening as quickly as it would appear is that existing scientific journals have a great "kudos" and prestige associated with them. Every article in those journals is peer-reviewed by respected academics in the same field. Having an article published in one of these big journals can greatly affect an academic's career prospects and pay.
Doing this independently over the web is not impossible but it's a bit more difficult than just throwing up journal articles on a web site.
I'd love to see a "Slashdot" style journal where people put up articles and others within the field replied with comments, but we're a bit of a way away from this yet.
Q.
I have removed kuro5hin from my profile because I am NOT associated with them and don't wish to cause confusion.
I am also not associated with CNET - however they invariably post things 3 days before Slashdot or 3 days after. Kinda funny actually.
I thought this was one of the reasons for the Library of Congress: to preserve information. Now, if we can just persuade the LoC to cat rectum | gunzip >head and realize the end of the 20th century is nigh we might not need things like this.
www.eFax.com are spammers
Many scientists keep electronic copies of articles they consider valuable or important in their field. (Not me, of course, I would never do this, unless it turns out it's not illegal.) I'm sure just about everyone I know keeps a little collection of pdf files, and collections of other kinds of data will certainly become popular to the extent that valuable articles are published in those formats.
Anyway, it seems to me this ought to produce a natural survival of the fittest articles. Those articles that are most widely appreciated will be cached in the most locations. In a large enough field, this is almost certainly already the case. If less popular articles (i.e., those least appreciated by the scientific masses) are lost, this is probably no greater a tragedy than the loss of work which goes unpublished (i.e., those least appreciated by 2-3 reviewers).
Of course, this enormous and rapidly growing archive is completely unorganized, and doesn't provide an easy mechanism for public access for the distant future. But the level of concern that valuable articles will be lost should be less than if people weren't already making enormous personal archives.
There's also the Internet Archive, which is building a library of snapshots of publicly accessible Internet sites, currently standing at 14 terabytes of information stored on digital linear tapes.
The Internet grows at a rate of 10 percent a month, according to the Archive's estimates, while the average life of a Web page is only 75 days. Obviously, a lot of data is being lost. Much of that comes from commerce and media sites that often kill pages containing obsolete information.
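To put those figures in perspective, here is a quick back-of-the-envelope calculation in Python. It simply compounds the Archive's quoted 10 percent monthly growth and assumes that rate stays constant, which is obviously a simplification.

```python
import math

MONTHLY_GROWTH = 0.10  # the Archive's estimate quoted above


def growth_factor(months: int) -> float:
    """How many times larger the Internet is after `months` of compounding."""
    return (1 + MONTHLY_GROWTH) ** months


def doubling_time_months() -> float:
    """Months needed for the size to double at the quoted growth rate."""
    return math.log(2) / math.log(1 + MONTHLY_GROWTH)


if __name__ == "__main__":
    print(f"Growth over one year: x{growth_factor(12):.2f}")
    print(f"Doubling time: {doubling_time_months():.1f} months")
```

At that rate the Internet roughly triples every year and doubles in a little over seven months, so any archive has to grow at the same pace just to keep up.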
But some of this information is still relevant to researchers and historians.
For example, the Internet Ecologies Area at Xerox's Palo Alto Research Center is using multiple snapshots from the Internet Archive on disk -- "the Web in a box" -- as a kind of test tube for understanding the Web.
The ultimate goal of Internet Archive is to provide free access to the Internet's complete past, so that individuals looking for clues into how a culture changes will have one more medium to play around with.
Call it LIBNET.
Hell you could even have Deja News index all the articles for you.
Agreed that paper is still the most useful form of storage if done right (acid free, well stored, etc.), but the digitized form does address two major issues:
(1) accessibility of the data from anywhere;
(2) having an additional archive for redundancy.
Being a history buff myself, I have something of an attachment to good old paper and its close friend microfiche. :)
-L
This is definitely a great idea. A lot of people would lose out on information if it wasn't for something like this. I hope experiments like this will even help us push towards preserving our printed material. Here at the University I attend, there has been a real push to preserve printed material in a digital format. I just wish this would help urge the Library of Congress to start some sort of "digital preservation" project.
The reason it is both valuable and necessary to be able to store all kinds of documents in perpetuity is two-fold:
(a) you will never lose anything by having more information available to you
(b) only hindsight can tell you what kind of information will be valuable.
If we had been able to store all the information about past civilisations then we wouldn't need archaeologists, who are in essence glorified hardware-based search engines!
A far-fetched example to illustrate point B: far in the future, years after chickenpox and all other viral diseases have been eradicated, a random mutation creates a new chickenpox-like disease. Diseases were eradicated, so people decided storing information about how to cure them was unnecessary. The plague of ChickTwo wipes out all life on earth :-) (Hmm, I smell next year's blockbuster...)
A more down-to-earth example (I always think of them later) for point A would be that you encounter an engineering problem that was attempted but never solved years ago: it would seem documenting a failed attempt at doing something would be a bad idea, but in reality being able to check what everybody else has already tried would greatly accelerate your own attempts to solve the problem.
Information is going to be so easy to store in such large amounts that soon the issue of whether to bother to store it will fade away. What does need work, as numerous others have mentioned, are better search engines and methods of ranking items by relevance, not importance: this is not the same as throwing them away.
Relevance-based searching, as opposed to popularity-based, is why Google is a better search engine than most/all others.
Disclaimer: I have an MLIS and I used to work for an organization affiliated with OCLC. I now work for a wholly owned subsidiary of Reed-Elsevier, who btw is not participating in this project that I am about to write about.
I completely understand the need to archive data/research, especially that found in STM journals (Science/Technology/Medicine). History has shown us the dangers of not being diligent in archiving AND it has shown us the difficulty with archiving. There already exists an organization in the library community that is providing an excellent archiving solution. That organization is OCLC. They have been a repository since 1967, starting out with archiving cataloging records and sharing them (for the cost to preserve/maintain) with their membership.
OCLC's archiving solution is called ECO, Electronic Collections Online, where a good number of publishers from around the world are supplying OCLC with digital copies of their journals to be maintained. Additionally, as technology and storage media change, OCLC has taken the leadership in migrating that data to new standard formats as they evolve. Information on ECO may be found here and specifically information on the archiving is here and the participating publishers are here.
Of course everything has a cost. Any university that is taking on this type of activity should really do a serious study on why they are doing it, how much are they willing to spend, will they or future administrations continue funding their archiving project, or should they combine resources with agencies/organizations that are already doing this.
Right now we have little difficulty looking at journals (scientific or otherwise) from 150 years ago or more. The pages may be brittle, but the information is there. Obviously, documents thousands of years old still exist and provide priceless information to researchers.
The larger goal of electronic preservation is to ensure (if possible) that the electronic data of today will have a lifespan equal to, or better than, that of a physical copy. Right now, this is not possible, and it actually seems that the situation for digital information is worse than for physical data. Think about it: depending on whom you talk to, a CD will last 25-75 years, and magnetic media have obvious limitations. Then there is the problem of technology/platform change. How can we guarantee that in 150+ years, PDF, HTML, or even digital data itself will be able to be read? We could be using quantum computers with some bizarre storage medium that are completely incompatible with today's technology.
A common solution seems to be to just transfer file formats to tomorrow's technology as it is created, and transfer these files from the old media to new as it expires. But at the rate humanity is accumulating knowledge, we could quickly be spending more of our time changing PDFs to whatever, and then that whatever to the next whatever, and so on. Another solution is to maintain the archaic platforms of today so that our files can be read. But, however well cared-for, mechanical devices will break down in time, so that is not a realistic option.
Another possible option is to write and maintain emulators for all of the platforms in existence today, so that they can run linux or windows or whatever on the badass machines of tomorrow. But that runs into loads of proprietary technology/patent/copyright/legal issues that slashdotters are all well familiar with. Data needs to be freely available to researchers of the future. It should be just as easy (i.e., no license required) as it is to pick up a book off the shelf.
So from what I know of digital library collection preservation, the situation at present is pretty grim. We are spending huge amounts of money to rush to publish documents in a digital format, with no assurances that 100 years from now (much less a thousand), this data will be available for general consumption. We are all hoping that it will just "work out" or a technological panacea will emerge.
For more on this topic, try this link.
Logically, the term 'relavant' sic requires a to: clause - relevant to what?
OK so obviously it's a troll, but the points are worth refuting anyway
Could a physicist have come up with Java ? MP3 ? Napster ? KDE ?
dunno, but
- Tim Berners-Lee is a physicist and we wouldn't be having this discussion if he hadn't looked into technological approaches to sharing scientific documents, and
- mp3 is a compression algorithm (the word is from Al-Khwarizmi, 9th century Persian mathematician) and so a mathematician (or several) DID come up with mp3
- hardly life-changing examples, now are they?
CS and Marketing, these are the Physics and Math of the new economy
forget CS, think Engineering. Forget Marketing, it's an activity not a science. And then think, hmm, the biggest explosion in literature volume over the last twenty years is in the biomedical disciplines. That's not only where a lot of research is being done, but also where staggering amounts of money are being spent. And there can be no biology without chemistry, no chemistry without physics and no physics without maths.
I think ...
actually, it doesn't look much as though you do....
I suggest you talk to some scientists at some point and get a clue
TomV
True ACs rarely attack their own kind. Please uncheck the "post anonymously" box if you have any integrity.
- Slowing down publication is a good thing. It forces the writer to take the time to double check the data, and to make sure that the paper is readable and the conclusions being presented are correct. (You'll be amazed at what you catch)
- Contrary to popular belief, "peer review" does not (nor should it) happen instantly. When I received a paper to review, I would first read through it once, and then again to make sure that everything made sense. Then, I would go to the library and look at some of the prominent references mentioned in the paper, to get familiar with the research and see the paper "in context". (Again, science does not happen in a vacuum.) Many times a scientific paper is very specialized, and even experienced referees may not be innately familiar with the subject matter. (Case in point: I was an experimentalist, and would often need to go find some of the latest theoretical work.)
- Paper journals force you to live outside of your specialty. Online searches are too good at giving you 'exactly what you want'. When you thumb through the paper journal (or even the online version of a paper journal), you might find a paper that is pretty applicable to what you are doing. ("Hey, that's pretty close to what I'm trying to do!") You lose this with specific topic searches.
Paper journals provide what online forums struggle with: crap control. It's not a perfect system, but forces a little thoughtfulness into the process. (Besides, I can't imagine wading through a bunch of "FIRST POST" and "Hot grits in the pants" articles in the latest copy of Phys. Rev. Letters...)
Sure, journals cost a lot of money to produce (in the hardcopy world). But a whole lot of academic journals are simply an exercise in price-gouging. They charge $10K because they know damn well that the faculty of a university will DEMAND that the library carry a specific high-prestige journal. There isn't any fundamental reason to charge what they do -- witness their profit margins. I'm sure Elsevier would make noise about how their high-margin journals finance the low-margin ones, but that's simply a lie -- Elsevier makes too much raw profit for that to be the case.
This, to my mind, is why online academic publishing is so important -- information won't be locked up in these expensive ghettoes any more, and more researchers (and students) will be able to access it.
This is also, not coincidentally, why you won't see any major companies like Elsevier getting involved in low-cost online journals (for them it would be like killing the goose that laid the golden egg).
The only way Elsevier will go online with their stuff is if they can charge multiple hundreds or thousands of dollars for access.
I have no
Our taxes also fund the local fire department, but that doesn't mean they have to give us rides in their big red trucks for free.
I take it by "Free software ethos" you really mean you want journals to be free? Part of the reason academic journals cost so much is because (a) they have a very limited audience, (b) they generally don't sell their space for advertising, and (c) their target audiences, research institutions and universities, can (usually) afford these prices. Subscription rates for individuals, while expensive, are not outrageously so in my opinion for most journals.
If you've ever tried purchasing an esoteric book in the science or mathematics fields, you've probably experienced something similar: a 150-page book may retail for $150, when the local grocery store hawks pulp fiction by the metric ton. As you identified, it results from their business model: if you are only going to sell a few thousand of something, then a high markup is required in order to make even a modest profit on your work. While I agree that academic books and journals could be cheaper, and they should be so when the distribution costs are lowered due to electronic publishing, I doubt that they could be made completely free without sacrificing quality in the process. Many journals that publish electronically (for example, Physical Review Letters) offer lower subscription rates for the electronic version of their journals than the paper version.
Incidentally, free electronic journal services do exist, e.g. the Los Alamos e-Print archive at xxx.lanl.gov. One thing you will probably notice is that while many of the articles are outstanding, just as many are "I wiped my nose this morning and decided what I saw on the tissue was publishable so here it is" quality. It's hit-or-miss with these articles sometimes. Standard practice among many disciplines is to archive an early draft of the work on the ePrint archive and then publish the refereed, edited, corrected version in a journal such as the Physical Review....
In this enlightened scenario, there would be very much greater dissemination of the knowledge produced, to the benefit of a very much wider set of users.
...which brings me to my point: High quality, refereed journals that cost money are, in many ways, superior to unfiltered electronic archives precisely because they charge for their services and then in turn use a portion of that money to perform quality control. Part of what one pays for is the process of having experts in the field (hopefully) perusing each article closely to catch mistakes made by the authors or elucidate points the authors left unclear. Editors coordinate the refereeing process, and publishers maintain an infrastructure for ensuring this process happens in a timely manner. As long as people are willing to pay for quality control, then a market will exist for these journals. Electronic publishing can do away with many of the costs of publication and distribution of the information, but I don't see how it can reduce the cost to zero without asking publishers to simply get out of the publishing business altogether.
The other problem here is not just that data is lost by simply disappearing. We also lose all the intermediate revisions of a document. All we end up with is the final final final, most recent draft. I suspect (especially with politically sensitive reporting for historical purposes) that the intermediate copies could add a lot to the sum total of information.
Here at UNC-CH (where I serve on the Library Administrative Board; teach Library & Information Science; also I help run Project Gutenberg - good enough?), total subscription costs for journals published by Elsevier are around $1 million US/year. Subscription costs go up every year.
The deal that Elsevier offered for access to their e-journal collection (electronic access to print journals) was a little complicated, but boiled down to:
The solution is to accept the deal (sort of Faustian, I'd say). At the same time, we've made local agreements with Duke and NC State to make sure one location keeps a print copy of every journal we're otherwise getting an electronic copy of.
This way, libraries are sure they're continuing their archival role (with paper, in this case), but at the same time trying to offer the benefits of electronic access to their constituents.
Bottom line: While we don't really know how to best maintain archives to ejournals, at least libraries can cooperate to make sure some sort of access is retained, while going forward with new e-journals.
They are self-organizing and self-policing. If they do a bad job, then people start new ones to do a better job. The client chooses which ones to join or start based on their perception of quality.
I'm sure that if the LOCKSS system works out well the librarians and archivists will still complain.
;D
They are just the type who will complain about anything. I mean, they get to sit all day and stare at shelves of smelly old books, and they have nothing better to do than complain.
Personally, I think that if they really want to do something they should all get themselves a really reliable laser printer, and get started on making those electronic archives permanent.
Opportunities multiply as they are seized. --Sun-Tzu
no problem! just put all your priceless articles on an NT webserver* in Word2000 format. they'll be available for the ages to Share and Enjoy!#
* assuming the evil "hackers" don't trash your webserver (though we have no idea how, since there are no security holes in IIS)
# at least until we release the next version of Word
seriously, I don't think it's going to happen any time soon. I'm publishing a paper this summer, and there is a whole freaking PAGE I have to add in saying that SAE owns any and all rights to my paper, etc.
personally, I find that a little scary.
nor do I think that they're going to let just anyone archive them -- maybe that service that already archives about a zillion news sources. that's subscription-based, though, and I don't want to see what's going to happen once we don't have paper copies to rely on anymore!
Lea
This doesn't just touch physics journals- although the physicists are more likely to be rational about the issue than the record or movie industries. But no debate about copyright or intellectual ownership I have seen to date has looked at the long-term issues. And by long-term, I don't just mean years or decades, I mean thousands of years.
Everyone decries the burning of the library of Alexandria. Its destruction has greatly impoverished us today; it denied us access to the thoughts and works of the people who wrote those books. The burning of the library of Alexandria was a lobotomization of human culture.
But if Alexandria was a lobotomization, today's IP rules are senility. This is because, for short-term gain, they ignore the fact that the only way information survives long-term is if it's _copied_. No medium lasts forever. Pop quiz: how many books around today are more than 200 years old? Euclid's "Elements" survives today because it was copied, and copied again. The image of cloistered monks painfully hand-copying books survives to this day.
We need to address the concerns not only of the artists, writers, creators, and IP corporations, not only of the IP consumers, but also the concerns of our descendants a thousand years from now. Otherwise, we risk being a culture with no history, and therefore no future.
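To put rough numbers on the copying argument, here is a back-of-the-envelope sketch in Python. It assumes every copy is lost independently with the same probability over some period, which is a simplification made up purely for illustration (real losses are correlated, and the 0.9 figure below is invented, not measured). The function name survival_probability is hypothetical too.

```python
def survival_probability(loss_prob: float, copies: int) -> float:
    """Chance that at least one of `copies` copies survives.

    Assumes (for illustration only) that each copy is lost independently
    with probability `loss_prob` over the period of interest.
    """
    if not 0.0 <= loss_prob <= 1.0:
        raise ValueError("loss_prob must be between 0 and 1")
    if copies < 0:
        raise ValueError("copies must be non-negative")
    # All copies must be lost for the work to vanish: loss_prob ** copies.
    return 1.0 - loss_prob ** copies


if __name__ == "__main__":
    # Invented figure: each copy has a 90% chance of being lost.
    for n in (1, 5, 20, 50):
        print(n, "copies:", round(survival_probability(0.9, n), 4))
```

Even with a made-up 90% chance of losing any single copy, the odds of the work surviving climb quickly as copies are added, which is the whole point of copying and copying again.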
Why is paper considered more "permanent" by any standards? It is bulky, it requires trees, something we don't have a lot of, it is way too difficult to disseminate (as someone who has done quite some research I can tell you how backbreaking it is when you have to lug massive journals to and fro), it is difficult to search through it. And if you happen to find that the particular article you happen to be looking for has been neatly cut out by someone, well, you can't do much about it except curse. Photocopying journals is a pita. And most important, the speed with which research is done today means that by the time that paper journal is compiled and reviewed and edited and mailed, the whole thing is already outdated. Journals aren't something you curl up in bed with (usually). And data on hard drives is way easier to back up, archive, and disseminate than data on paper. Paper decays with time and degrades with use, which can't be said for electronic media. I for one, think that paper journals can be completely done away with.
Farhat.
At the intersection of computation and biology.
If they're worried, they should just archive away.
As for the journals making it difficult, I think the universities are paying way too much for an independent party to just organise and collate _their_ material.
The universities should just get together and set up their own journals. They can then publish, subscribe and archive for whatever prices they want.
Storage is pretty cheap considering the volume involved and how much they are already forking out to the journals per year.
Cheerio,
Link.
Can anyone here say NIH?
(8-DCS)
paper will last longer than any computer.
if you print on hemp or cotton paper, there are samples that have lasted more than 400 years. you really think that the html you create now and store on a hard disk or CD-ROM will still be accessible in 400 years... <ha ha ha ha ha!>
anything physical/electronic is transitory -- get over it.
No, they put out fires for free; that is the purpose of funding them. The purpose of funding them is not to provide a taxi service.
I concede some of the points you make, such as that there are free & low cost electronic journals, that cost will never fully be wrung from the system, that you generally have to pay for quality, that there is too much crud around.
If government can fund research and education - as it does in the UK and EU and presumably US - can it not fund whatever costs cannot be wrung out of a system that dispenses with the Elseviers? Yes, you need reviewers and editors and administrators, and the better they do their jobs the better the quality is. But in my experience, reviewing journals provides kudos for the reviewer, which they parlay into better & higher paid academic jobs. The better the reputation of the journal they review for, the more the kudos. For me, writing and reviewing papers would seem to be part of the normal fare of the academic, the salary of whom is paid by the state. Editorial and admin tasks are surely not so onerous that their being funded by Universities (The MIT Journal of This, The LSE Journal of That) would cause bankruptcy. I can see a Free Journal business model which appears to be Win (for readers) Win (for academics) Win (for government) Win (for society, development, implementation of that which arises from the research) Win (for quality...or at least, no worse quality than is currently the case.)
Granted, finally, we might have to meet in the middle ground somewhere. Currently the stakes are well against the impoverished - or even the fairly wealthy - would-be reader.
And, I would argue, there is value in getting this information out of the University library shelf and into the hands of the masses (or that subset that is interested). Yes, we need better crud filters and less trash published. But we need better access now to the fruit of the research that we fund.
Finally, I recall back in 95-96, in the UK, there was much debate in government as to whether government publications should be sold or given away free on the internet. Thankfully, the decision from central government was "publish for free". And it's great. Now, with ease, I can read Hansard, Bills, Acts, Statutory Instruments, Research Papers. Government recognising that citizens should not be charged to read those things they have paid to be written. Let us see if the Academics can be persuaded to look down from their lofty comfortable subsidised towers into the real world - small town England - where there is the same demand to read that for which we have already paid.
(I once had lunch with the chairman of Elsevier; nice bloke - he picked up the bill too.) Hmm. I think
In fact, as it happens, LOCKSS is being worked on in association with HighWire Press at Stanford (disclosure: also my employer :), which publishes the online versions of nearly 200 major scientific journals and has a _very_ good relationship with a large number of publishers. They also list the Journal of Biological Chemistry and Science Online as "partners".
I wouldn't be so quick to dismiss the interest that the non-profit academic societies have in preserving this information.
I am Jack's complete lack of surprise.
I also concede many points in your very thoughtful reply, and I think we both agree that changes to the system would be an improvement. (I, for one, would start with journals that charge exorbitant page-charge fees! It's not just the end-user who suffers...).
One difficulty with changing the system is that one is left with a "chicken and egg" dilemma. As a personal example, I only read about four or five journals in my research, and I only publish in those same journals. If I want to communicate to people in my field I need to publish in the journals that they read, so unless those journals change their way of doing business, I have little choice but to continue publishing in them.
I applaud the decision in the UK to make this information accessible on the net for free; perhaps convincing the journal publishers to provide low-cost access in public libraries to electronic versions of their journals may be a first step at providing the access you refer to.
Incidentally, in the case of scientific journals anyway, I think that most of what's published is of scant value to any except experts in the field, and these experts generally have access to a university or laboratory library. (I am a physicist, and I lack the background to get much out of at least 80% of Physical Review Letters, as an example). I doubt journal publishers would lose much revenue by providing such a service.
it was 3 in the morning, i was procrastinating to avoid a stupid program i have to write... (damn college work... who needs it anyway... at my job, i get to make my own projects...) of course, here i am, at work, procrastinating by posting to slashdot... interesting...
Think that was flamebait? You've obviously never met me in person...
$email=~tr/.@/
It's really good to see that this sort of issue is being considered so well in advance of full transferral to digital scientific communique. Unfortunately it has to be in advance, because the transition from paper publications to electronic form is moving along at the pace of a cold sloth. It hurts me every time i watch the multiple paper copies of manuscripts go in, watch the bills from the publishing company come back and several months later the paper is published in unmodifiable format. Open research is the future, and it needs to happen now. (further rants here. )
somebody bent my whookey.
I notice they're quite good at copying existing software and calling it a revolution, though ;)
--
It's a
-- Danny Vermin
They don't have to get any agreement from publishers.
The libraries have subscriptions. They pay library rates, which means they get to allow the public to read them. They are making archival copies, for which there is ample case law all the way up to the Supreme Court. What exactly is the problem?
Of course if some dirtball buys a dirtball law and a dirtball judge, anything can happen. When that happens there will be no more libraries.
Zax
-- We are Linux. Resistance is measured in Ohms.
Really, who cares? This isn't the burning issue. Fact is, any papers of any worth whatever will be downloaded and stored by others working in the same field. The University where the paper originates will certainly have a copy.
The real issue which everybody seems to be ignoring is that most scientific journals charge exorbitant subscription fees. Remember that in order to cover some very active fields (eg neuroscience) you need access to a dozen journals. This puts access to scientific papers completely out of most people's reach since most people don't live within a short walking distance of a University library.
Most published scientific research comes out of universities and government-funded laboratories and much of it is ultimately paid for by you the public. In any event most scientists today would surely agree that this most fertile fruit of human knowledge should belong to humanity at large, not a select few.
Us open source types often bang on about how the open source model of software development is theoretically sound because it closely resembles the peer-review model of scientific progress. But while with open source software absolutely anybody can obtain code through easily accessible channels and absolutely anybody is welcome to contribute, with mainstream scientific research not only are laymen mostly excluded from actively participating, they also generally can't even get to read the published details.
The whole system of publication in subscription-only journals is thoroughly outdated and completely inappropriate for the so-called "information age" IMHO.
Consciousness is not what it thinks it is
Thought exists only as an abstraction