dchud · Slashdot Mirror

User: dchud

dchud's activity in the archive.

Stories: 0
Comments: 3
First seen: 2000-08-22
Last seen: 2002-11-05
Profile: (view on slashdot.org)

Comments · 3

Topic misrepresented, and what you don't see on More Universities to Publish Courseware Online · 2002-11-05 08:25 · Score: 4, Informative

Maybe I should've been a little more wordy in the original post; I'm afraid the focus of these stories has been mostly misrepresented. FYI also, I helped work on the development of DSpace, so I'm biased, but since I no longer work there my remarks in no way should be taken as representing their official viewpoint. :)
The collaborative effort of the institutions mentioned, and the stories posted, are not primarily focused on courseware (although they are explicitly intended to support long-term storage and access to courseware materials). The goal of these efforts, which in these stories surround the DSpace project specifically, is to extend the range of services provided by these institutions, more specifically their libraries, to incorporate a scalable model of digital shelf space. In other words, these are infrastructure efforts (so if you really are impressed by that part, don't bother reading on!).
At MIT Libraries, for instance, the main focus of their DSpace implementation is to capture the digital products of research conducted within the MIT community. This includes articles, books, technical reports, theses, datasets, audio files, videos, images, maps, and so on. Much like the existing physical library buildings and collections, these are to be organized according to how they can best serve the departments, labs, schools, and research centers at the Institute, with the new exception being that at first DSpace will focus on capturing materials generated locally, rather than selecting and collecting materials produced externally. Or worse, research materials that are generated locally by people at MIT, then given to publishers, and then sold back to the libraries at great cost. So from an infrastructure perspective, what they are trying to achieve is to extend the range of what libraries provide in terms of collections and services to now also include all kinds of digital materials, starting especially with digital materials created at MIT.
A few examples illustrate this best: first, consider the junior faculty member with her own articles on her department web page. We've all seen such web pages disappear within 1-3 years. What happens to her colleagues at other institutions who lose access to her articles, which maybe never got published in traditional outlets, but are nonetheless vital to their own work, and thereafter are reduced to so many broken bookmarks? At MIT, DSpace will take stewardship of those materials, giving them a persistent url and carefully recording descriptive, technical, and preservation metadata about the files and their formats. So in this case, DSpace takes that 1-3 year period of unreliable access and extends it to a minimum of 3-7 years of predictably reliable access. At this point in the web's history, you can't really get that anywhere else, and there's every reason to hope that number will really reach into the decades; it just can't be promised reasonably today.
A second example: an interactive, multimedia, experiential web resource administered by some professor on an aging redhat 6.0 machine under their desk. It's rich in data, it demonstrates a breakthrough in the state of the art, or the idea, in some nascent discipline, and it's widely used by scholars of that discipline, and it _can't_ be "just printed out". What happens when that machine blows a partition, or is compromised because its amateur sysadmin is really a scholar, not a wizard?
Obviously, as indicated in the story, a good third example is courseware materials. If you look closely at OCW or the other well-known examples thereof, you'll see that in many ways, they are (IMHO) foremost publishing ventures serving the educational process. Getting the materials into standard form, getting them delivered by a deadline, keeping them viable during their relevant terms. Doing this so openly, and freely, is indeed very exciting. But every term that comes up introduces new classes, new upkeep, etc., and you have to have an answer for where the materials from the previous semesters' courses are going to land. There has to be infrastructure support for that, and having a service in the libraries providing long term persistent storage and access to do just that is an awfully good answer, if the tools, policies, and budget are in place to do that.
These examples were much better articulated by several of the excellent speakers at yesterday's launch event (sorry, couldn't find a link), and are increasingly recognized as very common and very troubling scenarios across academia. Once you think about what the technological requirements of providing that infrastructure are, it quickly becomes clear that such initiatives require solid, reliable software with lucid, maintainable designs, and no magic. After all, you could do it with just a filesystem, right? :) To get the services delivered properly, and in a way scholars can trust, however, you have to focus on developing policy, procuring budget, and delivering on a mission-driven focus of getting the service right and keeping it running. In other words, what you don't see behind the systems is the amount of non-technical work behind getting these things going, and making them sustainable.
The focus of the multi-institutional efforts is to expand, replicate, and formalize approaches to doing the same at many other large institutions where the impact can be equally significant. Seeing the level of public and private support of these efforts, and that there's a line in the sand now drawn with a software release marking a reliable starting point to answering the technical question, is quite exciting, and indeed is a breakthrough. If you really still think nothing new is being offered, and that DSpace isn't more than a stripped down sourceforge thinking like a card catalog, send me email and I'll direct your attention to a few folks at MIT and HP who will blow your mind with how well they've thought through and planned for these problems. :)
Re:simpler and more complex than you'd think on Open Source Library Card-Catalog Apps? · 2000-08-22 09:29 · Score: 1

Well, thanks, yeah, I work on this stuff a lot. :)
There are already a number of useful Free-as-in-Speech add-ons to proprietary library tools, such as Prospero and DBA. These are largely possible because they use the standardized bits in the tools to which they add value: z39.50, TIFF, etc. As you suggest, it's definitely a niche which is proven to work, with these small solutions paying off big-time to many institutions. This might seem off-topic, but the more of these small tools we have the easier it gets to start hooking them together into environments like the one the original question poster is asking about.
I think the answer to the "why's Z so slow?" dilemma is like what Larry Wall (I think) said about Perl and Python: that Perl's worse than Python because people wanted it worse. Work on z39.50 began _long_ before SQL92 hit the markets in working products.
A key area where many vendors and publishers are starting to work together around new open standards and even code is content linking, mostly because they have to. They don't necessarily want to, but _not_ allowing linking to external sources diminishes value, and they're all catching on finally. So there's some hope, but we've got to keep hacking to keep them honest. Trust me, it works -- when a long-proprietary-code/data-vending .com sees 600 lines of GPL'd perl which can kill off their product line, they're more than willing to start offering up more interoperability if not freeing up their code. It's better for everyone.
simpler and more complex than you'd think on Open Source Library Card-Catalog Apps? · 2000-08-22 04:45 · Score: 4

There are about three problems here (hopefully they won't moderate me down for this cuz I work for oss4lib.org :). The simpler bits have to do with the mindset of librarians: liberal about access, conservative about library collections. Since an online card catalog is about the collection, we librarian types tend to forestall any major systems overhauls until the last possible moment. And our systems vendors only have about a $500M business to sell to, so the general mindshare remains rivalrous, proprietary, dedicated to supporting legacy apps, and lacking overflow of hacker talent. Thus our systems generally suck and few are willing to admit it out loud.
Second is that half of the pieces that go into a big library management system (including the catalog part) are really generic business systems: EDI, invoicing, accounting, etc., but they haven't been abstracted out of the realm of our systems vendors. So the level of standards followed there is minimal, and those modules generally don't interoperate with our trading partners (i.e. internal payment systems and external suppliers). Lots of redundant keying and more crappy systems to maintain there, all of which is typically deeply and proprietarily tied into the catalog data.
All that said -- and to our vendors' credit they are tending to get better these days -- we've been sharing catalog data like hackers are sharing code for over 100 years. We've been doing it online for about 35 years, but the way we do it now is pretty much the same way we've been doing it for those 35 years, i.e. largely dependent on one of two .orgs/vendors to be a clearinghouse for sharing catalog data. But those folks disappear if they can't sell the data back to us after we create it for them. So nobody running a library wants them to disappear. Especially because we've got to handle one-of-a-kind rare items in big research libraries as well as unusual local items in public libraries and so on.
Imho the solution is to first outsource all the standard business stuff to vendors+free software that can do the same job with existing standards-based tools. Then abstract away as much as possible of the catalog data into free reference sources shared and maintained by the library community (think: you could run your own amazon.com recommendations site, etc.). This is what we're trying to do (shameless plug alert) with the jake project for journals. Same thing applies for books, although there are probably >=100M records to normalize.
If we can get that done, then anybody could hack up a gtk+ front end to the free, shared catalog, and pick and choose the items you have yourselves. It would work sorta like dict.org or jake. Just imagine how much easier it will be to search for ebooks in gnutella once this is done... :)
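To make the "shared catalog, pick and choose the items you have yourselves" idea a bit more concrete, here's a rough Python sketch. Everything in it is an assumption for illustration only: the class names, the record fields, the made-up identifiers and titles, and the choice of a normalized ISBN-style string as the matching key. It isn't how jake or any real library system works, and real normalization across >=100M records is far messier than stripping punctuation.

```python
from dataclasses import dataclass, field


def normalize_id(raw: str) -> str:
    """Drop hyphens and spaces and upper-case, so formatting variants compare equal."""
    return "".join(ch for ch in raw if ch.isalnum()).upper()


@dataclass
class SharedCatalog:
    """Hypothetical community-maintained reference source."""
    # normalized identifier -> bibliographic record (illustrative schema)
    records: dict = field(default_factory=dict)

    def contribute(self, identifier: str, title: str, author: str) -> str:
        key = normalize_id(identifier)
        # In this sketch the first contribution wins; duplicates are ignored.
        self.records.setdefault(key, {"title": title, "author": author})
        return key


@dataclass
class LocalLibrary:
    """A library that stores only which shared records it holds."""
    name: str
    holdings: set = field(default_factory=set)

    def claim(self, identifier: str) -> None:
        self.holdings.add(normalize_id(identifier))

    def search(self, catalog: SharedCatalog, term: str) -> list:
        # Search the shared records, restricted to what this library holds.
        term = term.lower()
        return sorted(
            rec["title"]
            for key, rec in catalog.records.items()
            if key in self.holdings and term in rec["title"].lower()
        )


# Made-up identifiers and titles, purely for demonstration.
catalog = SharedCatalog()
catalog.contribute("0-000-00001-x", "Example Perl Handbook", "A. Author")
catalog.contribute("000000001X", "Example Perl Handbook (dup)", "Author, A.")
catalog.contribute("0-000-00002-0", "Example C Primer", "B. Writer")

branch = LocalLibrary("Example Branch")
branch.claim("0 000 00001 X")

print(len(catalog.records))
print(branch.search(catalog, "perl"))
```

The point of the split is that the local library carries no bibliographic data of its own, just a set of keys into the shared source, so any front end could be hacked up on top of the same records.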