Domain: oclc.org
Stories and comments across the archive that link to oclc.org.
Comments · 62
-
Digital Archives
I worked on a digital archive project at a library research institute (OCLC). Digital archives are a royal pain. You first have to transfer the analog material to digital. Doable, but costly. Then you have to have a way of indexing it. And remember, we need an index scheme that can handle poetry, baseball cards, and music scores as well as gov't docs and books. Then you need to be able to store it. Finally, there is retrieval and display.
Now make it all last a zillion or two years. Any digital media we have today (tape, CD, etc.) might last 20 years if you are lucky. Even if you built a special-purpose computer to store it, the silicon chips themselves might last only 20 years before they break down. If you can find a medium that lasts, then you have to guarantee that the format will be readable. This requires that you archive the software that reads the format and perhaps the OS that the software runs on.
A digital library also loses a lot. If we archive the Domesday Book and lose the original, we have lost any opportunity to learn about the paper and ink technology of the original copy.
There is a branch of Library and Information Sciences that studies these problems. There have also been a couple of CACM issues devoted to some of this.
-
Where do I start? Where do I go?
The web fails as an information medium because people's most frequent questions when finding online information are "Where do I start?" and "Where do I go from here?"
As you might have learned once or more in any practical science or statistics class, Data is meaningless! For data to be useful it must be filtered and organized into information. The web has tons of data, but overall a very low percentage of it is usable information.
This is very evident in the overwhelming use of search portals -- Google, Yahoo, Northern Light, whatever. They put a very thin film of organization on web data. The sad fact is that the organization is hardly trustworthy.
Imagine all the books you've ever seen stacked in a warehouse. That's the web right now. The early idea was to organize web content using TLDs and meaningful URLs. You all know how successful that's been. The web needs meaningful, navigable data content organization.
Libraries have developed a marvelous content organization scheme. I can go to any library of any size, look in a master list for the topic call number, and then look at the numbers on the ends of the stacks and find any book in a matter of minutes. The only drawback is that a book may be in use, a drawback that the web will eliminate. That very same numbering system should be immediately applicable to any web content. It's all there, folks. We just need to find a way to adapt it to online content.
This isn't a new problem. The libraries have done it for nearly a century now. Here's the question: how do we implement it?
-
Re:Corporate America steps up to the plate
I would have to say that they do. Librarians are right behind IT professionals in making the information age work. In fact, a good number of librarians are IT professionals. Some of them believe in open source too. A great example is the Prospero document delivery system. It's a set of Perl scripts for Apache with a Windows front end done in C. And it saves thousands of dollars over the competition.
Librarians are much more than just little old ladies (and even many who actually are little old ladies have quite impressive computer skills). And they provide a valuable service to keep information--in all of its forms--freely accessible for the public to use now, and for years to come. They are most definitely on our side in this fight.
-
Porn's not growing, at least not in proportion
Guppy06 is wrong--porn's not growing any more than anything else online. According to the Online Computer Library Center, the share of sites on the net that are porn was the same in 1998 and 2000 (2.3% of web sites), and dropped noticeably in 1999 (1.9%).
I suspect porn probably peaked as share-of-sites five or six years ago, when the net was mostly used by horny geekboydom, and not many other people.
The number of horny geekboys on the net has probably stayed close to the same, but women are now the majority of US Internet users, and the average age of net users is up too. (Can't find that stat.)
Enon
-
Re:There may be an innocent reason
I actually think this MS activity is fairly innocent, but your reasoning here (that it's something like PURLs) is all wrong. That facility is provided by a decent implementation of HTTP, to wit dealing with 301 responses as per section 10.3.2 of RFC 2616:
10.3.2 301 Moved Permanently
The requested resource has been assigned a new permanent URI and any future references to this resource SHOULD use one of the returned URIs. Clients with link editing capabilities ought to automatically re-link references to the Request-URI to one or more of the new references returned by the server, where possible.
(my emphasis). If MS had set things up so that the URLs were like http://www.ms.com/redir?news_service_1 , and were switching between providers, then yes, I would agree with you that this was a valid argument, but that's not what they're at.
As long as they don't mess with *my* bookmarks I don't mind; I've never yet felt the need to use the ones supplied by the browser vendors.
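For what it's worth, the "link editing" behaviour in the quoted section can be sketched in a few lines. This is only an illustration, assuming bookmarks live in a plain dict; the `relink` function and the example URLs are made up here and are not any browser's actual API.

```python
from typing import Dict, Optional


def relink(bookmarks: Dict[str, str], request_uri: str,
           status: int, location: Optional[str]) -> bool:
    """Re-point bookmarks at `location` when the server answered 301.

    Only a permanent redirect (301) triggers a rewrite; any other
    status leaves the bookmarks untouched. Returns True if at least
    one bookmark was changed.
    """
    if status != 301 or not location:
        return False
    changed = False
    for name, uri in bookmarks.items():
        if uri == request_uri:
            bookmarks[name] = location  # replace the stale reference
            changed = True
    return changed


# Hypothetical usage: the server says the news page moved for good.
bookmarks = {"news": "http://old.example/news", "mail": "http://mail.example/"}
relink(bookmarks, "http://old.example/news", 301, "http://new.example/news")
```

The point of keying on 301 only is that a temporary redirect says nothing about where the resource will live tomorrow, so the stored link should stay as it is.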
-
There is oclc.org
This is a not-for-profit which works with libraries. I am not sure what they offer; I only know of them through my brother, who works there. Might be worth looking into at www.oclc.org.
-
Re:worthwhile? yes
Well, people are already working on the XML spec. (More than one, I think.) There is, I believe, a MARC XML spec based on the normal MARC format, but that's not the most user friendly format around. (It's designed to be converted back and forth to MARC.) I also think Dublin Core can be expressed in XML, but I'm not sure.
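On the Dublin Core point, here is a rough sketch of what a record could look like when written out as XML. The namespace URI is the one published for the Dublin Core element set; the `record` wrapper element, the `dublin_core_record` function, and the sample values are assumptions for illustration, not part of any spec.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"


def dublin_core_record(fields: dict) -> str:
    """Serialize a mapping of Dublin Core element names to values as XML.

    The enclosing <record> element is a hypothetical wrapper; real
    applications usually embed the dc: elements in their own container.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical usage with three of the fifteen core elements.
xml_text = dublin_core_record({
    "title": "Domesday Book",
    "type": "Text",
    "language": "la",
})
print(xml_text)
```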
-
Re:Can't be illegal....
Most libraries use WorldCat, also known as the OCLC union catalog. Unfortunately, it's not available for free to the general public. However, you don't say what kind of school you're at--if you're at a college or university, there's a very good chance that they have WorldCat, and I believe most colleges/universities have licensed it for anyone in the college/university to use at no additional cost. Check with a librarian.
-
Good intentions but reinventing the wheel?
Yet another archiving solution? Ye-Gads!
Disclaimer: I have an MLIS and I used to work for an organization affiliated with OCLC. I now work for a wholly owned subsidiary of Reed-Elsevier, who btw is not participating in this project that I am about to write about.
I completely understand the need to archive data/research, especially those found in STM journals (Science/Technology/Medicine). History has shown us the dangers of not being diligent in archiving AND it has shown us the difficulty with archiving. There already exists an organization in the library community that is providing an excellent archiving solution. That organization is OCLC. They have been a repository since 1967, starting out with archiving cataloging records and sharing them (for the cost to preserve/maintain them) with their membership.
OCLC's archiving solution is called ECO, Electronic Collections Online, where a good number of publishers from around the world are supplying OCLC with digital copies of their journals to be maintained. Additionally, as technology and storage media change, OCLC has taken the leadership in migrating that data to new standard formats as they evolve. Information on ECO may be found here and specifically information on the archiving is here and the participating publishers are here.
Of course everything has a cost. Any university that is taking on this type of activity should really do a serious study on why they are doing it, how much are they willing to spend, will they or future administrations continue funding their archiving project, or should they combine resources with agencies/organizations that are already doing this.
-
How about Universal Decimal Classification?
I've been wondering why none of the library classification systems have emerged on the net. Back in the good old days, when I relied on the library for information, the Universal Decimal Classification system was extremely handy. Even if you didn't know the name of the book, you could browse through a certain category that interested you.
The idea is that a book can belong to a single class that is marked by a decimal schema. Top categories are:
0 Generalities. Information. Organization.
1 Philosophy. Psychology.
2 Religion. Theology.
3 Social Sciences. Economics. Law. Government. Education.
4 (vacant)
5 Mathematics and Natural Sciences.
6 Applied Sciences. Technology. Medicine.
7 The Arts. Recreation. Entertainment. Sport.
8 Language. Linguistics. Literature.
9 Geography. Biography. History.
The main categories are defined further down:
61 Medical Sciences. Health.
62 Engineering and Technology Generally.
63 Agriculture, Forestry, Stockbreeding, Fisheries.
64 Domestic Science; Household Economics.
.....and further and further:
631 AGRICULTURE
631.1 Farm Management
631.15 Planning
The classification would be used like the KEYWORDS meta tag in HTML, and search engines would index it. This would enable users to specify a word as well as the topic they are looking for information on.
To prevent misuse of the classification, only one or two classes should be allowed per page. Like
"Marketing of agricultural products" -> 380.13:631
(38 = Trade. Commerce. Communication. Transport.)
UDC is language independent and it has already been translated into numerous languages. Also, most libraries use some kind of numerical classification, so many people are familiar with the concept.
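To make the meta tag idea concrete, here is a rough sketch. It assumes a tag name of `udc` (no such standard tag exists as far as I know), treats the colon as the separator between combined classes as in the example above, and assumes the usual UDC habit of writing a point after every third digit. The function names are made up for illustration.

```python
from html import escape
from typing import List


def udc_ancestors(code: str) -> List[str]:
    """List a simple UDC number together with all of its broader classes.

    Each extra digit narrows the class, so the broader classes are the
    digit prefixes, written with a point after every third digit.
    """
    digits = code.replace(".", "")
    result = []
    for i in range(1, len(digits) + 1):
        head = digits[:i]
        groups = [head[j:j + 3] for j in range(0, len(head), 3)]
        result.append(".".join(groups))
    return result


def udc_meta_tag(classification: str, max_classes: int = 2) -> str:
    """Build the hypothetical <meta name="udc"> tag for a page.

    Rejects more than `max_classes` colon-separated classes, following
    the one-or-two-classes-per-page rule suggested above.
    """
    parts = classification.split(":")
    if len(parts) > max_classes:
        raise ValueError(f"at most {max_classes} classes allowed per page")
    return f'<meta name="udc" content="{escape(classification, quote=True)}">'


# Hypothetical usage: tag a page and list what a search engine could index.
tag = udc_meta_tag("380.13:631")
broader = udc_ancestors("631.15")
```

A search engine that indexed the broader classes as well could answer "everything under 63" without knowing the exact class a page author picked.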
To help page authors to classify their pages a special website could be created. It should contain at least
- Information about UDC and why it should be used
- Complete browsable UDC listing in various languages
- Easy to use "wizard" that guides you thru the classification and spits out the correct HTML-tag.
- UDC aware search engine
- Petition list for other search engines to enable UDC classification
How about it? Is it a good idea?
One major problem in the matter is that the UDC classification is copyrighted. I couldn't find more than a skeleton listing on the web! So the first step would be to negotiate a licence for it or for the competing Dewey Decimal Classification. I don't think it would be wise to start building our own scheme without negotiations, since both UDC and DDC are in extensive use. But if everything else fails, Gnu Decimal Classification to the rescue!
For more information about classification on the Internet, see: The role of classification schemes in Internet resource description and discovery
-
I never metadata I didn't like...
Anyone ever heard of metadata? Instead of indexing every word in a document we should be capturing accurate, relevant metadata about it and facilitating the searching of that. As non-text content increases (like mp3s, videos, images, audio streams, animations, etc.) on the internet, the need for a new search paradigm increases as well. Of course there's the Dublin Core, but much more interesting to me is the IEEE LTSC's work. Their metadata standard, currently at version 3.8, is very close to being finalized. In addition to providing general fields, it also includes some that supposedly facilitate the instructional use of the object of the metadata.
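A toy sketch of what searching the metadata instead of the content might look like. The `Record` class, the field names, and the sample entries are all invented for illustration; they are not taken from Dublin Core or from the LTSC draft.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Record:
    uri: str
    metadata: Dict[str, str]  # descriptive fields, not the content itself


def search(records: List[Record], **criteria: str) -> List[str]:
    """Return URIs whose metadata matches every criterion.

    A criterion matches when its value occurs, ignoring case, inside
    the metadata field of the same name. Records missing the field
    never match, so an mp3 or a video is found purely by description.
    """
    hits = []
    for record in records:
        if all(value.lower() in record.metadata.get(key, "").lower()
               for key, value in criteria.items()):
            hits.append(record.uri)
    return hits


# Hypothetical catalogue mixing text and non-text objects.
catalogue = [
    Record("http://example.org/a.mp3", {"type": "Sound", "subject": "Baroque music"}),
    Record("http://example.org/b.mpg", {"type": "MovingImage", "subject": "Farm management"}),
    Record("http://example.org/c.html", {"type": "Text", "subject": "Baroque architecture"}),
]
sound_hits = search(catalogue, type="sound", subject="baroque")
```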