Domain: oclc.org
Stories and comments across the archive that link to oclc.org.
Comments · 62
-
Probably NAT or proxy related
This is most likely proxy-related.
Google's human-detection and anti-spam efforts are IP-based, and unless you're authenticated against Google there's a very high chance your entire institution is being seen as a single entity. This is usually caused by campus-level NAT.
There is a variant which is the result of a well-intentioned librarian putting google scholar behind EZproxy ( https://www.oclc.org/support/s... ).
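A toy illustration of why NAT causes this. The counter below is keyed only on the source IP and uses a made-up threshold (`MAX_REQUESTS_PER_WINDOW` is hypothetical; Google does not publish its real detection logic, which is certainly more involved). It is a sketch under those assumptions, not a model of any real service.

```python
from collections import defaultdict

# Hypothetical threshold; real services do not publish their limits.
MAX_REQUESTS_PER_WINDOW = 100


class PerIpCounter:
    """Toy per-source-IP request counter for a single time window (window reset omitted)."""

    def __init__(self, limit=MAX_REQUESTS_PER_WINDOW):
        self.limit = limit
        self.counts = defaultdict(int)

    def allow(self, source_ip: str) -> bool:
        # The server only sees the source address; every user behind a NAT shares it.
        self.counts[source_ip] += 1
        return self.counts[source_ip] <= self.limit


# 150 people each run one search, but all leave campus through one NAT address.
counter = PerIpCounter()
campus_nat_ip = "203.0.113.7"  # documentation-range address (RFC 5737)
results = [counter.allow(campus_nat_ip) for _ in range(150)]
print(results.count(False))  # → 50 searches challenged, though no single user searched twice
```

Because nobody behind the NAT is distinguishable by address, the only way out is something that identifies the individual user, such as the Google authentication mentioned above.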
-
The Basics
I'm an archivist at a mid-sized university archives, trying to develop a policy for archiving computer files ('born-digital records' in archival parlance).
Get Your Bits Off (Old Storage Media)
-
Re:Voters Filter Library Funding
Voters seldom get to vote on budgets in most cities, and certainly not at the line-item level. Bond issues would certainly come under closer scrutiny, especially in this economy. Federal funding of libraries has been under close scrutiny over the last year.
On the other side, a Gates Foundation study found that people who find libraries "transformational" rather than simply "informational" are more likely to vote for more funding. (What "transformational" means is unclear.)
Still, there is a great layer of insulation between the library board and the taxpayer.
It may take years, but funding support will wane when local taxpayers figure out they are funding something other than a quiet place to read where they feel comfortable sending their kids. I suspect it would not take much, perhaps only a row of computers filled by unwashed geezers in sweatpants with only one hand on the keyboard, before the library board chooses to sequester the material to specific parts of the library, if merely to preserve the library itself.
-
Re:Now there are two gaps ..
Let's say 0.1% of the population, if properly educated, will help to advance mankind. It may be understanding a disease or how a virus mutates that leads to better treatments. It may be the engineering (math, physics, chemistry) of the next addition to our infrastructure (roads, water, electricity, phone, internet, etc.).
Personally, I think teaching millions of children who would rather be ignorant is a fair price to pay for the thousands who advance our civilization. Correlation may not equal causation, but a quick look at per capita education spending does not support the view that a wide swath of knowledge has a negligible effect on your life.
-
Feature or bug?
This used to be considered something that was potentially a Good Thing. To help prevent link rot and redesigns from breaking links, people thinking a lot about hypertext came up with initiatives like PURLs: http://purl.oclc.org/docs/
Now that the primary use of these redirects is simply to shorten links into something more convenient, we're using the same tech (a 301) in different ways. One question is, how many people use the "custom link name" feature of tinyurl.com vs. simply letting a random string of text be used? And will a service start letting us update link destinations after the fact (like the original PURL site did)? If so, how do you prevent nefarious uses of this (like repointing a link at goatse after it has spread as a meme)?
In terms of the filtration-for-tracking-purposes? That horse has left the barn already; I'm more concerned with final destinations not being recorded over time for posterity. These redirect services are totally interchangeable anyway... as soon as one starts using interstitials, people will move to another one.
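A minimal sketch of the PURL idea discussed above: a stable name resolves to a target that can be updated later, and every destination ever assigned is kept, which addresses the "recorded over time for posterity" concern. `RedirectTable`, its methods, and the example URLs are hypothetical and are not OCLC's PURL software; only the 301/302 meanings come from HTTP itself.

```python
from typing import Dict, List, Optional, Tuple


class RedirectTable:
    """Toy PURL-style resolver: a stable name points at a target that can be changed later."""

    def __init__(self) -> None:
        self._targets: Dict[str, str] = {}
        self._history: Dict[str, List[str]] = {}

    def set_target(self, name: str, url: str) -> None:
        # Record every destination ever assigned, so the link's past survives for posterity.
        self._targets[name] = url
        self._history.setdefault(name, []).append(url)

    def resolve(self, name: str, permanent: bool = False) -> Tuple[int, Optional[str]]:
        # 302: "fetch it there for now, keep using this name"; 301: "use the new URL from now on".
        if name not in self._targets:
            return 404, None
        return (301 if permanent else 302), self._targets[name]

    def history(self, name: str) -> List[str]:
        return list(self._history.get(name, []))


table = RedirectTable()
table.set_target("/net/report", "http://old.example.org/report.html")
table.set_target("/net/report", "http://new.example.org/report.html")
print(table.resolve("/net/report"))  # → (302, 'http://new.example.org/report.html')
print(table.history("/net/report"))
```

Keeping the history does not stop someone from repointing a link at something nasty, but it does make the change visible after the fact.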
-
Re:This is hardly anything new
...our underfunded schools in the US can't afford it...
Nice myth... but the US spends more on education than anyone else... and has much less to show for it. Don't tell me the problem with the US educational system has anything to do with the money spent.
-
Re:They can claim....
Theoretically, you are correct except for the fact that they compiled that "standard library card catalog" (hereafter referred to as Dewey Decimal Classification or DDC) system initially, making the numbering the original work. http://www.oclc.org/dewey/
Again, they can claim anything. That does not mean that they are right.
Since they originally compiled the books and created the numbering system
I don't think it is accurate to say that "they" "created" the numbering system; the DDC was created in 1876. At issue would be whether or not adding the DDC constitutes an original work.
I would still say that it is mere aggregation and not a "new" creative work.
-
Re:They can claim....
You said:
"A database of books based on the standard library card catalog is not something whose collection would be protected by copyright."
Theoretically, you are correct except for the fact that they compiled that "standard library card catalog" (hereafter referred to as Dewey Decimal Classification or DDC) system initially, making the numbering the original work. http://www.oclc.org/dewey/
Since they originally compiled the books and created the numbering system (thereby creating a unique database which is no longer simply facts, known as the DDC), your post amounts to a whole lot of "whoosh" followed by an astounding amount of nothingness.
If anyone is to blame for this, it is the libraries who license that database instead of something a little more free, in all respects.
-
Might be able to read paper here...
Obnoxiously, this alleged scholarly research is not available for free, so we'll just have to speculate wildly about what it says based on the abstract.
I do think it's important to understand how people respond to something as common in today's lives as computer games. It's important to someone; other research might not be important to you, but science isn't always about finding a practical application for everything right away. Science sometimes just asks "why?" or "what if?".
As for being freely accessible, the research wasn't funded by NIH, so NIH's rules requiring research funded by US taxpayers to be released under Open Access don't apply. The paper states that "This study was supported by the Finnish Funding Agency for Technology and Innovation and European Community NEST project 28765: "The Fun of Gaming: Measuring the Human Experience of Media Enjoyment.""
Fortunately, the trend these days seems to be more toward open access than in the past, so have patience, young Padawan...
I can get to the article, because I work at a university. This link may or may not work for you:
FirstSearch: Full Text
-
This sounds great, but it's not.
It's actually corporate interests doing a land grab on state-owned resources.
Quite some time ago, the state of Ohio began building a new, high-speed internetwork that was paid for by taxpayers. This network was supposed to be available only to research and nonprofit institutions like universities, non-profit hospitals, and so forth. This network had strict access standards, and getting your organization connected (unless you were someplace like Ohio State University) wasn't easy to do. Even companies like OCLC were not permitted to connect to the network. Commercial use of the network was strictly prohibited by charter. It was a good thing for encouraging research and collaboration between research institutions in the state of Ohio.
Not too long ago a few entrepreneurial types decided that if they could just tap into that high-speed network, they could circumvent the telcos and resell access to that network as a broadband data network. Except that doing so would be against the charter, and basically equate to corporate welfare. But they weren't discouraged, because the current governor was on his way out of office, and they spent lots of money on lobbyists who wound up taking roles as technology advisors to the campaigns for both of the major candidates for governor.
I know this because the for-profit hospital that I was employed by at the time was actually approached by this new company about buying access to this high-speed network. At the time we asked them how they planned to pull it off, because we knew that they couldn't legally resell this network access, even if they could get it. Their response was "the next governor will be receptive to our business ideas and change the rules." Since the election hadn't happened yet, we asked them if they knew something about the voting machines that we didn't, and their response was that they had basically convinced both of the two major candidates to see things their way. We were not impressed, not just because we thought that the whole deal was morally questionable but also because the people who approached our company about it came across as extremely sleazy. After meeting with us once about it (which got a very tepid response), they began using our hospital's name in marketing materials for the community that we were located in as if we had already signed on to the project (presumably to convince other businesses that it was a good idea).
So now it's finally happened. We have a new governor, and he's OK'd these new companies to take the high-speed research network away from the institutions that we, the taxpayers, built it for and hand it to businesses that just want to make a fast buck off of it. On one hand, I'm appalled that a state-funded, maintained, and sponsored resource could be co-opted by corporate interests and taken from its intended purpose. On the other hand, I know that our AT&T sales rep was very concerned about this effort, and usually anything that pisses in AT&T's coffee is a good thing. So do I oppose it because it's morally wrong, or do I support it because it could hurt AT&T?
-
The US spends more on education than any country
Same for health care.
http://ucatlas.ucsc.edu/spend.php
http://www.cmwf.org/publications/publications_show.htm?doc_id=372221
http://www.nationmaster.com/graph/hea_spe_per_per-health-spending-per-person
http://thebluesite.com/ustopseducationspend.htm
http://www.oclc.org/reports/escan/economic/educationlibraryspending.htm
http://www.nationmaster.com/graph/edu_tot_exp_as_of_gdp-education-total-expenditure-gdp
And yet, both are getting worse. MAYBE spending more isn't the answer...
By the way, I love the Anti-US troll. I can't get enough of the pandering.
-
Re:Google's got a long way to go . . .
Most libraries' collections are very similar to most other libraries' collections, and the greatest overlap occurs with the books that are the most important.
Because the original Google 5 libraries have their holdings entered into WorldCat, a statistical study was done that showed that those five libraries would account for 33% of the 32 million books in that database. It also showed that 61% of the books held by the Google 5 are held by only one library. Essentially, the holdings of libraries follow a common pattern: a short, high peak followed by a very long tail. If, even with their long tails, these 5 major libraries account for only 1/3 of the books that libraries have entered into WorldCat, imagine how many libraries it will take to find and digitize the long tail of that one bibliographic database.
Less ephemeral works (the kind typically preserved in library collections a century later) generally all had their copyrights renewed in the U.S.
The rate of copyright renewal was very low. According to Lessig ("Free Culture" p. 135) "In 1973, more than 85 percent of copyright owners failed to renew their copyright." I've seen estimates that about 90% of the books published between 1923 and 1978, when renewal was abolished, were never renewed. That means that there are MANY public domain books in that time frame, only we can't easily know which ones they are. You can look them up in the renewal database, but my impression is that the database is not considered to be complete, and therefore not entirely reliable. If you find the book in the database, it was renewed. If not...
-
Worldcat; one sly fox?
While it seems like a neat concept on the surface, I'm not sure I'm a fan.
The link you provided was dead for me though, although this worked:
http://www.oclc.org/worldcat/
The whole thing looks rather suspiciously proprietary; in order to get access and be able to search directly, you have to pay, or be a member of a library that does. Basically what they're doing is getting libraries to contribute their electronic catalogs to the database, and then selling access to the resulting data BACK to the libraries that contributed! Not a bad business model: all they have to do is maintain the hardware and database, and watch the information and cash flow in. With every contribution, what they have becomes more valuable.
From http://www.oclc.org/worldcatsets/about/cooperative/default.htm:
"While OCLC catalogers create some Collection Sets, most are built by OCLC member libraries, which have purchased a predetermined content set from a publisher and cataloged the set using OCLC cataloging tools in order to make it available to you.
... If you are an OCLC member institution, you can contribute a Collection Set of records your staff has cataloged. OCLC will set up a special authorization to allow you to input the records into WorldCat at no charge."
Putting data IN to their system is free (naturally); getting anything useful out doesn't seem to be quite so easy, or cheap.
At least not directly. It seems that they have partnered with some web sites in a program called Open WorldCat to share their content, including with Google Scholar and Google Books, but there's apparently no direct public access. The closest I could get was by searching Google Scholar for a term, looking for the [BOOK] results, then clicking on the "Library Search" link, which took me to an Open WorldCat page.
The link to the Open WorldCat page doesn't use a human-readable link, either; it looks like a hash of some sort. For example, the Library link from the Google result for P.L. George's "Automatic mesh generation: application to finite element methods" is this:
http://www.worldcatlibraries.org/wcpa/ow/f9f4fc530c1c64e2a19afeb4da09e526.html
Maybe someone can figure out what hash they're using and provide a way to search them directly; just in case anyone was wondering, doing a Google search for "mesh generation" site:worldcatlibraries.org doesn't return anything.
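One observation that may help anyone who tries: the identifier is 32 hex digits, i.e. 128 bits, which is the size of an MD5 digest. That is only consistent with MD5, not proof of it; any random or sequential 128-bit ID would look the same, and what (if anything) was hashed is unknown. The sketch below extracts the ID and tests a guessed input string, assuming MD5 over UTF-8 text; `extract_record_id` and `matches_md5` are hypothetical helper names, and the worldcatlibraries.org link format is taken from the example above.

```python
import hashlib
import re
from urllib.parse import urlparse


def extract_record_id(url: str) -> str:
    """Pull the hex identifier out of a /wcpa/ow/<id>.html link."""
    path = urlparse(url).path
    match = re.fullmatch(r"/wcpa/ow/([0-9a-f]+)\.html", path)
    if match is None:
        raise ValueError(f"unexpected link format: {url}")
    return match.group(1)


def matches_md5(candidate: str, record_id: str) -> bool:
    """Test one guessed input string against the identifier, assuming MD5 of UTF-8 text."""
    return hashlib.md5(candidate.encode("utf-8")).hexdigest() == record_id


link = "http://www.worldcatlibraries.org/wcpa/ow/f9f4fc530c1c64e2a19afeb4da09e526.html"
record_id = extract_record_id(link)
print(len(record_id) * 4)  # → 128 bits, the same width as an MD5 digest
```

Candidate inputs might be an OCLC record number or an ISBN in various formats; without knowing the input, there is no way to go from a search term to the hashed URL.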
I like their concept in terms of unifying all the library records, but I really am uncomfortable and frankly put off by their obvious and shameless attempts to monetize what ought to be a public resource. I'm glad it's at least searchable through Google, but their web site makes it clear that they'd much prefer you pony up some great and unspoken (of course there's no price listed, so we can only guess) wad of cash to get at their database.
I suppose that their partnership with the likes of Google and Amazon is a step above totally proprietary databases that are 100% pay-to-play, but I still find it hard to accept that a database built up almost entirely from contributions by tax-supported public libraries doesn't have a globally accessible direct interface to let people search it. Plus, it's not clear that the information you can search via Google is even their whole catalog: "Open WorldCat returns only the holdings of OCLC member libraries that subscribe to the WorldCat database on FirstSearch." Presumably, the database that you pay for is more complete, a
-
It's called OCLC
... and it's been around for the last 30 years or so. And it has something like 60 million records, created by professional librarians. Done already.
-
don't forget library resources
I'm a big fan of Amazon.com when looking for book information, but I'd also like to point out that public libraries often pay for access to book databases for their patrons, many of which can be accessed from home.
My library subscribes to Novelist and Novelist K-8, which can be awesome when looking for fiction.
Many libraries also pay for patron access to the Books in Print database.
Finally, if you're determined enough, you can find some interesting things in WorldCat, the union catalog of OCLC libraries. This is now searchable from Google and other places.
-
Worldcat, of course.
Worldcat. http://www.oclc.org/worldcat/default.htm/ . 65 million items. No more and no less than a unified catalog of major libraries, in the US and beyond, unified on the basis of sharing open-format records (MARC) that obey clear standards of bibliographic description and classification, developed and proven through many decades: AACR2, LCSH, etc. Cataloguers have gone through the pain of researching who is who, what is what and where is where. And not just books, but serial publications, maps, sound recordings, pictures, computer files, and those weird things called realia. I love library catalogs and the cataloguers that make them. (Library reference zombies and library managers, that is another story.)
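For readers who have never seen one, here is a rough sketch of what a few MARC 21 bibliographic fields look like, using the mesh-generation book mentioned earlier. The field meanings are standard MARC 21 (100 = personal-name main entry, 245 = title statement, 650 = topical subject, second indicator 0 = LCSH), but this record is illustrative, not the actual WorldCat record; the leader and control fields are omitted, and `DataField` is a hypothetical helper, not a MARC library.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DataField:
    tag: str                 # three-digit field tag, e.g. "245" = title statement
    indicators: str          # two characters; " " means blank/undefined
    subfields: List[Tuple[str, str]] = field(default_factory=list)

    def display(self) -> str:
        # Show blanks as '#' and subfield codes as '$x', as MARC documentation does.
        ind = self.indicators.replace(" ", "#")
        body = "".join(f"${code}{value}" for code, value in self.subfields)
        return f"{self.tag} {ind} {body}"


record = [
    DataField("100", "1 ", [("a", "George, P. L.")]),        # 1 = surname first
    DataField("245", "10", [("a", "Automatic mesh generation :"),
                            ("b", "application to finite element methods")]),
    DataField("650", " 0", [("a", "Finite element method.")]),  # 0 = LCSH heading
]
for fld in record:
    print(fld.display())
```

The " :" before `$b` is the AACR2/ISBD punctuation convention, which is part of why records from different libraries line up so well.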
-
Re:The Dewey Decimal System
I couldn't believe that was really true, so I went & found the OCLC site on Dewey and amazingly, they do claim to own it. Copyright really is forever...even if it's FROM 1870!!! Sigh.
Found these two statements on that web page, which to my mind are contradictory, even with 100 year copyright terms (are they 200 years now??)
The Dewey Decimal Classification (DDC) system, devised by library pioneer Melvil Dewey in the 1870s
--
All copyright rights in the Dewey Decimal Classification system are owned by OCLC.
-
Re:LoC vs. Dewey Decimal
Public and many school libraries use Dewey because they put their holdings in an OCLC database called Worldcat, which then lets them request items from other libraries doing the same. You also have to pay OCLC a "licensing fee" for Dewey, which irritates me to an irrational degree. (They sued some hotel for using it without paying and won.)
Most other people use LoC because, well, it's what the LoC uses. :) There are only 2 major systems; I don't see much wrong with knowing at least a little about both (though the books that cross-reference them could definitely use an update).
-
Re:The Dewey Decimal System
Are you maybe thinking of the Library Hotel which got into trouble with OCLC, who owns the trademark / copyright to the Dewey Decimal System?
As far as I can tell the only cost you might run into in trying to categorize with the Dewey Decimal system is if you want to purchase one of OCLC's classification indexes.
But maybe you're thinking of a different instance in which OCLC required payment for use of the classification system for a small private collection. If that's the case, I'd be genuinely interested to hear more about it.
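As a sketch of how the numbering itself behaves when you do categorize a collection: Dewey numbers file as decimals, so the digits after the point compare as a fraction, and the hundreds digit selects one of ten main classes. The class names below are the ten main classes from OCLC's published DDC summaries; `main_class` and `shelf_order` are hypothetical helpers, and real call numbers also carry Cutter numbers and other elements that this ignores.

```python
from decimal import Decimal
from typing import List

# The ten main classes from OCLC's published DDC summaries (index = hundreds digit).
MAIN_CLASSES = [
    "Computer science, information & general works",  # 000
    "Philosophy & psychology",                          # 100
    "Religion",                                         # 200
    "Social sciences",                                  # 300
    "Language",                                         # 400
    "Science",                                          # 500
    "Technology",                                       # 600
    "Arts & recreation",                                # 700
    "Literature",                                       # 800
    "History & geography",                              # 900
]


def main_class(number: str) -> str:
    # The hundreds digit picks the main class: "823.912" -> 8 -> Literature.
    return MAIN_CLASSES[int(Decimal(number)) // 100]


def shelf_order(numbers: List[str]) -> List[str]:
    # Digits after the point compare as a decimal fraction: 005.133 files before 005.2,
    # even though 133 > 2 as whole numbers.
    return sorted(numbers, key=Decimal)


print(shelf_order(["823.912", "005.2", "020", "005.133"]))
# → ['005.133', '005.2', '020', '823.912']
print(main_class("005.133"))  # → Computer science, information & general works
```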
-
who uses the internet the most
> Do you have any supporting stats?
Not exactly what you're asking for, but:
http://www.oclc.org/research/projects/archive/wcp/stats/intnl.htm
Also the majority of the top web sites are in the US.
As for who uses the net the most, a few years ago it was about evenly split between ~50% US and ~50% rest of the world. But I'm sure it has changed since then as the net continues to grow in the rest of the world. Obviously the US share will fall over time, since it started out with 100% of the people on the net in the USA. (That wasn't so long ago; I remember it well.)
-
Re:Isn't it obvious...
OCLC's breakout of webservers per country as of 2002
I'd love to see a more recent compilation like this, but if true, then the US is increasing its share of websites, while the EU states are losing theirs. If that trend continues, the EU will effectively lock itself out of the majority of the Internet if this does occur and subsequently fails.
-
Re:MP3 devices will sort music by the Dewey Decima
The Dewey Decimal system is copyrighted and trademarked by OCLC, and they have been known to threaten people with legal action. I kid you not...
-
Re:Python will kill Ruby
I spent the past couple of weekends working on a message board in Rails. I don't know about the "ten times faster development" claims... but I do feel like I'm getting around three times as much done versus working in PHP, and I already knew PHP but only started to pick up Ruby a couple of weeks ago. When they say this framework "fits your brain", they really mean it.
The Rails folks are very good at marketing -- but they surely haven't forgotten to put a solid product behind that buzz.
As for Ruby losing to Python? Well...
At work, we're in the middle of re-implementing OCLC's PURL redirection server (which is a tasty casserole of Perl, C, and god only knows what else). With the goal of demonstrating that we don't need our own private copy of Apache (as OCLC uses), a pile of ReWrite rules, and an army of Perl scripts to work with its Berkeley DB backend, I threw together a quick demo using Ruby's WEBrick servlet and connected it to PostgreSQL. Thankfully, I was able to persuade the decision-makers that a scripting language and an RDBMS are a reasonable solution to our problem... but their attitude toward Ruby was similar to yours. "I dunno, I haven't heard much about it, let's use something else."
We settled on Python, which, of course, has its own SimpleHTTPServer which fills roughly the same niche as WEBrick. But it's slower, it dies if you throw too many concurrent connections at it, and its built-in methods are far cruder than those of its Ruby counterpart. I'm going to have to write a lot more code to pull it off in Python.
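For comparison, a minimal sketch of a PURL-style redirect server in modern Python, where the old SimpleHTTPServer/BaseHTTPServer modules live on as `http.server`. `ThreadingHTTPServer` (Python 3.7+) handles each connection in its own thread, which addresses the concurrency complaint for a demo, though it is still not a production server. An in-memory `sqlite3` table stands in for the PostgreSQL backend; the table layout and the names `open_db`, `PurlHandler`, and `make_server` are assumptions, not OCLC's design.

```python
import sqlite3
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def open_db() -> sqlite3.Connection:
    # In-memory sqlite3 stands in for the PostgreSQL table of PURL -> target mappings.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE purl (path TEXT PRIMARY KEY, target TEXT NOT NULL)")
    return conn


class PurlHandler(BaseHTTPRequestHandler):
    db = None                # assigned by make_server()
    lock = threading.Lock()  # one sqlite3 connection is shared by all handler threads

    def do_GET(self) -> None:
        with self.lock:
            row = self.db.execute(
                "SELECT target FROM purl WHERE path = ?", (self.path,)
            ).fetchone()
        if row is None:
            self.send_error(404, "No such PURL")
            return
        # Temporary redirect: clients should keep citing the PURL, not the target.
        self.send_response(302)
        self.send_header("Location", row[0])
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args) -> None:
        pass  # silence the default per-request logging


def make_server(db: sqlite3.Connection, port: int = 0) -> ThreadingHTTPServer:
    # Port 0 lets the OS pick a free port; pass e.g. 8080 for a fixed one.
    PurlHandler.db = db
    return ThreadingHTTPServer(("127.0.0.1", port), PurlHandler)
```

To try it, insert a row such as `("/net/example", "http://www.example.com/page.html")` into `purl` and call `make_server(db, 8080).serve_forever()`. Note that `self.path` includes any query string, so a real resolver would want to parse it first.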
Obviously this is an anecdotal example... but I just keep coming across things in Ruby that simply make more sense, and just work better than they do in other systems. After a couple of weeks, I'm certainly sold -- even though $PREFERRED_LANGUAGE will keep paying the bills, Ruby is a great tool to have at my disposal.
-
Re-re-explained
Okay, so basically this is the problem: when Google encounters a status 302 redirection (as opposed to the status 301 redirection) it then indexes the content as belonging to the initial URL, not the URL at the end result of the 302 redirection. Other things happen later because of google's design.
302 redirections are temporary redirections: the idea is that a 302 is supposed to be used when someone needs to be redirected to a new page, but should still use the original URL if they want to come back later. As an example, the page http://purl.oclc.org/OCLC/PURL/CONTRIBUTORS performs a 302 redirect to http://purl.oclc.org/docs/contributors.html. This means that although your web browser needs to go to some other URL for the content at the moment, it really should remember the first URL as the permanent one.
Contrast this with what happens when your browser visits http://snowplow.org/martin - you get sent a 301 redirect to http://snowplow.org/martin/. (Note the extra slash) In this case, the server is saying "the url with the slash on the end is the real location, and you should not try to come back here without the final slash in the future."
Ideally, if every web browser behaved according to spec., bookmarks (remember bookmarks?) would get automatically updated to the new URL when you selected them and the redirect was a 301 redirect. However, for a 302 redirect, the bookmark would stay as is.
302 redirects can be very useful when you want to set up a hierarchy of "logical" URLs that will permanently point to the correct location. 301 redirects are useful when you're obsoleting an old URL and wish people to go and use the new URL from now on.
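The bookmark rule just described fits in a few lines. This is a sketch of what a spec-following browser would do, not how any particular browser behaves; `bookmark_after_redirect` is a hypothetical helper. It also includes 307 and 308, which were added to HTTP later with the same temporary/permanent split but forbid changing the request method.

```python
from typing import Optional

# 301 and 308 are permanent; 302 and 307 are temporary.
PERMANENT_REDIRECTS = {301, 308}


def bookmark_after_redirect(bookmark: str, status: int, location: Optional[str]) -> str:
    """Return the URL a spec-following browser would keep in the bookmark."""
    if status in PERMANENT_REDIRECTS and location:
        return location  # "use the new URL from now on"
    return bookmark      # temporary (or not a redirect): keep the original


print(bookmark_after_redirect("http://snowplow.org/martin", 301, "http://snowplow.org/martin/"))
# → http://snowplow.org/martin/
print(bookmark_after_redirect("http://purl.oclc.org/OCLC/PURL/CONTRIBUTORS", 302,
                              "http://purl.oclc.org/docs/contributors.html"))
# → http://purl.oclc.org/OCLC/PURL/CONTRIBUTORS
```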
Okay, so how does this relate to Google? Well, let's suppose that you have a great site on fruitbats. I can set up http://www.example.com/topics/fruitbats to be a 302-style redirect to your site, essentially saying "The information at http://www.example.com/topics/fruitbats is temporarily being hosted by http://www.yoursite.com/". Now, when Google spiders pages, it will see that redirect, retrieve the text from your page, and index it under http://www.example.com/topics/fruitbats, since after all I just gave a temporary (302) redirect.
But it gets worse, because a final part of google's indexing process is to compare pages for identical text, and throw out all but one of the URLs. Apparently this stage has nothing to go on other than the text and the recorded URLs, and so your URL stands a fifty-fifty chance of being thrown out.
Except that I've not just redirected http://www.example.com/topics/fruitbats to your site, but also http://www.example.com/topics/fruitbat, http://www.example.com/topics/fruit_bat, and http://www.example.com/topics/fruit_bats. Now your lone URL doesn't stand much of a chance of being the one kept by the "throw out duplicates" processor, does it?
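The odds argument can be made concrete. The sketch assumes, as the "fifty-fifty" remark implies, that the duplicate filter sees only identical text plus URLs and keeps one URL per group uniformly at random; Google's actual rule is not public, so `dedupe` is a hypothetical stand-in. With one redirect the original survives half the time; with the four redirects above it survives one time in five.

```python
import hashlib
import random
from collections import defaultdict
from fractions import Fraction
from typing import Dict, List


def dedupe(pages: Dict[str, str], rng: random.Random) -> List[str]:
    """Group URLs whose text is identical and keep one URL per group, chosen at random."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for url, text in pages.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    return [rng.choice(urls) for urls in groups.values()]


def survival_chance(num_redirect_urls: int) -> Fraction:
    # The original is one of (redirects + 1) indistinguishable copies.
    return Fraction(1, num_redirect_urls + 1)


original = "http://www.yoursite.com/"
slugs = ["fruitbats", "fruitbat", "fruit_bat", "fruit_bats"]
text = "Everything about fruit bats..."  # hypothetical page body
pages = {original: text}
pages.update({f"http://www.example.com/topics/{s}": text for s in slugs})

kept = dedupe(pages, random.Random(0))
print(len(kept))                    # → 1
print(survival_chance(1))           # → 1/2, the single-redirect case
print(survival_chance(len(slugs)))  # → 1/5
```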
In a sense, of course, there's little Google can do to prevent this, because even if they weighted 302 redirects lower in their "throw out duplicates" stage, I could always just go snag a copy of your website each time googlebot visits, in essence doing the redirection myself. (How? Just search the Apache mod_rewrite guide for "Dynamic Mirror".) However, doing it through 302 redirects means that Google pays for the bandwidth to go get your page, not me. (Not that this is necessarily a significant amount of bandwidth, since we're only talking about basic Google here and not images. Depending on the revenue you get by misdirecting Google queries, it might be economical.)
Of course, for this to really work, I'd need a list of websites sorted by category to build up my redirect db. But wait! The ODP feed provides exactly that.
I am a little bit wary of doi -
Re:how lazy have we become?
They're probably busy updating the DDC
... it's not going away any time soon. The 22d edition came out last year.
DDC -
Passing off
And not the first time.
The same guy, John Guagliardo, World eBook Library, also runs NetLibrary.NET. There is a netLibrary.COM, owned by OCLC, not Guagliardo, which sells access to online books, including framed HTML versions of Project Gutenberg texts, to libraries.
The search at Project Gutenberg 2 takes you off-site to the same search used by NetLibrary.NET.
Do a search for: yet again
Compare ***The Project Gutenberg Etext of Yet Again, by Max Beerbohm***, at World eBook Library,
Yet Again, at World eBook Library, and
Yet Again, at Project Gutenberg. Basically World eBook Library strips out the Project Gutenberg license and slaps their own copyright on it. -
Re:Libraries, right.
> But seriously, who checks out books at libraries anymore?
um, quite a few people
U.S. libraries circulate 1,947,600,000 items a year
Each day, U.S. libraries circulate nearly 4 times as many items as Amazon
Five times more people visit U.S. public libraries each year than attend U.S. professional and college football, basketball, baseball and hockey games combined.
all from here (google cache) and here (original PDF)
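To put the annual figure on the same per-day footing as the Amazon comparison, here is a back-of-the-envelope check. It assumes a 365-day year and circulation spread evenly across it; only the annual total comes from the stats above.

```python
# Annual figure quoted above; a 365-day year is assumed.
ANNUAL_CIRCULATION = 1_947_600_000
DAYS_PER_YEAR = 365

daily = ANNUAL_CIRCULATION / DAYS_PER_YEAR
print(f"about {daily:,.0f} items per day")  # about 5,335,890 items per day
```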
And no, I'm not a library geek, I was just appalled at the naivety of your statement, and googled for those stats. -
Example case requires Dewey Decimal license fee!
The Dewey Decimal System is a highly protected trademark of Online Computer Library Center -- use it without paying a license fee, and they'll sue you (another story)
From their FAQ: May I use the DDC to organize information on my Web site?
The DDC is owned by OCLC Online Computer Library Center, Incorporated ("OCLC"). We do consider licensing arrangements for the DDC database. To request a licensing proposal, please send an e-mail message to DeweyLicensing@oclc.org, describing in detail your proposed use of the DDC. -
Re:Let 'em know.
Your mailto is wrong. It should be webinput@oclc.org.
-
Re:Bullshit
Says here it's for "other classifications." It looks like it's used for state documents mostly. I don't see any mention of forbidden books.
-
A little more info...
Here's what I was able to find on their web site...
- The Dewey system is constantly evolving. This is not like a copyright on an old text; it is more like the copyright on a piece of software.
- I was unable to find any licensing policies, but was able to see that a yearly subscription to the web version of their system (as opposed to buying a $475 book that they update every 7 years) was around $500.
- They have some interesting OAI hooks into the Dewey system for your perusal (OAI servers, etc.). It's all Open, by the way.
- I have an email in to DeweyLicensing@oclc.org asking them about this... I'll post a followup if I hear back. -
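A rough comparison of the two prices mentioned above, using only those figures. It ignores that the web version is updated continuously, and "around $500" is approximate, so treat the result as an order-of-magnitude check.

```python
BOOK_PRICE = 475          # print edition, per the list above
EDITION_CYCLE_YEARS = 7   # "update every 7 years"
WEB_PER_YEAR = 500        # "around $500", so approximate

print_per_year = BOOK_PRICE / EDITION_CYCLE_YEARS
web_per_cycle = WEB_PER_YEAR * EDITION_CYCLE_YEARS

print(f"print: ${print_per_year:.2f}/year, web: ${WEB_PER_YEAR}/year")
print(f"over one {EDITION_CYCLE_YEARS}-year cycle: ${BOOK_PRICE} vs ${web_per_cycle}")
```

Over a full edition cycle the subscription costs roughly seven times as much as the book.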
I Used to Work for OCLC
I obviously can't speak for them, but I can provide some background on what they do. OCLC is a nonprofit org providing services for approx 45,000 libraries around the world. If you are a librarian and need to figure out how to catalog a new book in your collection, you go to OCLC to see how others have done it. Ever needed an item that wasn't in your library? OCLC handles the system for arranging inter-library loans. They do a fair amount of original research for libraries and they even open source some of the results. PURL is another OCLC project that some of you may be familiar with. The Dublin Core Metadata Initiative was co-founded by a researcher who got his start at OCLC and is now running the W3C's Semantic Web Initiative. OCLC is very well known and respected in the library community.
Library budgets the world over are under attack given the current economic situation. This leaves less and less money available for building the kind of common infrastructure that will help libraries continue to provide new and relevant services for their patrons as more and more of the content becomes digital. OCLC certainly has both the right and the need to defend the Dewey Decimal trademarks from infringers. -
School library
Hmm... from what I've found out about DDC, it seems like my school library uses it.
I really doubt they have a license. And there's no way to find out until Tuesday... I can't wait!
Oh, and here's a nice intro on DDC:
http://www.oclc.org/dewey/versions/ddc22print/intr o.pdf
(Why is there a space between the 'r' and 'o'?) -
Re:Yipee
"Dewey Decimal System"
Ummm. Ohh
And I was thinking that it was a horrible typo. Maybe Melvil once heard the name for the base twelve radix, and thought 'cool name'?
Duodecimal \Du`o*dec"i*mal\, a. [L. duodecim twelve. See {Dozen}.]
Proceeding in computation by twelves; expressed in the scale
of twelves. -- {Du`o*dec"i*mal*ly}, adv.
[1913 Webster]
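For anyone who, like the poster, briefly read "Decimal" as "Duodecimal": writing a number "in the scale of twelves" is ordinary base conversion by repeated division. Using A and B as the digits for ten and eleven is an arbitrary choice in this sketch; other notations exist, and `to_duodecimal` is a hypothetical helper name.

```python
DIGITS = "0123456789AB"  # assumed symbols: A = ten, B = eleven


def to_duodecimal(n: int) -> str:
    """Write an integer in base twelve by repeated division."""
    if n < 0:
        return "-" + to_duodecimal(-n)
    if n == 0:
        return "0"
    out = []
    while n:
        n, remainder = divmod(n, 12)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


print(to_duodecimal(144))   # 100  (one gross)
print(to_duodecimal(2024))  # 1208
```

Python's built-in `int(text, 12)` goes the other way, so `int(to_duodecimal(n), 12) == n` for any integer.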
-
When you reach 48 million bibliographic records...
... let me know. Then I might take a look at the wheel you've reinvented.
-
Re:A Great Idea
In the academic world, a tool such as this already exists in the form of WorldCat. It has some 48 million records, from clay tablets to computer files, and is decidedly expensive to access.
-
Re:Simple Solution: Take Computers out of Librarie
> Libraries are repositories of information
Yeah. And whoever heard of information on the internet?
> Research? Bullshit, The only research are google searches.
Obviously you've never used FirstSearch to search the "over one million three-hundred thousand entries gleaned from essay collections, dissertations, monographs and over 6,000 journals" available in the MLA Bibliography Database. Have you tried EBSCO to search all the databases it provides access to?
> A library inventory should contain books, periodicals, and articles of note.
This statement reminds me of one of the internet's fundamental problems: accessing desired information. What good is an enormous distributed repository of information if there exist no tools to locate worthwhile content? The books, periodicals, and articles of note in your library are worthless without effective methods of searching them. Computers provide those methods, which are far better than what was available before. -
OCLC
Um...
Libraries already do this via OCLC (and actually there are now vendors/jobbers out there that can and/or will do this for about US$1.50 per book) -
Less expensive alternative to Books in Print
OCLC offers much less expensive databases of books. Their WorldCat database includes 47 million bibliographic records. Based on a quick look at their site, it appears that only member libraries who share their databases with OCLC have access to WorldCat. However, I suspect that a free, publicly available book database could negotiate membership.
Note: for participating libraries, the cost of WorldCat is much less than $30K. (I don't know how much, but I know that the public library where I used to work could never afford a $30K subscription to anything, but we did have WorldCat access.) -
Re:Would be good for small libraries worldwide
Small libraries and the like can access OCLC. OCLC provides a definitive copy of the book's record. Can you imagine what would happen if someone tried to enter their own data? Not only do books with the same title have different ISBNs, but the data being entered would be subject to the interpretation of the person entering it (e.g. St. vs. Saint).
There are rules that need to be followed in order to maintain any sort of consistency in record keeping. Remember, a library isn't kept at all like your bookshelf. -
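The St. vs. Saint point can be illustrated with a toy matching key. The abbreviation table and `match_key` are hypothetical; real cataloging rules are far more detailed, and "St." can just as well mean "Street", which is exactly the kind of judgment call a shared, authoritative record settles once instead of at every library.

```python
# Hypothetical expansion table. "st." is ambiguous ("Saint" or "Street");
# silently picking one expansion is the kind of decision shared
# cataloging rules exist to make consistently.
ABBREVIATIONS = {"st.": "saint", "mt.": "mount"}


def match_key(title: str) -> str:
    """Lower-case a title and expand known abbreviations word by word."""
    words = title.lower().split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


a, b = "St. Louis Blues", "Saint Louis Blues"
print(a.lower() == b.lower())        # False: raw entries don't match
print(match_key(a) == match_key(b))  # True: same key after expansion
```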
Re:Would be good for small libraries worldwide
There is already a company that provides just such a service: Online Computer Library Center from which libraries can buy bibliographic records to load into their online catalogs (or print for their card catalog). OCLC recently purchased NetLibrary, a provider of e-books. NetLibrary was having financial difficulties, and OCLC jumped in to make sure all those libraries who "purchased" these e-books would still have access.
Another source of Books in Print is through Gale Group. Many local libraries are purchasing access to the Gale Group databases (Books in Print, InfoTrac, etc) for their users. For instance, Virginia residents can type in the bar code number from their library card to get access to these databases from home.
I work in a library, but I'm not a librarian.