Internet Book Database?


Posted by michael on Tuesday April 16, 2002 @10:14AM from the readin-ritin-and-right-clickin dept.

Anonymous Coward writes "Just about everyone has used either the CDDB or freedb CD databases. And many people are also familiar with DVD Profiler, a well-developed database for DVD fans. Each of these public databases has a number of wonderful strengths, and a few weaknesses, but they are well thought out and well developed. After searching Google, SourceForge and every other search engine I could think of, I have come to the conclusion that there is not a well-developed internet book database. While many people would be quick to point out the various commercial websites (Amazon, Barnes and Noble, etc.), and the various library databases (Library of Congress, Boston Public Library, and other online catalogs), none of these online databases offer the same ease of use as DVD Profiler, or the open structure of the online CD databases. The closest program I could find was the shareware program Readerware. This program will search several web sites and download the pertinent information, but it is extremely inefficient, as it does not then store the data in a central database to make it easier for the other users, and in my opinion, the UI is terrible. What programs, if any, do those of you reading /. use to keep track of your books? If you were to start an open source internet book database project, what features would you include in it?" Books in Print is the definitive book database; apparently it costs about $30,000/year to license it.

28 of 231 comments


Have you tried the Dewey Decimal System? by Ummagumma · 2002-04-16 10:20 · Score: 3, Informative

I put all my books in order on my shelves, and make 3.5" index cards for each, organized by the Dewey Decimal System.

That way, when the power goes out, I can still find the right book by candlelight. ;)

--
"The natural progress of things is for liberty to yield and government to gain ground." - Thomas Jefferson
1. Re:Have you tried the Dewey Decimal System? by parc · 2002-04-16 11:19 · Score: 3, Informative
  
  I'm in the process of doing the same thing, but I hit on something nice: the computer's great at putting things in order. So I scan or type in the ISBN, a Perl script grabs the book's information from the LOC (via Z39.50), and when I'm done, the system spits out a list of books in LOC order with the Title/Author next to it. As I go down the list of books, each one naturally gets put in the right order on the shelf.
  
  At least that's the theory. I'm over 200 books so far, and I've just finished about 1/3 of the books.
  
  This is gonna take some time.
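
The "list of books in LOC order" step could look roughly like the sketch below. This is a Python illustration, not parc's Perl script. The sample records and the `lc_sort_key` and `shelf_order` helpers are hypothetical, and the key only handles the simple "class letters plus number" form of a call number, not the full Library of Congress filing rules.

```python
import re

def lc_sort_key(call_number):
    """Split a call number into (class letters, class number, remainder).

    Simplified assumption: one to three letters, an optional decimal
    number, then everything else compared as plain text.
    """
    m = re.match(r"\s*([A-Z]{1,3})\s*(\d+(?:\.\d+)?)?\s*(.*)", call_number.upper())
    if not m:
        # Anything unparseable sorts first so it is easy to spot.
        return ("", 0.0, call_number)
    letters, number, rest = m.groups()
    return (letters, float(number or 0), rest.strip())

def shelf_order(books):
    """Return the records sorted into (approximate) shelf order."""
    return sorted(books, key=lambda b: lc_sort_key(b["call"]))

# Hypothetical records; real ones would come from the catalog lookup.
books = [
    {"call": "QA76.9 .D3", "title": "Example Title A", "author": "Author A"},
    {"call": "Z699.35 .M28", "title": "Example Title B", "author": "Author B"},
    {"call": "QA9 .L5", "title": "Example Title C", "author": "Author C"},
    {"call": "PS3545 .E6", "title": "Example Title D", "author": "Author D"},
]

for b in shelf_order(books):
    print("%-14s %s / %s" % (b["call"], b["title"], b["author"]))
```

The point of the numeric key is that QA9 lands before QA76.9; a plain string sort would put it after.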
The Example of CDDB by Alien54 · 2002-04-16 10:22 · Score: 5, Interesting

I am concerned about the prior example of the CDDB, where all of these people contributed to this great resource, only to have the resource get sold off and commercialized and turned into a tool to track users, etc.
So while I really like the idea of the database, I do not like the possibility of the thievery of honest work by generous people.
Is there some way that this could be donated into the public domain or something from day one?
(just trying to wrap my mushy mind around this for the moment.)

--
"It is a greater offense to steal men's labor, than their clothes"
1. Re:The Example of CDDB by ewhac · 2002-04-16 11:48 · Score: 3, Interesting
  
  Form a 501(c)(3) non-profit corporation that owns and operates the database. Draw up the corporate charter such that the database must be maintained for the sole benefit of the community, that users' activity will never be tracked, etc.
  My (limited) understanding is that the law makes 501(c)(3) charters very hard to change. As such, new management can't just waltz in and "sell out" the company and its resources.
  The only remaining danger is that the organization becomes politically influential and either leverages that influence to the detriment of the community, or itself comes under the influence of corrupt organizations.
  Schwab
  
  --
  Editor, A1-AAA AmeriCaptions
Re:To keep track of my books? by Lemmy+Caution · 2002-04-16 10:24 · Score: 5, Insightful

In conjunction with a barcode scanner/CueCat, it could make it a lot easier to start private libraries. I have a couple thousand books, about a hundred of which I lend out at any given time. Be nice to make a catalog, and a freely available cddb-like ISBN-based system would make that a lot easier.
singlefile by yum · 2002-04-16 10:25 · Score: 4, Informative

==>
Re:What would be the point? by jcwren · 2002-04-16 10:25 · Score: 4, Insightful

My use for such a database is partly to make sure I haven't already read something (I just love buying the same book twice because the cover changed), and partly for insurance reasons.

I want to be able to use a barcode scanner (or even type the ISBN by hand), and pull all the relevant information from a DB to my local machine. This is exactly the point of CDDB, as I see it.

If I don't have to enter all the information by hand for a CD, why should I have to do it for a book?

--jcwren (owner of about 2700 books)
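
Typing ISBNs by hand invites typos, and the check digit in a ten-digit ISBN is there to catch most of them. The following is a minimal Python sketch of that check; the function name `is_valid_isbn10` is made up for illustration, and it only covers the ten-digit form.

```python
def is_valid_isbn10(text):
    """Return True if text is a ten-digit ISBN with a correct check digit.

    Hyphens and spaces are ignored. Each character is weighted 10, 9, ... 1
    from left to right, and the weighted sum must be divisible by 11.
    An 'X' in the last position stands for the value 10.
    """
    chars = [c for c in text.upper() if c not in "- "]
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char == "X" and position == 9:
            value = 10
        elif char in "0123456789":
            value = int(char)
        else:
            return False
        total += value * (10 - position)
    return total % 11 == 0

print(is_valid_isbn10("0-306-40615-2"))  # well-formed
print(is_valid_isbn10("0-306-40615-3"))  # last digit mistyped
```

Running the check before any lookup means a mistyped number is rejected locally instead of returning someone else's book.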
Re:Cue::Cat by Lemmy+Caution · 2002-04-16 10:29 · Score: 4, Insightful

What if you could scan it, and it brings a copy of that record into your local database, prints out a book plate, and then you can set up a borrowing schedule for it? That'd be cool. And very helpful for small libraries.
Why do we need this? by bstadil · 2002-04-16 10:31 · Score: 3, Insightful

I have come to the conclusion that there is not a well developed internet book database.

Why do we need this? Books are not searchable by nature so making it easier to find information about a book still leaves the issue of how do we get access to it. Making an eBook DB makes some sense. The ISBN numbering has been in effect for a long time and you can find any book reference that has a write up or reference on the net via Google. Thirdly, the research community has oodles of systems for referencing articles and papers.

--
Help fight continental drift.
Would be good for small libraries worldwide by line-bundle · 2002-04-16 10:31 · Score: 5, Insightful

Some people ask `what is the point?'.

My answer to that is the following: It would be nice to be able to look up info about a book, given a small amount of information. Suppose you are a library and you want to catalogue books. Instead of having to type in all the information yourself you could just type in the ISBN and all the information gets downloaded to the local catalogue.

I have had to make a database and enter data for a library and that would make life a lot easier!
What's the Purpose? by rubinson · 2002-04-16 10:34 · Score: 3, Interesting

What programs, if any, do those of you reading /. use to keep track of your books? If you were to start an open source internet book database project, what features would you include in it?

What purpose would such a database serve? CDDB/freedb, for example, allow us to download the album titles automatically. Saves everyone a lot of tedious work. Obviously, you're not going to be doing this for books.

As a graduate student, I maintain a single text file of all articles and texts that I've ever referenced. Each entry has a unique identifier (UID), and I use the UIDs in my own articles instead of typing the full reference. A shell script then updates the references and BibTeX automatically generates the bibliography.

I could see where it could be useful to have a centralized resource that could automatically download those references - but only if it was quicker/easier than typing it in myself (and that only takes a couple of seconds).

What other purposes would such a database serve? How would it make my life easier?
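
For the reference-file workflow described above, a downloaded record would only save time if it could be dropped straight into the BibTeX file. A rough Python sketch of that last step follows. The record layout, the UID, and the `to_bibtex` helper are all hypothetical; this is not rubinson's shell script.

```python
def to_bibtex(uid, record):
    """Render one book record as a BibTeX @book entry keyed by uid.

    Only fields that are present and non-empty are written. 'isbn' is
    not one of the classic BibTeX fields; the standard styles ignore it.
    """
    lines = ["@book{%s," % uid]
    for field in ("author", "title", "publisher", "year", "isbn"):
        if record.get(field):
            lines.append("  %s = {%s}," % (field, record[field]))
    lines.append("}")
    return "\n".join(lines)

# Hypothetical record, e.g. as returned by some future book database.
record = {
    "author": "A. N. Author",
    "title": "An Example Title",
    "publisher": "Example Press",
    "year": "2002",
}
print(to_bibtex("author2002", record))
```

The UID stays under the writer's control, so existing citations in old articles keep working whatever the database calls the book.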
A start by JasonMaggini · 2002-04-16 10:34 · Score: 3, Informative

This site has some stuff on using barcode scanners (including the ever-popular cuecat) to catalog books...
I personally would like to catalog my collection with a relatively decent amount of information, but who wants to sit there and type all that stuff in?
I agree that the trick would be to keep a database from going to the Dark Side like CDDB did...
Like this? by danro · 2002-04-16 10:34 · Score: 3, Interesting

Is there some way that this could be donated into the public domain or something from day one?

Maybe by making the source available under the GPL, and making the ability for different instances of the database to exchange information with each other be a part of the project?
That way anyone with a T1 and a fairly large disc could have his own bookDb.

That way, no single entity would be in exclusive control of the data.

On the other hand no two databases would be exactly the same.
Hmm...
Database design is not my field really, maybe I should shut up, and just write a few frontends to the db once someone has dreamt one up...

--

"First lesson," Jon said. "Stick them with the pointy end."
Free Library Databases - and a protocol by outlier · 2002-04-16 10:38 · Score: 5, Informative

Most large university libraries have free (beer) databases that typically contain huge numbers of books (many that are not held by the library).

For example, see mirlyn.web.lib.umich.edu and sign in as a guest and you can do all sorts of searches.

These libraries typically use the Z39.50 standard to connect. Z39.50 is a pretty decent standard: it is widely used and allows you to connect to many, many databases.

Sounds like this could be what you're looking for.
Re:What would be the point? by hoggoth · 2002-04-16 10:39 · Score: 5, Informative

Then just stick the ISBN numbers into MySQL, an Excel spreadsheet, or an Access database.
Then write a quickie Perl script to scan through the records and, for any that don't have all the information filled in, go scrub it off of Amazon's web site.
I've already written several Perl scripts that scrub data from Amazon. It's pretty simple.

(hint:

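use LWP::Simple;
# Fetch the Amazon page for this ISBN ($isbn must already be set).
$page = get("http://www.amazon.com/exec/obidos/ASIN/$isbn");
# Placeholder pattern: replace it with a real regex that has four capture groups.
($desc, $pgs, $price, $other) = $page =~ /use regex to find (desc) and (pgs) and (price) and (other) useful stuff/;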

)
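
The "scan through the records and fill in the ones that are missing information" step might be sketched like this. It is a Python illustration, not hoggoth's Perl. SQLite stands in for MySQL so the example runs on its own, and the table layout, the `fill_missing` helper, and the stand-in lookup are all assumptions; no real site is contacted.

```python
import sqlite3

def fill_missing(conn, lookup):
    """Fill in rows whose title is still NULL.

    lookup(isbn) is any callable returning a dict with 'title' and
    'pages', or None when it knows nothing about that ISBN.
    Returns the number of rows updated.
    """
    rows = conn.execute("SELECT isbn FROM books WHERE title IS NULL").fetchall()
    updated = 0
    for (isbn,) in rows:
        info = lookup(isbn)
        if info is None:
            continue  # leave the row incomplete; retry on a later run
        conn.execute(
            "UPDATE books SET title = ?, pages = ? WHERE isbn = ?",
            (info.get("title"), info.get("pages"), isbn),
        )
        updated += 1
    conn.commit()
    return updated

# In-memory database holding bare ISBNs, as in the scenario above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (isbn TEXT PRIMARY KEY, title TEXT, pages INTEGER)")
conn.executemany("INSERT INTO books (isbn) VALUES (?)",
                 [("0306406152",), ("080442957X",)])

# Stand-in data source with made-up details for one of the two ISBNs.
fake_source = {"0306406152": {"title": "Example Title", "pages": 100}}
print(fill_missing(conn, fake_source.get))
```

Because the lookup is passed in as a function, the same loop works whether the data comes from a web page, a Z39.50 server, or a shared database.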

--
- For the complete works of Shakespeare: cat /dev/random (may take some time)
Here are some useful links... by sailracer6 · 2002-04-16 10:43 · Score: 5, Informative

The UPC Database

You can add entries here for ANYTHING with a standard UPC, so some books are in here. Very useful.

The Book-Scanning Project

This guy wrote some Python scripts to convert UPCs to ISBNs - it can be done - and then feed them into Amazon's search engine. Very interesting, and he's already done it, so he has some experience.
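
One case of the barcode-to-ISBN conversion is simple enough to sketch here: the 13-digit "Bookland" EAN barcode that starts with 978. The Python below is not the Book-Scanning Project's code, and the function name `ean13_to_isbn10` is made up. Plain 12-digit UPCs are a different problem and are not handled.

```python
def ean13_to_isbn10(ean):
    """Convert a 978-prefixed EAN-13 barcode to a ten-digit ISBN.

    The nine digits after '978' are the ISBN without its check digit;
    the EAN's own check digit is dropped and the ISBN one recomputed.
    This sketch does not verify the EAN check digit itself.
    """
    digits = "".join(c for c in ean if c in "0123456789")
    if len(digits) != 13 or not digits.startswith("978"):
        raise ValueError("not a 978-prefixed EAN-13: %r" % ean)
    core = digits[3:12]
    # Weights run 10 down to 2 over the nine core digits.
    total = sum(int(d) * (10 - i) for i, d in enumerate(core))
    check = (11 - total % 11) % 11
    return core + ("X" if check == 10 else str(check))

print(ean13_to_isbn10("9780306406157"))
```

The recomputed check digit is needed because the EAN and ISBN schemes use different arithmetic, so the last digit of the barcode generally differs from the last character of the ISBN.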
37signals by fm6 · 2002-04-16 10:55 · Score: 3, Insightful

Impressive web design. Pretty and usable and minimalist use of HTML. Rare to see all that in one place.
Seems to be a project of 37signals. Some interesting work in their portfolio.
Re:To keep track of my books? by Lemmy+Caution · 2002-04-16 10:56 · Score: 3, Interesting

Fair enough, the key value would be T/A/P/E (Edition information) + ISBN - but a system that returned T/A/P for ISBN when it's there and vice versa (or null when it isn't) would still be very, very helpful. Reducing data input by the percentage of books in a library that are 20 years old is still a gain.
feed this page an isbn, get XML out by Pinball+Wizard · 2002-04-16 10:59 · Score: 5, Informative

If you are looking for XML data, feel free to use this page (ASP at the moment, but it will soon be redone in Perl).

The important thing is it outputs XML, so if you want to build an interface to it for your own application, you can. It's not a 100% complete database, but it should give you basic information on any book available.

I wrote this specifically for external search engines back when XML was the new hot thing. Funny thing is, the sites that search us usually want an FTP data feed, so this doesn't really get used much. But again, feel free (be reasonable if you use a bot - maybe limit your bot to a search every 5-10 seconds, please).
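
A bot that respects the "one search every 5-10 seconds" request could be built around a small throttle like the one sketched below. This is a generic Python illustration: the `Throttle` class and `fetch_all` helper are hypothetical, and the actual fetch function is left to the caller because the page's URL and XML layout are not given here.

```python
import time

class Throttle:
    """Enforce a minimum delay between successive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock    # injectable so the class can be tested
        self.sleep = sleep
        self.last = None      # time of the previous request, if any

    def wait(self):
        now = self.clock()
        if self.last is not None:
            remaining = self.min_interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now

def fetch_all(isbns, fetch, throttle):
    """Call fetch(isbn) for each ISBN, pausing between requests."""
    results = {}
    for isbn in isbns:
        throttle.wait()
        results[isbn] = fetch(isbn)
    return results
```

With `Throttle(5)` the first lookup goes out immediately and each later one waits until at least five seconds have passed since the previous request.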

--
No, Thursday's out. How about never - is never good for you?
Re:For items out of copywrite... by lkaos · 2002-04-16 11:02 · Score: 5, Interesting

http://www.gutenberg.org

is the official url IIRC

absolutely wonderful resource. they have a ton of books and the transcriptions are of pretty high quality--they have an excellent QA process.

--
int func(int a);
func((b += 3, b));
Re:To keep track of my books? by Jonathan · 2002-04-16 12:01 · Score: 3, Insightful

Well, if you only own a couple hundred books or so, keeping track of them isn't hard. If you own thousands, having some system is needed. I have about as many bookshelves as I have wallspace, and even then books are stacked two or three deep on them.
Re:What would be the point? by rmohr02 · 2002-04-16 12:20 · Score: 4, Insightful

So you also fail to see the usefulness of the Internet Movie Database? I, personally, visit the IMDb almost as much as I visit /.
Actually, there's work being done on one... by ExplodingTeakettle · 2002-04-16 12:28 · Score: 4, Informative

What you're basically proposing is a way to share bibliographic metadata -- not the book itself, but table of contents information, library holdings, etc. There are standards amongst libraries for doing this (ANSI/NISO Z39.50 and AACR2--both of which are horribly abstruse and generally a pain to deal with). Dr. Rob Cameron, along with a small group of Simon Fraser University students, has been working on the seeds of a system for sharing bibliographic metadata -- see http://www.usin.org. This basically extends the URI standard to support ISBNs and ISSNs, initially to support scholarly communication, but also making it possible to create what we call "personal bibhosts" with support for annotations, shared notes, etc. Among other things, we've implemented searches across various worldwide libraries to obtain and compare bits of bibliographic info, and so forth. Yes, you still run into the problems of inconsistent data for a given ISBN/ISSN (as a previous poster pointed out), but hey...you have to start somewhere!
Another use for a Cue Cat by ScottBob · 2002-04-16 13:22 · Score: 3, Interesting

So I scan or type in the ISBN, a Perl script grabs the book's information from the LOC (via Z39.50), and when I'm done, the system spits out a list of books in LOC order with the Title/Author next to it.

And what better to scan the ISBN with than a Cue Cat. My mother has about 400 paperback romance novels, and every time she goes to the bookstore, she can't figure out if she's read that book yet or not. She picks a book up, reads two pages, and says "I can't tell if I've read that one before or not." (Of course, I ask her how can she tell?) A Cue Cat and a CDDB style book database would allow me to scan the barcode and catalog every one of her books very quickly so she can bring a printout to the bookstore with her.
Re:Cue::Cat by raincrow · 2002-04-16 13:32 · Score: 4, Insightful

In the same way that people research CDDB and IMDB to see who has recorded what and who has directed what and who starred in what, people also want to see full author bibliographies, checklists for book series, edition lists (there are people who collect multiple editions and sometimes even printings of the same book). Then the hunt begins. There are more uses of catalogs and libraries than just adding metadata to your own collection. There's also research.

Take a look at the SFDB for an example.
Re:Wrote my own by ChazeFroy · 2002-04-16 13:42 · Score: 4, Interesting

Carnegie Mellon University has been working on the ulib project for a number of years now.

This is also a shameless plug for one of my IRC friends responsible for this. Hi Latinum.
I have one in development right now... by ChrisKnight · 2002-04-16 15:04 · Score: 3, Informative

I am currently building a database of ISBN numbers with the following records: Title, Author, Publisher and Media.

It hadn't really occurred to me that others might like access to this kind of data as well.

Seriously, is there enough interest that it might be worth the effort to add a request interface that returned an XML object of the data that I have? Would others contribute to it?

I currently have 294,652 completed entries in my database. I'm out of work and bored, and I'll make it publicly accessible if I get some feedback indicating that it would be worth the effort.

-Chris
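
A request interface of the kind floated above would need to turn one row (ISBN, Title, Author, Publisher, Media) into an XML object. A minimal Python sketch follows. The element and attribute names are invented for illustration and do not describe Chris's actual database; the sample values are made up too.

```python
import xml.etree.ElementTree as ET

def record_to_xml(isbn, title, author, publisher, media):
    """Serialize one book record as a small XML document (as a string)."""
    book = ET.Element("book", {"isbn": isbn})
    for tag, value in (("title", title), ("author", author),
                       ("publisher", publisher), ("media", media)):
        ET.SubElement(book, tag).text = value
    # ElementTree escapes characters such as & and < in the text.
    return ET.tostring(book, encoding="unicode")

xml_text = record_to_xml("0306406152", "Cats & Dogs: An Example",
                         "A. N. Author", "Example Press", "Hardcover")
print(xml_text)
```

Letting the library do the serialization matters because titles routinely contain ampersands and quotes, which hand-built strings tend to get wrong.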

--
-- This sig is only a test. If this were a real sig it would say something witty. --
To keep track of your references by SgtChaireBourne · 2002-04-16 19:05 · Score: 3, Interesting
BibTeX and MARC are two formats for managing bibliographic data. But if you're thinking of rolling your own reference manager, then you'll quickly find out that it's not just a flat file, and you'll also need to integrate it with your data source and with your editor/word processor.
If you just want to import citations, the Z39.50 search and retrieval protocol is the way to import from your library catalog and many online databases. Indexdata has a number of multiplatform tools that you can use, such as YAZ (a Z39.50 client) and PHPYAZ. Three commercial packages import from Z39.50 sources nicely (Bookwhere, Procite and Endnote); both Procite and Endnote work well at managing your footnotes during word processing, taking care of numbering and layout (e.g. APA or Chicago Manual of Style, etc.).
If you want something under GPL and more oriented to managing web sites and other Internet resources, then you may want to try hypatia. You'll have to ask special for it, but it's available. Here are the parts I've seen so far:
- Web-based interface, both end users and maintainers.
- Fully multi-lingual, including both interface and content. (It is very easy to add another language to the interfaces. Right now English and Spanish are complete, Norwegian and Finnish are being translated.) Support for Unicode (Which means you're free to add interfaces in or ).
- Useable on many different platforms, including Linux, Unix, and Windows.
- Individual installations can exchange records, allowing federated content and service providers to work together seamlessly. (Haven't tried it yet.)
- Compatible with relevant standards, including MARC, Dublin Core, and the Networked Reference standard currently under development by NISO.
- Special features for digital collections, such as automatic URL checking.
- Authority control over names (e.g. People and Organizations).
- Uses perl/MySQL/javascript
You can see the end user interface in production at the IPL in the serials, newspapers, or online texts collections. The collection management interfaces are even nicer and very useful. I'm sure it can be tweaked for data on legacy media as well.
--
Beta is broken and the link to classic doesn't work. Stop wasting our time or there won't be anybody left here.