Internet Book Database?
Anonymous Coward writes "Just about everyone has used either the CDDB
or freedb CD databases. And many
people are also familiar with DVD
Profiler, a well developed database for DVD fans. Each of these public
databases has a number of wonderful strengths, and a few weaknesses, but they
are well thought out and well developed. After searching Google, SourceForge and every other search engine I could think of, I have come to the conclusion that there is not a well developed internet book database. While many people would be quick to point out the various commercial websites (Amazon, Barnes and Noble, etc.),
and the various library databases (Library of
Congress, Boston Public
Library, and other online catalogs),
none of these online databases offers the same ease of use as DVD Profiler, or
the open structure of the online CD databases. The closest program I could
find was the shareware program Readerware.
This program will search several web sites and download the pertinent
information, but it is extremely inefficient, as it does not then store the data
in a central database to make it easier for the other users, and in my opinion,
the UI is terrible. What programs, if any, do those of you reading /. use to keep track of your books? If you were to start an open source internet book database project, what
features would you include in it?" Books in Print is the definitive book database; apparently it costs about $30,000/year to license it.
I use a bookcase...
What would be the point of a book database? The databases for DVDs and CDs allow for players on a machine to spit out relevant track/title information. I'm having a hard time coming up with a reason to have a book database.
I put all my books in order on my shelves, and make 3.5" index cards for each, organized by the Dewey Decimal System.
;)
That way, when the power goes out, I can still find the right book by candlelight.
"The natural progress of things is for liberty to yield and government to gain ground." - Thomas Jefferson
Wrote my own in MySQL/PHP. Not very good, just enough to keep track of them. http://www.teuse.net/books
Shawn Moore http://www.teuse.net
The most likely reason for this is, at least so far, the difference in format between books and digital discs of any kind. It's very easy and direct to examine the structure of a disc, but until books become digital as well this won't be as simple.
Books In Print is a great resource, if you have access to it. Amazon works well as a poor man's version.
I fail to see the usefulness of such a database, outside the traditional search engine uses. CDDB and freedb both serve a function in that they identify some electronic data for me so I don't have to--a CD I've inserted into my drive. DVD Profiler presumably performs a similar function (I've not used it so I can't say for certain). But books don't have an analogue in this area. If you had an electronic version of a book, presumably it would also have whatever index you needed with it. And if you wanted an index across titles, you would use some search engine like Google. But there aren't enough of these kinds of titles to warrant such an application, and I'm afraid I don't see the advent of that time approaching. Between incompatible proprietary formats and the DMCA, I think it'll be quite a long time before we have a standard "book CD" format that is used in generic book appliances à la Rocketbook.
While I understand some /. posters actually do know how to read, I suspect that the closest they get to a book is the title, which tells them all they need to know in order to hold definitive opinions on the book's author, subject, publisher, and political position.
All kidding aside, the resource my wife regularly uses is Google, to find pages about the books she reads for her book groups.
I would love to see an internet book database, though I know of none. In fact, I would be interested in contributing to such a project.
He looked at me and said, "Kid, we don't like your kind, and we're gonna send your fingerprints off to Washington."
So while I really like the idea of the database, I do not like the possibility of the thievery of honest work by generous people.
Is there some way that this could be donated into the public domain or something from day one?
(just trying to wrap my mushy mind around this for the moment.)
"It is a greater offense to steal men's labor, than their clothes"
==>
Yeah, and let's enable the database so that you can point your cue::cat at the book's barcode and up pops the relevant page with information about which book you're reading.
Ain't it easier to just look at the cover??
I use Readerware, and while I grant to you that it is "inefficient" in some sense (and yes, the interface sucks), the folks working on it are continuously updating the thing, and its ability to search about 2 dozen different sources for book information is really wonderful. Since most people don't play books by putting them in a slot in their computer, there isn't really that much demand for a really high-power archiver. I personally just scan my new books in and click "update" - Readerware finds everything I need, no problem, and I don't have to do it that often. Chris
I used to work for the local (independent) college bookstore (Illini Union Bookstore), and we had access to Books in Print in both dead tree (very old) and web-based (shared a login with our university's library) formats. While the information was usually very good and very reliable, there were many problems.
Do you have any old books? BIP can be very unreliable when trying to find books published before 1980. Even still, BIP doesn't include information on all the different editions of a particular book, so your ISBN may not yield any results.
Speaking of no results, the search feature on BIP is incredibly unreliable. You can search for an ISBN, not find a book, then search for the title and come up with a book with the ISBN you just searched for. Try putting that ISBN back into the search box and it doesn't work! Sometimes you get what you want, sometimes you don't.
Aside from basic bibliographic information (title, author, illustrator if any, publisher info, etc.), the pricing and availability information (available for most books in BIP's database) is not as up-to-date as BIP reports it to be. Many times we ordered books and the publisher told us the books were priced very differently from what BIP told us. Good luck getting an accurate estimate of how much your book collection is worth!
In the end, a book database like CDDB's CD database or, even better, like IMDb's movie database, including reviews and ratings, would help people organize and maintain their private collections, and would help bookstore employees get their job done. If only the book database software our bookstore used had the ability to access an outside database like that!
I have come to the conclusion that there is not a well developed internet book database.
Why do we need this? Books are not searchable by nature, so making it easier to find information about a book still leaves the issue of how we get access to it. Making an eBook DB makes some sense. ISBN numbering has been in effect for a long time, and you can find any book that has a write-up or reference on the net via Google. Also, the research community has oodles of systems for referencing articles and papers.
Help fight continental drift.
Some people ask `what is the point?'.
My answer to that is the following: It would be nice to be able to look up info about a book, given a small amount of information. Suppose you are a library and you want to catalogue books. Instead of having to type in all the information yourself, you could just type in the ISBN and all the information gets downloaded to the local catalogue.
I have had to make a database and enter data for a library, and that would make life a lot easier!
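A minimal sketch of the "type in the ISBN, get the record" workflow described above. The function name `add_by_isbn`, the `lookup` callable, and the placeholder record are all assumptions made for illustration; no real shared book database or API is implied.

```python
from typing import Callable, Dict, Optional

Record = Dict[str, str]


def add_by_isbn(
    catalogue: Dict[str, Record],
    isbn: str,
    lookup: Callable[[str], Optional[Record]],
) -> Optional[Record]:
    """Fetch a record for `isbn` via `lookup` and store it in the local catalogue."""
    # Normalise the key so "0-306-40615-2" and "0306406152" are the same entry.
    key = isbn.replace("-", "").replace(" ", "").upper()
    if key in catalogue:
        return catalogue[key]  # already catalogued, no remote call needed
    record = lookup(key)  # hypothetical call to a shared database
    if record is not None:
        catalogue[key] = record
    return record


# Stand-in for the shared database: one made-up placeholder record.
SHARED = {"0306406152": {"title": "Placeholder Title", "author": "Placeholder Author"}}


def fake_lookup(isbn: str) -> Optional[Record]:
    """Hypothetical remote lookup, here just a dictionary read."""
    return SHARED.get(isbn)
```

Passing the lookup in as a parameter keeps the cataloguing step independent of whichever source ends up supplying the data.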
What programs, if any, do those of you reading /. use to keep track of your books? If you were to start an open source internet book database project, what features would you include in it?
What purpose would such a database serve? CDDB/freedb, for example, allow us to download the album titles automatically. Saves everyone a lot of tedious work. Obviously, you're not going to be doing this for books.
As a graduate student, I maintain a single text file of all articles and texts that I've ever referenced. Each entry has a unique identifier (UID), and I use the UIDs in my own articles instead of typing the full reference. A shell script then updates the references and BibTeX automatically generates the bibliography.
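One way such a UID workflow could look, as a rough sketch only. The poster uses a shell script whose details are not given; the version below is in Python, assumes the UIDs appear in the article as `\cite{...}` keys, and assumes the master list is a dictionary keyed by UID. All function names are made up for illustration.

```python
import re
from typing import Dict, Set


def cited_keys(tex: str) -> Set[str]:
    """Collect every UID used in \\cite{...} commands, including comma lists."""
    keys: Set[str] = set()
    for group in re.findall(r"\\cite\{([^}]*)\}", tex):
        keys.update(k.strip() for k in group.split(","))
    return keys


def to_bibtex(key: str, entry: Dict) -> str:
    """Render one master-list entry as a BibTeX record."""
    fields = ",\n".join(
        f"  {name} = {{{value}}}" for name, value in entry["fields"].items()
    )
    return f"@{entry['type']}{{{key},\n{fields}\n}}"


def build_bibliography(master: Dict[str, Dict], tex: str) -> str:
    """Emit BibTeX only for the UIDs the article actually cites."""
    keys = sorted(k for k in cited_keys(tex) if k in master)
    return "\n\n".join(to_bibtex(k, master[k]) for k in keys)


master = {
    "knuth1984": {
        "type": "book",
        "fields": {"author": "Donald E. Knuth", "title": "The TeXbook", "year": "1984"},
    },
}
article = r"TeX is described in \cite{knuth1984}."
```

A shared database would only need to supply the `master` entries; the rest stays local.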
I could see where it could be useful to have a centralized resource that could automatically download those references - but only if it was quicker/easier than typing it in myself (and that only takes a couple of seconds).
What other purposes would such a database serve? How would it make my life easier?
This site has some stuff on using barcode scanners (including the ever-popular cuecat) to catalog books...
I personally would like to catalog my collection with a relatively decent amount of information, but who wants to sit there and type all that stuff in?
I agree that the trick would be to keep a database from going to the Dark Side like CDDB did...
Is there some way that this could be donated into the public domain or something from day one?
Maybe by making the source available under the GPL, and making the ability for different instances of the database to exchange information with each other be a part of the project?
That way anyone with a T1 and a fairly large disc could have his own bookDb.
That way, no single entity would be in exclusive control of the data.
On the other hand, no two databases would be exactly the same.
Hmm...
Database design is not my field really, maybe I should shut up, and just write a few frontends to the db once someone has dreamt one up...
"First lesson," Jon said. "Stick them with the pointy end."
Most large university libraries have free (beer) databases that typically contain huge numbers of books (many that are not held by the library).
For example, see mirlyn.web.lib.umich.edu and sign in as a guest and you can do all sorts of searches.
These libraries typically use the Z39.50 standard to connect. Z39.50 is a pretty decent standard, it is widely used, and it allows you to connect to many, many databases.
Sounds like this could be what you're looking for.
You can add entries here for ANYTHING with a standard UPC, so some books are in here. Very useful.
The Book-Scanning Project
This guy wrote some Python scripts to convert UPCs to ISBNs - it can be done - and then feed them into Amazon's search engine. Very interesting, and he's already done it, so he has some experience.
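The scripts mentioned above are not reproduced here, so the following is only a sketch of the best-known case: a 13-digit "Bookland" EAN barcode starting with 978 carries the first nine ISBN digits, and the ISBN-10 check digit is recomputed with the standard mod-11 rule. Older 12-digit UPC codes on some paperbacks need a publisher lookup table and are not covered. The function names are made up for illustration.

```python
def isbn10_check_digit(first9: str) -> str:
    """Compute the ISBN-10 check digit (weights 10..2, modulo 11)."""
    total = sum((10 - i) * int(d) for i, d in enumerate(first9))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def ean13_to_isbn10(ean: str) -> str:
    """Convert a 978-prefixed EAN-13 barcode string to an ISBN-10."""
    digits = ean.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected 13 digits")
    if not digits.startswith("978"):
        raise ValueError("not a 978 Bookland EAN; no ISBN-10 equivalent")
    core = digits[3:12]  # drop the 978 prefix and the EAN check digit
    return core + isbn10_check_digit(core)
```

Note that this sketch does not verify the EAN's own check digit; a scanner normally does that before handing over the number.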
Probably an agent provocateur for the Authors Guild trying to incite the Open Source community into writing a book tracking list for them to use to keep track of the livelihood-stealing activities of that awful Jeff Bezos and those bastards at Half-Price Books.
--
"Outlook not so good." That magic 8-ball knows everything! I'll ask about Exchange Server next.
I have not followed freedb much, but I suspect the software that runs it could be modified to work with books. The freedb software is under the GPL and uses a MySQL database backend. Someone looking to procrastinate could probably have a working book database within a few days.
--Ben
Seems to be a project of 37signals. Some interesting work in their portfolio.
Check out bookcrossing.com. You can have your own bookshelf. Just type the ISBN, it retrieves the cover art, the author and all that. You can fix it too.
I use it, I like it.
"Piter, too, is dead."
...then I guess there is no point. I have 5 at home and 2 at work.
The important thing is it outputs XML, so if you want to build an interface to it for your own application, you can. It's not a 100% complete database, but it should give you basic information on any book available.
I wrote this specifically for external search engines back when XML was the new hot thing. Funny thing is, the sites that search us usually want an FTP data feed, so this doesn't really get used much. But again, feel free (be reasonable if you use a bot - maybe limit your bot to a search every 5-10 seconds, please).
No, Thursday's out. How about never - is never good for you?
http://www.gutenberg.org
is the official url IIRC
absolutely wonderful resource. they have a ton of books and the transcriptions are of pretty high quality--they have an excellent QA process.
int func(int a);
func((b += 3, b));
I am writing my own catalog with MySQL/Perl for several reasons.
1) I don't have enough space in my tiny room to fit all my books into bookcases, but with the db I can put some books in boxes in the closet and easily find out in which box a certain book is.
2) I want my books sorted according to a standard classification system but still be able to have them in my own way in the bookcase. Currently I use a heavily outdated (1987) Swedish classification system that the kind folks at my school library lent me. So I'll definitely take look at the Dewey Decimal system mentioned earlier.
3) I have books in several languages and with a db I can have the same kind of information on different books in different languages in the same place. Thus I don't have to look up the romanization for the Kanji (Chinese characters in Japanese) more than once. But of course it will store the original Kanji titles as well.
4) I can easily create lists of books that I want to buy, books that friends have borrowed from me, or books that I have borrowed.
When it's finished I want it to handle 2-byte languages in a nice way, be compliant with existing standards for book classification, both Swedish and international, allow for easy list creation and have a nice interface.
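The four requirements above could map onto a schema roughly like the one below. This is a sketch only: the poster is using MySQL/Perl, while this uses Python's built-in `sqlite3` so it runs without a server, and every table, column, and helper name is an assumption rather than the poster's actual design. The rows are sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE books (
        id              INTEGER PRIMARY KEY,
        isbn            TEXT,
        title_original  TEXT NOT NULL,   -- e.g. the Kanji title
        title_romanized TEXT,            -- looked up once, stored for good
        language        TEXT,
        classification  TEXT,            -- code from whichever system is used
        location        TEXT,            -- shelf or box, independent of class
        status          TEXT NOT NULL DEFAULT 'owned'  -- or 'wanted'
    );
    CREATE TABLE loans (
        id       INTEGER PRIMARY KEY,
        book_id  INTEGER NOT NULL REFERENCES books(id),
        borrower TEXT NOT NULL,
        returned INTEGER NOT NULL DEFAULT 0
    );
    """
)
conn.execute(
    "INSERT INTO books (id, title_original, title_romanized, language, location) "
    "VALUES (1, '吾輩は猫である', 'Wagahai wa Neko de Aru', 'ja', 'closet box 3')"
)
conn.execute("INSERT INTO loans (book_id, borrower) VALUES (1, 'a friend')")


def find_location(conn, fragment):
    """Return (title, location) for titles matching in either script."""
    pattern = f"%{fragment}%"
    return conn.execute(
        "SELECT title_original, location FROM books "
        "WHERE title_original LIKE ? OR title_romanized LIKE ?",
        (pattern, pattern),
    ).fetchall()


def on_loan(conn):
    """Return (title, borrower) for every loan not yet returned."""
    return conn.execute(
        "SELECT b.title_original, l.borrower FROM loans l "
        "JOIN books b ON b.id = l.book_id WHERE l.returned = 0"
    ).fetchall()
```

Keeping `classification` and `location` as separate columns is what lets the shelving stay personal while the catalogue follows a standard system.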
Have you ever heard of Project Gutenberg? It is basically doing what you are talking about and has been since the 1970's. They have a pretty good collection, and I would totally suggest anyone interested in an internet book DB to help them out with their cause. Although I see your point that a full index of all books (without content) would be a pretty cool thing to have.
"Your 'Gin n'tonic Futon Brain' sure makes you smart!"
"That's 'Positronic-photon Brain', you idiot!"
I'm not sure about NBA, but the kick-ass free Lahman MLB database is available at baseball1.com. It's got stats going back to 1871...
Try the Australian mirror as well. Because of discrepancies in the way countries expire copyright, many books are listed on the Australian site that are not available on the main repository.
And before you go bagging full-scale on the US: there are many books listed on the US site that are not on the British one. We're not the worst! =)
http://www.gutenberg.net.au
.sig: Now legally binding!
We had a room full of tech books that had accumulated since before I started. When the PHBs would have dumped the whole lot in a skip, I spent the better part of a week sorting through that lot: Addison-Wesley X11 reference, Adobe PostScript reference, C, Fortran, a complete set of A/UX manuals, etc. etc. etc. No-one in the organisation even knew they existed anymore. Being able to zap the barcodes through a reader and generate a catalogue on our intranet would be cool; there are another three departments that have similar geek rooms that are always one zealous PHB away from the dumpster.
Hmmm... I am sure there must be ISBN search facilities I can screen-scrape.... Coworker types ISBN of their newly received book into a front end and voila! I can grep for it.
Xix.
"Everything is adjustable, provided you have the right tools"
I think a book database could be pretty interesting just as a central ISBN/publisher/year/author reference. (Yes, Google is wonderful, but you never know what context an ISBN match is going to be in; the whole point of having a central resource is consistency.) But then, my wife and I have a living room lined with bookcases, and the bookcases are starting to encroach on our hallway and bedroom too. :)
But you could do some pretty interesting stuff with an IMDB-style book database, at least for fiction. I'm picturing entries for fictional characters and locations, along with birth and death dates, even user-moderated (Wiki?) biographical sketches where available, cross-referenced by author. Instant encyclopedia of Arkham/Castle Rock... cool!
But even outside of a single author's oeuvre, there would be great cross referencing stuff you could do.
Say I read and really liked a detective novel that takes place in Los Angeles in the 1940's.
It would be pretty cool to have a reliable database where I could plug in the ISBN of the book I just read, and get a cross-referenced list of other books set in the same time/place/genre - without the busy, sales-oriented "You might also like" mess you get from a site like Amazon.
Maybe include a user comments section, if there's some sort of meta-moderation available - point-missing/inane/poorly written Amazon user reviews instantly send me into a blind rage
-Oh, and you could do automated metasearches with the new Google API, too
Did I read that right? You mean that title, author, subject, date, and category are not searchable fields? It's impossible to search the contents of a book for patterns? It's not easy to store/index a book's content in the database itself?
Perhaps you typed that wrong, or I'm misunderstanding you, but that statement sounds profoundly false.
You might want to look into Project Gutenberg. Because they do that. (If copy restrictions were shorter, they would have tons more stuff too.)
What you're basically proposing is a way to share bibliographic metadata -- not the book itself, but table of contents information, library holdings, etc. There are standards amongst libraries for doing this (ANSI/NISO Z39.50 and AACR2--both of which are horribly abstruse and generally a pain to deal with). Dr. Rob Cameron, along with a small group of Simon Fraser University students, has been working on the seeds of a system for sharing bibliographic metadata -- see http://www.usin.org. This basically extends the URI standard to support ISBNs and ISSNs, initially to support scholarly communication, but also making it possible to create what we call "personal bibhosts" with support for annotations, shared notes, etc. Among other things, we've implemented searches across various worldwide libraries to obtain and compare bits of bibliographic info, and so forth. Yes, you still run into the problems of inconsistent data for a given ISBN/ISSN (as a previous poster pointed out), but hey...you have to start somewhere!
Something like this is going to have initial and ongoing costs. Even if it is developed under an open license, there should be some provision made for commercial use and licensing, but not ownership, of the database once completed.
The alternative is to have the project run out of money, and be bought, probably by a business, and then commercialized anyway.
The best projects will always be those that balance the commercial aspects with the public interest aspects.
It is an excellent idea, however.
So set your browser to underline links. I prefer that mode anyway. Even when web designers use conventional gimmicks to indicate links, it's not always obvious what they are.
So I scan or type in the ISBN, a Perl script grabs the book's information from the LOC (via Z39.50), and when I'm done, the system spits out a list of books in LOC order with the Title/Author next to it.
And what better to scan the ISBN with than a Cue Cat. My mother has about 400 paperback romance novels, and every time she goes to the bookstore, she can't figure out if she's read that book yet or not. She picks a book up, reads two pages, and says "I can't tell if I've read that one before or not." (Of course, I ask her how can she tell?) A Cue Cat and a CDDB style book database would allow me to scan the barcode and catalog every one of her books very quickly so she can bring a printout to the bookstore with her.
Oh, I get it all right. I have more than 20 thousand books -- no idea of the actual number, that's just based on multiplying the number of packed-full shelves by the number of books on an average shelf. Many of them are old, as in pre-ISBN. Many of them were published in other countries and/or in other languages and don't show up in your typical database. I have numerous Hungarian books, for example, that aren't in the online catalog of any United States library.
I'm working on a catalog of my books (and my etexts, and my tens of thousands of physical and digitized sound recordings, and small quantities of miscellaneous other media -- I'm not really into video). Indeed, bibliography is an interest of mine, and I've long had ideas for very nontraditional, loosely-structured, multiply-hierarchical hypertextual catalogs. I've been implementing small parts of these ideas for over ten years.
But actually getting any reasonable fraction of my library into a database strikes me, on even my most optimistic days, as a Herculean task. It's hard to get started, because when I do have any free time, I prefer actually reading the books to cataloguing them. Oh, when I actually get out of postdoctoral research hell and get a real job, I might have enough money to hire someone to do data entry (then again, I'm likely to want to spend the extra money on books -- fortunately I just got married and my wife might act as a braking force against that tendency).
With a little luck, I'll have the structural framework for my catalog coded in a year or two. But actually getting the data into a database will be a huge task, and one which my CueCat (or the more professional barcode scanners I recently dumpster-dived) will hardly begin to help with. (Only comparatively-recently published books have bar codes, and not even all of them).
A unified catalog with all the records from Library of Congress, Books in Print, and university/state libraries around the world would be fantastic, though, if only to "fill in the blanks" with a minimum of manual entry for any given book. (I do have access through my university to some things that help, though, the unified bibliographical catalogs that librarians use. But I have to write glue code to automate access to them, and that's a pain in the butt).
Why do I want to catalogue my library? Well, there are a couple of reasons. The main one is probably that I want to build the hypertextual database that I alluded to above. When I read books, I make notes (mentally or otherwise). The notes usually make reference to other books. It would be nice to record these notes in the database; eventually it would be a web reflecting what I've thought about various books throughout time. I'm a fairly disorganized person, and if I just jot something down somewhere I'll lose track of it. And if I try to keep it all in mind, I'll inevitably start to forget.
Being disorganized also justifies a catalog on purely practical terms -- it would be nice to know for sure, when for instance I see a book that I've already read and liked in a used bookstore, whether I already have the book (in which case I certainly don't want a duplicate), or read it somewhere else (in which case I certainly do want to buy it). And, since my books are not shelved according to any rational system, a catalog might help me find them (though I don't usually have much trouble with this). Note that I have no intention of significantly rationalizing the shelving even if I do catalogue the books. I'm much more likely to simply record my idiosyncratic locations in the database.
A final reason for cataloguing is that my collection is fairly comprehensive in a few specialized areas and I definitely do have a few books, at least, that would be very hard to find in this country. I'd be willing to lend out such books to (trustworthy) people. But people need to be able to find out that I have the books, and I need to be able to keep track of any loans as I'd be loath to lose even a single book. A catalog would be absolutely indispensable for this.
Kiscica
SearchDay - Curling Up with a Good Book Search Engine.
I am currently building a database of ISBN numbers with the following fields: Title, Author, Publisher and Media.
It hadn't really occurred to me that others might like access to this kind of data as well.
Seriously, is there enough interest that it might be worth the effort to add a request interface that returned an XML object of the data that I have? Would others contribute to it?
I currently have 294,652 completed entries in my database. I'm out of work and bored, and I'll make it publicly accessible if I get some feedback indicating that it would be worth the effort.
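For anyone wondering what such an XML response might look like, here is a small sketch using Python's standard `xml.etree.ElementTree`. The element and attribute names are assumptions made for illustration; the poster has not described any actual format. Only the four fields listed above are included.

```python
import xml.etree.ElementTree as ET


def book_to_xml(isbn: str, title: str, author: str, publisher: str, media: str) -> str:
    """Serialise one record as a small XML document (hypothetical layout)."""
    book = ET.Element("book", attrib={"isbn": isbn})
    for tag, text in (
        ("title", title),
        ("author", author),
        ("publisher", publisher),
        ("media", media),
    ):
        ET.SubElement(book, tag).text = text
    # encoding="unicode" returns a str rather than bytes
    return ET.tostring(book, encoding="unicode")


def xml_to_book(document: str) -> dict:
    """Parse the same layout back into a dictionary on the client side."""
    book = ET.fromstring(document)
    record = {child.tag: child.text for child in book}
    record["isbn"] = book.get("isbn")
    return record
```

ElementTree escapes characters such as `&` and `<` in titles automatically, which is one reason to build the response with a library rather than by string concatenation.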
-Chris
-- This sig is only a test. If this were a real sig it would say something witty. --
See related article/discussion on Advogato from a few days ago.
I built one but it is very free-form and may not be what you want.
My objective was to do a quick keyword search on a list of 100,000 records from several different sources. Generally I have one line per book, and while some of the indices provide more information that is all I use.
I didn't want to spend the time to do a real database job and I wanted to use Perl regular expressions to do a quick keyword search within author and title text. So I keep recent indices next to the search program compressed variously with zip, gzip, or bzip2. I can direct the system to make a single text file which contains the unpacked text all appended together and compressed again. It will also list stats for each file.
Its main function is to wait for a keyword to be typed in, and it will immediately (PIII/450MHz Linux Inspiron 7.5K) display a numbered list of matching books, in alphabetical order grouped by index. You can then select certain numbers from the list, or reduce the number of records by adding more keywords. This is sufficient for me and has helped me discover unknown titles and new authors because of its way of narrowing down on information. Perhaps if I had more structured files I would have used Perl's BoulderIO, which has solved bigger problems of library science in merging genome data files; see bio.perl.org.
If you just want to import citations, the Z39.50 search and retrieval protocol is the way to import from your library catalog and many online databases. Indexdata has a number of multiplatform tools that you can use, such as YAZ (a Z39.50 client) and PHPYAZ. Three commercial packages import from Z39.50 sources nicely (Bookwhere, Procite and Endnote); both Procite and Endnote work well at managing your footnotes during word processing, taking care of numbering and layout (e.g. APA or Chicago Manual of Style, etc.).
If you want something under GPL and more oriented to managing web sites and other Internet resources, then you may want to try hypatia. You'll have to ask specially for it, but it's available. Here are the parts I've seen so far:
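The one-line-per-book keyword search described above can be sketched in a few lines. The poster's program is in Perl and is not shown; this Python version only illustrates the idea, and the function name, the index names, and the sample lines are assumptions.

```python
import re
from typing import Dict, Iterable, List, Tuple


def search(
    indices: Dict[str, Iterable[str]], keywords: List[str]
) -> List[Tuple[int, str, str]]:
    """Return numbered (n, index_name, line) hits containing every keyword."""
    patterns = [re.compile(re.escape(k), re.IGNORECASE) for k in keywords]
    hits: List[Tuple[str, str]] = []
    for name in sorted(indices):  # group results by index
        matching = sorted(  # alphabetical within each index
            line for line in indices[name] if all(p.search(line) for p in patterns)
        )
        hits.extend((name, line) for line in matching)
    return [(n, name, line) for n, (name, line) in enumerate(hits, start=1)]


# Sample data standing in for the unpacked index files.
indices = {
    "gutenberg": ["Melville, Herman - Moby Dick", "Austen, Jane - Emma"],
    "library": ["Melville, Herman - Typee"],
}
```

Adding a keyword to the list narrows the result set, which mirrors the "reduce the number of records by adding more keywords" step.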
- Web-based interface, both end users and maintainers.
- Fully multi-lingual, including both interface and content. (It is very easy to add another language to the interfaces. Right now English and Spanish are complete, Norwegian and Finnish are being translated.)
- Support for Unicode (Which means you're free to add interfaces in or ).
- Useable on many different platforms, including Linux, Unix,
and Windows.
- Individual installations can exchange records, allowing
federated content and service providers to work together
seamlessly. (Haven't tried it yet.)
- Compatible with relevant standards, including MARC, Dublin Core, and the
Networked Reference standard currently under development by NISO.
- Special features for digital collections, such as automatic
URL checking.
- Authority control over names (e.g. People and Organizations).
- Uses perl/MySQL/javascript
You can see the end user interface in production at the IPL in the serials, newspapers, or online texts collections. The collection management interfaces are even nicer and very useful. I'm sure it can be tweaked for data on legacy media as well.
Beta is broken and the link to classic doesn't work. Stop wasting our time or there won't be anybody left here.
As a server-side app, it doesn't have to be distributed to others. So the GPL won't do any good. I like the interchange of info between databases, but that can be very hard to implement correctly. You'll run into all kinds of mismatches where you need to do the right thing (when do records denote the same book?). You can fix a lot of these problems by focusing on the ISBN, but you'll still run into inconsistent data and the like. There will also be security issues, you have to make sure that the guy who you synchronize with hasn't polluted his database with wrong data. Lastly, copying an entire database takes a lot of horsepower. You don't want people doing this to your database during busy hours.
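To make the mismatch problem concrete, here is a rough sketch of merging records from another database keyed on a normalised ISBN. Fields that disagree are reported rather than silently overwritten, which is one possible response to the "polluted data" concern. The function names and record layout are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]
Conflict = Tuple[str, str, str, str]  # (isbn, field, local value, incoming value)


def normalize_isbn(raw: str) -> str:
    """Strip hyphens/spaces so the same book always maps to the same key."""
    return raw.replace("-", "").replace(" ", "").upper()


def merge(
    local: Dict[str, Record], incoming: Dict[str, Record]
) -> Tuple[Dict[str, Record], List[Conflict]]:
    """Merge `incoming` into a copy of `local`; never overwrite local fields."""
    merged = {normalize_isbn(k): dict(v) for k, v in local.items()}
    conflicts: List[Conflict] = []
    for raw, record in incoming.items():
        key = normalize_isbn(raw)
        current = merged.setdefault(key, {})
        for field, value in record.items():
            if field not in current:
                current[field] = value  # fills a gap, nothing to argue about
            elif current[field] != value:
                conflicts.append((key, field, current[field], value))
    return merged, conflicts
```

What to do with the conflict list (human review, voting, trusting one source) is the hard policy question and is left open here.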
Apart from these technical considerations, it's very likely that one database will become the standard that everyone uses, even when there is more than one choice (everyone will use the most extensive, most correct database). I doubt that there will be many synchronizations into this database; most people will be interested in making a copy of it.
All in all I think the best bet is to incorporate a feature that makes it easy to automatically dump the database to a mirror at a certain time (midnight or so). There's no good way to make sure that this actually gets done though, but you can always write a screen-scrape application to punish the organization that doesn't supply you with a dump (scraping the entire database will hurt them bad). When the organization goes nasty, you can switch over to a copy and try to beat them with an open alternative.
The Drowned and the Saved - Primo Levi
I'm trying to gather information about books on my site, The Virtual Bookcase. It grew out of trying to make a database of my own books and at the same time doing projects at work regarding books and information about books. I try to gather reviews but I'm of course limited to reviews that people enter (please enter more reviews!) or reviews that I can reuse (such as those from Amazon). The Amazon affiliate linking helps recoup a bit of the costs (bandwidth, domain name) but the amount of time invested in the software for the site and the maintenance of the databases is of course never repaid. I do learn a lot in the process :) I'm now at a stage where I think the technical stuff can take care of itself for a while and I need to learn more about site design and usability and how to get other types of information on books such as press releases and general book news.
The Virtual Bookcase: book reviews