Internet Book Database?
Anonymous Coward writes "Just about everyone has used either the CDDB or freedb CD databases. And many people are also familiar with DVD Profiler, a well developed database for DVD fans. Each of these public databases has a number of wonderful strengths, and a few weaknesses, but they are well thought out and well developed. After searching Google, SourceForge and every other search engine I could think of, I have come to the conclusion that there is not a well developed internet book database. While many people would be quick to point out the various commercial websites (Amazon, Barnes and Noble, etc.), and the various library databases (Library of Congress, Boston Public Library, and other online catalogs), none of these online databases offers the same ease of use as DVD Profiler, or the open structure of the online CD databases. The closest program I could find was the shareware program Readerware. This program will search several web sites and download the pertinent information, but it is extremely inefficient, as it does not then store the data in a central database to make it easier for other users, and in my opinion, the UI is terrible. What programs, if any, do those of you reading /. use to keep track of your books? If you were to start an open source internet book database project, what features would you include in it?" Books in Print is the definitive book database; apparently it costs about $30,000/year to license it.
I use a bookcase...
What would be the point of a book database? The databases for DVDs and CDs allow for players on a machine to spit out relevant track/title information. I'm having a hard time coming up with a reason to have a book database.
I put all my books in order on my shelves, and make 3.5" index cards for each, organized by the Dewey Decimal System.
;)
That way, when the power goes out, I can still find the right book by candlelight.
"The natural progress of things is for liberty to yield and government to gain ground." - Thomas Jefferson
yes... i would definitely put that in...
.............. I NEVER LEARNED TO READ!!
nevermind.
Wrote my own in MySQL/PHP. Not very good, just enough to keep track of them. http://www.teuse.net/books
Shawn Moore http://www.teuse.net
The most likely reason for this is, at least so far, the difference in format between books and digital discs of any kind. It's very easy and direct to examine the structure of a disc, but until books become digital as well this won't be as simple.
Books In Print is a great resource, if you have access to it. Amazon works well as a poor man's version.
I fail to see the usefulness of such a database, outside the traditional search engine uses. CDDB and freedb both serve a function in that they identify some electronic data for me so I don't have to--a CD I've inserted into my drive. DVD Profiler presumably performs a similar function (I've not used it so I can't say for certain). But books don't have an analogue in this area. If you had an electronic version of a book, presumably it would also have whatever index you needed with it. And if you wanted an index across titles, you would use some search engine like Google. But there aren't enough of these kinds of titles to warrant such an application, and I'm afraid I don't see the advent of that time approaching. Between incompatible proprietary formats and the DMCA, I think it'll be quite a long time before we have a standard "book cd" format that is used in generic book appliances à la Rocketbook.
While I understand some /. posters actually do know how to read, I suspect that the closest they get to a book is the title, which tells them all they need to know in order to hold definitive opinions on the book's author, subject, publisher, and political position.
All kidding aside, the resource my wife regularly uses is google to find pages regarding books she reads for her book groups.
I would love to see an internet book database, though I know of none. In fact, I would be interested in contributing to such a project.
He looked at me and said, "Kid, we don't like your kind, and we're gonna send your fingerprints off to Washington."
So while I really like the idea of the database, I do not like the possibility of the thievery of honest work by generous people.
Is there some way this could be donated into the public domain or something from day one?
(just trying to wrap my mushy mind around this for the moment.)
"It is a greater offense to steal men's labor, than their clothes"
I'm not sure that an online free book database becomes very relevant until I start reading books in a digital format. CDDB, freedb, and DVD Profiler are all for digital media while books are still primarily in paper format.
I will continue using my book shelf for the time being.
==>
Yeah, and let's enable the database so that you can point your cue::cat at the book's barcode and up pops the relevant page with information about which book you're reading.
Ain't it easier to just look at the cover??
yes, that, and the KaZaA free CD database too.
I use Readerware, and while I grant you that it is "inefficient" in some sense (and yes, the interface sucks), the folks working on it are continuously updating the thing, and its ability to search about 2 dozen different sources for book information is really wonderful. Since most people don't play books by putting them in a slot in their computer, there isn't really that much demand for a really high-power archiver. I personally just scan my new books in and click "update" - Readerware finds everything I need, no problem, and I don't have to do it that often. Chris
Oh, you mean the hardcopy of a web page?
just a bad joke! don't bother to flame me my submodernpostcomslashdotantireactionary friends... I read bound printouts all the time.
-pyrrho
There _is_ a need for something like that, because:
- not all books have an ISBN (though I wonder what the "primary key" to add/search them in the bookDB would be)
- there's no complete database on the net which has a nice API/protocol (you always have to parse the f*ckin Amazon or Barnes & Noble pages...)
- we need a _free_ database.
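On the primary-key question in the list above, here is one possible approach, sketched in Python purely as an illustration: use the normalized ISBN when a book has one, and otherwise fall back to a short hash of the normalized title, author and year. The function name `book_key` and the `isbn:` / `hash:` prefixes are made up for this sketch and do not come from any existing database.

```python
import hashlib
import re


def book_key(title, author, year=None, isbn=None):
    """Return a catalogue key: the ISBN if present, else a derived hash.

    Hypothetical scheme, for illustration only.
    """
    if isbn:
        # Strip hyphens and spaces; keep a trailing X check character.
        digits = re.sub(r"[^0-9Xx]", "", isbn).upper()
        if len(digits) in (10, 13):
            return "isbn:" + digits
    # Fallback: lowercase, collapse punctuation and whitespace, then hash.
    parts = [title, author, str(year or "")]
    normalized = "|".join(
        re.sub(r"[^a-z0-9]+", " ", p.lower()).strip() for p in parts
    )
    return "hash:" + hashlib.sha1(normalized.encode("utf-8")).hexdigest()[:16]
```

The fallback is deliberately crude: it only understands ASCII letters and digits, so titles in other scripts would need a better normalization step, and two editions with the same title, author and year would collide.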
I used to work for the local (independent) college bookstore (Illini Union Bookstore), and we had access to Books in Print in both dead tree (very old) and web-based (shared a login with our university's library) formats. While the information was usually very good and very reliable, there were many problems.
Do you have any old books? BIP can be very unreliable when trying to find books published before 1980. Even still, BIP doesn't include information on all the different editions of a particular book, so your ISBN may not yield any results.
Speaking of no results, the search feature on BIP is incredibly unreliable. You can search for an ISBN, not find a book, then search for the title and come up with a book with the ISBN you just searched for. Try putting that ISBN back into the search box and it doesn't work! Sometimes you get what you want, sometimes you don't.
Aside from searching for basic bibliographic information (title, author, illustrator if any, publisher info, etc.), pricing and availability information (available for most books in BIP's database) are not as up-to-date as BIP reports them to be. Many times we ordered books and the publisher told us the books were priced very differently from what BIP told us. Good luck getting an accurate estimate of how much your book collection is worth!
In the end, a book database like CDDB's CD database or, even better, like IMDB's movie database, including reviews and ratings, would help people organize and maintain their private collections, and would help bookstore employees get their job done. If only the book database software our bookstore used had the ability to access an outside database like that!
I have come to the conclusion that there is not a well developed internet book database.
Why do we need this? Books are not searchable by nature, so making it easier to find information about a book still leaves the issue of how we get access to it. Making an eBook DB makes some sense. ISBN numbering has been in effect for a long time, and you can find any book that has a write-up or reference on the net via Google. Thirdly, the research community has oodles of systems for referencing articles and papers.
Help fight continental drift.
What is this book thing you are talking about?
Some people ask `what is the point?'.
My answer to that is the following: It would be nice to be able to look up info about a book, given a small amount of information. Suppose you are a library and you want to catalogue books. Instead of having to type in all the information yourself you could just type in the ISBN and all the information gets downloaded to the local catalogue.
I have had to make a database and enter data for a library and that would make life a lot easier!
The company responsible for Books In Print probably has a patent on a "list of books". :)
isbn.nu is something of a step in the right direction...it doesn't sell books directly, so it's pretty much a disinterested party (even imdb isn't really that anymore, since it's an Amazon site). but it would be nice to have something more comprehensive, that also covered important books no longer in print, and that would have comprehensive (as best as possible) listings of every edition (in English only) of, say, The Aeneid.
maybe there isn't a widespread interest in this, and that's why it never developed. alt.rec.movies (or whichever usenet group) imdb grew out of obviously filled a need for lotsa people...why didn't this happen with books? maybe it's too late to start now.
still, I'd be interested. as it stands now, people discuss movies, actors, etc in their blogs and link w/out thinking to imdb pages. for books, they end up linking to amazon (or B&N or, even more rarely, booksense). none of those sites give quite the same depth of info on a book as imdb does on a movie.
What programs, if any, do those of you reading /. use to keep track of your books? If you were to start an open source internet book database project, what features would you include in it?
What purpose would such a database serve? CDDB/freedb, for example, allow us to download the album titles automatically. Saves everyone a lot of tedious work. Obviously, you're not going to be doing this for books.
As a graduate student, I maintain a single text file of all articles and texts that I've ever referenced. Each entry has a unique identifier (UID), which I use in my own articles instead of typing the full reference. A shell script then updates the references and BibTeX automatically generates the bibliography.
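A minimal sketch of that kind of workflow, in Python rather than shell: references live in a dictionary keyed by UID, and a small function renders one BibTeX entry. The UID `knuth84` and the field layout are examples invented here and are not the poster's actual file format.

```python
def to_bibtex(uid, entry):
    """Render one reference as a BibTeX entry string."""
    kind = entry.get("type", "book")
    # Every key except "type" becomes a BibTeX field, in insertion order.
    fields = [(k, v) for k, v in entry.items() if k != "type"]
    lines = ["@%s{%s," % (kind, uid)]
    for name, value in fields:
        lines.append("  %s = {%s}," % (name, value))
    lines.append("}")
    return "\n".join(lines)


refs = {
    "knuth84": {
        "type": "book",
        "author": "Donald E. Knuth",
        "title": "The TeXbook",
        "publisher": "Addison-Wesley",
        "year": "1984",
    },
}

print(to_bibtex("knuth84", refs["knuth84"]))
```

In an article you would then write `\cite{knuth84}` and let BibTeX do the rest. A shared online database would only help here if it could hand back records in roughly this shape.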
I could see where it could be useful to have a centralized resource that could automatically download those references - but only if it was quicker/easier than typing it in myself (and that only takes a couple of seconds).
What other purposes would such a database serve? How would it make my life easier?
This site has some stuff on using barcode scanners (including the ever-popular cuecat) to catalog books...
I personally would like to catalog my collection with a relatively decent amount of information, but who wants to sit there and type all that stuff in?
I agree that the trick would be to keep a database from going to the Dark Side like CDDB did...
Is there some way this could be donated into the public domain or something from day one?
Maybe by making the source available under the GPL, and making the ability for different instances of the database to exchange information with each other be a part of the project?
That way anyone with a T1 and a fairly large disk could have his own bookDb.
That way, no single entity would be in exclusive control of the data.
On the other hand, no two databases would be exactly the same.
Hmm...
Database design is not my field really, maybe I should shut up, and just write a few frontends to the db once someone has dreamt one up...
"First lesson," Jon said. "Stick them with the pointy end."
I keep information on all the books I've read in my brain.
--WH--
I use a technology called BookShelf. Since all books have an identifying mechanism on their spine, it is relatively easy to optically scan the books for their unique identifier and select it based off of that.
I have a website. It's about Macs.
the great thing about cddb is that as soon as you put the cd in grip knows what the tracks are..... now i've searched my puter for the drive tray that takes books but i'm having a hard time locating it....
or do i need to buy all new books with IR or wifi???
-------
Drink Coffee - Do Stupid Things Faster And With More Energy!
Too bad the Cue::Cat makers are no longer in business. Good idea, bad timing I suppose.
Reply to this comment with contact info and we could start working on the details.
Klerck, thank you for once again brightening up my day by exposing slashcode as the tangled mess of shitty perl that it is.
Most large university libraries have free (beer) databases that typically contain huge numbers of books (many that are not held by the library).
For example, see mirlyn.web.lib.umich.edu and sign in as a guest and you can do all sorts of searches.
These libraries typically use the Z39.50 standard to connect. Z39.50 is a pretty decent standard; it is widely used and allows you to connect to many, many databases.
Sounds like this could be what you're looking for.
Moron. It says right in the story that he did use Google. Sheesh
You can add entries here for ANYTHING with a standard UPC, so some books are in here. Very useful.
The Book-Scanning Project
This guy wrote some Python scripts to convert UPCs to ISBNs - it can be done - and then feed them into Amazon's search engine. Very interesting, and he's already done it, so he has some experience.
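For the curious, here is a rough sketch of the easy case, and not necessarily how the scripts mentioned above work. It assumes the scanned barcode is a 13-digit EAN starting with 978 (the "Bookland" prefix); in that case the ISBN-10 is the nine digits after the prefix plus a recomputed check character. Older UPC-only barcodes need a lookup table and are not handled. The function names are my own.

```python
def isbn10_check_digit(first9):
    """Compute the ISBN-10 check character for the first nine digits."""
    # Weights run 10, 9, ..., 2 over the nine digits.
    total = sum((10 - i) * int(d) for i, d in enumerate(first9))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def ean13_to_isbn10(ean):
    """Convert a 978-prefixed EAN-13 barcode to an ISBN-10 string."""
    digits = ean.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit():
        raise ValueError("expected 13 digits")
    if not digits.startswith("978"):
        raise ValueError("only 978-prefixed barcodes map to ISBN-10")
    # EAN-13 checksum: weights alternate 1, 3 across all 13 digits.
    weighted = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits))
    if weighted % 10 != 0:
        raise ValueError("bad EAN-13 check digit")
    core = digits[3:12]
    return core + isbn10_check_digit(core)
```

For example, `ean13_to_isbn10("9780201563177")` gives `"0201563177"`, which can then be fed to whatever lookup you have.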
You dead right about not needin no index or nothin online that is already in the book.
But peep this: What if there was some kinda database that had the whole book in digital form, but was only searchable instead of readable. That way when fools need to find out instances of a certain word or phrase in a book, they could find out on the web, and get page numbers to look up in they hard copy. It'a keep the publishers happy cause you wouldn't be able to pirate they material, and it'a keep readers happy cause they'd have some of the digital benefits that they cain't get wit a paper book.
Na'am sayin?
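A toy version of that "searchable but not readable" idea, as a Python sketch: build an index from each word to the page numbers it appears on, and expose only the lookup, never the page text. This handles single words only (phrase search would need word positions too), and the function names are invented for the example.

```python
import re
from collections import defaultdict


def build_page_index(pages):
    """Map each lowercase word to the sorted page numbers it appears on.

    `pages` is a list of page texts; page numbers start at 1.
    """
    index = defaultdict(set)
    for number, text in enumerate(pages, start=1):
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(number)
    return {word: sorted(nums) for word, nums in index.items()}


def find_pages(index, word):
    """Return the pages containing a single word (empty list if none)."""
    return index.get(word.lower(), [])
```

A public site would keep the page texts private and publish only something like `find_pages`, so a reader gets page numbers to check in the hard copy and nothing more.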
Although it's probably just a small subset of what you're looking for, Project Gutenberg is a database of books which are out of copyright. Since it only contains books out of copyright, it is able to give you the book contents as well. Not useful for looking up the NY Times bestseller list, but if you need to look up something from The Art of War or Macbeth, see http://promo.net/pg/ and download the whole book as a text file.
char *mySig;
jackass, enough said... obviously he wanted community aid as he plans on starting such a project if there is not a better solution out for free or relatively close... now STFU/STFD... if you are not part of the solution you are part of the problem
What would really be great is having every book ever written on the internet in full text (not just a summary), stored in a database (like Google does for web pages). Just imagine being able to type in a search phrase and search the text of every book that was ever printed.
I wonder about the possibility of this based on:
1. storage space
2. database efficiency for all that information
3. most importantly: copyright laws.
Probably an agent provocateur for the Authors Guild trying to incite the Open Source community into writing a book tracking list for them to use to keep track of the livelihood-stealing activities of that awful Jeff Bezos and those bastards at Half-Price Books.
--
"Outlook not so good." That magic 8-ball knows everything! I'll ask about Exchange Server next.
I been workin in book distribution fo 3 years, mainly with our database of book infomation. BIP's data is all fsked up, incomplete and outta date. They still ain't getting regula updates on $hit because they technology is so outta date, they can't handle it.
Another case: Ingram, which sells infor to booksellers ova iPage. They was hand typin infomation into they database up till LAST YEAR. And people was PAYIN fo it?
Baker & Taylor is the only database that I would trust at all. I hear they feed Amazon.
Na'am sayin?
OK. Try to imagine that you own more than 100 books. I know it's hard, considering that you currently own <20, but just try. Now, imagine that you would like to have the ability to easily track book loans to others, track where books are stored, show others on the web what books you have for loaning, etc. Do you really want to sit down and do data entry on every book you own? Since you are a geek, you got a free Cue::Cat, so why not just scan in the ISBN and do a lookup over the net instead of typing it all in? I'll tell you why not. BECAUSE THERE IS NO GOOD DATABASE. Sure, amazon.com has some stuff, but it's a pain in the butt to parse their text, and when they update their site format, your 31337 perl script is hosed.
Sheesh! Just because something isn't electronic doesn't mean that there is no reason to store pertinent information in a database.
Having helped my mother do data entry on her collection and having done volunteer work in a small library I can tell you that what the author describes would be very nice.
I have not followed freedb much, but I suspect the software that runs it could be modified to work with books. The freedb software is under the GPL and uses a MySQL database backend. Someone looking to procrastinate could probably have a working book database within a few days.
--Ben
Seems to be a project of 37signals. Some interesting work in their portfolio.
Wouldn't it be nice if you could take a digital photo of your bookcase and have it be automatically converted into a list of books?
If such a database also had spine/cover info that lets a program do automated recognition, this would be possible. Then you could put them all up for sale. Or you could look up a book without having to keep your bookcase organized.
You can put your CD and your DVD into your computer, which is why it's handy to have an online database to auto-lookup the details. It's hard to put my copy of Good Omens into my 3 1/2 inch floppy drive, and who the hell would want to type in an ISBN number? And who would even remember an ISBN number? :)
"It's here, but no one wants it." - The Sugar Speaker
Check out bookcrossing.com. You can have your own bookshelf. Just type the ISBN, it retrieves the cover art, the author and all that. You can fix it too.
I use it, I like it.
"Piter, too, is dead."
...then I guess there is no point. I have 5 at home and 2 at work.
The important thing is it outputs XML, so if you want to build an interface to it for your own application, you can. It's not a 100% complete database, but it should give you basic information on any book available.
I wrote this specifically for external search engines back when XML was the new hot thing. Funny thing is, the sites that search us usually want an FTP data feed, so this doesn't really get used much. But again, feel free (be reasonable if you use a bot - maybe limit your bot to a search every 5-10 seconds, please).
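If you do point a bot at a service like this, something along these lines keeps it polite. It is a generic Python sketch, not tied to this site's interface: a small helper that makes sure successive requests are at least a fixed number of seconds apart. The class name `Throttle` and the 5-second default are choices made for the example.

```python
import time


class Throttle:
    """Block so that successive calls are at least `min_interval` seconds apart."""

    def __init__(self, min_interval=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the class can be tested
        self._sleep = sleep
        self._last = None      # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Create one `Throttle()` for the whole bot and call `wait()` right before every search request; the first call returns immediately and later ones sleep only as long as needed.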
No, Thursday's out. How about never - is never good for you?
I am writing my own catalog with MySQL/Perl for several reasons.
1) I don't have enough space in my tiny room to fit all my books into bookcases, but with the db I can put some books in boxes in the closet and easily find out in which box a certain book is.
2) I want my books sorted according to a standard classification system but still be able to have them in my own way in the bookcase. Currently I use a heavily outdated (1987) Swedish classification system that the kind folks at my school library lent me. So I'll definitely take a look at the Dewey Decimal system mentioned earlier.
3) I have books in several languages and with a db I can have the same kind of information on different books in different languages in the same place. Thus I don't have to look up the romanization for the Kanji (Chinese characters in Japanese) more than once. But of course it will store the original Kanji titles as well.
4) I can easily create lists of books that I want to buy, that friends have borrowed from me, or that I have borrowed.
When it's finished I want it to handle 2-byte languages in a nice way, be compliant with existing standards for book classification, both Swedish and international, allow for easy list creation and have a nice interface.
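To make the list above concrete, here is a small sketch of what such a catalog could look like, written with Python and SQLite instead of MySQL/Perl so it runs anywhere. The table and column names are guesses for illustration, not the poster's actual schema, and the wish list from point 4 is left out to keep it short.

```python
import sqlite3

SCHEMA = """
CREATE TABLE books (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,          -- original-script title, e.g. kanji
    title_romanized TEXT,         -- looked up once, then stored
    author TEXT,
    language TEXT,
    classification TEXT,          -- e.g. a Dewey or national class code
    location TEXT                 -- shelf or box label
);
CREATE TABLE loans (
    book_id INTEGER NOT NULL REFERENCES books(id),
    borrower TEXT NOT NULL,
    returned INTEGER NOT NULL DEFAULT 0
);
"""


def open_catalog(path=":memory:"):
    """Open (or create) a catalog database and set up its tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def where_is(conn, title):
    """Return the shelf or box for a title (original or romanized), or None."""
    row = conn.execute(
        "SELECT location FROM books WHERE title = ? OR title_romanized = ?",
        (title, title),
    ).fetchone()
    return row[0] if row else None


def on_loan(conn):
    """List (title, borrower) pairs for books not yet returned."""
    return conn.execute(
        "SELECT b.title, l.borrower FROM loans l "
        "JOIN books b ON b.id = l.book_id "
        "WHERE l.returned = 0 ORDER BY b.title"
    ).fetchall()
```

SQLite stores text as Unicode, so original-script titles need no special handling here; with MySQL you would want to pick a Unicode character set for the tables.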
Many large library databases are searchable with a protocol called Z39.50. There is a Perl module implementing this protocol (among many others). Check out http://perl.z3950.org/ for full docs. The reason you get back complex stuff when you do a search should be obvious if you ever read the cataloging information about a book in a library catalog. There's a lot of stuff there. If you're using this for making a catalog of your private library, do a "known item search", for example using ISBN.
Have you ever heard of Project Gutenberg? It is basically doing what you are talking about and has been since the 1970's. They have a pretty good collection, and I would totally suggest anyone interested in an internet book DB to help them out with their cause. Although I see your point that a full index of all books (without content) would be a pretty cool thing to have.
"Your 'Gin n'tonic Futon Brain' sure makes you smart!"
"That's 'Positronic-photon Brain', you idiot!"
Does anyone know where I could get a Sports Statistics Database, either free or reasonably priced? In particular NBA game by game Stats?
Readerware uses the Hsqldb Java Database as its backend. The author doesn't document the fact very well, but you can access the database directly, if you have the client-server edition. I've used the SQuirreL Java-based SQL client to browse through the database on a number of occasions, and I'm planning on using it soon to do some data cleanup. (Sometimes the Python interpreter that Readerware uses to parse the booksellers' webpages gets confused.)
Alternately, the Readerware interface does provide an export capability, so if you really wanted to, you could just export all of your data and then reimport it into a SQL database you're more familiar with.
IMHO, the guy who develops Readerware puts a lot of work into it, and I'm happy to pay him a little for such a useful program. Although an open source version of Readerware would be nice, it'd take a lot of work, and Readerware serves my purposes fine.
For all those guys that are wondering "What's the point?", imagine this scenario:
You're walking into a bookstore. You see a book you don't know, and would like to know more about it. You take your Palm (which of course has a wireless connection to the Net), and type in book:isbn:90-6565-781-9 (yes, that's a real number) in the address bar.
Now, you get all the information you want. Links to professional reviews (with micropayments if you want to read them), user-contributed reviews, offers from other shops, second-hand offers, other books by this author, translations of this book...
Say you are looking at a webpage of someone with similar interests as you. He recommends a book he likes, by giving it a link to book:isbn:90-274-3184-1. You can click that link, and go to the same information as in the previous scenario, but this time, you want to know where to buy it online, offline, where you can find the closest library that has this book available, reserve the book there, ...
If you have read the book, you would like to write a review about it. Ok, you go to the site, you write a review, submit it, after which other readers can give your review a moderation, which influences your karma, which influences your next review, and so on...
Are you getting the point? The individual components of this idea already exist (Amazon.com, isbn.nu, epinions.com, [yourlibraryhere].org, ...) but linking them together would be one of the greatest creations ever made on the Web.
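As a rough idea of what a client would do with the book:isbn: links described in this scenario, here is a Python sketch that pulls the ISBN out of such a link and checks its check digit before looking anything up. The `book:isbn:` form is this comment's proposal, not a registered URI scheme, and the function name is invented for the example; only 10-digit ISBNs are handled.

```python
def parse_book_uri(uri):
    """Parse a hypothetical 'book:isbn:<ISBN-10>' link and validate it.

    Returns the ISBN without hyphens; raises ValueError if malformed.
    """
    scheme, _, rest = uri.partition(":")
    kind, _, value = rest.partition(":")
    if scheme.lower() != "book" or kind.lower() != "isbn":
        raise ValueError("not a book:isbn: link")
    isbn = value.replace("-", "").upper()
    if len(isbn) != 10 or not isbn[:9].isdigit() or isbn[9] not in "0123456789X":
        raise ValueError("not an ISBN-10")
    # ISBN-10 check: weights 10..1, with X standing for 10, must sum to 0 mod 11.
    total = sum(
        (10 - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(isbn)
    )
    if total % 11 != 0:
        raise ValueError("ISBN check digit does not match")
    return isbn
```

Both numbers quoted in the scenario pass this check, so `parse_book_uri("book:isbn:90-6565-781-9")` returns `"9065657819"`. What the client does next (which review site, shop or library to ask) is the part that still has to be built.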
I was suspicious of this price because every little bookstore I go into seems to have online access to BIP, so I checked on the Books in Print web site. They have a sliding rate, even a free trial. Unfortunately I couldn't get price details because their web site crashed my Netscape browser.
OCLC offers much less expensive databases of books. Their WorldCat database includes 47 million bibliographic records. Based on a quick look at their site, it seems that only member libraries who share their databases with OCLC have access to WorldCat. However, I suspect that a free, publicly available book database could negotiate membership.
Note: for participating libraries, the cost of WorldCat is much less than $30K. (I don't know how much, but I know that the public library where I used to work could never afford a $30K subscription to anything, but we did have WorldCat access.)
"The dinosaurs died because they didn't have a space program." - Niven
I'm not sure about NBA, but the kick-ass free Lahman MLB database is available at baseball1.com. It's got stats going back to 1871...
One useful thing about it in its current form, by the way, is that it will do a realtime search of various book sites (those evil patent-wielders at Amazon, BN.com, etc.), and display a table letting you compare by price or reported delivery time. So that's pretty cool.
John
lies.com
I agree that this is definitely very pretty web design -- but both singlefile.com and 37signals.com have one fundamental flaw which just drives me up the wall -- there is no way to visually find hyperlinks in the page. I was forced into scanning my mouse cursor over the page to find them (or tabbing through them all, which is even worse). Enough to drive me right off the website...
When we were moving, I needed to catalog my books since they're living in boxes until I can afford new bookshelves. There was nothing really out there except a bunch of pieces, so I assembled them and re-wrote some into a ColdFusion app that scrapes Amazon, BN, and AmazonUK for titles and cover images and such into a SQL database. Input comes from a cuecat scan or typed barcode or ISBN. Pretty basic, just for me and my family, and yet I considered sharing it with the world... but I can't afford to have my bandwidth hit. If you have a volunteer to host or want a copy let me know at booklist @ webplumbers dot [skip this] com.
UserAdvocate: The voice of the user
Try the Australian mirror as well. Because of discrepancies in the way countries expire copyright, many books are listed on the Australian site that are not available on the main repository.
And before you go bagging fullscale on the US; There are many books listed on the US site that are not on the British. We're not the worst! =)
http://www.gutenberg.net.au
.sig: Now legally binding!
Until book publishers add something like a magnetic stripe, like on credit cards, or something similar, a real Internet book database? Not within my lifetime.
Well, and for at home, 1000 plus or so books, a flat file database will do for sure.
Ugh, Hardcopy....
We had a room full of tech-books that had accumulated since before I started. When the PHBs would have dumped the whole lot in a skip, I spent the better part of a week sorting through that lot, Addison-Wesley X11 reference, Adobe Postscript reference, C, Fortran, a complete set of A/UX manuals, etc. etc. etc. No-one in the organisation even knew they existed anymore. Being able to zap the barcodes through a reader and generate a catalogue on our intranet would be cool, there are another three departments that have similar geek rooms that are always one zealous PHB away from the dumpster.
Hmmm... I am sure there must be ISBN search facilities I can screen-scrape.... Coworker types ISBN of their newly received book into a front end and voila! I can grep for it.
Xix.
"Everything is adjustable, provided you have the right tools"
The nice folks at isbn.nu have a database you must check out. Try http://www.isbn.nu/0201563177 for example.
Hasn't anyone heard of isbn.nu? I use the site almost daily, it's also got links to buy the books on retailer's sites. That, of course, is their source of income (referral fees).
I think a book database could be pretty interesting just as a central ISBN/publisher/year/author reference. (Yes, Google is wonderful, but you never know what context an ISBN match is going to be in; the whole point of having a central resource is consistency.) But then, my wife and I have a living room lined with bookcases, and the bookcases are starting to encroach on our hallway and bedroom too. :)
But you could do some pretty interesting stuff with an IMDB-style book database, at least for fiction. I'm picturing entries for fictional characters and locations, along with birth and death dates, even user-moderated (Wiki?) biographical sketches where available, cross-referenced by author. Instant encyclopedia of Arkham/Castle Rock... cool!
But even outside of a single author's oeuvre, there would be great cross referencing stuff you could do.
Say I read and really liked a detective novel that takes place in Los Angeles in the 1940's.
It would be pretty cool to have a reliable database where I could plug in the ISBN of the book I just read, and get a cross-referenced list of other books set in the same time/place/genre - without the busy, sales-oriented "You might also like" mess you get from a site like Amazon.
Maybe include a user comments section, if there's some sort of meta-moderation available - point-missing/inane/poorly written Amazon user reviews instantly send me into a blind rage
-Oh, and you could do automated metasearches with the new Google API, too
Check out EndNote. You can search the Library of Congress and dozens of university libraries. Very handy.
Did I read that right? You mean that title, author, subject, date, and category are not searchable fields? It's impossible to search the contents of a book for patterns? It's not easy to store/index a book's content in the database itself?
Perhaps you typed that wrong, or I'm misunderstanding you, but that statement sounds profoundly false.
You might want to look into Project Gutenberg, because they do that. (If copy restrictions were shorter, they would have tons more stuff too.)
I have often asked this same question: basically what I want is an All Music Guide for books.
The site would allow you to look up an author and see his or her released books in chronological order by year published. You can't get that from Amazon because it's too clogged with marketing crap and duplicate listings of the same books.
Google and other tools are useful if you know what you are looking for, but for browsing a list of an author's collection there is no reliable source.
Hmmm, if this was 1995 I would smell a business plan brewing...
What you're basically proposing is a way to share bibliographic metadata -- not the book itself, but table of contents information, library holdings, etc. There are standards amongst libraries for doing this (ANSI/NISO Z39.50 and AACR2--both of which are horribly abstruse and generally a pain to deal with). Dr. Rob Cameron, along with a small group of Simon Fraser University students, has been working on the seeds of a system for sharing bibliographic metadata -- see http://www.usin.org. This basically extends the URI standard to support ISBNs and ISSNs, initially to support scholarly communication, but also making it possible to create what we call "personal bibhosts" with support for annotations, shared notes, etc. Among other things, we've implemented searches across various worldwide libraries to obtain and compare bits of bibliographic info, and so forth. Yes, you still run into the problems of inconsistent data for a given ISBN/ISSN (as a previous poster pointed out), but hey...you have to start somewhere!
I've been working on a project, OpenCritic, which aims to build an IMDB-style, open platform for the collection and cross-referencing of book/movie/music/etc. data.
The goal is to be a) cross-media, i.e. supporting lists and articles that interrelate different media; and b) open content, which is to say, done with a GPL-style license to guarantee open access to it in the future (unlike IMDB or CDDB).
You can check it out and see the project mailing list at http://www.opencritic.org
spdinpdx wrote:
Also, once you had the database, you (or anyone on the net) could add stuff like scanned images or ASCII versions of the TOC and bibliography (so you can check from work whether that book at home is worth driving back for at lunch), reviews, recommendations for related books, links to or copies of related papers...
---- Tim McCormick http://www.tjm.org
is a database in most big libraries,
especially university libraries.
has tons of stuff.
but you are smacking into the heart
of things here. information analysis and
gathering on printed matter is a huge business.
some of the largest multinationals in the
world, reed elsevier for example, do nothing
but organize and provide access to electronic
databases of printed matter.
these companies, like firstsearch, own the
copyrights on these databases and they are not
going to give them up anytime soon. in some
cases even using a system's page numbering
can get you in trouble, like when Westlaw
tried to shut down a rival lawbook publisher.
Those lawbooks, like you see in every lawyer
show on TV, they are gold with a red stripe,
are simply written versions of court cases,
that have been annotated and organized and
indexed.
yes you can get all this on computer, but
the companies that own this information
are not going to give it up easily.
Something like this is going to have initial and ongoing costs. Even if it is developed under an open license, there should be some provision made for commercial use and licensing, but not ownership, of the database once completed.
The alternative is to have the project run out of money, and be bought, probably by a business, and then commercialized anyway.
The best projects will always be those that balance the commercial aspects with the public interest aspects.
It is an excellent idea, however.
Abebooks has software (free) called Homebook. It is a database management program that allows you to enter the ISBN number and the program goes out and retrieves the rest of the information for you. With a bar code reader this is very nice.
This is good for keeping your personal database. BIP is not $30K a year but is still too expensive for individuals. When I owned a small bookstore I used to get Books in Print from the library when the new editions came in. Sometimes you can arrange to get the older copies from large bookstores. Everyone uses the CDs now instead of the books.
BIP only supported Windows last time I checked.
"The Omelette" - A retort to Malda's Omelette analogy.
Let me try to give you an analogy for Slashdot's homepage.
Yes, please liken something to something in a cliché staid analogy because we the reader are too stupid to understand any overly complex and high level reason why you can't explain yourself properly. Either that or you are full of crap, don't know what you are doing and are lucky as hell to have what you have.
It's like an omelette: it's a combination of sausage and ham and tomatoes and eggs and more.
It is a motley collage, a miasma, a montage of eclectic and seemingly unrelated things. It may be a myriad of unrelated things, related at only the most abstract levels. It certainly isn't an omelette.
Over the years, we've figured out what ingredients are best on Slashdot.
What critical acclamations have you had that make you think this is so? Just because you get a lot of hits, and subsequently subject your readership to unwanted bandwidth consuming detritus, doesn't mean you know what's best. It is just like a Reynolds family member claiming they know what's best for them, nicotine and smoke are not unhealthy, and then they die of lung cancer. You are an egotistical megalomaniac. If this site was run based on a meritocratic method rather than juvenile selfishness, it would have serious potential.
The ultimate goal is, of course, to create an omelette that I enjoy eating: by 8pm, I want to see a dozen interesting stories on Slashdot.
The ultimate goal is to please yourself, to feed your id. You have no desire to please the community by which you make your living. You are selfish, sheltered and removed from your community. You are on a one way soapbox, a pulpit, and you talk at people. I would probably include you in a list of people I would kill if I could get away with it.
I hope you enjoy them too.
I do not.
I believe that we've grown in size because we share a lot of common interests with our readers.
Mobocracy is good? You would rather collect people without regard to quality. This means nothing. Budweiser is the most consumed beer, but it's garbage. This is analogous to Slashdot, to stoop to your food and beverage analogy. Bud beer. It's good because a lot of people drink it. No, no. Don't bother trying to get critical acclamation. Don't bother, you know as long as you "control" Slashdot, you never will.
But that doesn't mean that I'm gonna mix an omelette with all sausages, or someday throw away the tomatoes because the green peppers are really fresh.
So serving rotten food is acceptable how? It's better to keep your silence and let people wonder if you are a fool than to speak up and remove all doubt. "Gonna." Pathetic. Simply pathetic. This is a hick like expression, akin to something on the order of, "I'm gonna open a can of whup ass on him for peggin Mary Joe Susie Lee."
There are many components to the Slashdot Omelette. Stories about Linux. Tech stories. Science. Legos. Book Reviews. Yes, even Jon Katz.
Jon Katz is the worst thing about this place. If it isn't the wasting of my bandwidth that I pay for, it's this that bothers me the most. On a sidebar, I would like to hold you and the rest of the scum who send ad banners to my connection legally liable for unwanted bandwidth usage. This crap half the time doesn't even come from your site. It would be less of an affront if you stored your vile ads on your own site, but you took the easy way out and decided to outsource the production of garbage to similarly-devoid-of-ethics people with slightly more intelligence and infrastructure to provide this illegal content.
By mixing and matching these things each and every day, we bring you what I call Slashdot. On some days it definitely is better than others, but overall we think it's a tasty little treat and we hope you enjoy eating as much as we enjoy cooking it.
Grotesque things are often of huge interest to people. This holds true with me in regards to Slashdot. I hate you, I hate Jon Katz, I hate most of the content here. Some of the best stuff is written at -1. You would suppress those who are different while you are "different like everyone else," just another marginally educated half assed "programmer" who on the scale of things lucked out even more so than Bill Gates (reason: I would assume your IQ is probably his divided by 2 or 3 and you aren't working at a McDonald's where you should be). Whenever you have participated in a discussion thread, you are obnoxious, rude and ungrateful. Your policies are horrible, your content is basically a smattering of other people's work and you benefit from this. Your web page reeks of someone who completes nothing that he starts. Your obsession with anime is a testament to how juvenile you are, your spelling is horrific, your grammar is oft questionable; you are a poor editor Mr. Malda.
I hope only the worst outcomes for any and all of your endeavors henceforth. I hope your fiancée or if you are lucky, your marriage falls apart. I hope your Jubei breaks. I hope you lose your job. I hope that you fail because you are displacing true talent.
Answered by: CmdrTaco
Last Modified: 6/14/00
Prok Fried rice.
For a centralized catalogue of free documentation (FDL and alikes) take a look at http://www.gfdd.org. It's both a tool (php+pg) and an index.
Still beta stage though.
bye
So I scan or type in the ISBN, a perl script grabs the book's information from the LOC (via Z39.50), and when I'm done, the system spits out a list of books in LOC order with the Title/Author next to it.
And what better to scan the ISBN with than a Cue Cat. My mother has about 400 paperback romance novels, and every time she goes to the bookstore, she can't figure out if she's read that book yet or not. She picks a book up, reads two pages, and says "I can't tell if I've read that one before or not." (Of course, I ask her how can she tell?) A Cue Cat and a CDDB style book database would allow me to scan the barcode and catalog every one of her books very quickly so she can bring a printout to the bookstore with her.
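For anyone wiring a scanner into a catalog like this, it helps to reject mistyped or misread numbers before looking them up. Here is a rough Python sketch of the standard ISBN-10 check-digit test; the function name is mine, not from any existing tool.

```python
import re


def is_valid_isbn10(raw: str) -> bool:
    """Return True if raw is a well-formed ISBN-10 with a correct check digit."""
    # Hyphens and spaces are only for display; drop them before checking.
    s = raw.replace("-", "").replace(" ", "").upper()
    # Nine digits followed by a digit or X (X stands for the value 10).
    if not re.fullmatch(r"[0-9]{9}[0-9X]", s):
        return False
    values = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
    # Weights run from 10 down to 1; a valid number sums to a multiple of 11.
    total = sum(weight * value for weight, value in zip(range(10, 0, -1), values))
    return total % 11 == 0
```

A scan loop could call this on every line from the reader and set aside anything that fails for retyping by hand.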
Why would anyone start/support such a project non-commercially? As soon as Internet users realise they're not going to get everything for free (and no, even 'free' software is going to cost 'nothing'), then the sooner the Internet will become a useful portal to useful things.
Sorry for getting worked up about this, but a lot of guys sitting here moaning about what 'should' be out there really naff me off - if you want it for nothing then go build it, if you can't be bothered to go build it, then don't expect other ppl to, and *certainly* don't complain when they don't.
There are lots of publishers whose catalogs never make it into the listing. Also, as we might gather from the title, it only covers books in print, which obviously excludes a huge range of publications.
A database would be a great idea for this reason... (though it doesn't necessarily have to be centralised).
check it out .... http://ISBN.nu ...
better than ANYthing out there.
This idea about a master book database is fine, but what we REALLY need is a nonprofit entity that hosts and preserves reader comments.
Amazon now absolutely dominates the book industry because it got there first. Book critics (like me) use Amazon to publicly share our comments and criticism.
Amazon does a great service, but it resides in the commercial sphere. What if they decide that you need to pay a subscription to have access to the reviews? What if they decide that the author can't republish the reviews elsewhere? What if Amazon goes out of business? Eventually competitors will spring up, and Amazon will no longer wield the influence it does. But we need a centralized solution to store these reviews.
I do agree that this would pose certain programming challenges, as well as legal challenges (i.e., how to moderate postings? how to limit space? how to protect against libel?).
Ideally this sort of database should exist for all the arts, but I suspect that there are advantages to separating them.
Robert Nagle, Idiotprogrammer, Houston
I have several thousand books, and I want to catalogue them for insurance purposes (they're not worth anything, I just want a list in case something happened to them, like fire).
I'm not particularly competent with scripting so I've just been using endnote (www.endnote.com) which is a bibliographic tool, to look up book information and download it to my computer. I then export that information to an Access database. I've also done a similar thing for my CDs.
It is extremely tedious. At work in our library we catalogue everything ourselves because we can't afford access to the big databases like Kinetica.
An open source system would be extremely expensive, because acquiring all that data from LOC, BIP, etc will cost you money. Not to mention the fact that it is hardly comprehensive. Unless you only own run of the mill paperbacks, you will find that there is almost no information left about your books. Knowledge about classic texts from the 60s even, is starting to disappear as last copies are discarded.
I would like to think that the American Library Association or similar could get involved with such a project if it started up, but I doubt that they would.
Well, one could just use the very mature IMDb engine and just modify it. This would work for all kinds of databases.
You can get the source at ftp.imdb.com. Little known fact. You can get all the files used there, including the database. I don't know the license used, though.
-twb
Reasons why a book database is much harder than a CD database:
1. There is generally only one edition of every CD. There must be hundreds of editions of (say) Pride and Prejudice. Do you keep one record in your database, or many?
2. How do you uniquely identify a book? CDs have track numbers and lengths (and maybe digital IDs?) which are always the same. Some books even change their titles between editions. LoC control numbers and ISBNs only apply on a per-edition basis.
3. Performing lookups will be much harder because you have to figure out the ID beforehand and enter it manually, as opposed to just popping a CD in the tray and letting the computer figure out the ID. This will mean fewer people adopt the system. (Bar code scanners are your friend, in this case.)
This is why library science is such a huge discipline.
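One way to answer the "one record or many" question is to keep both: a record for the work and a list of edition records under it, with the ISBN and LoC number living on the edition. A minimal Python sketch follows; the class and field names are made up for illustration and not taken from any cataloging standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Edition:
    """One published edition; identifiers apply at this level only."""
    title: str                    # titles can change between editions
    publisher: str
    year: int
    isbn: Optional[str] = None    # older editions may have no ISBN at all
    lccn: Optional[str] = None    # Library of Congress control number


@dataclass
class Work:
    """The abstract book that all of its editions share."""
    canonical_title: str
    author: str
    editions: List[Edition] = field(default_factory=list)

    def find_by_isbn(self, isbn: str) -> Optional[Edition]:
        """Return the edition carrying this ISBN, or None if there is none."""
        for edition in self.editions:
            if edition.isbn == isbn:
                return edition
        return None
```

A lookup by scanned ISBN then lands on an edition, and the user's "do I own this book" question gets answered at the work level.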
SearchDay - Curling Up with a Good Book Search Engine.
One thing that you might want to consider is that
the description of digital information is often
unique. For example, CDDB and freedb both use
the TOC data and the disc's running time as a
means to identify the disc, as these are likely
to be unique.
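For the curious, here is roughly how that disc ID gets built, as I understand the freedb scheme: a digit-sum checksum over the track start times, the running time in seconds, and the track count, packed into 32 bits. This is a Python sketch from memory, so check it against the freedb documentation before relying on it. Offsets are in CD frames (75 per second), and the function names are mine.

```python
FRAMES_PER_SECOND = 75


def digit_sum(n: int) -> int:
    """Sum the decimal digits of a non-negative integer."""
    total = 0
    while n > 0:
        total += n % 10
        n //= 10
    return total


def freedb_disc_id(track_offsets, leadout_offset) -> str:
    """Build an 8-hex-digit disc ID from frame offsets read from the TOC."""
    if not track_offsets:
        raise ValueError("a disc needs at least one track")
    # Track start times in whole seconds.
    starts = [offset // FRAMES_PER_SECOND for offset in track_offsets]
    # Top byte: checksum of the digit sums of every start time.
    checksum = sum(digit_sum(s) for s in starts) % 0xFF
    # Middle two bytes: running time from first track to lead-out, in seconds.
    length = leadout_offset // FRAMES_PER_SECOND - starts[0]
    # Low byte: number of tracks.
    disc_id = (checksum << 24) | (length << 8) | len(track_offsets)
    return f"{disc_id:08x}"
```

Nothing comparable can be computed from a paper book, which is the whole point: the ID has to come from something printed on it.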
Books are not so easily identified, as any
librarian familiar with field #300 in a MARC
record will tell you. As a book's edition
changes, this description can change as well.
I am not sure that I understand the problem AC is
trying to solve, but he might want to look at
this site hosted by those Sourceforge people:
http://www.oss4lib.org
Hope this helps...
These PHP programs are noteworthy.
sig: BeanShell: lightweight scripting for Ja
I am currently building a database of ISBN numbers with the following records: Title, Author, Publisher and Media.
It hadn't really occurred to me that others might like access to this kind of data as well.
Seriously, is there enough interest that it might be worth the effort to add a request interface that returned an XML object of the data that I have? Would others contribute to it?
I currently have 294,652 completed entries in my database. I'm out of work and bored, and I'll make it publicly accessible if I get some feedback indicating that it would be worth the effort.
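To make the request-interface idea concrete, this is about what the lookup side might look like in Python. It is only a sketch: the element names, the "found" attribute and the in-memory dict standing in for the real database are all assumptions, not a description of the actual system.

```python
import xml.etree.ElementTree as ET

# The four fields mentioned above, used as (assumed) element names.
FIELDS = ("title", "author", "publisher", "media")


def lookup_xml(isbn: str, records: dict) -> str:
    """Return an XML document describing the record for one ISBN."""
    root = ET.Element("book", isbn=isbn)
    record = records.get(isbn)
    if record is None:
        # Tell the client explicitly that we have nothing for this ISBN.
        root.set("found", "false")
    else:
        root.set("found", "true")
        for name in FIELDS:
            ET.SubElement(root, name).text = record.get(name, "")
    return ET.tostring(root, encoding="unicode")
```

Contributions could reuse the same document shape in the other direction, with the client posting a filled-in record for an ISBN the server reports as not found.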
-Chris
-- This sig is only a test. If this were a real sig it would say something witty. --
See related article/discussion on Advogato from a few days ago.
I put together a simple MySQL/ASP app (I know, sorry, it was before I learned PHP) to categorize my books. Still in progress, and I've only gotten around to adding about 1 shelf out of the roomful of books. It primarily uses author/title info for storing books, but also ISBN, publisher name, and pub date. For multiple copies, it keeps track of individual copies. It also keeps track of condition, and copyright date vs. printing date. It is VERY time intensive to add books, however.
my books site
I built one but it is very free-form and may not be what you want.
My objective was to do a quick keyword search on a list of 100,000 records from several different sources. Generally I have one line per book, and while some of the indices provide more information that is all I use.
I didn't want to spend the time to do a real database job and I wanted to use Perl regular expressions to do a quick keyword search within author and title text. So I keep recent indices next to the search program compressed variously with zip, gzip, or bzip2. I can direct the system to make a single text file which contains the unpacked text all appended together and compressed again. It will also list stats for each file.
Its main function is to wait for a keyword to be typed in, and it will immediately (PIII/450MHz Linux Inspiron 7.5K) display a numbered list of matching books, in alphabetical order grouped by index. You can then select certain numbers from the list, or reduce the number of records by adding more keywords. This is sufficient for me and has helped me discover unknown titles and new authors because of its way of narrowing down on information. Perhaps if I had more structured files I would have used Perl's BoulderIO which has solved bigger problems of library science in merging genome data files, see bio.perl.org.
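The approach is simple enough to sketch. The original is Perl; below is a rough Python equivalent of the core loop, assuming one book per line and plain, gzip or bzip2 files (zip archives and the interactive numbered selection are left out). The function names are invented for the example.

```python
import bz2
import gzip
import re
from pathlib import Path

# Pick an opener by file extension; anything else is read as plain text.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open}


def read_lines(path: Path):
    """Yield the non-empty lines of a plain or compressed index file."""
    opener = OPENERS.get(path.suffix, open)
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:
                yield line


def search(paths, keywords):
    """Return {index file name: sorted lines containing every keyword}."""
    patterns = [re.compile(re.escape(k), re.IGNORECASE) for k in keywords]
    hits = {}
    for path in paths:
        matched = [line for line in read_lines(path)
                   if all(p.search(line) for p in patterns)]
        if matched:
            hits[path.name] = sorted(matched, key=str.lower)
    return hits
```

Adding a keyword to the list narrows the result, which is the "reduce the number of records" step described above.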
OMG! An "Ask Slashdot" question in which the submitter checked search engines first!? :-)
Computer Science is no more about computers than astronomy is about telescopes. --E. W. Dijkstra
There is an open source bibliography management program called pybliographer that will eventually (I hope, as I am the one who is working on part of it) have Z39.50 client functionality. If you are a programmer who knows Python (or wish to learn), please stop by the website, read the discussions and source code (it's really not that bad, it only took me a few months to pick it up), and maybe help out? From what I've been reading on the mailing list, the main developers are working on a more robust way to store the data--eventually leading to perhaps a share-able database, maybe even adaptable enough to become the program that the parent post is alluding to.
Linux at home
I would love a multilingual book database.
As an amateur linguist I collect books in all the
languages of the world. I often try to get the
original language version of a novel that has
become famous in its English translation. This
type of information is very difficult to find on
the internet currently - especially for non-Latin
script languages such as Chinese, Arabic, Thai.
Such a database would require proper Unicode
support, standard romanization methods, and an
understanding that authors' names don't work the same
in all languages. How many times have I had to
look under both "G" and "M" in bookshops for the
works of Gabriel Garcia Marquez!
Books in Print only covers books currently carried by commercial distributors, and only those in English (I am assuming it covers US, English, Australian, Canadian etc, be it in several volumes or in one fat subscription).
We need something for books in other languages/countries (I am Spanish and own a sizeable number of South American Books).
In other news, Andrew Plotkin (Zarf of Inform fame) has a nice tale of his project to digitize his book collection catalogue: The Book-Scanning Project. Sorry if this is redundant, one really doesn't have the time to read aaaall other contributions.
http://barrapunto.com/ - News for nerds, en español
This kind gentleman has posted a solution for owners of books that have barcodes. Scan in to a text file, run the script and it gets the info from Amazon.
http://www.eblong.com/zarf/bookscan/
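One step any barcode workflow needs: the bar code on a book is a 13-digit "Bookland" EAN starting with 978, and the ISBN-10 is buried inside it with a different check digit. A short Python sketch of the conversion (my own illustration, not code from the linked page; it does not verify the EAN's own check digit):

```python
def ean13_to_isbn10(ean: str) -> str:
    """Convert a 978-prefixed EAN-13 bar code number to an ISBN-10."""
    digits = ean.replace("-", "").strip()
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected 13 digits")
    if not digits.startswith("978"):
        raise ValueError("only 978-prefixed codes map to an ISBN-10")
    # Drop the 978 prefix and the EAN check digit, keeping nine digits.
    core = digits[3:12]
    # ISBN-10 check digit: weights 10 down to 2, total must reach 0 mod 11.
    total = sum(weight * int(d) for weight, d in zip(range(10, 1, -1), core))
    check = (11 - total % 11) % 11
    return core + ("X" if check == 10 else str(check))
```

Each line of the scanned text file can be run through this before the lookup step.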
Are you spontaneously enthusiastic about everyone having everything you can have? - Buckminster Fuller
If you just want to import citations, the Z39.50 search and retrieval protocol is the way to import from your library catalog and many online databases. Indexdata has a number of multiplatform tools that you can use, such as YAZ (a Z39.50 client) and PHPYAZ. Three commercial packages import from Z39.50 sources nicely (Bookwhere, Procite and Endnote); both Procite and Endnote work well at managing your footnotes during word processing, taking care of numbering and layout (e.g. APA or Chicago Manual of Style, etc.).
If you want something under GPL and more oriented to managing web sites and other Internet resources, then you may want to try hypatia. You'll have to ask specially for it, but it's available. Here are the parts I've seen so far:
- Web-based interface, both end users and maintainers.
- Fully multi-lingual, including both interface and content. (It is very easy to add another language to the interfaces. Right now English and Spanish are complete, Norwegian and Finnish are being translated.)
- Support for Unicode (which means you're free to add interfaces in other scripts).
- Useable on many different platforms, including Linux, Unix,
and Windows.
- Individual installations can exchange records, allowing
federated content and service providers to work together
seamlessly. (Haven't tried it yet.)
- Compatible with relevant standards, including MARC, Dublin Core, and the
Networked Reference standard currently under development by NISO.
- Special features for digital collections, such as automatic
URL checking.
- Authority control over names (e.g. People and Organizations).
- Uses perl/MySQL/javascript
You can see the end user interface in production at the IPL in the serials, newspapers, or online texts collections. The collection management interfaces are even nicer and very useful. I'm sure it can be tweaked for data on legacy media as well.
Beta is broken and the link to classic doesn't work. Stop wasting our time or there won't be anybody left here.
You might want to have a look at project GnuteMberg! (not to be confused with project Gutenberg) subproject, GnuteMberg! Free Documentation Database. It's an effort to catalog all documentation covered by free (as in "free speech") licenses. It might as well be a starting point, and all the code is released under the GPL.
The post is about book metadata (author, title, ISBN number...), not text.
CDDB and FreeDB are not databases of MP3 streams. They just store and exchange metadata such as songs titles.
If you do OCR on books, and don't want to rip them apart, you have to put both pages on the scanner at once, or screw around an enormous amount. 11x17 works OK for this; these new scanners are really junky for doing this.
As a server-side app, it doesn't have to be distributed to others. So the GPL won't do any good. I like the interchange of info between databases, but that can be very hard to implement correctly. You'll run into all kinds of mismatches where you need to do the right thing (when do records denote the same book?). You can fix a lot of these problems by focusing on the ISBN, but you'll still run into inconsistent data and the like. There will also be security issues, you have to make sure that the guy who you synchronize with hasn't polluted his database with wrong data. Lastly, copying an entire database takes a lot of horsepower. You don't want people doing this to your database during busy hours.
Apart from these technical considerations, it's very likely that one database will become the standard that everyone uses, even when there is more than one choice (everyone will use the most extensive, most correct database). I doubt that there will be many synchronizations into this database, most people will be interested in making a copy of it.
All in all I think the best bet is to incorporate a feature that makes it easy to automatically dump the database to a mirror at a certain time (midnight or so). There's no good way to make sure that this actually gets done though, but you can always write a screenscrape application to punish the organization that doesn't supply you with a dump (scraping the entire database will hurt them bad). When the organization goes nasty, you can switch over to a copy and try to beat them with an open alternative.
The Drowned and the Saved - Primo Levi
What would be really valuable is a Google style index that pretended that every page of every print version of a book was a different web page and fully indexed the content. The access would have to be a little different, but I'd love to have it.
Basically, you could enter a search phrase, get a list of books it hit with excerpts, click on one of those and get a list of print versions, choose the print version you have and be told what page the hit is on.
The design could be bolstered by allowing you to enter the particular books that you have in your library and automatically narrowing down the lists.
It could also contain online material so as to allow you to search both the Internet and book libraries at the same time.
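The data structure behind this is an ordinary inverted index, just keyed by (print version, page) instead of URL. A toy Python sketch, with names invented for the example; it matches pages containing all the query words rather than exact phrases, and skips excerpts entirely.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9']+")


class PageIndex:
    """Toy inverted index: word -> set of (edition_id, page_number)."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add_page(self, edition_id, page, text):
        """Index the text of one page of one print version."""
        for word in WORD.findall(text.lower()):
            self._postings[word].add((edition_id, page))

    def search(self, query, owned=None):
        """Return sorted (edition_id, page) pairs containing every query word.

        If owned is given, only editions in that collection are reported.
        """
        words = WORD.findall(query.lower())
        if not words:
            return []
        hits = set.intersection(*(self._postings.get(w, set()) for w in words))
        if owned is not None:
            hits = {hit for hit in hits if hit[0] in owned}
        return sorted(hits)
```

The owned argument is the "narrow it down to my library" feature: pass in the set of print versions you have and you get page numbers for your own copies only.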
Most libraries buy cataloging information from the largest database of records in the world (books, serials, cds, dvds, etc) called WorldCat.
47 million records. Cataloging comes from member libraries from around the world and is shared by all.
Many public libraries are making the WorldCat database available to patrons via remote access
very often through another OCLC service called FirstSearch.
Info about WorldCat at
http://www.oclc.com/
cheers,
rubble (a librarian from d.c.)
I don't know the means that this site is using but:
http://www.spinfree.com/singlefile/
is an online book cataloguing service that uses ISBN to fill form-data fields, and they have an Export facility
These guys used to have an app that had similar functionality.
[Disclaimer: I am not affiliated with this service in any way - just a happy user]
--
"Everything in moderation, including moderation"
Try Allreaders.com. (http://www.allreaders.com) They let you search by very specific elements of plot, setting, theme, and character to find exactly the book you're looking for. It's the only "Browsable" engine for books that lets you search for kinds of books, instead of a straight title/author search.
The Library of Alexandria is a book recommender database. I joined when they were still in their pure data gathering stages. Nowadays you have to click on the Departments:Recommender link to get past the online fiction store.
You rate several stories [Dreadful, Boring, So-so, Enjoyable, Really Good, Excellent, Fabulous]. Then you can ask for recommendations. The database correlates your ratings with everyone else's ratings and finds the people with ratings closest to yours, your "neighbors". Then it uses your neighbors' ratings to recommend books that you haven't rated yet. The recommendations each have a confidence rating [Pure Speculation, Wild Guess, Extremely Low, Very Low, Low, Medium-Low, Medium, Medium-High, High, Very High, Extremely High, Almost Positive] based on how many neighbors recommended the book, what the range of ratings are, and how "close" each neighbor is. Obviously, the more books you rate, the more accurate the system can be. With this system I've discovered lots of books that I love, but never would have picked while browsing in a bookstore.
Dragging this post somewhat back on-topic, users can enter story titles and authors that are not in the database yet. Similar to CDDB and freedb, most stories were entered by the users, not the administrators.
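I have no idea how the site actually computes this, but the general neighbor idea fits in a few lines. The Python sketch below is a stand-in, not their algorithm: ratings are numbers from 1 (Dreadful) to 7 (Fabulous), "closeness" is the mean absolute difference over stories both people rated, and the confidence levels are left out.

```python
def distance(mine, theirs):
    """Mean absolute rating difference over shared stories, or None if none."""
    shared = mine.keys() & theirs.keys()
    if not shared:
        return None
    return sum(abs(mine[t] - theirs[t]) for t in shared) / len(shared)


def recommend(ratings, user, k=2):
    """Suggest stories the user has not rated, using the k closest neighbors.

    ratings maps each user name to a {story title: score} dict.
    Returns (title, average neighbor score) pairs, best first.
    """
    mine = ratings[user]
    scored = []
    for other, theirs in ratings.items():
        if other == user:
            continue
        d = distance(mine, theirs)
        if d is not None:
            scored.append((d, other))
    neighbors = [name for _, name in sorted(scored)[:k]]
    # Collect the neighbors' scores for everything the user has not rated.
    collected = {}
    for name in neighbors:
        for title, score in ratings[name].items():
            if title not in mine:
                collected.setdefault(title, []).append(score)
    averaged = {t: sum(s) / len(s) for t, s in collected.items()}
    return sorted(averaged.items(), key=lambda item: (-item[1], item[0]))
```

It also shows why rating more books helps: with few shared stories, the distance to each neighbor rests on very little data.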
I am not connected with the Library of Alexandria website except as an occasional customer of their online store and as a long time user of their database; over the past four years or so I've entered 2314 ratings.
forget about DVD
How many posts do we have to have asking what the purpose is before they become redundant? Those posts have no purpose, yet it isn't stopping people from posting them.
Being able to search by topic/genre alone is more than enough reason. Booksellers are good for some searches, but they don't often list books that are long out-of-print. Sometimes I just want to know who wrote some book I read long ago. If the db contained skeletal plot info, you could find that book whose name you forget. References to other works, a la IMDB, which is something the book itself doesn't even contain.
These are just off the top of my head.
For two good reasons:
1.
I am currently developing an online application for college students to list their books for sale to each other. It would be nice to ask for only an ISBN and populate my listings with CORRECT info; it'd also be nice to have one record per book. Not one for "john doe, title" and "johhn doee, titlle". Easier to validate, good data coherency and accuracy.
2.
it costs frickin $30,000 a year for the definitive resource, I cannot afford it, my free service will not use it. So there!
Most of the books I've read I can remember most of (or at least the general plot/themes/ideas of the book). I've almost gotten to the point where I could go into a library and just ask for the books that came out in the past week. Or what I do is download new books (legally even, at Project Gutenberg), run them through some text-to-speech program, grab the audio and burn it onto CD to listen to while I drive.
Hmmm, I have 5 mod pts, its time to metamod, and on top of that I have to meta-metamod? When do I get to read slashdot?
While the GPL was not aimed directly at stopping things like this, this is just one of the beautiful side effects: It prevents exactly this type of thing from happening. All those out there who love BSD and its license should wake up and realize that Apple and Microsoft have been leeching off your talent and generosity.
Is there some way so that this could be donated into the public domain or something from day one?
Yes, put it under a license that is similar in spirit to the GPL. GPL has always been about protecting the users' rights. A license similar in spirit for any database should do the same.
This just goes to show how little people pay attention to books and libraries. Libraries have had an *open* standard, called Z39.50, to access databases containing information about books, for years. It is a standard based on MARC records, and just about every commercial library cataloging program is based on it, and can read/write these records. The Library of Congress provides full access via the Z39.50 interface to two databases, one for people to test with, and the other containing the entire LOC.
The ISBN is just a number. It is assigned by the publisher. Anyone can use that number.
But an ISBN is not unique. There are books with more than one ISBN and one ISBN with more than one book. (No, not just different covers on same book -- totally different books) Make sure your programs can deal with exceptions.
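In practice, "deal with exceptions" mostly means not using the ISBN as a unique key. A tiny Python sketch of a catalog that lets one ISBN point at several records; the class and field names are invented for the example.

```python
from collections import defaultdict


class Catalog:
    """Keeps a list of records per ISBN instead of assuming uniqueness."""

    def __init__(self):
        self._by_isbn = defaultdict(list)

    def add(self, isbn, title, author):
        """Store a record; a second, different book on the same ISBN is kept."""
        record = {"isbn": isbn, "title": title, "author": author}
        if record not in self._by_isbn[isbn]:
            self._by_isbn[isbn].append(record)

    def lookup(self, isbn):
        """Return every record filed under this ISBN (possibly none)."""
        return list(self._by_isbn.get(isbn, []))
```

Callers get a list back and have to decide what to do when it holds more than one entry, for example by asking the user to pick.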
However, getting info on a book from a non-public source is not OK. You're using the resources of that source, and they probably also have copyright on the data. If they say you can do that, fine.
For years I've intended to get just my Asimov collection into a database of some sort, but the idea of pulling down all the books and typing the info by hand is daunting (or maybe ridiculous). So, I've spent the last hour or so playing with some ideas I picked up from this article's discussion earlier. After hunting around in vain to see if there was a decent book database out there, I started thinking "ya know, if we /.ers all got together, we could probably build and populate a database..."
...IOW, I'd love to contribute to such a project. If anyone is seriously thinking of starting such a thing, let me know.
What timing you have!.. Just 2 days prior to you posting this I was searching on the net for a service similar to this! I'm trying to set up a friend's bookstore database and was trying to create a script that will automatically import the UPC on the back of the book, which would automatically populate the database with pertinent info on the particular book, i.e. description, author. Yet I am without luck in this mission,.. If you could please tell me of any info developing on behalf of this post... Please.. please.. tell me! Otherwise I'm gonna have to parse thru amazon.com and that is unstable and possibly illegal!... .. Don't make me break the law! hahaha
josh_moch@hotmail.com
Hey, thanks for the tip on endnote. Very nice export feature that will output any format you can imagine! In a matter of a couple hours of playing around, I managed to tweak a custom output template to export the data to XML, and create a rudimentary style sheet to transform what I wanted to see into a table. The biggest pain was catching all the "special" characters, but a few global-replaces and I was done. This is perfect for my own need to track not only what books I do have but what I want to get.
This seems to be a potentially good tool for what the author might need. It will be a two step process -- 1) find what you want using endnote and export the results in XML (or some other easy to manipulate format), 2) an automated procedure for importing and storing the information in the local database.
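Step 2 might look something like the Python sketch below. The XML layout here (a records element holding record elements with title, author and year) is purely an assumption; the real shape depends on whatever output template you define in endnote, so adjust the element names to match.

```python
import sqlite3
import xml.etree.ElementTree as ET


def import_records(xml_text: str, conn: sqlite3.Connection) -> int:
    """Load <record> elements into a books table; return how many were new."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS books ("
        "title TEXT, author TEXT, year INTEGER, UNIQUE (title, author))"
    )
    added = 0
    for record in ET.fromstring(xml_text).iter("record"):
        title = (record.findtext("title") or "").strip()
        author = (record.findtext("author") or "").strip()
        if not title:
            continue  # nothing useful to store without a title
        try:
            year = int((record.findtext("year") or "").strip())
        except ValueError:
            year = None  # missing or non-numeric year
        # OR IGNORE makes re-running the import on the same export harmless.
        cursor = conn.execute(
            "INSERT OR IGNORE INTO books VALUES (?, ?, ?)", (title, author, year)
        )
        added += cursor.rowcount
    conn.commit()
    return added
```

Because duplicates are ignored, the same export can be re-imported after every endnote session without cleaning it up first.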
Library of Congress Online Catalog (USA):
http://catalog.loc.gov/
British Library Public Catalogue (UK):
http://blpc.bl.uk/
Um...
Libraries already do this via OCLC (and actually there are now vendors/jobbers out there that can and/or will do this for about US$1.50 per book)
Your complaints about being offended offend me.