Slashdot Mirror


Advice for Building a Multi-Platform Lyrics Database?

AntonOnymous Cowherd asks: "I am in the process of designing an application for general public use. The application will allow end users to search and display a large collection of songs (both lyrics and tunes) with annotations, all in text format. The intent is for this application to run cross-platform (Linux, Windows, Mac, and whatever else), so I want to avoid platform-specific binaries as much as possible. I also believe that the program should be Open Source. The end users will not necessarily be computer experts, so I want to avoid as much additional setup on their computers as possible. The application (data and program) will be stored entirely on a CD or DVD, and it should be able to run locally. The most important part of this application is the data, not the program, so the guts of it should be fairly simple, with a decent user interface. Does anyone have any suggestions as to a general approach to setting this up, or any pointers to existing open source programs which already perform a similar function?"

"One way to implement this would be to set up each song (with lyrics, tune, and annotations) as a single record in a database. However, I would like to avoid the inherent security issues and overhead of setting up and running a database on a user's computer.

Another possibility, which is fairly appealing, is to use a web browser to provide the user interface, and to use Open Source text indexing/searching programs (such as Lucene or Egothor) as the engine. It is probably safe to assume that most users have a browser. However, most users probably would not have a web server (even a local one) on their computer, and following the principle of messing with the user's computer as little as possible, I would like to avoid having to set one up."
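The two ideas in the question (one self-contained record per song, and a Lucene/Egothor-style search engine) can be combined without any database server. Below is a minimal sketch, assuming songs are stored as plain-text JSON files on the disc and searched through an inverted index, the data structure engines like Lucene are built around. It is not Lucene's or Egothor's API; the names `Song`, `song_to_json`, `build_index`, `search`, and `index_to_json` are hypothetical, JSON as the file format is an assumption, and the tokenizer is deliberately naive (no stemming, no ranking).

```python
import json
import re
from collections import defaultdict
from dataclasses import asdict, dataclass, field


@dataclass
class Song:
    # Hypothetical record layout: one song = one self-contained text record.
    song_id: str
    title: str
    lyrics: str
    tune: str = ""  # assumption: the tune is stored in some text notation
    annotations: list[str] = field(default_factory=list)


def song_to_json(song: Song) -> str:
    # Serialize a song so it can live as one plain-text file on the disc.
    return json.dumps(asdict(song), ensure_ascii=False)


def song_from_json(text: str) -> Song:
    return Song(**json.loads(text))


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of word characters (Unicode-aware in Python 3).
    return re.findall(r"\w+", text.lower())


def build_index(songs: list[Song]) -> dict[str, set[str]]:
    # Inverted index: each term maps to the ids of the songs containing it.
    index: dict[str, set[str]] = defaultdict(set)
    for song in songs:
        text = " ".join([song.title, song.lyrics, *song.annotations])
        for term in tokenize(text):
            index[term].add(song.song_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    # AND semantics: a song matches only if it contains every query term.
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


def index_to_json(index: dict[str, set[str]]) -> str:
    # Built once when the disc is mastered; only read afterwards.
    return json.dumps({t: sorted(ids) for t, ids in index.items()}, ensure_ascii=False)


def index_from_json(text: str) -> dict[str, set[str]]:
    return {t: set(ids) for t, ids in json.loads(text).items()}


# Usage with two sample records.
songs = [
    Song("s1", "Amazing Grace", "Amazing grace, how sweet the sound",
         annotations=["18th-century hymn"]),
    Song("s2", "Scarborough Fair", "Are you going to Scarborough Fair?",
         annotations=["Traditional English ballad"]),
]
index = index_from_json(index_to_json(build_index(songs)))
print(sorted(search(index, "amazing sound")))     # ['s1']
print(sorted(search(index, "traditional fair")))  # ['s2']
```

Because the index is computed once at build time and only read at run time, nothing needs to be installed or kept running on the user's machine. How a browser-based front end would reach these files without a local web server is a separate question that this sketch does not address.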

3 of 65 comments

  1. Start with the MusicBrainz code by Matt Perry · Score: 2, Informative

    You can start with the MusicBrainz codebase. The schema already supports albums, tracks, and annotations, and you could extend it to add lyrics. A daily dump of the database is available, as is the source code for the server application.

    --
    Slashdot: Failed Car Analogies. Amateur Lawyering. Anecdote Battles.
  2. Re:He didn't say anything about a service by vorpal22 · Score: 2, Informative

    I can't recall the name of it (PearLyrics, perhaps?), but there was an excellent program for OS X that integrated with iTunes, queried several sources, and downloaded lyrics for the songs you were listening to. If I recall correctly, the author took it down under threat of being sued by the RIAA, even though it didn't violate copyright in any way imaginable.

    Even if the guy would have won in court, there's likely no way he could have afforded the legal costs, unfortunately, and his programming time was wasted :(.

  3. Re:Internationalization / UTF-8 by Kelson · Score: 2, Informative

    Well, "script" doesn't really make sense in the context of your original post, but I'll take you at your word that you don't see the appeal of mixing scripts on one page.

    To start, I'll direct you to the Japanese code page 932, which includes at least four scripts: the basic Latin alphabet, katakana, hiragana, and kanji. People seem to have thought it necessary to be able to use all of those on one page, perhaps because Japanese regularly mixes three of them together and likes to throw in English words for flavor. (No doubt the Latin characters helped with writing computer programs as well.)

    Unicode just extends the principle so that you can do things like:

    • Aggregate titles from articles in multiple languages
    • Use one language for content and another for labels (or, in the case of the web, navigation)
    • Write something like a Japanese/Russian dictionary intended for readers, that displays words the way you would see them in actual Japanese or Russian text

    ...and so on. The Unicode character set is one big flat space of code points, like ASCII but with far more of them.

    The internationalization argument perhaps shouldn't focus on UTF-8 specifically; one could use UTF-16 instead. Both encodings give you access to the full Unicode character set, which lets you, as you put it, "define the code page once per document."
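The point that UTF-8 and UTF-16 are just two byte encodings of the same flat code-point space can be checked directly. Below is a minimal Python sketch, assuming a lyrics line that mixes katakana, hiragana, kanji, Latin, and Cyrillic (the sample text is made up for illustration). The helper names `describe` and `fits_in` are hypothetical, and Latin-1 stands in as an example of a single-byte legacy code page.

```python
def describe(text: str) -> list[str]:
    # Map each character to its Unicode code point label, e.g. "U+004C".
    return [f"U+{ord(ch):04X}" for ch in text]


def fits_in(text: str, encoding: str) -> bool:
    # True if every character in text can be represented in the encoding.
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Made-up sample line mixing five scripts in one string.
mixed = "カラオケ ひらがな 漢字 Latin Кириллица"

utf8_bytes = mixed.encode("utf-8")
utf16_bytes = mixed.encode("utf-16-le")

# The byte sequences differ, but both decode back to the same characters.
assert utf8_bytes != utf16_bytes
assert utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16-le") == mixed

print(describe("カひ漢LК"))
print(fits_in(mixed, "latin-1"))  # False: Latin-1 only covers U+0000..U+00FF
print(fits_in(mixed, "utf-8"), fits_in(mixed, "utf-16"))  # True True
```

For a multilingual song collection, this is the practical upshot: pick one Unicode encoding for every file on the disc and any mix of scripts fits, whereas a single legacy code page only covers the scripts it was designed for.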