Slashdot Mirror


Advice for Building a Multi-Platform Lyrics Database?

AntonOnymous,Cowherd asks: "I am in the process of designing an application for general public use. The application will allow end users to search and display a large collection of songs (both lyrics and tunes) with annotations, all in text format. The intent is for this application to run cross-platform (Linux, Windows, Mac, and whatever else), so I want to avoid platform-specific binaries as much as possible. I also believe that the program should be Open Source. The end users will not necessarily be computer experts, so I want to avoid as much additional setup on their computers as possible. The application (data and program) will all be stored on a CD or DVD, and it should be able to be run locally. The most important part of this application is the data, not the program, so the guts of it should be fairly simple with a decent user interface. Does anyone have any suggestions as to a general approach to setting this up, or have any pointers to existing open source programs which already perform a similar function?" "One way to implement this would be to set up each song (with lyrics, tune, and annotations) as a single record in a database. I would like to avoid the inherent security issues and overhead of setting up and running a database on a user's computer.

Another possibility, which is fairly appealing, is to use a web browser to provide the user interface, and to use Open Source text indexing/searching programs (such as Lucene or Egothor) as the engine. It is probably safe to assume that most users have a browser. However, most users probably would not have a web server (even a local one) on their computer, and going by the principle of as little messing around with the user's computer as possible, I would like to avoid having to set one up, even a local one."

65 comments

  1. spam? by Ajmuller · · Score: 0, Offtopic

    since when did spam for someone's program get promoted to the front page of /. /// heads back over to digg

  2. Internationalization by CRCulver · · Score: 4, Insightful

    Whatever you do, please store everything in UTF-8 encoding, since most of the lyrics of the world's music are not in English. I was outraged the day I discovered that the old CDDB system required everything to be in ISO-8859-1. What is someone to do with music in foreign scripts? ISO-8859-1 doesn't even have the necessary characters from standard Latin transliterations (such as the characters with carons for Cyrillic transliteration).

    If you don't have any experience with Unicode issues, a problem shared by a regrettable number of developers, try Gillam's Unicode Demystified.
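    To make the point concrete, here is a small sketch in Python (chosen only for illustration; nothing in the thread prescribes it). It checks whether one lyric line, a Cyrillic name plus its scholarly Latin transliteration with a caron, fits in ISO-8859-1, in the Cyrillic code page cp1251, and in UTF-8. The sample string and the helper name `encodable` are made up for the example.

```python
# Hypothetical sample: a Russian name and its scholarly transliteration (note the caron on Č).
LINE = "Чайковский / Čajkovskij"


def encodable(text: str, encoding: str) -> bool:
    """Return True if every character of text can be stored in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for enc in ("iso-8859-1", "cp1251", "utf-8"):
    print(enc, encodable(LINE, enc))
# iso-8859-1 False   (no Cyrillic letters, no Č)
# cp1251 False       (has Cyrillic, but no Č)
# utf-8 True
```

    No single legacy code page covers both halves of that line, which is exactly the CDDB problem described above.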

    1. Re:Internationalization by fm6 · · Score: 1
      You're more right than you know. Really, every programmer should know the basics of Internationalization these days. The problem is that there are a lot of obsolete concepts floating around, as symbolized by the fact that most people still think that "ASCII" and "text" are the same thing. That's not been true for a long time, even if you're writing software that doesn't need to be localized.

      The inventors of Java had the right idea: store all your characters using Unicode, and translate them to the local character set when you do I/O. If you implement this right, programs are Internationalized by default, even if the programmer doesn't know what Internationalization is. Unfortunately, a lot of early Java class libraries did not implement it right. Worse, a lot of early documentation didn't make this internationalized-by-default feature as clear as it should have. To this day, most Java applications don't get Internationalization right on the first release. And this on a platform that was designed to make it easy!

    2. Re:Internationalization by david.given · · Score: 1
      Unfortunately, a lot of early Java class libraries did not implement it right.

      In fact, Java --- and Windows --- got it so catastrophically wrong (using 16-bit values for characters, instead of 32-bit values) that it was found easier to change the Unicode specification to prohibit most characters that wouldn't fit in a 16-bit value!

      There is a standard in place for encoding such things using 16-bit values; it's UTF-16, and given that it's a variable-length-character encoding like UTF-8, it rather defeats the whole purpose of using 16-bit characters in the first place. Most apps you'll meet just ignore it, and will break horribly when they come across a UTF-16 extended character.

      Your best bet is simply to do everything in UTF-8. It degrades nicely into ASCII, which means that all your old friends like strcpy() and strtok() will Just Work in the vast majority of cases that you'd be interested in. Admittedly, to find the nth character in a string you've got to start at the beginning and read (and discard) n characters, but you'll be surprised how infrequently you need to do this.
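      A sketch of the linear scan described above, assuming Python and raw UTF-8 bytes: a new character starts at every byte that is not a continuation byte (continuation bytes have the bit pattern 10xxxxxx). The function name `byte_offset_of_char` is hypothetical.

```python
def byte_offset_of_char(data: bytes, n: int) -> int:
    """Return the byte offset at which the n-th (0-based) character starts
    in UTF-8 encoded data, scanning from the beginning."""
    count = -1
    for offset, byte in enumerate(data):
        # Continuation bytes look like 10xxxxxx; any other byte starts a character.
        if (byte & 0xC0) != 0x80:
            count += 1
            if count == n:
                return offset
    raise IndexError(f"fewer than {n + 1} characters in data")


data = "Pérez".encode("utf-8")          # b'P\xc3\xa9rez'
start = byte_offset_of_char(data, 2)    # 3, because 'é' took two bytes
print(data[start:].decode("utf-8"))     # rez
```

      The cost is O(n) per lookup, which is the trade-off the comment accepts in exchange for ASCII compatibility.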

    3. Re:Internationalization by Anonymous Coward · · Score: 0
      The inventors of Java had the right idea: store all your characters using Unicode

      The inventors of Java had the right idea, copied from Microsoft who had it long before them: All characters are natively Unicode in NT-based Windows, and are converted from ASCII and back as needed. Win32 API calls secretly come in A or W appended flavors, so when you call e.g. SendMessage, if you chose to do an ASCII build, SendMessageA is instead called which converts to Unicode and calls the wide version.

      If you implement this right, programs are Internationalized by default

      Nothing is internationalized by default. There is no magic that converts a program's English strings into their Traditional Chinese translations.

    4. Re:Internationalization by fm6 · · Score: 1
      In fact, Java --- and Windows --- got it so catastrophically wrong (using 16-bit values for characters, instead of 32-bit values) that it was found easier to change the Unicode specification to prohibit most characters that wouldn't fit in a 16-bit value!

      Please. Both Java and Windows simply implemented Unicode. The decision to try to do every character set on the planet in 16 bits was the Unicode committee's decision, not Sun's or Microsoft's. And it's a mistake they've been able to work around.

      Your best bet is simply to do everything in UTF-8. It degrades nicely into ASCII, which means that all your old friends like strcpy() and strtok() will Just Work in the vast majority of cases that you'd be interested in.

      So UTF-16 is uncool because it doesn't degrade gracefully for 32-bit characters, while UTF-8 is great even though it doesn't degrade gracefully for 8-bit characters? That's absurd. If all you care about is the 7-bit characters that UTF-8 supports, why not just use ASCII? And if all you care about is supporting a large body of Western users, why not just use Windows 1252? After all, it will work correctly on 90% of the computers on the planet!

      Every few days I get an email from libraryelf.com, reminding me of what books I have checked out from the public library. All their emails and web pages use UTF-8, which makes sense for the kind of material they're handling. However, they forgot to specify the character set in the email headers. So right now, a book I have out by Arturo Pérez-Reverte is listed as by Arturo PÃ©rez-Reverte. So much for degrading gracefully!

      When I said that the early class libraries for Java were screwed up, I wasn't talking about their choice of Unicode. I was talking about the authors of the libraries who made exactly the kind of bytes-are-characters mistakes that you're making. Whether you use UTF-8 or UTF-16, you need to get away from that.
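      The garbling fm6 describes is easy to reproduce. A minimal Python sketch, assuming the receiver falls back to ISO-8859-1 when no charset is declared (Windows-1252 maps these two particular bytes the same way):

```python
name = "Arturo Pérez-Reverte"
sent = name.encode("utf-8")           # what the sender actually transmitted

guessed = sent.decode("iso-8859-1")   # receiver guesses Latin-1: no charset was declared
declared = sent.decode("utf-8")       # receiver honors a declared charset=utf-8

print(guessed)    # Arturo PÃ©rez-Reverte
print(declared)   # Arturo Pérez-Reverte
```

      The bytes are identical in both cases; only the receiver's assumption about them differs.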

    5. Re:Internationalization by david.given · · Score: 1
      ...while UTF-8 is great even though it doesn't degrade gracefully for 8-bit characters? That's absurd. If all you care about is the 7-bit characters that UTF-8 supports, why not just use ASCII?

      Because when you're parsing text, you're usually only interested in a few special characters --- control codes, spaces, etc. These are all in the ASCII range. This means that all the UTF-8 extended characters will just pass straight through, unchanged, correctly. You don't need to worry about them. Because UTF-8 consists of ASCII (with the top bit of each byte clear), and extended characters (with the top bit of each byte set), there's no chance of corrupting an extended character or misinterpreting part of an extended character as an ASCII character. It's astonishingly convenient to write code for.

      The reader cares about the extended characters. You, the one writing the parsing code, usually don't.
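      A Python sketch of the property this relies on, assuming tab-separated records stored as UTF-8: every byte of a multibyte UTF-8 sequence is 0x80 or above, so splitting on an ASCII byte can never cut a character in half. The same byte-level split on UTF-16 data misaligns the fields. The record layout and the function name `split_fields` are illustrative assumptions.

```python
def split_fields(raw: bytes, sep: bytes = b"\t") -> list[str]:
    """Split UTF-8 bytes on a one-byte ASCII separator, then decode each field."""
    if len(sep) != 1 or sep[0] >= 0x80:
        raise ValueError("separator must be a single ASCII byte")
    # Every byte of a multibyte UTF-8 character is >= 0x80, so it can never
    # equal the separator; splitting the raw bytes cannot corrupt a character.
    return [field.decode("utf-8") for field in raw.split(sep)]


record = "Pérez-Reverte\tEl club Dumas\t漢字"
print(split_fields(record.encode("utf-8")))
# ['Pérez-Reverte', 'El club Dumas', '漢字']

# The same byte-level split on UTF-16 leaves the second field misaligned:
print("é\tx".encode("utf-16-le").split(b"\t"))
# [b'\xe9\x00', b'\x00x\x00']
```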

      ...right now, a book I have out by Arturo Pérez-Reverte is listed as by Arturo PÃ©rez-Reverte. So much for degrading gracefully!

      But it has degraded gracefully --- you can read it, can't you? If they'd sent it in, say, UTF-16 instead, it certainly wouldn't be readable; you'd have an incomprehensible block of Base64 encoded binary data.

    6. Re:Internationalization by fm6 · · Score: 1
      Oh, you're talking about parsing. I thought we were talking about reading. But you're still wrong. If you're parsing anything, you should be using well-tested parsing libraries, not rolling your own crap using libc functions. The problem with re-inventing the wheel is that homemade wheels are not very solid.

      It's absurd to call that kind of garbling "degrading gracefully" just because it's sort of readable. And the UTF-16 version will be perfectly readable, if the sender remembers to add the correct character set (charset) header to the message, as the sender of this UTF-8 message should have. No matter what character set you use, you should always provide the metadata your receiver needs to interpret it correctly. There are many good reasons to favor UTF-8 over UTF-16, but the fact that it exacts smaller penalties for sloppiness is not one of them.
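      The "always provide the metadata" rule, sketched with Python's standard `email` package: the sender labels the body with a charset, and the receiver decodes with whatever charset the header declares instead of guessing. The subject line and the helper names `make_notice` and `read_body` are hypothetical.

```python
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default


def make_notice(body: str) -> EmailMessage:
    """Build a plain-text message whose Content-Type declares its charset."""
    msg = EmailMessage()
    msg["Subject"] = "Items you have checked out"   # hypothetical subject
    msg.set_content(body, charset="utf-8")         # labels the body with charset="utf-8"
    return msg


def read_body(raw: bytes) -> str:
    """Parse a received message and decode the body with its declared charset."""
    msg = message_from_bytes(raw, policy=default)
    return msg.get_content()


notice = make_notice("Arturo Pérez-Reverte")
print(notice.get_content_charset())           # utf-8
print(read_body(notice.as_bytes()).strip())   # Arturo Pérez-Reverte
```

      The same principle applies whichever encoding is chosen; with UTF-16 the label would simply say so.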

  3. web service by fuct000 · · Score: 2, Interesting

    Store all of the data on a server and write either a .NET or Java EE program to share the information as a web service. Then just have people download a desktop client, which contacts the web service to request the information.

    --
    Free continuous multi-player strategy http://www.holy-war.com/
    1. Re:web service by AntonOnymous,Cowherd · · Score: 1

      Nope. Web service is out. Or, at least, it conflicts with the requirement that this be able to run as a standalone app on a system that is not connected to the Internet.

      --
      ... a titanic intellect in a world of icebergs...
  4. Wiki on a stick by Glonoinha · · Score: 3, Interesting

    Sounds like a perfect application of Wiki on a Stick. I set one up in a few hours, most of which I wasn't even sober, and it can install with a zero footprint (it's designed to run from a thumb drive).
    I have a little more write-up in my Journal, along with links.

    --
    Glonoinha the MebiByte Slayer
  5. Canned DB. by Anonymous Coward · · Score: 0

    You're making this more complicated than it needs to be. Since the DB will be frozen, you can pre-compute the index and store that. The front-end can be browser-based, since browsers are on every platform and, within a subset, are compatible. O'Reilly used to do this with some of their book/CD combinations.
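    One way to read this suggestion, sketched in Python under stated assumptions: build the word index once, when the disc is mastered, and write it out as a JavaScript file. A static HTML page on the disc could then pull it in with an ordinary script tag and search it in the browser, with no server involved. The variable name `SONG_INDEX`, the helper names, and the simple `\w+` tokenizer are all assumptions, and the browser-side search code is not shown.

```python
import json
import re


def build_word_index(songs: dict) -> dict:
    """Map each lowercase word to the sorted IDs of the songs containing it."""
    index = {}
    for song_id, text in songs.items():
        # \w+ is Unicode-aware on str, so Cyrillic or accented words stay whole.
        for word in set(re.findall(r"\w+", text.lower())):
            index.setdefault(word, set()).add(song_id)
    return {word: sorted(ids) for word, ids in sorted(index.items())}


def write_index_script(index: dict, path: str) -> None:
    """Write the index as a JavaScript file that a static page loads with a script tag."""
    payload = json.dumps(index)  # default ensure_ascii=True: non-ASCII becomes \uXXXX
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"var SONG_INDEX = {payload};\n")


songs = {1: "Down by the river", 2: "River of dreams", 3: "Калинка, калинка моя"}
index = build_word_index(songs)
print(index["river"])   # [1, 2]
```

    Leaving `json.dumps` at its default escapes every non-ASCII character, so the generated file is plain ASCII and the page does not depend on the browser guessing the file's encoding.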

  6. Get a good lawyer. by Anonymous Coward · · Score: 0

    The record companies will come after you sooner or later if your site is successful.

    Never mind that you're providing a service that they no longer care to provide. I remember when albums and even early CDs came with the lyrics to every song in a booklet. Don't see that very often anymore, if at all. And I have probably dozens of songs in my iPod that I would never have found if lyrics databases didn't exist (because God knows the shitty DJs on radio today can't be bothered to give the title and artist when you really want to know it).

    So, good luck to you, but form an LLC to own what you're building so when the record companies come after you, they won't bankrupt you personally.

    1. Re:Get a good lawyer. by Prod_Deity · · Score: 1

      I agree about the lawyer, but this is not the record companies' field. They have domain over the actual song, but not the written lyrics of the songs. Those are dealt with by ASCAP/BMI, as the famed PearLyrics app for Mac OS X has found out.

      If you don't have the funds to pay to ASCAP/BMI (the good guys), then you're pretty much screwed.

  7. Copyrights by AuMatar · · Score: 4, Insightful

    Music lyrics, unfortunately, are copyrighted. Every db on the web that's gained real size has been shut down by the RIAA. Whatever you do needs to be hosted out of a country that doesn't do copyrights, or you're dead in the water.

    --
    I still have more fans than freaks. WTF is wrong with you people?
    1. Re:Copyrights by c0d3h4x0r · · Score: 1

      Did you not read the original post?

      The application (data and program) will all be stored on a CD or DVD, and it should be able to be run locally.

      --
      Moderator hint: a comment is neither "Flamebait" nor "Troll" if it is true.
    2. Re:Copyrights by deanj · · Score: 1

      Uh.. did YOU read what he wrote? Distributing copyrighted material will get this guy a lawsuit.

    3. Re:Copyrights by c0d3h4x0r · · Score: 1

      Yes, but that's only if you get caught. The data isn't being served up through a central, obvious web server. If the app is open-sourced and distributed along with the data freely around the internet, it will be near-impossible for anyone to shut all copies of it down.

    4. Re:Copyrights by swimin · · Score: 1

      Real Sneaky.

      Put your secret plan on /.

    5. Re:Copyrights by Kelson · · Score: 2, Insightful

      You're assuming he's going to be distributing copyrighted material -- and that he's going to be distributing it without permission. (You can distribute copyrighted material all you want, if you've gotten permission to do so. Otherwise, the publishing industry would be vastly different, and GPL software wouldn't exist.)

      The story doesn't tell us anything about which songs he's going to be including. For all we know, it could be a collection of folk songs or hymns that are already in the public domain.

      Advice to watch out for copyright issues is good. Assuming he's violating copyright (even if it's the safe way to bet) is still jumping to conclusions, and ignoring the technical questions posed in the story doesn't help anyone.

    6. Re:Copyrights by Anonymous Coward · · Score: 0
      Assuming he's violating copyright (even if it's the safe way to bet) is still jumping to conclusions, and ignoring the technical questions posed in the story doesn't help anyone.

      Neither does shoving your head way up your ass trying to imagine some scenario in which this guy isn't going to get his head handed to someone on a plate by the courts. Sometimes answering the question the original poster was too stupid to ask first (rather than the one he did) is the best response.

    7. Re:Copyrights by AntonOnymous,Cowherd · · Score: 1

      Umm... Not all music lyrics are copyrighted. And, as I said elsewhere, the bulk of the songs I am going to be using are no longer (if they ever were) under copyright. Or I will be going through the appropriate hoops for licensing. Also, this is not going to be a web service, so hosting doesn't enter into it at all. Any insights into how best to set up the software rather than nitpicking on the data? Thanks.

  8. Heh... by djsmiley · · Score: 3, Interesting

    Web service.

    Lots of websites already do this, so why bog yourself down with something that has already been done? Unless it's for some kind of research project for university/college, of course.

    Open source solutions which do the same? Amarok has a "lyrics" tab which brings up the lyrics to the playing song; I think they are pulled from Wikipedia, but I'm not sure.

    MusicBrainz also has a huge database of music; this is why they are seemingly linked in Amarok.

    So basically you're not onto a winner with this unless you're going to offer something all the hundreds of others fail to offer.

    Amarok, Wikipedia and MusicBrainz are all open source.

    I'm not sure, however, how all of these cope with non-English alphabets, which is something lots of people tend to bring up.

    --
    - http://www.milkme.co.uk
    1. Re:Heh... by AntonOnymous,Cowherd · · Score: 1

      Umm... You did read my post, didn't you? This is not going to be a web service - it should be run locally on a user's computer from a CD or DVD. As for your concern about whether or not I am "onto a winner", I appreciate the thought, but it doesn't really matter, and I would prefer some suggestions on the software implementation to commiserations.

  9. hypocrisy in action by gEvil (beta) · · Score: 1

    I like how you plan to use open source software so that you can then violate someone else's copyright. You do realize that you won't have the rights to distribute the music and lyrics to these songs, don't you? That is unless, of course, you plan to only distribute songs that are in the public domain. In which case, you'll have a fairly small market (yes, I realize there are some instances where this wouldn't be true--church hymns for instance).

    --
    This guy's the limit!
    1. Re:hypocrisy in action by AntonOnymous,Cowherd · · Score: 1

      Sorry to spoil your day, but I'm not planning on violating anyone's copyright here. I won't bother thanking you for automatically assuming that that was my plan. The songs that I will be including in my application are going to be either public domain, or else (hopefully) properly searched out and licensed. Part of the idea of using OSS is to help keep costs down and provide more time / opportunity / funds for making sure that everything is done properly. If you have any suggestions as to the best way to go about setting up the software, I would appreciate them. If you just want to kvetch about the content of the database, then I would prefer not to hear it...

  10. Not gonna happen by gorbachev · · Score: 1

    You will be sued the minute you launch the service. Lyrics are copyrighted, and fiercely protected by the copyright owners.

    --
    In Soviet Russia, I ruled you
  11. It's the data format and APIs by Bogtha · · Score: 1

    You think setting up a local database is a security risk, but setting up a local web server isn't? Why? You are aware that databases don't have to be servers listening on public ports, don't you? You could use something like SQLite.

    The important thing is not the implementation itself. It's the data format and/or API. Make the data available, and plenty of people will be willing to write web interfaces, Qt interfaces, GTK interfaces, etc. Expose the API as plain C, and make the data easily importable/exportable, and it really doesn't matter if you produce the crappiest proprietary implementation imaginable, because both the backend and the frontend can be replaced individually.
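    A minimal sketch of the SQLite suggestion using Python's built-in `sqlite3` module. The table layout and helper names are assumptions; the search is a plain `LIKE` substring match (SQLite compares case-insensitively only for ASCII letters, and `%` or `_` typed by the user act as wildcards); and the copy on the disc is opened read-only through an SQLite URI so nothing on the user's machine is modified.

```python
import sqlite3


def build_db(conn: sqlite3.Connection, songs) -> None:
    """Create the songs table and load (title, lyrics, notes) tuples."""
    conn.execute(
        "CREATE TABLE songs (id INTEGER PRIMARY KEY, title TEXT, lyrics TEXT, notes TEXT)"
    )
    conn.executemany("INSERT INTO songs (title, lyrics, notes) VALUES (?, ?, ?)", songs)
    conn.commit()


def search(conn: sqlite3.Connection, term: str) -> list:
    """Return titles whose lyrics or notes contain term (substring match)."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT title FROM songs WHERE lyrics LIKE ? OR notes LIKE ? ORDER BY title",
        (pattern, pattern),
    )
    return [title for (title,) in rows]


def open_read_only(path: str) -> sqlite3.Connection:
    """Open an existing database file (e.g. on the CD) without write access."""
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)
```

    The database file would be built once, when the disc is mastered; at run time the program only calls `open_read_only` and `search`. No server process, port, or installation is involved, because SQLite is a library linked into the program.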

    --
    Bogtha Bogtha Bogtha
    1. Re:It's the data format and APIs by AntonOnymous,Cowherd · · Score: 1

      Thanks for the tips on databases. I'll look into it. Sorry for my confusion on the database vs. web server security issue. I realize that a local web server is probably just as risky. The idea of using a web browser as frontend is tempting (cuts out a lot of effort), but if it means having to set up a local web server, then it's not as nice looking... As I said, I want to have an application that makes as little impact on a user's computer as possible, and if I can avoid setting up any new services, then great! As for the importance of the implementation vs. the data format, I do intend to make this as open as possible. However, I need to have a good implementation in the first place, since the intended audience for this is not necessarily that technical. Thanks for your suggestions.

  12. He didn't say anything about a service by hackwrench · · Score: 1

    He seems to be saying that everything will be on the person's computer. He doesn't say where the lyrics are coming from. Perhaps the user is to cut and paste them from online. I don't see how this will be much of an improvement on either Googling for lyrics, or using local search on text files on your hard drive.

    However, http://www.animelyrics.com/ is one database of lyrics that isn't getting sued, like most things anime.

    1. Re:He didn't say anything about a service by vorpal22 · · Score: 2, Informative

      I can't recall the name of it (PearLyrics, perhaps?) but there was an excellent program for OS X that integrated with iTunes and would query several sources and download lyrics to songs you were listening to. The author took it down under threat of being sued by the RIAA, if I recall correctly, and it didn't violate copyright in any way imaginable.

      Even if the guy would have won in court, there's likely no way he could have afforded the legal costs, unfortunately, and his programming time was wasted :(.

    2. Re:He didn't say anything about a service by Bobsledboy · · Score: 1

      If you use Konfabulator on XP, there is a fantastic widget that does this for the Windows version of iTunes. It has the bonus feature of adding the album art to the songs as well.

    3. Re:He didn't say anything about a service by AntonOnymous,Cowherd · · Score: 1

      You're right. I didn't say anything about this being a web service. Contrary to the popular opinion that seems to be being voiced here, the songs in the database will not be current pop songs, and I will be trying my best to make sure that no copyrights are being violated. The lyrics and rest of the song information will be pre-entered into the database, and there is no question of the user entering anything in themselves.

  13. Re:Internationalization / UTF-8 by Anonymous Coward · · Score: 0

    I get that it's "nice" to be able to encode something in Latin, Greek, Russian, Arabic, Chinese, etc. But what I don't get is the fascination with putting multiple code pages in the same document. Seriously: I don't get it.

    Can someone please explain why it's a good idea to have the option of changing the code page on every single character in the document? To me that feels like a step backwards. Why not just define the code page ONCE per document -- perhaps even in metadata?

    I'm serious.

  14. Start with the MusicBrainz code by Matt Perry · · Score: 2, Informative

    You can start with the MusicBrainz codebase. The schema already supports albums, tracks, and annotations. You could extend it for your purpose to add lyrics. A daily dump of the database is available as is the source code to the server application.

    --
    Slashdot: Failed Car Analogies. Amateur Lawyering. Anecdote Battles.
  15. Re:Internationalization / UTF-8 by CRCulver · · Score: 0

    Do you know nothing about Unicode? UTF-8 is a single character set, or "code page" as you say. You just define it once in the first characters of the document, and then you can put nearly every script ever created in the same document without changing anything.

  16. The Mozilla Platform by Feneric · · Score: 2, Interesting

    Copyright issues aside (I'm assuming that you're talking about lyrics that you have the legal right to use) I'd say that there's a pretty simple answer to your problem. You're thinking through the pros and cons of using a back-end database versus a browser front-end, and you're not keen on running any flavor of server.

    You can get both the database and browser advantages without having to set up a separate server by building your app on the Mozilla platform. You can utilize its built-in RDF capabilities to store your data in a clean, extensible way, and fairly quickly put together a user interface using XUL and CSS that can work with Firefox, SeaMonkey, Flock, etc., or even just XULRunner for a more stand-alone user experience.

    Because all of your data (and even interfaces) will be XML-compliant, you'll even be making it easier for third party apps to work with your stuff.
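    A hedged illustration of what an XML/RDF song record might look like, built with Python's `xml.etree.ElementTree`. Only the `rdf:` namespace URI is the real W3C one; the `song:` vocabulary, its URI, the `urn:song:` identifiers, and the helper names are placeholders invented for this example. It does not use Mozilla's own RDF datasource API; it only shows the data format and that any XML tool can read it back.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"   # the real RDF namespace
SONG = "http://example.org/song#"                      # hypothetical vocabulary

ET.register_namespace("rdf", RDF)
ET.register_namespace("song", SONG)


def song_to_rdf(song_id: int, title: str, lyrics: str, notes: str) -> str:
    """Serialize one song as an RDF/XML description."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description",
                         {f"{{{RDF}}}about": f"urn:song:{song_id}"})
    for field, value in (("title", title), ("lyrics", lyrics), ("notes", notes)):
        ET.SubElement(desc, f"{{{SONG}}}{field}").text = value
    return ET.tostring(root, encoding="unicode")


def titles(xml_text: str) -> list:
    """A third-party tool only needs an XML parser to read the records back."""
    return [el.text for el in ET.fromstring(xml_text).iter(f"{{{SONG}}}title")]


doc = song_to_rdf(1, "Kalinka", "Калинка, калинка моя", "Russian folk song")
print(titles(doc))   # ['Kalinka']
```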

    1. Re:The Mozilla Platform by T-Ranger · · Score: 1

      I second this: Mozilla-as-a-platform is ideal for this scenario. While you, or any other potential programmer, may not know Mozilla-the-platform (XUL, XBL, RDF, and heavy CSS and JS), you and any potential programmer kinda sorta know it all right now. And you, and your pool of programmers, will still know it 10 or 15 years from now. You can't say that about wxWidgets, Qt, or GTK. Note also that XULRunner will support SQLite, Real Soon Now.

    2. Re:The Mozilla Platform by AntonOnymous,Cowherd · · Score: 1

      Thanks, both for the useful information and for the assumption that I am not intending to break copyright. Your described setup sounds like what I am looking for - something that has the database and browser advantages without having to set up local servers. Just a few questions. How well would this work with non-Mozilla-based browsers (such as Opera or Internet Explorer)? Unfortunately, not everyone has Mozilla or Firefox or..., and requiring someone to change browsers could be a roadblock to them. What sort of performance would this give for data access? Comparable to database? And would this allow for arbitrary text string searches in any of the lyrics or annotated fields in a reasonably efficient way? Finally, since I am not at all up on the Mozilla platform, RDF, XUL, CSS, etc., what would be a good resource (or resources) for me to start learning this? Many thanks.

    3. Re:The Mozilla Platform by Feneric · · Score: 1

      Thanks...

      You're welcome.

      How well would this work with non-Mozilla-based browsers (such as Opera or Internet Explorer)?

      If you know your users are going to be using a bunch of different browsers, it'd probably make sense to build your system around XULRunner. That way it'd be pretty much like a stand-alone app, but you (as the developer) would still get the advantages of having a built-in system to handle HTML, XML, CSS, RDF, etc., and the user would be none the wiser (although it'd be almost trivially easy to provide a browser-like interface for the user, reducing learning curves for your app). You should be able to make it so they could even run it directly off the CD / DVD by just double-clicking an icon. They wouldn't really even need to install anything.

      What sort of performance would this give for data access? Comparable to database? And would this allow for arbitrary text string searches in any of the lyrics or annotated fields in a reasonably efficient way?

      I've personally not used it in this way for my own apps so I can't give you a straight answer with the certainty of one who's directly done it (my own work with RDF data stores in Mozilla has pretty much been just for user preferences and the like). However, I can say that:

      1. It's the system that Firefox, Thunderbird, Sunbird, etc. all use and they seem reasonably fast.

      2. The new SQL option mentioned by T-Ranger definitely won't weaken any of the platform's existing capabilities in this department, but it could conceivably make things quite a bit better for you if you find that the performance isn't currently what you'd want.

      3. RDF's capabilities in the way of metadata may lead you down some interesting paths that you've not yet considered regarding methods of indexing / searching lyrics beyond the straight raw text search.

      I think it'll do everything you need and then some. You'll probably even be able to find some existing Mozilla-based programs that will get you part of the way there. You should be able to view the source of all of the products on Mozilla Add-ons to find sample code to do all manner of things.

      Finally, since I am not at all up on the Mozilla platform, RDF, XUL, CSS, etc., what would be a good resource (or resources) for me to start learning this?

      There's a ton of good info for free online for all of these topics. The Mozilla Developer Center will provide you with lots of tips and an invaluable reference to the Mozilla platform, XUL, RDF, JavaScript, XML, XULRunner, etc. The W3C will provide you with probably all you'll need to know about CSS, as well as further information on RDF, XML, and HTML. There are also loads of books out there; I've personally read and found Rapid Application Development with Mozilla and Cascading Style Sheets: The Definitive Guide to be pretty much all I needed to start writing Mozilla apps, but a quick glance through Amazon brings up entries like

  17. Congratulations on having an idea... by topham · · Score: 1, Insightful


    Now you've reached the point of actually needing a clue to accomplish it.

    Just pay someone, you obviously don't have a clue.

    1. Re:Congratulations on having an idea... by MBCook · · Score: 1
      Why is that insightful? That's a textbook troll.

      What's with the "you obviously don't have a clue" part? He obviously DOES. He knows what he's doing. He has some ideas of how to do it. He is just asking for guidance from people who may know more than him. It's called learning. You think people are just born knowing how to write a complex application like this? They have to learn about it.

      Those people he should pay, how did they get their clue?

      If there is some obvious reason why he shouldn't continue, why don't you, I don't know, TELL EVERYONE. That way he and the rest of us can learn from it. If you don't have a good reason, why did you bother to post?

      Oh that's right, you're a troll.

      Your comment would at least make a little sense if it was a question like "I've got an idea for this application that is cross-platform and (everything else he talked about) and a book on QBasic from 20 years ago. Should I use a for loop or a while loop?" But his question had merit.

      I would have liked to see the answer. Unfortunately there are only 2 or 3 answers posted. There is one side note about using UTF-8 (good idea), and the rest are all "don't bother, the RIAA will sue you" posts (which are, unfortunately, true).

      --
      Comment forecast: Bits of genius surrounded by a sea of mediocrity.
  18. Maybe I'm missing the point... by Karl Cocknozzle · · Score: 1

    ...but it seems to me writing binaries would be a mistake, and that the best route would be browser-based. If you're doing this as a "public consumption" application, requiring novice PC users to learn a special application to do something like this will make part of your audience reluctant to try your application. If you tell them "Open your web browser and go to this private website at http://www.whatever/" it won't seem as imposing. So many musicians have MySpace pages that seeing it as a web page to visit a la Google is vastly superior to perceiving it as a "Program I have to learn how to use." Musicians like simple: If you can do it in a browser with a handful of elegant controls, you are better off, since a well-implemented browser-based application is effectively client-OS-independent. It might matter what runs on your web, DB, and app servers, but otherwise, it's not an issue unless you're talking about an Apple IIgs running ProDOS 16 and an alpha release of Mosaic over your token-ring LAN... But how many of your users are going to be in that weird of a configuration?

    Solaris, BSD, Linux, Windows, and Mac OS all have standards-compliant web browsers and a Java Virtual Machine. Getting identical performance on every platform is a challenge, but it's easier than writing ten different client applications, and the result is more likely to be usable.

    --
    Who did what now?
  19. Re:Internationalization / UTF-8 by Anonymous Coward · · Score: 0

    Forgive my use of the outdated term "code page" (which was actually the proper term before Unicode changed the terminology). Reread my post with s/code page/script/ and you should be satisfied.

    Now, can someone please explain why it's a good idea?

  20. Java by MBCook · · Score: 1
    I would say use Java. That way you don't have to recompile the application for every architecture. You want it to run on Mac OS? Is that PPC or Intel? For Linux, is that x86, PPC, SPARC, or what? With Java it doesn't matter. Plus, it would also run on Solaris and a few others.

    As for the database, it will be static (if you want it to run off a CD, it's static), so here is what I can think of. You can embed an SQL server (I know there is one, can't remember the name) and do it that way. I don't know if that is an option for Java.

    Your other option is to store it in files. You could easily make a bunch of directories (made by a script) that would give you a large directory tree. Simply assign an ID to each song. The first digit of the ID picks the first folder (there are ten, 0-9); the second folder, inside the first, works the same way for the second digit, and so on. You can go as deep as you need. You can either keep the individual files, or, inside each directory you make (maybe you only want to go two levels deep), keep ALL the data for those songs in one compressed file (XML, serialized objects, whatever).

    Then you have an index file that you load, which tells you how to find the songs (which ID goes to which artist/CD/track), and you could add a second file that holds a database of words and the IDs they appear in, for searching purposes.
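
    A minimal Java sketch of the directory-tree-plus-word-index idea, under assumptions that are mine rather than the poster's: IDs are zero-padded to six digits so every path has the same depth, the tree is two levels deep, each song is one `.xml` file, and words are split on anything that isn't a letter or digit. `SongStore` and its methods are hypothetical names. The word index is a plain in-memory inverted index (word to sorted set of IDs); on a real disc you would build it once with a script and serialize it as the "second file".

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

final class SongStore {
    private static final int ID_WIDTH = 6; // assumption: IDs zero-padded to six digits
    private static final int DEPTH = 2;    // folder levels, one per leading digit

    // Inverted index: lowercase word -> IDs of the songs containing it.
    private final Map<String, SortedSet<Integer>> wordIndex = new HashMap<>();

    // Maps an ID to its file, e.g. 427113 -> 4/2/427113.xml (hypothetical extension).
    static Path pathFor(int id) {
        String digits = String.format("%0" + ID_WIDTH + "d", id);
        Path path = Paths.get(digits.substring(0, 1));
        for (int level = 1; level < DEPTH; level++) {
            path = path.resolve(digits.substring(level, level + 1));
        }
        return path.resolve(digits + ".xml");
    }

    // Records every word of a song's text in the index.
    void addSong(int id, String text) {
        for (String word : tokenize(text)) {
            wordIndex.computeIfAbsent(word, w -> new TreeSet<>()).add(id);
        }
    }

    // Returns the IDs of songs containing every word of the query (AND search).
    SortedSet<Integer> search(String query) {
        SortedSet<Integer> result = null;
        for (String word : tokenize(query)) {
            SortedSet<Integer> ids = wordIndex.getOrDefault(word, Collections.<Integer>emptySortedSet());
            if (result == null) {
                result = new TreeSet<>(ids);
            } else {
                result.retainAll(ids);
            }
        }
        return result != null ? result : new TreeSet<Integer>();
    }

    // Lowercases and splits on anything that is not a Unicode letter or digit.
    private static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        SongStore store = new SongStore();
        store.addSong(1, "The river runs down to the sea");
        store.addSong(2, "The sea is calm tonight");
        System.out.println(pathFor(427113));           // 4/2/427113.xml (backslashes on Windows)
        System.out.println(store.search("river SEA")); // [1]
    }
}
```

    A multi-word query returns only songs containing every word. For ranking, phrase search, or stemming, a library like Lucene (mentioned in the original question) would do far more than this.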

    I'd say go Java. You can include the JRE on the disc (at least for OS X and Windows). Java is very stable and mature, whereas something else like wxWindows may not be (I don't know how well it performs on various platforms). Plus, wxWindows or Qt would require extra libraries.

    If the user didn't want to run the app off the CD, they'd just have to copy it all to a folder they have access to. If you put the code in a JAR file, not only is it cleaner but you can run the program simply by double clicking on it in either Windows or OS X (might need an extra property or two in the manifest file for OS X).
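
    One wrinkle with running straight off the CD, or from a copied folder, is finding the data relative to the program rather than to whatever the current directory happens to be. A rough sketch, assuming (hypothetically) that the songs live in a `songs` folder next to the JAR; `DataLocator` is an illustrative name. It asks the JVM where the class was loaded from and falls back to the working directory if that information isn't available.

```java
import java.net.URISyntaxException;
import java.nio.file.FileSystemNotFoundException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.CodeSource;

final class DataLocator {
    // Folder the program was loaded from: the JAR's parent folder, or the
    // classes folder when running unpacked. Falls back to the working directory.
    static Path programDirectory() {
        try {
            CodeSource source = DataLocator.class.getProtectionDomain().getCodeSource();
            if (source != null && source.getLocation() != null) {
                Path location = Paths.get(source.getLocation().toURI());
                return Files.isRegularFile(location) ? location.getParent() : location;
            }
        } catch (URISyntaxException | IllegalArgumentException
                | FileSystemNotFoundException | SecurityException e) {
            // Location unavailable or not a local file; use the fallback below.
        }
        return Paths.get("").toAbsolutePath();
    }

    // Hypothetical layout: song data sits in a "songs" folder beside the program.
    static Path songsDirectory() {
        return programDirectory().resolve("songs");
    }

    public static void main(String[] args) {
        System.out.println("Reading songs from " + songsDirectory());
    }
}
```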

    For the web part, you could have a little application that launches the web server and shuts it down when you close the program. You could embed the web browser in that, too. It would be more complex, though, especially when going cross-platform.
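
    For what it's worth, since Java 6 the JDK has shipped a small HTTP server (`com.sun.net.httpserver.HttpServer`), so the "little application" could start one in-process, bound to 127.0.0.1 on a free port, without installing anything on the user's machine. A minimal sketch; `LocalLyricsServer` is a hypothetical name and the page content is a placeholder.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

final class LocalLyricsServer {
    private final HttpServer server;

    LocalLyricsServer() throws IOException {
        // Loopback only; port 0 lets the OS pick any free port.
        server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            // Placeholder page; a real app would render search results here.
            byte[] body = "<html><body><h1>Lyrics</h1></body></html>"
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
    }

    // Starts serving and returns the URL to open in the user's browser.
    String start() {
        server.start();
        return "http://127.0.0.1:" + server.getAddress().getPort() + "/";
    }

    // Stops the server immediately; call this when the program exits.
    void stop() {
        server.stop(0);
    }

    public static void main(String[] args) throws IOException {
        LocalLyricsServer app = new LocalLyricsServer();
        Runtime.getRuntime().addShutdownHook(new Thread(app::stop));
        System.out.println("Open " + app.start());
    }
}
```

    The launcher would hand the returned URL to the browser (for example through `java.awt.Desktop`'s `browse` method, also available since Java 6) and call `stop()` on exit. Binding to the loopback address keeps the server invisible to the rest of the network.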

    I'm going to have to stick with using Java. I think that will be your best bet.

    --
    Comment forecast: Bits of genius surrounded by a sea of mediocrity.
    1. Re:Java by jgrahn · · Score: 1
      I would say use Java. That way you don't have to recompile the application for every architecture.

      Yes, saving a few milliseconds of CPU time, once, is important.

      Seriously, if Java has any major benefits, that you don't have to compile it isn't one of them. Or at least it wouldn't be one if all computers came with a C compiler.

      Also, Java isn't the only programming language with this property. Python, Ruby, Perl, Tcl ... the only popular languages which normally need a compiler are C and C++.

  21. nice idea... by Chimera512 · · Score: 1

    but what's the point? Lyrics are copyrighted, and how useful is it to have "annotations" attached to a song? Is it that hard to just listen to the lyrics? Isn't that why we have music in the first place?

    1. Re:nice idea... by Gertlex · · Score: 1

      Gee... I dunno... Some of us have a hard time hearing?

      It seems to be a commonly expected thing that you know the words to various songs. It's extraordinarily hard to do that by ear with a hearing loss.

  22. Do it in another country by Anonymous Coward · · Score: 0

    Choose a country that has sane copyright, data, and privacy protection laws. The Netherlands, maybe. Not in the US.

  23. Re:Internationalization / UTF-8 by Anonymous Coward · · Score: 0

    How are you meant to encode the lyrics to, say, Baby Love Child, where it's mainly English, but with some Japanese? No single code page contains both English and Japanese characters.

  24. Good story about a Lyrics Server and the Lawyers by PixelJonah · · Score: 2, Interesting

    So, a friend of mine wrote one of the first online lyrics servers.

    Here's his story.

  25. Just don't do it by Anonymous Coward · · Score: 0

    Do we need another lyrics database? Hell no, we don't. I can't turn around without bumping into 3 or 4 of them. Every time I search for something on Google, it finds at least 10 lyrics sites with songs that include those words. Do you know how many artists wrote songs about elephantiasis, or psoriasis, or ranitidine????

    Seriously, search for any lyrics and you get hundreds of sites, all with the same spelling errors and typos. We don't need another one!!!!

  26. Re:Internationalization / UTF-8 by Kelson · · Score: 2, Informative

    Well, "script" doesn't really make sense in the context of your original post, but I'll take you at your word that you don't see the appeal of mixing scripts on one page.

    To start, I'll direct you to the Japanese codepage 932, which includes at least four scripts: the basic Latin alphabet, katakana, hiragana, and kanji. People seem to have thought it was necessary to be able to use all of those on one page, perhaps because Japanese tends to mix three of them together on a regular basis and likes to throw in English words for flavor. (No doubt, Latin characters helped to write computer programs as well.)

    Unicode just extends the principle so that you can do things like:

    • Aggregate titles from articles in multiple languages
    • Use one language for content and another for labels (or, in the case of the web, navigation)
    • Write something like a Japanese/Russian dictionary intended for readers, that displays words the way you would see them in actual Japanese or Russian text

    ...and so on. The Unicode character set is just a big flat space, just like ASCII except with a lot more code points.

    The point about internationalization perhaps shouldn't focus on UTF-8 specifically -- one could use UTF-16 instead -- but both encodings give you access to the Unicode character set, which allows you to, as you put it, "define the code page once per document."
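
    To make the UTF-8 versus UTF-16 point concrete, here is a small Java sketch (the class name and sample line are made up) that stores a line mixing English and Japanese katakana in one string and encodes it several ways. UTF-8 and UTF-16 both round-trip the text and differ only in size: 11 bytes versus 14 for this line, because ASCII takes one byte in UTF-8 while these katakana take three. A single-script code page such as ISO-8859-1 silently loses the katakana.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingDemo {
    // English plus two katakana characters (U+30E9 "ra", U+30D6 "bu") in one string.
    static final String LINE = "Love \u30E9\u30D6";

    // True if encoding and then decoding with the charset gives back the same text.
    static boolean roundTrips(String text, Charset charset) {
        return new String(text.getBytes(charset), charset).equals(text);
    }

    public static void main(String[] args) {
        System.out.println(LINE.getBytes(StandardCharsets.UTF_8).length);    // 11: 5 ASCII bytes + 2 x 3 bytes
        System.out.println(LINE.getBytes(StandardCharsets.UTF_16BE).length); // 14: 7 UTF-16 units x 2 bytes
        System.out.println(roundTrips(LINE, StandardCharsets.UTF_8));        // true
        System.out.println(roundTrips(LINE, StandardCharsets.UTF_16BE));     // true
        System.out.println(roundTrips(LINE, StandardCharsets.ISO_8859_1));   // false: katakana become '?'
    }
}
```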

  27. The difference between i18n and L10n by tepples · · Score: 1

    Nothing is internationalized by default. There is no magic that converts a program's English strings into their Traditional Chinese translations.

    "Internationalized" means capable of working with user data in multiple languages and does not imply ability to translate user data from one language to another. "Localized" means that the interface is available in more than one language.

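    A toy Java illustration of the distinction, using a hypothetical two-language label table (`LabelsDemo` and its keys are made up): the interface labels are localized, while the lyrics themselves are only internationalized, meaning they are stored and displayed in whatever script they arrived in and never translated.

```java
import java.util.Locale;
import java.util.Map;

final class LabelsDemo {
    // Localization (L10n): interface strings per language (hypothetical table).
    private static final Map<String, Map<String, String>> LABELS = Map.of(
            "en", Map.of("search", "Search", "lyrics", "Lyrics"),
            "fr", Map.of("search", "Rechercher", "lyrics", "Paroles"));

    // Looks up a label, falling back to English for untranslated languages.
    static String label(Locale locale, String key) {
        Map<String, String> table = LABELS.getOrDefault(locale.getLanguage(), LABELS.get("en"));
        return table.getOrDefault(key, key);
    }

    // Internationalization (i18n): the lyrics are user data and pass through
    // untouched in whatever script they use; nothing translates them.
    static String showSong(Locale locale, String lyrics) {
        return label(locale, "lyrics") + ": " + lyrics;
    }

    public static void main(String[] args) {
        System.out.println(label(Locale.FRENCH, "search"));      // Rechercher
        System.out.println(label(Locale.GERMAN, "search"));      // Search (no German table)
        System.out.println(showSong(Locale.FRENCH, "la la la")); // Paroles: la la la
    }
}
```
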
  28. Re:Good story about a Lyrics Server and the Lawyer by Kelson · · Score: 1

    Wow. Reading that I suddenly remembered my own experience providing a lyrics service on the web.

    Back in 1995, I put together a website that cross-referenced the lyrics to Les Misérables in English, French and German (all typed in by hand from the CD liner notes). At first it was hosted on webspace at AOL, but I later moved it to some space I had at college. From 1996 to 2000 I added songs in more and more languages, each time carefully cross-referencing and linking so that you could jump from each song straight to the same song in each other language. I had a modern French version (the original was considerably different from the show as it opened in London and on Broadway) in all caps, and a French speaker agreed to provide all the accents and diacritical marks. People sent me, sometimes one song at a time, lyrics in Hungarian, Norwegian, and Swedish. I tracked down import CDs of more languages that I could type myself. People even started sending me songs in Chinese and Japanese, first as GIF images, later in text. I learned a lot about cross-platform use of character encodings and fonts, and about website accessibility.

    After I graduated from college, a friend at the lab agreed to keep my site running for a few months while I found new hosting. In January 2000, I bought a domain name. In February, I transferred my entire website from www.arts.uci.edu to hyperborea.org. In March I received a cease-and-desist letter. Knowing I had no legal right to keep the lyrics online, I took the Les Mis section down that afternoon, leaving only the parts that weren't subject to copyright.

    Now, keep in mind that I ran this site for five years at AOL and UCI, making no effort to hide it. Within a month of setting up my own domain name, suddenly the lawyers were after me? It seemed too much of a coincidence.

    Even today, there are still pages on the net that link to "Les Mis: The Complete Multilingual Libretto." (Of course, many of them are Geocities sites that haven't been updated since 1997, or exported bookmarks files languishing on some university server.) And I still get the occasional request for lyrics by email.

  29. get yourself a lawyer... by 3.14159265 · · Score: 1

    Seriously!

  30. Troll my ass... by darken9999 · · Score: 1

    I disagree. Ask Slashdot used to be about specific questions on a technology or how to go about something. Lately, however, it's been one question after another that goes:

    "I'm working on this project that will be able to do X. How do I do it?"

    There's a big difference between learning how to do something and asking somebody else to figure it out for you.

    1. Re:Troll my ass... by AntonOnymous,Cowherd · · Score: 1

      Thanks to MBCook (the parent post to your reply) for backing me up a bit here. I would like to think that I have a clue, being somewhat technically proficient, though not completely up-to-date on all technologies. I'm sorry for offending your sensibilities with my question to Ask Slashdot, but I fail to see how it differs from a question about "how to go about something". I'm not asking for someone to do this for me (although it would be easier on me if they did :-). I do know my way around programming; I'm just not up-to-date on everything.

      My question was meant to elicit suggestions, from anyone who has done something similar, about where best to focus my energies. Why should I reinvent the wheel if someone else has already done this? Why shouldn't I learn from the experience of others rather than spending months going down blind alleys, only to either scrap everything or create some godawful monster that doesn't do what is wanted? As I said in my post, the data in this application is the important thing; I'd rather spend my time working on the data than figuring out the implementation.

      I'm sorry that you seem to feel I made a mistake in thinking that Slashdot was a place to go to learn how to do something. While the bulk of the answers have not been that useful, a few people have given me pointers that I will be following up on (many thanks to them), and I appreciate the time they have tried to save me from going down implementation dead ends.

      --
      ... a titanic intellect in a world of icebergs...
  31. Legal Fees. by paullyjunge · · Score: 1

    Don't get sued by the RIAA; many a lyrics website has been taken down for copyright infringement.

  32. Don't get sued. by Suppafly · · Score: 1

    Advice for Building a Multi-Platform Lyrics Database?

    Try not to get sued.

  33. The NMPA, NOT the RIAA by FlanaganMusic · · Score: 1

    Actually, it is the NMPA (National Music Publishers' Association) and the publishing companies, who shut down the lyrics databases. The RIAA has no jurisdiction over the use of lyrics.

    If you want to start such a database, my advice would be to lay the groundwork for it, software-wise. Then, contact the publishers to get lyric reprint licenses. It would be nice to say that the publishers would be happy to provide you with such licenses, but chances are they would be difficult to obtain, since you are not actually making a recording with said lyrics attached.

    To find out who owns the publishing rights (and therefore lyric reprint rights) to songs, at least in the U.S. you can search databases like ascap.com and bmi.com. You will find out quickly that the major music groups, i.e. Warner Brothers, EMI, Sony/BMG, and Universal, own a vast majority of these rights. ASCAP and BMI will provide you with contact information for the publishers.

    In short, get the licenses, or at least investigate how much it would cost to do so before you start such an undertaking. Chances are it will be more than you expected, but a lot less than court costs when you get sued for copyright infringement. Don't say I didn't warn you.

  34. Data, not program logic by jgrahn · · Score: 1
    Focus on the data first, then the program logic: the data format, the license, and the means of creation and distribution.

    You mention open source. What about the lyrics themselves? If you are the single provider of that CD or DVD, I don't care if the programs are open source or not. All I care about is that the data is in an open format so I can code against it myself. Closed-format content is useless to me.
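
    As one example of what an open, code-against-able format could look like, here is a hypothetical plain-text record (UTF-8, `Key: value` header lines, a blank line, then the lyrics verbatim) and a Java sketch that parses it. The field names and layout are assumptions for illustration, not an existing standard.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SongRecord {
    final Map<String, String> fields = new LinkedHashMap<>(); // header fields, in file order
    final String lyrics;                                      // everything after the blank line

    private SongRecord(String lyrics) {
        this.lyrics = lyrics;
    }

    // Parses the hypothetical format: "Key: value" lines, one blank line, then lyrics verbatim.
    static SongRecord parse(String text) {
        String normalized = text.replace("\r\n", "\n"); // tolerate Windows line endings
        int split = normalized.indexOf("\n\n");
        String header = split < 0 ? normalized : normalized.substring(0, split);
        SongRecord record = new SongRecord(split < 0 ? "" : normalized.substring(split + 2));
        for (String line : header.split("\n")) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                record.fields.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
            }
        }
        return record;
    }

    public static void main(String[] args) {
        SongRecord r = parse("Title: River Song\nArtist: Nobody\nTune: traditional\n\n"
                + "the river runs\ndown to the sea\n");
        System.out.println(r.fields); // {Title=River Song, Artist=Nobody, Tune=traditional}
        System.out.print(r.lyrics);
    }
}
```

    Reading a record from the disc would just be `new String(Files.readAllBytes(path), StandardCharsets.UTF_8)` passed to `parse`. The point of a format this dumb is that anyone can write the same parser in any language.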

  35. Re:Internationalization / UTF-8 by Cyberax · · Score: 1

    That's easy: a song in Russian with one author from Latvia. You won't be able to write the author's name in Latvian.

    Besides, there are about 6 Russian codepages: Win1251, KOI8-R, CP866, ISO, MacCyr, GOST-Cyr. What codepage are you going to use?
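
    The Latvian example is easy to check in Java: `CharsetEncoder.canEncode` reports whether a string fits in a given charset. A sketch with made-up sample text (a Cyrillic word and a Latvian name containing ā, U+0101); it assumes the JRE ships the `windows-1251` and `KOI8-R` charsets, which standard JDK builds normally include, and treats a missing charset as "can't encode".

```java
import java.nio.charset.Charset;

final class CodepageCheck {
    // True if every character of text can be represented in the named charset.
    static boolean canEncode(String charsetName, String text) {
        if (!Charset.isSupported(charsetName)) {
            return false; // charset not installed in this JRE
        }
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String title = "\u041F\u0435\u0441\u043D\u044F"; // Russian word for "song", in Cyrillic
        String author = "J\u0101nis";                   // Latvian given name with a-macron (U+0101)
        for (String cs : new String[] {"windows-1251", "KOI8-R", "UTF-8"}) {
            System.out.println(cs + ": title " + canEncode(cs, title) + ", author " + canEncode(cs, author));
        }
    }
}
```

    With either Russian codepage the title fits but the Latvian name does not; only a Unicode encoding holds both in one document.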