Slashdot Mirror


Is There A Standard for Software Metadata?

tagish asks: "I'm sitting here preparing some Java source to release under the GPL and wondering how best to tell people about what I'm doing. It seems to me that there's a compelling need for a simple, extensible standard for software metadata -- an agreed way of describing any piece of code, what it does, who made it, what license(s) it's available under, what platforms it supports, what it is compatible with and so on. The first question then is: does such a standard exist? And if it does, why is it not more popular?"

"It's one thing to make this stuff available, but if people can't find it I'm wasting my time. Of course there are places I can go to publicise what I've done (Freshmeat, Jars, Gamelan, and Servletcentral in this case) and those services perform a valuable function, but in practice it is still quite hard for someone to find some code in language X that performs function Y in a way that complies with constraint Z. There's no search engine that finds reusable code based on variable criteria and, given the number of incompatible ways source code can be packaged, described and distributed, little prospect of anyone building one.

Right now, when I release this code, if I want people to find it I have to:

  • write a description of it
  • set up a home page for it
  • register that page with numerous search engines possibly using the description I wrote
  • visit the appropriate repository and announcement sites making submissions at each
  • find out whether there's an appropriate usenet group and post to it

Assuming that such a standard doesn't exist, does anyone want to get together with me and devise one? I'm thinking of something (human) language independent, simple, capable of encompassing all types of code, and amenable to automatic processing. What about it?"

As much as I agree that something like that should exist, I believe that if you feel strongly about your code, then a home page is a must for your project (as well as writing descriptions about your project and registering it with search engines). A metadata standard would be a big help in this respect, but it's not going to be a replacement for going out there and spreading the word yourself as best you can.

With that said, what current data formats could be extended to serve as such a metadata standard? And if none of them is completely sufficient for this type of application, what would such a format need in order to be robust and flexible enough to serve this purpose?

5 of 92 comments

  1. Check out Project Meta. by Q*bert · · Score: 5
    It is a very ambitious project. The goal is to make a single format not only for project metadata but also for package metadata, abstracting over RPMs, debs, ports, and the like. The leaning is toward making it XML-based.

    The leader of the project, SF Perl Mongers' own Rich Morin, is being very circumspect about it, trying to gather lots of information from experts in different OSs and distributions, and of course working on it in his free time, so the product is not there now--but if you're interested in contributing to such an effort, this would be the place to help out.

    Vovida, OS VoIP
    Beer recipe: free! #Source
    Cold pints: $2 #Product

  2. The Linux Software map by Scorcher · · Score: 4

    "The LSM is a directory of information about each of the software packages available via FTP for the Linux operating system. It is meant to be a public information resource. All entries have been entered by volunteers all over the world via email using the template below..."

    ftp://ftp.execpc.com/pub/lsm/LSM.README

  3. Well... by Wdomburg · · Score: 5

    I'm not aware of one that is cross-platform, though there is one for Linux called the "Linux Software Map."

    The format includes the following fields:

    Title
    Version
    Entered-date
    Description
    Keywords
    Author
    Maintained-by
    Primary-site
    Alternate-site
    Original-site
    Platforms
    Copying-policy

    Given that there is a Platforms field, even though it is referred to as the *Linux* Software Map, it does meet most of the criteria that you mentioned.
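To make the "amenable to automatic processing" point concrete, here is a minimal sketch of reading an entry that uses the fields listed above. It assumes the usual LSM layout of `Field: value` lines wrapped in `Begin3`/`End` markers, with indented lines continuing the previous field. The function name `parse_lsm` and the sample package are made up for illustration.

```python
def parse_lsm(text):
    """Parse one LSM-style entry into a dict of field name -> value."""
    fields = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and the Begin/End markers that wrap an entry.
        if not stripped or stripped.startswith("Begin") or stripped == "End":
            continue
        if line[0].isspace() and current is not None:
            # Indented line: continuation of the previous field's value.
            fields[current] += " " + stripped
        elif ":" in line:
            name, _, value = line.partition(":")
            current = name.strip()
            fields[current] = value.strip()
    return fields


# Hypothetical entry; the package, author, and site do not exist.
SAMPLE = """\
Begin3
Title:          frobnicate
Version:        0.1
Entered-date:   01JAN00
Description:    A made-up package used only to illustrate the format.
                The second line continues the description.
Keywords:       example, metadata
Author:         someone@example.org (Some One)
Primary-site:   ftp.example.org /pub/frobnicate
Platforms:      Java 1.2
Copying-policy: GPL
End
"""

entry = parse_lsm(SAMPLE)
print(entry["Title"], entry["Copying-policy"])
```

Because every field is a plain key, a search tool could filter on `Platforms` or `Copying-policy` without understanding the free-text description.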

    Freshmeat, though not a format, is also a fairly comprehensive database of software which provides much the same information as you mentioned, including:

    Title
    Description
    Author
    Licence
    Category
    Download
    Packages
    Homepage
    Changelog

    Freshmeat, aside from providing updates on its site, also provides them as text files, which are suitable for simple automated parsing.

    Though neither solution is entirely perfect, both are definitely close to what you're looking for.

    What I would like to see is an SQL backend, with a simplified query engine on top of it that returns an XML-formatted document. This would take care of the extensibility portion of it, as fields could be added to the backend and the XML format without breaking compatibility with the client.

    Likewise, I would like the database to be available as a download so that mirrors and/or alternative front-ends could be created.

    (E.g. The search functions of Freshmeat aren't always flexible enough for me to easily pinpoint what I am looking for. I would definitely prefer being able to download a snapshot of the database and run custom SQL queries locally.)
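A rough sketch of that idea, assuming a local SQLite snapshot stands in for the SQL backend. The table layout, column names, and sample rows are invented for illustration and do not reflect Freshmeat's actual schema. Each result column becomes one child element, so a column added later simply appears as an extra element that older clients can ignore.

```python
import sqlite3
import xml.etree.ElementTree as ET


def query_as_xml(conn, sql, params=()):
    """Run a query and return the result rows as an XML string."""
    cursor = conn.execute(sql, params)
    columns = [col[0] for col in cursor.description]
    root = ET.Element("packages")
    for row in cursor:
        pkg = ET.SubElement(root, "package")
        for name, value in zip(columns, row):
            # One child element per column; column names must be valid XML names.
            ET.SubElement(pkg, name).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical snapshot of the database, held in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE packages (title TEXT, language TEXT, licence TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO packages VALUES (?, ?, ?, ?)",
    [
        ("frobnicate", "Java", "GPL", "Made-up servlet toolkit"),
        ("twiddle", "C", "BSD", "Made-up bit-twiddling library"),
    ],
)

# "Code in language X under constraint Z" expressed as a custom query.
xml_doc = query_as_xml(
    conn,
    "SELECT title, licence FROM packages WHERE language = ? ORDER BY title",
    ("Java",),
)
print(xml_doc)
```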

    In any case, freshmeat and lsm are likely your best choices for the time being.

  4. iBiblio's LSM and Dublin Core by Kanagawa · · Score: 4

    There are several such initiatives under way in the online library community -- librarians collect so much cruft, and since they tend never to throw things out, they feel an even stronger need than you for good metadata. The Dewey Decimal System is one such (very simple) metadata standard (sort of).

    Anyway, SunSITE.UNC.edu -- now iBiblio.org -- has for quite a long time required Linux developers uploading software to /incoming to include a Linux Software Map (LSM) file. The .lsm file is a basic metadata file in a fairly simple format. So, you might look at that: http://www.ibiblio.org/pub/Linux/LSM-TEMPLATE

    The Dublin Core initiative is a more generalized attempt to answer the question, "How do we standardize on a metadata format?" Dublin Core is using XML and XML DTDs as the basis for their work. It applies not only to software but also to other online resources. So, as one might guess, it's arcane and difficult to understand at best and completely impenetrable most of the time. You can find more about Dublin Core at http://www.purl.org/dc

    Sadly, most search engine companies focus on searching a specific kind of document type -- like HTML -- for arbitrary content. Interestingly, searching metadata is both an easier computational problem to solve and more productive for the user. Unfortunately, it's also a far more difficult social problem. Getting everyone to write common metadata is very, very difficult. Going back and writing metadata for any sizeable archive (say, iBiblio, for example) is a Herculean task. I think most of the coders who write search engines are more interested in the actual mathematics behind searching than they are in actual Document Retrieval.

    You might also check out http://www.cnidr.org, who were the authors of Isearch and some other good searching tools.
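As a small illustration of how the two formats relate, the sketch below turns a few LSM-style fields into Dublin Core elements. The element names (`title`, `creator`, `description`, `subject`, `date`, `rights`) and the namespace URI come from the Dublin Core element set. The mapping from LSM field names, the `metadata` wrapper element, and the sample values are assumptions made for this example, not part of either standard.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Assumed mapping from LSM field names to Dublin Core elements.
LSM_TO_DC = {
    "Title": "title",
    "Author": "creator",
    "Description": "description",
    "Keywords": "subject",
    "Entered-date": "date",
    "Copying-policy": "rights",
}


def lsm_to_dublin_core(entry):
    """Build an XML record holding the Dublin Core view of an LSM-style dict."""
    root = ET.Element("metadata")  # wrapper element name is arbitrary
    for lsm_name, dc_name in LSM_TO_DC.items():
        if lsm_name in entry:
            element = ET.SubElement(root, "{%s}%s" % (DC_NS, dc_name))
            element.text = entry[lsm_name]
    return root


# Hypothetical entry; "Platforms" has no obvious Dublin Core slot and is dropped.
record = lsm_to_dublin_core(
    {
        "Title": "frobnicate",
        "Author": "Some One",
        "Copying-policy": "GPL",
        "Platforms": "Java 1.2",
    }
)
print(ET.tostring(record, encoding="unicode"))
```

The dropped `Platforms` field shows the trade-off: a general-purpose vocabulary covers authorship and rights well, but software-specific details need an extension.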

    --
    "He wrested the world's whereabouts from the heavens And locked the secret in a pocketwatch." - Dava Sobel

  5. OSD from w3.org by eyeball · · Score: 4

    I believe this is what you want: Open Software Description Format (OSD) from w3.org.

    Abstract: This document provides an initial proposal for the Open Software Description (OSD) format. OSD, an application of the eXtensible Markup Language (XML), is a vocabulary used for describing software packages and their dependencies for heterogeneous clients. We expect OSD to be useful in automated software distribution environments.
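For a feel of what such a description looks like to a client, here is a minimal sketch that reads an OSD-style document. The element names (`SOFTPKG`, `TITLE`, `ABSTRACT`, `LICENSE`, `IMPLEMENTATION`, `OS`, `PROCESSOR`, `IMPLTYPE`, `CODEBASE`) are as recalled from the W3C note and should be checked against it. The package, URLs, and the `summarize_osd` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; the package and URLs do not exist.
OSD_SAMPLE = """\
<SOFTPKG NAME="org.example.frobnicate" VERSION="1,0,0,0">
  <TITLE>Frobnicate</TITLE>
  <ABSTRACT>Made-up package used to illustrate the format.</ABSTRACT>
  <LICENSE HREF="http://www.example.org/frobnicate/license.html"/>
  <IMPLEMENTATION>
    <OS VALUE="WinNT"/>
    <PROCESSOR VALUE="x86"/>
    <CODEBASE HREF="http://www.example.org/frobnicate/win32.cab"/>
  </IMPLEMENTATION>
  <IMPLEMENTATION>
    <IMPLTYPE VALUE="Java"/>
    <CODEBASE HREF="http://www.example.org/frobnicate/frobnicate.jar"/>
  </IMPLEMENTATION>
</SOFTPKG>
"""


def summarize_osd(xml_text):
    """Extract the package identity and per-platform download locations."""
    root = ET.fromstring(xml_text)
    implementations = []
    for impl in root.findall("IMPLEMENTATION"):
        os_el = impl.find("OS")
        codebase = impl.find("CODEBASE")
        implementations.append(
            {
                # None means the implementation does not name an OS.
                "os": os_el.get("VALUE") if os_el is not None else None,
                "codebase": codebase.get("HREF") if codebase is not None else None,
            }
        )
    return {
        "name": root.get("NAME"),
        "version": root.get("VERSION"),
        "title": root.findtext("TITLE"),
        "implementations": implementations,
    }


info = summarize_osd(OSD_SAMPLE)
print(info["title"], len(info["implementations"]))
```

A client would pick the first `IMPLEMENTATION` whose constraints it satisfies and fetch that `CODEBASE`, which is the "heterogeneous clients" part of the abstract.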

    --

    _______
    2B1ASK1