Slashdot Mirror


Is There A Standard for Software Metadata?

tagish asks: "I'm sitting here preparing some Java source to release under the GPL and wondering how best to tell people about what I'm doing. It seems to me that there's a compelling need for a simple, extensible standard for software metadata -- an agreed way of describing any piece of code, what it does, who made it, what license(s) it's available under, what platforms it supports, what it is compatible with and so on. The first question then is: does such a standard exist? And if it does why is it not more popular?"

"It's one thing to make this stuff available, but if people can't find it I'm wasting my time. Of course there are places I can go to publicise what I've done (Freshmeat, Jars, Gamelan, and Servletcentral in this case) and those services perform a valuable function, but in practice it is still quite hard for someone to find some code in language X that performs function Y in a way that complies with constraint Z. There's no search engine that finds reusable code based on variable criteria and, given the number of incompatible ways source code can be packaged, described and distributed, little prospect of anyone building one.

Right now, when I release this code, if I want people to find it I have to:

  • write a description of it
  • set up a home page for it
  • register that page with numerous search engines possibly using the description I wrote
  • visit the appropriate repository and announcement sites making submissions at each
  • find out whether there's an appropriate usenet group and post to it

Assuming that such a standard doesn't exist, does anyone want to get together with me and devise one? I'm thinking of something that is (human) language independent, simple, capable of encompassing all types of code, and amenable to automatic processing. What about it?"

As much as I agree that something like that should exist, I believe that if you feel strongly about your code, then a home page is a must for your project (as well as writing descriptions about your project and registering it with search engines). A metadata standard would be a big help in this respect, but it's not going to be a replacement for going out there and spreading the word yourself as best you can.

With that said, what current data formats could be extended to serve as such a metadata standard? And if none of them is sufficient for this type of application, what would such a format need in order to be robust and flexible enough to serve this purpose?

10 of 92 comments

  1. Check out Project Meta. by Q*bert · · Score: 5
    It is a very ambitious project. The goal is to make a single format not only for project metadata but also for package metadata, abstracting over RPMs, debs, ports, and the like. The leaning is toward making it XML-based.

    The leader of the project, SF Perl Mongers' own Rich Morin, is being very circumspect about it, trying to gather lots of information from experts in different OSs and distributions, and of course working on it in his free time, so the product is not there now--but if you're interested in contributing to such an effort, this would be the place to help out.

    Vovida, OS VoIP
    Beer recipe: free! #Source
    Cold pints: $2 #Product

  2. The Linux Software map by Scorcher · · Score: 4

    "The LSM is a directory of information about each of the software packages available via FTP for the Linux operating system. It is meant to be a public information resource. All entries have been entered by volunteers all over the world via email using the template below..."

    ftp://ftp.execpc.com/pub/lsm/LSM.README

  3. Well... by Wdomburg · · Score: 5

    I'm not aware of one that is cross-platform, though there is one for Linux called the "Linux Software Map."

    The format includes the following fields:

    Title
    Version
    Entered-date
    Description
    Keywords
    Author
    Maintained-by
    Primary-site
    Alternate-site
    Original-site
    Platforms
    Copying-policy
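    As a rough illustration of how easy a field list like this is to process automatically, here is a minimal Java sketch that reads one LSM entry into an ordered map. It assumes the LSM v3 layout (a `Begin3` line, `Field: value` lines, indented continuation lines, and a closing `End` line); check that against LSM.README before relying on it. The class `LsmEntry` and the sample entry are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: parses a single LSM-style entry into field/value pairs. */
final class LsmEntry {
    private LsmEntry() {}

    /**
     * Assumed layout: "Begin3", then "Field: value" lines, continuation lines
     * starting with whitespace, and a closing "End" line.
     */
    static Map<String, String> parse(String text) {
        Map<String, String> fields = new LinkedHashMap<>();
        String current = null;
        for (String line : text.split("\\R")) {
            if (line.startsWith("Begin")) continue;   // e.g. "Begin3"
            if (line.trim().equals("End")) break;     // end of this entry
            if (!line.isEmpty() && Character.isWhitespace(line.charAt(0)) && current != null) {
                // Continuation line: append to the previous field's value
                fields.merge(current, " " + line.trim(), String::concat);
            } else {
                int colon = line.indexOf(':');
                if (colon < 0) continue;              // skip blank or malformed lines
                current = line.substring(0, colon).trim();
                fields.put(current, line.substring(colon + 1).trim());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "Begin3",
            "Title:          hypothetical-tool",
            "Version:        0.1",
            "Description:    A made-up example entry",
            "                spanning two lines.",
            "Platforms:      Java 1.2",
            "Copying-policy: GPL",
            "End");
        LsmEntry.parse(sample).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

    Because the keys are kept in file order, the same map could be re-emitted as an LSM entry or translated into another format without losing fields the parser doesn't know about.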

    Given that there is a Platforms field, despite it being referred to as the *Linux* Software Map, it meets most of the criteria you mentioned.

    Freshmeat, though not a format, is also a fairly comprehensive database of software which provides much the same information as you mentioned, including:

    Title
    Description
    Author
    Licence
    Category
    Download
    Packages
    Homepage
    Changelog

    Freshmeat, aside from providing updates on its site, also provides them as text files, which are suitable for simple automated parsing.

    Though neither solution is entirely perfect, both are definitely close to what you're looking for.

    What I would like to see is an SQL backend with a simplified query engine on top of it that returns an XML-formatted document. This would take care of extensibility, since fields could be added to the backend and to the XML format without breaking compatibility with the client.

    Likewise, I would like the database to be available as a download, so that mirrors and/or alternative front-ends could be created.

    (For example, Freshmeat's search functions aren't always flexible enough for me to easily pinpoint what I am looking for. I would definitely prefer to download a snapshot of the database and run custom SQL queries locally.)
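    A toy version of that proposal, sketched in Java under stated assumptions: records are plain field maps standing in for rows from the SQL backend, the query is a simple case-insensitive field-equals filter, and every field of a matching record is written out as an XML element. The class `MetadataQuery` and the element names `results` and `package` are hypothetical, and field names are assumed to be valid XML element names.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical query layer: filters metadata records and returns them as XML. */
final class MetadataQuery {
    private MetadataQuery() {}

    /** Returns every record whose fields match all of the given key/value criteria. */
    static String query(List<Map<String, String>> records, Map<String, String> criteria) {
        StringBuilder xml = new StringBuilder("<results>\n");
        for (Map<String, String> record : records) {
            boolean matches = criteria.entrySet().stream()
                .allMatch(c -> c.getValue().equalsIgnoreCase(record.get(c.getKey())));
            if (!matches) continue;
            xml.append("  <package>\n");
            // Emit every field, so a new backend column simply appears as a new element.
            record.forEach((name, value) -> xml.append("    <").append(name).append('>')
                .append(escape(value)).append("</").append(name).append(">\n"));
            xml.append("  </package>\n");
        }
        return xml.append("</results>\n").toString();
    }

    /** Minimal escaping for XML text content. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        List<Map<String, String>> records = List.of(
            Map.of("Title", "hypothetical-servlet", "Language", "Java", "Licence", "GPL"),
            Map.of("Title", "hypothetical-tool", "Language", "Perl", "Licence", "Artistic"));
        System.out.print(query(records, Map.of("Language", "java", "Licence", "GPL")));
    }
}
```

    A client that looks up elements by name can skip ones it doesn't recognise, which is what keeps older clients working when the backend grows a field such as Platforms.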

    In any case, Freshmeat and the LSM are likely your best choices for the time being.

  4. iBiblio's LSM and Dublin Core by Kanagawa · · Score: 4

    There are several such initiatives under way in the online library community. Librarians collect so much cruft, and since they tend never to throw things out, they feel an even stronger need for good metadata than you do. The Dewey Decimal System is one such (very simple) metadata standard, sort of.

    Anyway, SunSITE.UNC.edu (now iBiblio.org) has for quite a long time required Linux developers uploading software to /incoming to include a Linux Software Map (.lsm) file. The .lsm file is a basic metadata file in a fairly simple format, so you might look at that: http://www.ibiblio.org/pub/Linux/LSM-TEMPLATE

    The Dublin Core initiative is a more general attempt to answer the question, "How do we standardize on a metadata format?" Dublin Core uses XML and XML DTDs as the basis for its work, and it applies not only to software but also to other online resources. So, as one might guess, it's arcane and difficult to understand at best, and completely impenetrable most of the time. You can find more about Dublin Core at http://www.purl.org/dc

    Sadly, most search engine companies focus on searching a specific document type, such as HTML, for arbitrary content. Interestingly, searching metadata is both an easier computational problem and more productive for the user. Unfortunately, it's also a far more difficult social problem: getting everyone to write common metadata is very, very difficult, and going back to write metadata for any sizeable archive (iBiblio, for example) is a Herculean task. I think most of the coders who write search engines are more interested in the mathematics behind searching than in actual document retrieval. You might also check out http://www.cnidr.org, who were the authors of Isearch and some other good search tools.
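    As a rough sketch of how Dublin Core could describe a software package on its home page, the Java snippet below emits HTML `<meta>` tags using the `DC.element` naming convention from RFC 2731. The element names used (title, creator, description, rights, identifier) come from the Dublin Core element set; the class `DublinCoreMeta` and all values are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that renders Dublin Core elements as HTML meta tags. */
final class DublinCoreMeta {
    private DublinCoreMeta() {}

    static String toHtml(Map<String, String> elements) {
        // Declares the "DC" prefix, then one meta tag per element, in insertion order.
        StringBuilder html = new StringBuilder(
            "<link rel=\"schema.DC\" href=\"http://purl.org/dc/elements/1.1/\">\n");
        elements.forEach((element, value) -> html.append("<meta name=\"DC.")
            .append(element).append("\" content=\"").append(escapeAttr(value)).append("\">\n"));
        return html.toString();
    }

    /** Minimal escaping for a double-quoted HTML attribute value. */
    private static String escapeAttr(String s) {
        return s.replace("&", "&amp;").replace("\"", "&quot;").replace("<", "&lt;");
    }

    public static void main(String[] args) {
        Map<String, String> dc = new LinkedHashMap<>();
        dc.put("title", "Hypothetical Servlet Toolkit");
        dc.put("creator", "A. N. Author");
        dc.put("description", "Servlet utilities for Java 2");
        dc.put("rights", "GNU General Public License");
        dc.put("identifier", "http://www.example.org/hypothetical-servlet/");
        System.out.print(toHtml(dc));
    }
}
```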

    --
    "He wrested the world's whereabouts from the heavens And locked the secret in a pocketwatch." - Dava Sobel
  5. RDF - Resource Description Framework by Christopher B. Brown · · Score: 3
    W3C Resource Description Framework is the nearest thing to what you want; see also RDF and Metadata by Tim Bray.

    The most notable places where RDF is presently used for real things (as opposed to "we'd like it to be used here vaporware") include:

    The latter is exceedingly relevant, as it represents an encoding of metadata about Linux software packages in RDF form.
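    To show what "metadata in RDF form" amounts to underneath the XML syntax, here is a small Java sketch that writes statements about a package as subject-predicate-object triples in the N-Triples line format, reusing Dublin Core property URIs as predicates. The class `Triples`, the package URI, and the values are hypothetical, and the literal escaping covers only backslashes, quotes, and line breaks.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: collects RDF statements and prints them as N-Triples. */
final class Triples {
    private static final String DC = "http://purl.org/dc/elements/1.1/";
    private final List<String> lines = new ArrayList<>();

    /** Adds one statement whose object is a plain string literal. */
    Triples literal(String subjectUri, String dcProperty, String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"")
            .replace("\n", "\\n").replace("\r", "\\r");
        lines.add("<" + subjectUri + "> <" + DC + dcProperty + "> \"" + escaped + "\" .");
        return this;
    }

    /** One statement per line, each terminated by " ." as N-Triples requires. */
    String toNTriples() {
        return String.join("\n", lines) + "\n";
    }

    public static void main(String[] args) {
        String pkg = "http://www.example.org/packages/hypothetical-tool-0.1";
        System.out.print(new Triples()
            .literal(pkg, "title", "hypothetical-tool")
            .literal(pkg, "creator", "A. N. Author")
            .literal(pkg, "rights", "GPL")
            .toNTriples());
    }
}
```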

    --
    If you're not part of the solution, you're part of the precipitate.
  6. Re:But I thought... by Ross C. Brackett · · Score: 3

    Puh-lease. The kinds of metadata described need more structure than a README can provide. Perhaps he should look into the NFO file format. It's human-readable, infinitely extensible and much more k-rad. If you're interested in learning about this exciting new format, visit the NFO consortium to view their library of sample implementations.

  7. Poor javadoc by fm6 · · Score: 3
    I find it kind of ironic that this question is asked in connection with Java software. The specifications for both Java and Java 2 include conventions for software metadata: Javadoc comments. These do not support all the information Tagish wants to record, but they do support a lot of it. You can argue that Javadoc is for APIs, not for programs -- but in the Java world, a program is just a class that's meant to be called from a command line launcher.
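    To make the point concrete: Javadoc block tags such as `@author`, `@version`, and `@since` are already machine-readable if the source is treated as text. The Java sketch below pulls them out of the first `/** ... */` comment with regular expressions; the class `JavadocTags` is hypothetical, and a real tool would use the javadoc Doclet API rather than a regex. There is no standard Javadoc tag for a license, which is one of the gaps noted above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts block tags from the first Javadoc comment in a source file. */
final class JavadocTags {
    private static final Pattern COMMENT = Pattern.compile("/\\*\\*(.*?)\\*/", Pattern.DOTALL);
    // A block tag starts its line, optionally after the leading " * ".
    private static final Pattern TAG =
        Pattern.compile("^[ \\t]*\\*?[ \\t]*@(\\w+)[ \\t]+(.+?)[ \\t]*$", Pattern.MULTILINE);

    private JavadocTags() {}

    /** Maps each tag name to its values in order of appearance (tags such as @author may repeat). */
    static Map<String, List<String>> extract(String source) {
        Map<String, List<String>> tags = new LinkedHashMap<>();
        Matcher comment = COMMENT.matcher(source);
        if (!comment.find()) return tags;
        Matcher tag = TAG.matcher(comment.group(1));
        while (tag.find()) {
            tags.computeIfAbsent(tag.group(1), k -> new ArrayList<>()).add(tag.group(2));
        }
        return tags;
    }

    public static void main(String[] args) {
        String source = String.join("\n",
            "/**",
            " * Servlet utilities (hypothetical example).",
            " *",
            " * @author Jane Doe",
            " * @version 0.1",
            " * @since 1.2",
            " */",
            "public class Example {}");
        extract(source).forEach((name, values) -> System.out.println(name + ": " + values));
    }
}
```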

    Perhaps people find the Javadoc Conventions to be just a little confusing?

    (Anybody who knows me knows I have a personal bias on things Javadoc. Probably not worth discussing on Slashdot. I mention it just to keep myself honest.)

  8. That's all you do? by DeadSea · · Score: 3
    • write a description of it
    • set up a home page for it
    • register that page with numerous search engines possibly using the description I wrote
    • visit the appropriate repository and announcement sites making submissions at each
    • find out whether there's an appropriate usenet group and post to it
    How about
    • Put it in your slashdot sig.
  9. OSD from w3.org by eyeball · · Score: 4

    I believe this is what you want: Open Software Description Format (OSD) from w3.org.

    Abstract: This document provides an initial proposal for the Open Software Description (OSD) format. OSD, an application of the eXtensible Markup Language (XML), is a vocabulary used for describing software packages and their dependencies for heterogeneous clients. We expect OSD to be useful in automated software distribution environments.
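    For a feel of how a client might consume such a description, here is a minimal Java sketch using the standard DOM parser to list a package's supported OS values and its dependencies. The element and attribute names (`SOFTPKG` with `NAME`/`VERSION`, `IMPLEMENTATION`, `OS` with `VALUE`, `DEPENDENCY`) follow my reading of the OSD note and should be checked against it; the class `OsdReader` and the sample package are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical reader for a small subset of an OSD-style package description. */
final class OsdReader {
    private OsdReader() {}

    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    /** Lists "NAME VERSION" for every SOFTPKG nested inside a DEPENDENCY element. */
    static List<String> dependencies(String osdXml) {
        List<String> deps = new ArrayList<>();
        NodeList dependencyNodes = parse(osdXml).getElementsByTagName("DEPENDENCY");
        for (int i = 0; i < dependencyNodes.getLength(); i++) {
            NodeList pkgs = ((Element) dependencyNodes.item(i)).getElementsByTagName("SOFTPKG");
            for (int j = 0; j < pkgs.getLength(); j++) {
                Element pkg = (Element) pkgs.item(j);
                deps.add(pkg.getAttribute("NAME") + " " + pkg.getAttribute("VERSION"));
            }
        }
        return deps;
    }

    /** Collects the VALUE attribute of every OS element, i.e. the supported platforms. */
    static List<String> platforms(String osdXml) {
        List<String> values = new ArrayList<>();
        NodeList osNodes = parse(osdXml).getElementsByTagName("OS");
        for (int i = 0; i < osNodes.getLength(); i++) {
            values.add(((Element) osNodes.item(i)).getAttribute("VALUE"));
        }
        return values;
    }

    public static void main(String[] args) {
        String osd = String.join("\n",
            "<SOFTPKG NAME=\"com.example.hypothetical\" VERSION=\"1,0,0,0\">",
            "  <TITLE>Hypothetical Tool</TITLE>",
            "  <IMPLEMENTATION><OS VALUE=\"WinNT\"/></IMPLEMENTATION>",
            "  <IMPLEMENTATION><OS VALUE=\"Win95\"/></IMPLEMENTATION>",
            "  <DEPENDENCY><SOFTPKG NAME=\"com.example.runtime\" VERSION=\"1,1,0,0\"/></DEPENDENCY>",
            "</SOFTPKG>");
        System.out.println("Platforms: " + platforms(osd));
        System.out.println("Depends on: " + dependencies(osd));
    }
}
```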

    --

    _______
    2B1ASK1
  10. Metadata, URI, mirrors etc..... by Alesha · · Score: 3
    Sorry for quoting myself (from the TERENA Technical Report FTP Mirror Tracker):
    Unfortunately, there is still no coherent architecture for mirroring, or for mirror sites to register their collections with the sites they mirror. In fact, we lack even a common (de facto) standard for recording this replication information in a machine-readable format. Some progress was made on this a few years ago by the Internet Engineering Task Force's [1] working group on Internet Anonymous FTP Archives, with the creation of the so-called IAFA Templates [2]. These provided a simple machine-readable format for recording per-resource or per-collection metadata, which could easily be created by hand or programmatically. Although support for IAFA templates was integrated into some software packages, e.g. the ALIWEB search engine [3] and the ROADS resource discovery system [4], this approach never became successful on a large scale. The World Wide Web Consortium's Resource Description Framework (RDF) [5] and the Dublin Core metadata effort [6] may eventually provide a viable machine-readable interchange format.

    Currently, the database underlying the freshmeat.net weblog [7] is perhaps the closest thing we have to a genuine mirror registry, though it focuses almost exclusively on software packages and operating system distributions and offers only limited mirror information. RDF is also being used in this capacity as part of rpmfind.net [8], although mirror information is very limited in this case too. The Internet Engineering Task Force's Uniform Resource Names effort [9] is also relevant here, since it would be very useful to have persistent, location-independent names for these collections of replicated resources.

    [1] http://www.ietf.org/ Internet Engineering Task Force website
    [2] http://info.webcrawler.com/mak/projects/iafa/ IAFA Working Group & IAFA Templates homepage
    [3] http://aliweb.emnet.co.uk/ ALIWEB website
    [4] http://roads.opensource.ac.uk/ ROADS website
    [5] http://www.w3.org/RDF/ World Wide Web Consortium Resource Description Framework (RDF) homepage
    [6] http://purl.org/dc/ Dublin Core website
    [7] http://freshmeat.net/ freshmeat.net website, P. Lenz & Andover Advanced Technologies, Inc.
    [8] http://rpmfind.net/ rpmfind.net website
    [9] RFC 1737, Functional Requirements for Uniform Resource Names, K. Sollins & L. Masinter, December 1994

    Another attempt to create a framework for such metadata was the "Open-Software-Index" that Oliver Maruhn and I tried to create almost two years ago. After that document, a discussion started (code name "Russian Freshmeat"), but it shifted mostly to the localisation of such metadata. Unfortunately, no working code was produced.

    Finally, something somewhat less relevant to the topic.

    This kind of metadata should be extremely valuable for implementing URI resolution services, and particularly for I2C (URI to URC) requests. Quoting RFC 2483:

    "Uniform Resource Characteristics are descriptions of resources. This request allows the client to obtain a description of the resource identified by a URI, as opposed to the resource itself or simply the resource's URLs. The description might be a bibliographic citation, a digital signature, or a revision history. This memo does not specify the content of any response to a URC request. That content is expected to vary from one server to another."
    Hopefully we already have a mechanism for I2L(s) (URI to URL) requests: the FTP Mirror Tracker.
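    To illustrate the division of labour described here, a toy in-memory resolver in Java: the I2Ls-style request answers "where can I get this?" from a list of mirrors, and the I2C-style request answers "what is this?" from a metadata description. Only the request names I2L, I2Ls, and I2C come from RFC 2483; the class `ToyResolver`, the URN, and the mirror URLs are hypothetical, and a real resolver would answer over a network protocol rather than from a local map.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory stand-in for a URN resolver offering I2L(s) and I2C requests. */
final class ToyResolver {
    private final Map<String, List<String>> mirrors = new HashMap<>();
    private final Map<String, String> descriptions = new HashMap<>();

    /** Registers a persistent name with its current mirror URLs and a description. */
    void register(String urn, List<String> mirrorUrls, String description) {
        mirrors.put(urn, List.copyOf(mirrorUrls));
        descriptions.put(urn, description);
    }

    /** I2Ls: all known locations for the name (empty if the name is unknown). */
    List<String> i2ls(String urn) {
        return mirrors.getOrDefault(urn, List.of());
    }

    /** I2L: the first known location, if any. */
    Optional<String> i2l(String urn) {
        return i2ls(urn).stream().findFirst();
    }

    /** I2C: a description of the resource rather than the resource itself. */
    Optional<String> i2c(String urn) {
        return Optional.ofNullable(descriptions.get(urn));
    }

    public static void main(String[] args) {
        ToyResolver r = new ToyResolver();
        r.register("urn:example:hypothetical-tool-0.1",
            List.of("ftp://mirror-a.example.org/pub/tool-0.1.tar.gz",
                    "ftp://mirror-b.example.org/pub/tool-0.1.tar.gz"),
            "Title: hypothetical-tool; Version: 0.1; Copying-policy: GPL");
        System.out.println(r.i2ls("urn:example:hypothetical-tool-0.1"));
        System.out.println(r.i2c("urn:example:hypothetical-tool-0.1").orElse("unknown"));
    }
}
```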