Is There A Standard for Software Metadata?
"It's one thing to make this stuff available, but if people can't find it I'm wasting my time. Of course there are places I can go to publicise what I've done (Freshmeat, Jars, Gamelan, and Servletcentral in this case) and those services perform a valuable function, but in practice it is still quite hard for someone to find some code in language X that performs function Y in a way that complies with constraint Z. There's no search engine that finds reusable code based on variable criteria and, given the number of incompatible ways source code can be packaged, described and distributed, little prospect of anyone building one.
Right now, when I release this code, if I want people to find it I have to:
- write a description of it
- set up a home page for it
- register that page with numerous search engines possibly using the description I wrote
- visit the appropriate repository and announcement sites making submissions at each
- find out whether there's an appropriate usenet group and post to it
Assuming that such a standard doesn't exist, does anyone want to get together with me and devise one? I'm thinking of something (human) language independent, simple, capable of encompassing all types of code, and amenable to automatic processing. What about it?"
As much as I agree that something like that should exist, I believe that if you feel strongly about your code, then a home page is a must for your project (as well as writing descriptions about your project and registering it with search engines). A metadata standard would be a big help in this respect, but it's not going to be a replacement for going out there and spreading the word yourself as best you can.
With that said, what current data formats could be extended to serve as such a metadata standard? And if none of them is completely sufficient for this type of application, what would such a format need in order to be robust and flexible enough to serve this purpose?
Got Rhinos?
Sounds like a good idea
XML would be the current standard to start with; you'd just need to develop a schema that contains the data you want to share. It'd definitely help code repositories.
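As a rough illustration of the "XML plus a small schema" idea, here is a minimal sketch in Python. The element names (software, name, language, license, description) are assumptions made up for this example, not part of any existing standard.

```python
import xml.etree.ElementTree as ET

def build_record(fields):
    """Build a <software> element from a dict of field name -> text.

    The root element name and the field names are hypothetical.
    """
    root = ET.Element("software")
    for name, value in fields.items():
        child = ET.SubElement(root, name)
        child.text = value
    return root

record = build_record({
    "name": "example-servlet",
    "language": "Java",
    "license": "GPL",
    "description": "A placeholder description.",
})

# Serialize to text, as a repository or crawler might store it.
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)

# Any XML parser can read the fields back by name.
parsed = ET.fromstring(xml_text)
print(parsed.findtext("language"))
```

Because consumers look fields up by name, new elements could be added later without breaking older readers, which is most of the appeal of starting from XML.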
Trolls throughout history:
Jonathan Swift
Freshmeat.net is good enough for me.
--
This same story was on kuro5hin with exactly the same submission text.
Methinks someone's trying to be funny.
$PROGRAM_NAME:$LICENSE_NAME:(Commercial|Shareware|Freeware|Semi-free|Free|Public-Domain):$AUTHOR:$WEBSITE:$EMAIL
Then hack something together with cpio to include it with your program.
See? Problem solved.
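For what it's worth, a one-line record like that is easy to parse, with one catch: the website field will usually contain a colon of its own (http://...). A minimal sketch in Python, assuming exactly the six fields shown above and that only the website may contain colons; the sample values are made up:

```python
from typing import NamedTuple

class Entry(NamedTuple):
    name: str
    license: str
    category: str
    author: str
    website: str
    email: str

def parse_entry(line: str) -> Entry:
    """Parse one colon-delimited record.

    The first four fields are split from the left and the e-mail
    address from the right, so colons inside the URL survive.
    """
    head = line.strip().split(":", 4)
    if len(head) != 5:
        raise ValueError("expected six colon-separated fields")
    name, license_name, category, author, rest = head
    website, sep, email = rest.rpartition(":")
    if not sep:
        raise ValueError("missing website or e-mail field")
    return Entry(name, license_name, category, author, website, email)

entry = parse_entry(
    "example-tool:GPL:Free:Jane Doe:http://example.org/tool:jane@example.org"
)
print(entry.website)
print(entry.email)
```

The need for that left/right split trick is a small hint at why a format with explicit field names tends to age better than a positional one.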
The .lsm files have been standard (find them on SunSite, or whatever it's called this week), and I suppose Freshmeat's format is pretty standard on the web, too...
---
pb Reply or e-mail; don't vaguely moderate.
The leader of the project, SF Perl Mongers' own Rich Morin, is being very circumspect about it, trying to gather lots of information from experts in different OSs and distributions, and of course working on it in his free time, so the product is not there now--but if you're interested in contributing to such an effort, this would be the place to help out.
Vovida, OS VoIP
Beer recipe: free! #Source
Cold pints: $2 #Product
"The LSM is a directory of information about each of the software packages available via FTP for the Linux operating system. It is meant to be a public information resource. All entries have been entered by volunteers all over the world via email using the template below..."
ftp://ftp.execpc.com/pub/lsm/LSM.README
I'm not aware of one that is cross-platform, though there is one for Linux called the "Linux Software Map."
The format includes the following fields:
Title
Version
Entered-date
Description
Keywords
Author
Maintained-by
Primary-site
Alternate-site
Original-site
Platforms
Copying-policy
Given that there is a Platforms field, despite it being referred to as the *Linux* Software Map, this does qualify on most of the criteria that you mentioned.
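To show how little machinery a format like this needs, here is a minimal Python sketch that reads a record using the field names listed above. It assumes a simplified "Field: value" layout with indented continuation lines; the real LSM template has its own framing and rules, so treat this as an approximation, and the sample entry as invented.

```python
def parse_fields(text):
    """Parse 'Field: value' lines into a dict.

    Lines that start with whitespace are treated as continuations
    of the previous field (an assumption of this sketch).
    """
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace() and current is not None:
            fields[current] += " " + line.strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            current = key.strip()
            fields[current] = value.strip()
    return fields

sample = """\
Title: example-tool
Version: 1.0
Description: A placeholder entry used to
             illustrate continuation lines.
Primary-site: ftp.example.org /pub/example
Platforms: Linux
Copying-policy: GPL
"""

entry = parse_fields(sample)
print(entry["Title"], entry["Copying-policy"])
print(entry["Description"])
```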
Freshmeat, though not a format, is also a fairly comprehensive database of software which provides much the same information as you mentioned, including:
Title
Description
Author
Licence
Category
Download
Packages
Homepage
Changelog
Freshmeat, aside from providing updates on its site, also provides them via text files, which are suitable for simple automated parsing.
Though neither solution is entirely perfect, both are definitely close to what you're looking for.
What I would like to see is an SQL backend, with a simplified query engine on top of it that returns an XML-formatted document. This would take care of the extensibility portion of it, as fields could be added to the backend and XML format without breaking compatibility with the client.
Likewise, I would like the database to be available as a download so that mirrors could be created and/or alternative front-ends.
(E.g. The search functions of Freshmeat aren't always flexible enough for me to easily pinpoint what I am looking for. I would definitely prefer being able to download a snapshot of the database and run custom SQL queries locally.)
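A rough sketch of what that could look like, using Python's built-in sqlite3 module as a stand-in for the SQL backend. The table layout, column names, and sample rows are all assumptions invented for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# In-memory database standing in for a downloaded snapshot.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE packages (title TEXT, language TEXT, license TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO packages VALUES (?, ?, ?, ?)",
    [
        ("servlet-kit", "Java", "GPL", "Placeholder servlet helpers"),
        ("img-conv", "C", "BSD", "Placeholder image converter"),
        ("jlogview", "Java", "GPL", "Placeholder log viewer"),
    ],
)

def query_as_xml(conn, sql, params=()):
    """Run a query and return the rows as an XML document (text).

    Each column becomes a child element named after the column, so
    adding columns later does not disturb clients that look up
    elements by name.
    """
    cur = conn.execute(sql, params)
    columns = [d[0] for d in cur.description]
    root = ET.Element("results")
    for row in cur:
        pkg = ET.SubElement(root, "package")
        for col, val in zip(columns, row):
            ET.SubElement(pkg, col).text = "" if val is None else str(val)
    return ET.tostring(root, encoding="unicode")

xml_out = query_as_xml(
    conn,
    "SELECT title, license FROM packages WHERE language = ? ORDER BY title",
    ("Java",),
)
print(xml_out)
```

With a local snapshot, the "custom SQL queries" part is just whatever statement you pass in; the XML layer only matters for clients that talk to the server remotely.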
In any case, freshmeat and lsm are likely your best choices for the time being.
There are several such initiatives under way in the on-line library community -- librarians collect so much cruft and since they tend never to throw things out, they feel an even stronger need than you for good metadata. Dewey Decimal System is one such (very simple) metadata standard (sort of). Anyway, SunSITE.UNC.edu -- now iBiblio.org -- has required Linux developers uploading software to /incoming to include a Linux Software Map (lsm) file for quite a long time now. The .lsm file is a basic metadata file in a fairly simple format. So, you might look at that: http://www.ibiblio.org/pub/Linux/LSM-TEMPLATE
The Dublin Core initiative is a more generalized attempt to answer the question, "How do we standardize on a metadata format?" Dublin Core is using XML and XML DTDs as the basis for their work. It applies not only to software but also to other online resources. So, as one might guess, it's arcane and difficult to understand at best and completely impenetrable most of the time. You can find more about Dublin Core at http://www.purl.org/dc
Sadly, most search engine companies focus on searching a specific kind of document type -- like HTML -- for arbitrary content. Interestingly, searching metadata is both an easier computational problem to solve and more productive for the user. Unfortunately, it's also a far more difficult social problem. Getting everyone to write common metadata is very, very difficult. Going back and writing metadata for any sizeable archive (say, iBiblio, for example) is a Herculean task. I think most of the coders who write search engines are more interested in the actual mathematics behind searching than they are in actual Document Retrieval. You might also check out http://www.cnidr.org, who were the authors of Isearch and some other good searching tools.
"He wrested the world's whereabouts from the heavens And locked the secret in a pocketwatch." - Dava Sobel
I would also like to point out RDF as an existing use of XML for something very similar to what you're asking for. It can be extended with more tags if you need them. Right now it's targeted more towards web content, but I think it will give you some good ideas.
--
The most notable places where RDF is presently used for real things (as opposed to "we'd like it to be used here vaporware") include:
The latter is exceedingly relevant, as it represents an encoding of metadata about Linux software packages in RDF form.
If you're not part of the solution, you're part of the precipitate.
This may shock you, but both sites get stories from user submissions. So all this means is that the same person posted both stories to Slashdot and kuro5hin, big freaking deal.
OSD (Open Software Description), implemented by Microsoft in XML. Of course, it is the work of Satan.
There's even real code available, in Python, which I confess I haven't looked at, so I'm vague on what it does or doesn't do yet. I suspect there's something in it that's worth a look.
I'd check out the Linux Software Maps, hosted at metalab (now ibiblio). More interesting is probably the Dublin Core Project. Although it's not set up specifically for software, it's extensible. You could create an XML DTD (which many have suggested) using the Dublin Core standard for such a purpose.
links:
iBiblio Linux archives
Dublin Core homepage
My other computer is your Windows box
While there are no set-in-stone standards for describing electronic media, including code, there are some evolving standards in this area. One place to start would be to look at the Dublin Core website - this is a descriptive schema that provides a 'core' or basis for metadata schemes. Also look at the work on RDF (links can be found from the Dublin Core site) for info on structuring metadata.
"What we have here, is a failure to communicate." - Cool Hand Luke
Doesn't seem to have an element to store the license the software is distributed under. Or a homepage. Or a download link. Or a file size.
I guess XML is a good idea, but how about trying to mirror everything one can enter for a freshmeat.net entry? And consider what .deb, .rpm etc. have to offer.
www.freshmeat.net
Duh.
I have no problem with your religion until you decide it's reason to deprive others of the truth.
my god, you are a fucking moron. please re-read my post and you will find the second paragraph considers and discounts your 'explanation'.
Abashed the Devil stood,
And felt how awful goodness is
Part of the .NET platform is a standard for metadata - "like IDL and type libraries on steroids..."
This will allow cross-language, cross-platform code integration. VB can call C++ or Java directly, using the metadata information.
At least in theory - that's what COM was supposed to be for :)
Info at http://msdn.microsoft.com and MSJ, et al.
Perhaps people find the Javadoc Conventions to be just a little confusing?
(Anybody who knows me knows I have a personal bias on things Javadoc. Probably not worth discussing on Slashdot. I mention it just to keep myself honest.)
Some parts of such a metadata standard are easy: language, compiler, platform, architecture, etc. But once you start trying to document the actual functionality of your code, you get into some sticky territory that is still the domain of researchers at a number of universities. The problem first is to devise a language powerful enough to facilitate formal methods. The next problem is actually convincing people that it's worth all the effort to formalize their specs (I think it is, but there are many who disagree). The last problem is coming up with a search algorithm that is able to match specs. For this part, you can't just use a string match or unification algorithm... there's some deeper semantic and structural analysis that needs to be done to determine that a certain fragment of code meets the constraints you want. To make the whole problem even worse, we don't even know if such an algorithm is computable! So, a full-blown metadata standard seems a bit out of the question now, but if you're willing to lower your standards a bit, I bet you can whip up a more practical implementation (with some natural language thrown in).
... for hosting your open-source project or source code 'snippets'. They offer a wide range of services for free if you host your project there, including bug tracking, file/version archiving, etc.
You can find them here.
I haven't seen project documentation [templates / standards / requirements] on their site, but perhaps you can be the one to create them.
Name, Title, Organization, Address, Phone #, Fax #, URL, Date, etc... These metadata get assigned to the higher level of metadata such as: Originator, Copyright holder, Maintainer, Mirror Sites, etc...
It gets more complicated at the next description level. For instance, a set of metadata for software would be something like: Programming language, Operating System, Compiler, Library requirements (dependencies), Hardware requirements, License, Distribution restrictions, Lines of code, etc... Along with this would be tags like version number.
Then comes the software description: Application type (e.g., graphic converter, audio playback), User interface, Data input, Data output, Data formats, Batch/Interactive, Algorithms, Previous versions, Code stolen from, etc...
Metadata should be flexible enough to take in new types. Metadata sometimes points to more metadata which points to more metadata. Not all the metadata attributes need to be filled in. One should strongly attempt to standardize some of the key words. Metadata are a bitch to come up with.:-)
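The layering described above (contact details feeding into roles, then technical attributes, then a functional description, with most of it optional) can be sketched as a pair of Python dataclasses. The class and field names are assumptions picked from the lists in this post, not an existing schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Contact:
    """Lowest layer: who a person or organization is."""
    name: str
    organization: Optional[str] = None
    url: Optional[str] = None

@dataclass
class SoftwareRecord:
    """Higher layers; every attribute except the title may be left empty."""
    title: str
    originator: Optional[Contact] = None      # metadata pointing to more metadata
    maintainer: Optional[Contact] = None
    programming_language: Optional[str] = None
    operating_systems: List[str] = field(default_factory=list)
    license: Optional[str] = None
    version: Optional[str] = None
    application_type: Optional[str] = None
    data_formats: List[str] = field(default_factory=list)

def filled_fields(record):
    """Return the names of the attributes that were actually filled in."""
    return sorted(
        name for name, value in vars(record).items() if value not in (None, [])
    )

record = SoftwareRecord(
    title="example-tool",
    maintainer=Contact(name="Jane Doe"),
    license="GPL",
)
print(filled_fields(record))
```

A search tool built on records like this would have to cope with sparse entries, which is one more reason to standardize the key words that do get filled in.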
Ooooh, did para_droid dare to point out your stupidity? Bad man para_droid! Bad man! Bitchslap para_droid!
Abashed the Devil stood,
And felt how awful goodness is
..and comment your code in javadoc or QT style.. Javadoc style is better IMO..
<^>_<(ô ô)>_<^>
The FILE_ID.DIZ will get the job done.
If you create a system like this, it would be useful for finding EVERYTHING.
If post #39 had contained that information it *wouldn't* have been so moronic and I wouldn't have had to reply in the way I did.
You have suggested to me one possible explanation: Slashdot is full of losers who find this story interesting. Kuro5hin is read entirely by Slashdotters. Those same Slashdotters voted the story in.
This makes me wonder about something else. Why does no-one apart from me have the guts to post abuse logged-in? Are you afraid of losing your precious 'karma'?
Abashed the Devil stood,
And felt how awful goodness is
Of course, this is most relevant to Open Source projects that make extensive use of CVS, but in a few years there will be no conflict to worry about.
----
What part of XML don't you understand? Microsoft plans to use it for just about everything... If you want to be left behind, don't use XML.
--
Peace,
Lord Omlette
ICQ# 77863057
[o]_O
I believe this is what you want: Open Software Description Format (OSD) from w3.org.
Abstract: This document provides an initial proposal for the Open Software Description (OSD) format. OSD, an application of the eXtensible Markup Language (XML), is a vocabulary used for describing software packages and their dependencies for heterogeneous clients. We expect OSD to be useful in automated software distribution environments.
_______
2B1ASK1
I don't mean this to be flamebait or a troll, please don't read it this way.
I'm not sure about most people and can't speak for any of them, but I've personally never visited either the Linux map or the Meta project.
For me (and millions of others?) the tried-and-true method of software distribution (tarball, website, and all-important README file) has been sufficient all these years.
Of course it's not efficient, it doesn't encourage searchability, etc... But it's what everybody uses and is used to.
I personally haven't seen enough discipline in the Open Source community with regards to a metadata project. The RM and Meta projects are great, but people need to use them.
Don't sweat the petty things. But do pet the sweaty things.
Isn't that what open source is all about? RTFC!(Read The Fucking Code)
Reading source code requires moving source code somehow to your local host. Bandwidth costs money. What the OP is looking for is metadata, or something small that describes the code in a well-defined, searchable form.
<O
( \
XGNOME vs. KDE: the game!
Will I retire or break 10K?
Microsoft COM, Microsoft .NET, what's next?
Microsoft.org: Microsoft will blow big bux0rz on .NET (which requires an always-on, high-speed connection, making it inaccessible to a large number of Windows customers), stop making profits, and will have to become a nonprofit.
<O
( \
XGNOME vs. KDE: the game!
Will I retire or break 10K?
I'm thinking it would be nice if certain metadata could be embedded in the binary after it's been compiled. Say, for instance, you try to open up a binary file using your web browser: instead of it not knowing what to do, you could have the first X number of bytes be a document in XML or HTML, with various info about the binary, maybe including things like what other files are part of this program and where they're located (this would have to be dynamic of course, created at compile time for instance?), the author, web sites, dependencies, etc.
Just a thought.
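A minimal sketch of that idea in Python: a length-prefixed XML header stored in front of the payload. The magic marker, the 4-byte length field, and the sample XML are all assumptions of this sketch. A real executable would probably have to carry the header in a section its loader ignores, since simply prepending bytes would most likely stop the file from loading.

```python
import struct

MAGIC = b"META"  # hypothetical marker identifying the header

def attach(metadata_xml: str, payload: bytes) -> bytes:
    """Prefix the payload with MAGIC, a 4-byte big-endian length, and the XML."""
    header = metadata_xml.encode("utf-8")
    return MAGIC + struct.pack(">I", len(header)) + header + payload

def detach(blob: bytes):
    """Return (metadata_xml, payload), or raise ValueError if no header is present."""
    if blob[:4] != MAGIC:
        raise ValueError("no metadata header")
    (length,) = struct.unpack(">I", blob[4:8])
    header = blob[8:8 + length].decode("utf-8")
    return header, blob[8 + length:]

blob = attach(
    "<software><name>example</name><author>Jane Doe</author></software>",
    b"\x00\x01binary payload",
)
xml_header, payload = detach(blob)
print(xml_header)
```

A browser or file manager would only need to read the first eight bytes to know whether there is a description and how long it is.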
"God is REAL
that would mean that you have to move to some sort of symbolic token language. The fact is, the problem is even greater with programming languages. If you program in C/C++, Pascal, COBOL etc, you're really programming in the English version of that language. That leads to very bizarre and sometimes funny code in languages other than English, where keywords are English but identifiers and comments are in some other language.
I think what's important to realize here is that we're not trying to find the meaning of life. XML is simply a standardized way of tagging information and it's not perfect. But the quest for perfection can sometimes prevent us from arriving at a solution at all. The whole struggle to standardize on various industry-specific markup languages is difficult enough and has led to enough feuds and confusion. Let's not make it even more difficult by obfuscating the whole issue with another order of complexity. Once XML has done its job well, we can worry about the finer points.
If you're going to use an existing standard (and really, you should) then Dublin Core is the standard to use. Despite what the previous poster said, it is not particularly confusing, and it has fields that are appropriate to software (e.g. Creator, Contributor, Version, Rights).
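For anyone curious what a Dublin Core description of a piece of software might look like, here is a minimal Python sketch. The element names (title, creator, contributor, rights, description) and the namespace URI come from the Dublin Core element set; the wrapper element and the sample values are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

def dc_record(**elements):
    """Build a record whose children are Dublin Core elements.

    The <metadata> wrapper is a hypothetical container, not part of
    Dublin Core itself.
    """
    root = ET.Element("metadata")
    for name, value in elements.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root

root = dc_record(
    title="example-tool",
    creator="Jane Doe",
    contributor="John Roe",
    rights="GPL",
    description="A placeholder description.",
)
dc_xml = ET.tostring(root, encoding="unicode")
print(dc_xml)
```

Software-specific details that the core elements don't cover would need an extension or a refinement on top, which seems to be the route the ibiblio people took.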
ibiblio is also working on something they call the Opensource Metadata Framework which seems to be based on or even a subset of Dublin Core. I don't know why they didn't just use Dublin Core. See http://www.ibiblio.org/osrt/ldpcore/ldp_elements for the spec.
Frankly, you're getting into the area of librarianship, so you should ask a librarian. (IANAL.) You might do this at lisnews.com or at oss4lib.com; particularly the latter. By the way, it's my experience that programmers are bad librarians - even digital library programmers like myself - who think they are good librarians (we invented search engines, didn't we?) so take everything you read in this forum with a grain of salt.
I'm sitting here preparing some Java source to release under the GPL
Java.. GPL.. problems talks problems, questions, concerns.. ehhhhh.. geez man.. just code !!
Are we coders or politicians ?
Just code !!!
The only thing that makes this a little curious is that both posts were made within minutes of each other. Both sites are invested in by VA. It just looks too strange.....
Kuro5hin's coverage of this is quite extensive.
I've always thought it would be cute for linkers to be aware of software licenses, and of license incompatibilities. License identification would be embedded directly within object files and libraries, presumably put there by license-aware compilers. If you tried to link GPL-incompatible application code with a GPL-covered library, the linker would report an error.
Yes kids, that's why I'm at Berkeley. And if you patent my idea and make millions off of it, I'll sue you silly. :-)
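Purely as a toy model of the license-aware linker, here is a sketch in Python. The rule table contains a single illustrative pair and is not a statement about which real licenses are compatible; the ObjectFile type and its fields are hypothetical.

```python
from typing import List, NamedTuple

class ObjectFile(NamedTuple):
    name: str
    license: str      # tag assumed to be embedded by a license-aware compiler
    is_library: bool

# Toy rule table of (application license, library license) pairs that
# must not be linked together. Illustrative placeholder only.
INCOMPATIBLE = {("Proprietary", "GPL")}

def check_link(objects: List[ObjectFile]) -> List[str]:
    """Return one error message per incompatible application/library pair."""
    errors = []
    apps = [o for o in objects if not o.is_library]
    libs = [o for o in objects if o.is_library]
    for app in apps:
        for lib in libs:
            if (app.license, lib.license) in INCOMPATIBLE:
                errors.append(
                    f"{app.name} ({app.license}) cannot link with "
                    f"{lib.name} ({lib.license})"
                )
    return errors

errors = check_link([
    ObjectFile("main.o", "Proprietary", False),
    ObjectFile("libexample.a", "GPL", True),
])
print(errors)
```

The hard part would not be the check itself but agreeing on the license tags and on who maintains the compatibility table.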
I believe this question could be extended to a more general one: how do you assign description fields to any digitized data?
I see no reason to restrict yourself to program code only. It could be extremely useful to be able to structure any information you want: consumer, scientific, entertainment, whatever.
Also, it is not mentioned directly, but the spirit of such a standard would imply that you also have the ability to process this information, that is, to be able to search. In the parlance of our times this means "search from the web".
Is there any open source solution for a web-accessible, searchable database with a user-updateable and flexible structure?
I already have a system that does most of what is described in the story for in-house projects I do. The system uses XML in the back end and adds the file into the package. I also have an automated program that creates a readme.txt and readme.html from the data.
The XML holds a short changelog, date/times of all builds, long descriptions of changes for each build, misc project information, a major and a minor general description field, and some other minor information. The generator appends the file size of both the final package and all files contained. This system is designed for in-house use only and is in no way currently usable as a solution for public use (it is part of a requirement I've made to coordinate development between our 4 coders). Also, many of you may be disappointed to know it's win32 only, as that is all the work we do internally. But if anyone wants to know some of the specifics of such a system, let me know.
Take a look at UML, the Unified Modeling Language; it's great for documenting software.
Buy my shit at http://www.cellup.com
Just to clarify, the OMF is heavily based on Dublin Core. We actually generate a subset of DC metadata, formatted in XML using a DTD that we developed in house. The theory behind dublin is that you can extend it or create a subset to fit your needs, which is what we did. Thanks for the heads up on OMF, though
My other computer is your Windows box
...fileid.diz :-)
Aaaaah, for the glory days of the local bbs scene...
gfunk007
Send lawyers, guns, and money!
Depending on what your software does, you might want to take a look at UPnP.org and see if there is a DSP template for your software yet. This is mostly for devices but, again, if your software can be shown as a device (one might consider a streaming media server a device) this would be where you would want to go. Thanks, Kyle
I've been thinking along the same lines for a while now. Open Source can promote a great deal more reuse and expanded component libraries. But this assumes that potential users and clients can find the existing resources and know what they have.
Microsoft is part of the MDC (Metadata Coalition), which has a standard called the Open Information Model (OIM) for storing metadata about various things. Each OIM sub-model has an XML DTD used for importing and exporting metadata from a repository, and mandates using SQL to query the repository. You could create an OIM class model and DTD. You could also look at the OMG equivalent. XML on its own is not enough - it's a data encoding standard, not a storage mechanism for a repository!
And at the end somewhat less relevant to the topic.
This kind of metadata should be extremely valuable for implementation of the URIs and particularly for the I2C(s) (URI to URC). Quote from the RFC 2483:
Hopefully we already have a mechanism for the I2L(s) (FTP Mirror Tracker).
Right now I'm working on a proof-of-concept kinda thing that will test the current implementation of the XML format and the tools. Another guy is working on a Windows implementation, using the same standard.
If all goes well I'll have something to post by the end of this week. Keep an eye out on Freshmeat.
The book in which David Korn et al. describe a lot of the things they built and use at AT&T Research (I can't recall the name, and my copy of the book got "lost" at a previous job) described a software repository that allowed fuzzy searching on characteristics. It may have some useful information in it. Even if that isn't exactly what you want, the book is a valuable source of information and ideas.
The National Institute of Standards and Technology has a division in the Info.Tech. Lab that has metadata as one of their projects. Looks like they're thinking XML.
"Even if you're on the right track, you'll get run over if you just sit there" - Will Rogers
...on freshmeat at http://freshmeat.net/search/?q=verinfo. It supports all the data items the original poster mentioned, and more.
While XML, PAD, and readme do have their uses, what you really want to look at is how something like this could be automated and produce a machine-readable format that might be rendered into any number of languages (including other programming languages).
Think about what a compiler does. It translates silly little 'human readable' files into machine readable instructions. Yes, compilers have lots more features than that, but the key feature is that it turns my poorly thought out ideas about a problem into a set of instructions that my processor can deal with. So what you want to do is create some sort of meta-file which isn't machine code, per se, but a representation of precisely what the code is doing. You'd want some way of including comments from code into a description block for the main program and for each function (maybe some sort of encoding scheme so that it could be language independent)
The renderer would be able to take that and generate english text OR mandarin OR turkish OR C++ OR JAVA OR COBOL OR (god help us all) PL/I.
So those of you who might have once written a compiler (I'm sorry) take a look at what that process is and think about what you might be able to do. I'd imagine something like:
PROGRAM BOBSHOE
some user defined text
pseudocode
USES OBJECT BLAH
USES OBJECT ANOTHERBLAH
OBJECT BLAH
etc. So, what do you think?
- I settled down long enough to write this and have now collected far too much dust. Damn Dust.
Do any of the package managers do this? Should they? Or is this all part of one big problem of which package managers and all this other stuff are only pieces...
XMI is a recently developed metadata standard from the OMG. It's encoded in XML and is used by some tools: the ArgoUML (http://www.argouml.org) CASE tool, Rational Rose, and a tool from IBM, the XMI Toolkit (http://alphaworks.ibm.com/tech/xmitoolkit), which automatically extracts metadata from a Java program. The standard is rather complex. To be productive with it, you have to understand things like metadata architecture and the MOF meta-meta-metadata standard (also by OMG). The alternative, XIF, used on its repository tool, is not much simpler, but there's very little information available to write a XIF-compliant tool.
XIF is used by the Microsoft Repository
You left out the solution SourceForge.net is working on, called Trove, or simply the Software Map. It contains fields for Development Status, Environment, Intended Audience, License, Operating System, Programming Language, Topic, and Description, and is centrally served from http://sourceforge.net/softwaremap/trove_list.php.
Several of the categories are even hierarchical, which helps validate the values used. Another benefit is that if the license is open 'enough', you can host your web page and downloads at SourceForge, at which point it will help you track versions and release notes.
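To illustrate how a hierarchy helps validate values, here is a small Python sketch. The tree below is a made-up fragment, and the " :: " path separator is an assumption of this example rather than a description of how SourceForge stores its categories.

```python
# Hypothetical fragment of a category tree: each key maps to its children.
CATEGORIES = {
    "License": {"Open Source": {"GPL": {}, "BSD": {}}},
    "Programming Language": {"Java": {}, "Python": {}},
    "Operating System": {"POSIX": {"Linux": {}}},
}

def is_valid(path, tree=CATEGORIES):
    """Check that every step of an 'A :: B :: C' path exists in the tree."""
    node = tree
    for part in (p.strip() for p in path.split("::")):
        if part not in node:
            return False
        node = node[part]
    return True

print(is_valid("License :: Open Source :: GPL"))
print(is_valid("License :: Freeware"))
```

A submission form can reject the second path before it ever reaches the database, which keeps searches from having to guess at free-text spellings.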
Wow -- that sounds like I'm a SourceForge PR person. Please understand that I'm not necessarily advocating them as the best solution -- I think freshmeat and lsm are extremely valuable. I just wanted to make sure the SourceForge solution was mentioned.
--Chouser
"To stay young requires unceasing cultivation of the ability to unlearn old falsehoods." -LL
Whatever format is used, it should
XMI is part of the OMG's Unified Modeling Language Specification 1.3, and it stands for XML Metadata Interchange. It is intended as a mechanism to reliably transport UML models between tools.
The OMG metadata standard is the Meta Object Facility (MOF) version 1.3. It allows the specification of the meta-meta data and provides IDL interfaces for accessing a repository based on the MOF definition.
The title is right, but the content is wrong. XML is not a meta-data standard. XML just provides the syntax, not the semantics. And Metadata needs semantics. What I said is that XMI is a meta-data standard. It's not a generic metadata standard for describing web resources like Dublin Core or IAFA. XMI is more suited for coding in XML the metadata of OO development tools. In the ISO metadata architecture, metadata standards are grouped in 4 levels, according to what they can describe. As much as UML (level 3) is a metadata standard for describing OO models, MOF (level 4) is a metadata standard for describing metamodels like UML. But MOF is not limited to UML. It can be used to describe entity-relationship models, data warehouse models, component models and even generic metadata. XMI is a mechanism for mapping ANY meta-model which can be described by MOF in XML. One might object that it's not a metadata standard because it cannot by itself describe anything without a meta-meta-model like MOF behind the scenes. I think this objection is useless. There are metadata standards for description, like MOF, and metadata standards for encoding, like XMI.
Apparently Microsoft's new .NET platform incorporates some kind of meta-data for "assemblies" (ala packages in Java I think). It might be worthwhile looking at what they've done. I believe it's XML based and includes documentation, authorship, licencing and other required assemblies amongst other things.
Idiot Alert! Learn something about both sites before you try to figure them out. Slashdot has editors that post stories. This was a question, so it was posted by an editor to "Ask Slashdot" a section for questions. Kuro5hin, OTOH, works by having the users vote on the stories to determine whether they should be posted. The users apparently decided the story should be posted.
So, to repeat what I just said so that your small pea-sized brain can comprehend it, there was one editor, and a bunch of users.
g'd day!