OSD Database Downloadable As XML
Providing a list of applications stable enough to recommend to non-gurus is a worthy endeavor, so it's great to see this project slowly becoming more useful. There are gaps to plug going forward, though. The default text strings can be ambiguous, and the information provided on individual projects doesn't always give much to go on. For instance, look at the Mosix page, where you'll find that "This product has no Latest version yet," "This product doesn't fix anything," and "This product is not like any other," but no email contact information for Mosix authors. Similarly ambiguous pages are provided for Gnucleus and OpenOffice.
I exchanged some email with Steve about the state of the entries in the database, and asked how the missing information could be filled in. He told me that while project maintainers (and site administrators) are the only ones who can update entries, users can contact the administrators of individual projects directly through the OSD site to suggest changes or clarifications.
"We're trying to make things easier for the maintainers. ... I think there is a serious lack of product maintainers to help authors," he said. To that end, Mallett may soon provide example projects for software authors to emulate, and is in the early stages of building a unified project-listing tool that would update listings on various web sites. Given the number of sites that offer downloads or simply track software projects, that could be a boon to developers.
Hopefully, this will turn into the sort of tool that you can show a boss or teacher to answer the bugaboo of Free / Open Source being unready for prime time (or just overwhelming and undifferentiated).
I find it somewhat amusing that the Linux kernel itself isn't listed in the directory. (Yes, I know they list only applications, but still, to most people Linux is the most prominent example of an open source project.)
Someone could leverage a native XML database like dbXML, which is based on the "XML:DB" standard.
If enough people are interested, I could try downloading the dmoz data myself, "massage" it into a dbXML store on my own system, and build a web-based interface to query it. I've just been really busy with other stuff lately.
If you happen to read this and are interested, shoot me an email at valmont@wildstar.net and we can take it from there.
It's worse than that. It can't be valid because it doesn't even have a DTD.
But this document is not even well-formed XML. In other words, it is not XML at all. It's plain text with some tags.
For details on what it means for an XML document to be well-formed or valid, see the XML specification at the W3C.
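To make the well-formed/valid distinction concrete: well-formedness can be checked by any conforming parser and needs no DTD at all. Below is a minimal Python sketch using the standard library's xml.etree.ElementTree; the sample strings are made up and are not taken from the OSD download. Note that ElementTree only checks well-formedness; it does not validate against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document.

    Checks well-formedness only (a single root element, properly nested and
    closed tags, legal syntax). It does NOT validate against a DTD; the
    standard library's ElementTree does not do DTD validation.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical samples, not taken from the OSD file.
good = "<projects><project><name>alpha-editor</name></project></projects>"
no_root = "<name>alpha-editor</name><name>beta-player</name>"  # two top-level elements
unclosed = "<projects><name>alpha-editor</projects>"           # <name> never closed

for label, sample in [("good", good), ("no_root", no_root), ("unclosed", unclosed)]:
    print(label, is_well_formed(sample))
```

A document like the second sample, a flat run of tags with no single enclosing element, is exactly the "plain text with some tags" case: it fails before validity even comes into question.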
Is there a PHP class or something that everyone's using for this? I saw a couple of offerings at freshmeat that relate to ODP, and some tools and code are here, but I'm curious what most people are using.
W
-------------------
-------------------
This is my SIG. There are many like it, but this one is mine.
YHBT. This is a variation on the same 'foo is dying' framework that's been going around /. for months. Look for any kind of post matching 'BSD is dying' and compare it to this...same thing.
-- Veni, vidi, dormivi
The real shame was watching the categories I created with TLC lie fallow for months and months without anyone to update them.
With inane policies like these, is it any wonder that this directory lacks up-to-date information and is in general disarray? Methinks not.
I was thinking of the immortal words of Socrates, who said, "I drank what?"
Life is a tale told by an idiot, full of sound and fury, signifying nothing.
William Shakespeare
It states on the download page that it's not validating XML. And did you take a look at the DTD for this? It's very simple, about as simple as you can get. Basically useless for rigorous validation.
I agree your way of structuring the data is better, but I would add that many of the data items should be attributes. You have both elements and attributes available, so why not use both? That would have made the file faster to parse, cleaner to keep up to date, and easier for every parser to validate quickly. Can you really see a SAX parser making use of that XML? And a DOM parser would consume an enormous amount of memory needlessly. Oh well, I'm sure they were in a big hurry to get this info out, and I'm sure it'll get cleaned up over the next few months.
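As a rough illustration of the attributes point, here is a minimal Python sketch of a streaming SAX handler (standard library xml.sax). The attribute-style layout is hypothetical, not the actual OSD schema. When a record's fields are attributes, each record arrives complete in a single startElement callback, so the handler needs no text buffering or nesting state, and the parser never has to build the whole document tree in memory the way a DOM parser does.

```python
import xml.sax


class ProjectHandler(xml.sax.ContentHandler):
    """Collect attribute-style <project> records while streaming the file."""

    def __init__(self):
        super().__init__()
        self.projects = []

    def startElement(self, name, attrs):
        if name == "project":
            # attrs is a read-only mapping of attribute name -> value;
            # copy it, because the parser may reuse the object.
            self.projects.append(dict(attrs.items()))


# Hypothetical attribute-style layout, NOT the real OSD schema.
doc = b"""<projects>
  <project name="alpha-editor" license="GPL" category="audio"/>
  <project name="beta-player" license="LGPL" category="audio"/>
</projects>"""

handler = ProjectHandler()
xml.sax.parseString(doc, handler)
print(handler.projects)
```

With the same data spread over child elements instead, the handler would have to track which element it is inside and accumulate characters() calls until each endElement.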
Some are available for both Java and C++.
Sorry I don't have a more detailed answer to your question but I'm sure something can be built from the Apache XML stuff.
Mike
-- Could you use my software consulting serv
Someone should create an HTTP interface to a dmoz XML database that would let users submit XPath queries and get XML node-sets back.
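The core of such a service could be as small as the sketch below. It evaluates a path expression against a parsed document and wraps the matches in a hypothetical <nodeset> element. It uses Python's xml.etree.ElementTree, which supports only a limited subset of XPath; a real service would probably want a full XPath 1.0 engine such as lxml. The HTTP layer is left out, and the catalog data and element names are invented for illustration, not the real dmoz format.

```python
import xml.etree.ElementTree as ET

# Hypothetical category document; not the real dmoz/ODP dump format.
CATALOG = ET.fromstring("""<catalog>
  <category name="Audio">
    <project name="alpha-editor" license="GPL"/>
    <project name="beta-player" license="GPL"/>
  </category>
  <category name="Networking">
    <project name="gamma-client" license="GPL"/>
  </category>
</catalog>""")


def xpath_query(root, path):
    """Evaluate `path` and return the matches wrapped in a <nodeset> element.

    ElementTree understands only a limited XPath subset (child steps, //,
    [@attr='value'] predicates, positional indexes, and a few more), so
    arbitrary client queries would need a full XPath engine instead.
    """
    result = ET.Element("nodeset")
    result.extend(root.findall(path))
    return ET.tostring(result, encoding="unicode")


# Roughly what an HTTP handler might send back for a client's query string.
print(xpath_query(CATALOG, "./category[@name='Audio']/project"))
```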
That's an interesting idea, but it's not quite the same problem. You describe a good solution to a "pull" scenario, which is great for queries instigated by a client, but it's not as good as a "push" for providing a newsfeed from a site.
I'd suggest RSS 1.0 as a good format to produce (possibly based on the same XPath-based pull that you describe). Once it's in RSS 1.0, then it's trivial to make it appear on any number of sites, or to aggregate it into other more generalised newsfeeds.
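For the "push" side, producing RSS 1.0 is mostly a matter of getting the RDF shape right. Item URLs are listed in an rdf:Seq inside <channel>, and each <item> then appears as a sibling of <channel>, identified by rdf:about. A minimal Python sketch using only the standard library (Python 3.8+ for the default_namespace argument); the example.org URLs and feed titles are placeholders.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"
ET.register_namespace("rdf", RDF)  # serialize RDF names with the usual rdf: prefix


def rss_tag(name):
    return f"{{{RSS}}}{name}"


def rdf_tag(name):
    return f"{{{RDF}}}{name}"


def rss10_feed(channel_url, title, description, entries):
    """Build an RSS 1.0 document from a list of (title, url) pairs."""
    root = ET.Element(rdf_tag("RDF"))
    channel = ET.SubElement(root, rss_tag("channel"), {rdf_tag("about"): channel_url})
    ET.SubElement(channel, rss_tag("title")).text = title
    ET.SubElement(channel, rss_tag("link")).text = channel_url
    ET.SubElement(channel, rss_tag("description")).text = description
    # The channel's table of contents: one rdf:li per item URL, in order.
    seq = ET.SubElement(ET.SubElement(channel, rss_tag("items")), rdf_tag("Seq"))
    for item_title, url in entries:
        ET.SubElement(seq, rdf_tag("li"), {rdf_tag("resource"): url})
        # Each item is a sibling of <channel>, keyed by the same URL.
        item = ET.SubElement(root, rss_tag("item"), {rdf_tag("about"): url})
        ET.SubElement(item, rss_tag("title")).text = item_title
        ET.SubElement(item, rss_tag("link")).text = url
    # Emit the RSS 1.0 namespace as the default so its elements are unprefixed.
    return ET.tostring(root, encoding="unicode", default_namespace=RSS)


feed = rss10_feed(
    "http://example.org/osd/audio",
    "OSD: Audio",
    "Placeholder feed for one category",
    [("alpha-editor", "http://example.org/osd/alpha-editor")],
)
print(feed)
```

Because the output is plain RDF/XML, an aggregator can merge it with other RSS 1.0 feeds without knowing anything about where the data came from.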
For implementing the "pull" side, XPath encapsulated in SOAP is an easy way to build clients, and not too hard for the endpoint server. I've been doing this recently, so that a UI component (DHTML in JavaScript) could selectively retrieve pieces of a big taxonomy document that was >MB in total.
My one concern (and my own personal bias) is that I see many of these items as pushing past the limits of what XML (and XMLDB) is good at, and being better handled in RDF. Certainly RSS 0.91 (which is XML) couldn't do this, but RSS 1.0 (which is RDF) could easily. Of course, that then makes XPath unworkable as a query language, and there's not yet a stable "RDFPath" equivalent for RDF.
I'm also interested in working on this. Anyone else, drop me a mail if you are too.
The more important one would be the licencing info: I was about to face the task of building a database of (L)GPL'd applications manually. I'd say they've definitely saved me some effort... sure, they're not all there, but it's a start. Thanks, guys.
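If the export exposes a licence field per project, pulling out the (L)GPL'd applications takes only a few lines. A minimal Python sketch; the <project>/<name>/<license> tag names and the sample entries are hypothetical, since the real OSD file uses its own flat layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-project layout; the real OSD export uses its own tag names.
SAMPLE = """<projects>
  <project><name>alpha-editor</name><license>GPL</license></project>
  <project><name>beta-lib</name><license>LGPL</license></project>
  <project><name>gamma-tool</name><license>BSD</license></project>
</projects>"""

COPYLEFT = {"GPL", "LGPL"}


def gpl_family(xml_text):
    """Return the sorted names of projects whose <license> is GPL or LGPL."""
    root = ET.fromstring(xml_text)
    names = []
    for project in root.iter("project"):
        # findtext returns None if <license> is missing; treat that as "".
        license_name = (project.findtext("license") or "").strip()
        if license_name in COPYLEFT:
            names.append(project.findtext("name"))
    return sorted(names)


print(gpl_family(SAMPLE))
```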
On the topic of the GPL, did anybody notice they've licenced this XML document under the GNU Free Documentation License? I can see the press release now: 'Argh! Viral pac-men documents!'
I use my computer to write music, so I went to see their listing of stable audio software. The only things listed there are a crossfade plugin for xmms, GLAME and a soundfont editor. I've tried GLAME. Listing it as "stable" is a joke. And to top it all off, they have these things listed multiple times in categories they shouldn't be in. I'm fairly certain a soundfont editor doesn't qualify as "sound synthesis".
I want to stress that I'm not trying to discredit the GLAME team or any of these software packages. But what good is OSD if its categories are a mess? I might as well just use freshmeat.
c.
I'm all for OSD, and for anything that makes it easier to search for, read about, and acquire apps. However, the people most likely to find this searchable database are the people who already know how to search other open source sites, like SourceForge and freshmeat. On its own, I can only see this confusing people further, with multiple places to get the same software, possibly at different version levels than on freshmeat and SourceForge. Maybe if this list were synced somehow with the freshmeat listings, it could become a very powerful tool for people, new and experienced alike, to get, play with, learn, upgrade, hack, and love Open Source. IMHO, though...
Sounds to me like the point of this project is a global infinite loop. I don't know much about this, but if that's what it is... count me out. I have it bad enough as it is. (I run Windows.)
Chances are, this is a joke.
It is stored in a database, then put into XML. -Steve Mallett of OSD
Not only is it not valid, the XML structure is not very logical either.
The authors of the XML file have written it like this:
<group_name></group_name>
<!-- properties of group -->
<group_name></group_name>
<!-- properties of group -->
whereas a more clever structure would have been:
<group>
<group_name></group_name>
<!-- properties of group -->
</group>
This way the different groups would have been separated in a more logical manner, and it would be "easier" to parse the information in the XML file.
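Even with the flat layout, the grouping can be recovered after the fact, because each <group_name> element marks where a record starts. A minimal Python sketch, assuming that every sibling following a <group_name> up to the next one is a property of that group; the <license> and <version> property tags and all values are invented for illustration.

```python
import xml.etree.ElementTree as ET


def regroup(flat_root, marker="group_name"):
    """Wrap each <group_name> and the siblings after it in a <group> element.

    Assumes the flat layout: a <group_name> element opens a record, and
    every following sibling up to the next <group_name> is one of its
    properties. Elements before the first marker are dropped.
    """
    grouped = ET.Element(flat_root.tag)
    current = None
    for child in flat_root:
        if child.tag == marker:
            current = ET.SubElement(grouped, "group")
        if current is not None:
            current.append(child)
    return grouped


# Hypothetical flat input; the property tags are invented for illustration.
flat = ET.fromstring(
    "<osd>"
    "<group_name>alpha-editor</group_name><license>GPL</license><version>1.0</version>"
    "<group_name>beta-player</group_name><license>GPL</license>"
    "</osd>"
)
nested = regroup(flat)
print(ET.tostring(nested, encoding="unicode"))
```

The fact that this heuristic is needed at all is the argument for emitting the enclosing element in the first place: a consumer should not have to infer record boundaries from element order.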
But XML was not designed to replace databases. Storing everything in XML is a bad idea. You store the data in a database, pass the values into an XML document, and then transform that with XSL (into XHTML, for example). XML is a transport, not a storage format.
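The database-first workflow described here might look like the following minimal Python sketch. Rows live in a relational table (an in-memory SQLite database, standing in for whatever OSD actually uses), and an XML document is generated from a query only when it needs to be sent somewhere. The table, column, and project names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_projects(conn):
    """Render each row of a hypothetical `projects` table as a <project> element.

    The database stays the store of record; XML is produced on demand for
    transport. Short scalar fields are written as attributes.
    """
    root = ET.Element("projects")
    rows = conn.execute("SELECT name, license, version FROM projects ORDER BY name")
    for name, license_name, version in rows:
        ET.SubElement(root, "project", name=name, license=license_name, version=version)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample rows, not real OSD data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (name TEXT, license TEXT, version TEXT)")
conn.executemany(
    "INSERT INTO projects VALUES (?, ?, ?)",
    [("beta-player", "LGPL", "0.2"), ("alpha-editor", "GPL", "1.0")],
)
print(export_projects(conn))
```

An XSL stylesheet (or any other consumer) then works from this generated document, while updates keep going to the table.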
-
sean
The world moves for love. It kneels before it in awe.
Great, XML. Now MS can use it in the next version of Office. :P
Obviously, for things such as flat databases XML is perhaps not the best solution, but XML can and should be used to store data such as marked-up full-text documents (TEI) or descriptions of archives or museum objects (EAD/Spectrum). XML is most definitely not only a transport. The XML in the database provided is awful, and demonstrates why XML needs to be thought out in advance rather than generated directly from a database. For example, there is no encapsulation of individual projects, just a single layer of tags from the beginning of the document to the end. -- Azaroth
http://www.o-r-g.org/~cheshire/osd/osd.tgz
Also, a search engine (Cheshire2) running over the XML, with a very simple interface/display, is available at:
http://www.o-r-g.org/~cheshire/osd/
Enjoy =)
-- Azaroth