OSD Database Downloadable As XML


Posted by timothy on Sunday July 22, 2001 @01:38PM from the information-that-wants-to-be-free dept.

After taking some heat a few months ago for not having many products listed, the Open Source Directory has been plugging away. Steve Mallett writes: "We made the product database of Open-Source Directory downloadable in XML today. Announcement here at newsforge. We're hoping that people begin to use the data like google uses dmoz. More people see the data, which increases awareness of open-source which increases the database which gets more people to display the data etc, etc ... You get the point."

Providing a list of applications stable enough to recommend to non-gurus is a worthy endeavor, so it's great to see this project slowly becoming more useful. There are gaps to plug going forward, though. The default text strings can be ambiguous, and the information provided on individual projects doesn't always give much to go on. For instance, look at the Mosix page, where you'll find that "This product has no Latest version yet," "This product doesn't fix anything," and "This product is not like any other," but no email contact information for Mosix authors. Similarly ambiguous pages are provided for Gnucleus and OpenOffice.

I exchanged some email with Steve on the state of the entries in the database, and asked how the missing information could be filled in. He told me that while project maintainers (and site administrators) are the only ones who can update entries, users can contact the administrators of individual projects directly through the OSD site to suggest changes or clarification.

"We're trying to make things easier for the maintainers. ... I think there is a serious lack of product maintainers to help authors," he said. To that end, Mallett may soon provide example projects for software authors to emulate, and is in the early stages of a unified project-listing tool which would update listings on various web sites. Given the number of sites that offer downloads or simply track various software projects, that could be a boon to developers.

Hopefully, this will turn into the sort of tool that you can show a boss or teacher to answer the bugaboo of Free / Open Source being unready for prime time (or just overwhelming and undifferentiated).

Somewhat amusing by Anonymous Coward · 2001-07-22 10:04 · Score: 2

I find it somewhat amusing that the Linux kernel itself isn't listed in the directory. (Yes, I know that they list just apps, but still, Linux is to most people the most prominent example of an open source project.)
1. Re:Somewhat amusing by Arandir · 2001-07-22 12:09 · Score: 3
  
  Linux may be the most "prominent" example of Open Source, but the three pieces of Open Source work that are actually the most used are not listed either: Perl, Apache and XFree86.
  
  The OSD is not meant to be a definitive archive. Its mission is to provide a resource for users. I think it has done a good job in this regard.
  
  --
  A Government Is a Body of People, Usually Notably Ungoverned
Re:embedding ODP (& OSD) content in your web site. by valmont · 2001-07-22 11:24 · Score: 2

Someone should create an HTTP interface to a dmoz XML database, which would allow users to place XPATH queries which would return XML nodesets to the requesting client.
Someone could leverage XML RDBMS like DBXML which is based on the "XML:DB" standard.
If enough people are interested, I could try downloading dmoz myself and "massage" it into some dbxml store on my own system and build a web-based interface to query it, I've just been really busy with other stuff lately though.
If you happen to read this and are interested, shoot me an email at valmont@wildstar.net and we can take it from there.

--
Extraordinary Vacations. Exceptional Prices
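
A minimal sketch of the query side of the idea above, assuming a small, made-up directory document rather than the real dmoz dump (which is RDF/XML and far larger). The element names, the `query` helper and the `<nodeset>` wrapper are all hypothetical, and the HTTP layer is left out; Python's ElementTree supports only a limited subset of XPath, so this is an illustration rather than a full XPath service.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; not the actual dmoz or OSD schema.
SAMPLE = """
<directory>
  <topic name="Paragliding">
    <link url="http://example.org/a">Site A</link>
    <link url="http://example.org/b">Site B</link>
  </topic>
  <topic name="Open_Source">
    <link url="http://example.org/c">Site C</link>
  </topic>
</directory>
"""

def query(xml_text: str, path: str) -> str:
    """Run an ElementTree path query and return the matches wrapped in <nodeset>."""
    root = ET.fromstring(xml_text)
    nodeset = ET.Element("nodeset")
    for match in root.findall(path):
        nodeset.append(match)
    return ET.tostring(nodeset, encoding="unicode")

# Ask only for the links filed under one topic.
result = query(SAMPLE, "./topic[@name='Open_Source']/link")
print(result)
```

A web front end would accept the path expression as a request parameter and send back the serialized node set as the response body.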
Re:That document is not valid XML! Standards? Anyo by Arrowhead · 2001-07-22 17:47 · Score: 2

It's worse than that. It can't be valid because it doesn't even have a DTD.

But this document is not even well-formed XML. In other words, it is not XML at all. It's plain text with some tags.

For details on what it means for an XML document to be well-formed or valid, see the spec at the W3C
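
To make the distinction concrete, here is a small sketch of a well-formedness check. The two sample strings are invented for illustration and are not taken from the OSD file. Python's ElementTree is a non-validating parser, so it can only answer the "well-formed?" question; checking validity would additionally need a DTD and a validating parser.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the text parses as a single well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# Hypothetical samples: one root element vs. two sibling top-level elements.
GOOD = "<group><group_name>mosix</group_name></group>"
BAD = "<group_name>mosix</group_name><group_name>glame</group_name>"

print(is_well_formed(GOOD), is_well_formed(BAD))
```

The second sample fails because an XML document must have exactly one root element.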
embedding ODP (& OSD) content in your web site... by VValdo · 2001-07-22 10:17 · Score: 4

What is the consensus on the best way to include Open Directory Project (ODP/dmoz) content in your web site, say for a mini-portal (or in the case of OSD, for a mini software directory)? I don't want to simply download/display dmoz's RDF/XML file on a weekly basis because (1) I'm only interested in a tiny portion of the ODP which relates to my web site and (2) I'd like to encourage people to upload new content back to dmoz, so I'm looking for a way to pull "live" content from dmoz and let my visitors send links back to dmoz.
Is there a PHP class or something that everyone's using for this? I saw a couple of offerings at freshmeat that relate to ODP, and some tools and code are here, but I'm curious what most people are using.
W
-------------------

--
-------------------
This is my SIG. There are many like it, but this one is mine.
Re: by Teferi · 2001-07-22 11:16 · Score: 2

YHBT. This is a variation on the same 'foo is dying' framework that's been going around /. for months. Look for any kind of post matching 'BSD is dying' and compare it to this...same thing.

--
-- Veni, vidi, dormivi
Gripe -lousy directory due to dictatorial policies by DamnYankee · 2001-07-22 14:59 · Score: 3

I was one of the original category editors of DMOZ way back when it started and long before the AOL bureaucrats took over. I was really dedicated to my two small categories - paragliding and paramotoring - and built them up from scratch. Shortly after AOL took over I suddenly found myself locked out as a category editor. After repeated inquiries as to why, it turned out that I had listed some non-English web sites in my "English only" categories and this was against AOL policy, hence an immediate boot.

The real shame was watching the categories I created with TLC lie fallow for months and months without any one to update them.

With inane policies like these is it any wonder that this directory lacks up-to-date information and is in general disarray? Me thinks not.

I was thinking of the immortal words of Socrates, who said, "I drank what?"

--
Life is a tale told by an idiot, full of sound and fury, signifying nothing.
William Shakespeare
Re:That document is not valid XML! Standards? Anyo by leucadiadude · 2001-07-22 17:53 · Score: 2

It states on the download page that it's not validating XML. And did you take a look at the DTD for this? It's very simple, about as simple as you can get. Basically useless for rigorous validation.

I agree your way of structuring the data is better, but I would add that many of the data items should be attributes. I mean you have elements and attributes available, why not use both? It would have made things much faster and cleaner to keep up to date and ensure all parsers can validate it quickly. Can you really see a SAX parser making use of that xml? And a DOM parser would consume an enormous amount of memory needlessly. Oh well, I'm sure they were in a big hurry to get this info available. And it'll get well cleaned up in the next few months.
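
As a rough illustration of the memory point, the sketch below reads records one at a time instead of loading the whole document into a tree. It uses ElementTree's `iterparse` as a stand-in for a SAX-style streaming parser, and it assumes each record is wrapped in its own element. The `<product>` wrapper and the field names are hypothetical; without such a wrapper a streaming consumer has no clear record boundary to work with.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical data with one wrapper element per product.
SAMPLE = b"""<products>
  <product><name>GLAME</name><license>GPL</license></product>
  <product><name>Mosix</name><license>GPL</license></product>
</products>"""

def product_names(stream):
    """Yield product names one record at a time."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "product":
            yield elem.findtext("name")
            elem.clear()  # drop this record's children once it has been used

names = list(product_names(io.BytesIO(SAMPLE)))
print(names)
```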
Re:General XML question from a newbie. by goingware · 2001-07-23 02:23 · Score: 2

Look for the XML solutions at the Apache XML Project - Xerces and the like.
Some are available for both Java and C++.
Sorry I don't have a more detailed answer to your question but I'm sure something can be built from the Apache XML stuff.

Mike

--
-- Could you use my software consulting serv
Re:embedding ODP (& OSD) content in your web site. by dingbat_hp · 2001-07-22 20:12 · Score: 2

Someone should create an HTTP interface to a dmoz XML database, which would allow users to place XPATH queries which would return XML nodesets to the requesting client.
That's an interesting idea, but it's not quite the same problem. You describe a good solution to a "pull" scenario, which is great for queries instigated by a client, but it's not as good as a "push" for providing a newsfeed from a site.
I'd suggest RSS 1.0 as a good format to produce (possibly based on the same XPath-based pull that you describe). Once it's in RSS 1.0, then it's trivial to make it appear on any number of sites, or to aggregate it into other more generalised newsfeeds.
For implementing the "pull" side, then XPath encapsulated in SOAP is an easy way to build clients, and not too hard for the endpoint server. I've been doing this recently, so that a UI component (DHTML in Javascript) could selectively retrieve pieces of a big taxonomy document that was >MB in total.
My one concern (and my own personal bias) is that I see many of these items as running off the limits of what XML (and XMLDB) is good at, and being better handled in RDF. Certainly RSS 0.91 (which is XML) couldn't do this, but RSS 1.0 (which is RDF) could easily. Of course, that then makes XPath unworkable as a query language and there's not yet a stable "RDFPath" equivalent for RDF.
I'm also interested in working on this. Anyone else, drop me a mail if you are too.
Ready-made database of licences... by kaiidth · 2001-07-22 10:12 · Score: 2

I definitely see a couple of immediate uses for that data. The less important (but more useful for the majority of the universe, yeah...) is just to look up useful applications, and frankly you probably don't need to download the database for that... might be a good idea for somebody to write a client so people could browse it offline, on second thoughts. It's fairly small when gzipped (130k or so) but could be a worthy addition to a Linux distro for those who disdain Freshmeat.
The more important one would be for the licencing info- I was about to face the task of building up a database of (L)GPL'd applications manually. I'd say they've definitely saved me some effort... sure they're not all there but it's a start... thanks, guys.
On the topic of the GPL, anybody notice they've licenced this XML document under the GNU Free Documentation License? I can see the press release now: 'Argh! Viral pac-men documents!'
How do they validate the entries? by antibryce · 2001-07-22 09:56 · Score: 3

I use my computer to write music, so I went to see their listing of stable audio software. The only things listed there are a crossfade plugin for xmms, GLAME and a soundfont editor. I've tried GLAME. Listing it as "stable" is a joke. And to top it all off, they have these things listed multiple times in categories they shouldn't be in. I'm fairly certain a soundfont editor doesn't qualify as "sound synthesis".
I want to stress I'm not trying to discredit the GLAME team or any of these software packages. But what good is OSD if its categories are a mess? I might as well just use freshmeat.
c.
1. Re:How do they validate the entries? by blab · 2001-07-22 10:50 · Score: 4
  
  It is really up to a product's admin to be truthful. However, our Social Contract outlines that while it is their responsibility (we can't test them all), the interests of the directory are primary. If it sucks & there is no way it's stable, write to the admin & tell him so.
  Ultimately the info is open for catching bugs like this one. If it is a bug it will get weeded out.
  -Steve Mallett of OSD
Great idea, but maybe not best approach. by pjbass · 2001-07-22 10:47 · Score: 3

I'm all for OSD, and for any means of making it easier to search, read about, and acquire apps. However, the people who would most likely find this searchable database are the people who already know how to search other OSD networks, like sourceforge and freshmeat. As I see it, this by itself is only going to confuse people more, with multiple places to get the same software, possibly at different version levels compared to freshmeat and sourceforge. Maybe if this list was synced somehow with the freshmeat lists, that might provide a very powerful tool for people new and experienced to the Open Source world to get, play with, learn, upgrade, hack, and love Open Source. IMHO though...
huh? by djocyko · 2001-07-22 09:44 · Score: 2

"We're hoping that people begin to use the data like google uses dmoz. More people see the data, which increases awareness of open-source which increases the database which gets more people to display the data etc, etc ... You get the point."

Sounds to me like the point of this project is a global infinite loop. I don't know much about this, but if that's what it is...count me out. I have it bad enough as it is. (I run windows ;-)

chances are, this is a joke.
Re:perhaps a bit off topic--ANSWER by blab · 2001-07-22 11:08 · Score: 2

It is stored in a database, then put into XML. -Steve Mallett of OSD
Re:embedding ODP (& OSD) content in your web site. by blab · 2001-07-22 11:32 · Score: 2

Actually, it's this.
Steve Mallett of OSD
unified project-listing tool by blab · 2001-07-22 10:03 · Score: 5

The "unified project-listing tool" referred to above is at: http://sourceforge.net/projects/trovesendtwo/ The idea is: you put the information for your product in a client & it updates SF/FM and/or OSD at the same time without having to log in & change all those listings. This is possible because these sites, at any rate, are all loosely based on the same interface, data & category map. And yes, we could use some assistance with it!
-Steve Mallett of OSD
Re:That document is not valid XML! Standards? Anyo by jhol · 2001-07-22 14:54 · Score: 3

Not only is it not valid, the XML structure is not very logical either. The authors of the XML file have written it like this:
<group_name></group_name>
<--properties of group-->
<group_name></group_name>
<--properties of group-->
whereas a more clever structure would have been:
<group>
<group_name></group_name>
<--properties of group-->
</group>
This way the different groups would have been separated in a more logical manner, and it would be "easier" to parse the information in the XML file.
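
A hedged sketch of how a flat run of tags could be regrouped into the nested layout suggested above. It assumes the flat elements sit under a single root and that every `<group_name>` starts a new record; the root name `<osd>`, the property tags and the `nest_groups` helper are made up for illustration and do not reflect the real OSD file.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat layout: each group_name is followed by that group's properties.
FLAT = """<osd>
  <group_name>GLAME</group_name><license>GPL</license>
  <group_name>Mosix</group_name><license>GPL</license><homepage>http://example.org/mosix</homepage>
</osd>"""

def nest_groups(flat_root, start_tag="group_name"):
    """Wrap each run of elements beginning with start_tag in its own <group>."""
    nested = ET.Element(flat_root.tag)
    current = None
    for child in flat_root:
        if child.tag == start_tag or current is None:
            current = ET.SubElement(nested, "group")
        child.tail = None  # discard the layout whitespace of the flat file
        current.append(child)
    return nested

nested = nest_groups(ET.fromstring(FLAT))
print(ET.tostring(nested, encoding="unicode"))
```

Once each record has its own `<group>` element, a consumer can pick out one project without having to track where the previous one ended.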
perhaps a bit off topic by sehryan · 2001-07-22 10:45 · Score: 3

but xml was not designed to replace databases. to store everything in an xml format is a bad idea. you store it in a database, and pass the variables into an xml document, then parse that with an xsl or xhtml. xml is a transport, not a storage.
-
sean

--
The world moves for love. It kneels before it in awe.
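
The store-in-a-database, ship-as-XML flow described above might look roughly like the following. This is only a sketch under stated assumptions: the in-memory SQLite table, its `name` and `license` columns, and the `export_products` helper are hypothetical and say nothing about how OSD actually stores or exports its data.

```python
import sqlite3
import xml.etree.ElementTree as ET

def export_products(conn) -> str:
    """Serialize every row of the (hypothetical) products table as XML."""
    root = ET.Element("products")
    rows = conn.execute("SELECT name, license FROM products ORDER BY name")
    for name, license_name in rows:
        product = ET.SubElement(root, "product")
        ET.SubElement(product, "name").text = name
        ET.SubElement(product, "license").text = license_name
    return ET.tostring(root, encoding="unicode")

# The database remains the system of record; XML is generated on demand.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, license TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("Mosix", "GPL"), ("GLAME", "GPL")],
)
xml_text = export_products(conn)
print(xml_text)
```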
XML? by SilentChris · 2001-07-22 10:02 · Score: 2

Great, XML. Now MS can use it in the next version of Office. :P
Re:perhaps a bit off topic (XML) by azaroth42 · 2001-07-22 17:33 · Score: 2

Obviously for such things as flat databases XML is perhaps not the best solution, but XML can and should be used for storing data such as marked up full text documents (TEI) or descriptions of Archives or Museum objects (EAD/Spectrum). XML is most definitely not only a transport. The XML in the database provided is awful, and demonstrates why XML needs to be thought out in advance rather than generated directly from a database. For example, there is no encapsulation of individual projects, just a single layer of tags from the beginning of the document to the end. -- Azaroth
Fixed XML Files by azaroth42 · 2001-07-22 19:09 · Score: 3

Here's a valid XML file and DTD:
http://www.o-r-g.org/~cheshire/osd/osd.tgz
Also, a search engine (Cheshire2) running over the XML with a Very simple interface/display is available at:
http://www.o-r-g.org/~cheshire/osd/
Enjoy =)
-- Azaroth