Expanding the use of XML in Linux?
elemur asks: "I was wondering if there are any projects to expand the use of XML in Linux? There are a lot of areas where XML could be more easily and consistently used than continuing to make more and stranger configuration files. Many configs could probably fall under a generalized standard application config DTD, and applications that needed something more targeted could supply their own. Some sort of DTD repository could be set up on the machine to handle this. Then, apps just need to use libxml (or whatever it would be called) to handle the reading and parsing. It would seem to make things much more consistent. Has anybody looked into this sort of thing?" It's a good thought. And a standardized configuration file format might be the thing to reduce some of the complexity most folks find in Linux. What do you all think about the capabilities of XML?
Update: 09/29 04:03 by C : Screwtape submitted this tidbit "I just saw this on MozillaZine and I'm quite impressed. Somebody has taken the XML parser from Mozilla, and written software that makes it work like an xterm - but with extra features. For example, you can write a replacement for ls where all the filenames are hyperlinks to the actual files. The site is here. "
While I agree this can definitely be another good way to use XML, I'm not sure everyone will be willing to abandon the ASCII config file formats they have been using for a very long time and move to an XML-based configuration registry. But something like this has to be done sooner or later...
Why? Using XML for configuration files doesn't actually buy you anything. All XML is is a way of concisely describing the format of a file -- but the data (and more importantly the data's semantics) in each configuration file will vary between programs just as much as they do now.
If you proposed that all configuration files be written as Lisp s-expressions, people would look at you funny because they could easily see that that doesn't magically win -- but mumbling the phrase "XML" seems to escape those filters, even though it's just s-exps with typed parentheses.
[I'm not kidding about the s-exp thing, either. I started to write a little system using XML and XSL to transform some of my docs into HTML, flat text, and LaTeX forms. But when I realized (thanks to Erik Naggum) that the XML document was just a way of serializing a tree structure I changed tacks. Instead, I stored my docs as Lisp s-exps and then was able to use Common Lisp instead of XSL to write the tree-walker and was done in a quarter of the time the other way would have taken.]
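To make the "s-exps with typed parentheses" comparison concrete, here is a minimal Python sketch that prints a small XML document as an s-expression. The `(@ ...)` attribute notation and the sample `<server>` document are assumptions made up for illustration, not anything from the system described above.

```python
import xml.etree.ElementTree as ET


def to_sexp(elem):
    """Render an element as (tag (@ (attr "value") ...) "text" children...).

    Text that follows a child element (ElementTree's .tail) is ignored,
    which is fine for data-style documents but loses mixed content.
    """
    parts = [elem.tag]
    if elem.attrib:
        attrs = " ".join(f'({k} "{v}")' for k, v in sorted(elem.attrib.items()))
        parts.append(f"(@ {attrs})")
    text = (elem.text or "").strip()
    if text:
        parts.append(f'"{text}"')
    # Each child is rendered the same way: the tree is walked recursively.
    parts.extend(to_sexp(child) for child in elem)
    return "(" + " ".join(parts) + ")"


# Hypothetical sample document.
doc = ET.fromstring('<server port="80"><name>www</name><alias>web</alias></server>')
sexp = to_sexp(doc)
print(sexp)
```

Both forms carry the same tree; the only difference is that XML repeats the tag name at the closing end.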
In any event, the thought that might bring real benefit is the idea that you can build a hierarchical configuration structure where apps can acquire config data from the containing configuration classes. (It shouldn't be a registry tree, though, because an application might rightfully belong to several disjoint classes -- you'd want a registration directed acyclic graph.) Defining XML DTDs would be a convenient (but not essential) way of creating a lingua franca.
In essence, configuration classes would correspond to OO classes, and the configuration acquisition tree would be an inheritance graph. This could be a huge win if the graph were well-designed, because each piece of config data would be kept in one place, and program configurations would automatically adjust when that unique datum was changed. However, if the tree were poorly-designed then you would lose big in the same ways that bad OO designs suck hard -- configurations would be very brittle with lots of interapplication dependencies and needed information would be scattered all over the place.
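A rough Python sketch of that acquisition idea, assuming a made-up layout in which every configuration class lists its parent classes and its own settings. The class names, keys, and the override rule (later parents beat earlier ones, own settings beat everything) are all illustrative assumptions.

```python
def resolve(name, classes, _path=()):
    """Collect settings for `name` by walking its parent classes.

    Parents are merged in listed order, so a later parent overrides an
    earlier one, and the class's own settings override all of them.
    """
    if name in _path:
        raise ValueError(f"cycle in configuration graph at {name!r}")
    node = classes[name]
    merged = {}
    for parent in node.get("parents", []):
        merged.update(resolve(parent, classes, _path + (name,)))
    merged.update(node.get("settings", {}))
    return merged


# Hypothetical registration graph: the mail daemon belongs to two
# disjoint classes, so this is a DAG rather than a tree.
classes = {
    "network": {"settings": {"hostname": "example", "dns": "10.0.0.1"}},
    "mail": {"settings": {"smtp_port": 25}},
    "mailer-daemon": {
        "parents": ["network", "mail"],
        "settings": {"smtp_port": 2525},
    },
}

config = resolve("mailer-daemon", classes)
print(config)
```

Changing `dns` in the `network` class changes what every descendant resolves to, which is both the win and the source of the brittleness described above.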
My personal belief is that the registry approach would tend towards the "huge lose" case -- incrementally designing a good OO architecture requires aggressive refactoring, and refactoring configuration information among dozens of software projects that aren't even necessarily aware of each other would be an interesting problem in change management.
So I see having to manually adjust individual configuration files as the price we pay for letting each project develop independently of the others.
However. It is not all fine and dandy.
The "configuration problem" has not one issue, but several:
XML represents Yet Another Format; it is of value if it pushes out some of the existing formats. If it merely augments the population with another, there is no win here.
Result: Ambiguous. XML might provide value.
The issue here is that you need to ensure that the configuration is written out correctly.
This may require writing out the new config to a new file, validating that it is readable and correct. (Oops, made a mistake updating /etc/inet.d. Now the system won't reboot...)
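A minimal Python sketch of the write-to-a-new-file-then-validate step. The function name `safe_write_config` is made up for illustration, and the validator used here (`ET.parse`) only checks that the file is well-formed XML; a real tool would presumably check the contents as well.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def safe_write_config(path, text, validate):
    """Write `text` to a temp file, validate it, then replace `path`.

    `validate` is called with the temp file's path and must raise if the
    file is unusable. The existing config is only touched by the final
    rename, so a failed validation leaves it exactly as it was.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-config-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())
        validate(tmp_path)
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Usage would look like `safe_write_config("app.xml", new_text, ET.parse)`: a malformed `new_text` raises `ET.ParseError` and the old `app.xml` stays in place.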
There is merit to having a "database form" ala IronDoc where the physical representation is a database system, which provides a somewhat different persistence model than the typical text file.
(Before people start proposing that I be shot, I tend to favor the notion of, if using a binary format, synchronizing it carefully with a text format.)
The merit of a "databased" scheme, which should provide a separate database for each facility, is that updates can be implemented "instantly" without needing to rewrite a whole file, and without a need to parse the file. Note that even in a situation where XML is used as an interchange format, there is still merit to storing the "tree" in database form. David McCusker, author of IronDoc and architect of the (regrettably failed) "Bento" database system that was part of OpenDoc, suggests this very use for IronDoc.
For those that feel religious about using text files, a system like libPropList still has merit over the "let's do something with XML" idea since it has, already debugged, the locking, parsing, and config-file-rewriting code that "let's use XML, it's k001" doesn't inherently provide.
In short, deciding to use XML merely establishes a format; it does not resolve that:
Michael Stonebraker (of fame with such developments as Ingres and Postgres) has most recently founded a company called Cohera based on the Mariposa Distributed Database Management System. This tool allows many databases to work together to process queries.
The "obvious" implication of this for this thread is that a valuable thing to be able to do is to join together many "databases" that are configuration repositories, and provide a central way of getting at the data.
The critical thing that is necessary is for configuration repositories to provide some sort of "metadata" so that they, in effect, publicize their existence.
A "federation" tool like Linuxconf, Ganymede, or such, can then be used to join together the metadata and manage it all together.
Unlike the situation with the infamous Windows Registry, this doesn't force all the configuration data into one fragile binary DB; it allows the data to stay wherever it was concluded that it should reside.
The critical factor here is not that data files all have a common format; it is that there be some way of translating their data into a common format.
XML has a lot to offer here in terms of providing a central "presentation" format. It could offer more if tools were available to make this a two-way street, where updates done to the central XML could be pushed back to the individual configuration data repositories.
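As a sketch of the one-way half of this (native files translated into a central XML presentation), here is some Python that reads two different text formats and merges them into one document. The element names `configs`, `config`, `section`, and `option`, the reader functions, and the sample inputs are all assumptions for illustration; nothing here reflects how Linuxconf or libPropList actually work.

```python
import configparser
import xml.etree.ElementTree as ET


def read_ini(text):
    """Adapter for INI-style files: returns {section: {key: value}}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


def read_key_value(text, section="main"):
    """Adapter for KEY=value files with '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return {section: settings}


def to_xml(app, sections):
    """Common presentation format (hypothetical element names)."""
    root = ET.Element("config", {"app": app})
    for section_name, settings in sections.items():
        section = ET.SubElement(root, "section", {"name": section_name})
        for key, value in settings.items():
            ET.SubElement(section, "option", {"name": key}).text = value
    return root


# Each repository keeps its native format; only the view is unified.
federated = ET.Element("configs")
federated.append(to_xml("ftpd", read_ini("[server]\nport = 21\n")))
federated.append(to_xml("shell", read_key_value("EDITOR=vi  # default editor\n")))
print(ET.tostring(federated, encoding="unicode"))
```

The return path is the hard part: each adapter would also need a writer that puts changed values back into the native file without disturbing comments and layout.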
However. If someone writes some integration code to (say) connect Linuxconf to libPropList so that it could directly manipulate libPropList files, that would also represent a movement in the right direction.
Conclusion: XML may have value to offer in confederating config information.
That has to come along with a whole lot of coding effort to build robust configuration data repositories that may or may not use XML.
If you're not part of the solution, you're part of the precipitate.
It seems the poster wants XML introduced into all applications - not really the Kernel. I don't know many places where the Kernel would benefit from XML - except for the one configuration file, the code itself wouldn't need XML.
I think what is being addressed is more an application issue. That would need to be addressed to many different vendors simultaneously. For instance: Apache config files, WuFtpd/ProFTPD/etc files, Gnome/KDE config files...
I could see a lot of reusability for a config file parser for application development. But it would seem like the development tools would need this XML ability and not just everyone using XML. But I imagine that C/C++, Perl, Python, Java, etc already have XML parsers/creators. So really people are waiting for developers to embrace XML.
Does anyone now embrace XML? What are the advantages of XML over other config file parsers? Are there other standardized config file parsers? I know I've written my share of wheels in different languages for parsing config files.
Joseph Elwell.
XML in place of the current config files would have some advantages. For one thing, it would allow use of a single high-performance parser to parse the files. The days of writing/copying a config file parser for every application would be over. Perhaps we could create a shared library to do this. (If there is any interest in a libxml.so, please let me know. It sounds like a cool project.)
It might reduce version control issues in some cases since new/unknown tags in XML can be ignored (much like unknown HTML tags are ignored). However, a well-written config file parser would do this already.
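A small Python sketch of the "ignore what you don't recognise" behaviour. The tag names in `KNOWN` and the sample config are made up; the point is only that an older program can read a newer file and skip the elements it has never heard of.

```python
import xml.etree.ElementTree as ET

# Hypothetical settings this version of the program understands,
# each with a converter for its text content.
KNOWN = {"port": int, "hostname": str}


def load_settings(xml_text, known=KNOWN):
    """Return (settings, ignored_tags) for the children of the root element."""
    settings, ignored = {}, []
    for child in ET.fromstring(xml_text):
        convert = known.get(child.tag)
        if convert is None:
            # Unknown tag, e.g. written by a newer version: skip, don't fail.
            ignored.append(child.tag)
            continue
        settings[child.tag] = convert((child.text or "").strip())
    return settings, ignored


settings, ignored = load_settings(
    "<config><port>8080</port><hostname>example</hostname>"
    "<compression>gzip</compression></config>"
)
print(settings, ignored)
```

Returning the ignored tags rather than dropping them silently lets the caller log a warning, which helps catch plain typos in tag names.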
It would probably speed up the process of creating GUI front-end configurators, since the parser/generators could be reused. An advanced user without a GUI configurator is like a fish without a bicycle, but it would be helpful to newbies and regular desktop users. The "Linux is hard to use" argument would start to go away.
There are some big drawbacks, though. The first is that tons of applications would have to be revised in order to read XML config files. In an open-source world, this means a long painful process where some developers switch to XML immediately and others wait a while. Then there is the pain of converting your customized httpd.conf/fstab/.profile/etc (bad geek pun intentional) files to XML.
Also, there is the fact that most of the cool tools in Linux are really designed for all of the Unix world. Realistically, the Linux environment can't just switch over to XML config files unless the entire Unix community does.
Maybe future apps should use XML as their config file format. I don't see our well-worn existing tools making a switch anytime soon, however.
Just my 2 cents.
Save the whales. Feed the hungry. Free the mallocs.
But the biggest win will come from minimising the proliferation of DTDs. If the community can co-operate on the development of common DTDs then the exchange of data between software agents developed by different projects will be hugely easier. By all means have different projects - both KDE and Gnome have, in my opinion, benefited from the competition between the two - but if that competition develops the sort of bitterness which reduces communication and co-operation we all lose.
I would strongly urge anyone who is developing a new software agent - whether it's a user-level application or a new daemon - which either stores data or exchanges data with other agents to seriously consider XML as a format, but more importantly should look at the DTDs that already exist to see if any will fit, and should communicate with anyone else working on related tools.
If anyone wants to look at the XML tutorial I gave at INET99 it's here
I'm old enough to remember when discussions on Slashdot were well informed.