On The CopyLeft Of DTDs
"Writing a DTD is a challenge in itself (my company had never tried to go to the Web before, and never heard of XML until my project). To make the system work, we should then write software to adapt our supplier's data model to ours: for n suppliers we would need 2(n-1) correspondences (import and export) from their data model to ours which gets to be expensive on a large scale. Having a common model would help, especially for small companies not on the Web yet (those which rely only on paper data sheets for instance). My opinion, as there is no standard on our industry like RosettaNet, is that we could speed up things, and avoid babelization of XML tags by releasing our model with a Copylefted licence, lowering the cost and hassle for others on our market to build electronic publishing tools. Of course, there is a lot of money invested in our DTD, so what if competitors try to steal it?
Would the Copyleft of our DTD be a good idea?"
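The scaling argument in the question can be made concrete with a little arithmetic. This is only a sketch, and it assumes n counts every company in the market (the asker included) and that each pair needs one converter per direction; the function names are made up for illustration.

```python
def converters_per_company(n: int) -> int:
    """Converters one company maintains when every partner has its own model:
    an import and an export mapping for each of the other n - 1 companies."""
    return 2 * (n - 1)


def converters_market_wide(n: int) -> int:
    """Directed mappings across the whole market without a common model:
    one for each ordered pair of distinct companies."""
    return n * (n - 1)


def converters_with_common_model(n: int) -> int:
    """With a shared model, each company only maps to and from that model."""
    return 2 * n


if __name__ == "__main__":
    for n in (2, 5, 20):
        print(n, converters_per_company(n),
              converters_market_wide(n), converters_with_common_model(n))
```

Under these assumptions the shared model only wins market-wide once there are more than three companies, but from then on the gap grows quadratically.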
Are you a moron? The whole question is about definition -- contrary to popular belief, XML is not a unified standard for representing structured data; it's an umbrella standard for different kinds of data representation formats. And to use any of those XML-based formats one needs:
XML was and is criticized for lacking any means to convert the second into code that can be automatically included in the first and used to create programs that operate on the data according to its semantics. All a DTD is good for is automatically determining whether a given input complies with it (which is called "validation", even though it never guarantees that the data is valid or consistent from the application or data model point of view), and for a human to read the description and write code to process the data.
While XML still sucks because no such connection between formats and semantics can be established, the original question was about publishing the first (and hopefully the second), so that others will be able to write applications that use the same format. A DTD can apply to either XML or SGML, but in this case there isn't much difference between them for the programmer, who will end up doing all the work after some simple parser has deserialized the data.
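To illustrate the gap between DTD "validation" and validity from the application's point of view, here is a minimal sketch. The order format and its element names are invented for the example, and structurally_ok() is only a hand-written stand-in for what a content model such as <!ELEMENT order (item+, total)> would check; it is not a DTD validator.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical order document: the structure is fine, the total is wrong.
ORDER_XML = """
<order>
  <item><quantity>3</quantity><unit-price>10.00</unit-price></item>
  <item><quantity>1</quantity><unit-price>5.50</unit-price></item>
  <total>99.99</total>
</order>
"""


def structurally_ok(root):
    """Roughly what a DTD can express: one or more <item> children
    followed by exactly one <total>."""
    tags = [child.tag for child in root]
    return (root.tag == "order" and len(tags) >= 2
            and all(tag == "item" for tag in tags[:-1])
            and tags[-1] == "total")


def semantically_ok(root):
    """An application rule no DTD can state: the total must equal the
    sum of quantity * unit-price over all items."""
    computed = sum(Decimal(item.findtext("quantity")) *
                   Decimal(item.findtext("unit-price"))
                   for item in root.findall("item"))
    return computed == Decimal(root.findtext("total"))


if __name__ == "__main__":
    root = ET.fromstring(ORDER_XML)
    print("structure ok:", structurally_ok(root))
    print("semantics ok:", semantically_ok(root))
```

The document passes the structural check and fails the arithmetic one, and that second check is exactly the part a human still has to code by hand.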
Contrary to the popular belief, there indeed is no God.
A DTD is supposed to standardize data formatting, isn't it? Think less "copyleft" and more "standardized". This is one situation where the Artistic license makes sense, because it requires nonstandard versions to be labeled as such.
The Artistic license is so vague, though, that you might want to have your legal department draft something based on the BSD license, with a clause that hacked versions would have to be relicensed under a different name. That would give developers maximum freedom without compromising the standard. In other words, they could steal your code but they couldn't steal your brand name; similar to Red Hat.
A GPL'd DTD would compel other developers to release refinements, but it would do nothing to protect your brand. Brand theft would be far more damaging than code theft.
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
If you are working on a new project, use XML Schema rather than DTDs. DTDs are a hangover from the days of SGML and do not give you much control over the content of your documents.
If you use XML Schema then you can specify exactly the format and content of your fields and validate the document much more precisely than just PCDATA / CDATA permits.
Go and have a look at the W3C site before you commit yourself. It is an easy change at the start of a project but will be much harder later.
A description of XML Schema can be found at http://www.w3.org/XML/Schema .
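As a rough illustration of the extra precision, the sketch below hand-codes the kind of per-field type check that XML Schema datatypes such as xs:decimal and xs:date let you declare, and that #PCDATA in a DTD cannot. It is not an XML Schema validator, the Python parsers only approximate the schema types, and the element names and the FIELD_TYPES table are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET
from datetime import date
from decimal import Decimal, InvalidOperation

# Hypothetical field-to-type table, standing in for declarations like
# <xs:element name="price" type="xs:decimal"/> in a schema.
FIELD_TYPES = {
    "price": Decimal,              # approximates xs:decimal
    "shipped": date.fromisoformat  # approximates xs:date (YYYY-MM-DD)
}


def type_errors(xml_text):
    """Return the tags of child elements whose text does not parse
    as the type declared for them in FIELD_TYPES."""
    errors = []
    for elem in ET.fromstring(xml_text):
        parse = FIELD_TYPES.get(elem.tag)
        if parse is None:
            continue  # no type declared for this field: any text passes
        try:
            parse((elem.text or "").strip())
        except (ValueError, InvalidOperation):
            errors.append(elem.tag)
    return errors


if __name__ == "__main__":
    good = "<part><price>12.50</price><shipped>2001-02-03</shipped></part>"
    bad = "<part><price>cheap</price><shipped>yesterday</shipped></part>"
    print(type_errors(good))
    print(type_errors(bad))
```

A DTD that declares both fields as #PCDATA would accept the second document just as happily as the first.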
I've both used and written a number of DTDs, and releasing one under the GPL would really make no sense. The GPL freely allows anyone to modify your code, which is the last thing you want with a DTD. Since a DTD is a formal specification, you need to keep control over it. Ideally, once defined, it should never be allowed to change. If people can modify it as they like, then it becomes useless, since your XML documents may not conform to the modified versions.
If I were you, I would use something very similar to the DocBook copyright notice:
Copyright 1992-2000 HaL Computer Systems, Inc.,
O'Reilly & Associates, Inc., ArborText, Inc., Fujitsu Software
Corporation, Norman Walsh, and the Organization for the Advancement
of Structured Information Standards (OASIS).
$Id: docbookx.dtd,v 1.12 2000/08/27 15:15:26 nwalsh Exp $
Permission to use, copy, modify and distribute the DocBook XML DTD
and its accompanying documentation for any purpose and without fee
is hereby granted in perpetuity, provided that the above copyright
notice and this paragraph appear in all copies. The copyright
holders make no representation about the suitability of the DTD for
any purpose. It is provided "as is" without expressed or implied
warranty.
If you modify the DocBook DTD in any way, except for declaring and
referencing additional sets of general entities and declaring
additional notations, label your DTD as a variant of DocBook.
HH
On the other hand, don't expect the copyleft to protect your DTD. If anyone wants to use the data format in a proprietary application, well, they might not be able to use your DTD directly, but they can clone it and the result would probably not be considered a derivative work.
There are a few rights that we want to protect for the good of Free Software. We don't want API copyrights to be enforceable. We want to have the right to reverse-engineer for purposes of compatibility. We don't want to have a Microsoft come along and say "You can't make word processors that are Word-compatible, the file format is copyrighted". Asserting a copyleft on a file format isn't compatible with this. However, a DTD isn't a file format, just its description. Thus, go ahead and copyleft your DTD, but be aware of the limitations.
Thanks
Bruce
Bruce Perens.
For starters, it wouldn't take a team of rocket scientists to clean-room clone the DTD to a level of functionality that'd satisfy most anyone.
For seconders, there are already a bijillion incompatible DTDs out there. The world doesn't need more.
And most importantly, requiring your suppliers and/or customers to conform to a closed-source DTD *COSTS THEM RESOURCES.* You shoot yourself in the foot when you do that: as soon as someone with a cheaper solution comes along, kiss your contract goodbye.
The best thing you can do is work *with* your competition to develop a *single* DTD that saves all your suppliers/customers money. Compete on the basis of service, of added-value, or something else that counts. Competing based on proprietary DTDs is just utterly stupid.
--
--
Don't like it? Respond with words, not karma.
Can you give an example of what you mean by creating processing routines without using some external information?
Any kind of data that reflects something in real life. For example, a description of financial transactions, where only certain combinations of values are valid, and the effect of a transaction should be calculated using known algorithms based on the document content itself and on a database describing the entities involved, which may be only partially known to each party. Say, I don't know how much money a brokerage and the government have, but I do know how to calculate the brokerage fee and taxes when I sell stock, and I know how it affects my account -- but I can't just ask the INS and the brokerage to give me a bunch of machine-readable definitions that, when compiled, will allow me to process those things automatically.
XML 1.0 is the W3C's first Recommendation for the Extensible Markup Language, a system for defining, validating, and sharing document formats on the Web
I don't see the word "displaying" or any synonym for it anywhere in this definition. Documents may be shared for any kind of purpose, and displaying is just one of them.
The primary use of the web has been displaying information since it was invented. As far as I can tell, XML does exactly what it was designed to.
Not true. HTTP was created as a protocol to transfer HTML files and images; however, since HTTP 1.0 added full MIME type support, the protocol has turned into a "better FTP". I don't think the rpm file that I downloaded yesterday to update a Red Hat distribution was ever meant to be displayed -- it serves its purpose only by being processed by the rpm utility, and most of the data it contains is not human-readable at all. XML is supposed to be used in the same way. In my example of a financial transaction (for which the OFX format may be used; it had first an SGML-based and later an XML-based version), the functionality is completely unrelated to the display of data, and even when such information is displayed, financial transactions are usually not displayed in the form in which they are performed, but are converted and combined using external algorithms to make them human-readable.