Slashdot Mirror


On The CopyLeft Of DTDs

Makila writes: "I'm working on a project to digitize all the company marketing content, enabling us to re-use data for paper publishing, CD-ROM, and Web publishing. The idea beyond that, as we are distributors, is to have suppliers contribute their product descriptions electronically, including technical specs and pictures, all elements that would be edited by us afterwards." To make a long story a bit shorter, Makila is looking for opinions on putting his company's DTD under a free license. What pitfalls and advantages are involved in doing this?

"Writing a DTD is a challenge in itself (my company had never tried to go to the Web before, and never heard of XML until my project). To make the system work, we would then have to write software to adapt our suppliers' data models to ours: for n suppliers we would need 2(n-1) correspondences (import and export) from their data model to ours, which gets to be expensive on a large scale. Having a common model would help, especially for small companies not on the Web yet (those which rely only on paper data sheets, for instance). My opinion, as there is no standard in our industry like RosettaNet, is that we could speed things up, and avoid babelization of XML tags, by releasing our model with a Copylefted licence, lowering the cost and hassle for others in our market to build electronic publishing tools. Of course, there is a lot of money invested in our DTD, so what if competitors try to steal it?

Would the Copyleft of our DTD be a good idea?"
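
The arithmetic behind the converter count can be sketched as follows. This is a minimal illustration, assuming that n in the post counts every company including the distributor (the post does not say so explicitly); the function names are hypothetical. It compares the hub arrangement Makila describes with the alternative where every company converts directly to every other company's format.

```python
def hub_mappings(companies: int) -> int:
    """Converters needed when every supplier maps to and from one central model."""
    suppliers = companies - 1  # assumption: everyone except the distributor
    return 2 * suppliers       # one import and one export converter per supplier


def pairwise_mappings(companies: int) -> int:
    """Converters needed when every company maps directly to every other one."""
    return companies * (companies - 1)  # one converter per ordered pair


if __name__ == "__main__":
    for n in (3, 10, 50):
        print(n, hub_mappings(n), pairwise_mappings(n))
```

Under these assumptions the hub count grows linearly and the pairwise count quadratically, which is the cost argument for a shared model.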

25 of 51 comments

  1. Free is great, but by perdida · · Score: 2

    there are things that cost money. How about compatibility problems and other things that will require some sort of tech support? How will you pay for that? Or will your company be able to absorb the cost in the hopes that everyone begins to use your DTD?

  2. Propose a standard by TheWoundedSeagull · · Score: 2

    Once you have done the analysis and have a working system, put your XML DTD forward as the standard. Make this your business model - be the first to market. Get the kudos of being the ones who wrote the standard. Let the other companies "steal" your DTD - if they do, you have created a bigger, less segmented market in which you are the leaders. Try to be the best implementors and supporters of the system that you have written the standard for. What licence should a standard be released under? It does not really matter - the only problem is if another company takes the standard and uses an "embrace and extend" policy that makes old implementations incompatible. The role of the standards body is to check whether implementations conform to the standard. If you can't find the right standards body to do that - do it yourself - you need a "brand" to make this work.

  3. copylefted DTD by rsmith · · Score: 2

    You might have a look at the freely available DocBook DTD. If you can use that, you don't have to roll your own.

    If you copyleft your DTD, with a GPL-like license, nobody can steal it, because it's free. You might even create a standard, if it's a usable DTD. And you could share the load for data conversion, by asking your contributors to format the data according to your open DTD before submitting it.

    I'm not really sure if there are really any downsides, unless your DTD is in some way your critical moneymaking resource (although I can't imagine how).

    Just my $0.02.

    Roland

    --
    Never ascribe to malice that which is adequately explained by incompetence.
  4. Mainly depends on your internal politics. by Christopher+Thomas · · Score: 2

    Whether or not this is a good idea depends mainly on your company's approach to business.

    In the grand scheme of things, it won't help your competition much, as they'd just spend the time to develop their own in-house solutions anyways when they felt the need. The practical effect of releasing the spec is that you've made a fixed, one-time donation of manpower to your competitors (they no longer have to develop their own versions of this spec).

    On the other hand, there is little direct benefit to you in releasing the spec. Some groups will adopt it, others won't, and you'll still have to spend a lot of time beating on your customers to use it properly. The good news is that a) free/open tools to perform conversion to/from common formats may become available, which reduces your customer support load (you'd otherwise have to provide the tools yourselves), and b) the spec may be extended by others when shortcomings are noticed. This is a benefit - you get R&D for free.

    In practice neither effect is likely to be large unless you get lucky/unlucky. Your competitors will probably develop their own in-house specs tailored to their own needs anyways, and unless this is spectacularly useful, the Open Source and Free Software communities are unlikely to glom on to it to the extent required for free (beer) tools and an improved spec to appear.

    What will determine whether management approves/disapproves this idea is a) whether they're optimistic about the OSS/FS community's ability to spontaneously produce tools, and b) how cagey they are about their "intellectual property". Most likely scenario: They'll see no benefit and some potential loss, and more importantly see a chunk of their IP hanging out there for the world to see. Project not approved.

    But, IMO it's still worth a shot, as long as you state your justifications carefully and do your research.

  5. Hey... by soulsteal · · Score: 2

    Wasn't DTD banned in 1972 for causing Bad Things(TM)?

  6. Re:XML by Alex+Belits · · Score: 3

    Are you a moron? The whole question is about definition -- contrary to popular belief, XML is not a unified standard for representation of structured data, it's an umbrella standard for different kinds of data representation formats. And to use any of those XML-based formats one needs:

    1. DTD or Schema
    2. Description (in human-readable form) of what data actually means and what restrictions are placed on it

    XML was and is criticized for the lack of means to convert the second into code that can be automatically included in the first and used to create programs that operate on the data according to its semantics -- all a DTD is good for is to automatically determine whether certain input is indeed compliant with it (what is called "validation", even though it never guarantees that data is valid or consistent from the application or data model point of view), and for a human to read the description and write code to process the data.

    While XML still sucks because no such connection between formats and semantics can be established, the original question was about publishing the first (and hopefully the second), so others will be able to write applications that use the same format. A DTD can apply to either XML or SGML, but in this case there isn't much difference between them in the results for the programmer, as he will end up doing all the work after some simple parser has deserialized the data.
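
    The gap between structural validation and application-level validity can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `product` document, the `REQUIRED_CHILDREN` rule (a hand-written stand-in for what a DTD content model would express, since Python's standard library does not validate against DTDs), and the rule that prices must be non-negative.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: structurally fine, but the price makes no business sense.
DOC = """<product>
  <name>Widget</name>
  <price currency="EUR">-5.00</price>
</product>"""

# Stand-in for a DTD content model such as <!ELEMENT product (name, price)>.
REQUIRED_CHILDREN = ["name", "price"]


def structurally_ok(root: ET.Element) -> bool:
    """Check only element names and their order, roughly what a DTD can express."""
    return root.tag == "product" and [child.tag for child in root] == REQUIRED_CHILDREN


def semantically_ok(root: ET.Element) -> bool:
    """Application rule that lives outside the DTD: a price cannot be negative."""
    return float(root.findtext("price")) >= 0


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    print("structure:", structurally_ok(root))
    print("semantics:", semantically_ok(root))
```

    The second check is the kind of rule that has to be written by hand from the human-readable description, which is the complaint above.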

    --
    Contrary to the popular belief, there indeed is no God.
  7. Standardization. by istartedi · · Score: 5

    A DTD is supposed to standardize data formatting, isn't it? Think less "copyleft" and more "standardized". This is one situation where the Artistic license makes sense, because it requires non standard versions to be labeled as such.

    The Artistic license is so vague though, you might want to have your legal department draft something based on the BSD license, with a clause that hacked versions would have to be relicensed under a different name. That would give developers maximum freedom without compromising the standard. In other words, they could steal your code but they couldn't steal your brand name; similar to RedHat.

    A GPL'd DTD would compel other developers to release refinements, but it would do nothing to protect your brand. Brand theft would be far more damaging than code theft.

    --
    For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
  8. Re:No danger by Alex+Belits · · Score: 2

    No one ever made money on selling DTDs -- formats exist to be used, and keeping a "secret" DTD makes no sense as it can be easily reverse-engineered if someone really needed it. The only people who benefit from closed formats are ones that make their whole business model around selling software that implements them -- a model that is counterproductive for actual use of information.

    --
    Contrary to the popular belief, there indeed is no God.
  9. Don't use DTD - use XML Schema by uksv29 · · Score: 3

    If you are working on a new project use XML Schema rather than DTDs. DTDs are a hangover from the days of SGML and do not allow you much control on the content of your documents.

    If you use XML Schema then you can specify exactly the format and content of your fields and validate the document much more precisely than just PCDATA / CDATA permits.

    Go and have a look at the W3C site before you commit yourself, it is an easy change at the start of a project but will be much harder later.

    Description of XML schema can be found at http://www.w3.org/XML/Schema .

    1. Re:Don't use DTD - use XML Schema by frisket · · Score: 2

      SGML has some 30-40 extra options that we left out when designing XML, to make it easier to program for. Writing XML software is not hard; writing SGML software is much harder because you have to cope with all the options. There are lots of big differences in the tagging of SGML once you start using all the options for abbreviation (minimization).

      Linuxdoc uses SGML because they started before XML was invented. They did their stuff a bit off on a tangent without getting any help from people who already knew SGML, so a lot of it was very flaky for a long time. Now it's stable, but there's a huge legacy of SGML. But DocBook-X is ready to use.

      Toolchain is easy: XML--XSLT--LaTeX--PDF using JadeTeX, or XML--Omnimark--LaTeX--PDF. Emacs has an excellent xml mode built into the standard psgml-mode for SGML (does both). There's also a DTD mode for writing DTDs and an XSL IDE mode for doing XSL.

      Buzzword compliance is important when you have to deal with management. But XML requires fewer resources than SGML because it's simpler to program for. But structured doc and info needs careful planning no matter what s/w you use.

      ///Peter
      --

  10. GPLing DTD's makes no sense by Hieronymus+Howard · · Score: 5

    I've both used and written a number of DTD's and releasing one under the GPL would really make no sense. The GPL freely allows anyone to modify your code, which is the last thing you want with a DTD. Since a DTD is a formal specification, you need to keep control over it. Ideally, once defined, it should never be allowed to change. If people can modify it as they like, then it becomes useless, since your XML documents may not conform to the modified versions.

    If I were you, I would use something very similar to the Docbook copyright notice:

    Copyright 1992-2000 HaL Computer Systems, Inc.,
    O'Reilly & Associates, Inc., ArborText, Inc., Fujitsu Software
    Corporation, Norman Walsh, and the Organization for the Advancement
    of Structured Information Standards (OASIS).

    $Id: docbookx.dtd,v 1.12 2000/08/27 15:15:26 nwalsh Exp $

    Permission to use, copy, modify and distribute the DocBook XML DTD
    and its accompanying documentation for any purpose and without fee
    is hereby granted in perpetuity, provided that the above copyright
    notice and this paragraph appear in all copies. The copyright
    holders make no representation about the suitability of the DTD for
    any purpose. It is provided "as is" without expressed or implied
    warranty.

    If you modify the DocBook DTD in any way, except for declaring and
    referencing additional sets of general entities and declaring
    additional notations, label your DTD as a variant of DocBook.


    HH

  11. Who are you? by divec · · Score: 2

    > Of course, there is a lot of money invested in our DTD, so what if competitors try to steal it?

    How difficult would it be for a competitor to "steal" the DTD anyway? I mean, copy your ideas whilst renaming tags, restructuring the DTD a bit, and so on, till it wasn't provably derived from your DTD? The only point of you having a non-free license to defend your DTD is if this kind of defense might work. If your DTD would be easy to duplicate anyway, then you're not getting any security from a non-free license.

    As to whether copylefting the DTD would help your company, I think the answer largely depends upon who you are, and your relationship with your suppliers. If you are having problems persuading your suppliers to use your DTD, then being able to point to the open license might help: "this is poised to become the standard". On the other hand, if all your suppliers are happy to use the DTD already, then you won't make any short-term gain. You might make long-term gain if future suppliers would be more willing to use a copylefted DTD; but that depends on what your industry's like and what kind of stance your suppliers are likely to take.

    --

    perl -e 'fork||print for split//,"hahahaha"'

  12. Copyleft probably won't protect a DTD by Bruce+Perens · · Score: 4

    It's nice to copyleft your DTD, and it makes it possible for others to use it and to make programs that are compatible with your data storage format. It will probably encourage others to make their tools Free Software. That might make your program more popular, because it has all of these nice accessories that you didn't have to develop.

    On the other hand, don't expect the copyleft to protect your DTD. If anyone wants to use the data format in a proprietary application, well, they might not be able to use your DTD directly, but they can clone it and the result would probably not be considered a derivative work.

    There are a few rights that we want to protect for the good of Free Software. We don't want API copyrights to be enforcible. We want to have the right to reverse-engineer for purposes of compatibility. We don't want to have a Microsoft come along and say "You can't make word processors that are Word-compatible, the file-format is copyrighted". Asserting the copyleft on a file format isn't compatible with this. However, a DTD isn't a file format, just its description. Thus, go ahead and copyleft your DTD, but be aware of the limitations.

    Thanks

    Bruce

  13. "Copyright" DTDs make no sense by FFFish · · Score: 3

    For starters, it wouldn't take a team of rocket scientists to clean-room clone the DTD to a level of functionality that'd satisfy most anyone.

    For seconders, there are already a bijillion incompatible DTDs out there. The world doesn't need more.

    And most importantly, requiring your suppliers and/or customers to conform to a closed-source DTD *COSTS THEM RESOURCES.* You shoot yourself in the foot when you do that: as soon as someone with a cheaper solution comes along, kiss your contract goodbye.

    The best thing you can do is work *with* your competition to develop a *single* DTD that saves all your suppliers/customers money. Compete on the basis of service, of added-value, or something else that counts. Competing based on proprietary DTDs is just utterly stupid.


    --
    Don't like it? Respond with words, not karma.
    1. Re:"Copyright" DTDs make no sense by Black+Parrot · · Score: 2

      > For starters, it wouldn't take a team of rocket scientists to clean-room clone the DTD to a level of functionality that'd satisfy most anyone.

      Actually, a "clean-room" implementation, if possible at all, would have to have different tag names, or else you would get sued for violating the copyright. But with different tag names, it would not actually work with data tagged the original way. Those "open" documents would not be so open after all.

      IMO, copyrighted DTDs will be the major weapon in the next generation of attempts to corner the market via proprietary data formats.

      Alas, that's a vision somewhat different from the promise of XML.

      --
      Sheesh, evil *and* a jerk. -- Jade
  14. Release Those DTDs as Open Source! by rnturn · · Score: 2

    Unless you want your data to be inaccessible to anyone else. What would be the point of a company declaring ``We're Open! We use XML!'' and then tying up the use of the data with some silly license attached to the DTD?

    I'd love to see something big happen to XML. But then I had high hopes for EDI way back when. It turned into a total mess where every implementation was a custom job; it was doomed to fall on its face, and far fewer companies ended up wanting to take advantage of it. And each job was custom since no one could agree on things like what ``customer code'' meant. It's hard enough to get two divisions of the same company to agree on that, let alone two separate companies. Along comes XML and it just might fall on its face for similar reasons.
    --
    CUR ALLOC 20195.....5804M
  15. Eh? by Calamari+Indigo · · Score: 2

    I can't believe that DTDs are a serious subject on Slashdot. Next we'll be arguing about fonts.

    This sucks.

  16. DTD need consensual XML standards by crovira · · Score: 2

    I'm currently engaged in some XML efforts where I work. The hard part is not the XML. Almost all database engines can generate XML wrappers for data objects based on their schema or generate data objects from XML streams.

    The difficulty comes from getting two sets of people to agree on what the object definitions are or are going to be. That requires collaboration and cooperation, two things that are not going to come from any software effort.

    All software developers tend to treat the invention of fire as their exclusive intellectual property, and you can eat your meat cold and bloody or pay them for the privilege of cooking your steak.

    The effort will have to come from consortia of clients and related firms who use data processing but aren't in that business.

    That said, yes, you can publish the DTD specifications arrived at by the consortia, and they will be adequately covered by the document copyright.

    Though I think that using copyleft would allow you to avoid stupidity like the RAMBUS debacle.

    Newton said, "I see far because I stand on the shoulders of giants." Linus Torvalds, RMS, et alia are giants. Bill Gates is a big dip in the level playing field. Emulate Linus and you stand a chance. Emulate Bill and your effort will degenerate into a pack of wild dogs tearing at a haunch.

    --
    MSBPodcast.com The opinions expressed here are my own. If you don't like 'em... Think up your own stuff.
  17. Slashdot's Own Example Of DTD/XML Use by mccormick · · Score: 2

    If anyone is a bit unsure of what a DTD is, you may be interested to see how Slashdot (and the Slash code in general) use XML and DTDs.

    Slashdot (again, Slash if it's set up to) produces all headlines in a convenient, machine-readable format. It can be found at www.slashdot.org/slashdot.xml .

    At the same time, the DTD for this file (called 'Backslash', found at www.slashdot.org/backslash.dtd) essentially describes to an XML parser what is and what is not allowed in the file. It essentially defines what constitutes a "valid" document: a document is valid when, compared against the DTD, it conforms to the definition.

    "Well-formed" is another XML term, which means the document is at least formatted correctly according to the XML definition (for example, empty-element tags end in a forward slash, as in <br/>).
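
    A well-formedness check can be sketched with Python's standard library. This is only an illustration: the sample documents and the `is_well_formed` helper are made up for this example, and `xml.etree.ElementTree` checks well-formedness only; it does not validate a document against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical headline snippets, not the real Slashdot feed.
GOOD = "<story><title>On The CopyLeft Of DTDs</title><br/></story>"
UNCLOSED_EMPTY = "<story><title>On The CopyLeft Of DTDs</title><br></story>"
MISMATCHED = "<story><title>On The CopyLeft Of DTDs</story>"

if __name__ == "__main__":
    for sample in (GOOD, UNCLOSED_EMPTY, MISMATCHED):
        print(is_well_formed(sample), sample)
```

    Only the first sample parses; whether it is also "valid" depends on a DTD, which needs a validating parser rather than this check.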

    If you're interested in learning about XML and this DTD stuff, as well as all the latest proposals that are meant to replace DTDs (such as XML Schema), check out the official W3C site at www.w3.org/XML/.

    --
    Pete
  18. DTD's may not be subject to copyright, anyway by n8ur · · Score: 2

    There's a good chance that a DTD wouldn't be subject to copyright.

    Copyright protects the expression of ideas, and not ideas themselves (that's what patents are for). There's a copyright law concept called "the merger doctrine" that says (more or less) that you cannot copyright a work that represents the only possible expression of an idea -- to do so would result in copyright protecting the idea along with its expression, and that's beyond a copyright's power. The idea and its only expression are said to merge, and thereby fall out of the scope of copyright protection.

    (The case that set this idea out was Baker v. Selden, which the U.S. Supreme Court decided in 1879 and which had to do with a book of accounting forms -- the expression of the form was its idea, and as a result people were free to copy the layout of the form.)

    This is the reason right-thinking people believe that APIs cannot be copyrighted -- by definition, the API is the only accurate expression of the idea represented by the interface, and the merger doctrine applies.

    A DTD would likely be subject to the same reasoning.

  19. DTD is an interface... by Arandir · · Score: 2

    As I understand it, a DTD is an interface. As such, it should be completely unrestricted. No proprietary licenses, no copyleft, just plain old BSD, MIT or like license.

    I believe that copyright law says that you cannot prevent anyone from using an interface. Any license that restricts access to the interface is taking *away* a right that the user already possesses. This is a pretty big step for copyleft to take, and I don't know that it is legally valid without an end user license.

    Another option would be a "weak" copyleft, that guarantees access to the original DTD, but does not restrict any software that uses the DTD. Sort of an LGPL for DTDs. I know you guys want a world where the people you don't like don't exist, but you twist the meaning of "freedom" beyond recognition when you dictate the license that other people's XML documents must be under. (I'm not leveling this solely at the copyleft community, but also at the commercial firms that do the same with proprietary licenses).

    --
    A Government Is a Body of People, Usually Notably Ungoverned
  20. Re:XML by Alex+Belits · · Score: 2

    > As I recall, an XML DTD can optionally include CSS information. This would enable a user agent (browser) to display an xml document correctly.

    Who said anything about displaying information? Most information isn't meant to be displayed as often as it is meant to be processed, and my complaint is about the inability to create any processing routines without using some external information, even if the standard can be easily formalized in the form of constraints and processing algorithms that correspond to the nature of the data. I understand that in this GUI-obsessed consumer software industry people are more likely to first think about pretty forms of displaying the data, but for any kind of real work this would be the tail wagging the dog.

    --
    Contrary to the popular belief, there indeed is no God.
  21. Re:No danger by Alex+Belits · · Score: 2

    I would show you an example of a huge loss that was mostly wasted effort of reimplementing things multiple times and extending the format/protocol in ways that it was not able to accommodate because no one else designed their systems to be compatible with it, but the problem is... the format is still closed, so how can I publish anything about it?

    --
    Contrary to the popular belief, there indeed is no God.
  22. Re:No danger by frisket · · Score: 2

    The point about an SGML document is that a user MUST have the DTD in order to use the document, so it is impossible to keep it secret.

    Under certain circumstances you can use a DTD to create an XML document, and then send someone the document without the DTD, because XML may still work without it.

    XML doc control is different from open-source programming in that a DTD is not a program, it's more like a config file, and there is no point in keeping it secret.

    If your DTD is a useful tool, it makes sense to allow others to use it; if it's really useful it may even become a de facto standard.

    But there are so many useful DTDs out there already that creating your own should only be done after a document analysis has demonstrated that none of the existing ones will fit the bill.

    ///Peter
    --

  23. Re:XML by Alex+Belits · · Score: 3

    > Can you give an example of what you mean by creating processing routines without using some external information?

    Any kind of data that reflects something in real life. For example, a description of financial transactions, where only certain combinations of values are valid, and the effect of a transaction should be calculated using known algorithms based on the document content itself and the database that describes the entities involved, which can be known only partially to each party (say, I don't know how much money a brokerage and the government have, but I do know how to calculate the brokerage fee and taxes when I sell stock, and I know how it affects my account -- but I can't just ask the IRS and the brokerage to give me a bunch of machine-readable definitions that when compiled will allow me to process those things automatically).

    > XML 1.0 is the W3C's first Recommendation for the Extensible Markup Language, a system for defining, validating, and sharing document formats on the Web.

    I don't see the word "displaying" or any synonym for it anywhere in this definition. Documents may be shared for any kind of purpose, and displaying is just one of them.

    > The primary use of the web has been displaying information since it was invented. As far as I can tell, XML does exactly what it was designed to.

    Not true. HTTP was created as the protocol to transfer HTML files and images; however, since HTTP 1.0 full MIME type support was added, and the protocol was transformed into a "better FTP". I don't think the rpm file that I downloaded yesterday to update my Red Hat distribution was ever meant to be displayed -- it serves its purpose only by being processed by the rpm utility, and most of the data it contains is not human readable at all. XML is supposed to be used in the same way -- in my example of a financial transaction (for which the OFX format may be used -- it has first an SGML-based and later an XML-based version) the functionality is completely unrelated to the display of data, and even if such information is displayed, financial transactions usually are not displayed in the way they are performed, but converted and combined together to be human-readable using external algorithms.

    --
    Contrary to the popular belief, there indeed is no God.