Slashdot Mirror


Can XML Replace Proprietary Document Formats?

Pauly asks: "My former profession of Technical Writer was made very painful by my customers' requirement to have their documents delivered in MS Office formats. PDF/FrameMaker was not acceptable, as they needed to be able to edit the documents as well. Let me tell you, it is painful watching a 3,000+ page Word97 manuscript, the fruit of weeks of hard labor, rendered into rubbish by my customer's Word95. I've missed deadlines, lost money, and will never forgive Microsoft for their abuse of me and my kind. My question: is it possible that XML-based standard file formats suitable for word processors, spreadsheets, etc. could be created that forever do away with proprietary binary formats and inadequate file conversion routines? This notion seems to be working for the graphics crowd in the form of SVG. The benefits are obvious; what are the drawbacks?"

291 comments

  1. Re:Proprietary Formats = Never Confused by Anonymous Coward · · Score: 1

    > Hardly any two web browsers render the same HTML the same way.

    True. But wasn't that the whole point of HTML in the first place? It was designed to deliver the same content on different platforms, with hints (= the tags) on the semantics of the content, not on how to format it. This way you could use whatever renderer you prefer (visual, text-to-speech, ...).

  2. XML needs to be integrated into Linux by Anonymous Coward · · Score: 1

    As a professional IT consultant working for one of the top names in the software industry I am working on a detailed report into the "open source" phenomenon (thanks to various people for pointing out that it is not freeware per se) as started by Linus Torvalds with his Linux operating system some six years ago.

    Anyway, let me tell you that XML is truly the wave of the future as far as the major players in the corporate domain are concerned. A lot of companies have rejected custom-built solutions in favour of the perceived "openness" that XML provides, and once the SOAP protocol is approved, XML will be the definitive technology for software to include in this day of interoperability.

    Since XML is going to be such a huge part of the industry within a year or two, what I think that Linux (and indeed, every operating system that wants to compete) needs to do is to integrate XML into the kernel as an intrinsic part of the system architecture. Then software vendors, looking for a rock-solid platform with which to write mission-critical apps in the internet able domain, will choose Linux as their platform of choice, since having XML integrated into the kernel will provide stability and performance attributes.

    Anyway, once all interapplication communication is achieved through the use of XML and the SOAP protocol, any operating system fully supporting these in a native environment will benefit hugely in the perception of tech-savvy CTOs. This is why I think that Linus and his kernel team should make this their number one priority if Linux is to succeed.

    1. Re:XML needs to be integrated into Linux by Darkstorm · · Score: 1

      Why have you gotten so offended?

      How to put it politely? Basically, it is very clear you do not understand kernels or XML or quite a few other things. You seem to be business oriented, and that is the problem.

      For those of us who do understand kernels and parsing and programming... what you are suggesting is not reasonable. The kernel has one major job: it controls the hardware; it is the controlling force of the OS. It handles TCP/IP and that is all. From there it is up to software to do the rest.

      As for running fast, well, since Linux tends to run fast and remain stable, that right there should be the important factor to an IT shop. XML, like HTML, is a language, not a piece of hardware. What you are suggesting would be like asking the kernel team to put the gcc compiler in the kernel. Besides being laughed at, you would be considered a fool and promptly ignored.

      You must also remember that linux for the most part is advanced by the people who love it. Selling something is not always first on their mind, if so there would be tons of "user friendly" software for linux. Most linux software is very stable, but not really pretty. This usually bothers no one because it works.

      If you are going to post here and post on something you know almost nothing about, expect to get flamed.

      --
      If ignorance is bliss, the world is full of blissful people
    2. Re:XML needs to be integrated into Linux by Darkstorm · · Score: 1

      It goes:
      Ethernet (or other networking hardware)
      TCP
      IP
      Software

      The reasoning for TCP being in the kernel is that most networks are using TCP. TCP (Transmission Control Protocol) controls the traffic to the Ethernet device. It handles packet loss and other functionality to get the information in and out of the Ethernet device. The IP is the link between TCP and software (and I'm not positive of its inclusion in the kernel). The Internet Protocol controls the routing of the packets from the TCP layer to the correct software applications, i.e. a Web browser, which then parses the HTML/XML.

      This is a bit general, but I don't have the need to go into the depths of TCP/IP every day.

      --
      If ignorance is bliss, the world is full of blissful people
    3. Re:XML needs to be integrated into Linux by Vaz · · Score: 1

      Oh, I thought thin client PCs were the wave of the future? Not to mention Prolog and AI in the 1980s?

      I don't need this "x is the wave of the future" spiel.

    4. Re:XML needs to be integrated into Linux by mistabobdobalina · · Score: 1

      OK, now obviously all of the responses to this fairly lame post have been valid. However there is a "kernel" (heh) of truth here i.e., MS is planning to expose everything as SOAP calls over HTTP. Linux should try to do the same.

      --
      -- your knees hurt, don't they?
    5. Re:XML needs to be integrated into Linux by nuance · · Score: 1
      It goes:
      Ethernet (or other networking hardware)
      TCP
      IP
      Software

      Actually it goes
      Ethernet (or other networking hardware)
      IP
      TCP
      Software
      TCP runs over IP as does UDP. TCP is lower level than IP

    6. Re:XML needs to be integrated into Linux by nuance · · Score: 1
      OK, sorry about the second post, but the preview button really is too close to the submit one :-(

      That last sentence of mine should read "IP is lower level than TCP". IP handles the routing of a packet over the hardware, TCP handles splitting the message into transmittable chunks (packets), error correction and higher level stuff like that.

    7. Re:XML needs to be integrated into Linux by mrfiddlehead · · Score: 1
      Troll? Methinks you give Mr Professional IT consultant too much credit. Clueless suit would be a better mod.

      He's right about the importance of pursuing XML as a standard though, but, as many have stated already, XML has no place in the kernel.

      --
      :wq
    8. Re:XML needs to be integrated into Linux by RightSaidFred · · Score: 1

      I believe your layers are switched:

      Physical (your hardware)
      IP (addressing and routing)
      TCP/UDP (control and error correction)
      Applications (HTTP, FTP, TELNET, etc.)

      Winsock / Berkeley Sockets is between the applications and the stack, not IP.

      If you think TCP is the sole TCP/IP protocol, it's not. UDP is used for DNS and DHCP. IP is in the kernel because even if you are not on a network, it is sometimes used for communication between processes.
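
      A minimal sketch of the layering described in this post, assuming Python and its standard socket module (neither is mentioned in the thread). The application only talks to the sockets API; the kernel's IP and UDP code does the rest, even though the datagram never leaves the machine. The function name is made up for illustration.

```python
import socket


def udp_loopback_roundtrip(message: bytes) -> bytes:
    """Send one UDP datagram to ourselves over the loopback interface."""
    # The sockets API is the boundary between application code and the
    # kernel's TCP/IP stack; everything below these calls is kernel work.
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Port 0 asks the kernel to pick any free port on 127.0.0.1.
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(2.0)
        sender.sendto(message, receiver.getsockname())
        data, _addr = receiver.recvfrom(4096)
        return data
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(udp_loopback_roundtrip(b"hello over IP"))
```

      Swapping SOCK_DGRAM for SOCK_STREAM (plus listen/accept/connect) would exercise TCP over the same IP layer, which is the point of the corrected stack order.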

    9. Re:XML needs to be integrated into Linux by Matts · · Score: 2
      You said:

      Since XML is going to be such a huge part of the industry within a year or two, what I think that Linux (and indeed, every operating system that wants to compete) needs to do is to integrate XML into the kernel as an intrinsic part of the system architecture. Then software vendors, looking for a rock-solid platform with which to write mission-critical apps in the internet able domain, will choose Linux as their platform of choice, since having XML integrated into the kernel will provide stability and performance attributes.

      I say I've never heard such horse s**t before. Integrating an XML parser into the kernel will do exactly zero for the performance of XML parsing. This is a userland task. It's fairly obvious that you know nothing about XML, nothing about kernels, and I'm guessing these "top names" didn't do enough research about your knowledge before asking you to do their report. I look forward to seeing it headline on ZDNet.

      Want to deliver XML with Apache to different media in varying styles? Get AxKit

      --

      Matt. Want XML + Apache + Stylesheets? Get AxKit.
    10. Re:XML needs to be integrated into Linux by Deven · · Score: 2

      Is the user "Deven" so stone cold stupid that he falls for the "IT Consultant" troll bait?

      It doesn't matter one bit that it was troll bait. Adding irrelevant stuff to the kernel is a bad idea, and even a troll could get other people thinking that maybe it's a good idea. The last thing we need is thousands of people clamoring to put every application into the kernel. (Sure, they wouldn't be heeded, but it would be a distraction nonetheless.)

      Or is the user "Deven" actually a sophisticated troll herself?

      Bzzt. Wrong on both counts: (1) I'm not trolling, and (2) I'm male.

      On \., who can tell?

      "On the Internet, nobody knows you're a dog." (Or a program that can pass a Turing test?)

      --

      Deven

      "Simple things should be simple, and complex things should be possible." - Alan Kay

    11. Re:XML needs to be integrated into Linux by imac.usr · · Score: 2
      Your buzzword-heavy post concerns me a bit as to its level of trollosity, but in case you're for real, Apple's already done something like what you describe, at least as far as configuration of the system - check out the first few pages of this ArsTechnica article on Mac OS X DP3 for details.

      --
      I use Macs for work, Linux for education, and Windows for cardplaying.
    12. Re:XML needs to be integrated into Linux by Deven · · Score: 5

      Why have you gotten so offended? If you don't like what I have to say then at least be polite, after all, it only reflects badly on you and hence Slashdot as a whole. I have commonly found an amazing resistance to different opinions amongst the "open" source community, which seems to me to be the antithesis of what you stand for.

      Comments such as "XML should be in the kernel" betray a lack of understanding as to the proper function of the kernel. Worse yet, (unlike, say, khttpd), putting an XML parser in the kernel wouldn't provide any benefit. All you're doing is encouraging the kind of useless feature bloat that Microsoft is rightly loathed for. That's why people get upset about remarks like this; they don't want this attitude to spread further than it already has.

      Anyway, what you are clearly unaware of is that the perception of performance and stability is far more important in the corporate domain than the actuality of the situation. By integrating XML into the kernel, you have provided Linux with a major marketing point for the people who are actually in charge of what their company uses.

      You won't be able to maintain the perception of performance and stability if the actuality is the opposite. Even Microsoft, with its legendary marketing might, has begun to pick up on this fact a little. (Note how stability has become a marketing point for them; why would it need to be, but for the constant crashing of their existing products?)

      The exact breakdown of an operating system varies from one OS to another. In general, the purpose of any "operating system" is to arbitrate and manage hardware resources. Anything else is basically fluff. XML parsing is an application support issue, and detracts from the core function of managing hardware resources. Occasionally, an application function may be put in the kernel for good reasons, usually related to huge performance advantages gained by an in-kernel implementation. (khttpd is an example of this.) Even this is resisted strongly, because it "pollutes" the most critical code in the entire system, and poses an inherent risk to the stability, integrity and maintainability of the system as a whole.

      Basically, to add an application-specific function to the kernel, you had better have a really good reason to be suggesting it, one that can be justified (and defended) on a technological basis. If Linus were to allow marketing considerations (such as this) to drive kernel development, not only would he lose the respect of most of his supporters, but the end result would be just as crappy as Windows, sooner or later.

      Given that Linus himself has talked about "world domination", doesn't it seem short-sighted to ignore a major selling point in favour of your petty-minded arguments?

      Keep in mind that "world domination" remarks are somewhat tongue-in-cheek. Yes, he's half-serious, but only half. He wants people to use Linux over Windows because it's a better system. It wouldn't remain better if this approach to kernel development were adopted. Keeping the kernel pure isn't a "petty-minded argument"; it's a critical element of good design.

      All that said, you would have received a much different response had you suggested that Linux systems (as a whole) start integrating XML support, use XML for system configuration and provide XML services for applications. There's a good argument to be made for that, and the marketing value should be similar. There are also technological arguments to be made in favor of it. The distinction here is that this support would all be in "user space" rather than the kernel, even though it might be an integral part of the operation of the system as a whole. The kernel is the core of the system, and the idea of integrating XML into Linux does not imply that it belongs in the kernel.
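
      As a rough illustration of XML-based system configuration handled entirely in user space, here is a sketch using Python's standard xml.etree.ElementTree parser. The configuration document, its element names and its attributes are hypothetical; nothing like this format is specified in the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration document; the element and attribute names
# are invented for this example.
CONFIG_XML = """
<system>
  <service name="httpd" enabled="true" port="80"/>
  <service name="ftpd" enabled="false" port="21"/>
</system>
"""


def enabled_services(xml_text: str) -> dict:
    """Return {service name: port} for every service marked enabled."""
    root = ET.fromstring(xml_text)
    return {
        svc.get("name"): int(svc.get("port"))
        for svc in root.iter("service")
        if svc.get("enabled") == "true"
    }


if __name__ == "__main__":
    print(enabled_services(CONFIG_XML))
```

      The parsing happens in an ordinary process; the kernel is involved only in reading the file, which is the division of labor argued for above.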

      --

      Deven

      "Simple things should be simple, and complex things should be possible." - Alan Kay

  3. OT: First Post Generator by Anonymous Coward · · Score: 1
    How about a topic where the /. engine generates a 1st post for *every* reply? On top of that, a gen-u-ine "Slashdot First Post Certificate" is downloaded to the submitter. Then 1st posts will be common, and this whole thread will disappear! No?

    According to the old book "Hackers", an old MIT OS had a HALT (or CRASH) command accessible to any user. It would bring the computer to a stop, requiring a reboot. The idea was to make crashing the computer trivial, so that the current game of "can I write a program to crash the computer?" would stop.

    Of course, this command depends on an ethic to *not* run the command (i.e., the "brotherhood of man" instead of the "depravity of mankind").

  4. Reading comprehension? (Was: This is beneath /.) by Anonymous Coward · · Score: 1
    Sorry, but you deserved to lose your document.

    Did you read the intro? The document wasn't lost, it was mangled.

    DonkPunch asked earlier, "How was this [the foisting of inferior products on an uninformed populace] allowed to happen?"

    This is your answer. It was allowed to happen because some of the loudest, most visible advocates of alternate products were the least polite, the least diplomatic, and, in this case, the least thoughtful.

    I would find this hilarious if it wasn't true. A lot of Slashdotters are fond of throwing around "RTFM!" or "You deserved to get hosed because you made a mistake!", but they can't be bothered to read (or take to heart) the advocacy HOWTO. For pity's sake, hair-trigger flamage is so common in the anti-MS community that somebody actually felt the need to write a document telling people what ought to be obvious: mindless flamage does not change someone's mind.

    www.alarmist.org

  5. Re:Why not even html by Luis+Casillas · · Score: 1
    The title in your post is really crazy. I thought you were saying something absolutely nuts, until I distilled the essentials of what you want: portable file format, editable with simple tools, readable on its own, etc.

    HTML is a crazy choice (at least as a _writing_ format; it might be a good choice for _presentation_), but LaTeX is pretty good for this.

    Not to say that LaTeX is perfect-- I have tons of gripes with it, but it works, it's here now, and has a large user base that continuously extends it.

    Seriously, name one feature in Word97, Star Office, WordPerfect that couldn't be done in a nice GUI html editor? Just name one, one example.

    Multiple column text. Footnotes and endnotes. Automatic section numbering. Automatic generation of tables of contents and indexes. Extensibility with macros. Precise control over how your document will be formatted on the printed page. Need I continue?

    Not that Word does any of these particularly well, BTW ;)

  6. Agreed. by Luis+Casillas · · Score: 1
    TeX is Turing-complete, which for a description language is a bad thing.

    You've hit the nail on the head. This is very closely related to my major gripe with TeX and LaTeX.

    The fact that you extend TeX by writing programs in TeX is absolutely horrendous. I find it to be completely dense and impenetrable. TeX might be a reasonable page description language, but it absolutely sucks as a programming language.

    If I could design the LaTeX replacement of my dreams, I'd go with a system with 2 or 3 languages:

    1. The language you use to write the document itself. I'd pick some SGML DTD, and give it a standard tagset that mirrors LaTeX.
    2. A simple language to provide a macro-rewriting facility for end users. Stuff to save users from typing too much.
    3. A full blown programming language to implement serious extension modules to the basic system. An extension could thus be a DTD defining additional tags, together with a program to provide semantics for them.

    And did I mention that TeX's syntax is just horrible? "\command{...}"-- it's just begging for users to mismatch braces when stuff gets nested too deep!

  7. Re:Why not even html by Luis+Casillas · · Score: 1
    Multiple column text.

    read "TABLE" tag.

    Ugly hack; basically, you have to explicitly lay out your text in the two columns. And switching back and forth between multicolumn and normal mode requires a lot of editing.

    Footnotes and endnotes.

    Read "FONT SIZE=1" and "I"

    Yeah right. Has it occurred to you that there's more to footnotes than just small script?

    Automatic section numbering.

    uh what? an html editor would be able to automatically add page numbering, not sure what section numbering is

    You don't know what section numbering is, yet you want to claim HTML is adequate for serious document preparation? Jeez.

    Automatic generation of tables of contents and indexes.

    Read "TABLE" again

    Yeah, right. Which is the HTML tag whose semantics consists of looking at all the sections and subsections in my document, figuring out their numbering and on which page they occur, and automatically generating a table of contents?

    Which is the HTML tag that allows me to mark points in the document I want indexed, and which is the tag that, when encountered, will cause an index of all the parts I marked, with page numbers, to be generated automatically?
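
    To make concrete what "automatic section numbering" and a generated table of contents involve, here is a small sketch in Python. It assumes a made-up XML document type with nested section elements carrying a title attribute; it is not LaTeX's algorithm, and it leaves out page numbers, which need a real layout engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical document type: nested <section title="..."> elements.
DOC = """
<doc>
  <section title="Introduction"/>
  <section title="File Formats">
    <section title="Binary"/>
    <section title="XML"/>
  </section>
</doc>
"""


def table_of_contents(xml_text: str) -> list:
    """Return [(section number, title), ...] in document order."""
    toc = []

    def walk(parent, prefix):
        count = 0
        for child in parent:
            if child.tag != "section":
                continue
            count += 1
            number = prefix + [count]
            toc.append((".".join(map(str, number)), child.get("title")))
            # Subsections extend the parent's number: 2 -> 2.1, 2.2, ...
            walk(child, number)

    walk(ET.fromstring(xml_text), [])
    return toc


if __name__ == "__main__":
    for number, title in table_of_contents(DOC):
        print(number, title)
```

    The author never types a number; inserting a new section renumbers everything on the next run, which is what a plain HTML tag cannot do.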

    Extensibility with macros.

    Any decent GUI html editor would be able to add this feature

    But again, you sorely miss the point. I don't mean a keyboard macro in an editor to insert some preset text; I mean a macro facility to extend the document language itself. In HTML, this would be, for example, something like a tag that let you define a new tag.

    When you are repeating a particular pattern over and over, you want a new language idiom to represent the pattern (which makes your document more legible), not just a keyboard macro to insert it over and over again. I do this all the time in LaTeX-- when I find I'm repeating some commands too often in a certain way, I define a new command to encapsulate the pattern.
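
    A toy version of "a tag that lets you define a new tag", sketched in Python. The MACROS table and the term tag are invented for illustration; this is far simpler than LaTeX's \newcommand and only shows the idea of extending the document language rather than the editor.

```python
import xml.etree.ElementTree as ET

# Hypothetical macro table: each new tag expands to a chain of existing
# tags, outermost first. Here <term> means "bold italic".
MACROS = {"term": ["b", "i"]}


def expand_macros(elem, macros):
    """Rewrite macro tags in place, depth first."""
    for child in list(elem):
        expand_macros(child, macros)
    chain = macros.get(elem.tag)
    if not chain:
        return
    elem.tag = chain[0]
    outer = elem
    for tag in chain[1:]:
        # Move the current content one level down into the next tag.
        inner = ET.Element(tag)
        inner.text, outer.text = outer.text, None
        children = list(outer)
        for child in children:
            outer.remove(child)
        inner.extend(children)
        outer.append(inner)
        outer = inner


def render(source: str, macros=MACROS) -> str:
    root = ET.fromstring(source)
    expand_macros(root, macros)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render("<p>See <term>kernel</term> above.</p>"))
```

    The document keeps saying term, so changing how terms look means editing one table entry instead of every occurrence.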

    Need I continue?

    More than likely, I still don't see your point. Ok, let's assume Word97 is the greatest word processor of all time, say it has the perfect user interface.

    I refuse to make that assumption ;-)

    Ok, now on the back end, rip, tear and pull out that proprietary format. Ok, so you just have the front end now, right? Ok, when it tries to save the file, have it push everything out into html instead of Word97 DTD. Ok, so you have an HTML file pushed out, but the user doesn't know it is an html document, pretty neat huh?

    One thing is to use HTML as the file format for a particular application-- the application can do a lot of things that HTML _itself_ doesn't have. But the original post was talking about editing HTML with vi or HotDog Pro.

    My point was that HTML is a terrible language for _authoring_ documents. If you want to use some portable, application independent language for _writing_ serious documents, HTML doesn't cut it-- it's underfeatured. A good language for these purposes is LaTeX; it's free, extensible (and has tons of extensions), featureful, and can be used to output a document into many formats: dvi, ps, pdf and html, among others. It has professional typesetting capability. It has facilities for automating tables of contents, indexes, bibliographies, and many other things.

    Really, take a look at LaTeX and its feature set. This is the kind of thing you want for a document authoring language.

  8. Re:Typical slashdot ignorance by Pauly · · Score: 1
    The "typical ignorance" is yours alone this time. Microsoft has everything to do with:

    Incompatibility between versions of Word

    Binary file formats that work only in their software

    Brutish business practices that remove the benefits of competition in this marketplace

    My customer is as much a sucker in this scenario as you and I.

  9. Re:Already Happened by echo · · Score: 1

    I've heard this many times by many people, but how true is it really? I installed Office 2000 and opened up Word 2000, I typed a paragraph, then went to "File.. Save"... gave it a name... it put the default .doc extension on the file.

    Then I tried opening up the .doc file with Notepad.. and.. guess what! Binary garbage.. no XML.

    Am I missing something here?
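
    The Notepad experiment can be repeated programmatically. The sketch below, in Python, checks the first bytes of a file: the eight-byte signature shown is the one used by OLE2 compound files, the container that binary .doc files of this era are stored in. The function name and the returned labels are made up for this example.

```python
# Signature at the start of every OLE2 compound file (binary .doc, .xls, ...).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
UTF8_BOM = b"\xef\xbb\xbf"


def sniff_format(data: bytes) -> str:
    """Classify a file by its leading bytes: binary container or markup."""
    if data.startswith(OLE2_MAGIC):
        return "ole2-binary"
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    # XML and HTML both begin with '<' once leading whitespace is skipped.
    if data.lstrip().startswith(b"<"):
        return "markup"
    return "unknown"


if __name__ == "__main__":
    print(sniff_format(OLE2_MAGIC + b"\x00" * 16))
    print(sniff_format(b'<?xml version="1.0"?><doc/>'))
```

    To test a real file, pass it the result of open(path, "rb").read(64).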

  10. Re:Ever hear of OpenDoc? by Malc · · Score: 1

    Microsoft aren't the only people doing this. It seems to me that SQL in MS SQLServer is more SQL-92 compliant than in Oracle. Doesn't Oracle use that wacky (+) operator to do outer joins, or something like that?

  11. Re:Sabotage by Malc · · Score: 1

    "Vive le France!"

    Is that: Vive la France!

    My French is rather rusty though ;)

  12. Re:Hey genius... by Malc · · Score: 1

    Saving Word97 as Word95 might not work too well if he's been too trendy and tried to use all of the latest cool things. One has to support the lowest common denominator until it becomes too much effort or the profit gains aren't worth it. If this has happened more than once, he should have reconsidered using Word97 in the first place.

  13. Re:Already Happened by Evangelion · · Score: 1


    Unless, of course, they publish it on the web as a 'trade secret'.

    Then anyone who parses it can theoretically be sued for violation (because, well, you *could* have looked at the spec. remember that - the posting of the trade secret on the web is probably just preparing a club to use on anyone who implements a version. not legal, but, hey, how many open source programmers can legally defend themselves against MS?)

  14. Re:Already Happened by Evangelion · · Score: 1


    tell that to the decss authors.

  15. Office 2000 and XML by pberry · · Score: 1

    I did a little digging into some Office 2000 "white papers" and Microsoft talked a lot about storing a lot of internal file format stuff in XML. I think they might be doing a little buzzword dropping and embrace and extend at the same time. They say they are basically using XML for all their 'save as' formats such as HTML. Link here

    I can't say for sure since I haven't purchased a copy of Office 2000 yet.

    --
    -- Are you an EFF member yet?
  16. Re:It will never work. by PiMan · · Score: 1

    On the contrary, XML can cover every conceivable standard. It's a metalanguage, not a language in its own right. You use it to create languages, which in turn describe documents. DTDs are incredibly versatile; Schemas will be more so.

    Ideally, you don't make tags for bold, you'd use the existing, open, XSL format to stylize documents.

    HTML (well, XHTML) is an example of an XML format. Open standards exist for hypertext structure, vector graphics, software descriptions, multimedia, wireless transfer, documents, and the stylization and display of all of the above.

    Generally, XML documents are "well-formed" (or am I thinking of "valid"? Always get them confused...), which means you can read them even without a DTD, and get a good idea of what they do. However, like source code, XML can be purposefully obfuscated, so you can still proprietize it somewhat. But it's definitely better than, say, .doc or .pdf.
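
    On the terminology: "well-formed" is the right word here. A well-formed document obeys XML's syntax rules (properly nested, closed tags) and can be parsed with no DTD at all; a "valid" document is well-formed and also conforms to its DTD. A quick sketch in Python, using the standard xml.etree.ElementTree parser, which checks well-formedness only (the helper name is made up):

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """True if the text parses as XML; no DTD or schema is consulted."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed("<memo><to>Pauly</to></memo>"))  # properly nested
    print(is_well_formed("<memo><to>Pauly</memo>"))       # <to> never closed
```

    Checking validity needs a validating parser and the DTD itself, which this sketch deliberately does not attempt.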

    --
    Windows 2000: Designed for the Internet. The Internet: Designed for UNIX.
  17. Re:Why not even html by perfecto · · Score: 1
    It's called size you dude. For every HTML tag on a page you need at least 7 bytes, you have two '<', two '>', one '/' and at least two letters. In a binary document format you have only a single byte to specify a text format and then one to end it, two bytes is a lot better than a minimum of 7. If you were typing up a really neat looking many-page document that you need to send over a network, every byte you save on size is less money it costs to transmit the document. For one person on a fat pipe it isn't that big a deal but with lots of people the pipe gets a lot smaller. Word doesn't exactly save space but a well designed binary format would save a lot of space.

    considering that a minimum word document is 19k, i think your point is moot!



    --
    J Perry Fecteau, 5-time Mr. Internet
    Ejercisio Perfecto: from Geek to GOD in WEEKS!

  18. Ethical Standards: If you don't mind indulging... by freeBill · · Score: 1

    ...an old man, read this little digression into history:

    The fundamental question here is: Who owns my company's data? My company or the company which wrote the program which I use to store it? I think few companies would answer that by saying their vendor owns it. (And any who do should have a stockholder revolution tomorrow to throw the idiots out.)

    Why then do we allow any company to encrypt our data in proprietary data formats? (It is encryption, even if it is poor encryption which has been broken many times.)

    The answer lies in the history of programming: There was a time when many of the developments in computer technology came in the area of file formats. When a company implemented a great new feature, part of the research that went into developing the feature was a file format. And coders, justifiably proud of their research, made those file formats proprietary to help protect their original research.

    That day is probably long past. Most proprietary formats still in use today are far inferior technologically to XML. The only purpose they still serve is the form of limited encryption they offer which allows vendors to hold customers' data hostage.

    This is wrong. Coders should admit it's wrong. It's wrong whether it's done by Microsoft or some independent programmer working from his basement. The data belongs to the customer. If you can't keep the customer by satisfying his needs, you should not be allowed to keep him by deliberately making it more expensive to change vendors.

    Nor should you be allowed to extort more money for upgrades by creating non-compatible formats.

    What we really need is a voluntary set of standards for programmers. Such a standard should emphasize the rights of the user, including the right to own the file format for all data storage.

    I would suggest a few additional standards as well. Although I prefer open-source and non-proprietary programs, I don't think that should be a part of the standard. But there should be a guarantee that any programmer subscribing to the standard should release all source code at any point where he or his company is no longer going to support it.

    I'm sure other standards could be adopted. If we voluntarily subscribed to such a standard, it would make it much harder for any company which did not subscribe to justify its abuses of the user.

    The very first Word-for-Windows file format was a step backwards technologically. And the science of file formats has progressed considerably in the years since then. But Microsoft's file formats have not improved. (One could even argue they have taken some giant steps backwards, although some of those backward leaps were probably originally intended to improve things.)

    Can you imagine what MS representatives might say if they were asked, "Why don't you subscribe to the Code of Good Programming Practices?"? They certainly couldn't argue their file formats are better technologically.

    And then comes the inevitable follow-up question, "Don't you believe that your customers' data is owned by them?"

    If they answered "no," any corporate executive who bought a Microsoft product might become civilly liable to a shareholder lawsuit on the grounds that such a purchase put the company at risk for future extortion. If such a risk was not mentioned in quarterly reports, a class-action lawsuit might be possible.

    Until coders adopt the kind of standards of professional behavior common in other professions (CPAs, financial advisers, doctors, even CLOWNS, for goodness sake!), they are not going to be successful in convincing consumers that they should not buy products from programmers who do not abide by those standards. If we cannot articulate the standards which we feel Microsoft is violating, how can we expect the pointy-haired bosses to figure it out on their own?

    --
    Eternal vigilance only works if you look in every direction.
  19. Re:Why not even html by dvdeug · · Score: 1

    Gzip and zip are two commonly used compression formats, the first of which the KWord people are using in their documents. Either will win you more than your hacked quasi-compression, and neither loses readability (zless, for instance.)
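
    A quick way to see this for yourself, sketched with Python's standard gzip module. The sample markup is made up and deliberately repetitive, so the ratio it shows is not representative of real documents; the point is only that tag overhead compresses away and the round trip is lossless.

```python
import gzip

# Made-up, repetitive markup standing in for a tag-heavy document.
SAMPLE = "<p><b>Some bold text</b> and <i>some italic text</i>.</p>\n" * 200


def sizes(markup: str) -> tuple:
    """Return (raw size, gzipped size) in bytes for a markup string."""
    raw = markup.encode("utf-8")
    packed = gzip.compress(raw)
    # Decompression gives back exactly the original text.
    assert gzip.decompress(packed) == raw
    return len(raw), len(packed)


if __name__ == "__main__":
    raw_size, packed_size = sizes(SAMPLE)
    print(raw_size, "bytes raw,", packed_size, "bytes gzipped")
```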

  20. Re:WP formats by dvdeug · · Score: 1

    LyX, not Klyx. Klyx was a one time hacked port to KDE that wasn't kept up to date, and shouldn't still be out there to sully LyX's name.

    ASCII was never designed for WP and DTP. It does work for plain text, and as the basis for higher level protocols, though. Again, Unicode (not wide ASCII) was designed to be a plain text channel, not a WP channel.

  21. Succinct answer from another TW by alumshubby · · Score: 1

    Technologically, yes. Given market realities, doubtful.

    --
    "How many light bulbs does it take to change a person?" --BMcC-->
  22. Re:An open question by acroyear · · Score: 1
    Not because somebody actually went to a store and bought the latest MS Office.

    What happens is MS updates their software. The OEMs that do the bundling (including Dell, where a LOT of companies get their computers), automatically include the upgrade of Office on every NEW box that comes out.

    Now, the next wave of new computers that goes to a company automatically is given the update, and instantly, incompatibilities exist. The only solution is for the company to independently purchase a complete set of licenses for the new product to all other employees, pass the cd around, and get on with it.

    MS then makes the real money, when (in order to take work home and bring it back again without lossage) the employee has to buy a copy of the Office upgrade from a store, at store prices, at their own expense.

    Dear God, how the money rolls in...

    --
    "But remember, most lynch mobs aren't this nice." (H.Simpson)
    -- Joe
  23. Doesn't RTF address this? by Doctor+Memory · · Score: 1

    I am (thankfully) not an expert, but doesn't RTF address this? I mean, I understand you won't be able to embed "live" spreadsheets and graphs (a la OLE), but won't it at least properly handle fonts, paragraph spacing and page breaks? The downside, of course, being that your file is considerably larger, and can take some time to convert.

    Actually, what you want is SGML, but I don't know of any SGML-compliant word processors. Both XML and HTML are facets of SGML. Maybe what you really need is a HTML->Word translator (since Word will [allegedly] already output in HTML format).

    --
    Just junk food for thought...
  24. Re:Barriers exist right now... by Mr.+Frilly · · Score: 1

    Hmmm, from your needs, it looks to me like you've just completely described LaTeX!

    Just remember, in engineering/computer science/etc., most problems have already been solved. The problem of document processing (at least in my subjective mind) was already solved in the 80's with TeX/LaTeX.

  25. Re:Small-minded viewpoints by Darkstorm · · Score: 1

    Just because I use phrases applicable to the high-level people I work with I should be reviled? I suppose to you peons working at the bottom that seems fair, but you'll never earn the $$$ I'm on with a narrow-minded attitude like that.

    I shouldn't do this, but I just can't help it.

    Ok, Mr Big shot, one thing I have discovered since I've been working with computers (and it has been quite a few years now) is that the big level marketing people have discovered they can resell the same crap over and over if they come up with fancy new words for the same old stuff. The major difference between you and the professional IT/development type people here is that we don't need fancy words to describe what's going on. You and your high level know-nothings may have your fancy words, but how long will you be in business without us low level peons who actually do the real work and skip over the fancy phrases for the same old shit?

    If that is the way you think of others you wouldn't last long as a project manager. Fancy words do not imply knowledge to those of us who do understand something about computers and programming. So skip the "Why is everyone getting upset" act and face the fact that you do not know what you are talking about. Your post was as bad as the zdnet article linking all open source projects to mozilla, and making them all out to be the same. Before you decide to try and impress the people here go find one of the peons and let them read your post first. If they start laughing, just don't bother.

    This is a Flame. It was intentionally written and posted as such.

    --
    If ignorance is bliss, the world is full of blissful people
  26. Visual DocBook editor by sdanic · · Score: 1
    What's required is a visual DocBook editor which secretaries can use, and is cheaper than the proprietary Word Processors that it's intended to replace.

    Then, secretaries won't mind using it, bosses won't mind buying it, tech people won't get frustrated with the document format.

    Check out the following request for a Visual DocBook/XML Editor on CoSource and lend your support.

    Currently Arbortext Adept, SoftQuad XMetal and Adobe FrameMaker cost way more than MS Word does, so there's no incentive for frugal business managers to purchase them. An inexpensive, easy to use visual editor might be able to tip the balance and encourage people to produce documents in non-proprietary formats.

  27. OK, I take it back by DLPierson · · Score: 1

    If the quotes in some of the replies in this thread are accurate, it seems that the guy really does want XML support in the kernel. If so, he's hopeless.

  28. Aw, cut him some slack by DLPierson · · Score: 1

    Yeah, he's pretty clueless about Linux, doesn't understand the difference between the kernel and all the higher layers, doesn't understand the non-organization, etc. He also appears to have no idea how much XML support is already available on Linux.

    However, he's right on one important point. The business world is going overwhelmingly to XML as a basis for interoperability. Those of us who care about connecting to other business systems need to plan to, uhm, exploit this.

    Anything that helps Linux talk to the rest of the world helps. SOAP is simply a pretty flexible XML based remote procedure call standard (descended from XMLRPC). Even though M$ started it, all the implementations I'm aware of are OS independent ones (Perl, Java, and soon Python) from non-M$ sources.
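
    To give a feel for what "XML based remote procedure call" means on the wire, here is a small sketch using Python's standard xmlrpc.client module. Note that this is XML-RPC, the ancestor mentioned above, not SOAP itself (a SOAP envelope is more elaborate), and the method name "docs.wordCount" and its arguments are made up for the example.

```python
import xmlrpc.client

# Encode a call to a hypothetical remote method as XML text.
request_xml = xmlrpc.client.dumps(
    ("manual.xml", 3000), methodname="docs.wordCount"
)
print(request_xml)

# The receiving side, on any OS and in any language with an XML parser,
# decodes the same text back into parameters and a method name.
params, method = xmlrpc.client.loads(request_xml)
```

    Nothing in the payload depends on the operating system at either end, which is the whole appeal for interoperability.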

  29. (-1: Offtopic) by Gromer · · Score: 1

    In the US more than 95% of the privately held land is owned by only 3% of the population.

    And more than 95% of the cattle is owned by less than 1% of the population. What's your point? Why does absolutely everything have to be equally distributed? If your statistic referred to money it would make more sense, but land ownership is about as relevant to the modern world as cattle ownership

    --
    "Never let your sense of morals prevent you from doing what is right" -Salvor Hardin
    1. Re:(-1: Offtopic) by quonsar · · Score: 1

      ...land ownership is about as relevant to the modern world as cattle ownership.

      You must have been staring at your monitor too long. Land is the ultimate resource, every human endeavor requires it. The demand for it grows exponentially, and there is a limited supply. Figure it out.

      ======
      "Rex unto my cleeb, and thou shalt have everlasting blort." - Zorp 3:16

    2. Re:(-1: Offtopic) by quonsar · · Score: 1

      One would think a Slashdot reader, of all people, would realize how dated this concept is.

      Perhaps my profession (real estate appraiser) has warped my viewpoint. I doubt I am your typical Slashdot reader! And I've NO doubt there are plenty of smarter people here than I... :-) Thanks for a thought-provoking response.

      ======
      "Rex unto my cleeb, and thou shalt have everlasting blort." - Zorp 3:16

    3. Re:(-1: Offtopic) by Gromer · · Score: 2

      One would think a Slashdot reader, of all people, would realize how dated this concept is. Land may be the ultimate resource in terms of overall human welfare (though I contest that), but that does not make it a meaningful measure of power or social stratification. I am a member of the American elite in a number of ways (white, male, upper-class, well-educated), and by any sane measure I have a disproportionate amount of power in this country (or will, once I get out of college :-)), and yet I am likely to never own any more land than my house covers, if that. Land may be where food comes from, but it sure isn't where influence or power comes from. 3% of America probably does hold most of the power in this country, but it's not the 3% that owns 95% of the land, it's the 3% that has 95% (or however much) of the money. And believe me, they're not the same 3%. Bill Gates isn't a big landowner. Neither is Larry Ellison. The only man to get notably rich off land lately is Donald Trump, and he did that in real estate, which isn't really the same thing.

      As for the demand issue, demand for land grows at an exponential rate only within the confines of simple-minded mathematical models (trust me: the population of this planet is not a first-order DE in one variable) and scare headlines in the media. It is now very clear that the world population will stabilize within 40-60 years. Even the most panicky and irrational of the estimates for the world's eventual maximum population permit that entire population to fit on the eastern seaboard of the United States, at the population density of San Francisco, with plenty of farmland left over to feed them. Moreover, the world's food supply is currently growing faster than its population. Overpopulation is bunk.

      --
      "Never let your sense of morals prevent you from doing what is right" -Salvor Hardin
  30. Re:Good Idea but counter gratuitous complexity by Ethan · · Score: 1

    I'm disappointed that this was moderated to Score: 3 (Insightful). I don't like Microsoft software any more than the next guy, but if this were pointed at Linux it'd be labelled FUD and moderated to -1 faster than you could blink.

    I try to stay out of moderation fights, but come on guys...

    (BTW, after the above post gets moderated down, feel free to knock this one down, too, so nobody has to read it :-P)

  31. What's the advantage of XML over EBNF? by Tijn · · Score: 1

    Maybe a stupid remark, but as far as I can see the only difference between using XML (with DTD's) and EBNF (extended Backus-Naur Form for those without a CS degree) is that in the resulting language you have to have tags with instead of any shape you like.
    If you had a standard way of writing down an EBNF specification of a language (essentially the same as writing a DTD), and some way to specify it's meaning (probably the same as that Schema-stuff; haven't really understood that yet) you'd be in the same place, but without the , right?

    So what's new about XML? Just the fact that HTML-viewers (e.g. browsers) can handle it more easily? Or the fact that specifications for a language are slightly easier to write?

    1. Re:What's the advantage of XML over EBNF? by Tijn · · Score: 1

      Sorry, second time around it seems the < and > in 'have to have tags with < and >' got lost...

  32. M$ has a lot of crap to live down (Excel user!) by crovira · · Score: 1

    I use Macs, I have a copy of Excel (I had both arms twisted to buy it.) I recently tried to read a rinky-dink little nothing Excel 97 spread sheet, (nothing special to it at all,) only to be told that NONE of the damn conversions I tried on the client's '97 NT machine worked when I get them back home to my Mac (and this with Excel for NT 4.x.)

    M$ has a lot to answer for and I REFUSE to buy another copy of their crappy products to try to get at something saved in a different crappy version. That's what should be against the law.

    M$ Word is an even worse offender. Have you looked at the crap that's on a Word file. 90% of it is waste. WordPerfect was easy to parse. This stuff is absolute drivel.

    Bill Gates can kiss my [expletive deleted] on his way to oblivion.

    --
    MSBPodcast.com The opinions expressed here are my own. If you don't like 'em... Think up your own stuff.
  33. Re:Absolutely! by ZeroLogic · · Score: 1

    Page layout is a bit more difficult and would probably not work, but I see xml+schema/dtd as a way of crossing tex w/ a wysiwyg editor, where if the dtd were detailed enough, then you would have a large enough grammar to describe everything a wysiwyg editor gives you, in a portable format.

  34. Re:Drawback of XML by IntlHarvester · · Score: 1

    The XML format used for 'Slashboxes' is actually called RSS and is also used by My Netscape. More info can be found at http://www.xmltree.com/ and http://www.scripting.com/. There's actually hundreds of sites using this format.
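
    For anyone who hasn't looked inside one of these feeds, a minimal sketch of reading headlines with Python's standard ElementTree parser. The sample feed is invented and only modelled on the RSS 0.91 layout (channel, item, title, link); real feeds carry more fields.

```python
import xml.etree.ElementTree as ET

# Invented sample feed, modelled on the RSS 0.91 layout.
FEED = """<?xml version="1.0"?>
<rss version="0.91">
  <channel>
    <title>Example Site</title>
    <link>http://example.com/</link>
    <description>Invented sample feed</description>
    <item>
      <title>Can XML Replace Proprietary Document Formats?</title>
      <link>http://example.com/story/1</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.com/story/2</link>
    </item>
  </channel>
</rss>
"""


def headlines(feed_text):
    """Return (title, link) pairs for every item in the feed."""
    root = ET.fromstring(feed_text)
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in root.iter("item")
    ]


for title, link in headlines(FEED):
    print(title, "->", link)
```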

    This is actually a pretty minimal use of XML. The entire Slashdot application itself could be driven by XML as the delivery mechanism. Right now, changing your viewing options requires a roundtrip to the server and a slow PL script.

    An alternative could just dump down the articles and comments to your browser in XML format and then have those comments sorted/filtered/formatted quickly on the client side by XSLT, using either a server- or a user-supplied stylesheet, making Slashdot a much faster and more flexible application.

    This could be done today, but only with IE 5. I suspect when Mozilla matures and more people understand this stuff, we will see more purely XML-to-the-client driven web sites, going beyond the simple 'between sites' B2B-type applications.
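
    A sketch of the filter-and-sort step, under loud assumptions: the comment markup below is invented (it is not Slashdot's real data format), and plain Python with ElementTree stands in for the XSLT stylesheet, since the Python standard library has no XSLT engine. The idea is the same: the data arrives once and the threshold is applied locally.

```python
import xml.etree.ElementTree as ET

# Invented comment markup; not Slashdot's actual format.
COMMENTS = """<comments>
  <comment id="1" score="1" author="alice">First post</comment>
  <comment id="2" score="4" author="bob">A useful reply</comment>
  <comment id="3" score="-1" author="carol">Offtopic</comment>
  <comment id="4" score="2" author="dave">Another view</comment>
</comments>"""


def visible_comments(xml_text, threshold):
    """Keep comments at or above the threshold, highest score first."""
    root = ET.fromstring(xml_text)
    kept = [c for c in root.findall("comment")
            if int(c.get("score")) >= threshold]
    kept.sort(key=lambda c: int(c.get("score")), reverse=True)
    return [(c.get("id"), c.get("author"), c.text) for c in kept]


# Changing the threshold needs no new request for data.
for threshold in (-1, 2):
    print(threshold, visible_comments(COMMENTS, threshold))
```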

    --
    Business. Numbers. Money. People. Computer World.
  35. Re:Tools, and Data vs. Presentation by IntlHarvester · · Score: 1

    I think what you are getting at is the classic "broken" business operation where people are throwing all of their business data into unstructured formats like Word or Excel files, or even paper forms and so on. Of course, you can stay in business with shoddy internal office practices, so there's not much incentive for people to change. Especially when the only programs they are familiar with is MS Office. That is, until you throw the web into the situation.

    They can stumble along doing tedious reformatting into HTML, or somehow use the cumbersome conversion tools MS gives you. But that's never going to solve the real problem. The real answer involves investing in back-end office automation systems and trying to lift the users out of the document muck.

    (I suspect that is what happened to the Ask Slashdot person -- somehow finding a client that wants 3,000 pages of documentation and at the same time insists on using a five year old version of Word. What they should be investing in is a system to help manage large amounts of documentation, before hiring out the tech writing tasks. Part of the answer is a SGML/XML format, but a bigger part is the tools to manage that format and its content.)

    Luckily, Microsoft has invented lots of neat kludges to help preserve the client-side document mess. Word long ago expanded to become a database front end, and it would be fairly easy to write 'templates' for things like press releases that automatically generate an XML file with a home-brew DTD or stick the information into a database. You can also write simple server-side scripts that do the same thing with COM. (In the long run, Office+Exchange will turn into a XML-based 'groupware' system that should provide better back-end access to this data.)

    The big catch here is that 'the solution' requires investment in back-end systems and (gasp) some programming. It's not realistic to expect users to generate XML by hand, and the shrinkwrapped toolset is not here yet. However, anything that you can do to get data off people's C: drives is a big step forward. Good luck.

    --
    Business. Numbers. Money. People. Computer World.
  36. Re:Absolutely! by IntlHarvester · · Score: 1

    Don't forget that 90% of "everyone" is Microsoft Word, and that "partial" understanding of documents doesn't cut it in the real world where interoperability (with Word) is a huge problem.

    Even in a heavily Microsoft shop, we've got Word/Excel interoperability problems with Lotus Notes and with our RDBMS systems.

    A standard WP format would go a long way to solve these things. XHTML + CSS3 might be providing this - we just now need to get the vendors to implement it. The nice thing about a real standard is that it can be compliance tested, and mandated during Government/Corporate procurement.

    --
    Business. Numbers. Money. People. Computer World.
  37. Re:I'll tell you how... by IntlHarvester · · Score: 1

    Well actually the reality is that the guy didn't see the problem coming when he should have. (1) Customer wants 3000 pages of documentation. (2) Customer uses five year old version of Word. (3) Customer refuses to use anything else. The whole situation has Huge Pain In The Ass written all over it.

    So, he could have provided some 'consulting' to help the customers put a real documentation system in place, including the proper tools. (The key word here is "tools", which may or may not include XML/SGML formats.)

    Or, he could have just charged the hell out of them for forcing so many headaches on his plate. I probably would have just done the latter, because they seem pretty clueless.

    --
    Business. Numbers. Money. People. Computer World.
  38. Re:Drawback of XML by IntlHarvester · · Score: 1

    So, are you saying that you could manage slashdot's presentation controls on the client side with HTML and CSS? Maybe... but it would take a fair amount of javascript too.

    --
    Business. Numbers. Money. People. Computer World.
  39. Re:Ever hear of OpenDoc? by DJerman · · Score: 1

    Aaahg! you got me. I haven't installed it since 6.5. After all, once you pay for Oracle, why use anything else?

    --
  40. Re:Ever hear of OpenDoc? by DJerman · · Score: 1

    True, (and it's (+), btw) but Oracle specifies Entry Level compatibility, and it delivers that. The outer join syntax is an optional level. MS SQL, on the other hand, will choke on valid entry level statements :-).

    --
  41. Re:Already Happened by Kyobu · · Score: 1

    I thought that the legal status of trade secrets was that they were unenforceable. That is, if you patent something, then other people may be barred from using it, or charged a royalty, but that if you make something a trade secret, it was unprotectable. For instance, the recipe for Coca-Cola is a trade secret because the company did not want to publish it. Of course, the DMCA or UCITA may have changed this along with everything else.

    --
    Switch the . and the @ to email me.
  42. short answer: by Blue+Lang · · Score: 1

    no. proprietary file formats are not designed to make your life as a geek easier, they are designed to lock people into using, and encouraging other people to use, a single application or suite.

    other answer: yes, duh, of course an open standard for document exchange would be great. there have, in fact, been several. but there's nothing to keep any given company from extending 'xml' to only be 'fully featured' within their product.

    if you don't want sucky proprietary file formats, don't use sucky proprietary tools.

    --
    blue

    --
    i browse at -1 because they're funnier than you are.
  43. Re:XML does everything - whatever. by Fat+Cow · · Score: 1

    msxml bastardisation? isn't it the only implementation of xslt integrated into a browser at the moment? certainly the mozilla support isn't there yet.

    if this is true, then you should be blaming all the other implementors for not keeping up and supporting it, rather than blame microsoft, who are the only company even to complete a minimal implementation.

    don't complain, don't moan, do something about it. and smile :o)

    --
    stay frosty and alert
  44. Other factors by mikek · · Score: 1

    What I'm wondering is what effect this standard would have on LaTeX, which is already an open document standard that also uses markup tags. Assuming that the XML standard is designed well, people could compose all of their documents in a text editor and we would no longer need other formats.

    The idea sounds neat, though, because word processors would become like web browsers that all try to render the same code into the same format. It could even become a standard web publishing format, so that creating a word document would be the same as making a web page. This could eventually replace html. It's about time for that anyway.

  45. SVG is working? by Photon+Ghoul · · Score: 1

    SVG seems like it's quite a ways off from being a common image format. Does anyone have examples of this?

  46. Re:Drawbacks by AmirS · · Score: 1

    > 1) The file size would be larger by quite a bit.

    Would this really be the case? I'm thinking of a standard MS Word file, which seems pretty bloated anyway. At least an XML file would have some linear relationship to the size of the document it stores.

  47. Re:Already Happened by Shadowlion · · Score: 1

    So we're to praise them because they implemented an unfinished spec?

    Out here in the real world, implementing an unfinished design is considered a *bad* thing, precisely because it is unfinished and could change at any given moment - leaving said feature working incorrectly.

  48. Here's a thought by Hangtime · · Score: 1

    /me gets up on the soapbox

    If you're working on a 3000+ page manuscript and getting paid for it, wouldn't it have been smart to find out what version of Word they were using before starting the project? I mean bitch, whine, and moan at Microsoft all you want, but it's like taking a Photoshop file to my printer instead of a TIF or EPS. While they're all graphics files, they don't produce the same results and my printer can't use the Photoshop file to print with anyway. So when you say, I'm mad at MS because they changed the format; I say, don't bitch because you didn't do your job. You didn't ask enough questions as a consultant because if you had, you would have found out what version they were running. BTW, Word does suck for the most part when laying out text and graphics; that's why I use Pagemaker to do it!

    /me comes down from the soap box

    Hangtime

  49. Re:Learning XML, XHTML by Vagary · · Score: 1

    XHTML is (mostly?) backwards compatible because it's basically just well-formed HTML. If you're already using CSS then writing XHTML rather than HTML won't do much to cramp your style. If you're not: you should be! Using XHTML is like tabbing your code -- there's no reason not to and it might help you later on. Download HTML tidy to help you along the way.

    I must say I feel the same way about XML. I've written DTDs and linked CSSs such that it displays nicely in IE5, but I'm not sure where this is leading. What we need is a database of DTDs so that instead of writing my own I'll try and make something that is compatible with someone else's. Obviously for a major project I can consistently use a DTD internally, but for just random web docs there should be an Open Content repository.

    And yes, I know about XML.ORG's list, but they're hardly RFCs.

  50. Re:Forward compatibility by Kaa · · Score: 1

    It is perfectly possible to define a file format that is both forward and backward compatible.

    That depends on what kind of information you want to store. It's easy to do for bitmaps, since what you have to store is well-defined and limited. It's practically impossible to do for word processing files, since you cannot know what additional complexity will be added to the format later. To give a crude example, if a word processor v1.0 does not understand footnotes, and v2.0 does, it's very hard to make a v1.0 program deal intelligently with a footnoted document. I don't really see a way around this.
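
    The footnote example can be made concrete with a short sketch. Everything here is hypothetical: the element names, the "v1" reader, and its fallback rule. It shows the best a v1.0 reader can plausibly do with a tagged format, which is to keep the unknown element's text and mark it, without knowing how to lay it out as a footnote.

```python
import xml.etree.ElementTree as ET

# A document written by the hypothetical v2.0, which knows footnotes.
V2_DOC = (
    "<doc><para>Plain text"
    "<footnote>See appendix B.</footnote>"
    " continues here.</para></doc>"
)

# The only elements the hypothetical v1.0 reader understands.
KNOWN_V1 = {"doc", "para"}


def render_v1(elem):
    """Flatten an element to text, keeping unknown elements inline."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag in KNOWN_V1:
            parts.append(render_v1(child))
        else:
            # Unknown element: keep its text, flagged, so nothing is
            # silently dropped even though its meaning is lost.
            parts.append("[" + child.tag + ": " + render_v1(child) + "]")
        parts.append(child.tail or "")
    return "".join(parts)


print(render_v1(ET.fromstring(V2_DOC)))
```

    The content survives, but the behaviour (numbering, placement at the foot of the page) does not, which is the limitation being described.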

    Kaa

    --

    Kaa
    Kaa's Law: In any sufficiently large group of people most are idiots.
  51. Re:Solving the wrong problem by MikeWarren · · Score: 1
    You say:
    The real issue is Microsoft's mis-prioritization of marketshare and profits over the usefulness of the software to the end user.

    This isn't Microsoft's fault; they are a company in a capitalist system. Participants in a capitalist economy are rewarded based solely on their profit, not usefulness or innovations. I don't pretend it's good, but blaming Microsoft for maximizing their profit is just dumb.

    --
    Mike Warren
  52. Re:New file formats are not new! by sohp · · Score: 1

    I gotta give props to this one. Did you use "exactly the same set of operating system, software, fonts, video drivers, printer drivers, paper and ink cartridges", or exactly how was it "rendered into rubbish" with Word95? Let's be charitable and pretend you and the customer had used the same version of Word ... what fonts did you select? What colors? What screen (or printer) resolution did you write to? Was it rubbish visually or printed (Never mind WYSIWYG!)? This is the reason real professional printers have settled on things like the Pantone color matching system.

    Sorry my 3000-page document writer friend, but from my perspective you're missing some elements of due diligence, and your complaint doesn't cover these omissions.

  53. Solving the wrong problem by FascDot+Killed+My+Pr · · Score: 1

    Sure, an XML file would be easier to edit by hand and could be made to conform to a DTD. Heck, why even use XML? Why not straight ASCII? I don't give a flying fig how my docs are formatted, as long as they are accurate and complete.

    But that's not your problem. There are plenty of proprietary file formats that don't have the problems that Office file formats have. The real issue is Microsoft's mis-prioritization of marketshare and profits over the usefulness of the software to the end user.

    There is no (technical) reason in the world that the various versions of Word have to be mutually incompatible, nor is there any (technical) excuse for its touchiness regarding formatting. But one of these has business value (lock-in and upgrade treadmill effect) and the other has a business reason (feature bloat, poor documentation, etc).

    So go ahead and create an XML word processor file format. But it won't go anywhere until the MS drones adopt it--which won't happen until MS does. Which won't happen until Office is split off from MS (if then).
    --
    Have Exchange users? Want to run Linux? Can't afford OpenMail?

    --
    Linux MAPI Server!
    http://www.openone.com/software/MailOne/
    (Exchange Migration HOWTO coming soon)
  54. I was being sloppy by FascDot+Killed+My+Pr · · Score: 1

    What I probably should have said was:

    The real issue is Microsoft's mis-prioritization of temporary marketshare and short-term profits over the usefulness of the software to the end user (which is the way to maximize profits long-term. cf DOJ v MS).
    --
    Have Exchange users? Want to run Linux? Can't afford OpenMail?

    --
    Linux MAPI Server!
    http://www.openone.com/software/MailOne/
    (Exchange Migration HOWTO coming soon)
  55. depends on which side you look... by MSZ · · Score: 1

    From the technical point of view, this is possible, without any problem. A well thought out standard like XML (or even good old HTML) might allow this.

    However, on Redmond-style business POV, this is a big no-no. What, open the formats and allow competitors to easily exchange data? Who would buy bloatware like Office then?

    The problem is not technical but political, unfortunately. And until M$ is destroyed it will surely remain so.

    --
    The moon is not fully subjugated. I demand a second assault wave preceded by a massive nuclear bombardment.
  56. Re:Bad engineering or deliberate forced upgrades? by MSZ · · Score: 1
    The IFF file format family, originally defined by Electronic Arts, and forming the basis of AIFF, AVI and MOV files does exactly this. MS is aware of this technique (after all it used it in AVI), but either through poor engineering choice or malice chose to impose frequent file format incompatibilities on users of Office.

    GREED is the answer. They are not that stupid, nor evil for evil's sake. They want our $$$ and will do anything, short of sending armed thugs, to get that. Look at the marketing drivel... They'd say it's incompatible because the new version is so new and advanced and whatnot. But from old DOS MSWORD to the newest one, the only new feature I found useful was the addition of a spellchecker for my language. Who the fsck needs a talking paperclip? But no, I will have to upgrade to read the junk my boss or PHB types in the company send around.

    btw: Way back in the past I was using an editor called QR-Tekst. The format was proprietary but mostly textual and simple. A coworker wrote a QRT to Quark Xpress converter without problems, even before the company started giving away the format description. Later, they made the Windoze version and, un-m$-like, the format only had a small change to accommodate font names. It was still readable (some formatting was lost) with the old version. Another fine company that cared for the customer and went out of business for that.
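
    For reference, the IFF technique quoted above boils down to tagged, length-prefixed chunks that an older reader can step over. A minimal sketch follows: the 4-byte ID, 4-byte big-endian length and padding to an even size follow the IFF chunk layout, but the chunk IDs ("TEXT", "FNOT") are hypothetical and the outer FORM container of a real IFF file is left out.

```python
import struct


def make_chunk(chunk_id, payload):
    """Build one chunk: 4-byte ID, big-endian length, data, even padding."""
    data = chunk_id + struct.pack(">I", len(payload)) + payload
    if len(payload) % 2:
        data += b"\x00"
    return data


def read_known_chunks(blob, known_ids):
    """Return chunks with a known ID; skip the rest using their length."""
    found = []
    pos = 0
    while pos + 8 <= len(blob):
        chunk_id = blob[pos:pos + 4]
        (length,) = struct.unpack(">I", blob[pos + 4:pos + 8])
        if chunk_id in known_ids:
            found.append((chunk_id, blob[pos + 8:pos + 8 + length]))
        # Advance past header, payload and the optional pad byte.
        pos += 8 + length + (length % 2)
    return found


# A "new version" file containing a chunk type the old reader never saw.
blob = (
    make_chunk(b"TEXT", b"hello")
    + make_chunk(b"FNOT", b"new in v2")
    + make_chunk(b"TEXT", b"world!")
)

# The "old" reader only knows TEXT and quietly steps over FNOT.
print(read_known_chunks(blob, {b"TEXT"}))
```
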
    --
    The moon is not fully subjugated. I demand a second assault wave preceded by a massive nuclear bombardment.
  57. Re:Already Happened by fReNeTiK · · Score: 1

    Well the option is called save as HTML, not XML. I'm not complaining about the format not being well-formed XML.

    HTML 4.0, on the other hand...

    BTW, someone pointed out this byte article, which I pretty much agree with.

    Basically, it's already a great step to have a format which is more or less human readable. This will make it a lot easier to create third party parsers/conversion tools.

    --
    I strongly believe that trying to be clever is detrimental to your health. -- Linus Torvalds
  58. Re:Already Happened by fReNeTiK · · Score: 1

    Am I missing something here?

    In a word, yes.

    Right beneath the normal save option is a nice looking "save as webpage" button. Use it...

    ...Won't be of much help though because as other people have pointed out already, this feature actually produces an (MS)HTML document with XMLized OLE related info.

    Note that if you have a relatively complex document, Office2K will create a directory with the same name as the file in which it puts various other stuff.

    I just went to the trouble of running a small "saved as webpage"-word2000 doc through the W3C validation service. Result:

    Sorry, this document does not validate as HTML 4.0 Transitional.

    But it is quite a nifty feature if you're running an all Microsoft shop (which I have the displeasure of doing), since it allows people to post stuff on the local intranet without any additional work... I guess Microsoft is quite pleased with this feature :)

    --
    I strongly believe that trying to be clever is detrimental to your health. -- Linus Torvalds
  59. why not use rtf by lomion · · Score: 1

    Why not just use rtf? rtf files can be read/created by most any word processor.

    --
    this space for rent
  60. Drawbacks by revscat · · Score: 1

    This would of course be possible, because it's just the kind of thing XML is designed to handle. If Word stored their documents using one DTD/schema, and you wanted to load it in WordProcessorX, you could just use their stylesheet to convert it over. No problem.
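
    A toy version of that conversion, with heavy caveats: both vocabularies are invented, and a plain tag-renaming table in Python stands in for a real XSLT stylesheet. Actual formats would differ in structure, not just in element names, so treat this as the easy case only.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from one vendor's element names to another's.
TAG_MAP = {"wdoc": "document", "wpara": "p", "wbold": "b"}


def convert(elem):
    """Copy a tree, renaming any element that appears in TAG_MAP."""
    new = ET.Element(TAG_MAP.get(elem.tag, elem.tag), dict(elem.attrib))
    new.text = elem.text
    new.tail = elem.tail
    for child in elem:
        new.append(convert(child))
    return new


source = ET.fromstring(
    "<wdoc><wpara>Some <wbold>bold</wbold> text.</wpara></wdoc>"
)
converted = ET.tostring(convert(source), encoding="unicode")
print(converted)
```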

    Really I only see two (minor) drawbacks to using XML as an end-user document storage mechanism. 1) The file size would be larger by quite a bit. No biggie with today's hard drives. 2) It would be editable by end users. This is a good thing if you know what you're doing, but if you don't and fuck up the well-formedness of the XML document you would be up a creek.
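
    On the second drawback, a program can at least detect a hand-edit that broke well-formedness before trying to use the file. A minimal check with Python's standard parser (the sample documents are made up):

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = "<doc><para>ok</para></doc>"
broken = "<doc><para>closing tag deleted by hand</doc>"

print(is_well_formed(good), is_well_formed(broken))
```

    This only tells you the file is damaged, not how to repair it, so the "up a creek" concern still stands for the user.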

    Having said this, I think that companies will be moving towards using XML as a standard format soon, if not sooner. Microsoft is already using XML pretty extensively to store info in Win2k. It wouldn't surprise me at all if they added a "Save As|XML Document" to the Office suite.

    -Rev.
  61. MS does XML... sort of... by Garg · · Score: 1

    Office 2000 has been widely touted as using XML. Having dealt with converting personally, I can say it "sort of" does. It does use a lot of XML, but it also inserts various proprietary codes of some kind. I managed to kluge up some ways of inserting the codes, just so we could export XML data from a mainframe directly into Excel, but it's definitely not standard. And right now I don't see that MS has any incentive to make it standard. I'm betting BizTalk, that they've been promoting as open, understands (and possibly uses) this crap.

    Garg

    --
    Garg
    Alumnus, Xavier's School for Gifted Youngsters
  62. A common standard by laursen · · Score: 1

    It would be great to have a common standard for documents/etc - no doubt about that. But how long will it take for some company (let's call it M$) to include its own additions and extensions to that standard? Just look at HTML/CSS/Java ...

    1. Re:A common standard by techwatcher · · Score: 1

      Way, way back before many of you were born (1982 or so), I was asked to evaluate hardware and software for my documentation department. I wrote this wonderfully literate paper carefully analyzing word processing (text processing, etc.) at a high level of abstraction. I pointed out that whatever package we chose needed to be able to perform certain tasks (formatting is particularly critical for technical writing, as is searching and editing rather than just writing). Especially, I pointed out, it would be best if the embedded formatting commands, as well as the rest of the document/data file, contained only ASCII characters.

      Much to the dismay of IBM (which lobbied hard with my manager to persuade him to go with the brand-new PC and Wordstar -- which allowed only a single line as header or footer!), we went with ASCII documents. (The software was called LazyWriter, which fit in 32k RAM and allowed 32k data files -- but you could chain them freely -- and had marvellous features I've never seen matched on any other WP. In fact, it even included a terminal emulation feature so we could "read in" any other ASCII file via modem or serial cable, straight into the WP!)

      A few years after that, some of the persons responsible for ATEX (used by publishers) created XyWrite, which also only employs ASCII. I started using XyWrite then, I'm still using it now, and I don't know why I would ever stop using it. Among other things, the package comes with programmable macros, keyboards, menus, etc. In fact, I use XyWrite to code HTML (using stored sets of macros and programs), and for each new Web site client, I write a program using XyWrite's XPL (extended programming language) to convert text for their site directly into their own custom templates.

      Last year a small PR division in a major corporation (hint: their cable service isn't offering NYPD Blue tonight!) asked me to fix their site after a number of "professional Web design" firms had been in using incompatible packages. Using XyWrite on my trusty laptop, I could see all the excess code these packages had dumped into the pages, and make the simple changes these "Web designers" apparently couldn't make. (After one presses the list button too many times, it seems one can't simply erase an item with these fancy packages?!)

      When I write technical documents (or any other client) for end-users to keep, I work in XyWrite first, then convert. I don't know how many conversion packages currently exist, but XWORD and Word for Word are two I've used in the past.

      Btw, I already know I will be writing a simple XPL program to convert older Web pages to the new XML/HTML standard, pretty soon!

    2. Re:A common standard by M$+Mole · · Score: 1

      Not long...any large software company out there would immediately start adding some proprietary extensions to work better with their software. Until the inclusion of Mozilla in their browser (the alpha is out for those of you living in a cave), Netscape used proprietary HTML tags...which made life all kinds of fun for web developers such as myself. On the subject of XML, it actually is a wonderful middle-man for conversion between different formats. And it is true that MS currently uses XML in their Office 2000 suite to allow conversions of documents into different forms. I recently started using XML for various projects, and while it can be a pain to learn, it is incredibly useful. The only drawback I've found is finding good tutorials and whatnot, since every tutorial has to be written for a specific function working with a specific other language (e.g. Using XML with PERL or XML with ASP, etc.).

      --
      Karma: Non-existant. Due mostly to the fact that you smell funny and nobody likes you.
    3. Re:A common standard by King+Babar · · Score: 2
      That is something I've never understood. The only reason Netscape could ruin the web with the proprietary tags was because so-called web developers embraced the proprietary tags and used them all over.

      If you didn't like the proprietary tags, why did you use them?

      Well, I think there are two answers here. The first answer is that some of the proprietary tags were disgustingly useful where appropriate. Remember that things like tables were first seen as proprietary extensions before they were ever blessed by the W3C. And there were a few other things like "center" that looked easier to use than waiting for somebody (anybody!) to come up with decent style sheets. (No, it wasn't a new technology, but I would claim that general style sheets are hard for on-the-CRT display formats.)

      The second part of the answer was that it wasn't the specialized tags so much that ruined things (ignoring "blink" and "font" for the moment) but the real dorkiness of relying on parsing quirks in html to get layout effects. You know, bulletless list elements to get indents and such.

      --

      Babar

    4. Re:A common standard by King+Babar · · Score: 2
      Remember that things like tables were first seen as proprietary extensions before they were ever blessed by the W3C.

      Wrong. Work on the table specification for HTML started in 1993, a year before Netscape was founded. Netscape wasn't even one of the first 3 browsers to implement tables. However, Netscape was the first to not follow the proposal, and invent something much poorer.

      Hmm...you've got a point there. Thanks for whacking me upside the head. :-\

      I now can't remember whether or not Mosaic had tables; Emacs-w3 did, but I'm not sure when. I'm presuming Arena did, although I never did use Arena very much.

      In my (limited) defense, though, I did say "blessed" by the W3C. Do correct me if I'm wrong, (please! I want to remember this stuff right!) but tables weren't in the HTML2.0 spec (which was RFC 1866). Whatever else you have to say good or bad about the HTML+ (later HTML3) spec, it ended up being canned. Worse, it languished for what seemed like forever at the time, and that's where I remember the floodgates opening up wide.

      No, that doesn't excuse the crappiness that Netscape unleashed; I do now remember how much it pissed me off. Thanks for the memories. :-/

      Oh yes: you're completely right about "center" and what I remember as the Great Alignment War. Dunno where my brain was when I wrote the post you were responding to...

      --

      Babar

    5. Re:A common standard by Abigail · · Score: 2
      Netscape used proprietary HTML tags...which made life all kinds of fun for web developers such as myself.

      That is something I've never understood. The only reason Netscape could ruin the web with the proprietary tags was because so-called web developers embraced the proprietary tags and used them all over.

      If you didn't like the proprietary tags, why did you use them?

      -- Abigail

    6. Re:A common standard by Abigail · · Score: 2
      Remember that things like tables were first seen as proprietary extensions before they were ever blessed by the W3C.

      Wrong. Work on the table specification for HTML started in 1993, a year before Netscape was founded. Netscape wasn't even one of the first 3 browsers to implement tables. However, Netscape was the first to not follow the proposal, and invent something much poorer.

      And there were a few other things like "center" that looked easier to use than waiting for somebody (anybody!) to come up with decent style sheets.

      I worked with browsers that were able to use stylesheets before Netscape came with center (Arena, Emacs-WWW). Furthermore, before Netscape came with center, much work was done on HTML 3.0, which had the align attribute (and DIV). But no, Netscape didn't look at the draft, they invented the less flexible, and proprietary, center. Microsoft techniques.

      ...but the real dorkiness of relying on parsing quirks in html to get layout effects.

      Well, you can't blame browser vendors for the fact that so-called web-developers had no clue what the web was about.

      -- Abigail

  63. SGML ? by BigTom · · Score: 1

    There will never be a single document standard. SGML has been around for 15+ years. It has tool support (from hard core publishing tools to WP 8), it is a true standard, it has government and industry support (from the engineering people who really care about documentation).

    Why don't we use it? Because the software industry has NIH syndrome gone mad. Why use an existing standard format when you can roll your own format and maybe it'll become the new standard, you'll control it and make a fortune!

    Tom

  64. I tried for a simple explaination, if you insist.. by BoLean · · Score: 1

    META-MARKUP LANGUAGE, schmeta markup language. My examples would have given you a better indication of what I was talking about, but since this wonderful forum parses out unrecognized or disallowed tags they did not appear.

    XML is a set of tags or identifiers that allows a parser to sort out a flat data file into discrete elements (or species as my Chem prof was fond of using). The Babel problem still boils down to the fact that the only way we as users can make use of this technology is if there are definitive standards. It matters little if two companies develop their own Micro XML implementation. They don't need any help for that. "Hey IBM, here comes a file. The first 10 characters on each line is the person's name and from position 11-40 is their description." Step one should have been to establish a good generic XML. After that then third party industry groups would have had a standard to conform to. That way even something as simple as:
    :
    (define database databasename)(define table tablename, keys fieldname, tableclass tableclasstype fieldlist field1.field2.field3...)
    :
    could be understood and viewed by any basic XML reader even if they didn't have the industry or company specific TableClass method built into the parser. The W3C dropped the ball years ago.
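
    Since the forum ate the original tags, here is a rough Python sketch of the idea instead: a generic reader walking markup it has never seen before. The element and attribute names in SAMPLE are made up for this illustration and are not a reconstruction of the stripped example; outline is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; these element and attribute names are invented
# for illustration and do not come from the original (stripped) example.
SAMPLE = """
<database name="databasename">
  <table name="tablename" key="field1" class="tableclasstype">
    <field>field1</field>
    <field>field2</field>
    <field>field3</field>
  </table>
</database>
"""

def outline(element, depth=0):
    """Return one indented line per element, using no schema knowledge."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(SAMPLE.strip())
print("\n".join(outline(root)))
```

    The reader can show the nesting without knowing what a "table class" means, which is about as far as a generic parser gets without an agreed vocabulary.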

    The true promise of XML is to provide a data interchange format that allows far more detail about data structures and document/graphics formatting than current "MARKUP LANGUAGEs" like HTML, SGML, SVG provide.

  65. DTDs are not that important by p3d0 · · Score: 1

    You forget about the benefit of a DTD.

    No I don't. :-)

    DTDs are not that important in the grand scheme of things. They specify the valid structure of a type of XML document. Period. They do no more than any other kind of data format specification. In fact, they do less than most format specs, because they do not specify any semantics. They don't fundamentally change the world.

    I'll be the first to say that XML is very convenient. I chose it for a project last summer because it fit the bill quite well. But the question was whether it will replace file formats, and the answer is no, because XML, even with DTDs, does not do what a bona fide data format does; namely, specify the semantics of the data.

    --
    Patrick Doyle
    I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
    1. Re:DTDs are not that important by p3d0 · · Score: 1

      You make a good point that DTDs are useful for maintaining the proper format of a document. We still disagree, however, about the semantics.

      DTD's do indeed have their semantics documented, indeed most of the more common ones have their semantics documented MUCH more extensively than ANY proprietary format out there.

      Perhaps, but this is not a feature of XML. Nothing in XML mandates (or even assists in) providing semantics for each construct.

      So, again, I feel that this doesn't change the fact that XML can't replace proprietary formats. As I said originally, it's XML + DTDs + semantics.

      --
      Patrick Doyle
      I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
    2. Re:DTDs are not that important by X · · Score: 3

      Man, all those people using SGML must be imagining these benefits then!

      Seriously, having a DTD is VERY helpful, because it allows you to edit a document using ANY SGML (or nowadays XML) compliant editor and ensure that you will be producing something which can be loaded back into the original editor 100% cleanly (and without blowing away half of the structure that the original editor had set up). This is the specific functionality that the question was referring to.

      DTD's do indeed have their semantics documented, indeed most of the more common ones have their semantics documented MUCH more extensively than ANY proprietary format out there. Easy and obvious examples would be HTML and XHTML, indeed just about anything produced by the W3C. Better examples would include DocBook, TEI, MIL-STD-38784, and ISO 12083. I would argue that these are all documented much more extensively than most proprietary file formats. Certainly, being proprietary doesn't mean that the file format defines semantics any better than something with a DTD.

      Sure the semantics aren't enforced by the DTD, but they can be enforced by the end user, something which is typically not true when you're editing a proprietary format using a foreign tool.

      This kind of stuff is done by the US government on a daily basis.

      --
      sigs are a waste of space
  66. Yebbut by p3d0 · · Score: 1

    Personally, I think the government could do much to open up the playing field by making it so all documents sent to the government had to be in some openly documented file format (XML based if you like to pretend that XML solves all problems, or just some random binary format or what not.)

    Trust me, don't let the government decide on that format! I have seen government-designed data formats, and in the process of trying to think of every possible piece of data you might want to store, they forget to take a step back and look for a simple underlying abstraction.

    Of course, my opinion is biased by my experience. My apologies to the government data format designers reading this.

    --
    Patrick Doyle
    I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
  67. Re:Nope by p3d0 · · Score: 1

    If Word21 needs to add additional elements or attributes to support new features, they simply create new tags.

    Sure, and they could presumably do that in their own format. But they don't, so why should we expect them to do it with XML?

    "XML can't replace proprietary document formats. That's like asking if ASCII could replace proprietary document formats."

    I must not be understanding what you mean when you're referring to ASCII since simple texts replace proprietary document formats all the time. TeX, CSV, RTF, HTML, PS, all are human readable text files.


    You just said it yourself. It's not ASCII that replaces these things, it's TeX, CSV, etc. Those are the file formats. ASCII is just a character encoding scheme.
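
    A small Python sketch of that distinction, using made-up sample bytes: ASCII only gets you from bytes to characters, and it takes a format (CSV here, via the standard csv module) to get from characters to structure.

```python
import csv
import io

# Invented sample data for illustration.
raw = b"name,role\nPauly,writer\n"

# Step 1: the character encoding turns bytes into characters, nothing more.
text = raw.decode("ascii")

# Step 2: the file format gives those characters a structure.
rows = list(csv.reader(io.StringIO(text)))

print(text.splitlines())
print(rows)
```

    The same decoded text could just as well be fed to a TeX or RTF reader; ASCII itself has no opinion on which one is right.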

    --
    Patrick Doyle
    I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
  68. WordPerfect doesn't have that problem by Reziac · · Score: 1

    ... because it has had a STABLE DOCUMENT FORMAT for 7 YEARS now: WPDOS 6.1, and WPWin versions 6.1, 7.0, 8.0, and 9.0 all use the SAME default file format (presumably WP8 for linux uses the same format too; not sure what the final Mac version uses, but I can tell you it's compatible with WP5.1 DOS/Win). No need for everyone in a document editing chain to have the same version.. or even the same OS!!

    Word's lack of compatibility between versions was, of course, a marketing trick: an easy way to force everyone to upgrade to the latest version to maintain document compatibility.

    --
    ~REZ~ #43301. Who'd fake being me anyway?
  69. You've had better trolls by georgeha · · Score: 1

    mr. anonymous IT consultant trolling for one of the top names in the software industry.

    George

  70. Re:An open question by eschurma · · Score: 1

    Why do we tolerate it? That is a VERY easy question. Not changing the formats over time would mean being stuck with the same feature set forever. Want more features (yes, you do, no matter what you say-columns didn't use to be possible, then were, now multiwidth columns are possible), need a more powerful format (or dtd or schema or whatever).

  71. Re:Absolutely! by TheGS · · Score: 1

    Of course the thing is that we have to get a single DTD supported as a standard.
    Someone else pointed out that MSOffice 2K documents may be "XML-based", but I don't think their DTD is public for other vendors to support.
    For graphics, we now have the public standard of SVG, which seems like it might get a really high adoption rate.

  72. Use a real text formatter by franknagy · · Score: 1

    A 3000+ Word document! *Shudder*

    Have you ever considered using something like TeX for stuff like that? That's the choice around here for scientific papers.

    --
    Dr. Frank J. Nagy Fermilab Computing Division Authentication and Directory Services Group
  73. In a word, No by ikekrull · · Score: 1

    XML is just a structured text file.

    There are no standards on what that structure should be, past definite rules on how the structure is described, and no way to automagically convert one type of XML construct to another.

    Believing that XML will unify file formats is akin to believing that SQL will make data held in one database magically interoperate with data from another...

    Regardless of how the information is stored, you still need to know what the information is and in what context to use it.

    XML does not address this at all. You could quite happily embed an entire binary word file in a CDATA section of an XML file, call it well-formed, say your file format is XML-based, and have it pass through any XML validator in existence.

    The data would still be as useless as if it was a binary word file.
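
    A quick Python sketch of that scenario, under two assumptions of mine: the payload is a few stand-in bytes rather than a real Word file, and it is base64-encoded instead of sitting in a literal CDATA section, because raw control bytes are not legal XML characters. The element names are hypothetical.

```python
import base64
import xml.etree.ElementTree as ET

# Stand-in bytes; a real binary document would be read from disk instead.
opaque = bytes([0xD0, 0xCF, 0x11, 0xE0, 0x00, 0x01, 0x02, 0x03])

# Wrap the opaque payload in a (hypothetical) XML envelope.
root = ET.Element("document")
payload = ET.SubElement(root, "payload", encoding="base64")
payload.text = base64.b64encode(opaque).decode("ascii")

xml_text = ET.tostring(root, encoding="unicode")

# The result parses cleanly, yet the parser learns nothing about the content.
parsed = ET.fromstring(xml_text)
recovered = base64.b64decode(parsed.find("payload").text)

print(xml_text)
print(recovered == opaque)
```

    The file is "XML-based" in every checkable sense, and a reader without the original application is exactly as stuck as before.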

    --
    I gots ta ding a ding dang my dang a long ling long
  74. I want to see it all! Key = DOM by EverCode · · Score: 1

    XML is not the only thing that can be cool. There is also DOM (Document Object Model), the way XML can be accessed and manipulated. There are different versions of DOM, and the functions keep progressing.

    To tell you the truth, it really does not matter if a document is XML to be DOM compatible. You can make an SQL database DOM compatible. You can make a filesystem DOM compatible. You can make a binary Word 2004 file DOM compatible.

    If companies and organizations make everything DOM compatible, it won't matter very much how the actual data is stored. Just as long as it can be accessed using the same functions.
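
    For anyone who hasn't used it, here is roughly what "the same functions" looks like in Python's xml.dom.minidom. This sketch only shows DOM over an XML string (the memo markup is invented); a DOM layer over a database or filesystem would be a separate implementation exposing the same calls.

```python
from xml.dom.minidom import parseString

# Invented sample document for illustration.
dom = parseString("<memo><to>Client</to><body>Draft attached.</body></memo>")

# These calls belong to the DOM interface, not to the storage format.
to_node = dom.getElementsByTagName("to")[0]
recipient = to_node.firstChild.data   # read the text node
to_node.firstChild.data = "Editor"    # modify it in place

print(recipient)
print(dom.documentElement.toxml())
```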

    Imagine if there was a DOM compatible filesystem... Hell, a server could use the filesystem for a database instead of something else. Can you imagine the efficiency improvements! Scot Hacker wrote a script to use the BeOS filesystem in a similar manner. He has a website running off of it, www.betips.net. I think he has the site running off a cable modem (or used to), so please don't slashdot it, just take my word for it, or visit it later.

    Anyway, XML is only one option to store data. XML has a lot of potential, and I think it is one of the best routes to go, but it should not be the only choice.

    Also, I think that Word Processors, Spreadsheets, Web Page editors, etc, are all going to become synonymous over the next few years. It is likely that they will use XML and XHTML (XML namespace).

    Also, please support Mozilla (mozilla.org), the organization has potential to expand beyond making a new web browser suite, as maybe the next major project could be implementing better XML support into *nix.

    Thanks,

    EC


    "...we are moving toward a Web-centric stage and our dear PC will be one of

    --

    EverCode
  75. Re:Why not even html by GooseKirk · · Score: 1

    For the most part, I agree with this. To be honest, I haven't read enough about XML to get excited about it. From what I've picked up here and there, I understand two things about XML:

    1) It's the Next Big Hot Cool Thing That Everyone's Doing So You Should Too
    2) It's just swell stuff if you want to read web pages on your cell phone.

    So, in a nutshell, it's kinda like "push" technology, only slightly more useful because SUV drivers who talk on their cell phone will start browsing on their phone while driving and eliminate themselves from the gene pool.

    HTML certainly has its limits, but with a good GUI editor, for most people, I don't understand why HTML wouldn't work just fine. Would someone post a link or something explaining in hypeless terms just exactly why XML is da bomb?

  76. LaTeX GUI by sh_mmer · · Score: 1


    there's a really good LaTeX front end called Scientific Workplace, though it's for windoze and it's not free. but it's really good--infinitely better than any version of word ever. in fact, i'd go so far as to say that it's a pleasure to work with--really! the downside is that i don't actually know LaTeX; want to learn it tho...

    cheers,

    sh_

    --
    Interested in learning Chinese or Japanese? check out Chinese/Japanese-English Dictiona
    1. Re:LaTeX GUI by mouseman · · Score: 2
      there's a really good LaTeX front end called Scientific Workplace, though it's for windoze and it's not free.
      there's a really good LaTeX front end called LyX, that runs under Linux, and is free (beer and speech). There's also a version written for KDE, called klyx, which i haven't used.
  77. Probably not, unless... by ffatTony · · Score: 1

    XML Documents are quite large because all that extra relationship information is contained within the document as well. Because XML is just the content and not the layout you'll need that as well. XML is great because it is a really convenient format and with XSLT is easily transformable into HTML, PDF, etc, but most people would rather have a nicely compressed binary rather than a text XML file and Stylesheet.

    I think XML will be invaluable for webdesign, for most applications config files ( MS has already mentioned going this route and I've heard mention of it on the gtk/gnome mailing lists ), and a few other purposes, but I think the small size and ease of use of a binary format will always be desirable.
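
    On the size question, a rough Python sketch with the standard zlib module. The markup below is invented and deliberately repetitive, so it flatters the compressor; real documents will compress less dramatically.

```python
import zlib

# Hypothetical, deliberately repetitive markup standing in for a long document.
xml_text = "<doc>" + "<para style='body'>Some text.</para>" * 1000 + "</doc>"
raw = xml_text.encode("utf-8")

# Generic compression applied to the text form.
packed = zlib.compress(raw)

print(len(raw), len(packed))
```

    So "text XML" and "nicely compressed" are not mutually exclusive, though you do give up opening the file in a plain editor.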

  78. Yes, It Can by zx · · Score: 1

    If things like conglomerate finished. Just see their preview and I think it's so cool. So rather than bastarding MS Word take a lot at their code and let's hack it together OKAY?

  79. Re:Yes, It Can (Correction) by zx · · Score: 1

    So rather than bastarding MS Word, take a look at their code and let's hack it together OKAY?

    I wish I could edit my own post.....:(

  80. It is possible but not likely by smallpaul · · Score: 1

    Yes, there could be a truly interoperable "rich text format" used by every word processor and based upon XML.

    It hasn't been done for a variety of reasons:

    • developing it would be a non-trivial exercise.
    • most people in the "XML world" are more interested in generic markup than in WYSIWYG word processor formats. This means that the presentation is dictated by a complex stylesheet which adds a level of complexity. This makes the problem even harder to solve (we've been working on it for more than ten years!) Some people would actively oppose a common word processing format based on the principle of direct WYSIWYG application of style.
    • HTML+CSS is "good enough" for some subset of people which takes away some of the impetus.
    • Microsoft would probably not go along. This means that 90% of the world's word processor users would still be left behind.
    • The competing word processor vendors are so small and weak that the development would actually incur a sizeable cost.
    • Word processors internally have very different ways of thinking about the organization of text and formatting. A common data format might inadvertently lead to a common "look and feel" which would hurt vendors who charge extra for their extra features.
    • The feature set of word processors is not entirely congruent so you get into a typical "Lowest Common Denominator" versus "Extensible Pseudo-Standard" situation.
    • There are ostensibly already a few (failed) international standards in this area like ODA.

    Of course people in the open source software world are motivated differently than those in the software sales business so if a standard was to arise it would probably arise out of cooperation among those groups. Still, it would take a lot of hard work and cooperation.

    I once proposed that the existing XSL Formatting Objects could be leveraged in this way, but was shouted down by those who felt that any move away from generic markup is inherently a step backwards.

  81. Re:Why not use RTF as a standard? by kiatoa · · Score: 1

    I believe that RTF has problems and is not a stable standard. I.e. it changes with each release of Word. MIF (the FrameMaker interchange Format) on the other hand is very easy to parse and is very stable. I have written many scripts to manipulate MIF and find it to be a very easy to understand format. The biggest downside to MIF IMO is how verbose it is. I suspect this is a problem with any human readable document format however.

    --
    90% of the wealth is in 2% of the pockets. Bummer to be in the majority.
  82. Re:Nope by samantha · · Score: 1

    Huh? XML is meant to be used to store information in a non-proprietary format with tags and such to facilitate use of the information by more programs than just a single one. That is a large part of its power.

    It bothers me greatly that MS and other companies take my personal data in WP documents, addresses and so on and encode it in a way that I can't get at with anything but their tools. That is theft. I don't agree to be locked into a particular vendor just because I may have used their tool[s].

    Even MS is giving lip service to replacing their proprietary formats with XML.

  83. Re:DocBookX and CVS by epine · · Score: 1


    When people talk casually about XML they usually mean XML under a domain-appropriate DTD even if they don't realize it. Suppose I said I was going to write a book in ASCII and everyone jumped in to warn me that it is possible to encode proprietary content in ASCII. Are those comments all that helpful?

    When I researched this issue I came up with the same answer: XML + the DocBook DTD. I didn't try XMetaL because I wanted something that ran under BSD as well as NT. So I'm using the PSGML plug-ins for emacs. These are helpful, but not entirely adequate.

    I also asked around about XML transformers. Jade and DSSSL are reported to be reliable if you don't need to mess around with the style sheets. DSSSL style sheets are written in a subset of the Scheme language, which is a LISP derivative.

    Some dyed-in-the-wool SGML people regard DSSSL as the densest, most impenetrable jungle going. Pitch out SGML and then adopt SGML's big scary brother instead. Well, I was warned.

    In the pure XML sphere you have the immature XSLT technologies. These are supposed to be easier than DSSSL. You could spend a year of your life really getting up to speed with XSLT.

    It ends up not being a choice of technology and more a choice of civilization. MS has zero sales base on planet Vulcan and Spock really *likes* DSSSL. Convinced? It works for me.

  84. Re:New file formats are not new! by epine · · Score: 1

    The correct leg work for a client using Win95 is not to bend at both the knees, but to extend one foot toward their lame backsides.

    Doing all the compatibility leg work sounds very client centric in theory, but it leads to a chocolate chip cookie with no chocolate in it.

    Writing a book which communicates effectively is a very hard task.

    I recommend that all authors put in the special effort to figure out how to set up two Win95 workstations to produce identical output under MS Office. It will make the writing process *so* much easier and the clients of your client will be very impressed that all the ligatures have come out perfectly. So impressed that they'll believe everything you say even if you didn't get your facts straight.

  85. Re:XML is the way we should go! by supakoo · · Score: 1

    If everyone used XML for their formatting ...

    Huh? Have you used XML? XML has very, very little to do with formatting (I suppose it depends on your implementation) and everything to do with document structure. As long as users just clickey-clickey their bold & italics, bump indents with a button, and make headlines with font changes rather than use a style consistently, this is a problem that XML won't solve. Even a 'save as XML' option. To say XML is the panacea is to not understand the problem. Even a tight XML DTD can be implemented crappily.
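
    A tiny Python sketch of the difference, with two invented fragments: one marks up what the text is, the other only how it should look. The element names and the titles helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Same visible content, marked up two ways (both fragments are made up).
structural = "<doc><title>Setup</title><para>Plug it in.</para></doc>"
presentational = "<doc><b><big>Setup</big></b><br/>Plug it in.</doc>"

def titles(text):
    """Collect the text of every <title> element in the document."""
    return [node.text for node in ET.fromstring(text).iter("title")]

print(titles(structural))      # ['Setup']
print(titles(presentational))  # []
```

    Both files are perfectly good XML; only the first one lets a program find the headline, and no 'save as XML' button can add that information after the fact.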

    --
    "Use the force, fool!" -- What Mr. T might say if he was a Jedi Knight.
  86. Re:I'll tell you how... by chthonic · · Score: 1

    Yes, you do find a way to deal with it. The complaint, however, is that the reality of working with most MS Products, is that you must have a copy of the exact version of the product that your client is using, and this must be the tool that you use. If you don't do this, the aggravation is such that the work isn't worth taking. Even if you do this, you will still probably run into problems if the document is non-trivial.

  87. Re:Hey genius... by chthonic · · Score: 1

    Yes, but even a GIF with its proprietary compression and licensing is well-documented, and picking appropriate tools for the job is easy. With Word (and most MS documents), you need the same version as your client, and you must use that to produce the document, or you're screwed.

  88. Gnumeric by bradstew · · Score: 1

    The Gnumeric spreadsheet from the Gnome project uses XML as its native file format. I think theirs is a project to watch. They also have as a goal to import Excel as perfectly as possible.

  89. I use XML quite often by xemacs · · Score: 1
    and the nice thing is that you focus on content!
    after my content is okay, I start building the look of my doc with a css stylesheet.
    The result is that any XML browser can render it the same way.
    My tools are:

    Editor: Morphon XML editor [java]

    Browser: Mozilla

  90. How about some remedy? by gh3 · · Score: 1

    Pity there wasn't anything in the DOJ's proposal to require MS to completely document all of their past and future proprietary document formats. It may not eliminate their "monopoly", but it'd go a long way towards making it irrelevant.

  91. Definitely a Biased rant from co-inventor of XML by eshaft · · Score: 1

    You gotta admit, he's got a point - I'd never ever attempt to deliver some weird customized tag format to my customers, I mean, how would you even explain it to people who have their email forwarded to their secretaries because they don't know how to print it out?!
    XML and Style sheets are great for web design and the like, where PEOPLE DON'T NEED TO SEE THE SOURCE CODE, but thinking that Joe Lawnmower and Sally Broomstick are going to sit down and tweak the XML formatting (or whatever) of the frontpage of the New York Times over breakfast is ludicrous.

    --
    lf.o
  92. Cash by deefer · · Score: 1
    Is the problem. From a user's point of view, a portable, extendable file format is a total winner. You can take your documents to your clients and feel confident that they will display them as you had formatted them.
    The problem is, too many vendors hide behind the proprietary formats to cling to their market share. It will cost you too much to change file formats after a time; it _is_ possible to reformat garbled documents into Word Perfect or Star Office, but the vendors don't want you to know that, nor does your PHB want to see you waste time moving stuff across. And vendors play on that.
    I wonder in the not so distant future - you'll get nailed by the local tax authority for some auditing or something, and the spreadsheet you need to prove your diligence comes up as "Error: file format not valid" for that 6 year old file you forgot to update...

    Strong data typing is for those with weak minds.


  93. Re:Ever hear of OpenDoc? by JonK · · Score: 1

    Such as? I'm intrigued - I've not managed to get it to barf on ISO-compliant SQL since 6.5
    --
    Cheers

    Jon
  94. XML only benefits the privileged few by Rares+Marian · · Score: 1

    Oh so a shop has to spend money on a billion packages, AND waste time converting documents WHEN they should be WORKING on CUSTOMERS' NEEDS?

    HE has lost MONEY because of PROPRIETARY formats, has been needlessly DELAYED, and NO IT WAS NOT HIS fault.

    I am seriously tired of the whiners who can't keep up when there is a CLEANER solution.

    --
    The message on the other side of this sig is false.
    1. Re:XML only benefits the privileged few by Malc · · Score: 2

      It's nothing to do with proprietary formats. The docs from one Microsoft product were being used in another. The format was effectively an open format for the people who created the two applications.

      No, the problem was caused by new functionality and changes in existing functionality. Even if the doc was based on an open format, that would not help with how Word renders the document.

      Before agreeing to any contracts it should have been clear what exactly was being delivered to his clients. If they had agreed to Word95 then he deserved to lose money. If they had agreed to Word97, then it was his client's fault. Somebody screwed up on the business side from the very beginning - it's not a technical issue.

      Don't forget, HTML is an open standard, but no two browsers, open source or not, render pages identically.

  95. Beast with two backs by Rares+Marian · · Score: 1

    Surely you jest.

    --
    The message on the other side of this sig is false.
  96. Conversion causes loss of formatting by Rares+Marian · · Score: 1

    The Word97 to Word95 converter was broken for several months. I had to use it at school; it was ridiculous. You morons see a pretty screen and you think the code under the hood is holy.

    --
    The message on the other side of this sig is false.
  97. I don't know... let's ask Sun by supabeast! · · Score: 1

    "is it possible that XML-based standard file formats suitable for word processor, spreadsheets, etc. could be created that forever do away with proprietary binary formats and inadequate file conversion routines?"

    I bloody well hope so... Perhaps Sun should look into doing this in the next iteration of StarOffice. Sure M$ would try to destroy it, but they do that with everything now...

  98. Bad engineering or deliberate forced upgrades? by epeus · · Score: 1

    It is perfectly possible to define a file format that is both forward and backward compatible.

    The IFF file format family, originally defined by Electronic Arts and forming the basis of AIFF, AVI and MOV files, does exactly this.

    MS is aware of this technique (after all it used it in AVI), but either through poor engineering choice or malice chose to impose frequent file format incompatibilities on users of Office.

    Maybe the right response is a class action suit by people like the original author harmed by this.

  99. Re:Forward compatibility by epeus · · Score: 1

    It is perfectly possible to define a file format that is both forward and backward compatible.

    The IFF file format family, originally defined by Electronic Arts and forming the basis of AIFF, AVI and MOV files, does exactly this.

    MS is aware of this technique (after all it used it in AVI), but either through poor engineering or malice chose to impose frequent file format incompatibilities on users of Office.

    XML is not a panacea - having to express all embedded images in a textual form is more than a little cumbersome.
    I'm sure this flexibility had a lot to do with the MPEG4 committee adopting the QuickTime file format.

    Maybe the right response is a class action suit by people like the original author harmed by this.
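
    A rough sketch of why the IFF approach tolerates version skew, assuming a simplified flat stream of chunks (real IFF files wrap their chunks in a FORM container, which is left out here). Each chunk carries a 4-byte ID and a big-endian length, so a reader can step over any chunk it does not recognise. The chunk IDs TEXT, STYL and AUTH are made up for illustration:

```python
import struct


def make_chunk(chunk_id: bytes, payload: bytes) -> bytes:
    """Build one IFF-style chunk: 4-byte ID, 4-byte big-endian length, payload."""
    pad = b"\x00" if len(payload) % 2 else b""  # chunks are padded to an even length
    return chunk_id + struct.pack(">I", len(payload)) + payload + pad


def read_chunks(data: bytes, known: set) -> dict:
    """Return payloads of chunks whose ID is in `known`; skip the rest by length."""
    found = {}
    pos = 0
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack(">I", data[pos + 4:pos + 8])
        if chunk_id in known:
            found[chunk_id] = data[pos + 8:pos + 8 + size]
        # Unknown chunks are stepped over using the length field alone.
        pos += 8 + size + (size % 2)
    return found


# Hypothetical IDs: the "old" reader predates the STYL chunk the "new" writer emits.
document = (
    make_chunk(b"TEXT", b"hello")
    + make_chunk(b"STYL", b"bold")
    + make_chunk(b"AUTH", b"ea")
)
old_reader = read_chunks(document, {b"TEXT", b"AUTH"})
new_reader = read_chunks(document, {b"TEXT", b"STYL", b"AUTH"})
```

    The old reader still finds AUTH even though an unfamiliar STYL chunk sits in front of it, which is the forward-compatibility property being described. Whether a given application degrades gracefully without the skipped data is a separate design question.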

  100. Re:I don't think so! was: Re:Already Happened by holt · · Score: 1

    why shouldn't IE be able to view word documents? christ! they are both Microsoft products. microsoft developed the file format. so why shouldn't they be able to implement the format anywhere they want in their software?

    I don't understand how being able to view something in IE hurts anyone.

    oh, I am posting this from netscape 4.5 running on linux on an old pentium. i have 3 linux servers running under my control. so i am not MS-biased. I just don't understand why people have problems with microsoft adding features to their software.

    ugh. i am not even going to get started on the stupidity of this whole microsoft bullshit anti-trust lawsuit.

  101. XML & Abi... by Paul+Neubauer · · Score: 1

    I was looking for a cheap (ideally free, yes as in beer, like most folks think when you say 'free'), light word processor after I had gotten a 32bit Win* box set up. A few searches gave a few candidates, most pricy or with what looked like irksome interfaces. Then I saw AbiWord.

    The price was certainly right. The cross platform was good (Hey, I could also use this on that old HP) and the size was decent. That it used XML was bonus. I looked at a silly test file in a text editor just to see. Nice. No stupid blank boxes of who knows what formatting info, no silly extras, just readable formatting - that could probably be stripped out with an HTML to text program if really needed.

    I then spent part of an Easter break getting a Word document viewer going for my father. All the time silently muttering to myself about folks sending Word docs when a text file would suffice. Sure, you *can* open one in a text editor, but you can't read it very well.

    I certainly hope XML takes over, but that depends on those making word processors and office suites seeing it being to their advantage to support and to *primarily* support. Of course I also hope lean software takes over, too...

    --
    I don't subscribe to RMS's GNUtopian vision.
  102. Sorry, but you really aren't getting it. by dwalsh · · Score: 1

    XML is a META-MARKUP LANGUAGE, a language for defining markup languages. It is not a single file format, that every application will understand. For example, an app. for chemistry will read CML (Chemical Markup Language) and will never be able to make sense of a document in HRML (Human Resources Markup Language).
    The fact that so many different standards are being built upon XML is a testament to its success. The benefit brought by XML is that it removes low-level file format, encoding etc. issues, and allows the creation of structured documents/data. The fact remains, however, that a chemistry program is needed to understand what a element means, and HR software is needed to understand what a tag means.
    The fact that in some cases industries haven't developed a single schema for their domain is unfortunate, but not catastrophic. There will be a shakeout that will remove many formats, and those that remain can be transformed into one another (e.g. via XSLT).
    This Tower of Babel nonsense is standard issue XML FUD. The fact that say, GM and Volkswagen, haven't decided on a single format for car components and what it means, is not a failure of XML or an issue that can be solved by any technology.
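
    To make the syntax-versus-meaning split concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. The two documents use invented tag names (they are not real CML or HRML), and total_salary is a hypothetical piece of HR-only logic:

```python
import xml.etree.ElementTree as ET

# Two made-up vocabularies; neither is real CML or HRML.
chem_doc = "<molecule><atom symbol='H'/><atom symbol='O'/></molecule>"
hr_doc = "<employee><name>Pat</name><salary currency='USD'>100</salary></employee>"


def outline(xml_text: str) -> list:
    """List every element name in document order; needs no schema knowledge."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]


def total_salary(xml_text: str) -> int:
    """Domain logic that only means something for the hypothetical HR vocabulary."""
    root = ET.fromstring(xml_text)
    return sum(int(el.text) for el in root.iter("salary"))


chem_outline = outline(chem_doc)  # generic parsing works on both documents
hr_outline = outline(hr_doc)
hr_total = total_salary(hr_doc)
chem_total = total_salary(chem_doc)  # parses fine, but finds nothing it understands
```

    The same parser handles both files, which is the low-level benefit. Knowing that "salary" is a number worth summing is knowledge that lives in the application, not in XML.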

    --
    ${YEAR+1} is going to be the year of Linux on the desktop!
  103. I shouldn't have to tell you this by donutello · · Score: 1

    If you want to add new features and the existing file formats don't support this, you have two choices: Don't add the new feature or change file formats.

    The peer-pressure argument works both ways. If your file formats are not compatible, that is as much a deterrent for the first customers to upgrade as it is an incentive for the last ones to do so.

    Whenever a product changes its file formats in a way that makes it incompatible with existing products or versions of itself, it is because someone somewhere along the way made the decision that the new features were worth sacrificing format compatibility for. The success (or otherwise) of that new product is a testimony (or otherwise) of customers agreeing with that decision.

    I'm defensive about this because I work on a product where we have recently had to change our database formats between versions. Luckily, our customers constitute a very small niche market so it was possible for us to individually educate them on why we had to do it, but we still did our damnedest to work with what we had before we ultimately made the decision to change formats.

    An argument can be made, of course, about why the formats were incompatible in the first place. But often, it's only possible to see so far into the future.

    --
    Mmmm.. Donuts
  104. Re:You have it backwards. by eightball · · Score: 1

    More than anything, I think it is a problem with which keyword is chosen..
    As I read it now, it is rated '2 - flamebait'.
    When I looked at what each vote was, there was one flamebait, two funny and one overrated (knock down from 3, I would guess)..

    Looks like it should be rated 'funny' if just because of votes..

  105. Cost of Word Time Vs PC by bluetoad · · Score: 1

    I've never seen a document that size handled properly by Word. It is probably cheaper to buy a PC and a package like Framemaker or Applixware and hand the whole kit to the customer than to spend the hours of aggravation battling Word. Then they can edit the document.

  106. Exactly! This XML cheering dumbness is depressing! by porttikivi · · Score: 1

    I thought Slashdot readers were people with a basic understanding of the concept of a "data format". So how is everybody getting the concepts of syntax and semantics so confused with XML?

    Standardizing XML public DTDs is practically no easier than standardizing any other public data formats. Whether they are binary, ASCII or Unicode based makes no real difference.

    Interpreting XML documents conforming to a given DTD is practically no easier than interpreting any other standard data formats. The syntaxes (where XML helps) are not the challenge, the semantics are (no help from XML in converting the semantics of one DTD to another).

    Something like XSL is nice, but again: standardizing on one form of machine-interpretable rendering instructions (semantics specification language) is no easier than on any other, regardless of whether the instructions are XSL, PostScript, TeX, Tk, OpenGL or whatever.

    --
    Anssi Porttikivi / app@iki.fi
  107. Re:Learning XML, XHTML by grarg · · Score: 1

    have recently been trying to read up on XML and XHTML. (is that slightly redundant?)

    They're not the same thing. XML is a meta mark-up language that can be (and is) used to define any number of mark-up languages like XHTML, MathML, SVG, WML (W=WAP) etc. Indeed, you could theoretically write your own version of HTML through XML, but better to stick to the one that has already been defined, namely XHTML.

    It's worth learning XML because it is going to happen; the software support (not even including MS) is coming on-line. The sooner we can get rid of .doc the better.

    Re comments above about Word compatibility issues, I've had the best fun on occasion trying to open a file created in Mac Word '98 subsequently in PC '97 or 2K. At this stage, I've resorted to laying out projects etc in HTML (!) or else the truly non-proprietary document type: good ol' ASCII :-)

    --
    The conclusion of your syllogism, I said lightly, is fallacious, being based on licensed premises
  108. xml+dtd+stylesheet is the way to go! by sds · · Score: 1

    in short, switching from layout-based formatting to contents-oriented one (using xml/dtd/stylesheets) will allow separating file formats from the editing software, which will solve most of our problems. please see details.

  109. it's called a monopoly by alterneight · · Score: 1
    This is all really very simple - the microsoft monopoly has pushed Word onto everyone's desktop as a standard you can't be without, so there's no pressure for interoperation anymore - and so no pressure for a common standard. I remember a time a few years back when every document I sent, I sent in RTF - because WordPerfect users needed to access it. Now? Microsoft monopoly tactics more or less killed them, and everyone sends word documents.

    Oh, also of course, all those nice little OLE features of your mail reader are oh-so-supported by Word...

    The trouble with having my attention drawn to the Microsoft monopoly is that it makes me really angry - I realise how much, every day, my life is adversely affected by it. How much stupidity I have to put up with for it. And no, a few advanced clever features do NOT justify it.

    These days I find Word unusable (too complex, translated documents in particular seem to be impossible to edit or reformat) - so I mostly use an HTML editor, which is a big step backwards, but if someone could enhance it a bit (cut and paste of table cells...) it would be all I need. I'll happily live with the standards that are out there - they are enough.

  110. XML actually Damaging to Standardization? by benwb · · Score: 1
    There's a couple of things that XML does that may actually damage standardization of file formats:

    1) It's really easy to create a new file format. You don't really have to think about it- just come up with some descriptive tags and voila, a whole new format.

    2) It's really easy to parse the structure of an XML document. There's not really an incentive to come up with a standard library for parsing an XML schema because it's so easy to roll your own.

    3) Few vendors are going to want to work together to define a file format for a type of application. This isn't even necessarily as evil as Microsoft trying to control a standard. Developers by their very nature are going to have different ideas about how a document should be structured- that plus the much larger design time required for a file format which applies to a broad group of features will almost ensure that in the near future no one will be using a common standard. I like XML a lot, but I think it's foolish to think that it's going to help standardize file formats. If we wanted to do that it's really not any harder to do using a binary standard.

  111. Re:Typical slashdot ignorance by benwb · · Score: 1
    I was all set to jump on you about your second point, until I looked up the binary file format spec on microsoft's web site.

    To quote (Gosh, I really hope this falls under fair usage):

    NOTE: The Microsoft Office binary file format documentation was removed from the MSDN in 1999. If you would like to receive this documentation, you can send e-mail to officeff@microsoft.com, or mail to:
    Office File Format Documentation Request
    One Microsoft Way
    Redmond, WA 98052

    Office Development Structured Storage and File Formats

  112. Re:Barriers exist right now... by Brazilian · · Score: 1
    I really believe that the first company releasing a first class DocBook/XML editor for a price under $100 will make an absolute killing in the marketplace.

    I agree that an XML editor for under $100 would make a killing. Not DocBook though - there are a lot of things that DocBook lacks support for (my Ph.D. thesis being one of them).

    What I would love to see would be an editor that would allow me to easily generate a DTD, easily edit XML based off that DTD, then easily generate style sheet(s) for the media that I'd like to publish it on. My thesis is a perfect example - the university has a specific format (the DTD), I need to generate the thesis itself (the XML), and I'd like to publish it on paper, as HTML, and as plain text (the style sheet(s)).

    Even better yet, as I anticipate the thesis being a cobbling together of several papers, I'd like to be able to import XML elements from the previous papers into the thesis to save time.

    So far I haven't found anything even remotely resembling good XML tools. James Clark's XML/SGML tools are a start, but I don't want to have to remember dozens of tags off the top of my head when I edit (for that matter, I don't want to have to know the exact DTD or stylesheet syntaxes).
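
    For what it's worth, the "one XML source, several outputs" part can be sketched in a few lines of Python. This is only an illustration under assumptions: the thesis element names are invented, and the two functions stand in for real style sheets (XSLT or similar), which would normally do this job:

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical thesis markup; the element names are invented for illustration.
SOURCE = """<thesis>
  <title>On Widgets</title>
  <chapter><heading>Introduction</heading><para>Widgets matter.</para></chapter>
</thesis>"""


def to_html(root) -> str:
    """Stand-in for an HTML style sheet."""
    parts = ["<h1>%s</h1>" % escape(root.findtext("title"))]
    for chapter in root.iter("chapter"):
        parts.append("<h2>%s</h2>" % escape(chapter.findtext("heading")))
        parts.extend("<p>%s</p>" % escape(p.text) for p in chapter.iter("para"))
    return "\n".join(parts)


def to_text(root) -> str:
    """Stand-in for a plain-text style sheet."""
    title = root.findtext("title")
    lines = [title.upper(), "=" * len(title), ""]
    for chapter in root.iter("chapter"):
        lines.append(chapter.findtext("heading"))
        lines.extend("  " + p.text for p in chapter.iter("para"))
    return "\n".join(lines)


root = ET.fromstring(SOURCE)
html_out = to_html(root)
text_out = to_text(root)
```

    The content is written once and each output format is just another function over the same tree. The missing piece complained about above, a friendly editor that hides the tags, is not something this sketch addresses.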

  113. It will never work. by greggman · · Score: 1

    The best you can possibly do is something like HTML where there are tags you can ignore if you don't understand them but eventually these tags will become required reading.

    There is no standard which can cover every new idea, even XML.

    If I start with some format that can store ASCII text and then later somebody comes up with an idea for BOLD, ITALICS and UNDERLINE and I didn't already think of that in that original spec the best I can do is add it in as a proprietary tag.

    Now of course I suspect that BOLD, ITALIC and UNDERLINE are covered by some part of the XML spec but what part covers inserting Flash Animations or 3D data based on Nurbs or 3D data based on Metaballs or 4 dimensional tables.

    The answer, it does NOT cover it, it expects you to use proprietary tags that others will then ignore. Unfortunately at some point those tags will become required in order to get the information out.

    You can't win. It's an unsolvable problem. The best you can do is pick a simpler format to "export" to that doesn't save all the nifty data that the original application needed to make life easier but that most software can read. A perfect example of this is Photoshop. The original spec of any graphics format just specified a rectangle of pixels. Now a Photoshop file consists of multiple layers, some of them not made of pixels, with various effects applied. Tomorrow there will be more types of layers with more data no other application will understand. The best you can do is export to some other format like Targa or TIFF or JPEG but of course you lose all your flexibility.

    The same will be true for text. Some of the text in that file may be generated on the fly, like the date of modification or a link from some other page or an extract of data from a database into a chart. You can't put the "links" to that data in XML in a non-proprietary way. The best you can do is store some non-proprietary version of the result of those links in the file and then other software can display the information as of the time the file was saved. Of course if this is not what the author wanted you to see (i.e., he wanted you to see a flash animation, not a still image of the first frame) then you are out of luck.
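
    A small sketch of that "ignore what you don't understand, show the stored result" fallback, assuming an invented document vocabulary (doc, p, b) and an invented proprietary chart tag whose saved text is all an outside reader can use:

```python
import xml.etree.ElementTree as ET

KNOWN = {"doc", "p", "b"}  # tags this hypothetical reader understands


def render(el, skipped: list) -> str:
    """Render known tags; for unknown ones keep only the text saved inside them."""
    inner = (el.text or "") + "".join(
        render(child, skipped) + (child.tail or "") for child in el
    )
    if el.tag not in KNOWN:
        skipped.append(el.tag)  # remember what could not be interpreted
        return inner
    if el.tag == "b":
        return "*" + inner + "*"
    return inner


sample = (
    "<doc><p>Sales were <b>up</b>. "
    "<chart src='db://q3'>[chart: Q3 totals]</chart></p></doc>"
)
skipped_tags = []
output = render(ET.fromstring(sample), skipped_tags)
```

    The reader gets the frozen placeholder text but not the live chart, which is exactly the loss being described: the file stays readable, the author's intent does not fully survive.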

    -gregg

  114. OS X Uses XML As Standard Format For Config Files by Bill+Daras · · Score: 1

    Mac OS X uses XML as the standard format for all its configuration files, which is, in my opinion, a wonderful development. I hope it catches on elsewhere. Using Linux, I despise having programmers use many different formats. Jumping between them is incredibly annoying, trying to recenter yourself every time. "Why isn't this WORKING! Oh yeah, that's how it's done in the other app!"

  115. Re: PS is not ideal... by taiwanjohn · · Score: 1

    The original point was that PS is weak as a "document" format because it was never intended as such, but rather as a "page description language" used for talking to printers.

    The classic example is hyphenation. Many PS tools insert hard hyphens and line-breaks, making it impossible to *reliably* reconstruct the original "flow" of text, for whatever purpose (searching, grepping, etc.).

    Personally, I think XML with an embedded DTD and XSL/CSS formatting info would fill the needs of a "universal" document format.

    For most users, even HTML is overkill...

    --
    XML is like violence. If it doesn't solve your problem, you're not using enough of it. --AC
  116. Re:Already Happened by ^ · · Score: 1

    Not really. It's HTML based, not XML based. Just because it's based on HTML doesn't mean that anyone else can look at the document.

  117. Whose fault is it? by chafey · · Score: 1

    Come on, no experienced technical writer would agree to use Word for anything that large. Although MS Word is buggy, you need to take responsibility for agreeing to something that isn't practical. If I agree to deliver a web search engine running on an Apple IIe, it wouldn't make any sense to blame the hardware when I fail...

  118. Re:Be careful when throwing around the word ignora by BagMan2 · · Score: 1

    I agree that a common document format is desirable. I just get tired of seeing the slashdot crowd blame Microsoft for all their woes. To quote the topic paragraph: "and will never forgive Microsoft for their abuse of me and my kind." Microsoft is not guilty of anything here. Almost every word processor on the market saves its data in a proprietary format. The PDF/FrameMaker files that he wanted to give them are just as proprietary, but I don't see him bashing Adobe for the same sin.

    No, I am the one seeing things clearly here. The slashdot crowd has long had problems seeing things objectively as it relates to Microsoft. Further evidenced by my original post (which was not a troll) being marked down to a -1 score by the moderator. In fact, I have found that any defense of Microsoft is always left at 0/1 or even moderated down. The whole process on this message board gives me the feeling that it's just a bunch of linux lovers trying to make each other feel good about the fact that they can't run 90% of the applications on the market effectively under 'their' operating system.

  119. Re:Why XHTML? by cybermage · · Score: 1

    Attribute value pairs cannot be minimised. So instead of tags like you must instead write .

    You do realize that Slashdot strips most html, right? Could you try that again substituting parenthesis or something?

  120. Re:Why XHTML? by cybermage · · Score: 1

    Eeeewww. Thanks for clearing that up though.

  121. Re:Bloatware by gatekeeper-eu · · Score: 1

    XML documents could also be regarded as bloatware, but because there is lots of 'free' space in XML documents they lend themselves very well to compression. This is a current topic over at XML.com. Perhaps a great advantage for specialist 'publishers' is the use of DTDs (Document Type Definitions). By publishing this, either with the document or on the web, the publisher can use eXtensions to vanilla XML to create specialist or unique features. As for 'Print Ready Copy' I think the jury is still out!
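
    A quick way to see the compression point for yourself, sketched with Python's standard zlib module. The document here is deliberately repetitive and made up, so the ratio it produces says nothing about any particular real-world file:

```python
import zlib

# A deliberately repetitive, made-up XML document.
rows = "".join(
    '<row id="%d"><name>item</name><price>9.99</price></row>' % i
    for i in range(200)
)
xml_bytes = ("<table>" + rows + "</table>").encode("utf-8")

packed = zlib.compress(xml_bytes, 9)   # 9 = maximum compression level
ratio = len(packed) / len(xml_bytes)   # fraction of the original size
restored = zlib.decompress(packed)     # round-trips to the identical bytes
```

    Repeated tag names are what the compressor feeds on, so verbose markup costs much less on disk or on the wire than its raw size suggests.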

  122. Re:TeX coolness by jorbettis · · Score: 1
    Yeah, TeX is great. I use LaTeX for all my documents, but what about making a WYSIWYG editor for TeX?

    I think that would be a great idea. TeX has already proven itself as a great document format. The best thing is that, because of macros and stuff, everybody can kind of go their own way with it, but the actual file will be readable by everybody because all of their macros will still decompose into standard-compliant TeX.

    --

    Jordan Bettis

    ``Wherever you go, there's another stupid sigfile quote.''
  123. Format Is NOT The Problem (much) by buzzcutbuddha · · Score: 1

    I don't think the problem is the format that the data comes in as much as it is the viewer used to get at that data. Star Office and Word Perfect have shown us that it is possible to open foreign binaries without much difficulty. The trick comes in deciding how it's displayed. Netscape and Microsoft can't agree on how HTML is to be displayed, and HTML was supposed to be the great equalizing formatting language way back when. Netscape and IE render XML differently now, and it doesn't look like they will agree on it either.

    The biggest problem is getting everyone to agree what the finished output will look like.

    1. Re:Format Is NOT The Problem (much) by gammatron · · Score: 1

      Netscape and Microsoft can't agree on how HTML is to be displayed, and HTML was supposed to be the great equalizing formatting language way back when.


      HTML is not supposed to look the same on every browser. That was not ever its intended goal. A lot of people have tried to turn HTML into a page description language like PostScript or .pdf with various tricks (tables for page layout, Flash, etc). The whole point of HTML is that the author gives various pieces of the text different attributes and the user (and his browser) determines how to display that information.

  124. Not quite the right question, but close. by ikaros · · Score: 1

    The question is really, "Which open-standard text formatting system should replace the MS DOC format?"

    Full document sharing requirements necessarily preclude proprietary formats. They have to - otherwise, as we're seeing with .DOC, you find yourself locked into a format that's not stable even across revisions of the native software - and given MS's legendary lack of reliability, sometimes not even stable between two machines running the same rev.

    It also requires that the document is freely editable, so that knocks out the otherwise quite useful Adobe Acrobat PDF format.

    PostScript, XML, and RTF remain viable (although I'm not certain of RTF's parentage - is it an MS format primarily?). I can't think of any reason TeX shouldn't be a contender - of course, my familiarity with TeX/LaTeX is limited, so there may be reasons it shouldn't be considered a standard.

    The point is that XML is certainly an option for a fully-portable document format, but it's not the only option.

    What are the primary open-standard document formats, anyway? I'm sure I'm missing a lot of them.



    ikaros, who's a moderately obsessive WordPerfect partisan :)
    --
    You're only as young as the last time you changed your mind -- Timothy Leary
  125. My pain in the Publishing Industry by doublem · · Score: 1
    Let's see. We have 130 courses. Some started as WordPerfect, others started as Word, our authors have everything from Word 2.0 to Office 2000, and the pagination changes every time we print on a different printer than the one the document was formatted with. WordPerfect graphics get trashed by Word. Neither WP nor Word can really open files generated by previous versions of the "same" software!

    Now we're loading all the courses on the Internet in HTML format, which Word and WordPerfect can't generate worth spit, which means they all have to be done by hand. Of course, a permanent migration to HTML is out of the question because the owner is too cheap to buy decent HTML editing software, and HTML isn't very good for printing books.

    I'd say there's plenty of room for XML to replace the hodgepodge of formats I have to deal with now!




    Matthew Miller,

    --
    "Live Free or Die." Don't like it? Then keep out of the USA
  126. Re:Bloatware by DrSkwid · · Score: 1

    i think this really goes to show how complex document presentation requires planning and forethought

    the things people are giving horror stories about here have solutions already built into Word.

    not to diss anybody or any product but just sitting down and typing is clearly not the approach for any large scale project which a thesis no doubt is.

    most people don't spend the time learning the true capabilities of such complex programs as Word and end up spending hours re-jigging things that should take moments if you use the tools in front of you.

    Heck most people I've witnessed don't use paragraph styles and love to spend ages changing fonts and sizes manually - trashing any special formatting they put in for special cases.

    People are averse to learning when they think they "need to get on with it". If you do a lot of document preparation then spend a week learning the tools available.

    Sadly most users are scared to even try because they think they'll never understand it. How many people have you seen that don't even know what tabs do or how to set them!
    .oO0Oo.

    --
    There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
  127. MS Word and Linux Alternatives? by Doctor+Fishboy · · Score: 1

    A sob story:

    I wrote my thesis in Word95 two years ago. Yes, I am a Linux user, but my thesis had over 200 figures that needed to be drawn, and I did not fancy the horror of LaTeX with a poor version of xfig. Believe it or not, I think the Word drawing utilities are very good, and I needed the ability to modify and use a WYSINWYG (WYSI Nearly WYG) editor. I was in a hurry, I was weak, the dog ate my homework, etc.

    But, oh boy, have I paid for my mistake - I tried recently to get a couple of figures out using Word 2000 and all the inserted images and painstaking drawings are gone - with a lot of fuss I have PDF'ed it all in 97 and I have the originally printed Postscript files still to hand, but I now wish I had a time machine so I could go back and slap Mr Gates around the face a few times.

    The trouble is, there *is* no broad cross-platform alternative - StarOffice doesn't cut it at the moment and I'd be terrified to commit to writing a big thesis-type document with it. I don't want my work beta-testing a word processor.
    At the moment it seems that plain text or maybe HTML is the only way to ensure your text and graphical content gets through.

    1. Re:MS Word and Linux Alternatives? by wdavies · · Score: 1

      Use Latex :)

      I converted my Thesis from Word to Latex, and it only took a couple of days. I'd had a problem of cutting and pasting 4 or 5 papers I'd written in varying versions of word, that had completely trashed the styles and numbering.

      Dump your Word file to HTML and then start tagging.

      It is typographically much, much nicer than anything Word produces.

      Get the Latex Handbook. It works really well.

      Winton

    2. Re:MS Word and Linux Alternatives? by roundclock · · Score: 1
      I used Office 97. I knew that it "saved files as html".

      My teacher said I knew nothing about HTML in my web page class. She said it was crap. I said how did you know?

      Wait a minute, it was obvious! The HTML support in Office 97 was crap!

      Then I used Office 2000 this year. It made great looking web pages. Well, my teacher and school uses Netscape 4.7. She asked me why I would use all these "advance features". I said because I know my stuff. She then told me no, it is actually because I don't know any better and my HTML is crap.

      This is not a true story. But, it could be, which is my point. No trolls, I know it is dumb.

    3. Re:MS Word and Linux Alternatives? by jilles · · Score: 2

      I would have had to learn latex instead of framemaker, so it wouldn't have saved me time. BTW. the reason I was working in word is because I got sick of compiling & debugging my texts. I used TeX before and I have to admit it is useful if you are doing math formulas. If you are not (like me), you might as well work with a decent word processor.

      The problem with word is that it is not very suitable for writing structured documents. It has all the necessary features but the implementation is crappy. Framemaker however is all about structure. It suits all my needs and provides the comfort of a wordprocessor whereas Latex does not. In addition it has nice graphical features and you can embed objects from other programs. Latex would force me to convert all my images to eps (not supported by the programs I use).

      Having experienced Latex, Framemaker and Word I can say that I consider both Word and Latex a step backwards. Latex delivers nice results but sends you back to the stone age from a usability perspective. Word is exactly the opposite: the interface is very nice but the result sucks. Framemaker is in the middle, decent result and a fairly good UI (not perfect though).

      --

      Jilles
    4. Re:MS Word and Linux Alternatives? by jilles · · Score: 2

      I had the same problem 1 1/2 years ago. I started out working on my master's thesis in word 97. After a few weeks of no problems I ran into some problems with images. Since it was not the first time, I looked for an alternative.

      I then installed framemaker 5.5. It took me a week to convert my document and erase all traces of ms word from my thesis. It was a good move, though. Framemaker is excellent for creating structured documents (such as a thesis). I haven't looked back since. I now work as a Ph.D. student and write all my papers in framemaker. I have not run into any serious problems yet.

      I particularly like how framemaker forces you to work (structure your document using paragraph and character tags). This is also the way I used word in the past. Unfortunately word automagically fucks up your document structure if you don't pay attention.

      I wouldn't consider any other wordprocessor at the moment than framemaker. Word is nice if you know how to work with it, however it is too buggy to do any graphics in it. Basically anybody I know who ever tried to do anything serious involving graphics and word has had to deal with all sorts of bugs in word. Framemaker doesn't have this problem. It's rock solid, available on many platforms (including Linux). It's also very suitable for scientific documents since most conferences and journals have templates for framemaker (if they haven't it's usually easy to create them yourself given a detailed description of how the document should look).

      Of course you don't have much interoperability with word. You can import word, but the result is usually not pretty. You can also export word/rtf but there are some problems here as well (especially with graphics).

      --

      Jilles
  128. Re:Barriers exist right now... by mr3038 · · Score: 1
    it looks to me like you've just completely described LaTeX

    umm... NOT. Have you tried to make your own "style sheets" with LaTeX? Pretty hard I would say. Editing .css-files is a joy compared to that.

    Though I have to admit that for math markup there is no real competitor.
    --
    _________________________
    Spelling and grammar mistakes left as an exercise for the reader.
  129. Drawback to standards by Zordak · · Score: 1

    I don't think solid, open standards generally have many drawbacks for users. They never have. The problem with them is that corporations have huge vested interests in their proprietary binary formats, and as long as people are willing to pay money for their product, they're going to use those formats. I look forward to a world where a web page is a web page, a document is a document, a spreadsheet is a spreadsheet and so on. But I can't see large corporations benevolently giving up their hold on whatever part of the market they individually hold in the near future.

    --

    Today's Sesame Street was brought to you by the number e.
  130. Re:LaTeX and dissertations by Signail11 · · Score: 1

    I'd be more afraid of being hit with MTW than a dictionary. I'm a bit surprised that a grad level course in GR would be taught using _Gravitation_ as a primary text.

  131. XML by Kooki+Monster · · Score: 1
    The reason that people use Word is because, well, everybody uses Word.... As much as I hate it, and as much as I don't want to admit it, it's the industry standard. Windows is only popular because businesses need to use Office. As anybody on this side of the world will tell you, the ECDL doesn't cover Corel Office, and you don't learn about X-Windows....

    XML requires an SGML parser - something like Lynx can be embedded in a typewriter, things like Mozilla / Opera / MIE will work under PC architectures, but I think perhaps the problem isn't rendering things... it's creating them. You or I, the average Slashdot reader shouldn't have much of a problem creating XML documents, but what about Joe Public? What about all those sad little Mac people who can't handle more than one mouse button (no offense ;o) Users(TM) like to Drag'n'Drop(TM) - have any of you ever seen what the code that comes out of Frontpage looks like?

    Snip - ed.

  132. Re:Typical slashdot ignorance by piranesi · · Score: 1

    don't forget that MS's MSDN also (somewhere) has some source for a Word document viewer

    the only problem is that
    a) it doesn't seem to render most Word docs correctly
    and
    b) the license states that you can _only_ use it under Windows

    but of course that doesn't really matter since it's pretty worthless anyway. Personally i've been waiting a couple of years for the open source movement to come up with a metadocument. (by the way, the other things i'm waiting for are audio & video codecs, some kind of platform abstraction so linux binaries can run on x86, ppc, alpha, and all the rest (i don't know how to do it but it needs doing), an X-windows replacement (one that isn't butt ugly, doesn't suck, and knows the difference between a client and a server ;^p) and finally a nice abstracted unix management tool.)

    but what do i know

  133. Why not use RTF as a standard? by stikves · · Score: 1

    You mentioned RTF. Why is RTF not the standard on Linux? It is easy to generate and parse, and it allows us to share documents with Word users. (By the way, I personally don't use Word as long as I can do my work with LyX.)

  134. I'll tell you how... by GNUs-Not-Good · · Score: 1

    It raises the question that SlashHeads scream about all the time, "What about the customers' needs?"

    Every time some corporation does something that the kids here don't like, you hear,

    "We are the customers, they have to listen to us."

    Then comes along this guy, who complains because his customers use a doc format that he does not like and has trouble with. If he is a professional, he finds a way to deal with it.

    Dumbass.

  135. So..apply the patch... by GNUs-Not-Good · · Score: 1

    I seem to remember when there is a security flaw with a Linux distribution, everyone screams that anyone who is a professional would apply the patch, otherwise they have no excuse to complain. What is this guy's excuse?

    The same applies here. If this man makes his living doing documentation, he should keep up to date with the latest and greatest patches.

  136. Yeah...but... by GNUs-Not-Good · · Score: 1

    he is supposed to be a documentation professional, so he should have a copy of all the tools his customers require, whether he likes MS or not (in this case Word95 native). If he does not like MS, he should find customers that meet his needs, even though that is not the way it usually works.

  137. Microsoft Word Viewer by ngaihua · · Score: 1

    Actually, those who don't have the latest version of MS Word can still view the file with a Word Viewer available from Microsoft free of charge. I have Word 95 and haven't upgraded since (I don't see the advantage of all the new bells and whistles). But I still view newer Word files with their viewer.

  138. Re:Learning XML, XHTML by TomV · · Score: 1
    What we need is a database of DTDs so that instead of writing my own I'll try and make something that is compatible with someone else's.

    To quote from BizTalk,

    BizTalk is an industry initiative started by Microsoft and supported by a wide range of organizations, from technology vendors like SAP and CommerceOne to technology users like Boeing and BP/Amoco. BizTalk is not a standards body. Instead, we are a community of standards users, with the goal of driving the rapid, consistent adoption of XML to enable electronic commerce and application integration.

    Now, I know it's MS-initiated, but it's not MS-controlled, and it is a start

    TomV

  139. Re:Good Idea but counter gratuitous complexity by TomV · · Score: 1
    FUDbuster!

    As the linked page states, this file definition comes from the Office book, found in the Microsoft Office Development section in the MSDN Online Library.

    Can we have a little less of the 'undocumented formats' thread, then?

    TomV

  140. What makes you think SVG is working? by The+Pim · · Score: 1
    See for example http://www.levien.com/svg/report1.html .

    Repeat as necessary:

    XML does not guarantee interoperability, editability, viewability, or manipulation.

    Please don't think I'm at all down on SVG. It just seems there's a need to remind people that XML does not cure all your object definition and access woes.

    --

    The evaluation of an action as 'practical' . . . depends on what it is that one wishes to practice.
  141. Maybe you shouldn't be such a baby by jsin · · Score: 1

    What's wrong with RTF anyway?

  142. Re:Why not even html by Refrag · · Score: 1

    Ever heard of file compression? You'd be amazed at how well a text file will compress!

    ...well, I wouldn't, but maybe you would.

    --
    I have a website. It's about Macs.
  143. Re:Learning XML, XHTML by mdray · · Score: 1
    We have just started using XML for our online databases at work. We are using Cocoon (a bunch of java servlets running on Apache) to pull stuff from a MySQL DB (using the Cocoon SQL processor) and then format it into WML/HTML depending on the user agent. This is all handled by Cocoon with no perl/php/scripting. The advantage to Cocoon being written in Java is the ease with which you can create a processor for just about any application, eg accessing POP mailboxes. Currently processors for SQL and LDAP exist.

    Initially it is a steep learning curve - but the rewards are worth it. You can have your web designer make up the XSL stylesheets which are then applied to the static or dynamic XML, thus keeping content and design entirely separate.

    XML is here to stay: this is how the web should have been from the start.

  144. XML won't change anything by blirp · · Score: 1

    XML won't do anything that
    a) hasn't been done before
    b) can't be done more elegantly

    The main reason XML won't help with anything is that it solves the easy question: syntax.
    Semantics is left to each and every one of us.
    So, we'll end up with a gazillion ASCII documents that no-one can read.
    Instead of a gazillion binary documents that no-one can read.
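
    A minimal sketch of the syntax-versus-semantics point, assuming two made-up vocabularies (neither is a real word processor format). Both files parse without complaint, but a reader written for one vocabulary finds nothing useful in the other:

```python
import xml.etree.ElementTree as ET

# Two hypothetical vocabularies marking up the same sentence.
DOC_A = "<doc><para>Hello <bold>world</bold></para></doc>"
DOC_B = "<text><p>Hello <em weight='heavy'>world</em></p></text>"


def bold_runs_vocab_a(xml_text):
    """Return the bold text runs, assuming vocabulary A's <bold> element."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter("bold")]


# Both documents are well-formed, so the syntax question is settled.
ET.fromstring(DOC_A)
ET.fromstring(DOC_B)

# The meaning is not: the A-aware reader sees no bold text at all in B.
print(bold_runs_vocab_a(DOC_A))
print(bold_runs_vocab_a(DOC_B))
```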

    Remember how much a pain HTML is/was, even with only two competing decoders?

  145. Re:Good Idea but counter gratuitous complexity by Rico_Suave · · Score: 1
    Bah - that's just more anti-MS zealot FUD.

    --

  146. Re:Hey genius... by Rico_Suave · · Score: 1
    Troll? This post is right on. It's not Microsoft's fault at all. If someone needs a GIF file, I'm not going to send an EPS format to them.

    --

  147. WTF?? by Docrates · · Score: 1

    C'mon, do you know how XML works? The chances of a file format created using XML replacing proprietary file formats are the same as for any other open source file format, even if it's just a C++ data structure. There's nothing in XML that will solve your problem!! That's like saying, "hey, the new jFAT64 file system is out, will I be able to replace proprietary file formats with it?" It just doesn't work that way. You can probably recreate the Word 95 format using XML, and you can do it without XML; it's all in the parser/interpreter/renderer. So you created a very nice, very complete file format using XML. Well, now every word processor has to support it properly (parse it, interpret it, render it) in order for it to get anywhere, but the same is true for anything you do without XML anyway!!!


    ========================

    --

    There are two kinds of people in the world: Those with good memory.
  148. Re:Not Already Happened by Frizzle+Fry · · Score: 1
    A link to that Byte article is here. As the title, "Microsoft XML: The Cup Is Half Full: W2K Does XML, Sort Of" implies, the article states that it is somewhat good and somewhat bad that MS has bastardized XML. You may disagree. I'm not sure I buy it.

    Also, if you are looking for a word processor that does use XML -- genuine, correct, pure XML -- as its native format, you should check out AbiWord.

    It's all I use now, ever since I got sick of giant, bloated, slow StarOffice, the office suite that thinks it's an operating system. It's nice to finally see a word processor aimed at people who want a word processor, not a spreadsheet, not an email program, and sure as hell not an "integrated desktop". Stupid StarOffice. Oh yeah, and AbiWord is GPL'd, unlike StarOffice. Kick ass.

    The bus came by and I got on
    That's when it all began
    There was cowboy Neal
    At the wheel
    Of a bus to never-ever land

    --
    I'd rather be lucky than good.
  149. XML Based Word Processing by emin · · Score: 1
    Something I've wanted for a long time is an XML based word processor. I would like to write a document indicating bold, italic, chapters, footnotes, equations, references, figures, etc. using XML tags. Then I'd like to run some kind of compiler that would take the XML and produce the output in ASCII, HTML, PDF, PostScript, Man Page format, Info Format, or whatever. For example, I could use a tool like this to write my resume ONCE and have it converted to whatever format I need to use it in.

    I can kind of do this with MS Word, or with latex+latex2html, texinfo, etc., but those are all incomplete. I think XML could make word processing much better, not only because of the backward compatibility, but also because you can write one document and "print" it in lots of different ways.
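
    To make the "write once, print many ways" idea concrete, here is a rough sketch. The tag names (resume, section, item, b) are invented for illustration and do not come from any existing DTD; a real tool would more likely use XSLT than hand-written converters:

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical mini-vocabulary: <resume>, <section title="...">, <item>, <b>.
SOURCE = """<resume>
  <section title="Experience">
    <item>Technical writer, <b>3,000+ page</b> manuals</item>
  </section>
</resume>"""


def to_text(root):
    """Render the document as plain ASCII, dropping inline markup."""
    lines = []
    for section in root.iter("section"):
        lines.append(section.get("title", "").upper())
        for item in section.iter("item"):
            # itertext() flattens inline elements such as <b> into plain text.
            lines.append("  * " + "".join(item.itertext()))
    return "\n".join(lines)


def to_html(root):
    """Render the same document as an HTML fragment, keeping <b> as is."""
    parts = []
    for section in root.iter("section"):
        parts.append("<h2>%s</h2>" % html.escape(section.get("title", "")))
        parts.append("<ul>")
        for item in section.iter("item"):
            # Leading text plus each serialized child (tostring includes the tail).
            inner = html.escape(item.text or "") + "".join(
                ET.tostring(child, encoding="unicode") for child in item
            )
            parts.append("<li>%s</li>" % inner)
        parts.append("</ul>")
    return "\n".join(parts)


root = ET.fromstring(SOURCE)
print(to_text(root))
print(to_html(root))
```

    The source stays the same; only the back end changes, so a PostScript or man page writer would be one more function over the same tree.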

    Have applications been written which do this kind of thing or do I need to write one myself?

    -Emin Martinian

  150. It's started, let's hope it finishes by GrouchoMarx · · Score: 1
    Using XML as a file format, or more accurately an XML-based DTD, won't automatically make all conversion problems go away. Converting from one DTD to another will still be a chore, and will still probably lose something in the translation. However, it will be easier than it is now, especially when converting Windows -> Mac -> Linux, etc.

    The real advantage of XML-based file formats is that they should be easier to write a program for than binary ones. Check out the Open eBook Initiative. Given that the first round of eBook readers never went very far, someone got the bright idea that a standard, straightforward format would be good for business. It probably will be. But it's also good for developers. If the OEB spec really lives up to its claims and is a genuine XML-document DTD, it shouldn't be that hard to write 3rd party book reader programs for existing devices like, say, the Palm? The upcoming Yopy? Your Linux desktop?

    That's where the real benefit of XML files comes in. They will help in conversion, but will really help in simplifying the process of just writing compatibility for the DTD into 3rd party programs.

    --GrouchoMarx

    --

    --GrouchoMarx
    Card-carrying member of the EFF, FSF, and ACLU. Are you?

  151. Proprietary Formats = Never Confused by bonzoesc · · Score: 1
    Even though most modern word processors can read HTML, they do it in different ways. Some, like FrontPage, basically try to preserve the formatting, and just edit the content. Others, like Publisher, try to convert everything into tables. Hardly any two web browsers render the same HTML the same way. Therefore, a proprietary format remains good because if only one program reads it, there will never be any confusion about how to render a given tag.

    If you were to write a reader for a common format, you would have to put a lot of effort into it to make it look exactly like a closed-source viewer for the same format. If you made your own closed-source viewer/editor for your own proprietary format, you could guarantee that a document saved in that format would look the same. That is why proprietary formats are popular.

    "Assume the worst about people, and you'll generally be correct"

  152. Bloatware by srhuston · · Score: 1

    (First Post?) But seriously, I have a class on Digital Circuitry Design this semester at college, and I have experienced this firsthand numerous times, including the time when I completed an exam in 10 minutes, and spent the next TWO HOURS getting M$ Word to stop throwing the diagrams of the schematic and simulation wherever it wanted after typing another word. And then when trying to convert a file written in my room (Word 97) to one for printing in the lab (Word 95), I spent the better part of another lab period just fixing what wasn't broken to begin with.

    --
    Three dits, four dits, two dits, dah!
    Radio, radio, rah rah rah!
    1. Re:Bloatware by Mr.+Frilly · · Score: 2

      Heh heh heh

      Two years ago, my two roommates and I were all finishing off our EE master's theses. Two of us had gone with LaTeX. I went with LaTeX because, well, why would you ever want to write a thesis in anything else (especially when your school is cool enough to provide their own LaTeX thesis style)? Roommate #1 went with LaTeX 'cause he had a PC at home and a Mac in the lab, and had already seen the damage that the Word 97/Word 98 switcheroo could do.

      Roommate #2, however, chose to go with Word 97, which provided much amusement for the rest of us, as he spent the last three days before his thesis due date moving pictures and text around trying to get his thesis to look as good as those produced by LaTeX.

  153. This is beneath /. by Pinball+Wizard · · Score: 1
    Sorry, but you deserved to lose your document.

    Even the most ignorant newbie should know to save and back up their documents. That has got to be the most basic principle of using a computer.

    Regardless of the computer system, you must do this, or risk losing your work. This is not a Microsoft issue.

    Furthermore, Word has always allowed you to save in plain text. With Word 97, you could have also saved it as HTML. Both are documents supported by a wide range of vendors. Word 2000 will save your document in XML. A proprietary file type enables more precise formatting, but you don't have to use it. You could even change the default file type to text if you wanted to. Although if that was what you wanted, why are you using Word in the first place?

    Tough luck, remember to back up your work next time.

    --

    No, Thursday's out. How about never - is never good for you?

    1. Re:This is beneath /. by Pinball+Wizard · · Score: 1
      >> the fruit of weeks of hard labor, rendered into rubbish

      That was what threw me off. If it was just a copy that was destroyed, it's hardly "weeks of hard labor rendered into rubbish".

      Still, it's less of an issue than the author makes it. You could use text (which defeats the purpose of Word, however). You could choose Save As Word 95. Finally, the person at the other end could have obtained a free Word 97 viewer. Bottom line, it's up to the author to make sure the intended audience can read something. No one is forcing anyone to use Word.

      --

      No, Thursday's out. How about never - is never good for you?

  154. Re:KOffice uses XML! by GeZ117 · · Score: 1

    More precisely, KOffice will use a documented XML-based format. It should be a tarred archive containing the XML files and the other data files they reference, such as images. And if you had put the link into an <a href> tag, it would have been better.

    --
    sigmentation fault
  155. Re:Already Happened - Nah! by lbrlove · · Score: 1

    Another reply worked this out pretty well on a technical level, but how about the philosophical level?

    Microsoft would *never* commit their documents to an industry compliant DTD (or group of DTDs). They have made their place in this industry by de-commoditizing standards, not by promoting them. Look at Java, Visual Basic, etc. And certainly Microsoft has taught their competition to use the same tactics.

    Microsoft absolutely dominates the office suite sector right now, and catering to a standard simply does them no good whatsoever.

    -L

  156. Re:Tools for the job by somero · · Score: 1

    AMEN

  157. Drawback of XML by wsabstract · · Score: 1

    My last attempt at learning XML was some 6 months ago, when I tried reading the top seller in its domain, "XML: a primer". The thing was so dull (due to its subject, I suspect) that I have not touched XML since. There's a reason why there are currently billions and billions of web pages on the net: the ease of learning the underlying technology, HTML. While I can appreciate the potential of XML and what it could do for the web, I just don't see it spreading like wildfire anytime soon.

    --

    ---------------
    JavaScript tutorials scripts
    1. Re:Drawback of XML by Abigail · · Score: 2
      An alternative could just dump down the articles and comments to your browser in XML format and then have those comments sorted/filtered/formatted quickly on the client side by XSLT, using either a server- or a user-supplied stylesheet, making Slashdot a much faster and more flexible application.

      Stylesheets have been part of HTML from its first standard, of spring 1994, before Netscape existed, Bill Gates was aware of the internet or before anyone talked about XML. Stylesheet capable HTML browsers were available 5 years ago. Stylesheets actually predate HTML - they come from the SGML world. It's actually quite old technology; it only has recently become a buzzword.

      -- Abigail

    2. Re:Drawback of XML by Pinball+Wizard · · Score: 2
      I don't see XML being nearly as useful for documents as it is for simply passing data between sites.

      For example, slashdot puts their headlines in a XML format, making it easy for other sites to create a "slashdot headline box", adding some interesting content. Another good use would be to have a search engine for your site that outputted XML. If you rely on comparison sites like mysimon to drive traffic to your store, having the data in XML rather than HTML can make the communication between sites far more reliable.
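
      As a rough illustration of the "headline box" case: the feed layout below (a root element holding story entries with title and url children) is an assumption made for the example, not the actual Slashdot file, and the URLs are placeholders:

```python
import xml.etree.ElementTree as ET

# Hypothetical headline feed; the real layout may differ.
FEED = """<backslash>
  <story>
    <title>Can XML Replace Proprietary Document Formats?</title>
    <url>http://example.org/story/1</url>
  </story>
  <story>
    <title>Another Story</title>
    <url>http://example.org/story/2</url>
  </story>
</backslash>"""


def headlines(feed_xml):
    """Return (title, url) pairs for every <story> in the feed."""
    root = ET.fromstring(feed_xml)
    return [
        (story.findtext("title"), story.findtext("url"))
        for story in root.findall("story")
    ]


for title, url in headlines(FEED):
    print(f"{title} -> {url}")
```

      The consuming site never has to scrape HTML; it asks for the fields by name, which is why this kind of exchange tends to be more reliable.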

      To me, XML is about data, not documents. Also, I can recommend "XML Bible" by Elliot Rusty Harold or "Professional XML" by Wrox as a couple of interesting books if you ever decide to revisit the subject.

      --

      No, Thursday's out. How about never - is never good for you?

    3. Re:Drawback of XML by Abigail · · Score: 3
      So, are you saying that you could manage slashdot's presentation controls on the client side with HTML and CSS?

      Well, yes. With 1996 standards even. It might of course be hard to find a popular browser that is even remotely up to date, but you can't blame HTML or stylesheets for that. And XML isn't a magic wand that suddenly makes browser authors do something "advanced" instead of going for the mass market appeal.

      -- Abigail

  158. Re:Already Happened by roundclock · · Score: 1
    This is correct

    Think of it as a "markup language" for MS products. If you are using different MS products, such as Office 2000 apps and IE5, they should know what the document is and "render" it correctly.

    This means a person can go to a web page generated by, say, MS Word 2000, and click on the edit button in the toolbar that has the Word icon. They can edit it, and save it again back to the web as you said.

    Try viewing that document in another browser! Does anyone know what happens when you view that same document with Netscape 6? I know all other browsers can't display it correctly. Even IE5 doesn't always, so go figure.

  159. Re:Small-minded viewpoints by roundclock · · Score: 1

    Better yet, learn the fancy words for the higher-ups, and know the technology for the real people doing the real work. Then you have the best of both worlds!

  160. Re:An open question by roundclock · · Score: 1

    No, SATAN is used to find exploits in networks or something like that.

  161. Re:Hey genius... by roundclock · · Score: 1
    Okay, this point can be argued.

    Office 2000 has a format that is supposed to be compatible with Office 97 and Office 95. You say this is good!

    Run some tests, look at the file size difference in this format vs. a 2000 format.

    (Hint) Buy larger and more file server hard drives, and mail server hard drives, etc. The difference is huge, literally! (Especially if the documents have images.)

  162. Re:XML Standards - possible, but MS won't allow it by roundclock · · Score: 1
    I did. The company I worked for said they have different locations that need the same format. This format has to work with other companies. These other companies use the same format with other companies, so they use this format, etc.

    MS has most companies' desktops. They have most companies' word processing applications installed. They now have most companies' browsers installed. Now, start combining these together, like Office 2000 does, and what will happen?

    Quite simply, you will have to NOT USE their OS, Office suite, and browser to some extent.

    Then the argument can be made that the cost of fixing these differences is more than a corporate license to use them, and then add the problems and confusion with other companies we do business with.

    Change has got to start somewhere.

  163. Re:Possible? Yes. Likely? No way. Here's why: by roundclock · · Score: 1

    HAH! That is funny. Is HAH! perfectly valid as well? Maybe the exclamation point is not?

  164. Re:Why not even html by roundclock · · Score: 1
    Seriously, a paper typed on a $15 typewriter from eBay and one from a $200 Microsoft Office suite: tell me, can you really tell the difference between them?

    Well, yes I can.

  165. Re:Why not even html by roundclock · · Score: 1
    I could say like others...

    I don't usually respond to trolls but...

    You're right I just did.

    XML allows companies and industries to make their own markup language that can be viewable by a standard application, the web browser.

    Arguments can be made about what a standard application is, and which browser you are using, but the point is that you can understand what the data is, where it came from, and how to use it.

    This is much more than Next Big Hot Cool Thing That Everyone's Doing So You Should Too.

  166. Drawback: size by LoonXTall · · Score: 1

    ". . . what are the drawbacks?"

    Size, although that was a problem anyway. I hate how Word files have many lines of garbage across the top. A blank (!) Word 95 file is 4.5 KB (saved by WordPad). But if you use XML, instead of saving the information in a proprietary bytecode (0x0C for bold+italic?), it must be saved in a human-readable tag, which would be 6 bytes in HTML, and probably longer in XML (without the abbreviated B and I). In a huge document, that would mean a huge overhead.
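
    One rough way to put numbers on that overhead, and on how much of it ordinary compression wins back. The tag names and the one-in-ten formatting density are arbitrary assumptions, and repetitive test data like this compresses far better than a real manuscript would:

```python
import zlib

WORDS = 2000

# Baseline: the same words with no markup at all.
plain = " ".join(["word"] * WORDS).encode("ascii")

# Wrap every tenth word in verbose, unabbreviated tags (hypothetical names).
tagged_words = [
    "<bold><italic>word</italic></bold>" if index % 10 == 0 else "word"
    for index in range(WORDS)
]
tagged = " ".join(tagged_words).encode("ascii")


def report(label, data):
    """Print raw and zlib-compressed sizes and return them as a pair."""
    packed = zlib.compress(data, 9)
    print(f"{label}: {len(data)} bytes raw, {len(packed)} bytes compressed")
    return len(data), len(packed)


plain_raw, plain_packed = report("plain ", plain)
tagged_raw, tagged_packed = report("tagged", tagged)
```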


    -- LoonXTall
    --

    ~~~LXT~~~
    Life is like a computer program: anything that can't happen, will.

  167. Not a panacea by StJefferson · · Score: 1
    While open DTDs should make data translation easier, there will still be competing data standards, issued by competing organizations, with competing priorities.

    Although it seems to have foundered in execution, Microsoft's BizTalk framework, with an emphasis on putting the DTDs out there for collaborative use, seems to be a step in the right direction.

    As to the original question, though... again, it's not likely that XML will resolve the difficulties -- everyone who takes a pass at a file format for a given type of data will have their own little spin on what should be in there...

  168. opacity and XML file formats by treedragon · · Score: 1
    All the following merely excerpts my 19April2000 rant on opacity in proprietary formats, and the effects of potential XML usage by Microsoft.

    [quote] The context for this page is best stated in my initial brief correspondence with Dan Gillmor. I previously described this page as an essay about ensnaring open systems with trace amounts of proprietary content, or about making systems so complex that only large teams can easily accomplish productive development. [snip]

    • All open formats which support general embedding can embed cryptic closed content as easily as transparent open content. This includes text formats like XML as well as binary formats like IronDoc.
    • Using an open format does not guarantee openness. Only a commitment to avoid closed content ensures openness. So using XML does not prevent embedding of cryptic proprietary content inside open standards.
    • Similarly, all open source projects can be used to write mysteriously complex systems as easily as lucidly simple systems. This includes proprietary systems that become open source as well as historically open systems.
    • Neither using open source nor converting to open source guarantees an operating system will lower barriers to entry for competitive development. Only a commitment to avoid complex and ever-changing systems will work. So opening source code does not prevent ongoing proprietary control.
    [/quote]
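
    A small sketch of the embedding point, using an invented vendor-data element and made-up payload bytes. Any XML parser will accept the file and hand back the bytes, but nothing in the format says what they mean:

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical vendor payload: bytes whose layout only the vendor knows.
secret_layout = bytes([0x0C, 0x00, 0xFE, 0xED, 0x42])

# Build a perfectly well-formed document around it.
doc = ET.Element("document")
ET.SubElement(doc, "para").text = "Readable by anyone."
blob = ET.SubElement(doc, "vendor-data", {"encoding": "base64"})
blob.text = base64.b64encode(secret_layout).decode("ascii")

xml_text = ET.tostring(doc, encoding="unicode")

# A third-party reader parses the file without any trouble...
parsed = ET.fromstring(xml_text)
recovered = base64.b64decode(parsed.findtext("vendor-data"))

# ...and gets the open content as text but the closed content only as bytes.
print(parsed.findtext("para"))
print(recovered.hex())
```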
    --
    Values have meaning only against the context of a set of relationships.
  169. Be careful when throwing around the word ignorance by Ian+Wolf · · Score: 1

    You are correct in that Microsoft is not completely to blame for getting "us" into this situation. In fact, Corel and Lotus didn't help much either. However, Microsoft eventually destroyed the product offerings of their rivals and established an "office document monopoly", rather than working with its rivals to establish a document standard, as any "responsible" business would have done.

    However, you are obviously missing the point of this discussion. Many customers/users are completely unaware of their alternatives. A common document format helps everyone, not just the MS Office, Star Office, AbiWord, GO, or WordPerfect users. It gives people all over the world the ability to share information as they intended regardless of the application they use.

    I'm sorry you can't get past the Windows/Linux posturing and realize the merits of a common document format.

    --
    "The words of the prophets are written on the Slashdot walls."
  170. Re:Already Happened by stephenbooth · · Score: 1

    I just want to add before everyone blasts MS for half-assing XML....XML wasn't even finalized when Office2K was in development. Not much you can do about that...

    /. is irrelevant.

    Microsoft are part of the group of companies who are building the XML standard, so they should know what the standard is. If they can't/won't wait for the final standard, or at least comply with the transitional one, then they should be criticised for it.

    /. is cool.

    Alyson Hannigan is beautiful.

    --
    "Don't write down to your readers, the only people less intelligent than you can't read" - Sign on Newspaper Office Wall
  171. Tools for the job by stephenbooth · · Score: 1

    The point of XML (as I understand it) is not to replace HTML as a method of presenting content in web browsers. It is a structured way of storing and transferring content in a standardised way, which can then have styles applied to it so that it can be presented in the best possible format for the end user's agent.

    An HTML page which looks good in a browser running on a PC won't necessarily look good on a WAP phone screen, a printed page or a WebTV, and it may not be suitable for an audio browser. If the content is stored in XML you can create different style sheets (XSL) to handle each user agent's quirks and foibles (or possibly use presupplied stylesheets if your XML complies with the DTDs the stylesheets were written in reference to) and present your content so that the end user sees it in the best possible format.

    Another usage is the transfer of data between companies. For example a product catalogue could be sent out in XML which the customer views (via an XSL style sheet) in their browser and selects the products they desire. The ordering system uses the same XML catalogue to process the order and create a stock pick order to the warehouse which in turn triggers the generation of an invoice and request for payment (probably also in XML) which can then be processed by the finance and credit control systems of the supplier and maybe even the customer.
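
    A toy version of that catalogue-to-invoice flow might look like the following. The element and attribute names are made up for the example; a real exchange would follow whatever DTD the trading partners agreed on:

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical catalogue vocabulary sent out by the supplier.
CATALOGUE = """<catalogue>
  <product sku="A100"><name>Widget</name><price>2.50</price></product>
  <product sku="B200"><name>Gadget</name><price>10.00</price></product>
</catalogue>"""


def price_list(catalogue_xml):
    """Map each SKU in the catalogue to its price."""
    root = ET.fromstring(catalogue_xml)
    return {
        product.get("sku"): Decimal(product.findtext("price"))
        for product in root.findall("product")
    }


def invoice_xml(catalogue_xml, order):
    """Build an invoice document from the same catalogue the customer saw."""
    prices = price_list(catalogue_xml)
    invoice = ET.Element("invoice")
    total = Decimal("0")
    for sku, quantity in order.items():
        amount = prices[sku] * quantity
        total += amount
        ET.SubElement(
            invoice,
            "line",
            {"sku": sku, "qty": str(quantity), "amount": str(amount)},
        )
    invoice.set("total", str(total))
    return ET.tostring(invoice, encoding="unicode")


print(invoice_xml(CATALOGUE, {"A100": 4, "B200": 1}))
```

    The invoice is itself XML, so the finance system at either end can read it the same way the ordering system read the catalogue.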

    As far as the average Buffy fan in the street is concerned, in producing their home page all they are probably going to need to know is the changes from HTML 4.0 to XHTML 1.0, like having to close tags and be a bit more disciplined in making their code well formed.
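
    A quick way to see what "close your tags" means in practice: feed the markup to any XML parser and see whether it accepts it. This sketch uses Python's standard library parser; the sample snippets are made up.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup as it stands."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# HTML 4.0 habits: <br> is never closed, which an XML parser rejects.
html4_style = "<p>Willow<br>Tara</p>"
# XHTML 1.0 style: every element is closed, empty ones with " />".
xhtml_style = "<p>Willow<br />Tara</p>"

print(is_well_formed(html4_style))
print(is_well_formed(xhtml_style))
```

    This only checks well-formedness, not validity against the XHTML DTD.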

    /. is great.

    Alyson Hannigan is a wonderful, talented actress.

    --
    "Don't write down to your readers, the only people less intelligent than you can't read" - Sign on Newspaper Office Wall
  172. Maybe it is time for a summit by Golias · · Score: 1
    It seems to me that what we really need is for all of the major software vendors (sans M$) to assemble at Camp David or somewhere with the aid of IEEE and hash out an agreed-upon document standard.

    Start by trademarking a catchy name for it, like "eWord". Choose a single set of rules for formatting documents. Each software vendor can create their own fancy tools and macros for creating this content, but only those that can be read on all competing systems will be allowed to call their product "eWord Compliant".

    Agree to meet every six months to update the standard. All updates to the standard should include a beta period so everybody can write patches for their prior software versions. Companies that do not keep their older versions compliant for at least three years lose the right to brand their current products as "eWord Compliant".

    Promote the hell out of the standard, like the way Intel ads used to be appended at the end of all PC commercials. For example, Corel's WP ads would include the blurb, "...of course Word Perfect is 100% eWord Compatible. Don't buy any text editor that is not eWord Compatible, or your customers might not be able to read your documents!" Star Office ads would say, "...not only does StarOffice read Microsoft formats, but it is also eWord Compatible, assuring you of seamless interoperability with your business partners." For once, people could spread FUD against M$ instead of for it: "Yeah, Word2004 is pretty neat, but it's not eWord compliant, so I can't really recommend it if you expect to interact with other companies."

    Heavy marketing, along with GPL'd translators between the new standard and M$ formats, would quickly change people's perceptions.

    Perhaps the same should be done for spreadsheets, day planners, etc.

    If I had the power and influence to get something like this off the ground, then that's pretty much how I would handle it.

    --

    Information wants to be anthropomorphized.

  173. Re:Absolutely! by Sawbones · · Score: 1

    To an extent maybe.

    There are so many minute adjustments a lot of print media require that I doubt a single DTD could handle every possible scenario. Yes, XML would work wonderfully for normal documents/manuscripts/memos, whatever, but I seriously doubt you'd want to use it for anything with any moderate amount of layout control. PDF/FrameMaker will reign supreme there for a bit longer, methinks.

    --

    Ad in classifieds: Pandora's Box (no box) $5
  174. DocBookX and CVS by Tobias_Ratschiller · · Score: 1
    I can completely understand that Word gave you nightmares. I'd never want to use Word for any longer professional text.

    That's one reason why I love XML. Using the DocbookX-DTD, you've all markup you need for technical documentation at your disposal. Combine this with CVS for version control, and you've a great base technology. Now use XMetaL (from Softquad, Windows only), and your authoring environment is complete with a graphical editor. For me, this has proved to be a dream team, and I've written a complete book and many articles and tutorials with this setup.

    The XML source can then of course be converted to HTML (integrated in XMetaL, using James Clark's XSL transformer XT and Norman Walsh's XSL stylesheets for DocBook), or to RDF, PostScript, plain text, PDF, etc., using Jade and DSSSL. This has worked fine during all stages of the authoring process, and I'd definitely recommend it if you can get your publisher/client to adopt it.

  175. Re:Ever hear of OpenDoc? by WebBug · · Score: 1

    The point isn't that an SQL implementation be only 100% ISO 92, but that it at least support 100% ISO compliance.

    Extensions are nice additions that add extra function and improve ease of use. We like extensions because they are what push the standard and lead us towards ISO2000.

    When certain companies add extensions, they have an execrable tendency to remove base compliance. MS SQL server does not support the base ISO 92 standard, nor does Oracle for that matter. Few companies do support the standard 100% but a notable few are DB/2 and mSQL.

    However, the open doc standard was designed in an attempt to create a standard for document interchange that was not proprietary to one company. Like Java, there was a real chance that the lives of users could have been simplified greatly.

    I support any standard that attempts to make computer use more transparent for the average person. After all, isn't that what all of us developers want anyway? More happy users?

    --
    Later . . . . . . WebBug // I don't really have 8 arms but . . .
  176. Ever hear of OpenDoc? by WebBug · · Score: 1

    Your real question is not how stupid you could be supplying a Word97 doc to a Word95 office but rather will there ever be a universal data format?

    The short answer is NO. It is not in the interests of Microsoft to support any standard that does not directly lead to more money in their pockets. You can bet that their support of any standard will include massive proprietary changes.

    SQL in MSQL is not SQL as written by ISO.
    Basic in Access Basic bears little or no resemblance to Basic.
    Java on microsoft isn't.
    FrontPage uses Exploder proprietary mark up tags that have nothing to do with HTML.

    If you have to pay money for it, Microsoft will support it. If it prevents Microsoft from forcing you to buy a new version of Word every year, you can bet Microsoft won't support it. DOJ or not!

    --
    Later . . . . . . WebBug // I don't really have 8 arms but . . .
  177. Re:XML does everything - whatever. by chompz · · Score: 1
    Anyone remember all the slaughter of good software design movements microsoft has made? About 20 years ago it was learned that having distinct stratification of complexity in the OS and software was best. Have a solid low-level engine which doesn't really do much, just deal with hardware (or the OS in user-space software) and then build the complexity on top.

    Microsoft's approaches are kinda like, well, MacGyver. Duct tape it and if it holds, leave it alone. In the end, this duct taping procedure will catch up (blue screens anyone) and when it does, the work involved in fixing it for real is astronomical. Ever wonder why everything is so messed up on the windows platform? In order to increase the rate of production, they cut as many corners as possible and hope it works.

    The point? Microsoft products use proprietary document formats for two reasons: it's better for their pockets, and it allows them to produce new versions at a faster rate. Need one new feature in word? Just add one rule to the word document format. The problem? Not a soul understands it because it is so hacked together. Coming up with an entirely new document format would lose all of the complexity they have developed one hack at a time and would cause a severe lapse in the release cycle for word/excel/office/whatever. It seems microsoft tries to have entirely new releases for all their products every few years, or as often as possible in some cases.

    So is that why win98 suffers from nearly all of the bugs of win95? Yup, they were never fixed, even though MS knew they existed. It's too much work to rewrite a troublesome section of code, so just leave it and hope that catching some error case is enough, even though the problem lies elsewhere.

    Serves them right to get busted for screwing over their customers.

    Never underestimate the difference it makes when a piece of software is re-engineered for its current feature set instead of just patching it along.

    --
    Spring is here. Don't believe me, look outside!
  178. XML?? by beefarino · · Score: 1

    Do we really need to invent another text markup language using XML? What about HTML? It's about as standard as you're gonna get. Plus I can post my documents on the web, and I can send them as emails. There are already a large number of high quality WYSIWYGs out there for HTML, plus most of the tech community is pretty fluent in HTML I imagine. Why reinvent the wheel... Beefarino

  179. Re:Already Happened by MelloDawg · · Score: 1

    I just want to add before everyone blasts MS for half-assing XML....XML wasn't even finalized when Office2K was in development. Not much you can do about that...

    --
    /. is irrelevant.
  180. Re:XML does everything - whatever. by Tom+Bradford · · Score: 1

    From the very first paragraph at www.w3c.org/xml:
    "The Extensible Markup Language (XML) is the universal format for structured documents and data on the Web."

    So... It's a 'format for structured data'. How is my saying "Data Representation Format" any different than that? I'll argue that it's not different at all, and that my only fault is not using "Format for Structured Data" instead of "Data Representation Format." from the very beginning.

    So I guess your real argument is with the W3C. Also, issue 4 of "XML In 10 Points" states "XML is a family of technologies", and that is how I was presenting it.

    So Thank you.

  181. Re:XML does everything - whatever. by Tom+Bradford · · Score: 1

    enables you to add markup to any document

    This is inaccurate. Adding XML to a Word Document, for example, would be of no value to an XML parser/validator. XML collectively IS the document, and isn't simply part of the document.

    And while we're on the subject of documents, most people who deal with XML in the realm of data feeds, which is probably where you'll find the bulk of XML applications outside of the web, would probably argue with calling XML a document, because they're using it to represent records, which on their own do not constitute a document. Data does not equal document, and the W3C was very careful to word their definition of XML in that way.

  182. Re:XML does everything - whatever. by Tom+Bradford · · Score: 1

    Well, my apologies for being summarily vague about XML technology in general. I suppose I just underestimated the necessity to explain clearly what XML is. Next time, I'll be sure to post a disclaimer of some sort, describing what XML is and drawing clear lines between the various facets of the technology. In this way, I will be less easily a target for trolls who are quick to point out even unrecognizable inaccuracies in order to compensate for their lack of experience or status in life.

    Tom Bradford (CTO) The dbXML Group

  183. RTF is sub-optimal by boback · · Score: 1

    It's really generous to say that MS programmers understand RTF. Having spent years working in the belly of the beast, using tools that attempted to convert RTF to other formats, I can confirm that it is really difficult -- if not impossible -- to get a mapping that works, even when the guy who supposedly REALLY knows is just down the hall. (As a matter of practice, not only does he not want to be bothered about work he did last year, he doesn't want to be bothered by small fry like putative customers, internal or external, period.) The problem with RTF, as another poster notes, is that it specifies text format, and doesn't have to be concerned with niceties like the end of some group of items. Getting a parser (other than Word) to identify the last bullet in a bulleted list, the last number in a numbered list and the like is difficult: The endings are ambiguous. A fundamental question here is: Does document structure matter? I think almost anyone in the document business would give a resounding yes to that question. And most probably would say that you're a fool to even ask that question, but I say sometimes it's good to get back to first principles. For the reader, document structure is usually signaled by text appearance. (I say usually, because some of you will have had to work with MilSpec docs, where text formatting is at a minimum.) It is really easier to apply appearance specifications to structure (e.g., SGML, XML) than it is to divine an intended structure from a set of appearance specifications (e.g., RTF). That being the case, it is easier to convert from one (structured) markup language to another, or from one DTD to another, or from a structured markup language with appearance specifications to a set of appearance specifications, and get a similar looking result -- a result where the format signals the structure to the reader -- than it is to convert from appearance specifications to structured markup. So what about conversions? Well, my history in this business tells me that they will go on from now til evermore. In a distant universe not so long ago, I spent a good deal of time writing a parser to convert Ventura Publisher (what's that?) to MIF. Now is my work being converted from MIF to XML? Could well be. A lot of the conversions -- especially from Word to other formats -- are going to be difficult, time-consuming and ugly. Better to begin with a standard that makes conversions as easy as possible, if you have the choice. And it's not RTF.

    1. Re:RTF is sub-optimal by boback · · Score: 1
      Let's try this again, with structure.

      It's really generous to say that MS programmers understand RTF. Having spent years working in the belly of the beast, using tools that attempted to convert RTF to other formats, I can confirm that it is really difficult -- if not impossible -- to get a mapping that works, even when the guy who supposedly REALLY knows is just down the hall. (As a matter of practice, not only does he not want to be bothered about work he did last year, he doesn't want to be bothered by small fry like putative customers, internal or external, period.)

      The problem with RTF, as another poster notes, is that it specifies text format, and doesn't have to be concerned with niceties like the end of some group of items. Getting a parser (other than Word) to identify the last bullet in a bulleted list, the last number in a numbered list and the like is difficult: The endings are ambiguous.

      A fundamental question here is: Does document structure matter? I think almost anyone in the document business would give a resounding yes to that question. And most probably would say that you're a fool to even ask that question, but I say sometimes it's good to get back to first principles. For the reader, document structure is usually signaled by text appearance. (I say usually, because some of you will have had to work with MilSpec docs, where text formatting is at a minimum.)

      It is really easier to apply appearance specifications to structure (e.g., SGML, XML) than it is to divine an intended structure from a set of appearance specifications (e.g., RTF). That being the case, it is easier to convert from one (structured) markup language to another, or from one DTD to another, or from a structured markup language with appearance specifications to a set of appearance specifications, and get a similar looking result -- a result where the format signals the structure to the reader -- than it is to convert from appearance specifications to structured markup.

      So what about conversions? Well, my history in this business tells me that they will go on from now til evermore. In a distant universe not so long ago, I spent a good deal of time writing a parser to convert Ventura Publisher (what's that?) to MIF. Now is my work being converted from MIF to XML? Could well be. A lot of the conversions -- especially from Word to other formats -- are going to be difficult, time-consuming and ugly.

      Better to begin with a standard that makes conversions as easy as possible, if you have the choice. And it's not RTF.

  184. Can XML Replace Proprietary Document Formats? by Arnaud+Sahuguet · · Score: 1

    There are some nice smart conversion tools available on the Web.
    Try
    http://wheel.compose.cs.cmu.edu:8001/cgi-bin/browse/objweb

    And the answer to the question is yes. Just be patient.

    Regards.

  185. why restricting yourself with xml? by ddec · · Score: 1

    LaTeX is so powerful and neat!!
    lyx(klyx) will help you (www.lyx.org)
    moreover you can generate several kinds of output

    o dvi then generate plain text or postscript
    o html using latex2html or hyperlatex
    and there is work for an XML converter
    o Pdf Oh yeah this is definitely my favorite
    use pdflatex (hyperlink bookmarks ....)
    handles pdf image, jpeg, png ...

    And the only things you need are

    xemacs+AucTeX+refTeX+x-symbol+TeTeX

    therefore choose debian as a linux distribution.

  186. Re:An open question by Anonymous Coward · · Score: 2
    The question is: Why do software consumers tolerate this?

    1. Because they don't know any better.
    2. Because people, as a general rule, resist change, even when it's for the better.

    The first point cannot be addressed by marketing or advocacy alone. MS has a hold over the market that is incredible, both in terms of penetration and mindshare. Every business executive has heard of MS. How many have heard of StarOffice? Or know that WordPerfect is still around? How many know what XML is, let alone that it exists?

    The second point is basic human psychology. We're comfortable with predictable patterns, with habits, with the devil that we know.

    Why do they blindly pay for new versions every few years when their current versions do everything they need and more?

    Marketing and habit. Marketing because everybody recognizes MS Office and dancing paperclips, not StarOffice or XML. Habit because everybody has been trained to believe that the latest version must be the best because it's the most recent. (Argumentum ad novum? I forget which logical fallacy it is.)

    People are afraid of change, to some extent or another.

    I wonder what we can do about that. Certainly crying and moaning about MS won't stop it. It's time to do something besides whine.

    www.alarmist.org

  187. Re:Nope by X · · Score: 2

    XML is a simplified version of SGML. Both of them are more than just parsers. You forget about the benefit of a DTD. By using SGML/XML and an appropriate DTD, you can ensure that document structure is not lost. XML in particular is great for handling tags that for whatever reason aren't even defined in a DTD.

    This has been a huge win for people using SGML for quite some time.

    --
    sigs are a waste of space
  188. WP formats by jd · · Score: 2
    Been tried, never worked too well.

    TeX/LaTeX is a system which supports macros, total system independence (both as source and compiled), has a wide range of fonts and has a degree of respect in the publishing industry.

    But it has ONE WYSIWYG editor with any popularity - Klyx - and is utterly unsupported by any common word-processor on the market today.

    Then there's RTF. Supported (some) on the PC, but keeps changing. The format's too unstable and too primitive to be usable, on any real scale.

    ASCII, itself, is supposedly a standard for information interchange (hence the acronym). But it doesn't have the range to be useful for WP and DTP any more. Wide ASCII has the range, but isn't widespread and is still far and away too limited.

    What's needed is a standard that will take the market by storm. It's no use simply being good, or even "the best, so far". That doesn't shift users, or software houses. What's needed is something so outrageous, so crazy, that it captures mindshare by force. Unfortunately, the only people who come up with ideas like that are all in sales, and the idea is a technological disaster, all round.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  189. Re:Learning XML, XHTML by stevelinton · · Score: 2

    The thing to understand is that, as someone has said, XML is not, in itself, a representation format for anything. Instead it (and its cousins XSL etc.) are a framework for representation formats.

    XML fixes a low-level syntax for identifying what is a tag, which is the matching close tag, etc. (like HTML without the meanings for the tags, and with some obvious (in hindsight) stupidities removed). By actually defining meanings for the tags, and specifying what tags are allowed in what contexts, you can then construct a representation format for some type of data.

    So what is the point, I hear you cry! Well the point is that

    1) The XML rules encourage you to be sensible in defining your format

    2) Applications handling different XML-based formats can share large chunks of parser code

    3) The common underpinning makes it much easier to work with hybrid documents that include data in multiple XML-based representation formats

    4) Some limited processing and checking can be done just with the raw XML and perhaps the formal format specification (the DTD).
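
    To make points 2 and 4 concrete, here is a small Python sketch: one generic routine that works on two unrelated XML-based formats without knowing what any tag means. The memo and sheet vocabularies are invented for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_counts(xml_text):
    """Count elements by tag name; raises ParseError if not well formed."""
    root = ET.fromstring(xml_text)
    return Counter(element.tag for element in root.iter())


# Two hypothetical formats that share nothing except the XML syntax.
MEMO = "<memo><to>Steve</to><body><para>Hi</para><para>Bye</para></body></memo>"
SHEET = "<sheet><row><cell>1</cell><cell>2</cell></row></sheet>"

memo_counts = tag_counts(MEMO)
sheet_counts = tag_counts(SHEET)
print(memo_counts)
print(sheet_counts)
```

    The parser and the tree walk are shared; only the code that interprets the tags has to be written per format.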

    Steve

  190. Re:An open question by Ed+Avis · · Score: 2
    Gosh, now I wonder what kind of certain source would generate PostScript that was so broken that a simple filter would be unable to do a 2-up transformation on it.

    'Broken'? You are missing the point. PostScript is intended as a programming language for telling the printer where to put ink on the page; if it does that then it is not 'broken'. A tool which formats PS files as 2up might make certain assumptions about the format of the file, but if those assumptions turn out to be wrong then the tool is broken, not the PS file.

    To make an analogy: suppose you ran the PostScript through a 'grep' program, which dumped core because it couldn't handle lines longer than 80 characters.

    --
    -- Ed Avis ed@membled.com
  191. Re:Biased rant from co-inventor of XML by Ed+Avis · · Score: 2
    Office2K will already save docs in a kind of bastardized HTML++ format which truly sucks because it is neither rules-following HTML nor well-formed XML

    You can use HTML Tidy to correct the HTML output by Word.

    --
    -- Ed Avis ed@membled.com
  192. Re:Good Idea but counter gratuitous complexity by Detritus · · Score: 2

    The Microsoft Word file format has been documented. An HTML version can be found here. The problem is that it is complicated, ugly, and dependent on OLE 2.0.

    --
    Mea navis aericumbens anguillis abundat
  193. Re:Already Happened by IntlHarvester · · Score: 2

    It's not really XML based... More like HTML with some XML-like stuff sprinkled in.

    Byte has a short description of the Word format that you might want to look at.

    I've looked a little at the Excel format. One thing that seems clear is that the O2000 formats are almost human readable. It shouldn't be that difficult for someone to whip up a converter -- well, it should be easier than parsing the binary formats.

    --
    Business. Numbers. Money. People. Computer World.
  194. CSS3 by IntlHarvester · · Score: 2

    Presumably you could handle most of the document formatting via vanilla XHTML and CSS.

    However, one problem is that there currently isn't a sufficient standard to support printed output, along with things like margins, page numbering, headers/footers, foot/end notes, and so on.

    An alert slashdotter pointed this out to me just the other day - the proposed CSS3 Paged Media Properties spec addresses most of these issues. However, it's not done yet, and has not been implemented anywhere that I know of. So, it might be a year or more before we have a truly open format that could be used for word processing programs.

    --
    Business. Numbers. Money. People. Computer World.
  195. Re:Why not even html by Graymalkin · · Score: 2

    It's called size, dude. For every HTML tag pair on a page you need at least 7 bytes: you have two <, two >, one / and at least two letters. In a binary document format you have only a single byte to specify a text format and then one to end it, and two bytes is a lot better than a minimum of 7. If you were typing up a really neat looking many-page document that you need to send over a network, every byte you save on size is less money it costs to transmit the document. For one person on a fat pipe it isn't that big a deal, but with lots of people the pipe gets a lot smaller. Word doesn't exactly save space but a well designed binary format would save a lot of space.
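
    The per-tag arithmetic can be checked with a few lines of Python. The 2-byte figure for a binary format is the assumption made in the comment above, not a measurement of any real file format.

```python
def markup_overhead(tag):
    """Bytes spent on an opening and closing tag pair, e.g. <b></b>."""
    opening = "<%s>" % tag
    closing = "</%s>" % tag
    return len(opening.encode("ascii")) + len(closing.encode("ascii"))


BINARY_OVERHEAD = 2  # assumed: one start byte plus one end byte

shortest = markup_overhead("b")      # single-letter tag, the minimum case
longer = markup_overhead("strong")   # longer names cost more
print(shortest, longer, BINARY_OVERHEAD)
```

    So the shortest possible tag pair costs 7 bytes, and the cost grows by two bytes for every extra letter in the tag name.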

    --
    I'm a loner Dottie, a Rebel.
  196. KOffice uses XML! by pointwood · · Score: 2

    The upcoming KOffice (http://koffice.kde.org/), which is going to be released together with KDE 2.0 sometime this summer, is using XML as its document format!

  197. MSFT sometimes "forgets" that feature by Zach+Frey · · Score: 2

    Here is your answer...File...save As...Word 95...end of question.

    If only it were that simple.

    Today, that shouldn't be a problem, but when Office97 first came out, Word95 was not an available "Save As" format. So, Word97 could read Word95 files, but couldn't output them. This "oversight" was fixed in the first Office97 service pack, but still ...

  198. Re:An open question by King+Babar · · Score: 2
    No, SATAN is used to find exploits in networks or something like that.

    Hmm...I think you're right. There's even an O'Reilly book about this if memory serves.

    So I guess it must be really be Microsoft, then. :-)

    --

    Babar

  199. Re:Already Happened by King+Babar · · Score: 2
    Some information is also stored in the HTML file in XML compliant tags to help the source app to provide it with further information about the original source document -- to make it appear seamless.

    Well, except if the Byte article is accurate about the fact that the "XML islands" aren't really quite XML either. :-(

    FrontPage strips these XML tags out of the HTML files and breaks round-tripping.

    Good God.

    This is so absurd, it...it has to be an accident. I mean, why in the world would FrontPage want to screw around with comments of any kind? Please tell me that it doesn't screw over script tags within comments, for example.

    I mean, that's almost as lame as patenting style sheets out from under the W3C, right?

    --

    Babar

  200. Re:An open question by King+Babar · · Score: 2
    Tools that take PostScript as input tend to be fairly fragile if they're trying to do anything beyond just rendering the document. "2up" converters often fail on PostScript generated from certain sources.

    Gosh, now I wonder what kind of certain source would generate PostScript that was so broken that a simple filter would be unable to do a 2-up transformation on it. Could it be the same outfit that can't make its word processor use the same format in consecutive versions? The same outfit that gave us the gratuitously extended character set known as Windows-1252? The same company that was a guiding force in W3C stylesheet discussions and then tried to patent the use of stylesheets? The same people who now claim to be on the XML bandwagon except that they fall off every half-mile when their stuff still doesn't parse as anything?

    Could it be... SATAN?

    --

    Babar

  201. Re:An open question by King+Babar · · Score: 2
    Gosh, now I wonder what kind of certain source would generate PostScript that was so broken that a simple filter would be unable to do a 2-up transformation on it.

    'Broken'? You are missing the point. PostScript is intended as a programming language for telling the printer where to put ink on the page; if it does that then it is not 'broken'. A tool which formats PS files as 2up might make certain assumptions about the format of the file, but if those assumptions turn out to be wrong then the tool is broken, not the PS file.

    Baloney. On at least two counts.

    Postscript was designed as a page description language that was purposely abstracted away from the act of putting ink or toner (or photons) on the "page". The appropriate use of postscript should allow a wide range of transformations on the graphics or pages so described; this is the whole beauty of the concept. Why on earth would you bother describing fonts as outlines if you weren't going to do interesting transformations on them?

    The second issue here is that Adobe has, for several years, documented a standard for the overall structure of Postscript documents that allows utilities like psnup to do all kinds of cool and useful things with postscript files. But to get the goodies, you have to follow the (fairly trivial) guidelines. (In brief: you have to have BoundingBoxes specified and use the special %%Page* comments correctly.) Postscript produced by many companies works great with things like psutils. But not Microsoft. And this isn't anything new; this crap has been going down for years. I'm willing to accept the possibility that it's just a stupid bug that nobody in Redmond wants to fix. But just because they don't fix it doesn't mean it ain't broken.
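
    For anyone who hasn't looked inside a PostScript file, here is a simplified Python sketch of the kind of scan a page-rearranging tool relies on. It is not how psnup or psutils is actually written, and the sample file is made up; it only shows why the structuring comments matter.

```python
# A made-up two-page file that follows the structuring conventions.
SAMPLE_PS = """%!PS-Adobe-3.0
%%BoundingBox: 0 0 612 792
%%Pages: 2
%%EndComments
%%Page: 1 1
(first page) show
showpage
%%Page: 2 2
(second page) show
showpage
%%Trailer
%%EOF
"""


def dsc_summary(ps_text):
    """Return the bounding box and the line index where each page starts."""
    bounding_box = None
    page_starts = []
    for number, line in enumerate(ps_text.splitlines()):
        if line.startswith("%%BoundingBox:") and bounding_box is None:
            value = line.split(":", 1)[1].strip()
            if value != "(atend)":  # the real value may be deferred to the trailer
                bounding_box = tuple(int(v) for v in value.split())
        elif line.startswith("%%Page:"):
            page_starts.append(number)
    return bounding_box, page_starts


bounding_box, page_starts = dsc_summary(SAMPLE_PS)
print(bounding_box, page_starts)
```

    If the generator leaves out the %%Page: comments or puts them in the wrong place, a tool like this has no reliable way to find page boundaries short of interpreting the whole program.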

    To make an analogy: suppose you ran the PostScript through a 'grep' program, which dumped core because it couldn't handle lines longer than 80 characters.

    Uh, and your point? Really, I must be missing something interesting about your world. In my world, if a program dumps core, the program is broken. Really, there's no grey zone here. Crashing == Broken. No matter what you think the proper role of PostScript is.

    --

    Babar

  202. Forward compatibility by Kaa · · Score: 2

    Let me tell you, it is painful watching a 3,000+ page Word97 manuscript, the fruit of weeks of hard labor, rendered into rubbish by my customer's Word95.

    Well, if your customer runs Word95, shouldn't you have checked this before spending all these weeks in Word97? Especially since you clearly spent some time rehashing in what format exactly the document should have been presented.

    And I can't really see this as a Word failing, either. You are asking for forward compatibility -- kinda hard to realize. It's unreasonable to expect a piece of software to be able to read file formats from its future versions (unless it's plain-vanilla or tagged text).

    Kaa

    --

    Kaa
    Kaa's Law: In any sufficiently large group of people most are idiots.
  203. Re:Hey genius... by Merk · · Score: 2

    Now how did this get moderated up??

    This guy is working in Word 97. Maybe he upgraded because he wanted new features, maybe he was forced to upgrade, whatever. Now he works in Word, wants to save a file, the native format is now Word 97, and being Microsoft they make sure that Word 97 files are not backwards compatible with Word 95. You can "save as Word 95" but that actually converts it to Word 95. Any time you convert something, the converter approximates how to do the thing you want in the new format. If you've spent lots of time positioning things and making it look good, it will now most likely look like crap.

    The AskSlashdot question was "Can XML Replace Proprietary Document Formats?", the problem with Word was the reason for the question. But hey, if Word works fine for you, feel free to keep on using it, dumbass.

  204. Tools, and Data vs. Presentation by dmorin · · Score: 2
    I think an important issue here is that XML's strength lies in the abstraction of data from presentation. I would hate to see an XML file that says &lt bold &gt Bold stuff &lt /bold &gt. That's what XSL is for! XML is for saying &lt title &gt This is my title &lt /title &gt and then having XSL say "title==bold, Helvetica, 12pt, blinking." So instead of having a generic word processor DTD, you need DTDs for "business letter", or "press release", stuff like that. You won't get everything broken down perfectly, but getting some of the structure specified ((book)(titlepage)(author)...(contents)(chapter)...) is significantly better than nothing.

    Now, having said that, tools become the next major piece. There is only 1 HTML -- but there are as many XMLs as there are DTDs. This is very intimidating. Nobody wants to write XML tags directly. They expect a tool to do it. Therefore, if you want to have your news department crank out press releases in XML format, you're going to need to supply them with a tool that specifies press releases in XML format. That means telling them where to type the title, the date, the byline, and so on....NOT giving them the same old word processor. And they're not going to like it.

    I'm dealing with this problem right now at work. We want all the departments to start sharing content. I convinced them that the first step in doing that is to get rid of the HTML and Word formats, and store things in raw XML, and then everybody can pick and choose what they want and slap their own look and feel on it. They all agreed this was a wonderful idea. Then somebody pointed out that they'd have to start creating their content using this new format, and they said "Oh..uh...that sounds like a lot of work.....no."

    -d

    (P.S. - And why the hell doesn't plain text formatted messaging work?? Do you know how much of a royal pain it is to talk about XML without being able to use angle brackets?!)

  205. Re:An open question by plunge · · Score: 2

    This is only true for a format that was really poorly thought out in the first place. It's like database design - you have to think about how additions and deletions will affect the overall structure, and make it efficient enough so that changes won't cause anomalies in the future. Word formats are so poorly done because they take so much for granted - they are written with a certain set of features in mind, with little proper room for expansion. Good format programming, on the other hand, tries to think of features in the abstract - builds functions that are reusable and easily combinable - in short making everything modular. This is one of the ways you can tell a good programmer from a lousy one - do they get the job done merely competently (Microsoft Word) or do they get the job done RIGHT (something like XML, which is very easily extensible)?

  206. Content-free language by David+A.+Madore · · Score: 2

    I must say I really don't understand the hype about XML. It's certainly progress wrt SGML, because it's free (in the sense that the standard is free, as in speech, by the W3 Consortium, whereas SGML is an ISO standard). But technically I find its usefulness questionable. It is a completely content-free language, for one thing. Not that this is a major defect, but it's certainly not something to be enthusiastic about. And despite its claim to simplicity, it seems like we still don't have one free (as in speech), validating XML parser library that also doesn't fsck everything up in its handling of Unicode (last time I checked, the libxml from the W3 Consortium was completely broken in this respect and expat was hardly better; even the SGMLtools don't really validate XML, since they only validate it as SGML).

    All right, let's admit it's a general-purpose, content-free, easy to parse markup language. And so what? Didn't LISP sexps exist long before this? They are exactly that, and they're far simpler than XML. I still don't see it.

  207. Re:Reinventing TeX by David+A.+Madore · · Score: 2

    TeX is not a format, it's a language. Or, if you want, it is a format that has only one implementation and that is defined by it, and that's bad. Furthermore, TeX is a very old standard and it's quite painfully apparent. Try to write a context-free grammar for TeX, for example: you can't (and Knuth deserves serious blame for this, since he is the one who suggested the name "Backus-Naur form"). TeX is not semantic, it's presentational: another bad point (LaTeX does not have that disadvantage, at least). TeX is Turing-complete, which for a description language is a bad thing (for an input language it's good, obviously). Last but not least, TeX does not support Unicode (the specially modified version of TeX which does, Omega, is progress, but I still don't think it's good enough).

    Sorry, no cigar. There are plenty of reasons to prefer anythingML over TeX.

  208. DocBook by Hard_Code · · Score: 2

    Um, there already is a standard SGML/XML documentation markup language: DocBook

    http://www.oreilly.com/catalog/docbook/chapter/book/docbook.html
    http://www.oasis-open.org/docbook/
    http://www.docbook.org/

    --

    It's 10 PM. Do you know if you're un-American?
  209. Seeking XML alternatives? by Tackhead · · Score: 2
    IMHO, yes, XML can replace proprietary binary formats, but only insofar as the authors of editing software are willing to release not only XML but the DTDs as well.

    As long as the DTDs remain locked up in the software, you're fscked.

    I'm presently quite happy with Framemaker (proprietariness be damned, at least I can count on a Frame doc written under Solaris to render the same way on NT or Mac, whereas with M$-Turd, I can't even depend on the same goddamn file to page-break the same way on two NT boxen!) to generate PDF and WebWorks Publisher for batch HTML conversion, but am becoming increasingly open to alternatives.

    Fsck Microslut's half-baked excuse for XML. They're not interested in anything more than lining their own pockets and reinforcing the Orifice monopoly. Interoperability is not in their vocabulary. Scalability never was. The only good use for M$Turd is for writing one-page memos. (If you're a technical writer, I point out that at least this is long enough to write a letter of acceptance for a new job, and a letter of resignation to any manager dumb enough to use "but Office is the corporate standard, and we've already paid for it" as an excuse to take away professional authoring tools.)

    Rant off. Where the hell was I going with this? OH yeah...

    I may soon have the budget for a pure XML solution, does anyone know anything about ArborText? Looks bloody promising, and appears to offer easy integration with the DocBook DTD as a sweet bonus.

  210. Re:Absolutely! by mhatle · · Score: 2

    In my understanding of XML, DTDs are inadequate. Unfortunately all a DTD does is provide a grammar (think of a compiler [yacc]) that can parse the document and tell you if it's legal or not.

    What is really needed is a schema which indicates data types, and much more information.
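
    To illustrate the gap, here is a small sketch under stated assumptions: the `invoice` vocabulary and both rule tables are invented for the example, and neither is a real DTD or schema language. The first check only asks whether the right elements appear in the right order, which is roughly what a grammar can say; the second asks whether the text inside them parses as the expected data type.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: structurally fine, but "twelve" is not a number.
DOC = "<invoice><quantity>twelve</quantity><price>9.99</price></invoice>"

# Grammar-style rule (roughly DTD territory): which children, in which order.
ALLOWED_CHILDREN = {"invoice": ["quantity", "price"]}

# Schema-style rule: the data type each element's text must parse as.
TYPES = {"quantity": int, "price": float}


def structure_ok(root):
    """True if the root's children match the expected element sequence."""
    return [child.tag for child in root] == ALLOWED_CHILDREN[root.tag]


def type_errors(root):
    """Return the tags whose text does not parse as the declared type."""
    bad = []
    for child in root:
        try:
            TYPES[child.tag](child.text)
        except ValueError:
            bad.append(child.tag)
    return bad


root = ET.fromstring(DOC)
print("structure ok:", structure_ok(root))
print("type errors:", type_errors(root))
```

    The document passes the structural check and still carries a value no program can use as a quantity, which is the kind of error a grammar alone lets through.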

    The point of XML is to "document" a proprietary format in a non-proprietary way. There is very little specified in the XML standard other than how to tag something. The tags are up to the vendor, and the hope is that the tags are human readable (and understandable).

    --Mark

  211. Re:Biased rant from co-inventor of XML by Spyky · · Score: 2

    First off, while there's a place for MS Word, a 3000-page document ain't it. In my experience it tends to severe breakage in this situation.

    Amen to that. I wouldn't trust Word for anything much over 20+ pages.

    I know several other people have mentioned it, but LaTeX is seriously the way to go. I know it has somewhat of a learning curve, and there aren't any really good/stable gui front ends for it. But it works. It doesn't eat your data, and it makes elegant formatting easy, and as long as you use your definitions properly, it's very easy to change entire document formats. Added benefits are available dvi->HTML converters, so you can make your document available in an easy-to-read, web-accessible format, and also have a hard copy with nice postscript fonts.

    I used LaTeX to document a year long project for a CS course that I worked on. It took all of a week to learn everything I needed to use and become very proficient with the system. I have nothing but good things to say about it.

    Also several very large and respectable publishing companies (Addison-Wesley for one) use LaTeX almost exclusively for their typesetting. In fact Addison-Wesley's "Latex users guide and reference manual" is a simple resource for LaTeX. Trust me, it's worth learning a system that works NOW. Sure XML has some benefits, and hopefully we'll see some systems that really take advantage of XML formatting, but for now, there just isn't much out there. Trust your 3000+ page documents to a system that's been in use since the '80s, you can't go wrong.

    Oh, and the added benefit, it's free :-)

    Spyky

  212. Re:Absolutely! by Abigail · · Score: 2
    If the word processor "industry" were to get together to support a single DTD (Document Type Definition) so that everyone would know how to react to specific tags then you could have a format that any WYSIWYG editor would render correctly.

    That would of course be silly and pointless. That's like saying "ah, now that we have lex and yacc, let's hope there will only be one programming language, supported by all compilers".

    One DTD to do it all will lead to bloatware. I don't think anyone is waiting for that.

    -- Abigail

  213. Re:Yes and No by Abigail · · Score: 2
    XML is not a formatting language: it's a content marking language.

    No, it's not, for the same reason BNF isn't a programming language. XML is a way of formalizing content marking languages. XML is a meta-language.

    -- Abigail

  214. The value of human/script readable data by dsplat · · Score: 2

    Just this afternoon, I generated an HTML document with some on-the-fly keyboard macros in Emacs. I needed to produce a simple table that correlated the error codes our software received from a server with the error codes we returned to the user. I could have cut and pasted that by hand, but what I did is almost certainly complete and correct. I wouldn't be confident of that if I had pointed and clicked for a couple of hours. Besides, the whole thing took less time this way.

    The assumption that you will use a particular application to manipulate data is a poor one. It limits you to the capabilities that it provides. Word processors generally provide limited or non-existent scripting capabilities. So when you want to automatically generate tables in a document from some other files, you are stuck doing it manually. That is a recipe for documentation that is out-of-date and full of errors.
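
    The same kind of job can be sketched in a few lines of script. This is not the Emacs macro from the post; the error codes and the `error_table` helper are hypothetical, and the only point is that a table generated from data stays complete and can be regenerated whenever the data changes.

```python
from html import escape

# Hypothetical mapping: server error code -> code returned to the user.
ERROR_MAP = {
    "SRV_TIMEOUT": "E100",
    "SRV_DENIED": "E403",
    "SRV_<UNKNOWN>": "E999",
}


def error_table(mapping):
    """Build an HTML table with one row per (server code, user code) pair."""
    header = "  <tr><th>Server code</th><th>User code</th></tr>"
    rows = [
        f"  <tr><td>{escape(server)}</td><td>{escape(user)}</td></tr>"
        for server, user in sorted(mapping.items())
    ]
    return "\n".join(["<table>", header, *rows, "</table>"])


table_html = error_table(ERROR_MAP)
print(table_html)
```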

    --
    The net will not be what we demand, but what we make it. Build it well.
  215. Reinventing TeX by dsplat · · Score: 2
    When I was first introduced to SGML a decade ago, I remember appreciating its merits, but asking what it offered that TeX didn't. Yes, HTML offers us links, which TeX didn't. But I've watched people discover the reasons that drove me to start using TeX for documents with long lifetimes or automatically generated content. Its format is:

    • human-readable
    • portable
    • fully documented
    • consistent from release to release


    If you have any documents generated with early word processors, can you still read them with anything?

    I don't mean to say that SGML, HTML, XML or FOOML is a bad thing. But they are simply another way of giving us what we've had with TeX for years, with a few enhancements. Let's remember TeX's strengths and not allow them to be lost with newer tools.
    --
    The net will not be what we demand, but what we make it. Build it well.
  216. Page Layout vs. Content Description vs. hybrids by billstewart · · Score: 2
    These battles have been fought for a long time :-)
    1. Types of Document Markup Languages

      • Content Description Languages (CDLs)

        HTML, XML, and their parent SGML are content description languages - they describe document content entities such as paragraphs, headings, lists and tables, but don't describe how to make black marks on paper or RGB marks on CRTs.

        They're well-suited to automated layout programs, letting the document reader determine how to lay out the visuals, which may be different depending on whether the target reading environment is hi-res dead tree phototypesetters, medium-res CRTs, text-to-speech readers, braillewriters, cell-phone microscreens, PDA mini-screens, dumb terminals, browsers with images turned off for speed or font set really large for low-sight people or really small for pocket-sized printouts, etc. SGML was the original flexible metalanguage; HTML is a simplified static instantiation, as are the cell-phone variants, and XML is a newer SGML variant that's learned from 15 years of real-world experience.

        They're also well-suited for automated content handling, such as the XML developments that are replacing EDI for applications like purchase orders. Editing CDL documents is easy, as long as you stick to the defined structures.


      • Page Description Languages

        PDLs let authors tell the computer how to make documents look the way they want them to, and programs that process them make various compromises to support different presentation media, such as paper or CRTs. They range from things like Postscript, which forces the display to do the best job it can rendering an image that the authoring program specifies, to things like MSWord, which let the display device determine layout, and reprocess the entire input document any time you change printers. They also range from higher-level systems which know a lot about the document structure to lower-level systems that know about output but know next to nothing about structure - "systems" includes the application programs as well as the representation language.

        All of them have the problem that if you want to edit the contents, the output looks different so they have to cope with what you've done. Depending on the document structure and app design, they may have to repaginate the entire document, or only up to a chapter/section break, or they may be crude and only patch the current page and force you to renumber the rest if you want.


      • Hybrids

        Lots of authors want to specify the output appearance, regardless of whether this constrains the readers' choices - you can see hybridization like this crunched into HTML, with commands for fonts, font sizes, colors, and the newer cascading style sheet stuff. It's possible to do this in ways that preserve content - the language represents that this is a "Heading Type 2", and instructs that "Heading Type 2" be represented in Palatino Bold Blinking with a full line-break after the heading text. It's also depressingly common to lose content structure information, especially during translations, either because the target language doesn't have a mechanism for representing the content tags, or because the translator writer JUST DOESN'T GET IT. An example of the former problem is rendering for constant-width ASCII or for GIFs or Faxes. A depressingly stupid example of the latter is saving MSWord documents in HTML - MSWord knows about objects like headings and paragraphs, and knows that the current user's settings for a "Heading 2" object and "Normal Paragraph" object are 14-point Arial Bold followed by a single blank line and 10-point Times Roman followed by a single blank line - but instead of outputting an H2 heading object, a P paragraph marker, the text, and another P, the depressingly stupid program outputs a request for 14-point Arial Bold font, the header text, a couple of BR spaces, a request for 10-point Times Roman font, the paragraph text, and some more spaces.
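
        A minimal sketch of the structure-preserving export argued for above. The in-memory document model (a list of style-name and text pairs) and the `STRUCTURE` table are assumptions made for illustration; this is not how MSWord stores or exports anything. The exporter maps each named style to a structural HTML element instead of emitting font requests.

```python
from html import escape

# Hypothetical in-memory document: (style name, text) pairs, roughly what a
# word processor that knows about headings and paragraphs could hold.
DOC = [
    ("Heading 2", "Installation"),
    ("Normal", "Run the installer & reboot."),
]

# Map style names to structural HTML elements rather than font settings.
STRUCTURE = {"Heading 2": "h2", "Normal": "p"}


def export_structural(doc):
    """Emit one structural element per block, keeping the content tags."""
    lines = []
    for style, text in doc:
        tag = STRUCTURE[style]
        lines.append(f"<{tag}>{escape(text)}</{tag}>")
    return "\n".join(lines)


html_out = export_structural(DOC)
print(html_out)
```

        How a "Heading 2" should look can then be left to the reader's browser or to a style sheet, so the structure survives the translation.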


    2. Application Program Dependence

      Application Programs can do lots of different things with PDLs or CDLs. For instance, you can use comments to put non-printed document structure information into PDLs, or to put layout information into CDLs, and programs that know about it will use it, while programs that don't know about it will ignore it. That doesn't mean there are any industry standards about doing this, so of course one editor may stomp all over another's markings, or may leave them in place while adding things the first program doesn't notice because the comments weren't updated. Postscript is an egregious contributor to this - it's an extremely general-purpose programming language, and there are lots of different ways to get the same set of black marks on paper, ranging from bitmaps to format-annotated CDLs, and almost no two applications can read each other's Postscript.


    3. Production Software

      • Framemaker

        While PDFs are both liked and disliked because they are designed not to be editable, I'm surprised your customer couldn't accept FrameMaker. It's one of the best WYSIWYG large-document production systems I've seen over the last decade, and if the customer wants to export pieces into MSFoo, they can, but if they edit the entire document, they probably should have enough control over the process that they can buy a few copies of Frame for what's basically a trivial addition to the cost they've already paid for producing 3000 pages of documentation. Also, to do big documents, you need tools that can cope with multiple authors working simultaneously, and Word isn't really designed for that.

      • Alternatives

        If you're doing 3000 pages of documentation, or for that matter 300, and you're not a graphic arts shop or something, it's probably going to be mostly text, or text with user-interface illustrations, and you're going to use a uniform formatting style for the whole thing, modulo a few tweaks for illustrations that need to be placed on a page to fit together with an occasional tweak. I'd think the best approaches these days are either to build the thing in hypertext to start with, or else use some of the tools from the GNU / Emacs world, or else use a batch production system like LaTeX or troff with an appropriate macro set.
        Learning HTML was so easy, since it looked a lot like the Troff -mm Macros :-)

        It's been a long time since I've been part of anything over a hundred pages or so; the last time I was on a very large RFP response project, the boss had some kinky troff macros and basic shell mungers that let us keep the entire document in a database, so we could track which of the N thousand requirements were being handled by which authors, in what files, patch up the figure and page-number references (really a two-pass process), and build the indexes to the document and the crossreferences to the RFP requirements document it was a response to.



    Arrrgh - Slashdot doesn't let me use H1 and H2 :-)
    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  217. XML is the way we should go! by Maul · · Score: 2

    I'm not saying it will be, but it should. XML is very nice, in my opinion. If everyone used XML for their formatting, we would have a standard, cross platform, well designed DTD that everyone could use. None of this Word 95 eating a Word 97 document crap that goes on when people use Microsoft Word. If every word processor had some sort of built in XML support, there could be a much greater share of information in this sense. I'm not sure if we'll see XML replace proprietary document formats any time soon. It would be nice to see it, and it could certainly handle the job quite well. However, I'm pretty sure MS especially won't want to let go of their format.

    --

    "You spoony bard!" -Tellah

  218. TeX coolness by Greyfox · · Score: 2

    I use LaTeX for just about everything these days. It can generate PDF files (with links). A small perl script can turn it into HTML. It can be easily rendered into XML (or XML into LaTeX) as well. Its output is far nicer than most WYSIWYG word processors I've seen. It makes generating tables of contents, lists of references and indices simple. And let's not forget that it's the ONLY thing you'll want to use if you're writing a math text book.

    --

    I'm trying to teach myself to set people on fire with my mind... Is it hot in here?

    1. Re:TeX coolness by Greyfox · · Score: 2

      That's been done too. Check out Lyx or Klyx. I prefer to work at the raw LaTeX level, but some people don't. If you haven't checked out these two programs, I suggest that you do (They'll be over on freshmeat.)

      --

      I'm trying to teach myself to set people on fire with my mind... Is it hot in here?

  219. Why XHTML? by spiralx · · Score: 2

    Hmm, I'm not amazingly well read up on these things myself, but it was my view that XHTML is a well-formed version of HTML. It's really the follow on to HTML 4, but instead of being HTML 5 it will be XHTML 1.0. It is designed to be portable across all kinds of applications and platforms, and the fact that it is well-formed means that the applications required to view it will be simpler than today's browsers. Also, since it is XML-based, new elements can simply be added to the existing DTD rather than having to rewrite it from scratch.

    Anyway, what is XHTML and how does it differ from HTML? Well, the following points apply:

    • All XHTML tags and attributes must be lowercase.
    • All tags must be closed. This means that empty tags which are currently left unclosed must close themselves. Also, nested tags must be closed as well, and closed in the order they were nested in.
    • Attribute values must be quoted. So no more BORDER=0 - it must become border="0".
    • Attribute value pairs cannot be minimised. So an attribute can no longer appear as a bare name; it must always be written out with a value.
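
    Most of these rules are ordinary XML well-formedness, so a stock XML parser can check them. The sketch below uses Python's standard `xml.etree.ElementTree`; the sample fragments (`br`, `table`, `option`) are illustrative choices, not the examples from the post. Note that the lowercase rule is an XHTML requirement rather than a well-formedness one, so this check does not cover it.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """True if the fragment parses as XML, False on a parse error."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Illustrative pairs: an XHTML-style form followed by its old HTML-style form.
FRAGMENTS = [
    '<p>one<br />two</p>',                      # empty tag closes itself
    '<p>one<br>two</p>',                        # unclosed empty tag
    '<table border="0"></table>',               # quoted attribute value
    '<table border=0></table>',                 # unquoted attribute value
    '<option selected="selected">x</option>',   # attribute written in full
    '<option selected>x</option>',              # minimised attribute
]

results = {fragment: is_well_formed(fragment) for fragment in FRAGMENTS}
for fragment, ok in results.items():
    print("ok " if ok else "BAD", fragment)
```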

    For an overview of all things XHTML, see here.

  220. Yes and No by ^ · · Score: 2

    It depends what you mean by using application independent formats. XML is not a formatting language: it's a content marking language. Formatting is then left to XSL or some equivalent language. Certainly MSWord could manipulate documents in an XML format; in order to display them and make them pretty, however, it would need to translate them, perhaps to its proprietary binary format.

    The question then arises as to whether some standardized DTDs will appear, which then word processors could recognize and have their own XSL templates for. I'd argue this won't happen; more likely, MS would come up with a DTD, and others would follow.

    On one hand, professional work would certainly stay within the ideas presented by a DTD (this is a paragraph: see, it's indented). On the other, random users wouldn't; they desire a final look, not a semantically consistent document. Someone who wants to indent a paragraph might use the wrong word processor feature; it's an ordered list of one element with no bullets. Put it into another program, and it might look entirely different.

    1. Re:Yes and No by Antaeus+Feldspar · · Score: 2
      It depends what you mean by using application independent formats. XML is not a formatting language: it's a content marking language. Formatting is then left to XSL or some equivalent language. ... random users wouldn't [mark up according to content]; they desire a final look, not a semantically consistent document. Someone who wants to indent a paragraph might use the wrong word processor feature; it's an ordered list of one element with no bullets. Put it into another program, and it might look entirely different.

      Actually, I fail to see why this would be so terrible.

      The first reason for its non-terribleness is that it's what people do now... end users use spreadsheets for databases, databases for mail merges, mailing label files for databases. We can encourage them in better habits, but we would be unwise to declare "Oh, no, we can't give you powerful tools, since you'd just use them wrong."

      The second reason actually extends the first reason: sometimes doing it wrong is actually right. Namely, because doing it 'wrong' can produce something acceptable, fast, while doing it 'right' requires more time and care and installed base of expertise, all going into a product that may be used once and thrown away.

      The third reason is to correct a strange blind spot that may originate from the perception of what the Microsoft Way is and how the Right Way must be diametrically opposed. Namely -- if the end user is interested in a final look, why is that an inappropriate use of XML?? Isn't the point of XML to be the mark-up language that can be adapted to any domain? I don't see why we should insist that 'the final look' is an exception, for which the only correct answer is XML plus XSL...

      What's the best programming language? Trick question; a bondage-and-discipline language may be the best for writing code that someone else will have to maintain, but it's probably not best for a quick-and-dirty one-shot or for prototyping. There's no one right language. And just the same, there's no stone-graven law that says an XML editor will always be better than a word processor (or a word processor than a text editor). Having a choice would be better than what we have now... and having an XML-based word processor format would be better than a proprietary binary word processor format.

      --
      If people are to respect the law, perhaps the law should begin by respecting the people.
  221. XML is not for humans by Animats · · Score: 2

    I don't see XML as a formatting language at all. XML is most useful as a way to transmit data which will be read by a program, not a human. It's best for pumping invoices, purchase orders, customs documents, insurance claims, and similar forms-oriented documents around. Backers of XML keep talking about it as a formatting language for web browsers, but that's really a side issue.

  222. Re:Why not even html by jbarnett · · Score: 2


    Multiple column text.

    read "TABLE" tag.

    Footnotes and endnotes.

    Read "FONT SIZE=1" and "I"

    Automatic section numbering.

    uh what? an html editor would be able to automatically add page numbering, not sure what section numbering is

    Automatic generation of tables of contents and indexes.

    Read "TABLE" again

    Extensibility with macros.

    Any decent GUI html editor would be able to add this feature

    Precise control over how your document will be formatted on the printed page.

    Any decent GUI html editor would be able to do this.

    Need I continue?

    More than likely, I still don't see your point. Ok, let's assume Word97 is the greatest word processor of all time, say it has the perfect user interface. Ok, now on the back end, rip, tear and pull out that proprietary format. Ok, so you just have the front end now, right? Ok, when it tries to save the file, have it push everything out into html instead of the Word97 DTD. Ok, so you have an HTML file pushed out, but the user doesn't know it is an html document, pretty neat huh?

    Ok, so html is a little bloated, let's use a freely available compression program to compress that sucker. Have it save to disk in html, compress it, without the user even knowing about it.

    Then what do we have here? Bob has a file called `Resume.gz`, smaller than a word document, more portable than a word document (you can find gzip on almost any machine) but it looks exactly like a word document and the user doesn't even know he is saving out into an html format. What are the disadvantages here? It makes the admin's life easier: when Bob says "Oh damn, I'm on a Mac/OpenVMS/Unix/DOS, I need to find a Word97 machine" his friend can say "Wait Bob, don't be silly you old gus, this machine can read your document, the days of proprietary DTDs are gone my friend"
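
    The save-as-HTML-then-compress step is easy to sketch with Python's standard `gzip` module. The resume text below is a made-up stand-in for whatever the editor would write out, and the size saving shown depends on how repetitive the markup is; this is an illustration, not a measurement of real documents.

```python
import gzip

# Made-up stand-in for the HTML an editor might write out on "save".
html_doc = (
    "<html><body>"
    + "<p>Experienced penguin wrangler.</p>" * 200
    + "</body></html>"
)

raw = html_doc.encode("utf-8")
packed = gzip.compress(raw)                          # what would go to disk
restored = gzip.decompress(packed).decode("utf-8")   # what the editor reads back

print(f"{len(raw)} bytes of HTML -> {len(packed)} bytes gzipped")
```

    The round trip is lossless, so the user never needs to know the file on disk is compressed HTML.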

    --

    "`Ford, you're turning into a penguin. Stop it.'" -THHGTTG
  223. Why not even html by jbarnett · · Score: 2


    Why not even html 4.0 for a standard DTD? Seriously, how much stuff do we REALLY need shoved into our bloated word processors/documents anyway? HTML 4.0 can do tables, images, highlights, points, etc, etc. Seriously, give me a standard Word97, Star Office, Applix, or Word Perfect document and tell me that something in it can't be done in straight html. Give me any document and I bet I could convert it into straight HTML and have it look exactly like the output of the $250 office suite.

    How much useless crap do we REALLY need in documents? Seriously, a paper typed on a $15 typewriter from ebay and one from a $200 Microsoft office suite: tell me, can you really tell the difference between them? We need to write papers, and have these papers look professional, and that is it, and you are telling me this can't be done in straight HTML?

    Hell, HTML is everywhere, take your document, edit it in Hot Dog Pro and view it in Netscape under Windows. On a Unix machine? Use vi, bluefish and Netscape or kfm as the viewer. Got a mac, uh, I know for a fact there is Mac HTML editing software, just don't know its name at the moment. See where this is going?

    Create your documents, upload your documents to a web server, or put them on a floppy disk and I guarantee that everyone on any decent platform will be able to not only view, but edit these documents.

    Does anyone else remember the KISS theory in "intro to CS"? Keep It Simple Stupid. Everyone can make good use of straight HTML and it looks damn good, what is the problem?

    If you want to test this out, the next time you have to write a document (school paper, resume, etc) do it in straight HTML. Then open up your favorite graphical browser and click on "print", hand in the paper, does the other person notice? If no, this is a good idea, if yes, this idea sucks.

    Seriously, name one feature in Word97, Star Office, Word Perfect that couldn't be done in a nice GUI html editor. Just name one, one example.

    When I had to start writing papers, my friend told me "No you can't write documents in vi, here use this". I load up Word95, and after it takes fifteen minutes to load and grinds the hard disk the entire time (lack of memory, swapping a lot) it basically gave me the function to put "Bullets" and "underline". Yeah, why the hell can't I do that in vi and html? Print it in Netscape and it will look exactly the same. The difference, though:

    1) the document is portable
    2) the document can be uploaded to the web without any modifications
    3) I can use any standard ASCII or html editor on any platform to edit it
    4) the document size is smaller
    5) I have html bragging rights
    6) the document looks just as good as any non-portable proprietary document
    7) I don't have to pay $200-500 for a word processor

    I have thought about this for a long time and haven't really seen any advantages in using a word processor over html. You may argue, "they are 100% easier to use", but you could make a GUI html editor just as easy to use, just 'hide' all the tags from the user and have all the same buttons. It's just that most (or some) html editors don't have spell checkers, etc. But how hard would it really be to build these into a GUI HTML editor? Very simple.

    I don't mean to be flaming here, but I honestly don't see any advantages of other word processors (hell, you could use these as the front end and just have it "spit" out html in the background when the user is not looking)

    It sounds so simple, please enlighten me if I am wrong here.

    --

    "`Ford, you're turning into a penguin. Stop it.'" -THHGTTG
  224. You have it backwards. by small_dick · · Score: 2

    The issue is not whether we should use open document formats.

    The issue is a failure of the global populace to standardize their computing platforms.

    If you people would just keep your software updated and standardize on Microsoft products, this would be a non-issue.

    If all computer users would just vote republican in November, we could get a decent man in the White House who will put a stop to the DOJ madness, and show that twerp RMS the true meaning of freedom -- the freedom to use a single product and vendor.

    If you could just put your personal issues aside and trust Bill Gates, everything will run smoother, and people will love and smile again.

    --


    Treatment, not tyranny. End the drug war and free our American POWs.
    See my user info for links.
    1. Re:You have it backwards. by small_dick · · Score: 3

      my god, at least you realized it was a joke. what are the requirements to moderate? here they are:

      1) Recent lobotomy (credit for ECT)
      2) Totally humorless (credit for cluelessness)
      3) Blind
      4) Stupid
      5) Poke at keyboard with cane.

      it was soooo obviously a joke...oh well. knowing my luck i'll probably end up trying to teach one of these chumps to program one day...take a deep breath...start over at the beginning...keep trying to break through...arrrrgh.

      --


      Treatment, not tyranny. End the drug war and free our American POWs.
      See my user info for links.
  225. Hey genius... by GNUs-Not-Good · · Score: 2

    they have word95 and you are giving them word97?

    That is not Microsoft's fault. That is your fault.

    This has to be one of the dumbest AskSlashdot questions ever.

    Here is your answer...File...save As...Word 95...end of question.

    1. Re:Hey genius... by doublem · · Score: 3

      Have you ever tried saving a complex Word 97 document in Word 95 format? If it's just text with some bullets, italics and bolding it's no big deal. If you have graphics, Wordart(tm) or anything more complex than "Left Justify" you're screwed. If you like I can e-mail anyone who asks a sample of what I'm talking about. When I started here, some moron opened a WP file and saved it as a Word file. It took 10 hours to reformat the document because of the arcane features that had been used in the original.


      Matthew Miller,

      --
      "Live Free or Die." Don't like it? Then keep out of the USA
  226. Re:Absolutely! by somero · · Score: 2

    I agree entirely that XML conceivably allows us to focus on the matter at hand: writing well.

    I'm a technical writer and I've said all along that media means nothing, the tools are irrelevant, and technology is merely a practical means to get a job done.

    (Refer to prior posts about the horrors of MS Word and the glories of LaTeX. Which allowed success? LaTeX painlessly produced more thesis papers, correct?)

    My professors didn't like to hear my stand on media, but I haven't been proven wrong yet. They were mystified by media because people who use online media are less forgiving than people who read hardcopy. The problem? They didn't realize that to write well takes discipline, hard work, and diligence. Sadly, they were my professors.

    If we use XML, we focus on writing well. Presentation becomes secondary, as it should, with the use of XSL, DTDs, or whatever other mechanism is available to output our words.

    "Think well, speak well, write well."

    Perhaps my dissenting professor forgets, but she defeats her own argument with her mantra.

    Tim Somero

  227. What you ask for has existed for years... by Karl_Schroeder · · Score: 2

    ...It is just now that the general public is becoming aware of it, in the form of XML. Just visit any IBM documentation shop. They've done all their documentation in SGML for years; in this problem space, there is no difference between SGML and XML.

    Four years ago, I faced exactly the same problem as you: several thousand pages of product documentation formatted in Word. In our case, we had just lost five tech writers, leaving me holding the bag. So, I cobbled together an SGML publishing system with a colleague, for about $1000, including an on-line collaborative editing system, full on-line browsing of all docs, and semi-automatic translation of the SGML into LaTeX, thence into PostScript or PDF. These days, all you need is a copy of WordPerfect 9 for the writer, a Linux box running the Xalan XSL formatter and a copy of PDFLaTeX, and you can single-source web and print versions of all your docs. The cost would be the cost of WP9, but of course you could just use a text editor.

    This is not new technology. It's quite mature, as institutions like IBM and the U.S. military will attest.

    As to myself, I create new DTDs as I need them for writing projects. One language per project, often.

    By the way, you do not want to standardize on one "word processor" language for XML--that would be to miss the whole point of XML/SGML.

  228. SGML has been doing this for years. by X · · Score: 3

    This is exactly what SGML has been doing for documents for years. The government and military have been using SGML to ensure that document structure is maintained and that documents are always readable.

    Of course SGML is pretty complex, so XML has been born to simplify SGML. XML is now being used to accomplish the same thing.

    --
    sigs are a waste of space
  229. Barriers exist right now... by Matts · · Score: 3
    Sadly there are still barriers to this becoming a reality (although I really hope it does become a reality). Perhaps the biggest barrier is the lack of really good XML authoring tools. I really believe that the first company releasing a first class DocBook/XML editor for a price under $100 will make an absolute killing in the marketplace. Current offerings such as add-on modules for Word, FrameMaker/SGML, and WP/SGML just don't quite cut the mustard. XMetaL Pro looks pretty good, I hope the next version will be better.

    In short, I expect to see this sort of tool become a reality in this season's software releases.

    Other barriers to this also include decent formatting. We have reasonable XSLT styles for DocBook, but completely modifying these to make a custom look and feel is still pretty hard. Someone is going to release an XSLT WYSIWYG editor real soon now and make another killing in the market.

    So in summary, I think yes, XML can and will replace proprietary formats. And ultimately be easier to work with.

    Want to deliver XML with Apache to varying media devices in different styles? Get AxKit

    --

    Matt. Want XML + Apache + Stylesheets? Get AxKit.
  230. Re:An open question by Zagadka · · Score: 3

    There are other document formats which deliver the same power, have been around longer, have not *radically* changed, and are open to implementation by other vendors. HTML and XML-based grammars are only one example of this. PostScript would be an even better example.

    Just one nit: PostScript is actually a pretty bad example of this, because while it's reasonably easy to generate, it's horrendously hard to extract any useful information from.

    Tools that take PostScript as input tend to be fairly fragile if they're trying to do anything beyond just rendering the document. "2up" converters often fail on PostScript generated from certain sources. Many graphics packages that allow insertion of EPS simply can't render the EPS on-screen unless there's an embedded TIFF "preview". PostScript to text converters rarely, if ever, work.

    PostScript is a nice language for talking to printers. It isn't a good language for talking to software though. The fact that it's Turing complete means a lot of the analyses that would be useful to do on documents simply can't be done with PostScript without actually executing it, and there's no way you can tell if it'll ever halt. PostScript documents also tend to just be filled with low-level rendering information, not the high-level semantic information required for things like searching, translation, conversion into other formats, etc.

    XML is far superior in this respect. XML documents can encode semantic information, and they're easy to analyze. They're also a heck of a lot easier to parse. There are many XML parsers available. I can only think of one PostScript parser that isn't built into a printer (GhostScript). XML isn't a panacea though. Even if every application vendor switched to XML, they'd probably all use different DTD's. That's still better than unreadable binary formats though, because it's a lot easier to reverse engineer the file format, if it isn't published.

  231. Absolutely! by ZeroLogic · · Score: 3

    The beauty of XML is that it allows people to focus on the data and not the formatting. If the word processor "industry" were to get together to support a single DTD (Document Type Definition) so that everyone would know how to react to specific tags, then you could have a format that any WYSIWYG editor would render correctly. And it would also allow people to do TeX-style editing as well, using their favorite text editor (xemacs of course!).

    /ZL

  232. Re:An open question by AndyElf · · Score: 3

    Word is not the worst case here, Excel is even worse -- it has changed in almost every new release of MSOffice.

    As for why this happens -- peer pressure, and that's exactly what Pauly talks about. If your client uses it, so will you (or at least you will have to convert to your customer's format before exchanging documents). In the recent past it was not even so much a question of tolerance, rather of no choice. Look at any of the Office Productivity Suites reviews at ZDNet or C|Net -- MS is almost always a clear-cut winner, even though most of the bells and whistles an average consumer will NEVER use (as a side note, wouldn't you think that most users could happily live with functionality of Word 2.0?).

    As for what could be done to resolve it, I think that trying (whenever possible) to exchange HTML docs could be one solution, but you lose some control over the layout and won't be able to do any sort of document automation. And when it comes to a 3000+ page document -- you just gotta convince that customer not to use Word for this.

    A few people had mentioned TeX and LaTeX, as well as SGML here, but I guess this is not the answer for Pauly, as his customers are not happy with it. OTOH, slowly educating them could help a lot. FrameMaker would be the best choice then: you don't need UNIX to run it (unless you'd want to try to convert your customer completely), get great documents, can convert them into SGML (with FrameMaker-SGML).

    --

    --AP
  233. Binary file formats vs XML by harmonica · · Score: 3

    The XML approach is much better from a technical point of view. With XML you can specify the structure of documents in the DTD and you simply need one of the many XML libraries to actually parse the data (and even detect errors). If word processing creators would not agree on a single DTD but create their own (which is the most probable thing to happen), you can specify a conversion scheme using a query language and even convert XML word processor documents between the DTD's automatically -- if every element in the source DTD has an equivalent element in the destination DTD.
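
    A minimal sketch of that kind of conversion, in Python. It swaps in a plain dictionary for the query language mentioned above, and the element names on both sides are invented for illustration; they do not come from any real word processor DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names for two made-up word processor vocabularies.
TAG_MAP = {"doc": "document", "para": "p", "bold": "b"}


def convert(element, tag_map):
    """Recursively copy `element`, renaming each tag via `tag_map`.

    Raises KeyError when a source element has no equivalent in the
    destination vocabulary, which is exactly the case where automatic
    conversion stops being possible.
    """
    new = ET.Element(tag_map[element.tag], dict(element.attrib))
    new.text = element.text
    new.tail = element.tail
    for child in element:
        new.append(convert(child, tag_map))
    return new


source = ET.fromstring("<doc><para>Hello <bold>world</bold>!</para></doc>")
converted = ET.tostring(convert(source, TAG_MAP), encoding="unicode")
```

    A source element missing from the mapping makes the conversion fail loudly instead of silently dropping content, which mirrors the "if every element has an equivalent" condition.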

    With products such as XMill it is even possible to compress XML documents very well, so that the additional markup won't result in bloated files.
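
    To get a rough feel for how well repetitive markup compresses, here is a small Python sketch. It uses the general-purpose zlib module from the standard library, not XMill, and the sample record is invented, so the ratio says nothing about real documents.

```python
import zlib

# Invented sample: one small record repeated many times, which is a
# best case for any general-purpose compressor.
record = "<person><name>Alice</name><haircolor>brown</haircolor></person>"
xml_text = "<people>" + record * 1000 + "</people>"

raw = xml_text.encode("utf-8")
packed = zlib.compress(raw, 9)

# Fraction of the original size that remains after compression.
ratio = len(packed) / len(raw)
```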

    Binary proprietary formats are only good for keeping the structure secret and competitors out of the race. I wonder why Microsoft opened up theirs... Maybe it has become complicated enough so that nobody tries to create a filter! Or the descriptions do not contain 100 percent of the file format or wrong information... Yes, that's a bit paranoid, I know. Anyone from the KOffice team here to give us some insight?!

  234. Already Happened by CharlieG · · Score: 3

    Guess what? Microsoft (I know - don't bug me) already does it! The New Word 2K format is XML based

    --
    -- 73 de KG2V For the Children - RKBA! "You are what you do when it counts" - the Masso
    1. Re:Already Happened by Refrag · · Score: 3

      No, it hasn't. I worked for Microsoft during the rollout of Off2000. Word2000's file format is read & write compatible with Word97's. That means it hasn't changed a whole lot if at all since the last rev. (for once)

      XML is used in HTML files created by Off2000 applications (except for FrontPage) to use in "round-tripping" of HTML files from source app, to HTML app, back to source app. You see, there is a small application that reads the XML tags in the HTML files and sends it off to the source app for further editing when you Edit the document. Some information is also stored in the HTML file in XML compliant tags to help the source app to provide it with further information about the original source document -- to make it appear seamless.

      FrontPage strips these XML tags out of the HTML files and breaks round-tripping.

      --
      I have a website. It's about Macs.
  235. XMLBabel-Logic gone Awry by BoLean · · Score: 3

    I'm usually very sceptical of new buzzword technologies. When I first heard about XML and did a little research I was floored by the elegant simplicity of the model. XML at its most basic is a set of parser (HTML or whatever) tags that allow the representation of structured data. Much as simple HTML tables are constructed from table tags, XML extends this to define more complex structures.

    For instance, a simple dataset containing haircolor, eyecolor and name for a group of people could be represented with tags like .... This idea is not only a boon for people trying to translate complex information across the web but it also allows for greater complexity in documents viewed on the web.
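
    A small sketch of such a dataset, read back with Python's standard xml.etree.ElementTree module. The tag names and the sample people are assumptions made up for illustration, since the example tags in the post did not survive.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names and data; not taken from the original post.
PEOPLE_XML = """
<people>
  <person>
    <name>Alice</name>
    <haircolor>brown</haircolor>
    <eyecolor>green</eyecolor>
  </person>
  <person>
    <name>Bob</name>
    <haircolor>black</haircolor>
    <eyecolor>blue</eyecolor>
  </person>
</people>
"""


def load_people(xml_text):
    """Return one dict per <person>, keyed by child tag name."""
    root = ET.fromstring(xml_text.strip())
    return [
        {child.tag: child.text for child in person}
        for person in root.findall("person")
    ]


people = load_people(PEOPLE_XML)
```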

    Here's the part where things went awry. The W3C (World Wide Web Consortium) is the official standards organization for the web. Their biggest problem is that as a standards body they are trying to maintain stability and conformity of standards. This makes them rather slow at approving and implementing new standards. In the past this resulted in companies like Netscape and Microsoft integrating new technologies into their browsers long before they became standards. JavaScript and ActiveX are just two examples. Can't really blame them; they have to compete in the marketplace, and he who gives the consumer what they want soonest usually wins and gets to set the de facto standards. In a nutshell, the W3C has become little more than an R&D organization.

    So, then we get to XML. Initially proposed over six years ago, it was at first rejected by the W3C. Many outside the W3C liked the proposal, though, so many groups started developing and testing different variations of XML. XML and similar technologies like XSL (Extensible Style Language), SVG (Scalable Vector Graphics) and a plethora of others began to appear. See OASIS to get an idea of how far this has gone.

    Today there are so many standards for XML variants that there are actually groups with competing standards for XML formats as specific as data exchange between banks. Kind of like a modern day tower of Babel.

    So, to answer your question, yes, XML holds a lot of promise for document and data interchangeability among different software products, but between here and that goal is one huge civil war among competing groups and technologies. Giants of the software industry like IBM and Microsoft have already staked their ground. Recent patent rule changes and passage of UCITA in several states have complicated matters by allowing companies to patent abstract things like database structure and parsing rules. Hopefully, as in the war between VHS and Beta, a clear winner emerges quickly; more importantly, the winner must be an open standard.

  236. New file formats are not new! by Speare · · Score: 3
    Quoted: Let me tell you, it is painful watching a 3,000+ page Word97 manuscript, the fruit of weeks of hard labor, rendered into rubbish by my customer's Word95. I've missed deadlines, lost money, and will never forgive Microsoft for their abuse of me and my kind.

    I'm not going to get into the debate about "open" standards, XML vs proprietary format, or whether Microsoft is somehow evil.

    I will say that if you prepared 3000 pages in a format that your client wasn't able to use, it's your fault. Stand up and do the legwork to understand your client's needs. If your client had the same version of Word, or you started with a copy of their version of Word, it wouldn't have mangled your "weeks of hard work." If you need critical compatibility, preview using exactly the same set of operating system, software, fonts, video drivers, printer drivers, paper and ink cartridges that they will use.

    Applications extend their format all the time. I can't load a Photoshop 5.0 document into version 1.0 without problems. I can't load an HTML 2.0 compliant page into an HTML 1.0 compliant browser without problems.

    The same thing would happen even if Microsoft was 100% XML 1.0 compliant, as soon as people made XML 2.0 documents.

    It's your responsibility to provide the results for your client; stop blaming the tools. Get tools that will provide the results your clients want. "Gee, my hammer's left-handed, that's why I need to start your kitchen cabinets all over again."

    (Anagram: "New file formats are not new." = "Lament, now refine software.")

    --
    [ .sig file not found ]
  237. XML easily subverted by Greyfox · · Score: 3
    People talk about how easy XML will make data interchange, but some Evil Company (That shall remain nameless) can still make XML just as obtuse as their previous unreadable file formats. Do you think they'd embrace XML if they couldn't also "extend" it?

    XML offers quite a few benefits, not least of which being that it forces the author to think of a document in terms of a tree. It by no means will enable everyone to just start talking overnight, magically.

    --

    I'm trying to teach myself to set people on fire with my mind... Is it hot in here?

  238. Good Idea but counter gratuitous complexity by ibm1130 · · Score: 3

    Good idea, Cliff. Unfortunately, this would prevent M$ from breaking things whenever they needed to, hence it is unlikely to occur, at least as far as M$ Office is concerned. If the proposed remedies in the current M$ anti-trust case include (as they should) measures to force (even temporarily) M$ to open its file formats, then the situation may change. The downside is that M$ will then attempt to co-opt whatever the standard becomes, and voila, there we are back at square one.

  239. XML does everything - whatever. by Ron+Harwood · · Score: 3

    I've got friends claiming that XML is the panacea for computing... for everything from e-commerce to a replacement for SAP to a standard for documents.

    The only problem is that the applications have to be created to support the XML standard. So, unless you have a word processor that supports XML, and the people you're sending your documents to can read XML with their software ('cause you know MS will make an MSXML bastardisation...) you might still be out of luck.

    1. Re:XML does everything - whatever. by Tom+Bradford · · Score: 3

      XML is not the end-all be-all of data representation formats, but it is certainly one of the most flexible formats for representing textual (or textually representable) data. It will never replace binary standards unless a very good generalized binary compression and representation system for XML documents is developed and adopted by the XML community. My company is working on such a beast.

      Regarding the Applications. It's true. They're not quite there yet, but they're coming. My company is putting together quite a decent one, and many other vendors are trying to do it right as well. The Windows support, unfortunately, will generally be there before the UNIX support, but UNIX support is not far off.

      Regarding Microsoft and XML. Microsoft, though I hate to admit it, is one of the more influential catalysts for the development and standardization of XML specifications at the W3C. Their stance on XML, for the most part, is a driving stance, as opposed to what it has been in the past with other Internet technologies, being embrace and extend. Let's give them some credit in that sense then.

      Tom Bradford (CTO) The dbXML Group

  240. Why? by mosch · · Score: 4

    The reason we do this is simple. The United States government. Nope, I'm not claiming conspiracy, but look at what one has to do to do business with the US government. You must submit your specs in Word.

    Now all the businesses that want to do business with the government switch to Word. So what happens next? The businesses that do business with those businesses switch to Word. It's recursive.

    Personally, I think the government could do much to open up the playing field by making it so all documents sent to the government had to be in some openly documented file format (XML based if you like to pretend that XML solves all problems, or just some random binary format or what not.)

    This simple move would smack Microsoft far harder, and more fairly than most any DoJ action.
    ----------------------------

  241. Learning XML, XHTML by lapsan · · Score: 4

    I spend my working time (and then some) as a Web Designer and have recently been trying to read up on XML and XHTML. (is that slightly redundant?)

    It is turning out to be quite a difficult task. While everything I read tells me that it will be replacing all those proprietary document formats, it doesn't tell me exactly how that is supposed to work in a real world scenario. I believe that it does have that potential, but am stuck in exactly the same place as the poster... not being able to find the answer to what seems to me a rather basic and obvious question. Is it worth my time to learn XML for future use or is it just another wild dream of a select few people?

  242. XML Wouldn't Help by ecampbel · · Score: 4

    If Microsoft's Word 95 and Word 97 document formats were XML based, there is no guarantee that you could seamlessly down-convert a Word 97 document to a Word 95 document. What if your Word 97 document uses a few features that are specific to or changed in Word 97? The XML converter would have to approximate the Word 95 equivalent and would probably botch the job, the same way the existing 97->95 converter did. The bottom line is that the file format changed between Word 95 and Word 97, and it doesn't matter how the format is stored; things will go wrong when you attempt to down-convert.

    In addition, XML only affects how the file is stored on disk. Internally, Microsoft Word will represent your document the same way regardless of whether it's stored as XML or in a binary format. If it wants to create a binary version of your document, Word will simply write your document's raw internal data structures to the disk; if it wants to create an XML version of the document, it will first convert its internal binary version to XML and write it to disk. The only case where an XML based file format is better is for third parties who don't know the internal structure of Word's file formats, but still want to read its files. Microsoft has intimate knowledge of its own file formats, so storing documents as XML gives no advantage to Microsoft applications.
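
    The point about one internal representation with two on-disk forms can be sketched in Python as follows. The `Run` record and both layouts are invented for illustration and have nothing to do with Word's actual structures.

```python
import struct
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical in-memory record: a run of text plus a bold flag."""
    text: str
    bold: bool


def to_binary(run):
    """Pack as: 1-byte bold flag, 2-byte little-endian length, UTF-8 text."""
    data = run.text.encode("utf-8")
    return struct.pack("<BH", int(run.bold), len(data)) + data


def from_binary(blob):
    """Inverse of to_binary; only works if you know the layout."""
    bold, length = struct.unpack_from("<BH", blob)
    return Run(blob[3:3 + length].decode("utf-8"), bool(bold))


def to_xml(run):
    """Write the same record as a self-describing XML element."""
    elem = ET.Element("run", {"bold": "true" if run.bold else "false"})
    elem.text = run.text
    return ET.tostring(elem, encoding="unicode")


def from_xml(text):
    """Inverse of to_xml; any XML parser gets this far without a spec."""
    elem = ET.fromstring(text)
    return Run(elem.text or "", elem.get("bold") == "true")


run = Run("Hello", True)
```

    The application that owns `Run` gains nothing from either form. The difference only shows up for a third party, who can at least see element and attribute names in the XML output but has to guess the byte layout of the binary one.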

    --

    Sig goes here
  243. Re:Nope by jwkane · · Score: 4

    "Your example of Word formats changing is a perfect one. If Word95 used XML, Word97 could still be incompatible if it used different elements and attributes."

    You're overlooking a fundamental feature of XML. If Word21 needs to add additional elements or attributes to support new features, they simply create new tags. If the document is loaded in Word20 (ignorant of those tags) it won't look quite right (whatever feature was implemented with those tags will be skipped) but it will still display. If M$ wanted to try and maintain its current upgrade-4-compatibility approach, they could change all the tags with every version, but such obvious and outlandish behavior would only serve to destroy whatever fragment of reputation they still have.
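
    A rough Python sketch of an older reader that skips markup it does not recognise. The tag names, including the "newer" `sparkle` tag, are made up, and the plain-text rendering is only a stand-in for real display logic.

```python
import xml.etree.ElementTree as ET

# Tags a hypothetical older reader understands; anything else is unknown to it.
KNOWN_TAGS = {"doc", "p", "b"}


def render(element):
    """Return plain text for `element`, ignoring unrecognised markup.

    Unknown elements lose their special effect, but their text and
    children are still rendered, so the words survive.
    """
    inner = (element.text or "") + "".join(
        render(child) + (child.tail or "") for child in element
    )
    if element.tag not in KNOWN_TAGS:
        return inner  # unknown feature: keep the words, drop the effect
    if element.tag == "b":
        return "*" + inner + "*"
    if element.tag == "p":
        return inner + "\n"
    return inner


# A document written by a "newer" version that uses one extra tag.
newer_doc = "<doc><p>Plain, <b>bold</b>, <sparkle>fancy</sparkle>.</p></doc>"
text = render(ET.fromstring(newer_doc))
```

    This only works when the reader chooses to be tolerant; nothing in XML itself forces an application to skip unknown tags rather than reject the file.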

    "XML can't replace proprietary document formats. That's like asking if ASCII could replace proprietary document formats."

    I must not be understanding what you mean when you're referring to ASCII since simple texts replace proprietary document formats all the time. TeX, CSV, RTF, HTML, PS, all are human readable text files. Certainly XML is only part of the solution, it stores the content while the format is handled elsewhere. In that sense it differs from the traditional mixed approach.

    The most important thing about the transition from mixed formatting/content to clearly delineated content vs. formatting is that the author isn't (ultimately) going to have any control over formatting. Relax and give a little thought. The format of a document should be determined (or at least be determinable) by the person reading it. If I counted the times I've read the source of someone's HTML because their background is obnoxious I would have wasted much time.

  244. Not Already Happened by Matts · · Score: 5

    No, it hasn't (already happened).

    Microsoft want you to believe that they are buzzword compliant, but in reality the output from Microsoft's "Save As HTML" looks like XML, smells like XML, but isn't. Try parsing it.

    See the recent Byte article "The cup is half full" for more details. I'm surprised you haven't heard about this. MS is using its proprietary XML Islands inside an HTML document. That means you have to get an HTML parser to be able to parse it. The content of the XML is just as proprietary. It's basically a conversion of their OLE Document objects into XML.
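
    The "try parsing it" test is easy to sketch in Python with the standard XML parser. Both sample strings below are invented to show the general difference between HTML-style and well-formed markup; neither is actual Office output.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if `markup` parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Invented samples: HTML-style markup with unclosed tags, and the same
# content written so that every element is closed.
html_style = "<html><body><p>First<br><p>Second</body></html>"
xml_style = "<html><body><p>First<br/></p><p>Second</p></body></html>"

results = (is_well_formed(html_style), is_well_formed(xml_style))
```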

    --

    Matt. Want XML + Apache + Stylesheets? Get AxKit.
  245. An open question by DonkPunch · · Score: 5

    The question is: Why do software consumers tolerate this?

    The compatibility breaking between different versions of Word is well-known and oft-maligned. I have a hard time seeing it as anything more than a forced upgrade cycle, where Word users MUST buy the latest version in order to exchange documents.

    There are other document formats which deliver the same power, have been around longer, have not *radically* changed, and are open to implementation by other vendors. HTML and XML-based grammars are only one example of this. PostScript would be an even better example.

    So why have business environments settled on a standard which seems clearly to not be in their best interests? Why do they blindly pay for new versions every few years when their current versions do everything they need and more?

    I'm all for letting the free market determine the best product, but Word strikes me as a solid example of the free market failing in this regard. Perhaps poor consumer education is preventing software from being a truly free market. The feature set of Word is nice, but the upgrade-insuring file format should cause people to run away. I would be skeptical of a car that used non-standard gasoline and forced me to buy an engine upgrade each year to handle new gas.

    How has this been allowed to happen?

    --

    Save the whales. Feed the hungry. Free the mallocs.
  246. Sabotage by overshoot · · Score: 5

    What's the downside? Simple. Lack of tool support. There are lots of portable document formats out there already. MIF is published, WordPerfect doc format is published, even RTF is supposedly for portability, etc. Why not send your customers docs in these formats? Because the word processor that has 94% of the market has no incentive to enable competitors by supporting them, and even has a great deal of incentive to minimize compatibility between its own generations (as you found out.)

    Assuming that any open document standard emerges, you can pretty well bet that saving from the market leader to that format will be an ugly process (have you looked at the HTML that that turkey produces? Blech!) You can also bet that imports from it will be better but still a pain. For real fun, try repetitive translations between the native format and the portable one and compare the starting and end results.

    The sad fact is that monopolists have a huge stake in incompatibility (read the Halloween Documents) and every reason to maintain it. The rest of us will just have to survive in that environment until it changes. Changing it is another topic entirely, but for once I'll say, Vive le France!

    --
    Lacking <sarcasm> tags, /. substitutes moderation as "Troll."
  247. Nope by p3d0 · · Score: 5

    XML can't replace proprietary document formats. That's like asking if ASCII could replace proprietary document formats. XML and ASCII are not really file formats. They simply don't do the same job as file formats.

    If you have ever used lex or yacc, then you'll know what I mean when I say that XML parsers essentially do the job of lex, but not of yacc. An XML parser is little more than a scanner which breaks a file into chunks to simplify the next level of processing. The XML parser gives the illusion of hierarchical processing that lex can't do, but it's an illusion nonetheless.

    Your example of Word formats changing is a perfect one. If Word95 used XML, Word97 could still be incompatible if it used different elements and attributes.

    So no, XML will not replace proprietary file formats. XML + proprietary DTD specifications + proprietary semantics could replace proprietary file formats. Is this an improvement? Probably. Will it make backward (or forward, or sideways) compatibility problems go away? Nope.

    --
    Patrick Doyle
    I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
  248. Biased rant from co-inventor of XML by tbray · · Score: 5

    First off, while there's a place for MS Word, a 3000-page document ain't it. In my experience it tends to severe breakage in this situation.

    Office2K will already save docs in a kind of bastardized HTML++ format which truly sucks because it is neither rules-following HTML nor well-formed XML, and it could have been without much trouble. A little bird has told me that a not-too-distant future release of Office will have a *real* XML save format, which would be cool. I mean, a lot of the tags will still be proprietary MS gibberish, but at least you can parse 'em, and it'll be way less susceptible to inter-version breakage.

    A basic part of the XML dream was the notion that the idea that software packages have proprietary data formats is just as silly as the 80's notion that computer networks should have proprietary per-wire data formats (remember DECnet, Wangnet, SNA?). So what pauly wants is exactly what XML is trying to do.

    Having said that, a lot of the infrastructure we need to make it easy to author and deliver XML isn't here yet.

    What I'm doing these days for complex documents is writing them in HTML++, by which I mean mostly well-formed HTML to which I add my own tags (e.g. , ) whenever I need to; because you can display what you've written in old browsers, which helpfully ignore the non-HTML tags, and you can write perl scripts or use XSL to turn it into RTF if you want to publish paper, and with Mozilla you can write a CSS stylesheet and dress up your own tags the way you want.
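
    As a sketch of that kind of script, here is a Python version (swapped in for perl or XSL) that walks a well-formed page and emits a very small subset of RTF. The `keyterm` tag is an invented stand-in for a custom tag, and the mapping to italics is an assumption; non-ASCII text is not handled.

```python
import xml.etree.ElementTree as ET


def rtf_escape(text):
    """Escape the three characters that are special in RTF."""
    return text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")


def to_rtf(element):
    """Convert an element tree to RTF body text, tag by tag."""
    inner = rtf_escape(element.text or "") + "".join(
        to_rtf(child) + rtf_escape(child.tail or "") for child in element
    )
    if element.tag == "p":
        return inner + "\\par\n"
    if element.tag == "b":
        return "{\\b " + inner + "}"
    if element.tag == "keyterm":  # invented custom tag, shown in italics
        return "{\\i " + inner + "}"
    return inner  # html, body and anything else: just pass the text through


page = "<html><body><p>XML is a <keyterm>metalanguage</keyterm>.</p></body></html>"
rtf = "{\\rtf1\\ansi\n" + to_rtf(ET.fromstring(page)) + "}"
```

    The same page still displays in a browser that has never heard of `keyterm`, which is the appeal of the approach: one source file, and each output format decides for itself what the custom tags mean.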

    Cheers, Tim Bray