Slashdot Mirror


Pretty Printing From An XML File?

Omega1045 writes "Where I work we are developing a new product that receives an XML document (on a W2k workstation), and we need to format and print said document. We are currently using XSLT + CSS to build a cool little HTML page out of the XML, then use a browser to print out the HTML. However, while HTML is a nice format for display, it is not a nice format for printing. We have messed around with the idea of spitting out Rich Text with XSLT. However, Rich Text is confusing and quite frankly sucks. We are looking for a (free if possible) format that we can translate our XML document into via XSLT, and print. The best idea we have at this point is to translate into a Word or OpenOffice XML schema document, and use one of those applications to print. Other ideas?"

9 of 65 comments

  1. Postscript? by Hanji · · Score: 3, Interesting

    I'm not actually familiar with the details of PostScript at all, but it certainly seems a logical format to consider if printing things is your concern.
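    For a sense of what that would involve, here is a minimal sketch of emitting PostScript directly. The function names, the font, and the US Letter page height of 792 points are illustrative assumptions, and it handles neither page breaks nor non-ASCII text.

```python
def ps_escape(text: str) -> str:
    """Escape the characters that are special inside a PostScript (string)."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def lines_to_postscript(lines, font="Helvetica", size=12, margin=72, page_height=792):
    """Render a list of text lines as a one-page PostScript program."""
    out = [
        "%!PS-Adobe-3.0",
        f"/{font} findfont {size} scalefont setfont",
    ]
    y = page_height - margin  # start one margin below the top edge
    for line in lines:
        out.append(f"{margin} {y} moveto ({ps_escape(line)}) show")
        y -= size + 2  # fixed leading; no page-break handling in this sketch
    out.append("showpage")
    return "\n".join(out) + "\n"


ps_program = lines_to_postscript(["Order 1001", "Total (USD): 5"])
```

    The resulting text can be sent to a PostScript printer or opened in a viewer such as Ghostscript.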

    --
    A Minesweeper clone that doesn't suck
  2. PCL, depending on how complex your layout is. by np_bernstein · · Score: 2, Interesting

    The company I work for dynamically fills in complicated forms with data. We use PDF, sure, but if you've got anything complicated where positioning needs to be very exact, or you need to support things like mixed page sizes, you'll want to look into Printer Control Language (PCL), originally created by HP and supported on most printers.
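    As a rough illustration of the mixed-page-size case, the sketch below builds a small PCL 5 job by hand. The helper names are made up for this example, and it assumes a printer that accepts the standard PCL 5 page-size, orientation, and cursor-positioning escape sequences.

```python
ESC = "\x1b"

# PCL 5 page-size codes: the number that goes in ESC & l # A.
PAGE_SIZES = {"letter": 2, "legal": 3, "a4": 26}


def pcl_page(text: str, size: str = "letter", landscape: bool = False) -> bytes:
    """Return one page of PCL: pick paper size and orientation, then print text."""
    parts = [
        f"{ESC}&l{PAGE_SIZES[size]}A",       # page size
        f"{ESC}&l{1 if landscape else 0}O",  # orientation: 0 portrait, 1 landscape
        f"{ESC}*p300x300Y",                  # cursor to x=300, y=300 PCL units
        text,
        "\f",                                # form feed ejects the page
    ]
    return "".join(parts).encode("ascii")


def pcl_job(pages) -> bytes:
    """Wrap the pages in printer resets (ESC E) so the job starts and ends clean."""
    reset = f"{ESC}E".encode("ascii")
    return reset + b"".join(pages) + reset


job = pcl_job([
    pcl_page("Invoice 1", size="letter"),
    pcl_page("Wide report", size="a4", landscape=True),
])
```

    The bytes in `job` would then be written raw to the printer port or spooler queue; how to do that depends on the platform.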

    --
    RandomAndInteresting.com defending the world from stupidity since 1979
  3. Try Docbook by Pyromage · · Score: 2, Interesting

    DocBook is an XML-based document format, with support for output to many different formats, including HTML and LaTeX, as I recall.

    I'm not sure how the DocBook LaTeX filters work, but you may want to avoid LaTeX because of special characters. LaTeX doesn't do Unicode, so you'll have to translate those characters. That's not a huge problem, merely an annoyance.

    But quotes can be annoying. LaTeX wants directional quotes. This is fine only if you have full control of your source and are willing to deal with it.

    I tried to go direct to LaTeX on one of my projects, and it's not straightforward. Unless I'm missing something obvious; if someone does know a solution, please inform me 8-}
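    One possible way to handle both problems is a small preprocessing pass before the text reaches LaTeX. The sketch below is only that: the function names are hypothetical, the quote heuristic (a quote opens if it starts the text or follows whitespace or an opening bracket) is an assumption that will misfire on unusual input, and it does not attempt the Unicode translation.

```python
import re

# Characters LaTeX treats specially, mapped to safe replacements.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape_latex(text: str) -> str:
    """Escape special characters in one pass so replacements are never re-escaped."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


def directional_quotes(text: str) -> str:
    """Turn straight double quotes into LaTeX opening (``) and closing ('') quotes."""
    # A quote at the start of the text, or after whitespace or an opening
    # bracket, is treated as an opening quote.
    text = re.sub(r'(^|[\s(\[])"', r"\1``", text)
    # Any double quote still left is treated as a closing quote.
    return text.replace('"', "''")


def to_latex(text: str) -> str:
    return directional_quotes(escape_latex(text))


converted = to_latex('He said "50% off" & left')
```

    Escaping runs before the quote pass so that the backslashes it introduces are not themselves escaped.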

  4. SVG? by big+daddy+kane · · Score: 2, Interesting

    If I understand what you're asking, SVG would be a good choice. Bullet-sharp text that prints excellently. It can be generated automatically and is based on XML, so it shouldn't be too hard to integrate.
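    A minimal sketch of what generating printable SVG might look like follows. It uses Python's ElementTree rather than XSLT, and the `<record>`/`<field>` input format is invented for the example. The page is sized in millimetres so that one user unit maps to 1 mm on A4.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize SVG as the default namespace


def record_to_svg(xml_source: str) -> str:
    """Lay out each <field> of a hypothetical <record> as one line of text."""
    record = ET.fromstring(xml_source)
    svg = ET.Element(f"{{{SVG_NS}}}svg", {
        "width": "210mm", "height": "297mm",  # physical A4 size for printing
        "viewBox": "0 0 210 297",             # so 1 user unit = 1 mm
    })
    y = 20
    for field in record.findall("field"):
        line = ET.SubElement(svg, f"{{{SVG_NS}}}text", {
            "x": "20", "y": str(y), "font-family": "serif", "font-size": "5",
        })
        line.text = f"{field.get('name')}: {field.text}"
        y += 8  # fixed line spacing in mm; no wrapping or pagination
    return ET.tostring(svg, encoding="unicode")


sample = ('<record><field name="Part">Widget</field>'
          '<field name="Price">9.99</field></record>')
svg_text = record_to_svg(sample)
```

    Note that SVG has no notion of text flow or page breaks, so long documents would need the layout computed by the generating code.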

    1. Re:SVG? by hsoft · · Score: 2, Interesting

      Yup. I'd personally say "try harder with {X}HTML", but in case it fails utterly, SVG will definitely be the way to go! It will be much harder for you, though, to transform XML into printable SVG than into XHTML.

      --
      perception is reality
  5. Listen to me ;-) by cookiepus · · Score: 3, Interesting

    I've had to do just this, actually... here's the setup. Don't ask me why certain things were the way they were, certainly you can improve. I inherited some of this. But it worked...

    First, we had a bunch of product data in an MS SQL Server db. We had a Java (I think) task that nightly dumped an XML file (one per product) based on the DB.

    Then, we applied an XSLT transformation to each XML to produce the static HTML page for that day (static both to reduce server load and optimize google's searching of it, since Google didn't/doesn't like dynamic content)

    Then we wanted to produce a printed catalogue, so rather than printing pages, I made an XSLT that transformed the XML not into HTML but into XSL-FO. FOP is some Java shit from Apache that takes XSL-FO files and spits out a PDF.

    Obviously I don't remember details, but it worked.

    I had the idea to generate the PDFs not just for the printed catalogues but also as "printable version" for each HTML page. So both PDFs and HTMLs were generated nightly. Yeah it took a while but it was cool.

    It also served to improve our pagerank because (1) the PDFs made it look like we've got twice as much content and because (2) google gave higher weightings to PDFs (at the time, anyway)

    And, it was easy.
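    For anyone who hasn't seen XSL-FO, here is a rough sketch of the kind of document such a transform has to produce. It is not the original stylesheet: it builds the FO tree with Python's ElementTree instead of XSLT, and the `<product>` input with `<name>` and `<description>` children is an assumed format.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag: str) -> str:
    """Qualify a tag name with the XSL-FO namespace."""
    return f"{{{FO_NS}}}{tag}"


def product_to_fo(product_xml: str) -> str:
    """Build a one-page XSL-FO document from a hypothetical <product> record."""
    product = ET.fromstring(product_xml)
    root = ET.Element(fo("root"))

    # Page geometry: one A4 page master with a body region.
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "A4", "page-width": "210mm",
        "page-height": "297mm", "margin": "20mm",
    })
    ET.SubElement(page, fo("region-body"))

    # Content: a page sequence whose flow fills the body region.
    seq = ET.SubElement(root, fo("page-sequence"), {"master-reference": "A4"})
    flow = ET.SubElement(seq, fo("flow"), {"flow-name": "xsl-region-body"})
    title = ET.SubElement(flow, fo("block"),
                          {"font-size": "18pt", "font-weight": "bold"})
    title.text = product.findtext("name")
    body = ET.SubElement(flow, fo("block"), {"space-before": "6pt"})
    body.text = product.findtext("description")
    return ET.tostring(root, encoding="unicode")


sample = ("<product><name>Widget</name>"
          "<description>A very useful widget.</description></product>")
fo_text = product_to_fo(sample)
```

    Saved to a file, the output could then be handed to Apache FOP, with a command line along the lines of `fop -fo product.fo -pdf product.pdf`.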

  6. XML -> RTF via XSL... by timjones · · Score: 2, Interesting
    I have two production applications at work that convert a subset of HTML to RTF via an rtf.xsl stylesheet, with results good enough that my users actually call it (RTF) the "print" format, whilst the HTML is (naturally) the "view" format. The subset it supports is TABLE, TBODY/THEAD, TR/TD, FONT COLOR/SIZE, @BGCOLOR attributes, and rudimentary PNG/JPEG image embedding (but only as part of the text stream and in a table cell, not as independently positioned images). I had to add a few things to plain HTML, like a "base font size", a "landscape/portrait" attribute on BODY, and attributes to pass to RTF's "\cellx" control word (because HTML auto-sizes everything, and in RTF you have to specify the position of the right edges of table cells).
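    To make the "\cellx" point concrete, here is a small sketch of emitting an RTF table row. It is not the rtf.xsl stylesheet described above: the function names are hypothetical, it is written in Python rather than XSLT, it assumes ASCII cell text, and the landscape paper size assumes US Letter. Each "\cellx" value is the cell's right edge in twips (1440 per inch) measured from the left margin, so the widths have to be accumulated.

```python
TWIPS_PER_INCH = 1440


def rtf_escape(text: str) -> str:
    """Escape the three characters RTF treats as syntax (ASCII text assumed)."""
    return "".join("\\" + ch if ch in "\\{}" else ch for ch in text)


def rtf_row(cells, widths_in) -> str:
    """Return one table row; widths_in holds the column widths in inches."""
    row = ["\\trowd"]
    right_edge = 0
    for width in widths_in:
        right_edge += round(width * TWIPS_PER_INCH)
        row.append(f"\\cellx{right_edge}")  # cumulative right edge, in twips
    for text in cells:
        row.append(f"\\intbl {rtf_escape(text)}\\cell")
    row.append("\\row")
    return "".join(row)


def rtf_document(rows, widths_in, landscape: bool = False) -> str:
    """Wrap the rows in a minimal RTF header with a single font."""
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Arial;}}"
    if landscape:
        # US Letter turned sideways: 11in x 8.5in expressed in twips.
        header += "\\landscape\\paperw15840\\paperh12240"
    body = "\n".join(rtf_row(r, widths_in) for r in rows)
    return header + "\n" + body + "\n}"


doc = rtf_document([["Part", "Price"], ["Widget {v2}", "9.99"]], [2, 1.5])
```

    The string in `doc` can be saved with an `.rtf` extension and opened in a word processor.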

    All my users use Word as a helper application, which they think is great, but it makes me cringe to think of more invocations of an MS product.

    I would love to try producing the PDF, XSL-FO and PostScript outputs via XSL transformations; I just haven't had the time to try it yet. I'm sure they would yield better results, so go for it!

  7. Re:You're almost there... by Magnus+Reftel · · Score: 2, Interesting
    Take the XML and the XSL and transform it into 100% valid XHTML. HTML 4 is deprecated, the standard will not be updated.

    As the poster said, they've tried HTML, and didn't like it. I very much doubt that the print quality of XHTML would be any better than HTML. (I don't quite understand either why you're including screen styles for a page that is intended only for printing.)

    As for HTML 4 being a dead end, the WHAT WG, a collaboration among developers from most browsers, are defining a set of specifications intended to extend HTML4 in the short term, and serve as a base for a fifth version of HTML later.

    --
    print "Yet another p{erl,ython} hacker\n",
  8. Re:FOP by pmuellr · · Score: 2, Interesting
    One problem with Apache's FOP is that it doesn't support keep, orphan, and widow controls, so it's difficult to get nice-looking paginated output that breaks at natural places. The XSL-FO spec supports them; Apache FOP's implementation doesn't.
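    For reference, this is roughly what those controls look like in the markup. The sketch builds `fo:block` elements carrying the XSL-FO `keep-with-next`, `keep-together`, `orphans`, and `widows` properties; the helper names are made up, and whether a given formatter honours the properties is exactly the issue described above.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo_block(text: str, attrs: dict) -> ET.Element:
    """Create an fo:block with the given properties and text."""
    block = ET.Element(f"{{{FO_NS}}}block", attrs)
    block.text = text
    return block


def fo_heading(text: str) -> ET.Element:
    """A heading that asks to stay on the same page as the block after it."""
    return fo_block(text, {
        "font-weight": "bold",
        "keep-with-next.within-page": "always",
    })


def fo_paragraph(text: str, keep_whole: bool = False) -> ET.Element:
    """A paragraph that asks for at least two lines on either side of a page break."""
    attrs = {"orphans": "2", "widows": "2"}
    if keep_whole:
        attrs["keep-together.within-page"] = "always"  # never split this block
    return fo_block(text, attrs)


heading_xml = ET.tostring(fo_heading("Specifications"), encoding="unicode")
para = fo_paragraph("Weight: 2 kg", keep_whole=True)
```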

    I was using FOP to create 'slides', and it did an ok job. Nice that it supports links in the PDF file.

    I also looked at ReportLab for Python, which seemed slightly better to me than FOP, with one exception. I think the link support was not as nice as Apache's, and at the time I was too reliant on it to be able to move to ReportLab.