Slashdot Mirror


HTML Tags For Academic Printing?

meketrefi writes "It's been quite a while since I got interested in the idea of using HTML (instead of .doc or .odf) as a standard for saving documents, including the more official ones like academic papers. The problem is using HTML to create pages with a stable size that would deal with bibliographical references, page breaks, different printers, etc. Does anyone think it is possible to develop a decent tag like 'div,' but called 'page,' specifically for this? Something that would make no use of CSS? Maybe something with attributes as follows: {page size="A4" borders="2.5cm,2.5cm,2cm,2cm" page_numbering="bottomleft,startfrom0"} (you get the idea...) {/page} I guess you would not be able to tell when the page would be full, so the browser would have to be in charge of breaking the content into multiple pages when needed. Bibliographical references would probably need a special tag as well, positioned inside the tag ..." Is this such a crazy idea? What would you advise?

21 of 338 comments

  1. LaTeX by Anonymous Coward · · Score: 5, Informative

    You seem to be talking about LaTeX. It already exists. Don't reinvent it.

    1. Re:LaTeX by The+Snowman · · Score: 5, Informative

      > You seem to be talking about LaTeX. It already exists. Don't reinvent it.

      Another alternative is RTF (Rich Text Format). It is not actually an SGML relative of HTML but a plain-text format built from control words; still, while it has drawbacks, it would accomplish most if not all of what is required.
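To make the RTF suggestion concrete, here is a minimal Python sketch that writes an RTF document with A4 paper, explicit margins and a hard page break. RTF measures lengths in twips (1/1440 inch), so 21 cm comes out at about 11906 twips. The margin order (top, right, bottom, left) is an assumption borrowed from CSS to match the borders="2.5cm,2.5cm,2cm,2cm" idea in the question; `build_rtf` and `cm_to_twips` are hypothetical helper names, and the escaping only covers backslashes and braces, not non-ASCII text.

```python
TWIPS_PER_INCH = 1440  # RTF lengths are in twips: 1/1440 inch
CM_PER_INCH = 2.54


def cm_to_twips(cm: float) -> int:
    """Convert centimetres to the nearest whole twip."""
    return round(cm / CM_PER_INCH * TWIPS_PER_INCH)


def rtf_escape(text: str) -> str:
    """Escape backslash and braces; non-ASCII text is NOT handled here."""
    return text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")


def build_rtf(pages, margins_cm=(2.5, 2.5, 2.0, 2.0)):
    """One paragraph per page, separated by hard page breaks (\\page).

    margins_cm is assumed to be (top, right, bottom, left), as in CSS.
    """
    top, right, bottom, left = (cm_to_twips(m) for m in margins_cm)
    header = (
        "{\\rtf1\\ansi\\deff0"
        "{\\fonttbl{\\f0\\froman Times New Roman;}}"
        f"\\paperw{cm_to_twips(21.0)}\\paperh{cm_to_twips(29.7)}"  # A4 paper
        f"\\margt{top}\\margr{right}\\margb{bottom}\\margl{left}\n"
    )
    body = "\\page\n".join(f"\\pard {rtf_escape(p)}\\par\n" for p in pages)
    return header + body + "}"


print(build_rtf(["First page.", "Second page."]))
```

Saved to a .rtf file, the output should open in most word processors with the page size and margins already set; how the break and fonts are rendered is still up to the reader application.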

      --
      24 beers in a case, 24 hours in a day. Coincidence? I think not!
    2. Re:LaTeX by Anonymous Coward · · Score: 3, Informative

      No one really writes RTF by hand though. DocBook is more like what this person is suggesting.
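For comparison, a DocBook 5 article records structure (title, sections, a bibliography) and leaves page size and breaks to whatever stylesheet renders it. The sketch below builds one with Python's standard `xml.etree.ElementTree`; `build_article`, the single "Introduction" section and the flat bibliography of `bibliomixed` entries are simplifying assumptions, not a complete DocBook template.

```python
import xml.etree.ElementTree as ET

DOCBOOK_NS = "http://docbook.org/ns/docbook"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # serialized as the xml: prefix
ET.register_namespace("", DOCBOOK_NS)  # write DocBook as the default namespace


def db(tag: str) -> str:
    """Qualify a tag name with the DocBook 5 namespace."""
    return f"{{{DOCBOOK_NS}}}{tag}"


def build_article(title, paragraphs, references):
    """references: list of (id, citation text) pairs."""
    article = ET.Element(db("article"), {"version": "5.0"})
    info = ET.SubElement(article, db("info"))
    ET.SubElement(info, db("title")).text = title

    section = ET.SubElement(article, db("section"))
    ET.SubElement(section, db("title")).text = "Introduction"
    for text in paragraphs:
        ET.SubElement(section, db("para")).text = text

    # The bibliography is purely structural; print layout is left to the stylesheet.
    bib = ET.SubElement(article, db("bibliography"))
    for ref_id, citation in references:
        entry = ET.SubElement(bib, db("bibliomixed"), {f"{{{XML_NS}}}id": ref_id})
        entry.text = citation
    return ET.tostring(article, encoding="unicode")


print(build_article("Paged HTML?", ["Body text."], [("ref1", "Example reference.")]))
```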

    3. Re:LaTeX by fermion · · Score: 4, Informative
      LaTeX is really the solution. There is no reason to reinvent the wheel. In fact, reinventing the wheel might cause problems when submitting papers. From what I have seen, many academic journals prefer .tex and .eps files. I can't imagine what they would do with HTML.

      The nice thing about LaTeX is that, like HTML, it is a pure markup language, but it is a markup language that understands typesetting, so one tends to get a good page layout no matter what. OTOH, HTML merely identifies blocks of text as various generic types and has no real context for those types. The rendering engine is free to visualize them, make a sound, or do whatever it wishes to represent the blocks. CSS is what imposes a consistent visual framework, so what one needs to duplicate LaTeX is in fact CSS.

      Use LaTeX. Except for the often limited fonts, it is vastly superior to a word processor, because a word processor is not the right tool to create real documents. We have known that for many years. That is why people bought PageMaker. And I think the lack of fonts forces people to create compelling content. LaTeX is free, there are many good books, and if you do have a hankering to code, you can always play with TeX.

      --
      "She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
    4. Re:LaTeX by Anonymous Coward · · Score: 5, Informative

      Actually, the font problem is solved by using XeLaTeX (which uses XeTeX).

      Full OpenType support. Looks amazing.

    5. Re:LaTeX by femto · · Score: 5, Informative

      LyX. I wrote a thesis in it and didn't have to resort to any manual interventions in the generated LaTeX. Couple it with SVG diagrams generated by Inkscape, and you have a seamless authoring system that handles both text and graphics. SVG means there is no messy task of keeping source and PostScript output synchronised (just right-click a diagram within LyX to edit the SVG source with Inkscape). Use gnuplot to generate your (PostScript) graphs and you have pretty well a complete authoring system. A few years ago, LyX and Inkscape were too immature to use seriously, but they have matured. I recommend the combination.

    6. Re:LaTeX by plasticsquirrel · · Score: 5, Informative

      > Use LaTeX. Except for the often limited fonts, it is vastly superior to a word processor, because a word processor is not the right tool to create real documents. We have known that for many years. That is why people bought PageMaker. And I think the lack of fonts forces people to create compelling content. LaTeX is free, there are many good books, and if you do have a hankering to code, you can always play with TeX.

      LaTeX has limited fonts, but if you use XeLaTeX (LaTeX running on the XeTeX engine), you can not only use all the LaTeX stuff, but also any TrueType or OpenType font, with native Unicode support as well. This is a godsend for typesetting anything that includes words or characters not in English, or just for people who are picky about typography. My personal favorite font is the open source SIL Gentium family, which is not only much more beautiful and readable than Times, but contains a fuller character set that makes it compatible with many more languages. Once you start writing documents with XeTeX and nicer fonts, you see how lacking word processors are for good typography and well-structured documents, and how self-limiting the concept is.

      For newcomers to LaTeX and XeTeX, including packages and specifying options can be a bit time-consuming when you just want to get started with a basic document. Here is the basic XeTeX file template I use for simple stuff (it is set up for US Letter paper; change the paper size for A4). I'm picky about margins, line spacing, fonts, etc., so it should be a safe place to start out.

      \documentclass[11pt]{article}

      %%
      % XeTeX packages
      %%
      \usepackage{fontspec}
      \usepackage{xunicode}
      \usepackage{xltxtra}

      %%
      % Formatting packages
      %%
      \usepackage{setspace}
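      \usepackage[vcentering]{geometry} % driver (xetex) is auto-detected
      \usepackage{fancyhdr}

      % US Letter; for A4, change papersize here and the \pdfpage* values below
      \geometry{papersize={8.5in,11in},total={6.5in,8.8in}}

      \pdfpagewidth 8.5in
      \pdfpageheight 11in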

      % 10pt font: 1.15
      % 11pt font: 1.1
      \setstretch{1.1}

      \setmainfont[Mapping=tex-text]{Gentium Basic}

      \begin{document}

      Some text...

      \end{document}
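As a quick check on what the template's `total={6.5in,8.8in}` leaves as margins, here is a hypothetical helper that does the arithmetic exactly with fractions. It assumes the text block is centered (which `vcentering` requests vertically, and which I believe is geometry's default horizontally for one-sided documents) and that headers and footers are not counted in `total`. On US Letter that gives 1in side margins and 1.1in top and bottom; the same text block on A4 is shown for comparison.

```python
from fractions import Fraction

MM_PER_INCH = Fraction(254, 10)


def centered_margins(paper, text_block):
    """Per-side (horizontal, vertical) margins for a centered text block.

    Both arguments are (width, height) pairs in the same unit.
    """
    (paper_w, paper_h), (text_w, text_h) = paper, text_block
    if text_w > paper_w or text_h > paper_h:
        raise ValueError("text block does not fit on the paper")
    return (paper_w - text_w) / 2, (paper_h - text_h) / 2


letter_in = (Fraction(17, 2), Fraction(11))    # 8.5in x 11in
block_in = (Fraction(13, 2), Fraction(44, 5))  # total={6.5in,8.8in}
print(centered_margins(letter_in, block_in))   # (Fraction(1, 1), Fraction(11, 10))

a4_mm = (Fraction(210), Fraction(297))
block_mm = tuple(x * MM_PER_INCH for x in block_in)
print([float(m) for m in centered_margins(a4_mm, block_mm)])  # [22.45, 36.74]
```

On A4 the top and bottom margins come out much larger than the side ones (about 22mm versus 37mm), so it is probably better to choose a new `total` for A4 than to reuse the Letter numbers.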

      --
      Systemd: the PulseAudio of init systems
    7. Re:LaTeX by thefringthing · · Score: 3, Informative

      Most distributions of LaTeX come with some way to compile it and export to PDF, which is both portable and readable.

    8. Re:LaTeX by Nitewing98 · · Score: 4, Informative

      Yes, not using CSS is crazy, especially since CSS includes the concept of media types. A simple media="print" on the stylesheet (or an @media print block) should handle that.
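A minimal sketch of the two usual ways to target print: a stylesheet linked with media="print", and an @media print block inside the page's own styles. It is written as a Python function that returns the HTML; the file names screen.css and print.css and the .no-print class are hypothetical.

```python
from html import escape


def html_with_print_css(title: str, body_html: str) -> str:
    """Return a page whose print.css and @media print rules apply only when printing."""
    return f"""<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>{escape(title)}</title>
<link rel="stylesheet" href="screen.css" media="screen">
<link rel="stylesheet" href="print.css" media="print">
<style>
@media print {{
  nav, .no-print {{ display: none; }}
  body {{ font: 11pt/1.4 serif; }}
}}
</style>
</head>
<body>
{body_html}
</body>
</html>
"""


print(html_with_print_css("Paged HTML?", "<p>Body text.</p>"))
```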

      --

      Nitewing '98

      Everything works...in theory.

    9. Re:LaTeX by TheRaven64 · · Score: 3, Informative

      CSS has had properties for page breaks since CSS 2. I've not played with them for a while, but Opera supported them correctly back in 2003, so I'd imagine that they work well now. The relevant part of the specification is the Paged Media chapter. It lets you specify margins, soft breaks, rules for two-sided printing and so on. I played with the idea of using HTML instead of LaTeX for a while, but decided not to for two reasons: first, LaTeX is easier to type and read, and second, LaTeX produces nicer-looking output.
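The CSS 2 paged-media properties can be collected into a small print stylesheet. This Python sketch only serialises a dictionary of rules to CSS text; the selectors and margin values are illustrative assumptions, with mirrored :left and :right page margins standing in for two-sided printing.

```python
# Illustrative print rules using CSS 2 paged-media properties.
PRINT_RULES = {
    "@page :left": {"margin-left": "3cm", "margin-right": "2cm"},
    "@page :right": {"margin-left": "2cm", "margin-right": "3cm"},
    "h1": {"page-break-before": "always"},       # each chapter starts a new page
    "h2, h3": {"page-break-after": "avoid"},     # keep headings with their text
    "table, img, pre": {"page-break-inside": "avoid"},
    "p": {"orphans": "3", "widows": "3"},        # no lone lines at page edges
}


def to_css(rules: dict) -> str:
    """Serialise {selector: {property: value}} into a stylesheet string."""
    blocks = []
    for selector, declarations in rules.items():
        body = "\n".join(f"  {prop}: {value};" for prop, value in declarations.items())
        blocks.append(f"{selector} {{\n{body}\n}}")
    return "\n\n".join(blocks)


print(to_css(PRINT_RULES))
```

Newer CSS (the Fragmentation module) spells these break-before, break-after and break-inside and keeps the page-break-* names as aliases, so either form should be understood by current renderers.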

      --
      I am TheRaven on Soylent News
    10. Re:LaTeX by TheRaven64 · · Score: 3, Informative

      LaTeX is a macro system built on top of TeX. XeTeX is a TeX implementation. Whether any given LaTeX package will work with XeTeX (or any other TeX implementation, like pdfTeX) depends on whether it uses any implementation-specific features. Most don't.

      --
      I am TheRaven on Soylent News
    11. Re:LaTeX by smallfries · · Score: 4, Informative

      Further to the two AC posts: you are doing something wrong. As an academic I send and receive LaTeX source all the time and expect it to compile and reproduce exactly the same results. How are you abusing TeX to get these kinds of problems?

      --
      Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
  2. PDF? by sys.stdout.write · · Score: 5, Informative

    As much as I hate Adobe, there's a reason why PDF files dominate academia.

  3. CSS3 is the solution by Tiles · · Score: 5, Informative

    This is exactly what CSS is designed for: presentation. The CSS3 Paged Media module already defines a number of the properties and settings you're going for. It even includes positions such as @bottom-center to allow you to position footnotes and references. The only thing missing is a way to mark this up in HTML, which could easily be done with anchors and the longdesc attribute, coupled with the CSS content: property. What you're looking for is a CSS3-enabled browser, not a new specification.
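To show how much of the proposed {page} tag the Paged Media module already covers, here is a rough Python sketch that translates the question's attribute values into an @page rule. It assumes the four borders values are top, right, bottom, left (the CSS margin order) and maps names like bottomleft onto the module's margin boxes (bottom-left and so on); `page_rule` and `MARGIN_BOXES` are hypothetical names. The startfrom0 part is deliberately ignored, since I am not sure how consistently renderers handle resetting the built-in page counter.

```python
from typing import Optional

# Hypothetical mapping from the question's position names to the
# margin-box at-rules defined by the CSS Paged Media module.
MARGIN_BOXES = {
    "topleft": "top-left",
    "topcenter": "top-center",
    "topright": "top-right",
    "bottomleft": "bottom-left",
    "bottomcenter": "bottom-center",
    "bottomright": "bottom-right",
}


def page_rule(size: str, borders: str, numbering: Optional[str] = None) -> str:
    """Translate {page size=... borders=... page_numbering=...} into @page CSS.

    borders is assumed to list top, right, bottom, left, like CSS margin.
    Only the position part of numbering is used; "startfrom..." is ignored.
    """
    margins = [m.strip() for m in borders.split(",")]
    if not 1 <= len(margins) <= 4:
        raise ValueError("borders takes one to four lengths")
    lines = ["@page {", f"  size: {size};", f"  margin: {' '.join(margins)};"]
    if numbering:
        position = numbering.split(",")[0].strip()
        lines.append(f"  @{MARGIN_BOXES[position]} {{ content: counter(page); }}")
    lines.append("}")
    return "\n".join(lines)


print(page_rule("A4", "2.5cm,2.5cm,2cm,2cm", "bottomleft,startfrom0"))
```

For the question's example this prints an @page block with size: A4, margin: 2.5cm 2.5cm 2cm 2cm and a @bottom-left box containing counter(page). Margin boxes need a renderer that implements the module; support in ordinary browsers has historically lagged behind dedicated formatters.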

  4. Have you looked at PrinceXML? by sandford · · Score: 3, Informative

    Is there a reason you don't want to use CSS? There are already CSS extensions that do exactly what you want. The book Cascading Style Sheets: Designing for the Web was written using only HTML and CSS and prepped for printing using PrinceXML. The PrinceXML web site has a bunch of similar HTML+CSS samples, including academic papers.

  5. Static Page Feeds are available by caffiend666 · · Score: 4, Informative

    Static configurations are already available, though not the intelligent ones being requested. They have sufficed for what I needed:

    To have print page break add: <p style="page-break-before: always">

    Also, to drop the link color and underline when printing:

    <style type="text/css" media="print"> a { text-decoration: none; color: black; } </style>

    Yes, they have to be massaged a little.

    --
    Here's to losing my Karma Bonus again....
  6. Re:Nope by nlawalker · · Score: 3, Informative

    I should have been more clear: HTML describes the *structure* of a document, of which pages are not a part.

    As many have said above, you could use CSS if you really wanted to, since page specifications are presentational aspects of the document. Or, you could use LaTeX, which is designed for this kind of use.

  7. what do you want to do? by jipn4 · · Score: 4, Informative

    If you want to save the source form or markup, use a language designed for it: LaTeX. LaTeX lets you represent all the things you would want to represent in an academic paper, it's fairly readable, very widespread, and has tons of tools. And LaTeX converts to both HTML and PDF.

    If you want to display on the web, use HTML. It's meant for the web. It's not a good representation for paged media. If you must represent paged media, you need to use CSS or XSL, but you probably don't want to.

    If you want archival quality paged representations, PDF is the only game in town really. HTML with CSS doesn't come close. But it doesn't make sense to save your own papers only in PDF because PDF is not really editable and doesn't have the semantic information.

  8. Don't use HTML by emandres · · Score: 3, Informative

    You wouldn't want to use HTML for something like this, especially with newer versions of HTML. There has been a steady transition in HTML away from specifying the visual appearance of a page. For this reason, purely presentational tags are now considered nonstandard, mostly because CSS does a far better (and cleaner) job of it.

    --
    The only way to tell the difference between a hamster and a gerbil is that the hamster has more white meat.
  9. XSL:FO by Roxton · · Score: 4, Informative

    There's a little-used standard that came out of the W3C along with XSLT, called XSL-FO (XSL Formatting Objects). You write your document in XSL-FO markup, and then use one of any number of processors, such as XEP, to convert it into PDF or what have you.

    http://www.w3schools.com/xslfo/default.asp
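For a feel of the markup, here is a Python sketch (standard library `xml.etree.ElementTree`) that emits a minimal XSL-FO document for the A4 layout from the question. The simple-page-master sets the page size and margins, region-after holds a left-aligned page number, and the flow holds the body text. `build_fo`, the master name "page" and the 1.5cm/1cm footer spacing are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag: str) -> str:
    """Qualify a tag name with the XSL-FO namespace."""
    return f"{{{FO_NS}}}{tag}"


def build_fo(paragraphs, page_width="21cm", page_height="29.7cm",
             margins=("2.5cm", "2.5cm", "2cm", "2cm")):
    """margins is assumed to be (top, right, bottom, left)."""
    top, right, bottom, left = margins
    root = ET.Element(fo("root"))
    masters = ET.SubElement(root, fo("layout-master-set"))
    spm = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "page",
        "page-width": page_width, "page-height": page_height,
        "margin-top": top, "margin-right": right,
        "margin-bottom": bottom, "margin-left": left,
    })
    # Leave room at the bottom of the body for the footer region.
    ET.SubElement(spm, fo("region-body"), {"margin-bottom": "1.5cm"})
    ET.SubElement(spm, fo("region-after"), {"extent": "1cm"})

    seq = ET.SubElement(root, fo("page-sequence"), {"master-reference": "page"})
    footer = ET.SubElement(seq, fo("static-content"), {"flow-name": "xsl-region-after"})
    number = ET.SubElement(footer, fo("block"), {"text-align": "start"})
    ET.SubElement(number, fo("page-number"))  # filled in by the FO processor

    flow = ET.SubElement(seq, fo("flow"), {"flow-name": "xsl-region-body"})
    for text in paragraphs:
        ET.SubElement(flow, fo("block"), {"space-after": "6pt"}).text = text
    return ET.tostring(root, encoding="unicode")


print(build_fo(["Hello, paged world."]))
```

With Apache FOP installed, a command along the lines of `fop -fo paper.fo -pdf paper.pdf` should turn the saved output into a PDF; the processor, not the markup, decides where the page breaks fall.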

    One of the original purposes of it was that you could use XSLT to transform the same XML data into both XHTML and XSL-FO for publishing. The standard never took off, though. XSL-FO just doesn't have enough options to be typographically interesting, compared to SVG.

    Of course, the right answer is LaTeX, but you might want to give XSL-FO a try for familiarity's sake.

  10. Use DITA by wooden+pickle · · Score: 4, Informative

    Someone mentioned XML/XSL/FO. Don't try to write your content in XSL-FO. You'll hate every minute of it.

    I'd look into using DITA (Darwin Information Typing Architecture). It's a set of canned XML structures, plus a specification for how to process and customize those structures. It includes tags for stuff like footnotes...I bet it covers a lot of your use cases. There are some good intros to how these XML structures work here: http://dita.xml.org/book/dita-wiki-knowledgebase

    As DITA is XML, you can convert it to HTML and whatever else you feel like, pretty easily. There's an open-source implementation of the DITA spec called the DITA Open Toolkit (http://sourceforge.net/projects/dita-ot/). The DITA Open Toolkit includes stylesheets/scripts to publish HTML and PDF, among other things. PDFs are published via XSL-FO. Just like HTML needs a web browser to render something useful, XSL-FO requires an FO processor to create a PDF. So, in the end, you write DITA; XSLT and other scripts transform that DITA into XSL-FO; then an FO processor consumes the XSL-FO and spits out a PDF. The DITA Open Toolkit comes with an open-source FO processor (Apache FOP). FOP doesn't fulfill everyone's needs, but it might work very well for you.
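Because a DITA topic is plain XML, a rough sketch of one with a footnote (DITA's `fn` element) can be produced with Python's standard library. `build_topic` and the example id are hypothetical; in practice topics are usually written by hand or in an XML editor, and the DITA Open Toolkit does the publishing.

```python
import xml.etree.ElementTree as ET

DITA_DOCTYPE = '<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">'


def build_topic(topic_id, title, paragraphs):
    """paragraphs: list of (text, footnote or None) pairs."""
    topic = ET.Element("topic", {"id": topic_id})  # DITA topics require an id
    ET.SubElement(topic, "title").text = title
    body = ET.SubElement(topic, "body")
    for text, note in paragraphs:
        p = ET.SubElement(body, "p")
        p.text = text
        if note:
            ET.SubElement(p, "fn").text = note  # DITA's footnote element
    return "\n".join([
        '<?xml version="1.0" encoding="UTF-8"?>',
        DITA_DOCTYPE,
        ET.tostring(topic, encoding="unicode"),
    ])


print(build_topic("paged-html", "Paged HTML", [("Body text.", "A footnote.")]))
```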

    Unfortunately, working with the Open Toolkit and customizing its output can be a bit unwieldy. http://groups.yahoo.com/search?query=dita+users is a pretty good place to look for help.