Slashdot Mirror


HTML Tags For Academic Printing?

meketrefi writes "It's been quite a while since I got interested in the idea of using HTML (instead of .doc or .odf) as a standard for saving documents, including the more official ones like academic papers. The problem is using HTML to create pages with a stable size that would deal with bibliographical references, page breaks, different printers, etc. Does anyone think it is possible to develop a decent tag like 'div,' but called 'page,' specially for this? Something that would make no use of CSS? Maybe something with attributes as follows: {page size="A4" borders="2.5cm,2.5cm,2cm,2cm" page_numbering="bottomleft,startfrom0"} You get the idea... { /page} I guess you would not be able to tell when the page would be full, so the browser would have to be in charge of breaking the content into multiple pages when needed. Bibliographical references would probably need a special tag as well, positioned inside the tag ..." Is this such a crazy idea? What would you advise?

14 of 338 comments

  1. LaTeX by Anonymous Coward · · Score: 5, Informative

    You seem to be talking about LaTeX. It already exists. Don't reinvent it.

    1. Re:LaTeX by Liquidrage · · Score: 5, Insightful

      Good answer. Now all we need is for someone to mod your post up to +5 and then lock the thread.

      *sigh* it's a slow news day

    2. Re:LaTeX by The+Snowman · · Score: 5, Informative

      You seem to be talking about LaTeX. It already exists. Don't reinvent it.

      Another alternative is RTF, which is a sister SGML language of HTML. While it may have drawbacks, it would accomplish most if not all of what is required.

      --
      24 beers in a case, 24 hours in a day. Coincidence? I think not!
    3. Re:LaTeX by Anonymous Coward · · Score: 5, Informative

      Actually, the font problem is solved by using XeLaTeX (which uses XeTeX).

      Full OpenType support. Looks amazing.

    4. Re:LaTeX by femto · · Score: 5, Informative

      LyX. I wrote a thesis in it and didn't have to resort to any manual interventions in the generated LaTeX. Couple it with SVG diagrams, generated by Inkscape, and you have a seamless authoring system that handles both text and graphics. SVG means there is no messy task of keeping source and PostScript output synchronised (just right click a diagram within LyX to edit the SVG source with Inkscape). Use gnuplot to generate your (PostScript) graphs and you have pretty much a complete authoring system. A few years ago, LyX and Inkscape were too immature to use seriously, but they have matured. I recommend the combination.

    5. Re:LaTeX by Petrushka · · Score: 5, Interesting

      I have a sneaking suspicion that when the OP is saying things like "no CSS" and doesn't mention LaTeX, s/he is actually giving specifications in a very obfuscated way -- specifications that need to be deduced. What I take from the post is that the OP wants

      • Portability. Anyone can open an HTML file without having to install new software; the same doesn't go for ODF, LaTeX, or MSWord. I suspect this is the main thing the OP wants. But this shouldn't rule out CSS.
      • Everything in one file: I'm guessing this may be why the OP doesn't want CSS. But that's not a good reason to avoid CSS either, since CSS can perfectly easily go in the same file. (I think it does rule out editing the XML in ODF documents, though, since as far as I'm aware they're always a composite of several files.)
      • Read/edit in the same document. This could be another reason why the OP doesn't mention LaTeX. LaTeX is perfect for editing, not so great for reading: for that you have PDF. Maybe the OP doesn't want to have two separate files like that.

      I'm guessing the OP has been inspired by the use of HTML for slide presentations, in the form of S5. I can see that. But the specifications, if I've deduced them correctly, are not hugely well-thought-out ones. I can kind of see someone not wanting to use LaTeX for the reasons given above, but insisting on no CSS is crazy.

      In any case, the OP should certainly give slightly clearer specifications if s/he doesn't want to have people yelling "LaTeX!!!" all day.

    6. Re:LaTeX by plasticsquirrel · · Score: 5, Informative

      Use LaTeX. Except for the often limited fonts, it is vastly superior to a word processor, because a word processor is not the right tool to create real documents. We have known that for many years. That is why people bought PageMaker. And I think the lack of fonts forces people to create compelling content. LaTeX is free, there are many good books, and if you do have a hankering to code, you can always play with TeX.

      LaTeX has limited fonts, but if you use XeLaTeX (LaTeX running on the XeTeX engine), you get not only all the LaTeX stuff but also any TrueType or OpenType font, plus native Unicode support. This is a godsend for typesetting anything that includes words or characters not in English, or just for people who are picky about typography. My personal favorite font is the open source SIL Gentium family, which is not only much more beautiful and readable than Times, but contains a fuller character set that makes it compatible with many more languages. Once you start writing documents with XeTeX and nicer fonts, you see how lacking word processors are for good typography and well-structured documents, and how self-limiting the concept is.

      For newcomers to LaTeX and XeTeX, including packages and specifying options can be a bit time-consuming when you just want to get started with a basic document. Here is the basic XeTeX file template I use for simple stuff (it is set up for US Letter paper, so adjust the paper size if you want A4). I'm picky about margins, line spacing, fonts, etc. so you know that it's a safe place to start out.

      \documentclass[11pt]{article}

      %%
      % XeTeX packages
      %%
      \usepackage{fontspec}
      \usepackage{xunicode}
      \usepackage{xltxtra}

      %%
      % Formatting packages
      %%
      \usepackage{setspace}
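      % vcentering centers the text block vertically; the dvips driver option
      % may be unnecessary with recent geometry versions, which detect XeTeX
      \usepackage[vcentering,dvips]{geometry}
      \usepackage{fancyhdr}

      % US Letter paper (8.5in x 11in) with a 6.5in x 8.8in text block
      \geometry{papersize={8.5in,11in},total={6.5in,8.8in}}

      % Set the output page size explicitly to match the paper size
      \pdfpagewidth 8.5in
      \pdfpageheight 11in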

      % 10pt font: 1.15
      % 11pt font: 1.1
      \setstretch{1.1}

      \setmainfont[Mapping=tex-text]{Gentium Basic}

      \begin{document}

      Some text...

      \end{document}

      --
      Systemd: the PulseAudio of init systems
    7. Re:LaTeX by moosesocks · · Score: 5, Insightful

      Although I agree with you that LaTeX is widely used in the scientific community, and unambiguously offers the best typesetting facilities you'll find outside of a publishing house, is it still appropriate today?

      The web as we know it was created at CERN to facilitate the sharing of scientific information. Why are we still publishing in a format designed to be presented on dead trees?

      Like it or not, a properly-formatted print article looks horrible on a screen. An article formatted for printing on A4 or Letter-sized paper will use the whole width of the page, be set in 10-point type, and use columns. Unfortunately, modern computer screens don't have nearly enough resolution to display the full width of the page alongside much else. Obviously, PDF files also don't have the ability to flow to fit the width of the screen.

      LaTeX also doesn't give you the benefit of hypertext. Yes, there are various hacks you can use to add anchors and links to PDFs, although these are mere hacks on top of a broken format. Things such as high-resolution figures and hyperlinked references would be particularly beneficial for academic uses. It'd also be great to be able to see all articles linking back to what you happen to be reading. (This brings up all sorts of questions about the very nature of scientific publishing, although this is another debate entirely)

      Wikipedia (more specifically, MediaWiki) actually offers a promising solution to these (and the original poster's) requirements. It provides a convenient and simple markup for multi-sectioned articles, flows to fit the width of the page, and also provides LaTeX's fantastic mathematical typesetting facilities. Hyperlinking to other parts of the wiki (and to external sites) is extremely easy. I'm sure the DOI system could be integrated to allow linking back to other articles within the constraints of the existing academic publishing regime.

      Google could very easily provide the "glue" to hold such a system together, although it would ultimately be better to put a public, non-profit entity in charge. It's absurd and hypocritical that so much of academic research (particularly the publishing part of it) is profit-driven.

      --
      -- If you try to fail and succeed, which have you done? - Uli's moose
    8. Re:LaTeX by rhathar · · Score: 5, Insightful

      Portability:

      Name me an OS that doesn't have a PDF or PS reader installed by default.

      Windows

      --
      http://www.chaotickingdoms.com
  2. Congratulations! by Anonymous Coward · · Score: 5, Insightful

    Congratulations, you're the 5,134,978th person to suggest a change to HTML which will prevent it from being reflowable!

    Please step up to the spiked door in front of the acid pit to claim your prize.

    1. Re:Congratulations! by Memroid · · Score: 5, Funny

      Speaking of HTML/CSS... can I be the first person to suggest that we rename "Anonymous Cowardon" back to "Anonymous Coward"?

  3. PDF? by sys.stdout.write · · Score: 5, Informative

    As much as I hate Adobe, there's a reason why PDF files dominate academia.

  4. CSS3 is the solution by Tiles · · Score: 5, Informative

    This is exactly what CSS is designed for, presentation. The CSS3 Paged Media module already defines a number of the properties and settings you're going for. It even includes positions such as @bottom-center to allow you to position footnotes and references. The only thing missing is a way to mark this up in HTML, which could easily be done with anchors and the longdesc attribute, coupled with the CSS content: property. What you're looking for is a CSS3 enabled browser, not a new specification.
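
    As a rough sketch of the Paged Media approach described above, the following Python snippet builds a single HTML file whose embedded stylesheet carries an @page rule. The function names (build_page_css, build_document), the sample text, and the mapping of the submitter's hypothetical page attributes onto size, margin, and a @bottom-left margin box are assumptions made for illustration. The margin order follows the CSS convention (top, right, bottom, left), and margin-box support varies between browsers and print formatters.

```python
from html import escape


def build_page_css(size="A4",
                   margins=("2.5cm", "2.5cm", "2cm", "2cm"),
                   number_box="bottom-left"):
    """Return a CSS Paged Media @page rule as a string.

    The defaults mirror the values in the submission; treating
    'borders' as CSS margins is an assumption of this sketch.
    """
    margin = " ".join(margins)
    return (
        "@page {\n"
        f"  size: {size};\n"
        f"  margin: {margin};\n"
        # Margin box that prints the current page number
        f"  @{number_box} {{ content: counter(page); }}\n"
        "}\n"
    )


def build_document(title, paragraphs):
    """Return a complete HTML document with the page rule embedded,
    so content and presentation live in one file."""
    css = build_page_css()
    body = "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{escape(title)}</title>\n'
        f"<style>\n{css}</style></head>\n"
        f"<body>\n<h1>{escape(title)}</h1>\n{body}\n</body></html>\n"
    )


doc = build_document("Sample paper", ["First paragraph.", "Second paragraph."])
```

    Saving doc to an .html file and printing it from a renderer that implements Paged Media should apply the page size and margins, while the same file stays reflowable on screen.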

  5. Wrong, in many ways by Zaffle · · Score: 5, Insightful

    What you want (being able to define pages) is wrong in many many ways.

    An authoring tool should never define a page or its dimensions, especially for academic works, which will be printed in different formats, on different paper (A4/Letter/Tradeback/etc/etc).

    At most, whatever markup you have may define things like page breaks, but even then, they are more a typesetting issue.
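
    To illustrate page breaks as a hint rather than a page definition, here is a small Python sketch that marks chapters in HTML and leaves the break itself to a stylesheet. The class name "chapter", the function render_chapters, and the sample headings are hypothetical; the two CSS properties are the older page-break-before and the newer break-before.

```python
from html import escape

# Stylesheet fragment: ask for a page break before each chapter when printing.
PAGE_BREAK_CSS = (
    "section.chapter {\n"
    "  page-break-before: always;  /* older CSS 2 property */\n"
    "  break-before: page;         /* newer fragmentation property */\n"
    "}\n"
)


def render_chapters(chapters):
    """Turn (heading, text) pairs into HTML sections.

    The markup only says 'this is a chapter'; where pages actually
    break is decided by the stylesheet and the renderer.
    """
    parts = []
    for heading, text in chapters:
        parts.append(
            f'<section class="chapter"><h2>{escape(heading)}</h2>'
            f"<p>{escape(text)}</p></section>"
        )
    return "\n".join(parts)


fragment = render_chapters([
    ("Introduction", "Why pages?"),
    ("Method", "A & B compared."),
])
```

    Nothing in the markup fixes paper size or page dimensions, so the same source can still be printed on A4 or Letter.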

    What you want is either LaTeX or DocBook.

    --

    I used to have a funny sig, but slash cut it off, and I forgot what the punchline was.