Slashdot Mirror


HTML Tags For Academic Printing?

meketrefi writes "It's been quite a while since I got interested in the idea of using html (instead of .doc. or .odf) as a standard for saving documents — including the more official ones like academic papers. The problem is using HTML to create pages with a stable size that would deal with bibliographical references, page breaks, different printers, etc. Does anyone think it is possible to develop a decent tag like 'div,' but called 'page,' specially for this? Something that would make no use of CSS? Maybe something with attributes as follows: {page size="A4" borders="2.5cm,2.5cm,2cm,2cm" page_numbering="bottomleft,startfrom0"} — You get the idea... { /page} I guess you would not be able to tell when the page would be full, so the browser would have to be in charge of breaking the content into multiple pages when needed. Bibliographical references would probably need a special tag as well, positioned inside the tag ..." Is this such a crazy idea? What would you advise?

68 of 338 comments

  1. LaTeX by Anonymous Coward · · Score: 5, Informative

    You seem to be talking about LaTeX. It already exists. Don't reinvent it.

    1. Re:LaTeX by Liquidrage · · Score: 5, Insightful

      Good answer. Now all we need is for someone to mod your post up to +5 and then lock the thread.

      *sigh* it's a slow news day

    2. Re:LaTeX by The+Snowman · · Score: 5, Informative

      You seem to be talking about LaTeX. It already exists. Don't reinvent it.

      Another alternative is RTF, which, like HTML, is a plain-text markup format (though, unlike HTML, it is not derived from SGML). While it may have drawbacks, it would accomplish most if not all of what is required.

      --
      24 beers in a case, 24 hours in a day. Coincidence? I think not!
    3. Re:LaTeX by Anonymous Coward · · Score: 3, Informative

      No one really writes RTF by hand though. DocBook is more like what this person is suggesting.

    4. Re:LaTeX by fermion · · Score: 4, Informative
      LaTeX is really the solution. There is no reason to reinvent the wheel. In fact, reinventing the wheel might cause problems when submitting papers. From what I have seen, many academic journals prefer .tex and .eps files. I can't imagine what they would do with HTML.

      The nice thing about LaTeX is that, like HTML, it is a pure markup language, but it is a markup language that understands typesetting so one tends to get a good page layout no matter what. OTOH, HTML merely identifies blocks of text as various generic types, and really does not have a context for the types. The render engine is free to visualize, make a sound, or do whatever it wishes to represent the blocks. CSS is what imposes a consistent visual framework, so what one needs to duplicate LaTeX is in fact CSS.

      Use LaTeX. Except for the often limited fonts, it is vastly superior to a word processor, because a word processor is not the right tool to create real documents. We have known that for many years. That is why people bought PageMaker. And I think the lack of fonts forces people to create compelling content. LaTeX is free, there are many good books, and if you do have a hankering to code, you can always play with TeX.

      --
      "She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
    5. Re:LaTeX by Liquidrage · · Score: 2, Informative

      Sadly I have. I've spit out RTF docs from websites many moons ago back when we had lots of printing issues on the web (still sucks, but at least now it's manageable).

      In hindsight there was probably a better way...but I was young, and string manipulation is easy.

    6. Re:LaTeX by Anonymous Coward · · Score: 5, Informative

      Actually, the font problem is solved by using XeLaTeX (which uses XeTeX).

      Full OpenType support. Looks amazing.

    7. Re:LaTeX by dangitman · · Score: 2, Informative

      LaTeX is a solution for certain (usually niche) purposes, but has its drawbacks, like everything else. The problem is that we are dealing with an online world, where things are published in several different formats, online and offline. LaTeX doesn't translate easily or cleanly into HTML, or vice-versa. And good luck getting people outside of math and science academia to use LaTeX.

      This is a real problem, and shouldn't simply be brushed aside with "use this" comments. There currently is no workable format, no lingua franca for multipurpose documents. A solution should at least be attempted to make the web, word processing, page layout and typography interoperable.

      Don't get me wrong, I actually like LaTeX, but it's an exercise in frustration trying to integrate it with the rest of the world.

      --
      ... and then they built the supercollider.
    8. Re:LaTeX by femto · · Score: 5, Informative

      LyX. I wrote a thesis in it and didn't have to resort to any manual interventions in the generated LaTeX. Couple it with SVG diagrams, generated by Inkscape, and you have a seamless authoring system that handles both text and graphics. SVG means there is no messy task of keeping source and PostScript output synchronised (just right click a diagram within LyX to edit the SVG source with Inkscape). Use gnuplot to generate your (PostScript) graphs and you have pretty well a complete authoring system. A few years ago, LyX and Inkscape were too immature to use seriously, but they have matured. I recommend the combination.

    9. Re:LaTeX by Petrushka · · Score: 5, Interesting

      I have a sneaking suspicion that when the OP is saying things like "no CSS" and doesn't mention LaTeX, s/he is actually giving specifications in a very obfuscated way -- specifications that need to be deduced. What I take from the post is that the OP wants

      • Portability. Anyone can open an HTML file without having to install new software; the same doesn't go for ODF, LaTeX, or MSWord. I suspect this is the main thing the OP wants. But this shouldn't rule out CSS.
      • Everything in one file: I'm guessing this may be why the OP doesn't want CSS. But that's not a good reason to avoid CSS either, since CSS can perfectly easily go in the same file. (I think it does rule out editing the XML in ODF documents, though, since as far as I'm aware they're always a composite of several files.)
      • Read/edit in the same document. This could be another reason why the OP doesn't mention LaTeX. LaTeX is perfect for editing, not so great for reading: for that you have PDF. Maybe the OP doesn't want to have two separate files like that.

      I'm guessing the OP has been inspired by the use of HTML for slide presentations, in the form of S5. I can see that. But the specifications, if I've deduced them correctly, are not hugely well-thought-out ones. I can kind of see someone not wanting to use LaTeX for the reasons given above, but insisting on no CSS is crazy.

      In any case, the OP should certainly give slightly clearer specifications if s/he doesn't want to have people yelling "LaTeX!!!" all day.

    10. Re:LaTeX by Anonymous Coward · · Score: 2, Informative

      I think you are overstating the enjoyment possible when using LaTeX :)

    11. Re:LaTeX by plasticsquirrel · · Score: 5, Informative

      Use LaTeX. Except for the often limited fonts, it is vastly superior to a word processor, because a word processor is not the right tool to create real documents. We have known that for many years. That is why people bought PageMaker. And I think the lack of fonts forces people to create compelling content. LaTeX is free, there are many good books, and if you do have a hankering to code, you can always play with TeX.

      LaTeX has limited fonts, but if you use XeTeX (which uses LaTeX), you can use not only all the LaTeX stuff but also any TrueType or OpenType font, and you get native Unicode support as well. This is a godsend for typesetting anything that includes words or characters not in English, or just for people who are picky about typography. My personal favorite font is the open source SIL Gentium family, which is not only much more beautiful and readable than Times, but contains a fuller character set that makes it compatible with many more languages. Once you start writing documents with XeTeX and nicer fonts, you see how lacking word processors are for good typography and well-structured documents, and how self-limiting the concept is.

      For newcomers to LaTeX and XeTeX, including packages and specifying options can be a bit time-consuming when you just want to get started with a basic A4-sized document. Here is the basic XeTeX file template I use for simple stuff. I'm picky about margins, line spacing, fonts, etc. so you know that it's a safe place to start out.
      -

      \documentclass[11pt]{article}

      %%
      % XeTeX packages
      %%
      \usepackage{fontspec}
      \usepackage{xunicode}
      \usepackage{xltxtra}

      %%
      % Formatting packages
      %%
      \usepackage{setspace}
      \usepackage[vcentering,dvips]{geometry}
      \usepackage{fancyhdr}

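      % US Letter paper; for A4, use papersize={210mm,297mm} and adjust total (and the \pdfpage... lengths below) to match
      \geometry{papersize={8.5in,11in},total={6.5in,8.8in}}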

      \pdfpagewidth 8.5in
      \pdfpageheight 11in

      % 10pt font: 1.15
      % 11pt font: 1.1
      \setstretch{1.1}

      \setmainfont[Mapping=tex-text]{Gentium Basic}

      \begin{document}

      Some text...

      \end{document}

      --
      Systemd: the PulseAudio of init systems
    12. Re:LaTeX by moosesocks · · Score: 5, Insightful

      Although I agree with you in that LaTeX is widely used in the scientific community, and unambiguously offers the best typesetting facilities you'll find outside of a publishing house, is it still appropriate today?

      The internet as we know it was created at CERN to facilitate the sharing of scientific information. Why are we still publishing in a format designed to be presented on dead trees?

      Like it or not, a properly-formatted print article looks horrible on a screen. An article formatted for printing on A4 or Letter-sized paper will use the whole width of the page, be set in 10-point type, and use columns. Unfortunately, modern computer screens don't have nearly enough resolution to display the full width of the page alongside much else. Obviously, PDF files also don't have the ability to flow to fit the width of the screen.

      LaTeX also doesn't give you the benefit of hypertext. Yes, there are various hacks you can use to add anchors and links to PDFs, although these are mere hacks on top of a broken format. Things such as high-resolution figures and hyperlinked references would be particularly beneficial for academic uses. It'd also be great to be able to see all articles linking back to what you happen to be reading. (This brings up all sorts of questions about the very nature of scientific publishing, although this is another debate entirely)

      Wikipedia (more specifically, MediaWiki) actually offers a promising solution to these (and the original poster's) requirements. It provides a convenient and simple markup for multi-sectioned articles, flows to fit the width of the page, and also provides LaTeX's fantastic mathematical typesetting facilities. Hyperlinking to other parts of the Wiki (and to external sites) is extremely easy. I'm sure the DOI system could be integrated to allow linking back to other articles within the constraints of the existing academic publishing regime.

      Google could very easily provide the "glue" to hold such a system together, although it would ultimately be better to put a public, non-profit entity in charge. It's absurd and hypocritical that so much of academic research (particularly the publishing part of it) is profit-driven.

      --
      -- If you try to fail and succeed, which have you done? - Uli's moose
    13. Re:LaTeX by nine-times · · Score: 2, Interesting

      On the other hand, .tex files don't render so well if you drop them into web browsers. I mean, which format you choose really should depend on what your needs are. If you want to be able to store a single copy that can be opened in a web browser or an office suite, HTML isn't necessarily a bad choice, but at this point you won't get great layout control. I really think it's reasonable to hope that as new versions of HTML and CSS come out, they should be aiming towards enabling people to have a good "print" media type CSS that gives professional layout results, but we aren't there yet. We aren't even really close.

      If you want people who know what they're doing to be making/editing these documents, then LaTeX may be a good choice. If you want normal everyday people to be able to open the file in an office suite they're comfortable with, then ODF is worth considering. If you want a widely supported format only for display/printing purposes (no editing) and you want tight layout control, then you won't do better than PDF.

      At this point, there is no format that does it all without any downsides. You have to pick the best tool for the job.

    14. Re:LaTeX by thefringthing · · Score: 3, Informative

      Most distributions of LaTeX come with some way to compile it and export to PDF, which is both portable and readable.

    15. Re:LaTeX by AigariusDebian · · Score: 3, Insightful

      LaTeX describes the document, just like HTML describes it, but with more structure about it. What you see in a web browser is not HTML, it is a *rendering* of HTML. Different browsers render the same HTML differently, for example a mobile browser will skip stuff and reformat other stuff. In the same way there are different LaTeX renderers - some output PDF, some output HTML, some output ODF. It would be much easier to use LaTeX for the source document and then compile a pretty HTML file and a PDF file from it than to craft it in HTML or some other XML variant directly. You can have links in LaTeX that render as normal links in the HTML output. Why stress?

    16. Re:LaTeX by AceofSpades19 · · Score: 3, Insightful

      I'm pretty sure everyone who has a browser has a PDF reader to read PDFs written in LaTeX

    17. Re:LaTeX by andymadigan · · Score: 2, Insightful

      I've written few LaTeX documents that will work out of the box on most distributions, let alone all of them. Realistically, I can't send a LaTeX document to someone else and expect them to be able to edit it and read it, even if they have LaTeX installed.

      Unfortunately any program able to handle everyone's different styles for document printing is probably going to be too specialized for everyone to have. LaTeX shows that print layouts are a difficult problem. Even on webpages (screen display), to get really good layouts we rely on scripts, styles and templates from other sources, in most cases these are too numerous to make distribution of the document via e-mail trivial. Plus, we use specialized software (e.g. Dreamweaver).

      Unfortunately there's no good solution that I know of for this. Simply throwing text and images into a document does not make it readable, and there's no software that can simply take the jumble and make it readable, it takes a human touch to produce a good layout.

      --
      The right to protest the State is more sacred than the State.
    18. Re:LaTeX by MightyMartian · · Score: 2, Interesting

      Why is LaTeX archaic, but HTML not? LaTeX is infinitely more powerful when it comes to layout. Even with CSS, HTML has to be bent over backwards. But it's deeper than that. HTML is about rendering documents on many variant platforms and displays. It's about general text flow, not about pages. I think you could probably do what you wanted with CSS, but it would be horrible to code, and would be utterly screwed up on any screen that didn't match yours in resolution or in ratio. What about people who don't maximize their browsers? I mean, the whole point of HTML is that it can deal with all sorts of browser window sizes. I find CSS that explicitly sets screen real estate to be an incredible pain in the ass, and it essentially breaks the browser. What you're proposing would be even worse.

      Screens are not pieces of paper. They are entirely different mediums with entirely different requirements. Maybe if we all had 30" monitors, it would make some sense, but that's not the trend (quite the opposite, in fact: we're going to smaller and smaller screens on netbooks and smartphones).

      HTML has its place, but there are proper typesetting languages out there, designed specifically for what you want. So learn them.

      Now, like I said, I think it would be very cool if someone would design a plugin that would allow browsers to render LaTeX files. That would be quite awesome, although I guess the reasonable argument could be made that just simply compiling them as a PDF would accomplish the same thing. But any such plugin would have to pretty much dispense with most of the typesetting markup or again, we'd be breaking the browser.

      --
      The world's burning. Moped Jesus spotted on I50. Details at 11.
    19. Re:LaTeX by Nitewing98 · · Score: 4, Informative

      Yes, not using CSS is crazy. Especially since CSS includes the concept of media types. A simple "media='print'" in the CSS spec should handle that.
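
      A minimal sketch of a print-only rule set, assuming a page with screen-only navigation; the class names are hypothetical and would need to match the document's own markup:

      ```css
      /* Applied only when the document is printed. */
      @media print {
        /* Hypothetical class names: hide screen-only chrome on paper. */
        .navigation, .sidebar { display: none; }
        /* Physical units and plain black-on-white suit paper better than pixels. */
        body { font: 11pt/1.4 Georgia, serif; color: black; background: white; }
      }
      ```

      The same rules could instead live in a separate stylesheet linked with media="print"; which is more convenient depends on whether everything has to stay in one file.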

      --

      Nitewing '98

      Everything works...in theory.

    20. Re:LaTeX by Idbar · · Score: 2, Informative

      It's the eternal fight between interpreted and compiled languages. You are arguing that HTML is easier to read. Well, it's not, and neither is LaTeX. The fact that you have an interpreter that's more common (a browser) is another thing. But don't be confused: reading plain HTML is as annoying as reading a TeX file, or more so.
      Please also remember that the "compiled" output of latex is dvi, not ps or pdf.

    21. Re:LaTeX by amicusNYCL · · Score: 2, Informative

      Exactly. Until I can use a server scripting language to open a (binary) PDF template and do string find-replace, and not break the document, I'll stick with the ASCII-based RTF format.

      There are a lot of people who seem to be screaming "LaTeX" as the solution to this non-problem. This is specifically what the PDF format was created for: to create a portable document that renders and prints the same anywhere. HTML is a fluid layout to fit various resolutions, PDF is guaranteed to print the same on any machine. That's exactly what it's for. There's no reason to bring LaTeX into the picture when you can export a regular word processing document to a PDF. Regardless of whether or not you can export to PDF from LaTeX, there's no reason to use LaTeX specifically vs. Word or Open Office or Acrobat or whatever else. The OP didn't even mention that the document should be editable by anyone other than the author.

      It's been quite a while since I got interested in the idea of using html (instead of .doc. or .odf) as a standard for saving documents

      PDF: you can export from your word processor, you can generate one easily from PHP, you can use Adobe's bloated software, whatever you want to use. It supports images, links, page size and margins, and it's guaranteed to print the same way on any printer. That's specifically what it's for.

      --
      "Our two-party system is like a bowl of shit looking at itself in a mirror." - Lewis Black
    22. Re:LaTeX by pjpII · · Score: 2, Insightful

      LaTeX is really the solution. There is no reason to reinvent the wheel. In fact, reinventing the wheel might cause problems when submitting papers. From what I have seen, many academic journals prefer .tex and .eps files. I can't imagine what they would do with HTML.

      Actually, that may be true of some academic journals, but most deal primarily with MS Word documents. Some publishers might grudgingly deal with LaTeX documents (I know John Benjamins mentions it in their style requirements), but the people who run conferences and therefore are in charge of submitting the proceedings tend not to be computer savvy enough to work with anything other than MS Word files (god save whoever has to deal with the millions of random fonts people use, use/non-use of styles, etc).

      This of course depends on your field - in Comp Sci, I'd wager there're many more journals that regularly accept LaTeX files. In linguistics, it's somewhat rarer, and as you get further into the humanities, it becomes increasingly difficult to find anyone who's heard of LaTeX at all.

    23. Re:LaTeX by moosesocks · · Score: 2, Insightful

      99.99% of LaTeX is output straight to PDF, Postscript, or (in special cases such as Wikipedia's math renderer) a rasterized image. The documentation, plugins, and user community of LaTeX all reflect this.

      I haven't come across any serious usage of LaTeX in the manner that you describe it.

      ODF and HTML do not support the full set of typographic features that LaTeX does. Something will almost certainly be lost in translation.

      Although I suppose it's possible to craft a source document that would look good both in print and as free-flowing hypertext, you'd need a zen-like command of the language. LaTeX has enough quirks as it is. I have a very difficult time accepting this as a practical solution.

      Also, if you're crafting a hypertext document, why not start with a language specifically designed for the task?

      --
      -- If you try to fail and succeed, which have you done? - Uli's moose
    24. Re:LaTeX by rhathar · · Score: 5, Insightful

      Portability:

      Name me an OS that doesn't have a PDF or PS reader installed by default.

      Windows

      --
      http://www.chaotickingdoms.com
    25. Re:LaTeX by (pvb)charon · · Score: 2

      Anyone can open an HTML file without having to install new software;

      Except for Windows 7 users in Europe...

    26. Re:LaTeX by colinrichardday · · Score: 2, Informative

      Because HTML continues to evolve; LaTeX does not.

      Really? How long is it taking the W3C to release HTML 5?

      It may be powerful at layout (but not as powerful as something like InDesign or Quark) but what I'm talking about is something that encompasses page layout, web design, and semantic markup.

      LaTeX is working on semantic markup
      http://tug.ctan.org/cgi-bin/ctanPackageInformation.py?id=stex

      and

      http://tug.ctan.org/cgi-bin/ctanPackageInformation.py?id=cool

      As for web design, people are working on converting LaTeX to MathML.

      it's also to be able to use that file to create decent web pages without any modification, that works with content management systems and the like.

      And how do you presume to get all the browser vendors on board?

    27. Re:LaTeX by TheRaven64 · · Score: 3, Informative

      CSS has had attributes for page breaks since CSS 2.1. I've not played with them for a while, but Opera supported them correctly back in 2003, so I'd imagine that they work well now. You can find the relevant part of the specification here. It lets you specify margins, soft breaks, rules for two-sided printing and so on. I played with the idea of using HTML instead of LaTeX for a while, but decided not to for two reasons. First, LaTeX is easier to type and read, and second because LaTeX produces nicer-looking output.
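
      As a rough illustration of the CSS 2.1 paged-media features mentioned above (forced breaks, soft-break hints, per-side margins), here is a short sketch. The element choices and margin values are assumptions for the example, and browser support has historically been uneven:

      ```css
      /* Start every top-level heading on a fresh page. */
      h1 { page-break-before: always; }
      /* Soft break hint: avoid a page break directly after a subheading. */
      h2, h3 { page-break-after: avoid; }
      /* Keep at least three lines of a paragraph together at a page edge. */
      p { orphans: 3; widows: 3; }
      /* Two-sided printing: the wider 3cm margin goes on the binding side.
         Margin order is top, right, bottom, left. */
      @page :left  { margin: 2cm 3cm 2cm 2cm; }
      @page :right { margin: 2cm 2cm 2cm 3cm; }
      ```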

      --
      I am TheRaven on Soylent News
    28. Re:LaTeX by TheRaven64 · · Score: 3, Informative

      LaTeX is a macro system built on top of TeX. XeTeX is a TeX implementation. Whether any given LaTeX package will work with XeTeX (or any other TeX implementation, like pdftex) depends on whether it uses any implementation-specific features. Most don't.

      --
      I am TheRaven on Soylent News
    29. Re:LaTeX by cronostitan · · Score: 2, Informative

      I recently had to implement a proper print view with CSS covering several pages of print output. What I can tell is that it is a pain in the a**, since a lot of print-specific CSS properties are not supported by current browser versions, Opera being the exception to the sad rule. For example, most browsers do not support the property that keeps divs intact and does the page break automatically before or after the div. We ended up having the user decide when he needs a page break.
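
      For reference, the property described here is presumably page-break-inside. A minimal sketch, assuming the blocks to keep intact carry a hypothetical class name:

      ```css
      @media print {
        /* Ask the renderer not to split these blocks across pages;
           it should move the break before or after the block instead.
           "keep-together" is a made-up class name for this example. */
        div.keep-together { page-break-inside: avoid; }
      }
      ```

      Later CSS drafts spell the same idea as break-inside: avoid. Either way it is only a request to the renderer, and a block taller than one page still has to be split somewhere.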

      I honestly can understand that someone gets frustrated and wants to use a 'better' way.

      --
      Spelling errors were made for your amusement only...
    30. Re:LaTeX by Anonymous Coward · · Score: 2, Insightful

      Please also remember that the "compiled" output of C is PDP-11 machine code, not x86 code or CLR code.

    31. Re:LaTeX by 1u3hr · · Score: 2, Insightful
      We need to bring sanity in document formats to the average user. To the professor of arts who doesn't know anything about a computer except how to use Word. I have wished for a long time that people who write and publish would develop some sort of typographical literacy, but the reality is that it's never going to happen.

      I edit and lay out books, and the worst problems are when the author DOES think he has typographic literacy. If I let them have their way they print books set in Arial 10 point with vertical quotemarks, and I could go on.

      It took me months of study and years of practice to get the degree of typographic literacy I have now.

      They will just waste their time on details that will actually impede the publishing process if not stripped out.

      It's like cutting your own hair -- yes, you can do it. But you're less likely to make a fool of yourself if you pay someone who does it all day long a couple of dollars to do it right.

      Authors ideally should not be concerned with visual layout. They just need to make sure that the logical structure (headings, notes, location of diagrams) is clear. Doesn't matter if they use Courier or Comic Sans.

    32. Re:LaTeX by smallfries · · Score: 4, Informative

      Further to the two AC posts: you are doing something wrong. As an academic I send / receive LaTeX source all the time and expect it to compile and reproduce exactly the same results. How are you abusing TeX to get these kinds of problems?

      --
      Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
    33. Re:LaTeX by YttriumOxide · · Score: 2, Interesting

      I'll ignore for the moment that "Workbench" isn't an OS (I'm assuming you mean AmigaOS), but I do want to point out that Workbench has had a PDF reader since version 4.0 (6 years ago), and a nice frontend for GhostScript as well.

      --
      My book about LSD and Self-Discovery
      Also on facebook as: DroppingAcidDaleBewan
    34. Re:LaTeX by TerranFury · · Score: 4, Interesting

      But... not everything about the PDF is specified by the LaTeX source -- and the toolchain matters. For instance, a document prepared for pdflatex with pdf figures and another prepared for the latex-->dvips-->ps2pdf route (which is often necessary as a number of journal styles use some pstricks) will in general not work with the opposite toolchain. Another example is paper size; certain of these tools output either letter or A4 by default, and must be instructed on the command line (or, really, in your build scripts) when you want the other (I know you can specify paper sizes in the source, but this is lost somewhere in the toolchain).

      Download Ubuntu on one computer. Use apt to install Kile and all its dependencies. Compile a paper written with the IEEE conference style. Now install Windows on another computer. Install MiKTeX on it and do the same. You will get similar output, but it will by no means be identical. The most noticeable thing is that margins are different.

      Oh, and so far I've ignored in this discussion that different styles will use different methods for including, say, theorems. It is a pipe dream to simply change the style of a document and expect decent results. Chances are the damn thing won't even compile -- and if it does, all your beautiful theorems will look like crap because the other style expected some different markup for them.

      I don't rule out that I'm doing something wrong, and if I am, I could stand to use some enlightenment. But I know that I don't use LaTeX significantly differently from anyone else I know...

  2. Congratulations! by Anonymous Coward · · Score: 5, Insightful

    Congratulations, you're the 5,134,978th person to suggest a change to HTML which will prevent it from being reflowable!

    Please step up to the spiked door in front of the acid pit to claim your prize.

    1. Re:Congratulations! by Memroid · · Score: 5, Funny

      Speaking of HTML/CSS... can I be the first person to suggest that we rename "Anonymous Cowardon" back to "Anonymous Coward"?

  3. PDF? by sys.stdout.write · · Score: 5, Informative

    As much as I hate Adobe, there's a reason why PDF files dominate academia.

  4. CSS3 is the solution by Tiles · · Score: 5, Informative

    This is exactly what CSS is designed for, presentation. The CSS3 Paged Media module already defines a number of the properties and settings you're going for. It even includes positions such as @bottom-center to allow you to position footnotes and references. The only thing missing is a way to mark this up in HTML, which could easily be done with anchors and the longdesc attribute, coupled with the CSS content: property. What you're looking for is a CSS3 enabled browser, not a new specification.
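
    To make that concrete, here is a sketch of how the attributes from the original question (A4, 2.5cm/2.5cm/2cm/2cm borders, a page number at the bottom left) might map onto the Paged Media module. The margin order is an assumption, since the question does not say which border is which, and margin boxes such as @bottom-left are honored mainly by print-oriented CSS renderers rather than by every browser:

    ```css
    @page {
      size: A4;
      /* Assumed order: top, right, bottom, left. */
      margin: 2.5cm 2.5cm 2cm 2cm;
      /* Margin box in the bottom-left corner showing the running page number. */
      @bottom-left { content: counter(page); }
    }
    ```

    The renderer, not the author, decides where each page ends, which is the behavior the question asks for.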

  5. ODF by minsk · · Score: 2, Insightful

    LaTeX already got mentioned, and probably makes more sense.

    If you really want an unreadable super-general XML-based format, use ODF.

  6. Have you looked at PrinceXML? by sandford · · Score: 3, Informative

    Is there a reason you don't want to use CSS? Because there are already CSS extensions that do exactly what you want. The book Cascading Style Sheets: Designing for the Web was written using only HTML and CSS and prepped for printing using PrinceXML. The PrinceXML web site has a bunch of similar HTML+CSS samples, including academic papers.

    1. Re:Have you looked at PrinceXML? by ccvqc · · Score: 2, Interesting

      I've had great experience with PrinceXML -- same document to generate both interactive web page and printable PDF using CSS3 tailored to the media. If you already know HTML/CSS, extending yourself to CSS3 is a lot easier than learning LaTeX.

  7. Wrong, in many ways by Zaffle · · Score: 5, Insightful

    What you want (being able to define pages) is wrong in many many ways.

    You should, as an authoring tool, never define a page, or its dimensions, especially academic works, which will be printed in different formats, on different paper (A4/Letter/Tradeback/etc/etc)

    At most, whatever markup you have may define things like page breaks, but even then, they are more a typesetting issue.

    What you want is either LaTeX or DocBook.

    --

    I use to have a funny sig, but slash cut it off, and I forgot what the punchline was.
    1. Re:Wrong, in many ways by Ambiguous+Coward · · Score: 4, Funny

      You should, as an authoring tool, never...

      Who're you callin' a tool?

      --
      Their may be a grammatical error, misspeling, or evn a typo in this post.
  8. Static Page Feeds are available by caffiend666 · · Score: 4, Informative

    Static configurations are available already, not the intelligent ones being requested. Has sufficed for what I needed:

    To force a page break when printing, add: <p style="page-break-before: always">

    Also, to hide odd font and underline for links:

    <STYLE TYPE="text/css" MEDIA=print> <!-- A { text-decoration: none; color: black } --> </STYLE>

    Yes, they have to be massaged a little.

    --
    Here's to losing my Karma Bonus again....
    1. Re:Static Page Feeds are available by dgatwood · · Score: 2, Insightful

      The HTML comments inside of style tags are still a good idea. Although no modern browser requires it, not everything that parses HTML is a full-blown web browser. Those extra seven bytes don't hurt anything, and they pretty much guarantee that any code with anything resembling a proper HTML parser won't interpret the styles, JavaScript, etc. as content even if the tool doesn't understand or care about specific tags.

      Perhaps more importantly, from a purely philosophical point of view, leaving out the comments in style tags is wrong. That line noise is not part of the content, and therefore should be fundamentally separated from the presentation. Other stuff like that (link URLs, image URLs, inline styles, etc.) are all in HTML attributes or otherwise sequestered from the text content. Putting CSS or JavaScript bare inside a tag without surrounding it with comment markers violates the fundamental philosophy of HTML. Yes, this means the XHTML spec is fundamentally defective by design.

      I'll leave the uppercase/lowercase flame war to people who care.

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

  9. hey, why don't we... by Anonymous Coward · · Score: 2, Insightful

    create yet another little-used and poorly supported document format...

  10. Re:Nope by nlawalker · · Score: 3, Informative

    I should have been more clear: HTML describes the *structure* of a document, of which pages are not a part.

    As many have said above, you could use CSS if you really wanted to, since page specifications are presentational aspects of the document. Or, you could use LaTeX, which is designed for this kind of use.

  11. You don't actually want HTML by Anonymous Coward · · Score: 2, Insightful

    Seriously. It's pretty bad. You can, however, use Docbook (or your own schema or Docbook extended with your own stuff) and XSLT it into XHTML (or something entirely different) at the end.

    Most likely you just want to use Latex though.

  12. what do you want to do? by jipn4 · · Score: 4, Informative

    If you want to save the source form or markup, use a language designed for it: LaTeX. LaTeX lets you represent all the things you would want to represent in an academic paper, it's fairly readable, very widespread, and has tons of tools. And LaTeX converts to both HTML and PDF.

    If you want to display on the web, use HTML. It's meant for the web. It's not a good representation for paged media. If you must represent paged media, you need to use CSS or XSL, but you probably don't want to.

    If you want archival quality paged representations, PDF is the only game in town really. HTML with CSS doesn't come close. But it doesn't make sense to save your own papers only in PDF because PDF is not really editable and doesn't have the semantic information.

  13. Don't use HTML by emandres · · Score: 3, Informative

    You wouldn't want to use HTML for something like this, especially with newer versions of HTML. There has been a steady transition in HTML away from specification of the aesthetic appearance of a page. For this reason tags like and are considered nonstandard anymore, mostly because CSS does a way better (and cleaner) job of it.

    --
    The only way to tell the difference between a hamster and a gerbil is that the hamster has more white meat.
    1. Re:Don't use HTML by Wonko+the+Sane · · Score: 3, Insightful

      HTML was never supposed to do those things in the first place. The tags you are referring to were hacks invented because CSS did not exist yet.

      Unfortunately there is a whole generation of "web developers" who don't understand the concepts of semantic markup and output device-independent layouts.

  14. In my day by Barny · · Score: 2, Insightful

    I used netscape communicator to write all my papers for uni, mainly because it was available under windows and unix (IRIX in our case) and could be read by anyone on any platform.

    It was a reasonably easy to use editor, without all the useless crap most others have.

    A few lecturers were quite impressed with the idea, the portability and cost were big factors.

    --
    ...
    /me sighs
    1. Re:In my day by tbird81 · · Score: 2

      But the amount of time people waste on page breaking where they want, font selection, "just so" footnote standards, etc. is a sign of people who don't have anything to actually say.

      While you're probably correct, I spent ages choosing the correct font, making sure pair-kerning was on, making sure headings were standard in size and form (i.e. setting up new Styles), getting rid of widows/orphans, and basically all sorts of procrastinating shit to delay actually typing the essay.

      I always did much better than the people with non-professional looking assignments, even though most of what I write is crap.

      Never underestimate the power of first-impressions and of looking good.

  15. XSL:FO by Roxton · · Score: 4, Informative

    There's a little-used standard that came out of the W3C along with XSLTs called XSL:FO. You write your document in XSL:FO markup, and then use one of any number of processors, such as XEP, to convert it into PDF or what have you.

    http://www.w3schools.com/xslfo/default.asp

    One of the original purposes of it was so that you could use XSLTs to transform the same XML data into both XHTML or XSL:FO for publishing. The standard never took off though. XSL:FO just doesn't have enough options to be typographically interesting, compared to SVG.

    Of course, the right answer is LaTeX, but you might want to give XSL:FO a try for familiarity's sake.

    1. Re:XSL:FO by styrotech · · Score: 2, Informative

      You write your document in XSL:FO markup, and then use one of any number of processors, such as XEP, to convert it into PDF or what have you.

      Ouch :)

      Hand writing XSL:FO is extremely painful - very fiddly and the embedded layout/styling gets tedious quickly. It's kinda like writing a very very long webpage using HTML 3.2 with all the nasty old embedded presentation tags (but worse).

      One of the original purposes of it was so that you could use XSLTs to transform the same XML data into both XHTML or XSL:FO for publishing.

      I have a feeling that is a bit backwards. The original standard was XSL and it was going to include everything related to transforms and publishing, but it got too large and complex so they split it into XSLT for the transforms and XSL:FO for the page description language. Much better that way, as XSLT has wider uses than publishing.

      I think XSL:FO was always intended to be generated via XSLT rather than hand written, and I don't think that has changed at all. That way if you only need to make a styling change, rather than making a zillion edits throughout the document you change the transform. It is analogous to how CSS makes styling changes much easier with HTML.

      Personally I'd rather use some other semantic format (eg Docbook, DITA etc) that can be transformed into XSL:FO via XSLT when required (eg on the way to PDF generation). That way you already get some handy XSLT starting points to work with. Making the occasional small tweak to XSLT isn't too bad, but writing a large complex set of transforms from scratch isn't something I'd want to do :)

  16. Use DITA by wooden+pickle · · Score: 4, Informative

    Someone mentioned XML/XSL/FO. Don't try to write your content in XSL-FO. You'll hate every minute of it.

    I'd look in to using DITA (Darwin Information Typing Architecture). It's a set of canned XML structures, plus a specification for how to process and customize those structures. It includes tags for stuff like footnotes...I bet it covers a lot of your use cases. There are some good intros to how these XML structures work here: http://dita.xml.org/book/dita-wiki-knowledgebase

    As DITA is XML, you can convert it to HTML and whatever else you feel like, pretty easily. There's an open-source implementation of the DITA spec called the DITA Open Toolkit (http://sourceforge.net/projects/dita-ot/). The DITA Open Toolkit includes stylesheets/scripts to publish HTML and PDF, among other things. PDFs are published via XSL-FO. Just like HTML needs a web browser to render something useful, XSL-FO requires an FO processor to create a PDF. So, in the end, you write DITA; XSLT and other scripts transform that DITA to XSL-FO; then an FO processor consumes the XSL-FO and spits out a PDF. The DITA Open Toolkit comes with an open-source FO processor (Apache FOP). FOP doesn't fulfill everyone's needs, but it might work very well for you.

    Unfortunately, working with the Open Toolkit and customizing its output can be a bit unwieldy. http://groups.yahoo.com/search?query=dita+users is a pretty good place to look for help.

  17. texexplorer by e**(i+pi)-1 · · Score: 4, Interesting

    Yes, LaTeX is nice, but it would be even better if basic TeX were understood by browsers. About 10 years ago, IBM had a cool plugin called techexplorer. The plugin would compile LaTeX on the fly. No need to publish a PDF. It worked pretty well for basic documents that did not rely on macros.

    Still, to address the question of the submitter, it would be nice to have something like

    <latex>
    $\int_0^1  \frac{\sqrt{\sin(x)}}{1+x^2} \; dx$.
    </latex>

    It would not have to be the full LaTeX stack, just the ability to place mini LaTeX pages into HTML documents. It's a pity techexplorer technology seems to have disappeared. If IBM would open-source it, it could become an add-on for Firefox.

    1. Re:texexplorer by Yobgod+Ababua · · Score: 2, Informative

      That example sounds like it could be well rendered using MathML... http://www.w3.org/Math/
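
      As a rough sketch of what that could look like, here is the integral from the parent post written by hand in presentation MathML. This is an illustration only, not the output of any converter, and it assumes a browser or plugin that renders MathML natively.

      ```html
      <!-- Hand-written presentation MathML for the parent's integral example. -->
      <math xmlns="http://www.w3.org/1998/Math/MathML">
        <mrow>
          <!-- integral sign with lower limit 0 and upper limit 1 -->
          <msubsup><mo>&#x222B;</mo><mn>0</mn><mn>1</mn></msubsup>
          <mfrac>
            <!-- numerator: sqrt(sin(x)) -->
            <msqrt>
              <mrow><mi>sin</mi><mo>&#x2061;</mo><mo>(</mo><mi>x</mi><mo>)</mo></mrow>
            </msqrt>
            <!-- denominator: 1 + x^2 -->
            <mrow><mn>1</mn><mo>+</mo><msup><mi>x</mi><mn>2</mn></msup></mrow>
          </mfrac>
          <mspace width="0.2em"/>
          <mi>d</mi><mi>x</mi>
        </mrow>
      </math>
      ```

      It is a lot more typing than the one-line TeX version, which is why MathML is usually generated by a tool rather than written by hand.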

  18. Re:wondering if we should let go of standard tags by Hecatonchires · · Score: 2, Insightful

    "You know, if we abstract this back one level" Now we find the true terror of computer science.

    --

    Yay me!

  19. Learn the tools first, then worry about changing by crmartin · · Score: 2, Informative

    See, as someone has already pointed out, there's at least one such tool that's in wide use already: TeX and LaTeX. If you don't like that one, it turns out that HTML, with CSS and a little bit of Javascript, is perfectly capable of doing all the things you want, too. You just have to learn how. Have a look at Lie's Cascading Style Sheets: Designing for the Web (written and typeset in HTML/CSS) and at Prince XML for detailed examples.

  20. You're on the right track, for the wrong reason. by mellon · · Score: 2, Informative

    The ability to cite an HTML document is something that would indeed be useful. The ability to hard code page numbers into an HTML document isn't. The reason why academia and the press have been so resistant to HTML, historically, is that you don't get any control over page layout. Which means that you can't refer to things by page number.

    The solution isn't to fix HTML so that you can number pages. It is to fix the bibliographic references to not use page numbers. Generally speaking, it's not hard to number documents by section, and you can make the numbering fine-grained enough for bibliographic references. Then refer to the chapter and section, rather than the page number in your bibliography, and you're done. No need to "fix" HTML.

    It might make sense to ID paragraphs in HTML, so that you could simply refer to the paragraph ID in your bibliography. If this were simply document metadata, and didn't have anything to do with layout, it would work pretty well. As a bonus, you wouldn't need to renumber, because the ID would just be an arbitrary cookie, and wouldn't need to make sense to a human.

    Of course, with hypertext, there's really no need for a bibliography anyway. Just link to the text you're referencing... But I realize that that's impractical in academia at the moment. I'm just saying...
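
    A minimal sketch of the paragraph-ID idea, using nothing beyond plain id attributes and fragment links. The IDs, the file name paper.html, and the citation text are all made up for illustration.

    ```html
    <!-- In the cited document: sections and paragraphs carry stable ids.
         The paragraph id is an opaque cookie; it never needs renumbering. -->
    <h2 id="s3-2">3.2 Results</h2>
    <p id="p-7f3a">First paragraph of the section being cited...</p>

    <!-- In the citing document: refer to the section or paragraph, not a page.
         "paper.html" and the link text are placeholders. -->
    <p>... as argued in
      <a href="paper.html#p-7f3a">Author (year), section 3.2</a>.</p>
    ```

    The printed form of the reference would then read "section 3.2" and stay valid regardless of paper size or font.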

  21. Re:wondering if we should let go of standard tags by Draek · · Score: 3, Insightful

    There's no complexity problem that cannot be solved by adding a layer of abstraction, nor performance problem that cannot be solved by removing a layer of abstraction.

    Though I must note that you can already define your own tags in HTML+CSS and, while the W3 validator will (rightfully) complain loudly about them, most browsers deal with them just fine.
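
    For instance, a sketch of the submitter's "page" tag done this way. The element name is not part of HTML, it will not validate, and it only forces a break after each block; it does not make the browser flow overflowing content onto a new page.

    ```html
    <style type="text/css">
      /* "page" is an invented element. Unknown elements are rendered
         inline by default, so make it a block first. */
      page { display: block; }

      /* When printing, start a new sheet after each page element. */
      @media print {
        page { page-break-after: always; }
      }
    </style>

    <page>
      <h1>Chapter one</h1>
      <p>...</p>
    </page>
    <page>
      <h1>Chapter two</h1>
      <p>...</p>
    </page>
    ```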

    --
    No problem is insoluble in all conceivable circumstances.
  22. Let CSS work for you! by Fireflymantis · · Score: 2, Insightful

    <html>
      <head>
        <title>Abstract of a usable design</title>
        <style type="text/css">
          @media print {
             body { margin: 2.5cm; }
          }
          @media screen {
             body { margin:  50px; width: 50%; }
          }
          body { font-family: sans-serif; font-size: 12pt; }
        </style>
      </head>
      <body>
        <h1>It's so crazy it just might work</h1>
        <h2>and other html inspired musings</h2>
        <p>Why not just use css?</p>
        <p>Also, don't worry about page numbering. that's the browser's job.</p>
      </body>
    </html>

  23. I am curious to know... by pdboddy · · Score: 2, Insightful

    Why the OP specified not using CSS and then suggested an HTML element that looks almost exactly like CSS?

    CSS has a method of creating pages, for printing and more. It's no more difficult to learn than HTML is. You could use XML, create all the custom tags you want, and use XSL (oh look stylesheets again) to style the XML however you want.
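
    As a sketch, the submitter's attributes map onto CSS paged media roughly like this. It assumes a formatter that implements @page margin boxes (print formatters such as Prince do; browser support is patchier), and it assumes the four "borders" values were meant as top, right, bottom, left. The "startfrom0" part is not covered here.

    ```html
    <style type="text/css">
      @page {
        size: A4;                      /* size="A4" */
        margin: 2.5cm 2.5cm 2cm 2cm;   /* borders="2.5cm,2.5cm,2cm,2cm" (order assumed) */

        /* page_numbering="bottomleft": print the page counter
           in the bottom-left margin box of every page. */
        @bottom-left {
          content: counter(page);
        }
      }
    </style>
    ```

    The renderer still decides where each page ends, which is exactly the behaviour the submitter asked the browser to take charge of.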

    HTML5 is coming out in the near or distant future; if you have suggestions for tags and functions, you might want to try to get involved with the W3.

    --
    Julie Moult is an idiot.
  24. How portable is HTML? by colinrichardday · · Score: 2, Insightful

    These different browsers render HTML differently.

  25. restructuredtext... by tyroneking · · Score: 2, Informative

    is made for what you're talking about

    Documents are written in human-readable text format - good for storing in version control and using for diffs
    Python Docutils is used to convert to HTML and/or LaTeX and a few other formats
    rst2pdf is a tool that converts to beautiful PDF (easier than using Docutils + LaTeX)