Slashdot Mirror


HTML Tags For Academic Printing?

meketrefi writes "It's been quite a while since I got interested in the idea of using html (instead of .doc. or .odf) as a standard for saving documents — including the more official ones like academic papers. The problem is using HTML to create pages with a stable size that would deal with bibliographical references, page breaks, different printers, etc. Does anyone think it is possible to develop a decent tag like 'div,' but called 'page,' specially for this? Something that would make no use of CSS? Maybe something with attributes as follows: {page size="A4" borders="2.5cm,2.5cm,2cm,2cm" page_numbering="bottomleft,startfrom0"} — You get the idea... { /page} I guess you would not be able to tell when the page would be full, so the browser would have to be in charge of breaking the content into multiple pages when needed. Bibliographical references would probably need a special tag as well, positioned inside the tag ..." Is this such a crazy idea? What would you advise?

338 comments

  1. LaTeX by Anonymous Coward · · Score: 5, Informative

    You seem to be talking about LaTeX. It already exists. Don't reinvent it.
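
    As a rough sketch of how the attributes in the question might map onto LaTeX: this assumes the geometry and fancyhdr packages, and it assumes the four "borders" values mean left, right, top, bottom, which the post does not actually say.

    ```latex
    \documentclass[11pt]{article}

    % size="A4" and borders="2.5cm,2.5cm,2cm,2cm"
    % (assumed order: left, right, top, bottom)
    \usepackage[a4paper,left=2.5cm,right=2.5cm,top=2cm,bottom=2cm]{geometry}

    % page_numbering="bottomleft,startfrom0"
    \usepackage{fancyhdr}
    \pagestyle{fancy}
    \fancyhf{}                          % clear the default header and footer
    \fancyfoot[L]{\thepage}             % page number in the left footer
    \renewcommand{\headrulewidth}{0pt}  % no rule under the (empty) header
    \setcounter{page}{0}                % first page is numbered 0

    \begin{document}

    Some text... LaTeX decides where the page breaks fall.

    \end{document}
    ```

    The point is only that page size, margins and numbering are declared once in the preamble, and the breaking of content into pages is left to the typesetter, much as the question wants the browser to do.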

    1. Re:LaTeX by Liquidrage · · Score: 5, Insightful

      Good answer. Now all we need is for someone to mod your post up to +5 and then lock the thread.

      *sigh* it's a slow news day

    2. Re:LaTeX by The+Snowman · · Score: 5, Informative

      You seem to be talking about LaTeX. It already exists. Don't reinvent it.

      Another alternative is RTF, which is a sister SGML language of HTML. While it may have drawbacks, it would accomplish most if not all of what is required.

      --
      24 beers in a case, 24 hours in a day. Coincidence? I think not!
    3. Re:LaTeX by nevhan · · Score: 1

      Indeed, LaTeX is standard in academia; I keep all my written work in LaTeX format. I then usually convert to PDF for submission. It's pretty.

    4. Re:LaTeX by Anonymous Coward · · Score: 3, Informative

      No one really writes RTF by hand though. DocBook is more like what this person is suggesting.

    5. Re:LaTeX by fermion · · Score: 4, Informative
      LaTeX is really the solution. There is no reason to reinvent the wheel. In fact, reinventing the wheel might cause problems when submitting papers. From what I have seen, many academic journals prefer .tex and .eps files. I can't imagine what they would do with HTML.

      The nice thing about LaTeX is that, like HTML, it is a pure markup language, but it is a markup language that understands typesetting, so one tends to get a good page layout no matter what. OTOH, HTML merely identifies blocks of text as various generic types, and really does not have a context for the types. The render engine is free to visualize, make a sound, or do whatever it wishes to represent the blocks. CSS is what imposes a consistent visual framework, so what one needs to duplicate LaTeX is in fact CSS.

      Use LaTeX. Except for the often limited fonts, it is vastly superior to a word processor, because a word processor is not the right tool to create real documents. We have known that for many years. That is why people bought PageMaker. And I think the lack of fonts forces people to create compelling content. LaTeX is free, there are many good books, and if you do have a hankering to code, you can always play with TeX.

      --
      "She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
    6. Re:LaTeX by Liquidrage · · Score: 2, Informative

      Sadly I have. I've spit out RTF docs from websites many moons ago back when we had lots of printing issues on the web (still sucks, but at least now it's manageable).

      In hindsight there was probably a better way...but I was young, and string manipulation is easy.

    7. Re:LaTeX by Anonymous Coward · · Score: 5, Informative

      Actually, the font problem is solved by using XeLaTeX (which uses XeTeX).

      Full OpenType support. Looks amazing.

    8. Re:LaTeX by scubamage · · Score: 1

      LaTeX works. It's about as enjoyable as a proctological exam being performed by a porcupine, but it does work.

    9. Re:LaTeX by dangitman · · Score: 2, Informative

      LaTeX is a solution for certain (usually niche) purposes, but has its drawbacks, like everything else. The problem is that we are dealing with an online world, where things are published in several different formats, online and offline. LaTeX doesn't translate easily or cleanly into HTML, or vice-versa. And good luck getting people outside of math and science academia to use LaTeX.

      This is a real problem, and shouldn't simply be brushed aside with "use this" comments. There currently is no workable format, no Lingua Franca for multipurpose documents. A solution should at least be attempted to make the web, word processing, page layout and typography interoperable.

      Don't get me wrong, I actually like LaTeX, but it's an exercise in frustration trying to integrate it with the rest of the world.

      --
      ... and then they built the supercollider.
    10. Re:LaTeX by femto · · Score: 5, Informative

      LyX. I wrote a thesis in it and didn't have to resort to any manual interventions in the generated LaTeX. Couple it with SVG diagrams, generated by inkscape, and you have a seamless authoring system that handles both text and graphics. SVG means there is no messy task of keeping source and postscript output synchronised (just right click a diagram within LyX to edit the SVG source with inkscape). Use gnuplot to generate your (postscript) graphs and you have pretty well a complete authoring system. A few years ago, LyX and inkscape were too immature to use seriously, but they have matured. I recommend the combination.
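
      For anyone writing the LaTeX by hand instead, the figure side of this workflow might look roughly like the sketch below. It assumes the graphicx package and assumes the diagram has been exported from Inkscape to PDF first (as far as I understand, LyX does that conversion behind the scenes). The file name example-image is a placeholder graphic that ships with the mwe package, not anything from the post above.

      ```latex
      \documentclass{article}
      \usepackage{graphicx}

      \begin{document}

      \begin{figure}[htbp]
        \centering
        % example-image is a stand-in from the mwe package; replace it with
        % the PDF exported from Inkscape (any real file name is up to you).
        \includegraphics[width=0.6\textwidth]{example-image}
        \caption{A diagram drawn in Inkscape.}
        \label{fig:diagram}
      \end{figure}

      Figure~\ref{fig:diagram} shows the diagram.

      \end{document}
      ```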

    11. Re:LaTeX by Anonymous Coward · · Score: 0

      There are a few LaTeX compilers that output HTML, one of them apparently even outputs to ODF
      link: http://enc.com.au/docs/latexhtml.html (found with google)

    12. Re:LaTeX by igny · · Score: 1

      You had it easy, young man. I had to use ChiWriter.

      --
      In theory there is no difference between theory and practice. In practice there is. - Yogi Berra
    13. Re:LaTeX by Petrushka · · Score: 5, Interesting

      I have a sneaking suspicion that when the OP is saying things like "no CSS" and doesn't mention LaTeX, s/he is actually giving specifications in a very obfuscated way -- specifications that need to be deduced. What I take from the post is that the OP wants

      • Portability. Anyone can open an HTML file without having to install new software; the same doesn't go for ODF, LaTeX, or MSWord. I suspect this is the main thing the OP wants. But this shouldn't rule out CSS.
      • Everything in one file: I'm guessing this may be why the OP doesn't want CSS. But that's not a good reason to avoid CSS either, since CSS can perfectly easily go in the same file. (I think it does rule out editing the XML in ODF documents, though, since as far as I'm aware they're always a composite of several files.)
      • Read/edit in the same document. This could be another reason why the OP doesn't mention LaTeX. LaTeX is perfect for editing, not so great for reading: for that you have PDF. Maybe the OP doesn't want to have two separate files like that.

      I'm guessing the OP has been inspired by the use of HTML for slide presentations, in the form of S5. I can see that. But the specifications, if I've deduced them correctly, are not hugely well-thought-out ones. I can kind of see someone not wanting to use LaTeX for the reasons given above, but insisting on no CSS is crazy.

      In any case, the OP should certainly give slightly clearer specifications if s/he doesn't want to have people yelling "LaTeX!!!" all day.

    14. Re:LaTeX by scubamage · · Score: 1

      You have no idea how much I could have used that when I was still in the academic arena. The head of our math/comp sci department dictated that all work be produced in LaTeX. Hours of my time wasted. Oh well, at least I know should I find myself in grad school. Thanks for the links! :)

    15. Re:LaTeX by Anonymous Coward · · Score: 2, Informative

      I think you are overstating the enjoyment possible when using LaTeX :)

    16. Re:LaTeX by Anonymous Coward · · Score: 0

      Another option is to use ReST (Restructured Text) which can be converted both to nice TeX and nice HTML.

    17. Re:LaTeX by dangitman · · Score: 1

      But they don't do a particularly great job of it. To deploy in a publishing system, a more serious solution is warranted.

      --
      ... and then they built the supercollider.
    18. Re:LaTeX by plasticsquirrel · · Score: 5, Informative

      Use LaTeX. Except for the often limited fonts, it is vastly superior to a word processor, because a word processor is not the right tool to create real documents. We have known that for many years. That is why people bought PageMaker. And I think the lack of fonts forces people to create compelling content. LaTeX is free, there are many good books, and if you do have a hankering to code, you can always play with TeX.

      LaTeX has limited fonts, but if you use XeLaTeX (LaTeX running on the XeTeX engine), you can not only use all the LaTeX stuff, but also any TrueType or OpenType font, with native Unicode support as well. This is a godsend for typesetting anything that includes words or characters not in English, or just for people who are picky about typography. My personal favorite font is the open source SIL Gentium family, which is not only much more beautiful and readable than Times, but contains a fuller character set that makes it compatible with many more languages. Once you start writing documents with XeTeX and nicer fonts, you see how lacking word processors are for good typography and well-structured documents, and how self-limiting the concept is.

      For newcomers to LaTeX and XeTeX, including packages and specifying options can be a bit time-consuming when you just want to get started with a basic Letter-sized document. Here is the basic XeTeX file template I use for simple stuff. I'm picky about margins, line spacing, fonts, etc. so you know that it's a safe place to start out.

      \documentclass[11pt]{article}

      %%
      % XeTeX packages
      %%
      \usepackage{fontspec}
      \usepackage{xunicode}
      \usepackage{xltxtra}

      %%
      % Formatting packages
      %%
      \usepackage{setspace}
      \usepackage[vcentering,dvips]{geometry}
      \usepackage{fancyhdr}

      \geometry{papersize={8.5in,11in},total={6.5in,8.8in}}

      \pdfpagewidth 8.5in
      \pdfpageheight 11in

      % 10pt font: 1.15
      % 11pt font: 1.1
      \setstretch{1.1}

      \setmainfont[Mapping=tex-text]{Gentium Basic}

      \begin{document}

      Some text...

      \end{document}

      --
      Systemd: the PulseAudio of init systems
    19. Re:LaTeX by Anonymous Coward · · Score: 0

      I see a lot of people knocking this guy for reinventing the wheel. But I don't get why. Let him take his crack at the problem and see what he comes up with. Let the market decide.

      I didn't realise Slashdot had turned into a Microsoft convention.

      Sad.

    20. Re:LaTeX by moosesocks · · Score: 5, Insightful

      Although I agree with you in that LaTeX is widely used in the scientific community, and unambiguously offers the best typesetting facilities you'll find outside of a publishing house, is it still appropriate today?

      The internet as we know it was created at CERN to facilitate the sharing of scientific information. Why are we still publishing in a format designed to be presented on dead trees?

      Like it or not, a properly-formatted print article looks horrible on a screen. An article formatted for printing on A4 or Letter-sized paper will use the whole width of the page, be set in 10-point type, and use columns. Unfortunately, modern computer screens don't have nearly enough resolution to display the full width of the page alongside much else. Obviously, PDF files also don't have the ability to flow to fit the width of the screen.

      LaTeX also doesn't give you the benefit of hypertext. Yes, there are various hacks you can use to add anchors and links to PDFs, although these are mere hacks on top of a broken format. Things such as high-resolution figures and hyperlinked references would be particularly beneficial for academic uses. It'd also be great to be able to see all articles linking back to what you happen to be reading. (This brings up all sorts of questions about the very nature of scientific publishing, although this is another debate entirely)

      Wikipedia (more specifically, MediaWiki) actually offers a promising solution to these (and the original poster's) requirements. It provides a convenient and simplistic markup for multi-sectioned articles, flows to fit the width of the page, and also provides LaTeX's fantastic mathematical typesetting facilities. Hyperlinking to other parts of the Wiki (and to external sites) is excessively easy. I'm sure the DOI system could be integrated to allow linking back to other articles within the constraints of the existing academic publishing regime.

      Google could very easily provide the "glue" to hold such a system together, although it would ultimately be better to put a public, non-profit entity in charge. It's absurd and hypocritical that so much of academic research (particularly the publishing part of it) is profit-driven.

      --
      -- If you try to fail and succeed, which have you done? - Uli's moose
    21. Re:LaTeX by nine-times · · Score: 2, Interesting

      On the other hand, .tex files don't render so well if you drop them into web browsers. I mean, which format you choose really should depend on what your needs are. If you want to be able to store a single copy that can be opened in a web browser or an office suite, HTML isn't necessarily a bad choice, but at this point you won't get great layout control. I really think it's reasonable to hope that as new versions of HTML and CSS come out, they should be aiming towards enabling people to have a good "print" media type CSS that gives professional layout results, but we aren't there yet. We aren't even really close.

      If you want people who know what they're doing to be making/editing these documents, then LaTeX may be a good choice. If you want normal everyday people to be able to open the file in an office suite they're comfortable with, then ODF is worth considering. If you want a widely supported format only for display/printing purposes (no editing) and you want tight layout control, then you won't do better than PDF.

      At this point, there is no format that does it all without any downsides. You have to pick the best tool for the job.

    22. Re:LaTeX by thefringthing · · Score: 3, Informative

      Most distributions of LaTeX come with some way to compile it and export to PDF, which is both portable and readable.

    23. Re:LaTeX by Apathy451 · · Score: 1

      Also of note is MathJax (http://www.mathjax.com/) which is a full rewrite of jsMath (http://www.math.union.edu/~dpvc/jsMath/)
      (See jsMath in action: http://www.math.union.edu/~dpvc/jsMath/examples/Henrici.html)

      It handles a lot of the Math rendering needed for the web without the need for the end user to install/do anything. Granted, it doesn't do things like macros or any other number of LaTeX stuff, but it does quite a lot as-is for taking straight TeX and rendering it properly.
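
      To give an idea of the "straight TeX" input meant here, this is the sort of fragment such a renderer would be handed. The quadratic formula is just a familiar example, not something taken from either project's documentation.

      ```latex
      \[
        x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
      \]
      ```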

    24. Re:LaTeX by AigariusDebian · · Score: 3, Insightful

      LaTeX describes the document, just like HTML describes it, but with more structure about it. What you see in a web browser is not HTML, it is a *rendering* of HTML. Different browsers render the same HTML differently, for example a mobile browser will skip stuff and reformat other stuff. In the same way there are different LaTeX renderers - some output PDF, some output HTML, some output ODF. It would be much easier to use LaTeX for the source document and then compile a pretty HTML file and a PDF file from it than to craft it in HTML or some other XML variant directly. You can have links in LaTeX that render as normal links in the HTML output. Why stress?
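
      A minimal sketch of what such links might look like in the source, assuming the hyperref package. How they come out in HTML depends on the converter used, and the label name sec:intro is made up for illustration.

      ```latex
      \documentclass{article}
      \usepackage{hyperref}

      \begin{document}

      \section{Introduction}\label{sec:intro}

      % External link: clickable in the PDF output.
      See the \href{https://www.latex-project.org/}{LaTeX project site} for details.

      \section{Results}

      % Internal cross-reference: hyperref turns the number into a link.
      As noted in Section~\ref{sec:intro}, links are declared in the source
      and rendered by whatever back end is used.

      \end{document}
      ```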

    25. Re:LaTeX by AigariusDebian · · Score: 1

      HTML does not render so well if you drop it into an image viewer.

      That is why there are LaTeX renderers. Or at least LaTeX->HTML renderers.

    26. Re:LaTeX by nine-times · · Score: 1

      I have a sneaking suspicion that when the OP is saying things like "no CSS" ... s/he is actually giving specifications in a very obfuscated way

      Definitely. The problem with this question is that the OP is suggesting an idea for solving a problem, but then not giving a complete description of the problem being solved. A complete description would also give the restrictions on how you can solve the problem so that you can really analyze what the problem is.

      Just to add to your list, another thing that popped into my head (which may be on the 'unlikely' side): the aversion to CSS may be that the poster wants people to be able to edit in an office suite, but might have some inkling that WYSIWYG editors don't handle style sheets very well at all. But really, there's no proper way to handle this sort of formatting in HTML itself. It would have to be done with styles, even if they were inline.

    27. Re:LaTeX by Anonymous Coward · · Score: 0

      Anyone can open an HTML file without having to install new software; the same doesn't go for ODF, LaTeX, or MSWord. I suspect this is the main thing the OP wants. But this shouldn't rule out CSS.

      I do believe that web browsers are more common than PDF readers (Windows does come with a web browser but not a PDF reader), but not by much at all. This isn't really much of a point against LaTeX.

      Everything in one file: I'm guessing this may be why the OP doesn't want CSS. But that's not a good reason to avoid CSS either, since CSS can perfectly easily go in the same file. (I think it does rule out editing the XML in ODF documents, though, since as far as I'm aware they're always a composite of several files.)

      Again, this is easily something LaTeX can take care of.

      Read/edit in the same document. This could be another reason why the OP doesn't mention LaTeX. LaTeX is perfect for editing, not so great for reading: for that you have PDF. Maybe the OP doesn't want to have two separate files like that.

      This *could* be a really trivial advantage for some HTML jiggering over an established standard for such things.

      I'd be perfectly happy to learn of a better solution than LaTeX for this situation - but your post doesn't really give any notable reason LaTeX would not really work here. Your points are extremely trivial.

      Eh-hem: LaTeX!!!

    28. Re:LaTeX by MightyMartian · · Score: 1

      HTML is simply not that good at more complex forms of document creation, nor does it have to be. Now if we're talking about giving browsers the capacity to directly display LaTeX, well, I think that's an interesting idea, but in reality, what the parent is talking about is making an essentially retarded version of LaTeX, with the single ability that it can define page size. How would it render properly given various resolutions?

      I agree with others, LaTeX is what the parent is looking for. HTML and CSS have sufficient compromises to try to make browsers do what they never were intended to do. This would take the whole thing to an absurd extreme.

      --
      The world's burning. Moped Jesus spotted on I50. Details at 11.
    29. Re:LaTeX by AceofSpades19 · · Score: 3, Insightful

      I'm pretty sure everyone who has a browser has a PDF reader to read PDFs written in LaTeX.

    30. Re:LaTeX by AceofSpades19 · · Score: 1

      Rich Text Format is nothing like SGML. It has a completely different syntax and it behaves a lot differently.

    31. Re:LaTeX by dangitman · · Score: 0

      I agree that HTML is not that great. Which is why I'm arguing for a new standard, perhaps something that merges HTML/LaTeX/PDF? I just doubt that LaTeX is the way, it's so archaic. HTML has potential, if decent typesetting features were integrated with it and CSS. CSS is supposed to be capable of dealing with different media, it just needs to deliver on the promise, which it currently isn't.

      --
      ... and then they built the supercollider.
    32. Re:LaTeX by nine-times · · Score: 1

      Yeah, but my whole point is: it depends on what your requirements are. Since the OP brings up HTML, I would guess being directly viewable in a web browser is a plus, and being viewable in an image viewer isn't desired.

      Is part of your aim that common users can edit the file in an office suite, dump documents to a website as-is, set up an index page with a link to each document, and allow the general public to view the articles just by visiting the website (without any additional software or plugins installed) and print them if they want (while controlling the print layout)? Because if that's the sort of thing that the OP is hoping for, then LaTeX might be a horrible solution.

      You have to know what the OP is trying to accomplish and who will be using this system.

    33. Re:LaTeX by master5o1 · · Score: 1

      HTML is only portable and widely spread because every operating system that one would view an HTML file in has a rendering engine designed to render HTML -- yeah, a browser!

      If, for example, Mozilla incorporated LaTeX into Gecko then LaTeX would do the job that your first bullet points out. Probably any implementation using LaTeX would be an on-the-fly compile and view as PDF.

      --
      signature is pants
    34. Re:LaTeX by andymadigan · · Score: 2, Insightful

      I've written few LaTeX documents that will work out of the box on most distributions, let alone all of them. Realistically, I can't send a LaTeX document to someone else and expect them to be able to edit it and read it, even if they have LaTeX installed.

      Unfortunately any program able to handle everyone's different styles for document printing is probably going to be too specialized for everyone to have. LaTeX shows that print layouts are a difficult problem. Even on webpages (screen display), to get really good layouts we rely on scripts, styles and templates from other sources, in most cases these are too numerous to make distribution of the document via e-mail trivial. Plus, we use specialized software (e.g. Dreamweaver).

      Unfortunately there's no good solution that I know of for this. Simply throwing text and images into a document does not make it readable, and there's no software that can simply take the jumble and make it readable, it takes a human touch to produce a good layout.

      --
      The right to protest the State is more sacred than the State.
    35. Re:LaTeX by MightyMartian · · Score: 2, Interesting

      Why is LaTeX archaic, but HTML not? LaTeX is infinitely more powerful when it comes to layout. Even with CSS, HTML has to be bent over backwards. But it's deeper than that. HTML is about rendering documents on many variant platforms and displays. It's about general text flow, not about pages. I think you could probably do what you wanted with CSS, but it would be horrible to code, and would be utterly screwed up on any screen that didn't match yours in resolution or in ratio. What about people who don't maximize their browsers? I mean, the whole point of HTML is that it can deal with all sorts of browser window sizes. I find CSS that explicitly sets screen real estate to be an incredible pain in the ass, and it essentially breaks the browser. What you're proposing would be even worse.

      Screens are not pieces of paper. They are entirely different mediums with entirely different requirements. Maybe if we all had 30" monitors, it would make some sense, but since that's not the trend (quite the opposite, in fact, we're going to smaller and smaller screens on netbooks and smartphones).

      HTML has its place, but there are proper typesetting languages out there, designed specifically for what you want. So learn them.

      Now, like I said, I think it would be very cool if someone would design a plugin that would allow browsers to render LaTeX files. That would be quite awesome, although I guess the reasonable argument could be made that just simply compiling them as a PDF would accomplish the same thing. But any such plugin would have to pretty much dispense with most of the typesetting markup or again, we'd be breaking the browser.

      --
      The world's burning. Moped Jesus spotted on I50. Details at 11.
    36. Re:LaTeX by DrLudicrous · · Score: 1

      I LOVE inkscape. That coupled with GIMP enables just about any graphic manipulation you want. Plus, inkscape has some plugins that allow you directly enter (and in one case edit) LaTeX code. All for free. Unbelievably useful. I have futzed with LyX, but because I am used to handcoding my tags, I doubt I will start using it. I use WinEdt to create my documents, though IMO the autotabbing leaves something to be desired.

    37. Re:LaTeX by Nitewing98 · · Score: 4, Informative

      Yes, not using CSS is crazy. Especially since CSS includes the concept of media types. A simple "media='print'" in the CSS spec should handle that.

      --

      Nitewing '98

      Everything works...in theory.

    38. Re:LaTeX by anton_kg · · Score: 1

      Not everyone has a browser installed by default. If one does have it, then one can always open PDF/ODF in Google Docs.

    39. Re:LaTeX by Idbar · · Score: 2, Informative

      It's the eternal fight between interpreted and compiled languages. You are arguing that HTML is easier to read. Well, it's not, and neither is LaTeX. The fact that HTML has a more common interpreter (a browser) is another matter. But don't be confused: reading plain HTML is as annoying as reading a TeX file, or more so.
      Please also remember that the "compiled" output of LaTeX is DVI, not PS or PDF.

    40. Re:LaTeX by Anonymous Coward · · Score: 0

      s/he is actually giving specifications in a very obfuscated way -- specifications that need to be deduced.

      I'm Not New Here, but I wasn't aware Ask Slashdot was really "Deduce your madness while-U-wait".

      Oh well. If you've got that kind of time.

      Not to suggest it's the case here, but.. it *is* entirely possible some of these questions are just to fuck with people, isn't it? I always assume Psychology students are hiding in the bushes.

    41. Re:LaTeX by CarpetShark · · Score: 1

      LaTeX is not a great solution these days. It doesn't handle Unicode characters that well, and as far as I've seen, there's no way to generate accessible tagged PDFs. HTML is more modern in many ways. Mostly I just like the terse syntax of LaTeX, and its support for math and nice rendering. With a good renderer for HTML+MathML+SVG+CSS print media, and maybe also a preprocessing language that supports syntax like \body(style="",...) { \h1{ ... } ... } and a few macros to generate indices etc., I'd be much happier than when using LaTeX. As far as I know, PRINCE supplies the first of these requirements, and the rest shouldn't be that hard. I'm sure it exists, somewhere.

    42. Re:LaTeX by Secret+Rabbit · · Score: 1

      Portability:

      Name me an OS that doesn't have a PDF or PS reader installed by default. If one wants to bitch about no LaTeX installed by default, then don't write an academic paper. It's a speciality and as such, bitching about having to install something special is rather asinine.

      Everything in one file:

      LaTeX can do this just as well as HTML, i.e. it would require the same kludge.

      Read/edit in the same document:

      Bullshit. With HTML you have to open the doc in a browser to see what would be printed. Similarly with LaTeX you would "compile" it and then open that in a relevant viewer (see first comment above).

    43. Re:LaTeX by amicusNYCL · · Score: 2, Informative

      Exactly. Until I can use a server scripting language to open a (binary) PDF template and do string find-replace, and not break the document, I'll stick with the ASCII-based RTF format.

      There are a lot of people who seem to be screaming "LaTeX" as the solution to this non-problem. This is specifically what the PDF format was created for: to create a portable document that renders and prints the same anywhere. HTML is a fluid layout to fit various resolutions, PDF is guaranteed to print the same on any machine. That's exactly what it's for. There's no reason to bring LaTeX into the picture when you can export a regular word processing document to a PDF. Regardless of whether or not you can export to PDF from LaTeX, there's no reason to use LaTeX specifically vs. Word or Open Office or Acrobat or whatever else. The OP didn't even mention that the document should be editable by anyone other than the author.

      It's been quite a while since I got interested in the idea of using html (instead of .doc. or .odf) as a standard for saving documents

      PDF: you can export from your word processor, you can generate one easily from PHP, you can use Adobe's bloated software, whatever you want to use. It supports images, links, page size and margins, and it's guaranteed to print the same way on any printer. That's specifically what it's for.

      --
      "Our two-party system is like a bowl of shit looking at itself in a mirror." - Lewis Black
    44. Re:LaTeX by amicusNYCL · · Score: 1, Funny

      Actually, the font problem is solved by using XeLaTeX

      It's nice to see the naming conventions from 1996 making a CoMeBaCk.

      --
      "Our two-party system is like a bowl of shit looking at itself in a mirror." - Lewis Black
    45. Re:LaTeX by pjpII · · Score: 2, Insightful

      LaTeX is really the solution. There is no reason to reinvent the wheel. In fact, reinventing the wheel might cause problems when submitting papers. From what I have seen, many academic journals prefer .tex and .eps files. I can't imagine what they would do with HTML.

      Actually, that may be true of some academic journals, but most deal primarily with MS Word documents. Some publishers might grudgingly deal with LaTeX documents (I know John Benjamins mentions it in their style requirements), but the people who run conferences and therefore are in charge of submitting the proceedings tend not to be computer-savvy enough to work with anything other than MS Word files (god save whoever has to deal with the millions of random fonts people use, use/non-use of styles, etc).

      This of course depends on your field - in Comp Sci, I'd wager there're many more journals that regularly accept LaTeX files. In linguistics, it's somewhat rarer, and as you get further into the humanities, it becomes increasingly difficult to find anyone who's heard of LaTeX at all.

    46. Re:LaTeX by the+donner+party · · Score: 1

      Interestingly, Acrobat Reader v9 contains an option to reflow the PDF text for easier reading. (Of course standard PDF doesn't contain enough information to do a good job: the one time I tried it on a LaTeX-generated PDF, it mixed up the lines from two columns, and even missed a lot of inter-word spaces. Maybe it works better on simpler one-column PDFs, or maybe it needs some new extensions to the PDF standard to work properly.)

    47. Re:LaTeX by EvanED · · Score: 1

      it has everything that you are looking for and can be easily compiled to ps, dvi, pdf

      While I want to emphasize that Latex is probably the best of a bunch of not-very-good solutions, I strongly suggest targeting PDF. Windows users sort of have the shaft when it comes to PS viewers; the best one out there is (or at least until recently was; if something like Okular now works on Windows through KDE-Windows this might have changed) GSView, which says a lot because it's a piece of crap.

      DVI? I personally think that ceased to be a benefit a while ago, since other formats became available, and really no one has DVI viewers.

      And while you can do ps2pdf (or weird DVI->PDF flows), targeting PDF directly also lets you use some spiffy pdflatex-only packages that add things like comment/annotation support, hyperlinks, etc. One very impressive package that I believe is pdflatex-only is called PGF; it's a graphics package, and can do a lot of the things you'd use something like pstricks or xylatex for.
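
      A minimal sketch of what PGF looks like in practice, written with TikZ (the usual front end to PGF). The axes and the plotted curve are made up purely for illustration:

      ```latex
      \documentclass{article}
      \usepackage{tikz} % TikZ is the user-facing layer on top of PGF

      \begin{document}
      \begin{tikzpicture}
        % Two labelled axes
        \draw[->] (0,0) -- (3.2,0) node[right] {$x$};
        \draw[->] (0,0) -- (0,2.2) node[above] {$y$};
        % An illustrative parabola, y = 0.2 x^2, on 0 <= x <= 3
        \draw[domain=0:3, smooth] plot (\x, {0.2*\x*\x});
      \end{tikzpicture}
      \end{document}
      ```

      Compiling this with pdflatex should put the drawing straight into the PDF, with no separate figure file to manage.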

      That's sort of my personal view, but it's one that I think is increasingly "correct".

      and (I am told but haven't used) html

      Eh, I haven't put a ton of investigation into it, but I've not been very impressed with the HTML output those converters generate. They're usually fine for conveying information, but I don't really consider what I've seen to be sort of on "equal footing" with the PS/DVI/PDF output you'd get from latex proper. I'm also not sure at all what they would produce for things like xypic code.

      It even plays nicely with version control, bibliography management (BiBTeX), etc.

      This is one of the best parts about it, and is nowadays probably the biggest benefit (IMO) over Word. If you're working with collaborators (and if you're doing academic publications, when are you not?) then easy merging is usually essential, and $FAVORITE_VCS usually just works so well.

      As a bonus you can run it on linux via command line.

      Or even... the same way on Windows. ;-)

      But really that's possibly not the best way; get a Latex mode for Emacs that will let you compile and open the resulting file right from Emacs (I think AUCTeX does this), a dedicated Latex editor, etc.

    48. Re:LaTeX by dangitman · · Score: 1

      Why is LaTeX archaic, but HTML not? LaTeX is infinitely more powerful when it comes to layout.

      Because HTML continues to evolve; LaTeX does not. It may be powerful at layout (but not as powerful as something like InDesign or Quark) but what I'm talking about is something that encompasses page layout, web design, and semantic markup.

      Screens are not pieces of paper. They are entirely different mediums with entirely different requirements.

      Of course. Which is why CSS allows different stylesheets for different media, such as "print" and "screen." The problem is it's not sufficient at the moment.

      HTML has its place, but there are proper typesetting languages out there, designed specifically for what you want. So learn them.

      I've worked for plenty of years in print design, so I know them. But that's not enough. We need to bring sanity in document formats to the average user. To the professor of arts who doesn't know anything about a computer except how to use Word. I have wished for a long time that people who write and publish would develop some sort of typographical literacy, but the reality is that it's never going to happen.

      Add in the other users responsible for publishing content online such as HTML monkeys or blog editors, and you're just not going to get the expertise required to master all the different permutations. We need something that is portable, flexible, and able to supplant Word documents and proprietary page layout files.

      As I hope I have already impressed, the problem is not just to create page layout for print publications, it's also to be able to use that file to create decent web pages without any modification, that works with content management systems and the like.

      --
      ... and then they built the supercollider.
    49. Re:LaTeX by Silas+is+back · · Score: 1

      Nothing to add.

      --
      this sig is useless
    50. Re:LaTeX by EvanED · · Score: 1

      I do believe that web browsers are more common than pdf readers (Windows does come with a web browser but not a pdf reader), but not by much at all

      I bet more people have PDF readers than a non-IE browser though. And how long do you think it'd take MS to implement enough CSS and such to make formatting academic papers and such in it a good idea?

    51. Re:LaTeX by StarsAreAlsoFire · · Score: 1

      but insisting on no CSS is crazy

      I couldn't agree more. It is EXACTLY the inverse of the direction the world is taking HTML.

      I could see an argument for making it a CSS tag... Oh. Wait: http://www.w3.org/TR/CSS2/page.html

    52. Re:LaTeX by Anonymous Coward · · Score: 1, Informative

      I've written few LaTeX documents that will work out of the box on most distributions, let alone all of them. Realistically, I can't send a LaTeX document to someone else and expect them to be able to edit it and read it, even if they have LaTeX installed.

      Then you're doing something very wrong. I frequent freenode#latex and we exchange files constantly across platforms and distributions with expectation of exactly-the-same performance. And we get it.

    53. Re:LaTeX by Anonymous Coward · · Score: 1, Insightful

      • Portability. Anyone can open an HTML file without having to install new software; the same doesn't go for ODF, LaTeX, or MSWord. I suspect this is the main thing the OP wants. But this shouldn't rule out CSS.

      Sometimes on lab equipment you don't have X or similar. I run latex and put it to print. I don't need to see it before the printing. I don't get distracted by layout while doing the relevant stuff and if you don't get a weird distro mod of latex it is really portable. HTML is for things where you don't know the output size. Latex is for generating documents.

    54. Re:LaTeX by Bromskloss · · Score: 1

      Actually, the font problem is solved by using XeLaTeX (which uses XeTeX).

      Full OpenType support. Looks amazing.

      I've been interested in upgrading from LaTeX to something better. Are there any problems with moving from LaTeX to XeTeX? I understand XeTeX can process LaTeX files (right?), but what about LaTeX packages? Will they work in XeTeX as well?

      --
      Swedish plasma phys. PhD student; MSc EE; knows maths, programming, electronics; finance interest; seeks opportunities
    55. Re:LaTeX by moosesocks · · Score: 2, Insightful

      99.99% of LaTeX is output straight to PDF, Postscript, or (in special cases such as Wikipedia's math renderer) a rasterized image. The documentation, plugins, and user community of LaTeX all reflect this.

      I haven't come across any serious usage of LaTeX in the manner that you describe it.

      ODF and HTML do not support the full set of typographic features that LaTeX does. Something will almost certainly be lost in translation.

      Although I suppose it's possible to craft a source document that would look good both in print and as free-flowing hypertext, you'd need a zen-like command of the language. LaTeX has enough quirks as it is. I have a very difficult time accepting this as a practical solution.

      Also, if you're crafting a hypertext document, why not start with a language specifically designed for the task?

      --
      -- If you try to fail and succeed, which have you done? - Uli's moose
    56. Re:LaTeX by rhathar · · Score: 5, Insightful

      Portability:

      Name me an OS that doesn't have a PDF or PS reader installed by default.

      Windows

      --
      http://www.chaotickingdoms.com
    57. Re:LaTeX by (pvb)charon · · Score: 2

      Anyone can open an HTML file without having to install new software;

      Except for Windows 7 users in Europe...

    58. Re:LaTeX by hakola · · Score: 1

      You, Sir, just saved my future. Thank you for the suggestion, the font is aesthetically exquisite. Just what I've been looking for.

    59. Re:LaTeX by beelsebob · · Score: 1

      He is, but he also seems to completely miss the point of html too. div is a generic division. It can mean page if you like. All you need to do is specify that in your css.

    60. Re:LaTeX by Anonymous Coward · · Score: 1, Informative

      I've written few LaTeX documents that will work out of the box on most distributions, let alone all of them.

      That's strange... academic journals do that all the time. Every journal I've published in accepts LaTeX sources and postscript/TIFF figures. As long as you use their stylesheets and follow their rules (generally imposed to prevent problems when they have to process dozens of papers in one go), you should get the same result that they do.

    61. Re:LaTeX by Anonymous Coward · · Score: 0

      Anyone can open an HTML file without having to install new software;

      Except anyone who buys Windows 7 in the EU...

    62. Re:LaTeX by Peter+Winnberg · · Score: 1

      Like you point out, XeTeX has advantages. But many who are picky about typography will not use it because it is unable to use the microtype package (which needs pdfTeX).

    63. Re:LaTeX by Anonymous Coward · · Score: 0

      You seem to be talking about LaTex. It already exists. Don't reinvent it.

      Another alternative is RTF,

      Read the fucking comma?

    64. Re:LaTeX by duffel · · Score: 1

      You would think, wouldn't you? I'm still shocked from when one of my Windows-using friends discovered they didn't.

    65. Re:LaTeX by philipgar · · Score: 1

      the last part is true for plain latex, however I think more and more people have switched to using pdflatex, where the output IS a pdf file...

      Phil

    66. Re:LaTeX by uglyduckling · · Score: 1

      Name me an OS that doesn't have a PDF or PS reader installed by default.

      DOS 5

    67. Re:LaTeX by phantomfive · · Score: 1

      Yes, there are various hacks you can use to add anchors and links to PDFs, although these are mere hacks on top of a broken format.

      That is exactly how I would describe CSS. Not a troll.

      --
      Qxe4
    68. Re:LaTeX by colinrichardday · · Score: 2, Informative

      Because HTML continues to evolve; LaTeX does not.

      Really? How long is it taking the W3C to release HTML 5?

      It may be powerful at layout (but not as powerful as something like InDesign or Quark) but what I'm talking about is something that encompasses page layout, web design, and semantic markup.

      LaTeX is working on semantic markup
      http://tug.ctan.org/cgi-bin/ctanPackageInformation.py?id=stex

      and

      http://tug.ctan.org/cgi-bin/ctanPackageInformation.py?id=cool

      As for web design, people are working on converting LaTeX to MathML.

      it's also to be able to use that file to create decent web pages without any modification, that works with content management systems and the like.

      And how do you presume to get all the browser vendors on board?

    69. Re:LaTeX by colinrichardday · · Score: 1

      I prefer pstricks. The plots look nicer than what I've seen from gnuplot. And the only way I know to plot functions in SVG is to use polybeziers.

    70. Re:LaTeX by colinrichardday · · Score: 1

      ODF and HTML do not support the full set of typographic features that LaTeX does.

      Use MathML instead of HTML, though you might not have good browser support.

    71. Re:LaTeX by TheRaven64 · · Score: 3, Informative

      CSS has had attributes for page breaks since CSS 2.1. I've not played with them for a while, but Opera supported them correctly back in 2003, so I'd imagine that they work well now. You can find the relevant part of the specification here. It lets you specify margins, soft breaks, rules for two-sided printing and so on. I played with the idea of using HTML instead of LaTeX for a while, but decided not to for two reasons. First, LaTeX is easier to type and read, and second because LaTeX produces nicer-looking output.

      --
      I am TheRaven on Soylent News
    72. Re:LaTeX by colinrichardday · · Score: 1

      The internet as we know it was created at CERN to facilitate the sharing of scientific information.

      It would appear that they did not believe that the sharing of scientific information required mathematical notation.

    73. Re:LaTeX by TheRaven64 · · Score: 3, Informative

      LaTeX is a macro system built on top of TeX. XeTeX is a TeX implementation. Whether any given LaTeX package will work with XeTeX (or any other TeX implementation, like pdftex) depends on whether it uses any implementation-specific features. Most don't.
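
      A hedged sketch of how one source file can stay portable across engines by isolating the engine-specific parts. It assumes the iftex package is available and that a font named "Gentium" is installed on the system; both are assumptions for illustration, not requirements stated anywhere in this thread:

      ```latex
      \documentclass{article}
      \usepackage{iftex} % provides \ifXeTeX and \ifPDFTeX engine tests

      \ifXeTeX
        % XeTeX-only: system OpenType fonts through fontspec
        \usepackage{fontspec}
        \setmainfont{Gentium} % assumed to be installed; substitute any system font
      \fi
      \ifPDFTeX
        % pdfTeX-only: classic input/font encodings plus microtype
        \usepackage[utf8]{inputenc}
        \usepackage[T1]{fontenc}
        \usepackage{microtype}
      \fi

      \begin{document}
      The body of the document uses only engine-neutral LaTeX.
      \end{document}
      ```

      The same file should then go through both xelatex and pdflatex; only the preamble branch differs.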

      --
      I am TheRaven on Soylent News
    74. Re:LaTeX by TheRaven64 · · Score: 1

      LaTex doesn't translate easily or cleanly into HTML, or vice-versa

      LaTeX converts very well to [X]HTML using tex4ht. I know one person who has written his entire web site in LaTeX and compiled it with tex4ht. Lots of people use it for online copies of papers and they look good.

      --
      I am TheRaven on Soylent News
    75. Re:LaTeX by TheRaven64 · · Score: 1

      As you say, it depends on your field. In computer science, you can typically judge the quality of a journal by whether it accepts LaTeX; if it doesn't it's probably not worth submitting to. In Physics and Mathematics TeX is also common; Mathematics journals often allow plain-TeX as well as LaTeX because, for a long time, it was the only way of getting nicely typeset formulae.

      --
      I am TheRaven on Soylent News
    76. Re:LaTeX by cronostitan · · Score: 2, Informative

      I recently had to implement a proper print view with CSS covering several pages of print output. What I can tell is that it is a pain in the a**, since a lot of print-specific CSS properties are not supported by current browser versions, Opera being the exception to the sad rule. For example, most browsers do not support the property that keeps divs intact and does the page break automatically before or after the div. We ended up having the user decide when he needs a page break.

      I honestly can understand that someone gets frustrated and wants to use a 'better' way.

      --
      Spelling errors were made for your amusement only...
    77. Re:LaTeX by TheRaven64 · · Score: 1

      It doesn't handle unicode characters that well

      Really? I've written a thesis and two books using LaTeX, and all of my source files are UTF8 with scattered non-ASCII characters.

      --
      I am TheRaven on Soylent News
    78. Re:LaTeX by thogard · · Score: 1

      LaTeX isn't the solution but it's close. LaTeX is just a complex module that defines lots of macros that make it easy to produce TeX documents.

      The proper solution is TeX with a new web module that does more modern and web based macros. Get that into Firefox and a free module for IE and people will start to use it and HTML can die the death it deserved back in the 1980s.

    79. Re:LaTeX by emlyncorrin · · Score: 1

      Please also remember that the "compiled" output of latex is dvi, not ps or pdf.

      It depends what you compile it with, with pdflatex, the compiled output is pdf.

    80. Re:LaTeX by Anonymous Coward · · Score: 0

      Or SGML..

    81. Re:LaTeX by Anonymous Coward · · Score: 0

      I'm pretty sure every one that has a browser, has a pdf reader to read pdfs written in latex

      True but now I quickly want to edit something and make my edits viewable(pdf). So I will need to ask you for the latex file and install a latex2pdf compiler (a minimum of 100 MB).

    82. Re:LaTeX by vtcodger · · Score: 1
      ***What you see in a web browser is not HTML, it is a *rendering* of HTML.***

      Exactly. It's called HyperText Markup Language, not HyperText Layout Language. What You See is not necessarily what Someone else will get. That is by intent to deal with widely varying display capabilities. Using HTML for layout definition is sort of like modifying a sports car for use as a plow. It can probably be done, but it will be a lot of work and will probably never be a very good plow.

      --
      You can't see ANYTHING from a car, You've got to get out of the goddamned contraption and walk...Edward Abbey
    83. Re:LaTeX by Anonymous Coward · · Score: 2, Insightful

      Please also remember that the "compiled" output of C is PDP-11 machine code, not x86 code or CLR code.

    84. Re:LaTeX by 1u3hr · · Score: 2, Insightful
      We need to bring sanity in document formats to the average user. To the professor of arts who doesn't know anything about a computer except how to use Word. I have wished for a long time that people who write and publish would develop some sort of typographical literacy, but the reality is that it's never going to happen.

      I edit and layout books, and the worst problems are when the author DOES think he has typographic literacy. If I let them have their way they print books set in Arial 10 point with vertical quotemarks, and I could go on.

      It took me months of study and years of practice to get the degree of typographic literacy I have now.

      They will just waste their time on details that will actually impede the publishing process if not stripped out.

      It's like cutting your own hair -- yes, you can do it. But you're less likely to make a fool of yourself if you pay someone who does it all day long a couple of dollars to do it right.

      Authors ideally should not be concerned with visual layout. They just need to make sure that the logical structure (headings, notes, location of diagrams) is clear. Doesn't matter if they use Courier or Comic sans.

    85. Re:LaTeX by smallfries · · Score: 4, Informative

      Further to the two AC posts: you are doing something wrong. As an academic I send / receive latex source all the time and expect it to compile and reproduce the exact same results. How are you abusing TeX to get these kinds of problems?

      --
      Slashdot: where don knuth is an idiot because he cant grasp the awesome power of php
    86. Re:LaTeX by Anonymous Coward · · Score: 0

      I'm doing most of my academic writing in LaTeX and have typeset a book in it, but for any other purpose than submitting a camera-ready PDF LaTeX is a nightmare. You cannot reliably convert LaTeX sources to any other format and I had to manually rewrite more than one document for journals that required .doc format. Even with journals that accept .tex files there can be indefinitely many troubles because of incompatibilities between packages, missing line-breaking rules for mathematical formulas, and special symbols. LaTeX source code even breaks between different TeX installations, so it's unsuitable for long-time storage.

      Docbook or similar stuff seems the way to go, as long as the texts don't contain formulas.

    87. Re:LaTeX by ben0207 · · Score: 1

      Workbench.

      --
      cmd-q.co.uk - some sort of stupid fucking internet bullshit
    88. Re:LaTeX by Anonymous Coward · · Score: 1, Informative

      LaTeX also doesn't give you the benefit of hypertext. Yes, there are various hacks you can use to add anchors and links to PDFs, although these are mere hacks on top of a broken format.

      \usepackage{hyperref} gives hyperlinked table of contents, hyperlinked references, and if you specify \url{http://slashdot.org} that is linked too.

      It may be a hack, you may need to use pdflatex, but it works and is simple from the user's standpoint. Implementation wise it could surely be a hack... so what?
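
      A minimal sketch of that setup, using the hyperref package (the standard LaTeX package for PDF hyperlinks) and assuming the file is compiled with pdflatex. The section title and the label name are made up for illustration:

      ```latex
      \documentclass{article}
      \usepackage{hyperref} % makes the TOC, \ref targets and \url clickable

      \begin{document}
      \tableofcontents

      \section{Introduction}\label{sec:intro}
      See Section~\ref{sec:intro}, or visit \url{http://slashdot.org}.
      \end{document}
      ```

      Run pdflatex twice so that the table of contents and the cross-reference resolve.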

    89. Re:LaTeX by Anonymous Coward · · Score: 1, Informative

      RTF is a "sister SGML language of HTML"? Huh???? It isn't even close! It's just a text version of the MS-Word doc file format and even the style definitions don't carry into the document body. Only in the most abstract sense is it related to SGML.

      If you want professional-quality documents, select PostScript, PDF (PostScript encapsulated, not meaning EPS) or one of the TeX flavors.

      Anything based on MS-Word and its ilk (including ODF), RTF (same thing, as I mentioned earlier) or HTML is dicey. All of these apps do their final typesetting by using the font metrics available on the computer where it was printed (NOT the computer where it was composed). It's not as noticeable these days thanks to extensive use of TrueType and similar soft fonts, but in the olden times when most of the fonts being used in document composition were the printer hardware fonts, you could get really badly messed-up line and page breaks just by composing on a machine with a different printer driver than the machine that the hardcopy was being produced on.

      PostScript and TeX are based on absolute metrics. HTML and the various word processors are not.

    90. Re:LaTeX by digitrev · · Score: 1

      Rich Text Format

      --
      Cynical Idealist
    91. Re:LaTeX by Anonymous Coward · · Score: 0

      The problem is that the H in HTML is for HYPER... It Links to something else. If you have a printed document, it is STATIC on paper, and NOT Hyper. What the person should use is output to PDF so that no matter where it is printed, it is the same. LaTeX would be for the Layout, PDF for the Output. I feel that there must be a knowledge gap that is growing between the current generation and the old generation that they do not understand this. We use a number of acronyms but people are not properly educated as to their meaning, they just become a name.

    92. Re:LaTeX by chrish · · Score: 1

      LaTeX is awesome if you want your document to look exactly like a LaTeX document. If you're trying to make a customized page layout (say, so your documents look like they come from your company instead of from LaTeX) it's more trouble than you can imagine.

      --
      - chrish
    93. Re:LaTeX by YttriumOxide · · Score: 2, Interesting

      I'll ignore for the moment that "Workbench" isn't an OS (I'm assuming you mean AmigaOS), but I do want to point out that Workbench has had a PDF reader since version 4.0 (6 years ago), and a nice frontend for GhostScript as well.

      --
      My book about LSD and Self-Discovery
      Also on facebook as: DroppingAcidDaleBewan
    94. Re:LaTeX by just+fiddling+around · · Score: 1

      Exactly, you would be doomed to do it wrong if you tried.

      --
      You're not old until regret takes the place of your dreams.
    95. Re:LaTeX by Anonymous Coward · · Score: 0

      Okay, now name a good one.

    96. Re:LaTeX by TerranFury · · Score: 4, Interesting

      But... not everything about the PDF is specified by the LaTeX source -- and the toolchain matters. For instance, a document prepared for pdflatex with pdf figures and another prepared for the latex-->dvips-->ps2pdf route (which is often necessary as a number of journal styles use some pstricks) will in general not work with the opposite toolchain. Another example is paper size; certain of these tools output either letter or A4 by default, and must be instructed on the command line (or, really, in your build scripts) when you want the other (I know you can specify paper sizes in the source, but this is lost somewhere in the toolchain).
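
      One common way to reduce the paper-size drift described above is to state the page geometry in the source with the geometry package, which, as far as I know, also hands the paper size to the pdfTeX and dvips output drivers. This is a sketch under that assumption; the 2.5cm margin is an arbitrary illustrative value, and a journal style may override or conflict with it:

      ```latex
      \documentclass{article}
      % State paper size and margins explicitly instead of relying on
      % the installation's default (letter vs. A4)
      \usepackage[a4paper, margin=2.5cm]{geometry}

      \begin{document}
      The page size is now fixed by the source rather than by the toolchain default.
      \end{document}
      ```

      On the dvips route, ps2pdf may still need to be told the paper size separately, so this does not remove the need for build scripts entirely.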

      Download Ubuntu on one computer. Use apt to install kile and all its dependencies. Compile a paper written with the IEEE conference style. Now install Windows on another computer. Install MiKTeX on it and do the same. You will get similar output, but it will by no means be identical. The most noticeable thing is that margins are different.

      Oh, and so far I've ignored in this discussion that different styles will use different methods for including, say, theorems. It is a pipe dream to simply change the style of a document and expect decent results. Chances are the damn thing won't even compile -- and if it does, all your beautiful theorems will look like crap because the other style expected some different markup for them.

      I don't rule out that I'm doing something wrong, and if I am, I could stand to use some enlightenment. But I know that I don't use LaTeX significantly differently from anyone else I know...

    97. Re:LaTeX by TerranFury · · Score: 1

      This mostly works if you include the style you're using with the source... but if not then overall it's the same problem as dependencies that you get when you compile software from source. Obviously this is not a trivial problem, as otherwise it wouldn't have been necessary to invent apt, emerge and the like. The one difference is that generally I've found code libraries easier to find than the correct LaTeX styles...

    98. Re:LaTeX by Anonymous Coward · · Score: 0

      RTF is not a sister SGML language of HTML. XML is.

    99. Re:LaTeX by al3 · · Score: 1

      And DocBook

    100. Re:LaTeX by Anonymous Coward · · Score: 0

      Firefox is completely useless for printing because it doesn't support the CSS page break styles. It has been not-supporting them since 2002.

      We hear a lot of blather about ACID tests, CSS3, HTML5, etc. but how about the basics?

    101. Re:LaTeX by maxume · · Score: 1

      Most word processors have good support for creating structured documents. They also make it straightforward to create a giant mash of custom styled text, which most users do (this is what I do, but I see why it is stupid and I would move away from it if I had a large project in front of me).

      Font support hasn't been an issue for a long time (basically since Windows 2000 for windows people, Linux is probably more complicated, but it looks like freetype2 made most fonts available, starting, at the latest, in 2002).

      I'm not going to say anything dumb about typography, but high end word processors generally have pretty decent layout engines.

      --
      Nerd rage is the funniest rage.
    102. Re:LaTeX by Angstroem · · Score: 1

      I can't send a LaTeX document to someone else and expect them to be able to edit it and read it, even if they have LaTeX installed.

      Then you are using some quite non-standard style files which are not part of the basic texlive installation, nor the full one. If you are using publisher-specific style files (IEEE, ACM, Springer LNCS, GI LNI, and whatnot...), then of course the recipient needs to have them installed to compile your work.

      This, however, is most likely the case if you do collaborative work. Otherwise you wouldn't send source, but the compiled PDF.

      Unfortunately any program able to handle everyone's different styles for document printing is probably going to be too specialized for everyone to have.

      No idea what you are trying to say here. LaTeX will search for any style file in dedicated directories. Ever tried updating $TEXINPUTS?

      LaTeX shows that print layouts are a difficult problem. Even on webpages (screen display), to get really good layouts we rely on scripts, styles and templates from other sources, in most cases these are too numerous to make distribution of the document via e-mail trivial. Plus, we use specialized software (e.g. Dreamweaver).

      Unfortunately there's no good solution that I know of for this. Simply throwing text and images into a document does not make it readable, and there's no software that can simply take the jumble and make it readable, it takes a human touch to produce a good layout.

      Well, there's a reason why there's something like layouters and printers (the people, not the machine). It takes at least experience to design both, eye-friendly *and* appropriate, layouts. Web and printed media is full of examples where someone thought "hey, that program's wizard should do" -- or where they designed everything in Word and Powerpoint, violating each and every rule of proper design.

      Care to explain how Dreamweaver relates to LaTeX?

      That it takes *knowledge* and *talent* to come up with a pleasant layout is no problem of LaTeX, although for the average joe it is *easier* to cough up a pleasant-looking document (assuming that he uses a standard design template and not some bogged-up "let's make it look as ugly as it can be" like the ACM style) than with e.g. Word. Kerning, microspacing, orphans, widows, hyphenation: LaTeX will take care of that because it was *designed* to adhere to such printing rules.

    103. Re:LaTeX by CarpetShark · · Score: 1

      I know there is some unicode support, but I was under the impression that many aspects of unicode were not supported, and that some characters had to be entered as escape sequences (or whatever the equivalent of HTML entities is called). If this isn't the case, then please explain the actual limitations -- it would remove one of my three main objections to LaTeX :)

    104. Re:LaTeX by guttergod · · Score: 1

      You seem to be talking about LaTex. It already exists. Don't reinvent it.

      He may be LATEX INTOLERANT, you insensitive clod!!

      --

      Apple built a platform for their ideas, Google built one for everyone's.

    105. Re:LaTeX by dskoll · · Score: 1

      LaTex doesn't translate easily or cleanly into HTML

      Actually, Tex4ht does a superb job of translating LaTeX into readable HTML. We use it internally to produce HTML versions of our product manuals. (We ship both PDF and HTML.)

    106. Re:LaTeX by haifastudent · · Score: 0

      Another option is the Open Office PDF-export extension. It lets you export a PDF file with the original ODF file embedded. PDF viewers read it like a PDF, the user can edit it like an ODF. One file.
      http://extensions.services.openoffice.org/project/pdfimport

      --
      Thank for reading to the sig. You may stop reading now. It is safe. There is no more content. Why are you still reading?
    107. Re:LaTeX by Anonymous Coward · · Score: 0

      XeLaTeX

    108. Re:LaTeX by Anonymous Coward · · Score: 0

      Portability:

      Name me an OS that doesn't have a PDF or PS reader installed by default.

      Windows

      He said an OS, not a fucking toy.

    109. Re:LaTeX by shininghappydude · · Score: 1

      I completely concur about the SIL Gentium font for multilingual printing. I've used it for years now for Russian-English documents. Its only drawback is that it doesn't render nearly as nicely in browsers as it does on the printed page. (My version of Gentium is old, so maybe something about the font has changed so it is no longer the case.)

    110. Re:LaTeX by PDAllen · · Score: 1

      I can tell you exactly what a journal would do with HTML: the same as they'd do with a word doc or a handwritten manuscript, namely charge you a fee for retyping.

    111. Re:LaTeX by PDAllen · · Score: 1

      If you use LaTeX, since it's built on top of TeX, what you are basically doing is accepting Don Knuth's idea of good style. Some of it is good, some is not so good.

      If you use HTML, you aren't really taking anyone's idea of good style; HTML rendering is supposed to be minimalist, the browser is not meant to try to fix pretty hyphenation or whatever.

      HTML is not, originally, designed to do document design. It's not even really meant to do 'pretty' web design, which is why websites are typically a mess of CSS and set styles forcing the HTML to do something it's not basically meant to do - and the later HTML spec confuses this further by trying to make the ugly forcing style standardised. What HTML was originally for was a very basic markup, enough to cope with a few text sizes and colours (and pictures, badly) on a screen that could be anything from ASCII-text only (i.e. drop all the markup and pictures), 12'' mono, 21'' VGA, et cetera - the point being that the markup should be easy to parse for the browser to render as something intelligible.

      LaTeX is not especially antiquated; there's not much 'modern' that you can't do in LaTeX. What you can't easily do, is anything Knuth doesn't like. So, you may want to flow text by an image, but because Knuth doesn't like that you can't do it without a lot of work.

      But the simple fact for a journal is, if everything is in the same style it looks professional. So the journal will do everything in LaTeX, and your argument that your new style looks better will be ignored; they'll just bill you for retyping the document in LaTeX.

    112. Re:LaTeX by PDAllen · · Score: 1

      Unicode support depends on the operating system and the specific compiler. If you want to be sure everything will work on everyone's computer, escape everything that isn't standard ASCII-128. If you only care about your machine and you happen to have a nicely behaved compiler/OS combination, you can happily type an u-umlaut directly instead of \"u and it will work. That said, the escape codes are not exactly hard - it's not HTML where they're non-standard; if you want an acute accent then you use \' followed by the letter, et cetera.

    113. Re:LaTeX by CronoCloud · · Score: 1

      There are plenty of devices that have web browsers but no PDF viewer: the Nintendo Wii, Nintendo DS/DSi, the Sony PS3's GameOS and Sony PSP come to mind. Personally, I'd love to be able to open PDFs or do simple text editing in the PS3's GameOS. Sure, I've personally got Linux on it (and LaTeX), but it would be useful nonetheless.

    114. Re:LaTeX by dangitman · · Score: 1

      Authors ideally should not be concerned with visual layout. They just need to make sure that the logical structure (headings, notes, location of diagrams) is clear. Doesn't matter if they use Courier or Comic sans.

      Yes, this was my point. Word is terrible at both structure and layout. We need something that works better as a source document for both print and web professionals.

      --
      ... and then they built the supercollider.
    115. Re:LaTeX by Anonymous Coward · · Score: 0

      Nope. Every PC I've used for 10 years has had *some* kind of web browser (at least IE5), but I regularly work on machines that have no PDF viewing capabilities.

      Many (most? all?) versions of Windows have no built-in PDF viewer, and many Linux installations don't by default, either. I've got lots of test and production machines like this -- sometimes a virtual machine image, sometimes a physical box.

      (That's not necessarily a dealbreaker for PDF, but it does raise the barrier a bit.)

    116. Re:LaTeX by 1u3hr · · Score: 1
      Yes, this was my point. Word is terrible at both structure and layout.

      Old versions of Word -- eg Word 5/DOS -- DID have strong styling features.

      But Microsoft thought this was too hard a concept, so de-emphasised styles to the point that hardly any users know they exist, let alone use them.

      Most people just start typing and use the formatting bar to make headings or any other different styles. Or you find pages of text that are "Heading 1" converted to 12 point Arial.

      First thing I do when I get a file is scan through it to work out the structure, then spend 20 minutes or more making headings into heading styles and so on, before I can export it to my layout app.

      Microsoft was right though: most users do not understand, and do not want to understand, what a style is. (For layout purposes, "style" maps to "structure".) And in the "user-friendly" era, it's a no-no to expect users to spend 10 minutes reading instructions before using an app: they just want to type and directly format with the button bar. Even if it's something they use for hours every day, they WILL NOT crack a book to learn anything if they can get by on point and click. While MS Office has turned into bloated gigabytes of myriad features, untold menus and functions, most users never use any features (except spellcheck) beyond those in Wordpad, and would be more productive if they did just use that.

    117. Re:LaTeX by AceofSpades19 · · Score: 1

      I don't think many people would be reading academic papers on those devices

    118. Re:LaTeX by The_Wilschon · · Score: 1

      Flowing to fit the screen is a bad thing for readability, in my experience. If I'm reading a significant amount of text on a web page that allows that text to take up the whole width of the browser window, I find that I read it much much more quickly if I horizontally shrink the browser so that there are about (surprise!) 70 characters on a line. Many news sites and some blogs understand this and restrict the text width accordingly. PDFs are formatted for print, and as long as they are, as you describe, two-column 10pt, they are usually right in the sweet spot for easy reading.

      --
      SIGSEGV caught, terminating

      wait... not that kind of sig.
    119. Re:LaTeX by The_Wilschon · · Score: 1

      TeXinfo, which is still TeX based, AFAIK, but is not LaTeX, is commonly used in GNU projects to produce several forms of documentation, including info, HTML, and PDF. It produces all three more than adequately.

      --
      SIGSEGV caught, terminating

      wait... not that kind of sig.
    120. Re:LaTeX by The_Wilschon · · Score: 1
      \usepackage{ucs}
      \usepackage[utf8x]{inputenc}

      This has been able to handle any Unicode I've thrown at it, although that has primarily been some Greek and math characters. But if you look through the data files for the ucs package, it is clear that the Unicode support is quite extensive. Whether or not it is perfectly complete, I don't know.

      Or, you can use XeTeX, which I understand has full unicode support entirely built in.

      --
      SIGSEGV caught, terminating

      wait... not that kind of sig.
    121. Re:LaTeX by andymadigan · · Score: 1

      My point was that yes, it takes knowledge and talent to be able to properly lay out a document; a program can't do it for you. The amount of extra configuration involved to set up layouts is better kept in an external stylesheet than muddling the content itself, so you can't make it a single file.

      Those who were wondering what styles I've been using: I create my own (usually short) styles that basically just create a few macros, depending on the document set I'm working on (documentation for one program might have different needs than another). My interpretation of the original post was that he wanted just *one* file. I'm sure an old installation would be missing (for instance) svn (for displaying revision info in the document). Anything that changes TEXPATH would inherently make your documents work only on your system. There doesn't seem to be any way to guarantee that your LaTeX document will generate the same PDF (or DVI) on every platform.

      As for Dreamweaver to LaTeX: They're both used to layout documents, one does it for a web browser and one does it for a printer. Sure, Dreamweaver is GUI based, but they're both at least one level above the actual language of the output. Dreamweaver outputs HTML, but the "language" you are using in Dreamweaver is the GUI. Printers use PostScript, not TeX. There are some who write HTML by hand (myself included) but most people can't be bothered to try to make a serious page without using tools.

      --
      The right to protest the State is more sacred than the State.
    122. Re:LaTeX by Angstroem · · Score: 1

      Anything that changes TEXPATH would inherently make your documents work only on your system.

      I see you didn't grok the concept of environment variables.

      If *I* prefer to put my stuff in ~/.TeX/ and *you* prefer to put it in c:\misc\texpckgs\, then we can exchange compilable documents *because* we are individually able to adjust TEXPATH to our needs. All you need to tell me (or vice versa) is which standard and non-standard packages are required. It's then up to me (or you) to install them where they can be found and/or update TEXPATH to do so.

      As for Dreamweaver to LaTeX: They're both used to layout documents,

      Same goes for a gazillion other tools. Still, there's a difference between tools which put the layout entirely into your hands and tools which have built-in printer/layouter intelligence.

      That said, I definitely wouldn't use LaTeX for typesetting fancy magazines or flyers, as many required graphical elements (floating text/pictures being one of them...) are not easily accessible -- and dealing with pictures still is a major pain in the neck.

    123. Re:LaTeX by andymadigan · · Score: 1

      I understand environment variables perfectly. My point is that if you modify your TEXPATH to add non-standard packages, then your documents will not be freely interchangeable (you'll have to include a list of packages to install). That's a problem for collaboration.

      It's not the environment variable itself that causes the issue, it's the use of it.

      --
      The right to protest the State is more sacred than the State.
    124. Re:LaTeX by Angstroem · · Score: 1

      No, it's not the use of the environment variable. It's the use of non-standard packages -- and the will to throw them wherever you like them. If they are placed in a standard directory, e.g. ~/.TeX/ or /usr/share/texmf/tex/latex/ [*] then you don't even need to fiddle around with the environment variable.

      Still, you would need someone to install that very fancy package you liked to use.

      [*] Or, you're a nice guy and copy the used style files into the respective working directory which then gets archived, compressed, and sent to the collaborator. That would be the Word approach, clobbering everything into one container, saving a few hassles but eventually leading to different versions of the same style file hidden in the respective document directories.

    125. Re:LaTeX by Anonymous Coward · · Score: 0

      If they comply with the European Commission verdict, Windows won't even have a web browser.
      Doesn't Vista come with some simple PDF reader? Or is it preinstalled by the OEM as customized software?

  2. Congratulations! by Anonymous Coward · · Score: 5, Insightful

    Congratulations, you're the 5,134,978th person to suggest a change to HTML which will prevent it from being reflowable!

    Please step up to the spiked door in front of the acid pit to claim your prize.

    1. Re:Congratulations! by Memroid · · Score: 5, Funny

      Speaking of HTML/CSS... can I be the first person to suggest that we rename "Anonymous Cowardon" back to "Anonymous Coward"?

    2. Re:Congratulations! by TheThiefMaster · · Score: 1

      Can we PLEASE get a horizontal layout tag or property in HTML like all good resizable UIs? Getting divs to go next to each other horizontally without using float: left and clear: both is difficult, and using that approach is misusing floats and feels unclean.

    3. Re:Congratulations! by Anonymous Coward · · Score: 0

      Does someone care to comment why it is that HTML isn't reflowable?

    4. Re:Congratulations! by Lehk228 · · Score: 1

      they are called tables.

      --
      Snowden and Manning are heroes.
    5. Re:Congratulations! by styrotech · · Score: 1

      Other layout options have been in the works for a long time, and it will still be a long time before browser support is ubiquitous.

      CSS 3 has support for newspaper style text columns, and CSS 2.x allowed for laying out arbitrary elements using the same rules as various table elements - CSS 2 was released over 10 yrs ago, but IE only recently supported this part in IE 8.

      CSS 3 has languished while everyone waited for widespread CSS 2.x support to actually happen. It seems like we'll just need legacy IE versions to die out before anyone bothers with progressing CSS 3 further. Hopefully it doesn't die on the vine the same way XHTML 1.1 or 2.0 did.
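
      For illustration, here is a rough sketch of the two layout options mentioned above: CSS 2.1 table display values for side-by-side boxes, and CSS 3 multi-column text. The class names are made up for this example, and older browsers may need vendor-prefixed forms of the column properties.

      ```css
      /* Hypothetical class names; none of them come from the thread. */

      /* CSS 2.1 table layout: children of .row sit side by side without floats. */
      .row  { display: table; width: 100%; }
      .cell { display: table-cell; vertical-align: top; padding: 0 1em; }

      /* CSS 3 multi-column layout: text flows into newspaper-style columns. */
      .article {
        column-count: 2;
        column-gap: 2em;
      }
      ```

      A div with class "row" containing two divs with class "cell" would then lay out horizontally and still reflow when the window is resized.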

    6. Re:Congratulations! by Anonymous Coward · · Score: 0

      perhaps it's some kind of dinosaur

    7. Re:Congratulations! by Anonymous Coward · · Score: 0

      as a fellow Anonymous Cowardon, I second your motion!

    8. Re:Congratulations! by SlaveToSoftware · · Score: 1

      Awesome!

  3. PDF? by sys.stdout.write · · Score: 5, Informative

    As much as I hate Adobe, there's a reason why PDF files dominate academia.

    1. Re:PDF? by jskora · · Score: 1

      The thing with books is that most folks who can read and manipulate them. PDF is similar at this point, relatively ubiquitous and easier to use for most people.

    2. Re:PDF? by amicusNYCL · · Score: 1

      The thing with books is that most folks who can read and manipulate them.

      ...continue..

      Most folks who can read and manipulate books.. what? WHAT DO THEY DO? I must know!

      --
      "Our two-party system is like a bowl of shit looking at itself in a mirror." - Lewis Black
    3. Re:PDF? by Anonymous Coward · · Score: 0

      and the reason being?

      p.s. regarding AC, it is not that I am afraid. It is just that I am too lazy to create an account or that I forgot all about it. There are just too many web sites with too many accounts. Just being lazy that is............

    4. Re:PDF? by tygerstripes · · Score: 1

      I think that, while the GP meant to complete the sentence with a clause.

      --
      Meta will eat itself
    5. Re:PDF? by greyhueofdoubt · · Score: 1

      You are free to hate on Adobe- PDF is an open, standardized format now. I use it all the time in OS X; File>Print>Print as PDF...

      -b

      --
      No offense, but I've stopped responding to AC's.
    6. Re:PDF? by haifastudent · · Score: 0

      With an extension, Open Office can even export PDF files with the source ODF embedded:
      http://extensions.services.openoffice.org/project/pdfimport

      PDF readers see it as a legal PDF file to display, Open Office sees it as a legal ODF file to edit. Truly the best of both worlds.

      --
      Thank for reading to the sig. You may stop reading now. It is safe. There is no more content. Why are you still reading?
    7. Re:PDF? by vidaddy · · Score: 1

      At Videography Lab we are in the middle of a major copyright claim based originally on a website. The US Patent Office will not consider any content, academic or otherwise that is submitted via HTML. The documents must be fixed like jpg or other bitmap files ... perhaps a dynamic PDF. In this case all reference files in the PDF must be bitmaps that become part of the PDF package submitted. We understand that you are talking academia, but patents and copyrights are the end game. So we would like to hear from anyone who has made a claim with a dynamic PDF and succeeded. Bob Kiger - Videography Lab

    8. Re:PDF? by Jake+Griffin · · Score: 1

      Except that it makes it an excessively large file. If you need a PDF, save it as a PDF. If you need it as ODF, save it as ODF. If you need both, save it as BOTH. I don't want to have to keep an oversized file around for "compatibility." If that was the way to do it, why not have it formatted in XML, HTML, and plain ASCII text too?

      --
      SIG FAULT: Post index out of bounds.
  4. LaTeX by SanguineV · · Score: 1

    LaTeX: it has everything that you are looking for and can be easily compiled to ps, dvi, pdf, and (I am told but haven't used) html. It even plays nicely with version control, bibliography management (BibTeX), etc.

    As a bonus you can run it on linux via command line.

  5. wondering if we should let go of standard tags by tjstork · · Score: 1

    I am wondering if the whole concept of CSS modifying a set of stock tags is unwieldy, and if a simpler HTML might be one that allows you to first specify a page schema with custom tags, then, renders those using CSS to define custom tags. So, instead of having pages with div class = "menu", we might have , etc.

    --
    This is my sig.
    1. Re:wondering if we should let go of standard tags by Anonymous Coward · · Score: 0

      I suggest you do a little research on XML and XSL.

    2. Re:wondering if we should let go of standard tags by Hecatonchires · · Score: 2, Insightful

      "You know, if we abstract this back one level" Now we find the true terror of computer science.

      --

      Yay me!

    3. Re:wondering if we should let go of standard tags by Draek · · Score: 3, Insightful

      There's no complexity problem that cannot be solved by adding a layer of abstraction, nor performance problem that cannot be solved by removing a layer of abstraction.

      Though I must note that you can already define your own tags in HTML+CSS and, while the W3 validator will (rightfully) complain loudly about them, most browsers deal with them just fine.

      --
      No problem is insoluble in all conceivable circumstances.
    4. Re:wondering if we should let go of standard tags by CarpetShark · · Score: 1

      Yep, this is exactly what XML and XSL are for. You can generate PDFs from them, etc. Look into document-specific XML formats like DocBook, TEI, ODF, etc.

    5. Re:wondering if we should let go of standard tags by Anonymous Coward · · Score: 0

      /. accidentally your tag.

    6. Re:wondering if we should let go of standard tags by styrotech · · Score: 1

      That is already the case. CSS isn't just designed for HTML - it is designed to be able to be applied to any XML language also. Although XHTML is really the only common XML language where the user agents have implemented CSS rendering engines (they are very complex).

      Do any web browsers handle applying CSS to arbitrary (but valid) XML files that aren't HTML?

    7. Re:wondering if we should let go of standard tags by Ant+P. · · Score: 1

      Firefox understands CSS links, I seem to remember the BBC site using it in their RSS feeds. Konqueror (and by extension Webkit) don't seem to do XML at all.

    8. Re:wondering if we should let go of standard tags by Ant+P. · · Score: 1

      There should be an <?xml-stylesheet ?> in there after "CSS". /.'s shitty comment parser strikes again...
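
      As a rough sketch of what that looks like in practice: a stylesheet for a made-up XML vocabulary, linked from the XML file by the processing instruction shown in the comment. The element names and the file name are hypothetical, and how well a given browser renders this is something to test rather than assume.

      ```css
      /* notes.css, a hypothetical stylesheet for a hypothetical XML vocabulary.
         The XML file would link it with a processing instruction such as:
         <?xml-stylesheet type="text/css" href="notes.css"?> */

      /* Generic XML elements render inline unless told otherwise,
         so block-level structure has to be declared explicitly. */
      paper    { display: block; margin: 2em; font-family: serif; }
      title    { display: block; font-size: 1.6em; font-weight: bold; }
      para     { display: block; margin: 0.8em 0; }
      citation { display: inline; font-style: italic; }
      ```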

    9. Re:wondering if we should let go of standard tags by maxume · · Score: 1

      Lots of people essentially do this using divs and spans. For a while, people were also shouting about the css code that they started each document with, usually css that reset the spacing, padding and fonts for almost every element.

      --
      Nerd rage is the funniest rage.
    10. Re:wondering if we should let go of standard tags by idlemachine · · Score: 1

      You mean something like what this article outlines? Yes, You Can Use HTML 5 Today!

  6. CSS3 is the solution by Tiles · · Score: 5, Informative

    This is exactly what CSS is designed for, presentation. The CSS3 Paged Media module already defines a number of the properties and settings you're going for. It even includes positions such as @bottom-center to allow you to position footnotes and references. The only thing missing is a way to mark this up in HTML, which could easily be done with anchors and the longdesc attribute, coupled with the CSS content: property. What you're looking for is a CSS3 enabled browser, not a new specification.
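
    As a minimal sketch, the attributes from the submitter's imagined page tag map onto Paged Media rules roughly as follows. The margin order (top, right, bottom, left) is an assumption about what the submitter meant, the class name is made up, and the "startfrom0" numbering is not covered here. Print formatters such as Prince handle margin boxes; browser support has been patchy.

    ```css
    /* Sketch of the submitter's hypothetical page attributes as CSS Paged Media rules. */
    @page {
      size: A4;                      /* size="A4" */
      margin: 2.5cm 2.5cm 2cm 2cm;   /* borders="2.5cm,2.5cm,2cm,2cm", order assumed */

      /* Margin box in the bottom-left corner holding the page number. */
      @bottom-left {
        content: counter(page);      /* page_numbering="bottomleft" */
      }
    }

    /* Hypothetical class: start the reference list on a fresh page. */
    .references { page-break-before: always; }
    ```

    The document itself stays plain HTML; the user agent decides where each page fills up and breaks, which is what the submitter wanted the browser to do anyway.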

    1. Re:CSS3 is the solution by rs79 · · Score: 1

      Nice. So there's 3 ways to skin a cat.

      Normally I'd say use PDF cause the entire planet seems to use PDFs to get the page numbers right but acrobad became such an obnoxious pig in the last couple of releases I regard PDF now more as a warning than anything else.

      And yes stupid new spell checker in Opera, I meant to say "acrobad". Stop underlining it damn you.

      --
      Need Mercedes parts ?
    2. Re:CSS3 is the solution by Anonymous Coward · · Score: 1, Informative

      Why not use a proper PDF viewer, like one derived from the OSS poppler stack? I find xpdf to be a valuable tool on Linux, and prefer it with PDF docs to my old (very old!) standard of ghostview with PS docs. (PostScript being what we academics used to exchange print-media formatted papers before PDF appeared.)

      I rather like the newer latex modules that can be enabled when producing PDF outputs, so that tables of content and references are hyperlinked in the PDF document, while still providing a print-quality typeset document for the audience.

      Even for less academic technical documents, I find most reading/reviewing collaborations regarding documents are much better served by "let's now discuss page 74" rather than trying to navigate everyone via section labels, paragraph counts, etc. while they look on in their own re-flowed views.

    3. Re:CSS3 is the solution by Hecatonchires · · Score: 1

      I've switched from adobe to foxit reader for my pdf viewing. Much much quicker to load. You do need to uninstall a toolbar tho, which I didn't like.

      --

      Yay me!

    4. Re:CSS3 is the solution by Agent+ME · · Score: 1

      You need to uninstall a toolbar? I'd call that a plus on most systems I see with too many Yahoo/Google/MSN toolbars installed on the browser.

    5. Re:CSS3 is the solution by Anonymous Coward · · Score: 0

      Hell, a browser that properly follows the CSS **2** rules regarding print documents would be nice. Unfortunately, printer output control via CSS is in terrible shape. Now admittedly, I haven't tested Chrome 2 or Firefox 3.5 on a printer, but Firefox 3 and Webkit's handling of print-specific CSS attributes is no better than IE's -- and all browsers' handling of this area of CSS is no stronger today than it was in 1999, over ten years ago.

    6. Re:CSS3 is the solution by dgatwood · · Score: 1

      Odd. I've used the CSS 2.1 print media standards on several occasions, and they have worked consistently well for me since Safari 3 came out.... What doesn't work for you? You aren't talking about the obsolete (and massively broken by design) CSS 2 page size bits that were removed in CSS 2.1, are you?

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    7. Re:CSS3 is the solution by maxume · · Score: 1

      It may be because I installed a bunch of ram at about the same time I switched, but Reader9 is much more responsive than 7 or 8 (it is still a bit of a pig the first time it loads, but subsequent launches are quick enough).

      I had started looking for an alternative, but I needed to install 9 for the Adobe only commenting support (they use some sort of certificate system so it only works in their products) and the experience has been smooth enough that I stopped messing around with Foxit and Sumatra.

      --
      Nerd rage is the funniest rage.
    8. Re:CSS3 is the solution by Anonymous Coward · · Score: 0

      I don't think he needs a browser, just a CSS3-compliant client that will print out his documents correctly, such as Prince, which creates perfect PDFs from HTML+CSS3. See "Printing a Book with CSS: Boom!". That should be enough, for the time being, until browsers catch up.

  7. ODF by minsk · · Score: 2, Insightful

    LaTeX already got mentioned, and probably makes more sense.

    If you really want an unreadable super-general XML-based format, use ODF.

  8. Why not use CSS? by mckinnsb · · Score: 1

    I don't seem to understand why you couldn't simply change the properties of standard HTML tags to fit your needs with a simple CSS sheet. HTML, after all, was designed with the explicit purpose of representing a document.
    Otherwise, if you want special tags, use LaTeX.
    Otherwise, I'm sorry, it's really a crazy idea.

  9. Have you looked at PrinceXML? by sandford · · Score: 3, Informative

    Is there a reason you don't want to use CSS? Because, there are already CSS extensions that do exactly what you want. The book Cascading Style Sheets - Designing for the web, was written using only HTML and CSS and prepped for printing using PrinceXML. The PrinceXML web site has a bunch of similar HTML+CSS samples, including academic papers.

    1. Re:Have you looked at PrinceXML? by ccvqc · · Score: 2, Interesting

      I've had great experience with PrinceXML -- same document to generate both interactive web page and printable PDF using CSS3 tailored to the media. If you already know HTML/CSS, extending yourself to CSS3 is a lot easier than learning LaTeX.

    2. Re:Have you looked at PrinceXML? by shutdown+-p+now · · Score: 1

      Thirded. PrinceXML is precisely what you ask for. See the sample output and judge for yourself.

    3. Re:Have you looked at PrinceXML? by Forget4it · · Score: 1

      In PrinceXML using CSS you can't go in and out of italics in a page's running heading.

      --
      Artificial intelligence is the study of how to make real computers act like the ones in the movies.
    4. Re:Have you looked at PrinceXML? by Anonymous Coward · · Score: 0

      Prince is what the poster is looking for, but Zaffle's comment below says why it is unsuited for traditional academic work.

    5. Re:Have you looked at PrinceXML? by Plug · · Score: 1

      So? That's one bug. Report it to them, they'll probably fix it. No reason to write off the entire program.

    6. Re:Have you looked at PrinceXML? by Forget4it · · Score: 1

      It's not a bug - it's a limit of the CSS approach - you can't put markup in CSS textual items. It's out of the spec.

      --
      Artificial intelligence is the study of how to make real computers act like the ones in the movies.
    7. Re:Have you looked at PrinceXML? by wolverine1999 · · Score: 1

      I use it and I'm quite happy with PrinceXML...

    8. Re:Have you looked at PrinceXML? by Plug · · Score: 1

      My apologies. A bug in the spec then :)

  10. Why not use CSS? by Homburg · · Score: 1

    Something that would make no use of CSS?

    Given that CSS does this already, what's the advantage of adding another way of doing it without CSS?

  11. Wrong, in many ways by Zaffle · · Score: 5, Insightful

    What you want (being able to define pages) is wrong in many many ways.

    You should, as an authoring tool, never define a page, or its dimensions, especially academic works, which will be printed in different formats, on different paper (A4/Letter/Tradeback/etc/etc)

    At most, whatever markup you have may define things like page breaks, but even then, they are more a typesetting issue.

    What you want is either LaTeX or DocBook.

    --

    I use to have a funny sig, but slash cut it off, and I forgot what the punchline was.
    1. Re:Wrong, in many ways by Ambiguous+Coward · · Score: 4, Funny

      You should, as an authoring tool, never...

      Who're you callin' a tool?

      --
      Their may be a grammatical error, misspeling, or evn a typo in this post.
    2. Re:Wrong, in many ways by Anonymous Coward · · Score: 0

      Who're you callin' a tool?

      I pity the tool!

    3. Re:Wrong, in many ways by drinkypoo · · Score: 1

      What I found most confusing about the question was the reference to paging as relates to bibliographic references. This is a strange question for two reasons:
      1) Bibliographic references are typically treated as endnotes; i.e. the references appear throughout the text (or at least chapter) and follow the text in question.
      2) Footnotes and endnotes are both creations which existed because hypertext did not. We still use symbol reference hyperlinks next to relevant text in hypertext because that is the commonly accepted scheme of writing certain types of document. In a hypertext document, however, the notes can simply jump to the footnote or endnote, and the user can click the 'back' button to get back to where they were (if you haven't totally broken the page metaphor with some sort of unnecessary dynamic content...)
      It seems to me that as you say they want a documentation markup language which can be exported to HTML, or that they want PDF.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
  12. Static Page Feeds are available by caffiend666 · · Score: 4, Informative

    Static configurations are available already, not the intelligent ones being requested. Has sufficed for what I needed:

    To have print page break add: <p style="page-break-before: always">

    Also, to hide odd font and underline for links:

    <STYLE TYPE="text/css" MEDIA=print> <!-- A { text-decoration: none; color: black } --> </STYLE>

    Yes, they have to be massaged a little.
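
    For what it's worth, the two snippets above could be gathered into one print stylesheet along these lines. This is only a sketch; the class name is made up, and it would be applied as class="new-page" on whatever element should start a new printed page.

    ```css
    /* Print-only rules; the class name new-page is hypothetical. */
    @media print {
      /* Force a page break before any element carrying the class. */
      .new-page { page-break-before: always; }

      /* Print links as plain black text with no underline. */
      a { text-decoration: none; color: black; }
    }
    ```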

    --
    Here's to losing my Karma Bonus again....
    1. Re:Static Page Feeds are available by Anonymous Coward · · Score: 1, Insightful

      1995 called, they want their old HTML back.

      Seriously, no browser has needed the HTML comment stuff inside of style tags in many years. And don't even get me started on the uppercase tag names...

    2. Re:Static Page Feeds are available by dgatwood · · Score: 2, Insightful

      The HTML comments inside of style tags are still a good idea. Although no modern browser requires it, not everything that parses HTML is a full-blown web browser. Those extra seven bytes don't hurt anything, and they pretty much guarantee that any code with anything resembling a proper HTML parser won't interpret the styles, JavaScript, etc. as content even if the tool doesn't understand or care about specific tags.

      Perhaps more importantly, from a purely philosophical point of view, leaving out the comments in style tags is wrong. That line noise is not part of the content, and therefore should be fundamentally separated from the presentation. Other stuff like that (link URLs, image URLs, inline styles, etc.) are all in HTML attributes or otherwise sequestered from the text content. Putting CSS or JavaScript bare inside a tag without surrounding it with comment markers violates the fundamental philosophy of HTML. Yes, this means the XHTML spec is fundamentally defective by design.

      I'll leave the uppercase/lowercase flame war to people who care.

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    3. Re:Static Page Feeds are available by TheThiefMaster · · Score: 1

      It shouldn't need to be commented due to being in a tag which is only allowed in the <head> tag of the document. Same for script.

    4. Re:Static Page Feeds are available by Ant+P. · · Score: 1

      Eh? Nowhere in anything does it say script tags are only allowed in the <head>.

      HTML comments shouldn't be there in any case, since both tags' content is supposed to be CDATA. A web browser could completely ignore everything between those <!-- --> markers and still be within spec - and for XHTML pages that's exactly what they do.

      Anyone concerned about working around bugs in broken HTML parsers should learn to use external links instead of prolonging the lifespan of hacks and workarounds.

  13. hey, why don't we... by Anonymous Coward · · Score: 2, Insightful

    create yet another little-used and poorly supported document format...

  14. unnecessary by Anonymous Coward · · Score: 0

    the document size will increase.
    normal text like ("this is my file and my image and my link and my e-mail")
    will need more tag and element to let the browser speak with it.
    e.g ("this is mymy file") and so on.

    this is just an unnecessary waste for bandwidth and time :).

    especially when there are alternative solutions, e.g. PDF, DOC, OpenOffice.

    cheers

  15. Nope by nlawalker · · Score: 1

    HTML describes a document. "Document" used to imply printed pages, but it doesn't anymore. HTML doesn't have anything to represent the notion of a page because documents don't have pages.

    1. Re:Nope by nlawalker · · Score: 3, Informative

      I should have been more clear: HTML describes the *structure* of a document, of which pages are not a part.

      As many have said above, you could use CSS if you really wanted to, since page specifications are presentational aspects of the document. Or, you could use LaTeX, which is designed for this kind of use.

    2. Re:Nope by Anonymous Coward · · Score: 0

      Exactly. An additional point - if there is some reason why the page breaks have importance - a good example would be if you're encoding a famous paper which is cited by its page numbers - and you want the page breaks to be part of the document structure, HTML is the wrong markup language: you want TEI, which has a page break element and is used for literary texts and other "texts which are important as texts."

  16. Not a bad idea but it points to a larger issue by jeffgtr · · Score: 1

    Actually this makes a great deal of sense to me. I'm not sure on this but I think HTML5 contains tags for many of the things needed. I don't think CSS is the answer though, as it is for presentation only. HTML is for hierarchy and structure of information, as is XML. The part that makes sense with this is that it would be standardized (if you can keep Microsoft out of it) and could easily be transitioned back and forth between the web, ebooks and whatever device came next. PDF is widely used but truly it is a pain to convert into a structured document. Word is a nightmare with all of the jumbled up MS proprietary tags. I've yet to see an online editor that will clean up that mess with a simple copy and paste. The real issue is standardization in the way we store textual information. It's a huge issue and frankly Microsoft needs to be called on the carpet for manipulating and at the very least getting in the way of standards. Its refusal to recognize standards has caused needless expense to anyone that publishes information on the web. Few people realize the damage MS has caused on the web. Everyone bitches and moans about their operating system but only those directly involved in creating content for the web seem to complain about IE and their corruption of a standardized open document format. The damage they have done in this arena will haunt us longer than Windows will, in my humble but sincere opinion.

  17. You don't actually want HTML by Anonymous Coward · · Score: 2, Insightful

    Seriously. It's pretty bad. You can, however, use Docbook (or your own schema or Docbook extended with your own stuff) and XSLT it into XHTML (or something entirely different) at the end.

    Most likely you just want to use LaTeX though.

  18. what do you want to do? by jipn4 · · Score: 4, Informative

    If you want to save the source form or markup, use a language designed for it: LaTeX. LaTeX lets you represent all the things you would want to represent in an academic paper, it's fairly readable, very widespread, and has tons of tools. And LaTeX converts to both HTML and PDF.

    If you want to display on the web, use HTML. It's meant for the web. It's not a good representation for paged media. If you must represent paged media, you need to use CSS or XSL, but you probably don't want to.

    If you want archival quality paged representations, PDF is the only game in town really. HTML with CSS doesn't come close. But it doesn't make sense to save your own papers only in PDF because PDF is not really editable and doesn't have the semantic information.

    1. Re:what do you want to do? by Akir · · Score: 1

      PostScript, the technology on which PDF is based, is still around and running. It can do everything required for this application, since it was designed for paged media (such as real pages). Best of all, it's a more open standard than PDF is, and it's actually slightly more ubiquitous since there are so many printers that use it.

    2. Re:what do you want to do? by EvanED · · Score: 1

      it's actually slightly more ubiquitous since there are so many printers that use it.

      To play devil's advocate, I'd guess there are more computers without a printer than there are computers without the ability to read PDF files.

  19. page breaks with css by Anonymous Coward · · Score: 0

    From JavaScript Site: "Page breaks with CSS"

  20. Don't use HTML by emandres · · Score: 3, Informative

    You wouldn't want to use HTML for something like this, especially with newer versions of HTML. There has been a steady transition in HTML away from specification of the aesthetic appearance of a page. For this reason the purely presentational tags are now considered nonstandard, mostly because CSS does a way better (and cleaner) job of it.

    --
    The only way to tell the difference between a hamster and a gerbil is that the hamster has more white meat.
    1. Re:Don't use HTML by Wonko+the+Sane · · Score: 3, Insightful

      HTML was never supposed to do those things in the first place. The tags you are referring to were hacks invented because CSS did not exist yet.

      Unfortunately there is a whole generation of "web developers" who don't understand the concepts of semantic markup and output device-independent layouts.

    2. Re:Don't use HTML by Will.Woodhull · · Score: 1

      You wouldn't want to use HTML for something like this, especially with newer versions of HTML

      Well, that sort of depends on where the value of the paper is.

      Some academic papers are pure fluff, and-- you are right-- these do not fare well in a semantic mark-up language like HTML. If there is no meaning in the content, then the semantic tags are being applied in a purely arbitrary manner and the result ends up looking as void of sense as it actually is. OTOH, if the content of the academic paper has meaning, then the semantic mark-up of HTML will emphasize this meaning and make it clear to the bots and spiders that index the web, and the paper will be quickly integrated into the universal library of knowledge that is being built. That's a good thing.

      As far as appearances go, once the semantics are done properly, then CSS can style the paper so it presents well on the screen and, through the modern miracle of "@media print", it can also produce quality hardcopy.

      It is true that it would be very difficult with today's CSS to generate the kind of detailed hardcopy formatting needed to comply with the American Psychological Association's standards and other, similar, standards. But these standards were put into place to enforce a common semantic meaning on whitespace and typography usage; they were actually attempting to do the same thing on paper that HTML semantic tags succeed in doing in digital formats. They are increasingly being replaced by HTML (since HTML does this so much better). It has been many long years since I've heard any academics talking wishfully about page scanning machines that could run through the last five years of JAMA journals and extract all the citations. Now you can do that with a few lines of script built around a regex that searches for <cite> tags...
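
      A rough sketch of the "few lines of script built around a regex" idea, in Python. The sample markup and the names SAMPLE_HTML, CITE_RE and extract_citations are made up for illustration; a real run would read the journal pages from disk or over HTTP, and a proper HTML parser would cope better with malformed markup than a regex does.

```python
import html
import re

# Hypothetical sample markup; a real journal page would be read from a file or URL.
SAMPLE_HTML = """
<p>As shown in <cite>Smith 2004</cite>, and later in
<CITE class="ref">Jones &amp; Lee 2007</CITE>, the effect is robust.</p>
<p>See also <cite>Smith 2004</cite>.</p>
"""

# Match <cite ...>text</cite>, case-insensitively and across line breaks.
CITE_RE = re.compile(r"<cite\b[^>]*>(.*?)</cite\s*>", re.IGNORECASE | re.DOTALL)


def extract_citations(markup: str) -> list[str]:
    """Return the text of each <cite> element, in document order, without duplicates."""
    seen = set()
    citations = []
    for match in CITE_RE.finditer(markup):
        # Collapse whitespace and decode entities so wrapped citations compare equal.
        text = html.unescape(" ".join(match.group(1).split()))
        if text not in seen:
            seen.add(text)
            citations.append(text)
    return citations


if __name__ == "__main__":
    for citation in extract_citations(SAMPLE_HTML):
        print(citation)
```

      This only works as well as the markup it is fed: it assumes authors actually wrapped their citations in <cite>, and it does not handle nested <cite> elements.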

      --
      Will
    3. Re:Don't use HTML by am+2k · · Score: 1

      In their defense, the HTML standard doesn't really allow fully semantic markup in the first place, and it's hard to get even close to that point.

      For example you have to do a lot of div wrapping (even double-wrapping pretty frequently) if you want to match a precise layout. Those have to happen in the HTML file.

    4. Re:Don't use HTML by Wonko+the+Sane · · Score: 1

      HTML isn't about specifying a layout. The purpose is to add semantic metadata so that rendering device can decide how to best present the information to the user.

      A properly written HTML document should present information to the user in a useful form regardless of if the user is using a traditional browser, mobile phone, braille terminal or text-to-speech screen reader.

      For example, the <EM> tag has a specific semantic meaning that can be used by all of the above output devices to enhance meaning, while <I> does not.

      It's a different mindset than desktop publishing. Instead of telling the browser, "Make this text bold" you should tell it why the text should be bold. That way the browser (visual or otherwise) can decide how best to present the information to the user.

  21. XML/XSL/FOP/PDF by sgrover · · Score: 1

    I use XML/XSL to render my content as needed - including images and SVG graphics where needed. Then I use the FOP project to convert the generated XSL-FO into PDF. Works great and can be scripted easily. But the learning curve is kinda steep. Luckily there are a few tutorials out there.

    1. Re:XML/XSL/FOP/PDF by Lil'wombat · · Score: 1

      I 2nd the XSL-FO recommendation.

      XML is like violence. If it doesn't solve your problems, you're not using enough.

      --

      Truth: If it's not one thing, it's another

    2. Re:XML/XSL/FOP/PDF by caerwyn · · Score: 1

      Some people, when confronted with a problem, think "I know, I'll use XML." Now they have two problems.

      --
      The ringing of the division bell has begun... -PF
  22. Anonymous Coward by Anonymous Coward · · Score: 0

    Yeah, seriously? This is not a valid slashdot article. PDF and numerous other formats exist for a reason. Why reinvent the wheel? There were no reasons stated in this article why any of the other, very popular open standards for documents couldn't be used.

    Ugh... who submits these articles?

  23. In my day by Barny · · Score: 2, Insightful

    I used netscape communicator to write all my papers for uni, mainly because it was available under windows and unix (IRIX in our case) and could be read by anyone on any platform.

    It was a reasonably easy to use editor, without all the useless crap most others have.

    A few lecturers were quite impressed with the idea, the portability and cost were big factors.

    --
    ...
    /me sighs
    1. Re:In my day by Antique+Geekmeister · · Score: 0, Flamebait

      And the rest of us used Emacs, just to get the indenting consistent and make sure we closed our parentheses correctly. But the amount of time people waste on page breaking where they want, font selection, "just so" footnote standards, etc. is a sign of people who don't have anything to actually say.

      The exception is people who make visual illusion picture books. Other than that, let's get over our "web designer", IDE driven fascination with layout, and use a straightforward plain text format. Then get on with writing something worth reading, not something to be treasured for its footnote layout.

    2. Re:In my day by brusk · · Score: 1

      How did you handle footnotes? Page numbers?

      --
      .sig withheld by request
    3. Re:In my day by Barny · · Score: 1

      get on with writing something worth reading, not something to be treasured for its footnote layout.

      That's the main reason I used it, I could just open it up on any of the machines they had there (at the time, netscape was the main browser used on campus) and type without having to think "am I in the right font?" or stuff around with paragraph setups and crap that office used.

      Of course at the time I was reasonably new to unix so it was something I was comfortable with and knew about already.

      --
      ...
      /me sighs
    4. Re:In my day by Lemmy+Caution · · Score: 1

      Academic citation guidelines allow you to cite paragraph numbers instead of page numbers.

    5. Re:In my day by Barny · · Score: 1

      Select everything you want on one page, wrap it in a table tag, set table to 100% of screen height and leave a separate row at the bottom for page numbers (and a second for footnotes if there were any).

      --
      ...
      /me sighs
    6. Re:In my day by AigariusDebian · · Score: 1

      You should be shot for that. Shot to death.

    7. Re:In my day by Barny · · Score: 1

      Yup, but it was better than trying to get windows 98 running in a VM under IRIX and putting microsoft office on it ;)

      --
      ...
      /me sighs
    8. Re:In my day by Antique+Geekmeister · · Score: 1

      Oh, I've no issue with _your_ use of a simple, available tool. Netscape also produced clean HTML, not the "let's HTML tag every character independently with its own font information so it looks like Microsoft Word" format that many tools produce when they output in HTML. (Microsoft Office is amazingly bad about this.)

    9. Re:In my day by crankyspice · · Score: 1

      I used netscape communicator to write all my papers for uni, mainly because it was available under windows and unix (IRIX in our case) and could be read by anyone on any platform.

      I did the same thing, mostly because I dual-booted between Windows NT and Slackware Linux. Actually, Netscape was (before I learned LaTeX etc) the only app I could get decent formatted printing out of, on my PCL3 HP DeskJet 400, on Linux, circa 1996. (I was a n00b. I don't think I'd even discovered a2ps yet!)

      But that was before my academic work had to include footnotes, cross-references within the document, etc. Writing papers in HTML was use of a simpler tool, from a more civilized age. ;)

      These days for me it's OpenOffice.org FTW, it's not perfect but it's usable for what I need it to do (mostly keeping contract revisions straight, with cross-references and changes tracked, etc). Still cross-platform, much better results.

      --
      geek. lawyer.
    10. Re:In my day by brusk · · Score: 1

      Actually I think the better punishment would be to require him to stick to this system for the rest of his life, for everything he ever writes.

      --
      .sig withheld by request
    11. Re:In my day by Anonymous Coward · · Score: 0

      Well think about it... you don't really need foot-notes for most classwork, and if you do, you can put them all at the end of the document (before or after the bibliography).

      The page numbers show up when you print the document with most web browsers.

    12. Re:In my day by tbird81 · · Score: 2

      But the amount of time people waste on page breaking where they want, font selection, "just so" footnote standards, etc. is a sign of people who don't have anything to actually say.

      While you're probably correct, I spent ages choosing the correct font, making sure pair-kerning was on, making sure headings were standard in size and form (i.e. setting up new Styles), getting rid of widows/orphans, and basically all sorts of procrastinating shit to delay actually typing the essay.

      I always did much better than the people with non-professional looking assignments, even though most of what I write is crap.

      Never underestimate the power of first-impressions and of looking good.

    13. Re:In my day by Antique+Geekmeister · · Score: 1

      Sadly, you're correct in many cases: packaging over content is why a lot of truly frightening snack foods remain on store shelves. (What _are_ Combos, anyway? Did anyone ever actually want them?)

      However, I suspect that you underestimate the quality of your actual content, especially compared to some of the recent business plans I've seen colleagues write. You seem coherent and actually make your point, and may raise the average Slashdot quality significantly.

  24. XSL:FO by Roxton · · Score: 4, Informative

    There's a little-used standard that came out of the W3C along with XSLTs called XSL:FO. You write your document in XSL:FO markup, and then one of any number of processors like XEP to convert it into PDF or what have you.

    http://www.w3schools.com/xslfo/default.asp

    One of the original purposes of it was so that you could use XSLTs to transform the same XML data into both XHTML or XSL:FO for publishing. The standard never took off though. XSL:FO just doesn't have enough options to be typographically interesting, compared to SVG.

    Of course, the right answer is LaTeX, but you might want to give XSL:FO a try for familiarity's sake.

    1. Re:XSL:FO by Anonymous Coward · · Score: 0
      We can go back and forth in time on the specifications and find all sorts of interesting ones, but the question is whether they will be portable; that is, whether common software will decode them as users expect. HTML is derived from SGML, which is derived from IBM's GML, which, like runoff, is a document preparation system and will do everything the OP wants. The disconnect we are seeing here is likely the desire for a WYSIWYG fixed presentation across platforms, a la Adobe PDF, and the reality that most WYSIWYG editors don't give a user that level of control. For instance, very few web browsers pass Acid 3, and I don't know of any that pass Acid 4, tests that guarantee that things will always look the same.

      XHTML is a good answer to the original question, however, as it will allow the bells and whistles needed for an academic document. However, as the question was ill posed (it suggested a solution rather than stating a general problem to be solved with constraints), there is no way an ideal solution will be achieved.

    2. Re:XSL:FO by CarpetShark · · Score: 1

      HTML/CSS does not guarantee that things will "always look the same"; that's not the intent at all.

    3. Re:XSL:FO by styrotech · · Score: 2, Informative

      You write your document in XSL:FO markup, and then one of any number of processors like XEP to convert it into PDF or what have you.

      Ouch :)

      Hand writing XSL:FO is extremely painful - very fiddly and the embedded layout/styling gets tedious quickly. It's kinda like writing a very very long webpage using HTML 3.2 with all the nasty old embedded presentation tags (but worse).

      One of the original purposes of it was so that you could use XSLTs to transform the same XML data into both XHTML or XSL:FO for publishing.

      I have a feeling that is a bit backwards. The original standard was XSL and it was going to include everything related to transforms and publishing, but it got too large and complex so they split it into XSLT for the transforms and XSL:FO for the page description language. Much better that way, as XSLT has wider uses than publishing.

      I think XSL:FO was always intended to be generated via XSLT rather than hand written, and I don't think that has changed at all. That way if you only need to make a styling change, rather than making a zillion edits throughout the document you change the transform. It is analogous to how CSS makes styling changes much easier with HTML.

      Personally I'd rather use some other semantic format (eg Docbook, DITA etc) that can be transformed into XSL:FO via XSLT when required (eg on the way to PDF generation). That way you already get some handy XSLT starting points to work with. Making the occasional small tweak to XSLT isn't too bad, but writing a large complex set of transforms from scratch isn't something I'd want to do :)

  25. Prince XML by cwt137 · · Score: 0

    I think writing papers using XHTML and CSS 2.1 or 3 is a good idea. Then you can use Prince XML to convert it to PDF. Their site has a nice sample or two of journal articles / conference papers. The quality of the renderer is great. It was even used to create a professional book, Cascading Style Sheets: Designing for the Web.

  26. Docbook, definitely by ishmalius · · Score: 1

    It has exactly what you need, an html-like format, but tagged by meaning, not presentation. The project has tools to convert it to printable formats.

    The spec: http://www.docbook.org/

    The tools: http://docbook.sourceforge.net/

  27. Use DITA by wooden+pickle · · Score: 4, Informative

    Someone mentioned XML/XSL/FO. Don't try to write your content in XSL-FO. You'll hate every minute of it.

    I'd look into using DITA (Darwin Information Typing Architecture). It's a set of canned XML structures, plus a specification for how to process and customize those structures. It includes tags for stuff like footnotes...I bet it covers a lot of your use cases. There are some good intros to how these XML structures work here: http://dita.xml.org/book/dita-wiki-knowledgebase

    As DITA is XML, you can convert it to HTML and whatever else you feel like, pretty easily. There's an open-source implementation of the DITA spec called the DITA Open Toolkit (http://sourceforge.net/projects/dita-ot/). The DITA Open Toolkit includes stylesheets/scripts to publish HTML and PDF, among other things. PDFs are published via XSL-FO. Just like HTML needs a web browser to render something useful, XSL-FO requires an FO processor to create a PDF. So, in the end you write DITA, XSLT and other scripts transform that DITA to XSL-FO, then an FO processor consumes the XSL-FO and spits out a PDF. The DITA Open Toolkit comes with an open-source FO processor (Apache FOP). FOP doesn't fulfill everyone's needs, but it might work very well for you.

    Unfortunately, working with the Open Toolkit and customizing its output can be a bit unwieldy. http://groups.yahoo.com/search?query=dita+users is a pretty good place to look for help.

  28. RDFa to model bibliographical data by TwistedPants · · Score: 1

    Don't reinvent, as so many have already said. CSS works for print media, LaTeX works wonderfully, PDFs work wonderfully. RDFa lets you really define the semantics of anything - People, Businesses, Bibliographic data - in a workable way.

  29. LyX + LaTeX ... DUH! by WolphFang · · Score: 1

    LyX + LaTeX ... DUH! It even makes it easy to take public domain OCR'ed books and reset them into something extremely nice.... *quickly*

    --
    leather-dog muksihs
    Blog: @muksihs
  30. texexplorer by e**(i+pi)-1 · · Score: 4, Interesting

    yes, latex is nice, but it would be even better if basic TeX were understood by browsers. About 10 years ago, IBM had a cool plugin called techexplorer. The plugin would compile latex on the fly. No need to publish a PDF. It worked pretty well for basic documents which did not rely on macros.

    Still, to address the question of the submitter, it would be nice to have something like

    <latex>
    $\int_0^1  \frac{\sqrt{\sin(x)}}{1+x^2} \; dx$.
    </latex>

    It would not have to be the full latex stack, just the ability to place mini latex pages into HTML documents. It's a pity techexplorer technology seems to have disappeared. If IBM would opensource it, it could become an add-on for firefox.

    1. Re:texexplorer by Hecatonchires · · Score: 1

      +1 neat idea

      --

      Yay me!

    2. Re:texexplorer by Anonymous Coward · · Score: 0

      Take a look at the XKCD forums. It's not exactly what you're looking for, but it's close.

    3. Re:texexplorer by Yobgod+Ababua · · Score: 2, Informative

      That example sounds like it could be well rendered using MathML... http://www.w3.org/Math/

    4. Re:texexplorer by Anonymous Coward · · Score: 1, Informative

      There is LatexMathML though.

      http://math.etsu.edu/LaTeXMathML/

    5. Re:texexplorer by goose-incarnated · · Score: 1

      Still, to address the question of the submitter, it would be nice to have something like <latex> $\int_0^1 \frac{\sqrt{\sin(x)}}{1+x^2} \; dx$. </latex> It would not have to be the full latex stack but the ability to place mini latex pages into HTML documents. It's a pity techexplorer technology seems to have disappeared. If IBM would opensource it, it could become an add-on for firefox.

      Why does it need to be an add-on for firefox? That could more easily be rendered as inline low-res images by the server, so currently it can be done.

      --
      I'm a minority race. Save your vitriol for white people.
    6. Re:texexplorer by linuxtuba · · Score: 0

      Already Exists

      Tex the world http://thewe.net/tex/

    7. Re:texexplorer by Anonymous Coward · · Score: 0

      http://math.etsu.edu/LaTeXMathML/

      Might be what you want.

  31. A solution requires a problem by carlzum · · Score: 1

    What is it you're trying to accomplish? Non-standard HTML is certainly not a solution for whatever printing problem you're having, and it eliminates the benefits of HTML. Listen to everyone else that's responded. LaTeX solves most gripes people have with word processors, stick with CSS if you have a compelling reason to use HTML, and look into Docbook XML if you're not happy with the first two options.

    If you want to use HTML just to prove it can be done, go for it if you think it sounds fun. But if you're serious about using it for publishing, forget it. No one's going to accept a homegrown HTML file for printing.

  32. Which paper size? by TalkingToes · · Score: 1

    What size paper would we all agree upon? You listed "A4", I like "Letter". Close in size, but different. Get the world to agree, and maybe you'll have your wish one step closer. I'd not vote for "Business Card" sized.

    --
    5'16" is easy math, so why do so many miss it?
  33. yes, it's a crazy idea. by porky_pig_jr · · Score: 1

    This is exactly what HTML was *not* intended to be. We're talking about viewing of a document, with different browsers. No standard display is guaranteed, no matter what you try. For academic documents use software like LaTeX, and create a PDF file, or, use Microsoft and create a doc file, or whatever. I remember reading somewhere a discussion of why LaTeX cannot be mapped exactly to HTML (maybe it was the TeX FAQ, not sure), and that was pretty much it. Different goals in either case.

  34. Editors should be ashamed. by Anonymous Coward · · Score: 0

    As mentioned by everyone else in this thread, LaTeX is exactly what you're looking for. HTML is absolutely not, and should never be made into, a page description language.

    The editor of this Slashdot summary should be ashamed for not being familiar with LaTeX, one of the greatest open source projects.

  35. Learn the tools first, then worry about changing by crmartin · · Score: 2, Informative

    See, as someone has already pointed out, there's at least one such tool that's in wide use already: TeX and LaTeX. If you don't like that one, it turns out that HTML, with CSS and a little bit of Javascript, is perfectly capable of doing all the things you want, too. You just have to learn how. Have a look at Lie's Cascading Style Sheets: Designing for the Web (written and typeset in HTML/CSS) and at Prince XML for detailed examples.

  36. Mod parent up by bluej100 · · Score: 1

    If you want to print HTML, Prince is the way to go. It even makes our end-user-generated TinyMCE documents look good.

    1. Re:Mod parent up by CarpetShark · · Score: 1

      Prince is the way to go... It even makes our end-user-generated TinyMCE documents look good.

      Yeah, but that's hardly accurate rendering ;)

      Seriously... I've been meaning to try Prince too; it seems like an ideal solution. Do you know if it can make properly tagged (accessible) PDFs?

  37. Themes by Anonymous Coward · · Score: 0

    I had the same idea as the OP, while looking I found LaTeX and I find it quite perfect for writing pretty much anything, however there is one point which makes it mostly unusable for normal people: themes.

    While writing in LaTeX is easy and powerful, in order to theme (typeset?) a document you have to suffer quite a bit: read docs, learn lots of stuff etc. I believe what the OP wants is to be able to easily write documents (HTML) but also, easy to create a presentation (CSS), think about it: CSS is easy, simple and clean and it could be an awesome companion to something like LaTeX or any other markup language. There are a lot of styles for LaTeX that allow to create a bunch of document kinds, however when you want to customize some part of the presentation (like: add a section with a little image to the right and a yellow border) you are in a world of hurt.

    I have yet to find an easy way to create print documents and have a good control over the presentation. So far the closest thing are word processors, but I hate the broken visual editing (I prefer to stick with good old code syntax).

  38. You're on the right track, for the wrong reason. by mellon · · Score: 2, Informative

    The ability to cite an HTML document is something that would indeed be useful. The ability to hard code page numbers into an HTML document isn't. The reason why academia and the press have been so resistant to HTML, historically, is that you don't get any control over page layout. Which means that you can't refer to things by page number.

    The solution isn't to fix HTML so that you can number pages. It is to fix the bibliographic references to not use page numbers. Generally speaking, it's not hard to number documents by section, and you can make the numbering fine-grained enough for bibliographic references. Then refer to the chapter and section, rather than the page number in your bibliography, and you're done. No need to "fix" HTML.

    It might make sense to ID paragraphs in HTML, so that you could simply refer to the paragraph ID in your bibliography. If this were simply document metadata, and didn't have anything to do with layout, it would work pretty well. As a bonus, you wouldn't need to renumber, because the ID would just be an arbitrary cookie, and wouldn't need to make sense to a human.
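
    One way the paragraph-ID idea could look, sketched in Python. Everything here is an assumption for illustration: the names paragraph_id and add_paragraph_ids are invented, and deriving the ID from a hash of the paragraph text is just one possible way to get an "arbitrary cookie" that survives renumbering. A hash changes if the paragraph itself is edited, and identical paragraphs would collide, so a real scheme might store randomly assigned IDs instead.

```python
import hashlib
import re

# Only bare <p>...</p> elements are touched; paragraphs with attributes are left alone.
P_RE = re.compile(r"<p>(.*?)</p>", re.DOTALL)


def paragraph_id(text: str) -> str:
    """Derive an opaque ID from the paragraph text; it is not meant to be human-readable."""
    normalized = " ".join(text.split())
    return "p-" + hashlib.sha1(normalized.encode("utf-8")).hexdigest()[:8]


def add_paragraph_ids(markup: str) -> str:
    """Rewrite each bare <p> as <p id="...">, keeping the paragraph body unchanged."""
    def tag(match):
        body = match.group(1)
        return '<p id="{}">{}</p>'.format(paragraph_id(body), body)

    return P_RE.sub(tag, markup)


if __name__ == "__main__":
    doc = "<p>First point.</p>\n<p>Second point.</p>"
    print(add_paragraph_ids(doc))
```

    Because the ID depends only on the paragraph's own text, inserting a new paragraph earlier in the document leaves existing IDs, and any citations pointing at them, untouched.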

    Of course, with hypertext, there's really no need for a bibliography anyway. Just link to the text you're referencing... But I realize that that's impractical in academia at the moment. I'm just saying...

  39. Learn the truth about Slashdot. by Anonymous Coward · · Score: 0

    Behold Anti-Slash, the jihad HQ for the holy war against the Slashdot hive-mind. See our extensive documented failures of Slashdot, and make today the last day of being a robot.

  40. You want more tags? You want XML. by jrharshath · · Score: 1

    Since HTML won't add new tags for you, you could write your paper as XML, and use a stylesheet to display it in whatever fashion you want. That way you could have "one column stylesheet", "two column stylesheet" etc formatting the same XML document in your favourite way of presenting it :)
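
    To make the "one source, several presentations" point concrete, here is a small Python sketch. It is not XSLT: the two functions below merely stand in for two stylesheets, and the element names (paper, title, para) and function names are invented for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# A made-up paper in a made-up vocabulary; the tags carry meaning, not layout.
PAPER = """<paper>
  <title>On Paged Media</title>
  <para>HTML describes structure.</para>
  <para>Style sheets describe presentation.</para>
</paper>"""


def render_text(root):
    """Plain-text 'stylesheet': underlined title, blank line between paragraphs."""
    title = root.findtext("title", default="")
    lines = [title, "=" * len(title)]
    for para in root.findall("para"):
        lines += ["", para.text or ""]
    return "\n".join(lines)


def render_html(root):
    """HTML 'stylesheet': the same elements mapped to h1 and p."""
    parts = ["<h1>{}</h1>".format(escape(root.findtext("title", default="")))]
    parts += ["<p>{}</p>".format(escape(para.text or "")) for para in root.findall("para")]
    return "\n".join(parts)


if __name__ == "__main__":
    root = ET.fromstring(PAPER)
    print(render_text(root))
    print(render_html(root))
```

    The source document never changes; only the function (or, in the real thing, the stylesheet) you run it through does.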

    1. Re:You want more tags? You want XML. by theillien · · Score: 1

      Seconded. This is the sort of custom application that XML was created for.

  41. Universality of HTML by Anonymous Coward · · Score: 0

    You can embed CSS in HTML pages; that should do what you want, if you have another way of dividing up the right amount of information per page.

    Although this is slightly more complicated, I'd look to an XML/XSLT/CSS solution instead. It would enable you to take a source document, split it into pages by paragraph or size, and then format those pages, all while keeping the raw data in XML in the case the user wanted to use another reader.
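
    A minimal sketch of the "split it into pages by paragraph or size" step, in Python rather than XSLT. The para element name, the paginate function and the character budget are all assumptions for illustration; real pagination depends on fonts and page geometry, which a character count only crudely approximates.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document; the raw XML stays untouched for other readers.
SOURCE = """<paper>
  <para>Alpha paragraph.</para>
  <para>Beta paragraph, somewhat longer than the first.</para>
  <para>Gamma.</para>
</paper>"""


def paginate(xml_text: str, max_chars: int) -> list[list[str]]:
    """Group <para> texts into pages holding at most max_chars characters each.

    A paragraph longer than max_chars gets a page to itself rather than being split.
    """
    root = ET.fromstring(xml_text)
    pages, current, used = [], [], 0
    for para in root.iter("para"):
        # Flatten any inline markup and collapse whitespace.
        text = " ".join("".join(para.itertext()).split())
        if current and used + len(text) > max_chars:
            pages.append(current)
            current, used = [], 0
        current.append(text)
        used += len(text)
    if current:
        pages.append(current)
    return pages


if __name__ == "__main__":
    for number, page in enumerate(paginate(SOURCE, max_chars=58), start=1):
        print("Page", number, page)
```

    Each page could then be handed to whatever formatting step you like (CSS, XSL-FO, or something else), while the XML remains the single source.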

    1. Re:Universality of HTML by theillien · · Score: 1

      You can embed CSS in HTML pages; that should do what you want, if you have another way of dividing up the right amount of information per page.

      Please don't encourage people to do things the wrong way. Stylesheets are for style and markup is for content. Style should only be applied as attributes using the stylesheet definitions. Encouraging people to embed their style in their markup is a step backward.

  42. He wants us to reverse-engineer SGML from HTML? by kenh · · Score: 1

    SGML pre-dated HTML, in fact, HTML is (in many ways) a subset of SGML.

    I suspect the poster never heard of SGML, or its predecessor, GML.

    Here's a link to a good book on the subject in Google Books: The SGML Book

    There is also DocBook and LaTeX.

    --
    Ken
  43. seriously? by Anonymous Coward · · Score: 0

    Why are obvious trolls being posted as if they were serious questions?

  44. Save as HTML by InsertCleverUsername · · Score: 1

    Some versions of Microsoft Word have a really cool feature called "Save as HTML". Saves simple one-page documents as fantastically redundant HTML in less than a terabyte --and you might even get a cute little paperclip to help you through the process!

    HTH.

    --
    Ask me about my sig!
    1. Re:Save as HTML by youn · · Score: 1

      You mean they got it down to less than one terabyte for short documents? That's awesome news... now all they need is to ask for more plugins while displaying the document... and it'll be awesome. Seriously, what were they thinking... they should have a browser for every Word feature, including one for footers, one for page layout, etc.

      --
      Never anthropomorphize computers, they do not like that :p
  45. Anonymous Cowardon? by camperdave · · Score: 1

    "Anonymous Cowardon"? What the devil are you talking about?

    --
    When our name is on the back of your car, we're behind you all the way!
    1. Re:Anonymous Cowardon? by Memroid · · Score: 1

      On windows machines (current Firefox, Chrome, IE) the current CSS style on slashdot displays no space between "Coward" and "on" in the line reading "by Anonymous Coward on Thursday July 02, @07:43PM"

    2. Re:Anonymous Cowardon? by nausea_malvarma · · Score: 1

      Not just windows machines. I'm running debian and I still see cowardon in firefox. I just assumed it was some newly discovered ancient reptile, like dimetrodon.

    3. Re:Anonymous Cowardon? by Philip_the_physicist · · Score: 1

      Not just windows. Firefox 3b5 does that under Linux (not my fault, idiot admins using RHEL)

    4. Re:Anonymous Cowardon? by vonart · · Score: 1

      Bizarrely, I'm on IE6 (no choice, at work) and it doesn't do that. Go figure -- IE6 is actually useful for something.

      --
      The American Dream has too much grinding and the leveling makes no sense. -GameboyRMH (1153867)
    5. Re:Anonymous Cowardon? by smoker2 · · Score: 1

      My Firefox (on linux) reads it fine. Except of course that it should be Anonymous Coward at.

    6. Re:Anonymous Cowardon? by MacAnkka · · Score: 1

      Opera 10 on Mac does it, too.

    7. Re:Anonymous Cowardon? by Anonymous Coward · · Score: 0

      running ff3.5 final on win2k, everything looks ok

    8. Re:Anonymous Cowardon? by camperdave · · Score: 1

      My Firefox (on linux) reads it fine. Except of course that it should be Anonymous Coward at.

      Mine displays it with a space, on both Windows and Linux. I wonder if it is a font/text size side effect.

      As far as "on" vs "at", the usage is correct. Events occur "on" days, "at" times. E.g.: by smoker2 ... on Friday July 03, @03:41AM. The problem is that they are using @ instead of the word "at", so it is easily missed.

      --
      When our name is on the back of your car, we're behind you all the way!
    9. Re:Anonymous Cowardon? by Anonymous Coward · · Score: 0

      safari 4 on mac also displays 'anonymous cowardon' with the missing space.

    10. Re:Anonymous Cowardon? by VisceralLogic · · Score: 1

      Safari 4 on Mac does it, too.

      --
      Stop! Dremel time!
    11. Re:Anonymous Cowardon? by cybernanga · · Score: 1

      Safari 4 on OS X 10.5.7 (Leopard) displays Anonymous Cowardons too.

      --
      www.Buy-Proxy.com - A "buyer-driven" global marketplace.
    12. Re:Anonymous Cowardon? by Anubis+IV · · Score: 1

      Depends how you have your time settings displayed. Mine look like this, for instance:
      by Anonymous Coward on 09:43 PM -- Thursday July 02 2009

      To say the least, that usage is definitively incorrect.

    13. Re:Anonymous Cowardon? by Anonymous Coward · · Score: 0

      So does Safari 4 on Mac...

  46. ReST too by greg1104 · · Score: 1

    Obviously the only sensible robust solutions to this problem are either LaTeX or Docbook. The main problem with both of those is they're kind of painful to author. What I've switched to for any quick documents I write is reST. It's easy to learn for quick documents, you can edit with just about anything (its rudimentary table support is best handled with emacs), includes features like footnotes, and is easy to render into HTML and PDF. After a few months of writing docs for some projects I work on in reST, I've found myself even writing all my random notes in that form, so that I can generate nicely printed versions of them at any time.

  47. Hands down... by juanergie · · Score: 1

    ... you want \LaTeX

    --
    Aeroespacio.org
  48. Re:Learn the tools first, then worry about changin by Gx2 · · Score: 1

    As the previous poster said: in the context of HTML + CSS have a look at 'Prince XML' (http://www.princexml.com/overview//). From the website ('why type if you can copy & paste' ;-) ): "Prince is a computer program that converts XML and HTML into PDF documents. Prince can read many XML formats, including XHTML and SVG. Prince formats documents according to style sheets written in CSS. Prince is available for several platforms and is easy to download and install. We offer a free Personal license for interactive use on a single computer." I have used it successfully for some personal projects and if you're already somewhat familiar with HTML & CSS it's real easy to get into. Don't forget to check out the examples on the site.

  49. Let CSS work for you! by Fireflymantis · · Score: 2, Insightful

    <html>
      <head>
        <title>Abstract of a usable design</title>
        <style type="text/css">
          @media print {
             body { margin: 2.5cm; }
          }
          @media screen {
             body { margin:  50px; width: 50%; }
          }
          body { font-family: sans-serif; font-size: 12pt; }
        </style>
      </head>
      <body>
        <h1>It's so crazy it just might work</h1>
        <h2>and other html inspired musings</h2>
        <p>Why not just use css?</p>
        <p>Also, don't worry about page numbering. that's the browser's job.</p>
      </body>
    </html>

    1. Re:Let CSS work for you! by Anonymous Coward · · Score: 0

      Page numbering is the browser's job, but how can you cross-reference bits of the document in the printed version? I.e. you might want to refer to a section that gets printed on Page x, (with the browser determining x at print time). You could work around that by referring to section 3.8.37 or so, but that might not be desirable.

    2. Re:Let CSS work for you! by Anonymous Coward · · Score: 0

      It's HTML. You use a named anchor (#) link.

    3. Re:Let CSS work for you! by tkiehne · · Score: 1

      It's HTML. You use a named anchor (#) link.

      ...Which works fine as long as you are in HTML... but what about citations? There has to be some sort of portable way to reference specific portions of the document no matter where it is printed or accessed.

      An anchor link isn't typically visible (though you could make it so - with a section number for instance ;-) and as such is not portable to other media.
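
      One possible way to make an anchor reference portable to paper is to let the formatter fill in the page number at print time. This is only a sketch: target-counter() comes from the W3C's draft work on generated content for paged media, it is supported by dedicated formatters such as Prince rather than by ordinary browsers (which can be expected to ignore the rule), and the a.pageref class is a hypothetical name for internal links such as <a class="pageref" href="#methods">.

      ```css
      @media print {
        /* After each internal cross-reference, print the page on which
           its target ended up, e.g. "see Methods (page 12)". */
        a.pageref:after {
          content: " (page " target-counter(attr(href), page) ")";
        }
      }
      ```

      On screen the link stays an ordinary anchor; only the print rendering gains the page number.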

      --
      -- t_kiehne
  50. A measured reply by Yobgod+Ababua · · Score: 1

    To echo some of what's already been said, if you really want a format that will look the same for all clients, HTML is not the answer. The problem is that HTML gives too much formatting control to the VIEWER, allowing one to change the font size, change the screen (or paper!) size (think everyone prints on A4, or on US Letter? Think again!!), or even the entire font. If you really want your report to look the same, export to PDF or use a real typesetting language.

    That said, if you really want to use HTML, look more closely at the "orphan control" CSS options. Used properly on your p or div elements, they can help ensure that your paragraphs or sections line up nicely on separate pages, no matter what sizes those pages end up being, or what font they end up being rendered in. If what you really want is to keep your writing from becoming visually fragmented, this may very well do the trick for you.
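
    A minimal sketch of the orphan-control idea, assuming a print stylesheet and a hypothetical div.section wrapper around each section (the selectors and the value 3 are illustrative). The properties themselves are defined in CSS 2.1, though how faithfully a given browser honours them when printing varies:

    ```css
    @media print {
      /* Keep at least 3 lines of a paragraph together at the bottom
         (orphans) and top (widows) of a page, whatever the page size
         or font turns out to be. */
      p { orphans: 3; widows: 3; }

      /* Don't strand a heading at the bottom of a page. */
      h1, h2, h3 { page-break-after: avoid; }

      /* Hypothetical class: try to keep a short section on one page. */
      div.section { page-break-inside: avoid; }
    }
    ```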

  51. I am curious to know... by pdboddy · · Score: 2, Insightful

    Why the OP specified not using CSS and then suggested an HTML element that looks almost exactly like CSS?

    CSS has a method of creating pages, for printing and more. It's no more difficult to learn than HTML is. You could use XML, create all the custom tags you want, and use XSL (oh look stylesheets again) to style the XML however you want.

    HTML5 is coming out in the near or distant future, if you have suggestions for tags and functions, you might want to try to get involved with the W3C.

    --
    Julie Moult is an idiot.
    1. Re:I am curious to know... by Anonymous Coward · · Score: 0

      Why the OP specified not using CSS and then suggested an HTML element that looks almost exactly like CSS?

      Basic troll tactics. Ask a question which artificially restricts and limits answers so that there's no real solution - thus creating "a problem", which in reality doesn't exist. If the question is sincere, it should be phrased differently, so that any good solution to the actual problem is accepted. Now (s)he's just using wrong tools for the task.

      "I want to bake a cake, but I don't want to use flour or oven, how do I bake a tasty cake from just used motor oil using nothing but finger nail clippers?"

    2. Re:I am curious to know... by tygerstripes · · Score: 1

      So, so, so many questions of this nature suffer from the same basic flaw. "How do I do X using tool Y?"

      The question betrays a basic misunderstanding of the problem, or of the context, or just a bloody-minded determination to stick to an inappropriate ideal.

      Most of the time, the answer is simply that You're asking the wrong question.

      --
      Meta will eat itself
  52. Bad idea by dna_(c)(tm)(r) · · Score: 1

    I agree with you that the OP doesn't seem to have thought this through.

    • It took more than a decade to remove presentation from content in HTML, and now moving that back in?
    • If the platform independent formatting is so important, why not use PDF? That is also very well supported on different platforms. And it is reasonably easy to generate with XML + XSL-FO.
    • Why 1 HTML page to x Paper pages? Why not x HTML pages?
    • Why not use XML + CSS? most browsers support this.
    • Why not use SVG? most browsers support this too.

    I think the ODF way is still the best for archiving, keeping content and markup data separated (separate XML) and together (single ZIP). And it is an open standard with freely available software - not just OpenOffice.org.

    There is a Single OpenDocument XML File variant for ODF, where everything is a single XML file (root tag office:document) if it really has to be.

  53. Structured text by Anonymous Coward · · Score: 0

    I'm not sure if this is precisely what you are looking for, but you might want to check out the various "structured text" systems. For example, "reStructuredText":

    http://docutils.sourceforge.net/rst.html

  54. How portable is HTML? by colinrichardday · · Score: 2, Insightful

    These different browsers render HTML differently.

    1. Re:How portable is HTML? by msuarezalvarez · · Score: 1

      But that is by design: there is no expectation that HTML will be rendered the same on different browsers, as it is stated in the spec itself.

      (La)TeX has a rather different set of requirements...

    2. Re:How portable is HTML? by colinrichardday · · Score: 1

      But it's not merely differences in rendering; there is no guarantee that a browser will meet the specification at all.

    3. Re:How portable is HTML? by msuarezalvarez · · Score: 1

      No standard offers any guarantees that implementations will meet the spec.

      HTML (and its ancillary technologies, like CSS, various scripting languages, etc) are designed to degrade gracefully. Most importantly, it was never the intention of the HTML spec to have HTML documents be rendered the same in different browsers, and, in fact, it is designed very, very carefully to allow documents to be rendered differently.

    4. Re:How portable is HTML? by colinrichardday · · Score: 1

      I have a better chance that LaTeX source code will compile properly in a Windows version of LaTeX than I have of HTML rendering properly across browsers, if the reports of people massaging HTML to work in Internet Explorer are correct.

      HTML is easier for computers to parse than LaTeX is.

    5. Re:How portable is HTML? by msuarezalvarez · · Score: 1

      I have a better chance that LaTeX source code will compile properly in a Windows version of LaTeX than I have of HTML rendering properly across browsers, if the reports of people massaging HTML to work in Internet Explorer are correct.

      HTML is easier for computers to parse than LaTeX is.

      I have no idea what your point is.

      Every latex file should result in the same output in all architectures: the whole system is designed with extraordinary care to guarantee that (assuming absence of bugs and proper font metrics). On the other hand, your expectation that HTML be rendered the same in different browsers is only born out of your ignorance of what HTML wants to be: no one ever designed it in order to render in the same way everywhere, and in fact, it is designed with great care in order to allow it to be rendered differently.

      On the other hand, HTML is easier to parse by design (and XHTML more so). In order to parse LaTeX you have to implement TeX itself, for it is not possible to "parse" LaTeX without actually executing it: "easy to parse" and LaTeX are simply not two things that can be done at the same time.

    6. Re:How portable is HTML? by colinrichardday · · Score: 1

      your expectation that HTML be rendered the same in different browsers is only born out of your ignorance of what HTML wants to be: no one ever designed it in order to render in the same way everywhere, and in fact, it is designed with great care in order to allow it to be rendered differently.

      I said rendered properly, not identically. I don't care if MathML in Internet Explorer looks the same as MathML in Firefox, as long as it looks like math. But can I be sure that my users can download/install the MathExplorer plugin for IE?

    7. Re:How portable is HTML? by rhkramer · · Score: 1

      Re: "it is designed very, very carefully to allow documents to be rendered differently"

      Can you shed some light on why they would have done that (short of dealing with browsers with lesser capabilities, the biggest example being text only browsers)?

    8. Re:How portable is HTML? by msuarezalvarez · · Score: 1

      The idea is to be able to render to different media and under different circumstances (a braille reader, a text-to-speech browser, printed pages, small-screens, large screens, etc) and letting the client control rendering (just as you can choose the text font or size on any sensible browser, you can modify pretty much anything---see, for example, the user.css file in firefox)

      That is why the whole thing is set up so as to be able to separate presentation from content, and why HTML deprecated the FONT tag, for example.

    9. Re:How portable is HTML? by rhkramer · · Score: 1

      Thanks!

  55. That wouldn't be HTML by Hai-Etlik · · Score: 1

    That's entirely counter to the philosophy of HTML. It's meant to be independent of presentation so it can be presented in different ways. There's good reason for CSS being separate and for this sort of thing being in there. If you want a way to link to the point where a page break occurs in a particular print copy, I'd suggest adding <a> elements at the locations of the page breaks, like this: <a id="page1" />

  56. restructuredtext... by tyroneking · · Score: 2, Informative

    is made for what you're talking about

    Documents are written in human-readable text format - good for storing in version control and using for diffs
    Python Docutils is used to convert to HTML and/or LaTeX and a few other formats
    rst2pdf is a tool that converts to beautiful PDF (easier than using Docutils + LaTeX)

  57. embedding images/graphs by Anonymous Coward · · Score: 0

    one problem with html is that images are kept as separate files. you would also need to have them embedded in the file i guess.

  58. Opera CTO did it by edelholz · · Score: 1

    The Opera CTO Håkon Wium Lie had a talk at our university and I think just to prove a point, he wrote one of his books in HTML. Go look into that.

  59. Pages = anchors by Anonymous Coward · · Score: 0

    HTML doesn't work well with pages, rather instead use generous anchors with a consistent naming scheme. For example, using headers to separate sections, you could label the anchors by section, subsection, paragraph numbers.
     
    But as others have said, use LaTeX and/or Lyx.

  60. Why force the page breaks? by rew · · Score: 1

    Why do you want to force the page breaks? It's stupid. HTML is intended to render correctly independent of the resolution, so independent of number-of-characters-that-fit-onto-a-page. Suppose someone gets your academic paper, but he is a bit blind. So he sets character size to 15pt, and prints it (for reading during a hypothetical train ride).

    Someone else is concerned with the environment, has good eyes, and prints double sided with a 9 pt font.

    Generating documents that handle this well, means you have to take care that you refer to "fig 3" and not to "the figure on page 2", things like that.

    1. Re:Why force the page breaks? by Angeliqe · · Score: 0

      But that is exactly why you would want to force a page break. Someone may print your document with large fonts and someone else may print with small fonts and different page margins, etc. If it's an academic paper, you would want a page break after certain sections: Title Page, Table of Contents, sometimes Chapter breaks, just to be professional. This has no effect on the way it is seen in the browser, just how it is printed.
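
      As a rough sketch, forced breaks of this kind can be expressed with the CSS 2.1 page-break properties. The class names title-page, toc and chapter are hypothetical; they assume the document wraps those parts in elements carrying such classes:

      ```css
      @media print {
        /* Start a fresh page after the title page and the table of contents. */
        .title-page, .toc { page-break-after: always; }

        /* Start every chapter at the top of a new page. */
        .chapter { page-break-before: always; }
      }
      ```

      Because the rules sit inside @media print, the on-screen rendering is untouched, and the breaks fall after the same sections whatever font size or paper the reader picks.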

  61. Consider using reStructuredText by Anonymous Coward · · Score: 0

    You can write your document in plain text then transform it into HTML or PDF etc using docutils.

    http://docutils.sourceforge.net/docs/user/rst/quickstart.html

    The entire Python documentation is written in it.

  62. Try this: by Anonymous Coward · · Score: 0

    span.by:before {
            content: ' ';
    }

  63. Seconded; need something between HTML and Word by davide+marney · · Score: 1

    It is ironic that HTML, originally developed precisely to make it easy to mark up academic and technical information for publication, has never moved beyond the extremely bare-bones specification of heading, list, term, and paragraph tags. I would have expected some elaboration over time, but HTML seems frozen in time.

    What happened, I think, is that people basically ignored HTML and went straight for word processing, a far more complex beast from a specifications point of view. For the past 20 years, we have been letting HTML languish while we attempt to come up with a document specification. ODF is just the most recent cycle in this effort.

    It is unfair of the parent to pin the blame for this on Microsoft, however. Word processing was one of the "killer apps" at the time of the birth of the web, and Word was just a niche player at the time. No, we went straight for the jugular of word processing because we all wanted to print to paper.

    The OP is absolutely correct to think about revisiting HTML as a specification. What I hate about all reader-dependent formats (DOC, ODF, PDF, ...) is they force the user to completely leave the context of the web page just to view some data. The browser is the only "reader" I should need. If you can't at least embed on the page, fuggetaboutit. The gold standard is compound content with full document flow. Why oh why can't we come up with a simple way to blend content without drawing frames and putting scrollbars within scrollbars!?

    Personally, I'd love to see some formality and general adoption of richer semantic markups such as the microformats hCard, hCalendar, etc. I'd also love to see some richer hierarchical markup; simple lists only take you so far! I'm imagining something with the hierarchy of XML but without the complexity of full extensibility, and all the definitional parts of a specification needed to support that (schemas).

    The Atom Publishing Protocol is the perfect example of what I'm talking about: extensible, but easy to use because it comes with a well-chosen set of standard elements and attributes.

    --
    "We receive as friendly that which agrees with, we resist with dislike that which opposes us" - Faraday
  64. LaTeX? XML! by meuhlavache · · Score: 1

    XML allows you to have very, very (very) long-term support. The only problem: you have to create the reader... but the reader can be extended without limitation, and humans of the future will have an easy way to use this document!

  65. Do you know latex? by Anonymous Coward · · Score: 0

    Use latex!

    It's much more productive to write a document.

  66. Good ol' XML by tygerstripes · · Score: 1

    XML is like violence: if it doesn't solve the problem, you're not using enough.

    --
    Meta will eat itself
  67. It's the browsers/editors, not the file format! by frisket · · Score: 1

    HTML, especially XHTML, can already do what the OP describes, but browsers don't support all the bells and whistles needed for paper-like paged rendering. CSS goes some way towards meeting the deficiencies, but the end user still retains sufficient control to (perhaps unwittingly) defeat almost any attempt to force pagination and placement. It is tedious, but by no means impossible, to write documents of considerable complexity in HTML, as I pointed out long ago, but page support requires browser cooperation.

    The only reliable answer at the moment is to provide multiple formats generated from a single source. An XML master (DocBook, TEI, whatever) can be used with XSLT to generate LaTeX source code for making a PDF, and the pagination data can be re-used in a subsequent XSLT script to generate paged HTML. The problem is the XML and LaTeX editors, which are unsuited for writing unless you learn about XML or LaTeX markup, and even the relatively smart ones don't implement a lot of the features needed for complex structured writing (<plug>come to Balisage to find out why</plug>).

    LyX and similar editors (Scientific Word, Textures) provide synchronous typographic interfaces to LaTeX, and TeX4ht provides excellent conversion to web pages and other formats. Even Word and OpenOffice, when used with named styles (with utter rigour) can be converted reliably to HTML, LaTeX and other outputs.

    The last thing on earth we need is to increase the size of the HTML tagset: HTML5 is already suffering from bloat.

  68. You are reinventing DocBook by Anonymous Coward · · Score: 0

    You are trying to reinvent docbook. Not only is everything you want done, it is implemented in several tools (XMLMind and oXygen are two I know of), has a standard method of converting it to any form you want (XSL, XSLT, XSL-FO), and there are tools that are already written to take advantage of those standards (Apache FOP being a FLOSS one). The latest version of DocBook uses XML namespaces, so you can mix in other markup languages as well; the canonical example is DocBook + MathML + SVG, which covers 99.9% of the math/science based literature out there. BTW, if you DO plan on going down this path, I suggest picking up a copy of XSLT, 2nd edition by Doug Tidwell. The latest version of the DocBook book is supposed to be out in August; don't buy the version currently on sale, it is 10 years old, and does NOT cover the current version of DocBook.

  69. Bring me the OP so I can shoot him... by Shaiku · · Score: 1

    Hacked up HTML as our universal word processing format? That has to be the most naive -- no, the dumbest fucking idea I have ever heard in my entire life. For a full explanation of why, take 50 cents and go buy an education. It should be obvious to anybody that HTML was a complete failure in interoperability and it is one of the clumsiest protocols to try and use when it comes to content presentation. As others have already pointed out repeatedly, there already exist better mousetraps for this problem anyway.

  70. Right tool for the right job, dammit by SuiteSisterMary · · Score: 1

    On a very practical note, you'd literally be laughed out of any print shop where you showed up with your earnest little smile and a USB key with an HTML file, and an expectation of getting an accurate print-out.

    --
    Vintage computer games and RPG books available. Email me if you're interested.
  71. RTF != SGML descended by Anonymous Coward · · Score: 0

    if http://en.wikipedia.org/wiki/Rich_Text_Format is to be believed, RTF owes more to TEX than to SGML, and it doesn't look like SGML to me at all.

    RTF is a pure Microsoft "standard", and its versions reflect the respective capabilities of the current version of MS Word.

    That notwithstanding, RTF is implemented relatively well in most word processors. If you restrict yourself to relatively simple formatting, there shouldn't be a lot of problems.

  72. Re:You're on the right track, for the wrong reason by TerranFury · · Score: 1

    The reason why academia and the press have been so resistant to HTML, historically, is that you don't get any control over page layout. Which means that you can't refer to things by page number.

    LaTeX is the same, more-or-less. You can refer to page numbers in the PDF though, which is the "final product" you actually exchange. Also, academic works tend to be very structured into sections and subsections (something LaTeX does nicely for you) which in some ways eliminates the need to refer to page numbers.

  73. A few days earlier... by Anonymous Coward · · Score: 0

    At university in the 1990s, I found that most of my professors and TAs preferred typewritten manuscript or a good approximation. So I edited non-paginated, single-spaced ASCII text in Emacs, using the Emacs text justification mode to reflow paragraphs with line breaks. I used enscript to print to school printers while inserting header/footer data and page numbering. I used the enscript options to also spread the text out into a double-spaced print format with a fixed-pitch courier font. The teaching staff liked it, and I didn't waste time on silly presentation issues.

    For cross-references, I found most professors accepted any reasonably standard reference variant, so I chose a liberal arts format with end notes and "(AuthXY)" references to author name (abbreviated) and year of publication. I opened the same text file in Emacs with two Emacs windows so I could keep one editing cursor where I was in the main text, and easily insert new references and see both the entry and the reference key in the main text at the same time. Since they were not numbered sequentially, there was no issue of relabeling them every time I inserted more references. This left me with a nice editable "source file" that looked good on screen, and totally functional print formatting. Once in a while, I inserted a literal ^L character to force pagination boundaries.

    I started using latex when I needed to work with lots of quasi-mathematical notation in both Computer Science and Philosophy, including many subscripts and superscripts on deeply nested bracket forms. I later learned to insert vector graphics, and applied my computer knowledge to have a nice Makefile to rebuild my full latex document from source files. You haven't lived until you do this under a version-control system and then start collaborating with multiple authors checking out and incrementally modifying the source files.

    I found it liberating to work with textual source formats and ignore formatting for the most part. And this is after growing up with WordStar on CP/M and then learning proper use of paragraph styles with AmiPro (?) on Windows 3.0/3.1. I am aware that my education and career have changed my whole way of thinking about information workflows. Unfortunately, I have yet to see how to bring the fruits of my labors to the masses, as it seems you have to undergo this wholesale cognitive conversion before you can start to appreciate better ways of managing your information. It is too easy to take an "easy" way out and then find yourself trapped with poor quality tools and methods.

  74. Page already exists in CSS/Media specific CSS by Anonymous Coward · · Score: 0

    What you want to do already exists. The W3C released media specific stylesheets, which allow you to create an HTML page with CSS optimized for the specific media you're using. It's most frequently used to create "printer friendly" versions of webpages without having to maintain two separate files. There's even an author who used HTML/CSS to create a book.

    Practical information about using media specific stylesheets can be found at these articles:
    Printing a Book with CSS: Boom! by Håkon Wium Lie, Bert Bos
    ALA's New Print Styles by Eric Meyer

    W3C information:
    CSS Print Profile
    CSS3 Module: Paged Media
    XHTML-Print

  75. do not reinvent the wheel... by kbdd · · Score: 1

    The purpose of HTML is to display adequately (optimally?) across different display sizes (and resolutions). If you want the opposite (fixed size), there are other formats better suited like Postscript, PDF and Latex, among others. Do not reinvent the wheel.

  76. Both XHTML and LATEX work by sanguisdex · · Score: 1

    The above post talking about LaTeX is right, but if you don't want to learn a whole new standard, you could read this: http://www.alistapart.com/articles/boom. It's all about printer style sheets. One cool thing about HTML that LaTeX does not have is the automatic media type extensions: you can redefine the look of your content with a media-specific style sheet. And while I can do all that with LaTeX, I have to run it through a processor first. That multi-media merit aside, if you want to get into really complex typesetting issues, LaTeX is the way to go. If you just want to use it as a word processor, I would suggest using some sort of WYSIWYG for it and having a PDF printer.

  77. Re:You're on the right track, for the wrong reason by greyhueofdoubt · · Score: 1

    Our technical order system uses chapter/section/paragraph numbers in addition to page numbers; the page numbers are completely useless as they can change with new updates but the content always stays under the same paragraph.

    So you have chapter 7, section 2, paragraph 5: 7.2.5, How to do this task, and then 7.2.6, How to inspect this task, etc. If an update comes up for the task, the page is replaced and the new paragraph is added inline: 7.2.5.1, Something we forgot about this task.

    With electronic documents becoming more and more widespread, I would assume that this numbering would become more popular. It's easy to use, fine-grained (there are multiple paragraphs per page=more precise notations), and you can expand on it in place. No more, "Ok class turn to page 119, except for those of you who have the textbook with the lion on the cover, you'll need to read the first paragraph on page 117 and the middle paragraph on page 120."
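
    For what it's worth, this style of numbering can be generated rather than typed by hand. The following is only a sketch using CSS 2.1 counters; it assumes a flat HTML document in which h1 marks a chapter, h2 a section, and every p is a numbered paragraph (the counter names are made up for the example). Inserting a sub-paragraph like 7.2.5.1 without renumbering what follows would still need manual labels.

    ```css
    body { counter-reset: chapter; }

    /* A new chapter restarts section and paragraph numbering. */
    h1 { counter-increment: chapter; counter-reset: section para; }

    /* A new section restarts paragraph numbering. */
    h2 { counter-increment: section; counter-reset: para; }
    h2:before { content: counter(chapter) "." counter(section) " "; }

    /* Each paragraph is labelled chapter.section.paragraph, e.g. "7.2.5 ". */
    p { counter-increment: para; }
    p:before {
      content: counter(chapter) "." counter(section) "." counter(para) " ";
    }
    ```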

    My 2 cents.

    -b

    --
    No offense, but I've stopped responding to AC's.
  78. DocBook and XSLT by Anonymous Coward · · Score: 0

    I have a sneaking suspicion that when the OP is saying things like "no CSS" and doesn't mention LaTeX, s/he is actually giving specifications in a very obfuscated way -- specifications that need to be deduced. What I take from the post is that the OP wants

    Write everything in DocBook XML, and add an XSLT to have it automatically transformed into valid (X)HTML.

    There are programs to convert it into things like PDF and RTF, but you can keep things in straight DocBook for the canonical version.

  79. Meaning Vs Presentation by Ractive · · Score: 0

    The trend (and not an unjustified one) is to separate structure from presentation in HTML so THE WEB can be browsed and searched in a more meaningful way, so CSS is used for display and tags are used to mark up structure.
    Since what you are suggesting is a mix of structure and display needs, it doesn't make a lot of sense to introduce it all in HTML; it would be like going back to the 90's. Nevertheless, it can already be achieved by separating the components you suggest into structural and display ones, and surely you will find solutions for what you intend using a combination of existing HTML - CSS techniques.
    Also, HTML was not created for, and is not primarily oriented toward, printed documents; its native purpose is on-screen display, so it's not surprising that these features are not naturally supported in the core of the language.

  80. wrong tool for the job by j1mmy · · Score: 1

    while you can mark up your HTML with CSS for print media, why bother? when i send documents around i almost always send PDF's since they'll look the same in just about every reader. if it's something somebody else needs to edit, then i usually go with an MS Word document, which is a very portable format these days.

  81. CSS print media by NekoXP · · Score: 1

    CSS3 does all you want: http://www.w3.org/TR/css3-page/
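
    As a rough sketch of what that spec offers, the OP's made-up page attributes map onto an @page rule roughly like this. The margin order (top, right, bottom, left) is only a guess at what the OP's "borders" list means, and margin boxes such as @bottom-left are not implemented by every renderer, so treat this as illustrative rather than something that prints identically everywhere:

    ```css
    /* Page geometry for print output, mirroring the OP's example attributes. */
    @page {
      size: A4;                      /* 210mm x 297mm */
      margin: 2.5cm 2.5cm 2cm 2cm;   /* assumed order: top right bottom left */

      /* Page number in the bottom-left margin box. */
      @bottom-left {
        content: counter(page);
      }
    }
    ```

    Numbering normally starts at 1; the OP's "startfrom0" option is not covered by this sketch.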

  82. use jsMath, Re:texexplorer by Petronius+Arbiter · · Score: 1

    In pmwiki, you can include LaTeX math with this:

    http://www.pmwiki.org/wiki/Cookbook/JsMath

    I've used it for some time and highly recommend it

  83. No CSS is crazy talk by Rambo+Tribble · · Score: 1

    HTML is about semantic content, not presentation. It is joined at the hip with CSS for presentation through the browser. Print is a form of presentation.

  84. Been done DocBook or DITA by _32nHz · · Score: 1

    If you want your structure and presentation intertwined then use ODF.

    If you want them separated:

    For structure use the book inspired DocBook, or the journal inspired (and generally more flexible) DITA.

    To format either of these for presentation (either on screen or in print) you can either use an adaptive layout with HTML+CSS or a predetermined layout with XSL-FO.

    Can't think of any way of avoiding CSS as all three solutions use it.

  85. Re:You're on the right track, for the wrong reason by Anonymous Coward · · Score: 0

    > The ability to hard code page numbers into an HTML document isn't [useful]

    BS. As always, a slashdotter recommends changing the way the world works to fit the available tools, instead of wondering how the tools could be fixed to suit the world, like the original poster.

    Getting an HTML rendering engine to yield a page number, given an arbitrary reference inside a document is a SMOP. And waaaaay easier than convincing every author in the world to "number documents by section, and you can make the numbering fine-grained enough for bibliographic references".

    By the way, I agree with others, LaTeX is some really wonderful technology. '80's technology.

  86. A4 by manaway · · Score: 1

    Parent's paper size is for US letter (8.5 x 11 inches). For A4 (210 x 297 mm) with 25mm margins use:

    \geometry{papersize={210mm,297mm},total={185mm,272mm}}
    \pdfpagewidth 210mm
    \pdfpageheight 297mm

  87. Re:A4 redux by manaway · · Score: 1

    Crap, should have doublechecked my own work! Margins in the above are wrong (unless you like 12.5mm margins).

    Parent's paper size is for US letter (8.5 x 11 inches). For A4 (210 x 297 mm) with 25mm margins use:

    \geometry{papersize={210mm,297mm},total={160mm,247mm}}
    \pdfpagewidth 210mm
    \pdfpageheight 297mm

  88. "Printing" by Anonymous Coward · · Score: 0

    LOL

  89. Re:You're on the right track, for the wrong reason by mellon · · Score: 1

    Er, no. HTML is meant to render at arbitrary sizes. The term "page" doesn't mean anything in that context. You don't need to convince "every author in the world" of anything - just convince the ones who are using HTML as their publication medium. Or use my other suggestion - paragraph identification metadata.

  90. Yes. by Anonymous Coward · · Score: 0

    Yes, this is such a crazy idea.

  91. XML? by VirtualJWN · · Score: 1

    Isn't that what XML is for? Document is the data. I'm just saying....

    --
    "Any sufficiently advanced technology is indistinguishable from magic." - Arthur C. Clarke
  92. But will contiki web server run it. by Anonymous Coward · · Score: 0

    Will XHTML run on the old Commodore 64 web server running Contiki (http://www.c64web.com/)?
    I think Contiki only does pure HTML?

  93. This would be called the media rule by Greg_D · · Score: 1

    Ala:

    <link rel="stylesheet" href="http://www.michaeljacksonsmissingnose.com/screen.css" type="text/css" media="screen">
    <link rel="stylesheet" href="http://www.michaeljacksonsmissingnose.com/print.css" type="text/css" media="print">

    Then in your print stylesheet, for print specific crap, you just do:

    @media print {

            a {
                    text-decoration:none;
                    color: black;
            }

    }

    Or whatever it is you wanna do. I mean, it's up to you as to how you want to define the size of your page, and they COULD add a feature to define the size of the page, but that's better handled in CSS than HTML. The entire point of CSS is to change the way the document looks or is formatted without having to create a separate document for each way you want the document to be viewed. Adding HTML tags is the exact WRONG way to go about this.
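
    For the page breaks the OP is worried about, a print stylesheet can also carry pagination hints. This is a minimal sketch using CSS 2.1 properties; the selectors are placeholders for whatever the document actually uses, and how faithfully a given browser honours them varies:

    ```css
    /* Print-only pagination hints; selectors are illustrative. */
    @media print {
      /* Start each top-level heading on a new page. */
      h1 {
        page-break-before: always;
      }

      /* Ask the renderer not to split tables or preformatted blocks across pages. */
      table, pre {
        page-break-inside: avoid;
      }

      /* Keep at least two lines of a paragraph at the bottom and top of a page. */
      p {
        orphans: 2;
        widows: 2;
      }
    }
    ```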

  94. The problem is the concept of page. by ResidentSourcerer · · Score: 1

    With paper publications, the publisher decided on a page size, and Vol 1, Issue 27, page 341 was the same for everyone.

    The web has to be readable on everything from an iPhone to a 2000 x 3000 pixel display.

    Page is not relevant to web browsers.

    As far as I can tell, the OP's big problem is bibliographic citation: how do you cite a particular point in the text, ideally in a way that works both automatically on a computer and for people reading the paper copy?

    Number the paragraphs.

    Just as the various flavours of TeX have prescribed macro packages, an academic journal could have a prescribed CSS style sheet.

    A citation then is in the form of author, article, paragraph number instead of page number.

    There are details to hammer out: Are tables given their own numbering, or are they considered a paragraph? (This can be a real problem with floats.) What about illustrations/figures, section heads and subheads? Stuff that has z-levels?

    --
    Third Career: Tree Farmer Second Career: Computer Geek First Career: Teacher, Outdoor Instructor, Photographer.
  95. Quality PDF output from HTML by rolandw · · Score: 1

    We searched for ages for a tool to produce high quality print output from HTML for exactly the same reasons before stumbling on Prince (http://www.princexml.com) and haven't regretted adopting it. We use it from wiki pages, for technical and sales documents, for theses. It is CSS3 aware but the underlying documents still work in most browsers.
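
    As an illustration of the kind of paged-media CSS such a formatter can consume, here is a small sketch that prints the page number of a cross-reference's target. The target-counter() function comes from the CSS generated-content drafts for paged media; the "pageref" class is a made-up name, and support outside print-oriented formatters is limited, so check your tool's documentation before relying on it:

    ```css
    /* Append the printed page number of the link target to internal cross-references. */
    /* "pageref" is a hypothetical class for links like <a class="pageref" href="#sec2">. */
    a.pageref::after {
      content: " (p. " target-counter(attr(href), page) ")";
    }
    ```

    On screen, where there are no pages, a rule like this is best kept inside a print-only stylesheet.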

  96. Use paragraph numbers by aminorex · · Score: 1

    Apply grey superscript paragraph numbers, and use those to refer to the text, instead of page numbers. This resolves the problem of varying output devices which is the absolute show-stopper for incorporating pagination into html.
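
    A minimal sketch of that with CSS counters, assuming the body text sits in a container with a hypothetical class "article" (the class and counter names are mine, not any standard):

    ```css
    /* Restart paragraph numbering for each article container. */
    .article {
      counter-reset: para;
    }

    /* Every paragraph inside the container advances the counter by one. */
    .article p {
      counter-increment: para;
    }

    /* Show the number as a small grey superscript before the paragraph text. */
    .article p::before {
      content: counter(para);
      color: #888;
      font-size: 0.7em;
      vertical-align: super;
      margin-right: 0.4em;
    }
    ```

    Because the numbers come from document order, inserting a paragraph renumbers everything after it, so citations are only stable for a fixed revision of the text.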

    --
    -I like my women like I like my tea: green-
  97. obQuote by alexo · · Score: 1

    I must know!

    Get used to disappointments.