HTML Tags For Academic Printing?
meketrefi writes "It's been quite a while since I got interested in the idea of using html (instead of .doc. or .odf) as a standard for saving documents — including the more official ones like academic papers. The problem is using HTML to create pages with a stable size that would deal with bibliographical references, page breaks, different printers, etc. Does anyone think it is possible to develop a decent tag like 'div,' but called 'page,' specially for this? Something that would make no use of CSS? Maybe something with attributes as follows: {page size="A4" borders="2.5cm,2.5cm,2cm,2cm" page_numbering="bottomleft,startfrom0"} — You get the idea... { /page} I guess you would not be able to tell when the page would be full, so the browser would have to be in charge of breaking the content into multiple pages when needed. Bibliographical references would probably need a special tag as well, positioned inside the tag ..." Is this such a crazy idea? What would you advise?
You seem to be talking about LaTeX. It already exists. Don't reinvent it.
Congratulations, you're the 5,134,978th person to suggest a change to HTML which will prevent it from being reflowable!
Please step up to the spiked door in front of the acid pit to claim your prize.
As much as I hate Adobe, there's a reason why PDF files dominate academia.
This is exactly what CSS is designed for, presentation. The CSS3 Paged Media module already defines a number of the properties and settings you're going for. It even includes positions such as @bottom-center to allow you to position footnotes and references. The only thing missing is a way to mark this up in HTML, which could easily be done with anchors and the longdesc attribute, coupled with the CSS content: property. What you're looking for is a CSS3 enabled browser, not a new specification.
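As a rough sketch of how the submitter's hypothetical page attributes might map onto the Paged Media module: the snippet below assumes the four "borders" values are meant as top, right, bottom, left (the question doesn't say), and that the rendering engine actually implements page-margin boxes, which varies a lot between browsers and print formatters. The "startfrom0" numbering option is not shown.

```html
<style type="text/css">
  /* Page geometry: stands in for the hypothetical size="A4" and borders="..." attributes */
  @page {
    size: A4;
    margin: 2.5cm 2.5cm 2cm 2cm; /* assumed order: top right bottom left */

    /* Page number in the bottom-left margin box (page_numbering="bottomleft") */
    @bottom-left {
      content: counter(page);
    }
  }
</style>
```

The document itself stays plain HTML; the engine decides where each page fills up and breaks.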
LaTeX already got mentioned, and probably makes more sense.
If you really want an unreadable super-general XML-based format, use ODF.
Is there a reason you don't want to use CSS? Because there are already CSS extensions that do exactly what you want. The book Cascading Style Sheets - Designing for the Web was written using only HTML and CSS and prepped for printing using PrinceXML. The PrinceXML web site has a bunch of similar HTML+CSS samples, including academic papers.
What you want (being able to define pages) is wrong in many many ways.
You should, as an author, never define a page or its dimensions, especially for academic works, which will be printed in different formats, on different paper (A4/Letter/Tradeback/etc/etc).
At most, whatever markup you have may define things like page breaks, but even then, they are more of a typesetting issue.
What you want is either LaTeX or DocBook.
I used to have a funny sig, but slash cut it off, and I forgot what the punchline was.
Static configurations are available already, just not the intelligent ones being requested. These have sufficed for what I needed:
To force a page break when printing, add: <p style="page-break-before: always">
Also, to hide odd font and underline for links:
<style type="text/css" media="print"> a { text-decoration: none; color: black } </style>
Yes, they have to be massaged a little.
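One possible way to do that massaging is to move the inline style into a small print stylesheet. This is only a sketch: the class names new-page and keep-together are made up for illustration, and how well the newer break-* properties are honored depends on the browser.

```html
<style type="text/css" media="print">
  /* Start a new printed page before any element carrying this class */
  .new-page {
    page-break-before: always; /* older property, as used above */
    break-before: page;        /* newer name for the same request */
  }
  /* Ask the engine not to split this element across two pages */
  .keep-together {
    page-break-inside: avoid;
    break-inside: avoid;
  }
</style>

<h2 class="new-page">References</h2>
<table class="keep-together">
  <tr><td>Small table that should stay on one page</td></tr>
</table>
```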
Here's to losing my Karma Bonus again....
create yet another little-used and poorly supported document format...
I should have been more clear: HTML describes the *structure* of a document, of which pages are not a part.
As many have said above, you could use CSS if you really wanted to, since page specifications are presentational aspects of the document. Or, you could use LaTeX, which is designed for this kind of use.
Seriously. It's pretty bad. You can, however, use DocBook (or your own schema or DocBook extended with your own stuff) and XSLT it into XHTML (or something entirely different) at the end.
Most likely you just want to use LaTeX though.
If you want to save the source form or markup, use a language designed for it: LaTeX. LaTeX lets you represent all the things you would want to represent in an academic paper, it's fairly readable, very widespread, and has tons of tools. And LaTeX converts to both HTML and PDF.
If you want to display on the web, use HTML. It's meant for the web. It's not a good representation for paged media. If you must represent paged media, you need to use CSS or XSL, but you probably don't want to.
If you want archival quality paged representations, PDF is the only game in town really. HTML with CSS doesn't come close. But it doesn't make sense to save your own papers only in PDF because PDF is not really editable and doesn't have the semantic information.
You wouldn't want to use HTML for something like this, especially with newer versions of HTML. There has been a steady transition in HTML away from specification of the aesthetic appearance of a page. For this reason tags like and are considered nonstandard anymore, mostly because CSS does a way better (and cleaner) job of it.
The only way to tell the difference between a hamster and a gerbil is that the hamster has more white meat.
I used Netscape Communicator to write all my papers for uni, mainly because it was available under Windows and Unix (IRIX in our case) and its output could be read by anyone on any platform.
It was a reasonably easy to use editor, without all the useless crap most others have.
A few lecturers were quite impressed with the idea, the portability and cost were big factors.
...
There's a little-used standard that came out of the W3C along with XSLT called XSL:FO. You write your document in XSL:FO markup, and then use one of any number of processors, like XEP, to convert it into PDF or what have you.
http://www.w3schools.com/xslfo/default.asp
One of the original purposes of it was so that you could use XSLT to transform the same XML data into either XHTML or XSL:FO for publishing. The standard never took off though. XSL:FO just doesn't have enough options to be typographically interesting, compared to SVG.
Of course, the right answer is LaTeX, but you might want to give XSL:FO a try for familiarity's sake.
Someone mentioned XML/XSL/FO. Don't try to write your content in XSL-FO. You'll hate every minute of it.
I'd look in to using DITA (Darwin Information Typing Architecture). It's a set of canned XML structures, plus a specification for how to process and customize those structures. It includes tags for stuff like footnotes...I bet it covers a lot of your use cases. There are some good intros to how these XML structures work here: http://dita.xml.org/book/dita-wiki-knowledgebase
As DITA is XML, you can convert it to HTML and whatever else you feel like, pretty easily. There's an open-source implementation of the DITA spec called the DITA Open Toolkit (http://sourceforge.net/projects/dita-ot/). The DITA Open Toolkit includes stylesheets/scripts to publish HTML and PDF, among other things. PDFs are published via XSL-FO. Just like HTML needs a web browser to render something useful, XSL-FO requires an FO processor to create a PDF. So, in the end you write DITA, XSLT and other scripts transform that DITA to XSL-FO, then an FO processor consumes the XSL-FO and spits out a PDF. The DITA Open Toolkit comes with an open-source FO processor (Apache FOP). FOP doesn't fulfill everyone's needs, but it might work very well for you.
Unfortunately, working with the Open Toolkit and customizing its output can be a bit unwieldy. http://groups.yahoo.com/search?query=dita+users is a pretty good place to look for help.
Yes, LaTeX is nice, but it would be even better if basic TeX were
understood by browsers. About 10 years ago, IBM had a cool plugin called techexplorer.
The plugin would compile LaTeX on the fly. No need to publish a PDF. It worked
pretty well for basic documents that did not rely on macros.
Still, to address the question of the submitter, it would be nice to have something like
<latex>
$\int_0^1 \frac{\sqrt{\sin(x)}}{1+x^2} \; dx$.
</latex>
It would not have to be the full LaTeX stack, just the ability to place mini LaTeX pages into
HTML documents. It's a pity the techexplorer technology seems to have disappeared. If IBM would
open-source it, it could become an add-on for Firefox.
"You know, if we abstract this back one level" Now we find the true terror of computer science.
Yay me!
See, as someone has already pointed out, there's at least one such tool that's in wide use already: TeX and LaTeX. If you don't like that one, it turns out that HTML, with CSS and a little bit of Javascript, is perfectly capable of doing all the things you want, too. You just have to learn how. Have a look at Lie's Cascading Style Sheets: Designing for the Web (written and typeset in HTML/CSS) and at Prince XML for detailed examples.
The ability to cite an HTML document is something that would indeed be useful. The ability to hard code page numbers into an HTML document isn't. The reason why academia and the press have been so resistant to HTML, historically, is that you don't get any control over page layout. Which means that you can't refer to things by page number.
The solution isn't to fix HTML so that you can number pages. It is to fix the bibliographic references to not use page numbers. Generally speaking, it's not hard to number documents by section, and you can make the numbering fine-grained enough for bibliographic references. Then refer to the chapter and section, rather than the page number in your bibliography, and you're done. No need to "fix" HTML.
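For what it's worth, section numbering of this kind can be left to the stylesheet. A minimal sketch using CSS counters, assuming chapters are marked up as h1 and sections as h2 (the headings themselves are invented for the example):

```html
<style type="text/css">
  body { counter-reset: chapter; }
  /* Each chapter heading bumps the chapter number and restarts section numbering */
  h1 { counter-increment: chapter; counter-reset: section; }
  h1::before { content: counter(chapter) ". "; }
  /* Each section heading is labelled chapter.section */
  h2 { counter-increment: section; }
  h2::before { content: counter(chapter) "." counter(section) " "; }
</style>

<h1>Methods</h1>
<h2>Sampling</h2>
<h2>Analysis</h2>
<h1>Results</h1>
<h2>Findings</h2>
```

A citation can then say "section 1.2" and stay valid however the text is reflowed or printed.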
It might make sense to ID paragraphs in HTML, so that you could simply refer to the paragraph ID in your bibliography. If this were simply document metadata, and didn't have anything to do with layout, it would work pretty well. As a bonus, you wouldn't need to renumber, because the ID would just be an arbitrary cookie, and wouldn't need to make sense to a human.
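Something along these lines already works with plain id attributes and fragment links. The ids and the example.org URL below are placeholders, not a proposed convention:

```html
<!-- In the cited paper: each paragraph carries an opaque, stable id -->
<p id="p-7f3a">The first claim being made...</p>
<p id="p-c21e">A later paragraph that someone wants to cite...</p>

<!-- In the citing paper: the reference points at a paragraph, not a page -->
<p>... as argued in
  <a href="http://example.org/paper.html#p-c21e">the earlier paper, paragraph p-c21e</a>.</p>
```

Since the id is only metadata, inserting or deleting text elsewhere in the cited paper doesn't invalidate the reference.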
Of course, with hypertext, there's really no need for a bibliography anyway. Just link to the text you're referencing... But I realize that that's impractical in academia at the moment. I'm just saying...
There's no complexity problem that cannot be solved by adding a layer of abstraction, nor performance problem that cannot be solved by removing a layer of abstraction.
Though I must note that you can already define your own tags in HTML+CSS and, while the W3 validator will (rightfully) complain loudly about them, most browsers deal with them just fine.
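To illustrate, here is a sketch of the submitter's made-up page tag styled purely with CSS. It is not valid HTML, some older browsers may ignore styles on unknown elements, and it does nothing to split content that overflows a page; it only forces a break after each block.

```html
<style type="text/css">
  /* Unknown elements render inline by default, so make the made-up tag a block */
  page {
    display: block;
    page-break-after: always; /* each <page> ends the printed page */
  }
</style>

<page>
  <h1>Title page</h1>
</page>
<page>
  <p>Body text starts on a fresh sheet.</p>
</page>
```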
No problem is insoluble in all conceivable circumstances.
<html>
<head>
<title>Abstract of a usable design</title>
<style type="text/css">
@media print {
body { margin: 2.5cm; }
}
@media screen {
body { margin: 50px; width: 50%; }
}
body { font-family: sans-serif; font-size: 12pt; }
</style>
</head>
<body>
<h1>It's so crazy it just might work</h1>
<h2>and other html inspired musings</h2>
<p>Why not just use css?</p>
<p>Also, don't worry about page numbering. that's the browser's job.</p>
</body>
</html>
Why did the OP specify not using CSS and then suggest an HTML element that looks almost exactly like CSS?
CSS has a method of creating pages, for printing and more. It's no more difficult to learn than HTML is. You could use XML, create all the custom tags you want, and use XSL (oh look stylesheets again) to style the XML however you want.
HTML5 is coming out in the near or distant future. If you have suggestions for tags and functions, you might want to try to get involved with the W3C.
Julie Moult is an idiot.
These different browsers render HTML differently.
reStructuredText is made for what you're talking about
Documents are written in human-readable text format - good for storing in version control and using for diffs
Python Docutils is used to convert to HTML and/or LaTeX and a few other formats
rst2pdf is a tool that converts to beautiful PDF (easier than using Docutils + LaTeX)
http://www.gnu.org/software/groff/
http://en.wikipedia.org/wiki/Groff_(software)