Convert from HTML to XML With HTML Tidy
An anonymous reader writes "HTML Tidy is a powerful tool that helps convert old HTML pages to newer standards, such as XML. This tip demonstrates how to convert HTML documents to XML (or more specifically, XHTML) with a simple, open source tool. This conversion is useful for webmasters who are migrating to XML. It can also help XML converts who have to interface with legacy HTML tools."
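For anyone who wants to script the conversion, here is a minimal sketch of driving the tidy command line from Python. It assumes the tidy binary is installed and on the PATH; the function names and file names are made up for illustration and are not from the linked tip.

```python
import subprocess


def tidy_command(html_in, xhtml_out):
    """Build the argv for a quiet HTML-to-XHTML run of tidy."""
    return [
        "tidy",
        "-q",            # suppress nonessential output
        "-asxhtml",      # convert HTML to well-formed XHTML
        "-numeric",      # write numeric rather than named entities
        "-o", str(xhtml_out),
        str(html_in),
    ]


def to_xhtml(html_in, xhtml_out):
    """Run tidy; assumes the binary is on PATH (not checked here)."""
    result = subprocess.run(tidy_command(html_in, xhtml_out))
    # Tidy exits with 0 on success, 1 if there were warnings, 2 on errors,
    # so only treat 2 and above as a failed conversion.
    if result.returncode >= 2:
        raise RuntimeError(f"tidy could not convert {html_in}")
    return xhtml_out
```

Calling to_xhtml("old.html", "new.xhtml") would write the converted page to new.xhtml. The warning exit code is tolerated on purpose, since legacy pages almost always trigger at least one warning.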
A few days ago I had to convert HTML pages into XHTML, stripping out a few extra elements and attributes. I used xsltproc, from libxslt, which uses the parser from libxml2, and that parser has the option of parsing strict HTML into an XML DOM.
HTML Tidy can be useful when you have not-so-strict HTML, but for most quick conversions I've found libxml2 and co. to be quite light and easy.
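A rough sketch of that xsltproc approach, wrapped in Python so it can be scripted. The --html and -o options are real xsltproc options; everything else is an assumption for illustration. In particular, "font" and "bgcolor" are placeholder names standing in for whatever elements and attributes need stripping, and the output is plain well-formed XML without the XHTML namespace.

```python
import subprocess
import tempfile
from pathlib import Path

# Identity transform: copy every node and attribute unchanged, except those
# matched by the empty template, which are dropped along with their contents.
# "font" and "@bgcolor" are hypothetical examples, not the actual list used.
STRIP_XSLT = """<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <xsl:template match="font|@bgcolor"/>
</xsl:stylesheet>
"""


def xsltproc_command(stylesheet, html_in, xml_out):
    """Build the argv; --html makes xsltproc read the input with the HTML parser."""
    return ["xsltproc", "--html", "-o", str(xml_out), str(stylesheet), str(html_in)]


def convert(html_in, xml_out):
    """Write the stylesheet to a temp file and run xsltproc (assumed on PATH)."""
    with tempfile.TemporaryDirectory() as tmp:
        sheet = Path(tmp) / "strip.xsl"
        sheet.write_text(STRIP_XSLT, encoding="utf-8")
        subprocess.run(xsltproc_command(sheet, html_in, xml_out), check=True)
```

The empty template wins over the identity template because a name test has a higher default priority than node(), so no explicit priority attribute is needed.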
dakkar - mobilis in mobile
HTML Tidy has been out for years.
Check out the Tidy Homepage or the project on SourceForge.
Popisms.com - Connecting pop culture
Ian Hickson makes a good case here that using XHTML may not be the right direction to go -- at least at this point.
- say that I use XHTML
- make it easier to parse my pages
HTML 4.01 doesn't make you expressly close your tags, which causes XML processors to choke and die. I'd rather write it in a usable format once than have to Tidy-parse every time I want to update my search engine. Plus XSLT really is cool. I've got (somewhere) a stylesheet I wrote that will validate form data for me, and then I can apply other XSLT stylesheets to make the output, further separating the output from the script that does the magic. Great way to update the look of a page without messing up (accidentally, of course) the code I wrote months ago.
If you are running MacOS with BBEdit, you can use the BBTidy plugin to get HTML Tidy integration in BBEdit.
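To illustrate the point above about unclosed tags, here is a small sketch using Python's standard-library XML parser. The two markup strings and the helper name are invented for the example; any strict XML processor should behave similarly.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Valid HTML 4.01 list markup: the </li> end tags are optional there.
html_style = "<ul><li>one<li>two</ul>"
# The same list with every tag explicitly closed, as XHTML requires.
xhtml_style = "<ul><li>one</li><li>two</li></ul>"

print(is_well_formed(html_style))   # False
print(is_well_formed(xhtml_style))  # True
```

The first string is rejected because the parser reaches the closing ul tag while an li element is still open.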
JP