Slashdot Mirror


Extracting HTML and Images from MHT and CHM?

smoon asks: "I've got a boatload of .CHM files, and have recently run across some .MHT files. The .MHT format appears to be how MS Internet Explorer saves a web page for offline viewing, and appears to be a basic MIME format with all images inlined as base64-encoded data. I've poked around a bit looking for a utility that will extract the HTML files and associated graphics so that I can view these in Linux, but no luck so far. The .CHM format is billed as 'compiled' HTML, and boils down to the equivalent of a tarball, in some Microsoft proprietary format, that shows a series of web pages. MS has a .CHM developer's kit that allows you to extract all of the data, but the links stop working and it ends up not being very useful. Does anyone know of code that can extract the HTML and associated images from .CHM or .MHT files?"
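For the .MHT half of the question, here is a minimal sketch in Python, assuming the files really are ordinary MIME multipart messages as described above. It uses only the standard library `email` package. The function name `extract_mht` and the output naming scheme are made up for illustration, and it does not rewrite links inside the extracted HTML, so pages may still point at the original URLs.

```python
import email
import mimetypes
import os
import posixpath
import re
from email import policy
from urllib.parse import urlsplit


def extract_mht(mht_bytes, out_dir):
    """Write every non-multipart MIME part of an .MHT file into out_dir.

    Hypothetical helper: returns the list of file paths it wrote.
    """
    msg = email.message_from_bytes(mht_bytes, policy=policy.default)
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for index, part in enumerate(msg.walk()):
        if part.is_multipart():
            continue  # containers hold no data of their own
        # decode=True undoes base64 / quoted-printable transfer encoding
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        # Parts are usually labelled with the URL they were saved from
        location = str(part.get("Content-Location", ""))
        name = posixpath.basename(urlsplit(location).path)
        if not name:
            # No usable name: fall back to one based on the MIME type
            ext = mimetypes.guess_extension(part.get_content_type()) or ".bin"
            name = "part" + ext
        # Keep file names safe and unique by sanitizing and prefixing the index
        name = re.sub(r"[^A-Za-z0-9._-]", "_", name)
        path = os.path.join(out_dir, "%03d_%s" % (index, name))
        with open(path, "wb") as handle:
            handle.write(payload)
        written.append(path)
    return written
```

Something like `extract_mht(open("saved.mht", "rb").read(), "saved_out")` would then dump the HTML and images into a directory, where `saved.mht` and `saved_out` are placeholder names. Getting the images to show up in the extracted page would still require mapping each `Content-Location` URL to the local file name, which this sketch leaves out.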

1 of 14 comments

  1. But, the real question is.... by Anonymous Coward · Score: -1, Troll

    .CHM files are compressed HTML files. More specifically, they are Windows help files in a compressed HTML format.

    So I must ask: why would you want to read Windows help files on your Linux box? If they are not Windows help files but are in fact ebooks in the Microsoft Reader format (.chm), then you should realize that the license for the ebook doesn't allow you to read it on a Linux box. If you do, you face charges under the DMCA.