Extracting HTML and Images from MHT and CHM?
smoon asks: "I've got a boatload of .CHM files, and have recently run across some .MHT files. The .MHT format appears to be how MS Internet Explorer saves a web site for offline viewing, and appears to be a basic MIME format with all images inlined as base64-encoded data. I've poked around a bit looking for a utility that will extract the HTML files and associated graphics so that I can view these in Linux, but no luck so far. The .CHM format is billed as 'compiled' HTML, and boils down to the equivalent of a tarball, in some Microsoft proprietary format, that shows a series of web pages. MS has a .CHM developer's kit that allows you to extract all of the data, but the links stop working and it ends up not being very useful. Does anyone know of code that can extract the HTML and associated images from .CHM or .MHT files?"
A .CHM is a compilation of HTML files with support for a tree-style view of the documents in it, as well as binary files (examples), images, browse order (associating a "forward" button with the page that comes after the current one), searching, and so on.
It's a pretty handy way of distributing online documentation, kinda like PDF but for HTML.
Because it's HTML, you can still resize the window and have the text reflow; in my opinion that's its big advantage over PDF. A PDF is basically a rendering of a page, which is not really what you want for an online help system.
It probably ends up just being a bunch of standard filenames inside a .CAB file (the .CAB format is what Microsoft puts a lot of their install packages and other archives into).
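One quick way to test the CAB guess is to look at a file's first few bytes. The helper below is hypothetical, not part of any tool mentioned in this thread, and relies on the file signatures as I understand them: Cabinet files begin with `MSCF`, while .CHM files begin with `ITSF`, which suggests a .CHM is its own container rather than a renamed .CAB (although both use Microsoft's LZX compression). The .MHT check is only a heuristic that looks for a `multipart/related` header near the top of the file.

```python
def sniff_container(header: bytes) -> str:
    """Guess the archive type from the first few KB of a file (heuristic)."""
    if header.startswith(b"ITSF"):
        return "chm"  # Compiled HTML Help container
    if header.startswith(b"MSCF"):
        return "cab"  # Microsoft Cabinet archive
    if b"multipart/related" in header.lower():
        return "mht"  # MIME web archive, as saved by Internet Explorer
    return "unknown"


def sniff_file(path: str) -> str:
    # 4 KB is an arbitrary read size, meant to cover the MIME headers of an .MHT.
    with open(path, "rb") as f:
        return sniff_container(f.read(4096))
```

For example, `print(sniff_file("manual.chm"))`, with a filename of your own in place of the made-up `manual.chm`.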
As for the format it's in, here's what I found on Google.
- Steve
First off, MHT is just a MIME message; run munpack (from the mpack package) on it.
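If mpack isn't installed, Python's standard `email` package can do the same unpacking. This is a minimal sketch, and `extract_mht` is a hypothetical name. It assumes the page refers to its images by the same absolute URL that appears in each part's `Content-Location` header (common in IE-saved files, but not guaranteed), and rewrites those URLs to the local filenames so the links keep working after extraction. The rewrite is done on raw bytes, so it also assumes an ASCII-compatible charset such as UTF-8 or Windows-1252.

```python
import email
import os
from email import policy
from urllib.parse import urlparse


def extract_mht(mht_bytes: bytes, out_dir: str) -> list[str]:
    """Unpack an .MHT (MIME multipart/related) archive into out_dir.

    Returns the local filenames written. Text parts (HTML, CSS) have their
    Content-Location URLs replaced by the local filenames.
    """
    msg = email.message_from_bytes(mht_bytes, policy=policy.default)
    parts = [p for p in msg.walk() if not p.is_multipart()]

    # First pass: pick a unique local filename for every part.
    names = []
    url_to_name = {}
    for i, part in enumerate(parts):
        location = str(part.get("Content-Location", ""))
        name = os.path.basename(urlparse(location).path)
        if name in ("", ".", ".."):
            name = f"part{i}"  # no usable name in the URL
        if name in names:
            name = f"{i}_{name}"  # avoid clobbering parts with the same basename
        names.append(name)
        if location:
            url_to_name[location] = name

    # Second pass: decode each part (base64 / quoted-printable) and write it out.
    os.makedirs(out_dir, exist_ok=True)
    for part, name in zip(parts, names):
        data = part.get_payload(decode=True) or b""
        if part.get_content_maintype() == "text":
            # Replace longer URLs first so a URL that is a prefix of another
            # does not corrupt the longer one.
            for url in sorted(url_to_name, key=len, reverse=True):
                data = data.replace(url.encode("utf-8"), url_to_name[url].encode("utf-8"))
        with open(os.path.join(out_dir, name), "wb") as f:
            f.write(data)
    return names
```

For example, `extract_mht(open("page.mht", "rb").read(), "page_files")` should leave the HTML and its images side by side in `page_files/`, ready to open in any Linux browser.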
However, for CHM extraction, you can use this portable CHM extractor. I don't think Matthew has officially released it, but it should be OK to use. Get in touch with him if you want.
HTML Help Workshop from MS can extract all of the files in a .chm perfectly, in their original state. Open it up, click Decompile, and voila. I don't see how you managed to run into problems; I have done this with hundreds of .chm files. Sure, it doesn't always save all of the CHM-related data, but the files themselves are always preserved in their original state.
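The same decompile step can also be run from the command line, which is handy for a boatload of files. As far as I know, the HTML Help viewer that ships with Windows accepts `hh.exe -decompile <output folder> <file.chm>`. The Python wrapper below is a sketch that assumes `hh.exe` is on the PATH; both function names are made up for illustration.

```python
import os
import shutil
import subprocess


def decompile_command(chm_path: str, out_dir: str) -> list[str]:
    """Build the argument list for: hh.exe -decompile <out_dir> <chm_path>."""
    return ["hh.exe", "-decompile", out_dir, chm_path]


def decompile_chm(chm_path: str, out_dir: str) -> None:
    """Unpack one .CHM into out_dir using Windows' hh.exe (Windows only)."""
    if shutil.which("hh.exe") is None:
        raise FileNotFoundError("hh.exe not found on PATH; this needs Windows")
    os.makedirs(out_dir, exist_ok=True)
    # hh.exe's exit status is not relied on here; inspect out_dir afterwards.
    subprocess.run(decompile_command(chm_path, out_dir))
```

To batch-convert a directory, loop over `glob.glob("*.chm")` and call `decompile_chm(name, os.path.splitext(name)[0])` for each file.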