Extracting HTML and Images from MHT and CHM?
smoon asks: "I've
got a boatload of .CHM files, and have recently run across some .MHT files.
The .MHT format appears to be how MS Internet Explorer saves a web site
for offline viewing, and appears to be a basic MIME format with all images
inlined as base64-encoded data. I've poked around a bit looking for a utility
that will extract the HTML files and associated graphics so that I can view
these in Linux, but no luck so far. The .CHM format is billed as 'compiled'
HTML, and boils down to the equivalent of a tarball of web pages in a
proprietary Microsoft format. MS has a .CHM
developer's kit that allows you to extract all of the data, but the links stop
working, so it ends up not being very useful. Does anyone know of a
utility that can extract the HTML and associated images from .CHM or
.MHT files?"
.CHM files are compressed HTML files. More specifically, they are Windows help files in Microsoft's Compiled HTML Help format.
So, I must ask: why would you want to read Windows help files on your Linux box? If they are not Windows help files but are in fact copy-protected ebooks, you should realize that the ebook's license may not allow you to read it on a Linux box (note that Microsoft Reader ebooks use the .lit format, not .chm). If reading them requires circumventing the copy protection, you could face liability under the DMCA.