Question gzip Maven Jean-loup Gailly
Jean-loup Gailly is the author of gzip and, now, CTO for Mandrakesoft, purveyors of Linux-Mandrake. Jean-loup's home page tells you quite a bit about him, including some interesting peeks into his life beyond Linux and open source software. Please try to keep it down to one question per post. Submitted questions will be chosen from the highest-moderated. Answers will appear within the next week.
Using tar makes things unnecessarily complicated. There is less support for tar around on non-UNIX platforms, and 'embedded compression/archiving' seems to cause great trouble for newbies who can just about handle WinZip and nothing more.
If gzip is to become a truly viable alternative to patented zip, I think the .tar.gz should become a thing of the past.
Remove the old legacy tape archive!
Would you think it wise to roll alternatives to the Lempel-Ziv algorithms into gzip to make other compression utilities less attractive?
It seems that this approach is adopted by other applications (ssh uses multiple encryption engines, and TIFF has allowed several compression techniques for quite a long time).
Would you support an effort to implement bzip2 within gzip? Do you think such a thing could be done while maintaining gzip's stability?
I notice you are a keen Go player... the GNOME version of Go (Iagno) seems much more attractive to me than the KDE version (kgo). I was wondering what software you use to play games, or are you not really interested in the interface at your level of play?
Regards,
Denny
# Using Linux in the UK? Check out Linux UK
Police State UK - news and
On your website, in the history section, you have a link to some information about pulsars...
Were you an astronomy student, and if so how did you go from studying pulsars to CTO of a major Linux distributor?!?
Regards,
Denny
I have read about an agreement between Mandrake and LinuxOne to create a Chinese Linux development center. Did any good come out of this? Couldn't Mandrake's otherwise excellent reputation be damaged by such relationships?
I strongly believe that trying to be clever is detrimental to your health. -- Linus Torvalds
Why do you write code like this:
z = (z = g - w) > (unsigned)l ? l : z;
It makes your code almost impossible to read. Do you even know what this line does anymore?
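For anyone else puzzling over that line, here is a rough unpacking of what it appears to do. This is only a sketch: the function name clamp_to_limit is made up for illustration, and the types (unsigned z, g and w, a signed but non-negative l) are assumptions, since the declarations are not shown in the post. Under those assumptions the one-liner stores g - w in z and then caps z at l.

```c
#include <assert.h>

/* Hypothetical helper, not taken from the gzip sources.
 * Assumed types: g, w and the result are unsigned; l is an int
 * that is expected to be non-negative. */
static unsigned clamp_to_limit(unsigned g, unsigned w, int l)
{
    unsigned z;

    assert(l >= 0);          /* assumption: l is a non-negative count */

    z = g - w;               /* the inner assignment: z = g - w */
    if (z > (unsigned)l)     /* the comparison, done in unsigned arithmetic */
        z = (unsigned)l;     /* too large, so fall back to l */
    return z;                /* otherwise keep g - w */
}
```

In other words, the result is the smaller of g - w and l. Note that if w were larger than g, the unsigned subtraction would wrap around rather than go negative, and the cap at l would then apply.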
The answer to your question is no. Very briefly, it is not possible to build a universal lossless compressor that reduces the size of all possible inputs, nor is it possible for such a compressor to emit an output with less information content (i.e., Shannon entropy) than the input. It is not possible to have a compressor that takes in, say, all 1MB text files and always outputs 10k compressed files, simply because most 1MB text files contain more information than can be expressed in 10k.
A much more intuitive argument is the "pigeonhole principle." Let's assume that there are 16 holes in a wall, each of which is associated with a message. It is impossible for 17 messages to each be uniquely associated with a hole because there are not enough holes available. A 4-bit file can only represent 16 different messages, regardless of what algorithm is used to compress the message...unless, that is, you don't care about the compression being reversible!
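The counting behind the pigeonhole argument can be sketched in a few lines of C. The function names below are invented for illustration, and the sketch only counts bit strings; it does not model any real compressor. It shows that there are 2^n strings of exactly n bits but only 2^n - 1 strings that are strictly shorter, so no reversible scheme can shrink every n-bit input.

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct bit strings of exactly n bits: 2^n.
 * Limited to n < 64 so the result fits in a uint64_t. */
static uint64_t strings_of_length(unsigned n)
{
    assert(n < 64);
    return (uint64_t)1 << n;
}

/* Number of distinct bit strings strictly shorter than n bits,
 * i.e. lengths 0 .. n-1 added together (this equals 2^n - 1). */
static uint64_t strings_shorter_than(unsigned n)
{
    uint64_t total = 0;
    unsigned k;

    for (k = 0; k < n; k++)
        total += strings_of_length(k);
    return total;
}

/* Returns 1 if `messages` distinct messages can each be given
 * their own n-bit code (one pigeon per hole), 0 otherwise. */
static int fits_in_bits(uint64_t messages, unsigned n)
{
    return messages <= strings_of_length(n);
}
```

With these definitions, fits_in_bits(16, 4) holds while fits_in_bits(17, 4) does not, which is the 16-holes, 17-messages case from the post. Likewise, strings_shorter_than(n) is always one less than strings_of_length(n), so at least one n-bit input has no shorter output left to map to.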
If you could compress anything and put it in your pocket what would you choose and why?
blog and junk
I see from your homepage that you avoided patents when writing gzip. How do you feel about the current explosion of software-related patents?
I guess that you have at least a little something to say about this.
Is the 586 optimization enough to justify Mandrake's position? Are you especially proud of any of the architectural differences between the distributions (from what I have been told, the Apache-PHP layout is quite a bit different).
How do you feel about the steps that Red Hat has taken to change their distribution in reaction to yours?
The Data Compression Book was an excellent reference when it came out, but there are some hot topics in compression that it doesn't cover - frequency-domain lossy audio techniques (MP3), video techniques (MPEG2 and especially MPEG4), wavelets (Sorenson video uses these, I believe, and JPEG2000 will), and the Burrows-Wheeler transform from bzip.
Do you have any plans for a new edition of the book, or good Web references for these techniques? BZip is covered well by a Digital research note, but documentation for MPEG2 seems only to exist as source code and I can't find anything concrete about using wavelets for compression. The data is all there on the comp.compression FAQ, but the excellent exposition of the book is sorely lacking.
When I think of a game like go or chess, I think that each player develops their own algorithm to beat their opponent. If you agree, what relationships or similarities do you see between your interest in Go and your interest in compression?
Inquiring minds want to know.
When is gzip going to provide (transparent) support for bzip2 files and the Burrows-Wheeler algorithm?
Will BW be an algorithm option within the gzip file format itself ever?
However, much of the software you've written has started gathering a few grey hairs. Gzip, for example, has been at 1.2.4 for many, many moons, and looks about ready to collect its gold watch.
Is compression software in a category that inherently plateaus quickly, so that significant further work simply isn't possible? Or is there some other reason, such as Real Life(tm) intruding and preventing any substantial development?
(I noticed, for example, a patch for >4Gb files for gzip, which could have been rolled into the master sources to make a 1.2.5. This hasn't happened.)
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
I noticed that you allowed the people who make the WinZip product to incorporate code written for gzip. I think it's cool that you did that, because it would be horrible if WinZip couldn't handle the gzip format, but at the same time, what are your thoughts about allowing free software code to be included in closed-source products?
Just out of curiosity (tell me it's none of my business if you want to and I'll be OK with that), did you receive a licensing fee from the company that makes WinZip for the code?
-- Truth goes out the door when rumor comes innuendo. -- Groucho Marx
The field of compression has been thronged with patents for a long time - but patents at least reveal the algorithm.
What do you think of the expansion of trade-secret algorithms (MP3 quantisation tables, Sorenson, RealAudio and RealVideo, Microsoft Streaming Media) where the format of the data stream is not documented anywhere?
Tom
The compression world has many patents, notably for Lempel-Ziv compression as used in GIF. What is your view on companies patenting non-obvious algorithms for such processes as data compression?
11.0010010000111111011010101000100010000101101000
I am a happy owner of The Data Compression Book (2nd Ed). With the increasing availability of compression routines within libraries (Java's GZIP streams spring to mind), does this make your book a little unnecessary?
Should software authors continue to write their own compression routines, or simply trust the versions available to them in library form?
I can see some definite advantages to library code, i.e. the ability to upgrade routines, and having standardized algorithms which can be read by any program which utilizes the library.
Doug
Venn ist das nurnstuck git und Slotermeyer? Ya! Beigerhund das oder die Flipperwaldt gersput!
As we all know, at first Mandrake was little more than a repackaged version of Red Hat. That's changed a bit with the newer versions. My question is this: to what degree will Mandrake continue to differ from Red Hat, and will there ever be a "developer" version (i.e. one that is centered towards those who are a bit more technically competent)?
Brad Johnson
--We are the Music Makers, and we
are the Dreamers of Dreams