Slashdot Mirror


FEAD Compressing Compressed Files by 50-75%?

An anonymous reader asks: "I just installed Acrobat Reader and found that it was using FEAD which claims - 'FEAD© Optimizer© significantly reduces the size of application programs on average by 50% (in some cases up to 75%, depending on the specific software), even when they are already compressed with common compression technology like ZIP or CAB.' It seems that they optimize each application individually at their labs. But an average of 50% compression on already compressed binary files seems to be too good to be true. Anyone familiar with how someone may be able to achieve this?"

75 comments

  1. Old compilers by mystran · · Score: 1
    IIRC some old IBM Fortran compiler used to produce very small executables by using almost nothing but subroutine calls in the actual program, producing something kinda like P-code. Maybe they are doing something similar.

    Other than that ? Probably just marketing.

    --
    Software should be free as in speech, but if we also get some free beer, all the better.
    1. Re:Old compilers by RevAaron · · Score: 1

      Uh, how would this be related? Compressing data files is a lot different from compiling to bytecode, or to a machine language which uses dynamic libraries.

      --

      Working toward a usable PDA environment in the spirit of Newton OS: Dynapad
  2. This is a common hoax. by Futurepower(R) · · Score: 2, Informative

    This is a common hoax. Maybe 2 years ago, another Slashdot editor posted this hoax. So, it's a repeat hoax for Slashdot, too.

    1. Re:This is a common hoax. by thumperward · · Score: 2, Funny

      remember this comment well, for when you get the opportunity to meta-mod the parent post's crack-addict "Informative" rating into the Netherworld.

  3. It's not compression by photon317 · · Score: 4, Informative


    The thing they tout as FEAD is basically a load-over-network-on-demand thingy. They haven't actually developed anything that does compression, they're just storing some of the app on a server somewhere to be downloaded on demand. The hype at their site misled you, like it was meant to do.

    --
    11*43+456^2
    1. Re:It's not compression by bobbozzo · · Score: 3, Informative

      When you download Acrobat, it usually will download an "Adobe Download Manager" or something like that.

      That is NOT what is being discussed here.

      Even if you bypass using the download manager, it still uses FEAD to decompress and install AcroRead.

      One could easily disprove your theory by unplugging their net connection during the FEAD decompression... Done... no adverse effect.

      Nonetheless, the installer is VERY slow, and is still bigger than the AcroRead 5.1 installer, which did not use FEAD.

      Making users go through this many steps (download the download manager. run it. wait for it to download. wait for fead...) and slowness is insane.

      --
      Nothing to see here; Move along.
    2. Re:It's not compression by photon317 · · Score: 1


      Well, the information on FEAD available by going where the /. submission references shows a pretty picture along with some very convoluted wording that seems to indicate that they're using the word "compression" to describe dynamic loading of pieces of the application over the network, thereby "compressing" the original install download.

      --
      11*43+456^2
  4. Compression is easy by cybermage · · Score: 4, Funny

    It's decompressing the file that's hard.

    You can compress all your files down to a single bit using this patented two step process:

    1. Discard all zeros.
    2. Use one to represent any length sequence of ones.

    This is as reliable a compression scheme as most backups to tape I've ever seen, and you can fit a huge number of files onto a single floppy.

    1. Re:Compression is easy by Alroussassa · · Score: 1

      This confuses me: If you have this: 1101101001001 and you discard all zeros it becomes this: 1111111 you have lost information, you no longer know where those zeros went! Can you maybe reexplain the compression technique or link me to a site that does? Thanks, -Alroussassa

    2. Re:Compression is easy by CableModemSniper · · Score: 1

      No no you're compressing all wrong. First you discard all zeroes... so 1101101001001 becomes 1111111. Now you represent any-length run of 1s with a 1. so 1111111 becomes 1. so it goes 1101101001001 -> 1.

      --
      Why not fork?
    3. Re:Compression is easy by Alroussassa · · Score: 0, Redundant

      Ahh! Try not to be annoyed by my newbieness (I just started programming recently).

      Ok, fine, so now you end up with 1 from 1101101001001. But how do you go backwards?

      I receive 1.

      How do I decompress 1 to 1101101001001?
      There isn't enough information?




      -Alroussassa

    4. Re:Compression is easy by FaultMachine · · Score: 1

      Just got done reading this thread and I am intently waiting for an answer :D How the hell would you decompress 1 to 1101101001001? help me, I think I'm wandering into the deep side of the pool.

    5. Re:Compression is easy by optikSmoke · · Score: 5, Funny

      That's too easy.

      First, you expand the 1 to the requisite number of 1s:
      1 -> 1111111

      Then, reinsert the 0s:
      1111111 -> 1101101001001

      Thus, 1 -> 1101101001001

    6. Re:Compression is easy by mozkill · · Score: 1

      LOL... ok , i got a good laugh out of this one... thanks guys. :-)

      --

      -- Betting on the survival of the media industry is a serious risk. I advise investing elsewhere.
    7. Re:Compression is easy by qqtortqq · · Score: 1

      For god's sake man, it's a joke!

    8. Re:Compression is easy by King+of+the+World · · Score: 1
      That reminds me of my video over phone technology. I compressed several major motion pictures down to 20KB - a mere 5 second download for most people!

      Of course, the player itself is 65 gig...


      (guess I shouldn't have used Delphi)

    9. Re:Compression is easy by Alroussassa · · Score: 0

      Yes, but you are adding information to the file! Where is the information that says 1 decompresses to 1111111 or whatever, where is the information that says where to insert the zeros?
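
For anyone still puzzled by the joke, here is a minimal Python sketch of the two "patented" steps. The function name `joke_compress` and the sample inputs are made up for illustration. It shows that many different inputs collapse to the same output, so no decompressor can know which one to give back. The counting at the end is the general version of the same problem: there are fewer strings shorter than n bits than there are strings of exactly n bits, so no lossless scheme can shrink every input.

```python
def joke_compress(bits: str) -> str:
    """Apply the two joke steps to a string of '0' and '1' characters."""
    ones_only = bits.replace("0", "")   # step 1: discard all zeros
    return "1" if ones_only else ""     # step 2: a single 1 stands for any run of ones

# Four different inputs, all containing at least one 1.
inputs = ["1101101001001", "1111111", "1", "0001000"]
outputs = {joke_compress(b) for b in inputs}
print(outputs)  # a single shared output, so the mapping cannot be reversed

# Counting argument: bit strings of length n versus all strictly shorter ones.
n = 13
strings_of_length_n = 2 ** n
strictly_shorter = sum(2 ** k for k in range(n))  # lengths 0 through n-1
print(strings_of_length_n, strictly_shorter)
```

The information that says where the zeros go has to live somewhere. In the joke it lives in the head of whoever already knows the original file.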

    10. Re:Compression is easy by Havokmon · · Score: 1
      Thus, 1 -> 1101101001001

      Am I the only person who looked down at his t-shirt for comparison?

      --
      "I can't give you a brain, so I'll give you a diploma" - The Great Oz (blatantly stolen sig)
    11. Re:Compression is easy by BJH · · Score: 1

      I'm praying you're not serious.

    12. Re:Compression is easy by Alroussassa · · Score: 0

      I'm dead serious, and obviously am missing something clear. Will someone kindly take two minutes to fill me in? :-) -Alroussassa

    13. Re:Compression is easy by Mr+Z · · Score: 1

      Uhm, yeah. The original post was a joke? It sounds like an algorithm that should go into LZip.

      --Joe
  5. Marketing obfuscation by jrpascucci · · Score: 4, Insightful

    I think that it reads as you interpret: if you put some stuff in a .ZIP, it will further compress it. But, on a very close reading, they are only comparing sizes, and not necessarily saying they are compressing the zip file.

    From the article: "Netopsystems specialists combine and customize these tools and processes for each individual software product so that optimal size reduction results are achieved."

    Note the following from the whitepaper: "Usually software producers compress their data by generating cabinet files or the like...Applying a conventional compression tool like WinZip or WinRAR on such data does not lead to appreciable - often negative - results."

    Read strictly, this says what we know: compressing a compressed file generally doesn't work. They aren't saying they compress the compressed file here.

    Note that towards the bottom, they are comparing 'lossless compressed' data to what they do.

    So, here's my bet: they probably do something like crack open a cab or zip, parse a PDF, for example, for 'magic things' that can be ignored without changing the functionality ('lossy' but nothing of significance lost), or take an HTML file and strip all spaces and newlines between tags. Similar things could be done for other file types: Removing quotes and instead, magic-quoting commas in a CDF. Etc, ad infinitum.

    All in all, it's lame, but so is most software.

    If you have a gigantic amount (hundreds of gigs, or terabytes) of different files to back up or move around, with so many file formats that you can't keep them straight, then it might be worth it. If you are lazy and it's cheap, it might be worth it. Other than that, I fail to see the real utility here - disk is cheap, bandwidth is getting cheaper, and reasonably assuming the bulk of this data is generated (an adequate assumption), you can do very similar things by fiddling around with the output formatting in code.

    J

    1. Re: Marketing obfuscation by Black+Parrot · · Score: 2, Funny


      > Similar things could be done for other file types: Removing quotes [...]

      When I have files with lots of quotes in them I reduce the size by using single quotes instead of double quotes.

      --
      Sheesh, evil *and* a jerk. -- Jade
    2. Re: Marketing obfuscation by dylan_- · · Score: 1
      When I have files with lots of quotes in them I reduce the size by using single quotes instead of double quotes.
      This is just wasteful as it only applies to quotes. You should use a smaller font so all characters are reduced in size.
      --
      Igor Presnyakov stole my hat
  6. How to do anything by kurosawdust · · Score: 5, Funny
    "Anyone familiar with how someone may be able to achieve this?"

    "Lying through one's teeth" comes to mind...

    1. Re:How to do anything by natmsincome.com · · Score: 2, Informative

      I read the white paper and it looks like they are actually providing a number of products:

      Download on demand:
      Think Quicktime/IE/etc. Download a small downloader, then download the components they want. I expect they'd also use a protocol similar to rsync, which makes downloading a lot faster.

      Code Inspection:
      This is where they say they can decrease the size of the executable. If you've done programming then you know that you can make executables a lot smaller. Here are some examples: remove inline macros and make them functions, remove debug information.

      Zipped executables:
      You can have realtime zipped executables. There are about 5 different forms of this that I've seen - they zip/encrypt the executable, which makes it about the same size as if it was zipped, but you don't have to unzip it to run it (it still uses the space in memory). You also have to break the encryption before you can reverse engineer it. The overhead is about 5 - 20% in loading times.

      Basically they provide services that companies could do themselves but get someone with experience to do instead.

      If I did the following:
      *Used realtime compression on the exe
      *Optimized my code so that it didn't include useless code.
      *Modularised my code (reuse etc)
      *Made the code more plugin like.
      *Added a bootstrap downloader.
      *Make the software come in three versions Lite, Full and express.

      If I did all of this I'd easily be able to halve the bandwidth needed for a file without really changing it that much.

      As for the CD, the data has to be over 700MB before they can decrease the size, so I'm fairly sure they'd be able to optimise it.

      The service they provide isn't unique. It's just a convenient package of 10 or more technologies to make life easy for other companies.

  7. i know this one by onya · · Score: 3, Funny

    by employing the latest in smoke and mirrors technology. they've invented a new mirror that reflects 110% of all light. neat huh?

  8. Wow. by Lendrick · · Score: 2, Funny

    That Site (c) is an Eyesore (c). I wonder if these Dipshits (c) realize that all those "(c)" marks make their Site (c) Difficult (c) to Read (c).

    1. Re:Wow. by stefanlasiewski · · Score: 3, Funny

      Difficult (c) to Read (c).

      It may be difficult to read, but it sure is easy to FEAD© !!!

      --
      "Can of worms? The can is open... the worms are everywhere."
    2. Re:Wow. by Kelerain · · Score: 3, Funny

      Difficult to read, but easy to compress.

    3. Re:Wow. by cgenman · · Score: 1

      Amazing! (tm) "Their Lawyers" (tm) are a little lax (tm) in educating their De-signers (tm) about the branches of IP (tm) law.

      (c) 2003 Cgenman (tm) All rights reserved.

    4. Re:Wow. by Wesley+Felter · · Score: 1

      Also, they apparently don't understand the difference between (C) and TM, or the first thing about trademark law.

  9. EXE compressor? by almightyjustin · · Score: 2, Informative

    Sounds to me like an EXE compressor like UPX - they can compress EXE files better than a ZIP archive can (by taking advantage of known aspects of executable files); so by unzipping, EXE-compressing, and re-zipping, one can reduce the size of an already existing ZIP archive.

    --

    Omnes arx vestrum sunt adiuncta nobis.

    1. Re:EXE compressor? by 42forty-two42 · · Score: 1

      No. If you compress something twice, no matter the methods used, the second time is unlikely to give any benefit.

    2. Re:EXE compressor? by Naikrovek · · Score: 2, Insightful

      wrong. you say "unlikely to give any benefit," but the truth is that it can be beneficial.

      zip an .exe file. or gzip an .so, whatever you want. then zip or gzip (or bzip2 for that matter) the file again - the doubly compressed file will be smaller than the compressed file it contains.

      this is why people zip up movie files on their sites, it does make a difference, and if you only save one meg on your 40 meg movie, and 1,000 people download it, you just saved yourself one gigabyte of transfer fees, during whatever timeframe those 1000 people downloaded.

      i'd say there's a benefit. it depends on what's in the compressed files, but if you're serving this file on a high traffic site it can often pay to double compress your files.

    3. Re:EXE compressor? by almightyjustin · · Score: 1

      Maybe I didn't phrase that too well. You're correct, zipping the already compressed EXE won't do much. I was using EXE compressors as an example of how it's possible to reduce an existing zip file's size by "50-75%", assuming that zip file contained a program (which the story states). One would probably zip the compressed EXE anyway to include other support files etc. in one file but that wouldn't offer much benefit in the way of compression.

      --

      Omnes arx vestrum sunt adiuncta nobis.

    4. Re:EXE compressor? by Anonymous Coward · · Score: 0

      Stupid boy. Think it through.

    5. Re:EXE compressor? by Anonymous Coward · · Score: 0

      duh! everyone knows 1000MB != 1GB

    6. Re:EXE compressor? by qqtortqq · · Score: 1

      Well, I didn't believe you, so I tried it out. Here's the results:

      -rw-r--r-- 1 tort tort 337648 Jun 11 02:27 libslang.so.1.4.4
      -rw-r--r-- 1 tort tort 149671 Jun 11 02:26 libslang.so.1.4.4.gz
      -rw-r--r-- 1 tort tort 148000 Jun 11 02:30 libslang.so.1.4.4.gz.gz
      -rw-r--r-- 1 tort tort 148051 Jun 11 02:31 libslang.so.1.4.4.gz.gz.gz


      double compressing saved 1671 bytes. On triple compressing, that's when you end up with a net loss. You learn something new every day, I suppose...
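
The experiment above is easy to repeat. A minimal Python sketch, assuming `zlib` (the same DEFLATE algorithm gzip uses) and a synthetic word-salad input instead of libslang; `sample_text` is a made-up helper, and the byte counts it prints will not match the listing above. The point is only the shape of the result: a big win on the first pass, then next to nothing.

```python
import random
import zlib

def sample_text(seed: int = 0, words: int = 20000) -> bytes:
    """Build repeatable, compressible text from a small random vocabulary."""
    rng = random.Random(seed)
    letters = "abcdefghijklmnopqrstuvwxyz"
    vocab = [
        "".join(rng.choice(letters) for _ in range(rng.randint(4, 10)))
        for _ in range(64)
    ]
    return " ".join(rng.choice(vocab) for _ in range(words)).encode("ascii")

raw = sample_text()
once = zlib.compress(raw, 9)     # first pass: removes most of the redundancy
twice = zlib.compress(once, 9)   # second pass: input already looks random
thrice = zlib.compress(twice, 9)

for label, blob in [("raw", raw), ("x1", once), ("x2", twice), ("x3", thrice)]:
    print(f"{label:>3}: {len(blob)} bytes")

# All three passes are lossless, so unwinding them gives the input back.
assert zlib.decompress(zlib.decompress(zlib.decompress(thrice))) == raw
```

Whether the second pass comes out slightly smaller or slightly larger depends on the data and the container overhead, which fits the 1% seen here.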

    7. Re:EXE compressor? by Anonymous Coward · · Score: 0

      Great, 1% smaller.

    8. Re:EXE compressor? by Zocalo · · Score: 1
      Certainly, it's unlikely and unexpected, but it happens far more than you might expect, and a lot depends on the first compressor. Many compressors don't actually compress their own data, for instance ZIP doesn't make much effort at compressing its file table. As a result, if you "zip foo.zip *.gif ; zip bar.zip foo.zip" then "bar.zip" can be significantly smaller than "foo.zip", especially for very large numbers of files. Try looking at the Usenet binaries groups as well - they frequently double compress files with combinations of ARJ/RAR/ZIP before posting and often squeeze several additional percent by getting the right combo.

      That still doesn't change the fact that, in my experience, the most effective compressor of large amounts of executable code is just to run "strip" on it. Just for giggles a few years back I performed a clean install of a major Linux distro, then ran "strip" on all the executables. There were a lot of errors for scripts etc., but overall disk saving was around 10%, and with no consistency whatsoever in what was or was not stripped. It's not just FOSS that does this too, I've quite often found commercial Windows binaries with the symbol table still attached as well...

      --
      UNIX? They're not even circumcised! Savages!
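
The point about ZIP not compressing its own file table can be checked with a small sketch. This assumes Python's `zipfile` module and 300 tiny files of random bytes standing in for the *.gif files (already-compressed images are close to random as far as DEFLATE is concerned). The file names and helper names are made up for illustration.

```python
import io
import random
import zipfile

def build_inner_zip(count: int = 300) -> bytes:
    """foo.zip: many small incompressible files with long, similar names."""
    rng = random.Random(0)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for i in range(count):
            payload = bytes(rng.getrandbits(8) for _ in range(64))
            zf.writestr(f"images/thumbnails/icon_{i:04d}.gif", payload)
    return buf.getvalue()

def zip_again(inner: bytes) -> bytes:
    """bar.zip: a second archive whose only member is foo.zip."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("foo.zip", inner)
    return buf.getvalue()

foo = build_inner_zip()
bar = zip_again(foo)
print("foo.zip:", len(foo), "bytes")
print("bar.zip:", len(bar), "bytes")
```

The saving in bar.zip comes from the per-file headers and the central directory, which foo.zip stores uncompressed and which repeat nearly the same names and fields 300 times. With a few large files instead of many small ones the effect mostly disappears.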
    9. Re:EXE compressor? by cperciva · · Score: 1

      On a (very marginally) related note, the same applies to binary patches. When applied to two versions of the same binary, bsdiff (which can take advantage of the structure of executable files) routinely produces patches which are 5-10 times smaller than those produced by Xdelta (which can't).

      In short: Executable files are far more than just streams of bytes.

    10. Re:EXE compressor? by Glonoinha · · Score: 1

      - There were a lot of errors for scripts etc., but overall disk saving was around 10%,

      Translation : It didn't work anymore, but I recovered 10% of the disk space that it used to use.

      Hell that's easy, just go to any random directory, pick the largest file in there and delete it. Pretty much the same result.

      --
      Glonoinha the MebiByte Slayer
    11. Re:EXE compressor? by Zocalo · · Score: 1
      Translation: it would appear that you don't understand "strip", like most of the package compilers on the distro release concerned.

      For the non-developers who can't run "man strip"; "strip" removes the *optional* fluff added to a binary executable (including libraries). Since this is really only useful when debugging the code, something a user doesn't do that often and certainly shouldn't be done on a production box, strip will remove it for you. It does not stop the executable from running. When you try and run it on a file format it doesn't understand (a perl script say), it gives an error and leaves the file alone. Basically therefore, 10% of the /usr partition utilisation on the system was symbol tables.

      In summary, if you are interested in reducing executable size, strip the binaries *then* apply the code compressor, assuming that it doesn't remove the symbol table for you anyway, of course.

      --
      UNIX? They're not even circumcised! Savages!
    12. Re:EXE compressor? by Glonoinha · · Score: 1

      Naw, I was just joking about the point you made about some of the scripts not working after they had been stripped.

      --
      Glonoinha the MebiByte Slayer
    13. Re:EXE compressor? by BJH · · Score: 1

      He didn't mean that the scripts stopped working; he meant that strip didn't work on the scripts.

    14. Re:EXE compressor? by Tony-A · · Score: 1

      Another stunt with multiple files.
      First zip, no compression.
      Second zip, zip the first zip.
      Can be significantly smaller than compressing on the first zip.

    15. Re:EXE compressor? by pthisis · · Score: 1

      > Another stunt with multiple files.

      > First zip, no compression.
      Or use another uncompressed archive (e.g. tar)...

      > Second zip, zip the first zip. ...and then compress it (e.g. with gzip)

      > Can be significantly smaller than
      > compressing on the first zip.

      Yes. That's why a .tar.gz is usually smaller than a .zip of the same files (modulo speed flags, etc) even though they use essentially the same compression algorithm.

      You do lose something, though; a .zip is indexed so you can easily extract individual files without decompressing the whole archive. Whether that matters depends on the circumstances (in something like a .jar file you probably want fast access to individual files).

      Sumner

      --
      rage, rage against the dying of the light
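
The .tar.gz versus .zip difference described above can be sketched in Python. This assumes the standard `zipfile`, `tarfile`, and `zlib` modules and 200 small, similar config files invented for the example; the helper names are hypothetical. It builds three things from the same files: a normal .zip (each file deflated on its own), a .tar.gz (one stream over everything), and the "zip with no compression, then compress that" stunt.

```python
import io
import tarfile
import zipfile
import zlib

def make_files(count: int = 200) -> dict:
    """Many small files that resemble each other, as source or config trees do."""
    files = {}
    for i in range(count):
        body = f"[section]\nname = item{i}\nenabled = true\ncolor = blue\nsize = {i % 7}\n"
        files[f"conf/item{i:03d}.ini"] = body.encode("ascii")
    return files

def make_zip(files: dict, method: int) -> bytes:
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=method) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()

def make_tar_gz(files: dict) -> bytes:
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()

files = make_files()
per_file = len(make_zip(files, zipfile.ZIP_DEFLATED))                     # ordinary .zip
solid = len(make_tar_gz(files))                                           # .tar.gz
stored_then_deflated = len(zlib.compress(make_zip(files, zipfile.ZIP_STORED), 9))

print("zip, per-file deflate:", per_file)
print("tar.gz, one stream:   ", solid)
print("stored zip + deflate: ", stored_then_deflated)
```

The single-stream variants win because the compressor can reuse what it learned from earlier files and headers. The cost is the one already mentioned: pulling one file out means decompressing everything in front of it.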
    16. Re:EXE compressor? by pthisis · · Score: 1

      Since this is really only useful when debugging the code, something a user doesn't do that often and certainly shouldn't be done on a production box, strip will remove it for you.

      A production box probably does want debug info--when things start going awry there you want to track them down _fast_. Attaching a debugger to the misbehaving app can save tons of downtime.

      Ideally you wouldn't have any bugs on the production machine, but in the real world...

      Sumner

      --
      rage, rage against the dying of the light
  10. Compressing data exe compressors don't by TheSHAD0W · · Score: 3, Interesting

    When you use an executable compressor, like PKLITE, on an executable file, it can't compress all the data. This is because EXEs will dynamically load more data, and if that data is compressed, the code can't read it.

    I suspect these guys are going in and manually altering the code to perform a decompression. This would certainly produce a benefit.

    Here's something for you to try: Take an executable and zip it. If it compresses, then there's probably SOME give in it. And most executables I see are compressible.

    1. Re:Compressing data exe compressors don't by Anonymous Coward · · Score: 0

      huh? did that make any sense?

    2. Re:Compressing data exe compressors don't by Sentry21 · · Score: 1

      The operating system relocates the EXE when it's loaded. EXE's don't (normally) load data from themselves, they load them from external overlays. As long as the execution point points to where the PKLITE code is, it will run fine.

      --Dan

  11. Hmm... by bobbozzo · · Score: 2, Interesting
    I just RAR'd (RAR 3.11 with -m5) my AcroRead folder... it came out 300k bigger than the 16MB full installer...

    Using ZIP -9 gives a 20MB file.

    So, FEAD offers slightly better compression. (I know there's other crap, including the installer, registry settings, icons, ...)

    Still, is it worth the annoyance of the greatly increased install time?

    Also, how is FEAD saying they are 50% better than other compressors?

    --
    Nothing to see here; Move along.
  12. thumperward is a believer. by Anonymous Coward · · Score: 0

    I think thumperward is trying to tell us that he thinks it is possible to compress already compressed files by another 50%.

  13. How I'd do it by PapaZit · · Score: 1

    I have no idea how FEAD works, but here's how I'd do something similar:

    A large portion of shipped executables are blocks of standard code from the compiler. If you're using a Microsoft compiler, you can strip out the standard chunks and pull those chunks in from the binaries that are already in Windows.

    If you're using another compiler, you can still probably do the same kind of thing: some intelligent block compression with the included code that'll do better than the "dumb" compression from "zip" or other algorithms.

    You could combine that with compiler space optimization tricks, too: loop RE-rolling, for example. A lot of compilers do tricks to make code faster. Not many do things to explicitly make code smaller. A "shrinking" compiler (or a disassembler/reassembler) that produced small code combined with an "expander" app that made the code bigger but faster could make very small apps.

    --
    Forward, retransmit, or republish anything I say here. Just don't misquote me.
    1. Re:How I'd do it by Hard_Code · · Score: 1

      'combined with an "expander" app'

      Otherwise known as JIT...

      --

      It's 10 PM. Do you know if you're un-American?
    2. Re:How I'd do it by zero_offset · · Score: 1
      A large portion of shipped executables are blocks of standard code from the compiler. If you're using a Microsoft compiler, you can strip out the standard chunks and pull those chunks in from the binaries that are already in Windows. If you're using another compiler, you can still probably do the same kind of thing: some intelligent block compression with the included code that'll do better than the "dumb" compression from "zip" or other algorithms.

      Actually, this is what the "sliding dictionary" is in LZW compression (and I assume in other types of compression, but I've only written LZW codecs in my life, and never studied compression formally). You scan through the target files looking for large chunks of redundant content. Through some process of weighting number-of-repeated-bytes versus number-of-occurrences, you assign progressively longer "codes" to each chunk of repeated bytes. Voilà, compression. The various flags to a compression application tell it how much scanning to do and whether to scan across files (both are speed/compression tradeoffs).

      The only thing novel in your suggestion (novel compared to standard compression) is trying to recover (decode) those chunks of content from locations other than the archive file itself. That would probably be very, very risky -- certainly not the type of risk a company like Adobe, for example, would be willing to take on an installer for one of the most popular pieces of software in the world, and more importantly (to them), their bread & butter.

      A lot of compilers do tricks to make code faster. Not many do things to explicitly make code smaller.

      Actually, it's common to have compiler switches to optimize for either space or performance, and even feature-specific switches...

      --

      Slashdot quality declines as the number of hot grits posts decreases. - Provolt's Law, Apr-09-2005

    3. Re:How I'd do it by pthisis · · Score: 1

      A lot of compilers do tricks to make code faster. Not many do things to explicitly make code smaller.

      Actually, it's common to have compiler switches to optimize for either space or performance, and even feature-specific switches

      And a lot of the time smaller _is_ faster--getting more into icache saves costly trips to RAM. This is becoming more and more true with modern processors, though there are still obvious space/time tradeoffs in many cases.

      Sumner

      --
      rage, rage against the dying of the light
  14. Speculating... by Black+Parrot · · Score: 1


    > It seems that they optimize each application individually at thieir labs. But an average of 50% compression on already compressed binary files seems to be too good to be true. Anyone familiar with how someone may be able to achieve this?

    Maybe they're just removing the bloat. I've read on comp.risks about a guy who disassembled Windows regedit and found embedded strings and even images, which were not actually used in the application program.

    But the link is not at all clear about what they are actually doing. For that matter their basic claim about how much compression they're getting vis-a-vis ordinary methods is very vaguely worded.

    --
    Sheesh, evil *and* a jerk. -- Jade
  15. good compression on binaries by Anonymous Coward · · Score: 0

    At a previous company I worked at, we had developed a (proprietary) method of compressing x86 binaries, which yielded on average a 5:1 compression ratio, when zlib usually only yields 2:1.

  16. I have a better algorithm by Skim123 · · Score: 2, Funny

    I have a far superior algorithm in both time and space complexity. Start with 1. Then simply transform it to the requisite number of 1s and 0s, a la 1101101001001. Bah to your two-step process. :-)

    --

    I could not justify my existence if I were a turkey farmer. Would I terminate myself? Undoubtably, yes.

  17. They don't make any claims of speed, so... by Anonymous Coward · · Score: 0

    Most of the standard compressors work at the byte level, and work only with small chunks of the file, for speed reasons. Obviously, releasing the above constraints may yield improvements on the order that they claim, but compression might take hours...

  18. Obligatory lzip mention by Strike · · Score: 1

    So, here's my bet: they probably do something like crack open a cab or zip, parse a PDF, for example, for 'magic things' that can be ignored without changing the functionality ('lossy' but nothing of significance lost), or take an HTML file and strip all spaces and newlines between tags. Similar things could be done for other file types: Removing quotes and instead, magic-quoting commas in a CDF. Etc, ad infinitum.


    Lossy compression eh? Get LZip for all your lossy file compression needs! It can reduce your file sizes up to 100%!
  19. Statistical encoders by afroborg · · Score: 2, Insightful

    I couldn't say for sure, but it's possible that they're just using a better coding scheme. ZIP et al use (as far as I know) variations on the LZ type compression algorithms. These are fast, but definitely not the best entropy removal methods available. Arithmetic coding OTOH is very effective, removes more entropy than LZ, LZW, or Huffman, but is slow because it needs to collect statistics on the entire file before compression. I dunno about decompression speed, though. Arithmetic coding is patented, same as LZW, so not just anyone can use it. Just my $0.02...

    --
    my sig could kick your sig's arse...
    1. Re:Statistical encoders by micromoog · · Score: 1
      entropy removal

      ??? - compression introduces entropy . . . a well-ordered file (little entropy) can be heavily compressed. The compressed version has more entropy per n bits than the original file.
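
The "more entropy per n bits" point can be measured directly. A minimal sketch, assuming a zeroth-order byte-frequency estimate of Shannon entropy (which ignores ordering and so is only a rough proxy) and a synthetic text input; `byte_entropy` and `sample_text` are made-up helper names.

```python
import math
import random
import zlib
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, estimated from byte frequencies alone."""
    counts = Counter(data)
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def sample_text(seed: int = 0, words: int = 20000) -> bytes:
    """Repeatable, well-ordered input: lowercase words drawn from a small vocabulary."""
    rng = random.Random(seed)
    letters = "abcdefghijklmnopqrstuvwxyz"
    vocab = [
        "".join(rng.choice(letters) for _ in range(rng.randint(4, 10)))
        for _ in range(64)
    ]
    return " ".join(rng.choice(vocab) for _ in range(words)).encode("ascii")

raw = sample_text()
packed = zlib.compress(raw, 9)

print(f"raw:    {len(raw)} bytes, {byte_entropy(raw):.2f} bits/byte")
print(f"packed: {len(packed)} bytes, {byte_entropy(packed):.2f} bits/byte")
```

The total information is unchanged, since the packed form decompresses back to the original. It is just carried in fewer bytes, so each byte is closer to the 8 bits/byte ceiling. That is also why a second compression pass finds so little left to remove.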

    2. Re:Statistical encoders by Adam+J.+Richter · · Score: 1
      I agree. Although the claims attributed to FEAD still sound much too good to be true on average, data compression has improved in the past decade or so with techniques like Prediction by Partial Match with unbounded length, made more practical by Esko Ukkonen's algorithm (published in the early 90's) for constructing suffix trees in linear time and linear space, making it much easier to find repeating substrings, and the Burrows-Wheeler transformation (discovered in the 80's, published in the early 90's).

      I'm not an algorithms expert, so I'll not try to explain the jargon in the preceding paragraph. Instead, I'll just cop out and say that now you know what terms to feed a search engine. I will, however, provide this link to bwtzip an experimental compressor covered by the GNU General Public License that uses the Burrows-Wheeler transformation, and this link page, mostly about suffix trees.

      I wish I could find it, but I recently read a paper that showed a pretty impressive comparison between some compressor that used a Prediction by Partial Match variant and arithmetic coding (probably not truly free, due to software patents on arithmetic coding) versus gzip and some other compressors.
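
Of the jargon above, the Burrows-Wheeler transformation is the easiest to show in a few lines. This is a naive textbook sketch in Python, assuming a `$` end marker that does not occur in the input. It sorts full rotations, which is far too slow for real files, and it is not how bwtzip or any production compressor implements it.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform: last column of the sorted rotation table."""
    assert sentinel not in text, "the end marker must not appear in the input"
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Rebuild the rotation table one column at a time, then pick the original row."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        # Prepend the known last column to every row and re-sort.
        table = sorted(ch + row for ch, row in zip(last_column, table))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]  # drop the end marker

encoded = bwt("banana")
print(encoded)               # annb$aa
print(inverse_bwt(encoded))  # banana
```

The transform does not shrink anything by itself. It is reversible, and it tends to group identical characters into runs, which a later stage can then code cheaply.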

    3. Re:Statistical encoders by Trolling4Dollars · · Score: 1

      Welcome to my friends list. (I like your SIG)

    4. Re:Statistical encoders by pthisis · · Score: 1

      I will, however, provide this link to bwtzip an experimental compressor covered by the GNU General Public License that uses the Burrows-Wheeler transformation

      bzip2 (included in most Linux distributions and well past the experimental phase) also uses a Burrows-Wheeler transform (with Huffman coding).

      bzip (not 2) used BW with arithmetic coding but was withdrawn because of potential patent problems with that combination (the bzip ari coding wasn't LZW and no concrete patent on bzip's ari coding was known, but there are enough different patents on various ari coding implementations that changing to Huffman was thought prudent).

      Sumner

      --
      rage, rage against the dying of the light
  20. Spaces/newlines in HTML add up QUICKLY by Anonymous Coward · · Score: 0
    or take an HTML file and strip all spaces and newlines between tags
    You say this almost as if it's a bad thing, or as if the newlines in HTML files are inconsequential and should be ignored. Let me counter with an example.

    My company uses freelance designers to create HTML templates for the sites we build and operate. The designers we work with typically do their graphics work first, then cut the graphics into pieces and build HTML-ized versions of the layout with WYSIWYG HTML editors such as Dreamweaver or GoLive. Having just recently realized the number of tabs, newlines, and spaces that the average WYSIWYG editor inserts into HTML documents, I've started a crusade to begin optimizing our sites one by one.

    So far, I've only "de-bloated" one of our sites. I was able to cut the size of the index.html from 14,719 bytes down to 9,252 bytes just by eliminating unnecessary spaces, tabs, and newlines. This particular template was built using Dreamweaver, and the designer apparently has his Dreamweaver preferences set to indent HTML. In terms of filesize, that translates to a great deal of wasted bytes (spaces used for indentation) in just about every "nested" HTML element there is, especially tables.

    Considering we get approximately 10,000 hits to this particular site each day, the savings add up. The initial 14,719 bytes minus the optimized 9,252 bytes means 5,467 fewer bytes per pageload which were comprised entirely of spaces, tabs, and newlines - junk as far as any browser is concerned. At 10K pageloads, that means more than 50 megs of saved bandwidth per day for this site alone. And that's just the index.
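
    The arithmetic above can be checked with a few lines of Python. The figures are the ones quoted in this comment; the variable names are only for illustration, and "megs" is taken to mean decimal megabytes.

```python
# Figures quoted in the comment above.
original_bytes = 14_719
optimized_bytes = 9_252
pageloads_per_day = 10_000

saved_per_load = original_bytes - optimized_bytes      # bytes of whitespace removed
saved_per_day = saved_per_load * pageloads_per_day     # bytes saved per day
percent_saved = 100 * saved_per_load / original_bytes  # share of the original page

print(f"saved per pageload: {saved_per_load} bytes")
print(f"saved per day:      {saved_per_day / 1_000_000:.2f} MB")
print(f"share of page:      {percent_saved:.1f}%")
```

    That works out to a bit over a third of the index page being whitespace.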

    We run more than 100 sites; by the time I get done stripping extraneous whitespace out of all of them, I seriously expect our bandwidth to be cut in half. If you're running a homepage on Geocities, sure, who cares... But when you're running a dedicated server and doing 100+ gigs a month of transfer, stop and think about how many of those gigs are useless. Spaces, tabs, newlines which are invisible to your visitors' browsers.

    I'd be willing to bet that the majority of websites on the internet could reduce their monthly bandwidth consumption by 25% or more if they'd remove unnecessary whitespace from the HTML files they're serving. Don't underestimate the waste that's taking place!
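
    For the curious, the kind of stripping described here can be sketched in a few lines of Python. This is a rough illustration, not the tool I actually used: the function name and sample markup are made up, and a blanket regex like this will also eat whitespace that matters, for example inside pre blocks or between inline elements, so treat it as a starting point only.

```python
import re

# Runs of whitespace that sit directly between a closing ">" and the next "<".
_BETWEEN_TAGS = re.compile(r">\s+<")


def strip_between_tags(html: str) -> str:
    """Remove spaces, tabs and newlines between adjacent tags.

    Assumption: the markup has no <pre> blocks or inline elements whose
    surrounding whitespace affects rendering.
    """
    return _BETWEEN_TAGS.sub("><", html).strip()


# Hypothetical indented table markup of the kind a WYSIWYG editor emits.
sample = "<table>\n    <tr>\n        <td>cell</td>\n    </tr>\n</table>\n"
stripped = strip_between_tags(sample)

print(len(sample.encode()), "bytes before,", len(stripped.encode()), "bytes after")
```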
    1. Re:Spaces/newlines in HTML add up QUICKLY by Elm+Tree · · Score: 1

      You think that's bad? Someone in my office gave me an HTML document generated from Excel. It started off at 640k. With some simple editing (removing whitespace, newlines, etc.) I was able to get it to 200k. With some more advanced techniques I got it down to 68k. It's now about 1/10th the original size. Moral of the story: Whitespace is bad.

    2. Re:Spaces/newlines in HTML add up QUICKLY by 1u3hr · · Score: 1
      Moral of the story: Whitespace is bad.

      My moral is that HTML from MSOffice is bloated like crazy and sucks badly in many other ways. I made a nice FAQ file, about 100k total, with some very simple CSS as the only styling. Another guy uses Word to edit it, it goes to 180k, and is so full of and nested and other tags that it's impossible to edit source any more.

  21. is UPX the same thing? by shish · · Score: 1

    AFAIK upx does pretty much the same thing, for free

    http://upx.sourceforge.net

    it generally gets 50-75% too. IIRC it prepends a really fast decompressor (faster than an HD read) to the compressed program.

    The only things I can think of to do better are to actually rearrange the binary in a more efficient order, or to go in at the assembler level and replace any repeated sequence of 5 or more instructions with a function call.

    --
    I mod down anyone who says "I will be modded down for this", regardless of the rest of their comment