The Compression Test
Comparison of Compression Libraries:
Friday, June 06, 2003
By Baird Hendrix
Alias Redwolf
There are numerous compression algorithms in use on the Internet. Many are free to use; others are commercial applications that require the user to pay. Lately, I've seen a variety of compression schemes in use, and each one's users claim theirs is the 'best'. I propose to settle that argument. I won't bore you with the details of LZ77/LZ78 compression or the Burrows-Wheeler block-sorting algorithm; I'll just tell you what works, plus where these formats are used.
In this article, I will compare the following compression schemes:
ZIP
BZIP2
GZIP
RAR
ACE
JAR
ARJ
CAB (Microsoft Cabinet)
SITX (StuffIt)
LHA/LZH
BH
To test the archivers, I will compress the following:
1) 1,824 text files totalling 13,255,761 bytes
2) The uncompressed DOA Beach Volleyball trailer, weighing in at 1,265,647 KB
3) The compressed DOA Beach Volleyball trailer, weighing in at 35,690 KB
4) The Descent® program directory with the LOTW set included (31.9 MB; the 1994 game with the 1.4 patch)
5) The StarCraft: Brood War directory, stock, upgraded to the latest version (1.1, I believe; 116,135 KB)
Sizes of the TAR files (tests 2 and 3 are single files):
1) 14,350 KB
4) 33,060 KB
5) 116,280 KB
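Why is the text TAR (14,350 KB) so much bigger than the 13,255,761 bytes of raw text? TAR writes a 512-byte header for every member and pads each file's data up to a multiple of 512 bytes, so 1,824 small files add a lot of overhead before any compression happens. Below is a minimal sketch using Python's standard `tarfile` module as a stand-in (it is not what I used for the tests); the `tar_directory` helper and the sample files are hypothetical, purely for illustration.

```python
import io
import os
import tarfile
import tempfile


def tar_directory(src_dir: str) -> bytes:
    """Pack src_dir into an uncompressed TAR, storing only relative paths."""
    buf = io.BytesIO()
    top = os.path.basename(os.path.normpath(src_dir))
    # Mode "w" writes a plain, uncompressed tar stream.
    with tarfile.open(fileobj=buf, mode="w") as tar:
        # arcname replaces the absolute path with the directory's own name,
        # so members are stored as e.g. "texts/file0.txt".
        tar.add(src_dir, arcname=top)
    return buf.getvalue()


# Demo: three 1,200-byte text files in a temporary "texts" directory.
with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "texts")
    os.mkdir(root)
    for i in range(3):
        with open(os.path.join(root, f"file{i}.txt"), "wb") as f:
            f.write(b"hello world\n" * 100)
    data = tar_directory(root)

# Per-member headers and 512-byte padding make the TAR larger than the raw bytes.
print(len(data), "bytes of TAR for 3,600 bytes of text")
```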
Programs used:
WinRAR 3.20
WinACE v2.2
PowerArchiver 2002
ARJ 2.81a
UC2R3
bzip2 1.0.2
None of the archives store the full (absolute) directory structure, only relative paths. I used maximum compression wherever the program offered it.
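For anyone who wants to reproduce the "relative paths, maximum compression" setup in a script, here is a rough sketch with Python's standard `zipfile` module. It assumes ZIP's usual Deflate method at level 9; it is not PowerArchiver, whose exact settings I can't vouch for, and `zip_directory` is a hypothetical helper name.

```python
import io
import os
import tempfile
import zipfile


def zip_directory(src_dir: str) -> bytes:
    """Zip every file under src_dir using relative names and maximum Deflate."""
    buf = io.BytesIO()
    # compresslevel=9 is zlib's highest setting (supported since Python 3.7).
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED,
                         compresslevel=9) as zf:
        for dirpath, _dirnames, filenames in os.walk(src_dir):
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                # Store the path relative to src_dir instead of the full path.
                zf.write(full, arcname=os.path.relpath(full, src_dir))
    return buf.getvalue()


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(tmp, rel), "wb") as f:
            f.write(b"sample text " * 50)
    archive = zip_directory(tmp)

with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    print(zf.namelist())  # ['a.txt', 'sub/b.txt']
```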
The results:
Since ZIP is the de facto standard and, I suspect, the weakest, I will use ZIP as the control group. Here are the results:
Note that the ratio is the size of the compressed file as a percentage of the size of the uncompressed file. For example, if a 2 KB file compresses to 1 KB, the ratio is 50%: the archive is 50% of the original size.
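The ratio is just compressed size divided by original size, times 100. One small wrinkle: using the ZIP figures for the uncompressed DOA trailer, 605,728 / 1,265,647 works out to about 47.86%, yet WinRAR reports 47%. That suggests (judging only from these numbers, not from any WinRAR documentation) that WinRAR truncates rather than rounds. The `ratio_percent` name below is hypothetical.

```python
import math


def ratio_percent(compressed: float, original: float) -> float:
    """Compressed size as a percentage of the original size (lower is better)."""
    return compressed / original * 100


# The 2 KB -> 1 KB example from the text.
print(ratio_percent(1, 2))  # 50.0

# ZIP on the uncompressed DOA trailer (sizes in KB, from the ZIP results).
r = ratio_percent(605_728, 1_265_647)
print(f"{r:.2f}% -> truncated {math.floor(r)}%, rounded {round(r)}%")
```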
Ratios according to WinRAR (lower is better):
ZIP - Compressed with PowerArchiver
Text files - 3,898 KB Ratio 28%
Dead or Alive Uncompressed - 605,728 KB Ratio 47%
Dead or Alive Compressed - 34,600 KB Ratio 96%
Descent - 12,931 KB Ratio 39%
Starcraft - 112,314 KB Ratio 96%
Now that we have the control, it's time to test the rest of the compression schemes. We'll start with the ones that compress a TAR file (BZIP2 and GZIP).
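BZIP2 and GZIP each compress a single stream and have no notion of multiple files, so a directory is first bundled into a TAR and the whole TAR is then compressed. The sketch below illustrates that with Python's standard `tarfile` module (modes `"w:gz"` and `"w:bz2"`); the `pack` helper and the sample files are hypothetical and stand in for what PowerArchiver does internally.

```python
import io
import tarfile


def pack(files: dict[str, bytes], mode: str) -> bytes:
    """Write files into a TAR; mode "w:gz"/"w:bz2" compresses the whole TAR stream."""
    buf = io.BytesIO()
    # compresslevel is only accepted for compressed modes; 9 is the maximum.
    kwargs = {} if mode == "w" else {"compresslevel": 9}
    with tarfile.open(fileobj=buf, mode=mode, **kwargs) as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


files = {
    f"docs/file{i}.txt": (f"Line {i}: the quick brown fox\n" * 200).encode()
    for i in range(20)
}
sizes = {mode: len(pack(files, mode)) for mode in ("w", "w:gz", "w:bz2")}
print(sizes)
```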
BZIP2 - Compressed with PowerArchiver
Text Files - 3,233 KB Ratio 24%
ZIP - 3,898 KB Ratio 28%
Dead or Alive Uncompressed - 553,827 KB Ratio 43%
ZIP - 605,728 KB Ratio 47%
Dead or Alive Compressed - 34,733 KB Ratio 97%
ZIP - 34,600 KB Ratio 96%
Descent - 13,440 KB Ratio 41%
ZIP - 12,931 KB Ratio 39%
Starcraft - 113,423 KB Ratio 97%
ZIP - 112,314 KB Ratio 96%
Well, that was disappointing. Not only were the gains, if any, minimal, but compression took much longer than ZIP (at least twice as long). The text files did compress better, however.
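Since the speed difference keeps coming up, here is one way to actually time it. This is a rough sketch, assuming Python's standard `zlib` (Deflate, as used by ZIP and GZIP) and `bz2` modules as stand-ins for the archivers; `best_time` is a hypothetical helper, the sample data is synthetic, and real timings will vary by machine.

```python
import bz2
import time
import zlib


def best_time(compress, data: bytes, repeats: int = 3) -> tuple[int, float]:
    """Return (compressed size, fastest wall-clock time in seconds) over several runs."""
    best = float("inf")
    size = 0
    for _ in range(repeats):
        start = time.perf_counter()
        out = compress(data)
        best = min(best, time.perf_counter() - start)
        size = len(out)
    return size, best


data = b"".join(f"record {i}: fairly repetitive sample text\n".encode()
                for i in range(20_000))
results = {
    "deflate -9": best_time(lambda d: zlib.compress(d, 9), data),
    "bzip2 -9": best_time(lambda d: bz2.compress(d, 9), data),
}
for name, (size, secs) in results.items():
    print(f"{name:10s} {size:>9,d} bytes  {secs * 1000:7.1f} ms")
```

Taking the fastest of several runs reduces noise from other programs competing for the CPU.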
GZIP - Compressed with PowerArchiver
Text Files - 3,489 KB Ratio 26%
ZIP - 3,898 KB Ratio 28%
Dead or Alive Uncompressed - 607,419 KB Ratio 47%
ZIP - 605,728 KB Ratio 47%
Dead or Alive Compressed - 34,601 KB Ratio 96%
ZIP - 34,600 KB Ratio 96%
Descent - 13,084 KB Ratio 40%
ZIP - 12,931 KB Ratio 39%
Starcraft - 112,361 KB Ratio 96%
ZIP - 112,314 KB Ratio 96%
GZIP fared much better than BZIP2: its compression is roughly equal to ZIP's, and so is the time it takes. For some reason, though, the DOA-Uncompressed test still took twice as long as the ZIP compression did. I may retry the BZIP2 and GZIP tests in Linux, just to see.
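GZIP and ZIP matching so closely is not surprising: both normally use the same Deflate algorithm (assuming PowerArchiver used standard Deflate for its ZIP files) and differ mainly in the container around it. A small sketch with Python's standard `zlib` and `gzip` modules shows the same data at level 9 ending up only a few bytes apart, essentially the header and trailer.

```python
import gzip
import zlib

data = b"".join(f"line {i}: compression test data\n".encode() for i in range(10_000))

zlib_out = zlib.compress(data, 9)                # Deflate in a zlib wrapper
gzip_out = gzip.compress(data, compresslevel=9)  # Deflate in a gzip wrapper

print(len(zlib_out), len(gzip_out), len(gzip_out) - len(zlib_out))
```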
CAB - Compressed with PowerArchiver
Text Files - 2,727 KB Ratio 21%
ZIP - 3,898 KB Ratio 28%
Dead or Alive Uncompressed - 431,687 KB Ratio 34%
ZIP - 605,728 KB Ratio 47%
Dead or Alive Compressed - 34,226 KB Ratio 95%
ZIP - 34,600 KB Ratio 96%
Descent - 11,071 KB Ratio 33%
ZIP - 12,931 KB Ratio 39%
Starcraft - 102,801 KB Ratio 92%
ZIP - 112,314 KB Ratio 96%
The first thing I noticed right off the bat is that compressing a CAB file at the maximum setting takes much, much longer than compressing a ZIP file. Perhaps I should have timed these. However, the compression is the best so far. The DOA-Uncompressed test gave the most dramatic results, and CAB also set a new best for the DOA-Compressed test, at 95%.
RAR - Compressed with WinRAR
All of these are solid archives except the DOA-Uncompressed one, since solid mode seemed to double the time there, and it makes no difference for a single file anyway.
Text Files - 2,591 KB Ratio 19%
ZIP - 3,898 KB Ratio 28%
Dead or Alive Uncompressed - 292,292 KB Ratio 23%
ZIP - 605,728 KB Ratio 47%
Dead or Alive Compressed - 34,534 KB Ratio 96%
ZIP - 34,600 KB Ratio 96%
Descent - 11,005 KB Ratio 33%
ZIP - 12,931 KB Ratio 39%
Starcraft - 107,779 KB Ratio 92%
ZIP - 112,314 KB Ratio 96%
Compression here is roughly the same as CAB's, except for the Text and DOA-Uncompressed tests, on which RAR did very well. The DOA-Uncompressed result was quite surprising, since I didn't expect that much compression. However, since WinRAR is shareware, I must give the kudos to CAB compression, although RAR compression is faster.
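About the solid-archive note at the top of the RAR results: a solid archive compresses all files as one continuous stream, so redundancy shared *between* files (common headers, similar text) can be exploited, while a normal archive starts each file from scratch. The sketch below illustrates that principle with Python's `zlib` (Deflate, 32 KB window) rather than RAR's own algorithm; the helper names and the sample files are hypothetical. It also shows why solid mode changes nothing for a single file.

```python
import zlib


def nonsolid_size(files: list[bytes]) -> int:
    # Each file is compressed on its own, so nothing learned from one file
    # helps with the next.
    return sum(len(zlib.compress(f, 9)) for f in files)


def solid_size(files: list[bytes]) -> int:
    # All files go through one compressor, so later files can reuse matches
    # found in earlier ones (within Deflate's 32 KB window).
    return len(zlib.compress(b"".join(files), 9))


# 200 small, similar files, a bit like a directory of text or config files.
files = [f"entry {i}\n".encode() + b"shared boilerplate header line\n" * 40
         for i in range(200)]
print(nonsolid_size(files), solid_size(files))
```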
ACE - Compressed with WinACE
Text Files - 3,117 KB Ratio 23%
ZIP - 3,898 KB Ratio 28%
Dead or Alive Uncompressed - 459,025 KB Ratio 36%
ZIP - 605,728 KB Ratio 47%
Dead or Alive Compressed - 34,713 KB Ratio 97%
ZIP - 34,600 KB Ratio 96%
Descent - 11,201 KB Ratio 34%
ZIP - 12,931 KB Ratio 39%
Starcraft - 108,386 KB Ratio 93%
ZIP - 112,314 KB Ratio 96%
First thing you should know: do NOT try to create more than one ACE archive at once. Big mistake. You WILL crash your machine. Kids, don't do more than one ACE.
Now for the results. Here's a pop quiz for all of you. What is an overpriced, resource-hogging compression scheme that bested ZIP, but then got bested by both the RAR and CAB compressions? If you answered ACE compression, you’d be right. Next!
SITX - Compressed with StuffIt
NOTE: Just to see how it goes, I used the default setting 'Best Binary Compression' for all tests except the Text test, for which I used 'Best Text Compression'. Also, WinRAR doesn't recognize .SITX files, so the ratios here were calculated manually (on a TI-36X, if you want to be picky) against the uncompressed .tar file.
Text Files - 2,540 KB Ratio 18%
ZIP - 3,898 KB Ratio 28%
Dead or Alive Uncompressed - 414,630 KB Ratio 33%
ZIP - 605,728 KB Ratio 47%
Dead or Alive Compressed - 34,350 KB Ratio 96%
ZIP - 34,600 KB Ratio 96%
Descent - 10,254 KB Ratio 31%
ZIP - 12,931 KB Ratio 39%
Starcraft - 109,031 KB Ratio 94%
ZIP - 112,314 KB Ratio 96%
Another pop quiz for those of you still reading. What compression scheme bested RAR in two categories, lost in two, and yet was STILL unable to break 95% on the DOA-Compressed test? If you answered LZH or ACE, then you're either blind and can't read this anyway, or illiterate and, well, can't read this anyway. Which raises a paradox: how did you know to say ANYTHING in the first place? If you answered StuffIt, Johnny has your prize waiting in the back room. StuffIt has the best Text compression so far, probably because it has a setting optimized for text. StuffIt took absolutely FOREVER on the DOA-Uncompressed test. Yes, I should have timed these.
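The manually calculated StuffIt ratios can be reproduced by dividing each .SITX size by the TAR size (or by the single file's size for the two DOA tests) and rounding to the nearest whole percent. This is just a check of the arithmetic in the StuffIt table, using only numbers from this article.

```python
# Sizes in KB from the StuffIt results; baselines are the TAR sizes, or the
# original file size for the single-file DOA tests.
sitx = {
    "Text Files": (2_540, 14_350),
    "DOA Uncompressed": (414_630, 1_265_647),
    "DOA Compressed": (34_350, 35_690),
    "Descent": (10_254, 33_060),
    "Starcraft": (109_031, 116_280),
}
# Round to the nearest whole percent, as a calculator check would.
ratios = {name: round(c / u * 100) for name, (c, u) in sitx.items()}
for name, r in ratios.items():
    print(f"{name:17s} {r}%")
```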
LHA/LZH - Compressed with PowerArchiver
Text Files - 3,879 KB Ratio 28%
ZIP - 3,898 KB Ratio 28%
Dead or Alive Uncompressed - 608,794 KB Ratio 48%
ZIP - 605,728 KB Ratio 47%
Dead or Alive Compressed - 34,571 KB Ratio 97%
ZIP - 34,600 KB Ratio 96%
Descent - 13,290 KB Ratio 40%
ZIP - 12,931 KB Ratio 39%
Starcraft - 113,688 KB Ratio 97%
ZIP - 112,314 KB Ratio 96%
Well, this scheme turned out to be another disappointment. It beat ZIP compression in only one category, the text compression. To recap: so far, the CAB and RAR compression schemes have turned out to be the best.
BH - Compressed with PowerArchiver
Text Files - 3,804 KB Ratio 26%
ZIP - 3,898 KB Ratio 28%
Dead or Alive Uncompressed - 606,266 KB Ratio 47%
ZIP - 605,728 KB Ratio 47%
Dead or Alive Compressed - 34,600 KB Ratio 96%
ZIP - 34,600 KB Ratio 96%
Descent - 12,959 KB Ratio 39%
ZIP - 12,931 KB Ratio 39%
Starcraft - 112,303 KB Ratio 96%
ZIP - 112,314 KB Ratio 96%
No, I've never heard of BH either. Apparently, it's only about as good as ZIP compression. Only the Text test differed from ZIP by more than 1%, and even then the times were about the same.
JAR - Compressed with WinACE
Text Files - 3,912 KB Ratio 28%
ZIP - 3,898 KB Ratio 28%
Dead or Alive Uncompressed - 608,914 KB Ratio 48%
ZIP - 605,728 KB Ratio 47%
Dead or Alive Compressed - 34,556 KB Ratio 96%
ZIP - 34,600 KB Ratio 96%
Descent - 12,989 KB Ratio 39%
ZIP - 12,931 KB Ratio 39%
Starcraft - 112,291 KB Ratio 96%
ZIP - 112,314 KB Ratio 96%
This is the last compression option in WinACE, so we have to try it for variety's sake. It's pretty much just another name for ZIP, except for the DOA-Uncompressed test.
ARJ - Compressed with ARJ (on -jm1 setting)
Text Files - 3,844 KB Ratio 29%
ZIP - 3,898 KB Ratio 28%
Dead or Alive Uncompressed - 604,991 KB Ratio 47%
ZIP - 605,728 KB Ratio 47%
Dead or Alive Compressed - 34,622 KB Ratio 97%
ZIP - 34,600 KB Ratio 96%
Descent - 12,944 KB Ratio 39%
ZIP - 12,931 KB Ratio 39%
Starcraft - 113,571 KB Ratio 97%
ZIP - 112,314 KB Ratio 96%
I'm amazed by the speed of this compression. It compressed the Descent test at the same ratio in half the time ZIP took, and the Text test compressed very quickly. The only test with no real speed improvement was DOA-Uncompressed. A great alternative to ZIP for those of us on the go.
BZIP2 Unix test - Compressed with BZIP2 1.0.2 (-zv9 option used)
Text Files - 2,937 KB Ratio 20%
ZIP - 3,898 KB Ratio 28%
Dead or Alive Uncompressed - 456,460 KB Ratio 36%
ZIP - 605,728 KB Ratio 47%
Dead or Alive Compressed - 34,358 KB Ratio 96%
ZIP - 34,600 KB Ratio 96%
Descent - 12,283 KB Ratio 37%
ZIP - 12,931 KB Ratio 39%
Starcraft - 110,815 KB Ratio 95%
ZIP - 112,314 KB Ratio 96%
As I suspected, BZIP2 does better when compressed with the bzip2 program itself than with PowerArchiver. This significantly upgrades its rating: it's now mediocre to pretty good.
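A note on the -zv9 option: -z forces compression, -v is verbose, and -9 selects bzip2's largest block size, 900 k (-1 is 100 k). The Burrows-Wheeler transform only finds redundancy inside a block, so block size can matter a lot on big files. Python's standard `bz2` module uses the same 1 to 9 scale for `compresslevel`. The sketch below makes that visible with synthetic data: 300 KB of random bytes repeated once. With 100 k blocks the two copies never share a block, so nothing compresses; with 900 k blocks they do. This only illustrates the block-size effect; it says nothing definite about what PowerArchiver did differently.

```python
import bz2
import os

# 300 KB of random bytes, repeated once: the two copies are 300 KB apart.
chunk = os.urandom(300_000)
data = chunk + chunk

small_blocks = bz2.compress(data, compresslevel=1)  # like `bzip2 -1`: 100 k blocks
large_blocks = bz2.compress(data, compresslevel=9)  # like `bzip2 -9`: 900 k blocks

print(len(data), len(small_blocks), len(large_blocks))
```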
The winner:
Best Compression:
-=~{RAR}~=-
Hooray for RAR! It has the best compression of the bunch. The runners-up are CAB and StuffIt, and CAB is the best 'free' option.
Hope you've enjoyed it. I promise any future tests WILL include timings. Thanks for reading!
-edit- Spelling fixed.
Last edited by Redwolf; 06-10-2003 at 07:17 AM.