JPEG Decompressors


Michael Herf and Shawn Brenneman
December 2001

If you want to decompress JPEGs really fast, you have a few free options to choose from.

I was hooking up Intel's JPEG library, and on the image I was using, I noticed a significant artifact. Here's the sample I sent to Intel:

Image Copyright 2000 Phil Askey. Please see www.dpreview.com for the original image.
I'm assuming that IE5 uses the decompressor from the Independent JPEG Group, since the IE About box credits it.

In any case, through my friend at Intel I got this response:

By default, the IJL uses IJL_BOX_FILTER upsampling method. To get better quality you can use IJL_TRIANGLE_FILTER method. Note, IJL_TRIANGLE_FILTER is slower than IJL_BOX_FILTER upsampling.
Fair enough.

I enabled IJL_TRIANGLE_FILTER, but then decided I should really do a more comprehensive test of JPEG decompression speed.
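To make the box versus triangle distinction concrete, here is a minimal sketch of both filters for one row of 2:1 horizontally subsampled chroma. The function names are hypothetical. The 3/4 and 1/4 weights are the ones IJG's h2v1_fancy_upsample uses, as far as I know; that IJL's IJL_TRIANGLE_FILTER works the same way is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Box filter: every chroma sample is simply repeated twice.
 * Cheap, but it produces blocky color edges. */
void upsample_h2v1_box(const uint8_t *in, size_t n, uint8_t *out)
{
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = in[i];
        out[2 * i + 1] = in[i];
    }
}

/* Triangle filter: each output is 3/4 of the nearer input sample plus
 * 1/4 of the further one. At the row ends the missing neighbor is
 * replaced by the current sample, so the edge values pass through. */
void upsample_h2v1_triangle(const uint8_t *in, size_t n, uint8_t *out)
{
    for (size_t i = 0; i < n; i++) {
        int cur  = in[i];
        int prev = (i > 0)     ? in[i - 1] : cur;
        int next = (i + 1 < n) ? in[i + 1] : cur;
        /* The +1 / +2 rounding terms alternate to avoid a bias. */
        out[2 * i]     = (uint8_t)((3 * cur + prev + 1) >> 2);
        out[2 * i + 1] = (uint8_t)((3 * cur + next + 2) >> 2);
    }
}
```

Both functions read n samples and write 2*n. The triangle version does only a little more arithmetic per sample, which is part of why a large slowdown from enabling it is surprising.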

The Benchmark

For this benchmark, I'm only using the image above (a quite large 2160x1440 image). If your needs lean towards decompressing small images, you shouldn't rely on these timings at all. But if you're decompressing large files, say video-resolution or greater, these numbers should be quite applicable.

Also, we tested with just one picture. All timings were done on an Intel P3/600EB, etc. Mileage, as usual, will vary greatly by CPU and memory subsystem.

OS: Win2k.
Filesystem: NTFS.
IE: IE5.5 SP2 (relevant because OleLoadPicture uses IE components to decompress, I think.)

Decompressors

I used four decompressors for this test:
  1. Windows OleLoadPicture (see the CodeProject article on this, since MSDN is such a disaster these days), adapted for a DIBSection,
  2. IJG 6b, the Independent JPEG Group decompressor,
  3. IJL 1.0, because lots of people use it, and I wanted to see how much faster IJL has gotten since, and
  4. IJL 1.51, the current Intel JPEG Library. (Note: 1.51 is very important to get, since they had lots of optimizations turned off in 1.5.)

If you're writing code where you care about size, the overhead for using each of these is as follows:
  1. OleLoadPicture: 8K + Internet Explorer
  2. IJG 6b: 40-50K statically linked
  3. IJL10.dll: 148K DLL only.
  4. IJL15l.lib: 292K statically linked
  5. IJL15.dll: 344K DLL.

Off to the races

All timings include the following:
  1. Initializing the decompressor (with DLLs already linked),
  2. Reading a file from disk (with a warm cache),
  3. Parsing the JPEG to determine its size,
  4. Allocating a 32-bit ARGB image using either CreateDIBSection or malloc,
  5. Decompressing the image into the 32-bit space directly, OR decompressing to 24 bits and making a second pass to expand.
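Step 3 in the list above is something every library does for you, but it is cheap to do by hand. As an illustration only (this is not the code used in the benchmark, and jpeg_dimensions is a hypothetical name), here is a sketch that walks the JPEG marker segments in a memory buffer until it reaches a start-of-frame (SOF) header, which carries the height and width.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 and fills *width and *height if a frame header is found,
 * 0 if the buffer is not a JPEG or is truncated before the header. */
int jpeg_dimensions(const uint8_t *data, size_t len,
                    unsigned *width, unsigned *height)
{
    size_t pos = 2;

    if (len < 4 || data[0] != 0xFF || data[1] != 0xD8)
        return 0;                         /* no SOI marker */

    while (pos + 4 <= len) {
        uint8_t marker;
        size_t seglen;

        if (data[pos] != 0xFF)
            return 0;                     /* lost marker sync */
        marker = data[pos + 1];
        if (marker == 0xFF) {             /* fill byte, skip it */
            pos++;
            continue;
        }
        if (marker == 0x01 || (marker >= 0xD0 && marker <= 0xD8)) {
            pos += 2;                     /* markers without a payload */
            continue;
        }
        if (marker == 0xD9 || marker == 0xDA)
            return 0;                     /* EOI or scan data before SOF */

        /* Big-endian segment length, counting its own two bytes. */
        seglen = ((size_t)data[pos + 2] << 8) | data[pos + 3];
        if (seglen < 2 || pos + 2 + seglen > len)
            return 0;

        /* SOF0..SOF15, except DHT (C4), JPG (C8) and DAC (CC). */
        if (marker >= 0xC0 && marker <= 0xCF &&
            marker != 0xC4 && marker != 0xC8 && marker != 0xCC) {
            if (seglen < 7)
                return 0;
            /* Payload: precision (1), height (2), width (2), ... */
            *height = ((unsigned)data[pos + 5] << 8) | data[pos + 6];
            *width  = ((unsigned)data[pos + 7] << 8) | data[pos + 8];
            return 1;
        }
        pos += 2 + seglen;
    }
    return 0;
}
```

With the dimensions known, the 32-bit destination can be allocated up front (width * height * 4 bytes, before any row padding a DIB might need).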

32 Bit?

Not everybody uses 32-bit images, but it's quite important to me. I think the overhead for the 24->32 bit conversion is relatively similar between the decompressors, so I haven't hurt anyone in particular by doing this.

The decompressors all required different techniques for the 24->32 bit conversion.

In particular, the most expensive conversion was an in-place 24->32 bit conversion, which took about 100ms to complete. This required a full pass through the image after it had decompressed, which was definitely out-of-cache. All of the IJL timings have this overhead. IJL does have its own 32-bit modes, but they were considerably slower, so we rolled our own.

Windows can draw directly to an ARGB bitmap, so we let it do that. IJG gives each scanline back in a separate buffer, so we did the conversions on the fly (probably faster, because the cache locality is just so much better.)
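The two conversion strategies described above can be sketched as follows. This is a reconstruction under assumptions, not the original Unpack code: the function names are hypothetical, pixels are assumed tightly packed with no row padding, the three color bytes are copied in whatever order the decompressor produced them, and the fourth byte is set to 0xFF (opaque), which the original may or may not have done.

```c
#include <stddef.h>
#include <stdint.h>

/* In-place expansion, as needed after a whole-image 24-bit decode.
 * buf must have room for pixel_count * 4 bytes and hold
 * pixel_count * 3 bytes of packed pixels at its start. Walking from
 * the last pixel to the first means no unread source byte is
 * overwritten. */
void expand_24_to_32_in_place(uint8_t *buf, size_t pixel_count)
{
    for (size_t i = pixel_count; i-- > 0; ) {
        /* Read all three bytes first: for pixel 0 the source and
         * destination ranges overlap. */
        uint8_t c0 = buf[3 * i];
        uint8_t c1 = buf[3 * i + 1];
        uint8_t c2 = buf[3 * i + 2];
        buf[4 * i]     = c0;
        buf[4 * i + 1] = c1;
        buf[4 * i + 2] = c2;
        buf[4 * i + 3] = 0xFF;
    }
}

/* Per-scanline expansion, as used when the decoder hands back one
 * row at a time. The row is still in cache when it is converted. */
void expand_scanline_24_to_32(const uint8_t *src, uint8_t *dst, size_t width)
{
    for (size_t x = 0; x < width; x++) {
        dst[4 * x]     = src[3 * x];
        dst[4 * x + 1] = src[3 * x + 1];
        dst[4 * x + 2] = src[3 * x + 2];
        dst[4 * x + 3] = 0xFF;
    }
}
```

For the 2160x1440 test image the in-place pass touches about 12 MB, far more than the cache on a P3, which fits the roughly 100ms cost reported above. The scanline version works on a few kilobytes at a time.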

And the winner...

Decompressor                     Time (ms)   Comments
IJL 1.51 with box filter         364         Has box filter artifacts
IJL 1.0                          580         Has box filter artifacts, other bugs
IJG 6b with JDCT_IFAST           615         Lower quality
IJG 6b with JDCT_ISLOW           680         Good quality
IJL 1.51 with triangle filter    1010        Good quality
OleLoadPicture                   1126        Good quality
IJG 6b with JDCT_FLOAT           1270        Good quality

Wow.

Comments and Recommendations

Well, if you want a high-quality JPEG loader that's fast on mid-range machines, use the Independent JPEG Group's. Surprised?

Well, I am. Actually, this is doubly surprising, because in my profile, IJG's _h2v1_fancy_upsample (the "triangle filter" that takes Intel so long), only uses about 50ms total for IJG. Go figure? Maybe Intel just doesn't optimize that path much at all.

Of course, the speed of the new IJL 1.51 is truly phenomenal, especially when you consider that our 24->32 bit Unpack function is using 100ms on top of the decompress. So when you want speed at all costs, or for video applications, definitely use IJL 1.51.

Intel, however, does cost a lot in terms of code size. Also, they have a Clause of Evil in their license agreement:

"Upon Intel's release of an update, upgrade or new version of the Materials, you will make reasonable efforts to discontinue distribution of the enclosed Materials and you will make reasonable efforts to distribute such updates, upgrades or new versions to your customers who have received the Materials herein."
Well, it makes sense. They want you to distribute bug-free code (though their code still leaks about 96 bytes per JPEG.)

But why is this bad?

Well...

IJL10.DLL was 148KB.
IJL15.DLL is 344KB.

And you're obligated to upgrade all your users at your earliest convenience. So if you care about code size at all, you're at Intel's whim on this one. I'm not complaining about the performance, but it could be a nasty slide to fall down.

But, hmm, I guess VJPEG will be using IJG for a while.