How Image Compression Actually Works: A Technical Deep Dive
Every time you compress an image, complex mathematical transformations happen behind the scenes. Understanding these processes helps you make smarter decisions about quality settings and format choices.
1. The Two Fundamental Approaches
All image compression falls into two categories:
Lossy Compression
Permanently discards data the human eye is unlikely to notice. Achieves 60-95% reductions but cannot be reversed. Used by JPEG and lossy WebP.
Lossless Compression
Finds more efficient ways to represent the same data without losing a single bit. More modest 20-50% reductions but perfectly reversible. Used by PNG and lossless WebP.
The choice between lossy and lossless isn't about “good vs. bad” — it's about matching the compression strategy to your content type. Photographs tolerate lossy compression beautifully; graphics with sharp text edges do not.
2. Inside Lossy Compression: The JPEG Pipeline
JPEG compression is a masterclass in psychovisual engineering — exploiting the known limitations of human vision to discard data we won't miss.
Step 1: Color Space Conversion (RGB → YCbCr)
Your camera captures in RGB, but JPEG first converts to the YCbCr color space:
- Y (Luminance): Brightness — what your eyes are most sensitive to
- Cb (Blue Chrominance): Blue-ness relative to brightness
- Cr (Red Chrominance): Red-ness relative to brightness
Human vision is far more sensitive to brightness changes than to color changes. By separating these channels, JPEG can compress the color data much more aggressively than the brightness data. This single insight enables a large share of JPEG's compression efficiency.
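The conversion itself is a small linear transform. A minimal sketch using the BT.601 coefficients that JPEG/JFIF specifies (the clamp mirrors what real encoders do at the 8-bit range boundaries):

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to YCbCr (BT.601 coefficients, per JFIF)."""
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128
    # Clamp to the valid 8-bit range
    return tuple(min(255, max(0, round(v))) for v in (y, cb, cr))

print(rgb_to_ycbcr(255, 255, 255))  # pure white → (255, 128, 128): full brightness, neutral chroma
```

Note that neutral grays always map to Cb = Cr = 128, which is exactly what makes the chroma planes so compressible for low-color content.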
Step 2: Chroma Subsampling
JPEG then applies chroma subsampling — reducing color channel resolution while keeping brightness at full resolution.
| Pattern | Color Resolution | Reduction | Best For |
|---|---|---|---|
| 4:4:4 | Full | 0% | Graphics, sharp edges |
| 4:2:2 | Half horizontal | 33% | High-quality photography |
| 4:2:0 | Quarter | 50% | Web photos (most common) |
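The 4:2:0 pattern can be sketched as averaging each 2×2 block of a chroma plane (one common strategy; encoders may use other resampling filters):

```python
def subsample_420(chroma):
    """Halve a chroma plane in both dimensions by averaging 2x2 blocks (4:2:0)."""
    return [
        [
            (chroma[y][x] + chroma[y][x + 1]
             + chroma[y + 1][x] + chroma[y + 1][x + 1]) // 4
            for x in range(0, len(chroma[0]), 2)
        ]
        for y in range(0, len(chroma), 2)
    ]

# Each 2x2 chroma region collapses to one sample. Y stays at full resolution,
# so the total data drops from 3 full planes to 1 + 0.25 + 0.25 = 1.5 (50%).
print(subsample_420([[10, 20], [30, 40]]))  # → [[25]]
```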
Step 3: Discrete Cosine Transform (DCT)
The image is divided into 8×8 pixel blocks, and each block is transformed from spatial data (pixel positions) into frequency data (patterns of varying detail).
Think of it like audio: a recording contains bass notes (low frequency) and high-pitched details (high frequency). An image block similarly contains gradual gradients (low frequency) and fine edges (high frequency).
The DCT converts each 8×8 block into 64 frequency coefficients. The top-left coefficient holds the block's average value (the DC component); coefficients toward the bottom-right represent progressively finer detail (the AC components).
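A direct (unoptimized) implementation of the 2-D DCT-II makes the transform concrete. Production encoders use fast factorized versions, but the output is the same:

```python
import math

N = 8

def dct_8x8(block):
    """Naive 2-D DCT-II of an 8x8 block of 0-255 samples."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    shifted = [[v - 128 for v in row] for row in block]  # center samples on zero
    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency
        for v in range(N):      # horizontal frequency
            s = sum(
                shifted[i][j]
                * math.cos((2 * i + 1) * u * math.pi / (2 * N))
                * math.cos((2 * j + 1) * v * math.pi / (2 * N))
                for i in range(N)
                for j in range(N)
            )
            coeffs[u][v] = c(u) * c(v) * s
    return coeffs

# A perfectly flat block puts all of its energy into the DC coefficient:
out = dct_8x8([[130] * 8 for _ in range(8)])
print(round(out[0][0], 4))  # → 16.0; every AC coefficient is ~0 for a flat block
```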
Why 8×8 Blocks?
Smaller blocks reduce compression efficiency; larger blocks increase computational cost without proportional quality gains. The 8×8 size was standardized in 1992 as a balance between quality and the hardware of the era, and it remains a reasonable trade-off for JPEG, though newer formats revisit it with variable block sizes.
Step 4: Quantization — The Lossy Step
Each DCT coefficient is divided by a value from a quantization matrix, then rounded. The matrix has small divisors for important low-frequency components and large divisors for high-frequency details — many become zero, discarding fine detail entirely.
The “quality slider” controls this quantization aggressiveness. Quality 100: almost nothing lost. Quality 20: most detail removed — visible “blocky” artifacts appear.
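The step can be sketched with the standard JPEG luminance table from the specification's Annex K (roughly what libjpeg uses at quality 50; the quality slider scales this table up or down):

```python
# Standard JPEG luminance quantization table (ITU-T T.81, Annex K)
Q50 = [
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
]

def quantize(coeffs, table=Q50):
    """Divide each DCT coefficient by its table entry and round to an integer."""
    return [[round(coeffs[u][v] / table[u][v]) for v in range(8)] for u in range(8)]

# A strong low-frequency component survives; a weak high-frequency detail
# is divided by a large entry (99) and rounds away to zero:
coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0] = 208.0   # large low-frequency energy
coeffs[7][7] = 40.0    # small high-frequency detail
q = quantize(coeffs)
print(q[0][0], q[7][7])  # → 13 0
```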
Step 5: Entropy Coding
The final step applies lossless packing:
- Run-Length Encoding: Consecutive zeros stored as a single count
- Huffman Coding: Frequent values get shorter binary codes
- Zigzag Scanning: Diagonal reading groups zeros together for better RLE
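A sketch of the zigzag traversal and the run-length step (the Huffman stage is omitted for brevity): the zigzag walks the block's anti-diagonals, alternating direction, so the mostly-zero high-frequency coefficients cluster at the end of the sequence.

```python
def zigzag_indices(n=8):
    """(row, col) pairs in JPEG zigzag order: walk anti-diagonals, alternating direction."""
    return sorted(
        ((i, j) for i in range(n) for j in range(n)),
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )

def run_length(values):
    """Encode (zeros_skipped, value) pairs; trailing zeros collapse to an EOB marker."""
    out, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            out.append((run, v))
            run = 0
    if run:
        out.append("EOB")  # end of block: nothing but zeros remain
    return out

print(zigzag_indices()[:5])                   # → [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1)]
print(run_length([13, 0, 0, 5, 0, 0, 0, 0]))  # → [(0, 13), (2, 5), 'EOB']
```

The end-of-block marker is where heavily quantized blocks win big: a run of 50 trailing zeros costs a single symbol.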
3. Inside Lossless Compression: The PNG Pipeline
Stage 1: Predictive Filtering
PNG stores the difference between each pixel and a predicted value based on neighbors. Five filter types — None, Sub, Up, Average, Paeth — are tested per row. For images with similar-color regions, differences are mostly zero, enabling excellent compression.
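Two of the five predictors, sketched per the PNG specification (RFC 2083), where a is the byte to the left, b the byte above, and c the byte above-left:

```python
def filter_sub(row):
    """'Sub' filter: store each byte minus its left neighbour, mod 256."""
    return [(row[i] - (row[i - 1] if i > 0 else 0)) % 256 for i in range(len(row))]

def paeth_predictor(a, b, c):
    """'Paeth' filter's predictor: pick the neighbour closest to a + b - c."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c

# A smooth gradient becomes a run of identical small values,
# which the DEFLATE stage then compresses extremely well:
print(filter_sub([100, 101, 102, 103, 104]))  # → [100, 1, 1, 1, 1]
```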
Stage 2: DEFLATE Compression
Filtered data is compressed using DEFLATE (same as ZIP files):
- LZ77: Finds repeated byte sequences, replaces with back-references
- Huffman Coding: Shorter codes for frequent symbols
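Python's standard `zlib` module implements DEFLATE, so the payoff of Stage 1's filtering is easy to demonstrate on a synthetic gradient:

```python
import zlib

row = bytes(range(256)) * 8  # a smooth 2048-byte gradient, repeated
sub_filtered = bytes(
    [(row[i] - (row[i - 1] if i else 0)) % 256 for i in range(len(row))]
)

raw_size = len(zlib.compress(row))
filtered_size = len(zlib.compress(sub_filtered))
# The Sub-filtered version is almost entirely the byte 1, so LZ77 and
# Huffman coding crush it; the unfiltered gradient compresses far worse.
print(filtered_size < raw_size)  # → True
```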
Why PNG Files Vary So Much
A 500×500 screenshot with 10 colors might be 15KB as PNG but 60KB as JPEG (complete with ringing artifacts around the text). A 500×500 photo might be 600KB as PNG but 60KB as JPEG. Efficiency depends on how predictable adjacent pixels are.
4. The Modern Era: WebP
Google's WebP borrows techniques from the VP8 video codec to surpass both JPEG and PNG:
- Variable block sizes: 4×4 to 16×16, adapting to content
- Intra-frame prediction: Predicts blocks from already-decoded neighbors
- Arithmetic coding: More efficient than Huffman (extra 5-10% reduction)
- Loop filtering: Smooths block boundaries to reduce visible artifacts
Result: files 25-35% smaller than JPEG at equivalent quality, and 26% smaller than PNG losslessly.
5. Bringing It to the Browser: WebAssembly
WebAssembly enables near-native performance directly in browsers, eliminating the need for server-side processing.
Native Speed
Wasm runs at 80-95% of native C/C++ speed. Encoders like MozJPEG and libwebp are compiled to Wasm for browser execution.
Privacy by Architecture
All processing happens in the browser's sandboxed runtime. Image data never touches a server — an architectural guarantee.
When you drop an image onto InstaShrink:
- File API: Browser reads the image locally — no network request
- Canvas Decoding: Browser converts file into raw pixel data
- Wasm Compression: Full DCT/quantization/entropy pipeline runs locally
- Blob Output: Compressed file assembled in browser memory
- Download: File offered via generated object URL — no network
6. Practical Takeaways
Quality 80 ≠ 80% of original quality
Settings aren't linear. Dropping from quality 95 to 80 typically cuts file size by around 60% with barely visible changes. The “sweet spot” for web images is usually 75-85.
Never re-compress a JPEG
Each cycle runs quantization on degraded data, compounding artifacts (“generation loss”). Always compress from the original.
Format choice matters more than compression level
The right format at moderate compression always outperforms the wrong format at maximum compression.