MidManStudio | Standard compression research benchmark suite
Build: #281
Branch: master
Commit: e17cd93d
Date: Fri Mar 13 17:04:51 UTC 2026
The Canterbury Corpus has been a standard benchmark dataset in academic
compression research since 1997. It covers a representative spread of real-world file
types: English prose, source code, HTML, binary data, executables, and spreadsheets.
Results on this corpus can be compared directly with figures in published compression research.
Lower % = better compression. C = compression time in ms, D = decompression time in ms.
All MBFA roundtrips must pass for results to be valid.
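As an illustration of how the ratio, C, D, and roundtrip columns might be produced, here is a minimal sketch. It is not the suite's actual harness: MBFA itself is not shown, so Python's standard-library gzip and xz (lzma) codecs stand in for it, the `measure` helper is a hypothetical name, and the ratio is assumed to be compressed size as a percentage of original size.

```python
import gzip
import lzma
import time


def measure(name, compress, decompress, data):
    """Compress then decompress `data`; report ratio, timings, and roundtrip status."""
    t0 = time.perf_counter()
    packed = compress(data)
    t1 = time.perf_counter()
    restored = decompress(packed)
    t2 = time.perf_counter()
    return {
        "codec": name,
        # Assumed definition: compressed size as a percentage of the original (lower is better).
        "ratio_pct": 100.0 * len(packed) / len(data),
        "c_ms": (t1 - t0) * 1000.0,  # C = compress ms
        "d_ms": (t2 - t1) * 1000.0,  # D = decompress ms
        # A roundtrip passes only if the output is byte-identical to the input.
        "roundtrip_ok": restored == data,
    }


# Stand-in input; a real run would read each Canterbury corpus file instead.
sample = b"the quick brown fox jumps over the lazy dog\n" * 500

results = [
    measure("gzip", gzip.compress, gzip.decompress, sample),
    measure("xz", lzma.compress, lzma.decompress, sample),
]
for r in results:
    print(
        f'{r["codec"]}: {r["ratio_pct"]:.2f}%  '
        f'C={r["c_ms"]:.2f} ms  D={r["d_ms"]:.2f} ms  roundtrip={r["roundtrip_ok"]}'
    )
```

A failed roundtrip would show up as `roundtrip_ok` being `False`, in which case the ratio and timing for that codec should be discarded rather than reported.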
Files tested: — (Canterbury corpus)
MBFA wins: — (best ratio per file)
Roundtrips: — (compress → decompress)
Avg MBFA ratio: — (across all files)
vs gzip avg: — (negative = MBFA wins)
vs zstd avg: — (negative = MBFA wins)
Avg MBFA C time: — (compress ms)
Avg MBFA D time: — (decompress ms)
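The summary figures above could be derived from per-file ratios roughly as follows. This is a sketch under assumptions: the file names and numbers are made-up placeholders rather than measurements, the helper names are hypothetical, and the "vs gzip avg" and "vs zstd avg" figures are assumed to be simple differences of average ratios in percentage points.

```python
# Hypothetical per-file ratios (percent of original size); NOT real results.
ratios = {
    "file_a": {"MBFA": 30.0, "gzip": 35.0, "zstd": 33.0},
    "file_b": {"MBFA": 42.0, "gzip": 40.0, "zstd": 41.0},
}


def average(codec):
    """Mean ratio of one codec across all files."""
    return sum(row[codec] for row in ratios.values()) / len(ratios)


def wins(codec):
    """Number of files on which `codec` has the lowest (best) ratio."""
    return sum(1 for row in ratios.values() if min(row, key=row.get) == codec)


avg_mbfa = average("MBFA")
# Assumed definition: difference of averages, so a negative value means MBFA is smaller.
vs_gzip = avg_mbfa - average("gzip")
vs_zstd = avg_mbfa - average("zstd")
mbfa_wins = wins("MBFA")

print(f"Avg MBFA ratio: {avg_mbfa:.2f}%")
print(f"vs gzip avg: {vs_gzip:+.2f}")
print(f"vs zstd avg: {vs_zstd:+.2f}")
print(f"MBFA wins: {mbfa_wins} of {len(ratios)}")
```

Note that an average of per-file ratios weights every file equally regardless of size; a size-weighted total would give a different figure, and the source does not say which one is used.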
Results by File
MBFA | gzip | zstd | xz | brotli | zpaq
C = compress ms
D = decompress ms
🥇 = best ratio for that file