📚 MBFA — Canterbury Corpus Benchmark

MidManStudio | Standard compression research benchmark suite

Build: #281
Branch: master
Commit: e17cd93d
Date: Fri Mar 13 17:04:51 UTC 2026
The Canterbury Corpus has been a standard benchmark dataset in academic compression research since 1997. It covers a representative spread of real-world file types: English prose, source code, HTML, binary data, executables, and spreadsheets. Compression ratios measured on this corpus are directly comparable to published compression research.

Lower % = better compression. C = compress time in ms, D = decompress time in ms. All MBFA roundtrips (compress → decompress) must pass for results to be valid.
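As a rough illustration of the three quantities described above (ratio %, C/D timings, and the roundtrip check), here is a minimal Python sketch. It assumes the ratio is compressed size as a percentage of original size, which is consistent with "lower % = better" but is not confirmed by this page. MBFA itself is not used here; `zlib` and `lzma` from the standard library stand in as codecs, and the `measure` helper and the sample input are hypothetical.

```python
import lzma
import time
import zlib


def measure(compress, decompress, data):
    """Return ratio (%), timings (ms), and roundtrip status for one codec."""
    t0 = time.perf_counter()
    packed = compress(data)
    t1 = time.perf_counter()
    restored = decompress(packed)
    t2 = time.perf_counter()
    return {
        # Assumed definition: compressed size as a percentage of the original.
        "ratio_pct": 100.0 * len(packed) / len(data),
        "c_ms": (t1 - t0) * 1000.0,
        "d_ms": (t2 - t1) * 1000.0,
        # A roundtrip passes only if the output is byte-identical to the input.
        "roundtrip_ok": restored == data,
    }


# Hypothetical sample input; a real run would read a Canterbury corpus file.
sample = b"the quick brown fox jumps over the lazy dog\n" * 200

results = {
    "zlib": measure(zlib.compress, zlib.decompress, sample),
    "xz": measure(lzma.compress, lzma.decompress, sample),
}

for name, r in results.items():
    print(
        f"{name}: {r['ratio_pct']:.1f}%  "
        f"C={r['c_ms']:.2f} ms  D={r['d_ms']:.2f} ms  ok={r['roundtrip_ok']}"
    )
```

Timings from a single pass like this are noisy; a benchmark would normally repeat each measurement and report a median or minimum.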
Files tested (Canterbury corpus)
MBFA wins (best ratio per file)
Roundtrips (compress → decompress)
Avg MBFA ratio (across all files)
vs gzip avg (negative = MBFA wins)
vs zstd avg (negative = MBFA wins)
Avg MBFA C time (compress ms)
Avg MBFA D time (decompress ms)
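The summary figures above could be derived from per-file ratios along the following lines. This is a sketch under assumptions: it treats "vs gzip avg" and "vs zstd avg" as the MBFA average ratio minus the other codec's average ratio (in percentage points), which matches "negative = MBFA wins" but is not stated on this page. The `summarize` helper, the file names, and every number below are made up for illustration and are not benchmark results.

```python
from statistics import mean

# Made-up ratios (% of original size) purely for illustration; NOT measured results.
ratios = {
    "file_a": {"MBFA": 30.0, "gzip": 35.0, "zstd": 33.0},
    "file_b": {"MBFA": 42.0, "gzip": 40.0, "zstd": 41.0},
    "file_c": {"MBFA": 25.0, "gzip": 30.0, "zstd": 27.0},
}


def summarize(ratios, ours="MBFA"):
    """Compute win count, per-codec average ratio, and average deltas."""
    codecs = list(next(iter(ratios.values())))
    # A "win" is a file where our codec has the lowest ratio.
    wins = sum(
        1 for per_file in ratios.values()
        if min(per_file, key=per_file.get) == ours
    )
    avg = {c: mean(per_file[c] for per_file in ratios.values()) for c in codecs}
    # Assumed definition: our average minus theirs, so negative means we win.
    delta = {c: avg[ours] - avg[c] for c in codecs if c != ours}
    return {"files": len(ratios), "wins": wins, "avg": avg, "delta": delta}


summary = summarize(ratios)
print(f"Files tested: {summary['files']}")
print(f"MBFA wins: {summary['wins']}")
print(f"Avg MBFA ratio: {summary['avg']['MBFA']:.2f}%")
for codec, d in summary["delta"].items():
    print(f"vs {codec} avg: {d:+.2f}")
```

Averaging per-file ratios weights every file equally regardless of size; a size-weighted total (sum of compressed bytes over sum of original bytes) is another common choice and can give a different ranking.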
Results by File
Compressors compared: MBFA, gzip, zstd, xz, brotli, zpaq
C = compress ms   D = decompress ms   🥇 = best ratio for that file

File Size MBFA gzip zstd xz brotli zpaq Rank Bars