The $5000 Compression Challenge (2001)
Challenge rules vs “spirit”
- Many argue the challenger met the written conditions (total size of compressed data + decompressor smaller than original), even if no “real” compression occurred.
- Others say the solution abuses the filesystem to store information and violates the intended spirit of “one file in, smaller file + decompressor out.”
- Debate centers on whether multiple output files were allowed, whether filesystem metadata counts as data, and whether encoding information in filename ordering is forbidden by the relevant FAQ.
- Some feel that if multiple files weren’t allowed, the challenge host should have said so; changing standards afterward is seen as moving the goalposts.
Information theory and random data
- Multiple comments restate that truly random data is incompressible on average (pigeonhole principle, entropy, Kolmogorov complexity).
- Others explore “weak” random files: extremely rare random instances with accidental structure that could be hand‑compressed, in principle.
- There is discussion of probabilities: for realistic file sizes and decompressor overhead, the chance of a net size saving on random data is far too small for the 50:1 payout to be worthwhile.
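The counting argument behind these comments can be made concrete. A minimal sketch (the 20-bit example size is illustrative, not from the thread):

```python
# Pigeonhole sketch: there are 2**n bit strings of length n, but only
# 2**n - 1 strings of any strictly shorter length, so no lossless scheme
# can shrink every n-bit input.

def shorter_strings(n: int) -> int:
    """Number of bit strings of length strictly less than n."""
    return 2**n - 1  # 2**0 + 2**1 + ... + 2**(n-1)

n = 20
print(2**n, shorter_strings(n))  # 1048576 vs 1048575: at least one input must not shrink

# Stronger form: the fraction of n-bit strings compressible by k or more bits
# is below 2**(1 - k), since fewer than 2**(n - k + 1) outputs are short enough.
k = 8
print(2**(n - k + 1) / 2**n)  # 0.0078125 = 2**(-7)
```

The same bound is what makes "wait for a weak random file" strategies hopeless: saving even a few hundred bits at random has probability around 2 to the minus hundreds.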
Loopholes and environment tricks
- Numerous hypothetical “solutions” are proposed and critiqued:
  - Using hashes and /dev/random to “re-find” the file.
  - Embedding data in PRNG seeds, polynomial coefficients, or OS files, or downloading the original over the network.
  - Relying on file ordering, file sizes, tar metadata, or package managers as hidden side channels.
- Consensus: once you forbid any out‑of‑band entropy (filesystem, network, OS state), the challenge reduces to the standard impossibility result.
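The file-boundary trick the thread dissects can be sketched in a few lines. This is a hedged illustration (the delimiter byte and sample data are invented, not the challenger's actual code): the bytes on disk shrink, but only because the deleted delimiters are now encoded in the number and order of files, i.e. in filesystem state.

```python
# Hypothetical sketch of the file-boundary side channel: remove every
# occurrence of a chosen delimiter byte and split the data there. The pieces
# hold fewer bytes than the original, but the missing information now lives
# in the count and ordering of the files.

DELIM = b"5"  # arbitrary choice for illustration

def split_out(data: bytes) -> list[bytes]:
    """'Compress' by deleting the delimiter; one piece per gap."""
    return data.split(DELIM)

def rejoin(parts: list[bytes]) -> bytes:
    """'Decompress' by reinserting the delimiter between pieces."""
    return DELIM.join(parts)

original = b"315253545"
parts = split_out(original)
assert rejoin(parts) == original
assert sum(len(p) for p in parts) < len(original)
# The saving is exactly len(parts) - 1 delimiter bytes, smuggled out-of-band.
```

Counting the file boundaries as data, as the consensus position demands, erases the saving entirely.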
Bet odds and game theory
- Some suggest exploiting rare compressible instances across many attempts; others show that decompressor size, plus the bits needed to identify which rare instance occurred, erase any gain.
- A few try to estimate whether any pattern‑based scheme could beat 50:1 odds; most technical replies say no, barring RNG flaws.
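The break-even arithmetic behind those replies can be made explicit. A sketch, assuming an implausibly generous 200-byte decompressor (the stake and prize figures are from the challenge; the decompressor size is an assumption):

```python
# At $100 per attempt against a $5000 prize, a strategy profits only if its
# win probability exceeds stake / (stake + prize).
stake, prize = 100, 5000
break_even_p = stake / (stake + prize)
print(break_even_p)  # ~0.0196: need better than 1-in-51 odds

# But to win, a random file must shrink by more than the decompressor's own
# size. The chance of saving k or more bits is below 2**(1 - k).
decompressor_bits = 8 * 200  # assume a tiny 200-byte decompressor
p_win_bound = 2.0 ** (1 - decompressor_bits)
print(p_win_bound)  # underflows to 0.0 in floats: nowhere near 1-in-51
assert p_win_bound < break_even_p
```

Even granting the bettor unlimited retries, each attempt is an independent loss in expectation, so repetition only compounds the deficit.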
Compression, intelligence, and broader theory
- The thread connects to the Hutter Prize and arguments over whether intelligence is closer to lossless or lossy compression.
- There’s extended discussion of Kolmogorov complexity, UTMs, and the subjectivity of “complexity,” with skepticism about using it as an absolute practical measure.
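One practical point in that discussion can be illustrated directly: Kolmogorov complexity is uncomputable, but any real compressor gives a computable upper bound, and the bound separates structured from random data cleanly. A sketch using zlib (the byte counts are illustrative):

```python
import os
import zlib

# |zlib(x)| (plus a fixed decompressor) upper-bounds K(x) relative to one
# reference machine; a different machine shifts the measure by at most a
# constant, which is the "subjectivity" at issue in the thread.

structured = b"ab" * 5000        # highly patterned 10,000 bytes
random_ish = os.urandom(10_000)  # incompressible with overwhelming probability

print(len(zlib.compress(structured)))  # far below 10,000
print(len(zlib.compress(random_ish)))  # at or above 10,000 (format overhead)
```

The asymmetry is the whole dispute in miniature: the bound is useful in practice even though the true quantity it approximates is neither computable nor machine-independent.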
Meta and ethics
- Some see the challenger’s rule‑lawyering as clever and deserved comeuppance for an overconfident host charging $100 per try.
- Others think targeting a volunteer FAQ maintainer with legalistic exploits is corrosive and discourages future public challenges.