Lossless video compression using Bloom filters
What the project is about
- Initial confusion about whether this recompresses existing YouTube/H.264 video or targets raw/new video; multiple commenters conclude it’s conceptually an alternative codec / entropy-coding stage operating on frame deltas.
- The author later clarifies it’s an experiment in using rational Bloom filters for (eventually) lossless video compression, not a practical production codec.
Core idea and algorithm
- Represent changes between consecutive frames as a bitmap: 1 if the pixel changed, 0 otherwise.
- Insert positions of changed pixels into a Bloom filter; then, for all positions that test positive, store the corresponding pixel color values (including some false positives).
- This effectively stores “(x,y,r,g,b) for changed pixels” but compresses the coordinate part via the Bloom filter while accepting some over-stored pixels; a minimal sketch of the encode/decode loop follows this list.
- Commenters note this is general “diff between two bitstrings” compression, not video-specific, and lacks motion estimation and other standard video tricks.
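A minimal sketch of the scheme as summarized above (illustrative Python, not the repo’s code; the flat pixel lists, the classic integer-k Bloom filter, and the m/k parameters are all assumptions). It also shows why false positives don’t break losslessness: a false-positive position simply stores a value identical to the previous frame’s.

```python
import hashlib

class Bloom:
    """Classic Bloom filter over an m-bit array with k hash functions."""
    def __init__(self, m, k):
        self.m, self.k, self.bits = m, k, bytearray((m + 7) // 8)

    def _hashes(self, item):
        for i in range(self.k):
            h = hashlib.blake2b(f"{i}:{item}".encode(), digest_size=8).digest()
            yield int.from_bytes(h, "big") % self.m

    def add(self, item):
        for h in self._hashes(item):
            self.bits[h // 8] |= 1 << (h % 8)

    def __contains__(self, item):
        return all(self.bits[h // 8] & (1 << (h % 8)) for h in self._hashes(item))

def encode(prev, cur, m=1 << 16, k=7):
    """Insert changed positions into the filter; store a value for every positive test."""
    bloom = Bloom(m, k)
    for i, (a, b) in enumerate(zip(prev, cur)):
        if a != b:
            bloom.add(i)
    # a false positive stores a pixel equal to the previous frame's -- harmless
    values = [cur[i] for i in range(len(cur)) if i in bloom]
    return bloom, values

def decode(prev, bloom, values):
    """Replay the same membership tests in the same order to realign stored values."""
    it = iter(values)
    return [next(it) if i in bloom else p for i, p in enumerate(prev)]

prev = [0] * 10_000
cur = list(prev); cur[42] = 255; cur[4_999] = 128
bloom, values = encode(prev, cur)
assert decode(prev, bloom, values) == cur  # exact round trip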
Losslessness and correctness concerns
- Several people point out code paths that discard small color differences (e.g., thresholding on mean RGB changes), making the current implementation lossy despite the “lossless” framing.
- Others highlight that YUV↔BGR color-space conversion introduces rounding error; the author acknowledges this and states a goal of bit-exact YUV handling and mathematically provable losslessness.
- There’s a clear distinction drawn between the Bloom-based sparsity trick and the rational Bloom filter innovation (a non-integer number of hash functions k to reduce false positives; sketched after this list).
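A hedged sketch of that rational-k idea as commenters describe it (illustrative Python, not the repo’s API): apply floor(k) hash functions to every item, plus one extra hash for a deterministic fraction of items chosen by an auxiliary hash, so insertion and lookup always agree.

```python
import hashlib, math

class RationalBloom:
    """Bloom filter accepting a non-integer hash count k (e.g. k = 8.32)."""
    def __init__(self, m, k):
        self.m, self.bits = m, bytearray((m + 7) // 8)
        self.k_floor, self.k_frac = int(k), k - int(k)

    def _h(self, item, i):
        d = hashlib.blake2b(f"{i}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(d, "big")

    def _indices(self, item):
        k = self.k_floor
        # an auxiliary hash deterministically picks which items get the extra probe
        if self._h(item, "frac") % 2**32 < self.k_frac * 2**32:
            k += 1
        return [self._h(item, i) % self.m for i in range(k)]

    def add(self, item):
        for idx in self._indices(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def __contains__(self, item):
        return all(self.bits[idx // 8] & (1 << (idx % 8)) for idx in self._indices(item))

n, m = 1000, 12000
k_opt = m / n * math.log(2)   # ~8.32: the optimal k is rarely an integer
rb = RationalBloom(m, k_opt)
for i in range(n):
    rb.add(i)
assert all(i in rb for i in range(n))  # still no false negatives
```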
Compression performance and comparisons
- A graph in the repo reportedly shows the Bloom approach consistently worse than gzip on sparse binary strings; commenters note this undercuts the core claim (a back-of-envelope comparison follows this list).
- In later raw-video tests, the author reports: ~4.8% of original size vs JPEG2000 (3.7%), FFV1 (36.5%), H.265 (9.2% lossy), H.264 (0.3% lossy), with PSNR ~31 dB and modest fps. Others note the method is still lossy, so comparisons to lossless codecs are ambiguous.
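A back-of-envelope calculation suggests why (all numbers here are illustrative assumptions, not the repo’s): an optimally sized Bloom filter costs about 1.44·log2(1/p) bits per inserted position, each false positive forces an extra stored pixel value, and the information-theoretic floor for the coordinates alone is log2 C(N, n) bits.

```python
import math

def bloom_bits(n, p):
    """Optimal Bloom filter size in bits for n items at false-positive rate p."""
    return n * math.log(1 / p) / math.log(2) ** 2

def coord_entropy_bits(N, n):
    """log2 C(N, n): lower bound for encoding which n of N positions changed."""
    lg = lambda x: math.lgamma(x + 1) / math.log(2)
    return lg(N) - lg(n) - lg(N - n)

N = 1920 * 1080      # positions per frame (assumed)
n = N // 100         # assume ~1% of pixels changed
payload = 24         # bits per stored RGB value

for p in (0.1, 0.01, 0.001):
    extra = p * (N - n)                                # expected false positives
    total = bloom_bits(n, p) + (n + extra) * payload
    print(f"p={p}: Bloom scheme ~ {total / 8 / 1024:.0f} KiB")

floor = coord_entropy_bits(N, n) + n * payload         # entropy-coded coordinates
print(f"entropy floor ~ {floor / 8 / 1024:.0f} KiB")
```

Under these assumptions the Bloom scheme stays above the floor at every p: a small p bloats the filter, a large p bloats the false-positive payload.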
Skepticism about efficiency and modeling
- Multiple commenters argue hashing pixel positions destroys spatial locality that real codecs exploit (blocks, motion, clustered changes), so this is structurally disadvantaged.
- Some state that for sparse binary data, conventional schemes (run-length, arithmetic coding, better filters like fuse/ribbon) should dominate; see the gap-coding sketch after this list.
- Others question the motivation versus simply layering a sparse “correction mask” on top of existing near-lossless codecs.
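For a feel of the conventional baseline, a sketch under assumed numbers (gap-based run-length coding with LEB128-style varints; frame size and change density are illustrative): sorted change positions become varint-coded gaps, which lands near the entropy floor with no false positives at all.

```python
import random

def rle_encode(positions):
    """Encode sorted bit positions as varint-coded gaps between neighbors."""
    out, prev = bytearray(), -1
    for pos in positions:
        gap, prev = pos - prev, pos
        while gap >= 0x80:               # LEB128-style continuation bytes
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)

rng = random.Random(0)
N = 1920 * 1080
positions = sorted(rng.sample(range(N), N // 200))   # assume ~0.5% of pixels changed
encoded = rle_encode(positions)
print(f"{len(positions)} positions -> {len(encoded)} bytes "
      f"({8 * len(encoded) / len(positions):.1f} bits/position)")
```

At this density the gaps average around 200, so most fit in one or two varint bytes: roughly 12 bits per position against an entropy floor near 9, with no extra pixel payload to pay for.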
Potential advantages and niches
- A few speculate Bloom-based lookup might be embarrassingly parallel (even GPU-friendly), though others counter that the specific decoding loop is inherently serial.
- Suggested that if it ever shines, it might be on very static or synthetic content (screen recordings, animation) where frame differences are extremely sparse.
- Overall sentiment: technically interesting Bloom-filter experiment, unlikely yet to compete with mature codecs, but worth exploring as a research toy.