Lossless video compression using Bloom filters

What the project is about

  • There is initial confusion about whether the project recompresses existing YouTube/H.264 video or targets raw/new video; multiple commenters conclude it is conceptually an alternative codec / entropy-coding stage operating on frame deltas.
  • The author later clarifies it’s an experiment in using rational Bloom filters for (eventually) lossless video compression, not a practical production codec.

Core idea and algorithm

  • Represent changes between consecutive frames as a bitmap: 1 if the pixel changed, 0 otherwise.
  • Insert positions of changed pixels into a Bloom filter; then, for all positions that test positive, store the corresponding pixel color values (including some false positives).
  • This effectively stores “(x,y,r,g,b) for changed pixels” but compresses the coordinate part via the Bloom filter while accepting some over-stored pixels.
  • Commenters note this is general “diff between two bitstrings” compression, not video-specific, and lacks motion estimation and other standard video tricks.
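The delta-plus-Bloom scheme described above can be sketched roughly as follows. This is a toy Python illustration using dict-based "frames"; the class and function names are assumptions for illustration, not the repo's actual code:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter over hashable items (here, (x, y) pixel coords)."""

    def __init__(self, m_bits, k_hashes):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _indices(self, item):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for idx in self._indices(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def __contains__(self, item):
        return all(self.bits[idx // 8] & (1 << (idx % 8))
                   for idx in self._indices(item))

def encode_delta(prev, curr):
    """prev/curr: dicts mapping (x, y) -> (r, g, b) for a tiny 'frame'."""
    changed = [p for p in curr if curr[p] != prev[p]]
    bf = BloomFilter(m_bits=8 * max(1, len(changed)), k_hashes=3)
    for p in changed:
        bf.add(p)
    # Store a color for EVERY position that tests positive, in a fixed scan
    # order, so the decoder can re-associate colors without coordinates.
    colors = [curr[p] for p in sorted(curr) if p in bf]
    return bf, colors

def decode_delta(prev, bf, colors, positions):
    frame = dict(prev)
    it = iter(colors)
    for p in sorted(positions):
        if p in bf:              # true change or false positive
            frame[p] = next(it)  # for a false positive the stored color
    return frame                 # equals the old one, so output is exact
</parameter>```

Note the trick that makes false positives harmless: the encoder stores the current color for every positive position, and for a false positive that color equals the previous frame's color, so the reconstruction stays exact. The cost is the extra color payload for over-stored pixels.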

Losslessness and correctness concerns

  • Several people point out code paths that discard small color differences (e.g., thresholding on mean RGB changes), making the current implementation lossy despite the “lossless” framing.
  • Others highlight that color-space conversion (YUV↔BGR) introduces rounding error; the author acknowledges this and states a goal of bit-exact YUV handling and mathematically provable losslessness.
  • Commenters draw a clear distinction between the Bloom-based sparsity trick and the rational Bloom filter innovation (a non-integer hash count k to reduce false positives).
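The "rational" (non-integer k) idea referenced above can be sketched as follows. The optimal hash count k = (m/n)·ln 2 is rarely an integer, so one common construction applies floor(k) hashes to every item and one extra hash to a deterministic fraction frac(k) of items. The class name, hashing scheme, and thresholding detail are assumptions, not the author's exact construction:

```python
import hashlib
import math

class RationalBloom:
    """Bloom filter with a fractional average hash count.

    Each item gets floor(k) hashes; a deterministic fraction frac(k) of
    items get one extra hash. Because the per-item hash count is derived
    from the item itself, queries repeat it exactly: no false negatives.
    """

    def __init__(self, m_bits, n_items):
        self.m = m_bits
        k = (m_bits / n_items) * math.log(2)  # classical optimum
        self.k_floor = int(k)
        self.k_frac = k - self.k_floor
        self.bits = bytearray((m_bits + 7) // 8)

    def _hash(self, item, i):
        h = hashlib.sha256(f"{i}:{item}".encode()).digest()
        return int.from_bytes(h[:8], "big")

    def _num_hashes(self, item):
        # Map the item to a uniform value in [0, 1); items below k_frac
        # receive the extra hash, realizing a non-integer average k.
        u = self._hash(item, "frac") / 2**64
        return self.k_floor + (1 if u < self.k_frac else 0)

    def add(self, item):
        for i in range(self._num_hashes(item)):
            idx = self._hash(item, i) % self.m
            self.bits[idx // 8] |= 1 << (idx % 8)

    def __contains__(self, item):
        for i in range(self._num_hashes(item)):
            idx = self._hash(item, i) % self.m
            if not self.bits[idx // 8] & (1 << (idx % 8)):
                return False
        return True
```

The point of the fractional k is purely to squeeze the false-positive rate closer to its theoretical optimum than an integer k allows, which matters here because every false positive costs an over-stored pixel color.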

Compression performance and comparisons

  • A graph in the repo reportedly shows the Bloom approach performing consistently worse than gzip on sparse binary strings; commenters note this undercuts the core claim.
  • In later raw-video tests, the author reports ~4.8% of original size, versus JPEG2000 (3.7%), FFV1 (36.5%), H.265 (9.2%, lossy), and H.264 (0.3%, lossy), with PSNR ~31 dB and modest fps. Others note the method is still lossy at this point, so comparisons against lossless codecs are ambiguous.

Skepticism about efficiency and modeling

  • Multiple commenters argue hashing pixel positions destroys spatial locality that real codecs exploit (blocks, motion, clustered changes), so this is structurally disadvantaged.
  • Some state that for sparse binary data, conventional schemes (run-length, arithmetic coding, better filters like fuse/ribbon) should dominate.
  • Others question the motivation versus simply layering a sparse “correction mask” on top of existing near-lossless codecs.

Potential advantages and niches

  • A few speculate Bloom-based lookup might be embarrassingly parallel (even GPU-friendly), though others counter that the specific decoding loop is inherently serial.
  • Suggested that if it ever shines, it might be on very static or synthetic content (screen recordings, animation) where frame differences are extremely sparse.
  • Overall sentiment: technically interesting Bloom-filter experiment, unlikely yet to compete with mature codecs, but worth exploring as a research toy.