2026-06-14

I indexed 669 GB of my GoPro videos using my M1 Max computer and local ML models

Post format and access

Several commenters say this should have been a “Show HN” and note difficulty editing the title.
One person reports the main site briefly returning Cloudflare errors; an archive link is shared, and the site later works again.

Use cases and extensions

Many are excited about local, open pipelines for organizing large personal media collections (videos, photos, documents).
A recurring question is whether the same approach could be used to index porn collections; people discuss model safety filters, abliterated models, LoRA finetuning, and how easy it is to bypass content restrictions with multi-turn prompting.
Others focus on family content: hopes for automatic “memories” and year-in-review compilations, combining photos, videos, and music.

Technical pipeline and models

Core flow (as discussed): sample frames at ~1 fps, downscale them (e.g., to 720p), run face, object, and text detection, transcribe audio (Whisper), and generate visual descriptions (Qwen2.5-VL variants).
Outputs go into a vector DB plus SQL for semantic search, RAG, and querying by text, screenshot, or audio.
One user notes Whisper can hallucinate when fed non-speech (e.g., moaning, slapping); another suggests Parakeet-style models that filter non-voice sounds.
Some want true video-clip embeddings, not just frame-level, to better capture actions.
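
A rough sketch of the sampling and search steps described above, not the author's actual code. It assumes frame descriptions have already been turned into embedding vectors by some model; the function names (`sample_timestamps`, `build_index`, `search`), the table layout, and the toy 3-dimensional vectors are all hypothetical. SQLite stands in for the SQL side, and a brute-force cosine scan stands in for a real vector DB.

```python
import json
import math
import sqlite3


def sample_timestamps(duration_s, fps=1.0):
    """Return the timestamps (seconds) to grab frames at, sampling at `fps`."""
    out = []
    i = 0
    while i / fps < duration_s:
        out.append(i / fps)
        i += 1
    return out


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def build_index(rows):
    """Store (video, t, description, embedding) rows in an in-memory SQLite DB."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE scenes ("
        "id INTEGER PRIMARY KEY, video TEXT, t REAL, "
        "description TEXT, embedding TEXT)"
    )
    conn.executemany(
        "INSERT INTO scenes (video, t, description, embedding) VALUES (?, ?, ?, ?)",
        [(v, t, d, json.dumps(e)) for v, t, d, e in rows],
    )
    return conn


def search(conn, query_vec, k=3, video=None):
    """Rank stored scenes by cosine similarity, optionally filtered by video via SQL."""
    sql = "SELECT video, t, description, embedding FROM scenes"
    params = ()
    if video is not None:
        sql += " WHERE video = ?"
        params = (video,)
    scored = [
        (cosine(query_vec, json.loads(e)), v, t, d)
        for v, t, d, e in conn.execute(sql, params)
    ]
    scored.sort(key=lambda r: r[0], reverse=True)
    return scored[:k]
```

In a real setup the query vector would come from embedding the text, screenshot, or audio query with the same model used at index time, and the linear scan would be replaced by an approximate nearest-neighbor index.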

Hardware performance and acceleration

The discussion compares the M1 Max with an 11th-gen Intel i9 and the Snapdragon X Elite: CPU scores are similar, but Apple’s unified memory and bandwidth (and local “AI accelerator”) are seen as major advantages for these workloads.
Commenters expect RTX GPUs (e.g., 3060, 5090) to be significantly faster than the M1 Max for indexing.
People suggest pay-as-you-go GPU providers (Runpod, vast.ai) to speed up large jobs while keeping models local-ish.

Existing tools and integration

DaVinci Resolve Studio and Adobe Premiere are mentioned as having built-in or cloud-based AI indexing; DaVinci’s AI runs locally but reportedly lacks full face tagging.
Third-party tools like Jumper, Immich, and other local video-indexing projects are suggested, some with NLE integrations and APIs.
There’s interest in containerized GPU access on Apple Silicon (podman + Mesa, vLLM-metal via Docker).

Skepticism, usability, and alternatives

Some find the example highlight reels underwhelming given the volume of footage and wonder whether the tech is mature enough.
A contrasting “simple” workflow is proposed: use GoPro’s built-in “HiLight Tag” while recording, then manually cut those marked segments later.
Others argue that while manual tagging is simpler, the ML pipeline enables retroactive search, multi-modal queries, and broader use cases beyond highlights.
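
For the HiLight Tag workflow, the cutting step could look roughly like this. The sketch assumes the tag timestamps have already been read out of the footage by some other tool; the function name `clips_from_tags` and the 5-second padding on each side are made-up illustration values, not GoPro defaults.

```python
def clips_from_tags(tags_s, duration_s, before_s=5.0, after_s=5.0):
    """Turn tag timestamps (seconds) into padded (start, end) clip ranges.

    Ranges are clamped to the video bounds, and overlapping ranges are merged
    so that tags placed close together produce a single clip.
    """
    ranges = []
    for t in sorted(tags_s):
        start = max(0.0, t - before_s)
        end = min(duration_s, t + after_s)
        if ranges and start <= ranges[-1][1]:
            # Overlaps the previous clip: extend it instead of starting a new one.
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], end))
        else:
            ranges.append((start, end))
    return ranges
```

The resulting ranges can be handed to whatever editor or cutting tool is in use. Unlike the ML pipeline, this only finds moments that were tagged at recording time.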
