I indexed 669 GB of my GoPro videos using my M1 Max computer and local ML models

Post format and access

  • Several commenters say this should have been a “Show HN” and note that the title is difficult to edit.
  • One person reports the main site briefly returning Cloudflare errors; someone shares an archive link, and the site later works again.

Use cases and extensions

  • Many are excited about local, open pipelines for organizing large personal media collections (videos, photos, documents).
  • A recurring question is whether the same approach could index porn collections. Discussion covers model safety filters, abliterated models, LoRA fine-tuning, and how easily multi-turn prompting bypasses content restrictions.
  • Others focus on family content, hoping for automatic “memories” and year-in-review compilations that combine photos, videos, and music.

Technical pipeline and models

  • Core flow (as discussed): extract frames at ~1 fps, downscale them (e.g., to 720p), run face, object, and text detection, transcribe audio with Whisper, and generate visual descriptions with Qwen2.5-VL variants.
  • Outputs are stored in a vector DB plus a SQL database, enabling semantic search, RAG, and queries by text, screenshot, or audio.
  • One user notes Whisper can hallucinate text when fed non-speech audio (e.g., moaning, slapping); another suggests Parakeet-style models, which filter out non-voice sounds.
  • Some want true video-clip embeddings rather than frame-level ones, to better capture actions.
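The core flow above can be sketched in two small pieces: choosing which frames to keep when sampling at ~1 fps, and a toy in-memory index that ranks stored frame descriptions against a text query by cosine similarity. This is a minimal sketch under stated assumptions, not the author's code: `embed` is a hypothetical stand-in (a sparse bag-of-words vector) for a real text/image embedding model, `FrameIndex` stands in for the vector DB plus SQL store, and the file names and descriptions are made up.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def sample_frame_indices(total_frames: int, source_fps: float, target_fps: float = 1.0) -> list[int]:
    """Return indices of frames to keep when sampling a clip at ~target_fps."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("fps values must be positive")
    # e.g. a 30 fps source sampled at 1 fps keeps every 30th frame; never step below 1 frame.
    step = max(source_fps / target_fps, 1.0)
    count = math.ceil(total_frames / step)
    return [int(i * step) for i in range(count)]


def embed(text: str) -> Counter:
    """Hypothetical stand-in for a real embedding model: a sparse bag-of-words vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class FrameIndex:
    """Toy in-memory index: one row per sampled frame (video, timestamp, description, vector)."""
    rows: list = field(default_factory=list)

    def add(self, video: str, timestamp_s: float, description: str) -> None:
        self.rows.append((video, timestamp_s, description, embed(description)))

    def search(self, query: str, k: int = 3) -> list:
        """Return the top-k (score, video, timestamp_s, description) tuples for a text query."""
        q = embed(query)
        scored = [(cosine(q, vec), video, ts, desc) for video, ts, desc, vec in self.rows]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    index = FrameIndex()
    # Hypothetical file names and frame descriptions (e.g. captions from a VLM).
    index.add("GX010001.MP4", 12.0, "dog running on the beach")
    index.add("GX010002.MP4", 45.0, "snowboarding down a steep slope")
    print(sample_frame_indices(total_frames=90, source_fps=30.0))
    print(index.search("dog on beach", k=1))
```

Storing the video name and timestamp next to each vector is what lets a search hit jump straight to the matching moment in the source clip; a real pipeline would keep that metadata in SQL and the vectors in the vector DB.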

Hardware performance and acceleration

  • Discussion compares the M1 Max to an 11th-gen Intel i9 and the Snapdragon X Elite: CPU benchmark scores are similar, but Apple’s unified memory, its memory bandwidth, and its on-chip “AI accelerator” are seen as major advantages for these workloads.
  • RTX GPUs (e.g., 3060, 5090) are expected to be significantly faster than M1 Max for indexing.
  • People suggest pay-as-you-go GPU providers (Runpod, vast.ai) to speed up large jobs while keeping models local-ish.
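To see why per-frame speed matters at this scale, here is a back-of-the-envelope estimate of how many frames 669 GB of footage yields at ~1 fps and how long serial processing would take. The 60 Mbps average bitrate and the per-frame latencies are assumptions for illustration only, not measurements from the post or from any specific machine.

```python
def footage_seconds(size_gb: float, bitrate_mbps: float) -> float:
    """Approximate footage duration from total size (decimal GB) and an average bitrate."""
    total_bits = size_gb * 1e9 * 8
    return total_bits / (bitrate_mbps * 1e6)


def processing_hours(size_gb: float, bitrate_mbps: float,
                     sample_fps: float, seconds_per_frame: float) -> float:
    """Hours needed to run the per-frame models on every sampled frame, one frame at a time."""
    frames = footage_seconds(size_gb, bitrate_mbps) * sample_fps
    return frames * seconds_per_frame / 3600


if __name__ == "__main__":
    ASSUMED_BITRATE_MBPS = 60.0  # assumption: actual GoPro bitrate depends on resolution and mode
    seconds = footage_seconds(669, ASSUMED_BITRATE_MBPS)
    print(f"~{seconds / 3600:.1f} h of footage, ~{seconds:.0f} frames at 1 fps")
    # Hypothetical per-frame latencies, not benchmarks of any particular hardware.
    for latency in (2.0, 0.5):
        hours = processing_hours(669, ASSUMED_BITRATE_MBPS, 1.0, latency)
        print(f"{latency} s/frame -> {hours:.1f} h")
```

Total time scales linearly with per-frame latency, so a GPU that is 4× faster per frame cuts a multi-day job by the same factor. The same arithmetic shows why sampling at ~1 fps matters: processing every frame of a 30 fps source would multiply the frame count by 30.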

Existing tools and integration

  • DaVinci Resolve Studio and Adobe Premiere are mentioned as having built-in or cloud-based AI indexing; DaVinci’s AI runs locally but reportedly lacks full face tagging.
  • Third-party tools like Jumper, Immich, and other local video-indexing projects are suggested, some with NLE (non-linear editor) integrations and APIs.
  • There’s interest in containerized GPU access on Apple Silicon (podman + Mesa, vLLM-metal via Docker).

Skepticism, usability, and alternatives

  • Some find the example highlight reels underwhelming given the volume of footage and wonder whether the tech is mature enough.
  • A contrasting “simple” workflow is proposed: use GoPro’s built-in “HiLight Tag” while recording, then manually cut those marked segments later.
  • Others argue that while manual tagging is simpler, the ML pipeline enables retroactive search, multi-modal queries, and broader use cases beyond highlights.
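The “simple” HiLight Tag workflow can be partly automated as well. The sketch below assumes the tag timestamps have already been extracted from the clip as a list of seconds (how to extract them is out of scope here), pads each tag into a clip window with hypothetical 5-second margins, and merges overlapping windows so that nearby tags become one cut.

```python
def tag_windows(tags_s, video_len_s, before_s=5.0, after_s=5.0):
    """Turn tag timestamps (seconds) into merged (start, end) windows clamped to the video."""
    windows = sorted((max(0.0, t - before_s), min(video_len_s, t + after_s)) for t in tags_s)
    merged = []
    for start, end in windows:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous window: extend it instead of starting a new cut.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    # Hypothetical tags at 10 s, 14 s and 40 s in a 60 s clip.
    print(tag_windows([10, 14, 40], video_len_s=60))
```

The resulting windows could be handed to any cutting tool. The trade-off raised in the thread still applies: this only covers moments someone remembered to tag while recording, whereas the indexed pipeline can answer queries after the fact.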