I indexed 669 GB of my GoPro videos using my M1 Max computer and local ML models
Post format and access
- Several commenters say this should have been a “Show HN” and note that the title was difficult to edit.
- One person reports the main site briefly returning Cloudflare errors; an archive link is shared, and the site later works again.
Use cases and extensions
- Many are excited about local, open pipelines for organizing large personal media collections (videos, photos, documents).
- A recurring question is whether the same approach could be used to index porn collections. People discuss model safety filters, abliterated models, LoRA finetuning, and how easy it is to bypass content restrictions with multi-turn prompting.
- Others focus on family content: hopes for automatic “memories” and year-in-review compilations, combining photos, videos, and music.
Technical pipeline and models
- Core flow (as discussed): sample frames at ~1 fps, downscale them (e.g., to 720p), run face, object, and text detection, transcribe audio with Whisper, and generate visual descriptions with Qwen2.5-VL variants.
- Outputs go into a vector DB plus SQL for semantic search, RAG, and querying by text, screenshot, or audio.
- One user notes Whisper can hallucinate when fed non-speech (e.g., moaning, slapping); another suggests Parakeet-style models that filter non-voice sounds.
- Some want true video-clip embeddings, not just frame-level ones, to better capture actions.
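The sampling and search steps above can be sketched in a few lines of Python. This is a minimal illustration and not the project's actual code: the function names, the table layout, and the toy 3-dimensional embeddings are all assumptions, and the brute-force cosine scan over SQLite stands in for a real vector DB. The ffmpeg command is only built, not run; it relies on ffmpeg's standard `fps` and `scale` video filters.

```python
import json
import math
import sqlite3


def ffmpeg_sample_cmd(src, out_pattern, fps=1, height=720):
    """Build (but do not run) an ffmpeg command that samples and downscales frames."""
    # fps=N keeps N frames per second; scale=-2:H sets the height and lets
    # ffmpeg pick an even width that preserves the aspect ratio.
    return ["ffmpeg", "-i", src, "-vf", f"fps={fps},scale=-2:{height}", out_pattern]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def make_index():
    """Hypothetical schema: one row per sampled frame, embedding stored as JSON."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE frames (video TEXT, second INTEGER, caption TEXT, embedding TEXT)"
    )
    return db


def add_frame(db, video, second, caption, embedding):
    db.execute(
        "INSERT INTO frames VALUES (?, ?, ?, ?)",
        (video, second, caption, json.dumps(embedding)),
    )


def search(db, query_embedding, top_k=3):
    """Brute-force scan: score every stored frame against the query embedding."""
    rows = db.execute("SELECT video, second, caption, embedding FROM frames").fetchall()
    scored = [(cosine(query_embedding, json.loads(e)), v, s, c) for v, s, c, e in rows]
    scored.sort(key=lambda r: r[0], reverse=True)
    return scored[:top_k]
```

In a real pipeline the embeddings would come from a text or vision model and the query would be embedded the same way; the SQL side is what allows filtering by video, timestamp, or caption alongside the similarity ranking.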
Hardware performance and acceleration
- Discussion compares the M1 Max to an 11th gen i9 and the Snapdragon X Elite: CPU scores are similar, but Apple’s unified memory and bandwidth (and local “AI accelerator”) are seen as major advantages for these workloads.
- RTX GPUs (e.g., 3060, 5090) are expected to be significantly faster than M1 Max for indexing.
- People suggest pay-as-you-go GPU providers (Runpod, vast.ai) to speed up large jobs while keeping models local-ish.
Existing tools and integration
- DaVinci Resolve Studio and Adobe Premiere are mentioned as having built-in or cloud-based AI indexing; DaVinci’s AI runs locally but reportedly lacks full face tagging.
- Third-party tools like Jumper, Immich, and other local video-indexing projects are suggested, some with NLE integrations and APIs.
- There’s interest in containerized GPU access on Apple Silicon (podman + Mesa, vLLM-metal via Docker).
Skepticism, usability, and alternatives
- Some find the example highlight reels underwhelming given the volume of footage and wonder whether the tech is mature enough.
- A contrasting “simple” workflow is proposed: use GoPro’s built-in “HiLight Tag” while recording, then manually cut those marked segments later.
- Others argue that while manual tagging is simpler, the ML pipeline enables retroactive search, multi-modal queries, and broader use cases beyond highlights.
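For the HiLight workflow, the cutting step could itself be scripted once the tag times are known. The sketch below assumes the tag timestamps have already been read out as seconds from the start of the video (how they are extracted from the file is not covered here); the function name and the 5-second padding are hypothetical choices for illustration.

```python
def clips_from_tags(tag_seconds, before=5.0, after=5.0, duration=None):
    """Turn tag timestamps into (start, end) clip windows, merging overlaps."""
    windows = []
    for t in sorted(tag_seconds):
        start = max(0.0, t - before)
        end = t + after
        if duration is not None:
            end = min(duration, end)  # do not run past the end of the video
        if windows and start <= windows[-1][1]:
            # Overlaps the previous window: extend it instead of adding a new one.
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows
```

Tags pressed a few seconds apart collapse into a single clip, so a burst of presses during one event does not produce several near-duplicate cuts.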