oTranscribe: A free and open tool for transcribing audio interviews

What oTranscribe Is (and Isn’t)

  • A browser-based interface for manual transcription, with adjustable playback speed, quick pause/play, and keyboard shortcuts.
  • It does not perform automatic speech-to-text; users type everything themselves, as the project's FAQ confirms.
  • Works offline once the page has been loaded; it is MIT-licensed and can be self-hosted or saved as an offline web app.
  • Praised as simple and distraction-free, and well suited to interviews and to travel or other offline use.
  • Some find it “too simple” for many modern use cases.

Expectations Around AI and Automation

  • Several commenters initially assume it is an automatic speech recognition (ASR) tool and are corrected.
  • Others are surprised it has no AI integration; the context is that it was written years before the current AI wave and is no longer actively developed.
  • Some argue manual transcription still matters, even with AI, for proofing, attribution, and handling edge cases.

Alternative Tools: Automatic Transcription & Diarization

  • Many suggestions are based on Whisper and its derivatives:
    • CLI tools and libraries: Whisper, whisper.cpp (much faster on CPU), WhisperX, whisper-diarization.
    • Hosted services/APIs: Spectropic (diarization, plus LLM post-processing to infer speaker names), Audiogest, TurboScribe, VideotoTextAI, Talio.
    • Desktop/mobile: Aiko (iOS, offline Whisper), various macOS and Electron apps, oTranscribe+ (browser/Electron with Vosk via WASM), Pixel Recorder and Live Transcribe, FUTO’s Android tools, Transcribro.
  • Several tools generate subtitles (SRT/VTT), handle YouTube downloads, or provide chat-with-transcript features.
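SRT, one of the subtitle formats mentioned above, is simple enough to generate directly from timestamped transcript segments. The sketch below assumes each segment is a dict with `start` and `end` (in seconds) and `text`, which matches the shape of the `segments` list returned by the openai-whisper package's `transcribe()`; the helper names `srt_timestamp` and `segments_to_srt` are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Build an SRT document from segments with 'start', 'end', and 'text' keys (hypothetical helper)."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        # Each cue: sequence number, time range, then the caption text.
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # Cues are separated by a single blank line.
    return "\n".join(cues)
```

A segment ending at 61.04 seconds, for instance, is written as `00:01:01,040`. WebVTT differs mainly in its `WEBVTT` header and its use of a period rather than a comma before the milliseconds.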

Real-Time, Local-First, and Accessibility Use Cases

  • Strong interest in:
    • Real-time, word-by-word transcription.
    • Fully local processing for privacy and for people who are hard of hearing.
  • Some report good results with local-first apps and Android’s built-in captioning/transcription; others are still searching for open-source, punctuated, real-time solutions.

Quality, Hallucination, and Post-Processing

  • Whisper and similar models are viewed as high quality, but they can be slow on CPU and may hallucinate text during silence (“dead air”).
  • LLMs can be chained after transcription to:
    • Remove filler words.
    • Correct errors, punctuate, and infer speaker names.
  • One experiment with a multimodal LLM produced near-perfect transcription with nuanced punctuation; on the other hand, post-processing pipelines can “over-correct” phrasing.
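The thread mentions chaining an LLM after transcription to remove filler words. To illustrate what that step does, the sketch below swaps in a much simpler technique, a regular-expression pass rather than an LLM; the filler list `FILLERS` and the function name `remove_fillers` are hypothetical. A fixed list cannot use context the way an LLM can, but it also never rewrites the surrounding phrasing, which sidesteps the over-correction risk.

```python
import re

# Hypothetical filler list; real transcripts may need a different set per speaker or language.
FILLERS = ["um", "uh", "erm", "uhm"]

# Match a whole-word filler together with an optional preceding comma and
# whitespace and an optional trailing comma, case-insensitively.
_FILLER_RE = re.compile(
    r",?\s*\b(?:" + "|".join(map(re.escape, FILLERS)) + r")\b,?",
    re.IGNORECASE,
)


def remove_fillers(text: str) -> str:
    """Strip listed filler words and tidy the spacing they leave behind."""
    cleaned = _FILLER_RE.sub("", text)
    cleaned = re.sub(r"\s+([,.?!])", r"\1", cleaned)  # drop spaces left before punctuation
    cleaned = re.sub(r"\s{2,}", " ", cleaned)         # collapse runs of whitespace
    return cleaned.strip()
```

For example, `remove_fillers("So, um, we started in 2019.")` returns `"So we started in 2019."`, while a word such as "drum" is left untouched because of the word-boundary anchors.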

Language and Dialect Issues

  • oTranscribe itself is language-agnostic (you type whatever you hear).
  • Multiple tools claim broad language support; commenters asked specifically about Japanese and Brazilian Portuguese, and answers on accent handling were mixed or unclear.