Launch HN: Trellis (YC W24) – AI-powered workflows for unstructured data

Technology & Approach

  • Trellis combines fine-tuned LLMs for extraction, validation, and parsing with large foundation models for general reasoning, using a model-selection layer to route each task to the appropriate model.
  • Vision models are used for complex PDFs and scans (nested tables, images), with OCR/parsers relegated to cross-checking.
  • The system supports connectors to diverse data sources and destinations, workflow orchestration (triggering runs when new data arrives), and both a UI and an API, serving technical and non‑technical users.
  • Validation mechanisms include schema-aware checks, custom rules, confidence scores, reference back to source documents, and “LLM-as-a-judge” style evaluation.
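The validation layer described above can be sketched as a single pass that combines a schema-aware check, a confidence threshold, a reference-back-to-source check, and custom rules. This is a minimal illustration under stated assumptions, not Trellis's implementation. `FieldResult`, `validate`, the 0.9 confidence cutoff, and the verbatim-substring source check are all hypothetical, and LLM-as-a-judge evaluation is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class FieldResult:
    # Hypothetical shape of one extracted field plus a model-reported confidence in [0, 1].
    name: str
    value: object
    confidence: float


@dataclass
class ValidationReport:
    passed: bool
    issues: list[str] = field(default_factory=list)


def validate(fields, schema, source_text, rules=(), min_confidence=0.9):
    """Hypothetical validator; min_confidence=0.9 is an assumed cutoff."""
    issues = []
    by_name = {f.name: f for f in fields}

    # Schema-aware check: every required field is present with the expected type.
    for name, expected_type in schema.items():
        f = by_name.get(name)
        if f is None:
            issues.append(f"{name}: missing")
        elif not isinstance(f.value, expected_type):
            issues.append(f"{name}: expected {expected_type.__name__}, got {type(f.value).__name__}")

    for f in fields:
        # Confidence score: low-confidence values are flagged for human review.
        if f.confidence < min_confidence:
            issues.append(f"{f.name}: low confidence {f.confidence:.2f}")
        # Reference back to source (simplified): string values must appear verbatim in the document text.
        if isinstance(f.value, str) and f.value not in source_text:
            issues.append(f"{f.name}: value not found in source")

    # Custom rules: each rule returns an error message, or None if the rule passes.
    for rule in rules:
        msg = rule(by_name)
        if msg:
            issues.append(msg)

    return ValidationReport(passed=not issues, issues=issues)


# Example: a hypothetical invoice extraction.
source = "Invoice INV-001 from Acme Corp. Total due: 1200.00 USD"
fields = [
    FieldResult("invoice_id", "INV-001", 0.98),
    FieldResult("vendor", "Acme Corp", 0.95),
    FieldResult("total", 1200.0, 0.72),
]
schema = {"invoice_id": str, "vendor": str, "total": float}


def positive_total(by_name):
    t = by_name.get("total")
    if t is not None and isinstance(t.value, (int, float)) and t.value <= 0:
        return "total: must be positive"
    return None


report = validate(fields, schema, source, rules=[positive_total])
print(report.passed, report.issues)
```

Here the total passes the schema and rule checks but falls below the assumed confidence cutoff, so the document would be routed to review rather than passed through silently.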

Accuracy, Validation & Risk

  • Claimed accuracy is ~95% out of the box, rising to 99%+ with fine-tuning and human-in-the-loop review.
  • Commenters note that in practice many LLM extraction setups hover around ~90% and that undetected errors are the real concern, especially in finance and healthcare.
  • Many commenters stress that fully automated pipelines are risky for critical tasks (e.g., payments, medical data) and that human review remains essential.
  • Some ask for concrete metrics, specifically how Trellis detects individual failures rather than reporting only average performance.
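The thread's distinction between average accuracy and undetected errors can be made concrete with a few metrics computed over reviewed extractions. This is a hedged sketch. The `Outcome` record and metric names are hypothetical and are not metrics Trellis has published. The document-level figure additionally assumes that field errors are independent. Under that assumption, 95% per-field accuracy on a 20-field document leaves only about a 36% chance that every field is correct.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    correct: bool  # did the extracted value match ground truth?
    flagged: bool  # did the validation layer route it to human review?


def error_metrics(outcomes):
    """Separate average accuracy from the errors that slip through unflagged."""
    n = len(outcomes)
    if n == 0:
        raise ValueError("no outcomes to evaluate")
    errors = [o for o in outcomes if not o.correct]
    caught = sum(o.flagged for o in errors)
    missed = len(errors) - caught
    return {
        "accuracy": (n - len(errors)) / n,
        "review_rate": sum(o.flagged for o in outcomes) / n,  # human workload
        "undetected_error_rate": missed / n,  # errors that reach downstream systems
        "error_recall": caught / len(errors) if errors else 1.0,  # share of errors caught
    }


def p_document_fully_correct(field_accuracy, num_fields):
    # Assumes field errors are independent, which real documents may violate.
    return field_accuracy ** num_fields


# Hypothetical sample: 100 fields, 95 correct (5 of them false alarms), 5 wrong (3 caught, 2 missed).
outcomes = (
    [Outcome(True, False)] * 90
    + [Outcome(True, True)] * 5
    + [Outcome(False, True)] * 3
    + [Outcome(False, False)] * 2
)
m = error_metrics(outcomes)
print(m)
print(p_document_fully_correct(0.95, 20))
```

In this made-up sample the headline accuracy is 95%, but the number that matters for payments or medical data is the 2% of fields that are wrong and never flagged.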

Use Cases Discussed

  • Heavy focus on financial services: private credit documents, bank risk models, invoices, product catalogs, compliance flagging, and email processing.
  • Other domains mentioned: logistics/shipping, genealogy (vital records), audiograms in healthcare, customer calls/chats, and general ETL for RAG pipelines.
  • Table extraction and multi-row outputs from a single document are a recurring need; Trellis recently added a “table mode” for this.
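The multi-row need mentioned above (one document producing many output rows) can be sketched as "exploding" a document's table into rows that repeat the document-level fields. This is an assumption about the general shape of such output, not how Trellis's table mode works. `explode_to_rows`, `rows_to_csv`, and the `doc_` prefix are hypothetical.

```python
import csv
import io


def explode_to_rows(doc_fields, table, key_prefix="doc_"):
    """Hypothetical table-mode output: one row per table entry, document fields repeated."""
    rows = []
    for i, item in enumerate(table):
        row = {f"{key_prefix}{k}": v for k, v in doc_fields.items()}
        row["row_index"] = i  # preserves the original order within the source document
        row.update(item)
        rows.append(row)
    return rows


def rows_to_csv(rows):
    # Union of keys in first-seen order, so rows with extra columns are not dropped.
    fieldnames = []
    for r in rows:
        for k in r:
            if k not in fieldnames:
                fieldnames.append(k)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)  # missing keys are written as ""
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Example: a hypothetical invoice with two line items becomes two rows.
doc = {"source_file": "invoice_001.pdf", "vendor": "Acme Corp"}
items = [
    {"sku": "A-1", "qty": 2, "unit_price": 10.0},
    {"sku": "B-7", "qty": 1, "unit_price": 99.5},
]
rows = explode_to_rows(doc, items)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Repeating document-level fields on every row keeps each row self-describing when it lands in a warehouse table, at the cost of some redundancy.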

Positioning, Differentiation & Competition

  • Trellis is framed as “ETL for unstructured data” and an end-to-end workflow tool rather than just a structured-output wrapper around an LLM.
  • Claimed differentiation from NER and rule-based systems: robustness to varied, messy formats and the ability to handle complex tasks beyond simple field extraction.
  • Compared with legacy/document-AI tools and other platforms, Trellis stresses configurability, high-accuracy extraction, validation, and workflow integration.
  • Overlap with adjacent YC startups and open source tools is acknowledged; Trellis emphasizes document-heavy workflows, dashboards, and system-of-record integration.

Market & Business Concerns

  • Some argue this is a feature that cloud/LLM vendors will eventually subsume; others counter that workflow complexity, integration, and last‑mile accuracy are nontrivial moats.
  • Several commenters doubt the “80% of important data is unstructured” narrative and note that big banks already have strong ETL teams and internal LLM projects, as well as strict vendor and compliance barriers.
  • Others see large opportunities in bespoke, high-touch deployments where the main competitor is expensive human labor, not basic OCR.
  • On‑prem deployment and HIPAA support are requested for regulated industries; HIPAA is said to be on the near-term roadmap, while on‑prem is seen as harder because of GPU requirements.