Launch HN: Trellis (YC W24) – AI-powered workflows for unstructured data
Technology & Approach
- Trellis uses a mix of fine-tuned LLMs for extraction/validation/parsing and large foundation models for general reasoning, routed via a model-selection architecture.
- Vision models are used for complex PDFs and scans (nested tables, images), with OCR and conventional parsers used only to cross-check the output.
- The system supports connectors to diverse data sources and destinations, workflow orchestration (triggering on new data), and both a UI and an API, serving technical and non-technical users.
- Validation mechanisms include schema-aware checks, custom rules, confidence scores, reference back to source documents, and “LLM-as-a-judge” style evaluation.
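To make the validation bullet concrete, here is a minimal sketch of how schema-aware checks, custom rules, confidence thresholds, and source references could fit together. It is not Trellis's implementation: the `Field` type, `SCHEMA`, `total_is_positive`, `validate`, and the 0.9 threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Field:
    """One extracted value (hypothetical shape, not a real Trellis type)."""
    value: Any
    confidence: float  # 0.0 to 1.0, as reported by an assumed extractor
    source_page: int   # reference back to the source document


# Hypothetical schema: field name -> expected Python type.
SCHEMA = {"invoice_number": str, "total": float, "currency": str}


def total_is_positive(record: dict) -> Optional[str]:
    """Custom rule (assumed): return an error message, or None if the rule passes."""
    total = record.get("total")
    if total is not None and isinstance(total.value, float) and total.value <= 0:
        return "total must be positive"
    return None


def validate(record: dict, schema: dict, rules: list, min_confidence: float = 0.9) -> list:
    """Collect schema, confidence, and custom-rule issues for one record."""
    issues = []
    for name, expected in schema.items():
        field = record.get(name)
        if field is None:
            issues.append(f"{name}: missing")
            continue
        if not isinstance(field.value, expected):
            issues.append(
                f"{name}: expected {expected.__name__}, got {type(field.value).__name__}"
            )
        if field.confidence < min_confidence:
            issues.append(
                f"{name}: low confidence {field.confidence:.2f} (page {field.source_page})"
            )
    for rule in rules:
        message = rule(record)
        if message:
            issues.append(message)
    return issues


# Illustrative record; the values are made up.
example = {
    "invoice_number": Field("INV-1", 0.99, 1),
    "total": Field(-5.0, 0.95, 1),
    "currency": Field("USD", 0.60, 2),
}
issues = validate(example, SCHEMA, [total_is_positive])
```

In this sketch every issue carries enough context (field name, page) for a reviewer to jump back to the source document, which is the role the "reference back to source documents" mechanism plays in the list above.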
Accuracy, Validation & Risk
- Claimed accuracy is ~95% out of the box, rising to 99%+ with fine-tuning and human-in-the-loop review.
- Commenters note that in practice many LLM extraction setups hover around 90% and that undetected errors are the real concern, especially in finance and healthcare.
- Commenters strongly emphasize that fully automated pipelines are risky for critical tasks (e.g., payments, medical data) and that human review remains essential.
- Some ask for concrete metrics and for how Trellis detects individual failures, rather than just reporting average performance.
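The distinction commenters draw between average accuracy and undetected errors can be sketched on a labeled evaluation set. The example below assumes each extraction comes with a confidence score and a ground-truth correctness flag; `triage_metrics`, the 0.9 threshold, and the sample data are hypothetical and do not come from the thread.

```python
def triage_metrics(samples, threshold):
    """Summarize a confidence-threshold review policy.

    samples: list of (confidence, is_correct) pairs from a labeled set.
    Items at or above the threshold are auto-accepted; the rest go to
    human review. An "undetected error" is a wrong item that was
    auto-accepted, so no human ever saw it.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    auto_accepted = [(c, ok) for c, ok in samples if c >= threshold]
    sent_to_review = len(samples) - len(auto_accepted)
    undetected = sum(1 for _, ok in auto_accepted if not ok)
    return {
        "accuracy": sum(1 for _, ok in samples if ok) / len(samples),
        "review_rate": sent_to_review / len(samples),
        "undetected_error_rate": undetected / len(samples),
    }


# Made-up evaluation data, for illustration only.
samples = [
    (0.99, True), (0.98, True), (0.97, True), (0.95, True), (0.93, False),
    (0.90, True), (0.85, True), (0.70, False), (0.60, True), (0.40, False),
]
metrics = triage_metrics(samples, threshold=0.9)
```

Raising the threshold trades a higher review rate for fewer undetected errors, which is why a single headline accuracy figure says little about risk in payment or medical workflows.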
Use Cases Discussed
- Heavy focus on financial services: private credit documents, bank risk models, invoices, product catalogs, compliance flagging, and email processing.
- Other domains mentioned: logistics/shipping, genealogy (vital records), audiograms in healthcare, customer calls/chats, and general ETL for RAG pipelines.
- Table extraction and multi-row outputs from a single document are a recurring need; Trellis recently added a “table mode” for this.
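As an illustration of what "multi-row outputs from a single document" means, the sketch below expands one extracted document into one row per table entry, repeating the document-level fields on each row. The `explode_rows` helper and the `line_items` key are hypothetical; the thread does not describe how Trellis's table mode works internally.

```python
def explode_rows(document, table_key="line_items"):
    """Turn one extracted document into one flat row per table entry.

    Document-level fields (everything except table_key) are copied onto
    every row, so each row can be loaded into a flat table on its own.
    """
    header = {k: v for k, v in document.items() if k != table_key}
    rows = document.get(table_key) or []
    return [{**header, **row} for row in rows]


# Made-up invoice, for illustration only.
invoice = {
    "invoice_number": "INV-1",
    "line_items": [
        {"sku": "A", "qty": 2},
        {"sku": "B", "qty": 1},
    ],
}
rows = explode_rows(invoice)
```

A document with no table entries yields no rows in this sketch; a real pipeline would need to decide whether to emit a single header-only row instead.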
Positioning, Differentiation & Competition
- Trellis is framed as “ETL for unstructured data” and an end-to-end workflow tool rather than just a structured-output wrapper around an LLM.
- Differentiation claims versus NER/rule-based systems: robustness to varied, messy formats and complex tasks beyond simple field extraction.
- Compared with legacy/document-AI tools and other platforms, Trellis stresses configurability, high-accuracy extraction, validation, and workflow integration.
- Overlap with adjacent YC startups and open source tools is acknowledged; Trellis emphasizes document-heavy workflows, dashboards, and system-of-record integration.
Market & Business Concerns
- Some argue this is a feature that cloud/LLM vendors will eventually subsume; others counter that workflow complexity, integration, and last‑mile accuracy are nontrivial moats.
- Several commenters doubt the “80% of important data is unstructured” narrative. They note that big banks already have strong ETL teams and internal LLM projects, and that strict vendor and compliance requirements are a barrier to adoption.
- Others see large opportunities in bespoke, high-touch deployments where the main competitor is expensive human labor, not basic OCR.
- On-prem and HIPAA support are requested for regulated industries; HIPAA is said to be on the near-term roadmap, while on-prem is seen as harder due to GPU needs.