Show HN: FastGraphRAG – Better RAG using good old PageRank

Role of PageRank and Knowledge Graphs in RAG

  • Many commenters see it as fitting rather than ironic that “classic” IR techniques like PageRank complement LLMs: the LLM builds the semantic knowledge graph; PageRank navigates it.
  • Graph RAG is framed as superior to pure vector RAG for multi-hop/complex questions because it explicitly models relations between entities.
  • Some argue classic search/BM25 plus good product design is often underrated compared to bigger models and vector search “pixie dust.”

Implementation and Features of FastGraphRAG

  • Graphs are built by LLMs (entities, relations, descriptions, conflict resolution) and stored via python-igraph; connectors to graph databases (e.g., Neo4j-like tools, Memgraph) are planned.
  • Retrieval uses semantic search to seed nodes, then personalized PageRank to spread relevance; future plans include weighted edges and “negative PageRank”/repulsors.
  • The system is configurable via domain descriptions, example queries, and entity types to make graph construction more opinionated and task-specific.
  • Works with any OpenAI-compatible API; commenters ask for clearer Ollama examples and for a way to use it purely as a retriever.
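The retrieval step described above can be sketched in plain Python: semantic search picks seed entities, then personalized PageRank spreads relevance outward through the graph. (FastGraphRAG stores its graph with python-igraph; the toy graph, entity names, and seed choice below are invented for illustration, not taken from the project.)

```python
def personalized_pagerank(adj, seeds, damping=0.85, iters=100):
    """Power iteration where the random walk restarts only at `seeds`,
    so scores measure proximity to the query-relevant nodes."""
    reset = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in adj}
    rank = dict(reset)
    for _ in range(iters):
        # Restart mass goes only to the seed nodes.
        nxt = {n: (1.0 - damping) * reset[n] for n in adj}
        # Each node spreads its current rank evenly over its neighbors.
        for n, neighbors in adj.items():
            if neighbors:
                share = damping * rank[n] / len(neighbors)
                for m in neighbors:
                    nxt[m] += share
        rank = nxt
    return rank

# Toy undirected knowledge graph (entity -> neighbors).
graph = {
    "Alice":    ["ProjectX"],
    "Bob":      ["ProjectX", "ProjectY"],
    "Carol":    ["ProjectY"],
    "ProjectX": ["Alice", "Bob"],
    "ProjectY": ["Bob", "Carol"],
}

# Pretend semantic search matched the query to "ProjectX".
scores = personalized_pagerank(graph, seeds={"ProjectX"})
ranked = sorted(scores, key=scores.get, reverse=True)
```

One reading of the proposed “negative PageRank”/repulsors is to run a second pass seeded at repulsor nodes and subtract its scores, pushing relevance away from unwanted regions of the graph; that is an interpretation, not a documented feature.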
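The configurability mentioned above (domain descriptions, example queries, entity types) follows roughly this pattern, adapted from the project’s README; the exact parameter names, the `insert`/`query` interface, and the example values are assumptions and may differ between versions:

```python
from fast_graphrag import GraphRAG

grag = GraphRAG(
    working_dir="./example",
    # Domain description: steers the LLM toward task-specific entities.
    domain="Analyze customer support tickets for a SaaS product.",
    # Example queries: make graph construction more opinionated.
    example_queries="\n".join([
        "Which customers reported login failures?",
        "What issues are linked to the latest release?",
    ]),
    # Entity types: constrain what the LLM extracts into the graph.
    entity_types=["Customer", "Issue", "Product", "Release"],
)

# Ingest raw text; the LLM extracts entities/relations into the graph.
grag.insert("Ticket #1234: Alice at AcmeCo cannot log in since release 2.3 ...")

# Query: seed nodes via semantic search, rank via personalized PageRank.
print(grag.query("Which customers reported login failures?").response)
```

Running this requires an OpenAI-compatible API key to be configured, which is the dependence some commenters object to.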

Use Cases, Capabilities, and Limits

  • Suggested uses: multi-hop QA, codebase understanding, customer-ticket assistants, compliance-doc analysis at scale, podcast/sentiment queries.
  • The authors position it as preferable to stuffing everything into massive context windows, citing accuracy, cost, and latency constraints.
  • For approximate aggregation (e.g., “positive view of X across many podcasts”), they propose graph-based filtering, acknowledging results are “best-effort,” not exact.

Ecosystem, Integrations, and Alternatives

  • Commenters draw comparisons to HippoRAG, LightRAG, nano-graphrag, and Aider’s PageRank-on-code approach, and mention alternative centrality measures (Triangle Centrality, Authority Rank).
  • There’s interest in Obsidian integration, Memgraph connectors, and using the framework purely as a retriever.

Critiques, Concerns, and Open Questions

  • Several criticize the GitHub README as too marketing-heavy and light on technical explanation, benchmarks, and concrete examples.
  • Some worry about dependence on OpenAI APIs and their restrictive terms of service.
  • Others question whether RAG fundamentally struggles with implicit inferences, while defenders say that’s the LLM’s job once the right subgraph is retrieved.
  • Questions about performance, multi-hop benchmark results, multi-tenant graph handling, and long-running extraction times remain unanswered or unclear.