Show HN: FastGraphRAG – Better RAG using good old PageRank
Role of PageRank and Knowledge Graphs in RAG
- Many see it as fitting, not ironic, that “classic” IR like PageRank complements LLMs: LLMs build semantic knowledge graphs; PageRank navigates them.
- Graph RAG is framed as superior to pure vector RAG for multi-hop/complex questions because it explicitly models relations between entities.
- Some argue classic search/BM25 plus good product design is often underrated compared to bigger models and vector search “pixie dust.”
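The division of labor described above (LLMs extract the graph, PageRank scores it) can be sketched with a minimal power-iteration PageRank in plain Python; the entities and relations here are illustrative, not output from FastGraphRAG:

```python
# Minimal PageRank by power iteration on a toy "knowledge graph".
# Entity names and relations are illustrative placeholders for what
# an LLM extraction step might produce.
damping = 0.85
edges = {
    "Ada Lovelace": ["Analytical Engine", "Charles Babbage"],
    "Charles Babbage": ["Analytical Engine"],
    "Analytical Engine": ["Computing"],
    "Computing": [],
}
nodes = list(edges)
rank = {n: 1.0 / len(nodes) for n in nodes}

for _ in range(50):
    new = {n: (1 - damping) / len(nodes) for n in nodes}
    for src, outs in edges.items():
        if outs:
            share = damping * rank[src] / len(outs)
            for dst in outs:
                new[dst] += share
        else:  # dangling node: redistribute its rank uniformly
            for n in nodes:
                new[n] += damping * rank[src] / len(nodes)
    rank = new

for name, score in sorted(rank.items(), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")
```

Entities that many extracted relations point to accumulate rank, which is the structural signal that "classic" IR contributes on top of the LLM's extraction.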
Implementation and Features of FastGraphRAG
- Graphs are built by LLMs (entities, relations, descriptions, conflict resolution) and stored via python-igraph; connectors to graph databases (e.g., Neo4j-like tools, Memgraph) are planned.
- Retrieval uses semantic search to seed nodes, then personalized PageRank to spread relevance; future plans include weighted edges and “negative PageRank”/repulsors.
- The system is configurable via domain descriptions, example queries, and entity types to make graph construction more opinionated and task-specific.
- Works with any OpenAI-compatible API; commenters ask for clearer Ollama examples and for pure-retriever usage.
Use Cases, Capabilities, and Limits
- Suggested uses: multi-hop QA, codebase understanding, customer-ticket assistants, compliance-doc analysis at scale, podcast/sentiment queries.
- The authors position it as preferable to massive context windows, citing accuracy, cost, and latency constraints.
- For approximate aggregation (e.g., “positive view of X across many podcasts”), they propose graph-based filtering, acknowledging results are “best-effort,” not exact.
Ecosystem, Integrations, and Alternatives
- Commenters draw comparisons to HippoRAG, LightRAG, nano-graphrag, and Aider’s PageRank-on-code, and mention alternative centrality measures (Triangle Centrality, Authority Rank).
- There’s interest in Obsidian integration, Memgraph connectors, and using the framework purely as a retriever.
Critiques, Concerns, and Open Questions
- Several criticize the GitHub README as too marketing-heavy and light on technical explanation, benchmarks, and concrete examples.
- Some worry about dependence on OpenAI APIs and restrictive terms.
- Others question whether RAG fundamentally struggles with implicit inferences, while defenders say that’s the LLM’s job once the right subgraph is retrieved.
- Performance, multi-hop benchmark results, multi-tenant graph handling, and long-running extraction times are raised as open or unanswered questions.