Show HN: FastGraphRAG – Better RAG using good old PageRank

Role of PageRank and Knowledge Graphs in RAG

  • Many commenters see it as fitting rather than ironic that “classic” IR techniques like PageRank complement LLMs: the LLM builds the semantic knowledge graph; PageRank navigates it.
  • Graph RAG is framed as superior to pure vector RAG for multi-hop/complex questions because it explicitly models relations between entities.
  • Some argue classic search/BM25 plus good product design is often underrated compared to bigger models and vector search “pixie dust.”

Implementation and Features of FastGraphRAG

  • Graphs are built by LLMs (entities, relations, descriptions, conflict resolution) and stored via python-igraph; connectors to graph databases (e.g., Neo4j-like tools, Memgraph) are planned.
  • Retrieval uses semantic search to seed nodes, then personalized PageRank to spread relevance; future plans include weighted edges and “negative PageRank”/repulsors.
  • The system is configurable via domain descriptions, example queries, and entity types to make graph construction more opinionated and task-specific.
  • Works with any OpenAI-compatible API; commenters ask for clearer Ollama examples and for a way to use it purely as a retriever.
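The retrieval step described above can be sketched in plain Python: semantic search picks seed entities, then personalized PageRank spreads relevance outward through the graph. (FastGraphRAG stores its graph with python-igraph; the toy graph, entity names, and seed choice below are invented for illustration, not taken from the project.)

```python
def personalized_pagerank(adj, seeds, damping=0.85, iters=100):
    """Power iteration where the random walk restarts only at `seeds`,
    so scores measure proximity to the query-relevant nodes."""
    reset = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in adj}
    rank = dict(reset)
    for _ in range(iters):
        # Restart mass goes only to the seed nodes.
        nxt = {n: (1.0 - damping) * reset[n] for n in adj}
        # Each node spreads its current rank evenly over its neighbors.
        for n, neighbors in adj.items():
            if neighbors:
                share = damping * rank[n] / len(neighbors)
                for m in neighbors:
                    nxt[m] += share
        rank = nxt
    return rank

# Toy undirected knowledge graph (entity -> neighbors).
graph = {
    "Alice":    ["ProjectX"],
    "Bob":      ["ProjectX", "ProjectY"],
    "Carol":    ["ProjectY"],
    "ProjectX": ["Alice", "Bob"],
    "ProjectY": ["Bob", "Carol"],
}

# Pretend semantic search matched the query to "ProjectX".
scores = personalized_pagerank(graph, seeds={"ProjectX"})
ranked = sorted(scores, key=scores.get, reverse=True)
```

One reading of the proposed “negative PageRank”/repulsors is to run a second pass seeded at repulsor nodes and subtract its scores, pushing relevance away from unwanted regions of the graph; that is an interpretation, not a documented feature.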
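The configurability mentioned above (domain descriptions, example queries, entity types) follows roughly this pattern, adapted from the project’s README; the exact parameter names, the `insert`/`query` interface, and the example values are assumptions and may differ between versions:

```python
from fast_graphrag import GraphRAG

grag = GraphRAG(
    working_dir="./example",
    # Domain description: steers the LLM toward task-specific entities.
    domain="Analyze customer support tickets for a SaaS product.",
    # Example queries: make graph construction more opinionated.
    example_queries="\n".join([
        "Which customers reported login failures?",
        "What issues are linked to the latest release?",
    ]),
    # Entity types: constrain what the LLM extracts into the graph.
    entity_types=["Customer", "Issue", "Product", "Release"],
)

# Ingest raw text; the LLM extracts entities/relations into the graph.
grag.insert("Ticket #1234: Alice at AcmeCo cannot log in since release 2.3 ...")

# Query: seed nodes via semantic search, rank via personalized PageRank.
print(grag.query("Which customers reported login failures?").response)
```

Running this requires an OpenAI-compatible API key to be configured, which is the dependence some commenters object to.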

Use Cases, Capabilities, and Limits

  • Suggested uses: multi-hop QA, codebase understanding, customer-ticket assistants, compliance-doc analysis at scale, podcast/sentiment queries.
  • The authors position it as preferable to stuffing everything into massive context windows, citing accuracy, cost, and latency constraints.
  • For approximate aggregation (e.g., “positive view of X across many podcasts”), they propose graph-based filtering, acknowledging results are “best-effort,” not exact.

Ecosystem, Integrations, and Alternatives

  • Commenters draw comparisons to HippoRAG, LightRAG, nano-graphrag, and Aider’s PageRank-on-code approach, and mention alternative centrality measures (Triangle Centrality, Authority Rank).
  • There’s interest in Obsidian integration, Memgraph connectors, and using the framework purely as a retriever.

Critiques, Concerns, and Open Questions

  • Several criticize the GitHub README as too marketing-heavy and light on technical explanation, benchmarks, and concrete examples.
  • Some worry about dependence on OpenAI APIs and their restrictive terms of service.
  • Others question whether RAG fundamentally struggles with implicit inferences, while defenders say that’s the LLM’s job once the right subgraph is retrieved.
  • Questions about performance, multi-hop benchmark results, multi-tenant graph handling, and long-running extraction times remain unanswered or unclear.