
Retrieval-Augmented Generation (RAG) powers most modern LLM applications, but production systems often reveal the same problems: broken context from chunking, embedding mismatches, and important information that never gets retrieved.
PageIndex takes a different approach.
Instead of relying on embeddings and vector databases, it lets the LLM reason through a document’s structure to find relevant information. Documents are transformed into a hierarchical semantic tree, allowing the model to navigate summaries first and drill deeper only where needed.
In this article, we’ll explore how PageIndex works, why traditional RAG fails in certain scenarios, and how reasoning-based retrieval can improve accuracy without using embeddings.
What is PageIndex?
PageIndex transforms long documents into a hierarchical semantic tree, essentially a Table of Contents designed for LLMs instead of humans.
Rather than splitting documents into arbitrary chunks, PageIndex preserves the document’s natural structure. Each section becomes a node containing:
- A title
- A concise summary
- The original content
This allows the LLM to first understand the document at a high level, then navigate deeper into only the most relevant sections.
Traditional RAG systems retrieve information using embedding similarity.
PageIndex retrieves information through structured reasoning over summaries, making retrieval more interpretable, efficient, and context-aware.
The Problem With Traditional RAG
Most RAG systems do a vector search loop: split documents into chunks → embed chunks → store in a vector DB → embed the query → retrieve the “closest” chunks.
It works when the query wording matches the document. In practice, three issues show up:
- Chunking breaks context (ideas get split across chunks that never get retrieved together).
- Similarity ≠ intent (different phrasing can hide the right passage).
- Weak recall for small details (a key sentence buried in a larger section may never score high enough).
Result: the answer is in the source, but retrieval fails to surface it.
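To see why, here is a minimal sketch of that loop in Python. The chunk size, embedding model, and top-k are illustrative choices, not anything PageIndex ships; the point is that a single key sentence buried in one chunk has to out-score every other chunk to surface at all:

```python
# A minimal sketch of the traditional RAG loop described above.
# Chunk size, model name, and top-k are illustrative, not PageIndex code.
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

def retrieve(document: str, query: str, chunk_size: int = 500, k: int = 3) -> list[str]:
    # 1. Split into fixed-size chunks: this is where context gets broken.
    chunks = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]
    # 2-3. Embed the chunks and the query.
    chunk_vecs, query_vec = embed(chunks), embed([query])[0]
    # 4. Rank by cosine similarity: "closest" is not the same as "relevant".
    scores = chunk_vecs @ query_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]
```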
How PageIndex Solves This
PageIndex replaces similarity search with a structure-first retrieval loop:
- Index as a semantic tree (sections → nodes with titles, summaries, and content).
- Reason over summaries to choose the right branch for a query.
- Retrieve full text only for selected nodes, then generate the answer grounded in that content.
In short: summaries → selection → deep read → answer.
No Chunking and No Vector Database
PageIndex drops two RAG defaults (chunking and a vector database) by keeping the document intact as a structured tree.
Benefits:
- Preserves context (no arbitrary splits).
- Avoids embedding mismatch (retrieval is reasoning-driven).
- More debuggable retrieval (you can see which sections were chosen).
How Data Is Stored in PageIndex
Instead of storing text chunks with embeddings, PageIndex stores structured nodes.
A simplified version of the stored JSON looks like this:
{
  "title": "Sports Interests",
  "summary": "This section discusses hobbies and activities the person is interested in.",
  "content": "The person expressed interest in learning football and outdoor sports."
}
Multiple nodes form a tree structure, representing the entire document hierarchy.
This structure allows the LLM to quickly scan summaries and decide where to retrieve information.
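As a rough illustration, flattening the tree into an indented outline of titles and summaries gives the model a compact map to reason over. The traversal helper and file name below are ours, not part of PageIndex; the field names follow the JSON above:

```python
import json

def collect_summaries(node: dict, depth: int = 0) -> list[str]:
    """Flatten the tree into an indented list of titles and summaries."""
    lines = [f"{'  ' * depth}{node['title']}: {node['summary']}"]
    for child in node.get("nodes", []):
        lines.extend(collect_summaries(child, depth + 1))
    return lines

tree = json.load(open("document_tree.json"))
# The LLM sees only this compact outline, not the full document.
print("\n".join(collect_summaries(tree)))
```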
Implementing PageIndex: From Setup to Retrieval
Enough theory. Here is how you actually run it.
PageIndex is open source and self-hostable. The full pipeline, from indexing a PDF to querying it with reasoning-based retrieval, takes fewer than ten lines of setup.
Step 1: Install Dependencies
Clone the repo and install requirements.
git clone https://github.com/VectifyAI/PageIndex.git
cd PageIndex
pip install -r requirements.txt

The requirements are intentionally minimal. LiteLLM handles multi-provider LLM support, so you can use OpenAI, Anthropic, Gemini, or any compatible provider. There is no vector database client to install because there is no vector database.
Step 2: Set Your LLM API Key
Create a .env file in the root directory.
OPENAI_API_KEY=your_openai_key_here
PageIndex routes through LiteLLM, so the same setup works if you swap OpenAI for another provider. Just change the key and the model flag.
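For example, assuming you have an Anthropic key, the swap might look like this (the model string is illustrative; check LiteLLM's docs for supported values):

ANTHROPIC_API_KEY=your_anthropic_key_here

python run_pageindex.py --pdf_path /path/to/your/document.pdf --model claude-3-5-sonnet-20240620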
Step 3: Index Your Document
Point it at a PDF and run.
python run_pageindex.py --pdf_path /path/to/your/document.pdf

That is it. PageIndex parses the document, builds the semantic tree, and writes the structured JSON output. No embedding step. No database write. Just a file on disk that the LLM can reason over.
Optional flags let you tune the behavior:
- --model: LLM model to use (default: gpt-4o-2024-11-20)
- --toc-check-pages: pages to scan for an existing table of contents
- --max-pages-per-node: max pages per node before it splits (default: 10)
- --max-tokens-per-node: max tokens per node (default: 20000)
- --if-add-node-summary: include summaries in each node (yes/no, default: yes)

For Markdown files, use --md_path instead of --pdf_path. PageIndex uses heading levels (##, ###) to determine hierarchy, so the file needs clean formatting.
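Putting that together, indexing a Markdown file with tighter nodes might look like this (the path and flag values are illustrative):

python run_pageindex.py --md_path /path/to/notes.md --max-tokens-per-node 8000 --if-add-node-summary yes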
What the Output Looks Like
After indexing, you get a JSON file representing the full document tree. Each node captures a section with its title, summary, page range, and nested children.
{
  "title": "Financial Stability",
  "node_id": "0006",
  "start_index": 21,
  "end_index": 22,
  "summary": "Covers the Federal Reserve's approach to monitoring financial vulnerabilities and coordinating with international bodies.",
  "nodes": [
    {
      "title": "Monitoring Financial Vulnerabilities",
      "node_id": "0007",
      "start_index": 22,
      "end_index": 28,
      "summary": "Details the indicators and frameworks used to assess systemic financial risk."
    },
    {
      "title": "Domestic and International Cooperation",
      "node_id": "0008",
      "start_index": 28,
      "end_index": 31,
      "summary": "Describes collaboration with regulators and foreign central banks in 2023."
    }
  ]
}

Notice what is not here: no embedding vector, no chunk ID, no similarity score. Just structure, summaries, and page references.
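Those page references are enough to ground an answer. As a hedged sketch, once a node is selected, pulling its text is a slice over the parsed pages. The helper is ours, and it assumes start_index/end_index are 1-based, inclusive page numbers, which you should verify against the repo:

```python
def node_text(node: dict, pages: list[str]) -> str:
    """Pull the raw text for a selected node from its page range."""
    # Assumes start_index/end_index are 1-based, inclusive page numbers.
    return "\n".join(pages[node["start_index"] - 1 : node["end_index"]])

# e.g. node_text(selected_node, pdf_pages) feeds the generation step
```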
Step 4: Query the Document (Agentic RAG)
Install the optional dependency:
pip install openai-agents

Run the demo:
python examples/agentic_vectorless_rag_demo.py

In a simple experiment, a 500+ line document contained a single line mentioning football as an interest.

The query was:
“Generate tags listing things they were interested in learning about or doing.”
The document was indexed in two ways:
- Traditional RAG using embeddings
- PageIndex reasoning-based retrieval
Even when the entire document was embedded without chunking, the traditional RAG system failed to identify football as a relevant interest.
However, the PageIndex system correctly retrieved the node containing that information.
Why?
Because the LLM reasoned through the node summaries first, rather than relying purely on embedding similarity.
How the Retrieval Loop Actually Works
When a query comes in, the LLM first scans only the top-level node summaries, not the full document. It picks the relevant branches, drills deeper through sub-node summaries, and reads the full content only once it reaches the right section.
Tree search, not similarity search. The model navigates a document the way a person would: scan the headings, go deep only where it matters.
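A hedged sketch of that loop, assuming the node JSON shown earlier. The prompt, helper names, and selection format are ours; the agentic demo in the repo implements its own version:

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def pick_children(query: str, node: dict) -> list[dict]:
    """Ask the LLM which child sections are worth exploring for this query."""
    menu = "\n".join(
        f"{c['node_id']}: {c['title']} - {c['summary']}" for c in node["nodes"]
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": f"Query: {query}\nSections:\n{menu}\n"
                       "Reply with only the relevant node_ids, comma-separated.",
        }],
    )
    chosen = {s.strip() for s in resp.choices[0].message.content.split(",")}
    return [c for c in node["nodes"] if c["node_id"] in chosen]

def tree_search(query: str, node: dict) -> list[dict]:
    """Descend the tree; full content is read only at the selected leaves."""
    if not node.get("nodes"):  # leaf node: return it for a deep read
        return [node]
    hits = []
    for child in pick_children(query, node):
        hits.extend(tree_search(query, child))
    return hits

tree = json.load(open("document_tree.json"))
relevant = tree_search("What interests did they mention?", tree)
```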
Self-Hosted vs Cloud
The open-source version uses standard PDF parsing, which works well for clean, text-based documents. For complex PDFs with tables, multi-column layouts, or scanned pages, the PageIndex Cloud API uses an enhanced OCR and tree-building pipeline that produces higher-quality node structures.
Both expose the same interface. If you are prototyping, start local. If you are moving to production with difficult documents, the cloud API is a drop-in upgrade.
Accuracy vs Retrieval Speed
PageIndex is more precise, but slower by design. Traditional RAG is ~2.6× faster, because PageIndex runs three steps instead of one: reasoning over summaries, selecting the right nodes, then reading the full content to generate the answer.
Each step adds latency. Each step is also what makes it accurate. If missing one detail has real consequences, the extra time is worth it.
When PageIndex Works Best
PageIndex works best when the right answer matters more than a fast one.
- Long structured documents: research papers, legal text, and technical docs already have a natural hierarchy. PageIndex uses that structure instead of fighting it.
- Complex queries: when intent matters more than keyword matching, reasoning-based retrieval wins over embedding similarity.
- Knowledge exploration: open-ended questions feel natural here. The LLM orients at the summary level first, then goes deep, just like a person would.
When Traditional RAG May Still Be Better
PageIndex isn't for every use case. Vector-based RAG still wins when:
- Latency is critical - sub-second responses leave no room for multiple reasoning steps.
- Scale is massive - searching millions of documents is where vector databases are purpose-built to excel.
- Real-time search - live search, autocomplete, and high-throughput pipelines need vector speed.
Conclusion
RAG isn't broken, but the assumption that embeddings and vector databases are the only way to do retrieval is worth questioning.
PageIndex shows that when you let the LLM reason about where to look, not just what to retrieve, accuracy improves dramatically. The trade-off is speed, and that's a fair one to make in domains where a missed detail or a wrong answer has real consequences.
As LLMs get faster and cheaper, the case for reasoning-based retrieval only gets stronger. PageIndex is a working proof that this approach isn't just theoretical: it's practical, deployable, and already outperforming traditional RAG where it counts.
If your current pipeline is fast but not accurate enough, it might be time to rethink how retrieval works. → Try PageIndex on GitHub
Frequently Asked Questions
1. Does PageIndex completely remove embeddings?
Yes. PageIndex doesn't use embedding models or vector similarity at any stage. The LLM reasons directly over node summaries to find what's relevant, no vectors involved.
2. How is PageIndex different from traditional RAG?
Traditional RAG converts your document into chunks, embeds them, and retrieves by similarity score. PageIndex skips all of that. Instead, the LLM reads structured summaries and reasons about which sections actually answer the question. It's the difference between pattern matching and understanding.
3. Why does PageIndex take more time?
Because it thinks before it retrieves. Traditional RAG runs one similarity search and returns results. PageIndex reasons over summaries, selects the right nodes, then reads the content, three steps instead of one. That adds latency, but also accuracy.
4. Is PageIndex scalable for large datasets?
It works best with structured, long-form documents rather than massive document collections. If you're indexing millions of files and need real-time retrieval, a vector database is still the right tool. PageIndex is built for depth, not scale.
5. When should I use PageIndex over traditional RAG?
When being wrong is not an option. If your use case involves complex documents, nuanced queries, or domains where a missed detail has real consequences (legal, financial, medical, technical), PageIndex is worth the extra latency.