I Rewrote a Python RAG Library in Rust | HackerNoon

News Room · Published 26 February 2026

Chunk-based RAG is broken for structured documents. The fix is simpler than you think, and faster than the original.


A few weeks ago, I came across an article by Agent Native about vectorless RAG. The framing stuck with me: most RAG systems turn documents into “semantic confetti” — chunk everything, embed everything, then hope an ANN search surfaces the right bits. For large document bases, this becomes semantic hide-and-seek across thousands of chunks, burning tokens and confidently hallucinating near the answer.

Digging deeper, I found that PageIndex from VectifyAI implements exactly this alternative approach. Instead of embedding chunks, it treats the document’s own heading structure as the retrieval primitive. Represent the document as a hierarchical tree, hand the outline to your LLM, let it navigate to the right section, pull that section’s text. No embeddings. No ANN. Just the document telling you how it’s organized.

I had been building agents over financial documents and hitting exactly this problem. I tried PageIndex, it worked, and then I rewrote it in Rust.

This is the story of what happened.

Why chunk-based RAG fails on structured documents

Take a 10-K filing. It has a section on risk factors, inside which there’s a subsection on liquidity risk, inside which there’s a paragraph about covenant breaches. When you split this into 512-token chunks, those three levels of context get shattered. The chunk about covenant breaches no longer knows it’s inside liquidity risk, which is inside risk factors.

At query time, “what are the company’s covenant breach risks” might surface three chunks from different sections that share vocabulary but don’t form a coherent answer. The retrieval is technically close but contextually wrong. You end up with an LLM that has all the right words and none of the right context.

Structured documents — financial reports, legal filings, technical manuals, research papers — already tell you how they’re organized. Every heading is a natural retrieval boundary. PageIndex just respects that structure.

How PageIndex works

The approach is straightforward. Parse the markdown document into a tree of nodes, one per heading. Each node holds its title, body text, and children. Generate a compact outline of the tree. At query time:

  1. Send the outline to your LLM with the question
  2. Ask it to return the node ID of the most relevant section
  3. Fetch that node directly
  4. Pass the node’s text to your LLM for the final answer

The outline looks like this:

[1] Annual Report 2023
[1.1] Financial Highlights
[1.2] Risk Factors
[1.2.1] Market Risk
[1.2.2] Liquidity Risk
[1.2.3] Regulatory Risk
[1.3] Management Discussion

The LLM reads this and says “1.2.2” — you fetch that node and you’re done. Precise, explainable, and no embedding infrastructure required.
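The indexing step above can be sketched in a few lines. This is a hedged illustration, not the libraries' actual parser: it assumes the document's structure lives entirely in `#`-style markdown headings, and it ignores body text, which in the real implementations attaches to each node.

```python
# Minimal sketch: walk markdown headings, assign dot-notation IDs with a
# per-depth counter stack, and emit the outline shown above.
def build_outline(markdown: str) -> list[tuple[str, str]]:
    counters = []  # one counter per heading depth
    outline = []
    for line in markdown.splitlines():
        if not line.startswith("#"):
            continue  # body text would attach to the current node
        depth = len(line) - len(line.lstrip("#"))
        title = line.lstrip("#").strip()
        counters = counters[:depth]                # drop deeper levels
        counters += [0] * (depth - len(counters))  # open new levels
        counters[depth - 1] += 1
        outline.append((".".join(str(c) for c in counters), title))
    return outline

doc = """# Annual Report 2023
## Financial Highlights
## Risk Factors
### Market Risk
### Liquidity Risk
"""
for node_id, title in build_outline(doc):
    print(f"[{node_id}] {title}")
# [1] Annual Report 2023
# [1.1] Financial Highlights
# [1.2] Risk Factors
# [1.2.1] Market Risk
# [1.2.2] Liquidity Risk
```

The compact outline is the entire retrieval index: no vectors are stored anywhere.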

VectifyAI’s Mafin 2.5 system, powered by PageIndex, achieved 98.7% accuracy on the FinanceBench benchmark. That’s the practical proof that the approach works at scale.

Why I rewrote it in Rust

A few reasons. I had already built fastrustrag — a Rust library for document deduplication that achieved 8–121x speedups over Python’s datasketch — so I had the toolchain and the workflow ready. I was also skeptical that the Python implementation would hold up under load, specifically for the index build and node retrieval operations that happen on every query.

Before writing a line of Rust I validated that there was actually a performance problem worth solving. The methodology I’ve been using for these projects: always benchmark the Python implementation first, identify the bottleneck, then build the Rust version. Don’t rewrite things for fun.

For PageIndex specifically, the bottleneck I expected was node retrieval. The Python library stores nodes in a flat list and does a linear scan to find a node by ID. That’s O(n). At 28 nodes it’s fine. At 765 nodes across a large document corpus it becomes measurably slow and, more importantly, wildly inconsistent at the tail.
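The difference between the two lookup strategies can be shown in a toy sketch. This is hypothetical illustration code, not either library's source; the node dicts and `Section N` titles are made up:

```python
# 765 nodes, matching the large-corpus benchmark below.
nodes = [{"id": f"{i:04d}", "title": f"Section {i}"} for i in range(765)]

def get_node_linear(node_id):
    # Python library's approach: O(n) scan over a flat list.
    for node in nodes:
        if node["id"] == node_id:
            return node
    return None

index = {node["id"]: node for node in nodes}

def get_node_hash(node_id):
    # pageindex-rs approach: O(1) hash lookup.
    return index.get(node_id)
```

Both return the same node; the difference is that the scan's cost grows with corpus size while the hash lookup's does not.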

Building pageindex-rs

The Rust implementation follows the same architecture: parse markdown into a tree, assign dot-notation node IDs (1.2.3 rather than 0012), store nodes in a HashMap for O(1) lookup, expose everything to Python via PyO3.

The dot-notation IDs turned out to matter more than I expected. When you show an LLM an outline with IDs like 1.2.3, it immediately understands the hierarchy — 1.2.3 is a child of 1.2, which is a child of 1. With zero-padded sequential IDs like 0012, the LLM just sees a number with no structural signal. This affected retrieval accuracy in the benchmarks, which I’ll get to.
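The structural signal is concrete: with dot notation, the parent of any node is recoverable from the ID string alone, so the LLM (or your code) never needs the tree to reason about hierarchy. A small hypothetical sketch:

```python
# Sequential IDs carry no structure; dot IDs encode the parent as a prefix.
sequential = ["0001", "0002", "0003", "0004"]   # hierarchy invisible
dotted = ["1", "1.1", "1.2", "1.2.1"]           # hierarchy readable

def parent(node_id: str):
    # With dot IDs the parent is just the ID minus its last segment.
    return node_id.rsplit(".", 1)[0] if "." in node_id else None

print(parent("1.2.1"))  # 1.2
print(parent("1"))      # None (root)
```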

The Python API looks like this:

import pageindex_rs
index = pageindex_rs.PageIndex.from_file("annual_report", "report.md")
# Feed this to your LLM
print(index.outline())
# [1] Annual Report 2023
# [1.1] Financial Highlights
# [1.2] Risk Factors
# [1.2.1] Market Risk
# [1.2.2] Liquidity Risk
# Fetch the node your LLM returned
node = index.get_node("1.2.2")
print(node.title) # Liquidity Risk
print(node.text) # The company's liquidity position…
print(node.breadcrumb) # ['Risk Factors', 'Liquidity Risk']
# Get a full section with all subsections merged
section = index.get_node_with_children("1.2")

The retrieval loop is a handful of lines:

outline = index.outline()
node_id = llm(f"""
Document outline:
{outline}
Question: {user_query}
Return only the node_id of the most relevant section. Nothing else.
""").strip()
result = index.get_node(node_id)
# Pass result.text to your LLM for the final answer
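One practical caveat with this loop, from my own use rather than either library's API: models sometimes wrap the ID in brackets or prose despite the "nothing else" instruction, so it is worth normalizing the reply before the lookup. A hedged sketch:

```python
import re

def parse_node_id(raw: str):
    # Extract the first dot-notation ID (e.g. "1.2.2") from an LLM reply,
    # tolerating wrappers like "[1.2.2]" or "The answer is 1.2".
    match = re.search(r"\d+(?:\.\d+)*", raw)
    return match.group(0) if match else None

print(parse_node_id("[1.2.2]"))  # 1.2.2
```

If `parse_node_id` returns `None`, re-prompting is cheaper than passing a bad ID to `get_node`.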

The benchmarks

I ran three benchmark suites across three document sizes — a 42KB single article, a 395KB multi-article corpus, and a 1055KB large corpus. 500 iterations per build test, 1000 random lookups per retrieval test. The full notebook is in the repo.
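The statistics reported below (mean, stdev, p99, max) can be gathered with a harness along these lines. This is a simplified sketch of the methodology, not the notebook itself; the repo's notebook is the authoritative version:

```python
import statistics
import time

def bench(fn, iterations=500):
    # Time `fn` repeatedly and report the distribution in milliseconds.
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    return {
        "mean": statistics.mean(samples),
        "stdev": statistics.stdev(samples),
        "p99": samples[int(len(samples) * 0.99) - 1],
        "max": samples[-1],
    }
```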

Index build speed

| Document size | Rust mean | Python mean | Speedup |
|---|---|---|---|
| 42 KB | 0.207 ms | 0.153 ms | 0.74x ❌ |
| 395 KB | 0.873 ms | 1.369 ms | 1.57x |
| 1055 KB | 2.549 ms | 4.278 ms | 1.68x |

Below ~200KB, PyO3 FFI overhead cancels the parsing speedup — Rust actually loses at small scale. I’m reporting this honestly because benchmarks that only show wins aren’t useful. At realistic document sizes the picture flips.

The more important number is consistency. This is what production systems actually care about:

| Document size | Rust p99 | Python p99 | Rust max | Python max |
|---|---|---|---|---|
| 42 KB | 1.3 ms | 0.2 ms | 17.4 ms | 0.4 ms |
| 395 KB | 1.1 ms | 1.5 ms | 1.3 ms | 1.6 ms |
| 1055 KB | 2.8 ms | 21.0 ms | 3.7 ms | 42.9 ms |

At 1055KB, Python’s p99 is 21ms and its max is 42ms. Rust’s p99 is 2.8ms and max is 3.7ms. Python’s standard deviation at that size is 2.78ms versus Rust’s 0.10ms — 27x more variable. In a pipeline processing hundreds of documents those spikes accumulate into real latency.

Node retrieval speed

This is where the O(1) vs O(n) gap shows most clearly:

| Document size | Nodes | Rust mean | Python mean | Speedup |
|---|---|---|---|---|
| 42 KB | 28 | 0.0072 ms | 0.0060 ms | 0.83x |
| 395 KB | 261 | 0.0119 ms | 0.0272 ms | 2.29x |
| 1055 KB | 765 | 0.0216 ms | 0.0686 ms | 3.18x |

At 28 nodes, linear scan is fast enough that the HashMap overhead tips Rust slightly negative. At 765 nodes, Rust is 3.18x faster. The gap keeps widening — at 5000 nodes in a combined corpus it would be around 10x.

Answer accuracy

I tested both on 10 financial questions against a ~3MB document corpus using the same LLM for both:

| Implementation | Correct |
|---|---|
| pageindex-rs | 9 / 10 |
| PageIndex (Python) | 7 / 10 |

The accuracy difference comes down to node ID format. 1.2.3 gives the LLM structural signal for free. 0012 does not. Small design decisions compound.

What I learned

Benchmark before you build. The small document results prove that Rust isn’t automatically faster — FFI overhead is real and it dominates at small scales. If your documents are consistently under 200KB, the Python library is probably fine.

Consistency matters more than mean speed. The headline speedup numbers are nice, but the stdev and p99 tell the real story for production. A system that’s 1.68x faster on average and whose standard deviation is 27x smaller is a much better choice than the mean alone suggests.

Node ID design affects LLM behavior. I didn’t expect the dot-notation change to move accuracy by two questions out of ten, but it did. How you present structure to an LLM matters in ways that are hard to predict without actually running the experiment.

Try it

pip install pageindex-rs
  • GitHub: https://github.com/Manojython/pageindex-rs
  • PyPI: https://pypi.org/project/pageindex-rs/
  • Original PageIndex by VectifyAI: https://github.com/VectifyAI/PageIndex
  • Agent Native’s article that started this: Vectorless RAG for Agents

Thanks for reading 🙂
