I Rewrote a Python RAG Library in Rust | HackerNoon

News Room · Published 26 February 2026 · Last updated 4:32 PM

Chunk-based RAG is broken for structured documents. The fix is simpler than you think - and faster than the original.


A few weeks ago, I came across an article by Agent Native about vectorless RAG. The framing stuck with me: most RAG systems turn documents into “semantic confetti” — chunk everything, embed everything, then hope an ANN search surfaces the right bits. For large document bases, this becomes semantic hide-and-seek across thousands of chunks, burning tokens and confidently hallucinating near the answer.

Digging deeper, I found that PageIndex from VectifyAI had a clean implementation of the alternative approach. Instead of embedding chunks, it treats the document’s own heading structure as the retrieval primitive. Represent the document as a hierarchical tree, hand the outline to your LLM, let it navigate to the right section, pull that section’s text. No embeddings. No ANN. Just the document telling you how it’s organized.

I had been building agents over financial documents and hitting exactly this problem. I tried PageIndex, it worked, and then I rewrote it in Rust.

This is the story of what happened.

Why chunk-based RAG fails on structured documents

Take a 10-K filing. It has a section on risk factors, inside which there’s a subsection on liquidity risk, inside which there’s a paragraph about covenant breaches. When you split this into 512-token chunks, those three levels of context get shattered. The chunk about covenant breaches no longer knows it’s inside liquidity risk, which is inside risk factors.

At query time, “what are the company’s covenant breach risks” might surface three chunks from different sections that share vocabulary but don’t form a coherent answer. The retrieval is technically close but contextually wrong. You end up with an LLM that has all the right words and none of the right context.
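The failure mode is easy to reproduce. Here is a minimal sketch, using fixed-size word windows as a stand-in for 512-token chunks (the document text is invented for illustration):

```python
# Naive fixed-size chunking: split a structured document into windows,
# discarding the heading hierarchy each passage lives under.
doc = (
    "Risk Factors. Liquidity Risk. The company may breach covenants on its "
    "revolving credit facility if cash flow declines. Market Risk. Rates..."
)

def chunk(text: str, size: int = 12) -> list:
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

chunks = chunk(doc)
# The covenant sentence is split across two chunks, and neither chunk
# carries "Risk Factors > Liquidity Risk" as context.
print(chunks[1])
```

Each chunk is a bare string: nothing in it records which heading it came from, which is exactly the context the retriever would need.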

Structured documents — financial reports, legal filings, technical manuals, research papers — already tell you how they’re organized. Every heading is a natural retrieval boundary. PageIndex just respects that structure.

How PageIndex works

The approach is straightforward. Parse the markdown document into a tree of nodes, one per heading. Each node holds its title, body text, and children. Generate a compact outline of the tree. At query time:

  1. Send the outline to your LLM with the question
  2. Ask it to return the node ID of the most relevant section
  3. Fetch that node directly
  4. Pass the node’s text to your LLM for the final answer

The outline looks like this:

```
[1] Annual Report 2023
[1.1] Financial Highlights
[1.2] Risk Factors
[1.2.1] Market Risk
[1.2.2] Liquidity Risk
[1.2.3] Regulatory Risk
[1.3] Management Discussion
```

The LLM reads this and says “1.2.2” — you fetch that node and you’re done. Precise, explainable, and no embedding infrastructure required.

VectifyAI’s Mafin 2.5 system, powered by PageIndex, achieved 98.7% accuracy on the FinanceBench benchmark. That’s the practical proof that the approach works at scale.

Why I rewrote it in Rust

A few reasons. I had already built fastrustrag — a Rust library for document deduplication that achieved 8–121x speedups over Python’s datasketch — so I had the toolchain and the workflow ready. I was also skeptical that the Python implementation would hold up under load, specifically for the index build and node retrieval operations that happen on every query.

Before writing a line of Rust I validated that there was actually a performance problem worth solving. The methodology I’ve been using for these projects: always benchmark the Python implementation first, identify the bottleneck, then build the Rust version. Don’t rewrite things for fun.

For PageIndex specifically, the bottleneck I expected was node retrieval. The Python library stores nodes in a flat list and does a linear scan to find a node by ID. That’s O(n). At 28 nodes it’s fine. At 765 nodes across a large document corpus it becomes measurably slow and, more importantly, wildly inconsistent at the tail.
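To make the complexity difference concrete, here is a pure-Python sketch of both lookup strategies (the `Node` shape is illustrative, not the library's internal type):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    node_id: str
    title: str

nodes = [Node(str(i), f"Section {i}") for i in range(765)]

# O(n): scan a flat list until the ID matches (the linear-scan approach).
def get_node_scan(nodes: List[Node], node_id: str) -> Optional[Node]:
    for n in nodes:
        if n.node_id == node_id:
            return n
    return None

# O(1): build a dict index once, then look up by key
# (the role a HashMap plays in the Rust version).
index = {n.node_id: n for n in nodes}

def get_node_map(node_id: str) -> Optional[Node]:
    return index.get(node_id)

assert get_node_scan(nodes, "764") is get_node_map("764")
```

The scan's worst case touches every node on every query; the dict lookup's cost is flat regardless of corpus size.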

Building pageindex-rs

The Rust implementation follows the same architecture: parse markdown into a tree, assign dot-notation node IDs (1.2.3 rather than 0012), store nodes in a HashMap for O(1) lookup, expose everything to Python via PyO3.
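A stack-based pass is one common way to turn heading levels into such a tree. Here is a minimal Python sketch of the idea (not the actual pageindex-rs parser, which runs in Rust and also captures body text):

```python
import re

# Build a heading tree with a stack of open nodes: each heading nests
# under the nearest earlier heading of a shallower level.
def parse_headings(markdown: str) -> list:
    root = []
    stack = []  # currently open nodes, shallowest first
    for line in markdown.splitlines():
        m = re.match(r"^(#{1,6})\s+(.*)", line)
        if not m:
            continue  # body text skipped in this sketch
        level, title = len(m.group(1)), m.group(2)
        node = {"level": level, "title": title, "children": []}
        # Close any open nodes at the same or deeper level.
        while stack and stack[-1]["level"] >= level:
            stack.pop()
        (stack[-1]["children"] if stack else root).append(node)
        stack.append(node)
    return root

md = "# Annual Report 2023\n## Risk Factors\n### Liquidity Risk\n"
tree = parse_headings(md)
# tree[0]["children"][0]["children"][0]["title"] == "Liquidity Risk"
```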

The dot-notation IDs turned out to matter more than I expected. When you show an LLM an outline with IDs like 1.2.3, it immediately understands the hierarchy — 1.2.3 is a child of 1.2, which is a child of 1. With zero-padded sequential IDs like 0012, the LLM just sees a number with no structural signal. This affected retrieval accuracy in the benchmarks, which I’ll get to.
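Assigning those IDs is a small recursive pass over the tree. A sketch with the same caveat (illustrative dict-based nodes, not the library's internals):

```python
# Assign hierarchical dot-notation IDs: the k-th child of node "1.2"
# becomes "1.2.k"; top-level nodes get "1", "2", ...
def assign_ids(children: list, prefix: str = "") -> None:
    for i, node in enumerate(children, start=1):
        node["id"] = f"{prefix}.{i}" if prefix else str(i)
        assign_ids(node.get("children", []), node["id"])

tree = [{"title": "Annual Report 2023", "children": [
    {"title": "Financial Highlights", "children": []},
    {"title": "Risk Factors", "children": [
        {"title": "Market Risk", "children": []},
        {"title": "Liquidity Risk", "children": []},
    ]},
]}]
assign_ids(tree)
# Liquidity Risk ends up as "1.2.2": second child of the second child of node 1.
```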

The Python API looks like this:

```python
import pageindex_rs

index = pageindex_rs.PageIndex.from_file("annual_report", "report.md")

# Feed this to your LLM
print(index.outline())
# [1] Annual Report 2023
# [1.1] Financial Highlights
# [1.2] Risk Factors
# [1.2.1] Market Risk
# [1.2.2] Liquidity Risk

# Fetch the node your LLM returned
node = index.get_node("1.2.2")
print(node.title)       # Liquidity Risk
print(node.text)        # The company's liquidity position…
print(node.breadcrumb)  # ['Risk Factors', 'Liquidity Risk']

# Get a full section with all subsections merged
section = index.get_node_with_children("1.2")
```

The retrieval loop is a handful of lines:

```python
outline = index.outline()

node_id = llm(f"""
Document outline:
{outline}

Question: {user_query}

Return only the node_id of the most relevant section. Nothing else.
""").strip()

result = index.get_node(node_id)
# Pass result.text to your LLM for the final answer
```

The benchmarks

I ran three benchmark suites across three document sizes — a 42KB single article, a 395KB multi-article corpus, and a 1055KB large corpus. 500 iterations per build test, 1000 random lookups per retrieval test. The full notebook is in the repo.

Index build speed

| Document size | Rust mean | Python mean | Speedup |
|---|---|---|---|
| 42 KB | 0.207 ms | 0.153 ms | 0.74x ❌ |
| 395 KB | 0.873 ms | 1.369 ms | 1.57x |
| 1055 KB | 2.549 ms | 4.278 ms | 1.68x |

Below ~200KB, PyO3 FFI overhead cancels the parsing speedup — Rust actually loses at small scale. I’m reporting this honestly because benchmarks that only show wins aren’t useful. At realistic document sizes the picture flips.

The more important number is consistency. This is what production systems actually care about:

| Document size | Rust p99 | Python p99 | Rust max | Python max |
|---|---|---|---|---|
| 42 KB | 1.3 ms | 0.2 ms | 17.4 ms | 0.4 ms |
| 395 KB | 1.1 ms | 1.5 ms | 1.3 ms | 1.6 ms |
| 1055 KB | 2.8 ms | 21.0 ms | 3.7 ms | 42.9 ms |

At 1055KB, Python’s p99 is 21ms and its max is 42ms. Rust’s p99 is 2.8ms and max is 3.7ms. Python’s standard deviation at that size is 2.78ms versus Rust’s 0.10ms — 27x more variable. In a pipeline processing hundreds of documents those spikes accumulate into real latency.

Node retrieval speed

This is where the O(1) vs O(n) gap shows most clearly:

| Document size | Nodes | Rust mean | Python mean | Speedup |
|---|---|---|---|---|
| 42 KB | 28 | 0.0072 ms | 0.0060 ms | 0.83x |
| 395 KB | 261 | 0.0119 ms | 0.0272 ms | 2.29x |
| 1055 KB | 765 | 0.0216 ms | 0.0686 ms | 3.18x |

At 28 nodes, linear scan is fast enough that the HashMap overhead tips Rust slightly negative. At 765 nodes, Rust is 3.18x faster. The gap keeps widening — at 5000 nodes in a combined corpus it would be around 10x.
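You can reproduce the shape of that curve with a quick pure-Python timing sketch (a stand-in comparison of the two data structures, not the benchmark notebook from the repo):

```python
import timeit

n = 765
nodes = [{"id": str(i)} for i in range(n)]
index = {node["id"]: node for node in nodes}
target = str(n - 1)  # worst case for the linear scan

# Linear scan over the flat list vs. direct dict lookup, 1000 calls each.
scan = timeit.timeit(
    lambda: next(nd for nd in nodes if nd["id"] == target), number=1000
)
lookup = timeit.timeit(lambda: index[target], number=1000)
# The scan's cost grows with n; the dict lookup's stays flat.
```

Rerunning with larger `n` shows the scan time climbing roughly linearly while the lookup time barely moves, which is the widening gap the table reports.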

Answer accuracy

I tested both on 10 financial questions against a ~3MB document corpus using the same LLM for both:

| Implementation | Correct |
|---|---|
| pageindex-rs | 9 / 10 |
| PageIndex (Python) | 7 / 10 |

The accuracy difference comes down to node ID format. 1.2.3 gives the LLM structural signal for free. 0012 does not. Small design decisions compound.

What I learned

Benchmark before you build. The small document results prove that Rust isn’t automatically faster — FFI overhead is real and it dominates at small scales. If your documents are consistently under 200KB, the Python library is probably fine.

Consistency matters more than mean speed. The headline speedup numbers are nice, but the stdev and p99 tell the real story for production. A system that’s 1.68x faster on average and 27x less variable (by stdev) is a much better choice than the mean alone suggests.

Node ID design affects LLM behavior. I didn’t expect the dot-notation change to move accuracy by two questions out of ten, but it did. How you present structure to an LLM matters in ways that are hard to predict without actually running the experiment.

Try it

pip install pageindex-rs
  • GitHub: https://github.com/Manojython/pageindex-rs
  • PyPI: https://pypi.org/project/pageindex-rs/
  • Original PageIndex by VectifyAI: https://github.com/VectifyAI/PageIndex
  • Agent Native’s article that started this: Vectorless RAG for Agents

Thanks for reading 🙂
