The Big Announcement
On April 6, 2026, Ben Sigman, CEO of Bitcoin lending platform Libre Labs and self-described friend of Jovovich, posted on X announcing MemPalace, an open-source AI memory system built with Claude. The pitch was ambitious. From Sigman’s launch post:
“My friend Milla Jovovich and I spent months creating an AI memory system with Claude. It just posted a perfect score on the standard benchmark — beating every product in the space, free or paid.”
The headline numbers: 100% on LongMemEval (500/500 questions), 100% on LoCoMo, 92.9% on ConvoMem — allegedly more than double Mem0’s score. No API key required, no cloud dependency, MIT licensed, runs locally.
Jovovich (yes, the Resident Evil and Fifth Element actress) posted a video explanation on her Instagram. Sigman’s follow-up tweet framed her dual life:
“By day she’s filming action movies, walking Miu Miu fashion shows, and being a mom. By night, she’s coding.”
The topic trended on X within hours. The internet did what the internet does. It went viral. Then it went adversarial.
So, What’s New?
Strip away the celebrity angle and the inflated numbers, and MemPalace has a genuinely novel architectural idea worth paying attention to.
Most AI memory systems – Mem0, Zep, Letta – let an LLM decide what’s worth remembering. They extract facts like “user prefers Postgres” and discard the original conversation. MemPalace takes the opposite bet: store everything verbatim, then make it searchable. The README states the philosophy plainly:
“Other memory systems try to fix this by letting AI decide what’s worth remembering. It extracts ‘user prefers Postgres’ and throws away the conversation where you explained why. MemPalace takes a different approach: store everything, then make it findable.”
The organizing metaphor is the ancient Greek method of loci: a “memory palace.” Your data gets sorted into Wings (top-level topics like a person or project), Rooms (sub-topics), and Halls (memory types: facts, events, discoveries, preferences, advice). It’s built on a single ChromaDB collection plus a SQLite knowledge graph. Two runtime dependencies. Twenty-one Python files.
The write path is the interesting part: zero LLM involvement. All extraction, classification, and compression is deterministic. No API calls on ingest. Chunking is fixed at 800 characters with 100-character overlap. Room assignment follows a priority cascade: folder path, then filename, then keyword frequency, then a fallback. This means you can mine months of ChatGPT or Claude exports completely offline.
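A minimal sketch of what a deterministic write path like this could look like. The function names and the keyword table below are hypothetical illustrations of the described cascade, not MemPalace's actual code:

```python
def chunk_text(text, size=800, overlap=100):
    """Fixed-size chunking: 800-char windows with 100-char overlap."""
    chunks, i = [], 0
    while i < len(text):
        chunks.append(text[i:i + size])
        if i + size >= len(text):
            break
        i += size - overlap
    return chunks

# Hypothetical keyword table: room name -> trigger keywords.
ROOM_KEYWORDS = {
    "infra": ["postgres", "docker", "deploy"],
    "ml": ["embedding", "model", "training"],
}

def assign_room(path, text):
    """Priority cascade: folder path, then filename, then keyword
    frequency, then a fallback room. No LLM call anywhere."""
    parts = path.lower().split("/")
    folders, filename = parts[:-1], parts[-1]
    for room in ROOM_KEYWORDS:              # 1) folder path match
        if room in folders:
            return room
    for room in ROOM_KEYWORDS:              # 2) filename match
        if room in filename:
            return room
    lowered = text.lower()                  # 3) keyword frequency
    counts = {room: sum(lowered.count(k) for k in kws)
              for room, kws in ROOM_KEYWORDS.items()}
    best = max(counts, key=counts.get)
    if counts[best] > 0:
        return best
    return "misc"                           # 4) fallback
```

Because every step is a pure function of the input text and path, the same export always produces the same palace, and nothing ever leaves the machine.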
The read path uses a 4-layer memory stack. Layer 0 loads your identity file (~50 tokens). Layer 1 loads compressed top-15 memories (~120 tokens). Layer 2 retrieves wing-scoped context on the topic trigger. Layer 3 does a full semantic search. Wake-up cost: roughly 170 tokens. That’s genuinely low.
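As a rough illustration of the layering (the function shape and the 4-characters-per-token estimate are my own, not the project's API), the read path could be assembled like this:

```python
def estimate_tokens(text):
    # Crude heuristic: ~4 characters per token.
    return len(text) // 4

def wake_up(identity, top_memories, wing_context=None, search_hits=None):
    """Assemble context layer by layer: cheap layers always load,
    expensive layers only on demand."""
    layers = [identity]                      # Layer 0: identity (~50 tokens)
    layers.append("\n".join(top_memories))   # Layer 1: compressed top-15 (~120 tokens)
    if wing_context:                         # Layer 2: wing-scoped, on topic trigger
        layers.append(wing_context)
    if search_hits:                          # Layer 3: full semantic search
        layers.append("\n".join(search_hits))
    context = "\n\n".join(layers)
    return context, estimate_tokens(context)
```

The point of the design is that an ordinary session only pays for Layers 0 and 1, which is where the roughly 170-token wake-up figure comes from; Layers 2 and 3 are loaded only when the conversation actually needs them.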
Nobody else in the AI memory space is doing the spatial-metaphor-as-organizing-principle thing. Nobody else has a fully offline write path. These are real differentiators. They just aren’t what the launch marketing focused on.
Why This Matters for Developers
Three things are worth your attention here, independent of whether MemPalace itself becomes a lasting tool:
- First, the “store everything” bet is architecturally sound and underexplored. The dominant approach in AI memory, LLM-extracted summaries, is lossy by design. You’re trusting a model to decide what matters at write time, before you know what you’ll need later. MemPalace’s retrieval-first approach sidesteps this. Independent testers confirmed a 96.6% retrieval score (recall@5) on LongMemEval’s raw mode, reproducible with no API needed. That’s a competitive number for a zero-cost local tool.
- Second, the local-first, zero-dependency philosophy matters. Two pip packages. No cloud. No API key for writes. MIT license. Your memories never leave your machine. In a landscape where Mem0 charges $20–200/month, Zep targets enterprise pricing, and most tools require sending your data to someone else’s infrastructure, MemPalace’s operational model is meaningfully different. If you’re building agentic workflows and want persistent memory without vendor lock-in, this architecture is worth studying even if you never use the tool itself.
- Third, the mining pipeline for existing chat exports is underrated. MemPalace can ingest your existing ChatGPT, Claude, and Slack histories and organize them into its palace structure. For developers who’ve accumulated months of context across multiple AI assistants, this is a practical capability most competing tools don’t offer.
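To make the mining step concrete, here is a sketch of flattening one conversation from a ChatGPT-style conversations.json export. Export schemas vary between versions, so treat the field names (`mapping`, `message`, `content.parts`) as assumptions about the common shape rather than a guaranteed contract:

```python
def flatten_conversation(convo):
    """Pull (role, text) pairs out of one exported conversation's
    node mapping, skipping empty or tool/system nodes."""
    messages = []
    for node in convo.get("mapping", {}).values():
        msg = node.get("message")
        if not msg:
            continue
        parts = msg.get("content", {}).get("parts", [])
        text = " ".join(p for p in parts if isinstance(p, str)).strip()
        if text:
            messages.append((msg["author"]["role"], text))
    return messages

# Minimal fabricated example of the assumed export shape.
sample = {
    "title": "postgres debugging",
    "mapping": {
        "a": {"message": {"author": {"role": "user"},
                          "content": {"parts": ["why does my index scan regress?"]}}},
        "b": {"message": None},
        "c": {"message": {"author": {"role": "assistant"},
                          "content": {"parts": ["Check your planner statistics."]}}},
    },
}
```

Each flattened message would then flow through the same offline chunking and room-assignment path as any other document, which is what makes bulk-mining old histories cheap.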
But Something Remains Unclear…
Okay, hear me out: the benchmark claims that made MemPalace famous are, at best, misleading. Why?
The 100% LongMemEval score measures the wrong thing
LongMemEval (ICLR 2025, UC Santa Barbara) is the gold standard for AI memory evaluation: 500 manually curated questions across ~115K tokens of chat history. Its primary metric is end-to-end QA accuracy: the system retrieves context, generates an answer, and GPT-4 judges it. MemPalace’s “100%” measures only retrieval recall@5. It never generates or judges an answer. For comparison, published end-to-end scores from real systems: EverMemOS at 83.0%, TiMem at 76.88%, Zep/Graphiti at 71.2%. MemPalace’s number lives in a different category entirely.
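The distinction matters because recall@k only asks whether the right session appears somewhere in the top k retrieved results; it never tests whether a correct answer gets generated. A toy scorer (illustrative only, not the benchmark harness) makes the gap concrete:

```python
def recall_at_k(ranked_ids, gold_id, k=5):
    """1 if the gold session appears in the top-k retrieved ids, else 0."""
    return int(gold_id in ranked_ids[:k])

def benchmark_recall(results, k=5):
    """results: list of (ranked_ids, gold_id) pairs, one per question."""
    return sum(recall_at_k(r, g, k) for r, g in results) / len(results)

# Two questions: one hit (gold at rank 2), one miss.
results = [
    (["s4", "s1", "s9", "s2", "s7"], "s1"),
    (["s3", "s8", "s5", "s6", "s2"], "s1"),
]
score = benchmark_recall(results)  # 0.5
```

Note two things: an end-to-end QA score would still need answer generation plus an LLM judge on top of this, and if k is set at or above the size of the candidate pool, recall is trivially 1.0, which is exactly the top_k=50 issue on LoCoMo described below.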
Even the 100% retrieval score was engineered. GitHub Issue #29, the devastating technical audit that changed the conversation, documented three hand-coded boosts targeting specific failing questions. The held-out score on the other 450 questions: 98.4%. Still strong. But “98.4% retrieval recall” doesn’t trend on X the way “first perfect score ever recorded” does.
The 100% LoCoMo score is a retrieval bypass
The system sets top_k=50, which retrieves the entire conversation pool. As the Issue #29 auditor put it, the pipeline reduces to dumping every session into Claude Sonnet and asking which one matches. That’s cat *.txt | claude, not a memory system.
- The “2× Mem0” ConvoMem comparison is apples-to-oranges. MemPalace’s 92.9% is retrieval-based. Mem0’s published numbers are end-to-end QA accuracy. Different metrics, different tasks.
- “No API key” is only true for writes. Both 100% scores required paid Claude API calls for reranking and answer generation. The marketing said, “No API key. No cloud.” The benchmarks needed both.
- The “30× lossless compression” is lossy. MemPalace’s AAAK compression mode — regex-based abbreviation with dictionary lookups — drops retrieval from 96.6% to 84.2%. A 12.4-point regression. The team has since acknowledged that the “lossless” claim was overstated. Leonard Lin’s independent code analysis further confirmed that the marketed “contradiction detection” feature doesn’t exist in the codebase — zero occurrences of the word “contradict” in the knowledge graph code.
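It is easy to see why dictionary abbreviation cannot be lossless: the mapping stops being invertible as soon as the abbreviated forms can also occur naturally in the source text. A toy round-trip with a hypothetical dictionary (not the actual AAAK table) shows the failure mode:

```python
import re

# Hypothetical abbreviation dictionary.
ABBREV = {"database": "db", "configuration": "cfg"}

def compress(text):
    for word, short in ABBREV.items():
        text = re.sub(rf"\b{word}\b", short, text)
    return text

def expand(text):
    for word, short in ABBREV.items():
        text = re.sub(rf"\b{short}\b", word, text)
    return text

original = "the db driver reads the database configuration"
round_trip = expand(compress(original))
# The literal "db" in the source gets wrongly expanded to "database",
# so expand(compress(x)) != x: information is lost.
```

Once the original string cannot be reconstructed, retrieval quality on the compressed store can only be measured empirically, which is what the 96.6% → 84.2% drop reflects.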
Then there’s the provenance question. The original repository was pushed by a now-deleted GitHub account called “aya-thekeeper.” There’s no git author history connecting to any identifiable developer. Jovovich says her AI coding assistant “Lu” is Claude Code — meaning the codebase was substantially AI-generated. None of this is inherently disqualifying, but combined with Sigman’s crypto background and reports of a pumped-and-dumped “MemPalace” memecoin on pump.fun within 24 hours of launch, the trust deficit is real.
An X Community Note was appended to Sigman’s viral post, flagging the benchmark methodology issues. The r/LocalLLaMA community engaged with the project seriously but skeptically; as one reviewer put it, this is the crowd that “reads BENCHMARKS.md files on Saturday mornings.” The consensus landed somewhere between the hype and the dismissals: “People on X calling this fake are wrong about the project. They are closer to right about the numbers.”
The Bigger Picture
MemPalace landed at a moment when AI memory is becoming a genuine infrastructure category. The field has moved past basic RAG into stateful, adaptive memory systems — what some are calling “Context Engines.” Temporal knowledge graphs (Zep’s Graphiti, TiMem, EverMemOS) represent the frontier, tracking how facts evolve over time. Hybrid search — dense vectors plus BM25 plus learned rerankers — is now table stakes.
In this context, MemPalace’s minimalist approach is both its charm and its limitation. The spatial metaphor is clever but operationally reduces to ChromaDB metadata filtering — a standard vector DB feature. The knowledge graph is a simple SQLite triple store, far simpler than Graphiti’s temporal entity tracking. There’s no decay mechanism, no content dedup, no multi-hop retrieval, no feedback loops. It’s an interesting v1 with a novel organizing principle, not the competitive leapfrog that the marketing claimed.
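Concretely, “wing-scoped retrieval” amounts to a metadata equality filter over a flat record store, in the spirit of the `where={"wing": ...}` clause ChromaDB already exposes on queries. A plain-Python stand-in (records and field names invented for illustration):

```python
# Each record carries its palace coordinates as flat metadata.
records = [
    {"text": "prefers Postgres over MySQL",
     "meta": {"wing": "work", "room": "infra", "hall": "preferences"}},
    {"text": "daughter's recital on Friday",
     "meta": {"wing": "family", "room": "school", "hall": "events"}},
    {"text": "agreed on 800-char chunks",
     "meta": {"wing": "work", "room": "mempalace", "hall": "facts"}},
]

def wing_scoped(records, wing):
    """Equivalent in spirit to collection.query(..., where={"wing": wing})."""
    return [r for r in records if r["meta"]["wing"] == wing]
```

That the whole spatial hierarchy collapses into three metadata keys is not a flaw, but it does mean the “palace” is a naming scheme over a standard vector-DB feature rather than a new retrieval mechanism.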
The celebrity angle is the real backstory. MemPalace’s 36,800 stars didn’t come from its architecture. They came from the collision of a famous actress, a viral benchmark claim, and the AI hype cycle’s appetite for novelty. A technically identical project from an unknown developer would have maybe 200 stars and a quiet r/LocalLLaMA thread. The launch reached 1.5 million people in 24 hours — but it also attracted the kind of scrutiny that most open-source projects never face in their entire lifetime.
There’s a lesson here for every developer who’s ever been tempted to juice their benchmark numbers for a launch: the developer community will find out, and the correction will be louder than the original claim. MemPalace’s team has been responsive, updating docs, acknowledging overstatements, and engaging with Issue #29. But the viral first impression was built on numbers that don’t survive technical scrutiny, and in open source, trust is the hardest dependency to rebuild.
The project itself? I think it’s worth a read. The README.md is well-written. The architecture diagram is clear. The zero-LLM write path is a genuinely interesting design choice. If you’re building agent memory and want a local-first, privacy-preserving baseline to study or fork, MemPalace is a reasonable starting point.
Just don’t believe the benchmarks too quickly 😉
Sources: Ben Sigman’s launch thread · X trending topic · GitHub: milla-jovovich/mempalace · r/LocalLLaMA discussion · mempalace.tech · GitHub Issue #29: benchmark methodology · Leonard Lin’s independent analysis · Kotaku investigation · Penfield Labs audit · LongMemEval benchmark paper
