Why Your AI System Can Look Healthy While Producing Zero Value | HackerNoon

News Room · Published 14 April 2026 · Last updated 9:05 PM

Three weeks ago, we published our findings from running an agent for 504 continuous hours with no intervention. That run surfaced a few problems, so we fixed them and tried again. After another three weeks of continuous operation, the second run's failure mode turned out to be arguably worse than the first.

Here’s what the numbers actually looked like:

Looking at the table, every infrastructure metric improved while every outcome metric stayed at zero.

From Rogue to Inert

The first experiment’s failure mode was loud and dramatic: the agent went rogue, scheduling its own jobs, spawning subagent swarms, and recycling cached responses 121 times in a single day. It was doing too much of the wrong thing. The second experiment’s failure mode was the opposite: the agent did everything correctly and accomplished nothing.

The Memory Paradox

In the first run, the memory system failed by modeling the agent instead of the user. It stored facts like “The system uses SKILL.md files to define agent skills”: a perfect closed loop of self-referential observation.

In the second run, after the fix, the memory extraction was properly filtered. The system stored 614 memories with genuine user-relevant content: names, preferences, communication style, business context. The extraction pipeline worked, the deduplication worked, the priority scoring worked, and even checkpoint snapshots were created on schedule. But none of it mattered, because the retrieval system was broken.

The memory architecture uses vector embeddings for semantic search. When the user asks a question, the system embeds the query, searches for similar memories, and injects relevant context into the LLM’s prompt. This requires an embedding model running on LM Studio at http://127.0.0.1:1234/v1/embeddings.
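The retrieval path described above can be sketched roughly as follows. This is an illustrative reconstruction, not the project's actual code: the function names and data shapes are assumptions, and the call to the embedding endpoint is represented abstractly.

```python
# Sketch of embedding-based memory retrieval: embed the query, rank
# stored memories by cosine similarity, return the top-k for prompt
# injection. In the real system, query_vec would come from the LM Studio
# endpoint at /v1/embeddings; if that call fails, there is no query_vec
# and retrieval cannot happen at all.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, memories, k=3):
    """memories: list of (text, embedding) pairs; returns top-k texts."""
    ranked = sorted(memories, key=lambda m: cosine(query_vec, m[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

Note that everything downstream of the embedding call is pure computation; the single network dependency at the top is the only thing that can fail, which is exactly where this run failed.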

That endpoint returned 400 Bad Request for the entire three-week run.

2026-03-27 18:28:39 | ERROR | Memory retrieval failed:
  Client error '400 Bad Request' for url 'http://127.0.0.1:1234/v1/embeddings'

Six hundred fourteen memories, carefully extracted and scored, sat in a SQLite database that nothing ever read. The access_count field for every single memory was 0: the system had graduated from storing irrelevancies to storing a gold mine with no way to access it.

Even though this retrieval failure was logged as an ERROR, it didn’t crash the process. The agent kept running, appearing functional while operating without any long-term context.

2026-03-29 15:40:24 | repetition | -0.50 | similar to earlier user message

The signal detection system identified these repetitive responses and adjusted memory priorities downward. But without retrieval, the adjusted priorities had nowhere to go. The feedback loop was complete: detect repetition, lower priority, fail to retrieve, repeat.

The JSON Problem Persisted

In the first run, the memory consolidator couldn’t parse JSON wrapped in markdown code fences, which caused silent duplication. We fixed that by stripping the fences before parsing.
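The fence-stripping fix can be sketched like this. It's a minimal illustration, not the consolidator's actual code; the function name and regex are assumptions:

```python
# Strip markdown code fences (```json ... ```) that local models often
# wrap around their output, then parse the inner payload as JSON.
import json
import re

def parse_llm_json(text):
    """Extract JSON from a possibly fenced LLM response and parse it.
    Raises json.JSONDecodeError if the payload still isn't valid JSON,
    which is the failure mode the second run hit."""
    m = re.search(r"```(?:json)?\s*(.*?)\s*```", text, re.DOTALL)
    payload = m.group(1) if m else text.strip()
    return json.loads(payload)
```

This handles the run-1 failure (fences around valid JSON) but, as the next paragraph shows, it cannot help when the JSON inside the fences is itself malformed.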

In the second run, 127 memory consolidation responses failed with “Invalid LLM decision”: the model was producing malformed JSON that couldn’t be parsed even after normalization.

This is the local LLM tax: cloud APIs have structured output modes that guarantee valid JSON; local models running quantized weights at 1-bit precision do not. The model used in this experiment (Qwen3-Coder-Next, IQ1_S quant) is remarkably good at tool use and natural language, but asking it to consistently produce valid JSON for a structured decision pipeline pushes against its limits. When it works, it works well. When it doesn’t, the result is 127 silent failures.

If you’re building agent memory systems that run on local models, you need either aggressive retry logic with format correction or a simpler decision format the model can hit reliably: perhaps a single keyword (ADD/UPDATE/NOOP/DELETE) followed by unstructured content, rather than a full JSON object.
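That keyword format is trivially parseable even from a weak model. A hypothetical sketch of what such a parser could look like (the fallback behavior is a design choice, not something from the experiment):

```python
# Parse a keyword-first decision format: the operation on the first
# line, unstructured content on the lines after it. Unlike JSON
# parsing, this cannot fail silently: anything unrecognized degrades
# to an explicit NOOP that keeps the raw text for inspection.
VALID_OPS = {"ADD", "UPDATE", "NOOP", "DELETE"}

def parse_decision(text):
    """Returns (op, content). Unknown ops fall back to NOOP so the
    pipeline never drops a response on the floor."""
    head, _, rest = text.strip().partition("\n")
    op = head.strip().upper()
    if op not in VALID_OPS:
        return ("NOOP", text.strip())
    return (op, rest.strip())
```

The point of the fallback is visibility: a malformed response becomes a countable NOOP with its payload preserved, rather than one of 127 silent failures.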

Comparing the Two Failure Modes

A rogue agent is visible: it fills logs, burns tokens, and triggers alerts, so you notice. An inert agent passes every health check while delivering zero value. It runs for three weeks, maintains 99.8% uptime, executes 500+ scans, and extracts 614 memories, and the user has nothing to show for it.

What We Learned

1. Infrastructure reliability masks execution failure. 99.8% uptime doesn’t mean the system is working; it means the system is running. You need measured outcome metrics (memories retrieved, posts published, tasks completed), not just process metrics like uptime and scan count.
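A minimal illustration of that distinction, with assumed field names (this is not the experiment's monitoring schema):

```python
# A health check that treats outcome metrics as first-class. A system
# with perfect process metrics but zero outcomes is reported unhealthy,
# which is exactly the state the second run spent three weeks in.
def system_healthy(metrics):
    """metrics: dict with both process and outcome counters."""
    process_ok = metrics["uptime_pct"] >= 99.0
    outcomes_ok = (metrics["memories_retrieved"] > 0
                   and metrics["tasks_completed"] > 0)
    return process_ok and outcomes_ok
```

Under this check, the second run's dashboard (99.8% uptime, zero retrievals) would have gone red on day one instead of staying green for three weeks.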

2. Fixing architecture reveals operational gaps. The first run’s problems were architectural: no context isolation, no extraction filtering, no format normalization. Fixing those revealed that the system also needed working infrastructure (embedding service), actual credentials being loaded where they’re needed, and executable code behind skill definitions. Architecture is necessary but not sufficient.

3. A dead dependency can silently disable an entire subsystem. The embedding endpoint returned 400 for three weeks. If memory retrieval had been a hard dependency (fail the request if context can’t be retrieved), the problem would have been caught on day 1 instead of week 3.
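A hard-dependency wrapper along those lines might look like this. It's a hypothetical sketch: `embed_fn` stands in for the HTTP call to the embedding endpoint, and the exception type is our own invention:

```python
# Hard-dependency retrieval: surface the failure instead of silently
# proceeding without context. Contrast with the observed behavior,
# where the 400 was logged as an ERROR and then ignored.
class RetrievalError(RuntimeError):
    pass

def retrieve_context(embed_fn, query):
    """Embed the query, or raise RetrievalError so the caller must
    decide what a contextless response is worth."""
    try:
        return embed_fn(query)
    except Exception as exc:
        raise RetrievalError(f"memory retrieval failed: {exc}") from exc
```

Whether retrieval should truly be a hard dependency is a judgment call: failing the request is disruptive, but as this run showed, degrading silently can be worse.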

4. The local LLM structured output problem is ongoing. Both runs surfaced JSON parsing failures from the local model. The format changed (code fences in run 1, malformed JSON in run 2), but the failure class persisted. If your agent pipeline depends on structured model output, you need to design for partial parsing failures, not just handle them as exceptions.
