From LLM to Agent: How Memory + Planning Turn a Chatbot Into a Doer

The Day Your LLM Stops Talking and Starts Doing

There’s a moment in every LLM project where you realize the “chat” part is the easy bit.

The hard part is everything that happens between the user request and the final output:

  • gathering missing facts,
  • choosing which tools to call (and in what order),
  • handling failures,
  • remembering prior decisions,
  • and not spiraling into confident nonsense when the world refuses to match the model’s assumptions.

That’s the moment you’re no longer building “an LLM app.”

You’re building an agent.

In software terms, an agent is not a magical model upgrade. It’s a system design pattern:

Agent = LLM + tools + a loop + state

Once you see it this way, “memory” and “planning” stop being buzzwords and become engineering decisions you can reason about, test, and improve.

Let’s break down how it works.

1) What Is an LLM Agent, Actually?

A classic LLM app looks like this:

user_input -> prompt -> model -> answer

An agent adds a control loop:

user_input
  -> (state) -> model -> action
  -> tool/environment -> observation
  -> (state update) -> model -> action
  -> ... repeat ...
  -> final answer

The difference is subtle but massive:

  • A chatbot generates text.
  • An agent executes a policy over time.

The model is the policy engine; the loop is the runtime.

This means agents are fundamentally about systems: orchestration, state, observability, guardrails, and evaluation.

2) Memory: The Two Buckets You Can’t Avoid

Human-like “memory” in agents usually reduces to two concrete buckets:

2.1 Short-Term Memory (Working Memory)

Short-term memory is whatever you stuff into the model’s current context:

  • the current conversation (or the relevant slice of it),
  • tool results you just fetched,
  • intermediate notes (“scratchpad”),
  • temporary constraints (deadlines, budgets, requirements).

Engineering reality check: short-term memory is limited by your context window and by model behavior.

Two classic failure modes show up in production:

  1. Context trimming: you cut earlier messages to save tokens → the agent “forgets” key constraints.
  2. Recency bias: even with long contexts, models over-weight what’s near the end → old-but-important details get ignored.

If you’ve ever watched an agent re-ask for information it already has, you’ve seen both.
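
One mitigation is to trim the middle of the history but re-pin hard constraints near the end of the prompt, where recency bias works for you instead of against you. A minimal sketch (build_context and its chars-per-token heuristic are illustrative assumptions, not a standard API):

def build_context(messages, pinned_constraints, max_tokens=4000):
    # Crude estimate: ~4 characters per token.
    def est_tokens(text):
        return len(text) // 4

    budget = max_tokens - sum(est_tokens(c) for c in pinned_constraints)
    recent = []
    for msg in reversed(messages):  # keep the newest turns that fit
        cost = est_tokens(msg)
        if cost > budget:
            break
        recent.append(msg)
        budget -= cost
    # Re-pin constraints near the end, where models weight tokens most heavily.
    return list(reversed(recent)) + ["CONSTRAINTS:"] + list(pinned_constraints)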

2.2 Long-Term Memory (Persistent Memory)

Long-term memory is stored outside the model:

  • vector DB embeddings,
  • document stores,
  • user profiles/preferences (only if you can do it safely and legally),
  • task history and decisions,
  • structured records (tickets, orders, logs, CRM entries).

The mainstream pattern is: retrieve → inject → reason.

If that sounds like RAG (Retrieval-Augmented Generation), that’s because it is. Agents just make RAG operational: retrieval isn’t only for answering questions—it’s for deciding what to do next.

The part people miss: memory needs structure

A pile of vector chunks is not “memory.” It’s a landfill.

Practical long-term memory works best when you store:

  • semantic content (embedding),
  • metadata (timestamp, source, permissions, owner, reliability score),
  • a policy for when to write/read (what gets saved, what gets ignored),
  • decay or TTL for things that stop mattering.

If you don’t design write/read policies, you’ll build an agent that remembers the wrong things forever.
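
Here’s a sketch of what “structure” and “policy” can look like in code. The field names, thresholds, and the should_write rule are illustrative assumptions, not a fixed schema:

from dataclasses import dataclass, field
from typing import Optional
import time

@dataclass
class MemoryRecord:
    text: str
    source: str                          # where it came from: tool, user, document
    owner: str                           # whose data this is, for permission checks
    reliability: float = 0.5             # how much to trust it when reasoning
    ttl_seconds: Optional[float] = None  # None = keep until explicitly deleted
    ts: float = field(default_factory=time.time)

    def expired(self) -> bool:
        return self.ttl_seconds is not None and time.time() - self.ts > self.ttl_seconds

def should_write(record: MemoryRecord) -> bool:
    # Write policy: skip low-trust, transient chatter so the store stays useful.
    return record.reliability >= 0.5 and record.source != "small_talk"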

3) Planning: From Decomposition to Search

Planning sounds philosophical, but it maps to one question:

How does the agent choose the next action?

In real tasks, “next action” is rarely obvious. That’s why we plan: to reduce a big problem into smaller moves with checkpoints.

3.1 Task Decomposition: Why It’s Not Optional

When you ask an agent to “plan,” you’re buying:

  • controllability: you can inspect steps and constraints,
  • debuggability: you can see where it went wrong,
  • tool alignment: each step can map to a tool call,
  • lower hallucination risk: fewer leaps, more verification.

But planning can be cheap or expensive depending on the technique.

3.2 CoT: Linear Reasoning as a Control Interface

Chain-of-Thought style prompting nudges the model to produce intermediate reasoning before the final output.

From an engineering perspective, the key benefit is not “the model becomes smarter.” It’s that the model becomes more steerable:

  • it externalizes intermediate state,
  • it decomposes implicitly into substeps,
  • and you can gate or validate those steps.

CoT ≠ show-the-user-everything

In production, you often want the opposite: use structured reasoning internally, then output a crisp answer.

This is both a UX decision (nobody wants a wall of text) and a safety decision (you don’t want to leak internal deliberations, secrets, or tool inputs).
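
One common way to implement the split is to request structured output with separate reasoning and answer fields, and surface only the answer. A sketch, assuming a generic llm_call function (the JSON schema here is an assumption, not a fixed API):

import json

def log_internal(text: str) -> None:
    # Stand-in for your tracing sink; never shown to the user.
    pass

def answer_user(llm_call, question: str) -> str:
    # llm_call: any function str -> str. Only the "answer" field reaches the user.
    prompt = (
        "Think step by step, then respond with JSON only:\n"
        '{"reasoning": "<intermediate steps>", "answer": "<concise final answer>"}\n\n'
        f"Question: {question}"
    )
    parsed = json.loads(llm_call(prompt))
    log_internal(parsed["reasoning"])
    return parsed["answer"]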

3.3 ToT: When Reasoning Becomes Search

Linear reasoning fails when:

  • there are multiple plausible paths,
  • early choices are hard to reverse,
  • you need lookahead (trade-offs, planning, puzzles, strategy).

Tree-of-Thought style reasoning turns “thinking” into search:

  • expand: propose multiple candidate thoughts/steps,
  • evaluate: score candidates (by heuristics, constraints, or another model call),
  • select: continue exploring the best branches,
  • optionally backtrack if a branch collapses.

If CoT is “one good route,” ToT is “try a few routes, keep the ones that look promising.”
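
Mechanically, that’s beam search over reasoning paths. A minimal sketch, where propose and score stand in for LLM calls (both are assumptions about your own stack):

def tree_of_thoughts(task, propose, score, beam_width=2, depth=3):
    # propose(task, path) -> candidate next thoughts (an LLM call in practice)
    # score(path) -> number, higher = more promising (heuristic or critic call)
    frontier = [[]]  # each path is a list of thoughts
    for _ in range(depth):
        candidates = [path + [t]            # expand every surviving path
                      for path in frontier
                      for t in propose(task, path)]
        candidates.sort(key=score, reverse=True)        # evaluate
        frontier = candidates[:beam_width] or frontier  # select best branches
    return max(frontier, key=score)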

The cost: token burn

Search is expensive. If you expand branches without discipline, cost grows fast.

So ToT tends to shine in:

  • high-value tasks,
  • problems with clear evaluation signals,
  • situations where being wrong is more expensive than being slow.

3.4 GoT: The Engineering Upgrade (Reuse, Merge, Backtrack)

Tree search wastes work when branches overlap.

Graph-of-Thoughts takes a practical step:

  • treat intermediate reasoning as states in a directed graph,
  • allow merging equivalent states (reuse),
  • support backtracking to arbitrary nodes,
  • apply pruning more aggressively.

If ToT is a tree, GoT is a graph with memory: you don’t re-derive what you already know.

This matters in production where repeated tool calls and repeated reasoning are the real cost drivers.
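
The “don’t re-derive” part can be as simple as memoizing on a canonical form of each intermediate state. A sketch, where expand, canonicalize, and is_goal are assumed task-specific functions:

def graph_of_thoughts(start, expand, canonicalize, is_goal, max_nodes=200):
    seen = {}            # canonical state -> node: equivalent branches merge here
    frontier = [start]
    while frontier and len(seen) < max_nodes:
        state = frontier.pop()
        key = canonicalize(state)
        if key in seen:  # reuse: this branch rejoins work we already did
            continue
        seen[key] = state
        if is_goal(state):
            return state
        frontier.extend(expand(state))  # pruning/backtracking happen via the frontier
    return None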

3.5 XoT: “Everything of Thoughts” as Research Direction

XoT-style approaches try to unify thought paradigms and inject external knowledge and search methods (think: MCTS-style exploration + domain guidance).

It’s promising, but the engineering bar is high:

  • you need reliable evaluation,
  • tight budgets,
  • and a clear “why” for using something heavier than a well-designed plan loop.

In practice, many teams implement a lightweight ToT/GoT hybrid without the full research stack.

4) ReAct: The Loop That Makes Agents Feel Real

Planning is what the agent intends to do.

ReAct is what the agent actually does:

  1. Reason about what’s missing / what to do next
  2. Act by calling a tool
  3. Observe the result
  4. Reflect and adjust

Repeat until done.
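
In code, that’s a loop alternating a model decision with a tool call. A minimal sketch, where decide_next stands in for the LLM call (an assumption, not a fixed API):

def react_loop(task, decide_next, tools, max_steps=8):
    # decide_next(task, history) -> ("finish", answer) or (tool_name, tool_args)
    history = []  # interleaved actions and observations
    for _ in range(max_steps):
        action, payload = decide_next(task, history)    # Reason
        if action == "finish":
            return payload
        observation = tools[action](payload)            # Act
        history.append((action, payload, observation))  # Observe; reflect next turn
    return "Step budget exhausted; escalate to a human."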

This solves three real problems:

  • incomplete information: the agent can fetch what it doesn’t know,
  • verification: it can check assumptions against reality,
  • error recovery: it can reroute after failures.

If you’ve ever debugged a hallucination, you already know why this matters: a believable explanation isn’t the same thing as a correct answer.

5) A Minimal Agent With Memory + Planning (Practical Version)

Below is a deliberately “boring” agent loop. That’s the point.

Most production agents are not sci-fi. They’re well-instrumented control loops with strict budgets.

from dataclasses import dataclass, field
from typing import Any, Dict, List
import time

# --- Tools (stubs) ---------------------------------------------------------

def web_search(query: str) -> str:
    # Replace with your search API call + caching.
    return f"[search-results for: {query}]"

def calc(expression: str) -> str:
    # Replace with a safe evaluator.
    return str(eval(expression, {"__builtins__": {}}, {}))

# --- Memory ----------------------------------------------------------------

@dataclass
class MemoryItem:
    text: str
    ts: float = field(default_factory=time.time)
    meta: Dict[str, Any] = field(default_factory=dict)

@dataclass
class MemoryStore:
    short_term: List[MemoryItem] = field(default_factory=list)
    long_term: List[MemoryItem] = field(default_factory=list)  # stand-in for a vector DB

    def remember_short(self, text: str, **meta):
        self.short_term.append(MemoryItem(text=text, meta=meta))

    def remember_long(self, text: str, **meta):
        self.long_term.append(MemoryItem(text=text, meta=meta))

    def retrieve_long(self, hint: str, k: int = 3) -> List[MemoryItem]:
        # Dummy retrieval: rank recent items sharing any word with the hint.
        words = set(hint.lower().split())
        hits = [m for m in self.long_term if words & set(m.text.lower().split())]
        return sorted(hits, key=lambda m: m.ts, reverse=True)[:k]

# --- Planner (very small ToT-ish idea) ------------------------------------

def propose_plans(task: str) -> List[str]:
    # In reality: this is an LLM call producing multiple plan candidates.
    return [
        f"Search key facts about: {task}",
        f"Break task into steps, then execute step-by-step: {task}",
        f"Ask a clarifying question if constraints are missing: {task}",
    ]

def score_plan(plan: str) -> int:
    # Heuristic scoring: prefer plans that verify facts.
    if "Search" in plan:
        return 3
    if "Break task" in plan:
        return 2
    return 1

# --- Agent Loop ------------------------------------------------------------

def run_agent(task: str, memory: MemoryStore, max_steps: int = 6) -> str:
    # 1) Retrieve long-term memory if relevant.
    recalled = memory.retrieve_long(hint=task)
    for item in recalled:
        memory.remember_short(f"Recalled: {item.text}", source="long_term")

    # 2) Plan (cheap multi-candidate selection).
    plans = propose_plans(task)
    plan = max(plans, key=score_plan)
    memory.remember_short(f"Chosen plan: {plan}")

    # 3) Execute loop.
    for step in range(max_steps):
        # In reality: this is an LLM call that decides "next tool" based on state.
        if "Search" in plan and step == 0:
            obs = web_search(task)
            memory.remember_short(f"Observation: {obs}", tool="web_search")
            continue

        # Example: do a small computation if the task contains a calc hint.
        if "calculate" in task.lower() and step == 1:
            obs = calc("6 * 7")
            memory.remember_short(f"Observation: {obs}", tool="calc")
            continue

        # Stop condition (simplified).
        if step >= 2:
            break

    # 4) Final answer: summarise short-term state.
    notes = "\n".join(f"- {m.text}" for m in memory.short_term[-8:])
    return f"Task: {task}\n\nWhat I did:\n{notes}\n\nFinal: (produce a user-facing answer here.)"

# Demo usage:
mem = MemoryStore()
mem.remember_long("User prefers concise outputs with clear bullets.", tag="preference")
print(run_agent("Write a short guide on LLM agents with memory and planning", mem))

What this toy example demonstrates (and why it matters)

  • Memory is state, not vibe. It’s read/write with policy.
  • Planning can be multi-candidate without going full ToT. Generate a few, pick one, move on.
  • Tool calls are first-class. Observations update state, not just the transcript.
  • Budgets exist. max_steps is a real safety and cost control.

6) Production Notes: Where Agents Actually Fail

If you want this to work outside demos, you’ll spend most of your time on these five areas.

6.1 Tool reliability beats prompt cleverness

Tools fail. Time out. Rate limit. Return weird formats.

Your agent loop needs:

  • retries with backoff,
  • strict schemas,
  • parsing + validation,
  • and fallback strategies.

A “smart” agent without robust I/O is just a creative writer with API keys.
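
A sketch of that I/O layer, assuming a generic callable tool; the retry count and backoff constants are illustrative:

import time

def call_tool(tool, args, validate, retries=3, base_delay=1.0):
    # Retry with exponential backoff; validate before trusting the result.
    last_err = None
    for attempt in range(retries):
        try:
            result = tool(**args)
            if validate(result):          # strict schema/shape check
                return result
            last_err = ValueError(f"invalid output: {result!r}")
        except Exception as err:          # timeouts, rate limits, weird formats
            last_err = err
        time.sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"tool failed after {retries} attempts") from last_err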

6.2 Memory needs permissions and hygiene

If you store user data, you need:

  • clear consent and retention rules,
  • permission checks at retrieval time,
  • deletion pathways,
  • and safe defaults.

In regulated environments, long-term memory is often the highest-risk component.
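
One concrete pattern: enforce permissions at read time, not just write time, because roles and consent change after data is stored. A sketch, where store, acl, and the record fields are assumptions about your own stack:

def retrieve_for(user_id, query, store, acl):
    # Filter recalled memories by what this user is still allowed to see.
    hits = store.search(query)  # e.g. vector similarity search
    return [rec for rec in hits
            if acl.can_read(user_id, rec.owner) and not rec.expired()]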

6.3 Planning needs evaluation signals

Search-based planning is only as good as its scoring.

You’ll likely need:

  • constraint checkers,
  • unit tests for tool outputs,
  • or a separate “critic” model call that can reject bad steps.
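
A critic can be a second, cheap model call that accepts or rejects a step before it executes. A sketch, assuming a generic llm_call and a plain-text verdict format (both assumptions):

def critic_gate(llm_call, step, constraints):
    # A second model call that can reject a bad step before it executes.
    prompt = ("Answer OK or REJECT. Does this step violate any constraint?\n"
              f"Constraints: {constraints}\nStep: {step}")
    return llm_call(prompt).strip().upper().startswith("OK")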

6.4 Observability is not optional

If you can’t trace:

  • which tool was called,
  • with what inputs,
  • what it returned,
  • and how it changed the plan,

you can’t debug. You also can’t measure improvements.

Log everything. Then decide what to retain.
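
In practice that means one structured record per tool call. A sketch; the field set is a reasonable starting point, not a standard:

import json, logging, time, uuid

def log_tool_call(tool_name, inputs, output, plan_before, plan_after):
    # One structured record per tool call: enough to replay and diff a run.
    logging.getLogger("agent").info(json.dumps({
        "trace_id": str(uuid.uuid4()),
        "ts": time.time(),
        "tool": tool_name,
        "inputs": inputs,
        "output_preview": str(output)[:500],  # truncate large payloads
        "plan_changed": plan_before != plan_after,
    }))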

6.5 Security: agents amplify blast radius

When a model can take actions, mistakes become incidents.

Guardrails look like:

  • allowlists (tools, domains, actions),
  • spend limits,
  • step limits,
  • sandboxing,
  • and human-in-the-loop gates for high-impact actions.
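
Most of these compose into a single check that runs before every action. A sketch with illustrative thresholds and an assumed state dict:

def guard(tool_name, est_cost, state,
          allowed=frozenset({"web_search", "calc"}),
          high_impact=frozenset({"send_email", "make_payment"}),
          max_steps=10, max_spend_usd=5.0):
    # Returns (allowed, reason); call before every action.
    if tool_name not in allowed:
        return False, f"{tool_name!r} is not on the allowlist"
    if state["steps"] >= max_steps:
        return False, "step limit reached"
    if state["spend_usd"] + est_cost > max_spend_usd:
        return False, "spend limit reached"
    if tool_name in high_impact:
        return False, "requires human approval"
    return True, "ok"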

7) The Real “Agent Upgrade”: A Better Mental Model

If you remember one thing, make it this:

An agent is an LLM inside a state machine.

  • Memory = state
  • Planning = policy shaping
  • Tools = actuators
  • Observations = state transitions
  • Reflection = error-correcting feedback

Once you build agents this way, you stop chasing “the perfect prompt” and start shipping systems that can survive reality.

And reality is the only benchmark that matters.
