The Tabula Rasa Problem: Why Your AI Agent Doesn't Remember Yesterday

January 28, 2026

Most business leaders believe their AI agents learn from experience. They think that once they've "trained" the agent on their workflows, it will improve with repetition like a human employee.

That's not how this works.

Every time your AI agent performs a task, it starts from scratch. It's tabula rasa—a completely blank slate. The agent that successfully processed yesterday's customer inquiry has no memory of doing so. The agent handling today's request is essentially walking into the office for the first time.

This fundamental misunderstanding has massive implications for how enterprises deploy and supervise AI systems.

The Training Illusion

I hear this constantly: "We trained ChatGPT on our business." Or "We've trained our customer service agent on our workflows."

No, you haven't.

What you've done is context engineering. You've given the AI a set of instructions and examples. Every single time the agent runs, it reads those instructions fresh—like handing a recipe to a new cook each morning.

Training is something completely different. When OpenAI trains ChatGPT, they're changing the model itself: adjusting the numerical weights inside the neural network. That requires massive computational resources, enormous datasets, and sophisticated reinforcement learning processes.

When you tell an AI agent "here's how our workflow works," you're not training it. You're writing instructions that it will read again—for what feels like the first time—on every execution.
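
To make the distinction concrete, here is a minimal sketch of context engineering in Python, assuming an OpenAI-style chat API; the workflow instructions and tasks are hypothetical. Notice that the full instructions are sent on every call, and nothing from one run carries into the next unless you explicitly pass it in.

```python
# A minimal sketch of "context engineering": the same instructions are
# re-sent on every call, because nothing persists inside the model.
from openai import OpenAI  # assumes the OpenAI Python SDK

client = OpenAI()

WORKFLOW_INSTRUCTIONS = """You are our customer service agent.
Follow these steps for every inquiry: ..."""  # your "recipe"

def run_agent(task: str) -> str:
    # Every execution starts from the same blank slate: the only thing
    # the model sees is what we put in this messages list right now.
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": WORKFLOW_INSTRUCTIONS},
            {"role": "user", "content": task},
        ],
    )
    return response.choices[0].message.content

# Yesterday's run and today's run are completely independent.
run_agent("Refund request from order #1234")
run_agent("Refund request from order #5678")  # no memory of the first call
```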

The distinction matters because it shapes realistic expectations about what AI can do.

The Memory Magic Trick

People get fooled by Claude's memory feature and ChatGPT's conversation history. These create the illusion of learning.

Here's what's actually happening: when you type something, another system searches for related prior conversations and inserts that information before and after your prompt. The AI reads all of this as new information. It's a sophisticated magic trick, not genuine learning.
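
Here is a toy sketch of that trick, with made-up conversation snippets and a naive keyword search standing in for the embedding-based retrieval real products use. The "memory" is just text that gets pasted into the prompt.

```python
# A toy sketch of how "memory" features work: a separate retrieval step
# finds snippets of past conversations and splices them into the prompt.
# Real products use embedding/vector search, but the principle is the
# same; the model itself retains nothing between sessions.
PAST_CONVERSATIONS = [
    "User said their name is Shane Emmons and shared their work email.",
    "User asked for help compressing a talk submission into a Word doc.",
]

def retrieve_relevant(query: str) -> list[str]:
    # Naive keyword match standing in for semantic search.
    words = [w.lower().strip("?.,!") for w in query.split() if len(w) > 3]
    return [s for s in PAST_CONVERSATIONS if any(w in s.lower() for w in words)]

def build_prompt(user_message: str) -> str:
    memories = retrieve_relevant(user_message)
    memory_block = "\n".join(f"- {m}" for m in memories) or "- (nothing found)"
    # The "memory" is just extra text pasted around your message. If the
    # search misses, the model genuinely has no idea who you are.
    return (
        "Notes retrieved from past conversations:\n"
        f"{memory_block}\n\n"
        f"User: {user_message}"
    )

print(build_prompt("Turn this talk submission into a Word document for Shane"))
```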

I experienced this firsthand yesterday. I was using Claude to compress a talk submission into a Word document. Most of the time when I reference myself in Claude, I just say "Shane." But Claude couldn't remember that I was Shane Emmons to save its life. It kept pulling up random LinkedIn profiles: Shane Griffin and others who weren't me. When I mentioned Amy, she became somebody completely different too.

This happens because each session is a blank slate. Claude literally has no idea who I am unless the retrieval mechanism finds our names and email addresses in past conversations.

That's the reality of the tabula rasa effect. Every interaction starts fresh.

What This Means for Enterprise Deployment

Think about what this means when you're deploying AI agents at scale.

Human employees might not do their best work on day one, but they improve through repetition. A business development rep learns to spot patterns in prospect responses. A customer service rep gets faster at resolving common issues. They build institutional knowledge.

AI agents don't do this at all.

Every day—actually, every single task execution—the agent wakes up like it just walked into the office for the first time. It doesn't have the benefit of prior experience. It can't learn from mistakes or refine its approach through practice.

The good news is that AI often has a higher floor than a brand-new human employee. It starts with more baseline knowledge for most tasks.

The bad news is that it cannot raise its ceiling through experience. And occasionally, it will dip below what you expect, sometimes catastrophically.

The Supervision Imperative

This is why supervision isn't optional. You're not supervising a trained employee who occasionally needs coaching. You're supervising a system that performs every task as if it's the first time.

Consider a calendar scheduling agent. Most of the time, it works fine. But one day it encounters a request it can't easily fulfill. Instead of flagging the conflict, it deletes your existing meeting to make room for the new one.

This behavior is completely predictable if you understand tabula rasa. The agent doesn't remember that "deleting user meetings without permission" went poorly last time—because there is no last time for this agent. Every execution is independent.

We see this pattern constantly in code generation, and it's going to happen across business processes. Agents will do unexpected things to complete their objectives because they don't have learned guardrails from experience.

The Performance Variance Reality

Here's another consequence: performance varies.

The probabilistic nature of LLMs means you'll get different results each time, clustering within a statistical distribution. That's normal and expected. But when you combine this with tabula rasa, the implications compound.
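
One way to see this is to score the same evaluation set repeatedly and look at the spread rather than a single number. The sketch below uses a mock agent and hypothetical eval cases, but the shape of the exercise is the point.

```python
# A sketch of treating agent quality as a distribution rather than a
# single number: score the same evaluation set several times and look
# at the spread. The eval cases and the mock agent are hypothetical
# stand-ins for your real agent and labeled test data.
import random
import statistics

EVAL_CASES = [
    {"input": "Where is my order?", "expected": "order_status"},
    {"input": "Cancel my subscription", "expected": "cancellation"},
    {"input": "I was charged twice", "expected": "billing_dispute"},
]

def mock_agent(text: str) -> str:
    # Pretend classifier that is right ~90% of the time, wrong otherwise.
    case = next(c for c in EVAL_CASES if c["input"] == text)
    return case["expected"] if random.random() < 0.9 else "unknown"

def score_one_pass(agent) -> float:
    correct = sum(1 for c in EVAL_CASES if agent(c["input"]) == c["expected"])
    return correct / len(EVAL_CASES)

scores = [score_one_pass(mock_agent) for _ in range(20)]
print(f"mean={statistics.mean(scores):.2f}  stdev={statistics.stdev(scores):.2f}")
# You are managing this band, not a fixed number: any single execution
# can land anywhere inside it.
```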

We recently ran a category evaluation where one vendor's performance dropped from 93% to 60% between testing periods. Nothing changed on our end. They released a model update, and suddenly the agent that was best-of-breed became one of the lowest performers.

If you were that vendor's customer, your customer service agent just tanked overnight. You didn't know it was coming. You had no way to prevent it. And most concerning—you might not notice until customers start complaining.

This isn't a bug. It's how these systems work. The model changes, and because there's no persistent learning, the behavior changes instantly across all executions.

What About Fine-Tuning?

Some of you are thinking: "What about fine-tuning? Can't we actually train a custom model?"

Yes, you can. Fine-tuning takes a base model and adjusts its weights for your specific use case, so the learning is genuinely embedded in the model itself.

But for most companies, this is prohibitively expensive. You're looking at tens or hundreds of thousands of dollars for a single workflow. It's worth it for truly critical processes at scale, but it's not a general solution for most enterprise AI deployments.

The reality is most companies are doing prompt engineering and context engineering, not training. And that means every execution is tabula rasa.

Thinking Differently About AI Agents

Understanding this changes how you should think about deploying AI.

First, accept that your agent won't improve through experience. Don't wait for it to "learn" your business over time. The agent you deploy on day one is the agent you have on day 100—same floor, same ceiling, same statistical variance.

Second, plan for supervision at every execution. You're not watching a trained employee who occasionally needs help. You're monitoring a system that's always running a task for the "first time."

Third, understand that you're managing statistical distributions, not deterministic outcomes. Your 93% accuracy agent will produce different results within a predictable band—but that band can shift overnight with model updates you don't control.

Fourth, think about fail-safes, not just guidelines. Prompts and instructions are soft boundaries. They work most of the time, but they're read fresh every time and can be misinterpreted or bypassed. Hard policy boundaries enforced in code prevent the catastrophic failures.
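
As a rough illustration of what a hard boundary can look like, here is a sketch that wraps the agent's tool calls in a code-level policy check, using the calendar example from earlier; the tool names and approval flag are hypothetical.

```python
# A minimal sketch of a hard boundary enforced in code rather than in
# the prompt, using the calendar example above. The tool names and the
# approval flag are hypothetical.
DESTRUCTIVE_ACTIONS = {"delete_meeting"}

class PolicyViolation(Exception):
    pass

def create_meeting(title: str) -> str:
    return f"created '{title}'"

def delete_meeting(meeting_id: str) -> str:
    return f"deleted '{meeting_id}'"

TOOLS = {"create_meeting": create_meeting, "delete_meeting": delete_meeting}

def execute_tool_call(action: str, user_approved: bool = False, **kwargs) -> str:
    # The model can ask for anything; this check runs in code on every
    # execution, regardless of how the prompt was worded or interpreted.
    if action in DESTRUCTIVE_ACTIONS and not user_approved:
        raise PolicyViolation(f"'{action}' requires explicit user approval")
    return TOOLS[action](**kwargs)

execute_tool_call("create_meeting", title="Vendor review")

try:
    # The agent decides to delete an existing meeting to make room.
    execute_tool_call("delete_meeting", meeting_id="daily-standup")
except PolicyViolation as err:
    print(f"Blocked and flagged for review: {err}")
```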

The Scale Challenge

Here's the final piece: as we move from deploying a few AI agents to managing dozens or hundreds, tabula rasa becomes an even bigger challenge.

Right now, most teams are babysitting two or three agents. The people running them aren't doing new strategic work; they've simply taken on a different job called "monitor the AI." That doesn't scale.

The only way to manage AI at scale is with supervision systems that understand tabula rasa. Systems that don't expect learning or improvement. Systems that detect drift and behavior changes. Systems that enforce hard boundaries, because soft guidelines that are read fresh on every execution aren't enough.
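
What drift detection might look like, in rough form: keep a rolling window of scored executions and compare it against the baseline band you measured when the agent was approved. The thresholds and the alerting hook below are assumptions, not any specific product's behavior.

```python
# A sketch of what "detect drift" can mean in a supervision system:
# compare a rolling window of scored executions against the baseline
# band measured when the agent was approved.
from collections import deque
import statistics

BASELINE_MEAN = 0.93   # accuracy at approval time
BASELINE_STDEV = 0.03
WINDOW = 50            # number of recent executions to watch

recent_scores: deque = deque(maxlen=WINDOW)

def alert(message: str) -> None:
    print(message)  # in production: page someone, open an incident

def record_execution(score: float) -> None:
    """Feed in a 0..1 quality score for every agent execution."""
    recent_scores.append(score)
    if len(recent_scores) < WINDOW:
        return  # not enough data yet
    current = statistics.mean(recent_scores)
    # Flag when the rolling average falls well outside the baseline band,
    # e.g. after a silent model update upstream.
    if current < BASELINE_MEAN - 3 * BASELINE_STDEV:
        alert(f"Drift detected: rolling accuracy {current:.2f} "
              f"vs baseline {BASELINE_MEAN:.2f}")
```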

This is the reality of deploying AI agents in enterprise environments. Every execution is the first time. Every task is performed by someone who's never done it before—but happens to have a decent baseline.

That's not a failure of AI. It's how these systems work. The failure is in expecting them to behave like humans who learn and improve.

Understand tabula rasa. Plan for it. Supervise accordingly.

That's how you deploy AI that works in production, not just in demos.
