
Why Your AI Forgets the Plot After 20 Messages (and the Fix)

May 31, 2026 · Updated May 31, 2026

Your AI forgets the plot because every model reads from a fixed-size context window, and once your story outgrows it, most tools simply drop the oldest text. The name you established in turn three falls off the back. The fix is to stop feeding it raw transcript and instead carry forward compressed summaries of each scene, so the past stays present without filling the window.

What's actually happening

There is no memory in the way you'd think of memory. The model does not remember your story. Every single turn, it re-reads everything in the conversation so far, from the top, and predicts what comes next. That re-reading happens inside a fixed budget called the context window: a hard ceiling, measured in tokens (roughly, word fragments), on how much text the model can hold in front of it at once.

For the first dozen turns this is invisible. The whole story fits. Then it doesn't. Most tools handle the overflow the crudest way possible: they cut from the top. The oldest turns get trimmed off so the newest ones still fit. You don't see the cut happen. You just notice, around turn twenty, that the narrator has quietly forgotten your character's sister exists.
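Here is a minimal sketch of that cut-from-the-top behavior. It assumes a crude word count in place of a real tokenizer, and `truncate_oldest` is a hypothetical helper, not any particular tool's code. Real tools count tokens with the model's own tokenizer and may trim differently.

```python
def count_tokens(text: str) -> int:
    # Assumption: one whitespace-separated word is roughly one token.
    # Real tokenizers split text differently.
    return len(text.split())


def truncate_oldest(turns: list[str], budget: int) -> list[str]:
    """Keep the newest turns that fit in `budget` tokens; drop the rest from the top."""
    kept: list[str] = []
    used = 0
    # Walk backwards from the newest turn so recent text is kept first.
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if used + cost > budget:
            break  # everything older than this point is silently dropped
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


turns = [
    "Elena's sister wrote her a letter.",   # oldest turn
    "The letter arrived three days ago.",
    "It is raining outside the kitchen.",
    "Elena makes coffee and waits.",        # newest turn
]
window = truncate_oldest(turns, budget=12)
```

With a budget of 12 words, only the last two turns survive. The sister never reaches the model, and nothing in the output says so.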

Why it feels like betrayal

Because the failure is silent and selective. The model writes the next paragraph just as fluently as it wrote the first. The prose doesn't degrade. The plot does. The husband who died in chapter one walks back into the kitchen. The rain you established stops mattering. A character's accent drifts. Nothing announces itself as broken, which is exactly why it's maddening — the writing stays confident while the story underneath quietly comes apart.

What a truncated model does

"Of course! Just to confirm — what was your character's name again, and could you remind me where we are in the story so I can pick up where we left off?"

What carry-forward gives you

Elena set the unopened letter on the counter where it had sat for three days. Her sister's handwriting. She'd promised herself she would read it before the funeral, and the funeral was tomorrow.

The difference isn't model quality. The same model produced both. The difference is what it was handed. The first was handed a window with the beginning chopped off and had to ask. The second was handed a tight summary of everything that mattered — the sister, the letter, the deadline — and just kept writing.

The fix: treat the story as scenes, not a transcript

A novel doesn't carry every word of chapter one into chapter twelve. It carries consequences. The reader remembers that the marriage failed, not the exact phrasing of the argument. That's the model Underfiction uses.

Scenes are chapters. When a scene ends, it gets summarized — the facts, the relationships, the unresolved tension — and that summary carries forward into everything after it.
Long scenes compress in place. Older turns inside a running scene get condensed automatically so the window never fills with stale back-and-forth.
The window stays clear. Instead of raw transcript crowding out room to write, the model holds a dense map of the story so far plus the live scene in front of it.
You never see the seams. The narrator keeps the sister, the letter, the weather, the grudge — because they were written into the carried context, not left to fall off the back.
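The four ideas above can be sketched as a small data structure. This is an illustration under stated assumptions, not Underfiction's actual implementation: the `Story` class, the `KEEP_RAW` threshold, and the `summarize` function are all hypothetical. `summarize` is a placeholder that keeps the first sentence of each turn, where a real engine would presumably ask a model for a proper summary.

```python
from dataclasses import dataclass, field

KEEP_RAW = 3  # hypothetical: how many recent turns of the live scene stay verbatim


def summarize(turns: list[str]) -> str:
    # Placeholder summarizer: keeps the first sentence of each turn.
    # A real engine would likely call a model here instead.
    return " ".join(t.split(". ")[0].rstrip(".") + "." for t in turns)


@dataclass
class Story:
    scene_summaries: list[str] = field(default_factory=list)  # closed scenes
    live_digest: str = ""  # condensed older turns of the live scene
    live_turns: list[str] = field(default_factory=list)  # recent raw turns

    def add_turn(self, text: str) -> None:
        self.live_turns.append(text)
        # Long scenes compress in place: fold the oldest raw turns into the digest.
        if len(self.live_turns) > KEEP_RAW:
            overflow = self.live_turns[:-KEEP_RAW]
            self.live_digest = (self.live_digest + " " + summarize(overflow)).strip()
            self.live_turns = self.live_turns[-KEEP_RAW:]

    def end_scene(self) -> None:
        # Scenes are chapters: closing one turns it into a carried summary.
        parts = [self.live_digest] if self.live_digest else []
        if self.live_turns:
            parts.append(summarize(self.live_turns))
        if parts:
            self.scene_summaries.append(" ".join(parts))
        self.live_digest, self.live_turns = "", []

    def build_context(self) -> str:
        # The window holds a dense map of the story plus the live scene.
        lines = [f"Scene {i} summary: {s}" for i, s in enumerate(self.scene_summaries, 1)]
        if self.live_digest:
            lines.append(f"Earlier in this scene: {self.live_digest}")
        lines.extend(self.live_turns)
        return "\n".join(lines)


story = Story()
story.add_turn("Elena's sister sent a letter. It sits unopened on the counter.")
story.add_turn("The funeral is tomorrow. Elena has not slept.")
story.end_scene()
story.add_turn("Elena picks up the letter.")
context = story.build_context()
```

In this sketch the first scene shrinks to one summary line that still names the sister, the letter, and the funeral, and the live scene follows it verbatim. The context grows with the number of scenes, not the number of turns.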

A novel carries consequences forward, not every word. So does a well-built story engine.

The part that surprises people

Done well, compression makes the writing better, not just longer. A model staring at forty turns of raw dialogue gets distracted by the noise. A model handed a clean summary plus the current scene knows what's load-bearing. It foreshadows. It calls back to the letter you forgot you'd planted. It writes like something that has read the whole book, because in effect it has — through the summary, not the transcript.

You're the director. The engine follows. Part of following is remembering — and remembering well means knowing what to keep and what to let go, the same instinct a good editor brings to a draft.

If you want the full mechanics — what goes into a summary, how the compression boundary is chosen, why it stays free — read the deep dive, "How Scene-Based Context Compression Works." This was the plain-terms version. That one shows you the gears.

Underfiction is interactive storytelling under your direction. Stories are local by default; sync is optional and encrypted. New accounts start with 500 free credits after email confirmation.

Try Underfiction

Frequently asked questions

Why does my AI forget the plot after 20 messages?

Because the model reads your whole story from a fixed-size context window every turn, and once the story outgrows that budget, most tools trim the oldest text to make room. The early facts — names, deaths, established details — fall off the back, so the model writes confidently but loses the plot.

Does a bigger context window fix AI memory loss?

It delays the problem; it doesn't solve it. A larger window holds more turns before truncation starts, but any long story will eventually outrun any fixed budget. The real fix is compressing past scenes into carry-forward summaries so the window never fills with raw transcript.
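A back-of-the-envelope sketch shows why a bigger window only buys time. The numbers are made up: the per-turn token cost and both window sizes are assumptions chosen for illustration, not measurements of any particular model.

```python
def turns_before_truncation(window_tokens: int, tokens_per_turn: int) -> int:
    # Raw transcript grows linearly, so a fixed window fills after a fixed number of turns.
    return window_tokens // tokens_per_turn


TOKENS_PER_TURN = 400  # assumed average cost of one exchange of prose

small = turns_before_truncation(8_000, TOKENS_PER_TURN)   # 20 turns
large = turns_before_truncation(32_000, TOKENS_PER_TURN)  # 80 turns
```

Quadrupling the window quadruples the runway, but the story still hits the ceiling. It just hits it later.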

How does Underfiction keep track of long stories?

It treats scenes as chapters. When a scene ends, it's summarized and that summary carries forward. Inside long scenes, older turns are compressed automatically. The model always holds a dense map of the story plus the live scene, so it doesn't lose earlier facts.

Does compression cost extra credits?

No. Summaries and compression are free. You only pay for the prose you generate, never for the housekeeping that keeps your plot intact. There's no subscription and no ads.

Will summarizing my story make the writing worse?

Usually the opposite. A model handed clean summaries plus the current scene knows what's load-bearing, instead of getting distracted by dozens of turns of raw dialogue. It calls back to planted details and foreshadows better, because it's working from a map of the whole story.