How AI Actually "Knows" Things


A practical guide to getting more out of Claude and ChatGPT (written up from conversations I had last week!)


There's a counterintuitive truth about working with AI that took me a while to internalize: more isn't better. Not more training data, not more context, not more conversation history. The models that perform best on specific tasks are often trained on carefully curated, focused datasets. And the conversations that produce the most useful outputs are ones where I've been deliberate about what the AI can "see."

Understanding why requires unpacking how AI actually processes information. There are three distinct layers here, and they work very differently from each other.

The Three Layers of AI "Knowledge"

Training data is the foundation—the massive corpus of text and code that the model learned patterns from during its creation. Think of it like the books you read growing up that shaped how you think. You can't consciously recall every sentence from The Great Gatsby, but it influenced your sense of narrative structure, your vocabulary, certain cultural references. Training data works similarly. It's baked in, it has a cutoff date, and users can't change it.

This is where the "less is more" principle first shows up. The most capable specialized models aren't trained on "everything"—they're trained on carefully curated datasets relevant to their domain. A model trained on high-quality medical literature will outperform one trained on the entire internet for clinical reasoning. Signal-to-noise ratio matters enormously.

System instructions are the briefing that happens when a conversation starts. These shape how the model behaves—its persona, its rules, its capabilities. When you notice that Claude feels different in one app versus another, or that ChatGPT with custom instructions behaves differently than the default, you're experiencing the effect of different system instructions. They're powerful, but they're not "knowledge" exactly—they're more like behavioral parameters.

The context window is where the real action happens. This is everything the model can "see" right now: your current message, the conversation history, any documents you've uploaded or that it has loaded, and those system instructions. Think of it as the papers spread across your desk during a meeting.
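To make the "papers on the desk" metaphor concrete, here's a minimal sketch of a single request assembled by hand with the Anthropic Python SDK. The model name, the file path, and the turn contents are placeholders of mine; the point is that everything the model will "see"—system instructions, prior turns, loaded documents, your new message—travels inside this one payload.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# A document I've chosen to put "on the desk" for this conversation.
style_guide = open("notes/voice-and-style.md").read()  # placeholder path

response = client.messages.create(
    model="claude-opus-4-5",  # placeholder model name
    max_tokens=1024,
    # System instructions: behavioral parameters, not knowledge.
    system="You are an editor who gives terse, concrete feedback.",
    messages=[
        # Conversation history the model can "see" -- nothing outside this list exists for it.
        {"role": "user", "content": "Here is my style guide:\n\n" + style_guide},
        {"role": "assistant", "content": "Got it. Send me the draft."},
        # The current message.
        {"role": "user", "content": "Critique the opening paragraph of the draft below..."},
    ],
)
print(response.content[0].text)
```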

Here's what catches people off guard: what's not in the context window doesn't exist for the model. It has no persistent memory between conversations. That profound insight from Tuesday's session? Gone on Wednesday unless something brings it back into context.

And here's where "less is more" shows up again. The context window has limited space, and filling it with marginally relevant information dilutes the signal. If I load a 50-page document when I only need three paragraphs, I'm not being thorough—I'm introducing noise that makes the model's job harder.

Two Approaches to Persistent Memory

Given that fundamental limitation, there are two basic approaches to giving AI useful context about you and your work over time.

The application-managed approach is what Claude and ChatGPT do natively now. The platform automatically stores facts about you and retrieves them in future conversations. You mentioned you're a consultant? It might remember that. You talked about a project last month? Maybe it'll surface.

This approach has real benefits: zero effort on your part, it works automatically, it can improve over time. But it has a significant drawback I've come to call the "magic black box" problem. What does it actually remember? How does it decide what's relevant to surface? When I ask about my project, which of the fifty things I've mentioned is it retrieving? I can't see inside the mechanism, which means I can't debug it when responses miss the mark.

The user-controlled approach is what I use: maintaining my own files—a personal knowledge management vault—and selectively loading relevant context into conversations. When I'm writing a blog post, I load my voice and style guide. When I'm researching a complex topic, I load my previous notes on that subject. When I'm doing something routine, I load almost nothing.

This requires intentional management. It's not automatic. But the tradeoff is complete transparency: I can see exactly what the AI knows because I put it there. And crucially, I can apply that "smallest effective context" principle directly. I'm not hoping the system surfaces the right information—I'm deliberately loading only what's needed for the task at hand.
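Here's a minimal sketch of what that looks like in practice, assuming the vault is just a folder of Markdown files. The `load_context` helper, the folder location, and the note names are hypothetical examples of mine, not part of any particular tool.

```python
from pathlib import Path

VAULT = Path("~/vault").expanduser()  # hypothetical location of my notes

def load_context(*note_names: str) -> str:
    """Concatenate a handful of deliberately chosen vault notes into one context block."""
    sections = []
    for name in note_names:
        path = VAULT / f"{name}.md"
        sections.append(f"## {name}\n\n{path.read_text()}")
    return "\n\n".join(sections)

# Writing a blog post: load the style guide, nothing else.
writing_context = load_context("voice-and-style")

# Deep research session: load prior notes on the topic.
research_context = load_context("org-transformation-notes", "sources-annotated")
```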

The Practical Difference

Let me make this concrete. Say I'm writing a long-form piece on a topic I've been researching for months—organizational transformation, in my case.

With application memory, I'm hoping the system remembers the right things about this topic, the sources I've cited before, the arguments I've already made. Maybe it does. Maybe it surfaces something from a completely different conversation. Maybe it misses the key framework I developed three weeks ago.

With vault-loading, I explicitly load my research notes on this topic, my outline, and any relevant background pieces I've written. The AI sees exactly what I want it to see—no more, no less. The responses are more focused because the context is more focused.
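In code terms, using the hypothetical `load_context` helper from the earlier sketch (the note names and prompt are again placeholders), that deliberate selection is three explicit choices rather than a retrieval system's guess:

```python
# Explicitly choose what the model sees for this specific writing task.
context = load_context(
    "org-transformation-notes",  # research notes on the topic
    "long-form-outline",         # the outline for this piece
    "voice-and-style",           # how I want it to sound
)

prompt = context + "\n\nDraft the section on why transformation efforts stall."
```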

This gets at a broader principle I've learned the hard way: in AI work, loading everything creates noise and confusion. Loading exactly what's needed produces high-signal responses. The context window is precious real estate, and treating it that way—being stingy, being deliberate, being modular—dramatically improves output quality.

What Auto-Compression Changes

The newest models, including Claude Opus 4.5, are starting to address one of the most annoying practical problems with context management: what happens when a long conversation fills up the available space.

Previously, you had two options as a conversation got long. Either start fresh and lose continuity, or try to manually summarize what mattered and carry it forward. Neither was great. Starting fresh meant re-explaining context. Summarizing was tedious and error-prone.

Auto-compression changes this by automatically condensing earlier parts of the conversation while preserving key information. The model essentially creates running summaries of what came before, keeping the important bits accessible while freeing up space for new content. This extends effective conversation length significantly and reduces the manual overhead of context management.
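The platform handles this internally, so the following is only an illustrative approximation of the idea, not Anthropic's actual mechanism: when the history gets long, summarize the older turns into one compact message and keep the recent turns verbatim. It reuses the `client` and placeholder model name from the first sketch.

```python
def compress_history(client, messages, keep_recent=6):
    """Naive rolling compression: summarize older turns, keep recent ones verbatim."""
    if len(messages) <= keep_recent:
        return messages

    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old)

    summary = client.messages.create(
        model="claude-opus-4-5",  # placeholder model name
        max_tokens=500,
        system="Summarize this conversation, preserving decisions, facts, and open questions.",
        messages=[{"role": "user", "content": transcript}],
    ).content[0].text

    # The summary stands in for the older turns; recent turns stay intact.
    # Note: if the target API requires strictly alternating roles, merge any
    # consecutive same-role turns before sending.
    return [{"role": "user", "content": f"(Summary of earlier conversation)\n{summary}"}] + recent
```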

It's genuinely useful. Fewer "let's start a new conversation" moments. Better continuity on complex projects. Less friction overall.

But—and this is important—it doesn't eliminate the value of intentional context loading. A compressed conversation history is still a conversation history, with all its tangents and exploratory dead ends. Loading fresh, focused context for a specific task will still produce more targeted results than relying on compressed history alone.

Think of it this way: auto-compression makes conversations more forgiving, but deliberate context engineering still wins for quality.

Takeaways

If you remember nothing else from this:

Training data is what the model knows. System instructions shape how it behaves. The context window is what it sees right now. You can influence context directly; you can't influence training.

More deliberate context loading produces better results. The instinct to give the AI "everything it might need" is usually counterproductive. Find the smallest set of high-signal information that enables the task.

Application-managed memory is convenient but opaque. User-controlled context is more work but more reliable. The right choice depends on how much precision you need.

Auto-compression helps with long conversations, but it's not a substitute for intentional context management. Both have their place.

The underlying principle is the same one that applies to training data, to context windows, to everything in this domain: quality beats quantity. Focus beats comprehensiveness. The smallest effective context set, carefully chosen, outperforms the kitchen sink every time.


Questions? Thoughts? I'm genuinely curious how others are thinking about this—particularly anyone who's built their own context-loading systems or found creative approaches to the memory problem.