From 48KB to 5KB: Rebuilding My AI Assistant's Architecture to Better Manage Context

Last weekend, I rebuilt my Claude PKM system. Not because it was broken—it was working perfectly fine. But "working" and "optimal" are two very different standards, especially when you're scaling AI collaboration across an organization.

Over the past month, my Claude system had grown to 48KB of instructions. Every interaction loaded my entire professional context, voice guide, technical specifications, and decision frameworks—whether Claude needed them or not. It was like requiring every employee to memorize the entire company handbook before answering any email. Functional? Sure. Efficient? Not even close.

The Problem Was Context Bloat

The old system was a single 48KB instruction file—my "Claude Core System Instructions"—that loaded every conversation. Professional background, writing voice, stakeholder relationships, Obsidian formatting rules, decision frameworks. All of it, every time.

This created two problems. First, wasted tokens on irrelevant context. Claude loaded my diplomatic stakeholder map when I asked for a markdown syntax check. Second, slower response times. More context means more processing, even when that context adds zero value. This is such an issue that Anthropic even wrote a full-length piece about it this week.

The solution: modular architecture with intelligent loading.

How We Built It

I started by analyzing a month of conversation logs to identify context usage patterns. Which information actually got used? Which sat idle? The data showed clear clusters:

Always needed together: Obsidian operations always needed formatting rules. Ghostwriting always needed voice guide + personal context. Stakeholder discussions always needed relationship maps.

Sometimes needed: Strategic planning sometimes needed full context, sometimes didn't. Quick questions rarely needed anything beyond core instructions.

Never needed together: Technical syntax rules never mattered during strategic planning. Voice guides were irrelevant for vault searches.
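The clustering itself can be done with simple co-occurrence counting. Here's a minimal sketch of the idea, using a hypothetical log format in which each conversation records the context sections it actually used (the section names and sample data are illustrative, not the real logs):

```python
from collections import Counter
from itertools import combinations

# Hypothetical log: the context sections each conversation actually used.
conversations = [
    {"obsidian_syntax"},                     # quick markdown check
    {"voice_guide", "personal_context"},     # ghostwritten email
    {"personal_context"},                    # stakeholder question
    {"voice_guide", "personal_context"},     # another draft
    {"obsidian_syntax"},                     # vault operation
]

# Count how often each pair of sections appears in the same conversation.
pair_counts = Counter()
for used in conversations:
    for pair in combinations(sorted(used), 2):
        pair_counts[pair] += 1

# Pairs that frequently co-occur are candidates for a single module;
# sections that never co-occur belong in separate modules.
for pair, n in pair_counts.most_common():
    print(pair, n)
```

With real logs, the "always needed together" pairs surface immediately, and they become the module boundaries.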

This gave me the module structure:

Core Instructions (5KB)

  • Identity and behavioral rules
  • PAUSE protocol (check existing resources before building new ones)
  • Module loading logic and trigger patterns
  • Manual override commands

Personal Context (14KB)

  • Professional background and stakeholder relationships
  • Work patterns and priorities
  • Triggers: Names, organizations, strategic topics

Voice Guide (12KB)

  • Writing DNA, tone patterns, structural preferences
  • Content type templates (emails, memos, blog posts)
  • Triggers: "Write," "draft," "ghostwrite," "in my voice"

Obsidian Syntax (12KB)

  • YAML frontmatter rules and markdown formatting
  • Vault organization principles and file handling specs
  • Triggers: "Create note," "add to vault," ".md file"

The loading logic itself is pattern matching in the core. When I say "create a note," Claude scans for vault-operation keywords and auto-loads Obsidian Syntax. When I say "draft an email to xxx," it catches both the writing trigger and the stakeholder name, loading Voice Guide + Personal Context.
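The matching logic can be sketched as a keyword scan against a module registry. This is a simplified illustration of the approach, not the actual instruction file; the trigger lists mirror the modules above, and the registry structure is an assumption:

```python
# Hypothetical registry mapping each module to the keywords that load it.
MODULES = {
    "voice_guide": ["write", "draft", "ghostwrite", "in my voice"],
    "obsidian_syntax": ["create note", "add to vault", ".md file"],
    "personal_context": ["partnership", "stakeholder"],  # plus names, orgs
}

def modules_for(message: str) -> set[str]:
    """Scan the message for trigger keywords and return the modules to load."""
    text = message.lower()
    return {name for name, triggers in MODULES.items()
            if any(t in text for t in triggers)}

# "Draft" hits the writing trigger, "partnership" hits the stakeholder trigger,
# so both Voice Guide and Personal Context load together.
print(modules_for("Draft an email about the partnership"))
```

A single message can trip multiple triggers, which is exactly the behavior described above: one sentence loads both the voice and the relationship context.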

I built in three loading behaviors:

Auto-load (silent): Safety-critical stuff. Obsidian formatting loads automatically because corrupted markdown breaks the vault. Voice guide loads automatically because ghostwriting without it produces generic corporate speak.

Query first: Ambiguous contexts. "Let's analyze the partnership strategy" triggers a prompt: "Load personal context for stakeholder dynamics? (Y/N)"

Manual override: I can say "minimal context" to work with just the core, or "full context" to load everything. "What's loaded?" shows me current modules and token usage.

The core doesn't just load modules reactively. It explains when and why. If it auto-loads something, I get a message: "[Auto-loading Obsidian Syntax for vault operation...]". This keeps the loading logic transparent, not magical.

The actual implementation was collaborative. I worked with Claude to spec out the architecture—defining module boundaries, identifying trigger patterns, and designing the loading logic. Once we had the parameters clear, Claude rewrote the instruction files based on those specs. We tested the new system against typical use cases, validated that nothing broke, and archived the old monolithic instructions in case we needed to revert. The whole rebuild took about two hours.

The Results

Baseline context dropped from 48KB to 5KB—a 90% reduction. Full functionality preserved. Response times noticeably faster, especially on simple queries that don't need heavy context.

But the real win is intentionality. I'm now explicit about which context I'm using when. The system shows me what's loaded, what's available, and asks before pulling in expensive modules. Context becomes a conscious choice, not an invisible default.

Some Parallels with Organizational Design

This architecture mirrors how effective teams actually work, and it's a useful bridge for thinking about human/AI collaboration. You don't bring your entire organization to every meeting. You have core principles and decision rights, then pull in specific expertise when context demands it. Legal joins when contracts arise. Finance joins when budgets shift. Communications joins when external messaging matters.

A lot of organizational complexity exists because we're afraid of missing something important. So we include everyone, all the time, just in case. It's expensive, slow, and ultimately less effective than modular systems that load context deliberately. As we scale with AI, I suspect more of us will build patterns like this: not just specialized, but modular.

But you can't modularize what you don't understand. I needed a month of running monolithic before I could identify the usage patterns that informed module boundaries. Same with teams—you can't design efficient collaboration structures until you've mapped actual information flows and decision patterns.

Start by documenting what exists. Observe the patterns. Then design your architecture. Real transformation requires the confidence to travel light, knowing you can access what you need when you need it.