Progressive Context Enrichment for LLMs
Learn how progressive context enrichment can improve LLM performance by optimizing attention and context delivery, based on our production experience with GPT-4 and Claude.
Based on our experience with production LLMs like GPT-4o and Claude, we discovered something counterintuitive: giving LLMs more context often makes them perform worse, not better, at instruction-following tasks. The problem isn't just context window size; it's attention.
Current-generation LLMs are surprisingly bad at holding attention across large contexts. Give them a complex set of instructions alongside a lot of background information and they'll start dropping the ball: they might process half the articles instead of all of them, or forget crucial steps in multi-step tasks. It's not that they're disobeying instructions; they're just forgetting them.
```mermaid
flowchart TD
    A[Task Start] --> B{Need More Context?}
    B -->|Yes| C[Tool Call]
    C --> D[Fetch Specific Data]
    D --> E[Process Current Step]
    E --> B
    B -->|No| F[Complete Step]
    F --> G{More Steps?}
    G -->|Yes| B
    G -->|No| H[Task Complete]
    style A fill:#f9f,stroke:#333,stroke-width:2px
    style H fill:#9ff,stroke:#333,stroke-width:2px
```
A Different Approach: Progressive Context Enrichment
Instead of cramming everything into the context window upfront, we've found success with a different approach: letting the LLM fetch context when it needs it. Think of it like progressive disclosure in UX design, where interfaces reveal complexity gradually instead of overwhelming users with everything at once.
The idea is simple: rather than saying "here's all my article data," we expose a tool that lets the LLM search for specific articles when needed. The model constructs search queries in real time, fetching only the data it needs for the current step of its task.
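To make that concrete, here's a minimal sketch of what such a tool surface could look like. The names (searchArticles, ArticleHit) and the OpenAI-style function-calling format are illustrative assumptions, not the exact interface we ship.

```typescript
// Instead of inlining all article data, expose a search tool the model can
// call per step. Names and shapes here are illustrative.

type ArticleHit = { id: string; title: string; snippet: string };

// The tool surface the LLM sees, in OpenAI-style function-calling format.
const searchArticlesTool = {
  type: "function",
  function: {
    name: "searchArticles",
    description: "Search the article store and return only matching articles",
    parameters: {
      type: "object",
      properties: {
        query: { type: "string", description: "Free-text search query" },
        limit: { type: "number", description: "Maximum results to return" },
      },
      required: ["query"],
    },
  },
} as const;

// Handler invoked when the model calls the tool: it returns a small,
// focused slice of data rather than the whole corpus.
async function searchArticles(query: string, limit = 5): Promise<ArticleHit[]> {
  const corpus: ArticleHit[] = []; // replace with a real search backend
  return corpus
    .filter((a) => a.title.toLowerCase().includes(query.toLowerCase()))
    .slice(0, limit);
}
```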
Why This Works Better
- Focused Attention: The model only needs to handle the information relevant to each step, reducing the chance of "forgetting" important instructions.
- Going beyond working memory: Your system's capability isn't limited by the context window size; the model can theoretically process unlimited data by fetching what it needs.
- Cost: Tool calls that fetch a subset of the data are much cheaper than running the LLM over the full dataset. (Of course, you pay for this in latency.)
The Progressive Context Pattern
The pattern works like this (sketched in code after the list):
- Start with minimal context - just enough to understand the task
- Let the model identify when it needs more information
- Provide tools to fetch specific data
- Process the current step with focused context
- Repeat as needed
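In code, the loop looks roughly like the sketch below. The message and tool shapes (ModelTurn, Tool) and the callModel parameter are stand-ins for whatever model API you're using, not a specific SDK.

```typescript
// A rough sketch of the progressive context loop. The model either finishes
// or asks for a specific tool call; we only ever feed back the slice it asked for.

type Message = { role: "user" | "assistant" | "tool"; content: string };
type ToolCall = { name: string; args: Record<string, unknown> };
type ModelTurn = { done: boolean; text?: string; toolCall?: ToolCall };
type Tool = (args: Record<string, unknown>) => Promise<string>;

async function runTask(
  task: string,
  tools: Record<string, Tool>,
  callModel: (messages: Message[]) => Promise<ModelTurn>
): Promise<string> {
  // 1. Start with minimal context: just enough to understand the task.
  const messages: Message[] = [{ role: "user", content: task }];

  while (true) {
    const turn = await callModel(messages);

    // 5. No more steps: the model finished with focused context throughout.
    if (turn.done) return turn.text ?? "";

    // 2-3. The model identified that it needs more information and named a tool.
    if (!turn.toolCall) return "";
    const result = await tools[turn.toolCall.name](turn.toolCall.args);

    // 4. Feed back only the focused slice needed for the current step.
    messages.push({ role: "tool", content: result });
  }
}
```

The key property is that the message history only accumulates the slices the model explicitly asked for, one step at a time.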
This mirrors how humans process complex tasks. We don't load everything into our working memory at once - we retrieve information as needed.
A Real World Example
Let's look at how we implement this pattern in our open-source data connector for Postgres. The core idea is simple: instead of dumping the entire database schema into the context, we provide tools for the LLM to:
- Get the basic schema information when needed
- Execute queries with approval gates
- Control how much data flows back to the model
Here's how it works:
```mermaid
sequenceDiagram
    participant LLM
    participant Connector
    participant Database
    Note over LLM,Database: Initial Phase - Light Context
    LLM->>Connector: Request Schema Info
    Connector->>Database: Get Table Structure
    Database-->>Connector: Basic Schema
    Connector-->>LLM: Tables + Columns
    Note over LLM,Database: Query Phase
    LLM->>Connector: Execute Query
    alt Approval Mode
        Connector->>Human: Request Approval
        Human-->>Connector: Approve/Deny
    end
    Connector->>Database: Run Query
    Database-->>Connector: Results
    alt Privacy Mode
        Connector-->>Human: Direct Results
    else Normal Mode
        Connector-->>LLM: Results for Processing
    end
```
The connector provides two main functions:
- getPostgresContext: Fetches schema information when the LLM needs it
- executePostgresQuery: Runs queries with built-in safeguards
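As a rough sketch of what those two tools could look like against node-postgres (the real implementations live in the data-connector repository, so treat these signatures as illustrative assumptions):

```typescript
import { Client } from "pg"; // node-postgres

// getPostgresContext: fetch just enough schema for the model to plan a query,
// with no row data attached.
async function getPostgresContext(client: Client): Promise<string> {
  const res = await client.query(
    `SELECT table_name, column_name, data_type
       FROM information_schema.columns
      WHERE table_schema = 'public'
      ORDER BY table_name, ordinal_position`
  );
  // Compact textual summary: tables and columns only.
  return res.rows
    .map((r) => `${r.table_name}.${r.column_name} (${r.data_type})`)
    .join("\n");
}

// executePostgresQuery: run a model-generated query behind an approval gate.
async function executePostgresQuery(
  client: Client,
  sql: string,
  approve: (sql: string) => Promise<boolean> // e.g. prompts a human in approval mode
): Promise<unknown[]> {
  if (!(await approve(sql))) {
    throw new Error("Query rejected by approver");
  }
  const res = await client.query(sql);
  return res.rows;
}
```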
The interesting bits are the safeguards we've built in:
- Privacy Mode: Results can bypass the LLM entirely and go straight to the user
- Approval Mode: Queries need human approval before execution
These safeguards aren't just security features - they're practical necessities for working with LLMs at scale. When result sets are too large for the context window, we route them directly to the user instead of trying to stuff them through the model.
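Here's a sketch of that routing decision; the size threshold and helper names are assumptions for illustration, not the connector's actual values:

```typescript
// Route query results either back to the model or straight to the user.
// Privacy mode always bypasses the model; oversized results do too.

const MAX_RESULT_CHARS = 20_000; // rough proxy for "fits in the context window"

type RouteTarget = "user" | "llm";

function routeResults(
  rows: unknown[],
  privacyMode: boolean
): { target: RouteTarget; payload: string } {
  const payload = JSON.stringify(rows);
  if (privacyMode || payload.length > MAX_RESULT_CHARS) {
    // Bypass the model: the user sees the raw results, the model only
    // learns that the query succeeded.
    return { target: "user", payload };
  }
  return { target: "llm", payload };
}
```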
Challenges and Considerations
This approach isn't without tradeoffs. You need to:
- Design clear tool interfaces for context fetching
- Handle potential latency from multiple tool calls
- Balance many small tool calls against pulling too much context at once
But in our experience, these challenges are more manageable than dealing with unreliable execution in large contexts.
Looking Forward
Progressive context enrichment isn't just a theoretical pattern - it's a practical approach we use in production. As LLM capabilities evolve, attention span might improve. But for now, this pattern of "fetch what you need, when you need it" provides a reliable way to build AI systems that can handle complex, data-heavy tasks without getting lost in their own context.
The code for this implementation is open source and available in our data-connector repository. We also have an announcement post on the agentic use cases on our blog.