Progressive Context Enrichment for LLMs

Learn how progressive context enrichment can improve LLM performance by optimizing attention and context delivery, based on our production experience with GPT-4o and Claude.

Nadeesha Cabral on 02-01-2025

Based on our experience with production LLMs like GPT-4o and Claude, we discovered something counterintuitive: giving LLMs more context often makes them perform worse at instruction-following tasks, not better. The problem isn't just context window size - it's attention.

Current-generation LLMs are surprisingly bad at sustaining attention across large contexts. Give them a complex set of instructions buried in a lot of background information, and they'll start dropping the ball. They might process half the articles instead of all of them, or skip crucial steps in a multi-step task. It's not that they're disobeying instructions - they're forgetting them.

flowchart TD
    A[Task Start] --> B{Need More Context?}
    B -->|Yes| C[Tool Call]
    C --> D[Fetch Specific Data]
    D --> E[Process Current Step]
    E --> B
    B -->|No| F[Complete Step]
    F --> G{More Steps?}
    G -->|Yes| B
    G -->|No| H[Task Complete]

    style A fill:#f9f,stroke:#333,stroke-width:2px
    style H fill:#9ff,stroke:#333,stroke-width:2px

A Different Approach: Progressive Context Enrichment

Instead of cramming everything into the context window upfront, we've found success with a different approach: letting the LLM fetch context when it needs it. Think of it like progressive disclosure in UX design, where interfaces reveal complexity gradually instead of overwhelming users with everything at once.

The idea is simple: rather than saying "here's all my article data," we expose a tool that lets the LLM search for specific articles when needed. The model constructs search queries in real time, fetching only the data it needs for the current step of its task.
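
To make that concrete, here's one way such a tool might be exposed, using the OpenAI Node SDK for illustration. The search_articles tool, its schema, and the example prompt are all hypothetical, not a specific production setup:

import OpenAI from "openai";

const client = new OpenAI();

// A hypothetical search tool - the model fills in the query itself.
const tools: OpenAI.Chat.Completions.ChatCompletionTool[] = [
  {
    type: "function",
    function: {
      name: "search_articles",
      description: "Search the article store and return only matching articles",
      parameters: {
        type: "object",
        properties: {
          query: { type: "string", description: "Free-text search query" },
          limit: { type: "number", description: "Maximum articles to return" },
        },
        required: ["query"],
      },
    },
  },
];

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Summarize our pricing articles" }],
  // The model calls search_articles instead of receiving every article upfront.
  tools,
});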

Why This Works Better

  1. Focused Attention: The model only needs to handle relevant information for each step, reducing the chance of "forgetting" important instructions.

  2. Going beyond working memory: Your system's capability isn't limited by context window size - the model can theoretically process unlimited data by fetching what it needs.

  3. Cost: Tool calls that fetch a subset of the data are much cheaper than running the LLM over the full dataset. (You pay for this in latency, of course - see the rough numbers after this list.)
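
To put rough, illustrative numbers on that tradeoff (these are assumptions, not benchmarks): if a task takes five steps and each tool call returns a ~2,000-token slice of a ~200,000-token dataset, the model reads about 10,000 tokens of data instead of 200,000 - a 20x reduction in input tokens, paid for with five extra round trips.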

The Progressive Context Pattern

The pattern works like this (sketched in code below):

  1. Start with minimal context - just enough to understand the task
  2. Let the model identify when it needs more information
  3. Provide tools to fetch specific data
  4. Process the current step with focused context
  5. Repeat as needed
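
Here's a minimal sketch of that loop, again assuming the OpenAI Node SDK and the hypothetical search_articles tool from the earlier example:

import OpenAI from "openai";

const client = new OpenAI();

// Hypothetical pieces from the earlier sketch.
declare const tools: OpenAI.Chat.Completions.ChatCompletionTool[];
declare function searchArticles(query: string, limit?: number): Promise<unknown>;

async function runTask(task: string): Promise<string | null> {
  // Step 1: start with minimal context - just the task itself.
  const messages: OpenAI.Chat.Completions.ChatCompletionMessageParam[] = [
    { role: "user", content: task },
  ];

  while (true) {
    const response = await client.chat.completions.create({
      model: "gpt-4o",
      messages,
      tools,
    });

    const message = response.choices[0].message;
    messages.push(message);

    // Steps 2-3: the model signals it needs more data by calling a tool.
    if (!message.tool_calls?.length) {
      return message.content; // Step 5: nothing left to fetch - done.
    }

    // Step 4: fetch only what the current step needs and feed it back.
    for (const call of message.tool_calls) {
      if (call.type !== "function") continue;
      const args = JSON.parse(call.function.arguments);
      messages.push({
        role: "tool",
        tool_call_id: call.id,
        content: JSON.stringify(await searchArticles(args.query, args.limit)),
      });
    }
  }
}

Each iteration keeps the working context small: only the current step's tool results are appended, never the whole dataset.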

This mirrors how humans process complex tasks. We don't load everything into our working memory at once - we retrieve information as needed.

A Real World Example

Let's look at how we implement this pattern in our open-source data connector for Postgres. The core idea is simple: instead of dumping the entire database schema into the context, we provide tools for the LLM to:

  1. Get the basic schema information when needed
  2. Execute queries with approval gates
  3. Control how much data flows back to the model

Here's how it works:

sequenceDiagram
    participant LLM
    participant Connector
    participant Database
    
    Note over LLM,Database: Initial Phase - Light Context
    LLM->>Connector: Request Schema Info
    Connector->>Database: Get Table Structure
    Database-->>Connector: Basic Schema
    Connector-->>LLM: Tables + Columns

    Note over LLM,Database: Query Phase
    LLM->>Connector: Execute Query
    alt Approval Mode
        Connector->>Human: Request Approval
        Human-->>Connector: Approve/Deny
    end
    Connector->>Database: Run Query
    Database-->>Connector: Results
    
    alt Privacy Mode
        Connector-->>Human: Direct Results
    else Normal Mode
        Connector-->>LLM: Results for Processing
    end

The connector provides two main functions (sketched in code after the list):

  1. getPostgresContext: Fetches schema information when the LLM needs it
  2. executePostgresQuery: Runs queries with built-in safeguards
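
The repository has the full implementations; as a rough sketch of their shape, assuming node-postgres (pg) and leaving the approval helper abstract:

import { Pool } from "pg";

const pool = new Pool(); // connection config from standard PG* env vars

// Fetch just the schema - table, column, and type names, not row data.
// This is all the model needs to start writing queries.
async function getPostgresContext() {
  const { rows } = await pool.query(`
    SELECT table_name, column_name, data_type
    FROM information_schema.columns
    WHERE table_schema = 'public'
    ORDER BY table_name, ordinal_position
  `);
  return rows;
}

// Run a query, gated behind human approval when approval mode is on.
async function executePostgresQuery(sql: string, approvalMode: boolean) {
  if (approvalMode && !(await requestHumanApproval(sql))) {
    return { error: "Query rejected by a human operator" };
  }
  const { rows } = await pool.query(sql);
  return rows;
}

// Hypothetical stand-in for whatever approval UI your system provides.
declare function requestHumanApproval(sql: string): Promise<boolean>;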

The interesting bits are the safeguards we've built in:

  • Privacy Mode: Results can bypass the LLM entirely and go straight to the user
  • Approval Mode: Queries need human approval before execution

These safeguards aren't just security features - they're practical necessities for working with LLMs at scale. When result sets are too large for the context window, we route them directly to the user instead of trying to stuff them through the model.
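
In code, that routing decision might look like the sketch below, where MAX_RESULT_TOKENS and the countTokens and sendDirectlyToUser helpers are hypothetical:

const MAX_RESULT_TOKENS = 8_000; // illustrative budget - tune per model

// Decide whether results go through the model or around it.
async function routeResults(rows: unknown[], privacyMode: boolean) {
  const payload = JSON.stringify(rows);
  // Bypass the model when privacy mode is on, or when the result
  // set is too large to stuff through the context window.
  if (privacyMode || countTokens(payload) > MAX_RESULT_TOKENS) {
    await sendDirectlyToUser(rows);
    return { message: `Sent ${rows.length} rows directly to the user` };
  }
  return payload; // small enough for the model to process
}

declare function countTokens(text: string): number;
declare function sendDirectlyToUser(rows: unknown[]): Promise<void>;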

Challenges and Considerations

This approach isn't without tradeoffs. You need to:

  • Design clear tool interfaces for context fetching
  • Handle potential latency from multiple tool calls
  • Balance between too many small calls and too much context

But in our experience, these challenges are more manageable than dealing with unreliable execution in large contexts.

Looking Forward

Progressive context enrichment isn't just a theoretical pattern - it's a practical approach we use in production. As LLM capabilities evolve, attention span might improve. But for now, this pattern of "fetch what you need, when you need it" provides a reliable way to build AI systems that can handle complex, data-heavy tasks without getting lost in their own context.

The code for this implementation is open source and available in our data-connector repository. We also have an announcement post covering the agentic use cases on our blog.

Subscribe to our newsletter for high-signal updates at the intersection of AI agents, LLMs, and distributed systems.

Maximum one email per week.
