Implementing structured outputs as a feature for any LLM
Learn how to build reliable JSON parsing for LLMs using Zod schemas and recursive retries. A practical guide to getting dependable, schema-validated structured outputs from any language model.
Structured output makes an LLM much easier to build on: instead of scraping free-form text, your application gets data it can consume directly. OpenAI, for example, already supports this natively.
But getting structured output from a model that doesn't support it can be tricky. Function calling APIs exist for models like GPT-4, but not for every model, and even when they are available, the output isn't always guaranteed to be valid JSON.
Let's build a simple but reliable JSON parser that:
- Uses Zod to validate the structure
- Recursively retries with validation errors as feedback
- Works with local models via Ollama
Setting up
First, let's install our dependencies:
npm install zod async-retry
We'll also need Ollama running locally with llama3.2. If you haven't already:
ollama pull llama3.2
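If you want to sanity-check that Ollama is reachable before wiring up the parser, its API exposes a /api/tags endpoint (on the default local port 11434) that lists the models you've pulled. A minimal check, assuming those defaults:
// Quick sanity check: list locally available models from the Ollama API.
const res = await fetch("http://localhost:11434/api/tags");
const { models } = await res.json();
console.log(models.map((m: { name: string }) => m.name));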
The Parser Implementation
Here's our implementation of a recursive JSON parser that keeps trying until it gets valid JSON that matches our schema:
import { z } from "zod";
import retry from "async-retry";
interface ParserOptions {
maxRetries?: number;
schema: z.ZodSchema;
prompt: string;
}
async function callOllama(prompt: string) {
const response = await fetch("http://localhost:11434/api/generate", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
model: "llama3.2",
prompt: prompt,
stream: false,
}),
});
const data = await response.json();
return data.response;
}
export async function parseWithRetry({
  maxRetries = 3,
  schema,
  prompt,
}: ParserOptions) {
  // Keep the last validation error around so it can be fed back to the model
  let lastError: string | undefined;

  return retry(
    async (bail, attempt) => {
      try {
        // If this isn't the first attempt, add the previous error to the prompt
        const fullPrompt =
          attempt === 1
            ? prompt
            : `${prompt}\n\nPrevious attempt failed with error: ${lastError}. Please fix and try again.`;

        const response = await callOllama(fullPrompt);

        // Extract JSON from the response
        const jsonMatch = response.match(/\{[\s\S]*\}/);
        if (!jsonMatch) {
          throw new Error("No JSON found in response");
        }

        const parsed = JSON.parse(jsonMatch[0]);
        return schema.parse(parsed);
      } catch (error) {
        // Remember the error so the next attempt can include it in the prompt
        lastError = error instanceof Error ? error.message : String(error);

        // async-retry makes maxRetries + 1 attempts in total; bail once they're used up
        if (attempt > maxRetries) {
          bail(error as Error);
          return;
        }
        throw error;
      }
    },
    {
      retries: maxRetries,
      factor: 1,
      minTimeout: 100,
      maxTimeout: 1000,
    }
  );
}
Using the Parser
Let's try it out with a simple movie recommendation schema:
import { z } from "zod";
import { parseWithRetry } from "./parser";
const MovieSchema = z.object({
title: z.string(),
year: z.number(),
rating: z.number().min(0).max(10),
genres: z.array(z.string()),
});
async function main() {
const prompt = `
Give me a movie recommendation in JSON format with the following structure:
- title (string)
- year (number)
- rating (number between 0-10)
- genres (array of strings)
Return only the JSON, no other text.
Here's an example of valid JSON:
${JSON.stringify({
  title: "The Dark Knight",
  year: 2008,
  rating: 9.0,
  genres: ["Action", "Crime", "Drama"],
})}
`;
try {
const result = await parseWithRetry({
schema: MovieSchema,
prompt,
maxRetries: 3,
});
console.log("Parsed result:", result);
} catch (error) {
console.error("Failed after all retries:", error);
}
}
main();
When you run this, you might see something like:
{
  "title": "Inception",
  "year": 2010,
  "rating": 8.8,
  "genres": ["Science Fiction", "Action", "Thriller"]
}
How it Works
- The parser takes a Zod schema, a prompt, and optional retry settings
- It sends the prompt to Ollama's llama3.2 model
- It extracts JSON from the response with a regex
- It validates the parsed JSON against the Zod schema
- If validation fails, it retries with the error message appended to the prompt (see the helper sketch after this list for making that feedback more readable)
- This continues until we get valid JSON or hit the retry limit
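One refinement worth considering: a ZodError's default message is a JSON dump of its issues, which is noisy feedback for a model. A small helper like the hypothetical formatZodError below could be used in the catch block to turn those issues into a compact list before they're appended to the retry prompt:
import { z } from "zod";

// Turn Zod issues into a short, human-readable list the model can act on.
function formatZodError(error: z.ZodError): string {
  return error.issues
    .map((issue) => `- ${issue.path.join(".") || "(root)"}: ${issue.message}`)
    .join("\n");
}

// Inside parseWithRetry's catch block, you could then set:
//   lastError = error instanceof z.ZodError ? formatZodError(error) : String(error);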
Improving the Parser
This approach works best with simpler schemas; complex nested structures may need more retries. In our experience, breaking the schema into smaller chunks and handling each chunk separately works well, as in the sketch below.
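For instance, a movie schema could be split into two smaller requests and merged afterwards. The sub-schemas and parseMovieInChunks below are illustrative names, built on the parseWithRetry function from above:
import { z } from "zod";
import { parseWithRetry } from "./parser";

// Sub-schemas for two smaller, easier-to-satisfy requests.
const DetailsSchema = z.object({
  title: z.string(),
  year: z.number(),
});

const RatingsSchema = z.object({
  rating: z.number().min(0).max(10),
  genres: z.array(z.string()),
});

// Ask for each chunk separately, then merge the validated pieces.
async function parseMovieInChunks() {
  const details = await parseWithRetry({
    schema: DetailsSchema,
    prompt:
      "Recommend a movie. Return only JSON with: title (string), year (number).",
  });

  const ratings = await parseWithRetry({
    schema: RatingsSchema,
    prompt: `For the movie "${details.title}", return only JSON with: rating (number between 0-10), genres (array of strings).`,
  });

  return { ...details, ...ratings };
}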
In other cases, providing few-shot examples of valid and invalid JSON can help the model understand the schema better.
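One way to do that is a small prompt helper. withFewShotExamples below is a hypothetical function, and the good/bad examples are made up for illustration:
// Hypothetical helper: augment a prompt with one valid and one invalid example.
function withFewShotExamples(prompt: string): string {
  const valid = JSON.stringify({
    title: "The Matrix",
    year: 1999,
    rating: 8.7,
    genres: ["Science Fiction", "Action"],
  });
  // Deliberately broken: year is a string, rating and genres are missing.
  const invalid = '{ "title": "The Matrix", "year": "1999" }';

  return [
    prompt,
    "Here is a VALID example:",
    valid,
    "Here is an INVALID example (do not produce output like this):",
    invalid,
  ].join("\n\n");
}

// Usage with the parser from above:
// const result = await parseWithRetry({ schema: MovieSchema, prompt: withFewShotExamples(prompt) });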
If you're using Claude, you can also prefill the start of Claude's response to force it to continue in the correct format:
[
{
"role": "user",
"content": "What is your favorite color? Output only the JSON, no other text."
},
{
"role": "assistant",
"content": "{ \"color\":"
}
]
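If you're calling Claude from Node, a sketch of that prefill with the official @anthropic-ai/sdk client might look like the following (the model name and max_tokens are just example values, and you need to prepend the prefilled text before parsing):
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic();
const prefill = '{ "color":';

const message = await anthropic.messages.create({
  model: "claude-3-opus-20240229",
  max_tokens: 256,
  messages: [
    {
      role: "user",
      content: "What is your favorite color? Output only the JSON, no other text.",
    },
    // Prefilled assistant turn: Claude continues from here instead of adding preamble.
    { role: "assistant", content: prefill },
  ],
});

const block = message.content[0];
const json = JSON.parse(prefill + (block.type === "text" ? block.text : ""));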
To run this example on your own machine, clone the repo.
Structured Outputs in Popular Models
While we've built our own parser above, many popular LLM providers offer native support for structured outputs. Let's look at how different providers handle this:
OpenAI
OpenAI provides built-in JSON mode through the response_format parameter:
const completion = await openai.chat.completions.create({
model: "gpt-4-turbo-preview",
response_format: { type: "json_object" },
messages: [
{
role: "system",
content: "You are a helpful assistant designed to output JSON.",
},
{ role: "user", content: "Who won the world series in 2020?" },
],
});
JSON mode guarantees that the response is syntactically valid JSON, which removes a whole class of parsing failures; note that it does not, by itself, guarantee the JSON matches a particular schema.
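If you're already using Zod (as in the parser above), recent versions of the OpenAI Node SDK also ship a zodResponseFormat helper for Structured Outputs, which enforces a schema server-side. A sketch, with a simplified movie schema (range constraints are omitted here to stay within the subset of JSON Schema that Structured Outputs supports):
import OpenAI from "openai";
import { zodResponseFormat } from "openai/helpers/zod";
import { z } from "zod";

const MovieSchema = z.object({
  title: z.string(),
  year: z.number(),
  rating: z.number(),
  genres: z.array(z.string()),
});

const openai = new OpenAI();

// Structured Outputs constrains generation to the schema and parses the result.
const completion = await openai.beta.chat.completions.parse({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Give me a movie recommendation." }],
  response_format: zodResponseFormat(MovieSchema, "movie"),
});

console.log(completion.choices[0].message.parsed);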
Google Gemini
Gemini supports structured outputs through both JSON schemas and type hints:
import google.generativeai as genai
from typing_extensions import TypedDict
class Recipe(TypedDict):
recipe_name: str
ingredients: list[str]
model = genai.GenerativeModel("gemini-1.5-pro-latest")
result = model.generate_content(
"List a few cookie recipes.",
generation_config=genai.GenerationConfig(
response_mime_type="application/json",
response_schema=list[Recipe]
)
)
Gemini's approach is particularly powerful for Python developers as it integrates well with type hints and allows for complex nested structures.
LiteLLM
LiteLLM provides a unified interface for JSON mode across different providers. It supports client-side validation of JSON schemas:
from litellm import completion
from pydantic import BaseModel
class CalendarEvent(BaseModel):
name: str
date: str
participants: list[str]
response = completion(
model="gpt-4o",
messages=[{"role": "user", "content": "Schedule a team meeting"}],
response_format=CalendarEvent
)
This abstraction allows you to use structured outputs consistently across different LLM providers while maintaining type safety.
Anthropic Claude
Claude takes a different approach, focusing on template-based structured outputs:
prompt = """<report>
<summary>
<metric name="total_revenue">$0.00</metric>
<metric name="units_sold">0</metric>
</summary>
</report>"""
response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    system="Always respond in the exact format provided",
    messages=[{"role": "user", "content": prompt}],
)