Implementing structured outputs as a feature for any LLM
Learn how to build reliable JSON parsing for LLMs using Zod schemas and recursive retries. A practical guide to getting dependable, schema-validated structured outputs from any language model.
Structured output makes an LLM much easier to build on: instead of scraping free-form text, your application gets data it can consume directly. OpenAI, for example, already supports this natively.
But getting structured output from a model that doesn't support it can be tricky. Function calling APIs exist for models like GPT-4, but not for every model, and even when they are available, the output isn't always guaranteed to be valid JSON.
Let's build a simple but reliable JSON parser that:
- Uses Zod to validate the structure
- Recursively retries with validation errors as feedback
- Works with local models via Ollama
Setting up
First, let's install our dependencies:
npm install zod async-retry
We'll also need Ollama running locally with llama3.2. If you haven't already:
ollama pull llama3.2
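If you want to sanity-check that Ollama is reachable before wiring up the parser, its API exposes a /api/tags endpoint (on the default local port 11434) that lists the models you've pulled. A minimal check, assuming those defaults:
// Quick sanity check: list locally available models from the Ollama API.
const res = await fetch("http://localhost:11434/api/tags");
const { models } = await res.json();
console.log(models.map((m: { name: string }) => m.name));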
The Parser Implementation
Here's our implementation of a recursive JSON parser that keeps trying until it gets valid JSON that matches our schema:
import { z } from "zod";
import retry from "async-retry";
interface ParserOptions {
maxRetries?: number;
schema: z.ZodSchema;
prompt: string;
}
async function callOllama(prompt: string) {
const response = await fetch("http://localhost:11434/api/generate", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
model: "llama3.2",
prompt: prompt,
stream: false,
}),
});
const data = await response.json();
return data.response;
}
export async function parseWithRetry({
  maxRetries = 3,
  schema,
  prompt,
}: ParserOptions) {
  // Keep the last validation error around so it can be fed back to the model
  let lastError: string | undefined;

  return retry(
    async (bail, attempt) => {
      try {
        // If this isn't the first attempt, add the previous error to the prompt
        const fullPrompt =
          attempt === 1
            ? prompt
            : `${prompt}\n\nPrevious attempt failed with error: ${lastError}. Please fix and try again.`;

        const response = await callOllama(fullPrompt);

        // Extract JSON from the response
        const jsonMatch = response.match(/\{[\s\S]*\}/);
        if (!jsonMatch) {
          throw new Error("No JSON found in response");
        }

        const parsed = JSON.parse(jsonMatch[0]);
        return schema.parse(parsed);
      } catch (error) {
        // Remember the error so the next attempt can include it in the prompt
        lastError = error instanceof Error ? error.message : String(error);

        // async-retry makes maxRetries + 1 attempts in total; bail once they're used up
        if (attempt > maxRetries) {
          bail(error as Error);
          return;
        }
        throw error;
      }
    },
    {
      retries: maxRetries,
      factor: 1,
      minTimeout: 100,
      maxTimeout: 1000,
    }
  );
}
Using the Parser
Let's try it out with a simple movie recommendation schema:
import { z } from "zod";
import { parseWithRetry } from "./parser";
const MovieSchema = z.object({
title: z.string(),
year: z.number(),
rating: z.number().min(0).max(10),
genres: z.array(z.string()),
});
async function main() {
const prompt = `
Give me a movie recommendation in JSON format with the following structure:
- title (string)
- year (number)
- rating (number between 0-10)
- genres (array of strings)
Return only the JSON, no other text.
Here's an example of valid JSON:
${JSON.stringify({
  title: "The Dark Knight",
  year: 2008,
  rating: 9.0,
  genres: ["Action", "Crime", "Drama"],
})}
`;
try {
const result = await parseWithRetry({
schema: MovieSchema,
prompt,
maxRetries: 3,
});
console.log("Parsed result:", result);
} catch (error) {
console.error("Failed after all retries:", error);
}
}
main();
When you run this, you might see something like:
{
  "title": "Inception",
  "year": 2010,
  "rating": 8.8,
  "genres": ["Science Fiction", "Action", "Thriller"]
}
How it Works
- The parser takes a Zod schema, a prompt, and optional retry settings
- It sends the prompt to Ollama's llama3.2 model
- It extracts JSON from the response with a regex
- It validates the parsed JSON against the Zod schema
- If validation fails, it retries with the error message appended to the prompt (see the helper sketch after this list for making that feedback more readable)
- This continues until we get valid JSON or hit the retry limit
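One refinement worth considering: a ZodError's default message is a JSON dump of its issues, which is noisy feedback for a model. A small helper like the hypothetical formatZodError below could be used in the catch block to turn those issues into a compact list before they're appended to the retry prompt:
import { z } from "zod";

// Turn Zod issues into a short, human-readable list the model can act on.
function formatZodError(error: z.ZodError): string {
  return error.issues
    .map((issue) => `- ${issue.path.join(".") || "(root)"}: ${issue.message}`)
    .join("\n");
}

// Inside parseWithRetry's catch block, you could then set:
//   lastError = error instanceof z.ZodError ? formatZodError(error) : String(error);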
Improving the Parser
This approach works best with simpler schemas; complex nested structures may need more retries. In our experience, breaking the schema into smaller chunks and handling each chunk separately works well, as in the sketch below.
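For instance, a movie schema could be split into two smaller requests and merged afterwards. The sub-schemas and parseMovieInChunks below are illustrative names, built on the parseWithRetry function from above:
import { z } from "zod";
import { parseWithRetry } from "./parser";

// Sub-schemas for two smaller, easier-to-satisfy requests.
const DetailsSchema = z.object({
  title: z.string(),
  year: z.number(),
});

const RatingsSchema = z.object({
  rating: z.number().min(0).max(10),
  genres: z.array(z.string()),
});

// Ask for each chunk separately, then merge the validated pieces.
async function parseMovieInChunks() {
  const details = await parseWithRetry({
    schema: DetailsSchema,
    prompt:
      "Recommend a movie. Return only JSON with: title (string), year (number).",
  });

  const ratings = await parseWithRetry({
    schema: RatingsSchema,
    prompt: `For the movie "${details.title}", return only JSON with: rating (number between 0-10), genres (array of strings).`,
  });

  return { ...details, ...ratings };
}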
In other cases, providing few-shot examples of valid and invalid JSON can help the model understand the schema better.
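One way to do that is a small prompt helper. withFewShotExamples below is a hypothetical function, and the good/bad examples are made up for illustration:
// Hypothetical helper: augment a prompt with one valid and one invalid example.
function withFewShotExamples(prompt: string): string {
  const valid = JSON.stringify({
    title: "The Matrix",
    year: 1999,
    rating: 8.7,
    genres: ["Science Fiction", "Action"],
  });
  // Deliberately broken: year is a string, rating and genres are missing.
  const invalid = '{ "title": "The Matrix", "year": "1999" }';

  return [
    prompt,
    "Here is a VALID example:",
    valid,
    "Here is an INVALID example (do not produce output like this):",
    invalid,
  ].join("\n\n");
}

// Usage with the parser from above:
// const result = await parseWithRetry({ schema: MovieSchema, prompt: withFewShotExamples(prompt) });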
If you're using Claude, you can also prefill the start of Claude's response to force it to continue in the correct format:
[
{
"role": "user",
"content": "What is your favorite color? Output only the JSON, no other text."
},
{
"role": "assistant",
"content": "{ \"color\":"
}
]
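If you're calling Claude from Node, a sketch of that prefill with the official @anthropic-ai/sdk client might look like the following (the model name and max_tokens are just example values, and you need to prepend the prefilled text before parsing):
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic();
const prefill = '{ "color":';

const message = await anthropic.messages.create({
  model: "claude-3-opus-20240229",
  max_tokens: 256,
  messages: [
    {
      role: "user",
      content: "What is your favorite color? Output only the JSON, no other text.",
    },
    // Prefilled assistant turn: Claude continues from here instead of adding preamble.
    { role: "assistant", content: prefill },
  ],
});

const block = message.content[0];
const json = JSON.parse(prefill + (block.type === "text" ? block.text : ""));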
To run this example on your own machine, clone the repo.
Structured Outputs in Popular Models
While we've built our own parser above, many popular LLM providers offer native support for structured outputs. Let's look at how different providers handle this:
OpenAI
OpenAI provides built-in JSON mode through the response_format parameter:
const completion = await openai.chat.completions.create({
model: "gpt-4-turbo-preview",
response_format: { type: "json_object" },
messages: [
{
role: "system",
content: "You are a helpful assistant designed to output JSON.",
},
{ role: "user", content: "Who won the world series in 2020?" },
],
});
JSON mode guarantees that the response is syntactically valid JSON, which removes a whole class of parsing failures; note that it does not, by itself, guarantee the JSON matches a particular schema.
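If you're already using Zod (as in the parser above), recent versions of the OpenAI Node SDK also ship a zodResponseFormat helper for Structured Outputs, which enforces a schema server-side. A sketch, with a simplified movie schema (range constraints are omitted here to stay within the subset of JSON Schema that Structured Outputs supports):
import OpenAI from "openai";
import { zodResponseFormat } from "openai/helpers/zod";
import { z } from "zod";

const MovieSchema = z.object({
  title: z.string(),
  year: z.number(),
  rating: z.number(),
  genres: z.array(z.string()),
});

const openai = new OpenAI();

// Structured Outputs constrains generation to the schema and parses the result.
const completion = await openai.beta.chat.completions.parse({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Give me a movie recommendation." }],
  response_format: zodResponseFormat(MovieSchema, "movie"),
});

console.log(completion.choices[0].message.parsed);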
Google Gemini
Gemini supports structured outputs through both JSON schemas and type hints:
import google.generativeai as genai
from typing_extensions import TypedDict
class Recipe(TypedDict):
recipe_name: str
ingredients: list[str]
model = genai.GenerativeModel("gemini-1.5-pro-latest")
result = model.generate_content(
"List a few cookie recipes.",
generation_config=genai.GenerationConfig(
response_mime_type="application/json",
response_schema=list[Recipe]
)
)
Gemini's approach is particularly powerful for Python developers as it integrates well with type hints and allows for complex nested structures.
LiteLLM
LiteLLM provides a unified interface for JSON mode across different providers. It supports client-side validation of JSON schemas:
from litellm import completion
from pydantic import BaseModel
class CalendarEvent(BaseModel):
name: str
date: str
participants: list[str]
response = completion(
model="gpt-4o",
messages=[{"role": "user", "content": "Schedule a team meeting"}],
response_format=CalendarEvent
)
This abstraction allows you to use structured outputs consistently across different LLM providers while maintaining type safety.
Anthropic Claude
Claude takes a different approach, focusing on template-based structured outputs:
prompt = """<report>
<summary>
<metric name="total_revenue">$0.00</metric>
<metric name="units_sold">0</metric>
</summary>
</report>"""
response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    system="Always respond in the exact format provided",
    messages=[{"role": "user", "content": prompt}],
)