You don't need tool calling
Implement your own tool calling with LLM structured outputs for fun and profit
OpenAI, Anthropic, and others have built specialized APIs for tool calling. We've always thought it's a strange abstraction for the LLM provider to own, because things become much simpler if you can treat the LLM as a text-in, text-out function.
Now that structured outputs are a thing, function calling is trivial to implement yourself, and doing so gives you full control over the control flow. This is especially pertinent when building agentic workflows that need some kind of advanced control flow.
The Basic Pattern
The core idea is simple - instead of using specialized APIs, just ask your LLM to return a JSON object with:
- The function name to execute
- The arguments for that function
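For example, given a prompt like "What's the weather in Tokyo?", you'd expect the model to come back with something along these lines, using the searchWeather function we'll define below (the exact shape is whatever you specify in the prompt):
{
  "name": "searchWeather",
  "args": {
    "location": "Tokyo"
  }
}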
Let's see how this works with TypeScript and the OpenAI SDK:
import OpenAI from 'openai';

// First, define your function types
type ToolFunction = {
  name: string;
  args: Record<string, any>;
}

// Define your available tools
const tools = {
  searchWeather: async (location: string) => {
    // Implement weather search
    return `Weather data for ${location}`;
  },
  sendEmail: async (to: string, subject: string, body: string) => {
    // Implement email sending
    return `Email sent to ${to}`;
  }
} as const;

// Create the tool caller
async function executeToolCall(
  prompt: string,
  availableTools: Record<string, Function>
): Promise<string> {
  const openai = new OpenAI({
    apiKey: process.env.OPENAI_API_KEY,
  });

  const response = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [
      {
        role: "system",
        content: `You are a helpful assistant that responds with JSON objects representing function calls.
Available functions:
- searchWeather(location: string)
- sendEmail(to: string, subject: string, body: string)
Respond ONLY with a JSON object in this format:
{
  "name": "functionName",
  "args": {
    "argName": "argValue"
  }
}`
      },
      {
        role: "user",
        content: prompt
      }
    ]
  });

  const toolCall = JSON.parse(response.choices[0].message.content!) as ToolFunction;

  // Execute the tool
  const tool = availableTools[toolCall.name];
  if (!tool) {
    throw new Error(`Unknown tool: ${toolCall.name}`);
  }

  return await tool(...Object.values(toolCall.args));
}
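Calling it is just a function call; a minimal usage sketch with the stub tools above:
// Somewhere inside an async context:
const result = await executeToolCall(
  "What's the weather like in Tokyo right now?",
  tools
);
console.log(result); // e.g. "Weather data for Tokyo"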
This works, but it's not very robust. If the LLM returns a function name that doesn't exist, we throw an error. Or it might return a function name that does exist, but with invalid arguments.
Trying to prompt engineer your way out of this is a pain, but luckily there's a better way.
Adding Type Safety with Zod
The second version adds Zod schemas for validation and proper TypeScript types:
import OpenAI from 'openai';
import { z } from 'zod';

// Define Zod schemas for our tool inputs
const WeatherInput = z.object({
  location: z.string().min(1)
});

const EmailInput = z.object({
  to: z.string().email(),
  subject: z.string().min(1),
  body: z.string().min(1)
});

// Create a union of all possible tool schemas
const ToolSchemas = {
  searchWeather: WeatherInput,
  sendEmail: EmailInput
} as const;

// Infer the tool input types from the schemas
type ToolInputs = {
  [K in keyof typeof ToolSchemas]: z.infer<typeof ToolSchemas[K]>
};

// Define the tools with their implementations
const tools = {
  searchWeather: async (args: ToolInputs['searchWeather']) => {
    return `Weather data for ${args.location}`;
  },
  sendEmail: async (args: ToolInputs['sendEmail']) => {
    return `Email sent to ${args.to}`;
  }
} as const;

// Enhanced tool caller with Zod validation
async function executeToolCall(
  prompt: string,
  availableTools: typeof tools
): Promise<string> {
  const openai = new OpenAI({
    apiKey: process.env.OPENAI_API_KEY,
  });

  const response = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [
      {
        role: "system",
        content: `You are a helpful assistant that responds with JSON objects representing function calls.
Available functions:
- searchWeather(location: string)
- sendEmail(to: string, subject: string, body: string)
Respond ONLY with a JSON object in this format:
{
  "name": "functionName",
  "args": {
    "argName": "argValue"
  }
}`
      },
      {
        role: "user",
        content: prompt
      }
    ]
  });

  const toolCall = parseToolCall(response.choices[0].message.content!);
  const schema = ToolSchemas[toolCall.name];

  try {
    const validatedArgs = schema.parse(toolCall.args);
    return await availableTools[toolCall.name](validatedArgs);
  } catch (error) {
    if (error instanceof z.ZodError) {
      throw new Error(`Invalid args: ${formatZodError(error)}`);
    }
    throw error;
  }
}

// Validate the overall shape of the tool call returned by the LLM
const ToolCallSchema = z.object({
  name: z.enum(['searchWeather', 'sendEmail']),
  args: z.record(z.unknown())
});

function parseToolCall(jsonString: string): z.infer<typeof ToolCallSchema> {
  const parsed = JSON.parse(jsonString);
  return ToolCallSchema.parse(parsed);
}

function formatZodError(error: z.ZodError): string {
  return error.issues.map(issue =>
    `${issue.path.join('.')}: ${issue.message}`
  ).join(', ');
}
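Now a malformed call fails loudly instead of silently running with garbage arguments. A hypothetical failure case, assuming the model invents an invalid email address:
try {
  // Suppose the model returns { name: "sendEmail", args: { to: "not-an-email", ... } }
  await executeToolCall("Email bob about the weather", tools);
} catch (error) {
  // Something like: "Invalid args: to: Invalid email"
  console.error((error as Error).message);
}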
This is a bit better, but given that the LLM will be wrong a non-zero percentage of the time, our code will fail proportionally. One way out of this is to add a retry mechanism that helps the LLM out by feeding back some context about what went wrong.
Adding Self-Healing with Recursive Retries
type ValidationError = {
  field: string;
  message: string;
};

interface ToolCallError {
  type: 'validation' | 'unknown';
  errors?: ValidationError[];
  message?: string;
}

async function executeToolCallWithRetries(
  prompt: string,
  availableTools: typeof tools,
  maxRetries: number = 3,
  attempt: number = 1,
  previousError?: ToolCallError
): Promise<string> {
  const openai = new OpenAI({
    apiKey: process.env.OPENAI_API_KEY,
  });

  // Build system message with error context if this is a retry
  let systemMessage = `You are a helpful assistant that responds with JSON objects representing function calls.
Available functions:
- searchWeather(location: string)
- sendEmail(to: string, subject: string, body: string)
Respond ONLY with a JSON object in this format:
{
  "name": "functionName",
  "args": {
    "argName": "argValue"
  }
}`;

  if (previousError?.type === 'validation') {
    systemMessage += `\n\nYour previous attempt failed validation:
${previousError.errors?.map(e => `- ${e.field}: ${e.message}`).join('\n')}
Please try again with valid arguments.`;
  }

  const response = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [
      { role: "system", content: systemMessage },
      { role: "user", content: prompt }
    ]
  });

  try {
    const toolCall = parseToolCall(response.choices[0].message.content!);
    const schema = ToolSchemas[toolCall.name];
    const validatedArgs = schema.parse(toolCall.args);
    return await availableTools[toolCall.name](validatedArgs);
  } catch (error) {
    if (attempt >= maxRetries) {
      throw new Error(`Failed after ${maxRetries} attempts: ${(error as Error).message}`);
    }

    if (error instanceof z.ZodError) {
      const validationError: ToolCallError = {
        type: 'validation',
        errors: error.issues.map(issue => ({
          field: issue.path.join('.'),
          message: issue.message
        }))
      };

      // Recursive retry with error context
      return executeToolCallWithRetries(
        prompt,
        availableTools,
        maxRetries,
        attempt + 1,
        validationError
      );
    }

    throw error;
  }
}
// Example usage
async function main() {
  try {
    // This might fail first time but self-correct
    const result = await executeToolCallWithRetries(
      "Send an email to invalid-email about the weather",
      tools
    );
    console.log(result);
  } catch (error) {
    console.error('All retries failed:', (error as Error).message);
  }
}
With this approach, we've added a retry mechanism that helps the LLM out by providing some context about what went wrong. This will get you a lot of the way there, but there's still a lot of room for improvement.
For example:
- You need to stop the LLM from trying to be too helpful. The latest version of Anthropic's Claude Sonnet will randomly insert "<UNKNOWN>" for required string arguments when it can't find a good value for a mandatory string. You need to guard against this, and other nuances across different LLMs (see the sketch after this list).
- You need to tell the LLM that it's ok not to do anything if there's no good tool to use. This is a bit of prompt engineering, and an eval hole that you need to fill.
- You need to guard against the LLM hallucinating a tool that doesn't exist. You won't hit this problem as much if you use the built-in tool calling APIs, since the LLM providers validate the calls for you (and have post-trained their models to avoid it). But in our experience, yes - it definitely makes tools up out of thin air.
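A rough sketch of guarding against the first two, building on the Zod setup above. The placeholder list and the noOp tool are illustrative, not exhaustive:
import { z } from 'zod';

// Values models sometimes invent when they can't find real data for a required string
const isPlaceholder = (value: string) =>
  ['<UNKNOWN>', 'unknown', 'N/A', ''].includes(value.trim());

// A stricter version of the email schema from above
const EmailInput = z.object({
  to: z.string().email().refine((v) => !isPlaceholder(v), {
    message: 'Looks like a placeholder, not a real value'
  }),
  subject: z.string().min(1).refine((v) => !isPlaceholder(v), {
    message: 'Looks like a placeholder, not a real value'
  }),
  body: z.string().min(1)
});

// An explicit escape hatch so the model isn't forced into a bad tool call
const NoOpInput = z.object({
  reason: z.string().min(1)
});

const noOp = async (args: z.infer<typeof NoOpInput>) => {
  return `No suitable tool: ${args.reason}`;
};
The third problem is already partially covered above: the z.enum in ToolCallSchema rejects tool names that aren't in the allow-list, and the retry loop feeds that failure back to the model.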
Protection against all of these problems is something that you get for free with Inferable Functions. Our tool calling abstraction is open source, and you can use it to build your own tool calling system outside of the LLM provider's tool calling API.
Tool calling is great, but getting the right tool at the right time (dynamically) makes it a lot better. This matters because, especially with large sets of attached tools, the LLM gets easily confused about which tool to use. Attention is inversely proportional to the number of tools (the size of the utilized context).
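A rough sketch of one way to do that, reusing the OpenAI client from above: run a cheap pre-selection pass that shortlists tools before the real tool-calling prompt. selectRelevantTools and its prompt are our own illustration, not an Inferable API:
// Shortlist tools so the main call only sees a handful of relevant definitions
async function selectRelevantTools(
  openai: OpenAI,
  prompt: string,
  toolNames: string[]
): Promise<string[]> {
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini", // a smaller, cheaper model is fine for routing
    messages: [
      {
        role: "system",
        content: `Respond ONLY with a JSON array of tool names (from: ${toolNames.join(', ')}) that could plausibly help with the user's request. Respond with [] if none apply.`
      },
      { role: "user", content: prompt }
    ]
  });

  const selected = JSON.parse(response.choices[0].message.content!) as string[];

  // Never trust the model's output: keep only names we actually have
  return selected.filter((name) => toolNames.includes(name));
}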
We wrote about this in more detail in our earlier post on dynamic tool calling.