Dynamic Tool Attachment for LLM Applications

John Smith

Tool calling is a feature of many Large Language Models (LLMs) that allows them to interact with external systems through functions provided by the calling application. When an LLM needs to perform an action, such as searching a database or calling an API, it can "call" a tool by specifying the function name and parameters in a structured format.

For example, if a user asks "What's the weather in Paris?", an LLM with access to a weather API tool might generate a response like:

{
  "function": {
    "name": "getWeather",
    "parameters": {
      "city": "Paris"
    }
  }
}

The application can interpret this call and use the weather API to fetch the weather in Paris.
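
As a rough sketch, the application-side dispatch might look something like the following (the toolHandlers map and handleToolCall helper are illustrative names, not part of any specific SDK):

// Hypothetical dispatcher: map tool-call names to local handler functions.
const toolHandlers: Record<string, (args: any) => Promise<string>> = {
  getWeather: async ({ city }) => {
    // A real implementation would call the weather API here.
    return `It is currently cloudy in ${city}.`;
  },
};

const handleToolCall = async (call: {
  function: { name: string; parameters: Record<string, unknown> };
}) => {
  const handler = toolHandlers[call.function.name];
  if (!handler) throw new Error(`Unknown tool: ${call.function.name}`);
  return handler(call.function.parameters);
};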

In order for an LLM to invoke a tool, the tool's name, description, and parameters must be provided in the context of the conversation. For applications with a large number of tools, this can lead to a few challenges:

  1. Increased token usage as all tool descriptions are included in the context
  2. Higher likelihood of the model choosing the wrong tool

Dynamic Tool Attachment

One approach to combating these issues is to dynamically attach tools to the LLM based on the user's input. For example, we can use semantic search to attach only the tools most relevant to a given prompt.

Let's build a simple system that uses semantic search to dynamically attach tools to an LLM based on the user's input by:

  1. Embedding tool names and descriptions into an in-memory vector store using an embedding model
  2. Embedding the user's prompt using the same embedding model
  3. Retrieving the 5 most similar tools using a cosine similarity search
  4. Including the most relevant tools in the model's context with the user's prompt

Setting Up

For this project we will build a small TypeScript application and use Ollama for local chat completion and embedding.

If you don't have Ollama installed, you can follow the installation instructions to get started.

Downloading Models

Download the following models for local use:
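
ollama pull llama3.2
ollama pull nomic-embed-text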

Node Dependencies

This project uses two Node dependencies: compute-cosine-similarity for the similarity search and tsx for running TypeScript directly:

npm init
npm install compute-cosine-similarity tsx

Defining Our Tools

We'll start by defining a set of dummy tools for our application. Each tool follows a consistent pattern with a name, description, and parameters:

export const ALL_TOOLS = [
  {
    type: "function",
    function: {
      name: "findCat",
      description: "Find the cat with the ID provided",
      parameters: {
        type: "object",
        properties: {
          id: { type: "string" },
        },
        required: ["id"],
      },
    },
  },
  // ... additional tool definitions ...
]

You can find a pre-defined set of tool schemas in the accompanying project repository.

Chat Completion

We will use llama3.2 (3b) for chat completion by calling the Ollama chat endpoint:

type Message = {
  role: "user" | "assistant";
  content: string;
};

type Tool = {
  type: string;
  function: {
    name: string;
    description: string;
    parameters: unknown;
  };
};

const ollamaChat = async (messages: Message[], tools: Tool[]) => {
  const response = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3.2",
      messages,
      tools,
      stream: false,
    }),
  });

  const data = await response.json();
  return data.message;
};
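
As a quick, hypothetical usage example, we could offer the model just the findCat tool defined earlier and inspect the returned message; when the model decides to call a tool, the message's tool_calls array contains the requested function name and arguments:

const reply = await ollamaChat(
  [{ role: "user", content: "Find the cat with ID abc-123" }],
  ALL_TOOLS.filter((tool) => tool.function.name === "findCat")
);

// Expected shape when a tool is called: content is often empty and tool_calls
// holds entries like { function: { name: "findCat", arguments: { id: "abc-123" } } }.
console.log(reply.content, reply.tool_calls);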

Computing Embeddings

We will use nomic-embed-text to generate vector embeddings by calling the Ollama embed endpoint:

const ollamaEmbed = async (input: string) => {
  const response = await fetch("http://localhost:11434/api/embed", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "nomic-embed-text",
      input,
    }),
  });

  const data = await response.json();
  return data.embeddings[0];
};

Embedding Tools

Let's create a function to compute embeddings for all tools and return an array of objects with both the tool schema and the embedding. We will embed the tool name and description.

const computeToolEmbeddings = async (tools: Tool[]) =>
  Promise.all(
    tools.map(async (tool) => {
      const embedding = await ollamaEmbed(
        `${tool.function.name}: ${tool.function.description}`
      );

      return {
        tool,
        embedding,
      };
    })
  );

While this in-memory data structure works for the purposes of this example, in a real-world application you would want to store the embeddings in a database such as Postgres with the pgvector extension so they don't have to be re-computed on every run.
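
As a minimal sketch of that approach (the tool_embeddings table, its columns, and the 768-dimension vector column for nomic-embed-text are assumptions for illustration):

import { Client } from "pg";

// Assumed schema:
//   CREATE TABLE tool_embeddings (name text PRIMARY KEY, tool jsonb, embedding vector(768));
const client = new Client({ connectionString: process.env.DATABASE_URL });

const storeToolEmbedding = async (tool: Tool, embedding: number[]) => {
  await client.query(
    `INSERT INTO tool_embeddings (name, tool, embedding)
     VALUES ($1, $2, $3)
     ON CONFLICT (name) DO UPDATE SET tool = EXCLUDED.tool, embedding = EXCLUDED.embedding`,
    // pgvector accepts vectors as '[0.1, 0.2, ...]' strings, which JSON.stringify produces.
    [tool.function.name, JSON.stringify(tool), JSON.stringify(embedding)]
  );
};

const searchToolsInDb = async (queryEmbedding: number[], limit = 5) => {
  // pgvector's <=> operator is cosine distance (lower is closer), so 1 - distance is similarity.
  const { rows } = await client.query(
    `SELECT tool, 1 - (embedding <=> $1::vector) AS similarity
     FROM tool_embeddings
     ORDER BY embedding <=> $1::vector
     LIMIT $2`,
    [JSON.stringify(queryEmbedding), limit]
  );
  return rows;
};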

Tool Search Implementation

Let's implement a simple semantic search function using the compute-cosine-similarity package.

import similarity from "compute-cosine-similarity";

type EmbeddedTool = {
  tool: Tool;
  embedding: number[];
};

const searchTools = async (input: string, embeddings: EmbeddedTool[]) => {
  const messageEmbedding = await ollamaEmbed(input);

  return embeddings
    .map((embedding) => ({
      tool: embedding.tool,
      similarity: similarity(embedding.embedding, messageEmbedding),
    }))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, 5);
};

This function:

  1. Embeds the user's input
  2. Compares the user's embedding with all tool embeddings
  3. Returns the top 5 most relevant tools based on the similarity score (higher is more relevant)
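
For example, we might call it directly like this (assuming the dummy tool set includes a findCat tool):

const embeddings = await computeToolEmbeddings(ALL_TOOLS);
const results = await searchTools("where is my cat with ID 42?", embeddings);

// Each result pairs a tool schema with its similarity score (higher is more relevant).
console.log(
  results.map(({ tool, similarity }) => ({ name: tool.function.name, similarity }))
);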

Putting It All Together

The last part of our project is a main function that:

  • Embeds all tools using computeToolEmbeddings
  • Prompts the user for input using readline
  • Uses searchTools to compare the user's input with the tool embeddings
  • Calls ollamaChat with the user's input and the attached tools

import readline from "node:readline";

const main = async () => {
  console.log("Embedding tools...");
  const embeddings = await computeToolEmbeddings(ALL_TOOLS);
  console.log("Tools embedded.");

  const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout,
  });

  console.log("Enter your message:");
  let messages: Message[] = [];
  rl.on("line", async (input) => {
    const attachedTools = await searchTools(input, embeddings);

    messages.push({ role: "user", content: input });
    const response = await ollamaChat(messages, attachedTools);
    messages.push({ role: "assistant", content: response.content });

    console.log("Tools called:", response.content, response.tool_calls);
  });
};

main();

We can now run the project with npx tsx index.ts and test it out:

If we use the prompt find tool with ID 123, our searchTools function will return the 5 most relevant tools. We can see that findTool has the highest similarity score.

Enter your message:
find tool with ID 123

Attaching tool: { tool: 'findTool', similarity: 0.825733057876305 }
Attaching tool: { tool: 'findToy', similarity: 0.6924025554970754 }
Attaching tool: { tool: 'findCar', similarity: 0.6859769197248136 }
Attaching tool: { tool: 'findBook', similarity: 0.6778616325470412 }
Attaching tool: { tool: 'findSong', similarity: 0.6710841959852032 }

Tools called:  [ { function: { name: 'findTool', arguments: [Object] } } ]

We can also prompt with similar terms, such as find hammer with ID 123. Even though no tool explicitly matches the term hammer, our similarity search still returns findTool as the most relevant result.

Enter your message:
find hammer with ID 123

Attaching tool: { tool: 'findTool', similarity: 0.5955042990419085 }
Attaching tool: { tool: 'findToy', similarity: 0.5411030864968827 }
Attaching tool: { tool: 'findMovie', similarity: 0.5219368182792513 }
Attaching tool: { tool: 'findSong', similarity: 0.5155495695032712 }
Attaching tool: { tool: 'findCar', similarity: 0.5132604044773412 }

Tools called:  [ { function: { name: 'findTool', arguments: [Object] } } ]

Source Code

Source code for this post is available in the accompanying inferablehq/ollama-dynamic-tools GitHub project.