Dynamic Tool Attachment for LLM Applications
Tool calling is a feature of many Large Language Models (LLMs) that allows them to interact with external systems through functions provided by the calling application. When an LLM needs to perform an action, such as searching a database or calling an API, it can "call" a tool by specifying the function name and parameters in a structured format.
For example, if a user asks "What's the weather in Paris?", an LLM with access to a weather API tool might generate a response like:
{
  "function": {
    "name": "getWeather",
    "parameters": {
      "city": "Paris"
    }
  }
}
The application can interpret this call and use the weather API to fetch the weather in Paris.
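To make that concrete, here is a minimal sketch of how an application might dispatch such a call. The getWeather handler and dispatchToolCall function are hypothetical names used only for illustration; the call shape matches the JSON example above.

// Minimal sketch: route a tool call to a local handler by function name.
type WeatherToolCall = {
  function: { name: string; parameters: { city: string } };
};

// Hypothetical stand-in for a real weather API client.
const getWeather = async (city: string) => `It is sunny in ${city}.`;

const dispatchToolCall = async (call: WeatherToolCall) => {
  if (call.function.name === "getWeather") {
    return getWeather(call.function.parameters.city);
  }
  throw new Error(`Unknown tool: ${call.function.name}`);
};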
In order for an LLM to invoke a tool, the tool's name, description, and parameters must be provided in the context of the conversation. For applications with a large number of tools, this can lead to a few challenges:
- Increased token usage as all tool descriptions are included in the context
- Higher likelihood of the model choosing the wrong tool
Dynamic Tool Attachment
One approach to combat these issues is to dynamically attach tools to the LLM based on the user's input. For example, we can use semantic search to attach only the tools most relevant to the user's prompt.
Let's build a simple system that uses semantic search to dynamically attach tools to an LLM based on the user's input by:
- Embedding tool names and descriptions into an in-memory vector store using an embedding model
- Embedding the user's prompt using the same embedding model
- Retrieving the 5 most similar tools using a cosine similarity search
- Including the most relevant tools in the model's context with the user's prompt
Setting Up
For this project we will build a small TypeScript application and use Ollama for local chat completion and embeddings.
If you don't have Ollama installed, you can follow the installation instructions to get started.
Downloading Models
Download the following models for local use:
- llama3.2 (3b) for chat completion
- nomic-embed-text for generating embeddings
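Both models can be pulled with the Ollama CLI:
ollama pull llama3.2
ollama pull nomic-embed-text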
Node Dependencies
This project uses a couple of Node dependencies:
- compute-cosine-similarity for computing the cosine similarity of different vectors
- tsx for running TypeScript in Node
npm init
npm install compute-cosine-similarity tsx
Defining Our Tools
We'll start by defining a set of dummy tools for our application. Each tool follows a consistent pattern with a name, description, and parameters:
export const ALL_TOOLS = [
  {
    type: "function",
    function: {
      name: "findCat",
      description: "Find the cat with the ID provided",
      parameters: {
        type: "object",
        properties: {
          id: { type: "string" },
        },
        required: ["id"],
      },
    },
  },
  // ... more tools following the same pattern
];
You can find a pre-defined set of tool schemas in the accompanying project repository.
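For illustration (this one is not part of the repository's set), the getWeather tool from the introduction could be described with the same schema shape:

// Hypothetical tool schema following the same pattern as findCat above.
export const WEATHER_TOOL = {
  type: "function",
  function: {
    name: "getWeather",
    description: "Get the current weather for the city provided",
    parameters: {
      type: "object",
      properties: {
        city: { type: "string" },
      },
      required: ["city"],
    },
  },
};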
Chat Completion
We will use llama3.2 (3b) for chat completion by calling the Ollama chat endpoint:
type Message = {
  role: "user" | "assistant";
  content: string;
};

type Tool = {
  type: string;
  function: {
    name: string;
    description: string;
    parameters: unknown;
  };
};

const ollamaChat = async (messages: Message[], tools: Tool[]) => {
  const response = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama3.2",
      messages,
      tools,
      stream: false,
    }),
  });
  const data = await response.json();
  return data.message;
};
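As a quick sanity check, you can call ollamaChat directly. This is a sketch that assumes Ollama is running locally and that ALL_TOOLS from earlier is in scope; the returned message contains content and, when the model decides to call a tool, tool_calls.

const reply = await ollamaChat(
  [{ role: "user", content: "Find the cat with ID 42" }],
  ALL_TOOLS
);
// Returned message has `content` and, if a tool was chosen, `tool_calls`.
console.log(reply.content, reply.tool_calls);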
Computing Embeddings
We will use nomic-embed-text to generate vector embeddings by calling the Ollama embed endpoint:
const ollamaEmbed = async (input: string) => {
  const response = await fetch("http://localhost:11434/api/embed", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "nomic-embed-text",
      input,
    }),
  });
  const data = await response.json();
  return data.embeddings[0];
};
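The embed endpoint returns one embedding per input, so ollamaEmbed resolves to a single numeric vector (typically 768 dimensions for nomic-embed-text). A quick check:

// Example usage: the result is a plain array of numbers.
const vector = await ollamaEmbed("find the cat with ID 42");
console.log(vector.length); // e.g. 768 for nomic-embed-text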
Embedding Tools
Let's create a function to compute embeddings for all tools and return an array of objects with both the tool schema and the embedding. We will embed the tool name and description.
const computeToolEmbeddings = async (tools: Tool[]) =>
  Promise.all(
    tools.map(async (tool) => {
      const embedding = await ollamaEmbed(
        `${tool.function.name}: ${tool.function.description}`
      );
      return {
        tool,
        embedding,
      };
    })
  );
While this in-memory data structure works for the purposes of this example, in a real-world application you would want to store embeddings in a database such as Postgres with pgvector to avoid re-computing them on every run.
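For local experiments, a lighter-weight option is to cache the computed embeddings on disk so repeated runs skip the embedding step. A rough sketch, where "tool-embeddings.json" is an arbitrary file name and not part of the original project:

import { existsSync, readFileSync, writeFileSync } from "node:fs";

// Rough sketch: persist tool embeddings in a JSON file so they are only computed once.
const loadOrComputeToolEmbeddings = async (tools: Tool[]) => {
  const cachePath = "tool-embeddings.json"; // hypothetical cache location
  if (existsSync(cachePath)) {
    return JSON.parse(readFileSync(cachePath, "utf8"));
  }
  const embeddings = await computeToolEmbeddings(tools);
  writeFileSync(cachePath, JSON.stringify(embeddings));
  return embeddings;
};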
Tool Search Implementation
Let's implement a simple semantic search function using the compute-cosine-similarity package.
import similarity from "compute-cosine-similarity";

type EmbededTool = {
  tool: Tool;
  embedding: number[];
};

const searchTools = async (input: string, embeddings: EmbededTool[]) => {
  const messageEmbedding = await ollamaEmbed(input);
  return embeddings
    .map((embedding) => ({
      tool: embedding.tool,
      similarity: similarity(embedding.embedding, messageEmbedding),
    }))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, 5);
};
This function:
- Embeds the user's input
- Compares the user's embedding with all tool embeddings
- Returns the top 5 most relevant tools based on the similarity score (higher is more relevant)
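The similarity function itself is the default export of compute-cosine-similarity; it takes two equal-length vectors and returns a score between -1 and 1. A quick illustration:

import similarity from "compute-cosine-similarity";

// Identical vectors score 1; orthogonal vectors score 0.
console.log(similarity([1, 2, 3], [1, 2, 3])); // 1
console.log(similarity([1, 0], [0, 1])); // 0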
Putting It All Together
The last part of our project is a main function that:
- Embeds all tools using computeToolEmbeddings
- Prompts the user for input using readline
- Uses searchTools to compare the user's input with the tool embeddings
- Calls ollamaChat with the user's input and the attached tools
import readline from "node:readline";
// Path to the tool definitions shown earlier; it may differ in the repository.
import { ALL_TOOLS } from "./tools";

const main = async () => {
  console.log("Embedding tools...");
  const embeddings = await computeToolEmbeddings(ALL_TOOLS);
  console.log("Tools embedded.");

  const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout,
  });

  console.log("Enter your message:");
  const messages: Message[] = [];

  rl.on("line", async (input) => {
    const attachedTools = await searchTools(input, embeddings);
    // Log the attached tools (matches the sample output below).
    attachedTools.forEach((match) =>
      console.log("Attaching tool:", {
        tool: match.tool.function.name,
        similarity: match.similarity,
      })
    );

    messages.push({ role: "user", content: input });
    const response = await ollamaChat(
      messages,
      attachedTools.map(({ tool }) => tool)
    );
    messages.push({ role: "assistant", content: response.content });
    console.log("Tools called:", response.content, response.tool_calls);
  });
};

main();
We can now run the project with tsx index.ts and test it out.
If we use the prompt find tool with ID 123, our searchTools function will return the 5 most relevant tools. We can see that findTool has the highest similarity score.
Enter your message:
find tool with ID 123
Attaching tool: { tool: 'findTool', similarity: 0.825733057876305 }
Attaching tool: { tool: 'findToy', similarity: 0.6924025554970754 }
Attaching tool: { tool: 'findCar', similarity: 0.6859769197248136 }
Attaching tool: { tool: 'findBook', similarity: 0.6778616325470412 }
Attaching tool: { tool: 'findSong', similarity: 0.6710841959852032 }
Tools called: [ { function: { name: 'findTool', arguments: [Object] } } ]
We can also prompt with similar terms, such as find hammer with ID 123.
Even though no tool explicitly matches the term hammer, our similarity search function still returns findTool as the most relevant result.
Enter your message:
find hammer with ID 123
Attaching tool: { tool: 'findTool', similarity: 0.5955042990419085 }
Attaching tool: { tool: 'findToy', similarity: 0.5411030864968827 }
Attaching tool: { tool: 'findMovie', similarity: 0.5219368182792513 }
Attaching tool: { tool: 'findSong', similarity: 0.5155495695032712 }
Attaching tool: { tool: 'findCar', similarity: 0.5132604044773412 }
Tools called: [ { function: { name: 'findTool', arguments: [Object] } } ]
Source Code
Source code for this post is available in the accompanying inferablehq/ollama-dynamic-tools GitHub project.