Functions as AI Agents


TL;DR: We explore how treating functions as AI agents with fuzzy contracts could enable more flexible and powerful AI systems, while maintaining the benefits of functional programming principles like composition and purity.

When we started building Inferable, we began with a deceptively simple idea: Can a function - a ubiquitous programming construct - serve as its own AI agent?

This concept has become almost commonplace now through the function calling (or tool calling) that foundation models have popularized. However, we believe there's an elegant way to reason about how AI agents should evolve.

Strict Contracts vs. Fuzzy Contracts

Before we delve deeper, let's address the why. Why do we even need to explore this?

Usually, function inputs are (mostly) static. You can't simply provide Gary Oldman to f send_email(address). (Well, you can - but it won't send the email to Gary Oldman.)

To make this work properly in code, you'd typically call f get_email_by_name(name) and then pipe that result to send_email(). However, in real-life scenarios, you'd need to handle multiple results for "Gary Oldman" in the database (among other complexities). Due to the inability to make deterministic guarantees about finding the right result, you might throw an exception if you don't find an exact match for the name, or, as is often the case, defer to human intervention.
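A minimal sketch of that deterministic pipeline, using hypothetical getEmailByName and sendEmail helpers and a toy in-memory contact list (all names here are illustrative, not part of any real API):

type Contact = { name: string; email: string };

// Toy data: the same name can map to multiple records.
const contacts: Contact[] = [
  { name: "Gary Oldman", email: "gary@example.com" },
  { name: "Gary Oldman", email: "g.oldman@example.com" },
];

async function sendEmail(address: string, body: string): Promise<void> {
  console.log(`Sending to ${address}: ${body}`);
}

async function getEmailByName(name: string): Promise<string[]> {
  return contacts.filter((c) => c.name === name).map((c) => c.email);
}

async function sendEmailToPerson(name: string, body: string): Promise<void> {
  const matches = await getEmailByName(name);
  if (matches.length !== 1) {
    // No deterministic way to pick the right address:
    // throw, or defer to a human.
    throw new Error(`Found ${matches.length} matches for "${name}"`);
  }
  await sendEmail(matches[0], body);
}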

Now, imagine if the f send_email contract were fuzzy. This would enable some interesting possibilities:

  • Send an email to Gary Oldman, and if you can't find the email, ask a human for help.
  • Send an email to Gary Oldman's most recent email address.
  • Send an email to User ID #3194.

By using the function as the primitive, our fuzzy contract becomes a probabilistic container for a deterministic core. The code can ensure determinism, but the fuzzing is context-dependent, malleable, and execution-context aware.
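As a sketch of what that fuzzy contract might look like in practice (the tool-definition shape and the ask_human escape hatch are assumptions, not a specific API), the deterministic functions become strictly typed tools, and the fuzzy part is just the natural-language instruction handed to the model:

// Strict, deterministic tools behind a fuzzy natural-language boundary.
const tools = [
  {
    name: "get_email_by_name",
    description: "Look up all email addresses on record for a person's name",
    inputSchema: { type: "object", properties: { name: { type: "string" } }, required: ["name"] },
  },
  {
    name: "send_email",
    description: "Send an email to a specific, already-resolved address",
    inputSchema: {
      type: "object",
      properties: { address: { type: "string" }, body: { type: "string" } },
      required: ["address", "body"],
    },
  },
  {
    name: "ask_human",
    description: "Ask a human to disambiguate when a lookup is ambiguous",
    inputSchema: { type: "object", properties: { question: { type: "string" } }, required: ["question"] },
  },
];

// The fuzzy contract lives in the instruction; the deterministic core stays in code.
const request = "Send an email to Gary Oldman, and if you can't find the email, ask a human for help.";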

Functional Composition as Agent Composition

Just like you'd compose primitive functions together to create an aggregate result, what if you could compose AI agents? To do this, you have to model an agent as a unit that accepts an input and then produces an output.

The thing that makes functional composition useful in a programming context is the fact that you can reason about the output schema of the result. However, LLMs are probabilistic, and so are AI agents.

So there's our first problem: How do you get AI agents to produce structured, deterministic results?

Suppose that we can reliably and deterministically compose AI agents together. This then allows an agent to be a primitive unit of execution in a chain of agents. Higher-order agents have context about lower-order agents, and lower-order agents encapsulate the minutiae that higher-order agents don't need to know about.
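One way to make that concrete is to model an agent as a typed async function whose output is validated against a schema before it crosses an agent boundary. The sketch below uses zod for validation; callModel and sendEmail are stand-ins for an LLM client and the deterministic core from earlier:

import { z } from "zod";

// Stand-ins for an LLM client and the deterministic send_email core.
const callModel = async (prompt: string): Promise<string> =>
  JSON.stringify({ address: "gary@example.com" });
const sendEmail = async (address: string, body: string): Promise<void> => {};

// An agent is a typed async function: structured input in, schema-validated output out.
type Agent<I, O> = (input: I) => Promise<O>;

const EmailLookup = z.object({ address: z.string().email() });

// Lower-order agent: resolves a name to an address, rejecting malformed output.
const findEmail: Agent<{ name: string }, z.infer<typeof EmailLookup>> = async ({ name }) => {
  const raw = await callModel(`Find the email address for ${name}`);
  return EmailLookup.parse(JSON.parse(raw));
};

// Higher-order agent composed from the lower-order one.
const sendEmailAgent: Agent<{ name: string; body: string }, { sent: boolean }> = async ({ name, body }) => {
  const { address } = await findEmail({ name });
  await sendEmail(address, body);
  return { sent: true };
};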

Composing agents this way gives us two advantages:

  1. Just like the execution context of a function is locally scoped, our AI agents have smaller, well-defined contexts. Apart from the performance and cost benefits, smaller context windows also improve the evals significantly.

  2. Much like how FaaS platforms allow individual functions to execute in different runtimes, as a unit of execution, a "functional agent" can choose to execute on the best possible LLM for the job at hand.

graph TD
    A[Supervisor] --> B[Agent 1]
    A --> C[Agent 2]
    B --> E[Lower Order Agent 1.1]
    B --> F[Lower Order Agent 1.2]
    C --> G[Lower Order Agent 2.1]
    E --> J[Execution Context 1.1]
    F --> K[Execution Context 1.2]
    G --> L[Execution Context 2.1]

Pure Functions as Agents

💡 Pure functions always produce the same output for the same input. While LLMs are probabilistic, we can explore ways to make AI agents more deterministic.

Pure functions in the programming context are devoid of side effects. A side-effect-free function will produce the same output given the same input. However, we can't make the same case for our AI agent with fuzzy contracts.

Firstly, LLMs provide probabilistic results. The very act of calling an LLM over a network induces a side effect. Therefore, using LLMs as a programming primitive that produces a deterministic result is challenging.

However, imagine that an agent can express that its internals are pure. That is, the agent knows it's evaluating a pure function. Therefore, the ~same prompt (p) on the same pure function should produce the same result. (Note: I'm not saying that it does, merely that, at a temperature of 0, it would be helpful if it did.)

In theory, after the first call, we should be able to completely bypass the remaining n-1 model calls and retrieve the result from a cache keyed on a hash of the prompt + function definition.
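A minimal sketch of that cache, assuming the prompt and a serialisable function definition are both available to hash (callModel stands in for whatever LLM client is in use):

import { createHash } from "node:crypto";

// Cache keyed on a hash of the prompt plus the function definition.
const cache = new Map<string, string>();

function cacheKey(prompt: string, functionDefinition: string): string {
  return createHash("sha256").update(prompt).update(functionDefinition).digest("hex");
}

async function evaluate(
  prompt: string,
  functionDefinition: string,
  callModel: (p: string) => Promise<string>,
): Promise<string> {
  const key = cacheKey(prompt, functionDefinition);
  const hit = cache.get(key);
  if (hit !== undefined) return hit; // bypass the model call entirely
  const result = await callModel(prompt);
  cache.set(key, result);
  return result;
}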

However, to get this right, we have to solve four other problems:

  1. Structured prompts: Some structured DSL (domain-specific language) that maximises the chance of producing the same prompt for the same intent. For example:
{
  "state": { "name": "Gary Oldman" },
  "action": "Find email"
}
  2. Distributed caching: If agents are distributed, then the caching must be as well. (A mostly solved problem.)

  3. Language-serialisable function constructs: To determine the cache key for f, we should be able to generate the most specific cache key from f, its input schema, and its output schema.

  4. Some way for the function to signal deterministically that it's a pure function (a sketch combining this and the previous point follows this list).
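A sketch of what points 3 and 4 might look like together: a hypothetical function registration that is fully serialisable (name plus input and output schemas) and carries an explicit purity flag, so the cache key from the earlier sketch can be derived from the definition alone. The field names are illustrative:

// Hypothetical registration shape.
const formatName = {
  name: "format_name",
  pure: true, // the function promises it has no side effects
  inputSchema: {
    type: "object",
    properties: { first: { type: "string" }, last: { type: "string" } },
  },
  outputSchema: {
    type: "object",
    properties: { formatted: { type: "string" } },
  },
  handler: ({ first, last }: { first: string; last: string }) => ({ formatted: `${last}, ${first}` }),
};

// The most specific cache key is derived from the definition, not the implementation.
const definitionForHashing = JSON.stringify({
  name: formatName.name,
  input: formatName.inputSchema,
  output: formatName.outputSchema,
});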

If we solve these, then we essentially make the agent resemble a pure function with cacheable results.

Pure Functions as Free Exploits

🎯 The key insight: Pure functions allow us to safely explore and understand a system's capabilities without worrying about side effects.

A complex distributed environment will have hundreds, if not thousands, of functions to choose from, each a candidate to be morphed into an agent.

If we can safely execute a function, and (again) throw enough fuzz at it, we can infer the internals of the function without reading the source code. The reason we can't repeatedly call a function with fuzzy inputs is the possibility of side effects: our fuzzing would change the state of the system in unintended ways. (Calling send_email multiple times with random inputs will send emails to random people.)
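By contrast, a side-effect-free function can be probed as aggressively as we like. A toy sketch (slugify is just an illustrative pure function, not anything from a real system):

// A pure function: same input, same output, nothing else changes.
const slugify = (title: string): string =>
  title.toLowerCase().trim().replace(/[^a-z0-9]+/g, "-");

// Fuzzing it is safe, and the observed input/output pairs reveal its behaviour
// without ever reading its source.
const fuzzInputs = ["Gary Oldman", "  Hello, World!  ", "already-a-slug", ""];
for (const input of fuzzInputs) {
  console.log(JSON.stringify(input), "->", JSON.stringify(slugify(input)));
}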

If you think of schema observation as exploring and function execution as exploiting in an explore/exploit problem, then tool search (or deciding which agent to call) becomes a more interesting problem.

A function free from side-effects is essentially a ~free exploit when trying to decide whether to explore or exploit. Therefore, in order of difficulty, we have:

  1. Deterministically determining if a function is pure (not very hard)
  2. Isolating side-effects in an impure function (very hard)

Dynamic Runtimes

We've already established that the evaluation context of an agent should ideally be local to the agent, and that this allows us to run different agents on different LLMs distributed across a data centre or the internet.

It's somewhat trivial to do this with static configuration. You would just define a runsOn property on the agent, and the orchestration plane refers to it in order to execute correctly.
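A sketch of that static configuration, where runsOn is a hypothetical hint the orchestration plane reads when scheduling an evaluation (the model identifiers are placeholders):

const summariseAgent = {
  name: "summarise_ticket",
  runsOn: "small-fast-model", // cheap, low-latency
};

const planningAgent = {
  name: "plan_refund_workflow",
  runsOn: "large-reasoning-model", // slower, more capable
};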

However, the holy grail here is to allow this to happen dynamically and reliably.

If the execution plane is context-aware enough, routing can dynamically improve both the cost and the performance of evaluations. Some of the variables it might weigh (a routing sketch follows this list):

  1. Real-time model performance (latency / token throughput)
  2. Inference cost per token
  3. Max context window required per evaluation
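A sketch of what dynamic routing over those variables could look like: pick the cheapest model that satisfies the evaluation's latency and context-window constraints. The metrics and model names are illustrative:

type ModelStats = {
  name: string;
  p50LatencyMs: number;   // real-time performance
  costPerMTok: number;    // inference cost per million tokens
  contextWindow: number;  // max context supported
};

function route(
  models: ModelStats[],
  requiredContext: number,
  maxLatencyMs: number,
): ModelStats | undefined {
  return models
    .filter((m) => m.contextWindow >= requiredContext && m.p50LatencyMs <= maxLatencyMs)
    .sort((a, b) => a.costPerMTok - b.costPerMTok)[0];
}

const candidates: ModelStats[] = [
  { name: "small-fast-model", p50LatencyMs: 300, costPerMTok: 0.2, contextWindow: 16_000 },
  { name: "large-reasoning-model", p50LatencyMs: 2000, costPerMTok: 5.0, contextWindow: 200_000 },
];

// A short, latency-sensitive evaluation routes to the small model.
const chosen = route(candidates, 8_000, 1_000);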

In a not-so-distant future, I hope we will have LLM hosting platforms with "spot pricing" and access to millions of specialised models with varying weights. The use cases might range from code formatting to Excel formula evaluation.

In many ways, your run-of-the-mill LLM application is a very constrained and specific prompt applied over a very large general-purpose model. But a shorter prompt over a specialised model is both faster and cheaper. You just have to find your way (route) to the right model.

Distributed Execution

Functions are encapsulated by computer programs, and computer programs run on computers. Most of these computers run the same copy of the program for redundancy and scale.

For any of this agent stuff to work at some useful scale, we need to solve the problem of millions of computers, across thousands of data centres, working in concert on:

  1. Having (almost) the same shared view of the world at any given point of time (state)
  2. What to do about it and computing it (action)
  3. Agreeing with each other on the new view of the world (next state)

Some of these computers host parts of LLMs, some host deterministic code written by programmers that act as invariants for the evaluation.

The system has to move state between these computers, and do so selectively and deterministically. For example, you don't want to send a 100GB CSV through an LLM to SUM a column. It's much cheaper to sample the data and let the model generate an aggregation query, which can be applied to the 100GB CSV out of band.
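A sketch of that out-of-band pattern, where the model only ever sees a sample and the generated aggregation runs over the full dataset in plain code (generateAggregation is a stand-in for the model call):

type Row = { [column: string]: number | string };

// Stand-in for the model: given a small sample and an intent, it returns an
// aggregation spec rather than touching the full dataset.
async function generateAggregation(
  sample: Row[],
  intent: string,
): Promise<{ column: string; op: "sum" }> {
  return { column: "amount", op: "sum" };
}

async function sumColumn(allRows: Row[], intent: string): Promise<number> {
  const sample = allRows.slice(0, 100); // only the sample goes anywhere near the LLM
  const spec = await generateAggregation(sample, intent);
  // The deterministic aggregation is applied to the full dataset out of band.
  return allRows.reduce((acc, row) => acc + Number(row[spec.column] ?? 0), 0);
}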

While doing that, it also has to produce truthful information about its internal state and write it to an immutable log that humans (and, in the near future, agents) can consult to debug when things go wrong. And yes, things will go terribly wrong at times.

Building distributed systems that "work on my machine" is hard, but we (as an industry) have had a lot of practice at it. Building distributed systems that wrap deterministic code in probabilistic containers and produce reliable results compounds that complexity by a few orders of magnitude.

But also, this is what excites us.