Building text-to-sql agents from first principles

Learn how to build a text-to-sql agent from first principles, including how to get the model to understand your database schema, how to generate SQL queries, and how to add deterministic guardrails.

Nadeesha Cabral on 23-01-2025

Natural language interfaces to databases have been a long-standing goal in human-computer interaction. Funnily enough, if you do a quick search on Google, you can see papers going back a few decades on building a natural language alternative to SQL.

We're not 100% there yet with LLMs, but I think we've come closer than ever before.

For the purposes of this post, I'll only focus on Postgres. But the same principles apply to almost any other relational database like MySQL, SQLite, etc.

What we're trying to do.

I think of this problem broadly as 3 separate problems.

  1. The model should understand your database schema. It should know what your tables are, how to join them, and which columns to use in SELECTs.

  2. The suggested query from the model should be executable against a database.

  3. There should be enough deterministic guardrails to make sure that a human approves either all, or a subset of the queries.

Getting the model to understand your database.

Conveniently, Postgres provides catalog tables where you can get all the metadata about the database.

SELECT * FROM pg_catalog.pg_tables WHERE schemaname = $1

Executing this returns all of the tables under a particular schema. Then you can iterate over them and fetch the full schema:

async function getPostgresContext(client, params) {
  // getAllTables runs the pg_tables query above against params.schema
  const tables = await getAllTables(client);

  const context: any[] = [];

  for (const table of tables) {
    const schema = await client.query(
      `SELECT * FROM information_schema.columns WHERE table_schema = $1 AND table_name = $2`,
      [params.schema, table.tablename],
    );

    context.push({
      // truncate names to keep the prompt size bounded
      tableName: table.tablename.substring(0, 100),
      columns: schema.rows.map((col) => col.column_name.substring(0, 100)),
      sampleData: [],
    });
  }

  return context;
}

Now, you can load this context into the system prompt of your LLM call. It's just raw text. In our experience, any large foundation model has seen enough SQL to understand it well.
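One way to serialize that context into prompt text (the exact formatting here is illustrative, not what the adapter uses):

```typescript
// Illustrative helper: flatten the schema context into plain text
// suitable for dropping into a system prompt.
function formatSchemaContext(
  context: { tableName: string; columns: string[] }[],
): string {
  return context
    .map((t) => `Table ${t.tableName}: ${t.columns.join(", ")}`)
    .join("\n");
}
```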

Executing the generated query

With this system context, now you can prompt the LLM with your question. For example: "Given this database schema, write a query that satisfies the following user query: $userQuery".

And you can get a structured output with the SQL query, like so:

const Query = z.object({
  sql: z.string(),
});

async function generateSQLQuery(
  dbSchema: string,
  userQuery: string,
): Promise<{ sql: string }> {
  const systemPrompt = `You are a SQL expert. Given the following database schema, generate SQL queries based on user requests. Database Schema: \n ${dbSchema}`;

  const response = await openai.beta.chat.completions.parse({
    model: MODEL_ID,
    messages: [
      {
        role: "system",
        content: systemPrompt,
      },
      {
        role: "user",
        content: `Generate a SQL query for the following request: ${userQuery}`,
      },
    ],
    response_format: zodResponseFormat(Query, "query"),
  });

  // parsed is null if the model's output failed schema validation
  const parsed = response.choices[0].message.parsed;
  if (!parsed) {
    throw new Error("Model returned an unparseable response");
  }
  return parsed;
}

Now, you will get a structured result that you can execute against the database. For example:

const { sql } = await generateSQLQuery(schema, userQuery);

const result = await client.query(sql);

Adding deterministic guardrails

This should be all good for a hobby application, but we shouldn't give such a long leash to LLMs when executing queries against a production database.

There are a few options available here:

Read-only connection

Give the database client a read-only connection. The connection can then only read data; any UPDATE / DELETE / DROP / TRUNCATE / INSERT etc. queries will fail.

However, this also takes away the ability to do any data manipulation, thereby reducing the usefulness of the agent.
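For reference, a read-only role in Postgres can be set up along these lines (the role, password, and database names are illustrative):

```sql
-- illustrative: a role that can read but not modify data
CREATE ROLE agent_readonly LOGIN PASSWORD 'change-me';
GRANT CONNECT ON DATABASE mydb TO agent_readonly;
GRANT USAGE ON SCHEMA public TO agent_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO agent_readonly;
```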

Human in the loop

This is the most common approach. You can have a human in the loop to approve the query before it's executed. Assuming this is a CLI application, you can present the query to the user before it's executed.

// assumes rl was created with node:readline/promises, e.g.
// const rl = readline.createInterface({ input: process.stdin, output: process.stdout });
const { sql } = await generateSQLQuery(schema, userQuery);

console.log(`Generated SQL: ${sql}`);

const approved = await rl.question(`Do you want to execute this query? (y/n) `);

if (approved === 'y') {
  const result = await client.query(sql);
} else {
  console.log("Ok, I won't execute it");
}

However, this is also not the best user experience as it'll be a bit of a pain to have to approve every query.

Selective Human in the loop

This is a more advanced approach: a human still approves queries before they're executed, but only when a set of deterministic rules flags them. For example, you can check whether the query is a plain SELECT, or a mutative query like an UPDATE or DELETE.

The naive way to do this is to do a regex check on the query to detect if it's a mutative query.

const isMutativeQuery = /(UPDATE|DELETE|INSERT|CREATE|ALTER|DROP|TRUNCATE)/.test(query.toUpperCase());

This almost gets us to the point we want, but it produces false positives. For example, the query SELECT id, deleted_at FROM users WHERE id = 1 will be flagged as mutative, because the column name deleted_at contains DELETE.
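A slightly more robust sketch is to classify the statement by its leading keyword rather than scanning the whole string. This is still imperfect (it won't catch CTEs like WITH ... DELETE, multi-statement input, or leading comments; a real SQL parser is the reliable option), and the helper name here is illustrative:

```typescript
// Illustrative check: look only at the first keyword of the statement,
// avoiding false positives on column names like deleted_at.
const MUTATIVE_KEYWORDS = new Set([
  "UPDATE", "DELETE", "INSERT", "CREATE", "ALTER", "DROP", "TRUNCATE",
]);

function isMutativeQuery(sql: string): boolean {
  const firstKeyword = sql.trim().split(/\s+/)[0]?.toUpperCase() ?? "";
  return MUTATIVE_KEYWORDS.has(firstKeyword);
}
```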

LLM as judge

Just putting this here for completeness, but not something I'd recommend. You can ask the LLM to check if the query is a mutative query. I've seen products do this for other use cases, however, I don't think this is a good idea, as there's no way to deterministically know if the judgement is correct.

Building a production ready agent

We've covered the core principles of building a text-to-sql agent. However, to make this a truly delightful product experience, we need to consider a few more things.

Allowing multiple iterations

  1. Sometimes, the first query the model generates won't be correct. You'll need to iterate over it a few times to get it right.
  2. The first query generated might be a SELECT whose results feed into a subsequent UPDATE, when there isn't enough context to generate the correct UPDATE query on the first go.

Therefore, this is a good place to build a multi-step agent. A multi-step agent architecture can handle these scenarios more effectively as the agent is able to evaluate any error output from the initial query and try again with this new context.
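A minimal sketch of such a loop, with the generation and execution steps injected as functions so it stays agnostic of the LLM and database clients (the names and error-feedback format are illustrative):

```typescript
// Illustrative multi-step loop: on failure, the database error is fed
// back to the model so the next generation can correct the query.
async function runWithRetries(
  userQuery: string,
  generate: (query: string, lastError?: string) => Promise<string>,
  execute: (sql: string) => Promise<unknown>,
  maxAttempts = 3,
): Promise<unknown> {
  let lastError: string | undefined;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const sql = await generate(userQuery, lastError);
    try {
      return await execute(sql);
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
    }
  }
  throw new Error(`Giving up after ${maxAttempts} attempts: ${lastError}`);
}
```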

Getting the model to understand the data

We've seen that the model can understand the database schema, but it hasn't seen the data itself. If the model can read a sample of the data, it may generate better queries.

It's not that hard to do: have the context getter return a few rows of data along with the schema.

// quote identifiers, since these names come from catalog metadata,
// not from parameterized values
const sample = await client.query(
  `SELECT * FROM "${params.schema}"."${table.tablename}" LIMIT 1`,
);

pgsql-adapter

We've built pgsql-adapter, which makes building text-to-sql agents a plug-and-play experience.

It does all of the above, and a bit more, including:

  • Modeling the context getter as a tool, so long running chats can refresh the schema if it has changed.
  • Privacy mode, where the LLM doesn't see your data, and the tool returns the result directly to you. (We're able to do this because we don't use native tool calling. More about that here)
  • A nice chat interface on the Inferable playground.

It is available as an npm package. Quick start is as simple as:

# connection string doesn't leave your local machine
npx @inferable/pgsql-adapter \
  postgresql://user:pass@localhost:5432/postgres \
  --secret=sk_inf_xxx # get this from the Inferable dashboard or self-host

It's open-source, and free to be modified and self-hosted alongside the Inferable control plane.
