The Unreasonable Effectiveness of SKIP LOCKED in PostgreSQL
Using SKIP LOCKED to distribute work across machines in a distributed system is a popular pattern. We explore why, the alternatives, and a real world example from our own codebase.
When building distributed systems that need to process jobs or tasks concurrently, one of the most challenging problems is ensuring that work is distributed efficiently without duplication. PostgreSQL's SELECT FOR UPDATE SKIP LOCKED
feature provides an elegant and performant solution to this problem that almost feels too good to be true. We're big believers in the just use postgres ethos, as our source code might tell you.
The Problem: Distributed Job Processing
Imagine you have a queue of jobs that need to be processed by multiple workers. The core challenge is: how do you ensure that each job is processed exactly once, while maximizing throughput and minimizing contention?
The naive approach might look something like this:
BEGIN;
SELECT * FROM jobs WHERE status = 'pending' LIMIT 1;
-- Process job
UPDATE jobs SET status = 'completed' WHERE id = ?;
COMMIT;
This has an obvious race condition - two workers could select the same job before either marks it as completed. Here's what that looks like:
sequenceDiagram participant W1 as Worker 1 participant DB as Database participant W2 as Worker 2 W1->>DB: SELECT pending job W2->>DB: SELECT pending job Note over W1,W2: Both workers get same job W1->>DB: UPDATE job as completed W2->>DB: UPDATE job as completed Note over W1,W2: Job processed twice!
You might try to fix this with SELECT FOR UPDATE
:
BEGIN;
SELECT * FROM jobs WHERE status = 'pending' FOR UPDATE LIMIT 1;
-- Process job
UPDATE jobs SET status = 'completed' WHERE id = ?;
COMMIT;
With FOR UPDATE, the behavior changes but creates a bottleneck:
sequenceDiagram participant W1 as Worker 1 participant DB as Database participant W2 as Worker 2 W1->>DB: SELECT FOR UPDATE Note over DB: Row locked W2->>DB: SELECT FOR UPDATE Note over W2: Waits for lock... W1->>DB: UPDATE completed Note over DB: Lock released W2-->>DB: Finally gets lock W2->>DB: UPDATE completed
Enter SKIP LOCKED
This is where SKIP LOCKED
comes in. By adding this clause, we tell PostgreSQL to skip any rows that are already locked by other transactions, rather than waiting for the locks to be released:
BEGIN;
SELECT * FROM jobs WHERE status = 'pending' FOR UPDATE SKIP LOCKED LIMIT 1;
-- Process job
UPDATE jobs SET status = 'completed' WHERE id = ?;
COMMIT;
Here's how SKIP LOCKED allows workers to process jobs concurrently:
sequenceDiagram participant W1 as Worker 1 participant DB as Database participant W2 as Worker 2 W1->>DB: SELECT FOR UPDATE SKIP LOCKED Note over DB: Job 1 locked W2->>DB: SELECT FOR UPDATE SKIP LOCKED Note over DB: Gets different job par Process in parallel W1->>DB: UPDATE Job 1 completed and W2->>DB: UPDATE Job 2 completed end
Now, transactions can claim jobs without ever blocking each other. This works for us because our goal is to get through the queue as quickly as possible.
Real World Usage
Postgres-based Job Queues
Solid Queue by 37signals showcases the power of this pattern. Solid Queue is a PostgreSQL-backed job queue for Ruby on Rails that relies heavily on SKIP LOCKED
to handle job distribution.
In fact many other postgres backed job queues use this pattern, including PG Boss.
Inferable's Job Processing System
At Inferable, we use SKIP LOCKED
in our job processing system to distribute work across machines that long poll the control plane.
When the agent runtime determines that an action needs to be peformed (a function/tool call), it will insert a new job into the jobs table with a status of pending
. Then we wait until a suitable machine (worker) polls the jobs table for a job. Since at a given point, there could be hundreds of mahines competing for thousands of jobs concurrently, we use SKIP LOCKED
to ensure that we don't block each other.
Here's a snippet from our codebase that shows how we claim jobs:
const results = await db.execute(sql`
UPDATE jobs SET
status = 'running',
remaining_attempts = remaining_attempts - 1,
last_retrieved_at = now(),
executing_machine_id = ${machineId}
WHERE id IN (
SELECT id FROM jobs
WHERE status = 'pending'
AND cluster_id = ${clusterId}
AND service = ${service}
LIMIT ${limit}
FOR UPDATE SKIP LOCKED
)
RETURNING id, target_fn, target_args...`);
Technical Details
When using SELECT FOR UPDATE SKIP LOCKED
, there are a few important technical considerations from the PostgreSQL documentation:
-
The
SKIP LOCKED
clause only applies to row-level locks - the requiredROW SHARE
table-level lock is still taken normally. -
Using
SKIP LOCKED
provides an inconsistent view of the data by design. This is why it's perfect for queue-like tables where we want to distribute work, but not suitable for general purpose work where consistency is required. -
SKIP LOCKED
can be combined withLIMIT
to grab batches of jobs. The locking will stop once enough rows have been returned to satisfy the limit. In fact, this is how we implement our long polling logic.