ULIDs are awesome
Learn how ULIDs are a better alternative to UUIDs for unique identifiers, and all the hidden benefits they provide.
ULIDs are awesome. There, I said it. Once you start using them, you'll look back and wonder how you ever lived without them.
If you're familiar with UUIDs, ULIDs are very similar. UUIDs come in several versions with subtle differences between them. Version 4, the one I've seen most people use, is almost entirely random, while version 1 combines a timestamp with the MAC address of the machine that generated the UUID.
ULIDs, on the other hand, have a more straightforward spec. They're 128-bit identifiers, written as 26 characters of Crockford's Base32.
- First 48 bits are the timestamp
- Next 80 bits are the random bits
 01AN4Z07BY      79KA1307SR9X4MV3
|----------|    |----------------|
 Timestamp          Randomness
   48bits             80bits
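To make the layout concrete, here is a minimal sketch of a generator in TypeScript. The names `encodeTimestamp`, `randomPart`, and `makeUlid` are made up for illustration, and `Math.random` stands in for a cryptographically secure random source, so treat it as a demonstration of the format rather than something to ship.

```typescript
// Crockford's Base32 alphabet (no I, L, O, U), as used by the ULID spec.
const ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

// Encode a millisecond timestamp as 10 Base32 characters.
// 10 characters hold 50 bits; the spec only uses the low 48.
function encodeTimestamp(ms: number): string {
  let out = "";
  for (let i = 0; i < 10; i++) {
    out = ALPHABET.charAt(ms % 32) + out;
    ms = Math.floor(ms / 32);
  }
  return out;
}

// Produce 16 random Base32 characters (16 * 5 = 80 bits).
// Math.random keeps the sketch dependency-free; it is NOT secure randomness.
function randomPart(): string {
  let out = "";
  for (let i = 0; i < 16; i++) {
    out += ALPHABET.charAt(Math.floor(Math.random() * 32));
  }
  return out;
}

// Timestamp first, randomness second: 10 + 16 = 26 characters.
function makeUlid(now: number = Date.now()): string {
  return encodeTimestamp(now) + randomPart();
}
```

Calling `makeUlid()` returns a 26-character string whose first 10 characters change only when the clock ticks over to a new millisecond.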
They are sortable
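Because the first 48 bits are the timestamp, ULIDs are lexicographically sortable. You can generate a bunch of ULIDs, sort them by the ID itself, and they'll come out in the order they were generated, down to the millisecond. (IDs created within the same millisecond differ only in their random bits, so their relative order isn't guaranteed unless the generator is monotonic.) This is a huge benefit when you're dealing with time-series data, and with storage systems built for time-series data.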
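A quick way to see this is to sort a few IDs as plain strings. The IDs below are made up for illustration (only the second one comes from the diagram above). The default string sort works because the Base32 alphabet is in ascending ASCII order.

```typescript
// Hypothetical ULIDs; the first 10 characters encode the timestamp.
const ids: string[] = [
  "01AN4Z07C0AAAAAAAAAAAAAAAA", // latest timestamp
  "01AN4Z07BY79KA1307SR9X4MV3", // the example ID from the diagram
  "01AN4Z07BX0000000000000000", // earliest timestamp
];

// Array.prototype.sort with no comparator compares strings by code unit,
// which for ULIDs is the same as ordering by creation time.
const sortedIds: string[] = [...ids].sort();
```

After sorting, the array runs from the earliest timestamp to the latest, with no date parsing involved.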
Efficient Indexing
When you store ULIDs as a unique identifier or a primary key, you reduce index fragmentation. New ULIDs sort after existing ones, so they are appended to the end of the index instead of being written to random positions in it, and those random writes are what cause fragmentation.
If you're working with a lot of writes and/or a lot of indexes, this can be a huge benefit.
Easier cursor-based pagination implementation
Cursor-based pagination is a way to paginate through a list of items without having to know the total number of items. For example, if you load the first 100 messages and then have to load the next 100, you can use the ID of the last message as the cursor for the next query.
SELECT * FROM messages WHERE id > $1 ORDER BY id ASC LIMIT 100
This is much easier to implement than offset-based pagination, which requires you to keep track of how many items came before the current page. And because new ULIDs always sort after your cursor, nothing gets inserted behind it, so > is a safe operator to use.
sequenceDiagram
    participant Client
    participant Server
    participant DB
    Client->>Server: Request Page 1
    Server->>DB: SELECT * FROM messages LIMIT 100
    DB-->>Server: First 100 messages
    Server-->>Client: Messages + Last ULID
    Client->>Server: Request Next (Last ULID)
    Server->>DB: SELECT * WHERE id > last_ulid LIMIT 100
    DB-->>Server: Next 100 messages
    Server-->>Client: Messages + New Last ULID
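The same flow can be sketched in TypeScript against an in-memory array, which makes the cursor logic easy to follow without a database. The `Message` and `Page` shapes, the `fetchPage` helper, and the sample IDs are all hypothetical; a real server would run the SQL query instead of filtering an array.

```typescript
interface Message {
  id: string; // ULID
  body: string;
}

interface Page {
  items: Message[];
  nextCursor: string | null;
}

// In-memory equivalent of:
//   SELECT * FROM messages WHERE id > $1 ORDER BY id ASC LIMIT $2
// A null cursor means "start from the beginning".
function fetchPage(all: Message[], cursor: string | null, limit: number): Page {
  const items = all
    .filter((m) => cursor === null || m.id > cursor)
    .sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0))
    .slice(0, limit);
  const last = items[items.length - 1];
  // Keep the old cursor when the page is empty so the client can retry later.
  return { items, nextCursor: last ? last.id : cursor };
}

// Five messages with made-up ULIDs in ascending order.
const messages: Message[] = ["1", "2", "3", "4", "5"].map((n) => ({
  id: "01AN4Z07B" + n + "0000000000000000",
  body: "message " + n,
}));

const firstPage = fetchPage(messages, null, 2); // messages 1 and 2
const secondPage = fetchPage(messages, firstPage.nextCursor, 2); // messages 3 and 4
```

The client only ever has to remember one string, the last ID it saw, and hand it back on the next request.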
Uniqueness
80 bits of randomness means that the probability of a collision is extremely low. ULIDs are perfectly suitable for use cases where you need a unique identifier across multiple distributed systems, even at a rate of thousands of messages (perhaps more) per millisecond. Two ULIDs generated in the same millisecond share the first 48 bits (the timestamp), but the remaining 80 bits are random, which leaves 2^80 (about 1.2 × 10^24) possible values for every millisecond.
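To put a rough number on "extremely low", here is a back-of-the-envelope sketch using the standard birthday-bound approximation. It assumes every ID lands in the same millisecond and that the 80 random bits are uniformly distributed; `collisionProbability` is a name made up for this example.

```typescript
// Number of distinct values the 80 random bits can take.
const RANDOM_SPACE = Math.pow(2, 80);

// Birthday-bound approximation: probability that at least two of `n`
// uniformly random 80-bit values are equal, p ≈ 1 - exp(-n(n-1) / (2 * 2^80)).
// Math.expm1 keeps precision when the result is tiny.
function collisionProbability(n: number): number {
  return -Math.expm1(-(n * (n - 1)) / (2 * RANDOM_SPACE));
}

// 1,000 IDs generated within a single millisecond.
const perMillisecond = collisionProbability(1000);
```

For 1,000 IDs in one millisecond the result is on the order of 10^-19, and that chance starts over with every new millisecond because the timestamp prefix changes.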
Embeds time
You can recover the timestamp from a ULID, which is useful for debugging and auditing. If you keep a created_at DEFAULT NOW() column on all your tables, you can instead use the ULID as the primary key and decode its timestamp to get the created_at value when you need it. A simple TypeScript implementation is here.
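// Constants from the ULID spec (Crockford's Base32 alphabet).
const ENCODING = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
const ENCODING_LEN = ENCODING.length // 32
const TIME_LEN = 10 // characters used by the timestamp
const RANDOM_LEN = 16 // characters used by the randomness
const TIME_MAX = Math.pow(2, 48) - 1 // largest 48-bit timestamp

// Returns the Unix timestamp in milliseconds embedded in a ULID.
export function decodeTime(id: string): number {
  if (id.length !== TIME_LEN + RANDOM_LEN) {
    throw new Error("malformed ulid")
  }
  // Walk the timestamp characters least-significant first
  // and add up their Base32 place values.
  const time = id
    .slice(0, TIME_LEN)
    .split("")
    .reverse()
    .reduce((carry, char, index) => {
      const encodingIndex = ENCODING.indexOf(char)
      if (encodingIndex === -1) {
        throw new Error("invalid character found: " + char)
      }
      return carry + encodingIndex * Math.pow(ENCODING_LEN, index)
    }, 0)
  if (time > TIME_MAX) {
    throw new Error("malformed ulid, timestamp too large")
  }
  return time
}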
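If the ULID replaces created_at, you'll probably still want to filter rows by time. One possible approach, sketched below, is to go the other way and build the smallest ULID for a given instant, then use it as a range bound on the primary key. `ulidLowerBound` is a hypothetical helper (not part of any package), and it assumes dates at or after the Unix epoch.

```typescript
const B32 = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"; // Crockford's Base32

// Smallest possible ULID for a given instant: the 10-character timestamp
// followed by 16 "0" characters (all 80 random bits set to zero).
function ulidLowerBound(at: Date): string {
  let ms = at.getTime();
  let prefix = "";
  for (let i = 0; i < 10; i++) {
    prefix = B32.charAt(ms % 32) + prefix;
    ms = Math.floor(ms / 32);
  }
  return prefix + "0".repeat(16);
}

// Bounds for "everything created on 2024-01-01 (UTC)".
const fromId = ulidLowerBound(new Date("2024-01-01T00:00:00Z"));
const toId = ulidLowerBound(new Date("2024-01-02T00:00:00Z"));
// SELECT * FROM messages WHERE id >= $1 AND id < $2   -- $1 = fromId, $2 = toId
```

Every ULID generated during that day sorts at or after `fromId` and strictly before `toId`, so the query can be answered from the primary key index alone.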
How we use ULIDs
We use ULIDs as the primary key for all our tables. Since we work with a lot of time-series data (messages, events, etc.), this is a huge benefit. We use the ulid package to generate ULIDs at the application layer, rather than relying on the database to generate them. This helps us preserve the correct ordering of ULIDs when we do batch inserts.
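A batch insert can easily create many IDs within the same millisecond, where the random bits alone don't guarantee order. The ULID spec describes a monotonic mode for this case: when the timestamp hasn't advanced, increment the previous random part instead of drawing a new one. The ulid package ships a `monotonicFactory` for this. The standalone sketch below only illustrates the idea; it is not the package's code, all function names are made up, and `Math.random` is a stand-in for a secure random source.

```typescript
const CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

// 10 Base32 characters for the millisecond timestamp.
function encodeTime(ms: number): string {
  let out = "";
  for (let i = 0; i < 10; i++) {
    out = CROCKFORD.charAt(ms % 32) + out;
    ms = Math.floor(ms / 32);
  }
  return out;
}

// 16 random Base32 characters (NOT cryptographically secure).
function randomChars(): string {
  let out = "";
  for (let i = 0; i < 16; i++) {
    out += CROCKFORD.charAt(Math.floor(Math.random() * 32));
  }
  return out;
}

// Add 1 to a Base32 string, carrying leftwards like an odometer.
function incrementBase32(s: string): string {
  for (let i = s.length - 1; i >= 0; i--) {
    const idx = CROCKFORD.indexOf(s.charAt(i));
    if (idx < 31) {
      // Bump this character and reset everything to its right to "0".
      return s.slice(0, i) + CROCKFORD.charAt(idx + 1) + "0".repeat(s.length - 1 - i);
    }
  }
  throw new Error("random part overflowed within one millisecond");
}

function createMonotonicGenerator(): (now?: number) => string {
  let lastTime = -1;
  let lastRandom = "";
  return (now: number = Date.now()): string => {
    if (now <= lastTime) {
      // Clock has not advanced: keep the timestamp, bump the random part.
      lastRandom = incrementBase32(lastRandom);
    } else {
      lastTime = now;
      lastRandom = randomChars();
    }
    return encodeTime(lastTime) + lastRandom;
  };
}

// Three IDs for one batch, all generated in the same (fixed) millisecond.
const nextId = createMonotonicGenerator();
const batch = [nextId(1700000000000), nextId(1700000000000), nextId(1700000000000)];
```

Each ID in `batch` sorts after the one before it even though all three share a timestamp, so rows inserted in that order keep that order when sorted by primary key.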