Learning Notes #26 – Types of Unique ID Generations

Today I learned about various ways to generate unique IDs. ID generation is a key part of handling data, since IDs uniquely identify entities. My go-to ID generation is UUID4. You might have seen some debates on UUID vs ULID for better performance.

Related: https://parottasalna.com/2024/06/09/benchmarking-uuid4-vs-ulid-for-db-indexing/

In this blog I jot down notes on various ID generation schemes, for quick access by my future self.

In software development, especially for cloud applications, it’s crucial to assign unique identifiers to data records, resources, and entities.

Traditionally, developers (me) have used UUIDs (Universally Unique Identifiers) for this purpose. A typical UUID looks like this: 1e7cacb7-af9f-4f07-88fd-45370c25ab62. While UUIDs are widely adopted, they have certain drawbacks, such as large size and lack of natural ordering, which can lead to inefficiencies in performance, storage, and indexing at scale.

To address these issues, several alternative methods for generating unique identifiers have been developed, each with its own advantages and disadvantages.

Here are some of the most notable alternatives:

1. Auto-Incrementing IDs (Sequential IDs)

Auto-incrementing IDs are numeric values that increase by one with each new record, commonly used in relational databases.
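
As a quick illustration, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The users table and the insert_users helper are made up for this example; other databases spell the auto-increment column differently.

```python
import sqlite3


def insert_users(names):
    """Insert rows into a throwaway table and return the IDs the database assigned."""
    conn = sqlite3.connect(":memory:")
    # The database, not the application, hands out the next integer.
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL)"
    )
    ids = []
    for name in names:
        cursor = conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        ids.append(cursor.lastrowid)
    conn.close()
    return ids


print(insert_users(["asha", "bala", "chitra"]))  # [1, 2, 3]
```

Because a single database hands out the numbers, two independent databases would both start at 1, which is where the conflicts in distributed setups come from.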

Pros:

Simple to implement and understand.
Efficient in terms of storage due to smaller numeric data types.
Easily indexed for better database performance.
Suitable for small-scale systems where order matters and global uniqueness isn’t required.

Cons:

Not suitable for distributed systems due to potential conflicts.
Predictable, posing security risks as IDs can be easily guessed.
Requires database coordination to avoid duplicates in sharded systems.

Note: This alternative is included as a reference for the simple implementation that many people learn with. It is most likely not what you want for a distributed cloud application, where it is usually discouraged unless you have a specialized use case.

2. Snowflake ID (Twitter Snowflake)

Snowflake IDs are 64-bit unique identifiers composed of a timestamp, machine ID, and sequence number. They are designed for distributed systems to ensure uniqueness across multiple machines.

Example: 5643574219214851220
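
To make the three parts concrete, here is a minimal sketch of a generator, assuming the bit layout of Twitter's original implementation (41-bit millisecond timestamp, 10-bit machine ID, 12-bit sequence) and its custom epoch. The SnowflakeGenerator class and the decode helper are names I made up for illustration; a production generator needs stricter handling of clock drift.

```python
import threading
import time

TWITTER_EPOCH_MS = 1288834974657  # custom epoch of Twitter's original implementation
MACHINE_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


def current_ms():
    return time.time_ns() // 1_000_000


class SnowflakeGenerator:
    def __init__(self, machine_id, epoch_ms=TWITTER_EPOCH_MS):
        if not 0 <= machine_id < (1 << MACHINE_BITS):
            raise ValueError("machine_id must fit in 10 bits")
        self.machine_id = machine_id
        self.epoch_ms = epoch_ms
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            # Never go below the last timestamp used, even if the clock steps back.
            now = max(current_ms(), self._last_ms)
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # 4096 IDs already issued in this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = current_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp = now - self.epoch_ms
            return (
                (timestamp << (MACHINE_BITS + SEQUENCE_BITS))
                | (self.machine_id << SEQUENCE_BITS)
                | self._sequence
            )


def decode(snowflake_id, epoch_ms=TWITTER_EPOCH_MS):
    """Split an ID back into (unix timestamp in ms, machine ID, sequence)."""
    timestamp_ms = (snowflake_id >> (MACHINE_BITS + SEQUENCE_BITS)) + epoch_ms
    machine_id = (snowflake_id >> SEQUENCE_BITS) & ((1 << MACHINE_BITS) - 1)
    sequence = snowflake_id & MAX_SEQUENCE
    return timestamp_ms, machine_id, sequence


generator = SnowflakeGenerator(machine_id=7)
print(decode(generator.next_id()))
```

Each machine must be given a different machine_id; two generators sharing one can produce the same ID in the same millisecond.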

Pros:

Distributed and scalable, ideal for distributed systems.
Timestamp component provides approximate ordering.
Ensures uniqueness across systems.
Useful for scalable, time-ordered unique IDs, especially in social media or messaging apps.

Cons:

More complex to implement than simple IDs.
Timestamp-based IDs can reveal timing information.
Requires careful configuration of machine identifiers to avoid collisions.

3. KSUID (K-Sortable Unique Identifier)

KSUIDs are 160-bit identifiers that combine a timestamp with a random payload, allowing for lexicographical sorting based on creation time.

Example: 1avvTqCSFGnD5LDc4hN6GFFCAXD
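
Here is a minimal sketch of how such an ID can be put together, assuming the layout used by the reference implementation: a 4-byte timestamp in seconds counted from a custom epoch of 1400000000, followed by 16 random bytes, with the 20 bytes encoded as 27 base62 characters. The function names are my own, not a library API.

```python
import os
import time

KSUID_EPOCH = 1_400_000_000  # seconds; custom epoch assumed from the reference implementation
BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_encode(data, length=27):
    """Encode bytes as a fixed-width base62 string, left-padded with '0'."""
    number = int.from_bytes(data, "big")
    chars = []
    while number:
        number, remainder = divmod(number, 62)
        chars.append(BASE62[remainder])
    return "".join(reversed(chars)).rjust(length, "0")


def base62_decode(text):
    """Decode a 27-character base62 string back into 20 bytes."""
    number = 0
    for char in text:
        number = number * 62 + BASE62.index(char)
    return number.to_bytes(20, "big")


def new_ksuid(now=None):
    seconds = int(time.time() if now is None else now) - KSUID_EPOCH
    raw = seconds.to_bytes(4, "big") + os.urandom(16)  # 4 + 16 = 20 bytes = 160 bits
    return base62_encode(raw)


def ksuid_timestamp(ksuid):
    """Return the creation time as Unix seconds."""
    return int.from_bytes(base62_decode(ksuid)[:4], "big") + KSUID_EPOCH


print(new_ksuid())
```

The alphabet is in ASCII order and the string has a fixed width, which is why plain string sorting matches creation order (to the second).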

Pros:

Sortable by creation time.
More human-readable than UUIDs.
Suitable for high-scale distributed systems.
Ideal for systems requiring time-ordered unique IDs, such as event logging.

Cons:

Timestamp exposure may not be ideal for privacy.
Slightly more complex to generate than UUIDs.

4. ULID (Universally Unique Lexicographically Sortable Identifier)

ULIDs are 128-bit identifiers that combine a timestamp with randomness, enabling lexicographical sorting and ensuring uniqueness.

Papers & References:

NPM: https://www.npmjs.com/package/ulid
PyPI: https://pypi.org/project/python-ulid/

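Example: 01ARZ3NDEKTSV4RRFFQ69G5FAV
A ULID is always 26 characters long and uses Crockford's Base32 alphabet, which leaves out the letters I, L, O, and U.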
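
A minimal sketch of the encoding, assuming the layout from the ULID spec: a 48-bit millisecond timestamp followed by 80 random bits, written as 26 Crockford Base32 characters. The function names are hypothetical; for real use, the python-ulid package linked above is the safer choice.

```python
import os
import time

CROCKFORD32 = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # Base32 without I, L, O, U


def new_ulid(now_ms=None):
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    # 48-bit timestamp in the high bits, 80 random bits in the low bits.
    value = (now_ms << 80) | int.from_bytes(os.urandom(10), "big")
    chars = []
    for _ in range(26):
        value, remainder = divmod(value, 32)
        chars.append(CROCKFORD32[remainder])
    return "".join(reversed(chars))


def ulid_timestamp_ms(ulid):
    """Decode the first 10 characters back into Unix milliseconds."""
    value = 0
    for char in ulid[:10]:
        value = value * 32 + CROCKFORD32.index(char)
    return value


print(new_ulid())
```

This sketch only sorts correctly across different milliseconds. Two IDs created in the same millisecond come out in random order unless the generator also keeps the random part increasing.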

Pros:

Sortable based on creation time.
More human-readable than UUIDs.
Suitable for high-scale distributed systems.
Useful for systems requiring lexicographical sorting along with globally unique identifiers, such as e-commerce or document management systems.

Cons:

Similar to KSUID, the timestamp is exposed, which might not be ideal for privacy.
Slightly more complex to generate than UUIDs.

5. NanoID

NanoID is a JavaScript library designed for generating unique IDs that are small, fast, and secure.

Papers & References:

Github: https://github.com/ai/nanoid
NPM: https://www.npmjs.com/package/nanoid
PyPI: https://pypi.org/project/nanoid/

Example: E9SxJKL8_K5emHi2B-noZ
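
The idea is simple enough to sketch with Python's standard library: pick 21 characters from a 64-character URL-safe alphabet using a cryptographically secure random source, which gives about 126 bits of randomness. The nano_id function below is my own illustration of the idea, not the library's code.

```python
import secrets
import string

# 64 URL-safe characters, so each character carries 6 bits of randomness.
URL_SAFE_ALPHABET = string.ascii_letters + string.digits + "_-"


def nano_id(size=21, alphabet=URL_SAFE_ALPHABET):
    """Return a random ID drawn from the operating system's secure random source."""
    return "".join(secrets.choice(alphabet) for _ in range(size))


print(nano_id())
print(nano_id(size=10))
```

Shortening the ID or shrinking the alphabet raises the chance of collisions, so the size is a trade-off rather than a free choice.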

Pros:

Shorter IDs compared to UUIDs, making them more URL-friendly.
Faster generation due to smaller size.
Cryptographically strong random IDs, enhancing security.

Cons:

May not be suitable for all systems, especially those requiring longer IDs for uniqueness.
Primarily designed for JavaScript environments; other ecosystems rely on separate ports, such as the PyPI package listed above.

6. ObjectID (MongoDB ObjectID)

ObjectIDs are 12-byte identifiers used by MongoDB, consisting of a 4-byte timestamp, a 5-byte random value generated once per process (older versions used a machine ID and a process ID here), and a 3-byte counter.

Example: 102e1b71bcd16cd721434331
The 12 bytes are usually shown as a 24-character hexadecimal string.

Pros:

Sortable by creation time.
Compact size compared to UUIDs.
Ensures uniqueness within a MongoDB collection.

Cons:

Specific to MongoDB; may not be suitable for other systems.
Exposes the time of creation (and, in older formats, information about the server), which might be a privacy concern.

Papers:

https://www.mongodb.com/docs/manual/reference/method/ObjectId/

7. CUID (Collision-Resistant Unique Identifier)

CUIDs are designed to be human-readable, URL-safe, and collision-resistant, making them suitable for distributed systems.

Pros:

Human-readable and URL-friendly.
Collision-resistant, making it ideal for distributed systems.
Works well in both small and large-scale applications.

Cons:

Larger than some alternatives like NanoID.
Less widely adopted compared to UUIDs and Snowflake IDs.

Papers & References:

https://www.npmjs.com/package/@paralleldrive/cuid2

Choosing the Right Alternative

When deciding which unique identifier to use for your cloud application, consider the following factors:

Scalability: For distributed systems, prioritize IDs like Snowflake or KSUID, which are designed for scalability.
Performance: For faster generation and smaller storage requirements, look into NanoID or auto-incrementing IDs.
Sorting Needs: If you need time-ordered identifiers, ULID, KSUID, or Snowflake IDs are great options.
Readability: For user-facing systems or URLs, CUIDs or NanoIDs might be more suitable.
Ecosystem Compatibility: Some options, like ObjectID, are tied to specific databases (MongoDB), which can be a deciding factor.

UUID and Its Different Versions

UUIDs (Universally Unique Identifiers) are a widely-used standard for generating unique identifiers. A UUID is a 128-bit value represented as a string, often formatted like this:
550e8400-e29b-41d4-a716-446655440000.

UUIDs have several versions, each suited for different use cases – UUID v1 to v8.

Version	Purpose	Best Use Case
1	Time-based + MAC address	Local systems needing time-based IDs
2	DCE Security	Legacy systems with security-related requirements
3	Namespace + MD5	Deterministic IDs from input data
4	Random	General-purpose unique IDs
5	Namespace + SHA-1	Deterministic, more secure than version 3
6, 7	Time-ordered	Distributed systems needing sortable IDs
8	Custom	Encoding domain-specific data
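
A few of these versions can be tried directly with Python's standard uuid module. This is a small sketch covering versions 1, 3, 4, and 5 only; support for the newer versions depends on the Python release, so I leave them out. The name "example.com" is just a sample input.

```python
import uuid

time_based = uuid.uuid1()  # timestamp + node (usually the MAC address)
random_based = uuid.uuid4()  # random
md5_based = uuid.uuid3(uuid.NAMESPACE_DNS, "example.com")  # namespace + MD5
sha1_based = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")  # namespace + SHA-1

for value in (time_based, random_based, md5_based, sha1_based):
    print(value.version, value)

# Name-based UUIDs are deterministic: the same namespace and name give the same ID.
print(sha1_based == uuid.uuid5(uuid.NAMESPACE_DNS, "example.com"))  # True

# Every UUID is 16 bytes, shown as 36 characters including hyphens.
print(len(random_based.bytes), len(str(random_based)))  # 16 36
```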