Designing the Ledgers API with Optimistic Locking

Optimistic Locking is a control method that assumes multiple transactions can complete concurrently without conflict. We built our ledgers API with concurrency control semantics to better manage transaction volume.

August 15, 2021

Andy QinEngineering

When we designed the Ledgers API, most of the functionality we wanted to support had precedent. For example, there's lots of prior art for APIs to support idempotency [1]. We generally try to follow existing standards because it makes for a less surprising developer experience [2]. This approach works most of the time, but we discovered a class of problems for which there was no obvious standard. In this post, we'll explain why we wanted to support optimistic locking in our API. We'll also examine the various different solutions we considered before describing the one we built.

Background

Developers can think of our Ledgers API as a database for recording financial transactions. Let's say we're building a peer-to-peer payment app like Venmo. When a user views their current balance, we'd fetch that user's associated ledger account and present the balance from the API response. Here's an example with a fictional user, Lucy Ledger:

If Lucy Ledger was to transfer $50 to John Journal, we'd create a ledger transaction to debit Lucy's account and credit John's. The resulting ledger transaction looks something like this:

Ledgers uses double-entry accounting to ensure that Lucy's balance decreases by the same $50 that John's increases.

This is easy to track, since there was only one request to debit Lucy. But what if we had a situation where there were multiple, concurrent requests to debit Lucy's account? How can we ensure that her balance doesn't go negative?

A naive solution is to fetch the balance of Lucy's account to check whether there is a sufficient balance before actually attempting the debit.

Two examples of balance requests before attempting the $50 debit.

This works so long as we are guaranteed that a request to debit the account doesn't happen in between these requests. But there is no such guarantee.

Example of multiple requests to debit at the same time.

In database-land, this is called a phantom read; a spooky name for a race condition. As an API provider, we have two problems to solve. First, we need to implement some kind of locking mechanism in our own database. Second, we have to decide on the right interface to expose to the consumers of our API.

The Implementation

Modern Treasury is a Ruby on Rails shop. Like most Rails apps, we use ActiveRecord as the Object-Relational Mapping (ORM). ActiveRecord provides an interface for two types of locking in our PostgreSQL database: pessimistic and optimistic.

Pessimistic locking, true to its name, assumes that conflicting database transactions happen often. ActiveRecord takes out an exclusive lock against the selected database row for writes. While a database transaction has acquired a lock to write a ledger entry against Lucy's account, other attempts to write will fail. However, the lock is exclusive, so any attempts to read the balance will also fail until the lock is released. This effectively avoids conflicts by preventing them altogether.

On the other hand, optimistic locking assumes that conflicts are rare. It requires adding a version number column called "lock_version" to support concurrent access. Whenever a row in the table changes, its version number is incremented. If two clients update a record simultaneously, one client's changes are committed while the other client gets a "StaleObjectError" because the version number won't match the one in the table. This makes us responsible for rescuing the error and rolling back instead of committing the transaction. A huge advantage over pessimistic locking is that reads aren't blocked by writes.

Looking at our request traffic made choosing optimistic locking fairly easy. We expect the majority of ledger operations to be reads, and we didn't want reads to block writes (and vice versa). So it should be easy to add the "lock_version" column to our ledger account model and be done with it, right?

It turned out not to be so straightforward. For some background, this is our architecture.

A "ledger" colloquially refers to a collection of accounts and transactions, but a row in the ledger table is just a parent object that holds metadata about its children (e.g. what currency its entries are denominated in). The same is true of ledger accounts, where it's not the resource itself that we want to version, but a collection of its child resources in ledger entries. The behavior we want is that inserting a row into the ledger entries table should update its parent ledger account's "lock_version".

Our solution was two-fold. First, we created a new "ledger account version" model to be one-to-one with every ledger account. This model holds the "lock_version" for its associated ledger account. This way, updating attributes on a ledger account don't cause versioning conflicts when the user is trying to insert ledger entries. Secondly, we called Rails' touch method on the "ledger account version" in a callback after new ledger entries are created against a ledger account. Our "ledger account version" class functions as a counter for whenever new ledger entries are posted against its associated ledger account.

The Interface

HTTP has a similar construct to Rails' "lock_version" in entity tags. From the MDN Web Docs [3]:

The "ETag" HTTP response header is an identifier for a specific version of a resource. It lets caches be more efficient and save bandwidth, as a web server does not need to resend a full response if the content has not changed. Additionally, etags help prevent simultaneous updates of a resource from overwriting each other ("mid-air collisions").

Since the protocol doesn't specify how the ETag should be generated, we could have passed in our "lock_version" version numbers. But because it seemed strange to only honor the ETag headers for a single resource in our API, we decided against it. We also didn't want discerning developers to be confused about whether the version returned referred to the ledger account object literally, or the collection of its posted ledger entries as we actually intend.

The resulting create call looks something like this:

If either "lock_version" differs, we rollback the database transaction and fail the request. Since not all requests to create ledger transactions need these concurrency guarantees, we made passing the "lock_version" optional. In those cases, we update the "lock_version" asynchronously in a background job, since it only needs to be eventually consistent.

Summary

As a serverless database, Ledgers needs to have robust data consistency features. Packaging optimistic locking in our API was an interesting technical challenge, but we're open to feedback. We're certainly not the only API company that offers some form of concurrency control over HTTP. We're also probably not the only ones to encounter a situation where a parent resource is versioned by a collection of child resources.

If you have any suggestions, feel free to contact me at andy@moderntreasury.com. Likewise, if you enjoy designing well-abstracted APIs, Modern Treasury is hiring software engineers. If you're interested or want to learn more, please reach out.