Webinar

Watch Goldman Sachs, Nacha, and Modern Treasury discuss the future of embedded payments.Watch the webinar.

Journal|||January 4, 2023

How to Scale a Ledger, Part V

Image of Matt McNierney
Matt McNierneyEngineering

Note: This post is the fifth chapter of a broader technical paper on How to Scale a Ledger.

Ledger Guarantees: Immutability & Double-Entry

At the beginning of this white paper, we outlined four guarantees that ledger databases should give. Let’s dive deeper into what these guarantees mean and how the Ledgers API is able to provide them.

Immutability

Every state of the ledger must be recorded and can be easily reconstructed.

Immutability is the most important guarantee from a ledger. You may be wondering how the ledger described in this paper can possibly be immutable given that all of the data models are mutable.

  • Accounts: The balances change as Entries are written to them.
  • Transactions: The state can change from pending to either posted or archived. While the Transaction is pending, the Entries can change too.
  • Entries: Entries can be discarded.

These mutable fields on our core data model help the ledger match real world money movement. But ultimately, the data model is a facade that’s built on top of an immutable log. This log can be queried in a few different ways to reconstruct past states.

Ledger state by time

We’ve already covered how effective_at allows us to see what the balance on an Account was at an earlier time. We can also see the Entries that were present on an Account at a timestamp, since no Entries are ever actually deleted. The Entries on Account account_a at effective time timestamp_a can be fetched by filtering:

  • Account ID is account_a
  • effective_at is less than or equal to timestamp_a
  • discarded_at is null or greater than or equal to timestamp_a

This kind of query can be helpful if you are reading from the ledger from a job. Passing in a timestamp when querying Entries can help the job return consistent results regardless of when it is scheduled.

Ledger state by version

As we’ve seen before, timestamps aren’t sufficient to allow clients to query all past states in ledgers that are operating at scale. For one, we’ve allowed Entries to be written at any point in the past using the effective_at timestamp. Also, what if Entries share the same effective_at or discarded_at timestamp?

Ledgers should solve the drawbacks of timestamps by introducing versions on the main data models. Account versions that are persisted on Entries allow us to query exactly which Entries correspond to a given Account balance.

Account versions aren’t enough to see how Entries were grouped together in past states of a Transaction. Since the Entries on a Transaction can be modified until the Transaction is in the posted state, we need to preserve past versions of a Transaction to be able to fully reconstruct past states of the ledger. The Ledger API stores all past states of the ledger with another field on Transaction:

Version for Ledger API
The version field in the ledger API tells the current transaction version and can show previous states.

Transactions report their current version, but also past versions of the Transaction can be retrieved by specifying a version when querying.

Past Transaction versions are used when trying to reconstruct the history of a particular money movement event that involves multiple changing Accounts. Consider a bill-splitting app. A pending Transaction represents the bill when it’s first created, and users can add themselves to subsequent versions of the bill before it is posted.

Version 0
Version 1
Version 2

Audit logs

At this point, we’ve recorded past states of the main ledger data models, but we haven’t recorded who made the changes. As ledgers scale to more product use cases, it’s common for the ledger to be modified by multiple API keys and also by internal users through an admin UI. We recommend an audit log to live alongside the ledger to record that kind of information.

The Ledgers API audit logs contain these fields:

Audit log fields
A table showing audit log fields and their descriptions.

Double-entry

All money movement must record the source and destination of funds.

Once a ledger is immutable, the next priority is to make sure money can’t be created or destroyed out of nowhere. We do this by validating that every money movement records both the source and destination of funds. The principles of double-entry accounting helps us do this in a way that’s worked for centuries. We’ll focus just on the implementation details here—if you want a primer on how to use double-entry accounting, check out our guide, Accounting For Developers.

Double-entry accounting must be enforced at the application layer. When a client requests to create a Transaction, the application validates that the sum of the Entries with direction credit equals the sum of the Entries with direction debit.

This works well until a Transaction contains Entries from Accounts with different currencies. Imagine a platform that allows its users to purchase BTC, and wants to represent that purchase as an atomic Transaction. Here’s one way to structure that Transaction:

Double Entry 1
One way to structure a transaction for a BTC purchase.

This Transaction involves 4 accounts, 2 for Alice (one in USD and one in BTC, credit-normal), and 2 for the platform (one in USD and one in BTC, debit-normal). This is a valid Transaction, and the credit and debit Entries still match by the original summing method.

The original method breaks down when the Entries across currencies match, but the Transaction isn’t balanced. Here’s an example:

Double Entry 2

Even though 100 + 100 = 200, this Transaction is crediting an extra 0.000001 BTC out of nowhere, and Alice was debited $1, which vanished. The correct validation is to group Entries by currency and validate that credits and debits for each currency match.

You might think that because $18,948.90 buys 1 BTC, we could implement currency conversion Transactions with just two entries—one for USD, and one for BTC, and they balance based on the exchange rate of USD to BTC. There are three main issues with that implementation:

  1. Because currency exchange rates fluctuate over time, it would be difficult to verify that the ledger was balanced in the past.
  2. There isn’t a universally agreed-upon exchange rate for all currencies. It would be very difficult for the client and the ledger to agree that a Transaction is balanced.
  3. Having just two Accounts doesn’t match the reality of how currency conversion is performed. To allow a user to convert dollars to BTC, the platform must have a pool of BTC to give and a pool of dollars to place the user’s money. That process will always involve at least four Accounts.

Concurrency controls

It’s not possible to unintentionally spend money, or spend it more than once.

With a ledger that’s immutable and enforces double-entry semantics, you’re all set—until your app is successful and starts scaling. Concurrency controls in the Ledgers API prevent inconsistencies borne out of race conditions. Money shouldn’t be unintentionally or double spent, even when Transactions are written on the same Accounts at the same time.

We’ve already covered how our Entry data model, with version and balance locks, prevents the unintentional spending of money (see “Authorizing” above). Nothing we’ve designed so far prevents accidentally double-spending money, though. Here’s a common scenario:

  1. A client sends a request to write a Transaction.
  2. The ledger is backed up and is taking a long time to respond. The client times out.
  3. The client retries the request to write a Transaction.
  4. Concurrently with 3, the ledger actually finished creating the Transaction from 1, even though the client was no longer waiting for the response.
  5. The ledger also creates the Transaction from 3. The end state is that the ledger moved money twice when the client only wanted to do it once.

How can the ledger detect that the client is retrying rather than actually trying to move money more than once? The solution is deduplicating requests using idempotency keys.

An idempotency key is a string sent by the client. There aren’t any requirements for this string, but it is the client’s responsibility to send the same string when it’s retrying a previous request. Here’s some simplified client code that’s resilient to retries:

1idempotency_key = generate_uuid()
2
3response = None
4while response is None or not response.is_successful():
5  response = create_transaction(idempotency_key)

Notice that the idempotency key is generated outside of the retry loop, ensuring that the same string is sent when the client is retrying.

On the ledger side, the idempotency keys must be stored for 24 hours to handle these transient timeout errors. When it sees an idempotency key that’s already stored, it returns the response from the previous request. It’s the client’s responsibility to ensure that no request that moves money is retried past a 24-hour period.

Efficient aggregations

Retrieving the current and historical balances of an Account is fast.

Our ledger is now immutable, balanced, and handles concurrent writes and client retries. But is it fast? You may have already noticed in the “Entry” section above that calculating Account balances is an O(n) operation, because we need to sum across n Entries. That’s fine with 100s or 1,000s of Entries, but it’s too slow once we have 10,000s or 100,000s of Entries. It’s also too slow if we’re asking the ledger to compute the balance of many Accounts at the same time.

The Ledgers API solves this problem by caching Account balances, so that fetching the balance is a constant time operation. As we covered in the section on Accounts, we can compute all balances from pending_debits, pending_credits, posted_debits, and posted_credits—so those are the numbers we need to cache. We propose two caching strategies, one for the current Account balance and one for balances at a timestamp.

Current Account balances

The current Account balance cache is for the balance that reflects all Entries that have been written to the ledger, regardless of their effective_at timestamps. This cache is used for:

  • Displaying the balance on an Account when an effective_at timestamp is not supplied by the client.
  • Enforcing balance locks. A correct implementation of balance locking relies on a single database row containing the most up-to-date balance.

Because this cache is used for enforcing balance locks, it needs to be updated synchronously when authorizing Entries are written to an Account. This synchronous update is at the expense of response time when creating Transactions.

So that updates to the balance aren’t susceptible to stale reads, we trust the database to do the math atomically. Here’s an example Entry that we’ll use to show how to update the cache:

Current account balances
Example entry, "alice_entry_1".

The entry contains a posted debit of 100, so we atomically increment the pending_debit and posted_debit fields in the cache when it’s written.

We also need to update the cache when a Transaction changes state from pending to posted or archived. If we archived a pending debit of 100, we’d decrease pending_debits by 100 in the cache. Similarly, if the pending debit was posted, we’d increase posted_debits by 100.

Effective time balances

We also need reads for Account balances that specify an effective_at timestamp to be fast. This caching is a much trickier problem, so many ledgers avoid it by not supporting effective_at timestamps at all, at the expense of correctness for recorded Transactions. We’ve found a few methods that work, and we’ll cover two approaches here.

Anchoring

Assuming that approximately a constant number of Entries are written to an Account within each calendar date, we can get a constant time read of historical balances by caching the balance at the end of each date, and then applying the intra-day Entries when we read. Our cache saves anchor points for balance reads—for every date that has Entries, the cache can report the end-of-day balance in constant time.

The effective date cache should store the numbers that are needed to compute all balances.

Anchoring
Table showing fields for effective date cache.

Assuming we’ve cached the balance for 2022-08-31, we can get the balance for 2022-09-01T22:22:41Z with this algorithm:

  1. Get the cache entry for an effective_date that’s less than or equal to 2022-08-31. Store latest_effective_date as the effective date for the cache entry found.
  2. Sum all Entries by direction and status for latest_effective_date <= entry.effective_at <= '2022-09-01T22:22:41Z'
  3. Report the Account balance as the sum of step 1 and step 2.

Keeping the cache up to date is the bigger challenge. While we could update the current Account balance cache with one-row upsert, for the effective date cache we potentially need to update many rows—one for each date between the effective_at of the Transaction to be written, and the greatest effective date across all Entries written to the ledger.

Since it’s no longer possible to update the cache synchronously as we write Transactions, we update it asynchronously. Each Entry is placed in a queue after the Transaction is initially written, and we update the cache to reflect each Entry in batches, grouped by Account and effective date.

Anchoring chart
Chart showing entries in batches, grouped by account and effective date.

Since this processing is happening asynchronously, what happens if a client requests the balance of an Account at an effective time, but recent Entries haven’t yet been reflected in the cache? We can still return an up-to-date balance by applying Entries still in the queue to balances at read time. Ultimately, an effective_at balance is:

1balance = cached_by_date_balance + intraday_entries_sum + queued_entries_sum

Resulting balances

When individual Accounts have many intraday Entries, the anchoring approach won’t scale. We can increase the granularity of the cache to hours, minutes, or seconds. But we can truly get constant time access to Account balances by storing a resulting balance cache. Each entry in the cache stores the resulting balance on the associated Account after the Entry was applied.

The cache is simpler now, but updating it is more expensive. When an Entry is written with an earlier effective_at than the max effective_at for the Account, all cache entries for Entries after the new effective_at need to be updated.

A detailed discussion of how the cache is updated is beyond the scope of this paper. It’s no longer possible to update every cache entry atomically in the database, so each cache entry needs to keep track of how up to date it is.

Monitoring

Reading Account balances from a cache improves performance, but it also means that balance reads are no longer from the source of truth—the Entries. It’s possible for balances to diverge from Entries due to bugs. It’s important to be able to quickly notice and remedy when this drift has occurred.

The Ledgers API has:

  1. A regularly scheduled process that verifies that each Account’s cached balances match the sum of Entries.
  2. A way to automatically turn off cache reads for Accounts that have drifted.
  3. A backfill process to correct the cache.
  4. An on-call rotation and runbook for engineers to address cache drift problems 24/7.

Next Steps

This is the fifth chapter of a broader technical paper with a more comprehensive overview of the importance of solid ledgering and the amount of investment it takes to get right. If you want to learn more, download the paper, or get in touch.

Read the rest of the series:

Part I | Part II | Part III | Part IV | Part VI

Try Modern Treasury

See how smooth payment operations can be.

Talk to us
Share

Copied!