High Level System Design - Banking Application
Designing a scalable banking application that allows users to manage multiple accounts with full CRUD operations, ensuring ACID compliance and security while optimizing for high-volume transaction periods like paydays.
In this article, I will walk through how to design a banking application in a system design interview. The goal is not to design every banking feature that exists in the real world. The goal is to show that we can reason clearly about money movement, consistency, security, auditability, and scale.
A banking application lets customers open accounts, deposit funds, withdraw funds, transfer money, and inspect balances and transaction history. Apps from Chase, Capital One, Revolut, or Chime all contain versions of these workflows. What makes this problem interesting is that a bank is not just another CRUD app. A social feed can tolerate a stale like count for a few seconds. A banking system cannot casually lose money, double-charge a customer, or show a balance that allows a user to spend funds they do not have.
The core interview signal is correctness. The interviewer wants to see whether we understand ACID transactions, double-entry ledgers, concurrency control, idempotency, data modeling, access control, and the tradeoff between strong consistency for balances and high availability for lower-risk reads.
1. Clarify the Problem
Before jumping into architecture, we should clarify the scope. In a real interview, this is where we slow the conversation down and ask enough questions to avoid designing the wrong thing.
Good clarifying questions include:
- Are we designing checking and savings accounts only, or do we also need credit cards and loans?
- Do users have one account or multiple accounts?
- Are transfers only internal between accounts in our bank, or do we need external ACH, wire, card, or real-time payment networks?
- Do deposits settle immediately, or can some deposits be pending?
- Do we need to support multiple currencies?
- Are account balances required to be strongly consistent on every read?
- What is the expected scale: users, accounts, transactions per second, and peak traffic during payday?
- Do we need administrator, teller, or support workflows?
- What compliance expectations should we mention, such as audit logs, encryption, retention, and access control?
For this design, I will make a conservative interview-friendly assumption: we are building the core banking ledger and customer account experience for checking and savings accounts. We will support deposits, withdrawals, internal transfers, balances, and transaction history. External networks such as ACH, card rails, wire transfers, and Zelle-like integrations can exist at the boundary, but we will not design those networks in depth.
2. Functional Requirements
The banking application should allow customers to create and close bank accounts. A single user may have multiple accounts, such as one checking account and one savings account. An account can be active, frozen, or closed. Closing an account should only be allowed when the account has no unsettled transactions and, depending on policy, a zero balance or an approved final payout path.
Customers should be able to deposit money into an account. Deposits can come from several sources: cash deposits at a branch or ATM, check deposits, payroll direct deposits, or third-party payment apps. Not every deposit has the same settlement behavior. A cash deposit may become available immediately, while a check deposit may be held until it clears. A payroll deposit may arrive as a file or message from an external payment rail and should be processed idempotently.
Customers should be able to withdraw money from an account. Withdrawals may happen as ATM cash withdrawals, branch cash withdrawals, checks, or internal payout flows. A withdrawal must validate available funds, account status, limits, and authorization before completing.
Customers should be able to transfer money between accounts. For this article, the transfer is internal: both the source account and destination account live inside our system. Internal transfers are the cleanest way to demonstrate ACID money movement because we can debit one account and credit the other in the same database transaction.
Customers should be able to view balances and transaction history. Balance reads are user-facing and sensitive. Transaction history can be paginated, filtered, and served from read-optimized storage as long as we are explicit about freshness guarantees.
3. Out of Scope
It is useful to state what we are intentionally not designing. This protects the interview from growing into five different payment systems at once.
Out of scope for the first version:
- Credit cards
- Loans and interest calculation
- Full fraud machine learning system
- External ACH, card, wire, or RTP network design
- Stock and crypto trading
- Bill pay
- Merchant acquiring
- International remittance
- Dispute and chargeback workflows
Some of these topics may still appear as integrations or future extensions. For example, a fraud service can consume transaction events after a transfer is created, but the core ledger should not depend on an asynchronous fraud pipeline to maintain basic balance correctness.
4. Non-Functional Requirements
The most important non-functional requirement is strong consistency for balance-changing operations. Deposits, withdrawals, and transfers must be processed exactly once from the user's perspective, even if the client retries, the network times out, or two requests arrive concurrently.
Money movement must be ACID-compliant. If a transfer debits one account and credits another, both changes must commit together or neither should commit. A partial transfer is unacceptable.
The system should be highly available for reads, especially for transaction history, account lists, and profile data. However, we should be careful with the phrase "high availability" when discussing balances. The latest balance should come from the strongly consistent source of truth or a read path with explicit consistency guarantees.
Security and authorization are first-class requirements. Users must only access their own accounts. Sensitive data must be encrypted in transit and at rest. Administrative actions must be logged. Authentication should support strong password hashing, MFA, session management, and risk-based controls.
The system must be auditable. Every money movement should leave an immutable trail showing what happened, when it happened, who initiated it, which accounts were affected, and what the resulting balances were.
The system must handle spikes, especially payroll or payday traffic. Payday can create bursts of deposits, balance checks, and transfers. The architecture should absorb read spikes with caching and replicas while keeping money movement serialized correctly where required.
The money movement APIs should be idempotent. A retry should not create a second deposit, withdrawal, or transfer.
5. Key Design Principle: Ledger First
A common mistake is to model the account balance as the entire truth. For example, we might store an accounts.balance column and simply update it for every deposit or withdrawal. That is fast, but by itself it is not enough for a banking system.
The safer design is ledger-first. The ledger is an immutable record of financial entries. The current balance can be stored on the account row for fast reads, but the ledger explains how that balance was produced.
This gives us several advantages:
- We can audit every movement of money.
- We can reconstruct balances if needed.
- We can reconcile internal records against external settlement systems.
- We can debug customer support issues.
- We can detect accounting inconsistencies.
In an interview, saying "the account balance is a cached projection of the ledger, not the only source of truth" is a strong signal. In practice, many systems store both: an immutable ledger for correctness and a current balance column for efficient reads and locking.
6. Core Entities
The exact schema can vary, but we need a few core concepts: users, accounts, transactions, ledger entries, and idempotency records.
User
The user represents a customer who can own one or more bank accounts.
User
- id
- name
- email
- phone
- password_hash
- status: ACTIVE | LOCKED | CLOSED
- created_at
- updated_at
In a real system, identity and KYC details may live in separate services. For this high-level design, the important relationship is that a user owns accounts and authenticates into the application.
Account
The account stores the current state of a checking or savings account.
Account
- id
- user_id
- type: CHECKING | SAVINGS
- currency
- current_balance
- available_balance
- status: ACTIVE | FROZEN | CLOSED
- created_at
- updated_at
It is useful to distinguish current_balance from available_balance. Current balance reflects posted ledger activity. Available balance reflects what the customer can spend after holds, pending deposits, or reserved funds. If the interview does not require this level of detail, we can simplify to one balance, but mentioning the distinction shows practical banking awareness.
Transaction
A transaction represents a business-level money movement request.
Transaction
- id
- type: DEPOSIT | WITHDRAWAL | TRANSFER
- status: PENDING | COMPLETED | FAILED | REVERSED
- amount
- currency
- source_account_id nullable
- destination_account_id nullable
- idempotency_key
- requested_by_user_id
- failure_reason nullable
- created_at
- completed_at nullable
The transaction record is what the user sees in transaction history. It groups together the lower-level ledger entries that actually update account balances.
LedgerEntry
The ledger entry is the immutable accounting record.
LedgerEntry
- id
- transaction_id
- account_id
- direction: DEBIT | CREDIT
- amount
- currency
- balance_after
- created_at
For a transfer, we create at least two ledger entries: one debit from the source account and one credit to the destination account. The sum of debits and credits should balance for each completed transaction.
IdempotencyKey
The idempotency record prevents duplicate processing when a client retries the same logical request.
IdempotencyKey
- id
- user_id
- idempotency_key
- request_hash
- transaction_id nullable
- response_status
- created_at
- expires_at
The request_hash is important. If the client reuses the same idempotency key with a different amount or destination account, we should reject it rather than returning a misleading response.
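To make the request_hash idea concrete, here is a minimal Python sketch. It assumes the hash covers the HTTP method, path, and JSON body, and that SHA-256 over canonical JSON (sorted keys, no extra whitespace) is an acceptable fingerprint; both are illustrative choices, not a required format.

```python
import hashlib
import json


def request_hash(method: str, path: str, body: dict) -> str:
    """Fingerprint the parts of a request that define what money moves."""
    # sort_keys and fixed separators make the JSON canonical, so the same
    # logical request hashes identically regardless of key order or spacing.
    canonical = json.dumps(
        {"method": method, "path": path, "body": body},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


original = request_hash("POST", "/transfers", {"toAccountId": "acct_2", "amount": 10000})
retried = request_hash("POST", "/transfers", {"amount": 10000, "toAccountId": "acct_2"})
tampered = request_hash("POST", "/transfers", {"toAccountId": "acct_2", "amount": 99999})
print(original == retried)   # True: a genuine retry
print(original == tampered)  # False: same key, different request, so reject it
```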
7. API Design
The API should separate account management from money movement. Account management can look like a regular REST API, while money movement endpoints need stricter validation, idempotency, and authorization.
Account APIs
POST /accounts
GET /accounts
GET /accounts/:accountId
POST /accounts/:accountId/close
I prefer POST /accounts/:accountId/close over DELETE /accounts/:accountId because closing a bank account is a business workflow, not a simple deletion. We do not delete financial records.
Create account request:
{
  "type": "CHECKING",
  "currency": "USD"
}
Deposit API
POST /accounts/:accountId/deposits
Idempotency-Key: dep_abc_123
{
  "amount": 50000,
  "currency": "USD",
  "sourceType": "CASH"
}
Amounts should be represented in minor units, such as cents, to avoid floating-point errors. A deposit of $500.00 becomes 50000.
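Here is a small Python sketch of that conversion, assuming a two-decimal currency such as USD. It parses with Decimal rather than float and rejects fractional cents instead of rounding them; to_minor_units and format_minor_units are hypothetical helper names.

```python
from decimal import Decimal, InvalidOperation

MINOR_UNITS_PER_MAJOR = 100  # assumes a two-decimal currency such as USD


def to_minor_units(amount: str) -> int:
    """Convert a decimal string such as "500.00" into integer minor units (cents)."""
    try:
        value = Decimal(amount)
    except InvalidOperation:
        raise ValueError(f"not a number: {amount!r}") from None
    if not value.is_finite():
        raise ValueError(f"not a finite amount: {amount!r}")
    minor = value * MINOR_UNITS_PER_MAJOR
    # Reject fractional cents rather than silently rounding them away.
    if minor != minor.to_integral_value():
        raise ValueError(f"too many decimal places: {amount!r}")
    return int(minor)


def format_minor_units(minor: int) -> str:
    """Render integer cents for display, e.g. 50000 -> "$500.00"."""
    sign = "-" if minor < 0 else ""
    major, cents = divmod(abs(minor), MINOR_UNITS_PER_MAJOR)
    return f"{sign}${major:,}.{cents:02d}"


print(to_minor_units("500.00"))   # 50000
print(format_minor_units(50000))  # $500.00
```

Because the arithmetic stays in integers after parsing, sums such as 10 + 20 cents are exact, which is not true of 0.1 + 0.2 in binary floating point.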
Withdrawal API
POST /accounts/:accountId/withdrawals
Idempotency-Key: wd_abc_456
{
  "amount": 20000,
  "currency": "USD",
  "withdrawalType": "ATM"
}
Transfer API
POST /transfers
Idempotency-Key: tr_abc_789
{
  "fromAccountId": "acct_1",
  "toAccountId": "acct_2",
  "amount": 10000,
  "currency": "USD"
}
Important validation rules:
- The source and destination accounts must exist.
- The caller must be authorized to debit the source account.
- The accounts must be active.
- The amount must be positive.
- The currency must match the accounts or go through a dedicated FX flow.
- The source account must have sufficient available funds.
- The idempotency key must be unique per user; reusing it is only valid when retrying the identical request.
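The rules above translate into a plain validation function. This Python sketch assumes a single-owner account model (joint accounts and delegated access are ignored), amounts already in minor units, and one extra rule that is not in the list: source and destination must differ. It should run only after both account rows are locked inside the transaction, so the balance it checks cannot change underneath it. The idempotency rule is left to the database's unique constraint on idempotency keys.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Account:
    id: str
    owner_user_id: str
    status: str  # ACTIVE | FROZEN | CLOSED
    currency: str
    available_balance: int  # minor units


def validate_transfer(
    caller_user_id: str,
    source: Optional[Account],
    destination: Optional[Account],
    amount: int,
    currency: str,
) -> list[str]:
    """Return every rule violation; an empty list means the transfer may proceed."""
    if source is None or destination is None:
        return ["source and destination accounts must exist"]
    errors: list[str] = []
    if source.id == destination.id:
        errors.append("source and destination must differ")  # assumed extra rule
    if source.owner_user_id != caller_user_id:
        errors.append("caller may not debit the source account")
    if source.status != "ACTIVE" or destination.status != "ACTIVE":
        errors.append("both accounts must be active")
    if amount <= 0:
        errors.append("amount must be positive")
    if currency != source.currency or currency != destination.currency:
        errors.append("currency mismatch; route through the FX flow")
    if source.available_balance < amount:
        errors.append("insufficient available funds")
    return errors


checking = Account("acct_1", "user_1", "ACTIVE", "USD", 50_000)
savings = Account("acct_2", "user_1", "ACTIVE", "USD", 0)
print(validate_transfer("user_1", checking, savings, 60_000, "USD"))  # ['insufficient available funds']
```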
8. High-Level Architecture
At a high level, the system can be organized like this:
Mobile/Web Client
        |
        v
   API Gateway
        |
        v
Auth / Session Layer
        |
        v
Banking API Service
        |
        +--------------------+
        |                    |
        v                    v
Postgres Primary        Redis Cache
        |
        v
  Read Replicas
        |
        v
Analytics / Statements / Search
Banking API Service
        |
        v
  Outbox Table
        |
        v
Outbox Publisher
        |
        v
 Message Broker
        |
        +--> Notification Service
        +--> Fraud / Risk Service
        +--> Statement Service
        +--> Audit Pipeline
The Banking API Service owns synchronous money movement. It talks to the primary relational database for deposits, withdrawals, transfers, and fresh balance reads.
Redis can be used for low-risk cached data, rate limiting, session support, and possibly account summaries with careful invalidation. It should not be the source of truth for balances.
Read replicas can serve transaction history, statements, and account lists where a small amount of replica lag is acceptable. For fresh balance reads immediately after a transfer, we should read from the primary or use a consistency token/versioning approach.
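The consistency-token idea can be sketched as a small routing function. Each write returns a token recording how far the primary had committed, and a replica may serve that client's next read only if it has applied at least that much. In Postgres, one possible source for such a token is a WAL position (pg_current_wal_lsn() read on the primary after the commit, compared with pg_last_wal_replay_lsn() on a replica), but this Python sketch simply uses integers; Node and choose_read_node are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    applied_version: int  # highest commit position this node has applied


def choose_read_node(primary: Node, replicas: list[Node], token: Optional[int]) -> Node:
    """Route a read to a replica only if it has caught up to the client's last write."""
    if token is None:
        # No recent write from this client, so any replica is fresh enough here
        # (a real router would also weigh replica health and load).
        return replicas[0] if replicas else primary
    for replica in replicas:
        if replica.applied_version >= token:
            return replica
    return primary  # every replica is still behind this client's write


primary = Node("primary", applied_version=120)
replicas = [Node("replica-1", applied_version=100), Node("replica-2", applied_version=118)]
print(choose_read_node(primary, replicas, token=110).name)  # replica-2
print(choose_read_node(primary, replicas, token=119).name)  # primary
```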
Asynchronous consumers should handle notifications, analytics, statements, and risk scoring that does not need to block the core transaction. The outbox pattern helps ensure that events are published reliably after the database transaction commits.
The outbox publisher is a small background worker, not a direct user-facing service. It connects to the same database as the Banking API Service, reads unpublished rows from the outbox_events table, publishes those events to the message broker, and then marks them as published. In other words, the Banking API Service writes durable events, and the outbox publisher delivers them.
9. Database Choice
For the source of truth, a relational database such as Postgres is a strong default choice.
Postgres gives us:
- ACID transactions
- Row-level locking
- Foreign key constraints
- Unique indexes
- Transaction isolation levels
- Mature replication
- Partitioning options
- Operational familiarity
This is a good fit because money movement is relational and consistency-sensitive. We need to atomically update account rows, insert transaction records, insert ledger entries, and enforce uniqueness on idempotency keys.
A NoSQL database can be useful in surrounding systems such as analytics, search, or denormalized transaction history views, but it is not the best default source of truth for balances unless the team has a very specific design that preserves equivalent transactional guarantees.
10. ACID Transfer Flow
An internal transfer is the most important flow to explain because it touches two accounts and must be atomic.
The basic flow:
BEGIN;
1. Validate the idempotency key.
2. Lock the source and destination account rows.
3. Validate account status, ownership, currency, and limits.
4. Validate sufficient available balance.
5. Insert a transaction record.
6. Debit the source account.
7. Credit the destination account.
8. Insert ledger entries with balance_after values.
9. Mark the transaction as COMPLETED.
10. Insert an outbox event.
COMMIT;
All of this should happen in one database transaction. If any step fails, we roll back the entire operation.
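The flow above can be sketched in Python with the standard-library sqlite3 module, used here only so the example runs without a database server. The schema is trimmed to the columns the flow touches, the idempotency check and authorization are left out, and SQLite's BEGIN IMMEDIATE locks the whole database for writes, which is much coarser than the per-row FOR UPDATE locks Postgres would take. Treat it as an illustration of the all-or-nothing structure, not a production implementation.

```python
import sqlite3


class TransferError(Exception):
    """A business-rule failure that aborts the transfer."""


SCHEMA = """
CREATE TABLE accounts (id TEXT PRIMARY KEY, status TEXT NOT NULL,
                       balance INTEGER NOT NULL CHECK (balance >= 0));
CREATE TABLE transactions (id INTEGER PRIMARY KEY, type TEXT NOT NULL,
                           status TEXT NOT NULL, amount INTEGER NOT NULL);
CREATE TABLE ledger_entries (id INTEGER PRIMARY KEY, transaction_id INTEGER NOT NULL,
                             account_id TEXT NOT NULL, direction TEXT NOT NULL,
                             amount INTEGER NOT NULL, balance_after INTEGER NOT NULL);
CREATE TABLE outbox_events (id INTEGER PRIMARY KEY, transaction_id INTEGER NOT NULL,
                            published INTEGER NOT NULL DEFAULT 0);
"""


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> int:
    """Move `amount` minor units from src to dst in one transaction; return the transaction id."""
    if amount <= 0 or src == dst:
        raise TransferError("amount must be positive and accounts must differ")
    # BEGIN IMMEDIATE takes SQLite's database-wide write lock up front, a coarse
    # stand-in for Postgres row locks taken with SELECT ... FOR UPDATE.
    conn.execute("BEGIN IMMEDIATE")
    try:
        rows = {
            row[0]: (row[1], row[2])
            for row in conn.execute(
                "SELECT id, status, balance FROM accounts WHERE id IN (?, ?) ORDER BY id",
                (src, dst),
            )
        }
        if len(rows) != 2:
            raise TransferError("unknown account")
        (src_status, src_bal), (dst_status, dst_bal) = rows[src], rows[dst]
        if src_status != "ACTIVE" or dst_status != "ACTIVE":
            raise TransferError("both accounts must be active")
        if src_bal < amount:
            raise TransferError("insufficient funds")
        txn_id = conn.execute(
            "INSERT INTO transactions (type, status, amount) VALUES ('TRANSFER', 'PENDING', ?)",
            (amount,),
        ).lastrowid
        new_src, new_dst = src_bal - amount, dst_bal + amount
        conn.execute("UPDATE accounts SET balance = ? WHERE id = ?", (new_src, src))
        conn.execute("UPDATE accounts SET balance = ? WHERE id = ?", (new_dst, dst))
        conn.executemany(
            "INSERT INTO ledger_entries (transaction_id, account_id, direction, amount, balance_after)"
            " VALUES (?, ?, ?, ?, ?)",
            [(txn_id, src, "DEBIT", amount, new_src), (txn_id, dst, "CREDIT", amount, new_dst)],
        )
        conn.execute("UPDATE transactions SET status = 'COMPLETED' WHERE id = ?", (txn_id,))
        conn.execute("INSERT INTO outbox_events (transaction_id) VALUES (?)", (txn_id,))
        conn.execute("COMMIT")
        return txn_id
    except Exception:
        conn.execute("ROLLBACK")  # any failure undoes every step above
        raise


# isolation_level=None disables the sqlite3 module's implicit transactions, so the
# explicit BEGIN/COMMIT/ROLLBACK above are the only transaction boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript(SCHEMA)
conn.execute("INSERT INTO accounts VALUES ('acct_a', 'ACTIVE', 100000), ('acct_b', 'ACTIVE', 15000)")
transfer(conn, "acct_a", "acct_b", 10000)
```

After the call, the balances are 90000 and 25000, and a failed transfer (for example one that exceeds the source balance) leaves no transaction, ledger, or outbox rows behind.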
One subtle but important detail is lock ordering. If two concurrent transfers lock the same accounts in opposite order, they can deadlock.
Bad lock pattern:
Transfer A: lock account 1, then account 2
Transfer B: lock account 2, then account 1
If A holds account 1 while B holds account 2, each waits on the other until Postgres detects the cycle and aborts one of the two transactions.
Safer lock pattern:
Always lock the smaller account_id first.
Then lock the larger account_id.
In SQL, this can be done by selecting both account rows in deterministic order with FOR UPDATE.
SELECT *
FROM accounts
WHERE id IN ($1, $2)
ORDER BY id
FOR UPDATE;
The application can then identify which locked row is the source and which is the destination.
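The same ordering rule applies to any lock manager, not just Postgres. In this Python sketch, threading.Lock objects stand in for row locks, and sorting the two account IDs mirrors ORDER BY id ... FOR UPDATE. Two threads repeatedly transfer in opposite directions, which is exactly the pattern that can deadlock when each thread locks its own source account first.

```python
import threading
from contextlib import contextmanager

# One lock per account stands in for the database row lock.
account_locks = {"acct_1": threading.Lock(), "acct_2": threading.Lock()}
balances = {"acct_1": 10_000, "acct_2": 10_000}


@contextmanager
def lock_both(a: str, b: str):
    """Acquire two account locks in sorted order, like ORDER BY id ... FOR UPDATE."""
    first, second = sorted((a, b))
    with account_locks[first], account_locks[second]:
        yield


def transfer(src: str, dst: str, amount: int) -> bool:
    with lock_both(src, dst):
        if balances[src] < amount:
            return False
        balances[src] -= amount
        balances[dst] += amount
        return True


def run(src: str, dst: str, times: int) -> None:
    for _ in range(times):
        transfer(src, dst, 1)


# Opposite directions: locking "source first" here could deadlock,
# but sorted locking means both threads always finish.
workers = [
    threading.Thread(target=run, args=("acct_1", "acct_2", 5_000)),
    threading.Thread(target=run, args=("acct_2", "acct_1", 5_000)),
]
for w in workers:
    w.start()
for w in workers:
    w.join(timeout=10)
print(any(w.is_alive() for w in workers), balances)  # False {'acct_1': 10000, 'acct_2': 10000}
```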
11. Preventing Double Spending
Double spending happens when two requests see the same balance and both think they can spend it.
Example:
Account balance: $100
Request A withdraws $80
Request B withdraws $80
Without locking, both requests might read the $100 balance and both might succeed. The account would either go negative or the system would record an impossible state.
With row-level locking:
Request A locks the account.
Request B waits.
Request A withdraws $80 and commits.
Request B reads the updated balance of $20.
Request B fails because funds are insufficient.
The key operation is:
SELECT *
FROM accounts
WHERE id = $1
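FOR UPDATE;
This ensures only one balance-changing operation can update the same account row at a time. The lock is held until the surrounding transaction commits or rolls back, so the read, the balance check, and the update must all happen inside that one transaction.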
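The $100 example can be reproduced in a few lines of Python. A threading.Lock stands in for the row lock, and a Barrier releases both requests at the same instant; exactly one $80 withdrawal succeeds no matter which thread wins.

```python
import threading

balance = 10_000  # $100.00 in cents
account_lock = threading.Lock()  # stands in for the row lock from SELECT ... FOR UPDATE
start_together = threading.Barrier(2)
results: list[bool] = []


def withdraw(amount: int) -> None:
    global balance
    start_together.wait()  # release both requests at the same moment
    with account_lock:
        # Read, check, and write all happen while holding the lock, so the
        # second request sees the balance the first one left behind.
        if balance >= amount:
            balance -= amount
            results.append(True)
        else:
            results.append(False)


requests = [threading.Thread(target=withdraw, args=(8_000,)) for _ in range(2)]
for t in requests:
    t.start()
for t in requests:
    t.join()
print(balance, sorted(results))  # 2000 [False, True]
```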
Another option is optimistic concurrency control with a version column:
UPDATE accounts
SET current_balance = current_balance - $1,
version = version + 1
WHERE id = $2
AND version = $3
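AND current_balance >= $1;
If the update count is zero, either another writer changed the row first (the version no longer matches) or the funds are insufficient. The service re-reads the row to tell which case applies, then retries or fails gracefully. Optimistic locking can work well when contention is low. For bank account withdrawals and transfers, row-level locking is often easier to explain and reason about in an interview.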
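Here is the optimistic version as a Python sketch against SQLite, chosen only so it runs without a server. The table is reduced to the three columns the query needs, ? placeholders replace the $1/$2/$3 parameters, and the budget of three attempts is an arbitrary assumption. A zero row count is resolved by reading the row again, which separates a lost race (retry) from insufficient funds (stop).

```python
import sqlite3


def try_debit(conn: sqlite3.Connection, account_id: str, amount: int, expected_version: int) -> bool:
    """Compare-and-swap debit: applies only if the row still has the version we read."""
    cur = conn.execute(
        """
        UPDATE accounts
        SET current_balance = current_balance - ?, version = version + 1
        WHERE id = ? AND version = ? AND current_balance >= ?
        """,
        (amount, account_id, expected_version, amount),
    )
    return cur.rowcount == 1


def debit_with_retry(conn: sqlite3.Connection, account_id: str, amount: int, max_attempts: int = 3) -> str:
    for _ in range(max_attempts):
        balance, version = conn.execute(
            "SELECT current_balance, version FROM accounts WHERE id = ?", (account_id,)
        ).fetchone()
        if balance < amount:
            return "INSUFFICIENT_FUNDS"  # a real failure, so retrying will not help
        if try_debit(conn, account_id, amount, version):
            conn.commit()
            return "OK"
        # Zero rows updated although funds looked sufficient: another writer got
        # there first. Loop to re-read the new balance and version.
    return "CONFLICT"  # still losing races after max_attempts; surface a retryable error


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, current_balance INTEGER, version INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('acct_1', 10000, 0)")
print(debit_with_retry(conn, "acct_1", 8000))  # OK
print(debit_with_retry(conn, "acct_1", 8000))  # INSUFFICIENT_FUNDS
```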
12. Idempotency
Idempotency protects us from duplicate processing. Clients retry requests all the time because networks are unreliable.
Without idempotency:
User submits a $500 deposit.
Server processes the deposit successfully.
Client times out before receiving the response.
Client retries the same request.
Server processes another $500 deposit.
That is a serious bug. The client intended one deposit, but the system recorded two.
The solution is to require an Idempotency-Key header for money movement APIs. We create a unique constraint such as:
UNIQUE(user_id, idempotency_key)
When the first request arrives, we store the key and process the transaction. If the same request is retried, we return the original result instead of creating a new money movement.
There are a few edge cases:
- If the same key is reused with the same request body, return the original response.
- If the same key is reused with a different request body, return an error.
- If the original request is still in progress, return a pending response or wait for completion.
- If the service crashes after committing the transaction but before responding, the retry should still find the committed transaction and return it.
This is why idempotency records and transaction records should be written in the same database transaction as the money movement.
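Here is a sketch of those rules for a deposit, again in Python with sqlite3 so it runs anywhere. The table and column names follow the entities above but are trimmed down, and the request hash covers only the account and amount, which is an assumption. Because the key row, the transaction row, and the balance update commit together, a retry after a crash finds either all three or none of them.

```python
import hashlib
import json
import sqlite3


class IdempotencyConflict(Exception):
    """The key was already used with a different request body."""


SCHEMA = """
CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL);
CREATE TABLE transactions (id INTEGER PRIMARY KEY, account_id TEXT NOT NULL,
                           type TEXT NOT NULL, amount INTEGER NOT NULL);
CREATE TABLE idempotency_keys (
    user_id TEXT NOT NULL, idempotency_key TEXT NOT NULL,
    request_hash TEXT NOT NULL, transaction_id INTEGER NOT NULL,
    UNIQUE (user_id, idempotency_key));
"""


def deposit(conn: sqlite3.Connection, user_id: str, key: str, account_id: str, amount: int) -> int:
    """Post a deposit at most once per (user_id, key); retries get the original transaction id."""
    body = json.dumps({"account_id": account_id, "amount": amount}, sort_keys=True)
    req_hash = hashlib.sha256(body.encode("utf-8")).hexdigest()
    conn.execute("BEGIN IMMEDIATE")
    try:
        prior = conn.execute(
            "SELECT request_hash, transaction_id FROM idempotency_keys"
            " WHERE user_id = ? AND idempotency_key = ?",
            (user_id, key),
        ).fetchone()
        if prior is not None:
            if prior[0] != req_hash:
                raise IdempotencyConflict(key)  # same key, different request
            conn.execute("COMMIT")
            return prior[1]  # replay: return the original result, move no money
        updated = conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, account_id)
        ).rowcount
        if updated != 1:
            raise ValueError(f"unknown account {account_id}")
        txn_id = conn.execute(
            "INSERT INTO transactions (account_id, type, amount) VALUES (?, 'DEPOSIT', ?)",
            (account_id, amount),
        ).lastrowid
        # The key row commits with the money movement, so a crash can never leave
        # a deposit without its key or a key without its deposit.
        conn.execute(
            "INSERT INTO idempotency_keys VALUES (?, ?, ?, ?)", (user_id, key, req_hash, txn_id)
        )
        conn.execute("COMMIT")
        return txn_id
    except Exception:
        conn.execute("ROLLBACK")
        raise


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript(SCHEMA)
conn.execute("INSERT INTO accounts VALUES ('acct_1', 0)")
first = deposit(conn, "user_1", "dep_abc_123", "acct_1", 50000)
retry = deposit(conn, "user_1", "dep_abc_123", "acct_1", 50000)  # e.g. after a client timeout
```

In Postgres, the UNIQUE(user_id, idempotency_key) constraint is the real backstop: if two copies of a request race past the lookup, the second insert of the key fails and its transaction rolls back, so only one deposit commits.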
13. Deposit Design
Deposits look simple, but different deposit types have different settlement behavior.
A cash deposit can usually be posted immediately. The service validates the account, creates a transaction, credits the account, inserts a ledger entry, and commits.
A check deposit may need a pending state. The customer can see the deposit, but funds may not be available until the check clears. In that case, we may create a pending transaction and update available_balance only after settlement.
Payroll deposits often arrive in batches from an external system. Payday traffic can be heavy because many payroll deposits arrive around the same time and users immediately open the app to check balances. Batch processing should be idempotent at the file level and at the individual deposit instruction level.
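File-level and instruction-level deduplication can be sketched like this in Python. The in-memory sets stand in for durable tables with unique constraints, instruction_id is assumed to be a stable identifier supplied by the payroll sender, and post_deposit is assumed to be the idempotent deposit flow keyed by that ID, which also covers a crash between posting a deposit and recording it as seen.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DepositInstruction:
    instruction_id: str  # assumed unique per instruction from the payroll sender
    account_id: str
    amount: int  # minor units


class PayrollIngestor:
    """Dedupes at the file level and at the instruction level before posting deposits."""

    def __init__(self, post_deposit: Callable[[DepositInstruction], None]) -> None:
        self.post_deposit = post_deposit  # e.g. the idempotent deposit flow
        # In-memory sets stand in for durable tables with unique constraints.
        self.seen_files: set[str] = set()
        self.seen_instructions: set[str] = set()

    def ingest(self, file_id: str, instructions: list[DepositInstruction]) -> int:
        if file_id in self.seen_files:
            return 0  # the whole file was already processed
        posted = 0
        for instruction in instructions:
            if instruction.instruction_id in self.seen_instructions:
                continue  # repeated inside this file or already seen in another file
            self.post_deposit(instruction)
            self.seen_instructions.add(instruction.instruction_id)
            posted += 1
        # Mark the file last: if we crash halfway, re-ingesting the file is safe
        # because instruction-level dedup skips what was already posted.
        self.seen_files.add(file_id)
        return posted


posted: list[DepositInstruction] = []
ingestor = PayrollIngestor(post_deposit=posted.append)
payroll = [
    DepositInstruction("pay_1", "acct_1", 250_000),
    DepositInstruction("pay_2", "acct_2", 180_000),
    DepositInstruction("pay_1", "acct_1", 250_000),  # duplicate line inside the same file
]
```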
For a simple posted deposit:
BEGIN;
1. Validate idempotency key.
2. Lock account.
3. Validate account is active.
4. Insert transaction.
5. Increase account balance.
6. Insert credit ledger entry.
7. Mark transaction completed.
8. Insert outbox event.
COMMIT;
Deposit edge cases:
- Duplicate payroll file or duplicate deposit event
- Deposit into a closed or frozen account
- Unsupported currency
- Amount exceeds limit
- Check deposit later fails
- Client retries after timeout
- Deposit is posted but notification delivery fails
The last case should not roll back the deposit. Notifications belong after commit through an asynchronous event.
14. Withdrawal Design
Withdrawals require stricter validation because they remove money from an account.
For a simple withdrawal:
BEGIN;
1. Validate idempotency key.
2. Lock account.
3. Validate account is active.
4. Validate sufficient available balance.
5. Validate withdrawal limits.
6. Insert transaction.
7. Decrease account balance.
8. Insert debit ledger entry.
9. Mark transaction completed.
10. Insert outbox event.
COMMIT;
Withdrawal edge cases:
- Insufficient funds
- Concurrent withdrawal requests
- ATM timeout after cash is dispensed
- ATM timeout before cash is dispensed
- Daily withdrawal limit exceeded
- Account is frozen
- Account is closed
- User is authenticated but not authorized for the account
ATM workflows can be tricky because physical cash dispensing may happen outside our database transaction. In a real design, we would coordinate with the ATM network using authorization and completion messages. For this interview scope, we can say our ledger records the completed withdrawal only when the external withdrawal confirmation is received, and all external messages are processed idempotently.
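The authorization-and-completion idea can be sketched as a small message handler. This Python version is illustrative only: the message kinds (AUTHORIZE, COMPLETE, CANCEL), placing a hold on available_balance at authorization, and releasing it on cancel are assumptions about how such a flow could work, not a description of any ATM network's protocol. Every message carries an ID, and redelivered IDs are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class AtmAccount:
    current_balance: int
    available_balance: int
    holds: dict[str, int] = field(default_factory=dict)  # auth_id -> held amount
    processed_messages: set[str] = field(default_factory=set)


def handle_atm_message(acct: AtmAccount, message_id: str, kind: str, auth_id: str, amount: int = 0) -> str:
    if message_id in acct.processed_messages:
        return "DUPLICATE"  # network redelivery: already applied, change nothing
    if kind == "AUTHORIZE":
        if acct.available_balance < amount:
            result = "DECLINED"
        else:
            acct.available_balance -= amount  # reserve funds; nothing is posted yet
            acct.holds[auth_id] = amount
            result = "APPROVED"
    elif kind == "COMPLETE" and auth_id in acct.holds:
        acct.current_balance -= acct.holds.pop(auth_id)  # cash dispensed: post the withdrawal
        result = "POSTED"
    elif kind == "CANCEL" and auth_id in acct.holds:
        acct.available_balance += acct.holds.pop(auth_id)  # cash never dispensed: release hold
        result = "RELEASED"
    else:
        result = "IGNORED"
    acct.processed_messages.add(message_id)
    return result


account = AtmAccount(current_balance=50_000, available_balance=50_000)
handle_atm_message(account, "m1", "AUTHORIZE", "auth_1", 20_000)
handle_atm_message(account, "m2", "COMPLETE", "auth_1")
handle_atm_message(account, "m2", "COMPLETE", "auth_1")  # redelivered completion
```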
15. Ledger Design in Detail
The ledger should be append-only. We should not update or delete ledger entries after creation. If we need to correct a mistake, we create a reversing transaction.
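A reversal can be generated mechanically from the entries it cancels. This Python sketch assumes a deposit-account view in which a CREDIT increases the customer's balance and a DEBIT decreases it; balance_after is omitted for brevity, and any status change on the original transaction (for example to REVERSED) is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: entries cannot be modified after creation
class LedgerEntry:
    transaction_id: str
    account_id: str
    direction: str  # DEBIT | CREDIT
    amount: int     # minor units


FLIP = {"DEBIT": "CREDIT", "CREDIT": "DEBIT"}


def reversing_entries(original: list[LedgerEntry], reversal_txn_id: str) -> list[LedgerEntry]:
    """Build new entries that cancel `original`; the original entries are left untouched."""
    return [
        LedgerEntry(reversal_txn_id, e.account_id, FLIP[e.direction], e.amount)
        for e in original
    ]


transfer_entries = [
    LedgerEntry("txn_1", "acct_a", "DEBIT", 10_000),
    LedgerEntry("txn_1", "acct_b", "CREDIT", 10_000),
]
reversal = reversing_entries(transfer_entries, "txn_2")
ledger = transfer_entries + reversal  # append-only: history shows the transfer and its reversal
```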
For a $100.00 transfer from Account A to Account B:
Transaction
- id: txn_1
- type: TRANSFER
- amount: 10000
- currency: USD
- status: COMPLETED
Ledger Entries
- transaction_id: txn_1
  account_id: Account A
  direction: DEBIT
  amount: 10000
  balance_after: 90000
- transaction_id: txn_1
  account_id: Account B
  direction: CREDIT
  amount: 10000
  balance_after: 25000
For internal transfers, total debits should equal total credits. This is the essence of double-entry accounting. If the system debits one account but does not credit another, we have created a broken financial record.
We can enforce parts of this in application logic and parts with database constraints. For example:
- Amount must be positive.
- Currency must be present.
- Ledger entries must reference a transaction.
- Ledger entries should be immutable.
- A completed transfer should have both debit and credit entries.
Some invariants, such as "sum of debits equals sum of credits for a transaction," may require application logic or a database trigger because they involve multiple rows.
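Because the debit-equals-credit rule spans multiple rows, it is easiest to check in application code or a periodic job. This Python sketch takes (transaction_id, direction, amount) tuples and reports transactions that break the rule or carry a non-positive amount. It assumes every transaction in the input is double-entry; a single-sided deposit or withdrawal would be flagged unless its other leg is booked to an internal account, such as a hypothetical cash or clearing account, which the schema above does not model.

```python
from collections import defaultdict


def unbalanced_transactions(entries: list[tuple[str, str, int]]) -> list[str]:
    """Return ids of transactions whose debit and credit totals differ or that hold bad amounts."""
    totals: dict[str, dict[str, int]] = defaultdict(lambda: {"DEBIT": 0, "CREDIT": 0})
    bad: set[str] = set()
    for txn_id, direction, amount in entries:
        if amount <= 0 or direction not in ("DEBIT", "CREDIT"):
            bad.add(txn_id)
            continue
        totals[txn_id][direction] += amount
    bad.update(t for t, sums in totals.items() if sums["DEBIT"] != sums["CREDIT"])
    return sorted(bad)


entries = [
    ("txn_1", "DEBIT", 10_000),
    ("txn_1", "CREDIT", 10_000),
    ("txn_2", "DEBIT", 5_000),  # debit with no matching credit
]
print(unbalanced_transactions(entries))  # ['txn_2']
```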
16. Balance Reads
There are two common ways to get a balance:
- Read the stored balance from the accounts table.
- Recompute the balance from ledger entries.
Recomputing from the ledger on every request is expensive. The normal design is to store the current balance on the account row and update it transactionally whenever ledger entries are inserted.
For a fresh balance, read from the primary database:
GET /accounts/:accountId/balance
The response can include both current and available balance:
{
  "accountId": "acct_1",
  "currency": "USD",
  "currentBalance": 100000,
  "availableBalance": 90000,
  "asOf": "2026-05-05T10:30:00Z"
}
For account summaries, caching can help, but we need to be careful. If a user just completed a transfer, showing a stale balance can create confusion or even enable bad decisions. A practical approach is:
- Read fresh balances from the primary after money movement.
- Use short-lived cache for low-risk account summaries.
- Invalidate or update cache after successful writes.
- Clearly separate available balance from pending activity.
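The caching rules above can be sketched as a small TTL cache with explicit invalidation. This Python version keeps entries in a process-local dict as a stand-in for Redis, and the 5-second TTL is an arbitrary assumption. The clock is injectable so the behavior can be tested without sleeping.

```python
import time
from typing import Callable


class AccountSummaryCache:
    """Short-lived cache for low-risk account summaries (an in-process stand-in for Redis)."""

    def __init__(self, ttl_seconds: float = 5.0, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: dict[str, tuple[float, dict]] = {}

    def get(self, account_id: str, load: Callable[[str], dict]) -> dict:
        now = self.clock()
        hit = self._entries.get(account_id)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        summary = load(account_id)  # miss or expired: read the source of truth
        self._entries[account_id] = (now, summary)
        return summary

    def invalidate(self, account_id: str) -> None:
        """Call after a money movement on this account commits."""
        self._entries.pop(account_id, None)
```

In Redis terms this corresponds roughly to SET with an expiry plus DEL after a committed write. A read that started before the write can still repopulate an old value; the short TTL is what bounds that window.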
17. Transaction History
Transaction history can be served from the transaction table, from a read replica, or from a denormalized read model optimized for filtering and pagination.
The API might look like:
GET /accounts/:accountId/transactions?limit=50&cursor=txn_123
Transaction history should support cursor-based pagination because a customer may have many transactions. Sorting by creation time plus a stable ID is better than offset pagination at scale.
Example response:
{
  "items": [
    {
      "id": "txn_1",
      "type": "TRANSFER",
      "amount": 10000,
      "currency": "USD",
      "direction": "DEBIT",
      "status": "COMPLETED",
      "createdAt": "2026-05-05T10:30:00Z"
    }
  ],
  "nextCursor": "txn_1"
}
Transaction history can lean on read-scaling techniques more heavily than balances can, because a few seconds of lag in a history list is far less risky than a stale spendable balance. Read replicas, search indexes, and denormalized projections are reasonable here as long as the product accepts possible lag.
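Cursor-based pagination on (created_at, id) can be sketched against SQLite as follows. It assumes, as in the example request, that the cursor is the ID of the last transaction returned and that the server looks up its created_at; timestamps are ISO-8601 strings in one fixed format, which sort correctly as text. An index on (account_id, created_at, id) should let each page start directly at the cursor instead of skipping offset rows.

```python
import sqlite3
from typing import Optional


def list_transactions(conn: sqlite3.Connection, account_id: str, limit: int, cursor: Optional[str] = None):
    """Newest-first keyset pagination on (created_at, id); the cursor is the last id returned."""
    where = "account_id = ?"
    params: list = [account_id]
    if cursor is not None:
        anchor = conn.execute(
            "SELECT created_at FROM transactions WHERE id = ? AND account_id = ?",
            (cursor, account_id),
        ).fetchone()
        if anchor is None:
            raise ValueError("unknown cursor")
        # Rows strictly after the cursor in (created_at DESC, id DESC) order.
        where += " AND (created_at < ? OR (created_at = ? AND id < ?))"
        params += [anchor[0], anchor[0], cursor]
    rows = conn.execute(
        # `where` is assembled only from the fixed strings above, never from user input.
        f"SELECT id, created_at FROM transactions WHERE {where}"
        " ORDER BY created_at DESC, id DESC LIMIT ?",
        params + [limit + 1],  # one extra row tells us whether another page exists
    ).fetchall()
    items = rows[:limit]
    next_cursor = items[-1][0] if items and len(rows) > limit else None
    return items, next_cursor
```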
18. Handling Payday Traffic
Payday creates two kinds of load:
- Write spikes from payroll deposits
- Read spikes from users checking balances and transaction history
The write path should remain strongly consistent. We should not route balance-changing operations to eventually consistent stores just to survive a spike. Instead, we can scale writes by keeping database transactions short, using efficient indexes, batching external deposit ingestion carefully, and partitioning large tables like ledger entries.
Strategies for write-heavy payday traffic:
- Use connection pooling so the database is not overwhelmed by too many connections.
- Keep transactions short and avoid slow external calls inside transactions.
- Process payroll files in controlled batches.
- Make payroll deposit instructions idempotent.
- Partition large ledger and transaction tables by time or account hash.
- Monitor lock wait time, deadlocks, and transaction latency.
Strategies for read-heavy payday traffic:
- Serve transaction history from read replicas.
- Cache account lists and profile data.
- Use CDN or edge caching for static content.
- Use Redis for rate limiting and session support.
- Add backpressure and graceful degradation for non-critical features.
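Rate limiting is one of the simpler ways to add backpressure during a spike. This Python token bucket is an in-process stand-in for a Redis-backed limiter shared across API instances; the rate and burst values are placeholders, and a real deployment would keep one bucket per user or client key.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow short bursts up to `burst`, then a steady `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, burst: int, clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec
        self.capacity = burst
        self.clock = clock
        self.tokens = float(burst)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should reject (for example HTTP 429) or shed the request
```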
The key interview point: we scale reads aggressively, but we protect the correctness of writes.
19. Asynchronous Processing and the Outbox Pattern
We should keep the synchronous transaction focused on the critical money movement. We should not send emails, push notifications, analytics events, or call slow external services inside the database transaction.
After the transfer commits, other systems can react asynchronously.
TransactionCompleted Event
            |
            +--> Notification Service
            +--> Fraud / Risk Service
            +--> Statement Service
            +--> Analytics Pipeline
The outbox pattern helps avoid a common failure case:
1. Database transaction commits.
2. Service crashes before publishing the message.
3. Downstream systems never hear about the transaction.
With an outbox table, we write the event into the same database transaction as the money movement. A separate publisher reads unpublished outbox rows and sends them to the message broker. Once published, the row is marked as published.
This gives us reliable event delivery without making the message broker part of the critical balance transaction.
The connection looks like this:
   Banking Service
          |
          | writes inside the same DB transaction
          v
outbox_events table
          |
          | polled or streamed by
          v
  Outbox Publisher
          |
          | publishes event messages to
          v
   Message Broker
          |
          +--> Notification Service
          +--> Fraud / Risk Service
          +--> Statement Service
Delivery from the publisher is at-least-once. It may crash after publishing an event but before marking the row as published, so the same event can be sent again after a restart. Downstream consumers should therefore deduplicate by event_id, and the publisher should retry temporary broker failures with backoff.
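The publisher loop and consumer-side deduplication can be sketched in Python with sqlite3. The outbox_events columns here (seq, event_id, payload, published) are assumptions, and the broker is replaced by a plain callback. The scenario to keep in mind is a crash between sending an event and marking it published.

```python
import sqlite3
from typing import Callable

SCHEMA = """
CREATE TABLE outbox_events (
    seq INTEGER PRIMARY KEY,
    event_id TEXT NOT NULL UNIQUE,
    payload TEXT NOT NULL,
    published INTEGER NOT NULL DEFAULT 0
)
"""


def publish_pending(conn: sqlite3.Connection, publish: Callable[[str, str], None], batch: int = 100) -> int:
    """One polling pass: send unpublished rows in order, marking each after it is sent."""
    rows = conn.execute(
        "SELECT event_id, payload FROM outbox_events WHERE published = 0 ORDER BY seq LIMIT ?",
        (batch,),
    ).fetchall()
    for event_id, payload in rows:
        # If publish() raises, the row stays unpublished and the next pass retries it.
        # If the process dies after publish() but before the UPDATE, the event is sent
        # again on restart, so delivery is at-least-once.
        publish(event_id, payload)
        conn.execute("UPDATE outbox_events SET published = 1 WHERE event_id = ?", (event_id,))
        conn.commit()
    return len(rows)


class DedupingConsumer:
    """Downstream handler that ignores event_ids it has already processed."""

    def __init__(self) -> None:
        self.seen: set[str] = set()  # a real consumer would persist this with its side effects
        self.handled: list[str] = []

    def handle(self, event_id: str, payload: str) -> None:
        if event_id in self.seen:
            return
        self.seen.add(event_id)
        self.handled.append(payload)


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO outbox_events (event_id, payload) VALUES (?, ?)",
    [("evt_1", "TransactionCompleted txn_1"), ("evt_2", "TransactionCompleted txn_2")],
)
conn.commit()
consumer = DedupingConsumer()
publish_pending(conn, consumer.handle)
```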
20. Security and Compliance
Security is not a side note in a banking system. It should appear throughout the design.
Authentication should use strong password hashing such as Argon2 or bcrypt. MFA should be supported for login and sensitive actions. Sessions or tokens should have expiration, rotation, and revocation behavior.
Authorization should be enforced at the service layer and ideally supported by data access patterns that always scope accounts by user or membership. A user should never be able to access another user's account by guessing an account ID.
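One way to make ownership checks hard to forget is to put them in the query itself. This Python sketch uses sqlite3 and assumes a single owner per account (the user_id column); joint accounts would need a membership table instead. Returning the same error for a missing account and someone else's account is a design choice that keeps account IDs from being probed.

```python
import sqlite3


class AccountNotFound(Exception):
    """Raised when the account does not exist or the caller does not own it."""


def get_account_for_user(conn: sqlite3.Connection, user_id: str, account_id: str) -> tuple:
    # The ownership predicate is part of the query itself, so there is no
    # code path that loads an account row without checking who is asking.
    row = conn.execute(
        "SELECT id, type, currency, current_balance FROM accounts"
        " WHERE id = ? AND user_id = ?",
        (account_id, user_id),
    ).fetchone()
    if row is None:
        # Same error for "missing" and "not yours", so guessing IDs reveals nothing.
        raise AccountNotFound(account_id)
    return row


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY, user_id TEXT NOT NULL, type TEXT,"
    " currency TEXT, current_balance INTEGER)"
)
conn.execute("INSERT INTO accounts VALUES ('acct_1', 'user_1', 'CHECKING', 'USD', 100000)")
print(get_account_for_user(conn, "user_1", "acct_1"))  # ('acct_1', 'CHECKING', 'USD', 100000)
```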
Sensitive data should be encrypted in transit using TLS and encrypted at rest. Secrets should live in a secrets manager, not in code or environment files committed to the repository.
Audit logs should record security-sensitive and money-sensitive actions:
- Login attempts
- Account creation and closure
- Deposit, withdrawal, and transfer requests
- Admin access
- Failed authorization checks
- Changes to account status
Compliance details depend on the country and product, but in an interview we can mention KYC, AML, data retention, least privilege access, auditability, and separation of duties. The main thing is to show that financial systems require operational controls, not just application code.
21. Failure Cases and Edge Cases
A strong system design answer includes failure modes. Banking interviews often turn into "what happens if..." conversations.
What if a transfer partially fails?
For internal transfers, debit and credit happen inside one ACID transaction. If the credit fails, the debit rolls back. If the transaction commits, both entries exist.
What if the client retries after a timeout?
The idempotency key ensures the retry returns the original result instead of processing a duplicate transfer.
What if two withdrawals happen at the same time?
Row-level locking serializes updates to the same account. The second withdrawal sees the updated balance after the first commits.
What if the read replica is stale?
Fresh balances should read from the primary or from a read path with explicit freshness guarantees. Transaction history can use replicas if slight lag is acceptable.
What if the notification service is down?
The money movement should still commit. The outbox event remains pending and is retried until published.
What if a deposit is later reversed?
We should not mutate history. We create a new reversing transaction and corresponding ledger entries.
What if an account is frozen during a transfer?
The transfer flow locks and validates account status inside the transaction. If the account is frozen before the transfer gets the lock, the transfer fails. If freezing waits behind the transfer, policy determines whether the freeze affects future transactions only.
What if the same idempotency key is reused for a different request?
Store a request hash with the idempotency key. If the hash differs, reject the request.
What if a database node fails?
The primary database should have replication and failover. The application should handle transient errors with retries where safe. Money movement retries must go through idempotency.
What if the ledger and account balance disagree?
This should trigger reconciliation alerts. The ledger can be used to recompute the expected balance, but any correction should be done through controlled repair workflows and reversing or adjustment entries, not silent mutation.
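A reconciliation job can recompute balances from the ledger and compare them with the stored values. This Python sketch assumes every account's full history, including its opening deposit, is in the ledger, and that a CREDIT increases and a DEBIT decreases a deposit account's balance. It only reports mismatches; fixing them would go through the controlled repair workflow described above.

```python
from collections import defaultdict


def find_balance_mismatches(
    stored: dict[str, int], entries: list[tuple[str, str, int]]
) -> dict[str, tuple[int, int]]:
    """Compare stored balances with ledger-derived ones; return {account_id: (stored, expected)}."""
    expected: dict[str, int] = defaultdict(int)
    for account_id, direction, amount in entries:
        expected[account_id] += amount if direction == "CREDIT" else -amount
    mismatches = {}
    for account_id in stored.keys() | expected.keys():
        s, e = stored.get(account_id, 0), expected.get(account_id, 0)
        if s != e:
            mismatches[account_id] = (s, e)  # report only; never overwrite silently
    return mismatches


stored_balances = {"acct_a": 90_000, "acct_b": 25_001}
ledger_rows = [
    ("acct_a", "CREDIT", 100_000),
    ("acct_a", "DEBIT", 10_000),
    ("acct_b", "CREDIT", 15_000),
    ("acct_b", "CREDIT", 10_000),
]
print(find_balance_mismatches(stored_balances, ledger_rows))  # {'acct_b': (25001, 25000)}
```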
22. Observability
For a banking system, observability is part of correctness. We need to know when money movement is slow, failing, duplicated, or inconsistent.
Important metrics:
- Transaction success and failure rate
- Deposit, withdrawal, and transfer latency
- Database lock wait time
- Deadlock count
- Idempotency hit rate
- Outbox publish lag
- Read replica lag
- Reconciliation mismatches
- Authentication failure rate
Important logs:
- Correlation ID for each request
- Idempotency key
- Transaction ID
- Account IDs involved
- Authorization decision
- Failure reason
Sensitive information should be redacted. Logs should help debugging without leaking private financial data.
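A redaction step can sit in front of the logger. The field names in this Python sketch are hypothetical; internal IDs such as transaction and correlation IDs pass through because the list above relies on them for debugging, while secrets are dropped and account numbers are masked to their last four digits.

```python
REDACT_FIELDS = {"password", "ssn"}                 # hypothetical field names: hide entirely
MASK_FIELDS = {"account_number", "routing_number"}  # hypothetical field names: keep last 4


def redact(record: dict) -> dict:
    """Return a copy of a log record that is safe to write."""
    safe = {}
    for key, value in record.items():
        if key in REDACT_FIELDS:
            safe[key] = "[REDACTED]"
        elif key in MASK_FIELDS and isinstance(value, str):
            safe[key] = "*" * max(len(value) - 4, 0) + value[-4:]
        else:
            safe[key] = value  # correlation IDs, internal transaction IDs, etc. pass through
    return safe


entry = redact({
    "correlation_id": "req_9f2",
    "transaction_id": "txn_1",
    "account_number": "000123456789",
    "password": "hunter2",
})
```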
23. Final Architecture Summary
Here is the complete design in one view:
+------------------+
| Client |
+--------+---------+
|
v
+------------------+
| API Gateway |
+--------+---------+
|
v
+------------------+
| Auth Service |
+--------+---------+
|
v
+------------------+
| Banking Service |
+--------+---------+
|
+-------------------+-------------------+
| |
v v
+------------------+ +------------------+
| Postgres Primary | | Redis Cache |
+--------+---------+ +--------+---------+
| |
| +--> rate limits
| +--> sessions
| +--> low-risk cache
|
+--------+---------+
| |
v v
+------------+ +------------------+
| DB Tables | | Read Replicas |
+-----+------+ +--------+---------+
| |
| +--> transaction history
| +--> statements
| +--> account lists
|
+--> accounts
+--> transactions
+--> ledger_entries
+--> idempotency_keys
+--> outbox_events
|
v
+------------------+
| Outbox Publisher |
+--------+---------+
|
v
+------------------+
| Message Broker |
+--------+---------+
|
+----------+----------+----------+----------+
| | | | |
v v v v v
+------+ +--------+ +---------+ +------+ +-----------+
|Notif | | Fraud | |Statement| |Audit | | Analytics |
+------+ +--------+ +---------+ +------+ +-----------+
And the core money movement flow:
+-------------------+
| BEGIN TRANSACTION |
+---------+---------+
|
v
+-------------------+
| Validate request |
| and idempotency |
+---------+---------+
|
v
+-------------------+
| Lock account rows |
| in stable order |
+---------+---------+
|
v
+-------------------+
| Validate balance, |
| status, limits |
+---------+---------+
|
v
+-------------------+
| Insert transaction|
+---------+---------+
|
v
+--------------------+--------------------+
| |
v v
+-------------------+ +-------------------+
| Update balances | | Insert immutable |
| on account rows | | ledger entries |
+---------+---------+ +---------+---------+
| |
+--------------------+--------------------+
|
v
+-------------------+
| Insert outbox |
| event |
+---------+---------+
|
v
+-------------------+
| COMMIT |
+---------+---------+
|
v
+-------------------+
| Outbox publisher |
| delivers event |
+-------------------+
24. Scaling to 100 Million Users
Now let's define a concrete scale target and discuss how the system would grow. Interviewers often ask "how would this scale?" because a simple single-database design is easy to explain, but real banking systems must handle large user bases, huge ledger tables, traffic bursts, and strict correctness requirements at the same time.
Assume the system needs to support:
- 100 million registered users
- 200 million bank accounts
- 10 million daily active users
- 50 million balance reads per day
- 100 million transaction history reads per day
- 20 million money movement operations per day
- 5,000 normal write transactions per second at peak
- 25,000 write transactions per second during payday bursts
- 100,000+ read requests per second during major traffic spikes
- Multiple regions for low-latency reads and disaster recovery
These numbers are intentionally large enough that one primary database eventually becomes a bottleneck. The challenge is to scale without weakening the money invariants.
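A quick back-of-the-envelope check converts the daily targets into average per-second rates and shows how far the payday target sits above the average. The 86,400 seconds-per-day divisor assumes traffic is spread evenly over the day, which real traffic never is. The ratio also treats every peak write as a money movement, so it is only a rough guide.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

daily_targets = {
    "balance_reads": 50_000_000,
    "history_reads": 100_000_000,
    "money_movements": 20_000_000,
}


def average_per_second(per_day: int) -> float:
    return per_day / SECONDS_PER_DAY


for name, per_day in daily_targets.items():
    print(f"{name}: {average_per_second(per_day):,.0f}/s average")

# Peak-to-average ratio for writes, using the payday burst target.
payday_peak_writes = 25_000
ratio = payday_peak_writes / average_per_second(daily_targets["money_movements"])
print(f"payday burst is ~{ratio:.0f}x the average money-movement rate")
```

This prints roughly 579/s for balance reads, 1,157/s for history reads, and 231/s for money movements, with a payday ratio of about 108x. The averages are modest. The bursts are what force the design decisions that follow.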
Start with Vertical Scaling and Read Scaling
Before sharding, we should squeeze a lot of mileage out of a simpler architecture. Sharding adds operational complexity, makes cross-shard transfers harder, and increases the number of failure modes. In an interview, it is good to say that we do not shard on day one.
The first scaling steps are:
- Use a powerful Postgres primary for writes.
- Add read replicas for transaction history and statements.
- Use connection pooling with PgBouncer or an equivalent pooler.
- Add proper indexes for account lookup, transaction history pagination, and idempotency.
- Partition large tables like ledger_entries and transactions.
- Cache low-risk reads such as account lists and profile information.
- Keep write transactions short and free of external network calls.
This architecture can handle a surprising amount of traffic if the schema and queries are efficient. The primary database remains the source of truth for balance-changing operations, while replicas and caches absorb read-heavy traffic.
Scale the Read Path First
Most banking traffic is read-heavy. Users check balances and transaction history far more often than they move money. That means the first major scaling opportunity is the read path.
Balance reads are special. If the user just completed a transfer, the balance should be fresh. For this endpoint, we can read from the primary database or use a strongly consistent read path.
GET /accounts/:accountId/balance
For less sensitive reads, we can use replicas:
GET /accounts
GET /accounts/:accountId/transactions
GET /statements/:statementId
Transaction history can be served from:
- Postgres read replicas
- A denormalized transaction history table
- A search index for filtering
- A statement service for monthly snapshots
The important detail is freshness. We should be able to say:
- Fresh balances come from the primary or a strongly consistent source.
- Transaction history may come from a replica with small lag.
- Statements are generated asynchronously and can lag behind real-time activity.
- Analytics data is eventually consistent.
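These freshness rules can be made explicit with a small routing table from endpoint to data source. This is a sketch only. The `DataSource` values, the five-second lag threshold, and `choose_source` are assumptions. The route templates come from the endpoints listed above.

```python
from enum import Enum


class DataSource(Enum):
    PRIMARY = "primary"              # strongly consistent; used for balances
    REPLICA = "replica"              # may lag the primary slightly
    STATEMENT_STORE = "statements"   # asynchronously generated snapshots


# Hypothetical mapping from route template to the source that serves it.
READ_ROUTES = {
    "GET /accounts/:accountId/balance": DataSource.PRIMARY,
    "GET /accounts": DataSource.REPLICA,
    "GET /accounts/:accountId/transactions": DataSource.REPLICA,
    "GET /statements/:statementId": DataSource.STATEMENT_STORE,
}


def choose_source(route: str, replica_lag_seconds: float,
                  max_lag_seconds: float = 5.0) -> DataSource:
    # Unknown routes default to the safe, strongly consistent choice.
    source = READ_ROUTES.get(route, DataSource.PRIMARY)
    # If replicas fall too far behind, serve replica reads from the primary.
    if source is DataSource.REPLICA and replica_lag_seconds > max_lag_seconds:
        return DataSource.PRIMARY
    return source
```

Falling back to the primary when lag is high is only one option. Under heavy load, a service might instead return replica data along with an "as of" timestamp, to avoid pushing read traffic onto the write path.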
Partition Large Tables Before Sharding the Whole Database
Ledger and transaction tables will become enormous. With 20 million money movement operations per day, the system may create 40 million or more ledger entries per day because transfers create at least two entries. That is more than 14 billion ledger entries per year.
Before splitting users across database shards, we can partition large append-heavy tables.
Common partitioning strategies:
- Partition ledger_entries by month or day.
- Partition transactions by creation time.
- Sub-partition by account hash if a single time partition becomes too hot.
- Keep recent partitions on faster storage.
- Move old partitions to cheaper storage or archive systems.
Example:
ledger_entries_2026_05
ledger_entries_2026_06
ledger_entries_2026_07
Time-based partitioning makes retention, archiving, and range queries easier. However, transaction history for one account may span many partitions, so we need indexes such as:
(account_id, created_at DESC, id DESC)
For very large scale, a hybrid partitioning approach works well:
Partition by month
Then sub-partition by hash(account_id)
This avoids putting all payday writes for a month into one physical partition.
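The hybrid scheme can be sketched as a function that maps a ledger entry to a physical partition name. The naming pattern extends the `ledger_entries_2026_05` example with a hypothetical `_p<bucket>` suffix. The count of 8 sub-partitions and the use of CRC32 are also assumptions. Postgres hash partitioning uses its own internal hash function, so in practice the database, not application code, decides the sub-partition.

```python
import zlib
from datetime import datetime, timezone

SUB_PARTITIONS = 8  # hypothetical number of hash sub-partitions per month


def ledger_partition(account_id: str, created_at: datetime) -> str:
    # The month-level partition keeps retention and archiving simple.
    month = created_at.strftime("%Y_%m")
    # A stable hash of the account ID spreads one month's writes across sub-partitions.
    bucket = zlib.crc32(account_id.encode("utf-8")) % SUB_PARTITIONS
    return f"ledger_entries_{month}_p{bucket}"
```

All entries for one account in a given month land in the same sub-partition. An `(account_id, created_at DESC, id DESC)` history query therefore touches one sub-partition per month, while payday writes from many accounts spread across all eight.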
When to Introduce Sharding
Sharding becomes necessary when one primary database cannot handle write throughput, storage growth, vacuum pressure, index size, or operational risk. At 100 million users, sharding is likely needed for the core account and ledger data.
The safest shard key is usually account_id, not user_id.
Why account_id?
- Money movement locks and updates accounts.
- Ledger entries belong to accounts.
- Transaction history is commonly queried by account.
- A user can have multiple accounts, and those accounts may grow independently.
If we shard by user_id, all accounts for one user live together, which is convenient for account lists. But internal transfers may happen between users, businesses, payroll accounts, merchant accounts, or operational accounts. Sharding by account_id maps more naturally to the ledger.
One possible shard layout:
Shard 0: accounts where hash(account_id) % N = 0
Shard 1: accounts where hash(account_id) % N = 1
Shard 2: accounts where hash(account_id) % N = 2
...
Shard N-1: accounts where hash(account_id) % N = N-1
Each shard owns:
- Account rows for its account IDs
- Ledger entries for those accounts
- Transactions that affect only accounts on that shard
- Idempotency records for requests routed to that shard
For single-account operations like deposits and withdrawals, this works cleanly. The request router looks up the account ID, routes to the correct shard, and the shard processes the operation with a local ACID transaction.
Shard Routing
The Banking API Service should not fan out queries to every shard. It needs a routing layer.
The routing layer can use:
- A shard map stored in a highly available metadata store
- A deterministic hash function
- Consistent hashing with virtual nodes
- A routing library embedded in the service
Simple modulo hashing is easy:
shard_id = hash(account_id) % number_of_shards
The problem is resharding. Changing number_of_shards changes the result for most accounts. Going from 32 to 64 shards moves half of them, and adding a single shard (32 to 33) moves almost all of them.
Consistent hashing reduces the amount of movement. Instead of mapping directly with modulo, we place shards on a hash ring. Each account ID hashes to a point on the ring and belongs to the next shard clockwise. When we add a shard, only a portion of keys move.
hash(account_id) -> position on ring -> owning shard
Virtual nodes improve balance. Each physical shard owns many virtual nodes on the ring, which spreads load more evenly and makes adding or removing physical shards smoother.
Physical Shard A owns virtual nodes A1, A2, A3...
Physical Shard B owns virtual nodes B1, B2, B3...
In practice, many financial systems prefer an explicit shard map over pure consistent hashing because account movement must be controlled, audited, and reversible. A hybrid approach works well: use hashing to assign new accounts, but store the final account-to-shard mapping in metadata.
Cross-Shard Transfers
Sharding creates the hardest scaling problem: what happens when Account A and Account B live on different shards?
If both accounts are on the same shard, the transfer is simple:
BEGIN;
lock accounts A and B in a stable order (for example, by account_id)
debit A
credit B
insert ledger entries
COMMIT;
If accounts are on different shards, a single local database transaction cannot atomically update both accounts. There are a few possible designs.
Option 1: Avoid cross-shard transfers by co-locating related accounts.
For example, accounts owned by the same user could be placed on the same shard. This makes transfers between a user's checking and savings accounts easy. But it does not solve transfers between different users or businesses.
Option 2: Use a distributed transaction or two-phase commit.
Two-phase commit can provide atomicity across shards, but it adds complexity, coordinator failure modes, locking across databases, and operational pain. In many interviews, it is acceptable to mention it and then explain why we would avoid it unless the database platform already provides robust distributed transactions.
Option 3: Use a ledger-based settlement account pattern.
This is often easier to reason about. Instead of requiring both shards to update in one database transaction, we model the cross-shard transfer as a small state machine with durable transaction states.
Example:
Transfer txn_1: Account A on Shard 1 -> Account B on Shard 7
Step 1: On Shard 1, reserve or debit funds from Account A.
Step 2: Record transfer state as DEBIT_POSTED.
Step 3: Publish an internal transfer event.
Step 4: On Shard 7, credit Account B.
Step 5: Mark transfer as COMPLETED.
This is no longer a single ACID transaction across both accounts. To preserve correctness, we need strong transaction states, idempotent processing, and reconciliation. The source account debit must not be lost, and the destination credit must eventually happen or be reversed.
A more formal state machine:
PENDING
-> SOURCE_DEBITED
-> DESTINATION_CREDITED
-> COMPLETED
PENDING
-> FAILED
SOURCE_DEBITED
-> REVERSAL_PENDING
-> REVERSED
Each state transition is committed locally on the owning shard. Every message includes a globally unique transaction_id and is processed idempotently.
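The state machine above can be sketched in code as a transition table, plus a destination shard that credits idempotently by `transaction_id`. `Transfer`, `DestinationShard`, and `IllegalTransition` are hypothetical names. Amounts are assumed to be integer minor units (cents). In a real system, each `advance` call and each credit, together with its `applied` record, would commit in one local transaction on the owning shard.

```python
from enum import Enum


class TransferState(Enum):
    PENDING = "PENDING"
    SOURCE_DEBITED = "SOURCE_DEBITED"
    DESTINATION_CREDITED = "DESTINATION_CREDITED"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    REVERSAL_PENDING = "REVERSAL_PENDING"
    REVERSED = "REVERSED"


S = TransferState
# Legal transitions, taken from the state machine above.
ALLOWED = {
    S.PENDING: {S.SOURCE_DEBITED, S.FAILED},
    S.SOURCE_DEBITED: {S.DESTINATION_CREDITED, S.REVERSAL_PENDING},
    S.DESTINATION_CREDITED: {S.COMPLETED},
    S.REVERSAL_PENDING: {S.REVERSED},
}


class IllegalTransition(Exception):
    pass


class Transfer:
    def __init__(self, transaction_id: str):
        self.transaction_id = transaction_id
        self.state = S.PENDING
        self.history = [S.PENDING]

    def advance(self, new_state: TransferState) -> bool:
        """Apply a transition; return False if this step was already applied."""
        if new_state in self.history:
            # Redelivered message for a step already committed: no-op.
            return False
        if new_state not in ALLOWED.get(self.state, set()):
            raise IllegalTransition(f"{self.state.value} -> {new_state.value}")
        self.state = new_state
        self.history.append(new_state)
        return True


class DestinationShard:
    def __init__(self):
        self.balances = {}
        self.applied = set()  # transaction_ids already credited on this shard

    def credit(self, transaction_id: str, account_id: str, amount_cents: int) -> None:
        if transaction_id in self.applied:
            return  # duplicate event: never credit twice
        self.balances[account_id] = self.balances.get(account_id, 0) + amount_cents
        self.applied.add(transaction_id)
```

Because the machine has no cycles, "already in history" is a safe duplicate check. Any transition not in `ALLOWED`, such as jumping from PENDING straight to COMPLETED, is rejected.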
The interview tradeoff:
- Same-shard transfers are strongly ACID.
- Cross-shard transfers complete eventually, but they remain auditable and correct.
- The system must expose clear transaction status to users.
- Reconciliation jobs detect stuck transfers.
- Reversal flows compensate if the destination credit cannot complete.
If the interviewer requires strict atomicity for all transfers, then we need distributed transactions or a database that supports cross-partition ACID transactions. If they allow real-world payment semantics, a durable state machine with reconciliation is more practical.
Global Transaction IDs
At scale, every transaction and ledger entry needs a globally unique ID. We should not rely on auto-incrementing IDs from separate shards because two shards can generate the same number.
Good options:
- UUIDv7
- ULID
- Snowflake-style IDs
These IDs are sortable or roughly time-ordered, which helps with transaction history pagination and debugging.
Example:
txn_018f3f2a9b7c...
ledger_018f3f2aa1d4...
The ID should encode no sensitive information. It should not reveal account numbers, user IDs, or shard IDs unless the system explicitly accepts that tradeoff.
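This sketch builds a UUIDv7 by hand using only the standard library. It follows the RFC 9562 layout: a 48-bit Unix millisecond timestamp, the version and variant bits, and random bits. The `txn_` prefix mirrors the examples above, and `new_id` is a hypothetical helper. A UUIDv7 reveals its creation time, which is usually acceptable, but nothing about the account or user.

```python
import os
import time
import uuid
from typing import Optional


def uuid7(unix_ms: Optional[int] = None) -> uuid.UUID:
    """UUIDv7: 48-bit ms timestamp | version 7 | 12 random bits | variant | 62 random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                           # top 12 bits
    rand_b = rand & ((1 << 62) - 1)               # low 62 bits
    value = (
        ((unix_ms & ((1 << 48) - 1)) << 80)  # timestamp in the most significant bits
        | (0x7 << 76)                        # version 7
        | (rand_a << 64)
        | (0b10 << 62)                       # RFC variant
        | rand_b
    )
    return uuid.UUID(int=value)


def new_id(prefix: str) -> str:
    # For example "txn_018f..." or "ledger_018f...".
    return f"{prefix}_{uuid7().hex}"
```

The timestamp occupies the leading bits, so IDs created in different milliseconds sort by creation time, both numerically and as hex strings. IDs created within the same millisecond are ordered randomly, which is why they are described as roughly time-ordered.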
Hot Accounts and Hot Shards
Hashing distributes normal accounts well, but it does not eliminate hot accounts. Some accounts receive massive traffic: payroll funding accounts, merchant settlement accounts, internal clearing accounts, or popular business accounts.
A single hot account is hard because all debits and credits for that account must be serialized to maintain a correct balance.
Mitigations:
- Keep transactions on hot accounts very short.
- Use dedicated shards for known high-volume operational accounts.
- Split high-volume business activity across sub-accounts when the business model allows it.
- Use pending settlement ledgers for batch credits.
- Precompute read models for hot account history.
- Apply per-account rate limits and backpressure.
For example, payroll deposits can be processed as a batch where the funding account debit is recorded once, and individual employee credits are processed in parallel across shards. The system still needs reconciliation to ensure the total credits equal the funding debit.
Caching Strategy
Caching helps reads, but it must be used carefully.
Good cache candidates:
- User profile summary
- Account list
- Feature flags
- Institution metadata
- Recently viewed transaction history pages
- Statement metadata
- Rate limit counters
Risky cache candidates:
- Available balance
- Authorization decisions for sensitive actions
- Account frozen or closed status
For balances, a short-lived cache can be used only if the product accepts the freshness semantics. A common safer pattern is write-through or invalidate-on-write:
Money movement commits
|
+--> update account row
+--> insert ledger entries
+--> write outbox event
|
v
After commit: invalidate account balance cache
If cache invalidation fails, the cache entry should expire quickly. For high-risk actions like withdrawals, always validate against the database, not the cache.
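The pattern above can be sketched as a small TTL cache plus a service that invalidates after each commit. `TTLCache` and `BalanceService` are hypothetical. A plain dict of balances in cents stands in for the database, and the injectable `clock` makes expiry testable.

```python
import time


class TTLCache:
    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[key]  # expired entries behave like misses
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, self.clock() + self.ttl)

    def invalidate(self, key):
        self._data.pop(key, None)


class BalanceService:
    def __init__(self, db: dict, cache: TTLCache):
        self.db = db  # hypothetical stand-in: account_id -> balance in cents
        self.cache = cache

    def display_balance(self, account_id):
        # Low-risk display read: cache first, database on a miss.
        cached = self.cache.get(account_id)
        if cached is not None:
            return cached
        balance = self.db[account_id]
        self.cache.set(account_id, balance)
        return balance

    def withdraw(self, account_id, amount):
        # High-risk path: validate against the database, never the cache.
        if self.db[account_id] < amount:
            raise ValueError("insufficient funds")
        self.db[account_id] -= amount  # stands in for the committed DB transaction
        # After commit: invalidate. If this step is lost, the TTL bounds staleness.
        self.cache.invalidate(account_id)
```

Only the display path reads the cache. A stale cached value can therefore cause, at worst, a briefly outdated number on screen, never an overdraft.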
Queue-Based Load Leveling
Some work can be queued to smooth spikes. Payroll ingestion is a good example.
Instead of letting a payroll file create an unbounded number of concurrent writes, we can:
- Store the payroll file metadata.
- Break it into deposit instructions.
- Put instructions on a queue.
- Process with controlled worker concurrency.
- Use idempotency keys per instruction.
- Track batch-level progress.
- Reconcile total expected deposits against completed deposits.
This protects the database during payday. The user-facing result can show deposits as pending until they are posted.
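The payroll steps can be sketched with a queue, a fixed number of worker threads, one idempotency key per instruction, and a final reconciliation against the funding debit. `DepositInstruction`, `Ledger`, `process_batch`, and the `<batch_id>:<line_number>` key format are hypothetical. The in-memory `Ledger`, guarded by a lock, stands in for the database.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class DepositInstruction:
    idempotency_key: str  # hypothetical format: "<batch_id>:<line_number>"
    account_id: str
    amount_cents: int


class Ledger:
    """In-memory stand-in for the database; a lock replaces row locking."""

    def __init__(self):
        self.balances = {}
        self.applied_keys = set()
        self._lock = threading.Lock()

    def post_deposit(self, ins: DepositInstruction) -> bool:
        with self._lock:
            if ins.idempotency_key in self.applied_keys:
                return False  # redelivered instruction: already posted
            self.balances[ins.account_id] = self.balances.get(ins.account_id, 0) + ins.amount_cents
            self.applied_keys.add(ins.idempotency_key)
            return True


def process_batch(batch_id, rows, ledger, workers=4):
    # Break the payroll file into keyed deposit instructions and queue them.
    instructions = [DepositInstruction(f"{batch_id}:{i}", acct, amt)
                    for i, (acct, amt) in enumerate(rows)]
    work = queue.Queue()
    for ins in instructions:
        work.put(ins)

    def worker():
        while True:
            try:
                ins = work.get_nowait()
            except queue.Empty:
                return
            ledger.post_deposit(ins)

    # A fixed worker count caps concurrent writes against the database.
    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return instructions


def reconcile(funding_debit_cents, instructions, ledger) -> bool:
    # Total posted credits must equal the single funding-account debit.
    posted = sum(i.amount_cents for i in instructions if i.idempotency_key in ledger.applied_keys)
    return posted == funding_debit_cents
```

Redelivering the whole batch is harmless because every instruction carries its own idempotency key.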
However, not every operation should be queued. A user-initiated internal transfer should usually return a clear synchronous result if both accounts are local and available. If we queue it, the product must show a PENDING state and handle user expectations.
Multi-Region Scaling
Multi-region design is tricky for banking because strong consistency across regions is expensive.
A practical approach:
- Use one write region per shard or account group.
- Serve read-only traffic from regional replicas.
- Route balance-changing requests to the owning write region.
- Keep disaster recovery replicas in other regions.
- Use async replication for analytics and history views.
For example:
US-East owns write shards 0-31
US-West owns write shards 32-63
EU-Central owns EU customer shards
The routing layer knows which region owns each account. If a user in California owns an account whose write shard is in US-West, balance-changing operations go there. If the user travels, reads may still be served locally when freshness allows, but writes route to the account's owner region.
Active-active writes to the same account from multiple regions are dangerous. They can create conflicting balance updates. For this reason, each account should have a single write owner at any point in time.
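The routing rule can be sketched as a lookup from shard to owning region. The ownership table mirrors the US-East and US-West ranges in the example above. EU shards are omitted because their numbering is not specified. `Request`, `route`, and the lowercase region names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical ownership table mirroring the example: shards 0-31 and 32-63.
SHARD_OWNER_REGION = {
    **{s: "us-east" for s in range(0, 32)},
    **{s: "us-west" for s in range(32, 64)},
}


@dataclass
class Request:
    shard_id: int
    is_write: bool
    needs_fresh_read: bool = False


def route(req: Request, local_region: str) -> str:
    owner = SHARD_OWNER_REGION[req.shard_id]
    if req.is_write or req.needs_fresh_read:
        # Balance changes and fresh balance reads go to the single write owner.
        return owner
    # Other reads can be served by the local regional replica.
    return local_region
```

Each shard has exactly one owner region in the table, which is how this sketch rules out active-active writes to the same account.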
Resharding and Account Movement
Eventually, shards need to be split or accounts need to move.
A safe account movement process:
- Mark the account as moving in the shard map.
- Temporarily route new writes to the old shard or pause writes briefly.
- Copy account, ledger, transaction, and idempotency data to the new shard.
- Verify checksums and balances.
- Switch the shard map to the new shard.
- Resume writes.
- Keep the old data read-only for a retention period.
For large accounts, this may need to happen with change data capture:
Initial copy
|
v
Replay changes from CDC log
|
v
Brief write pause
|
v
Final sync and cutover
The shard map must be highly available and strongly consistent enough that two services do not route writes for the same account to different shards.
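This sketch covers the shard map side of an account move, using the "pause writes briefly" option. Entries carry a version number and are changed only through compare-and-set, so a writer holding a stale view cannot overwrite a newer placement. `Placement`, `ShardMap`, `start_move`, and `finish_move` are hypothetical. The `threading.Lock` stands in for a strongly consistent metadata store. The copy, CDC replay, and checksum steps happen between `start_move` and `finish_move` and are not shown.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    shard: str
    version: int
    moving: bool = False


class AccountMoving(Exception):
    """Writes are paused while an account is being cut over."""


class ShardMap:
    def __init__(self):
        self._entries = {}
        self._lock = threading.Lock()  # stands in for a consistent metadata store

    def place(self, account_id: str, shard: str) -> None:
        with self._lock:
            self._entries[account_id] = Placement(shard, version=1)

    def get(self, account_id: str) -> Placement:
        return self._entries[account_id]

    def route_write(self, account_id: str) -> str:
        placement = self._entries[account_id]
        if placement.moving:
            raise AccountMoving(account_id)
        return placement.shard

    def compare_and_set(self, account_id: str, expected_version: int, new: Placement) -> bool:
        """Apply `new` only if nobody changed the entry since `expected_version`."""
        with self._lock:
            if self._entries[account_id].version != expected_version:
                return False
            self._entries[account_id] = new
            return True


def start_move(sm: ShardMap, account_id: str) -> bool:
    p = sm.get(account_id)
    return sm.compare_and_set(account_id, p.version, Placement(p.shard, p.version + 1, moving=True))


def finish_move(sm: ShardMap, account_id: str, new_shard: str) -> bool:
    p = sm.get(account_id)
    return sm.compare_and_set(account_id, p.version, Placement(new_shard, p.version + 1))
```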
Reconciliation at Scale
The larger the system gets, the more important reconciliation becomes. Reconciliation is a background process that verifies financial invariants.
Examples:
- Account balance equals the sum of ledger entries.
- Every completed transaction has the required ledger entries.
- Cross-shard transfers do not remain stuck too long.
- Payroll batch total equals the sum of posted employee deposits.
- Outbox events are eventually published.
- Downstream consumers have processed all required events.
At large scale, reconciliation should be incremental. Recomputing every account balance from the beginning of time every night is too expensive. Instead, track checkpoints:
account_id
last_reconciled_ledger_entry_id
last_reconciled_balance
last_reconciled_at
Then each reconciliation run processes only new ledger entries since the last checkpoint.
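Here is a minimal sketch of one incremental reconciliation step for a single account. It uses the `last_reconciled_ledger_entry_id` and `last_reconciled_balance` fields from the checkpoint above and omits `account_id` and `last_reconciled_at` for brevity. `LedgerEntry`, `Checkpoint`, and `reconcile_account` are hypothetical. The sketch assumes entries become visible in ID order. A real system must enforce that, for example by reconciling only entries older than a safety window, or a late-committing entry with a lower ID would be skipped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerEntry:
    id: int            # assumed to increase with commit order
    account_id: str
    amount_cents: int  # positive for credits, negative for debits


@dataclass(frozen=True)
class Checkpoint:
    last_reconciled_ledger_entry_id: int
    last_reconciled_balance: int


def reconcile_account(account_id, entries, checkpoint, stored_balance):
    """Replay only entries after the checkpoint; return (ok, checkpoint to keep)."""
    new_entries = sorted(
        (e for e in entries
         if e.account_id == account_id and e.id > checkpoint.last_reconciled_ledger_entry_id),
        key=lambda e: e.id,
    )
    expected = checkpoint.last_reconciled_balance + sum(e.amount_cents for e in new_entries)
    if expected != stored_balance:
        # Mismatch: alert and leave the checkpoint where it was.
        return False, checkpoint
    last_id = new_entries[-1].id if new_entries else checkpoint.last_reconciled_ledger_entry_id
    return True, Checkpoint(last_id, expected)
```

On a mismatch the checkpoint does not advance. The next run re-checks the same range, and any correction goes through an adjustment entry rather than an edit.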
Scaling the Outbox Publisher
The outbox table can also become large. If every money movement emits events, we may generate tens of millions of outbox rows per day.
Scaling strategies:
- Partition outbox_events by time.
- Poll with FOR UPDATE SKIP LOCKED so multiple publishers can work in parallel.
- Use event type or shard ID to split work.
- Keep published events for a retention period, then archive.
- Monitor outbox lag closely.
Example polling query:
SELECT *
FROM outbox_events
WHERE published_at IS NULL
ORDER BY created_at
LIMIT 100
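FOR UPDATE SKIP LOCKED;
Multiple publisher workers can safely run this query. Each worker locks a batch of unpublished events, publishes them, and marks them as published within the same transaction, so other workers skip those rows instead of waiting on them. If a worker crashes after publishing but before committing, another worker publishes those rows again. Delivery is therefore at-least-once, and consumers must deduplicate, for example by event ID.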
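Running the real query requires Postgres, so this is a pure-Python simulation of the same claiming pattern, not a database client. A per-row `threading.Lock` acquired with `blocking=False` plays the role of `SKIP LOCKED`, and releasing the lock after marking the row plays the role of `COMMIT`. `OutboxRow`, `claim_batch`, and `publisher` are hypothetical, and a list stands in for the message broker.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class OutboxRow:
    id: int
    payload: str
    published: bool = False
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False, compare=False)


def claim_batch(rows, limit):
    """Mimic FOR UPDATE SKIP LOCKED: take unlocked unpublished rows, skip locked ones."""
    claimed = []
    for row in rows:  # rows are assumed to be in created_at order
        if len(claimed) == limit:
            break
        if row.published:
            continue
        if row.lock.acquire(blocking=False):  # never wait on another worker's row
            if row.published:  # re-check: another worker may have just finished it
                row.lock.release()
                continue
            claimed.append(row)
    return claimed


def publisher(rows, broker, limit=10):
    while True:
        batch = claim_batch(rows, limit)
        if not batch:
            return
        for row in batch:
            broker.append(row.payload)  # "publish" (list.append is atomic in CPython)
            row.published = True        # like setting published_at
            row.lock.release()          # like COMMIT releasing the row lock
```

With several publisher threads running over the same rows, each event reaches the broker exactly once in this simulation. The real system only guarantees at-least-once delivery, because a crash can happen between publishing and committing.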
What Stays Strongly Consistent?
The final scalability answer should make one thing very clear: scaling does not mean making everything eventually consistent.
Strongly consistent:
- Account balance updates
- Ledger entry creation
- Idempotency records for money movement
- Account status checks for money movement
- Same-shard transfers
Eventually consistent:
- Notifications
- Analytics
- Search indexes
- Monthly statements
- Transaction history read models
- Fraud feature pipelines
- Cross-shard transfer completion if using a state-machine approach
This distinction is the heart of scaling a banking system. We scale the edges aggressively, but we keep the financial core correct.
Scalability Interview Script
If you need to summarize this section quickly in an interview, say:
I would first scale reads with replicas, caching, and denormalized read models.
Then I would partition huge append-only tables like ledger_entries by time and account hash.
When one primary can no longer handle write throughput or storage, I would shard by account_id.
Single-account operations stay local to one shard and remain ACID.
Same-shard transfers remain ACID.
Cross-shard transfers either require distributed transactions or a durable transfer state machine with idempotent steps, reconciliation, and reversals.
I would use a shard map or consistent hashing with virtual nodes for routing, but keep account movement controlled and auditable.
Throughout the system, balances and ledger writes stay strongly consistent, while notifications, analytics, statements, and search are eventually consistent.
25. Interview Summary
The most important thing in a banking system is correctness. We should design the system around ACID money movement, immutable ledger entries, idempotent APIs, and careful concurrency control.
For the source of truth, use a relational database like Postgres. Store current balances for fast reads, but also maintain an append-only ledger that explains every balance change. Use row-level locks or another clear concurrency strategy to prevent double spending.
For scale, separate critical writes from scalable reads. Balance-changing operations go to the primary database. Transaction history, statements, notifications, analytics, and other non-critical workflows can use replicas, caches, queues, and asynchronous processing.
If the interviewer pushes on edge cases, focus on the invariants:
- A transfer cannot partially complete.
- A request retry cannot duplicate money movement.
- A user cannot spend the same funds twice.
- A ledger entry is never silently changed.
- A stale read should not be mistaken for a fresh balance.
- External side effects should not break the core transaction.
That is the heart of the design: keep money correct first, then scale the surrounding system around that guarantee.