CS6963 Distributed Systems

Lecture 05 Optimistic Concurrency Control

Efficient Optimistic Concurrency Control using Loosely Synchronized Clocks
by Adya, Gruber, Liskov and Maheshwari.

  • Why this paper?

    • Tuesday: Argus, 2PC + 2PL for serializable transactions
    • Can be slow: must take locks even when there is no contention
    • Scalability limited: lots of locks/unlocks flying around
    • Argus also fused data and computation: people don't like that
    • Thor has a more conventional model
    • Goal: scalable protocol for enforcing serializable transactions across objects.
  • Thor overview

    • [clients, client caches, servers A-M N-Z]
    • Data sharded over servers
    • Code runs in clients (not like Argus; not an RPC system)
    • Clients read/write DB records from servers
    • Clients cache data locally for fast access
    • On client cache miss, fetch from server
  • Thor arrangement is fairly close to modern big web site habits

    • Clients, local fast cache, slower DB servers
    • Similar to Facebook/memcache paper
    • but Thor has much better semantics, strong guarantees
    • As a result, prone to unavailability during failures.
  • Thor programs use fully general transactions

    • Multi-operation
    • Serializable
    • So can do bank xfers w/o losing money...
  • Client caching makes transactions tricky

    • Writes have to invalidate cached copies
    • How to cope with reads of stale cached data?
    • How to cope with read-modify-write races?
    • Clients could lock before using each record
    • But that's slow - probably need to contact server
    • Wrecks the whole point of fast local caching in clients
    • (though caching read locks might be OK, as in paper Eval)
  • Thor uses optimistic concurrency control (OCC)

    • An idea from the early 1980s
    • Just read and write the local copy
    • Don't worry about other transactions until commit
    • When transaction wants to commit:
    • Send read/write info to server for "validation"
    • Validation decides if OK to commit -- if serializable
    • If yes, send invalidates to clients with cached copies of written records
    • If no, abort, discard writes
    • Optimistic b/c hopes for no conflict
    • If turns out to be true, fast!
    • If false, validation can detect, but slow
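The client side of this can be sketched as a small transaction object (my own hypothetical names, not Thor's API): reads and writes touch only the local cache, and the read set and write set go to a validator only at commit.

```python
class OCCTransaction:
    """Sketch of client-side OCC: no locks; run against the local cache
    and remember what was read and written (hypothetical interface)."""

    def __init__(self, cache, validate):
        self.cache = cache        # client's local copies of records
        self.validate = validate  # commit-time check (e.g. at the server)
        self.read_set = {}        # record -> value this txn observed
        self.write_set = {}       # record -> value this txn will install

    def read(self, name):
        if name in self.write_set:
            return self.write_set[name]      # our own buffered write
        val = self.cache[name]
        self.read_set.setdefault(name, val)  # remember first observation
        return val

    def write(self, name, value):
        self.write_set[name] = value         # buffered until commit

    def commit(self):
        # validation decides whether these reads/writes are serializable
        if self.validate(self.read_set, self.write_set):
            self.cache.update(self.write_set)  # install the writes
            return True
        return False                           # abort: writes discarded

# e.g. an increment, with a toy validator that always says yes
cache = {"x": 0}
t = OCCTransaction(cache, lambda reads, writes: True)
t.write("x", t.read("x") + 1)
committed = t.commit()
```

Note the single round trip: the only server communication is the commit-time validation message, not one exchange per record.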
  • What should validation do?

    • It looks at what the executing transactions read and wrote
    • Decides if there's a serial execution order that would have gotten the same results as the actual concurrent execution
    • There are many OCC validation algorithms!
    • I will outline a few, leading up to Thor's
  • Validation scheme #1

    • First, just let clients read/write as they see fit
    • A single validation server
    • Clients tell the validation server the VALUES read and written by each transaction that wants to commit
    • "read set" and "write set"
    • Validation must decide:
    • Would the results be serializable if we let these transactions commit?
    • Scheme #1 shuffles the transactions, looking for a serial order in which each read sees the value written by the most recent write; if such an order exists, the execution was serializable.

Validation example 1:

  initially, x=0 y=0 z=0
  T1: Rx0 Wx1
  T2: Rz0 Wz9
  T3: Ry1 Rx1
  T4: Rx0 Wy1
  • Validation needs to decide if this execution (reads, writes) is equivalent to some serial order.
  • Yes: one such order is T4, T1, T3, T2; says yes to all
    • (really T2 can go anywhere)
  • Note this scheme is far more permissive than Thor's

    • e.g. it allows transactions to see uncommitted writes (non-ACR)
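Scheme #1's search can be written as brute force over serial orders. A sketch, with my own encoding of a transaction as a list of (op, record, value) tuples:

```python
from itertools import permutations

def find_serial_order(txns, initial):
    """Try every ordering; return one in which each read sees the value
    written by the most recent preceding write, or None if none works."""
    for order in permutations(txns):
        state = dict(initial)
        ok = True
        for ops in order:
            for op, var, val in ops:
                if op == "R" and state[var] != val:
                    ok = False
                    break
                if op == "W":
                    state[var] = val
            if not ok:
                break
        if ok:
            return order
    return None

# Validation example 1 from above: initially x=y=z=0
T1 = [("R", "x", 0), ("W", "x", 1)]
T2 = [("R", "z", 0), ("W", "z", 9)]
T3 = [("R", "y", 1), ("R", "x", 1)]
T4 = [("R", "x", 0), ("W", "y", 1)]
order = find_serial_order([T1, T2, T3, T4], {"x": 0, "y": 0, "z": 0})
# an order such as T4, T1, T3, T2 is found, so validation says yes
```

Trying all n! orders is only for illustration; a real validator must be incremental.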
  • OCC is neat b/c transactions don't need to lock!

    • So they can run quickly from client caches
    • Just one msg exchange w/ validator per transaction
    • Not one locking exchange per record used
    • OCC excellent for T2 which didn't conflict with anything
    • We got lucky for T1 T3 T4, which do conflict

Validation example 2 -- sometimes must abort:

  initially, x=0 y=0
  T1: Rx0 Wx1
  T2: Rx0 Wy1
  T3: Ry0 Rx1
  • Values not consistent w/ any serial order!

    • T1 -> T3 (via x)
    • T3 -> T2 (via y)
    • T2 -> T1 (via x)
    • There's a cycle, so not the same as any serial execution
    • Perhaps T3 read a stale y=0 from its cache, or T2 read a stale x=0 from its cache
    • In this case validation can abort one of them then others are OK to commit
    • e.g. abort T2
    • Then T1, T3 is OK (but not T3, T1)
  • How should client handle abort?

    • Roll back the program (including writes to program variables)
    • Re-run from start of transaction
    • Hopefully won't be conflicts the second time
    • OCC is best when conflicts are uncommon!
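The abort-and-rerun loop on the client is just the following (a sketch; `body` is any function that re-executes the transaction from scratch and reports whether it committed):

```python
def run_with_retries(body, max_tries=10):
    """Re-run the transaction body from the start until it commits.
    With OCC, each abort discards buffered writes and rolled-back
    program state, so the body must be safe to re-execute."""
    for attempt in range(1, max_tries + 1):
        if body():          # True = committed, False = aborted
            return attempt  # how many tries it took
    raise RuntimeError("too much contention; giving up")

# hypothetical body that conflicts twice, then commits
tries = {"n": 0}
def transfer():
    tries["n"] += 1
    return tries["n"] >= 3

attempts = run_with_retries(transfer)   # commits on the 3rd try
```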
  • Do we need to validate read-only transactions?


    initially x=0 y=0
    T1: Wx1
    T2: Rx1 Wy2
    T3: Ry2 Rx0
  • i.e. T3 read a stale x=0 from its cache; it hadn't yet seen the invalidation.
  • Need to validate in order to abort T3.
  • Other OCC schemes can avoid validating read-only transactions

    • Keep multiple versions -- but Thor and my schemes don't
  • Is OCC better than locking?

    • yes, if few conflicts
    • avoids lock msgs, clients don't have to wait for locks
    • no, if many conflicts
    • OCC aborts, must re-start, perhaps many times
    • locking waits

Example: simultaneous increment

  With locking, T2 waits for T1's lock on x, then sees x=1 -- correct:

      T1: Rx0 Wx1
      T2: -------Rx1  Wx2

  With OCC, both read x=0 from their caches and both write x=1:

      T1: Rx0 Wx1
      T2: Rx0 Wx1

OCC: fast but wrong; validation must abort one and re-run it

  • We really want distributed OCC validation

    • Split storage and validation load over servers
    • Each storage server sees only txns that use its data
    • Each storage server validates just its part of the txn
    • Two-phase commit (2PC) to check that they all say "yes"
    • Only really commit if all relevant servers say "yes"
  • Can we just distribute validation scheme #1?

  • Imagine server S1 knows about x, server S2 knows about y

Example 2 again:

    T1: Rx0 Wx1
    T2: Rx0 Wy1
    T3: Ry0 Rx1

S1 validates just x information:

    T1: Rx0 Wx1
    T2: Rx0
    T3: Rx1

Answer is "yes" (T2 T1 T3)
S2 validates just y information:

    T2: Wy1
    T3: Ry0

Answer is "yes" (T3 T2)
but we know the real answer is "no"!

  • So simple distributed validation does not work

    • The validators must choose the same serial order!
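The failure is easy to reproduce by running the scheme #1 brute-force check separately on each server's projection of example 2 (my own encoding, redefined here so the sketch is self-contained):

```python
from itertools import permutations

def validates(txns, initial):
    """Is there any serial order where each read sees the latest write?"""
    for order in permutations(txns):
        state = dict(initial)
        ok = True
        for ops in order:
            for op, var, val in ops:
                if op == "R" and state[var] != val:
                    ok = False
                if op == "W":
                    state[var] = val
        if ok:
            return True
    return False

# Example 2, transactions as lists of (op, record, value)
T1 = [("R", "x", 0), ("W", "x", 1)]
T2 = [("R", "x", 0), ("W", "y", 1)]
T3 = [("R", "y", 0), ("R", "x", 1)]

def project(txn, var):            # the ops a given server sees
    return [op for op in txn if op[1] == var]

s1_ok = validates([project(t, "x") for t in (T1, T2, T3)], {"x": 0})
s2_ok = validates([project(t, "y") for t in (T1, T2, T3)], {"y": 0})
global_ok = validates([T1, T2, T3], {"x": 0, "y": 0})
# s1_ok and s2_ok are both True, but global_ok is False: each server
# found a serial order, just not the SAME serial order
```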
  • Validation scheme #2

    • Idea: client (or coordinator) chooses timestamp for committing txn
      • from loosely synchronized clocks, as in Thor
    • Validation checks that reads and writes are consistent with TS order
    • Solves distrib validation problem:
      • Timestamps tell the validators the order to check
      • So "yes" votes will refer to the same order

Example 2 again, with timestamps:

  T1@100: Rx0 Wx1
  T2@110: Rx0 Wy1
  T3@105: Ry0 Rx1

S1 validates just x information:

    T1@100: Rx0 Wx1
    T2@110: Rx0
    T3@105: Rx1

Timestamps say order must be T1, T3, T2
does not validate! T2 should have seen x=1

S2 validates just y information:

    T2@110: Wy1
    T3@105: Ry0

Timestamps say order must be T3, T2

S1 says no, S2 says yes, two-phase commit coordinator will abort
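The per-server checks above can be sketched as follows (my encoding: each transaction is a (timestamp, ops) pair). There is no search; each server replays the one order the timestamps dictate and verifies every read:

```python
def validate_ts(txns, initial):
    """Check the execution against the single serial order the
    timestamps dictate; every server checks this same order."""
    state = dict(initial)
    for ts, ops in sorted(txns, key=lambda t: t[0]):
        for op, var, val in ops:
            if op == "R" and state[var] != val:
                return False     # read inconsistent with TS order: abort
            if op == "W":
                state[var] = val
    return True

# Example 2 with timestamps, as each server sees it:
s1 = [(100, [("R", "x", 0), ("W", "x", 1)]),   # T1@100
      (110, [("R", "x", 0)]),                  # T2@110
      (105, [("R", "x", 1)])]                  # T3@105
s2 = [(110, [("W", "y", 1)]),                  # T2@110
      (105, [("R", "y", 0)])]                  # T3@105

s1_vote = validate_ts(s1, {"x": 0})   # False: T2 should have seen x=1
s2_vote = validate_ts(s2, {"y": 0})   # True
# S1 votes no, so the 2PC coordinator aborts T2
```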

  • What have we given up by requiring timestamp order?


    T1@100: Rx0 Wx1
    T2@50: Rx1 Wx2
  • T2 follows T1 in real time, and sees T1's write

    • but T2 will abort, since TS says T2 comes first, so T1 should have seen x=2
      • could have committed, since T1 then T2 works
    • this will happen if client clocks are too far off
      • if T1's client clock is ahead, or T2's behind
    • so: requiring TS order can abort unnecessarily
      • Because validation no longer searching for an order that works
      • Instead merely checking that TS order consistent w/ reads, writes
      • We've given up some optimism by requiring TS order
    • Maybe not a problem if clocks closely synched
    • Maybe not a problem if conflicts are rare
  • Problem with schemes so far:

    • Commit messages contained values, which can be big
    • Could instead use version numbers to check whether later txn read earlier txn's write
    • Let's use writing txn's TS as record version number
  • Validation scheme #4

    • Tag each DB record (and cached record) with the TS of the transaction that last wrote it
    • Validation requests carry TS of each record read

Our example for scheme #4:

  all values start with timestamp 0
  T1@100: Rx@0 Wx
  T2@110: Rx@0 Wy
  T3@105: Ry@0 Rx@100
  • Note:
    • Reads carry the timestamp that was in the record read, not the value
    • Writes include neither value nor timestamp

S1 validates just x information: orders the transactions by timestamp:

    T1@100: Rx@0 Wx
    T3@105: Rx@100
    T2@110: Rx@0

The question: does each read see the most recent write?
T3 is ok, but T2 is not

S2 validates just y information: again, sort by TS, check each read saw latest write:

    T3@105: Ry@0
    T2@110: Wy

This does validate
So scheme #4 aborts, correctly, reasoning only about version TSs
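Scheme #4's check can be sketched the same way, but carrying only version timestamps, never values (my encoding: (txn_ts, reads, writes), where each read pairs a record with the version TS it observed):

```python
def validate_versions(txns, initial_ts=0):
    """Replay in timestamp order; each read must have seen the TS of
    the most recent writer of that record (no values needed)."""
    version = {}   # record -> TS of the last write so far
    for ts, reads, writes in sorted(txns, key=lambda t: t[0]):
        for var, seen_ts in reads:
            if version.get(var, initial_ts) != seen_ts:
                return False    # stale read: abort
        for var in writes:
            version[var] = ts   # writer's TS becomes the record's version
    return True

# the scheme #4 example, projected per server:
s1 = [(100, [("x", 0)],   ["x"]),   # T1@100: Rx@0 Wx
      (110, [("x", 0)],   []),      # T2@110: Rx@0  (stale!)
      (105, [("x", 100)], [])]      # T3@105: Rx@100
s2 = [(110, [],           ["y"]),   # T2@110: Wy
      (105, [("y", 0)],   [])]      # T3@105: Ry@0

s1_vote = validate_versions(s1)   # False: T2 saw version 0, not 100
s2_vote = validate_versions(s2)   # True
```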

  • What have we given up by thinking about version #s rather than values?
    • Maybe version numbers are different but values are the same e.g.
    T1@100: Wx1
    T2@110: Wx2
    T3@120: Wx1
    T4@130: Rx1@100
  • Timestamps say we should abort T4 b/c read a stale version

    • Should have read T3's write
    • So scheme #4 will abort
    • But T4 read the correct value: x=1
    • So abort wasn't necessary
  • Problem: per-record timestamp might use too much storage space

    • Thor wants to avoid space overhead
    • Maybe important, maybe not
  • Validation scheme #5

    • Thor's validation scheme: no timestamps on records
    • How can validation detect that a transaction read stale data?
    • It read stale data b/c earlier txn's invalidation hadn't yet arrived!
    • So server can track invalidation msgs that might not have arrived yet
      • "invalid set" - one per client
      • Delete invalid set entry when client ACKs invalidation msg
      • Server aborts committing txn if it read record in client's invalid set
      • Client aborts running txn if it read record mentioned in invalidation
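A minimal sketch of the server's invalid-set bookkeeping described above (my own names, not Thor's):

```python
class InvalidSets:
    """Per-client sets of records that have been invalidated but whose
    invalidation message the client has not yet ACKed (a sketch)."""

    def __init__(self):
        self.invalid = {}   # client -> set of possibly-stale records

    def on_commit_write(self, record, caching_clients):
        # a committed txn wrote `record`: each caching client gets an
        # invalidation msg and stays in the invalid set until it ACKs
        for c in caching_clients:
            self.invalid.setdefault(c, set()).add(record)

    def on_ack(self, client, record):
        self.invalid.get(client, set()).discard(record)

    def validate(self, client, read_set):
        # abort any committing txn that read a possibly-stale record
        return not (set(read_set) & self.invalid.get(client, set()))

# T1@100 wrote x, which client C1 caches; C1 then tries to commit a
# txn that read x before the invalidation arrived
srv = InvalidSets()
srv.on_commit_write("x", ["C1"])
before_ack = srv.validate("C1", {"x"})   # False: txn aborts
srv.on_ack("C1", "x")
after_ack = srv.validate("C1", {"x"})    # True: C1 has re-fetched x
```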
  • Example use of invalid set

    • [timeline]
    • Client C1:
      • T2@105 ... Rx ... 2PC commit point
      • imagine that client acts as 2PC coordinator
    • Server:
      • VQ: T1@100 Wx
      • T1 committed, x in C1's invalid set
        • server has sent invalidation message to C1
  • Three cases:

    1. Invalidation arrives before T2 reads:
      • Rx will miss in client cache, read fresh data from the server
      • Client will (probably) return ACK before T2 commits
      • Server won't abort T2
    2. Invalidation arrives after T2 reads, before commit point:
      • Client will abort T2 in response to invalidation
    3. Invalidation arrives after 2PC commit point:
      • i.e. after all servers replied to prepare
      • This means the client was still in the invalid set when
        • the server tried to validate the transaction
      • So the server aborted, so the client will abort too

So: Thor's validation detects stale reads w/o timestamp on each record.


  • Figure 5

    • AOCC is Thor
    • Comparing to ACBL: client talks to server to get write locks and to commit non-read-only txns, but can cache read locks along with data
    • why does Thor (AOCC) have higher throughput?
      • fewer msgs; commit only, no lock msgs
    • why does Thor throughput go up for a while w/ more clients?
      • apparently a single client can't keep all resources busy
      • maybe due to network RTT?
      • maybe due to client processing time? or think time?
      • more clients -> more parallel xactions -> more completed
    • why does Thor throughput level off?
      • maybe 15 clients is enough to saturate server disk or CPU
      • about 100 xactions/second, about right for disk writes
    • why does Thor throughput drop with many clients?
      • more clients means more concurrent xactions at any given time
      • more concurrency means more chance of conflict
      • for OCC, more conflict means more aborts, so more wasted CPU
  • Conclusions

    • Fast client caching + transactions would be excellent
    • Distributed OCC very interesting, still an open research area
    • Avoiding per-record version #s doesn't seem compelling
    • Thor's use of time was influential, e.g. Spanner