The R-M-W Race in Cloudflare KV: A Quota Bug and How We Fixed It

Cloudflare KV is eventually consistent: a write made in one location can take up to a minute or longer to become visible to reads elsewhere, and KV offers no atomic compare-and-swap. When two concurrent writers perform a read-modify-write on the same key, one of the updates can be lost. This bit TermOnMac in its extra quota counter, and the fix illustrates a general pattern for updating KV safely.

The Original Design

Originally, the extra quota grant and the used counter were stored in a single KV key:

extra_quota:{userId} → { limit, used, created_at, expires_at }

To consume from extra quota, the code would:

  1. Read the record
  2. Increment used
  3. Write the record back

This is a textbook read-modify-write. Under concurrent load, two writers can both read used=100, each add its own 100, and both write back used=200. The stored counter ends at 200 instead of 300, so one increment is lost.
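The lost update can be reproduced without any networking. The sketch below uses a plain Map as a stand-in for a single KV key (an assumption for illustration; real KV calls are asynchronous and remote) and spells out the one interleaving that loses an increment: both reads happen before either write.

```typescript
// A Map stands in for one KV key (assumption: KV access modeled synchronously).
type QuotaRecord = { limit: number; used: number };

const kv = new Map<string, string>();
const key = "extra_quota:user-1"; // hypothetical user id
kv.set(key, JSON.stringify({ limit: 1000, used: 100 }));

// Both writers read the record before either one writes.
const readA: QuotaRecord = JSON.parse(kv.get(key)!);
const readB: QuotaRecord = JSON.parse(kv.get(key)!);

// Each adds its own consumption of 100 and writes the whole record back.
kv.set(key, JSON.stringify({ ...readA, used: readA.used + 100 }));
kv.set(key, JSON.stringify({ ...readB, used: readB.used + 100 }));

const finalUsed: number = JSON.parse(kv.get(key)!).used;
console.log(finalUsed); // 200, not 300: writer B overwrote writer A's increment
```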

Worse: if an admin granted a fresh extra quota (resetting used to 0) while a Room DO held a cached copy of the old record, the DO's next flush of its in-memory counter would write the old record back over the fresh grant, silently wiping out the bonus.
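A similar sketch replays the stale-flush sequence, again with a Map standing in for KV and hypothetical values: the DO caches the old record, an admin writes a fresh grant, and the DO then flushes its cached copy plus its pending usage.

```typescript
// A Map stands in for KV; all values are hypothetical.
type QuotaRecord = { limit: number; used: number; created_at: number; expires_at: number };

const DAY_MS = 86_400_000;
const kv = new Map<string, string>();
const KEY = "extra_quota:user-1";

// 1. The Room DO reads the old grant (900 of 1000 used) and caches it.
kv.set(KEY, JSON.stringify({ limit: 1000, used: 900, created_at: 0, expires_at: 7 * DAY_MS }));
const cached: QuotaRecord = JSON.parse(kv.get(KEY)!);
const pendingUsage = 50; // usage the DO has accumulated in memory since then

// 2. An admin issues a fresh grant, resetting used to 0.
kv.set(KEY, JSON.stringify({ limit: 1000, used: 0, created_at: DAY_MS, expires_at: 8 * DAY_MS }));

// 3. The DO flushes its cached record with the pending usage added.
kv.set(KEY, JSON.stringify({ ...cached, used: cached.used + pendingUsage }));

const after: QuotaRecord = JSON.parse(kv.get(KEY)!);
console.log(after.used, after.created_at); // 950 0: the fresh grant has been overwritten
```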

The Fix: Split Keys

The current implementation splits the grant from the used counter:

// relay_server/src/usage-tracking.ts
function extraQuotaKey(userId: string): string {
  return `extra_quota:${userId}`;       // grant — written once, never modified
}

function extraQuotaUsedKey(userId: string): string {
  return `extra_quota_used:${userId}`; // used counter — separate key
}

The grant key is written only when a grant is issued (a replacement grant overwrites it) and is never modified by consumption. Only the used counter key is updated when quota is consumed.

Why This Helps

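The grant key is no longer subject to R-M-W. Two consumers racing on the used counter can still lose an update: each writes back its own used + consumed, so the stored counter under-counts usage by whatever the losing writer consumed. That leak is bounded by the consumption in flight during the race, and the racers can no longer clobber the grant itself.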
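Replaying the same race against the split layout (hypothetical values, Map standing in for KV) shows both halves of that claim: the used counter still loses an increment, but the grant key is left byte-for-byte unchanged.

```typescript
// A Map stands in for KV; ids and values are hypothetical.
const kv = new Map<string, string>();
const grantKey = "extra_quota:user-1";
const usedKey = "extra_quota_used:user-1";

const grantJson = JSON.stringify({ limit: 1000, created_at: 1_000, expires_at: 9_000 });
kv.set(grantKey, grantJson);
kv.set(usedKey, JSON.stringify({ used: 100, grant_created_at: 1_000 }));

// Two consumers each read the used counter before either writes.
const a = JSON.parse(kv.get(usedKey)!);
const b = JSON.parse(kv.get(usedKey)!);
kv.set(usedKey, JSON.stringify({ used: a.used + 100, grant_created_at: a.grant_created_at }));
kv.set(usedKey, JSON.stringify({ used: b.used + 100, grant_created_at: b.grant_created_at }));

console.log(JSON.parse(kv.get(usedKey)!).used); // 200: one increment lost (under-count)
console.log(kv.get(grantKey) === grantJson);    // true: the grant was never written
```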

The consumeExtraQuota function explicitly documents this:

/**
 * Consume from extra quota. Returns the amount actually consumed
 * (0 if expired/depleted, partial if remainder < amount).
 *
 * Writes ONLY to the used counter key (extra_quota_used:{userId}), never to
 * the grant key. This prevents Room DO flushes from overwriting admin grants
 * via stale KV cache reads.
 */
export async function consumeExtraQuota(
  kv: KVNamespace,
  userId: string,
  amount: number,
): Promise<number> {
  const record = await getExtraQuota(kv, userId);
  if (!record) return 0;
  const remaining = record.limit - record.used;
  if (remaining <= 0) return 0;
  const consumed = Math.min(amount, remaining);
  // Write ONLY to used counter key — grant key is never touched
  const usedRecord = { used: record.used + consumed, grant_created_at: record.created_at };
  const remainingTtlMs = record.expires_at - Date.now() + 86400_000; // +1 day buffer
  const ttlSeconds = Math.max(60, Math.ceil(remainingTtlMs / 1000)); // KV rejects expirationTtl below 60 s
  await kv.put(extraQuotaUsedKey(userId), JSON.stringify(usedRecord), { expirationTtl: ttlSeconds });
  return consumed;
}
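To see the full, partial, and depleted return values in action, the following sketch condenses getExtraQuota and consumeExtraQuota into one self-contained file. MemoryKV is a hypothetical in-memory stand-in for the two KVNamespace methods used here; it ignores expirationTtl, and the legacy-format branch is omitted for brevity.

```typescript
// Hypothetical in-memory stand-in for the KVNamespace methods used below.
// It ignores expirationTtl; real KV is remote and eventually consistent.
class MemoryKV {
  private store = new Map<string, string>();
  async get(key: string): Promise<string | null> {
    return this.store.get(key) ?? null;
  }
  async put(key: string, value: string, _opts?: { expirationTtl?: number }): Promise<void> {
    this.store.set(key, value);
  }
}

const DAY_MS = 86_400_000;
const grantKey = (id: string) => `extra_quota:${id}`;
const usedKey = (id: string) => `extra_quota_used:${id}`;

// Condensed getExtraQuota: grant plus version-checked used counter.
async function getQuota(kv: MemoryKV, id: string) {
  const grantJson = await kv.get(grantKey(id));
  if (!grantJson) return null;
  const grant = JSON.parse(grantJson);
  if (Date.now() > grant.expires_at) return null;
  const usedJson = await kv.get(usedKey(id));
  const counter = usedJson ? JSON.parse(usedJson) : null;
  const used: number = counter && counter.grant_created_at === grant.created_at ? counter.used : 0;
  return { limit: grant.limit as number, used, created_at: grant.created_at as number, expires_at: grant.expires_at as number };
}

// Condensed consumeExtraQuota: writes only the used counter key.
async function consume(kv: MemoryKV, id: string, amount: number): Promise<number> {
  const rec = await getQuota(kv, id);
  if (!rec) return 0;
  const remaining = rec.limit - rec.used;
  if (remaining <= 0) return 0;
  const consumed = Math.min(amount, remaining);
  // One-day buffer past expiry; 60 s is KV's minimum expirationTtl.
  const ttlSeconds = Math.max(60, Math.ceil((rec.expires_at - Date.now() + DAY_MS) / 1000));
  const counter = { used: rec.used + consumed, grant_created_at: rec.created_at };
  await kv.put(usedKey(id), JSON.stringify(counter), { expirationTtl: ttlSeconds });
  return consumed;
}

// Three requests of 600 against a 1000-unit grant: full, partial, depleted.
async function demo(): Promise<number[]> {
  const kv = new MemoryKV();
  const now = Date.now();
  await kv.put(grantKey("u1"), JSON.stringify({ limit: 1000, created_at: now, expires_at: now + 7 * DAY_MS }));
  const results: number[] = [];
  for (let i = 0; i < 3; i++) results.push(await consume(kv, "u1", 600));
  return results;
}

demo().then((r) => console.log(r.join(", "))); // 600, 400, 0
```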

Version Check via grant_created_at

One case remains. Suppose a fresh grant has been issued, but a Room DO is still working from a stale cached read of the previous grant and writes a used counter computed against it. That counter must not be charged against the new grant.

The fix is a version field — the used counter records the created_at timestamp of the grant it belongs to:

const usedRecord = { used: record.used + consumed, grant_created_at: record.created_at };

When reading the used counter, we check that it matches the current grant:

export async function getExtraQuota(
  kv: KVNamespace,
  userId: string,
): Promise<ExtraQuotaRecord | null> {
  const grantJson = await kv.get(extraQuotaKey(userId));
  if (!grantJson) return null;
  const grant = JSON.parse(grantJson);
  if (Date.now() > grant.expires_at) return null;

  // Read used counter from separate key
  const usedJson = await kv.get(extraQuotaUsedKey(userId));
  let used = 0;
  if (usedJson) {
    const usedRecord = JSON.parse(usedJson);
    // Version check: only trust used counter if it matches current grant
    if (usedRecord.grant_created_at === grant.created_at) {
      used = usedRecord.used;
    }
    // else: stale used counter from previous grant — treat as 0
  } else if (typeof grant.used === "number") {
    // Backward compat: old single-record format with used in grant
    used = grant.used;
  }

  return { limit: grant.limit, used, created_at: grant.created_at, expires_at: grant.expires_at };
}

If the used counter’s grant_created_at doesn’t match the current grant’s created_at, the counter is from a previous grant. It’s treated as zero — meaning the new grant starts with full balance.
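The version check is easiest to see in isolation. resolveUsed below is a hypothetical pure-function extraction of the used-counter branch of getExtraQuota (legacy format omitted), applied to made-up timestamps.

```typescript
interface Grant { limit: number; created_at: number; expires_at: number }
interface UsedCounter { used: number; grant_created_at: number }

// Hypothetical extraction of getExtraQuota's used-counter branch.
function resolveUsed(grant: Grant, counter: UsedCounter | null): number {
  if (!counter) return 0;
  // Trust the counter only if it is stamped with this grant's created_at.
  return counter.grant_created_at === grant.created_at ? counter.used : 0;
}

const oldGrant: Grant = { limit: 1000, created_at: 1_000, expires_at: 9_000 };
const newGrant: Grant = { limit: 1000, created_at: 2_000, expires_at: 10_000 };
// A counter written by a DO whose cached view was still the old grant.
const staleCounter: UsedCounter = { used: 950, grant_created_at: oldGrant.created_at };

console.log(resolveUsed(oldGrant, staleCounter)); // 950
console.log(resolveUsed(newGrant, staleCounter)); // 0: the new grant starts at full balance
```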

Grant Replacement

grantExtraQuota (called by admin commands) writes a new grant and deletes the old used counter:

export async function grantExtraQuota(
  kv: KVNamespace,
  userId: string,
  limit: number = EXTRA_QUOTA_LIMIT,
  days: number = 7,
): Promise<ExtraQuotaRecord> {
  const now = Date.now();
  const durationMs = days * 24 * 3600 * 1000;
  const grant = {
    limit,
    created_at: now,
    expires_at: now + durationMs,
  };
  const ttlSeconds = Math.ceil(durationMs / 1000) + 86400; // +1 day GC buffer
  // Write grant record — separate from used counter to prevent R-M-W race
  await kv.put(extraQuotaKey(userId), JSON.stringify(grant), { expirationTtl: ttlSeconds });
  // Delete used counter — new grant starts fresh; stale reads are handled
  // by grant_created_at version check in consumeExtraQuota/getExtraQuota
  await kv.delete(extraQuotaUsedKey(userId));
  return { limit, used: 0, created_at: now, expires_at: now + durationMs };
}

The deletion of the used counter is a hint, not a guarantee. The version check is what actually protects against stale reads — even if the deletion hasn’t propagated through KV’s eventual consistency, the version check rejects stale data.
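Both keys carry a TTL one day past the grant's expires_at. The worked arithmetic below mirrors the TTL expressions in grantExtraQuota and consumeExtraQuota (timestamps are hypothetical; the lower clamp is omitted because the one-day buffer keeps it from binding) and checks that the grant key and a used counter written mid-grant expire at the same moment, give or take Math.ceil's one-second rounding.

```typescript
const DAY_MS = 86_400_000;

// Grant key TTL, mirroring grantExtraQuota: duration plus a one-day buffer.
function grantTtlSeconds(days: number): number {
  const durationMs = days * 24 * 3600 * 1000;
  return Math.ceil(durationMs / 1000) + 86400;
}

// Used-counter TTL, mirroring consumeExtraQuota (lower clamp omitted).
function usedTtlSeconds(expiresAt: number, now: number): number {
  return Math.ceil((expiresAt - now + DAY_MS) / 1000);
}

const grantedAt = 1_700_000_000_000;       // hypothetical grant time (ms)
const expiresAt = grantedAt + 7 * DAY_MS;  // 7-day grant
const consumedAt = grantedAt + 4 * DAY_MS; // a consumption four days in

const grantTtl = grantTtlSeconds(7);
const usedTtl = usedTtlSeconds(expiresAt, consumedAt);
console.log(grantTtl); // 691200 (7 days + 1 day)
console.log(usedTtl);  // 345600 (3 days remaining + 1 day)

// Both keys expire at expires_at + 1 day.
console.log(grantedAt + grantTtl * 1000 === consumedAt + usedTtl * 1000); // true
```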

Backward Compatibility

The getExtraQuota function handles records written in the old single-key format:

} else if (typeof grant.used === "number") {
  // Backward compat: old single-record format with used in grant
  used = grant.used;
}

Old records still have used inside the grant. New records have used in a separate key. Both formats are readable.
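The sketch below, with a Map standing in for KV and hypothetical values, follows an old-format record through its first consumption. readUsed applies the same precedence as getExtraQuota (separate counter first, legacy field second), and the counter written on consumption is stamped with the old record's created_at, so it passes the version check from then on.

```typescript
// A Map stands in for KV; ids and values are hypothetical.
const kv = new Map<string, string>();
const grantKey = "extra_quota:user-1";
const usedKey = "extra_quota_used:user-1";

// Legacy single-key record: used lives inside the grant.
kv.set(grantKey, JSON.stringify({ limit: 1000, used: 300, created_at: 5_000, expires_at: 9_000 }));

// Same precedence as getExtraQuota: separate counter first, legacy field second.
// (Expiry check omitted.)
function readUsed(): number {
  const grant = JSON.parse(kv.get(grantKey)!);
  const counterJson = kv.get(usedKey);
  if (counterJson) {
    const counter = JSON.parse(counterJson);
    return counter.grant_created_at === grant.created_at ? counter.used : 0;
  }
  return typeof grant.used === "number" ? grant.used : 0;
}

const before = readUsed(); // 300, read from the legacy field

// First consumption of 100: write the separate counter, stamped with the grant's created_at.
const legacyGrant = JSON.parse(kv.get(grantKey)!);
kv.set(usedKey, JSON.stringify({ used: before + 100, grant_created_at: legacyGrant.created_at }));

const after = readUsed(); // 400, now read from the separate counter
console.log(before, after); // 300 400
```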