Networking & Reliability

Heartbeat Design: Keeping Durable Objects Awake

TermOnMac has two heartbeat configurations — one for the Mac side and one for the iOS side. The Mac’s is dynamic: 30 seconds when idle, 1 second when a user is actively working. iOS is a flat 3 seconds. Both are shaped less by “how long until we notice a dead connection” and more by a quieter concern: Cloudflare Durable Object hibernation.

The Numbers

Mac (RelayConnection.swift):

// mac_agent/Sources/MacAgentLib/RelayConnection.swift
public init(...,
    heartbeatInterval: TimeInterval = 30,
    heartbeatAckTimeout: TimeInterval = 60,
    ...)

private static let activeHeartbeatInterval: TimeInterval = 1
private static let activeWindow: TimeInterval = 15 * 60  // 15 minutes
  • Default heartbeat interval: 30 seconds
  • Active interval: 1 second, used when either (a) the iOS peer is authenticated, or (b) user activity was observed in the last 15 minutes
  • No ack received in 60 seconds → reconnect

The dynamic interval logic lives in startHeartbeat():

// mac_agent/Sources/MacAgentLib/RelayConnection.swift
while !Task.isCancelled {
    let interval: TimeInterval
    if self.peerAuthenticated {
        interval = Self.activeHeartbeatInterval
    } else if let last = self.lastUserActivityTime,
              Date().timeIntervalSince(last) < Self.activeWindow {
        interval = Self.activeHeartbeatInterval
    } else {
        interval = defaultInterval  // 30s
    }
    try? await Task.sleep(for: .seconds(interval))
    // ...send heartbeat...
}

iOS (iOSRelayConnection.swift):

// ios_remote_dev_ios_app/RemoteDevApp/Network/iOSRelayConnection.swift
init(...,
    heartbeatInterval: TimeInterval = 3,
    heartbeatAckTimeout: TimeInterval = 15,
    reconnectTotalTimeout: TimeInterval = 300)
  • Heartbeat sent every 3 seconds
  • No ack received in 15 seconds → reconnect
  • Total reconnect attempt window: 300 seconds (5 minutes)

Relay (room.ts):

// relay_server/src/room.ts
const HEARTBEAT_INTERVAL = 30_000;      // 30s — drives the alarm schedule
const HEARTBEAT_DEAD_THRESHOLD = 45_000; // 45s — grace window (~1.5 missed Mac heartbeats)

How the Relay Detects Dead Connections

The relay’s alarm fires every 30 seconds. It checks the lastSeen timestamp for each socket:

async alarm(): Promise<void> {
  const now = Date.now();
  const macLastSeen = await this.state.storage.get<number>("lastSeen:mac") ?? 0;
  const macAge = macLastSeen > 0 ? now - macLastSeen : -1;

  if (this.macSocket && macLastSeen > 0 && macAge > HEARTBEAT_DEAD_THRESHOLD) {
    this.send(this.macSocket, {
      type: "error",
      code: "HEARTBEAT_TIMEOUT",
      message: "Heartbeat timeout",
    });
    this.macSocket.close(4000, "heartbeat timeout");
    if (this.iosSocket) {
      this.send(this.iosSocket, { type: "peer_disconnected", reason: "Mac heartbeat timeout" });
    }
    this.macSocket = null;
  }
  // Same logic for iOS...
}

The 45-second dead threshold is 1.5× the 30-second heartbeat interval. A single late heartbeat at the Mac’s idle cadence (30 seconds without a message) doesn’t trigger a disconnect. The connection must have been silent for more than 45 seconds, one and a half idle intervals, when the alarm checks it.
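Because the check only runs when the alarm fires, the moment a silent socket is actually closed depends on where its last message fell relative to the alarm schedule. A minimal sketch of that arithmetic, assuming the alarm fires exactly every 30 seconds; detectionLatency and its parameter are hypothetical names for illustration, not part of room.ts:

```typescript
// Constants mirror relay_server/src/room.ts.
const HEARTBEAT_INTERVAL = 30_000;       // alarm period in ms
const HEARTBEAT_DEAD_THRESHOLD = 45_000; // silence tolerated before close, in ms

/**
 * Hypothetical helper: returns how long (ms) a socket has been silent at the
 * first alarm that sees `age > HEARTBEAT_DEAD_THRESHOLD`.
 *
 * `msUntilNextAlarm` is the gap between the socket's last message and the
 * next alarm (assumed to satisfy 0 < msUntilNextAlarm <= HEARTBEAT_INTERVAL).
 */
function detectionLatency(msUntilNextAlarm: number): number {
  let alarmAt = msUntilNextAlarm;
  // The relay's check is strictly "greater than", so an alarm landing exactly
  // on the threshold does not close the socket.
  while (alarmAt <= HEARTBEAT_DEAD_THRESHOLD) {
    alarmAt += HEARTBEAT_INTERVAL;
  }
  return alarmAt;
}

// Last message arrived 16s before an alarm: closed at the second alarm.
const nearBestCase = detectionLatency(16_000);
// Last message arrived 15s before an alarm: the second alarm lands exactly on
// the threshold, so the socket survives until the third.
const worstCase = detectionLatency(15_000);
```

Under these assumptions a silent socket is closed somewhere between just over 45 seconds and 75 seconds after its last message.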

Why the Mac Flips to a 1-Second Heartbeat During Active Use

The first version of this was a fixed 30-second heartbeat. That matched the relay’s HEARTBEAT_INTERVAL, stayed inside the 45-second HEARTBEAT_DEAD_THRESHOLD, and kept the DO healthy. What it didn’t account for was Cloudflare Durable Object hibernation: when there are no events for a while, the DO hibernates, in-memory state is reconstructed on the next wake, and latency-sensitive operations (like the first keystroke after a pause) pay the wake-up cost. A 30-second heartbeat is slow enough that the DO has time to slide into hibernation between beats.

The fix was to send heartbeats aggressively only when it matters. When the iOS peer is authenticated, or when user activity has been observed within the last 15 minutes, the Mac sends a heartbeat every second. Otherwise it reverts to the default 30-second cadence. The lastUserActivityTime timestamp is updated on peer messages and PTY output, so once no iOS peer is authenticated and 15 minutes pass without activity, the session decays back to 30s and the DO is free to hibernate.
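The decay rule can be expressed as a pure function, which makes the 15-minute boundary easy to check. This is a hypothetical TypeScript port of the Swift logic shown earlier, written for illustration only; the function and constant names are assumptions, and timestamps are plain seconds rather than Date values:

```typescript
const ACTIVE_INTERVAL_S = 1;       // mirrors activeHeartbeatInterval
const DEFAULT_INTERVAL_S = 30;     // mirrors the default heartbeatInterval
const ACTIVE_WINDOW_S = 15 * 60;   // mirrors activeWindow (15 minutes)

/**
 * Hypothetical helper: picks the next heartbeat interval in seconds.
 * `lastUserActivityS` is null when no activity has ever been observed.
 */
function heartbeatIntervalSeconds(
  peerAuthenticated: boolean,
  lastUserActivityS: number | null,
  nowS: number,
): number {
  // An authenticated iOS peer always keeps the fast cadence.
  if (peerAuthenticated) return ACTIVE_INTERVAL_S;
  // Recent activity keeps the fast cadence until the window expires.
  if (lastUserActivityS !== null && nowS - lastUserActivityS < ACTIVE_WINDOW_S) {
    return ACTIVE_INTERVAL_S;
  }
  return DEFAULT_INTERVAL_S;
}

// With no peer and the last activity at t = 0:
const justInsideWindow = heartbeatIntervalSeconds(false, 0, 899);  // fast cadence
const windowExpired = heartbeatIntervalSeconds(false, 0, 900);     // back to default
```

In message terms, the fast cadence costs 3,600 heartbeats per hour against 120 at the default, which is why it is limited to the window where it pays for itself.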

This is the driver behind the numbers. The interval isn’t about “how long does the user want to wait to notice a dead connection” — it’s about keeping the relay DO awake during the window where a user is actively typing.

Why iOS Uses 3 Seconds

The iOS side is static at 3 seconds. It doesn’t share the Mac’s 1s/30s dynamic, because the iOS app’s foreground state already implies active use: it only runs heartbeats while the user has the app open, and in that case the DO should stay warm. 3 seconds is a compromise: fast enough that a dead relay connection is detected within 15 seconds (the ack timeout, 5× the heartbeat interval), slow enough not to drain battery or hammer the cellular radio with a WebSocket frame every second.
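The relationship between the 3-second interval and the 15-second ack timeout can be sketched as a small watchdog. This is not the app’s Swift implementation; AckWatchdog and its methods are hypothetical names, the clock is injected so the logic can be checked without timers, and treating exactly 15 seconds as a timeout is an assumption:

```typescript
const IOS_HEARTBEAT_INTERVAL_MS = 3_000; // mirrors heartbeatInterval = 3
const IOS_ACK_TIMEOUT_MS = 15_000;       // mirrors heartbeatAckTimeout = 15

// Hypothetical watchdog: remembers when the last ack arrived and reports
// whether the ack timeout has elapsed since then.
class AckWatchdog {
  private lastAckMs: number;
  private readonly ackTimeoutMs: number;

  constructor(ackTimeoutMs: number, startMs: number) {
    this.ackTimeoutMs = ackTimeoutMs;
    this.lastAckMs = startMs;
  }

  /** Call whenever a heartbeat_ack arrives. */
  recordAck(nowMs: number): void {
    this.lastAckMs = nowMs;
  }

  /** True once the timeout has passed with no ack (boundary is an assumption). */
  shouldReconnect(nowMs: number): boolean {
    return nowMs - this.lastAckMs >= this.ackTimeoutMs;
  }
}

// Number of consecutive heartbeats that can go unanswered before a reconnect.
const missedAcksBeforeReconnect = IOS_ACK_TIMEOUT_MS / IOS_HEARTBEAT_INTERVAL_MS;

const watchdog = new AckWatchdog(IOS_ACK_TIMEOUT_MS, 0);
watchdog.recordAck(12_000);                            // an ack arrives at t = 12s
const stillHealthy = !watchdog.shouldReconnect(15_000); // only 3s of silence
const timedOut = watchdog.shouldReconnect(27_000);      // 15s of silence
```

Any single ack resets the window, so a reconnect is triggered only by five consecutive unanswered heartbeats, not by one dropped frame.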

The heartbeat_ack Carries Extra Information

The relay’s heartbeat response to iOS includes a mac_connected boolean:

// relay_server/src/types.ts
export interface HeartbeatAckMessage {
  type: "heartbeat_ack";
  mac_connected: boolean;
}

The iOS app uses this to show connection status. If mac_connected is false, the app knows the Mac has disconnected even if the relay-to-iOS leg is healthy. This avoids the case where iOS appears “connected” but the Mac is actually offline.
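One way to picture the two independent signals (relay leg healthy, Mac attached) is a mapping from the latest ack to a display state. The sketch below is illustrative only: the ConnectionStatus values and statusFromAck are hypothetical, and a null ack stands in for "no ack arrived within the timeout":

```typescript
// Mirrors relay_server/src/types.ts.
interface HeartbeatAckMessage {
  type: "heartbeat_ack";
  mac_connected: boolean;
}

// Hypothetical display states; the real app's naming may differ.
type ConnectionStatus = "connected" | "mac_offline" | "relay_unreachable";

/** Hypothetical helper: derives what the UI should show from the latest ack. */
function statusFromAck(ack: HeartbeatAckMessage | null): ConnectionStatus {
  // No ack at all: the relay-to-iOS leg itself is down.
  if (ack === null) return "relay_unreachable";
  // The relay answered, so that leg is healthy; the flag covers the other leg.
  return ack.mac_connected ? "connected" : "mac_offline";
}

const bothUp = statusFromAck({ type: "heartbeat_ack", mac_connected: true });
const macGone = statusFromAck({ type: "heartbeat_ack", mac_connected: false });
const relayGone = statusFromAck(null);
```

Without the flag, the first two cases would be indistinguishable from the phone’s point of view.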

Stale Ephemeral Keys After Timeout

When a heartbeat timeout fires, the relay clears the stale connection’s session nonce and ephemeral key from room state:

if (macTimedOut) {
  delete roomState.mac_session_nonce;
  delete roomState.mac_ephemeral_key;
}
if (iosTimedOut) {
  delete roomState.ios_session_nonce;
  delete roomState.ios_ephemeral_key;
}
await this.state.storage.put("room", roomState);

When the device reconnects, it provides fresh nonces and ephemeral keys. Stale ephemeral keys from the previous connection are not carried forward.

Immediate Reconnect on Network Change

On the Mac side, RelayConnection subscribes to both system wake/sleep notifications and network path changes:

// mac_agent/Sources/MacAgentLib/RelayConnection.swift
NotificationCenter.default.addObserver(
    forName: NSWorkspace.didWakeNotification, ...) { [weak self] _ in
    log("[relay] Mac woke from sleep — forcing immediate reconnect")
    self?.reconnectDelay = 0
    self?.ws.disconnect()
}

monitor.pathUpdateHandler = { [weak self] path in
    // If network path changed (interface list or status)
    log("[relay] network path changed — forcing reconnect")
    self?.reconnectDelay = 0
    self?.ws.disconnect()
}

On wake from sleep or network interface change, reconnectDelay is reset to 0 and the current WebSocket is disconnected, triggering an immediate reconnect without waiting for the next backoff interval.
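The effect of zeroing reconnectDelay is easiest to see next to the backoff it bypasses. The source does not show the backoff schedule, so the sketch below assumes a doubling delay with a 1-second base and a 30-second cap; ReconnectBackoff and its methods are hypothetical names, not the Swift implementation:

```typescript
// Hypothetical backoff state. Base, cap, and doubling are assumed values.
class ReconnectBackoff {
  private delayMs = 0;
  private readonly baseMs: number;
  private readonly maxMs: number;

  constructor(baseMs: number = 1_000, maxMs: number = 30_000) {
    this.baseMs = baseMs;
    this.maxMs = maxMs;
  }

  /** Returns the delay to wait before the next attempt, then grows it. */
  nextDelay(): number {
    const current = this.delayMs;
    this.delayMs = current === 0 ? this.baseMs : Math.min(current * 2, this.maxMs);
    return current;
  }

  /** Called on wake or network path change: the next attempt is immediate. */
  reset(): void {
    this.delayMs = 0;
  }
}

const backoff = new ReconnectBackoff();
const delays = [backoff.nextDelay(), backoff.nextDelay(), backoff.nextDelay()];
backoff.reset();                       // e.g. the Mac just woke from sleep
const afterReset = backoff.nextDelay();
```

The reset matters most after a long outage, when the accumulated delay would otherwise keep the Mac offline for many seconds even though the network has just come back.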