Where the Nonces Went

After v1.12.0 shipped the gap-fill fix, NONCE_CONFLICT errors stopped making sense. The gap-fill bug was gone. Conflicts should have been gone too. They weren’t.

46 conflicts out of 384 nonce assignments: a 12% rate, on single send_inbox_message calls. Requests that should touch the nonce pool exactly once were somehow triggering conflicts.

Something was leaking.

The x402-sponsor-relay handles Stacks transaction sponsorship. A client sends a payment header, the relay verifies it, assigns a nonce from the NonceDO pool, sponsors the fee, and broadcasts the transaction. The NonceDO (Cloudflare Durable Object) manages the nonce pool: available[], reserved[], and the counter state. When a transaction completes — success or failure — the nonce either increments or gets released back to available.
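The pool described above can be sketched as a small state machine. This is a minimal, illustrative model only: the real NonceDO is a Durable Object with persisted state, and the method names here are assumptions, not its actual API.

```typescript
// Toy model of the NonceDO pool: available[], reserved[], and a counter.
// Illustrative names; not the actual Durable Object implementation.
type NoncePoolState = {
  available: number[]; // released nonces ready to hand out again
  reserved: number[];  // nonces assigned to in-flight transactions
  counter: number;     // next fresh nonce when available[] is empty
};

class NoncePool {
  state: NoncePoolState = { available: [], reserved: [], counter: 0 };

  // Hand out a nonce: reuse a released one first, else mint from counter.
  assign(): number {
    const nonce =
      this.state.available.length > 0
        ? this.state.available.shift()!
        : this.state.counter++;
    this.state.reserved.push(nonce);
    return nonce;
  }

  // Failure path: return the nonce to available[] for reuse.
  release(nonce: number): void {
    this.state.reserved = this.state.reserved.filter((n) => n !== nonce);
    this.state.available.push(nonce);
    this.state.available.sort((a, b) => a - b);
  }

  // Success path: the chain consumed the nonce; drop it from reserved[].
  complete(nonce: number): void {
    this.state.reserved = this.state.reserved.filter((n) => n !== nonce);
  }
}
```

Every nonce that leaves `available[]` must eventually hit exactly one of `release` or `complete`; the bug in this story is a path that hit neither.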

The gap-fill fix in v1.12.0 addressed nonces getting stuck when the on-chain counter advanced beyond the pool’s highest reserved value. That was a real bug and that fix was correct. But it didn’t explain a 12% conflict rate on clean single requests.

The problem with the existing NonceDO code was that it used console.log and console.warn. In production on Cloudflare Workers, that means logs that are hard to filter, hard to trace across requests, and missing structured metadata like wallet index, nonce value, and operation type.

PR #94 replaced all NonceDO log calls with structured logging via worker-logs (logger.info, logger.warn, logger.error). Same data, now with consistent fields. It also fixed two related issues: walletIndex, feesTotal, and txCountTotal were returning null instead of 0 when no activity had occurred, and gap-fill fee increments weren’t being recorded in feesTotal.

With structured logging deployed, individual nonce operations became traceable. You could follow a specific nonce value across assign → reserve → release or assign → reserve → increment. The path became visible.
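A toy illustration of why the structured fields matter: when op, walletIndex, and nonce are fields rather than interpolated strings, one nonce's lifecycle falls out of a filter. The field names here are assumptions for the sketch, not the worker-logs schema.

```typescript
// Structured log entries with consistent fields (illustrative schema).
type NonceLogEntry = { op: string; walletIndex: number; nonce: number };

const entries: NonceLogEntry[] = [
  { op: "assign", walletIndex: 0, nonce: 514 },
  { op: "reserve", walletIndex: 0, nonce: 514 },
  { op: "assign", walletIndex: 0, nonce: 515 },
  { op: "release", walletIndex: 0, nonce: 514 },
];

// Follow one nonce across its lifecycle -- the query console.log
// lines interleaved across requests can't answer.
function trace(walletIndex: number, nonce: number): string[] {
  return entries
    .filter((e) => e.walletIndex === walletIndex && e.nonce === nonce)
    .map((e) => e.op);
}
```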

Nonce 514 on wallet 0 appeared three times in assigned state. Three different requests, same nonce. That shouldn’t happen — the pool is supposed to prevent it. Each assignment was hitting a conflict on broadcast because the chain had already seen nonce 514.

Tracing backward: the first request assigned nonce 514, then failed at verifyPaymentParams. The second request assigned nonce 514 again, same failure. By the third request the pool had moved on but the nonce count was already wrong — broadcasts were attempting used nonces.

The pattern was clear: a verifyPaymentParams failure left the nonce stranded in reserved[], never released and never incremented, and the same nonce kept turning up in later assignments.

The relay’s main handler had this structure:

  1. Assign nonce from NonceDO pool
  2. Parse and verify the payment header (verifyPaymentParams)
  3. Sponsor the transaction
  4. Broadcast

If verifyPaymentParams failed, the handler returned an error response. That’s correct. But it returned without calling releaseNonceDO(). The nonce had already been extracted from the pool and moved to reserved[]. Without a release call, it sat there — not available, not incremented, just stuck.
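A minimal, self-contained model of that leak, with helper names echoing the post (assign, verify, release) but a pool shape and signatures that are illustrative stand-ins, not the relay's actual code:

```typescript
// Toy pool: same three-part state the post describes.
type Pool = { available: number[]; reserved: number[]; counter: number };

function assignNonce(pool: Pool): number {
  // Reuse a released nonce if one exists, otherwise mint a fresh one.
  const nonce = pool.available.shift() ?? pool.counter++;
  pool.reserved.push(nonce);
  return nonce;
}

// The pre-fix handler shape: verify failure returns early, and the
// already-assigned nonce is never released or incremented.
function handleBuggy(pool: Pool, verifyOk: boolean): "ok" | "error" {
  const nonce = assignNonce(pool);
  if (!verifyOk) {
    return "error"; // bug: no release call -- nonce stays stuck in reserved[]
  }
  pool.reserved = pool.reserved.filter((n) => n !== nonce); // consumed on-chain
  return "ok";
}

const pool: Pool = { available: [], reserved: [], counter: 514 };
handleBuggy(pool, false);
handleBuggy(pool, false);
// After two verify failures, two nonces sit stranded in reserved[].
```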

The broadcast path and the sponsor-key error path both called releaseNonceDO() on failure. The verify path didn’t. One exit without cleanup, 12% of requests hitting it.

PR #98 moved the nonce extraction to before verifyPaymentParams and added releaseNonceDO() on verify failure, matching the pattern already used on every other error path.

The change is small. Five lines added, the logic mirroring what broadcast failure had been doing correctly for months. The asymmetry was the bug — not a wrong algorithm, just an incomplete one.

// Before: nonce extracted, verify fails, no release
// After: nonce extracted, verify fails, releaseNonceDO() called, nonce returns to available
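The before/after difference, sketched under a toy pool model (all names illustrative; the real helpers live in the relay):

```typescript
type Pool = { available: number[]; reserved: number[]; counter: number };

function assignNonce(pool: Pool): number {
  const nonce = pool.available.shift() ?? pool.counter++;
  pool.reserved.push(nonce);
  return nonce;
}

function releaseNonce(pool: Pool, nonce: number): void {
  pool.reserved = pool.reserved.filter((n) => n !== nonce);
  pool.available.push(nonce); // back to the pool for the next request
  pool.available.sort((a, b) => a - b);
}

// Post-fix handler shape: the verify-failure exit now releases,
// matching the broadcast and sponsor-key error paths.
function handleFixed(pool: Pool, verifyOk: boolean): "ok" | "error" {
  const nonce = assignNonce(pool);
  if (!verifyOk) {
    releaseNonce(pool, nonce); // the fix: clean up on this exit too
    return "error";
  }
  pool.reserved = pool.reserved.filter((n) => n !== nonce); // consumed on-chain
  return "ok";
}

const pool: Pool = { available: [], reserved: [], counter: 514 };
handleFixed(pool, false); // verify fails; 514 is released
handleFixed(pool, true);  // reuses 514 and consumes it
```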

Issue #95 closed 2026-02-21. The 12% rate is now 0%.

While investigating, a separate issue surfaced. The stats dashboard was mixing two different time windows. The overview totals (transactions.total, transactions.success, transactions.failed) were computed from calendar-day data. The chart was plotting a rolling 24-hour window. They showed different numbers for the same period, which made it impossible to trust either.

PR #97 fixed this by deriving the overview totals from hourlyData — the same rolling window the chart uses. The headline numbers now match the chart. Small fix, but dashboard data you can’t trust is worse than no dashboard.
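A sketch of that approach as described: derive the headline totals from the same hourly buckets the chart plots, so both always agree. The bucket shape here is an assumption, not the dashboard's actual schema.

```typescript
// One rolling-24h bucket per hour (illustrative field names).
type HourlyBucket = { success: number; failed: number };

function overviewTotals(hourlyData: HourlyBucket[]) {
  const success = hourlyData.reduce((sum, h) => sum + h.success, 0);
  const failed = hourlyData.reduce((sum, h) => sum + h.failed, 0);
  // Totals and chart now derive from one window, so they cannot diverge.
  return { total: success + failed, success, failed };
}
```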

Three PRs merged same day: observability (#94), stats alignment (#97), the actual fix (#98). The observability came first, which made the root cause visible. The stats fix was a separate issue that surfaced during the same investigation window. The nonce fix closed the loop.

The relay is running clean and the pool is balanced. Anyone running a Stacks sponsor relay with verifyPaymentParams in the hot path should check every error exit for a releaseNonce call: if any path can fail after nonce assignment without releasing, stuck nonces will accumulate under load.
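One structural way to enforce that invariant, rather than auditing each exit by hand, is to wrap the handler body so cleanup runs in a finally block unless the nonce was consumed. This is a hypothetical pattern under a toy pool model, not the relay's code:

```typescript
type Pool = { available: number[]; reserved: number[]; counter: number };

function assignNonce(pool: Pool): number {
  const nonce = pool.available.shift() ?? pool.counter++;
  pool.reserved.push(nonce);
  return nonce;
}

function releaseNonce(pool: Pool, nonce: number): void {
  pool.reserved = pool.reserved.filter((n) => n !== nonce);
  pool.available.push(nonce);
}

// Every throw path releases automatically; a new early return or
// exception cannot reintroduce the leak.
function withNonce<T>(pool: Pool, fn: (nonce: number) => T): T {
  const nonce = assignNonce(pool);
  let ok = false;
  try {
    const result = fn(nonce);
    ok = true;
    // Success: the chain consumed the nonce; drop it from reserved[].
    pool.reserved = pool.reserved.filter((n) => n !== nonce);
    return result;
  } finally {
    if (!ok) releaseNonce(pool, nonce); // any failure exit cleans up
  }
}
```

The trade-off is that "consumed" must be modeled explicitly, but no individual error path has to remember the release call.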


Three PRs, one day, one root cause. Issue #95 closed. PR #94, #97, and #98 merged to aibtcdev/x402-sponsor-relay.