Retries and transient errors

Handle transient Tresor API failures with backoff, timeouts, and bounded retry budgets.

This guide is about client-side retries after a request fails. It is not the same thing as chat-completions failover or the tresor.failover response flag.

Use Routing failover when you want the router to try alternative routes inside one request. Use retries when your client sends a new request after an HTTP error or connection failure.

Which errors to retry

Status  Meaning              Retry?
429     Rate-limited         Yes; honour Retry-After
500     Internal error       Yes (idempotent only)
502     Upstream error       Yes
503     Service unavailable  Yes; honour Retry-After
504     Upstream timeout     Yes
400     Bad request          No; fix the request
401     Invalid auth         No; rotate the key
403     Forbidden            No
404     Not found            No

For the canonical error envelope and full status list, see the errors reference.
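The table above can be encoded as a small predicate. This is a minimal sketch, not part of any Tresor client: the names should_retry, RETRYABLE_STATUSES, and IDEMPOTENT_METHODS are hypothetical, and the idempotent override is an assumption for callers who know a particular POST is safe to repeat.

```python
from typing import Optional

# Status codes the table marks as retryable.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}
# HTTP methods defined as idempotent by RFC 9110.
IDEMPOTENT_METHODS = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS"}


def should_retry(status: int, method: str, *, idempotent: Optional[bool] = None) -> bool:
    """Return True if a response with this status is worth retrying."""
    if status not in RETRYABLE_STATUSES:
        return False  # 400, 401, 403, 404, ...: retrying won't change the outcome
    if status == 500:
        # The table retries 500 only for idempotent requests.
        if idempotent is not None:
            return idempotent
        return method.upper() in IDEMPOTENT_METHODS
    return True


print(should_retry(503, "POST"))  # True
print(should_retry(500, "POST"))  # False
```

Keeping the decision in one function makes it easy to check against the errors reference if the status list changes.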

  • Exponential backoff with jitter. Start at ~250 ms, cap at ~10 s, and add ±25 % jitter so simultaneous clients don't synchronise their retries.
  • Cap at 3–5 attempts. Beyond that, fail fast and let the caller decide.
  • Honour Retry-After. When a response carries this header, wait at least that long; it overrides your computed delay.
  • Per-request timeout. Wrap each attempt in its own deadline (e.g. 60 s for streaming, 30 s for non-streaming) so a stuck connection can't block the whole budget.
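The delay rules above can be sketched as a single function. The name backoff_delay and its parameters are hypothetical; the defaults mirror the numbers in the list (250 ms base, 10 s cap, ±25 % jitter). The sketch treats Retry-After as a floor on the wait, which is one reasonable reading of "overrides".

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    *,
    base: float = 0.25,
    cap: float = 10.0,
    jitter: float = 0.25,
    retry_after: Optional[float] = None,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    # Exponential growth, capped: 0.25, 0.5, 1, 2, ... up to `cap`.
    delay = min(base * 2 ** attempt, cap)
    # Jitter is applied after the cap so capped clients still spread out.
    delay *= 1 + random.uniform(-jitter, jitter)
    if retry_after is not None:
        # Never wait less than the server asked for.
        delay = max(delay, retry_after)
    return delay


for attempt in range(4):
    print(f"attempt {attempt}: wait {backoff_delay(attempt):.2f}s")
```

Because jitter is applied after the cap, the longest wait can reach about 12.5 s with these defaults; clamp again after jittering if a hard ceiling matters.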

Examples

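import random
import time

import httpx

def call_with_retries(client, payload, *, max_attempts=4):
    """POST a chat completion, retrying 429, 5xx, and transport errors."""
    delay = 0.25  # base backoff in seconds; doubles per attempt, capped at 10 s
    for attempt in range(max_attempts):
        retry_after = 0.0
        try:
            # Per-attempt deadline: 30 s, since this request is non-streaming.
            r = client.post("/v1/chat/completions", json=payload, timeout=30)
            if r.status_code < 500 and r.status_code != 429:
                r.raise_for_status()  # other 4xx are not retryable; raise at once
                return r.json()
            try:
                retry_after = float(r.headers.get("Retry-After", 0))
            except ValueError:
                pass  # Retry-After was an HTTP-date; fall back to computed backoff
        except httpx.TransportError:
            pass  # connection failure or timeout; retry with computed backoff
        if attempt == max_attempts - 1:
            break  # out of attempts; don't sleep before giving up
        # Jitter only the computed delay so the wait never undercuts Retry-After.
        jittered = delay * (1 + random.uniform(-0.25, 0.25))
        time.sleep(max(retry_after, jittered))
        delay = min(delay * 2, 10)
    raise RuntimeError("Tresor request failed after retries")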
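HTTP allows Retry-After to carry either a number of seconds or an HTTP-date, so a client that only calls float() on it can fail on the date form. The helper below is a sketch that handles both using only the standard library; parse_retry_after is a hypothetical name, and whether Tresor ever sends the date form is not stated in this guide.

```python
import math
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str], *, now: Optional[datetime] = None) -> float:
    """Convert a Retry-After header value to seconds; 0.0 if absent or invalid."""
    if not value:
        return 0.0
    value = value.strip()
    try:
        seconds = float(value)  # delay-seconds form, e.g. "120"
        return max(seconds, 0.0) if math.isfinite(seconds) else 0.0
    except ValueError:
        pass
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return 0.0  # unparseable: let the caller fall back to computed backoff
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max((when - now).total_seconds(), 0.0)


print(parse_retry_after("120"))  # 120.0
```

Returning 0.0 for a missing or malformed header lets the caller take the maximum of this value and its own backoff delay without special cases.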

SDK behaviour

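The OpenAI Python and Node SDKs already retry on 429, 5xx, and connection errors with exponential backoff (two retries by default). Override this with the max_retries constructor option (maxRetries in the Node SDK) if you want stricter or looser behaviour. If you wrap SDK calls in your own retry loop, set it to 0 so the two layers don't multiply the number of attempts.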

What not to retry

  • Streaming completions mid-stream. If a stream drops after content has been delivered to the user, treat the partial output as final and surface the error rather than silently re-rolling. A retried request would generate a different continuation that no longer matches what the user has already seen.
  • Receipt fetches that returned 404. The receipt id is wrong; retrying won't change that.
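The streaming rule above amounts to "retry only while nothing has reached the user". A minimal sketch of that guard follows. The names consume_stream, StreamInterrupted, open_stream, and on_chunk are hypothetical, the built-in ConnectionError stands in for whatever transport error your HTTP client raises, and the backoff sleep between attempts is omitted for brevity.

```python
class StreamInterrupted(Exception):
    """A stream failed after partial output had already been delivered."""

    def __init__(self, delivered, cause):
        super().__init__(f"stream dropped after {delivered} chunk(s)")
        self.delivered = delivered
        self.cause = cause


def consume_stream(open_stream, on_chunk, *, max_attempts=3):
    """Deliver chunks via on_chunk; retry only if nothing was delivered yet.

    open_stream: zero-argument callable returning an iterator of chunks.
    Returns the number of chunks delivered on success.
    """
    last_error = None
    for _ in range(max_attempts):
        delivered = 0
        try:
            for chunk in open_stream():
                on_chunk(chunk)
                delivered += 1
            return delivered
        except ConnectionError as exc:
            if delivered:
                # The user has seen partial output: surface it, don't re-roll.
                raise StreamInterrupted(delivered, exc) from exc
            last_error = exc  # nothing shown yet, so a fresh attempt is safe
    raise RuntimeError("stream failed before any output") from last_error
```

A caller would catch StreamInterrupted, keep the text already shown, and display an error next to it instead of restarting the completion.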
