Retries and transient errors

Handle transient Tresor API failures with backoff, timeouts, and bounded retry budgets.

This guide is about client-side retries after a request fails. It is not the same thing as chat-completions failover or the tresor.failover response flag.

Use Routing failover when you want the router to try alternative routes inside one request. Use retries when your client sends a new request after an HTTP error or connection failure.

Which errors to retry

Status	Meaning	Retry?
`429`	Rate-limited	Yes — honour `Retry-After`
`500`	Internal error	Yes (idempotent only)
`502`	Upstream error	Yes
`503`	Service unavailable	Yes — honour `Retry-After`
`504`	Upstream timeout	Yes
`400`	Bad request	No — fix the request
`401`	Invalid auth	No — rotate the key
`403`	Forbidden	No
`404`	Not found	No

For the canonical error envelope and full status list, see the errors reference.
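As a minimal sketch, the table above can be encoded as a small lookup. The name `should_retry` and the `idempotent` flag are illustrative and not part of the Tresor API; the status sets are copied from the table.

```python
# Statuses from the table that are worth retrying regardless of the request.
ALWAYS_RETRY = {429, 502, 503, 504}
# 500 is retried only when repeating the request cannot duplicate side effects.
IDEMPOTENT_ONLY = {500}


def should_retry(status: int, *, idempotent: bool = False) -> bool:
    """Return True if a response with this status is worth retrying."""
    if status in ALWAYS_RETRY:
        return True
    if status in IDEMPOTENT_ONLY:
        return idempotent
    # Everything else (400, 401, 403, 404, ...) needs a fix, not a retry.
    return False
```

Keeping the decision in one function means the retry loop itself only has to deal with timing.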

Recommended strategy

Exponential backoff with jitter. Start at ~250 ms, cap at ~10 s, and add ±25 % jitter so simultaneous clients don't synchronise their retries.
Cap at 3–5 attempts. Beyond that, fail fast and let the caller decide.
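Honour Retry-After. When the server sends it, never retry sooner than it asks, even if your computed delay is shorter. The header can carry either a number of seconds or an HTTP date.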
Per-request timeout. Wrap each attempt in its own deadline (e.g. 60 s for streaming, 30 s for non-streaming) so a stuck connection can't block the whole budget.
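The delay calculation described above can be sketched as a single function. The name `next_delay` and its parameters are hypothetical; the defaults mirror the numbers in the list (250 ms start, 10 s cap, ±25 % jitter), and `Retry-After` is treated as a lower bound on the wait, which is an interpretation rather than documented Tresor behaviour.

```python
import random


def next_delay(attempt, retry_after=None, *, base=0.25, cap=10.0, jitter=0.25):
    """Seconds to wait before the retry that follows 0-based `attempt`."""
    # Exponential growth from `base`, capped at `cap`.
    backoff = min(base * 2 ** attempt, cap)
    # Spread clients out by +/- `jitter`; the jittered value may exceed `cap` slightly.
    backoff *= 1 + random.uniform(-jitter, jitter)
    # Never wait less than the server asked for via Retry-After (in seconds).
    return max(backoff, retry_after or 0.0)
```

Without jitter the schedule is 0.25 s, 0.5 s, 1 s, 2 s, and so on up to the cap. Applying the jitter before the `max` keeps the randomisation from shortening a server-requested wait.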

Examples

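import random
import time

import httpx


def call_with_retries(client, payload, *, max_attempts=4):
    delay = 0.25  # seconds; doubles after each failed attempt, capped at 10
    last_error = None
    for attempt in range(max_attempts):
        retry_after = 0.0
        try:
            r = client.post("/v1/chat/completions", json=payload, timeout=60)
            if r.status_code < 500 and r.status_code != 429:
                r.raise_for_status()  # non-retryable 4xx: raise immediately
                return r.json()
            last_error = RuntimeError(f"Tresor returned HTTP {r.status_code}")
            try:
                # Handles the delay-in-seconds form only, not the HTTP-date form.
                retry_after = float(r.headers.get("Retry-After", 0))
            except ValueError:
                pass
        except httpx.TransportError as exc:  # connection failures and timeouts
            last_error = exc
        if attempt == max_attempts - 1:
            break  # budget exhausted: don't sleep before giving up
        # Jitter the backoff, but never wait less than the server asked for.
        backoff = delay * (1 + random.uniform(-0.25, 0.25))
        time.sleep(max(retry_after, backoff))
        delay = min(delay * 2, 10)
    raise RuntimeError("Tresor request failed after retries") from last_error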

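async function callWithRetries(payload: unknown, maxAttempts = 4) {
  let delay = 250; // ms; doubles after each failed attempt, capped at 10 s
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    let retryAfter = 0;
    let r: Response | undefined;
    try {
      r = await fetch("https://api.tresor.co/v1/chat/completions", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${process.env.TRESOR_API_KEY}`,
        },
        body: JSON.stringify(payload),
        signal: AbortSignal.timeout(60_000),
      });
    } catch (err) {
      lastError = err; // network failure or timeout: retry
    }
    if (r) {
      if (r.status < 500 && r.status !== 429) {
        // Non-retryable 4xx: fail immediately.
        if (!r.ok) throw new Error(`Tresor ${r.status}`);
        return r.json();
      }
      lastError = new Error(`Tresor ${r.status}`);
      // Seconds form only; an HTTP-date value parses to NaN and is ignored.
      const seconds = Number(r.headers.get("Retry-After") ?? 0);
      retryAfter = Number.isFinite(seconds) ? seconds * 1000 : 0;
    }
    if (attempt === maxAttempts - 1) break; // don't sleep before giving up
    // Jitter the backoff, but never wait less than the server asked for.
    const jitter = 1 + (Math.random() * 0.5 - 0.25);
    await new Promise((res) =>
      setTimeout(res, Math.max(retryAfter, delay * jitter)),
    );
    delay = Math.min(delay * 2, 10_000);
  }
  throw new Error("Tresor request failed after retries", { cause: lastError });
}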

SDK behaviour

The OpenAI Python and Node SDKs already retry on 429, 5xx, and connection errors with exponential backoff (two retries by default). Override this with the max_retries constructor option in Python, or maxRetries in Node, if you want stricter or looser behaviour.

What not to retry

Streaming completions mid-stream. If a stream drops after content has been delivered to the user, treat the partial output as final and surface the error rather than silently re-rolling — the model would otherwise generate a different continuation.
Receipt fetches that returned 404. The receipt id is wrong; retrying won't change that.
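The streaming rule can be sketched as follows. `consume_stream`, `StreamInterrupted`, `open_stream`, and `on_chunk` are hypothetical names, not Tresor or SDK APIs. The sketch assumes that a failure before any chunk has been delivered is safe to retry, and it omits backoff between attempts for brevity.

```python
class StreamInterrupted(Exception):
    """Raised when a stream fails after some content was already delivered."""

    def __init__(self, partial, cause):
        super().__init__(f"stream dropped after {len(partial)} characters")
        self.partial = partial
        self.cause = cause


def consume_stream(open_stream, on_chunk, *, max_attempts=3,
                   retry_on=(ConnectionError,)):
    """Retry a stream only while nothing has been shown to the user."""
    for attempt in range(max_attempts):
        delivered = []
        try:
            for chunk in open_stream():
                on_chunk(chunk)  # the user sees this chunk immediately
                delivered.append(chunk)
            return "".join(delivered)
        except retry_on as exc:
            if delivered:
                # Content already reached the user: keep it and surface the error.
                raise StreamInterrupted("".join(delivered), exc) from exc
            if attempt == max_attempts - 1:
                raise
            # Nothing delivered yet, so a fresh attempt is invisible to the user.
```

The caller decides what to do with `StreamInterrupted.partial`, for example showing it alongside an error message instead of regenerating the response.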
