Chat Completions

OpenAI-compatible chat completions with streaming, tool calling, and signed receipts.

`POST /v1/chat/completions`

Create a chat completion. The endpoint follows the OpenAI Chat Completions request and response shape, so any OpenAI SDK works by changing the base URL and API key.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.tresor.co/v1",
    api_key="tr-...",
)

resp = client.chat.completions.create(
    model="global/redpill/gpt-oss-120b",
    messages=[{"role": "user", "content": "Hello!"}],
)

Request body

Core parameters

Parameter	Type	Required	Notes
`model`	string	Yes	Compound ID `region/provider/model_key` — see Models.
`messages`	array	Yes	Standard OpenAI messages (`role`, `content`, optionally `tool_calls`, `tool_call_id`).
`stream`	boolean	No	Stream tokens as SSE. Default `false`.
`temperature`	number	No	Sampling temperature. Accepted range `0–2`; values above ~`1.3` often produce incoherent output or empty `content` (especially for reasoning models like `gpt-oss`, where the budget is consumed by the analysis channel). Prefer tuning `top_p` for diversity.
`max_tokens`	integer	No	Cap on generated tokens.

Recommendation. Keep temperature ≤ 1.3 in practice. At 1.8–2.0 the token distribution flattens so much that sampling becomes close to random; reasoning models may spend the entire token budget in their internal channel and return content: "" with a high completion_tokens count. Use top_p (e.g. 0.95) when you want diversity without destabilising the output.
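
The failure mode described above can be detected client-side. The following is a minimal sketch, assuming the non-streaming response has been parsed into a plain dict; the helper name `looks_budget_exhausted` is hypothetical and the check is a heuristic, not part of the API.

```python
def looks_budget_exhausted(response: dict) -> bool:
    """Heuristic: empty content, no tool calls, yet completion tokens were spent."""
    choice = response["choices"][0]
    message = choice["message"]
    content = message.get("content") or ""
    # Tool-call responses legitimately carry empty content, so exclude them.
    has_tool_calls = bool(message.get("tool_calls"))
    completion_tokens = response.get("usage", {}).get("completion_tokens", 0)
    return content == "" and not has_tool_calls and completion_tokens > 0


# Illustrative payload shaped like a non-streaming response; the values are made up.
example = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": ""},
            "finish_reason": "length",
        }
    ],
    "usage": {"prompt_tokens": 9, "completion_tokens": 512, "total_tokens": 521},
}

if looks_budget_exhausted(example):
    print("Empty answer: consider lowering temperature or raising max_tokens.")
```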

Pass-through parameters

The following OpenAI parameters are forwarded verbatim to the upstream provider. Tresor does not validate them; the provider returns an error if it doesn't support a given option.

top_p, top_k, n, stop, seed, frequency_penalty, presence_penalty, logprobs, top_logprobs, response_format, tools, tool_choice, parallel_tool_calls, user, stream_options.

Most providers behind Tresor support the common subset (top_p, stop, seed, response_format, tools, tool_choice). Less common parameters (logprobs, n > 1) are provider-dependent.

Tresor extensions

Parameter	Type	Notes
`failover`	array of strings	Ordered list of up to 5 alternative compound model IDs. The router tries them in order after the primary route if needed. Tresor-only.

Routing failover

failover is a Tresor-only routing field for chat completions.

The router tries the primary model first, then each failover entry in order.
Each entry is validated like the primary route and must be a full compound model ID.
The selected route can have different pricing from the primary route.
tresor.requested_route reports the normalized primary route you asked the router to use.
tresor.routed_model always reports the route that actually served the request.
tresor.failover is true when the router had to leave the primary route and use a secondary entry from your failover list.
When you use automatic routing such as a bare model key or auto/auto/..., tresor.requested_route and tresor.routed_model can differ even when tresor.failover is false.
If the primary route and all failover entries are unavailable, the request fails with 503.

For worked examples and guidance on when to use this versus client retries, see Routing failover.
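
The relationship between the three routing fields can be summarised in code. This is a sketch only, assuming the `tresor` object has been parsed into a dict; the function name and the three labels it returns are hypothetical.

```python
def describe_routing(tresor: dict) -> str:
    """Classify how a request was routed, based on the tresor metadata."""
    if tresor.get("failover"):
        # The router left the primary route and used an entry from the failover list.
        return "failover"
    if tresor.get("requested_route") != tresor.get("routed_model"):
        # Automatic routing resolved the request to a concrete route without failover.
        return "auto-resolved"
    return "primary"


print(describe_routing({
    "requested_route": "auto/auto/gpt-oss-120b",
    "routed_model": "global/redpill/gpt-oss-120b",
    "failover": False,
}))
```

Because the selected route can be priced differently from the primary one, logging this label next to `routed_model` may help when reconciling costs.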

Headers

Header	Notes
`Authorization`	`Bearer tr-…` API key. Required.
`X-Tresor-Receipt`	Set to `false` to opt out of signed receipts. Default `true`.
`X-Tresor-Timeout-Seconds`	Optional; valid only for synchronous (non-streaming) requests. Positive integer seconds, default `180`, max `600`. Values below `180` shorten the budget, values above `600` are clamped to `600`, and invalid values return `400`.

Long-running non-stream requests

Streaming is the preferred option for long-running generations. If you intentionally want a single JSON response with stream: false, Tresor keeps the default synchronous timeout at 180s and lets you adjust the per-request budget with X-Tresor-Timeout-Seconds.

Rules:

only valid when stream is omitted or false
positive integer seconds only
values below 180 are accepted and shorten the budget
values above 600 are accepted and clamped to 600
duplicate, fractional, non-numeric, or non-positive values return 400 invalid_timeout_override
if you send the header with stream: true, the router rejects the request with 400 invalid_timeout_override
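
The rules above can be mirrored on the client so that invalid values are caught before the request is sent. This is a minimal sketch; the helper `timeout_header` is hypothetical, and clamping locally is optional because the router clamps values above 600 itself.

```python
MAX_TIMEOUT_SECONDS = 600  # documented cap for X-Tresor-Timeout-Seconds


def timeout_header(seconds, stream: bool = False) -> dict:
    """Build the timeout header, rejecting inputs the router would answer with 400."""
    if stream:
        raise ValueError("X-Tresor-Timeout-Seconds is only valid when stream is false")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(seconds, bool) or not isinstance(seconds, int) or seconds <= 0:
        raise ValueError("timeout must be a positive integer number of seconds")
    return {"X-Tresor-Timeout-Seconds": str(min(seconds, MAX_TIMEOUT_SECONDS))}


print(timeout_header(900))  # value is clamped to the documented cap
```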

curl https://api.tresor.co/v1/chat/completions \
  -H "Authorization: Bearer $TRESOR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Tresor-Timeout-Seconds: 600" \
  -d '{
    "model": "eu/privatemode/gemma-4-31b",
    "stream": false,
    "messages": [
      {"role": "user", "content": "Translate the attached policy into German."}
    ]
  }'

With the OpenAI Python SDK, put router controls such as X-Tresor-Timeout-Seconds in extra_headers and Tresor body extensions such as failover in extra_body.

resp = client.chat.completions.create(
    model="eu/privatemode/gemma-4-31b",
    stream=False,
    messages=[
        {"role": "user", "content": "Translate the attached policy into German."}
    ],
    extra_headers={"X-Tresor-Timeout-Seconds": "600"},
    extra_body={
        "failover": ["auto/auto/gemma-4-31b"]
    },
)

If a request still times out at the 600s cap, switch to streaming instead of depending on a longer synchronous wait.

Response (non-streaming)

{
  "id": "chatcmpl-abc",
  "object": "chat.completion",
  "model": "global/redpill/gpt-oss-120b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 7,
    "total_tokens": 16
  },
  "tresor": {
    "receipt_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
    "requested_route": "auto/auto/gpt-oss-120b",
    "routed_model": "global/redpill/gpt-oss-120b",
    "failover": false
  }
}

finish_reason is the upstream value (stop, length, content_filter, tool_calls).

Streaming response

When stream: true, the response is Server-Sent Events. Each data: line is a chunk in standard OpenAI format. The final chunk before [DONE] carries finish_reason and the Tresor metadata.

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","model":"global/redpill/gpt-oss-120b","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","model":"global/redpill/gpt-oss-120b","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","model":"global/redpill/gpt-oss-120b","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","model":"global/redpill/gpt-oss-120b","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"tresor":{"receipt_id":"7c9e6679-7425-40de-944b-e07fc1f90ae7","requested_route":"auto/auto/gpt-oss-120b","routed_model":"global/redpill/gpt-oss-120b","failover":false}}

data: [DONE]

The tresor field is an extension key on the chunk object — OpenAI SDKs ignore unknown keys, so existing clients work unchanged.
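
If you consume the stream without an SDK, or want the Tresor metadata that an SDK may not expose as a typed attribute, the SSE lines can be parsed directly. The sketch below assumes the lines have already been read from the HTTP response; `collect_stream` is a hypothetical helper and the sample lines are abbreviated versions of the chunks shown above.

```python
import json


def collect_stream(lines):
    """Accumulate content for choice 0 and capture finish_reason and tresor metadata."""
    text_parts = []
    finish_reason = None
    tresor = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            if choice.get("index") != 0:
                continue
            content = choice.get("delta", {}).get("content")
            if content:
                text_parts.append(content)
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
        if "tresor" in chunk:
            tresor = chunk["tresor"]
    return "".join(text_parts), finish_reason, tresor


sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}],'
    '"tresor":{"routed_model":"global/redpill/gpt-oss-120b","failover":false}}',
    "data: [DONE]",
]

text, reason, meta = collect_stream(sample)
print(text, reason, meta["routed_model"])
```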

Reasoning models

Models that emit chain-of-thought (gpt-oss-120b, DeepSeek-R1, etc.) deliver reasoning tokens via delta.reasoning_content, separate from delta.content:

{"choices":[{"index":0,"delta":{"reasoning_content":"The user wants…"},"finish_reason":null}]}
{"choices":[{"index":0,"delta":{"content":"Hello!"},"finish_reason":null}]}
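
A client that wants to display or log the reasoning separately can keep two buffers. This is a minimal sketch, assuming the chunks have already been decoded into dicts; `split_reasoning` is a hypothetical helper name.

```python
def split_reasoning(chunks):
    """Return (reasoning_text, answer_text) accumulated across streamed chunks."""
    reasoning, answer = [], []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("reasoning_content"):
                reasoning.append(delta["reasoning_content"])
            if delta.get("content"):
                answer.append(delta["content"])
    return "".join(reasoning), "".join(answer)


chunks = [
    {"choices": [{"index": 0, "delta": {"reasoning_content": "The user wants…"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"content": "Hello!"}, "finish_reason": None}]},
]

reasoning_text, answer_text = split_reasoning(chunks)
print(answer_text)
```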

Tool calling

Tool calling follows the standard OpenAI shape. Pass tools and (optionally) tool_choice in the request; Tresor forwards them to the provider untouched.

Request

{
  "model": "global/redpill/gpt-oss-120b",
  "messages": [{ "role": "user", "content": "What's the weather in Paris?" }],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
          "type": "object",
          "properties": { "city": { "type": "string" } },
          "required": ["city"]
        }
      }
    }
  ]
}

Non-streaming response

{
  "id": "chatcmpl-abc",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": [
          {
            "id": "call_abc",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\":\"Paris\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": { "prompt_tokens": 42, "completion_tokens": 18, "total_tokens": 60 },
  "tresor": {
    "receipt_id": "e8a30b14-4b5c-4f7a-8e91-2a3d9f0b1c5e",
    "requested_route": "global/redpill/gpt-oss-120b",
    "routed_model": "global/redpill/gpt-oss-120b",
    "failover": false
  }
}
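
After receiving a response like the one above, the client runs the tool and sends the result back as a `tool` message that carries the matching `tool_call_id`. The sketch below shows that step under stated assumptions: the local `get_weather` implementation, the `TOOLS` registry, and the helper `tool_result_messages` are hypothetical stand-ins, not part of the API.

```python
import json


def get_weather(city: str) -> str:
    # Hypothetical stand-in for a real weather lookup.
    return f"Sunny in {city}"


# Maps tool names declared in the request to local callables.
TOOLS = {"get_weather": get_weather}


def tool_result_messages(assistant_message: dict) -> list:
    """Run each requested tool and build the follow-up `tool` messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        fn = TOOLS[call["function"]["name"]]
        # `arguments` arrives as a JSON-encoded string, not as an object.
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": fn(**args),
        })
    return results


assistant_message = {
    "role": "assistant",
    "content": "",
    "tool_calls": [
        {
            "id": "call_abc",
            "type": "function",
            "function": {"name": "get_weather", "arguments": "{\"city\":\"Paris\"}"},
        }
    ],
}

print(tool_result_messages(assistant_message))
```

To continue the conversation, append the assistant message and the returned `tool` messages to `messages` and call the endpoint again.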

Streaming response

Tool calls stream as delta.tool_calls fragments. Concatenate function.arguments strings across chunks with the same index to assemble the final argument JSON.

data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"id":"call_abc","type":"function","function":{"name":"get_weather","arguments":""}}]},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"city"}}]},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\":\"Paris\"}"}}]},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}],"tresor":{"receipt_id":"e8a30b14-4b5c-4f7a-8e91-2a3d9f0b1c5e","requested_route":"global/redpill/gpt-oss-120b","routed_model":"global/redpill/gpt-oss-120b","failover":false}}

data: [DONE]
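
The concatenation rule above can be sketched as follows, assuming the chunks have already been decoded into dicts. `assemble_tool_calls` is a hypothetical helper; it groups fragments by the tool call's `index` and joins the `arguments` strings in arrival order.

```python
import json


def assemble_tool_calls(chunks):
    """Merge streamed delta.tool_calls fragments into complete tool calls."""
    calls = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            for frag in choice.get("delta", {}).get("tool_calls") or []:
                call = calls.setdefault(
                    frag["index"], {"id": None, "name": None, "arguments": ""}
                )
                if frag.get("id"):
                    call["id"] = frag["id"]
                fn = frag.get("function", {})
                if fn.get("name"):
                    call["name"] = fn["name"]
                # Argument JSON is split across chunks; append in arrival order.
                call["arguments"] += fn.get("arguments", "")
    return [calls[i] for i in sorted(calls)]


chunks = [
    {"choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "id": "call_abc", "type": "function",
         "function": {"name": "get_weather", "arguments": ""}}]}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": '{"city'}}]}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": '":"Paris"}'}}]}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "tool_calls"}]},
]

calls = assemble_tool_calls(chunks)
print(calls[0]["name"], json.loads(calls[0]["arguments"]))
```

Parse the arguments only after the chunk with finish_reason "tool_calls" arrives, since intermediate concatenations are not valid JSON.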

Compatibility matrix

Feature	Status
Non-streaming chat completions	✅ Supported
Streaming via SSE	✅ Supported
`tools` / `tool_choice` (function calling)	✅ Supported (streaming + non-streaming)
`response_format` (JSON mode / schema)	✅ Forwarded; provider-dependent
`reasoning_content` for reasoning models	✅ Supported
Vision / multimodal `content` arrays	✅ Forwarded; provider-dependent
`seed`, `stop`, `top_p`, penalties	✅ Forwarded
`logprobs`, `n > 1`	⚠️ Forwarded; only `choices[0]` is surfaced in streaming
`/v1/embeddings`	❌ Not supported
`/v1/completions` (legacy)	❌ Not supported
`/v1/audio/transcriptions`	✅ Supported via the dedicated Audio Transcriptions route
`/v1/audio/*` other than transcriptions	❌ Not supported
`/v1/images/*`, batches	❌ Not supported

Tresor extension reference

The tresor object on the response (non-streaming) or on the finish chunk (streaming) contains:

Field	Type	Description
`receipt_id`	string	ID for the signed receipt. Fetch the JWS via `GET /v1/receipts/{id}`. Omitted if receipts opted out.
`requested_route`	string	Normalized primary route from the caller. Bare model keys normalize to `auto/auto/<model_key>`.
`routed_model`	string	Fully-resolved compound model ID actually used.
`failover`	boolean	`true` if routing failover switched the request away from the primary route and onto a secondary route.
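
As an illustration of how `receipt_id` feeds into the receipts endpoint, the sketch below builds the `GET /v1/receipts/{id}` request without sending it. Two assumptions are made here and are not confirmed by this page: that the receipts route lives under the same base URL as chat completions, and that it accepts the same Bearer API key. The helper `receipt_request` is hypothetical.

```python
import urllib.request

# Assumption: same base URL as the chat completions endpoint.
BASE_URL = "https://api.tresor.co/v1"


def receipt_request(tresor: dict, api_key: str):
    """Build a GET request for the signed receipt, or None if receipts were opted out."""
    receipt_id = tresor.get("receipt_id")
    if receipt_id is None:
        # receipt_id is omitted when X-Tresor-Receipt was set to false.
        return None
    return urllib.request.Request(
        f"{BASE_URL}/receipts/{receipt_id}",
        # Assumption: the receipts route uses the same Bearer key.
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


req = receipt_request({"receipt_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7"}, "tr-...")
print(req.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the fetch; the response body is the JWS described in the table above.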
