Chat Completions

OpenAI-compatible chat completions with streaming, tool calling, and signed receipts.

POST /v1/chat/completions

Create a chat completion. The endpoint follows the OpenAI Chat Completions request and response shape, so any OpenAI SDK works by changing the base URL and API key.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.tresor.co/v1",
    api_key="tr-...",
)

resp = client.chat.completions.create(
    model="global/redpill/gpt-oss-120b",
    messages=[{"role": "user", "content": "Hello!"}],
)

Request body

Core parameters

| Parameter | Type | Required | Notes |
|---|---|---|---|
| model | string | Yes | Compound ID region/provider/model_key; see Models. |
| messages | array | Yes | Standard OpenAI messages (role, content, optionally tool_calls, tool_call_id). |
| stream | boolean | No | Stream tokens as SSE. Default false. |
| temperature | number | No | Sampling temperature. Accepted range 0–2; values above ~1.3 often produce incoherent output or empty content (especially for reasoning models like gpt-oss, where the budget is consumed by the analysis channel). Prefer tuning top_p for diversity. |
| max_tokens | integer | No | Cap on generated tokens. |
Recommendation. Keep temperature ≤ 1.3 in practice. At 1.8–2.0 the sampler flattens the token distribution toward uniform; reasoning models may burn the entire token budget in their internal channel and return content: "" with a high completion_tokens count. Use top_p (e.g. 0.95) when you want diversity without destabilising the output.
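To apply this guidance consistently across call sites, a small helper can validate sampling settings before they reach the request. The sketch below is hypothetical (sampling_params is not part of any SDK); it encodes the 0–2 range and the ~1.3 soft limit from this section, and it assumes the usual OpenAI top_p range of 0–1.

```python
import warnings

TEMPERATURE_RANGE = (0.0, 2.0)  # accepted range per the table above
TEMPERATURE_SOFT_LIMIT = 1.3    # recommended practical ceiling


def sampling_params(temperature=None, top_p=None):
    """Return validated sampling fields for a chat completion request."""
    params = {}
    if temperature is not None:
        low, high = TEMPERATURE_RANGE
        if not low <= temperature <= high:
            raise ValueError(f"temperature must be within {low}-{high}, got {temperature}")
        if temperature > TEMPERATURE_SOFT_LIMIT:
            # Still allowed, but warn: reasoning models may return empty content.
            warnings.warn(
                f"temperature={temperature} exceeds {TEMPERATURE_SOFT_LIMIT}; reasoning "
                "models may return empty content. Consider top_p for diversity.",
                stacklevel=2,
            )
        params["temperature"] = temperature
    if top_p is not None:
        if not 0.0 <= top_p <= 1.0:  # assumption: OpenAI-style 0-1 range
            raise ValueError(f"top_p must be within 0-1, got {top_p}")
        params["top_p"] = top_p
    return params
```

The returned dict can be spread into the SDK call, for example client.chat.completions.create(model=..., messages=..., **sampling_params(top_p=0.95)).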

Pass-through parameters

The following OpenAI parameters are forwarded verbatim to the upstream provider. Tresor does not validate them; the provider returns an error if it doesn't support a given option.

top_p, top_k, n, stop, seed, frequency_penalty, presence_penalty, logprobs, top_logprobs, response_format, tools, tool_choice, parallel_tool_calls, user, stream_options.

Most providers behind Tresor support the common subset (top_p, stop, seed, response_format, tools, tool_choice). Less common parameters (logprobs, n > 1) are provider-dependent.
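One practical wrinkle with the OpenAI Python SDK: top_k is not a named argument of client.chat.completions.create, so it has to travel in extra_body to reach the JSON request body. A minimal sketch of a hypothetical helper that sorts options accordingly, assuming the SDK_NATIVE set below matches the SDK version you have installed:

```python
# Pass-through options that the OpenAI Python SDK accepts as named arguments
# of client.chat.completions.create (subset relevant to this page).
SDK_NATIVE = {
    "temperature", "max_tokens", "stream", "top_p", "n", "stop", "seed",
    "frequency_penalty", "presence_penalty", "logprobs", "top_logprobs",
    "response_format", "tools", "tool_choice", "parallel_tool_calls",
    "user", "stream_options",
}


def split_create_kwargs(model, messages, **params):
    """Build kwargs for create(), routing unknown options (e.g. top_k) into extra_body."""
    kwargs = {"model": model, "messages": messages}
    extra_body = {}
    for name, value in params.items():
        if value is None:
            continue  # omit unset options instead of sending null
        if name in SDK_NATIVE:
            kwargs[name] = value
        else:
            extra_body[name] = value
    if extra_body:
        kwargs["extra_body"] = extra_body
    return kwargs
```

Pass the result as client.chat.completions.create(**kwargs). Tresor body extensions such as failover would land in extra_body the same way.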

Tresor extensions

| Parameter | Type | Notes |
|---|---|---|
| failover | array of strings | Ordered list of up to 5 alternative compound model IDs. The router tries them in order if the primary route cannot serve the request. Tresor-only. |

Routing failover

failover is a Tresor-only routing field for chat completions.

  • The router tries the primary model first, then each failover entry in order.
  • Each entry is validated like the primary route and must be a full compound model ID.
  • The selected route can have different pricing from the primary route.
  • tresor.requested_route reports the normalized primary route you asked the router to use.
  • tresor.routed_model always reports the route that actually served the request.
  • tresor.failover is true when the router had to leave the primary route and use a secondary entry from your failover list.
  • When you use automatic routing such as a bare model key or auto/auto/..., tresor.requested_route and tresor.routed_model can differ even when tresor.failover is false.
  • If the primary route and all failover entries are unavailable, the request fails with 503.

For worked examples and guidance on when to use this versus client retries, see Routing failover.
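A client can pre-check a failover list against the rules above before sending, and summarise what the router did afterwards. This is a hypothetical sketch: check_failover and describe_routing are illustrative names, and is_compound_id assumes a compound ID is exactly three non-empty slash-separated segments. The router remains the authority on validation.

```python
MAX_FAILOVER = 5  # documented limit on failover entries


def is_compound_id(model_id):
    # Assumption: region/provider/model_key with no "/" inside any segment.
    parts = model_id.split("/")
    return len(parts) == 3 and all(parts)


def check_failover(failover):
    """Client-side pre-check of a failover list; the router stays authoritative."""
    if len(failover) > MAX_FAILOVER:
        raise ValueError(f"failover accepts at most {MAX_FAILOVER} entries")
    bad = [entry for entry in failover if not is_compound_id(entry)]
    if bad:
        raise ValueError(f"not full compound model IDs: {bad}")
    return list(failover)


def describe_routing(tresor):
    """One-line summary of the tresor metadata returned with a completion."""
    requested, routed = tresor["requested_route"], tresor["routed_model"]
    if tresor["failover"]:
        return f"failover: {requested} -> {routed}"
    if requested != routed:
        # Automatic routing resolved the request without leaving the primary route.
        return f"auto-routed: {requested} -> {routed}"
    return f"primary: {routed}"
```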

Headers

| Header | Notes |
|---|---|
| Authorization | Bearer tr-… API key. Required. |
| X-Tresor-Receipt | Set to false to opt out of signed receipts. Default true. |
| X-Tresor-Timeout-Seconds | Optional; synchronous (non-stream) requests only. Positive integer seconds, default 180, max 600. Values below 180 shorten the budget, values above 600 are clamped, and invalid values return 400. |

Long-running non-stream requests

Streaming is the preferred option for long-running generations. If you intentionally want a single JSON response with stream: false, Tresor keeps the default synchronous timeout at 180s and lets you adjust the per-request budget with X-Tresor-Timeout-Seconds.

Rules:

  • only valid when stream is omitted or false
  • positive integer seconds only
  • values below 180 are accepted and shorten the budget
  • values above 600 are accepted and clamped to 600
  • duplicate, fractional, non-numeric, or non-positive values return 400 invalid_timeout_override
  • if you send the header with stream: true, the router rejects the request with 400 invalid_timeout_override

curl https://api.tresor.co/v1/chat/completions \
  -H "Authorization: Bearer $TRESOR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Tresor-Timeout-Seconds: 600" \
  -d '{
    "model": "eu/privatemode/gemma-4-31b",
    "stream": false,
    "messages": [
      {"role": "user", "content": "Translate the attached policy into German."}
    ]
  }'
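The header rules can be mirrored client-side to catch mistakes before a round trip. This hypothetical effective_timeout function returns the budget the router is documented to apply, or raises ValueError wherever the router would return 400 invalid_timeout_override. Ignoring surrounding whitespace is an assumption based on general HTTP header handling.

```python
import re

DEFAULT_TIMEOUT = 180  # seconds, documented default
MAX_TIMEOUT = 600      # documented cap; larger values are clamped
_DIGITS = re.compile(r"[0-9]+")


def effective_timeout(header_values, stream=False):
    """Return the synchronous budget (seconds) for X-Tresor-Timeout-Seconds.

    header_values lists every occurrence of the header (empty if absent).
    """
    if not header_values:
        return DEFAULT_TIMEOUT
    if stream:
        raise ValueError("invalid_timeout_override: header not allowed with stream: true")
    if len(header_values) > 1:
        raise ValueError("invalid_timeout_override: duplicate header")
    raw = header_values[0].strip()  # assumption: surrounding whitespace is ignored
    if not _DIGITS.fullmatch(raw) or int(raw) == 0:
        # Rejects fractional ("1.5"), non-numeric, negative, and zero values.
        raise ValueError(f"invalid_timeout_override: {raw!r}")
    return min(int(raw), MAX_TIMEOUT)
```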

When you are using an OpenAI SDK, put router controls such as X-Tresor-Timeout-Seconds in extra_headers and Tresor body extensions such as failover in extra_body.

resp = client.chat.completions.create(
    model="eu/privatemode/gemma-4-31b",
    stream=False,
    messages=[
        {"role": "user", "content": "Translate the attached policy into German."}
    ],
    extra_headers={"X-Tresor-Timeout-Seconds": "600"},
    extra_body={
        "failover": ["auto/auto/gemma-4-31b"]
    },
)

If a request still times out at the 600s cap, switch to streaming instead of depending on a longer synchronous wait.

Response (non-streaming)

{
  "id": "chatcmpl-abc",
  "object": "chat.completion",
  "model": "global/redpill/gpt-oss-120b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 7,
    "total_tokens": 16
  },
  "tresor": {
    "receipt_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
    "requested_route": "auto/auto/gpt-oss-120b",
    "routed_model": "global/redpill/gpt-oss-120b",
    "failover": false
  }
}

finish_reason is passed through unchanged from the upstream provider (stop, length, content_filter, or tool_calls).
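A hypothetical helper for triaging the first choice of a non-streaming response. The empty-content branch targets the reasoning-model failure described under Core parameters; it assumes the upstream reports length when the token cap is hit, which this page does not guarantee.

```python
def diagnose_choice(response):
    """Classify why the first choice of a non-streaming response finished."""
    choice = response["choices"][0]
    reason = choice.get("finish_reason")
    content = choice.get("message", {}).get("content")
    if reason == "length":
        # Empty visible output at the cap suggests a reasoning model spent the
        # whole budget in its analysis channel; raise max_tokens or lower temperature.
        return "length_empty" if not content else "truncated"
    if reason == "tool_calls":
        return "tool_calls"  # run the tools and send results back
    if reason == "content_filter":
        return "filtered"
    return "ok"  # stop, or any value not handled above
```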

Streaming response

When stream is true, the response is a stream of Server-Sent Events. Each data: line carries one chunk in standard OpenAI format. The final chunk before data: [DONE] carries finish_reason and the Tresor metadata.

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","model":"global/redpill/gpt-oss-120b","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","model":"global/redpill/gpt-oss-120b","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","model":"global/redpill/gpt-oss-120b","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","model":"global/redpill/gpt-oss-120b","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"tresor":{"receipt_id":"7c9e6679-7425-40de-944b-e07fc1f90ae7","requested_route":"auto/auto/gpt-oss-120b","routed_model":"global/redpill/gpt-oss-120b","failover":false}}

data: [DONE]

The tresor field is an extension key on the chunk object — OpenAI SDKs ignore unknown keys, so existing clients work unchanged.
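A minimal SSE consumer sketch using only the standard library. It assumes each event arrives as a single data: line, as in the example above, and reads only choices[0]; parse_sse is an illustrative name. With the requests library you could feed it response.iter_lines(decode_unicode=True) from a request made with stream=True.

```python
import json


def parse_sse(lines):
    """Return (text, finish_reason, tresor) from chat-completion SSE lines."""
    text, finish_reason, tresor = [], None, None
    for line in lines:
        if not line.startswith("data:"):
            continue  # blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue  # only choices[0] is surfaced in streaming
            delta = choice.get("delta") or {}
            text.append(delta.get("content") or "")
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
        # Tresor metadata rides on the finish chunk as an extra top-level key.
        if "tresor" in chunk:
            tresor = chunk["tresor"]
    return "".join(text), finish_reason, tresor
```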

Reasoning models

Models that emit chain-of-thought (gpt-oss-120b, DeepSeek-R1, etc.) deliver reasoning tokens via delta.reasoning_content, separate from delta.content:

{"choices":[{"index":0,"delta":{"reasoning_content":"The user wants…"},"finish_reason":null}]}
{"choices":[{"index":0,"delta":{"content":"Hello!"},"finish_reason":null}]}
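To render reasoning and the final answer separately, accumulate the two delta fields independently. A sketch over already-parsed chunk dicts; split_reasoning is a hypothetical name.

```python
def split_reasoning(chunks):
    """Return (reasoning, answer) accumulated from parsed stream chunks."""
    reasoning, answer = [], []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            # Either field may be absent or null in a given chunk.
            reasoning.append(delta.get("reasoning_content") or "")
            answer.append(delta.get("content") or "")
    return "".join(reasoning), "".join(answer)
```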

Tool calling

Tool calling follows the standard OpenAI shape. Pass tools and (optionally) tool_choice in the request; Tresor forwards them to the provider untouched.

Request

{
  "model": "global/redpill/gpt-oss-120b",
  "messages": [{ "role": "user", "content": "What's the weather in Paris?" }],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
          "type": "object",
          "properties": { "city": { "type": "string" } },
          "required": ["city"]
        }
      }
    }
  ]
}

Non-streaming response

{
  "id": "chatcmpl-abc",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": [
          {
            "id": "call_abc",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\":\"Paris\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": { "prompt_tokens": 42, "completion_tokens": 18, "total_tokens": 60 },
  "tresor": {
    "receipt_id": "e8a30b14-4b5c-4f7a-8e91-2a3d9f0b1c5e",
    "requested_route": "global/redpill/gpt-oss-120b",
    "routed_model": "global/redpill/gpt-oss-120b",
    "failover": false
  }
}
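Completing the round trip means executing the requested function and sending its result back as a role: "tool" message with the matching tool_call_id, per the standard OpenAI shape. In this sketch get_weather is a hypothetical local stub returning placeholder data, and the TOOLS registry is illustrative.

```python
import json


def get_weather(city):
    # Hypothetical stub: returns placeholder data, not a real forecast.
    return {"city": city, "forecast": "placeholder"}


TOOLS = {"get_weather": get_weather}  # illustrative name -> callable registry


def tool_messages(response):
    """Execute requested tools; return the messages to append for the follow-up call."""
    message = response["choices"][0]["message"]
    out = [message]  # the assistant turn with tool_calls is sent back as-is
    for call in message.get("tool_calls") or []:
        fn = TOOLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"] or "{}")
        out.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return out
```

Append the returned messages to the original messages list and call the endpoint again with the same tools to get the model's final answer.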

Streaming response

Tool calls stream as delta.tool_calls fragments. Concatenate function.arguments strings across chunks with the same index to assemble the final argument JSON.

data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"id":"call_abc","type":"function","function":{"name":"get_weather","arguments":""}}]},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"city"}}]},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\":\"Paris\"}"}}]},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}],"tresor":{"receipt_id":"e8a30b14-4b5c-4f7a-8e91-2a3d9f0b1c5e","requested_route":"global/redpill/gpt-oss-120b","routed_model":"global/redpill/gpt-oss-120b","failover":false}}

data: [DONE]
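The concatenation rule above can be sketched as a merge keyed by fragment index over already-parsed chunk dicts. assemble_tool_calls is a hypothetical name, and the sketch assumes id and name each arrive whole in a single fragment, as in the example.

```python
import json


def assemble_tool_calls(chunks):
    """Merge streamed delta.tool_calls fragments into complete calls, ordered by index."""
    calls = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            for frag in (choice.get("delta") or {}).get("tool_calls") or []:
                call = calls.setdefault(
                    frag["index"],
                    {"id": None, "type": "function", "function": {"name": "", "arguments": ""}},
                )
                if frag.get("id"):
                    call["id"] = frag["id"]
                fn = frag.get("function") or {}
                if fn.get("name"):
                    call["function"]["name"] = fn["name"]
                # Argument JSON arrives in pieces; concatenate in arrival order.
                call["function"]["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]
```

Once finish_reason is tool_calls, json.loads on each assembled arguments string yields the final argument object.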

Compatibility matrix

| Feature | Status |
|---|---|
| Non-streaming chat completions | ✅ Supported |
| Streaming via SSE | ✅ Supported |
| tools / tool_choice (function calling) | ✅ Supported (streaming and non-streaming) |
| response_format (JSON mode / schema) | ✅ Forwarded; provider-dependent |
| reasoning_content for reasoning models | ✅ Supported |
| Vision / multimodal content arrays | ✅ Forwarded; provider-dependent |
| seed, stop, top_p, penalties | ✅ Forwarded |
| logprobs, n > 1 | ⚠️ Forwarded; only choices[0] is surfaced in streaming |
| /v1/embeddings | ❌ Not supported |
| /v1/completions (legacy) | ❌ Not supported |
| /v1/audio/transcriptions | ✅ Supported via the dedicated Audio Transcriptions route |
| /v1/audio/* other than transcriptions | ❌ Not supported |
| /v1/images/*, batches | ❌ Not supported |

Tresor extension reference

The tresor object on the response (non-streaming) or on the finish chunk (streaming) contains:

| Field | Type | Description |
|---|---|---|
| receipt_id | string | ID of the signed receipt. Fetch the JWS via GET /v1/receipts/{id}. Omitted when you opt out of receipts with X-Tresor-Receipt: false. |
| requested_route | string | Normalized primary route from the caller. Bare model keys normalize to auto/auto/<model_key>. |
| routed_model | string | Fully resolved compound model ID actually used. |
| failover | boolean | true if routing failover moved the request off the primary route and onto a secondary route. |
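A standard-library sketch that builds the receipt fetch from the tresor object. It deliberately stops short of sending the request; the base URL matches the one used earlier on this page, and receipt_request is an illustrative name. This page does not describe the response body beyond it carrying the JWS, so parsing is left to you.

```python
import urllib.request

BASE_URL = "https://api.tresor.co/v1"


def receipt_request(tresor, api_key):
    """Build (but do not send) GET /v1/receipts/{id}; None if no receipt was issued."""
    receipt_id = tresor.get("receipt_id")
    if not receipt_id:
        return None  # e.g. the request was sent with X-Tresor-Receipt: false
    return urllib.request.Request(
        f"{BASE_URL}/receipts/{receipt_id}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )
```

To send it, open the request with urllib.request.urlopen(req) inside a with block and read the body.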