POST /v1/chat/completions

Create a chat completion. The endpoint follows the OpenAI Chat Completions request and response shape, so any OpenAI SDK works by changing the base URL and API key.
from openai import OpenAI
client = OpenAI(
base_url="https://api.tresor.co/v1",
api_key="tr-...",
)
resp = client.chat.completions.create(
model="global/redpill/gpt-oss-120b",
messages=[{"role": "user", "content": "Hello!"}],
)
| Parameter | Type | Required | Notes |
|---|---|---|---|
| model | string | Yes | Compound ID region/provider/model_key; see Models. |
| messages | array | Yes | Standard OpenAI messages (role, content, optionally tool_calls, tool_call_id). |
| stream | boolean | No | Stream tokens as SSE. Default false. |
| temperature | number | No | Sampling temperature. Accepted range 0–2; values above ~1.3 often produce incoherent output or empty content (especially for reasoning models like gpt-oss, where the budget is consumed by the analysis channel). Prefer tuning top_p for diversity. |
| max_tokens | integer | No | Cap on generated tokens. |
Keep temperature ≤ 1.3 in practice. At 1.8–2.0 the sampler approaches a uniform distribution; reasoning models may burn the entire token budget in their internal channel and return content: "" with a high completion_tokens count. Use top_p (e.g. 0.95) when you want diversity without destabilising the output.

The following OpenAI parameters are forwarded verbatim to the upstream provider. Tresor does not validate them; the provider returns an error if it doesn't support a given option.
top_p, top_k, n, stop, seed, frequency_penalty, presence_penalty, logprobs, top_logprobs, response_format, tools, tool_choice, parallel_tool_calls, user, stream_options.
Most providers behind Tresor support the common subset (top_p, stop, seed, response_format, tools, tool_choice). Less common parameters (logprobs, n > 1) are provider-dependent.
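The empty-content symptom described above can be detected on the client before retrying with a lower temperature. The following is a minimal sketch, not part of the Tresor API: the helper name budget_exhausted is hypothetical, and it assumes the response has been converted to a plain dict (for example with resp.model_dump() in the OpenAI Python SDK).

```python
def budget_exhausted(response: dict) -> bool:
    """Heuristic: True when tokens were generated but the content is empty.

    Hypothetical helper; it only inspects the documented response fields
    (choices[0].message.content and usage.completion_tokens).
    """
    usage = response.get("usage") or {}
    spent = usage.get("completion_tokens") or 0
    choices = response.get("choices") or []
    if not choices:
        return False
    message = choices[0].get("message") or {}
    # Tool-call responses legitimately carry empty content, so exclude them.
    if message.get("tool_calls"):
        return False
    return spent > 0 and not (message.get("content") or "")
```

If the check fires, retrying with a lower temperature or a larger max_tokens is a reasonable first step.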
| Parameter | Type | Notes |
|---|---|---|
| failover | array of strings | Ordered list of up to 5 alternative compound model IDs. The router tries them after the primary route if needed. Tresor-only. |
failover is a Tresor-only routing field for chat completions.
- The router tries model first, then each failover entry in order.
- tresor.requested_route reports the normalized primary route you asked the router to use.
- tresor.routed_model always reports the route that actually served the request.
- tresor.failover is true when the router had to leave the primary route and use a secondary entry from your failover list.
- With auto/auto/... routes, tresor.requested_route and tresor.routed_model can differ even when tresor.failover is false.
- If every route fails, the request returns 503.

For worked examples and guidance on when to use this versus client retries, see Routing failover.
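The routing metadata can be interpreted on the client as sketched below. This is an illustration only: the helper names failover_body and classify_route and the three labels are hypothetical, and the length check simply mirrors the documented limit of 5 entries.

```python
MAX_FAILOVER = 5  # documented upper bound on failover entries


def failover_body(routes: list) -> dict:
    """Build the Tresor body extension for an ordered failover list."""
    if len(routes) > MAX_FAILOVER:
        raise ValueError(f"failover accepts at most {MAX_FAILOVER} entries")
    return {"failover": list(routes)}


def classify_route(tresor: dict) -> str:
    """Label how a request was served, based on the tresor metadata."""
    if tresor.get("failover"):
        # The router left the primary route for a failover entry.
        return "failover"
    if tresor.get("requested_route") != tresor.get("routed_model"):
        # E.g. auto/auto/... resolved to a concrete route without failover.
        return "resolved"
    return "primary"
```

With an OpenAI SDK, the dict returned by failover_body can be passed as extra_body.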
| Header | Notes |
|---|---|
| Authorization | Bearer tr-… API key. Required. |
| X-Tresor-Receipt | Set to false to opt out of signed receipts. Default true. |
| X-Tresor-Timeout-Seconds | Optional for synchronous non-stream requests only. Positive integer seconds, default 180, max 600. Values below 180 shorten the budget, values above 600 are clamped, and invalid values return 400. |
Streaming is the preferred option for long-running generations. If you intentionally want a single JSON response with stream: false, Tresor keeps the default synchronous timeout at 180s and lets you adjust the per-request budget with X-Tresor-Timeout-Seconds.
Rules:
- The header applies only when stream is omitted or false.
- Values below 180 are accepted and shorten the budget.
- Values above 600 are accepted and clamped to 600.
- Invalid values return 400 invalid_timeout_override.
- If the header is sent with stream: true, the router rejects the request with 400 invalid_timeout_override.

curl https://api.tresor.co/v1/chat/completions \
-H "Authorization: Bearer $TRESOR_API_KEY" \
-H "Content-Type: application/json" \
-H "X-Tresor-Timeout-Seconds: 600" \
-d '{
"model": "eu/privatemode/gemma-4-31b",
"stream": false,
"messages": [
{"role": "user", "content": "Translate the attached policy into German."}
]
}'
When you are using an OpenAI SDK, put router controls such as X-Tresor-Timeout-Seconds in extra_headers and Tresor body extensions such as failover in extra_body.
resp = client.chat.completions.create(
model="eu/privatemode/gemma-4-31b",
stream=False,
messages=[
{"role": "user", "content": "Translate the attached policy into German."}
],
extra_headers={"X-Tresor-Timeout-Seconds": "600"},
extra_body={
"failover": ["auto/auto/gemma-4-31b"]
},
)
If a request still times out at the 600s cap, switch to streaming instead of depending on a longer synchronous wait.
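The timeout rules in this section can be mirrored on the client to catch bad values before a request is sent. This is a sketch under assumptions, not the router's implementation: the function name clamp_timeout_header is hypothetical, and "invalid" is taken to mean anything that is not a positive integer, following the header table.

```python
MAX_TIMEOUT = 600  # documented cap in seconds


def clamp_timeout_header(value: str, stream: bool = False) -> int:
    """Return the budget the router would apply for an explicit header value.

    Raises ValueError where the router is documented to answer
    400 invalid_timeout_override.
    """
    if stream:
        # The header is only valid for synchronous (non-stream) requests.
        raise ValueError("invalid_timeout_override: not allowed with stream: true")
    text = value.strip()
    # Assumption: only plain ASCII digits count as a valid integer.
    if not (text.isascii() and text.isdigit()) or int(text) <= 0:
        raise ValueError("invalid_timeout_override: expected a positive integer")
    # Values above the cap are accepted and clamped, not rejected.
    return min(int(text), MAX_TIMEOUT)
```

When the header is omitted, the default synchronous budget of 180s applies and there is nothing to validate.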
{
"id": "chatcmpl-abc",
"object": "chat.completion",
"model": "global/redpill/gpt-oss-120b",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I help you?"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 9,
"completion_tokens": 7,
"total_tokens": 16
},
"tresor": {
"receipt_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
"requested_route": "auto/auto/gpt-oss-120b",
"routed_model": "global/redpill/gpt-oss-120b",
"failover": false
}
}
finish_reason is the upstream value (stop, length, content_filter, tool_calls).
When stream: true, the response is Server-Sent Events. Each data: line is a chunk in standard OpenAI format. The final chunk before [DONE] carries finish_reason and the Tresor metadata.
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","model":"global/redpill/gpt-oss-120b","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","model":"global/redpill/gpt-oss-120b","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","model":"global/redpill/gpt-oss-120b","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","model":"global/redpill/gpt-oss-120b","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"tresor":{"receipt_id":"7c9e6679-7425-40de-944b-e07fc1f90ae7","requested_route":"auto/auto/gpt-oss-120b","routed_model":"global/redpill/gpt-oss-120b","failover":false}}
data: [DONE]
The tresor field is an extension key on the chunk object — OpenAI SDKs ignore unknown keys, so existing clients work unchanged.
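For clients that read the SSE stream directly rather than through an SDK, the chunk format above can be consumed roughly as follows. This is a minimal sketch: collect_stream is a hypothetical helper that takes any iterable of decoded text lines and only tracks choices[0].

```python
import json
from typing import Iterable, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[str], Optional[dict]]:
    """Assemble content, finish_reason and tresor metadata from SSE lines."""
    parts = []
    finish_reason = None
    tresor = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators between events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        # The tresor object arrives on the final chunk before [DONE].
        tresor = chunk.get("tresor", tresor)
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue
            delta = choice.get("delta") or {}
            parts.append(delta.get("content") or "")
            finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(parts), finish_reason, tresor
```

The function returns a (content, finish_reason, tresor) tuple; tresor stays None if the stream ends before the final chunk arrives.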
Models that emit chain-of-thought (gpt-oss-120b, DeepSeek-R1, etc.) deliver reasoning tokens via delta.reasoning_content, separate from delta.content:
{"choices":[{"index":0,"delta":{"reasoning_content":"The user wants…"},"finish_reason":null}]}
{"choices":[{"index":0,"delta":{"content":"Hello!"},"finish_reason":null}]}
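Keeping the two channels apart on the client can look like the sketch below. It assumes the chunks have already been parsed into dicts; split_reasoning is a hypothetical helper name.

```python
from typing import Iterable, Tuple


def split_reasoning(chunks: Iterable[dict]) -> Tuple[str, str]:
    """Return (reasoning, answer) accumulated from parsed stream chunks."""
    reasoning, answer = [], []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            # Reasoning tokens and visible content arrive in separate keys.
            if delta.get("reasoning_content"):
                reasoning.append(delta["reasoning_content"])
            if delta.get("content"):
                answer.append(delta["content"])
    return "".join(reasoning), "".join(answer)
```

Only the second element needs to be shown to end users; the first can be logged or discarded.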
Tool calling follows the standard OpenAI shape. Pass tools and (optionally) tool_choice in the request; Tresor forwards them to the provider untouched.
{
"model": "global/redpill/gpt-oss-120b",
"messages": [{ "role": "user", "content": "What's the weather in Paris?" }],
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a city",
"parameters": {
"type": "object",
"properties": { "city": { "type": "string" } },
"required": ["city"]
}
}
}
]
}
{
"id": "chatcmpl-abc",
"object": "chat.completion",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "",
"tool_calls": [
{
"id": "call_abc",
"type": "function",
"function": {
"name": "get_weather",
"arguments": "{\"city\":\"Paris\"}"
}
}
]
},
"finish_reason": "tool_calls"
}
],
"usage": { "prompt_tokens": 42, "completion_tokens": 18, "total_tokens": 60 },
"tresor": {
"receipt_id": "e8a30b14-4b5c-4f7a-8e91-2a3d9f0b1c5e",
"requested_route": "global/redpill/gpt-oss-120b",
"routed_model": "global/redpill/gpt-oss-120b",
"failover": false
}
}
Tool calls stream as delta.tool_calls fragments. Concatenate function.arguments strings across chunks with the same index to assemble the final argument JSON.
data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"id":"call_abc","type":"function","function":{"name":"get_weather","arguments":""}}]},"finish_reason":null}]}
data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"city"}}]},"finish_reason":null}]}
data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\":\"Paris\"}"}}]},"finish_reason":null}]}
data: {"choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}],"tresor":{"receipt_id":"e8a30b14-4b5c-4f7a-8e91-2a3d9f0b1c5e","requested_route":"global/redpill/gpt-oss-120b","routed_model":"global/redpill/gpt-oss-120b","failover":false}}
data: [DONE]
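The concatenation rule for streamed tool calls can be sketched as follows. The helper assemble_tool_calls is hypothetical and works on chunks already parsed into dicts; it groups fragments by their tool-call index and joins the arguments strings in arrival order.

```python
from typing import Iterable, List


def assemble_tool_calls(chunks: Iterable[dict]) -> List[dict]:
    """Merge delta.tool_calls fragments into complete tool calls."""
    calls = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            for frag in delta.get("tool_calls") or []:
                call = calls.setdefault(
                    frag["index"], {"id": None, "name": None, "arguments": ""}
                )
                # id and name normally appear only on the first fragment.
                if frag.get("id"):
                    call["id"] = frag["id"]
                fn = frag.get("function") or {}
                if fn.get("name"):
                    call["name"] = fn["name"]
                call["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]
```

Each returned arguments string should be parsed with json.loads only after the stream has finished, since intermediate fragments are not valid JSON on their own.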
| Feature | Status |
|---|---|
| Non-streaming chat completions | ✅ Supported |
| Streaming via SSE | ✅ Supported |
| tools / tool_choice (function calling) | ✅ Supported (streaming + non-streaming) |
| response_format (JSON mode / schema) | ✅ Forwarded; provider-dependent |
| reasoning_content for reasoning models | ✅ Supported |
| Vision / multimodal content arrays | ✅ Forwarded; provider-dependent |
| seed, stop, top_p, penalties | ✅ Forwarded |
| logprobs, n > 1 | ⚠️ Forwarded; only choices[0] is surfaced in streaming |
| /v1/embeddings | ❌ Not supported |
| /v1/completions (legacy) | ❌ Not supported |
| /v1/audio/transcriptions | ✅ Supported via the dedicated Audio Transcriptions route |
| /v1/audio/* other than transcriptions | ❌ Not supported |
| /v1/images/*, batches | ❌ Not supported |
The tresor object on the response (non-streaming) or on the finish chunk (streaming) contains:
| Field | Type | Description |
|---|---|---|
| receipt_id | string | ID for the signed receipt. Fetch the JWS via GET /v1/receipts/{id}. Omitted if receipts opted out. |
| requested_route | string | Normalized primary route from the caller. Bare model keys normalize to auto/auto/<model_key>. |
| routed_model | string | Fully-resolved compound model ID actually used. |
| failover | boolean | true if routing failover switched the request away from the primary route and onto a secondary route. |
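Preparing the receipt lookup from this metadata might look like the sketch below. It only builds the request and does not send it. The helper receipt_request is hypothetical, and it assumes the receipts endpoint accepts the same Bearer API key as chat completions, which this page does not state.

```python
import urllib.request
from typing import Optional

BASE_URL = "https://api.tresor.co/v1"


def receipt_request(tresor: dict, api_key: str) -> Optional[urllib.request.Request]:
    """Build a GET /v1/receipts/{id} request, or None if receipts are off."""
    receipt_id = tresor.get("receipt_id")
    if not receipt_id:
        # receipt_id is omitted when X-Tresor-Receipt: false was sent.
        return None
    return urllib.request.Request(
        f"{BASE_URL}/receipts/{receipt_id}",
        # Assumption: same Authorization scheme as the chat endpoint.
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )
```

The resulting object can be passed to urllib.request.urlopen; the shape of the returned JWS is outside the scope of this page.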