Use the OpenAI SDK

Drop-in replacement for OpenAI — set the base URL and your Tresor API key.

The Tresor API follows the OpenAI request and response shape for both chat completions and audio transcriptions, so existing OpenAI SDKs work. You only need to change two things in your code:

  1. Set the base URL to https://api.tresor.co/v1.
  2. Use your Tresor API key (starts with tr-).
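
The two changes above can be kept in one place rather than repeated at every call site. The following is a minimal sketch, not an official helper: the environment variable name TRESOR_API_KEY and the function tresor_client_kwargs are assumptions made for illustration. It reads the key from the environment and rejects a key without the tr- prefix before any request is sent.

```python
import os

TRESOR_BASE_URL = "https://api.tresor.co/v1"


def tresor_client_kwargs(env=None):
    """Build keyword arguments for openai.OpenAI from the environment.

    TRESOR_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    api_key = env.get("TRESOR_API_KEY", "")
    # Tresor keys start with "tr-"; this catches a leftover OpenAI key early.
    if not api_key.startswith("tr-"):
        raise ValueError("TRESOR_API_KEY is missing or does not start with 'tr-'")
    return {"api_key": api_key, "base_url": TRESOR_BASE_URL}
```

With the openai package installed, the result can be passed straight to the constructor: client = OpenAI(**tresor_client_kwargs()).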

For supported parameters and known limitations, see the chat completions reference and the audio transcriptions reference.

Tresor-specific request body fields such as region, provider, and chat-completions failover go through extra_body in OpenAI SDKs. Router control headers such as X-Tresor-Receipt and X-Tresor-Timeout-Seconds go through extra_headers. See Routing failover for the routing-specific semantics.

This guide uses standard TLS only. To additionally pin every request to a specific attested binary, combine the OpenAI SDK with tresor-attest.

Install

pip install openai

Streaming chat completion

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.tresor.co/v1",
)

stream = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

for chunk in stream:
    # Some chunks (for example the finish event) may carry no choices.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Reading the receipt

Every chat completion response includes a tresor.receipt_id field; on a stream, it arrives in the chunk for the finish event. You can fetch and verify the receipt later; see Verify a receipt.
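
As a rough sketch of how the ID might be read from an SDK response, the helper below works on a plain dictionary. It assumes the field is nested as a top-level tresor object with a receipt_id key; that layout is inferred from the field name and is not confirmed here. The name receipt_id_from is hypothetical.

```python
from typing import Any, Mapping, Optional


def receipt_id_from(payload: Mapping[str, Any]) -> Optional[str]:
    """Return the receipt ID from a response or chunk payload, if present.

    Assumes the layout {"tresor": {"receipt_id": ...}}, which is a guess
    based on the field name rather than a documented schema.
    """
    tresor = payload.get("tresor")
    if isinstance(tresor, Mapping):
        return tresor.get("receipt_id")
    # No Tresor extension on this payload (for example a mid-stream chunk).
    return None
```

The OpenAI Python SDK's response models keep unknown fields, so receipt_id_from(resp.model_dump()) should work for a non-stream response. On a stream, call it on each chunk's model_dump() and keep the last value that is not None.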

Long-running non-stream requests

Streaming remains the preferred path for long-running generations. When you intentionally set stream=False, the router applies a default timeout of 180s and accepts an optional X-Tresor-Timeout-Seconds header to change it. Positive values below 180 shorten the budget, values up to 600 extend it, and values above 600 are clamped to the hard cap of 600s.

Use extra_headers for the timeout override and extra_body for Tresor body extensions:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.tresor.co/v1",
)

resp = client.chat.completions.create(
    model="eu/privatemode/gemma-4-31b",
    stream=False,
    messages=[{"role": "user", "content": "Translate the attached policy into German."}],
    extra_headers={"X-Tresor-Timeout-Seconds": "600"},
    extra_body={
        "failover": ["auto/auto/gemma-4-31b"]
    },
)

print(resp.choices[0].message.content)

The timeout override is valid only for chat completions when stream is omitted or false. The value must be a single positive integer: values below 180 shorten the request budget, and values above 600 are clamped. Invalid, duplicate, fractional, or non-positive values return 400 invalid_timeout_override. If the request still times out at 600s, switch to streaming, since the synchronous wait cannot be extended further.
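
To avoid a 400 invalid_timeout_override round trip, the header value can be checked on the client first. This is a minimal sketch that mirrors the rules described above; the helper name timeout_headers is hypothetical, and the router remains the authority on what it accepts.

```python
HEADER = "X-Tresor-Timeout-Seconds"
HARD_CAP_SECONDS = 600


def timeout_headers(seconds: int) -> dict:
    """Return an extra_headers dict for a non-stream timeout override.

    Rejects values the router is described as refusing (non-integer or
    non-positive) and applies the 600s cap locally.
    """
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(seconds, bool) or not isinstance(seconds, int) or seconds <= 0:
        raise ValueError("timeout must be a positive integer number of seconds")
    return {HEADER: str(min(seconds, HARD_CAP_SECONDS))}
```

Pass the result as extra_headers=timeout_headers(300) on a request with stream=False.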

Audio transcription

OpenAI SDKs can also call Tresor's transcription route.

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.tresor.co/v1",
)

with open("meeting.webm", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="auto/auto/whisper-large-v3",
        file=audio,
        response_format="json",
    )

print(transcript.text)