Audio Transcriptions

Upload audio to the Tresor API and get back a normalized transcript with receipts and usage metadata.

Use POST /v1/audio/transcriptions when you want OpenAI-compatible speech-to-text through Tresor.

Current transcription routes include whisper-large-v3 (billed per audio minute), whisper-large-v3-turbo (billed per request), and voxtral-small-24b (billed on transcription token usage). You can pin any route directly, or let the router resolve a bare model key with the auto/auto/... form, for example auto/auto/whisper-large-v3.

Unlike chat completions, transcriptions do not take a client-provided failover list. Automatic route switching is router-managed when you use automatic resolution, and the response reports that via tresor.requested_route, tresor.routed_model, and tresor.failover. See Routing failover.
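To make the routing fields concrete, here is a minimal sketch that summarizes them from a parsed JSON response. It assumes the body carries a top-level tresor object and that tresor.failover is empty or absent when no switch happened; the exact shape of that field is not specified here, so the helper only checks whether it is set. The function name summarize_routing is hypothetical.

```python
from typing import Any, Mapping


def summarize_routing(response: Mapping[str, Any]) -> str:
    """Describe how the router resolved a transcription request.

    Assumption: the parsed JSON body has a top-level "tresor" object.
    The shape of "failover" is not assumed; it is only tested for truthiness.
    """
    meta = response.get("tresor") or {}
    requested = meta.get("requested_route", "unknown")
    routed = meta.get("routed_model", "unknown")
    if meta.get("failover"):
        return f"requested {requested}, served by {routed} after failover"
    return f"requested {requested}, served by {routed}"
```

Logging this summary next to each transcript makes it easier to spot when an automatic route resolved to a different model than expected.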

Audio format compatibility is not perfectly uniform across providers. The router accepts a broader set of audio MIME types, but upstream models can still reject specific containers or codecs. For the highest portability, prefer WAV or MP3 when testing a new route.

Known route-specific caveats:

  • global/tinfoil/whisper-large-v3-turbo: in current validation, a WAV upload was accepted while an M4A upload was rejected upstream.
  • global/tinfoil/voxtral-small-24b: current provider docs advertise MP3 or WAV and up to 30 minutes for transcription, but provider-side size checks can still reject compressed files well below the router upload cap.
  • eu/privatemode/whisper-large-v3: current provider docs advertise up to 50 MB per request.
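The caveats above can be turned into a rough local preflight check before uploading. This is only a sketch: the helper name preflight_warnings is hypothetical, the 50 MB figure is borrowed from the eu/privatemode/whisper-large-v3 caveat and interpreted as 50 * 1024 * 1024 bytes (an assumption), and passing the check does not guarantee that a given route accepts the file.

```python
from pathlib import Path

# WAV and MP3 are the formats recommended above for portability.
PORTABLE_SUFFIXES = {".wav", ".mp3"}

# Assumption: "50 MB" is read as 50 * 1024 * 1024 bytes.
MAX_BYTES = 50 * 1024 * 1024


def preflight_warnings(filename: str, size_bytes: int) -> list[str]:
    """Return human-readable warnings for an upload; empty list means no known issue."""
    warnings = []
    if Path(filename).suffix.lower() not in PORTABLE_SUFFIXES:
        warnings.append("container may be rejected upstream; consider WAV or MP3")
    if size_bytes > MAX_BYTES:
        warnings.append("file exceeds 50 MB; consider re-encoding or splitting")
    return warnings
```

For example, preflight_warnings("call.m4a", 60 * 1024 * 1024) returns both warnings, while a small meeting.mp3 returns an empty list.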

For the full request and response schema, see the audio transcriptions reference.

Curl

curl https://api.tresor.co/v1/audio/transcriptions \
  -H "Authorization: Bearer $TRESOR_API_KEY" \
  -F "model=auto/auto/whisper-large-v3" \
  -F "file=@./meeting.mp3;type=audio/mpeg" \
  -F "response_format=json"

OpenAI SDK

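import os

from openai import OpenAI

# Point the OpenAI SDK at Tresor. The key is read from the same
# TRESOR_API_KEY variable used in the curl example.
client = OpenAI(
    api_key=os.environ["TRESOR_API_KEY"],
    base_url="https://api.tresor.co/v1",
)

# MP3 or WAV is the most portable choice across routes.
with open("meeting.mp3", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="auto/auto/whisper-large-v3",
        file=audio,
        response_format="json",
    )

print(transcript.text)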

Response formats

response_format   What you get
json              Normalized text, optional language, optional duration, and Tresor metadata.
text              Plain transcript body. Tresor metadata moves to X-Tresor-Routed-Model and X-Tresor-Receipt-Id headers.
verbose_json      Normalized transcript plus segments[]. Currently requires language.
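With response_format=text the metadata lives in response headers rather than the body. A small sketch of reading them from any header mapping follows; the helper name tresor_headers is hypothetical, and the lookup is case-insensitive because HTTP clients differ in how they normalize header names.

```python
from typing import Mapping, Optional, Tuple


def tresor_headers(headers: Mapping[str, str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (routed_model, receipt_id) from response headers.

    Either value is None when the corresponding header is missing.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    return (
        lowered.get("x-tresor-routed-model"),
        lowered.get("x-tresor-receipt-id"),
    )
```

The function accepts a plain dict as well as the header objects exposed by common HTTP clients, as long as they support items().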

If a route rejects your upload, Tresor surfaces route-specific file errors as 400 responses with codes such as unsupported_file_type, invalid_file, or file_too_large. 502 upstream_error is reserved for genuine upstream failures that changing the file would not fix.
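One way to act on this split is a small decision helper, sketched below. It takes the HTTP status and the error code as plain arguments because where the code sits in the error body is not described here; the function name next_step and its return labels are hypothetical.

```python
# Error codes named above that indicate a problem with the uploaded file.
FILE_ERROR_CODES = {"unsupported_file_type", "invalid_file", "file_too_large"}


def next_step(status: int, code: str) -> str:
    """Suggest a follow-up action for a failed transcription request."""
    if status == 400 and code in FILE_ERROR_CODES:
        # Re-encoding or splitting the audio is likely to help.
        return "fix_file"
    if status == 502 and code == "upstream_error":
        # Changing the file will probably not help; retry or pick another route.
        return "retry_or_switch_route"
    return "inspect"
```

Anything that does not match either case falls through to "inspect", so unexpected errors are not silently retried.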

Preparing audio locally

Before assuming a transcription route is unavailable, make one cheap local preprocessing pass:

  • re-encode to mono MP3 or WAV
  • split long recordings locally instead of relying on hidden server-side chunking

ffmpeg -i input.m4a -vn -ac 1 -ar 16000 -c:a libmp3lame -b:a 96k output.mp3

mkdir -p chunks
ffmpeg -i input.m4a -vn -ac 1 -ar 16000 -c:a libmp3lame -b:a 96k -f segment -segment_time 900 -reset_timestamps 1 chunks/part-%03d.mp3

The first command gives you a conservative MP3 for routes that dislike M4A or higher-bitrate uploads. The second keeps each chunk at 15 minutes, which is a safer default for providers that enforce duration or decoded-size limits behind the scenes.
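After splitting, each chunk is transcribed separately, so it helps to know where every chunk starts in the original recording. The sketch below pairs chunk files with approximate start offsets, assuming the zero-padded part-%03d.mp3 naming and the 900-second segment length from the command above. The helper name chunk_offsets is hypothetical, and the offsets are approximate because ffmpeg cuts at packet boundaries.

```python
from typing import Iterable, List, Tuple

# Matches -segment_time 900 in the ffmpeg command.
SEGMENT_SECONDS = 900


def chunk_offsets(
    chunk_names: Iterable[str], segment_seconds: int = SEGMENT_SECONDS
) -> List[Tuple[str, int]]:
    """Pair each chunk with its approximate start offset in seconds.

    Zero-padded names such as part-000.mp3 sort correctly as plain strings.
    """
    ordered = sorted(chunk_names)
    return [(name, index * segment_seconds) for index, name in enumerate(ordered)]
```

Adding the offset to any per-chunk segment timestamps gives positions relative to the full recording.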

Receipts and usage

  • Receipts are enabled by default, same as chat completions.
  • JSON responses include tresor.receipt_id when receipt storage is enabled.
  • Usage rows include modality, billing_unit, and duration or token fields so you can reconcile request-priced, duration-priced, and token-priced transcription routes via GET /v1/usage.
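As a starting point for reconciliation, usage rows can be grouped by billing_unit. This is a sketch under assumptions: the rows are taken to be dictionaries already extracted from the GET /v1/usage response, whose envelope is not described here, and the billing_unit values in the example are placeholders rather than documented values.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Mapping


def count_by_billing_unit(rows: Iterable[Mapping[str, Any]]) -> Dict[str, int]:
    """Count usage rows per billing_unit; rows without the field count as "unknown"."""
    return dict(Counter(row.get("billing_unit", "unknown") for row in rows))


# Placeholder rows; real values come from GET /v1/usage.
example_rows = [
    {"modality": "audio", "billing_unit": "minute"},
    {"modality": "audio", "billing_unit": "request"},
    {"modality": "audio", "billing_unit": "minute"},
]
```

Calling count_by_billing_unit(example_rows) separates the rows into per-unit buckets, which can then be totaled using whichever duration or token field applies to each unit.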

See also