Audio Transcriptions

Upload audio to the Tresor API and get back a normalized transcript with receipts and usage metadata.

Use POST /v1/audio/transcriptions when you want OpenAI-compatible speech-to-text through Tresor.

Current transcription routes include whisper-large-v3 (billed per audio minute), whisper-large-v3-turbo (billed per request), and voxtral-small-24b (billed on transcription token usage). You can pin any route directly or let the router resolve a bare model key via auto/auto/....

Unlike chat completions, transcriptions do not take a client-provided failover list. Automatic route switching is router-managed when you use automatic resolution, and the response reports that via tresor.requested_route, tresor.routed_model, and tresor.failover. See Routing failover.
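
As a rough sketch of how a client might inspect those routing fields, the helper below assumes the parsed JSON response carries a top-level `tresor` object with `requested_route`, `routed_model`, and `failover` keys. That shape is an assumption, and the name `describe_routing` is hypothetical; the audio transcriptions reference is the authoritative schema.

```python
from typing import Any, Mapping


def describe_routing(response: Mapping[str, Any]) -> dict:
    """Summarize routing metadata from a parsed transcription response.

    Assumes (not confirmed here) that the metadata lives under a top-level
    "tresor" key. Missing fields come back as None.
    """
    meta = response.get("tresor") or {}
    requested = meta.get("requested_route")
    return {
        "requested_route": requested,
        "routed_model": meta.get("routed_model"),
        # Passed through untouched; its internal structure is not described here.
        "failover": meta.get("failover"),
        # True when the request used router-managed resolution (auto/auto/...).
        "auto_resolved": isinstance(requested, str) and requested.startswith("auto/"),
    }
```

Logging the returned dictionary next to each transcript makes it easier to see afterwards which concrete route served a request that was sent with an `auto/auto/...` model key.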

Audio format compatibility is not perfectly uniform across providers. The router accepts a broader set of audio MIME types, but upstream models can still reject specific containers or codecs. For the highest portability, prefer WAV or MP3 when testing a new route.

Known route-specific caveats:

global/tinfoil/whisper-large-v3-turbo: in current validation, a WAV upload was accepted while an M4A upload was rejected upstream.
global/tinfoil/voxtral-small-24b: current provider docs advertise MP3 or WAV and up to 30 minutes for transcription, but provider-side size checks can still reject compressed files well below the router upload cap.
eu/privatemode/whisper-large-v3: current provider docs advertise up to 50 MB per request.
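
The caveats above can be turned into a cheap client-side check before uploading. The sketch below is only a snapshot of that list, not an authoritative capability table: `ROUTE_CAVEATS` and `preflight_warnings` are hypothetical names, and reading "50 MB" as 50,000,000 bytes is an assumption.

```python
import os

# Snapshot of the caveats listed above; providers can change these at any time.
ROUTE_CAVEATS = {
    "global/tinfoil/whisper-large-v3-turbo": {"risky_ext": {".m4a"}},
    "global/tinfoil/voxtral-small-24b": {
        "preferred_ext": {".mp3", ".wav"},
        "max_minutes": 30,
    },
    # Assumption: "50 MB" is interpreted as decimal megabytes.
    "eu/privatemode/whisper-large-v3": {"max_bytes": 50 * 1000 * 1000},
}


def preflight_warnings(route, filename, size_bytes, duration_minutes=None):
    """Return warnings for a planned upload; an empty list means no known caveat applies."""
    caveats = ROUTE_CAVEATS.get(route, {})
    ext = os.path.splitext(filename)[1].lower()
    warnings = []
    if ext in caveats.get("risky_ext", set()):
        warnings.append(f"{ext} was rejected upstream on this route; try WAV or MP3")
    preferred = caveats.get("preferred_ext")
    if preferred and ext not in preferred:
        warnings.append(f"provider docs advertise {sorted(preferred)}, got {ext}")
    max_bytes = caveats.get("max_bytes")
    if max_bytes is not None and size_bytes > max_bytes:
        warnings.append(f"file is {size_bytes} bytes, above the advertised {max_bytes}")
    max_minutes = caveats.get("max_minutes")
    if max_minutes is not None and duration_minutes is not None:
        if duration_minutes > max_minutes:
            warnings.append(f"audio runs {duration_minutes} min, above {max_minutes} min")
    return warnings
```

A clean result does not guarantee acceptance, since provider-side checks can still reject a file; the check only catches the cases already documented here.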

For the full request and response schema, see the audio transcriptions reference.

Curl

curl https://api.tresor.co/v1/audio/transcriptions \
  -H "Authorization: Bearer $TRESOR_API_KEY" \
  -F "model=auto/auto/whisper-large-v3" \
  -F "file=@./meeting.mp3;type=audio/mpeg" \
  -F "response_format=json"

OpenAI SDK

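# Python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["TRESOR_API_KEY"],
    base_url="https://api.tresor.co/v1",
)

# MP3 matches the curl example and is one of the more portable formats across routes.
with open("meeting.mp3", "rb") as audio: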
    transcript = client.audio.transcriptions.create(
        model="auto/auto/whisper-large-v3",
        file=audio,
        response_format="json",
    )

print(transcript.text)

// Node.js
import fs from "node:fs";
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.TRESOR_API_KEY,
  baseURL: "https://api.tresor.co/v1",
});

const transcript = await client.audio.transcriptions.create({
  model: "auto/auto/whisper-large-v3",
  file: fs.createReadStream("./meeting.mp3"),
  response_format: "json",
});

console.log(transcript.text);

Response formats

`response_format`	What you get
`json`	Normalized `text`, optional `language`, optional `duration`, and Tresor metadata.
`text`	Plain transcript body. Tresor metadata moves to `X-Tresor-Routed-Model` and `X-Tresor-Receipt-Id` headers.
`verbose_json`	Normalized transcript plus `segments[]`. Currently requires `language`.
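
With `response_format=text` the body is only the transcript, so the metadata has to be read from the response headers. A minimal sketch, assuming you already have the headers as a mapping from whatever HTTP client you use; `tresor_headers` is a hypothetical helper name, while the two header names come from the table above.

```python
from typing import Mapping


def tresor_headers(headers: Mapping[str, str]) -> dict:
    """Pull Tresor metadata out of response headers, ignoring header-name case."""
    # HTTP header names are case-insensitive, so normalize before looking them up.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "routed_model": lowered.get("x-tresor-routed-model"),
        "receipt_id": lowered.get("x-tresor-receipt-id"),
    }
```

Either value is `None` when the corresponding header is absent, which lets the caller decide whether a missing receipt should be treated as an error.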

If a route rejects your upload, Tresor surfaces route-specific file errors as 400 responses with codes such as unsupported_file_type, invalid_file, or file_too_large. A 502 upstream_error is reserved for genuine upstream failures that changing the file is unlikely to fix.
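
One way to act on that split is to branch on the status and error code. The sketch below takes both as plain arguments because where the code sits in the error body is not shown here; `suggest_action` and its return labels are hypothetical, and the set of file error codes contains only the three examples named above.

```python
# Codes named above as fixable by changing the file; the real list may be longer.
FILE_ERROR_CODES = {"unsupported_file_type", "invalid_file", "file_too_large"}


def suggest_action(status: int, code: str) -> str:
    """Map an error response to a coarse next step for the caller."""
    if status == 400 and code in FILE_ERROR_CODES:
        # Re-encode, repair, or split the audio, then send it again.
        return "fix_file"
    if status == 502 and code == "upstream_error":
        # The file is probably fine; the provider itself failed.
        return "retry_later"
    # Anything else needs a closer look at the response body.
    return "inspect"
```

For example, `suggest_action(400, "file_too_large")` points at the local splitting step rather than at a retry.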

Preparing audio locally

Before assuming a transcription route is unavailable, make one cheap local preprocessing pass:

re-encode to mono MP3 or WAV
split long recordings locally instead of relying on hidden server-side chunking

ffmpeg -i input.m4a -vn -ac 1 -ar 16000 -c:a libmp3lame -b:a 96k output.mp3

mkdir -p chunks
ffmpeg -i input.m4a -vn -ac 1 -ar 16000 -c:a libmp3lame -b:a 96k -f segment -segment_time 900 -reset_timestamps 1 chunks/part-%03d.mp3

The first command produces a conservative mono, 16 kHz, 96 kbps MP3 for routes that reject M4A or higher-bitrate uploads. The second splits the recording into chunks of roughly 15 minutes each (-segment_time 900), which is a safer default for providers that enforce duration or decoded-size limits behind the scenes.
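
If the splitting step runs from a script rather than a shell, the same ffmpeg invocation can be assembled as an argument list. This is a sketch that reuses the flags from the second command unchanged; `build_segment_command` is a hypothetical helper, and it only builds the command so that nothing runs until you pass it to `subprocess.run`.

```python
import os


def build_segment_command(src, out_dir, segment_seconds=900):
    """Build the ffmpeg argument list that splits src into mono 16 kHz MP3 chunks."""
    return [
        "ffmpeg", "-i", src,
        "-vn",                      # drop any video stream
        "-ac", "1",                 # downmix to mono
        "-ar", "16000",             # resample to 16 kHz
        "-c:a", "libmp3lame",
        "-b:a", "96k",
        "-f", "segment",
        "-segment_time", str(segment_seconds),
        "-reset_timestamps", "1",
        os.path.join(out_dir, "part-%03d.mp3"),
    ]
```

To execute it, create the output directory first with `os.makedirs(out_dir, exist_ok=True)` and then call `subprocess.run(build_segment_command("input.m4a", "chunks"), check=True)`; passing a list avoids shell quoting problems with unusual file names.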

Receipts and usage