Use POST /v1/audio/transcriptions when you want OpenAI-compatible speech-to-text through Tresor.
Current transcription routes include whisper-large-v3 (billed per audio minute), whisper-large-v3-turbo (billed per request), and voxtral-small-24b (billed on transcription token usage). You can pin any route directly or let the router resolve a bare model key via auto/auto/....
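The route strings on this page all appear as three slash-separated segments, with `auto/auto/<model>` handing resolution to the router. The following is a minimal sketch under that assumption: it splits a route and looks up the billing basis listed above. The segment labels `region` and `provider` and the helper names are illustrative, not confirmed Tresor terminology.

```python
from typing import NamedTuple

# Billing basis per model key, as listed in the paragraph above.
BILLING_BASIS = {
    "whisper-large-v3": "audio minute",
    "whisper-large-v3-turbo": "request",
    "voxtral-small-24b": "transcription tokens",
}


class Route(NamedTuple):
    region: str    # illustrative label for the first segment (assumption)
    provider: str  # illustrative label for the second segment (assumption)
    model: str     # model key, e.g. "whisper-large-v3"


def parse_route(route: str) -> Route:
    """Split 'a/b/model' into its three segments."""
    parts = route.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected three non-empty segments, got {route!r}")
    return Route(*parts)


def is_auto(route: Route) -> bool:
    """True when both leading segments are 'auto' (router-resolved)."""
    return route.region == "auto" and route.provider == "auto"


pinned = parse_route("eu/privatemode/whisper-large-v3")
resolved = parse_route("auto/auto/whisper-large-v3-turbo")
print(pinned.model, BILLING_BASIS[pinned.model], is_auto(pinned))
print(resolved.model, BILLING_BASIS[resolved.model], is_auto(resolved))
```

Knowing the billing basis before you send a request helps you decide whether splitting a long file changes the cost, since per-request and per-minute routes behave differently under chunking.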
Unlike chat completions, transcriptions do not take a client-provided failover list. Automatic route switching is router-managed when you use automatic resolution, and the response reports that via tresor.requested_route, tresor.routed_model, and tresor.failover. See Routing failover.
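To see which route actually served a request, you can read the three fields named above from the parsed JSON body. This sketch assumes the dotted names map to keys of a nested `tresor` object. The sample payload is hypothetical, and the shape of the `failover` value is not specified here, so it is passed through untouched.

```python
from typing import Any


def routing_summary(body: dict[str, Any]) -> dict[str, Any]:
    """Extract routing metadata from a parsed JSON transcription response.

    Assumes the fields live under a nested "tresor" object; missing
    fields come back as None rather than raising.
    """
    meta = body.get("tresor") or {}
    return {
        "requested_route": meta.get("requested_route"),
        "routed_model": meta.get("routed_model"),
        "failover": meta.get("failover"),
    }


# Hypothetical response body, for illustration only.
sample = {
    "text": "hello",
    "tresor": {
        "requested_route": "auto/auto/whisper-large-v3",
        "routed_model": "eu/privatemode/whisper-large-v3",
    },
}
print(routing_summary(sample))
```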
Audio format compatibility is not perfectly uniform across providers. The router accepts a broader set of audio MIME types, but upstream models can still reject specific containers or codecs. For the highest portability, prefer WAV or MP3 when testing a new route.
Known route-specific caveats:
- global/tinfoil/whisper-large-v3-turbo: current validation accepted WAV while rejecting an M4A upload upstream.
- global/tinfoil/voxtral-small-24b: current provider docs advertise MP3 or WAV and up to 30 minutes for transcription, but provider-side size checks can still reject compressed files well below the router upload cap.
- eu/privatemode/whisper-large-v3: current provider docs advertise up to 50 MB per request.

For the full request and response schema, see the audio transcriptions reference.
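The caveats above can be turned into a cheap client-side check that runs before you upload. This is a sketch only: `preflight` is a hypothetical helper, the limits table holds just the one size cap quoted above, and reading "50 MB" as 50,000,000 bytes is an assumption chosen because it is the more conservative interpretation.

```python
from pathlib import PurePath

# Formats recommended above for the highest portability.
PORTABLE_SUFFIXES = {".wav", ".mp3"}

# Advertised per-request size caps from the caveats above. 50 MB is read
# as 50 * 1000 * 1000 bytes (assumption; the provider may count in MiB).
ADVERTISED_MAX_BYTES = {
    "eu/privatemode/whisper-large-v3": 50 * 1000 * 1000,
}


def preflight(filename: str, size_bytes: int, route: str) -> list[str]:
    """Return warnings for an upload; an empty list means no known issue."""
    warnings = []
    suffix = PurePath(filename).suffix.lower()
    if suffix not in PORTABLE_SUFFIXES:
        warnings.append(f"{suffix or 'no extension'}: consider converting to WAV or MP3")
    limit = ADVERTISED_MAX_BYTES.get(route)
    if limit is not None and size_bytes > limit:
        warnings.append(f"{size_bytes} bytes exceeds advertised cap of {limit}")
    return warnings


print(preflight("meeting.m4a", 60_000_000, "eu/privatemode/whisper-large-v3"))
```

A clean result does not guarantee acceptance, since providers can apply checks that their docs do not advertise.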
curl https://api.tresor.co/v1/audio/transcriptions \
  -H "Authorization: Bearer $TRESOR_API_KEY" \
  -F "model=auto/auto/whisper-large-v3" \
  -F "file=@./meeting.mp3;type=audio/mpeg" \
  -F "response_format=json"
from openai import OpenAI

# Point the OpenAI SDK at the Tresor endpoint.
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.tresor.co/v1",
)

# Open the audio in binary mode and let the router resolve the route.
with open("meeting.webm", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="auto/auto/whisper-large-v3",
        file=audio,
        response_format="json",
    )

print(transcript.text)
import fs from "node:fs";
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.TRESOR_API_KEY,
  baseURL: "https://api.tresor.co/v1",
});

const transcript = await client.audio.transcriptions.create({
  model: "auto/auto/whisper-large-v3",
  file: fs.createReadStream("./meeting.mp3"),
  response_format: "json",
});

console.log(transcript.text);
| response_format | What you get |
|---|---|
| json | Normalized text, optional language, optional duration, and Tresor metadata. |
| text | Plain transcript body. Tresor metadata moves to X-Tresor-Routed-Model and X-Tresor-Receipt-Id headers. |
| verbose_json | Normalized transcript plus segments[]. Currently requires language. |
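Because `verbose_json` currently requires `language`, it can help to validate the combination before sending. The sketch below assumes the three formats in the table are the ones you intend to use (others may exist) and that the parameter is named `language`, as in the OpenAI-compatible API. `transcription_params` is a hypothetical helper.

```python
from typing import Optional

# The formats listed in the table above.
VALID_FORMATS = {"json", "text", "verbose_json"}


def transcription_params(
    model: str,
    response_format: str = "json",
    language: Optional[str] = None,
) -> dict:
    """Build keyword arguments for a transcription call, failing early
    on combinations the table above says will not work."""
    if response_format not in VALID_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if response_format == "verbose_json" and not language:
        raise ValueError("verbose_json currently requires language")
    params = {"model": model, "response_format": response_format}
    if language:
        params["language"] = language
    return params


print(transcription_params("auto/auto/whisper-large-v3", "verbose_json", language="en"))
```

With the OpenAI Python SDK you could then call `client.audio.transcriptions.create(file=audio, **params)`.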
If a route rejects your upload, Tresor surfaces route-specific file errors as 400 responses such as unsupported_file_type, invalid_file, or file_too_large. 502 upstream_error is reserved for actual upstream failures that changing the file would not clearly fix.
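The split between file errors and upstream failures suggests a simple triage step in client code. This sketch takes the HTTP status and the error code as plain arguments, because where the code sits in the error body is not specified here. The returned labels are hypothetical.

```python
# Error codes named above that point at the uploaded file.
FILE_ERROR_CODES = {"unsupported_file_type", "invalid_file", "file_too_large"}


def triage(status: int, code: str) -> str:
    """Classify a failed transcription request.

    Returns one of three illustrative labels:
      "fix_file" - re-encode or split the audio, then resend
      "upstream" - the file is probably fine; retry or try another route
      "other"    - anything this sketch does not recognize
    """
    if status == 400 and code in FILE_ERROR_CODES:
        return "fix_file"
    if status == 502 and code == "upstream_error":
        return "upstream"
    return "other"


print(triage(400, "file_too_large"))
print(triage(502, "upstream_error"))
```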
Before assuming a transcription route is unavailable, make one cheap local preprocessing pass:
ffmpeg -i input.m4a -vn -ac 1 -ar 16000 -c:a libmp3lame -b:a 96k output.mp3
mkdir -p chunks
ffmpeg -i input.m4a -vn -ac 1 -ar 16000 -c:a libmp3lame -b:a 96k -f segment -segment_time 900 -reset_timestamps 1 chunks/part-%03d.mp3
The first ffmpeg command gives you a conservative mono, 16 kHz, 96 kbit/s MP3 for routes that dislike M4A or higher-bitrate uploads. The second splits the same audio into chunks of about 15 minutes (-segment_time 900), which is a safer default for providers that enforce duration or decoded-size limits behind the scenes.
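If you script this step, it helps to know roughly how many chunks to expect and how large each one will be. The sketch below rebuilds the segmenting command as an argument list and does the arithmetic. The helper names are hypothetical, and the size figure assumes constant-bitrate output and ignores container overhead.

```python
import math


def segment_command(src: str, out_dir: str = "chunks", segment_seconds: int = 900) -> list[str]:
    """Argument list equivalent to the segmenting ffmpeg command above."""
    return [
        "ffmpeg", "-i", src, "-vn", "-ac", "1", "-ar", "16000",
        "-c:a", "libmp3lame", "-b:a", "96k",
        "-f", "segment", "-segment_time", str(segment_seconds),
        "-reset_timestamps", "1", f"{out_dir}/part-%03d.mp3",
    ]


def expected_chunks(duration_seconds: float, segment_seconds: int = 900) -> int:
    """Estimated number of output files for a given input duration."""
    return max(1, math.ceil(duration_seconds / segment_seconds))


def approx_chunk_bytes(bitrate_bps: int = 96_000, segment_seconds: int = 900) -> int:
    """Approximate size of one full chunk: bits per second * seconds / 8."""
    return bitrate_bps * segment_seconds // 8


print(expected_chunks(3600))   # a one-hour recording
print(approx_chunk_bytes())    # one full 15-minute chunk at 96 kbit/s
print(" ".join(segment_command("input.m4a")))
```

To run the command from Python, create the output directory first and pass the list to `subprocess.run(..., check=True)`.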
- tresor.receipt_id when receipt storage is enabled.
- modality, billing_unit, and duration or token fields so you can reconcile request-priced, duration-priced, and token-priced transcription routes via GET /v1/usage.
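As a starting point for reconciliation, you could group usage rows by modality and billing unit. The shape of the GET /v1/usage response is not described here, so this sketch assumes you already have a list of dicts carrying the `modality` and `billing_unit` fields named above. The sample rows and their values are hypothetical.

```python
from collections import Counter
from typing import Any, Iterable


def count_by_billing_unit(records: Iterable[dict[str, Any]]) -> Counter:
    """Count usage rows per (modality, billing_unit) pair."""
    return Counter((r.get("modality"), r.get("billing_unit")) for r in records)


# Hypothetical rows; real field values may differ.
rows = [
    {"modality": "audio", "billing_unit": "minute"},
    {"modality": "audio", "billing_unit": "request"},
    {"modality": "audio", "billing_unit": "minute"},
]
print(count_by_billing_unit(rows))
```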