Audio Transcriptions

OpenAI-compatible audio transcription with modality-aware usage, receipts, and normalized responses.

`POST /v1/audio/transcriptions`

Upload an audio file and receive a normalized transcript. The endpoint closely follows the OpenAI audio transcription shape. JSON responses also include a `tresor` object with receipt, routing, and usage metadata.

Headers

Header	Notes
`Authorization`	`Bearer tr-…` API key. Required.
`X-Tresor-Receipt`	Set to `false` to opt out of signed receipts. Default `true`.

Request body

Send multipart/form-data.

Parameter	Type	Required	Notes
`file`	file	Yes	Audio upload.
`model`	string	Yes	Bare model key (for example `whisper-large-v3`, `whisper-large-v3-turbo`, or `voxtral-small-24b`) or a compound route such as `eu/privatemode/whisper-large-v3`.
`region`	string	No	Optional routing hint when `model` is a bare key.
`provider`	string	No	Optional routing hint when `model` is a bare key.
`response_format`	string	No	`json` (default), `text`, or `verbose_json`.
`language`	string	No	ISO-639-1 hint such as `en`. Required for `verbose_json`.
`prompt`	string	No	Optional upstream prompt hint.
`temperature`	number	No	Optional provider hint.
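The field rules in the table can be checked client-side before uploading. The following Python sketch builds the non-file form fields. The function name `build_form_fields`, the rule that a compound route is any `model` value containing `/`, and the choice to reject `region`/`provider` alongside a compound route are assumptions for illustration, not documented router behavior.

```python
from typing import Optional

RESPONSE_FORMATS = {"json", "text", "verbose_json"}


def build_form_fields(
    model: str,
    response_format: str = "json",
    language: Optional[str] = None,
    region: Optional[str] = None,
    provider: Optional[str] = None,
    prompt: Optional[str] = None,
    temperature: Optional[float] = None,
) -> dict:
    """Return the non-file multipart fields for /v1/audio/transcriptions."""
    if not model:
        raise ValueError("model is required")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    if response_format == "verbose_json" and not language:
        # The router currently requires a language hint for verbose_json.
        raise ValueError("language is required for verbose_json")
    # Assumption: a compound route contains "/", a bare model key does not.
    is_bare_key = "/" not in model
    if (region or provider) and not is_bare_key:
        # region/provider are documented as hints for bare keys only;
        # rejecting them here is a conservative client-side choice.
        raise ValueError("region/provider hints only apply to bare model keys")
    fields = {"model": model, "response_format": response_format}
    optional = {
        "language": language,
        "region": region,
        "provider": provider,
        "prompt": prompt,
        "temperature": temperature,
    }
    # Only send optional fields the caller actually set.
    for name, value in optional.items():
        if value is not None:
            fields[name] = str(value)
    return fields
```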

Routing behavior

Audio transcriptions do not accept a client-provided failover array.

Use an explicit compound route such as `eu/privatemode/whisper-large-v3` when you need a fixed provider and region.
Use a bare model key or `auto/auto/...` when you want the router to choose an eligible route automatically.
`tresor.requested_route` reports the normalized route you asked for.
`tresor.routed_model` reports the route that actually served the request.
`tresor.failover` is `true` when the router had to switch away from the initially preferred route during automatic routing.
With automatic routing, `tresor.requested_route` and `tresor.routed_model` can differ even when `tresor.failover` is `false`, because resolving `auto` segments to a concrete route is not a failover.

For the distinction between automatic route switching and client retries, see the Routing failover page and the Retries and transient errors page.
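A client can read the three routing fields together to tell a plain `auto` resolution apart from a failover. This Python sketch takes the decoded `tresor` object. The function `classify_routing` and its `"exact"`, `"resolved"`, and `"failover"` labels are hypothetical.

```python
def classify_routing(tresor: dict) -> str:
    """Summarize how the router handled a request, from the `tresor` object."""
    if tresor.get("failover"):
        # The router switched away from its initially preferred route.
        return "failover"
    if tresor["requested_route"] != tresor["routed_model"]:
        # Automatic routing resolved the request to a concrete route
        # without switching away from the preferred one.
        return "resolved"
    # The request was served by exactly the route that was asked for.
    return "exact"
```

Applied to the example response on this page, it returns `"resolved"`: the `auto/auto/whisper-large-v3` request was served by `eu/privatemode/whisper-large-v3` without failover.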

Accepted content types

audio/flac
audio/m4a
audio/mp4
audio/mpeg
audio/ogg
audio/wav
audio/webm
audio/x-m4a

These MIME types are accepted by the Tresor router. Actual compatibility is still provider- and model-specific, so an upstream route can reject a file that passed router validation.

For the broadest compatibility across transcription routes, prefer `audio/wav` or `audio/mpeg`.
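Choosing the `type=` for an upload can follow the accepted list above. The sketch below maps a file extension to one of those MIME types and reports whether it is one of the two broadly compatible types. The extension table and the `content_type_for` helper are assumptions. Passing this check only means the router accepts the type. The selected route can still reject the file.

```python
import os

# MIME types the Tresor router accepts, per the list above.
ACCEPTED_TYPES = {
    "audio/flac", "audio/m4a", "audio/mp4", "audio/mpeg",
    "audio/ogg", "audio/wav", "audio/webm", "audio/x-m4a",
}
# Types recommended for the broadest compatibility across routes.
BROADEST_TYPES = {"audio/wav", "audio/mpeg"}

# Hypothetical extension mapping; adjust for your own files.
EXTENSION_TYPES = {
    ".flac": "audio/flac",
    ".m4a": "audio/m4a",
    ".mp4": "audio/mp4",
    ".mp3": "audio/mpeg",
    ".ogg": "audio/ogg",
    ".wav": "audio/wav",
    ".webm": "audio/webm",
}


def content_type_for(filename: str) -> tuple[str, bool]:
    """Return (mime_type, is_broadly_compatible) for an upload."""
    ext = os.path.splitext(filename)[1].lower()
    mime = EXTENSION_TYPES.get(ext)
    if mime is None or mime not in ACCEPTED_TYPES:
        raise ValueError(f"no accepted audio MIME type for {filename!r}")
    return mime, mime in BROADEST_TYPES
```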

Known route-specific notes:

global/tinfoil/whisper-large-v3-turbo: prefer WAV or MP3; current validation accepted WAV while rejecting M4A upstream.
global/tinfoil/voxtral-small-24b: provider docs advertise MP3 or WAV and up to 30 minutes for transcription, but provider-side size checks can still reject compressed uploads well below the router cap.
eu/privatemode/whisper-large-v3: current provider docs advertise up to 50 MB per request.

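Default router upload limit: 25 MiB (26,214,400 bytes) per request. Individual routes can enforce stricter limits upstream, which surface as `file_too_large`.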
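A local size check against the 25 MiB router cap avoids uploading files that the router would refuse anyway. This Python sketch makes two simplifying assumptions: a file of exactly 25 MiB is allowed, and multipart overhead is ignored. The helper name `check_upload_size` is hypothetical. A file that passes can still be rejected upstream with `file_too_large`.

```python
import os

# Default router cap: 25 MiB = 25 * 1024 * 1024 = 26,214,400 bytes.
ROUTER_LIMIT_BYTES = 25 * 1024 * 1024


def check_upload_size(path: str, limit_bytes: int = ROUTER_LIMIT_BYTES) -> int:
    """Return the file size in bytes, or raise if it exceeds the limit.

    Only the audio file is measured; multipart overhead is not counted.
    """
    size = os.path.getsize(path)
    if size > limit_bytes:
        raise ValueError(
            f"{path} is {size} bytes; the router cap is {limit_bytes} bytes"
        )
    return size
```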

Example request

```bash
curl https://api.tresor.co/v1/audio/transcriptions \
  -H "Authorization: Bearer $TRESOR_API_KEY" \
  -F "model=auto/auto/whisper-large-v3" \
  -F "file=@./meeting.mp3;type=audio/mpeg" \
  -F "response_format=json"
```
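The same request can be built with only the Python standard library. This minimal sketch assumes that hand-rolled multipart encoding is acceptable for your use case. A dedicated HTTP client library would normally handle the encoding for you. `encode_multipart`, `build_request`, and the `receipts` flag are hypothetical helpers. The filename is inserted into the `Content-Disposition` header without escaping.

```python
import urllib.request
import uuid

API_URL = "https://api.tresor.co/v1/audio/transcriptions"


def encode_multipart(fields: dict, filename: str, data: bytes, mime: str):
    """Encode string form fields plus one `file` part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\nContent-Type: {mime}\r\n\r\n'.encode()
    )
    parts.append(data)
    # Closing boundary ends the multipart body.
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(api_key: str, fields: dict, filename: str, data: bytes,
                  mime: str, receipts: bool = True) -> urllib.request.Request:
    """Build (but do not send) a POST to the transcription endpoint."""
    body, content_type = encode_multipart(fields, filename, data, mime)
    headers = {"Authorization": f"Bearer {api_key}", "Content-Type": content_type}
    if not receipts:
        # Opt out of signed receipts; the default is to issue them.
        headers["X-Tresor-Receipt"] = "false"
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")
```

To send it, pass the request to `urllib.request.urlopen` and decode the JSON body. Non-2xx responses raise `urllib.error.HTTPError`, and the body of that error carries the error envelope.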

Response

`json`

```json
{
  "text": "Hello and welcome to the meeting.",
  "language": "en",
  "duration": 12.5,
  "tresor": {
    "receipt_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
    "requested_route": "auto/auto/whisper-large-v3",
    "routed_model": "eu/privatemode/whisper-large-v3",
    "failover": false,
    "usage": {
      "billing_unit": "audio_minute",
      "audio_seconds": 12.5
    }
  }
}
```
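A typed view of this response keeps the optional fields explicit. The Python sketch below parses a `json` or `verbose_json` body and ignores `segments`. The `Transcript` dataclass is hypothetical, and it treats `requested_route`, `routed_model`, and `failover` as always present.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcript:
    text: str
    language: Optional[str]
    duration: Optional[float]
    receipt_id: Optional[str]  # omitted when receipts are disabled
    requested_route: str
    routed_model: str
    failover: bool
    usage: dict


def parse_transcript(body: str) -> Transcript:
    """Parse a `json` or `verbose_json` transcription response body."""
    payload = json.loads(body)
    tresor = payload["tresor"]
    return Transcript(
        text=payload["text"],
        language=payload.get("language"),
        duration=payload.get("duration"),
        receipt_id=tresor.get("receipt_id"),
        requested_route=tresor["requested_route"],
        routed_model=tresor["routed_model"],
        failover=tresor["failover"],
        usage=tresor.get("usage", {}),
    )
```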

Field	Type	Description
`text`	string	Normalized transcript text.
`language`	string	Language hint, when available.
`duration`	number	Duration in seconds, when available.
`segments`	array	Present on `verbose_json` responses.
`tresor.receipt_id`	string	Receipt identifier, omitted when receipts are disabled.
`tresor.requested_route`	string	Normalized route the caller asked the router to use.
`tresor.routed_model`	string	The actual route used by the router.
`tresor.failover`	boolean	Whether routing failover switched the request away from the initially preferred route.
`tresor.usage.billing_unit`	string	Billing unit for this route, such as `audio_minute`, `audio_second`, `request`, or `token`.
`tresor.usage.audio_seconds`	number	Audio duration in seconds, when available.
`tresor.usage.prompt_tokens`	integer	Present when the route settles transcription on token usage.
`tresor.usage.completion_tokens`	integer	Present when the route settles transcription on token usage.

Minute-priced transcription routes return `billing_unit` = `audio_minute` while still reporting clip duration in `audio_seconds`. Request-priced transcription routes return `billing_unit` = `request`. Token-priced transcription routes return `billing_unit` = `token` and may include `prompt_tokens` and `completion_tokens` in `tresor.usage`.
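The usage block can be converted into a quantity in its billing unit, for example to show callers what a clip consumed. This Python sketch returns unrounded values, because rounding, minimums, and prices are not documented on this page. The `billed_quantity` name is hypothetical. Two mappings are assumptions: `audio_second` reads `audio_seconds`, and `request` counts as a quantity of 1. Summing prompt and completion tokens ignores any difference in how the two are priced.

```python
from typing import Optional, Tuple


def billed_quantity(usage: dict) -> Tuple[str, Optional[float]]:
    """Return (billing_unit, unrounded quantity) from `tresor.usage`.

    The quantity is None when the fields it depends on are absent.
    This is a display estimate, not a reconstruction of the invoice.
    """
    unit = usage["billing_unit"]
    seconds = usage.get("audio_seconds")
    if unit == "audio_minute":
        # Duration is still reported in seconds on minute-priced routes.
        return unit, None if seconds is None else seconds / 60
    if unit == "audio_second":
        return unit, seconds
    if unit == "request":
        return unit, 1.0
    if unit == "token":
        prompt = usage.get("prompt_tokens")
        completion = usage.get("completion_tokens")
        if prompt is None or completion is None:
            return unit, None
        return unit, float(prompt + completion)
    raise ValueError(f"unknown billing unit: {unit}")
```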

`text`

When `response_format=text`, the response body is plain text and Tresor metadata moves to response headers:

`X-Tresor-Routed-Model`
`X-Tresor-Receipt-Id`, when receipts are enabled

If you need `requested_route` for a text response, keep the original request or fetch the signed receipt; there is no `X-Tresor-Requested-Route` header.
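HTTP header names are case-insensitive, so a client should normalize them before looking up the Tresor headers on a `text` response. The helper `text_response_metadata` below is a hypothetical Python sketch. It returns `None` for any absent header, such as the receipt ID when receipts are disabled.

```python
from typing import Mapping


def text_response_metadata(headers: Mapping[str, str]) -> dict:
    """Extract Tresor metadata from the headers of a `text` response."""
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "routed_model": lowered.get("x-tresor-routed-model"),
        # Absent when receipts were disabled with X-Tresor-Receipt: false.
        "receipt_id": lowered.get("x-tresor-receipt-id"),
    }
```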

`verbose_json`

Adds normalized `segments[]` timing information. `language` is currently required for this format; omitting it returns `missing_field`.
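This page does not spell out the exact shape of the normalized segments. The Python sketch below assumes OpenAI-style segments, each with `start` and `end` in seconds and a `text` field, and renders them as timestamped lines. The `format_segments` helper is hypothetical. Check the field names against a real `verbose_json` response before relying on it.

```python
def format_segments(segments: list) -> list:
    """Render segments as "[mm:ss.ss - mm:ss.ss] text" lines.

    Assumes each segment has `start`, `end` (seconds) and `text` keys.
    """
    def stamp(seconds: float) -> str:
        minutes, secs = divmod(seconds, 60)
        return f"{int(minutes):02d}:{secs:05.2f}"

    return [
        f"[{stamp(s['start'])} - {stamp(s['end'])}] {s['text'].strip()}"
        for s in segments
    ]
```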

Errors

Status	`code`	Meaning
`400`	`invalid_content_type`	Request was not `multipart/form-data`.
`400`	`missing_field`	Required field such as `file`, `model`, or `language` was missing.
`400`	`invalid_model`	Model is unknown or not a transcription model.
`400`	`unsupported_file_type`	Router validation or the selected route rejected the audio format.
`400`	`invalid_file`	The selected route rejected the uploaded file after router validation.
`400`	`file_too_large`	The selected route enforced a stricter size limit than the router.
`400`	`invalid_response_format`	Unsupported `response_format`.
`401`	`unauthorized`	Missing or invalid API key.
`402`	`insufficient_balance`	Balance is too low to cover the cost reservation for this request.
`502`	`upstream_error`	Upstream provider failed for a non-file-specific reason.
`503`	`route_not_priced`	The route exists but pricing is not available.

Error bodies follow the standard error envelope.
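When a request fails, the `code` in the error envelope usually determines what to do next. The Python sketch below maps each documented code to a hypothetical action label. Several of these mappings are assumptions: `upstream_error` and unknown 5xx responses are treated as retryable, and `route_not_priced` is treated as a cue to pick another route. Check them against the Retries and transient errors page. The re-encode action follows the advice under Accepted content types to prefer WAV or MP3.

```python
from typing import Optional

# Hypothetical action labels keyed by the documented error codes.
ACTIONS = {
    "invalid_content_type": "fix_request",
    "missing_field": "fix_request",
    "invalid_model": "fix_request",
    "invalid_response_format": "fix_request",
    "unsupported_file_type": "reencode_audio",
    "invalid_file": "reencode_audio",
    "file_too_large": "shrink_or_reroute",
    "unauthorized": "check_api_key",
    "insufficient_balance": "top_up_balance",
    "upstream_error": "retry_with_backoff",
    "route_not_priced": "choose_another_route",
}


def next_action(status: int, code: Optional[str]) -> str:
    """Suggest a client action for a failed transcription request."""
    if code in ACTIONS:
        return ACTIONS[code]
    # Unknown codes: treat 5xx as possibly transient, everything else as final.
    return "retry_with_backoff" if status >= 500 else "fix_request"
```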