Audio Transcriptions

OpenAI-compatible audio transcription with modality-aware usage, receipts, and normalized responses.

POST /v1/audio/transcriptions

Upload an audio file and receive a normalized transcript. The endpoint follows the OpenAI audio transcription shape closely, with Tresor receipt metadata added on JSON responses.

Headers

| Header | Notes |
| --- | --- |
| Authorization | Bearer tr-… API key. Required. |
| X-Tresor-Receipt | Set to false to opt out of signed receipts. Default true. |

Request body

Send multipart/form-data.

| Parameter | Type | Required | Notes |
| --- | --- | --- | --- |
| file | file | Yes | Audio upload. |
| model | string | Yes | Bare model key (for example whisper-large-v3, whisper-large-v3-turbo, or voxtral-small-24b) or a compound route such as eu/privatemode/whisper-large-v3. |
| region | string | No | Optional routing hint when model is a bare key. |
| provider | string | No | Optional routing hint when model is a bare key. |
| response_format | string | No | json (default), text, or verbose_json. |
| language | string | No | ISO-639-1 hint such as en. Required for verbose_json. |
| prompt | string | No | Optional upstream prompt hint. |
| temperature | number | No | Optional provider hint. |
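
The parameter rules above can be checked client-side before uploading. The following is a minimal sketch, not part of any Tresor SDK: the helper name build_form_fields is hypothetical, and it only assembles the non-file form fields, leaving the multipart upload itself to whichever HTTP client you use.

```python
from typing import Dict, Optional

# Values listed for response_format in the parameter table.
RESPONSE_FORMATS = {"json", "text", "verbose_json"}


def build_form_fields(
    model: str,
    response_format: str = "json",
    language: Optional[str] = None,
    region: Optional[str] = None,
    provider: Optional[str] = None,
    prompt: Optional[str] = None,
    temperature: Optional[float] = None,
) -> Dict[str, str]:
    """Hypothetical helper: validate and collect the non-file form fields."""
    if not model:
        raise ValueError("model is required")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    # The table marks language as required when verbose_json is requested.
    if response_format == "verbose_json" and not language:
        raise ValueError("language is required for verbose_json")

    fields = {"model": model, "response_format": response_format}
    optional = {
        "language": language,
        "region": region,
        "provider": provider,
        "prompt": prompt,
        "temperature": temperature,
    }
    # Only send optional fields that were actually set.
    for name, value in optional.items():
        if value is not None:
            fields[name] = str(value)
    return fields
```

Failing early on a missing language avoids a round trip that would otherwise end in a 400 missing_field error.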

Routing behavior

Audio transcriptions do not accept a client-provided failover array.

  • Use an explicit compound route such as eu/privatemode/whisper-large-v3 when you need a fixed provider and region.
  • Use a bare model key or auto/auto/... when you want the router to choose an eligible route automatically.
  • tresor.requested_route reports the normalized route you asked for.
  • tresor.routed_model reports the route that actually served the request.
  • tresor.failover is true when the router had to switch away from the initially preferred route during automatic routing.
  • With automatic routing, tresor.requested_route and tresor.routed_model can differ even when tresor.failover is false.

For the distinction between automatic route switching and client retries, see "Routing failover" and "Retries and transient errors".
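
The three routing fields described above can be read together to tell the cases apart. This is an illustrative sketch only: the function name and the returned labels are made up for this example, and the input is assumed to be the tresor object from a JSON response.

```python
from typing import Any, Mapping


def describe_routing(tresor: Mapping[str, Any]) -> str:
    """Hypothetical helper: label how a request was routed.

    Returns one of three illustrative labels:
      "failover"      - the router switched away from its preferred route
      "as_requested"  - the serving route equals the requested route
      "auto_resolved" - the routes differ, but no failover happened
    """
    if tresor.get("failover"):
        return "failover"
    if tresor.get("requested_route") == tresor.get("routed_model"):
        return "as_requested"
    # Automatic routing can resolve to a concrete route without failover.
    return "auto_resolved"
```

For a request to auto/auto/whisper-large-v3 served by eu/privatemode/whisper-large-v3 with failover false, this returns "auto_resolved".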

Accepted content types

  • audio/flac
  • audio/m4a
  • audio/mp4
  • audio/mpeg
  • audio/ogg
  • audio/wav
  • audio/webm
  • audio/x-m4a

These MIME types are accepted by the Tresor router. Actual compatibility is still provider- and model-specific, so an upstream route can reject a file that passed router validation.

For the broadest compatibility across transcription routes, prefer audio/wav or audio/mpeg. In current validation, Tinfoil whisper-large-v3-turbo accepted WAV input while rejecting an M4A upload upstream.

Known route-specific notes:

  • global/tinfoil/whisper-large-v3-turbo: prefer WAV or MP3; current validation accepted WAV while rejecting M4A upstream.
  • global/tinfoil/voxtral-small-24b: provider docs advertise MP3 or WAV and up to 30 minutes for transcription, but provider-side size checks can still reject compressed uploads well below the router cap.
  • eu/privatemode/whisper-large-v3: current provider docs advertise up to 50 MB per request.

Default upload limit: 25 MiB per request.
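
A client can mirror the router-level checks before uploading. The sketch below assumes the 25 MiB limit can be approximated by the file size alone (multipart overhead is ignored), and the helper name preflight_problems is hypothetical. Passing this check does not guarantee that the selected route accepts the file.

```python
from typing import List

# MIME types listed under "Accepted content types".
ACCEPTED_CONTENT_TYPES = frozenset({
    "audio/flac", "audio/m4a", "audio/mp4", "audio/mpeg",
    "audio/ogg", "audio/wav", "audio/webm", "audio/x-m4a",
})

# Default router upload limit: 25 MiB.
ROUTER_UPLOAD_LIMIT_BYTES = 25 * 1024 * 1024


def preflight_problems(content_type: str, size_bytes: int) -> List[str]:
    """Hypothetical helper: return router-level problems (empty if none)."""
    problems = []
    # Drop any parameters such as "; codecs=opus" before comparing.
    base_type = content_type.split(";", 1)[0].strip().lower()
    if base_type not in ACCEPTED_CONTENT_TYPES:
        problems.append(f"content type {base_type!r} is not accepted by the router")
    # Assumption: the file size stands in for the request size.
    if size_bytes > ROUTER_UPLOAD_LIMIT_BYTES:
        problems.append(f"file is {size_bytes} bytes, above the 25 MiB default limit")
    return problems
```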

Example request

curl https://api.tresor.co/v1/audio/transcriptions \
  -H "Authorization: Bearer $TRESOR_API_KEY" \
  -F "model=auto/auto/whisper-large-v3" \
  -F "file=@./meeting.mp3;type=audio/mpeg" \
  -F "response_format=json"

Response

json

{
  "text": "Hello and welcome to the meeting.",
  "language": "en",
  "duration": 12.5,
  "tresor": {
    "receipt_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
    "requested_route": "auto/auto/whisper-large-v3",
    "routed_model": "eu/privatemode/whisper-large-v3",
    "failover": false,
    "usage": {
      "billing_unit": "audio_minute",
      "audio_seconds": 12.5
    }
  }
}

| Field | Type | Description |
| --- | --- | --- |
| text | string | Normalized transcript text. |
| language | string | Language hint, when available. |
| duration | number | Duration in seconds, when available. |
| segments | array | Present on verbose_json responses. |
| tresor.receipt_id | string | Receipt identifier, omitted when receipts are disabled. |
| tresor.requested_route | string | Normalized route the caller asked the router to use. |
| tresor.routed_model | string | The actual route used by the router. |
| tresor.failover | boolean | Whether routing failover switched the request away from the initially preferred route. |
| tresor.usage.billing_unit | string | Billing unit for this route, such as audio_minute, audio_second, request, or token. |
| tresor.usage.audio_seconds | number | Duration metadata when available. |
| tresor.usage.prompt_tokens | integer | Present when the route settles transcription on token usage. |
| tresor.usage.completion_tokens | integer | Present when the route settles transcription on token usage. |

Minute-priced transcription routes return billing_unit = audio_minute while still reporting clip duration in audio_seconds. Request-priced transcription routes return billing_unit = request. Token-priced transcription routes return billing_unit = token and may include prompt and completion token counts in tresor.usage.
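
Because the fields in tresor.usage depend on the billing unit, client code usually branches on billing_unit first. The following sketch shows one way to do that. The function name and the output wording are illustrative, and it deliberately does not compute a price or a billable quantity, since rounding rules are not described here.

```python
from typing import Any, Mapping


def summarize_usage(usage: Mapping[str, Any]) -> str:
    """Hypothetical helper: one-line summary of a tresor.usage object."""
    unit = usage.get("billing_unit")
    if unit in ("audio_minute", "audio_second"):
        seconds = usage.get("audio_seconds")
        if seconds is None:
            return f"billed per {unit}; clip duration not reported"
        return f"billed per {unit}; clip duration {seconds} s"
    if unit == "request":
        return "billed per request"
    if unit == "token":
        # Token counts may be absent even on token-priced routes.
        prompt = usage.get("prompt_tokens")
        completion = usage.get("completion_tokens")
        return f"billed per token; prompt={prompt}, completion={completion}"
    return f"unrecognized billing unit: {unit!r}"
```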

text

When response_format=text, the response body is plain text. Tresor metadata moves to response headers:

  • X-Tresor-Routed-Model
  • X-Tresor-Receipt-Id when receipts are enabled

If you need requested_route for a text response, keep the original request or fetch the signed receipt; there is no X-Tresor-Requested-Route header.
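
For text responses, the metadata can be collected from the response headers. This is a minimal sketch that assumes the headers are available as a plain mapping of names to values. The function name is hypothetical, and header names are matched case-insensitively because HTTP clients differ in how they normalize them.

```python
from typing import Dict, Mapping, Optional


def tresor_metadata_from_headers(headers: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """Hypothetical helper: pull Tresor metadata out of response headers."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "routed_model": lowered.get("x-tresor-routed-model"),
        # Absent when receipts are disabled for the request.
        "receipt_id": lowered.get("x-tresor-receipt-id"),
    }
```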

verbose_json

Adds a normalized segments[] array with timing information. The language parameter is currently required for this format.
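
As an illustration of consuming segments[], the sketch below renders each segment as a timestamped line. The segment field names are an assumption: this page does not list them, so the code assumes the OpenAI-style keys start, end (both in seconds), and text. Check a real verbose_json response before relying on them. The function name is hypothetical.

```python
from typing import Any, List, Mapping, Sequence


def format_segments(segments: Sequence[Mapping[str, Any]]) -> List[str]:
    """Hypothetical helper: render segments as '[start-end] text' lines.

    Assumes each segment has 'start', 'end' (seconds) and 'text' keys,
    which this page does not confirm.
    """
    lines = []
    for segment in segments:
        start = float(segment["start"])
        end = float(segment["end"])
        text = str(segment["text"]).strip()
        lines.append(f"[{start:.2f}-{end:.2f}] {text}")
    return lines
```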

Errors

| Status | Code | Meaning |
| --- | --- | --- |
| 400 | invalid_content_type | Request was not multipart/form-data. |
| 400 | missing_field | Required field such as file, model, or language was missing. |
| 400 | invalid_model | Model is unknown or not a transcription model. |
| 400 | unsupported_file_type | Router validation or the selected route rejected the audio format. |
| 400 | invalid_file | The selected route rejected the uploaded file after router validation. |
| 400 | file_too_large | The selected route enforced a stricter size limit than the router. |
| 400 | invalid_response_format | Unsupported response_format. |
| 401 | unauthorized | Missing or invalid API key. |
| 402 | insufficient_balance | Balance too low for reservation. |
| 502 | upstream_error | Upstream provider failed for a non-file-specific reason. |
| 503 | route_not_priced | The route exists but pricing is not available. |

Error bodies follow the standard error envelope.
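
One possible way to act on these codes is to group them by the kind of follow-up they call for. The grouping and the returned labels below are this example's own suggestion, not documented Tresor behavior. In particular, whether upstream_error is safe to retry is not stated here, so it is only labeled rather than retried. The function takes the code string as already extracted from the error envelope.

```python
# Codes that indicate a problem with the request or the uploaded file.
FIX_REQUEST_CODES = frozenset({
    "invalid_content_type", "missing_field", "invalid_model",
    "unsupported_file_type", "invalid_file", "file_too_large",
    "invalid_response_format",
})


def triage_error(code: str) -> str:
    """Hypothetical helper: map an error code to an illustrative action."""
    if code in FIX_REQUEST_CODES:
        return "fix_request"
    if code == "unauthorized":
        return "check_api_key"
    if code == "insufficient_balance":
        return "top_up_balance"
    if code == "route_not_priced":
        return "choose_another_route"
    if code == "upstream_error":
        return "upstream_failure"
    return "unknown"
```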

See also