Speech
The Speech API gives you access to TTS and Whisper models for text-to-speech (TTS) and speech-to-text (STT). It supports the following:
- Creating audio from text (TTS)
- Transcribing audio to text (STT)
- Translating non-English audio to English text
- Converting non-English audio to English audio
Config
TTS and STT configs are provided in configs/prompts/speech/*.
For TTS you can define:
- The model and fallback models used by chat, e.g. tts-1.
- The output voice. One of alloy, echo, fable, onyx, nova, or shimmer.
- The output speed, from 0.25 to 4.0.
- The response format: mp3, opus, aac, flac, wav, or pcm.
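As a rough illustration of the options above, the following sketch checks a TTS config before it is used. The dictionary keys (model, voice, speed, response_format), the default speed of 1.0, and the validate_tts_config helper are hypothetical names and values chosen for this example; the actual schema of the files under configs/prompts/speech/ may differ.

```python
# Allowed values taken from the list above.
ALLOWED_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
ALLOWED_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def validate_tts_config(config: dict) -> list[str]:
    """Return a list of problems found in a TTS config (empty if it looks valid).

    The key names used here are assumptions, not the project's real schema.
    """
    errors = []

    if not config.get("model"):
        errors.append("model is required, e.g. 'tts-1'")

    voice = config.get("voice")
    if voice not in ALLOWED_VOICES:
        errors.append(f"unsupported voice: {voice!r}")

    # Assumption: a missing speed falls back to 1.0 (normal speed).
    speed = config.get("speed", 1.0)
    if not isinstance(speed, (int, float)) or not MIN_SPEED <= speed <= MAX_SPEED:
        errors.append(f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {speed!r}")

    fmt = config.get("response_format")
    if fmt not in ALLOWED_FORMATS:
        errors.append(f"unsupported response format: {fmt!r}")

    return errors
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in a config file at once.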
For STT you can define:
- The model. Currently only whisper-1 is available.
- A default prompt describing how to transcribe the provided audio.
- The timestamp granularity. Either segment or word.
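The transcription options can be checked in the same spirit. This is only a sketch: the key names (model, prompt, timestamp_granularity) and the validate_stt_config helper are hypothetical and are not taken from the project's config files.

```python
# Allowed values taken from the list above.
ALLOWED_STT_MODELS = {"whisper-1"}
ALLOWED_GRANULARITIES = {"segment", "word"}


def validate_stt_config(config: dict) -> list[str]:
    """Return a list of problems found in a speech-to-text config.

    The key names used here are assumptions, not the project's real schema.
    """
    errors = []

    model = config.get("model")
    if model not in ALLOWED_STT_MODELS:
        errors.append(f"unsupported model: {model!r}")

    # Assumption: the default prompt is optional and must be text when present.
    prompt = config.get("prompt", "")
    if not isinstance(prompt, str):
        errors.append("prompt must be a string")

    granularity = config.get("timestamp_granularity")
    if granularity not in ALLOWED_GRANULARITIES:
        errors.append(f"timestamp granularity must be 'segment' or 'word', got {granularity!r}")

    return errors
```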
Audio responses
For endpoints that produce audio output, the content-type of the response corresponds to the response format provided in the config (e.g. audio/mpeg).
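One way to picture this mapping is a simple lookup table. Only audio/mpeg is confirmed by the text above; the other media types below are commonly used values assumed for illustration, and content_type_for is a hypothetical helper, so the API's actual headers may differ.

```python
# Response format -> content-type. Only "audio/mpeg" comes from the
# documentation above; the remaining entries are assumptions.
CONTENT_TYPES = {
    "mp3": "audio/mpeg",
    "opus": "audio/opus",
    "aac": "audio/aac",
    "flac": "audio/flac",
    "wav": "audio/wav",
    "pcm": "audio/pcm",
}


def content_type_for(response_format: str) -> str:
    """Look up the content-type expected for a configured response format."""
    try:
        return CONTENT_TYPES[response_format]
    except KeyError:
        raise ValueError(f"unknown response format: {response_format!r}") from None
```

A client could compare this value against the Content-Type header of a response to confirm that the audio it received matches the configured format.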