Speech
The Speech API gives you access to TTS and Whisper models for text-to-speech (TTS) and speech-to-text (STT), allowing the following:
- Creating audio from text (TTS)
- Transcribing audio to text (STT)
- Translating non-English audio to English text
- Converting non-English audio to English audio
Config
TTS and STT configs are provided in configs/prompts/speech/*.
For TTS you can define:
- The model and fallback models used by chat, e.g. tts-1.
- The output voice. One of alloy, echo, fable, onyx, nova, or shimmer.
- The output speed, from 0.25 to 4.0.
- The response format. One of mp3, opus, aac, flac, wav, or pcm.
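As an illustration of the TTS constraints above, here is a minimal Python sketch that checks a set of TTS settings before use. The class name TTSConfig, its field names, and its defaults are hypothetical and are not taken from the actual config files. Only the allowed voices, the speed range, and the response formats come from the list above.

```python
from dataclasses import dataclass, field

# Allowed values as listed in this document.
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MIN_SPEED, MAX_SPEED = 0.25, 4.0


@dataclass
class TTSConfig:
    """Hypothetical in-memory form of a TTS config (field names and defaults are assumptions)."""

    model: str = "tts-1"
    fallback_models: list[str] = field(default_factory=list)
    voice: str = "alloy"
    speed: float = 1.0
    response_format: str = "mp3"

    def validate(self) -> list[str]:
        """Return a list of human-readable problems; an empty list means valid."""
        problems = []
        if self.voice not in VOICES:
            problems.append(f"voice must be one of {sorted(VOICES)}, got {self.voice!r}")
        if not MIN_SPEED <= self.speed <= MAX_SPEED:
            problems.append(f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {self.speed}")
        if self.response_format not in FORMATS:
            problems.append(
                f"response_format must be one of {sorted(FORMATS)}, got {self.response_format!r}"
            )
        return problems
```

With this sketch, TTSConfig(voice="nova", speed=1.25).validate() returns an empty list, while an out-of-range speed such as 5.0 produces one problem entry.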
For STT you can define:
- The model. Currently only whisper-1 is available.
- A default prompt describing how to transcribe the provided audio.
- The timestamp granularity. Either segment or word.
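The STT options can be checked the same way. The sketch below is an assumption-laden illustration: the class name STTConfig, its field names, and the default granularity are hypothetical. Only the model name whisper-1 and the two granularity values come from the list above.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values as listed in this document.
STT_MODELS = {"whisper-1"}
GRANULARITIES = {"segment", "word"}


@dataclass
class STTConfig:
    """Hypothetical in-memory form of an STT config (field names and defaults are assumptions)."""

    model: str = "whisper-1"
    prompt: Optional[str] = None  # default transcription prompt, if any
    timestamp_granularity: str = "segment"  # assumed default, not confirmed by the source

    def validate(self) -> list[str]:
        """Return a list of human-readable problems; an empty list means valid."""
        problems = []
        if self.model not in STT_MODELS:
            problems.append(f"model must be one of {sorted(STT_MODELS)}, got {self.model!r}")
        if self.timestamp_granularity not in GRANULARITIES:
            problems.append(
                "timestamp_granularity must be one of "
                f"{sorted(GRANULARITIES)}, got {self.timestamp_granularity!r}"
            )
        return problems
```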
Audio responses
For endpoints that produce audio output, the content-type of the response corresponds to the response format provided in the config (e.g. audio/mpeg for mp3).
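A client can use this correspondence to predict or verify the content-type it receives. The Python sketch below shows one possible lookup. Only the mp3 to audio/mpeg pairing is confirmed by this document; the other entries are commonly used media types assumed here for illustration, and the values the server actually returns may differ. The function name content_type_for is hypothetical.

```python
# Only "mp3" -> "audio/mpeg" is stated in this document.
# The remaining entries are assumed, commonly used media types.
CONTENT_TYPES = {
    "mp3": "audio/mpeg",
    "opus": "audio/opus",
    "aac": "audio/aac",
    "flac": "audio/flac",
    "wav": "audio/wav",
    "pcm": "audio/pcm",
}


def content_type_for(response_format: str) -> str:
    """Return the expected content-type for a configured response format."""
    try:
        return CONTENT_TYPES[response_format]
    except KeyError:
        raise ValueError(f"unknown response format: {response_format!r}") from None
```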