Voice Applications
Build agents that can have natural-sounding audio conversations.
Voice actions use the virtual speaker and microphone hardware along with speech-to-text and text-to-speech models to have audio conversations in an active Workstation.
To generate spoken audio, a voice action requires either a prompt or the exact copy (the verbatim text to speak).
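As a rough illustration of the prompt-or-copy requirement, the sketch below validates the input before a voice action would be sent. The class name `VoiceInput` and the field names `prompt` and `text` are hypothetical and are not taken from the Workstation API. The sketch also assumes the two inputs are mutually exclusive, which this page does not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceInput:
    """Hypothetical input for a voice action; field names are assumptions."""

    prompt: Optional[str] = None  # instructions from which speech is generated
    text: Optional[str] = None    # exact copy to be spoken verbatim

    def __post_init__(self) -> None:
        # Assumption: exactly one of the two fields must be provided.
        if (self.prompt is None) == (self.text is None):
            raise ValueError("provide exactly one of 'prompt' or 'text'")


# Usage: either form is accepted on its own.
from_prompt = VoiceInput(prompt="Greet the caller and ask how you can help.")
from_copy = VoiceInput(text="Hello, thanks for calling. How can I help?")
```

Checking the input locally gives a clear error before any audio is generated; the real API may enforce this rule differently.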
The following Prebuilt Workstation Specs include voice capabilities:
Voice Actions
Currently, Workstations with voice capabilities can perform the following actions:
- Speak: Play voice audio into the virtual microphone via a text-to-speech model.
- Listen: Listen for audio from the virtual microphone and transcribe the audio into text.
- Question: Speak a question into the virtual microphone, listen for a user response, and then optionally respond.
- Transcript: Get a URL for a server-sent events (SSE) stream of the transcript from the Workstation.
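The transcript action returns a URL for an SSE stream, so a client needs to parse the `text/event-stream` format. The sketch below is a minimal, generic SSE parser. This page does not specify the transcript's event names or payload shape, so the sample event name `transcript` and the sample data are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class SSEEvent:
    event: str  # event type; "message" when the stream does not name one
    data: str   # data lines of one event joined with newlines


def parse_sse(lines: Iterable[str]) -> Iterator[SSEEvent]:
    """Yield events from lines in the generic text/event-stream format."""
    event_type = "message"
    data_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if data_parts:
                yield SSEEvent(event_type, "\n".join(data_parts))
            event_type, data_parts = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # drop the single optional leading space
        if field == "event":
            event_type = value
        elif field == "data":
            data_parts.append(value)
        # Other fields (id, retry) are ignored in this sketch.


# Usage with sample lines; the event name and text are illustrative only.
sample = [
    "event: transcript\n",
    "data: hello\n",
    "data: world\n",
    "\n",
    ": keep-alive\n",
    "data: bye\n",
    "\n",
]
events = list(parse_sse(sample))
```

In practice the lines would come from a streaming HTTP response opened on the URL that the transcript action returns, rather than from an in-memory list.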