Voice Applications

Build agents that can have natural-sounding audio conversations.

Voice actions use the virtual speaker and microphone hardware along with speech-to-text and text-to-speech models to have audio conversations in an active Workstation.

To generate spoken audio, a voice action requires either a prompt or the exact copy to speak.
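As a rough illustration of the "prompt or exact copy" requirement, the sketch below models the input to a speech-producing action in Python. Everything in it is hypothetical: the `SpeechInput` class, its field names, and the rule that exactly one of the two values must be supplied are assumptions made for the example, not part of the Workstation API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpeechInput:
    """Hypothetical input for a voice action that produces spoken audio."""

    prompt: Optional[str] = None      # instructions from which the wording is generated
    exact_copy: Optional[str] = None  # text to be spoken verbatim

    def __post_init__(self) -> None:
        # Assumption: exactly one of the two values is supplied.
        given = [v for v in (self.prompt, self.exact_copy) if v is not None]
        if len(given) != 1:
            raise ValueError("provide exactly one of prompt or exact_copy")
        if not given[0].strip():
            raise ValueError("the provided text must not be empty")

    @property
    def is_verbatim(self) -> bool:
        """True when the audio should match the supplied text word for word."""
        return self.exact_copy is not None


# Usage: one input driven by a prompt, one by exact copy.
greeting = SpeechInput(prompt="Greet the caller and ask how you can help.")
notice = SpeechInput(exact_copy="This call may be recorded.")
```

Validating the input before it reaches a text-to-speech model keeps an empty or ambiguous request from producing unintended audio.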

The following Prebuilt Workstation Specs include voice capabilities:

default
online-meeting
voip

Voice Actions

Currently, Workstations with voice capabilities can complete the following actions:

Speak: Play voice audio into the virtual microphone via a text-to-speech model.
Listen: Listen for audio from the virtual speaker and transcribe it into text.
Question: Speak a question into the virtual microphone, listen for a user response, and then optionally respond.
Transcript: Get a URL to a transcript SSE stream from the Workstation.
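The transcript is delivered as a Server-Sent Events (SSE) stream, so a client reads it line by line and groups the lines into events. The sketch below is a minimal parser for the standard SSE text format. The `transcript` event name and the JSON payload in the sample are assumptions made for illustration; the actual event names and payload shape of the Workstation stream are not described here.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class SseEvent:
    event: str  # event type; "message" when the stream omits an "event:" field
    data: str   # payload; multiple "data:" lines are joined with newlines


def parse_sse(lines: Iterable[str]) -> Iterator[SseEvent]:
    """Yield events from the lines of a Server-Sent Events stream."""
    event_type = "message"
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if data_parts:
                yield SseEvent(event_type, "\n".join(data_parts))
            event_type, data_parts = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the format drops one leading space
        if field == "event":
            event_type = value
        elif field == "data":
            data_parts.append(value)
        # Other fields (id, retry) are ignored in this sketch.


# Hypothetical stream contents; the real payload shape may differ.
sample = [
    "event: transcript\n",
    'data: {"speaker": "user", "text": "Hello"}\n',
    "\n",
    ": keep-alive\n",
    "data: line one\n",
    "data: line two\n",
    "\n",
]
events = list(parse_sse(sample))
```

In practice the lines would come from an HTTP response opened against the transcript URL and read incrementally, so that each transcript event can be handled as soon as it arrives.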
