Voice Applications

Build agents that can have natural-sounding audio conversations.

Voice actions use the virtual speaker and microphone hardware along with speech-to-text and text-to-speech models to have audio conversations in an active Workstation.

To generate spoken audio, a voice action requires either a prompt or the exact copy to be spoken.

The following Prebuilt Workstation Specs include voice capabilities:

Voice Actions

Workstations with voice capabilities currently support the following actions:

  • Speak: Play voice audio into the virtual microphone via a text-to-speech model.
  • Listen: Listen for audio from the virtual microphone and transcribe the audio into text.
  • Question: Speak a question into the virtual microphone, listen for a user response, and then optionally respond.
  • Transcript: Get a URL to a server-sent events (SSE) stream of the transcript from the Workstation.
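Because the transcript is delivered as an SSE stream, a client has to read it event by event instead of as a single response body. The sketch below shows one way to parse the standard SSE wire format (`event:` and `data:` fields, with a blank line ending each event) in Python. The function name `parse_sse` and the `transcript` event name in the sample are illustrative assumptions; this page does not specify the event names or the payload format the Workstation actually sends.

```python
from typing import Dict, Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per SSE event from an iterable of decoded text lines.

    Hypothetical helper: it handles only the generic SSE wire format and
    assumes nothing about the Workstation's transcript payload.
    """
    event_type = "message"  # default event type in the SSE format
    data_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; emit it only if it has data.
            if data_parts:
                yield {"event": event_type, "data": "\n".join(data_parts)}
            event_type, data_parts = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "event":
            event_type = value
        elif field == "data":
            data_parts.append(value)
        # "id" and "retry" fields are ignored in this sketch


# Sample input; the "transcript" event name is an assumption for illustration.
sample = [
    "event: transcript\n",
    "data: hello\n",
    "data: world\n",
    "\n",
    ": keep-alive\n",
    "data: bye\n",
    "\n",
]
events = list(parse_sse(sample))
```

With a live stream, the same function could be fed lines decoded from an HTTP response opened on the URL that the Transcript action returns, for example `(chunk.decode("utf-8") for chunk in urllib.request.urlopen(url))`. Any authentication or headers the endpoint may require are not covered here.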