RSTT Stream

POST /workstations/:workstation_id/audio/rstt

The Real-Time Speech-to-Text (RSTT) endpoint transcribes voice audio from the Workstation's virtual speakers in real time. It works like the /audio/listen endpoint, but with a longer-lived timeout and fewer conversational features, which makes it useful for transcribing long-form audio.

The common workflow is:

  1. Call this endpoint to get a Server-Sent Events (SSE) streaming URL with a long-lived timeout
  2. Call /audio/speak to speak
  3. As speech is detected, transcriptions will be streamed to your SSE connection:
    • 'partial' events contain in-progress transcriptions
    • 'final' events contain completed utterances
  4. Use the speech_started event for interruption detection if needed. To stop the speech, use an RST close packet; to interrupt it, send a new /audio/speak request.
  5. When you are done listening, close the SSE connection
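
The client side of this workflow can be sketched roughly as follows. This is a minimal illustration and not an official client. It assumes the stream uses the standard SSE wire format, with the event name ('partial', 'final', speech_started) carried in the `event:` field. The payload in `data:` is treated as an opaque string because its schema is not described here. The helper names `parse_sse`, `stream_events`, and `collect_transcript` are hypothetical, and `stream_url` stands for whatever URL this endpoint returns.

```python
import urllib.request
from typing import Iterable, Iterator, List, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates one event; skip events with no data.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # SSE drops one leading space after the colon
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


def stream_events(stream_url: str) -> Iterator[Tuple[str, str]]:
    """Open the SSE stream and yield events (hypothetical helper)."""
    req = urllib.request.Request(
        stream_url, headers={"Accept": "text/event-stream"}
    )
    # Leaving the `with` block closes the connection (step 5).
    with urllib.request.urlopen(req) as resp:
        yield from parse_sse(line.decode("utf-8") for line in resp)


def collect_transcript(events: Iterable[Tuple[str, str]]) -> Tuple[List[str], bool]:
    """Gather 'final' payloads and note whether speech_started was seen."""
    finals, speech_started = [], False
    for name, payload in events:
        if name == "final":
            finals.append(payload)      # completed utterance
        elif name == "speech_started":
            speech_started = True       # hook for interruption handling
        # 'partial' events are in-progress text; ignored in this sketch
    return finals, speech_started
```

With a real stream, `collect_transcript(stream_events(stream_url))` would run until the server ends the stream. To stop earlier, iterate over `stream_events(stream_url)` directly and call `.close()` on the generator when you are done listening, which closes the underlying connection.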

Request

Responses

Successfully retrieved SSE stream URL