Voice Actions

Agents built on the AgentStation platform can have natural-sounding audio conversations, with the agent speaking and listening to user responses.

We offer several ways to orchestrate audio conversations on the platform.

API

Using the API, you can use our integrated speech-to-text, text-to-speech, and LLM models to complete voice actions.

You cannot yet use your own models for voice actions. In a future release, you will be able to stream audio inputs and outputs via the agent workstation.

Using integrated models

To have an agent complete a voice action via the API, send a request to a Voice Action endpoint. The request contains the workstation ID of the current agent, your API key, and any required inputs (e.g. the copy for the agent to speak, or instructions for the LLM to generate a response). Once the action completes successfully, you receive a 200 response along with any outputs.

An example API call to have an agent speak looks like the following:

curl -L -X POST 'https://api.agentstation.ai/v1/workstations/:id/speak' \
  -H 'Content-Type: application/json' \
  --data-raw '{
    "copy": "string"
  }'
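
The same call can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the endpoint path and the "copy" field come from the curl example above, but the function names are hypothetical, and the way the API key is sent (here an "Authorization: Bearer" header) is an assumption, since the curl example does not show it. Check the API reference for the actual authentication scheme.

```python
import json
import urllib.request

API_BASE = "https://api.agentstation.ai/v1"


def build_speak_request(workstation_id, copy, api_key,
                        auth_header="Authorization", auth_scheme="Bearer"):
    """Build (but do not send) a POST request to the speak endpoint."""
    url = f"{API_BASE}/workstations/{workstation_id}/speak"
    body = json.dumps({"copy": copy}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # ASSUMPTION: the API key travels as a bearer token in this header.
        # The source page does not specify the header name or scheme.
        auth_header: f"{auth_scheme} {api_key}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def speak(workstation_id, copy, api_key, timeout=30):
    """Send the request and return the decoded JSON response body, if any."""
    request = build_speak_request(workstation_id, copy, api_key)
    # urlopen raises urllib.error.HTTPError on error statuses such as 4xx or 5xx.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = response.read()
        return json.loads(payload) if payload else None
```

Building the request separately from sending it makes it possible to inspect the URL, headers, and body without touching the network. Calling speak("your-workstation-id", "Hello!", "your-api-key") would perform the actual request.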

By streaming audio

In a future release, you will be able to use streaming endpoints to input or output audio via the agent workstation.

JSON Configuration

Note: Currently, JSON configuration files support voice conversations over Zoom only. If you want to use a different platform, please use the API.

The JSON configuration file is an easy way to configure an agent for branching conversations: the agent chooses relevant topics to discuss from a predefined list, while asking the user questions and answering theirs.

Below is an example of a JSON configuration file that will ask the user what they would like to make for breakfast, then give them simple instructions on how to cook their selection:

{
  "name": "Breakfast Chef",
  "instructions": [
    "You are a chef that provides easy to follow recipes",
    "Start each recipe with a list of the ingredients required",
    "GOAL/OBJECTIVE: Provide step by step recipes to make a delicious breakfast",
    "PERSONALITY TRAITS: You are enthusiastic, concise, and clear.",
    "STRATEGY: DO NOT ASK QUESTIONS, instead make statements and let the user ask questions."
  ],
  "topics": [
    {
      "name": "Breakfast Selection",
      "required": true,
      "next": [
        "Scrambled eggs",
        "Oatmeal"
      ],
      "tasks": [
        {
          "type": "speak",
          "copy": "Hello! I'm here to help you make an amazing breakfast."
        },
        {
          "type": "question",
          "copy": "Would you like to make scrambled eggs or oatmeal this morning?"
        }
      ],
      "optional_response": false
    },
    {
      "name": "Scrambled eggs",
      "required": false,
      "tasks": [
        {
          "type": "speak",
          "copy": "Perfect! Let's review the recipe for scrambled eggs."
        },
        {
          "type": "speak",
          "instructions": ["Describe how to make scrambled eggs in 4 sentences or less."]
        },
        {
          "type": "question",
          "copy": "Do you have any questions on the recipe?"
        }
      ],
      "optional_response": true
    },
    {
      "name": "Oatmeal",
      "required": false,
      "tasks": [
        {
          "type": "speak",
          "copy": "Perfect! Let's review the recipe for oatmeal."
        },
        {
          "type": "speak",
          "instructions": ["Describe how to make oatmeal in 4 sentences or less."]
        },
        {
          "type": "question",
          "copy": "Do you have any questions on the recipe?"
        }
      ],
      "optional_response": true
    }
  ]
}

For a full description of all parameters within the JSON configuration file, refer to the JSON configuration guide.
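
Because topics refer to each other by name, a misspelled entry in a "next" list is easy to miss. The following is a minimal sketch of a local check you might run before uploading a configuration. It assumes, based only on the Breakfast Chef example, that each string in "next" is meant to match the "name" of another topic; the helper name find_undefined_next_topics is hypothetical and not part of the platform.

```python
import json


def find_undefined_next_topics(config):
    """Return (topic_name, next_name) pairs where next_name is not a defined topic."""
    topics = config.get("topics", [])
    defined = {topic["name"] for topic in topics}
    problems = []
    for topic in topics:
        # ASSUMPTION: entries in "next" are names of other topics in the same file.
        for next_name in topic.get("next", []):
            if next_name not in defined:
                problems.append((topic["name"], next_name))
    return problems


# A trimmed copy of the Breakfast Chef configuration with one deliberate typo.
config = json.loads("""
{
  "name": "Breakfast Chef",
  "topics": [
    {"name": "Breakfast Selection", "required": true,
     "next": ["Scrambled eggs", "Oatmeel"], "tasks": []},
    {"name": "Scrambled eggs", "required": false, "tasks": []},
    {"name": "Oatmeal", "required": false, "tasks": []}
  ]
}
""")

print(find_undefined_next_topics(config))
# prints [('Breakfast Selection', 'Oatmeel')]
```

An empty list means every "next" entry matched a defined topic name. The check is case-sensitive, which mirrors the exact-match spelling used in the example file.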