Actions Overview

Browser and System Actions

AgentStation provides actions for interacting with both the browser and the underlying operating system of the Workstation. With them, agentic applications can perform tasks ranging from simple navigation to multi-step interactions with web elements and system-level operations.

Browser Tab Management

The API offers several endpoints for managing browser tabs:

  • List, open, close, and switch between tabs
  • Navigate to specific URLs
  • Refresh tabs

These actions let your application open, track, and navigate several tabs within a single Workstation browser.
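One pattern that follows from tab management is tying a tab's lifetime to a block of work so it is always closed afterwards. The sketch below is illustrative only: the action names (`open_tab`, `close_tab`, `list_tabs`), the `tab_id` field, and the `FakeTransport` class are hypothetical stand-ins, not AgentStation's documented endpoints. In a real application the transport would wrap the HTTP calls to the API.

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterator

# Hypothetical transport: takes an action name and a params dict, returns a dict.
Transport = Callable[[str, Dict[str, object]], Dict[str, object]]


class FakeTransport:
    """In-memory stand-in for the API so the sketch runs without a Workstation."""

    def __init__(self) -> None:
        self.tabs: Dict[int, str] = {}
        self._next_id = 1

    def __call__(self, action: str, params: Dict[str, object]) -> Dict[str, object]:
        if action == "open_tab":
            tab_id = self._next_id
            self._next_id += 1
            self.tabs[tab_id] = str(params["url"])
            return {"tab_id": tab_id}
        if action == "close_tab":
            self.tabs.pop(int(params["tab_id"]), None)
            return {}
        if action == "list_tabs":
            return {"tabs": sorted(self.tabs)}
        raise ValueError(f"unknown action: {action}")


@contextmanager
def temporary_tab(send: Transport, url: str) -> Iterator[int]:
    """Open a tab for the duration of a block and always close it afterwards."""
    tab_id = int(send("open_tab", {"url": url})["tab_id"])
    try:
        yield tab_id
    finally:
        # Runs even if the body raises, so no tab is left open by accident.
        send("close_tab", {"tab_id": tab_id})


fake = FakeTransport()
with temporary_tab(fake, "https://example.com") as tab:
    tabs_during = fake("list_tabs", {})["tabs"]
tabs_after = fake("list_tabs", {})["tabs"]
```

Because the close call sits in a `finally` clause, the tab is released even when the work inside the block fails.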

Web Interaction

For interacting with web content, the following actions are available:

  • Clicking on elements
  • Scrolling to specific locations
  • Hovering over elements
  • Inputting text into fields
  • Selecting options from dropdowns

These actions simulate user interactions with web pages, allowing your application to fill forms, navigate complex interfaces, and interact with dynamic content.
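Filling a form usually means issuing several of these actions in a fixed order. The following sketch describes a login sequence as plain data before anything is sent. The action names (`click`, `input`, `scroll`), the payload keys, and the selectors are assumptions made for illustration; the real request shapes should be taken from the API reference.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Step:
    action: str                  # hypothetical action name, e.g. "click" or "input"
    selector: str                # CSS selector identifying the target element
    value: Optional[str] = None  # text to type, if the action needs one


def login_steps(username: str, password: str) -> List[Step]:
    """Return the ordered interactions for a simple login form."""
    return [
        Step("click", "#username"),
        Step("input", "#username", username),
        Step("input", "#password", password),
        Step("scroll", "button[type=submit]"),
        Step("click", "button[type=submit]"),
    ]


def to_payload(step: Step) -> Dict[str, str]:
    """Convert a step into a request body, omitting the value when unused."""
    payload = {"action": step.action, "selector": step.selector}
    if step.value is not None:
        payload["value"] = step.value
    return payload


payloads = [to_payload(step) for step in login_steps("alice", "s3cret")]
```

Keeping the sequence as data makes it easy to log, replay, or validate a workflow before running it against a live page.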

Content Retrieval

To gather information from the browser, you can use:

  • HTML content retrieval
  • Screenshot capture

These actions are crucial for analyzing web content, performing visual checks, or capturing data for further processing.
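Once HTML has been retrieved, it can be analyzed with ordinary parsing tools. The sketch below assumes the HTML retrieval action returns the page markup as a string (the exact response format is not described here) and uses Python's standard `html.parser` module to pull out the page title and link targets. The `PageSummary` and `summarize` names are illustrative.

```python
from html.parser import HTMLParser
from typing import List


class PageSummary(HTMLParser):
    """Collect the <title> text and every href found on <a> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.links: List[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors without a target
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def summarize(html: str) -> PageSummary:
    parser = PageSummary()
    parser.feed(html)
    parser.close()
    return parser


sample = (
    "<html><head><title>Orders</title></head><body>"
    "<a href='/a'>A</a><a>no href</a><a href='/b'>B</a>"
    "</body></html>"
)
summary = summarize(sample)
```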

System-Level Operations

For lower-level interactions with the Workstation's operating system, the API provides:

  • Keyboard input simulation
  • System-wide screenshot capture

These actions allow your application to interact with the Workstation beyond the browser, enabling integration with desktop applications and system-wide operations.
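Before storing or analyzing a captured screenshot, it can be worth checking that the data is what you expect. This sketch assumes, purely for illustration, that a screenshot arrives as a base64-encoded PNG string; the actual encoding and image format may differ. The `decode_png` helper is hypothetical.

```python
import base64
import binascii

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_png(b64_data: str) -> bytes:
    """Decode base64 screenshot data and confirm it looks like a PNG."""
    try:
        raw = base64.b64decode(b64_data, validate=True)
    except binascii.Error as exc:
        raise ValueError("screenshot is not valid base64") from exc
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("screenshot is not a PNG image")
    return raw


# Stand-in data: a real response would carry a complete image.
encoded = base64.b64encode(PNG_SIGNATURE + b"image-bytes").decode("ascii")
image_bytes = decode_png(encoded)
```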

Best Practices

When using these actions, consider the following:

  1. Efficient tab management: Open only necessary tabs and close them when no longer needed to optimize resource usage.
  2. Robust selectors: When interacting with web elements, prefer CSS selectors or XPath expressions that are unlikely to change, such as IDs or dedicated data attributes, over ones tied to page layout.
  3. Error handling: Implement proper error handling for cases where elements are not found or actions cannot be completed.
  4. Rate limiting: Pace your API calls so that you stay within rate limits.
  5. Security: When handling sensitive information, especially in system-level operations, follow security best practices. For example, avoid logging typed input or storing screenshots that may contain credentials.
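The error-handling and rate-limiting points above can be combined in a small wrapper. This is a minimal sketch under stated assumptions: `ActionError` is a hypothetical exception standing in for whatever failure your client raises (for example, when an element is not found), and the interval and backoff values are arbitrary. The clock and sleep functions are injectable so the logic can be exercised without real delays.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ActionError(Exception):
    """Hypothetical error raised when an action cannot be completed."""


class Throttle:
    """Enforce a minimum interval between consecutive calls."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def call_with_retry(action: Callable[[], T], throttle: Throttle,
                    attempts: int = 3, base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Run an action, retrying on ActionError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        throttle.wait()  # respect the pacing limit before every call
        try:
            return action()
        except ActionError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to the caller
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

A call such as `call_with_retry(lambda: click("#submit"), Throttle(0.2))` would then pace and retry a single interaction, where `click` is whatever function your client exposes for that action.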

Combined, these browser and system actions let agentic applications carry out multi-step tasks across websites and the Workstation's desktop environment.