Act

POST /workstations/:workstation_id/browser/operator/act

Performs a series of browser actions described in natural language. Midscene analyzes the current page context and a screenshot, then plans and executes the detailed prompt steps needed to complete the requested actions.
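
As a rough sketch of how a client might prepare this call, the Python below builds (but does not send) the POST request. Only the path comes from this page. The base URL `https://api.example.com`, the workstation id `ws-123`, and the JSON body with a single `prompt` field are assumptions for illustration, since the request schema is not given here; authentication is likewise not shown.

```python
import json
import urllib.request
from urllib.parse import quote


def build_act_request(base_url: str, workstation_id: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for the act endpoint without sending it."""
    # Path taken from this page; the workstation id is URL-escaped.
    path = f"/workstations/{quote(workstation_id, safe='')}/browser/operator/act"
    url = base_url.rstrip("/") + path
    # ASSUMPTION: the body is JSON with a single "prompt" field (hypothetical name).
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host and workstation id, used only to show the resulting URL.
req = build_act_request(
    "https://api.example.com",
    "ws-123",
    'Click the "completed" status button below the task list',
)
```

Passing `req` to `urllib.request.urlopen` would send it; that step is left out so the sketch can be inspected without a live service.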

Key Features

  • Natural language action descriptions
  • Automatic element location and interaction
  • Step-by-step execution with visual feedback
  • Built-in error handling and recovery

Supported Actions

  • Locator: Find elements using natural language descriptions
  • Actions: Click/tap, scroll, keyboard input, hover
  • Utility: Sleep/wait operations

Best Practices

  1. Be specific and detailed in prompt step descriptions
  2. Include all necessary interactions (e.g., "click", "type", "hover")
  3. Reference elements by visible text or clear descriptions
  4. Specify the order of operations using words like "then" or "after"
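
One way to apply practices 2 and 4 in client code is to keep each interaction as its own short description and join them with "then". The helper below is a hypothetical convenience, not part of the API; it only produces the prompt string.

```python
def compose_prompt(steps: list[str]) -> str:
    """Join ordered step descriptions into one prompt using "then"."""
    # Drop empty entries and trailing periods so the joined text reads cleanly.
    cleaned = [s.strip().rstrip(".") for s in steps if s.strip()]
    if not cleaned:
        raise ValueError("at least one step is required")
    return ", then ".join(cleaned)


prompt = compose_prompt([
    'Enter "Learn JS today" in the task box',
    "press Enter to create",
])
```

Here `prompt` ends up matching the first of the good examples listed under Example Prompt Step Descriptions.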

Example Prompt Step Descriptions

✅ Good Examples:
- 'Enter "Learn JS today" in the task box, then press Enter to create'
- 'Move your mouse over the second item in the task list and click the Delete button to the right of the second task'
- 'Click the "completed" status button below the task list'

❌ Poor Examples:
- 'Tweet Hello World' (too vague)
- 'Delete task' (missing context)
- 'Click button' (insufficient description)

Limitations

  • Cannot execute conditional logic (if/else statements)
  • Cannot perform loops or repetitive actions
  • Requires visible and interactive elements
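
Because a prompt cannot branch or loop, one possible workaround is to keep that logic in the calling code and issue one act request per iteration. The sketch below only generates the prompts; the function name, the `should_delete` predicate, and the sample task names are hypothetical, and each resulting prompt would be sent as a separate request.

```python
from typing import Callable


def delete_prompts(tasks: list[str], should_delete: Callable[[str], bool]) -> list[str]:
    """Expand a client-side loop and condition into one prompt per task."""
    prompts = []
    for task in tasks:  # the loop runs in the caller, not in the prompt
        if should_delete(task):  # the condition is evaluated by the caller as well
            prompts.append(
                f'Move your mouse over the task "{task}" in the task list '
                f"and click the Delete button to the right of that task"
            )
    return prompts


# Hypothetical data: delete only the tasks whose name starts with "Old".
prompts = delete_prompts(
    ["Old report", "Learn JS today", "Old notes"],
    lambda name: name.startswith("Old"),
)
```

Each generated prompt names its target by visible text, so it stays specific enough to be executed on its own.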

Request

Responses

No Content - the operation was successful but there is no additional content to return.