Act

POST /workstations/:workstation_id/browser/operator/act

Perform a series of browser actions described in natural language. Midscene analyzes the current page context and a screenshot, then plans and executes the detailed prompt steps needed to complete the requested actions.
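As a rough illustration of how a call to this endpoint might be assembled, the sketch below builds (but does not send) a POST request using only Python's standard library. The base URL, the bearer-token header, and the `action` body field are assumptions made for illustration; this page does not document the request schema, so the real field names may differ.

```python
import json
import urllib.request
from urllib.parse import quote

# Hypothetical host: the real base URL is not given on this page.
BASE_URL = "https://api.example.com"


def build_act_request(workstation_id: str, prompt: str, token: str) -> urllib.request.Request:
    """Build a POST request for the act endpoint without sending it."""
    url = f"{BASE_URL}/workstations/{quote(workstation_id, safe='')}/browser/operator/act"
    # "action" is an assumed field name, not confirmed by the source.
    body = json.dumps({"action": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )


req = build_act_request(
    "ws-123",
    'Click the "completed" status button below the task list',
    "TOKEN",
)
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without network access.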

Key Features

- Natural language action descriptions
- Automatic element location and interaction
- Step-by-step execution with visual feedback
- Built-in error handling and recovery

Supported Actions

- Locator: Find elements using natural language descriptions
- Actions: Click/tap, scroll, keyboard input, hover
- Utility: Sleep/wait operations

Best Practices

- Be specific and detailed in prompt step descriptions
- Include all necessary interactions (e.g., "click", "type", "hover")
- Reference elements by visible text or clear descriptions
- Specify the order of operations using words like "then" or "after"
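One way to keep the order of operations explicit is to assemble the prompt from an ordered list of steps. The helper below is a minimal sketch; `compose_prompt` is a hypothetical name and not part of the API.

```python
def compose_prompt(steps: list[str]) -> str:
    """Join ordered step descriptions into one prompt, making the order explicit."""
    # Drop empty entries and trailing periods so the joined text reads as one sentence.
    cleaned = [s.strip().rstrip(".") for s in steps if s.strip()]
    if not cleaned:
        raise ValueError("at least one step is required")
    return ", then ".join(cleaned)


prompt = compose_prompt([
    'Enter "Learn JS today" in the task box',
    "press Enter to create",
])
print(prompt)
```

Each step should still name its interaction and target element; the helper only sequences the steps and does not make a vague step specific.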

Example Prompt Step Descriptions

✅ Good Examples:
- 'Enter "Learn JS today" in the task box, then press Enter to create'
- 'Move your mouse over the second item in the task list and click the Delete button to the right of the second task'
- 'Click the "completed" status button below the task list'

❌ Poor Examples:
- 'Tweet Hello World' (too vague)
- 'Delete task' (missing context)
- 'Click button' (insufficient description)
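
The difference between the good and poor examples can be approximated with a crude pre-flight check before a prompt is sent. The sketch below is only a heuristic under assumed thresholds: the minimum word count and the verb list are illustrative choices, not rules applied by Midscene or this API.

```python
# Assumed heuristic values; tune them for your own prompts.
MIN_WORDS = 5
ACTION_VERBS = {"click", "tap", "type", "enter", "press", "hover", "move", "scroll", "wait"}


def prompt_warnings(prompt: str) -> list[str]:
    """Return warnings for a prompt that looks underspecified."""
    words = prompt.lower().split()
    warnings = []
    if len(words) < MIN_WORDS:
        warnings.append("too short: describe the target element and its location")
    # Strip surrounding quotes and punctuation before matching verbs.
    if not any(w.strip('".,') in ACTION_VERBS for w in words):
        warnings.append("no explicit interaction such as click, type, or hover")
    return warnings


for candidate in ("Tweet Hello World", 'Click the "completed" status button below the task list'):
    print(candidate, "->", prompt_warnings(candidate))
```

A prompt that passes this check can still be ambiguous, so treat the result as a hint and not as validation.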

Limitations

- Cannot execute conditional logic (if/else statements)
- Cannot perform loops or repetitive actions
- Requires visible and interactive elements
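
Since a single prompt cannot loop or branch, one possible workaround is to keep that logic in the calling code and issue one self-contained prompt per iteration. The sketch below assumes a caller-supplied `send` function that performs the actual request; both `send` and `run_for_each` are hypothetical names.

```python
from typing import Callable, Iterable


def run_for_each(titles: Iterable[str], send: Callable[[str], None]) -> int:
    """Send one prompt per task title and return how many prompts were sent."""
    count = 0
    for title in titles:
        # Conditional logic lives here, in the client, not in the prompt.
        if not title.strip():
            continue
        send(f'Enter "{title}" in the task box, then press Enter to create')
        count += 1
    return count


sent: list[str] = []
run_for_each(["Learn JS today", "", "Learn Rust tomorrow"], sent.append)
print(sent)
```

Here `sent.append` stands in for a real request function so the sketch runs offline.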

Request

Responses

No Content: the operation succeeded and there is no additional content to return.
