| name | ui-automation-workflows |
| description | Accessibility-first UI automation using IDB. Query accessibility tree (fast, 50 tokens) before screenshots (slow, 170 tokens). Use when automating simulator interactions, tapping UI elements, finding buttons, or testing user flows. Covers idb-ui-describe, idb-ui-tap, idb-ui-find-element patterns. |
UI Automation Workflows
Use the execute_idb_command MCP tool for all UI automation
The xclaude-plugin provides the execute_idb_command MCP tool which consolidates all IDB UI automation operations into a single, token-efficient dispatcher.
⚠️ CRITICAL: Always Use MCP Tools First
This is the most important rule: When automating UI interactions, you MUST use the execute_idb_command MCP tool.
- ✅ DO: Invoke execute_idb_command for all UI automation, element finding, and accessibility queries
- ✅ DO: If the MCP tool fails, adjust parameters and retry
- ✅ DO: Read error messages and debug the parameters
- ❌ NEVER: Fall back to bash idb commands
- ❌ NEVER: Use idb directly in bash
- ❌ NEVER: Run idb commands in a terminal
Why? The MCP tool provides:
- Structured error handling
- Token efficiency (consolidated into 1 tool vs. verbose bash output)
- Proper integration with the xclaude-plugin architecture
- Accessibility-first patterns built-in
If execute_idb_command fails, the problem lies in the parameters or the app state. A failure is never a reason to switch to bash.
Core Principle: Accessibility Before Screenshots
Always query the accessibility tree first. Only use screenshots as a fallback.
Use the execute_idb_command MCP tool with operation describe to access the accessibility tree.
Why Accessibility-First?
| Approach | Time | Tokens | Reliability |
|---|---|---|---|
| Accessibility tree | ~120ms | ~50 | Survives theme changes |
| Screenshot | ~2000ms | ~170 | Breaks on visual changes |
Result: by the figures above, roughly 16x faster, about 3.4x (roughly 70%) cheaper in tokens, and more reliable
Standard Workflow
1. Check Accessibility Quality (Optional) - Use execute_idb_command
Before starting automation, check if the app has good accessibility support:
Invoke the execute_idb_command MCP tool:
{
"operation": "check-accessibility",
"target": "booted"
}
Interpreting the result:
- "excellent" or "good": Proceed with accessibility-first workflow
- "poor" or "insufficient": May need to rely more on screenshots
Note: Most modern iOS apps have good accessibility support. Skip this check if you're confident.
2. Query Accessibility Tree - Use execute_idb_command with operation: "describe"
This is your starting point for all UI automation:
Invoke the execute_idb_command MCP tool:
{
"operation": "describe",
"target": "booted",
"parameters": {
"operation": "all"
}
}
Returns:
{
"elements": [
{
"label": "Login",
"type": "Button",
"frame": { "x": 100, "y": 400, "width": 175, "height": 50 },
"centerX": 187,
"centerY": 425,
"enabled": true,
"visible": true
},
{
"label": "Email",
"type": "TextField",
"value": "",
"frame": { "x": 50, "y": 300, "width": 275, "height": 44 },
"centerX": 187,
"centerY": 322
}
]
}
Use centerX and centerY for tap coordinates.
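The center values in the sample response follow from each element's frame. A minimal Python sketch of that arithmetic, assuming the tool truncates to whole points as the sample suggests (the helper name center_of is hypothetical and not part of the tool):

```python
def center_of(frame: dict) -> tuple[int, int]:
    """Return the integer center point of an accessibility frame.

    Assumption: fractional centers are truncated, which matches the
    sample response (x=100, width=175 gives centerX=187, not 187.5).
    """
    center_x = int(frame["x"] + frame["width"] / 2)
    center_y = int(frame["y"] + frame["height"] / 2)
    return center_x, center_y


login_frame = {"x": 100, "y": 400, "width": 175, "height": 50}
email_frame = {"x": 50, "y": 300, "width": 275, "height": 44}

login_center = center_of(login_frame)
email_center = center_of(email_frame)
```

This is mainly useful as a sanity check when a response includes a frame but the center fields are missing or look wrong.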
3. Find Your Element
Option A: Search by Label/Text (Preferred)
{
"operation": "find-element",
"target": "booted",
"parameters": {
"query": "Login"
}
}
Option B: Manual Search
From the accessibility tree response, find the element you want by:
- label: Button text, field labels
- type: Button, TextField, Cell, etc.
- value: Current input value
- visible: Only interact with visible elements
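The manual search can be sketched in Python as a filter over the elements array. The function name find_tappable is hypothetical, and treating a missing visible or enabled flag as true is an assumption made here because the sample TextField omits both fields:

```python
def find_tappable(elements, label, element_type=None):
    """Return the first visible, enabled element with a matching label."""
    for element in elements:
        if element.get("label") != label:
            continue
        if element_type is not None and element.get("type") != element_type:
            continue
        # Assumption: a missing flag means the element is usable.
        if not element.get("visible", True) or not element.get("enabled", True):
            continue
        return element
    return None


# Sample tree copied from the describe response shown earlier.
tree = {
    "elements": [
        {"label": "Login", "type": "Button", "centerX": 187, "centerY": 425,
         "enabled": True, "visible": True},
        {"label": "Email", "type": "TextField", "value": "",
         "centerX": 187, "centerY": 322},
    ]
}

login = find_tappable(tree["elements"], "Login", "Button")

# Build the tap request from the matched element's center point.
tap_request = {
    "operation": "tap",
    "target": "booted",
    "parameters": {"x": login["centerX"], "y": login["centerY"]},
}
```

The resulting tap_request has the same shape as the tap payload in the next step.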
4. Interact with Element
Tap:
{
"operation": "tap",
"target": "booted",
"parameters": {
"x": 187,
"y": 425
}
}
Input Text:
{
"operation": "input",
"target": "booted",
"parameters": {
"text": "user@example.com"
}
}
Keyboard Actions:
{
"operation": "input",
"target": "booted",
"parameters": {
"key": "return"
}
}
Available keys: return, home, delete, space, escape, tab, up, down, left, right
5. Verify State
After each interaction, query the accessibility tree again to verify the result:
{
"operation": "describe",
"target": "booted"
}
Common Patterns
Pattern: Login Flow
1. describe → Find "Email" text field
2. tap → Focus email field
3. input → Type email
4. describe → Find "Password" text field
5. tap → Focus password field
6. input → Type password
7. describe → Find "Login" button
8. tap → Submit form
9. describe → Verify next screen
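The nine steps above can be sketched in Python as a sequence of requests. This is an illustration under assumptions, not the plugin's implementation: call_tool is a hypothetical stand-in for however execute_idb_command is invoked, and fake_tool with its coordinates is made-up test data.

```python
from typing import Callable

Tool = Callable[[dict], dict]  # hypothetical stand-in for execute_idb_command


def tap_label(call_tool: Tool, label: str) -> None:
    """describe, locate the element by label, then tap its center."""
    tree = call_tool({"operation": "describe", "target": "booted"})
    match = next((e for e in tree["elements"] if e.get("label") == label), None)
    if match is None:
        raise LookupError(f"{label!r} not found in accessibility tree")
    call_tool({
        "operation": "tap",
        "target": "booted",
        "parameters": {"x": match["centerX"], "y": match["centerY"]},
    })


def type_text(call_tool: Tool, text: str) -> None:
    """Send text to whichever field currently has focus."""
    call_tool({"operation": "input", "target": "booted",
               "parameters": {"text": text}})


def login(call_tool: Tool, email: str, password: str) -> dict:
    tap_label(call_tool, "Email")       # steps 1-2
    type_text(call_tool, email)         # step 3
    tap_label(call_tool, "Password")    # steps 4-5
    type_text(call_tool, password)      # step 6
    tap_label(call_tool, "Login")       # steps 7-8
    # Step 9: return the fresh tree so the caller can verify the next screen.
    return call_tool({"operation": "describe", "target": "booted"})


calls = []


def fake_tool(request: dict) -> dict:
    """Fake tool that records operations and returns a fixed, made-up tree."""
    calls.append(request["operation"])
    if request["operation"] == "describe":
        return {"elements": [
            {"label": "Email", "centerX": 187, "centerY": 322},
            {"label": "Password", "centerX": 187, "centerY": 380},
            {"label": "Login", "centerX": 187, "centerY": 425},
        ]}
    return {}


final_tree = login(fake_tool, "user@example.com", "example-password")
```

Re-querying the tree before every tap costs an extra describe per step, but it avoids tapping stale coordinates if the layout shifts when the keyboard appears.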
Pattern: Navigate and Tap
1. describe → Get all buttons
2. find-element → Search for specific button
3. tap → Execute tap
4. describe → Verify navigation
Pattern: Fill Form
1. describe → Get all text fields
2. For each field:
- tap → Focus field
- input → Enter text
- input key:return → Next field
3. describe → Find submit button
4. tap → Submit
Pattern: Scroll and Find
1. describe → Check if element visible
2. If not visible:
- gesture (swipe up) → Scroll
- describe → Check again
3. find-element → Locate target
4. tap → Interact
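A bounded version of this loop might look like the following Python sketch. The names call_tool, scroll_until_visible, and max_swipes are hypothetical, the fake tool exists only to exercise the loop, and the sketch locates the element from the describe result rather than issuing a separate find-element request:

```python
SWIPE_UP = {
    "operation": "gesture",
    "target": "booted",
    "parameters": {"gesture_type": "swipe", "direction": "up", "duration": 200},
}


def scroll_until_visible(call_tool, label, max_swipes=5):
    """Swipe up until an element with this label is visible, or give up.

    Returns the matching element, or None after max_swipes swipes.
    """
    for attempt in range(max_swipes + 1):
        tree = call_tool({"operation": "describe", "target": "booted"})
        for element in tree.get("elements", []):
            # Assumption: a missing "visible" flag means visible.
            if element.get("label") == label and element.get("visible", True):
                return element
        if attempt < max_swipes:
            call_tool(SWIPE_UP)
    return None


state = {"swipes": 0}


def fake_tool(request):
    """Fake tool: the target only appears after two swipes (made-up data)."""
    if request["operation"] == "gesture":
        state["swipes"] += 1
        return {}
    if request["operation"] == "describe" and state["swipes"] >= 2:
        return {"elements": [
            {"label": "Settings", "visible": True, "centerX": 187, "centerY": 600},
        ]}
    return {"elements": []}


found = scroll_until_visible(fake_tool, "Settings")
swipes_needed = state["swipes"]
```

Capping the number of swipes keeps the loop from running forever when the element is on a different screen.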
Gestures
Swipe
{
"operation": "gesture",
"target": "booted",
"parameters": {
"gesture_type": "swipe",
"direction": "up",
"duration": 200
}
}
Directions: up, down, left, right
Button Presses
{
"operation": "gesture",
"target": "booted",
"parameters": {
"gesture_type": "button",
"button": "HOME"
}
}
Buttons: HOME, LOCK, SIRI, SIDE_BUTTON, APPLE_PAY, SCREENSHOT, APP_SWITCH
When to Use Screenshots (Fallback Only)
Only use screenshots if:
Accessibility quality is "poor"
{ "operation": "check-accessibility", "target": "booted" }
Visual verification needed
- Checking UI layout
- Verifying colors/images
- Debug visual issues
Element not in accessibility tree
- Custom drawn UI
- Canvas/game elements
- Some third-party components
For everything else, use the accessibility tree.
Troubleshooting
Element Not Found
Problem: find-element returns no results
Solutions:
- Query full tree with describe to see all elements
- Check if element is in a scroll view (may be off-screen)
- Verify app state (correct screen?)
- Check if element has accessibility label
Tap Not Working
Problem: Tap executes but nothing happens
Solutions:
- Verify element is enabled: true
- Check element is visible: true
- Confirm coordinates are correct (use centerX, centerY)
- Element might need double-tap or long-press
Input Not Working
Problem: Text input not appearing
Solutions:
- Tap text field first to focus
- Wait for keyboard to appear
- Check field is not disabled
- Use keyboard-specific keys (return, delete)
Advanced: Coordinate Transformation
If using screenshots with idb-ui-tap, coordinates may need scaling:
{
"operation": "tap",
"target": "booted",
"parameters": {
"x": 187,
"y": 425,
"applyScreenshotScale": true,
"screenshotScaleX": 0.5,
"screenshotScaleY": 0.5
}
}
But with accessibility-first, this is rarely needed.
Performance Tips
- Batch Operations: Group describe queries to minimize round-trips
- Cache Tree: Reuse accessibility tree if UI hasn't changed
- Target Specific Areas: Use describe with point coordinates for specific regions
- Avoid Unnecessary Waits: Accessibility tree reflects real-time state
Integration with MCP Tools
This Skill works with the execute_idb_command tool:
- All operations use the execute_idb_command tool
- Tool handles IDB connection and execution
- Tool returns structured accessibility data
- This Skill teaches WHEN and HOW to use operations
Related Skills
- accessibility-testing: WCAG compliance and quality assessment
- ios-testing-patterns: Test automation strategies
- simulator-workflows: Device and app management
Related Resources
- xc://operations/idb: Complete IDB operations reference
- xc://reference/accessibility: Accessibility tree structure guide
- xc://workflows/accessibility-first: This workflow pattern
Remember: Accessibility tree first, screenshots last. By the figures above, roughly 16x faster and about 70% cheaper in tokens.