MCP
Alumnium’s Model Context Protocol (MCP) server lets general-purpose AI agents such as Claude Code use Alumnium’s web and mobile automation capabilities. With this integration, AI assistants can control browsers and mobile applications directly.
Installation
The MCP Server is available in both the Python and TypeScript packages.

Python (uvx)

Claude Code

    claude mcp add alumnium --env OPENAI_API_KEY=... -- uvx alumnium mcp

Codex

    codex mcp add alumnium --env OPENAI_API_KEY=... -- uvx alumnium mcp

Cursor
Add the following to mcp.json:

    {
      "mcpServers": {
        "alumnium": {
          "command": "uvx",
          "args": ["alumnium", "mcp"],
          "env": { "OPENAI_API_KEY": "..." }
        }
      }
    }

Gemini CLI

    gemini mcp add alumnium --env OPENAI_API_KEY=... uvx alumnium mcp

Visual Studio Code

    code --add-mcp '{ "name": "alumnium", "command": "uvx", "args": [ "alumnium", "mcp" ], "env": { "OPENAI_API_KEY": "..." } }'

TypeScript (npx)

Claude Code

    claude mcp add alumnium --env OPENAI_API_KEY=... -- npx alumnium mcp

Codex

    codex mcp add alumnium --env OPENAI_API_KEY=... -- npx alumnium mcp

Cursor
Add the following to mcp.json:

    {
      "mcpServers": {
        "alumnium": {
          "command": "npx",
          "args": ["alumnium", "mcp"],
          "env": { "OPENAI_API_KEY": "..." }
        }
      }
    }

Gemini CLI

    gemini mcp add alumnium --env OPENAI_API_KEY=... npx alumnium mcp

Visual Studio Code

    code --add-mcp '{ "name": "alumnium", "command": "npx", "args": [ "alumnium", "mcp" ], "env": { "OPENAI_API_KEY": "..." } }'

Tools
The MCP Server exposes Alumnium’s core automation capabilities:
| Tool | Description |
|---|---|
| start | Initialize browser/mobile drivers with Appium/Selenium/Playwright capabilities |
| stop | Clean up resources and retrieve token usage statistics |
| do | Execute natural language automation commands |
| check | Verify statements about the current page state with optional vision support |
| get | Extract data from pages using natural language descriptions |
| wait | Wait a fixed duration or until a natural language condition is met |
| fetch_accessibility_tree | Debug page structure with raw accessibility tree |
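AI agents normally issue these calls on their own, but it can help to see what one looks like on the wire. The sketch below builds a JSON-RPC 2.0 `tools/call` request, the message an MCP client sends to invoke a tool. The tool names are taken from the table; the helper name `build_tool_call` and the argument key `goal` are assumptions made for illustration, since the input schema of each tool is not described here.

```python
import json

# Tool names as listed in the table above.
ALUMNIUM_TOOLS = {
    "start", "stop", "do", "check", "get", "wait", "fetch_accessibility_tree",
}


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request (JSON-RPC 2.0) for one tool."""
    if tool not in ALUMNIUM_TOOLS:
        raise ValueError(f"unknown tool: {tool}")
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(request)


# "goal" is a hypothetical argument key; consult the tool's input schema
# reported by the server for the real parameter names.
message = build_tool_call(1, "do", {"goal": "add a todo item named 'buy milk'"})
print(message)
```

An MCP client can list the real argument names for each tool by asking the server for its tool definitions before calling them.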
start
Initialize the browser or mobile driver session. Pass capabilities as an inline JSON string or a path to a JSON file. All three drivers are supported: Appium, Selenium, and Playwright.
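The two accepted forms of the capabilities argument can be told apart as in the sketch below. It illustrates the idea and is not Alumnium's actual implementation; the function name `load_capabilities` and the rule "text starting with `{` is inline JSON, anything else is a file path" are assumptions.

```python
import json
from pathlib import Path


def load_capabilities(value: str) -> dict:
    """Return capabilities from an inline JSON string or a JSON file path."""
    text = value.strip()
    if text.startswith("{"):
        # Assumed rule: a leading brace means an inline JSON object.
        return json.loads(text)
    # Otherwise treat the value as a path to a JSON file.
    return json.loads(Path(text).read_text(encoding="utf-8"))


inline = load_capabilities('{"platformName": "chrome"}')
print(inline["platformName"])
```

A file path is convenient for long capability sets that are shared between sessions, while inline JSON suits short one-off configurations.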
platformName
Selects the driver to use. Supported values are chrome, ios, and android. Case-insensitive.
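Case-insensitive matching means that Chrome, CHROME, and chrome all select the same driver. A minimal sketch of that normalization follows; `normalize_platform` is a hypothetical helper and not part of Alumnium's API.

```python
SUPPORTED_PLATFORMS = ("chrome", "ios", "android")


def normalize_platform(platform_name: str) -> str:
    """Lower-case a platformName value and reject unsupported ones."""
    normalized = platform_name.lower()
    if normalized not in SUPPORTED_PLATFORMS:
        raise ValueError(
            f"unsupported platformName {platform_name!r}; "
            f"expected one of {', '.join(SUPPORTED_PLATFORMS)}"
        )
    return normalized


print(normalize_platform("Chrome"))
print(normalize_platform("iOS"))
```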
alumnium:options
Pass alumnium:options in capabilities to configure Alumnium and driver behavior for the session:

    {
      "platformName": "chrome",
      "alumnium:options": {
        "autoswitchToNewTab": false,
        "baseUrl": "https://example.com",
        "changeAnalysis": true,
        "cookies": [
          { "name": "session", "value": "abc123", "domain": ".example.com" }
        ],
        "excludeAttributes": ["url"],
        "executablePath": "/Applications/Arc.app/Contents/MacOS/Arc",
        "headers": { "Authorization": "Bearer token" },
        "headless": true,
        "planner": false,
        "profile": "work"
      }
    }

| Option | Description |
|---|---|
| autoswitchContexts | Automatically switch between native and web contexts after interactions. Appium only. Default is true. |
| autoswitchToNewTab | Automatically switch focus to newly opened tabs. Selenium and Playwright only. Default is true. |
| baseUrl | URL to navigate to when the session starts. |
| changeAnalysis | Enable UI changes analysis after each do() call. Default is true. |
| cookies | Pre-defined cookies to set before the session starts. Selenium and Playwright only. |
| delay | Seconds to wait after each interaction. Appium only. Default is 0. |
| excludeAttributes | Array of accessibility tree attributes to exclude. Reduces tree size on large pages. |
| executablePath | Path to a custom Chrome/Chromium executable (e.g. Arc, Brave). Selenium and Playwright only. |
| fullPageScreenshot | Capture full-page screenshots instead of viewport-only. Default is false. |
| headers | Custom HTTP headers for all browser requests. Selenium and Playwright only. |
| headless | Run browser in headless mode. Selenium and Playwright only. Default is false. |
| hideKeyboardAfterTyping | Dismiss the on-screen keyboard after typing. Appium only. Default is false. |
| newTabTimeout | Milliseconds to wait for a new tab to open after a click. Playwright only. Default is 200. |
| permissions | Browser permissions to grant (e.g. ["geolocation"]). Playwright only. |
| planner | Enable or disable the planning step in do(). Default is true. |
| profile | Name of a persistent browser profile; cookies, sessions, and storage are preserved across restarts in ~/.alumnium/profiles/{name}. Selenium and Playwright only. |
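Several options in the table apply only to some drivers. The sketch below encodes those restrictions and reports options that would not apply to a chosen driver before the capabilities are passed to start. The restrictions are copied from the table; the helper name `inapplicable_options` and the lower-case driver labels are illustrative, and how Alumnium itself treats an inapplicable option is not stated here.

```python
import json

# Driver restrictions as listed in the options table.
# Options absent from this mapping apply to every driver.
DRIVER_ONLY = {
    "autoswitchContexts": {"appium"},
    "autoswitchToNewTab": {"selenium", "playwright"},
    "cookies": {"selenium", "playwright"},
    "delay": {"appium"},
    "executablePath": {"selenium", "playwright"},
    "headers": {"selenium", "playwright"},
    "headless": {"selenium", "playwright"},
    "hideKeyboardAfterTyping": {"appium"},
    "newTabTimeout": {"playwright"},
    "permissions": {"playwright"},
    "profile": {"selenium", "playwright"},
}


def inapplicable_options(options: dict, driver: str) -> list[str]:
    """List option names that the table marks as unsupported for this driver."""
    return sorted(
        name
        for name in options
        if name in DRIVER_ONLY and driver not in DRIVER_ONLY[name]
    )


options = {"headless": True, "newTabTimeout": 500, "planner": False}
capabilities = {"platformName": "chrome", "alumnium:options": options}

print(inapplicable_options(options, "selenium"))  # ['newTabTimeout']
print(json.dumps(capabilities))  # inline JSON string form of the capabilities
```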
appium:settings
For iOS and Android sessions, pass appium:settings in capabilities to configure Appium settings that are applied to the driver after it is created:

    {
      "platformName": "ios",
      "appium:settings": {
        "allowInvisibleElements": true,
        "ignoreUnimportantViews": true
      }
    }

stop
Stops the running driver session and cleans up resources. Returns the path to the artifacts directory and the token usage statistics for the session, and optionally saves the execution cache.
do
Performs actions in the application using natural language commands and returns a summary of the performed steps. Alumnium automatically captures a screenshot upon completion and stores it in the artifacts directory.
check
Verifies application state and runs assertions using natural language commands. Returns the result of the check along with an explanation of how the verification was evaluated. Alumnium automatically captures a screenshot upon completion and stores it in the artifacts directory.
get
Extracts data from the application based on natural language descriptions. If the data is not found, returns an explanation of why it can’t be retrieved. Alumnium automatically captures a screenshot upon completion and stores it in the artifacts directory.
wait
Waits for a fixed duration or until a natural language condition is met. Pass a number (1–30) to wait that many seconds, or a string condition (e.g. "user is logged in") to poll with AI-powered verification until it becomes true or a timeout is reached.
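The two modes can be pictured with the sketch below: a number is treated as a fixed sleep, while a string is polled until it holds or a timeout passes. This illustrates the described behaviour and is not Alumnium's code; `wait_for`, the `check` callback standing in for the AI-powered verification, and the timeout and interval defaults are all assumptions.

```python
import time
from typing import Callable, Union


def wait_for(
    target: Union[int, float, str],
    check: Callable[[str], bool],
    timeout: float = 10.0,  # assumed default, not documented
    interval: float = 0.5,  # assumed polling interval, not documented
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Sleep for a fixed duration, or poll a condition until true or timed out."""
    if isinstance(target, bool):
        raise TypeError("target must be a number or a string")
    if isinstance(target, (int, float)):
        # Fixed wait: the documented range is 1 to 30 seconds.
        if not 1 <= target <= 30:
            raise ValueError("fixed waits must be between 1 and 30 seconds")
        sleep(target)
        return True
    # Condition wait: re-check until the condition holds or time runs out.
    deadline = time.monotonic() + timeout
    while True:
        if check(target):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

Injecting `sleep` and `check` keeps the sketch testable without real delays or model calls.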
fetch_accessibility_tree
Returns the raw accessibility tree of the current page as XML. Useful for debugging when do, check, or get behave unexpectedly: inspect the tree to verify element visibility, roles, and attributes.
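When reading the tree by hand is tedious, the XML can be summarized with a short script. The sketch below uses Python's standard `xml.etree.ElementTree` module; the sample document, including its tag names and the `name` attribute, is invented for illustration and will differ from what the tool actually returns.

```python
import xml.etree.ElementTree as ET

# Invented sample; real output from fetch_accessibility_tree will differ.
SAMPLE_TREE = """
<RootWebArea name="Login">
  <textbox name="Email" />
  <textbox name="Password" />
  <button name="Sign in" />
</RootWebArea>
"""


def summarize(tree_xml: str) -> list[tuple[int, str, dict]]:
    """Return (depth, tag, attributes) for every element in document order."""
    rows = []

    def visit(element: ET.Element, depth: int) -> None:
        rows.append((depth, element.tag, dict(element.attrib)))
        for child in element:
            visit(child, depth + 1)

    visit(ET.fromstring(tree_xml.strip()), 0)
    return rows


# Print an indented outline of the tree.
for depth, tag, attrs in summarize(SAMPLE_TREE):
    print("  " * depth + tag, attrs)
```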