
Alumnium v0.17 with MCP and reasoning model support

Published by Alex Rodionov
release notes

Alumnium v0.17 brings significant advancements in integration capabilities and AI model support. This release introduces the MCP Server for integration with general-purpose agents and comprehensive reasoning model support across all major providers.

This release is available as both PyPI and npm packages, along with a Docker image for Alumnium Server.

This release introduces the Model Context Protocol (MCP) Server, enabling general-purpose agents like Claude Code to leverage Alumnium’s web and mobile automation capabilities. This integration allows AI assistants to control browsers and mobile applications directly through standardized tooling. Alumnium is currently the only MCP server that lets AI assistants drive both browser and mobile applications with natural language while keeping execution token- and cost-efficient.

Check out the MCP guide for detailed setup instructions and a demo.
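For orientation, MCP clients are typically pointed at a server through a JSON configuration entry shaped like the sketch below. The exact command and arguments for launching Alumnium’s server are documented in the MCP guide; the values here are placeholders, not the real invocation.

```json
{
  "mcpServers": {
    "alumnium": {
      "command": "<launcher from the MCP guide>",
      "args": ["<server arguments>"]
    }
  }
}
```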

Alumnium now fully embraces reasoning models across all major AI providers, delivering improved accuracy and more sophisticated decision-making in test automation. The framework has been updated to support extended thinking capabilities and optimize prompts for reasoning-enabled models.

| Provider  | Before | After |
| --------- | ------ | ----- |
| Anthropic | Claude Haiku 4.5 ($1/1M input, $5/1M output) | Claude Haiku 4.5 with extended thinking ($1/1M input, $5/1M output) |
| DeepSeek  | DeepSeek V1 ($0.28/1M input, $0.42/1M output) | DeepSeek R1 ($0.28/1M input, $0.42/1M output) |
| Google    | Gemini 2.0 Flash ($0.10/1M input, $0.40/1M output) | Gemini 3 Flash ($0.50/1M input, $3/1M output) |
| OpenAI    | GPT-4o Mini ($0.15/1M input, $0.60/1M output) | GPT-5 Nano ($0.05/1M input, $0.40/1M output) |
| xAI       | Grok 4 Fast (Non Reasoning) ($0.20/1M input, $0.50/1M output) | Grok 4.1 Fast (Reasoning) ($0.20/1M input, $0.50/1M output) |

The framework automatically handles reasoning output across all providers, logging the model’s thought process and adapting prompts to leverage thinking capabilities where available.

This release maintains backwards compatibility, so you can keep using non-reasoning models (Mistral, Llama) or switch back to older models if needed.
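Switching providers is a configuration change rather than a code change. A minimal sketch, assuming the `ALUMNIUM_MODEL` environment variable from the configuration docs selects the provider (the variable name and the `"anthropic"` value should be verified against the docs):

```python
import os

# Select the AI provider before Alumnium is initialized.
# Hedged: ALUMNIUM_MODEL and the "anthropic" value are assumptions here;
# check the configuration docs for the exact names and accepted values.
os.environ["ALUMNIUM_MODEL"] = "anthropic"

print(os.environ["ALUMNIUM_MODEL"])  # confirms the provider setting
```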

Alumnium now supports navigating to URLs, uploading files, executing custom JavaScript, and scrolling to elements as part of its do command.
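A minimal sketch of how these actions might look through the do command, assuming the `Alumni` entry point with a Selenium driver; the instruction strings are free-form natural language, so the exact phrasing below is illustrative:

```python
def submit_with_attachment(url: str, file_path: str) -> None:
    # Imports are local so the sketch reads without the dependencies
    # installed; both packages are real requirements of this workflow.
    from selenium import webdriver
    from alumnium import Alumni

    al = Alumni(webdriver.Chrome())
    al.do(f"navigate to {url}")                       # URL navigation
    al.do(f"upload {file_path} to the resume field")  # file uploading
    al.do("scroll to the submit button")              # scrolling to an element
    al.do("click the submit button")
    al.quit()
```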

In addition, this release brings multiple improvements to the drivers:

  • full support for asynchronous Playwright driver in Python;
  • automated switching to new tabs in browsers;
  • automated scrolling to elements on Appium before interaction;
  • automated keyboard hiding after typing on Appium.
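The async Playwright support can be sketched as follows; whether do must be awaited on the async driver is an assumption here, so treat the coroutine body as illustrative:

```python
import asyncio

async def run_search(query: str) -> None:
    # Local imports keep the module importable without the dependencies.
    from playwright.async_api import async_playwright
    from alumnium import Alumni

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto("https://duckduckgo.com")  # illustrative target site
        al = Alumni(page)                          # async Playwright page
        await al.do(f"search for {query}")         # assumed to be awaitable
        await browser.close()
```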

On top of that, the LangChain framework, which is used for all LLM communication, was upgraded to its first stable major release, v1.

With MCP server and comprehensive reasoning model support, Alumnium now provides a robust foundation for AI-powered automation across multiple platforms and agent types. The addition of file uploading and enhanced navigation tools expands the range of testing scenarios Alumnium can handle effectively.

Future development will focus on expanding MCP server capabilities to improve agent execution, and on exploring vision-based actions for applications that cannot be automated via an accessibility tree.