
Alumnium v0.18 with changes analysis, plannerless mode, and smarter caching

Published by Alex Rodionov
release notes

Alumnium v0.18 brings a set of improvements focused on making AI-powered automation faster and more useful in agentic contexts. This release introduces UI changes analysis, a plannerless execution mode, a smarter elements cache, expanded browser capabilities, and new AI provider support.

This release is available as both PyPI and npm packages, along with a Docker image for the Alumnium Server.

Alumnium can now analyze what changed in the UI after each do() call. When enabled, Alumnium captures the accessibility tree before and after executing an action and produces a human-readable description of what changed - new elements that appeared, elements that disappeared, and URL changes.

This feature is particularly valuable when using the MCP server. General-purpose agents like Claude Code receive the change description as part of the tool response, giving them immediate, relevant context about what just happened without needing to inspect the full accessibility tree on every step. This progressive disclosure of context reduces the number of round-trips between the agent and the server and makes agentic automation noticeably more efficient.

Changes analysis is enabled by default in the MCP server and opt-in when using Alumnium as a library.
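
To illustrate the idea (a conceptual sketch only, not Alumnium's actual implementation or API), a change description can be derived by diffing element snapshots captured before and after the action, plus the URL:

```python
def describe_changes(
    before: dict[int, str],
    after: dict[int, str],
    url_before: str,
    url_after: str,
) -> list[str]:
    """Summarize UI changes between two accessibility-tree snapshots.

    Each snapshot maps an element ID to a short descriptor. This is an
    illustrative sketch of the before/after comparison, not library code.
    """
    changes = []
    # Elements present only in the "after" snapshot appeared.
    for el_id in sorted(set(after) - set(before)):
        changes.append(f"appeared: {after[el_id]}")
    # Elements present only in the "before" snapshot disappeared.
    for el_id in sorted(set(before) - set(after)):
        changes.append(f"disappeared: {before[el_id]}")
    if url_before != url_after:
        changes.append(f"URL changed: {url_before} -> {url_after}")
    return changes
```

A summary like this is compact enough to return in a tool response, which is exactly what makes it useful to an agent that would otherwise re-read the whole tree.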

Alumnium’s do() command has always used a two-step process: a planner determines the sequence of actions, and an actor executes them one by one. In v0.18, the planner can be disabled entirely via ALUMNIUM_PLANNER=false, letting the actor’s own reasoning drive execution directly.

In practice, this cuts the duration of each do() call roughly in half — one fewer LLM call per step. This makes a noticeable difference in the MCP server context, where agents call do() frequently and already plan next steps based on changes analysis.

This change also reflects a longer-term direction: as reasoning models continue to improve, a separate planning step becomes increasingly redundant. We want to eventually remove the planner and rely on a single actor agent built on native tool calling with reasoning models.
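
Enabling plannerless mode is a single environment variable, using the flag named above:

```shell
# Disable the planning step; the actor's own reasoning drives execution.
export ALUMNIUM_PLANNER=false
```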

The caching system in v0.18 gains a new layer: the elements cache. Unlike the existing response cache - which stores full LLM responses keyed by the exact request - the elements cache stores AI decisions together with the UI elements they reference. On each lookup, it resolves those elements to their current IDs in the accessibility tree.

This means the cache remains valid even when element IDs change between test runs, which happens constantly as pages re-render. It also uses fuzzy matching on instruction text, so minor rephrasing still produces a cache hit.

As a practical example, consider a test that types into a search box and clicks a button. With the response cache, any change to the page - even unrelated content updating elsewhere - causes a miss. With the elements cache, as long as the search box and button are still present on the page, the cached decision is reused and IDs are resolved to their current values.

The two cache layers work in tandem and require no configuration changes.
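
To make the lookup described above concrete, here is a minimal, hypothetical sketch of such a cache. The class and method names are invented for this example and are not Alumnium's API: it fuzzy-matches instruction text and re-resolves a cached element descriptor to whatever ID that element has in the current tree.

```python
import difflib


class ElementsCache:
    """Illustrative sketch of an elements cache (not Alumnium's actual code).

    Stores an AI decision together with a descriptor of the element it
    references; on lookup, resolves the descriptor to the element's
    current ID in the accessibility tree.
    """

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.entries = []  # (instruction, action, element_descriptor)

    def store(self, instruction: str, action: str, descriptor: str) -> None:
        self.entries.append((instruction, action, descriptor))

    def lookup(self, instruction: str, tree: dict[int, str]):
        """Return (action, current_element_id) on a hit, or None on a miss."""
        for cached_instruction, action, descriptor in self.entries:
            # Fuzzy match on instruction text, so minor rephrasing still hits.
            score = difflib.SequenceMatcher(
                None, instruction.lower(), cached_instruction.lower()
            ).ratio()
            if score < self.threshold:
                continue
            # Re-resolve the cached descriptor to its current ID.
            for el_id, desc in tree.items():
                if desc == descriptor:
                    return action, el_id
        return None
```

Because the lookup keys on the element's descriptor rather than its ID, a re-render that renumbers elements still produces a hit, while a page where the element is genuinely gone falls through to a miss.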

This release expands what Alumnium can automate across several dimensions:

  • Frames support — Alumnium now handles cross-origin and same-origin iframes, building a unified accessibility tree across all frames on the page.
  • Full-page vision — Screenshots used in vision-based checks and retrievals can now capture the full page rather than just the viewport, via ALUMNIUM_FULL_PAGE_SCREENSHOT=true.
  • New tools — DragSliderTool for setting sliders to specific values, PrintToPdfTool for saving pages as PDF files, and SwitchToNextTabTool / SwitchToPreviousTabTool for manual tab navigation when auto-switching is disabled. See the actions guide for details.
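
Full-page capture is opt-in via the environment variable named above:

```shell
# Capture the full page, not just the viewport, in vision-based checks.
export ALUMNIUM_FULL_PAGE_SCREENSHOT=true
```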

Alumnium now supports Azure AI Foundry (azure_foundry) as a new provider, joining the existing Azure OpenAI integration. See the self-hosting guide for configuration details.
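
As a hypothetical sketch of selecting the new provider, assuming it follows the same ALUMNIUM_MODEL="provider/model" convention as the other providers (check the self-hosting guide for the authoritative settings; the deployment name below is a placeholder, not a real default):

```shell
# Placeholder deployment name; substitute your own Azure AI Foundry deployment.
export ALUMNIUM_MODEL="azure_foundry/my-deployment"
```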

The default Google model has been updated from Gemini 3 Flash to Gemini 3.1 Flash Lite, which is approximately 2x cheaper while delivering the same accuracy on automation tasks.

The next major milestone is rewriting the Alumnium core as a single binary in TypeScript. This will unify the Python and NPM packages under a single implementation, simplify deployment, and set the foundation for faster iteration.

We are also preparing to publish results for the WebVoyager benchmark - and as a sneak peek, a general-purpose coding assistant paired with Alumnium beats all currently published agents on it.