
nova-act-mcp

MCP.Pizza Chef: madtank

nova-act-mcp is an MCP server that exposes Amazon Nova Act browser automation tools, so agents can run multi-step browser control workflows. Screenshots are captured on demand: the agent decides when it needs visual feedback, which keeps responses small and reduces token usage.

Use This MCP server To

  • Automate multi-step web browsing workflows via MCP agents
  • Capture on-demand browser screenshots for visual context
  • Control browser sessions programmatically using the Amazon Nova Act SDK
  • Reduce token usage by selectively capturing visual feedback
  • Integrate browser automation into AI-enhanced workflows
  • Improve agent performance with smaller response payloads

README

nova-act-mcp


nova-act-mcp-server is a zero-install Model Context Protocol (MCP) server that exposes Amazon Nova Act browser-automation tools.

What's New in v3.0.0

  • On-Demand Screenshots: New inspect_browser tool to explicitly request screenshots only when needed
  • Reduced Token Usage: Browser actions no longer automatically include screenshots, saving context space
  • More Efficient Workflows: Agents can now control when to get visual feedback
  • Better Performance: Smaller response payloads improve overall agent experience

New inspect_browser Tool Example

# Start a browser session
start_result = await control_browser(action="start", url="https://example.com")
session_id = start_result["session_id"]

# Execute an action without getting a screenshot
execute_result = await control_browser(
    action="execute",
    session_id=session_id,
    instruction="Click on the 'More information...' link"
)

# Now explicitly request a screenshot to see the result
inspect_result = await inspect_browser(session_id=session_id)

# Example output from inspect_browser:
{
  "session_id": "f8a53291-b3a7-4e1e-8c9d-9a12b3c45d67",
  "current_url": "https://www.iana.org/domains/reserved",
  "page_title": "IANA — IANA-managed Reserved Domains",
  "content": [
    {
      "type": "image_base64",
      "data": "data:image/jpeg;base64,/9j/4AAQSkZJRgABAQEASABIAAD/2wBDAAMCA...",
      "caption": "Current viewport"
    },
    {
      "type": "text",
      "text": "Current URL: https://www.iana.org/domains/reserved\nPage Title: IANA — IANA-managed Reserved Domains"
    }
  ],
  "agent_thinking": [],
  "success": true
}
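
In the example output, the data field is a data URI rather than bare base64, so a client that wants the raw JPEG bytes has to strip the prefix before decoding. The following is a minimal sketch, assuming the response shape shown above; the helper name extract_screenshot is hypothetical and not part of the server.

```python
import base64
from typing import Optional


def extract_screenshot(result: dict) -> Optional[bytes]:
    """Return the first screenshot in an inspect_browser-style result as raw bytes.

    Hypothetical helper: assumes the "content" array layout shown in the
    example output above.
    """
    for item in result.get("content", []):
        if item.get("type") != "image_base64":
            continue
        data = item.get("data", "")
        # Drop a leading "data:image/jpeg;base64," style prefix if present.
        if data.startswith("data:") and "," in data:
            data = data.split(",", 1)[1]
        return base64.b64decode(data)
    return None


# Stand-in response with a three-byte payload instead of a real JPEG.
sample = {
    "content": [
        {
            "type": "image_base64",
            "data": "data:image/jpeg;base64,"
            + base64.b64encode(b"\xff\xd8\xff").decode("ascii"),
            "caption": "Current viewport",
        },
        {"type": "text", "text": "Current URL: https://example.com"},
    ],
    "success": True,
}
image_bytes = extract_screenshot(sample)
```

The helper returns None when the content array holds no image entry, so callers can tell "no screenshot available" apart from a decoding error.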

What's New in v0.2.9

  • Improved Screenshot Reliability: More dependable screenshot delivery in responses
  • Enhanced Log Path Discovery: Smart, efficient path tracking for logs and screenshots
  • Better Agent Communication: Clear messaging when screenshots can't be embedded
  • Improved Performance: Eliminated inefficient directory scanning for faster responses

What's New in v0.2.8

  • Enhanced Inline Screenshots: Screenshots now appear directly in the response content array
  • Improved compatibility with vision-capable models like Claude
  • Screenshots include descriptive captions based on the executed instruction
  • Each screenshot is delivered as { type: "image_base64", data: "..." } in the content array

What's New in v0.2.7

  • Automatic Inline Screenshots: Every browser action now includes an optimized screenshot
  • Improved screenshot quality and reliability for AI agents
  • Added environment variables to customize screenshot quality and size limits
  • Comprehensive test coverage ensuring screenshots work in all scenarios

New Feature: Inline Screenshots

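As of v0.2.7, every successful execute response contains inline_screenshot, a base64-encoded JPEG of the current viewport. (From v3.0.0 onward, browser actions no longer include screenshots automatically; use inspect_browser instead.)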

  • Quality ≈ 45, hard-capped at 250 KB (configurable via NOVA_MCP_MAX_INLINE_IMG env variable)
  • If the raw JPEG is larger than the cap, the field is null
  • No extra API calls needed - screenshots are included automatically
  • For full-resolution images and HAR/HTML logs, use the compress_logs tool
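
The size-cap rule above can be sketched as follows. This is an illustration, not the server's actual code: the function name inline_screenshot is hypothetical, and it assumes that NOVA_MCP_MAX_INLINE_IMG holds a size in bytes and that 250 KB means 250 * 1024 bytes, neither of which is stated in this README.

```python
import base64
import os
from typing import Optional

# Assumption: 250 KB is taken as 250 * 1024 bytes.
DEFAULT_MAX_INLINE_IMG = 250 * 1024


def inline_screenshot(jpeg_bytes: bytes, cap: Optional[int] = None) -> Optional[str]:
    """Return the JPEG as a base64 string, or None if it exceeds the cap.

    Hypothetical helper. When no cap is passed, it is read from the
    NOVA_MCP_MAX_INLINE_IMG environment variable (assumed to be in bytes).
    """
    if cap is None:
        cap = int(os.environ.get("NOVA_MCP_MAX_INLINE_IMG", DEFAULT_MAX_INLINE_IMG))
    if len(jpeg_bytes) > cap:
        # Oversized image: the field is left empty rather than truncated.
        return None
    return base64.b64encode(jpeg_bytes).decode("ascii")


small = inline_screenshot(b"\xff\xd8" + b"\x00" * 100, cap=1024)
too_big = inline_screenshot(b"\x00" * 2048, cap=1024)
```

Returning None for an oversized image, instead of a truncated one, matches the "field is null" behavior described in the list.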

What's New in v0.2.6

  • Added compatibility with NovaAct SDK 0.9+ by normalizing log directory handling
  • Improved test organization with clear markers for unit, mock, smoke and e2e tests
  • Moved mock HTML creation logic from production code to test helpers
  • Fixed several syntax errors and incomplete code blocks
  • Added SCREENSHOT_QUALITY constant for consistent compression settings

Quick start (uvx)

Add it to your MCP client configuration:

{
  "mcpServers": {
    "nova-act-mcp-server": {
      "command": "uvx",
      "args": ["nova-act-mcp-server@latest"],
      "env": { "NOVA_ACT_API_KEY": "<your_api_key>" }
    }
  }
}

That's all you need to start controlling browsers from any MCP-compatible client such as Claude Desktop or VS Code.
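
If the client already has other servers configured, the entry has to be merged into the existing mcpServers object rather than replacing it. The following is a minimal sketch of that merge on an in-memory config; the function name add_nova_act_server and the other-server entry are hypothetical, and reading or writing the client's actual config file is left out because its location depends on the client.

```python
import json


def add_nova_act_server(config: dict, api_key: str) -> dict:
    """Return a copy of an MCP client config with the nova-act-mcp-server entry added.

    Hypothetical helper: existing servers are kept and the input is not modified.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["nova-act-mcp-server"] = {
        "command": "uvx",
        "args": ["nova-act-mcp-server@latest"],
        "env": {"NOVA_ACT_API_KEY": api_key},
    }
    merged["mcpServers"] = servers
    return merged


# Stand-in for a config that already lists another (made-up) server.
existing = {"mcpServers": {"other-server": {"command": "other"}}}
updated = add_nova_act_server(existing, "<your_api_key>")
print(json.dumps(updated, indent=2))
```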

Local development (optional)

git clone https://github.com/madtank/nova-act-mcp.git
cd nova-act-mcp
uv sync
uv run nova_mcp.py

License

MIT

nova-act-mcp FAQ

How does nova-act-mcp reduce token usage during browser automation?
Agents request screenshots only on demand through inspect_browser. Browser actions no longer attach a screenshot automatically, which saves context space.
Can nova-act-mcp control multiple browser sessions simultaneously?
Yes, it supports managing multiple browser sessions via the Amazon Nova Act SDK, enabling complex multi-step workflows.
What is the inspect_browser tool in nova-act-mcp?
The inspect_browser tool lets agents explicitly request browser screenshots only when needed, optimizing workflow efficiency.
How does nova-act-mcp improve agent performance?
Responses are smaller because screenshots are omitted unless the agent asks for one, so each step consumes less of the agent's context.
Is nova-act-mcp compatible with different LLM providers?
Yes, it is designed to work with various LLMs including OpenAI, Anthropic Claude, and Google Gemini, through the MCP protocol.
What programming languages or environments support nova-act-mcp?
The server itself is written in Python and launched with uvx. It can be used from any MCP-compatible client, such as Claude Desktop or VS Code.
How do I start a browser session using nova-act-mcp?
Call the control_browser tool with action="start" and a url. The response includes a session_id to pass to later calls, as shown in the inspect_browser example in the README.
Does nova-act-mcp require installation?
No separate installation step is needed: the MCP client launches it through uvx, as shown in the quick start configuration.