web-eval-agent

MCP.Pizza Chef: Operative-Sh

The web-eval-agent is an MCP server that autonomously navigates, tests, and debugs web applications by driving a browser instance. It captures network traffic, console errors, and logs, providing a rich UX report directly within your code editor. This server enables end-to-end automated testing and debugging workflows, accelerating development cycles and improving code quality through real-time feedback.

Use This MCP Server To

  • Autonomously test web app functionality end-to-end
  • Capture and analyze network traffic during web app usage
  • Collect and report console errors and logs from web apps
  • Generate detailed UX evaluation reports with screenshots
  • Integrate automated web debugging into code editor workflows
  • Validate code changes by running real browser-based tests
  • Filter and return relevant network requests for context
  • Enable autonomous agents to verify their own code outputs

README

🚀 operative.sh web-eval-agent MCP Server

Let the coding agent debug itself, you've got better things to do.

🔥 Supercharge Your Debugging

operative.sh's MCP Server launches a browser-use powered agent to autonomously execute and debug web apps directly in your code editor.

⚑ Features

  • 🌐 Navigate your webapp using BrowserUse (2x faster with operative backend)
  • πŸ“Š Capture network traffic - requests are intelligently filtered and returned into the context window
  • 🚨 Collect console errors - captures logs & errors
  • πŸ€– Autonomous debugging - the Cursor agent calls the web QA agent mcp server to test if the code it wrote works as epected end-to-end.
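
The README does not describe how network requests are filtered. As a purely illustrative sketch (not the project's actual logic), one plausible approach is to drop successful static-asset requests and keep API calls and failures. The `NetworkRequest` type, the suffix list, and `filter_requests` are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


# Hypothetical record type; the agent's real internal representation may differ.
@dataclass
class NetworkRequest:
    method: str
    url: str
    status: int


# Assumed list of static-asset suffixes that rarely help with debugging.
STATIC_SUFFIXES = (".css", ".png", ".jpg", ".svg", ".woff2", ".ico", ".map")


def filter_requests(requests: list[NetworkRequest]) -> list[NetworkRequest]:
    """Keep failed requests and non-static requests; drop successful static assets."""
    kept = []
    for req in requests:
        path = urlparse(req.url).path.lower()
        is_static = path.endswith(STATIC_SUFFIXES)
        is_error = req.status >= 400
        if is_error or not is_static:
            kept.append(req)
    return kept


sample = [
    NetworkRequest("GET", "http://localhost:3000/logo.png", 200),
    NetworkRequest("POST", "http://localhost:3000/api/signup", 500),
    NetworkRequest("GET", "http://localhost:3000/api/user", 200),
    NetworkRequest("GET", "http://localhost:3000/missing.css", 404),
]
filtered = filter_requests(sample)
for req in filtered:
    print(req.method, req.url, req.status)
```

Keeping every request with a status of 400 or higher means a broken stylesheet or image still shows up in the context window, while routine asset loads do not crowd out the API traffic.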

🧰 MCP Tool Reference

Tool | Purpose
web_eval_agent | 🤖 Automated UX evaluator that drives the browser, captures screenshots, console & network logs, and returns a rich UX report.
setup_browser_state | 🔒 Opens an interactive (non-headless) browser so you can sign in once; the saved cookies/local-storage are reused by subsequent web_eval_agent runs.

Key arguments

  • web_eval_agent

    • url (required) – address of the running app (e.g. http://localhost:3000)
    • task (required) – natural-language description of what to test ("run through the signup flow and note any UX issues")
    • headless_browser (optional, default false) – set to true to hide the browser window
  • setup_browser_state

    • url (optional) – page to open first (handy to land directly on a login screen)

You can trigger these tools straight from your IDE chat, for example:

Evaluate my app at http://localhost:3000 – run web_eval_agent with the task "Try the full signup flow and report UX issues".
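
Behind a chat prompt like the one above, an MCP client sends the server a JSON-RPC `tools/call` request. The sketch below shows roughly what that request could look like for the arguments listed under "Key arguments". The helper name `build_web_eval_call` and the `request_id` parameter are assumptions made for illustration; your editor builds this message for you.

```python
import json


def build_web_eval_call(
    url: str,
    task: str,
    headless_browser: bool = False,
    request_id: int = 1,
) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request for the web_eval_agent tool."""
    # Both arguments are marked as required in the tool reference.
    if not url or not task:
        raise ValueError("both 'url' and 'task' are required")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "web_eval_agent",
            "arguments": {
                "url": url,
                "task": task,
                # Documented default is false, i.e. the browser window is shown.
                "headless_browser": headless_browser,
            },
        },
    }


request = build_web_eval_call(
    "http://localhost:3000",
    "Try the full signup flow and report UX issues",
)
print(json.dumps(request, indent=2))
```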

🏁 Quick Start (macOS/Linux)

  1. Pre-requisites (typically not needed):
    • brew: /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
    • npm: brew install npm
    • jq: brew install jq
  2. Run the installer after getting an API key (free):
curl -LSf https://operative.sh/install.sh -o install.sh && bash install.sh && rm install.sh
  3. Restart your IDE to apply the changes
  4. Send a prompt in chat mode to call the web eval agent tool, e.g.:
Test my app on http://localhost:3000. Use web-eval-agent.

🛠️ Manual Installation

  1. Get your API key at operative.sh
  2. Install uv:
curl -LsSf https://astral.sh/uv/install.sh | sh
  3. Install Playwright:
npm install -g chromium playwright && uvx --with playwright playwright install --with-deps
  4. Add the JSON below to your code editor's MCP configuration, filling in your API key
  5. Restart your code editor

🔃 Updating

  • uv cache clean
  • refresh MCP server
    "web-eval-agent": {
      "command": "uvx",
      "args": [
        "--refresh-package",
        "webEvalAgent",
        "--from",
        "git+https://github.com/Operative-Sh/web-eval-agent.git",
        "webEvalAgent"
      ],
      "env": {
        "OPERATIVE_API_KEY": "<YOUR_KEY>"
      }
    }
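
If you prefer to script this step, the sketch below merges the entry above into an existing configuration without disturbing other servers. It assumes the editor keeps its servers under a top-level "mcpServers" object, which is common but not confirmed by this README; check your editor's documentation for the actual file location and key. The helper `add_web_eval_agent` and the key `sk-example` are placeholders.

```python
import copy
import json

# Server entry copied from the JSON fragment above.
WEB_EVAL_AGENT_ENTRY = {
    "command": "uvx",
    "args": [
        "--refresh-package",
        "webEvalAgent",
        "--from",
        "git+https://github.com/Operative-Sh/web-eval-agent.git",
        "webEvalAgent",
    ],
    "env": {"OPERATIVE_API_KEY": "<YOUR_KEY>"},
}


def add_web_eval_agent(config: dict, api_key: str) -> dict:
    """Return a copy of `config` with the web-eval-agent entry added or replaced."""
    updated = copy.deepcopy(config)
    entry = copy.deepcopy(WEB_EVAL_AGENT_ENTRY)
    entry["env"]["OPERATIVE_API_KEY"] = api_key
    # Assumption: servers live under a top-level "mcpServers" object.
    updated.setdefault("mcpServers", {})["web-eval-agent"] = entry
    return updated


# Hypothetical existing config with one unrelated server.
existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
merged = add_web_eval_agent(existing, "sk-example")
print(json.dumps(merged, indent=2))
```

Working on a deep copy leaves the loaded configuration untouched, so you can inspect the merged result before writing it back to disk.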

Manual Installation (Windows + Cursor/Cline/Windsurf)

We're still refining this; please open an issue if you run into problems!

  1. Do all this in your code editor terminal
  2. curl -LSf https://operative.sh/install.sh -o install.sh && bash install.sh && rm install.sh
  3. Get your API key at operative.sh
  4. Install uv (curl -LsSf https://astral.sh/uv/install.sh | sh)
  5. uvx --from git+https://github.com/Operative-Sh/web-eval-agent.git playwright install
  6. Restart code editor

🚨 Issues

  • If your code editor isn't picking up updates, run uv cache clean, then update or reinstall to get the latest version
  • For any other problems, feel free to open an issue on this repo or ask in the Discord!
  • 5/5 - static apps without changes weren't screencasting; fixed! Run uv cache clean and restart to get the fix

Changelog

  • 4/29 - Agent overlay update – pause/play/stop agent run in the browser

📋 Example MCP Server Output Report

📊 Web Evaluation Report for http://localhost:5173 complete!
📝 Task: Test the API-key deletion flow by navigating to the API Keys section, deleting a key, and judging the UX.

🔍 Agent Steps
  📍 1. Navigate → http://localhost:5173
  📍 2. Click     "Login"        (button index 2)
  📍 3. Click     "API Keys"     (button index 4)
  📍 4. Click     "Create Key"   (button index 9)
  📍 5. Type      "Test API Key" (input index 2)
  📍 6. Click     "Done"         (button index 3)
  📍 7. Click     "Delete"       (button index 10)
  📍 8. Click     "Delete"       (confirm index 3)
🏁 Flow tested successfully – UX felt smooth and intuitive.

🖥️ Console Logs (10)
  1. [debug] [vite] connecting…
  2. [debug] [vite] connected.
  3. [info]  Download the React DevTools …
     …

🌐 Network Requests (10)
  1. GET /src/pages/SleepingMasks.tsx                   304
  2. GET /src/pages/MCPRegistryRegistry.tsx             304
     …

⏱️ Chronological Timeline
  01:16:23.293 🖥️ Console [debug] [vite] connecting…
  01:16:23.303 🖥️ Console [debug] [vite] connected.
  01:16:23.312 ➡️ GET /src/pages/SleepingMasks.tsx
  01:16:23.318 ⬅️ 304 /src/pages/SleepingMasks.tsx
     …
  01:17:45.038 🤖 🏁 Flow finished – deletion verified
  01:17:47.038 🤖 📋 Conclusion repeated above
👁️  See the "Operative Control Center" dashboard for live logs.
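
Because each timeline line starts with an HH:MM:SS.mmm timestamp, a report like the one above is easy to post-process. The following is a minimal sketch, assuming that line format holds and that the run does not cross midnight; `parse_timeline` is a hypothetical helper, not part of web-eval-agent, and the sample lines are trimmed copies of the timeline above with the icons removed.

```python
import re
from datetime import datetime

# Matches a leading HH:MM:SS.mmm timestamp followed by the event text.
TIMESTAMP = re.compile(r"^\s*(\d{2}:\d{2}:\d{2}\.\d{3})\s+(.*)$")


def parse_timeline(lines: list[str]) -> list[tuple[datetime, str]]:
    """Return (timestamp, event text) pairs, skipping lines without a timestamp."""
    events = []
    for line in lines:
        match = TIMESTAMP.match(line)
        if match:
            # No date is present in the report, so only time differences are meaningful.
            stamp = datetime.strptime(match.group(1), "%H:%M:%S.%f")
            events.append((stamp, match.group(2)))
    return events


timeline = [
    "  01:16:23.293 Console [debug] [vite] connecting…",
    "  01:16:23.303 Console [debug] [vite] connected.",
    "     …",
    "  01:17:47.038 Conclusion repeated above",
]
events = parse_timeline(timeline)
elapsed = (events[-1][0] - events[0][0]).total_seconds()
print(f"{len(events)} events over {elapsed:.3f} s")
```

The elided "…" line is skipped rather than guessed at, so the elapsed time reflects only the first and last timestamps that actually appear in the report.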


Built with <3 @ operative.sh

web-eval-agent FAQ

How does web-eval-agent capture network traffic?
It intelligently filters and captures network requests during browser navigation, returning them into the context window for analysis.
Can web-eval-agent run autonomously without manual intervention?
Yes, it is designed to autonomously navigate and debug web applications, enabling automated end-to-end testing.
What types of errors does web-eval-agent collect?
It collects console errors, logs, and other runtime issues encountered during web app execution.
How does web-eval-agent integrate with code editors?
It runs as an MCP server that can be called by agents within code editors to provide real-time debugging and UX evaluation.
Does web-eval-agent support capturing screenshots?
Yes, it captures screenshots as part of its UX evaluation reports to provide visual context.
What is BrowserUse and how is it related?
BrowserUse is the underlying browser-driving technology that web-eval-agent uses to navigate web applications efficiently.
Can web-eval-agent help verify if code changes work as expected?
Yes, it enables autonomous agents to test and confirm that the code they wrote functions correctly in a real browser environment.
Is web-eval-agent suitable for continuous integration workflows?
Absolutely, it can be integrated into CI pipelines to automate web app testing and debugging.