macos-screen-mcp

MCP.Pizza Chef: jhead

The macos-screen-mcp is an MCP server designed to enable LLMs to capture screenshots and control windows on macOS. It supports capturing screenshots by window title or ID, listing visible windows, finding windows by title or owner, and sending keyboard events. This server facilitates real-time interaction with macOS graphical environments, making it ideal for automation, testing, and AI-driven desktop control workflows.

Use this MCP server to:

  • Capture screenshots of specific macOS windows by title or ID
  • List all visible windows on a macOS desktop
  • Find windows by title or owner name for targeted actions
  • Send keyboard key press events to active macOS windows
  • Automate GUI testing by capturing window states and sending inputs
  • Enable AI agents to visually monitor and interact with macOS apps

README

macOS Screen View & Control MCP Server

A Model Context Protocol server that provides window screenshot and keyboard control capabilities. It enables LLMs to capture screenshots of specific windows on macOS, by window title or window ID, and to send key presses and typed text to the active window.

Available Tools

  • capture_window_screenshot - Captures a screenshot of a specific window by its title or ID

    • window_identifier (string, required): Window title to search for or window ID
    • search_in_owner (boolean, optional): Whether to search in window owner names (default: true)
    • format (string, optional): Output format, either "binary" or "base64" (default: "binary")
  • list_windows - Lists all visible windows

    • No parameters required
  • find_window - Finds a window by title or owner name

    • title (string, required): Window title or owner name to search for
    • search_in_owner (boolean, optional): Whether to search in window owner names (default: true)
  • send_key - Sends a keyboard key press event to the active window

    • key (string, required): The key to press (e.g., 'a', 'return', 'space')
    • modifiers (list of strings, optional): List of modifier keys to hold (e.g., ['command', 'shift'])
  • type_text - Types a sequence of text characters

    • text (string, required): The text to type
    • delay (float, optional): Delay between keystrokes in seconds (default: 0.1)
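MCP clients invoke these tools by sending a JSON-RPC 2.0 request with the `tools/call` method, naming the tool and passing its arguments. The sketch below is a minimal illustration, not code from this project: the helper `build_tool_call` and the required-argument table are hypothetical, derived from the parameter list above, and only check that required arguments are present before serializing the request.

```python
import json
from typing import Any

# Required arguments per tool, copied by hand from the parameter list above.
# This table is an illustrative assumption; it is not read from the server.
REQUIRED_ARGS: dict[str, set[str]] = {
    "capture_window_screenshot": {"window_identifier"},
    "list_windows": set(),
    "find_window": {"title"},
    "send_key": {"key"},
    "type_text": {"text"},
}


def build_tool_call(request_id: int, tool: str, **arguments: Any) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request for one of the tools above."""
    if tool not in REQUIRED_ARGS:
        raise ValueError(f"unknown tool: {tool}")
    # Reject the request early if a required argument was left out.
    missing = REQUIRED_ARGS[tool] - set(arguments)
    if missing:
        raise ValueError(f"{tool} is missing required arguments: {sorted(missing)}")
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(request)
```

For example, `build_tool_call(2, "send_key", key="c", modifiers=["command"])` produces the request for Command+C. Optional arguments such as `format` or `delay` are passed through unchanged and left for the server to validate.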

Supported Keys

The following keys are supported:

  • Letters: a-z (case-insensitive)
  • Numbers: 0-9
  • Special keys: return, tab, space, delete, escape
  • Arrow keys: up_arrow, down_arrow, left_arrow, right_arrow
  • Modifier keys: command, shift, control, option (also right_shift, right_option, right_control)
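A client can check key names against this list before calling `send_key`, so that typos fail locally instead of at the server. This is a minimal sketch of a hypothetical client-side helper (`validate_key_press` is not part of the server); it assumes named keys must match exactly as listed, while single letters are lowercased because letters are documented as case-insensitive.

```python
import string
from typing import List, Optional, Tuple

# Key names copied from the "Supported Keys" list above.
LETTERS = set(string.ascii_lowercase)
DIGITS = set(string.digits)
SPECIAL = {"return", "tab", "space", "delete", "escape"}
ARROWS = {"up_arrow", "down_arrow", "left_arrow", "right_arrow"}
MODIFIERS = {
    "command", "shift", "control", "option",
    "right_shift", "right_option", "right_control",
}
ALL_KEYS = LETTERS | DIGITS | SPECIAL | ARROWS | MODIFIERS


def validate_key_press(
    key: str, modifiers: Optional[List[str]] = None
) -> Tuple[str, List[str]]:
    """Return a normalized (key, modifiers) pair, or raise ValueError."""
    # Single letters are case-insensitive, so fold them to lowercase.
    normalized = key.lower() if len(key) == 1 and key.isalpha() else key
    if normalized not in ALL_KEYS:
        raise ValueError(f"unsupported key: {key!r}")
    mods = list(modifiers or [])
    unknown = [m for m in mods if m not in MODIFIERS]
    if unknown:
        raise ValueError(f"unsupported modifiers: {unknown}")
    return normalized, mods
```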

Examples

Send a single key press:

await send_key("return")

Send a key with modifiers:

await send_key("c", ["command"])  # Command+C (copy)

Type text:

await type_text("Hello, World!")
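Keyboard shortcuts are often written as one string, such as "command+shift+s". As a minimal sketch, a hypothetical helper `parse_shortcut` (not part of the server) can split such a string into the `key` and `modifiers` arguments used in the examples above; it assumes "+" as the separator and treats the last component as the key.

```python
from typing import List, Tuple

# Modifier names from the "Supported Keys" list.
MODIFIERS = {
    "command", "shift", "control", "option",
    "right_shift", "right_option", "right_control",
}


def parse_shortcut(shortcut: str) -> Tuple[str, List[str]]:
    """Split e.g. "command+c" into ("c", ["command"]) for send_key."""
    # Normalize whitespace and case around each "+"-separated component.
    parts = [p.strip().lower() for p in shortcut.split("+")]
    if any(p == "" for p in parts):
        raise ValueError(f"malformed shortcut: {shortcut!r}")
    *mods, key = parts
    unknown = [m for m in mods if m not in MODIFIERS]
    if unknown:
        raise ValueError(f"unknown modifiers: {unknown}")
    return key, mods
```

The result fits the call form shown above, for example `send_key(*parse_shortcut("command+c"))`.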

Installation

Using pip

Install macos_screen_mcp via pip:

pip install git+ssh://git@github.com/jhead/macos-screen-mcp.git

After installation, you can run it as a script using:

python -m macos_screen_mcp

Configuration


Add to your Claude or Cursor settings:

{
  "mcpServers": {
    "macos-screen": {
      "name": "macos-screen",
      "url": "http://localhost:8000/sse",
      "description": "MCP server for capturing window screenshots",
      "version": "1.0.0"
    }
  }
}
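If the settings file already lists other servers, it is safer to merge this entry than to overwrite the file. The sketch below is illustrative only: the settings path is supplied by the caller (its location depends on the client, so check the client's own documentation), and `merge_server_entry` and `add_to_settings_file` are hypothetical helper names; the entry simply mirrors the configuration shown above.

```python
import json
from pathlib import Path

# Entry mirroring the "macos-screen" configuration shown above.
MACOS_SCREEN_ENTRY = {
    "name": "macos-screen",
    "url": "http://localhost:8000/sse",
    "description": "MCP server for capturing window screenshots",
    "version": "1.0.0",
}


def merge_server_entry(settings: dict, key: str, entry: dict) -> dict:
    """Return a copy of `settings` with `entry` stored under mcpServers[key]."""
    merged = dict(settings)
    # Copy the nested dict too, so the caller's settings are not mutated.
    servers = dict(merged.get("mcpServers", {}))
    servers[key] = entry
    merged["mcpServers"] = servers
    return merged


def add_to_settings_file(path: Path) -> None:
    """Read a JSON settings file (if present), merge the entry, and write it back."""
    settings = json.loads(path.read_text()) if path.exists() else {}
    updated = merge_server_entry(settings, "macos-screen", MACOS_SCREEN_ENTRY)
    path.write_text(json.dumps(updated, indent=2) + "\n")
```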

Debugging

You can use the MCP inspector to debug the server:

npx @modelcontextprotocol/inspector python -m macos_screen_mcp

Contributing

We encourage contributions to help expand and improve macos-screen-mcp. Whether you want to add new tools, enhance existing functionality, or improve documentation, your input is valuable.

Pull requests are welcome! Feel free to contribute new ideas, bug fixes, or enhancements to make macos-screen-mcp even more powerful and useful.

License

macos-screen-mcp is licensed under the MIT License. This means you are free to use, modify, and distribute the software, subject to the terms and conditions of the MIT License. For more details, please see the LICENSE file in the project repository.

macos-screen-mcp FAQ

How does macos-screen-mcp identify windows for screenshots?
It locates windows by title (by default also matching the owning application's name) or by window ID, then captures the matching window.
Can I capture screenshots in different formats?
Yes, screenshots can be output in binary or base64 formats as needed.
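With the base64 format, a client has to decode the payload before saving it. The README does not document the result shape or image type, so this sketch assumes tool results follow the MCP image content shape (`{"type": "image", "data": <base64>, "mimeType": ...}`); `extract_images` and `save_images` are hypothetical helper names.

```python
import base64
from pathlib import Path
from typing import Dict, List, Tuple


def extract_images(content: List[Dict]) -> List[Tuple[bytes, str]]:
    """Decode every image item in a tool result's content list.

    Assumes MCP-style image items; non-image items (e.g. text) are skipped.
    """
    images = []
    for item in content:
        if item.get("type") == "image":
            # validate=True rejects characters outside the base64 alphabet.
            raw = base64.b64decode(item["data"], validate=True)
            images.append((raw, item.get("mimeType", "application/octet-stream")))
    return images


def save_images(content: List[Dict], directory: Path) -> List[Path]:
    """Write decoded images to numbered files, choosing an extension from the MIME type."""
    extensions = {"image/png": ".png", "image/jpeg": ".jpg"}
    paths = []
    for index, (raw, mime) in enumerate(extract_images(content)):
        path = directory / f"screenshot_{index}{extensions.get(mime, '.bin')}"
        path.write_bytes(raw)
        paths.append(path)
    return paths
```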
Does macos-screen-mcp support sending keyboard inputs?
Yes, it can send keyboard key press events to the active window on macOS.
Is it possible to list all visible windows using this MCP server?
Yes, the server provides a tool to list all currently visible windows on macOS.
Can I search windows by owner name instead of title?
Yes, the find_window tool supports searching by window owner names as well as titles.
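The README does not specify the matching rules. As one plausible sketch, assuming window records shaped like the dictionaries macOS Quartz returns from `CGWindowListCopyWindowInfo` (keys `kCGWindowNumber`, `kCGWindowName`, `kCGWindowOwnerName`), case-insensitive substring matching, and that an all-digit query is treated as a window ID, the `search_in_owner` flag could be applied like this. `find_windows` is a hypothetical name, not the server's code.

```python
from typing import Dict, List


def find_windows(
    windows: List[Dict], query: str, search_in_owner: bool = True
) -> List[Dict]:
    """Return windows whose ID equals `query`, or whose title (or owner) contains it."""
    # Assumption: an all-digit query is a window ID rather than a title.
    if query.isdigit():
        return [w for w in windows if w.get("kCGWindowNumber") == int(query)]
    needle = query.lower()
    matches = []
    for w in windows:
        # Titles can be missing or None, so fall back to an empty string.
        title = (w.get("kCGWindowName") or "").lower()
        owner = (w.get("kCGWindowOwnerName") or "").lower()
        if needle in title or (search_in_owner and needle in owner):
            matches.append(w)
    return matches
```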
What kind of applications benefit from macos-screen-mcp?
It is useful for automation, AI-driven desktop control, GUI testing, and real-time window monitoring.
Is macos-screen-mcp compatible with multiple LLM providers?
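Yes. The server speaks MCP rather than a provider-specific API, so it can be used with models such as OpenAI, Anthropic Claude, and Google Gemini through any MCP-compatible client.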