mcp-web-search

MCP.Pizza Chef: Claw256

mcp-web-search is an MCP server that integrates Google Custom Search capabilities with advanced filtering and web content viewing. It features markdown conversion, rate limiting, caching, browser instance pooling, and sophisticated bot detection avoidance using rebrowser-puppeteer. Designed for Bun runtime environments, it requires Google API credentials and supports authenticated site access via cookie management. This server enables real-time, structured web search and content retrieval for LLMs within the MCP ecosystem.

Use this MCP server to:

  • Perform Google searches with advanced filtering and custom queries
  • Retrieve web page content and convert it to markdown for LLM consumption
  • Cache search results to optimize repeated queries
  • Manage multiple browser instances for efficient web scraping
  • Avoid bot detection during automated web content retrieval
  • Authenticate and access protected web content using cookie files


Web Search MCP Server

An MCP server that provides Google search capabilities and web content viewing with advanced bot detection avoidance.

Features

  • Google Custom Search with advanced filtering
  • Web content viewing with markdown conversion
  • Rate limiting and caching
  • Browser instance pooling
  • Bot detection avoidance using rebrowser-puppeteer

Prerequisites

  • Bun runtime v1.0 or higher
  • Google API credentials (API key and Search Engine ID)

Installation

# Install dependencies
bun install

# Build the TypeScript files
bun run build

Configuration

Cookie Setup

For authenticated site access, you'll need to:

  1. Install the "Get cookies.txt LOCALLY" Chrome extension
  2. Visit the sites you want to authenticate with and log in
  3. Use the extension to export your cookies in JSON format
  4. Store the exported cookies file in a secure location
  5. Set the BROWSER_COOKIES_PATH environment variable to the absolute path of your cookies file
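The following is a minimal sketch of how an exported cookie file could be loaded and sanitized before use. It assumes the export is a JSON array whose entries carry Chrome-extension-style fields (name, value, domain, path, expirationDate in seconds). Check those field names against your actual export. The helper names parseCookies and loadCookiesFromEnv are hypothetical and not part of the server.

```typescript
import { readFileSync } from "node:fs";

// Assumed shape of one exported cookie (Chrome-extension style); verify against your file.
interface ExportedCookie {
  name: string;
  value: string;
  domain: string;
  path?: string;
  expirationDate?: number; // seconds since the Unix epoch; absent for session cookies
  secure?: boolean;
  httpOnly?: boolean;
}

// Puppeteer-style cookie parameters.
interface CookieParam {
  name: string;
  value: string;
  domain: string;
  path: string;
  expires?: number;
  secure?: boolean;
  httpOnly?: boolean;
}

// Hypothetical helper: parse the exported JSON and drop cookies that have already expired.
function parseCookies(json: string, nowSeconds: number = Date.now() / 1000): CookieParam[] {
  const raw: unknown = JSON.parse(json);
  if (!Array.isArray(raw)) {
    throw new Error("Expected the cookie file to contain a JSON array");
  }
  return (raw as ExportedCookie[])
    .filter((c) => c.expirationDate === undefined || c.expirationDate > nowSeconds)
    .map((c) => ({
      name: c.name,
      value: c.value,
      domain: c.domain,
      path: c.path ?? "/", // default to the site root when the export omits a path
      expires: c.expirationDate,
      secure: c.secure,
      httpOnly: c.httpOnly,
    }));
}

// Hypothetical helper: read the file named by BROWSER_COOKIES_PATH, or return no cookies if unset.
function loadCookiesFromEnv(env: Record<string, string | undefined>): CookieParam[] {
  const path = env.BROWSER_COOKIES_PATH;
  if (!path) return [];
  return parseCookies(readFileSync(path, "utf8"));
}
```

In a Puppeteer-based setup, the resulting array could be passed to something like page.setCookie(...loadCookiesFromEnv(process.env)) before navigating. Dropping expired entries up front avoids sending stale sessions that sites may treat as suspicious.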

MCP Server Configuration

Add the server configuration to your MCP settings file:

  • For Cline: %APPDATA%\Code\User\globalStorage\rooveterinaryinc.roo-cline\settings\cline_mcp_settings.json
  • For Claude Desktop:
    • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
    • Windows: %APPDATA%\Claude\claude_desktop_config.json
{
  "mcpServers": {
    "web-search": {
      "command": "bun",
      "args": [
        "run",
        "/ABSOLUTE/PATH/TO/web_search_mcp/dist/index.js"
      ],
      "env": {
        "GOOGLE_API_KEY": "your_api_key",
        "GOOGLE_SEARCH_ENGINE_ID": "your_search_engine_id",
        "MAX_CONCURRENT_BROWSERS": "3",
        "BROWSER_TIMEOUT": "30000",
        "RATE_LIMIT_WINDOW": "60000",
        "RATE_LIMIT_MAX_REQUESTS": "60",
        "SEARCH_CACHE_TTL": "3600",
        "VIEW_URL_CACHE_TTL": "7200",
        "MAX_CACHE_ITEMS": "1000",
        "BROWSER_POOL_MIN": "1",
        "BROWSER_POOL_MAX": "5",
        "BROWSER_POOL_IDLE_TIMEOUT": "30000",
        "REBROWSER_PATCHES_RUNTIME_FIX_MODE": "addBinding",
        "REBROWSER_PATCHES_SOURCE_URL": "jquery.min.js",
        "REBROWSER_PATCHES_UTILITY_WORLD_NAME": "util",
        "REBROWSER_PATCHES_DEBUG": "0",
        "BROWSER_COOKIES_PATH": "C:\\path\\to\\cookies.json",
        "LOG_LEVEL": "info",
        "NO_COLOR": "0",
        "BUN_FORCE_COLOR": "1",
        "FORCE_COLOR": "1"
      }
    }
  }
}

Replace /ABSOLUTE/PATH/TO/web_search_mcp with the absolute path to your server directory.
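To make the env block above concrete, here is a sketch of how a server might read those variables into a typed configuration. The defaults simply mirror the example values, and the server's real defaults and units may differ. The names loadConfig, intVar and requireVar are hypothetical.

```typescript
// Hypothetical typed view of the server's environment variables.
// Defaults mirror the example configuration; the real server may use different ones.
type Env = Record<string, string | undefined>;

interface ServerConfig {
  googleApiKey: string;
  searchEngineId: string;
  maxConcurrentBrowsers: number;
  browserTimeout: number;
  rateLimitWindow: number;
  rateLimitMaxRequests: number;
  searchCacheTtl: number;
  viewUrlCacheTtl: number;
  maxCacheItems: number;
  browserPoolMin: number;
  browserPoolMax: number;
  browserPoolIdleTimeout: number;
  logLevel: string;
}

// Read a variable that has no sensible default.
function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (!value) throw new Error(`Missing required environment variable ${name}`);
  return value;
}

// Read a non-negative integer, falling back to `fallback` when unset or empty.
function intVar(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw === "") return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 0) {
    throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
  }
  return n;
}

function loadConfig(env: Env): ServerConfig {
  const config: ServerConfig = {
    googleApiKey: requireVar(env, "GOOGLE_API_KEY"),
    searchEngineId: requireVar(env, "GOOGLE_SEARCH_ENGINE_ID"),
    maxConcurrentBrowsers: intVar(env, "MAX_CONCURRENT_BROWSERS", 3),
    browserTimeout: intVar(env, "BROWSER_TIMEOUT", 30000),
    rateLimitWindow: intVar(env, "RATE_LIMIT_WINDOW", 60000),
    rateLimitMaxRequests: intVar(env, "RATE_LIMIT_MAX_REQUESTS", 60),
    searchCacheTtl: intVar(env, "SEARCH_CACHE_TTL", 3600),
    viewUrlCacheTtl: intVar(env, "VIEW_URL_CACHE_TTL", 7200),
    maxCacheItems: intVar(env, "MAX_CACHE_ITEMS", 1000),
    browserPoolMin: intVar(env, "BROWSER_POOL_MIN", 1),
    browserPoolMax: intVar(env, "BROWSER_POOL_MAX", 5),
    browserPoolIdleTimeout: intVar(env, "BROWSER_POOL_IDLE_TIMEOUT", 30000),
    logLevel: env.LOG_LEVEL ?? "info",
  };
  // Catch an inconsistent pool size at startup rather than at the first browser launch.
  if (config.browserPoolMin > config.browserPoolMax) {
    throw new Error("BROWSER_POOL_MIN must not exceed BROWSER_POOL_MAX");
  }
  return config;
}
```

Failing fast on a missing API key or a malformed number surfaces configuration mistakes in the MCP client's server log. Otherwise they would appear only as failed tool calls.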

Logging Configuration

The following environment variables control logging behavior:

  • LOG_LEVEL: Sets the logging level (error, warn, info, debug). Default: info
  • NO_COLOR: Disables colored output when set to "1"
  • BUN_FORCE_COLOR: Controls colored output in Bun runtime (set to "0" to disable)
  • FORCE_COLOR: Controls colored output globally (set to "0" to disable)
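A small sketch of the logging rules listed above. It covers level filtering with the documented default of info, and color switches that turn output off for the documented values. The helper names are hypothetical. Two behaviors are assumptions: falling back to info for unknown levels, and enabling color when none of the variables is set.

```typescript
// Hypothetical helpers mirroring the documented logging variables.
type Env = Record<string, string | undefined>;

// Ordered from most to least severe.
const LEVELS = ["error", "warn", "info", "debug"] as const;
type Level = (typeof LEVELS)[number];

// Resolve LOG_LEVEL, falling back to "info" when unset or unrecognized (fallback is an assumption).
function resolveLevel(env: Env): Level {
  const raw = (env.LOG_LEVEL ?? "info").toLowerCase();
  return (LEVELS as readonly string[]).includes(raw) ? (raw as Level) : "info";
}

// Emit a message only if it is at least as severe as the configured level.
function shouldLog(message: Level, configured: Level): boolean {
  return LEVELS.indexOf(message) <= LEVELS.indexOf(configured);
}

// Color stays on unless one of the documented switches disables it.
// The default (on) and the precedence between the three variables are assumptions.
function colorEnabled(env: Env): boolean {
  if (env.NO_COLOR === "1") return false;
  if (env.BUN_FORCE_COLOR === "0") return false;
  if (env.FORCE_COLOR === "0") return false;
  return true;
}
```

With LOG_LEVEL set to info, for example, error, warn and info messages pass while debug messages are suppressed.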

Bot Detection Avoidance

This server uses rebrowser-puppeteer to avoid bot detection:

  1. Runtime.Enable Leak Prevention:

    • Uses the addBinding technique (REBROWSER_PATCHES_RUNTIME_FIX_MODE) to avoid Runtime.Enable detection
    • Works with web workers and iframes
    • Maintains access to the main world context
  2. Source URL Masking:

    • Changes Puppeteer's sourceURL to look like a legitimate script (REBROWSER_PATCHES_SOURCE_URL, e.g. jquery.min.js)
    • Helps avoid detection of automation tools
  3. Utility World Name:

    • Uses a generic utility world name (REBROWSER_PATCHES_UTILITY_WORLD_NAME, e.g. util)
    • Prevents detection through world name patterns
  4. Browser Launch Configuration:

    • Disables automation flags
    • Uses optimized Chrome arguments
    • Configures viewport and window settings
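The launch-configuration points above can be illustrated with a builder that produces an options object for puppeteer.launch(). headless, args, ignoreDefaultArgs and defaultViewport are standard Puppeteer launch options, and the Chrome flags shown are real switches. The concrete values, and the helper names buildLaunchOptions and applyRebrowserEnv, are illustrative assumptions rather than the server's actual settings. Filling in the rebrowser-patches variables before launching is likewise assumed to be early enough.

```typescript
// Subset of Puppeteer launch options used in this sketch.
interface LaunchOptions {
  headless: boolean;
  args: string[];
  ignoreDefaultArgs: string[];
  defaultViewport: { width: number; height: number } | null;
}

// Hypothetical builder; values are illustrative, not the server's configuration.
function buildLaunchOptions(width = 1366, height = 768, headless = true): LaunchOptions {
  return {
    headless,
    args: [
      // Keeps Blink from advertising automation (e.g. navigator.webdriver).
      "--disable-blink-features=AutomationControlled",
      // Give the window a common desktop size.
      `--window-size=${width},${height}`,
      "--no-first-run",
      "--no-default-browser-check",
    ],
    // Drop Puppeteer's default --enable-automation switch.
    ignoreDefaultArgs: ["--enable-automation"],
    // Match the viewport to the window so page metrics look consistent.
    defaultViewport: { width, height },
  };
}

// Hypothetical helper: fill in the rebrowser-patches variables from the example
// config unless the user already set them. Call before launching the browser.
function applyRebrowserEnv(env: Record<string, string | undefined>): void {
  env.REBROWSER_PATCHES_RUNTIME_FIX_MODE ??= "addBinding";
  env.REBROWSER_PATCHES_SOURCE_URL ??= "jquery.min.js";
  env.REBROWSER_PATCHES_UTILITY_WORLD_NAME ??= "util";
}
```

Keeping the viewport and window size identical avoids one easy fingerprinting signal: a page whose inner dimensions do not match any plausible window.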

Using with Claude Desktop

  1. Make sure you have Claude Desktop installed and updated to the latest version

  2. Open your Claude Desktop configuration file:

    • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
    • Windows: %APPDATA%\Claude\claude_desktop_config.json
  3. Add the server configuration as shown in the Configuration section above.

  4. Restart Claude Desktop

  5. Look for the hammer icon to confirm the tools are available

Available Tools

1. Search Tool

{
  name: "search",
  params: {
    query: string;
    trustedDomains?: string[];
    excludedDomains?: string[];
    resultCount?: number;
    safeSearch?: boolean;
    dateRestrict?: string;
  }
}
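One plausible way to map the search tool's parameters onto the Google Custom Search JSON API is sketched below. The endpoint and the query keys key, cx, q, num, safe and dateRestrict are real API parameters, and num is capped at 10 per request. The API's siteSearch parameter accepts a single site, so this sketch folds trusted and excluded domains into q as site: operators. That choice, and the helper name buildSearchUrl, are assumptions about how the server might do it.

```typescript
// Parameters of the search tool, as documented above.
interface SearchParams {
  query: string;
  trustedDomains?: string[];
  excludedDomains?: string[];
  resultCount?: number;
  safeSearch?: boolean;
  dateRestrict?: string; // e.g. "d7" = past 7 days, "m1" = past month
}

// Hypothetical mapping to a Custom Search JSON API request URL.
function buildSearchUrl(p: SearchParams, apiKey: string, cx: string): string {
  let q = p.query;
  // Restrict results to trusted domains with OR'ed site: operators.
  if (p.trustedDomains && p.trustedDomains.length > 0) {
    q += " (" + p.trustedDomains.map((d) => `site:${d}`).join(" OR ") + ")";
  }
  // Exclude domains with -site: operators.
  for (const d of p.excludedDomains ?? []) {
    q += ` -site:${d}`;
  }

  const params = new URLSearchParams({ key: apiKey, cx, q });
  // The API returns between 1 and 10 results per request.
  const num = Math.min(Math.max(p.resultCount ?? 10, 1), 10);
  params.set("num", String(num));
  if (p.safeSearch !== undefined) {
    params.set("safe", p.safeSearch ? "active" : "off");
  }
  if (p.dateRestrict) {
    params.set("dateRestrict", p.dateRestrict);
  }
  return `https://www.googleapis.com/customsearch/v1?${params.toString()}`;
}
```

Returning more than 10 results would require paging with the API's start parameter, which this sketch leaves out.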

2. View URL Tool

{
  name: "view_url",
  params: {
    url: string;
    includeImages?: boolean;
    includeVideos?: boolean;
    preserveLinks?: boolean;
    formatCode?: boolean;
  }
}
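MCP clients invoke a tool with a JSON-RPC "tools/call" request carrying the tool name and its arguments. The sketch below builds such a request for view_url. The up-front http(s) URL check is an illustrative assumption, not the server's documented validation, and buildViewUrlCall is a hypothetical name.

```typescript
// Arguments of the view_url tool, as documented above.
interface ViewUrlArgs {
  url: string;
  includeImages?: boolean;
  includeVideos?: boolean;
  preserveLinks?: boolean;
  formatCode?: boolean;
}

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: ViewUrlArgs };
}

// Hypothetical client-side helper that builds a tools/call request for view_url.
function buildViewUrlCall(id: number, args: ViewUrlArgs): ToolCallRequest {
  // new URL() throws a TypeError on malformed input.
  const parsed = new URL(args.url);
  // Only http(s) pages make sense for a browser-based viewer.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "view_url", arguments: args },
  };
}

// Example: view a page as markdown, keeping links but dropping images.
const request = buildViewUrlCall(1, {
  url: "https://example.com/docs",
  preserveLinks: true,
  includeImages: false,
});
console.log(JSON.stringify(request));
```

In normal use Claude Desktop or Cline builds this message for you. Seeing it spelled out helps when reading the MCP logs described under Troubleshooting.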

Troubleshooting

Claude Desktop Integration Issues

  1. Check the logs:

    # macOS
    tail -n 20 -f ~/Library/Logs/Claude/mcp*.log
    
    # Windows
    type %APPDATA%\Claude\Logs\mcp*.log
  2. Common issues:

    • Server not showing up: Check configuration file syntax and paths
    • Tool calls failing: Check server logs and restart Claude Desktop
    • Path issues: Ensure you're using absolute paths

For more detailed troubleshooting, refer to the MCP debugging guide.
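Several of the common issues above can be caught before restarting Claude Desktop. The sketch below is a hypothetical pre-flight check over the settings file's JSON: it reports invalid syntax, a missing "web-search" entry, relative script paths and missing Google credentials. Treating any argument ending in .js or .ts as a script path is an assumption.

```typescript
import { isAbsolute } from "node:path";

// Hypothetical pre-flight check for an MCP settings file; returns a list of problems.
function checkMcpConfig(jsonText: string, serverName = "web-search"): string[] {
  let config: any;
  try {
    config = JSON.parse(jsonText);
  } catch (err) {
    return [`Invalid JSON: ${(err as Error).message}`];
  }
  const server = config?.mcpServers?.[serverName];
  if (!server || typeof server !== "object") {
    return [`No "${serverName}" entry under "mcpServers"`];
  }
  const problems: string[] = [];
  if (!server.command) problems.push('Missing "command"');
  for (const arg of server.args ?? []) {
    // Assumption: any argument ending in .js or .ts is a script path that must be absolute.
    if (typeof arg === "string" && /\.(js|ts)$/.test(arg) && !isAbsolute(arg)) {
      problems.push(`Script path is not absolute: ${arg}`);
    }
  }
  for (const key of ["GOOGLE_API_KEY", "GOOGLE_SEARCH_ENGINE_ID"]) {
    if (!server.env?.[key]) problems.push(`env.${key} is not set`);
  }
  return problems;
}
```

Note that path.isAbsolute follows the rules of the platform it runs on, so run the check on the same OS as the MCP client.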

Development

# Run in development mode with watch
bun --watch run dev

# Run tests
bun run test

# Run linter
bun run lint

Important Notes

  1. Bot Detection:

    • The bot detection avoidance features help prevent most common detection methods
    • However, additional measures, such as good-quality proxies and realistic user agents, may be needed
    • Some websites may still detect automation through other means
  2. Performance:

    • Browser instances are pooled and reused
    • Idle browsers are automatically cleaned up
    • Resource limits prevent overloading
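The pooling behavior described above can be sketched as a small generic pool. It keeps at least BROWSER_POOL_MIN instances, never exceeds BROWSER_POOL_MAX, and destroys instances idle longer than BROWSER_POOL_IDLE_TIMEOUT. This is a hypothetical illustration, not the server's implementation. In the server, T would be a browser instance and create/destroy would launch and close it.

```typescript
interface PoolOptions<T> {
  create: () => Promise<T>;
  destroy: (item: T) => Promise<void>;
  min: number;
  max: number;
  idleTimeoutMs: number;
}

// Hypothetical minimal pool with min/max sizing and idle cleanup.
class Pool<T> {
  private readonly opts: PoolOptions<T>;
  private idle: { item: T; since: number }[] = []; // oldest first
  private inUse = new Set<T>();
  private creating = 0; // instances currently being created count toward max
  private waiters: ((item: T) => void)[] = [];

  constructor(opts: PoolOptions<T>) {
    this.opts = opts;
  }

  get size(): number {
    return this.idle.length + this.inUse.size + this.creating;
  }

  // Pre-create `min` instances so early requests skip startup cost.
  async warmUp(): Promise<void> {
    while (this.size < this.opts.min) {
      this.creating++;
      try {
        this.idle.push({ item: await this.opts.create(), since: Date.now() });
      } finally {
        this.creating--;
      }
    }
  }

  async acquire(): Promise<T> {
    // Reuse the most recently released instance first.
    const entry = this.idle.pop();
    if (entry) {
      this.inUse.add(entry.item);
      return entry.item;
    }
    if (this.size < this.opts.max) {
      this.creating++;
      try {
        const item = await this.opts.create();
        this.inUse.add(item);
        return item;
      } finally {
        this.creating--;
      }
    }
    // At capacity: wait until another caller releases an instance.
    return new Promise<T>((resolve) => this.waiters.push(resolve));
  }

  release(item: T): void {
    const waiter = this.waiters.shift();
    if (waiter) {
      waiter(item); // hand it straight to the next caller; it stays in use
      return;
    }
    this.inUse.delete(item);
    this.idle.push({ item, since: Date.now() });
  }

  // Destroy instances idle longer than idleTimeoutMs, never shrinking below min.
  async reapIdle(now: number = Date.now()): Promise<number> {
    let removed = 0;
    while (this.size > this.opts.min) {
      const oldest = this.idle[0];
      if (!oldest || now - oldest.since < this.opts.idleTimeoutMs) break;
      this.idle.shift();
      await this.opts.destroy(oldest.item);
      removed++;
    }
    return removed;
  }
}
```

A server would typically call reapIdle() on a timer, for example with setInterval at the idle-timeout period. Counting in-flight creations toward max keeps concurrent acquire() calls from overshooting the limit.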

License

MIT

mcp-web-search FAQ

How do I configure Google API credentials for mcp-web-search?
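Create an API key in the Google Cloud Console (with the Custom Search API enabled for the project) and create a search engine in the Programmable Search Engine control panel to get its Search Engine ID. Then set them as GOOGLE_API_KEY and GOOGLE_SEARCH_ENGINE_ID in the server configuration.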
What runtime environment does mcp-web-search require?
It requires Bun runtime version 1.0 or higher to run properly.
How does mcp-web-search avoid bot detection?
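It uses rebrowser-puppeteer, whose patches hide common automation leaks (Runtime.Enable detection, Puppeteer's sourceURL, the utility world name). It also uses browser launch settings that disable automation flags.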
Can mcp-web-search handle authenticated web content?
Yes, by exporting cookies using a Chrome extension and configuring the server to use them, it can access authenticated sites.
Does mcp-web-search support caching of search results?
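Yes. Search results and viewed pages are cached (SEARCH_CACHE_TTL, VIEW_URL_CACHE_TTL, MAX_CACHE_ITEMS), and requests are rate-limited (RATE_LIMIT_WINDOW, RATE_LIMIT_MAX_REQUESTS) to improve performance and reduce redundant queries.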
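As an illustration of those two mechanisms, here is a hypothetical size-capped TTL cache with least-recently-used eviction and a fixed-window rate limiter. They are in the spirit of the cache and rate-limit variables, not the server's code. The README does not state the units for these variables, so the sketch works in milliseconds throughout; convert as needed.

```typescript
// Hypothetical TTL cache capped at maxItems, evicting the least recently used entry.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly maxItems: number;

  constructor(ttlMs: number, maxItems: number) {
    this.ttlMs = ttlMs;
    this.maxItems = maxItems;
  }

  get(key: string, now: number = Date.now()): V | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (hit.expiresAt <= now) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    // Re-insert to mark as most recently used (Map preserves insertion order).
    this.entries.delete(key);
    this.entries.set(key, hit);
    return hit.value;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    this.entries.delete(key);
    if (this.entries.size >= this.maxItems) {
      // The first key in iteration order is the least recently used one.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }
}

// Hypothetical fixed-window limiter: at most maxRequests per windowMs.
class FixedWindowLimiter {
  private windowStart = 0;
  private count = 0;
  private readonly windowMs: number;
  private readonly maxRequests: number;

  constructor(windowMs: number, maxRequests: number) {
    this.windowMs = windowMs;
    this.maxRequests = maxRequests;
  }

  // Returns true if a request may proceed at time `now`.
  tryAcquire(now: number = Date.now()): boolean {
    if (now - this.windowStart >= this.windowMs) {
      this.windowStart = now; // start a fresh window
      this.count = 0;
    }
    if (this.count >= this.maxRequests) return false;
    this.count++;
    return true;
  }
}
```

A fixed window is the simplest limiter to reason about. Its known weakness is that up to twice the limit can pass around a window boundary, which a sliding-window or token-bucket limiter would smooth out.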
How is web content presented to the model?
Web content is converted to markdown format for easier parsing and readability by LLMs.
Is browser instance pooling supported?
Yes, the server manages a pool of browser instances to optimize resource usage during web scraping.