mcp-iceberg-service

MCP.Pizza Chef: ahodroj

The MCP Iceberg Service is a server implementation that integrates Apache Iceberg data lake catalogs with LLMs like Claude. It provides a SQL interface for querying and managing Iceberg tables, enabling real-time metadata search and data discovery through natural language prompts. Designed for seamless use with Claude Desktop, it simplifies data lake exploration and management by bridging LLMs with Iceberg's catalog and storage infrastructure.

Use this MCP server to:

  • Query Apache Iceberg tables using natural language prompts
  • Discover and search data lake metadata via an LLM interface
  • Manage Iceberg catalog tables through SQL commands
  • Integrate Iceberg data lake exploration into Claude Desktop
  • Enable real-time data discovery in S3-compatible storage
  • Automate metadata retrieval from Iceberg catalogs
  • Facilitate data lake governance with LLM-driven queries

README

MCP Iceberg Catalog

An MCP (Model Context Protocol) server implementation for interacting with Apache Iceberg. This server provides a SQL interface for querying and managing Iceberg tables through Claude Desktop.

Claude Desktop as your Iceberg Data Lake Catalog

How to Install in Claude Desktop

Installing via Smithery

To install MCP Iceberg Catalog for Claude Desktop automatically via Smithery:

npx -y @smithery/cli install @ahodroj/mcp-iceberg-service --client claude
  1. Prerequisites

    • Python 3.10 or higher
    • UV package installer (recommended) or pip
    • Access to an Iceberg REST catalog and S3-compatible storage
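  2. How to install in Claude Desktop: add the following configuration to claude_desktop_config.json, replacing PATH_TO_/mcp-iceberg-service with the path to your local copy of the repository and the placeholder values with your own catalog and storage settings: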

{
  "mcpServers": {
    "iceberg": {
      "command": "uv",
      "args": [
        "--directory",
        "PATH_TO_/mcp-iceberg-service",
        "run",
        "mcp-server-iceberg"
      ],
      "env": {
        "ICEBERG_CATALOG_URI" : "http://localhost:8181",
        "ICEBERG_WAREHOUSE" : "YOUR ICEBERG WAREHOUSE NAME",
        "S3_ENDPOINT" : "OPTIONAL IF USING S3",
        "AWS_ACCESS_KEY_ID" : "YOUR S3 ACCESS KEY",
        "AWS_SECRET_ACCESS_KEY" : "YOUR S3 SECRET KEY"
      }
    }
  }
}
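The env block above is the server's only source of connection settings. A minimal sketch of how those variables could be mapped onto PyIceberg catalog properties follows; it assumes this mapping rather than reproducing the project's code, and the helper names catalog_properties and connect are hypothetical. The property keys (type, uri, warehouse, s3.endpoint, s3.access-key-id, s3.secret-access-key) are standard PyIceberg REST catalog and S3 FileIO settings, and load_catalog is PyIceberg's catalog factory.

```python
import os
from typing import Mapping


def catalog_properties(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Map the Claude Desktop env block onto PyIceberg catalog properties.

    Hypothetical helper: the variable names come from claude_desktop_config.json,
    the property keys are PyIceberg's REST catalog and S3 FileIO settings.
    """
    # The catalog URI is required; a missing value raises KeyError early.
    props = {"type": "rest", "uri": env["ICEBERG_CATALOG_URI"]}

    # Optional settings are only passed through when they are set and non-empty.
    optional = {
        "ICEBERG_WAREHOUSE": "warehouse",
        "S3_ENDPOINT": "s3.endpoint",
        "AWS_ACCESS_KEY_ID": "s3.access-key-id",
        "AWS_SECRET_ACCESS_KEY": "s3.secret-access-key",
    }
    for var, key in optional.items():
        value = env.get(var)
        if value:
            props[key] = value
    return props


def connect(env: Mapping[str, str] = os.environ):
    """Open the catalog (requires pyiceberg to be installed)."""
    from pyiceberg.catalog import load_catalog

    return load_catalog("iceberg", **catalog_properties(env))
```

Leaving S3_ENDPOINT empty (for example when using AWS S3 itself) simply omits s3.endpoint, so PyIceberg falls back to its default endpoint resolution.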

Design

Architecture

The MCP server is built on three main components:

  1. MCP Protocol Handler

    • Implements the Model Context Protocol for communication with Claude
    • Handles request/response cycles through stdio
    • Manages server lifecycle and initialization
  2. Query Processor

    • Parses SQL queries using sqlparse
    • Supports operations:
      • LIST TABLES
      • DESCRIBE TABLE
      • SELECT
      • INSERT
  3. Iceberg Integration

    • Uses pyiceberg for table operations
    • Integrates with PyArrow for efficient data handling
    • Manages catalog connections and table operations
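The Query Processor's first job is to route each incoming statement to one of the four supported operations. The project does this with sqlparse; the sketch below uses plain regular expressions instead so that it runs on the standard library alone. The Command type and classify function are hypothetical names for illustration, not the server's actual API.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Command:
    kind: str             # "LIST_TABLES", "DESCRIBE_TABLE", "SELECT" or "INSERT"
    table: Optional[str]  # table identifier such as "ns.table", if the statement names one


# One anchored, case-insensitive pattern per supported operation.
# LIST TABLES has no capture group; the others capture the table identifier.
_PATTERNS = [
    ("LIST_TABLES", re.compile(r"^\s*LIST\s+TABLES\b", re.IGNORECASE)),
    ("DESCRIBE_TABLE", re.compile(r"^\s*DESCRIBE\s+TABLE\s+([\w.]+)", re.IGNORECASE)),
    ("SELECT", re.compile(r"^\s*SELECT\b.*?\bFROM\s+([\w.]+)", re.IGNORECASE | re.DOTALL)),
    ("INSERT", re.compile(r"^\s*INSERT\s+INTO\s+([\w.]+)", re.IGNORECASE)),
]


def classify(sql: str) -> Command:
    """Map a SQL string onto a supported operation, or raise ValueError."""
    for kind, pattern in _PATTERNS:
        match = pattern.match(sql)
        if match:
            table = match.group(1) if match.groups() else None
            return Command(kind, table)
    raise ValueError(f"Unsupported statement: {sql!r}")
```

Statements outside the supported set (UPDATE, DELETE, DDL) are rejected up front, which keeps the later PyIceberg layer from receiving anything it cannot execute.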

PyIceberg Integration

The server utilizes PyIceberg in several ways:

  1. Catalog Management

    • Connects to REST catalogs
    • Manages table metadata
    • Handles namespace operations
  2. Data Operations

    • Converts between PyIceberg and PyArrow types
    • Handles data insertion through PyArrow tables
    • Manages table schemas and field types
  3. Query Execution

    • Translates SQL to PyIceberg operations
    • Handles data scanning and filtering
    • Manages result set conversion
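As an illustration of translating SQL to PyIceberg operations: a simple SELECT maps fairly directly onto PyIceberg's Table.scan, whose selected_fields, row_filter and limit arguments correspond to the column list, the WHERE clause and LIMIT, and to_arrow() returns the result set as a PyArrow table. The sketch below assumes single-table statements without joins, grouping or ordering, and select_to_scan_args and run_select are hypothetical helper names. PyIceberg accepts row_filter as a string and parses it itself, so the WHERE text is passed through unchanged.

```python
import re
from typing import Any

# SELECT <cols> FROM <table> [WHERE <filter>] [LIMIT <n>] [;]
_SELECT = re.compile(
    r"^\s*SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<table>[\w.]+)"
    r"(?:\s+WHERE\s+(?P<where>.+?))?"
    r"(?:\s+LIMIT\s+(?P<limit>\d+))?\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def select_to_scan_args(sql: str) -> tuple[str, dict[str, Any]]:
    """Split a simple SELECT into a table identifier and Table.scan() keyword arguments."""
    match = _SELECT.match(sql)
    if match is None:
        raise ValueError(f"Not a simple SELECT: {sql!r}")
    cols = [c.strip() for c in match.group("cols").split(",")]
    # ("*",) is PyIceberg's way of selecting every column.
    kwargs: dict[str, Any] = {"selected_fields": tuple(cols)}
    if match.group("where"):
        kwargs["row_filter"] = match.group("where").strip()
    if match.group("limit"):
        kwargs["limit"] = int(match.group("limit"))
    return match.group("table"), kwargs


def run_select(catalog, sql: str):
    """Run the SELECT against a PyIceberg catalog and return a PyArrow table."""
    table_name, kwargs = select_to_scan_args(sql)
    return catalog.load_table(table_name).scan(**kwargs).to_arrow()
```

Pushing the filter and column projection into scan lets PyIceberg prune data files using table metadata instead of reading everything and filtering in Python.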

Further Implementation Needed

  1. Query Operations

    • Implement UPDATE operations
    • Add DELETE support
    • Support for CREATE TABLE with schema definition
    • Add ALTER TABLE operations
    • Implement table partitioning support
  2. Data Types

    • Support for complex types (arrays, maps, structs)
    • Add timestamp with timezone handling
    • Support for decimal types
    • Add nested field support
  3. Performance Improvements

    • Implement batch inserts
    • Add query optimization
    • Support for parallel scans
    • Add caching layer for frequently accessed data
  4. Security Features

    • Add authentication mechanisms
    • Implement role-based access control
    • Add row-level security
    • Support for encrypted connections
  5. Monitoring and Management

    • Add metrics collection
    • Implement query logging
    • Add performance monitoring
    • Support for table maintenance operations
  6. Error Handling

    • Improve error messages
    • Add retry mechanisms for transient failures
    • Implement transaction support
    • Add data validation
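The retry item under Error Handling could take the form of a small exponential-backoff wrapper around catalog and storage calls. This is one possible design, not something the server currently implements; the with_retries name and the choice of ConnectionError and TimeoutError as the retryable exceptions are assumptions.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.2,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient errors with exponential backoff and jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Delay doubles each attempt (0.2s, 0.4s, 0.8s, ...) plus up to 10% jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

A call such as with_retries(lambda: catalog.load_table("sales.orders")) would then survive a brief catalog outage, while non-transient errors such as a missing table propagate immediately.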

mcp-iceberg-service FAQ

How do I install the MCP Iceberg Service in Claude Desktop?
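Install it automatically with the Smithery CLI ('npx -y @smithery/cli install @ahodroj/mcp-iceberg-service --client claude'), or add the server entry to 'claude_desktop_config.json' manually, filling in your catalog URI, warehouse name, and S3 credentials.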
What are the prerequisites for running the MCP Iceberg Service?
Python 3.10+, UV or pip installer, access to an Iceberg REST catalog, and S3-compatible storage are required.
Can I use this MCP server with LLMs other than Claude?
While it is optimized for Claude Desktop, the server speaks the standard Model Context Protocol over stdio, so it should work with other MCP-compatible clients as well.
How does the MCP Iceberg Service handle data security?
It relies on secure access to Iceberg REST catalogs and S3 storage, with scoped permissions managed outside the MCP server.
What kind of queries can I run through this MCP server?
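The query processor currently supports LIST TABLES, DESCRIBE TABLE, SELECT, and INSERT, which covers listing tables, inspecting table metadata, reading data, and appending rows. UPDATE, DELETE, CREATE TABLE, and ALTER TABLE are not yet implemented.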
Is real-time metadata search supported?
Yes, the server enables real-time metadata discovery and search through natural language prompts.
How does this server improve data lake usability?
By allowing LLM-driven natural language queries, it simplifies complex SQL interactions and metadata exploration.
What storage systems are compatible with this MCP server?
It supports S3-compatible storage systems integrated with Apache Iceberg catalogs.