unstructured-mcp

MCP.Pizza Chef: MKhalusova

The unstructured-mcp server provides Model Context Protocol support for processing a wide variety of unstructured document formats. It enables large language models to extract, interpret, and utilize content from documents such as PDFs, Word files, spreadsheets, images, and more. This server supports over 50 file types and requires an Unstructured API key and Claude Desktop for operation. It is designed to integrate unstructured document data into AI workflows, enhancing real-time context and interaction capabilities.

Use This MCP Server To

  • Extract text and metadata from diverse unstructured document formats
  • Enable LLMs to read and interpret PDFs, Word, Excel, and image files
  • Convert unstructured documents into structured data for analysis
  • Integrate document content into AI-driven workflows and applications
  • Support multi-format document ingestion for knowledge base creation
  • Automate document summarization and content extraction tasks
  • Facilitate real-time document querying within AI assistants
  • Enhance LLM context with rich document data from various file types

README

A Model Context Protocol server that provides unstructured document processing capabilities. This server enables LLMs to extract and use content from an unstructured document.

This repo is a work in progress, so proceed with caution :)

Supported file types:

{".abw", ".bmp", ".csv", ".cwk", ".dbf", ".dif", ".doc", ".docm", ".docx", ".dot",
 ".dotm", ".eml", ".epub", ".et", ".eth", ".fods", ".gif", ".heic", ".htm", ".html",
 ".hwp", ".jpeg", ".jpg", ".md", ".mcw", ".mw", ".odt", ".org", ".p7s", ".pages",
 ".pbd", ".pdf", ".png", ".pot", ".potm", ".ppt", ".pptm", ".pptx", ".prn", ".rst",
 ".rtf", ".sdp", ".sgl", ".svg", ".sxg", ".tiff", ".txt", ".tsv", ".uof", ".uos1",
 ".uos2", ".web", ".webp", ".wk2", ".xls", ".xlsb", ".xlsm", ".xlsx", ".xlw", ".xml",
 ".zabw"}

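As a rough illustration of how the list above could be used, the sketch below checks a filename against the supported extensions before sending a document for processing. The helper name `is_supported` is hypothetical, and the case-insensitive comparison is an assumption; the server itself may match extensions differently.

```python
from pathlib import Path

# Extensions copied from the supported file types list above.
SUPPORTED_EXTENSIONS = {
    ".abw", ".bmp", ".csv", ".cwk", ".dbf", ".dif", ".doc", ".docm", ".docx", ".dot",
    ".dotm", ".eml", ".epub", ".et", ".eth", ".fods", ".gif", ".heic", ".htm", ".html",
    ".hwp", ".jpeg", ".jpg", ".md", ".mcw", ".mw", ".odt", ".org", ".p7s", ".pages",
    ".pbd", ".pdf", ".png", ".pot", ".potm", ".ppt", ".pptm", ".pptx", ".prn", ".rst",
    ".rtf", ".sdp", ".sgl", ".svg", ".sxg", ".tiff", ".txt", ".tsv", ".uof", ".uos1",
    ".uos2", ".web", ".webp", ".wk2", ".xls", ".xlsb", ".xlsm", ".xlsx", ".xlw", ".xml",
    ".zabw",
}


def is_supported(filename: str) -> bool:
    """Return True if the file's extension appears in the supported list.

    Hypothetical helper: lowercasing the suffix is an assumption made here,
    not something the README specifies.
    """
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `is_supported("report.PDF")` is true under this sketch, while a file with no extension or with `.zip` is rejected.
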
Prerequisites: You'll need:

  • Unstructured API key. Learn how to obtain one at https://docs.unstructured.io/api-reference/partition/overview#get-started
  • Claude Desktop installed locally
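
Before launching the server, it can help to confirm that the API key is actually in place. The sketch below assumes the key is stored as a `KEY=value` line in a `.env` file, as the setup steps describe; the function names `read_env_file` and `has_api_key` are hypothetical, and how the server itself loads the key is not shown in this README.

```python
def read_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_api_key(env_text: str) -> bool:
    """Return True if UNSTRUCTURED_API_KEY is present and non-empty."""
    return bool(read_env_file(env_text).get("UNSTRUCTURED_API_KEY"))
```

A typical use would be `has_api_key(Path(".env").read_text())` from the repo root, with `Path` imported from `pathlib`.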

Quick TLDR on how to add this MCP to your Claude Desktop:

  1. Clone the repo and set up the UV environment.
  2. Create a .env file in the root directory and add the following env variable: UNSTRUCTURED_API_KEY.
  3. Run the MCP server: uv run doc_processor.py
  4. Go to ~/Library/Application Support/Claude/ (the Claude Desktop config directory on macOS) and create a claude_desktop_config.json file if it does not already exist. In that file add:
{
    "mcpServers": {
        "unstructured_doc_processor": {
            "command": "PATH/TO/YOUR/UV",
            "args": [
                "--directory",
                "ABSOLUTE/PATH/TO/YOUR/unstructured-mcp/",
                "run",
                "doc_processor.py"
            ],
            "disabled": false
        }
    }
}
  5. Restart Claude Desktop. You should now be able to use the MCP.
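
If claude_desktop_config.json already lists other MCP servers, the new entry needs to be merged in rather than pasted over them. The sketch below shows one way to build the entry from step 4 and add it to an existing config without touching other servers. The function names `build_server_entry` and `merge_server` are hypothetical, the `other_server` entry is made-up sample data, and the placeholder paths are left as in the README; substitute your own.

```python
import json


def build_server_entry(uv_path: str, repo_dir: str) -> dict:
    """Build the server entry shown in step 4 for the given paths."""
    return {
        "command": uv_path,
        "args": ["--directory", repo_dir, "run", "doc_processor.py"],
        "disabled": False,
    }


def merge_server(config: dict, name: str, entry: dict) -> dict:
    """Return a copy of config with the entry added under mcpServers."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


# Sample existing config with an unrelated, made-up server entry.
existing = {"mcpServers": {"other_server": {"command": "node", "args": []}}}
entry = build_server_entry(
    "PATH/TO/YOUR/UV",
    "ABSOLUTE/PATH/TO/YOUR/unstructured-mcp/",
)
updated = merge_server(existing, "unstructured_doc_processor", entry)
print(json.dumps(updated, indent=4))
```

The printed JSON can then be saved as claude_desktop_config.json. Working on copies keeps the original config dictionary unchanged, which makes it easy to compare before and after.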

unstructured-mcp FAQ

How do I obtain the Unstructured API key required for this server?
You can get an Unstructured API key by following the instructions at https://docs.unstructured.io/api-reference/partition/overview#get-started.

What document formats does unstructured-mcp support?
It supports over 50 formats including PDF, DOCX, XLSX, PPTX, HTML, TXT, CSV, images like PNG and JPEG, and many others.

Is Claude Desktop mandatory to use this server?
Yes, Claude Desktop must be installed locally as part of the prerequisites to operate the unstructured-mcp server.

Can this server handle scanned documents or images with text?
Yes, it supports image formats such as PNG, JPEG, TIFF, and can process text content within them if OCR capabilities are integrated.

Is the unstructured-mcp server production-ready?
The repository is currently a work in progress, so caution is advised when using it in production environments.

How does this server integrate with LLMs like OpenAI, Claude, or Gemini?
It exposes document content in a structured format via MCP, allowing LLMs from providers like OpenAI, Anthropic Claude, and Google Gemini to consume and reason over the data.

What are the main prerequisites to run unstructured-mcp?
You need an Unstructured API key and a local installation of Claude Desktop to run the server.

Can I extend support for additional file types?
Since the project is open source, you can contribute or customize the server to add support for more document formats.