mcp-doc-forge

MCP.Pizza Chef: cablate

mcp-doc-forge is an MCP server for document processing. It reads DOCX, PDF, TXT, HTML, and CSV files, converts DOCX to HTML or PDF and HTML to TXT or Markdown, and merges or splits PDFs. It also handles text in multiple encodings, text formatting, cleaning, comparison, and splitting, as well as HTML content processing. Through the Model Context Protocol, these document workflows become available to AI-powered applications.

Use This MCP Server To

  • Read and extract content from DOCX, PDF, TXT, HTML, and CSV files
  • Convert DOCX documents to HTML or PDF formats
  • Transform HTML content into plain text or Markdown
  • Merge multiple PDF files into one document
  • Split large PDFs into smaller parts
  • Clean and format text data for NLP processing
  • Compare two text documents and generate diffs
  • Split text by lines or custom delimiters
  • Process and clean HTML content for downstream use
  • Support multi-encoding text transfer and conversion

README

Simple Document Processing MCP Server

A powerful Model Context Protocol (MCP) server providing comprehensive document processing capabilities.

Features

Document Reader

  • Read DOCX, PDF, TXT, HTML, CSV

Document Conversion

  • DOCX to HTML/PDF conversion
  • HTML to TXT/Markdown conversion
  • PDF manipulation (merge, split)

Text Processing

  • Multi-encoding transfer support (UTF-8, Big5, GBK)
  • Text formatting and cleaning
  • Text comparison and diff generation
  • Text splitting by lines or delimiter
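
To make the comparison and splitting features concrete, here is a minimal Python sketch of what "split by lines or delimiter" and "diff generation" typically mean. It uses only the Python standard library and is not the server's own implementation; the function names `split_text` and `diff_text` are hypothetical and chosen for illustration.

```python
import difflib
from typing import List, Optional


def split_text(text: str, delimiter: Optional[str] = None) -> List[str]:
    """Split by lines when no delimiter is given, otherwise by the delimiter."""
    if delimiter is None:
        return text.splitlines()
    return text.split(delimiter)


def diff_text(old: str, new: str) -> List[str]:
    """Return a unified diff of two texts, one list entry per diff line."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="old",
            tofile="new",
            lineterm="",  # keep entries free of trailing newlines
        )
    )


if __name__ == "__main__":
    print(split_text("a;b;c", ";"))
    print("\n".join(diff_text("one\ntwo", "one\nthree")))
```

A unified diff is only one possible output format; the actual tool may return its comparison in a different shape.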

HTML Processing

  • HTML cleaning and formatting
  • Resource extraction (images, links, videos)
  • Structure-preserving conversion
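
As an illustration of resource extraction, the sketch below collects image, link, and video URLs from an HTML string using Python's built-in `html.parser`. It is an assumption about what such a step looks like, not the server's code; the class `ResourceExtractor` and the function `extract_resources` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Dict, List


class ResourceExtractor(HTMLParser):
    """Collect image, link, and video URLs while parsing HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.resources: Dict[str, List[str]] = {
            "images": [],
            "links": [],
            "videos": [],
        }
        self._in_video = False  # True while inside a <video> element

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "img" and attr.get("src"):
            self.resources["images"].append(attr["src"])
        elif tag == "a" and attr.get("href"):
            self.resources["links"].append(attr["href"])
        elif tag == "video":
            self._in_video = True
            if attr.get("src"):
                self.resources["videos"].append(attr["src"])
        elif tag == "source" and self._in_video and attr.get("src"):
            # <source> also appears in <audio> and <picture>; count it
            # only when it is nested in a <video> element.
            self.resources["videos"].append(attr["src"])

    def handle_endtag(self, tag):
        if tag == "video":
            self._in_video = False


def extract_resources(html: str) -> Dict[str, List[str]]:
    parser = ResourceExtractor()
    parser.feed(html)
    parser.close()
    return parser.resources
```

URLs are returned exactly as written in the markup; resolving relative paths against a base URL is left out of this sketch.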

Installation

Installing via Smithery

To install Document Processing Server for Claude Desktop automatically via Smithery:

npx -y @smithery/cli install @cablate/mcp-doc-forge --client claude

Manual Installation

npm install -g @cablate/mcp-doc-forge

Usage

CLI

mcp-doc-forge

With Dive Desktop

  1. Click "+ Add MCP Server" in Dive Desktop
  2. Copy and paste this configuration:
{
  "mcpServers": {
    "mcp-doc-forge": {
      "command": "npx",
      "args": [
        "-y",
        "@cablate/mcp-doc-forge"
      ],
      "enabled": true
    }
  }
}
  3. Click "Save" to install the MCP server

License

MIT

Contributing

Community participation and contributions are welcome! Here are ways to contribute:

  • ⭐️ Star the project if you find it helpful
  • 🐛 Submit Issues: Report problems or provide suggestions
  • 🔧 Create Pull Requests: Submit code improvements

Contact

If you have any questions or suggestions, feel free to reach out:

  • 📧 Email: reahtuoo310109@gmail.com
  • 📧 GitHub: CabLate
  • 🤝 Collaboration: Open to discussing project cooperation
  • 📚 Technical Guidance: Suggestions and guidance are sincerely welcome

mcp-doc-forge FAQ

How do I integrate mcp-doc-forge with my existing MCP client?
Add mcp-doc-forge to your MCP client's server configuration, for example with the command npx and the arguments -y and @cablate/mcp-doc-forge. The client then launches the server and can use its document processing capabilities within your workflow.
What document formats does mcp-doc-forge support for reading?
It supports DOCX, PDF, TXT, HTML, and CSV file formats for reading and extraction.
Can mcp-doc-forge convert documents between formats?
Yes, it can convert DOCX to HTML or PDF, and HTML to TXT or Markdown formats.
Does mcp-doc-forge support PDF manipulation?
Yes, it can merge multiple PDFs into one and split PDFs into smaller documents.
What text processing features are available?
It supports multi-encoding transfers (UTF-8, Big5, GBK), text formatting, cleaning, comparison, diff generation, and splitting by lines or delimiters.
Is mcp-doc-forge capable of processing HTML content?
Yes, it includes HTML cleaning and processing features to prepare content for further use.
How does mcp-doc-forge handle different text encodings?
It supports multiple encodings including UTF-8, Big5, and GBK to ensure proper text handling across languages.
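
For readers unfamiliar with encoding transfer, the following Python sketch shows the general idea: decode bytes from the source encoding into text, then encode that text into the target encoding. It relies on Python's built-in codecs and is not the server's implementation; the function name `transcode` is hypothetical.

```python
def transcode(
    data: bytes,
    source_encoding: str,
    target_encoding: str,
    errors: str = "strict",
) -> bytes:
    """Re-encode bytes from one text encoding to another.

    With errors="strict", undecodable or unencodable characters raise
    UnicodeDecodeError or UnicodeEncodeError instead of being silently
    dropped.
    """
    text = data.decode(source_encoding, errors=errors)
    return text.encode(target_encoding, errors=errors)


if __name__ == "__main__":
    big5_bytes = "中文".encode("big5")
    utf8_bytes = transcode(big5_bytes, "big5", "utf-8")
    print(utf8_bytes.decode("utf-8"))
```

The same characters map to different byte sequences in Big5 and GBK, so the source encoding has to be known or detected before conversion.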
Can mcp-doc-forge be used with various LLM providers?
Yes. As an MCP server it is provider-agnostic, so it can be used with models such as OpenAI, Claude, Gemini, and others through any MCP-compatible client.