
ToolRAG

MCP.Pizza Chef: antl3x

ToolRAG is an MCP server that lets LLMs use an unlimited number of tools by dynamically retrieving only the function definitions relevant to each user query. By using semantic vector search to select context-aware tools, it avoids context window limits, cuts token costs, and prevents the performance degradation that comes from overloading a model with tool definitions, optimizing both cost and response quality in real-time interactions.

Use This MCP Server To

  • Dynamically select relevant LLM tools for user queries
  • Reduce token usage by limiting function definitions sent to LLMs
  • Improve LLM response performance by avoiding context overload
  • Enable infinite scaling of LLM tool integrations
  • Optimize cost by minimizing unnecessary tool context
  • Provide context-aware tool retrieval using semantic search
  • Support multi-step reasoning with relevant tool access
  • Integrate diverse LLM tools without manual context management

README


ToolRAG
Infinite LLM tools, zero context constraints
Context-aware tool retrieval for large language models.

Introduction

ToolRAG provides a seamless solution for using an unlimited number of function definitions with Large Language Models (LLMs), without worrying about context window limitations, costs, or performance degradation.

🌟 Key Features

  • Unlimited Tool Definitions: Say goodbye to context window constraints. ToolRAG dynamically selects only the most relevant tools for each query.
  • Semantic Tool Search: Uses vector embeddings to find the most contextually relevant tools for a given user query.
  • Cost Optimization: Reduces token usage by only including the most relevant function definitions.
  • Performance Improvement: Prevents performance degradation that occurs when overwhelming LLMs with too many function definitions.
  • MCP Integration: Works with any Model Context Protocol (MCP) compliant servers, enabling access to a wide ecosystem of tools.
  • OpenAI Compatible: Format tools as OpenAI function definitions for seamless integration.

🔍 How It Works

  1. Tool Registration: ToolRAG connects to MCP servers and registers available tools.
  2. Embedding Generation: Tool descriptions and parameters are embedded using vector embeddings (OpenAI or Google).
  3. Query Analysis: When a user query comes in, ToolRAG finds the most relevant tools via semantic search.
  4. Tool Execution: Execute selected tools against the appropriate MCP servers.
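The embedding and query-analysis steps above boil down to ranking tools by cosine similarity between the query embedding and each tool embedding. The following is an illustrative sketch only, not the library's internals: the tiny three-dimensional vectors stand in for real embeddings from a model like OpenAI's, and the tool names are made up.

```typescript
interface ToolEntry {
  name: string;
  embedding: number[]; // vector for the tool's description + parameters
}

// Steps 1-2: registered tools with precomputed (here: hand-made) embeddings.
const registry: ToolEntry[] = [
  { name: "calendar_list_events", embedding: [0.9, 0.1, 0.0] },
  { name: "stripe_get_balance", embedding: [0.0, 0.2, 0.9] },
  { name: "send_email", embedding: [0.3, 0.8, 0.1] },
];

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Step 3: rank registered tools by similarity to the embedded query
// and keep only the top k for the LLM's context.
function topK(queryEmbedding: number[], k: number): string[] {
  return [...registry]
    .sort(
      (x, y) =>
        cosine(queryEmbedding, y.embedding) - cosine(queryEmbedding, x.embedding)
    )
    .slice(0, k)
    .map((t) => t.name);
}

// A query about calendar events embeds close to the calendar tool.
console.log(topK([0.85, 0.15, 0.05], 1)); // → ["calendar_list_events"]
```

Only the selected names' full function definitions are then sent to the LLM, which is where the token savings come from.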

Installation

npm install @antl3x/toolrag
# or
yarn add @antl3x/toolrag
# or
pnpm add @antl3x/toolrag

🚀 Quick Start

import { ToolRAG } from "@antl3x/toolrag";
import OpenAI from "openai";

// Initialize ToolRAG with MCP servers
const toolRag = await ToolRAG.init({
  mcpServers: [
    "https://mcp.pipedream.net/token/google_calendar",
    "https://mcp.pipedream.net/token/stripe",
    // Add as many tool servers as you need!
  ],
});

const userQuery =
  "What events do I have tomorrow? Also, check my stripe balance.";

// Get relevant tools for a specific query
const client = new OpenAI();

const response = await client.responses.create({
  model: "gpt-4o",
  input: userQuery,
  tools: await toolRag.listTools(userQuery),
});

// Execute the function calls from the LLM response
for (const call of response.output.filter(
  (item) => item.type === "function_call"
)) {
  const result = await toolRag.callTool(call.name, JSON.parse(call.arguments));
  console.log(result);
}

🏗️ Architecture

ToolRAG uses a Retrieval-Augmented Generation (RAG) approach optimized for tools:

  1. Storage: LibSQL database to store tool definitions and their vector embeddings
  2. Retrieval: Cosine similarity search to find the most relevant tools
  3. Execution: Direct integration with MCP servers for tool execution

👨‍💻 Use Cases

  • Multi-tool AI Assistants: Build assistants that can access hundreds of APIs
  • Enterprise Systems: Connect to internal tools and services without context limits
  • AI Platforms: Provide a unified interface for tool discovery and execution

🔧 Configuration Options

ToolRAG offers flexible configuration options:

  • Multiple embedding providers (OpenAI, Google)
  • Customizable relevance thresholds
  • Database configuration for persistence
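Putting those options together, a configured `init` call might look like the sketch below. The option names `embeddingProvider`, `relevanceThreshold`, and `db` are hypothetical illustrations of the three configuration areas listed above; check the package's type definitions for the actual configuration surface.

```typescript
import { ToolRAG } from "@antl3x/toolrag";

// Hypothetical option names, shown only to illustrate the configuration
// areas above; they may not match the library's real API.
const toolRag = await ToolRAG.init({
  mcpServers: ["https://mcp.pipedream.net/token/google_calendar"],
  embeddingProvider: "openai",    // or "google"
  relevanceThreshold: 0.75,       // minimum similarity for a tool to be included
  db: { url: "file:toolrag.db" }, // LibSQL database for persisted embeddings
});
```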

📝 License

Apache License 2.0

ToolRAG FAQ

How does ToolRAG handle unlimited tool definitions without context limits?
ToolRAG uses semantic vector search to dynamically retrieve only the most relevant tools per query, avoiding context window overload and enabling unlimited tool usage.
Can ToolRAG reduce token costs when using many LLM tools?
Yes, by including only relevant function definitions, ToolRAG minimizes token usage and reduces associated costs.
Does ToolRAG improve LLM performance?
Yes, by preventing context overload and focusing on relevant tools, it maintains high LLM response quality and speed.
Is ToolRAG compatible with multiple LLM providers?
Yes, ToolRAG is provider-agnostic and works with OpenAI, Anthropic Claude, Google Gemini, and others.
How does ToolRAG select relevant tools for a query?
It uses vector embeddings to semantically match user queries with the most contextually appropriate tools.
Can ToolRAG support complex workflows requiring multiple tools?
Yes, it enables multi-step reasoning by dynamically retrieving all necessary tools for the task.
What are the deployment requirements for ToolRAG?
ToolRAG requires a vector database for embeddings and integration with your MCP client and LLM environment.
How does ToolRAG optimize cost and performance simultaneously?
By limiting tool context to only relevant functions, it reduces token usage and prevents LLM slowdowns.