SourceSage is an MCP (Model Context Protocol) server that memorizes key aspects of a codebase (logic, style, and standards) and supports dynamic updates and fast retrieval. It is designed to be language-agnostic, relying on the LLM's understanding of code across multiple languages.
- Language Agnostic: Works with any programming language the LLM understands
- Knowledge Graph Storage: Efficiently stores code entities, relationships, patterns, and style conventions
- LLM-Driven Analysis: Relies on the LLM to analyze code and provide insights
- Token-Efficient Storage: Optimizes for minimal token usage while maximizing memory capacity
- Incremental Updates: Updates knowledge when code changes without redundant storage
- Fast Retrieval: Enables quick and accurate retrieval of relevant information
SourceSage uses a novel approach where:
- The LLM analyzes code files (in any language)
- The LLM uses MCP tools to register entities, relationships, patterns, and style conventions
- SourceSage stores this knowledge in a token-efficient graph structure
- The LLM can later query this knowledge when needed
This approach leverages the LLM's inherent language understanding while focusing the MCP server on efficient memory management.
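The division of labor described above can be sketched with a small in-memory graph. This is an illustrative assumption, not SourceSage's actual storage code: the `Entity` and `KnowledgeGraph` classes are hypothetical, and only the method and field names are borrowed from the MCP tools this README lists.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Entity:
    """One node in the graph; field names mirror the register_entity inputs."""
    name: str
    entity_type: str
    summary: str
    observations: List[str] = field(default_factory=list)


class KnowledgeGraph:
    """Hypothetical in-memory store, not SourceSage's real implementation."""

    def __init__(self) -> None:
        self.entities: dict = {}
        self.relationships: Set[Tuple[str, str, str]] = set()

    def register_entity(self, name: str, entity_type: str, summary: str) -> Entity:
        # Re-registering an existing name updates it in place, so analyzing
        # a changed file again does not create duplicate nodes.
        existing = self.entities.get(name)
        if existing is not None:
            existing.entity_type = entity_type
            existing.summary = summary
            return existing
        entity = Entity(name, entity_type, summary)
        self.entities[name] = entity
        return entity

    def register_relationship(self, from_entity: str, to_entity: str,
                              relationship_type: str) -> None:
        # A set stores each (source, type, target) edge only once.
        self.relationships.add((from_entity, relationship_type, to_entity))

    def query_entities(self, entity_type: Optional[str] = None,
                       name_pattern: Optional[str] = None) -> List[Entity]:
        # name_pattern is treated as a regex searched anywhere in the name
        # (an assumption about how the real tool matches).
        regex = re.compile(name_pattern) if name_pattern else None
        return [
            e for e in self.entities.values()
            if (entity_type is None or e.entity_type == entity_type)
            and (regex is None or regex.search(e.name))
        ]


graph = KnowledgeGraph()
graph.register_entity("User", "class", "Account holder")
graph.register_entity("Profile", "class", "Public details for a User")
graph.register_entity("User", "class", "Account holder with login state")  # update
graph.register_relationship("Profile", "User", "references")
print([e.name for e in graph.query_entities(entity_type="class", name_pattern="^U")])
```

Keying entities by name is one simple way to get incremental updates without redundant storage; the real server may use IDs or a different deduplication rule.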
# Clone the repository
git clone https://github.com/yourusername/sourcesage.git
cd sourcesage
# Install the package
pip install -e .
# Run the server
sourcesage
# Or run directly from the repository
python -m sourcesage.mcp_server
- Open Claude for Desktop
- Go to Settings > Developer > Edit Config
- Add the following to your claude_desktop_config.json:
If you've installed the package:
{
  "mcpServers": {
    "sourcesage": {
      "command": "sourcesage",
      "args": []
    }
  }
}
If you're running from a local directory without installing:
{
  "mcpServers": {
    "sourcesage": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/sourcesage",
        "run",
        "main.py"
      ]
    }
  }
}
- Restart Claude for Desktop
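If claude_desktop_config.json already lists other servers, the SourceSage entry has to be merged under the existing "mcpServers" key rather than replacing the file. The sketch below shows one way to do that merge in Python; the `add_server` helper and the "other-server" entry are hypothetical, and the in-memory dict stands in for the parsed file so no real path is assumed.

```python
import json


def add_server(config: dict, name: str, command: str, args: list) -> dict:
    """Return a copy of config with one entry added under "mcpServers"."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"command": command, "args": list(args)}
    merged["mcpServers"] = servers
    return merged


# Stand-in for the parsed contents of an existing claude_desktop_config.json.
existing = {"mcpServers": {"other-server": {"command": "other", "args": []}}}

updated = add_server(existing, "sourcesage", "sourcesage", [])
print(json.dumps(updated, indent=2))
```

Writing the result back with `json.dumps` also guarantees valid JSON, which avoids hand-editing mistakes such as trailing commas.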
SourceSage provides the following MCP tools:
- register_entity: Register a code entity in the knowledge graph
  Input:
  - name: Name of the entity (e.g., class name, function name)
  - entity_type: Type of entity (class, function, module, etc.)
  - summary: Brief description of the entity
  - signature: Entity signature (optional)
  - language: Programming language (optional)
  - observations: List of observations about the entity (optional)
  - metadata: Additional metadata (optional)
  Output: Confirmation message with entity ID
- register_relationship: Register a relationship between entities
  Input:
  - from_entity: Name of the source entity
  - to_entity: Name of the target entity
  - relationship_type: Type of relationship (calls, inherits, imports, etc.)
  - metadata: Additional metadata (optional)
  Output: Confirmation message with relationship ID
- register_pattern: Register a code pattern
  Input:
  - name: Name of the pattern
  - description: Description of the pattern
  - language: Programming language (optional)
  - example: Example code demonstrating the pattern (optional)
  - metadata: Additional metadata (optional)
  Output: Confirmation message with pattern ID
- register_style_convention: Register a coding style convention
  Input:
  - name: Name of the convention
  - description: Description of the convention
  - language: Programming language (optional)
  - examples: Example code snippets demonstrating the convention (optional)
  - metadata: Additional metadata (optional)
  Output: Confirmation message with convention ID
- add_entity_observation: Add an observation to an entity
  Input:
  - entity_name: Name of the entity
  - observation: Observation to add
  Output: Confirmation message
- query_entities: Query entities in the knowledge graph
  Input:
  - entity_type: Filter by entity type (optional)
  - language: Filter by programming language (optional)
  - name_pattern: Filter by name pattern (regex, optional)
  - limit: Maximum number of results to return (optional)
  Output: List of matching entities
- get_entity_details: Get detailed information about an entity
  Input:
  - entity_name: Name of the entity
  Output: Detailed information about the entity
- query_patterns: Query code patterns in the knowledge graph
  Input:
  - language: Filter by programming language (optional)
  - pattern_name: Filter by pattern name (optional)
  Output: List of matching patterns
- query_style_conventions: Query coding style conventions
  Input:
  - language: Filter by programming language (optional)
  - convention_name: Filter by convention name (optional)
  Output: List of matching style conventions
- get_knowledge_statistics: Get statistics about the knowledge graph
  Input: None
  Output: Statistics about the knowledge graph
- clear_knowledge: Clear all knowledge from the graph
  Input: None
  Output: Confirmation message
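An MCP client such as Claude for Desktop invokes these tools by sending JSON-RPC 2.0 requests with the method "tools/call". You normally never write these by hand, but seeing the payload shape can help when debugging. The sketch below builds three such requests; the `build_tool_call` helper is hypothetical, and the entity names and values (User, AdminUser) are made up for illustration.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Argument keys follow the tool inputs listed above; values are examples only.
requests = [
    build_tool_call(1, "register_entity", {
        "name": "User",
        "entity_type": "class",
        "summary": "Represents an account holder",
        "language": "python",
    }),
    build_tool_call(2, "register_relationship", {
        "from_entity": "AdminUser",
        "to_entity": "User",
        "relationship_type": "inherits",
    }),
    build_tool_call(3, "query_entities", {
        "entity_type": "class",
        "name_pattern": "User$",
        "limit": 10,
    }),
]

for request in requests:
    print(json.dumps(request))
```

Optional inputs are simply omitted from "arguments" when not needed.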
- Analyze Code: Ask Claude to analyze your code files
  "Please analyze this Python file and register the key entities and relationships."
- Register Entities: Claude will use the register_entity tool to store code entities
  "I'll register the main class in this file."
- Register Relationships: Claude will use the register_relationship tool to store relationships
  "I'll register the inheritance relationship between these classes."
- Query Knowledge: Later, ask Claude about your codebase
  "What classes are defined in my codebase?"
  "Show me the details of the User class."
  "What's the relationship between the User and Profile classes?"
- Get Coding Patterns: Ask Claude about coding patterns
  "What design patterns are used in my codebase?"
  "Show me examples of the Factory pattern in my code."
Unlike traditional code analysis tools, SourceSage:
- Leverages LLM Understanding: Uses the LLM's ability to understand code semantics across languages
- Stores Semantic Knowledge: Focuses on meaning and relationships, not just syntax
- Is Language Agnostic: Works with any programming language the LLM understands
- Optimizes for Token Efficiency: Stores knowledge in a way that minimizes token usage
- Evolves with LLM Capabilities: As LLMs improve, so does code understanding
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.