Developer Documentation

Welcome to the QDrant Loader developer documentation! This guide covers how to understand, extend, test, and deploy QDrant Loader. Whether you're contributing to the core project or building custom extensions, you'll find technical details and practical examples here.

🏗️ Architecture Overview

QDrant Loader follows a modular architecture designed for multi-project document ingestion and vector storage:

┌─────────────────────────────────────────────────────────────┐
│                    QDrant Loader Core                       │
├─────────────────────────────────────────────────────────────┤
│  Data Sources    │  Processing      │  Vector Storage       │
│  ┌─────────────┐ │  ┌─────────────┐ │  ┌─────────────────┐  │
│  │ Connectors  │ │  │ Processors  │ │  │ QDrant Client   │  │
│  │ - Local     │ │  │ - MarkItDown│ │  │ - Collections   │  │
│  │ - Git       │ │  │ - Text      │ │  │ - Vectors       │  │
│  │ - Confluence│ │  │ - Chunking  │ │  │ - Search        │  │
│  │ - Jira      │ │  │ - Embedding │ │  │ - Metadata      │  │
│  │ - PublicDocs│ │  │             │ │  │                 │  │
│  └─────────────┘ │  └─────────────┘ │  └─────────────────┘  │
├─────────────────────────────────────────────────────────────┤
│  MCP Server      │  CLI Interface   │  Configuration        │
│  ┌─────────────┐ │  ┌─────────────┐ │  ┌─────────────────┐  │
│  │ Search APIs │ │  │ Commands    │ │  │ YAML Config     │  │
│  │ - Semantic  │ │  │ - init      │ │  │ - Multi-project │  │
│  │ - Hierarchy │ │  │ - ingest    │ │  │ - Workspace     │  │
│  │ - Attachment│ │  │ - config    │ │  │ - Environment   │  │
│  │             │ │  │ - project   │ │  │ - Validation    │  │
│  └─────────────┘ │  └─────────────┘ │  └─────────────────┘  │
└─────────────────────────────────────────────────────────────┘

🚀 Getting Started for Developers

1. Development Environment Setup

# Clone the repository
git clone https://github.com/martin-papy/qdrant-loader.git
cd qdrant-loader

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install development dependencies
cd packages/qdrant-loader
pip install -e ".[dev]"

# Install MCP server package
cd ../qdrant-loader-mcp-server
pip install -e ".[dev]"

# Start QDrant for development
docker run -p 6333:6333 qdrant/qdrant:latest

2. Running Tests

# Run all tests from workspace root
make test

# Run specific package tests
cd packages/qdrant-loader
pytest

# Run with coverage
pytest --cov=qdrant_loader --cov-report=html

# Run MCP server tests
cd ../qdrant-loader-mcp-server
pytest

3. Code Quality Checks

# From workspace root
make lint
make format

# Or manually
cd packages/qdrant-loader
black src/
isort src/
flake8 src/
mypy src/

📚 Core Concepts for Developers

Data Flow Architecture

Understanding the data flow is crucial for development:

  1. Configuration Phase
     • Multi-project workspace configuration
     • Global settings and project-specific sources
     • Environment variable management
     • Validation and initialization

  2. Ingestion Phase
     • Connectors fetch documents from data sources
     • File conversion using MarkItDown library
     • Content extraction and cleaning
     • Chunking strategies for large documents
     • Metadata extraction and enrichment

  3. Embedding Phase
     • Text content converted to embeddings via OpenAI
     • Batch processing for efficiency
     • Error handling and retries
     • Progress tracking and metrics

  4. Storage Phase
     • Vectors stored in QDrant collections
     • Metadata indexed for filtering
     • Project-based organization
     • State tracking and change detection

  5. Search Phase (MCP Server)
     • Semantic similarity search
     • Hierarchy-aware search
     • Attachment-specific search
     • Project filtering and organization
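
The ingestion and embedding phases mention chunking and batch processing without showing what they look like. The following is a minimal sketch, assuming a simple fixed-size character window with overlap; it is not the chunking strategy QDrant Loader actually ships, and the names `chunk_text` and `batched` are hypothetical helpers introduced only for illustration.

```python
from typing import Iterator


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def batched(items: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


chunks = chunk_text("abcdefghij", chunk_size=4, overlap=2)
batches = list(batched(chunks, batch_size=3))
```

Overlap keeps a sentence that straddles a chunk boundary visible in both neighbours, at the cost of embedding some text twice. Batching then groups chunks so that each embedding request carries several inputs instead of one.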

Connector System

QDrant Loader uses a connector-based architecture for data sources:

# Example connector implementation
from qdrant_loader.connectors.base import BaseConnector
from qdrant_loader.core.document import Document

class CustomConnector(BaseConnector):
    async def get_documents(self) -> list[Document]:
        """Get documents from the source."""
        documents = []
        # Your custom logic here
        for item in self.fetch_data():
            doc = Document(
                content=item.content,
                metadata=item.metadata,
                source_type="custom",
                source_name=self.config.name
            )
            documents.append(doc)
        return documents

Available connectors:

  • LocalFileConnector - Local file system
  • GitConnector - Git repositories
  • ConfluenceConnector - Confluence spaces
  • JiraConnector - Jira projects
  • PublicDocsConnector - Public documentation sites
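
The connector example earlier in this section relies on the installed package and on a `fetch_data` helper it does not define. To try the pattern in isolation, here is a self-contained sketch. `Document` and `BaseConnector` below are simplified stand-ins written for this example, not the library's real classes, and `InMemoryConnector` is hypothetical.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Document:
    """Stand-in for the loader's Document; fields mirror the example in this section."""
    content: str
    metadata: dict
    source_type: str
    source_name: str


class BaseConnector(ABC):
    """Stand-in base class: one async method that returns documents."""

    def __init__(self, name: str):
        self.name = name

    @abstractmethod
    async def get_documents(self) -> list[Document]:
        ...


class InMemoryConnector(BaseConnector):
    """Hypothetical connector that serves documents from a dict."""

    def __init__(self, name: str, records: dict[str, str]):
        super().__init__(name)
        self.records = records

    async def get_documents(self) -> list[Document]:
        return [
            Document(
                content=text,
                metadata={"key": key},
                source_type="in_memory",
                source_name=self.name,
            )
            for key, text in self.records.items()
        ]


connector = InMemoryConnector("notes", {"a": "first note", "b": "second note"})
docs = asyncio.run(connector.get_documents())
```

A real connector would replace the dict lookup with I/O against its data source; the shape of the result, a list of documents tagged with source type and name, stays the same.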

🔧 Development Workflows

Contributing to Core

  1. Fork and Clone

git clone https://github.com/your-username/qdrant-loader.git
cd qdrant-loader
git remote add upstream https://github.com/martin-papy/qdrant-loader.git

  2. Create Feature Branch

git checkout -b feature/your-feature-name

  3. Development Cycle

# Make changes
# Run tests
make test

# Check code quality
make lint

# Commit changes
git commit -m "feat: add new feature"

  4. Submit Pull Request
     • Ensure all tests pass
     • Update documentation
     • Add changelog entry
     • Request review

Custom Connector Development

  1. Create Connector Structure

my-connector/
├── src/
│   └── my_connector/
│       ├── __init__.py
│       ├── connector.py
│       └── config.py
├── tests/
└── pyproject.toml

  2. Implement Connector Interface

from qdrant_loader.connectors.base import BaseConnector
from qdrant_loader.config.source_config import SourceConfig
from qdrant_loader.core.document import Document

class MyConnector(BaseConnector):
    def __init__(self, config: SourceConfig):
        super().__init__(config)
        # Initialize your connector

    async def get_documents(self) -> list[Document]:
        # Implement document fetching logic
        pass

  3. Add Configuration Support

from qdrant_loader.config.source_config import SourceConfig

class MyConnectorConfig(SourceConfig):
    source_type: str = "my_connector"
    api_key: str
    base_url: str
    # Add your configuration fields

📖 Detailed Guides

Architecture Guide

Deep dive into system design, component interactions, and architectural decisions. Essential reading for understanding how QDrant Loader works internally.

Key Topics:

  • Multi-project workspace architecture
  • Connector and processor interfaces
  • Async ingestion pipeline design
  • State management and change detection
  • MCP server integration
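
Change detection is listed as a topic but not illustrated here. One common way to do it is to store a content hash per document and compare hashes on the next run. The sketch below assumes that approach; it is not necessarily how QDrant Loader tracks state, and `content_hash` and `detect_changes` are hypothetical names.

```python
import hashlib


def content_hash(content: str) -> str:
    """Return a stable SHA-256 fingerprint of a document's text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def detect_changes(
    previous: dict[str, str], current_docs: dict[str, str]
) -> dict[str, list[str]]:
    """Compare stored hashes (doc id -> hash) with current content (doc id -> text)."""
    current = {doc_id: content_hash(text) for doc_id, text in current_docs.items()}
    return {
        "new": sorted(d for d in current if d not in previous),
        "updated": sorted(
            d for d in current if d in previous and previous[d] != current[d]
        ),
        "deleted": sorted(d for d in previous if d not in current),
    }


stored = {"a": content_hash("x"), "b": content_hash("y"), "c": content_hash("z")}
changes = detect_changes(stored, {"a": "x", "b": "changed", "d": "brand new"})
```

Only documents reported as new or updated need to be re-chunked and re-embedded, which is the main reason to track state at all.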

Extending Guide

Comprehensive guide for building custom functionality and connectors. Learn how to extend QDrant Loader for your specific needs.

Key Topics:

  • Custom connector development
  • File conversion extensions
  • Configuration schema extensions
  • Testing custom components
  • Packaging and distribution

Testing Guide

Testing strategies, frameworks, and best practices for ensuring code quality and reliability.

Key Topics:

  • Unit testing with pytest
  • Integration testing strategies
  • Async testing patterns
  • Mock and fixture usage
  • CI/CD integration
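
As an illustration of the async testing and mocking topics, here is a small sketch that uses only the standard library. `count_documents` is a hypothetical function under test, and the connector is replaced by `unittest.mock.AsyncMock`, so no real data source is contacted.

```python
import asyncio
from unittest.mock import AsyncMock


async def count_documents(connector) -> int:
    """Code under test: await the connector and count what it returns."""
    documents = await connector.get_documents()
    return len(documents)


def test_count_documents() -> None:
    # AsyncMock makes get_documents() awaitable and records each await.
    connector = AsyncMock()
    connector.get_documents.return_value = ["doc-1", "doc-2"]

    assert asyncio.run(count_documents(connector)) == 2
    connector.get_documents.assert_awaited_once()


test_count_documents()
```

pytest collects `test_count_documents` as written. If the project uses the pytest-asyncio plugin, the test could instead be declared `async def` and marked with `@pytest.mark.asyncio`, which removes the explicit `asyncio.run` call.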

Deployment Guide

Production deployment strategies, containerization, and operational best practices.

Key Topics:

  • Docker containerization
  • Environment configuration
  • Monitoring and logging
  • Performance optimization
  • Security considerations

🛠️ Development Tools and Utilities

Available CLI Commands

# Initialize QDrant collection
qdrant-loader --workspace . init

# Ingest documents
qdrant-loader --workspace . ingest

# View configuration
qdrant-loader --workspace . config

# Project management
qdrant-loader --workspace . project list
qdrant-loader --workspace . project status
qdrant-loader --workspace . project validate

# Start MCP server
mcp-qdrant-loader

Debugging and Profiling

# Enable debug logging
qdrant-loader --log-level DEBUG --workspace . ingest

# Profile performance
qdrant-loader --workspace . ingest --profile

# Memory profiling (requires memory_profiler)
python -m memory_profiler your_script.py
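
If installing memory_profiler is not an option, the standard library's `tracemalloc` module can report peak allocations for a single call. The sketch below is an alternative to the command above, not part of QDrant Loader; `measure_peak_memory` is a hypothetical helper.

```python
import tracemalloc


def measure_peak_memory(func, *args, **kwargs):
    """Run func and return (result, peak_bytes) as seen by tracemalloc."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        # Always stop tracing so later code is not slowed down.
        tracemalloc.stop()
    return result, peak


result, peak = measure_peak_memory(lambda n: [str(i) for i in range(n)], 10_000)
print(f"built {len(result)} items, peak about {peak / 1024:.0f} KiB")
```

`tracemalloc` only sees allocations made through Python's allocator, so memory used inside native extensions may not be counted.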

Development Scripts

# Makefile targets
make test          # Run all tests
make lint          # Run linting
make format        # Format code
make docs          # Build documentation
make clean         # Clean build artifacts

🔗 Integration Examples

Workspace Configuration

# config.yaml
global_config:
  qdrant:
    url: "http://localhost:6333"
    collection_name: "my_collection"
  openai:
    api_key: "${OPENAI_API_KEY}"

projects:
  - project_id: "docs"
    sources:
      - source_type: "local_files"
        name: "documentation"
        config:
          base_url: "file://./docs"
          include_paths: ["**/*.md"]
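
The `${OPENAI_API_KEY}` placeholder implies that environment variables are substituted when the configuration is loaded. The sketch below shows one way such substitution can work on an already-parsed configuration; it is an illustration under that assumption, not the loader's actual implementation, and `expand_env` is a hypothetical helper.

```python
import os
import re

# Matches ${NAME} where NAME is a valid environment variable identifier.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value, env=None):
    """Recursively replace ${NAME} in strings; raise KeyError if NAME is not set."""
    env = os.environ if env is None else env
    if isinstance(value, str):
        def substitute(match):
            name = match.group(1)
            if name not in env:
                raise KeyError(f"environment variable {name} is not set")
            return env[name]
        return _PLACEHOLDER.sub(substitute, value)
    if isinstance(value, dict):
        return {key: expand_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item, env) for item in value]
    return value


config = {
    "openai": {"api_key": "${OPENAI_API_KEY}"},
    "qdrant": {"url": "http://localhost:6333"},
}
resolved = expand_env(config, env={"OPENAI_API_KEY": "sk-example"})
```

Failing loudly on a missing variable is a deliberate choice in this sketch: a silently empty API key tends to surface later as a confusing authentication error.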

Programmatic Usage

import asyncio

from qdrant_loader.config import get_settings
from qdrant_loader.core.async_ingestion_pipeline import AsyncIngestionPipeline


async def main() -> None:
    # Load settings
    settings = get_settings()

    # Create and run pipeline
    pipeline = AsyncIngestionPipeline(settings)
    await pipeline.run()


# `await` is only valid inside a coroutine, so drive it with asyncio.run
asyncio.run(main())

MCP Server Integration

# The MCP server runs as a separate process
# Start with: mcp-qdrant-loader

# It provides search tools to AI development environments
# Tools available:
# - search_documents
# - search_with_hierarchy
# - search_attachments
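
MCP clients invoke tools with JSON-RPC 2.0 `tools/call` requests. The sketch below builds such a request for `search_documents`. The argument names `query` and `limit` are assumptions made for illustration and are not confirmed by this guide; check the server's tool schema for the real parameters. `build_tool_call` is a hypothetical helper.

```python
import json
from itertools import count

# JSON-RPC requests need a unique id so responses can be matched to them.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request for an MCP server."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# "query" and "limit" are assumed argument names, not taken from this guide.
message = build_tool_call(
    "search_documents", {"query": "deployment checklist", "limit": 5}
)
```

In practice an MCP-aware client such as an AI development environment builds and sends these messages for you; the sketch only shows what travels over the wire.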

📋 Development Checklist

Before Submitting Code

  • [ ] All tests pass (make test)
  • [ ] Code style checks pass (make lint)
  • [ ] Type checking passes (mypy)
  • [ ] Documentation updated
  • [ ] Changelog entry added (if applicable)

For New Features

  • [ ] Design document created (for major features)
  • [ ] Tests cover all code paths
  • [ ] Documentation includes examples
  • [ ] Backward compatibility maintained
  • [ ] Configuration schema updated (if needed)

For Bug Fixes

  • [ ] Root cause identified
  • [ ] Regression test added
  • [ ] Fix verified in multiple environments
  • [ ] Documentation updated (if needed)

🀝 Community and Support

Getting Help

  • GitHub Issues - Bug reports and feature requests
  • Discussions - Questions and community support
  • Documentation - Comprehensive guides and references
  • Code Examples - Real-world usage patterns

Contributing Guidelines

  1. Code of Conduct - Be respectful and inclusive
  2. Issue Templates - Use provided templates for consistency
  3. Pull Request Process - Follow the established workflow
  4. Review Process - Participate in code reviews
  5. Documentation - Keep documentation up to date

Development Roadmap

  • Core Features - Enhanced search capabilities and performance
  • Connectors - Additional data source integrations
  • Developer Experience - Better tooling and documentation
  • Enterprise Features - Advanced security and compliance

Ready to start developing? Pick the detailed guide above that matches your goal: Architecture, Extending, Testing, or Deployment.

Need help? Join our community discussions or open an issue on GitHub!
