Architecture Overview
This section provides a comprehensive overview of QDrant Loader's architecture, including system design principles, component interactions, and data flow patterns.
Design Principles
QDrant Loader is built on several key architectural principles:
1. Modularity and Extensibility
- Connector-based architecture - Easy to add new data source connectors
- Clear interfaces - Well-defined interfaces between components
- Separation of concerns - Each component has a single responsibility
2. Scalability and Performance
- Asynchronous processing - Non-blocking I/O for better throughput
- Batch processing - Efficient handling of large datasets
- Configurable concurrency - Adjustable parallelism based on resources
3. Reliability and Robustness
- Error handling - Graceful degradation and retry mechanisms
- State management - Persistent tracking of processing state
- Incremental updates - Only process changed content
4. Developer Experience
- Clear CLI interface - Intuitive command-line operations
- Comprehensive testing - Unit, integration, and end-to-end tests
- Rich documentation - Detailed guides and examples
System Architecture
High-Level Overview
┌─────────────────────────────────────────────────────────────────┐
│                          QDrant Loader                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐          │
│  │     CLI     │    │ MCP Server  │    │   Config    │          │
│  │  Interface  │    │ (Separate)  │    │   Manager   │          │
│  └─────────────┘    └─────────────┘    └─────────────┘          │
│         │                  │                  │                 │
│         └──────────────────┼──────────────────┘                 │
│                            │                                    │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │                Async Ingestion Pipeline                 │    │
│  │                                                         │    │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐      │    │
│  │  │    Data     │  │    File     │  │   Content   │      │    │
│  │  │ Connectors  │  │ Converters  │  │ Processors  │      │    │
│  │  └─────────────┘  └─────────────┘  └─────────────┘      │    │
│  │         │                │                │             │    │
│  │         └────────────────┼────────────────┘             │    │
│  │                          │                              │    │
│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐      │    │
│  │  │  Embedding  │  │    State    │  │   QDrant    │      │    │
│  │  │   Service   │  │   Manager   │  │   Manager   │      │    │
│  │  └─────────────┘  └─────────────┘  └─────────────┘      │    │
│  └─────────────────────────────────────────────────────────┘    │
│                            │                                    │
└────────────────────────────┬────────────────────────────────────┘
                             │
┌────────────────────────────┴────────────────────────────────────┐
│                        External Services                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐          │
│  │   QDrant    │    │   OpenAI    │    │    Data     │          │
│  │  Database   │    │     API     │    │   Sources   │          │
│  └─────────────┘    └─────────────┘    └─────────────┘          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
Component Layers
1. Interface Layer
- CLI Interface - Command-line tool for data ingestion and management (setup, init, ingest, config, project list|status|validate)
- MCP Server - Separate package (qdrant-loader-mcp-server) for AI tool integration
- Config Manager - Multi-project configuration loading, validation, and environment variables
2. Core Pipeline
- Data Connectors - Fetch content from various data sources using BaseConnector interface
- File Converters - Convert files to text using MarkItDown library
- Content Processors - Chunk text, extract metadata, and prepare for vectorization
- LLM Service - Generate embeddings using configurable LLM providers (OpenAI, Azure OpenAI, Ollama)
- State Manager - SQLite-based tracking of processing state and incremental updates
- QDrant Manager - Manage vector storage and collection operations
3. External Services
- QDrant Database - Vector storage and similarity search
- LLM APIs - Embedding generation via provider-agnostic interface (OpenAI, Azure OpenAI, Ollama)
- Data Sources - Git repositories, Confluence, Jira, local files, web content
Core Components
Data Source Connectors
Purpose: Fetch content from external systems via a common abstraction
Key Features:
- Unified BaseConnector interface for all sources
- Per-source authentication and validation
- Retry-aware HTTP and rate limiting (where relevant)
- Shared HTTP utilities under qdrant_loader.connectors.shared.http:
  - RateLimiter for per-interval throttling
  - request_with_policy / aiohttp_request_with_policy for consistent retries + jitter + optional rate limiting
- Incremental updates via state tracking
- Rich metadata on every Document
Supported Sources: Git, Confluence, Jira, Local Files, Public Docs
Implementation notes:
- Jira uses request_with_policy with project-configured requests_per_minute.
- Confluence and PublicDocs expose requests_per_minute in config (defaults: Confluence 60 RPM, PublicDocs 120 RPM).
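The throttling and retry behavior described above can be illustrated with a small stdlib-only sketch. It is not the code in qdrant_loader.connectors.shared.http: the names IntervalRateLimiter and call_with_policy, their parameters, and the backoff formula are all assumptions made for illustration.

```python
import asyncio
import random
import time


class IntervalRateLimiter:
    """Hypothetical limiter: spaces calls at least 60 / requests_per_minute seconds apart."""

    def __init__(self, requests_per_minute: float) -> None:
        self._min_interval = 60.0 / requests_per_minute
        self._next_allowed = 0.0

    async def acquire(self) -> None:
        # Reserve the next free slot before awaiting, so concurrent callers
        # never pick the same slot (there is no await between read and write).
        now = time.monotonic()
        slot = max(now, self._next_allowed)
        self._next_allowed = slot + self._min_interval
        await asyncio.sleep(slot - now)


async def call_with_policy(func, *, limiter=None, retries=3, base_delay=0.5):
    """Await func(), retrying failures with exponential backoff and full jitter."""
    for attempt in range(retries + 1):
        if limiter is not None:
            await limiter.acquire()
        try:
            return await func()
        except Exception:
            if attempt == retries:
                raise  # retry budget exhausted: surface the last error
            await asyncio.sleep(random.uniform(0, base_delay * 2 ** attempt))
```

With requests_per_minute=60 the limiter spaces calls one second apart. The jitter spreads retries out so that many clients failing at once do not all retry at the same moment.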
Interface (simplified):
- Interface definition: BaseConnector
- Required connector method: BaseConnector.get_documents
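As a rough illustration of what a connector contract of this shape can look like, here is a minimal sketch. The Document fields, the async signature of get_documents, and the InMemoryConnector class are assumptions for illustration; the real BaseConnector and Document definitions in the project may differ.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for the project's Document model."""
    title: str
    content: str
    source_type: str
    metadata: dict = field(default_factory=dict)


class BaseConnector(ABC):
    """Simplified sketch: each connector returns documents from one source."""

    @abstractmethod
    async def get_documents(self) -> list[Document]:
        """Fetch all documents from the underlying source."""


class InMemoryConnector(BaseConnector):
    """Toy connector (not part of the project) that serves pages from a dict."""

    def __init__(self, pages: dict[str, str]) -> None:
        self._pages = pages

    async def get_documents(self) -> list[Document]:
        return [
            Document(title=title, content=text, source_type="memory",
                     metadata={"length": len(text)})
            for title, text in self._pages.items()
        ]
```

Because get_documents is abstract, a subclass that does not implement it cannot be instantiated, so an incomplete connector fails at construction time rather than partway through ingestion.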
File Converters
Purpose: Convert various file formats to text using MarkItDown
Key Features:
- 20+ file format support via MarkItDown library
- Optional LLM-enhanced descriptions
- Metadata preservation
- Error handling for corrupted files
- Configurable conversion options
Supported Formats:
- Documents: PDF, DOCX, PPTX, XLSX
- Images: PNG, JPEG, GIF (with OCR)
- Archives: ZIP, TAR, 7Z
- Data: JSON, CSV, XML, YAML
- Audio: MP3, WAV (transcription)
Content Processors
Purpose: Process and prepare content for vectorization
Key Features:
- Text chunking with configurable sizes
- Metadata extraction and enrichment
- Content deduplication via hashing
- Document ID generation
- Async processing pipelines
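The chunking, hash-based deduplication, and ID generation listed above can be sketched with the standard library. This is an illustration under assumptions: character-based chunking, SHA-256 hashing, and UUIDv5 document IDs are choices made for the example, and the function names are hypothetical rather than taken from the project.

```python
import hashlib
import uuid


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into fixed-size character chunks that overlap by `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous chunk.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def content_hash(text: str) -> str:
    """Stable fingerprint used to detect duplicate or unchanged content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def dedupe(chunks: list[str]) -> list[str]:
    """Drop chunks whose hash has already been seen, keeping first occurrences."""
    seen: set[str] = set()
    unique = []
    for chunk in chunks:
        digest = content_hash(chunk)
        if digest not in seen:
            seen.add(digest)
            unique.append(chunk)
    return unique


def document_id(source_type: str, location: str) -> str:
    """Deterministic ID: the same source and location always map to the same UUID."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{source_type}:{location}"))
```

Deterministic IDs mean that re-ingesting an unchanged source overwrites existing vectors instead of creating duplicates.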
Refactoring highlights (Large Files):
- Markdown strategy split into splitters/{base,standard,excel,fallback}.py with facade section_splitter.py.
- Code strategy modularized (parser/*, metadata/*, processor/*); orchestrators remain thin.
LLM Service
Purpose: Generate embeddings using configurable LLM providers
Key Features:
- Provider-agnostic interface (OpenAI, Azure OpenAI, Ollama)
- Configurable embedding models (text-embedding-3-small, text-embedding-ada-002, etc.)
- Batch processing for efficiency
- Error handling and retries
- Rate limiting compliance
- Unified configuration via global.llm.*
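A provider-agnostic interface combined with batching might look roughly like the sketch below. EmbeddingProvider, FakeProvider, and embed_in_batches are hypothetical names; the fake provider returns made-up vectors so the example runs without network access, where a real provider would call OpenAI, Azure OpenAI, or Ollama.

```python
from typing import Protocol, Sequence


class EmbeddingProvider(Protocol):
    """Anything with an embed() method of this shape can act as a provider."""

    def embed(self, texts: Sequence[str]) -> list[list[float]]: ...


class FakeProvider:
    """Deterministic stand-in used only for illustration and testing."""

    def __init__(self) -> None:
        self.calls = 0

    def embed(self, texts: Sequence[str]) -> list[list[float]]:
        self.calls += 1
        # Two-dimensional toy vectors derived from the text itself.
        return [[float(len(t)), float(sum(map(ord, t)) % 97)] for t in texts]


def embed_in_batches(provider: EmbeddingProvider, texts: Sequence[str],
                     batch_size: int = 2) -> list[list[float]]:
    """Send texts to the provider in fixed-size batches, preserving input order."""
    vectors: list[list[float]] = []
    for start in range(0, len(texts), batch_size):
        vectors.extend(provider.embed(texts[start:start + batch_size]))
    return vectors
```

Swapping providers then only means passing a different object to embed_in_batches; the calling code stays the same.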
State Manager
Purpose: Track processing state and enable incremental updates
Key Features:
- SQLite + SQLAlchemy async engine
- Content hashing for change detection
- Ingestion history and per-document state
- Project-aware queries and updates
Implementation: qdrant_loader/core/state/state_manager.py
QDrant Manager
Purpose: Manage vector storage and collection operations
Key Features:
- Collection creation and management
- Vector upsert operations with batching
- Search and filtering capabilities
- Metadata handling
- Connection management with retry logic
Data Flow
Ingestion Pipeline
┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│    Data     │───▶│    File     │───▶│   Content   │───▶│  Embedding  │
│  Connector  │    │  Converter  │    │  Processor  │    │   Service   │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
       │                  │                  │                  │
       ▼                  ▼                  ▼                  ▼
┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│  Raw Data   │    │    Text     │    │   Chunks    │    │   Vectors   │
│ + Metadata  │    │ + Metadata  │    │ + Metadata  │    │ + Metadata  │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
                                                                │
                                                                ▼
                                                         ┌─────────────┐
                                                         │   QDrant    │
                                                         │   Manager   │
                                                         └─────────────┘
                                                                │
                                                                ▼
                                                         ┌─────────────┐
                                                         │   QDrant    │
                                                         │  Database   │
                                                         └─────────────┘
Search Pipeline (MCP Server)
┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│    Query    │───▶│  Embedding  │───▶│   QDrant    │───▶│   Results   │
│   (Text)    │    │   Service   │    │   Search    │    │ + Metadata  │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
       │                  │                  │                  │
       ▼                  ▼                  ▼                  ▼
┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│ User Query  │    │ Query Vector│    │ Similarity  │    │   Ranked    │
│             │    │             │    │   Scores    │    │   Results   │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
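To make the search flow concrete, the toy sketch below scores stored vectors against a query vector and returns the best matches with their metadata. In the real system QDrant performs the similarity search server-side; cosine similarity, the function names, and the in-memory point list here are assumptions for illustration only.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vector: list[float],
           points: list[tuple[list[float], dict]],
           limit: int = 3) -> list[tuple[float, dict]]:
    """Return up to `limit` (score, metadata) pairs, highest score first."""
    scored = [(cosine_similarity(query_vector, vector), payload)
              for vector, payload in points]
    return sorted(scored, key=lambda item: item[0], reverse=True)[:limit]
```

Each stage of the diagram maps onto one step: the query text becomes query_vector, the stored points are scored, and the sorted list is the ranked result set.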
Connector System
Connector Architecture
QDrant Loader uses a connector-based architecture for extensibility. Connectors are resolved through the connector factory in the pipeline orchestrator:
Implementation citation: PipelineOrchestrator._collect_documents_from_sources
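One common way to resolve connectors by source type is a registry keyed on a type string. The sketch below shows that pattern; it is not the code in PipelineOrchestrator, and the registry, the register decorator, the create_connector function, the type strings, and the stub classes are all assumptions.

```python
CONNECTOR_REGISTRY: dict[str, type] = {}


def register(source_type: str):
    """Class decorator that records a connector class under a source type key."""
    def decorator(cls: type) -> type:
        CONNECTOR_REGISTRY[source_type] = cls
        return cls
    return decorator


@register("git")
class GitConnector:
    """Placeholder stub; the real connector does the actual repository work."""

    def __init__(self, config: dict) -> None:
        self.config = config


@register("localfile")
class LocalFileConnector:
    """Placeholder stub for local file system sources."""

    def __init__(self, config: dict) -> None:
        self.config = config


def create_connector(source_type: str, config: dict):
    """Look up the connector class for a source type and instantiate it."""
    try:
        connector_cls = CONNECTOR_REGISTRY[source_type]
    except KeyError:
        raise ValueError(f"Unsupported source type: {source_type!r}") from None
    return connector_cls(config)
```

With this pattern, adding a source means writing one new class and registering it; the code that iterates over configured sources does not change.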
Available Connectors
- GitConnector - Git repository processing with file filtering
- ConfluenceConnector - Confluence space content and attachments
- JiraConnector - Jira project issues and attachments
- LocalFileConnector - Local file system processing
- PublicDocsConnector - Web-based documentation crawling
State Management
State Storage
QDrant Loader uses SQLite with SQLAlchemy for state management:
- State manager class: StateManager
- Initialization flow: StateManager.initialize
Incremental Updates
Implementation citation: StateManager.update_document_state
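The idea behind incremental updates is to store a content hash per document and skip anything whose hash has not changed. The sketch below uses the stdlib sqlite3 module in place of the project's SQLAlchemy async engine; the table layout and the names open_state and record_if_changed are assumptions for illustration.

```python
import hashlib
import sqlite3


def open_state(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a state database with one row per project and document."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS document_state (
               project_id   TEXT NOT NULL,
               document_id  TEXT NOT NULL,
               content_hash TEXT NOT NULL,
               PRIMARY KEY (project_id, document_id)
           )"""
    )
    return conn


def record_if_changed(conn: sqlite3.Connection, project_id: str,
                      document_id: str, content: str) -> bool:
    """Return True if the document is new or changed, and store its new hash."""
    new_hash = hashlib.sha256(content.encode("utf-8")).hexdigest()
    row = conn.execute(
        "SELECT content_hash FROM document_state WHERE project_id = ? AND document_id = ?",
        (project_id, document_id),
    ).fetchone()
    if row is not None and row[0] == new_hash:
        return False  # unchanged: the pipeline can skip this document
    conn.execute(
        "INSERT OR REPLACE INTO document_state VALUES (?, ?, ?)",
        (project_id, document_id, new_hash),
    )
    conn.commit()
    return True
```

Keying rows on both project and document ID keeps state isolated per project, so the same document in two projects is tracked independently.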
Performance Considerations
Asynchronous Processing
The entire pipeline is built on async/await patterns:
- Pipeline entry point: AsyncIngestionPipeline.process_documents
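Configurable concurrency in an async pipeline is often implemented with a semaphore that caps the number of tasks in flight. The sketch below shows that pattern; process_all and its parameters are hypothetical and do not describe the internals of AsyncIngestionPipeline.

```python
import asyncio


async def process_all(items, worker, max_concurrency: int = 4) -> list:
    """Run worker(item) for every item, with at most max_concurrency running at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item):
        async with semaphore:  # wait here while the concurrency limit is reached
            return await worker(item)

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(guarded(item) for item in items))
```

Raising max_concurrency improves throughput for I/O-bound work until an external limit, such as an API rate limit or available memory, becomes the bottleneck.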
Batch Processing
Implementation citation: QdrantManager.upsert_points
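Batching keeps individual requests small when many points are written. A minimal sketch, assuming a caller-supplied upsert callable that stands in for the real QDrant client call; batched and upsert_in_batches are hypothetical helper names:

```python
from itertools import islice
from typing import Callable, Iterable, Iterator


def batched(iterable: Iterable, batch_size: int) -> Iterator[list]:
    """Yield consecutive lists of up to batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(iterable)
    while batch := list(islice(iterator, batch_size)):
        yield batch


def upsert_in_batches(points: Iterable, upsert: Callable[[list], None],
                      batch_size: int = 100) -> int:
    """Call upsert once per batch and return the number of points sent."""
    total = 0
    for batch in batched(points, batch_size):
        upsert(batch)
        total += len(batch)
    return total
```

Because batched consumes its input lazily, points can come from a generator without first being collected into one large list.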
Security Architecture
Authentication Flow
Each connector handles its own authentication:
Implementation citation: ConfluenceConnector._setup_authentication
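As an illustration of per-connector authentication driven by environment variables, the sketch below builds an HTTP Basic authorization header from an email and API token. The environment variable names and both function names are hypothetical, and the real ConfluenceConnector may authenticate differently.

```python
import base64
import os
from typing import Mapping


def load_credentials(env: Mapping[str, str] = os.environ,
                     prefix: str = "CONFLUENCE") -> tuple[str, str]:
    """Read <PREFIX>_EMAIL and <PREFIX>_TOKEN (hypothetical names) from the environment."""
    names = (f"{prefix}_EMAIL", f"{prefix}_TOKEN")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")
    return env[names[0]], env[names[1]]


def basic_auth_header(email: str, token: str) -> dict[str, str]:
    """Build an HTTP Basic Authorization header from an email and API token."""
    raw = f"{email}:{token}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
```

Reading secrets from the environment keeps them out of configuration files, and failing early on missing variables gives a clearer error than a rejected request later.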
Data Privacy
- Credential management - Environment variables and secure configuration
- State isolation - Project-based data separation
- Access control - Per-source authentication
- Local processing - No data sent to external services except for LLM embedding generation
Related Documentation
- CLI Reference - Command-line interface
- Configuration Guide - Configuration options
- Extending Guide - How to extend functionality
- Testing Guide - Testing framework and patterns
Architecture Evolution
Current Capabilities
- Multi-project workspace support
- SQLite-based state management with async support
- Asynchronous pipeline with non-blocking I/O
- Separate MCP server package
- MarkItDown-based file conversion
Roadmap Priorities
- Enhanced connectors - More data source integrations
- Improved performance - Better parallel processing and caching
- Advanced search - Enhanced MCP server capabilities
- Deployment options - Container images and deployment scripts
- Monitoring and observability - Enhanced metrics and logging
For version-specific milestones and release status, see the project CHANGELOG.
Ready to dive deeper? Explore the CLI Reference for command-line usage, or see the Extending Guide to learn how to add new functionality to QDrant Loader.