Workspace Mode Configuration
This guide covers how to configure QDrant Loader using workspace mode, which provides organized directory structure and simplified configuration management for your projects.
🎯 Overview
Workspace mode in QDrant Loader provides a structured approach to organizing your configuration files, logs, and metrics in a dedicated directory. It automatically discovers configuration files and creates necessary subdirectories for organized project management.
What Workspace Mode Provides
📁 Workspace Directory
├── config.yaml # Main configuration file
├── .env # Environment variables (optional)
├── logs/ # Application logs
│ └── qdrant-loader.log
├── metrics/ # Performance metrics
└── data/ # State database
└── state.db
Benefits of Workspace Mode
- Auto-discovery: Automatically finds
config.yaml
and.env
files - Organized structure: Creates dedicated directories for logs and metrics
- Simplified commands: No need to specify config file paths
- Consistent layout: Standardized project organization
🏗️ Setting Up Workspace Mode
Create Workspace Directory
# Create workspace directory
mkdir my-qdrant-workspace
cd my-qdrant-workspace
# Copy configuration template
cp packages/qdrant-loader/conf/config.template.yaml config.yaml
cp packages/qdrant-loader/conf/.env.template .env
Basic Configuration Structure
QDrant Loader uses a multi-project configuration structure where all projects share a single Qdrant collection but are isolated through project metadata:
# config.yaml - Multi-project configuration
global_config:
qdrant:
url: "http://localhost:6333"
api_key: null # Optional for Qdrant Cloud
collection_name: "my_documents" # Shared by all projects
embedding:
model: "text-embedding-3-small"
api_key: "${OPENAI_API_KEY}"
vector_size: 1536
projects:
docs-project:
project_id: "docs-project"
display_name: "Documentation Project"
description: "Company documentation"
sources:
git:
docs-repo:
base_url: "https://github.com/company/docs"
branch: "main"
include_paths: ["docs/**", "*.md"]
token: "${GITHUB_TOKEN}"
wiki-project:
project_id: "wiki-project"
display_name: "Wiki Project"
description: "Internal wiki content"
sources:
confluence:
company-wiki:
base_url: "https://company.atlassian.net/wiki"
space_key: "WIKI"
token: "${CONFLUENCE_TOKEN}"
email: "${CONFLUENCE_EMAIL}"
Environment Variables
# .env file
# Required - QDrant Database
QDRANT_URL=http://localhost:6333
QDRANT_API_KEY=your-qdrant-cloud-key # Optional
# Required - OpenAI API
OPENAI_API_KEY=your-openai-api-key
# Optional - Source credentials
GITHUB_TOKEN=your-github-token
CONFLUENCE_TOKEN=your-confluence-token
CONFLUENCE_EMAIL=your-email@company.com
⚙️ Workspace Commands
Initialize Workspace
# Initialize collection and prepare workspace
qdrant-loader --workspace . init
# Force recreation of existing collection
qdrant-loader --workspace . init --force
Ingest Data
# Process all projects and sources
qdrant-loader --workspace . ingest
# Process specific project
qdrant-loader --workspace . ingest --project docs-project
# Process specific source type across all projects
qdrant-loader --workspace . ingest --source-type git
# Process specific source within a project
qdrant-loader --workspace . ingest --project docs-project --source docs-repo
Configuration Management
# Show current configuration
qdrant-loader --workspace . config
# Validate configuration
qdrant-loader project --workspace . validate
# List all projects
qdrant-loader project --workspace . list
# Show project status
qdrant-loader project --workspace . status
📁 Project Management
Project Structure
Each project in the configuration has:
- project_id: Unique identifier used for filtering and metadata
- display_name: Human-readable name for the project
- description: Brief description of the project's purpose
- sources: Configuration for data sources (git, confluence, jira, etc.)
Project Commands
# List all configured projects
qdrant-loader project --workspace . list
# Show detailed project information
qdrant-loader project --workspace . status
# Show status for specific project
qdrant-loader project --workspace . status --project-id docs-project
# Validate project configurations
qdrant-loader project --workspace . validate
# Output in JSON format
qdrant-loader project --workspace . list --format json
Project Isolation
Projects are isolated through metadata, not separate collections:
- All projects use the same Qdrant collection
- Each document includes
project_id
in metadata - Search can be filtered by project through the MCP server
- Simplifies collection management and enables cross-project search
🔍 Data Source Configuration
Supported Source Types
Git Repositories
sources:
git:
repo-name:
base_url: "https://github.com/user/repo"
branch: "main"
include_paths: ["docs/**", "*.md"]
exclude_paths: ["node_modules/**"]
file_types: ["*.md", "*.rst", "*.txt"]
token: "${GITHUB_TOKEN}"
enable_file_conversion: true
Confluence
sources:
confluence:
wiki-name:
base_url: "https://company.atlassian.net/wiki"
deployment_type: "cloud"
space_key: "DOCS"
content_types: ["page", "blogpost"]
token: "${CONFLUENCE_TOKEN}"
email: "${CONFLUENCE_EMAIL}"
enable_file_conversion: true
download_attachments: true
JIRA
sources:
jira:
project-name:
base_url: "https://company.atlassian.net"
deployment_type: "cloud"
project_key: "PROJ"
token: "${JIRA_TOKEN}"
email: "${JIRA_EMAIL}"
enable_file_conversion: true
download_attachments: true
Local Files
sources:
localfile:
local-docs:
base_url: "file:///path/to/files"
include_paths: ["docs/**"]
exclude_paths: ["tmp/**"]
file_types: ["*.md", "*.txt"]
max_file_size: 1048576 # 1MB
enable_file_conversion: true
Public Documentation
sources:
publicdocs:
docs-site:
base_url: "https://docs.example.com"
version: "1.0"
content_type: "html"
path_pattern: "/docs/{version}/**"
selectors:
content: "article.main-content"
remove: ["nav", "header", "footer"]
enable_file_conversion: true
download_attachments: true
🔧 Advanced Configuration
Global Settings
global_config:
# Chunking configuration
chunking:
chunk_size: 1500
chunk_overlap: 200
# Embedding configuration
embedding:
endpoint: "https://api.openai.com/v1"
model: "text-embedding-3-small"
api_key: "${OPENAI_API_KEY}"
batch_size: 100
vector_size: 1536
max_tokens_per_request: 8000
max_tokens_per_chunk: 8000
# File conversion settings
file_conversion:
max_file_size: 52428800 # 50MB
conversion_timeout: 300 # 5 minutes
markitdown:
enable_llm_descriptions: false
llm_model: "gpt-4o"
llm_api_key: "${OPENAI_API_KEY}"
State Management
Workspace mode automatically manages the state database:
global_config:
state_management:
database_path: "${STATE_DB_PATH}" # Ignored in workspace mode
table_prefix: "qdrant_loader_"
connection_pool:
size: 5
timeout: 30
In workspace mode, the state database is automatically created as state.db
in a data directory within workspace directory.
📊 Workspace Structure
Directory Layout
my-qdrant-workspace/
├── config.yaml # Main configuration
├── .env # Environment variables
├── logs/ # Application logs
│ └── qdrant-loader.log
├── metrics/ # Performance metrics
│ └── ingestion_metrics.json
└── data/
└── state.db # Processing state database
Log Files
Workspace mode automatically configures logging:
- Location:
logs/qdrant-loader.log
- Format: Structured logging with timestamps
- Rotation: Automatic log rotation (if configured)
Metrics
Performance metrics are stored in the metrics/
directory:
- Ingestion metrics: Processing statistics and performance data
- Error tracking: Failed operations and retry attempts
- Resource usage: Memory and processing time metrics
🔗 MCP Server Integration
The MCP server currently uses environment variables for configuration and does not support workspace mode directly. You need to configure it using environment variables:
{
"mcpServers": {
"qdrant-loader": {
"command": "mcp-qdrant-loader",
"env": {
"QDRANT_URL": "http://localhost:6333",
"QDRANT_API_KEY": "your-api-key",
"QDRANT_COLLECTION_NAME": "my_documents",
"OPENAI_API_KEY": "your-openai-key"
}
}
}
}
MCP Server Environment Variables
The MCP server requires these environment variables:
- QDRANT_URL: URL of your QDrant instance (required)
- QDRANT_API_KEY: API key for QDrant authentication (optional)
- QDRANT_COLLECTION_NAME: Name of the collection to use (default: "documents")
- OPENAI_API_KEY: OpenAI API key for embeddings (required)
- MCP_DISABLE_CONSOLE_LOGGING: Set to "true" to disable console logging (optional)
Current Limitations
- The MCP server does not support the
--workspace
flag - Configuration must be done through environment variables
- The
--config
option exists but is not currently implemented - Project-aware search is not yet available in the MCP server
Future Workspace Integration
Workspace mode support for the MCP server is planned for future releases, which would allow:
- Automatic discovery of workspace configuration
- Project-aware search capabilities
- Simplified configuration through workspace files
🚀 Getting Started Checklist
- [ ] Create workspace directory and navigate to it
- [ ] Copy configuration template to
config.yaml
- [ ] Create environment file with required credentials
- [ ] Configure projects with your data sources
- [ ] Initialize collection with
qdrant-loader --workspace . init
- [ ] Ingest data with
qdrant-loader --workspace . ingest
- [ ] Verify setup with
qdrant-loader project --workspace . list
- [ ] Test search through MCP server integration
🔗 Related Documentation
- Configuration File Reference - Complete YAML configuration options
- Environment Variables Reference - Environment variable configuration
- CLI Reference - Command-line interface documentation
- MCP Server Setup - MCP server integration guide
Workspace mode provides organized, scalable configuration management for your QDrant Loader projects. 🎉
This structured approach simplifies project management while maintaining flexibility for complex multi-source configurations.