Configuration Reference
This section provides comprehensive documentation for configuring QDrant Loader. Learn how to set up data sources, optimize performance, configure security, and customize behavior for your specific needs.
๐ฏ Configuration Overview
QDrant Loader uses a combination of configuration files and environment variables:
config.yaml
- Main configuration file for data sources and processing options.env
- Environment variables for credentials and system settings- Command-line options - Runtime parameters and overrides
๐ Configuration Structure
your-workspace/
โโโ config.yaml # Main configuration file
โโโ .env # Environment variables
โโโ state.db # Processing state (auto-generated)
โโโ logs/ # Log files (optional)
๐ Quick Configuration
1. Download Templates
# Download configuration templates
curl -o config.yaml https://raw.githubusercontent.com/martin-papy/qdrant-loader/main/packages/qdrant-loader/conf/config.template.yaml
curl -o .env https://raw.githubusercontent.com/martin-papy/qdrant-loader/main/packages/qdrant-loader/conf/.env.template
2. Basic Environment Setup
Edit .env
file:
# Required - OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key_here
# Required - QDrant Configuration
QDRANT_URL=http://localhost:6333
QDRANT_COLLECTION_NAME=documents
QDRANT_API_KEY=your_qdrant_api_key_here # Optional: for QDrant Cloud
# Optional - Git Authentication
REPO_TOKEN=your_github_token_here
# Optional - Confluence Configuration
CONFLUENCE_TOKEN=your_confluence_token_here
CONFLUENCE_EMAIL=your_confluence_email_here
# Optional - Jira Configuration
JIRA_TOKEN=your_jira_token_here
JIRA_EMAIL=your_jira_email_here
3. Basic Multi-Project Configuration
Edit config.yaml
:
# Global configuration shared across all projects
global:
qdrant:
url: "${QDRANT_URL}"
collection_name: "${QDRANT_COLLECTION_NAME}"
embedding:
model: "text-embedding-3-small"
api_key: "${OPENAI_API_KEY}"
chunking:
chunk_size: 1500
chunk_overlap: 200
# Multi-project configuration
projects:
my-project:
project_id: "my-project"
display_name: "My Documentation Project"
description: "Company documentation and code"
sources:
git:
docs-repo:
base_url: "https://github.com/your-org/your-repo.git"
branch: "main"
include_paths:
- "docs/**"
- "README.md"
file_types:
- "*.md"
- "*.py"
enable_file_conversion: true
localfile:
local-docs:
base_url: "file://./docs"
include_paths:
- "**/*.md"
- "**/*.pdf"
enable_file_conversion: true
๐ Configuration Sections
๐ง Environment Variables
Complete reference for all environment variables including:
- QDrant connection settings - URL, API keys, collection configuration
- Authentication credentials - API tokens for data sources
- Processing options - Embedding models, file conversion settings
๐ Security Considerations
Security best practices and configuration:
- API key management - Secure storage and rotation
- Access control - Permissions and authentication
- Data privacy - Handling sensitive information
- Network security - TLS, firewalls, and secure connections
๐ฏ Configuration by Use Case
๐จโ๐ป Software Development Team
Goal: Integrate code repositories, documentation, and project management
# config.yaml
global:
qdrant:
url: "${QDRANT_URL}"
collection_name: "${QDRANT_COLLECTION_NAME}"
embedding:
model: "text-embedding-3-small"
api_key: "${OPENAI_API_KEY}"
batch_size: 50
chunking:
chunk_size: 800
chunk_overlap: 150
projects:
dev-team:
project_id: "dev-team"
display_name: "Development Team Knowledge"
description: "Code repositories, documentation, and project management"
sources:
git:
main-app:
base_url: "https://github.com/company/main-app.git"
branch: "main"
include_paths: ["src/**", "docs/**"]
file_types: ["*.py", "*.md"]
token: "${REPO_TOKEN}"
enable_file_conversion: true
confluence:
dev-space:
base_url: "https://company.atlassian.net/wiki"
space_key: "DEV"
token: "${CONFLUENCE_TOKEN}"
email: "${CONFLUENCE_EMAIL}"
enable_file_conversion: true
download_attachments: true
jira:
project-tracker:
base_url: "https://company.atlassian.net"
project_key: "PROJ"
token: "${JIRA_TOKEN}"
email: "${JIRA_EMAIL}"
enable_file_conversion: true
download_attachments: true
# .env
QDRANT_URL=http://qdrant-server:6333
QDRANT_COLLECTION_NAME=team_knowledge
OPENAI_API_KEY=sk-proj-your_key
REPO_TOKEN=ghp_your_github_token
CONFLUENCE_URL=https://company.atlassian.net
CONFLUENCE_TOKEN=your_confluence_token
CONFLUENCE_EMAIL=team@company.com
JIRA_URL=https://company.atlassian.net
JIRA_TOKEN=your_jira_token
JIRA_EMAIL=team@company.com
๐ Documentation Team
Goal: Centralize and search documentation across platforms
# config.yaml
global:
qdrant:
url: "${QDRANT_URL}"
collection_name: "${QDRANT_COLLECTION_NAME}"
embedding:
model: "text-embedding-3-small"
api_key: "${OPENAI_API_KEY}"
chunking:
chunk_size: 1200
chunk_overlap: 300
file_conversion:
max_file_size: 52428800 # 50MB for large documents
projects:
documentation:
project_id: "documentation"
display_name: "Documentation Hub"
description: "Centralized documentation across platforms"
sources:
confluence:
docs-space:
base_url: "https://company.atlassian.net/wiki"
space_key: "DOCS"
token: "${CONFLUENCE_TOKEN}"
email: "${CONFLUENCE_EMAIL}"
enable_file_conversion: true
download_attachments: true
localfile:
legacy-docs:
base_url: "file://./legacy-docs"
include_paths: ["**/*.pdf", "**/*.docx", "**/*.md"]
enable_file_conversion: true
publicdocs:
api-docs:
base_url: "https://api-docs.example.com"
selectors:
content: ".content"
enable_file_conversion: true
๐ฌ Research Team
Goal: Index and search research materials and data
# config.yaml
global:
qdrant:
url: "${QDRANT_URL}"
collection_name: "${QDRANT_COLLECTION_NAME}"
embedding:
model: "text-embedding-3-small"
api_key: "${OPENAI_API_KEY}"
batch_size: 20 # Slower processing for large files
chunking:
chunk_size: 1500
chunk_overlap: 400
file_conversion:
max_file_size: 104857600 # 100MB for datasets
projects:
research:
project_id: "research"
display_name: "Research Materials"
description: "Research papers, datasets, and analysis tools"
sources:
localfile:
research-papers:
base_url: "file://./research-papers"
include_paths: ["**/*.pdf", "**/*.txt", "**/*.csv"]
enable_file_conversion: true
notebooks:
base_url: "file://./notebooks"
include_paths: ["**/*.ipynb", "**/*.py"]
enable_file_conversion: true
git:
analysis-tools:
base_url: "https://github.com/research-org/analysis-tools.git"
include_paths: ["**/*.py", "**/*.md", "**/*.ipynb"]
token: "${REPO_TOKEN}"
enable_file_conversion: true
๐ข Enterprise Deployment
Goal: Scalable, secure deployment for large organization
# config.yaml
global:
qdrant:
url: "${QDRANT_URL}"
api_key: "${QDRANT_API_KEY}"
collection_name: "${QDRANT_COLLECTION_NAME}"
embedding:
model: "text-embedding-3-small"
api_key: "${OPENAI_API_KEY}"
batch_size: 100
chunking:
chunk_size: 1000
chunk_overlap: 200
projects:
enterprise-platform:
project_id: "enterprise-platform"
display_name: "Enterprise Platform"
description: "Platform code, architecture, and documentation"
sources:
git:
platform-repo:
base_url: "https://github.com/enterprise/platform.git"
branch: "main"
include_paths: ["**/*.py", "**/*.js", "**/*.md"]
token: "${REPO_TOKEN}"
enable_file_conversion: true
services-repo:
base_url: "https://github.com/enterprise/services.git"
branch: "main"
include_paths: ["**/*.py", "**/*.md"]
token: "${REPO_TOKEN}"
enable_file_conversion: true
confluence:
architecture:
base_url: "https://enterprise.atlassian.net/wiki"
space_key: "ARCH"
token: "${CONFLUENCE_TOKEN}"
email: "${CONFLUENCE_EMAIL}"
enable_file_conversion: true
download_attachments: true
documentation:
base_url: "https://enterprise.atlassian.net/wiki"
space_key: "DOCS"
token: "${CONFLUENCE_TOKEN}"
email: "${CONFLUENCE_EMAIL}"
enable_file_conversion: true
download_attachments: true
jira:
platform-issues:
base_url: "https://enterprise.atlassian.net"
project_key: "PLAT"
token: "${JIRA_TOKEN}"
email: "${JIRA_EMAIL}"
enable_file_conversion: true
download_attachments: true
services-issues:
base_url: "https://enterprise.atlassian.net"
project_key: "SERV"
token: "${JIRA_TOKEN}"
email: "${JIRA_EMAIL}"
enable_file_conversion: true
download_attachments: true
# .env
QDRANT_URL=https://qdrant-cluster.company.com
QDRANT_API_KEY=your_enterprise_api_key
QDRANT_COLLECTION_NAME=enterprise_knowledge
OPENAI_API_KEY=sk-proj-enterprise_key
LOG_LEVEL=INFO
LOG_FILE=/var/log/qdrant-loader/app.log
STATE_DB_PATH=/data/qdrant-loader/state.db
๐ง Configuration Validation
Validate Configuration
# Display current configuration
qdrant-loader config --workspace .
# Validate project configurations
qdrant-loader project validate --workspace .
# Check project status
qdrant-loader project status --workspace .
# List all projects
qdrant-loader project list --workspace .
Common Validation Errors
Missing Required Settings
# Error: Missing QDRANT_URL
export QDRANT_URL=http://localhost:6333
# Error: Missing OPENAI_API_KEY
export OPENAI_API_KEY=your_openai_api_key
# Error: Missing collection name
export QDRANT_COLLECTION_NAME=documents
Invalid Configuration Syntax
# โ Invalid YAML syntax - Missing quotes
projects:
my-project:
project_id: my-project # Missing quotes
sources:
git:
repo:
base_url: https://github.com/org/repo.git # Missing quotes
branch: main # Missing quotes
# โ
Correct YAML syntax
projects:
my-project:
project_id: "my-project" # Quoted string
sources:
git:
repo:
base_url: "https://github.com/org/repo.git" # Quoted string
branch: "main" # Quoted string
๐ฏ Configuration Best Practices
1. Environment-Specific Configuration
Use different configurations for different environments:
# Development
cp config.dev.yaml config.yaml
cp .env.dev .env
# Production
cp config.prod.yaml config.yaml
cp .env.prod .env
2. Secure Credential Management
# Use environment variables for sensitive data
export OPENAI_API_KEY=$(cat /secure/openai-key.txt)
export CONFLUENCE_TOKEN=$(vault kv get -field=token secret/confluence)
# Never commit credentials to version control
echo ".env" >> .gitignore
echo "*.key" >> .gitignore
3. Performance Tuning
# Start with conservative settings
chunk_size: 800
batch_size: 20
max_concurrent_requests: 5
# Monitor and adjust based on:
# - Available memory
# - Network bandwidth
# - API rate limits
# - Processing speed requirements
4. Monitoring and Logging
# Monitor project status
qdrant-loader project status --workspace .
# Use debug logging for troubleshooting
qdrant-loader config --log-level DEBUG --workspace .
๐ Advanced Configuration Patterns
Multi-Environment Setup
# config.yaml with environment variables
global:
qdrant:
url: "${QDRANT_URL}"
collection_name: "${QDRANT_COLLECTION_NAME}"
embedding:
model: "${EMBEDDING_MODEL:-text-embedding-3-small}"
api_key: "${OPENAI_API_KEY}"
batch_size: ${BATCH_SIZE:-50}
chunking:
chunk_size: ${CHUNK_SIZE:-1500}
chunk_overlap: ${CHUNK_OVERLAP:-200}
projects:
main-project:
project_id: "main-project"
display_name: "Main Project"
sources:
git:
repo:
base_url: "${GIT_REPO_URL}"
branch: "${GIT_BRANCH:-main}"
include_paths: ["**/*.md", "**/*.py"]
token: "${REPO_TOKEN}"
enable_file_conversion: true
Conditional Configuration
# Different settings based on environment
global:
qdrant:
url: "${QDRANT_URL}"
collection_name: "${QDRANT_COLLECTION_NAME}"
embedding:
api_key: "${OPENAI_API_KEY}"
file_conversion:
max_file_size: ${MAX_FILE_SIZE:-52428800} # 50MB default
projects:
main-project:
project_id: "main-project"
sources:
git:
repo:
base_url: "https://github.com/org/repo.git"
branch: "main"
max_file_size: ${MAX_FILE_SIZE:-1048576} # Smaller files in development
token: "${REPO_TOKEN}"
enable_file_conversion: true
Template-Based Configuration
# Base configuration template
_git_defaults: &git_defaults
branch: "main"
enable_file_conversion: true
include_paths: ["**/*.md", "**/*.py"]
token: "${REPO_TOKEN}"
global:
qdrant:
url: "${QDRANT_URL}"
collection_name: "${QDRANT_COLLECTION_NAME}"
embedding:
api_key: "${OPENAI_API_KEY}"
chunking:
chunk_size: 1500
chunk_overlap: 200
projects:
multi-repo:
project_id: "multi-repo"
sources:
git:
repo1:
<<: *git_defaults
base_url: "https://github.com/org/repo1.git"
repo2:
<<: *git_defaults
base_url: "https://github.com/org/repo2.git"
๐งช Testing Configuration
Configuration Testing Workflow
# 1. Display current configuration
qdrant-loader config --workspace .
# 2. Validate project configurations
qdrant-loader project validate --workspace .
# 3. List all projects
qdrant-loader project list --workspace .
# 4. Check project status
qdrant-loader project status --workspace .
# 5. Initialize QDrant collection
qdrant-loader init --workspace .
# 6. Process data
qdrant-loader ingest --workspace .
Performance Testing
# Monitor resource usage during processing
top -p $(pgrep -f qdrant-loader)
# Check project status
qdrant-loader project status --workspace .
# Measure processing time
time qdrant-loader ingest --workspace .
๐ Related Documentation
- Environment Variables - Complete environment variable reference
- Security Considerations - Security best practices
- Data Sources - Source-specific configuration
- Troubleshooting - Common configuration issues
๐ Getting Help
Configuration Issues
- Common Issues - Frequent configuration problems
- Performance Issues - Performance tuning help
- GitHub Issues - Report configuration bugs
Community Support
- GitHub Discussions - Ask configuration questions
- Configuration Examples - Real-world configuration examples
Ready to configure QDrant Loader? Start with the Environment Variables guide for complete setup instructions.