Basic Configuration
This guide walks you through configuring QDrant Loader for your specific needs. After completing this guide, you'll have a customized setup ready for your data sources and use cases.
🎯 Overview
QDrant Loader uses a flexible configuration system that supports:
- Environment variables for credentials and basic settings
- Configuration files for detailed project and source configuration
- Workspace mode for organized project management
- Multiple environments (development, staging, production)
🔧 Configuration Methods
Configuration Priority
QDrant Loader uses this priority order (highest to lowest):
1. Command-line arguments (--workspace, --config, --env)
2. Environment variables (QDRANT_URL, OPENAI_API_KEY, etc.)
3. Configuration file (config.yaml)
4. Default values (built-in defaults)
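For example, when a workspace contains a config.yaml and you also pass explicit flags, the flags decide which files are read at all, and environment variables take precedence over values resolved from the configuration file (a sketch of the order above, using only the flags shown in this guide):
# Explicit flags win over anything discovered automatically
qdrant-loader --config ./overrides/config.yaml --env ./overrides/.env init
# Without flags, environment variables such as QDRANT_URL override the
# corresponding config.yaml values, and built-in defaults fill in the rest
qdrant-loader --workspace . init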
Workspace Mode vs. Traditional Mode
| Workspace Mode (Recommended) | Traditional Mode |
|---|---|
| Organized directory structure | Individual config files |
| Auto-discovery of config files | Manual file specification |
| Built-in logging and metrics | Manual setup required |
| Good for: All use cases | Good for: Simple scripts |
| 5 minutes to configure | 10-15 minutes to configure |
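In workspace mode everything lives in a single directory, which is why setup takes only a few minutes; a typical layout after the quick setup below looks roughly like this (logs/ and metrics/ are the directories referenced in the security section later):
my-qdrant-workspace/
├── .env          # credentials (never committed)
├── config.yaml   # projects and sources
├── logs/         # built-in logging output
└── metrics/      # ingestion metrics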
🚀 Quick Setup (Workspace Mode)
Create Workspace
# Create workspace directory
mkdir my-qdrant-workspace
cd my-qdrant-workspace
# Create environment variables file
cat > .env << EOF
# Required - QDrant Database
QDRANT_URL=http://localhost:6333
QDRANT_COLLECTION_NAME=my_documents
# Required - OpenAI API
OPENAI_API_KEY=your-openai-api-key
# Optional - QDrant Cloud (if using cloud)
QDRANT_API_KEY=your-qdrant-cloud-api-key
EOF
Create Basic Configuration
# Create basic multi-project configuration
cat > config.yaml << EOF
# Global configuration shared across all projects
global:
qdrant:
url: "\${QDRANT_URL}"
api_key: "\${QDRANT_API_KEY}"
collection_name: "\${QDRANT_COLLECTION_NAME}"
openai:
api_key: "\${OPENAI_API_KEY}"
model: "text-embedding-3-small"
# Project definitions
projects:
default:
project_id: "default"
display_name: "My Project"
description: "Default project for getting started"
sources:
localfile:
docs:
base_url: "file://."
include_paths: ["*.md", "*.txt"]
file_types: ["*.md", "*.txt"]
EOF
Initialize and Test
# Initialize QDrant collection
qdrant-loader --workspace . init
# Display current configuration
qdrant-loader --workspace . config
# Check project status
qdrant-loader project --workspace . status
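Once the collection is initialized and the status check passes, you can run a first ingestion against the same workspace (the same command form is used in the multi-environment examples later in this guide):
# Ingest all configured sources in the workspace
qdrant-loader --workspace . ingest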
⚙️ Advanced Setup (Multi-Project Configuration)
Configuration File Structure
A complete configuration file uses the multi-project structure: global settings shared across projects, followed by per-project source definitions:
# config.yaml - Multi-project configuration
# Global configuration shared across all projects
global:
# QDrant Database Configuration
qdrant:
url: "${QDRANT_URL}"
api_key: "${QDRANT_API_KEY}" # Optional, for QDrant Cloud
collection_name: "${QDRANT_COLLECTION_NAME}"
timeout: 30
# OpenAI Configuration
openai:
api_key: "${OPENAI_API_KEY}"
model: "text-embedding-3-small"
batch_size: 100
max_retries: 3
timeout: 30
# Processing Configuration
processing:
chunk_size: 1000
chunk_overlap: 200
min_chunk_size: 100
max_file_size: 52428800 # 50MB
# State Management
state_management:
database_path: "${STATE_DB_PATH}"
table_prefix: "qdrant_loader_"
# File Conversion
file_conversion:
max_file_size: 52428800 # 50MB
conversion_timeout: 300
markitdown:
enable_llm_descriptions: false
llm_model: "gpt-4o"
llm_api_key: "${OPENAI_API_KEY}"
# Project definitions
projects:
# Documentation project
docs-project:
project_id: "docs-project"
display_name: "Documentation Project"
description: "Company documentation and guides"
sources:
# Git repositories
git:
docs-repo:
base_url: "https://github.com/example/docs.git"
branch: "main"
include_paths: ["docs/**", "README.md"]
exclude_paths: ["node_modules/**", ".git/**"]
file_types: ["*.md", "*.rst", "*.txt"]
token: "${DOCS_REPO_TOKEN}"
enable_file_conversion: true
# Local files
localfile:
local-docs:
base_url: "file:///path/to/local/files"
include_paths: ["docs/**", "README.md"]
exclude_paths: ["tmp/**", "archive/**"]
file_types: ["*.md", "*.txt", "*.pdf"]
enable_file_conversion: true
# Knowledge base project
kb-project:
project_id: "kb-project"
display_name: "Knowledge Base"
description: "Internal knowledge base and wiki"
sources:
# Confluence
confluence:
company-confluence:
base_url: "https://company.atlassian.net"
space_key: "KB"
deployment_type: "cloud"
token: "${CONFLUENCE_TOKEN}"
email: "${CONFLUENCE_EMAIL}"
enable_file_conversion: true
download_attachments: true
# JIRA
jira:
support-project:
base_url: "https://company.atlassian.net"
deployment_type: "cloud"
project_key: "SUPPORT"
token: "${JIRA_TOKEN}"
email: "${JIRA_EMAIL}"
enable_file_conversion: true
download_attachments: true
Validate Configuration
# Display current configuration
qdrant-loader --workspace . config
# Check project status and validate connections
qdrant-loader project --workspace . status
# List all configured projects
qdrant-loader project --workspace . list
🎯 Common Configuration Scenarios
Scenario 1: Personal Knowledge Base
Use Case: Index personal documents, notes, and bookmarks
global:
qdrant:
url: "${QDRANT_URL}"
collection_name: "personal_knowledge"
openai:
api_key: "${OPENAI_API_KEY}"
processing:
chunk_size: 800
projects:
personal:
project_id: "personal"
display_name: "Personal Knowledge Base"
description: "Personal documents and notes"
sources:
localfile:
documents:
base_url: "file://~/Documents"
include_paths: ["**/*.md", "**/*.txt", "**/*.pdf"]
file_types: ["*.md", "*.txt", "*.pdf"]
enable_file_conversion: true
git:
notes:
base_url: "https://github.com/username/notes.git"
branch: "main"
include_paths: ["**/*.md"]
file_types: ["*.md"]
token: "${GITHUB_TOKEN}"
Scenario 2: Team Documentation Hub
Use Case: Centralize team documentation from multiple sources
global:
qdrant:
collection_name: "team_docs"
openai:
api_key: "${OPENAI_API_KEY}"
projects:
team-docs:
project_id: "team-docs"
display_name: "Team Documentation"
description: "Centralized team documentation"
sources:
git:
main-repo:
base_url: "${TEAM_REPO_URL}"
branch: "main"
include_paths: ["docs/**", "wiki/**", "README.md"]
file_types: ["*.md", "*.rst"]
token: "${TEAM_REPO_TOKEN}"
confluence:
team-space:
base_url: "${CONFLUENCE_URL}"
space_key: "TEAM"
deployment_type: "cloud"
token: "${CONFLUENCE_TOKEN}"
email: "${CONFLUENCE_EMAIL}"
jira:
team-project:
base_url: "${JIRA_URL}"
project_key: "TEAM"
deployment_type: "cloud"
token: "${JIRA_TOKEN}"
email: "${JIRA_EMAIL}"
Scenario 3: Development Team Setup
Use Case: Code documentation and development resources
global:
qdrant:
collection_name: "dev_docs"
processing:
chunk_size: 1200
max_file_size: 104857600 # 100MB
projects:
frontend:
project_id: "frontend"
display_name: "Frontend Documentation"
description: "React frontend application docs"
sources:
git:
frontend-repo:
base_url: "${FRONTEND_REPO_URL}"
branch: "main"
include_paths: ["src/**", "docs/**", "README.md"]
file_types: ["*.md", "*.js", "*.ts", "*.jsx", "*.tsx"]
token: "${REPO_TOKEN}"
backend:
project_id: "backend"
display_name: "Backend Documentation"
description: "API and backend documentation"
sources:
git:
backend-repo:
base_url: "${BACKEND_REPO_URL}"
branch: "main"
include_paths: ["src/**", "docs/**", "API.md"]
file_types: ["*.md", "*.py", "*.yaml", "*.json"]
token: "${REPO_TOKEN}"
🔐 Security Configuration
Environment Variables for Credentials
# .env file - never commit to version control
QDRANT_URL=http://localhost:6333
QDRANT_API_KEY=your-qdrant-cloud-api-key
QDRANT_COLLECTION_NAME=my_documents
OPENAI_API_KEY=sk-your-openai-api-key
# Git authentication
GITHUB_TOKEN=ghp_your-github-token
REPO_TOKEN=your-personal-access-token
# Confluence authentication (Cloud)
CONFLUENCE_URL=https://company.atlassian.net
CONFLUENCE_TOKEN=your-confluence-api-token
CONFLUENCE_EMAIL=your-email@company.com
# JIRA authentication (Cloud)
JIRA_URL=https://company.atlassian.net
JIRA_TOKEN=your-jira-api-token
JIRA_EMAIL=your-email@company.com
# For Data Center/Server deployments
CONFLUENCE_PAT=your-personal-access-token
JIRA_PAT=your-personal-access-token
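Because the .env file holds every credential, make sure it is excluded from version control before your first commit (a minimal .gitignore addition that also covers the .env.dev and .env.prod variants used later in this guide):
# Keep credential files out of version control
cat >> .gitignore << EOF
.env
.env.*
EOF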
Secure Configuration Practices
# config.yaml - safe to commit (no secrets)
global:
qdrant:
url: "${QDRANT_URL}"
api_key: "${QDRANT_API_KEY}" # Reference environment variable
openai:
api_key: "${OPENAI_API_KEY}" # Reference environment variable
projects:
example:
sources:
confluence:
space:
base_url: "${CONFLUENCE_URL}"
token: "${CONFLUENCE_TOKEN}"
email: "${CONFLUENCE_EMAIL}"
File Permissions
# Secure configuration files
chmod 600 .env
chmod 644 config.yaml
# Secure workspace subdirectories
chmod 700 logs/
chmod 700 metrics/
🌍 Multi-Environment Setup
Development Environment
# config-dev.yaml
global:
qdrant:
url: "http://localhost:6333"
collection_name: "dev_docs"
processing:
chunk_size: 500 # Smaller for faster testing
projects:
dev-project:
project_id: "dev-project"
display_name: "Development Project"
description: "Development environment testing"
sources:
localfile:
test-docs:
base_url: "file://./test-data"
include_paths: ["**/*.md"]
file_types: ["*.md"]
Production Environment
# config-prod.yaml
global:
qdrant:
url: "${QDRANT_PROD_URL}"
api_key: "${QDRANT_PROD_API_KEY}"
collection_name: "production_docs"
processing:
chunk_size: 1200
max_file_size: 104857600 # 100MB
projects:
prod-project:
project_id: "prod-project"
display_name: "Production Project"
description: "Production documentation"
sources:
git:
prod-repo:
base_url: "${PROD_REPO_URL}"
token: "${PROD_REPO_TOKEN}"
confluence:
prod-space:
base_url: "${CONFLUENCE_PROD_URL}"
token: "${CONFLUENCE_PROD_TOKEN}"
Using Different Configurations
# Use specific configuration file
qdrant-loader --config config-dev.yaml --env .env.dev init
qdrant-loader --config config-prod.yaml --env .env.prod ingest
# Use workspace mode with different environments
qdrant-loader --workspace ./dev-workspace init
qdrant-loader --workspace ./prod-workspace ingest
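If you prefer workspace mode for multiple environments, a common arrangement is one workspace directory per environment, each with its own configuration and credentials (a sketch, not a requirement):
dev-workspace/
├── config.yaml   # development settings
└── .env          # development credentials
prod-workspace/
├── config.yaml   # production settings
└── .env          # production credentials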
🔧 Performance Tuning
For Large Datasets
global:
processing:
chunk_size: 1500 # Larger chunks for better context
chunk_overlap: 400 # More overlap for continuity
max_file_size: 209715200 # 200MB
openai:
batch_size: 200 # Larger batches for efficiency
For Fast Ingestion
global:
processing:
chunk_size: 800 # Smaller chunks process faster
chunk_overlap: 100 # Less overlap for speed
openai:
batch_size: 500 # Maximum batch size
max_retries: 1 # Fewer retries for speed
timeout: 10 # Shorter timeout
For Memory Efficiency
global:
processing:
chunk_size: 500 # Smaller chunks use less memory
max_file_size: 10485760 # 10MB
openai:
batch_size: 50 # Smaller batches
✅ Configuration Validation
Test Your Configuration
# Display current configuration
qdrant-loader --workspace . config
# Check project status and connections
qdrant-loader project --workspace . status
# List all projects
qdrant-loader project --workspace . list
# Validate specific project
qdrant-loader project --workspace . validate --project-id my-project
Common Configuration Issues
1. Invalid YAML Syntax
Error: yaml.scanner.ScannerError
Solution:
# Check YAML syntax
python -c "import yaml; yaml.safe_load(open('config.yaml'))"
# Use proper indentation (2 spaces)
# Use quotes for strings with special characters
2. Missing Environment Variables
Error: KeyError: 'OPENAI_API_KEY'
Solution:
# Check environment variables
env | grep QDRANT
env | grep OPENAI
# Set missing variables
export OPENAI_API_KEY="your-key-here"
3. Connection Failures
Error: ConnectionError: Unable to connect to QDrant
Solution:
# Test QDrant connection
curl http://localhost:6333/healthz
# Check configuration
qdrant-loader --workspace . config
4. Invalid Project Structure
Error: Legacy configuration format detected
Solution: Update to multi-project format:
# OLD (legacy) - not supported
sources:
git:
my-repo: {...}
# NEW (multi-project) - required
projects:
default:
project_id: "default"
display_name: "My Project"
sources:
git:
my-repo: {...}
📋 Configuration Checklist
- [ ] Environment variables set for all credentials
- [ ] Configuration file created with multi-project structure
- [ ] QDrant connection tested successfully
- [ ] OpenAI API key configured and tested
- [ ] Projects defined with appropriate sources
- [ ] File permissions secured (600 for .env files)
- [ ] Workspace structure created if using workspace mode
- [ ] Performance settings tuned for your dataset size
- [ ] Source configurations validated for each project
- [ ] Backup strategy for configuration files
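The mechanical items on this checklist can be run through quickly from the workspace root (a short sketch combining the commands used earlier in this guide):
# Permissions, YAML syntax, and project status in one pass
ls -l .env                                              # expect -rw------- (600)
python -c "import yaml; yaml.safe_load(open('config.yaml'))" && echo "YAML OK"
qdrant-loader --workspace . config
qdrant-loader project --workspace . status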
🔗 Next Steps
With your configuration complete:
- Core Concepts - Understand how QDrant Loader works
- User Guides - Explore specific features and workflows
- Data Source Guides - Configure specific connectors
- MCP Server Setup - Set up AI tool integration
- CLI Reference - Learn all available commands
Configuration Complete! 🎉
Your QDrant Loader is now configured with the proper multi-project structure. You can start ingesting documents and using the search capabilities with your AI tools.