Workspace Mode Configuration

This guide covers how to configure QDrant Loader using workspace mode, which provides an organized directory structure and simpler configuration management for your projects.

🎯 Overview

Workspace mode in QDrant Loader keeps your configuration files, logs, metrics, and state database together in a dedicated directory. It automatically discovers configuration files and creates the subdirectories it needs.

What Workspace Mode Provides

📁 Workspace Directory
├── config.yaml          # Main configuration file
├── .env                 # Environment variables (optional)
├── logs/                # Application logs
│   └── qdrant-loader.log
├── metrics/             # Performance metrics
└── data/                # State database
    └── state.db

Benefits of Workspace Mode

  • Auto-discovery: Automatically finds config.yaml and .env files
  • Organized structure: Creates dedicated directories for logs and metrics
  • Simplified commands: No need to specify config file paths
  • Consistent layout: Standardized project organization

🏗️ Setting Up Workspace Mode

Create Workspace Directory

# Create workspace directory (run from the qdrant-loader repository root)
mkdir my-qdrant-workspace

# Copy the configuration templates into the workspace
cp packages/qdrant-loader/conf/config.template.yaml my-qdrant-workspace/config.yaml
cp packages/qdrant-loader/conf/.env.template my-qdrant-workspace/.env

cd my-qdrant-workspace
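Before running any command, it can help to confirm that the workspace contains the files auto-discovery looks for. The helper below is a hypothetical sketch, not part of QDrant Loader: it only mirrors the layout described above, where config.yaml is required, .env is optional, and logs/, metrics/ and data/ are created by the tool itself.

```bash
# Hypothetical helper (not a QDrant Loader command): checks the files that
# workspace mode discovers. config.yaml is required, .env is optional;
# logs/, metrics/ and data/ are created by the tool, so they are not checked.
check_workspace() {
  local dir="${1:-.}"
  if [ ! -f "$dir/config.yaml" ]; then
    echo "missing: $dir/config.yaml" >&2
    return 1
  fi
  if [ -f "$dir/.env" ]; then
    echo "found: config.yaml, .env"
  else
    echo "found: config.yaml (no .env)"
  fi
}
```

From inside the workspace, check_workspace . && qdrant-loader --workspace . init runs init only when config.yaml is present.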

Basic Configuration Structure

QDrant Loader uses a multi-project configuration structure where all projects share a single Qdrant collection but are isolated through project metadata:

# config.yaml - Multi-project configuration
global_config:
  qdrant:
    url: "http://localhost:6333"
    api_key: null  # Optional for Qdrant Cloud
    collection_name: "my_documents"  # Shared by all projects

  embedding:
    model: "text-embedding-3-small"
    api_key: "${OPENAI_API_KEY}"
    vector_size: 1536

projects:
  docs-project:
    project_id: "docs-project"
    display_name: "Documentation Project"
    description: "Company documentation"

    sources:
      git:
        docs-repo:
          base_url: "https://github.com/company/docs"
          branch: "main"
          include_paths: ["docs/**", "*.md"]
          token: "${GITHUB_TOKEN}"

  wiki-project:
    project_id: "wiki-project"
    display_name: "Wiki Project"
    description: "Internal wiki content"

    sources:
      confluence:
        company-wiki:
          base_url: "https://company.atlassian.net/wiki"
          space_key: "WIKI"
          token: "${CONFLUENCE_TOKEN}"
          email: "${CONFLUENCE_EMAIL}"

Environment Variables

# .env file
# Required - Qdrant database
QDRANT_URL=http://localhost:6333
QDRANT_API_KEY=your-qdrant-cloud-key  # Optional

# Required - OpenAI API
OPENAI_API_KEY=your-openai-api-key

# Optional - Source credentials
GITHUB_TOKEN=your-github-token
CONFLUENCE_TOKEN=your-confluence-token
CONFLUENCE_EMAIL=your-email@company.com
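To catch a missing credential before a long ingest, you can load the .env file into your shell and check the variables marked as required above. This is a sketch under two assumptions: the file contains only simple KEY=value lines (as in the example), and you trust it, because sourcing executes it as shell code. load_env and require_env are hypothetical helper names.

```bash
# Sketch: load a simple KEY=value .env file into the current shell.
# Sourcing runs the file as shell code, so only use it on a trusted .env.
load_env() {
  local file="${1:-./.env}"
  [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
  set -a            # auto-export every variable assigned while sourcing
  . "$file"
  set +a
}

# Hypothetical check: fail if any named variable is unset or empty.
require_env() {
  local missing=0 name value
  for name in "$@"; do
    # Indirect lookup that works in POSIX sh as well as bash
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: load_env ./.env && require_env QDRANT_URL OPENAI_API_KEY. Pass a path containing a slash (./.env rather than .env), since the . builtin may otherwise search PATH.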

⚙️ Workspace Commands

Initialize Workspace

# Initialize collection and prepare workspace
qdrant-loader --workspace . init

# Force recreation of existing collection
qdrant-loader --workspace . init --force

Ingest Data

# Process all projects and sources
qdrant-loader --workspace . ingest

# Process specific project
qdrant-loader --workspace . ingest --project docs-project

# Process specific source type across all projects
qdrant-loader --workspace . ingest --source-type git

# Process specific source within a project
qdrant-loader --workspace . ingest --project docs-project --source docs-repo
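Because --project limits a run to one project, you can ingest projects one at a time and see exactly which ones failed. The sketch below uses only the flags shown above; the project IDs come from the example config.yaml, and QDRANT_LOADER is an illustrative override for the command (for example echo for a dry run), not a QDrant Loader setting.

```bash
# Sketch: ingest each project separately and report failures at the end.
# QDRANT_LOADER is a hypothetical override for the command to run.
ingest_projects() {
  local failed="" project
  for project in "$@"; do
    # Left unquoted on purpose so the override may contain arguments
    if ! ${QDRANT_LOADER:-qdrant-loader} --workspace . ingest --project "$project"; then
      failed="$failed $project"
    fi
  done
  if [ -n "$failed" ]; then
    echo "failed projects:$failed" >&2
    return 1
  fi
}
```

For example, ingest_projects docs-project wiki-project runs two separate ingests from inside the workspace.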

Configuration Management

# Show current configuration
qdrant-loader --workspace . config

# Validate configuration
qdrant-loader project --workspace . validate

# List all projects
qdrant-loader project --workspace . list

# Show project status
qdrant-loader project --workspace . status

📁 Project Management

Project Structure

Each project in the configuration has:

  • project_id: Unique identifier used for filtering and metadata
  • display_name: Human-readable name for the project
  • description: Brief description of the project's purpose
  • sources: Configuration for data sources (git, confluence, jira, etc.)

Project Commands

# List all configured projects
qdrant-loader project --workspace . list

# Show detailed project information
qdrant-loader project --workspace . status

# Show status for specific project
qdrant-loader project --workspace . status --project-id docs-project

# Validate project configurations
qdrant-loader project --workspace . validate

# Output in JSON format
qdrant-loader project --workspace . list --format json

Project Isolation

Projects are isolated through metadata, not separate collections:

  • All projects use the same Qdrant collection
  • Each document includes project_id in metadata
  • Documents can be filtered by project_id in Qdrant queries (project-aware search in the MCP server is not yet available)
  • Simplifies collection management and enables cross-project search
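Since isolation is a payload filter rather than a separate collection, you can inspect one project's share of the collection directly through Qdrant's REST API. This sketch uses Qdrant's points count endpoint with a must/match filter. The payload key is an assumption: it defaults to project_id, but if QDrant Loader stores the value in a nested field (for example metadata.project_id), pass that path as the third argument.

```bash
# Build a Qdrant count request that matches one project's points.
# Assumes plain project IDs (no quotes or backslashes), as in config.yaml.
project_count_body() {
  local project="$1" key="${2:-project_id}"
  printf '{"filter":{"must":[{"key":"%s","match":{"value":"%s"}}]},"exact":true}\n' "$key" "$project"
}

# Sketch: count points for a project; the payload key is an assumption.
count_project_points() {
  local collection="$1" project="$2" key="${3:-project_id}" body
  body="$(project_count_body "$project" "$key")"
  # Send the api-key header only when QDRANT_API_KEY is set (Qdrant Cloud)
  if [ -n "${QDRANT_API_KEY:-}" ]; then
    set -- -H "api-key: $QDRANT_API_KEY"
  else
    set --
  fi
  curl -s -X POST "${QDRANT_URL:-http://localhost:6333}/collections/$collection/points/count" \
    -H "Content-Type: application/json" "$@" -d "$body"
}
```

For example, count_project_points my_documents docs-project asks Qdrant how many points in the shared collection carry that project ID.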

🔍 Data Source Configuration

Supported Source Types

Git Repositories

sources:
  git:
    repo-name:
      base_url: "https://github.com/user/repo"
      branch: "main"
      include_paths: ["docs/**", "*.md"]
      exclude_paths: ["node_modules/**"]
      file_types: ["*.md", "*.rst", "*.txt"]
      token: "${GITHUB_TOKEN}"
      enable_file_conversion: true

Confluence

sources:
  confluence:
    wiki-name:
      base_url: "https://company.atlassian.net/wiki"
      deployment_type: "cloud"
      space_key: "DOCS"
      content_types: ["page", "blogpost"]
      token: "${CONFLUENCE_TOKEN}"
      email: "${CONFLUENCE_EMAIL}"
      enable_file_conversion: true
      download_attachments: true

JIRA

sources:
  jira:
    project-name:
      base_url: "https://company.atlassian.net"
      deployment_type: "cloud"
      project_key: "PROJ"
      token: "${JIRA_TOKEN}"
      email: "${JIRA_EMAIL}"
      enable_file_conversion: true
      download_attachments: true

Local Files

sources:
  localfile:
    local-docs:
      base_url: "file:///path/to/files"
      include_paths: ["docs/**"]
      exclude_paths: ["tmp/**"]
      file_types: ["*.md", "*.txt"]
      max_file_size: 1048576  # 1MB
      enable_file_conversion: true
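The max_file_size value is in bytes: 1048576 = 1024 × 1024, i.e. 1 MiB. Before ingesting a local source, you can preview which files are larger than that limit. This is an approximation under stated assumptions: it does not apply include_paths, exclude_paths or file_types, and it assumes files strictly larger than the limit are the ones affected.

```bash
# Sketch: list files under a directory that are larger than max_file_size.
# Does not apply the source's include/exclude paths or file_types.
oversized_files() {
  local dir="$1" max_bytes="${2:-1048576}"
  # -size +Nc matches files strictly larger than N bytes
  find "$dir" -type f -size +"${max_bytes}c"
}
```

For example, oversized_files /path/to/files 1048576 lists candidates to exclude or to handle with a higher limit.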

Public Documentation

sources:
  publicdocs:
    docs-site:
      base_url: "https://docs.example.com"
      version: "1.0"
      content_type: "html"
      path_pattern: "/docs/{version}/**"
      selectors:
        content: "article.main-content"
        remove: ["nav", "header", "footer"]
      enable_file_conversion: true
      download_attachments: true

🔧 Advanced Configuration

Global Settings

global_config:
  # Chunking configuration
  chunking:
    chunk_size: 1500
    chunk_overlap: 200

  # Embedding configuration
  embedding:
    endpoint: "https://api.openai.com/v1"
    model: "text-embedding-3-small"
    api_key: "${OPENAI_API_KEY}"
    batch_size: 100
    vector_size: 1536
    max_tokens_per_request: 8000
    max_tokens_per_chunk: 8000

  # File conversion settings
  file_conversion:
    max_file_size: 52428800  # 50MB
    conversion_timeout: 300  # 5 minutes
    markitdown:
      enable_llm_descriptions: false
      llm_model: "gpt-4o"
      llm_api_key: "${OPENAI_API_KEY}"

State Management

Workspace mode automatically manages the state database:

global_config:
  state_management:
    database_path: "${STATE_DB_PATH}"  # Ignored in workspace mode
    table_prefix: "qdrant_loader_"
    connection_pool:
      size: 5
      timeout: 30

In workspace mode, the state database is always created as data/state.db inside the workspace directory, and the database_path setting is ignored.

📊 Workspace Structure

Directory Layout

my-qdrant-workspace/
├── config.yaml              # Main configuration
├── .env                     # Environment variables
├── logs/                    # Application logs
│   └── qdrant-loader.log
├── metrics/                 # Performance metrics
│   └── ingestion_metrics.json
└── data/
    └── state.db             # Processing state database

Log Files

Workspace mode automatically configures logging:

  • Location: logs/qdrant-loader.log
  • Format: Structured logging with timestamps
  • Rotation: Logs are rotated automatically only when rotation is configured
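Because the log path is fixed, you can follow it during an ingest with tail -f logs/qdrant-loader.log, or summarize it afterwards. The exact line format is not documented here, so the sketch below simply assumes the level name (such as ERROR) appears as a separate word on each line; count_log_level is a hypothetical helper.

```bash
# Sketch: count log lines that mention a level name as a whole word.
# Assumes the level appears literally in each line; the format is not specified.
count_log_level() {
  local level="${1:-ERROR}" file="${2:-logs/qdrant-loader.log}"
  [ -f "$file" ] || { echo "no log file: $file" >&2; return 1; }
  # grep -c prints 0 but exits 1 when nothing matches; ignore that status
  grep -cw -- "$level" "$file" || true
}
```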

Metrics

Performance metrics are stored in the metrics/ directory:

  • Ingestion metrics: Processing statistics and performance data
  • Error tracking: Failed operations and retry attempts
  • Resource usage: Memory and processing time metrics

🔗 MCP Server Integration

The MCP server currently reads its configuration from environment variables and does not support workspace mode directly. Pass its settings as environment variables instead, for example in your MCP client's server configuration:

{
  "mcpServers": {
    "qdrant-loader": {
      "command": "mcp-qdrant-loader",
      "env": {
        "QDRANT_URL": "http://localhost:6333",
        "QDRANT_API_KEY": "your-api-key",
        "QDRANT_COLLECTION_NAME": "my_documents",
        "OPENAI_API_KEY": "your-openai-key"
      }
    }
  }
}

MCP Server Environment Variables

The MCP server requires these environment variables:

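  • QDRANT_URL: URL of your Qdrant instance (required)
  • QDRANT_API_KEY: API key for Qdrant authentication (optional)
  • QDRANT_COLLECTION_NAME: Name of the collection to use (default: "documents"); set it to the collection_name from config.yaml so the server searches the collection the loader writes to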
  • OPENAI_API_KEY: OpenAI API key for embeddings (required)
  • MCP_DISABLE_CONSOLE_LOGGING: Set to "true" to disable console logging (optional)
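Until the server understands workspaces, one option is a small wrapper that reuses the workspace's .env and reads the collection name from config.yaml, so the server does not fall back to its "documents" default while the loader writes to "my_documents". This is a sketch with assumptions: the .env uses the simple KEY=value format shown earlier, and config.yaml has a single-line collection_name entry (the sed expression is not a YAML parser). read_collection_name and start_mcp_server are hypothetical names.

```bash
# Extract the first single-line collection_name value from config.yaml.
# Handles quoted or unquoted values; not a general YAML parser.
read_collection_name() {
  sed -n 's/^[[:space:]]*collection_name:[[:space:]]*"\{0,1\}\([^"#[:space:]]*\).*/\1/p' \
    "${1:-config.yaml}" | head -n 1
}

# Sketch: run from inside the workspace to start the MCP server with the
# workspace's credentials and collection name.
start_mcp_server() {
  set -a                # export the .env values to the server process
  . ./.env
  set +a
  QDRANT_COLLECTION_NAME="$(read_collection_name config.yaml)"
  export QDRANT_COLLECTION_NAME
  mcp-qdrant-loader
}
```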

Current Limitations

  • The MCP server does not support the --workspace flag
  • Configuration must be done through environment variables
  • The --config option exists but is not currently implemented
  • Project-aware search is not yet available in the MCP server

Future Workspace Integration

Workspace mode support for the MCP server is planned for future releases, which would allow:

  • Automatic discovery of workspace configuration
  • Project-aware search capabilities
  • Simplified configuration through workspace files

🚀 Getting Started Checklist

  • [ ] Create workspace directory and navigate to it
  • [ ] Copy configuration template to config.yaml
  • [ ] Create environment file with required credentials
  • [ ] Configure projects with your data sources
  • [ ] Initialize collection with qdrant-loader --workspace . init
  • [ ] Ingest data with qdrant-loader --workspace . ingest
  • [ ] Verify setup with qdrant-loader project --workspace . list
  • [ ] Test search through MCP server integration
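The command-line steps of this checklist can be chained so the run stops at the first failure. This sketch assumes you run it from inside the workspace after filling in config.yaml and .env; QDRANT_LOADER is an illustrative override (for example QDRANT_LOADER=echo for a dry run), not a QDrant Loader setting. The MCP search test remains a manual step.

```bash
# Sketch: init, ingest, then list projects; stop at the first failing step.
# QDRANT_LOADER is a hypothetical override for the command to run.
bootstrap_workspace() {
  local ql="${QDRANT_LOADER:-qdrant-loader}"
  # $ql is left unquoted so the override may contain arguments
  $ql --workspace . init || return 1
  $ql --workspace . ingest || return 1
  $ql project --workspace . list || return 1
}
```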

Workspace mode provides organized, scalable configuration management for your QDrant Loader projects. 🎉

This structured approach simplifies project management while maintaining flexibility for complex multi-source configurations.

Generated from workspace-mode.md