Performance Issues Guide

This guide helps you diagnose and resolve performance issues with QDrant Loader, including slow data loading, memory problems, and optimization strategies. Whether you're dealing with large datasets or need to improve processing times, this guide provides practical solutions using the actual CLI commands and configuration options.

🎯 Performance Issue Types

Quick Diagnosis

🐌 Slow data loading          See [Loading Performance](#loading-performance-issues)
💾 High memory usage          See [Memory Issues](#memory-issues)
🔥 High CPU usage            See [CPU Issues](#cpu-issues)
📊 Poor throughput           See [Throughput Optimization](#throughput-optimization)
🌐 Network bottlenecks       See [Network Performance](#network-performance)

📊 Performance Monitoring

Basic Performance Metrics

# Check system resources
htop
iostat -x 1
free -h

# Check project status and validation
qdrant-loader --workspace . project status
qdrant-loader --workspace . project validate

# Monitor QDrant instance health
curl -s "$QDRANT_URL/health"
curl -s "$QDRANT_URL/metrics" | grep -E "(memory|cpu|disk)"

# Monitor network usage
iftop
nethogs

Performance Benchmarking

# Benchmark data loading with timing
time qdrant-loader --workspace . ingest --project my-project

# Test configuration validation
time qdrant-loader --workspace . project validate --project-id my-project

# Monitor file processing
find ./docs -type f -name "*.md" -exec wc -c {} + | sort -n
find ./docs -type f | wc -l

🚀 Loading Performance Issues

Issue: Slow data loading

Symptoms:

  • Ingestion takes much longer than expected
  • High CPU usage during processing
  • Memory usage grows continuously
  • Process appears to hang

Diagnostic Steps:

# Check file sizes and counts in your data sources
find ./docs -type f -name "*.md" -exec wc -c {} + | sort -n
find ./docs -type f | wc -l

# Validate project configuration
qdrant-loader --workspace . project validate --project-id my-project

# Check project status
qdrant-loader --workspace . project status --project-id my-project

Optimization Solutions:

  1. Optimize file conversion settings:
# In your workspace config file
global_config:
  file_conversion:
    max_file_size: 10485760  # 10MB limit
    conversion_timeout: 30   # 30 seconds timeout
    markitdown:
      enable_llm_descriptions: false  # Disable for faster processing
  1. Filter unnecessary files:
# In your project configuration
projects:
  my-project:
    sources:
      local_files:
        my-docs:
          base_url: "file:///path/to/docs"
          include_paths:
            - "*.md"
            - "*.txt"
          exclude_paths:
            - "node_modules/**"
            - ".git/**"
            - "*.pdf"  # Skip large PDFs if not needed
          file_types:
            - "md"
            - "txt"
          max_file_size: 5242880  # 5MB limit
  1. Process in smaller batches:
# Process specific projects only
qdrant-loader --workspace . ingest --project specific-project

# Use force flag to reprocess if needed
qdrant-loader --workspace . init --force
qdrant-loader --workspace . ingest --project my-project

Issue: Memory usage grows during loading

Symptoms:

  • RAM usage increases continuously
  • System becomes unresponsive
  • Out of memory errors
  • Swap usage increases

Solutions:

# Monitor memory usage during processing
watch -n 1 'free -h && ps aux | grep qdrant-loader'

# Process smaller datasets first
qdrant-loader --workspace . ingest --project small-project

# Check configuration for memory-intensive settings
qdrant-loader --workspace . config

Memory Optimization:

# Memory-efficient configuration
global_config:
  file_conversion:
    max_file_size: 2097152    # 2MB limit
    conversion_timeout: 15    # Shorter timeout
    markitdown:
      enable_llm_descriptions: false  # Disable LLM processing

projects:
  my-project:
    sources:
      local_files:
        docs:
          max_file_size: 1048576  # 1MB limit per file
          file_types:
            - "md"
            - "txt"
          # Exclude large file types
          exclude_paths:
            - "*.pdf"
            - "*.zip"
            - "*.tar.gz"

Issue: Loading fails with large files

Symptoms:

  • Processing stops on specific large files
  • Timeout errors
  • Memory allocation errors
  • File conversion failures

Solutions:

# Identify large files
find ./docs -type f -size +10M -exec ls -lh {} \;

# Configure smaller file size limits
qdrant-loader --workspace . config

Large File Configuration:

# Handle large files appropriately
global_config:
  file_conversion:
    max_file_size: 5242880    # 5MB limit
    conversion_timeout: 60    # Longer timeout for large files

projects:
  my-project:
    sources:
      local_files:
        large-docs:
          base_url: "file:///path/to/large-docs"
          max_file_size: 10485760  # 10MB for this specific source
          file_types:
            - "md"
            - "txt"
          exclude_paths:
            - "*.pdf"  # Skip PDFs that are too large

💾 Memory Issues

Issue: High memory usage

Symptoms:

  • QDrant Loader uses excessive RAM
  • System becomes slow
  • Other applications affected
  • Swap usage increases

Diagnostic Steps:

# Monitor memory usage
ps aux | grep qdrant-loader
free -h

# Check project configuration for memory-intensive settings
qdrant-loader --workspace . project validate --project-id my-project

Solutions:

# Set system memory limits
ulimit -m 2097152  # 2GB limit

# Process projects individually
qdrant-loader --workspace . ingest --project project1
qdrant-loader --workspace . ingest --project project2

# Use smaller file size limits
qdrant-loader --workspace . config

Memory-Efficient Configuration:

# Optimize for lower memory usage
global_config:
  file_conversion:
    max_file_size: 1048576    # 1MB limit
    conversion_timeout: 30
    markitdown:
      enable_llm_descriptions: false

projects:
  my-project:
    sources:
      local_files:
        docs:
          max_file_size: 524288  # 512KB limit
          file_types:
            - "md"
            - "txt"

🔥 CPU Issues

Issue: High CPU usage

Symptoms:

  • CPU usage consistently above 80%
  • System becomes unresponsive
  • Fan noise increases
  • Other processes slow down

Solutions:

# Limit CPU usage with nice
nice -n 10 qdrant-loader --workspace . ingest

# Process smaller batches
qdrant-loader --workspace . ingest --project small-project

# Use CPU throttling
cpulimit -l 50 qdrant-loader --workspace . ingest

CPU Optimization:

# CPU-efficient configuration
global_config:
  file_conversion:
    conversion_timeout: 15    # Shorter processing time
    markitdown:
      enable_llm_descriptions: false  # Disable CPU-intensive LLM processing

projects:
  my-project:
    sources:
      local_files:
        docs:
          file_types:
            - "md"
            - "txt"
          # Exclude CPU-intensive file types
          exclude_paths:
            - "*.pdf"
            - "*.docx"
            - "*.pptx"

📈 Throughput Optimization

Optimizing Data Loading Throughput

# Process multiple projects efficiently
qdrant-loader --workspace . project list
qdrant-loader --workspace . ingest --project project1
qdrant-loader --workspace . ingest --project project2

# Validate configuration before processing
qdrant-loader --workspace . project validate

Configuration Optimization

# Optimized configuration for better throughput
global_config:
  qdrant:
    url: "${QDRANT_URL}"
    api_key: "${QDRANT_API_KEY}"
    collection_name: "${QDRANT_COLLECTION_NAME}"

  openai:
    api_key: "${OPENAI_API_KEY}"

  file_conversion:
    max_file_size: 5242880    # 5MB - balance between size and processing time
    conversion_timeout: 30
    markitdown:
      enable_llm_descriptions: false  # Faster processing

projects:
  my-project:
    sources:
      local_files:
        docs:
          base_url: "file:///path/to/docs"
          file_types:
            - "md"
            - "txt"
            - "rst"
          max_file_size: 2097152  # 2MB per file
          include_paths:
            - "docs/**"
            - "*.md"
          exclude_paths:
            - "node_modules/**"
            - ".git/**"
            - "*.log"

🌐 Network Performance

Issue: Slow network operations

Symptoms:

  • Slow loading from remote sources (Git, Confluence, JIRA)
  • Timeouts connecting to QDrant
  • High network latency
  • Connection drops

Diagnostic Steps:

# Test QDrant connectivity
curl -w "@curl-format.txt" -o /dev/null -s "$QDRANT_URL/health"

# Test API endpoints
curl -H "Authorization: Bearer $QDRANT_API_KEY" "$QDRANT_URL/collections"

# Monitor network usage
iftop -i eth0

Solutions:

# Network-optimized configuration
global_config:
  qdrant:
    url: "${QDRANT_URL}"
    api_key: "${QDRANT_API_KEY}"
    collection_name: "${QDRANT_COLLECTION_NAME}"

projects:
  my-project:
    sources:
      git:
        my-repo:
          base_url: "https://github.com/user/repo.git"
          branch: "main"
          token: "${REPO_TOKEN}"
          # Optimize for network performance
          include_paths:
            - "docs/**"
          exclude_paths:
            - "*.pdf"
            - "*.zip"
            - ".git/**"
          file_types:
            - "md"
            - "txt"
          max_file_size: 1048576  # 1MB to reduce network load

      confluence:
        my-confluence:
          base_url: "${CONFLUENCE_URL}"
          deployment_type: "cloud"
          space_key: "DOCS"
          email: "${CONFLUENCE_EMAIL}"
          token: "${CONFLUENCE_TOKEN}"
          # Network optimization
          content_types:
            - "page"
          download_attachments: false  # Reduce network load

🔧 Advanced Optimization

Project Structure Optimization

# Organize projects for optimal processing
global_config:
  qdrant:
    url: "${QDRANT_URL}"
    api_key: "${QDRANT_API_KEY}"
    collection_name: "${QDRANT_COLLECTION_NAME}"

  openai:
    api_key: "${OPENAI_API_KEY}"

  file_conversion:
    max_file_size: 5242880
    conversion_timeout: 30
    markitdown:
      enable_llm_descriptions: false

# Separate projects by data source type for better management
projects:
  local-docs:
    sources:
      local_files:
        documentation:
          base_url: "file:///path/to/docs"
          file_types: ["md", "txt"]
          max_file_size: 2097152

  git-repos:
    sources:
      git:
        main-repo:
          base_url: "https://github.com/user/repo.git"
          branch: "main"
          token: "${REPO_TOKEN}"
          file_types: ["md", "py", "js"]

  confluence-content:
    sources:
      confluence:
        company-wiki:
          base_url: "${CONFLUENCE_URL}"
          deployment_type: "cloud"
          space_key: "DOCS"
          email: "${CONFLUENCE_EMAIL}"
          token: "${CONFLUENCE_TOKEN}"
          download_attachments: false

System-Level Optimization

# Increase file descriptor limits
echo "* soft nofile 65536" >> /etc/security/limits.conf
echo "* hard nofile 65536" >> /etc/security/limits.conf

# Monitor system resources during processing
top -p $(pgrep -f qdrant-loader)

# Check disk space
df -h

# Monitor I/O
iostat -x 1

📊 Performance Tuning Presets

Small Dataset (< 1GB)

global_config:
  file_conversion:
    max_file_size: 10485760   # 10MB
    conversion_timeout: 60
    markitdown:
      enable_llm_descriptions: true  # Can afford LLM processing

projects:
  small-project:
    sources:
      local_files:
        docs:
          max_file_size: 5242880  # 5MB per file

Medium Dataset (1-10GB)

global_config:
  file_conversion:
    max_file_size: 5242880    # 5MB
    conversion_timeout: 30
    markitdown:
      enable_llm_descriptions: false  # Skip for performance

projects:
  medium-project:
    sources:
      local_files:
        docs:
          max_file_size: 2097152  # 2MB per file
          exclude_paths:
            - "*.pdf"
            - "*.zip"

Large Dataset (> 10GB)

global_config:
  file_conversion:
    max_file_size: 2097152    # 2MB
    conversion_timeout: 15
    markitdown:
      enable_llm_descriptions: false

projects:
  large-project:
    sources:
      local_files:
        docs:
          max_file_size: 1048576  # 1MB per file
          file_types:
            - "md"
            - "txt"
          exclude_paths:
            - "*.pdf"
            - "*.docx"
            - "*.pptx"
            - "*.zip"
            - "*.tar.gz"

🚨 Performance Emergency Procedures

When System is Unresponsive

# 1. Check system resources
top
df -h
free -h

# 2. Kill runaway processes
pkill -f qdrant-loader

# 3. Clear system caches
sync && echo 3 > /proc/sys/vm/drop_caches

# 4. Restart with minimal configuration
qdrant-loader --workspace . project validate
qdrant-loader --workspace . project status

Performance Recovery

# 1. Check system resources
top
df -h
free -h

# 2. Validate configuration
qdrant-loader --workspace . project validate

# 3. Restart processing with smaller scope
qdrant-loader --workspace . init --force
qdrant-loader --workspace . ingest --project small-project

# 4. Monitor progress
qdrant-loader --workspace . project status

📈 Performance Monitoring

Key Metrics to Track

# Monitor system resources
htop
free -h
df -h
iostat -x 1

# Check project status
qdrant-loader --workspace . project list
qdrant-loader --workspace . project status

# Validate configuration
qdrant-loader --workspace . project validate

Performance Testing

# Time the ingestion process
time qdrant-loader --workspace . ingest --project test-project

# Monitor memory usage during processing
watch -n 1 'free -h && ps aux | grep qdrant-loader'

# Check file processing statistics
find ./docs -type f -name "*.md" | wc -l
du -sh ./docs

Performance optimized! 🚀

This guide covers comprehensive performance optimization strategies using actual QDrant Loader commands and configuration options. For specific error messages, check the Error Messages Reference, and for general issues, see the Common Issues Guide.

Back to Documentation
Generated from performance-issues.md