Release Notes
Version 0.5.0 - July 25, 2025
๐ Major Features
Advanced Search Intelligence
- Cross-document intelligence: Document similarity analysis, clustering, and relationship detection
- Intent-aware adaptive search: AI-powered query understanding and strategy selection
- Knowledge graph integration: Entity relationships and multi-hop reasoning capabilities
- Topic-driven search chaining: Automatic topic discovery and related content suggestions
- Dynamic faceted search: Real-time facet generation and filtering interface
Enhanced Semantic Analysis
- spaCy integration: Advanced NLP processing with configurable language models
- Improved topic extraction: Enhanced LDA modeling with optimized parameters
- Entity recognition: Structured entity and topic conversion in search results
- Semantic analysis configuration: Comprehensive topic modeling settings
CLI & User Experience
- Force ingestion option: Added
--force
flag to bypass change detection for complete reprocessing - Enhanced chunking: Improved timeout handling and performance thresholds
- Better logging: Structured logging for semantic analysis and search components
Version 0.4.15 - July 22, 2025
๐ Major Improvements
Chunking Strategy Overhaul
- Fixed critical chunking inconsistency: All strategies now use character-based
chunk_size
(was mixed token/character interpretation causing 4-5x chunk count differences) - Markdown strategy modularization: Complete refactor into focused components (DocumentParser, SectionSplitter, MetadataExtractor, ChunkProcessor) for better maintainability
- Enhanced hierarchical metadata: Added intelligent section analysis with HeaderAnalysis and SectionMetadata for richer document context
- Smart split level detection: Automatic optimization of header split levels based on document structure and type
- Improved boundary detection: Tokenizer now used for word/token boundaries while respecting character-based limits
- Comprehensive testing: Added integration tests ensuring strategy consistency and preventing regression
Version 0.4.14 - July 13, 2025
๐ Critical Bug Fixes
Excel File Chunking Fixes
- Fixed regex error in table detection: Resolved
bad character range |-\s at position 2
error that was preventing Excel files from being chunked properly - Root cause: Invalid regex pattern
r"^[|-\s:]+$"
in_split_excel_sheet_content
method - Solution: Escaped dash character to create valid pattern:
r"^[|\-\s:]+$"
- Impact: Excel files no longer fall back to default chunking strategy
- Fixed large table chunking logic: Resolved issue where large Excel tables were treated as single massive chunks
- Problem: 128K character files created only 2-5 chunks instead of ~200 chunks at 600-character limit
- Root cause: Large logical units (tables) were not split when exceeding max_size
- Solution: Added intelligent splitting logic that preserves table structure while respecting chunk size limits
- Result: Large Excel files now properly chunk into appropriate sizes (e.g., 74K chars โ 127 chunks @ ~588 chars each)
- Eliminated token limit warnings: Fixed the
Content exceeds maximum token limit, truncating
warnings that occurred with large Excel chunks - Before: Chunks up to 128K characters (47K+ tokens) being truncated
- After: All chunks properly sized to stay within token limits
- Enhanced table structure preservation: Table boundaries are now intelligently detected and preserved during chunking
Technical Improvements
- Better logical unit management: Enhanced
_split_excel_sheet_content
to handle large units by splitting at line boundaries - Preserved table formatting: Chunking algorithm maintains table structure integrity while enforcing size limits
- Improved error handling: Better error messages and fallback behavior for edge cases
- Performance optimization: More efficient chunking for large Excel files without infinite loops
Testing & Validation
- All existing tests pass: 50 markdown strategy tests continue to pass, ensuring backward compatibility
- Verified chunking accuracy: Large test files now produce expected chunk counts with proper size distribution
- Regex pattern validation: Confirmed table detection works correctly for all markdown table formats
Version 0.4.13 - July 11, 2025
โจ New Features
Excel File Chunking Improvements
- Enhanced Excel-to-markdown chunking: Improved MarkdownChunkingStrategy to properly handle Excel files converted to markdown by MarkItDown
- Sheet-aware sectioning: Excel files now split on H2 headers (sheet names) instead of treating the entire file as one "Preamble" section
- Table-aware chunking: Added specialized
_split_excel_sheet_content
method that preserves table structure when splitting large sheets - Intelligent content detection: Automatically detects converted Excel files based on
original_file_type
metadata and applies appropriate chunking rules - Backward compatibility: Regular markdown files continue to use H1-only sectioning, maintaining existing behavior
- Comprehensive testing: Added 3 new test cases covering Excel chunking scenarios and ensuring regular markdown files are unaffected
Technical Improvements
- Context-aware splitting: Different header level thresholds based on file type (H1 for markdown, H1+H2 for Excel)
- Enhanced metadata tracking: Added
is_excel_sheet
metadata to identify Excel-derived chunks - Table boundary preservation: Smart table detection prevents breaking tables in the middle when chunking
- Document reference management: Added proper cleanup of document references to prevent memory leaks
Version 0.4.12 - July 10, 2025
๐ Bug Fixes
Chunking Strategy Improvements
- Fixed missing chunk overlap in MarkdownChunkingStrategy: Implemented proper overlap functionality that was completely missing from markdown file chunking
- Added intelligent overlap calculation: Overlap now respects the configured
chunk_overlap
parameter and uses paragraph/sentence boundaries for natural breaks - Enhanced overlap configuration support: When
chunk_overlap=0
, chunks have no overlap; when configured, up to 25% of chunk content can overlap for better context continuity - Added comprehensive overlap testing: New test suite verifies overlap works correctly across different configurations and content types
Version 0.4.11 - July 10, 2025
๐ Bug Fixes
File Processing & Chunking
- Fixed file size detection limits: Increased default file size limits to handle larger documents (docx, xlsx files up to 5MB)
- Resolved MarkdownChunkingStrategy issues: Fixed chunking strategy to respect
chunk_size
configuration instead of only splitting on H1 headers - Fixed unique chunk ID generation: Resolved issue where chunks from same document had identical IDs, causing overwrites in Qdrant storage
- Enhanced chunk count management: Replaced hard-coded chunk limits with configurable
max_chunks_per_document
setting
Configuration Management
- Improved chunking configuration: Added
max_chunks_per_document
parameter for better control over document processing - Cleaned up redundant settings: Removed conflicting
max_document_size
parameter to maintain clean separation between file size and chunk count limits - Enhanced error messages: Added actionable configuration advice when chunk limits are reached
Processing Pipeline
- Fixed content truncation: Eliminated "maximum chunks per section limit" warnings by making limits dynamic based on user configuration
- Improved chunk estimation: Added better user guidance for optimal chunk count configuration
- Enhanced section handling: Made section limits dynamic (50% of max_chunks_per_document)
Version 0.4.10 - June 18, 2025
๐ Bug Fixes
Windows Compatibility & Logging
- Fixed duplicate debug logging: Resolved
[DEBUG] [DEBUG]
duplicate level tags in both console and file output - Enhanced logging verbosity control: Added filtering for noisy third-party library debug messages (chardet, pdfminer, httpx)
- Improved Windows path formatting: Fixed mixed path separators in log output for consistent cross-platform display
- Complete path normalization: Fixed remaining instances of backslashes in Windows file paths in FileDetector and file processor logging
- Fixed .txt file processing: Resolved issue where
.txt
files were excluded from ingestion whenfile_types: []
was empty
File Processing
- LocalFile connector:
.txt
files now properly processed by default text strategy when no specific file types configured - Git connector: Consistent file type processing logic across all connectors
- Path normalization: All file paths in logs now use forward slashes for consistency
Version 0.4.9 - June 18, 2025
Bug fix
- Issue when deleting a deleted document : missing content_type="md" field to the
_create_deleted_document method
Version 0.4.8 - June 17, 2025
๐ช Windows Compatibility Fixes
- LocalFile Connector: Fixed Windows file URL parsing (
file:///C:/Users/...
now works correctly) - Git Connector: Fixed document URL generation with Windows paths (backslashes โ forward slashes)
- File Conversion: Cross-platform timeout handling (threading on Windows, signals on Unix)
- MarkItDown Integration: Fixed Windows signal compatibility (
signal.SIGALRM
errors resolved) - Console Output: Enhanced emoji handling for clean Windows display
- Logging: Suppressed verbose SQLite logs and fixed duplicate log level display (
[DEBUG] [DEBUG]
โ[DEBUG]
) - Testing: Added 38 Windows compatibility test cases
Version 0.4.7 - June 9, 2025
๐งน Test Suite Improvements
๐ Bug Fixes
CLI and User Experience
- Version check improvements: Fixed upgrade instructions to include
qdrant-loader-mcp-server
package in version check output - Branch display logic: Fixed branch display logic to default to 'main' when branch is unknown in coverage reports
- Error handling: Improved error handling in CLI for invalid input scenarios
๐ Documentation
- Configuration template: Enhanced configuration template with detailed comments for better user guidance
- PublicDocs connector: Improved logging in PublicDocsConnector for better debugging
๐ง Release Process Enhancement
- Release notes validation: Updated release script to automatically check that
RELEASE_NOTES.md
has been updated for new versions before allowing releases - Improved release safety: Enhanced pre-release checks to ensure documentation consistency
Version 0.4.6 - June 3, 2025
๐ User Experience Enhancements
Version Notifications
- Automatic update notifications: CLI now checks for new package versions and notifies users when updates are available
- Non-intrusive background checks: Version checking runs in background without affecting CLI performance
Version 0.4.5 - June 3, 2025
๐ Performance Improvements
CLI Startup Optimization
- CLI startup performance: Reduced startup time by 60-67% for basic commands (#24)
--help
: ~6.8s โ 2.33s (66% improvement)--version
: ~6.3s โ 2.57s (59% improvement)- Lazy loading implementation: Heavy modules now load only when needed (96-97% import time reduction)
- Fixed version detection: Replaced custom parsing with
importlib.metadata.version()
- works in all environments - Resolved circular imports: Eliminated
config
โconnectors
โconfig
dependency cycle
๐จ User Experience Enhancements
Excel File Processing
- Warning capture system: Intercepts openpyxl warnings during Excel conversion
- Structured logging: Routes warnings through qdrant-loader logging system for visual consistency
- Smart detection: Captures "Data Validation" and "Conditional Formatting" warnings with context
- Summary reporting: Provides comprehensive summary of unsupported Excel features
Version 0.4.4 - June 3, 2025
๐ Major Improvements
File Conversion & Processing Overhaul
Fixed Critical File Conversion Issues
- Fixed file conversion initialization: Resolved issue where file conversion was not working due to missing
set_file_conversion_config
calls in the pipeline (9d16b8d) - Enhanced strategy selection: Converted files (Excel, Word, PDF, etc.) now correctly use
MarkdownChunkingStrategy
instead ofDefaultChunkingStrategy
(9d16b8d) - Improved NLP processing: Converted files now have full NLP processing enabled instead of being skipped with
content_type_inappropriate
(7de3526)
Enhanced File Processing Pipeline
- Added proper file conversion configuration initialization in source processors (9d16b8d)
- Implemented automatic strategy selection based on conversion status (9d16b8d)
- Fixed metadata propagation for converted files (7de3526)
Chunking Strategy Improvements
Resolved Infinite Loop Issues
- Fixed MarkdownChunkingStrategy infinite loops: Resolved critical issue where documents with very long words would create infinite loops, hitting the 1000 chunk limit (9d16b8d)
- Added safety limits: Implemented
MAX_CHUNKS_PER_SECTION = 100
andMAX_CHUNKS_PER_DOCUMENT = 500
limits (9d16b8d) - Enhanced error handling: Added proper handling for words longer than
max_size
by truncating them with warnings (9d16b8d)
Improved Chunking Logic
- Added safety checks to prevent infinite loops in
_split_large_section
method (9d16b8d) - Enhanced logging for debugging chunking issues (9d16b8d)
- Added warnings when chunking limits are reached (9d16b8d)
Workspace Management
Better Log Organization
- Fixed workspace logs location: Logs are now stored in
workspace_path/logs/qdrant-loader.log
instead of cluttering the workspace root (589ae4b) - Enhanced workspace structure: Added automatic creation of logs directory (589ae4b)
- Updated documentation: Reflected new log structure in workspace mode documentation (589ae4b)
Resource Management & Stability
Fixed Pipeline Hanging Issues
- Resolved ResourceManager cleanup: Fixed issue where normal cleanup was setting shutdown events, causing workers to exit prematurely (4844abf)
- Enhanced signal handling: Distinguished between normal cleanup and signal-based shutdown (4844abf)
- Improved graceful shutdown: Workers now properly complete processing before shutdown (4844abf)
Performance Optimizations
- Increased
MAX_CHUNKS_TO_PROCESS
from 100 to 1000 chunks to accommodate larger documents (1bfe550) - Better handling of large documents (up to ~1000KB text limit per document) (1bfe550)
- Improved change detection for incremental updates (5408db9)
๐ง Technical Improvements
Code Quality & Testing
Enhanced Test Coverage
- Added comprehensive tests for converted file NLP processing (7de3526)
- Added tests for chunking strategy selection (9d16b8d)
- Enhanced error handling test coverage (4844abf)
- All existing functionality preserved with 100% test pass rate
Architecture Improvements
- Enhanced base connector class with proper file conversion support (9d16b8d)
- Improved factory pattern for pipeline component creation (9d16b8d)
- Better separation of concerns in source processing (9d16b8d)
Configuration & Setup
Improved File Conversion Support
- Enhanced connector initialization with file conversion configuration (9d16b8d)
- Better error handling for conversion failures (9d16b8d)
- Improved fallback mechanisms for unsupported file types (9d16b8d)
๐ Bug Fixes
Critical Fixes
- File conversion not working: Fixed missing initialization causing 0 documents to be processed (5408db9)
- Infinite chunking loops: Resolved MarkdownChunkingStrategy creating thousands of chunks for simple documents (9d16b8d)
- Pipeline hanging: Fixed ResourceManager causing workers to exit prematurely (4844abf)
- NLP processing skipped: Fixed converted files being inappropriately skipped for NLP processing (7de3526)
Minor Fixes
- Fixed workspace log file location (589ae4b)
- Improved error messages and logging (9d16b8d)
- Enhanced metadata handling for converted files (7de3526)
- Better handling of edge cases in chunking strategies (9d16b8d)
๐ Migration Notes
For Existing Users:
- Logs will now be created in
workspace/logs/
directory instead of workspace root - Converted files will now be processed with enhanced NLP capabilities
- Large documents will be chunked more efficiently with higher limits
- No breaking changes to existing configurations
Performance Impact:
- Improved processing speed for converted files
- Better memory usage with enhanced chunking limits
- More stable pipeline execution with proper resource management
Back to Documentation
Generated from RELEASE_NOTES.md