plans/2025-04-26-large-file-read-range-support-v3.md
Implement support for reading extremely large text files by adding range parameters (`start_byte` and `end_byte`) to the file read tool, allowing users to read specific portions of large files without loading the entire file into memory. Binary files are not supported, and UTF-8 character boundaries must always be respected.
## Tasks

1. Update the `FsReadService` interface to support range reading — `crates/forge_services/src/infra.rs`
2. Implement the range reading functionality in `ForgeFileReadService` — `crates/forge_infra/src/fs_read.rs`
3. Add binary file detection and validation — `crates/forge_fs/src/lib.rs`
4. Update `ForgeFS` to support range reading with binary file validation — `crates/forge_fs/src/lib.rs`
5. Implement UTF-8 boundary detection and correction — `crates/forge_fs/src/lib.rs`
6. Update the `FSReadInput` struct to include optional range parameters — `crates/forge_services/src/tools/fs/fs_read.rs`
7. Modify the `FSRead` tool implementation to support range reading and reject binary files: update the `call` method to use range-based reading with UTF-8 boundary adjustment and ensure binary files are rejected — `crates/forge_services/src/tools/fs/fs_read.rs`
8. Update the `FSRead` tool description — `crates/forge_services/src/tools/fs/fs_read.rs`
9. Implement file size detection logic — `crates/forge_fs/src/lib.rs`
10. Add content length information to range read responses — `crates/forge_services/src/tools/fs/fs_read.rs`
11. Add unit tests for range-based file reading and binary file rejection — `crates/forge_services/src/tools/fs/fs_read.rs`, `crates/forge_infra/src/fs_read.rs`, `crates/forge_fs/src/lib.rs`

## Risks

- **Performance issues with extremely large text files**
  - Mitigation: seek directly to the requested offset and read only the requested range, so cost scales with the range size rather than the file size.
- **UTF-8 boundary adjustment overhead**
  - Mitigation: UTF-8 sequences are at most 4 bytes long, so only a handful of bytes around each boundary ever need inspection.
- **Breaking changes to the existing API**
  - Mitigation: `start_byte` and `end_byte` are optional, so existing callers that omit them keep the current full-file behavior.
- **Inaccurate binary file detection**
  - Mitigation:
- **File locking and concurrent access issues**
  - Mitigation:
- **Memory consumption with large ranges**
  - Mitigation:
- **Platform-specific issues**
  - Mitigation:
- **Invalid UTF-8 sequences in text files**
  - Mitigation:
## Alternatives Considered

- **Streaming API:** Implement a streaming interface for file reading instead of range-based reading. This would allow progressive loading of large files but would require more significant changes to the tool interface.
- **File pagination tool:** Create a separate tool specifically for paginated file reading, leaving the original file read tool unchanged. This would maintain perfect backward compatibility but introduce redundancy.
- **Content-based partitioning:** Implement intelligent partitioning based on content (e.g., by line, by paragraph, by JSON object) rather than raw bytes. This would be more semantic but more complex to implement.
- **Fixed-size chunking:** Instead of arbitrary byte ranges, divide files into fixed-size chunks that can be requested by index. This would simplify the API but reduce flexibility.
- **Smart text-only file reading:** Implement a detection mechanism that automatically determines the optimal portion of a text file to return based on the context of the request, using language-aware boundaries like paragraphs or code blocks.
## Implementation Details

Add the following optional parameters to the `FSReadInput` struct:

```rust
/// Optional start position in bytes (0-based)
pub start_byte: Option<u64>,
/// Optional end position in bytes (exclusive)
pub end_byte: Option<u64>,
```
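Since both bounds are optional, the read path has to resolve them into a concrete range. A minimal sketch of that resolution, assuming the 0-based/exclusive semantics above (the helper name and clamping behavior are illustrative assumptions, not part of the plan):

```rust
/// Resolve optional range bounds into a concrete (start, end) pair.
/// `resolve_range` is a hypothetical helper; the plan does not name one.
/// Bounds are clamped to the file size, and `end` is never before `start`.
fn resolve_range(start_byte: Option<u64>, end_byte: Option<u64>, file_size: u64) -> (u64, u64) {
    let start = start_byte.unwrap_or(0).min(file_size);
    let end = end_byte.unwrap_or(file_size).min(file_size).max(start);
    (start, end)
}

fn main() {
    // Omitting both bounds preserves the existing full-file behavior.
    assert_eq!(resolve_range(None, None, 100), (0, 100));
    // An explicit range is clamped to the file size.
    assert_eq!(resolve_range(Some(10), Some(200), 100), (10, 100));
}
```

Clamping rather than erroring on out-of-range bounds is a design choice; the plan could equally reject such requests with an error.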
To detect binary files, we'll implement a detection function in `ForgeFS` (`crates/forge_fs/src/lib.rs`). When a file is detected as binary, we'll return an error message like: "Binary files are not supported. Please use another tool or method to process this file."
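The plan does not spell out the detection heuristic. A common approach, shown here purely as an illustrative assumption, is to scan the first few kilobytes of the file for NUL bytes, which essentially never occur in text:

```rust
/// Heuristic binary check over a file's leading bytes.
/// This is an illustrative sketch, not the plan's specified algorithm.
fn is_binary(prefix: &[u8]) -> bool {
    // NUL bytes are a strong signal of binary content.
    prefix.iter().take(8192).any(|&b| b == 0)
}

fn main() {
    assert!(!is_binary(b"fn main() {}"));
    // An ELF header starts with 0x7f 'E' 'L' 'F' and contains NUL bytes.
    assert!(is_binary(&[0x7f, b'E', b'L', b'F', 0x00, 0x01]));
}
```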
To ensure range reads respect UTF-8 character boundaries:

- For the start position: if it falls inside a multi-byte character (i.e., on a continuation byte), adjust it to a character boundary so no partial character is included.
- For the end position: likewise adjust to a character boundary so the range never splits a character.
- Report the adjusted positions in the response metadata.
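In UTF-8, continuation bytes have the form `10xxxxxx`, so a boundary check only needs to inspect a single byte. A sketch of forward adjustment (whether the plan rounds forward or backward at each end is not specified; this assumes forward):

```rust
/// Advance `pos` to the next UTF-8 character boundary in `bytes`.
/// Continuation bytes match the bit pattern 0b10xxxxxx.
fn next_char_boundary(bytes: &[u8], mut pos: usize) -> usize {
    while pos < bytes.len() && (bytes[pos] & 0b1100_0000) == 0b1000_0000 {
        pos += 1;
    }
    pos
}

fn main() {
    let s = "a€b"; // '€' occupies bytes 1..4 (0xE2 0x82 0xAC)
    assert_eq!(next_char_boundary(s.as_bytes(), 2), 4); // inside '€' → after it
    assert_eq!(next_char_boundary(s.as_bytes(), 1), 1); // already a boundary
}
```

Since a UTF-8 sequence is at most 4 bytes, this loop runs at most three iterations per boundary, which keeps the adjustment overhead negligible.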
The response will include the content for the requested range, the adjusted start and end positions, and content length information.
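Putting those pieces together, the range-read result could carry the adjusted positions and length alongside the content. The struct name and field names here are assumptions for illustration, not the plan's actual types:

```rust
/// Hypothetical result type for a range read; names are illustrative.
#[derive(Debug, PartialEq)]
struct RangeReadResult {
    /// The UTF-8 content of the (boundary-adjusted) range.
    content: String,
    /// Start position after UTF-8 boundary adjustment.
    adjusted_start: u64,
    /// End position (exclusive) after UTF-8 boundary adjustment.
    adjusted_end: u64,
    /// Total size of the underlying file in bytes.
    total_size: u64,
}

fn main() {
    let r = RangeReadResult {
        content: "hello".into(),
        adjusted_start: 0,
        adjusted_end: 5,
        total_size: 5,
    };
    // For valid UTF-8 ranges, the byte span matches the content length.
    assert_eq!(r.adjusted_end - r.adjusted_start, r.content.len() as u64);
}
```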
To minimize memory usage and improve performance:

- `tokio::fs::File::open()` to get a file handle
- `file.metadata()` to get the file size without reading content
- `file.seek()` to position near `start_byte`
- `file.take(adjusted_end_byte - adjusted_start_byte)` to create a limited reader