fix: Prevent Telegram streamed replies from ending after first overflow chunk

Summary

Fix a Telegram gateway bug where a long streamed assistant reply can appear to stop mid-answer in a topic after the first overflow chunk. The reported screenshot shows a long Hermes response in the Nehemiah - Coding Telegram topic ending at - The visible tool-call summary, followed by the user noting that the previous message did not finish streaming to that Telegram topic.

The plan targets the streamed edit overflow path, not general model generation. A completed assistant response must either reach Telegram in full across all continuation messages or leave enough state for the gateway fallback path to deliver the remaining content instead of marking the turn complete after a partial delivery.

Problem Frame

Telegram limits message text to 4096 UTF-16 code units. Hermes streams gateway responses by editing a message and, when a streamed message grows past the limit, splitting the overflow into additional Telegram messages. The adapter already has a split-and-deliver path for oversized edits, but the partial-continuation failure contract is weak: if chunk 1 is edited successfully and a later continuation fails, the adapter can still report success for the operation. The stream consumer may then mark the final response delivered even though the visible topic only contains the first part.

This is especially visible in Telegram forum topics because a long final response can be split below tool-progress bubbles, and a missing continuation looks exactly like the stream stopped mid-answer.

Requirements

R1. Long streamed Telegram replies must preserve all final content across overflow chunks.
R2. If any continuation chunk fails after the first overflow edit lands, the gateway must not mark the final response as fully delivered.
R3. Continuation chunks must remain routed to the same Telegram topic/thread as the original response.
R4. The fix must avoid duplicate full-answer sends when all overflow chunks were delivered successfully.
R5. Tests must cover the reported failure shape: a final streamed reply that exceeds Telegram's limit, succeeds on the first edit, fails on a continuation, and must not be treated as complete.

Key Technical Decisions

Treat overflow delivery as all-or-not-complete. _edit_overflow_split should only return a successful final-delivery result when every planned chunk reaches Telegram. Partial delivery is a distinct outcome that downstream code can recover from.
Carry partial-overflow metadata through SendResult.raw_response rather than adding a new public dataclass field unless implementation proves the existing result shape is insufficient. The stream consumer already inspects SendResult after adapter edits, so a small raw response contract can keep the change contained.
Make the stream consumer responsible for final-delivery truth. The adapter knows which chunks landed, but the consumer owns _final_response_sent, _final_content_delivered, _fallback_prefix, and fallback final-send behaviour.
Keep routing inside Telegram adapter helpers. Continuation sends should continue to use _thread_kwargs_for_send(...) with metadata-derived message_thread_id and reply anchors so forum topic behaviour stays consistent.

High-Level Technical Design

mermaid

sequenceDiagram
    participant C as GatewayStreamConsumer
    participant T as TelegramAdapter.edit_message
    participant B as Telegram Bot API

    C->>T: finalize/edit long accumulated response
    T->>B: edit original message with chunk 1
    loop remaining chunks
        T->>B: send continuation in same topic/thread
    end
    alt all chunks delivered
        T-->>C: success, last message id, continuation ids
        C->>C: mark final response delivered
    else any continuation failed
        T-->>C: partial overflow failure with delivered prefix metadata
        C->>C: do not mark final delivered
        C->>B: fallback sends missing tail or full final response safely
    end

Implementation Units

U1. Add a partial-overflow contract for Telegram edit splits

Goal: Make TelegramAdapter._edit_overflow_split distinguish complete overflow delivery from partial delivery.

Requirements: R1, R2, R4

Dependencies: None

Files:

gateway/platforms/telegram.py
tests/gateway/test_telegram_send.py or the existing Telegram adapter test module that already covers edit_message overflow behaviour

Approach:

Keep the successful path unchanged when every chunk is delivered: return SendResult(success=True, message_id=<last chunk>, continuation_message_ids=(...)).
When a continuation fails after the first edit, return a result that clearly indicates partial delivery instead of plain success. Prefer success=False, retryable=True, and raw_response metadata such as delivered chunk count, total chunk count, last delivered message id, and the visible delivered prefix.
Preserve logging, but do not rely on logs as the only signal. The caller must be able to tell partial delivery happened.
Ensure the first edited chunk and all successful continuation chunks still include the existing Markdown/plain-text fallback behaviour.

Patterns to follow:

Existing overflow handling in TelegramAdapter.edit_message and _edit_overflow_split.
Existing SendResult semantics in gateway/platforms/base.py, especially retryable, raw_response, and continuation_message_ids.

Test scenarios:

Oversized finalized edit where all continuations succeed returns success, the last continuation id, and all continuation ids.
Oversized finalized edit where the first continuation send fails returns a partial-overflow failure and does not report success.
Oversized finalized edit where one continuation succeeds and a later continuation fails reports the last delivered continuation id and delivered count in raw metadata.
A continuation MarkdownV2 formatting failure still retries plain text before being treated as a delivery failure.

Verification: Adapter tests prove complete overflow remains successful and partial overflow is observable by the caller.

U2. Teach the stream consumer to recover from partial overflow

Goal: Ensure a partial Telegram overflow does not set _final_response_sent or _final_content_delivered unless the full response reached the user.

Requirements: R1, R2, R4, R5

Dependencies: U1

Files:

gateway/stream_consumer.py
tests/gateway/test_stream_consumer.py or a focused new tests/gateway/test_stream_consumer_telegram_overflow.py

Approach:

In _send_or_edit, when adapter.edit_message(...) returns a partial-overflow failure, update consumer state to reflect the last visible prefix/message and enter fallback delivery for the missing content.
Avoid treating _already_sent as final delivery. A partial visible message can be true while final delivery is false.
Use the delivered-prefix metadata if available so _send_fallback_final(...) sends only the missing tail. If implementation finds the prefix is unreliable after Markdown formatting, prefer sending the complete final response as a fresh fallback message rather than silently dropping the tail.
Keep the existing success handling for continuation_message_ids when the adapter delivered all chunks.

Patterns to follow:

Existing fallback mode in GatewayStreamConsumer._send_or_edit and _send_fallback_final.
Existing comments around _final_response_sent, _final_content_delivered, and _fallback_prefix for prior partial-delivery regressions.

Test scenarios:

A final streamed response that overflows and receives a complete-success edit split sets final-delivery flags and does not invoke fallback.
A final streamed response whose adapter reports partial overflow does not set final-delivery flags immediately.
After partial overflow, fallback delivery sends the remaining tail and then marks final content delivered only if the fallback send succeeds.
If fallback delivery also fails, the consumer leaves final-delivery false so the gateway's non-streaming final-send safety path can still run.

Verification: Stream consumer tests reproduce the screenshot shape by simulating first chunk visible and continuation failure, then assert the final answer is not suppressed.

U3. Preserve Telegram topic/thread routing for overflow and fallback continuations

Goal: Ensure overflow recovery messages land in the same Telegram forum topic or DM topic fallback context.

Requirements: R3

Dependencies: U1, U2

Files:

gateway/platforms/telegram.py
gateway/stream_consumer.py
tests/gateway/test_stream_consumer_thread_routing.py
Relevant Telegram adapter routing tests, if existing coverage is closer there

Approach:

Keep passing metadata through every overflow continuation and fallback send.
Keep reply anchors where valid, but do not let a missing reply anchor drop the message_thread_id for normal forum topics.
For private DM topic fallback metadata, preserve the existing stricter anchor behaviour documented in the adapter comments.

Patterns to follow:

TelegramAdapter._thread_kwargs_for_send(...).
Existing tests around Telegram topic recovery and stream consumer thread routing.

Test scenarios:

Overflow continuations include message_thread_id for a forum topic.
A continuation retry after reply message not found keeps forum topic routing when allowed.
Partial-overflow fallback sends receive the same metadata passed to the original stream consumer.

Verification: Thread-routing assertions inspect fake bot calls and confirm all continuation/fallback messages carry the expected topic metadata.

U4. Add issue evidence and PR body traceability

Goal: Make the upstream issue and PR clearly trace the user-visible bug and verification evidence.

Requirements: R5

Dependencies: U1, U2, U3

Files:

GitHub issue body created via gh issue create
PR body using .github/PULL_REQUEST_TEMPLATE.md

Approach:

Create a GitHub issue with the screenshot evidence: the long message in the Nehemiah - Coding Telegram topic stops at - The visible tool-call summary, and the user's reply says the previous message did not finish streaming to that Telegram topic.
Reference affected component as Gateway and platform as Telegram.
In the PR body, link the issue with Fixes #..., describe the split-delivery contract change, and include the screenshot or attach it if GitHub upload is available.
Follow CONTRIBUTING.md and the repository PR template exactly.

Patterns to follow:

.github/ISSUE_TEMPLATE/bug_report.yml
.github/PULL_REQUEST_TEMPLATE.md

Test scenarios:

Test expectation: none, this is tracker and PR documentation work.

Verification: The GitHub issue exists with screenshot evidence or an explicit screenshot reference, and the PR body links the issue and lists the tests run.

Scope Boundaries

In Scope

Telegram streamed response overflow splitting and recovery.
Stream consumer final-delivery truth for partial overflow delivery.
Topic/thread metadata preservation for overflow and fallback continuation sends.
Focused unit tests around adapter and stream consumer behaviour.

Out of Scope

Changing model streaming semantics in run_agent.py.
Reworking Telegram draft streaming, which is DM-only and not the forum-topic path in the screenshot.
Changing general platform message splitting for Discord, Slack, WhatsApp, or Matrix unless a shared helper must be corrected for the Telegram fix.
Altering tool-progress display settings or terminal progress rendering.

Deferred to Follow-Up Work

Broader observability for gateway delivery completeness across all messaging platforms.
A user-facing resend/recover command for a previous truncated response.

Risks & Mitigations

Risk: fallback recovery duplicates already-visible first chunks. Mitigation: use delivered-prefix metadata where reliable and add tests for no-duplicate complete-success behaviour.
Risk: preserving forum topic routing while dropping invalid reply anchors is easy to regress. Mitigation: include fake bot call assertions for message_thread_id and reply behaviour.
Risk: MarkdownV2 formatting can alter visible/raw prefix comparisons. Mitigation: keep fallback conservative; duplicate content is preferable to silently missing content, but tests should keep the common path tail-only.

Sources & Research

User-provided screenshot at /root/.hermes/image_cache/img_f664e68f6ddf.jpg.
gateway/stream_consumer.py streamed edit, overflow, fallback, and final-delivery state handling.
gateway/platforms/telegram.py Telegram send/edit overflow splitting and topic routing helpers.
gateway/platforms/base.py SendResult contract and shared message chunking helper.
tests/gateway/test_stream_consumer.py, tests/gateway/test_stream_consumer_thread_routing.py, and Telegram adapter tests for focused regression coverage.

Verification Strategy

Run focused Telegram adapter overflow tests.
Run focused stream consumer overflow/fallback tests.
Run topic-routing tests affected by metadata changes.
Run the gateway test subset around Telegram send/edit, stream consumer, and run progress if touched.
Before PR creation, ensure git diff contains only the plan, implementation, tests, and PR/issue-relevant documentation for this bug.