fix(memory): ingestion reliability and merge bloat#604
Conversation
A chunk was marked as failed when the per-chunk ingestion agent did not emit the memory_persistence_complete signal, even though memory_save had already committed. Combined with the retry loop this re-saved the same memories indefinitely. Chunk completion is now decided by the agent run returning Ok (the contract state is kept for observability only).
A failed ingestion file was re-processed on every poll cycle forever (no cap, no backoff). Add attempts + next_attempt_at columns to ingestion_files: failed files back off exponentially and are quarantined after a max attempt count, so the poll loop stops re-processing them.
Deleting an ingest file only removed the ingestion_files row, leaving the file on disk so the poll loop re-discovered it and re-created the row. Delete now removes the disk file and purges both ingestion_files and ingestion_progress.
Maintenance merged near-duplicate memories by concatenating their content, bloating a memory with many reworded copies of the same fact. Keep the importance-winner's content as the single canonical version.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (3)
WalkthroughAdds ingestion retry-budget tracking with quarantine/backoff handling, updates ingest-file deletion to purge disk and database rows, and changes memory merge output to keep the winner text while adjusting tests for the new behavior. ChangesIngestion retry flow
Ingest file purge
Memory merge content
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/api/ingest.rs (1)
241-245: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueAvoid abbreviated binding name
e.Use
errorto match the project convention (and the spelled-outerroralready used indelete_ingest_fileat Line 286).♻️ Proposed rename
match tokio::fs::remove_file(&path).await { Ok(()) => {} - Err(e) if e.kind() == std::io::ErrorKind::NotFound => {} - Err(e) => return Err(e.into()), + Err(error) if error.kind() == std::io::ErrorKind::NotFound => {} + Err(error) => return Err(error.into()), }As per coding guidelines: "Don't abbreviate variable names. Use
queuenotq,messagenotmsg,channelnotch."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/api/ingest.rs` around lines 241 - 245, Rename the abbreviated error binding in the `remove_file` match inside `delete_ingest_file` to `error` to match the project’s naming convention and the existing usage later in the same function. Update both the `NotFound` guard and the fallback `Err(...)` arm to use the spelled-out `error` name consistently.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/agent/ingestion.rs`:
- Around line 355-360: The completion flow in ingestion handling deletes
progress before the source file removal can fail, which can leave the source
intact but chunk markers cleared. In the ingestion success path around
complete_ingestion_file, delete_progress, and tokio::fs::remove_file, move the
source-file deletion to happen before clearing ingestion_progress, and only call
delete_progress after remove_file succeeds so a failed delete cannot trigger
chunk reprocessing.
In `@src/memory/maintenance.rs`:
- Around line 318-320: The fallback path in the merge logic returns
loser_trimmed directly when winner_trimmed is empty, which bypasses
MAX_MERGED_MEMORY_CONTENT_BYTES enforcement. Update the trimming flow in the
relevant merge function in maintenance.rs so the loser fallback also goes
through the same max-size truncation used for normal merged content, preserving
the byte limit regardless of which side wins.
---
Nitpick comments:
In `@src/api/ingest.rs`:
- Around line 241-245: Rename the abbreviated error binding in the `remove_file`
match inside `delete_ingest_file` to `error` to match the project’s naming
convention and the existing usage later in the same function. Update both the
`NotFound` guard and the fallback `Err(...)` arm to use the spelled-out `error`
name consistently.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4b014c1b-e1df-4aa2-9ddd-cf84d4556a76
📒 Files selected for processing (4)
migrations/20260623000001_ingestion_retry_budget.sqlsrc/agent/ingestion.rssrc/api/ingest.rssrc/memory/maintenance.rs
| if loser_trimmed.is_empty() { | ||
| return winner_trimmed.to_string(); | ||
| if winner_trimmed.is_empty() { | ||
| return loser_trimmed.to_string(); |
There was a problem hiding this comment.
Edge-case: if winner_trimmed is empty, this early-return skips the MAX_MERGED_MEMORY_CONTENT_BYTES truncation. Might be worth applying the same truncation logic here too.
| return loser_trimmed.to_string(); | |
| if winner_trimmed.is_empty() { | |
| if loser_trimmed.len() <= MAX_MERGED_MEMORY_CONTENT_BYTES { | |
| return loser_trimmed.to_string(); | |
| } | |
| let boundary = loser_trimmed.floor_char_boundary(MAX_MERGED_MEMORY_CONTENT_BYTES); | |
| return loser_trimmed[..boundary].to_string(); | |
| } |
| .await? | ||
| { | ||
| let path = ingest_dir.join(&filename); | ||
| match tokio::fs::remove_file(&path).await { |
There was a problem hiding this comment.
Since this deletes a disk path derived from ingestion_files.filename, consider rejecting any non-normal path components (e.g. .. / absolute paths) before join as defense-in-depth.
| match tokio::fs::remove_file(&path).await { | |
| let filename_path = Path::new(&filename); | |
| if filename_path | |
| .components() | |
| .any(|c| !matches!(c, std::path::Component::Normal(_))) | |
| { | |
| anyhow::bail!("invalid ingest filename: {filename}"); | |
| } | |
| let path = ingest_dir.join(filename_path); |
- ingestion: remove the source file before clearing chunk progress, so a failed remove_file cannot drop progress markers and cause already-ingested chunks to be re-processed (and re-create memories). - memory: apply the MAX_MERGED_MEMORY_CONTENT_BYTES cap on the empty-winner fallback path in merged_memory_content. - api: reject non-normal path components (`..`, absolute) before deleting an ingest file (defense-in-depth on a destructive op); rename error binding for consistency.
Cross-backend fixes raised by the upstream PR reviews, applied to the integration branch: ingest delete-before-clear-progress, merged_memory_content size cap on the empty-winner path, ingest delete path-component guard, agent_id on the SQLite backend store, deterministic (weight-descending) edge ordering in graph traversal, chunked IN-list queries to stay under SQLite's bind limit, and FTS-index ensure on set_embedding. Feature-off: 891 tests green. Feature-on: clippy clean.
Summary
Four related fixes to the memory ingestion pipeline and the maintenance merge. They were found during a real ingestion run that produced a runaway loop of duplicate memories. All are backend-agnostic; the only schema change is an additive migration on
ingestion_files.Fixes
1. Deterministic chunk completion —
src/agent/ingestion.rsThe per-chunk ingestion agent was required to call
memory_persistence_complete. If it didn't (e.g. a smaller model skipping the signal), the chunk was marked failed even thoughmemory_savehad already committed. Chunk completion is now decided by the agent run returningOk; the persistence-contract state is kept only for observability. This removes the false failure that fed the retry loop below.2. Bounded retries with backoff + quarantine —
src/agent/ingestion.rs(+ migration)A failed ingestion file was re-processed on every poll cycle indefinitely (no cap, no backoff), continuously re-creating memories. Adds
attempts+next_attempt_attoingestion_files: failures back off exponentially and are quarantined after a maximum attempt count, so the poll loop stops re-processing them.3. Ingest delete removes the source file —
src/api/ingest.rsDeleting an ingest file only removed the
ingestion_filesrow, leaving the file on disk; the poll loop then re-discovered it and re-created the row ("reappears"). Delete now removes the disk file and purges bothingestion_filesandingestion_progress.4. Canonical merge instead of concatenation —
src/memory/maintenance.rsMaintenance merged near-duplicate memories by concatenating their content (
winner\n\nloser), bloating a single memory with many reworded copies of the same fact. It now keeps the importance-winner's content as the single canonical version.Migration
migrations/20260623000001_ingestion_retry_budget.sql— additive only (ADD COLUMN attempts,ADD COLUMN next_attempt_at).ingestion_files.statushas no CHECK constraint, so the new terminalquarantinedvalue needs no schema change.Tests
cargo test --lib— all green.Note
Four fixes addressing a runaway ingestion loop: deterministic chunk completion tracking, bounded retries with exponential backoff and quarantine, deletion of source files on ingest delete, and canonical-content merging instead of concatenation.
Written by Tembo for commit d6239885. This will update automatically on new commits.