Skip to content

APP-11676: Change data sync uploading binary files to not read in the entire binary data before uploading#5949

Open
gloriacai01 wants to merge 12 commits intoviamrobotics:mainfrom
gloriacai01:app-11667
Open

APP-11676: Change data sync uploading binary files to not read in the entire binary data before uploading#5949
gloriacai01 wants to merge 12 commits intoviamrobotics:mainfrom
gloriacai01:app-11667

Conversation

@gloriacai01
Copy link
Copy Markdown
Member

@gloriacai01 gloriacai01 commented Apr 20, 2026

Summary

Fixes APP-11676

Previously, data sync loaded the entire binary payload of each .capture file into memory
before uploading it. .

Changes

data/capture_file.go — adds BinaryPayloadReader(), which parses the protobuf wire
format manually to extract the binary payload as an io.SectionReader backed directly by the file on disk. The payload is never copied into memory; the caller streams it chunk by chunk.

services/datamanager/builtin/sync/upload_data_capture_file.go — rewrites the
BINARY_SENSOR upload path:

  • uploadBinaryPayloads() loops through messages in the capture file using
    BinaryPayloadReader(), streaming each payload directly from disk
  • Large payloads (> MaxUnaryFileSize) go through the new uploadLargeBinaryFromReader(),
    which accepts an io.Reader instead of []byte
  • Small payloads still use the unary API but only read the single message into memory,
    not the whole file
  • Tabular files and legacy camera.GetImages files fall through to the existing path unchanged
  • sendStreamingDCRequests updated to accept io.Reader instead of []byte

Testing

  • Added unit tests for BinaryPayloadReader
  • Manually tested capturing + syncing small image payloads (ensure old path should be unchanged)
  • Generated 50 × 100 MB binary .capture files via a script and synced with
    maximum_num_sync_threads: 8, monitoring RSS with:
watch -n1 'ps -o rss= -p $(pgrep viam-server)'

┌─────────────┬──────────┐
│   Branch    │ Peak RSS │
├─────────────┼──────────┤                                                                                                                                                                                                                                                                         
│ main        │ ~1.6 GB  │
├─────────────┼──────────┤                                                                                                                                                                                                                                                                         
│ this branch │ ~252 MB  │ 
└─────────────┴──────────┘

~1.4 GB reduction in peak RSS. main scales linearly with threads × file_size since
each thread loads its full file into memory. This branch holds at most one 64 KB chunk per
thread regardless of file size, so peak memory is effectively independent of file size.                                                                                                                                                                                                            

@viambot viambot added the safe to test This pull request is marked safe to test from a trusted zone label Apr 20, 2026
@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Apr 20, 2026
@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Apr 20, 2026
@gloriacai01 gloriacai01 changed the title APP-11676 APP-11676: Change data sync uploading binary files to not read in the entire binary data before uploading Apr 20, 2026
@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Apr 20, 2026
@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Apr 20, 2026
@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Apr 20, 2026
@gloriacai01 gloriacai01 requested a review from n0nick April 20, 2026 21:11
Comment thread data/capture_file.go Outdated
Comment thread data/capture_file.go Outdated
Comment thread data/capture_file.go Outdated
// Successive calls advance through the file; call f.Reset() to restart.
// Returns io.EOF when no messages remain.
//
// Assumes SensorMetadata (field 1) precedes the binary payload (field 3).
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do field 1 field 3 mean here? I'm assuming proto SensorData message fields, if so let's clarify

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generaly, there's a lot of proto magic here, so i think some paragraph explaining the internals of what's this doing would be helpful for future readers. also comment on magic numbers like L327-328

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

field 1 = sensormetadata and field 3 = binary payload.
true, added comments in this function, lmk if makes sense

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

much better, thank you 👍

Comment thread data/capture_file.go
Comment thread data/capture_file.go
if ok {
return n, nil
}
// ok=false: no binary payload field found (legacy camera.GetImages file storing
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about instead of ok=false, we return an explicit new error type ErrNoBinaryField so it's more of an explicit signal?

@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Apr 21, 2026
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Apr 21, 2026
@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Apr 22, 2026
@gloriacai01 gloriacai01 requested a review from n0nick April 22, 2026 14:06
Copy link
Copy Markdown
Member

@n0nick n0nick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change LGTM, but my review agent had a nitpick that I'll let you decide on; it's hypothetical but if you want should be an easy fix

The uploadBinaryPayloads loop wraps any BinaryPayloadReader error (including ErrNoBinaryField) with errors.Wrap, and the outer switch in uploadDataCaptureFile then matches it via
errors.Is. That works (pkg/errors implements Unwrap), but: if a later message in a multi-message file returns ErrNoBinaryField after earlier messages were already streamed, the
outer switch falls back to uploadFromMemory and re-uploads the whole file — a double-upload. The doc says binary files only ever contain one message, so this is hypothetical today,
but it's a trap for later. Two cheap fixes:

  • Only return ErrNoBinaryField from uploadBinaryPayloads when msgIdx == 0 (already does this at the bottom); for the in-loop case, wrap in a different error so the fallback doesn't
    fire. i.e. if msgIdx == 0 && errors.Is(err, data.ErrNoBinaryField) { return 0, err } before the generic wrap.
  • Or just leave a comment documenting the one-message invariant at the top of uploadBinaryPayloads.

Comment on lines +57 to +58
// Legacy camera.GetImages file — tabular data in a BINARY_SENSOR-typed file.
// Fall through to in-memory path.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Legacy camera.GetImages file — tabular data in a BINARY_SENSOR-typed file.
// Fall through to in-memory path.
// No binary payload - maybe tabular data in a BINARY_SENSOR-typed file
// (like legacy camera.GetImages file) - fall through to in-memory path.

is this accurate? wdyt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

safe to test This pull request is marked safe to test from a trusted zone

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants