APP-11676: Change data sync uploading binary files to not read in the entire binary data before uploading#5949
gloriacai01 wants to merge 12 commits into viamrobotics:main
Conversation
| // Successive calls advance through the file; call f.Reset() to restart. | ||
| // Returns io.EOF when no messages remain. | ||
| // | ||
| // Assumes SensorMetadata (field 1) precedes the binary payload (field 3). |
what do field 1 and field 3 mean here? I'm assuming they're proto SensorData message fields; if so, let's clarify
generally, there's a lot of proto magic here, so i think a paragraph explaining the internals of what this is doing would be helpful for future readers. also comment on magic numbers like L327-328
field 1 = sensormetadata and field 3 = binary payload.
true, added comments in this function, lmk if makes sense
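For readers unfamiliar with the "proto magic" discussed in this thread, here is a minimal sketch of scanning protobuf wire format for a length-delimited field and exposing it as an io.SectionReader. It is not the PR's BinaryPayloadReader: findField, readUvarint, demoPayload and errFieldNotFound are hypothetical names, and the hand-built message only borrows the field numbers mentioned above (1 for metadata, 3 for the binary payload). Each field starts with a varint tag equal to (field number << 3) | wire type.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// errFieldNotFound is a hypothetical sentinel used only by this sketch.
var errFieldNotFound = errors.New("field not found")

// readUvarint decodes one varint at offset off and returns its value and encoded size.
func readUvarint(r io.ReaderAt, off int64) (uint64, int64, error) {
	var buf [binary.MaxVarintLen64]byte
	n, err := r.ReadAt(buf[:], off)
	if n == 0 {
		if err == nil {
			err = io.ErrUnexpectedEOF
		}
		return 0, 0, err
	}
	v, size := binary.Uvarint(buf[:n])
	if size <= 0 {
		return 0, 0, errors.New("malformed varint")
	}
	return v, int64(size), nil
}

// findField scans one serialized message occupying [start, end) of r and returns
// a reader over the first length-delimited field numbered want. Only tags and
// lengths are read; the field's bytes stay on the underlying storage.
func findField(r io.ReaderAt, start, end int64, want uint64) (*io.SectionReader, error) {
	off := start
	for off < end {
		tag, n, err := readUvarint(r, off)
		if err != nil {
			return nil, err
		}
		off += n
		fieldNum, wireType := tag>>3, tag&7
		switch wireType {
		case 0: // varint value: decode only to learn its size, then skip it
			_, n, err := readUvarint(r, off)
			if err != nil {
				return nil, err
			}
			off += n
		case 1: // fixed 64-bit value
			off += 8
		case 5: // fixed 32-bit value
			off += 4
		case 2: // length-delimited: a length varint followed by that many bytes
			size, n, err := readUvarint(r, off)
			if err != nil {
				return nil, err
			}
			off += n
			if off > end || size > uint64(end-off) {
				return nil, errors.New("field length exceeds message")
			}
			if fieldNum == want {
				return io.NewSectionReader(r, off, int64(size)), nil
			}
			off += int64(size)
		default:
			return nil, fmt.Errorf("unsupported wire type %d", wireType)
		}
	}
	return nil, errFieldNotFound
}

// demoPayload hand-builds a message with field 1 = "meta" and field 3 = "image-bytes"
// (tags 0x0A and 0x1A, both wire type 2) and extracts field 3.
func demoPayload() (string, error) {
	msg := []byte{0x0A, 4, 'm', 'e', 't', 'a', 0x1A, 11}
	msg = append(msg, "image-bytes"...)
	sec, err := findField(bytes.NewReader(msg), 0, int64(len(msg)), 3)
	if err != nil {
		return "", err
	}
	b, err := io.ReadAll(sec)
	return string(b), err
}

func main() {
	payload, err := demoPayload()
	fmt.Println(payload, err)
}
```

Because io.SectionReader only needs an io.ReaderAt, the same function would work on an *os.File, which is what lets a caller stream the payload without holding it in memory.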
| if ok { | ||
| return n, nil | ||
| } | ||
| // ok=false: no binary payload field found (legacy camera.GetImages file storing |
how about instead of ok=false, we return an explicit new error type ErrNoBinaryField so it's more of an explicit signal?
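A rough sketch of what that suggestion could look like, using only the standard library. The ErrNoBinaryField name comes from the comment above; its message text, the payloadLen helper and the classify function are assumptions made for illustration, not code from the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNoBinaryField uses the name proposed in the review; the definition is assumed.
var ErrNoBinaryField = errors.New("no binary payload field in message")

// payloadLen is a hypothetical stand-in for the reader: it reports the payload
// length for field 3, or the sentinel error when that field is absent.
func payloadLen(fields map[int][]byte) (int, error) {
	p, ok := fields[3]
	if !ok {
		return 0, ErrNoBinaryField
	}
	return len(p), nil
}

// classify shows that a caller can still branch on the sentinel after the
// error has been wrapped with extra context via %w.
func classify(fields map[int][]byte) string {
	_, err := payloadLen(fields)
	if err != nil {
		err = fmt.Errorf("reading message 0: %w", err)
	}
	switch {
	case err == nil:
		return "stream"
	case errors.Is(err, ErrNoBinaryField):
		return "fallback"
	default:
		return "fail"
	}
}

func main() {
	fmt.Println(classify(map[int][]byte{3: []byte("img")}))
	fmt.Println(classify(map[int][]byte{1: []byte("meta")}))
}
```

Compared with an ok=false return, the sentinel keeps its meaning through layers of wrapping, so the fallback decision does not depend on every intermediate caller forwarding a boolean.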
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
n0nick left a comment
change LGTM, but my review agent had a nitpick that I'll let you decide on; it's hypothetical but if you want should be an easy fix
The uploadBinaryPayloads loop wraps any BinaryPayloadReader error (including ErrNoBinaryField) with errors.Wrap, and the outer switch in uploadDataCaptureFile then matches it via errors.Is. That works (pkg/errors implements Unwrap), but: if a later message in a multi-message file returns ErrNoBinaryField after earlier messages were already streamed, the outer switch falls back to uploadFromMemory and re-uploads the whole file, i.e. a double upload. The doc says binary files only ever contain one message, so this is hypothetical today, but it's a trap for later. Two cheap fixes:
- Only return ErrNoBinaryField from uploadBinaryPayloads when msgIdx == 0 (already does this at the bottom); for the in-loop case, wrap in a different error so the fallback doesn't fire. i.e. if msgIdx == 0 && errors.Is(err, data.ErrNoBinaryField) { return 0, err } before the generic wrap.
- Or just leave a comment documenting the one-message invariant at the top of uploadBinaryPayloads.
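The first option can be illustrated with a small self-contained model. This is a sketch under assumptions, not the PR's code: uploadAll stands in for uploadBinaryPayloads, syncFile for the outer switch in uploadDataCaptureFile, errNoBinaryField for data.ErrNoBinaryField, and a nil slice models a message without a binary field.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoBinaryField is a stand-in for data.ErrNoBinaryField.
var errNoBinaryField = errors.New("no binary field")

// uploadAll is a hypothetical stand-in for uploadBinaryPayloads.
// A nil entry in msgs models a message with no binary payload field.
func uploadAll(msgs [][]byte, upload func([]byte)) (int, error) {
	for msgIdx, m := range msgs {
		if m == nil {
			if msgIdx == 0 {
				// Nothing has been uploaded yet, so the caller may fall back safely.
				return 0, errNoBinaryField
			}
			// Deliberately not wrapped with %w: errors.Is must not match here,
			// otherwise the caller would re-upload data that was already sent.
			return msgIdx, fmt.Errorf("message %d has no binary field after %d uploads", msgIdx, msgIdx)
		}
		upload(m)
	}
	return len(msgs), nil
}

// syncFile models the outer switch and returns how many uploads were issued,
// counting the in-memory fallback as one upload of the whole file.
func syncFile(msgs [][]byte) (uploads int, err error) {
	count := func([]byte) { uploads++ }
	_, err = uploadAll(msgs, count)
	if errors.Is(err, errNoBinaryField) {
		uploads++ // fallback path: one whole-file upload
		return uploads, nil
	}
	return uploads, err
}

func main() {
	fmt.Println(syncFile([][]byte{nil}))      // fallback only
	fmt.Println(syncFile([][]byte{{1}, nil})) // one streamed upload, then an error, no fallback
}
```

With the guard in place, a missing field in a later message surfaces as an ordinary error instead of silently triggering a second upload of the file.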
| // Legacy camera.GetImages file — tabular data in a BINARY_SENSOR-typed file. | ||
| // Fall through to in-memory path. |
| // Legacy camera.GetImages file — tabular data in a BINARY_SENSOR-typed file. | |
| // Fall through to in-memory path. | |
| // No binary payload - maybe tabular data in a BINARY_SENSOR-typed file | |
| // (like legacy camera.GetImages file) - fall through to in-memory path. |
is this accurate? wdyt
Summary
Fixes APP-11676
Previously, data sync loaded the entire binary payload of each .capture file into memory before uploading it.
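For contrast with loading a whole payload up front, a minimal sketch of reading from an io.Reader in fixed-size chunks so that only one chunk is resident at a time. streamChunks, demoChunkSizes and the 4-byte chunk size are illustrative assumptions; the PR's actual chunking and gRPC request code is not shown on this page.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// streamChunks reads r in chunks of at most chunkSize bytes and passes each
// chunk to send. The buffer is reused, so send must not retain the slice.
func streamChunks(r io.Reader, chunkSize int, send func([]byte) error) (int64, error) {
	if chunkSize <= 0 {
		return 0, fmt.Errorf("chunkSize must be positive, got %d", chunkSize)
	}
	buf := make([]byte, chunkSize)
	var total int64
	for {
		n, err := io.ReadFull(r, buf)
		if n > 0 {
			if sendErr := send(buf[:n]); sendErr != nil {
				return total, sendErr
			}
			total += int64(n)
		}
		// io.EOF: nothing left; io.ErrUnexpectedEOF: final short chunk was sent above.
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			return total, nil
		}
		if err != nil {
			return total, err
		}
	}
}

// demoChunkSizes streams a 10-byte input in 4-byte chunks and records each chunk's length.
func demoChunkSizes() []int {
	var sizes []int
	_, _ = streamChunks(strings.NewReader("0123456789"), 4, func(b []byte) error {
		sizes = append(sizes, len(b))
		return nil
	})
	return sizes
}

func main() {
	fmt.Println(demoChunkSizes())
}
```

Peak memory in this pattern is bounded by chunkSize rather than by the payload size, which is the property the PR is after.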
Changes
- data/capture_file.go: adds BinaryPayloadReader(), which parses the protobuf wire format manually to extract the binary payload as an io.SectionReader backed directly by the file on disk. The payload is never copied into memory; the caller streams it chunk by chunk.
- services/datamanager/builtin/sync/upload_data_capture_file.go: rewrites the BINARY_SENSOR upload path:
  - uploadBinaryPayloads() loops through messages in the capture file using BinaryPayloadReader(), streaming each payload directly from disk
  - Large payloads (over MaxUnaryFileSize) go through the new uploadLargeBinaryFromReader(), which accepts an io.Reader instead of []byte
  - [...] not the whole file
  - camera.GetImages files fall through to the existing path unchanged
  - sendStreamingDCRequests updated to accept io.Reader instead of []byte
Testing
- [...] BinaryPayloadReader
- [...] .capture files via a script and synced with maximum_num_sync_threads: 8, monitoring RSS with: