fix(langchain): align LangChain/LangGraph tracing with the Python SDK #209
Merged
LangGraph agent traces routinely exceeded the platform's 10MB request
limit and were rejected wholesale, concurrent runs corrupted each
other's traces, and steps rendered poorly in the dashboard. All of it
stemmed from the TypeScript handler drifting from langchain_callback.py,
the reference implementation. This aligns the two:
Payload size (-59% on a representative agent run):
- Skip LangGraph internal runs tagged langsmith:hidden (ChannelWrite,
branch runnables), which carried the full graph state as both inputs
and output. The constant existed but was never used.
- Compact rawOutput: drop the duplicated fullResponse blob and the
pretty-printing that inflated the escaped string.
- Serialize invocation params (including bound tool schemas) once per
chat completion step, in modelParameters, instead of four times.
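A minimal sketch of the hidden-run filter described above. Only the langsmith:hidden tag and the LANGSMITH_HIDDEN_TAG constant name come from this PR; the RunStartEvent shape and the helper names are hypothetical and are not the handler's actual API.

```typescript
// Tag that LangGraph attaches to internal plumbing runs (from the PR text).
const LANGSMITH_HIDDEN_TAG = "langsmith:hidden";

// Hypothetical, simplified view of a run-start callback event.
interface RunStartEvent {
  runId: string;
  name: string;
  tags?: string[];
  inputs: Record<string, unknown>;
}

/** True when the run is internal plumbing and should not become a step. */
function isHiddenRun(event: RunStartEvent): boolean {
  return event.tags?.includes(LANGSMITH_HIDDEN_TAG) ?? false;
}

/** Keeps only the runs that should be recorded as steps. */
function visibleRuns(events: RunStartEvent[]): RunStartEvent[] {
  return events.filter((event) => !isHiddenRun(event));
}

const events: RunStartEvent[] = [
  { runId: "1", name: "agent", tags: ["graph:step:1"], inputs: { messages: [] } },
  { runId: "2", name: "ChannelWrite", tags: [LANGSMITH_HIDDEN_TAG], inputs: { messages: [] } },
  { runId: "3", name: "tools", inputs: {} },
];

const recorded = visibleRuns(events).map((event) => event.name);
console.log(recorded); // ["agent", "tools"]
```

Because each hidden run carried the full graph state twice, dropping the run entirely avoids serializing that state at all, rather than truncating it afterwards.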
Concurrency:
- Assemble steps per run keyed by runId/parentRunId, mirroring the
Python handler's run_id/parent_run_id maps, instead of relying on the
tracer's module-global step stack. Concurrent graph executions now
upload as separate traces; ambient trace contexts still nest as
before.
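The per-run assembly can be pictured roughly as follows. This is an illustrative sketch under assumptions, not the handler's code: the Step shape, the RunStepAssembler class, and its method names are hypothetical; only the idea of keying steps by runId/parentRunId comes from this PR.

```typescript
// Hypothetical minimal step record.
interface Step {
  runId: string;
  name: string;
  children: Step[];
}

class RunStepAssembler {
  // Every in-flight step, keyed by its own run id.
  private readonly steps = new Map<string, Step>();
  // Child run id -> parent run id, for runs whose parent is tracked here.
  private readonly parentOf = new Map<string, string>();

  startRun(runId: string, parentRunId: string | undefined, name: string): void {
    const step: Step = { runId, name, children: [] };
    this.steps.set(runId, step);
    const parent = parentRunId !== undefined ? this.steps.get(parentRunId) : undefined;
    if (parent !== undefined && parentRunId !== undefined) {
      parent.children.push(step);
      this.parentOf.set(runId, parentRunId);
    }
  }

  /** Returns the finished tree when a root run ends, otherwise undefined. */
  endRun(runId: string): Step | undefined {
    const step = this.steps.get(runId);
    if (step === undefined || this.parentOf.has(runId)) return undefined;
    this.forget(step);
    return step;
  }

  // Drops a finished tree from both maps so memory does not grow per run.
  private forget(step: Step): void {
    this.steps.delete(step.runId);
    this.parentOf.delete(step.runId);
    step.children.forEach((child) => this.forget(child));
  }
}

// Two graph executions interleaved in the same process.
const assembler = new RunStepAssembler();
assembler.startRun("a", undefined, "graph-A");
assembler.startRun("b", undefined, "graph-B");
assembler.startRun("a1", "a", "llm");
assembler.startRun("b1", "b", "tool");
assembler.endRun("a1");
assembler.endRun("b1");
const traceA = assembler.endRun("a");
const traceB = assembler.endRun("b");
console.log(traceA?.children.map((s) => s.name), traceB?.children.map((s) => s.name));
```

Since no step is ever attached through shared global state, the interleaved runs above end up as two independent trees.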
Trace shape:
- Chain runs become USER_CALL steps named after the chain; the LangGraph
root records inputs.prompt and surfaces the final message content as
the trace output.
- LangChain objects are converted recursively before upload, so messages
render as {role, content} instead of raw lc/kwargs constructor JSON.
- Agent steps use Python's "Agent Tool: <tool>" naming with structured
inputs; retriever inputs become {query}.
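As a rough illustration of the recursive conversion, the sketch below turns serialized LangChain message objects (the lc/type/id/kwargs constructor form) into {role, content}. The class-to-role mapping and the convertLangchainObjects name are assumptions made for this example; the SDK's actual mapping and the Python _convert_langchain_objects logic may differ.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Assumed mapping from message class name to chat role.
const ROLE_BY_CLASS: Record<string, string> = {
  HumanMessage: "user",
  AIMessage: "assistant",
  SystemMessage: "system",
  ToolMessage: "tool",
};

/** Recursively replaces serialized message constructors with {role, content}. */
function convertLangchainObjects(value: Json): Json {
  if (Array.isArray(value)) return value.map((item) => convertLangchainObjects(item));
  if (value === null || typeof value !== "object") return value;

  const id = value["id"];
  const kwargs = value["kwargs"];
  if (
    value["type"] === "constructor" &&
    Array.isArray(id) &&
    kwargs !== undefined &&
    kwargs !== null &&
    typeof kwargs === "object" &&
    !Array.isArray(kwargs)
  ) {
    // The class name is the last element of the serialized id path.
    const className = id[id.length - 1];
    const role = typeof className === "string" ? ROLE_BY_CLASS[className] : undefined;
    if (role !== undefined) {
      return { role, content: convertLangchainObjects(kwargs["content"] ?? null) };
    }
  }

  // Plain object: convert each property and keep the keys.
  const out: { [key: string]: Json } = {};
  for (const [key, child] of Object.entries(value)) {
    out[key] = convertLangchainObjects(child);
  }
  return out;
}

const converted = convertLangchainObjects({
  messages: [
    {
      lc: 1,
      type: "constructor",
      id: ["langchain_core", "messages", "HumanMessage"],
      kwargs: { content: "hi" },
    },
  ],
});
console.log(JSON.stringify(converted));
```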
Step types:
- Add HANDOFF/GUARDRAIL step types with HandoffStep and GuardrailStep
serializing the same wire fields as Python's steps.py.
- addHandoffStepToTrace now emits a real handoff step (previously a
chain step with a name prefix); add addGuardrailStepToTrace.
- Map LangGraph multi-agent handoff tools (transfer_to_<agent>) to
HANDOFF steps with from/to components.
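One way to recognize handoff tools by name is sketched below. The transfer_to_<agent> and transfer_back_to_<agent> patterns and the fromComponent/toComponent field names come from this PR; the parseHandoffTool function and the idea of passing the current agent name in as the source are hypothetical simplifications.

```typescript
interface HandoffInfo {
  fromComponent: string;
  toComponent: string;
}

/**
 * Returns handoff details when the tool name follows the
 * transfer_to_<agent> / transfer_back_to_<agent> convention,
 * otherwise undefined (the tool is an ordinary tool call).
 */
function parseHandoffTool(toolName: string, currentAgent: string): HandoffInfo | undefined {
  const match = /^transfer_(?:back_)?to_(.+)$/.exec(toolName);
  const target = match?.[1];
  if (target === undefined) return undefined;
  return { fromComponent: currentAgent, toComponent: target };
}

const handoff = parseHandoffTool("transfer_to_billing", "triage");
const back = parseHandoffTool("transfer_back_to_triage", "billing");
const plain = parseHandoffTool("search_docs", "triage");
console.log(handoff, back, plain);
```

Tool calls that do not match the pattern fall through unchanged, so ordinary tools keep their existing step type.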
Tracer:
- Extract upload logic into a reusable processAndUploadTrace(),
analogous to Python's _upload_and_publish_trace.
- Log a compact summary on upload failure instead of dumping the entire
pretty-printed trace into the logs.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
gustavocidornelas
approved these changes
Jun 11, 2026
Summary
LangGraph agent traces produced by the OpenlayerHandler callback routinely exceeded the platform's 10MB request limit and were rejected wholesale. Investigating the handler surfaced three distinct problem areas (payload bloat, trace corruption under concurrency, and poor dashboard rendering), all stemming from the same root cause: the TypeScript handler had drifted from the Python SDK's behavior, which is the reference implementation the platform is built around.
This PR aligns the TypeScript LangChain integration with langchain_callback.py. No consumer-side changes are required: upgrading the SDK is enough.
1. Payload size
A representative agent run (50KB response, 15 bound tools, 8-message history) shrinks from 1604KB to 659KB (-59%); the gain grows with the number of LLM calls per run.
- LANGSMITH_HIDDEN_TAG was defined but never used. Internal plumbing runs (ChannelWrite, branch runnables, all tagged langsmith:hidden) each became a step carrying the full graph state as both inputs and output. They are now skipped, like in Python.
- Compact rawOutput. It was JSON.stringify({generation, llmOutput, fullResponse}, null, 2): fullResponse duplicated generation + llmOutput, and the pretty-printed string was then escaped inside the outer JSON. Now {generation, llmOutput}, compact.
- Invocation params (including bound tool schemas) were serialized four times per chat completion step (modelParameters, metadata.invocation_params, metadata.model_parameters, metadata.extra_params.invocation_params). Now only modelParameters.
No client-side truncation is performed, matching Python: size is addressed by not serializing redundant data in the first place.
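To make the rawOutput change concrete, the sketch below compares the two serializations on made-up data. The generation and llmOutput values are invented for illustration, and treating fullResponse as a simple wrapper around both is an assumption based on the description above.

```typescript
// Invented sample data standing in for a model response.
const generation = { text: "x".repeat(1000) };
const llmOutput = { tokenUsage: { totalTokens: 42 } };
// Assumption: fullResponse repeats the same two objects.
const fullResponse = { generation, llmOutput };

// Previous behavior: duplicated blob, pretty-printed with 2-space indent.
const rawOutputBefore = JSON.stringify({ generation, llmOutput, fullResponse }, null, 2);
// New behavior: no duplicate, no indentation.
const rawOutputAfter = JSON.stringify({ generation, llmOutput });

// rawOutput is a string field, so its quotes and newlines are escaped
// again when the step is embedded in the outer request body.
const payloadBefore = JSON.stringify({ rawOutput: rawOutputBefore });
const payloadAfter = JSON.stringify({ rawOutput: rawOutputAfter });

console.log(`before: ${payloadBefore.length} chars, after: ${payloadAfter.length} chars`);
```

The saving compounds with the number of LLM calls in a run, since every chat completion step carries its own rawOutput.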
2. Concurrency isolation
The handler relied on the tracer's module-global step stack, so concurrent graph runs in the same process nested into one ever-growing merged trace. Steps are now assembled per run keyed by runId/parentRunId (mirroring Python's run_id/parent_run_id maps), so concurrent executions upload as separate traces. When an ambient trace context exists (e.g. a trace()-wrapped function), steps still nest under it as before.
3. Trace shape & step types
- Chain runs become USER_CALL steps named after the chain (no more "Handoffs: " prefix); the LangGraph root records inputs.prompt and surfaces the final message content as the trace output, so the dashboard shows the actual answer instead of a serialized state object.
- LangChain objects are converted recursively before upload (like Python's _convert_langchain_objects): messages render as {role, content} instead of raw lc/kwargs constructor JSON.
- Agent steps use Python's Agent Tool: <tool> naming with structured {tool, tool_input, log} inputs; retriever inputs become {query}.
4. New step types: HANDOFF and GUARDRAIL
The Python SDK and the platform support handoff and guardrail step types that the TypeScript SDK could not emit:
- StepType.HANDOFF / HandoffStep (fromComponent, toComponent, handoffData) and StepType.GUARDRAIL / GuardrailStep (action, reason, blocked/detected/redacted entities, confidenceThreshold, blockStrategy, dataType), serializing the same wire fields as Python's steps.py.
- addHandoffStepToTrace now emits a real handoff step (it previously emitted a chain step with a name prefix); new addGuardrailStepToTrace helper.
- LangGraph multi-agent handoff tools (transfer_to_<agent> / transfer_back_to_<agent>) are mapped to HANDOFF steps with from/to components.
5. Tracer
Upload logic extracted into a reusable processAndUploadTrace() (analogous to Python's _upload_and_publish_trace). Upload failures now log a compact summary (pipeline id, inference id, payload size) instead of dumping the entire pretty-printed trace into the logs.
Validation
Test files are intentionally not part of this PR; validation was performed locally:
- StateGraph via graph.withConfig({callbacks}) (real callback events, including LangGraph's hidden internal runs) with only the HTTP boundary mocked; these fail against the previous implementation and pass with this branch
- handoff and guardrail
- tests/integrations/claudeAgentSdk.test.ts (17 tests) green; it shares the tracer internals
- tsc --noEmit, eslint and prettier clean; remaining jest failures are pre-existing on main (generated api-resources tests require the mock Steady server; openai-tracer.test.ts has one pre-existing failure)
Out of scope
- Moving tracedTool from function_call to tool steps (behavior change for existing users; needs a product decision)
- Request compression (Content-Encoding: gzip)
🤖 Generated with Claude Code