fix(langchain): align LangChain/LangGraph tracing with the Python SDK by viniciusdsmello · Pull Request #209 · openlayer-ai/openlayer-ts

viniciusdsmello · 2026-06-11T15:10:35Z

Summary

LangGraph agent traces produced by the OpenlayerHandler callback routinely exceeded the platform's 10MB request limit and were rejected wholesale. Investigating the handler surfaced three distinct problem areas — payload bloat, trace corruption under concurrency, and poor dashboard rendering — all stemming from the same root cause: the TypeScript handler had drifted from the Python SDK's behavior, which is the reference implementation the platform is built around.

This PR aligns the TypeScript LangChain integration with langchain_callback.py. No consumer-side changes are required; upgrading the SDK is enough.

1. Payload size

A representative agent run (50KB response, 15 bound tools, 8-message history) shrinks from 1604KB to 659KB (-59%); the gain grows with the number of LLM calls per run.

Filter LangGraph internal runs. LANGSMITH_HIDDEN_TAG was defined but never used. Internal plumbing runs (ChannelWrite and branch runnables, all tagged langsmith:hidden) each became a step carrying the full graph state as both inputs and output. They are now skipped, as in Python.
Compact rawOutput. It was JSON.stringify({generation, llmOutput, fullResponse}, null, 2): fullResponse duplicated generation + llmOutput, and the pretty-printed string was then escaped inside the outer JSON. It is now the compact serialization of {generation, llmOutput}.
Deduplicate invocation params. Bound tool schemas were serialized four times per chat completion step (modelParameters, metadata.invocation_params, metadata.model_parameters, metadata.extra_params.invocation_params). They are now serialized only once, in modelParameters.
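The hidden-run filter described above can be sketched roughly as follows. This is a minimal illustration, not the handler's actual code: the RunStart shape, the isHiddenRun helper, and the sample run names and tags are assumptions made for the example; only the langsmith:hidden tag value and the LANGSMITH_HIDDEN_TAG constant name come from the description.

```typescript
// Hypothetical shape of a run-start event. The real LangChain callbacks
// receive these values as positional arguments rather than one object.
interface RunStart {
  runId: string;
  name: string;
  tags?: string[];
}

// Tag value taken from the PR description.
const LANGSMITH_HIDDEN_TAG = "langsmith:hidden";

// Hypothetical helper: true when the run is internal plumbing that
// should not become a step in the uploaded trace.
function isHiddenRun(run: RunStart): boolean {
  return run.tags?.includes(LANGSMITH_HIDDEN_TAG) ?? false;
}

// Illustrative sample data only.
const runs: RunStart[] = [
  { runId: "1", name: "agent", tags: ["graph:step:1"] },
  { runId: "2", name: "ChannelWrite", tags: [LANGSMITH_HIDDEN_TAG] },
  { runId: "3", name: "ChatOpenAI" },
];

// Only runs without the hidden tag are kept as candidate steps.
const visibleRuns = runs.filter((run) => !isHiddenRun(run));
```

Checking the tag at run start means the hidden run's inputs and output (the full graph state) are never serialized at all, which is where the saving comes from.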

No client-side truncation is performed, matching Python: size is addressed by not serializing redundant data in the first place.
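To make the "don't serialize redundant data" point concrete, the sketch below compares the old and new rawOutput serializations once each is embedded as a string in an outer JSON document. The generation and llmOutput values are made up for illustration, and the fullResponse shape is an assumption based on the statement that it duplicated the other two fields.

```typescript
// Illustrative stand-ins for a model response; not real SDK types.
const generation = { text: "The answer is 42.\n".repeat(20), generationInfo: { finishReason: "stop" } };
const llmOutput = { tokenUsage: { promptTokens: 120, completionTokens: 100, totalTokens: 220 } };

// Old behaviour: duplicated fullResponse plus pretty-printing (indent of 2).
const rawOutputBefore = JSON.stringify(
  { generation, llmOutput, fullResponse: { generation, llmOutput } },
  null,
  2,
);

// New behaviour: only the two distinct fields, compact.
const rawOutputAfter = JSON.stringify({ generation, llmOutput });

// rawOutput travels as a string inside the outer request body, so every
// quote and newline produced by pretty-printing is escaped a second time.
const outerBefore = JSON.stringify({ rawOutput: rawOutputBefore });
const outerAfter = JSON.stringify({ rawOutput: rawOutputAfter });

console.log(`before: ${outerBefore.length} chars, after: ${outerAfter.length} chars`);
```

No data is lost in the compact form: parsing rawOutputAfter still yields the same generation and llmOutput values.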

2. Concurrency isolation

The handler relied on the tracer's module-global step stack, so concurrent graph runs in the same process nested into one ever-growing merged trace. Steps are now assembled per run keyed by runId/parentRunId (mirroring Python's run_id/parent_run_id maps), so concurrent executions upload as separate traces. When an ambient trace context exists (e.g. a trace()-wrapped function), steps still nest under it as before.
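The per-run assembly can be pictured with a small sketch like the one below. It is an assumption-laden illustration, not the handler's implementation: the Step shape and the RunAssembler class are hypothetical, and only the idea of keying steps by runId/parentRunId instead of using a shared stack comes from the description.

```typescript
// Hypothetical minimal step shape for the illustration.
interface Step {
  runId: string;
  name: string;
  steps: Step[];
}

// Hypothetical assembler: every run is looked up by id, so interleaved
// events from concurrent graph runs never share a stack.
class RunAssembler {
  private steps = new Map<string, Step>();
  private roots = new Map<string, Step>();

  start(runId: string, name: string, parentRunId?: string): void {
    const step: Step = { runId, name, steps: [] };
    this.steps.set(runId, step);
    const parent = parentRunId ? this.steps.get(parentRunId) : undefined;
    if (parent) {
      parent.steps.push(step);
    } else {
      // No known parent: this run starts its own trace.
      this.roots.set(runId, step);
    }
  }

  // Returns the finished trace when a root run ends, undefined for nested runs.
  end(runId: string): Step | undefined {
    const root = this.roots.get(runId);
    if (!root) return undefined;
    this.roots.delete(runId);
    this.forget(root);
    return root;
  }

  // Drop bookkeeping for a completed trace so the maps do not grow forever.
  private forget(step: Step): void {
    this.steps.delete(step.runId);
    step.steps.forEach((child) => this.forget(child));
  }
}

// Two graph runs whose events interleave in the same process.
const assembler = new RunAssembler();
assembler.start("a", "graph-A");
assembler.start("b", "graph-B");
assembler.start("a1", "llm", "a");
assembler.start("b1", "tool", "b");
const nestedEnd = assembler.end("a1");
const traceA = assembler.end("a");
const traceB = assembler.end("b");
```

With a module-global stack, the interleaved start events above would nest graph-B under graph-A; with id-keyed maps each root ends up with only its own children.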

3. Trace shape & step types

Chain runs become USER_CALL steps named after the chain (no more "Handoffs: " prefix); the LangGraph root records inputs.prompt and surfaces the final message content as the trace output, so the dashboard shows the actual answer instead of a serialized state object.
Before upload, LangChain objects are converted recursively (Python's _convert_langchain_objects): messages render as {role, content} instead of raw lc/kwargs constructor JSON.
Agent steps use Python's Agent Tool: <tool> naming with structured {tool, tool_input, log} inputs; retriever inputs become {query}.
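A recursive conversion of this kind might look roughly like the sketch below. It assumes the serialized form is an object with type "constructor", an id path whose last element is the class name, and a kwargs object; the class-to-role table and the fallback of unwrapping unknown constructors to their kwargs are choices made for this example, not confirmed behavior of either SDK.

```typescript
// Assumed mapping from message class name to chat role.
const ROLE_BY_CLASS: Record<string, string> = {
  HumanMessage: "user",
  AIMessage: "assistant",
  SystemMessage: "system",
  ToolMessage: "tool",
};

interface SerializedConstructor {
  type: "constructor";
  id: string[];
  kwargs: Record<string, unknown>;
}

// Type guard for the assumed lc/kwargs constructor JSON.
function isSerializedConstructor(value: unknown): value is SerializedConstructor {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    record.type === "constructor" &&
    Array.isArray(record.id) &&
    typeof record.kwargs === "object" &&
    record.kwargs !== null
  );
}

// Hypothetical counterpart of a recursive object converter: walks arrays and
// plain objects, replacing serialized messages with {role, content}.
function convertLangChainObjects(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => convertLangChainObjects(item));
  }
  if (isSerializedConstructor(value)) {
    const className = value.id[value.id.length - 1] ?? "";
    const role = ROLE_BY_CLASS[className];
    if (role !== undefined) {
      return { role, content: convertLangChainObjects(value.kwargs.content) };
    }
    // Unknown constructor: fall back to its converted kwargs (example choice).
    return convertLangChainObjects(value.kwargs);
  }
  if (typeof value === "object" && value !== null) {
    return Object.fromEntries(
      Object.entries(value).map(([key, inner]) => [key, convertLangChainObjects(inner)]),
    );
  }
  return value;
}

// Illustrative graph state containing one serialized message.
const state = {
  messages: [
    { lc: 1, type: "constructor", id: ["langchain_core", "messages", "HumanMessage"], kwargs: { content: "hi" } },
  ],
};
const converted = convertLangChainObjects(state);
```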

4. New step types: HANDOFF and GUARDRAIL

The Python SDK and the platform support handoff and guardrail step types that the TypeScript SDK could not emit:

StepType.HANDOFF / HandoffStep (fromComponent, toComponent, handoffData) and StepType.GUARDRAIL / GuardrailStep (action, reason, blocked/detected/redacted entities, confidenceThreshold, blockStrategy, dataType), serializing the same wire fields as Python's steps.py.
addHandoffStepToTrace now emits a real handoff step (it previously emitted a chain step with a name prefix); new addGuardrailStepToTrace helper.
The LangChain handler maps LangGraph multi-agent handoff tools (transfer_to_<agent> / transfer_back_to_<agent>) to HANDOFF steps with from/to components.
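The tool-name mapping in the last item could be sketched as below. The field names fromComponent, toComponent, and handoffData come from the description; the regular expression, the toHandoff helper, and the assumption that both name forms identify the destination agent (with the currently running agent as the source) are illustrative guesses rather than the handler's actual logic.

```typescript
// Subset of the handoff wire fields named in the PR description.
interface HandoffFields {
  fromComponent: string;
  toComponent: string;
  handoffData?: Record<string, unknown>;
}

// Matches transfer_to_<agent> and transfer_back_to_<agent>.
const HANDOFF_TOOL_PATTERN = /^transfer_(?:back_)?to_(.+)$/;

// Hypothetical helper: returns handoff fields for a handoff tool call,
// or undefined when the tool is an ordinary tool.
function toHandoff(
  toolName: string,
  currentAgent: string,
  toolInput?: Record<string, unknown>,
): HandoffFields | undefined {
  const target = HANDOFF_TOOL_PATTERN.exec(toolName)?.[1];
  if (!target) return undefined;
  return {
    fromComponent: currentAgent,
    toComponent: target,
    // Only attach handoffData when the tool call carried input.
    ...(toolInput ? { handoffData: toolInput } : {}),
  };
}

// Illustrative agent and tool names.
const forward = toHandoff("transfer_to_billing", "triage", { reason: "invoice question" });
const back = toHandoff("transfer_back_to_supervisor", "billing");
const ordinary = toHandoff("search_docs", "triage");
```

Tools that do not match the pattern return undefined here, so a caller can fall through to the regular tool-step path.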

5. Tracer

Upload logic extracted into a reusable processAndUploadTrace() (analogous to Python's _upload_and_publish_trace). Upload failures now log a compact summary — pipeline id, inference id, payload size — instead of dumping the entire pretty-printed trace into the logs.
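As an illustration of the compact failure log, the sketch below builds a summary with the three fields mentioned above. The UploadFailureSummary type and summarizeUploadFailure function are hypothetical names for this example, and measuring the size as the UTF-8 byte length of the compact JSON body is an assumption about what "payload size" means.

```typescript
// Hypothetical summary shape: the three fields named in the description,
// plus the error message.
interface UploadFailureSummary {
  pipelineId: string;
  inferenceId: string;
  payloadBytes: number;
  error: string;
}

// Hypothetical helper: describes a failed upload without echoing the payload.
function summarizeUploadFailure(
  pipelineId: string,
  inferenceId: string,
  payload: unknown,
  error: unknown,
): UploadFailureSummary {
  // Size of the compact JSON body in UTF-8 bytes (assumed definition of size).
  const body = JSON.stringify(payload) ?? "";
  const payloadBytes = new TextEncoder().encode(body).length;
  const message = error instanceof Error ? error.message : String(error);
  return { pipelineId, inferenceId, payloadBytes, error: message };
}

// Illustrative identifiers and payload.
const summary = summarizeUploadFailure(
  "pipeline-123",
  "inference-456",
  { a: "é" },
  new Error("413 Payload Too Large"),
);
console.error("Trace upload failed", JSON.stringify(summary));
```

Logging a fixed-size summary keeps the log line short regardless of how large the rejected trace was, while still recording enough to find the trace and see whether size was the likely cause.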

Validation

Test files are intentionally not part of this PR; validation was performed locally:

Unit/integration tests at the handler→upload boundary covering each fix, written test-first; the concurrency test reproduces the merged-trace bug against the old implementation
End-to-end tests running a real LangGraph StateGraph via graph.withConfig({callbacks}) — real callback events including LangGraph's hidden internal runs — with only the HTTP boundary mocked; these fail against the previous implementation and pass with this branch
Live validation: real LangGraph runs traced with this branch uploaded successfully to a real inference pipeline, confirming ingestion and the improved trace rendering; a probe trace confirmed the platform persists all step types used here, including handoff and guardrail
tests/integrations/claudeAgentSdk.test.ts (17 tests) green — it shares the tracer internals
tsc --noEmit, eslint and prettier clean; remaining jest failures are pre-existing on main (generated api-resources tests require the mock Steady server; openai-tracer.test.ts has one pre-existing failure)

Out of scope

Migrating tracedTool from function_call to tool steps (behavior change for existing users — needs a product decision)
Gzip compression of the data-stream POST (requires confirming backend support for Content-Encoding: gzip)

🤖 Generated with Claude Code

LangGraph agent traces routinely exceeded the platform's 10MB request limit and were rejected wholesale, concurrent runs corrupted each other's traces, and steps rendered poorly in the dashboard. All of it stemmed from the TypeScript handler drifting from langchain_callback.py, the reference implementation. This aligns the two:

Payload size (-59% on a representative agent run):
- Skip LangGraph internal runs tagged langsmith:hidden (ChannelWrite, branch runnables), which carried the full graph state as both inputs and output. The constant existed but was never used.
- Compact rawOutput: drop the duplicated fullResponse blob and the pretty-printing that inflated the escaped string.
- Serialize invocation params (including bound tool schemas) once per chat completion step, in modelParameters, instead of four times.

Concurrency:
- Assemble steps per run keyed by runId/parentRunId, mirroring the Python handler's run_id/parent_run_id maps, instead of relying on the tracer's module-global step stack. Concurrent graph executions now upload as separate traces; ambient trace contexts still nest as before.

Trace shape:
- Chain runs become USER_CALL steps named after the chain; the LangGraph root records inputs.prompt and surfaces the final message content as the trace output.
- LangChain objects are converted recursively before upload, so messages render as {role, content} instead of raw lc/kwargs constructor JSON.
- Agent steps use Python's "Agent Tool: <tool>" naming with structured inputs; retriever inputs become {query}.

Step types:
- Add HANDOFF/GUARDRAIL step types with HandoffStep and GuardrailStep serializing the same wire fields as Python's steps.py.
- addHandoffStepToTrace now emits a real handoff step (previously a chain step with a name prefix); add addGuardrailStepToTrace.
- Map LangGraph multi-agent handoff tools (transfer_to_<agent>) to HANDOFF steps with from/to components.

Tracer:
- Extract upload logic into a reusable processAndUploadTrace(), analogous to Python's _upload_and_publish_trace.
- Log a compact summary on upload failure instead of dumping the entire pretty-printed trace into the logs.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

viniciusdsmello force-pushed the fix/langchain-trace-payload-size branch from 4ad2caa to 27236a6 Compare June 11, 2026 15:21

viniciusdsmello changed the title ~~fix(langchain): shrink trace payloads and isolate concurrent runs~~ fix(langchain): align LangChain/LangGraph tracing with the Python SDK Jun 11, 2026

viniciusdsmello force-pushed the fix/langchain-trace-payload-size branch from cd37a78 to 61704b1 Compare June 11, 2026 17:11

gustavocidornelas approved these changes Jun 11, 2026

gustavocidornelas merged commit 11d1039 into main Jun 11, 2026
5 checks passed

gustavocidornelas deleted the fix/langchain-trace-payload-size branch June 11, 2026 18:00
