
schema: add machine-readable JSON Schema v2 (diffgraph-v2.schema.json)#16

Open
avikalpg wants to merge 2 commits into main from nia/schema-v2-json-schema
Conversation

@avikalpg avikalpg commented Jun 8, 2026

What

Adds diffgraph/schema/diffgraph-v2.schema.json — the JSON Schema 2020-12 draft that operationalises the v2 output contract from design/JSON-SCHEMA.md.

This satisfies one of the four schema ratification criteria listed in that doc:

  • A JSON Schema validation file exists at diffgraph/schema/diffgraph-v2.schema.json

What it covers

  • FileEntry — git-metadata-derived file change info
  • SymbolEntry — named code entities with change_kind, analysis_source, evidence (required for inferred claims)
  • RelationshipEntry — edges with mandatory analysis_source; confidence + evidence required when inferred
  • SummaryEntry — top-level LLM summary (always inferred)
  • Evidence — typed union of all evidence kinds (ast_parse, import_statement, call_site, llm_inference, structural_basis, ...)
  • Metadata — includes privacy_tier as a required field
  • Warning — machine-readable warning codes for degraded analysis
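The conditional rule on RelationshipEntry (analysis_source always present; confidence and evidence required when inferred) can be sketched in plain Python. This is an illustrative check, not the schema itself: the function name and the "static" value used in the usage note are hypothetical, and only the field names and the "inferred" value come from the description above.

```python
def relationship_errors(entry):
    """Return a list of problems for one RelationshipEntry-like dict.

    Hypothetical helper: mirrors the described rule, not the real schema.
    """
    errors = []
    if "analysis_source" not in entry:
        # analysis_source is mandatory on every edge.
        errors.append("analysis_source is mandatory")
    elif entry["analysis_source"] == "inferred":
        # Inferred edges must carry a confidence value...
        if "confidence" not in entry:
            errors.append("inferred edge needs confidence")
        # ...and a non-empty evidence list.
        if not entry.get("evidence"):
            errors.append("inferred edge needs non-empty evidence")
    return errors
```

For example, `relationship_errors({"analysis_source": "inferred"})` reports both missing fields, while an entry with a non-inferred source (say, a hypothetical `"static"`) passes without confidence or evidence.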

Why now

The schema file was a listed ratification criterion. It's completely unblocked (doesn't depend on Avikalp's answers to B1/B2/B3 or any PR merge). Having it available means consumers can use it for validation in CI, typed generation, and VS Code schema hints.

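Remaining ratification criteria

  • Avikalp sign-off on sub-questions
  • One end-to-end worked example validated
  • PR #11 updated to target this schema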

Related

  • Companion doc: design/PR-13-REVIEW.md (committed to the docs repo, Mon Jun 8 session): code-level review of PR #13 (Phase1: tree sitter dependency extraction for diffgraph) against its acceptance criteria
  • design/DESIGN-SYNTHESIS.md — full v2 spec
  • design/JSON-SCHEMA.md — prose schema spec this file implements

Summary by CodeRabbit

  • New Features
    • Introduced DiffGraph v2.0 schema as the canonical output format, adding comprehensive validation for artifact shape, identifiers, files, symbols, relationships, and metadata.
    • Enforced strict structural constraints and conditional requirements (e.g., evidence and confidence for inferred entries) plus enumerated types and warning/metadata fields for consistent, validated outputs.

Adds diffgraph/schema/diffgraph-v2.schema.json — the JSON Schema 2020-12
draft that operationalises the v2 output contract from design/JSON-SCHEMA.md.

Covers: FileEntry, SymbolEntry, RelationshipEntry, SummaryEntry, Evidence,
Metadata, Warning, AnalysisSource. Required fields enforced; inferred claims
must carry evidence + confidence. privacy_tier is a top-level required
metadata field. Consumers can use this for validation in CI, typed generation,
and VS Code schema hints.

This satisfies one of the four schema ratification criteria in JSON-SCHEMA.md
(the machine-readable file). Still needs: Avikalp sign-off on sub-questions,
one end-to-end worked example validated, PR #11 updated to target this schema.

coderabbitai Bot commented Jun 8, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 02fcf128-6e74-4e84-b666-bfba6bb6740c

📥 Commits

Reviewing files that changed from the base of the PR and between f36c1e9 and c995b3e.

📒 Files selected for processing (1)
  • diffgraph/schema/diffgraph-v2.schema.json
🚧 Files skipped from review as they are similar to previous changes (1)
  • diffgraph/schema/diffgraph-v2.schema.json

Walkthrough

A new JSON Schema file defines the canonical DiffGraph v2.0 output format: a strict top-level object with required fields (schema_version, generated_at, wild_version, diff_ref, files, symbols, relationships, metadata), reusable types for provenance/evidence, conditional validation for inferred entries, and metadata/warnings schemas.
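As a rough illustration of what "strict top-level object with required fields" means for a producer, the sketch below checks a candidate artifact for missing and unexpected keys. The required key names are taken from the walkthrough; the `allowed` set is an assumption, since the real schema may also declare optional top-level properties (for example, one holding the SummaryEntry) that are not listed here.

```python
# Required top-level keys, as listed in the walkthrough.
REQUIRED_TOP_LEVEL = {
    "schema_version", "generated_at", "wild_version", "diff_ref",
    "files", "symbols", "relationships", "metadata",
}


def top_level_errors(artifact, allowed=REQUIRED_TOP_LEVEL):
    """Report missing required keys and keys outside `allowed`.

    Hypothetical helper. `allowed` defaults to the required set, which is
    an assumption; pass a larger set if the schema permits optional keys.
    """
    present = set(artifact)
    missing = sorted(REQUIRED_TOP_LEVEL - present)
    # additionalProperties: false rejects any key the schema does not declare.
    unexpected = sorted(present - set(allowed))
    return ["missing: " + k for k in missing] + ["unexpected: " + k for k in unexpected]
```

An artifact that carries all eight required keys plus a stray `debug` key would yield `["unexpected: debug"]` under the default `allowed` set.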

Changes

DiffGraph v2.0 Output Schema Contract

| Layer / File(s) | Summary |
| --- | --- |
| Root schema structure and constraints (diffgraph/schema/diffgraph-v2.schema.json) | Top-level schema version, required fields (schema_version, generated_at, wild_version, diff_ref, files, symbols, relationships, metadata), strict additionalProperties: false, and structure for diff identity and top-level arrays. |
| Core type definitions and evidence provenance (diffgraph/schema/diffgraph-v2.schema.json) | Reusable defs: AnalysisSource, Evidence, Classification; FileEntry and SymbolEntry record schemas with identifier patterns and conditional validation that inferred SymbolEntry requires non-empty evidence. |
| RelationshipEntry schema (diffgraph/schema/diffgraph-v2.schema.json) | RelationshipEntry with enumerated kind, stable edge identifiers, endpoints, optional resolution/confidence fields, and conditional validation requiring confidence plus non-empty evidence when analysis_source is inferred. |
| SummaryEntry and evidence rules (diffgraph/schema/diffgraph-v2.schema.json) | SummaryEntry fixed to analysis_source: "inferred" and mandates evidence composition (must include both llm_inference and structural_basis evidence entries). |
| Warnings and metadata schemas (diffgraph/schema/diffgraph-v2.schema.json) | Warning object with enumerated code values and Metadata object with required privacy_tier enum and optional telemetry fields (detected languages, analysis duration, LLM calls/model, warnings array). |

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the primary change: adding a machine-readable JSON Schema v2 file (diffgraph-v2.schema.json) to define the DiffGraph output format. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@diffgraph/schema/diffgraph-v2.schema.json`:
- Around line 256-257: The schema defines location line numbers with
"line_start" and "line_end" having "minimum": 0 which permits zero-valued lines;
update the JSON schema in diffgraph-v2.schema.json so both "line_start" and
"line_end" use "minimum": 1 to enforce 1-indexed line numbers (change the
"line_start" and "line_end" properties accordingly in the location definition).
- Around line 120-121: The schema's line_start has a mismatch: its description
says "1-indexed line number" but "minimum" is 0; update the JSON schema to
enforce 1-indexing by changing the "minimum" for the line_start property from 0
to 1, and likewise review line_end (the "line_end" property) to ensure its
"minimum" is consistent with line_start (set to 1 if it should also be
1-indexed) so the constraints match the documented descriptions.
- Around line 341-366: The SummaryEntry schema currently allows missing evidence
despite its description; add "evidence" to the SummaryEntry "required" array and
strengthen the "evidence" property so the array must contain at least one
llm_inference and one structural_basis entry (use JSON Schema array "contains"
constraints or equivalent: two contains clauses with item schemas that match an
Evidence with "kind":"llm_inference" and "kind":"structural_basis", while
keeping items as { "$ref": "#/$defs/Evidence" }); update SummaryEntry to require
evidence and add the contains-based validators to enforce the described
contract.
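The third finding asks for two `contains` clauses on the same array. Because a JSON object cannot repeat the `contains` key, the clauses have to be wrapped in `allOf`. The sketch below writes one possible fragment as a Python dict and mimics its effect with a small pure-Python check. The exact fragment layout and the helper name are assumptions; only `evidence`, `kind`, `llm_inference`, `structural_basis` and `#/$defs/Evidence` come from the review text.

```python
# Assumed shape of the relevant part of SummaryEntry (not copied from the file).
SUMMARY_EVIDENCE_FRAGMENT = {
    "required": ["evidence"],
    "properties": {
        "evidence": {
            "type": "array",
            "items": {"$ref": "#/$defs/Evidence"},
            # One `contains` per required kind, combined with allOf.
            "allOf": [
                {"contains": {"properties": {"kind": {"const": "llm_inference"}},
                              "required": ["kind"]}},
                {"contains": {"properties": {"kind": {"const": "structural_basis"}},
                              "required": ["kind"]}},
            ],
        }
    },
}


def satisfies_contains(evidence, fragment=SUMMARY_EVIDENCE_FRAGMENT):
    """Mimic the allOf/contains clauses: every required kind appears at least once."""
    clauses = fragment["properties"]["evidence"]["allOf"]
    wanted = [c["contains"]["properties"]["kind"]["const"] for c in clauses]
    kinds = {e.get("kind") for e in evidence if isinstance(e, dict)}
    return all(kind in kinds for kind in wanted)
```

With this check, an evidence list holding only an `llm_inference` entry is rejected, and adding a `structural_basis` entry makes it pass; extra entries of other kinds (such as `ast_parse`) do not matter.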

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 51092d93-c7dc-4f19-8008-259c564ffac1

📥 Commits

Reviewing files that changed from the base of the PR and between a43abac and f36c1e9.

📒 Files selected for processing (1)
  • diffgraph/schema/diffgraph-v2.schema.json

Three fixes per coderabbit review:
- Evidence.line_start/line_end: minimum 0 → 1 (schema said 1-indexed,
  minimum was inconsistently 0; line 0 is not a valid source location)
- SymbolEntry.location.line_start/line_end: same fix; added descriptions
- SummaryEntry: add 'evidence' to required array and add allOf contains
  constraints enforcing at least one llm_inference + one structural_basis
  entry (matches the described contract that was previously unenforced)
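The effect of moving `minimum` from 0 to 1 can be sketched as a small check on a location-like dict. The helper name is hypothetical, and treating the fields as integers is an assumption; the commit message only states the field names and the new minimum.

```python
def location_errors(loc):
    """Flag line_start / line_end values that are not 1-indexed.

    Hypothetical helper; assumes both fields are integers when present.
    """
    errors = []
    for field in ("line_start", "line_end"):
        value = loc.get(field)
        if value is None:
            continue  # whether the field is required is not covered here
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            errors.append(field + " must be an integer >= 1")
    return errors
```

Under the old `minimum: 0`, a location of `{"line_start": 0, "line_end": 3}` would have validated; with the fix it is reported as invalid.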