fix: align agent data access policy checks #19507
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly hardens the agent data access policy framework. It establishes a robust foundation for defining, validating, and evaluating agent interactions with data sources and tools, ensuring a deny-by-default posture and enforcing critical security patterns. The changes are confined to the policy definition and evaluation layers, without altering existing runtime behavior, and include a modernized testing approach for improved reliability.
Warning: Rate limit exceeded
Code Review
This pull request lays a strong foundation for agent data access policies by introducing schema-validated registries, a typed policy evaluator, and an ESM-compatible test runner. However, there are significant security gaps in the evaluator logic that undermine the intended 'deny-by-default' posture. Specifically, the approvalId check for restricted sources is easily bypassed due to lack of validation, and the optional nature of operation and sourceId fields allows for bypassing banned pattern and source restriction checks. Additionally, the policy loader is vulnerable to a limited path traversal if the policyDir parameter is untrusted. Further suggestions include aligning JSON schemas with the specified draft version, improving policy loading and evaluation performance, and enhancing test coverage.
```ts
if (!inv.approvalId) {
  return { allow: false, reason: 'APPROVAL_REQUIRED' };
}
```
The policy evaluator allows access to restricted sources as long as an approvalId is present in the invocation. It does not perform any validation on the approvalId (e.g., checking against a registry of valid approvals, verifying the requester, or ensuring the approval is for the specific source). This allows an attacker to bypass the restriction by providing any non-empty string as an approvalId.
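One way to close this gap is to treat `approvalId` as a reference that must resolve to an issued approval covering this exact source and requester. A minimal sketch, assuming a registry of approvals is available at evaluation time; `ApprovalRecord`, the `requester` and `expiresAt` fields, `checkApproval`, and every reason code other than `APPROVAL_REQUIRED` are hypothetical and not part of the PR:

```ts
// Hypothetical approval registry entry; the PR does not define one.
interface ApprovalRecord {
  id: string;
  sourceId: string; // restricted source the approval covers
  requester: string; // principal the approval was granted to
  expiresAt: number; // epoch milliseconds
}

// Hypothetical request shape: only the fields this check needs.
interface RestrictedAccessRequest {
  approvalId?: string;
  sourceId: string;
  requester: string;
}

type Decision = { allow: true } | { allow: false; reason: string };

function checkApproval(
  req: RestrictedAccessRequest,
  approvals: ReadonlyMap<string, ApprovalRecord>,
  now: number = Date.now(),
): Decision {
  if (!req.approvalId) {
    return { allow: false, reason: 'APPROVAL_REQUIRED' };
  }
  // An id that does not resolve to an issued approval is a denial, not a pass.
  const record = approvals.get(req.approvalId);
  if (!record) {
    return { allow: false, reason: 'APPROVAL_UNKNOWN' };
  }
  // The approval must cover this exact source and this requester.
  if (record.sourceId !== req.sourceId || record.requester !== req.requester) {
    return { allow: false, reason: 'APPROVAL_MISMATCH' };
  }
  // Expired approvals are rejected.
  if (record.expiresAt <= now) {
    return { allow: false, reason: 'APPROVAL_EXPIRED' };
  }
  return { allow: true };
}
```

Distinguishing an unknown id from a missing one keeps audit logs informative, while both still end in a denial.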
```ts
const op = (inv.operation || '').toLowerCase();
for (const pattern of policy.bannedPatterns) {
  if (op.includes(pattern.toLowerCase())) {
    return { allow: false, reason: 'BANNED_OPERATION' };
  }
}
```
The policy evaluator only performs checks for operation and sourceId if those fields are provided in the ToolInvocation. Since these fields are optional, an attacker can bypass banned operation patterns and restricted source checks by simply omitting the corresponding fields from the invocation, undermining the 'deny-by-default' security posture. Furthermore, if banned patterns are normalized to lowercase at load time, the .toLowerCase() call here is redundant and can be removed for a minor performance improvement.
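A minimal sketch of the deny-by-default alternative: a missing `operation` or `sourceId` is itself a denial, and banned patterns are lower-cased once at load time so the loop compares against pre-normalized strings. The `Invocation` and `NormalizedPolicy` shapes, `normalizePolicy`, `evaluateStrict`, and all reason codes other than `BANNED_OPERATION` are hypothetical:

```ts
// Hypothetical, trimmed-down invocation shape (not the PR's ToolInvocation).
interface Invocation {
  operation?: string;
  sourceId?: string;
}

type Decision = { allow: true } | { allow: false; reason: string };

interface NormalizedPolicy {
  bannedPatterns: readonly string[]; // lower-cased once at load time
  knownSources: ReadonlySet<string>;
}

// Normalize once when the bundle is loaded, not on every evaluation.
function normalizePolicy(bannedPatterns: string[], sources: string[]): NormalizedPolicy {
  return {
    bannedPatterns: bannedPatterns.map((p) => p.toLowerCase()),
    knownSources: new Set(sources),
  };
}

function evaluateStrict(inv: Invocation, policy: NormalizedPolicy): Decision {
  // Deny by default: an omitted field is a denial, not a skipped check.
  if (!inv.operation) {
    return { allow: false, reason: 'OPERATION_REQUIRED' };
  }
  if (!inv.sourceId) {
    return { allow: false, reason: 'SOURCE_REQUIRED' };
  }
  const op = inv.operation.toLowerCase();
  for (const pattern of policy.bannedPatterns) {
    // Patterns are already lower-case, so no per-iteration toLowerCase().
    if (op.includes(pattern)) {
      return { allow: false, reason: 'BANNED_OPERATION' };
    }
  }
  if (!policy.knownSources.has(inv.sourceId)) {
    return { allow: false, reason: 'UNKNOWN_SOURCE' };
  }
  return { allow: true };
}
```

Making both fields required in the invocation type as well would let the compiler flag the omission before the evaluator ever runs.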
Suggested change:

```ts
if (op.includes(pattern)) {
```

```ts
export function loadPolicyBundle(
  policyDir = path.resolve(
    process.cwd(),
    '.github',
    'policies',
    'agent-data-access',
  ),
): PolicyBundle {
  const toolSchemaPath = path.join(policyDir, TOOL_SCHEMA);
  const sourceSchemaPath = path.join(policyDir, SOURCE_SCHEMA);
  const bannedSchemaPath = path.join(policyDir, BANNED_SCHEMA);

  const toolRegistry = readJson<ToolRegistry>(
    path.join(policyDir, 'tool_registry.json'),
  );
  const sourceRegistry = readJson<SourceRegistry>(
    path.join(policyDir, 'source_registry.json'),
  );
  const bannedRegistry = readJson<BannedPatternRegistry>(
    path.join(policyDir, 'banned_patterns.json'),
  );
```
The loadPolicyBundle function accepts a policyDir parameter which is used to construct file paths for fs.readFileSync without validation. If this parameter is controlled by an untrusted user, it could lead to arbitrary file read (limited to specific filenames like tool_registry.json).
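If the loader does restrict `policyDir`, note that a plain string-prefix test such as `resolvedPath.startsWith(process.cwd())` also accepts sibling directories that merely share the prefix (for example `/repo-evil` when the working directory is `/repo`). A sketch of a stricter containment check built on `path.relative`, assuming the trusted root is the repository root; `resolveInside` is a hypothetical helper, and it does not follow symlinks (that would additionally require `fs.realpathSync`):

```ts
import * as path from 'node:path';

// Hypothetical helper: returns the resolved directory when it is `root` itself
// or lies inside it, and throws otherwise.
function resolveInside(root: string, candidate: string): string {
  const resolvedRoot = path.resolve(root);
  const resolved = path.resolve(resolvedRoot, candidate);
  const rel = path.relative(resolvedRoot, resolved);
  // A leading `..` segment means the path escapes the root. An absolute `rel`
  // occurs on Windows when the two paths are on different drives.
  if (rel === '..' || rel.startsWith('..' + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`Invalid policy directory: ${candidate}`);
  }
  return resolved;
}
```

Comparing path segments rather than raw string prefixes is what rejects `/repo-evil` while still accepting a child directory whose name happens to begin with `..`.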
Suggested change:

```ts
export function loadPolicyBundle(
  policyDir = path.resolve(
    process.cwd(),
    '.github',
    'policies',
    'agent-data-access',
  ),
): PolicyBundle {
  const resolvedPath = path.resolve(policyDir);
  if (!resolvedPath.startsWith(process.cwd())) {
    throw new Error('Invalid policy directory');
  }
  const toolSchemaPath = path.join(policyDir, TOOL_SCHEMA);
```

```json
      "additionalProperties": { "$ref": "#/definitions/source" }
    }
  },
  "definitions": {
    "source": {
```
The JSON schema is declared as draft/2020-12, but it uses the definitions keyword which is from older drafts. For draft 2020-12, definitions is deprecated in favor of $defs. To align with the specified schema version, you should use $defs and update the $ref accordingly.
Suggested change:

```diff
-      "additionalProperties": { "$ref": "#/definitions/source" }
-    }
-  },
-  "definitions": {
-    "source": {
+      "additionalProperties": { "$ref": "#/$defs/source" }
+    }
+  },
+  "$defs": {
+    "source": {
```
```json
      "additionalProperties": { "$ref": "#/definitions/tool" }
    }
  },
  "definitions": {
    "tool": {
```
The JSON schema is declared as draft/2020-12, but it uses the definitions keyword which is from older drafts. For draft 2020-12, definitions is deprecated in favor of $defs. To align with the specified schema version, you should use $defs and update the $ref accordingly.
Suggested change:

```diff
-      "additionalProperties": { "$ref": "#/definitions/tool" }
-    }
-  },
-  "definitions": {
-    "tool": {
+      "additionalProperties": { "$ref": "#/$defs/tool" }
+    }
+  },
+  "$defs": {
+    "tool": {
```
```ts
const ajv = new Ajv2020({ allErrors: true, strict: false });
addFormats(ajv);
```
Creating a new Ajv2020 instance on every call to validateSchema is inefficient. The AJV instance should be created once at the module level and reused. This avoids the overhead of re-initializing the validator and its formats for each schema validation. After applying the suggested schema fixes for $defs, you might also be able to enable strict mode (strict: true).
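A minimal sketch of the compile-once pattern, written without the Ajv dependency so it runs on its own: the hypothetical `createValidatorCache` memoizes one compiled validator per schema key. In the loader, `compile` would wrap `ajv.compile` on a single module-level `Ajv2020` instance that has had `addFormats` applied once.

```ts
// Minimal structural type for a compiled validator (Ajv's validate functions
// are callable this way and return a boolean).
type Validator = (data: unknown) => boolean;
type Compile = (schema: object) => Validator;

// Hypothetical helper: compiles each schema at most once per key and reuses
// the cached validator on later calls.
function createValidatorCache(compile: Compile) {
  const cache = new Map<string, Validator>();
  return function getValidator(key: string, loadSchema: () => object): Validator {
    let validate = cache.get(key);
    if (!validate) {
      // First request for this schema: load and compile it, then cache it.
      validate = compile(loadSchema());
      cache.set(key, validate);
    }
    return validate;
  };
}
```

Keying by schema path means repeated `loadPolicyBundle` calls skip both the file read and the compilation for schemas already seen.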
```ts
return {
  tools: toolRegistry.tools,
  sources: sourceRegistry.sources,
  bannedPatterns: bannedRegistry.patterns,
```
```ts
);
assert.deepEqual(allowDecision, { allow: true });

console.log('policyEval.check: ok');
```
The test coverage for the policy evaluator is a bit sparse. It would be beneficial to add more test cases to cover other denial reasons handled by evaluateInvocation. For example:
- Invocation with a banned operation pattern (e.g., 'bypass').
- Invocation of a disabled tool or source.
- Invocation with a scope not allowlisted for the tool.
- Invocation of a source missing a `lawful_basis`.
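A table-driven sketch of those cases. Because `evaluateInvocation`'s exact signature and reason codes are not shown in this thread, the fixture shapes, the stand-in `evaluate` function, and all reason codes except `BANNED_OPERATION` are hypothetical; in the real check, each row would run against `evaluateInvocation` with the PR's registries.

```ts
// Hypothetical, simplified fixture shapes; the PR's registries carry more fields.
interface Tool { enabled: boolean; allowedScopes: string[] }
interface Source { enabled: boolean; lawful_basis?: string }
interface Policy {
  tools: Record<string, Tool>;
  sources: Record<string, Source>;
  bannedPatterns: string[]; // already lower-cased
}
interface Invocation { toolId: string; sourceId: string; scope: string; operation: string }
type Decision = { allow: true } | { allow: false; reason: string };

// Stand-in for evaluateInvocation so the sketch runs on its own.
function evaluate(inv: Invocation, policy: Policy): Decision {
  const tool = policy.tools[inv.toolId];
  if (!tool || !tool.enabled) return { allow: false, reason: 'TOOL_DISABLED' };
  if (!tool.allowedScopes.includes(inv.scope)) return { allow: false, reason: 'SCOPE_NOT_ALLOWED' };
  const source = policy.sources[inv.sourceId];
  if (!source || !source.enabled) return { allow: false, reason: 'SOURCE_DISABLED' };
  if (!source.lawful_basis) return { allow: false, reason: 'LAWFUL_BASIS_MISSING' };
  const op = inv.operation.toLowerCase();
  if (policy.bannedPatterns.some((p) => op.includes(p))) {
    return { allow: false, reason: 'BANNED_OPERATION' };
  }
  return { allow: true };
}

const policy: Policy = {
  tools: {
    search: { enabled: true, allowedScopes: ['read'] },
    legacy: { enabled: false, allowedScopes: ['read'] },
  },
  sources: {
    crm: { enabled: true, lawful_basis: 'contract' },
    archive: { enabled: false, lawful_basis: 'contract' },
    scraped: { enabled: true }, // deliberately has no lawful_basis
  },
  bannedPatterns: ['bypass'],
};

// A baseline invocation that should be allowed; each case overrides one field.
const base: Invocation = { toolId: 'search', sourceId: 'crm', scope: 'read', operation: 'lookup' };

// One row per denial path listed above: [name, override, expected reason].
const denialCases: Array<[string, Partial<Invocation>, string]> = [
  ['banned operation pattern', { operation: 'bypass filters' }, 'BANNED_OPERATION'],
  ['disabled tool', { toolId: 'legacy' }, 'TOOL_DISABLED'],
  ['disabled source', { sourceId: 'archive' }, 'SOURCE_DISABLED'],
  ['scope not allowlisted', { scope: 'write' }, 'SCOPE_NOT_ALLOWED'],
  ['source missing lawful_basis', { sourceId: 'scraped' }, 'LAWFUL_BASIS_MISSING'],
];

// Returns a human-readable message for every case whose reason did not match.
function runDenialCases(): string[] {
  const failures: string[] = [];
  for (const [name, override, expected] of denialCases) {
    const decision = evaluate({ ...base, ...override }, policy);
    const reason = decision.allow ? 'ALLOW' : decision.reason;
    if (reason !== expected) failures.push(`${name}: expected ${expected}, got ${reason}`);
  }
  return failures;
}
```

Overriding a single field of a known-good baseline keeps each row focused on exactly one denial path.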
Temporarily closing to reduce Actions queue saturation and unblock #22241. Reopen after the golden-main convergence PR merges.
Motivation
Description
- … `.github/policies/agent-data-access/` and enforce validation in the loader (`.github/policies/agent-data-access/banned_patterns.json`, `.github/policies/agent-data-access/banned_patterns.schema.json`).
- … `src/agents/policy/policyLoad.ts`) to validate the banned-patterns schema and perform registry id/pattern checks.
- … `src/agents/policy/policyTypes.ts` and `src/agents/policy/policyEval.ts` implementing deny-by-default logic and banned-pattern checks.
- … `tests/agents/policy/policyEval.check.mjs` and update evidence, prompt metadata, agent example, and roadmap (`evidence/*`, `prompts/registry.yaml`, `agents/examples/AGENT_DATA_ACCESS_PR1.json`, `docs/roadmap/STATUS.json`).

Testing
- … `node --loader ts-node/esm tests/agents/policy/policyEval.check.mjs`, which executed and returned `policyEval.check: ok` (pass).
- … `evidence/index.json`.