Fix negative matching #1521 #1522

Open
lekanjava wants to merge 2 commits into OWASP:master from lekanjava:fix-negative-matching

Conversation

@lekanjava

This PR significantly improves detection accuracy and reduces false positives for several core modules by implementing robust negative matching and WAF filtering logic for #1521.

Key Changes:

  • Optimized Clickjacking Logic: Refactored clickjacking.yaml to utilize Nettacker's native reverse: true functionality. This removes complex, computationally expensive negative lookahead regex patterns and improves maintainability.
  • WAF False Positive Suppression: Added negative content matching to admin.yaml and dir.yaml. This prevents these modules from incorrectly reporting a "Found" directory when a Web Application Firewall (like Cloudflare, Incapsula, or Sucuri) returns a generic 403 Forbidden block page.
  • Enhanced Takeover Accuracy: Updated subdomain_takeover.yaml with global negative conditions. The module now ignores 403 status codes and specific WAF-related content strings, ensuring that blocked requests do not trigger unintended takeover signatures.
    These updates help ensure that Nettacker provides more reliable results when scanning modern infrastructure protected by security proxies.
    Fixes # (link your issue here)
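To make the clickjacking refactor concrete, here is a minimal Python sketch comparing the old negative-lookahead pattern with a plain pattern whose result is inverted by the caller. The helper names are hypothetical and only model the idea behind `reverse: true`; this is not Nettacker's implementation.

```python
import re

# Old style: the "no protective token" test is encoded inside the regex itself.
NEGATIVE_LOOKAHEAD = re.compile(r"^((?!SAMEORIGIN|DENY).)+$")

# New style: a plain pattern; the caller inverts the outcome.
DIRECT = re.compile(r"SAMEORIGIN|DENY")


def matches_lookahead(value: str) -> bool:
    """True when the value is non-empty and contains neither token."""
    return NEGATIVE_LOOKAHEAD.search(value) is not None


def matches_reversed(value: str) -> bool:
    """Hypothetical model of `reverse: true`: succeed when the pattern is absent."""
    return DIRECT.search(value) is None


samples = ["DENY", "SAMEORIGIN", "ALLOW-FROM https://example.com", "allowall"]
results = {v: (matches_lookahead(v), matches_reversed(v)) for v in samples}

for value, (old, new) in results.items():
    print(f"{value!r}: lookahead={old} reversed={new}")
```

The two forms agree on these sample values. They differ on an empty value: the lookahead pattern needs at least one character, while the inverted plain pattern succeeds on an empty string.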

Type of change

  • New core framework functionality
  • Bugfix (non-breaking change that fixes an issue)
  • Code refactoring without any functionality changes
  • New or existing module/payload change
  • Documentation/localization improvement
  • Test coverage improvement
  • Dependency upgrade
  • Other improvement (best practice, cleanup, optimization, etc)

Checklist

  • I've followed the contributing guidelines
  • I've digitally signed all my commits in this PR
  • I've run make pre-commit and confirm it didn't generate any warnings/changes
  • I've run make test and I confirm all tests passed locally
  • I've added/updated any relevant documentation in the docs/ folder
  • I've linked this PR with an open issue
  • I've tested and verified that my code works as intended and resolves the issue as described
  • I've attached screenshots demonstrating that my code works as intended (if applicable)
  • I've checked all other open PRs to avoid submitting duplicate work
  • I confirm that the code and comments in this PR are not direct unreviewed outputs of AI
  • I confirm that I am the Sole Responsible Author for every line of code, comment, and design decision

@coderabbitai
Contributor

coderabbitai bot commented Apr 19, 2026

Summary by CodeRabbit

  • Improvements
    • Enhanced admin-panel response filtering to exclude common error/access-denied responses.
    • Refined directory scanning to better ignore WAF/edge-provider blocks and error pages.
    • Improved clickjacking checks to more reliably detect protective headers and content.
    • Strengthened subdomain-takeover detection by requiring stricter combined response criteria before further verification.

Walkthrough

Updated YAML rules across scan and vuln modules to refine HTTP response selection: added content-based regex exclusions, changed some header/content checks to inverted-match semantics (reverse: true), and tightened takeover gating by switching the top-level condition_type from or to and.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Scan modules: nettacker/modules/scan/admin.yaml, nettacker/modules/scan/dir.yaml | Added content response condition with case-insensitive regexes covering CDN/WAF/vendor/error phrases and reverse: true to exclude matching response bodies from scan selection. |
| Clickjacking detection: nettacker/modules/vuln/clickjacking.yaml | Reworked header/content checks: moved from negative-lookahead patterns to direct detection patterns and set reverse: true, inverting pass/fail logic for X-Frame-Options, CSP, and related content checks. |
| Subdomain takeover: nettacker/modules/vuln/subdomain_takeover.yaml | Changed top-level response gating from condition_type: or to condition_type: and, requiring both non-403 status and non-matching provider/error content before provider-specific takeover matching. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • arkid15r
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'Fix negative matching #1521' directly addresses the primary objective of the PR, which is to implement robust negative matching logic across multiple modules. |
| Description check | ✅ Passed | The description is well-related to the changeset, clearly explaining the negative matching improvements, WAF false positive suppression, and specific module updates with concrete examples. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nettacker/modules/scan/admin.yaml`:
- Around line 47-49: The current admin finder condition (content.regex and
reverse: true used under condition_type: and) is too broad and will drop
legitimate admin pages; update content.regex to remove generic phrases like
"Access Denied" and "Error 403 Forbidden" and replace "Webroot" with specific
WAF/CDN signatures (e.g., Akamai, AWS WAF, F5 BIG-IP, Imperva) and anchored
patterns that indicate block pages (examples: "Attention Required! \\|
Cloudflare", "Incapsula incident ID", "Sucuri WebSite Firewall", or "Access
Denied.*reference\\s*#"); keep reverse: true but ensure the regex matches only
clear block-page markers rather than common auth/403 text so protected admin
endpoints aren’t filtered out.

In `@nettacker/modules/vuln/clickjacking.yaml`:
- Around line 37-46: The clickjacking rule should treat absent
X-Frame-Options/Content-Security-Policy headers as a positive finding and the
CSP content check must look for frame-ancestors, so update the rule and
referenced logic: in nettacker/modules/vuln/clickjacking.yaml change the
header/grouping so missing headers don't short-circuit an AND — e.g., make the
header checks an OR group (so absence of either header counts as vulnerable) or
mark each header sub-condition to treat missing key as a match; also replace the
content regex that currently searches for DENY|SAMEORIGIN inside a CSP meta tag
with a pattern that matches real CSP frame-ancestors directives (e.g., look for
"frame-ancestors" followed by 'none', 'self', or host sources), and ensure the
rule's reverse semantics still invert correctly; refer to the
response_conditions_matched function in nettacker/core/lib/http.py when
adjusting how absent headers are evaluated so the YAML behavior aligns with that
function's expectations.

In `@nettacker/modules/vuln/subdomain_takeover.yaml`:
- Around line 36-43: The top-level condition block using condition_type: and is
blocking 403-based takeover detections (e.g., the AWS Bucket Takeover) because
status_code reverse: "403" and broad content regex suppress
iterative_response_match; fix by either (a) removing 403 from the top-level
status_code exclusion and tightening the content regex to only vendor WAF
signatures (e.g., keep Cloudflare|Incapsula|Sucuri and drop generic "Access
Denied"/"Error 403 Forbidden"), or (b) move the WAF/content exclusions into each
provider's iterative_response_match entries so they don't globally block 403
detections; update the condition_type/status_code/content settings accordingly
to ensure the AWS Bucket Takeover sub-rule and iterative_response_match can run
for 403 responses.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 80722f88-6614-4568-915c-4781da0da234

📥 Commits

Reviewing files that changed from the base of the PR and between 50c0be2 and 32e3640.

📒 Files selected for processing (4)
  • nettacker/modules/scan/admin.yaml
  • nettacker/modules/scan/dir.yaml
  • nettacker/modules/vuln/clickjacking.yaml
  • nettacker/modules/vuln/subdomain_takeover.yaml

Comment on lines +47 to +49
            content:
              regex: (?i)(Cloudflare|Incapsula|Sucuri|Access Denied|Webroot|Error 403 Forbidden)
              reverse: true

⚠️ Potential issue | 🟡 Minor

Overly generic WAF regex risks filtering out legitimate admin endpoints.

With condition_type: and, this condition (with reverse: true) causes the admin finder to silently drop any response whose body matches these strings. Two concerns worth reconsidering:

  • Access Denied and Error 403 Forbidden are generic phrases that appear on many real admin/authentication pages (e.g., Tomcat Manager, Jenkins, custom admin panels after failed auth). Since the scan also treats 403 as a hit, many legitimate protected admin endpoints will now be silently hidden.
  • Webroot is not a common WAF/CDN block-page marker (Webroot is primarily an endpoint AV vendor). Please confirm this was intentional, or replace with more typical WAF signatures (e.g., Akamai, AWS WAF, F5 BIG-IP, ModSecurity, Imperva).

Consider anchoring patterns to phrases that strongly indicate a block page rather than substrings that appear in legitimate responses (e.g., Attention Required! \| Cloudflare, Incapsula incident ID, Sucuri WebSite Firewall, Access Denied.*reference\s*#).
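The concern above can be checked quickly with a small Python sketch. The pattern is the one quoted from admin.yaml; the response bodies are made up for illustration, and `is_suppressed` is a hypothetical helper that models "a regex hit under reverse: true drops the response", not Nettacker's own code.

```python
import re

# Pattern quoted in this review comment for admin.yaml.
BROAD = re.compile(
    r"(?i)(Cloudflare|Incapsula|Sucuri|Access Denied|Webroot|Error 403 Forbidden)"
)

# Illustrative response bodies (invented for this sketch).
BODIES = {
    "protected_admin_403": "<h1>403 Access Denied</h1><p>You are not authorized.</p>",
    "admin_login_cdn_footer": "<form action='/admin/login'></form><footer>Performance by Cloudflare</footer>",
    "cloudflare_block_page": "<title>Attention Required! | Cloudflare</title>",
    "plain_admin_login": "<form action='/admin/login'><input name='user'></form>",
}


def is_suppressed(body: str) -> bool:
    """Model: with reverse: true inside an AND gate, any regex hit drops the response."""
    return BROAD.search(body) is not None


suppressed = {name: is_suppressed(body) for name, body in BODIES.items()}

for name, dropped in suppressed.items():
    print(f"{name}: {'dropped' if dropped else 'kept'}")
```

In this sketch the real block page is dropped as intended, but so are the protected admin page and the login page that merely mentions Cloudflare in its footer.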

Comment on lines 37 to +46
           headers:
             x-frame-options:
-              regex: ^((?!SAMEORIGIN|DENY).)+$
-              reverse: false
+              regex: SAMEORIGIN|DENY
+              reverse: true
             Content-Security-Policy:
-              regex: ^((?!frame-ancestors|frame-src).)+$
-              reverse: false
+              regex: frame-ancestors|frame-src
+              reverse: true
           content:
-            regex: ^((?!http-equiv=.*Content-Security-Policy.*content=.*(DENY|SAMEORIGIN)).)+$
-            reverse: false
+            regex: http-equiv=.*Content-Security-Policy.*content=.*(DENY|SAMEORIGIN)
+            reverse: true

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm the except-TypeError path in response_conditions_matched and how missing headers are treated.
rg -nP -C8 'def response_conditions_matched' nettacker/core/lib/http.py

Repository: OWASP/Nettacker

Length of output: 803


🏁 Script executed:

sed -n '45,100p' nettacker/core/lib/http.py

Repository: OWASP/Nettacker

Length of output: 2710


🏁 Script executed:

sed -n '45,130p' nettacker/core/lib/http.py

Repository: OWASP/Nettacker

Length of output: 3975


🏁 Script executed:

sed -n '37,46p' nettacker/modules/vuln/clickjacking.yaml

Repository: OWASP/Nettacker

Length of output: 433


Two issues with the inverted clickjacking matching.

  1. Missing headers fail the AND, so the most common vulnerable case is not detected. In nettacker/core/lib/http.py (response_conditions_matched, lines 95–96), when a header key is absent the lookup falls back to False, re.findall raises TypeError, and the except branch (lines 71–72) sets condition_results["headers"][header] = []. Under condition_type: and, if any [] appears in the header results, the entire function returns {}. A site that ships neither X-Frame-Options nor Content-Security-Policy, which is the canonical clickjacking-vulnerable configuration, will therefore not be reported. The rule only fires when both headers are present but misconfigured, which is a very narrow subset. Restructure the condition (e.g., using or at this level, or per-header OR groups) so that "header absent" contributes to a positive finding.

  2. The content regex looks for X-Frame-Options tokens inside a CSP meta tag, where they never occur. DENY and SAMEORIGIN are X-Frame-Options values; CSP uses frame-ancestors 'none' | 'self' | …. X-Frame-Options cannot validly be declared in a meta tag either. The regex http-equiv=.*Content-Security-Policy.*content=.*(DENY|SAMEORIGIN) will therefore never match real markup. Combined with reverse: true, this sub-condition is always true, which makes it a no-op rather than a meaningful check. Update it to search for actual CSP frame-ancestors syntax instead.
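The first issue can be modelled with a short Python sketch. It is a simplified model built only from the behaviour described in this comment (an absent header yields an empty result, and any empty result fails the AND); both function names are hypothetical and neither is Nettacker's actual code. The patterns are the ones from the changed YAML.

```python
import re

# Patterns from the updated clickjacking.yaml header checks.
PROTECTIVE = {
    "x-frame-options": re.compile(r"SAMEORIGIN|DENY"),
    "content-security-policy": re.compile(r"frame-ancestors|frame-src"),
}


def and_gate_as_described(headers: dict) -> bool:
    """Model of the behaviour described above: report a finding only if every
    header is present and none of them carries a protective token."""
    lowered = {k.lower(): v for k, v in headers.items()}
    for name, pattern in PROTECTIVE.items():
        if name not in lowered:
            return False  # absent header -> empty result -> AND fails
        if pattern.search(lowered[name]):
            return False  # protective token found; reverse: true fails
    return True


def absent_counts_as_unprotected(headers: dict) -> bool:
    """Alternative sketch: report when no header carries framing protection.
    A missing header is treated like a header without the token."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return not any(
        pattern.search(lowered.get(name, "")) for name, pattern in PROTECTIVE.items()
    )


no_headers = {}
weak_headers = {"X-Frame-Options": "ALLOWALL", "Content-Security-Policy": "default-src 'self'"}
protected = {"Content-Security-Policy": "frame-ancestors 'none'"}

for label, hdrs in [("none", no_headers), ("weak", weak_headers), ("protected", protected)]:
    print(label, and_gate_as_described(hdrs), absent_counts_as_unprotected(hdrs))
```

With no headers at all, the modelled AND gate reports nothing while the alternative reports a finding; both agree on the weak and protected cases.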

Comment on lines +36 to +43
          condition_type: and
          conditions:
            status_code:
              regex: "403"
              reverse: true
            content:
              regex: (?i)(Cloudflare|Incapsula|Sucuri|Access Denied|Webroot|Error 403 Forbidden)
              reverse: true

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm iterative_response_match handling path and any pre-filtering contract.
rg -nP -C5 'iterative_response_match'

Repository: OWASP/Nettacker

Length of output: 10968


🏁 Script executed:

#!/bin/bash
# Read the subdomain_takeover.yaml file around the relevant section
head -50 nettacker/modules/vuln/subdomain_takeover.yaml

Repository: OWASP/Nettacker

Length of output: 1969


🏁 Script executed:

#!/bin/bash
# Search for AWS bucket takeover to understand if 403 is a legitimate concern
rg -A 20 -B 5 'AWS Bucket Takeover|AWS S3' nettacker/modules/vuln/subdomain_takeover.yaml

Repository: OWASP/Nettacker

Length of output: 1182


🏁 Script executed:

#!/bin/bash
# Check the response_conditions_matched function to understand what fields it evaluates
rg -A 30 'def response_conditions_matched' nettacker/core/lib/http.py

Repository: OWASP/Nettacker

Length of output: 1613


The top-level condition_type: and gate blocks AWS S3 and other 403-returning takeover detections.

With condition_type: and, these filters become a hard pre-gate: iterative_response_match runs only when status ≠ 403 and content lacks the listed terms. This causes two regressions:

  1. AWS S3 (and other cloud services) legitimately return 403 with "The specified bucket does not exist". The AWS Bucket Takeover sub-rule can never fire under this logic—a direct behavioral regression.
  2. Generic error phrases in the content regex (Access Denied, Error 403 Forbidden) will suppress valid takeovers that happen to include these strings.

The iterative_response_match section does have a separate evaluation path (confirmed in nettacker/core/lib/http.py lines 205–211), but it is gated by lines 198–201: only runs if the top-level condition passes or condition_type == "or". With condition_type: and, a failed pre-filter blocks it entirely.

Recommended fixes:

  • (a) Remove 403 from the status-code exclusion; tighten the content regex to concrete WAF signatures only (Cloudflare|Incapsula|Sucuri without the generic Access Denied / Error 403 Forbidden), or
  • (b) Move these WAF exclusions inside each provider's iterative_response_match entry so they don't blanket-suppress 403-based detections.
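The difference between the global gate and option (b) can be sketched in Python. This is a simplified model, not Nettacker's evaluation code: the function names are hypothetical, the provider signature is the phrase quoted in this comment, and the sample responses (including the 403 status for the bucket response) are illustrative and follow the description above.

```python
import re

# Content pattern from the top-level gate in subdomain_takeover.yaml.
GLOBAL_WAF = re.compile(
    r"(?i)(Cloudflare|Incapsula|Sucuri|Access Denied|Webroot|Error 403 Forbidden)"
)
# Vendor-only pattern, as suggested in option (a)/(b).
VENDOR_ONLY = re.compile(r"(?i)(Cloudflare|Incapsula|Sucuri)")
# Hypothetical provider signature, taken from the phrase quoted in this review.
S3_SIGNATURE = re.compile(r"The specified bucket does not exist")


def detect_with_global_gate(status: int, body: str) -> bool:
    """Top-level AND gate: provider matching runs only if status != 403
    and the body does not match the broad content pattern."""
    gate_passes = status != 403 and GLOBAL_WAF.search(body) is None
    return gate_passes and S3_SIGNATURE.search(body) is not None


def detect_with_per_provider_exclusion(status: int, body: str) -> bool:
    """Option (b) sketch: no status gate; the vendor exclusion is applied
    alongside the provider signature instead of before it."""
    return (
        S3_SIGNATURE.search(body) is not None
        and VENDOR_ONLY.search(body) is None
    )


bucket_response = (403, "<Code>NoSuchBucket</Code><Message>The specified bucket does not exist</Message>")
waf_block = (403, "<title>Attention Required! | Cloudflare</title>")

print(detect_with_global_gate(*bucket_response), detect_with_per_provider_exclusion(*bucket_response))
print(detect_with_global_gate(*waf_block), detect_with_per_provider_exclusion(*waf_block))
```

In this model the global gate suppresses the bucket response purely because of its status code, while the per-provider variant still reports it and still ignores the WAF block page.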

@securestep9
Collaborator

@coderabbitai full review

@coderabbitai
Contributor

coderabbitai bot commented Apr 19, 2026

✅ Actions performed

Full review triggered.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nettacker/modules/scan/dir.yaml`:
- Around line 47-49: The negative content regex under the content block (keys:
regex, reverse) is overly broad and will drop legitimate hits; tighten it by
removing generic terms like "Access Denied", "Error 403 Forbidden", "Webroot"
and vendor names that can appear in benign pages, and replace them with
anchored, block-page specific patterns (e.g., phrases like "Attention Required!
| Cloudflare", "Incapsula incident", "Sucuri WebSite Firewall", or "Access
Denied.*reference\\s*#"); keep reverse: true behavior but update the regex to
only match strong WAF/blockpage signatures so that 200/401/403 directory hits
(condition_type: and) are not suppressed erroneously.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 75e76299-5d46-4a54-bbd3-360ce925747b

📥 Commits

Reviewing files that changed from the base of the PR and between 50c0be2 and 32e3640.

📒 Files selected for processing (4)
  • nettacker/modules/scan/admin.yaml
  • nettacker/modules/scan/dir.yaml
  • nettacker/modules/vuln/clickjacking.yaml
  • nettacker/modules/vuln/subdomain_takeover.yaml

Comment on lines +47 to +49
            content:
              regex: (?i)(Cloudflare|Incapsula|Sucuri|Access Denied|Webroot|Error 403 Forbidden)
              reverse: true

⚠️ Potential issue | 🟠 Major

Overly broad negative content regex may hide legitimate directory hits.

With condition_type: and and reverse: true, any 200/401/403 response whose body merely contains strings like Access Denied, Error 403 Forbidden, or Webroot will be silently dropped. Several concerns:

  • Access Denied / Error 403 Forbidden are extremely common on legitimate protected directories (Apache/Nginx default 403 pages, IIS, Tomcat, Jenkins, many admin panels). Since the scan is explicitly configured to treat 403 as a hit (line 45), this filter will suppress the very signal the scan is looking for.
  • Webroot is an endpoint AV vendor, not a common CDN/WAF block-page marker; the word also appears in plenty of legitimate web-server documentation/error pages. Likely unintended — consider dropping it or replacing with real WAF signatures (Akamai, AWS WAF, F5 BIG-IP, ModSecurity, Imperva).
  • The vendor names Cloudflare|Incapsula|Sucuri match any page that merely references them (footers, docs, analytics), not just block pages.

Prefer anchoring to strong block-page phrases, e.g. Attention Required! \| Cloudflare, Incapsula incident ID, Sucuri WebSite Firewall, Access Denied.*reference\s*#.

🔧 Suggested tightening
             content:
-              regex: (?i)(Cloudflare|Incapsula|Sucuri|Access Denied|Webroot|Error 403 Forbidden)
+              regex: (?i)(Attention Required!\s*\|\s*Cloudflare|Incapsula incident ID|Sucuri WebSite Firewall|Access Denied.*reference\s*#)
               reverse: true
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-            content:
-              regex: (?i)(Cloudflare|Incapsula|Sucuri|Access Denied|Webroot|Error 403 Forbidden)
-              reverse: true
+            content:
+              regex: (?i)(Attention Required!\s*\|\s*Cloudflare|Incapsula incident ID|Sucuri WebSite Firewall|Access Denied.*reference\s*#)
+              reverse: true
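As a sanity check on the tightened pattern, the following Python sketch runs it against a few response bodies. The bodies are invented for illustration, and the check assumes the module applies the pattern the way Python's `re` does, without extra flags.

```python
import re

# Tightened pattern from the suggested change.
TIGHT = re.compile(
    r"(?i)(Attention Required!\s*\|\s*Cloudflare|Incapsula incident ID"
    r"|Sucuri WebSite Firewall|Access Denied.*reference\s*#)"
)

# Illustrative bodies (invented for this sketch).
BLOCK_PAGES = [
    "<title>Attention Required! | Cloudflare</title>",
    "Request unsuccessful. Incapsula incident ID: 123-456",
    "<h1>Sucuri WebSite Firewall - Access Denied</h1>",
    "Access Denied. Reference #18.2f3",
]
BENIGN_PAGES = [
    "<h1>403 Forbidden</h1><p>Access Denied</p>",
    "<footer>Performance by Cloudflare</footer>",
]
# Same wording as a block page, but split across two lines.
MULTILINE_BLOCK = "Access Denied\nReference #18.2f3"

block_hits = [TIGHT.search(body) is not None for body in BLOCK_PAGES]
benign_hits = [TIGHT.search(body) is not None for body in BENIGN_PAGES]
multiline_hit = TIGHT.search(MULTILINE_BLOCK) is not None

print(block_hits, benign_hits, multiline_hit)
```

One caveat the sketch surfaces: `.` does not match a newline by default, so the `Access Denied.*reference\s*#` alternative misses block pages where the two phrases sit on different lines. Adding the `s` flag, as in `(?is)`, would be one way to cover that case if the engine supports it.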
