Vulnerability fix changes #121
Conversation
📝 Walkthrough
The pull request updates Docker base images to use ECR public registries, adds S3 Hadoop filesystem plugin configuration to Flink containers, removes the entire Hudi connector module, and performs comprehensive dependency management updates across all pipeline and framework POMs to establish explicit version pinning and transitive exclusion strategies.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
pipeline/extractor/pom.xml (1)
28-323: 🛠️ Refactor suggestion | 🟠 Major
This exclusion/pin block is duplicated verbatim across 5 pipeline child modules; consolidate into `pipeline/pom.xml`.
The same ~200-line pattern of Flink/Kafka/embedded-test exclusions and pinned replacements (`kryo:4.0.3`, `jose4j:0.9.6`, `netty-*:4.1.130.Final`, `commons-beanutils:1.11.0`, `assertj-core:3.27.7`, `zookeeper:3.9.3`, `commons-compress:1.26.0`, `commons-lang3:3.18.0`, `embedded-kafka:3.9.1`, `snappy-java` replacement, etc.) is copy-pasted in `pipeline/extractor/pom.xml`, `pipeline/transformer/pom.xml`, `pipeline/denormalizer/pom.xml`, `pipeline/preprocessor/pom.xml`, and `pipeline/dataset-router/pom.xml`. This is a maintenance hazard: any future CVE bump now requires 5 simultaneous edits or the modules will drift. Please move this into `pipeline/pom.xml` (under `<dependencyManagement>` for versions, and as shared `<dependencies>` where the scope/exclusions are identical), so child modules only need to declare the handful of artifacts unique to them.
A few additional issues visible within this block (they apply to all 5 duplicated POMs, so I’m flagging them once here):
- Dead code, commented-out kryo block (lines 188–193): remove it; the replacement direct dep is already declared at lines 45–49.
- `commons-lang3:3.18.0` at lines 225–229 has no `<scope>`: it will be bundled into the shaded production jar, even though the transitive it replaces was excluded only from the test-classified `flink-runtime`. If that was intentional (because some compile-time code also needs commons-lang3 3.18), fine; if not, add `<scope>test</scope>`.
- Stray whitespace/indent artifacts (e.g., lines 92, 219, 265: empty line inside `<dependency>`): the XML is valid but hard to read; consider a POM formatter pass.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pipeline/extractor/pom.xml` around lines 28 - 323, The duplicated Flink/Kafka/exclusions and pinned test deps (e.g., artifactIds kryo, jose4j, netty-transport-native-epoll, netty-handler, commons-beanutils, assertj-core, zookeeper, commons-compress, commons-lang3, embedded-kafka, snappy-java) must be consolidated: move the version pins into pipeline/pom.xml's <dependencyManagement> and the shared exclusioned/test-scoped dependencies into pipeline/pom.xml as shared <dependencies>, then remove the repeated blocks from the child module POMs (extractor/transformer/denormalizer/preprocessor/dataset-router) so they only declare their module-unique artifacts; also remove the commented-out kryo block (dead code), decide and set an explicit <scope> (likely test) for commons-lang3 (artifactId commons-lang3) if it should not be packaged, and run a POM formatter to remove stray whitespace/indent artifacts.
FlinkDockerfile (1)
3-11: ⚠️ Potential issue | 🟡 Minor
Stale Hudi references after connector module removal.
The PR removes the entire Hudi connector module, but this Dockerfile still downloads `hudi-flink1.17-bundle-0.15.0.jar` (line 6) and has commented-out `cp`/`COPY` lines referencing `hudi-connector-1.0.0.jar` (lines 9–11). The download adds build time and a useless artifact to the image. Also note the bundle is for Flink 1.17, not the project's Flink 1.20, which is another signal that this is dead infrastructure.
♻️ Suggested cleanup
     RUN wget https://repo1.maven.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.8.3-10.0/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar
     RUN wget https://repo1.maven.org/maven2/org/apache/flink/flink-s3-fs-hadoop/1.20.0/flink-s3-fs-hadoop-1.20.0.jar
    -RUN wget https://repo.maven.apache.org/maven2/org/apache/hudi/hudi-flink1.17-bundle/0.15.0/hudi-flink1.17-bundle-0.15.0.jar
     RUN cp flink-shaded-hadoop-2-uber-2.8.3-10.0.jar $FLINK_HOME/lib
     RUN cp flink-s3-fs-hadoop-1.20.0.jar $FLINK_HOME/lib
    -#RUN cp hudi-flink1.17-bundle-0.15.0.jar $FLINK_HOME/lib
    -## COPY ./target/hudi-connector-1.0.0.jar $FLINK_HOME/custom-lib
    -#COPY ./target/hudi-connector-1.0.0.jar $FLINK_HOME/lib
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@FlinkDockerfile` around lines 3 - 11, Remove stale Hudi artifacts and references from the Dockerfile: delete the RUN that downloads hudi-flink1.17-bundle-0.15.0.jar and remove or permanently delete the commented cp/COPY lines referencing hudi-connector-1.0.0.jar (and any other hudi-* filenames) so the image build no longer fetches or mentions the obsolete Hudi bundle or connector; keep the remaining Flink library wget/cp steps unchanged.
🧹 Nitpick comments (9)
transformation-sdk/pom.xml (1)
55-80: Drop the explicit `<version>` on `woodstox-core` (and `commons-lang3` once centralized) to let the parent `dependencyManagement` drive the version.
Since the root `pom.xml` now pins `com.fasterxml.woodstox:woodstox-core:6.7.0` in `<dependencyManagement>`, the explicit `<version>6.7.0</version>` here (line 59) becomes a duplicate source of truth and defeats the purpose of centralization. The same applies to `commons-lang3:3.18.0` once it is added to the parent pin list (see root POM comment).
♻️ Proposed diff
     <dependency>
         <groupId>com.fasterxml.woodstox</groupId>
         <artifactId>woodstox-core</artifactId>
    -    <version>6.7.0</version>
     </dependency>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@transformation-sdk/pom.xml` around lines 55 - 80, Remove the hard-coded <version> elements for com.fasterxml.woodstox:woodstox-core and org.apache.commons:commons-lang3 in this module's POM so the parent project's dependencyManagement controls their versions; specifically, delete the <version>6.7.0</version> node inside the woodstox-core dependency and remove the <version>3.18.0</version> node for commons-lang3, leaving the dependencies declared only by groupId/artifactId (and any exclusions) so the centralized versions are honored.
pom.xml (1)
89-99: Confirm transitive coverage note for ZooKeeper 3.9.3 is still accurate.
The comment claims 3.9.3 also fixes transitive Netty 4.1.105, Logback 1.2.13, and Commons IO 2.11.0. ZooKeeper 3.9.3 ships with Netty 4.1.113 / Logback 1.2.13 / Commons IO 2.11.0 per the release notes, so Logback/Commons IO are not actually fixed by the ZK upgrade alone, which is why you are (correctly) pinning `logback-*:1.5.18` and `commons-io:2.15.1` separately below. Consider tightening the comment so future readers don’t think the ZK bump alone covers those CVEs.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pom.xml` around lines 89 - 99, Update the inline comment above the zookeeper/zookeeper-jute dependency entries to correctly state that while ZooKeeper 3.9.3 addresses the ZK CVE and updates its bundled Netty to 4.1.113, it does not remediate the Logback or Commons IO CVEs; mention that Logback and Commons IO are pinned separately (see the explicit logback-*:1.5.18 and commons-io:2.15.1 entries) so readers don’t assume the ZK bump alone covers those transitive CVEs.
pipeline/dataset-router/pom.xml (1)
29-360: Same duplicated exclusion/pin pattern as the other pipeline child POMs, plus a redundant `mailapi` version.
See the consolidated review on `pipeline/extractor/pom.xml` / `pipeline/pom.xml` for the shared concerns (copy-paste duplication, dead-code commented kryo block at 227–232, `commons-lang3:3.18.0` at 264–268 missing `<scope>`).
Additional issue specific to this file: since the root POM now manages `com.sun.mail:mailapi:1.6.8` via `<dependencyManagement>`, the explicit `<version>1.6.8</version>` on line 93 becomes a duplicate source of truth. Drop the `<version>` and rely on the parent pin.
♻️ Proposed diff
     <!-- Fix BDSA-2025-20661 (MEDIUM): safe mailapi replacing json-schema-validator transitive -->
     <dependency>
         <groupId>com.sun.mail</groupId>
         <artifactId>mailapi</artifactId>
    -    <version>1.6.8</version>
     </dependency>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pipeline/dataset-router/pom.xml` around lines 29 - 360, POM contains duplicated exclusion/pin patterns and redundant entries: remove the explicit <version> for the com.sun.mail:mailapi dependency so it inherits the parent dependencyManagement version (locate the dependency with artifactId "mailapi"), delete the dead commented kryo block (the commented com.esotericsoftware kryo block around the earlier section), consolidate repeated exclusion lists shared across Flink/Kafka/test dependencies by removing duplicated exclusions where parent or sibling POMs already manage them, and add a proper <scope> to the commons-lang3 dependency (artifactId "commons-lang3") instead of leaving it unscoped; update only the <dependencies> entries referenced (mailapi, commented kryo, commons-lang3, and the repeated exclusion blocks) to avoid duplicating version pins and commented dead code.
pipeline/cache-indexer/pom.xml (2)
176-187: Remove the commented-out `kryo` block.
Dead code in a POM file is noise. The active `kryo` declaration already exists at lines 40–44.
♻️ Suggested fix
    -        <!-- <dependency>
    -            <groupId>com.esotericsoftware</groupId>
    -            <artifactId>kryo</artifactId>
    -            <version>4.0.3</version>
    -            <scope>test</scope>
    -        </dependency> -->
             <dependency>
                 <groupId>org.assertj</groupId>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pipeline/cache-indexer/pom.xml` around lines 176 - 187, Remove the commented-out kryo dependency block from the pom so the file has no dead/commented dependency entries; specifically delete the commented section that references groupId "com.esotericsoftware", artifactId "kryo", and version "4.0.3" (the commented block shown in the diff) since an active kryo dependency already exists elsewhere.
279-291: Odd formatting inside the `zookeeper` test dependency.
Line 284 contains only whitespace between `<scope>test</scope>` and `<exclusions>`. The XML parses fine, but clean this up for readability.
♻️ Suggested fix
     <dependency>
         <groupId>org.apache.zookeeper</groupId>
         <artifactId>zookeeper</artifactId>
         <version>3.9.3</version>
         <scope>test</scope>
    -
         <exclusions>
             <exclusion>
                 <groupId>org.xerial.snappy</groupId>
                 <artifactId>snappy-java</artifactId>
             </exclusion>
         </exclusions>
     </dependency>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pipeline/cache-indexer/pom.xml` around lines 279 - 291, The zookeeper test dependency block has an unnecessary blank line between the <scope>test</scope> element and the <exclusions> element; update the <dependency> for org.apache.zookeeper: zookeeper (version 3.9.3) by removing the extra whitespace so <scope>test</scope> and <exclusions> appear consecutively for consistent formatting and readability within the dependency block.
pipeline/unified-pipeline/pom.xml (2)
199-210: Remove the commented-out `kryo` test dependency.
Dead markup; same pattern as other POM files in this PR.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pipeline/unified-pipeline/pom.xml` around lines 199 - 210, Remove the dead commented-out Kryo test dependency block from the pom.xml by deleting the commented <dependency> element that references groupId "com.esotericsoftware" and artifactId "kryo" (version 4.0.3, scope test) so only the active assertj-core dependency remains; ensure no other commented dependency remnants remain in the same dependency section.
302-314: Whitespace-only line inside `zookeeper` dependency.
Line 307 is bare whitespace between `<scope>test</scope>` and `<exclusions>`. Cosmetic; please tidy up.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pipeline/unified-pipeline/pom.xml` around lines 302 - 314, The zookeeper dependency block contains an extra blank/whitespace-only line between the <scope>test</scope> element and the <exclusions> element; remove that empty line inside the <dependency> for org.apache.zookeeper:artifactId zookeeper to keep the XML tidy and ensure elements are adjacent (update the dependency block around the zookeeper declaration accordingly).
Dockerfile (1)
14-24: Harden the S3 plugin install and config append.
Two small robustness concerns:
- `mv $FLINK_HOME/opt/flink-s3-fs-hadoop-*.jar ...` (line 16 / 31): with `set -e` implicitly disabled in some base images, a missing jar silently leaves the plugin dir empty; with globbing, no match passes the literal pattern to `mv`, which fails with a confusing error. Fail fast explicitly:
      RUN set -eux; \
          mkdir -p "$FLINK_HOME/usrlib" "$FLINK_HOME/plugins/flink-s3-fs-hadoop"; \
          ls "$FLINK_HOME"/opt/flink-s3-fs-hadoop-*.jar >/dev/null; \
          mv "$FLINK_HOME"/opt/flink-s3-fs-hadoop-*.jar "$FLINK_HOME/plugins/flink-s3-fs-hadoop/"
- The append step (lines 20–24) is idempotent only on a freshly built layer; that is fine for Docker builds, but worth a comment that re-running won't deduplicate.
This block is duplicated in `cache-indexer-image` (lines 29–36); consider extracting to a reusable ONBUILD or shared stage.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Dockerfile` around lines 14 - 24, The Dockerfile's plugin move and config append are brittle: ensure the RUN that creates $FLINK_HOME/usrlib and $FLINK_HOME/plugins/flink-s3-fs-hadoop sets strict shell options (e.g., set -eux) and explicitly verify the jar glob before moving (use an ls or test on $FLINK_HOME/opt/flink-s3-fs-hadoop-*.jar and fail fast) so the mv command in that RUN won't silently noop or produce confusing errors; also add a brief comment near the config append that the echo into $FLINK_HOME/conf/config.yaml or $FLINK_HOME/conf/flink-conf.yaml is not deduplicated on re-runs (idempotency caveat), and consider extracting the duplicated block to a shared ONBUILD or build stage to avoid copy/paste between the two locations mentioned.
framework/pom.xml (1)
377-388: Drop the commented-out `kryo` block.
Same dead declaration as in other modules. The active pin is at lines 57–61.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@framework/pom.xml` around lines 377 - 388, Remove the dead commented-out dependency block for com.esotericsoftware:kryo (the <!-- <dependency> ... kryo ... </dependency> --> chunk) from the pom so the file no longer contains the stale commented declaration; leave the active assertj test dependency as-is and avoid reintroducing the commented kryo block since kryo is already pinned elsewhere in the module.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@dataset-registry/pom.xml`:
- Around line 129-148: The exclusions removed lz4-java and snappy-java from the
test classpath but only re-adds kryo, causing runtime ClassNotFoundExceptions in
flink-runtime tests; update this module's pom to explicitly add test-scope
dependencies for at.yawk.lz4:lz4-java:1.10.3 and
org.xerial.snappy:snappy-java:1.1.10.5 (the same versions pinned in
framework/pom.xml) or otherwise ensure those compile-scope pins are transitively
visible to tests, keeping the existing kryo test dependency and leaving the
flink-runtime exclusion block intact.
In `@Dockerfile`:
- Line 10: The Dockerfile uses the upstream Flink base image; change the FROM
references for the unified-image stage (symbol: unified-image) and the
cache-indexer-image stage (symbol: cache-indexer-image) from
public.ecr.aws/docker/library/flink:1.20-scala_2.12-java11 to the approved
sanketikahub/flink:1.20-scala_2.12-java11; if you intend to change the project
guideline instead, update CLAUDE.md in the same PR to document the new approved
base image.
- Around line 10-37: The Dockerfile removed several build stages
(extractor-image, preprocessor-image, denormalizer-image, transformer-image,
dataset-router-image) while keeping unified-image and cache-indexer-image, which
breaks the CI matrix in build_and_deploy.yaml; fix by either re-adding the
missing stages as aliases that point to the consolidated artifact (e.g., create
additional FROM ... AS extractor-image / AS preprocessor-image entries that
mirror unified-image behavior or add simple stage aliases referencing the same
jar) or update the GitHub Actions matrix to remove those stage names, ensuring
the matrix entries match existing Dockerfile targets (refer to the Dockerfile
stage identifiers unified-image and cache-indexer-image and the COPY targets
like unified-pipeline-1.0.0.jar and cache-indexer-1.0.0.jar).
In `@FlinkDockerfile`:
- Line 1: The Dockerfile currently uses the public ECR base image string
"public.ecr.aws/docker/library/flink:1.20-scala_2.12-java11" which violates the
project's guideline requiring the hardened base
"sanketikahub/flink:1.20-scala_2.12-java11"; update the FROM line to use the
sanctioned base image (replace the image string), or if the ECR mirror is
intentional, add/update rationale and change the guideline in CLAUDE.md within
this PR to permit the new base.
In `@framework/pom.xml`:
- Around line 148-162: Update the httpclient dependency version in framework's
POM: locate the <dependency> element with
<groupId>org.apache.httpcomponents</groupId> and
<artifactId>httpclient</artifactId> (in framework/pom.xml) and change its
<version> from 4.5.13 to 4.5.14 so it matches dataset-registry's CVE-fixed
version; keep the existing <exclusions> (commons-codec) and the separate
commons-codec dependency unchanged.
- Around line 250-262: Remove the stray blank line inside the zookeeper
dependency block so <scope>test</scope> sits directly before <exclusions>, and
then verify the runtime visibility of zookeeper: inspect the zookeeper
dependency declaration (artifactId zookeeper, version 3.9.3) and the
compile-scope exclusions on transitive consumers like flink-connector-kafka and
kafka clients; either change the zookeeper dependency scope to a runtime/compile
scope (or add an explicit runtime-provided dependency) if production code needs
zookeeper at runtime, or document/confirm that zookeeper is only needed for
tests and keep scope as test while ensuring compile-scope exclusions do not
remove required runtime artifacts.
In `@pipeline/cache-indexer/pom.xml`:
- Around line 194-217: Fix the mixed-tab indentation on the <exclusion> block
(remove the stray tab characters so the tag aligns with the other <exclusion>
entries) and change the commons-lang3 dependency declaration (artifactId
commons-lang3, version 3.18.0) to a non-compile scope (set
<scope>provided</scope> or <scope>test</scope>) so it is not shaded into the fat
jar and doesn't conflict with Flink's runtime-provided commons-lang3; update the
dependency block accordingly.
- Around line 40-44: The Kryo dependency (groupId com.esotericsoftware,
artifactId kryo, version 4.0.3) is declared in compile scope and will be shaded
into the job jar, which can conflict with Flink 1.20's bundled Kryo 2.24.0;
update the pom so this dependency uses scope "provided" (or remove it) so the
runtime uses Flink's Kryo, or else explicitly verify and document compatibility
with Kryo 2.24.0 if you must keep 4.0.3 (change the <dependency> for
com.esotericsoftware:kryo accordingly).
In `@pipeline/pom.xml`:
- Around line 142-163: The parent POM currently declares the
org.apache.kafka:kafka-clients dependency (artifactId kafka-clients, property
${kafka.version}) with no scope which causes it to be inherited as compile scope
by all child modules; change this declaration in the pipeline parent POM to use
<scope>provided</scope> (or remove it from the parent and place it only into
modules that truly need kafka-clients at runtime, e.g.,
extractor/transformer/denormalizer/etc.) so that kafka-clients is not packaged
into every child fat-jar and avoids classpath collisions with the Flink runtime
(reference the kafka-clients dependency entry in the parent POM).
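A minimal sketch of the first option described above, assuming the parent declaration carries no other elements than the coordinates and `${kafka.version}`. Whether `provided` is appropriate depends on the cluster classpath actually supplying `kafka-clients` at runtime, which should be checked before applying it.

```xml
<!-- Sketch only: parent-level kafka-clients kept out of child fat jars. -->
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>${kafka.version}</version>
    <!-- Assumption: the runtime provides this artifact; otherwise declare it
         only in the child modules that need it instead of using "provided". -->
    <scope>provided</scope>
</dependency>
```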
In `@pipeline/unified-pipeline/pom.xml`:
- Around line 217-240: The pom has mixed tabs/spaces in the <exclusions> block
and declares org.apache.commons:commons-lang3:3.18.0 without a scope which will
shade it into the fat JAR; update the indentation in the exclusions block to use
consistent spaces and remove the stray tab, and change the commons-lang3
dependency declaration (artifactId commons-lang3, version 3.18.0) to include a
scope of provided (or test if only used in tests) so it is not packaged into the
fat jar.
In `@pom.xml`:
- Around line 63-118: The PR bumped kafka.version to 3.9.1 which is incompatible
with the existing flink connector (flink-connector-kafka:3.3.0-1.20); either
revert kafka.version to <=3.8.x or upgrade the connector to a 4.0.x
flink-connector-kafka that supports Kafka 3.9.1, and update the
dependencyManagement block accordingly; also centralize the repeated pinned
artifacts (com.esotericsoftware:kryo:4.0.3,
org.apache.commons:commons-lang3:3.18.0,
org.apache.commons:commons-compress:1.26.0,
commons-beanutils:commons-beanutils:1.11.0, org.bitbucket.b_c:jose4j:0.9.6,
io.netty:netty-handler:4.1.130.Final,
io.netty:netty-transport-native-epoll:4.1.130.Final,
org.assertj:assertj-core:3.27.7,
io.github.embeddedkafka:embedded-kafka_2.12:3.9.1) into this
dependencyManagement section so child modules (denormalizer, preprocessor,
dataset-router, extractor, transformer) inherit a single pinned version and
avoid drift.
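The centralization described above could be sketched as follows. The coordinates and versions are the ones listed in the comment; the placement inside the root `<dependencyManagement>` and the subset of artifacts shown are illustrative assumptions, not the project's actual POM.

```xml
<!-- Sketch only: a few of the pinned artifacts, managed once in the root pom.xml. -->
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>com.esotericsoftware</groupId>
            <artifactId>kryo</artifactId>
            <version>4.0.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-lang3</artifactId>
            <version>3.18.0</version>
        </dependency>
        <dependency>
            <groupId>io.netty</groupId>
            <artifactId>netty-handler</artifactId>
            <version>4.1.130.Final</version>
        </dependency>
        <dependency>
            <groupId>org.assertj</groupId>
            <artifactId>assertj-core</artifactId>
            <version>3.27.7</version>
        </dependency>
    </dependencies>
</dependencyManagement>
```

With the versions managed in one place, a child module would declare only `groupId`, `artifactId`, and any module-specific `<scope>` or `<exclusions>`, omitting `<version>`.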
---
Outside diff comments:
In `@FlinkDockerfile`:
- Around line 3-11: Remove stale Hudi artifacts and references from the
Dockerfile: delete the RUN that downloads hudi-flink1.17-bundle-0.15.0.jar and
remove or permanently delete the commented cp/COPY lines referencing
hudi-connector-1.0.0.jar (and any other hudi-* filenames) so the image build no
longer fetches or mentions the obsolete Hudi bundle or connector; keep the
remaining Flink library wget/cp steps unchanged.
In `@pipeline/extractor/pom.xml`:
- Around line 28-323: The duplicated Flink/Kafka/exclusions and pinned test deps
(e.g., artifactIds kryo, jose4j, netty-transport-native-epoll, netty-handler,
commons-beanutils, assertj-core, zookeeper, commons-compress, commons-lang3,
embedded-kafka, snappy-java) must be consolidated: move the version pins into
pipeline/pom.xml's <dependencyManagement> and the shared exclusioned/test-scoped
dependencies into pipeline/pom.xml as shared <dependencies>, then remove the
repeated blocks from the child module POMs
(extractor/transformer/denormalizer/preprocessor/dataset-router) so they only
declare their module-unique artifacts; also remove the commented-out kryo block
(dead code), decide and set an explicit <scope> (likely test) for commons-lang3
(artifactId commons-lang3) if it should not be packaged, and run a POM formatter
to remove stray whitespace/indent artifacts.
---
Nitpick comments:
In `@Dockerfile`:
- Around line 14-24: The Dockerfile's plugin move and config append are brittle:
ensure the RUN that creates $FLINK_HOME/usrlib and
$FLINK_HOME/plugins/flink-s3-fs-hadoop sets strict shell options (e.g., set
-eux) and explicitly verify the jar glob before moving (use an ls or test on
$FLINK_HOME/opt/flink-s3-fs-hadoop-*.jar and fail fast) so the mv command in
that RUN won't silently noop or produce confusing errors; also add a brief
comment near the config append that the echo into $FLINK_HOME/conf/config.yaml
or $FLINK_HOME/conf/flink-conf.yaml is not deduplicated on re-runs (idempotency
caveat), and consider extracting the duplicated block to a shared ONBUILD or
build stage to avoid copy/paste between the two locations mentioned.
In `@framework/pom.xml`:
- Around line 377-388: Remove the dead commented-out dependency block for
com.esotericsoftware:kryo (the <!-- <dependency> ... kryo ... </dependency> -->
chunk) from the pom so the file no longer contains the stale commented
declaration; leave the active assertj test dependency as-is and avoid
reintroducing the commented kryo block since kryo is already pinned elsewhere in
the module.
In `@pipeline/cache-indexer/pom.xml`:
- Around line 176-187: Remove the commented-out kryo dependency block from the
pom so the file has no dead/commented dependency entries; specifically delete
the commented section that references groupId "com.esotericsoftware", artifactId
"kryo", and version "4.0.3" (the commented block shown in the diff) since an
active kryo dependency already exists elsewhere.
- Around line 279-291: The zookeeper test dependency block has an unnecessary
blank line between the <scope>test</scope> element and the <exclusions> element;
update the <dependency> for org.apache.zookeeper: zookeeper (version 3.9.3) by
removing the extra whitespace so <scope>test</scope> and <exclusions> appear
consecutively for consistent formatting and readability within the dependency
block.
In `@pipeline/dataset-router/pom.xml`:
- Around line 29-360: POM contains duplicated exclusion/pin patterns and
redundant entries: remove the explicit <version> for the com.sun.mail:mailapi
dependency so it inherits the parent dependencyManagement version (locate the
dependency with artifactId "mailapi"), delete the dead commented kryo block (the
commented com.esotericsoftware kryo block around the earlier section),
consolidate repeated exclusion lists shared across Flink/Kafka/test dependencies
by removing duplicated exclusions where parent or sibling POMs already manage
them, and add a proper <scope> to the commons-lang3 dependency (artifactId
"commons-lang3") instead of leaving it unscoped; update only the <dependencies>
entries referenced (mailapi, commented kryo, commons-lang3, and the repeated
exclusion blocks) to avoid duplicating version pins and commented dead code.
In `@pipeline/unified-pipeline/pom.xml`:
- Around line 199-210: Remove the dead commented-out Kryo test dependency block
from the pom.xml by deleting the commented <dependency> element that references
groupId "com.esotericsoftware" and artifactId "kryo" (version 4.0.3, scope test)
so only the active assertj-core dependency remains; ensure no other commented
dependency remnants remain in the same dependency section.
- Around line 302-314: The zookeeper dependency block contains an extra
blank/whitespace-only line between the <scope>test</scope> element and the
<exclusions> element; remove that empty line inside the <dependency> for
org.apache.zookeeper:artifactId zookeeper to keep the XML tidy and ensure
elements are adjacent (update the dependency block around the zookeeper
declaration accordingly).
In `@pom.xml`:
- Around line 89-99: Update the inline comment above the
zookeeper/zookeeper-jute dependency entries to correctly state that while
ZooKeeper 3.9.3 addresses the ZK CVE and updates its bundled Netty to 4.1.113,
it does not remediate the Logback or Commons IO CVEs; mention that Logback and
Commons IO are pinned separately (see the explicit logback-*:1.5.18 and
commons-io:2.15.1 entries) so readers don’t assume the ZK bump alone covers
those transitive CVEs.
In `@transformation-sdk/pom.xml`:
- Around line 55-80: Remove the hard-coded <version> elements for
com.fasterxml.woodstox:woodstox-core and org.apache.commons:commons-lang3 in
this module's POM so the parent project's dependencyManagement controls their
versions; specifically, delete the <version>6.7.0</version> node inside the
woodstox-core dependency and remove the <version>3.18.0</version> node for
commons-lang3, leaving the dependencies declared only by groupId/artifactId (and
any exclusions) so the centralized versions are honored.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e49cb233-f738-4c0b-b356-384ebd8b5dac
📒 Files selected for processing (24)
- Dockerfile
- FlinkDockerfile
- dataset-registry/pom.xml
- framework/pom.xml
- pipeline/cache-indexer/pom.xml
- pipeline/dataset-router/pom.xml
- pipeline/denormalizer/pom.xml
- pipeline/extractor/pom.xml
- pipeline/hudi-connector/pom.xml
- pipeline/hudi-connector/src/main/resources/core-site.xml
- pipeline/hudi-connector/src/main/resources/hudi-writer.conf
- pipeline/hudi-connector/src/main/resources/schemas/schema.json
- pipeline/hudi-connector/src/main/scala/org/sunbird/obsrv/function/RowDataConverterFunction.scala
- pipeline/hudi-connector/src/main/scala/org/sunbird/obsrv/streaming/HudiConnectorConfig.scala
- pipeline/hudi-connector/src/main/scala/org/sunbird/obsrv/streaming/HudiConnectorStreamTask.scala
- pipeline/hudi-connector/src/main/scala/org/sunbird/obsrv/streaming/TestTimestamp.scala
- pipeline/hudi-connector/src/main/scala/org/sunbird/obsrv/util/HMetrics.scala
- pipeline/hudi-connector/src/main/scala/org/sunbird/obsrv/util/HudiSchemaParser.scala
- pipeline/pom.xml
- pipeline/preprocessor/pom.xml
- pipeline/transformer/pom.xml
- pipeline/unified-pipeline/pom.xml
- pom.xml
- transformation-sdk/pom.xml
💤 Files with no reviewable changes (10)
- pipeline/hudi-connector/src/main/scala/org/sunbird/obsrv/streaming/TestTimestamp.scala
- pipeline/hudi-connector/src/main/resources/core-site.xml
- pipeline/hudi-connector/src/main/scala/org/sunbird/obsrv/util/HMetrics.scala
- pipeline/hudi-connector/src/main/resources/schemas/schema.json
- pipeline/hudi-connector/src/main/resources/hudi-writer.conf
- pipeline/hudi-connector/src/main/scala/org/sunbird/obsrv/util/HudiSchemaParser.scala
- pipeline/hudi-connector/src/main/scala/org/sunbird/obsrv/streaming/HudiConnectorStreamTask.scala
- pipeline/hudi-connector/src/main/scala/org/sunbird/obsrv/streaming/HudiConnectorConfig.scala
- pipeline/hudi-connector/src/main/scala/org/sunbird/obsrv/function/RowDataConverterFunction.scala
- pipeline/hudi-connector/pom.xml
        <exclusions>
            <exclusion>
                <groupId>com.esotericsoftware.kryo</groupId>
                <artifactId>kryo</artifactId>
            </exclusion>
            <exclusion>
                <groupId>org.lz4</groupId>
                <artifactId>lz4-java</artifactId>
            </exclusion>
            <exclusion>
                <groupId>org.xerial.snappy</groupId>
                <artifactId>snappy-java</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>com.esotericsoftware</groupId>
        <artifactId>kryo</artifactId>
        <version>4.0.3</version>
        <scope>test</scope>
Exclusions add no replacements for lz4-java and snappy-java in test classpath.
flink-runtime tests require lz4-java and snappy-java for codec use in serializers and records. Only kryo is re-added. Tests that exercise compressed records/network stack will fail with ClassNotFoundException. The framework/pom.xml pins snappy-java:1.1.10.5 and at.yawk.lz4:lz4-java:1.10.3 in compile scope — verify the same pins are transitively visible to this module for tests.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@dataset-registry/pom.xml` around lines 129 - 148, The exclusions removed
lz4-java and snappy-java from the test classpath but only re-adds kryo, causing
runtime ClassNotFoundExceptions in flink-runtime tests; update this module's pom
to explicitly add test-scope dependencies for at.yawk.lz4:lz4-java:1.10.3 and
org.xerial.snappy:snappy-java:1.1.10.5 (the same versions pinned in
framework/pom.xml) or otherwise ensure those compile-scope pins are transitively
visible to tests, keeping the existing kryo test dependency and leaving the
flink-runtime exclusion block intact.
| RUN mvn clean package -DskipTests -f /app/pipeline/pom.xml | ||
|
|
||
| FROM sanketikahub/flink:1.20-scala_2.12-java11 AS extractor-image | ||
| FROM public.ecr.aws/docker/library/flink:1.20-scala_2.12-java11 AS unified-image |
Base image violates coding guideline.
As per coding guidelines, "Docker images for deployment must use base image: sanketikahub/flink:1.20-scala_2.12-java11". This stage (and cache-indexer-image at line 27) uses public.ecr.aws/docker/library/flink:1.20-scala_2.12-java11, which is the upstream Apache Flink image rather than the project's hardened base. Revert to sanketikahub/flink:1.20-scala_2.12-java11 unless the guideline is being formally changed — in which case update CLAUDE.md in the same PR.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Dockerfile` at line 10, The Dockerfile uses the upstream Flink base image;
change the FROM references for the unified-image stage (symbol: unified-image)
and the cache-indexer-image stage (symbol: cache-indexer-image) from
public.ecr.aws/docker/library/flink:1.20-scala_2.12-java11 to the approved
sanketikahub/flink:1.20-scala_2.12-java11; if you intend to change the project
guideline instead, update CLAUDE.md in the same PR to document the new approved
base image.
| FROM public.ecr.aws/docker/library/flink:1.20-scala_2.12-java11 AS unified-image | ||
| USER flink | ||
| RUN mkdir -p $FLINK_HOME/usrlib | ||
| COPY --from=build-pipeline /app/pipeline/extractor/target/extractor-1.0.0.jar $FLINK_HOME/usrlib/ | ||
|
|
||
| FROM sanketikahub/flink:1.20-scala_2.12-java11 AS preprocessor-image | ||
| USER flink | ||
| RUN mkdir -p $FLINK_HOME/usrlib | ||
| COPY --from=build-pipeline /app/pipeline/preprocessor/target/preprocessor-1.0.0.jar $FLINK_HOME/usrlib/ | ||
|
|
||
| FROM sanketikahub/flink:1.20-scala_2.12-java11 AS denormalizer-image | ||
| USER flink | ||
| RUN mkdir -p $FLINK_HOME/usrlib | ||
| COPY --from=build-pipeline /app/pipeline/denormalizer/target/denormalizer-1.0.0.jar $FLINK_HOME/usrlib/ | ||
|
|
||
| FROM sanketikahub/flink:1.20-scala_2.12-java11 AS transformer-image | ||
| USER flink | ||
| RUN mkdir -p $FLINK_HOME/usrlib | ||
| COPY --from=build-pipeline /app/pipeline/transformer/target/transformer-1.0.0.jar $FLINK_HOME/usrlib/ | ||
|
|
||
| FROM sanketikahub/flink:1.20-scala_2.12-java11 AS dataset-router-image | ||
| USER flink | ||
| RUN mkdir -p $FLINK_HOME/usrlib | ||
| COPY --from=build-pipeline /app/pipeline/dataset-router/target/dataset-router-1.0.0.jar $FLINK_HOME/usrlib/ | ||
|
|
||
| # unified image build | ||
| FROM sanketikahub/flink:1.20-scala_2.12-java11 AS unified-image | ||
| USER flink | ||
| RUN mkdir -p $FLINK_HOME/usrlib | ||
| # Move the bundled flink-s3-fs-hadoop plugin from opt/ to the required plugins subfolder. | ||
| # This avoids a network download and guarantees the plugin version matches the runtime. | ||
| RUN mkdir -p $FLINK_HOME/usrlib && \ | ||
| mkdir -p $FLINK_HOME/plugins/flink-s3-fs-hadoop && \ | ||
| mv $FLINK_HOME/opt/flink-s3-fs-hadoop-*.jar $FLINK_HOME/plugins/flink-s3-fs-hadoop/ | ||
| # Use IRSA/OIDC (Web Identity Token) for S3 auth instead of static access keys. | ||
| # EKS injects AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE into pods whose service | ||
| # account has an IAM role annotation; WebIdentityTokenCredentialsProvider reads them. | ||
| RUN if [ -f "$FLINK_HOME/conf/config.yaml" ]; then \ | ||
| echo 's3.aws.credentials.provider: com.amazonaws.auth.WebIdentityTokenCredentialsProvider' >> $FLINK_HOME/conf/config.yaml; \ | ||
| else \ | ||
| echo 's3.aws.credentials.provider: com.amazonaws.auth.WebIdentityTokenCredentialsProvider' >> $FLINK_HOME/conf/flink-conf.yaml; \ | ||
| fi | ||
| COPY --from=build-pipeline /app/pipeline/unified-pipeline/target/unified-pipeline-1.0.0.jar $FLINK_HOME/usrlib/ | ||
|
|
||
| # # Lakehouse connector image build | ||
| # FROM sanketikahub/flink:1.17.2-scala_2.12-java11 AS lakehouse-connector-image | ||
| # USER flink | ||
| # RUN wget https://repo1.maven.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.8.3-10.0/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar | ||
| # RUN wget https://repo1.maven.org/maven2/org/apache/flink/flink-s3-fs-hadoop/1.17.2/flink-s3-fs-hadoop-1.17.2.jar | ||
| # RUN wget https://repo.maven.apache.org/maven2/org/apache/hudi/hudi-flink1.17-bundle/1.0.2/hudi-flink1.17-bundle-1.0.2.jar | ||
| # RUN mv flink-shaded-hadoop-2-uber-2.8.3-10.0.jar $FLINK_HOME/lib | ||
| # RUN mv flink-s3-fs-hadoop-1.17.2.jar $FLINK_HOME/lib | ||
| # RUN mv hudi-flink1.17-bundle-1.0.2.jar $FLINK_HOME/lib | ||
| # # RUN mkdir $FLINK_HOME/custom-lib | ||
| # COPY --from=build-pipeline /app/pipeline/hudi-connector/target/hudi-connector-1.0.0.jar $FLINK_HOME/lib | ||
|
|
||
| # cache indexer image build | ||
| FROM sanketikahub/flink:1.20-scala_2.12-java11 AS cache-indexer-image | ||
| FROM public.ecr.aws/docker/library/flink:1.20-scala_2.12-java11 AS cache-indexer-image | ||
| USER flink | ||
| RUN mkdir -p $FLINK_HOME/usrlib | ||
| RUN mkdir -p $FLINK_HOME/usrlib && \ | ||
| mkdir -p $FLINK_HOME/plugins/flink-s3-fs-hadoop && \ | ||
| mv $FLINK_HOME/opt/flink-s3-fs-hadoop-*.jar $FLINK_HOME/plugins/flink-s3-fs-hadoop/ | ||
| RUN if [ -f "$FLINK_HOME/conf/config.yaml" ]; then \ | ||
| echo 's3.aws.credentials.provider: com.amazonaws.auth.WebIdentityTokenCredentialsProvider' >> $FLINK_HOME/conf/config.yaml; \ | ||
| else \ | ||
| echo 's3.aws.credentials.provider: com.amazonaws.auth.WebIdentityTokenCredentialsProvider' >> $FLINK_HOME/conf/flink-conf.yaml; \ | ||
| fi | ||
| COPY --from=build-pipeline /app/pipeline/cache-indexer/target/cache-indexer-1.0.0.jar $FLINK_HOME/usrlib/ No newline at end of file |
CI/CD build matrix references removed Docker targets — pipeline will fail.
The removed stages (extractor-image, preprocessor-image, denormalizer-image, transformer-image, dataset-router-image) are still present in .github/workflows/build_and_deploy.yaml (lines 19–31). After this change, the workflow will fail on every one of those matrix entries with failed to solve: target <name>-image: not found. Either:
- Add the stages back (aliased to unified-image if the consolidation is intentional), or
- Update build_and_deploy.yaml in this PR to drop those matrix entries.
🧰 Tools
🪛 Trivy (0.69.3)
[error] 22-22: 'RUN update' instruction alone
The instruction 'RUN update' should always be followed by ' install' in the same RUN statement.
Rule: DS-0017
(IaC/Dockerfile)
[error] 23-23: 'apt-get' missing '--no-install-recommends'
'--no-install-recommends' flag is missed: 'apt-get install libcurl4 curl -y'
Rule: DS-0029
(IaC/Dockerfile)
[error] 26-29: 'apt-get' missing '--no-install-recommends'
'--no-install-recommends' flag is missed: 'set -ex; apt-get update; apt-get -y install gpg libsnappy1v5 gettext-base libjemalloc-dev; rm -rf /var/lib/apt/lists/*'
Rule: DS-0029
(IaC/Dockerfile)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Dockerfile` around lines 10 - 37, The Dockerfile removed several build stages
(extractor-image, preprocessor-image, denormalizer-image, transformer-image,
dataset-router-image) while keeping unified-image and cache-indexer-image, which
breaks the CI matrix in build_and_deploy.yaml; fix by either re-adding the
missing stages as aliases that point to the consolidated artifact (e.g., create
additional FROM ... AS extractor-image / AS preprocessor-image entries that
mirror unified-image behavior or add simple stage aliases referencing the same
jar) or update the GitHub Actions matrix to remove those stage names, ensuring
the matrix entries match existing Dockerfile targets (refer to the Dockerfile
stage identifiers unified-image and cache-indexer-image and the COPY targets
like unified-pipeline-1.0.0.jar and cache-indexer-1.0.0.jar).
| @@ -1,4 +1,4 @@ | |||
| FROM --platform=linux/x86_64 flink:1.20-scala_2.12-java11 | |||
| FROM --platform=linux/x86_64 public.ecr.aws/docker/library/flink:1.20-scala_2.12-java11 | |||
Base image violates coding guideline.
As per coding guidelines, "Docker images for deployment must use base image: sanketikahub/flink:1.20-scala_2.12-java11". Revert to the project's hardened base, or update the guideline in CLAUDE.md in the same PR if the switch to the ECR public mirror is deliberate.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@FlinkDockerfile` at line 1, The Dockerfile currently uses the public ECR base
image string "public.ecr.aws/docker/library/flink:1.20-scala_2.12-java11" which
violates the project's guideline requiring the hardened base
"sanketikahub/flink:1.20-scala_2.12-java11"; update the FROM line to use the
sanctioned base image (replace the image string), or if the ECR mirror is
intentional, add/update rationale and change the guideline in CLAUDE.md within
this PR to permit the new base.
| <groupId>org.apache.httpcomponents</groupId> | ||
| <artifactId>httpclient</artifactId> | ||
| <version>4.5.13</version> | ||
| <exclusions> | ||
| <exclusion> | ||
| <groupId>commons-codec</groupId> | ||
| <artifactId>commons-codec</artifactId> | ||
| </exclusion> | ||
| </exclusions> | ||
| </dependency> | ||
| <dependency> | ||
| <groupId>commons-codec</groupId> | ||
| <artifactId>commons-codec</artifactId> | ||
| <version>1.15</version> | ||
| </dependency> |
httpclient version is inconsistent with dataset-registry/pom.xml.
framework/pom.xml pins httpclient:4.5.13, but dataset-registry/pom.xml uses 4.5.14 (the explicit CVE-fix version). Since many modules depend on framework, Maven's nearest-wins mediation can pick the lower version on downstream classpaths. Align to 4.5.14.
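A sketch of the aligned declaration, assuming the existing commons-codec exclusion and the separate commons-codec:1.15 dependency stay as they are:

```xml
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <!-- bumped from 4.5.13 to match dataset-registry/pom.xml -->
    <version>4.5.14</version>
    <exclusions>
        <exclusion>
            <groupId>commons-codec</groupId>
            <artifactId>commons-codec</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```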
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@framework/pom.xml` around lines 148 - 162, Update the httpclient dependency
version in framework's POM: locate the <dependency> element with
<groupId>org.apache.httpcomponents</groupId> and
<artifactId>httpclient</artifactId> (in framework/pom.xml) and change its
<version> from 4.5.13 to 4.5.14 so it matches dataset-registry's CVE-fixed
version; keep the existing <exclusions> (commons-codec) and the separate
commons-codec dependency unchanged.
| <dependency> | ||
| <groupId>com.esotericsoftware</groupId> | ||
| <artifactId>kryo</artifactId> | ||
| <version>4.0.3</version> | ||
| </dependency> |
🧩 Analysis chain
🌐 Web query:
What version of Kryo does Apache Flink 1.20 bundle?
💡 Result:
Apache Flink 1.20 bundles Kryo 2.24.0. Flink 1.20 is part of the 1.x series, which uses Kryo version 2.x for serialization compatibility. The upgrade to Kryo 5.x occurred in Flink 2.0, as evidenced by pull requests and discussions targeting Flink 2.0 branches. Release notes for 1.20 do not mention any Kryo version change, and the KryoSerializer code in 1.20 uses the legacy Kryo 2.x with Twitter Chill support. Specific version 2.24.0 is referenced in upgrade PRs as the prior version before the 5.x upgrade in later versions.
Citations:
- 1: https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/java/typeutils/runtime/kryo/KryoSerializer.java
- 2: [FLINK-24017] Setup Kryo to be usable without flink-scala apache/flink#17402
- 3: [FLINK-37661] Remove Kryo serializer reference data from version 1.x apache/flink#26444
- 4: https://github.com/a0x8o/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/java/typeutils/runtime/kryo/KryoSerializer.java
- 5: [FLINK-3154][API] Upgrade from Kryo 2.x to Kryo 5.x. Removed twitter … apache/flink#25896
- 6: [FLINK-3154][runtime] Upgrade from Kryo v2 + Chill 0.7.6 to Kryo v5 w… apache/flink#22660
Reconcile kryo 4.0.3 scope with Flink 1.20's bundled Kryo 2.24.0.
kryo:4.0.3 is in compile scope and will be packaged into the shaded job jar. However, Flink 1.20 bundles Kryo 2.24.0 for its serialization (KryoSerializer). Including the newer 4.0.3 in the user jar risks version conflicts on the classpath during job execution: the runtime may call Kryo 2.24.0 APIs that 4.0.3 no longer provides or implements differently. Either scope kryo to provided (relying on Flink's bundled version) or explicitly verify compatibility with Kryo 2.24.0.
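A sketch of the first option, assuming nothing in the job code depends on Kryo 4.x-only APIs (this has not been verified here):

```xml
<dependency>
    <groupId>com.esotericsoftware</groupId>
    <artifactId>kryo</artifactId>
    <version>4.0.3</version>
    <!-- provided: available at compile time, kept out of the shaded job jar -->
    <scope>provided</scope>
</dependency>
```

Compiling against 4.0.3 while running on Flink's bundled 2.24.0 still leaves room for API mismatches, so removing the declaration entirely may be the cleaner route if no module code references Kryo directly.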
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pipeline/cache-indexer/pom.xml` around lines 40 - 44, The Kryo dependency
(groupId com.esotericsoftware, artifactId kryo, version 4.0.3) is declared in
compile scope and will be shaded into the job jar, which can conflict with Flink
1.20's bundled Kryo 2.24.0; update the pom so this dependency uses scope
"provided" (or remove it) so the runtime uses Flink's Kryo, or else explicitly
verify and document compatibility with Kryo 2.24.0 if you must keep 4.0.3
(change the <dependency> for com.esotericsoftware:kryo accordingly).
| <exclusions> | ||
| <exclusion> | ||
| <groupId>com.esotericsoftware.kryo</groupId> | ||
| <artifactId>kryo</artifactId> | ||
| </exclusion> | ||
| <exclusion> | ||
| <groupId>org.apache.commons</groupId> | ||
| <artifactId>commons-lang3</artifactId> | ||
| </exclusion> | ||
| <exclusion> | ||
| <groupId>org.lz4</groupId> | ||
| <artifactId>lz4-java</artifactId> | ||
| </exclusion> | ||
| <exclusion> | ||
| <groupId>org.xerial.snappy</groupId> | ||
| <artifactId>snappy-java</artifactId> | ||
| </exclusion> | ||
| </exclusions> | ||
| </dependency> | ||
| <dependency> | ||
| <groupId>org.apache.commons</groupId> | ||
| <artifactId>commons-lang3</artifactId> | ||
| <version>3.18.0</version> | ||
| </dependency> |
Mixed-indentation/tab issue and unintended compile-scope commons-lang3.
Two issues in this block:
- Line 207 has broken indentation, with tabs mixed into the exclusion start tag (<exclusion>). This is cosmetic but fragile for future edits.
- commons-lang3:3.18.0 at lines 213–217 has no <scope>, so it becomes compile-scope and will be shaded into the fat jar. flink-runtime excluded commons-lang3 only at test scope, so this doesn't "replace" anything for runtime: Flink's lib directory already provides commons-lang3. Bundling it will create classloader visibility risks (child-first classloading). Mark it either test or provided.
♻️ Suggested fix
- <exclusion>
+ <exclusion>
<groupId>org.xerial.snappy</groupId>
<artifactId>snappy-java</artifactId>
</exclusion>
</exclusions>
</dependency>
- <dependency>
+ <dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
<version>3.18.0</version>
+ <scope>provided</scope>
</dependency>
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| <exclusions> | |
| <exclusion> | |
| <groupId>com.esotericsoftware.kryo</groupId> | |
| <artifactId>kryo</artifactId> | |
| </exclusion> | |
| <exclusion> | |
| <groupId>org.apache.commons</groupId> | |
| <artifactId>commons-lang3</artifactId> | |
| </exclusion> | |
| <exclusion> | |
| <groupId>org.lz4</groupId> | |
| <artifactId>lz4-java</artifactId> | |
| </exclusion> | |
| <exclusion> | |
| <groupId>org.xerial.snappy</groupId> | |
| <artifactId>snappy-java</artifactId> | |
| </exclusion> | |
| </exclusions> | |
| </dependency> | |
| <dependency> | |
| <groupId>org.apache.commons</groupId> | |
| <artifactId>commons-lang3</artifactId> | |
| <version>3.18.0</version> | |
| </dependency> | |
| <exclusions> | |
| <exclusion> | |
| <groupId>com.esotericsoftware.kryo</groupId> | |
| <artifactId>kryo</artifactId> | |
| </exclusion> | |
| <exclusion> | |
| <groupId>org.apache.commons</groupId> | |
| <artifactId>commons-lang3</artifactId> | |
| </exclusion> | |
| <exclusion> | |
| <groupId>org.lz4</groupId> | |
| <artifactId>lz4-java</artifactId> | |
| </exclusion> | |
| <exclusion> | |
| <groupId>org.xerial.snappy</groupId> | |
| <artifactId>snappy-java</artifactId> | |
| </exclusion> | |
| </exclusions> | |
| </dependency> | |
| <dependency> | |
| <groupId>org.apache.commons</groupId> | |
| <artifactId>commons-lang3</artifactId> | |
| <version>3.18.0</version> | |
| <scope>provided</scope> | |
| </dependency> |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pipeline/cache-indexer/pom.xml` around lines 194 - 217, Fix the mixed-tab
indentation on the <exclusion> block (remove the stray tab characters so the tag
aligns with the other <exclusion> entries) and change the commons-lang3
dependency declaration (artifactId commons-lang3, version 3.18.0) to a
non-compile scope (set <scope>provided</scope> or <scope>test</scope>) so it is
not shaded into the fat jar and doesn't conflict with Flink's runtime-provided
commons-lang3; update the dependency block accordingly.
| <dependency> | ||
| <groupId>org.apache.kafka</groupId> | ||
| <artifactId>kafka-clients</artifactId> | ||
| <version>${kafka.version}</version> | ||
| <exclusions> | ||
| <exclusion> | ||
| <groupId>org.lz4</groupId> | ||
| <artifactId>lz4-java</artifactId> | ||
| </exclusion> | ||
| <!-- Fix CVE-2023-43642 / BDSA-2023-2110/2111/2113: exclude vulnerable snappy-java --> | ||
| <exclusion> | ||
| <groupId>org.xerial.snappy</groupId> | ||
| <artifactId>snappy-java</artifactId> | ||
| </exclusion> | ||
| </exclusions> | ||
| </dependency> | ||
| <!-- Fix CVE-2023-43642 / BDSA-2023-2110/2111/2113: safe snappy-java replacing kafka-clients transitive --> | ||
| <dependency> | ||
| <groupId>org.xerial.snappy</groupId> | ||
| <artifactId>snappy-java</artifactId> | ||
| <version>1.1.10.5</version> | ||
| </dependency> |
kafka-clients at the parent POM has no scope — this bundles it into every child module’s shaded jar.
Declaring org.apache.kafka:kafka-clients here with no <scope> makes it compile-scope, and because this is the parent pipeline POM, every child (extractor, transformer, denormalizer, preprocessor, dataset-router, cache-indexer, unified-pipeline) inherits it as compile-scope unless they redeclare it. In the child POMs you still see it re-declared as <scope>test</scope>, which now only overrides per module — any module that forgets to redeclare will ship kafka-clients-3.9.1.jar inside its fat jar and potentially collide with the Kafka client library the Flink cluster already provides.
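Since org.apache.flink:flink-connector-kafka:3.3.0-1.20 already pulls in kafka-clients transitively, this should probably be <scope>provided</scope> here if the Flink cluster supplies the Kafka client on its own classpath, or be moved out of the parent and into only the modules that actually need it on the runtime classpath. Note that a direct provided declaration overrides the connector's transitive compile scope, so confirm the jar is present in the cluster's lib directory before relying on it.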
🛠️ Suggested fix
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>${kafka.version}</version>
+ <scope>provided</scope>
<exclusions>
<exclusion>
<groupId>org.lz4</groupId>
<artifactId>lz4-java</artifactId>
</exclusion>
<!-- Fix CVE-2023-43642 / BDSA-2023-2110/2111/2113: exclude vulnerable snappy-java -->
<exclusion>
<groupId>org.xerial.snappy</groupId>
<artifactId>snappy-java</artifactId>
</exclusion>
</exclusions>
</dependency>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pipeline/pom.xml` around lines 142 - 163, The parent POM currently declares
the org.apache.kafka:kafka-clients dependency (artifactId kafka-clients,
property ${kafka.version}) with no scope which causes it to be inherited as
compile scope by all child modules; change this declaration in the pipeline
parent POM to use <scope>provided</scope> (or remove it from the parent and
place it only into modules that truly need kafka-clients at runtime, e.g.,
extractor/transformer/denormalizer/etc.) so that kafka-clients is not packaged
into every child fat-jar and avoids classpath collisions with the Flink runtime
(reference the kafka-clients dependency entry in the parent POM).
| <exclusions> | ||
| <exclusion> | ||
| <groupId>com.esotericsoftware.kryo</groupId> | ||
| <artifactId>kryo</artifactId> | ||
| </exclusion> | ||
| <exclusion> | ||
| <groupId>org.apache.commons</groupId> | ||
| <artifactId>commons-lang3</artifactId> | ||
| </exclusion> | ||
| <exclusion> | ||
| <groupId>org.lz4</groupId> | ||
| <artifactId>lz4-java</artifactId> | ||
| </exclusion> | ||
| <exclusion> | ||
| <groupId>org.xerial.snappy</groupId> | ||
| <artifactId>snappy-java</artifactId> | ||
| </exclusion> | ||
| </exclusions> | ||
| </dependency> | ||
| <dependency> | ||
| <groupId>org.apache.commons</groupId> | ||
| <artifactId>commons-lang3</artifactId> | ||
| <version>3.18.0</version> | ||
| </dependency> |
Same malformed indentation and unintended compile-scope commons-lang3 as in cache-indexer/pom.xml.
Line 230 mixes tabs/spaces inside the exclusions block, and commons-lang3:3.18.0 at lines 236–240 is declared without a scope — it will be shaded into the fat jar and can clash with Flink's own commons-lang3 via child-first classloading. Mark it provided (or test if only tests need it) and fix the indentation.
♻️ Suggested fix
- <exclusion>
+ <exclusion>
<groupId>org.xerial.snappy</groupId>
<artifactId>snappy-java</artifactId>
</exclusion>
</exclusions>
</dependency>
- <dependency>
+ <dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
<version>3.18.0</version>
+ <scope>provided</scope>
</dependency>
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| <exclusions> | |
| <exclusion> | |
| <groupId>com.esotericsoftware.kryo</groupId> | |
| <artifactId>kryo</artifactId> | |
| </exclusion> | |
| <exclusion> | |
| <groupId>org.apache.commons</groupId> | |
| <artifactId>commons-lang3</artifactId> | |
| </exclusion> | |
| <exclusion> | |
| <groupId>org.lz4</groupId> | |
| <artifactId>lz4-java</artifactId> | |
| </exclusion> | |
| <exclusion> | |
| <groupId>org.xerial.snappy</groupId> | |
| <artifactId>snappy-java</artifactId> | |
| </exclusion> | |
| </exclusions> | |
| </dependency> | |
| <dependency> | |
| <groupId>org.apache.commons</groupId> | |
| <artifactId>commons-lang3</artifactId> | |
| <version>3.18.0</version> | |
| </dependency> | |
| <exclusions> | |
| <exclusion> | |
| <groupId>com.esotericsoftware.kryo</groupId> | |
| <artifactId>kryo</artifactId> | |
| </exclusion> | |
| <exclusion> | |
| <groupId>org.apache.commons</groupId> | |
| <artifactId>commons-lang3</artifactId> | |
| </exclusion> | |
| <exclusion> | |
| <groupId>org.lz4</groupId> | |
| <artifactId>lz4-java</artifactId> | |
| </exclusion> | |
| <exclusion> | |
| <groupId>org.xerial.snappy</groupId> | |
| <artifactId>snappy-java</artifactId> | |
| </exclusion> | |
| </exclusions> | |
| </dependency> | |
| <dependency> | |
| <groupId>org.apache.commons</groupId> | |
| <artifactId>commons-lang3</artifactId> | |
| <version>3.18.0</version> | |
| <scope>provided</scope> | |
| </dependency> |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pipeline/unified-pipeline/pom.xml` around lines 217 - 240, The pom has mixed
tabs/spaces in the <exclusions> block and declares
org.apache.commons:commons-lang3:3.18.0 without a scope which will shade it into
the fat JAR; update the indentation in the exclusions block to use consistent
spaces and remove the stray tab, and change the commons-lang3 dependency
declaration (artifactId commons-lang3, version 3.18.0) to include a scope of
provided (or test if only used in tests) so it is not packaged into the fat jar.
| <dependencyManagement> | ||
| <dependencies> | ||
| <!-- Fix BDSA-2025-5099 (HIGH): PostgreSQL JDBC channel-binding bypass --> | ||
| <dependency> | ||
| <groupId>org.postgresql</groupId> | ||
| <artifactId>postgresql</artifactId> | ||
| <version>42.7.7</version> | ||
| </dependency> | ||
| <!-- Fix BDSA-2025-20661 (MEDIUM): SMTP injection in Jakarta Mail / mailapi --> | ||
| <dependency> | ||
| <groupId>com.sun.mail</groupId> | ||
| <artifactId>mailapi</artifactId> | ||
| <version>1.6.8</version> | ||
| </dependency> | ||
| <!-- Fix CVE-2023-43642, BDSA-2023-2110/2111/2113 (HIGH/MEDIUM): snappy-java integer overflow / DoS --> | ||
| <dependency> | ||
| <groupId>org.xerial.snappy</groupId> | ||
| <artifactId>snappy-java</artifactId> | ||
| <version>1.1.10.5</version> | ||
| </dependency> | ||
| <!-- Fix BDSA-2022-2582 (MEDIUM): Woodstox stack-based buffer overflow --> | ||
| <dependency> | ||
| <groupId>com.fasterxml.woodstox</groupId> | ||
| <artifactId>woodstox-core</artifactId> | ||
| <version>6.7.0</version> | ||
| </dependency> | ||
| <!-- Fix CVE-2024-23944 (HIGH): Apache ZooKeeper authorization bypass; 3.9.3 also fixes transitive Netty 4.1.105, Logback 1.2.13, Commons IO 2.11.0 --> | ||
| <dependency> | ||
| <groupId>org.apache.zookeeper</groupId> | ||
| <artifactId>zookeeper</artifactId> | ||
| <version>3.9.3</version> | ||
| </dependency> | ||
| <dependency> | ||
| <groupId>org.apache.zookeeper</groupId> | ||
| <artifactId>zookeeper-jute</artifactId> | ||
| <version>3.9.3</version> | ||
| </dependency> | ||
| <!-- Fix CVE-2023-6378 (MEDIUM): Logback serialization vulnerability --> | ||
| <dependency> | ||
| <groupId>ch.qos.logback</groupId> | ||
| <artifactId>logback-classic</artifactId> | ||
| <version>1.5.18</version> | ||
| </dependency> | ||
| <dependency> | ||
| <groupId>ch.qos.logback</groupId> | ||
| <artifactId>logback-core</artifactId> | ||
| <version>1.5.18</version> | ||
| </dependency> | ||
| <!-- Fix CVE-2024-47554 (MEDIUM): Apache Commons IO path traversal --> | ||
| <dependency> | ||
| <groupId>commons-io</groupId> | ||
| <artifactId>commons-io</artifactId> | ||
| <version>2.15.1</version> | ||
| </dependency> | ||
| </dependencies> | ||
| </dependencyManagement> |
🧩 Analysis chain
🌐 Web query:
Is Apache Flink connector flink-connector-kafka 3.3.0-1.20 compatible with kafka-clients 3.9.1?
💡 Result:
No, the Apache Flink connector flink-connector-kafka 3.3.0-1.20 is not compatible with kafka-clients 3.9.1. It depends on an earlier version of kafka-clients, specifically around 3.4.0 as per updates in FLINK-31599 and prior to the update to 3.9.0 in April 2025 via PR #138 which was merged into main after the 3.3.0 release (October 2024). The dependency is listed as ${kafka.version} in the POM, resolved to pre-3.9. Using kafka-clients 3.9.1 would require a newer connector version like 4.0.x. Older versions had incompatibility issues with Kafka 3.3.x clients (FLINK-29977).
Citations:
- 1: https://central.sonatype.com/artifact/org.apache.flink/flink-connector-kafka/3.3.0-1.20
- 2: [FLINK-36821] Update to Kafka 3.9.0 apache/flink-connector-kafka#138
- 3: https://issues.apache.org/jira/browse/FLINK-31599
- 4: https://issues.apache.org/jira/browse/FLINK-29977
🏁 Script executed:
# Check the root pom.xml structure and kafka.version property
head -50 pom.xml
Repository: Sanketika-Obsrv/obsrv-core
Length of output: 1675
🏁 Script executed:
# Find kafka.version in pom.xml
rg "kafka.version" pom.xml -A 2 -B 2
Repository: Sanketika-Obsrv/obsrv-core
Length of output: 298
🏁 Script executed:
# Check if flink-connector-kafka is in the pom
rg "flink-connector-kafka" pom.xml -A 2 -B 2
Repository: Sanketika-Obsrv/obsrv-core
Length of output: 52
🏁 Script executed:
# List the pipeline child modules to verify
ls -la | grep -E "pom|src"
Repository: Sanketika-Obsrv/obsrv-core
Length of output: 128
🏁 Script executed:
# Find all pom.xml files (root and modules)
fd "pom\.xml$" | head -20
Repository: Sanketika-Obsrv/obsrv-core
Length of output: 399
🏁 Script executed:
# Check for flink-connector-kafka in all pom files
rg "flink-connector-kafka" --type xml
Repository: Sanketika-Obsrv/obsrv-core
Length of output: 297
🏁 Script executed:
# Check root pom.xml dependencyManagement section
sed -n '63,118p' pom.xml
Repository: Sanketika-Obsrv/obsrv-core
Length of output: 2099
🏁 Script executed:
# Check if the dependencies listed in review are in any child module (sample extractor)
rg "kryo|commons-lang3|commons-compress|netty-handler|jose4j|assertj-core|embedded-kafka" pipeline/extractor/pom.xml -B 2 -A 2
Repository: Sanketika-Obsrv/obsrv-core
Length of output: 3180
🏁 Script executed:
# Check another child module (transformer)
rg "kryo|commons-lang3|commons-compress|netty-handler|jose4j|assertj-core|embedded-kafka" pipeline/transformer/pom.xml -B 2 -A 2
Repository: Sanketika-Obsrv/obsrv-core
Length of output: 3177
🏁 Script executed:
# Check if dependencyManagement section is already in root pom and what's in it
sed -n '60,150p' pom.xml
Repository: Sanketika-Obsrv/obsrv-core
Length of output: 2884
🏁 Script executed:
# Get the exact version of flink-connector-kafka from the pom files
rg "flink-connector-kafka" -A 3 pipeline/pom.xml
Repository: Sanketika-Obsrv/obsrv-core
Length of output: 196
🏁 Script executed:
# Check framework pom for flink-connector-kafka version
rg "flink-connector-kafka" -A 3 framework/pom.xml
Repository: Sanketika-Obsrv/obsrv-core
Length of output: 223
🏁 Script executed:
# Check dataset-registry for flink-connector-kafka version
rg "flink-connector-kafka" -A 3 dataset-registry/pom.xml
Repository: Sanketika-Obsrv/obsrv-core
Length of output: 223
🏁 Script executed:
# Verify all child modules have the same scattered dependencies - check denormalizer
rg "kryo|commons-lang3|embedded-kafka" pipeline/denormalizer/pom.xml -B 1 -A 1
Repository: Sanketika-Obsrv/obsrv-core
Length of output: 1302
🏁 Script executed:
# Check preprocessor module
rg "kryo|commons-lang3|embedded-kafka" pipeline/preprocessor/pom.xml -B 1 -A 1
Repository: Sanketika-Obsrv/obsrv-core
Length of output: 1441
🏁 Script executed:
# Check dataset-router module
rg "kryo|commons-lang3|embedded-kafka" pipeline/dataset-router/pom.xml -B 1 -A 1
Repository: Sanketika-Obsrv/obsrv-core
Length of output: 1302
Critical: Kafka 3.9.1 is incompatible with flink-connector-kafka:3.3.0-1.20.
The kafka.version property was bumped to 3.9.1, but the codebase uses flink-connector-kafka:3.3.0-1.20 (released October 2024). Per the query result above, this connector was built against an earlier kafka-clients release (around 3.4.0) and predates the connector's own update to Kafka 3.9.0, so 3.9.1 is not a supported combination. To use Kafka 3.9.1, either upgrade to flink-connector-kafka:4.0.x or revert kafka.version to ≤3.8.x.
Additionally, recommend centralizing the remaining repeated pins in this <dependencyManagement> block:
- com.esotericsoftware:kryo:4.0.3
- org.apache.commons:commons-lang3:3.18.0
- org.apache.commons:commons-compress:1.26.0
- commons-beanutils:commons-beanutils:1.11.0
- org.bitbucket.b_c:jose4j:0.9.6
- io.netty:netty-handler:4.1.130.Final
- io.netty:netty-transport-native-epoll:4.1.130.Final
- org.assertj:assertj-core:3.27.7
- io.github.embeddedkafka:embedded-kafka_2.12:3.9.1
These are pinned across pipeline child modules (denormalizer, preprocessor, dataset-router, extractor, transformer). Centralizing here prevents drift when any module bumps independently.
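A sketch of what the centralized entries could look like in the root <dependencyManagement> block, showing three of the pins listed above; the remaining ones would follow the same pattern. Coordinates and versions are copied from that list, and whether each child module then drops its own <version> element is an assumption about how the cleanup would proceed:

```xml
<dependencyManagement>
    <dependencies>
        <!-- existing managed entries (postgresql, snappy-java, zookeeper, ...) stay here -->
        <dependency>
            <groupId>com.esotericsoftware</groupId>
            <artifactId>kryo</artifactId>
            <version>4.0.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-lang3</artifactId>
            <version>3.18.0</version>
        </dependency>
        <dependency>
            <groupId>io.netty</groupId>
            <artifactId>netty-handler</artifactId>
            <version>4.1.130.Final</version>
        </dependency>
    </dependencies>
</dependencyManagement>
```

A managed entry only fixes the version; it does not add the dependency to any module. Each child still declares the artifacts it needs, without a <version>, along with whatever scope applies there.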
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pom.xml` around lines 63 - 118, The PR bumped kafka.version to 3.9.1 which is
incompatible with the existing flink connector
(flink-connector-kafka:3.3.0-1.20); either revert kafka.version to <=3.8.x or
upgrade the connector to a 4.0.x flink-connector-kafka that supports Kafka
3.9.1, and update the dependencyManagement block accordingly; also centralize
the repeated pinned artifacts (com.esotericsoftware:kryo:4.0.3,
org.apache.commons:commons-lang3:3.18.0,
org.apache.commons:commons-compress:1.26.0,
commons-beanutils:commons-beanutils:1.11.0, org.bitbucket.b_c:jose4j:0.9.6,
io.netty:netty-handler:4.1.130.Final,
io.netty:netty-transport-native-epoll:4.1.130.Final,
org.assertj:assertj-core:3.27.7,
io.github.embeddedkafka:embedded-kafka_2.12:3.9.1) into this
dependencyManagement section so child modules (denormalizer, preprocessor,
dataset-router, extractor, transformer) inherit a single pinned version and
avoid drift.
Summary by CodeRabbit
Release Notes
Infrastructure Updates
Feature Removals
Dependencies