
Make analytics-engine an optional dependency via AnalyticsFrontEndExtension SPI#5401

Open

ahkcs wants to merge 1 commit into opensearch-project:feature/mustang-ppl-integration from ahkcs:feature/analytics-spi-e2e


ahkcs (Collaborator) commented May 1, 2026

Summary

End-to-end SQL-side wiring for the AnalyticsFrontEndExtension SPI. Lets the SQL plugin install and run on stock OpenSearch distributions that don't ship analytics-engine, while preserving today's analytics routing when both plugins are co-installed.

Fixes the "Missing plugin [analytics-engine]" install failure observed in feature-build-opensearch #203. Once this lands, Peter's install-script workaround in opensearch-build can be retired.

Pairs with opensearch-project/OpenSearch#21453 — the producer-side wiring.

Why marking analytics-engine ;optional=true alone is not enough

Two failure modes block the optional path:

  1. Guice cross-plugin injection. TransportPPLQueryAction had an @Inject QueryPlanExecutor constructor parameter. When analytics-engine (AE) is absent, that binding does not exist, so SQL plugin construction fails. Fixed by removing the @Inject parameter and resolving the executor lazily from a holder.

  2. Static analytics-framework references in SQL signatures. AnalyticsExecutorHolder previously used QueryPlanExecutor in its setter/getter signatures. Even after dropping @Inject, the first call to AnalyticsExecutorHolder.get() would resolve that return type and throw NoClassDefFoundError when analytics-framework is not on the classpath. Fixed by typing the holder as Object internally and confining the cast to RestUnifiedQueryAction.fromUnknownExecutor, which is only loaded after the holder confirms a non-null executor.

Why the bundlePlugin exclusion list is preserved

PR #5302 added a patched Calcite (CALCITE-3745) inside analytics-engine's bundled calcite-core-1.41.0.jar so that Janino uses the thread context classloader for runtime code generation. Without that patch, queries using mode=extended fail with CompileException: Cannot determine simple type name "org", because Janino's default classloader (the parent loader, i.e. AE's) cannot see SQL-plugin classes.

Removing the bundlePlugin { exclude ... } block makes SQL bundle its own unpatched Calcite/Janino, which then competes with AE's patched copy under parent-first delegation, and the doctest endpoint.md test breaks. The exclusion list is therefore retained: SQL relies on AE's patched Calcite at runtime, and ;optional=true covers only the install side, not the runtime classloader story.
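To make "uses the thread context classloader" concrete: a TCCL-aware code generator resolves class names through Thread.currentThread().getContextClassLoader() instead of its own defining loader, so it can see SQL-plugin classes only if the calling thread's context loader can. The helper below is a hypothetical sketch (ContextClassLoaders and its methods are illustrative names, not part of Calcite, Janino, or this PR) of how a caller could pin that loader around a call and restore it afterwards; whether and where the SQL plugin sets the TCCL is not described here.

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper: runs an action with a chosen thread context classloader
 * (TCCL) and always restores the previous one afterwards.
 */
final class ContextClassLoaders {
    private ContextClassLoaders() {}

    static <T> T withContextClassLoader(ClassLoader loader, Supplier<T> action) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            return action.get();
        } finally {
            // Restore even if the action throws, so a pooled thread is not left
            // pointing at another plugin's loader.
            current.setContextClassLoader(previous);
        }
    }

    /** Resolves a class name the way a TCCL-aware code generator would. */
    static Class<?> resolveViaContext(String className) throws ClassNotFoundException {
        return Class.forName(className, false, Thread.currentThread().getContextClassLoader());
    }
}
```

Restoring in a finally block matters because OpenSearch reuses threads across requests; a leaked context loader would silently change resolution for unrelated work.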

When AE is absent, the SPI never fires, RestUnifiedQueryAction is never constructed, and SQL falls through to the legacy Lucene PPL pipeline for everything.

What's in this PR

Build wiring (plugin/build.gradle)

  • extendedPlugins = ['opensearch-job-scheduler', 'analytics-engine;optional=true']. JarHell checks are skipped against the optional dependency.
  • bundlePlugin { exclude ... } block kept, with three new entries (httpcore5-h2-*.jar, httpcore5-reactive-*.jar, httpclient5-*.jar) from #5400 ("Exclude httpcore5-h2, httpcore5-reactive, httpclient5 from SQL bundle to fix jar hell").
  • Vendored libs/analytics-framework-3.7.0-SNAPSHOT.jar and libs/analytics-engine-3.7.0-SNAPSHOT.zip rebuilt. The framework and engine JARs come from the producer-side branch (with AnalyticsFrontEndExtension, AnalyticsServices, and the Guice TypeListener that pushes services to consumers). The bundled transitive deps (patched Calcite, commons-text, jts-io, accessors-smart, etc.) come from the previously vendored bundle, so the existing CALCITE-3745 patch and the Calcite runtime deps added in #5302 ("Wire analytics-engine as extendedPlugins dependency") are preserved.

SPI consumer

  • plugin/src/main/java/org/opensearch/sql/plugin/SQLAnalyticsFrontEndExtension.java — implements AnalyticsFrontEndExtension; setAnalyticsServices stashes the executor + schemaProvider into AnalyticsExecutorHolder. Kept in a class separate from SQLPlugin so SQLPlugin's bytecode does not reference any analytics-framework class. When AE is absent, ServiceLoader never touches this class.
  • plugin/src/main/resources/META-INF/services/org.opensearch.analytics.spi.AnalyticsFrontEndExtension — standard Java SPI registration.
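A minimal sketch of the consumer/producer handshake, using simplified stand-ins for the analytics-framework types. Only the names AnalyticsFrontEndExtension, AnalyticsServices, and setAnalyticsServices come from this PR; the accessor names queryPlanExecutor() and schemaProvider(), ExecutorHolderStub, and the plain-ServiceLoader ProducerPushSketch are hypothetical. The real producer side goes through analytics-engine's Guice TypeListener, which this sketch does not model.

```java
import java.util.ServiceLoader;

/** Hypothetical stand-in for the analytics-framework services bundle. */
interface AnalyticsServices {
    Object queryPlanExecutor(); // accessor name is an assumption
    Object schemaProvider();    // accessor name is an assumption
}

/** Hypothetical stand-in for the SPI interface named in this PR. */
interface AnalyticsFrontEndExtension {
    void setAnalyticsServices(AnalyticsServices services);
}

/** Minimal Object-typed stand-in for AnalyticsExecutorHolder. */
final class ExecutorHolderStub {
    private static volatile Object executor;
    private static volatile Object schemaProvider;

    private ExecutorHolderStub() {}

    static void set(Object newExecutor, Object newSchemaProvider) {
        schemaProvider = newSchemaProvider;
        executor = newExecutor;
    }

    static Object getExecutor() { return executor; }
    static Object getSchemaProvider() { return schemaProvider; }
}

/**
 * Consumer side, mirroring SQLAnalyticsFrontEndExtension: the only SQL class that
 * names the SPI types. A real provider must be a public class with a public no-arg
 * constructor, listed in META-INF/services under the interface's fully qualified name.
 */
final class SqlFrontEndExtensionSketch implements AnalyticsFrontEndExtension {
    @Override
    public void setAnalyticsServices(AnalyticsServices services) {
        ExecutorHolderStub.set(services.queryPlanExecutor(), services.schemaProvider());
    }
}

/** Producer side, simplified to plain ServiceLoader discovery (an assumption). */
final class ProducerPushSketch {
    private ProducerPushSketch() {}

    /** Pushes services to every registered consumer visible to the loader; returns how many. */
    static int pushToConsumers(AnalyticsServices services, ClassLoader loader) {
        int pushed = 0;
        for (AnalyticsFrontEndExtension extension : ServiceLoader.load(AnalyticsFrontEndExtension.class, loader)) {
            extension.setAnalyticsServices(services);
            pushed++;
        }
        return pushed;
    }
}
```

Because only the consumer class names the SPI types, a node without analytics-engine never runs discovery, never instantiates it, and so has no reason to load those types.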

Holder refactor (the runtime-correctness fix)

plugin/src/main/java/org/opensearch/sql/plugin/rest/AnalyticsExecutorHolder.java:

  • Field types changed to Object so no analytics-framework class appears in any signature loaded at SQL plugin startup.
  • Both set(...) and get*() use Object; callers cast at use sites already gated on a non-null value.

Lazy resolution at the request boundary

  • TransportPPLQueryAction: dropped @Inject QueryPlanExecutor; replaced the eager unifiedQueryHandler field with analyticsHandler(), a synchronized, double-checked lazy method.
  • SQLPlugin#createSqlAnalyticsRouter: same lazy-supplier pattern.
  • RestUnifiedQueryAction: added fromUnknownExecutor(NodeClient, ClusterService, Object, Object) factory — the only cast site for analytics-framework types. Constructor now takes a SchemaProvider (replacing the previous static OpenSearchSchemaBuilder.buildSchema(...) call).
  • RestUnifiedQueryActionTest updated.
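The holder plus lazy-resolution pattern can be sketched as follows. PlanExecutor stands in for QueryPlanExecutor, UnifiedHandlerSketch for RestUnifiedQueryAction, HolderSketch for AnalyticsExecutorHolder, and TransportActionSketch for TransportPPLQueryAction; all of these names, the single-string handle/execute methods, and the legacyPipeline function are hypothetical simplifications. The real per-query routing decision (which queries go to analytics at all) is omitted.

```java
import java.util.function.Function;

/** Stand-in for QueryPlanExecutor (hypothetical single-method shape). */
interface PlanExecutor {
    String execute(String query);
}

/** Stand-in for AnalyticsExecutorHolder: only Object appears in its signatures. */
final class HolderSketch {
    private static volatile Object executor;
    private static volatile Object schemaProvider;

    private HolderSketch() {}

    static void set(Object newExecutor, Object newSchemaProvider) {
        schemaProvider = newSchemaProvider; // written first...
        executor = newExecutor;             // ...so a visible executor implies a visible schemaProvider
    }

    static Object getExecutor() { return executor; }
    static Object getSchemaProvider() { return schemaProvider; }
}

/** Stand-in for RestUnifiedQueryAction. */
final class UnifiedHandlerSketch {
    private final PlanExecutor executor;
    private final Object schemaProvider; // kept only to mirror the real constructor

    private UnifiedHandlerSketch(PlanExecutor executor, Object schemaProvider) {
        this.executor = executor;
        this.schemaProvider = schemaProvider;
    }

    /**
     * The only cast site. In the real plugin the analytics types are named only in
     * code like this, so the JVM typically resolves them when this factory first runs
     * rather than at plugin startup. In this single-file sketch every type is present.
     */
    static UnifiedHandlerSketch fromUnknownExecutor(Object executor, Object schemaProvider) {
        return new UnifiedHandlerSketch((PlanExecutor) executor, schemaProvider);
    }

    String handle(String query) {
        return executor.execute(query);
    }
}

/** Stand-in for TransportPPLQueryAction: no @Inject of analytics types. */
final class TransportActionSketch {
    private volatile UnifiedHandlerSketch analyticsHandler;

    /** Double-checked lazy resolution; returns null while no executor has been pushed. */
    UnifiedHandlerSketch analyticsHandler() {
        UnifiedHandlerSketch handler = analyticsHandler;
        if (handler != null) {
            return handler; // fast path: already built
        }
        synchronized (this) {
            if (analyticsHandler == null) {
                Object executor = HolderSketch.getExecutor();
                if (executor == null) {
                    return null; // AE absent or not pushed yet: do not cache, retry next call
                }
                analyticsHandler = UnifiedHandlerSketch.fromUnknownExecutor(executor, HolderSketch.getSchemaProvider());
            }
            return analyticsHandler;
        }
    }

    /** Uses the analytics handler when available, otherwise the legacy pipeline. */
    String execute(String query, Function<String, String> legacyPipeline) {
        UnifiedHandlerSketch handler = analyticsHandler();
        return handler != null ? handler.handle(query) : legacyPipeline.apply(query);
    }
}
```

Two design points: the handler is not cached while the holder is empty, so a services push that arrives after the first request still takes effect; and because set() writes schemaProvider before executor, a reader whose volatile read sees a non-null executor also sees the schema provider.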

End-to-end verification

Local install + PPL queries

WITH analytics-engine:

$ bin/opensearch-plugin install --batch file://.../opensearch-job-scheduler-3.7.0.0-SNAPSHOT.zip
-> Installed opensearch-job-scheduler with folder name opensearch-job-scheduler

$ bin/opensearch-plugin install --batch file://.../analytics-engine-3.7.0-SNAPSHOT.zip
-> Installed analytics-engine with folder name analytics-engine

$ bin/opensearch-plugin install --batch file://.../opensearch-sql-3.7.0.0-SNAPSHOT.zip
-> Installed opensearch-sql with folder name opensearch-sql

$ curl -s 'http://.../_cat/plugins?v'
name        component                version
...         analytics-engine         3.7.0-SNAPSHOT
...         opensearch-job-scheduler 3.7.0.0-SNAPSHOT
...         opensearch-sql           3.7.0.0-SNAPSHOT

$ curl -s -X POST '.../_plugins/_ppl/_explain' -H 'Content-Type: application/json' \
       -d '{"query":"source = parquet_logs | head 1"}'
{
  "calcite": {
    "logical": "LogicalSystemLimit(fetch=[10000], type=[QUERY_SIZE_LIMIT])\n  LogicalSort(fetch=[1])\n    LogicalTableScan(table=[[opensearch, parquet_logs]])\n"
  }
}

The calcite.logical block proves the SPI push lifecycle fired end-to-end: consumer received services → holder populated → lazy handler built → request dispatched to analytics-engine.

WITHOUT analytics-engine:

$ bin/opensearch-plugin install --batch file://.../opensearch-sql-3.7.0.0-SNAPSHOT.zip
2026-05-01T... main WARN Missing plugin [analytics-engine], dependency of [opensearch-sql]
-> Installed opensearch-sql with folder name opensearch-sql

Lucene PPL works:

$ curl -s -X POST '.../_plugins/_ppl' -H 'Content-Type: application/json' \
       -d '{"query":"source = accounts | fields name, age"}'
{ "schema":[{"name":"name","type":"string"},{"name":"age","type":"int"}],
  "datarows":[["Alice",30],["Bob",25]], "total":2, "size":2 }

parquet_* queries fall through to the legacy path and return a clean IndexNotFoundException, not a NoClassDefFoundError:

{ "error":{ "reason":"Error occurred in OpenSearch engine: no such index [parquet_logs]",
            "type":"IndexNotFoundException" }, "status":404 }

Doctest

$ ./gradlew :doctest:doctest
Ran 75 tests in 25.255s
OK
BUILD SUCCESSFUL in 1m 2s

Test plan

  • ./gradlew :opensearch-sql-plugin:spotlessCheck — passes
  • ./gradlew :opensearch-sql-plugin:compileJava :opensearch-sql-plugin:compileTestJava — passes
  • ./gradlew :opensearch-sql-plugin:test --tests "*RestUnifiedQueryActionTest*" — passes
  • ./gradlew :doctest:doctest — 75/75 pass
  • End-to-end install + PPL query, WITH AE — passes; SPI push fires; analytics routing via Calcite plan confirmed in _explain
  • End-to-end install + PPL query, WITHOUT AE — passes; parquet_* returns clean IndexNotFoundException, not NoClassDefFoundError

Related

github-actions Bot commented May 1, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 35b2fd1.

Path | Line | Severity | Description
libs/analytics-engine-3.7.0-SNAPSHOT.zip | 1 | high | Binary artifact replaced: the ZIP contents changed (index hash fedfaa441a vs 601225ea10) but the version string stayed at 3.7.0-SNAPSHOT. Pre-built binaries in-repo are opaque; the diff cannot reveal what code was added or removed. Maintainers must verify the new artifact matches a known-good build from a trusted CI pipeline.
libs/analytics-framework-3.7.0-SNAPSHOT.jar | 1 | high | Binary JAR replaced: hash changed (a28474484f vs a5cfcc3294) while keeping the same 3.7.0-SNAPSHOT version label. A silently swapped JAR with the same version is a classic supply-chain substitution vector. Artifact authenticity must be confirmed against a signed build artifact or reproducible build.
plugin/build.gradle | 58 | high | Build dependency configuration changed: analytics-engine moved from a required to an optional extended plugin, and the entire bundlePlugin exclusion block (which controlled exactly which JARs from analytics-engine were NOT rebundled) was removed. This alters the plugin's dependency resolution and classloader boundary at build time. Any unintended JAR now included in the bundle could introduce malicious or vulnerable classes.

The table above displays the top 10 most important findings.

Total: 3 | Critical: 0 | High: 3 | Medium: 0 | Low: 0


Pull Request Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

ahkcs force-pushed the feature/analytics-spi-e2e branch 2 times, most recently from 38d39d4 to 3d19ed1, May 1, 2026 20:08
ahkcs added the maintenance label ("Improves code quality, but not the product") May 1, 2026
…ension SPI

End-to-end consumer-side wiring for the AnalyticsFrontEndExtension SPI
(analytics-framework). Lets the SQL plugin install and run on stock
OpenSearch distributions that don't ship analytics-engine, while
preserving the analytics routing behavior when both plugins are
co-installed.

Build wiring (plugin/build.gradle):
- Mark analytics-engine as ;optional=true in extendedPlugins. OpenSearch's
  PluginsService.checkBundleJarHell skips both URL-overlap and
  class-level JarHell against optional deps, so SQL bundles its own
  copies of the previously-stripped jars (Calcite, Guava, etc.) again.
- Drop the entire bundlePlugin exclusion list — the hand-maintained
  jar-deduplication hack goes away with the SPI.
- Vendor the rebuilt analytics-framework-3.7.0-SNAPSHOT.jar (with the
  new AnalyticsFrontEndExtension + AnalyticsServices types) and the
  rebuilt analytics-engine-3.7.0-SNAPSHOT.zip (with the producer-side
  Guice TypeListener that pushes services to consumers).

SPI consumer (SQLAnalyticsFrontEndExtension.java + META-INF/services
registration):
- Implements AnalyticsFrontEndExtension; setAnalyticsServices stashes
  the executor + schemaProvider into AnalyticsExecutorHolder.
- Kept in a class separate from SQLPlugin so SQLPlugin's bytecode
  doesn't reference any analytics-framework class. When AE is absent,
  ServiceLoader never touches this class and SQL boots without
  analytics-framework on its runtime classpath.

Holder isolation (AnalyticsExecutorHolder.java):
- Refactored to store services as Object internally — no
  analytics-framework type appears in any signature loaded at SQL
  plugin startup. Callers cast at use sites that are already gated on
  a non-null value.
- Added schemaProvider alongside the executor.

Lazy resolution at the request boundary:
- TransportPPLQueryAction: drop the @Inject QueryPlanExecutor
  constructor parameter (the cross-plugin Guice dependency that broke
  the optional path). Replace with analyticsHandler() — null-checked
  lazy resolution from the holder; falls through to the legacy PPL
  path when AE is absent.
- SQLPlugin#createSqlAnalyticsRouter: same lazy-supplier pattern
  against the new Object-typed holder + RestUnifiedQueryAction
  factory.
- RestUnifiedQueryAction: add fromUnknownExecutor(Object, Object)
  factory (the only cast site for the analytics-framework types).
  Replaced static OpenSearchSchemaBuilder.buildSchema() call with the
  injected SchemaProvider so the runtime no longer has a hard static
  reference to analytics-engine.

Verified end-to-end with bin/opensearch-plugin install on a real
OpenSearch 3.7 distribution:
- WITH analytics-engine: all three plugins install; Lucene PPL works;
  parquet_* PPL routes through the analytics engine (confirmed via
  /_plugins/_ppl/_explain showing Calcite logical plan).
- WITHOUT analytics-engine: SQL + job-scheduler install (warning
  logged for missing optional AE); Lucene PPL works; parquet_* PPL
  falls through to legacy with a clean IndexNotFoundException — no
  NoClassDefFoundError, no startup crash.

Pairs with the producer-side wiring in opensearch-project/OpenSearch.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
ahkcs force-pushed the feature/analytics-spi-e2e branch from 3d19ed1 to 35b2fd1, May 1, 2026 21:10
ahkcs added the skip-diff-analyzer ("Maintainer to skip code-diff-analyzer check, after reviewing issues in AI analysis.") and skip-diff-reviewer ("Maintainer to skip code-diff-reviewer check, after reviewing issues in AI analysis.") labels May 1, 2026
