diff --git a/fe/fe-filesystem/ENV_TEST_README.md b/fe/fe-filesystem/ENV_TEST_README.md index 965df0ba5770b2..eda5ee8a5b8358 100644 --- a/fe/fe-filesystem/ENV_TEST_README.md +++ b/fe/fe-filesystem/ENV_TEST_README.md @@ -1,15 +1,36 @@ + + # Filesystem Environment Tests -本目录下的环境测试(Layer 2)需要真实的云存储/HDFS/Broker 服务才能运行。默认 CI 构建会自动跳过它们。 +The environment tests (Layer 2) in this directory require real cloud storage / HDFS / Broker +services to run. They are automatically skipped in default CI builds. -## 快速开始 +## Quick Start -### 使用辅助脚本 +### Using the Helper Script -仓库根目录提供了 `run-fs-env-test.sh`,支持通过命令行参数或预设环境变量运行: +The repository root provides `run-fs-env-test.sh`, which supports running tests via command-line +arguments or pre-set environment variables: ```bash -# S3 测试 +# S3 tests ./run-fs-env-test.sh s3 \ --s3-endpoint=https://s3.us-east-1.amazonaws.com \ --s3-region=us-east-1 \ @@ -17,25 +38,25 @@ --s3-ak=AKIAIOSFODNN7EXAMPLE \ --s3-sk=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY -# Azure 测试 +# Azure tests ./run-fs-env-test.sh azure \ --azure-account=myaccount \ --azure-key=base64key== \ --azure-container=testcontainer -# HDFS 测试(Simple Auth) +# HDFS tests (Simple Auth) ./run-fs-env-test.sh hdfs \ --hdfs-host=namenode.example.com \ --hdfs-port=8020 -# Kerberos 测试 +# Kerberos tests ./run-fs-env-test.sh kerberos \ --kdc-principal=hdfs/namenode@REALM \ --kdc-keytab=/path/to/hdfs.keytab \ --hdfs-host=namenode.example.com \ --hdfs-port=8020 -# COS(腾讯云)测试 +# COS (Tencent Cloud) tests ./run-fs-env-test.sh cos \ --cos-endpoint=https://cos.ap-guangzhou.myqcloud.com \ --cos-region=ap-guangzhou \ @@ -43,7 +64,7 @@ --cos-ak=SecretId \ --cos-sk=SecretKey -# OSS(阿里云)测试 +# OSS (Alibaba Cloud) tests ./run-fs-env-test.sh oss \ --oss-endpoint=https://oss-cn-hangzhou.aliyuncs.com \ --oss-region=cn-hangzhou \ @@ -51,7 +72,7 @@ --oss-ak=AccessKeyId \ --oss-sk=AccessKeySecret -# OBS(华为云)测试 +# OBS (Huawei Cloud) tests ./run-fs-env-test.sh obs \ 
--obs-endpoint=https://obs.cn-north-4.myhuaweicloud.com \ --obs-region=cn-north-4 \ @@ -59,40 +80,40 @@ --obs-ak=AK \ --obs-sk=SK -# Broker 测试 +# Broker tests ./run-fs-env-test.sh broker \ --broker-host=broker.example.com \ --broker-port=8060 -# 运行全部环境测试(需要所有环境变量已预设) +# Run all environment tests (requires all environment variables to be pre-set) ./run-fs-env-test.sh all ``` -### 使用 Maven 直接运行 +### Running Directly with Maven -也可以先导出环境变量,然后使用 Maven 命令直接运行: +You can also export environment variables first and then run tests directly with Maven: ```bash -# 导出凭据 +# Export credentials export DORIS_FS_TEST_S3_ENDPOINT=https://s3.us-east-1.amazonaws.com export DORIS_FS_TEST_S3_REGION=us-east-1 export DORIS_FS_TEST_S3_BUCKET=my-test-bucket export DORIS_FS_TEST_S3_AK=your-access-key export DORIS_FS_TEST_S3_SK=your-secret-key -# 运行 S3 相关环境测试 +# Run S3-related environment tests cd fe -mvn test -pl fe-filesystem \ - -Dsurefire.excludedGroups= \ +mvn test -pl fe-filesystem/fe-filesystem-s3 \ + -Dtest.excludedGroups=none \ -Dgroups=s3 \ -Dcheckstyle.skip=true \ -DfailIfNoTests=false \ -Dmaven.build.cache.enabled=false \ --also-make -# 运行 HDFS + Kerberos 环境测试 -mvn test -pl fe-filesystem \ - -Dsurefire.excludedGroups= \ +# Run HDFS + Kerberos environment tests +mvn test -pl fe-filesystem/fe-filesystem-hdfs \ + -Dtest.excludedGroups=none \ -Dgroups="hdfs | kerberos" \ -Dcheckstyle.skip=true \ -DfailIfNoTests=false \ @@ -100,90 +121,101 @@ mvn test -pl fe-filesystem \ --also-make ``` -## 环境变量一览 +## Environment Variables Reference -| Tag | 环境变量 | 说明 | -|-----|---------|------| -| `s3` | `DORIS_FS_TEST_S3_ENDPOINT` | S3 兼容存储端点 | -| | `DORIS_FS_TEST_S3_REGION` | 区域 | -| | `DORIS_FS_TEST_S3_BUCKET` | 测试桶名 | +| Tag | Environment Variable | Description | +|-----|---------------------|-------------| +| `s3` | `DORIS_FS_TEST_S3_ENDPOINT` | S3-compatible storage endpoint | +| | `DORIS_FS_TEST_S3_REGION` | Region | +| | `DORIS_FS_TEST_S3_BUCKET` | Test bucket name | | | `DORIS_FS_TEST_S3_AK` 
| Access Key | | | `DORIS_FS_TEST_S3_SK` | Secret Key | -| `azure` | `DORIS_FS_TEST_AZURE_ACCOUNT` | Azure Storage 账户名 | -| | `DORIS_FS_TEST_AZURE_KEY` | 账户密钥 | -| | `DORIS_FS_TEST_AZURE_CONTAINER` | 测试容器名 | -| `cos` | `DORIS_FS_TEST_COS_ENDPOINT` | COS 端点 | -| | `DORIS_FS_TEST_COS_REGION` | 区域(如 ap-guangzhou) | -| | `DORIS_FS_TEST_COS_BUCKET` | 测试桶名 | +| `azure` | `DORIS_FS_TEST_AZURE_ACCOUNT` | Azure Storage account name | +| | `DORIS_FS_TEST_AZURE_KEY` | Account key | +| | `DORIS_FS_TEST_AZURE_CONTAINER` | Test container name | +| `cos` | `DORIS_FS_TEST_COS_ENDPOINT` | COS endpoint | +| | `DORIS_FS_TEST_COS_REGION` | Region (e.g., ap-guangzhou) | +| | `DORIS_FS_TEST_COS_BUCKET` | Test bucket name | | | `DORIS_FS_TEST_COS_AK` | SecretId | | | `DORIS_FS_TEST_COS_SK` | SecretKey | -| `oss` | `DORIS_FS_TEST_OSS_ENDPOINT` | OSS 端点 | -| | `DORIS_FS_TEST_OSS_REGION` | 区域 | -| | `DORIS_FS_TEST_OSS_BUCKET` | 测试桶名 | +| `oss` | `DORIS_FS_TEST_OSS_ENDPOINT` | OSS endpoint | +| | `DORIS_FS_TEST_OSS_REGION` | Region | +| | `DORIS_FS_TEST_OSS_BUCKET` | Test bucket name | | | `DORIS_FS_TEST_OSS_AK` | Access Key | | | `DORIS_FS_TEST_OSS_SK` | Secret Key | -| `obs` | `DORIS_FS_TEST_OBS_ENDPOINT` | OBS 端点 | -| | `DORIS_FS_TEST_OBS_REGION` | 区域 | -| | `DORIS_FS_TEST_OBS_BUCKET` | 测试桶名 | +| `obs` | `DORIS_FS_TEST_OBS_ENDPOINT` | OBS endpoint | +| | `DORIS_FS_TEST_OBS_REGION` | Region | +| | `DORIS_FS_TEST_OBS_BUCKET` | Test bucket name | | | `DORIS_FS_TEST_OBS_AK` | Access Key | | | `DORIS_FS_TEST_OBS_SK` | Secret Key | -| `hdfs` | `DORIS_FS_TEST_HDFS_HOST` | NameNode 地址 | -| | `DORIS_FS_TEST_HDFS_PORT` | NameNode 端口 | -| `kerberos` | `DORIS_FS_TEST_KDC_PRINCIPAL` | Kerberos 主体 | -| | `DORIS_FS_TEST_KDC_KEYTAB` | keytab 文件路径 | -| | `DORIS_FS_TEST_HDFS_HOST` | 启用 Kerberos 的 HDFS 地址 | -| `broker` | `DORIS_FS_TEST_BROKER_HOST` | Broker 进程地址 | -| | `DORIS_FS_TEST_BROKER_PORT` | Broker 进程端口 | - -## 测试用例概览 - -### T-E1: S3ObjStorage 环境测试(5 tests, tag: `s3`) -- `putAndHeadObject` — 上传小文件 → 
headObject 验证 size 和 etag -- `listObjects` — 上传多个文件 → listObjects 验证返回数量 -- `copyAndDeleteObject` — 上传 → copy → 验证 → delete → 验证不存在 -- `multipartUpload_completeSucceeds` — initiate → uploadPart × 2 → complete → headObject 验证 -- `abortMultipartUpload_leavesNoObject` — initiate → abort → 对象不存在 - -### T-E2: S3FileSystem 环境测试(7 tests, tag: `s3`) -- `exists` — 已存在/不存在的对象 -- `deleteRemovesObject` — 上传 → delete → exists false -- `renameMovesObject` — 上传 → rename → 旧不存在/新存在 -- `listReturnsCorrectEntries` — 上传多个 → list 验证 -- `inputOutputRoundTrip` — 写入 → 读取 → 内容一致(含 UTF-8 + emoji) -- `inputFileLength` — 上传已知大小 → length() 验证 - -### T-E3: Azure 环境测试(7 tests, tag: `azure`) -测试项同 T-E2,使用 `wasbs://` scheme。 - -### T-E3b/c/d: COS / OSS / OBS 环境测试(各 7 tests, tag: `cos` / `oss` / `obs`) -测试项同 T-E2,分别使用腾讯云/阿里云/华为云 SDK。 - -### T-E4: DFSFileSystem 环境测试(8 tests, tag: `hdfs`) -- `mkdirsAndExists` — 创建多级目录 → exists true -- `deleteRecursive` — 创建含文件的目录 → 递归删除 -- `renameFile` / `renameDirectory` — 文件/目录重命名 -- `listFiles` / `listDirectories` — 列举文件/子目录 -- `inputOutputRoundTrip` — 写入 → 读取 → 内容一致 -- `inputFileLength` — 验证文件大小 - -### T-E5: Kerberos 环境测试(4 tests, tag: `kerberos`) -- `loginSucceeds` — 使用真实 principal/keytab 登录 -- `doAsExecutesAction` — 代理执行返回值验证 -- `doAsPropagatesIOException` — IOException 正确传播 -- `hdfsOperationWithKerberos` — Kerberos 模式下 HDFS exists 可正常工作 - -### T-E6: Broker 环境测试(4 tests, tag: `broker`) -- `existsReturnsFalseForMissing` — 不存在的路径 → false -- `writeAndRead` — 通过 outputFile 写入 → inputFile 读取 → 一致 -- `deleteRemovesFile` — 写入 → delete → exists false -- `listReturnsFiles` — 写入多个 → list 验证 - -## 注意事项 - -1. **每次测试都会在目标存储中创建带唯一 UUID 前缀的临时数据**,`@AfterAll` 会尝试清理。如果测试异常中断,可能需要手动清理。 - -2. **S3 multipart upload 测试**会上传 ~5MB 数据,请确保测试桶有足够的空间和权限。 - -3. **环境测试默认不运行**(通过 Maven Surefire 的 `environment` 配置)。只有通过 `-Dsurefire.excludedGroups= -Dgroups=` 显式启用时才会运行。 - -4. 
**请勿将凭据提交到代码仓库中**。建议使用环境变量或 CI/CD 密钥管理。 +| `hdfs` | `DORIS_FS_TEST_HDFS_HOST` | NameNode address | +| | `DORIS_FS_TEST_HDFS_PORT` | NameNode port | +| `kerberos` | `DORIS_FS_TEST_KDC_PRINCIPAL` | Kerberos principal | +| | `DORIS_FS_TEST_KDC_KEYTAB` | Keytab file path | +| | `DORIS_FS_TEST_HDFS_HOST` | Kerberos-enabled HDFS address | +| | `DORIS_FS_TEST_HDFS_PORT` | NameNode port (optional, defaults to 8020) | +| `broker` | `DORIS_FS_TEST_BROKER_HOST` | Broker process address | +| | `DORIS_FS_TEST_BROKER_PORT` | Broker process port | + +## Test Case Overview + +### T-E1: S3ObjStorage Environment Tests (6 tests, tag: `s3`) +- `putAndHeadObject` — Upload a small file → verify size and etag via headObject +- `listObjects` — Upload multiple files → verify returned count via listObjects +- `copyAndDeleteObject` — Upload → copy → verify → delete → verify non-existence +- `multipartUpload_completeSucceeds` — initiate → uploadPart × 2 → complete → verify via headObject +- `abortMultipartUpload_leavesNoObject` — initiate → abort → object does not exist +- `getPresignedUrl_returnsValidUrlAndUploadWorks` — Generate presigned URL → PUT upload → verify object exists + +### T-E2: S3FileSystem Environment Tests (8 tests, tag: `s3`) +- `exists` — Existing / non-existing objects +- `deleteRemovesObject` — Upload → delete → exists returns false +- `renameMovesObject` — Upload → rename → old does not exist / new exists +- `listReturnsCorrectEntries` — Upload multiple → verify via list +- `inputOutputRoundTrip` — Write → read → content matches (including UTF-8 + emoji) +- `inputFileLength` — Upload with known size → verify length() +- `getPresignedUrl_returnsValidUrlAndUploadWorks` — Generate presigned URL → PUT upload → verify object exists + +### T-E3: Azure Environment Tests (8 tests, tag: `azure`) +Same test cases as T-E2, using the `wasbs://` scheme. 
+ +### T-E3b/c/d: COS / OSS / OBS Environment Tests (8 tests each, tags: `cos` / `oss` / `obs`) +Same test cases as T-E2, using the Tencent Cloud / Alibaba Cloud / Huawei Cloud SDKs +respectively. + +### T-E4: DFSFileSystem Environment Tests (8 tests, tag: `hdfs`) +- `mkdirsAndExists` — Create multi-level directories → exists returns true +- `deleteRecursive` — Create a directory with files → recursive delete +- `renameFile` / `renameDirectory` — File / directory rename +- `listFiles` / `listDirectories` — List files / subdirectories +- `inputOutputRoundTrip` — Write → read → content matches +- `inputFileLength` — Verify file size + +### T-E5: Kerberos Environment Tests (4 tests, tag: `kerberos`) +- `loginSucceeds` — Log in with a real principal / keytab +- `doAsExecutesAction` — Verify return value of proxied execution +- `doAsPropagatesIOException` — IOException is correctly propagated +- `hdfsOperationWithKerberos` — HDFS exists works correctly under Kerberos + +### T-E6: Broker Environment Tests (4 tests, tag: `broker`) +- `existsReturnsFalseForMissing` — Non-existing path → false +- `writeAndRead` — Write via outputFile → read via inputFile → content matches +- `deleteRemovesFile` — Write → delete → exists returns false +- `listReturnsFiles` — Write multiple → verify via list + +## Notes + +1. **Each test creates temporary data with a unique UUID prefix in the target storage**. + `@AfterAll` attempts cleanup. If a test is interrupted abnormally, manual cleanup may be + required. + +2. **S3 multipart upload tests** upload ~5 MB of data. Ensure the test bucket has sufficient + space and permissions. + +3. **Environment tests are not run by default** (the module POM sets + `${test.excludedGroups}` with a default value of + `environment`). They only run when explicitly enabled with + `-Dtest.excludedGroups=none -Dgroups=`. + +4. **Never commit credentials to the code repository**. Use environment variables or CI/CD + secret management instead. 
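
The skip/run decision for each tag ultimately reduces to checking that the tag's required environment variables are present. A minimal sketch of such a check (the `EnvGuard` helper and its API are hypothetical, not part of the test suite; the variable names come from the table above):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper: report which required variables are missing for a tag. */
public class EnvGuard {
    public static final List<String> S3_VARS = List.of(
            "DORIS_FS_TEST_S3_ENDPOINT", "DORIS_FS_TEST_S3_REGION",
            "DORIS_FS_TEST_S3_BUCKET", "DORIS_FS_TEST_S3_AK", "DORIS_FS_TEST_S3_SK");

    /** Returns the subset of required variables that is absent or empty in env. */
    public static List<String> missing(Map<String, String> env, List<String> required) {
        return required.stream()
                .filter(v -> env.getOrDefault(v, "").isEmpty())
                .collect(Collectors.toList());
    }
}
```

In a real test class this kind of check would typically feed a JUnit `Assumptions.assumeTrue(...)` so the test reports as skipped, not failed, when credentials are absent.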
diff --git a/fe/fe-filesystem/README.md b/fe/fe-filesystem/README.md new file mode 100644 index 00000000000000..8b1a069f4b46d1 --- /dev/null +++ b/fe/fe-filesystem/README.md @@ -0,0 +1,406 @@ + + +# Doris FE Filesystem Module + +The `fe-filesystem` module provides a **pluggable filesystem abstraction** for the Doris FE +(Frontend). It decouples the query engine (`fe-core`) from concrete storage backends (S3, HDFS, +Azure Blob, etc.) at both the Maven dependency level and the runtime classpath level. + +Each storage backend is a self-contained plugin that is discovered and loaded at FE startup—no +modification to `fe-core` is required to add a new storage backend. + +## Module Structure + +``` +fe-filesystem/ (aggregator POM — no Java code) +│ +├── fe-filesystem-api/ [API] Core abstractions (FileSystem, Location, …) +├── fe-filesystem-spi/ [SPI] Provider interface + object-storage contracts +│ +├── fe-filesystem-s3/ [IMPL] AWS S3 / S3-compatible (MinIO, …) +├── fe-filesystem-oss/ [IMPL] Alibaba Cloud OSS (delegates to S3) +├── fe-filesystem-cos/ [IMPL] Tencent Cloud COS (delegates to S3) +├── fe-filesystem-obs/ [IMPL] Huawei Cloud OBS (delegates to S3) +├── fe-filesystem-azure/ [IMPL] Azure Blob Storage +├── fe-filesystem-hdfs/ [IMPL] HDFS / ViewFS / OFS / JFS +├── fe-filesystem-local/ [IMPL] Local filesystem (testing only) +└── fe-filesystem-broker/ [IMPL] Doris Broker process (Thrift RPC) +``` + +### Layering + +| Layer | Module | Compiled into fe-core? | Deployed as plugin? | +|-------|--------|:-----:|:------:| +| **API** | `fe-filesystem-api` | ✅ Yes | ❌ No | +| **SPI** | `fe-filesystem-spi` | ✅ Yes | ❌ No | +| **IMPL** | `fe-filesystem-s3`, `-hdfs`, … | ❌ No | ✅ Yes | + +* **API** — Pure-JDK interfaces and value types consumed by `fe-core` (zero third-party + dependencies). Defines `FileSystem`, `Location`, `FileEntry`, `DorisInputFile`, + `DorisOutputFile`, etc. 
* **SPI** — The `FileSystemProvider` interface (extends `PluginFactory` from `fe-extension-spi`)
+  plus the object-storage layer (`ObjStorage`, `ObjFileSystem`, `HadoopAuthenticator`). Also
+  compiled into `fe-core`.
+* **IMPL** — Concrete backends. Each one depends on `fe-filesystem-spi` (and transitively on
+  `fe-filesystem-api`). S3-delegating backends (OSS, COS, OBS) also depend on `fe-filesystem-s3`
+  to reuse `S3FileSystem`. They **must not** depend on `fe-core`, `fe-common`, or `fe-catalog`.
+
+## How It Works
+
+### Plugin Discovery & Loading
+
+At FE startup, `Env.initFileSystemPluginManager()` creates a `FileSystemPluginManager` and loads
+providers in two phases:
+
+1. **ServiceLoader scan** — Discovers `FileSystemProvider` implementations already on the
+   classpath (built-in providers, test overrides).
+2. **Directory plugin scan** — Uses `DirectoryPluginRuntimeManager` to scan the directory
+   configured by `Config.filesystem_plugin_root` (default: `${DORIS_HOME}/plugins/filesystem`).
+   Each direct child directory is treated as an **unpacked** plugin directory. The runtime
+   manager resolves `pluginDir/*.jar` (root-level jars, scanned for ServiceLoader registration)
+   and `pluginDir/lib/*.jar` (dependency jars, available for class loading only).
+
+Classpath providers have higher priority than directory-loaded providers.
+
+### Provider Selection
+
+When `fe-core` needs a filesystem, it calls `FileSystemFactory.getFileSystem(properties)`:
+
+```
+Map<String, String> properties
+        │
+        ▼
+FileSystemPluginManager.createFileSystem(properties)
+        │
+        │  for each registered FileSystemProvider:
+        │    1. provider.supports(properties)   ← cheap, no I/O
+        │    2. if true → provider.create(properties) → return FileSystem
+        │
+        ▼
+First matching provider wins
+```
+
+Each provider's `supports()` method examines property keys to decide if it can handle the
For example: + +| Provider | Matching Logic | +|----------|---------------| +| S3 | `AWS_ACCESS_KEY` + (`AWS_ENDPOINT` or `AWS_REGION`) | +| OSS | Endpoint contains `aliyuncs.com` or `_STORAGE_TYPE_` = `"OSS"` | +| HDFS | `_STORAGE_TYPE_` = `"HDFS"` or URI scheme is `hdfs`/`viewfs`/`ofs`/`jfs`/`oss` | +| Azure | `AZURE_ACCOUNT_NAME` or endpoint contains `blob.core.windows.net` | +| Local | URI starts with `file://` or `local://` | +| Broker | `_STORAGE_TYPE_` = `"BROKER"` and `BROKER_HOST` present | + +### Plugin Packaging & Class Loading + +Each implementation module uses `maven-assembly-plugin` to produce a **zip build artifact**. +The zip must be **unpacked** before deployment. At runtime, `DirectoryPluginRuntimeManager` +expects each plugin to be an unpacked directory under `filesystem_plugin_root`: + +``` +${DORIS_HOME}/plugins/filesystem/ +├── s3/ ← one directory per plugin +│ ├── doris-fe-filesystem-s3.jar ← plugin jar at root (scanned for ServiceLoader) +│ └── lib/ +│ ├── aws-sdk-*.jar ← third-party dependencies +│ └── ... +├── hdfs/ +│ ├── doris-fe-filesystem-hdfs.jar +│ └── lib/... +└── ... +``` + +The Maven build produces a zip with the same layout. To deploy, unzip it into the appropriate +subdirectory (e.g., `unzip doris-fe-filesystem-s3.zip -d plugins/filesystem/s3/`). Dropping +the raw `.zip` file into the directory will **not** work — `DirectoryPluginRuntimeManager` +resolves `pluginDir/*.jar` and `pluginDir/lib/*.jar` from unpacked directories only. + +Jars that are already on the fe-core classpath (`fe-filesystem-api`, `fe-filesystem-spi`, +`fe-extension-spi`) are **excluded** from the zip to avoid duplication. + +At runtime, `DirectoryPluginRuntimeManager` creates an isolated `ClassLoader` per plugin. A +`ClassLoadingPolicy` ensures that shared framework classes (`org.apache.doris.filesystem.*`, +`software.amazon.awssdk.*`, `org.apache.hadoop.*`) are loaded parent-first to avoid cross- +ClassLoader cast failures. 
+ +### S3 Delegation Pattern + +Several cloud providers (OSS, COS, OBS) are S3-compatible. Instead of duplicating the S3 +implementation, they follow a **delegation pattern**: + +1. Translate cloud-native property keys to S3-compatible keys. +2. Extend `ObjStorage` to override cloud-specific operations (pre-signed URLs, STS tokens). +3. Delegate core I/O to `S3FileSystem` (from `fe-filesystem-s3`). + +``` +CosFileSystemProvider + └─→ creates S3FileSystem(CosObjStorage) + │ │ + │ FileSystem ops │ Overrides getPresignedUrl(), getStsToken() + └─────────────────────┘ +``` + +## Relationship to Other Modules + +``` +┌──────────────────────────────────────────────────────────┐ +│ fe-core │ +│ Uses: FileSystem, Location, FileEntry (from API) │ +│ Depends on: fe-filesystem-api, fe-filesystem-spi │ +│ Loads plugins from: ${DORIS_HOME}/plugins/filesystem │ +└────────────┬────────────────────────┬────────────────────┘ + │ compile-time │ runtime (plugin) + ▼ ▼ + ┌──────────────────┐ ┌─────────────────────────┐ + │ fe-filesystem-api│ │ fe-filesystem-s3 (zip) │ + │ fe-filesystem-spi│ │ fe-filesystem-hdfs (zip)│ + │ (compiled in) │ │ fe-filesystem-xxx (zip) │ + └──────────────────┘ └─────────────────────────┘ +``` + +* **`fe-core`** — Compile-time dependency on `fe-filesystem-api` and `fe-filesystem-spi`. + Implementation modules are **not** Maven dependencies of `fe-core`; they are loaded at runtime + from the plugin directory. +* **`fe-extension-spi`** — Provides the `PluginFactory` / `Plugin` interfaces and the + `DirectoryPluginRuntimeManager` class used by the plugin loading infrastructure. +* **`fe-filesystem-local`** — Used in `fe-core` unit tests (test-scope dependency) so that tests + can exercise `FileSystem` operations without cloud credentials. + +## Adding a New Filesystem Sub-Module + +Follow these steps to add support for a new storage backend (e.g., Google Cloud Storage): + +### 1. 
Create the Maven module
+
+Create a new directory `fe-filesystem/fe-filesystem-gcs/` with this structure:
+
+```
+fe-filesystem-gcs/
+├── pom.xml
+└── src/
+    ├── main/
+    │   ├── assembly/
+    │   │   └── plugin-zip.xml
+    │   ├── java/org/apache/doris/filesystem/gcs/
+    │   │   ├── GcsFileSystemProvider.java
+    │   │   ├── GcsFileSystem.java
+    │   │   └── GcsObjStorage.java        (if object-storage based)
+    │   └── resources/META-INF/services/
+    │       └── org.apache.doris.filesystem.spi.FileSystemProvider
+    └── test/
+        └── java/org/apache/doris/filesystem/gcs/
+            └── ...Test.java
+```
+
+### 2. Write `pom.xml`
+
+```xml
+<project>
+    <parent>
+        <groupId>org.apache.doris</groupId>
+        <artifactId>fe-filesystem</artifactId>
+        <version>${revision}</version>
+    </parent>
+
+    <artifactId>fe-filesystem-gcs</artifactId>
+    <packaging>jar</packaging>
+    <name>Doris FE Filesystem - GCS</name>
+
+    <dependencies>
+        <dependency>
+            <groupId>org.apache.doris</groupId>
+            <artifactId>fe-filesystem-spi</artifactId>
+            <version>${revision}</version>
+        </dependency>
+
+        <dependency>
+            <groupId>com.google.cloud</groupId>
+            <artifactId>google-cloud-storage</artifactId>
+            <version>...</version>
+        </dependency>
+
+        <dependency>
+            <groupId>org.junit.jupiter</groupId>
+            <artifactId>junit-jupiter</artifactId>
+            <scope>test</scope>
+        </dependency>
+    </dependencies>
+
+    <build>
+        <finalName>doris-fe-filesystem-gcs</finalName>
+        <plugins>
+            <plugin>
+                <artifactId>maven-assembly-plugin</artifactId>
+                <configuration>
+                    <appendAssemblyId>false</appendAssemblyId>
+                    <descriptors>
+                        <descriptor>src/main/assembly/plugin-zip.xml</descriptor>
+                    </descriptors>
+                </configuration>
+                <executions>
+                    <execution>
+                        <id>make-assembly</id>
+                        <phase>package</phase>
+                        <goals>
+                            <goal>single</goal>
+                        </goals>
+                    </execution>
+                </executions>
+            </plugin>
+        </plugins>
+    </build>
+</project>
+```
+
+**Important**: Do **not** add dependencies on `fe-core`, `fe-common`, or `fe-catalog`.
+
+### 3. Write the assembly descriptor
+
+Copy `plugin-zip.xml` from an existing module (e.g., `fe-filesystem-s3/src/main/assembly/`).
+The key points:
+
+* Place the plugin jar at the **root** of the zip (for ServiceLoader discovery).
+* Place all runtime dependencies in `lib/`.
+* **Exclude** `fe-filesystem-api`, `fe-filesystem-spi`, and `fe-extension-spi` (they are
+  already on the fe-core classpath).
+
+### 4.
Implement `FileSystemProvider`
+
+```java
+package org.apache.doris.filesystem.gcs;
+
+import org.apache.doris.filesystem.FileSystem;
+import org.apache.doris.filesystem.spi.FileSystemProvider;
+import java.io.IOException;
+import java.util.Map;
+
+public class GcsFileSystemProvider implements FileSystemProvider {
+
+    // Public no-arg constructor — required by ServiceLoader
+    public GcsFileSystemProvider() {}
+
+    @Override
+    public boolean supports(Map<String, String> properties) {
+        // Must be cheap (no network calls) and deterministic.
+        String type = properties.get("_STORAGE_TYPE_");
+        return "GCS".equalsIgnoreCase(type);
+    }
+
+    @Override
+    public FileSystem create(Map<String, String> properties) throws IOException {
+        return new GcsFileSystem(properties);
+    }
+
+    @Override
+    public String name() {
+        return "GCS";
+    }
+}
+```
+
+### 5. Implement `FileSystem`
+
+You have two choices:
+
+* **Object-storage backend** — Extend `ObjFileSystem` (from SPI) and implement `ObjStorage`.
+  This gives you `exists()`, `close()`, and cloud-specific delegates for free.
+* **Custom backend** — Implement `FileSystem` directly (like `DFSFileSystem` for HDFS).
+
+At minimum, you must implement:
+
+| Method | Description |
+|--------|-------------|
+| `exists(Location)` | Check if a file/directory exists |
+| `mkdirs(Location)` | Create directories |
+| `delete(Location, boolean)` | Delete file or directory |
+| `rename(Location, Location)` | Rename/move |
+| `list(Location)` | List directory contents |
+| `newInputFile(Location)` | Open a file for reading |
+| `newOutputFile(Location)` | Open a file for writing |
+| `close()` | Release resources |
+
+### 6. Register via ServiceLoader
+
+Create the file:
+
+```
+src/main/resources/META-INF/services/org.apache.doris.filesystem.spi.FileSystemProvider
+```
+
+With content:
+
+```
+org.apache.doris.filesystem.gcs.GcsFileSystemProvider
+```
+
+### 7. Register the module in the parent POM
+
+Add your module to `fe-filesystem/pom.xml`:
+
+```xml
+<modules>
+    ...
+    <module>fe-filesystem-gcs</module>
+</modules>
+```
+
+### 8. Add tests
+
+* **Unit tests** — Test your `FileSystem` and `ObjStorage` implementations with mocked cloud
+  clients. Place in `src/test/java/`.
+* **Environment tests** — Tests that require real cloud credentials should be tagged with
+  `@Tag("environment")`. They are excluded by default and can be enabled with
+  `-Dtest.excludedGroups=none`.
+
+### 9. Build and deploy
+
+```bash
+# Build the new module (must go through the reactor so sibling SNAPSHOTs resolve)
+cd fe
+mvn package -pl fe-filesystem/fe-filesystem-gcs --also-make -DskipTests
+
+# The build produces a zip at:
+#   fe-filesystem/fe-filesystem-gcs/target/doris-fe-filesystem-gcs.zip
+# Deploy by unpacking into the plugin directory:
+mkdir -p ${DORIS_HOME}/plugins/filesystem/gcs
+unzip fe-filesystem/fe-filesystem-gcs/target/doris-fe-filesystem-gcs.zip \
+    -d ${DORIS_HOME}/plugins/filesystem/gcs/
+
+# The unpacked layout should be:
+#   plugins/filesystem/gcs/
+#   ├── doris-fe-filesystem-gcs.jar
+#   └── lib/
+#       └── *.jar
+#
+# NOTE: Do NOT drop the .zip file directly — it must be unpacked.
+```
+
+### Checklist
+
+- [ ] Module depends only on `fe-filesystem-spi` (not `fe-core`/`fe-common`/`fe-catalog`)
+- [ ] `FileSystemProvider` has a public no-arg constructor
+- [ ] `supports()` is cheap — no network calls
+- [ ] `META-INF/services` file is present and correct
+- [ ] Assembly descriptor excludes `fe-filesystem-api`, `fe-filesystem-spi`, `fe-extension-spi`
+- [ ] Module is listed in `fe-filesystem/pom.xml` `<modules>`
+- [ ] Unit tests pass; environment tests are tagged `@Tag("environment")`
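
The property-translation step of the S3 delegation pattern described earlier can be sketched as follows. The `cos.*` source keys and the helper class are illustrative; only the S3-side keys (`AWS_ENDPOINT`, `AWS_REGION`, `AWS_ACCESS_KEY`) appear in the provider-matching table above, and the real constants may differ:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of step 1 of the S3 delegation pattern: translating
// cloud-native property keys (here, hypothetical COS-style keys) into the
// S3-compatible keys that S3FileSystem understands.
public class CosToS3Properties {
    public static Map<String, String> translate(Map<String, String> cos) {
        Map<String, String> s3 = new HashMap<>();
        putIfPresent(cos, "cos.endpoint",   s3, "AWS_ENDPOINT");
        putIfPresent(cos, "cos.region",     s3, "AWS_REGION");
        putIfPresent(cos, "cos.access_key", s3, "AWS_ACCESS_KEY");
        putIfPresent(cos, "cos.secret_key", s3, "AWS_SECRET_KEY");
        return s3;
    }

    private static void putIfPresent(Map<String, String> src, String from,
                                     Map<String, String> dst, String to) {
        String v = src.get(from);
        if (v != null) {
            dst.put(to, v);
        }
    }
}
```

With the keys translated, the provider can hand the resulting map straight to `S3FileSystem`, overriding only cloud-specific operations such as pre-signed URLs in its `ObjStorage` subclass.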