51 changes: 38 additions & 13 deletions content/influxdb3/enterprise/admin/clustering.md
@@ -71,6 +71,11 @@ Available modes:
- `compact`: Background compaction and optimization
- `process`: Data processing and transformations

+> [!Warning]
+> Only **one** node per cluster can run in a mode that includes compaction (`compact` or `all`).
+> Running multiple compactors causes data corruption.
+> In a cluster, assign `all` mode to at most one node, and ensure no other node uses the `compact` mode.
Contributor

I'm not sure `all` mode is compatible with clustering at all. I think `all` nodes act alone, but I haven't tried it.

Why would I run an `all` node with a separate `ingest` node, for example? It's an odd configuration. If it is possible, I don't think I would recommend it.

Contributor

Updated the issue and PR description with the latest advice.

Contributor Author

Updated in d1f15bf to reflect that all mode is for single-node Enterprise deployments only. Changes include:

  • Warning callout now says: avoid all in multi-node clusters (replication and catalog refresh aren't designed for all-mode nodes); use explicit modes with compact assigned to exactly one node
  • Small cluster (3 nodes) example: Node 1 changed from mode: all to mode: ingest,query,compact
  • "Migrate to specialized nodes" section reframed as "single-node to specialized cluster" migration, showing all only in the single-node Phase 1
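
The rule described in this thread (explicit modes in multi-node clusters, with `compact` on exactly one node, and `all` reserved for single-node deployments) could be checked before deployment with something like the sketch below. The `check_cluster_modes` function is a hypothetical helper, not part of `influxdb3`; it only inspects the `--mode` values you pass to it.

```bash
#!/usr/bin/env bash
# Hypothetical pre-deployment check; not an influxdb3 feature.
# Each argument is one node's --mode value, for example "ingest,query".

check_cluster_modes() {
  local compactors=0 all_nodes=0 mode
  for mode in "$@"; do
    # Wrap the list in commas so each mode name matches as a whole word.
    case ",${mode}," in
      *,all,*)     all_nodes=$((all_nodes + 1)); compactors=$((compactors + 1)) ;;
      *,compact,*) compactors=$((compactors + 1)) ;;
    esac
  done

  # `all` is treated as valid only when it is the sole node.
  if [ "$#" -gt 1 ] && [ "$all_nodes" -gt 0 ]; then
    echo "error: all mode used in a multi-node cluster"
    return 1
  fi
  if [ "$compactors" -ne 1 ]; then
    echo "error: expected exactly 1 compactor, found ${compactors}"
    return 1
  fi
  echo "ok"
}

# The revised small-cluster layout: one node includes compact, two do not.
check_cluster_modes ingest,query,compact ingest,query ingest,query
```

The final call prints `ok`; passing `all ingest,query` or two `compact` nodes prints an error and returns a non-zero status.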


## Allocate threads by node type

### Critical concept: Thread pools
@@ -194,6 +199,11 @@ influxdb3 serve \

Compactor nodes optimize stored data through background compaction processes.

+> [!Warning]
+> Only **one** compactor node can run per cluster.
+> Multiple compactors writing compacted data to the same location will cause data corruption.
+> Any node mode that includes compaction (`compact` or `all`) counts toward this limit.
+
### Dedicated compactor (32 cores)

```bash
@@ -298,21 +308,25 @@ influxdb3 \

### Small cluster (3 nodes)

+> [!Note]
+> Only one node per cluster can run compaction.
+> In this example, Node 1 runs all modes (including compaction) and Nodes 2–3 run ingest and query only.
+
```yaml
-# Node 1: All-in-one primary
+# Node 1: All-in-one primary (includes compaction)
mode: all
cores: 32
io_threads: 8
datafusion_threads: 24

-# Node 2: All-in-one secondary
-mode: all
+# Node 2: Ingest and query (no compaction)
+mode: ingest,query
cores: 32
io_threads: 8
datafusion_threads: 24

-# Node 3: All-in-one tertiary
-mode: all
+# Node 3: Ingest and query (no compaction)
+mode: ingest,query
cores: 32
io_threads: 8
datafusion_threads: 24
@@ -333,8 +347,14 @@ cores: 48
io_threads: 4
datafusion_threads: 44

-# Nodes 5-6: Compactor + Process
-mode: compact,process
+# Node 5: Compactor (only one compactor per cluster)
+mode: compact
+cores: 32
+io_threads: 4
+datafusion_threads: 28
+
+# Node 6: Process node
+mode: process
cores: 32
io_threads: 4
datafusion_threads: 28
@@ -355,13 +375,13 @@ cores: 64
io_threads: 4
datafusion_threads: 60

-# Nodes 9-10: Dedicated compactors
+# Node 9: Dedicated compactor (only one compactor per cluster)
mode: compact
cores: 32
io_threads: 2
datafusion_threads: 30

-# Nodes 11-12: Process nodes
+# Nodes 10-12: Process nodes
mode: process
cores: 32
io_threads: 6
@@ -553,7 +573,7 @@ GROUP BY event_type;
- Growing number of small Parquet files
- Increasing query times due to file fragmentation

-**Solution:** Add compactor nodes or increase DataFusion threads (see [Compactor node issues](#compactor-node-issues))
+**Solution:** Increase DataFusion threads on your single compactor node (see [Compactor node issues](#compactor-node-issues))
Contributor

@philjb @peterbarnett03 Can you verify this?

Contributor

For the Parquet backend, yes: it uses DataFusion for compaction (PachaTree doesn't use DataFusion for compaction).

Contributor

> For the Parquet backend, yes: it uses DataFusion for compaction (PachaTree doesn't use DataFusion for compaction).

@copilot Specify that this solution is for the Parquet-backed storage engine. The Performance Preview (with PachaTree) doesn't use DataFusion for compaction.

Contributor Author

Updated in d1f15bf. The solution now specifies it applies to the default Parquet-backed storage engine and notes that PachaTree (Performance Preview) doesn't use DataFusion for compaction.
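
For the Parquet-backed engine, raising DataFusion threads on the single compactor means choosing a new thread split. Every node example in this diff happens to satisfy `io_threads + datafusion_threads = cores` (for instance 2 + 30 = 32 on the dedicated compactor). The sketch below assumes that pattern is the intended sizing rule, which the diff itself doesn't state; `datafusion_threads` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Assumption: IO threads plus DataFusion threads equal the core count,
# as in the examples in this diff. This is an observed pattern, not a documented rule.

datafusion_threads() {
  # $1 = total cores, $2 = IO threads
  local cores=$1 io=$2
  if [ "$io" -ge "$cores" ]; then
    echo "error: IO threads must be fewer than cores" >&2
    return 1
  fi
  echo $((cores - io))
}

# Dedicated compactor from the large-cluster example: 32 cores, 2 IO threads.
echo "--num-io-threads=2 --datafusion-num-threads=$(datafusion_threads 32 2)"
```

This prints the two thread flags with `--datafusion-num-threads=30`, matching the value in the troubleshooting snippet. It doesn't apply to PachaTree (Performance Preview), which doesn't use DataFusion for compaction.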


## Troubleshoot node configurations

@@ -602,7 +622,7 @@ free -h

```bash
# Check: Compaction queue length
-# Solution: Add more compactor nodes or increase threads
+# Solution: Increase threads on the single compactor node (only one compactor is allowed per cluster)
--datafusion-num-threads=30
```

@@ -619,9 +639,14 @@ free -h

### From all-in-one to specialized

+> [!Note]
+> If you're migrating a multi-node cluster, only one node should ever be in `all` mode.
+> In a baseline multi-node setup, additional nodes should use `ingest,query` instead of `all`
+> to avoid running multiple compactors.
+
```bash
Contributor

@copilot Fix the notation/syntax in this "bash" example so that it represents real code. Observe the following for all examples scaling from all mode to a multi-node cluster:

  • Replace the all node with nodes that have explicit, specialized modes
  • Assign compact mode to exactly one node that uses the same node-id as the all node being replaced

Contributor Author

Fixed in 689cacc. The pseudo-bash is replaced with real influxdb3 serve commands. Phase 2 now explicitly shows the compact node reusing --node-id=node0 (same as the replaced all-mode node), per @philjb's requirement.

-# Phase 1: Baseline (all nodes identical)
-all nodes: --mode=all --num-io-threads=8
+# Phase 1: Baseline (single all-in-one node for starting point)
+node1: --mode=all --num-io-threads=8

# Phase 2: Identify workload patterns
# Monitor which nodes handle most writes vs queries