@@ -7,7 +7,7 @@ description: Explanation of the compaction process in GreptimeDB, including conc

For databases based on the LSM Tree, compaction is extremely critical. It merges overlapping, fragmented SST files into a single ordered file, discards deleted data, and significantly improves query performance.

-Until v0.9.1, GreptimeDB provides compaction strategies to control how SST files are compacted: Time Windowed Compaction Strategy (TWCS) and Strict Window Compaction Strategy (SWCS).
+Since v0.9.1, GreptimeDB provides compaction strategies to control how SST files are compacted: Time Windowed Compaction Strategy (TWCS) and Strict Window Compaction Strategy (SWCS).


## Concepts
@@ -68,9 +68,10 @@ It assigns files to be compacted into different time windows. For each window, T
For window assignment, SST files may span multiple time windows. TWCS assigns SSTs based on their maximum timestamps to ensure they are not affected by stale data. In time-series workloads, out-of-order writes are infrequent, and even when they occur, recent data's query performance is more critical than that of stale data.
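
The assignment rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `SstFile`, `assign_by_max_timestamp`, and the floor-division window alignment are hypothetical names and choices made for this example, not GreptimeDB's actual implementation.

```python
from collections import defaultdict
from typing import NamedTuple


class SstFile(NamedTuple):
    """Hypothetical stand-in for an SST file's inclusive time range."""
    min_ts: int
    max_ts: int


def assign_by_max_timestamp(files, window_size):
    """Group files by the window that contains each file's maximum timestamp."""
    windows = defaultdict(list)
    for f in files:
        # Assumption: windows are aligned by rounding the timestamp down
        # to a multiple of window_size.
        window_start = (f.max_ts // window_size) * window_size
        windows[window_start].append(f)
    return dict(windows)


files = [SstFile(0, 3), SstFile(2, 9), SstFile(8, 10)]
assignment = assign_by_max_timestamp(files, window_size=4)
for start in sorted(assignment):
    print(start, assignment[start])
```

In this sketch the file `(2, 9)` spans three windows but lands only in the window starting at 8, because only its maximum timestamp is considered. The stale rows it carries do not pull it back into an older window.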


-TWCS provides 2 parameters:
+Common TWCS table options include:
- `trigger_file_num`: number of files in a specific time window to trigger a compaction (default 4).
-- `max_output_file_size`: max allowed compaction output file size (no limit by default).
+- `time_window`: time window size for TWCS compaction.
+- `max_output_file_size`: max allowed compaction output file size (default 512MB).


The following diagrams show how files in a window get compacted when `trigger_file_num = 3`:
@@ -107,6 +108,7 @@ CREATE TABLE monitor (
WITH (
'compaction.type'='twcs',
'compaction.twcs.trigger_file_num'='8',
+'compaction.twcs.time_window'='1h',
'compaction.twcs.max_output_file_size'='500MB'
);
```
@@ -131,7 +133,7 @@ ADMIN COMPACT_TABLE(
);
```

-The `<strategy_name>` parameter can be either `twcs` or `swcs` (case insensitive) which refer to Time Windowed Compaction Strategy and Strict Window Compaction Strategy respectively.
+The `<strategy_name>` parameter can be `regular` (or `twcs`) for regular TWCS compaction, or `swcs` (or `strict_window`) for Strict Window Compaction Strategy. The value is case-insensitive.
For the `swcs` strategy, the `<strategy_parameters>` can specify:
- The window size (in seconds) for splitting SST files
- The `parallelism` parameter to control the level of parallelism for compaction (defaults to 1)
@@ -145,11 +147,11 @@ ADMIN COMPACT_TABLE(
"3600"
);

-+--------------------------------------------------------------------+
-| ADMIN compact_table(Utf8("monitor"),Utf8("swcs"),Utf8("3600")) |
-+--------------------------------------------------------------------+
-| 0 |
-+--------------------------------------------------------------------+
++------------------------------------------------+
+| ADMIN COMPACT_TABLE("monitor", "swcs", "3600") |
++------------------------------------------------+
+| 0 |
++------------------------------------------------+
1 row in set (0.01 sec)
```

@@ -178,4 +180,4 @@ In Figure A, there are 3 overlapping SST files: `[0, 3]` (which includes timesta
The strict window compaction strategy assigns the file `[3, 8]`, which covers windows 0, 4, and 8, to each of those three windows. This allows it to merge with `[0, 3]` and `[8, 10]` separately.
Figure B shows the final compaction result with three files: `[0, 3]`, `[4, 7]`, and `[8, 10]`. These files do not overlap with each other.

-![compaction-strict-window.jpg](/compaction-strict-window.jpg)
+![compaction-strict-window.jpg](/compaction-strict-window.jpg)
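
The Figure A to Figure B transformation can be reproduced with a small Python sketch. It rests on several assumptions that are not stated in the text: a window size of 4 (inferred from the windows 0, 4, and 8), files modelled as plain lists of integer timestamps, and a hypothetical helper named `strict_window_compact`. It illustrates the idea only and is not GreptimeDB's implementation.

```python
from collections import defaultdict


def strict_window_compact(files, window_size):
    """Split every file across the windows it covers, then merge per window.

    Simplification: each file is a list of integer timestamps, and merging
    is modelled as a set union that drops duplicate timestamps.
    """
    windows = defaultdict(set)
    for timestamps in files:
        for ts in timestamps:
            # Assumption: a window starts at a multiple of window_size.
            windows[(ts // window_size) * window_size].add(ts)
    # One output file per window, reported as its inclusive (min, max) range.
    return [(min(windows[start]), max(windows[start])) for start in sorted(windows)]


figure_a = [
    list(range(0, 4)),   # file [0, 3]
    list(range(3, 9)),   # file [3, 8]
    list(range(8, 11)),  # file [8, 10]
]
result = strict_window_compact(figure_a, window_size=4)
print(result)  # [(0, 3), (4, 7), (8, 10)]
```

Because every timestamp is routed to exactly one window, the output ranges cannot overlap, which matches the three files shown in Figure B.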
@@ -66,9 +66,10 @@ TWCS assigns the files to be compacted to different time windows. For each window,

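For window assignment, SST files may span multiple time windows. To ensure that they are not affected by stale data, TWCS assigns SSTs based on their maximum timestamps. In time-series workloads, out-of-order writes are infrequent, and even when they occur, the query performance of recent data matters more than that of stale data.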

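-TWCS provides 5 parameters for tuning:
+Common TWCS table-level options include:
- `trigger_file_num`: number of files in a single time window that triggers a compaction (default 4)
-- `max_output_file_size`: maximum size of a file produced by compaction (no limit by default)
+- `time_window`: time window size for TWCS compaction
+- `max_output_file_size`: maximum size of a file produced by compaction (default 512MB)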

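The following diagrams show how files in a window are compacted when `trigger_file_num` is 3:
- In A, there are two SST files, `[0, 3]` and `[5, 6, 9]`, but only one sorted run, because the time ranges of the two files do not overlap.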
@@ -104,6 +105,7 @@ CREATE TABLE monitor (
WITH (
'compaction.type'='twcs',
'compaction.twcs.trigger_file_num'='8',
+'compaction.twcs.time_window'='1h',
'compaction.twcs.max_output_file_size'='500MB'
);
```
@@ -128,7 +130,7 @@ ADMIN COMPACT_TABLE(
);
```

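-The `<strategy_name>` parameter can be `twcs` or `swcs` (case-insensitive), which specify the Time Windowed Compaction Strategy and the Strict Window Compaction Strategy, respectively.
+The `<strategy_name>` parameter can be `regular` (or `twcs`) to specify regular TWCS compaction, or `swcs` (or `strict_window`) to specify the Strict Window Compaction Strategy. The value is case-insensitive.
For the `swcs` strategy, `<strategy_parameters>` can specify:
- The window size (in seconds) used to split SST files
- The `parallelism` parameter, which controls the level of parallelism for compaction (defaults to 1)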
@@ -142,11 +144,11 @@ ADMIN COMPACT_TABLE(
"3600"
);

-+--------------------------------------------------------------------+
-| ADMIN compact_table(Utf8("monitor"),Utf8("swcs"),Utf8("3600")) |
-+--------------------------------------------------------------------+
-| 0 |
-+--------------------------------------------------------------------+
++------------------------------------------------+
+| ADMIN COMPACT_TABLE("monitor", "swcs", "3600") |
++------------------------------------------------+
+| 0 |
++------------------------------------------------+
1 row in set (0.01 sec)
```
