Feature Request: Add Approximate Mode/Frequent Items Support Using DataSketches’ FrequentItemsSketch

Hi team,

I’m using the excellent DuckDB datasketches extension for large-scale analytics. A common requirement in our datasets is computing mode() (the most frequent item) per group, but DuckDB's built-in exact mode() function causes high memory usage, or even OOMs, on large, high-cardinality datasets.

**Feature Request**
Please consider adding support for approximate mode estimation using [FrequentItemsSketch](https://datasketches.apache.org/docs/FrequentItems/FrequentItemsOverview.html) from Apache DataSketches.

**Why is this useful?**

 - mode() is commonly needed in aggregations over grouped data, e.g.:
    ```sql
    SELECT x, y, mode(z) FROM table GROUP BY x, y;
    ```
 - On large datasets (e.g., 30M+ rows, 1K+ groups), the exact mode() exhausts memory, since it has to track a count for every distinct value in each group.
 - Approximate mode with bounded error would be a great tradeoff and fits well into the sketch philosophy.
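
To illustrate the kind of tradeoff I have in mind, here is a rough Python sketch of a Misra-Gries summary, a classic frequent-items algorithm. It is not the DataSketches implementation and not a proposed API for the extension; the names `MisraGries`, `approx_mode`, and `approx_mode_per_group` are made up for this example. The point is only that each group keeps at most `k - 1` counters no matter how many distinct values it sees, and each retained count undercounts the true count by at most `n / k` for a group of `n` rows.

```python
from collections import defaultdict


class MisraGries:
    """Illustrative frequent-items summary (hypothetical helper, not DataSketches).

    Keeps at most k - 1 counters. After n updates, every stored count is
    between (true count - n / k) and the true count.
    """

    def __init__(self, k):
        self.k = k
        self.n = 0
        self.counters = {}

    def update(self, item):
        self.n += 1
        if item in self.counters:
            self.counters[item] += 1
        elif len(self.counters) < self.k - 1:
            self.counters[item] = 1
        else:
            # No free slot: decrement every counter and drop those that hit zero.
            for key in list(self.counters):
                self.counters[key] -= 1
                if self.counters[key] == 0:
                    del self.counters[key]

    def approx_mode(self):
        # Item with the largest retained count, or None for an empty summary.
        if not self.counters:
            return None
        return max(self.counters, key=self.counters.get)


def approx_mode_per_group(rows, k=16):
    """Rough analogue of: SELECT x, y, mode(z) FROM table GROUP BY x, y."""
    sketches = defaultdict(lambda: MisraGries(k))
    for x, y, z in rows:
        sketches[(x, y)].update(z)
    return {group: sketch.approx_mode() for group, sketch in sketches.items()}
```

With this scheme the returned item is the true mode whenever the mode leads the runner-up by more than `n / k` occurrences in its group; otherwise it is only a near-tie candidate, which would be acceptable for our use case.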

**References**
 - [Frequent Items Sketches documentation](https://datasketches.apache.org/docs/Frequency/FrequencySketches.html#frequent-items-sketches)
 - [Open issue](https://github.com/duckdb/duckdb/issues/16531) about the exact mode()

