
Feature Request: Add Approximate Mode/Frequent Items Support Using DataSketches’ FrequentItemsSketch #3

@chitralverma

Hi team,

I’m using the excellent DuckDB datasketches extension for large-scale analytics use cases. One common requirement in our datasets is to compute the mode() (most frequent item) per group, but the built-in exact mode() function in DuckDB leads to high memory usage or even OOMs when applied on large, high-cardinality datasets.

Feature Request
Please consider adding support for approximate mode estimation using FrequentItemsSketch from Apache DataSketches.

Why is this useful?

  • mode() is commonly needed in aggregations over grouped data, e.g.:
    SELECT x, y, mode(z) FROM tbl GROUP BY x, y;
    
  • On large datasets (e.g., 30M+ rows, 1K+ groups), the exact mode() leads to memory exhaustion.
  • Approximate mode with bounded error would be a great tradeoff and fits well into the sketch philosophy.
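
To make the tradeoff concrete, here is a minimal Python sketch of a bounded-memory approximate mode per group. It uses the classic Misra-Gries frequent-items summary as a stand-in, not the DataSketches FrequentItemsSketch implementation, and the names `update`, `approx_mode_per_group`, and the parameter `k` are hypothetical, chosen only for illustration. Each group keeps at most k - 1 counters, and each retained count undercounts the true frequency by at most n/k for a group of n rows.

```python
def update(counters, item, k):
    """Misra-Gries update step: keep at most k - 1 counters per summary."""
    if item in counters:
        counters[item] += 1
    elif len(counters) < k - 1:
        counters[item] = 1
    else:
        # Summary is full: decrement every counter and drop the ones that hit zero.
        # The incoming item is not stored in this case.
        for key in list(counters):
            counters[key] -= 1
            if counters[key] == 0:
                del counters[key]


def approx_mode_per_group(rows, k=16):
    """rows: iterable of (group_key, value) pairs.

    Returns {group_key: candidate mode}, or None for a group whose
    summary ended up empty. Hypothetical helper, for illustration only.
    """
    summaries = {}
    for group, value in rows:
        update(summaries.setdefault(group, {}), value, k)
    return {
        group: (max(counters, key=counters.get) if counters else None)
        for group, counters in summaries.items()
    }


if __name__ == "__main__":
    rows = [("a", "x")] * 5 + [("a", "y")] * 2 + [("b", "q")] * 3 + [("b", "r")]
    print(approx_mode_per_group(rows))
```

The returned value is only a candidate: it is guaranteed to be the true mode when the mode's frequency exceeds every other item's frequency by more than n/k, so a larger k trades memory for accuracy.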
