Equibles.ParadeDB.EntityFrameworkCore

EF Core integration for ParadeDB pg_search — BM25 full-text search indexes on PostgreSQL.

Provides a [Bm25Index] attribute for automatic index creation via migrations, and LINQ-friendly query methods for BM25 search, fuzzy matching, boosting, scoring, snippets, and more. No raw SQL needed.

Requirements

PostgreSQL with the pg_search extension installed
Npgsql.EntityFrameworkCore.PostgreSQL provider
.NET 8, 9, or 10

Installation

dotnet add package Equibles.ParadeDB.EntityFrameworkCore

Setup

1. Enable ParadeDB in your DbContext

services.AddDbContext<MyDbContext>(options =>
    options.UseNpgsql(connectionString, npgsql => npgsql.UseParadeDb()));

2. Add BM25 indexes to your entities

using Equibles.ParadeDB.EntityFrameworkCore;

[Bm25Index(nameof(Id), nameof(Title), nameof(Content))]
public class Article
{
    public Guid Id { get; set; }
    public string Title { get; set; }
    public string Content { get; set; }
}

The first parameter is the key field (required by pg_search to identify rows for scoring via pdb.score()), followed by the columns to index for full-text search. The key field is not searchable — it's only used internally by ParadeDB.

3. (Optional) Configure per-column index settings

Add column-level attributes to the indexed properties to tune tokenization, stemming, stopwords, fast storage, and field-specific options. Anything you don't set keeps pg_search's defaults.

using Equibles.ParadeDB.EntityFrameworkCore;

[Bm25Index(nameof(Id), nameof(Title), nameof(Content), nameof(Category), nameof(Rating), nameof(PublishedAt))]
public class Article
{
    public Guid Id { get; set; }

    [Bm25Text(Stemmer = Bm25Language.English, Fast = true)]
    public string Title { get; set; }

    [Bm25Text(Stemmer = Bm25Language.English, Record = Bm25Record.Position)]
    public string Content { get; set; }

    [Bm25Text(Tokenizer = Bm25Tokenizer.Raw, Fast = true)]
    public string Category { get; set; }

    [Bm25Numeric(Fast = true)]
    public int Rating { get; set; }

    [Bm25DateTime(Fast = true)]
    public DateTime PublishedAt { get; set; }
}

Attribute	Settings
`[Bm25Text]`	`Tokenizer`, `MinGram` / `MaxGram` / `PrefixOnly` (ngram), `RegexPattern` (regex), `Stemmer`, `StopwordsLanguage`, `Fast`, `Record`, `Indexed`, `Fieldnorms`
`[Bm25Numeric]`	`Fast`, `Indexed`
`[Bm25Boolean]`	`Fast`, `Indexed`
`[Bm25DateTime]`	`Fast`, `Indexed`
`[Bm25Json]`	same as `[Bm25Text]` plus `ExpandDots`

Bm25Tokenizer — Default, Whitespace, Raw, Keyword, SourceCode, Icu, Ngram, Regex, ChineseCompatible, ChineseLindera, JapaneseLindera, KoreanLindera, Jieba. Ngram requires MinGram and MaxGram; Regex requires RegexPattern. Other tokenizers take no parameters.
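To make the MinGram / MaxGram / PrefixOnly settings more concrete, here is a minimal, dependency-free sketch of what an n-gram tokenizer conceptually emits. NgramSketch and Grams are hypothetical names used for illustration only; they are not part of this library or of pg_search, and the real tokenizer may order or filter its tokens differently.

```csharp
using System;
using System.Collections.Generic;

public static class NgramSketch
{
    // Hypothetical helper: lists the character n-grams of `text` whose length is
    // between minGram and maxGram (inclusive). With prefixOnly set, only grams
    // anchored at the start of the text are kept. Works on UTF-16 chars, which
    // is a simplification for non-BMP text.
    public static List<string> Grams(string text, int minGram, int maxGram, bool prefixOnly = false)
    {
        if (minGram < 1 || maxGram < minGram)
            throw new ArgumentException("Expected 1 <= minGram <= maxGram.");

        var grams = new List<string>();
        var lastStart = prefixOnly ? 0 : text.Length - minGram;
        for (var start = 0; start <= lastStart; start++)
        {
            // Emit every allowed length that still fits inside the text.
            for (var len = minGram; len <= maxGram && start + len <= text.Length; len++)
            {
                grams.Add(text.Substring(start, len));
            }
        }
        return grams;
    }
}
```

With MinGram = 2 and MaxGram = 3, the word "search" yields grams such as "se", "sea", "ea", "ear", and so on, which is why a fragment like "arc" can match it. With PrefixOnly, only "se" and "sea" remain, which suits type-ahead on word beginnings.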

Bm25Language — used by both Stemmer and StopwordsLanguage: Arabic, Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Russian, Spanish, Swedish, Tamil, Turkish.

Bm25Record: Basic (document IDs only), Freq (adds term frequencies), or Position (adds term positions). Position is required for phrase queries; the others use less disk.

Fast stores the column in columnar format for sorting/faceting/aggregation. Required if you want ORDER BY on this column to use the index.

A property may only have one [Bm25*] attribute, and that property must be listed in the entity's [Bm25Index] columns — orphan attributes throw at model-build time.

PascalCase columns work for the common path, with two exceptions. EF Core's default naming maps public int Id { get; set; } to a column named "Id", and every search operator that references the column directly in the LINQ expression (Matches, MatchesAll, MatchesPhrase, MatchesFuzzy, MatchesTerm, Score, Snippet, Parse, Regex, PhrasePrefix) works with that name.

The two exceptions are MoreLikeThis and JsonSearch with Term-field filters. Those functions generate internal SQL inside pg_search that references the column unquoted, so PostgreSQL folds the name to lowercase and the query fails with: column "id" does not exist. If you need either feature, map the affected columns to snake_case, either project-wide via EFCore.NamingConventions (options.UseSnakeCaseNamingConvention()) or per-property with [Column("snake_case")].

This is a ParadeDB-side quoting issue; the library can't work around it from the client. It is tracked upstream in paradedb/paradedb#5065, with a fix proposed in paradedb/paradedb#5078. Once that PR is merged and shipped in a ParadeDB release, PascalCase columns will also work for MoreLikeThis and JsonSearch/Term, and this workaround can be dropped.
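As a sketch of the per-property workaround, the entity below maps each property to a lowercase column name with the standard [Column] attribute from System.ComponentModel.DataAnnotations.Schema. The column names chosen here are illustrative, and the [Bm25Index] attribute from step 2 is left out only to keep the snippet free of package dependencies; in a real model it would stay on the class.

```csharp
using System;
using System.ComponentModel.DataAnnotations.Schema;

// Sketch only: the [Bm25Index(...)] attribute from step 2 would also sit on this class.
public class Article
{
    // Lowercase names survive PostgreSQL's case folding of unquoted identifiers.
    [Column("id")]
    public Guid Id { get; set; }

    [Column("title")]
    public string Title { get; set; } = "";

    [Column("content")]
    public string Content { get; set; } = "";
}
```

LINQ queries keep using the C# property names (a.Title, a.Content); only the generated SQL changes.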

4. Create a migration

dotnet ef migrations add AddBm25Index
dotnet ef database update

EF Core will generate the migration automatically, creating:

The pg_search PostgreSQL extension
A BM25 index on the specified columns with the key_field storage parameter and per-column text_fields / numeric_fields / boolean_fields / datetime_fields / json_fields JSON derived from any [Bm25Text] / [Bm25Numeric] / etc. attributes

Querying

Basic Search

Uses the ||| operator — matches documents containing any of the query terms (OR).

var results = await dbContext.Articles
    .Where(a => EF.Functions.Matches(a.Content, "machine learning"))
    .ToListAsync();

SELECT * FROM "Articles" WHERE "Content" ||| 'machine learning'

Conjunction Search

Uses the &&& operator — matches documents containing all of the query terms (AND).

var results = await dbContext.Articles
    .Where(a => EF.Functions.MatchesAll(a.Content, "machine learning"))
    .ToListAsync();

SELECT * FROM "Articles" WHERE "Content" &&& 'machine learning'

Phrase Search

Matches terms in exact order. Slop allows N words between terms or transposition of adjacent terms.

// Exact phrase
var results = await dbContext.Articles
    .Where(a => EF.Functions.MatchesPhrase(a.Content, "neural networks"))
    .ToListAsync();

// Phrase with slop — allows up to 2 words between terms
var results = await dbContext.Articles
    .Where(a => EF.Functions.MatchesPhrase(a.Content, "neural networks", 2))
    .ToListAsync();

-- Exact phrase
SELECT * FROM "Articles" WHERE "Content" ### 'neural networks'

-- With slop
SELECT * FROM "Articles" WHERE "Content" ### 'neural networks'::pdb.slop(2)

Term Search

Exact token match: the query string is NOT tokenized, so it is neither stemmed nor lowercased. Most tokenizers lowercase tokens at index time, so pass lowercase terms.

// Single term
var results = await dbContext.Articles
    .Where(a => EF.Functions.MatchesTerm(a.Content, "gpu"))
    .ToListAsync();

// Multiple terms (matches any)
var results = await dbContext.Articles
    .Where(a => EF.Functions.MatchesTermSet(a.Content, "gpu", "tpu", "npu"))
    .ToListAsync();

SELECT * FROM "Articles" WHERE "Content" === 'gpu'
SELECT * FROM "Articles" WHERE "Content" === ARRAY['gpu', 'tpu', 'npu']

Fuzzy Search (Levenshtein Distance)

Tolerates typos by allowing up to N single-character edits (insertions, deletions, substitutions). Max distance is 2.

prefix: exempts the initial substring from the edit distance
transpositionCostOne: counts swapping two adjacent characters as one edit instead of two
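To get a feel for what a distance of 1 or 2 permits, the following standalone sketch computes the edit distance between two strings, with an optional flag that counts an adjacent swap as a single edit. It illustrates the concept only: EditDistance is a hypothetical helper, it ignores the prefix option, and it does not reflect how pg_search evaluates fuzzy queries internally.

```csharp
using System;

public static class EditDistance
{
    // Levenshtein distance between a and b. When transpositionCostOne is true,
    // swapping two adjacent characters counts as one edit (the "optimal string
    // alignment" variant); otherwise a swap costs two substitutions.
    public static int Compute(string a, string b, bool transpositionCostOne)
    {
        var d = new int[a.Length + 1, b.Length + 1];
        for (var i = 0; i <= a.Length; i++) d[i, 0] = i; // delete all of a
        for (var j = 0; j <= b.Length; j++) d[0, j] = j; // insert all of b

        for (var i = 1; i <= a.Length; i++)
        {
            for (var j = 1; j <= b.Length; j++)
            {
                var cost = a[i - 1] == b[j - 1] ? 0 : 1;
                var best = Math.Min(
                    Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1), // deletion, insertion
                    d[i - 1, j - 1] + cost);                    // substitution or match

                if (transpositionCostOne && i > 1 && j > 1
                    && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1])
                {
                    best = Math.Min(best, d[i - 2, j - 2] + 1); // adjacent swap
                }

                d[i, j] = best;
            }
        }

        return d[a.Length, b.Length];
    }
}
```

For example, "machin" is one edit away from "machine" (a single insertion), so a fuzzy distance of 1 already covers it. "nerual" versus "neural" is one edit when transpositions cost one and two edits otherwise.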

// Basic fuzzy (distance 2)
var results = await dbContext.Articles
    .Where(a => EF.Functions.MatchesFuzzy(a.Content, "machin", 2))
    .ToListAsync();

// Fuzzy with all options
var results = await dbContext.Articles
    .Where(a => EF.Functions.MatchesFuzzy(a.Content, "machin", 2, true, false))
    .ToListAsync();

// Fuzzy AND match
var results = await dbContext.Articles
    .Where(a => EF.Functions.MatchesAllFuzzy(a.Content, "machin lerning", 2))
    .ToListAsync();

// Fuzzy term match
var results = await dbContext.Articles
    .Where(a => EF.Functions.MatchesTermFuzzy(a.Content, "machin", 1))
    .ToListAsync();

SELECT * FROM "Articles" WHERE "Content" ||| 'machin'::pdb.fuzzy(2)
SELECT * FROM "Articles" WHERE "Content" ||| 'machin'::pdb.fuzzy(2, true, false)
SELECT * FROM "Articles" WHERE "Content" &&& 'machin lerning'::pdb.fuzzy(2)
SELECT * FROM "Articles" WHERE "Content" === 'machin'::pdb.fuzzy(1)

Boost

Increases the BM25 relevance weight of a specific search term. Higher boost = higher score for matches on that term. Factor range: -2048 to 2048.

// Boosted OR match
var results = await dbContext.Articles
    .Where(a => EF.Functions.MatchesBoosted(a.Title, "transformers", 2.0))
    .ToListAsync();

// Boosted AND match
var results = await dbContext.Articles
    .Where(a => EF.Functions.MatchesAllBoosted(a.Content, "attention mechanism", 1.5))
    .ToListAsync();

// Combined fuzzy + boost
var results = await dbContext.Articles
    .Where(a => EF.Functions.MatchesFuzzyBoosted(a.Title, "transfomers", 2, 2.0))
    .ToListAsync();

SELECT * FROM "Articles" WHERE "Title" ||| 'transformers'::pdb.boost(2)
SELECT * FROM "Articles" WHERE "Content" &&& 'attention mechanism'::pdb.boost(1.5)
SELECT * FROM "Articles" WHERE "Title" ||| 'transfomers'::pdb.fuzzy(2)::pdb.boost(2)

BM25 Scoring

BM25 (Best Matching 25) ranks documents by relevance considering term frequency, inverse document frequency, and document length.

var results = await dbContext.Articles
    .Where(a => EF.Functions.Matches(a.Content, "deep learning"))
    .Select(a => new
    {
        a.Title,
        Score = EF.Functions.Score(a.Id)
    })
    .OrderByDescending(a => a.Score)
    .Take(10)
    .ToListAsync();

SELECT "Title", pdb.score("Id") AS "Score"
FROM "Articles"
WHERE "Content" ||| 'deep learning'
ORDER BY pdb.score("Id") DESC
LIMIT 10
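For intuition about how term frequency, inverse document frequency, and document length interact, here is a sketch of the textbook BM25 formula for a single query term. The constants k1 = 1.2 and b = 0.75 are commonly used defaults and are assumed here rather than read from pg_search; Bm25Sketch is a hypothetical name, and the scores it produces are not guaranteed to match pdb.score().

```csharp
using System;

public static class Bm25Sketch
{
    // Commonly used defaults; assumed for illustration, not taken from pg_search.
    private const double K1 = 1.2;
    private const double B = 0.75;

    // Inverse document frequency: terms found in fewer documents weigh more.
    public static double Idf(long totalDocs, long docsWithTerm) =>
        Math.Log(1.0 + (totalDocs - docsWithTerm + 0.5) / (docsWithTerm + 0.5));

    // Contribution of one query term to one document's score.
    // A multi-term query sums this value over its terms.
    public static double TermScore(
        int termFrequency,
        int docLength,
        double avgDocLength,
        long totalDocs,
        long docsWithTerm)
    {
        // Documents longer than average are penalized, shorter ones are favored.
        var lengthNorm = 1.0 - B + B * (docLength / avgDocLength);

        // Term frequency saturates: each extra occurrence adds less than the last.
        var tfPart = termFrequency * (K1 + 1.0) / (termFrequency + K1 * lengthNorm);

        return Idf(totalDocs, docsWithTerm) * tfPart;
    }
}
```

Two properties are worth noticing: raising termFrequency increases the score with diminishing returns, and the same match in a longer-than-average document scores lower than in a short one.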

Snippets

Returns text excerpts with matched terms highlighted using configurable HTML tags.

// Basic snippet (default highlighting)
var results = await dbContext.Articles
    .Where(a => EF.Functions.Matches(a.Content, "neural networks"))
    .Select(a => new
    {
        a.Title,
        Snippet = EF.Functions.Snippet(a.Content)
    })
    .ToListAsync();

// Parameterized snippet (custom tags and length)
var results = await dbContext.Articles
    .Where(a => EF.Functions.Matches(a.Content, "neural networks"))
    .Select(a => new
    {
        a.Title,
        Snippet = EF.Functions.Snippet(a.Content, "<b>", "</b>", 100)
    })
    .ToListAsync();

// Multiple snippets
var results = await dbContext.Articles
    .Where(a => EF.Functions.Matches(a.Content, "neural networks"))
    .Select(a => new
    {
        a.Title,
        Snippets = EF.Functions.Snippets(a.Content, 15, 5, 0)
    })
    .ToListAsync();

SELECT "Title", pdb.snippet("Content") AS "Snippet" FROM "Articles" WHERE ...
SELECT "Title", pdb.snippet("Content", start_tag => '<b>', end_tag => '</b>', max_num_chars => 100) AS "Snippet" FROM "Articles" WHERE ...
SELECT "Title", pdb.snippets("Content", max_num_chars => 15, "limit" => 5, "offset" => 0) AS "Snippets" FROM "Articles" WHERE ...

Parse Query (Tantivy Syntax)

Full query parser supporting field:value, boolean operators (AND/OR/NOT), ranges (rating:>3), and wildcards.

lenient: ignores syntax errors
conjunctionMode: defaults terms to AND instead of OR

// Basic parse query
var results = await dbContext.Articles
    .Where(a => EF.Functions.Parse(a.Id, "title:transformers AND content:attention"))
    .ToListAsync();

// With options
var results = await dbContext.Articles
    .Where(a => EF.Functions.Parse(a.Id, "transformers attention", true, true))
    .ToListAsync();

SELECT * FROM "Articles" WHERE "Id" @@@ pdb.parse('title:transformers AND content:attention')
SELECT * FROM "Articles" WHERE "Id" @@@ pdb.parse('transformers attention', lenient => TRUE, conjunction_mode => TRUE)

Regex Search

Matches indexed tokens against a regular expression (Rust regex syntax).

var results = await dbContext.Articles
    .Where(a => EF.Functions.Regex(a.Content, "neuro.*"))
    .ToListAsync();

SELECT * FROM "Articles" WHERE "Content" @@@ pdb.regex('neuro.*')

Phrase Prefix

Matches a phrase where the last term is treated as a prefix — useful for autocomplete/type-ahead.

// Basic phrase prefix
var results = await dbContext.Articles
    .Where(a => EF.Functions.PhrasePrefix(a.Content, "running", "sh"))
    .ToListAsync();

// With max expansions
var results = await dbContext.Articles
    .Where(a => EF.Functions.PhrasePrefix(a.Content, 10, "running", "sh"))
    .ToListAsync();

SELECT * FROM "Articles" WHERE "Content" @@@ pdb.phrase_prefix(ARRAY['running', 'sh'])
SELECT * FROM "Articles" WHERE "Content" @@@ pdb.phrase_prefix(ARRAY['running', 'sh'], max_expansion => 10)

More Like This

Finds documents similar to a given document by analyzing its indexed terms.

// Find similar to document with ID 3
var results = await dbContext.Articles
    .Where(a => EF.Functions.MoreLikeThis(a.Id, 3))
    .ToListAsync();

// Restrict similarity analysis to specific fields
var results = await dbContext.Articles
    .Where(a => EF.Functions.MoreLikeThis(a.Id, 3, "description"))
    .ToListAsync();

SELECT * FROM "Articles" WHERE "Id" @@@ pdb.more_like_this(3)
SELECT * FROM "Articles" WHERE "Id" @@@ pdb.more_like_this(3, ARRAY['description'])

JSON Query Search

For complex queries that combine full-text search with term filters, use ParadeDbJsonQuery to build structured JSON queries. The result translates to the @@@ operator with a ::jsonb cast.

// Build a boolean query combining parse + term filters
var query = ParadeDbJsonQuery.Boolean(b => b
    .Must(
        ParadeDbJsonQuery.Parse("revenue growth"),
        ParadeDbJsonQuery.Term("DocumentId", documentId),
        ParadeDbJsonQuery.Term("DocumentType", 10)));

// Option 1: Using EF.Functions directly
var results = await dbContext.Chunks
    .Where(c => EF.Functions.JsonSearch(c.Id, query.ToJson()))
    .OrderByDescending(c => EF.Functions.Score(c.Id))
    .Take(5)
    .ToListAsync();

// Option 2: Using IQueryable extensions
var results = await dbContext.Chunks
    .JsonSearch(c => c.Id, query)
    .OrderByScoreDescending(c => c.Id)
    .Take(5)
    .ToListAsync();

// Option 3: Inline boolean builder
var results = await dbContext.Chunks
    .JsonSearch(c => c.Id, b => b
        .Must(
            ParadeDbJsonQuery.Parse("revenue growth"),
            ParadeDbJsonQuery.Term("DocumentId", documentId),
            ParadeDbJsonQuery.Term("DocumentType", 10)))
    .OrderByScoreDescending(c => c.Id)
    .Take(5)
    .ToListAsync();

SELECT * FROM "Chunks"
WHERE "Id" @@@ '{"boolean":{"must":[
    {"parse":{"query_string":"revenue growth"}},
    {"term":{"field":"DocumentId","value":"..."}},
    {"term":{"field":"DocumentType","value":10}}
]}}'::jsonb
ORDER BY pdb.score("Id") DESC
LIMIT 5

Available query types:

Factory Method	JSON Output
`Parse("query")`	`{"parse":{"query_string":"query"}}`
`Parse("query", lenient, conjunctionMode)`	With optional flags
`Term("field", value)`	`{"term":{"field":"...","value":...}}`
`TermSet("field", values...)`	`{"term_set":{"field":"...","terms":[...]}}`
`Match("value")`	`{"match":{"value":"..."}}`
`Match("value", "field", distance, conjunctionMode)`	With options
`FuzzyTerm("field", "value", distance)`	`{"fuzzy_term":{"field":"...","value":"...","distance":N}}`
`Phrase("field", phrases...)`	`{"phrase":{"field":"...","phrases":[...]}}`
`Phrase("field", slop, phrases...)`	With slop
`PhrasePrefix("field", phrases...)`	`{"phrase_prefix":{"field":"...","phrases":[...]}}`
`Regex("field", "pattern")`	`{"regex":{"field":"...","pattern":"..."}}`
`Range("field", lower, upper, lowerInclusive, upperInclusive)`	Bound objects with included/excluded
`Boost(query, factor)`	`{"boost":{"query":{...},"factor":N}}`
`ConstScore(query, score)`	`{"const_score":{"query":{...},"score":N}}`
`Exists("field")`	`{"exists":{"field":"..."}}`
`All()`	`{"all":null}`
`DisjunctionMax(queries...)`	`{"disjunction_max":{"disjuncts":[...]}}`
`MoreLikeThis(documentId)`	`{"more_like_this":{"key_value":N}}`
`Boolean(b => b.Must(...).Should(...).MustNot(...))`	Boolean combinations
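To show how the JSON shapes in the table nest, the sketch below builds a boolean/must query by hand with System.Text.Json.Nodes. It is an illustration of the output format only: JsonQueryShape is a hypothetical helper, and in application code ParadeDbJsonQuery produces this JSON for you.

```csharp
using System;
using System.Text.Json.Nodes;

public static class JsonQueryShape
{
    // Hand-built equivalent of Boolean(b => b.Must(Parse(...), Term(...))),
    // following the shapes listed in the table. Illustration only.
    public static string BooleanMust(string queryString, string termField, int termValue)
    {
        var query = new JsonObject
        {
            ["boolean"] = new JsonObject
            {
                ["must"] = new JsonArray(
                    new JsonObject
                    {
                        ["parse"] = new JsonObject { ["query_string"] = queryString },
                    },
                    new JsonObject
                    {
                        ["term"] = new JsonObject
                        {
                            ["field"] = termField,
                            ["value"] = termValue,
                        },
                    }),
            },
        };

        // Compact serialization, suitable for a ::jsonb literal.
        return query.ToJsonString();
    }
}
```

Calling BooleanMust("revenue growth", "DocumentType", 10) returns a compact string with the same structure as the jsonb literal in the SQL shown earlier, minus the DocumentId term.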

Combining with LINQ

All search methods compose naturally with standard LINQ:

var results = await dbContext.Articles
    .Where(a => EF.Functions.MatchesFuzzy(a.Content, "transfomers", 2)
                && a.PublishedAt > DateTime.UtcNow.AddMonths(-6))
    .Select(a => new
    {
        a.Title,
        Snippet = EF.Functions.Snippet(a.Content, "<mark>", "</mark>", 200),
        Score = EF.Functions.Score(a.Id)
    })
    .OrderByDescending(a => a.Score)
    .Take(20)
    .ToListAsync();

Contributing

The repo uses CSharpier for formatting and a prek (or pre-commit-compatible) hook bundle for the usual hygiene checks (end-of-file-fixer, trailing-whitespace, markdownlint, codespell, etc.).

dotnet tool restore                 # installs CSharpier locally
prek install -f                     # installs the pre-commit hooks
prek run --all-files                # one-off sweep over the whole repo

CI runs the same checks (CSharpier check + -warnaserror build) in the lint job, so even contributors who skip the local hooks get caught at PR time.

License

MIT

Author

Daniel Oliveira
