daniel3303/ParadeDbEntityFrameworkCore

Equibles.ParadeDB.EntityFrameworkCore

EF Core integration for ParadeDB pg_search — BM25 full-text search indexes on PostgreSQL.

Provides a [Bm25Index] attribute for automatic index creation via migrations, and LINQ-friendly query methods for BM25 search, fuzzy matching, boosting, scoring, snippets, and more. No raw SQL needed.

Requirements

Installation

dotnet add package Equibles.ParadeDB.EntityFrameworkCore

Setup

1. Enable ParadeDB in your DbContext

services.AddDbContext<MyDbContext>(options =>
    options.UseNpgsql(connectionString, npgsql => npgsql.UseParadeDb()));

2. Add BM25 indexes to your entities

using Equibles.ParadeDB.EntityFrameworkCore;

[Bm25Index(nameof(Id), nameof(Title), nameof(Content))]
public class Article
{
    public Guid Id { get; set; }
    public string Title { get; set; }
    public string Content { get; set; }
}

The first parameter is the key field (required by pg_search to identify rows for scoring via pdb.score()), followed by the columns to index for full-text search. The key field is not searchable — it's only used internally by ParadeDB.

3. (Optional) Configure per-column index settings

Place column-level attributes on the indexed properties to tune tokenization, stemming, stopwords, fast storage, and field-specific options. Anything you don't set keeps pg_search's defaults.

using Equibles.ParadeDB.EntityFrameworkCore;

[Bm25Index(nameof(Id), nameof(Title), nameof(Content), nameof(Category), nameof(Rating), nameof(PublishedAt))]
public class Article
{
    public Guid Id { get; set; }

    [Bm25Text(Stemmer = Bm25Language.English, Fast = true)]
    public string Title { get; set; }

    [Bm25Text(Stemmer = Bm25Language.English, Record = Bm25Record.Position)]
    public string Content { get; set; }

    [Bm25Text(Tokenizer = Bm25Tokenizer.Raw, Fast = true)]
    public string Category { get; set; }

    [Bm25Numeric(Fast = true)]
    public int Rating { get; set; }

    [Bm25DateTime(Fast = true)]
    public DateTime PublishedAt { get; set; }
}

| Attribute | Settings |
| --- | --- |
| [Bm25Text] | Tokenizer, MinGram / MaxGram / PrefixOnly (ngram), RegexPattern (regex), Stemmer, StopwordsLanguage, Fast, Record, Indexed, Fieldnorms |
| [Bm25Numeric] | Fast, Indexed |
| [Bm25Boolean] | Fast, Indexed |
| [Bm25DateTime] | Fast, Indexed |
| [Bm25Json] | same as [Bm25Text] plus ExpandDots |

Bm25Tokenizer: Default, Whitespace, Raw, Keyword, SourceCode, Icu, Ngram, Regex, ChineseCompatible, ChineseLindera, JapaneseLindera, KoreanLindera, Jieba. Ngram requires MinGram and MaxGram; Regex requires RegexPattern. Other tokenizers take no parameters.
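
As a hedged illustration of the tokenizer parameter rules, the sketch below configures one Ngram column and one Regex column. The setting names (Tokenizer, MinGram, MaxGram, RegexPattern) come from the settings table above. The Product entity, the gram sizes, and the pattern are hypothetical, and the sketch assumes MinGram/MaxGram are integers and RegexPattern is a string, which this README does not state.

```csharp
using System;
using Equibles.ParadeDB.EntityFrameworkCore;

// Hypothetical entity, used only to illustrate tokenizer parameters.
[Bm25Index(nameof(Id), nameof(Sku), nameof(Tags))]
public class Product
{
    public Guid Id { get; set; }

    // Ngram needs both MinGram and MaxGram (assumed to be integers).
    [Bm25Text(Tokenizer = Bm25Tokenizer.Ngram, MinGram = 2, MaxGram = 4)]
    public string Sku { get; set; }

    // Regex needs RegexPattern (assumed to be a string holding the pattern).
    [Bm25Text(Tokenizer = Bm25Tokenizer.Regex, RegexPattern = @"[^,\s]+")]
    public string Tags { get; set; }
}
```

Leaving MinGram or MaxGram off an Ngram column, or RegexPattern off a Regex column, would violate the requirement stated above; how the library reports that is not covered here.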

Bm25Language — used by both Stemmer and StopwordsLanguage: Arabic, Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Italian, Norwegian, Polish, Portuguese, Romanian, Russian, Spanish, Swedish, Tamil, Turkish.

Bm25Record: Basic (term + frequency), Freq, or Position. Position is required for phrase queries; the others use less disk.

Fast stores the column in columnar format for sorting/faceting/aggregation. Required if you want ORDER BY on this column to use the index.

A property may only have one [Bm25*] attribute, and that property must be listed in the entity's [Bm25Index] columns — orphan attributes throw at model-build time.

PascalCase columns work for the common path, with two exceptions. EF Core's default naming maps public int Id { get; set; } to a column named "Id", and every search operator that references the column directly in the LINQ expression (Matches, MatchesAll, MatchesPhrase, MatchesFuzzy, MatchesTerm, Score, Snippet, Parse, Regex, PhrasePrefix) works fine that way. The two exceptions are MoreLikeThis and JsonSearch with Term-field filters: those functions generate internal SQL inside pg_search that references the column unquoted, so PostgreSQL folds the name to lowercase and the query fails with column "id" does not exist.

If you need either of those features, map the affected columns to snake_case, either project-wide via EFCore.NamingConventions (UseSnakeCaseNamingConvention() on the DbContext options builder) or per-property with [Column("snake_case")]. This is a ParadeDB-side quoting issue; the library can't work around it from the client. It is tracked upstream in paradedb/paradedb#5065, with a fix proposed in paradedb/paradedb#5078. Once that PR is merged and shipped in a ParadeDB release, PascalCase columns will work for MoreLikeThis and JsonSearch/Term too, and this workaround can be dropped.
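
A minimal sketch of the per-property workaround, assuming [Bm25Index] resolves the listed property names to their mapped column names (not confirmed here). [Column] is the standard attribute from System.ComponentModel.DataAnnotations.Schema; the lowercase column names are illustrative.

```csharp
using System;
using System.ComponentModel.DataAnnotations.Schema;
using Equibles.ParadeDB.EntityFrameworkCore;

[Bm25Index(nameof(Id), nameof(Title), nameof(Content))]
public class Article
{
    // Lowercase column names survive PostgreSQL's case folding of
    // unquoted identifiers, which is what the two exceptions trip over.
    [Column("id")]
    public Guid Id { get; set; }

    [Column("title")]
    public string Title { get; set; }

    [Column("content")]
    public string Content { get; set; }
}
```

Only the columns that MoreLikeThis or JsonSearch Term filters touch need this mapping; the rest of the model can keep EF Core's default names.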

4. Create a migration

dotnet ef migrations add AddBm25Index
dotnet ef database update

EF Core will generate the migration automatically, creating:

  • The pg_search PostgreSQL extension
  • A BM25 index on the specified columns with the key_field storage parameter and per-column text_fields / numeric_fields / boolean_fields / datetime_fields / json_fields JSON derived from any [Bm25Text] / [Bm25Numeric] / etc. attributes

Querying

Basic Search

Uses the ||| operator — matches documents containing any of the query terms (OR).

var results = await dbContext.Articles
    .Where(a => EF.Functions.Matches(a.Content, "machine learning"))
    .ToListAsync();

Equivalent SQL:

SELECT * FROM "Articles" WHERE "Content" ||| 'machine learning'

Conjunction Search

Uses the &&& operator — matches documents containing all of the query terms (AND).

var results = await dbContext.Articles
    .Where(a => EF.Functions.MatchesAll(a.Content, "machine learning"))
    .ToListAsync();

Equivalent SQL:

SELECT * FROM "Articles" WHERE "Content" &&& 'machine learning'

Phrase Search

Matches terms in exact order. Slop allows N words between terms or transposition of adjacent terms.

// Exact phrase
var results = await dbContext.Articles
    .Where(a => EF.Functions.MatchesPhrase(a.Content, "neural networks"))
    .ToListAsync();

// Phrase with slop — allows up to 2 words between terms
var results = await dbContext.Articles
    .Where(a => EF.Functions.MatchesPhrase(a.Content, "neural networks", 2))
    .ToListAsync();

Equivalent SQL:

-- Exact phrase
SELECT * FROM "Articles" WHERE "Content" ### 'neural networks'

-- With slop
SELECT * FROM "Articles" WHERE "Content" ### 'neural networks'::pdb.slop(2)

Term Search

Exact token match: the query is NOT tokenized (no stemming or lowercasing). Most tokenizers lowercase the indexed text, so pass the term in lowercase.

// Single term
var results = await dbContext.Articles
    .Where(a => EF.Functions.MatchesTerm(a.Content, "gpu"))
    .ToListAsync();

// Multiple terms (matches any)
var results = await dbContext.Articles
    .Where(a => EF.Functions.MatchesTermSet(a.Content, "gpu", "tpu", "npu"))
    .ToListAsync();

Equivalent SQL:

SELECT * FROM "Articles" WHERE "Content" === 'gpu'
SELECT * FROM "Articles" WHERE "Content" === ARRAY['gpu', 'tpu', 'npu']

Fuzzy Search (Levenshtein Distance)

Tolerates typos by allowing up to N single-character edits (insertions, deletions, substitutions). Max distance is 2.

  • prefix: exempts the initial substring from the edit distance
  • transpositionCostOne: counts swapping two adjacent characters as one edit instead of two

// Basic fuzzy (distance 2)
var results = await dbContext.Articles
    .Where(a => EF.Functions.MatchesFuzzy(a.Content, "machin", 2))
    .ToListAsync();

// Fuzzy with all options
var results = await dbContext.Articles
    .Where(a => EF.Functions.MatchesFuzzy(a.Content, "machin", 2, true, false))
    .ToListAsync();

// Fuzzy AND match
var results = await dbContext.Articles
    .Where(a => EF.Functions.MatchesAllFuzzy(a.Content, "machin lerning", 2))
    .ToListAsync();

// Fuzzy term match
var results = await dbContext.Articles
    .Where(a => EF.Functions.MatchesTermFuzzy(a.Content, "machin", 1))
    .ToListAsync();

Equivalent SQL:

SELECT * FROM "Articles" WHERE "Content" ||| 'machin'::pdb.fuzzy(2)
SELECT * FROM "Articles" WHERE "Content" ||| 'machin'::pdb.fuzzy(2, true, false)
SELECT * FROM "Articles" WHERE "Content" &&& 'machin lerning'::pdb.fuzzy(2)
SELECT * FROM "Articles" WHERE "Content" === 'machin'::pdb.fuzzy(1)
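
To make the distance and transpositionCostOne settings concrete, here is a small standalone sketch of the edit-distance metric itself. It is not how pg_search evaluates fuzzy queries internally; EditDistance is a hypothetical helper written for this illustration, using the textbook dynamic-programming recurrence with an optional adjacent-swap rule.

```csharp
using System;

// Edit distance between two tokens: insertions, deletions and substitutions
// cost 1 each. When transpositionCostOne is true, swapping two adjacent
// characters also costs 1 instead of 2.
static int EditDistance(string a, string b, bool transpositionCostOne)
{
    var d = new int[a.Length + 1, b.Length + 1];
    for (var i = 0; i <= a.Length; i++) d[i, 0] = i;
    for (var j = 0; j <= b.Length; j++) d[0, j] = j;

    for (var i = 1; i <= a.Length; i++)
    {
        for (var j = 1; j <= b.Length; j++)
        {
            var cost = a[i - 1] == b[j - 1] ? 0 : 1;
            var best = Math.Min(
                Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                d[i - 1, j - 1] + cost);

            // Adjacent transposition, e.g. "ru" vs "ur".
            if (transpositionCostOne
                && i > 1 && j > 1
                && a[i - 1] == b[j - 2]
                && a[i - 2] == b[j - 1])
            {
                best = Math.Min(best, d[i - 2, j - 2] + 1);
            }

            d[i, j] = best;
        }
    }

    return d[a.Length, b.Length];
}

Console.WriteLine(EditDistance("machin", "machine", false)); // 1
Console.WriteLine(EditDistance("nerual", "neural", false));  // 2
Console.WriteLine(EditDistance("nerual", "neural", true));   // 1
```

With a maximum distance of 2, "nerual" would still be within range of "neural" either way; the transposition flag matters when the swap would otherwise push a term past the limit.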

Boost

Increases the BM25 relevance weight of a specific search term. Higher boost = higher score for matches on that term. Factor range: -2048 to 2048.

// Boosted OR match
var results = await dbContext.Articles
    .Where(a => EF.Functions.MatchesBoosted(a.Title, "transformers", 2.0))
    .ToListAsync();

// Boosted AND match
var results = await dbContext.Articles
    .Where(a => EF.Functions.MatchesAllBoosted(a.Content, "attention mechanism", 1.5))
    .ToListAsync();

// Combined fuzzy + boost
var results = await dbContext.Articles
    .Where(a => EF.Functions.MatchesFuzzyBoosted(a.Title, "transfomers", 2, 2.0))
    .ToListAsync();

Equivalent SQL:

SELECT * FROM "Articles" WHERE "Title" ||| 'transformers'::pdb.boost(2)
SELECT * FROM "Articles" WHERE "Content" &&& 'attention mechanism'::pdb.boost(1.5)
SELECT * FROM "Articles" WHERE "Title" ||| 'transfomers'::pdb.fuzzy(2)::pdb.boost(2)

BM25 Scoring

BM25 (Best Matching 25) ranks documents by relevance considering term frequency, inverse document frequency, and document length.

var results = await dbContext.Articles
    .Where(a => EF.Functions.Matches(a.Content, "deep learning"))
    .Select(a => new
    {
        a.Title,
        Score = EF.Functions.Score(a.Id)
    })
    .OrderByDescending(a => a.Score)
    .Take(10)
    .ToListAsync();

Equivalent SQL:

SELECT "Title", pdb.score("Id") AS "Score"
FROM "Articles"
WHERE "Content" ||| 'deep learning'
ORDER BY pdb.score("Id") DESC
LIMIT 10
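
For intuition about how term frequency, inverse document frequency, and document length interact, the sketch below computes the textbook BM25 contribution of a single query term. It is an illustration only: pg_search computes scores inside its search engine, and the Bm25Term helper, the IDF variant, and the k1 = 1.2 / b = 0.75 defaults are assumptions of this example, not values taken from the library.

```csharp
using System;

// Textbook BM25 score for one query term in one document.
// k1 controls term-frequency saturation, b controls length normalisation.
static double Bm25Term(
    int termFreq, int docLength, double avgDocLength,
    int docCount, int docsWithTerm,
    double k1 = 1.2, double b = 0.75)
{
    // Inverse document frequency: terms found in fewer documents weigh more.
    var idf = Math.Log(1 + (docCount - docsWithTerm + 0.5) / (docsWithTerm + 0.5));

    // Longer-than-average documents get a larger denominator, so a lower score.
    var lengthNorm = k1 * (1 - b + b * docLength / avgDocLength);

    return idf * termFreq * (k1 + 1) / (termFreq + lengthNorm);
}

// Same term frequency, different document lengths: the shorter document wins.
var shortDoc = Bm25Term(termFreq: 2, docLength: 50, avgDocLength: 100, docCount: 1000, docsWithTerm: 10);
var longDoc = Bm25Term(termFreq: 2, docLength: 200, avgDocLength: 100, docCount: 1000, docsWithTerm: 10);
Console.WriteLine(shortDoc > longDoc); // True

// Same document, rare term vs common term: the rare term wins.
var rare = Bm25Term(termFreq: 2, docLength: 100, avgDocLength: 100, docCount: 1000, docsWithTerm: 10);
var common = Bm25Term(termFreq: 2, docLength: 100, avgDocLength: 100, docCount: 1000, docsWithTerm: 500);
Console.WriteLine(rare > common); // True
```

A full query score is the sum of this quantity over the query terms, which is why Score is only meaningful alongside a search predicate in the same query.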

Snippets

Returns text excerpts with matched terms highlighted using configurable HTML tags.

// Basic snippet (default highlighting)
var results = await dbContext.Articles
    .Where(a => EF.Functions.Matches(a.Content, "neural networks"))
    .Select(a => new
    {
        a.Title,
        Snippet = EF.Functions.Snippet(a.Content)
    })
    .ToListAsync();

// Parameterized snippet (custom tags and length)
var results = await dbContext.Articles
    .Where(a => EF.Functions.Matches(a.Content, "neural networks"))
    .Select(a => new
    {
        a.Title,
        Snippet = EF.Functions.Snippet(a.Content, "<b>", "</b>", 100)
    })
    .ToListAsync();

// Multiple snippets
var results = await dbContext.Articles
    .Where(a => EF.Functions.Matches(a.Content, "neural networks"))
    .Select(a => new
    {
        a.Title,
        Snippets = EF.Functions.Snippets(a.Content, 15, 5, 0)
    })
    .ToListAsync();

Equivalent SQL:

SELECT "Title", pdb.snippet("Content") AS "Snippet" FROM "Articles" WHERE ...
SELECT "Title", pdb.snippet("Content", start_tag => '<b>', end_tag => '</b>', max_num_chars => 100) AS "Snippet" FROM "Articles" WHERE ...
SELECT "Title", pdb.snippets("Content", max_num_chars => 15, "limit" => 5, "offset" => 0) AS "Snippets" FROM "Articles" WHERE ...

Parse Query (Tantivy Syntax)

Full query parser supporting field:value, boolean operators (AND/OR/NOT), ranges (rating:>3), and wildcards.

  • lenient: ignores syntax errors
  • conjunctionMode: defaults terms to AND instead of OR

// Basic parse query
var results = await dbContext.Articles
    .Where(a => EF.Functions.Parse(a.Id, "title:transformers AND content:attention"))
    .ToListAsync();

// With options
var results = await dbContext.Articles
    .Where(a => EF.Functions.Parse(a.Id, "transformers attention", true, true))
    .ToListAsync();

Equivalent SQL:

SELECT * FROM "Articles" WHERE "Id" @@@ pdb.parse('title:transformers AND content:attention')
SELECT * FROM "Articles" WHERE "Id" @@@ pdb.parse('transformers attention', lenient => TRUE, conjunction_mode => TRUE)

Regex Search

Matches indexed tokens against a regular expression (Rust regex syntax).

var results = await dbContext.Articles
    .Where(a => EF.Functions.Regex(a.Content, "neuro.*"))
    .ToListAsync();

Equivalent SQL:

SELECT * FROM "Articles" WHERE "Content" @@@ pdb.regex('neuro.*')

Phrase Prefix

Matches a phrase where the last term is treated as a prefix — useful for autocomplete/type-ahead.

// Basic phrase prefix
var results = await dbContext.Articles
    .Where(a => EF.Functions.PhrasePrefix(a.Content, "running", "sh"))
    .ToListAsync();

// With max expansions
var results = await dbContext.Articles
    .Where(a => EF.Functions.PhrasePrefix(a.Content, 10, "running", "sh"))
    .ToListAsync();

Equivalent SQL:

SELECT * FROM "Articles" WHERE "Content" @@@ pdb.phrase_prefix(ARRAY['running', 'sh'])
SELECT * FROM "Articles" WHERE "Content" @@@ pdb.phrase_prefix(ARRAY['running', 'sh'], max_expansion => 10)

More Like This

Finds documents similar to a given document by analyzing its indexed terms.

// Find similar to document with ID 3
var results = await dbContext.Articles
    .Where(a => EF.Functions.MoreLikeThis(a.Id, 3))
    .ToListAsync();

// Restrict similarity analysis to specific fields
var results = await dbContext.Articles
    .Where(a => EF.Functions.MoreLikeThis(a.Id, 3, "description"))
    .ToListAsync();

Equivalent SQL:

SELECT * FROM "Articles" WHERE "Id" @@@ pdb.more_like_this(3)
SELECT * FROM "Articles" WHERE "Id" @@@ pdb.more_like_this(3, ARRAY['description'])

JSON Query Search

For complex queries combining full-text search with term filters, use ParadeDbJsonQuery to build structured JSON queries. This translates to the @@@ operator with ::jsonb cast.

// Build a boolean query combining parse + term filters
var query = ParadeDbJsonQuery.Boolean(b => b
    .Must(
        ParadeDbJsonQuery.Parse("revenue growth"),
        ParadeDbJsonQuery.Term("DocumentId", documentId),
        ParadeDbJsonQuery.Term("DocumentType", 10)));

// Option 1: Using EF.Functions directly
var results = await dbContext.Chunks
    .Where(c => EF.Functions.JsonSearch(c.Id, query.ToJson()))
    .OrderByDescending(c => EF.Functions.Score(c.Id))
    .Take(5)
    .ToListAsync();

// Option 2: Using IQueryable extensions
var results = await dbContext.Chunks
    .JsonSearch(c => c.Id, query)
    .OrderByScoreDescending(c => c.Id)
    .Take(5)
    .ToListAsync();

// Option 3: Inline boolean builder
var results = await dbContext.Chunks
    .JsonSearch(c => c.Id, b => b
        .Must(
            ParadeDbJsonQuery.Parse("revenue growth"),
            ParadeDbJsonQuery.Term("DocumentId", documentId),
            ParadeDbJsonQuery.Term("DocumentType", 10)))
    .OrderByScoreDescending(c => c.Id)
    .Take(5)
    .ToListAsync();

Equivalent SQL:

SELECT * FROM "Chunks"
WHERE "Id" @@@ '{"boolean":{"must":[
    {"parse":{"query_string":"revenue growth"}},
    {"term":{"field":"DocumentId","value":"..."}},
    {"term":{"field":"DocumentType","value":10}}
]}}'::jsonb
ORDER BY pdb.score("Id") DESC
LIMIT 5

Available query types:

| Factory Method | JSON Output |
| --- | --- |
| Parse("query") | {"parse":{"query_string":"query"}} |
| Parse("query", lenient, conjunctionMode) | With optional flags |
| Term("field", value) | {"term":{"field":"...","value":...}} |
| TermSet("field", values...) | {"term_set":{"field":"...","terms":[...]}} |
| Match("value") | {"match":{"value":"..."}} |
| Match("value", "field", distance, conjunctionMode) | With options |
| FuzzyTerm("field", "value", distance) | {"fuzzy_term":{"field":"...","value":"...","distance":N}} |
| Phrase("field", phrases...) | {"phrase":{"field":"...","phrases":[...]}} |
| Phrase("field", slop, phrases...) | With slop |
| PhrasePrefix("field", phrases...) | {"phrase_prefix":{"field":"...","phrases":[...]}} |
| Regex("field", "pattern") | {"regex":{"field":"...","pattern":"..."}} |
| Range("field", lower, upper, lowerInclusive, upperInclusive) | Bound objects with included/excluded |
| Boost(query, factor) | {"boost":{"query":{...},"factor":N}} |
| ConstScore(query, score) | {"const_score":{"query":{...},"score":N}} |
| Exists("field") | {"exists":{"field":"..."}} |
| All() | {"all":null} |
| DisjunctionMax(queries...) | {"disjunction_max":{"disjuncts":[...]}} |
| MoreLikeThis(documentId) | {"more_like_this":{"key_value":N}} |
| Boolean(b => b.Must(...).Should(...).MustNot(...)) | Boolean combinations |
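
The factory methods can be nested inside a Boolean query. The sketch below combines Must, Should, and MustNot with Range, Boost, Phrase, and Term, using only the method shapes listed in the table. The field names and values are hypothetical, the namespace that holds ParadeDbJsonQuery is assumed, and the argument types (integer bounds for Range, a floating-point factor for Boost) are guesses that may need adjusting to the real signatures.

```csharp
using System;
using Equibles.ParadeDB.EntityFrameworkCore; // assumed namespace for ParadeDbJsonQuery

// Must: every clause has to match.
// Should: optional clauses that raise the score when they match.
// MustNot: matching documents are excluded.
var query = ParadeDbJsonQuery.Boolean(b => b
    .Must(
        ParadeDbJsonQuery.Parse("revenue growth"),
        // rating >= 3 and rating < 5 (lower bound inclusive, upper exclusive)
        ParadeDbJsonQuery.Range("rating", 3, 5, true, false))
    .Should(
        ParadeDbJsonQuery.Boost(
            ParadeDbJsonQuery.Phrase("content", "quarterly", "report"),
            2.0))
    .MustNot(
        ParadeDbJsonQuery.Term("document_type", 10)));

// Inspect the JSON that would be sent with the @@@ operator.
Console.WriteLine(query.ToJson());
```

Printing ToJson() before wiring the query into JsonSearch is a cheap way to check that the structure matches what you intended. The lowercase field names here follow the snake_case advice given earlier for Term filters.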

Combining with LINQ

All search methods compose naturally with standard LINQ:

var results = await dbContext.Articles
    .Where(a => EF.Functions.MatchesFuzzy(a.Content, "transfomers", 2)
                && a.CreatedAt > DateTime.UtcNow.AddMonths(-6))
    .Select(a => new
    {
        a.Title,
        Snippet = EF.Functions.Snippet(a.Content, "<mark>", "</mark>", 200),
        Score = EF.Functions.Score(a.Id)
    })
    .OrderByDescending(a => a.Score)
    .Take(20)
    .ToListAsync();

Contributing

The repo uses CSharpier for formatting and a prek (or pre-commit-compatible) hook bundle for the usual hygiene checks (end-of-file-fixer, trailing-whitespace, markdownlint, codespell, etc.).

dotnet tool restore                 # installs CSharpier locally
prek install -f                     # installs the pre-commit hooks
prek run --all-files                # one-off sweep over the whole repo

CI runs the same checks (CSharpier check + -warnaserror build) in the lint job, so even contributors who skip the local hooks get caught at PR time.

License

MIT

Author

Daniel Oliveira
