Decouple IngestionDocumentReader from IngestionPipeline constructor #7454

Draft
Copilot wants to merge 1 commit into data-ingestion-preview2 from copilot/extend-ingestionpipeline-with-new-method

Copilot AI commented Apr 7, 2026

IngestionPipeline<T> required an IngestionDocumentReader at construction time, making it impossible to use with in-memory or programmatically created documents.

Changes

  • IngestionPipeline<T> constructor: removed the reader parameter; the pipeline now requires only a chunker and a writer
  • New overload: ProcessAsync(IAsyncEnumerable<IngestionDocument>, CancellationToken) processes documents directly, with no file-system dependency
  • File-system overloads: ProcessAsync(IngestionDocumentReader, DirectoryInfo, ...) and ProcessAsync(IngestionDocumentReader, IEnumerable<FileInfo>, ...) now take the reader as a mandatory first argument
  • DiagnosticsConstants: added ProcessDocuments / ProcessDocument activity names for tracing the new overload
  • Tests: updated existing tests for the new signatures; added CanProcessDocumentsWithoutReader, which demonstrates direct document ingestion
  • Template + snapshots: updated DataIngestor.cs and all 5 integration-test snapshots
  • READMEs / CHANGELOG: updated the Microsoft.Extensions.DataIngestion, MarkItDown, and Markdig docs

Usage without a reader

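// CreateChunker() and CreateWriter() stand in for whatever chunker and writer the application uses.
using IngestionPipeline<string> pipeline = new(CreateChunker(), CreateWriter());

// Build a document entirely in memory; no reader or file is involved.
IngestionDocument document = new("my-doc-id");
document.Sections.Add(new IngestionDocumentSection());
document.Sections[0].Elements.Add(new IngestionDocumentParagraph("In-memory content."));

// ToAsyncEnumerable() adapts the array to the IAsyncEnumerable<IngestionDocument> the new overload expects.
await foreach (IngestionResult result in pipeline.ProcessAsync(new[] { document }.ToAsyncEnumerable()))
{
    Console.WriteLine($"{result.DocumentId}: {result.Succeeded}");
}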

Usage with a reader (file system)

IngestionDocumentReader reader = new MarkdownReader();
using IngestionPipeline<string> pipeline = new(CreateChunker(), CreateWriter());

// The reader is now passed per call, as the first argument, rather than to the constructor.
await foreach (IngestionResult result in pipeline.ProcessAsync(reader, new DirectoryInfo("docs"), "*.md"))
{
    Console.WriteLine($"{result.DocumentId}: {result.Succeeded}");
}

…(IAsyncEnumerable<IngestionDocument>) overload

Agent-Logs-Url: https://github.com/dotnet/extensions/sessions/4dc3f0c2-40aa-445e-9392-fa3e254d2d05

Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
@github-actions (bot) added the area-ai-templates (Microsoft.Extensions.AI.Templates) label on Apr 7, 2026
@adamsitnik added the area-data-ingestion label and removed the area-ai-templates (Microsoft.Extensions.AI.Templates) label on Apr 7, 2026
@adamsitnik added this to the Data Ingestion Preview 2 milestone on Apr 7, 2026