Document read_auto operator #366

Merged
mavam merged 12 commits into main from topic/read-auto on Jun 16, 2026

Conversation

mavam commented May 28, 2026 · Member

🔍 Problem

  • read_auto adds user-facing automatic reader detection, but docs.tenzir.com has no operator reference for it.

🛠️ Solution

  • Add a read_auto reference page with strict detection behavior, fallbacks, probe limits, and examples.
  • Add read_auto to the operator reference index.

💬 Review

⚙️ Code PR: tenzir/tenzir#6191

github-actions (bot) added the reference (Reference documentation) label May 28, 2026
github-actions (bot) commented May 28, 2026 · Contributor

📦 Preview  ·  View →  ·  ⚪ Removed

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4d7d284ceb

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/content/docs/reference/operators/read_auto.mdx Outdated
mavam added 5 commits June 6, 2026 10:23
Add the reference entry for automatic reader detection, including strict detection behavior, fallback modes, probe limits, and examples.

Assisted-by: GPT-5 (pi)
State that fallback=all chooses text or binary mode from the current probe bytes, not from the entire stream. Point users whose binary payloads start with a valid UTF-8 prefix to a larger probe or to read_all's binary mode directly.

Assisted-by: GPT-5 (pi)
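The text-or-binary choice for `fallback=all` can be pictured with a minimal Python sketch. It assumes that "text" simply means the probe decodes as UTF-8; the function name `choose_all_mode` is hypothetical and not part of Tenzir, whose actual check may differ.

```python
def choose_all_mode(probe: bytes) -> str:
    """Hypothetical sketch of fallback=all mode selection.

    Assumption: the probe counts as text when it decodes as UTF-8.
    Only the probe bytes are inspected, never the rest of the stream.
    """
    try:
        probe.decode("utf-8")
    except UnicodeDecodeError:
        return "binary"
    return "text"


# A binary payload whose first bytes happen to be valid UTF-8 is
# classified as text if the probe ends before the binary part starts.
payload = b"MAGIC-HEADER\n" + bytes([0xFF, 0xFE, 0x00])
print(choose_all_mode(payload[:13]))  # → text
print(choose_all_mode(payload))       # → binary
```

Because only the probe is inspected, a larger probe (or choosing read_all's binary mode up front) is what avoids misclassifying such payloads.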
Replace the invalid load snippets with from_file subpipelines, matching the documented file-reading syntax for parsing byte streams.

Assisted-by: GPT-5 (pi)
Document the default probe limit as 1Mi to match the TQL spelling users can configure.

Assisted-by: GPT-5 (pi)
Add guide examples for rapid prototyping, mixed file drops, and TCP intake endpoints that accept several input formats. Expand the operator reference with guidance about when to choose automatic detection versus a concrete reader.

Assisted-by: ChatGPT (pi)
mavam force-pushed the topic/read-auto branch from 0067154 to f5c39ac June 6, 2026 08:26
github-actions (bot) added the guide (How-to guides) label Jun 6, 2026

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f5c39acde6

Comment thread src/content/docs/reference/operators/read_auto.mdx
mavam added 2 commits June 6, 2026 10:31
Document that fallback selection waits until the probe is final. This makes the long-lived stream behavior explicit and points users to lower probe limits or concrete readers when they need immediate text parsing.

Assisted-by: GPT-5 (Codex)
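Why fallback selection happens only once the probe is final, and why a long-lived stream can stall behind it, is easiest to see in a simplified probing loop. This Python sketch is hypothetical: `select_reader` and the `detect` callback are illustrative stand-ins, and it assumes a detector may commit as soon as it recognizes the probe, while a fallback is chosen only after the probe reaches its limit (1Mi by default) or the input ends.

```python
from typing import Callable, Iterable, Optional, Tuple

PROBE_LIMIT = 1024 * 1024  # 1Mi, the documented default probe limit


def select_reader(
    chunks: Iterable[bytes],
    detect: Callable[[bytes], Optional[str]],
    fallback: Optional[str],
    probe_limit: int = PROBE_LIMIT,
) -> Tuple[str, bytes]:
    """Hypothetical: return the chosen reader and the buffered probe to replay."""
    probe = b""
    for chunk in chunks:
        probe += chunk  # may overshoot the limit by up to one chunk
        reader = detect(probe)
        if reader is not None:
            return reader, probe  # a detector recognized the input early
        if len(probe) >= probe_limit:
            break  # the probe is final because it hit the limit
    # Reaching this point means the probe is final: limit hit or input ended.
    if fallback is None:
        raise ValueError("no reader detected and no fallback configured")
    return fallback, probe
```

On a plain-text TCP connection that trickles in a few lines at a time, this loop keeps buffering until a full 1Mi probe accumulates before the fallback applies, which is the situation where a lower probe limit or a concrete reader such as read_lines helps.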
Describe the two detection layers in the description: capability via
dry runs of the actual parsers, and a specificity order that picks the
most precise format among capable readers. Document that SSV and
PRI-less Syslog never auto-detect, and that output keeps the selected
reader's schema name.

Assisted-by: Fable 5 (Claude Code)
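The two detection layers described in this commit, capability through dry runs and a specificity order among capable readers, can be modeled in a short Python sketch. Everything here is hypothetical: `Candidate`, the integer `specificity` ranks, and the `dry_run` predicates stand in for running the real parsers, and treating a tie at the top as ambiguous (returning no reader) is an assumption about strict detection, not documented behavior.

```python
import json
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    name: str                         # reader name (illustrative)
    specificity: int                  # hypothetical rank: higher is more precise
    dry_run: Callable[[bytes], bool]  # stand-in for a parser dry run


def pick_reader(probe: bytes, candidates: List[Candidate]) -> Optional[str]:
    # Layer 1: capability. Keep only readers whose dry run accepts the probe.
    capable = [c for c in candidates if c.dry_run(probe)]
    if not capable:
        return None  # unknown input: left to an explicit fallback
    # Layer 2: specificity. Prefer the most precise capable format.
    capable.sort(key=lambda c: c.specificity, reverse=True)
    if len(capable) > 1 and capable[0].specificity == capable[1].specificity:
        return None  # assumption: an unresolved tie counts as ambiguous
    return capable[0].name


def parses_as_json(probe: bytes) -> bool:
    try:
        json.loads(probe)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return False
    return True


# Toy predicates: a naive "contains a comma" check stands in for a CSV dry run.
readers = [
    Candidate("read_json", 3, parses_as_json),
    Candidate("read_csv", 1, lambda p: b"," in p),
]
print(pick_reader(b'{"a": 1, "b": 2}', readers))  # → read_json
print(pick_reader(b"a,b\n1,2\n", readers))         # → read_csv
```

The naive CSV check also accepts the JSON probe, so it is the specificity order, not the capability check, that decides between them.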
chatgpt-codex-connector (bot)

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

Document that YAML auto-detection requires a map document that read_yaml would turn into an event.

Assisted-by: GPT-5 Codex (OpenAI Codex)

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f9f322198c

Comment thread src/content/docs/guides/collecting/get-data-from-the-network.mdx Outdated
Set a smaller probe limit in the TCP read_auto example and point known long-lived plain-text streams to read_lines directly.

Assisted-by: GPT-5 Codex (OpenAI Codex)
mavam added a commit to tenzir/tenzir that referenced this pull request Jun 16, 2026
## 🔍 Problem

- Users currently need to choose a concrete reader up front, even when
the input format is obvious from the first bytes.
- Generic readers such as `read_lines` and `read_all` are too weak to be
safe defaults for automatic parsing.

## 🛠️ Solution

- Add `read_auto` as a strict detector-driven reader selector for chunk
input.
- Add detector variants for the first supported JSON, text-line,
delimited, and magic-byte formats.
- Require an explicit `fallback="lines"` or `fallback="all"` for unknown
input.
- Add a read-detection extension point for reader plugins.
- Add a changelog entry and focused integration coverage.

## 💬 Review

- Focus on detector precedence and ambiguity behavior, especially JSON
object vs NDJSON and explicit fallbacks.
- Verified with `scripts/build.sh tenzir-unit-test` and `uvx tenzir-test --root test --match read_auto`.

<sub>
📚 Docs PR: tenzir/docs#366
</sub>
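The JSON object vs NDJSON ambiguity that the review asks about shows up even with a naive check. In this hypothetical Python sketch, `json_shapes` reports which of the two shapes could parse a text probe, assuming both expect top-level objects. A pretty-printed object fits only JSON, several one-line objects fit only NDJSON, and a single one-line object fits both, which is where detector precedence has to decide. How Tenzir resolves that case is not spelled out here.

```python
import json
from typing import Set


def json_shapes(probe: str) -> Set[str]:
    """Hypothetical: report whether a JSON and/or NDJSON reader could parse probe."""
    shapes = set()
    # Whole probe as a single JSON object (for example, pretty-printed).
    try:
        if isinstance(json.loads(probe), dict):
            shapes.add("json")
    except json.JSONDecodeError:
        pass
    # Every non-empty line as its own JSON object.
    lines = [line for line in probe.splitlines() if line.strip()]
    try:
        if lines and all(isinstance(json.loads(line), dict) for line in lines):
            shapes.add("ndjson")
    except json.JSONDecodeError:
        pass
    return shapes


print(sorted(json_shapes('{\n  "a": 1\n}')))       # → ['json']
print(sorted(json_shapes('{"a": 1}\n{"b": 2}\n')))  # → ['ndjson']
print(sorted(json_shapes('{"a": 1}')))              # → ['json', 'ndjson']
```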
mavam merged commit f7bfc55 into main Jun 16, 2026
5 checks passed
mavam deleted the topic/read-auto branch June 16, 2026 14:34

Labels

guide How-to guides reference Reference documentation

1 participant