diff --git a/src/content/docs/guides/collecting/get-data-from-the-network.mdx b/src/content/docs/guides/collecting/get-data-from-the-network.mdx
index 661211b96..5e8ef8105 100644
--- a/src/content/docs/guides/collecting/get-data-from-the-network.mdx
+++ b/src/content/docs/guides/collecting/get-data-from-the-network.mdx
@@ -27,6 +27,22 @@ pipeline to convert incoming bytes to events. Inside the nested pipeline,
`$peer.ip` and `$peer.port` identify the connecting client. Set
`resolve_hostnames=true` to also expose `$peer.hostname` from reverse DNS.
+### Accept multiple input formats
+
+Use read_auto when a TCP endpoint receives data from producers that
+don't all use the same format:
+
+```tql
+accept_tcp "0.0.0.0:9000" {
+ read_auto fallback="lines"
+}
+```
+
+The detector runs once per connection, so one client can send NDJSON while
+another sends CSV, Syslog, or another supported format. This pattern is useful
+for rapid prototyping, shared intake endpoints, and package pipelines where you
+want to normalize different producer formats after parsing.
+
### Connect to a remote server
Use from_tcp to connect to an existing server:
@@ -223,6 +239,10 @@ select
## See also
+- accept_tcp
+- accept_udp
+- from_nic
+- read_auto
- tcp
- udp
- nic
diff --git a/src/content/docs/guides/collecting/read-and-watch-files.mdx b/src/content/docs/guides/collecting/read-and-watch-files.mdx
index 20105cb8d..dd6c19296 100644
--- a/src/content/docs/guides/collecting/read-and-watch-files.mdx
+++ b/src/content/docs/guides/collecting/read-and-watch-files.mdx
@@ -46,6 +46,19 @@ from_file "/path/to/file.log" {
The parsing pipeline runs on the file content and must return events.
+When file names or extensions don't identify the format reliably, use
+read_auto to detect the format from the file content:
+
+```tql
+from_file "/dropzone/**" {
+ read_auto fallback="lines"
+}
+```
+
+This pattern helps with upload directories, partner file drops, and rapid
+prototyping with mixed sample files. Keep the default `fallback="none"` if
+unknown formats should fail instead of becoming line-oriented text.
+
## Directory processing
You can process multiple files efficiently using glob patterns. This section
diff --git a/src/content/docs/guides/parsing/parse-delimited-text.mdx b/src/content/docs/guides/parsing/parse-delimited-text.mdx
index 04d8420d5..b0ee23cf0 100644
--- a/src/content/docs/guides/parsing/parse-delimited-text.mdx
+++ b/src/content/docs/guides/parsing/parse-delimited-text.mdx
@@ -10,6 +10,24 @@ The examples use from_file with a [parsing
subpipeline](/reference/programs#parsing-subpipelines) to illustrate each
technique.
+## Auto-detect structured text
+
+Use read_auto when you don't know the text format yet, or when you want
+to prototype against sample files before choosing a concrete reader. It detects
+common structured text formats such as NDJSON, CSV, TSV, key-value text, YAML,
+Syslog, CEF, and LEEF from the first bytes of the stream:
+
+```tql
+from_file "sample.log" {
+ read_auto fallback="lines"
+}
+```
+
+With `fallback="lines"`, unsupported UTF-8 input still becomes one event per
+line. Keep the default `fallback="none"` when unknown formats should fail fast.
+After you know the exact format, switch to the concrete reader when you need
+format-specific options.
+
## Split on newlines
Use read_lines to split a byte stream on newline characters. Given this
@@ -250,6 +268,7 @@ from_file "syslog.txt" {
## See also
+- read_auto
- parsing/parse-binary-data
- parsing/parse-string-fields
- collecting/read-and-watch-files
diff --git a/src/content/docs/reference/operators.mdx b/src/content/docs/reference/operators.mdx
index 47536eb36..0c00024c0 100644
--- a/src/content/docs/reference/operators.mdx
+++ b/src/content/docs/reference/operators.mdx
@@ -531,6 +531,10 @@ operators:
description: 'Parses an incoming bytes stream into a single event.'
example: 'read_all binary=true'
path: 'reference/operators/read_all'
+ - name: 'read_auto'
+ description: 'Detects the input format of a byte stream and selects a matching reader.'
+ example: 'read_auto fallback="lines"'
+ path: 'reference/operators/read_auto'
- name: 'read_bitz'
description: 'Parses bytes as *BITZ* format.'
example: 'read_bitz'
@@ -2231,6 +2235,14 @@ read_all binary=true
+
+
+```tql
+read_auto fallback="lines"
+```
+
+
+
```tql
diff --git a/src/content/docs/reference/operators/read_auto.mdx b/src/content/docs/reference/operators/read_auto.mdx
new file mode 100644
index 000000000..bfa2100e6
--- /dev/null
+++ b/src/content/docs/reference/operators/read_auto.mdx
@@ -0,0 +1,187 @@
+---
+title: read_auto
+category: Parsing
+example: 'read_auto fallback="lines"'
+---
+
+Detects the input format of a byte stream and selects a matching reader.
+
+```tql
+read_auto [fallback=string, max_probe_bytes=uint]
+```
+
+## Description
+
+The `read_auto` operator buffers the first bytes of its input as a probe and
+asks every reader whether it can parse them. Use it when the input format is
+unknown at authoring time but is still one of Tenzir's structured formats.
+It selects a reader in three steps:
+
+1. Probe the first bytes of the input, up to `max_probe_bytes`.
+2. Dry-run every reader's parser on the probe to find capable readers.
+3. Start the most specific capable reader. Without a capable reader, use the
+ `fallback` reader or fail; when two formats are equally specific, fail
+ with an ambiguity error.
+
+Detection works in two layers:
+
+1. **Capability**: Every reader dry-runs its actual parser on the probe. For
+ example, YAML detection runs the YAML parser and requires a structured
+ document, and CSV detection tokenizes complete lines with the reader's
+ quoting rules and requires a stable number of fields. A reader only
+ becomes a candidate when it would accept the probed input.
+2. **Specificity**: When several readers are capable of parsing the same
+ bytes, the most specific format wins. Magic-byte formats such as PCAP or
+ Parquet rank above JSON dialects such as Suricata EVE or GELF, which rank
+ above generic NDJSON, which ranks above key-value, delimited, Syslog, and
+ YAML input. For example, a GELF stream is also valid NDJSON, but the GELF
+ reader wins because it describes the input more precisely.
+
+Detection is strict by default. If no reader is capable, or if two formats
+with equal specificity match the same probe, `read_auto` emits an error
+instead of guessing. A reader that needs more evidence delays the decision
+until more input arrives, the input ends, or the probe reaches
+`max_probe_bytes`. Once a single best candidate exists, `read_auto` starts
+that reader, replays the buffered bytes, and forwards the rest of the stream
+unchanged.
+
+The built-in detectors cover common JSON, delimited text, security log, and
+magic-byte formats, including NDJSON, JSON objects, JSON arrays of objects,
+CSV, TSV, key-value text, YAML, Syslog, CEF, LEEF, Zeek TSV, Suricata EVE
+JSON, Zeek JSON, GELF, PCAP, Feather, BITZ, and Parquet. Formats that accept
+nearly arbitrary text never participate in detection: space-separated values
+look like prose, so select read_ssv explicitly, and Syslog messages
+without a `<PRI>` prefix look like free-form text, so they only match via
+`fallback`.
+
+The output uses the schema name that the selected reader would normally assign.
+For example, detected CSV input produces the same schema name as
+read_csv, and detected NDJSON input produces the same schema name as
+read_ndjson. Inspect `@name` to see the schema name. `read_auto` does
+not add a separate field with the detected format.
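+
+As a minimal sketch, you could copy the schema name into a regular field after
+parsing to see which reader `read_auto` selected. The field name
+`detected_schema` is a hypothetical choice for illustration, and the sketch
+assumes that `@name` can be read in an assignment:
+
+```tql
+from_file "sample.log" {
+  read_auto fallback="lines"
+}
+// Hypothetical field name; holds the schema name of the selected reader.
+detected_schema = @name
+```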
+
+Use `read_auto` for exploratory pipelines where you want to try sample data
+quickly, for file drops where names don't reliably encode the format, and for
+multi-format ingestion endpoints. For example, accept_tcp can run
+`read_auto` per connection so one client sends NDJSON while another sends CSV,
+Syslog, or another supported format.
+
+Prefer a concrete reader when you already know the format or need reader-specific
+options such as `unflatten_separator` for read_ndjson. `read_auto`
+selects the reader once for each byte stream and expects the remaining bytes in
+that stream to use the same format.
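+
+For example, once you know that a source sends NDJSON with dotted keys, a
+pipeline along these lines would swap in the concrete reader. The file name and
+the `"."` separator are illustrative assumptions:
+
+```tql
+from_file "events.ndjson" {
+  // Concrete reader with a reader-specific option that read_auto lacks.
+  read_ndjson unflatten_separator="."
+}
+```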
+
+### `fallback = string (optional)`
+
+Controls what happens when no detector matches.
+
+Valid values are:
+
+- `"none"`: Emit an error. This is the default.
+- `"lines"`: Use read_lines. The input must be valid UTF-8.
+- `"all"`: Use read_all. `read_auto` uses the current probe to
+ choose between text and binary output: probe bytes that are valid UTF-8
+ select `read_all`, while probe bytes that are not valid UTF-8 select
+ `read_all binary=true`. If binary input can start with a valid UTF-8
+ prefix longer than `max_probe_bytes`, use a larger probe limit or
+ read_all with `binary=true` directly.
+
+`read_auto` uses a fallback only after the probe is final, either because the
+input ended or because the probe reached `max_probe_bytes`. For long-lived
+streams with unknown plain-text input, lower `max_probe_bytes` to reduce startup
+latency or use read_lines directly.
+
+### `max_probe_bytes = uint (optional)`
+
+The maximum number of bytes to inspect before forcing a detection decision.
+
+Defaults to `1Mi` bytes.
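+
+A minimal sketch of a lower probe limit for a long-lived stream. The value
+`4096` is an arbitrary illustration, not a recommendation:
+
+```tql
+accept_tcp "0.0.0.0:9000" {
+  // Force a decision after 4096 probe bytes; use read_lines if nothing matches.
+  read_auto fallback="lines", max_probe_bytes=4096
+}
+```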
+
+## Examples
+
+### Detect JSON lines
+
+Given this input:
+
+```json title="events.ndjson"
+{"x":1}
+{"x":2}
+```
+
+Use `read_auto` where you would normally use a concrete reader:
+
+```tql
+from_file "events.ndjson" {
+ read_auto
+}
+```
+
+```tql
+{x: 1}
+{x: 2}
+```
+
+### Fall back to lines
+
+For arbitrary UTF-8 text, opt into line-based parsing explicitly:
+
+```txt title="messages.txt"
+hello
+world
+```
+
+```tql
+from_file "messages.txt" {
+ read_auto fallback="lines"
+}
+```
+
+```tql
+{line: "hello"}
+{line: "world"}
+```
+
+### Fall back to a single event
+
+Use `fallback="all"` when unknown input should become one event instead of one
+event per line:
+
+```tql
+from_file "payload.bin" {
+ read_auto fallback="all"
+}
+```
+
+If the input is binary, the resulting event contains a `blob` value in the
+`data` field.
+
+### Accept multiple formats over TCP
+
+Use `read_auto` in a network listener when the endpoint accepts producers with
+different formats:
+
+```tql
+accept_tcp "0.0.0.0:9000" {
+ read_auto fallback="lines"
+}
+```
+
+The detector runs separately for each connection. This makes the pattern useful
+for rapid prototyping, intake endpoints shared by several teams, and package
+pipelines that normalize data only after the parser has selected the input
+format.
+
+## See Also
+
+- accept_tcp
+- from_file
+- read_all
+- read_csv
+- read_json
+- read_lines
+- read_ndjson
+- read_syslog
+- read_yaml
+- collecting/get-data-from-the-network
+- collecting/read-and-watch-files
+- parsing/parse-delimited-text