tilo · May 14, 2026 · Apr 12, 2026
diff --git a/.github/workflows/ruby.yml b/.github/workflows/ruby.yml
@@ -27,6 +27,7 @@ jobs:
           - 3.2
           - 3.3
           - "3.4"
+          - "4.0"
           - head
           - truffleruby
           - truffleruby-head

diff --git a/.rubocop.yml b/.rubocop.yml
@@ -13,6 +13,12 @@ Layout/SpaceInsideHashLiteralBraces:
 Layout/SpaceAroundOperators:
   Enabled: false
 
+Lint/ConstantDefinitionInBlock:
+  Enabled: false
+
+Lint/UnderscorePrefixedVariableName:
+  Enabled: false
+
 Metrics/AbcSize:
   Enabled: false
 
@@ -37,6 +43,9 @@ Metrics/ModuleLength:
 Metrics/PerceivedComplexity:
   Enabled: false
 
+Naming/MethodParameterName:
+  Enabled: false
+
 Naming/PredicateName:
   Enabled: false
 
@@ -156,7 +165,7 @@ Style/SymbolArray:
 Style/SymbolProc: # old Ruby versions can't do this
   Enabled: false
 
-Style/TernaryParentheses:
+Style/TernaryParentheses: # parentheses are good!
   Enabled: false
 
 Style/TrailingCommaInArrayLiteral:

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,6 +1,60 @@
 
 # SmarterCSV 1.x Change Log
 
+## 1.17.0 (NOT RELEASED)
+
+RSpec tests: **1,434 → 2,210** (+776 tests)
+
+### New Features
+
+* **Streaming IO support** — SmarterCSV now works with non-seekable IO sources such as pipes, STDIN, and Zlib streams.
+  A rewindable peek buffer transparently captures the first bytes of the stream so that `row_sep` and `col_sep` auto-detection can replay them without requiring the underlying source to support `rewind` or `seek`.
+
+* **Structured warnings** — auto-detection and configuration warnings are now collected on the Reader as a deduped histogram:
+
+  ```ruby
+  reader = SmarterCSV::Reader.new('data.csv')
+  reader.process
+  reader.warnings  # => [{ type:, code:, severity:, message:, count: }, ...]
+  ```
+
+  Repeated warnings of the same `(type, code)` are deduped — `count` tracks occurrences. Available codes today: `:chunk_size_default`, `:header_a_method`, `:utf8_missing_binary_mode`, `:no_clear_row_sep`, `:no_row_sep_found`.
+
+* **Class-level `SmarterCSV.warnings`** accessor — mirrors `SmarterCSV.errors`. Per-thread, cleared at the start of each `.process` / `.parse` / `.each` / `.each_chunk` call. Safe under Puma/Sidekiq.
+
+* **Rails.logger routing**: when `Rails.logger` is present, warnings are routed through it at the severity declared at the call site (`:debug` / `:info` / `:warn` / `:error` / `:fatal`); otherwise `Kernel#warn` is used as a fallback. Detection is cached at construction time, so there is no per-call overhead.
+
+### Improvements
+
+* Improved auto-detection of `row_sep` and `col_sep` — giving more accurate results on files with comment headers.
+
+* Larger scan window for accurate row separator detection on files with wide headers or long first lines.
+
+* `guess_line_ending` now scans the input in chunks up to a 64KB hard cap, returning as soon as one separator has a clear majority. Near-tie chunk-boundary artifacts no longer cause spurious warnings; only true ties at the hard cap fall back to `"\n"` and emit a `:no_clear_row_sep` warning at `:error` severity (silent misparse risk).
+
+### New / Changed Options
+
+* **`buffer_size` is now a public option** — peek buffer chunk size for non-seekable inputs (pipes, gzip readers, HTTP/S3 bodies). Default `16_384`. Out-of-range values warn and clamp to the supported range rather than raising.
+
+* **`auto_row_sep_chars` default changed to `4096`** (was `500` in 1.16.x). Sized to cover wide-header CSVs in a single read. Bump it higher if your files have very wide headers or long comment preambles.
+
+### Bug Fixes
+
+* **Files ending in a lone `\r`** are now correctly detected as `\r`-terminated instead of falling through to a "no clear row separator" warning.
+
+* **`remove_empty_values` now treats Unicode whitespace as empty**: a field containing only whitespace, including characters like non-breaking space (U+00A0) or ideographic space (U+3000), is now dropped, matching the behavior of ActiveSupport's `String#blank?`. Previously only ASCII whitespace counted, and only Rails apps got the Unicode behavior (via `blank?`), an inconsistency that is now gone. Behavior is identical with or without the C extension.
+
+* **`remove_zero_values` now also removes signed zeros** — `+0`, `-0`, `-0.0`, `+0.00`, etc. are recognized as zero and dropped, just like `0` and `0.0`. (Only applies when `remove_zero_values: true`, which is off by default.)
+
+### Performance
+
+Measured against 1.16.4 (Apple M4, Ruby 3.4.7):
+
+* **C-accelerated path (the default):** quote-heavy, large-field, and wide CSVs parse meaningfully faster — roughly **7–22% faster** (city/address-style files ~10–12%; long-field and wide files the most). CSVs with very short lines and many tiny fields are up to ~3% slower — a side effect of the larger default auto-detection scan window (see `auto_row_sep_chars`); set it back to a smaller value if that matters for your workload. Net: solid wins where there's real per-row work, a small cost on the trivially-cheap cases.
+* **Ruby fallback path (`acceleration: false`):** faster on nearly every file — typically **3–20% faster** than 1.16.4, with the biggest gains on wide and many-small-field CSVs.
+
+Per-file breakdown: [`docs/releases/1.17.0/performance_notes.md`](docs/releases/1.17.0/performance_notes.md).
+
 ## 1.16.4 (2026-04-21) — Bug Fixes
 
 RSpec tests: **1,434 → 1,467** (+33 tests)

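The "Streaming IO support" entry in the CHANGELOG diff above describes a rewindable peek buffer that lets separator auto-detection inspect the first bytes of a stream without calling `rewind` or `seek`. The following is a minimal sketch of that idea using only the Ruby standard library. The class name `PeekableIO`, its methods, and the naive detection heuristics are hypothetical illustrations, not SmarterCSV's internal implementation; only the `buffer_size` default of `16_384` is taken from the CHANGELOG.

```ruby
# Hypothetical illustration only; not SmarterCSV's internal class.
class PeekableIO
  def initialize(io, buffer_size: 16_384)
    @io = io
    @buffer_size = buffer_size
    @buffer = String.new # bytes read from the source but not yet consumed
  end

  # Return up to `n` leading bytes without consuming them.
  def peek(n)
    fill { @buffer.bytesize >= n }
    @buffer.byteslice(0, n)
  end

  # Return the next line (including `sep`), replaying buffered bytes first.
  # Lines come back as binary strings; a real reader would set the encoding.
  def gets(sep = "\n")
    fill { @buffer.include?(sep) }
    return nil if @buffer.empty?

    idx = @buffer.index(sep)
    @buffer.slice!(0, idx ? idx + sep.length : @buffer.length)
  end

  private

  # Read chunks from the source until the block is satisfied or EOF is hit.
  def fill
    until yield
      chunk = @io.read(@buffer_size)
      break if chunk.nil? || chunk.empty?

      @buffer << chunk
    end
  end
end

# A pipe is non-seekable: calling `rewind` on it raises Errno::ESPIPE.
reader, writer = IO.pipe
writer.write("name;age\r\nAlice;30\r\nBob;25\r\n")
writer.close

io = PeekableIO.new(reader)
sample = io.peek(64) # inspect the head of the stream without consuming it

# Naive stand-ins for separator detection (assumptions for this sketch).
row_sep = sample.include?("\r\n") ? "\r\n" : "\n"
col_sep = [",", ";", "\t"].max_by { |c| sample.count(c) }

rows = []
while (line = io.gets(row_sep))
  rows << line.chomp(row_sep).split(col_sep)
end
p rows # => [["name", "age"], ["Alice", "30"], ["Bob", "25"]]
```

Because the peeked bytes stay in the buffer and are replayed by `gets`, the parser sees the stream from its first byte even though the detection step has already looked at it.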
diff --git a/Gemfile b/Gemfile
@@ -5,12 +5,17 @@ source 'https://rubygems.org'
 # Specify your gem's dependencies in smarter_csv.gemspec
 gemspec
 
-gem "rake"
-gem "rake-compiler"
+group :development do
+  gem "rake"
+  gem "rake-compiler"
+  gem "ostruct"          # silences rake's stdlib-deprecation warning during dev
+  gem "rubocop"
+end
 
-gem "awesome_print"
-gem 'pry'
-gem "rubocop"
+group :development, :test do
+  gem "awesome_print"
+  gem "pry"              # required in spec_helper.rb; also useful in dev console
+end
 
 group :test do
   gem "rspec"

diff --git a/README.md b/README.md
@@ -14,9 +14,13 @@
 
   Beyond raw speed, SmarterCSV is designed to provide a significantly more convenient and developer-friendly interface than traditional CSV libraries. Instead of returning raw arrays that require substantial post-processing, SmarterCSV produces Rails-ready hashes for each row, making the data immediately usable with ActiveRecord, Sidekiq pipelines, parallel processing, and JSON-based workflows such as S3.
 
+  In a Rails app, warnings auto-route through `Rails.logger` and instrumentation hooks compose with `ActiveSupport::Notifications` — no setup required. Outside Rails, warnings fall back to `$stderr` and the same APIs work without any framework dependency.
+
   The library includes intelligent defaults, automatic detection of column and row separators, and flexible header/value transformations. These features eliminate much of the boilerplate typically required when working with CSV data and help keep ingestion code concise and maintainable.
 
-  For large files, SmarterCSV supports both chunked processing (arrays of hashes) and streaming via Enumerable APIs, enabling efficient batch jobs and low-memory pipelines. The C acceleration further optimizes the full ingestion path — including parsing, hash construction, and conversions — so performance gains reflect real-world workloads, not just tokenizer benchmarks.
+  For large files, SmarterCSV supports both chunked processing (arrays of hashes) and streaming via Enumerable APIs, enabling efficient batch jobs and low-memory pipelines.
+  As of 1.17.0, SmarterCSV also accepts **non-seekable streaming inputs** — pipes, `STDIN`, `Zlib::GzipReader`, and HTTP responses — with no need to materialize the file on disk first.
+  The C acceleration further optimizes the full ingestion path — including parsing, hash construction, and conversions — so performance gains reflect real-world workloads, not just tokenizer benchmarks.
 
   The interface is intentionally designed to robustly handle messy real-world CSV while keeping application code clean. Developers can easily map headers, skip unwanted rows, quarantine problematic data, and transform values on the fly without building custom post-processing pipelines. See [Real-World CSV Files](docs/real_world_csv.md) for a comprehensive guide to production CSV patterns.
 
@@ -33,22 +37,33 @@ SmarterCSV is designed for **real-world CSV processing**, returning fully usable
 
 For a fair comparison, `CSV.table` is the closest Ruby CSV equivalent to SmarterCSV.
 
-| Comparison (SmarterCSV 1.16.0, C-accelerated)  | Range                   |
+| Comparison (SmarterCSV 1.17.0, C-accelerated)  | Range                   |
 |-------------------------------------------------|-------------------------|
-| vs SmarterCSV 1.15.2 (with C acceleration)      | up to 2.4× faster       |
-| vs SmarterCSV 1.14.4 (with C acceleration)      | 9×–65× faster           |
-| vs SmarterCSV 1.14.4 (Ruby path)                | 1.7×–10.6× faster       |
-| vs CSV.read  (arrays of arrays)                 | 1.7×–8.6× faster        |
-| vs CSV.table (arrays of hashes)                 | 7×–129× faster          |
-| vs ZSV (arrays of hashes, equiv. output)        | 1.1×–6.6× faster †      |
+| vs SmarterCSV 1.15.2 (with C acceleration)      | up to 2.8× faster       |
+| vs SmarterCSV 1.14.4 (with C acceleration)      | 9×–82× faster           |
+| vs SmarterCSV 1.14.4 (Ruby path)                | 2.4×–19.8× faster       |
+| vs CSV.read  (arrays of arrays)                 | 1.3×–7.9× faster        |
+| vs CSV.table (arrays of hashes)                 | 4.9×–132× faster        |
+| vs ZSV 1.3.0 (arrays of hashes, equiv. output)  | 1.1×–6.6× faster †      |
+
+† SmarterCSV faster on 15 of 16 files. ZSV raw arrays (no hashes, no conversions) are 2×–14× faster — but that omits the post-processing work needed to produce usable output. ZSV row carried over from the 1.16.0 benchmark; not re-measured for 1.17.0.
+
+_Benchmarks: 19 CSV files (20k–240k rows), Ruby 3.4.7, Apple M4._
 
-† SmarterCSV faster on 15 of 16 files. ZSV raw arrays (no hashes, no conversions) are 2×–14× faster — but that omits the post-processing work needed to produce usable output.
+> ⁉️ **Why do these numbers look a touch lower than the 1.16.0 charts?**
+> TL;DR: because we use a different statistical method.
+>
+> Earlier versions of these benchmarks reported the best-of-N sample (the absolute `min`, i.e. the fastest run) for each measurement. A single lucky run (empty caches lining up, no scheduler interrupts) could shave up to ~10% off and become the headline number, which would be misleading.
+> Because of that, we've switched to the 10th percentile (`p10`) of multiple runs of 40 samples, which discards roughly the four luckiest samples and reports a time much closer to what you'll actually observe in production. On noisier fixtures `p10` is ~5–10% above `min`; on quiet ones it's within 1%. The relative ordering between versions and adapters is unchanged; the absolute speedup figures are simply more honest.
 
-_Benchmarks: 19 CSV files (20k–80k rows), Ruby 3.4.7, Apple M1._
+### SmarterCSV vs Ruby CSV
+![SmarterCSV 1.17.0 vs Ruby CSV 3.3.5 speedup](images/SmarterCSV_1.17.0_vs_RubyCSV_3.3.5_speedup.svg)
 
-![SmarterCSV 1.16.0 vs Ruby CSV 3.3.5 speedup](images/SmarterCSV_1.16.0_vs_RubyCSV_3.3.5_speedup.png)
+### SmarterCSV C Path
+![SmarterCSV 1.17.0 vs previous versions — C-accelerated path](images/SmarterCSV_1.17.0_vs_previous_C-speedup.svg)
 
-![SmarterCSV 1.16.0 vs previous versions — C-accelerated path](images/SmarterCSV_1.16.0_vs_previous_C-speedup.svg)
+### SmarterCSV Ruby Path
+![SmarterCSV 1.17.0 vs previous versions — Ruby path](images/SmarterCSV_1.17.0_vs_previous_Rb-speedup.svg)
 
 See [SmarterCSV 1.15.2: Faster Than Raw CSV Arrays](https://tilo-sloboda.medium.com/smartercsv-1-15-2-faster-than-raw-csv-arrays-benchmarks-zsv-and-the-full-pipeline-2c12a798032e) and [PR #319](https://github.com/tilo/smarter_csv/pull/319) for more details.
 
@@ -61,7 +76,7 @@ It's a one-line change:
 # Before
 rows = CSV.table('data.csv').map(&:to_h)
 
-# After — up to 129× faster, same symbol keys
+# After — up to 132× faster, same symbol keys
 rows = SmarterCSV.process('data.csv')
 ```
 
@@ -124,6 +139,23 @@ strip_whitespace → nil_values_matching → remove_empty_values → remove_zero
 
 Each step is individually configurable. See [Data Transformations](docs/data_transformations.md) and [Value Converters](docs/value_converters.md) for details.
 
+### Value Converters
+
+Per-column lambdas convert raw strings into typed values — dates, currency, booleans:
+
+```ruby
+require 'date'
+
+data = SmarterCSV.process('orders.csv',
+  value_converters: {
+    dob:    ->(v) { v && Date.strptime(v, '%m/%d/%Y') },
+    price:  ->(v) { v&.delete('$,')&.to_f },
+    active: ->(v) { v&.match?(/\Atrue\z/i) },
+  })
+```
+
+See [Value Converters](docs/value_converters.md).
+
 ### Batch Processing:
 
 Processing large CSV files in chunks minimizes memory usage and enables powerful workflows:
@@ -147,6 +179,8 @@ SmarterCSV.process(filename, chunk_size: 100) do |chunk|
 end
 ```
 
+See [Batch Processing](docs/batch_processing.md) for chunk sizing, `each_chunk`, and parallel-worker patterns.
+
 ### Modern Enumerator API:
 
 `Reader#each` is the modern, idiomatic way to process rows — `Reader` includes `Enumerable`, so all standard Ruby methods work:
@@ -166,6 +200,29 @@ first_ten = reader.lazy.select { |h| h[:active] }.first(10)
 reader.each_slice(500) { |batch| MyModel.insert_all(batch) }
 ```
 
+See [The Basic Read API](docs/basic_read_api.md) for the full `Reader` interface.
+
+### Streaming / Non-Seekable Inputs (1.17.0+):
+
+SmarterCSV reads directly from any IO — no need to materialize the file on disk first. Auto-detection works on streaming inputs without rewinding; the first chunk is buffered transparently.
+
+```ruby
+# Gzipped CSV — stream-decompressed, never written to disk
+require 'zlib'
+Zlib::GzipReader.open('huge.csv.gz') do |io|
+  SmarterCSV.process(io) { |row| MyModel.upsert(row.first) }
+end
+
+# STDIN / pipes
+SmarterCSV.process($stdin) { |row, _| ... }
+
+# HTTP response body
+require 'open-uri'
+URI.open('https://example.com/data.csv') { |io| SmarterCSV.process(io) }
+```
+
+See [Row and Column Separators](docs/row_col_sep.md) for how `:auto` detection works on non-seekable streams, and [Configuration Options](docs/options.md) for `buffer_size` (the peek-buffer chunk size).
+
 ### Bad Row Handling:
 
 SmarterCSV can quarantine malformed rows instead of crashing the entire import:
@@ -182,7 +239,33 @@ end
 
 See [Bad Row Quarantine](docs/bad_row_quarantine.md) for full details including `bad_row_limit` and `field_size_limit`.
 
-See [13 Examples](docs/examples.md) for more, including value converters, header validation, writing CSV, encoding handling, and resumable Rails ActiveJob imports.
+### Header Validation:
+
+Raise early if the file is missing required columns, before any data row is processed:
+
+```ruby
+begin
+  SmarterCSV.process('transactions.csv',
+    required_keys: [:account_id, :amount, :currency])
+rescue SmarterCSV::MissingKeys => e
+  abort "CSV missing columns: #{e.keys.join(', ')}"
+end
+```
+
+See [Header Validations](docs/header_validations.md).
+
+### Writing CSV:
+
+```ruby
+SmarterCSV.generate('output.csv') do |csv|
+  csv << { name: 'Alice', age: 30, city: 'New York' }
+  csv << { name: 'Bob',   age: 25, city: 'Chicago'  }
+end
+```
+
+Hashes (not arrays) make column-shift bugs impossible — adding a column never silently misaligns existing rows. See [The Basic Write API](docs/basic_write_api.md) for header renaming, value converters, and ordered output.
+
+See [18 Examples](docs/examples.md) for more, including encoding and preamble handling, key mapping, instrumentation hooks, and resumable Rails ActiveJob imports.
 
 ## Requirements
 
@@ -223,6 +306,7 @@ Or install it yourself as:
   * [Data Transformations](docs/data_transformations.md)
   * [Value Converters](docs/value_converters.md)
   * [Bad Row Quarantine](docs/bad_row_quarantine.md)
+  * [Warnings](docs/warnings.md)
   * [Instrumentation Hooks](docs/instrumentation.md)
   * [Examples](docs/examples.md)
   * [Real-World CSV Files](docs/real_world_csv.md)
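The README note in the diff above explains the switch from best-of-N (`min`) to the 10th percentile (`p10`) of 40 samples. A small sketch of the difference follows. It assumes the nearest-rank percentile definition, which the README does not specify, and uses synthetic timings; the `percentile` helper is hypothetical and not part of the benchmark suite.

```ruby
# Nearest-rank percentile: the smallest sample such that at least `pct`
# percent of all samples are less than or equal to it.
# (Assumed definition; the README does not say which one is used.)
def percentile(samples, pct)
  sorted = samples.sort
  rank = (pct * sorted.size).fdiv(100).ceil # 1-based rank
  sorted[[rank, 1].max - 1]
end

# 40 synthetic timings in milliseconds: one lucky outlier, then 100..138.
timings_ms = [90] + (100..138).to_a

puts "min: #{timings_ms.min} ms"                # the single luckiest run
puts "p10: #{percentile(timings_ms, 10)} ms"    # 4th-fastest of 40 samples
```

With 40 samples, nearest-rank `p10` is the fourth-fastest timing, so one unusually fast run cannot become the headline number.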