Commits
38 commits
1d69336
Fix potentially uninitialized variable
TimPansino Nov 14, 2025
cb88ca4
Correct WSGI violations when transmitting headers
TimPansino Nov 14, 2025
9323e8a
Add regression tests for WSGI violations
TimPansino Nov 14, 2025
0c1470a
Fix existing tests for WSGI
TimPansino Nov 14, 2025
5c726d6
Move remaining WSGI example apps to wsgi_applications.py
TimPansino Nov 14, 2025
fdf5335
Correct behavior of multiple calls to start_response
TimPansino Nov 14, 2025
4250416
Add regression test for exceptions raise in start_response
TimPansino Nov 14, 2025
1e524fe
Bugfix handle already-closed on DataReceived
FutureTraceback Nov 28, 2025
1066939
Handle TimeoutError when waiting for the writer to close
sbp Jan 26, 2026
407c784
chore: remove unused mock dev dependency
gotmax23 Feb 12, 2026
6bc441e
Merge upstream PR #343 from pgjones/hypercorn: chore: remove unused m…
Apr 4, 2026
897130a
Merge upstream PR #342 from pgjones/hypercorn: Handle TimeoutError wh…
Apr 4, 2026
08dee02
Merge upstream PR #334 from pgjones/hypercorn: Bugfix handle already-…
Apr 4, 2026
54b1498
Merge upstream PR #332 from pgjones/hypercorn: Fix WSGI Violations
Apr 4, 2026
4f13b67
Add regression tests for imported upstream PR fixes
Apr 4, 2026
dac8b64
Fix max requests jitter CLI mapping
Apr 4, 2026
49cb18d
Fix trio shutdown timeout message
Apr 4, 2026
7989f63
Fix dispatcher segment path matching
Apr 4, 2026
0cc476c
Add regression tests for CLI and dispatch fixes
Apr 4, 2026
1f9d3a0
Replace eval in application loading
Apr 4, 2026
947fe85
Ensure tests import the local src tree
Apr 4, 2026
4f5d1f5
Cache common response headers and reuse byte payloads
Apr 4, 2026
7069f78
Decouple HTTP/2 stream delivery and scheduling
Apr 4, 2026
cd3c191
Batch HTTP/3 sends through a scheduler
Apr 4, 2026
0fd93cc
Add reproducible benchmark harnesses
Apr 4, 2026
2da3e2d
Optimize access log atom generation
Apr 4, 2026
7379104
Increase H3 send batch size
Apr 4, 2026
4062db3
Fix H3 scheduler shutdown and queued stream byte limits
Apr 4, 2026
ab6afc3
Refactor ProtocolWrapper H2 creation
Apr 4, 2026
78adfee
Slot H2 stream buffers
Apr 4, 2026
ef6bcff
Avoid redundant Trio TCP buffer copies
Apr 4, 2026
144c235
Slot transport and protocol events
Apr 5, 2026
cc05736
Restructure benchmark suite for maintainability
Apr 5, 2026
a3f6072
Guard HTTPStream before request initialization
Apr 5, 2026
d19ad18
Guard WSStream before handshake initialization
Apr 5, 2026
e6a1e40
Preserve Trio event waiters across clear
Apr 5, 2026
c253ff8
Skip wait_for when asyncio read timeout is disabled
Apr 5, 2026
af05457
Make asyncio QUIC receive queue configurable
Apr 5, 2026
156 changes: 156 additions & 0 deletions benchmarks/README.md
@@ -0,0 +1,156 @@
# Benchmarks

A maintainable benchmark suite for Hypercorn. Its purpose is to validate performance changes with a methodology that is reproducible and stable enough to live on `main`.

## Goals

- measure real improvements and regressions on critical paths;
- compare branches or refs against a reproducible baseline;
- keep general benchmarks separate from targeted scenarios;
- prioritize robust results over one-off numbers.

## Default methodology

All `benchmarks/compare*.py` comparators follow these rules by default:

- they interleave `current` and `baseline` runs;
- they summarize by per-run medians;
- they report `mean`, `median`, and `p95`;
- they keep `--sequential` only for diagnostics or historical compatibility.

This reduces bias from execution order, warm-up, system drift, and machine noise.
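
The interleaving and median-of-run-medians summary can be sketched as follows. This is a minimal sketch assuming each run yields a single latency number; the real comparators in `benchmarks/_compare.py` operate on richer run objects, and `interleaved_medians` is an illustrative name, not one of the suite's helpers:

```python
import statistics
from typing import Callable


def interleaved_medians(
    runs: int,
    current: Callable[[int], float],
    baseline: Callable[[int], float],
) -> tuple[float, float]:
    # Alternate which target runs first on odd iterations so neither
    # side systematically benefits from warm-up or system drift.
    current_results: list[float] = []
    baseline_results: list[float] = []
    for index in range(runs):
        if index % 2 == 1:
            baseline_results.append(baseline(index))
            current_results.append(current(index))
        else:
            current_results.append(current(index))
            baseline_results.append(baseline(index))
    # Summarize each side by the median of its per-run results, which
    # is far less sensitive to a single noisy run than the mean.
    return statistics.median(current_results), statistics.median(baseline_results)
```

The suite's actual helpers for this pattern are `run_interleaved_sync` and `run_interleaved_async` in `benchmarks/_compare.py`.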

## Structure

- `benchmarks/app.py`
  Minimal ASGI app used as the target of the server benchmarks.
- `benchmarks/_runtime.py`
  Shared server, TLS, readiness, port, and percentile infrastructure.
- `benchmarks/_compare.py`
  Shared worktree, interleaving, summary, and JSON output infrastructure.
- `benchmarks/run_load.py`
  Targeted HTTP/2 HoL benchmark.
- `benchmarks/fragmented_body.py`
  Targeted benchmark for a heavily fragmented H2 request body.
- `benchmarks/general.py`
  General benchmark for `/fast` and other HTTP routes.
- `benchmarks/ws.py`
  WebSocket echo benchmark.
- `benchmarks/h3.py`
  Real HTTP/3-over-QUIC benchmark using `aioquic`.
- `benchmarks/task_group.py`
  Microbenchmark of `TaskGroup.spawn_app()`.
- `benchmarks/compare*.py`
  Comparators against another ref or repo.

## Available scenarios

### HTTP/2 HoL

A single HTTP/2 connection with one slow stream and several fast multiplexed streams. Measures whether head-of-line blocking exists at the connection level.

```bash
python -m benchmarks.run_load
python -m benchmarks.compare --baseline-ref upstream/main
```

### Fragmented HTTP/2 body

Exercises `QueuedStream` and the cost of delivering many small `DATA` frames to the ASGI app.

```bash
python -m benchmarks.fragmented_body
python -m benchmarks.compare_fragmented_body --baseline-ref upstream/main
```

### General HTTP benchmark

Measures the fast path without mixing it with artificial scenarios.

```bash
python -m benchmarks.general --http-version 1.1
python -m benchmarks.general --http-version 1.1 --tls
python -m benchmarks.general --http-version 2
python -m benchmarks.compare_general --baseline-ref upstream/main
```

The general comparator runs:

- HTTP/1.1 without TLS
- HTTP/1.1 with TLS
- HTTP/2 with TLS

### WebSocket echo

Validates the handshake and binary echo with a configurable payload.

```bash
python -m benchmarks.ws --tls
python -m benchmarks.compare_ws --baseline-ref upstream/main --tls
```

### Real HTTP/3

Measures real QUIC/H3 with a multiplexed H3 connection and an `aioquic` client.

```bash
python -m benchmarks.h3
python -m benchmarks.compare_h3 --baseline-ref upstream/main
```

### TaskGroup

A research microbenchmark that separates the fixed cost of `TaskGroup.spawn_app()` from the full server.

```bash
python -m benchmarks.task_group --mode asgi
python -m benchmarks.task_group --mode wsgi
python -m benchmarks.compare_task_group --mode asgi --baseline-ref upstream/main
python -m benchmarks.compare_task_group --mode wsgi --baseline-ref upstream/main
```

## Common commands

Compare against `upstream/main`:

```bash
python -m benchmarks.compare_general --baseline-ref upstream/main --runs 6
python -m benchmarks.compare --baseline-ref upstream/main --runs 6
python -m benchmarks.compare_h3 --baseline-ref upstream/main --runs 4
```

Compare against an already-existing repo without creating a worktree:

```bash
python -m benchmarks.compare_general --baseline-path /path/to/another/repo --runs 6
```

Save results as JSON:

```bash
python -m benchmarks.compare_general \
  --baseline-ref upstream/main \
  --runs 6 \
  --output-json benchmarks/results/general.json
```
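
The JSON payload uses the field names produced by `build_comparison_result()` in `benchmarks/_compare.py`. A sketch of gating a change on the `p95` guard-rail follows; the `p95_gate` helper and the 2% allowance are illustrative, not part of the suite:

```python
import json


def p95_gate(payload: dict, max_regression_percent: float = 2.0) -> bool:
    # A negative improvement percent means the current branch got slower;
    # tolerate p95 regressions only up to the configured allowance.
    return payload["improvement_p95_percent"] >= -max_regression_percent


# Illustrative numbers in the shape written by --output-json.
sample = json.loads('{"delta_p95_ms": -0.42, "improvement_p95_percent": 7.5}')
print(p95_gate(sample))  # → True: a 7.5% p95 improvement passes
```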

## Interpretation guide

- `p95` is the main guard-rail for tail latency.
- `mean` and `median` help distinguish a general improvement from a localized one.
- `req/s` and `messages/s` are useful for reading throughput, but must not mask clear latency regressions.
- Targeted benchmarks such as HoL or fragmented body validate structural changes.
- General benchmarks detect whether a local optimization breaks the overall balance.
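
As a reference for why `p95` catches tail regressions that averages hide, here is a nearest-rank `p95` sketch; the suite's own percentile code lives in `benchmarks/_runtime.py` and may use a different definition:

```python
import math


def p95(samples_ms: list[float]) -> float:
    # Nearest-rank 95th percentile: the smallest sample such that at
    # least 95% of all samples are less than or equal to it.
    ordered = sorted(samples_ms)
    rank = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return ordered[rank]


samples = [1.0, 1.1, 1.2, 1.3, 9.0]  # one tail outlier
print(p95(samples))  # → 9.0: the outlier dominates the tail metric
```

The mean of the same samples is about 2.7 ms and the median 1.2 ms, which is why tail metrics are the guard-rail rather than averages.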

## Maintenance

- Before accepting a performance change, measure against a stable baseline.
- Do not draw conclusions from a single sequential pass.
- For a new scenario, add a reproducible benchmark first, then the change.
- Keep the benchmark app and the comparators small, explicitly documented, and free of hidden dependencies.

## Notes

- The suite uses `benchmarks/app.py` as the target app.
- `tests/assets/cert.pem` and `tests/assets/key.pem` enable TLS and ALPN for H2.
- Results are relative comparisons between branches or refs, not publishable absolute numbers.
1 change: 1 addition & 0 deletions benchmarks/__init__.py
@@ -0,0 +1 @@

162 changes: 162 additions & 0 deletions benchmarks/_compare.py
@@ -0,0 +1,162 @@
from __future__ import annotations

import json
import shutil
import subprocess
import tempfile
from dataclasses import replace
from pathlib import Path
from typing import Any, Awaitable, Callable, Iterable, Sequence, TypeVar

PROJECT_ROOT = Path(__file__).resolve().parent.parent
INTERLEAVED_METHODOLOGY = "interleaved median-of-runs"
SEQUENTIAL_METHODOLOGY = "sequential median-of-runs"

T = TypeVar("T")


def create_worktree(ref: str, fetch: bool) -> tuple[Callable[[], None], Path]:
    if fetch:
        subprocess.run(["git", "fetch", "upstream"], cwd=PROJECT_ROOT, check=True)

    tempdir = Path(tempfile.mkdtemp(prefix="hypercorn-bench-"))
    subprocess.run(
        ["git", "worktree", "add", "--detach", str(tempdir), ref],
        cwd=PROJECT_ROOT,
        check=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )

    def cleanup() -> None:
        subprocess.run(
            ["git", "worktree", "remove", "--force", str(tempdir)],
            cwd=PROJECT_ROOT,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        shutil.rmtree(tempdir, ignore_errors=True)

    return cleanup, tempdir


def percentage_improvement(current: float, baseline: float) -> float:
    if baseline == 0:
        return 0.0
    return ((baseline - current) / baseline) * 100


def percentage_growth(current: float, baseline: float) -> float:
    if baseline == 0:
        return 0.0
    return ((current - baseline) / baseline) * 100


def methodology_name(*, sequential: bool) -> str:
    return SEQUENTIAL_METHODOLOGY if sequential else INTERLEAVED_METHODOLOGY


async def run_interleaved_async(
    runs: int,
    current_runner: Callable[[int], Awaitable[Any]],
    baseline_runner: Callable[[int], Awaitable[Any]],
    *,
    interleave: bool,
) -> tuple[list[Any], list[Any]]:
    current_runs: list[Any] = []
    baseline_runs: list[Any] = []
    for index in range(runs):
        # Alternate which target goes first on odd iterations to cancel
        # ordering bias (warm-up, cache state, system drift).
        if interleave and (index % 2 == 1):
            baseline_runs.append(await baseline_runner(index))
            current_runs.append(await current_runner(index))
        else:
            current_runs.append(await current_runner(index))
            baseline_runs.append(await baseline_runner(index))
    return current_runs, baseline_runs


def run_interleaved_sync(
    runs: int,
    current_runner: Callable[[int], T],
    baseline_runner: Callable[[int], T],
    *,
    interleave: bool,
) -> tuple[list[T], list[T]]:
    current_runs: list[T] = []
    baseline_runs: list[T] = []
    for index in range(runs):
        # Alternate which target goes first on odd iterations to cancel
        # ordering bias (warm-up, cache state, system drift).
        if interleave and (index % 2 == 1):
            baseline_runs.append(baseline_runner(index))
            current_runs.append(current_runner(index))
        else:
            current_runs.append(current_runner(index))
            baseline_runs.append(baseline_runner(index))
    return current_runs, baseline_runs


def summarize_dataclass_runs(
    label: str,
    runs: Sequence[T],
    *,
    extra_fields: dict[str, Callable[[Sequence[T]], Any]] | None = None,
) -> T:
    if not runs:
        raise ValueError("Expected at least one benchmark run")

    overrides: dict[str, Any] = {
        "target_label": label,
        "samples_ms": [sample for run in runs for sample in getattr(run, "samples_ms")],
        "mean_ms": _median(getattr(run, "mean_ms") for run in runs),
        "median_ms": _median(getattr(run, "median_ms") for run in runs),
        "p95_ms": _median(getattr(run, "p95_ms") for run in runs),
        "minimum_ms": _median(getattr(run, "minimum_ms") for run in runs),
        "maximum_ms": _median(getattr(run, "maximum_ms") for run in runs),
    }
    if extra_fields is not None:
        for field, aggregator in extra_fields.items():
            overrides[field] = aggregator(runs)
    return replace(runs[0], **overrides)


def build_comparison_result(
    current: Any,
    baseline: Any,
    *,
    throughput_field: str | None = None,
    throughput_delta_field: str | None = None,
    throughput_improvement_field: str | None = None,
) -> dict[str, Any]:
    payload = {
        "current": current.__dict__,
        "baseline": baseline.__dict__,
        "delta_mean_ms": current.mean_ms - baseline.mean_ms,
        "delta_median_ms": current.median_ms - baseline.median_ms,
        "delta_p95_ms": current.p95_ms - baseline.p95_ms,
        "improvement_mean_percent": percentage_improvement(current.mean_ms, baseline.mean_ms),
        "improvement_median_percent": percentage_improvement(current.median_ms, baseline.median_ms),
        "improvement_p95_percent": percentage_improvement(current.p95_ms, baseline.p95_ms),
    }
    if throughput_field is not None:
        current_value = getattr(current, throughput_field)
        baseline_value = getattr(baseline, throughput_field)
        payload[throughput_delta_field or f"delta_{throughput_field}"] = current_value - baseline_value
        payload[throughput_improvement_field or f"improvement_{throughput_field}_percent"] = percentage_growth(
            current_value, baseline_value
        )
    return payload


def write_json_output(payload: dict[str, Any], output_json: str | None) -> None:
    encoded = json.dumps(payload, indent=2) + "\n"
    print(encoded, end="")
    if output_json is not None:
        Path(output_json).write_text(encoded)


def _median(values: Iterable[float]) -> float:
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return float(ordered[middle])
    return (ordered[middle - 1] + ordered[middle]) / 2