News Tracker

News Tracker is a producer-side intelligence system for semiconductors. It ingests multi-source evidence, resolves claims into assertions, builds lane outputs (narrative, filing, structural, backtest), and publishes manifest-keyed objects for downstream consumers.

Why Teams Use This

Reduce time-to-insight from raw documents to explainable intelligence artifacts.
Compare narrative momentum vs filing-confirmed adoption before making decisions.
Surface second-order exposure paths with auditable structural rationale.
Replay runs and publication state as of any point in time through manifests and lineage.

Quick Start (Utility-First)

1) Boot Local Stack

uv sync --extra dev
docker compose up -d
uv run news-tracker init-db

2) Create Reproducible Mock State

uv run news-tracker run-once --mock
uv run news-tracker graph seed

# Optional (for non-empty Themes view screenshots)
uv run news-tracker run-once --mock --with-embeddings
uv run news-tracker daily-clustering --date YYYY-MM-DD

3) Run API + Frontend

uv run news-tracker serve
cd frontend && npm install && npm run dev

4) Capture UI Evidence with Playwright

FRONTEND_PORT=5151   # If occupied, use the actual Vite port from startup logs
mkdir -p output/playwright
npx --yes playwright screenshot "http://localhost:${FRONTEND_PORT}/" output/playwright/dashboard.png
npx --yes playwright screenshot "http://localhost:${FRONTEND_PORT}/themes" output/playwright/themes.png
npx --yes playwright screenshot "http://localhost:${FRONTEND_PORT}/graph" output/playwright/graph.png

Core Operator Views

Dashboard (System + Lane Signal)

Use this for quick triage:

Is ingestion/processing moving?
Are lane health states publish-ready?
Are queue backlogs stable?

Themes (Narrative Discovery Surface)

Use this to inspect theme volume, lifecycle stage, and ranking-oriented context. If the view is empty, run the clustering jobs (news-tracker daily-clustering) and confirm that ingestion is producing documents.

Graph (Structural Exposure Surface)

Use this to explore two complementary graph layers:

Manual causal graph (causal_nodes / causal_edges): node types ticker, theme, technology; edge relations depends_on, supplies_to, competes_with, drives, blocks.
Assertion-derived structural relations (src/graph/structural.py): broader concept predicates (for example customer_of, uses_technology, component_of) with sign, freshness, corroboration, and assertion lineage. During news-tracker graph sync, these map into causal-edge relations for traversal APIs.

Q88 Producer Boundary (Authoritative Architecture)

The system is documented around this producer contract:

Source documents and filings produce evidence claims.
Claims are resolved into assertions (current-belief layer).
Assertions feed lane-specific computations.
Lane outputs are published as manifest-keyed objects.
Downstream consumers read published surfaces, not WIP tables.

Key publication concepts:

news_intel.lane_runs: lane execution lifecycle.
intel_pub.manifests: versioned publication units.
intel_pub.manifest_pointers: active serving pointer per lane.
intel_pub.published_objects: published, lineaged payloads.
intel_pub.read_model: stable consumer read surface.
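
The relationship between manifests, the per-lane serving pointer, and point-in-time reads can be pictured with a small in-memory model. This is an illustrative sketch only: the names Manifest, PublicationStore, publish, read_active, and read_as_of are hypothetical and do not reflect the actual intel_pub schema or code.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Manifest:
    """One versioned publication unit for a lane (hypothetical shape)."""
    manifest_id: str
    lane: str
    published_at: datetime
    objects: tuple  # published payloads keyed by this manifest


@dataclass
class PublicationStore:
    """In-memory stand-in for manifests plus one serving pointer per lane."""
    manifests: dict = field(default_factory=dict)  # manifest_id -> Manifest
    pointers: dict = field(default_factory=dict)   # lane -> active manifest_id

    def publish(self, manifest: Manifest) -> None:
        # Store the manifest first, then move the lane pointer to it,
        # so a reader never follows a pointer to a missing manifest.
        self.manifests[manifest.manifest_id] = manifest
        self.pointers[manifest.lane] = manifest.manifest_id

    def read_active(self, lane: str) -> Manifest:
        # Consumers read whatever the lane pointer currently serves.
        return self.manifests[self.pointers[lane]]

    def read_as_of(self, lane: str, as_of: datetime) -> Optional[Manifest]:
        # Point-in-time replay: latest manifest for the lane published
        # at or before `as_of`, or None if nothing was published yet.
        candidates = [
            m for m in self.manifests.values()
            if m.lane == lane and m.published_at <= as_of
        ]
        return max(candidates, key=lambda m: m.published_at, default=None)
```

Because older manifests are kept rather than overwritten, a replay only needs a lane name and a timestamp to recover what consumers would have seen at that moment.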

Data Science / ML Techniques by Lane

| Lane | Techniques | Primary Outputs | Why It Matters |
| --- | --- | --- | --- |
| Narrative | Component scoring: attention, corroboration, confirmation, novelty/persistence | Narrative run payloads, rollups, signals | Distinguishes real cross-platform momentum from noise |
| Filing | Section-weighted adoption scoring, fact alignment, temporal consistency, divergence classification | Adoption payloads, divergence alerts, issuer summaries | Tests whether narrative claims are operationally reflected in filings |
| Structural | Assertion-derived typed relations, 1/2-hop path scoring, basket assembly | Path explanations, basket summaries, structural relations | Surfaces second-order beneficiaries/risks with traceable rationale |
| Backtest | Point-in-time replay over published states | Backtest/evaluation artifacts | Validates decision utility under historical constraints |

Narrative Lane Scoring

Narrative scoring is decomposed into four inspectable components:

Attention: velocity + acceleration + doc mass.
Corroboration: platform spread + source diversity + spread speed.
Confirmation: authority alignment + crowd agreement.
Novelty/Persistence: recency decay vs duration persistence.

Composite scores are weighted and capped, then exposed for ranking/explanation workflows.
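
One possible reading of "weighted and capped" is a weighted sum of the four components with a ceiling on the result. The sketch below assumes that reading. The weights, the cap, and the names NarrativeComponents and composite_score are hypothetical, since this README does not state the real values.

```python
from dataclasses import dataclass

# Hypothetical weights and cap; the actual values are not given in this README.
WEIGHTS = {
    "attention": 0.35,
    "corroboration": 0.30,
    "confirmation": 0.20,
    "novelty_persistence": 0.15,
}
SCORE_CAP = 1.0


@dataclass(frozen=True)
class NarrativeComponents:
    attention: float
    corroboration: float
    confirmation: float
    novelty_persistence: float


def composite_score(components: NarrativeComponents) -> dict:
    """Weighted sum of the components, capped at SCORE_CAP.

    The per-component contributions are returned alongside the score so
    that a ranking can be explained without recomputing anything.
    """
    contributions = {
        name: weight * getattr(components, name)
        for name, weight in WEIGHTS.items()
    }
    raw = sum(contributions.values())
    return {
        "score": min(raw, SCORE_CAP),
        "raw": raw,
        "contributions": contributions,
    }
```

Keeping the contributions in the output is what makes the score inspectable: a reviewer can see which component drove a theme's rank.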

Filing Lane Scoring

Filing adoption score combines:

Section coverage
Section depth
Fact alignment
Temporal consistency

Divergence logic classifies each narrative/filing mismatch and assigns a structured reason code such as:

narrative_without_filing
filing_without_narrative
adverse_drift
contradictory_drift
lagging_adoption
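
As a rough illustration of how an adoption score and the two simplest reason codes could fit together, the sketch below combines the four components with equal weights and compares the result against a narrative score. The equal weights, the thresholds, and the function names are assumptions made for illustration; the drift and lag codes need history over time and are not sketched here.

```python
from typing import Optional

# Hypothetical equal weights over the four adoption components.
ADOPTION_WEIGHTS = {
    "section_coverage": 0.25,
    "section_depth": 0.25,
    "fact_alignment": 0.25,
    "temporal_consistency": 0.25,
}


def adoption_score(components: dict) -> float:
    """Weighted sum of the four filing adoption components (each in [0, 1])."""
    return sum(
        weight * components[name] for name, weight in ADOPTION_WEIGHTS.items()
    )


def divergence_reason(
    narrative_score: float,
    adoption: float,
    high: float = 0.6,  # hypothetical "strong signal" threshold
    low: float = 0.2,   # hypothetical "weak signal" threshold
) -> Optional[str]:
    """Return a reason code when one side is strong and the other is weak."""
    if narrative_score >= high and adoption <= low:
        return "narrative_without_filing"
    if adoption >= high and narrative_score <= low:
        return "filing_without_narrative"
    return None  # no divergence detected by this simplified rule
```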

Structural Lane Scoring

Structural path scoring is explanation-first rather than an opaque graph embedding:

Edge score: confidence * freshness * corroboration
Path score: product of edge scores with hop decay
Path sign: product of edge signs

Path outputs keep decomposed factors and assertion lineage.
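
The three rules above can be sketched in a few lines. The Edge class, the score_path function, the decay rate of 0.8, and the choice to apply no decay to a 1-hop path are assumptions for illustration, not the code in src/graph/structural.py.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Edge:
    predicate: str
    confidence: float
    freshness: float
    corroboration: float
    sign: int  # +1 (reinforcing) or -1 (opposing)
    assertion_id: str


def score_path(edges: list, hop_decay_rate: float = 0.8) -> dict:
    """Score a path and keep every factor needed to explain the result."""
    if not edges:
        raise ValueError("a path needs at least one edge")
    hops = len(edges)
    # Multiplying the per-factor products is the same as multiplying the
    # per-edge scores (confidence * freshness * corroboration).
    confidence_product = prod(e.confidence for e in edges)
    freshness_product = prod(e.freshness for e in edges)
    corroboration_product = prod(e.corroboration for e in edges)
    # Assumed decay form: each hop beyond the first multiplies by the rate.
    hop_decay = hop_decay_rate ** (hops - 1)
    return {
        "hops": hops,
        "path_score": (
            confidence_product * freshness_product
            * corroboration_product * hop_decay
        ),
        "path_sign": prod(e.sign for e in edges),
        "confidence_product": confidence_product,
        "freshness_product": freshness_product,
        "corroboration_product": corroboration_product,
        "hop_decay": hop_decay,
        "assertion_ids": [e.assertion_id for e in edges],
        "predicates": [e.predicate for e in edges],
    }
```

For example, a 2-hop path whose edges have signs +1 and -1 gets a path sign of -1: a positive development at the start of the path reads as a negative one at the end.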

Explainability: “Why Did This Surface?”

This system is designed so surfaced outputs can be audited without recomputing everything live.

Assertion-Level Explainability

Resolved assertions expose confidence context via:

top-level fields: support_count, contradiction_count, source_diversity, valid_from, valid_to, first_seen_at, last_evidence_at
metadata.breakdown: base, freshness, diversity, support_ratio, review_bonus

Structural Path Explainability

Published path explanations include:

hops, path_score, path_sign
confidence_product, freshness_product, corroboration_product, hop_decay
assertion_ids and edge predicate sequence
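
Because the decomposed factors are published with the score, a consumer can sanity-check a path explanation without touching the graph. The sketch below assumes that path_score equals the product of the three factor products and hop_decay, and that there is one assertion ID per hop; both are readings of the scoring description above, and the function name and sample values are hypothetical.

```python
from math import isclose


def path_explanation_is_consistent(explanation: dict, rel_tol: float = 1e-9) -> bool:
    """Check that a published path explanation agrees with its own factors."""
    expected_score = (
        explanation["confidence_product"]
        * explanation["freshness_product"]
        * explanation["corroboration_product"]
        * explanation["hop_decay"]
    )
    score_ok = isclose(explanation["path_score"], expected_score, rel_tol=rel_tol)
    # Assumption: each hop is backed by exactly one assertion.
    lineage_ok = len(explanation["assertion_ids"]) == explanation["hops"]
    sign_ok = explanation["path_sign"] in (-1, 1)
    return score_ok and lineage_ok and sign_ok


# Illustrative payload with made-up values.
sample = {
    "hops": 2,
    "path_score": 0.72 * 0.9 * 0.5 * 0.8,
    "path_sign": 1,
    "confidence_product": 0.72,
    "freshness_product": 0.9,
    "corroboration_product": 0.5,
    "hop_decay": 0.8,
    "assertion_ids": ["assertion-1", "assertion-2"],
}
```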

Filing Divergence Explainability

Divergence payloads include reason code, severity, human-readable summary, and structured evidence fields for UI and audit workflows.

Data Flow (Practical)

Adapters fetch and normalize source content.
The processing pipeline runs spam filtering, deduplication, and extraction/enrichment.
Lane logic computes narrative/filing/structural outputs.
Publish layer creates/updates manifests and object state.
Consumers query published objects/read-model surfaces.

API Surfaces

Infrastructure/publish endpoints: src/api/routes/intel.py
User-facing intelligence endpoints: src/api/routes/intel_surface.py
Graph endpoints: src/api/routes/graph.py

With the API server running (news-tracker serve), start with the interactive docs:

API docs: http://localhost:8001/docs

CLI Reference (Most Used)

# Core runtime
news-tracker serve
news-tracker worker
news-tracker init-db
news-tracker health

# Mock ingestion and cleanup
news-tracker run-once --mock
news-tracker cleanup --days 90 --dry-run

# Clustering and ranking workflows
news-tracker daily-clustering --date YYYY-MM-DD
news-tracker cluster status

# Graph and monitoring
news-tracker graph seed
news-tracker drift check-quick
news-tracker drift check-daily
news-tracker drift report

# Backtesting
news-tracker backtest run --start YYYY-MM-DD --end YYYY-MM-DD --strategy swing
news-tracker backtest plot --run-id <id>

Configuration Essentials

# Infrastructure
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/news_tracker
REDIS_URL=redis://localhost:6379/0
API_KEYS=key1,key2

# X/Twitter ingestion
TWITTER_BEARER_TOKEN=...
TWITTER_XUI_ENABLED=false
XUI_INSTALL=false

# Model selection
EMBEDDING_MODEL_NAME=ProsusAI/finbert
SENTIMENT_MODEL_NAME=ProsusAI/finbert
NER_SPACY_MODEL=en_core_web_trf

# Processing thresholds
SPAM_THRESHOLD=0.7
DUPLICATE_THRESHOLD=0.85

The official X API is the primary Twitter/X ingestion path when TWITTER_BEARER_TOKEN is configured. The private xui path is an explicit fallback: set TWITTER_XUI_ENABLED=true at runtime and XUI_INSTALL=true at image build time to use it.

Feature flags are opt-in and grouped by subsystem (*_ENABLED). See src/config/settings.py for the full list of settings.
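
To make the variable semantics concrete, here is a minimal standard-library sketch of reading these values and choosing the X/Twitter ingestion path. It is not the loader in src/config/settings.py: the Settings class, load_settings, twitter_ingestion_path, and the fallback defaults (copied from the example values above) are all assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    api_keys: tuple
    twitter_bearer_token: str
    twitter_xui_enabled: bool
    spam_threshold: float
    duplicate_threshold: float


def _as_bool(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    return Settings(
        database_url=env.get(
            "DATABASE_URL",
            "postgresql://postgres:postgres@localhost:5432/news_tracker",
        ),
        redis_url=env.get("REDIS_URL", "redis://localhost:6379/0"),
        # API_KEYS is a comma-separated list; blank entries are dropped.
        api_keys=tuple(
            k.strip() for k in env.get("API_KEYS", "").split(",") if k.strip()
        ),
        twitter_bearer_token=env.get("TWITTER_BEARER_TOKEN", ""),
        twitter_xui_enabled=_as_bool(env.get("TWITTER_XUI_ENABLED", "false")),
        spam_threshold=float(env.get("SPAM_THRESHOLD", "0.7")),
        duplicate_threshold=float(env.get("DUPLICATE_THRESHOLD", "0.85")),
    )


def twitter_ingestion_path(settings: Settings) -> Optional[str]:
    """Official API when a bearer token is set, xui only as an explicit fallback."""
    if settings.twitter_bearer_token:
        return "official_api"
    if settings.twitter_xui_enabled:
        return "xui"
    return None  # X/Twitter ingestion is not configured
```

Passing a plain dictionary instead of the process environment keeps the loader easy to exercise in tests.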

Observability and Reliability

Tracing: OpenTelemetry with OTLP export.
Metrics: Prometheus + Grafana dashboards.
Logging: structured logs with request correlation.
Lane health semantics: freshness, quality, quarantine, publish readiness.

Development

uv sync --extra dev
uv run pytest tests/ -v
uv run pytest tests/ -v -m "not integration"

Project layout highlights:

src/contracts/intelligence/: producer contract definitions
src/publish/: manifest/pointer/object lifecycle and export
src/assertions/: aggregation, derived edges, recompute
src/narrative/, src/filing/, src/graph/: lane-specific methods
src/api/: REST/WebSocket routes
frontend/: React app and domain views

Roadmap / In Progress

Tightening end-to-end lane publication orchestration across all lanes.
Expanding parity between lane payload producers and intel_surface consumer expectations.
Continuing migration of mixed UX reads to published-object surfaces where WIP table access still exists.
Improving screenshot/data fixtures for richer non-empty local demo states.
Continuing contract-level hardening and replay validations for operational cutover.

License

MIT
