Skip to content

[Stack 14/27] Fix test DB connection: use DATABASE_URL with dotenv#2443

Closed
jucor wants to merge 13 commits intojc/vectorize-participant-infofrom
jc/fix-test-db-connection
Closed

[Stack 14/27] Fix test DB connection: use DATABASE_URL with dotenv#2443
jucor wants to merge 13 commits intojc/vectorize-participant-infofrom
jc/fix-test-db-connection

Conversation

@jucor
Copy link
Copy Markdown
Collaborator

@jucor jucor commented Mar 13, 2026

Summary

Stacked on #2437 (Vectorize participant info computation (3-15x speedup)). Please review and merge #2437 first.
Next in stack: #2446 (Fix D9: z-score thresholds from two-tailed to one-tailed)

  • Absorbed PR [CLOSED — absorbed into #2443] DB connect_timeout #2434 (DB connect_timeout): adds connect_timeout=5 to all DB connections so tests fail fast instead of hanging when Postgres is unavailable. Reverts timeout from production code (a 5s timeout could break real deployments), keeping it only in test code.
  • Adds require_dynamodb() and require_s3() helpers to conftest that probe services with short timeouts and pytest.fail() immediately — applied to all tests that previously hung when Docker services were down.
  • Adds pytest-timestamper for per-test timing visibility.
  • Fixes test_postgres_real_data.py to use DATABASE_URL (consistent with all other delphi Python code) via python-dotenv.

Refs #2442

Test plan

  • test_conversation_from_postgres passes
  • test_pakistan_conversation_batch passes (9 min, 400K votes)
  • Full delphi test suite: 294 passed, no regressions

🤖 Generated with Claude Code

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the Delphi real-data PostgreSQL integration test to connect using a single DATABASE_URL connection string (optionally loaded from a .env file), aligning with the repository’s broader database configuration approach.

Changes:

  • Update connect_to_db() to load .env and read DATABASE_URL.
  • Simplify psycopg2 connection setup to use the URL DSN instead of individual POSTGRES_* fields.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +112 to +115
database_url = os.environ.get('DATABASE_URL')
if not database_url:
print("DATABASE_URL environment variable is not set")
return None
Comment on lines +107 to +111
from dotenv import load_dotenv

# Load .env from the repo root (two levels up from tests/)
load_dotenv(Path(__file__).parent.parent.parent / '.env')

@jucor jucor force-pushed the jc/vectorize-participant-info branch from b839591 to 3048f93 Compare March 13, 2026 13:09
@jucor jucor force-pushed the jc/fix-test-db-connection branch from 1514e3d to bafbdf6 Compare March 13, 2026 13:09
@jucor jucor changed the title Fix test DB connection: use DATABASE_URL with dotenv [Stack 12/12] Fix test DB connection: use DATABASE_URL with dotenv Mar 13, 2026
@jucor jucor force-pushed the jc/fix-test-db-connection branch from cb7c5cd to bafbdf6 Compare March 13, 2026 13:45
@jucor jucor force-pushed the jc/vectorize-participant-info branch from 3048f93 to e883338 Compare March 13, 2026 13:53
@jucor jucor force-pushed the jc/fix-test-db-connection branch from bafbdf6 to e2f39bb Compare March 13, 2026 13:54
@jucor jucor changed the title [Stack 12/12] Fix test DB connection: use DATABASE_URL with dotenv [Stack 12/13] Fix test DB connection: use DATABASE_URL with dotenv Mar 13, 2026
@jucor jucor force-pushed the jc/vectorize-participant-info branch from e883338 to 9dee5d9 Compare March 13, 2026 14:15
@jucor jucor force-pushed the jc/fix-test-db-connection branch from e2f39bb to 89297cf Compare March 13, 2026 14:17
@jucor jucor force-pushed the jc/vectorize-participant-info branch from 9dee5d9 to 0d4cd71 Compare March 13, 2026 15:56
@jucor jucor force-pushed the jc/fix-test-db-connection branch from 89297cf to 403be8d Compare March 13, 2026 16:07
@jucor jucor changed the title [Stack 12/13] Fix test DB connection: use DATABASE_URL with dotenv [Stack 12/15] Fix test DB connection: use DATABASE_URL with dotenv Mar 16, 2026
@jucor jucor force-pushed the jc/vectorize-participant-info branch from 0d4cd71 to fa345cd Compare March 16, 2026 16:04
@jucor jucor force-pushed the jc/fix-test-db-connection branch from 403be8d to 464da16 Compare March 16, 2026 16:04
@jucor jucor changed the title [Stack 12/15] Fix test DB connection: use DATABASE_URL with dotenv [Stack 12/16] Fix test DB connection: use DATABASE_URL with dotenv Mar 16, 2026
@jucor jucor changed the title [Stack 12/16] Fix test DB connection: use DATABASE_URL with dotenv [Stack 12/17] Fix test DB connection: use DATABASE_URL with dotenv Mar 16, 2026
@jucor jucor changed the title [Stack 12/17] Fix test DB connection: use DATABASE_URL with dotenv [Stack 12/24] Fix test DB connection: use DATABASE_URL with dotenv Mar 17, 2026
@jucor jucor changed the title [Stack 12/24] Fix test DB connection: use DATABASE_URL with dotenv [Stack 12/25] Fix test DB connection: use DATABASE_URL with dotenv Mar 17, 2026
@jucor jucor force-pushed the jc/vectorize-participant-info branch from fa345cd to 655bcfa Compare March 19, 2026 10:43
@jucor jucor force-pushed the jc/fix-test-db-connection branch from 6b81d8e to 6da6172 Compare March 19, 2026 10:43
@jucor jucor changed the title [Stack 12/25] Fix test DB connection: use DATABASE_URL with dotenv [Stack 11/24] Fix test DB connection: use DATABASE_URL with dotenv Mar 19, 2026
@jucor jucor force-pushed the jc/vectorize-participant-info branch from 655bcfa to 01a7637 Compare March 19, 2026 12:31
@jucor jucor force-pushed the jc/fix-test-db-connection branch 2 times, most recently from 0ef54ca to b62c0cd Compare March 19, 2026 14:52
@jucor jucor changed the title [Stack 11/24] Fix test DB connection: use DATABASE_URL with dotenv [Stack 12/25] Fix test DB connection: use DATABASE_URL with dotenv Mar 19, 2026
@jucor jucor force-pushed the jc/fix-test-db-connection branch from 4ea1333 to 917cba8 Compare March 27, 2026 10:41
@jucor jucor changed the title [Stack 12/25] Fix test DB connection: use DATABASE_URL with dotenv [Stack 13/26] Fix test DB connection: use DATABASE_URL with dotenv Mar 30, 2026
@jucor jucor force-pushed the jc/fix-test-db-connection branch from 917cba8 to e45a120 Compare March 30, 2026 12:48
@jucor jucor changed the title [Stack 13/26] Fix test DB connection: use DATABASE_URL with dotenv [Stack 14/27] Fix test DB connection: use DATABASE_URL with dotenv Mar 30, 2026
@jucor jucor force-pushed the jc/fix-test-db-connection branch from e45a120 to b68bd5b Compare March 30, 2026 12:54
@github-actions
Copy link
Copy Markdown

Delphi Coverage Report

File Stmts Miss Cover
init.py 2 0 100%
benchmarks/bench_pca.py 76 76 0%
benchmarks/bench_repness.py 81 81 0%
benchmarks/bench_update_votes.py 38 38 0%
benchmarks/benchmark_utils.py 34 34 0%
components/init.py 1 0 100%
components/config.py 165 133 19%
conversation/init.py 2 0 100%
conversation/conversation.py 1107 320 71%
conversation/manager.py 131 42 68%
database/init.py 1 0 100%
database/dynamodb.py 387 234 40%
database/postgres.py 305 205 33%
pca_kmeans_rep/init.py 5 0 100%
pca_kmeans_rep/clusters.py 257 22 91%
pca_kmeans_rep/corr.py 98 17 83%
pca_kmeans_rep/pca.py 52 16 69%
pca_kmeans_rep/repness.py 297 43 86%
pca_kmeans_rep/stats.py 107 22 79%
regression/init.py 4 0 100%
regression/clojure_comparer.py 188 17 91%
regression/comparer.py 887 720 19%
regression/datasets.py 135 27 80%
regression/recorder.py 36 27 25%
regression/utils.py 138 87 37%
run_math_pipeline.py 260 114 56%
umap_narrative/500_generate_embedding_umap_cluster.py 210 109 48%
umap_narrative/501_calculate_comment_extremity.py 112 53 53%
umap_narrative/502_calculate_priorities.py 135 135 0%
umap_narrative/700_datamapplot_for_layer.py 502 502 0%
umap_narrative/701_static_datamapplot_for_layer.py 310 310 0%
umap_narrative/702_consensus_divisive_datamapplot.py 432 432 0%
umap_narrative/801_narrative_report_batch.py 785 785 0%
umap_narrative/802_process_batch_results.py 265 265 0%
umap_narrative/803_check_batch_status.py 175 175 0%
umap_narrative/llm_factory_constructor/init.py 2 2 0%
umap_narrative/llm_factory_constructor/model_provider.py 157 157 0%
umap_narrative/polismath_commentgraph/init.py 1 0 100%
umap_narrative/polismath_commentgraph/cli.py 270 270 0%
umap_narrative/polismath_commentgraph/core/init.py 3 3 0%
umap_narrative/polismath_commentgraph/core/clustering.py 108 108 0%
umap_narrative/polismath_commentgraph/core/embedding.py 104 104 0%
umap_narrative/polismath_commentgraph/lambda_handler.py 219 219 0%
umap_narrative/polismath_commentgraph/schemas/init.py 2 0 100%
umap_narrative/polismath_commentgraph/schemas/dynamo_models.py 160 9 94%
umap_narrative/polismath_commentgraph/tests/conftest.py 17 17 0%
umap_narrative/polismath_commentgraph/tests/test_clustering.py 74 74 0%
umap_narrative/polismath_commentgraph/tests/test_embedding.py 55 55 0%
umap_narrative/polismath_commentgraph/tests/test_storage.py 87 87 0%
umap_narrative/polismath_commentgraph/utils/init.py 3 0 100%
umap_narrative/polismath_commentgraph/utils/converter.py 283 237 16%
umap_narrative/polismath_commentgraph/utils/group_data.py 354 336 5%
umap_narrative/polismath_commentgraph/utils/storage.py 584 518 11%
umap_narrative/reset_conversation.py 159 50 69%
umap_narrative/run_pipeline.py 453 312 31%
utils/general.py 62 41 34%
Total 10877 7640 30%

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 279 to 288
# Create engine
uri = self.config.get_uri()
self.engine = sa.create_engine(
uri,
pool_size=self.config.pool_size,
max_overflow=self.config.max_overflow,
pool_recycle=300, # Recycle connections after 5 minutes
connect_args={"connect_timeout": 5},
pool_pre_ping=True,
)
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description says the 5s DB connect_timeout is test-only, but this change applies it to the main SQLAlchemy engine used by PostgresClient. A hard-coded 5s timeout (and enabling pool_pre_ping) can cause production requests to fail under transient network/DB load. Consider making the timeout configurable (e.g., env/config defaulting to libpq default) and only overriding to 5s in test config, or gate it behind an explicit "fast-fail" setting.

Copilot uses AI. Check for mistakes.
Comment on lines 50 to 65
try:
# Check if DATABASE_URL is set and use it if available
database_url = os.environ.get("DATABASE_URL")
if database_url:
logger.info(f"Using DATABASE_URL: {database_url.split('@')[1] if '@' in database_url else '(hidden)'}")
conn = psycopg2.connect(database_url)
conn = psycopg2.connect(database_url, connect_timeout=5)
else:
# Fall back to individual connection parameters
conn = psycopg2.connect(
dbname=os.environ.get("DATABASE_NAME", "polisDB_prod_local_mar14"),
user=os.environ.get("DATABASE_USER", "colinmegill"),
password=os.environ.get("DATABASE_PASSWORD", ""),
host=os.environ.get("DATABASE_HOST", "localhost"),
port=os.environ.get("DATABASE_PORT", 5432),
connect_timeout=5,
)
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

connect_timeout=5 is now applied in this script's connect_to_db() path for both DATABASE_URL and per-field config. If this script is used against remote/prod DBs, a hard-coded 5s timeout can be too aggressive and cause intermittent failures. Consider making the timeout configurable (env/arg) with a safer default, and keep the 5s value for tests/local dev only.

Copilot uses AI. Check for mistakes.
Comment on lines +74 to 75
return psycopg2.connect(database_url, connect_timeout=5)

Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_db_connection() now hard-codes connect_timeout=5 for all connections. For a utility that may be used against slower or remote Postgres instances, this can cause avoidable failures. Consider making the timeout configurable (env/CLI) and defaulting to libpq's default unless explicitly opting into fast-fail behavior.

Suggested change
return psycopg2.connect(database_url, connect_timeout=5)
# Allow optional override of the database connect timeout via environment variable.
# If DATABASE_CONNECT_TIMEOUT is not set, fall back to psycopg2/libpq defaults.
timeout_env = os.environ.get("DATABASE_CONNECT_TIMEOUT")
connect_kwargs = {}
if timeout_env:
try:
connect_kwargs["connect_timeout"] = int(timeout_env)
except ValueError:
# Invalid value; ignore and use default behavior.
print(
f"Warning: DATABASE_CONNECT_TIMEOUT={timeout_env!r} is not a valid integer; "
"using psycopg2 default connect timeout."
)
return psycopg2.connect(database_url, **connect_kwargs)

Copilot uses AI. Check for mistakes.
Comment on lines +62 to +97
def require_s3(
endpoint: str | None = None,
timeout: float = 3.0,
) -> None:
"""Skip the test if S3/MinIO is not responding.

Uses pytest.skip (not fail) because MinIO is a local dev service
that is not available in CI.
"""
import os

import boto3
from botocore.config import Config

endpoint = endpoint or os.environ.get(
"AWS_S3_ENDPOINT", "http://host.docker.internal:9000"
)
cfg = Config(
connect_timeout=timeout,
read_timeout=timeout,
retries={"max_attempts": 0},
signature_version="s3v4",
)
client = boto3.client(
"s3",
endpoint_url=endpoint,
region_name="us-east-1",
aws_access_key_id=os.environ.get("AWS_ACCESS_KEY_ID", "minioadmin"),
aws_secret_access_key=os.environ.get("AWS_SECRET_ACCESS_KEY", "minioadmin"),
config=cfg,
verify=False,
)
try:
client.list_buckets()
except Exception as exc:
pytest.skip(f"S3/MinIO is not available at {endpoint}: {exc}")
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description says both require_dynamodb() and require_s3() use pytest.fail() to fail fast, but require_s3() currently uses pytest.skip() when MinIO is unavailable. Either update the PR description to match the implemented behavior, or change require_s3() to fail if the intention is to treat missing S3 as a hard test failure.

Copilot uses AI. Check for mistakes.
Comment on lines 281 to 288
self.engine = sa.create_engine(
uri,
pool_size=self.config.pool_size,
max_overflow=self.config.max_overflow,
pool_recycle=300, # Recycle connections after 5 minutes
connect_args={"connect_timeout": 5},
pool_pre_ping=True,
)
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This introduces connect_timeout=5 (and pool_pre_ping) in delphi/polismath/database/postgres.py, which is part of the non-test codepath. That conflicts with the PR description stating the 5s timeout was reverted from production code and kept only in tests; either move this to test-only configuration or update the PR description/commit message to match the actual scope.

Copilot uses AI. Check for mistakes.
@jucor jucor force-pushed the jc/vectorize-participant-info branch from 67731ff to afbd68a Compare March 30, 2026 16:49
@jucor jucor force-pushed the jc/fix-test-db-connection branch from b68bd5b to 583f955 Compare March 30, 2026 16:49
jucor and others added 13 commits March 30, 2026 17:58
Without a connect_timeout, psycopg2 and SQLAlchemy wait for the OS TCP
timeout (~30s+) when the database is unreachable, causing tests to stall.

Add connect_timeout=5 to all psycopg2.connect() calls and to the
SQLAlchemy engine (via connect_args). Also add pool_pre_ping=True so
stale pooled connections are detected. Change DB-dependent tests from
pytest.skip to pytest.fail so a missing database is surfaced as a
failure rather than silently skipped.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Revert connect_timeout from production DB code (postgres.py,
run_math_pipeline.py, regression_download.py) — a 5s timeout could
break real deployments where infra is slow to respond.

Instead, add require_dynamodb() and require_s3() helpers to conftest
that probe the service with a short timeout and zero retries, then
pytest.fail() immediately. Applied to test_dynamodb_direct,
test_batch_id, and test_minio_access — all previously hung forever
when Docker was down.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
MinIO is a local dev service not available in CI, so use pytest.skip
instead of pytest.fail to avoid breaking the CI test suite.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add require_dynamodb() guard so the test fails in ~3s instead of
hanging for minutes on a TCP connection to a stopped DynamoDB.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add require_dynamodb() guard to the dynamodb_resource fixture so it
fails in ~3s instead of hanging on a TCP connection to stopped DynamoDB.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add pytest-timestamper>=0.0.10 to dev dependencies for detecting hung tests
  (prints timestamps in -v mode so hangs are obvious from stopped progress)

- Fix test_501_calculate_comment_extremity hang by also mocking PostgresClient.
  The test already mocked GroupDataProcessor but PostgresClient.initialize()
  was still called at line 57 for exclude_comment_selections, causing a hang
  when Docker (Postgres) was paused.

Verified: both tests pass in <0.2s with Docker paused.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Move `from run_pipeline import main` from module level to inside the test
function. Module-level import loads SentenceTransformer, UMAP, evoc during
test collection. With xdist, multiple workers collecting simultaneously
*might* cause lock contention and hangs.

Note: A previous session reported indefinite hangs under xdist, but we
couldn't reproduce it. This change is defensive — deferred imports are
good practice regardless.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…n type

- Initialize timing_stats_compared to {} instead of None when benchmark
  is off, preventing AttributeError in generate_report()
- Fix compute_all_stages() return type from Dict[str, Dict[str, Any]]
  to Dict[str, Any] since it also contains "timings": Dict[str, float]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
connect_to_db() in test_postgres_real_data.py used POSTGRES_* env vars
with wrong defaults (password='postgres', database='polismath'), causing
both test_conversation_from_postgres and test_pakistan_conversation_batch
to fail with auth errors even when Postgres is running.

Switch to DATABASE_URL (the standard in all other delphi Python code)
and load it from the repo root .env via python-dotenv.

Refs #2442

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
connect_to_db() in test_postgres_real_data.py had a hardcoded .env path
that doesn't work inside Docker containers, and required DATABASE_URL
with no fallback. Now uses find_dotenv() to search up the directory tree,
and falls back to DATABASE_HOST/PORT/NAME/USER/PASSWORD (which the
docker-compose.test.yml delphi service already provides).

Also removes dead POSTGRES_* env vars from the CI exec command — nothing
in Delphi reads those anymore.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The session-scoped conversation cache fixture and the
pytest_collection_modifyitems reordering hook were inadvertently
removed when require_dynamodb/require_s3 helpers were added.
test_legacy_clojure_regression.py depends on this fixture.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
pyproject.toml forces -n0 (sequential) because the session-scoped
conversation cache makes sequential ~1.6-1.8x faster than parallel.
The xdist_group markers were never active. Restore the plain
pytest.param() calls that match the rest of the stack.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jucor jucor force-pushed the jc/vectorize-participant-info branch from afbd68a to 6138219 Compare March 30, 2026 17:05
@jucor jucor force-pushed the jc/fix-test-db-connection branch from 583f955 to 380b00d Compare March 30, 2026 17:05
This was referenced Mar 30, 2026
@jucor
Copy link
Copy Markdown
Collaborator Author

jucor commented Mar 30, 2026

Superseded by spr-managed PR stack. See the new stack starting at #2508.

@jucor jucor closed this Mar 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants