[Stack 14/27] Fix test DB connection: use DATABASE_URL with dotenv#2443
[Stack 14/27] Fix test DB connection: use DATABASE_URL with dotenv#2443jucor wants to merge 13 commits intojc/vectorize-participant-infofrom
Conversation
There was a problem hiding this comment.
Pull request overview
Updates the Delphi real-data PostgreSQL integration test to connect using a single DATABASE_URL connection string (optionally loaded from a .env file), aligning with the repository’s broader database configuration approach.
Changes:
- Update
connect_to_db()to load.envand readDATABASE_URL. - Simplify psycopg2 connection setup to use the URL DSN instead of individual
POSTGRES_*fields.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| database_url = os.environ.get('DATABASE_URL') | ||
| if not database_url: | ||
| print("DATABASE_URL environment variable is not set") | ||
| return None |
| from dotenv import load_dotenv | ||
|
|
||
| # Load .env from the repo root (two levels up from tests/) | ||
| load_dotenv(Path(__file__).parent.parent.parent / '.env') | ||
|
|
b839591 to
3048f93
Compare
1514e3d to
bafbdf6
Compare
cb7c5cd to
bafbdf6
Compare
3048f93 to
e883338
Compare
bafbdf6 to
e2f39bb
Compare
e883338 to
9dee5d9
Compare
e2f39bb to
89297cf
Compare
9dee5d9 to
0d4cd71
Compare
89297cf to
403be8d
Compare
0d4cd71 to
fa345cd
Compare
403be8d to
464da16
Compare
fa345cd to
655bcfa
Compare
6b81d8e to
6da6172
Compare
655bcfa to
01a7637
Compare
0ef54ca to
b62c0cd
Compare
4ea1333 to
917cba8
Compare
917cba8 to
e45a120
Compare
e45a120 to
b68bd5b
Compare
Delphi Coverage Report
|
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Create engine | ||
| uri = self.config.get_uri() | ||
| self.engine = sa.create_engine( | ||
| uri, | ||
| pool_size=self.config.pool_size, | ||
| max_overflow=self.config.max_overflow, | ||
| pool_recycle=300, # Recycle connections after 5 minutes | ||
| connect_args={"connect_timeout": 5}, | ||
| pool_pre_ping=True, | ||
| ) |
There was a problem hiding this comment.
PR description says the 5s DB connect_timeout is test-only, but this change applies it to the main SQLAlchemy engine used by PostgresClient. A hard-coded 5s timeout (and enabling pool_pre_ping) can cause production requests to fail under transient network/DB load. Consider making the timeout configurable (e.g., env/config defaulting to libpq default) and only overriding to 5s in test config, or gate it behind an explicit "fast-fail" setting.
| try: | ||
| # Check if DATABASE_URL is set and use it if available | ||
| database_url = os.environ.get("DATABASE_URL") | ||
| if database_url: | ||
| logger.info(f"Using DATABASE_URL: {database_url.split('@')[1] if '@' in database_url else '(hidden)'}") | ||
| conn = psycopg2.connect(database_url) | ||
| conn = psycopg2.connect(database_url, connect_timeout=5) | ||
| else: | ||
| # Fall back to individual connection parameters | ||
| conn = psycopg2.connect( | ||
| dbname=os.environ.get("DATABASE_NAME", "polisDB_prod_local_mar14"), | ||
| user=os.environ.get("DATABASE_USER", "colinmegill"), | ||
| password=os.environ.get("DATABASE_PASSWORD", ""), | ||
| host=os.environ.get("DATABASE_HOST", "localhost"), | ||
| port=os.environ.get("DATABASE_PORT", 5432), | ||
| connect_timeout=5, | ||
| ) |
There was a problem hiding this comment.
connect_timeout=5 is now applied in this script's connect_to_db() path for both DATABASE_URL and per-field config. If this script is used against remote/prod DBs, a hard-coded 5s timeout can be too aggressive and cause intermittent failures. Consider making the timeout configurable (env/arg) with a safer default, and keep the 5s value for tests/local dev only.
| return psycopg2.connect(database_url, connect_timeout=5) | ||
|
|
There was a problem hiding this comment.
get_db_connection() now hard-codes connect_timeout=5 for all connections. For a utility that may be used against slower or remote Postgres instances, this can cause avoidable failures. Consider making the timeout configurable (env/CLI) and defaulting to libpq's default unless explicitly opting into fast-fail behavior.
| return psycopg2.connect(database_url, connect_timeout=5) | |
| # Allow optional override of the database connect timeout via environment variable. | |
| # If DATABASE_CONNECT_TIMEOUT is not set, fall back to psycopg2/libpq defaults. | |
| timeout_env = os.environ.get("DATABASE_CONNECT_TIMEOUT") | |
| connect_kwargs = {} | |
| if timeout_env: | |
| try: | |
| connect_kwargs["connect_timeout"] = int(timeout_env) | |
| except ValueError: | |
| # Invalid value; ignore and use default behavior. | |
| print( | |
| f"Warning: DATABASE_CONNECT_TIMEOUT={timeout_env!r} is not a valid integer; " | |
| "using psycopg2 default connect timeout." | |
| ) | |
| return psycopg2.connect(database_url, **connect_kwargs) |
| def require_s3( | ||
| endpoint: str | None = None, | ||
| timeout: float = 3.0, | ||
| ) -> None: | ||
| """Skip the test if S3/MinIO is not responding. | ||
|
|
||
| Uses pytest.skip (not fail) because MinIO is a local dev service | ||
| that is not available in CI. | ||
| """ | ||
| import os | ||
|
|
||
| import boto3 | ||
| from botocore.config import Config | ||
|
|
||
| endpoint = endpoint or os.environ.get( | ||
| "AWS_S3_ENDPOINT", "http://host.docker.internal:9000" | ||
| ) | ||
| cfg = Config( | ||
| connect_timeout=timeout, | ||
| read_timeout=timeout, | ||
| retries={"max_attempts": 0}, | ||
| signature_version="s3v4", | ||
| ) | ||
| client = boto3.client( | ||
| "s3", | ||
| endpoint_url=endpoint, | ||
| region_name="us-east-1", | ||
| aws_access_key_id=os.environ.get("AWS_ACCESS_KEY_ID", "minioadmin"), | ||
| aws_secret_access_key=os.environ.get("AWS_SECRET_ACCESS_KEY", "minioadmin"), | ||
| config=cfg, | ||
| verify=False, | ||
| ) | ||
| try: | ||
| client.list_buckets() | ||
| except Exception as exc: | ||
| pytest.skip(f"S3/MinIO is not available at {endpoint}: {exc}") |
There was a problem hiding this comment.
The PR description says both require_dynamodb() and require_s3() use pytest.fail() to fail fast, but require_s3() currently uses pytest.skip() when MinIO is unavailable. Either update the PR description to match the implemented behavior, or change require_s3() to fail if the intention is to treat missing S3 as a hard test failure.
| self.engine = sa.create_engine( | ||
| uri, | ||
| pool_size=self.config.pool_size, | ||
| max_overflow=self.config.max_overflow, | ||
| pool_recycle=300, # Recycle connections after 5 minutes | ||
| connect_args={"connect_timeout": 5}, | ||
| pool_pre_ping=True, | ||
| ) |
There was a problem hiding this comment.
This introduces connect_timeout=5 (and pool_pre_ping) in delphi/polismath/database/postgres.py, which is part of the non-test codepath. That conflicts with the PR description stating the 5s timeout was reverted from production code and kept only in tests; either move this to test-only configuration or update the PR description/commit message to match the actual scope.
67731ff to
afbd68a
Compare
b68bd5b to
583f955
Compare
Without a connect_timeout, psycopg2 and SQLAlchemy wait for the OS TCP timeout (~30s+) when the database is unreachable, causing tests to stall. Add connect_timeout=5 to all psycopg2.connect() calls and to the SQLAlchemy engine (via connect_args). Also add pool_pre_ping=True so stale pooled connections are detected. Change DB-dependent tests from pytest.skip to pytest.fail so a missing database is surfaced as a failure rather than silently skipped. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Revert connect_timeout from production DB code (postgres.py, run_math_pipeline.py, regression_download.py) — a 5s timeout could break real deployments where infra is slow to respond. Instead, add require_dynamodb() and require_s3() helpers to conftest that probe the service with a short timeout and zero retries, then pytest.fail() immediately. Applied to test_dynamodb_direct, test_batch_id, and test_minio_access — all previously hung forever when Docker was down. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
MinIO is a local dev service not available in CI, so use pytest.skip instead of pytest.fail to avoid breaking the CI test suite. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add require_dynamodb() guard so the test fails in ~3s instead of hanging for minutes on a TCP connection to a stopped DynamoDB. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add require_dynamodb() guard to the dynamodb_resource fixture so it fails in ~3s instead of hanging on a TCP connection to stopped DynamoDB. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add pytest-timestamper>=0.0.10 to dev dependencies for detecting hung tests (prints timestamps in -v mode so hangs are obvious from stopped progress) - Fix test_501_calculate_comment_extremity hang by also mocking PostgresClient. The test already mocked GroupDataProcessor but PostgresClient.initialize() was still called at line 57 for exclude_comment_selections, causing a hang when Docker (Postgres) was paused. Verified: both tests pass in <0.2s with Docker paused. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Move `from run_pipeline import main` from module level to inside the test function. Module-level import loads SentenceTransformer, UMAP, evoc during test collection. With xdist, multiple workers collecting simultaneously *might* cause lock contention and hangs. Note: A previous session reported indefinite hangs under xdist, but we couldn't reproduce it. This change is defensive — deferred imports are good practice regardless. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…n type
- Initialize timing_stats_compared to {} instead of None when benchmark
is off, preventing AttributeError in generate_report()
- Fix compute_all_stages() return type from Dict[str, Dict[str, Any]]
to Dict[str, Any] since it also contains "timings": Dict[str, float]
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
connect_to_db() in test_postgres_real_data.py used POSTGRES_* env vars with wrong defaults (password='postgres', database='polismath'), causing both test_conversation_from_postgres and test_pakistan_conversation_batch to fail with auth errors even when Postgres is running. Switch to DATABASE_URL (the standard in all other delphi Python code) and load it from the repo root .env via python-dotenv. Refs #2442 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
connect_to_db() in test_postgres_real_data.py had a hardcoded .env path that doesn't work inside Docker containers, and required DATABASE_URL with no fallback. Now uses find_dotenv() to search up the directory tree, and falls back to DATABASE_HOST/PORT/NAME/USER/PASSWORD (which the docker-compose.test.yml delphi service already provides). Also removes dead POSTGRES_* env vars from the CI exec command — nothing in Delphi reads those anymore. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The session-scoped conversation cache fixture and the pytest_collection_modifyitems reordering hook were inadvertently removed when require_dynamodb/require_s3 helpers were added. test_legacy_clojure_regression.py depends on this fixture. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
pyproject.toml forces -n0 (sequential) because the session-scoped conversation cache makes sequential ~1.6-1.8x faster than parallel. The xdist_group markers were never active. Restore the plain pytest.param() calls that match the rest of the stack. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
afbd68a to
6138219
Compare
583f955 to
380b00d
Compare
|
Superseded by spr-managed PR stack. See the new stack starting at #2508. |
Summary
connect_timeout=5to all DB connections so tests fail fast instead of hanging when Postgres is unavailable. Reverts timeout from production code (a 5s timeout could break real deployments), keeping it only in test code.require_dynamodb()andrequire_s3()helpers to conftest that probe services with short timeouts andpytest.fail()immediately — applied to all tests that previously hung when Docker services were down.pytest-timestamperfor per-test timing visibility.test_postgres_real_data.pyto useDATABASE_URL(consistent with all other delphi Python code) viapython-dotenv.Refs #2442
Test plan
test_conversation_from_postgrespassestest_pakistan_conversation_batchpasses (9 min, 400K votes)🤖 Generated with Claude Code