Skip to content

Optimize initial import and update to MusicBrainz schema 30#24

Open
MockingMagician wants to merge 2 commits into
acoustid:mainfrom
MockingMagician:v30.0.0
Open

Optimize initial import and update to MusicBrainz schema 30#24
MockingMagician wants to merge 2 commits into
acoustid:mainfrom
MockingMagician:v30.0.0

Conversation

@MockingMagician
Copy link
Copy Markdown

  • Temporarily disable triggers and constraints (session_replication_role) during import for massive performance gain.
  • Update SQL scripts to MusicBrainz schema 30.
  • Add progress bars (tqdm) for download and import.
  • Improve Docker environment (docker-compose.dev.yml, entrypoint script).
  • Bump version to 30.0.0

- Temporarily disable triggers and constraints (session_replication_role) during import for massive performance gain.
- Update SQL scripts to MusicBrainz schema 30.
- Add progress bars (tqdm) for download and import.
- Improve Docker environment (docker-compose.dev.yml, entrypoint script).
- Bump version to 30.0.0
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Dec 29, 2025

Walkthrough

Bumps mbslave to 30.0.0: adds medium GID and redirect table, new artist_release/artist_release_group partitions and triggers, resumable HTTP downloads with progress, Poetry-based dev/test flow, Docker/dev-compose updates, and removal of CAA/EAA AMQP trigger infrastructure.

Changes

Cohort / File(s) Summary
Version & metadata
mbslave/__init__.py, pyproject.toml, CHANGELOG.rst
Version -> 30.0.0; adds tqdm, moves dev deps into tool.poetry.group.dev, updates changelog.
CI/CD workflows
.github/workflows/build-mbslave-container-image.yml, .github/workflows/test.yml
Update actions (checkout→v4, docker actions→v3/v5), add buildx, caching, expand image tags; test workflow adds Python container, PostgreSQL service, Poetry install and test/init steps.
Docker & dev compose
Dockerfile.mbslave, Dockerfile.postgres, docker-compose.dev.yml, docker-entrypoint.sh
New Postgres image build with mbslave deps; mbslave image pins Poetry 1.8.5 and uses docker-entrypoint.sh; new dev compose defines db and mbslave services, volumes, and healthchecks.
Core Python changes
mbslave/replication.py, mbslave/tests/test_docker_db.py
Adds resumable download_url (Range requests) and TqdmFileReader for COPY streams, --work-dir and --only-db CLI options, safer create_user/create_database checks; new DB integration tests.
SQL: schema additions
mbslave/sql/CreateTables.sql, mbslave/sql/updates/*.sql, mbslave/sql/updates/schema-change/*.sql, mbslave/sql/CreateIndexes.sql, mbslave/sql/CreatePrimaryKeys.sql
Add medium.gid (UUID), medium_gid_redirect table (PK gid, new_id), partitioned artist_release and artist_release_group with new columns/indexes/functions, migrations to populate and enforce constraints.
SQL: triggers/functions (adds)
mbslave/sql/CreateFunctions.sql, mbslave/sql/CreateTriggers.sql, mbslave/sql/CreateMirrorOnlyFunctions.sql, mbslave/sql/CreateMirrorOnlyTriggers.sql, mbslave/sql/CreateReplicationTriggers*.sql, mbslave/sql/updates/*.sql
New mirror trigger functions (e.g., a_upd_medium_mirror, release_group primary/secondary mirror funcs), triggers and constraint-trigger entries to enqueue pending updates; new helper functions (e.g., set_mediums_recordings_first_release_dates).
SQL: schema cleanup / drops
mbslave/sql/Drop*.sql (multiple), mbslave/sql/TruncateTables.sql, mbslave/sql/InsertTestData.sql
Symmetric DROP statements for new objects; test data updated to include medium.gid; new TRUNCATE for medium_gid_redirect.
Removed AMQP trigger infra
mbslave/sql/caa/CreateMQTriggers.sql, mbslave/sql/caa/DropMQTriggers.sql, mbslave/sql/eaa/CreateMQTriggers.sql, mbslave/sql/eaa/DropMQTriggers.sql
Deletes cover/event AMQP trigger functions and corresponding drop scripts (removes CAA/EAA reindex/move/delete trigger logic).
CAA/EAA views tweak
mbslave/sql/caa/CreateViews.sql, mbslave/sql/eaa/CreateViews.sql
Removes mime_type != 'application/pdf' exclusion so PDFs may be considered front art.
Development docs & VCS
DEVELOPMENT.rst, README.rst, .gitignore
Switch to Poetry-based development instructions, add docker-compose dev guidance, remove apt line, add /.idea/ and /.bin/ to .gitignore.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main objectives: schema upgrade to v30 and import performance optimization through trigger/constraint disabling.
Description check ✅ Passed The description directly relates to the changeset, covering schema updates, performance optimization, Docker improvements, and version bump with specific technical details.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 22

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
Dockerfile.mbslave (1)

1-24: Optional: Address static analysis hints for production-readiness.

The static analysis tools suggest two improvements for production environments:

  1. Add a HEALTHCHECK instruction so orchestrators can monitor container health:
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD pg_isready -h db || exit 1
  1. Run as non-root user for security:
RUN useradd -m -u 1000 mbslave
USER mbslave

These aren't critical for a development environment but would be important for production deployments.

pyproject.toml (1)

3-24: Align project version and Poetry dev-group usage

  • Line 3: version = "28.0.0" doesn’t match the stated 30.0.0 bump. If 30.0.0 is the target release, update this value to avoid packaging/version drift.
  • Lines 16-23: Moving test tooling into [tool.poetry.group.dev.dependencies] is fine, but plain poetry install (as used in the workflow) will not install the dev group under Poetry 1.8.x. If check.sh or tests rely on pytest, flake8, mypy, etc., switch CI to poetry install --with dev (or similar) or move required test deps back to the default group.
  • psycopg2 is declared both as a runtime dependency (line 11) and again in the dev group (line 18). Keeping it only in [tool.poetry.dependencies] is enough and avoids confusion.
♻️ Duplicate comments (2)
mbslave/sql/updates/schema-change/30.all.sql (2)

17-30: Same issue - DROP without IF EXISTS.

As noted in the standalone migration file, these DROP statements could fail on fresh installs. The compiled script inherits this issue from the source.


166-170: DROP statements for artist_release structures.

Same pattern - DROP FUNCTION and DROP TABLE without IF EXISTS. These will fail if running against a database where these objects don't exist.

📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a6d1686 and 7919769.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (60)
  • .github/workflows/build-mbslave-container-image.yml
  • .github/workflows/test.yml
  • .gitignore
  • CHANGELOG.rst
  • DEVELOPMENT.rst
  • Dockerfile.mbslave
  • Dockerfile.postgres
  • README.rst
  • bin/act
  • docker-compose.dev.yml
  • docker-entrypoint.sh
  • mbslave/__init__.py
  • mbslave/replication.py
  • mbslave/sql/CreateFKConstraints.sql
  • mbslave/sql/CreateFunctions.sql
  • mbslave/sql/CreateIndexes.sql
  • mbslave/sql/CreateMirrorOnlyFunctions.sql
  • mbslave/sql/CreateMirrorOnlyTriggers.sql
  • mbslave/sql/CreatePrimaryKeys.sql
  • mbslave/sql/CreateReplicationTriggers.sql
  • mbslave/sql/CreateReplicationTriggers2.sql
  • mbslave/sql/CreateTables.sql
  • mbslave/sql/CreateTriggers.sql
  • mbslave/sql/DisableLastUpdatedTriggers.sql
  • mbslave/sql/DropFKConstraints.sql
  • mbslave/sql/DropFunctions.sql
  • mbslave/sql/DropIndexes.sql
  • mbslave/sql/DropMirrorOnlyTriggers.sql
  • mbslave/sql/DropPrimaryKeys.sql
  • mbslave/sql/DropReplicationTriggers.sql
  • mbslave/sql/DropReplicationTriggers2.sql
  • mbslave/sql/DropTables.sql
  • mbslave/sql/DropTriggers.sql
  • mbslave/sql/EnableLastUpdatedTriggers.sql
  • mbslave/sql/InsertTestData.sql
  • mbslave/sql/TruncateTables.sql
  • mbslave/sql/caa/CreateMQTriggers.sql
  • mbslave/sql/caa/CreateViews.sql
  • mbslave/sql/caa/DropMQTriggers.sql
  • mbslave/sql/eaa/CreateMQTriggers.sql
  • mbslave/sql/eaa/CreateViews.sql
  • mbslave/sql/eaa/DropMQTriggers.sql
  • mbslave/sql/updates/20240221-mbs-13492.sql
  • mbslave/sql/updates/20240726-mbs-9373.sql
  • mbslave/sql/updates/20241017-mbs-9253-13464.sql
  • mbslave/sql/updates/20241017-mbs-9253-master_and_standalone.sql
  • mbslave/sql/updates/20241017-mbs-9253-mirror_only.sql
  • mbslave/sql/updates/20241125-mbs-13832.sql
  • mbslave/sql/updates/20250320-mbs-13768-fks.sql
  • mbslave/sql/updates/20250320-mbs-13768.sql
  • mbslave/sql/updates/20250408-mbs-13322.sql
  • mbslave/sql/updates/20250408-mbs-13964-all.sql
  • mbslave/sql/updates/20250425-mbs-13464-master_and_standalone.sql
  • mbslave/sql/updates/20250425-mbs-13464.sql
  • mbslave/sql/updates/20250425-mbs-13966.sql
  • mbslave/sql/updates/schema-change/30.all.sql
  • mbslave/sql/updates/schema-change/30.master_and_standalone.sql
  • mbslave/sql/updates/schema-change/30.mirror_only.sql
  • mbslave/tests/test_docker_db.py
  • pyproject.toml
💤 Files with no reviewable changes (6)
  • mbslave/sql/eaa/CreateViews.sql
  • mbslave/sql/caa/CreateViews.sql
  • mbslave/sql/eaa/CreateMQTriggers.sql
  • mbslave/sql/eaa/DropMQTriggers.sql
  • mbslave/sql/caa/DropMQTriggers.sql
  • mbslave/sql/caa/CreateMQTriggers.sql
🧰 Additional context used
🪛 Checkov (3.2.334)
.github/workflows/test.yml

[low] 19-20: Base64 High Entropy String

(CKV_SECRET_6)


[low] 48-49: Base64 High Entropy String

(CKV_SECRET_6)

docker-compose.dev.yml

[low] 7-8: Base64 High Entropy String

(CKV_SECRET_6)

Dockerfile.postgres

[low] 1-49: Ensure that HEALTHCHECK instructions have been added to container images

(CKV_DOCKER_2)


[low] 1-49: Ensure that a user for the container has been created

(CKV_DOCKER_3)

Dockerfile.mbslave

[low] 1-24: Ensure that HEALTHCHECK instructions have been added to container images

(CKV_DOCKER_2)


[low] 1-24: Ensure that a user for the container has been created

(CKV_DOCKER_3)

🪛 Hadolint (2.14.0)
Dockerfile.postgres

[warning] 3-3: Pin versions in apt get install. Instead of apt-get install <package> use apt-get install <package>=<version>

(DL3008)


[warning] 19-19: Avoid use of cache directory with pip. Use pip install --no-cache-dir <package>

(DL3042)

🪛 Ruff (0.14.10)
mbslave/replication.py

27-27: Class TqdmFileReader inherits from object

Remove object inheritance

(UP004)


28-28: Missing return type annotation for special method __init__

Add return type annotation: None

(ANN204)


259-259: Use format specifiers instead of percent format

(UP031)


262-262: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.

(S310)


264-264: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.

(S310)


268-268: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.

(S310)


322-322: Use format specifiers instead of percent format

Replace with format specifiers

(UP031)


322-322: Possible SQL injection vector through string-based query construction

(S608)


327-327: Use format specifiers instead of percent format

Replace with format specifiers

(UP031)


335-335: Use f-string instead of format call

Convert to f-string

(UP032)


337-337: Use format specifiers instead of percent format

Replace with format specifiers

(UP031)


357-357: Call startswith once with a tuple

Merge into a single startswith call

(PIE810)

🪛 YAMLlint (1.37.1)
.github/workflows/test.yml

[error] 13-13: too many spaces inside brackets

(brackets)


[error] 13-13: too many spaces inside brackets

(brackets)

docker-compose.dev.yml

[error] 14-14: too many spaces inside brackets

(brackets)


[error] 14-14: too many spaces inside brackets

(brackets)

🔇 Additional comments (71)
mbslave/replication.py (4)

53-56: LGTM!

Using if value: prevents empty environment variables from overwriting defaults with empty strings. Sensible behavior for configuration.


379-397: LGTM!

The superuser=True addition is required for SET session_replication_role. The work_dir handling is consistent with mbslave_import_main.


722-724: LGTM!

Early return for --only-db is clean. Allows partial initialization for Docker/CI scenarios where you want to create user/database but defer schema setup.


856-873: LGTM!

CLI arguments are consistent and well-documented. The --work-dir option enables resumable downloads by persisting tarballs, which is a good UX improvement for large imports.

mbslave/sql/CreatePrimaryKeys.sql (1)

267-267: LGTM! Primary key addition for medium_gid_redirect.

The constraint follows the established pattern and properly defines the primary key on the gid column. This matches the corresponding drop statement in DropPrimaryKeys.sql.

.gitignore (1)

11-11: LGTM! Standard IDE exclusion.

Adding .idea/ to gitignore is the right move for keeping IntelliJ IDEA workspace files out of version control.

mbslave/__init__.py (1)

4-4: LGTM! Version bump to 30.0.0.

The major version bump correctly reflects the MusicBrainz schema 30 update.

mbslave/sql/DropPrimaryKeys.sql (1)

267-267: LGTM! Matching drop for medium_gid_redirect primary key.

The DROP CONSTRAINT statement properly complements the PK addition in CreatePrimaryKeys.sql and uses the safe IF EXISTS clause.

CHANGELOG.rst (1)

1-9: LGTM! Clear and comprehensive changelog entry.

The Version 30.0.0 entry accurately documents the major changes: import optimization (MBS-13492), schema 30 update, progress bars, Docker improvements, and dependency updates.

DEVELOPMENT.rst (1)

8-51: LGTM! Clear Poetry-based development workflow.

The rewrite provides clear instructions for:

  • Installing Poetry and dependencies
  • Running commands via poetry run or poetry shell
  • Setting up the development database with Docker Compose
  • Running tests with pytest

This aligns well with modern Python development practices.

mbslave/sql/updates/20240221-mbs-13492.sql (1)

1-23: LGTM! Well-structured migration for editor beginner flag.

The migration correctly:

  • Wraps changes in a transaction
  • Disables statement timeout for potentially long-running operation
  • Uses bitwise OR (8192) to set the beginner flag
  • Excludes ModBot and deleted editors
  • Identifies beginners as those who either joined within 2 weeks OR don't have at least 10 non-autoedit approved edits (OFFSET 9 means ≥10 rows must exist to fail the NOT EXISTS check)

The logic is sound and the transaction boundaries are appropriate.

mbslave/sql/CreateMirrorOnlyFunctions.sql (1)

12-223: Session-level search_path is configured when executing through the mbslave wrapper, but direct SQL execution bypasses this protection.

The functions in CreateMirrorOnlyFunctions.sql lack explicit SET search_path clauses. When executed via mbslave psql -f, the wrapper sets search_path to musicbrainz,public through the PGOPTIONS environment variable (replication.py, line 805). However, if these SQL files are executed directly with psql or through other deployment mechanisms, the search_path won't be configured, causing the functions to fail or access the wrong schema.

mbslave/sql/updates/20250320-mbs-13768.sql (2)

21-29: Table structure and indexes look correct.

The medium_gid_redirect table follows the standard pattern used by other gid_redirect tables. Primary key on gid and index on new_id will support efficient lookups in both directions.


11-12: UUID namespace is correct and follows established migration pattern; performance concern depends on table size.

The namespace 6ba7b8119dad11d180b400c04fd430c8 is the DNS namespace UUID (standard UUID v3 namespace) and its unformatted string representation is the correct parameter format—consistent with decades of use throughout the codebase. The comment in CreateFunctions.sql confirms this: -- NameSpace_URL = '6ba7b8119dad11d180b400c04fd430c8'.

The full-table UPDATE follows the established pattern used in similar migrations (e.g., 20160516-mbs-8838.sql for bulk type GID generation, 20210419-mbs-11456.sql for artist_credit). On a database with millions of medium rows, this will acquire locks and consume time, but that's the cost of the operation—unavoidable without application-level batching. If this runs during an upgrade on live systems with significant traffic, plan accordingly.

mbslave/sql/DropReplicationTriggers.sql (1)

207-207: Cleanup statement looks correct.

The drop trigger statement follows the established pattern and uses IF EXISTS for idempotency.

mbslave/sql/DropFunctions.sql (1)

42-43: Function drops look correct.

The drop statements follow the file's pattern and are placed logically after the related a_upd_release_group() drop.

mbslave/sql/TruncateTables.sql (1)

271-271: Truncate statement is correct and properly positioned.

The statement follows the established pattern with RESTART IDENTITY CASCADE and maintains alphabetical ordering.

mbslave/sql/DropTables.sql (1)

271-271: Table drop statement is correct and properly positioned.

The drop statement maintains alphabetical ordering and follows the file's pattern.

mbslave/sql/DropIndexes.sql (1)

406-407: Index drop statements are correct and properly positioned.

Both index drops are placed logically with other medium-related indexes and follow the file's pattern.

mbslave/sql/updates/20250408-mbs-13964-all.sql (1)

5-15: The dependency checks out. set_recordings_first_release_dates() is defined in CreateFunctions.sql and available to this migration. The function logic correctly aggregates recordings from tracks filtered by the given medium IDs, then passes them to the dependency function for processing.

mbslave/sql/DropReplicationTriggers2.sql (1)

207-207: LGTM - Consistent with schema pattern.

The addition of the DROP TRIGGER statement for medium_gid_redirect follows the same pattern as other gid_redirect tables in this file.

mbslave/sql/CreateReplicationTriggers.sql (1)

818-820: LGTM - Replication trigger properly configured.

The new trigger for medium_gid_redirect correctly uses verbose logging and follows the same pattern as other gid_redirect triggers.

mbslave/sql/DropFKConstraints.sql (1)

600-600: LGTM - FK constraint drop follows pattern.

The DROP CONSTRAINT statement for medium_gid_redirect_fk_new_id is consistent with other gid_redirect foreign key drops in this file.

mbslave/sql/updates/20250320-mbs-13768-fks.sql (1)

1-12: LGTM - FK constraint properly defined.

The foreign key constraint follows standard patterns with proper transaction handling and error handling enabled. The default RESTRICT behavior (no ON DELETE clause) is consistent with other gid_redirect table foreign keys.

mbslave/sql/updates/20250425-mbs-13464-master_and_standalone.sql (1)

1-19: LGTM - FK constraints with appropriate cascade behavior.

Both foreign key constraints use ON DELETE CASCADE, which is correct for a relationship table like artist_release. When either the artist or release is deleted, the corresponding relationship rows should be removed.

mbslave/sql/CreateReplicationTriggers2.sql (1)

818-820: LGTM - dbmirror2 trigger correctly defined.

The trigger for medium_gid_redirect follows the dbmirror2 pattern (no verbose parameter) which is different from the dbmirror triggers but consistent with all other triggers in this file.

docker-entrypoint.sh (1)

4-7: Review the KEEP_ALIVE logic.

The current condition means that if KEEP_ALIVE is set to "1" or "true", the container will stay alive even if a command is provided. This might not be the intended behavior.

Consider if you want:

  • Option A (current): KEEP_ALIVE overrides any provided command
  • Option B (alternative): Only keep alive when no command is provided, unless KEEP_ALIVE forces it

If Option B is preferred:

-if [ "$KEEP_ALIVE" = "1" ] || [ "$KEEP_ALIVE" = "true" ] || [ $# -eq 0 ]; then
+if [ $# -eq 0 ] || [ "$KEEP_ALIVE" = "1" ] || [ "$KEEP_ALIVE" = "true" ]; then

This reordering doesn't change the logic, but if you want KEEP_ALIVE to be an override only when no command is given:

-if [ "$KEEP_ALIVE" = "1" ] || [ "$KEEP_ALIVE" = "true" ] || [ $# -eq 0 ]; then
+if [ $# -eq 0 ]; then
+    if [ "$KEEP_ALIVE" != "0" ] && [ "$KEEP_ALIVE" != "false" ]; then

Clarify the intended behavior when KEEP_ALIVE is set and a command is provided.

Dockerfile.mbslave (2)

4-4: LGTM - Poetry version pinned for reproducibility.

Pinning poetry to 1.8.5 ensures consistent builds across environments.


21-24: LGTM - Entrypoint properly configured.

The docker-entrypoint.sh script is correctly copied, made executable, and invoked through dumb-init for proper signal handling.

mbslave/sql/CreateFKConstraints.sql (1)

3009-3012: LGTM! FK constraint follows established pattern.

The new foreign key constraint for medium_gid_redirect is consistent with other gid-redirect constraints in the file and properly establishes referential integrity with the medium table.

.github/workflows/build-mbslave-container-image.yml (3)

7-7: Good: Added dev branch to CI triggers.

Including the dev branch in push and PR triggers enables testing and building containers for development changes before they reach main.

Also applies to: 13-13


29-32: Good: Added Docker Buildx setup.

Setting up Docker Buildx explicitly ensures consistent build capabilities across CI runners and enables advanced features like multi-platform builds.


61-62: Performance improvement: GitHub Actions cache added.

The cache-from and cache-to configuration will significantly speed up Docker builds by reusing layers from previous builds. This is a solid optimization for CI runtime.

docker-compose.dev.yml (2)

12-12: Verify: 8GB shared memory allocation for PostgreSQL.

The shm_size: 8gb is quite large. This is intentional for PostgreSQL performance (especially for operations like parallel queries, hash joins, and sorts), but ensure this aligns with your deployment environment's available resources.

Are you confident that development environments where this compose file runs will have 8GB of free memory available for shared memory? If not, consider documenting this requirement or making it configurable via an environment variable.


1-33: LGTM: Clean Docker Compose setup with proper health checks and dependencies.

The compose file properly establishes service dependencies with the condition: service_healthy check, ensuring mbslave only starts after the database is ready. Volume management is appropriate for both data persistence and configuration.

mbslave/sql/DropTriggers.sql (1)

197-198: LGTM: Trigger drops align with schema 30 mirror changes.

The new DROP TRIGGER statements for release_group_primary_type and release_group_secondary_type mirror triggers are consistent with the corresponding CREATE TRIGGER statements in other migration files. This ensures proper cleanup during schema updates.

Also applies to: 391-392

Dockerfile.postgres (2)

26-34: Hardcoded credentials in config file.

The default credentials "musicbrainz/musicbrainz" are hardcoded in both ENV vars and the config file. While acceptable for development, ensure this is documented as insecure for any non-local usage.

Is this image intended only for local development, or could it be used in staging/testing environments? If the latter, consider making credentials configurable at runtime rather than baked into the image.


37-49: LGTM: Clean initialization script setup.

The heredoc approach to create the init script is clean, and the script correctly uses PGUSER=postgres to connect as the superuser during initialization. The mbslave init --empty command with the proper config path is appropriate for initial schema setup.

mbslave/sql/updates/schema-change/30.mirror_only.sql (2)

11-19: LGTM: Mirror trigger setup is clean.

The AFTER UPDATE triggers on release_group_primary_type and release_group_secondary_type properly invoke their corresponding mirror procedures. The DROP IF EXISTS pattern ensures idempotent migrations.


21-36: Good: Efficient WHEN clause prevents unnecessary trigger execution.

The constraint triggers only fire when child_order actually changes (OLD.child_order IS DISTINCT FROM NEW.child_order), avoiding unnecessary work for unrelated updates. The DEFERRABLE INITIALLY DEFERRED setting is appropriate for maintaining consistency across batch operations.

mbslave/sql/DropMirrorOnlyTriggers.sql (1)

20-21: Mirror-only drop script correctly includes new release-group type triggers

The added DROP TRIGGERs for a_upd_release_group_*_mirror and apply_artist_release_group_pending_updates_mirror on the primary/secondary type tables line up with the new mirror triggers and keep create/drop symmetry. No issues from a DDL perspective.

Also applies to: 36-37

mbslave/sql/updates/20241017-mbs-9253-mirror_only.sql (1)

1-31: Mirror-only triggers and constraint triggers look sound

The migration cleanly:

  • Recreates a_upd_release_group_primary_type_mirror / _secondary_type_mirror as AFTER UPDATE row-level triggers.
  • Adds deferrable apply_artist_release_group_pending_updates_mirror constraint triggers on both tables, firing only when child_order actually changes.

That matches the pending-update model and should avoid unnecessary work in apply_artist_release_group_pending_updates(). No obvious correctness issues.

mbslave/sql/CreateTables.sql (2)

422-480: Artist release/group table changes match new sort semantics

Adding:

  • artist_release.name VARCHAR COLLATE musicbrainz NOT NULL (line 432), and
  • artist_release_group.primary_type_child_order, secondary_type_child_orders, and name VARCHAR COLLATE musicbrainz NOT NULL (lines 461, 463, 466)

is consistent with the updated sort indexes in CreateIndexes.sql. From a schema standpoint this looks fine.

The only thing to double‑check is that all population paths (e.g. get_artist_release_rows, get_artist_release_group_rows, and the various apply_*_pending_updates functions/triggers) are updated to populate these new columns correctly, otherwise they’ll persist as NULL/empty values after migration.


295-305: medium.gid + medium_gid_redirect DDL aligns with the rest of schema 30

  • The updated medium table with track_count INTEGER NOT NULL DEFAULT 0 and gid UUID NOT NULL (lines 295-305) matches the new indexing and test data and is compatible with existing usages of medium_index and track.medium.
  • The new medium_gid_redirect table (lines 2970-2974) mirrors the existing *_gid_redirect tables and is wired up with an index in CreateIndexes.sql.

This all looks coherent. Just make sure:

  • Initial import scripts / COPY formats include medium.gid, and
  • Any existing redirect-handling code is updated to also consult medium_gid_redirect.

Also applies to: 2970-2974

mbslave/sql/updates/20250425-mbs-13966.sql (1)

1-69: First-release-date backfill logic is correct; confirm function is actually used

  • set_release_group_first_release_date(release_group_id INTEGER) chooses the earliest non‑NULL (year, month, day) per release group via ORDER BY rd.year/month/day NULLS LAST LIMIT 1, then updates release_group_meta and queues the group in artist_release_group_pending_update. That logic looks right.
  • The bulk UPDATE release_group_meta ... FROM (SELECT DISTINCT ON (release_group.id) ...) using the temporary tmp_release_first_release_date_2025_q2 table also correctly computes and applies the first release date only when values change.
  • Truncating artist_release_group_pending_update at the end is a sensible cleanup given that mirror triggers will have queued updates during the bulk change.

Note that this migration does not actually call set_release_group_first_release_date; if this function is meant to be part of the ongoing schema (e.g., used by other migrations or triggers), that’s fine, but if it’s purely migration helper code, you might prefer to either call it here or move it to the core DDL instead of leaving it effectively unused.

mbslave/sql/InsertTestData.sql (1)

113-115: Test data changes for medium.gid and artist–artist relationship look consistent

  • The INSERT INTO medium (id, gid, release, position, format, name) statements (lines 113-114, 160-165) line up with the updated medium table definition and the new unique index on medium.gid. The gids are distinct, and all referenced releases/medium IDs match the surrounding test data.
  • The new link row with id = 4, link_type = 108 (line 245) and the l_artist_artist row (id=1, link=4, entity0=8, entity1=9) (line 260) respect the uniqueness constraint on (entity0, entity1, link, link_order) and use the default link_order = 0.

The only thing to confirm is that link_type with id = 108 is present in the reference data that’s loaded before this script; otherwise, FK creation on link.link_type would fail.

Also applies to: 160-165, 242-261

mbslave/sql/CreateIndexes.sql (1)

72-73: Index updates align with new sort keys and medium gid schema

  • Lines 72-73: artist_release_{nonva,va}_idx_sort now sort by (artist, first_release_date, catalog_numbers, country_code, barcode, name, release). This matches the new artist_release columns and should give a more sensible ordering than the previous single-character sort key.
  • Lines 80-81: artist_release_group_{nonva,va}_idx_sort now include primary_type_child_order, primary_type, secondary_type_child_orders, secondary_types, first_release_date, and name before release_group, which lines up with the new artist_release_group schema.
  • Line 510: CREATE UNIQUE INDEX medium_idx_gid ON medium (gid) is appropriate for the new non‑NULL medium.gid.
  • Line 736: CREATE INDEX medium_gid_redirect_idx_new_id ON medium_gid_redirect (new_id) is consistent with other *_gid_redirect tables and supports reverse lookups by target id.

All of these look consistent with the DDL and expected query patterns.

Also applies to: 80-81, 510-510, 736-736

.github/workflows/test.yml (1)

10-51: Workflow containerisation is sound; the Poetry install issue is a non-issue

The move to job-level Python containers with a postgresql service works correctly, and using postgresql as the host in MBSLAVE_DB_HOST matches the service name. The postgres health checks are also well-configured.

Poetry 1.8.5's default poetry install actually does install the dev group (and all non-optional groups) by default—you don't need --with dev. The dev dependencies (pytest, flake8, mypy) that check.sh relies on are already being pulled in, so no change needed here.

The YAML spacing in the python_version matrix on line 13 ([ '3.8', '3.9', '3.10' ]) may trigger a linter complaint about spaces inside brackets. If your linter is configured to enforce this, reformat it as ['3.8', '3.9', '3.10'].

The hard-coded Postgres and mbslave passwords here are clearly dummy values; if they ever become real credentials, move them to GitHub Actions secrets.

Also applies to: 54-90

Likely an incorrect or invalid review comment.

mbslave/sql/CreateTriggers.sql (3)

457-459: LGTM - Medium update trigger added.

The trigger correctly fires AFTER UPDATE and delegates to the mirror function. This will propagate medium changes to recording first release dates when a medium's release association changes.


586-591: LGTM - Release group type triggers look correct.

Both triggers fire on UPDATE and call their respective mirror functions to handle child_order changes on primary/secondary types.


1311-1322: LGTM - Constraint triggers with conditional execution.

The WHEN (OLD.child_order IS DISTINCT FROM NEW.child_order) condition efficiently limits trigger execution to only rows where child_order actually changed. The DEFERRABLE INITIALLY DEFERRED allows batching of updates within a transaction.

mbslave/sql/updates/20250425-mbs-13464.sql (2)

44-48: Barcode handling edge case.

The regex replacement [^0-9]+ strips non-digits, then left(..., 18) truncates to fit BIGINT. This handles most barcodes correctly. However, if the barcode is empty string, it becomes '0' which is fine. If it's NULL, the CASE doesn't handle it - but that's likely okay since regexp_replace(NULL, ...) returns NULL anyway.


75-79: Index definitions look good.

The sort indexes cover the common query pattern by artist with date/catalog/country/barcode ordering. The unique indexes on (release, artist) enforce the expected cardinality constraint per partition.

mbslave/sql/CreateMirrorOnlyTriggers.sql (3)

27-29: LGTM - Mirror trigger for medium updates.

Follows the same pattern as other mirror triggers in this file.


69-74: LGTM - Mirror triggers for release group type updates.

These call the _mirror suffix functions which are designed to be safe on mirror databases.


123-134: LGTM - Constraint triggers with WHEN conditions.

The WHEN (OLD.child_order IS DISTINCT FROM NEW.child_order) condition is appropriate here - only fires when child_order actually changes, which is the field that affects sort ordering in artist_release_group.

mbslave/sql/updates/20241017-mbs-9253-13464.sql (2)

40-95: Function logic looks solid.

The dynamic SQL approach for single vs. all release groups is a documented performance optimization. The bool_and for unofficial status correctly identifies release groups where all releases have non-official status.


97-128: Mirror trigger functions correctly guard replicated tables.

Both functions include the warning comment and only modify artist_release_group_pending_update, which is a non-replicated table. The IF (NEW.child_order IS DISTINCT FROM OLD.child_order) check avoids unnecessary work.

mbslave/sql/updates/20241017-mbs-9253-master_and_standalone.sql (2)

7-17: LGTM - Foreign key constraints with CASCADE.

The ON DELETE CASCADE is appropriate here since artist_release_group is a materialized/denormalized table. When the source artist or release_group is deleted, the corresponding rows should be cleaned up automatically.


19-35: LGTM - Triggers match the pattern in CreateTriggers.sql.

These are the master/standalone equivalents of the mirror triggers. The constraint triggers have the same WHEN condition to optimize execution.

mbslave/sql/updates/schema-change/30.master_and_standalone.sql (2)

69-72: Consider ON DELETE behavior for medium_gid_redirect FK.

This FK doesn't specify ON DELETE CASCADE. If a medium is deleted while redirects still point to it, this will cause FK violations. Depending on the intended behavior:

  • If redirects should be cleaned up when a medium is deleted: add ON DELETE CASCADE
  • If medium deletion should be blocked while redirects exist: current behavior is correct (implicit ON DELETE NO ACTION)

What's the expected behavior when a medium with GID redirects is deleted? Check if medium_gid_redirect rows should be cleaned up automatically.


51-61: LGTM - artist_release FK constraints.

Same pattern as artist_release_group - ON DELETE CASCADE is correct for this materialized table.

mbslave/sql/CreateFunctions.sql (5)

532-564: LGTM - Mirror trigger functions for release group types.

These functions correctly:

  1. Include the warning comment about not modifying replicated tables
  2. Check child_order IS DISTINCT FROM before inserting pending updates
  3. Return NULL as expected for AFTER triggers

1153-1166: Good fix - LEFT JOIN for release.

The change from an implicit join to LEFT JOIN release ON release.release_group = release_group.id is correct. A release_group with no releases should still have its metadata updated (with NULL dates), rather than being skipped entirely.


1202-1238: LGTM - Medium-related helper functions.

set_mediums_recordings_first_release_dates provides a batch interface for updating recording dates by medium IDs. The a_upd_medium_mirror trigger function correctly fires only when release changes, avoiding unnecessary work for other column updates.


1953-1989: LGTM - Enhanced get_artist_release_group_rows.

The additions:

  • rgpt.child_order::SMALLINT for primary type ordering
  • array_agg(DISTINCT rgst.child_order ...) for secondary type ordering
  • Additional JOINs to type tables
  • GROUP BY extended with rgpt.child_order

All align with the new artist_release_group table schema.


1453-1461: Exception handling pattern for delete_unused_url.

The function catches foreign_key_violation and silently returns. This is a safe pattern when you want to attempt deletion but accept that foreign keys may block it. The comment-less nature is fine since the purpose is clear from the exception type.

mbslave/sql/updates/schema-change/30.all.sql (4)

296-306: Medium GID migration - good approach.

The migration:

  1. Adds nullable column
  2. Populates with deterministic UUIDs via generate_uuid_v3
  3. Adds NOT NULL constraint
  4. Creates unique index

Using deterministic UUIDs from 'medium' || id ensures reproducibility across environments. The sequence avoids the issue of trying to add NOT NULL to an empty column.


347-350: TRUNCATE recording_first_release_date - data loss during migration.

This wipes all recording first release dates. The expectation is they'll be rebuilt by the triggers. Verify this is documented in migration instructions - users need to run a rebuild after the migration or accept temporarily missing data.

Is there a documented rebuild step after running this migration to repopulate recording_first_release_date?


379-418: Data migration for release_group_meta first_release_date.

The migration:

  1. Creates temp table with release first release dates
  2. Updates release_group_meta with correct dates using DISTINCT ON ordering
  3. Only updates where dates actually changed (lines 405-409)
  4. Truncates pending updates to avoid redundant recalculation

The comment at lines 411-417 explains why the truncation is safe. This is a well-structured data migration.


246-270: CAA view recreation.

The view drops and recreates cover_art_archive.index_listing. The new definition:

  • Uses edit.close_time IS NOT NULL for approved status
  • Uses LEFT JOIN to edit, allowing NULL edit references
  • Aggregates type names into an array

No functional issues - this is a straightforward view definition.

Comment thread DEVELOPMENT.rst
Comment thread docker-compose.dev.yml
Comment thread docker-entrypoint.sh
Comment thread Dockerfile.postgres
Comment thread Dockerfile.postgres
Comment thread mbslave/sql/updates/20250425-mbs-13464.sql
Comment thread mbslave/tests/test_docker_db.py
Comment thread mbslave/tests/test_docker_db.py
Comment thread mbslave/tests/test_docker_db.py
Comment thread README.rst
@MockingMagician
Copy link
Copy Markdown
Author

@lalinsky Is there someone actively maintaining this software !?

@lalinsky
Copy link
Copy Markdown
Member

lalinsky commented Jan 9, 2026

@lalinsky Is there someone actively maintaining this software !?

Noone is maintaining is actively, I'm unfortunately super busy with other things and did not notice this PR. Thank you for the ping. I'll have a look later today.

@MockingMagician
Copy link
Copy Markdown
Author

@lalinsky Is there someone actively maintaining this software !?

Noone is maintaining is actively, I'm unfortunately super busy with other things and did not notice this PR. Thank you for the ping. I'll have a look later today.

@lalinsky Thanks for the reply, and for your time ^^

@lalinsky
Copy link
Copy Markdown
Member

@MockingMagician Sorry it took me so long, the changes look good. What is the act binary? Can you remove that?

Comment thread mbslave/replication.py
logger.info("Skipping %s (table %s already contains data)", name, fulltable)
continue
try:
cursor.execute("ALTER TABLE %s DISABLE TRIGGER ALL" % fulltable)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a conceptual misundstanding, but you are supposed to be copying to schema without indexes/primary keys/unique keys and triggers. I don't like this being here, because it encourages importing to full DB and that's just not supported.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it for performance because it greatly improves import time.
Since the import only runs when the table is empty, it should be safe in practice.

Comment thread mbslave/replication.py
logger.info("Loading %s to %s", name, fulltable)
cursor.copy_expert('COPY {} FROM STDIN'.format(fulltable), tar.extractfile(member))
try:
cursor.execute("SET session_replication_role = 'replica'")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, what does this improve is the schema contains nothing but tabls? Insert into a table without any index or key is super fast.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. But I can remove it if you think it will be better.

Comment thread mbslave/replication.py
if source.startswith('http://') or source.startswith('https://'):
filename = source.split('/')[-1]
dest_path = os.path.join(work_dir, filename)
download_url(source, dest_path)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the benefit of downloading the file to disk? For my use case, I see that as negative.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The benefit is resilience: these files can be multiple GB, and writing to disk avoids re-downloading everything from scratch when the download gets interrupted.
If disk usage is considered a downside for your use case, I can make it optional.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my case, this saved me several hours of re-downloading during development/testing.

@MockingMagician
Copy link
Copy Markdown
Author

@MockingMagician Sorry it took me so long, the changes look good. What is the act binary? Can you remove that?

Hello @lalinsky,
I used act during development to run the GitHub Actions pipeline locally, and I forgot to remove it afterwards.
I’ll remove it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants