
Add mart for OCW resources #2236

Open
pt2302 wants to merge 3 commits into main from pt/ocw_resources_mart

Conversation

pt2302 (Contributor) commented May 21, 2026

What are the relevant tickets?

Part of https://github.com/mitodl/hq/issues/9943.

Description (What does it do?)

This PR adds a dimensional-layer-backed mart for OCW resources, to replace the Superset dataset that currently reads int__ocw__resources directly. It adds two models: dim_ocw_resource (one row per course_uuid/resource_uuid pair, sourced from int__ocw__resources) and marts__ocw_resources (which references only the dimensional layer, not any int__* model). It also surfaces nine fields from the raw resource metadata JSON as scalar columns (resource_license, resource_description, resource_file_type, resource_file_size, resource_ocw_type, resource_audience, resource_level, external_resource_status, and external_resource_wayback_url), and it drops the raw metadata blob from the dim and the mart. A few of these fields (audience, level, wayback_url) are currently sparse but are expected to fill in over time.
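The one-row-per-course_uuid/resource_uuid grain is the property that a test such as dbt_utils.unique_combination_of_columns checks. The hypothetical function below is a minimal shell sketch of the same check outside dbt. It flags duplicate key pairs in a CSV export of the dim. It assumes the export has a header row, that course_uuid and resource_uuid are its first two columns, and that neither field contains a quoted comma. None of this is part of the PR.

```shell
# Hypothetical helper, not part of this PR: print each course_uuid,resource_uuid pair that
# appears more than once in a CSV export (header on line 1, keys in columns 1 and 2).
# Reads the file named in $1, or standard input if no file is given.
find_duplicate_keys() {
  # Skip the header row, count each key pair, and print it the first time it repeats.
  awk -F, 'NR > 1 { key = $1 FS $2; if (++seen[key] == 2) print key }' "$@"
}

# Example (hypothetical file name):
#   find_duplicate_keys dim_ocw_resource.csv
# No output means no key pair repeats in the exported rows.
```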

How can this be tested?

First, run

uv run dbt run \
  --select +marts__ocw_resources \
  --full-refresh \
  --vars 'schema_suffix: <your name>' \
  --project-dir src/ol_dbt/ \
  --profiles-dir src/ol_dbt/ \
  --target dev_production

replacing <your name> with your own schema suffix.

Then, run

uv run dbt test \
  --select int__ocw__resources dim_ocw_resource marts__ocw_resources \
  --vars 'schema_suffix: <your name>' \
  --project-dir src/ol_dbt/ \
  --profiles-dir src/ol_dbt/ \
  --target dev_production
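The dbt run and dbt test commands above share the same --vars, --project-dir, --profiles-dir, and --target flags. Below is a minimal sketch of a shell wrapper that supplies those flags once. The dbt_dev function name and the SCHEMA_SUFFIX and DRY_RUN variables are hypothetical conveniences; the repo does not provide them.

```shell
# Hypothetical wrapper: append the shared flags from the commands above and call `uv run dbt`.
# SCHEMA_SUFFIX must hold your name; set DRY_RUN=1 to print the command instead of running it.
dbt_dev() {
  if [ -z "${SCHEMA_SUFFIX:-}" ]; then
    echo "dbt_dev: set SCHEMA_SUFFIX first" >&2
    return 1
  fi
  # Caller's arguments (subcommand, --select, ...) come first, then the shared flags.
  set -- "$@" \
    --vars "schema_suffix: ${SCHEMA_SUFFIX}" \
    --project-dir src/ol_dbt/ \
    --profiles-dir src/ol_dbt/ \
    --target dev_production
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo uv run dbt "$@"
  else
    uv run dbt "$@"
  fi
}

# Usage (placeholder suffix):
#   export SCHEMA_SUFFIX=jdoe
#   dbt_dev run --select +marts__ocw_resources --full-refresh
#   dbt_dev test --select int__ocw__resources dim_ocw_resource marts__ocw_resources
```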

Finally, smoke-test this in Starburst Galaxy (https://mitol.galaxy.starburst.io/query-editor) by running a query such as

select
    course_number,
    course_title,
    resource_title,
    content_type,
    resource_license,
    resource_file_type,
    external_resource_status,
    external_resource_is_broken
from ol_data_lake_production.ol_warehouse_production_<your name>_mart.marts__ocw_resources
order by course_number
limit 20;
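The mart's schema name embeds your schema suffix. A small shell helper can therefore print the smoke-test query with the suffix filled in, ready to paste into the Galaxy query editor. This is a sketch only. The smoke_test_sql name is hypothetical. Restricting the suffix to letters, digits, and underscores is an assumption, made so the schema name never needs quoting.

```shell
# Hypothetical helper: print the smoke-test query above for a given schema suffix.
# Assumption: suffixes contain only letters, digits, and underscores.
smoke_test_sql() {
  case "$1" in
    '' | *[!A-Za-z0-9_]*)
      echo "smoke_test_sql: suffix must be non-empty and use only [A-Za-z0-9_]" >&2
      return 1
      ;;
  esac
  # Unquoted EOF so that ${1} is expanded inside the heredoc.
  cat <<EOF
select
    course_number,
    course_title,
    resource_title,
    content_type,
    resource_license,
    resource_file_type,
    external_resource_status,
    external_resource_is_broken
from ol_data_lake_production.ol_warehouse_production_${1}_mart.marts__ocw_resources
order by course_number
limit 20;
EOF
}

# Example: smoke_test_sql jdoe
```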

Copilot AI review requested due to automatic review settings May 21, 2026 04:14

Copilot AI left a comment

Pull request overview

This PR is intended to introduce a dimensional-layer-backed OCW resources mart (per the PR description) and begins surfacing additional OCW resource metadata fields as scalar columns for downstream analysis.

Changes:

  • Added scalar extraction of several resource-level metadata fields (license, description, file_type/size, ocw_type, external link status/wayback URL, audience, level) to int__ocw__resources.
  • Updated intermediate and marts YAML model documentation (including adding a new marts__ocw_resources model entry) and cleaned up course_level description formatting.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Files reviewed:

  • src/ol_dbt/models/marts/ocw/_marts__ocw__models.yml: Adds schema/docs for marts__ocw_resources (but currently missing the corresponding model SQL in-repo).
  • src/ol_dbt/models/intermediate/ocw/int__ocw__resources.sql: Extracts additional scalar fields from websitecontent_metadata JSON into dedicated columns.
  • src/ol_dbt/models/intermediate/ocw/_int_ocw__models.yml: Documents the new extracted columns on int__ocw__resources and fixes course_level description wrapping.


Comment on lines +76 to +79
- name: marts__ocw_resources
  description: OCW course resources (files, external resources, video, image) for
    review and analysis
  columns:
pt2302 (Contributor, Author) replied:

Updated in efa4981.

Comment on lines +172 to +173
- dbt_utils.unique_combination_of_columns:
    combination_of_columns:
pt2302 (Contributor, Author) replied:

Updated in 57b582d.

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.


2 participants