Skip to content
Open
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 24 additions & 4 deletions dependencies.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -62,6 +62,15 @@ files:
- depends_on_libcugraph
- depends_on_libcugraph_etl
- depends_on_libcugraph_tests
run_notebooks:
output: none
includes:
- cuda_version
- py_version
- run_notebooks
- depends_on_cugraph
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do like creating a new top-level list called just run_notebooks, but think it's confusing to see anything named test_* showing up in this list, especially to see something called test_notebook showing up in run_notebooks when there is also a test_notebooks.

I have an alternative proposal that I think might make this a bit easier to maintain and understand, let me know what you think:

  1. leave test_python_common completely unchanged
  2. create a new list under dependencies: called run_notebooks. Add to that list only packages that are needed for the notebooks and aren't already runtime dependencies of cugraph, pylibcugraph, and libcugraph. Use YAML anchors to reference other things already in dependencies.yaml. Add a code comment above it explaining its purpose.
  3. make this top-level run_notebooks list here contain cuda_version, py_version, run_notebooks, and the depends_on_*cugraph groups
  4. pass --file-key test_notebooks --file-key run_notebooks here https://github.com/rapidsai/docker/blob/2637556074567056aa69a7cb1850ac844ef89748/context/notebooks.sh#L34 (multiple --file-key are allowed)

That way, we could fine-tune this to exactly the libraries that are needed and nothing else, and the purpose will be clear.

Copy link
Copy Markdown
Contributor Author

@jayavenkatesh19 jayavenkatesh19 Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jameslamb I'm pretty clear on everything until step 3.

In step 4 however, we would need a if-elif statement to use only the run_notebooks file-key for cugraph right?
If the test-notebooks key is added, then dependency lists from both keys would be merged, bringing in unwanted libraries like pytest again.

Something like:

    if [ -f "$REPO/dependencies.yaml" ]; then
        # Prefer run_notebooks (runtime-only deps) over test_notebooks
        # (which may include testing libraries like pytest).
        if yq -e '.files.run_notebooks' "$REPO/dependencies.yaml" >/dev/null 2>&1; then
            FILE_KEY="run_notebooks"
        elif yq -e '.files.test_notebooks' "$REPO/dependencies.yaml" >/dev/null 2>&1; then
            FILE_KEY="test_notebooks"
        else
            continue
        fi
        echo "Running dfg on $REPO (file-key: $FILE_KEY)"
        rapids-dependency-file-generator \
            --config "$REPO/dependencies.yaml" \
            --file-key "$FILE_KEY" \
            --matrix "cuda=${CUDA_VER%.*};arch=$(arch);py=${PYTHON_VER}" \
            --output conda >"/dependencies/${REPO}_notebooks_tests_dependencies.yaml"
    fi

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I wasn't thinking about the fact that test_notebooks is still the key in the other repos.

Yes we'll need different keys for different repos, but let's make the code much stricter than it is today and just hard-code that expectation directly.

Something like:

DFG_ARGS=(
  --config "./dependencies.yaml"
  --matrix "cuda=${CUDA_VER%.*};arch=$(arch);py=${PYTHON_VER}"
  --output conda
)
for repo in repos; do
   if [[ "${repo}" == "cugraph" ]]; then
       DFG_ARGS+=(--file-key "run_notebooks")
   else
       DFG_ARGS+=(--file-key "test_notebooks")
   fi
done

That'll fail with a big loud error (instead of silently passing) if any of our expectations aren't met, like:

  • repo doesn't have a dependencies.yaml (<-- this would also catch being in the wrong working directory)
  • repo doesn't have the exact keys we expect

We're only doing this testing for 3 repos there, let's go with specific and strict over generic and permissive, to give ourselves a better chance of catching bugs at the source.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that sounds good!

I've made the changes on both this PR and the rapidsai/docker#853 PR on Docker.

- depends_on_pylibcugraph
- depends_on_libcugraph
test_notebooks:
output: none
includes:
Expand Down Expand Up @@ -483,8 +492,8 @@ dependencies:
- output_types: [conda, requirements]
packages:
- certifi
- ipython
- notebook>=0.5.0
- &ipython ipython
- &notebook notebook>=0.5.0
test_python_common:
common:
- output_types: [conda, pyproject, requirements]
Expand All @@ -494,17 +503,28 @@ dependencies:
- pytest-benchmark
- pytest-cov
- pytest-xdist
- scipy
- &scipy scipy
test_python_cugraph:
common:
- output_types: [conda, pyproject, requirements]
packages:
- certifi
- networkx>=2.5.1
- &networkx networkx>=2.5.1
- *numpy
- packaging
- python-louvain
- scikit-learn>=0.23.1
# Runtime dependencies for included cugraph notebooks, used in the
# rapidsai/notebooks Docker image. Only lists packages not already
# provided by cugraph, pylibcugraph, or libcugraph.
run_notebooks:
common:
- output_types: [conda]
packages:
- *ipython
- *networkx
- *notebook
- *scipy
test_python_pylibcugraph:
common:
- output_types: [conda, pyproject, requirements]
Expand Down
Loading