Add S3 storage and multi-hop transfer tutorial section #766
Addresses feedback from PR rucio#744:
- Remove Jupyter notebook
- Inline script content directly as code blocks
- Replace inline script comments with prose descriptions
- Remove docker exec wrapping; assume reader is in a Rucio admin environment
- Narrow scope to S3 RSE setup only; remove environment initialization steps
for more information, see https://pre-commit.ci
| This tutorial covers how to register S3-compatible storage (MinIO) as Rucio Storage Elements (RSEs), configure credentials for both Rucio and FTS, and set up RSE distances to enable multi-hop transfers between S3 and XRootD endpoints. | ||
| The examples use a Docker Compose playground environment with two MinIO instances (MINIO1, MINIO2) and three XRootD servers (XRD1, XRD2, XRD3). The commands assume you are already operating within a Rucio admin environment with the `rucio` and `rucio-admin` CLI tools available. |
For clarity, `rucio` and `rucio-admin` have been merged since Rucio 38, and the admin role is handled by the permission policies. I'd change this to:
| The examples use a Docker Compose playground environment with two MinIO instances (MINIO1, MINIO2) and three XRootD servers (XRD1, XRD2, XRD3). The commands assume you already have a Rucio instance with an admin account. |
| Register both MinIO instances as RSEs with S3 protocol configuration. The `gfal.NoRename` implementation is used because S3 does not support server-side rename operations. | ||
| ```bash |
This can be reduced to a for loop over MINIO1 and MINIO2, which would make it easier to read.
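A minimal sketch of such a loop, assuming the per-RSE commands differ only in the RSE name. Only the `sign_url` attribute command is reproduced from the diff; the protocol registration command is truncated in this view, so it appears only as a comment. `RUN` and `configure_minio_rses` are hypothetical names introduced for illustration, not Rucio options.

```shell
# Hypothetical dry-run switch: defaults to "echo" so the commands are only printed.
# Set RUN= (empty) to execute them for real.
RUN="${RUN-echo}"

configure_minio_rses() {
  for rse in MINIO1 MINIO2; do
    # The protocol registration command from the tutorial would go here,
    # with "$rse" in place of the hard-coded RSE name.
    $RUN rucio rse attribute add "$rse" --key sign_url --value s3
  done
}

configure_minio_rses
```

With the default dry run, this prints one `rucio rse attribute add` line per MinIO RSE, which makes it easy to check the loop before running it against a live instance.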
| --prefix /rucio/ \ | ||
| --impl rucio.rse.protocols.gfal.NoRename \ | ||
| --domain-json '{"lan": {"read": 1, "write": 1, "delete": 1}, "wan": {"read": 1, "write": 1, "delete": 1, "third_party_copy_read": 1, "third_party_copy_write": 1}}' | ||
| rucio rse attribute add MINIO1 --key sign_url --value s3 |
These could use some inline comments to explain each of the options, or at least a link to the config params page in the section description.
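One possible shape for those comments, sketched with the three option values visible in the diff held in variables. The variable names are hypothetical, and the reading of the domain values (0 disables an operation, 1 is the most preferred) is an assumption that should be checked against the Rucio protocol configuration page.

```shell
# Path prefix prepended to file paths on the storage endpoint.
PREFIX='/rucio/'

# Protocol implementation class. gfal.NoRename is used because S3 does not
# support server-side rename operations.
IMPL='rucio.rse.protocols.gfal.NoRename'

# Per-domain settings: "lan" covers local access, "wan" covers remote access
# and third-party copy. Assumption: 0 disables an operation, 1 is most preferred.
DOMAIN_JSON='{"lan": {"read": 1, "write": 1, "delete": 1}, "wan": {"read": 1, "write": 1, "delete": 1, "third_party_copy_read": 1, "third_party_copy_write": 1}}'

# Print the flags as they would be passed to the protocol registration command.
printf '%s\n' "--prefix $PREFIX" "--impl $IMPL" "--domain-json '$DOMAIN_JSON'"
```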
| ```bash | ||
| ID1=$(rucio rse show MINIO1 | grep '^ id:' | awk '{print$2}') | ||
| ID2=$(rucio rse show MINIO2 | grep '^ id:' | awk '{print$2}') | ||
| cat >/opt/rucio/etc/rse-accounts.cfg <<JSON |
This doesn't render to anything special. I'd just use the console tag instead:
$ cat /opt/rucio/etc/rse-accounts.cfg
{
"$ID1": {
"access_key": "admin",
"secret_key": "password",
"signature_version": "s3v4",
"region": "us-east-1"
},
"$ID2": {
"access_key": "admin",
"secret_key": "password",
"signature_version": "s3v4",
"region": "us-east-1"
}
}
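To keep the generation step separate from the rendered file, the heredoc could live in a small helper. This is a sketch only: `render_rse_accounts` is a hypothetical function name, the IDs in the example call are placeholders, and the credentials are the playground values shown above rather than anything suitable for production.

```shell
# Print an rse-accounts.cfg body for two RSE IDs passed as $1 and $2.
render_rse_accounts() {
  cat <<JSON
{
  "$1": {
    "access_key": "admin",
    "secret_key": "password",
    "signature_version": "s3v4",
    "region": "us-east-1"
  },
  "$2": {
    "access_key": "admin",
    "secret_key": "password",
    "signature_version": "s3v4",
    "region": "us-east-1"
  }
}
JSON
}

# Example with placeholder IDs; in the playground these would come from `rucio rse show`.
render_rse_accounts "id-of-minio1" "id-of-minio2"
```

In the tutorial the output would be redirected to `/opt/rucio/etc/rse-accounts.cfg`, for example `render_rse_accounts "$ID1" "$ID2" > /opt/rucio/etc/rse-accounts.cfg`, and the file could then be shown in a separate `console` block.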
| ### Configuring RSE Distances for Multi-Hop | ||
| RSE distances tell Rucio which transfer paths are available and their relative cost. Setting a distance of 1 between MINIO RSEs and XRD3 establishes the multi-hop path: transfers from MinIO to XRD1 or XRD2 will route through XRD3 as an intermediate. |
This explanation is technically correct, but a little verbose. I recommend adding a Mermaid chart to make this more visual.
| RSE distances establish transfer paths between RSEs. Setting a distance of 1 between the source and an intermediate ensures that the route through the intermediate is always preferred over longer direct transfers. | |
| ```mermaid | |
| graph TD | |
| MINIO[MINIO RSE] | |
| XRD1[XRD1 RSE] | |
| XRD2[XRD2 RSE] | |
| XRD3[XRD3 RSE] | |
| MINIO -.->|distance=1| XRD3 | |
| XRD3 -.->|distance=1| XRD1 | |
| XRD3 -.->|distance=1| XRD2 |
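The edges in the chart can also be written out as data, which is convenient for driving a loop. The sketch below only prints the hops. It assumes the single MINIO node stands for both MINIO1 and MINIO2, and `multihop_edges` is a hypothetical helper name. The command that actually sets a distance is not shown in this thread, so it is left out rather than guessed.

```shell
# Edges from the diagram, one per line: "<source> <destination> <distance>".
multihop_edges() {
  for minio in MINIO1 MINIO2; do
    echo "$minio XRD3 1"
  done
  for xrd in XRD1 XRD2; do
    echo "XRD3 $xrd 1"
  done
}

# Dry run: list each hop. Replace the echo with the distance command
# used in the tutorial to apply the topology.
multihop_edges | while read -r src dst dist; do
  echo "$src -> $dst (distance=$dist)"
done
```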
Hi @alessio94, have you had a chance to look at these comments?
Hi @voetberg, thanks for the follow-up and apologies for the slow response. I'll address the remaining comments, but I want to be transparent: this PR is now at its third or fourth iteration and I'm finding it difficult to prioritize given that my ATLAS qualification task effectively concluded weeks ago. I'd appreciate it if you could consolidate any remaining feedback in one pass once I push the next update, so we can move toward a final review without further back-and-forth.

A couple of questions before I start:
- For the suggested changes you left inline, should I simply accept them as-is, or are there parts you'd like me to rework independently?
- Given how many commits have landed on main since this branch was created, would it make more sense to open a fresh PR on a rebased branch rather than continuing here?

I'll aim to push the changes within the next few days.
Added detailed instructions for configuring S3 storage and multi-hop transfers in Rucio, including setting up MinIO instances, registering RSEs, and verifying the setup.

This PR supersedes rucio#744, rucio#739, and rucio#766. Addresses feedback from PR rucio#766.

Thanks for the review, @voetberg. I have implemented the requested documentation changes:
- Updated the introduction to reflect that `rucio` and `rucio-admin` have been merged since Rucio 38, and that admin access is now handled through permission policies.
- Reduced the duplicated MinIO RSE registration commands to a loop over `MINIO1` and `MINIO2`.
- Added explanatory comments for the RSE attributes used in the S3 protocol configuration.
- Reworked the `rse-accounts.cfg` section to separate the commands used to generate the file from the rendered configuration output, using a `console` block for the inspected file content.
- Simplified the explanation of RSE distances and added a Mermaid diagram to make the multi-hop topology clearer.
- Removed the Jupyter notebook material from the documentation, keeping the demo setup focused on the operator workflow.
I will do my best on this, but if you make changes that need additional comments after this, I cannot make promises.
It makes more sense to just modify this existing PR with the suggested changes, you can simply use
All the content on this page is new, so it shouldn't have to be rebased. The PR doesn't show any conflicts, so making a new PR or rebasing won't do anything.
This PR supersedes #744 and #739.