93 changes: 93 additions & 0 deletions documents/wg-application.md
# CUDA WG - draft

## What value do you want to bring to the project?

The Working Group is an effort to combine forces toward reliable support for writing safe GPGPU CUDA code in Rust.
*(denzp marked this conversation as resolved; outdated)*
The work done here should also clear the path for other GPGPU platforms in the future.
**Contributor:**
I think I'd just leave this out. You are right that some of the work here could be reused by other GPGPU Rust targets, but it feels quite speculative to say anything about that.

**Member Author:**

Sorry, I don't quite agree here. One of the main, and probably the biggest, topics we will discuss is safety in SIMT code. So far, without digging too much into the details, it seems this will be almost completely platform-agnostic.

All the compilation warnings and lints we define could be applied to any other "coming next" GPGPU platform.

**Contributor:**

> One of the main, and probably the biggest, topics we will discuss is safety in SIMT code.

This feels like something that ought to be done by the Unsafe Code Guidelines WG / Lang team, and only tangentially here.

> So far, without digging too much into the details, it seems this will be almost completely platform-agnostic.

I don't know. Previous implementations of SIMT in Rust, like this extension, look quite different from what we will be able to do for the time being in Rust without significantly extending the language. At the same time, this is all different from OpenMP, rayon, ISPC, and the dozens of other approaches to the problem.

Tackling this problem changes the roadmap of the WG from "getting CUDA support kind-of-working" to "language extensions for SIMT".

While this is something that could be tackled later, I think it is unrealistic to try to tackle any of this at this point in time.

> All the compilation warnings and lints we define could be applied to any other "coming next" GPGPU platform.

Safe Rust code not invoking undefined behavior cannot be enforced by a lint or a warning. It has to be something that is impossible 100% of the time, rejected by the type system.
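To illustrate the point, here is a host-side analogue using `std::thread` and `std::sync::Barrier` (a sketch only; the function and data are hypothetical, and real CUDA divergence semantics differ). In CUDA, reaching `__syncthreads()` from divergent control flow is undefined behavior, and whether a branch diverges can depend on runtime data, so no static lint can reliably reject it:

```rust
use std::sync::{Arc, Barrier};
use std::thread;

// Host-side analogue of barrier divergence (hypothetical example, not
// real GPU code). Whether the branch diverges depends on runtime data,
// which is why a lint cannot catch this statically.
fn kernel_analogue(tid: usize, data: &[u32], barrier: &Barrier) -> u32 {
    if data[tid] > 0 {
        // Only "threads" taking this branch reach the barrier. If any
        // thread takes the `else` branch, the rest wait forever — the
        // CPU analogue of the CUDA undefined behavior.
        barrier.wait();
        data[tid] * 2
    } else {
        0
    }
}

fn main() {
    // Uniform data: every thread takes the same branch, so this run
    // completes. Change one element to 0 and the program deadlocks.
    let data = Arc::new(vec![1u32, 2, 3, 4]);
    let barrier = Arc::new(Barrier::new(data.len()));
    let handles: Vec<_> = (0..data.len())
        .map(|tid| {
            let (data, barrier) = (Arc::clone(&data), Arc::clone(&barrier));
            thread::spawn(move || kernel_analogue(tid, &data, &barrier))
        })
        .collect();
    let results: Vec<u32> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    println!("{:?}", results); // [2, 4, 6, 8]
}
```

No warning fires on the deadlocking variant; only a type-system-level guarantee (e.g. proving the branch condition is uniform across the thread block) could rule it out.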

**Contributor:**

I am not saying that these issues are not important. Maybe it would be worth addressing them under a different question, e.g., about which kind of support we need from other teams, or which other teams we expect to collaborate with. We could add there that we expect to collaborate with the UCG WG and the Lang team to make sure that the SIMT programming model exposed by CUDA kernels remains sound, or something like that.

**Member Author:**

> Tackling this problem changes the roadmap of the WG from "getting CUDA support kind-of-working" to "language extensions for SIMT".

Okay, thanks, I see the point!


## Why should it be put into place?

The first steps toward basic support of the feature have already been taken by individual developers.
*(denzp marked this conversation as resolved; outdated)*
Despite recent progress, there are still many open questions about the safety and soundness of SIMT (Single Instruction, Multiple Threads) code, and answering them will require a lot of collaboration.
*(denzp marked this conversation as resolved; outdated)*

## What is the working group about?

We want to work together on building a solid foundation for writing CUDA code.
*(denzp marked this conversation as resolved; outdated)*
Getting a safe and production-ready development experience on Stable Rust is our primary goal!

Also, a major obstacle to developing GPGPU applications and algorithms in Rust today is the lack of learning resources.
We plan to pay off this "documentation debt" with a broad range of tutorials, examples, and references.

## What is the working group not about?

The WG is not focused on promoting or developing "standard" frameworks.
Instead, we want to provide basic, reliable support for the feature and inspire the community to start using it.
This should lead to experiments with different approaches to using it, and to the creation of great tooling.
*(denzp marked this conversation as resolved; outdated)*

## Is your WG long-running or temporary?

In our current vision, the WG should live until we fulfill our goals.
*(denzp marked this conversation as resolved; outdated)*

In the end, we hope the WG will evolve into another one covering similar topics:
supporting other GPGPU platforms, or creating higher-level frameworks that improve the end-to-end experience based on community feedback.

## What is your long-term vision?

Having a reliable and safe CUDA development experience is our ultimate goal.
This should include:

* Getting `nvptx64-nvidia-cuda` to a [Tier 2](https://forge.rust-lang.org/platform-support.html) support state.
* Test infrastructure on real hardware, with the tests running as part of the Rust CI process.
* Rich documentation, references, tutorials, and examples.
* A broad set of new compile errors, warnings, and lints for the SIMT execution model, to help avoid pitfalls and ensure code soundness.

*(denzp marked this conversation as resolved)*
## How do you expect the relationship to the project to be?

The WG will be responsible for CUDA-related issues and user requests.
**Contributor:**

I don't think we should be responsible for this; people can request all sorts of things. I'd just document here what our current relationship with the project is. For example, we send PRs to Rust components, pick Alex Crichton, who would be our Core Team liaison, as a reviewer, etc.

**Member Author:**

Oh, that was bad wording 😃 I rather meant that we will participate in discussions around everything related to the NVPTX target.

**Member Author:**

I don't think we should mention anybody in particular, at least at this point. It might feel like we are obligating that person to help us.

**Contributor:**

IIRC, they volunteered back then to play the role of the core team member responsible for the WG, but I know they have different priorities right now, so I'd ask them again.

From the POV of reviewing PRs to Rust components, they are probably the person with the most complete picture of target-support/target-specific stuff. So picking Alex as a reviewer is probably the right thing to do anyway. If they don't have time to review, they'll say so, but they will at least be able to suggest who should review it instead.

**Member Author:**

Oh, nice, I didn't know about this.

> So picking Alex as a reviewer is probably the right thing to do anyway.

Totally agree here!

Still, do you think we should mention that fact in the application? It seems to go a little too much into detail, for me.

This includes [already reported issues](https://github.com/rust-lang/rust/issues?q=is%3Aopen+is%3Aissue+label%3AO-NVPTX) and those that will be opened as a result of the WG's work.
An important aspect is that the WG will take care of the state of the `rust-ptx-linker` and do its best to avoid blocking someone else's work (as, unfortunately, happened in [rust#59752](https://github.com/rust-lang/rust/pull/59752)).

### How do you want to establish accountability?

For that purpose, we will publish the agenda and the decisions made at our meetings.
*(denzp marked this conversation as resolved; outdated)*
Once we achieve important milestones, we plan to make public announcements.
**Contributor:**

I think this is a great idea. There are also blog posts published by some of the WG members that show how to use CUDA from Rust and showcase the different frameworks. Maybe we could have a document in the repo that links to them?

**Member Author:**

I've opened #16 - I like how the Async Foundations WG presents their work with a website.


## Which other WGs do you expect to have close contact with?

We would like to cooperate with the Core or Compiler Teams in discussions about safety in SIMT code.
*(denzp marked this conversation as resolved; outdated)*
Additionally, it would be very important to discuss a strategy for reliable deployment of the `rust-ptx-linker` (likely as a `rustup` component).
*(denzp marked this conversation as resolved; outdated)*

On the other hand, the proposed [Machine Learning WG](https://internals.rust-lang.org/t/enabling-the-formation-of-new-working-groups/10218/11) could leverage CUDA-accelerated computing capabilities, so we can discuss their use cases.

**Contributor:**

I'd say here that we expect to have the most contact with Alex Crichton, who could inform the Core Team about our work if they deem it necessary.

There are some open questions about the soundness of CUDA kernels and Rust's type system, which we will raise to the Language team when we deem appropriate (maybe link here to some of the open issues?).

We have also secured access to a Power9+Volta GPGPU cluster from OSI for CI (I think @peterhj requested the access?), and we'd like to work with the Infra team to enable rust-lang/rust to use it at some point. Initially, though, we'd like to experiment with it on a smaller component like stdsimd to gain experience.

**Contributor:**

Sorry for the prolonged absence (due to a combo of work + travel). We got moved around a couple of times, but what we actually have access to now is a POWER8 + 2x Pascal (Tesla P100) machine, reached through the usual HPC batch-scheduler setup (Torque/PBS) via another POWER8 head node. They did install nvidia-docker on the compute node, though, so the docker-based stdsimd tests should run on it fine. I had a custom CI-like setup that was working for single-GPU nodes, though the Torque/PBS stuff was the last blocker I was dealing with.

**Member Author:**

@peterhj wonderful, thanks for the update!

> They did install nvidia-docker on the compute node though, so the docker-based stdsimd tests should run on it fine.

So, let's say, we could theoretically run a CUDA-adjusted and resource-restricted Rust Playground there? I feel it would be extremely helpful for quick experiments.

Perhaps you could provide some basic documentation about these things when you have time? It would be useful for people who have never worked with HPC job schedulers (me, at least 😉).

**Contributor (@gnzlbg, Jun 7, 2019):**

I suppose that you get ssh access to some front-end nodes, and the only way to access the nodes with the GPGPUs is to schedule jobs with Torque.

I don't think they allow Playground-like services for people to execute arbitrary code there, but even if they did, we would need to host the playground somewhere else, and each command (e.g. run) would need to ssh into the system, schedule a job with Torque, wait for the job to be scheduled, fetch the job logs, and report back.

So you would probably be waiting a while for the results of a playground run there.

EDIT: I suppose we could schedule a 24/7 interactive job using Torque there, but... if I were the admin and saw that, I'd just kick us out and revoke our credentials.
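For readers who have never used an HPC batch scheduler, a minimal sketch of what a Torque/PBS submission for the docker-based tests might look like (all directive values, the queue name, and the image/script names below are hypothetical placeholders, not actual rust-cuda artifacts):

```shell
#!/bin/bash
#PBS -N rust-nvptx-tests       # job name (hypothetical)
#PBS -l nodes=1:ppn=4:gpus=1   # one compute node with one GPU
#PBS -l walltime=01:00:00      # one-hour limit
#PBS -q gpu                    # queue name is site-specific

# Torque starts the job in $HOME; change to the submission directory.
cd "$PBS_O_WORKDIR"

# Run the docker-based test suite on the compute node; the image and
# command are placeholders for whatever stdsimd's CI would actually use.
nvidia-docker run --rm example/rust-nvptx-ci:latest ./run-tests.sh
```

The script would be submitted from the head node with `qsub`, and stdout/stderr land in the job's log files once it has been scheduled, which is exactly the round-trip latency discussed above.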

## What are your short-term goals?

Mainly, the short-term goals are about evaluating and discussing how we can achieve the long-term goals:
*(denzp marked this conversation as resolved; outdated)*

* Run a poll about community experiences and use cases for CUDA.
* Deploy the `rust-ptx-linker` as a `rustup` component.
* Collect soundness and safety issues that can arise in the SIMT execution model.
* Decide on a testing approach for real hardware.

## Who is the initial leadership?

> TBD...

## How do you intend to make your work accessible to outsiders?

Extensive learning materials and retrospectives on the decisions made should help get more people involved in discussions or further experimentation.
*(denzp marked this conversation as resolved; outdated)*

> TBD... something else?

## Everything that is already decided upon

We already have a [`rust-cuda`](https://github.com/rust-cuda) GitHub organization and a [`rust-cuda`](https://rust-cuda.zulipchat.com) Zulip server.
*(denzp marked this conversation as resolved; outdated)*

> TBD... would it make sense to move to the `rust-lang` Zulip server?

## Where do you need help?

> TBD...
*(denzp marked this conversation as resolved; outdated)*

## Preferred way of contact/discussion

> TBD...
*(denzp marked this conversation as resolved; outdated)*