WebGPURenderer: Adjust getArrayBufferAsync behavior, add support for partial readback#33322
sunag merged 19 commits into mrdoob:dev
Conversation
I think the idea of sharing buffers is great, but the API proposed in PR #33315 seems simpler and can cover the same use cases as the current one, with the exception of offset/size, which should be easy to implement. Cloning the ArrayBuffer after obtaining it seems a bit contradictory to the observations we made earlier. I would prefer to see a simpler function signature with parameter and return overloads:

```js
class ReadbackBuffer {

	name: string;
	size: number;

	constructor( size: number );

	release();
	dispose();

}
```

```js
getArrayBufferAsync( attribute ) : ArrayBuffer
getArrayBufferAsync( attribute, offset, size ) : ArrayBuffer
getArrayBufferAsync( attribute, readbackBuffer ) : ArrayBuffer
getArrayBufferAsync( attribute, readbackBuffer, offset, size ) : ArrayBuffer
```
It does not cover the same use cases. It includes all the same problematic behavior I called out in the original PR and this one. I would appreciate it if you start asking questions about why I prefer something and have recommended it rather than making sweeping claims like this. I have not suggested this approach for no reason.
All of my comments on why this is a problem seem to continually be ignored and I'm at a complete loss for why. First of all, I never asked for this and as a user I do not want this behavior. This doesn't solve anything I asked for and mandates incredibly confusing functionality, not to mention breaks existing code unnecessarily. This code will no longer work as expected because the buffers are now set to the same buffer:

```js
let buff0, buff1;

renderer.compute( kernel, [ 1000, 1, 1 ] );
buff0 = await renderer.getArrayBufferAsync( attr );

renderer.compute( kernel, [ 1000, 1, 1 ] );
buff1 = await renderer.getArrayBufferAsync( attr );
```

And this code will throw an error whereas it does not here nor in r183:

```js
let pr0, pr1;

renderer.compute( kernel, [ 1000, 1, 1 ] );
pr0 = renderer.getArrayBufferAsync( attr );

renderer.compute( kernel, [ 1000, 1, 1 ] );
pr1 = renderer.getArrayBufferAsync( attr );

const [ buff0, buff1 ] = await Promise.all( [ pr0, pr1 ] );
```

Not having to worry about managing these buffers and race conditions is a benefit. I will frequently write code quickly using less efficient mechanisms specifically so I can get something working, then switch to a more optimal system (like ReadbackBuffer in this case), so I plan to continue using this simple path in dev. This is even before we get to the fact that there is nearly no benefit and a ton of downsides.
Bottom line is this hasn't actually solved a problem anyone has raised. If we're going to make such dramatic changes to an existing API, can we at least wait for someone to actually complain about the code path first? Again, this isn't anything I wanted, and I raised the issue. The ReadbackBuffer otherwise enables the behavior I'm looking for.
It absolutely does. The buffer attached to "ReadBuffer" is the buffer that has been mapped and is the buffer that will be unmapped. This means the given buffer can be set to null or neutered at any time. So passing the "ReadBuffer" instance to other parts of an application means they can attach "release" or "dispose" events to it to determine exactly when that buffer will have been invalidated or changed. It also serves as a flag indicating whether the buffer is currently mapped or not denoting that it's "in use".
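The lifecycle described above can be sketched with a simplified stand-in class (this is an illustrative sketch, not the actual three.js `ReadbackBuffer` implementation; the `inUse` getter is an assumed convenience, not part of the proposal):

```javascript
// Simplified stand-in for the proposed ReadbackBuffer lifecycle: the
// "buffer" field holds the mapped ArrayBuffer while a readback is live
// and is nulled out on release/dispose, so it doubles as an "in use" flag.
class ReadbackBufferSketch {

	constructor( size ) {

		this.size = size;
		this.buffer = null; // set by the renderer when the GPU buffer is mapped

	}

	// hypothetical convenience: other parts of an app can check whether
	// the mapped data is still valid before reading from it
	get inUse() {

		return this.buffer !== null;

	}

	release() {

		// listeners attached to release/dispose would be notified here
		this.buffer = null;

	}

	dispose() {

		this.release();

	}

}

const rb = new ReadbackBufferSketch( 1024 );
rb.buffer = new ArrayBuffer( 1024 ); // what the renderer would do on map
console.log( rb.inUse ); // true
rb.release();
console.log( rb.inUse ); // false
```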
We can discuss rearranging the arguments, but across the project the "target" object is consistently the final object in the argument list in math functions etc., and is returned from the function. I'm not sure why this pattern should be different here. And if the function is going to work without passing a "readbackBuffer" as an option, then my proposed order still allows you to read back a smaller buffer w/ count and offset.
Could you make a simple fiddle, as you usually do, for your use case based on this PR?
I think we should avoid cloning ArrayBuffer in the core; the user can easily do this using slice after obtaining it if needed. It is more common for the user to manipulate the ArrayBuffer in one frame and expect the changes in another; if they need something more specific like this, they can still do it:

```js
let buff0, buff1;

renderer.compute( kernel, [ 1000, 1, 1 ] );
buff0 = ( await renderer.getArrayBufferAsync( attr ) ).slice( 0 );

renderer.compute( kernel, [ 1000, 1, 1 ] );
buff1 = ( await renderer.getArrayBufferAsync( attr ) ).slice( 0 );
```
Sorry if I wasn't able to be that specific, but this problem could be solved in the way below using the changes made in #33315. If the user wants to obtain multiple fractions of the same GPU buffer in parallel, they can use:

```js
async function getArrayBuffer() {

	const buffer = new THREE.ReadbackBuffer( waveArray.value.array.byteLength );
	const array = ( await renderer.getArrayBufferAsync( waveArray.value, buffer ) ).slice( 0 );
	buffer.dispose();

	return array;

}

let c1 = getArrayBuffer();
let c2 = getArrayBuffer();

const [ waveBuffer1, waveBuffer2 ] = await Promise.all( [ c1, c2 ] );
const wave = new Float32Array( waveBuffer2 );
```
We have gone through this kind of API design issue before. This API expects too much from the developer and would produce hard-to-find bugs if used incorrectly. Instead, we always try to make things work even if it's not optimal. An example of this is:

three.js/src/core/Raycaster.js, Lines 214 to 226 in f9152b7

Ideally the developer would pass an array to reuse, but if we tried to reuse an internal array for it, it would most likely result in a really bad experience for the developer.
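The Raycaster pattern referenced above can be sketched like this (a self-contained paraphrase of the optional-target idiom, not the actual Raycaster source; the placeholder intersection result is purely illustrative):

```javascript
// Paraphrase of the optional-target pattern used by Raycaster.intersectObject:
// if the caller supplies a target array it is reused, otherwise a fresh one is
// allocated, so the call "just works" even when used non-optimally.
function intersectObject( object, recursive = true, intersects = [] ) {

	// ...real ray/object intersection tests would push results here...
	intersects.push( { object: object, distance: 0 } ); // placeholder result

	intersects.sort( ( a, b ) => a.distance - b.distance );

	return intersects;

}

// Works without a target (allocates per call)...
const hits = intersectObject( { name: 'mesh' } );

// ...or with a reused target array for hot paths.
const pool = [];
intersectObject( { name: 'mesh' }, true, pool );
```

The design choice being argued for: the optimal path (reusing a target) is available but optional, and the naive path never hands the developer an internally recycled object that can mutate behind their back.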
Let's try to find an API design that works for @gkjohnson 🙏
@mrdoob It makes sense: even if we cloned the CPU buffer, we would still have the GPU buffer, but does it really make sense to recreate GPU buffers every time we use the function? If we were to use it the way it was, or is here, I really wouldn't recommend it. What I want most is to be able to resolve the issue that @gkjohnson brought up with great suggestions, but I would also like to have an easy-to-use API that balances performance and simplicity.
In that case, it is also preferable to change the function signature and deprecate the previous usage, warning the user and providing an internal fallback.

```js
await renderer.getArrayBufferAsync( attribute, readbackBuffer, offset, size )
```
Making it mandatory seems like a safer dev experience. It could even be like this:

```js
await renderer.getArrayBufferAsync( readbackBuffer, attribute, offset, size )
```
Seems like this is the main issue... Have you measured this? Maybe it's a premature optimization?
By the way, I'm putting the release on hold until this issue/PR is resolved. |
I think projects like Tetrament, for example, will benefit if we solve this issue; this would be just one of the possible use cases that would use this in the update cycle. I think that if we update the order of the parameters now, we can avoid patches later too. I will make these changes in my alternative PR; this should help with the decisions as well.
@sunag Perhaps the question was unclear. In #33322 (comment) you say that recreating a GPU buffer for copying on every function call is so performance-intensive that we should prevent users from using the function in such a way:
How have you tested the performance of creating a new buffer? How have you determined that it is so intensive? |
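One way to produce such numbers is a microbenchmark harness like the sketch below. This is a generic illustration, not code from the PR; in a real measurement the timed callback would be the WebGPU buffer creation, which requires a live `GPUDevice`, so a plain allocation stands in for it here:

```javascript
// Generic microbenchmark sketch. In an actual WebGPU test the timed
// callback would be something like:
//   device.createBuffer( { size, usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.MAP_READ } )
// followed by buffer.destroy() outside the timed region.
function median( values ) {

	const sorted = [ ...values ].sort( ( a, b ) => a - b );
	return sorted[ Math.floor( sorted.length / 2 ) ];

}

function benchmark( fn, runs = 100 ) {

	const samples = [];
	for ( let i = 0; i < runs; i ++ ) {

		const start = performance.now();
		fn();
		samples.push( performance.now() - start );

	}

	// median is more robust than mean against GC pauses and warmup spikes
	return median( samples );

}

// Placeholder workload standing in for GPU buffer creation:
const ms = benchmark( () => new ArrayBuffer( 4 * 1024 ) );
console.log( `median: ${ ms.toFixed( 4 ) } ms` );
```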
I created a fiddle; you can control it in 1k floats because bytes are usually irrelevant. WebGL seems to be 2.5× slower than WebGPU. https://jsfiddle.net/eLbr76dx/ I understand it may be a premature optimization, or maybe we are just ignoring a micro-optimization, depending on the use case and point of view. In any case, I would like @gkjohnson @mrdoob to consider the analysis of the alternative. I updated #33315; it should resolve this issue about caching the buffer and alert the user. It was what I managed to improve by collecting the feedback from here.
Here are concrete numbers, which are necessary to have a discussion around this. And this is the fiddle. The WebGL path does not require an intermediate buffer to copy data, so the timing in that case is irrelevant.

So to be clear: we are discussing thousandths of a millisecond as an "optimization". And in order to "optimize" this away we're discussing retaining GPU memory that the user has no control over, silently doubling the footprint of the storage buffer until its disposal, as well as reducing the simple usability of this code path. I'm having a hard time seeing how this is a benefit. Can someone please explain this? It seems closer to a memory leak for any user more than it is an optimization. And again, I am asking for this code path because I would like to use it during development and prototyping, and I also know I will be bitten by this memory thing in the future, as will other users.

Here is my proposal: I have adjusted this PR to use the suggested signature (attribute, target, offset, count) because I agree it's an improvement. This also places the "target" object in the second field so we can make it mandatory at a later time if we feel the need. But generally I would like to wait to do anything around this GPU-side buffer creation until it's actually proven to be a problem, which it has not yet been. Creating and disposing the buffer allows for a simple, easy-to-use path that will work for any use case without memory accumulation, and the ReadbackBuffer serves well for optimized paths. If this is not acceptable then I will adjust the API so providing a ReadbackBuffer is required.

Regarding #33315, it still does not address the core goals of the original issue. I have run out of energy for repeating the goals of the readback buffer at this point, unfortunately, and I'm sorry for that. I have tested this PR with the use case that spurred this need (a more complex version of this demo) and have verified that it works, reduces the memory usage, and improves the performance of the task (16 sec to 14 sec, or 2 sec / ~14% improvement) when ReadbackBuffer is used.
Fix #33281
Fix #33282
This PR fixes a number of issues introduced in #33300, adjusts support for "ReadbackBuffer" (detailed below), and adds support for passing "offset" and "count" to the function for mapping and reading back a portion of the buffer attribute.
Signature

This is the new call signature. The function optionally takes a target ArrayBuffer to write into, or a ReadbackBuffer to copy the data into and map. If no "target" is provided then a new ArrayBuffer is constructed and returned, as is done in r183, meaning this function is backwards compatible. Both "count" and "offset" are in bytes. And the "target" object is the one returned from the function, aligning with the patterns used for functions that take a "target" object across the project.

The ReadbackBuffer shape has changed like so: it represents an intermediate buffer of "maxByteLength" in size on the GPU for copying data into and mapping to the CPU. The "buffer" field is set to the mapped WebGPU-API-provided buffer and is subsequently set to "null" when releasing or disposing the readback buffer instance.
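Based on the (attribute, target, offset, count) parameter order described in this thread, the signature presumably looks like the sketch below; the default values shown are assumptions, not confirmed from the source:

```
getArrayBufferAsync( attribute, target = null, offset = 0, count = null ) : Promise<ArrayBuffer>
```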
Behavior Changes

- BufferAttribute.array is no longer read in either the WebGPU or WebGL codepaths, so users can continue to truncate the CPU-side arrays after upload, as designed.
- The renderer no longer retains internal ReadbackBuffers when the user passes only a BufferAttribute. This was creating a lot of unnecessary retained GPU memory, created confusing overwrite behavior when running multiple interleaved kernel operations and readbacks, and caused an error to throw when trying to issue a subsequent readback before the first one finished.

Testing
Running this code snippet with WebGPURenderer demonstrates the code working as expected and the ability to run multiple readbacks in parallel. This works with both "forceWebGL" set to true and false.
Tested with basic integration on a small buffer in this three-edge-projection demo which runs multiple compute kernels and readbacks in parallel to perform hidden edge removal.