Skip to content

Sync upstream 20260108#13

Open
lixmal wants to merge 19 commits intomasterfrom
sync-upstream-20260108
Open

Sync upstream 20260108#13
lixmal wants to merge 19 commits intomasterfrom
sync-upstream-20260108

Conversation

@lixmal
Copy link
Copy Markdown

@lixmal lixmal commented Jan 8, 2026

No description provided.

jwhited and others added 19 commits May 4, 2025 18:11
The sync.Locker used with a sync.Cond must be acquired when changing
the associated condition, otherwise there is a window within
sync.Cond.Wait() where a wake-up may be missed.

Fixes: 4846070 ("device: use a waiting sync.Pool instead of a channel")
Reviewed-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
…itPool

Fixes: 3bb8fec ("conn, device, tun: implement vectorized I/O plumbing")
Reviewed-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Signed-off-by: Jordan Whited <jordan@tailscale.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Since 3c75945fd ("netstack: remove PacketBuffer.IsNil()") this has been
invalid. Follow the replacement pattern of that commit.

The old definition inlined to the same code anyway:

 func (pk *PacketBuffer) IsNil() bool {
 	return pk == nil
 }

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Colin's commit went a step further and protected tun.incomingPacket with
a lock on shutdown, but let's see if the tun.stack.Close() call actually
solves that on its own.

Suggested-by: kshangx <hikeshang@hotmail.com>
Suggested-by: Colin Adler <colin1adler@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Use parallel summation with native byte order per RFC 1071.
add-with-carry operation is used to add 4 words per operation.  Byteswap
is performed before and after checksumming for compatibility with old
`checksumNoFold()`.  With this we get a 30-80% speedup in `checksum()`
depending on packet sizes.

Add unit tests with comparison to a per-word implementation.

**Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz**

| Size | OldTime | NewTime | Speedup  |
|------|---------|---------|----------|
| 64   | 12.64   | 9.183   | 1.376456 |
| 128  | 18.52   | 12.72   | 1.455975 |
| 256  | 31.01   | 18.13   | 1.710425 |
| 512  | 54.46   | 29.03   | 1.87599  |
| 1024 | 102     | 52.2    | 1.954023 |
| 1500 | 146.8   | 81.36   | 1.804326 |
| 2048 | 196.9   | 102.5   | 1.920976 |
| 4096 | 389.8   | 200.8   | 1.941235 |
| 8192 | 767.3   | 413.3   | 1.856521 |
| 9000 | 851.7   | 448.8   | 1.897727 |
| 9001 | 854.8   | 451.9   | 1.891569 |

**AMD EPYC 7352 24-Core Processor**

| Size | OldTime | NewTime | Speedup  |
|------|---------|---------|----------|
| 64   | 9.159   | 6.949   | 1.318031 |
| 128  | 13.59   | 10.59   | 1.283286 |
| 256  | 22.37   | 14.91   | 1.500335 |
| 512  | 41.42   | 24.22   | 1.710157 |
| 1024 | 81.59   | 45.05   | 1.811099 |
| 1500 | 120.4   | 68.35   | 1.761522 |
| 2048 | 162.8   | 90.14   | 1.806079 |
| 4096 | 321.4   | 180.3   | 1.782585 |
| 8192 | 650.4   | 360.8   | 1.802661 |
| 9000 | 706.3   | 398.1   | 1.774177 |
| 9001 | 712.4   | 398.2   | 1.789051 |

Signed-off-by: Tu Dinh Ngoc <dinhngoc.tu@irit.fr>
[Jason: simplified and cleaned up unit tests]
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: ruokeqx <ruokeqx@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
It should be POLLIN because closeFd is read-only file.

Signed-off-by: Kurnia D Win <kurnia.d.win@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reduce allocations by eliminating byte reader, hand-rolled decoding and
reusing message structs.

Synthetic benchmark:

    var msgSink MessageInitiation
    func BenchmarkMessageInitiationUnmarshal(b *testing.B) {
        packet := make([]byte, MessageInitiationSize)
        reader := bytes.NewReader(packet)
        err := binary.Read(reader, binary.LittleEndian, &msgSink)
        if err != nil {
            b.Fatal(err)
        }
        b.Run("binary.Read", func(b *testing.B) {
            b.ReportAllocs()
            for range b.N {
                reader := bytes.NewReader(packet)
                _ = binary.Read(reader, binary.LittleEndian, &msgSink)
            }
        })
        b.Run("unmarshal", func(b *testing.B) {
            b.ReportAllocs()
            for range b.N {
                _ = msgSink.unmarshal(packet)
            }
        })
    }

Results:
                                         │      -      │
                                         │   sec/op    │
MessageInitiationUnmarshal/binary.Read-8   1.508µ ± 2%
MessageInitiationUnmarshal/unmarshal-8     12.66n ± 2%

                                         │      -       │
                                         │     B/op     │
MessageInitiationUnmarshal/binary.Read-8   208.0 ± 0%
MessageInitiationUnmarshal/unmarshal-8     0.000 ± 0%

                                         │      -       │
                                         │  allocs/op   │
MessageInitiationUnmarshal/binary.Read-8   2.000 ± 0%
MessageInitiationUnmarshal/unmarshal-8     0.000 ± 0%

Signed-off-by: Alexander Yastrebov <yastrebov.alex@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
This is already enforced in receive.go, but if these unmarshallers are
to have error return values anyway, make them as explicit as possible.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
This pairs with the recent change in wireguard-tools.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Optimize message encoding by eliminating binary.Write (which internally
uses reflection) in favour of hand-rolled encoding.

This is companion to 9e7529c.

Synthetic benchmark:

    var packetSink []byte
    func BenchmarkMessageInitiationMarshal(b *testing.B) {
        var msg MessageInitiation
        b.Run("binary.Write", func(b *testing.B) {
            b.ReportAllocs()
            for range b.N {
                var buf [MessageInitiationSize]byte
                writer := bytes.NewBuffer(buf[:0])
                _ = binary.Write(writer, binary.LittleEndian, msg)
                packetSink = writer.Bytes()
            }
        })
        b.Run("binary.Encode", func(b *testing.B) {
            b.ReportAllocs()
            for range b.N {
                packet := make([]byte, MessageInitiationSize)
                _, _ = binary.Encode(packet, binary.LittleEndian, msg)
                packetSink = packet
            }
        })
        b.Run("marshal", func(b *testing.B) {
            b.ReportAllocs()
            for range b.N {
                packet := make([]byte, MessageInitiationSize)
                _ = msg.marshal(packet)
                packetSink = packet
            }
        })
    }

Results:
                                             │      -      │
                                             │   sec/op    │
    MessageInitiationMarshal/binary.Write-8    1.337µ ± 0%
    MessageInitiationMarshal/binary.Encode-8   1.242µ ± 0%
    MessageInitiationMarshal/marshal-8         53.05n ± 1%

                                             │     -      │
                                             │    B/op    │
    MessageInitiationMarshal/binary.Write-8    368.0 ± 0%
    MessageInitiationMarshal/binary.Encode-8   160.0 ± 0%
    MessageInitiationMarshal/marshal-8         160.0 ± 0%

                                             │     -      │
                                             │ allocs/op  │
    MessageInitiationMarshal/binary.Write-8    3.000 ± 0%
    MessageInitiationMarshal/binary.Encode-8   1.000 ± 0%
    MessageInitiationMarshal/marshal-8         1.000 ± 0%

Signed-off-by: Alexander Yastrebov <yastrebov.alex@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Kernels below 5.12 are missing this:

    commit 98184612aca0a9ee42b8eb0262a49900ee9eef0d
    Author: Norman Maurer <norman_maurer@apple.com>
    Date:   Thu Apr 1 08:59:17 2021

        net: udp: Add support for getsockopt(..., ..., UDP_GRO, ..., ...);

        Support for UDP_GRO was added in the past but the implementation for
        getsockopt was missed which did lead to an error when we tried to
        retrieve the setting for UDP_GRO. This patch adds the missing switch
        case for UDP_GRO

        Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.")
        Signed-off-by: Norman Maurer <norman_maurer@apple.com>
        Reviewed-by: David Ahern <dsahern@kernel.org>
        Signed-off-by: David S. Miller <davem@davemloft.net>

That means we can't set the option and then read it back later. Given
how buggy UDP_GRO is in general on odd kernels, just disable it on older
kernels all together.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants