1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
/.idea
__pycache__
/corpus
/venv
139 changes: 139 additions & 0 deletions spec/eofv0_verkle.md
@@ -105,6 +105,145 @@ The same as above except encode the values as 6-bit numbers
(minimum number of bits needed for encoding `32`).
Such encoding lowers the size overhead from 3.1% to 2.3%.

### Encode only invalid jumpdests (dense encoding)

An alternative is to encode only the invalid `JUMPDEST` locations instead of all the valid ones.
By invalid `JUMPDEST` we mean a `0x5b` byte in any pushdata.

This is beneficial if our assumption is correct that most contracts contain only a limited number
of offending cases. Our initial analysis of the 1000 most-used bytecodes suggests this is the case:
only 0.07% of bytecode bytes are invalid jumpdests.
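
As a minimal sketch of how such offending bytes could be located (the function name `find_invalid_jumpdests` and the sample bytecode are illustrative assumptions, not part of the spec), a single linear pass over the code that tracks `PUSH1`..`PUSH32` data is enough:

```python
JUMPDEST = 0x5B
PUSH1, PUSH32 = 0x60, 0x7F


def find_invalid_jumpdests(code: bytes) -> list[int]:
    """Return the offsets of 0x5b bytes that lie inside PUSH data."""
    invalid = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        pc += 1
        if PUSH1 <= op <= PUSH32:
            data_len = op - PUSH1 + 1
            # Tolerate push data truncated by the end of the code.
            data_end = min(pc + data_len, len(code))
            invalid.extend(i for i in range(pc, data_end) if code[i] == JUMPDEST)
            pc += data_len
    return invalid


# Hypothetical sample: JUMPDEST, PUSH1 0x5b, PUSH2 0x5b5b, STOP
sample = bytes.fromhex("5b605b615b5b00")
offsets = find_invalid_jumpdests(sample)
# Share of bytecode bytes that are invalid jumpdests in this sample.
share = len(offsets) / len(sample)
```

The `JUMPDEST` at offset 0 of the sample is a real instruction and is therefore not reported; only the `0x5b` bytes inside push data are.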

Let's create a map of `invalid_jumpdests[chunk_index] = first_instruction_offset`. We can densely encode this
map using techniques similar to *run-length encoding* to skip distances and delta-encode indexes.
This map is always fully loaded prior to execution, and so it is important to ensure the encoded

Note to self: see how much of those costs could be covered by the 21000 gas.

version is as dense as possible (without sacrificing simplicity).
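
A possible way to build the `invalid_jumpdests` map is sketched below. The helper name `build_invalid_jumpdests` is hypothetical, and the sketch assumes that a chunk consisting entirely of push data gets `first_instruction_offset = 32`, which would explain why the offset needs 33 distinct values:

```python
CHUNK_SIZE = 32
JUMPDEST = 0x5B
PUSH1, PUSH32 = 0x60, 0x7F


def build_invalid_jumpdests(code: bytes) -> dict[int, int]:
    """Map chunk_index -> first_instruction_offset for chunks holding a 0x5b in push data."""
    first_instruction_offset = {}  # chunk index -> offset of first instruction start
    flagged = set()  # chunks containing at least one invalid jumpdest
    pc = 0
    while pc < len(code):
        # Instructions are visited in order, so the first write per chunk wins.
        first_instruction_offset.setdefault(pc // CHUNK_SIZE, pc % CHUNK_SIZE)
        op = code[pc]
        next_pc = pc + 1
        if PUSH1 <= op <= PUSH32:
            next_pc += op - PUSH1 + 1
            for i in range(pc + 1, min(next_pc, len(code))):
                if code[i] == JUMPDEST:
                    flagged.add(i // CHUNK_SIZE)
        pc = next_pc
    # Assumption: a chunk with no instruction start is recorded as offset 32.
    return {c: first_instruction_offset.get(c, CHUNK_SIZE) for c in sorted(flagged)}


# Hypothetical sample: PUSH2 placed at the last byte of chunk 0, data spills into chunk 1.
sample = bytes([0x00] * 31 + [0x61, 0x5B, 0x00, 0x00])
mapping = build_invalid_jumpdests(sample)
```

Only flagged chunks appear in the map, so a contract without any `0x5b` in push data produces an empty map.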

We propose the encoding using fixed-size 8-bit elements.
For each entry in `invalid_jumpdests`:
- 1-bit mode (`skip`, `value`)
- For skip-mode:
  - 7-bit number of chunks to skip
- For value-mode:
  - 7-bit number combining number of chunks to skip `s` and `first_instruction_offset`
Member Author

How?

Member

next line?

produced as `s * 33 + first_instruction_offset`
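
The element layout above can be sketched as an encoder/decoder pair. Several details here are assumptions rather than part of the proposal: the mode bit is taken to be the most significant bit (set for `value`), chunk indexes are delta-encoded relative to the chunk following the previous entry, and a `skip` element is emitted whenever `s * 33 + first_instruction_offset` would not fit in 7 bits.

```python
VALUE_FLAG = 0x80   # assumed: top bit set means value-mode
MAX_PAYLOAD = 0x7F  # 7-bit payload
RADIX = 33          # first_instruction_offset ranges over 0..32


def encode(invalid_jumpdests: dict[int, int]) -> bytes:
    out = bytearray()
    next_chunk = 0  # first chunk not yet covered by the encoding
    for chunk, fio in sorted(invalid_jumpdests.items()):
        assert 0 <= fio < RADIX
        delta = chunk - next_chunk
        # Largest s for which s * 33 + fio still fits in 7 bits.
        max_s = (MAX_PAYLOAD - fio) // RADIX
        while delta > max_s:
            step = min(delta, MAX_PAYLOAD)
            out.append(step)  # skip-mode element
            delta -= step
        out.append(VALUE_FLAG | (delta * RADIX + fio))
        next_chunk = chunk + 1
    return bytes(out)


def decode(data: bytes) -> dict[int, int]:
    result = {}
    chunk = 0
    for element in data:
        payload = element & MAX_PAYLOAD
        if element & VALUE_FLAG:
            s, fio = divmod(payload, RADIX)
            chunk += s
            result[chunk] = fio
            chunk += 1  # the next entry is relative to the following chunk
        else:
            chunk += payload
    return result


# Hypothetical map: chunk index -> first_instruction_offset
example = {2: 21, 3: 20, 4: 7, 6: 24, 41: 22}
encoded = encode(example)
```

Under these assumptions, a map with an entry for every chunk encodes to exactly one byte per chunk, since each entry is a value-mode element with `s = 0`.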

For the worst case, where each chunk contains an invalid `JUMPDEST`, the encoding length in bytes is equal
to the number of chunks in the code, i.e. the size overhead is 3.1% (one byte per 32-byte chunk).

| code size limit (bytes) | code chunks | encoding chunks (worst case) |
|-------------------------|-------------|------------------------------|
| 24576 | 768 | 24 |
| 32768 | 1024 | 32 |
| 65536 | 2048 | 64 |
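
The table rows follow from simple arithmetic, sketched here under the table's own assumption of 32-byte chunks and one 8-bit element per code chunk in the worst case (the function name `worst_case` is illustrative):

```python
CHUNK_SIZE = 32


def worst_case(code_size_limit: int) -> tuple[int, int]:
    """Return (code chunks, encoding chunks) for the worst case."""
    code_chunks = code_size_limit // CHUNK_SIZE
    encoding_bytes = code_chunks  # one 8-bit element per code chunk
    encoding_chunks = -(-encoding_bytes // CHUNK_SIZE)  # ceiling division
    return code_chunks, encoding_chunks


rows = {limit: worst_case(limit) for limit in (24576, 32768, 65536)}
```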

Our current hunch is that for average contracts this results in a sub-1% overhead, while the worst case is 3.1%.
This is strictly better than the 3.2% overhead of the current Verkle code chunking.

#### Header location

It is possible to place the encoding above in the "EOFv0" header, but given that the upper bound on the number of chunks it occupies is low (33 vs 21),
it is also possible to make it part of the Verkle account header.

Yeah, but if we want to increase the maximum code size to 64k, there won't be enough space left for it in the header.

Member Author

With scheme 1 it is still 56 verkle leafs for 64k code in worst case. That should still easily fit into the 128 "special" first header leafs.

Member

I think we definitely need a variadic length of this section because the average case (1–2 chunks) is much different from the worst case (20–30 chunks). I.e. you don't want to reserve ~60 chunks in the tree just to use 2 on average.


This second option allows for the simplification of the `code_size` value, as it does not need to change.

By "second option", you mean "adding it to the account header", not "Scheme 2", right ?

I don't see why there would be a difference with the other case though : in both cases, one needs to use the code size to skip the header.

Member Author

> By "second option", you mean "adding it to the account header", not "Scheme 2", right ?

Yes.

> I don't see why there would be a difference with the other case though : in both cases, one needs to use the code size to skip the header.

No because I'd imagine the account header (i.e. not code leafs/keys) would be handled separately, so the actual EVM code remains verbatim.


#### Runtime after Verkle

During execution of a jump two checks must be done in this order:

1. Check if the jump destination is the `JUMPDEST` opcode.
2. Check if the jump destination chunk is in the `invalid_jumpdests` map.
   If yes, the jumpdest analysis of the chunk must be performed
   to confirm the jump destination is not push data.
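
The two checks above might look roughly like the following sketch. The function name `is_valid_jump` is hypothetical, and the per-chunk analysis is assumed to start at the chunk's `first_instruction_offset` and walk instructions until it reaches or passes the destination, so only bytes of the destination chunk are read:

```python
CHUNK_SIZE = 32
JUMPDEST = 0x5B
PUSH1, PUSH32 = 0x60, 0x7F


def is_valid_jump(code: bytes, dest: int, invalid_jumpdests: dict[int, int]) -> bool:
    # Check 1: the destination byte must be the JUMPDEST opcode.
    if dest >= len(code) or code[dest] != JUMPDEST:
        return False
    # Check 2: only chunks listed in the map need the per-chunk analysis.
    chunk = dest // CHUNK_SIZE
    if chunk not in invalid_jumpdests:
        return True
    pc = chunk * CHUNK_SIZE + invalid_jumpdests[chunk]
    while pc < dest:
        op = code[pc]
        pc += 1
        if PUSH1 <= op <= PUSH32:
            pc += op - PUSH1 + 1  # jump over the push data
    # Landing exactly on dest means it is an instruction boundary, not push data.
    return pc == dest


# Hypothetical sample: JUMPDEST, PUSH1 0x5b, JUMPDEST, STOP
sample = bytes.fromhex("5b605b5b00")
sample_map = {0: 0}
```

In this sample, offsets 0 and 3 are accepted, while offset 2 is rejected because the walk steps over it as push data.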

It is possible to reconstruct sparse account code prior to execution with all the submitted chunks of the transaction
and perform `JUMPDEST`-validation to build up a relevant *valid `JUMPDEST` locations* map instead.

#### Analysis

We have analyzed two contracts, Arbitrum validator and Uniswap router.

Arbitrum (2147 bytes long):
```
(code offset, chunk number, offset in chunk)
malicious push byte: 85 2 21
malicious push byte: 95 2 31
malicious push byte: 116 3 20
malicious push byte: 135 4 7
malicious push byte: 216 6 24
malicious push byte: 1334 41 22
```

Member

This analysis is wrong because we have to encode the first instruction offset instead of the first invalid jumpdest offset. I think we should remove this section, or at least mark it as incorrect, until I come up with a proper analysis.

Encoding with *scheme 1*:
```
[skip, 2]
[value, 21]
[value, 31]
[skip, 1]
[value, 20]
[skip, 1]
[value, 7]
[skip, 2]
[value, 24]
[skip, 35]
[value, 22]
```
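
To check the listing above, here is a small decoder sketch. The semantics are inferred from the example rather than stated in the text: a `skip` entry is assumed to advance the current chunk index, and a `value` entry is assumed to record an offset within the current chunk. Note that the values here are offsets of the offending bytes, as in the analysis listing, not first instruction offsets.

```python
CHUNK_SIZE = 32

# The scheme 1 entries for the Arbitrum contract, copied from the listing.
entries = [
    ("skip", 2), ("value", 21), ("value", 31),
    ("skip", 1), ("value", 20),
    ("skip", 1), ("value", 7),
    ("skip", 2), ("value", 24),
    ("skip", 35), ("value", 22),
]


def decode_scheme1(items):
    """Return (chunk number, offset in chunk) pairs under the assumed semantics."""
    chunk = 0
    decoded = []
    for mode, number in items:
        if mode == "skip":
            chunk += number
        else:
            decoded.append((chunk, number))
    return decoded


pairs = decode_scheme1(entries)
code_offsets = [chunk * CHUNK_SIZE + offset for chunk, offset in pairs]
```

Under these assumptions the decoded code offsets are 85, 95, 116, 135, 216 and 1334, matching the analysis output for this contract.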

Encoding size: `5 skips (5 * 11 bits) + 6 values (6 * 7 bits)` = 13-bytes header (0.605%)

Encoding with *scheme 2*:
```
[skip, 2]
[value, 0, 21]
[value, 0, 31]
[value, 1, 20]
[value, 1, 7]
[value, 2, 24]
[skip, 35, 22]
```

Encoding size: `2 skips (2 * 11 bits) + 5 values (5 * 11 bits)` = 10-bytes header (0.465%)

Uniswap router contract (17958 bytes):

```
(code offset, chunk number, offset in chunk)
malicious push byte: 1646 51 14
malicious push byte: 1989 62 5
malicious push byte: 4239 132 15
malicious push byte: 4533 141 21
malicious push byte: 7043 220 3
malicious push byte: 8036 251 4
malicious push byte: 8604 268 28
malicious push byte: 12345 385 25
malicious push byte: 15761 492 17
```

Encoding using *scheme 2*:
```
[skip, 51]
[value, 0, 14]
[value, 11, 5]
[skip, 70]
[value, 0, 15]
[value, 9, 21]
[skip, 79]
[value, 0, 3]
[skip, 31]
[value, 0, 4]
[skip, 17]
[value, 0, 28]
[skip, 117]
[value, 0, 25]
[skip, 107]
[value, 0, 17]
```
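
The scheme 2 listing can be checked with a similar sketch. The semantics are again inferred from the example: a `value` entry is assumed to carry a chunk delta followed by an offset within the resulting chunk, and a `skip` entry is assumed to advance the chunk index only.

```python
CHUNK_SIZE = 32

# The scheme 2 entries for the Uniswap router, copied from the listing.
entries = [
    ("skip", 51), ("value", 0, 14), ("value", 11, 5),
    ("skip", 70), ("value", 0, 15), ("value", 9, 21),
    ("skip", 79), ("value", 0, 3),
    ("skip", 31), ("value", 0, 4),
    ("skip", 17), ("value", 0, 28),
    ("skip", 117), ("value", 0, 25),
    ("skip", 107), ("value", 0, 17),
]


def decode_scheme2(items):
    """Return the code offsets described by the entries under the assumed semantics."""
    chunk = 0
    offsets = []
    for entry in items:
        if entry[0] == "skip":
            chunk += entry[1]
        else:
            _, delta, offset = entry
            chunk += delta
            offsets.append(chunk * CHUNK_SIZE + offset)
    return offsets


code_offsets = decode_scheme2(entries)
```

With this reading, the decoded offsets reproduce the nine code offsets listed in the analysis output for this contract.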

Encoding size: `7 skips (7 * 11 bits) + 9 values (9 * 11 bits)` = 22-bytes header (0.122%)
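
The header sizes quoted in this analysis can be rechecked with a short calculation. The bit widths are taken from the text (11-bit skips, 7-bit values in scheme 1, 11-bit values in scheme 2); the helper name `header_size` is illustrative:

```python
import math


def header_size(total_bits: int, code_length: int) -> tuple[int, float]:
    """Return (header size in bytes, overhead in percent of the code length)."""
    size = math.ceil(total_bits / 8)
    return size, 100 * size / code_length


arbitrum_scheme1 = header_size(5 * 11 + 6 * 7, 2147)
arbitrum_scheme2 = header_size(2 * 11 + 5 * 11, 2147)
uniswap_scheme2 = header_size(7 * 11 + 9 * 11, 17958)
```

This gives 13, 10 and 22 bytes respectively, in line with the figures stated for the two contracts.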

Our current hunch is that for average contracts this results in a sub-1% overhead, while the worst case is 4.1%.

That's good results, although I would like to see a full analysis, including of contracts that are close to the 24kb limit. And, ideally, of contracts with 64kb code size.

Member

Note to myself: we will make a table with worst case values for code size limits of 24k, 32k and 64k.

This compares against the constant 3.2% overhead of the current Verkle code chunking.

## Backwards Compatibility

EOF-packaged code execution is fully compatible with the legacy code execution.