25 commits
bfaa5b4
feat: myers diff
myfreess Mar 5, 2026
b284c05
refactor(diff): remove EditTag and take_tagged_view_from from public API
bobzhang Mar 24, 2026
223b5c2
refactor(diff): redesign API around Diff[T] and Hunk[T]
bobzhang Mar 24, 2026
09b8526
simplify Diff::group implementation
myfreess Mar 25, 2026
1a5715b
add patience sort
myfreess Mar 26, 2026
29fddc9
Diff::new() support patience option
myfreess Mar 26, 2026
14e5ce1
add test for patience diff
myfreess Mar 26, 2026
7b32fdd
update README for patience diff
myfreess Mar 26, 2026
873737b
fix Hunk header display
myfreess Mar 27, 2026
29d5205
refactor(diff): replace patience? : Bool with algorithm? : DiffAlgori…
bobzhang Apr 1, 2026
641b6f7
tweak
bobzhang Apr 1, 2026
539cb8f
docs(diff): document Edit and its range helpers
myfreess Apr 1, 2026
d5b35c6
fix Insert's `old_index`
myfreess Apr 1, 2026
1aae545
Rename edit length field `old_len` && `new_len` to len
myfreess Apr 1, 2026
ed87bd8
remove recursive unique_lcs search in patience
myfreess Apr 2, 2026
3bb9d9e
fix deprecated warning
myfreess Apr 13, 2026
89d06fa
refactor(diff): avoid unnecessary unwrap
myfreess Apr 14, 2026
9b7e2a7
refactor(diff): avoid unnecessary pattern matching
myfreess Apr 14, 2026
5da8ac1
refactor(diff): implementing pile use array
myfreess Apr 15, 2026
2a06be9
refactor(diff): remove unnecessary polymorphism
myfreess Apr 15, 2026
3438045
refactor(diff): remove unnecessary arguments
myfreess Apr 16, 2026
28c2875
refactor(diff): remove use of cstyle forloop and unnecessary `let mut`
myfreess Apr 16, 2026
bdc1921
migrate to @test.assert_eq
myfreess Apr 16, 2026
4d28664
refactor(diff): remove use of high order function
myfreess Apr 17, 2026
02d4821
add internal doc
myfreess Apr 19, 2026
175 changes: 175 additions & 0 deletions diff/INTERNAL.md
@@ -0,0 +1,175 @@
# Implementation and Working Principles of Patience Diff

Patience Diff was first proposed by Bram Cohen. It is essentially a heuristic
text-partitioning strategy that can be combined with other diff algorithms. Its
core idea is as follows:

- For the text blocks `old` and `new`, first count the occurrences of each
line, then use the lines whose contents appear exactly once in `old` and
exactly once in `new` as candidate anchors.

Suppose `old` looks like this:

```
1| Ruby: ruby-lang.org
2| #
3| Python: python.org
4| #
5| MoonBit: www.moonbitlang.com
6| #
7| Perl: use.perl.org
```

In this example, within `old`, `#` appears more than once, so it is not unique
on the `old` side. The other four lines are unique in `old`, and they can
serve as anchors only if they also appear exactly once in `new`.

- Then, among the lines that are unique within their own side, keep only those
that appear in both blocks. Together they form a candidate anchor sequence.

Suppose `new` looks like this:

```
1| Python: python.org
2| #
3| MoonBit: www.moonbitlang.com
4| #
5| Javascript: tc39.es
6| #
7| Ruby: ruby-lang.org
```

Then the candidate anchor sequence is:

```
"Ruby: ruby-lang.org": old index = 1, new index = 7
"Python: python.org": old index = 3, new index = 1
"MoonBit: www.moonbitlang.com": old index = 5, new index = 3
```

The candidate anchor sequence must be sorted by one of its two index columns.
Here we list it from top to bottom in ascending order of `old index`.
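The candidate selection described above can be sketched in a few lines. This is a minimal Python illustration, not the MoonBit code from this change: the function name `candidate_anchors` is hypothetical, and indices are 1-based only to match the listing above.

```python
from collections import Counter


def candidate_anchors(old, new):
    """Return 1-based (old_index, new_index) pairs, ascending in old_index."""
    old_counts = Counter(old)
    new_counts = Counter(new)
    # Position of every line that occurs exactly once in `new`.
    new_pos = {
        line: j for j, line in enumerate(new, start=1) if new_counts[line] == 1
    }
    pairs = []
    for i, line in enumerate(old, start=1):
        # Keep a line only if it is unique on the old side and on the new side.
        if old_counts[line] == 1 and line in new_pos:
            pairs.append((i, new_pos[line]))
    # `old` was scanned in order, so the pairs are already sorted by old index.
    return pairs


old = [
    "Ruby: ruby-lang.org", "#", "Python: python.org", "#",
    "MoonBit: www.moonbitlang.com", "#", "Perl: use.perl.org",
]
new = [
    "Python: python.org", "#", "MoonBit: www.moonbitlang.com", "#",
    "Javascript: tc39.es", "#", "Ruby: ruby-lang.org",
]
print(candidate_anchors(old, new))  # [(1, 7), (3, 1), (5, 3)]
```

`#` is rejected because it repeats, while the Perl and Javascript lines are rejected because each exists on one side only.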

- Next, within this candidate sequence, search for the longest increasing
subsequence by `new index`. Once this is done, the indices on both sides are
in ascending order.

The sequence found from the candidate anchor sequence above is:

```
"Python: python.org": old index = 3, new index = 1
"MoonBit: www.moonbitlang.com": old index = 5, new index = 3
```

- Finally, split the two text blocks according to the anchors, and apply a
basic diff algorithm to the resulting subranges.
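As a rough sketch of this last step, the helper below turns a list of anchors into the subranges that are handed to the basic diff algorithm. It is written in Python for illustration; the name `split_by_anchors` and the 0-based, half-open range convention are assumptions of this sketch rather than details of the implementation.

```python
def split_by_anchors(old_len, new_len, anchors):
    """Return the gaps between anchors as (old_start, old_end, new_start, new_end).

    `anchors` holds 0-based (old_index, new_index) pairs that are increasing on
    both sides. Ranges are half-open, and gaps that are empty on both sides are
    skipped.
    """
    gaps = []
    old_start, new_start = 0, 0
    for old_idx, new_idx in anchors:
        if old_idx > old_start or new_idx > new_start:
            gaps.append((old_start, old_idx, new_start, new_idx))
        # The anchor itself is an equal line; continue right after it.
        old_start, new_start = old_idx + 1, new_idx + 1
    if old_start < old_len or new_start < new_len:
        gaps.append((old_start, old_len, new_start, new_len))
    return gaps


# Anchors from the example, converted to 0-based: Python (2, 0) and MoonBit (4, 2).
print(split_by_anchors(7, 7, [(2, 0), (4, 2)]))
# [(0, 2, 0, 0), (3, 4, 1, 2), (5, 7, 3, 7)]
```

Each returned tuple is an independent subproblem: the first one, for instance, compares the two old lines before `Python: python.org` with an empty range on the new side.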

There is another approach commonly described online: apply patience again to
each subrange until no suitable anchors can be found. This recursive form was
the first version of Patience Diff that Bram Cohen proposed. In a later blog
post, he argued that in practice a single split was not noticeably worse than
recursive splitting while being simpler, so this document uses the
single-split approach.

The basic principle really is that simple, and in many real code edits the
heuristic works well.

The implementation follows the process above almost step by step. The only
relatively complicated part is finding the longest increasing subsequence,
which uses an algorithm called `Patience Sort` (also the origin of the name
Patience Diff).

## Patience Sort

The name `Patience Sort` is said to come from a solitaire card game called
`Patience`. At the start of the game there is a shuffled deck of cards
(corresponding to an unordered `new index` list). By dealing the cards one by
one into a series of piles on the table according to a few rules, the longest
increasing subsequence can be found.

Below, we use an array containing the numbers 1 through 13 as an example. Each
number appears only once, because every entry stands for the `new index` of a
distinct candidate anchor.

```
5 9 4 6 12 8 7 1 10 11 3 2 13
```

First, take out `5`. Since the table is currently empty, create a new pile to
hold it.

```
9 4 6 12 8 7 1 10 11 3 2 13
-------------------------------------------------


5
```

Next, take out `9`. Since `9` is greater than `5`, it cannot be placed on top
of `5`, so it goes to the right of `5` and forms a new pile. Unlike the first
step, this one records some extra information: the card on top of the pile
immediately to the left of `9` is `5`, so we create a *back pointer* from `9`
to `5`.

```
4 6 12 8 7 1 10 11 3 2 13                      9 -> 5
---------------------------------------------


5 9
```

Next, take out `4`. Since `4` is smaller than `5`, place it directly on top of
`5`. It lands on the leftmost pile, so there is no pile to its left and no back
pointer is recorded.

```
6 12 8 7 1 10 11 3 2 13                    9 -> 5
-----------------------------------------


4
5 9
```

The following steps work in the same way. Take out `6`: `6` is greater than
`4` but smaller than `9`, so place it on top of `9` and record a back pointer
from `6` to `4`, the current top of the pile to its left.

```
12 8 7 1 10 11 3 2 13                  9 -> 5
-------------------------------------  6 -> 4


4 6
5 9
```

By repeating this process, we eventually get the following piles and back
pointer records:

```
9 -> 5
6 -> 4
12 -> 6
8 -> 6
7 -> 6
10 -> 7
11 -> 10
3 -> 1
2 -> 1
13 -> 11

    2
1   3   7
4   6   8
5   9   12  10  11  13
```

Finally, start from `13`, the top card of the rightmost pile, and follow the
back pointers: `13 -> 11 -> 10 -> 7 -> 6 -> 4`. Reverse this sequence to get
one longest increasing subsequence.
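The walkthrough above can be replayed with a short Python sketch. It is only an illustration of the card-dealing procedure, assuming distinct integers as input; the name `patience_lis` is hypothetical, and the linear scan over piles is kept on purpose so the code mirrors the prose.

```python
def patience_lis(cards):
    """Deal `cards` into piles and return (lis, piles, back_pointers)."""
    piles = []  # each pile is a list; its last element is the top card
    back = {}   # card -> top of the pile to its left at the time it was placed
    for card in cards:
        # Leftmost pile whose top card is greater than the new card.
        target = 0
        while target < len(piles) and piles[target][-1] < card:
            target += 1
        if target > 0:
            back[card] = piles[target - 1][-1]
        if target == len(piles):
            piles.append([card])        # start a new pile on the right
        else:
            piles[target].append(card)  # cover the current top card
    # Follow back pointers from the top of the rightmost pile.
    lis = []
    card = piles[-1][-1] if piles else None
    while card is not None:
        lis.append(card)
        card = back.get(card)
    lis.reverse()
    return lis, piles, back


lis, piles, back = patience_lis([5, 9, 4, 6, 12, 8, 7, 1, 10, 11, 3, 2, 13])
print(lis)    # [4, 6, 7, 10, 11, 13]
print(piles)  # [[5, 4, 1], [9, 6, 3, 2], [12, 8, 7], [10], [11], [13]]
```

The length of the result equals the number of piles, because the chain of back pointers visits exactly one card per pile.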

When translating this process into code, one optimization is available: the top
cards of the piles are always in increasing order from left to right, so binary
search can find the target pile for a new card instead of a linear scan. This
matters once the number of piles becomes large.
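A sketch of that optimization, again in Python and under the same assumption of distinct integers: only the top card of each pile is tracked, and the standard-library `bisect_left` locates the pile. The name `lis_with_binary_search` is hypothetical, and back pointers are stored as input positions rather than card values.

```python
from bisect import bisect_left


def lis_with_binary_search(values):
    """Return one longest increasing subsequence of distinct integers."""
    tops = []       # tops[p]: value on top of pile p, increasing left to right
    top_index = []  # top_index[p]: position in `values` of that top card
    prev = [None] * len(values)  # back pointer: position of the predecessor
    for i, v in enumerate(values):
        p = bisect_left(tops, v)  # leftmost pile whose top is >= v
        if p > 0:
            prev[i] = top_index[p - 1]
        if p == len(tops):
            tops.append(v)
            top_index.append(i)
        else:
            tops[p] = v
            top_index[p] = i
    # Walk back from the top of the rightmost pile.
    result = []
    i = top_index[-1] if top_index else None
    while i is not None:
        result.append(values[i])
        i = prev[i]
    return result[::-1]


print(lis_with_binary_search([5, 9, 4, 6, 12, 8, 7, 1, 10, 11, 3, 2, 13]))
# [4, 6, 7, 10, 11, 13]
print(lis_with_binary_search([7, 1, 3]))  # [1, 3]
```

The second call uses the `new index` column of the candidate anchors from the first section and recovers the same two anchors, Python and MoonBit.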
119 changes: 119 additions & 0 deletions diff/README.mbt.md
@@ -0,0 +1,119 @@
# Diff

Compute edit scripts between two sequences using the Myers diff algorithm by
default, or patience diff when you pass `algorithm=@diff.Patience`.

`Diff` works with any element type that implements `Hash + Eq`. Constructing a
`Diff[T]` bundles the source arrays with the edit script. Call `group` on the
result to split far-apart changes into separate `Hunk[T]` values for
unified-diff-style output.

## Compute A Diff

`Diff(old~, new~)` computes the full sequence of `Delete`, `Insert`, and
`Equal` operations, accessible via the `edits` field.

```mbt check
///|
test "Diff computes deletes inserts and equals" {
  let old = ["apple", "pear", "banana"][:]
  let new = ["apple", "banana", "coconut"][:]

  let d = @diff.Diff(old~, new~)

  assert_eq(d.edits.length(), 4)
  assert_true(
    d.edits[:]
    is [
      Equal(old_index=0, new_index=0, len=1),
      Delete(old_index=1, new_index=1, len=1),
      Equal(old_index=2, new_index=1, len=1),
      Insert(old_index=3, new_index=2, len=1),
    ],
  )
}
```

## Prefer Unique Anchors With Patience Diff

Call `Diff(old~, new~, algorithm=@diff.Patience)` to enable patience diff. It
first finds elements that appear exactly once in both inputs and uses them as
anchors, then runs Myers diff on the unmatched ranges between those anchors.
This can produce a more stable result when repeated elements move around.

```mbt check
///|
test "patience diff keeps unique anchors in place" {
  let old = ["unique", "dup", "dup"][:]
  let new = ["dup", "unique", "dup"][:]

  let myers = @diff.Diff(old~, new~)
  let patience = @diff.Diff(old~, new~, algorithm=@diff.Patience)

  assert_true(
    myers.edits[:]
    is [
      Delete(old_index=0, new_index=0, len=1),
      Equal(old_index=1, new_index=0, len=1),
      Insert(old_index=2, new_index=1, len=1),
      Equal(old_index=2, new_index=2, len=1),
    ],
  )
  assert_true(
    patience.edits[:]
    is [
      Insert(old_index=0, new_index=0, len=1),
      Equal(old_index=0, new_index=1, len=2),
      Delete(old_index=2, new_index=3, len=1),
    ],
  )
}
```

## Group Into Hunks And Render

`group` splits the edit script into `Hunk[T]` values, keeping `radius` lines
of surrounding context (default 3). `radius` must be non-negative, and
`radius=0` emits hunks without surrounding context. Each `Hunk[T]` implements
`Show`, so you can print it directly as unified-diff output.

```mbt check
///|
test "group splits distant changes into separate hunks" {
let old = [
" aaaaaaaaaa", " bbbbbbbbbb", " cccccccccc", " dddddddddd", " eeeeeeeeee",
" ffffffffff", " gggggggggg", " hhhhhhhhhh",
][:]
let new = [
" aaaaaaaaaa", " xxxxxxxxxx", " cccccccccc", " dddddddddd", " eeeeeeeeee",
" ffffffffff", " yyyyyyyyyy", " hhhhhhhhhh",
][:]

let hunks = @diff.Diff(old~, new~).group(radius=1)

assert_eq(hunks.length(), 2)
assert_eq(
hunks[0].to_string(),
(
#|@@ -1,3 +1,3 @@
#| aaaaaaaaaa
#|- bbbbbbbbbb
#|+ xxxxxxxxxx
#| cccccccccc
#|
),
)
assert_eq(
hunks[1].to_string(),
(
#|@@ -6,3 +6,3 @@
#| ffffffffff
#|- gggggggggg
#|+ yyyyyyyyyy
#| hhhhhhhhhh
#|
),
)
}
```
38 changes: 38 additions & 0 deletions diff/backpointer.mbt
@@ -0,0 +1,38 @@
// Copyright 2026 International Digital Economy Academy
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

///|
/// A node in the predecessor chain recovered from the patience-sorting phase.
///
/// `value` stores the current pile entry, and `prev` points to the entry chosen
/// from the previous pile so the final increasing chain can be reconstructed
/// once the last pile is known.
priv struct BackPointer {
  value : (Int, Int)
  prev : BackPointer?
}

///|
/// Materialize the predecessor chain ending at `self`, preserving left-to-right
/// subsequence order.
fn BackPointer::to_array(self : BackPointer) -> Array[(Int, Int)] {
  let result = []
  let mut self = self
  while self.prev is Some(prev) {
    result.push(self.value)
    self = prev
  }
  result.push(self.value)
  return result.rev()
}
50 changes: 50 additions & 0 deletions diff/compact.mbt
@@ -0,0 +1,50 @@
// Copyright 2026 International Digital Economy Academy
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

///|
/// A view over `source` filtered through a stable list of retained indices.
///
/// `indices[i]` stores the original position in `source` for the `i`th visible
/// element. This lets the Myers implementation skip values that cannot match
/// while still translating results back to the original coordinates.
priv struct CompactedElements[T]((ArrayView[T], FixedArray[Int]))

///|
/// Read the `index`th retained element.
fn[T] CompactedElements::op_get(self : CompactedElements[T], index : Int) -> T {
  let (source, indices) = self.0
  source[indices[index]]
}

///|
/// Map a retained position back to its original index in `source`.
fn[T] CompactedElements::get_index(
  self : CompactedElements[T],
  index : Int,
) -> Int {
  let (_, indices) = self.0
  indices[index]
}

///|
/// Return the number of retained elements visible through this compacted view.
fn[T] CompactedElements::indices_length(self : CompactedElements[T]) -> Int {
  self.0.1.length()
}

///|
/// Return the length of the underlying unfiltered source sequence.
fn[T] CompactedElements::source_length(self : CompactedElements[T]) -> Int {
  self.0.0.length()
}