Merged
Changes from 3 commits
1 change: 1 addition & 0 deletions README.md
@@ -296,6 +296,7 @@ Each profile includes:
- **Collaboration**: top devs sharing the same files (ranked by `shared_lines` = Σ min(linesA, linesB))
- **Weekend %**: off-hours work ratio
- **Top files**: most impacted files by churn
- **Top commits**: the dev's largest individual commits by lines changed (additions + deletions); surfaces vendored drops and bulk rewrites that can skew the totals
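
As a rough illustration of the `shared_lines` ranking key used in the Collaboration bullet, the sketch below sums min(linesA, linesB) over the files both developers touched. The map shape (file path to lines changed), the function names `sharedLines` and `report`, and the sample data are assumptions made for this example, not the project's actual types.

```go
package main

import "fmt"

// sharedLines returns the sum of min(linesA, linesB) over files present in
// both maps. Each map is a hypothetical "file path -> lines this dev changed"
// index; the real data structures may differ.
func sharedLines(a, b map[string]int64) int64 {
	var total int64
	for path, la := range a {
		lb, ok := b[path]
		if !ok {
			continue // file touched by only one dev: no overlap
		}
		if lb < la {
			la = lb
		}
		total += la
	}
	return total
}

// report formats one collaboration pair for display.
func report(devA, devB string, a, b map[string]int64) string {
	return fmt.Sprintf("%s <-> %s: shared_lines=%d", devA, devB, sharedLines(a, b))
}

func main() {
	alice := map[string]int64{"main.go": 100, "util.go": 1, "only_alice.go": 50}
	bob := map[string]int64{"main.go": 40, "util.go": 300, "only_bob.go": 7}
	fmt.Println(report("alice", "bob", alice, bob))
}
```

Here `util.go` contributes only 1 line to the total even though bob changed 300 lines in it, which is how a trivial one-line touch gets discounted.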

### Coupling analysis

3 changes: 2 additions & 1 deletion docs/METRICS.md
@@ -219,6 +219,7 @@ Per-developer report combining multiple metrics.
| Specialization | Herfindahl index over the **full** per-directory file-count distribution: Σ pᵢ² where pᵢ is the share of the dev's files in directory i. 1 = all files in one directory (narrow specialist); 1/N for a uniform spread across N directories; approaches 0 as the distribution widens. Computed before the top-5 Scope truncation so it reflects actual breadth. Labels (see `specBroadGeneralistMax`, `specBalancedMax`, `specFocusedMax` constants): `< 0.15` broad generalist, `< 0.35` balanced, `< 0.7` focused specialist, `≥ 0.7` narrow specialist. Herfindahl, not Gini, because Gini would collapse "1 file in 1 dir" and "1 file in each of 5 dirs" to the same value (both have zero inequality among buckets), which misses the specialization distinction. **Measures file distribution, not domain expertise** — see caveat below. **Display vs raw:** CLI and HTML show the value rounded to 3 decimals (`%.3f`) for readability; JSON output preserves the full float64. Band classification runs against the raw float, so a value like 0.149 lands in `broad generalist` even though %.2f would have rounded it to `0.15`. JSON consumers that reproduce the banding must use the raw value, not a rounded version. |
| Contribution type | Based on del/add ratio: growth (<0.4), balanced (0.4-0.8), refactor (>0.8) |
| Collaborators | Top 5 devs sharing code with this dev. Ranked by `shared_lines` (Σ min(linesA, linesB) across shared files), tiebreak `shared_files`, then email. Same `shared_lines` semantics as the Developer Network metric — discounts trivial one-line touches so "collaborator" reflects real overlap. |
| Top commits | The dev's top 10 commits by `lines_changed` (additions + deletions), tiebreak `sha asc`. Same ranking key and tiebreak as the dataset-level Top Commits section so the two read consistently side by side. Messages follow the same 80-character truncation rule and are only populated when `extract` ran with `--include-commit-messages`. Rendered in the CLI `profile` stat and in the standalone `--email` HTML profile page; intentionally omitted from the main report's Developer Profiles cards to keep those compact. **Divergence from dataset-level Top Commits:** commits with a zero `author_date` are dropped from the per-dev list (they share the guard that protects grid/monthly bucketing); the dataset-level section renders them as `0001-01-01`. Negligible in practice — the JSONL extract always emits `author_date` — but worth knowing if you compare the two views. |
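
The Specialization row above can be sketched as follows: a Herfindahl index over per-directory file counts, then banding against the raw float. The thresholds are taken from the table; the constant and function names (`broadGeneralistMax`, `herfindahl`, `specLabel`, `describe`) and the input shape are assumptions for illustration and may not match the real code.

```go
package main

import "fmt"

// Band thresholds as listed in the table. The names are illustrative; the
// document refers to specBroadGeneralistMax, specBalancedMax, specFocusedMax.
const (
	broadGeneralistMax = 0.15
	balancedMax        = 0.35
	focusedMax         = 0.7
)

// herfindahl computes the sum of p_i^2, where p_i is the share of the dev's
// files that live in directory i. The input (dir -> file count) is hypothetical.
func herfindahl(filesPerDir map[string]int) float64 {
	total := 0
	for _, n := range filesPerDir {
		total += n
	}
	if total == 0 {
		return 0
	}
	var h float64
	for _, n := range filesPerDir {
		p := float64(n) / float64(total)
		h += p * p
	}
	return h
}

// specLabel bands the raw (unrounded) index.
func specLabel(h float64) string {
	switch {
	case h < broadGeneralistMax:
		return "broad generalist"
	case h < balancedMax:
		return "balanced"
	case h < focusedMax:
		return "focused specialist"
	default:
		return "narrow specialist"
	}
}

// describe shows the 3-decimal display value next to the raw-float label.
func describe(h float64) string {
	return fmt.Sprintf("%.3f (%s)", h, specLabel(h))
}

func main() {
	oneDir := map[string]int{"internal/stats": 1}
	fiveDirs := map[string]int{"a": 1, "b": 1, "c": 1, "d": 1, "e": 1}
	fmt.Println(describe(herfindahl(oneDir)))
	fmt.Println(describe(herfindahl(fiveDirs)))
	fmt.Println(describe(0.1499))
}
```

The last call shows why consumers should band on the raw value: 0.1499 displays as `0.150` yet still falls in `broad generalist`. The first two calls show the distinction Gini would miss: one directory yields 1, five equal directories yield 1/5.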

## Top Commits

@@ -373,7 +374,7 @@ Every ranking function has an explicit tiebreaker so the same input produces the
| `dev-network` | shared_lines | shared_files |
| `profile` | commits | email asc |

-A third-level tiebreaker on path/sha/email asc is applied where primary and secondary can both tie (`churn-risk`, `coupling`, `dev-network`) so ordering is stable even with exact equality on the first two keys. Inside each profile, the `TopFiles`, `Scope`, and `Collaborators` sub-lists are also sorted with explicit tiebreakers (path / dir / email asc) so their internal ordering is deterministic too.
+A third-level tiebreaker on path/sha/email asc is applied where primary and secondary can both tie (`churn-risk`, `coupling`, `dev-network`) so ordering is stable even with exact equality on the first two keys. Inside each profile, the `TopFiles`, `TopCommits`, `Scope`, and `Collaborators` sub-lists are also sorted with explicit tiebreakers (path / sha / dir / email asc) so their internal ordering is deterministic too.
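
The per-profile `TopCommits` ordering (lines changed desc, SHA asc as tiebreak, capped at 10 with a hidden count, messages cut to 80 characters) can be sketched as below. The `commit` type and the helpers `topCommits`, `truncate`, `shortSHA`, and `formatRow` are hypothetical names for this example. Two choices are this sketch's own and not taken from the change: truncation counts runes so multi-byte characters are never split, and `shortSHA` guards against SHAs shorter than 12 characters.

```go
package main

import (
	"fmt"
	"sort"
)

// commit is a hypothetical stand-in for a per-dev commit record.
type commit struct {
	SHA                  string
	Message              string
	Additions, Deletions int64
}

func (c commit) linesChanged() int64 { return c.Additions + c.Deletions }

const (
	maxTop = 10 // cap on the displayed list
	maxMsg = 80 // message width after truncation
)

// topCommits returns the largest commits (at most maxTop) and how many were hidden.
func topCommits(in []commit) ([]commit, int) {
	out := append([]commit(nil), in...) // copy so the caller's slice keeps its order
	sort.Slice(out, func(i, j int) bool {
		li, lj := out[i].linesChanged(), out[j].linesChanged()
		if li != lj {
			return li > lj // primary key: lines changed, descending
		}
		return out[i].SHA < out[j].SHA // tiebreak: SHA ascending, for stable output
	})
	hidden := 0
	if len(out) > maxTop {
		hidden = len(out) - maxTop
		out = out[:maxTop]
	}
	// Truncate after the cap so at most maxTop messages are copied.
	for i := range out {
		out[i].Message = truncate(out[i].Message, maxMsg)
	}
	return out, hidden
}

// truncate shortens msg to limit runes, ending in "..." when it was cut.
func truncate(msg string, limit int) string {
	r := []rune(msg)
	if len(r) <= limit {
		return msg
	}
	return string(r[:limit-3]) + "..."
}

// shortSHA abbreviates a SHA to 12 characters without panicking on short input.
func shortSHA(sha string) string {
	if len(sha) > 12 {
		return sha[:12]
	}
	return sha
}

func formatRow(c commit) string {
	return fmt.Sprintf("%s %6d lines  %s", shortSHA(c.SHA), c.linesChanged(), c.Message)
}

func main() {
	commits := []commit{
		{SHA: "bbbbbbbbbbbbbbbbbbbb", Message: "format pass 2", Additions: 50},
		{SHA: "aaaaaaaaaaaaaaaaaaaa", Message: "format pass 1", Additions: 30, Deletions: 20},
		{SHA: "cccccccccccccccccccc", Message: "vendor drop", Additions: 9000},
	}
	top, hidden := topCommits(commits)
	for _, c := range top {
		fmt.Println(formatRow(c))
	}
	fmt.Println("hidden:", hidden)
}
```

Without the SHA tiebreak, the two 50-line commits could swap places between runs when the input comes from map iteration, whose order Go does not define.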

Inside `busfactor`, the per-file `TopDevs` list is sorted by lines desc with an email asc tiebreaker. Without it, binary assets and small files where two devs contribute equal lines (e.g. `.gif`, `.png`, one-line configs) produced a different `TopDevs` email order on every run.

20 changes: 20 additions & 0 deletions internal/report/profile_template.go
@@ -190,6 +190,26 @@ footer { margin-top: 40px; padding-top: 16px; border-top: 1px solid #d0d7de; col
</table>
{{end}}

{{if .Profile.TopCommits}}
<h2>Top Commits</h2>
<p class="hint">This developer's largest individual commits by lines changed (additions + deletions). A handful of outsized commits (vendored drops, bulk renames, generated code) reads very differently from a steady stream of medium-sized ones, even when the totals match.</p>
<table>
<tr><th>SHA</th><th>Date</th><th>Lines</th><th>Files</th><th>Message</th></tr>
{{range .Profile.TopCommits}}
<tr>
<td class="mono">{{slice .SHA 0 12}}</td>
<td class="mono" style="font-size:11px;">{{.Date}}</td>
<td>{{thousands .LinesChanged}}</td>
<td>{{thousands .FilesChanged}}</td>
<td class="truncate">{{.Message}}</td>
</tr>
{{end}}
{{if gt .Profile.TopCommitsHidden 0}}
<tr><td colspan="5" style="color:#656d76; font-style:italic; text-align:center;">+{{.Profile.TopCommitsHidden}} more commits not shown</td></tr>
{{end}}
</table>
{{end}}

{{if .ActivityYears}}
<h2 style="display:flex; justify-content:space-between; align-items:center;">Activity <button onclick="var h=document.getElementById('prof-act-heatmap'),t=document.getElementById('prof-act-table');h.hidden=!h.hidden;t.hidden=!t.hidden;this.textContent=h.hidden?'heatmap':'table'" style="font-size:11px; font-weight:normal; padding:2px 10px; border:1px solid #d0d7de; border-radius:4px; background:#f6f8fa; color:#24292f; cursor:pointer;">table</button></h2>
<p class="hint">Monthly commit heatmap. Darker = more commits. Gaps = inactive periods; steady cadence signals healthy pace. Hover for details; toggle to table for exact numbers. · {{docRef "activity"}}</p>
2 changes: 1 addition & 1 deletion internal/report/template.go
@@ -372,7 +372,7 @@ footer { margin-top: 40px; padding-top: 16px; border-top: 1px solid #d0d7de; col
{{end}}
{{if .Profiles}}
-<h2>Developer Profiles</h2>
+<h2>Developer Profiles{{if lt (len .Profiles) .Summary.TotalDevs}} <span style="font-size:13px; color:#656d76; font-weight:normal;">{{thousands (len .Profiles)}} of {{thousands .Summary.TotalDevs}}</span>{{end}}</h2>
<p class="hint">Per-developer view. Use to spot silos (narrow scope + few collaborators), knowledge concentration (high pace on few directories), and cultural patterns (weekend or refactor-heavy work). · {{docRef "profile"}}</p>
{{range .Profiles}}
<div style="background:#fff; border:1px solid #d0d7de; border-radius:6px; padding:16px; margin-bottom:16px;">
12 changes: 12 additions & 0 deletions internal/stats/format.go
@@ -520,6 +520,18 @@ func (f *Formatter) PrintProfiles(profiles []DevProfile) error {
}
}

if len(p.TopCommits) > 0 {
fmt.Fprintln(f.w)
fmt.Fprintln(f.w, " Top commits:")
for _, tc := range p.TopCommits {
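// Guard the abbreviation: slicing a SHA shorter than 12 bytes would panic.
sha := tc.SHA
if len(sha) > 12 {
sha = sha[:12]
}
fmt.Fprintf(f.w, " %s %s %6d lines %3d files %s\n",
sha, tc.Date, tc.LinesChanged, tc.FilesChanged, tc.Message)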
}
if p.TopCommitsHidden > 0 {
fmt.Fprintf(f.w, " ... (+%d more commits not shown)\n", p.TopCommitsHidden)
}
}

if len(p.MonthlyActivity) > 0 {
fmt.Fprintln(f.w, " Activity:")
maxCommits := 0
82 changes: 81 additions & 1 deletion internal/stats/stats.go
@@ -1328,6 +1328,14 @@ type DevProfile struct {
// whole footprint or just a sample. Zero when the dev's touched
// file count fits in 10.
TopFilesHidden int
// TopCommits is the dev's largest commits by LinesChanged (add+del),
// capped at 10. Mirrors the dataset-level TopCommits metric so a
// reader can see which individual commits drive this dev's churn
// footprint — a handful of huge vendored-drop commits reads very
// differently from a steady stream of medium ones, even when the
// totals match. TopCommitsHidden follows the TopFilesHidden pattern.
TopCommits []DevCommit
TopCommitsHidden int
Scope []DirScope
// ScopeHidden / ExtensionsHidden count the buckets dropped by the
// top-5 truncation so CLI and HTML can surface "+N more" — without
@@ -1369,6 +1377,22 @@ type DevFileContrib struct {
Churn int64
}

// DevCommit is a single commit attributed to the dev, carrying the
// fields needed to render the per-dev "top commits" list. Mirrors the
// shape of BigCommit (the dataset-level TopCommits type) minus the
// AuthorName/AuthorEmail fields — those are redundant in a per-dev view
// where every entry belongs to the same author. Message is truncated
// at 80 chars (same as TopCommits) to keep the CLI/HTML table narrow.
type DevCommit struct {
SHA string
Date string
Message string
Additions int64
Deletions int64
LinesChanged int64
FilesChanged int
}

// DevExtContrib is a dev's footprint in a single extension bucket.
// Churn is the summed per-file dev-lines (from fe.devLines), so it
// reflects lines the dev personally added/removed across files that
@@ -1525,16 +1549,44 @@ func DevProfiles(ds *Dataset, filterEmail string, n int) []DevProfile {
// Per-dev work grid + monthly activity
devGrid := make(map[string]*[7][24]int)
devMonthly := make(map[string]map[string]*ActivityBucket)
// Per-dev commit list for TopCommits ranking. Collected in the same
// ds.commits pass as devGrid/devMonthly so we don't iterate the full
// commit map twice; actual sort + top-10 truncation happens in the
// per-dev assembly loop below.
devCommits := make(map[string][]DevCommit)
dayIdx := [7]int{6, 0, 1, 2, 3, 4, 5} // Sunday=6, Monday=0, ...

-for _, cm := range ds.commits {
+for sha, cm := range ds.commits {
if !inTarget(cm.email) {
continue
}
if cm.date.IsZero() {
// Note: dataset-level TopCommits() renders zero-date commits
// as "0001-01-01"; we drop them here because grid/monthly below
// share this guard and malformed-date commits are rare enough
// in practice (JSONL extract always emits author_date) that
// the divergence is not worth branching the loop for.
continue
}

// Message is stored un-truncated on purpose: the 80-char
// truncation is deferred to the per-dev assembly loop below,
// which runs after sort + top-10 cap. A dev with thousands of
// commits would otherwise pay N small string allocations here
// just to throw away all but 10. Dataset-level TopCommits()
// truncates inline because it builds BigCommits in one pass;
// the per-dev path splits collection from projection so we can
// avoid that cost.
devCommits[cm.email] = append(devCommits[cm.email], DevCommit{
SHA: sha,
Date: cm.date.UTC().Format("2006-01-02"),
Message: cm.message,
Additions: cm.add,
Deletions: cm.del,
LinesChanged: cm.add + cm.del,
FilesChanged: cm.files,
})

if devGrid[cm.email] == nil {
devGrid[cm.email] = &[7][24]int{}
}
@@ -1585,6 +1637,33 @@ func DevProfiles(ds *Dataset, filterEmail string, n int) []DevProfile {
}
}

// Top commits: rank this dev's commits by lines changed, mirroring
// the dataset-level TopCommits semantics. Deterministic tiebreak on
// SHA asc so the displayed top-10 is stable across runs when a dev
// has several same-sized commits (e.g. a series of formatting
// passes each touching the same LOC count). Message truncation is
// done here, post-cap, so we pay the string-copy cost for at most
// 10 entries per dev instead of the full commit count.
topCommits := devCommits[email]
topCommitsHidden := 0
if len(topCommits) > 0 {
sort.Slice(topCommits, func(i, j int) bool {
if topCommits[i].LinesChanged != topCommits[j].LinesChanged {
return topCommits[i].LinesChanged > topCommits[j].LinesChanged
}
return topCommits[i].SHA < topCommits[j].SHA
})
if len(topCommits) > 10 {
topCommitsHidden = len(topCommits) - 10
topCommits = topCommits[:10]
}
for i := range topCommits {
if len(topCommits[i].Message) > 80 {
topCommits[i].Message = topCommits[i].Message[:77] + "..."
}
}
}

var monthly []ActivityBucket
if months, ok := devMonthly[email]; ok {
var order []string
@@ -1805,6 +1884,7 @@ func DevProfiles(ds *Dataset, filterEmail string, n int) []DevProfile {
LinesChanged: cs.Additions + cs.Deletions, FilesTouched: cs.FilesTouched,
ActiveDays: cs.ActiveDays, FirstDate: cs.FirstDate, LastDate: cs.LastDate,
TopFiles: topFiles, TopFilesHidden: topFilesHidden,
TopCommits: topCommits, TopCommitsHidden: topCommitsHidden,
Scope: scope, ScopeHidden: scopeHidden,
Extensions: extensions, ExtensionsHidden: extensionsHidden,
Specialization: specialization,
136 changes: 136 additions & 0 deletions internal/stats/stats_test.go
@@ -1958,6 +1958,142 @@ func TestDevProfilesContribType(t *testing.T) {
}
}

func TestDevProfilesTopCommits(t *testing.T) {
// Two devs × varying commit sizes. alice has 13 commits (12 generated
// plus one long-message commit) so the top-10 cap fires; bob has 4
// (3 small plus one large). The fixture deliberately
// includes two same-sized alice commits to exercise the SHA-asc
// tiebreak, a long-message commit to exercise the 80-char truncation,
// and a large bob commit to verify ranking is per-dev, not global.
ds := &Dataset{
commits: map[string]*commitEntry{},
contributors: map[string]*ContributorStat{},
files: map[string]*fileEntry{},
}
// Alice: 12 generated commits. Lines = 10*(i+1), except idx 5 is forced to 50 so idx 4 and 5 tie at 50.
ds.contributors["alice@x"] = &ContributorStat{Name: "Alice", Email: "alice@x", Commits: 12, ActiveDays: 12}
for i := 0; i < 12; i++ {
sha := fmt.Sprintf("alice-%02d", i)
lines := int64(10 * (i + 1))
if i == 5 {
lines = 50 // alice-04 (idx 4) stays 50; force idx 5 to same so
// alice-04 and alice-05 tie on 50 lines.
}
ds.commits[sha] = &commitEntry{
email: "alice@x",
date: time.Date(2024, 1, 1+i, 10, 0, 0, 0, time.UTC),
add: lines, del: 0, files: 1,
}
}
// Long-message commit: 120 chars, should truncate to 77 + "..." (80 total).
longMsg := strings.Repeat("x", 120)
ds.commits["alice-LONG"] = &commitEntry{
email: "alice@x",
date: time.Date(2024, 2, 1, 10, 0, 0, 0, time.UTC),
add: 9999, del: 0, files: 5,
message: longMsg,
}
ds.contributors["alice@x"].Commits = 13

// Bob: 3 small commits, plus one huge (9000 lines) that must NOT
// appear in alice's TopCommits even though it's globally the biggest
// after alice-LONG.
ds.contributors["bob@x"] = &ContributorStat{Name: "Bob", Email: "bob@x", Commits: 4, ActiveDays: 4}
for i := 0; i < 3; i++ {
sha := fmt.Sprintf("bob-%02d", i)
ds.commits[sha] = &commitEntry{
email: "bob@x",
date: time.Date(2024, 3, 1+i, 10, 0, 0, 0, time.UTC),
add: 5, del: 0, files: 1,
}
}
ds.commits["bob-BIG"] = &commitEntry{
email: "bob@x",
date: time.Date(2024, 4, 1, 10, 0, 0, 0, time.UTC),
add: 9000, del: 0, files: 3,
}

profiles := DevProfiles(ds, "", 0)
var alice, bob *DevProfile
for i := range profiles {
switch profiles[i].Email {
case "alice@x":
alice = &profiles[i]
case "bob@x":
bob = &profiles[i]
}
}
if alice == nil || bob == nil {
t.Fatalf("missing profile: alice=%v bob=%v", alice, bob)
}

// Top-10 cap: alice has 13 commits → TopCommits=10, Hidden=3.
if len(alice.TopCommits) != 10 {
t.Fatalf("alice TopCommits len = %d, want 10", len(alice.TopCommits))
}
if alice.TopCommitsHidden != 3 {
t.Errorf("alice TopCommitsHidden = %d, want 3", alice.TopCommitsHidden)
}

// Ranking: LinesChanged desc. alice-LONG is the biggest (9999).
if alice.TopCommits[0].SHA != "alice-LONG" {
t.Errorf("alice[0] = %q, want alice-LONG", alice.TopCommits[0].SHA)
}
for i := 1; i < len(alice.TopCommits); i++ {
if alice.TopCommits[i-1].LinesChanged < alice.TopCommits[i].LinesChanged {
t.Errorf("alice not lines-desc at idx %d: %d < %d",
i, alice.TopCommits[i-1].LinesChanged, alice.TopCommits[i].LinesChanged)
}
}

// Message truncation: 77 + "..." = 80 chars.
if len(alice.TopCommits[0].Message) != 80 {
t.Errorf("long message len = %d, want 80 (77+...)", len(alice.TopCommits[0].Message))
}
if !strings.HasSuffix(alice.TopCommits[0].Message, "...") {
t.Errorf("long message missing ellipsis: %q", alice.TopCommits[0].Message)
}

// Per-dev isolation: no bob commits in alice's list.
for _, c := range alice.TopCommits {
if strings.HasPrefix(c.SHA, "bob-") {
t.Errorf("alice contains bob commit %q", c.SHA)
}
}

// Bob has 4 commits, no truncation needed.
if len(bob.TopCommits) != 4 {
t.Errorf("bob TopCommits len = %d, want 4", len(bob.TopCommits))
}
if bob.TopCommitsHidden != 0 {
t.Errorf("bob TopCommitsHidden = %d, want 0", bob.TopCommitsHidden)
}
if bob.TopCommits[0].SHA != "bob-BIG" {
t.Errorf("bob[0] = %q, want bob-BIG", bob.TopCommits[0].SHA)
}

// Tiebreak: when LinesChanged ties, SHA asc wins. alice-04 and
// alice-05 both carry 50 lines. alice-04 must come first.
var tieIdx04, tieIdx05 = -1, -1
for i, c := range alice.TopCommits {
if c.SHA == "alice-04" {
tieIdx04 = i
}
if c.SHA == "alice-05" {
tieIdx05 = i
}
}
if tieIdx04 < 0 || tieIdx05 < 0 || tieIdx04 > tieIdx05 {
t.Errorf("tiebreak broken: alice-04 at %d, alice-05 at %d (want 04 < 05)", tieIdx04, tieIdx05)
}

// LinesChanged field equals add+del.
for _, c := range alice.TopCommits {
if c.LinesChanged != c.Additions+c.Deletions {
t.Errorf("%s: LinesChanged=%d, add+del=%d", c.SHA, c.LinesChanged, c.Additions+c.Deletions)
}
}
}

func TestRenameMergesHistory(t *testing.T) {
// JSONL newest-first (as git log emits). Historical sequence:
// 1) 2024-01 c1 creates old.go