192 sessions, 579 hours — the operator profile, re‑grounded.
Earlier today I published a v1 self‑audit drawn from the homepage stack panel. I now have something better: an actual Claude Code /insights report covering thirty‑one days of session telemetry. It keeps the same three‑section skeleton (taste, telemetry, strength), refreshed against the real data, and adds a fourth section the v1 could not write: friction.
Why a v2.
The v1 read the stack panel on the homepage — 36.8M tokens, 2,367 commits, 337 modules, 62 skills, 8 systems — and turned each number into a paragraph. Useful, but the numbers were ones I had already curated for the public page. The /insights report is different. It is generated independently from a month of Claude Code session logs — what I actually said, what got run, where the time went, which sessions hit their goal and which did not.
So this is the deeper read. Same skeleton — taste, telemetry, strength — refreshed against thirty‑one days of working evidence. And one new section, friction, which the homepage stack panel structurally could not contain.
① Taste, observed instead of asserted.
The report describes how I drive Claude in five sentences I would not have written about myself. The phrasing is more accurate than mine. Quoting it directly because the precision is the point.
A. Workflow conductor, not coding partner.
"You operate Claude Code as a workflow conductor running a structured engineering operation, not as a casual coding assistant." This is sharper than the v1's "decomposition over abstraction." The unit of work is not a prompt; it is a phase. The unit of output is not a file; it is a shipped wave with a merged PR and a verified deploy. The whole stance reorganises around that.
B. Front‑loaded rigor.
"You rarely write paragraph‑long requirements upfront; instead, you've front‑loaded the rigor into the slash commands and skills themselves, so each invocation carries an enormous implicit contract." When I type /gsd:execute-phase 74.5.2 I am invoking ten plans across four waves, mandatory triple‑review, hotfix handling, and a production deploy — all without re‑specifying any of it. The slash command is the spec. The terseness on my side is bought by the discipline I encoded once into the skill.
C. Sharp enforcement of the contract.
"You let Claude run for long stretches but interrupt sharply when discipline slips." The example the report flagged: when Claude tried to defer the mandatory triple‑review on Phase 74.5.1 citing time constraints, I pushed back hard. The contract is not negotiable. The corrections are not personal; they are the contract maintaining itself.
D. Graceful environmental recovery.
"You also recover gracefully from environmental chaos (auto‑running hooks re‑dirtying the tree, branch mishaps in other terminals, rate limits mid‑execution) by pausing, fixing state, and saying 'continue.'" That word — continue — is the one I type most often after a recovery. It is a one‑word resumption protocol, and it works because the context has been front‑loaded into the workflow rather than the conversation.
E. Markdown as a first‑class deliverable.
"The dominance of Markdown (1,777 edits) and TaskUpdate/TaskCreate tooling reveals you treat planning artifacts, ADRs, CONTEXT.md, and SPECs as first‑class deliverables, not afterthoughts." 1,777 markdown edits in 31 days. ADRs, SPECs, CONTEXT.md, ROADMAP.md, plan files. The artefact is the work. The code is downstream of the document.
② Telemetry, real this time.
1,587 messages exchanged.
579 hours of session time.
252 commits shipped.
398 Agent calls across the month.
1,777 Markdown edits.
April 8 → May 9, 2026.
And the distribution across project areas — five clusters, in volume order:
- 18 sessions · GSD Phase Workflow Orchestration. The headline. /gsd:execute-phase, /gsd:plan-phase, /gsd:discuss-phase, /staffreview. Phases 74.5.1, 74.5.2, 75, 76, 80.2, 80.3.
- 6 sessions · Domain modelling and documentation. CONTEXT.md grilling, ADR work, the 16‑term canonical vocabulary lock for cost and revenue aggregation.
- 5 sessions · Architecture review with graph‑primed skills. Module graph: 2,602 nodes, 2,537 links.
- 4 sessions · This portfolio site. Frollie showcase, OSS grid, stack panel, the long‑form essay pages, Vercel Analytics integration across 11 HTML files.
- 4 sessions · Production bug fixes and ops automation. The BigSeller sync API code=‑1 race condition. Fail‑fast and retry‑with‑backoff. Hotfix PRs, CSV export work.
Five readings, in order of how surprising I find them.
Reading 1. The shape is sixty / thirty / ten.
Roughly 60% of session time on phase orchestration. Roughly 30% on architecture and domain rigour. Roughly 10% on production firefighting and editorial. That is exactly the split I would say I run, but I had never measured it. The taste lines up with the tape.
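For comparison, the same three‑way split can be computed by session count from the five cluster figures listed above. This is a minimal sketch of the arithmetic only. The grouping of clusters into three buckets is an assumption made here, and session count is a different measure from session time, so the result need not match the time‑based split.

```python
# Session counts per cluster, as listed in the telemetry section.
CLUSTERS = {
    "phase orchestration": 18,
    "domain modelling and documentation": 6,
    "architecture review": 5,
    "portfolio site": 4,
    "production fixes and ops automation": 4,
}

# Assumption: this three-bucket grouping is chosen here for the comparison.
BUCKETS = {
    "orchestration": ["phase orchestration"],
    "architecture and domain": [
        "domain modelling and documentation",
        "architecture review",
    ],
    "firefighting and editorial": [
        "portfolio site",
        "production fixes and ops automation",
    ],
}


def bucket_shares(clusters, buckets):
    """Return each bucket's share of the clustered sessions, in percent (1 dp)."""
    total = sum(clusters.values())
    return {
        name: round(100 * sum(clusters[c] for c in members) / total, 1)
        for name, members in buckets.items()
    }


if __name__ == "__main__":
    for name, pct in bucket_shares(CLUSTERS, BUCKETS).items():
        print(f"{name}: {pct}%")
```

Across the 37 clustered sessions this gives 48.6 / 29.7 / 21.6 by count. Orchestration taking a larger share of time than of sessions is consistent with phase sessions running longer than the rest.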
Reading 2. Phase 75 hit 1,727 / 1,727.
One thousand seven hundred and twenty‑seven tests, all green, on a single phase. That is not "agent wrote some tests"; that is a complete TDD pipeline running unattended across waves and producing a green bar. Phase 74.5.2 ran ten plans in parallel. Phase 74.5.1 spanned twelve plans across four waves. The multi‑wave parallel execution claim from the v1 is now verifiable.
Reading 3. The success rate is high, the failure mode is consistent.
Of 48 sessions specifically scored, 30 fully achieved their goal and roughly 94 were "likely satisfied"; 18 were flagged as wrong‑approach friction. The interesting number is the 18. They are not "the model failed"; they are "the model tried to shortcut a step the user had already encoded as mandatory." Claude Code as a system gets the boring half right; the senior judgement layer is what catches the remaining 18.
Reading 4. 398 Agent calls is a topology, not a habit.
Nearly four hundred subagent invocations in a month is enough volume that the topology is real. Each Agent call is Claude spawning a specialised subagent (planner, executor, reviewer, deployer) under the orchestration of a top‑level conductor. The 62 custom skills from the v1 audit are now visible as the codified version of this topology.
Reading 5. Four sessions on this portfolio.
The smallest of the five clusters is the one you are reading. Four sessions, eleven HTML files, a Frollie showcase, an OSS grid, a stack panel, and the essay you are inside right now. The work is small as a fraction of the month, and yet it is the only public artefact of any of it. The ratio is editorial: most of the work compounds inside Frollie; a small slice gets dressed up and sent out the door.
③ Strength, three patterns the report verifies.
The v1 named five strengths from feel. The report names three from session evidence. The trim is welcome. Less of me, more of the data.
Strength 1. Multi‑wave parallel phase execution.
"You consistently orchestrate complex phases by splitting them into waves of parallel plans — Phase 74.5.2 ran 10 plans in parallel, Phase 74.5.1 spanned 12 plans across 4 waves, and Phase 75 hit 1,727/1,727 tests green. Your /gsd:execute-phase pipeline integrates research, planning, implementation, quad‑review, and shipping in one cohesive flow that even survives mid‑session rate limits." The thing that survives rate limits is the thing that has actually been engineered. Most agentic flows die at the first 429.
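Surviving a 429 usually comes down to retrying with backoff while failing fast on everything else. The following is a minimal sketch of that general pattern, not the pipeline's actual implementation. `RateLimited`, `run_with_backoff`, and the default delays are hypothetical names and values chosen for illustration.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error a step raises when the API answers HTTP 429."""


def backoff_delays(retries, base=1.0, cap=60.0):
    """Exponential delays in seconds: base, 2*base, 4*base, ... capped at cap."""
    return [min(cap, base * 2 ** i) for i in range(retries)]


def run_with_backoff(step, retries=5, base=1.0, cap=60.0, sleep=time.sleep):
    """Call step(); on RateLimited, wait and try again, up to `retries` retries.

    Any other exception propagates immediately (fail fast).
    """
    for delay in backoff_delays(retries, base, cap):
        try:
            return step()
        except RateLimited:
            # Full jitter: sleep a random amount up to the current delay.
            sleep(random.uniform(0, delay))
    # Final attempt; a RateLimited raised here propagates to the caller.
    return step()
```

The `sleep` parameter is injectable so the retry logic can be exercised without real waiting.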
Strength 2. Mandatory staff‑review discipline.
"You enforce non‑negotiable triple/quad review gates and have called out attempts to skip them, treating code review as a first‑class workflow citizen rather than an afterthought. This caught real issues like a false‑positive duplicate_transaction bug and 8 critical findings in plan 74.5.2.1." Eight critical findings, in one plan, that would have shipped without the gate. The cost of the discipline is one extra subagent run; the cost of skipping it is whatever the worst of those eight findings would have done in production.
Strength 3. Graph‑primed architecture.
"You built a workflow that uses module graphs (2,602 nodes, 2,537 links via graphify) to prime architecture and documentation reviews, then ground‑truth findings against the codebase to reject false positives. Pairing this with structured CONTEXT.md grilling skills — like the 16‑term canonical vocabulary lock for cost/revenue aggregation — shows a mature approach to making AI reviews factually accountable." The verb the report keeps using is ground‑truth. AI reviews are confidently wrong unless something forces them to verify. Graphify is that something.
④ Friction — the part the v1 could not see.
This section is why the v2 exists. The homepage stack panel only carries success‑state numbers; the session log carries the failures. Three friction categories, in the order they cost me time.
Friction 1. Permission and deny‑list blocks.
"Your settings.json deny rules and Vercel CLI permissions repeatedly block legitimate operations mid‑workflow, forcing manual intervention." Two examples the report cited: the deny list blocked git branch -D mid‑cleanup, requiring a settings adjustment before I could delete 26 stale branches; Vercel CLI deploys were initially blocked, requiring a settings.local.json workaround before shipping. I felt one of these inside the last hour, pushing this very portfolio to Vercel. The friction is real and the fix is project‑level pre‑approval of the known‑safe operations.
Friction 2. Claude shortcutting mandatory steps.
"Claude has repeatedly tried to skip required workflow steps (reviews, /insights, etc.) until you explicitly call it out." The pattern is a "helpful shortcut" that quietly erodes the contract. The fix is making the gates non‑skippable inside the slash commands themselves — explicit STOP‑and‑error behaviour if a step is deferred — so the discipline lives in the skill rather than relying on me to spot it every time.
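One way to picture the STOP‑and‑error behaviour is a check that refuses to proceed unless every mandatory step is marked done. This is a sketch under stated assumptions. The step names, the status strings, and `enforce_gates` are hypothetical and do not come from the real skill.

```python
class GateSkipped(RuntimeError):
    """Raised when a mandatory step is missing or deferred."""


# Hypothetical step names that must complete before shipping.
REQUIRED_BEFORE_SHIP = ("research", "plan", "implement", "review")


def enforce_gates(step_status, required=REQUIRED_BEFORE_SHIP):
    """Stop with an error unless every required step is marked 'done'.

    step_status maps a step name to a status string such as 'done' or
    'deferred'. A missing step counts as not done.
    """
    blocked = [s for s in required if step_status.get(s) != "done"]
    if blocked:
        raise GateSkipped("mandatory steps not completed: " + ", ".join(blocked))
    return True
```

Because a deferred review raises instead of returning a flag, the caller cannot quietly continue to the ship step. The discipline then lives in the check and not in whoever is watching the session.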
Friction 3. Hook automation creating git state chaos.
"Your GSD hooks (cost‑ledger, phase‑boundary) are aggressively modifying the working tree and auto‑branching commits, causing recovery loops mid‑execution." The cost‑ledger hook re‑dirties the tree during phase execution; the phase‑boundary hook auto‑branched a doc commit off main during the renumber work, a branch mishap that needed a mid‑flight recovery. Each hook was solving a real problem when written. Together, they fire at exactly the wrong moments. The fix is an env‑flag suspend during active phase execution, which the report writes out as a copyable prompt and which I will run as a Phase 80.4 candidate this week.
What this means for the next 31 days.
Three implications, each operationalisable by next week.
One. Lock down review steps as non‑negotiable inside the slash commands themselves. The friction the report flagged is "helpful shortcut" pressure I keep absorbing manually. If the gate is in the skill, the absorption stops costing me attention.
Two. Build a pre‑deploy validation hook. Sweep stale @ts-expect-error directives, validate vercel.json and .vercelignore, check the git config email matches the expected one. The Phase 80.3 hotfix for fifteen stale directives, the vercel.json drift, and the email mismatch on the last deploy are three failure modes that one script would have prevented.
Three. Scope the GSD hooks so they suspend during active phase execution. An env‑flag check at the top of each hook would prevent the bulk of the dirty‑tree recovery loops. The cost is fifteen minutes of work; the benefit pays interest every phase.
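The env‑flag check could look like the sketch below, assuming the hooks are scripts that can read environment variables. The flag name `GSD_PHASE_ACTIVE` and the accepted truthy values are hypothetical. The report's own copyable prompt is not reproduced here.

```python
import os
import sys

# Hypothetical flag name; set it while a phase is executing.
SUSPEND_FLAG = "GSD_PHASE_ACTIVE"


def hooks_suspended(env=None):
    """Return True when the suspend flag is set to a truthy value."""
    env = os.environ if env is None else env
    return env.get(SUSPEND_FLAG, "").strip().lower() in {"1", "true", "yes"}


def main():
    if hooks_suspended():
        # A phase is executing: leave the working tree alone and exit cleanly.
        sys.exit(0)
    # The hook's normal work (for example, updating the cost ledger) goes here.


if __name__ == "__main__":
    main()
```

Exiting with status 0 keeps the suspended hook from being reported as a failure, so the phase run is not interrupted by its own tooling.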
None of those are theoretical. All three are queued.
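Of the three, the pre‑deploy validation hook is the most mechanical, so here is a minimal sketch of what its checks might look like. All function names are hypothetical. The directive sweep only locates @ts-expect-error comments in .ts files; deciding which ones are stale still needs the TypeScript compiler, which reports unused directives as errors.

```python
import json
import subprocess
from pathlib import Path


def check_vercel_json(root):
    """Return an error string if vercel.json is missing or invalid JSON, else None."""
    path = Path(root) / "vercel.json"
    if not path.is_file():
        return "vercel.json not found"
    try:
        json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return f"vercel.json is not valid JSON: {exc}"
    return None


def check_vercelignore(root):
    """Return an error string if .vercelignore is missing, else None."""
    path = Path(root) / ".vercelignore"
    return None if path.is_file() else ".vercelignore not found"


def check_git_email(root, expected):
    """Compare `git config user.email` in root against the expected address."""
    result = subprocess.run(
        ["git", "config", "user.email"],
        cwd=root, capture_output=True, text=True, check=False,
    )
    actual = result.stdout.strip()
    if actual != expected:
        return f"git user.email is {actual!r}, expected {expected!r}"
    return None


def find_ts_expect_error(root):
    """List (file, line number) for each @ts-expect-error directive in *.ts files.

    Note: this walks every directory under root, including dependencies,
    so a real hook would likely restrict it to the source tree.
    """
    hits = []
    for path in sorted(Path(root).rglob("*.ts")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), 1):
            if "@ts-expect-error" in line:
                hits.append((str(path), number))
    return hits


def validate(root, expected_email):
    """Run the file and git checks and return the list of error strings."""
    checks = [
        check_vercel_json(root),
        check_vercelignore(root),
        check_git_email(root, expected_email),
    ]
    return [error for error in checks if error is not None]
```

An empty list from `validate` means the deploy can proceed; anything else is a reason to stop before shipping.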
What the audit still cannot say.
The report measures how I work, not what the work compounds into. The cadence holds at 31 days; I do not yet know if it holds at 365. The discipline is mine, not transferable; I do not yet know if Frollie OS works without me in the chair. The strengths are internal; distribution remains the bottleneck and the report cannot grade it. (See Force × direction.) The next audit will be about whether I pointed the cadence in the right direction.
Where this leaves me.
192 sessions. 1,587 messages. 579 hours. 252 commits. Five project areas. Three strengths the data verifies. Three frictions it exposes. The next audit drops in another 31 days.
The honest line, for the diligence pack: I am a workflow conductor with a high‑rigor practice and three identifiable environmental drags. Most of what gets shipped gets shipped well. The remaining friction is mostly fixable in one quarter through non‑skippable review gates, a pre‑deploy validation hook, and an env‑flag suspend on the GSD hooks during phase execution. All three are queued.
The reason to run an audit like this — and to publish it rather than keep it private — is that the data composes a sharper picture than I can hold in my head. The next venture I take on, the next operator I hire, the next investor I sit with, will get a better version of the conversation because the source material is now legible.
/insights report covering 2026‑04‑08 → 2026‑05‑09