Cambridge, 2023 — when I decided to ignore AI.

I sat through the McKinsey keynote that introduced Lilli — the firm's first private LLM — and decided, in real time, to ignore it. Two years later: three Claude Code sessions, eight live systems, one agentic engineer running a snack business. This is the field note for the journey in between.

Lucas Zhu in Cambridge, July 2023, wearing the McKinsey lanyard reading 'Lucas Zhu — Private Equity & Principal Investors / Strategy & Corporate Finance, Sydney'. — July 2023, Cambridge — at the McKinsey engagement‑manager off‑site, eight days before the keynote that introduced Lilli. *Private equity, principal investors, strategy and corporate finance.*

The slide. Then the demo.

July 2023. I was in Cambridge at the McKinsey engagement‑manager off‑site, which is, in spirit, Coachella for McKinsey's engagement managers. We gather, we eat well, we listen to keynotes about the latest internal tools, and we go back to our practices smarter than we left. NVIDIA shares were trading at about US$42 (split‑adjusted); two years later they would trade at multiples of that, but nobody in the room knew which way the stock would move.

I was representing the Private Equity & Principal Investors and Strategy & Corporate Finance practices. The keynote that mattered, in retrospect, was on language modelling. The presenter walked us through how the architecture worked and why it was supposed to be transformative: probability distributions over sequences of words, a worked example comparing "he likes apples" with "apples likes he", and a slide that ended on the question, given the observed training text, how probable is this new utterance?
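To make the slide's question concrete, here is a minimal sketch of the simplest kind of language model, a bigram model: it scores a sentence as the product of P(word | previous word), estimated from counts in a training text. It is far simpler than the neural models the keynote was about, and the four‑sentence corpus, the add‑one smoothing and the function names are illustrative assumptions, not anything shown on the slide.

```python
from collections import Counter

# Toy training text (an assumption): stands in for "the observed training text".
CORPUS = [
    "he likes apples",
    "she likes apples",
    "he likes pears",
    "he licks stamps",
]
START, END = "<s>", "</s>"

def train_bigrams(sentences):
    """Count each context token and each (previous word, next word) pair."""
    contexts, pairs = Counter(), Counter()
    for sentence in sentences:
        tokens = [START] + sentence.split() + [END]
        contexts.update(tokens[:-1])                 # every token that has a successor
        pairs.update(zip(tokens[:-1], tokens[1:]))
    return contexts, pairs

# Words the model can predict: every corpus word plus the end-of-sentence marker.
VOCAB = {w for s in CORPUS for w in s.split()} | {END}
CONTEXTS, PAIRS = train_bigrams(CORPUS)

def sentence_prob(sentence, alpha=1.0):
    """P(sentence) = product of add-alpha smoothed P(word | previous word)."""
    tokens = [START] + sentence.split() + [END]
    p = 1.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        p *= (PAIRS[(prev, word)] + alpha) / (CONTEXTS[prev] + alpha * len(VOCAB))
    return p

for s in ["he likes apples", "he licks apples", "apples likes he"]:
    print(f"{s!r}: {sentence_prob(s):.6f}")
```

Because each smoothed conditional distribution sums to 1 over the vocabulary (end marker included), the scores over all word sequences also sum to 1, which is the constraint the slide writes as sum p(w) = 1. With this toy corpus, "he likes apples" outscores "he licks apples", and "apples likes he" scores lowest because it is built entirely from word pairs the model never saw.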

Photograph from the audience of the Cambridge keynote slide titled 'Language Modelling' — explaining that a language model assigns a probability to a sequence of words, with the formula sum p(w) = 1 and worked examples comparing 'he likes apples' versus 'apples likes he', and 'he likes apples' versus 'he licks apples'. — The slide on language modelling — the math was the math, and the room was full of people more comfortable in Excel than in probability notation.

I read it twice. To someone analytical and strategic but not technically trained — which described most of the engagement managers in that room, including me — the math was a different language. The honest, clear‑eyed thought that ran through my head was the one most of us would have admitted under enough wine:

"I don't understand a damn thing."

Then they ran the live demo of the first version of Lilli — the firm's privately trained model. It crashed.¹

So I decided to ignore it. Not consciously, not strategically — the way you ignore the weather forecast when you have no plans to go outside. I went back to my engagements. The model would mature; someone else would figure it out; my job was to keep my clients well served and my decks clean.

What I cared about.

If you'd asked me, in that room in July 2023, what mattered to my career (which tools I would have taken to a desert island), the honest answer was two things. Excel. And PowerPoint.

That was the consulting identity in 2023. The job was brute force: analytical horsepower, expensively produced and beautifully formatted. The job was not, at any point, "build the system that produces the analysis." The system was the firm. You were inside it.

The thing that changed.

What broke the consulting identity, in the end, was not AI. It was a daughter.

I left McKinsey because I wanted to spend more time with her, and because I wanted to build something I could point to. The plan, when I put it on paper, was two brands — Malo Studios in Jakarta, the wife‑and‑husband studio my partner Yiqing and I had set up to design and ship the consumer side, and Frollie, the snack line — both needing an operating backbone. One of me, building it.

The two brands. Malo Studios designs and ships; Frollie sells. Both run on the same agent‑built operating system.

I had to force myself to realise something I had been able to avoid for the seven years inside McKinsey: if I wanted to scale a brand — let alone two — I needed to learn how to build systems. Not theorise about systems on a whiteboard. Build them. The good news, somewhere in the back of my head, was that I am an Information Systems graduate from UNSW. I knew, abstractly, what systems were. The bad news was I had not opened a code editor in nearly a decade.

Christmas Day, 2025.

Here is the honest baseline most operators don't write down.

On the 25th of December 2025, my approach to building software was: open ChatGPT in a browser tab, ask it to write a small piece of business logic, copy whatever it produced, paste it into something the model had told me was Python, and try to run it. When it failed — and it usually failed — I would copy the error back into the chat, paste the next version, and repeat.

Screenshot of a chatbot prompt interface: 'What's on the agenda today?' with the text input reading 'Make me an ERP system for my FMCG business in Indonesia. Make no mistakes.' — December 2025, the actual prompt. *"Make me an ERP system for my FMCG business in Indonesia. Make no mistakes."* No further specification. Predictably, mistakes were made.

That was the level. December 2025. Three months ago, in real time. I am writing this down because the gap between that and what came next is the entire argument of this piece, and the reader who has been there will recognise the texture of it: the small wins, the longer dead‑ends, the constant feeling that the model knew something you did not.

The Jakarta terminal.

I switched to Claude Code in early January 2026 after a week of trying to keep ChatGPT‑pasted Python alive in production — that didn't go well, and the switch was the single most important decision in this story.

April 2026, four months later, I am sitting in front of a terminal in Jakarta with three Claude Code sessions running in parallel. On the left pane is what I think of as the architecture conversation — I am discussing, with what I have started calling my agent CTO, what we are actually building this morning. Not how. What.

What that is, mechanically, is a long‑lived Claude Code session anchored to a curated CLAUDE.md and a sub‑agent persona file I have iterated on for months. The agent CTO has the architectural shape of Frollie's repo loaded as context, my taste preferences spelled out as guardrails, and the failure modes from prior sessions recorded so it does not make the same mistake twice. It is not a separate model. It is the same model with calibrated priors and a memory of what good and bad have looked like in this codebase.
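Claude Code reads a project's CLAUDE.md into context at the start of a session, so one low‑tech way to keep failure modes "recorded so it does not make the same mistake twice" is to append each lesson to that file. The sketch below assumes a `## Failure modes` heading and a dated‑bullet format; both, and the `record_failure_mode` helper itself, are hypothetical, since the essay does not describe how its CLAUDE.md is laid out.

```python
from datetime import date
from pathlib import Path
from typing import Optional

# Hypothetical section heading; the essay does not show its CLAUDE.md layout.
HEADING = "## Failure modes"

def record_failure_mode(claude_md: Path, lesson: str, when: Optional[date] = None) -> None:
    """Append a dated lesson to the failure-modes section, creating the section if absent."""
    entry = f"- {(when or date.today()).isoformat()}: {lesson.strip()}"
    lines = claude_md.read_text(encoding="utf-8").splitlines() if claude_md.exists() else []

    if HEADING not in lines:
        # New section at the end of the file, separated by a blank line.
        if lines and lines[-1].strip():
            lines.append("")
        lines += [HEADING, "", entry]
    else:
        start = lines.index(HEADING)
        # The section ends at the next level-2 heading, or at end of file.
        end = next((i for i in range(start + 1, len(lines)) if lines[i].startswith("## ")), len(lines))
        # Step back over trailing blank lines so the entry joins the existing list.
        insert_at = end
        while insert_at > start + 1 and not lines[insert_at - 1].strip():
            insert_at -= 1
        lines.insert(insert_at, entry)

    claude_md.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

The design choice is deliberately boring: plain Markdown, versionable alongside the code, readable by both the agent and a human reviewer.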

Top right is a debug session: a kitchen function had an issue overnight and I want it cleared before lunch. Bottom right is a planning session that has just finished, with a saved plan ready to execute later in the week. A status line at the bottom of the terminal reads the model's working state directly, so I always know exactly where each agent is in its loop.

Claude Code terminal with three live sessions in parallel — an architecture discussion with the agent CTO on the left, a debug session top right, and a planning session bottom right; a status line at the bottom reads the model's current state. — The April 2026 terminal — three Claude Code sessions, an agent CTO, no IDE in sight.

The thing I want to be honest about is that I am not, in any meaningful sense, writing the code anymore.

"I'm not really writing any code."

What I am doing is expressing my thoughts — clearly, repeatedly, with enough specificity that the model can take them and produce something that survives in production. The work is the spec, the review, and the failure modes. The work is calibration: knowing what good looks like, knowing what bad looks like, and giving the agent enough of both to keep it honest.

What Frollie actually is.

I have a snack business. Right now we make Dubai cookies, corn pops and a few other things. The first thing I needed was a way to take orders, so I started there: a single agent‑built ordering system, end to end, before lunch on day one. Then I needed to know how the kitchen worked, so I built the kitchen module. Then packaging. Then dispatch. Then inventory. Then financials. Then accounting. Then the configuration layer, so we could run new products without writing more code. One module at a time, integrated as I went.
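The configuration‑layer idea, that a new product should be data rather than code, can be sketched as a recipe table that the kitchen logic reads. The SKUs, ingredients and gram weights below are hypothetical placeholders, and this is one plausible shape, not Frollie Pro's actual schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Component:
    ingredient: str
    grams_per_unit: float

# Hypothetical product configuration. Launching a new SKU means adding an entry
# here (or a row in whatever table an admin screen edits), not new kitchen code.
PRODUCTS = {
    "dubai-cookie": [Component("flour", 40), Component("pistachio-cream", 25), Component("chocolate", 20)],
    "corn-pops": [Component("corn", 30), Component("caramel", 12)],
}

def kitchen_requirements(order_lines: dict) -> dict:
    """Total grams of each ingredient needed to fulfil {sku: units} order lines."""
    totals: dict = {}
    for sku, units in order_lines.items():
        for component in PRODUCTS[sku]:
            totals[component.ingredient] = totals.get(component.ingredient, 0) + component.grams_per_unit * units
    return totals

# A new product is configuration only; kitchen_requirements is unchanged.
PRODUCTS["matcha-cookie"] = [Component("flour", 40), Component("matcha", 5), Component("chocolate", 15)]

print(kitchen_requirements({"dubai-cookie": 10, "corn-pops": 5, "matcha-cookie": 4}))
```

In a production system the table would more likely live in a database that admin screens edit; a dict keeps the sketch self‑contained.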

Frollie Pro internal dashboard — header reads 'Good morning, Lucas'. Eight module cards visible: Operations (orders, kitchen, packaging), Inventory & Supply (locations, ingredients, bulk prices), Sales & Distribution (revenue analytics, GoFood Depot, GrabFood, sales analytics, K3Mart cockpit), Financials (income statement, expenses, marketing performance, reimbursements, payroll), Accounting (journal entry, chart of accounts, bank reconciliation, asset register), Configuration (production components, customers, WhatsApp messaging), Admin (menu products, vouchers), and Help & Training (modules, expense register guides). — Frollie Pro, May 2026. Eight modules. Thirty‑two features. *Good morning, Lucas.*

The economics of that build, plainly: roughly US$700–800 per month in agent spend through the first sixty days (Claude Code subscription plus API). Across three brands and six channels (direct sales via DMs; food delivery through GoFood, GrabFood and ShopeeFood; TikTok Shop; Shopee; cafe distribution; and K3Mart), Frollie Pro now runs the daily reps without a technical hire. Multi‑channel margin reconciliation, MOQ planning and ingredient‑cost roll‑ups are in the system because they had to be from week one. Returns from modern trade and the drag of BPOM and halal documentation have not yet been pressure‑tested at our current scale. When those harder tests come, the bet is that the same agent topology absorbs them, but I will not claim that has been proven yet.
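Two of the calculations named above are simple once the data is structured: an ingredient‑cost roll‑up (grams times price per gram, plus packaging) and a per‑channel margin check that compares what each channel should pay out after commission with what it actually settled. This is a minimal sketch under stated assumptions: every price, commission rate and order is a hypothetical placeholder, and commission is modelled as a flat percentage of gross, which real platform fee schedules may not be.

```python
from dataclasses import dataclass

# Every number below is a hypothetical placeholder, not Frollie's real costs or fees.
RECIPE_GRAMS = {"flour": 40, "pistachio-cream": 25, "chocolate": 20}          # grams per cookie
PRICE_IDR_PER_GRAM = {"flour": 15, "pistachio-cream": 250, "chocolate": 180}
PACKAGING_IDR = 2_000                                                          # per cookie
COMMISSION_BPS = {"direct": 0, "food-delivery": 2_000, "marketplace": 1_000}  # 100 bps = 1%

def unit_cost(recipe: dict, prices: dict, packaging: int) -> int:
    """Ingredient-cost roll-up: grams x price per gram for each ingredient, plus packaging."""
    return sum(grams * prices[ingredient] for ingredient, grams in recipe.items()) + packaging

@dataclass(frozen=True)
class Order:
    order_id: str
    channel: str
    units: int
    unit_price: int  # whole rupiah

def expected_payout(order: Order) -> int:
    """Gross sales minus the channel's commission (flat percentage assumed)."""
    gross = order.units * order.unit_price
    return gross - gross * COMMISSION_BPS[order.channel] // 10_000

def margin_by_channel(orders: list, cost_per_unit: int) -> dict:
    """Contribution margin per channel: expected payout minus cost of goods sold."""
    margins: dict = {}
    for order in orders:
        margin = expected_payout(order) - order.units * cost_per_unit
        margins[order.channel] = margins.get(order.channel, 0) + margin
    return margins

def reconcile(orders: list, settled: dict) -> dict:
    """Map order_id -> (settled - expected) for every order whose payout does not match."""
    gaps = {}
    for order in orders:
        diff = settled.get(order.order_id, 0) - expected_payout(order)  # missing settlement counts as 0
        if diff != 0:
            gaps[order.order_id] = diff
    return gaps

ORDERS = [
    Order("A1", "direct", 10, 35_000),
    Order("G7", "food-delivery", 4, 35_000),
    Order("S3", "marketplace", 6, 35_000),
]
SETTLED = {"A1": 350_000, "G7": 110_000, "S3": 189_000}  # as reported by each channel

cost = unit_cost(RECIPE_GRAMS, PRICE_IDR_PER_GRAM, PACKAGING_IDR)
print("unit cost (IDR):", cost)
print("margin by channel:", margin_by_channel(ORDERS, cost))
print("payout gaps:", reconcile(ORDERS, SETTLED))
```

With these placeholder numbers, order G7 settles 2,000 rupiah short of expectation, which is the kind of line a reconciliation screen would surface for a human to chase.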

The agent stack is not a single point of failure either. Every workflow lives in markdown, every spec is versioned, and a competent operator with the runbook can keep Frollie running for a fortnight without me. I will not call that battle‑tested until it has been, but the architecture is honest about it.

Sixty days of build. Eight systems live: Operations, Inventory & Supply, Sales & Distribution, Financials, Accounting, Configuration, Admin, and Help & Training. Thirty‑two features across them. One agentic engineer on review. No engineering hire. If I could take that screenshot back to the Lucas sitting in Cambridge in July 2023, he would not believe it was possible, and, to be fair, in 2023 it was not. A single person could not have built that system in sixty days even if they had been the best developer at Google, because the toolchain to do it did not yet exist.

What I had instead, two years on, was the only thing that turned out to actually matter: an explicit idea of how the business needed to run, and a practiced way of expressing that to a model so it could be built. The 2023 me was wrong about which half of the work was the moat — and the toolchain to find out had finally arrived.

Where this transfers.

That is the part that translates. It transfers from the consulting bench to the operator's chair, from one product line to another, from FMCG to anything where someone with judgement has to ship under constraint.

What it does not do is replace a Head of Operations who has lived a co‑packer dispute, or a CFO who has watched Tokopedia change commission structures overnight, or any of the relationships those roles are made of. The agent stack accelerates the spec and the review; it does not earn the trust that gets you the bad news at 11pm on a Sunday.

If you are still sitting in the room I was in nearly three years ago, the move is small and unglamorous. Pick one workflow you do every week: the one you would automate if you had a competent intern. Write it as a spec in a markdown file. Hand the spec to Claude Code and review what comes back. Repeat. The first three rounds will be embarrassing. By round ten the model will be doing the work.
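Before handing a spec to the agent, a quick completeness check catches the most common gap, a spec with no definition of done. The section names below are a hypothetical template (the essay does not prescribe one), chosen to mirror its "spec, review, failure modes" framing; the `spec_gaps` helper only looks for level‑two Markdown headings with some text under them, and the weekly stock‑count spec is an invented example.

```python
import re

# Hypothetical template; the essay does not prescribe section names.
REQUIRED_SECTIONS = ("Goal", "Inputs", "Steps", "Done when", "Failure modes")

def spec_gaps(markdown: str) -> list:
    """Return required '## ' sections that are missing or have no body text."""
    bodies: dict = {}
    current = None
    for line in markdown.splitlines():
        heading = re.match(r"^##\s+(.+?)\s*$", line)  # level-2 headings only
        if heading:
            current = heading.group(1)
            bodies[current] = ""
        elif current is not None:
            bodies[current] += line.strip()
    return [name for name in REQUIRED_SECTIONS if not bodies.get(name)]

EXAMPLE_SPEC = """# Weekly stock count

## Goal
Count finished goods every Monday and flag SKUs below reorder level.

## Steps
1. Pull last week's dispatch totals.
2. Compare against the physical count.

## Done when

## Failure modes
- Counting goods already packed for dispatch twice.
"""

print(spec_gaps(EXAMPLE_SPEC))
```

Here the check flags the missing Inputs section and the empty Done when section, which are exactly the parts an agent would otherwise have to guess.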

¹ Lilli would, two years later, make the news for being hacked by another AI agent. The model the firm built privately got compromised by a model the rest of the world built openly. The pace of irony in this industry has held steady.

— Lucas, May 2026 · adapted from the UNSW BIS guest lecture, April 2026 edition