There was a month earlier this year when I caught myself just copy-pasting client bug reports into Claude Code.
The agent was running with --dangerously-skip-permissions (it doesn't ask for confirmation; it just runs). I'd paste in what the client sent over, walk away, come back ten minutes later, and check whether the changes made sense. If they did, I shipped. If they didn't, I'd paste the error back in and let Claude figure it out.
What was I actually adding? I wasn't writing code. I wasn't making architectural calls. I was a copy-paste bridge between a client's Slack message and a capable-enough agent doing the work. And I was charging for it.
Something had shifted. I just hadn't named it yet.
Clients like you
The copy-paste realization wasn't the start of the existential crisis. Just one wave of it. The crisis had been simmering for a while, on and off. Some weeks my wife could read it on me across the room. Some weeks I'd be ranting at strangers at kids' birthday parties. The question underneath was always the same: what am I actually adding here? And underneath that one, the heavier version: what work will my kids do?
What made it boil over was the inbound.
We started getting messages from clients who'd already built the thing.
Not "I have an idea." Not "Here is a broken Lovable app." Actual working applications. With real users. Stripe integrations. Production-ish. Internal tools, customer-facing SaaS, both. Usually running somewhere on Vercel or Replit. The founders hadn't written any of the code themselves. They'd prompted it into existence with Claude Code, Replit, Lovable, etc. Three years ago, people came to us because they couldn't build the thing. Now they came because they already had, and they couldn't tell what they were holding. One was a COO at a large grant-services company. Another was a lawyer building a legal SaaS for other lawyers. Another was an operations director at a medical practice with dozens of providers.
On a call with one of them, the rant slipped out: "This is kind of an existential crisis, right? The AI is getting better. What's the role of developers?"
Then I heard myself say the quiet part out loud: "We're encountering a lot of clients like you. People who took the project 90% of the way themselves."
Three or four more calls in the same shape over the following weeks drove it home. This was the new top of our funnel. If our value was "we can build it for you," our value was about to be zero.
Under the hood
What pulled me out of the spiral was actually looking at the apps.
They looked great on live demos. Some even had a handful of real beta users. Then we'd ask about the code architecture, or the deployment pipeline, or logging, backups, security. Blank stares. Or we'd go look ourselves and find horrors.
The clearest framing I've heard for this came from my account lead. He used it with a client last week, and I've been stealing it ever since:
A team of shipbuilders starts by designing the hull, the electrical, the plumbing, and the security system as one unified set of blueprints. Every switch connects to the right circuit, every pipe lines up with the next room, every door has its own key. What happens with these apps instead is a passenger doing the directing. The passenger walks the AI room to room: "this room needs lights, this one needs a toilet, oh and we probably need locks." The AI builds each room to match the description, wiring lights into whatever circuit is closest, running pipes through whatever wall is nearby, copying the same key for every door.
Everything that was asked for got built. Every room is legitimately in the ship. But the ship is not a ship. It's a set of rooms floating next to each other. The moment the ship hits water, everything floods.
I keep coming back to this metaphor whenever the existential crisis creeps back (usually after a new round of AI hype). Yes, AI will keep getting better at building what you ask for. What doesn't improve on the AI's side is knowing what you should have asked for. For a passenger-prompted app to actually hold up, the AI would have to upskill the prompter into a shipbuilder. An explanation can only be simplified so far before it stops being accurate enough to decide from. The ceiling isn't the AI. It's the passenger.
The one who stopped, and the one who didn't
Two cases from this month make the pattern visible.
Client one. A COO at a medium-size grant-services firm. Her CEO asked her to eliminate the team's busywork. She opened Claude Code and, over sixty days of evenings and weekends, built them an internal client-management and workflow-tracking app. Vercel, Supabase, Microsoft auth, document upload, email integration, the works. She showed it to her coworkers and, in her words, they "were picking their jaws off the floor."
Then she stopped. She'd set up a QC step "kind of loose on purpose" because she knew her coworkers wouldn't use anything rigid. But she'd hit the ceiling of what she could evaluate herself, and she knew it. "I don't know what I don't know," she told me in the first meeting. "So maybe there's something I just don't know to be worried about."
On a first-pass review, her codebase actually looked pretty good. Claude Code's default output is usually fine at a surface level. The deeper pass was a different story. The build script ran prisma db push --accept-data-loss on every deploy, meaning any schema change could silently drop production data. She'd been developing locally against a Supabase database while production actually ran on a completely different Neon instance. She didn't realize the two weren't connected, and Claude Code hadn't flagged it either. Fourteen API routes were completely unauthenticated, including admin endpoints exposing contact PII.
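To make two of those concrete (a sketch of the shape of the problem, not her actual code): prisma db push --accept-data-loss force-syncs the live database to whatever the current schema says, dropping anything that no longer fits, whereas a production pipeline should only apply migrations a human has reviewed. And an unauthenticated route is usually a one-guard fix. Assuming a Next.js App Router stack like hers, with getSession standing in for whatever auth layer the app actually uses:

```typescript
// Deploy script: the dangerous version vs. the boring one.
//   before: "build": "prisma db push --accept-data-loss && next build"
//   after:  "build": "prisma migrate deploy && next build"
// `migrate deploy` only applies migration files you've already reviewed;
// it never improvises destructive schema changes against live data.

import { NextRequest, NextResponse } from "next/server";
import { getSession } from "@/lib/auth"; // hypothetical auth helper
import { prisma } from "@/lib/prisma"; // hypothetical client setup

// The gap: admin routes like this one returned contact PII to anyone
// who knew the URL. The fix is a guard before the first query runs.
export async function GET(req: NextRequest) {
  const session = await getSession(req);
  if (!session || session.user.role !== "admin") {
    return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
  }
  const contacts = await prisma.contact.findMany({
    select: { id: true, name: true, email: true },
  });
  return NextResponse.json(contacts);
}
```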
If she'd pushed on without the audit, the most likely outcome was some combination of silent data loss on a deploy, a security breach via the open admin routes, or a codebase too entangled to recover.
Client two, the contrast. A lawyer building a legal SaaS (case management, trust accounting, document review, billing, compliance, all of it) as a commercial B2B product to sell to other law firms. Bootstrapped from personal savings. Built solo on Replit using AI. By the time we got on the discovery call, she'd done eight demos, had twelve more firms queued, was pursuing an angel check from one of the demo prospects, and had a target launch date four weeks out.
She had also, the week before, signed a contract with a SOC 2 compliance firm. SOC 2 Type 2 is the enterprise-grade security attestation. It requires demonstrating six to twelve months of operational controls.
My team found over a hundred issues in the first six hours of audit. The backend was a single seventeen-thousand-line file. There was no tenant isolation, which for a multi-tenant legal app means any lawyer at any firm could (hypothetically) pull any other firm's privileged client documents by URL. Payment tokens were generated with Math.random(). The production Adobe PDF API key was hardcoded into frontend code shipped to every user's browser. IOLTA trust-account balances (state-bar regulated, where an incorrect balance can trigger disbarment) were read and written without a database lock. The SOC 2 contract was a week old. Her actual state was a product with zero working access controls.
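Two of those findings are worth spelling out, because the fixes are small and the failure modes are not. Math.random() is predictable, so anything security-sensitive has to come from a cryptographic source. And a balance that's read, modified, and written back without a transaction will eventually be wrong under concurrent use. A minimal sketch, assuming a Node backend with Prisma (not necessarily what her app ran on; the table and field names are hypothetical):

```typescript
import { randomBytes } from "node:crypto";
import { prisma } from "./prisma"; // hypothetical client setup

// Payment tokens: Math.random() is seedable and guessable; an attacker
// who sees a few tokens can predict the rest. crypto gives real entropy.
const badToken = Math.random().toString(36).slice(2); // the shape of what we found
const goodToken = randomBytes(32).toString("hex");    // what it should be

// Trust-account balances: read-modify-write without a lock means two
// concurrent requests can both read $10,000, both subtract $6,000, and
// both write back $4,000. A serialized transaction rules that out.
async function debitTrustAccount(accountId: string, amountCents: number) {
  return prisma.$transaction(
    async (tx) => {
      const account = await tx.trustAccount.findUniqueOrThrow({
        where: { id: accountId },
      });
      if (account.balanceCents < amountCents) {
        throw new Error("Insufficient trust balance");
      }
      return tx.trustAccount.update({
        where: { id: accountId },
        data: { balanceCents: { decrement: amountCents } },
      });
    },
    { isolationLevel: "Serializable" }
  );
}

// Tenant isolation is the same principle one level up: every query a
// firm's user triggers must carry that firm's ID in its where clause,
// so no URL can ever address another firm's documents.
```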
On the call, she said something I suspect every vibe coder comes to eventually. The later they get there, the more expensive it is:
"I think I was, like, overly optimistic with, because, you can build so fast with AI, it's like, I'll just do this feature next, I'll do this feature next. And then, [coding agents] are so encouraging. They're just, yes, I did it, it's perfect."
She isn't reckless. Reckless would require her to have known what she was missing. She's a real lawyer with real domain expertise in an industry that genuinely needs better tooling. She saw a gap, used the tools available, built something impressive. The tools didn't tell her when to stop. The AI told her, every feature along the way, "yes, I did it, it's perfect." And she believed it, because she had no other reference point.
Two founders. Same tools. Different outcomes. The difference wasn't capability. It was knowing where the boundary was. The first one knew because she'd spent a career running a 22-person company and had developed the instinct that when something is load-bearing, you call someone who knows what load-bearing means.
What this means for the agency
I'm rewriting our landing page this month. The pitch used to be about building software. Now it's about owning whether the thing actually works in the real business it's deployed into. Whether it holds under load. Whether it survives a regulator. Whether it stays in sync with a workflow that changes every quarter.
I'm still figuring this part out. I don't have a neat playbook for what a sustainable AI consultancy looks like in 2028; nobody does. But I know what we're investing in: getting deep enough into specific industries (legal, healthcare, professional services) that we accumulate the business context an LLM can't.
And I know what we're not doing anymore. The month I described at the top, where I was a copy-paste bridge and charging for it: that was brief, maybe six weeks. Most agencies are still inside that window and not saying so out loud. The honest version is that a lot of "developer hours" are AI hours with a human copy-pasting, and the invoice doesn't distinguish. Going forward, clients pay for human time and judgment. AI compute is overhead. If a problem can be solved by handing it to Claude unsupervised, we shouldn't be charging for it, and we're going to stop pretending otherwise.
Hiring is a mess right now, incidentally. Every application I get is run through an LLM. The cover letters are identical. The resumes are optimized for ATS scrapers that are also LLMs. The signal I used to read from "can this person write, can they think clearly under a word budget, can they communicate" has been flattened. This is its own essay.
The honest part
I'm firing my accountant this year. My taxes are straightforward. An LLM can do them.
But here's what I'm actually giving up. If the IRS sends me a letter three years from now, my accountant could have said: I flagged that, here's the email where I warned you. He carries professional liability. The AI doesn't. Every line of AI-generated advice is legally mine, and the vendor disclaims everything. By firing my accountant, I'm not replacing him. I'm transferring his liability onto myself in exchange for the cost savings. That's the real trade. I'm taking his risk.
The only reason I can make this trade is that I'm probably in the 0.1% of people fluent enough with AI to supervise it meaningfully. Most people aren't. Most people, told they can fire their accountant, will accept the liability transfer without noticing it happened, and three years later will get a letter they can't answer.
I have two young kids. I think about what they're going to do for work a lot. I don't have a confident answer, and I don't trust anyone who does. But I notice that everyone I talk to about the displacement question (the executives replacing employees with AI, the consultants pitching the replacement, the parents of teenagers) has the same reflex. Some version of "what do you want me to do about it." It gets passed on, compartmentalized. Nobody owns the question.
That's the part that actually worries me. Not AGI. The question nobody is willing to own.
The bet
So here's the bet I'm making with the agency.
Code will be effectively free. But the ceiling I described will hold for a long time. The limitation isn't the AI, it's the passenger. The work that matters stays stubbornly human: understanding a business deeply enough over time to be trusted with its outcomes, owning the consequences when something goes wrong, knowing where the load-bearing walls are. Not because humans are magical, but because knowing what to ask for is still the whole game, and AI hasn't changed that yet.
That's a narrower strip of ground than agencies used to stand on. "We write good code" is gone. What's left is knowing which questions to ask before the first line gets written.
The COO at the grant-services firm understood this, I think, in the second meeting. She said, "I'm willing to be your guinea pig, because I know that this is new." She was telling me she knew we were both figuring out the rules in real time, and she was okay with that, because the alternative, her and Claude alone, had a ceiling that scared her.
The lawyer building the legal SaaS would probably have learned it eventually. Or she'd keep racing the avalanche until one of her twelve queued law firms discovered the cross-firm document leak, and she'd learn it the expensive way.
AI writes the code. We own the outcome. I don't know if that's enough. But it's a better bet than billing for Claude time and hoping nobody notices.
