From typing commands to orchestrating agents
Notes from rebuilding my engineering workflow around LLMs.
Six months ago I typed every command. CLI for the orchestrator. API queries for the analysis platform. Click-throughs in the controller GUI. Same questions, same answers, same fingers.
Now I describe what I want and the agent figures out which of the forty-something tools to call. Last week I asked: “if this east-west fiber goes down, which other links lose redundancy?” Half a minute later I had the diff, the SRLG overlap, and a draft of the remediation. I never opened a tool.
This is what Andrej Karpathy is pointing at when he calls our era Software 3.0. His thesis, in one paragraph: Software 1.0 was hand-written code. Software 2.0 was neural-net weights you trained. Software 3.0 is when the LLM itself becomes the computer and you program it in English, with prompts, context, and tools. He puts it bluntly: “prompts are programs.” The compiler is the model. The runtime is the agent.
When I first heard him say it, I nodded the way you nod at slideware. Then I tried it on my actual work and the floor moved.
What I actually did
The unlock was not the LLM. It was wrapping my existing engineering tools so an agent could use them.
I have a network analysis platform that answers “what would happen if”. An orchestration system that pushes config to live devices. A controller that knows the realtime state of the network. Each of them has had a perfectly serviceable API for a decade. None of them was written for an LLM to call. So I wrapped them — exposed the verbs as tools, gave each tool a clear schema and a one-line description, and let the agent decide when to use which.
The pattern has a name now: MCP, the Model Context Protocol, a tool surface designed for agents, not for humans. The work was not glamorous. Read the API docs. Define the inputs. Write the wrapper. Repeat for every verb that mattered. The reward is that I now have a few dozen tools across three platforms that a model can compose on demand.
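To make "a clear schema and a one-line description" concrete, here is a minimal sketch of one wrapped verb. It uses only the Python standard library and is not the MCP SDK or the actual wrapper code; `Tool`, `simulate_link_failure`, `TOOLS` and `call_tool` are hypothetical names, and the handler is a stand-in for a real API call.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Tool:
    """One verb of an existing API, described so an agent can choose it."""
    name: str
    description: str      # the one-line description the agent reads
    input_schema: dict    # JSON-Schema-style description of the inputs
    handler: Callable[..., Any]


def simulate_link_failure(link_id: str) -> dict:
    # Hypothetical stand-in for a call to the analysis platform's API.
    return {"failed_link": link_id, "status": "simulated"}


TOOLS = {
    tool.name: tool
    for tool in [
        Tool(
            name="simulate_link_failure",
            description="Simulate the failure of one link and report the impact.",
            input_schema={
                "type": "object",
                "properties": {"link_id": {"type": "string"}},
                "required": ["link_id"],
            },
            handler=simulate_link_failure,
        ),
    ]
}


def call_tool(name: str, arguments: dict) -> Any:
    """Dispatch a tool call the agent asked for, rejecting malformed input."""
    tool = TOOLS[name]
    missing = [k for k in tool.input_schema.get("required", []) if k not in arguments]
    if missing:
        raise ValueError(f"{name}: missing required arguments {missing}")
    unknown = [k for k in arguments if k not in tool.input_schema["properties"]]
    if unknown:
        raise ValueError(f"{name}: unknown arguments {unknown}")
    return tool.handler(**arguments)
```

The point of the shape is that the agent only ever sees the name, the description and the schema. Everything else stays ordinary code behind `handler`.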
The IDE on top is Cursor. The agent inside it reads my code, my notes, my git history, the topology, the operational state, and then decides. Sometimes wrongly. More on that further down.
The shift, in one sentence
Programming used to be: I know what I want, I know how to express it, I type it.
Now it is: I know what I want, I describe it once, the agent drafts it, I read what the agent did, I correct, repeat.
The bottleneck moved. It used to be syntax, library knowledge, sheer typing. Now it is the clarity of my intent and the discipline of my verification. Karpathy calls this the generator-verifier loop. The generator is cheap and fluent and confidently wrong. The verifier — me — has to stay sharp.
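Stripped of the human, the loop can be sketched in a few lines. This is an illustration under assumptions, not a description of any particular agent: `generate` and `verify` are hypothetical callables, and in my workflow the verifier is a person reading the output rather than a function.

```python
from typing import Callable, Optional


def generate_verify_loop(
    intent: str,
    generate: Callable[[str, Optional[str]], str],
    verify: Callable[[str], Optional[str]],
    max_rounds: int = 3,
) -> str:
    """Ask for a draft, check it, feed the objection back, repeat.

    `generate(intent, feedback)` returns a draft. `verify(draft)` returns
    None when the draft is acceptable, otherwise a description of the problem.
    """
    feedback = None
    for _ in range(max_rounds):
        draft = generate(intent, feedback)
        feedback = verify(draft)
        if feedback is None:
            return draft
    raise RuntimeError(f"no acceptable draft after {max_rounds} rounds: {feedback}")


# Toy stand-ins: a generator that is only right once corrected,
# and a verifier that checks one concrete property of the draft.
def toy_generate(intent: str, feedback: Optional[str]) -> str:
    return "mtu 9000" if feedback else "mtu 1500"


def toy_verify(draft: str) -> Optional[str]:
    return None if draft == "mtu 9000" else "expected jumbo frames (mtu 9000)"


result = generate_verify_loop("set jumbo frames on the uplink", toy_generate, toy_verify)
```

The `max_rounds` cap matters more than it looks: a fluent generator will happily keep producing drafts, so the loop needs a point where it gives up and hands the problem back.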
What I missed at first
I missed how much of the value is in the small, boring layer between the LLM and the existing software. Not the model. Not the prompt. The shape of the interface you give the agent.
I missed that documentation now has two audiences. Humans skim. Agents ingest. A skill written in markdown, with explicit gating questions and an unambiguous workflow, is a far better artifact than a wiki page that “explains” the same thing in prose. I now write skills the way I used to write internal wikis.
I missed that the agent will not invent context. If the topology, the design rules, and the customer’s bias toward latency over cost are not in the workspace, none of it shows up in the answer. So I started parking context in the repo: rules, skills, captured gotchas, voice profiles. The agent stops asking dumb questions once the answers are sitting next to the code.
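As an illustration of "context in the repo", here is a small sketch that gathers such notes into one block of text. The folder names (`rules`, `skills`, `gotchas`) and the function `collect_context` are assumptions made for the example; agent IDEs typically do this loading themselves, so treat it as a picture of the idea rather than something you need to run.

```python
from pathlib import Path


def collect_context(repo_root: Path, folders=("rules", "skills", "gotchas")) -> str:
    """Concatenate the markdown notes kept next to the code into one context block."""
    sections = []
    for folder in folders:
        directory = repo_root / folder
        if not directory.is_dir():
            continue  # a missing folder simply contributes nothing
        for path in sorted(directory.glob("*.md")):
            rel = path.relative_to(repo_root).as_posix()
            body = path.read_text(encoding="utf-8").strip()
            sections.append(f"## {rel}\n{body}")
    return "\n\n".join(sections)
```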
I also missed how much fun this is. The friction of “I want to do X, but the tooling makes it tedious” has collapsed for whole categories of work. Writing this post (drafting, translating to German, generating an image, publishing to my own site, verifying it’s live in the RSS feed) was one paragraph of instruction.
What is still hard
Confident wrongness. The model will produce a plausible answer about a network it has never seen, and the answer will be wrong in a way you only catch if you already knew the right answer. Verification is non-negotiable. Tests, sanity-checks, “show me the actual output”, read the trace by hand. The same way you would supervise a smart but new engineer.
Designing for agents. The interface you write for a human (icons, wizards, modal dialogs) is hostile terrain for an agent. You end up rewriting the affordances — tools instead of buttons, schemas instead of forms, structured logs instead of toasts.
Knowing when not to use the model. Software 1.0 has not gone away. The agent is wonderful at orchestration and drafts; it is terrible at things that need bit-exact correctness without a verifier. The skill is knowing which layer the problem belongs in.
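One way to keep the verifier honest is to recompute the checkable part of an answer in plain Software 1.0 code. The sketch below does that for the fiber question from the introduction, on a toy model. The model is an assumption: links carry a set of SRLG ids, links are protected in pairs, and a link "loses redundancy" when its partner fails while it survives. The link names and function names are hypothetical.

```python
def links_losing_redundancy(srlgs_by_link, protection_pairs, failed_srlg):
    """Recompute, without the model, which surviving links lose their backup."""
    failed = {link for link, groups in srlgs_by_link.items() if failed_srlg in groups}
    exposed = set()
    for a, b in protection_pairs:
        if (a in failed) != (b in failed):  # exactly one side of the pair is down
            exposed.add(b if a in failed else a)
    return sorted(exposed)


def check_agent_claim(claimed, srlgs_by_link, protection_pairs, failed_srlg):
    """Return (missed, invented) relative to the deterministic answer."""
    expected = set(links_losing_redundancy(srlgs_by_link, protection_pairs, failed_srlg))
    claimed = set(claimed)
    return sorted(expected - claimed), sorted(claimed - expected)


# Toy topology: L1 and L3 both ride the east-west fiber.
srlgs_by_link = {
    "L1": {"fiber-ew"},
    "L2": {"fiber-ns"},
    "L3": {"fiber-ew", "duct-7"},
    "L4": {"duct-7"},
    "L5": {"fiber-ns"},
}
protection_pairs = [("L1", "L2"), ("L3", "L4"), ("L2", "L5")]

# Suppose the agent claimed L2 and L5; the check shows what it missed and invented.
missed, invented = check_agent_claim(["L2", "L5"], srlgs_by_link, protection_pairs, "fiber-ew")
```

On this toy data the check reports `L4` as missed and `L5` as invented. The agent can still draft the remediation; the set arithmetic is the part that should not be left to a fluent guess.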
Where I’m going next
More tools, exposed as MCP. More skills, captured from things I find myself explaining twice. Tighter generator-verifier loops, including agent-on-agent evaluation for cases where my own judgement is the bottleneck.
The cliché says programming is now limited only by your imagination. That’s almost right. It is limited by your ability to specify what you imagine — clearly enough that an agent can execute it, and precisely enough that you can tell when it didn’t.
Which, it turns out, is just engineering.