This Week in Learning - Agentic Coding Tools

I wrapped up two big initiatives this week—moving a set of services from cloud platforms to Linux VPSs, and finally launching this blog. That freed up some mental space to start experimenting with agentic, command-line coding tools.

I spent time with two of them:

  • OpenAI’s Codex
  • Anthropic’s Claude Code

These tools live in a very different place than ChatGPT in a browser or Copilot-style features embedded directly into an IDE (the programmer's equivalent of Word, where we write and organize code). I’ve generally avoided IDE-embedded tools because the workflow never quite fit me.

CLI tools are used through a terminal window on your computer, usually from inside the project’s folder. They can inspect the entire project and perform most of the same actions you can from the command line—reading files, running builds, and modifying code.

This post is a snapshot of what I’ve seen so far: what impressed me, where friction showed up, and the questions I’m now carrying forward.

(Note: This post was generated by AI based on my notes, then edited by me for accuracy and clarity.)

A quick overview: what these tools actually are

Most people have seen AI through a web interface: you type a prompt, you get a response. That works well for explanations, brainstorming, and small chunks of code.

Agentic CLI coding tools go further.

They can:

  • Read and navigate an entire codebase
  • Query databases directly
  • Run builds and tests
  • Modify files across a project
  • Fix errors by observing failures, not just guessing

Instead of asking for a small chunk of code, you can ask for a piece of a system.

For non-technical readers, think of this like writing a book.

  • Chat-style tools are great for writing a paragraph.
  • These tools can draft many pages at a time and revise chapters based on feedback.

The project context

I’m testing these tools while building a small-to-mid-sized AI-assisted editorial pipeline in .NET with a Razor Pages front end. Without AI help, this would be a 6–8 week project. The goal is to reuse parts from another system while building new workflows on top.

That makes it a good test case:

  • Real architecture
  • Real constraints
  • Real integration work
  • Enough complexity to break things

What impressed me

A few things genuinely stood out:

  • Direct database awareness
    The tools could query my curriculum database directly. I didn’t have to paste examples or explain table relationships.

  • Entity Framework and migrations
    They set up EF, EF Migrations, and even seeded data through migrations. That work is usually tedious and easy to get wrong.

  • Pattern matching from existing projects
    They copied core infrastructure from a source project (base classes, services, CSS, tag helpers). The tools recognized my patterns and used them in new modules.

  • Running and fixing builds
    When something didn’t compile, the tool ran the build, saw the error, and fixed it.

  • Larger, coherent chunks of work
    After enough context was built up, the tools could generate full modules—multiple tables, data models, input models, Razor pages—in a single pass. I normally rely on custom code generators for that kind of work.

  • Codebase reorganization
    Moving files, reshaping folders, and updating references worked better than I expected.
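
The EF setup mentioned above follows EF Core's standard pattern for seeding data through migrations. Here's a minimal sketch of what that looks like; the `CurriculumDbContext` and `Course` names are hypothetical stand-ins for my actual types, and this assumes the EF Core packages are installed:

```csharp
using Microsoft.EntityFrameworkCore;

public class Course
{
    public int Id { get; set; }
    public string Title { get; set; } = "";
}

public class CurriculumDbContext : DbContext
{
    public DbSet<Course> Courses => Set<Course>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Seed rows declared here are baked into the next migration,
        // so the data travels with the schema.
        modelBuilder.Entity<Course>().HasData(
            new Course { Id = 1, Title = "Algebra I" },
            new Course { Id = 2, Title = "Geometry" });
    }
}
```

After that, `dotnet ef migrations add SeedCourses` emits the seed rows as insert calls in the migration, and `dotnet ef database update` applies them. It's exactly the kind of fiddly, easy-to-typo wiring the tools handled without complaint.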

Working this way starts to feel less like “writing code” and more like directing construction.

Where friction showed up

The limitations were just as instructive:

  • Chunk size still matters
    You can work in larger chunks than browser-based tools allow, but not unlimited ones. For some of the larger chunks they built, I'm going back and rebuilding them in smaller pieces.

  • Architecture still has to come from somewhere
    The tools don’t invent good structure on their own. You have to provide or reinforce the architectural intent.

  • Over-scaffolding is easy
    When I asked for just the UI scaffold of a core workflow, I got more than I needed. Removing unnecessary UI pieces was harder than layering things in gradually would have been. That mirrors how most human developers already work: start small, expand intentionally.

  • Cost and limits are real
    I burned through Claude credits quickly and would need to go to the $100/month plan for daily use. OpenAI’s limits were less visible. I worked for five hours one day on a $20/month plan and didn't hit any limit.

What still seems firmly human

Even with these tools, some responsibilities don’t move:

  • Setting goals and success criteria at multiple levels
  • Providing real-world context and examples
  • Guiding discovery, build, and refinement intentionally
  • Validating that the result actually serves a business need

The tool accelerates execution. Judgment still sits with the human.

Questions I’m carrying forward

This is where things get interesting.

  • Will architecture matter as much?
    Architecture shapes quality, scalability, reuse, and maintainability—mostly for humans. Some of those priorities may become inputs to the AI rather than artifacts we optimize for ourselves.

  • Will code organization matter or change?
    Today, code is organized for human discovery and maintainability. If AI becomes the primary reader and editor, does that constraint loosen or change?

  • How far does natural language go?
    Today, the C# I write isn’t the code that actually runs. It compiles to intermediate language, then to machine code I’ve never bothered to read—and don’t need to. C# exists because it’s a form humans can reason about.

    Natural-language coding feels like the next layer in that stack. We may stay aware of the code and its behavior, but we interact at a level optimized for intent rather than syntax. We step in manually only when something subtle, structural, or high-stakes requires it. And like intermediate language, this layer may exist almost entirely outside our attention.

Early thoughts

Building professional-grade software still takes weeks or months. That hasn’t changed.

What has changed is who can plausibly do it—and how much of the work shifts from mechanical execution to intent, direction, and validation.

I’ll keep experimenting. Future TWILs will probably show more opinions as patterns stabilize. For now, this feels like the early innings of a tooling shift that will eventually reach far beyond programmers.