I agree they’re better, but the author should add
“Disclaimer: I’m the CEO of a company that sells agents as a service”
at the top of an article promoting said agents.
Author of the piece here :-). We are not building coding agents and are focused on quite different stuff… I am just trying to share my personal experience as a software person!
Absolutely — but I also think there’s a strong resistance to managers saying “AI is good, really”.
The experience of long-term software engineers (e.g. antirez) who don’t have a horse in the AI race tends to line up much better with my own.
Also really like this one: https://diwank.space/field-notes-from-shipping-real-code-wit...
~~Counter~~ add to that - Armin Ronacher[0] (Flask, Sentry et al.), Charlie Marsh[1] (ruff, uv) and Jarred Sumner[2] (Bun), amongst others, are tweeting extensively about their positive experiences with LLM-driven development.
My experience matches theirs - Claude Code is absolutely phenomenal, as is the Cursor tab completion model and the new memory feature.
[0] https://x.com/mitsuhiko
[1] https://x.com/charliermarsh
[2] https://x.com/jarredsumner
Not a counter — antirez is posting positive things too.
Charlie Marsh seems to have much better luck writing Rust with Claude than I have. Claude has been great for TypeScript changes and build scripts, but lousy when it comes to stuff like Rust's borrow checker.
Apologies I misread! Updated.
I'll add - they do seem to do better with Go and TypeScript (particularly Next and React) and are somewhat good with Python (although you need a clean project structure with nothing magic in it).
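To make "nothing magic" concrete, here is a hypothetical before/after (all names invented for illustration). The first handler resolves everything at runtime, so neither an agent nor a reader can trace it from the call site; the second is boring and explicit, which seems to be the style that helps:

```python
import importlib

# "Magic": handlers resolved by building module and function names
# as strings at runtime. Nothing is traceable by reading the code.
def handle_magic(event_type: str, payload: dict) -> None:
    module = importlib.import_module(f"handlers.{event_type}")
    getattr(module, f"on_{event_type}")(payload)

# "Nothing magic": an explicit mapping that an agent (or a human)
# can follow from the call site straight to the implementation.
def on_created(payload: dict) -> None:
    print("created:", payload)

def on_deleted(payload: dict) -> None:
    print("deleted:", payload)

HANDLERS = {"created": on_created, "deleted": on_deleted}

def handle_plain(event_type: str, payload: dict) -> None:
    HANDLERS[event_type](payload)

handle_plain("created", {"id": 1})
```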
This one seems really sloppy and confused; he describes three "modes of vibe coding" that involve looking at the code and therefore aren't vibe coding at all, as the Karpathy definition he quotes immediately beforehand makes clear. Maybe he's writing his code by hand and letting Claude write his blog posts.
Not OP, and I don't have a specific stake in any AI companies, but IMHO, as someone doing web-related things for a living since 1998 (as a developer, team lead, "architect", product manager, consultant, and manager), I think pretty much all of us have skin in the game, whether or not we back a particular horse.
Really depends what you believe.
If you believe that agents will replace software developers like me in the near term, then you’d think I have a horse in this race.
But I don’t believe that.
My company pays for Cursor and so do I, and I'm using it with all the latest models. For my main job, writing code in a vast codebase with internal frameworks everywhere, it's fairly useless.
For much smaller codebases it’s much better, and it’s excellent for greenfield work.
But greenfield work isn’t where most of the money and time is spent.
There’s an assumption the tools will get much better. There are several ways they could be better (e.g. plugging into typecheckers to enable global reasoning about a codebase) but even then they’re not in replacement territory.
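For what that could look like, a minimal sketch of a typechecker-in-the-loop, assuming a Python project with mypy on the PATH; `ask_agent()` is a hypothetical stand-in for whatever model call is actually used:

```python
import subprocess

def ask_agent(prompt: str) -> None:
    # Hypothetical stand-in for a real model call; not any specific API.
    print(prompt)

def typecheck(project_dir: str) -> list[str]:
    # Check the whole project, so diagnostics reflect global state
    # rather than just the file the agent happened to edit.
    result = subprocess.run(["mypy", project_dir], capture_output=True, text=True)
    return [line for line in result.stdout.splitlines() if ": error:" in line]

def repair_loop(project_dir: str, max_rounds: int = 3) -> bool:
    for _ in range(max_rounds):
        errors = typecheck(project_dir)
        if not errors:
            return True  # project-wide check is clean
        # Hand back codebase-wide diagnostics, so the agent can reason
        # about breakage far from the lines it changed.
        ask_agent("Fix these project-wide type errors:\n" + "\n".join(errors))
    return False
```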
I listen to people like Yann LeCun and Demis Hassabis, who believe further as-yet-unknown innovations are needed before we can escape the local maximum we have with LLMs.
You need to use better coding agents and workflows.
Long-term software engineers do have an anti-horse in the AI race - a lot of us eventually could be replaced by a coding agent.
Most of us have been replaced by Microsoft Excel already though. Or by a compiler.
Very true - for many tasks Excel is enough, better, and faster.
Long term software engineers very much have a horse in the AI race. It threatens their jobs and importance.
Took me a while to find this [1]: "We’re building the next-gen operating system for AI agents."
--
1: https://sdsa.ai/
Always super helpful to post more guidelines on how to use LLMs more effectively!
> For personal tools, I’ve completely shifted my approach. *I don’t even look at the code anymore - I describe what I want to Claude Code, test the result, make some minor tweaks with the AI and if it’s not good enough, I start over with a slightly different initial prompt. The iteration cycle is so fast that it’s often quicker to start over than trying to debug or modify the generated code myself.* This has unlocked a level of creative freedom where I can build small utilities and experiments without the usual friction of implementation details. Want a quick script to reorganize some photos? Done. Need a little web scraper for some project? Easy. The mental overhead of “is this worth building?” has basically disappeared for small tools. I do wish Claude Code had a mode where it could work more autonomously - right now it still requires more hands-on attention than I’d like - but even with that limitation, the productivity gains are wild.
So I suppose the chasm is that actually doing programming is dead, or quickly dying, and if that's the thing you actually enjoyed doing, then tough luck.
This era sucks. The suits have finally won.
(emphasis mine)
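For a sense of scale, the "quick script to reorganize some photos" from the quote means something roughly this size - a hypothetical sketch (the path and extensions are assumptions) that files images into YYYY-MM folders by modification time:

```python
import shutil
from datetime import datetime
from pathlib import Path

def reorganize(photo_dir: str) -> None:
    src = Path(photo_dir)
    # Snapshot the listing first, since we create folders as we go.
    for photo in list(src.iterdir()):
        if photo.suffix.lower() not in {".jpg", ".jpeg", ".png", ".heic"}:
            continue
        # Modification time as a cheap proxy for when the photo was taken.
        stamp = datetime.fromtimestamp(photo.stat().st_mtime)
        dest = src / stamp.strftime("%Y-%m")
        dest.mkdir(exist_ok=True)
        shutil.move(str(photo), str(dest / photo.name))

if __name__ == "__main__":
    reorganize("./photos")  # assumed location
```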
I'm sure there will be a market for artisanal, hand-crafted code.
If not, just do it for yourself.
I'm personally still in the "smarter autocomplete" phase when it comes to LLMs, as I don't trust the vibe-coded "agents" and the outputs they produce to control my computer. But that aside, this part stood out to me:
> I don’t even look at the code anymore - I describe what I want to Claude Code, test the result, make some minor tweaks with the AI and if it’s not good enough, I start over with a slightly different initial prompt.
Honestly, do the author and anyone else using this workflow find this way of working enjoyable? To me programming is not entirely about the end goal. It's mostly the small bursts of dopamine whenever I solve a particular problem; whenever I refactor code to make it cleaner, simpler, and easier to read; whenever I write a test and see it pass, knowing that I'm building a safety net to safely refactor in the future. And so on.
Yes, the feeling of accomplishment after shipping a useful piece of software, be that a small script or a larger part of a system, is also great. But the small wins along the way are the things that make me want to keep programming.
This way of working where you don't even look at the code, but describe the system specs in prose, go back and forth with an extremely confident but highly error-prone tool, manually test the result, and repeat this until you're satisfied... doesn't sound fun or interesting at all.
Just remember, you can continue to do artisanal programming as a hobby or for your own projects. But if you have an employer, they're paying you for functional and secure features, not lines of code.
He's quoting someone saying they don't even look at the code and you're saying "functional and secure"?
Exactly what the execs want; a valid excuse to reduce entire engineering departments to five people in a closet and scores of underpaid offshored people vibe coding all the things.
Up to now, my attempts at doing what the author claims to be possible end up in a broken piece of code that the agent only makes worse when asked to debug, until finally it won't even compile. There seems to be a threshold of difficulty above which the agent goes into bug-runaway mode. I honestly haven't seen this threshold going up. If anything, it seems to be saturating.
When I use an LLM for (much of anything), I always feel like that scene in the first Iron Man movie where Stark is trying to build stuff with the robot and it almost, but doesn't quite, do what he wants, and then screws something all the way up.
The chasm between "practical and useful utilities that have long term viability without a vast knowledge of the underlying mechanisms" and what AI-immersed devs consider "practical" and "useful" grows wider every time I check in.
Snarky and dismissive, sure. But the Wii wasn't a "1-to-1 motion matching" machine no matter how many people insisted it was. It was just "better than anything before had ever been". Which is not the same thing as "good". I'm not holding anything against anyone. The Wii was an incredible console, and LLMs are an incredible technology. I'd just like to read some thoughts on the tech from people who are more aligned with myself in their discernment. If for nothing else, some variety.
> “We’re entering an era of untrustable code everywhere” - This assumes AI-generated code is inherently less trustworthy than human-written code, which isn’t obviously true.
It's not true if your humans are on controlled substances all the time; it is true if we are talking about real humans.
I've been testing coding agents on real code and I can say without a doubt that they make worse mistakes than humans.
Was this article written by AI? If I argue with it, am I arguing with a real person? Is it written by a corporate shill? Again, if I argue with it, am I talking to a wall?
AI (and before that, corporations) makes skepticism more and more a basic survival skill.
Since this is partly an experience report, it is only as trustworthy as its author, whoever that is. What is this person risking by writing it?
The content seems plausible to me. However, what I’m missing here is:
- How does he test?
- How does he keep himself sharp while his tools are doing so much?
- How does he model the failure modes of this approach, or does he just feel it?
I am not having the same feeling of success as this guy as I experiment with the same tech. Maybe he's better than me at using it. Or maybe he's easily impressed.
Author here… I wrote it; I used Claude for proofreading/editing, as mentioned at the end. Anyway, point is: real human here!
I do still read the code _except_ when I am consciously vibe coding a non-production thing, where I will know empirically whether it worked by using it.
I’m definitely not using agents to do all my coding (as I hope is reasonably clear from the post). But they have crossed the line from pointless-to-try to genuinely useful for many real-world problems in just the last couple of months, in my experience.