>The people really leading AI coding right now (and I’d put myself near the front, though not all the way there) don’t read code. They manage the things that produce code.
I can’t imagine any other example where people voluntarily opt for a black-box approach.
Imagine taking a picture on autoshot mode and refusing to look at it. If the client doesn’t like it because it’s too bright, tweak the settings and shoot again, but never look at the output.
What is the logic here? Because if you can read code, I can’t imagine poking the result with black box testing being faster.
Are these people just handing off the review process to others? Are they unable to read code and hiding it? Why would you handicap yourself this way?
Your product managers most likely are not reading your code. Your CEO is not. The vast majority of your company is unlikely to ever look at a line of code.
If the process becomes reliable enough, then there is no reason. For now, important projects still require developers to pay attention, but there are also a lot of AI-written tools I rely on day to day that I don't read, because the cost of accepting the risk that they do something wrong is lower than the opportunity cost of spending time reading them.
There are also a whole lot of tools I do read thoroughly, because the risk profile is different.
But that category is getting smaller day by day, not just with model improvements, but with improved harnesses.
I think many people are missing the overall meaning of these sorts of posts: they are describing a new type of programmer who will only use agents and never read the underlying code. These vibe/agent coders will use natural(-ish) language to communicate with the agents and won't look at the code any more than, say, a PHP developer would look at the underlying assembly. The code is simply not the level of abstraction they are working at. There are many use cases where this type of coding will work fine, and it will let many people who previously couldn't really take advantage of computers do so. This is great, but it will do nothing to replace the need for code that humans must understand (which, in turn, requires participation in the writing).
Your analogy to PHP developers not reading assembly got me thinking.
Early resistance to high-level (i.e. compiled) languages came from assembly programmers who couldn’t imagine that the compiler could generate code that was just as performant as their hand-crafted product. For a while they were right, but improved compiler design and the relentless performance increases in hardware made it so that even an extra 10-20% boost you might get from perfectly hand-crafted assembly was almost never worth the developer time.
There is an obvious parallel here, but it’s not quite the same. The high-level language is effectively a formal spec for the abstract machine which is faithfully translated by the (hopefully bug-free) compiler. Natural language is not a formal spec for anything, and LLM-based agents are not formally verifiable software. So the tradeoffs involved are not only about developer time vs. performance, but also correctness.
For a great many software projects no formal spec exists. The code is the spec, and it gets modified constantly based on user feedback and other requirements that often appear out of nowhere. For many projects, maybe ~80% of the thinking about how the software should work happens after some version of the software exists and is being used to do meaningful work.
Put another way, if you don't know what correct is before you start working then no tradeoff exists.
> Put another way, if you don't know what correct is before you start working then no tradeoff exists.
This goes out the window the first time you get real users, though. Hyrum's Law bites people all the time.
"What sorts of things can you build if you don't have long-term sneaky contracts and dependencies" is a really interesting question and has a HUGE pool of answers that used to be not worth the effort. But it's largely a different pool of software than the ones people get paid for today.
> For many projects, maybe ~80% of the thinking about how the software should work happens after some version of the software exists and is being used to do meaningful work.
Some version of the software exists and now that's your spec. If you don't have a formal copy of that and rigorous testing against that spec, you're gonna get mutations that change unintended things, not just improvements.
Users are generally ok with - or at least understanding of - intentional changes, but now people are talking about no-code-reading workflows, where you just let the agents rewrite stuff on the fly to build new things until all the tests pass again. The in-code tests and the expectations/assumptions about the product that your users have are likely wildly different - they always have been, and there's nothing inherent about LLM-generated code or about code test coverage percentages that changes this.
> So the tradeoffs involved are not only about developer time vs. performance, but also correctness.
The "now that producing plausible code is free, verification becomes the bottleneck" people are technically right, of course, but I think they're missing the context that very few projects cared much about correctness to begin with.
The biggest headache I can see right now is just the humans keeping track of all the new code, because it arrives faster than they can digest it.
But I guess "let go of the need to even look at the code" "solves" that problem, for many projects... Strange times!
For example -- someone correct me if I'm wrong -- OpenClaw was itself almost entirely written by AI, and the developer bragged about not reading the code. If anything, in this niche, that actually helped the project's success, rather than harming it.
(In the case of Windows 11 recently.. not so much ;)
> The "now that producing plausible code is free, verification becomes the bottleneck" people are technically right, of course, but I think they're missing the context that very few projects cared much about correctness to begin with.
It's certainly hard to find, in consumer tech, an example of a product that was displaced in the market by a slower-moving competitor due to buggy releases. Infamously, "move fast and break things" has been the law of the land.
In SaaS and B2B, deterministic results become much more important. There are still bugs, of course, but showstopper bugs are major business risks. And combinatorial state+logic still makes testing a huge tarpit.
The world didn't spend the last century turning customer service agents and business-process workers into script-following human robots for no reason, and big parts of it won't want to reintroduce high levels of randomness... (That's not even necessarily good for any particular consumer - imagine an insurance company with a "claims agent" that got sweet-talked into spending hundreds of millions more on things that were legitimate benefits for their customers, but that management wanted to limit whenever possible on technicalities.)
OK but, I've definitely read the assembly listings my C compiler produced when it wasn't working like I hoped. Even if that's not all that frequent it's something I expect I have to do from time to time and is definitely part of "programming".
It's also important to remember that vibe coders throw away the natural language spec each time they close the context window.
Vibe coding is closer to compiling your code, throwing the source away and asking a friend to give you source that is pretty close to the one you wrote.
Imagine if high level coding worked like: write a first draft, and get assembly. All subsequent high level code is written in a repl and expresses changes to the assembly, or queries the state of the assembly, and is then discarded. only the assembly is checked into version control.
Or the opposite, all applications are just text files with prompts in them and the assembly lives as ravioli in many temp files. It only builds the code that is used. You can extend the prompt while using the application.
I'm glad you wrote this comment because I completely agree with it. I'm not saying there is no need for software engineers who deeply consider architecture, who can fully understand the truly critical systems that exist at most software companies, and who can help dream up the harness capabilities to make these agents work better.
I just am describing what I'm doing now, and what I'm seeing at the leading edge of using these tools. It's a different approach - but I think it'll become the most common way of producing software.
> that is they are describing a new type of programmer that will only use agents and never read the underlying code
> and wouldn't look at the code anymore than, say, a PHP developer would look at the underlying assembly
This really puts down the work that the PHP maintainers have done. Many people spend a lot of time crafting the PHP codebase so you don't have to look at the underlying assembly. There is a certain amount of trust that I as a PHP developer assume.
Is this what the agents do? No. They scrape random bits of code everywhere and put something together with no craft. How do I know they won't hide exploits somewhere? How do I know they don't leak my credentials?
> Imagine taking a picture on autoshot mode and refusing to look at it. If the client doesn’t like it because it’s too bright, tweak the settings and shoot again, but never look at the output.
The output of code isn't just the code itself, it's the product. The code is a means to an end.
So the proper analogy isn't the photographer not looking at the photos, it's the photographer not looking at what's going on under the hood to produce the photos. Which, of course, is perfectly common and normal.
>The output of code isn't just the code itself, it's the product. The code is a means to an end.
I’ll bite. Is this person manually testing everything that one would regularly unit test? Or writing black-box tests that he does know are correct, because they were manually written?
If not, you’re not reviewing the product either. If yes, it’s less time-consuming to actually read and test the damn code.
I mostly ignore code, I lean on specs + tests + static analysis. I spot check tests depending on how likely I think it is for the agent to have messed up or misinterpreted my instructions. I push very high test coverage on all my projects (85%+), and part of the way I build is "testing ladders" where I have the agent create progressively bigger integration tests, until I hit e2e/manual validation.
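For concreteness, a rough sketch of what one run up that ladder can look like - pytest/pytest-cov and the directory names are just my setup, swap in whatever you use:

    # climb the "testing ladder": fail fast at the cheapest rung
    import subprocess, sys

    RUNGS = [
        # fast, isolated unit tests, with the coverage gate enforced here
        ["pytest", "tests/unit", "--cov=myapp", "--cov-fail-under=85"],
        # progressively bigger integration tests
        ["pytest", "tests/integration"],
        # end-to-end suite; manual validation happens only after this passes
        ["pytest", "tests/e2e"],
    ]

    for cmd in RUNGS:
        if subprocess.run(cmd).returncode != 0:
            sys.exit(1)  # stop climbing at the first failing rung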
"I push very high test coverage on all my projects (85%+)"
Coverage doesn't matter if the tests aren't good. If you're not verifying the tests are actually doing something useful, talking about high coverage is just wanking.
"have the agent create progressively bigger integration tests, until I hit e2e/manual validation."
Same thing. It doesn't matter how big the tests are if they're not testing the right thing. Also why is e2e slashed with manual? Those are orthogonal. E2E tests can [and should] be fully automated for many [most?] systems. And manual validation doesn't have to wait for full e2e.
>I spot check tests depending on how likely I think it is for the agent to have messed up or misinterpreted my instructions
So a percentage of your code, based on your gut feeling, is left unseen by any human by the time you submit it.
Do you agree that this raises the chance of bugs slipping by? I don’t see how you wouldn’t.
And considering that your code output is larger, the percentage of it that is buggy is larger, and (presumably) you write faster, have you considered what that means for the compounding likelihood of incidents?
There's definitely a class of bugs that are a lot more common, where the code deviates from the intent in some subtle way, while still being functional. I deal with this using benchmarking and heavy dogfooding, both of these really expose errors/rough edges well.
My approach is similar. I invest in the harness layer (tests, hooks, linting, pre-commit checks). The code review happens, it's just happening through tooling rather than my eyeballs.
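As a sketch of what that gate looks like in practice - assuming a hook installed at .git/hooks/pre-commit and a ruff + pytest stack, which is just one plausible choice, not a prescription:

    #!/usr/bin/env python3
    # refuse the commit unless the harness checks pass
    import subprocess, sys

    CHECKS = [
        ["ruff", "check", "."],               # lint
        ["ruff", "format", "--check", "."],   # formatting
        ["pytest", "-q", "tests/unit"],       # fast tests only; the rest runs in CI
    ]

    for cmd in CHECKS:
        if subprocess.run(cmd).returncode != 0:
            print("pre-commit gate failed:", " ".join(cmd), file=sys.stderr)
            sys.exit(1)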
Exactly this. The code is an intermediate artifact - what I actually care about is: does the product work, does it meet the spec, do the tests pass?
I've found that focusing my attention upstream (specs, constraints, test harness) yields better outcomes than poring over implementation details line by line. The code is still there if I need it. I just rarely need it.
People miss this a lot. Coding is just a (small) part of building a product. You get a much better bang for the buck if you focus your time on talking to the user, dogfooding, and then vibecoding. It also allows you to do many more iterations, even with large changes, because since you didn't "write" the code, you don't care about throwing it away.
AI-assisted coding is not a black box in the way that managing an engineering team of humans is. You see the model "thinking", you see diffs being created, and occasionally you intervene to keep things on track. If you're leveraging AI professionally, any coding has been preceded by planning (the breadth and depth of which scale with the task) and test suites.
Don’t read the code, test for desired behavior, miss out on all the hidden undesired behavior injected by malicious prompts or AI providers. Brave new world!
My understanding is that it's quite easy to poison the models with inaccurate data, I wouldn't be surprised if this exact thing has happened already. Maybe not an AI company itself, but it's definitely in the purview of a hostile actor to create bad code for this purpose. I suppose it's kind of already happened via supply chain attacks using AI generated package names that didn't exist prior to the LLM generating them.
One mitigation might be to use one company's model to check the work of another company's code and depend on market competition to keep the checks and balances.
Then how many models deep do you go before it's more cost effective to just hire a junior dev, supply them with a list of common backdoors, and have them scan the code?
The output is the program behavior. You use it, like a user, and give feedback to the coding agent.
If the app is too bright, you tweak the settings and build it again.
Photography used to involve developing film in dark rooms. Now my iPhone does... god knows what to the photo - I just tweak in post, or reshoot. I _could_ get the raw, understand the algorithm to transform that into sRGB, understand my compression settings, etc - but I don't need to.
Similarly, I think there will be people who create useful software without looking at what happens in between. And there will still be low-level software engineers for whom what happens in between is their job.
I can’t imagine retesting all the functionality of a well-established product for possible regressions not being stupidly time-consuming. This is the very reason we have unit tests in the first place, and why they far outnumber end-to-end tests.
> I can’t imagine any other example where people voluntarily opt for a black-box approach.
Anyone overseeing work from multiple people has to? At some point you have to let go and trust people's judgement, or, well, let them go. Reading and understanding the whole output of 9 concurrently running agents is impossible. People who do that (I'm not one of them btw) must rely on higher-level reports. Maybe drilling into this or that piece of code occasionally.
>At some point you have to let go and trust people's judgement.
Indeed. People. With salaries, general intelligence, a stake in the matter and a negative outcome if they don’t take responsibility.
>Reading and understanding the whole output of 9 concurrently running agents is impossible.
I agree. It is also impossible for a person to drive two cars at once… so we don’t. Why is the starting point of the conversation that one should be able to use 9 concurrent agents?
I get it, writing code no longer has a physical bottleneck. So the bottleneck becomes the next thing, which is our ability to review outputs. It’s already a giant advancement; why are we ignoring that second bottleneck and dropping quality assurance as well? Eventually someone has to put their signature on the thing being shippable.
It is not. To review code you need to have an understanding of the problem that can only be built by writing code. Not necessarily the final product, but at least prototypes and experiments that then inform the final product.
Juniors build worse code than Codex. Their superiors also can't check everything they do. They need to have some level of tolerance for dumb shit, or they can't hire juniors.
> Does such a thing exist here? Just "done".
Not sure what you mean. You can definitely ask the agent what it built, why it built it, and what could be improved. You will get only part of the info vs when you read the output, but it won’t be zero info.
LLM: "Because the embeddings in your prompt are close to some embeddings in my training data. Here's some seemingly explanatory text with that is just similar embeddings to other 'why?' questions."
You: "What could be improved?"
LLM: "Here's some different stuff based on other training data with embeddings close to the original embeddings, but different.
---
It's near zero useful information. Example information might be "it builds" (baseline necessity, so useless info), "it passes some tests" (fairly baseline, more useful, but actually useless if you don't know what the tests are doing), or "it's different" (duh).
> I can’t imagine any other example where people voluntarily opt for a black-box approach.
I can think of a few. The last 78 pages of any 80-page business analysis report. The music tracks of those "12 hours of chill jazz music" YouTube videos. Political speeches written ahead of time. Basically - anywhere that a proper review is more work than the task itself, and the quality of output doesn't matter much.
But can you get an AI to zone out on a fluffy couch at the center point of a dank hi-fi setup with the volume cranked to 11, while chillin' on 50mg of THC?
And will you enjoy paying someone else to let the AI to do that?
No pun intended, but it's been more "vibes" than science that got me here. It's more effective. When I focus my attention on the harness layer (tests, hooks, checks, etc), and the inputs, my overall velocity improves relative to reading & debugging the code directly.
To be fair - it is not accurate to say I absolutely never read the code. It's just rare, and it's much more the exception than the rule.
My workflow just focuses much more on the final product, and the initial input layer, not the code - it's becoming less consequential.
Same. I stopped reading after that. I get the sense that most of these people think all code is web or mobile or something non-critical. Granted, I'm not a web or mobile guy, so I can't presume the complexity, risk, or cost of such things. But I assume it's in a different category than safety/mission-critical things. I do dev tools for ASIL-B systems devs now, and even then I can't say I'm comfortable not reading the generated code. Some of my junior peers are, though, and I'm very frustrated that I feel like I keep having to play AI janitor; don't think the bosses care.
I think this is the logical next step -- instead of manually steering the model, just rely on the acceptance criteria and some E2E test suite (that part is tricky, since you still need to verify the test suite itself).
I personally think we are not that far from it, but it will need something built on top of current CLI tools.
> Because if you can read code, I can’t imagine poking the result with black box testing being faster.
I don't know... it depends on the use case. I can't imagine even the best front-end engineer ever can read HTML faster than looking at the rendered webpage to check if the layout is correct.
That's true. Never let the AI see the code it wrote when writing the tests, for sure. Write multiple tests, have an arbitrator (also AI) figure out whether the implementation or the tests are wrong when tests fail. Have the AI heavily comment code and heavily comment tests in the language of your spec so you can manually verify whether the scenarios/parts of the implementation make sense when it matters.
etc...etc...
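A rough sketch of that loop, where write_tests / run_tests / arbitrate are placeholders for whatever agent calls and test runner you wire in:

    # spec-blind verification: the test writer never sees the implementation,
    # and a separate arbitrator model decides who is at fault when tests fail
    def verify(spec, implementation, write_tests, run_tests, arbitrate):
        tests = write_tests(spec)                    # test-writer agent sees only the spec
        failures = run_tests(implementation, tests)  # ordinary test runner
        if not failures:
            return "accept"
        verdict = arbitrate(spec, tests, failures)   # arbitrator: code or tests at fault?
        return "fix_implementation" if verdict == "implementation" else "fix_tests"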
> In other words, if “the ai is checking as well” no one is.
Explain then how testing the functionality (not just the new functionality; regressions included, this is not a school exercise) is faster than checking the code.
Are you writing black box testing by hand, or manually checking, everything that would normally be a unit test? We have unit tests precisely because of how unworkable the “every test is black box” approach is.
People care about results. Better processes need to produce better results. This is programming, not a belief system where you have to adhere to some view or else.
I find so many of these comments and debates fascinating as a layperson. I'm more tech-savvy than most I meet, built my own PCs, know my way around some more 'advanced' things like the terminal a bit, and have a deeper understanding of computer systems, software, etc. than most people I know. It has always been more of a hobby for me. People look at me as the 'tech' guy even though I'm actually not.
Something I know very little about is coding. I know there are different languages with pros and cons to each. I know some work across operating systems while others don't but other than that I don't know too much.
For the first time I just started working on my own app in Codex and it feels absolutely amazing and magical. I've not seen the code, would have basically no idea how to read it, but I'm working on a niche application for my job that is custom-tailored to my needs, and if it works I'll be thrilled. Even better, the process of building it just feels so special and awesome.
This really does feel like it is on the precipice of something entirely different. I think back to computers before the GUI. I think back to even just computers before mobile touch interfaces. I am sure there are plenty of people who thought some of these things wouldn't work for different reasons, but I think that is the wrong idea. The focus should be on who this will work for and why, and there, I think, are a ton of possibilities.
For reference, I'm a middle school Assistant Principal working on an app to help me with student scheduling.
After 10+ years of stewing on an idea, I started building an app (for myself) that I've never had the courage or time to start until now.
I really wanted to learn the coding, the design patterns, etc, but truthfully, it was never gonna happen without a Claude. I could never get past the unknown-unknowns (and I didn't even grasp how broad a domain of knowledge it actually requires). Best case, I would have started small chunks and abandoned it countless times, piling on defeatism and disappointment each time.
Now in under two weeks of spare time and evenings, I've got a working prototype that's starting to resemble my dream. Does my code smell? Yes. Is it brittle? Almost certainly. Is it a security risk? I hope not. (It's not.)
I want to be intentional about how I use AI; I'm nervous about how it alters how we think and learn. But seeing my little toy out in the real world is flippin incredible.
It very probably is, but if it's a personal project you're not planning on releasing anywhere, it doesn't matter much.
You should still be very cognizant that LLMs will currently fairly reliably implement massive security risks once a project grows beyond a certain size, though.
They can also identify and fix vulnerabilities when prompted. AI is being used heavily by security researchers for this purpose.
It’s really just a case of knowing how to use the tools. Said another way, the risk is being unaware of what the risks are. And awareness can help one get out of the bad habits that create real world issues.
My observation is that "AI" makes easy things easier and hard things impossible. You'll get your niche app out of it, you'll be thrilled, then you'll need it to do more. Then you will struggle to do more, because the AI created a pile of technical debt.
Programmers dream of getting a green field project. They want to "start it the right way this time" instead of being stuck unwinding technical debt on legacy projects. AI creates new legacy projects instantly.
Never thought this would be something people actually take seriously. It really makes me wonder if in 2 - 3 years there will be so much technical debt that we'll have to throw away entire pieces of software.
> Never thought this would be something people actually take seriously
The author of the article has a bachelor's degree in economics[1], worked as a product manager (not a dev) and only started using GitHub[2] in 2025 when they were laid off[3].
Whilst I won't comment on this specific person, one of the best programmers I've met has a law degree, so I wouldn't use their degree against them. People can have many interests and skills.
> Never thought this would be something people actually take seriously.
You have to remember that the number of software developers saw a massive swell in the last 20 years, and many of these folks are Bootcamp-educated web/app dev types, not John Carmack. They typically started too late and for the wrong reasons to become very skilled in the craft by middle age, under pre-AI circumstances and statistically (of course there are many wonderful exceptions; one of my best developers is someone who worked in a retail store for 15 years before pivoting).
AI tools are now available to everyone, not just the developers who were already proficient at writing code. When you take in the excitement you always have to consider what it does for the average developer and also those below average: A chance to redefine yourself, be among the first doing a new thing, skip over many years of skill-building and, as many of them would put it, focus on results.
It's totally obvious why many leap at this, and it's even probably what they should do, individually. But it's a selfish concern, not a care for the practice as-is. It also results in a lot of performative blog posting. But if it was you, you might well do the same to get ahead in life. There are only so many opportunities to get in on something on the ground floor.
I feel a lot of senior developers don't take the demographics of our community of practice into account when they try to understand the reception of AI tools.
I have rarely had the words taken out of my mouth like this.
The percentage of devs in my career who are from the same academic background, show similar interests, and approach the field in the same way is probably less than 10%, sadly.
Ahh, Bulverism, with a hint of ad hominem and a dash of No True Scotsman. I think the most damning indictment here is the seeming inability to make actual arguments and not just cheap shots at people you've never even met.
Please tell me, "Were people excited about high-level languages just programmers who 'couldn't hack it' with assembly? Maybe you are one of those? Were GUI advocates just people who couldn't master the command line?"
Thanks for teaching me about Bulverism, I hadn't heard of that fallacy before. I can see how my comment displays those characteristics and will probably try to avoid that pattern more in the future.
Honestly, I still think there's truth to what I wrote, and I don't think your counter-examples prove it wrong per se. The prompt I responded to ("why are people taking this seriously") also led fairly naturally down the road of examining the reasons. That was of course my choice to make, but it's also just what interested me in the moment.
>I think he's a cook, watching people putting frozen "meals" in the microwave and telling himself: "hey! That's not cooking!".
It's the equivalent of saying anyone excited about being able to microwave Frozen meals is a hack who couldn't make it in the kitchen. I'm sorry, but if you don't see how ridiculous that assertion is then I don't know what to tell you.
>And I totally agree with him. Throwing some kind of fallacy in the air for the show doesn't make your argument, or lack of, more convincing.
A series of condescending statements meant to demean with no objective backing whatsoever is not an argument. What do you want me to say? There's nothing worth addressing, other than pointing out how empty it is.
You think there aren't big shots, more accomplished than anyone in this conversation who are similarly enthusiastic?
You and OP have zero actual clue. At any advancement, regardless of how big or consequential, there are always people like that. It's very nice to feel smart and superior and degrade others, but people ought to be better than that.
So I'm sorry but I don't really care how superior a cook you think you are.
Well, there are programmers like Karpathy, in his original coinage of vibe coding:
> There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like "decrease the padding on the sidebar by half" because I'm too lazy to find it. I "Accept All" always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I'd have to really read through it for a while. Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away. It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.
Half serious - but is that really so different than many apps written by humans?
I've worked on "legacy systems" written 30 to 45 years ago (or more) and still running today (things like green-screen apps written in Pick/Basic, Cobol, etc.). Some of them were written once and subsystems replaced, but some of it is original code.
In systems written in the last.. say, 10 to 20 years, I've seen them undergo drastic rates of change, sometimes full rewrites every few years. This seemed to go hand-in-hand with the rise of agile development (not condemning nor approving of it) - where rapid rates of change were expected.. and often the tech the system was written in was changing rapidly also.
In hardware engineering, I personally also saw a huge move to more frequent design and implementation refreshes to prevent obsolescence issues (some might say this is "planned obsolescence" but it also is done for valid reasons as well).
I think not reading the code anymore TODAY may be a bit premature, but I don't think it's impossible to consider that someday in the nearer than further future, we might be at a point where generative systems have more predictability and maybe even get certified for safety/etc. of the generated code.. leading to truly not reading the code.
I'm not sure it's a good future, or that it's tomorrow, but it might not be beyond the next 20 year timeframe either, it might be sooner.
> 2 - 3 years there will be so much technical debt that we'll have to throw away entire pieces of software.
That happens just as often without AI. Maybe the people that like it all have experience with trashing multiple sets of products over the course of their lives?
Remember, though, this forum is full of people who consider code to be objects when it's just state in a machine.
We have been throwing away entire pieces of software forever. Where's Novell? Who runs 90s Linux kernels in prod?
Code isn't a bridge or car. Preservation isn't meaningful. If we aren't shutting the DCs off we're still burning the resources regardless if we save old code or not.
Most coders are so many layers of abstraction above the hardware at this point anyway they may as well consider themselves syntax artists as much as programmers, and think of Github as DeviantArt for syntax fetishists.
Am working on a model of /home to experiment with booting Linux to models. I can see a future where Python in my screen "runs" without an interpreter because the model is capable of correctly generating the appropriate output without one.
Code is an ethno-object; it only exists socially. It's not essential to computer operations. At the hardware level it's arithmetical operations against memory states.
Am working on my own "geometric primitives" models that know how to draw GUIs and 3D world primitives, text; think like "boot to blender". Rather than store data in strings, will just scaffold out vectors to a running "desktop metaphor".
It’s pretty well established that you cannot understand code without having thought things through while writing it. You need to know why things are written the way they are to understand what is written.
Yeah, just reading code does little to help me understand how a program works. I have to break it apart and change it and run it. Write some test inputs, run the code under a debugger, and observe the change in behavior when changing inputs.
I’ll grant you that there are many trivial software defects that can be identified by simply reading the code and making minor changes.
But for architectural issues, you need to be able to articulate how you would have written the code in the first place, once you understand the existing behavior and its problems. That is my interpretation of GP’s comment.
The coincidental timing between the rapid increase in the number of emergency fixes coming out on major software platforms and the proud announcement of the amount of code that's being produced by AI at the same companies is remarkable.
I think 2-3 years is generous.
Don't get me wrong, I've definitely found huge productivity increases in using various LLM workflows in both development as well as operational things. But removing a human from the loop entirely at this point feels reckless bordering on negligent.
My overall stance on this is that it's better to lean into the models & the tools around them improving. Even in the last 3-4 months, the tools have come an incredible distance.
I bet some AI-generated code will need to be thrown away. But that's true of all code. The real questions to me are - are the velocity gains be worth it? Will the models be so much better in a year that they can fix those problems themselves, or re-write it?
If the models don't get to the point where they can correct fixes on their own, then yeah, everything will be falling apart. There is just no other way around increasing entropy.
The only way to harness it is to somehow package code-producing LLMs into an abstraction and then somehow validate the output. Until we achieve that, imo it doesn't matter how closely people watch the output; things will be getting worse.
> If the models don't get to the point where they can correct fixes on their own
Depending on what you're working on, they are already at that point. I'm not into any kind of AI maximalist "I don't read code" BS (I read a lot of code), but I've been building a fairly expensive web app to manage my business using Astro + React and I have yet to find any bug or usability issue that Claude Code can't fix much faster than I would have (+). I've been able to build out, in a month, a fully TDD app that would have conservatively taken me a year by myself.
(+) Except for making the UI beautiful. It's crap at that.
The key that made it click is exactly what the person describes here: using specs that describe the key architecture and use cases of each section. So I have docs/specs with files like layout.md (overall site shell info), ui-components.md, auth.md, database.md, data.md, and lots more for each section of functionality in the app. If I'm doing work that touches ui, I reference layout and ui-components so that the agent doesn't invent a custom button component. If I'm doing database work, reference database.md so that it knows we're using drizzle + libsql, etc.
This extends up to higher level components where the spec also briefly explains the actual goal.
Then each feature building session follows a pattern: brainstorm and create design doc + initial spec (updates or new files) -> write a technical plan clearly following TDD, designed for batches of parallel subagents to work on -> have Claude implement the technical plan -> manual testing (often, I'll identify problems and request changes here) -> automated testing (much stricter linting, knip etc. than I would use for myself) -> finally, update the spec docs again based on the actual work that was done.
My role is less about writing code and more about providing strict guardrails. The spec docs are an important part of that.
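Concretely, the spec directory ends up looking something like this (the annotations are just my shorthand, not anything formal):

    docs/specs/
      layout.md          overall site shell
      ui-components.md   shared components, so the agent doesn't invent a custom button
      auth.md
      database.md        drizzle + libsql conventions
      data.md
      <feature>.md       one per section of functionality, updated after each session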
The proponents of Spec Driven Development argue that throwing everything out completely and rebuilding from scratch is "totally fine". Personally, I'm not comfortable with the level of churn.
Also take something into account: absolutely _none_ of the vibe-coding influencer bros make anything more complicated than a single-feature webapp that has already been implemented 50 times. They've never built anything complicated either, or maintained something for more than a few years with all the warts that entails. Literally, from his bio on his website:
> For 12 years, I led data and analytics at Indeed - creating company-wide success metrics used in board meetings, scaling SMB products 6x, managing organizations of 70+ people.
He's a manager that made graphs on Power BI.
They're not here because they want to build things, they're here to shit a product out and make money. By the time Claude has stopped being able to pipe together ffmpeg commands or glue together 3 JS libraries, they've gone on to another project and whoever bought it is a sucker.
It's not that much different from the companies of the 2000s promising a 5th generation language with a UI builder that would fix everything.
And then, as a very last warning: the author of this piece sells AI consulting services. It's in his interest to make you believe everything he has to say about AI, because by God are there going to be suckers buying his time at indecently high prices to get shit advice. This sucker is most likely your boss, by the way.
I'd have the decency to know and tell people that it's a steaming pile of shit and that I have no idea how it works, though, and would not have the shamelessness to sell a course on how to put out LLM vomit in public.
Engineering implies respect for your profession. Act like it.
But invoking No True Scotsman would imply that the focus is on gatekeeping the profession of programming. I don’t think the above poster is really concerned with the prestige aspect of whether vibe bros should be considered true programmers. They’re more saying that if you’re a regular programmer worried about becoming obsolete, you shouldn’t be fooled by the bluster. Vibe bros’ output is not serious enough to endanger your job, so don’t fret.
Unfortunately they're still too superficial. 9 times out of 10 they don't have enough context to properly implement something and end up just tacking it on in some random place with no regard for the bigger architecture. Even if you do tell it something in an AGENT.md file or something, it often just doesn't follow it.
I've seen software written and architected by Claude and I'd say that they're already ready to be thrown out. Security sucks, performance will probably suck, maintainability definitely sucks, and UX really fucking sucks.
As LLMs advance so rapidly, I think all the AI slop code written today will be easily digestible by the LLMs a few generations down the line. I think there will be a lot of improvements in making user intent clearer. Combined with larger context windows, refactoring even a bad codebase won't be a challenge.
The skills required to perform as a software engineer in an environment where competent AI agents are a commodity have shifted. Before, it was important for us to be very good at reading documentation and writing code. Now we need to be very good at writing docs, specs, and interfaces, and at reading code.
That goes a bit against the article, but it's not reading code in the traditional sense where you are looking for common mistakes we humans tend to make. Instead you are looking for clues in the code to determine where you should improve in the docs and specs you fed into your agent, so the next time you run it chances are it'll produce better code, as the article suggests.
And I think this is good. In time, we are going to be forced to think less technically and more semantically.
Or simply look at the Astro blog, which is still showing the default Astro favicon.
I say this not to discourage anyone. Building a blog—or any app—is a huge accomplishment, and you should be proud. In particular, if you’re sharing what you’ve learned and sharing your code publicly, you’re already ahead of the majority of people on this journey.
What I’d encourage you to do is keep doing what you’re doing. The best way to learn to build software is to build software. The more you do, the more you learn.
When you write code yourself, you're convinced each line is correct as you write it. That assumption is hard to shake, so you spend hours hunting for bugs that turn out to be obvious. When reading AI-generated code fresh, you lack that assumption. Bugs can jump out faster. That's at least my naive explanation to this phenomenon
Become a CTO, CEO or even a venture investor. "Here's $100K worth tokens, analyze market, review various proposals from Agents, invest tokens, maximize profit".
You know why not? Because it will be more obvious it doesn't work as advertised.
If one truly believed in LLMs being able to replace knowledge workers, then it would also hold that they could replace managers and execs. In fact, they should be able to do it even better: LLMs could convert every company into a "flat" one, bypassing the management hierarchy and directly consuming meeting notes from every meeting to get the real status as the source of truth, and provide suggestions as needed. If combined with web-search capability, they would also be more plugged into the market, customer sentiment, and competitors than most execs could ever be.
We're not at the point where we are replacing all software developers entirely (and will never be without real AGI), but we are definitely at the point where scaling back headcount is possible.
Also, creating software is much more testable and verifiable than what a CEO does. You can usually tell when the code isn't right because it doesn't work or doesn't pass a test. How can you verify that your AI CEO is giving you the right information or planning its business strategy effectively?
It's one of the biggest reasons that software development and art are the two domains in which AI excels. In software you can know when it's right, and in art it doesn't matter if it's right.
You have to move up or down to survive. In 10 years we'll either be managers (either of humans or agents), or we'll be electrical engineers. Programming is done! I for one am glad.
* AI can replace knowledge workers - most existing software engineers and managers at all levels will lose their jobs and have to re-qualify.
* AI requires human in the loop.
In the first scenario, I see no reason to waste time and should start building plan B now (remaining job markets will be saturated at that point).
In the second scenario, tech-debt and zettabytes of slop will harm companies which relied on it heavily. In the age of failing giants and crumbling infrastructure, engineers and startups that can replace gigawatt burning data center with a few kilowatt rack, by manually coding a shell script that replaces Hadoop, will flourish.
Most probably it will be a spectrum - some roles can be replaced, some not.
I still think this is mostly people who never could hack it at coding taking to the new opportunities that these tools afford them without having to seriously invest in the skill, and basking in touting their skilless-ness being accepted as the new temporary cool.
Which is perhaps what they should do, of course. Any transition is a chance to get ahead and redefine yourself.
Just FYI, this is the attitude that causes pro-AI people to start shit-talking anti-AI folks as Luddites who need to learn to use the tools.
Agents are a quality/velocity tradeoff (which is often good), if you can't debug stuff without them that's a problem as you'll get into holes, but that doesn't mean you have to write the code by hand.
I enjoy new technology in general, so I very much keep up with the tools and also like using them for the things they do well at any given moment. I'm not among the Luddites, FWIW. I think there's a lot of legitimately great building going on right now.
Note though we're talking about "not reading code" in context, not the writing of it.
Author is a former data analytics product manager (already a bit of a tea leaf reading domain) who says he never reads code and is now marketing himself as a new class of developer.
Parent post sounds like a very accurate description.
I completely agree in a sense - the cost of producing software is plummeting, and it's leading to me being able to develop things that I would never have invested months in before.
This blog post is written by a product manager, not a programmer. Their CV speaks to an Economics background, a stint in market research, writing small scripting-type programs ("Cron+MySQL data warehouse") and then off to the product management races.
What it's trying to express is that the (T)PM job should still be safe because they can just team-lead a dozen agents instead of software developers.
Take with a grain of salt when it comes to relevance for "coding", or the future role breakdown in tech organizations.
I'm not trying to express that my particular flavor of career is safe. I think that the ability to produce software is much less about the ability to hand-write code, and that's going to continue as the models and ecosystem improve, and I'm fascinated by where that goes.
>I think the industry is moving left. Toward specs. The code is becoming an implementation detail. What matters is the system that produces it - the requirements, the constraints, the architecture. Get those right, and the code follows.
So basically a return to waterfall design.
Rather than YOLO planning (agile), we go back to YOLO implementation (farming it out to dozens of replaceable peons, but this time they're even worse).
I really wish posts like this explained what sort of development they are doing. Is this for an internal CRUD server? Internal React app? Scala server with three instances? Golang server with complex AWS configuration? 10k lines? 100k lines? 1M+? Externally facing? iOS app? Algorithm-heavy photo processing desktop app? It would give me a much better idea of whether the argument is reasonable, and whether it is applicable for the kind of software I generally write.
He makes <10k cloc websites trying to sell you a spec-creation wizard[0]. Considering Claude wrote the site, it could probably be written in 1/10th of the lines.
I think these tools are great for allowing non-technical people like OP to create landing pages and small prototypes, but they're useless for anything serious. That said, I applaud OP for embodying the "when in a gold rush, sell shovels" mentality.
You're completely right and I wish I had in retrospect... I was honestly just talking mostly in broad terms, but people really (maybe rightly) focused on the "not reading code" snippet.
I'm mostly developing my own apps and working with startups.
At least going by their own CV, they've mostly written what sounds like small scripting-type programs described in grandiose terms like "data warehouse".
When I talk with people in the space, go to meetups, present my work & toolset, I am usually one of the more advanced people in the conversation / group, though usually not THE most advanced. I'm not saying I'm some sort of genius, I'm just saying I'm relatively near the leading edge of how to use these tools. I feel like it's true.
turning a big dial taht says "Psychosis" on one side and "Wishful thinking" on the other and constantly looking back at the LinkedIn audience for approval like a contestant on the price is right
Why have a spec when I have the concrete implementation and a system ready and willing to answer any questions I have about it? I don't understand why people value an artifact that can be out of sync with reality over the actual reality. The LLM can answer questions based on the code. We might drift away from needing a code editor, but I likely won't be drifting to reading specs in a world where I can converse with the deployed implementation.
I think the idea is more to program the prompter than to program the LLM. He sells a wizard for generating project specs. Anyone can do this with a normal LLM conversation, but I suppose some people forget
> Here’s the thing: I don’t read code anymore. I used to write code and read code. Now when something isn’t working, I don’t go look at the code.
Recently I picked a smallish task from our backlog. This is some code I'm not familiar with, frontend stuff I wouldn't tackle normally.
Claude wrote something. I tested, it didn't work. I explained the issue. It added a bunch of traces, asked me to collect the logs, figured out a fix, submitted the change.
Got a bunch of linter errors that I don't understand, and that I copied and pasted to Claude. It fixed something, but I still got lint errors, which Claude dismissed as irrelevant, but I realized I wasn't happy with the new behavior.
After 3 days of iteration, my change seems ok, passed the CI, the linters, and automatic review.
At that stage, I have no idea if this is the right way to fix the problem, and if it breaks something, I won't be able to fix it myself as I'm clueless. Also, it could be that a human reviewer tells me it's totally wrong, or asks me questions I won't be able to answer.
Not only was this process not fun at all, but I also didn't learn anything, and I may introduce technical debt which AI may not be able to fix.
I agree that coding agents can boost efficiency in some cases, but I don't see a shift left of IDEs at that stage.
My rule is 3 tries then dig deeper. Sometimes I don't even wait that long, certain classes of bugs are easy for humans to detect but hard for agents, such as CSS issues. Try asking the agent to explain/summarize the code that's causing the problem and double checking against docs for the version you're using, that solves a lot of problems.
Spec is too low level in my experience. The graph continues far further to the left.
I tried doing clean room reimplementations from specs, and just ended up with even worse garbage. Cause it kept all the original garbage and bloated it further!
Giving it a description of what you're actually trying to do works way better. Then it finds the most elegant solution to the problem, both in terms of the code and the UI design.
I don't like the craft of the app. There are a few moments that really left me feeling it wasn't 100 percent thought through the way Cursor is at this point.
Why create an IDE without IDE features? What's the benefit of this over using an IDE with a Codex plugin? I don't believe you can review code without traversal by references, so it looks like it's directed towards toy projects / noobs. And the agents are not yet near the autonomy that would make code review unnecessary in complex systems.
It's nano banana - I actually noticed the same thing. I didn't prompt it as such.
Here's the prompt I used, actually:
Create a vibrant, visually dynamic horizontal infographic showing the spectrum of AI developer tools, titled "The Shift Left"
Layout: 5 distinct zones flowing RIGHT TO LEFT as a journey/progression. Use creative visual metaphors — perhaps a road, river, pipeline, or abstract flowing shapes connecting the stages. Each zone should feel like its own world but connected to the others.
2. "Multi-Agent Orchestration" - Claude Code logo, Codex CLI logo, Codex App logo, Conductor logo
Label: "Parallel agents, fire & forget"
3. "Agentic IDE" - Cursor logo, Windsurf logo
Label: "Autonomous multi-file edits"
4. "Code + AI" - GitHub Copilot logo
Label: "Inline suggestions"
5. "Code" (rightmost) - VS Code logo
Label: "Read & write files"
Visual style: Fun, energetic, modern. Think illustrated tech landscape or isometric world. NOT a boring corporate chart. Use warm off-white background (#faf8f5) with amber/orange (#b45309) as the primary accent color throughout. Add visual flair — icons, small illustrations, depth, texture, but don't make it visually overloaded.
> Where IDEs are headed and why specs matter more than code.
We are very far away from this being a settled or agreed upon statement and I really struggle to understand how one vendor making a tool is indicative of an industry practice.
Clearly written by someone who has no systems of importance in production. If my code fails, people lose money, planes halt, cars break down. Read. The. Code.
Yes, but also ... the analogy to assembly is pretty good. We're moving pretty quickly towards a world where we will almost never read the code.
You may read all the assembly that your compiler produces. (Which, awesome! Sounds like you have a fun job.) But I don't. I know how to read assembly and occasionally do it. But I do it rarely enough that I have to re-learn a bunch of stuff to solve the hairy bug or learn the interesting system-level thing that I'm trying to track down if I'm reading the output of the compiler. And mostly even when I have a bug down at the level where reading assembly might help, I'm using other tools at one or two removes to understand the code at that level.
I think it's pretty clear that "reading the code" is going to go the way of reading compiler output. And quite quickly. Even for critical production systems. LLMs are getting better at writing code very fast, and there's no obvious reason we'll hit a ceiling on that progress any time soon.
In a world where the LLMs are not just pretty good at writing some kinds of code, but very good at writing almost all kinds of code, it will be the same kind of waste of time to read source code as it is, today, to read assembly code.
Compilers predictably transform one kind of programming language code to CPU (or VM) instructions. Transpilers predictably transform one kind of programming language to another.
We introduced various instruction architectures, compiler flags, reproducible builds, checksums exactly to make sure that whatever build artifact that's produced is super predictable and dependable.
That reproducibility is how we can trust our software and that's why we don't need to care about assembly (or JVM etc.) specifics 99% of the time. (Heck, I'm not familiar with most of it.)
Same goes for libraries and frameworks. We can trust their abstractions because someone put years or decades into developing, testing and maintaining them and the community has audited them if they are open-source.
It takes a whole lot of hand-waving to traverse from this point to LLMs - which are stochastic by nature - transforming natural language instructions (even if you call it "specs", it's fundamentally still a text prompt!) to dependable code "that you don't need to read" i.e. a black box.
The analogy to assembly is wrong. Even in a high level language, you can read the code and reason about what it does.
What's the equivalent for an LLM? The string of prompts that non-deterministically generates code?
Also, if LLM output is analogous to assembly, then why is that what we're checking in to our source control?
LLMs don't seem to solve any of the problems I had before LLMs existed. I never worried about being able to generate a bunch of code quickly. The problem that needs to be solved is how to better write code that can be understood, and easily modified, with a high degree of confidence that it's correct, performs well, etc. Using LLMs for programming seems to do the opposite.
I think it's the performative aspects that are grating, though. You're right that even many systems programmers only look at the generated assembly occasionally, but at least most of them have the good sense to respect the deeper knowledge of mechanism that is to be found there, and many strive to know more eventually. Totally orthogonal to whether writing assembly at scale is sensible practice or not.
But with the AI tools we're not yet at the wave of "sometimes it's good to read the code" virtue signaling blog posts that will make front page next year or so, and still at the "I'm the new hot shit because I don't read code" moment, which is all a bit hard to take.
I mean, fair enough. Obviously there are different levels of criticality in any production environment. I'm building consumer products and internal tools, not safety-critical systems.
Even in those environments, I'd argue that AI coding can offer a lot in terms of verification & automated testing. However, I'd probably agree, in high-stakes safety environments, it's more of a 'yes and' than an either/or.
I think a lot of AI bros are sleeping on quality. Prior startup wisdom was “move fast and break things”. Speed is ubiquitous now. Practically anyone can vibe code a buggy solution that works for their happy path. If that’s the bar, why would I pay for your jank solution when I can make my own tailored to my exact needs? Going fast is a race to the bottom in the long run.
What’s worth paying for is something that is trustworthy.
Claude Code is a perfect example: they blocked tools like opencode because they know quality is the only moat, and they don’t currently have it.
Hell, I see the big banner picture hallucinated from a prompt and all I see is an unproductive mess. I won't comment on the takes the article makes; they're just miserable.
>Here’s the thing: I don’t read code anymore. I used to write code and read code. Now when something isn’t working, I don’t go look at the code. I don’t question the code. I either ask one of my coding agents, or - more often - I ask myself: what happened with my system? What can I improve about the inputs that led to that code being generated?
Good luck debugging any non-trivial problem in such a codebase.
Not to mention data retention and upgrade management.
When an update script jacks up the guaranteed-to-be-robust vibed data setup in this first of a kind, one of a kind, singular installation… what then?
The pros have separate dev, test, QA, and prod environments. Immutable servers, NixOS, containers, git, and rollback options in orchestration frameworks. Why? Because uh-oh, oh-shit, say-what, no-you’re-kidding, oh-fuck, and oops are omnipresent.
MS Access was a great product with some scalability ceilings that took engineering to work past. MS Access solutions growing too big then imploding was a real concern that bit many departments. MS Access was not dumping 15,000 LoC onto the laps of these non-developers and telling them they are hybrid spirit code warriors with next level hacking skills.
Ruby on Rails, Wordpress, SharePoint… there are legitimately better options out there for tiny-assed self-serving CRUD apps and cheap developer ecosystems. They’re not quite as fun, tho, and they don’t gas people up as well.
I have always thought that AI code generation is an irresistible attraction for those personalities who lack the technical skills or knowledge necessary for programming, but nevertheless feel undeservedly like geniuses. This post is proof of that.
Also, the generated picture in this post makes me want to kick someone in the nuts. It doesn't explain anything.
Is the image really not that clear? There are IDE-like tools that all are focusing on different parts of the Spec --> Agent --> Code continuum. I think it illustrates that all right.
I really wonder why nobody is talking about how it is more important to be able to test the code.
9/10 my AI-generated code is bad before my verification layers; 9/10 it's good after.
Claude fights through your rules. And if you code in another language you could use other agents to verify code.
This is the challenge now: effectively verifying the code. Whenever I end up with a bad response I ask myself what layers I could set to stop the AI as early as possible.
Also things like naming, comments, tree traversal, context engineering, even data structures, multi-agenting. I know it sounds like buzzwords, but these are the topics a software engineer really should think about. Everything else is frankly cope.
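To make "layers" a bit more concrete - this is just a minimal sketch, assuming a Python project and off-the-shelf tools (ruff, mypy, pytest) rather than anything the commenter above necessarily runs - a fail-fast gate script could look like:

```python
#!/usr/bin/env python3
"""Fail-fast verification gates: stop a bad AI-generated change as early as possible.

Hypothetical sketch -- the tool names below are assumptions, swap in your own.
"""
import subprocess
import sys

# Cheapest checks first, so an obviously bad change never reaches the slow ones.
GATES = [
    ("format/lint", ["ruff", "check", "."]),
    ("type check", ["mypy", "src"]),
    ("unit tests", ["pytest", "-q", "tests/unit"]),
    ("integration tests", ["pytest", "-q", "tests/integration"]),
]

def main() -> int:
    for name, cmd in GATES:
        print(f"[gate] {name}: {' '.join(cmd)}")
        result = subprocess.run(cmd)
        if result.returncode != 0:
            # Feed this failure straight back to the agent instead of reading on.
            print(f"[gate] {name} failed -- rejecting the change here.")
            return result.returncode
    print("[gate] all layers passed")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```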
I think people attacking "don't read the code" are not considering the status quo - they're comparing to some perfect world where staff engineers are reading every line of code. That's not even close to happening. Test-driven development is something most engineers just won't put up with... AIs will do it, no problem. If I can automate ten different checks for every commit, is my code really getting looked at less?
> Put another way, if you don't know what correct is before you start working then no tradeoff exists.
This goes out the window the first time you get real users, though. Hyrum's Law bites people all the time.
"What sorts of things can you build if you don't have long-term sneaky contracts and dependencies" is a really interesting question and has a HUGE pool of answers that used to be not worth the effort. But it's largely a different pool of software than the ones people get paid for today.
> This goes out the window the first time you get real users, though.
Not really. Many users are happy for their software to change if it's a genuine improvement. Some users aren't, but you can always fire them.
Certainly there's a scale beyond which this becomes untenable, but it's far higher than "the first time you get real users".
But that's not what this is about:
> For many projects, maybe ~80% of the thinking about how the software should work happens after some version of the software exists and is being used to do meaningful work.
Some version of the software exists and now that's your spec. If you don't have a formal copy of that and rigorous testing against that spec, you're gonna get mutations that change unintended things, not just improvements.
Users are generally ok with - or at least understanding of - intentional changes, but now people are talking about no-code-reading workflows, where you just let the agents rewrite stuff on the fly to build new things until all the tests pass again. The in-code tests and the expectations/assumptions about the product that your users have are likely wildly different - they always have been, and there's nothing inherent about LLM-generated code or about code test coverage percentages that change this.
"Some users will _accept_ "improvements" IFF it doesn't break their existing use cases."
Fixed that for you.
> So the tradeoffs involved are not only about developer time vs. performance, but also correctness.
The "now that producing plausible code is free, verification becomes the bottleneck" people are technically right, of course, but I think they're missing the context that very few projects cared much about correctness to begin with.
The biggest headache I can see right now is just the humans keeping track of all the new code, because it arrives faster than they can digest it.
But I guess "let go of the need to even look at the code" "solves" that problem, for many projects... Strange times!
For example -- someone correct me if I'm wrong -- OpenClaw was itself almost entirely written by AI, and the developer bragged about not reading the code. If anything, in this niche, that actually helped the project's success, rather than harming it.
(In the case of Windows 11 recently.. not so much ;)
> The "now that producing plausible code is free, verification becomes the bottleneck" people are technically right, of course, but I think they're missing the context that very few projects cared much about correctness to begin with.
It's certainly hard to find, in consumer-tech, an example of a product that was displaced in the market by a slower moving competitor due to buggy releases. Infamously, "move fast and break things" has been the rule of the land.
In SaaS and B2B deterministic results becomes much more important. There's still bugs, of course, but showstopper bugs are major business risks. And combinatorial state+logic still makes testing a huge tarpit.
The world didn't spend the last century turning customer service agents and business-process-workers into script-following human-robots for no reason, and big parts of it won't want to reintroduce high levels of randomness... (That's not even necessarily good for any particular consumer - imagine an insurance company with a "claims agent" that got sweet talked into spending hundreds of millions more on things that were legitimate benefits for their customers, but that management wanted to limit whenever possible on technicalities.)
OK but, I've definitely read the assembly listings my C compiler produced when it wasn't working like I hoped. Even if that's not all that frequent it's something I expect I have to do from time to time and is definitely part of "programming".
It's also important to remember that vibe coders throw away the natural language spec each time they close the context window.
Vibe coding is closer to compiling your code, throwing the source away and asking a friend to give you source that is pretty close to the one you wrote.
> which is faithfully translated by the (hopefully bug-free) compiler.
"Hey Claude, translate this piece of PHP code into Power10 assembly!"
Imagine if high level coding worked like: write a first draft, and get assembly. All subsequent high level code is written in a repl and expresses changes to the assembly, or queries the state of the assembly, and is then discarded. only the assembly is checked into version control.
Or the opposite, all applications are just text files with prompts in them and the assembly lives as ravioli in many temp files. It only builds the code that is used. You can extend the prompt while using the application.
I'm glad you wrote this comment because I completely agree with it. I don't think the need goes away for software engineers who deeply consider architecture; who can fully understand the truly critical systems that exist at most software companies; who can help dream up the harness capabilities to make these agents work better.
I just am describing what I'm doing now, and what I'm seeing at the leading edge of using these tools. It's a different approach - but I think it'll become the most common way of producing software.
> that is they are describing a new type of programmer that will only use agents and never read the underlying code
> and wouldn't look at the code anymore than, say, a PHP developer would look at the underlying assembly
This really puts down the work that the PHP maintainers have done. Many people spend a lot of time crafting the PHP codebase so you don't have to look at the underlying assembly. There is a certain amount of trust that I as a PHP developer assume.
Is this what the agents do? No. They scrape random bits of code everywhere and put something together with no craft. How do I know they won't hide exploits somewhere? How do I know they don't leak my credentials?
That is true for all languages. Very high quality until you use a lib, a module or an api.
> Imagine taking a picture on autoshot mode and refusing to look at it. If the client doesn’t like it because it’s too bright, tweak the settings and shoot again, but never look at the output.
The output of code isn't just the code itself, it's the product. The code is a means to an end.
So the proper analogy isn't the photographer not looking at the photos, it's the photographer not looking at what's going on under the hood to produce the photos. Which, of course, is perfectly common and normal.
>The output of code isn't just the code itself, it's the product. The code is a means to an end.
I’ll bite. Is this person manually testing everything that one would regularly unit test? Or writing black box tests that he knows are correct because they were manually written?
If not, you’re not reviewing the product either. If yes, it’s less time-consuming to actually read and test the damn code.
I mostly ignore code, I lean on specs + tests + static analysis. I spot check tests depending on how likely I think it is for the agent to have messed up or misinterpreted my instructions. I push very high test coverage on all my projects (85%+), and part of the way I build is "testing ladders" where I have the agent create progressively bigger integration tests, until I hit e2e/manual validation.
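One way a "testing ladder" could be wired up - a hedged sketch using pytest markers, with rung names and the --rung option invented for illustration rather than taken from the comment above - is to make the agent pass the cheap rungs before the expensive ones even run:

```python
# conftest.py -- hypothetical sketch of a "testing ladder" with pytest markers.
import pytest

LADDER = ["unit", "component", "integration", "e2e"]  # cheapest rung first

def pytest_addoption(parser):
    parser.addoption("--rung", default="unit",
                     help="highest rung of the ladder to run: " + ", ".join(LADDER))

def pytest_configure(config):
    for rung in LADDER:
        config.addinivalue_line("markers", f"{rung}: test sits on the '{rung}' rung")

def pytest_collection_modifyitems(config, items):
    top = config.getoption("--rung")
    allowed = set(LADDER[: LADDER.index(top) + 1])
    skip = pytest.mark.skip(reason=f"above the '{top}' rung of the ladder")
    for item in items:
        marked = {m.name for m in item.iter_markers()} & set(LADDER)
        rung = next(iter(marked), "unit")  # unmarked tests count as unit tests
        if rung not in allowed:
            item.add_marker(skip)
```

The agent then has to climb: `pytest --rung=unit`, then `--rung=integration`, and only once those pass do the slow e2e tests (and finally manual validation) come into play.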
"I push very high test coverage on all my projects (85%+)"
Coverage doesn't matter if the tests aren't good. If you're not verifying the tests are actually doing something useful, talking about high coverage is just wanking.
"have the agent create progressively bigger integration tests, until I hit e2e/manual validation."
Same thing. It doesn't matter how big the tests are if they're not testing the right thing. Also why is e2e slashed with manual? Those are orthogonal. E2E tests can [and should] be fully automated for many [most?] systems. And manual validation doesn't have to wait for full e2e.
>I spot check tests depending on how likely I think it is for the agent to have messed up or misinterpreted my instructions
So a percentage of your code, based on your gut feeling, is left unseen by any human by the time you submit it.
Do you agree that this raises the chance of bugs slipping by? I don’t see how you wouldn’t.
And considering that your code output is larger, the percentage of it that is buggy is larger, and (presumably) you write faster, have you considered what that means for the compounding likelihood of incidents?
There's definitely a class of bugs that are a lot more common, where the code deviates from the intent in some subtle way, while still being functional. I deal with this using benchmarking and heavy dogfooding, both of these really expose errors/rough edges well.
"Testing ladders" is a great framing.
My approach is similar. I invest in the harness layer (tests, hooks, linting, pre-commit checks). The code review happens, it's just happening through tooling rather than my eyeballs.
Exactly this. The code is an intermediate artifact - what I actually care about is: does the product work, does it meet the spec, do the tests pass?
I've found that focusing my attention upstream (specs, constraints, test harness) yields better outcomes than poring over implementation details line by line. The code is still there if I need it. I just rarely need it.
People miss this a lot. Coding is just a (small) part of building a product. You get a much better bang for the buck if you focus your time on talking to the user, dogfooding, and then vibecoding. It also allows you to do many more iterations, even with large changes, because since you didn't "write" the code, you don't care about throwing it away.
A photo isn't going to fail next week or three months from now because it's full of bugs no one's triggered yet.
Specious analogies don't help anything.
Right, it seems the appropriate analogy is the shift from analog-photograph-developers to digital camera photographers.
The product is: solving a problem. Requirements vary.
AI-assisted coding is not a black box in the way that managing an engineering team of humans is. You see the model "thinking", you see diffs being created, and occasionally you intervene to keep things on track. If you're leveraging AI professionally, any coding has been preceded by planning (the breadth and depth of which scale with the task) and test suites.
Don’t read the code, test for desired behavior, miss out on all the hidden undesired behavior injected by malicious prompts or AI providers. Brave new world!
You made me imagine AI companies maliciously injecting backdoors in generated code no one reads, and now I'm scared.
My understanding is that it's quite easy to poison the models with inaccurate data, I wouldn't be surprised if this exact thing has happened already. Maybe not an AI company itself, but it's definitely in the purview of a hostile actor to create bad code for this purpose. I suppose it's kind of already happened via supply chain attacks using AI generated package names that didn't exist prior to the LLM generating them.
One mitigation might be to use one company's model to check the work of another company's code and depend on market competition to keep the checks and balances.
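As a sketch of what that cross-vendor check could look like - the reviewer objects and their `complete()` call below are placeholders for whichever SDKs you would actually wire in, and the JSON verdict format is made up:

```python
# Hedged sketch of cross-vendor review: ask models from *other* companies to
# flag anything suspicious in a diff written by the first model. The reviewer
# objects and their complete() method are hypothetical stand-ins, not a real SDK.
import json

REVIEW_PROMPT = """You did not write this diff. Review it strictly for backdoors,
credential leaks, and calls to unexpected network endpoints. Reply as JSON:
{"suspicious": true or false, "findings": ["..."]}

DIFF:
"""

def cross_check(diff: str, reviewers: list) -> list[str]:
    findings = []
    for reviewer in reviewers:
        raw = reviewer.complete(REVIEW_PROMPT + diff)  # placeholder SDK call
        verdict = json.loads(raw)
        if verdict.get("suspicious"):
            findings.extend(verdict.get("findings", []))
    return findings  # empty list == no reviewer objected
```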
Then how many models deep do you go before it's more cost effective to just hire a junior dev, supply them with a list of common backdoors, and have them scan the code?
What about writing the actual code yourself
Nah, more fun to burn money.
Already happening in the wild
The output is the program behavior. You use it, like a user, and give feedback to the coding agent.
If the app is too bright, you tweak the settings and build it again.
Photography used to involve developing film in dark rooms. Now my iPhone does... god knows what to the photo - I just tweak in post, or reshoot. I _could_ get the raw, understand the algorithm to transform that into sRGB, understand my compression settings, etc - but I don't need to.
Similarly, I think there will be people who create useful software without looking at what happens in between. And there will still be low-level software engineers for whom what happens in between is their job.
> What is the logic here?
It is right often enough that your time is better spent testing the functionality than the code.
Sometimes it’s not right, and you need to re-instruct (often) or dive in (not very often).
I can’t imagine retesting all the functionality of a well-established product for possible regressions not being stupidly time-consuming. This is the very reason why we have unit tests in the first place, and why they are far more numerous than end-to-end ones.
> I can’t imagine any other example where people voluntarily move for a black box approach.
Anyone overseeing work from multiple people has to? At some point you have to let go and trust people’s judgement, or, well, let them go. Reading and understanding the whole output of 9 concurrently running agents is impossible. People who do that (I’m not one of them btw) must rely on higher level reports. Maybe drilling into this or that piece of code occasionally.
>At some point you have to let go and trust people’s judgement.
Indeed. People. With salaries, general intelligence, a stake in the matter and a negative outcome if they don’t take responsibility.
>Reading and understanding the whole output of 9 concurrently running agents is impossible.
I agree. It is also impossible for a person to drive two cars at once… so we don’t. Why is the starting point of the conversation that one should be able to use 9 concurrent agents?
I get it, writing code no longer has a physical bottleneck. So the bottleneck becomes the next thing, which is our ability to review outputs. It’s already a giant advancement, why are we ignoring that second bottleneck and dropping quality assurance as well? Eventually someone has to put their signature on the thing being shippable.
Is reviewing outputs really more efficient than writing the code? Especially if it's a code base you haven't written code in?
It is not. To review code you need to have an understanding of the problem that can only be built by writing code. Not necessarily the final product, but at least prototypes and experiments that then inform the final product.
An AI agent cannot be held accountable
Neither can employees, in many countries.
> Anyone overseeing work from multiple people has to?
That's not a black box though. Someone is still reading the code.
> At some point you have to let go and trust people’s judgement
Where's the people in this case?
> People who do that (I’m not one of them btw) must rely on higher level reports.
Does such a thing exist here? Just "done".
> Someone is still reading the code.
But you are not. That’s the point?
> Where's the people in this case?
Juniors build worse code than Codex. Their superiors also can’t check everything they do. They need to have some level of trust for doing dumb shit, or they can’t hire juniors.
> Does such a thing exist here? Just "done".
Not sure what you mean. You can definitely ask the agent what it built, why it built it, and what could be improved. You will get only part of the info vs when you read the output, but it won’t be zero info.
You: "Why did you build this?"
LLM: "Because the embeddings in your prompt are close to some embeddings in my training data. Here's some seemingly explanatory text with that is just similar embeddings to other 'why?' questions."
You: "What could be improved?"
LLM: "Here's some different stuff based on other training data with embeddings close to the original embeddings, but different.
---
It's near zero useful information. Example information might be "it builds" (baseline necessity, so useless info), "it passes some tests" (fairly baseline, more useful, but actually useless if you don't know what the tests are doing), or "it's different" (duh).
> You can definitely ask the agent what it built, why it built it, and what could be improved.
If that was true we’d have what they call AGI.
So no, it doesn’t actually give you those since it can’t reason and logic in such a way.
> I can’t imagine any other example where people voluntarily move for a black box approach.
I can think of a few. The last 78 pages of any 80-page business analysis report. The music tracks of those "12 hours of chill jazz music" YouTube videos. Political speeches written ahead of time. Basically - anywhere that a proper review is more work than the task itself, and the quality of output doesn't matter much.
So... things where the producer doesn't respect the audience? Because any such analysis would be worth as much as a 4.5 hour atonal bass solo.
You can get an AI to listen to that bass solo for you
But can you get an AI to zone out on a fluffy couch at the center point of a dank hi-fi setup with the volume cranked to 11, while chillin' on 50mg of THC?
And will you enjoy paying someone else to let the AI to do that?
No pun intended but - it's been more "vibes" than science that I've done this. It's more effective. When I focus my attention on the harness layer (tests, hooks, checks, etc), and the inputs, my overall velocity improves relative to reading & debugging the code directly.
To be fair - it is not accurate to say I absolutely never read the code. It's just rare, and it's much more the exception than the rule.
My workflow just focuses much more on the final product, and the initial input layer, not the code - it's becoming less consequential.
> What is the logic here? Because if you can read code, I can’t imagine poking the result with black box testing being faster.
It's producing seemingly working code faster than you can closely review it.
Your car can also move faster than what you can safely control. Knowing this, why go pedal to the metal?
Same. I stopped reading after that. I get the sense that most of these people think all code is web or mobile or something non-critical. Granted, I'm not a web or mobile guy so I can't presume the complexity, risk, or cost of such things. But I assume it's in a different category than safety/mission-critical things. I do dev tools for ASIL-B systems devs now and even then I can't say I'm comfortable not reading the generated code. Some of my junior peers are though, and I'm very frustrated that I feel like I keep having to play AI janitor; I don't think the bosses care.
I think this is the logical next step -- instead of manually steering the model, just rely on the acceptance criteria and some E2E test suite (that part is tricky, since you need to verify it too).
I personally think we are not that far from it, but it will need something built on top of current CLI tools.
> Because if you can read code, I can’t imagine poking the result with black box testing being faster.
I don't know... it depends on the use case. I can't imagine even the best front-end engineer ever can read HTML faster than looking at the rendered webpage to check if the layout is correct.
Good analogy.
> What is the logic here? Because if you can read code, I can’t imagine poking the result with black box testing being faster.
The AI also writes the black box tests, what am I missing here?
>The AI also writes the black box tests, what am I missing here?
If the AI misinterpreted your intentions and/or missed something in productive code, tests are likely to reproduce rather than catch that behavior.
In other words, if “the ai is checking as well” no one is.
That's true. Never let the AI know about the code it wrote when writing the test for sure. Write multiple tests, have an arbitrator (also AI) figure out if implementation or tests are wrong when tests fail. Have the AI heavily comment code and heavily comment tests in the language of your spec so you can manually verify if the scenarios/parts of the implementations make sense when it matters.
etc...etc...
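Roughly, the loop being described might look like this - a minimal sketch where the agent calls (write_implementation, write_tests, arbitrate) are hypothetical placeholders for whatever coding agent you actually drive, not a real API:

```python
# Hedged sketch of "independent tests + AI arbitrator". The three agent calls
# are stubs: plug in your own coding-agent CLI/SDK where they raise.
from dataclasses import dataclass

@dataclass
class Verdict:
    blame: str    # "implementation" or "tests"
    reason: str

def write_implementation(spec: str) -> str:
    raise NotImplementedError("call your coding agent with the spec only")

def write_tests(spec: str) -> str:
    raise NotImplementedError("separate agent session: sees the spec, never the code")

def run_tests(impl: str, tests: str) -> tuple[bool, str]:
    raise NotImplementedError("run the suite, return (passed, failure output)")

def arbitrate(spec: str, impl: str, tests: str, failures: str) -> Verdict:
    raise NotImplementedError("third agent decides which side deviated from the spec")

def build(spec: str, max_rounds: int = 5) -> str:
    impl = write_implementation(spec)
    tests = write_tests(spec)          # written blind to the implementation
    for _ in range(max_rounds):
        passed, failures = run_tests(impl, tests)
        if passed:
            return impl
        verdict = arbitrate(spec, impl, tests, failures)
        if verdict.blame == "implementation":
            impl = write_implementation(spec + "\n\nKnown failure to fix:\n" + failures)
        else:
            tests = write_tests(spec + "\n\nPrevious tests were wrong: " + verdict.reason)
    raise RuntimeError("never converged -- time for a human to actually read the code")
```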
> In other words, if “the ai is checking as well” no one is.
"I tried nothing, and nothing at all worked!"
your metaphor is wrong.
code is not the output. functionality is the output, and you do look at that.
Explain then how testing the functionality (not just the new functionality; regressions included, this is not a school exercise) is faster than checking the code.
Are you writing black box tests by hand, or manually checking, everything that would normally be a unit test? We have unit tests precisely because of how unworkable the “every test is black box” approach is.
People care about results. Better processes need to produce better results. This is programming, not a belief system where you have to adhere to some view or else.
>Imagine taking a picture on autoshot mode
Almost everyone does this. Hardly anyone taking pictures understands what f-stop or focal length are. Even those who do seldom adjust them.
There are dozens of other examples where people voluntarily move to a black box approach. How many Americans drive a car with a manual transmission?
Hey it's me! I shoot with manual focus lenses in RAW and drive a standard. There are dozens of us!
You missed out on the rest of the analogy though, which is the part where the photo is not reviewed before handing it over to the client.
I find so many of these comments and debates fascinating as a lay person. I'm more tech savvy than most I meet, built my own PCs, know my way around some more 'advanced' things like the terminal a bit, and have a deeper understanding of computer systems, software, etc. than most people I know. It has always been more of a hobby for me. People look at me as the 'tech' guy even though I'm actually not.
Something I know very little about is coding. I know there are different languages with pros and cons to each. I know some work across operating systems while others don't but other than that I don't know too much.
For the first time I just started working on my own app in Codex and it feels absolutely amazing and magical. I've not seen the code, would have basically no idea how to read it, but I'm working on a niche application for my job that is custom tailored to my needs, and if it works I'll be thrilled. Even better, the process of building just feels so special and awesome.
This really does feel like it is on the precipice of something entirely different. I think back to computers before a GUI interface. I think back to even just computers before mobile touch interfaces. I am sure there are plenty of people who thought some of these things wouldn't work for different reasons but I think that is the wrong idea. The focus should be on who this will work for and why and there, I think, there are a ton of possibilities.
For reference, I'm a middle school Assistant Principal working on an app to help me with student scheduling.
Keep building and keep learning, I think you are the kind of user that stands to benefit the most from this technology.
After 10+ years of stewing on an idea, I started building an app (for myself) that I've never had the courage or time to start until now.
I really wanted to learn the coding, the design patterns, etc, but truthfully, it was never gonna happen without a Claude. I could never get past the unknown-unknowns (and I hadn't even grasped how broad the required domain of knowledge actually is.) Best case I would have started small chunks and abandoned it countless times, piling on defeatism and disappointment each time.
Now in under two weeks of spare time and evenings, I've got a working prototype that's starting to resemble my dream. Does my code smell? Yes. Is it brittle? Almost certainly. Is it a security risk? I hope not. (It's not.)
I want to be intentional about how I use AI; I'm nervous about how it alters how we think and learn. But seeing my little toy out in the real world is flippin incredible.
> Is it a security risk? I hope not. (It's not.)
It very probably is, but if it's a personal project you're not planning on releasing anywhere, it doesn't matter much.
You should still be very cognizant that LLMs will currently fairly reliably implement massive security risks once a project grows beyond a certain size, though.
They can also identify and fix vulnerabilities when prompted. AI is being used heavily by security researchers for this purpose.
It’s really just a case of knowing how to use the tools. Said another way, the risk is being unaware of what the risks are. And awareness can help one get out of the bad habits that create real world issues.
My observation is that "AI" makes easy things easier and hard things impossible. You'll get your niche app out of it, you'll be thrilled, then you'll need it to do more. Then you will struggle to do more, because the AI created a pile of technical debt.
Programmers dream of getting a green field project. They want to "start it the right way this time" instead of being stuck unwinding technical debt on legacy projects. AI creates new legacy projects instantly.
> I don’t read code anymore
Never thought this would be something people actually take seriously. It really makes me wonder if in 2 - 3 years there will be so much technical debt that we'll have to throw away entire pieces of software.
> Never thought this would be something people actually take seriously
The author of the article has a bachelor's degree in economics[1], worked as a product manager (not a dev) and only started using GitHub[2] in 2025 when they were laid off[3].
[1] https://www.linkedin.com/in/benshoemaker000/
[2] https://github.com/benjaminshoemaker
[3] https://www.benshoemaker.us/about
Whilst I won't comment on this specific person, one of the best programmers I've met has a law degree, so I wouldn't use their degree against them. People can have many interests and skills.
I've written code since 2012, I just didn't put it online. It was a lot harder, so all my code was written internally, at work.
But sure, go with the ad hominem.
> Never thought this would be something people actually take seriously.
You have to remember that the number of software developers saw a massive swell in the last 20 years, and many of these folks are Bootcamp-educated web/app dev types, not John Carmack. They typically started too late and for the wrong reasons to become very skilled in the craft by middle age, under pre-AI circumstances and statistically (of course there are many wonderful exceptions; one of my best developers is someone who worked in a retail store for 15 years before pivoting).
AI tools are now available to everyone, not just the developers who were already proficient at writing code. When you take in the excitement you always have to consider what it does for the average developer and also those below average: A chance to redefine yourself, be among the first doing a new thing, skip over many years of skill-building and, as many of them would put it, focus on results.
It's totally obvious why many leap at this, and it's even probably what they should do, individually. But it's a selfish concern, not a care for the practice as-is. It also results in a lot of performative blog posting. But if it was you, you might well do the same to get ahead in life. There are only so many opportunities to get in on something on the ground floor.
I feel a lot of senior developers don't take the demographics of our community of practice into account when they try to understand the reception of AI tools.
This is gold.
I have rarely had the words taken out of my mouth like this.
The percentage of devs in my career that are from the same academic background, show similar interests, and approach the field in the same way is probably less than 10%, sadly.
Ahh Bulverism, with a hint of ad-hominem and a dash of No True Scotsman. I think the most damning indictment here is the seeming inability to make actual arguments and not just cheap shots at people you've never even met.
Please tell me, "Were people excited about high-level languages just programmers who 'couldn't hack it' with assembly? Maybe you are one of those? Were GUI advocates just people who couldn't master the command line?"
Thanks for teaching me about Bulverism, I hadn't heard of that fallacy before. I can see how my comment displays those characteristics and will probably try to avoid that pattern more in the future.
Honestly, I still think there's truth to what I wrote, and I don't think your counter-examples prove it wrong per-se. The prompt I responded to ("why are people taking this seriously") also led fairly naturally down the road of examining the reasons. That was of course my choice to do, but it's also just what interested me in the moment.
I think he's a cook, watching people putting frozen "meals" in the microwave and telling himself: "hey! That's not cooking!".
And I totally agree with him. Throwing some kind of fallacy in the air for the show doesn't make your argument, or lack of, more convincing.
>I think he's a cook, watching people putting frozen "meals" in the microwave and telling himself: "hey! That's not cooking!".
It's the equivalent of saying anyone excited about being able to microwave Frozen meals is a hack who couldn't make it in the kitchen. I'm sorry, but if you don't see how ridiculous that assertion is then I don't know what to tell you.
>And I totally agree with him. Throwing some kind of fallacy in the air for the show doesn't make your argument, or lack of, more convincing.
A series of condescending statements meant to demean, with no objective backing whatsoever, is not an argument. What do you want me to say? There's nothing worth addressing, other than pointing out how empty it is.
You think there aren't big shots, more accomplished than anyone in this conversation who are similarly enthusiastic?
You and OP have zero actual clue. At any advancement, regardless of how big or consequential, there are always people like that. It's very nice to feel smart and superior and degrade others, but people ought to be better than that.
So I'm sorry but I don't really care how superior a cook you think you are.
> You think there aren't big shots, more accomplished than anyone in this conversation who are similarly enthusiastic?
I think both things can be true simultaneously.
You're arguing against a straw man.
Well, there are programmers like Karpathy in his original coinage of vibe coding
> There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like "decrease the padding on the sidebar by half" because I'm too lazy to find it. I "Accept All" always, I don't read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I'd have to really read through it for a while. Sometimes the LLMs can't fix a bug so I just work around it or ask for random changes until it goes away. It's not too bad for throwaway weekend projects, but still quite amusing. I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.
Notice "don't read the diffs anymore".
In fact, this is practically the anniversary of that tweet: https://x.com/karpathy/status/2019137879310836075?s=20
Half serious - but is that really so different than many apps written by humans?
I've worked on "legacy systems" written 30 to 45 years ago (or more) and still running today (things like green-screen apps written in Pick/Basic, Cobol, etc.). Some of them were written once and subsystems replaced, but some of it is original code.
In systems written in the last.. say, 10 to 20 years, I've seen them undergo drastic rates of change, sometimes full rewrites every few years. This seemed to go hand-in-hand with the rise of agile development (not condemning nor approving of it) - where rapid rates of change were expected.. and often the tech the system was written in was changing rapidly also.
In hardware engineering, I personally also saw a huge move to more frequent design and implementation refreshes to prevent obsolescence issues (some might say this is "planned obsolescence" but it also is done for valid reasons as well).
I think not reading the code anymore TODAY may be a bit premature, but I don't think it's impossible to consider that someday in the nearer than further future, we might be at a point where generative systems have more predictability and maybe even get certified for safety/etc. of the generated code.. leading to truly not reading the code.
I'm not sure it's a good future, or that it's tomorrow, but it might not be beyond the next 20 year timeframe either, it might be sooner.
I would enjoy discussion with whoever voted this down - why did you?
What is your opinion and did you vote this down because you think it's silly, dangerous or you don't agree?
I'm torn between running away to be an electrician or just waiting three years until everyone realises they need engineers who can still read.
Sometimes it feels like pre-AI education is going to be like low-background steel for skilled employees.
> 2 - 3 years there will be so much technical debt that we'll have to throw away entire pieces of software.
That happens just as often without AI. Maybe the people that like it all have experience with trashing multiple sets of products over the course of their life?
Remember though this forum is full of people who consider code objects when it's just state in a machine.
We have been throwing away entire pieces of software forever. Where's Novell? Who runs 90s Linux kernels in prod?
Code isn't a bridge or car. Preservation isn't meaningful. If we aren't shutting the DCs off we're still burning the resources regardless if we save old code or not.
Most coders are so many layers of abstraction above the hardware at this point anyway they may as well consider themselves syntax artists as much as programmers, and think of Github as DeviantArt for syntax fetishists.
Am working on a model of /home to experiment with booting Linux to models. I can see a future where Python in my screen "runs" without an interpreter because the model is capable of correctly generating the appropriate output without one.
Code is ethno objects, only exists socially. It's not essential to computer operations. At the hardware level it's arithmetical operations against memory states.
Am working on my own "geometric primitives" models that know how to draw GUIs and 3D world primitives, text; think like "boot to blender". Rather than store data in strings, it will just scaffold out vectors to a running "desktop metaphor".
It's just electromagnetic geometry, delta sync between memory and display: https://iopscience.iop.org/article/10.1088/1742-6596/2987/1/...
I beg your pardon?
Reading and understanding code is more important than writing imo
It’s pretty well established that you cannot understand code without having thought things through while writing it. You need to know why things are written the way they are to understand what is written.
Yeah, just reading code does little to help me understand how a program works. I have to break it apart and change it and run it. Write some test inputs, run the code under a debugger, and observe the change in behavior when changing inputs.
If that were true, then only the person who wrote the code could ever understand it enough to fix bugs, which is decidedly not true.
I’ll grant you that there are many trivial software defects that can be identified by simply reading the code and making minor changes.
But for architectural issues, you need to be able to articulate how you would have written the code in the first place, once you understand the existing behavior and its problems. That is my interpretation of GP’s comment.
The coincidental timing between the rapid increase in the number of emergency fixes coming out on major software platforms and the proud announcement of the amount of code that's being produced by AI at the same companies is remarkable.
I think 2-3 years is generous.
Don't get me wrong, I've definitely found huge productivity increases in using various LLM workflows in both development as well as operational things. But removing a human from the loop entirely at this point feels reckless bordering on negligent.
I actually think this is fair to wonder about.
My overall stance on this is that it's better to lean into the models & the tools around them improving. Even in the last 3-4 months, the tools have come an incredible distance.
I bet some AI-generated code will need to be thrown away. But that's true of all code. The real questions to me are - are the velocity gains worth it? Will the models be so much better in a year that they can fix those problems themselves, or re-write it?
I feel like time will validate that.
If the models don't get to the point where they can correct fixes on their own, then yeah, everything will be falling apart. There is just no other way around increasing entropy.
The only way to harness it is to somehow package code-producing LLMs into an abstraction and then somehow validate the output. Until we achieve that, in my opinion it doesn't matter how closely people watch the output; things will be getting worse.
> If the models don't get to the point where they can correct fixes on their own
Depending on what you're working on, they are already at that point. I'm not into any kind of AI maximalist "I don't read code" BS (I read a lot of code), but I've been building a fairly extensive web app to manage my business using Astro + React and I have yet to find any bug or usability issue that Claude Code can't fix much faster than I would have (+). I've been able to build out, in a month, a fully TDD app that would have conservatively taken me a year by myself.
(+) Except for making the UI beautiful. It's crap at that.
The key that made it click is exactly what the person describes here: using specs that describe the key architecture and use cases of each section. So I have docs/specs with files like layout.md (overall site shell info), ui-components.md, auth.md, database.md, data.md, and lots more for each section of functionality in the app. If I'm doing work that touches ui, I reference layout and ui-components so that the agent doesn't invent a custom button component. If I'm doing database work, reference database.md so that it knows we're using drizzle + libsql, etc.
This extends up to higher level components where the spec also briefly explains the actual goal.
Then each feature building session follows a pattern: brainstorm and create design doc + initial spec (updates or new files) -> write a technical plan clearly following TDD, designed for batches of parallel subagents to work on -> have Claude implement the technical plan -> manual testing (often, I'll identify problems and request changes here) -> automated testing (much stricter linting, knip etc. than I would use for myself) -> finally, update the spec docs again based on the actual work that was done.
My role is less about writing code and more about providing strict guardrails. The spec docs are an important part of that.
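For what it's worth, a tiny sketch of how those per-area spec docs can be pulled into a session - the file names echo the comment above, but the keyword mapping and paths are made up for illustration:

```python
# Hedged sketch: pick which spec docs to preload for a feature session.
# The docs/specs file names echo the workflow described above; the mapping is invented.
from pathlib import Path

SPEC_DIR = Path("docs/specs")

# Which spec files matter for which kind of work.
SPEC_MAP = {
    "ui":       ["layout.md", "ui-components.md"],
    "database": ["database.md", "data.md"],
    "auth":     ["auth.md"],
}

def session_context(areas: list[str]) -> str:
    """Concatenate the relevant spec docs so the agent starts from the guardrails
    (e.g. doesn't invent a custom button component when one already exists)."""
    parts = []
    for area in areas:
        for name in SPEC_MAP.get(area, []):
            path = SPEC_DIR / name
            if path.exists():
                parts.append(f"## {name}\n{path.read_text()}")
    return "\n\n".join(parts)

if __name__ == "__main__":
    # e.g. a UI feature that also touches the database layer
    print(session_context(["ui", "database"]))
```

Usage would be something like piping `session_context(["ui"])` into the start of the agent session before any feature work begins.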
I have wondered the same but for the projects I am completely "hands off" on, the model improvements have overcome this issue time and time again.
In 2-3 years, if coding AI continues to improve at this pace, I reckon people will rewrite entire projects.
I can't imagine not reading the code I'm responsible for any more than I could imagine not looking out the windscreen in a self driving Tesla.
But if so many people are already there, and mostly highly skilled programmers, imagine in 2 years' time with people who've never programmed!
If I keep getting married at the same pace I have, then in a few years I'll have like 50 husbands.
Well, Tesla has been nearly at FSD for how long? The analogy you make sorta makes it sound less likely
Seems dangerous to wager your entire application on such an uncertainty
Some people are not aware that they are one race condition away from a class action lawsuit.
The proponents of Spec Driven Development argue that throwing everything out completely and rebuilding from scratch is "totally fine". Personally, I'm not comfortable with the level of churn.
Also take something into account: absolutely _none_ of the vibe coding influencer bros make anything more complicated than a single-feature webapp that's already been implemented 50 times. They've never built anything complicated either, or maintained something for more than a few years with all the warts that it entails. Literally, from his bio on his website:
> For 12 years, I led data and analytics at Indeed - creating company-wide success metrics used in board meetings, scaling SMB products 6x, managing organizations of 70+ people.
He's a manager that made graphs on Power BI.
They're not here because they want to build things, they're here to shit a product out and make money. By the time Claude has stopped being able to pipe together ffmpeg commands or glue together 3 JS libraries, they've gone on to another project and whoever bought it is a sucker.
It's not that much different from the companies of the 2000s promising a 5th generation language with a UI builder that would fix everything.
And then, as a very last warning: the author of this piece sells AI consulting services. It's in his interest to make you believe everything he has to say about AI, because by God are there going to be suckers buying his time at indecently high prices to get shit advice. This sucker is most likely your boss, by the way.
No true programmer would vibecode an app, eh?
Oh no, they would. I would.
I'd have the decency to know and tell people that it's a steaming pile of shit and that I have no idea how it works, though, and would not have the shamelessness to sell a course on how to put out LLM vomit in public.
Engineering implies respect for your profession. Act like it.
But invoking No True Scotsman would imply that the focus is on gatekeeping the profession of programming. I don’t think the above poster is really concerned with the prestige aspect of whether vibe bros should be considered true programmers. They’re more saying that if you’re a regular programmer worried about becoming obsolete, you shouldn’t be fooled by the bluster. Vibe bros’ output is not serious enough to endanger your job, so don’t fret.
Yes, and you can rebuild them for free
Claude, Codex and Gemini can read code much faster than we can. I still read snippets, but mostly I have them read the code.
Unfortunately they're still too superficial. 9 times out of 10 they don't have enough context to properly implement something and end up just tacking it on in some random place with no regard for the bigger architecture. Even if you do tell it something in an AGENT.md file or something, it often just doesn't follow it.
I use them to probabilistically program. They’re better than me and I’ve been at it for 16 years now. So I wouldn’t say they’re superficial at all.
What have you tried to use them for?
I've seen software written and architected by Claude and I'd say that they're already ready to be thrown out. Security sucks, performance will probably suck, maintainability definitely sucks, and UX really fucking sucks.
I have a wide range of Claude Code based setups, including one with an integrated issue tracker and parallel swarms.
And for anything really serious? Opus 4.5 struggles to maintain a large-scale, clean architecture. And the resulting software is often really buggy.
Conclusion: if you want quality in anything big in February 2026, you still need to read the code.
Opus is too superficial for coding (great at bash though, on the flipside); I’d recommend giving Codex a try.
As LLMs advance so rapidly, I think that all the AI slop code written today will be easily digestible by the LLMs a few generations down the line. I think there will be a lot of improvements in making user intent clearer. Combined with a bad codebase and larger context windows, refactoring won't be a challenge.
The skills required to perform as a software engineer in an environment where competent AI agents are a commodity have shifted. Before, it was important for us to be very good at reading documentation and writing code. Now we need to be very good at writing docs, specs and interfaces, and reading code.
That goes a bit against the article, but it's not reading code in the traditional sense where you are looking for common mistakes we humans tend to make. Instead you are looking for clues in the code to determine where you should improve in the docs and specs you fed into your agent, so the next time you run it chances are it'll produce better code, as the article suggests.
And I think this is good. In time, we are going to be forced to think less technically and more semantically.
For anyone who says you don’t need to look at code to code, I’d encourage you to look at this code: https://github.com/benjaminshoemaker/benshoemaker-us
Or simply look at the Astro blog, which is still showing the default Astro favicon.
I say this not to discourage anyone. Building a blog—or any app—is a huge accomplishment, and you should be proud. In particular, if you’re sharing what you’ve learned and sharing your code publicly, you’re already ahead of the majority of people on this journey.
What I’d encourage you to do is keep doing what you’re doing. The best way to learn to build software is to build software. The more you do, the more you learn.
Sometimes when I vibe code, I also have a problem with the code, and find myself asking: “What went wrong with the system that produced the code?”
The answer is clear: I didn’t write the code, I didn’t read it, I have no idea what it does, and that’s why it has a bug.
Be that as it may, I spot bugs a lot faster when I didn’t write the code than when I did.
Well, I’d wager there are quite a few more bugs, so naturally it should be easier to spot a few.
When you write code yourself, you're convinced each line is correct as you write it. That assumption is hard to shake, so you spend hours hunting for bugs that turn out to be obvious. When reading AI-generated code fresh, you lack that assumption. Bugs can jump out faster. That's at least my naive explanation to this phenomenon
Following this logic, why not move further left?
Become a CTO, CEO or even a venture investor. "Here's $100K worth tokens, analyze market, review various proposals from Agents, invest tokens, maximize profit".
You know why not? Because it will be more obvious it doesn't work as advertised.
If one truly believed in LLMs being able to replace knowledge workers, then it would also hold that they could replace managers and execs. In fact, they should be able to do it even better: LLMs could convert every company into a "flat" one, bypassing the management hierarchy and directly consuming meeting notes from every meeting to get the real status as the source of truth, and provide suggestions as needed. If combined with web-search capability, they would also be more plugged into the market, customer sentiment, and competitors than most execs could ever be.
We're not at the point where we are replacing all software developers entirely (and will never be without real AGI), but we are definitely at the point where scaling back headcount is possible.
Also, creating software is much more testable and verifiable than what a CEO does. You can usually tell when the code isn't right because it doesn't work or doesn't pass a test. How can you verify that your AI CEO is giving you the right information or planning its business strategy effectively?
It's one of the biggest reasons that software development and art are the two domains in which AI excels. In software you can know when it's right, and in art it doesn't matter if it's right.
> LLMs could convert every company into a "flat" one, bypassing the manangement hierarchy
It sounds like you're describing Manna by Marshall Brain
I think this will work but requires quality data pipelines and scaffolding same as coding
You have to move up or down to survive. In 10 years we'll either be managers (either of humans or agents), or we'll be electrical engineers. Programming is done! I for one am glad.
There are two extremes and a spectrum in between:
* AI can replace knowledge workers - most existing software engineers and managers at all levels will lose their jobs and have to re-qualify.
* AI requires human in the loop.
In the first scenario, I see no reason to waste time and should start building plan B now (remaining job markets will be saturated at that point).
In the second scenario, tech debt and zettabytes of slop will harm the companies that relied on it heavily. In the age of failing giants and crumbling infrastructure, engineers and startups that can replace a gigawatt-burning data center with a few-kilowatt rack, by hand-coding a shell script that replaces Hadoop, will flourish.
Most probably it will be a spectrum - some roles can be replaced, some not.
I still think this is mostly people who never could hack it at coding taking to the new opportunities that these tools afford them without having to seriously invest in the skill, and basking in touting their skilless-ness being accepted as the new temporary cool.
Which is perhaps what they should do, of course. Any transition is a chance to get ahead and redefine yourself.
Just FYI, this is the attitude that causes pro-AI people to start shit-talking anti-AI folks as Luddites who need to learn to use the tools.
Agents are a quality/velocity tradeoff (which is often a good one). If you can't debug stuff without them, that's a problem, because you'll dig yourself into holes, but it doesn't mean you have to write the code by hand.
I enjoy new technology in general, so I very much keep up with the tools and also like using them for the things they do well at any given moment. I'm not among the Luddites, FWIW. I think there's a lot of legitimately great building going on right now.
Note, though, that we're talking about "not reading code" in context, not the writing of it.
Author is a former data analytics product manager (already a bit of a tea leaf reading domain) who says he never reads code and is now marketing himself as a new class of developer.
Parent post sounds like a very accurate description.
I completely agree in a sense - the cost of producing software is plummeting, and it's leading to me being able to develop things that I would never have invested months in before.
This blog post is written by a product manager, not a programmer. Their CV speaks to an Economics background, a stint in market research, writing small scripting-type programs ("Cron+MySQL data warehouse") and then off to the product management races.
What it's trying to express is that the (T)PM job should still be safe because they can just team-lead a dozen agents instead of software developers.
Take with a grain of salt when it comes to relevance for "coding", or the future role breakdown in tech organizations.
That's me! I'm pretty open about that.
I'm not trying to express that my particular flavor of career is safe. I think that the ability to produce software is much less about the ability to hand-write code, and that's going to continue as the models and ecosystem improve, and I'm fascinated by where that goes.
>I think the industry is moving left. Toward specs. The code is becoming an implementation detail. What matters is the system that produces it - the requirements, the constraints, the architecture. Get those right, and the code follows.
So basically a return to waterfall design.
Rather than YOLO planning (agile), we go back to YOLO implementation (farming it out to dozens of replaceable peons, but this time they're even worse).
I really wish posts like this explained what sort of development they are doing. Is this for an internal CRUD server? Internal React app? Scala server with three instances? Golang server with complex AWS configuration? 10k lines? 100k lines? 1M+? Externally facing? iOS app? Algorithm-heavy photo processing desktop app? It would give me a much better idea of whether the argument is reasonable, and whether it is applicable for the kind of software I generally write.
The author is a PM with a bachelors in economics who got laid off last year and began building with AI. Zero engineering experience.
You can guess what kind of software he is building.
When you read the 100th blog post about how AI is changing software development, just remember that these are the authors.
He makes <10k cloc websites trying to sell you a spec-creation wizard[0]. Considering Claude wrote the site, it could probably be written in 1/10th of the lines.
I think these tools are great for allowing non-technical people like OP to create landing pages and small prototypes, but they're useless for anything serious. That said, I applaud OP for embodying the "when in a gold rush, sell shovels" mentality.
[0] - https://vibescaffold.dev/
You're completely right, and in retrospect I wish I had... I was honestly just talking mostly in broad terms, but people really (maybe rightly) focused on the "not reading code" snippet.
I'm mostly developing my own apps and working with startups.
> The people really leading AI coding right now (and I’d put myself near the front, though not all the way there)
So humble. Who is he again?
"Ex-Indeed"
https://www.linkedin.com/in/benshoemaker000/
> I don’t read code anymore
> Senior Technical Product Manager
yeah i'd wager they didn't read (let alone write) much code to begin with..
At least going by their own CV, they've mostly written what sounds like small scripting-type programs described in grandiose terms like "data warehouse".
This blog post is influencer content.
Pretty unpopular influencer if that were the case
When I talk with people in the space, go to meetups, and present my work & toolset, I am usually one of the more advanced people in the conversation / group, though usually not THE most advanced. I'm not saying I'm some sort of genius, I'm just saying I'm relatively near the leading edge of how to use these tools. I feel like it's true.
turning a big dial taht says "Psychosis" on one side and "Wishful thinking" on the other and constantly looking back at the LinkedIn audience for approval like a contestant on the price is right
Why have a spec when I have the concrete implementation and a system ready and willing to answer any questions I have about it? I don't understand why people value an artifact that can be out of sync with reality over the actual reality. The LLM can answer questions based on the code. We might drift away from needing a code editor, but I likely won't be drifting to reading specs in a world where I can converse with the deployed implementation.
I think the idea is more to program the prompter than to program the LLM. He sells a wizard for generating project specs. Anyone can do this with a normal LLM conversation, but I suppose some people forget
Yeah, the revenge of waterfall, specs documents for AI agents.
I don't get it. Can't you just open Claude Code in another terminal? I had like 5 open yesterday.
I haven't used Codex though, so maybe there's something I'm missing about the parallel-ness of it here.
> Here’s the thing: I don’t read code anymore. I used to write code and read code. Now when something isn’t working, I don’t go look at the code.
Recently I picked a smallish task from our backlog. This is some code I'm not familiar with, frontend stuff I wouldn't tackle normally.
Claude wrote something. I tested, it didn't work. I explained the issue. It added a bunch of traces, asked me to collect the logs, figured out a fix, submitted the change.
Got a bunch of linter errors that I didn't understand, which I copied and pasted to Claude. It fixed something, but I still got lint errors, which Claude dismissed as irrelevant, and then I realized I wasn't happy with the new behavior.
After 3 days of iteration, my change seems ok, passed the CI, the linters, and automatic review.
At that stage, I have no idea if this is the right way to fix the problem, and if it breaks something, I won't be able to fix it myself as I'm clueless. Also, it could be that a human reviewer tells me it's totally wrong, or ask me questions I won't be able to answer.
Not only was this process not fun at all, but I also didn't learn anything, and I may have introduced technical debt that AI may not be able to fix.
I agree that coding agents can boost efficiency in some cases, but I don't see a shift left of IDEs at this stage.
Why not look at the code? If you see something that looks messy, ask for it to be cleaned up.
Code health is a choice. We have power tools now. All you have to do is ask.
A simple "this seems odd / messy / un-pythonic" is often enough.
My rule is 3 tries then dig deeper. Sometimes I don't even wait that long, certain classes of bugs are easy for humans to detect but hard for agents, such as CSS issues. Try asking the agent to explain/summarize the code that's causing the problem and double checking against docs for the version you're using, that solves a lot of problems.
> This is some code I'm not familiar with
Ask it to analyze and explain the code to you.
This has largely been my experience. Just reading and understanding the code, and writing the change myself ends up actually being faster.
Spec is too low level in my experience. The graph continues far further to the left.
I tried doing clean-room reimplementations from specs and just ended up with even worse garbage, because it kept all the original garbage and bloated it further!
Giving it a description of what you're actually trying to do works way better. Then it finds the most elegant solution to the problem, both in terms of the code and the UI design.
I don't like the craft of the app. There are a few moments that really left me feeling it wasn't thought through 100 percent, the way Cursor is at this point.
Why create an IDE without IDE features? What's the benefit of this over using an IDE with a Codex plugin? I don't believe you can review code without traversing it by references, so it looks like it's directed towards toy projects / noobs. And the agents are not yet near the autonomy that would let you skip code review in complex systems.
Why do the illustrations bear such a strong resemblance to those in the Gas Town article?
https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16d...
Is it a nano banana tendency or was it probably intentional?
It's nano banana - I actually noticed the same thing. I didn't prompt it as such.
Here's the prompt I used, actually:
Create a vibrant, visually dynamic horizontal infographic showing the spectrum of AI developer tools, titled "The Shift Left"
Layout: 5 distinct zones flowing RIGHT TO LEFT as a journey/progression. Use creative visual metaphors — perhaps a road, river, pipeline, or abstract flowing shapes connecting the stages. Each zone should feel like its own world but connected to the others.
Zones (LEFT to RIGHT):
1. "Specs" (leftmost) - Kiro logo, VibeScaffold logo, GitHub Spec Kit logo
2. "Multi-Agent Orchestration" - Claude Code logo, Codex CLI logo, Codex App logo, Conductor logo 3. "Agentic IDE" - Cursor logo, Windsurf logo 4. "Code + AI" - GitHub Copilot logo 5. "Code" (rightmost) - VS Code logo Visual style: Fun, energetic, modern. Think illustrated tech landscape or isometric world. NOT a boring corporate chart. Use warm off-white background (#faf8f5) with amber/orange (#b45309) as the primary accent color throughout. Add visual flair — icons, small illustrations, depth, texture, but don't make it visually overloaded.Aspect ratio: 16:9 landscape
> Where IDEs are headed and why specs matter more than code.
We are very far away from this being a settled or agreed upon statement and I really struggle to understand how one vendor making a tool is indicative of an industry practice.
Clearly written by someone who has no systems of importance in production. If my code fails, people lose money, planes halt, cars break down. Read. The. Code.
Yes, but also ... the analogy to assembly is pretty good. We're moving pretty quickly towards a world where we will almost never read the code.
You may read all the assembly that your compiler produces. (Which, awesome! Sounds like you have a fun job.) But I don't. I know how to read assembly and occasionally do it. But I do it rarely enough that I have to re-learn a bunch of stuff to solve the hairy bug or learn the interesting system-level thing that I'm trying to track down if I'm reading the output of the compiler. And mostly even when I have a bug down at the level where reading assembly might help, I'm using other tools at one or two removes to understand the code at that level.
I think it's pretty clear that "reading the code" is going to go the way of reading compiler output. And quite quickly. Even for critical production systems. LLMs are getting better at writing code very fast, and there's no obvious reason we'll hit a ceiling on that progress any time soon.
In a world where the LLMs are not just pretty good at writing some kinds of code, but very good at writing almost all kinds of code, it will be the same kind of waste of time to read source code as it is, today, to read assembly code.
I think this analogy to assembly is flawed.
Compilers predictably transform one kind of programming language code to CPU (or VM) instructions. Transpilers predictably transform one kind of programming language to another.
We introduced various instruction architectures, compiler flags, reproducible builds, checksums exactly to make sure that whatever build artifact that's produced is super predictable and dependable.
That reproducibility is how we can trust our software and that's why we don't need to care about assembly (or JVM etc.) specifics 99% of the time. (Heck, I'm not familiar with most of it.)
Same goes for libraries and frameworks. We can trust their abstractions because someone put years or decades into developing, testing and maintaining them and the community has audited them if they are open-source.
It takes a whole lot of hand-waving to traverse from this point to LLMs - which are stochastic by nature - transforming natural language instructions (even if you call it "specs", it's fundamentally still a text prompt!) to dependable code "that you don't need to read" i.e. a black box.
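To make the contrast concrete, here's a tiny sketch of the kind of check that determinism buys you (the artifact paths are hypothetical): with a reproducible toolchain, two independent builds of the same source hash to identical bytes, a guarantee no stochastic code generator can make about its output.

    # Compare two independently produced builds of the same source.
    # "build-a/app.bin" and "build-b/app.bin" are hypothetical paths.
    import hashlib
    from pathlib import Path

    def digest(path: str) -> str:
        return hashlib.sha256(Path(path).read_bytes()).hexdigest()

    if digest("build-a/app.bin") == digest("build-b/app.bin"):
        print("builds are bit-identical - the toolchain is doing its job")
    else:
        print("builds differ - something in the pipeline is non-deterministic")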
The analogy to assembly is wrong. Even in a high level language, you can read the code and reason about what it does.
What's the equivalent for an LLM? The string of prompts that non-deterministically generates code?
Also, if LLM output is analogous to assembly, then why is that what we're checking in to our source control?
LLMs don't seem to solve any of the problems I had before LLMs existed. I never worried about being able to generate a bunch of code quickly. The problem that needs to be solved is how to better write code that can be understood, and easily modified, with a high degree of confidence that it's correct, performs well, etc. Using LLMs for programming seems to do the opposite.
I think it's the performative aspects that are grating, though. You're right that even many systems programmers only look at the generated assembly occasionally, but at least most of them have the good sense to respect the deeper knowledge of mechanism that is to be found there, and many strive to know more eventually. Totally orthogonal to whether writing assembly at scale is sensible practice or not.
But with the AI tools we're not yet at the wave of "sometimes it's good to read the code" virtue-signaling blog posts that will make the front page a year or so from now; we're still at the "I'm the new hot shit because I don't read code" moment, which is all a bit hard to take.
I mean, fair enough. Obviously there are different levels of criticality in any production environment. I'm building consumer products and internal tools, not safety-critical systems.
Even in those environments, I'd argue that AI coding can offer a lot in terms of verification & automated testing. However, I'd probably agree that in high-stakes safety environments it's more of a 'yes, and' than an either/or.
I think a lot of AI bros are sleeping on quality. Prior startup wisdom was “move fast and break things”. Speed is ubiquitous now. Relatively anyone can vibe code a buggy solution that works for their happy path. If that’s the bar, why would I pay for your jank solution when I can make my own tailored to my exact needs? Going fast is a race to the bottom in the long run.
What’s worth paying for is something that is trustworthy.
Claude Code is a perfect example: they blocked tools like opencode because they know quality is the only moat, and they don't currently have it.
Hell, I see the big banner picture hallucinated from a prompt and all I see is an unproductive mess. I won't comment on the takes the article makes; they're just miserable.
>Here’s the thing: I don’t read code anymore. I used to write code and read code. Now when something isn’t working, I don’t go look at the code. I don’t question the code. I either ask one of my coding agents, or - more often - I ask myself: what happened with my system? What can I improve about the inputs that led to that code being generated?
Good luck debugging any non-trivial problem in such a codebase.
Not to mention data retention and upgrade management.
When an update script jacks up the guaranteed-to-be-robust vibed data setup in this first of a kind, one of a kind, singular installation… what then?
The pros have separate dev, test, QA, and prod environments. Immutable servers, NixOS, containers, git, and rollback options in orchestration frameworks. Why? Because uh-oh, oh-shit, say-what, no-you’re-kidding, oh-fuck, and oops are omnipresent.
MS Access was a great product with some scalability ceilings that took engineering to work past. MS Access solutions growing too big and then imploding was a real concern that bit many departments. But MS Access was not dumping 15,000 LoC onto the laps of these non-developers and telling them they are hybrid spirit code warriors with next-level hacking skills.
Ruby on Rails, Wordpress, SharePoint… there are legitimately better options out there for tiny-assed self-serving CRUD apps and cheap developer ecosystems. They’re not quite as fun, tho, and they don’t gas people up as well.
reason why I also ended up creating something like this: https://github.com/saadnvd1/aTerm
Has someone figured out how to set the Codex app to YOLO mode yet?
the constant asking drives me crazy
There's a button that looks like a shield, next to the voice dictation button.
It's called the "Yeet" skill in the app
I have always thought that AI code generation is an irresistible attraction for those personalities who lack the technical skills or knowledge necessary for programming, but nevertheless feel undeservedly like geniuses. This post is proof of that.
Also, the generated picture in this post makes me want to kick someone in the nuts. It doesn't explain anything.
Ouch lol.
Is the image really not that clear? There are IDE-like tools that all are focusing on different parts of the Spec --> Agent --> Code continuum. I think it illustrates that all right.
I really wonder why nobody is talking about how it is more important to be able to test the code.
9 times out of 10, my AI-generated code is bad before my verification layers; 9 times out of 10, it's good after.
Claude fights through your rules. And if you code in another language, you could use other agents to verify the code.
This is the challenge now: effectively verifying the code. Whenever I end up with a bad response, I ask myself what layers I could set up to stop the AI as early as possible.
Also things like naming, comments, tree traversal, context engineering, even data structures and multi-agenting. I know it sounds like buzzwords, but these are the topics a software engineer really should think about. Everything else is frankly cope.
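A minimal sketch of what I mean by layers, assuming a Python project and off-the-shelf tools (ruff, mypy, and pytest are just my illustration; the specific stack isn't the point, the ordering from cheap to expensive is):

    # check.py - run agent output through cheap-to-expensive gates,
    # stopping at the first failure so a bad response is caught early.
    import subprocess
    import sys

    LAYERS = [
        ["ruff", "check", "."],   # style and obvious mistakes
        ["mypy", "."],            # type-level sanity
        ["pytest", "-q"],         # behavior against tests you wrote yourself
    ]

    for cmd in LAYERS:
        print("---", " ".join(cmd))
        if subprocess.run(cmd).returncode != 0:
            sys.exit(1)   # feed this failure back to the agent

    print("all layers passed")

The earlier a layer rejects the response, the less of your own attention it burns.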
I think people attacking "don't read the code" are not considering the status quo - they're comparing to some perfect world where staff engineers read every line of code. That's not even close to happening. Test-driven development is something most engineers just won't put up with... AIs will do it, no problem. If I can automate ten different checks for every commit, is my code really getting looked at less?
Not really what "shift left" means...