I don't know about you, but when the creator of a piece of software says "I have not read any of the code," I don't want to install or use it. Call me old fashioned. Really hoping this terrifying vibe coding future dies an early death before the incurred technical debt makes every digital interaction a landmine.
To be fair, the author says: "Do not use Gas Town."
I started "fully vibecoding" 6 months ago, on a side-project, just to see if it was possible.
It was painful. The models kept breaking existing functionality, overcomplicating things, and generally just making spaghetti ("You're absolutely right! There are 4 helpers across 3 files that have overlapping logic").
A combination of adjusting my process (read: context management) and the models getting better has led me to prefer "fully vibecoding" for all new side-projects.
Note: I still read the code that gets merged for my "real" work, but it's no longer difficult for me to imagine a future where that's not the case.
I have noticed that in just the past two weeks or so, a lot of the naysayers have changed their tune. I expect over the next 2 months there will be another sea change as the network effect and new frameworks kick in.
No. If anything we are getting "new" models but hardly any improvements. Things are "improving" on scores, rankings, and whatever other metrics the AI industry has invented, but nothing is really materializing in real work.
I think we have crossed the chasm and the pragmatists have adopted these tools because they are actually useful now. They've thrown out a lot of their previously held principles and norms to do so and I doubt the more conservative crowd will be so quick to compromise.
2 years sounds more likely than 2 months, since the established norms and practices need to mature a lot more than this to be worthy of the serious consideration of the considerably serious.
Agreed. If the author did not bother to read, much less write, their work, why should we spend time reading it?
In the past, a large codebase indicated that maybe you might take the project seriously, since some human effort was expended in its creation. There were still some outliers like Urbit and its 144 KLOC of Hoon code, perverse loobeans and all.
Now if I get so much as a whiff of AI scent off a project, I lose all interest. It indicates that the author did not invest a modicum of their own time in the project, so why should I waste my own time on it?
(I use LLM-based coding tools in some of my projects, but I have the self-respect to review the generated code before publishing it.)
I’ve come to appreciate that there is a new, totally valid (imo) kind of software development one can do now where you simply do not read the code at all. I do this when prototyping things with vibe coding, for example for personal use, and I’ve posted at least one such project on GitHub for others who may want to run the code.
Of course as a developer you still have to take responsibility for your code, minimally by including a disclaimer and not dumping this code into someone else’s codebase. For example, at work when submitting MRs I do generally read the code and keep the MRs concise.
I’ve found that there is a certain kind of coder who hears of someone not reading the code and takes it as some kind of moral violation. It’s not. It’s a weird new kind of coding where I’m creating a detailed description of the functionality I want and incrementally refining it, iterating on it by describing in text how I want it to change. For example, I use it to write GUI programs for Ubuntu using GTK and Python. I’m not familiar with the python-gtk library syntax or GTK GUI methods, so there’s not really much of a point in reading the code - I ask the machine to write it precisely because I’m unfamiliar with it. When I need to verify things, I have to come up with ways for the machine to test the code on its own.
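To give a flavour, here’s a hand-written sketch of the sort of code I mean - not actual generated output, and it assumes PyGObject (python3-gi) with GTK 3:

    # Illustrative sketch only: the kind of python-gtk code I ask the machine to write.
    import gi
    gi.require_version("Gtk", "3.0")
    from gi.repository import Gtk

    class DemoWindow(Gtk.Window):
        def __init__(self):
            super().__init__(title="Vibe-coded demo")
            button = Gtk.Button(label="Do the thing")
            button.connect("clicked", self.on_clicked)
            self.add(button)

        def on_clicked(self, _button):
            print("clicked")  # placeholder for whatever behaviour I described in the prompt

    win = DemoWindow()
    win.connect("destroy", Gtk.main_quit)
    win.show_all()
    Gtk.main()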
The point is, I think it’s honestly one new, legitimate way of using these tools, with a lot of caveats around how such generated code can be responsibly used. If someone vibe coded something and didn’t read it, and I’m worried it contains something dangerous, I can ask Claude to analyze it and then run it in a docker container. I treat the code the same way the author does - as a slightly unknown pile of functions which seem to perform a function but may need further verification.
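As a rough sketch of what "run it in a docker container" can look like - my own illustration, where the project directory and entry point are made-up names:

    # Hedged sketch: run the unread code in a throwaway container with no network access.
    # "vibe_project" and "main.py" are hypothetical names, not from a real project.
    import os
    import subprocess

    project = os.path.abspath("vibe_project")
    subprocess.run(
        [
            "docker", "run", "--rm",
            "--network", "none",           # no outbound network
            "-v", f"{project}:/app:ro",    # mount the code read-only
            "python:3.12-slim",
            "python", "/app/main.py",
        ],
        check=True,
    )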
I’m not sure what this means for the software world. On the face of it, it seems like it’s probably some kind of problem, but I think at the same time we will find durable use cases for this new mode of interacting with code. Much the same as when compilers abstracted away the assembly code.
Many years ago, Java compilers, though billed as a multi-platform, write-once-run-anywhere solution, would output different bytecode that behaved in interesting and sometimes unpredictable ways. You would be inside jdb, trying to debug why the compiler did what it did.
This is not exactly that, but it is one step up. Having agents output code that then gets compiled/interpreted/whatever, based upon contextual instruction, feels very, very familiar to engineers who have ever worked close to the metal.
"Old fashioned", in this aspect, would be putting guardrails in place so that you knew that what the agent/compiler was creating was what you wanted. Many years ago, that was binaries or bytecode packaged with lots of symbols for debugging. Today, that's more automated testing.
You are ignoring the obvious difference between errors introduced while translating one near-formal, intent-clear language into another, and errors introduced while translating ambiguous natural language into code through a non-deterministic intermediary. At some point in the future the non-deterministic intermediary will become stable enough (when temperature is low and model versions won't affect output much), but the ambiguity of the prompting language is still going to remain an issue. Hence, read-before-commit will always be a requirement, I think.
A good friend of mine wrote somewhere that at about 5 agents or so per project, he becomes the bottleneck. I respect that assessment. Trust but verify. This way of getting faster output by removing that bottleneck altogether is, at least for me, not a good path forward.
Unfortunately, reading before the merge commit is not always a firm part of human teamwork. Neither reading code nor test coverage is sufficient by itself to ensure quality.
Same here. I'm "happy" that I'm old "enough" to be able to wrap up my career in a few years time and likely be able to get out of this mess before this "agentic AI slop" becomes the expected workflow.
On my personal project I do sometimes chat with ChatGPT and it works as a rubber duck. I explain, put my thoughts into words and typically I already solve my problem when I'm thinking it through while expressing it in words. But I must also admit that ChatGPT is very good at producing prose and I often use it for recommending names of abstractions/concepts, modules, functions, enums etc. So there's some value there.
But when it comes to code, I want to understand everything that goes into my project. So at the end of the day I'm always going to be the "bottleneck", whether I think through the problem myself and write the code or I review and try to understand the AI-generated code slop.
It seems to me that the AI slop generation workflow is a great fit for the industry though: more quantity rather than quality, and continuous churn. Make it cheaper to replace code so that the replacement can be replaced a week later with more vibe-coded slop. Quality might drop, bugs might proliferate, but who cares?
And to be fair, code itself has no value, it's ephemeral, data and its transformations are what matter. Maybe at some point we can just throw out the code and just use the chatbots to transform the data directly!
This is pretty much how I use LLMs as well. These interactions have convinced me that while LLMs are very convincing, with persuasive arguments, they are often wrong about things I am good at; so much so that I would have a hard time opening PRs for code edited by them without reading it carefully. Gell-Mann amnesia and all that seems appropriate here, even though that anthropomorphizes LLMs to an uncomfortable extent. At some point in the future I can see them becoming very good at recognizing my intent and also reasoning correctly. Not there yet.
I've been vibe coding my own personal assistant platform, and I still haven't read any of the code, but who cares - it's just for me and it works.
Now I've got tools and functionality that I would have paid for before as separate apps that are running "for free" locally.
I can't help but think this is the way forward and we'll just have to deal with the landmine as and when it comes, or hope that the tooling gets drastically better so that the landmine isn't as powerful as we fear.
You're old fashioned, and that's ok, if it's ok with you.
But when high level languages were getting started, we had to read and debug the transformed lower level output they made (hello Cfront). At a certain point, most of us stopped debugging the layer below, and most LLVM IR and assembly flows by without anyone reading it.
I use https://exe.dev to orchestrate several agents, and I am seeing the same benefits as Steve (with a better UI). My code-smell sense triggers when lots of diffs flow by, but just as often the feeling of "oh, that's a nice feature, it's much better than what I could have made" is triggered too. If you work with colleagues who occasionally delight and surprise you with excellent work, it's the same thing.
Maybe if you are not used to the feeling of being surprised and mostly delighted by your (human) colleagues, orchestrated agentic coding is hard to get your head around.
I have nothing against automated code completion on steroids or agents. What I cannot condone is not reading and understanding the generated code. If you have not understood your agent generated code, you will be "surprised" for sure, sooner or later.
Yeah, the assumption is that it eventually will be the same or better. It's basically how this software was created; he seems to have made a few different versions before he was happy.
Compilers only obtained that level of trust through huge amounts of testing and deterministic execution. You don't look at compiler output because it's nearly always correct. People find compiler bugs horrifying for that reason.
LLMs are far from being as trustworthy as compilers.
If I use the same codebase and the same compiler version and the same compiler flags over and over again to produce a binary, I expect the binary to deterministically be the same machine code. I would not expect that from an LLM.
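A small illustration of that expectation, assuming gcc and some trivial main.c: build twice with identical flags and the hashes should match - exactly the property an LLM does not give you.

    # Illustrative determinism check: same source, same compiler, same flags
    # should yield byte-identical output (main.c and the flags are examples).
    import hashlib
    import subprocess

    def build_and_hash(out_path: str) -> str:
        subprocess.run(["gcc", "-O2", "-o", out_path, "main.c"], check=True)
        with open(out_path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    print(build_and_hash("a.out.first") == build_and_hash("a.out.second"))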