I had ChatGPT output an algorithm implementation in Go (Shamir Secret Sharing) that I didn't want to figure out myself. It kinda worked, but every time I pointed out a problem with the code it seemed to add more bugs (and I ended up hating the "Good catch!" responses...)
Eventually, figuring out why it didn't work forced me to read the algorithm spec and basically write the code from scratch, throwing away all of the ChatGPT work. It definitely took more time than doing it the "hard way".
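For reference, a minimal sketch in Go of the share-generation half of Shamir's scheme over GF(2^8). It is not the poster's code or ChatGPT's output; reconstruction via Lagrange interpolation is left out, and the field choice, byte-sized secret, and function names are illustrative assumptions (and emphatically not production crypto):

    package main

    import (
        "crypto/rand"
        "fmt"
    )

    // Illustrative sketch only, not production crypto; the GF(2^8) field,
    // byte-sized secret, and names are assumptions for this example.

    // gfMul multiplies two elements of GF(2^8), reducing by the AES
    // polynomial x^8 + x^4 + x^3 + x + 1 (Russian peasant multiplication).
    func gfMul(a, b byte) byte {
        var p byte
        for b > 0 {
            if b&1 == 1 {
                p ^= a
            }
            carry := a & 0x80
            a <<= 1
            if carry != 0 {
                a ^= 0x1B
            }
            b >>= 1
        }
        return p
    }

    // evalPoly evaluates a polynomial (constant term first) at x with Horner's method.
    func evalPoly(coeffs []byte, x byte) byte {
        var y byte
        for i := len(coeffs) - 1; i >= 0; i-- {
            y = gfMul(y, x) ^ coeffs[i]
        }
        return y
    }

    // split turns one secret byte into n shares with threshold k: pick a random
    // degree k-1 polynomial f with f(0) = secret, hand out the points (x, f(x)).
    func split(secret byte, k, n int) (map[byte]byte, error) {
        coeffs := make([]byte, k)
        coeffs[0] = secret
        if _, err := rand.Read(coeffs[1:]); err != nil {
            return nil, err
        }
        shares := make(map[byte]byte, n)
        for x := 1; x <= n; x++ {
            shares[byte(x)] = evalPoly(coeffs, byte(x))
        }
        return shares, nil
    }

    func main() {
        shares, err := split(0x2A, 3, 5) // 3-of-5 sharing of a single byte
        if err != nil {
            panic(err)
        }
        fmt.Println(shares) // any 3 points recover the secret via Lagrange interpolation
    }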
The skill in using an LLM right now is getting yourself to where you want to be, rather than wasting time convincing the LLM to spit out exactly what you want. That means flipping between having Aider write the code and editing the code yourself when it's clear the LLM doesn't get it, or you understand the problem better than it does.
This is the key thing that I feel most people who dislike using LLMs for development miss. You need to be able to tell quickly whether the model is going to keep spinning on something stupid, and just do it yourself in those cases. If you're decent at that, it can only really be a net benefit.
The bits GPT-4 always gets wrong - and, as you say, gets more and more wrong the further I try to work with it to fix the mistakes - are exactly the bits I want it to do for me. Tedious nested loops that I'd otherwise have to work out on paper, in particular.
What it's good for is high level overview and structuring of simple apps, which saves me a lot of googling, reviewing prior work, and some initial typing.
After my last attempts to work with it, I've decided that until there's another large improvement in the models (GPT-5 or similar), I won't try to use it beyond this initial structure-creation phase.
The issue is that for complex apps that already have a structure in place - especially if it's not a great structure and I don't have the authority or time to refactor - the AI can't really do anything to help. So for new, simple, or test projects it seems like an amazing tool, and then in the real world it's pretty much useless or even a waste of time, except for brainstorming entirely new features that can be reasoned about in isolation, where it becomes useful again.
A counterpoint is that code should always be written in a modular way so that each piece can be reasoned about in isolation. Which doesn't often happen in large apps that I've worked on, unfortunately. Unless I'm the one who writes them from scratch.
I can regularly get it to autocomplete big chunks of code that are good, but only when it's completely mind-numbingly boring, repetitive, derivative code. It's good for starting a new view or controller that is very similar to something that already exists in the codebase. Anything remotely novel and it's useless.
I have strange documentation habits, and sometimes when you document everything in code comments up front, Copilot does seem to synthesize most of the "bones" you need from your documentation. It often needs a thorough code review, but it's not unlike sending a requirements document to a very junior developer who sometimes surprises you, and getting back a PR that almost works but needs a fine-tooth comb. A few times I've "finished my PR review" with "Not bad, Junior, B+".
I know a lot of us generally don't write comments until "last" so will never see this side of Copilot, but it is interesting to try if you haven't.
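To make the "document everything up front" style concrete, here is a hypothetical sketch in Go: a doc comment detailed enough to act as a mini requirements document, followed by the kind of body a completion model could plausibly fill in. Slugify and its rules are made up for illustration, not code from this thread:

    package main

    import (
        "fmt"
        "strings"
    )

    // Slugify converts an article title into a URL slug:
    //   - lower-case everything
    //   - keep only ASCII letters and digits
    //   - replace each run of other characters with a single hyphen
    //   - never emit a leading or trailing hyphen
    // (Hypothetical example: the comment acts as the requirements document,
    // the body is roughly what you'd hope the completion produces.)
    func Slugify(title string) string {
        var b strings.Builder
        pendingHyphen := false
        for _, r := range strings.ToLower(title) {
            isAlnum := ('a' <= r && r <= 'z') || ('0' <= r && r <= '9')
            if isAlnum {
                if pendingHyphen && b.Len() > 0 {
                    b.WriteByte('-')
                }
                pendingHyphen = false
                b.WriteRune(r)
            } else {
                pendingHyphen = true
            }
        }
        return b.String()
    }

    func main() {
        fmt.Println(Slugify("  Hello, World: 2024 edition! ")) // hello-world-2024-edition
    }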
An alternative to this workflow that I find myself returning to is the good ol' practice of nicking code from Stack Overflow or GitHub.
ChatGPT works really well when the stuff you're looking for is already written somewhere, because it solves the needle-in-the-haystack problem of finding it very well.
But I often find it tends to output code that doesn't work yet eerily looks like it should, whereas GitHub code tends to need a bit more wrangling but actually works.
The big benefit of SO for me is that when a question has multiple answers, the top-upvoted answer likely works, since those votes are probably from people who tried it. I also like the 'well, actually' responses and follow-ups, because people point out performance issues or edge cases I may or may not care about.
I only find current LLMs to be useful for code that I could easily write, but I am too lazy to do so. The kind of boilerplate that can be verified quickly by eye.
Once it writes the code, take that into a new session to fix a bug. Repeat with new sessions. Don't let it keep reading its own buggy code; it will just get worse.
Yah, this works for me, and I'm not a SWE. I use it to make marketing websites. Sometimes it will do something perfectly but mess up one part; if I keep getting it to fix that one part in the same session, it's almost certainly never going to work (I burnt a week this way). However, if I take it into a brand-new GPT session and say "here is this webpage I wrote, but I made a mistake and the dropdown box should be on the left, not the right," it can almost always fix it. Again, I'm not really a SWE, so I'm not sure what is going on here, but if you click the dropdown on that "Analyzing" thing that shows up, in the same session it seems to try to rework the code from memory, while in a new session it seems to use a different method to rework the code.
Interesting - I almost always iterate on code in the same session. I will try doing it with history off and frequently re-starting the session. I naively assumed the extra context would help, but I can see how it's also just noise when there are 5 versions of the same code in the context.
Until step 4 everything stays the same, but instead of asking it to fix the code again, you copy the code into another session. That way you repeat step 3 without the LLM "seeing" the code it previously generated in step 4.
I dunno how you SWEs do it, but I have ChatGPT output files (a zip if there are several), not code snippets (unless I actually want a snippet), and then I re-upload those files to a new session using the attach thing. Also, in my experience just building marketing websites, I don't do step 3; I just do steps 1 and 2 over and over in new sessions. It takes longer because you have to figure out a flow across a bunch of work sessions, but it's faster overall because it makes waaaay fewer mistakes. (You're basically shaking off any additional context the GPT has about what you're doing when you put it in a brand-new session, so it can be more focused on the task, I guess?)
The only time I've had success with using AI to drive development work is for "writers block" situations where I'm staring at an empty file or using a language/tool with which I'm out of practice or simply don't have enough experience.
In these situations, giving me something that doesn't work (even if I wind up being forced to rewrite it) is actually kinda helpful. The faster I get my hands dirty and start actually trying to build the thing, the faster I usually get it done.
The alternative is historically trying to read the docs or man pages and getting overwhelmed and discouraged if they wind up being hard to grok.
I've literally never seen an LLM respond negatively to being told "hold on that's not right"; they always say "Oh, you're right!" even if you aren't right.
GPT-4 today: "Hey, are you sure that's the right package to import?" "Oh, sorry, you're right, it's this other package" (then hallucinates the most incorrect response only a computer could imagine, for ten paragraphs).
I've seen junior engineers lose half a day traveling alongside GPT's madness before an adult is brought in to question an original assumption, an incorrect fork in the road, or what have you.
That's pointing to a fairly solid implementation, though (I've used it.) I would trust it way before I'd trust a de novo implementation from ChatGPT. The idea of people using cryptographic implementations written by current AI services is a bit terrifying.
Please don't roll your own crypto, and PLEASE don't roll your own crypto from an LLM. LLMs are useful for other kinds of programs, but crypto libraries need to be written to spec, and heavily used and reviewed, to not be actively harmful. Not sure ChatGPT can write constant-time code :)
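For anyone wondering what "constant time" means in practice, here is a small Go sketch (illustrative only; the values are made up) contrasting a naive comparison that exits at the first mismatch with crypto/subtle from the standard library, which does the same amount of work no matter where the inputs differ:

    package main

    import (
        "crypto/subtle"
        "fmt"
    )

    // naiveEqual returns at the first mismatching byte, so its running time
    // leaks how long the matching prefix is - the classic timing-leak bug.
    func naiveEqual(a, b []byte) bool {
        if len(a) != len(b) {
            return false
        }
        for i := range a {
            if a[i] != b[i] {
                return false
            }
        }
        return true
    }

    func main() {
        expected := []byte("expected-mac-value") // made-up values, just for illustration
        guess := []byte("expected-mac-valuX")

        fmt.Println(naiveEqual(expected, guess)) // false, but timing depends on where they differ

        // Constant-time comparison from the standard library: returns 1 on
        // equality, 0 otherwise, in time independent of the contents.
        fmt.Println(subtle.ConstantTimeCompare(expected, guess) == 1) // false
    }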
People always say this but how else are you going to learn? I doubt many of us who are "rolling our own crypto" are actually deploying it into mission critical contexts anyway.
By that, people don't generally mean, literally, "never write your own crypto". They just mean "on no account _use_ self-written crypto for anything".