bopbopbop7's comments

I'm working on BacklogAI, an autopilot engineer for SaaS teams.

You connect GitHub, CI, Sentry, and Linear, and it takes tickets all the way to production. Claude writes the changes; BacklogAI handles tests, migrations, feature flags, staged rollouts, and so on.

It’s clearing months of backlog work in hours, and a couple of teams I’m working with have already stopped hiring because it’s cheaper than adding more developers. It's crystal clear that developers and designers won't be needed in a couple of months, because Claude increases productivity by at least 120x and one or two PMs can do pretty much everything.

My stack is Claude, v0, Next.js, shadcn/ui, Clerk, Supabase, and Vercel.
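To make that concrete, here is a rough sketch of the kind of ticket-to-production loop described above (hypothetical names and stubbed integrations, not the actual BacklogAI code):

    # Hypothetical sketch of the ticket -> production loop described above;
    # service calls are stubbed and names are illustrative, not BacklogAI's code.
    from dataclasses import dataclass

    @dataclass
    class Ticket:
        id: str
        title: str
        description: str

    def fetch_open_tickets() -> list[Ticket]:
        # Assume: pulled from Linear via its API; stubbed here.
        return [Ticket("ENG-123", "Fix pagination", "Offset is off by one past page 1")]

    def generate_patch(ticket: Ticket) -> str:
        # Assume: an LLM call (e.g. Claude) with repo context; stubbed here.
        return f"diff for {ticket.id}: ..."

    def ci_passes(patch: str) -> bool:
        # Assume: triggers the existing CI pipeline and waits for the result.
        return True

    def staged_rollout(patch: str) -> None:
        # Assume: ship behind a feature flag, watch Sentry, widen gradually.
        print(f"rolling out behind a flag: {patch}")

    for ticket in fetch_open_tickets():
        patch = generate_patch(ticket)
        if ci_passes(patch):
            staged_rollout(patch)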


Do you have any concerns about production bugs or human-in-the-loop?

Human in the loop is no longer needed, as all production bugs are fixed automatically by a bug-fix agent.

What’s needed to make it at least 121x productivity?

Claude 5.0

If your story were true, I could just have Claude implement your business and would have no need to pay you.

If these productivity gains were real, no one would be giving you access.


So you think a statement about the current state of things is wrong because you believe that sometime in the future agents are going to magically do everything? Great argument!

Company that invested 100s of billions into AI says that AI is good. In other news, “Eat meat” says the butcher.

2022/2023: "Next year software engineering is dead"

2024: "Now this time for real, software engineering is dead in 6 months, AI CEO said so"

2025: "I know a guy who knows a guy who built a startup with an LLM in 3 hours, software engineering is dead next year!"

What will be the cope for you this year?


I went from using ChatGPT 3.5 for functions and occasional scripts…

… to one of the models in Jan 2024 being able to repeatedly add features to the same single-page web app without corrupting its own work or hallucinating the APIs it had itself previously generated…

… to last month using a gifted free week of Claude Code to finish one project and then also have enough tokens left over to start another fresh project which, on that free left-over credit, reached a state that, while definitely not well engineered, was still better than some of the human-made pre-GenAI nonsense I've had to work with.

Wasn't 3 hours, and I won't be working on that thing more this month either because I am going to be doing intensive German language study with the goal of getting the language certificate I need for dual citizenship, but from the speed of work? 3 weeks to make a startup is already plausible.

I won't say that "software engineering" is dead. In a lot of cases however "writing code" is dead, and the job of the engineer should now be to do code review and to know what refactors to ask for.


So you did some basic web development and built a "not well engineered" greenfield app that you didn't ship, and from that your conclusion is that "writing code is dead"?

In half a week with left-over credit.

What do you think the first half of the credit was spent on?

In addition to the other projects it finished off for me, the reason I say "coding is dead" is that even this mediocre-quality code is already shippable. Customers do not give a toss whether it has clean code or a nicely refactored Python backend; that kind of thing is a pain point purely for developers, and when the LLM is the developer, then the LLM is the one who gets to be ordered to pay down the technical debt.

The other project (and a third one I might have done on a previous free trial) are as complete as I care to make them. They're "done" in a way I'm not used to being possible with manual coding, because LLMs can finish features faster than I can think of new useful features to add. The limiting factor is my ability to do code review, or it would be if I got the more expensive option; as I was on a free trial, I could do code review about twice as fast as I burned through tokens (given what others say about the more expensive option, that either means I need to learn to do code review faster, or my risk tolerance is lower than theirs).

Now, is my new 3-day web app a viable business idea? It would've been shippable as-is 5-6 years ago; I saw worse live around then. Today? Hard to say. If markets were efficient, everyone would know LLMs can create this kind of thing easily and nobody could charge for them, but people like yourself who disbelieve are an example of markets not being efficient: people like you can have apps like these sold to them.

That said, I try not to look at where the ball is but where it is going. For business ideas, I have to figure out what *doesn't* scale, and do that. Coding *does* scale now, that's why coding is dead.

I expect to return to this project in a month. Have one of the LLMs expand it and develop it for more than the 3 days spent so far, and turn it into something I'd actually be happy to sell. Like I said, it seems like we're at "3 weeks" not "3 hours" for a decent MVP by current standards, but the floor is rising fast.


The cope + disappointment will be knowing that a large population of HN users will paint a weird alternative reality. There is a multitude of messages about AI out there, some highly detached from reality (on both the optimistic and the pessimistic side). And then there is the rational middle: professionals who see the obvious value of coding agents in their workflow and use them extensively (or figure out how to best leverage them to get the most mileage).

I don't see software engineering being "dead" ever, but the nature of the job _has already changed_ and will continue to change. Look at Sonnet 3.5 -> 3.7 -> 4.5 -> Opus 4.5; that was 17 months of development, and the leaps in performance are quite impressive. You then have massive hardware buildouts, improvements to the stack, a ton of R&D, and competition to squeeze the juice out of the current paradigm (there are 4 orders of magnitude of scaling left before we hit real bottlenecks), plus a push towards the next paradigm to solve things like continual learning.

Some folks have opted not to use coding agents (and some folks, like yourself, seem to revel in strawmanning people who point out their demonstrable usefulness). Not using coding agents in Jan 2026 is defensible. It won't be defensible for long.

Please do provide some data for this "obvious value of coding agents". Because right now the only obvious things are the increase in vulnerabilities, people claiming they are 10x more productive but not shipping anything, and AI hype bloggers who fail to provide any quantitative proof.

Sure: at my MAANG company, where I watch the adoption data for CC and other internal coding-agent tools closely, most (significant) LOC are written by agents, most employees are weekly active users (WAU) of coding agents, and the adoption rate is positively correlated with seniority.

Like a lot of things LLM-related (Simon Willison's pelican test, researchers + product leaders implementing AI features), I also heavily vibe-check the capabilities myself on real work tasks. The fact of the matter is that I am able to dramatically speed up my work. It may be actually writing production code + helping me review it, or it may be tasks like: write me a script to diagnose this bug I have, or build me a Streamlit dashboard to analyze + visualize this ad hoc data instead of me taking an hour to make visualizations + munge data in a notebook.
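For concreteness, the kind of throwaway Streamlit dashboard I mean is roughly the sketch below (the column handling is generic; nothing here is the actual internal data):

    # Minimal ad hoc dashboard of the sort described above; run with:
    #   streamlit run dashboard.py
    import pandas as pd
    import streamlit as st

    st.title("Ad hoc data check")

    uploaded = st.file_uploader("Upload a CSV", type="csv")
    if uploaded is not None:
        df = pd.read_csv(uploaded)
        st.dataframe(df.head(20))           # quick look at the raw rows
        st.write(df.describe())             # summary stats instead of a notebook cell
        numeric = df.select_dtypes("number")
        if not numeric.empty:
            st.line_chart(numeric)          # rough trend view of the numeric columns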

> people claiming they are 10x more productive but aren't shipping anything, and some AI hype bloggers that fail to provide any quantitative proof.

what would satisfy you here? I feel you are strawmanning a bit by picking the most hyperbolic statements and then blanketing that on everyone else.

My workflow is now:

- Write code exclusively with Claude

- Review the code myself + use Claude as a sort of review assistant to help me understand decisions about parts of the code I'm confused about

- Provide feedback to Claude to change / steer it away or towards approaches

- Give up when Claude is hopelessly lost

It takes a bit to get the hang of the right balance, but in my personal experience (which I doubt you will take seriously, but nevertheless): it is quite the game changer, and that's coming from someone who would have laughed at the idea of a $200 coding-agent subscription a year ago.


We probably work at the same company, given you used MAANG instead of FAANG.

As one of the WAU (really DAU) you’re talking about, I want to call out a couple of things:

- The LOC metrics are flawed, and anyone using the agents knows this: e.g., ask CC to rewrite the 1 commit you wrote into 5 different commits, and now you have 5 100% AI-written commits (a toy illustration of the problem follows below).

- The total speed-up across the entire dev lifecycle is far below 10x, most likely below 2x, but I don’t see any evidence of anyone measuring the counterfactuals to prove a speed-up anyway, so there’s no clear data.

- Look at token spend for power users; you might be surprised by how many SWE-years they’re spending.
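A toy illustration of that first point, using a naive attribution metric (hypothetical, not the actual internal tooling), which gets inflated when the agent re-authors the same lines as several commits:

    # Hypothetical naive "AI share" metric; the real internal tooling is not shown here.
    # One human commit of 100 lines was reworked by the agent into five commits.
    commits = [
        ("claude-code", 20), ("claude-code", 15), ("claude-code", 30),
        ("claude-code", 10), ("claude-code", 25),
        ("alice", 100),      # an unrelated human-authored commit
    ]

    ai = [(a, n) for a, n in commits if a == "claude-code"]
    share_by_commit = len(ai) / len(commits)                           # 5/6 ~ 83% "AI-written"
    share_by_loc = sum(n for _, n in ai) / sum(n for _, n in commits)  # 100/200 = 50%
    print(share_by_commit, share_by_loc)
    # Either way, the metric credits the agent with work a human originally wrote.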

Overall it’s unclear whether LLM-assisted coding is ROI-positive.


To add to your point:

If the M stands for Meta, I would also like to note that as a user, I have been seeing increasingly poor UI, of the sort I'd expect from people committing code that wasn't properly checked before going live, as I would expect from vibe coding in the original sense of "blindly accept without review". Like, some posts have two copies of the sender's name in the same location on screen with slightly different fonts going out of sync with each other.

I can easily believe that the metrics all [MF]AANG bonuses are denominated in are going up; our profession has had jokes about engineers gaming those metrics even back when our comics were still printed in books: https://imgur.com/bug-free-programs-dilbert-classic-tyXXh1d


Oh yes, all of this I agree with. I had tried to clarify this above, but your examples are clearer. My point is: all measures and studies I have personally seen of AI impact on productivity have been deeply flawed for one reason or another.

Total speed up is WAY less than 10x by any measure. 2x seems too high too.

By the data alone, the impact is a bit unclear, I agree. But I will say that, starting from a prior formed by personal experience, there seems to be a clear picture indicating some real productivity impact today, with a trajectory suggesting that claims of a lot of SWE work being offloaded to agents over the next few years are not that far-fetched.

- Adoption and retention numbers, internally and externally. You can argue this is driven by perverse incentives and/or the perception-performance mismatch, but I’m highly skeptical of that. Even though the effects of both are probably real, it would be truly extraordinary to me if there weren’t at least a ~10-20% bump in productivity today, with a lot of headroom to go as integration gets better, user skill grows, and model capabilities improve.

- Benchmark performance. Again, benchmarks are really problematic, but there are a lot of them, and together they paint a pretty clear picture of capabilities genuinely growing, and growing quickly.

- There are clearly biases we can think of that would cause us to overestimate AI impact, but there are also biases that may cause us to underestimate it: e.g., I’m now able to do work that I would never have attempted before. Multitasking is easier. Experiments are quicker and easier. That may not be captured well by, e.g., task completion time or other metrics.

I even agree: quality of agentic code can be a real risk, but:

- I think this ignores the fact that humans have also always written shitty code and always will; there is lots of garbage in production believe me, and that predates agentic code

- as models improve, they can correct earlier mistakes

- it’s also a muscle to grow: how to review and use humans in the loop to improve quality and set a high bar


Great response, we’re like 98% aligned at a high-level. :) These next few years will be interesting.

Anecdotes don’t prove anything, especially ones without any metrics, and especially at a MAANG where AI use is strongly incentivized.

Evidence is peer-reviewed research, or at least something with metrics. Like the METR study, which showed that experienced engineers often got slower on real tasks with AI tools, even though they thought they were faster.


That's why I gave you data! The METR study was 16 people using Sonnet 3.5/3.7. The data I'm talking about covers tens of thousands of people and is much more up to date.

There are counterexamples to METR in the literature, but I'll just say: "rigor" here is very difficult (including for METR), because outcomes are high-dimensional and nuanced, or ecological validity is an issue. It's hard to have any approach that someone wouldn't be able to dismiss due to some issue they have with the methodology. The sources below also have methodological problems, just like METR.

https://arxiv.org/pdf/2302.06590 -- 55% faster implementing an HTTP server in JavaScript with Copilot (in 2023!), but this is a single task and not really representative.

https://demirermert.github.io/Papers/Demirer_AI_productivity... -- "Though each experiment is noisy, when data is combined across three experiments and 4,867 developers, our analysis reveals a 26.08% increase (SE: 10.3%) in completed tasks among developers using the AI tool. Notably, less experienced developers had higher adoption rates and greater productivity gains." (but e.g. "completed tasks" as the outcome measure is of course problematic)
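As a quick sanity check on what that point estimate and standard error imply (plain arithmetic, not a figure from the paper):

    # Rough 95% confidence interval from the quoted estimate and standard error.
    estimate, se = 26.08, 10.3
    low, high = estimate - 1.96 * se, estimate + 1.96 * se
    print(f"~{low:.1f}% to ~{high:.1f}%")   # roughly 5.9% to 46.3%

So the result is statistically distinguishable from zero, but consistent with anything from a modest single-digit gain to a very large one.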

To me, internal measures at large tech companies will be the most reliable: they are the easiest to track and measure, the scale is large enough, and the talent + task pool is diverse (junior -> senior, different product areas, different types of tasks). But then outcome measures are always a problem... commits per developer per month? LOC? Task completion time? All of them are highly problematic, especially because it's reasonable to expect AI tools to change the bias and variance of the proxy, so it's never clear whether you're measuring a change in "style" or a change in the underlying latent measure of productivity you care about.


To be fair, I’ll take an unbiased 16-person study over “internal measures” from a MAANG company that burned 100s of billions on AI with no ROI and is now forcing its employees to use AI.

What do you think about the METR 50% task length results? About benchmark progress generally?

I don't speak for bopbopbop7, but I will say this: my experience of using Claude Code has been that it can do much longer tasks than the METR benchmark implies are possible.

The converse of this is that if those tasks are representative of software engineering as a whole, I would expect a lot of other tasks where it absolutely sucks.

This expectation is further supported by the number of times people pop up in conversations like this to say that any given LLM falls flat on its face even on something the poster thinks is simple, and that it cost more time than it saved.

As with supposedly "full" self-driving on Teslas, the anecdotes about the failure modes are much more interesting than the successes: one person whose commute/coding problem happens to be easy may mistake their own circumstances for normal. Until it does work everywhere, it doesn't work everywhere.

When I experiment with vibe coding (as in, properly unsupervised), it can break down large tasks into small ones and churn through each sub-task well enough that it can do a task I'd expect to take most of a sprint by itself. Now, that said, I will also say it seems to do these things at a level of "that'll do", not "amazing!", but it does do them.

But I am very much aware this is like all the people posting "well my Tesla commute doesn't need any interventions!" in response to all the people pointing out how it's been a decade since Musk said "I think that within two years, you'll be able to summon your car from across the country. It will meet you wherever your phone is … and it will just automatically charge itself along the entire journey."

It works on my [use case], but we can't always ship my [use case].


I could have guessed you would say that :) but METR is not an unbiased study either. Maybe you mean that METR is less likely to intentionally inflate their numbers?

If you insist, or believe in a conspiracy, I don’t think there’s really anything I or others will be able to say or show you that would assuage you; all I can say is that I’ve seen the raw data. It’s a mess, and again we’re stuck with proxies (which are bad, since you start conflating a change in the proxy-latent relationship with the treatment effect). And it’s also hard, and arguably irresponsible, to run RCTs.

All I will say is: there are flaws everywhere. The METR results are far from conclusive. It's totally understandable if there is a mismatch between perception and performance. But also consider: even if a task takes the same or even slightly more time, one big advantage for me is that it substantially reduces cognitive load, so I can work in parallel sessions on two completely different issues.


I bet it does reduce your cognitive load, considering you, in your own words "Give up when Claude is hopelessly lost". No better way to reduce cognitive load.

I give up using Claude when it gets hopelessly lost, and then my cognitive load increases.

Meta internal study showed a 6-12% productivity uplift.

https://youtu.be/1OzxYK2-qsI?si=8Tew5BPhV2LhtOg0


> - Give up when Claude is hopelessly lost

You love to see "Maybe completely waste my time" as part of the normal flow for a productivity tool


That negates everything else? If you have a tool that can boost you for 80% of your work and for the other 20% you just have to do what you’re already doing, is that bad?

There's a reason why sunk cost IS a fallacy and not a sound strategy.

The productivity uplift is massive, Meta got a 6-12% productivity uplift from AI coding!

https://youtu.be/1OzxYK2-qsI?si=8Tew5BPhV2LhtOg0


> You then have massive hardware buildouts and improvements to stack + a ton of R&D + competition to squeeze the juice out of the current paradigm (there are 4 orders of magnitude of scaling left before we hit real bottlenecks)

This is a surprising claim. There are only about 3 orders of magnitude between US data centre electricity consumption and worldwide primary energy production (as in, not just electricity). Worldwide electricity supply is about 3/20ths of world primary energy, so without very rapid increases in electricity supply there's really only a little more than 2 orders of magnitude of growth possible in compute.

Renewables are growing fast, but "fast" means "will approach 100% of current electricity demand by about 2032". Which trend is faster, growth of renewable electricity or growth of compute? Trick question: compute is always constrained by electricity supply, and renewable electricity is growing faster than anything else can right now.
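A back-of-the-envelope version of that arithmetic, using rough public figures I'm assuming here (roughly 2023-era estimates, not numbers from the parent comment):

    # Very rough orders-of-magnitude check; all figures are approximate assumptions.
    import math

    us_datacenter_twh = 180         # ~US data centre electricity use per year (TWh)
    world_electricity_twh = 30_000  # ~worldwide electricity generation per year (TWh)
    world_primary_twh = 170_000     # ~worldwide primary energy per year, in TWh

    print(math.log10(world_primary_twh / us_datacenter_twh))      # ~3.0 orders of magnitude
    print(math.log10(world_electricity_twh / us_datacenter_twh))  # ~2.2 orders of magnitude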


This is not my own claim, it’s based on the following analysis from Epoch: https://epoch.ai/blog/can-ai-scaling-continue-through-2030

But I forgot how old that article is: it’s 4 orders of magnitude past GPT-4 in terms of total compute, which is, I think, only about 3.5 orders of magnitude from where we are today (based on 4.4x scaling/yr).
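For reference, converting that per-year multiplier into orders of magnitude is straightforward (the 4.4x/yr figure is Epoch's; the arithmetic is mine):

    # 4.4x per year in total training compute, expressed in orders of magnitude.
    import math

    oom_per_year = math.log10(4.4)           # ~0.64 OOM per year
    years_for_3_5_oom = 3.5 / oom_per_year   # ~5.4 years at that pace
    print(oom_per_year, years_for_3_5_oom)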


The nature of my job has always been fighting red tape, process, and stakeholders to deploy very small units of code to production. AI really did not help with much of that for me in 2025.

I'd imagine I'm not the only one in a similar situation. Until all those people and processes can be swept away in favor of letting LLMs YOLO everything into production, I don't see how that changes.


No, I think that's extremely correct. I work at a MAANG where we have the resources to hook up custom internal LLMs and agents to actually deal with that, but that is unique to an org of our scale.

Weird self-roast, but okay.

Over 30 years code artisan here. AI has made me 100x more productive. No, I will not provide proof. Sam Altman is the best.

Do you have any evidence of all these "killings" of the profession or are you just vibing?


Photography had particularly dramatic effects on the livelihoods of painters who operated on the fringe of the mainstream. This included the portrait miniaturists, whose markets fell drastically, particularly after the introduction of the multi-pose and cheap cartes de visite in the mid-1850s. Many gave up, while others turned to colouring photos [25]. Some painters of sentimental genre scenes were also particularly affected, as a result of the profusion of readily available photographic genre works, often composed in a painterly or "pictorial" style [26]. This was sometimes due not to the public’s preference for the photographic version, but simply because a particular subject matter lost its appeal to painters and their clients once photography entered the scene [27]. In addition, the introduction of “half-tone” photography in the 1880s also initiated a slow decline in the market for newspaper and magazine illustrators [28].

Much more here: https://www.artinsociety.com/pt-1-initial-impacts.html

https://www.barnesfoundation.org/whats-on/early-photography


Nice wall of text; which part of that says painters' jobs were killed?

Or did you just read the title of the second article and not realize it’s not being literal but capturing the anxiety of the painters in the 19th century?


I think the first article, which is highly recommended (and is where the excerpt comes from), goes over the subsequent effects on the profession. The second one goes over the different genres that disappeared, and is less concerned with the artists themselves.

Apart from that, our interaction seems overly emotional to me, so I'll leave it at that.


So nothing about killing the profession, got it, so we were just vibing.


The camera did not kill painting. And how does comparing a camera to an LLM even make sense?


Painting as a profession (portraits, for example) largely disappeared, while art based on painting evolved (impressionism, cubism, etc.) due to the camera.

My point is that photography is essentially a simulacrum of reality, yet it unexpectedly created its own art form and influenced existing ones. So will the use of LLMs for generation.


LLMs will not do what the camera did. LLMs have no anchor to reality like the camera; they simply optimize for the average. A camera is a whole new medium; an LLM is a statistical construction. Sorry to burst your AI bubble. LLMs will not be the new camera, they won't be a new programming language, and they won't be the new compiler.


I am not sure I agree. A camera, on the surface of things, is the most boring machine: it shows you what was already there. It can still be the basis of several interesting art forms.

I don't see why this can't happen with AI, or at least I am not certain, like you, that it can't happen.


It might turn out that there are more portrait artists, brush in hand, working today than at any point in history, in real numbers. This is certainly true for riding horses, and most definitely for musicians. But as the sole sources of those services, or as percentages of totals, that is no longer true; still, with 8 billion people, the internet, and a bit of effort, almost anything is viable.


I generally agree, because the population grew, and music can be put aside as it never went through such a technological revolution (the synth never really caught on).

But from what I saw, there are fewer living horses today than 200 years ago, and although that's just a proxy for horse riders, I believe the same applies to painters, especially non-hobbyists.


They're conflating the LLM with advancement by drawing the LLM:camera analogy, when really cameras and paintings are two different things.


Code writing was solved in 1997 when Dreamweaver was released.


Nope, it was solved with Visual Basic in 1991. And with Nextstep in 1989. And with...

I really dislike people comparing GenAI with compilers. Compilers largely do mechanical transformations; they make almost zero logic changes (and when they do, those are bugs).

We are in an industry that's great at throwing (developing) and really bad at catching (QA), and we've just invented the machine gun. For some reason people expect the machine gun to be great at catching, or worse, they expect to just throw things continuously and have things work as before.

There is a lot of software for which bugs (especially data handling bugs) don't meaningfully affect its users. BUT there isn't a lot of software we use daily and rely on for which that's the case.

I know that GenAI can help with QA, but I don't really see a world where using GenAI for both coding and QA gets us where we want to go, unless, as some people say, we start using formal verification (or other very rigorous and hopefully automatable advanced verification), at which point we'll have invented a new category of programmers (and we will need to train all of them, since the vast majority of current developers don't know about or use formal verification).


What does it mean?


The preferred definition of "vibe coding" is when you have AI generate code that you use without reviewing it first: https://simonwillison.net/2025/Mar/19/vibe-coding/

Unfortunately a lot of people think it means any time an LLM helps write code, but I think we're winning that semantic battle - I'm seeing more examples of it used correctly than incorrectly these days.

It's likely that the majority of code will be AI assisted in some way in the future, at which point calling all of it "vibe coding" will lose any value at all. That's why I prefer the definition that specifies unreviewed.


I share your preference. (I also mourn the loss of the word "vibe" for other contexts.) In this case there were apparently hundreds of commit messages stating "generated by Claude Code". I feel like there's a missing set of descriptors -- something similar to Creative Commons with its now-familiar labels like "CC-BY-SA" -- that could be used to indicate the relative degree of human involvement. Full-on "AI-YOLO-Paperclips" at one extreme could be distinguished from "AI-IDE-TA" for typeahead / fancy autocomplete at the other. Simon, you're in a fantastic position to champion some kind of basic system like this. If you run w/ this idea, please give me a shout-out. :)
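A hypothetical sketch of what such labels could look like as commit trailers (the names below are invented for illustration, riffing on the ones above; this is not an existing standard):

    # Invented labels for degree of AI involvement, loosely CC-style.
    from enum import Enum

    class AIInvolvement(Enum):
        AI_YOLO = "AI-YOLO"        # fully generated, shipped without human review
        AI_GEN_REV = "AI-GEN-REV"  # AI-generated, human-reviewed before merge
        AI_PAIR = "AI-PAIR"        # human and AI interleaved throughout
        AI_IDE_TA = "AI-IDE-TA"    # typeahead / fancy autocomplete only
        AI_NONE = "AI-NONE"        # no AI assistance

    def commit_trailer(level: AIInvolvement) -> str:
        # e.g. appended to a commit message, git-trailer style.
        return f"AI-Involvement: {level.value}"

    print(commit_trailer(AIInvolvement.AI_GEN_REV))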


I also hope that the majority of code in the future is AI-assisted, like it is at PostHog, because my cybersecurity firm is going to make so much money.

