More

bopbopbop7 · 2026-01-14T17:29:48 1768411788

Great LinkedIn post, thanks for sharing.

bopbopbop7 · 2026-01-13T19:26:56 1768332416

Just a couple more months and a couple more trillion dollars to Altman!

bopbopbop7 · 2026-01-13T02:48:13 1768272493

I can't believe they keep on forgetting to add "make it secure" to the end of their prompts.

bopbopbop7 · 2026-01-12T19:55:44 1768247744

Why not just ask Claude to fix the security issues and make sure they don't happen again?

Hamuko · 2026-01-12T20:26:56 1768249616

And if you don't have a Claude subscription, you can just ask your friends to fix them via the remote code execution server.

reactordev · 2026-01-13T01:17:34 1768267054

There goes my discord side hustle, offering Claude code through your OpenCode.

Y_Y · 2026-01-12T20:28:31 1768249711

Talk about kicking someone while they're down...

lostmsu · 2026-01-13T13:52:47 1768312367

I imagine Claude would be able to at least fix this one.

0x500x79 · 2026-01-13T16:03:45 1768320225

I imagine Claude helped write this one.

croes · 2026-01-12T22:04:49 1768255489

Who knows what created the issues in the first place place

bopbopbop7 · 2026-01-11T20:09:34 1768162174

I'm working on BacklogAI, an autopilot engineer for SaaS teams.

You connect GitHub, CI, Sentry, and Linear, and it takes tickets all the way to production. Claude writes the changes, BacklogAI handles tests, migrations, feature flags, staged rollouts, etc...

It’s clearing months of backlog work in hours, and a couple teams I’m working with have already stopped hiring because it’s cheaper than adding more developers. It's crystal clear that developers and designers won't be needed in a couple of months because claude increases productivity by at least 120x and one or two PMs can do pretty much everything.

My stack is Claude, v0, nextjs, shadcn, clerk, supabase, vercel.

what · 2026-01-12T03:13:31 1768187611

If your story were true, I could just have Claude implement your business and would have no need to pay you.

If these productivity gains were real, no one would be giving you access.

bopbopbop7 · 2026-01-12T16:41:01 1768236061

They are real, you simply need to prompt better.

hotpotat · 2026-01-11T20:43:55 1768164235

What’s needed to make it at least 121x productivity?

bopbopbop7 · 2026-01-11T20:55:38 1768164938

Claude 5.0

rafterydj · 2026-01-11T20:42:46 1768164166

Do you have any concerns about production bugs or human-in-the-loop?

bopbopbop7 · 2026-01-11T20:55:20 1768164920

Human in the loop is no longer needed as all production bugs are fixed with a bug fix agent automatically.

bopbopbop7 · 2026-01-10T19:19:42 1768072782

So you think a statement about the current state of things is wrong because you believe that sometime in the future agents are going to magically do everything? Great argument!

bopbopbop7 · 2026-01-03T16:42:01 1767458521

Company that invested 100s of billions into AI says that AI is good. In other news, “Eat meat” says the butcher.

bopbopbop7 · 2026-01-01T17:25:38 1767288338

2022/2023: "Next year software engineering is dead"

2024: "Now this time for real, software engineering is dead in 6 months, AI CEO said so"

2025: "I know a guy who knows a guy who built a startup with an LLM in 3 hours, software engineering is dead next year!"

What will be the cope for you this year?

ben_w · 2026-01-02T10:28:04 1767349684

I went from using ChatGPT 3.5 for functions and occasional scripts…

… to one of the models in Jan 2024 being able to repeatedly add features to the same single-page web app without corrupting its own work or hallucinating the APIs it had itself previously generated…

… to last month using a gifted free week of Claude Code to finish one project and then also have enough tokens left over to start another fresh project which, on that free left-over credit, reached a state that, while definitely not well engineered, was still better than some of the human-made pre-GenAI nonsense I've had to work with.

Wasn't 3 hours, and I won't be working on that thing more this month either because I am going to be doing intensive German language study with the goal of getting the language certificate I need for dual citizenship, but from the speed of work? 3 weeks to make a startup is already plausible.

I won't say that "software engineering" is dead. In a lot of cases however "writing code" is dead, and the job of the engineer should now be to do code review and to know what refactors to ask for.

bopbopbop7 · 2026-01-02T17:10:39 1767373839

So you did some basic web development and built a "not well engineered" greenfield app that you didn't ship, and from that your conclusion is that "writing code is dead"?

ben_w · 2026-01-02T20:21:55 1767385315

In half a week with left-over credit.

What do you think the first half of the credit was spent on?

In addition to the other projects it finished off for me, the reason I say "coding is dead" is that even this mediocre quality code is already shippable. Customers do not give a toss if it has clean code or nicely refactored python backend, that kind of thing is a pain point purely for developers, and when the LLM is the developer then the LLM is the one who gets to be ordered to pay down the technical debt.

The other project (and a third one I might have done on a previous free trial) are as complete as I care to make them. They're "done" in a way I'm not used to being possible with manual coding, because LLMs can finish features faster than I can think of new useful features to add. The limiting factor is my ability to do code review, or would be if I got the more expensive option, as I was on a free trial I could do code review about twice as fast as I burned through tokens (given what others say about the more expensive option that either means I need to learn to code review faster, or my risk tolerance is lower than theirs).

Now, is my new 3-day web app a viable business idea? It would've been shippable as-is 5-6 years ago, I saw worse live around then. Today? Hard to say, if markets were efficient then everyone would know LLMs can create this kind of thing so easily and nobody could charge for them, but people like yourself who disbelieve are an example of markets not being efficient, people like you can have apps like these sold to them.

That said, I try not to look at where the ball is but where it is going. For business ideas, I have to figure out what *doesn't* scale, and do that. Coding *does* scale now, that's why coding is dead.

I expect to return to this project in a month. Have one of the LLMs expand it and develop it for more than 3 the days spent so far, turn it into something I'd actually be happy to sell. Like I said, it seems like we're at "3 weeks" not "3 hours" for a decent MVP by current standards, but the floor is rising fast.

aspenmartin · 2026-01-01T20:15:30 1767298530

The cope + disappointment will be knowing that a large population of HN users will paint a weird alternative reality. There are a multitude of messages about AI that are out there, some are highly detached from reality (on the optimistic and pessimistic side). And then there is the rational middle, professionals who see the obvious value of coding agents in their workflow and use them extensively (or figure out how to best leverage them to get the most mileage). I don't see software engineering being "dead" ever, but the nature of the job _has already changed_ and will continue to change. Look at Sonnet 3.5 -> 3.7 -> 4.5 -> Opus 4.5; that was 17 months of development and the leaps in performance are quite impressive. You then have massive hardware buildouts and improvements to stack + a ton of R&D + competition to squeeze the juice out of the current paradigm (there are 4 orders of magnitude of scaling left before we hit real bottlenecks) and also push towards the next paradigm to solve things like continual learning. Some folks have opted not to use coding agents (and some folks like yourself seem to revel in strawmanning people who point out their demonstrable usefulness). Not using coding agents in Jan 2026 is defensible. It won't be defensible for long.

bopbopbop7 · 2026-01-01T20:25:58 1767299158

Please do provide some data for this "obvious value of coding agents". Because right now the only thing obvious is the increase in vulnerabilities, people claiming they are 10x more productive but aren't shipping anything, and some AI hype bloggers that fail to provide any quantitative proof.

aspenmartin · 2026-01-01T20:39:15 1767299955

Sure: at my MAANG company, where I watch the data closely on adoption of CC and other internal coding agent tools, most (significant) LOC are written by agents, and most employees have adopted coding agents as WAU, and the adoption rate is positively correlated with seniority.

Like a lot of things LLM related (Simon Willison's pelican test, researchers + product leaders implementing AI features) I also heavily "vibe" check the capabilities myself on real work tasks. The fact of the matter is I am able to dramatically speed up my work. It may be actually writing production code + helping me review it, or it may be tasks like: write me a script to diagnose this bug I have, or build me a streamlit dashboard to analyze + visualize this ad hoc data instead of me taking 1 hour to make visualizations + munge data in a notebook.

> people claiming they are 10x more productive but aren't shipping anything, and some AI hype bloggers that fail to provide any quantitative proof.

what would satisfy you here? I feel you are strawmanning a bit by picking the most hyperbolic statements and then blanketing that on everyone else.

My workflow is now:

- Write code exclusively with Claude

- Review the code myself + use Claude as a sort of review assistant to help me understand decisions about parts of the code I'm confused about

- Provide feedback to Claude to change / steer it away or towards approaches

- Give up when Claude is hopelessly lost

It takes a bit to get the hang of the right balance but in my personal experience (which I doubt you will take seriously but nevertheless): it is quite the game changer and that's coming from someone who would have laughed at the idea of a $200 coding agent subscription 1 year ago

Denzel · 2026-01-02T02:30:35 1767321035

We probably work at the same company, given you used MAANG instead of FAANG.

As one of the WAU (really DAU) you’re talking about, I want to call out a couple things: 1) the LOC metrics are flawed, and anyone using the agents knows this - eg, ask CC to rewrite the 1 commit you wrote into 5 different commits, now you have 5 100% AI-written commits; 2) total speed up across the entire dev lifecycle is far below 10x, most likely below 2x, but I don’t see any evidence of anyone measuring the counterfactuals to prove speed up anyways, so there’s no clear data; 3) look at token spend for power users, you might be surprised by how many SWE-years they’re spending.

Overall it’s unclear whether LLM-assisted coding is ROI-positive.

ben_w · 2026-01-02T10:53:32 1767351212

To add to your point:

If the M stands for Meta, I would also like to note that as a user, I have been seeing increasingly poor UI, of the sort I'd expect from people committing code that wasn't properly checked before going live, as I would expect from vibe coding in the original sense of "blindly accept without review". Like, some posts have two copies of the sender's name in the same location on screen with slightly different fonts going out of sync with each other.

I can easily believe the metrics that all [MF]AANG bonuses are denominated in are going up, our profession has had jokes about engineers gaming those metrics even back when our comics were still printed in books: https://imgur.com/bug-free-programs-dilbert-classic-tyXXh1d

aspenmartin · 2026-01-02T13:21:36 1767360096

Oh yes all of this I agree with. I had tried to clarify this above but your examples are clearer: my point is: all measures and studies I have personally seen of AI impact on productivity have been deeply flawed for one reason or another.

Total speed up is WAY less than 10x by any measure. 2x seems too high too.

By data alone it’s a bit unclear of impact I agree. But I will say there seems to be a clear picture that to me, starting from a prior formed from personal experience, indicates some real productivity impact today, with a trajectory that suggests these claims of a lot of SWE work being offloaded to agents over the next few years seems not that far fetched.

- adoption and retention numbers internally and externally. You can argue this is driven by perverse incentives and/or the perception performance mismatch but I’m highly skeptical of this even though the effects of both are probably really, it would be truly extraordinary to me if there weren’t at least a ~10-20% bump in productivity today and a lot of headroom to go as integration gets better and user skill gets better and model capabilities grow

- benchmark performance, again benchmarks are really problematic but there are a lot of them and all of them together paint a pretty clear picture of capabilities truly growing and growing quickly

- there are clearly biases we can think of that would cause us to overestimate AI impact, but there are also biases that may cause us to underestimate impact: e.g. I’m now able to do work that I would have never attempted before. Multitasking is easier. Experiments are quicker and easier. That may not be captured well by e.g. task completion time or other metrics.

I even agree: quality of agentic code can be a real risk, but:

- I think this ignores the fact that humans have also always written shitty code and always will; there is lots of garbage in production believe me, and that predates agentic code

- as models improve, they can correct earlier mistakes

- it’s also a muscle to grow: how to review and use humans in the loop to improve quality and set a high bar

Denzel · 2026-01-09T02:49:49 1767926989

Great response, we’re like 98% aligned at a high-level. :) These next few years will be interesting.

bopbopbop7 · 2026-01-01T20:54:53 1767300893

Anecdotes don’t prove anything, ones without any metrics, and especially at MAANG where AI use is strongly incentivized.

Evidence is peer reviewed research, or at least something with metrics. Like the METR study that shows that experienced engineers often got slower on real tasks with AI tools, even though they thought they were faster.

aspenmartin · 2026-01-01T22:16:11 1767305771

That's why I gave you data! METR study was 16 people using Sonnet 3.5/3.7. Data I'm talking about is 10s of thousands of people and is much more up to date.

Some counter examples to METR that are in the literature but I'll just say: "rigor" here is very difficult (including METR) because outcomes are high dimensional and nuanced, or ecological validity is an issue. It's hard to have any approach that someone wouldn't be able to dismiss due to some issue they have with the methodology. The sources below also have methodological problems just like METR

https://arxiv.org/pdf/2302.06590 -- 55% faster implementing HTTP server in javascript with copilot (in 2023!) but this is a single task and not really representative.

https://demirermert.github.io/Papers/Demirer_AI_productivity... -- "Though each experiment is noisy, when data is combined across three experiments and 4,867 developers, our analysis reveals a 26.08% increase (SE: 10.3%) in completed tasks among developers using the AI tool. Notably, less experienced developers had higher adoption rates and greater productivity gains." (but e.g. "completed tasks" as the outcome measure is of course problematic)

To me, internal company measures for large tech companies will be most reliable -- they are easiest to track and measure, the scale is large enough, and the talent + task pool is diverse (junior -> senior, different product areas, different types of tasks). But then outcome measures are always a problem...commits per developer per month? LOC? task completion time? all of them are highly problematic, especially because its reasonable to expect AI tools would change the bias and variance of the proxy so its never clear if you're measuring the change in "style" or the change in the underlying latent measure of productivity you care about

bopbopbop7 · 2026-01-01T22:38:44 1767307124

To be fair, I’ll take a non-biased 16 person study over “internal measures” from a MAANG company that burned 100s of billions on AI with no ROI that is now forcing its employees to use AI.

anorwell · 2026-01-02T02:04:56 1767319496

What do you think about the METR 50% task length results? About benchmark progress generally?

ben_w · 2026-01-02T11:02:33 1767351753

I don't speak for bopbopbop7, but I will say this: my experience of using Claude Code has been that it can do much longer tasks than the METR benchmark implies are possible.

The converse of this is that if those tasks are representative of software engineering as a whole, I would expect a lot of other tasks where it absolutely sucks.

This expectation is further supported by the number of times people pop up in conversations like this to say for any given LLM that it falls flat on its face even for something the poster thinks is simple, that it cost more time than it saved.

As with supposedly "full" self driving on Teslas, the anecdotes about the failure modes are much more interesting than the success: one person whose commute/coding problem happens to be easy, may mistake their own circumstances for normal. Until it does work everywhere, it doesn't work everywhere.

When I experiment with vibe coding (as in, properly unsupervised), it can break down large tasks into small ones and churn through each sub-task well enough, such that it can do a task I'd expect to take most of a sprint by itself. Now, that said, I will also say it seems to do these things a level of "that'll do" not "amazing!", but it does do them.

But I am very much aware this is like all the people posting "well my Tesla commute doesn't need any interventions!" in response to all the people pointing out how it's been a decade since Musk said "I think that within two years, you'll be able to summon your car from across the country. It will meet you wherever your phone is … and it will just automatically charge itself along the entire journey."

It works on my [use case], but we can't always ship my [use case].

aspenmartin · 2026-01-02T00:08:21 1767312501

I could have guessed you would say that :) but METR is not an unbiased study either. Maybe you mean that METR is less likely to intentionally inflate their numbers?

If you insist or believe in a conspiracy I don’t think there’s really anything I or others will be able to say or show you that would assuage you, all I can say is I’ve seen the raw data. It’s a mess and again we’re stuck with proxies (which are bad since you start conflating the change in the proxy-latent relationship with the treatment effect). And it’s also hard and arguably irresponsible to run RCTs.

All I will say is: there are flaws everywhere. METR results are far from conclusive. Totally understandable if there is a mismatch between perception and performance. But also consider: even if task takes the same or even slightly more time, one big advantage for me is that it substantially reduces cognitive load so I can work in parallel sessions on two completely different issues.

bopbopbop7 · 2026-01-02T00:28:24 1767313704

I bet it does reduce your cognitive load, considering you, in your own words "Give up when Claude is hopelessly lost". No better way to reduce cognitive load.

aspenmartin · 2026-01-02T01:03:22 1767315802

I give up using Claude when it gets hopelessly lost, and then my cognitive load increases.

Ianjit · 2026-01-01T23:59:26 1767311966

Meta internal study showed a 6-12% productivity uplift.

https://youtu.be/1OzxYK2-qsI?si=8Tew5BPhV2LhtOg0

insin · 2026-01-01T22:40:57 1767307257

> - Give up when Claude is hopelessly lost

You love to see "Maybe completely waste my time" as part of the normal flow for a productivity tool

aspenmartin · 2026-01-02T00:09:51 1767312591

That negates everything else? If you have a tool that can boost you for 80% of your work and for the other 20% you just have to do what you’re already doing, is that bad?

shimman · 2026-01-02T01:45:44 1767318344

There's a reason why sunk cost IS a fallacy and not a sound strategy.

Ianjit · 2026-01-01T23:56:35 1767311795

The productivity uplift is massive, Meta got a 6-12% productivity uplift from AI coding!

https://youtu.be/1OzxYK2-qsI?si=8Tew5BPhV2LhtOg0

nsxwolf · 2026-01-01T20:28:04 1767299284

The nature of my job has always been fighting red tape, process, and stake holders to deploy very small units of code to production. AI really did not help with much of that for me in 2025.

I'd imagine I'm not the only one who has a similar situation. Until all those people and processes can be swept away in favor of letting LLMS YOLO everything into production, I don't see how that changes.

aspenmartin · 2026-01-01T20:41:12 1767300072

No I think that's extremely correct. I work at a MAANG where we have the resources to hook up custom internal LLMs and agents to actually deal with that but that is unique to an org of our scale.

ben_w · 2026-01-02T10:43:33 1767350613

> You then have massive hardware buildouts and improvements to stack + a ton of R&D + competition to squeeze the juice out of the current paradigm (there are 4 orders of magnitude of scaling left before we hit real bottlenecks)

This is a surprising claim. There's only 3 orders of magnitude between US data centre electricity consumption and worldwide primary energy (as in, not just electricity) production. Worldwide electricity supply is about 3/20ths of world primary energy, so without very rapid increases in electricity supply there's really only a little more than 2 orders of magnitude growth possible in compute.

Renewables are growing fast, but "fast" means "will approach 100% of current electricity demand by about 2032". Which trend is faster, growth of renewable electricity or growth of compute? Trick question, compute is always constrained by electricity supply, and renewable electricity is growing faster than anything else can right now.

aspenmartin · 2026-01-02T13:31:25 1767360685

This is not my own claim, it’s based on the following analysis from Epoch: https://epoch.ai/blog/can-ai-scaling-continue-through-2030

But I forgot how old that article is: it’s 4 orders of magnitude past GPT-4 in terms of total compute which is I think only 3.5 orders of magnitude from where we are today (based on 4.4x scaling/yr)

bopbopbop7 · 2025-12-30T03:53:38 1767066818

Weird self roast but okay.

bopbopbop7 · 2025-12-30T03:36:53 1767065813

Over 30 years code artisan here. AI has made me 100x more productive. No, I will not provide proof. Sam Altman is the best.