More

aspenmartin · 2026-01-11T17:12:32 1768151552

I don't think these characterizations in either direction are very helpful; I understand they're coming from a place with someone trying to make sense of why their ingrained notion of what creativity means and what the "right" way to generate software projects is is not shared by other people.

I use CC for both business and personal projects. In both cases: I want to achieve something cool. If I do it by hand, it is slow, I will need to learn something new which takes too much time and often time the thing(s) I need to learn is not interesting to me (at the time). Additionally, I am slow and perpetually unhappy with the abstractions and design choices I make despite trying very hard to think through them. With CC: it can handle parts of the project I don't want to deal with, it can help me learn the things I want to learn, it can execute quickly so I can try more things and fail fast.

What's lamentable is the conclusion of "if you use AI it is not truly creative" ("have people using AI lost all understanding of creativity or creation?" is a bit condescending).

In other threads the sensitive dynamic from the AI-skeptic crowds is more or less that AI enthusiasts "threaten or bully" people who are not enthusiastic that they will get "punished" or fall behind. Yet at the same time, AI-skeptics seem to routinely make passive aggressive implications that they are the ones truly Creating Art and are the true Craftsman; as if this venture is some elitist art form that should be gate kept by all of you True Programmers (TM).

I find these takes (1) condescending, (2) wrong and also belying a lack of imagination about what others may find genuinely enjoyable and inspiring, (3) just as much of a straw man as their gripes against others "bullying" them into using AI.

aspenmartin · 2026-01-09T20:36:27 1767990987

Is this coming from the hypothesis / prior that coding agents are a net negative and those who use them really are akin to gambling addicts that are just fooling themselves?

The OP is right and I feel this a lot: when Claude pulls me into a rabbit hole, convinces me it knows where to go, and then just constantly falls flat on its face and we waste like several hours together, with a lot of all caps prompts from me towards the end. These sessions last in a way that he mentions: "maybe its just a prompt away from working"

But I would never delete CC because there are plenty of other instances where it works excellent and accelerates things quite a lot. And additionally, I know we see a lot of "coding agents are getting worse!" and "METR study proves all you AI sycophants are deluding yourselves!" and I again understand where these come from, agree with some of the points they raise, but honestly: my own personal perception (which I argue is pretty well backed up by benchmarks and by Claude's own product data which we don't see -- I doubt they would roll out a launch without at least one or more A/B tests) is that coding agents are getting much better, and that as a verifiable domain these "we're running out of data!" problems just aren't relevant here. The same way alphago gets superhuman, so will these coding agents, it's just a matter of when, and I use them today because they are already useful to me.

rileymichael · 2026-01-09T21:33:18 1767994398

no, this is coming from the fact OP states they are miserable. that is unsustainable. at the end of the day the more productive setup is the one that keeps you happy and in your chair long term, as you'll produce nothing if you are burnt out.

aspenmartin · 2026-01-09T21:58:55 1767995935

Oh sure of course, I missed that part!

aspenmartin · 2026-01-09T20:28:55 1767990535

I understand this sentiment but, it is a lot of fun for me. Because I want to make a real thing to do something, and I didn't get into programming for the love of it, I got into it as a means to an end.

It's like the articles point: we don't do assembly anymore and no one considers gcc to be controversial and no one today says "if you think gcc is fun I will never understand you, real programming is assembly, that's the fun part"

You are doing different things and exercising different skillsets when you use agents. People enjoy different aspects of programming, of building. My job is easier, I'm not sad about that I am very grateful.

Do you resent folks like us that do find it fun? Do you consider us "lesser" because we use coding agents? ("the same as saying you’re really into painting but you’re not really into the brush aspect so you pay someone to paint what you describe. That’s not doing, it’s commissioning.") <- I don't really care if you consider this "true" painting or not, I wanted a painting and now I have a painting. Call me whatever you want!

lunar_mycroft · 2026-01-09T21:36:15 1767994575

> It's like the articles point: we don't do assembly anymore and no one considers gcc to be controversial and no one today says "if you think gcc is fun I will never understand you, real programming is assembly, that's the fun part"

The compiler reliably and deterministically produces code that does exactly what you specified in the source code. In most cases, the code it produces is also as fast/faster than hand written assembly. The same can't be said for LLMs, for the simple reason that English (and other natural languages) is not a programming language. You can't compile English (and shouldn't want to, as Dijkstra correctly pointed out) because it's ambiguous. All you can do is "commission" another

> Do you resent folks like us that do find it fun?

For enjoying it on your own time? No. But for hyping up the technology well beyond it's actual merits, antagonizing people who point out it's shortcomings, and subjecting the rest of us to worse code? Yeah, I hold that against the LLM fans.

aspenmartin · 2026-01-09T22:11:52 1767996712

That a coding agent or LLM is a different technology than a compiler and that the delta in industry standard workflow looks different isn’t quite my point though: things change. Norms change. That’s the real crux of my argument.

> But for hyping up the technology well beyond it's actual merits, antagonizing people who point out it's shortcomings, and subjecting the rest of us to worse code? Yeah, I hold that against the LLM fans.

Is that what I’m doing? I understand your frustration. But I hope you understand that this is a straw man: I can straw man the antagonists and AI-hostile folks but the point is the factions and tribes are complex and unreasonable opinions abound. My stance is that people can dismiss coding agents at their peril, but it’s not really a problem: taking the gcc analogy, in the early compiler days there was a period where compilers were weak enough that assembly by hand was reasonable. Now it would be just highly inefficient and underperformant to do that. But all the folks that lamented compilers didn’t crumble away, they eventually adapted. I see that analogy as being applicable here, it may be hard to see the insanity of coding agents because we’re not time travelers from 2020 or even 2022 or 3. But this used to be an absurd idea and is now very serious and highly adopted. But still quite weak!! Still we’re missing key reliability and functionality and capabilities. But if we got this far this fast, and if you realize that coding agent training is not limited in the same way that e.g. vanilla LLM training is by being a verifiable domain, we seem to be careening forward. But by nature of their current weakness, absolutely it is reasonable not to use them and absolutely it is reasonable to point out all of their flaws.

Lots of unreasonable people out there, my argument is simply: be reasonable.

lunar_mycroft · 2026-01-10T23:30:08 1768087808

As others has already been pointed out, not all new technologies that are proposed are improvements. You say you understand this, but the clear subtext of the analogy to compilers is that LLM driven development are a obvious improvement and if we don't adopt them we'll find ourselves in the same position as assembly programmers who refused to learn compiled languages.

> Is that what I’m doing?

Initially I'd have been reluctant to say yes, but this very comment is laced with assertions that we'd better all start adopting LLMs for coding or we're going to get left behind [0]

> taking the gcc analogy, in the early compiler days there was a period where compilers were weak enough that assembly by hand was reasonable. Now it would be just highly inefficient and underperformant to do that

No matter how good LLMs get at translating english into programs, they will still be limited by the fact that their input (natural language) isn't a programming language. This doesn't mean it can't get way better, but it's always going to have some of the same downsides of collaborating with another programmer.

[0] This is another red flag I would hope programmers would have learned to recognize. Good technology doesn't need to try to threaten people into adopting it.

aspenmartin · 2026-01-11T17:02:13 1768150933

My intention was to say: you won't get left behind you will just get left slightly behind the curve until things reach a point where you feel you have no choice but to join the dark side. Like gcc/assembly: sure maybe there were some hardcore assembly holdouts but any day they could and probably did jump on the bandwagon. This is also speculation, I agree, but my point is: not using LLMs/coding agents today is very very reasonable, and the limitations that people often bring up are also very reasonable and believable.

> No matter how good LLMs get at translating english into programs, they will still be limited by the fact that their input (natural language) isn't a programming language.

Right but engineers routinely convert natural language + business context into formal programs, arguably an enormously important part of creating a software product. What's any different here? Like a programmer, the creation process is two-way. The agent iteratively retrieves additional information, asks questions, checks their approach, etc etc.

> [0] This is another red flag I would hope programmers would have learned to recognize. Good technology doesn't need to try to threaten people into adopting it.

I think I was either not clear or you misread my comment: you're not going to get left behind any more than you want to. Jump in when you feel good about where the technology is and use it where you feel it should be used. Again: if you don't see value in your own personal situation with coding agents, that is objectively a reasonable stance to hold today.

bossyTeacher · 2026-01-09T23:44:46 1768002286

> Norms change. That’s the real crux of my argument.

Novelty isn't necessarily better as a replacement of what exists. Example: blockchain as fancy database, NFTs, Internet Explorer, Silverlight, etc.

aspenmartin · 2026-01-09T23:51:26 1768002686

No it’s certainly not, and if you do want to lump coding agents into blockchain and NFTs that’s of course your choice but those things did not spur trillions of dollars of infra buildout and reshape entire geopolitical landscapes and have billions of active users. If you want to say: coding agents are not truly a net positive right now, that’s I think a perfectly reasonable opinion to hold (though I disagree personally). If you want to say coding agents are about as vapid as NFTs that to me is a bit less defensible

aspenmartin · 2026-01-09T17:55:32 1767981332

I don’t think for this approach it sounds like, this is related to the large concept model: https://arxiv.org/abs/2412.08821, where the latent space is SONAR, which is very much designed for this purpose. You learn SONAR embeddings so that every sentence with the same semantic meaning gets mapped to the same latent representation. So you can have e.g. a French SONAR encoder and a Finnish SONAR encoder, trained separately with large scale corpi of paired sentences with the same meaning (basically the same thing you would use for learning translation models directly, but for SONAR you don’t need to train a single model per pair of languages). The LCM then works in this language-agnostic SONAR space which means it does (in principle) learn concepts from texts or speech in all supported languages

aspenmartin · 2026-01-07T22:34:11 1767825251

I do a lot of human evaluations. Lots of Bayesian / statistical models that can infer rater quality without ground truth labels. The other thing about preference data you have to worry about (which this article gets at) is: preferences of _who_? Human raters are a significantly biased population of people, different ages, genders, religions, cultures, etc all inform preferences. Lots of work being done to leverage and model this.

Then for LMArena there is the host of other biases / construct validity: people are easily fooled, even PhD experts; in many cases it’s easier for a model to learn how to persuade than actually learn the right answers.

But a lot of dismissive comments as if frontier labs don’t know this, they have some of the best talent in the world. They aren’t perfect but they in a large sene know what they’re doing and what the tradeoffs of various approaches are.

Human annotations are an absolute nightmare for quality which is why coding agents are so nice: they’re verifiable and so you can train them in a way closer to e.g. alphago without the ceiling of human performance

fc417fc802 · 2026-01-07T22:48:00 1767826080

> in many cases it’s easier for a model to learn how to persuade than actually learn the right answers

So we should expect the models to eventually tend toward the same behaviors that politicians exhibit?

c0balt · 2026-01-07T23:42:39 1767829359

Maybe a happy to deceive marketing/sales role would be more accurate.

RA_Fisher · 2026-01-07T23:41:07 1767829267

100% (am a Bayesian statistician).

Isn’t it fascinating how it comes down to quality of judgement (and the descriptions thereof)?

We need an LMArena rated by experts.

Lerc · 2026-01-08T03:14:21 1767842061

As a statistician, do you you think you could, given access to the data, identify the subset of LMArena users that are experts?

RA_Fisher · 2026-01-08T12:09:52 1767874192

Yes, for sure! I can think of a few ways.

zqy123007 · 2026-01-08T01:14:40 1767834880

they always know, they just have non-AGI incentive and asymetric upside to play along...

aspenmartin · 2026-01-07T22:28:44 1767824924

Wait you know that frontier labs do actually do this right?

aspenmartin · 2026-01-07T19:12:16 1767813136

Youre mixing up several concepts. Synthetic data works for coding because coding is a verifiable domain. You train via reinforcement learning to reward code generation behavior that passes detailed specs and meets other deseridata. It’s literally how things are done today and how progress gets made.

zwnow · 2026-01-07T20:53:07 1767819187

Most code out there is a legacy security nightmare, surely its good to train on that.

dang · 2026-01-07T21:19:50 1767820790

Would you please stop posting cynical, dismissive comments? From a brief scroll through https://news.ycombinator.com/comments?id=zwnow, it seems like your account has been doing nothing else, regardless of the topic that it's commenting on. This is not what HN is for, and destroys what it is for.

If you keep this up, we're going to have to ban you, not because of your views on any particular topic but because you're going entirely against the intended spirit of the site by posting this way. There's plenty of room to express your views substantively and thoughtfully, but we don't want cynical flamebait and denunciation. HN needs a good deal less of this.

If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.

zwnow · 2026-01-07T22:46:23 1767825983

Then ban me u loser, as I wrote HN is full of pretentious bullshitters. But its good that u wanna ban authentic views. Way to go. If i feel like it I'll just create a new account:-)

aspenmartin · 2026-01-08T03:23:03 1767842583

dang is a saint, he wants your opinion, not the other toxic stuff.

dang · 2026-01-11T18:29:03 1768156143

I appreciate the kind words but let's not go that far

aspenmartin · 2026-01-07T21:47:36 1767822456

But that doesn't really matter and it shows how confused people really are about how a coding agent like Claude or OSS models are actually created -- the system can learn on its own without simply mimicking existing codebases even though scraped/licensed/commissioned code traces are part of the training cycle.

Training looks like:

- Pretraining (all data, non-code, etc, include everything including garbage)

- Specialized pre-training (high quality curated codebases, long context -- synthetic etc)

- Supervised Fine Tuning (SFT) -- these are things like curated prompt + patch pairs, curated Q/A (like stack overflow, people are often cynical that this is done unethically but all of the major players are in fact very risk adverse and will simply license and ensure they have legal rights),

- Then more SFT for tool use -- actual curated agentic and human traces that are verified to be correct or at least produce the correct output.

- Then synthetic generation / improvement loops -- where you generate a bunch of data and filter the generations that pass unit tests and other spec requirements, followed by RL using verifiable rewards + possibly preference data to shape the vibes

- Then additional steps for e.g. safety, etc

So synthetic data is not a problem and is actually what explains the success coding models are having and why people are so focused on them and why "we're running out of data" is just a misunderstanding of how things work. It's why you don't see the same amount of focus on other areas (e.g. creative writing, art etc) that don't have verifiable rewards.

The

Agent --> Synthetic data --> filtering --> new agent --> better synthetic data --> filtering --> even better agent

flywheel is what you're seeing today so we definitely don't have any reason to suspect there is some sort of limit to this because there is in principle infinite data

aspenmartin · 2026-01-04T13:26:45 1767533205

That is a pretty good setup and delivery I must say

I think really things will just start shifting as things really do start to get better and step changes of capabilities where “I’m not even going to try to get opus to do X because I know it’s going to suck” moves to “oh wow it’s actually helpful” to “I don’t really even need to be that involved”.

Places where engineering labor is the bottleneck to better output will be where talent migrates towards and places where output is capped regardless of engineering labor are going to be where talent migrates from. I don’t really see this apocalyptic view really accurate at all, I think it’s really going to be cost / output will reduce. It’ll make new markets pop up where it wouldn’t be really possible to justify engineering expense today.

aspenmartin · 2026-01-04T13:22:47 1767532967

Your experience mirrors mine as well. I will say since I’ve got both data science and engineering workflows, data science is where I’ve been accelerated the most because opus can do ad hoc scripts and e.g. streamlit websites and data munging very well so all of the things that take time before decisioning are much faster.

For engineering, its extremely easy to not only acquire tech debt but to basically run yourself into the ground when opus cannot do a task but doesn’t know it cannot do a task.

BUT: what I will say is to me and to others the trajectory has been really impressive over the last 1.5 years, so I don’t view the optimism of “we’re nearly there!” As being kind of wishful or magical thinking. I can definitely believe by the end of 2026 we’ll have agents that break through the walls that it still can’t climb over as of today just based on the rate of improvements we’ve seen so far and the knowledge that we still have a lot of headroom just with current stack.

aspenmartin · 2026-01-03T02:07:46 1767406066

I agree but I think to be fair it seems that there’s an open question of just how much more we can get from scaling / tricks. I would assume that there’s agreement that e.g. continual learning just won’t be solved without a radical departure from the current stack. But even with all of the baggage we have right now, if you believe extrapolations we have ~2 GPT4->5 sized leaps before everyone has to get out of the pool