LLM Year in Review

socketcluster · 2025-12-20T05:06:46 1766207206

For me, Claude Code was the most impressive innovation this year. Cursor was a good proof of concept but Claude Code is the tool that actually got me to use LLMs for coding.

The kind of code that Claude produces looks almost exactly like the code I would write myself. It's like it's reading my mind. This is a game changer because I can maintain the code that Claude produces.

With Claude Code, there are no surprises. I can pretty much guess what its code will look like 90% to 95% of the time but it writes it a lot faster than I could. This is an amazing innovation.

Gemini is quite impressive as well. Nano banana in particular is very useful for graphic design.

I haven't tried Gemini with coding yet but TBH, Claude Code does such a great job; if I could code any faster, I would get decision fatigue. I don't like rushing into architecture or UX decisions. I like to sit on certain decisions for a day or two before starting implementation. Once you start in a particular direction, it's hard to undo and you may try to double down on the mistake due to sunk cost fallacy. I try hard to avoid that.

Daniel_sk · 2025-12-20T14:03:27 1766239407

I don't even see much reason to use Cursor. I am used to IntelliJ IDEA, so I just downloaded the Claude Code plugin and basically now I use the IDE only for navigating in the code, finding references and reviewing the code. I can't even remember the last time I wrote more than 2 lines of code. Claude Code has catapulted my performance at least 5x if not more. And now that the cost of writing test is so minimal I am also able to achieve much better (and meaningful!) test coverage too. The AI agents is where the most productivity is. I just create a plan with Claude, iterate over, ask questions, then let it implement the plan, review, ask to do some adjustments. No manual writing of code at all. Zero.

spaceman_2020 · 2025-12-20T14:07:07 1766239627

Nano Banana Pro is legitimately an insane tool if you know how to use it. I still can’t believe they released it in the wild

andai · 2025-12-20T09:50:10 1766224210

I first got into agentic properly with GLM coding plan (it's like $2/month), but I found myself very consistently asking Claude to make the code more elegant and readable. At which point I realized I was being silly and just switched to Claude code.

(GLM etc. get surprisingly close with good prompting but... $0.60/day to not worry about that is a no brainer.)

tarsinge · 2025-12-20T07:36:50 1766216210

I don’t have much time to evaluate tools every months and I have settled on Cursor. I’m curious on what I’m missing when using the same models?

ramoz · 2025-12-20T13:20:12 1766236812

You are missing an entire agentic experience. And I wouldn't call it vibe coding for an engineer; you're more or less empowered to truly orchestrate the development of your system.

Cursor has agent, but that's like whoever else tried to copy the Model T while Ford was developing it.

andai · 2025-12-20T09:54:17 1766224457

I have only compared Claude Code with Crush and a tool of my own design. In my experience, Claude code is optimized for giant codebases and long tasks. It loves launching dozens of agents in parallel. So it's a bit heavy for smaller, surgical stuff, though it works decent for that too.

If you mostly have small codebases that fit in context, or make many small changes interactively, it's not really great for that (though it can handle it too). It'll just be spending most of its time poking around the codebase, when the whole thing should have just been loaded... (Too bad there's no small repo mode. I made startup hook that just dumps cat dir into context, but yeah, should be a toggle.)

afro88 · 2025-12-20T10:22:24 1766226144

You're not missing much. You can generally use Cursor like Claude Code for normal day to day use. I prefer Cursor because I like reviewing changes in an IDE, and I like being able to switch to the current SOTA model.

Though for more automated work, one thing you miss with Cursor is sub agents. And then to a lesser extent skills (these are pretty easy to emulate in other tools). I'm sure it's only a matter of time though.

Ozzie_osman · 2025-12-20T13:12:07 1766236327

Claude Code's VS Code integration is very easy to set up and pretty helpful if you want to see/review changes in an IDE.

ollysb · 2025-12-20T13:29:59 1766237399

The big limitation is that you have to approve/disapprove at every step. With Cursor you can iterate on changes and it updates the diffs until you approve the whole batch.

wahnfrieden · 2025-12-20T08:04:22 1766217862

If you switch to Codex you will get a lot of tokens for $200, enough to more consistently use high reasoning as well. Cursor is simply far more expensive so you end up using less or using dumber models.

Claude Code is overrated as it uses many of its features and modalities to compensate for model shortcomings that are not as necessary for steering state of the art models like GPT 5.2

MrOrelliOReilly · 2025-12-20T09:46:15 1766223975

I think this is a total misunderstanding of Anthropic’s place in the AI race. Opus 4.5 is absolutely a state of the art model. I won’t knock anyone for preferring Codex, but I think you’re ignoring official and unofficial benchmarks.

See: https://artificialanalysis.ai

woadwarrior01 · 2025-12-20T11:20:29 1766229629

> Opus 4.5 is absolutely a state of the art model.

> See: https://artificialanalysis.ai

The field moves fast. Per artificialanalysis, Opus 4.5 is currently behind GPT-5.2 (x-high) and Gemini 3 Pro. Even Google's cheaper Gemini 3 Flash model seems to be slightly ahead of Opus 4.5.

MrOrelliOReilly · 2025-12-20T14:26:58 1766240818

Totally, however OP's point was that Claude had to compensate for deficiencies versus a state of the art model like ChatGPT 5.2. I don't think that's correct. Whether or not Opus 4.5 is actually #1 on these benchmarks, it is clearly very competitive with the other top-tier models. I didn't take "state of the art" to here narrowly mean #1 on a given benchmark, but rather to mean near or at the frontier of current capabilities.

dr_dshiv · 2025-12-20T13:36:26 1766237786

https://lmarena.ai/leaderboard/webdev

LM Arena shows Claude Opus 4.5 on top

HarHarVeryFunny · 2025-12-20T13:57:58 1766239078

I wonder how model competence and/or user preference on web development (that leaderboard) carries over to more complex and larger projects, or more generally anything other than web development ?

In addition to whatever they are exposed to as part of pre-training, it'd be interesting to know what kind of coding tasks these models are being RL-trained for? Are things like web development and maybe Python/ML coding overemphasized, or are they also being trained on things like Linux/Windows/embedded development etc in different languages?

ramoz · 2025-12-20T14:06:28 1766239588

https://x.com/giansegato/status/2002203155262812529/photo/1

https://x.com/METR_Evals/status/2002203627377574113

> Even Google's cheaper Gemini 3 Flash model seems to be slightly ahead of Opus 4.5.

What an insane take for anybody uses these models daily.

MrOrelliOReilly · 2025-12-20T14:18:59 1766240339

Yes, I personally feel that the "official" benchmarks are increasingly diverging from the everyday reality of using these models. My theory is that we are reaching a point where all the models are intelligent enough for day-to-day queries, so points like style/personality and proper use of web queries and other capabilities are better differentiators than intelligence alone.

ccmcarey · 2025-12-20T08:34:06 1766219646

I disagree, the claude models seem the best at tool calling, opus 4.5 seems the smartest, and claude code (+ claude model) seems to make good use of subagents and planning in a way that codex doesn't

augment_me · 2025-12-20T07:51:53 1766217113

I noticed that despite really liking Karpathy and the blog, I was am kind of wincing/involuntarily reacting to the LLM-like "It's not X, its Y"-phrases:

> it's not just a website you go to like Google, it's a little spirit/ghost that "lives" on your computer

> it's not just about the image generation itself, it's about the joint capability coming from text generation

There would be no reaction from me on this 3 years ago, but now this sentence structure is ruined for me

spaceman_2020 · 2025-12-20T14:09:02 1766239742

I used to use a lot of em dashes normally in my writing - they were my go-to replacements for commas and semicolons

But I had to change how I write because people started calling my writing “AI generated”

matsemann · 2025-12-20T13:46:37 1766238397

Yeah, came to read Karpathy's thoughts, but might as well ask an LLM myself..

d-lisp · 2025-12-20T09:17:01 1766222221

I hated these sentences way before LLMs, at least in the context of an explanation.

> it's not just a website you go like Google, it's a little spirit/ghost that "lives" on your computer

This type of sentence, I call rhetorical fat. Get rid of this fat and you obtain a boring sentence that repeats what has been said in the previous one.

Not all rhetorical fats are equal, and I must admit I find myself eyerolling on the "little spirit" part more than about the fatness.

I understand the author wants to decorate things and emphasize key elements, and the hate I feel is only caused by the incompatible projection of my ideals to a text that doesn't belong to me.

> it's not just about the image generation itself, it's about the joint capability coming from text generation.

That's unjustified conceptual stress.

That could be a legitimate answer to a question ("No, no, it's not just about that, it's more about this"), but it's a text. Maybe the text wants you to be focused, maybe the text wants to hype you; this is the shape of the hype without the hype.

"I find image generation is cooler when paired with text generation."

amelius · 2025-12-20T13:54:55 1766238895

Karpathy should go back to what he does best: educating people about AI on a deep level. Running experiments and sharing how they work, that sort of stuff. It seems lately he is closer to an influencer who reviews AI-based products. Hopefully it is not too late to go back.

killerstorm · 2025-12-20T13:13:52 1766236432

It is not a decoration. Karpathy juxtaposes ChatGPT (which feels like a "better google" to most people) to Claude Code, which, apparently, feels different to him. It's a comparison between the two.

You might find this statement non-informative, but without two parts there's no comparison. That's really the semantics of the statement which Karpathy is trying to express.

ChatGPT-ish "it's not just" is annoying because the first part is usually a strawman, something reader considers trite. But it's not the case here.

d-lisp · 2025-12-20T13:36:03 1766237763

Indeed, I was probably grumpy at the time I wrote the comment. I do find some truth in it still.

You're right ! The strawman theory is based.

But I think there's more to it, I find dislikable the structure of these sentences (which I find a bit sensationnalist for nothing, I don't know, maybe I am still grumpy).

yard2010 · 2025-12-20T11:41:02 1766230862

I cannot unsee this anymore and it ruins the whole internet experience for me

another_twist · 2025-12-20T09:40:57 1766223657

Same here, had to configure ChatGPT to stop making these statements. Also had to configure bunch of other stuff to make it bland when answering questions.

andai · 2025-12-20T10:01:24 1766224884

The way to make AI not sound like ChatGPT is to use Claude.

I realized that's what bothered me. It's not "oh my god, they used ChatGPT." But "oh my god, they couldn't even be bothered to use Claude."

It'll still sound like AI, but 90% of the cringe is gone.

If you're going to use AI for writing, it's just basic decency to use the one that isn't going to make your audience fly into a fit of rage every ten seconds.

That being said, I feel very self conscious using emdashes in current decade ;)

dr_dshiv · 2025-12-20T13:39:53 1766237993

I love em dashes—they basically indicate a more deliberate pause than a … without the tight vibes of a semicolon.

ionwake · 2025-12-20T10:20:28 1766226028

I dont think Ive ever noticed someone use an emdash until chatgpt appeared

andai · 2025-12-20T13:58:36 1766239116

https://xkcd.com/3126/

I mostly use them in Telegram because it auto converts -- into emdash. They are a pain to type everywhere else though!

huevosabio · 2025-12-20T08:10:24 1766218224

Same, I cringe when I read this structure.

nathias · 2025-12-20T09:43:00 1766223780

It's not text - it's clickbait distillied to grammar.

thoughtpeddler · 2025-12-19T22:56:31 1766184991

I appreciate Andrej’s optimistic spirit, and I am grateful that he dedicates so much of his time to educating the wider public about AI/LLMs. That said, it would be great to hear his perspective on how 2025 changed the concentration of power in the industry, what’s happening with open-source, local inference, hardware constraints, etc. For example, he characterizes Claude Code as “running on your computer”, but no, it’s just the TUI that runs locally, with inference in the cloud. The reader is left to wonder how that might evolve in 2026 and beyond.

karpathy · 2025-12-19T23:45:15 1766187915

The CC point is more about the data and environmental and general configuration context, not compute and where it happens to run today. The cloud setups are clunky because of context and UIUX user in the loop considerations, not because of compute considerations.

CamperBob2 · 2025-12-20T03:25:14 1766201114

Agree with the GP, though -- you ought to make that clearer. It really reads like you're saying that CC runs locally, which is confusing since you obviously know better.

ramoz · 2025-12-20T13:23:56 1766237036

I think we need to shift our mindset on what an agent is. The LLM is a brain in a vat connected far away. The agent sits on your device, as a mech suit for that brain, and can pretty much do damn near anything on that machine. It's there, with you. The same way any desktop software is.

karpathy · 2025-12-20T05:50:49 1766209849

Yeah, I made some edits to clarify.

simonw · 2025-12-20T01:45:52 1766195152

One of the most interesting coding agents to run locally is actually OpenAI Codex, since it has the ability to run against their gpt-oss models hosted by Ollama.

  codex --oss -m gpt-oss:20b

Or 120b if you can fit the larger model.

AlexCoventry · 2025-12-20T01:59:21 1766195961

What do you find interesting about it, and how does it compare to commercial offerings?

simonw · 2025-12-20T03:49:58 1766202598

It's rare to find a local model that's capable of running tools in a loop well enough to power a coding agent.

I don't think gpt-oss:20b is strong enough to be honest, but 120b can do an OK job.

Nowhere NEAR as good as the big hosted models though.

ontouchstart · 2025-12-20T05:03:43 1766207023

Think of it as the early years of UNIX & PC. Running inferences and tools locally and offline opens doors to new industries. We might not even need client/server paradigm locally. LLM is just a probabilistic library we can call.

AlexCoventry · 2025-12-20T05:17:36 1766207856

Thanks.

magicalhippo · 2025-12-19T23:41:36 1766187696

From what I can gather, llama.cpp supports Anthropic's message format now[1], so you can use it with Claude Code[2].

[1]: https://github.com/ggml-org/llama.cpp/pull/17570

[2]: https://news.ycombinator.com/item?id=44654145

ramoz · 2025-12-20T03:34:01 1766201641

What he meant was, agents will probably not be these web abstractions that run in deployed services (langchain, crew); agents meaning the Harnesses (software wrapper) specifically that call the LLM API.

It runs on your computer because of its tooling. It can call Bash. It can literally do anything on the operating system and file system. That's what makes it different. You should think of it like a mech suit. The model is just the brain in a vat connected far away.

D-Machine · 2025-12-19T23:37:19 1766187439

The section on Claude Code is very ambiguously and confusingly written, I think he meant that the agent runs on your computer (not inference) and that this is in contrast to agents running "on a website" or in the cloud:

> I think OpenAI got this wrong because I think they focused their codex / agent efforts on cloud deployments in containers orchestrated from ChatGPT instead of localhost. [...] CC got this order of precedence correct and packaged it into a beautiful, minimal, compelling CLI form factor that changed what AI looks like - it's not just a website you go to like Google, it's a little spirit/ghost that "lives" on your computer. This is a new, distinct paradigm of interaction with an AI.

However, if so, this is definitely a distinction that needs to be made far more clearly.

realcul · 2025-12-20T03:14:52 1766200492

Well Microsoft had thier "localhost" AI before CC but that was a ghost without a clear purpose or skill.

starchild3001 · 2025-12-20T01:01:07 1766192467

The distinction Karpathy draws between "growing animals" and "summoning ghosts" via RLVR is the mental model I didn't know I needed to explain the current state of jagged intelligence. It perfectly articulates why trust in benchmarks is collapsing; we aren't creating generally adaptive survivors, but rather over-optimizing specific pockets of the embedding space against verifiable rewards.

I’m also sold on his take on "vibe coding" leading to ephemeral software; the idea of spinning up a custom, one-off tokenizer or app just to debug a single issue, and then deleting it, feels like a real shift.

HarHarVeryFunny · 2025-12-20T14:40:36 1766241636

> The distinction Karpathy draws between "growing animals" and "summoning ghosts" via RLVR

I don't see these descriptions as very insightful.

The difference between general/animal intelligence and jagged/LLM intelligence is simply that humans/animals really ARE intelligent (the word was created to describe this human capability), while LLMs are just echoing narrow portions of the intelligent output of humans (those portions that are amenable to RLVR capture).

For an artificial intelligence to be intelligent in it's own right, and therefore be generally intelligent, it would need to need - like an animal - to be embodied (even if only virtually), autonomous, predicting the outcomes of it's own actions (not auto-regressively trained), learning incrementally and continually, built with innate traits like curiosity and boredom to put and keep itself in learning situations, etc.

Of course not all animals are generally intelligent - many (insects, fish, reptiles, many birds) just have narrow "hard coded" instinctual behaviors, but others like humans are generalists who evolution have therefore honed for adaptive lifetime learning and general intelligence.

graemefawcett · 2025-12-20T03:47:46 1766202466

I've been doing it for months, it's lovely

https://tech.lgbt/@graeme/115749759729642908

It's a stack based on finishing the job Jupyter started. Fences as functions, callable and composable.

Same shape as an MCP. No training required, just walk them through the patterns.

Literally, it's spatially organized. Turns out a woman named Mrs Curwen and I share some thoughts on pedagogy.

There does in fact exist a functor that maps 18th century piano instruction to context engineering. We play with it

jkubicek · 2025-12-20T01:44:48 1766195088

> In the same way, LLMs should speak to us in our favored format - in images, infographics, slides, whiteboards, animations/videos, web apps, etc.

You think every Electron app out there re-inventing application UX from scratch is bad, wait until LLMs are generating their own custom UX for every single action for every user for every device. What does command-W do in this app? It's literally impossible to predict, try it and see!

johnfn · 2025-12-20T07:52:22 1766217142

On the other side of the spectrum, I see some of the latest agents, like Codex, take care to get accessibility right -- something not even many humans bother to do.

becquerel · 2025-12-20T08:58:36 1766221116

It's an extension of how I've noticed that AIs will generally write very buttoned-down, cross-the-ts-and-dot-the-is code. Everything gets commented, every method has a try-catch with a log statement, every return type is checked, etc. I think it's a consequence of them not feeling fatigue. These things (accessibility included) are all things humans generally know they 'should' do, but there never seems to be enough time in the day; we'll get to it later when we're less tired. But the ghost in the machine doesn't care. It operates at the same level all the time

tim333 · 2025-12-20T12:39:02 1766234342

>our favored format - in images, infographics, slides, whiteboards, animations/videos, web apps, etc

If you look at how humans actually communicate I'd guess #1 is text/speech, #2 pictures

Aiisnotabubble · 2025-12-20T11:45:51 1766231151

But that's exactly what an LLM solved.

It's the best ui ever.

It understands a lot of languages and abstract concepts.

It will not be necessary at all to let LLM generate random uis.

I'm not a native English speaker. I sometimes just throw in a German word and it just works.

mips_avatar · 2025-12-20T01:39:29 1766194769

I would love Andrej's take on the fast models we got this year. Gemini 3 flash and Grok 4 fast have no business being as good + cheap + fast as they are. For Andrej's prediction about LLMs communicating with us via a visual interface we're going to need fast models, but I feel like AI twitter/HN has mostly ignored these.

HarHarVeryFunny · 2025-12-20T14:09:05 1766239745

Just guessing here, but these small models may well be essentially distillations of larger ones, with this being where their power comes from. e.g. Use a large model to generate synthetic reasoning traces, then train a small model on those.

gnerd00 · 2025-12-20T04:10:07 1766203807

check out Sasha Luccioni

mips_avatar · 2025-12-20T04:35:52 1766205352

Do you have a link to anything they wrote about this?

andai · 2025-12-20T09:47:25 1766224045

The bit about o3 being the turning point is very interesting. I heard someone say that o3 (or perhaps the cheaper o4-mini) should have been called gpt-5, and that people would have been mind blown. Instead it kind of went under the radar as far as the mainstream goes.

Whereas we just got the incremental progress with gpt-5 instead and it was very underwhelming. (Plus like 5 other issues at launch, but that's a separate story ;)

I'm not sure if o4-mini would have made a good default gpt though. (Most use is conversational and its language is very awkward.) So they could have just called it gpt-5 pro or something, and put it on the $20 tier. I don't know.

victorbuilds · 2025-12-19T21:28:54 1766179734

Notable omission: 2025 is also when the ghosts started haunting the training data. Half of X replies are now LLMs responding to LLMs. The call is coming from inside the dataset.

vlod · 2025-12-20T00:48:21 1766191701

Any tips to spot this? I want to avoid arguing with a X bot.

shtack · 2025-12-20T02:00:22 1766196022

Really easy: don't argue on the internet. The approach has many benefits.

jckahn · 2025-12-20T04:16:40 1766204200

Also, don't use X.

bdangubic · 2025-12-20T04:22:01 1766204521

also, please just do not use X

dr_dshiv · 2025-12-20T13:48:58 1766238538

Ok, fine, but do you have a better way to build a bot following and expose oneself to trending MAGA memes?

delichon · 2025-12-20T00:00:29 1766188829

> I like this version of the meme for pointing out that human intelligence is also jagged in its own different way.

The idea of jaggedicity seems useful to advancing epistemology. If we could identify the domains that have useful data that we fail to extract, we could fill those holes and eventually become a general intelligence ourselves. The task may be as hard as making a list of your blind spots. But now we have an alien intelligence with an outside perspective. While making AI less jagged it might return the favor.

If we keep inventing different kinds of intelligence the sum of the splats may eventually become well rounded.

visarga · 2025-12-20T06:52:13 1766213533

I don't think it will become well rounded because that is not cost sensitive. Intelligence is sensitive to cost, it is the core constraint shaping it. Any action has a cost - energy, materials, time, opportunity or social. Intelligence is solving the cost equation, if we can't solve it we die. Cost is also why we specialize, in a group we can offload some intelligence to others. LLMs also have their own costs, and are shaped by it into some kind of jagged intelligence, they are no spherical cows either.

mvkel · 2025-12-20T00:47:56 1766191676

> In this world view, nano banana is a first early hint of what that might look like.

What is he referring to here? Is nano banana not just an image gen model? Is it because it's an LLM-based one, and not diffusion?

simonw · 2025-12-20T01:48:18 1766195298

What's interesting about Nano Banana (and even more so video models like Veo 3) is that they act as a weird kind of world model when you consider that they accept images as input and return images as output.

Give it an image of a maze, it can output that same image with the maze completed (maybe).

There's a fantastic article about that for image-to-video models here: https://video-zero-shot.github.io/

> We demonstrate that Veo 3 can zero-shot solve a broad variety of tasks it wasn't explicitly trained for: segmenting objects, detecting edges, editing images, understanding physical properties, recognizing object affordances, simulating tool use, and much more.

dragonwriter · 2025-12-20T01:01:42 1766192502

I think he is referring to capability, not architecture, and say that NB is at the point that it is suggestive of the near-future capability of using GenAI models to create their own UI as needed.

NB (Gemini 2.5 Flash Image) isn't the first major-vendor LLM-based image gen model, after all; GPT Image 1 was first.

lysecret · 2025-12-20T11:26:42 1766230002

It’s funny how every podcaster/public ai figure is so certain text as a Ui will go away and it’s not going anywhere.

tim333 · 2025-12-20T12:43:42 1766234622

It's probably increased during my lifetime. People used to talk, now they sit and text into smartphones.

TheAceOfHearts · 2025-12-19T21:24:15 1766179455

I think one of the things that is missing from this post is engaging a bit in trying to answer: what are the highest priority AI-related problems that the industry should seek to tackle?

Karpathy hints at one major capability unlock being UI generation, so instead of interacting with text the AI can present different interfaces depending on the kind of problem. That seems like a severely underexplored problem domain so far. Who are the key figures innovating in this space so far?

In the most recent Demis interview, he suggests that one of the key problems that must be solved is online / continuous learning.

Aside from that, another major issues is probably reducing hallucinations and increasing reliability. Ideally you should be able to deploy an LLM to work on a problem domain, and if it encounters an unexpected scenario it reaches out to you in order to figure out what to do. But for standard problems it should function reliably 100% of the time.

lukax · 2025-12-20T13:20:58 1766236858

Google is doing that with A2UI. LLM will be able to decide how to present info to the user.

andai · 2025-12-20T09:59:53 1766224793

Here's the source for the jagged spiky intelligence diagram:

https://x.com/colin_fraser/status/1994235521812328695

https://karpathy.bearblog.dev/the-space-of-minds/

nkko · 2025-12-20T08:01:33 1766217693

Beyond graduating students, I see model labs as “accelerators/incubators” bundling, launching, and productizing observed ideas that gain traction. The sheer strength of their platforms, the number of eyes watching them, near-zero marginal costs, and seemingly unlimited budgets mean that only slow decision-making can prevent them from becoming the next Amazons of everything.

dandelionv1bes · 2025-12-20T09:36:04 1766223364

Something I’ve been thinking about is how as end stage users (eg building our own “thing” on top of an LLM) we can broadly verify it’s doing what we need without benchmarks. Does a set of custom evals built out over time solve this? Is there more we can do?

swyx · 2025-12-19T20:49:29 1766177369

xposted to https://x.com/karpathy/status/2002118205729562949

CamperBob2 · 2025-12-20T03:29:15 1766201355

And also accessible sans login via https://xcancel.com/karpathy/status/2002118205729562949 .

distalx · 2025-12-20T11:14:02 1766229242

Friendly reminder: There is no ghost in the machine. It is a system executing code, not a being having thoughts. Let’s admire the tool without projecting a personality onto it.

ngruhn · 2025-12-20T14:08:21 1766239701

Consciousness is weird and nobody understands it. There is no good reason to assume that these systems have it. But there is also no good reason to rule it out.

dr_dshiv · 2025-12-20T14:25:28 1766240728

That’s the old way of thinking about it. there is a new way.

squidbeak · 2025-12-20T14:18:23 1766240303

You sound as if you have grounds for certainty about this. What are they?

alexgotoi · 2025-12-20T08:50:31 1766220631

LLMs still need to bring clear added value to enterprise and corporate work; otherwise, they remain a geek’s toy.

Big media agencies that claim to use AI rely on strong creative teams who fine-tune prompts and spend weeks doing so. Even then, they don’t fully trust AI to slice long videos into shorter clips for social media.

Heavy administrative functions like HR or Finance still don’t get approval to expose any of their data to LLMs.

What I’m trying to say is that we are still in the early stages of LLM development, and as promising as this looks, it’s still far from delivering the real value that is often claimed.

bgwalter · 2025-12-19T23:40:51 1766187651

Vibe coding is sufficient for job hoppers who never finish anything and leave when the last 20% have to be figured out. Much easier to promote oneself as an expert and leave the hard parts to other people.

zingar · 2025-12-20T00:06:54 1766189214

I’ve found incredible productivity gains writing (vibe coding) tools for myself that will never need to be “productionised” or even used by another person. Heck even I will probably never use the latest log retrieval tool, which exists purely for Claude code to invoke it. There is a ton of useful software yet to be written for which there _is_ no “last 20%”.

diamond559 · 2025-12-20T08:38:16 1766219896

These tools are so useful and make you so much more "productive" that you don't think anyone else would want to pay anything for them huh? Did your boss at least give you a big raise for your "productivity" increase, or maybe lay off some of your underperforming coworkers bc you are just so much better now?

augment_me · 2025-12-20T07:49:02 1766216942

All software is not meant to be open-source, in production and working on 100 platforms.

Sometimes the point of the software is to make an app with 2 buttons for your mom to help her do her grocery shopping easier

simonw · 2025-12-20T01:49:20 1766195360

Do you mean vibe coding as-in producing unreviewed code with LLMs and prompting at it until it appears to work, or vibe coding as a catch-all for any time someone uses AI-assistance to help them write code?

ausbah · 2025-12-20T02:13:21 1766196801

tl;dr seems like llms are maturing on the product side and for day-day usage