For a "copy Deepseek's homework" model, it's really good, preferable to DeepSeek for me (at least prior to V3.2, which I haven't been able to fully put through its paces yet). post-training really makes that much of a difference I guess
This is a very good analysis. The responses to this from Timnit Gebru, Emile Torres, and even Hao herself have been very annoying to me (calling several multiple-order-of-magnitude mistakes that completely reverse the ideologically convenient picture she painted "a typo", whataboutism because the guy who pointed it out is an EA guy, and waving away most of the criticisms as "philosophical differences" while trying to shift blame for the one issue she does admit, respectively).
Yeah. It's unfortunate that they seem so intent on digging in their heels about this one issue of water use. Precisely because AI is so important, we should want to make sure we're clear on the facts and have the right context for them! And the larger point that AI is likely to consolidate power in the hands of increasingly few people and corporations is well worth making (although that story can be told for most critical technologies, from the steam engine to the telegraph to the transistor, and I don't think Hao's framing of Empire/Colonialism is really the right way to look at it at all). I think there are plenty of books to be written about the social impact of AI from a more balanced, empirical, less ideological perspective.
Yeah, it is really unfortunate because, personally, I think there are a lot of very genuine, very good ethical, social, economic, and material arguments to be made against AI, and digging in their heels on this one transparently ideologically motivated and wrong criticism just detracts from their credibility, distracts from their messaging, and drains their time. And to be clear, I actually do think it would be good to have more books written about this, but Empire of AI is a pretty good book overall. The stuff it covers about the reinforcement learning workers and the weird, bizarre, fucked-up culture of OpenAI is pretty good.
On the other hand, I think Emile P Torres and Timnit Gebru are ideologues that really shouldn't be listened to.
This is just the same flippant, dismissive stuff as usual. At this point, it's its own brand of anti-AI slop. Just because LLMs are not deterministic does not mean that you can't effectively iterate on and modify the code they generate, or that they won't give you something useful almost every time you use them. Also, this article talks about possible use cases of LLMs for learning, and then dismisses them as completely replaceable with a book, a tutorial, or a mentor. Never mind that books and tutorials aren't individually tailored to the person, what they want to work on, and what they're interested in; that they can't be infinitely synthesized to take you as far as you want to go; that they're often limited for certain technologies; and that mentors are often very difficult to come by.
I get your part about mentors. I came up through having to figure stuff out myself a lot via Stack Overflow and friends, where the biggest problem for me is usually how to ask the right question (e.g. with Elasticsearch, having to find and understand "index" vs "store" - once I have those two terms, searching is a lot easier, and without them, it's a bit of a crapshoot). Mentors help here because they had to travel that road too and can probably translate from my description to the correct terms.
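For anyone who hasn't hit that particular wall, the distinction I mean is roughly this (a hedged sketch; the field names are invented for illustration, but `index` and `store` are the actual mapping options):

```typescript
// Sketch of an Elasticsearch mapping showing "index" vs "store".
// - index: whether the field is analyzed and searchable
// - store: whether the raw value is kept as a separate stored field,
//   retrievable on its own instead of via _source
// Field names here are made up; only the options are the real ones.
const articleMapping = {
  properties: {
    title:   { type: 'text', index: true,  store: false }, // searchable, read back from _source
    rawHtml: { type: 'text', index: false, store: true  }, // not searchable, but individually retrievable
  },
};
```

Until you know those two words exist, you can't even phrase the search query that would teach you the difference.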
And I really wish I could trust an llm for that, or, indeed, any task. But I generally find answers fall into one of these useless buckets:
1. Reword the question as an answer (so common, so useless)
2. Trivial solutions that are correct - meaning one or two lines that are valid, but that I could have easily written myself quicker than getting an agent involved, and without the other drawbacks on this list
3. Wildly incorrect "solutions". I'm talking about code that doesn't even build because the llm can't take proper direction on which version of the library to refer to, so it keeps giving results based on old information that is no longer relevant. Try resolving a webpack 5 issue - you'll get a lot of webpack 4 answers and none of them will work, even if you specify webpack 5 (a concrete example of the kind of 4-vs-5 difference is sketched just after this list)
4. The absolute worst: subtly incorrect solutions that seem correct and are confidently presented as correct. This has been my experience with basically every "oh wow, look what the llm can do" demo. I'm that annoying person who finds the bug mid-demo.
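To make the webpack example in item 3 concrete, here's a hedged sketch of the kind of 4-vs-5 difference involved: webpack 5 dropped the automatic Node.js core-module polyfills, so the webpack 4 advice the model keeps regurgitating (`node: { fs: 'empty' }`, or "it just works") simply doesn't apply anymore. The exact fallback entries below are illustrative, not a drop-in fix:

```typescript
// webpack.config.ts (webpack 5) -- illustrative only.
// In webpack 4, Node core modules were polyfilled automatically; in webpack 5
// each one has to be explicitly stubbed out or mapped to a browser shim.
import type { Configuration } from 'webpack';

const config: Configuration = {
  resolve: {
    fallback: {
      fs: false,      // no browser equivalent; stub it out
      crypto: false,  // or point at a shim, e.g. require.resolve('crypto-browserify')
    },
  },
};

export default config;
```

An answer written against webpack 4 will never mention `resolve.fallback` at all, which is exactly the failure mode I keep hitting.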
The problems are:
1. A person inexperienced in the domain will flounder for ages trying out crap that doesn't work and understanding nothing of it.
2. A person experienced in the domain will spend a reasonable amount of time correcting the llm - and personally, I'd much rather write my own code via TDD-driven emergent design - I'll understand it, and it will be proven to work when it's done.
I see that proponents of the tech often gloss over this and don't realise that they're actually spending more time overall, especially when having to polish out all the bugs. Or maintain the system.
Use whatever you want, but I've got zero confidence in the models, and I prefer to write code instead of gambling. But to each, their own.
The way I see AI coding agents at the moment is that they're interns. You wouldn't give an intern responsibility for the whole project. You need an experienced developer who COULD do the job with some help from interns, but now the AI can be the intern.
There's an old saying: "Fire is a good servant but a bad master." I think the same applies to AI. In "vibe-coding", the AI is too much the master.
But it's the amount and location(?) of the vibes that matters.
Say I want to create a YouTube RSS hydrator that uses DeArrow to de-clickbait all the entries before they hit my RSS reader (there's a rough sketch of the core piece at the end of this comment):
Level 1 (max vibe): I just say that to an LLM, hit "go", and hope for the best (maximum vibes on both spec and code). Most likely gonna be shit. Might work, too.
Level 2 (pair-vibing the spec) is me pair-vibing the spec with an LLM; web versions might work here if they can access sites for the relevant specs (figuring out how to turn a YouTube URL into an RSS feed, and how the DeArrow API works).
After the spec is done, I can give it to an agent and go do something else. In most cases there's an MVP done when I come back, depending on how easy the thing is to test automatically (RSS/Atom is a fickle spec, and readers implement it in various ways).
Level 3 continues the pair-vibed spec with pair-coding. I give the agent tasks in small parts and follow along as it progresses, interrupting if it strays.
For most senior folks with experience writing specs for non-seniors, Level 2 will produce good-enough stuff for personal use. And because you offload the time-consuming bits to an agent, you can do multiple projects in parallel.
Level 3 will definitely bring the best results, but you can only progress one task at a time.
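For concreteness, the core de-clickbait step I have in mind looks roughly like this (a sketch under my assumptions about the DeArrow branding endpoint; I haven't verified the exact response fields, so treat them as placeholders):

```typescript
// Rough sketch: fetch a community-submitted DeArrow title for a video and
// prefer it over the original clickbait one. Assumes the public endpoint at
// sponsor.ajay.app; the response shape below is from memory and may differ.
type DeArrowTitle = { title: string; original: boolean; votes: number; locked: boolean };
type DeArrowBranding = { titles: DeArrowTitle[] };

async function deClickbaitTitle(videoId: string): Promise<string | null> {
  const res = await fetch(
    `https://sponsor.ajay.app/api/branding?videoID=${encodeURIComponent(videoId)}`
  );
  if (!res.ok) return null;
  const branding = (await res.json()) as DeArrowBranding;
  // Prefer locked titles, then the most-voted community title; skip originals.
  const best = branding.titles
    .filter((t) => !t.original)
    .sort((a, b) => Number(b.locked) - Number(a.locked) || b.votes - a.votes)[0];
  return best?.title ?? null;
}

// YouTube already serves a per-channel Atom feed at a stable URL; the hydrator
// just rewrites each entry's <title> before passing the feed on to the reader.
function channelFeedUrl(channelId: string): string {
  return `https://www.youtube.com/feeds/videos.xml?channel_id=${channelId}`;
}
```

That's more or less the whole spec from Level 2; the fiddly part, as noted, is emitting a feed that every reader actually accepts.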
> And I really wish I could trust an llm for that, or, indeed, any task. But I generally find answers fall into one of these useless buckets: 1. Reword the question as an answer (so common, so useless) 2. Trivial solutions that are correct - meaning one or two lines that are valid, but that I could have easily written myself quicker than getting an agent involved, and without the other drawbacks on this list 3. Wildly incorrect "solutions". I'm talking about code that doesn't even build because the llm can't take proper direction on which version of the library to refer to, so it keeps giving results based on old information that is no longer relevant. Try resolving a webpack 5 issue - you'll get a lot of webpack 4 answers and none of them will work, even if you specify webpack 5 4. The absolute worst: subtly incorrect solutions that seem correct and are confidently presented as correct. This has been my experience with basically every "oh wow, look what the llm can do" demo. I'm that annoying person who finds the bug mid-demo.
This is just not my experience with coding agents, which is interesting. You could chalk this up to me being a bad coder, insufficiently picky, being fooled by plausible-looking code, whatever, but I carefully read every diff the agent suggests and force it to keep each diff small enough for that to be easy; I'm usually very good at spotting potential bugs and very picky about code quality; and the ultimate test passes: the generated code works, even when I extensively test it in daily usage. I wonder if maybe it has something to do with the technologies or the specific models/agents you're using? Regarding version issues, that's usually something I solve by pointing the agent at a number of docs for the version I want and having it generate documentation for itself, then @'ing those docs in the prompt moving forward, or using llms.txt if available, and that usually works a charm for teaching it things.
> I see that proponents of the tech often gloss over this and don't realise that they're actually spending more time overall, especially when having to polish out all the bugs. Or maintain the system.
I am a very fast, productive coder by hand. I guarantee you, I am much faster with agentic coding, just in terms of the number of days it takes me to finish a feature or greenfield prototype. And I don't think corrections are a confounding factor, because I very rarely have to correct these models. For some time I used an agent that tracks how often, as a percentage, I accept the tool calls (including edits) the agent suggests. One thing to know about me is that I do not ever accept subpar code. If I don't like an agent's suggestion, I do not accept it and then iterate; I want it to get it right the first time. My acceptance rate was 95%.
This is very well written and a really good holistic vision of a future user-respecting AI. Well done! I just hope local LLMs get good enough for this to work.
Those are effectively made-up numbers, since they were given to him by an anonymous source we have no way of corroborating, we can't even see the documents themselves, and they contradict not just OpenAI's official numbers, but also first-principles analyses of what the economics of inference should be[1], the inference profit reports of other companies, and what a plain analysis of the inference market would suggest[2].
Yeah. I definitely think LLMs are cool and are useful — they don't just appear that way because everything else happened to be enshittified when they came on the scene — but I've definitely been pondering the fact that we're likely in a part of the cycle right before enshittification sets in. Because it is very possible to monetize AI — the badness of the token economics has been greatly exaggerated, and with huge MAU numbers there's a lot of opportunity — but most of those MAUs won't pay a monthly subscription with caps and rate limits like prosumers will, so the obvious way to monetize is with ads and manipulation.
IDK it's become too verbose IMHO, looks almost like COBOL now. (I think it was Fortran 66 that was the last Fortran true to its nature as a "Formula Translator"...)
We are way beyond comparing languages to COBOL, now that plenty of folks type whole book-sized descriptions into tiny chat windows for their AI overlords.