Meanwhile, the Claude CLI has so many huge bugs that break the experience: memory leaks, major CPU usage, tool-call errors that force you to abandon a conversation, infinite loops, context leaks, flashing screens... too many to list.
I love the feature set of Claude Code and my entire workflow has been fine-tuned around it, but I had to switch to Codex this month. Hopefully the Claude Code team spends some time slowing down to focus on bugs.
I doubt it. A large part of the performance problem with CC is that it constantly writes to a single shared JSON file across all instances, with no sharding or other mechanism to keep it performant. It spins a shitload of CPU and blocks on constant serialization/deserialization cycles and IO. When I was using CC a lot, my JSON file would quickly grow past 20 MB, and every instance would grind to a halt, sometimes taking >15s to respond to keyboard input. Seriously bullshit.
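To make the bottleneck concrete, here's a toy sketch (none of this is Claude Code's actual code; the file names and record shape are made up) contrasting the monolithic pattern described above with a per-instance sharded append log:

```python
import json
import os
import time

STATE = "state.json"  # hypothetical shared state file


def naive_update(key, value):
    """Monolithic pattern: every update re-reads, re-serializes, and
    rewrites the ENTIRE shared file -- cost grows with file size, and
    concurrent instances contend on the same path."""
    data = {}
    if os.path.exists(STATE):
        with open(STATE) as f:
            data = json.load(f)
    data[key] = value
    with open(STATE, "w") as f:
        json.dump(data, f)  # full serialize + full write, every time


def sharded_update(instance_id, key, value):
    """Sharded pattern: each instance appends one record to its own
    JSONL shard -- cost is proportional to the record, not the history,
    and instances never touch each other's files."""
    with open(f"state.{instance_id}.jsonl", "a") as f:
        f.write(json.dumps({"k": key, "v": value, "t": time.time()}) + "\n")
```

With the naive version, a 20 MB file means roughly 20 MB of parse-plus-rewrite work per update, per instance, which matches the multi-second input stalls described above; the sharded version keeps each write O(record size).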
Everything Anthropic does from an engineering standpoint is bad, they're a decent research lab and that's it.
> Everything Anthropic does from an engineering standpoint is bad, they're a decent research lab and that's it.
This may be true, but then I wonder why it is still the case that no other agentic coding tool comes close to Claude Code.
Take Gemini Pro: excellent model let down by a horrible Gemini CLI. Why are the major AI companies not investing heavily in tooling? So far all the efforts I've seen from them are laughable. Every few weeks there is an announcement of a new tool, I go to try it, and soon drop it.
It seems to me that the current models are about as good as they are going to be for a long time, and a lot of the value to be had from LLMs going forward lies in the tooling.
Gemini is a very powerful model, but it's tuned to be "oracular" rather than "agentic." The CLI isn't great but it's not the primary source of woe there. If you use Gemini with Aider in a more oracular fashion, it's still competitive with Claude using CC.
Claude is a very good model for "vibe coding" and content creation. It's got a highly collapsed distribution that causes it to produce good output with poor prompts. The problem is that collapsed distribution means it also tends to disobey more detailed prompts, and it also has a hard time with stuff that's slightly off manifold. Think of it like the car that test drives great but has no end of problems under atypical circumstances. It's also a naturally very agentic, autonomous model, so it does well in low information scenarios where it has to discover task details.
It is still slower than I'd like, at least with regard to UI input responsiveness, but I've never had it hard-lock on me like CC. I can run 5-10 Codex sessions and my system holds up fine (128 GB RAM), but 8 CC instances would grind things to a halt after a few days of heavy usage.
The horrible thumping is purely a fit issue. The solution to the thumping when running is to either size down the tips or to slightly dislodge the tips from your ear.
It’s not ideal, I’ll grant you that.
While we may have some overlap in issues, I would say that the AirPods Pro 3 are incredible. I've ditched my AirPods Max entirely. The noise cancellation works too well, the sleep detection is a godsend, and the battery life is so good. I use my AirPods to sleep. Before, I'd always wake up to dead AirPods. Now they have around 70% battery when I wake up, because the sleep detection kicks in.
Seems odd to call it a fit issue when the solution is to make the fit worse by dislodging them from your ear. If it's a fit issue then improving the fit should make it go away!
I’ve tried every set of tips except the XS, same thumping with all of them. Zero issues running almost daily with the second gens for several years. I think it’s more than fit—either oversensitive ANC or something with the composition of the tips themselves. Oddly enough, it’s not present in both ears every time. Sometimes both, sometimes just one, rarely neither. I’ve stopped gathering data because I switched to a different set of headphones.
I’ll grant you that the ANC in the third gen is fantastic. I just felt like the second gen fit themselves into my routine, whereas I have to fiddle and futz with the third gen to get them just so, so that they don’t inhibit my routine.
What do you mean? How do people manage to run with noise cancellation? Or how do they manage not to lose them?
I run with my AirPods Pro 2 and have no issues. I have some other in-ear buds where fit is also no issue but thumping sounds while running make them unusable.
Years ago I was a convert to open ear bone conduction by Shokz (then Aftershokz) but the band was a little annoying and now I use the Huawei Freeclips which I am very happy with. Bose also have an open ear product.
My priority with exercise is peripheral awareness so I would never compromise that with in-ears anymore
I understand. I think it very much depends on the environment. I usually run in parks, not on the street. I also trust my eyes more than my ears when doing runs on more trafficked routes. The Apple AirPods have a great Transparency mode. I tried bone conduction headphones and they weren’t for me. I know that the new models are kind of hybrids now. But I also love the fact that I can listen to myself. I’ve had tons of headphones over the years, and I think for me the AirPods Pro 2 are just the most versatile.
Well, in a big city sounds can be deceiving. It also depends on how trained your hearing is. I guess I would have a hard time if I ended up going blind. In any case, what I meant is that I use my eyes, and by that I mean also turning my head to look over my shoulder to check for cars etc. In most cases it’s best to have eye contact with a driver who is currently taking a turn, to make sure they’re actually seeing you.
I am blind, so no eye contact with drivers. And I am still alive, despite usually going out alone as a pedestrian. However, I guess I benefit a lot from the Austrian "Vertrauensgrundsatz", which basically translates as "principle of trust". When acquiring a driver's license, you are drilled to take extra care around disabled or obviously incapacitated pedestrians. That basically means that if you hit a blind person, or even an obviously drunk person, you are at fault, no matter what.
It’s hard to argue with you about that. I think you’re right about the tip composition being the issue. Also, there’s definitely an alien feel to APP3.
Reading that post, all I felt was what a splendid luxury it is to live in the United States and be paid much better than those in other nations because of the country’s brand.
I’m not sure if people are aware, but there’s a little-known secret about America not having state-funded healthcare. Did any of you know? Don’t forget this important fact! It’s related to every single American topic.
The reason OP is getting terrible results is that he's using Cursor, and Cursor is designed to ruthlessly prune context to curtail costs.
Unlike the model providers, Cursor has to pay the retail price for LLM usage. They're fighting an ugly marginal price war. If you're paying more for inference than your competitors, you have to choose to either 1) deliver performance equal to competitors at a loss, or 2) economize by feeding smaller contexts to the model providers.
Cursor is not transparent about how it handles context. From my experience, it clearly uses aggressive strategies to prune conversations, to the extent that it's not uncommon for Cursor to have to reference the same file multiple times in the same conversation just to know what's going on.
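A crude illustration of this kind of budget-driven pruning (a hypothetical sketch, not Cursor's actual algorithm; the token counter is a stand-in for a real tokenizer): keep only the newest messages that fit a token budget, which silently drops older context such as earlier file reads, forcing them to be re-fetched later.

```python
def prune(messages, budget, n_tokens=lambda m: len(m.split())):
    """Keep the most recent messages whose combined token count fits
    within `budget`; everything older is silently discarded."""
    kept, used = [], 0
    for msg in reversed(messages):  # walk newest -> oldest
        cost = n_tokens(msg)
        if used + cost > budget:
            break  # this message and everything older gets pruned
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order
```

Under this scheme, an early "read file X" result is the first thing to fall off the end of the window, so the agent later has to re-read file X to recover state it already had.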
My advice to anyone using Cursor is to just stop wasting your time. The code it generates creates so much debt. I've moved on to Codex and Claude and I couldn't be happier.
GitHub Copilot is likely running models at or close to cost, given that Azure serves all those models. I haven't used Copilot in several months, so I can't speak to its current performance. My perception back then was that its underperformance relative to peers was because Microsoft was relatively late to the agentic coding game.
> Or is the performance of those models also worse there?
The context and output limits are heavily shrunk on GitHub Copilot[0].
That's why, for example, Sonnet 4.5 performs noticeably worse under Copilot than in Claude Code.
Mokyr’s Northwestern website has links to a lot of his papers.
An extremely crude selection rule:
Anything published in the American Economic Review, the Quarterly Journal of Economics, or the Journal of Political Economy has the profession’s “highest stamp of approval”. It’s really hard to publish anything there. (There are two journals I’m not listing in that “top” category, but he has no papers there on his website.) On Aghion’s or Howitt’s websites, look for the above journals but also Econometrica and the Review of Economic Studies. Those are the “top five” in the field.
There are surely papers in good history and economic history journals on Mokyr’s website, but I don’t know those journals!
Standards for any chapter in a “Handbook of X Economics” or “Handbook of the Economics of X” are high — those should be good surveys.
Similarly for a paper in the “Annual Review of Economics”.
Also, Mokyr has a bunch of work on Amazon. “The Lever of Riches” is a classic. “A Culture of Growth” is well regarded.
Finally, he has a forthcoming book called “Two Paths to Prosperity” with two other distinguished guys: one econ historian (Greif) and one political economy guy (Tabellini). It’s coming out in about three weeks. Good timing, Princeton U Press!
Aghion and Howitt have a growth textbook at the advanced undergrad level called “The Economics of Growth.”
They have a much more advanced work called “Endogenous Growth Theory”, which is for specialists (or at least anyone with first-year PhD macro).
Aghion has a book called “The Power of Creative Destruction.”
Or perhaps it means that they plan to expand their TV offering so as to merit being something more than just a "plus".
I don't know if I like the rebranding or not – it's such a minor thing that I'm not sure it even warrants an opinion. But they should now be obliged to rebrand Apple CarPlay to Apple Car next.
> or perhaps it means that they plan to expand their TV offering so as to merit being something more than just a "plus"
That's an interesting point. One potential reason to simplify the service's base name is to allow for segmentation, e.g. Apple TV Ultra.
I think there's a reasonable possibility that they'll introduce a <$100 device in an effort to 10X their living room user base, in which case we might see something like an Apple Theater Pro and an Apple Theater mini.