A VM is displayed as a window on the host OS, and Emacs is the window manager within that VM window. How is that different from running Emacs directly as an application on the host?
I heard last year that the potential future of gaming is not rendering but fully AI-generated frames. At 3 seconds per 'frame' now, it's not hard to believe it could reach 60fps in a few short years. That makes it seem more likely such a game could exist. I'm not sure I like the idea, but it seems like it could happen.
The problem is going to be how to control those models to produce a universe that's temporally and spatially consistent. Also think of other issues, such as networked games: how would you even begin to approach that in this new paradigm? You need multiple models to share a representation that includes other players, and you need to sync data efficiently across the network.
I get that it's tempting to say "we no longer have to program game engines, hurray", but at the same time, we've already done the work: we already have game engines that are very computationally efficient and predictable. We understand graphics and simulation quite well.
Personally: I think there's an obvious future in using AI tools to generate game content. 3D modelling and animation can be very time consuming. If you could get an AI model to generate animated characters, you could save a lot of time. You could also empower a lot of indie devs who don't have 3D modelers to help them. AI tools to generate large maps, also super valuable. Replacing the game engine itself, I think, is a taller order than people realize, and maybe not actually desirable.
Twenty years out, what will everybody be using routine 10Gbps pipes in their homes for?
I'm paying $43/month for 500Mbps at present, and there's nothing special about that at all (in the US or globally). What might we finally use 1Gbps+ for? Pulling down massive AI-built worlds of entertainment. Movie & TV streaming sure isn't going to challenge our future bandwidth capabilities.
The worlds are built and shared so quickly that, with some slight limitations, you never notice the world-building going on behind the scenes.
The world-building doesn't happen locally. Multiple players connect to the same remotely built world. There will be smaller hobbyist segments that still world-build locally, for numerous reasons (privacy for one).
The worlds can be constructed entirely before they're downloaded. There are good arguments for both approaches (build the entire world and then allow it to be accessed, or attempt to world-build as you play). Both will likely be used over the coming decades, for different reasons and at different times: changes in capabilities will unlock new arguments for either as time goes on, with a likely back and forth where one pulls ahead, then the other.
Increasing the framerate by rendering at a lower resolution and upscaling, or by outright generating extra frames, has already been a thing for a few years now. Nvidia calls it Deep Learning Super Sampling (DLSS)[1]. AMD's equivalent is called FSR[2].
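To make the "render low, upscale" idea concrete, here's a toy Python sketch. The naive nearest-neighbour filter is just a stand-in; DLSS and FSR use motion vectors and (in DLSS's case) a learned reconstruction model, and all the numbers here are illustrative:

    import numpy as np

    # Pretend-render a 960x540 frame, then naively upscale 2x to 1920x1080.
    low_res = np.random.rand(540, 960, 3)                   # stand-in for the rendered frame
    high_res = low_res.repeat(2, axis=0).repeat(2, axis=1)  # nearest-neighbour 2x upscale
    print(low_res.shape, "->", high_res.shape)              # (540, 960, 3) -> (1080, 1920, 3)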
Not saying the M1 Ultra is great, but you should only see an ~8x slowdown with a proper implementation (such as Draw Things' upcoming implementation for Z Image). It should be 2-3 sec per step. On an M5 iPad, it is ~6s per step.
And implicit in this is that it compares very poorly to SOTA models. Do you disagree with that? Do you think these models are beating SOTA and they just didn't include the benchmarks because they forgot?
> It's a separate league from closed models entirely.
To be fair, the SOTA models aren't even a single LLM these days. They are doing all manner of tool use and specialised submodel calls behind the scenes - a far cry from in-model MoE.
I think that Qwen3 8B and 4B are SOTA for their size. The GPQA Diamond accuracy chart is weird: both Qwen3 8B and 4B have higher scores, so they used this weird chart where the x-axis shows the number of output tokens. I don't get the point of this.
Generation time is more or less proportional to tokens * model size, so if you can get the same quality of result with fewer tokens from the same size of model, you save time and money.
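Back of the envelope (the numbers below are made up; only the proportionality matters):

    # Generation cost scales roughly with output_tokens * active_params
    # (arbitrary units, constant factors ignored).
    def rel_cost(tokens: int, active_params_b: float) -> float:
        return tokens * active_params_b

    # Hypothetical: the same 8B model answering in 2000 vs 500 tokens.
    print(rel_cost(2000, 8.0) / rel_cost(500, 8.0))  # 4.0 -> 4x the time and cost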
~20 tokens/second is actually pretty good. I see he's using the q5 version of the model. I wonder how it scales with larger contexts. And the same guy published a video today with the new 3.2 version: https://www.youtube.com/watch?v=b6RgBIROK5o
As pointed out by a sibling comment, MoE consists of a router and a number of experts (e.g. 8). These experts can be imagined as parts of the brain with specializations, although in reality they probably don't work exactly like that. They aren't separate models; they're components of a single large model.
Typically, input gets routed to only a few experts, e.g. the top 2, leaving the others inactive. This reduces the activation/processing requirements (see the sketch below).
Mistral's Mixtral is an example of a model designed like this. Clever people have created converters to transform dense models into MoE models, and these days many popular models are also available in a MoE configuration.
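A minimal numpy sketch of top-k routing; the dimensions, weights, and softmax-over-chosen-experts gating are illustrative only (real MoE layers add load balancing, shared experts, batched routing, etc.):

    import numpy as np

    rng = np.random.default_rng(0)
    d_model, n_experts, top_k = 16, 8, 2

    router_w = rng.normal(size=(d_model, n_experts))           # router: one score per expert
    expert_w = rng.normal(size=(n_experts, d_model, d_model))  # one weight matrix per expert

    def moe_layer(x):
        """Route one token vector through its top-k experts; the rest stay inactive."""
        scores = x @ router_w
        chosen = np.argsort(scores)[-top_k:]   # indices of the top-k experts
        gates = np.exp(scores[chosen])
        gates /= gates.sum()                   # softmax over the chosen experts only
        # Only the selected experts' weights are used in the computation.
        return sum(g * (x @ expert_w[i]) for g, i in zip(gates, chosen))

    print(moe_layer(rng.normal(size=d_model)).shape)  # (16,)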
that's not really a good summary of what MoEs are. it's better to think of them as sublayers that get routed through (like how the brain only lights up certain pathways) rather than as actual separate models.
The gain from MoE is that you can have a large model that's still efficient: it lets you decouple parameter count from computation cost. I don't see how anthropomorphizing MoE <-> brain affords insight deeper than 'less activity means less energy used'. These are totally different systems; IMO this shallow comparison muddies the water and does a disservice to each field of study. There's been loads of research showing there's redundancy in MoE models, e.g. Cerebras has a paper[1] where they selectively prune half the experts with minimal loss across domains -- I'm not sure you could disable half the brain without noticing a stupefying difference.
> I don't see how anthropomorphizing MoE <-> brain affords insight deeper than 'less activity means less energy used'.
I'm not saying it is a perfect analogy, but it is by far the most familiar one for people to describe what sparse activation means. I'm no big fan of over-reliance on biological metaphor in this field, but I think this is skewing a bit on the pedantic side.
re: your second comment about pruning, not to get in the weeds but I think there have been a few unique cases where people did lose some of their brain and the brain essentially routed around it.
I had a terrible first impression with Gemini CLI a few months ago when it was released because of the constant 409 errors.
With the Gemini 3 release I decided to give it another go, and now the error has changed to "You've reached the daily limit with this model", even though I have an API key with billing set up. It wouldn't let me even try Gemini 3, and even after switching to Gemini 2.5 it would still throw this error after a few messages.
Google might have the best LLMs, but its agentic coding experience leaves a lot to be desired.
Interesting that the 8B model of the Qwen3-VL family is in 9th place, above a few proprietary models. This thing can run locally with llama.cpp on modest hardware.
this is a distinction without a difference in many instances. I can easily ask an LLM to write a Python tool to produce random numbers for a given distribution and then use that tool as needed. The LLM writes the code and uses the executable result. The end black-box result is the LLM doing the work.
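For instance, a sketch of the kind of throwaway tool meant (the interface and distribution names here are hypothetical):

    import random

    # Small tool an LLM might write and then call as needed:
    # draw n samples from a named distribution.
    def sample(distribution: str, n: int, **params) -> list[float]:
        draw = {
            "uniform":     lambda: random.uniform(params.get("a", 0.0), params.get("b", 1.0)),
            "gaussian":    lambda: random.gauss(params.get("mu", 0.0), params.get("sigma", 1.0)),
            "exponential": lambda: random.expovariate(params.get("lambd", 1.0)),
        }[distribution]
        return [draw() for _ in range(n)]

    print(sample("gaussian", 3, mu=0.0, sigma=2.0))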
But why limit it to generating random numbers? Isn't the logical conclusion that the LLM writes a poker bot instead of playing the game? How would that demonstrate the poker skills of an LLM?
> dunno if the life vest bit comment of yours was sarcastic, but it is a funny remark for sure :-)
It was a quote from the linked article:
"Holtec International, which owns the closed nuclear facility, reported the worker was a contractor who was wearing all required personal protective equipment, including a life vest while working near the pool without a barrier in place."