A VM is displayed as a window on the host OS, and Emacs is the window manager within that VM window. How is that different from running Emacs directly as an application on the host?
I heard last year that the potential future of gaming is not rendering but fully AI-generated frames. At 3 seconds per 'frame' now, it's not hard to believe it could reach 60fps in a few short years. That makes it seem more likely such a game could exist. I'm not sure I like the idea, but it seems like it could happen.
The problem is going to be how to control those models to produce a universe that's temporally and spatially consistent. Also think of other issues, such as networked games: how would you even begin to approach that in this new paradigm? You need multiple models to share a representation that includes other players, and you need to sync data efficiently across the network.
I get that it's tempting to say "we no longer have to program game engines, hurray", but at the same time, we've already done the work: we already have game engines that are very computationally efficient and predictable. We understand graphics and simulation quite well.
Personally: I think there's an obvious future in using AI tools to generate game content. 3D modelling and animation can be very time consuming. If you could get an AI model to generate animated characters, you could save a lot of time. You could also empower a lot of indie devs who don't have 3D modelers to help them. AI tools to generate large maps, also super valuable. Replacing the game engine itself, I think, is a taller order than people realize, and maybe not actually desirable.
Twenty years out, what will everybody be using routine 10Gbps pipes in their homes for?
I'm paying $43/month for 500Mbps at present, and there's nothing special about that at all (in the US or globally). What might we finally use 1Gbps+ for? Pulling down massive AI-built worlds of entertainment. Movie & TV streaming sure isn't going to challenge our future bandwidth capabilities.
The worlds are built and shared so quickly that, with some slight limitations, you never notice the world-building going on behind the scenes.
The world-building doesn't happen locally. Multiple players connect to the same remotely built world. There will be smaller hobbyist segments that still world-build locally, for numerous reasons (privacy for one).
The worlds can be constructed entirely before they're downloaded. There are good arguments for both approaches (build the entire world and then allow it to be accessed, or attempt to world-build as you play). Both will likely be used over the coming decades, for different reasons and at different times: changes in capabilities will unlock new arguments for either as time goes on, with a likely back and forth where one pulls ahead, then the other.
Increasing the framerate by rendering at a lower resolution and upscaling, or by outright generating extra frames, has already been a thing for a few years now. Nvidia calls it Deep Learning Super Sampling (DLSS)[1]. AMD's equivalent is called FSR[2].
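To make the "render low, upscale" idea concrete, here's a toy Python sketch. The naive nearest-neighbour filter is just a stand-in; DLSS and FSR use motion vectors and (in DLSS's case) a learned reconstruction model, and all the numbers here are illustrative:

    import numpy as np

    # Pretend-render a 960x540 frame, then naively upscale 2x to 1920x1080.
    low_res = np.random.rand(540, 960, 3)                   # stand-in for the rendered frame
    high_res = low_res.repeat(2, axis=0).repeat(2, axis=1)  # nearest-neighbour 2x upscale
    print(low_res.shape, "->", high_res.shape)              # (540, 960, 3) -> (1080, 1920, 3)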
Not saying the M1 Ultra is great, but you should only see an ~8x slowdown with a proper implementation (such as Draw Things' upcoming implementation for Z Image). It should be 2-3 sec per step. On an M5 iPad, it is ~6s per step.
And implicit in this is that it compares very poorly to SOTA models. Do you disagree with that? Do you think these models are beating SOTA and they just didn't include the benchmarks because they forgot?
> It's a separate league from closed models entirely.
To be fair, the SOTA models aren't even a single LLM these days. They are doing all manner of tool use and specialised submodel calls behind the scenes - a far cry from in-model MoE.
I think that Qwen3 8B and 4B are SOTA for their size. The GPQA Diamond accuracy chart is weird: both Qwen3 8B and 4B have higher scores, so they used this weird chart where the x-axis shows the number of output tokens. I don't get the point of this.
Generation time is more or less proportional to tokens * model size, so if you can get the same quality of result with fewer tokens from the same size of model, you save time and money.
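Back of the envelope (the numbers below are made up; only the proportionality matters):

    # Generation cost scales roughly with output_tokens * active_params
    # (arbitrary units, constant factors ignored).
    def rel_cost(tokens: int, active_params_b: float) -> float:
        return tokens * active_params_b

    # Hypothetical: the same 8B model answering in 2000 vs 500 tokens.
    print(rel_cost(2000, 8.0) / rel_cost(500, 8.0))  # 4.0 -> 4x the time and cost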
~20 tokens/second is actually pretty good. I see he's using the q5 version of the model. I wonder how it scales with larger contexts. And the same guy published a video today with the new 3.2 version: https://www.youtube.com/watch?v=b6RgBIROK5o
As pointed out by a sibling comment, MoE consists of a router and a number of experts (e.g. 8). These experts can be imagined as parts of the brain with specializations, although in reality they probably don't work exactly like that. They aren't separate models; they're components of a single large model.
Typically, input gets routed to only a few experts, e.g. the top 2, leaving the others inactive. This reduces the activation/processing requirements (see the sketch below).
Mistral's Mixtral is an example of a model designed like this. Clever people have created converters to transform dense models into MoE models, and these days many popular models are also available in a MoE configuration.
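A minimal numpy sketch of top-k routing; the dimensions, weights, and softmax-over-chosen-experts gating are illustrative only (real MoE layers add load balancing, shared experts, batched routing, etc.):

    import numpy as np

    rng = np.random.default_rng(0)
    d_model, n_experts, top_k = 16, 8, 2

    router_w = rng.normal(size=(d_model, n_experts))           # router: one score per expert
    expert_w = rng.normal(size=(n_experts, d_model, d_model))  # one weight matrix per expert

    def moe_layer(x):
        """Route one token vector through its top-k experts; the rest stay inactive."""
        scores = x @ router_w
        chosen = np.argsort(scores)[-top_k:]   # indices of the top-k experts
        gates = np.exp(scores[chosen])
        gates /= gates.sum()                   # softmax over the chosen experts only
        # Only the selected experts' weights are used in the computation.
        return sum(g * (x @ expert_w[i]) for g, i in zip(gates, chosen))

    print(moe_layer(rng.normal(size=d_model)).shape)  # (16,)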
that's not really a good summary of what MoEs are. it's better to think of them as sublayers that get routed through (like how the brain only lights up certain pathways) rather than as actual separate models.
The gain from MoE is that you can have a large model that's still efficient: it lets you decouple parameter count from computation cost. I don't see how anthropomorphizing MoE <-> brain affords insight deeper than 'less activity means less energy used'. These are totally different systems; IMO this shallow comparison muddies the water and does a disservice to each field of study. There's been loads of research showing there's redundancy in MoE models, e.g. Cerebras has a paper[1] where they selectively prune half the experts with minimal loss across domains -- I'm not sure you could disable half the brain without noticing a stupefying difference.
> I don't see how anthropomorphizing MoE <-> brain affords insight deeper than 'less activity means less energy used'.
I'm not saying it is a perfect analogy, but it is by far the most familiar one for people to describe what sparse activation means. I'm no big fan of over-reliance on biological metaphor in this field, but I think this is skewing a bit on the pedantic side.
re: your second comment about pruning, not to get in the weeds but I think there have been a few unique cases where people did lose some of their brain and the brain essentially routed around it.
I had a terrible first impression with Gemini CLI a few months ago when it was released because of the constant 409 errors.
With the Gemini 3 release I decided to give it another go, and now the error has changed to "You've reached the daily limit with this model", even though I have an API key with billing set up. It wouldn't let me even try Gemini 3, and even after switching to Gemini 2.5 it would still throw this error after a few messages.
Google might have the best LLMs, but its agentic coding experience leaves a lot to be desired.
Interesting that the 8B model of the Qwen3-VL family is in 9th place, above a few proprietary models. This thing can run locally with llama.cpp on modest hardware.
this is a distinction without a difference in many instances. I can easily ask an LLM to write a Python tool to produce random numbers for a given distribution and then use that tool as needed. The LLM writes the code and uses the executable result. The end black-box result is the LLM doing the work.
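For instance, a sketch of the kind of throwaway tool meant (the interface and distribution names here are hypothetical):

    import random

    # Small tool an LLM might write and then call as needed:
    # draw n samples from a named distribution.
    def sample(distribution: str, n: int, **params) -> list[float]:
        draw = {
            "uniform":     lambda: random.uniform(params.get("a", 0.0), params.get("b", 1.0)),
            "gaussian":    lambda: random.gauss(params.get("mu", 0.0), params.get("sigma", 1.0)),
            "exponential": lambda: random.expovariate(params.get("lambd", 1.0)),
        }[distribution]
        return [draw() for _ in range(n)]

    print(sample("gaussian", 3, mu=0.0, sigma=2.0))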
But why limit it to generating random numbers? Isn't the logical conclusion that the LLM writes a poker bot instead of playing the game? How would that demonstrate the poker skills of an LLM?
> dunno if the life vest bit comment of yours was sarcastic, but it is a funny remark for sure :-)
It was a quote from the linked article:
"Holtec International, which owns the closed nuclear facility, reported the worker was a contractor who was wearing all required personal protective equipment, including a life vest while working near the pool without a barrier in place."