I don’t think the two kinds of vibe coding are entirely separate. There’s a spectrum of how much of the context you care to understand yourself: you can ask a lot of questions to gain more understanding, or let loose and give the LLM more discretion.
I’ve had a hard time parsing what exactly the paper is trying to explain. So far I’ve understood that their comparison seems to be between models within the same family with the same weight tensor dimensions, so they aren’t showing a common subspace where there isn’t a 1:1 match between weight tensors, e.g. between a ViT and GPT2. The plots showing the distribution of principal component values presumably do this on every weight tensor, but this seems like an expected result: the principal component values follow a decaying, roughly log-like curve where only a few principal components are the most meaningful.
What I don’t get is what is meant by a universal shared subspace, because there is some invariance in the specific weight values and the directions of vectors in the model. For instance, if you were doing matrix multiplication with a weight tensor, you could swap two rows/columns (depending on the order of multiplication) and all that would do is swap two values in the resulting product; whatever consumes that output could undo the swap, so the whole model behaves identically, yet you’ve changed the direction of the principal components. Because of that, fully independently trained models can’t share the exact subspace directions for analogous weight tensors.
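A minimal sketch of that symmetry in numpy (my own toy illustration, not code from the paper): permuting the hidden units of a two-layer MLP leaves its behaviour unchanged, but the left singular vectors (the principal directions) of the first weight matrix move.

    # Toy illustration: permuting hidden units changes principal directions
    # but not model behaviour.
    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_hid, d_out = 8, 16, 4
    W1 = rng.normal(size=(d_hid, d_in))
    W2 = rng.normal(size=(d_out, d_hid))
    x = rng.normal(size=(d_in,))
    relu = lambda z: np.maximum(z, 0.0)

    y = W2 @ relu(W1 @ x)                  # original model

    perm = rng.permutation(d_hid)          # swap hidden rows of W1 ...
    W1p, W2p = W1[perm, :], W2[:, perm]    # ... and undo it in W2's columns
    y_perm = W2p @ relu(W1p @ x)

    print(np.allclose(y, y_perm))          # True: identical behaviour

    U, S, _ = np.linalg.svd(W1)
    Up, Sp, _ = np.linalg.svd(W1p)
    print(np.allclose(S, Sp))              # True: same spectrum
    print(np.allclose(U[:, 0], Up[:, 0]))  # generally False: directions moved

The singular values match exactly, so spectrum plots can look "universal" even though the directions are only defined up to this kind of symmetry.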
The hardest part about making a new architecture is that even if it is just plain better than transformers in every way, it’s very difficult to both prove a significant improvement at scale and gain traction. Until Google puts a lot of resources into training a scaled-up version of this architecture, I believe there’s enough low-hanging fruit in improving existing architectures that it’ll always take a back seat.
Do you think there might be an approval process to navigate when an experiment’s cost might run to seven or eight figures and months of reserved resources?
While they do have lots of money and many people, they don't have infinite money and specifically only have so much hot infrastructure to spread around. You'd expect they have to gradually build up the case that a large scale experiment is likely enough to yield a big enough advantage over what's already claiming those resources.
I would imagine they do not want their researchers unnecessarily wasting time fighting for resources - within reason. And at Google, "within reason" can be pretty big.
But it's companies like Google that made tools like Jax and TPUs while saying we can throw together models with cheap, easy scaling. Their paper's math is probably harder to put together than an alpha-level prototype, which they need anyway.
So, I think they could default to doing it as small demonstrators.
At the same time, there is now a ton of data for training models to act as useful assistants, and benchmarks to compare different assistant models. The wide availability and ease of obtaining new RLHF training data will make it more feasible to build models on new architectures, I think.
There generally aren't new techniques when optimizing something ubiquitous. Instead, there are a lot of ways to apply existing techniques to create new and better results. Most ideas are built on top of the same foundational principles.
Yes. And there’s still lots of places where you can get significant speed ups by simply applying those old techniques in a new domain or a novel way. The difference between a naive implementation of an algorithm and an optimised one is often many orders of magnitude. Look at automerge - which went from taking 30 seconds on a simple example to tens of milliseconds.
I think about this regularly when I compile C++ or rust using llvm. It’s an excellent compiler backend. It produces really good code. But it is incredibly slow, and for no good technical reason. Plenty of other similar compilers run circles around it.
Imagine an llvm rewrite by the people who made V8, or chrome or the unreal engine. Or the guy who made luajit or the Go compiler team. I’d be shocked if we didn’t see an order of magnitude speed up overnight. They’d need some leeway to redesign llvm IR of course. And it would take years to port all of llvm’s existing optimisations. But my computer can retire billions of operations per second. And render cyberpunk at 60fps. It shouldn’t take seconds of cpu time to compile a small program.
It's generally true, isn't it? Otherwise we'd have ground breaking discoveries every day about some new and fastest way to do X.
The way I see it, mathematicians have been trying (and somewhat succeeding every ~5 years) to prove faster ways to do matrix multiplication since the 1970s. But this is only in theory.
If you want to implement the theory, you suddenly have many variables you need to take care of, such as memory speed, CPU instructions, bit precision, etc. So in practice, an actual implementation of some theory likely has more room to improve. It is also likely that LLMs can help figure out how to write a more optimal implementation.
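As a rough sketch of how much implementation alone matters (numbers are machine-dependent, and Python's interpreter overhead exaggerates the gap well beyond what a tuned C loop would show): the same O(n^3) algorithm, written naively versus dispatched to a tuned BLAS through numpy.

    # Same cubic algorithm, very different implementations.
    import time
    import numpy as np

    n = 256
    A = np.random.rand(n, n)
    B = np.random.rand(n, n)

    def naive_matmul(A, B):
        # Textbook triple loop: ignores caches, vectorisation, and parallelism.
        rows, inner, cols = A.shape[0], A.shape[1], B.shape[1]
        C = np.zeros((rows, cols))
        for i in range(rows):
            for j in range(cols):
                s = 0.0
                for k in range(inner):
                    s += A[i, k] * B[k, j]
                C[i, j] = s
        return C

    t0 = time.perf_counter(); C1 = naive_matmul(A, B); t_naive = time.perf_counter() - t0
    t0 = time.perf_counter(); C2 = A @ B; t_blas = time.perf_counter() - t0

    print(np.allclose(C1, C2))
    print(f"naive: {t_naive:.2f}s, BLAS: {t_blas * 1e3:.2f}ms")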
The chart confused me because I expected to see performance numbers for CUDA-L2 compared to the others, but instead it shows the speedup percentage of CUDA-L2 over the others. In a sense the bar chart inverts the performance of torch.matmul and cuBLAS: the larger their bar, the slower they are, and 0% would only mean equal performance.
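To spell out the conversion (with made-up numbers, only to show how the axis reads): a speedup of s over a baseline means the baseline reaches 1 / (1 + s) of CUDA-L2's throughput, and a 0% bar means parity.

    # Hypothetical speedup values, only to illustrate reading the chart.
    for name, speedup in [("torch.matmul", 0.30), ("cuBLAS", 0.10)]:
        rel_throughput = 1.0 / (1.0 + speedup)
        print(f"{name}: +{speedup:.0%} speedup for CUDA-L2 "
              f"=> {name} runs at {rel_throughput:.0%} of CUDA-L2's throughput")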
They have a moat defined by being well known in the AI industry, so they have credibility and it wouldn't be hard for anything they make to gain traction. Some unknown player who replicates it, even if it was just as good as what SSI does, will struggle a lot more with gaining attention.
Agreed. But it can be a significant growth boost. Senior partners at high-profile VCs will meet with them. Early key hires they are trying to recruit will be favorably influenced by their reputation. The media will probably cover whatever they launch, accelerating early user adoption. Of course, the product still has to generate meaningful value - but all these 'buffs' do make several early startup challenges significantly easier to overcome. (Source: someone who did multiple tech startups without those buffs and ultimately reached success. Spending 50% of founder time for six months to raise first funding is a significant burden (working through junior partners and early skepticism) vs 20% of founder time for three weeks.)
The impactful innovations in AI these days aren't really from scaling models to be larger. It's more concrete to show higher benchmark scores, and this implies higher intelligence, but this higher intelligence doesn't necessarily translate to all users feeling like the model has significantly improved for their use case. Models sometimes still struggle with simple questions like counting letters in a word, and most people don't have a use case of a model needing phd level research ability.
Research now matters more than scaling when research can fix limitations that scaling alone can't. I'd also argue that we're in the age of product, where the integration of product and models plays a major role in what they can do combined.
Not necessarily. The problem is that we can't precisely define intelligence (or, at least, haven't so far), and we certainly can't (yet?) measure it directly. And so what we have are certain tests whose scores, we believe, are correlated with that vague thing we call intelligence in humans. Except these test scores can correlate with intelligence (whatever it is) in humans and at the same time correlate with something that's not intelligence in machines. So a high score may well imply high intelligence in humans but not in machines (e.g. perhaps because machine models may overfit more than a human brain does, and so an intelligence test designed for humans doesn't necessarily measure the same thing we think of when we say "intelligence" when applied to a machine).
This is like the following situation: Imagine we have some type of signal, and the only process we know produces that type of signal is process A. Process A always produces signals that contain a maximal frequency of X Hz. We devise a test for classifying signals of that type that is based on sampling them at a frequency of 2X Hz. Then we discover some process B that produces a similar type of signal, and we apply the same test to classify its signals in a similar way. Only, process B can produce signals containing a maximal frequency of 10X Hz and so our test is not suitable for classifying the signals produced by process B (we'll need a different test that samples at 20X Hz).
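A small numpy sketch of that analogy (my numbers, using the X and 10X from the comment): sampled at 2X Hz, a 10X Hz tone from process B is indistinguishable from no signal at all.

    # Aliasing: a test built for process A's bandwidth says nothing about B.
    import numpy as np

    X = 1.0                      # process A's maximal frequency (Hz)
    fs = 2 * X                   # the test: sample at 2X Hz
    t = np.arange(0, 4, 1 / fs)  # sampling instants

    tone_A = np.sin(2 * np.pi * X * t)       # what the test was designed for
    tone_B = np.sin(2 * np.pi * 10 * X * t)  # process B content at 10X Hz

    # At these instants the 10X Hz tone reads as silence:
    print(np.allclose(tone_B, np.zeros_like(t)))  # True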
My definition of intelligence is the capability to process and formalize a deterministic action from given inputs as a transferable entity/medium.
In other words, knowing how to manipulate the world directly and indirectly via deterministic actions and known inputs, and to teach others via various mediums.
For example, you can be very intelligent at software programming but socially very dumb (for instance, unable to socially influence others).
Likewise, if you do not understand another person (in language) and do not understand that person's work or its influence either, then you have no basis for judging the person's intelligence beyond your general assumptions about how smart humans are.
ML/AI on text inputs is at best stochastic over language context windows, or plain wrong, so it does not satisfy the definition. Well (formally) specified problems with a smaller scope tend to work well, from what I've seen so far.
The working ML/AI problems known to me are calibration/optimization problems.
Forming deterministic actions is a sign of computation, not intelligence. Intelligence is probably (I guess) dependent on the nondeterministic actions.
Computation is when you query a machine that is on standby, doing nothing, and it computes a deterministic answer. Intelligence (or at least some sign of it) is when the machine queries you, the operator, of its own volition.
> Intelligence (or at least some sign of it) is when the machine queries you, the operator, of its own volition.
So you think the thing that holds more control/force to do arbitrary things as it sees fit is more intelligent? That sounds to me more like a definition of power, not intelligence.
> So you think the thing that holds more control/force to do arbitrary things as it sees fit is more intelligent? That sounds to me more like a definition of power, not intelligence.
I want to address this item. I'm not thinking about control, or about comparing one thing to another. I think intelligence is having at least some/any voluntary thinking. A cat can't do math or write text, but it can think of its own volition and is therefore an intelligent being. A CPU running some externally predefined commands is not intelligent, yet.
I wonder whether an LLM can be a stepping stone to intelligence or not, but it is not clear to me.
> My definition of intelligence is the capability to process and formalize a deterministic action from given inputs as a transferable entity/medium.
I don't think that's a good definition because many deterministic processes - including those at the core of important problems, such as those pertaining to the economy - are highly non-linear, and we don't necessarily think that "more intelligence" is what's needed to simulate them better. I mean, we've proven that predicting certain things (even those that require nothing but deduction) requires more computational resources regardless of the algorithm used for the prediction. Formalising a process, i.e. inferring the rules from observation through induction, may also be dependent on available computational resources.
> What is your definition?
I don't have one except for "an overall quality of the mental processes humans present more than other animals".
> I mean, we've proven that predicting certain things (even those that require nothing but deduction) requires more computational resources regardless of the algorithm used for the prediction.
I understand proofs as formalized deterministic actions for given inputs, and processing as the solving of various proofs.
> Formalising a process, i.e. inferring the rules from observation through induction, may also be dependent on available computational resources.
Induction is only one way to construct a process, and there are various informal processes (social norms etc.). It is true that the overall process depends on various things like the available data points and resources.
> I don't have one except for "an overall quality of the mental processes humans present more than other animals".
How would you formalize the process of self-reflection, or the believing in completely made-up stories that is often used as an example of what distinguishes humans from animals? It is hard to make a clear distinction in language and math, since we mostly do not understand animal language and math, or other well-observable behavior based on them.
Ok, but the point of a test of this kind is to generalise its result. I.e. the whole point of an intelligence test is that we believe that a human getting a high score on such a test is more likely to do some useful things not on the test better than a human with a low score. But if the problem is that the test results - as you said - don't generalise as we expect them to, then the tests are not very meaningful to begin with. If we don't know what to expect from a machine with a high test score when it comes to doing things not on the test, then the only "capacity" we're measuring is the capacity to do well on such tests, and that's not very useful.
Models aren't intelligent; the intelligence is latent in the text (etc.) that the model ingests. There is no concrete definition of intelligence, only that humans have it (in varying degrees).
The best you can really state is that a model extracts/reveals/harnesses more intelligence from its training data.
Note that if this is true (and it is!) all the other statements about intelligence and where it is and isn’t found in the post (and elsewhere) are meaningless.
I did notice that: the person you replied to made a categorical statement about intelligence, followed immediately by negating that there is anything to make a concrete statement about.
"Scaling" is going to eventually apply to the ability to run more and higher fidelity simulations such that AI can run experiments and gather data about the world as fast and as accurately as possible. Pre-training is mostly dead. The corresponding compute spend will be orders of magnitude higher.
That's true, I expect more inference time scaling and hybrid inference/training time scaling when there's continual learning rather than scaling model size or pretraining compute.
Simulation scaling will be the most insane though. Simulating "everything" at the quantum level is impossible and the vast majority of new learning won't require anything near that. But answers to the hardest questions will require as close to it as possible so it will be tried. Millions upon millions of times. It's hard to imagine.
I don't think so. Serious attempts at producing data specifically for training have not been made yet. High quality data, I mean, produced by anarcho-capitalists, not by corporations like Scale AI using workers, governed by the laws of a nation, etc.
Don't underestimate the determination of 1 million young people to produce perfect data within 24 hours to train a model to vacuum-clean their house, if it means they never have to do it themselves again, and maybe to earn a little money on the side by creating the data.
Counting letters is tricky for LLMs because they operate on tokens, not letters. From the perspective of an LLM, if you ask it "this is a sentence, count the letters in it" it doesn't see a stream of characters like we do, it sees [851, 382, 261, 21872, 11, 3605, 290, 18151, 306, 480].
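You can see this directly with OpenAI's tiktoken library (a sketch; the exact IDs depend on which tokenizer/model produced the list above, so they won't necessarily match):

    # pip install tiktoken
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    text = "this is a sentence, count the letters in it"
    ids = enc.encode(text)
    print(ids)                             # roughly one ID per word piece, not per character
    print([enc.decode([i]) for i in ids])  # the pieces the model actually "sees"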
There is a mapping. An internal, fully learned mapping that's derived from seeing misspellings and words spelled out letter by letter. Some models make it an explicit part of the training with subword regularization, but many don't.
It's hard to access that mapping though.
A typical LLM can semi-reliably spell common words out letter by letter - but it can't immediately say how many of each letter a single word contains.
But spelling the word out first and THEN counting the letters? That works just fine.
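A sketch of the two prompt styles being compared (illustrative strings only, no model call; behaviour varies by model), along with the deterministic ground truth to check a model's answer against:

    word, letter = "strawberry", "r"

    direct_prompt = f"How many {letter}'s are in '{word}'?"

    two_step_prompt = (
        f"Spell '{word}' letter by letter, one letter per line. "
        f"Then count how many of those lines are '{letter}'."
    )

    print(word.count(letter))  # ground truth: 3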
Making a VSCode fork is probably the wrong direction at this point in time. The future of agentic coding should need less support for code-editor functionality, and could eventually be primarily about viewing code rather than editing it. There's a lot more UI flexibility when starting from scratch, and personally I want to see a UI that allows flexible manipulation of context and code changes with multiple agents.
GitHub is building a UI like this. I like it. I sometimes need the full IDE, but plenty of times don't. It's nice to be able to easily see what the agent is up to and converse with it in real-time while reviewing its outputs.