Operating transistors outside the linear region (the saturated "on") at a billion+ scale is something that we as engineers and physicists haven't quite figured out, and I am hoping that this changes in the future, especially with the advent of analog neuromorphic computing. The quadratic region (before the "on") is far more energy efficient, and the non-linearity could actually help with computing, not unlike the activation function in an NN.
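(As a toy illustration of what that nonlinearity looks like, here is a sketch of the textbook subthreshold/square-law MOSFET model; all parameter values below are made up for illustration, not taken from any real process.)

    import numpy as np

    # Toy long-channel NMOS model; illustrative values only, not a real process.
    VTH = 0.4     # threshold voltage [V]
    K   = 2e-4    # transconductance parameter [A/V^2]
    I0  = 1e-9    # subthreshold current scale [A]
    N   = 1.5     # subthreshold slope factor
    VT  = 0.026   # thermal voltage at room temperature [V]

    def drain_current(vgs):
        """Drain current vs. gate voltage (assuming the device stays in saturation)."""
        if vgs <= VTH:
            # Subthreshold ("before the on"): exponential, a soft turn-on.
            return I0 * np.exp((vgs - VTH) / (N * VT))
        # Above threshold: the square-law ("quadratic") region.
        return 0.5 * K * (vgs - VTH) ** 2

    for v in np.linspace(0.0, 1.0, 11):
        print(f"Vgs={v:.1f} V  Id={drain_current(v):.3e} A")
    # The resulting curve is a smooth, monotonic nonlinearity, qualitatively
    # similar in shape to softplus-style activations used in neural nets.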
Of course, modeling the nonlinear behavior is difficult. My prof would say that for every coefficient in SPICE's transistor models, someone dedicated an entire PhD to it (and there are a lot of these coefficients!).
I haven't been in touch with the field since I moved up the stack (numerical analysis/ML), so I would love to learn more if there has been recent progress.
The machine learning model didn’t discover something that humans didn’t know about. It exploited behavior specific to that particular chip, behavior that could not be reproduced in production, on other chips, or even in other configurations of the same chip.
That is a common problem with fully free-form machine learning solutions: they can stumble upon something that technically works on their training set, but that any human who understood the full system would never actually use, because of the other problems associated with it.
> The quadratic region (before the "on") is far more energy efficient
Take a look at the structure of something like CMOS and you’ll see why running transistors in anything other than “on” or “off” is definitely not energy efficient. In fact, the transitions are where the energy usage largely goes. We try to get through that transition period as rapidly as possible because minimal current flows when the transistors reach the on or off state.
There are other logic arrangements, but I don’t understand what you’re getting at by suggesting such circuits would be more efficient. Are you referring to the reduced gate charge?
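For reference, the usual first-order CMOS power picture looks something like this (the numbers below are invented round figures, just to show where the energy goes):

    # First-order CMOS power budget; round invented numbers, order-of-magnitude only.
    VDD     = 1.0     # supply voltage [V]
    C_NODE  = 1e-15   # switched capacitance per node [F] (~1 fF)
    F_CLK   = 1e9     # clock frequency [Hz]
    ALPHA   = 0.1     # activity factor (fraction of nodes toggling per cycle)
    N_NODES = 1e9     # billion-transistor-class chip
    I_LEAK  = 1e-9    # leakage per node while sitting fully on/off [A]

    # Dynamic (switching) power: roughly C*Vdd^2 of energy burned per toggle.
    p_dynamic = ALPHA * F_CLK * C_NODE * VDD**2 * N_NODES

    # Static power: leakage through nominally-off devices.
    p_static = I_LEAK * VDD * N_NODES

    print(f"dynamic ~ {p_dynamic:.0f} W, static ~ {p_static:.0f} W")
    # Nearly all of the budget goes to transitions; once a gate settles into
    # "on" or "off", only leakage flows.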
> Take a look at the structure of something like CMOS and you’ll see why running transistors in anything other than “on” or “off” is definitely not energy efficient. In fact, the transitions are where the energy usage largely goes. We try to get through that transition period as rapidly as possible because minimal current flows when the transistors reach the on or off state.
Sounds like you might be thinking of power electronic circuits rather than CMOS. In a CMOS logic circuit, current does not flow from Vdd to ground as long as either the p-type or the n-type transistor is fully switched off. The circuit under discussion was operated in subthreshold mode, in which one transistor in a complementary pair is partially switched on and the other is fully switched off. So it still only uses power during transitions, and the energy consumed in each transition is lower than in the normal mode because less voltage is switched at the transistor gate.
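A rough sketch of why the reduced swing helps (again, invented numbers): the energy per transition scales roughly with C·Vdd², so cutting the swing pays off quadratically.

    # Energy per output transition scales roughly as C * Vdd^2, so a reduced
    # (near/sub-threshold) swing pays off quadratically. Illustrative values only.
    C_LOAD = 1e-15   # load capacitance [F]

    for vdd in (1.0, 0.6, 0.3):
        e_transition = C_LOAD * vdd**2          # energy drawn per full swing [J]
        print(f"Vdd={vdd:.1f} V -> ~{e_transition * 1e18:.0f} aJ per transition")
    # The trade-off is speed: with the devices only weakly on, drive current
    # drops exponentially, so subthreshold logic switches far more slowly.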
> In a CMOS logic circuit, current does not flow from Vdd to ground as long as either the p-type or the n-type transistor is fully switched off.
Right, but how do you get the transistor fully switched off? Think about what happens during the time when it’s transitioning between on and off.
You can run the transistors from the previous stage in a different part of the curve, but that’s not an isolated effect. Everything that impacts switching speed and reduces the current flowing to turn the next gate on or off will also impact power consumption.
There might be some theoretical optimization where the transistors are driven differently, but at what cost in extra silicon? And how delicate is the balance between squeezing out a little more efficiency and operating so close to the edge that minor manufacturing variations become outsized problems?
Unfortunately not. This is analogous to writing a C program that relies on undefined behavior that happens to work on the specific architecture and CPU of your developer machine. It’s not portable.
The behavior could change from one manufacturing run to another. The behavior could disappear altogether in a future revision of the chip.
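To make the software analogy concrete, here is a rough Python equivalent of relying on implementation-specific behavior that the language spec never promises (CPython's small-integer caching; the chip case is analogous, not identical):

    # Works today on CPython because small integers are cached, but the
    # language spec makes no such promise; another interpreter (or a future
    # CPython release) is free to behave differently.
    a = 256
    b = 256
    print(a is b)         # True on current CPython -- an implementation detail

    c = 257
    d = 257
    print(c is d)         # may be True or False depending on how the code is run

    # The spec-guaranteed way to compare values:
    print(a == b, c == d)     # True True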
The behavior could even disappear if you change some other part of the design in a way that relocates the logic to a different set of cells on the chip. This was noted in the experiment, where certain behavior depended on the logic being placed in a specific location, which generated particular timings.
If you rely on anything other than the behavior defined by the specifications, you’re at risk of it breaking. This is a problem with arriving at empirical solutions via guess and check, too.
Ideally you’d do everything in simulation rather than on-chip where possible. The simulator would only function in ways supported by the specifications of the chip without allowing undefined behavior.
>The behavior could change from one manufacturing run to another. The behavior could disappear altogether in a future revision of the chip.
That's the overfitting they were referring to. Relying on the individual behaviour is the overfit. Running on multiple chips (at learning time) reduces the benefit of using an improvement that is specific to one chip.
You are correct that simulation is the better solution, but you have to do more than just limit the components to their specified operating ranges: you also have to introduce variances on the order of the specified production tolerances. If the simulator assumed that the behaviour of two similar components was absolutely identical, then within-tolerance manufacturing variations could be magnified.
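As a sketch of what "introduce variances" could look like in practice (the fitness function, parameter names, and tolerance figures below are hypothetical, just to show the shape of the idea):

    import random

    # Hypothetical sketch: evaluate each candidate design across many simulated
    # "chips", each perturbed within manufacturing tolerance, instead of against
    # a single idealized instance.
    def perturbed_params(nominal, tolerance=0.05, rng=random):
        """Draw each parameter within +/- tolerance of its nominal value."""
        return {k: v * rng.uniform(1 - tolerance, 1 + tolerance)
                for k, v in nominal.items()}

    def robust_fitness(candidate, simulate, nominal, n_samples=32):
        """Score a candidate by its worst case over sampled process variations,
        so designs that only work on one 'lucky' chip are penalized."""
        return min(simulate(candidate, perturbed_params(nominal))
                   for _ in range(n_samples))

    # Stand-in usage: a fake simulator that penalizes sensitivity to exact Vth.
    nominal = {"vth": 0.4, "cap": 1e-15}
    def fake_simulate(candidate, params):
        return 1.0 - candidate * abs(params["vth"] - nominal["vth"])

    print(robust_fitness(candidate=2.0, simulate=fake_simulate, nominal=nominal))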
If you simply buy multiple chips at once and train on them, you may still overfit because they are all likely from the same wafer. If you went to the effort of buying chips from multiple sources, they might all end up being the same hardware revision. And even if you got every existing hardware revision, there is no guarantee that the code will keep working on new revisions that have not come out yet.
There are also problems with chip aging, related circuitry (filtering capacitors age too, so the power supply gets worse over time), operating temperature, faster degradation from unusual operating conditions...
As long as all you look at is inputs and outputs, it is impossible not to overfit. For a robust system, you need to look at the official, published spec, because that's what the manufacturer guarantees and tests for - and AI cannot do this.
> For a robust system, you need to look at the official, published spec, because that's what the manufacturer guarantees and tests for - and AI cannot do this.
Why not? All you have to do is run it in a simulator.
The previous poster was probably thinking about very low power analog circuits or extremely slow digital circuits (like those used in wrist watches), where the on-state of the MOS transistors is in the subthreshold conduction region (while the off state is the same off state as in any other CMOS circuits, ensuring a static power consumption determined only by leakage).
Such circuits are useful for something powered by a battery that must have a lifetime measured in years, but they cannot operate at high speeds.
In other words, optimization algorithms in general are prone to overfitting. Fortunately there are techniques to deal with that. The thing is, once you find a solution that generalizes better across different chips, it probably won't be as small as the one found for a single chip.
I'm having trouble understanding. Chips with very high transistor counts tend to use saturation/turn-off almost exclusively. Very little is done in the linear region because it burns a lot of power and it's less predictable.
>Operating transistors outside the linear region (the saturated "on") on a billion+ scale
The whole point of switching transistors is that we _only_ operate them in the fully saturated on or totally off IV-curve region?
Subthreshold circuits are commercially available, just unpopular since all the tools are designed for regular circuits. And the overlap between people who understand semiconductors and people who can make computational tools is very limited, or it's just cheaper to throw people+process shrinks at the problem.
Oh christ you're right, they were actually being really funny. I was being super literal and imagined them being very excited about futuristic advances in giant isopod diagnosis and care