Operating transistors outside the linear region (the saturated "on") at a billion+ scale is something that we as engineers and physicists haven't quite figured out, and I am hoping that this changes in the future, especially with the advent of analog neuromorphic computing. The quadratic region (before the "on") is far more energy efficient, and the non-linearity could actually help with computing, not unlike the activation function in an NN.
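(As a toy illustration of what that nonlinearity looks like, here is a sketch of the textbook subthreshold/square-law MOSFET model; all parameter values below are made up for illustration, not taken from any real process.)

    import numpy as np

    # Toy long-channel NMOS model; illustrative values only, not a real process.
    VTH = 0.4     # threshold voltage [V]
    K   = 2e-4    # transconductance parameter [A/V^2]
    I0  = 1e-9    # subthreshold current scale [A]
    N   = 1.5     # subthreshold slope factor
    VT  = 0.026   # thermal voltage at room temperature [V]

    def drain_current(vgs):
        """Drain current vs. gate voltage (assuming the device stays in saturation)."""
        if vgs <= VTH:
            # Subthreshold ("before the on"): exponential, a soft turn-on.
            return I0 * np.exp((vgs - VTH) / (N * VT))
        # Above threshold: the square-law ("quadratic") region.
        return 0.5 * K * (vgs - VTH) ** 2

    for v in np.linspace(0.0, 1.0, 11):
        print(f"Vgs={v:.1f} V  Id={drain_current(v):.3e} A")
    # The resulting curve is a smooth, monotonic nonlinearity, qualitatively
    # similar in shape to softplus-style activations used in neural nets.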
Of course, modeling the nonlinear behavior is difficult. My prof would say that for every coefficient in SPICE's transistor models, someone dedicated an entire PhD to it (and there are a lot of these coefficients!).
I haven't been in touch with the field since I moved up the stack (numerical analysis/ML), so I would love to learn more if there has been recent progress.
The machine learning model didn’t discover something that humans didn’t know about. It exploited behavior specific to that particular chip, behavior that could not be reproduced in production, on other chips, or even in other configurations of the same chip.
That is a common problem with fully free-form machine learning solutions: they can stumble upon something that technically works on their training set, but that any human who understood the full system would never actually use, because of the other problems associated with it.
> The quadratic region (before the "on") is far more energy efficient
Take a look at the structure of something like CMOS and you’ll see why running transistors in anything other than “on” or “off” is definitely not energy efficient. In fact, the transitions are where the energy usage largely goes. We try to get through that transition period as rapidly as possible because minimal current flows when the transistors reach the on or off state.
There are other logic arrangements, but I don’t understand what you’re getting at by suggesting such circuits would be more efficient. Are you referring to the reduced gate charge?
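For reference, the usual first-order CMOS power picture looks something like this (the numbers below are invented round figures, just to show where the energy goes):

    # First-order CMOS power budget; round invented numbers, order-of-magnitude only.
    VDD     = 1.0     # supply voltage [V]
    C_NODE  = 1e-15   # switched capacitance per node [F] (~1 fF)
    F_CLK   = 1e9     # clock frequency [Hz]
    ALPHA   = 0.1     # activity factor (fraction of nodes toggling per cycle)
    N_NODES = 1e9     # billion-transistor-class chip
    I_LEAK  = 1e-9    # leakage per node while sitting fully on/off [A]

    # Dynamic (switching) power: roughly C*Vdd^2 of energy burned per toggle.
    p_dynamic = ALPHA * F_CLK * C_NODE * VDD**2 * N_NODES

    # Static power: leakage through nominally-off devices.
    p_static = I_LEAK * VDD * N_NODES

    print(f"dynamic ~ {p_dynamic:.0f} W, static ~ {p_static:.0f} W")
    # Nearly all of the budget goes to transitions; once a gate settles into
    # "on" or "off", only leakage flows.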
> Take a look at the structure of something like CMOS and you’ll see why running transistors in anything other than “on” or “off” is definitely not energy efficient. In fact, the transitions are where the energy usage largely goes. We try to get through that transition period as rapidly as possible because minimal current flows when the transistors reach the on or off state.
Sounds like you might be thinking of power electronic circuits rather than CMOS. In a CMOS logic circuit, current does not flow from Vdd to ground as long as either the p-type or the n-type transistor is fully switched off. The circuit under discussion was operated in subthreshold mode, in which one transistor in a complementary pair is partially switched on and the other is fully switched off. So it still only uses power during transitions, and the energy consumed in each transition is lower than in the normal mode because less voltage is switched at the transistor gate.
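A rough sketch of why the reduced swing helps (again, invented numbers): the energy per transition scales roughly with C·Vdd², so cutting the swing pays off quadratically.

    # Energy per output transition scales roughly as C * Vdd^2, so a reduced
    # (near/sub-threshold) swing pays off quadratically. Illustrative values only.
    C_LOAD = 1e-15   # load capacitance [F]

    for vdd in (1.0, 0.6, 0.3):
        e_transition = C_LOAD * vdd**2          # energy drawn per full swing [J]
        print(f"Vdd={vdd:.1f} V -> ~{e_transition * 1e18:.0f} aJ per transition")
    # The trade-off is speed: with the devices only weakly on, drive current
    # drops exponentially, so subthreshold logic switches far more slowly.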
> In a CMOS logic circuit, current does not flow from Vdd to ground as long as either the p-type or the n-type transistor is fully switched off.
Right, but how do you get the transistor fully switched off? Think about what happens during the time when it’s transitioning between on and off.
You can run the transistors from the previous stage in a different part of the curve, but that’s not an isolated effect. Everything that impacts switching speed and reduces the current flowing to turn the next gate on or off will also impact power consumption.
There might be some theoretical optimization where the transistors are driven differently, but at what cost in extra silicon? And how delicate is the balance between squeezing out a little more efficiency and operating so close to the edge that minor manufacturing variations become outsized problems?
Unfortunately not. This is analogous to writing a C program that relies on undefined behavior that happens to work on the specific architecture and CPU of your developer machine. It’s not portable.
The behavior could change from one manufacturing run to another. The behavior could disappear altogether in a future revision of the chip.
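To make the software analogy concrete, here is a rough Python equivalent of relying on implementation-specific behavior that the language spec never promises (CPython's small-integer caching; the chip case is analogous, not identical):

    # Works today on CPython because small integers are cached, but the
    # language spec makes no such promise; another interpreter (or a future
    # CPython release) is free to behave differently.
    a = 256
    b = 256
    print(a is b)         # True on current CPython -- an implementation detail

    c = 257
    d = 257
    print(c is d)         # may be True or False depending on how the code is run

    # The spec-guaranteed way to compare values:
    print(a == b, c == d)     # True True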
The behavior could even disappear if you change some other part of the design in a way that relocates the logic to a different set of cells on the chip. This was noted in the experiment, where certain behavior depended on the logic being placed in a specific location, which generated particular timings.
If you rely on anything other than the behavior defined by the specifications, you’re at risk of it breaking. This is a problem with arriving at empirical solutions via guess and check, too.
Ideally you’d do everything in simulation rather than on-chip where possible. The simulator would only function in ways supported by the specifications of the chip without allowing undefined behavior.
>The behavior could change from one manufacturing run to another. The behavior could disappear altogether in a future revision of the chip.
That's the overfitting they were referring to. Relying on the individual behaviour is the overfit. Running on multiple chips (at learning time) reduces the benefit of using an improvement that is specific to one chip.
You are correct that simulation is the better solution, but you have to do more than just limit the components to their specified operating ranges: you also have to introduce variances on the order of the specified production tolerances. If the simulator assumed that the behaviour of two similar components was absolutely identical, then within-tolerance manufacturing variations could be magnified.
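As a sketch of what "introduce variances" could look like in practice (the fitness function, parameter names, and tolerance figures below are hypothetical, just to show the shape of the idea):

    import random

    # Hypothetical sketch: evaluate each candidate design across many simulated
    # "chips", each perturbed within manufacturing tolerance, instead of against
    # a single idealized instance.
    def perturbed_params(nominal, tolerance=0.05, rng=random):
        """Draw each parameter within +/- tolerance of its nominal value."""
        return {k: v * rng.uniform(1 - tolerance, 1 + tolerance)
                for k, v in nominal.items()}

    def robust_fitness(candidate, simulate, nominal, n_samples=32):
        """Score a candidate by its worst case over sampled process variations,
        so designs that only work on one 'lucky' chip are penalized."""
        return min(simulate(candidate, perturbed_params(nominal))
                   for _ in range(n_samples))

    # Stand-in usage: a fake simulator that penalizes sensitivity to exact Vth.
    nominal = {"vth": 0.4, "cap": 1e-15}
    def fake_simulate(candidate, params):
        return 1.0 - candidate * abs(params["vth"] - nominal["vth"])

    print(robust_fitness(candidate=2.0, simulate=fake_simulate, nominal=nominal))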
If you simply buy multiple chips at once and train on them, you may still overfit because they are all likely from the same wafer. If you went to the effort of buying chips from multiple sources, they might all end up being the same hardware revision. And even if you got every existing hardware revision, there is no guarantee that the code will keep working on new revisions that have not come out yet.
There are also problems with chip aging, related circuitry (filtering capacitors age too, so the power supply gets worse over time), operating temperature, faster degradation from unusual operating conditions...
As long as all you look at is inputs and outputs, it is impossible not to overfit. For a robust system, you need to look at the official, published spec, because that's what the manufacturer guarantees and tests for - and AI cannot do this.
> For a robust system, you need to look at the official, published spec, because that's what the manufacturer guarantees and tests for - and AI cannot do this.
Why not? All you have to do is run it in a simulator.
The previous poster was probably thinking about very low power analog circuits or extremely slow digital circuits (like those used in wrist watches), where the on-state of the MOS transistors is in the subthreshold conduction region (while the off state is the same off state as in any other CMOS circuits, ensuring a static power consumption determined only by leakage).
Such circuits are useful for something powered by a battery that must have a lifetime measured in years, but they cannot operate at high speeds.
In other words, optimization algorithms in general are prone to overfitting. Fortunately there are techniques to deal with that. The thing is, once you find a solution that generalizes better across different chips, it probably won't be as small as the one found for a single chip.
I'm having trouble understanding. Chips with very high transistor counts tend to use saturation/turn-off almost exclusively. Very little is done in the linear region because it burns a lot of power and it's less predictable.
>Operating transistors outside the linear region (the saturated "on") on a billion+ scale
The whole point of switching transistors is that we _only_ operate them in the fully saturated on or totally off IV-curve region?
Subthreshold circuits are commercially available, just unpopular since all the tools are designed for regular circuits. And the overlap between people who understand semiconductors and people who can make computational tools is very limited, or it's just cheaper to throw people+process shrinks at the problem.
Oh christ you're right, they were actually being really funny. I was being super literal and imagined them being very excited about futuristic advances in giant isopod diagnosis and care