Yesterday I promised an overview of what power reduction techniques are out there. First, a disclosure: I was interim CEO of Envis for about a year and I’ve done some consulting for Nanochronous.
There are two kinds of power: dynamic and static. Dynamic power is used in switching signals inside the circuit. It is affected by operating frequency and voltage. Static power is dissipated whether the circuit is doing anything or not, and is mostly leakage power through transistors that are supposedly off but in fact leak a little current. This was not a problem above 100nm or so, but below that, transistors are not so much on and off as bright and dim.
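To put rough numbers on how frequency and voltage affect dynamic power, here is a small sketch using the usual first-order switching-power relation (activity factor times capacitance times voltage squared times frequency). The function name and all the numbers are made up for illustration; they do not describe any real chip.

```python
def dynamic_power(activity, capacitance_f, voltage_v, frequency_hz):
    """First-order switching power in watts: P = alpha * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


# Illustrative values only: 10% activity, 1 nF switched capacitance.
base = dynamic_power(0.1, 1e-9, 1.2, 500e6)    # 1.2 V at 500 MHz
low_v = dynamic_power(0.1, 1e-9, 1.0, 500e6)   # same clock, lower supply
half_f = dynamic_power(0.1, 1e-9, 1.2, 250e6)  # same supply, half the clock

print(f"base   : {base * 1e3:.1f} mW")
print(f"1.0 V  : {low_v / base:.2f}x of base")
print(f"250 MHz: {half_f / base:.2f}x of base")
```

Halving the clock halves the power, while dropping the supply helps quadratically, which is why so many of the techniques below are about lowering the voltage.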
The most common way to control leakage is to use special libraries that have two versions of each gate (or most gates). One is slow but has low leakage. One is fast but leaks since it never truly turns completely off. On the critical path the fast leaky gates are used; off the critical path the non-leaky slow gates are used. Synthesis tools will choose the cells automatically based on the timing constraints.
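As a toy illustration of the idea (and certainly not how any real synthesis tool works), the sketch below starts with every gate fast and swaps a gate to the slow, low-leakage version only if every path through it still meets the clock period. The delay and leakage numbers, the path lists and the function name are all hypothetical.

```python
FAST_DELAY, SLOW_DELAY = 1.0, 1.5   # hypothetical delay per gate
FAST_LEAK, SLOW_LEAK = 10.0, 1.0    # hypothetical leakage per gate


def assign_cells(paths, period):
    """Greedy toy: make each gate slow unless that breaks timing on a path."""
    gates = sorted({g for path in paths for g in path})
    choice = {g: "fast" for g in gates}

    def path_delay(path):
        return sum(SLOW_DELAY if choice[g] == "slow" else FAST_DELAY
                   for g in path)

    for g in gates:
        choice[g] = "slow"
        # Revert if any path containing this gate now misses the period.
        if any(path_delay(p) > period for p in paths if g in p):
            choice[g] = "fast"
    return choice


# One path with no slack (the critical path) and one with plenty.
paths = [["a", "b", "c", "d"], ["e", "f"]]
choice = assign_cells(paths, period=4.0)
leakage = sum(SLOW_LEAK if c == "slow" else FAST_LEAK for c in choice.values())
print(choice, leakage)
```

The gates on the four-gate path have no slack and stay fast; the two gates on the short path both end up slow.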
Taking this technique a little further was Blaze DFM whose tool would make tiny adjustments to the mask data for transistors off the critical path, lowering their performance but making them leak a lot less. TSMC licensed this technology and Tela announced yesterday that it was acquiring them.
The most common dynamic technique is clock gating. The old rules used to be to do purely synchronous design, and clock every flop on every clock cycle. If a register was only loaded with a new value sometimes, then a multiplexor was added to recirculate the old value back to the input so that when the flop was clocked it would re-latch the same value as it was already holding. The simplest form of clock gating is to replace those multiplexors with a clock gating element (CGE) that inhibits clocking the flop when the value doesn’t change. This doesn’t win you anything on a single flop, but if it is, for example, a 32-bit register then 32 muxes can be replaced with a single CGE, saving on area and, because the effective clock rate of the register is reduced, power. By clever circuit analysis it is possible to find more complex circumstances under which clocking of registers can be suppressed either combinationally (the value really wasn’t going to change) or sequentially (the value might have been going to change but no output from the circuit would have noticed the change). All the synthesis tools, most notably Synopsys Power Compiler, do the mux replacement. Calypto and Envis are two companies automating the more extensive gating approaches.
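The mux replacement can be mimicked in a few lines of software. This is only a behavioral sketch with made-up names, not RTL: it models a register that is loaded one cycle in four, once with a recirculating mux and once with a CGE, and counts how often the register actually gets clocked.

```python
def run_register(cycles, gated):
    """cycles: list of (enable, data). Returns (value per cycle, clocked cycles)."""
    value, clocked, trace = 0, 0, []
    for enable, data in cycles:
        if gated:
            # CGE: the clock only reaches the flops when a load is requested.
            if enable:
                value = data
                clocked += 1
        else:
            # Mux: the flops are clocked every cycle and re-latch the old value.
            value = data if enable else value
            clocked += 1
        trace.append(value)
    return trace, clocked


# Load a new value one cycle in four, for 100 cycles.
cycles = [(i % 4 == 0, i) for i in range(100)]
mux_trace, mux_clocked = run_register(cycles, gated=False)
cge_trace, cge_clocked = run_register(cycles, gated=True)
print(mux_trace == cge_trace, mux_clocked, cge_clocked)
```

The register holds exactly the same values either way, but the gated version sees a quarter of the clock edges.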
Next there is a whole spectrum of techniques that depend on voltage islands. A voltage island is an area of the chip with its own power supply. Obviously this has a major impact on physical design since the island must correspond to a particular region of the die. The first thing that can be done with voltage islands is simply to power them with different supply voltages. Those on a lower voltage will have lower performance, of course, but they will also consume lower power, both static and dynamic.
Power down is another common technique. Voltage islands which are not being used are turned off completely by turning off their power supply. When you are not making a call on your cell-phone, the gates used to process transmit and receive data are not required and can be turned off. This needs to be done carefully, or else the current inrush when the island is turned back on can cause the voltage to drop elsewhere on the chip. Typically this means that the island must be powered up slowly using small transistors and then finally brought up to operational level by turning on much larger transistors. Powering down blocks is always done under software control, but the powered-down block needs to be isolated from the rest of the circuit so that its output signals do not drift, cause crowbar current and waste power elsewhere. There are no tools for automatically finding areas to power down. The software, not the netlist, would be the place to look. The CPF and UPF formats have extensive support for power down.
As we get deeper below 100nm, the variability of processes gets much wider. This means that the typical chip and the worst case chip are getting further and further apart and so the penalty of designing to worst case design, given that most chips are typical by definition, gets larger and larger. Adaptive voltage scaling is a way to handle this. Use on-chip circuitry to measure the actual performance, and then lower the voltage (saving both dynamic and static power) just the right amount that the chip still runs at the correct speed.
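The control loop behind adaptive voltage scaling can be sketched as follows. The delay model here is a hypothetical stand-in for whatever on-chip monitor does the measuring, and the function names, voltages and speed factors are invented for illustration. It also assumes the starting voltage already meets the target.

```python
def toy_delay_ns(v_mv, speed_factor):
    """Hypothetical monitor: delay falls as the supply rises above ~300 mV."""
    return speed_factor * 1000.0 / (v_mv - 300)


def lowest_safe_voltage(measure, target_ns, v_start_mv, v_min_mv, step_mv=10):
    """Step the supply down while the measured delay still meets the target."""
    v = v_start_mv
    while v - step_mv >= v_min_mv and measure(v - step_mv) <= target_ns:
        v -= step_mv
    return v


# A typical chip and a slow-corner chip, both designed for 1200 mV worst case.
typical = lowest_safe_voltage(lambda v: toy_delay_ns(v, 1.0), 2.0, 1200, 700)
slow = lowest_safe_voltage(lambda v: toy_delay_ns(v, 1.5), 2.0, 1200, 700)
print(typical, slow)  # 800 1050
```

In this toy model the typical part settles at a much lower supply than the slow part, and both are below the worst-case design voltage.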
One adaptive solution involving off-chip voltage regulators is National Powerwise. They have put this in the public domain since they make their money selling the off-chip voltage regulators. Nanochronous builds copies of critical paths and uses these to adapt the clocking to the environment (process corner, voltage, temperature) so that the chip will automatically consume less power but still run to spec as the voltage is lowered. Elastix does something similar, adapting the performance of the chip as the voltage is altered, while taking the process corner into account. Handshake removes the clock completely and runs asynchronously with whatever performance is appropriate given the power supply voltage. Nanochronous is in Greece, Elastix is in Spain, and Handshake is in the Netherlands; it must be something in the wine.
The next approach is to vary the voltage to islands while the chip is being used, rather than having fixed, but different, power supply voltages for each island. When the voltage is changed under software control it is known as dynamic voltage and frequency scaling. This is a technique that is talked about a lot and used only a little, as far as I can tell. The idea is that if your microprocessor (or whatever) is not doing anything very important, why not run it slowly? And when it is in heavy computation mode, run it flat out. Doing this is tricky, though. To slow it down the frequency must be lowered, and then (and only then) the voltage can be lowered. To speed up, the voltage must be raised, which takes time if it is not going to create a lot of power-supply noise, and then the frequency can be bumped up.
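The ordering rule is easy to get backwards, so here is a minimal sketch of it. The Domain class and its methods are hypothetical stand-ins for whatever regulator and clock-generator interface a real chip exposes; real code would also wait for the supply to settle before raising the frequency.

```python
class Domain:
    """Hypothetical voltage island that logs the order of changes."""

    def __init__(self, freq_mhz, volt_mv):
        self.freq_mhz, self.volt_mv, self.log = freq_mhz, volt_mv, []

    def set_freq(self, freq_mhz):
        self.freq_mhz = freq_mhz
        self.log.append(("freq", freq_mhz))

    def set_volt(self, volt_mv):
        self.volt_mv = volt_mv
        self.log.append(("volt", volt_mv))


def change_operating_point(domain, new_freq_mhz, new_volt_mv):
    if new_freq_mhz < domain.freq_mhz:
        # Slowing down: drop the frequency first, then the voltage.
        domain.set_freq(new_freq_mhz)
        domain.set_volt(new_volt_mv)
    else:
        # Speeding up: raise the voltage first, then the frequency.
        domain.set_volt(new_volt_mv)
        domain.set_freq(new_freq_mhz)


cpu = Domain(freq_mhz=800, volt_mv=1200)
change_operating_point(cpu, 200, 900)    # idle: slow down
change_operating_point(cpu, 800, 1200)   # heavy computation: flat out
print(cpu.log)
```

At no point in the log is the frequency higher than the current voltage can support, which is the whole point of the ordering.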
A lot of power gets consumed in the clock tree itself: certainly 30% and sometimes 50% of the total power. Azuro works on laying this out and placing the gates more sensibly than is typically done by the clock tree synthesis built into every place and route tool.
Cyclos has another approach to reducing the 30% consumed in the clock. They think that clocks are the wrong shape, being square waves. If the clock was a sine wave then it could be resonant if we added some inductors, and would not consume power in the clock tree. That would be nice but the price is that every clocked element needs to be adapted so it can work with a sinusoidal clock rather than the usual rising-edge, falling-edge square wave we are all used to.
No list of all companies in the power area would be complete without Sequence, some of whose ancestral companies have been around for over 15 years. Their primary focus is on measuring power, with or without vectors, at netlist or RTL level. They are pretty much the standard tool for this.
There may be other companies out there focused on power reduction. As I said yesterday, “power is the new timing” and so it is a focus of a lot of innovation.