## 20.4 A 123-Phase DC-DC Converter-Ring with Fast-DVS for Microprocessors

Yan Lu<sup>1,2</sup>, Junmin Jiang<sup>1</sup>, Wing-Hung Ki<sup>1</sup>, C. Patrick Yue<sup>1</sup>, Sai-Weng Sin<sup>2</sup>, Seng-Pan U<sup>2,3</sup>, R. P. Martins<sup>2,4</sup>

<sup>1</sup>Hong Kong University of Science and Technology, Hong Kong, China, <sup>2</sup>University of Macau, Macao, China, <sup>3</sup>Synopsys, Macao, China, <sup>4</sup>Instituto Superior Tecnico, Universidade de Lisboa, Portugal

Inspired by St. Peter's Square in Vatican City, a fully integrated step-down switched-capacitor DC-DC converter-ring with more than 100 phases and fast dynamic voltage scaling (DVS) is designed for microprocessors in portable and wearable devices. As shown in Fig. 20.4.1, this symmetrical ring-shaped converter surrounds its load and supplies the on-chip power grid, so that a high-quality power supply can be accessed at any point along the chip edges. There are 30 phases on the top edge and 31 phases on each of the other three edges, for 123 phases in total. The phase number and unit-cell dimensions of this architecture can easily be adjusted to fit the floor plan of the load. The pads of the converter-ring are placed at the corners, so they do not interfere with the pads of the load. Moreover, the proposed V<sub>DD</sub>-controlled oscillator (V<sub>DD</sub>CO), whose frequency is set by varying its supply voltage, exposes a hitherto unexplored feature of the multiphase DC-DC architecture: the control-loop unity-gain frequency (UGF) can be designed to be higher than the switching frequency.
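The per-edge split can be checked with a short sketch. The helper `distribute_phases` below is hypothetical (it is not part of the design flow described here): it spreads N phases as evenly as possible over the four edges and, assuming the remainder is assigned to the right, bottom, and left edges, it reproduces the reported 30/31/31/31 allocation.

```python
def distribute_phases(n_phases, edges=("right", "bottom", "left", "top")):
    """Spread n_phases over the chip edges as evenly as possible.

    Hypothetical helper: the first `extra` edges in `edges` each receive one
    extra phase, so with the default ordering the top edge gets the smaller count.
    """
    base, extra = divmod(n_phases, len(edges))
    return {edge: base + (1 if i < extra else 0) for i, edge in enumerate(edges)}


allocation = distribute_phases(123)
print(allocation)                # {'right': 31, 'bottom': 31, 'left': 31, 'top': 30}
print(sum(allocation.values()))  # 123
```

Changing `n_phases` (or the edge ordering) is one simple way to explore how the ring could be re-sized for a different floor plan.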

Switched-capacitor power converters (SCPCs) are preferred for full integration because of the high capacitance density available in nanometer processes. A multiphase architecture for ripple reduction can easily be built into an SCPC with little power and area overhead [1-3]. Moreover, SCPCs have the potential to respond faster than inductive DC-DC converters, because an SCPC can be approximated as a first-order system. An inductive converter has a second-order LC filter, and the inductor current has to change before $V_{OUT}$ can change, resulting in a 90° phase delay. For on-chip power delivery, a large $I_{DC}$ introduces an IR drop across the parasitic resistance $R_P$ of the $V_{DD}$ network, and fast transient currents cause $V_{DD}$ variation across the parasitic inductance $L_P$ of the power buses and bond wires. The overall deviation, $\Delta V_{DD} \approx I_{DC}R_P + L_P \cdot di/dt$, causes clock jitter and affects the logic delay of the load [4]. Point-of-load power converters can reduce both the input-current and output-voltage ripples. Low-dropout regulators (LDOs) are commonly used to further suppress voltage ripple [5]. However, an LDO introduces energy loss and suffers from a large in-rush input current during fast load transients [6]. In the proposed SCPC, the in-rush current is reduced by the distributed 123-phase configuration, and also by operating from a higher $V_{IN}$ (and hence a lower $I_{IN}$) than an LDO. Note that increasing the phase number not only reduces the ripple, but also allows the control loop to respond at every fraction of the switching period $T$, which is $T/123$ in our case. As such, the discrete-time SCPC approaches a pseudo-continuous-time power stage.
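The two quantities in this paragraph can be evaluated with a minimal sketch. `supply_deviation` implements the first-order expression $\Delta V_{DD} \approx I_{DC}R_P + L_P \cdot di/dt$ from the text; the parasitic values in the usage lines are hypothetical and chosen only for illustration. `phase_interval` gives $T/N$, assuming the N phases are evenly interleaved over one switching period; 40MHz is the switching frequency at peak efficiency listed in Fig. 20.4.6.

```python
def supply_deviation(i_dc, r_p, l_p, di_dt):
    """First-order V_DD deviation: IR drop on R_P plus the L_P * di/dt term (volts)."""
    return i_dc * r_p + l_p * di_dt


def phase_interval(f_sw, n_phases):
    """Time between successive phase events, T/N, for N evenly interleaved phases."""
    period = 1.0 / f_sw
    return period / n_phases


# Hypothetical parasitics: R_P = 0.1 ohm, L_P = 1 nH; a 25 mA step with a 100 ps edge.
dv = supply_deviation(i_dc=0.1, r_p=0.1, l_p=1e-9, di_dt=25e-3 / 100e-12)
print(f"Delta V_DD ~ {dv * 1e3:.0f} mV")  # about 260 mV, dominated by L_P * di/dt

# T/123 at the 40 MHz switching frequency reported at peak efficiency
print(f"T/N = {phase_interval(40e6, 123) * 1e12:.0f} ps")  # about 203 ps
```

With these illustrative numbers the inductive term is 25 times the IR term, which is why fast edges, rather than the DC current, dominate the supply deviation.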

In the conventional pulse-frequency-modulation (PFM) topology, which uses a centralized current-starved (CS) voltage-controlled oscillator (VCO) and distributes its clock phases, the phase number is limited by phase matching and routing, and the dominant pole is usually set at the $V_{CTL}$ node. In the proposed topology, the error amplifier (EA) with an NMOS source-follower buffer stage drives the V<sub>DD</sub>CO, which is distributed and localized to every phase, so the matching and routing problems are avoided. $V_{DDC}$ is now a low-impedance node, so its associated pole $p_c$ is located at high frequency; the output pole $p_o$ becomes the dominant pole, and the bandwidth is extended. The simulation results in Fig. 20.4.2 show that the output frequency $F_{OUT}$ of the V<sub>DD</sub>CO is approximately linearly proportional to $V_{DDC}$; hence, the $V_{DDC}$ node is automatically adaptively biased by driving the V<sub>DD</sub>CO, which improves stability. The V<sub>DD</sub>CO is also more power-efficient than the CS VCO. The power consumption of a ring oscillator is $C \cdot V_{DD}^2 \cdot F_{OUT}$. To obtain the same maximum $F_{OUT}$ with the same number of inverter stages, the CS VCO needs a higher $V_{DD}$ than the V<sub>DD</sub>CO, because each of its stages has two additional current sources and more parasitic capacitance. At lower $F_{OUT}$, the power consumption of the CS VCO decreases only linearly because its $V_{DD}$ is fixed, whereas that of the V<sub>DD</sub>CO decreases cubically, because its supply $V_{DDC}$ scales with $F_{OUT}$.
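The linear-versus-cubic scaling argument can be illustrated with a minimal sketch of $P = C \cdot V_{DD}^2 \cdot F_{OUT}$. It assumes an ideal linear relation $F_{OUT} = k \cdot V_{DDC}$ for the V<sub>DD</sub>CO, matching the approximately linear behavior described above; the capacitance, supply, and gain `k` values are hypothetical, and only the power ratios relative to full frequency are meaningful.

```python
def cs_vco_power(c_sw, vdd_fixed, f_out):
    """Current-starved VCO: supply is fixed, so P = C * VDD^2 * F scales linearly with F."""
    return c_sw * vdd_fixed ** 2 * f_out


def vddco_power(c_sw, k_hz_per_volt, f_out):
    """V_DD-controlled oscillator: assumes F_OUT = k * V_DDC (linear), so P scales as F^3."""
    v_ddc = f_out / k_hz_per_volt
    return c_sw * v_ddc ** 2 * f_out


# Hypothetical values: 100 fF switched capacitance, 1.2 V fixed supply for the CS VCO,
# and k = 50 MHz/V for the V_DDCO. Only the ratios printed below matter.
C_SW, VDD_CS, K = 100e-15, 1.2, 50e6
F_MAX = 50e6
for frac in (1.0, 0.5, 0.1):
    f = frac * F_MAX
    cs_ratio = cs_vco_power(C_SW, VDD_CS, f) / cs_vco_power(C_SW, VDD_CS, F_MAX)
    vc_ratio = vddco_power(C_SW, K, f) / vddco_power(C_SW, K, F_MAX)
    print(f"F = {frac:.1f} F_max: CS VCO {cs_ratio:.3f}, V_DDCO {vc_ratio:.3f}")
```

At half frequency the CS VCO still draws half its full-speed power, while the V<sub>DD</sub>CO draws one eighth; at one tenth of full frequency the ratio is 0.1 versus 0.001.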

Figure 20.4.2 shows the schematics of the EA, the conversion-ratio (CR) selector, the $V_{DDL}$ and $V_{SSH}$ regulators for the internal rails, and the level shifter (LS). $M_5$ and $M_6$ of the EA form a $g_m$-boosting stage [7] that improves the DC gain without introducing any low-frequency poles; all internal nodes of the EA have pole frequencies in the GHz range. Although $V_{IN}$ ranges from 1.6V to 2.2V, all transistors in this design are low-voltage (LV) devices. The internal rails $V_{DDL} = 2V_{IN}/3$ and $V_{SSH} = V_{IN}/3$ are generated by replica regulators that consume 7.5µA each. Two hysteretic comparators with built-in offset compare $V_{REF}$ with $2V_{IN}/3$ and $V_{IN}/2$ to determine the CR of the (N-1)/N SCPC, where N = 2, 3, 4. Every phase needs one level shifter. The proposed LS converts a signal from the [$V_{DDL}$, GND] input domain to both the [$V_{IN}$, $V_{SSH}$] and [$V_{DDL}$, GND] output domains simultaneously, in a single conversion. Cascoding M<sub>13</sub> through M<sub>16</sub> with gate biases of $V_{DDL}$ and $V_{SSH}$ prevents device breakdown.
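The CR selection logic can be modeled behaviorally with two hysteretic comparators, one per threshold. This is a minimal sketch, not the transistor-level circuit: the class names are hypothetical, and the offset magnitude (50mV), its direction (lowering each threshold so that a higher CR is chosen before $V_{REF}$ reaches the ideal output of the lower CR), and the 20mV hysteresis width are all assumptions.

```python
class HysteresisComparator:
    """Comparator whose output flips only when the input crosses threshold +/- hyst/2."""

    def __init__(self, hyst):
        self.hyst = hyst
        self.high = False  # True once the input has been seen above the threshold

    def update(self, v_in, v_th):
        if self.high and v_in < v_th - self.hyst / 2:
            self.high = False
        elif not self.high and v_in > v_th + self.hyst / 2:
            self.high = True
        return self.high


class ConversionRatioSelector:
    """Hypothetical behavioral model of the CR selector for the (N-1)/N SCPC.

    offset (assumed): each threshold is lowered by this amount so that the
    next-higher CR is selected before V_REF reaches the lower CR's ideal output.
    """

    def __init__(self, offset=0.05, hyst=0.02):
        self.offset = offset
        self.cmp_half = HysteresisComparator(hyst)        # V_REF vs V_IN/2
        self.cmp_two_thirds = HysteresisComparator(hyst)  # V_REF vs 2V_IN/3

    def select(self, v_ref, v_in):
        above_half = self.cmp_half.update(v_ref, v_in / 2 - self.offset)
        above_two_thirds = self.cmp_two_thirds.update(v_ref, 2 * v_in / 3 - self.offset)
        if above_two_thirds:
            return 4  # CR = 3/4
        if above_half:
            return 3  # CR = 2/3
        return 2      # CR = 1/2


sel = ConversionRatioSelector()
for v_ref in (0.8, 1.1, 1.3, 1.28, 1.27):
    n = sel.select(v_ref, v_in=2.0)
    print(f"V_REF = {v_ref:.2f} V -> N = {n}, CR = {n - 1}/{n}")
# N sequence: 2, 3, 4, 4, 3
```

With V<sub>IN</sub> = 2V, a 1.1V reference selects CR = 2/3, the ratio used in the measurements. The 1.28V step shows the hysteresis holding N = 4 until $V_{REF}$ falls below the lower trip point.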

Figure 20.4.3 shows the schematic of the (N-1)/N SCPC unit cell. The clock phase $\varphi_k$ comes from the previous unit cell and, after one inverter delay, becomes $\varphi_{k+1}$ in the current cell. For CR = 1/2, only C<sub>F1</sub> is used as a flying capacitor, while C<sub>F2</sub> and C<sub>F3</sub> are connected between V<sub>IN</sub> and V<sub>OUT</sub> and serve as C<sub>L</sub>. For CR = 2/3, only C<sub>F3</sub> is used as C<sub>L</sub>, and for CR = 3/4, all of C<sub>F1,2,3</sub> are used as flying capacitors. With this topology, the output capability is similar for every CR. Each of C<sub>F1,2,3</sub> is 13pF and is constructed by stacking MOS, MOM, and MIM capacitors; an additional 2nF C<sub>L</sub> is integrated to mimic the load capacitance of the microprocessor. To realize non-overlapping timing and thus eliminate shoot-through and reverse currents, 3-transistor (3T) inverters drive switches S<sub>1</sub> through S<sub>12</sub>. The 3T inverters that drive the PMOS (NMOS) switches contain two NMOS (PMOS) devices so that the switches turn on slowly. The 3T inverters also control the turn-on sequence of the switches by sensing the voltages on the C<sub>F1,2,3</sub> plates.
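The capacitor roles described above, and the ideal no-load output $CR \cdot V_{IN}$ of each ratio, can be summarized in a small sketch. For CR = 2/3 the roles of C<sub>F1</sub> and C<sub>F2</sub> are inferred from the statement that only C<sub>F3</sub> serves as C<sub>L</sub>. The efficiency bound $V_{OUT}/(CR \cdot V_{IN})$ is the standard first-order limit set by charge-sharing loss in SC converters, not a result reported in this paper; it ignores switching and driver losses.

```python
from fractions import Fraction

# Capacitor roles per conversion ratio, as described for the unit cell.
# For CR = 2/3 only C_F3 serves as C_L, so C_F1 and C_F2 are taken as flying capacitors.
CAP_ROLES = {
    Fraction(1, 2): {"flying": ["CF1"], "load": ["CF2", "CF3"]},
    Fraction(2, 3): {"flying": ["CF1", "CF2"], "load": ["CF3"]},
    Fraction(3, 4): {"flying": ["CF1", "CF2", "CF3"], "load": []},
}


def ideal_output(v_in, cr):
    """No-load output of an ideal SC converter: CR * V_IN."""
    return float(cr) * v_in


def intrinsic_efficiency_bound(v_in, v_out, cr):
    """First-order upper bound on SC efficiency from charge-sharing loss: V_OUT / (CR * V_IN)."""
    return v_out / ideal_output(v_in, cr)


for cr, roles in CAP_ROLES.items():
    print(f"CR = {cr}: flying {roles['flying']}, ideal V_OUT at V_IN = 2 V is "
          f"{ideal_output(2.0, cr):.3f} V")

# Operating point of the load-transient measurement: CR = 2/3, V_IN = 2 V, V_OUT = 1.1 V
print(f"{intrinsic_efficiency_bound(2.0, 1.1, Fraction(2, 3)):.3f}")  # 0.825
```

The ideal outputs at V<sub>IN</sub> = 2V are 1.000V, 1.333V, and 1.500V, so regulating to 1.1V with CR = 2/3 caps the efficiency at 82.5% before any other losses are counted.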

Figure 20.4.4 shows the measured load-transient and reference-tracking waveforms. Voltage positioning [6] is employed to reduce the peak-to-peak V<sub>OUT</sub> variation and to relax the DC loop-gain requirement. The $\Delta V_{OUT}$ is 58mV for an on-chip load-current step of 4×25mA (one 25mA load at each chip corner) with 100ps edge times, CR = 2/3, V<sub>IN</sub> = 2V, and V<sub>OUT</sub> = 1.1V. The loop response time is around 3ns without an extra 1GHz clock, corresponding to a UGF of more than 100MHz, while the V<sub>DD</sub>CO frequency ranges from 250kHz to 50MHz over the entire load range. The reference up- and down-tracking speeds are 2.5V/µs and 0.9V/µs, respectively, making the converter suitable for fast DVS.
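The reported tracking speeds translate into DVS transition times with simple arithmetic. The sketch below assumes a constant (linear) slew over the whole transition, which is a simplification of the measured waveforms, and uses the 0.6V to 1.2V output range listed in Fig. 20.4.6.

```python
UP_SLEW_V_PER_S = 2.5e6    # reported up-tracking speed, 2.5 V/us
DOWN_SLEW_V_PER_S = 0.9e6  # reported down-tracking speed, 0.9 V/us


def dvs_transition_time(v_start, v_end):
    """Time for V_OUT to move between two levels, assuming a constant (linear) slew."""
    dv = v_end - v_start
    slew = UP_SLEW_V_PER_S if dv > 0 else DOWN_SLEW_V_PER_S
    return abs(dv) / slew


# Full 0.6-1.2 V output range listed in Fig. 20.4.6
print(f"up:   {dvs_transition_time(0.6, 1.2) * 1e9:.0f} ns")  # 240 ns
print(f"down: {dvs_transition_time(1.2, 0.6) * 1e9:.0f} ns")  # about 667 ns
```

Under this assumption a full-range step up completes in about 240ns, and the slower down-tracking makes the reverse step roughly 2.8 times longer.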

Measured efficiencies for different load and CR conditions are shown in Fig. 20.4.5. The peak efficiency is 78.3%, and the efficiency at a power density of 0.13W/mm<sup>2</sup> is 75.8%. Figure 20.4.6 compares this work with state-of-the-art bulk-CMOS SCPC designs. Besides the fast transient response, the measured ripple voltage is only 3.5mV at full load. The chip micrograph is shown in Fig. 20.4.7. The converter-ring is fabricated in a 65nm LL CMOS process and occupies 0.84mm<sup>2</sup>, excluding the load and the test pads. The unit-cell area is 50×120µm<sup>2</sup>. The shaded chip area is the work of others. Enabled by the 123-phase ring-shaped topology and the V<sub>DD</sub>-controlled oscillator, this work designs the control-loop UGF of an SCPC to be higher than its switching frequency.
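The area and power-density figures can be cross-checked with a few lines of arithmetic. This sketch assumes that power density is referred to the 0.84mm<sup>2</sup> converter area; under that assumption the 152mW maximum output in Fig. 20.4.6 corresponds to about 181mW/mm<sup>2</sup>, in line with the listed 180mW/mm<sup>2</sup>. The total unit-cell area is computed only for reference; how the remaining area is used is not stated here.

```python
CONVERTER_AREA_MM2 = 0.84  # converter-ring area, excluding load and test pads
UNIT_CELL_UM = (50, 120)   # unit-cell dimensions in micrometres
N_PHASES = 123


def output_power_w(density_w_per_mm2, area_mm2=CONVERTER_AREA_MM2):
    """Output power implied by a power density over the assumed reference area."""
    return density_w_per_mm2 * area_mm2


def power_density_w_per_mm2(p_out_w, area_mm2=CONVERTER_AREA_MM2):
    """Power density implied by an output power over the assumed reference area."""
    return p_out_w / area_mm2


unit_cell_mm2 = (UNIT_CELL_UM[0] * 1e-3) * (UNIT_CELL_UM[1] * 1e-3)
print(f"P_OUT at 0.13 W/mm^2: {output_power_w(0.13) * 1e3:.1f} mW")                    # 109.2 mW
print(f"Density at 152 mW:    {power_density_w_per_mm2(0.152) * 1e3:.0f} mW/mm^2")     # 181
print(f"123 unit cells:       {N_PHASES * unit_cell_mm2:.3f} mm^2")                    # 0.738
```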

## References:

[1] G. Villar Piqué, "A 41-Phase Switched-Capacitor Power Converter with 3.8mV Output Ripple and 81% Efficiency in Baseline 90nm CMOS," *ISSCC Dig. Tech. Papers*, pp. 98–100, Feb. 2012.

[2] H.-P. Le, *et al.*, "A Sub-ns Response Fully Integrated Battery-Connected Switched-Capacitor Voltage Regulator Delivering 0.19W/mm<sup>2</sup> at 73% Efficiency," *ISSCC Dig. Tech. Papers*, pp. 372–373, Feb. 2013.

[3] R. Jain, *et al.*, "A 0.45–1 V Fully-Integrated Distributed Switched Capacitor DC-DC Converter with High Density MIM Capacitor in 22 nm Tri-Gate CMOS," *IEEE J. Solid-State Circuits*, vol. 49, no. 4, pp. 917–927, Apr. 2014.

[4] K. A. Bowman, *et al.*, "A 22 nm All-Digital Dynamically Adaptive Clock Distribution for Supply Voltage Droop Tolerance," *IEEE J. Solid-State Circuits*, vol. 48, no. 4, pp. 907–916, Apr. 2013.

[5] Y. Lu, W.-H. Ki, and C. P. Yue, "A 0.65ns-Response-Time 3.01ps FOM Fully-Integrated Low-Dropout Regulator with Full-Spectrum Power-Supply-Rejection for Wideband Communication Systems," *ISSCC Dig. Tech. Papers*, pp. 306–307, Feb. 2014.

[6] P. Hazucha, *et al.*, "Area-Efficient Linear Regulator with Ultra-Fast Load Regulation," *IEEE J. Solid-State Circuits*, vol. 40, no. 4, pp. 933–940, Apr. 2005.

[7] K. N. Leung and P. K. T. Mok, "A Capacitor-Free CMOS Low-Dropout Regulator with Damping-Factor-Control Frequency Compensation," *IEEE J. Solid-State Circuits*, vol. 38, no. 10, pp. 1691–1702, Oct. 2003.



Figure 20.4.1: Conceptual layout of the 100+-phase converter-ring for microprocessors, and the proposed system architecture with the V<sub>DD</sub>CO and the (N-1)/N SCPC.









Figure 20.4.2: Schematics of the error amplifier,  $V_{\text{DD}}CO$ , conversion ratio selector,  $V_{\text{DDL}}$  and  $V_{\text{SSH}}$  regulators, and level shifter.





| Publication                                   | [1] ISSCC '12       | [2] ISSCC '13        | [3] JSSC '14         | This work            |
|-----------------------------------------------|---------------------|----------------------|----------------------|----------------------|
| Process                                       | 90nm                | 65nm                 | 22nm Tri-gate        | 65nm                 |
| Conv. ratios                                  | 1/2, 2/3            | 1/3, 2/5             | 1/2, 2/3, 4/5, 1     | 1/2, 2/3, 3/4        |
| Phase no.                                     | 41                  | 18                   | 8                    | 123                  |
| V<sub>IN</sub>                                | 1.2-2V              | 3-4V                 | 1.225V               | 1.6-2.2V             |
| V<sub>OUT</sub>                               | 0.7V                | 1V                   | 0.45-1V              | 0.6-1.2V             |
| f<sub>SW</sub> @ η<sub>Peak</sub>             | 50MHz               | N/A                  | 250MHz               | 40MHz                |
| η<sub>Peak</sub>                              | 81%                 | 74.3%                | 82.7%                | 78.3%                |
| Power density                                 | 39mW/mm<sup>2</sup> | 190mW/mm<sup>2</sup> | 250mW/mm<sup>2</sup> | 180mW/mm<sup>2</sup> |
| P<sub>OUT,max</sub>                           | 10mW                | 162mW                | 25mW                 | 152mW                |
| Ripple @ η<sub>Peak</sub>                     | 3.8mV               | N/A                  | >43mV                | 3.5mV                |
| ΔV<sub>OUT</sub> @ T<sub>Edge</sub>           | N/A                 | 76mV @ 50ps*         | N/A                  | 58mV @ 100ps         |
| DVS speed                                     | N/A                 | N/A                  | N/A                  | 2.5V/µs              |

\*With an additional 3GHz clock for transient response.

Figure 20.4.6: Comparison with state-of-the-art bulk-CMOS SCPC works.

## **ISSCC 2015 PAPER CONTINUATIONS**

