## 20.4 An Output-Capacitor-Free Analog-Assisted Digital Low-Dropout Regulator with Tri-Loop Control

Mo Huang<sup>1,2</sup>, Yan Lu<sup>1</sup>, Seng-Pan U<sup>1,3</sup>, Rui P. Martins<sup>1,4</sup>

<sup>1</sup>University of Macau, Macau, China <sup>2</sup>now with South China University of Technology, Guangzhou, China <sup>3</sup>Synopsys Macau Ltd, Macao, China <sup>4</sup>Instituto Superior Tecnico, Universidade de Lisboa, Portugal

Low-dropout regulators (LDOs) are widely distributed in SoC designs to supply individual voltage domains, and digital LDOs (DLDOs) are favored for their low-voltage operation and process scalability. However, because many SoCs generate load-current ($I_{LOAD}$) variations at the sub-A/ns level, voltage regulators require a large, area-consuming output capacitor ($C_{OUT}$) to maintain the output voltage ($V_{OUT}$) during fast transients. A conventional shift-register (SR)-based DLDO [1] suffers from a power-speed trade-off and thus requires a large $C_{OUT}$. To break this trade-off and minimize $C_{OUT}$, [2-5] applied coarse-fine tuning and adaptive clocking, but a fast sampling clock is still necessary for instantaneous $V_{OUT}$ sensing. The event-driven control in [6] reacts within one clock cycle, but its ADC (with 7 comparators) and digital PI controller increase complexity and power consumption. This work presents an analog-assisted (AA) tri-loop control scheme that improves the transient response, lowers power, and reduces $C_{OUT}$.

Figure 20.4.1 shows the AA technique applied to the SR-based DLDO. The V<sub>SSB</sub> nodes of the inverters driving the power switches are not connected to Gnd as usual; instead, they are AC-coupled to V<sub>OUT</sub> through a coupling capacitor C<sub>C</sub> and DC-biased to Gnd through R<sub>C</sub>. This forms an AA loop that extends the bandwidth and responds instantly. When a load transient occurs, the V<sub>OUT</sub> droop is coupled to the gate voltage V<sub>G</sub> of the on-state switches, providing a larger instantaneous V<sub>GS</sub> and thus a larger unit current I<sub>UNIT</sub>. Simulation shows that I<sub>UNIT</sub> rises to 5× its nominal value for a 100mV ΔV<sub>OUT</sub>, compared with only 1.4× in the conventional scheme. The AA loop therefore significantly reduces ΔV<sub>OUT</sub>, and a similar behavior is observed when I<sub>LOAD</sub> steps down. Consequently, C<sub>OUT</sub> can be reduced or even removed in this scheme. Fig. 20.4.1 also gives the parameters and simulated Bode plots of the AA loop, which is stable because its passband gain A<sub>V</sub> < 0.
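To see qualitatively why coupling the droop onto the gate helps, the toy calculation below uses a long-channel square-law PMOS model. It is only a sketch under assumed values: a hypothetical |V<sub>TH</sub>| of 0.4V, an ideal C<sub>C</sub> that transfers the full droop to V<sub>G</sub>, and the 0.6V-in/0.5V-out operating point reported for the test chip; the function names are illustrative. It does not reproduce the transistor-level 5× and 1.4× figures, but it shows the same trend: the conventional switch gains current only from its larger V<sub>SD</sub>, whereas the AA switch also gains V<sub>SG</sub>.

```python
# Toy square-law model (assumption): compares the instantaneous unit-switch current
# boost of a conventional DLDO switch with an idealized AA switch during a V_OUT droop.

V_IN, V_OUT = 0.6, 0.5   # operating point reported for the test chip [V]
VTH = 0.4                # hypothetical |V_TH| of the PMOS power switch [V]
DROOP = 0.1              # 100mV V_OUT undershoot [V]


def pmos_current(v_sg: float, v_sd: float, vth: float = VTH, k: float = 1.0) -> float:
    """Long-channel square-law PMOS drain current in arbitrary units (subthreshold ignored)."""
    v_ov = v_sg - vth                    # overdrive voltage
    if v_ov <= 0:
        return 0.0                       # cut-off
    if v_sd < v_ov:
        return k * (v_ov * v_sd - v_sd ** 2 / 2)   # triode region
    return k * v_ov ** 2 / 2             # saturation region


def unit_current(droop: float, aa: bool) -> float:
    """I_UNIT of an on-state switch whose source is at V_IN and drain at V_OUT - droop."""
    # Conventional: gate held at Gnd. AA (idealized): gate follows the droop through C_C.
    v_gate = -droop if aa else 0.0
    return pmos_current(V_IN - v_gate, V_IN - (V_OUT - droop))


for aa in (False, True):
    boost = unit_current(DROOP, aa) / unit_current(0.0, aa)
    print(f"{'AA' if aa else 'conventional'}: I_UNIT boost = {boost:.2f}x")
```

With these assumed values the model gives roughly 1.3× for the conventional switch and 2.7× for the AA switch; a near-threshold device, whose current grows faster than square-law with V<sub>SG</sub>, would widen the gap.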

Figure 20.4.2 shows the overall architecture of the proposed AA-DLDO. A 9b PMOS switch array is implemented for better $V_{OUT}$ accuracy. The array is divided into three sub-sections (low, medium, and high) linked by carry-in/carry-out signals. These sub-sections are controlled by SRs of L, M, and H bits, whose instantaneous values are l(t), m(t), and h(t), respectively. The tri-loop control comprises 1) the AA loop, 2) coarse tuning, and 3) fine tuning. The driving inverters are sized in proportion to the strengths of their corresponding switches, and all of their V<sub>SSB</sub> nodes are AC-coupled to V<sub>OUT</sub>. Coarse tuning is performed by the medium and high SRs: the medium SR, triggered by a dead-zone comparator (DZ), generates the carry-in or carry-out signals that drive the high SR. Fine tuning consists solely of the low sub-section, fed by a 1b quantization comparator (CMP). All SRs are clock-gated to reduce power loss.

Figure 20.4.3 shows the timing diagram of the AA-DLDO. After the AA loop responds to a large $I_{LOAD}$ step, the 'Coarse\_en' signal, asserted when $V_{OUT}$ falls outside the DZ, activates coarse tuning. In this mode, the coarse control word shifts by L counts per cycle, rapidly regulating $V_{OUT}$ toward $V_{REF}$ and shortening the recovery time. Once $V_{OUT}$ is back within the DZ, coarse tuning terminates and fine tuning takes over; shifting by 1 count per cycle, it guides $V_{OUT}$ more accurately to $V_{REF}$. Limit-cycle oscillation (LCO) exists in most digitally controlled loops [7]. To eliminate it, 'Fine\_en' is forced low after a duration T<sub>1</sub>, entering a freeze mode that stops all the SRs and also saves steady-state quiescent current.
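The coarse/fine/freeze sequencing can be illustrated with a cycle-level behavioral sketch. It abstracts V<sub>OUT</sub> as an error expressed in fine LSBs of switch current (positive when V<sub>OUT</sub> is below V<sub>REF</sub>), ignores the AA loop and the carry logic, and uses hypothetical values for the DZ width and T<sub>1</sub>; the class and field names are illustrative, not taken from the design.

```python
from dataclasses import dataclass


@dataclass
class TriLoopSketch:
    """Cycle-level sketch of coarse/fine/freeze sequencing (AA loop and carry logic not modeled).

    All parameter values are hypothetical placeholders, not the fabricated design's values.
    """
    L: int = 8              # coarse step: L fine LSBs per clock cycle
    dz_lsb: float = 8.0     # dead-zone half-width, in fine LSBs
    t1_cycles: int = 16     # fine-tuning window T1, in clock cycles
    code: int = 0           # number of fine LSBs currently switched on
    mode: str = "freeze"
    fine_cycles: int = 0

    def clock(self, error_lsb: float) -> str:
        """Advance one sampling cycle; error_lsb > 0 means V_OUT is below V_REF."""
        if abs(error_lsb) > self.dz_lsb:           # outside DZ: Coarse_en
            self.mode = "coarse"
            self.code += self.L if error_lsb > 0 else -self.L
            self.fine_cycles = 0
        elif self.mode == "freeze":
            pass                                   # SRs stay frozen until the DZ fires again
        elif self.fine_cycles < self.t1_cycles:    # Fine_en high during T1
            self.mode = "fine"
            self.code += 1 if error_lsb > 0 else -1  # 1b CMP decision
            self.fine_cycles += 1
        else:                                      # T1 elapsed: Fine_en forced low
            self.mode = "freeze"
        return self.mode


# Hypothetical load equal to 200.5 LSBs: a 1b loop alone would toggle 200 <-> 201 forever.
ctrl = TriLoopSketch()
target = 200.5
modes = [ctrl.clock(target - ctrl.code) for _ in range(60)]
print(modes.count("coarse"), modes.count("fine"), ctrl.mode, ctrl.code)  # 25 16 freeze 200
```

Coarse tuning covers most of the distance in 8-LSB steps, the fine loop then alternates between 200 and 201 (the LCO), and the T<sub>1</sub> window stops it so the controller parks in freeze mode until a new load step pushes the error outside the DZ.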

For the targeted 9b resolution, the proposed scheme needs only L+M+H SR bits, with L×M×H=512, compared with 512 SR bits for the conventional DLDO. This arrangement therefore reduces complexity, area, and power consumption.
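The SR-bit savings follow from simple arithmetic. The sketch below uses a hypothetical helper to count the bits for a few (L, M, H) choices whose product is 2<sup>9</sup> = 512.

```python
N_BITS = 9
TOTAL = 2 ** N_BITS  # 512 unit switches for 9b resolution


def sr_bits(L: int, M: int, H: int) -> int:
    """SR bits needed by a three-section (low/medium/high) array; hypothetical helper."""
    if L * M * H != TOTAL:
        raise ValueError("L*M*H must equal 2**N_BITS")
    return L + M + H


print("conventional single SR:", TOTAL, "bits")
for L, M, H in [(8, 8, 8), (8, 4, 16), (8, 2, 32)]:
    bits = sr_bits(L, M, H)
    print(f"L={L}, M={M}, H={H}: {bits} SR bits, "
          f"{N_BITS / bits:.2f} bit of resolution per SR bit")
```

The (8, 4, 16) split gives 28 bits, which matches the "9bit/28" entry for this work in Fig. 20.4.6; (8, 8, 8) is the minimum of 24 but suffers from the glitch issue discussed next.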

Fig. 20.4.3 also shows the simulated power-loss breakdowns of the AA-DLDO and a baseline design [1] with the same resolution and process. The AA-DLDO reduces the total current consumption from 41µA to 3.4µA, with transistor leakage cut from 20µA to 2.9µA owing to the much smaller number of SR bits. Although the comparator power is higher due to the additional DZ, the dynamic power losses of the SRs and buffers are eliminated by the freeze-mode operation.

Figure 20.4.4 illustrates the design considerations for selecting L, M, and H. The natural choice is L=M=H=(512)<sup>1/3</sup>=8, which minimizes the number of SR bits. L is 8 in this work, but M=H=8 suffers from a serious glitch issue. On an m(t)-to-h(t) carry-in transition, h(t) increments by 1 and m(t) resets to 1. If the h(t) and m(t) paths have mismatched delays, the coarse word coarse(t)= $h(t)\times8+m(t)$ can go through an ' $8\rightarrow 1\rightarrow 9$ ' transition rather than the desired ' $8\rightarrow 9$ ', generating a large glitch of 7×L. One possible solution is to decrease M while keeping M×H constant (e.g., M=4 and H=16), which yields a '4 $\rightarrow$ 1' transition with a 3×L glitch, while M+H increases only slightly, from 16 to 20. The glitch could be reduced further with an even smaller M, but H would have to grow in inverse proportion (H=64/M for L=8), which is undesirable in terms of power and area. Instead, we apply a modified carry-in scheme in which m(t) resets to 3 instead of 1, producing a '4 $\rightarrow$ 3 $\rightarrow$ 7' transition and reducing the glitch to 1×L. Moreover, coarse(t) ramps faster with this scheme during consecutive shift-up operations, which shortens the recovery time. A similar effect is expected for carry-out if m(t) is set to 1 instead of 3. The simulated glitch comparison shows a maximum glitch reduction (GR) of 100mV with this technique, and the recovery time is shortened by roughly 3µs.
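The carry-in glitch can be reproduced in a few lines. This sketch assumes coarse(t) = h(t)×M + m(t) (generalizing the M=8 expression above), that carry-in fires when m(t) reaches M, and the worst-case ordering in which m(t) updates before h(t); the function names are hypothetical.

```python
def carry_in_sequence(M: int, m_reset: int, h: int = 0) -> list[int]:
    """Coarse word (in units of L) before, during, and after an m(t)->h(t) carry-in.

    Assumes coarse(t) = h(t)*M + m(t) and that m(t) settles before h(t).
    """
    before = h * M + M              # m(t) has reached M, carry-in fires
    transient = h * M + m_reset     # m(t) already reset, h(t) not yet incremented
    after = (h + 1) * M + m_reset   # both registers updated
    return [before, transient, after]


def glitch_in_L(M: int, m_reset: int) -> int:
    """Depth of the transient dip, in units of L."""
    before, transient, _ = carry_in_sequence(M, m_reset)
    return before - transient


for M, m_reset in [(8, 1), (4, 1), (4, 3)]:
    print(f"M={M}, reset to {m_reset}: {carry_in_sequence(M, m_reset)}, "
          f"glitch = {glitch_in_L(M, m_reset)}xL")
```

The three cases yield the 8→1→9, 4→1→5, and 4→3→7 sequences with glitches of 7, 3, and 1 (in units of L); the modified reset also makes each carry-in advance coarse(t) by 3 instead of 1, consistent with the faster ramp noted above.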

The proposed AA-DLDO is fabricated in a 65nm general-purpose (GP) CMOS process with $C_{OUT}$=0pF and $C_c$=100pF, and operates with a 10MHz sampling clock. Fig. 20.4.5 shows the measured transient response. In steady state, $V_{OUT}$ is regulated to 0.5V from a 0.6V input. When I<sub>LOAD</sub> steps between 2mA and 12mA with 1ns edge times, the AA-DLDO exhibits a 105mV undershoot and a 65mV overshoot, mainly determined by the AA loop. LCO is removed in freeze mode, and no significant glitch is observed with the GR technique. Fig. 20.4.6 shows a comparison table. With the AA scheme and tri-loop control, the AA-DLDO achieves the highest resolution per SR bit and the best FOM of 0.23ps, with the lowest sampling frequency and quiescent current among state-of-the-art DLDOs. Fig. 20.4.7 shows the micrograph of the AA-DLDO, which occupies an active area of 0.03mm<sup>2</sup>.

## Acknowledgments:

This work is supported by the Macao Science and Technology Development Fund (FDCT) 122/2014/A3 and the Research Committee of University of Macau.

## References:

[1] Y. Okuma, et al., "0.5-V Input Digital LDO with 98.7% Current Efficiency and 2.7-μA Quiescent Current in 65nm CMOS," *IEEE Custom Integrated Circuits Conf.*, 2010.

[2] S. T. Kim, et al., "Enabling wide autonomous DVFS in a 22nm graphics execution core using a digitally controlled hybrid LDO/switched-capacitor VR with fast droop mitigation," *ISSCC*, pp. 154-155, 2015.

[3] S. B. Nasir, et al., "A 0.13µm Fully Digital Low-dropout Regulator with Adaptive Control and Reduced Dynamic Stability for Ultra-Wide Dynamic Range," *ISSCC*, pp. 98-99, 2015.

[4] Y. J. Lee, et al., "A 200mA Digital Low-drop-out Regulator with Coarse-fine Dual Loop in Mobile Application Processors," *ISSCC*, pp. 150-151, 2016.

[5] M. Huang, et al., "A Fully Integrated Digital LDO With Coarse–Fine-Tuning and Burst-Mode Operation," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 63, no. 7, pp. 683-687, Jul. 2016.

[6] D. Kim, et al., "Fully Integrated Low-drop-out Regulator Based on Event-driven PI Control," *ISSCC*, pp. 148-149, 2016.

[7] M. Huang, et al., "Limit Cycle Oscillation Reduction for Digital Low Dropout Regulators," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 63, no. 9, pp. 903–907, 2016.



Figure 20.4.1: AA-DLDO scheme and the poles of the AA loop (top); transient waveforms of the AA and conventional schemes, and the Bode plot of the AA loop (bottom).



Figure 20.4.3: Timing diagram of the AA-DLDO (left), and the power loss breakdown comparison between the baseline and proposed one (right).







Figure 20.4.2: Overall architecture of the proposed AA-DLDO, with the 1) AA, 2) coarse tuning, and 3) fine tuning loops.



Figure 20.4.4: The glitch-reduction solution (top), and simulated load-transient waveforms with and without the glitch-reduction scheme (bottom).

|                                       | [3] 2015    | [4] 2016   | [5] 2016   | [6] 2016    | This work |
|---------------------------------------|-------------|------------|------------|-------------|-----------|
| Process                               | 130nm       | 28nm       | 65nm       | 65nm        | 65nm      |
| Area [mm²]                            | 0.355       | 0.021      | 0.01       | 0.029       | 0.03      |
| Type                                  | Digital     | Digital    | Digital    | Digital     | Digital   |
| Architecture                          | SR based    | SR based   | SR based   | ADC based   | SR+AA     |
| V<sub>IN</sub> [V]                    | 0.5-1.2     | 1.1        | 0.6-1.1    | 0.5-1       | 0.5-1     |
| V<sub>OUT</sub> [V]                   | 0.45-1.14   | 0.9        | 0.4-1      | 0.45-0.95   | 0.45-0.95 |
| Max. F<sub>SAMPLE</sub> [MHz]         | 400         | N.A.       | 500        | 200         | 10        |
| Min. I<sub>Q</sub> [µA]               | 24          | 110        | 82         | 12.5        | 3.2       |
| Res. [bit] / Total SR bits            | 7bit/128    | 6.6bit/25  | 10bit/96   | N.A.        | 9bit/28   |
| Total Capacitance*                    | 1nF         | 23.5nF     | 1nF        | 0.4nF       | 0.1nF     |
| ΔV<sub>OUT</sub> [mV] @               | 90          | 120        | 55         | 40          | 105       |
| ΔI<sub>LOAD</sub>/T<sub>EDGE</sub>    | @1.4mA/N.A. | @180mA/4µs | @98mA/20ns | @0.4mA/N.A. | @10mA/1ns |
| FOM** [ps]                            | 76.5        | 7.75       | 0.45       | 1.11        | 0.23      |

\* Total capacitance includes $C_L$ and $C_C$; in this work, $C_L$=0 and $C_C$=100pF.

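\*\* FOM is defined as

$$\mathrm{FOM} = \frac{C\,\Delta V_{OUT}}{I_{MAX}} \times \frac{I_Q}{I_{MAX}}$$

where $C$ is the total capacitance listed above, $I_Q$ the quiescent current, and $I_{MAX}$ the maximum load current.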

Figure 20.4.6: Comparison with the state-of-the-art.
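As a sanity check of the FOM footnote, the sketch below evaluates the formula with this work's table entries. I<sub>MAX</sub> is not listed explicitly; taking it as 12mA (the upper end of the measured 2mA-to-12mA step) is an assumption, labeled as such in the code, that reproduces the reported 0.23ps, whereas using ΔI<sub>LOAD</sub> = 10mA would give about 0.34ps.

```python
def fom_seconds(c: float, dv_out: float, i_q: float, i_max: float) -> float:
    """FOM = (C * dV_OUT / I_MAX) * (I_Q / I_MAX), returned in seconds."""
    return (c * dv_out / i_max) * (i_q / i_max)


C = 100e-12       # total capacitance: C_L = 0, C_C = 100pF [F]
DV_OUT = 105e-3   # measured undershoot [V]
I_Q = 3.2e-6      # min. quiescent current from the table [A]
I_MAX = 12e-3     # assumption: upper end of the 2mA -> 12mA load step [A]

print(f"FOM = {fom_seconds(C, DV_OUT, I_Q, I_MAX) * 1e12:.2f} ps")
print(f"with I_MAX = 10mA: {fom_seconds(C, DV_OUT, I_Q, 10e-3) * 1e12:.2f} ps")
```

Because I<sub>MAX</sub> appears squared in the denominator, the FOM is quite sensitive to which current is taken as the maximum, so comparisons across designs depend on each paper using the same convention.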

