# A 0.127-mm<sup>2</sup>, 5.6-mW, 5<sup>th</sup>-Order SC LPF with +23.5-dBm IIP3 and 1.5-to-15-MHz Clock-Defined Bandwidth in 65-nm CMOS

Yaohua Zhao, Pui-In Mak, Man-Kay Law and Rui P. Martins<sup>1</sup>

The State-Key Laboratory of Analog and Mixed-Signal VLSI, University of Macau, Macao, China

<sup>1</sup> On leave from Instituto Superior Técnico (IST)/TU of Lisbon, Portugal

E-mail Correspondence: pimak@umac.mo

Abstract—This paper proposes two techniques for improving the linearity and power efficiency of switched-capacitor (SC) circuits. The first is a high-speed switched-current-assisting (SCA) path that helps the main (folded-cascode) OTA to deliver most of the desired charge to the *integration capacitor*, leaving only the final error correction to the main OTA. The second is a pre-charging (PC) path that assists the main OTA in speeding up the charging of the *load capacitor*. Both the SCA and PC paths share one auxiliary (differential-pair) OTA that features a high speed-to-power efficiency. The prototype is a bandwidth-scalable 5<sup>th</sup>-order Butterworth SC lowpass filter (LPF) for software-defined radios. Fabricated in 65-nm CMOS, the LPF exhibits a decade-wide tunable bandwidth (1.5 to 15 MHz) solely defined by the clock (30 to 350 MHz), leading to a compact die size (0.127 mm<sup>2</sup>). Under the same power (5.6 mW) and bandwidth (10 MHz) targets, the IIP3 reaches +23.5 dBm (+16.7 dBm) and the cutoff accuracy is 97.9% (82%) with (without) the SCA + PC paths. The achieved Figure-of-Merit (0.014 fJ) compares favorably with the state-of-the-art.

*Index Terms*—bandwidth, CMOS, linearity, lowpass filter (LPF), operational transconductance amplifier (OTA), software-defined radio, switched capacitor (SC).

### I. INTRODUCTION

The linearity, power and area efficiencies of a bandwidth-scalable low-pass filter (LPF) for software-defined radios are decisive for supporting a number of wireless applications with low hardware complexity and cost. Although the active-RC [1] and $g_m$-C [2] LPFs can fulfill the linearity requirements of most wireless standards, their bandwidth tuning still relies heavily on scaling of the supply [3], the current [4] and/or switching elements [2]. The former two can cause more than one parameter to vary, while the latter impacts the chip area, which is increasingly expensive in nanoscale CMOS. A better alternative is the switched-capacitor (SC) LPF [5], which allows wide and continuous tuning of the bandwidth solely defined by the clock, avoiding any calibration loop or backup tuning elements. Yet, the passive open-loop SC LPF [5] is mainly suitable for IIR/FIR realization, which can hardly offer a roll-off as sharp as that of typical active SC LPFs that use operational transconductance amplifiers (OTAs). On the other hand, the main challenges of OTA-based SC LPFs lie in the linearity and the bandwidth-to-power efficiency.

This paper introduces a switched-current assisting (SCA) technique and a pre-charge (PC) technique suitable for OTA-based SC circuits such as the channel-selection LPF realized in this work. The basic principles, circuit details and experimental results are presented next.



Fig. 1. SC integrator (a) with SCA path, and (b) with PC path.

# II. BASIC PRINCIPLES OF SCA AND PC TECHNIQUES FOR A SC INTEGRATOR

For a typical SC integrator [Fig. 1(a), without the SCA path and  $\Phi_{2\_SCA}$ ], the input signal ( $V_{ip}$ ) is captured in the sampling capacitor  $C_s$  during  $\Phi_1$ , while the main OTA should push the sampled charge from  $C_s$  to the integration capacitor  $C_{int}$  during  $\Phi_2$ . Obviously, the finite gain-bandwidth product (GBW) of the OTA can result in an imperfect charge transfer in every clock period  $T_s$ , degrading the accuracy and linearity of the integrator.

To alleviate the tradeoff between GBW (power) and performance, an SCA path is introduced. The SCA path samples the inverting input ($V_{in}$) in $\Phi_1$ with the help of an auxiliary OTA in unity-gain feedback, which copies $V_{o1p}$ to one plate of $C_A$ to ensure that only the desired charge ($C_A V_{in}$) is transferred to $C_{int}$ during $\Phi_{2\_SCA}$. In this way, the main OTA only needs to complete the remaining error correction, which is much less speed-demanding. As analyzed in Section III, spending this power in the auxiliary OTA is more efficient than adding it to the main OTA.

The SCA path is dedicated to assisting the charging of $C_{\text{int}}$, but an SC integrator is in general also loaded by another SC branch that passes the signal on to the following circuitry. To enhance the charging speed of the load capacitor $C_{\text{L}}$, a PC path [Fig. 1(b)] can be added. Before $\Phi_2$, in which the main OTA charges $C_{\rm L}$, $\Phi_{2\_\rm PC}$ allows the auxiliary OTA to pre-charge $C_{\rm L}$. Interestingly, referring to Fig. 1(a), only one extra route from the output of the auxiliary OTA to $C_{\rm L}$ is needed to combine the two techniques in one SC integrator, shortening the charging time of both $C_{\rm int}$ and $C_{\rm L}$ in a more power-efficient way.

# III. ANALYTICAL COMPARISON OF SC INTEGRATORS WITH AND WITHOUT THE SCA PATH

As the SCA and PC techniques are similar in nature, only the SCA is analyzed and compared. For the settling error resulting from the limited GBW of the OTA, we can study the effectiveness of the SCA path using the equivalent circuit of a SC integrator in the integrating phase [Fig. 2(a)], where $C_{\rm in}$ and $G_{\rm L}$ denote the OTA's input parasitic capacitance and output conductance, respectively. Supposing that $C_{\rm s} = C_{\rm int}$, the input $V_{\rm ip}$ of the OTA can be modeled as a unit step signal $V_{\rm step}$ applied at the left terminal of $C_{\rm s}$. Without the SCA path, the output $V_{\rm o1p}(t)$ can be expressed as

$$V_{o1p}(t) = V_{step} \cdot \left(1 - e^{-t/\tau}\right) \tag{1}$$

where $\tau = \frac{1}{\omega_{-3dB}}$ and $\omega_{-3dB}$ is proportional to the effective transconductance $G_{\text{OTA}}$ of the OTA. The step responses of $V_{o1p}(t)$ under different $G_{\text{OTA}}$ (1x, 2x and 4x) are simulated as shown in Fig. 2(b). A better settling accuracy can be achieved either by increasing $G_{\text{OTA}}$ or by prolonging the integrating time. For high-speed applications only the former is allowed, which is a hard tradeoff with the power. Here, the charging speed of the SCA path can be sized to be much faster than that of the main OTA (i.e., limited only by the size of $C_A$ and the on-resistance of the switches $S_{SCA}$). Thus, a short SCA operating time ($\Phi_{2\_SCA}$) with an anti-phase input $V_{in} (= -V_{ip})$ is adequate to deliver most of the desired charge for the integration. For instance, for a 1% overall settling error, the auxiliary OTA helps the output settle to within a 5% error with an extra power of 35% of that of the main OTA, improving the speed-to-power efficiency from $\frac{\omega_{-3dB}}{Power} \ge \frac{9.2}{T_s}$ to $\frac{\omega_{-3dB}}{Power} \ge \frac{11.26}{T_s}$. The key idea is that only the error correction, from 5% to 1%, is handled by the main OTA.
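The settling numbers above can be checked with a minimal sketch based on the first-order response in (1). It assumes that the integrating phase lasts half a clock period and that the time spent in the short SCA phase is negligible; both are assumptions made here, and the function names are illustrative. The sketch only covers the bandwidth needed by the main OTA and does not attempt to reproduce the $11.26/T_s$ figure, which also accounts for the power of the auxiliary OTA.

```python
import math


def time_constants_needed(start_error: float, target_error: float) -> float:
    """Time constants a first-order response needs to shrink the residual
    error from start_error to target_error (both given as fractions)."""
    return math.log(start_error / target_error)


def min_bandwidth_times_ts(start_error: float, target_error: float,
                           phase_fraction: float = 0.5) -> float:
    """Lower bound on omega_-3dB * T_s when the settling window lasts
    phase_fraction * T_s (0.5 is an assumed half-period integration phase)."""
    return time_constants_needed(start_error, target_error) / phase_fraction


# Main OTA alone: the error must fall from 100% to 1%.
alone = min_bandwidth_times_ts(1.0, 0.01)
# Main OTA after the SCA path has already brought the error down to 5%.
# Assumption: the time spent in the short SCA phase is neglected.
assisted = min_bandwidth_times_ts(0.05, 0.01)

print(f"main OTA alone : omega_-3dB >= {alone:.2f}/T_s")
print(f"after SCA (5%) : omega_-3dB >= {assisted:.2f}/T_s")
```

Under these assumptions the first bound agrees with the $9.2/T_s$ quoted above, while the second shows how much the bandwidth requirement on the main OTA relaxes once it only has to correct the error from 5% to 1%.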

The linearity of a SC integrator is normally limited by the gain nonlinearity of the main OTA. Fig. 2(c) shows the ratio $V_x/V_{ip}$ under the step responses with different $G_{OTA}$. The peak value of $V_x$ can be expressed as a function of $V_{ip}$, $C_s$, $C_{in}$, $C_{int}$ and $C_L$, and is weakly dependent on $G_{OTA}$. With a large $V_{ip}$, the weak nonlinearity of the main OTA can be modeled at the output current as follows (assuming differential realization),

$$i_o = G_{\text{OTA}} \cdot V_{ip} + G_{\text{OTA3}} \cdot V_{ip}^3 \tag{2}$$

where the 3<sup>rd</sup>-order term $(V_{ip}^3)$ comes from the integration of the current. Traditionally, to improve the linearity, $G_{\text{OTA}}$ can be boosted to shorten the pulsing time of $V_x/V_{ip}$. The proposed SCA path, in a different way, enhances the linearity by *canceling* the voltage at the input of the main OTA. As a result, the virtual ground remains close to zero [blue curve in Fig. 2(c)] during the entire integration window, thereby lowering the distortion generated by the main OTA.
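As a hedged aside, (2) can be tied to the IIP3 through the textbook result for a memoryless third-order nonlinearity. The sketch below assumes two equal-amplitude tones at $\omega_1$ and $\omega_2$ and keeps only the lowest-order contribution to each term; the amplitude $A$ and the symbol $A_{\text{IIP3}}$ are notation introduced here for illustration and are not part of the original analysis.

$$V_{ip}(t) = A\cos\omega_1 t + A\cos\omega_2 t$$

$$i_o \approx G_{\text{OTA}} A \left(\cos\omega_1 t + \cos\omega_2 t\right) + \tfrac{3}{4}\, G_{\text{OTA3}} A^3 \left[\cos(2\omega_1 - \omega_2)t + \cos(2\omega_2 - \omega_1)t\right] + \dots$$

$$A_{\text{IIP3}} = \sqrt{\frac{4}{3}\left|\frac{G_{\text{OTA}}}{G_{\text{OTA3}}}\right|}$$

In this simplified picture the intermodulation terms grow with the cube of the amplitude presented to the nonlinearity, so reducing that amplitude lowers them much faster than it lowers the wanted term.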



Fig. 2. (a) SC integrator with the SCA path. (b) $V_{o1p}$ under a step response. (c) $V_x/V_{ip}$ under a step response.

Obviously, the SCA path can also induce noise and nonlinearity penalties. It can be shown that, by optimally sizing $C_A$ amongst the different trade-offs, the contribution of the SCA path can be made insignificant (confirmed by simulations) when compared with the overall noise and linearity metrics of the SC integrator. Detailed proofs are omitted here due to space constraints.

# IV. A 5<sup>TH</sup>-ORDER BUTTERWORTH SC LPF PROTOTYPE

The proof-of-concept prototype is a fully-differential 5<sup>th</sup>-order Butterworth SC LPF targeting one-decade bandwidth tunability from a few MHz (WCDMA) to over 10 MHz (WLAN). It is structured with one Uniquad and two Biquads in cascade (Fig. 3). To maximize the dynamic range, the high-Q (1.62) Biquad is located at the last stage, minimizing the swing at inner nodes. The SCA and PC paths can be disabled such that their effectiveness can be assessed fairly. The optimized unit capacitor $C_{unit}$ is 288 fF, which balances the layout efforts with the required matching in the employed ST GP/LP 65-nm CMOS technology.
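The quoted Q of 1.62 follows directly from the pole positions of a 5<sup>th</sup>-order Butterworth prototype. The short sketch below computes the Q of each complex-conjugate pole pair from the standard Butterworth pole angles; the function name is illustrative and the calculation refers to the ideal continuous-time prototype, not to the SC implementation.

```python
import math


def butterworth_pole_qs(order: int) -> list[float]:
    """Q factors of the complex-conjugate pole pairs of an order-N Butterworth
    lowpass prototype, highest Q first. For odd orders the single real pole
    (the first-order section) is not listed."""
    return [1.0 / (2.0 * math.sin((2 * k - 1) * math.pi / (2 * order)))
            for k in range(1, order // 2 + 1)]


qs = butterworth_pole_qs(5)
print([round(q, 3) for q in qs])  # [1.618, 0.618]
```

The two values correspond to the two Biquads, and the remaining real pole is realized by the first-order section.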



Fig. 3. Simplified block diagram of the LPF with SCA + PC paths.



Fig. 4. (a) Main OTA and (b) auxiliary OTA.

To optimize the speed-to-power efficiency, the main OTA [Fig. 4(a)] is based on a single-stage folded-cascode topology with local gain boosting. Each gain-boosted cell features one GP device sized with a long channel length (L = 180 nm) to lower its threshold voltage, easing the limited output swing of such an OTA architecture. A continuous-time common-mode feedback circuit was chosen for its better power-supply rejection at high frequency. The bias current ($I_{b,OTA}$) of the main OTA is 500 µA, which results in a 64-dB DC gain and a 390-MHz unity-gain frequency (UGF) at a 600-fF capacitive load. The auxiliary OTA [Fig. 4(b)] is a simple differential pair with a current-mirror load, showing a 29-dB DC gain and a 140-MHz UGF at a 600-fF capacitive load. Its bias current $I_{b,Buf}$ is 85 µA (~17% of that of the main OTA).

The clock generator is based on a typical non-overlapping scheme (not shown) with proper delay cells and AND/OR functions to realize all required phases. Proper device sizing ensures correct operation over a wide range of clock rate, with a power efficiency of 6.5 to 14  $\mu$ W/MHz.

## V. EXPERIMENTAL RESULTS

Fig. 5 shows the die micrograph of the SC LPF. Because no tuning elements are needed for bandwidth control, the active die area is very compact (0.127 mm<sup>2</sup>, excluding the test buffer). The actual performance of the LPF is de-embedded using the test bench reported in [6]. The signal path is powered at 1.2 V for better overdrives and signal swing, whereas the clock generator is operated at 1 V to lower the dynamic power.



Fig. 5. Die micrograph of the fabricated 5<sup>th</sup>-order SC LPF.



Fig. 6. Measured normalized gain responses under different clock rates.



Fig. 7. Measured gain responses with and without the SCA + PC techniques. The targeted cutoffs are set at 2 and 10 MHz by the clock.

Solely set by the clock rate (30 to 350 MHz), the SC LPF measures a consistent 100-dB/decade stopband rejection profile (Fig. 6) over a wide range of bandwidth from 1.5 to 15 MHz (upper-limited by the main OTA's GBW). The corresponding power consumption rises from 2.3 to 7.8 mW. The normalized gain responses at 2-MHz and 10-MHz –3-dB cutoffs, with and without the SCA and PC paths, are plotted in Fig. 7. At 10 MHz, the passband gain is improved from -5 to -2.3 dB, while the cutoff accuracy is enhanced from 82% to 97.9%. The measured input-referred noise density (IRN) is $\sim$35 nV/$\sqrt{Hz}$ both with and without the SCA + PC techniques.
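For reference, the 100-dB/decade slope is what an ideal 5<sup>th</sup>-order Butterworth response gives far above its cutoff. The sketch below evaluates the ideal continuous-time magnitude response only; it ignores sampled-data effects and any circuit non-ideality, and the function name and the 10-MHz example cutoff are chosen here for illustration.

```python
import math


def butterworth_gain_db(f_hz: float, cutoff_hz: float, order: int = 5) -> float:
    """Magnitude in dB of an ideal order-N Butterworth lowpass response,
    |H(f)|^2 = 1 / (1 + (f / f_c)^(2N)), normalized to 0 dB at DC."""
    ratio = f_hz / cutoff_hz
    return -10.0 * math.log10(1.0 + ratio ** (2 * order))


cutoff = 10e6  # example cutoff frequency in Hz
for f in (1e6, 10e6, 100e6):
    print(f"{f / 1e6:6.1f} MHz: {butterworth_gain_db(f, cutoff):8.2f} dB")
```

The ideal response is about 3 dB down at the cutoff and about 100 dB down one decade above it, which is the profile against which the measured curves in Fig. 6 can be read.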



Fig. 8. Measured IIP3 with and without the SCA + PC techniques.

Table I. Performance Summary.

| Parameter                 | Block         | Without SCA + PC | With SCA + PC |
|---------------------------|---------------|------------------|---------------|
| Power @ 10-MHz Bandwidth  | Main OTA      | 4.2 mW           | 3.18 mW       |
|                           | Auxiliary OTA | N/A              | 1.02 mW       |
|                           | Clock Gen.    | 1.4 mW           | 1.4 mW        |
|                           | Total         | 5.6 mW           | 5.6 mW        |
| Cutoff Accuracy @ 10 MHz  |               | 82%              | 97.9%         |
| In-band IIP3              |               | +16.7 dBm        | +23.5 dBm     |
| -1-dB Compression Point   |               | +5.2 dBm         | +7.9 dBm      |
| IRN                       |               | 35 nV/√Hz        | 35 nV/√Hz     |

The in-band linearity is assessed at the 10-MHz bandwidth. With a two-tone test (3 and 3.2 MHz) applied, the IIP3 can be improved from +16.7 to +23.5 dBm after enabling the SCA and PC paths as shown in Fig. 8.
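The two-tone arithmetic can be sketched as follows. With tones at 3 and 3.2 MHz, the third-order intermodulation products fall at $2f_1 - f_2$ and $2f_2 - f_1$, both inside the 10-MHz passband. The helper `iip3_dbm` applies the standard single-point extrapolation, which assumes the products follow the ideal 3-dB/dB slope; the function names and the power levels in the example are hypothetical and are not measured data from this work.

```python
def im3_frequencies(f1_hz: float, f2_hz: float) -> tuple[float, float]:
    """Frequencies of the two third-order intermodulation products."""
    return (2 * f1_hz - f2_hz, 2 * f2_hz - f1_hz)


def iip3_dbm(p_in_dbm: float, p_fund_dbm: float, p_im3_dbm: float) -> float:
    """Single-point IIP3 extrapolation: input power per tone plus half the
    gap between the fundamental and the IM3 product at the output."""
    return p_in_dbm + (p_fund_dbm - p_im3_dbm) / 2.0


low, high = im3_frequencies(3.0e6, 3.2e6)
print(f"IM3 products at {low / 1e6:.1f} MHz and {high / 1e6:.1f} MHz")

# Hypothetical levels, for illustration only (not measured values).
print(f"IIP3 = {iip3_dbm(0.0, 0.0, -40.0):.1f} dBm")
```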

Table II. Performance Comparison with the Prior Art.

|                                   | This<br>Work                     | JSSC'11<br>[2]                           | ISSCC'12<br>[3]                  | JSSC'09<br>[4]                   | ISSCC'13<br>[5]          |
|-----------------------------------|----------------------------------|------------------------------------------|----------------------------------|----------------------------------|--------------------------|
| Technology                        | 65 nm                            | 90 nm                                    | 90 nm                            | 180 nm                           | 65 nm                    |
| Architecture                      | Active SC +<br>SCA + PC          | Gm-C                                     | Ring Osc.<br>Integrator          | Gm-C                             | Gm + Passive<br>SC       |
| Filter Order, N                   | 5<sup>th</sup>,<br>Butterworth | 6<sup>th</sup>,<br>Butterworth | 4<sup>th</sup>,<br>Butterworth | 3<sup>rd</sup>,<br>Butterworth | 7<sup>th</sup>,<br>IIR |
| Bandwidth (MHz), BW               | 1.5 to 15                        | 8.1 to 13.5                              | 7 to 30                          | 0.5 to 20                        | 0.4 to 30                |
| Bandwidth Tuning                  | Clock<br>Rate                    | Coarse (Cap Bank)<br>Fine (Bias Current) | Supply<br>Voltage                | Bias<br>Current                  | Clock<br>Rate            |
| In-Band IIP3 (dBm)                | +23.5 @<br>10MHz BW              | +22.1 @<br>10MHz BW                      | N/A                              | +20.5 @<br>10MHz BW              | +16 @<br>2.9MHz BW       |
| IRN (nV/√Hz), P <sub>N</sub>      | 35                               | 75                                       | 23.7 to 32.8                     | 12 to 425                        | 2.85                     |
| Area/Pole (mm <sup>2</sup> /Pole) | 0.0254                           | 0.04                                     | 0.0725                           | 0.077                            | 0.06                     |
| Power (mW), P<sub>C</sub>         | 2.3 to 7.8                       | 4.35                                     | 2.9 to 19.1                      | 4.1 to 11.1                      | 1.96                     |
| FOM (fJ) *                        | 0.014                            | 0.024                                    | N/A                              | 0.072                            | 0.00038                  |

\* FOM = $\dfrac{P_C / N}{BW \cdot \left(\left(\frac{IIP3}{P_N}\right)^{2/3} \cdot N^{4/3}\right)}$

The performance summary with and without the SCA + PC techniques is given in Table I, whereas the comparison with the state-of-the-art solutions [2-5] is given in Table II. This work is advantageous for its: 1) highest linearity without sacrificing the power; 2) best area-per-pole efficiency, making it very suitable for nanoscale CMOS realization; 3) easy bandwidth scaling solely defined by the clock rate (no calibration); and 4) competitive figure-of-merit (FOM) relative to the other Butterworth LPFs [2-4], the exception being the IIR filter [5], which suffers from a slow stopband roll-off that degrades the passband gain in high-order filtering.

# VI. CONCLUSIONS

Two circuit techniques (SCA and PC) have been proposed for improving the power efficiency and linearity of SC circuits. The SCA technique helps the main OTA to deliver most of the desired charge to the integration capacitor. The PC technique speeds up the charging of the load capacitor. Both share one auxiliary OTA, leading to an improved SC integrator with better linearity and cutoff accuracy. The proof-of-concept is a 5<sup>th</sup>-order Butterworth SC LPF fabricated in 65-nm CMOS. It measures a decade-wide tunable bandwidth from 1.5 to 15 MHz solely defined by the clock. The corresponding power consumption ranges from 2.3 to 7.8 mW. Because no extra tuning element is needed, a compact die size is achieved (0.127 mm<sup>2</sup>). At a 10-MHz bandwidth, the low power consumption (5.6 mW), high IIP3 (+23.5 dBm) and low IRN (35 nV/$\sqrt{Hz}$) correspond to a competitive FOM (0.014 fJ). The SCA and PC techniques are applicable to other circuits such as SC delta-sigma modulators.

#### ACKNOWLEDGEMENT

This work was supported by the Research Committee of the University of Macau and the Macao Science and Technology Development Fund (FDCT).

## REFERENCES

- [1] S. V. Thyagarajan, S. Pavan and P. Sankar, "Active-RC Filters Using the Gm-Assisted OTA-RC Technique," *IEEE J. Solid-State Circuits*, vol. 46, no. 7, pp. 1522–1533, Jul. 2011.
- [2] M. S. Oskooei, N. Masoumi, M. Kamarei, and H. Sjoland, "A CMOS 4.35-mW, +22-dBm IIP3 Continuously Tunable Channel Select Filter for WLAN/WiMAX Receivers," *IEEE J. Solid-State Circuits*, vol. 46, no. 6, pp. 1382–1391, Jun. 2011.
- [3] B. Drost, M. Talegaonkar, and P. K. Hanumolu, "A 0.55V 61dB-SNR 67dB-SFDR 7MHz 4th-Order Butterworth Filter Using Ring-Oscillator-Based Integrators in 90nm CMOS," *IEEE Int. Solid-State Circuits Conf. (ISSCC) Digest*, pp. 360–362, Feb. 2012.
- [4] T. Lo, C. Hung, and M. Ismail, "A Wide Tuning Range G<sub>m</sub>-C Filter for Multi-Mode CMOS Direct-Conversion Wireless Receivers," *IEEE J. Solid-State Circuits*, vol. 44, no. 9, pp. 2515–2524, Sep. 2009.
- [5] M. Tohidian, I. Madadi, and R. B. Staszewski, "A 2mW 800MS/s 7th-Order Discrete-Time IIR Filter with 400kHz-to-30MHz BW and 100dB Stop-Band Rejection in 65nm CMOS," *IEEE Int. Solid-State Circuits Conf. (ISSCC) Digest*, pp. 174–175, Feb. 2013.
- [6] S. Pavan and T. Laxminidhi, "Accurate Characterization of Integrated Continuous-Time Filters," *IEEE J. Solid-State Circuits*, vol. 42, no. 8, pp. 1758–1766, Aug. 2007.