# A 5-Bit 1.25-GS/s 4x-Capacitive-Folding Flash ADC in 65-nm CMOS

Chi-Hang Chan, Student Member, IEEE, Yan Zhu, Member, IEEE, Sai-Weng Sin, Senior Member, IEEE, Seng-Pan U, Senior Member, IEEE, Rui Paulo Martins, Fellow, IEEE, and Franco Maloberti, Fellow, IEEE


#### Abstract

This paper presents a 5-bit 1.25-GS/s folding flash ADC. The prototype achieves a folding factor of four with a capacitive folding technique that consumes only dynamic power. Various calibration schemes correct the folding errors and the comparators' threshold inaccuracies, thus allowing a low input capacitance of 80 fF. The design is fabricated in a 65-nm digital CMOS technology and occupies $0.007 \mathrm{~mm}^{2}$. The maximum DNL and INL after calibration are 0.67 and 0.47 LSB, respectively. Measurement results show that the ADC achieves $1.25 \mathrm{GS} / \mathrm{s}$ at a 1-V supply with a total power consumption of $595\ \mu \mathrm{W}$. In addition, it exhibits a mean ENOB of 4.8 b at dc among ten chips, which yields an FoM of 17 fJ/conversion-step.


Index Terms: Analog-to-digital conversion (ADC), calibration, embedded reference, flash ADC, folding, low power.

## I. Introduction

POWER and area play an important role in the design of portable devices. Battery-powered systems based on standards like ultra-wideband (UWB) and wireless personal area networks (WPANs) demand ADCs operating at very high speed and low power, with minimum input capacitance at the same time. A possible target is 5-6 bit at $500 \mathrm{MS} / \mathrm{s}$ or higher. For such specifications, the conventional pipeline ADC topologies are not suitable, as the opamp often results in large power consumption at these speeds. On the other hand, thanks to the benefits granted by technology scaling, it is possible to design very fast comparators with medium accuracy and very low power consumption. Therefore, the successive-approximation (SA) and flash topologies, both based on comparators, are promising techniques for power-efficient data conversion in different regions of the speed-resolution plane.

[^0]With a conventional $65-\mathrm{nm}$ general purpose (GP) CMOS process, it is possible to design a comparator that responds (delay time from clock to outputs) in less than 166 ps with a 7-mV (half LSB@6b from a 1-V supply) input signal, and the achievable power metric at a 1-V supply is around $10\ \mu \mathrm{W} / \mathrm{GHz}$. Based on this, the one bit conversion time of 2 ns can be estimated, which allows for about six cycles of comparator and SAR logic, thus obtaining 6 bit at $500 \mathrm{MS} / \mathrm{s}$. This option has been used in a pipelined SAR to give rise to a 10-bit $500 \mathrm{MS} / \mathrm{s}$ converter with a partial-interleaving technique [1]. In addition, the high-speed comparator also finds potential use in the flash topology.

The conventional two-step method requires generating a residue, possibly folded, for the input of the second stage. Such a folding operation is normally performed using an amplifier, which consumes static power. The bandwidth required for high speed and a large folding factor leads to significant power consumption, thus reducing the power efficiency of the two-step architecture, especially at low resolutions. For instance, a $2+3$ bit architecture needs $3+7$ comparators instead of 31. However, for this topology to be power-efficient, the $2 / 3$ power reduction in the comparator section must exceed the power required by the residue generator.
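The comparator counts quoted above follow from the fact that an n-bit flash quantizer needs one comparator per threshold. The short sketch below reproduces the arithmetic; the function name is illustrative, and the saving it prints assumes all comparators consume equal power, which the text does not state explicitly.

```python
def flash_comparators(bits: int) -> int:
    """Number of comparators in a full flash quantizer (one per threshold)."""
    return 2 ** bits - 1


coarse_bits, fine_bits = 2, 3

# A single-step 5-bit flash versus a 2+3 bit two-step arrangement.
full_flash = flash_comparators(coarse_bits + fine_bits)
two_step = flash_comparators(coarse_bits) + flash_comparators(fine_bits)

# Fractional reduction in comparator count (assumes equal power per comparator).
saving = 1 - two_step / full_flash

print(full_flash, two_step, round(saving, 2))
```

The reduction is 21 out of 31 comparators, which is the roughly 2/3 saving that the residue generator must not consume.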

In this design, the power needed for generating the residue is significantly reduced by using a folding method based on capacitors. The power required is purely dynamic and is consumed in charging and discharging a small capacitor (30 fF). As a result, 800 ps is sufficient to accommodate the three phases of operation: conversion, two-bit folding, and three-bit second flash. In addition, the errors arising from process and mismatch variation are corrected with various calibration techniques. As observed during the test measurements, the calibration schemes significantly improve the ADC's linearity. The circuit is realized in $65-\mathrm{nm}$ CMOS and achieves $28-\mathrm{dB}$ SNDR up to a $630-\mathrm{MHz}$ input at $1.25 \mathrm{GS} / \mathrm{s}$.

The organization of this paper is as follows. The ADC architecture and the proposed folding operation are described in Section II. Section III analyzes the nonideal errors of the design and illustrates their calibration schemes. Section IV describes the circuit details of some important building blocks in the ADC. Measured results are shown in Section V. Finally, conclusions are drawn in Section VI.

## II. ADC Architecture and Folding Operation

Fig. 1(a) illustrates the ADC architecture. It consists of sample-and-holds (S/H), folding logic, a multiplexer (S1 and S2), a coarse ADC, a fine ADC, registers, and an encoder. The ADC quantizes 5 bit with two steps in four phases, as shown in Fig. 1(b). During the sampling phase $\Phi_{\mathrm{S}}$, the complementary input signals ($\mathrm{V}_{\mathrm{in,N}}$ and $\mathrm{V}_{\mathrm{in,P}}$) are sampled on the capacitors ($\mathrm{C}_{\mathrm{S2}}$ and $\mathrm{C}_{\mathrm{S1}}$) by the S/Hs. Then, the coarse ADC quantizes 2 bit in $\Phi_{\mathrm{ST1}}$ while the folding output ($\mathrm{V}_{\mathrm{FO}}$) is reset to ground to avoid the memory effect. The outputs of the coarse ADC feed the logic, which generates the control signals for the Mux and the bottom-plate switches of $\mathrm{C}_{\mathrm{S1}}$ and $\mathrm{C}_{\mathrm{S2}}$ to perform $4 \times$ folding. Finally, $\mathrm{V}_{\mathrm{FO}}$ is quantized by the fine ADC in $\Phi_{\mathrm{ST2}}$, and the outputs of both the coarse and fine ADCs are latched and encoded to a 5-bit output.

Fig. 1. ADC. (a) Architecture (actual implementation is differential). (b) Signal behavior and timing diagram.

## A. Conventional and Proposed Folding

Folding is a well-known technique adopted in flash ADCs for reducing the number of comparisons in the conversion. As in subranging, the quantization is separated into coarse and fine steps. The residue, which is the folding output, is conventionally generated with folding amplifiers [2]-[4]. The limited bandwidth of the amplifier shifts the zero-crossing points of the folder, an effect that becomes more severe as the bandwidth narrows and that causes distortion [5]. The SNDR degradation from this effect can be estimated with the algorithm in [6] by modeling the terminating capacitance and resistance as a band-limiting filter. Based on this scheme, the degradation for quantizers of different resolutions with a $4 \times$ folding factor is obtained from simulation and is shown in Fig. 2. For 5-bit resolution and a $4 \times$ folding factor, the bandwidth of the folding amplifier has to be 3.5 times the input frequency in order to limit the SNDR degradation to at most 3 dB. Thus, the folding amplifier needs a large bandwidth at a high operating speed or with a higher folding factor, which implies a significant amount of static power consumption.

Fig. 2. Simulated SNDR degradation for different quantizer resolutions with a folding factor of 4.

This can be exemplified by the relationship between the resistive load, the folding factor, and the $3-\mathrm{dB}$ bandwidth of the amplifier, which can be represented as

$$
\begin{equation*}
f_{3 \mathrm{~dB}}=\frac{1}{2 \cdot \pi \cdot R_{\mathrm{load}} \cdot C_{\mathrm{load}}} \times \frac{1}{N} \tag{1}
\end{equation*}
$$

where $C_{\text {load }}$ and $R_{\text {load }}$ are the capacitive and resistive load of the amplifier, respectively, and $N$ is the folding factor. In order to increase the bandwidth of the amplifier, $R_{\text {load }}$ must be reduced. For maintaining the same dc operation point, the transconductance of the input pair also needs to increase proportionally which implies raising current and power consumption. From (1), $R_{\text {load }}$ can be calculated for the different input bandwidths under a folding factor of 4 and $30-\mathrm{fF}$ load. A folding amplifier with resistive load is designed using the adopted process at $1-\mathrm{V}$ supply and keeping its output-common-mode at mid-supply for different values of $R_{\text {load }}$. The simulated power consumption versus different bandwidths under various $R_{\text {load }}$ is shown in Fig. 3. With a $625-\mathrm{MHz}$ input, the amplifier already consumes around $500 \mu \mathrm{~W}$, which would be almost the total power consumption of this design and implies a significant tradeoff between power and speed in the folding amplifier.
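As a rough numerical illustration of (1), the sketch below solves for $R_{\text{load}}$ given a target bandwidth, the 30-fF load, and a folding factor of 4. Combining the 3.5x bandwidth rule from Fig. 2 with a 625-MHz input is an assumption made here for illustration; the function name is hypothetical.

```python
import math


def r_load_for_bandwidth(f_3db_hz: float, c_load_f: float, folding_factor: int) -> float:
    """Solve Eq. (1) for R_load: f_3dB = 1 / (2*pi*R_load*C_load) * (1/N)."""
    return 1.0 / (2.0 * math.pi * f_3db_hz * c_load_f * folding_factor)


f_in = 625e6            # input frequency in Hz
f_3db = 3.5 * f_in      # assumed bandwidth target (3.5x rule for 5 bit, 4x folding)
c_load = 30e-15         # 30-fF load from the text
n_fold = 4              # folding factor

r_load = r_load_for_bandwidth(f_3db, c_load, n_fold)
print(f"R_load = {r_load:.0f} ohm")
```

Under these assumptions the load resistance comes out at roughly 600 ohm, which shows how quickly $R_{\text{load}}$ must shrink, and the bias current grow, as the input frequency rises.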

The limited bandwidth of the amplifier leads to rounding of the folding output, which affects the folding linearity. Apart from rounding, the offset between different input pairs also adds distortion to the folding operation. Assume that the offset voltages $\delta_{\text {VOS }}$ are independent, identically distributed Gaussian random variables with zero mean and a standard deviation of $\sigma_{\text {VOS }}$. With half of the offset budget allocated to the comparators, the $3 \sigma_{\mathrm{VOS}}$ offset voltage generated by each input pair of the folding amplifier has to be less than half an LSB divided by the square root of 2

$$
\begin{equation*}
3 \sigma_{\mathrm{VOS}}=3 \frac{A_{\mathrm{VT}}}{\sqrt{\mathrm{WL}}}<\frac{1}{2 \sqrt{2}} \mathrm{LSB}_{5 b} \tag{2}
\end{equation*}
$$

where $A_{\mathrm{VT}}$ is the slope of the Pelgrom plot for the threshold-voltage ($V_T$) mismatch, and $W$ and $L$ are the width and the length of the transistor.

Fig. 3. Simulated power consumption of the designed folding amplifier.

For a 5-bit example with $1-\mathrm{V}$ full swing in the $65-\mathrm{nm}$ CMOS process, the area of the input pair can be estimated as

$$
\begin{equation*}
W L>\left(\frac{3 \cdot 5 \mathrm{~mV} \cdot \mu \mathrm{m}}{11 \mathrm{~mV}}\right)^{2}=1.86\ \mu \mathrm{m}^{2} \tag{3}
\end{equation*}
$$

With $A_{\mathrm{VT}}$ being around $5 \mathrm{mV} \cdot \mu \mathrm{m}$ in $65-\mathrm{nm}$ technology [7] and a channel length of $0.06\ \mu \mathrm{m}$ in the adopted process, the minimum required width of the transistor is $31\ \mu \mathrm{m}$. Since there are four input pairs in total in the folding amplifier, this amounts to about $120\ \mu \mathrm{m}$ of transistor width, which induces large parasitics at the load, whereas smaller sizing leads to a large offset. As a result, in high-speed implementations of the conventional folding architecture, the folding amplifier imposes a performance limitation.
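The sizing estimate can be checked by squaring both sides of (2) and solving for $WL$. The sketch below uses only values quoted in the text ($A_{\mathrm{VT}} \approx 5$ mV·µm, 1-V full swing, 5 bit, $L = 0.06$ µm); the variable names are illustrative. The results differ slightly from the quoted 1.86 µm² and 31 µm because the text rounds the offset limit to 11 mV.

```python
import math

A_VT = 5e-3        # Pelgrom coefficient in V*um (about 5 mV*um in 65 nm)
V_FS = 1.0         # full-scale swing in V
BITS = 5           # target resolution
L_UM = 0.06        # channel length in um

lsb = V_FS / 2 ** BITS                         # 31.25 mV
three_sigma_limit = lsb / (2 * math.sqrt(2))   # right-hand side of Eq. (2)

# From 3*A_VT/sqrt(W*L) < limit  =>  W*L > (3*A_VT/limit)^2
min_area_um2 = (3 * A_VT / three_sigma_limit) ** 2
min_width_um = min_area_um2 / L_UM

print(f"3-sigma limit = {three_sigma_limit * 1e3:.2f} mV")
print(f"W*L > {min_area_um2:.2f} um^2, W > {min_width_um:.1f} um")
```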

In the proposed architecture, folding is accomplished with switched-capacitor circuits and multiplexers that consume no static power. This results in a purely passive operation and no additional offset from the folding circuit. Since the folding operation is controlled by the coarse ADC through logical feedback, the bandwidth and power tradeoff of the conventional approach is avoided and the speed burden is imposed on the logic. Although the logic structure becomes more complex with a large folding factor, that will be relaxed with technology scaling. Nevertheless, there are a few drawbacks with the proposed folding scheme. First, the folding linearity is distorted by parasitic capacitors, which will be explained in Section III in detail. Second, since the accuracy of the folding operation relies on the coarse ADC's decision, its comparators' offset must be less than $1 / 2$ LSB. In this design, calibration schemes are developed to correct these errors.

## B. $4 X$-Folding Operation

Detailed folding operations are illustrated in Fig. 4 with a single-ended equivalent configuration. During the sampling period, the input is complementarily sampled on the top plates of $\mathrm{C}_{\mathrm{S1}}$ and $\mathrm{C}_{\mathrm{S2}}$. Based on the first-stage two-bit decisions, four regions of folding operation (R1-R4) are identified. The folding is performed with level shifting, by suitably biasing one of the sampling capacitor terminals and enabling the appropriate folding switches to move each region into R2. When the sampled $\mathrm{V}_{\mathrm{in,P}}$ on $\mathrm{C}_{\mathrm{S1}}$ is in R1, the folding output equals its complementary sample $\mathrm{V}_{\mathrm{in,N}}$ with a dc voltage shift of $\mathrm{V}_{\mathrm{REF}}$ (that is, $\mathrm{V}_{\mathrm{in,N}}+\mathrm{V}_{\mathrm{REF}}$). Accordingly, S2 is on and the bottom plate of $\mathrm{C}_{\mathrm{S2}}$ switches from Gnd to $\mathrm{V}_{\mathrm{REF}}$. When the sampled $\mathrm{V}_{\mathrm{in,P}}$ on $\mathrm{C}_{\mathrm{S1}}$ is in R4, the folding output equals the sample $\mathrm{V}_{\mathrm{in,P}}$ with a dc offset of $\mathrm{V}_{\mathrm{REF}}$ (that is, $\mathrm{V}_{\mathrm{in,P}}+\mathrm{V}_{\mathrm{REF}}$). Consequently, S1 is on and the bottom plate of $\mathrm{C}_{\mathrm{S1}}$ switches from Gnd to $\mathrm{V}_{\mathrm{REF}}$. When the sampled $\mathrm{V}_{\mathrm{in,P}}$ on $\mathrm{C}_{\mathrm{S1}}$ is in R2/R3, the operation simply enables S1/S2 while keeping the bottom plates connected to Gnd.
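The region-by-region behavior can be captured in a small ideal model. This is a sketch under stated assumptions rather than the circuit itself: it assumes that R1 is the top quarter of the input range and R4 the bottom quarter, that the complementary sample is $\mathrm{V}_{\mathrm{FS}}-\mathrm{V}_{\mathrm{in,P}}$, that the ideal level shift is $\mathrm{V}_{\mathrm{FS}}/2$, and that there are no parasitics. With these assumptions it reproduces the test point used later for reference calibration, where a $25\mathrm{V}_{\mathrm{FS}}/32$ input folds to $23\mathrm{V}_{\mathrm{FS}}/32$.

```python
def fold_4x(v_in_p: float, v_fs: float = 1.0) -> float:
    """Ideal single-ended 4x folding: map every region onto R2.

    Assumed region order (top to bottom): R1, R2, R3, R4.
    """
    v_in_n = v_fs - v_in_p     # complementary sample (assumption)
    v_shift = v_fs / 2         # ideal dc level shift
    if v_in_p >= 0.75 * v_fs:  # R1: select V_in,N and shift it up
        return v_in_n + v_shift
    if v_in_p >= 0.50 * v_fs:  # R2: pass V_in,P unchanged
        return v_in_p
    if v_in_p >= 0.25 * v_fs:  # R3: pass V_in,N unchanged
        return v_in_n
    return v_in_p + v_shift    # R4: select V_in,P and shift it up


# Every input lands in the R2 window [V_FS/2, 3*V_FS/4].
for code in (2, 10, 18, 25):
    print(code, fold_4x(code / 32) * 32)
```

Only the two outer regions use the level shift, which is why the reference error discussed in Section III affects R1 and R4 but not R2 and R3.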

For the differential implementation, the folding references $\mathrm{V}_{\text {ref }} \pm$ are ideally $3 \mathrm{~V}_{\mathrm{FS}} / 4$ and $\mathrm{V}_{\mathrm{FS}} / 4$. These reference levels require an extra ladder or biasing circuits, which unavoidably leads to static power consumption. In order to develop a folding operation with only dynamic power, the supply and ground are utilized as folding references. The folding accuracy can be ensured by calibration. A more detailed analysis as well as design considerations will be discussed in Section III. Since the proposed folding operation is based on voltage shifting and selecting, the logic control is relatively simple. Including buffers, the critical path from the comparators' outputs to the control switches consists of only four logic gates, thus resulting in high-speed operation. Furthermore, the power of the proposed folding operation is purely dynamic and amounts to less than $14 \%$ of the total power of the ADC, including logic, buffers, and switching.

The capacitive folding technique will experience a signal loss effect due to the parasitic in the second stage. Large-signal losses can be avoided with a large sampling capacitance [8], but this will undesirably increase sampling time and the ADC's input capacitance. Thus, a calibration technique is introduced to match the second-stage comparators' thresholds with the folding gain. In this design, the calibration is implemented on-chip, and its operation will be explained in detail in Section III.


Fig. 4. Folding operations in single-ended equivalent configuration.

## C. Comparison With Conventional Gainless Subranging Residue Generation Scheme

The conventional two-step flash ADC architecture usually consists of a coarse ADC, a fine ADC, and a DAC for residue generation. If the coarse ADC quantizes 2 bit and the fine ADC quantizes 3 bit (as in this design), the residue generation circuit can be realized by a 2-bit resistive or capacitive DAC. Since the resistive DAC requires static power, the comparison between the conventional and proposed architectures is based on the capacitive version. For a binary 2-bit DAC, a total of four unit capacitors are required. Assuming that a $40-\mathrm{fF}$ sampling capacitor is adopted (as the input capacitor in this design), the unit capacitance is 10 fF. Since the MSB and MSB/2 capacitors in the 2-bit DAC need to charge or discharge to generate the residue, this scheme shows a larger power consumption and a longer settling time compared with the proposed method. From the calculation and simulation results, the extra power and settling time for residue generation are three and two times higher than those of the proposed scheme, respectively (detailed in Appendix I).

## D. Comparator Arrangement, Power, and Speed Considerations

Typically, comparators dominate the power consumption of flash ADCs due to the gain needed to suppress the offset [9], [10] and their large quantity. Since the main objective of the proposed architecture is to achieve low-power operation, minimum-size transistors are used in the comparator without a preamplifier, and the offset is calibrated. Even with a small input-pair width, which gives around $120\text{-}\mu\mathrm{S}$ peak transconductance, the input-referred noise of the comparator can easily be kept under 1/2 LSB (it is about 1/5 LSB). Small sizing weakens the regeneration at the comparator's outputs, but it optimizes the power efficiency of the comparator. The power of the dynamic comparator follows the relationship [11]

$$
\begin{equation*}
p_{\text {Total }} \propto C_{\mathrm{reg}} \cdot W_{\mathrm{inv}}+C_{\mathrm{EL}} \tag{4}
\end{equation*}
$$

where $C_{\text {reg }}$ is the intrinsic load at the regeneration node, $W_{\text {inv }}$ is the width of the back-to-back inverter, and $C_{\text {EL }}$ is the extrinsic load at the comparator output. When the inverter is large $\left(C_{\mathrm{reg}} \gg C_{\mathrm{EL}}\right)$, the power of the comparator is proportional to the width of the inverter. Sizing the inverter smaller reduces the power dissipation up to the point where $C_{\text {EL }}$ starts to dominate, where the power of the comparator reaches its minimum. In this design, minimum-size transistors are used in both the comparator's back-to-back inverters and the loading buffers, which optimizes the power efficiency at the desired operating speed.

In addition, another efficient approach to save power is to reduce the number of activated comparators, which simultaneously eases the calibration effort, alleviates the kickback noise, and reduces the input capacitance. However, even if the number of comparators shrinks proportionally with the resolution, the complex logic structure will dominate the power consumption [12] and limit the conversion rate of this type of ADC, keeping it below gigahertz sampling rates [13], [14]. In contrast, this architecture adopts the two-step scheme, which allows a reasonable balance of the number of comparators in each stage and also achieves high-speed and low-power residue generation. Together, these allow a sampling rate above 1 GHz with submilliwatt power.


Fig. 5. Major parasitic capacitances affecting the folding accuracy.

## E. Dynamic Folding Logic

For high speed, the folding logic is implemented using dynamic gates. A folding phase $\Phi_{\text {Fold }}$ (Fig. 1), which starts after $\Phi_{\mathrm{ST1}}$ has finished, is utilized as the reset clock of the dynamic logic and also prevents the charge leakage problem. In [15], the first stage quantizes only one bit with the sign comparator, whose output is directly applied to the chopper for folding. This leads to charge leakage on the sampling capacitance when the sign comparator is in a metastable condition. The leakage occurs because the switches between the first and second stages are partially turned on when an undetermined logic decision is applied. Therefore, in the proposed design, $\Phi_{\text {Fold }}$ is used to ensure that the digital signals propagated after the comparators carry a valid decision, and only one side of the comparators' outputs is used to make the folding decision.

## III. Nonlinearity and Calibration

Various nonideal effects have a great impact on the performance of the proposed design, therefore it is essential to identify and resolve them with calibration techniques. Usually, in typical folding flash ADCs, the offset of the comparators degrades the conversion linearity [2], [3]. Since the proposed design mainly targets low-power operations, the preamplifier is removed from the comparator and the offset is calibrated. This allows the minimization of the transistor sizes. In addition, process variation on the embedded threshold technique also induces errors in the comparator circuit. These two errors have similar characteristics and can be considered as input-referred offsets of the comparator, thus allowing calibration with the same scheme.

In the proposed architecture, capacitors and multiplexers are utilized to achieve the folding operation. Since these components are all passive, the folding accuracy is affected by the parasitics. Fig. 5 shows the major parasitic capacitances that distort the folding operation. $C_{\mathrm{p,comp3}}$ and $C_{\mathrm{p,comp7}}$ are the parasitics at the inputs of the first- and second-stage comparators, respectively. $C_{\mathrm{p,rout,2nd}}$ is the routing parasitic at the second stage. The top-plate parasitic of the sampling capacitance $C_{\mathrm{S}}$ and the routing parasitic of the first stage are indicated as $C_{\mathrm{p,rout,1st}}$. The parasitics induce two types of folding error: the folding gain error and the folding reference error. They are calibrated with different mechanisms in the proposed design.

In summary, three different nonidealities are calibrated in this design: 1) the trip points of all of the comparators and 2) the folding gain error, which are both calibrated by the method in Section III-B, and 3) the folding reference error, which is calibrated by the method in Section III-D. The causes and effects of these errors and other nonlinearity sources will be discussed in detail in Section III-A.

Fig. 6. Input-folding output characteristic of the capacitive folding scheme (a) with a folding factor of 2 and (b) with a folding factor of 4. (The ideal characteristic is shown as a black line and the error as a gray line.)

## A. Folding Gain Error-The Cause

During the sampling period, parasitic $C_{\mathrm{p}, \text { comp3 }}$ are charged to the input, with $C_{\mathrm{p}, \text { comp } 7}$ and $C_{\mathrm{p}, \text { rout }}$ being floated. When the folded input signal feeds into the second stage through $S_{1}$, the routing parasitic and the input capacitances of the 7 comparators cause signal loss. The folding gain may be expressed as

$$
\begin{equation*}
A_{\text {Folding }}=1-\frac{C_{\mathrm{p}, \text { comp } 7}+C_{\mathrm{p}, \text { rout }, 2 \mathrm{nd}}}{C_{S 1}+C_{\mathrm{p}, \text { comp } 3}+C_{\mathrm{p}, \text { comp } 7}+C_{\mathrm{p}, \text { rout }, 1 \text { st\&2nd }}} \tag{5}
\end{equation*}
$$

The gain is less than 1, which attenuates the folded output. Since the input and the comparators' thresholds at the second stage experience different gains, this translates into a folding gain error. Such an error does not induce nonlinearity if the folding factor is 2, as shown in Fig. 6(a). When the folding factor increases to 4, as in the proposed design, it causes nonlinearity, which is illustrated in Fig. 6(b). Furthermore, larger losses are induced with bottom-plate sampling, as indicated in

$$
\begin{equation*}
A_{\text {Folding }}=1-\frac{C_{\mathrm{p}, \text { comp3 }}+C_{\mathrm{p}, \mathrm{comp} 7}+C_{\mathrm{p}, \text { rout }, 2 \mathrm{nd}}}{C_{\mathrm{S} 1}+C_{\mathrm{p}, \text { comp3 }}+C_{\mathrm{p}, \mathrm{comp} 7}+C_{\mathrm{p}, \text { rout }, 1 \mathrm{st} \& 2 n d}} \tag{6}
\end{equation*}
$$

Bottom-plate sampling experiences more losses because $\mathrm{C}_{\mathrm{p,comp7}}$ and $\mathrm{C}_{\mathrm{p,comp3}}$ do not sample the input. These parasitics draw charge from $\mathrm{C}_{\mathrm{S}}$ during conversion and degrade the linearity. Thus, in the proposed design, top-plate sampling is utilized at the front end to prevent additional losses, and the charge injection is kept below half an LSB by using small sampling switches.

Fig. 7. Proposed folding gain error calibration scheme.

Fig. 8. Simulated drifted offset of comparators with different thresholds under $\left(-50^{\circ} \mathrm{C}-150{ }^{\circ} \mathrm{C}\right)$ temperature variation.
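To make the difference between (5) and (6) concrete, the sketch below evaluates both expressions for one set of capacitances. The individual parasitic values are hypothetical, chosen only so that they sum to the roughly 25 fF quoted for this design, and $C_{\mathrm{p,rout,1st\&2nd}}$ is interpreted here as the sum of the first- and second-stage routing parasitics, which is an assumption.

```python
def folding_gain_top_plate(c_s1, c_comp3, c_comp7, c_rout_1st, c_rout_2nd):
    """Eq. (5): only the second-stage parasitics share charge with C_S1."""
    total = c_s1 + c_comp3 + c_comp7 + c_rout_1st + c_rout_2nd
    return 1 - (c_comp7 + c_rout_2nd) / total


def folding_gain_bottom_plate(c_s1, c_comp3, c_comp7, c_rout_1st, c_rout_2nd):
    """Eq. (6): the first-stage comparator inputs also draw charge."""
    total = c_s1 + c_comp3 + c_comp7 + c_rout_1st + c_rout_2nd
    return 1 - (c_comp3 + c_comp7 + c_rout_2nd) / total


# Hypothetical values in fF (illustrative split of a 25-fF total parasitic).
caps = dict(c_s1=30.0, c_comp3=6.0, c_comp7=12.0, c_rout_1st=1.0, c_rout_2nd=6.0)

gain_top = folding_gain_top_plate(**caps)
gain_bottom = folding_gain_bottom_plate(**caps)
print(f"top-plate gain = {gain_top:.3f}, bottom-plate gain = {gain_bottom:.3f}")
```

Whatever the actual split, the bottom-plate expression always has the extra $C_{\mathrm{p,comp3}}$ term in its numerator, so its gain is lower for any positive first-stage comparator parasitic.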

## B. Folding Gain Error-Calibration

Enlarging the sampling capacitance $C_{S}$ is the intuitive solution to minimize the gain error; however, this increases the power consumption and degrades the speed of the proposed folding operation. Instead, the thresholds of the second-stage comparators can be adjusted according to the folding gain and matched with the folded input. This tuning is accomplished simultaneously with the threshold calibration by the proposed scheme, which corrects both the gain error and the threshold error (due to device mismatch).

Fig. 9. Simulated comparator offset with different thresholds (2000 Monte Carlo runs).

The presented scheme adopts an imbalanced capacitive load at the comparator's outputs to adjust the threshold voltage and correct the error. If the loading is imbalanced, an offset is generated in the comparator because of the unbalanced regeneration time constant in the positive-feedback latch. The offset voltage (due to the unbalanced load) of the comparator with input transistor M1 can be expressed as [16]

$$
\begin{equation*}
V_{\mathrm{os,comp}}=\frac{\Delta C_{L}}{C_{L, \text {total}}} \frac{I_{M1}}{g_{m, M1}} \approx \frac{\Delta C_{L}}{C_{L, \text {total}}} \frac{V_{\text {overdrive}}}{2} \tag{7}
\end{equation*}
$$

It can be noticed that the offset of the comparator depends on the ratio between the capacitive difference at the comparator's outputs $\left(\Delta C_{L}\right)$ and the total load $\left(C_{L, \text {total}}\right)$, and on the overdrive voltage ($V_{\text {overdrive}}$). Based on the above equation, a difference in loading generates an offset. Thus, by deliberately unbalancing the load, the offset of the comparator can be shifted.

Fig. 10. Folding reference error characteristic with different folding references.
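A quick evaluation of the approximate form of (7) shows the scale of offset that a load imbalance can produce. All numerical values below are hypothetical and are not taken from the design; the function name is illustrative.

```python
def load_imbalance_offset(delta_c_l: float, c_l_total: float, v_overdrive: float) -> float:
    """Approximate form of Eq. (7): V_os = (dC_L / C_L,total) * V_overdrive / 2."""
    return (delta_c_l / c_l_total) * (v_overdrive / 2)


# Hypothetical example: 0.5 fF imbalance on a 5 fF total load, 200 mV overdrive.
v_os = load_imbalance_offset(delta_c_l=0.5, c_l_total=5.0, v_overdrive=0.2)
print(f"V_os = {v_os * 1e3:.1f} mV")
```

Because only the ratio $\Delta C_{L}/C_{L,\text{total}}$ enters, the capacitances may be given in any common unit.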

Fig. 7 shows the calibration scheme, whose operation can be described as follows. During calibration, the bottom plate of $\mathrm{C}_{\mathrm{S1}} \pm$ is connected to $V_{\mathrm{CM}}$ and the desired threshold (generated from a fine on-chip reference ladder) is sampled onto the top plate through the calibration sampling switches ($\mathrm{S}_{\text {cal}}$). Following this, the first decision of the comparator under calibration triggers the control logic to choose the calibration polarity. The comparator's outputs are then continuously fed back to either $\mathrm{M}_{\text {cal}}+$ or $\mathrm{M}_{\text {cal}}-$, which act as voltage-controlled capacitances and create an unbalanced load, to adjust the threshold toward the input until the output flips. For calibration of the second-stage comparators, the desired thresholds have to pass through the same signal path as the folding input in order to measure the folding gain. Thus, the desired thresholds are first sampled on $\mathrm{C}_{\mathrm{S1}} \pm$ and then passed to the second stage through S1. The same on-chip calibration scheme is applied to all comparators sequentially, one by one from the first stage onwards, taking at most 640 ADC clock cycles in total before conversion. After calibration, the fine ladder is powered off by a footer with $10-\Omega$ on-resistance, which induces at most a 0.03 LSB error in the desired thresholds.
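The calibration sequence above can be summarized in a simple behavioral loop: the first decision fixes the polarity, and the threshold then moves in that direction until the decision flips. The sketch below is a noiseless abstraction with a fixed step, whereas the circuit adjusts a voltage-controlled capacitance. The step size, the starting threshold, and the 64-cycle limit (640 cycles shared evenly among 3 + 7 comparators) are assumptions made for illustration.

```python
def calibrate_threshold(initial_mv: int, target_mv: int, step_mv: int, max_cycles: int = 64):
    """Step the comparator threshold toward the target until its decision flips.

    Returns the final threshold (mV) and the number of adjustment cycles used.
    """
    threshold = initial_mv
    first_decision = target_mv > threshold     # comparator output for the sampled target
    direction = 1 if first_decision else -1    # polarity chosen by the control logic
    for cycle in range(max_cycles):
        if (target_mv > threshold) != first_decision:
            return threshold, cycle            # output flipped: calibration done
        threshold += direction * step_mv       # add load on the selected side
    return threshold, max_cycles               # range exhausted without a flip


# Hypothetical example: threshold starts 23 mV above a 500-mV target, 2-mV steps.
final_mv, cycles = calibrate_threshold(initial_mv=523, target_mv=500, step_mv=2)
print(final_mv, cycles)
```

In this model the residual error after the flip is always smaller than one step, which is why the step size and the calibration range together set the achievable accuracy.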

Since no averaging is performed during the calibration, the noise of the comparator affects the calibration result. However, because the input-referred noise of the comparator and the noise from the ladder (with decoupling) are both below the 6-bit level (the comparator noise is 1/5 LSB and the ladder noise is 1/10 LSB at 5 bit), the calibration can achieve 5 bit of precision without averaging. In addition, the offset of the comparator is temperature-dependent. Fig. 8 illustrates the simulated drift for comparators with $\mathrm{V}_{\mathrm{th}}=\mathrm{V}_{\mathrm{FS}} / 2$ and $\mathrm{V}_{\mathrm{th}}=3 \mathrm{~V}_{\mathrm{FS}} / 4$ thresholds. The simulation uses one Monte Carlo iteration in which the calibration is run in the foreground at $27^{\circ} \mathrm{C}$. Then, the temperature is varied from $-55^{\circ} \mathrm{C}$ to $125^{\circ} \mathrm{C}$ and the offset is measured at different points. The offset voltage does not vary from $-55^{\circ} \mathrm{C}$ to $125^{\circ} \mathrm{C}$ for the comparator with $\mathrm{V}_{\mathrm{th}}=\mathrm{V}_{\mathrm{FS}} / 2$. Nevertheless, a systematic offset appears at different temperatures for the comparator with $\mathrm{V}_{\mathrm{th}}=3 \mathrm{~V}_{\mathrm{FS}} / 4$, which is due to the asymmetric structure of the embedded threshold technique (discussed in Section IV). The offset voltage exceeds half an LSB at 5 bit when the temperature drops below $-20^{\circ} \mathrm{C}$ or rises above $75^{\circ} \mathrm{C}$. As a result, a military-grade ADC ($-55^{\circ} \mathrm{C}$ to $125^{\circ} \mathrm{C}$) may require background calibration, but foreground calibration (at startup or during idle time) already suffices for commercial use ($-20^{\circ} \mathrm{C}$ to $75^{\circ} \mathrm{C}$).

Fig. 11. Circuit schematic of the calibration capacitor array.

The simulated offset of the comparators with different thresholds is shown in Fig. 9. The embedded threshold technique is not required for the comparator with $\mathrm{V}_{\mathrm{th}}=\mathrm{V}_{\mathrm{FS}} / 2$, and process variation has no effect on its trip point. On the other hand, the trip point of the embedded threshold comparator with $\mathrm{V}_{\mathrm{th}}=3 \mathrm{~V}_{\mathrm{FS}} / 4$ is altered in the four corners (FF, FS, SF, and SS). This leads to a systematic offset and enlarges the range of threshold deviation under process and mismatch variation. Therefore, the calibration range of the proposed scheme is carefully designed to correct the errors, including gain, offset, and threshold. Larger errors require a larger load to compensate, which in turn lowers the speed. In order to reduce the speed penalty, the embedded thresholds of each comparator are intentionally adjusted according to an estimated folding gain from post-layout simulation. Consequently, the calibration range is designed to sustain the PVT variations (including parasitics and mismatch) at $3 \sigma$, and the residue offsets are guaranteed to be less than 6.25 mV with the investigation step around 15 mV.

Fig. 12. Folding-reference calibration scheme and its signal behavior.

## C. Folding-Reference Error-The Cause

During the folding operation, the folding references $\mathrm{V}_{\text {ref }} \pm$ are attenuated by the total node parasitic at $\mathrm{V}_{\mathrm{FO}} \pm$ after S1 is closed. These parasitics mainly comprise the input parasitics of the first- and second-stage comparators and the routing ($\mathrm{C}_{\mathrm{p,comp3}}$, $\mathrm{C}_{\mathrm{p,comp7}}$, and $\mathrm{C}_{\mathrm{p,rout}}$). With ideal $\mathrm{V}_{\text {ref }} \pm$ ($3 \mathrm{~V}_{\mathrm{FS}} / 4$ and $\mathrm{V}_{\mathrm{FS}} / 4$), the dc voltage shift of the folding operation is $\mathrm{V}_{\mathrm{FS}} / 2$ [Fig. 10(a)], as mentioned previously in Section II. Due to attenuation, the voltage shift is altered and becomes

$$
\begin{align*}
\mathrm{Att}_{\mathrm{ref}} & =\left(\frac{C_{S}+C_{\mathrm{p}, \text {rout,1st}}}{C_{S}+C_{\mathrm{p}, \text {comp7}}+C_{\mathrm{p}, \text {comp3}}+C_{\mathrm{p}, \text {rout,1st\&2nd}}}\right) \\
V_{\mathrm{ref}, \mathrm{diff}} & =\left(\frac{3 V_{\mathrm{FS}}}{4}-\frac{V_{\mathrm{FS}}}{4}\right) \cdot \mathrm{Att}_{\mathrm{ref}} . \tag{8}
\end{align*}
$$

It is obvious that $\mathrm{Att}_{\mathrm{ref}}$ is less than 1; it is about $55 \%$ in the proposed design (the total parasitic, including the top-plate parasitic of $C_{S}$, $C_{\mathrm{p,comp7}}$, $C_{\mathrm{p,comp3}}$, $C_{\mathrm{p,rout,1st}}$, and $C_{\mathrm{p,rout,2nd}}$, is around 25 fF). As illustrated in Fig. 10(b), the error causes the folded output to exceed the quantization range of the second stage and introduces an incorrect dc level shift into the folding operation. The folding reference error only affects the folding operation in R1 and R4, since these regions involve a shifting operation, which makes it a signal-dependent error.

## D. Folding-Reference Error Calibration

Since the attenuation is around $55 \%$, the folding references have to be increased by the same amount in order to compensate for the reference error. Thus, $\mathrm{V}_{\mathrm{DD}}$ and Gnd can be directly adopted as $\mathrm{V}_{\mathrm{ref}} \pm$ to overcompensate this attenuation [Fig. 10(c)], which has the added advantage of removing the extra reference buffers for the folding references. The capacitance, however, varies with process and mismatch, so a compensating calibration is required. It is undesirable to implement the calibration at the reference end because $\mathrm{V}_{\text {ref }} \pm$ has a great effect on the conversion speed. Instead, a 3-b capacitor bank is inserted at the top plate of the sampling capacitance to adjust the dc level of the folded output. Fig. 11 shows the circuit schematic of the calibration DAC array $\mathrm{C}_{\text {cal, array }}$. The unit capacitance $C_{u}$ is 1 fF, implemented with a custom-designed fringe capacitor [17].

Fig. 13. Circuit schematic of the proposed comparator with embedded threshold.

Fig. 14. Circuit schematic of the encoder.

The conceptual signal behavior of the proposed calibration is illustrated in Fig. 12(a). It operates as follows. During the calibration period, the ADC quantizes as normal, and a test input in either R1 or R4 has to be applied in order to enable the shifting operation. For instance, $25 \mathrm{~V}_{\mathrm{FS}} / 32$ is selected in this design, which has an ideal folded output of $23 \mathrm{~V}_{\mathrm{FS}} / 32$ [Fig. 12(b)]. The 3-bit tuning code ($\mathrm{Code}_{\text {con }}$) enables capacitance in $\mathrm{C}_{\text {cal, array }}$ until the decision of the comparator with the $23 \mathrm{~V}_{\mathrm{FS}} / 32$ threshold flips. $\mathrm{C}_{\text {cal, array }}$ and its switches are implemented on-chip, while the tuning control is brought off-chip.
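The foreground search described above can be sketched as a simple loop that steps the tuning code until the monitored comparator changes its decision. This is a minimal behavioral sketch, not the authors' off-chip control code: `comparator_decision` is a hypothetical hook, and the toy model assumes that each code step lowers the folded output by a fixed 10 mV.

```python
def calibrate_folding_reference(comparator_decision, n_bits=3):
    """Step the tuning code up until the monitored comparator flips.

    comparator_decision(code) -> bool is a hypothetical hook that returns the
    decision of the comparator with the 23*V_FS/32 threshold for a given code.
    Returns the first code whose decision differs from that at code 0, or the
    maximum code if the decision never flips.
    """
    max_code = 2 ** n_bits - 1
    initial = comparator_decision(0)
    for code in range(1, max_code + 1):
        if comparator_decision(code) != initial:
            return code
    return max_code


def make_model_comparator(error_v, step_v=0.010, v_fs=1.0):
    """Toy model (assumption): each code lowers the folded output by step_v."""
    threshold = 23 * v_fs / 32

    def decision(code):
        folded = threshold + error_v - code * step_v
        return folded > threshold

    return decision


if __name__ == "__main__":
    # A 35-mV residual error needs four 10-mV steps to cross the threshold.
    print(calibrate_folding_reference(make_model_comparator(error_v=0.035)))
```

Because the references are deliberately overcompensated, a one-directional search of this kind is sufficient in the model; a residual error larger than the bank can absorb simply saturates at the maximum code.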

The calibration range of the proposed scheme is estimated from layout extraction and post-layout simulation. Since calibration is in a single direction, the parasitic is guaranteed to have less than $50 \%$ attenuation toward $V_{\text {ref }} \pm$ over PVT corners. $C_{\mathrm{S}}$ varies by about $\pm 20 \%$ between the maximum and minimum layout extractions, while the parasitic capacitances vary by around $\pm 16 \%$. As a result, 7 fF of $\mathrm{C}_{\text {cal, array }}$ has to be enabled in order to compensate the capacitance deviation in the worst case (when $C_{\mathrm{S}}$ is 36 fF and the parasitic is 29 fF). This in turn determines the calibration range and sets the minimum capacitance boundary of $C_{\mathrm{S}}$ (30 fF). The calibration resolution can then be defined as follows:

$$
\begin{align*}
& \Delta A t t_{\text {step }} \\
& =\left(\frac{C_{S}+C_{\mathrm{p}, \text { rout }, 1 \mathrm{st}}}{C_{S}+C_{\mathrm{p}, \mathrm{comp} 7}+C_{\mathrm{p}, \text { comp3 }}+C_{\mathrm{p}, \text { rout,1st\&2nd }}+\Delta C_{\mathrm{cal}, \text { array }}}\right) \tag{9}
\end{align*}
$$

Every count of $\mathrm{Code}_{\text {con }}$ contributes a $10-\mathrm{mV}$ voltage shift at the folded output, which is close to $1 \%$ of attenuation. Considering the signal loss in the second stage, the calibration step is below $1 / 2$ LSB of the converter. Since the accuracy of the folding-reference calibration depends on the decision of the comparator, offset calibration is executed first during the calibration period.
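The roughly $1 \%$ (about 10 mV) step per code can be reproduced with a small calculation based on (9), reading the step as the change in attenuation between successive codes. The capacitance values below are illustrative assumptions (a 30-fF $C_S$ and a lumped 25-fF parasitic), and the reference span is taken as the 1-V supply.

```python
C_U_FF = 1.0  # unit calibration capacitance from the text (1 fF)


def att_with_cal(code, c_s=30.0, cp_total=25.0, cp_rout_1st=0.5):
    """Attenuation with `code` unit capacitors enabled in C_cal,array.

    c_s, cp_total and cp_rout_1st (fF) are illustrative assumptions;
    cp_total lumps the comparator and routing parasitics together.
    """
    return (c_s + cp_rout_1st) / (c_s + cp_total + code * C_U_FF)


def att_steps(n_bits=3):
    """Attenuation change caused by each successive code increment."""
    atts = [att_with_cal(code) for code in range(2 ** n_bits)]
    return [a - b for a, b in zip(atts, atts[1:])]


STEPS = att_steps()
V_REF_SPAN = 1.0             # V_DD - Gnd at a 1-V supply
HALF_LSB = 1.0 / 2 ** 5 / 2  # half LSB, assuming a 1-V full scale and 5 bits

if __name__ == "__main__":
    for code, step in enumerate(STEPS, start=1):
        print(f"code {code}: {100 * step:.2f} % -> "
              f"{1e3 * step * V_REF_SPAN:.1f} mV")
```

Under these assumptions every step stays between 0.8 % and 1.0 %, that is, below 10 mV and comfortably under half an LSB.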

## E. Other Nonlinearity

Since the proposed folding scheme adopts switched-capacitor circuits and uses the supply as the reference to generate the residue, the supply variation and the nonlinear capacitance of the folding switches affect the folding accuracy. In this design, since the total power of the ADC is only $595 \mu \mathrm{~W}$ and there is on-chip decoupling at the supply, the supply variation is suppressed to under 1 mV, which is much less than half an LSB of this ADC. In addition, the nonlinear capacitance of the folding switches introduces a signal-dependent gain loss on the residue. The loss depends on the sizes of the sampling capacitor and the folding switches. As the folding switch is a $0.27 \mu \mathrm{~m} / 0.06 \mu \mathrm{~m}$ (width/length) transmission gate with $1: 1$ PMOS/NMOS sizing, the maximum nonlinear capacitance variation (mainly from the gate-drain $C_{\mathrm{gd}}$ and gate-source $C_{\mathrm{gs}}$) is only 140 aF, which corresponds to less than $0.5 \%$ voltage variation on the folding residue and is acceptable in this 5-bit design.

Fig. 15. Chip micrograph.

## IV. Circuit Implementation

A prototype ADC was fabricated in $65-\mathrm{nm}$ CMOS to evaluate the proposed folding scheme along with its calibration. Here, we describe the comparator, its threshold generation, and the encoder briefly.

## A. Embedded Thresholds Comparator

Conventional flash ADCs usually require a resistive ladder to generate threshold voltages for the comparators. This ladder not only consumes a significant amount of static power in high-speed operation but also occupies a large area because of the routing. In the current design, to save power and area, the thresholds of the comparators are self-embedded, thus avoiding the ladder. Different approaches ([16] and [13]) exist to change the trip point of the comparator. In [16], different thresholds are obtained by intentionally unbalancing the input differential pair. However, the size of the differential pair is usually small for low-resolution ADCs with offset calibration [18], [19]. Thus, either this method is not applicable or the size of the input pair has to be undesirably increased, which leads to larger area and more power consumption. In [13], different unbalanced loads are induced at the comparator's outputs in order to vary the trip point. However, for a large threshold voltage, this load becomes significant and causes the comparator to fail to operate at high speed.

Fig. 16. Block diagram of the measurement setup.

Fig. 13 illustrates the proposed comparator's circuit topology, which includes a one-transistor embedded-threshold technique. The threshold of each comparator is generated by properly sizing the unbalancing transistor $M_{\text {REF }}$, whose gate is connected to $V_{D D}$. Assuming first that the currents in both branches are well balanced, the relationship between the size of $\mathrm{M}_{\text {REF }}$ and the comparator's threshold can be derived as follows:

$$
\begin{equation*}
I_{\mathrm{REF}}=I_{2}-I_{1} \tag{10}
\end{equation*}
$$

with $\mathrm{M}_{\text {REF }}$, M1, and M2 all operating in the saturation region in this instance:

$$
\begin{align*}
\frac{1}{2} \mu C_{\mathrm{ox}} \frac{W_{\mathrm{REF}}}{L_{\mathrm{REF}}}\left[V_{\mathrm{DD}}-V_{\mathrm{th}}\right]^{2}= & \left(\frac{1}{2} \mu C_{\mathrm{ox}} \frac{W_{2}}{L_{2}}\left[\left(V_{\mathrm{in}+}\right)-V_{\mathrm{th}}\right]^{2}\right) \\
& -\left(\frac{1}{2} \mu C_{\mathrm{ox}} \frac{W_{1}}{L_{1}}\left[\left(V_{\mathrm{in}-}\right)-V_{\mathrm{th}}\right]^{2}\right) \tag{11}
\end{align*}
$$

Therefore, the threshold voltage of the comparator (with M1 and M2 having identical length and width) is given by

$$
\begin{align*}
\left(V_{\mathrm{in}+}\right)-\left(V_{\mathrm{in}-}\right) & =V_{\mathrm{threshold}} \\
& =\sqrt{\frac{W_{\mathrm{MREF}} / L_{\mathrm{MREF}}}{W_{M 1} / L_{M 1}}}\left(V_{\mathrm{DD}}-V_{\mathrm{th}}\right) \tag{12}
\end{align*}
$$

Since the overdrive voltage of $M_{R E F}$ is large $\left(V_{D D}-V_{t h}\right)$, an unbalance can be easily generated with a small size of $\mathrm{M}_{\text {REF }}$ which reduces the speed and area penalty of adding an extra transistor. Even though this technique may degrade the power supply rejection ratio (PSRR) of the comparator circuit, it is well suited for low-resolution and high-speed ADCs.
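To give a feel for how small $\mathrm{M}_{\text {REF }}$ can be, (12) can be evaluated and inverted numerically. The sketch below is an illustration under stated assumptions: the transistor threshold of 0.4 V and the input-pair W/L of 10 are hypothetical values, not figures from the design.

```python
import math


def threshold_from_sizing(wl_ref, wl_in, v_dd, v_th):
    """Comparator threshold per (12) for the given W/L ratios."""
    return math.sqrt(wl_ref / wl_in) * (v_dd - v_th)


def sizing_for_threshold(v_threshold, wl_in, v_dd, v_th):
    """Invert (12): W/L of M_REF needed for a target comparator threshold."""
    return wl_in * (v_threshold / (v_dd - v_th)) ** 2


# Illustrative numbers only: V_th and the input-pair W/L are assumed.
V_DD, V_TH, WL_IN = 1.0, 0.4, 10.0
TARGET = 1.0 / 32  # one LSB, assuming a 1-V full scale and 5 bits
WL_REF = sizing_for_threshold(TARGET, WL_IN, V_DD, V_TH)

if __name__ == "__main__":
    print(f"W/L of M_REF = {WL_REF:.4f} for a {1e3 * TARGET:.2f}-mV threshold")
```

Because the threshold scales with the square root of the sizing ratio, the required $\mathrm{M}_{\text {REF }}$ comes out far smaller than the input pair in this example, in line with the argument about the large overdrive voltage.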

Based on (12), the comparator threshold depends on the supply voltage, the sizing ratio between $\mathrm{M}_{1}$ and $\mathrm{M}_{\mathrm{REF}}$, and the transistor threshold of $\mathrm{M}_{\text {REF }}$, all of which are subject to process and mismatch variation. Since the circuit is not symmetrical for the differential implementation, it experiences a systematic offset even with process variation alone. Transistor $\mathrm{M}_{\text {REF }}$ induces an unbalance that differs across corners, which leads to a larger offset variation than in the symmetrical setup (Fig. 9). Thanks to the calibration, a certain amount of supply ($\pm 20 \%$), process, and mismatch variation can also be compensated.


Fig. 17. Measured 10-chip SNDR with low input frequency and $1.25 \mathrm{GS} / \mathrm{s}$ before and after calibration.

## B. Encoder

After the comparators' decisions are stored in the registers, they pass through an encoder to be converted into the binary ADC output. Since the accuracy of the comparators is ensured by the calibration, bubble-error correction is not required. A circuit schematic of the encoder chosen on this basis is shown in Fig. 14. Here, out(X) represents the decision of the comparator with a threshold voltage of $\mathrm{X} \cdot \mathrm{V}_{\mathrm{FS}}$. The encoder requires two layers of XOR gates, due to the folding factor of 4, and the adopted thermometer-to-binary encoder [20]. It is mainly composed of multiplexers, which consume little power and are suitable for 2-4-bit conversion. Furthermore, the encoder must ensure a logic decision consistent with the folding operation in order to determine the correct ADC output. In the proposed prototype, the encoder is implemented off-chip for measurement purposes. Its power consumption, estimated with gate-level synthesis, is less than $5 \%$ of the total ADC power.


Fig. 18. Measured DNL and INL. (a) Before and (b) after calibration.

## V. Measurement Results

The 5-bit ADC prototype was fabricated in a $65-\mathrm{nm}$ 1P7M digital CMOS process. Fig. 15 shows the micrograph of the chip, whose active area is $0.007 \mathrm{~mm}^{2}$, including the ADC core and the on-chip calibration. CMOS devices with standard $\mathrm{V}_{\mathrm{T}}$ and a regular supply voltage of 1 V are used in this design. The ADC has a full-scale input range of $1 V_{\mathrm{pp}}$ differential and an input capacitance of 80 fF, including parasitic, complementary, and calibration capacitances. The block diagram of the measurement setup is shown in Fig. 16; it consists of two high-frequency signal generators, a power supply, a PCB, a logic analyzer, and a computer. The signal generators provide the input and clock signals to the prototype. Since LVDS is not available, the outputs are decimated by a factor of 125 on-chip to reduce the input-output coupling and IO supply noise. All ten comparator outputs are brought off-chip for measurement purposes and are captured with the logic analyzer, and the 3-bit tuning code is controlled from off-chip. Folding-reference calibration is executed in the foreground after threshold calibration.

A total of ten chips were measured, and their SNDR at a conversion rate of 1.25 GS/s with a low-frequency input, before and after calibration, is illustrated in Fig. 17. The mean SNDR after calibration is 30.7 dB at low input frequency. A chip (\#8) with the mean SNDR is picked to report the following results. To characterize the static performance, the DNL and INL before and after calibration are shown in Fig. 18(a) and (b), respectively. It is observed that the DNL characteristic is rather symmetrical because of the folding nature. Without calibration, the DNL and INL are $+1.97 /-1$ LSB and $+1.52 /-0.5$ LSB, respectively. The worst-case DNL and INL occur at around the $1 / 4$ and $3 / 4$ positions and are caused by distortion in the folding operation, as explained earlier in Section II-B. After calibration, the DNL and INL improve to $+0.67 /-0.47$ LSB and $+0.47 /-0.43$ LSB, respectively. After 24 hours of chip run time at room temperature, the DNL and INL drift to $+0.77 /-0.33$ LSB and $+0.66 /-0.56$ LSB, respectively. The experimental results verify that the proposed calibration schemes ensure a low DNL and INL in this architecture.

Fig. 19. Measured SNDR versus conversion rate with low frequency input.

Fig. 20. Measured SNDR and SFDR versus input frequency.

Fig. 21. ADC output spectrum with 1024 samples for a $1.25-\mathrm{GS} / \mathrm{s}$ and Nyquist input (output decimated by 125 times).

The dynamic performance of the design is investigated in several ways. First, the clock frequency is swept from 500 MHz to 1.5 GHz with a low input frequency, as illustrated in Fig. 19, where the SNDR is above 30.7 dB from $500 \mathrm{MS} / \mathrm{s}$ to $1.25 \mathrm{GS} / \mathrm{s}$ and drops rapidly after $1.3 \mathrm{GS} / \mathrm{s}$ due to logic timing failure. Fig. 20 depicts the SNDR and SFDR from low input frequency up to the Nyquist input at a conversion rate of 1.25 GS/s. The $-3-\mathrm{dB}$ point is located at 630 MHz, which indicates the effective resolution bandwidth (ERBW) of the prototype. Fig. 21 shows the ADC output spectrum with a Nyquist-rate input after calibration. The measured signal-to-noise ratio (SNR) is 29.87 dB while the total harmonic distortion (THD) is -32.59 dB, which implies that noise is the main limitation on the SNDR of 28.07 dB. The improvement from each calibration is indicated in Fig. 22. With the low-frequency input, the SNDR is enhanced from 26.02 dB [Fig. 22(a)] to 28.71 dB [Fig. 22(b)], and finally to 30.7 dB [Fig. 22(c)], with no calibration, threshold calibration only, and all calibrations, respectively.
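The statement that noise dominates at the Nyquist input can be checked by combining the reported SNR and THD in the power domain, using the standard relation that noise and distortion powers add. The sketch below uses only the two measured values quoted above; the function name is illustrative.

```python
import math


def sndr_from_snr_thd(snr_db, thd_db):
    """Combine SNR and THD (both in dB, THD negative) into SNDR in dB."""
    noise = 10 ** (-snr_db / 10)      # noise power relative to the signal
    distortion = 10 ** (thd_db / 10)  # harmonic power relative to the signal
    return -10 * math.log10(noise + distortion)


NOISE_REL = 10 ** (-29.87 / 10)
DIST_REL = 10 ** (-32.59 / 10)
SNDR_DB = sndr_from_snr_thd(29.87, -32.59)

if __name__ == "__main__":
    print(f"SNDR = {SNDR_DB:.2f} dB, noise/distortion power ratio = "
          f"{NOISE_REL / DIST_REL:.2f}")
```

The combined value lands within about 0.1 dB of the reported 28.07 dB, and the noise power is almost twice the harmonic power, consistent with noise being the main limitation.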
The total power consumption is $595 \mu \mathrm{~W}$ at a $1-\mathrm{V}$ supply. The analog power, including the comparators and the sampling network, is $232 \mu \mathrm{~W}$, and the digital power, including the on-chip calibrations, logic, encoder, and clock generator, is $363 \mu \mathrm{~W}$. The calculated FoM, defined as

$$
\begin{equation*}
\mathrm{FoM}=\frac{\text { Power }}{2^{\mathrm{ENOB@DC}} \times \min \left(f_{s}, 2 \times \mathrm{ERBW}\right)} \tag{13}
\end{equation*}
$$

is $17 \mathrm{fJ} /$ conversion-step, where $f_{s}$ is the sampling frequency and ENOB is the effective number of bits. The performance of the prototype ADC is summarized in Table I. Table II compares this work with state-of-the-art ADCs. Compared with converters with gigahertz sampling rates, the proposed architecture achieves the lowest FoM and has the smallest area.

Fig. 22. ADC output spectra with 0.4 MHz input frequency (output decimated by 125 times): (a) without any calibration, (b) with only threshold calibration, and (c) with all the calibrations.

TABLE I
Summary of Performance

| Technology | $65-\mathrm{nm}$ CMOS |
| :--- | ---: |
| Area | $0.007 \mathrm{~mm}^{2}$ |
| Sampling Rate | $1.25 \mathrm{GS} / \mathrm{s}$ |
| INL | $<0.5 \mathrm{LSB}$ |
| DNL | $<0.7 \mathrm{LSB}$ |
| SNDR @ DC | 30.7 dB |
| SNDR @ Nyquist | 28.07 dB |
| ERBW | 630 MHz |
| Power consumption | $595 \mu \mathrm{~W}$ |
| Input range | $1 \mathrm{~V}_{\mathrm{pp}}$ differential |
| Input cap. (single-ended) | 80 fF |
| FoM | $17 \mathrm{fJ} /$ Conv.-step |
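The reported figure of merit can be reproduced from (13) and the measured values. The sketch below converts the low-frequency SNDR to ENOB with the standard relation ENOB = (SNDR - 1.76)/6.02 and then evaluates (13); the function names are illustrative.

```python
def enob_from_sndr(sndr_db):
    """Effective number of bits from SNDR in dB (standard sine-wave relation)."""
    return (sndr_db - 1.76) / 6.02


def fom_joules_per_step(power_w, enob, f_s, erbw):
    """Figure of merit per (13), in joules per conversion-step."""
    return power_w / (2 ** enob * min(f_s, 2 * erbw))


ENOB_DC = enob_from_sndr(30.7)  # mean low-frequency SNDR of 30.7 dB
FOM_FJ = fom_joules_per_step(595e-6, ENOB_DC, 1.25e9, 630e6) * 1e15

if __name__ == "__main__":
    print(f"ENOB@DC = {ENOB_DC:.2f} bit, FoM = {FOM_FJ:.1f} fJ/conversion-step")
```

Since twice the 630-MHz ERBW slightly exceeds the 1.25-GS/s sampling rate, the $\min (\cdot)$ term in (13) evaluates to $f_{s}$, and the result is approximately 17 fJ/conversion-step.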

TABLE II
Benchmark With the State-of-the-Art

| Specifications | JSSC'09 [15] | ISSCC'09 [14] | JSSC'10 [8] | CICC'10 [21] | This Work* |
| :---: | :---: | :---: | :---: | :---: | :---: |
| Architecture | Folding Flash | Binary Search | 4x TI Binary Search | SAR | Folding Flash |
| Technology (nm) | 90 | 65 | 40 | 40 | 65 |
| Supply Voltage (V) | 1.0 | 1.0 | 1.1 | 1.0 | 1.0 |
| Sampling Cap. (fF) | 200 | 1000 | 200 | N/A | 80 |
| Sampling Rate (GS/s) | 1.75 | 0.8 | 2.2 | 1.25 | 1.25 |
| Resolution (bit) | 5 | 5 | 6 | 6 | 5 |
| ENOB @ DC (bit) | 4.67 | 4.4 | 4.95 | 4.99 | 4.8 |
| ENOB @ Nyquist (bit) | 4.28 | 4.18 | 4.87 | 4.2 | 4.37 |
| Power ( $\mu \mathrm{W}$ ) | 2200 | 1970 | 2600 | 6080 | 595 |
| FoM @ DC (fJ/Conv.-step) | 50 | 116 | 38 | 178 | 17 |
| FoM @ Nyquist (fJ/Conv.-step) | 64.7 | 136 | 40 | 265 | 23 |
| Core Area ( $\mathrm{mm}^{2}$ ) | 0.0165 | 0.018 | 0.03 | 0.014 | 0.007 |

* Results are obtained from the sample with mean performance among 10 chips.


## VI. Conclusion

A 5-bit $1.25-\mathrm{GS} / \mathrm{s} 4 \mathrm{x}$-capacitive-folding flash ADC has been presented. The power-hungry resistive ladder and folding amplifier have been removed by employing various techniques. While only dynamic power is consumed by the comparators and the folding operation, linearity is ensured by the proposed calibration schemes. The reduced number of comparators and the simple folding logic enable a very compact ADC design that even includes on-chip calibration. This not only diminishes the design cost but also allows for implementation at very low power consumption. The prototype ADC draws only $595 \mu \mathrm{~W}$ from the $1-\mathrm{V}$ supply at a conversion rate of 1.25 GS/s and has an FoM of $17 \mathrm{fJ} /$ conversion-step.

## Appendix I

The settling time and power of a conventional two-step flash ADC with switched-capacitor DAC switching and of the proposed architecture are calculated assuming a $40-\mathrm{fF}$ sampling capacitor with a bottom-plate parasitic of $10 \%$ (the $10 \%$ figure is obtained from layout extraction).

## A. Speed Comparison

The conventional and proposed switching methods are shown in Fig. 23 (in single-ended configuration). In the sampling phase $\Phi_{\mathrm{S}}$, the bottom plates of both the conventional and the proposed DAC are connected to ground (Gnd). During residue generation, the bottom plates of 2C and C in the conventional method are either charged to $\mathrm{V}_{\text {ref }}$ or kept at Gnd, depending on the coarse ADC's decision, while the proposed scheme either shifts the input by $\mathrm{V}_{\text {ref }}$ or keeps the bottom plate at Gnd. Since the conventional method needs to charge the MSB and MSB/2 capacitors in the DAC array (Fig. 23), its worst-case settling time occurs in the "10" case. The proposed scheme, on the other hand, only needs to charge the bottom-plate parasitic for the voltage-shift operation, so its worst case is charging only the 4-fF parasitic. A simulation comparison has been performed based on the setup shown in Fig. 23. With bottom-plate switches of the same size in the proposed and conventional methods (DAC bottom-plate switches), the simulation result in Fig. 24 indicates that the proposed operation is two times faster (when both are in the worst settling condition).

Fig. 23. Conventional two-step and proposed folding DAC switching operation.

## B. Power Comparison

The power of the residue generation in the four cases is illustrated in Fig. 25. Since the switching energy for the parasitic in the conventional method is small, it is excluded from the calculation. The proposed method has two cases without any switching power because only selection, with no voltage shifting, is performed, and energy is consumed only on the parasitic in the voltage-shift operation. This switching power is the total power for residue generation in the proposed scheme, since the switch selection does not consume significant power. On the other hand, the conventional method needs to charge either the MSB or the MSB/2 capacitor in three cases. The average switching power of the conventional method is around three times higher than that of the proposed one.

Fig. 24. Conventional and proposed normalized residue output settling (both in worst case).

|  | Conventional Two-step |  |  | Proposed Folding |  |
| :---: | :---: | :---: | :---: | :---: | :---: |
| $\mathrm{C}_{\mathrm{p3}}=2/5\,\mathrm{C}$ | S4 | S3 | Switching Energy | Sf | Switching Energy |
| Sampling | 0 | 0 | 0 | 0 | 0 |
| $0<\mathrm{V}_{\mathrm{in}}<\mathrm{V}_{\mathrm{FS}}/4$ | 0 | 0 | 0 | 1 | $\mathrm{C}_{\mathrm{p3}} \mathrm{V}_{\mathrm{ref}}^{2}$ |
| $\mathrm{V}_{\mathrm{FS}}/4<\mathrm{V}_{\mathrm{in}}<\mathrm{V}_{\mathrm{FS}}/2$ | 0 | 1 | $3/4\,\mathrm{CV}_{\mathrm{ref}}^{2}$ | 0 | 0 |
| $\mathrm{V}_{\mathrm{FS}}/2<\mathrm{V}_{\mathrm{in}}<3\mathrm{V}_{\mathrm{FS}}/4$ | 1 | 0 | $\mathrm{CV}_{\mathrm{ref}}^{2}$ | 0 | 0 |
| $3\mathrm{V}_{\mathrm{FS}}/4<\mathrm{V}_{\mathrm{in}}<\mathrm{V}_{\mathrm{FS}}$ | 1 | 1 | $3/4\,\mathrm{CV}_{\mathrm{ref}}^{2}$ | 1 | $\mathrm{C}_{\mathrm{p3}} \mathrm{V}_{\mathrm{ref}}^{2}$ |
| Average Switching Energy |  |  | $5/8\,\mathrm{CV}_{\mathrm{ref}}^{2}$ |  | $1/5\,\mathrm{CV}_{\mathrm{ref}}^{2}$ |

Fig. 25. Conventional two-step and proposed folding energy for residue generation.
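The averages in Fig. 25 and the "around three times" ratio can be verified with exact fractions. The sketch below reads the per-region energies from Fig. 25 in units of $\mathrm{CV}_{\mathrm{ref}}^{2}$ and assumes, as the averaging in the figure implies, that the four input regions are equally likely.

```python
from fractions import Fraction

# Energies per input region in units of C*V_ref^2, read from Fig. 25.
CP3 = Fraction(2, 5)  # bottom-plate parasitic C_p3 = 2/5 C
CONVENTIONAL = [Fraction(0), Fraction(3, 4), Fraction(1), Fraction(3, 4)]
PROPOSED = [CP3, Fraction(0), Fraction(0), CP3]


def average(energies):
    """Mean switching energy, assuming equally likely input regions."""
    return sum(energies, Fraction(0)) / len(energies)


AVG_CONV = average(CONVENTIONAL)
AVG_PROP = average(PROPOSED)
RATIO = AVG_CONV / AVG_PROP

if __name__ == "__main__":
    print(f"conventional = {AVG_CONV}, proposed = {AVG_PROP}, "
          f"ratio = {float(RATIO):.3f}")
```

The exact ratio is 25/8, or about 3.1, which matches the stated factor of roughly three.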

## Acknowledgment

The authors would like to express their sincere appreciation to H. Venkatesan for proofreading.

## References

[1] Y. Zhu, C.-H. Chan, S.-W. Sin, S.-P. U, and R. P. Martins, "34 fJ 10b $500 \mathrm{MS} / \mathrm{s}$ partial-interleaving pipelined SAR ADC," in VLSI Circuits Dig. Tech. Papers, Jun. 2012, to appear.
[2] Y. Nakajima, A. Sakaguchi, T. Ohkido, N. Kato, T. Matsumoto, and M. Yotsuyanagi, "A background self-calibrated 6 b 2.7 GS/s ADC with cascade-calibrated folding-interpolating architecture," IEEE J. Solid-State Circuits, vol. 45, no. 4, pp. 707-718, Apr. 2010.
[3] R. C. Taft, P. A. Francese, M. R. Tursi, O. Hidri, A. MacKenzie, T. Höhn, P. Schmitz, H. Werker, and A. Glenny, "A 1.8 V 1.0 GS/s 10 b self-calibrating unified-folding-interpolating ADC with 9.1 ENOB at Nyquist frequency," IEEE J. Solid-State Circuits, vol. 44, no. 12, pp. 3294-3304, Dec. 2009.
[4] C.-C. Hsu, C.-C. Huang, Y.-H. Lin, and C.-C. Lee, "10 b $200 \mathrm{MS} / \mathrm{s}$ pipelined folding ADC with offset calibration," in Proc. IEEE Eur. Solid State Circuits Conf., Sep. 2007, pp. 151-154.
[5] R. van de Plassche, CMOS Integrated Analog-to-Digital and Digital-to-Analog Converters, 2nd ed. Dordrecht, The Netherlands: Kluwer Academic, 2003.
[6] S. Limotyrakis, K. Nam, and B. Wooley, "Analysis and simulation of distortion in folding and interpolating A/D converters," IEEE Trans. Circuits Syst. II, Digit. Signal Process., vol. 49, no. 3, pp. 161-169, Mar. 2002.
[7] X. Yuan, T. Shimizu, U. Mahalingam, J. S. Brown, K. Z. Habib, D. G. Tekleab, T.-C. Su, S. Satadru, C. M. Olsen, H. Lee, L.-H. Pan, T. B. Hook, J.-P. Han, J.-E. Park, M.-H. Na, and K. Rim, "Transistor mismatch properties in deep-submicrometer CMOS technologies," IEEE Trans. Electron Devices, vol. 58, no. 2, pp. 335-342, Feb. 2011.
[8] B. Verbruggen, J. Craninckx, M. Kuijk, P. Wambacq, and G. Van der Plas, "A 2.6 mW 6 bit $2.2 \mathrm{GS} / \mathrm{s}$ fully dynamic pipeline ADC in 40 nm digital CMOS," IEEE J. Solid-State Circuits, vol. 45, no. 10, pp. 2080-2090, Oct. 2010.
[9] C. Sander, M. Clara, A. Hartig, and F. Kuttner, "A 6-bit 1.2-GS/s low-power flash-ADC in $0.13 \mu \mathrm{~m}$ digital CMOS technology," IEEE J. Solid-State Circuits, vol. 40, no. 7, pp. 1499-1505, Jul. 2005.
[10] A. Ismail and M. Elmasry, "A 6-Bit 1.6-GS/s low-power wideband flash ADC converter in 0.13- $\mu \mathrm{m}$ CMOS technology," IEEE J. Solid-State Circuits, vol. 43, no. 9, pp. 1982-1990, Sep. 2008.
[11] M. El-Chammas and B. Murmann, Background Calibration of Time-Interleaved Data Converters. New York, NY, USA: Springer, 2012.
[12] S.-S. Wong, U.-F. Chio, C.-H. Chan, H.-L. Choi, S.-W. Sin, S.-P. U, and R. P. Martins, "A 4.8-bit ENOB 5-bit $500 \mathrm{MS} /$ s binary-search ADC with minimized number of comparators," in Proc. IEEE Asian Solid State Circuits Conf., Feb. 2011, pp. 73-76.
[13] G. Van der Plas and B. Verbruggen, "A $150 \mathrm{MS} / \mathrm{s} 133 \mu \mathrm{~W} 7$ b ADC in 90 nm digital CMOS using a comparator-based asynchronous binary-search sub-ADC," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2008, pp. 242-243.
[14] Y.-Z. Lin, S.-J. Chang, Y.-T. Liu, C.-C. Liu, and G.-Y. Huang, "A 5 b $800 \mathrm{MS} / \mathrm{s} 2 \mathrm{~mW}$ asynchronous binary-search ADC in 65 nm CMOS," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2009, pp. 80-81.
[15] B. Verbruggen, J. Craninckx, M. Kuijk, P. Wambacq, and G. Van der Plas, "A $2.2 \mathrm{~mW} 1.75 \mathrm{GS} / \mathrm{s} 5$ bit folding flash ADC in 90 nm digital CMOS," IEEE J. Solid-State Circuits, vol. 44, no. 3, pp. 874-882, Mar. 2009.
[16] G. Van der Plas, S. Decoutere, and S. Donnay, "A 0.16 pJ/Conversion-Step $2.5 \mathrm{~mW} 1.25 \mathrm{GS} / \mathrm{s} 4 \mathrm{~b}$ ADC in a 90 nm digital CMOS process," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2006, pp. 566-567.
[17] C.-H. Chan, Y. Zhu, S.-W. Sin, S.-P. U, and R. P. Martins, "A 3.8 $\mathrm{mW} 8 \mathrm{~b} 1 \mathrm{GS} / \mathrm{s} 2 \mathrm{~b} /$ cycle interleaving SAR ADC with compact DAC structure," in VLSI Circuits Dig. Tech. Papers, 2012, pp. 86-87.
[18] C.-Y. Chen, M. Q. Le, and K. Y. Kim, "A low power 6-bit flash ADC with reference voltage and common-mode calibration," IEEE J. Solid-State Circuits, vol. 44, no. 4, pp. 1041-1046, Apr. 2009.
[19] M. El-Chammas, B. Murmann, and K. Y. Kim, "A 12-GS/s 81-mW 5-bit time-interleaved flash ADC with background timing skew calibration," IEEE J. Solid-State Circuits, vol. 46, no. 4, pp. 838-847, Apr. 2011.
[20] E. Sall, M. Vesterbacka, and K. O. Andersson, "A study of digital decoders in flash analog-to-digital converters," in Proc. IEEE Int. Symp. Circuits Syst., Vancouver, BC, Canada, May 23-26, 2004, vol. 1, pp. I-129-I-132.
[21] W. Liu, F. Y. Zhong, C. Zhong, and P. Y. Chiang, "Single-channel, $1.25-\mathrm{GS} / \mathrm{s}$, 6-bit, loop-unrolled asynchronous SAR-ADC in 40 nm-CMOS," in Proc. IEEE Custom Integr. Circuits Conf., Sep. 2010, pp. 1-4.


Chi-Hang Chan (S'12) was born in Macau, China, in 1985. He received the B.S. degree in electrical engineering from the University of Washington, Seattle, WA, USA, in 2008, and the M.S. degree from the University of Macau, Macao, China, in 2012, where he is currently working toward the Ph.D. degree.

He was an Intern with Synopsys-Chipidea Microelectronics, Macau, China, during his undergraduate studies. Currently, his research mainly focuses on comparator offset calibration and on flash and multi-bit SAR ADCs. His research interests include Nyquist ADCs and mixed-signal circuits.

Mr. Chan was the recipient of the Chipidea Microelectronics Prize and the Macao Scientific and Technological R\&D for Postgraduates Award-Postgraduate Level in 2012 and 2011, respectively, for outstanding academic and research achievements in microelectronics, as well as travel grant support for the 2012 VLSI Symposium. He is also a co-recipient of the 2011 ISSCC Silk Road Award and the Student Design Contest Award at A-SSCC 2011.


Yan Zhu (S'10-M'12) received the B.Sc. degree in electrical engineering and automation from Shanghai University, Shanghai, China, in 2006, and the M.Sc. and Ph.D. degrees in electrical and electronics engineering from the University of Macau, Macao, China, in 2009 and 2011, respectively.

She is now a Postdoctoral Researcher with the State Key Laboratory of Analog and Mixed-Signal VLSI, University of Macau, Macao, China. She has authored and coauthored more than 20 technical journals and conference papers in her field of interests and holds four U.S. patents. Her research interests include low-power and wideband high-speed Nyquist A/D converters as well as digitally assisted data converter designs.

Dr. Zhu was the recipient of the Chipidea Microelectronics Prize and Macao Scientific and Technological R\&D for Postgraduates Award-Postgraduate Level in 2012 for outstanding Academic and Research achievements in Microelectronics, as well as the Student Design Contest award in A-SSCC 2011.


Sai-Weng Sin (S'98-M'06-SM'13) received the B.Sc., M.Sc., and Ph.D. degrees (with highest honor) in electrical and electronics engineering from University of Macau, Macao, China, in 2001, 2003, and 2008, respectively.

He is currently an Assistant Professor with the Faculty of Science and Technology, University of Macau, Macao, China, and is the Coordinator of the Data Conversion and Signal Processing (DCSP) Research Line in State-Key Laboratory of Analog and Mixed-Signal VLSI, University of Macau. He has authored one book entitled Generalized Low-Voltage Circuit Techniques for Very High-Speed Time-Interleaved Analog-to-Digital Converters (Springer, 2011), holds four U.S. patents, and over 80 technical journals and conference papers in the field of high-performance data converters and analog mixed-signal integrated circuits.

Dr. Sin is/has been the member of Technical Program Committee of 2013 IEEE Asian Solid-State Circuits Conference, IEEE Sensors 2011 and IEEE RFIT 2011-2012 Conference, Review Committee Member of PrimeAsia 2009 Conference, Technical Program and Organization Committee of the 2004 IEEJ AVLSI Workshop, as well as the Special Session Co-Chair and Technical Program Committee Member of 2008 IEEE APCCAS Conference. He is currently the Secretary of IEEE Solid-State Circuit Society (SSCS) Macau Chapter (with 2012 IEEE SSCS World Chapter of the Year Award) and IEEE Macau CAS/COM Joint Chapter. He was the co-recipient of the 2011 ISSCC Silk Road Award, Student Design Contest Award in A-SSCC 2011 and the 2011 State Science and Technology Progress Award (second-class), China.


Seng-Pan U (S'94-M'00-SM'05) received the B.Sc. and M.Sc. degree (with highest honors) from the University of Macau, Macao, China, in 1991 and 1997, respectively, and the joint Ph.D. degree from the Instituto Superior Técnico, Technical University of Lisbon, Lisbon, Portugal, in 2002 in the field of high-speed analog IC design.

Dr. U has been with the Department of Electrical and Electronic Engineering, Faculty of Science and Technology (FST), University of Macau (UM), Macao, China, since 1994, where he is currently a Professor and Deputy Director of the State-Key Laboratory of Analog \& Mixed-Signal VLSI of UM. During 1999-2001, he was also on leave to the Integrated CAS Group, Center of Microsystems, IST/UTL, as a Visiting Research Fellow. In 2001, he cofounded Chipidea Microelectronics Ltd., Macau, China, where he was Engineering Director and, since 2003, the Corporate VP-IP Operations Asia Pacific and site General Manager, devoted to advanced analog and mixed-signal semiconductor IP (SIP) product development. The company was acquired in 2009 by the world-leading EDA and IP provider Synopsys Inc. (NASDAQ: SNPS) and is currently Synopsys Macau Limited. He is also the corporate Senior Analog Design Manager and Site General Manager. He has authored and coauthored more than 120 scientific papers in IEEE/IET journals and conferences. He coholds seven U.S. patents and has coauthored four books in the areas of VHF SC filters, analog baseband for multi-standard wireless transceivers, and very high-speed TI ADCs.

Dr. U is currently a Senior Member of the Industrial Relationship Officer of IEEE Macau Section, the Chairman IEEE SSC and CAS/COMM Macau chapter. He has been with technical review committee of various international scientific journals and conferences for many years, e.g. JSSC, TCAS, IEICE, ISCAS and etc. He was the chairman of the local organization committee of IEEJ AVLSIWS'04, the Technical Program co-Chair of IEEE APCCAS'08, ICICS'09 and PRIMEAsia' 11 . He is currently Technical Program Committee of RFIT, VLSI-DAT, A-SSCC and Editorial Board member of Journal Analog Integrated Circuits and Signal Processing. He was the recipient of various scholarships and R\&D grants. He has received $20+$ research and academic/teaching awards and is the advisor for $20+$ various international student paper award recipients, e.g., ISSCC Silk-Road Award, IEEE DAC/ISSCC Student Design Contest, ASSCC Student Design Contest, ISCAS, MWSCAS, PRIME, etc. He has served as the founding Chairman of IEEE SSC Macau Chapter received The 2012 Best SSC Chapter Award in ISSCC'13. He received both the 2012 Macau Science \& Technology Invention and Progress Award. Both at the 1st time from Macau and the Scientific and Technological Innovation Award of Ho Leung Ho Lee Foundation in 2010, and The State Scientific and Technological Progress Award in 2011. In recognition of his contribution in high-technology research \&industrial development in Macau, he was awarded by Macau SAR government the Honorary Title of Value in 2010.


Rui P. Martins (A'88-SM'99-F'08) was born on April 30, 1957. He received the B.S., M.S., and Ph.D. degrees and the Habilitation for Full Professor in electrical engineering and computers from the Department of Electrical and Computer Engineering, Instituto Superior Técnico, Technical University of Lisbon, Lisbon, Portugal, in 1980, 1985, 1992, and 2001, respectively.

He has been with the Department of Electrical and Computer Engineering (DECE)/Instituto Superior Técnico (IST), Technical University of Lisbon (TU of Lisbon), Lisbon, Portugal, since October 1980. Since 1992, he has been on leave from IST, TU of Lisbon, and is with the Department of Electrical and Computer Engineering, Faculty of Science and Technology (FST), University of Macau (UM), Macao, China, where he has been a Full Professor since 1998. At FST, he was the Dean of the Faculty from 1994 to 1997, and he has been Vice-Rector of the University of Macau since 1997. From September 2008, after the reform of the UM Charter, he was nominated after open international recruitment as Vice-Rector (Research) until August 31, 2013. Within the scope of his teaching and research activities, he has taught 21 bachelor and master courses and has supervised (or cosupervised) 26 Ph.D. and M.S. theses. He has authored and coauthored 12 books (coauthoring five and coediting seven), plus five book chapters, 266 refereed papers in scientific journals and conference proceedings, and 70 other academic works, for a total of 348 publications. He has also coauthored seven U.S. patents. He created the Analog and Mixed-Signal VLSI Research Laboratory of UM (http://www.fst.umac.mo/en/lab/ans_vlsi/website/index.html), which was elevated in January 2011 to a State Key Lab of China (the first in Engineering in Macao), and he is its Founding Director.

Prof. Martins was the Founding Chairman of the IEEE Macau Section from 2003 to 2005 and of the IEEE Macau Joint-Chapter on Circuits and Systems (CAS)/Communications (COM) from 2005 to 2008 [2009 World Chapter of the Year of the IEEE Circuits and Systems Society (CASS)]. He was the General Chair of the 2008 IEEE Asia-Pacific Conference on Circuits and Systems (APCCAS'2008), and was the Vice-President for Region 10 (Asia, Australia, the Pacific) of the IEEE Circuits and Systems Society (CASS) for the period 2009 to 2011. He is now the Vice-President (World) Regional Activities and Membership, also of the IEEE CAS Society, for the period 2012 to 2013. He has been an Associate Editor of the IEEE Transactions on Circuits and Systems II: Express Briefs since 2010, serving until the end of 2013. He is also a member of the IEEE CASS Fellow Evaluation Committee (Class of 2013). He was the recipient of two government decorations: the Medal of Professional Merit from the Macao Government (Portuguese Administration) in 1999, and the Honorary Title of Value from the Macao SAR Government (Chinese Administration) in 2001. In July 2010, he was elected, unanimously, as Corresponding Member of the Portuguese Academy of Sciences (in Lisbon), being the only Portuguese Academician living in Asia.


Franco Maloberti (A'84-SM'87-F'96) received the Laurea degree in physics (summa cum laude) from the University of Parma, Parma, Italy, in 1968, and the Dr. Honoris Causa degree in electronics from Inaoe, Puebla, Mexico, in 1996.

He was a Visiting Professor with ETH-PEL, Zurich, Switzerland, in 1993 and with EPFL-LEG, Lausanne, in 2004. He was a Professor of Microelectronics and Head of the Micro Integrated Systems Group, University of Pavia, Pavia, Italy, the TI/J. Kilby Analog Engineering Chair Professor with Texas A\&M University, and the Distinguished Microelectronic Chair Professor with the University of Texas at Dallas. Currently, he is a Professor with the University of Pavia, Pavia, Italy, and Honorary Professor with the University of Macau, Macao, China. He has authored and coauthored more than 480 published papers and five books and holds 33 patents. He has been responsible for many research programs, including ten ESPRIT projects, and served the European Commission in many European Initiatives. He served the Academy of Finland on the assessment of electronic research. He served the National Research Council of Portugal for the research activity assessment of Portuguese Universities. He was a Member of the Advisory Board of INESC-Lisbon, Portugal. He is the Chairman of the Academic Committee of the State Key Laboratory of Analog and Mixed-Signal VLSI, University of Macau, Macao, China.

Prof. Maloberti was VP Region 8 of the IEEE Circuits and Systems (CAS) Society (1995-1997), Associate Editor of the IEEE Transactions on Circuits and Systems-II: Express Briefs, President of the IEEE Sensor Council (2002-2003), an IEEE CAS BoG member (2003-2005), and VP Publications for IEEE CAS (2007-2008). He was a Distinguished Lecturer for the IEEE Solid-State Circuits Society (2009-2010) and presently is a Distinguished Lecturer for the IEEE CAS Society. He received the 2013 IEEE CAS Mac Van Valkenburg Award, the 1999 IEEE CAS Society Meritorious Service Award, the 2000 CAS Society Golden Jubilee Medal, and the IEEE Millennium Medal. He received the 1996 IEE Fleming Premium, the ESSCIRC 2007 Best Paper Award, and the IEEJ Workshop 2007 and 2010 Best Paper Awards. In 1992, he was a recipient of the XII Pedriali Prize for his technical and scientific contributions to national industrial production.


[^0]:    Manuscript received June 04, 2012; revised May 06, 2013; accepted May 06, 2013. Date of publication June 11, 2013; date of current version August 21, 2013. This work was supported by Macao Science \& Technology Development Fund (FDCT) through Grant FDCT/025/2009/A1. This paper was approved by Associate Editor Venugopal Gopinathan.
    C.-H. Chan, Y. Zhu, and S.-W. Sin are with the State-Key Laboratory of Analog and Mixed Signal VLSI, Faculty of Science and Technology, University of Macau, Macao, China (e-mail: ivorchan@ieee.org).
    S.-P. U is with the State-Key Laboratory of Analog and Mixed Signal VLSI, Faculty of Science and Technology, University of Macau, Macao, China, and also with the Synopsys-Chipidea Microelectronics (Macau) Limited.
    R. P. Martins is with the State-Key Laboratory of Analog and Mixed Signal VLSI, Faculty of Science and Technology, University of Macau, Macao, China, on leave from Instituto Superior Técnico/Technical University of Lisbon, 1049-001 Lisbon, Portugal.
    F. Maloberti is with the State-Key Laboratory of Analog and Mixed Signal VLSI, Faculty of Science and Technology, University of Macau, Macao, China, and also with the Department of Electronics, University of Pavia, 27100 Pavia, Italy.

    Digital Object Identifier 10.1109/JSSC.2013.2264617

