### 26.5 A 5.5mW 6b 5GS/s 4×-Interleaved 3b/cycle SAR ADC in 65nm CMOS

Chi-Hang Chan$^{1}$, Yan Zhu$^{1}$, Sai-Weng Sin$^{1}$, Seng-Pan U$^{1,2}$, R. P. Martins$^{1,3}$

$^{1}$University of Macau, Macao, China,
$^{2}$Synopsys, Macao, China,
$^{3}$Instituto Superior Tecnico, Universidade de Lisboa, Portugal

Communication devices such as 60GHz-band receivers and serial links demand power-efficient, low-resolution ADCs with gigahertz sampling rates. However, scaling up transistor widths in the building blocks to reach high speed increases the impact of intrinsic parasitics and degrades the energy efficiency of the ADC. Parallel schemes such as multi-bit processing and interleaving [1] can ease the problems caused by scaling and lead to better efficiency if the hardware overhead is wisely reduced [2]. This paper presents a combination of $4\times$ time interleaving and a 3b/cycle multi-bit SAR ADC in 65nm CMOS, achieving a Nyquist FoM of 39fJ/conv-step at 5GS/s from a 1V supply.

Figure 26.5.1 shows the block diagram of the ADC, which consists of four interleaved channels. The differential input signals are first sampled onto the top plates of the DAC arrays through bootstrapped switches. The corresponding references ($V_{\text{ref,P}}/V_{\text{ref,N}}$) are then added to or subtracted from the sampled voltages by controlling the bottom-plate voltages of the DACs. Next, the residual voltages are amplified by the dynamic pre-amps and latched to obtain the decisions. Interpolation is done at the adjacent outputs of the dynamic pre-amps, thus reducing the hardware complexity. The latched decisions from the $1^{\text{st}}$ cycle then control the DACs through the segmentation SAR logic to perform the successive approximation. After the comparison in the $2^{\text{nd}}$ cycle, the 6b output is obtained by decoding all the latched decisions from both cycles; it is overridden by the boundary-detection-code-overriding (BDCO) circuit if a switching discrepancy occurs. The other 3 channels work in the same manner in an interleaved fashion, and the output codes from all channels are combined with a multiplexer. A 3b/cycle multi-bit SAR architecture requires 7 DAC arrays and comparators, which normally leads to no advantage because of the large hardware overhead. In this prototype, however, the interpolation at the dynamic pre-amps reduces the hardware to 4 DAC arrays, 4 dynamic pre-amps, and 9 latches (2 are dummies). Furthermore, unlike a conventional StrongARM architecture, splitting the comparator into an input stage and a latch stage divides the burden of speed and noise considerations, thus allowing smaller input pairs for lower kickback. Interpolation at the outputs of the pre-amps also has less effect on speed than latch interpolation and less effect on kickback than DAC interpolation. Overall power efficiency is improved by saving not only DAC arrays, but also logic controls and buffers.
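The two-cycle conversion with pre-amp interpolation can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the circuit in Fig. 26.5.1: the input is single-ended and normalized to $[0, 1)$, the pre-amps are ideal and linear with equal gain, and the four DAC references are assumed to sit at odd multiples of 1/8 of full scale so that averaging adjacent pre-amp outputs yields the three missing thresholds. The names `thermometer_decisions`, `convert_6b` and `COARSE_REFS` are hypothetical.

```python
# Assumed reference positions of the 4 DAC arrays, as fractions of full scale.
COARSE_REFS = (1 / 8, 3 / 8, 5 / 8, 7 / 8)


def thermometer_decisions(dac_residues, gain=4.0):
    """Return 7 latch decisions from 4 DAC residues.

    Each residue is amplified by an ideal linear pre-amp; the average of two
    adjacent pre-amp outputs crosses zero midway between their thresholds,
    which provides the 3 interpolated decisions.
    """
    preamp = [gain * r for r in dac_residues]
    nodes = []
    for i, out in enumerate(preamp):
        nodes.append(out)
        if i + 1 < len(preamp):
            nodes.append(0.5 * (out + preamp[i + 1]))  # interpolated node
    return [n >= 0.0 for n in nodes]


def convert_6b(v_in):
    """Two-cycle, 3b/cycle conversion of a normalized input in [0, 1)."""
    # 1st cycle: each DAC subtracts its coarse reference from the sample.
    c1 = sum(thermometer_decisions([v_in - r for r in COARSE_REFS]))
    # 2nd cycle: the DACs remove the coarse result and apply references
    # scaled down by 8.
    residue = v_in - c1 / 8
    c2 = sum(thermometer_decisions([residue - r / 8 for r in COARSE_REFS]))
    return 8 * c1 + c2
```

In this idealized model, an input at the center of code $k$, for example `convert_6b((k + 0.5) / 64)`, decodes to `k`; offsets, gain mismatch between pre-amps and incomplete settling are deliberately left out.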

Figure 26.5.2 illustrates the structure of the DAC array ($7/8\,V_{\text{FS}}$ reference), where the capacitors are split into unit segments to enable direct control from the latches without decoding. To avoid input common-mode variation of the comparator, VCM-based switching is adopted, but the extra control logic and the reference voltage induce a large area overhead. Instead of resetting to VCM during sampling, an alternating pattern of $V_{\text{ref,N}}$ and $V_{\text{ref,P}}$ is applied to achieve the same charge at the top plate; thus, the extra logic, reference and switches are saved. Also, the driving circuit for the bottom plates of the DAC can be a simple inverter, which maximizes the overdrive voltage with less loading. This design uses fractional capacitors rather than a bridge-DAC segmentation structure to realize a compact DAC array while keeping the same functionality (Fig. 26.5.2) at the required 6b matching accuracy. In addition, by arranging the fractional capacitors and the logic controls, the switching and pre-charging operations in the $2^{\text{nd}}$ cycle can be merged to save a pre-charge phase and its logic-switching energy; by contrast, multi-bit SAR architectures typically need to pre-charge after the comparisons to avoid corrupting the voltage being compared. The unit capacitor structure allows area sharing in both vertical and horizontal directions, which enables short routing distances to the logic blocks with a compact DAC. The unit fringing capacitance is 350aF. The fractional capacitors are realized with a similar structure but with less of the desired coupling, and their capacitance ratios are estimated from extraction results.
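The equivalence between a VCM reset and the alternating reference pattern follows from charge conservation on the floating top plate. The sketch below is a simplified single-ended model, not the array of Fig. 26.5.2: the 8-unit array, the 1V/0V reference values and the helper `top_plate_shift` are assumptions chosen for illustration; only the 350aF unit value comes from the text.

```python
V_REF_P, V_REF_N = 1.0, 0.0          # assumed reference levels (supply/ground)
V_CM = 0.5 * (V_REF_P + V_REF_N)     # mid-level used by VCM-based switching
C_UNIT = 350e-18                     # unit fringing capacitance from the text


def top_plate_shift(caps, v_before, v_after, c_par=0.0):
    """Top-plate voltage change of a floating array when bottom plates move.

    Charge conservation gives dV_top = sum(C_i * dV_bottom_i) / C_total.
    """
    c_total = sum(caps) + c_par
    dq = sum(c * (va - vb) for c, vb, va in zip(caps, v_before, v_after))
    return dq / c_total


caps = [C_UNIT] * 8  # hypothetical 8-unit array

# VCM-based switching: all bottom plates sampled at VCM, then two units
# are switched to V_REF_P.
shift_vcm = top_plate_shift(caps, [V_CM] * 8,
                            [V_REF_P, V_REF_P] + [V_CM] * 6)

# Alternating pattern: units sampled at N, P, N, P, ...; the same step is
# produced by flipping only the N unit of the first pair to V_REF_P.
shift_alt = top_plate_shift(caps, [V_REF_N, V_REF_P] * 4,
                            [V_REF_P, V_REF_P] + [V_REF_N, V_REF_P] * 3)
```

Both `shift_vcm` and `shift_alt` evaluate to one eighth of the reference span in this model, while the alternating pattern needs only two bottom-plate levels, which is why a plain inverter can drive each unit.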

When designing high-speed SAR ADCs, the time available for comparator regeneration is always critical because of the serial conversion scheme and the required reset operation. While too little available time leads to incomplete regeneration and incorrect differential SA operation, a short comparator reset time may cause a memory effect. This problem is even more severe with dynamic logic, since a decision cannot be recovered after an improper decision has propagated. The conventional solution of adding extra latches slows down the SA feedback loop and costs extra power. Instead, in this design, detection logic is placed at the outputs of the logic gates and checks whether the differential controls of the DACs have deviated within the conversion cycle. Figure 26.5.3 shows the BDCO scheme and the mapping table for CASE1:7. In these cases, the outputs are replaced by the mapped codes from CASE1:7, since the input must be very close to a certain comparator's threshold and that comparator's outputs ($Q_{P}/Q_{N}$) do not have enough time to fully regenerate. By extending the decision time (after the comparator reset) for the CASE signal with extra gain from the logic and the feedback loop $\mathrm{FB}_{1}$, the error probability is reduced; when the boundary condition is detected by the BDCO, the error magnitude is suppressed to within 1 LSB. Based on post-layout simulation without noise, the probability of an error of magnitude $V_{\text{ref}}/8$ is $<10^{-5}$ with the BDCO and $>10^{-2}$ without it, and the measured data show that the prototype achieves a BER $<10^{-6}$ at $V_{\text{ref}}/8$. When the input drives the differential controls of the DACs to a common value, the residue is incorrect, but this is unimportant because the ADC's output is overridden by the BDCO. The BDCO does not handle other cases where the controls are still differential but unable to settle to their final values in time; these lead only to small error magnitudes.
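The override idea can be sketched in a few lines. This is a behavioral illustration only: the actual mapping table is in Fig. 26.5.3 and is not reproduced in the text, so the mapped code `8 * k` (the first code above the $k$-th coarse threshold) is an assumption, as are the function name `bdco_override` and the bit-list interface.

```python
def bdco_override(ctrl_p, ctrl_n, decoded_code):
    """Return (output_code, case) for one conversion.

    ctrl_p / ctrl_n: the 7 first-cycle DAC control bits on the P and N sides.
    A healthy decision gives complementary bits; equal bits mean comparator k
    did not regenerate in time, so the input sits near threshold k.
    """
    for k, (p, n) in enumerate(zip(ctrl_p, ctrl_n), start=1):
        if p == n:
            # Hypothetical mapped code for CASE k: the code at threshold k.
            # The true code is then 8*k - 1 or 8*k, i.e. within 1 LSB.
            return 8 * k, k
    return decoded_code, 0  # no boundary case: keep the decoded result
```

For example, with complementary controls the decoded code passes through unchanged, whereas a third control pair stuck at a common value returns the assumed boundary code 24 and flags CASE 3, regardless of the corrupted second-cycle residue.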

The clock-skew error among the four channels is minimized by sharing a global master clock in the bootstrapped circuits, as shown in Fig. 26.5.4. A modified M2 transistor is used for channel selection, and the global sampling clock is applied to M1; the remaining sources of mismatch are mainly the sampling switches ($\mathrm{SW}_{\mathrm{s}}$). The sampling front-ends of the 4 channels are placed as close as possible to the clock generator at the center, in order to minimize the routing distance and mismatches from the main clock. The gradient effect on the sampling switches is reduced by arranging them in the same direction. The routing distances from the input to $\mathrm{SW}_{\mathrm{s}}$ (A, C in Fig. 26.5.4) and from $\mathrm{SW}_{\mathrm{s}}$ to the DACs (B) are carefully planned during layout to be identical in all channels to reduce skew, bandwidth and gain mismatch errors. The skew is suppressed to within $\sigma = 550$fs in post-layout Monte Carlo simulation. The offset of the dynamic pre-amps and latches is calibrated on-chip. Instead of inserting capacitor loads, this design adopts extra clocked inputs in the latch, which alleviates the speed penalty. The calibration is done in the foreground and runs at the full clock rate to suppress all other dynamic offsets [3].
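To put the simulated skew in perspective, a common first-order estimate treats channel skew like sampling jitter, giving an SNR limit of $-20\log_{10}(2\pi f_{in}\sigma)$. The sketch below applies that textbook approximation; it ignores the interleaving-factor correction and all other error sources, and the function name is hypothetical.

```python
import math


def skew_limited_snr_db(f_in_hz, sigma_skew_s):
    """First-order SNR limit (dB) set by timing skew at a given input frequency."""
    return -20.0 * math.log10(2.0 * math.pi * f_in_hz * sigma_skew_s)


# Nyquist input for 5GS/s and the simulated skew sigma from the text.
snr_at_nyquist = skew_limited_snr_db(2.5e9, 550e-15)
```

Under this approximation the limit at a 2.5GHz input is roughly 41dB, a few dB above the ideal 6b quantization SNR of about 37.9dB, so the skew alone should not dominate the error budget.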

The ADC is fabricated in 65nm CMOS and occupies $0.09\,\mathrm{mm}^{2}$, of which the on-chip calibration takes $0.011\,\mathrm{mm}^{2}$, as shown in Fig. 26.5.7 with a close-up layout view. Supply and ground are used directly as references by inserting an extra capacitor in the DAC for reference division. The ADC full-scale input is $1\,V_{\text{pp}}$ and the total input capacitance is around 31fF including parasitics (without PAD and ESD devices). The power consumption is 5.5mW at a 1V supply and 5GS/s (20% clk-gen. & buffers, 42% pre-amps, latches & bootstrapped circuit, and 38% digital). Figure 26.5.5 illustrates the measured SNDR and SFDR vs. input frequency, where the SNDR stays above 30.22dB up to Nyquist. Figure 26.5.5 also shows the frequency spectrum after calibration with a near-Nyquist tone. The mirror images are well below 42dB at the Nyquist input. The maximum INL/DNL are 0.95/1.4 LSB, respectively. Figure 26.5.6 summarizes the performance and compares it with state-of-the-art ADCs. The design achieves the lowest FoM (39fJ/conv-step) and input capacitance among the 2+GS/s ADCs shown in the table. Compared with previous works at the same technology node, this design improves conversion efficiency by $>2.5\times$.
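The reported figure of merit can be related to the other measurements with the widely used Walden definition, $\mathrm{FoM} = P / (2^{\mathrm{ENOB}} f_s)$ with $\mathrm{ENOB} = (\mathrm{SNDR} - 1.76)/6.02$. The text does not spell out the formula, so treating the "Nyquist FoM" as this definition is an assumption; the function names are hypothetical.

```python
import math


def enob_from_sndr(sndr_db):
    """Effective number of bits from SNDR in dB."""
    return (sndr_db - 1.76) / 6.02


def walden_fom(power_w, fs_hz, sndr_db):
    """Walden figure of merit in joules per conversion step."""
    return power_w / (2.0 ** enob_from_sndr(sndr_db) * fs_hz)


def sndr_from_fom(power_w, fs_hz, fom_j):
    """SNDR (dB) implied by a given Walden FoM, power and sampling rate."""
    enob = math.log2(power_w / (fom_j * fs_hz))
    return 6.02 * enob + 1.76


# SNDR implied by the reported 5.5mW, 5GS/s and 39fJ/conv-step.
implied_sndr_db = sndr_from_fom(5.5e-3, 5e9, 39e-15)
```

Under this assumed definition, 39fJ/conv-step corresponds to a Nyquist SNDR a little below 31dB, which is compatible with the statement that the measured SNDR stays above 30.22dB.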

## Acknowledgments:

This work was financially supported by Macao FDCT under SKL/AMS-VLSI/11Y3/SSW/FST and the Research Committee of University of Macau.

## References:

[1] B. Verbruggen, et al., "A 2.6mW 6b 2.2GS/s 4-Times Interleaved Fully Dynamic Pipelined ADC in 40nm Digital CMOS," ISSCC Dig. Tech. Papers, pp. 296-297, Feb. 2010.
[2] Y.-S. Shu, "A 6b 3GS/s 11mW Fully Dynamic Flash ADC in 40nm CMOS with Reduced Number of Comparators," VLSI Symp. Circuits, pp. 26-27, 2012.
[3] V. H.-C. Chen, et al., "An 8.5mW 5GS/s 6b Flash ADC with Dynamic Offset Calibration in 32nm CMOS SOI," VLSI Symp. Circuits, pp. C264-C265, 2013.


Figure 26.5.1: 4×-interleaved 3b/cycle SAR ADC architecture.

Figure 26.5.2: Comparison between typical 3b/cycle DAC and modified structure.

Figure 26.5.3: Logic implementation and signal behavior of the SA logic with BDCO scheme.

Figure 26.5.4: Floorplan of interleaving front-end.

Figure 26.5.5: Spectrum with near-Nyquist input (decimated 625x), measured SFDR and SNDR vs. input and SNDR/SFDR @ Nyquist input for 3 samples.

Figure 26.5.6: Benchmark and table of comparison with the state-of-the-art ADCs (data from ISSCC & VLSI, recent 5 years, $f_{s} > 2$GS/s).

Figure 26.5.7: Chip microphotograph and close-up layout view.

