# A 0.45 V 147–375 nW ECG Compression Processor With Wavelet Shrinkage and Adaptive Temporal Decimation Architectures

Chio-In Ieong, Mingzhong Li, Man-Kay Law, Senior Member, IEEE, Pui-In Mak, Senior Member, IEEE, Mang I Vai, Senior Member, IEEE, and Rui P. Martins, Fellow, IEEE

Abstract—This paper presents a real-time electrocardiogram (ECG) data compression processor with improved energy efficiency while maintaining high accuracy and real-time operation. Wavelet shrinkage is exploited to filter noise and achieve a sparse ECG signal representation. Adaptive temporal decimation is proposed to enable configurable processing that adaptively reduces the data volume and computational activity for further power reduction. A modified Huffman and run-length wavelet source coding (MHRLC) scheme is also designed to represent wavelet coefficients with an optimized average code length and a reduced memory requirement. Fabricated in 0.18-µm CMOS, the ECG processor is implemented with customized near-threshold digital logic for minimum-energy operation. The prototype was fully validated with the MIT-BIH Arrhythmia database. With a power consumption of 147–375 nW at 0.45 V, the proposed ECG processor achieves compression ratios ranging from 2.89 to 26.91, corresponding to a percentage-RMS-distortion from 0% to 3.11%.

Index Terms—Adaptive temporal decimation (ATD), data compression processor, electrocardiogram (ECG), near-threshold digital logics, wavelet shrinkage (WS), wavelet transform (WT).

## I. INTRODUCTION

Wearable fitness tracking devices make it possible to record and analyze long-term data that can improve today's healthcare systems. The continuously generated health data (with various types of physiological information over time) can help monitor health status, assess lifestyle quality, and reveal symptoms of illness.
Manuscript received July 4, 2016; revised October 14, 2016; accepted November 29, 2016. Date of publication January 4, 2017; date of current version March 20, 2017. This work was supported by the Research Committee of University of Macau under Grant MYRG2015-AMSV-00140 and Grant MYRG100-FST-LMK. C.-I. Ieong, M. Li, P.-I. Mak, and M. I. Vai are with the State Key Laboratory of Analog and Mixed-Signal VLSI and FST-ECE, University of Macau, Macau 999078, China. M.-K. Law is with the State Key Laboratory of Analog and Mixed-Signal VLSI, University of Macau, Macau 999078, China (e-mail: mklaw@umac.mo). R. P. Martins is with the State Key Laboratory of Analog and Mixed-Signal VLSI and FST-ECE, University of Macau, Macao, China, and is also on leave from Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisboa, Portugal. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TVLSI.2016.2638826

Saving power is an important design aspect of such battery-powered systems. It extends operation time [1] and also makes room for more functions and higher performance within the same power budget. Various wireless wearable devices have been reported for long-term ECG monitoring [1]–[3]. They typically comprise analog front-end, analog-to-digital converter, processor, wireless, and power supply/management modules. A specialized digital processor architecture can significantly improve power efficiency compared with a general-purpose processor architecture [4], [5]. Exploiting this advantage, specialized architectures for ECG characteristic point detection have been published in [6] and [7]. In [8], a fully digital front end is used for further power savings. As demonstrated in [6], [9], and [10], the system power can also be effectively reduced by lowering the wireless activity. This can be achieved by introducing a local signal processor to preprocess the acquired data.
Power savings can be achieved as long as $P_{\text{comp}} + P_{\text{tx}}/\text{CR} < P_{\text{tx}}$, where $P_{\text{comp}}$ is the extra computational power, $P_{\text{tx}}$ is the wireless transmission power, and CR is the data compression ratio. Under a limited power budget, data storage space, and/or data transmission bandwidth, the system can be designed to transmit only the results of local ECG characteristic point detection processors [6], [10]. This effectively reduces the wireless transmission power. Yet, this approach precludes further medical validation and analysis, and is consequently undesirable for certain uses. The ECG data compression processors in [11] and [12] encode the ECG data to achieve CRs of 2.38 and 2.43, respectively. Nonetheless, a fixed CR can result in suboptimal system performance when only a portion of the signal is of interest and requires high accuracy. In this case, the system power can be further optimized by recording mainly the signals of interest with high fidelity while compressing the rest of the signal heavily. Examples are infrequently occurring symptoms or signals occurring in a specific period of time. This approach also improves the system's adaptability to varying requirements on CR, accuracy, and power efficiency.

This paper investigates the design of a specialized ECG compression processor using wavelet shrinkage (WS), adaptive temporal decimation (ATD), and combined modified Huffman and run-length wavelet source coding (MHRLC) architectures. The wavelet transform (WT) architecture is optimized for low switching activity. WS is exploited to filter the noise and achieve a sparse ECG signal representation. Different wavelet types are systematically compared in terms of accuracy, hardware cost, and CR. ATD is proposed to achieve configurable processing that adaptively increases CR and reduces circuit switching activity. MHRLC is also designed for coding the wavelet coefficients with an optimized average code length and a reduced memory requirement. A set of energy-efficient low-voltage digital logic circuits optimized for low-frequency operation is custom designed and employed. Measurement results show that the proposed ECG processor achieves ultralow power consumption and a wide CR range for configurable lossless or lossy compression. It does so while maintaining good data recovery accuracy and real-time operation.

This paper is organized as follows. Section II provides an overview of the proposed ECG processor architecture. Section III discusses the hardware optimization considerations in algorithm design. Section IV describes the proposed algorithm, architecture, and optimization efforts. Section V outlines the circuit implementation and the customized low-voltage digital library design. Section VI summarizes the experimental results using the MIT-BIH Arrhythmia database, including power consumption, CR, and percentage-RMS-distortion (PRD, a measure of the distortion after data compression; see Section III-A). The conclusions are drawn in Section VII.

## II. OVERVIEW OF ECG DATA COMPRESSION PROCESSOR

Fig. 1. Proposed ECG processor architecture.

Fig. 1 shows the block diagram of the proposed ECG data compression processor. It is composed of three main modules: ATD, WS, and MHRLC. The complete system consists of an ECG readout front end, the ECG processor, and a back-end wireless module for data transmission. This paper focuses on the algorithm and implementation of the ECG data compression core under a sub-$\mu$W power budget. The processor accepts a 12-bit signed fixed-point ECG input and outputs the encoded signal using a frame and symbol coding protocol. Signal data representations are denoted as (Signedness, Word Length, Fractional Length).
The ECG signal and wavelet coefficients are represented in 12 bits (8 integer bits and 4 fractional bits). The external clock can be flexibly tuned from 60 Hz to 1 kHz. Note that baseline wander removal and noise filtering are expected to be performed in the signal acquisition front end before the signal is fed into the proposed processor, as in most ECG acquisition systems.

## III. ALGORITHM DESIGN CONSIDERATIONS

The algorithm design plays a key role in optimizing the ECG data compression processor. In this paper, the targets are to reduce the computational power and data storage requirements while maintaining real-time processing with improved hardware efficiency. Hardware efficiency is defined here as low computational cost and low data storage requirements at the algorithm level. Digital circuit elements with low hardware cost are then expected to realize these hardware-efficient algorithms. This section reviews the relevant compression performance measures, the compression methods, and their hardware efficiency. It then discusses the design decisions at the algorithm level.

#### A. Performance Measures

PRD measures the signal distortion after compression, and CR measures the ratio between the original and compressed data sizes. These are the two main metrics used to characterize the performance of the proposed ECG data compression implementation. Both metrics are widely adopted [13]–[16], which eases comparison with prior work. Note that the dc component of the signal can affect the PRD value [16]. ECG data compression should provide a high CR and a low PRD with real-time processing under a very low power budget. In information theory, the information content of a source is measured by its entropy, in bits. Entropy is used in this paper to design the wavelet source coding so as to reduce the average code length. Sparsity measures the percentage of zero-valued data samples.
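The four measures above can be made concrete with a short sketch. It assumes the common PRD definition without mean removal, $\text{PRD} = 100\sqrt{\sum (x-\tilde{x})^2 / \sum x^2}$, so adding a dc offset to the signal inflates the denominator and lowers the PRD, which is the dc sensitivity noted above. CR is taken as original bits over compressed bits. The function names are hypothetical, and the paper's exact formulations may differ.

```python
import math
from collections import Counter


def prd_percent(x, x_rec):
    """Percentage RMS difference (no mean removal): a dc offset in x lowers it."""
    err = sum((a - b) ** 2 for a, b in zip(x, x_rec))
    ref = sum(a * a for a in x)
    return 100.0 * math.sqrt(err / ref)


def compression_ratio(original_bits, compressed_bits):
    """CR = size of the original data / size of the compressed data."""
    return original_bits / compressed_bits


def entropy_bits(symbols):
    """Shannon entropy (bits/symbol) of the empirical symbol distribution."""
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in Counter(symbols).values())


def sparsity(samples):
    """Fraction (0 to 1) of zero-valued samples."""
    return sum(1 for s in samples if s == 0) / len(samples)


x = [1.0, 2.0, 3.0, 4.0]
x_rec = [1.0, 2.0, 3.0, 3.0]
print(prd_percent(x, x_rec))                                        # about 18.26 (%)
print(prd_percent([v + 100 for v in x], [v + 100 for v in x_rec]))  # about 0.49 (%): same error, larger reference
```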
For most transform-based data compression approaches, the signal sparsity can be exploited to achieve a high CR. In this paper, the widely adopted MIT-BIH Arrhythmia database [17] is used for system performance evaluation. It consists of 48 records of ECG signals from 47 patients representing real application scenarios (e.g., both normal and abnormal beats in the presence of baseline wander and high-frequency noise).

#### B. Signal Transform

Signal transforms are widely employed in data compression. The transformed coefficients can be encoded to achieve lossless data compression. To achieve an even higher CR, lossy data compression can be employed to identify and remove insignificant data in a specific transform domain. Transforms used for data compression include the fast Fourier transform (FFT) [14], the cosine transform [18], the singular value decomposition (SVD) [19], and the WT [13], [20], [21]. However, FFT-, cosine-, and SVD-based methods require segmentation of the input signal and have high computational complexity. This leads to increased data storage and power consumption, and possibly to blocking effects. In contrast, the WT can be efficiently implemented in hardware with a real-time FIR filter bank [22]. It also has an energy compaction characteristic: its output contains only a small number of large-valued coefficients, which facilitates data compression.

#### C. Source Coding

Source coding encodes information with fewer bits than the original representation. It can be combined with transform-based compression methods to further reduce the number of bits. Entropy coding represents symbols with an average code length determined by their entropy. The three main source coding methods in this category are: 1) Huffman; 2) run-length; and 3) arithmetic coding. Huffman coding encodes symbols with variable-length codes according to the individual symbol probabilities.
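As a generic illustration (not the paper's coder), Huffman code lengths can be built by repeatedly merging the two least probable subtrees. The distribution in the example is hypothetical. It shows that rare symbols receive the longest codes and that every symbol needs its own codebook entry, which is the memory cost discussed next.

```python
import heapq
from itertools import count


def huffman_code_lengths(probs):
    """Return {symbol: code length} of a Huffman code for a {symbol: probability} dict."""
    tie = count()  # unique tie-breaker so heapq never compares the dicts
    heap = [(p, next(tie), {s: 0}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)  # the two least probable subtrees
        p2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}  # one level deeper
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]


# Hypothetical, strongly skewed distribution (e.g., mostly small coefficients)
probs = {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.0625, 4: 0.0625}
lengths = huffman_code_lengths(probs)
print(lengths)  # {0: 1, 1: 2, 2: 3, 3: 4, 4: 4}
```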
As the number of symbols increases, the code lengths of low-probability symbols grow, and a large memory is needed to store the codebook. Arithmetic coding [23], [24] assigns value intervals to the symbols, and the output number represents the symbol located in a particular interval. A sequence of symbols can be represented by a single output number by iteratively subdividing intervals into subintervals according to the symbol probabilities. It can achieve a higher CR than the other entropy coding methods. However, its hardware implementation is complex because iterative multiplications and divisions are required for subinterval scaling. A normalization scheme is also required to deal with the varying subinterval sizes [24]. Run-length coding performs compression by encoding runs of repeated symbols as their value and the number of repetitions. The CR can degrade significantly when the input data contain few repetitions, but the method is particularly suitable for long flat signal segments.

#### D. Algorithm Design

In this paper, WT-based compression is chosen for its configurable CR, low PRD, and low complexity, and because it supports real-time processing. WS can be applied for ECG signal denoising and sparsification [25], [26] by thresholding the wavelet coefficients. It is exploited here to create highly sparse wavelet coefficients, achieving a configurable CR while maintaining a low PRD. For source coding, Huffman coding with a large dictionary requires a large on-chip memory, so modified Huffman coding with a prefix and a suffix is employed. Run-length coding is also implemented to take advantage of the sparsity of the wavelet coefficients. Combining modified Huffman coding and run-length coding for wavelet source coding reduces hardware complexity and power overhead while achieving a good CR. Since the switching activity of logic gates contributes to the dynamic power of the processor, ATD is employed to achieve configurable processing and reduce the data volume for further power reduction. Similar signal decimation methods for ECG were proposed in [15] and [27], but they cannot be directly applied with WS and MHRLC. The details of the ATD design with WS and MHRLC are presented in Section IV.

## IV. DETAILED ALGORITHM AND ARCHITECTURE DESIGN

Fig. 1 shows the three main modules of the proposed processor. The WT type (mother wavelet) and architecture are first selected and designed, balancing accuracy, CR, and hardware efficiency. WS is optimized to enable global threshold estimation without PRD degradation. ATD is exploited to decimate the ECG signal adaptively by discriminating between the QRS wave and the P/T waves. The sparse wavelet coefficients are then compressed using the MHRLC optimized for this application. The modules are described in detail as follows.

#### A. Adaptive Temporal Decimation

Temporally, the ECG signal consists of the P/T waves (4 to 13.5 Hz) and the QRS waves (8 to 27 Hz), while common sampling rates are 128 Hz, 250 Hz, 360 Hz, 500 Hz, 720 Hz, and 1 kHz. This frequency difference can be exploited to reduce the intersample redundancy by decimation. The data can then be reconstructed at the receiver side using interpolation and resampling methods. Considering the different frequency ranges of the QRS complex and the P/T waves, ATD is proposed to decimate them at different rates. The high-frequency wave is the QRS complex, and the low-frequency waves are the P/T waves. The decimation rate can be 1 (no decimation) or 2 for QRS waves, and can be configured to 2, 4, 8, 16, or 32 for P/T waves.

Fig. 2. Decimations to ECG waves and the PRD (%) of the recovered signal, by simulating MIT-BIH Arrhythmia database Recording No. 100. Decimation rate $= 2^n$.

Fig. 2 shows the simulated PRD under various decimation rates for Recording No. 100 of the MIT-BIH Arrhythmia database. The QRS wave and P/T wave regions are determined from the QRS annotations in the database. Decimation is performed within different regions accordingly: 1) QRS wave only; 2) P/T wave only; and 3) the whole ECG wave. As expected, the PRD of the recovered signal is largest when the entire ECG waveform is decimated. Also, the PRD is much higher when decimation is performed on the QRS wave rather than on the P/T wave. Based on this observation, it is beneficial to decimate the QRS and P/T waves at different rates to reduce power consumption while minimizing the overall PRD.

In this paper, ATD is implemented for decimating uniformly prefiltered sampled signals with two decimation rates. The mean absolute deviation (MAD) serves as an estimator for distinguishing the regions of QRS waves and P/T waves [15], [28]. The value $mad\_max$ is updated as the maximum of the MAD values computed from the ECG over the most recent 1.5-s window. The threshold estimation circuit takes $mad\_max$ and outputs the running threshold $thr$ by applying an empirical coefficient $thr\_coeff$ of 0.4, as given by (1) and (2)

$$MAD(x) = \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}|$$ (1)

$$thr = thr\_coeff \times mad\_max.$$ (2)

Fig. 3. Adaptive temporal decimation circuit.

As shown in Fig. 3, the 3-bit control word $set\_dec$ sets the decimation rates, while $en\_nuf$ is the enable signal for ATD. The ECG signal ($ecg$) is fed into the MAD block. This block computes the MAD value $mad\_o$ and outputs the delayed ECG signal $ecg\_dly$ for timing alignment according to $set\_dec$. The MAD output $mad\_o$ is then compared with the estimated threshold to generate a 1-bit frequency selection signal. This signal temporally switches between the preset high and low decimation frequencies. In addition, $en\_nuf$ controls the MUX for signal selection.
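The MAD-based rate selection of (1) and (2) can be sketched behaviorally as follows. This is a sketch under stated assumptions, not the circuit in Fig. 3. The MAD window length `mad_len`, the causal (non-delay-aligned) window, and the rule of keeping sample $n$ when $n$ is a multiple of the current rate are all hypothetical choices. The $set\_dec$ mapping follows the 1|2 to 2|32 list given for the decimation control unit.

```python
from collections import deque


def set_dec_rates(set_dec):
    """Map set_dec = {S3, S2, S1} to (QRS decimation rate, P/T decimation rate).

    S3 selects the whole-waveform rate (1 or 2); S2,S1 add an extra P/T factor
    of 2, 4, 8, or 16, giving 1|2 ... 2|32 for set_dec = 0..7.
    """
    s3, s21 = (set_dec >> 2) & 1, set_dec & 0b11
    qrs = 2 if s3 else 1
    return qrs, qrs * 2 ** (s21 + 1)


def atd(ecg, fs, set_dec, mad_len=8, thr_coeff=0.4, hold_s=1.5):
    """Keep samples at the QRS rate where MAD exceeds thr, else at the P/T rate.

    Returns (kept, rmark, a_en): kept holds (index, sample) pairs, rmark[n] is
    True in estimated QRS regions, and a_en[n] is True when sample n is kept.
    mad_len is a hypothetical window length; the ecg_dly alignment is omitted.
    """
    qrs_rate, pt_rate = set_dec_rates(set_dec)
    win = deque(maxlen=mad_len)                 # samples for eq. (1)
    mad_hist = deque(maxlen=int(hold_s * fs))   # MAD values of the last 1.5 s
    kept, rmark, a_en = [], [], []
    for n, x in enumerate(ecg):
        win.append(x)
        mean = sum(win) / len(win)
        mad = sum(abs(v - mean) for v in win) / len(win)
        mad_hist.append(mad)
        thr = thr_coeff * max(mad_hist)         # eq. (2), mad_max over 1.5 s
        high = mad > thr                        # True: QRS region
        rate = qrs_rate if high else pt_rate
        sample = n % rate == 0                  # sampling control
        rmark.append(high)
        a_en.append(sample)
        if sample:
            kept.append((n, x))
    return kept, rmark, a_en


# Synthetic test signal: flat baseline with one QRS-like burst (hypothetical)
ecg = [0.0] * 100 + [100.0 if i % 2 else -100.0 for i in range(16)] + [0.0] * 100
kept, rmark, a_en = atd(ecg, fs=360, set_dec=0)   # QRS|P/T = 1|2
```

In this example, the flat segment is kept at every second sample, while the burst is kept at full rate. The hardware described above additionally delays the ECG to align the decision with the QRS onset.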
The decimation control unit decimates the input ECG signal according to the selected frequency and outputs the decimated signal $ecg\_d$ (or bypasses the original signal according to $en\_nuf$). The signal $rmark$ indicates the instantaneous selection between the two decimation frequencies. The unit also controls the dynamics of the sampling control signal $a\_en$; the ECG is sampled when $a\_en$ is high. With $set\_dec = \{S3, S2, S1\}$, S3 sets the decimation rate of the whole ECG waveform, and S2 and S1 set the extra decimation of the P/T waves. This results in QRS|P/T decimation rates of 1|2, 1|4, 1|8, 1|16, 2|4, 2|8, 2|16, and 2|32 (corresponding to $set\_dec = 0$ to 7). The WS and MHRLC (stages 2 and 3 of the processor) both support processing of the nonuniformly sampled signal, synchronized by the sampling control signal $a\_en$.

Fig. 4. MAD estimation for instantaneous decimation rate selection from two preset decimation frequencies.

Fig. 5. Adaptive temporal decimated signals.

Figs. 4 and 5 show the simulated signals under ATD operation. The MAD-estimated $rmark$ windows are well aligned with the QRS waves. Note that ATD not only reduces the data for compression, but also reduces the data processed and the circuit activity in the WS and MHRLC modules. This results in power reduction, as verified by the silicon measurement results in Section VI. It can also be observed that the MAD signal ($mad\_o$) is steep, so the decimation rate control signal ($rmark$) is insensitive to the choice of the empirical coefficient ($thr\_coeff$). The threshold can be shifted up or down by 20% of the recent MAD maximum, i.e., $thr\_coeff$ can vary between 0.2 and 0.6. Under such shifts, $rmark$ is not seriously affected, and the timing estimation of the QRS wave remains insensitive to $thr\_coeff$.

#### B. Wavelet Transform and Wavelet Shrinkage

WT has several properties that benefit data compression in an ASIC:

- an energy compaction characteristic that produces mostly small coefficients and few large ones;
- a simple FIR filter bank architecture;
- real-time processing without windowing, and thus no blocking effect.

1) Selection Metric for Different Types of WT: The wavelet type selection is based on a 1-level wavelet decomposition, shrinkage, and signal recovery test. Three selection criteria are used: the total filter tap number of each wavelet, the sparsity of the thresholded wavelet coefficients, and the PRD% of the recovered signal. Together they balance the hardware efficiency, CR, and accuracy of the WT types (mother wavelets). The selection metric is

$$\text{Selection Metric} = \frac{\text{Sparsity}}{\text{Total Filter Length} \times \text{PRD}(\%)}.$$ (3)

To perform the simulation test, a clean ECG signal is first generated by averaging 1000 aligned ECG cycles from the MIT-BIH Arrhythmia Database (Recording No. 100, MLII channel). The clean ECG signal is decomposed at one level by each wavelet. The wavelet coefficients are then thresholded using the standard deviation of their d1 coefficients. The sparsity (ranging from 0 to 1) is obtained from the thresholded wavelet coefficients. Finally, the ECG signal is recovered, and the PRD% is computed. The selection metrics of the wavelets are shown in Fig. 6, where the wavelets are sorted by PRD%. It can be observed that Bior3.1 achieves a good balance in terms of sparsity, total filter length, and PRD.
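The test flow above can be sketched as follows, with the selection metric of (3) computed from a 1-level decomposition. This is a sketch under stated assumptions:

- d1 is hard-thresholded at one standard deviation of d1;
- sparsity is counted over d1 only;
- the noisy Gaussian-bump signal is a synthetic stand-in for the averaged MIT-BIH cycle.

Other mother wavelets would be passed as filter dictionaries in the same format. The sketch does not reproduce the values in Fig. 6.

```python
import math
import random

R2 = math.sqrt(2)
# Bior3.1 filters from (6); other wavelets would use the same dictionary layout
BIOR31 = {
    "h0": [c * R2 for c in (-0.25, 0.75, 0.75, -0.25)],
    "h1": [c * R2 for c in (-0.125, 0.375, -0.375, 0.125)],
    "f0": [c * R2 for c in (0.125, 0.375, 0.375, 0.125)],
    "f1": [c * R2 for c in (-0.25, -0.75, 0.75, 0.25)],
}


def conv(x, h):
    """Full linear convolution."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def one_level(x, w, thr):
    """Analysis, hard thresholding of d1 at thr, and synthesis; returns (d1, x_rec)."""
    a1 = conv(x, w["h0"])[::2]                               # filter, then downsample by 2
    d1 = [c if abs(c) >= thr else 0.0 for c in conv(x, w["h1"])[::2]]
    up = lambda c: [v for ci in c for v in (ci, 0.0)]        # zero insertion (upsample by 2)
    y = [p + q for p, q in zip(conv(up(a1), w["f0"]), conv(up(d1), w["f1"]))]
    pr = [p + q for p, q in zip(conv(w["f0"], w["h0"]), conv(w["f1"], w["h1"]))]
    delay = max(range(len(pr)), key=lambda k: abs(pr[k]))   # l in 2z^-l of (4)
    return d1, y[delay:delay + len(x)]


def selection_metric(x, w):
    """Return (sparsity, total taps, PRD %, metric of eq. (3)) for filter bank w."""
    d1_raw = conv(x, w["h1"])[::2]
    mean = sum(d1_raw) / len(d1_raw)
    thr = math.sqrt(sum((c - mean) ** 2 for c in d1_raw) / len(d1_raw))  # assumed 1-sigma
    d1, x_rec = one_level(x, w, thr)
    sp = sum(c == 0.0 for c in d1) / len(d1)
    prd = 100 * math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_rec)) / sum(a * a for a in x))
    taps = len(w["h0"]) + len(w["h1"])
    return sp, taps, prd, (sp / (taps * prd) if prd > 0 else math.inf)


random.seed(1)
ecg_like = [100 * math.exp(-((n - 32) / 3) ** 2) + random.gauss(0, 1) for n in range(64)]
sp, taps, prd, metric = selection_metric(ecg_like, BIOR31)
```

Using a zero threshold in `one_level` returns the input unchanged (up to rounding), which is a quick check that the filter bank and delay alignment are right before comparing wavelets.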
The Bior3.1 mother wavelet has simple filter coefficients and exhibits the perfect reconstruction property [22]

$$F_0(z)H_0(z) + F_1(z)H_1(z) = 2z^{-l}$$ (4)

$$F_0(z) = -H_1(-z), \quad F_1(z) = H_0(-z)$$ (5)

where the FIR filter coefficients of the Bior3.1 WT are

$$\begin{aligned}
H_0 &= [-0.25,\ 0.75,\ 0.75,\ -0.25] \times \sqrt{2}\\
H_1 &= [-0.125,\ 0.375,\ -0.375,\ 0.125] \times \sqrt{2}\\
F_1 &= [-0.25,\ -0.75,\ 0.75,\ 0.25] \times \sqrt{2}\\
F_0 &= [0.125,\ 0.375,\ 0.375,\ 0.125] \times \sqrt{2}.
\end{aligned}$$ (6)

Once the $\sqrt{2}$ factor is extracted, these simple coefficients have a finite word length. This enables a hardware-efficient FIR filter implementation and exact signal recovery after the inverse WT. With these coefficients, the delay in (4) is $l = 3$. In this paper, the WT is used mainly for transform coding rather than for filtering specific signal frequencies. Necessary analog filtering is expected in the ECG front end to ensure accurate signal acquisition and digitization without saturation and aliasing.

2) Wavelet Transform Architectural Optimization: Fig. 7(a) shows the "à trous" WT architecture. Zeros are inserted between the FIR filter coefficients, and the number of inserted zeros grows by a power of 2 with each wavelet scale. This architecture generates wavelet coefficients at the same sampling rate as the input signal. However, it demands a large number of data storage elements, and processing the redundant wavelet coefficients increases circuit switching activity. Instead, Mallat's algorithm in Fig. 7(b) is chosen. Its downsampling architecture removes the redundancy of the wavelet coefficients, thus reducing the subsequent computation. Its FIR filters are also short, saving hardware resources. With Mallat's algorithm, the input signal is decomposed and decimated into five scales, d1, d2, d3, d4, and a4. Their sampling periods are 2, 4, 8, 16, and 16 clock cycles, respectively.
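The perfect reconstruction property (4) can be checked numerically for the coefficients in (6). In the sketch below (helper names hypothetical), each coefficient list is treated as a polynomial in $z^{-1}$ indexed by delay. The common $\sqrt{2}$ is factored out and restored as a factor of 2 in the distortion term. The sketch also evaluates the aliasing term $F_0(z)H_0(-z) + F_1(z)H_1(-z)$ of the two-channel filter bank, which must vanish for exact recovery.

```python
def poly_mul(a, b):
    """Multiply polynomials in z^-1 given as coefficient lists (index = delay)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def alt(h):
    """H(z) -> H(-z): negate the odd-indexed coefficients."""
    return [c if k % 2 == 0 else -c for k, c in enumerate(h)]


# Bior3.1 coefficients from (6) with the common sqrt(2) factored out
h0 = [-0.25, 0.75, 0.75, -0.25]
h1 = [-0.125, 0.375, -0.375, 0.125]
f0 = [0.125, 0.375, 0.375, 0.125]
f1 = [-0.25, -0.75, 0.75, 0.25]

# Distortion term F0H0 + F1H1; each product carries sqrt(2) * sqrt(2) = 2
distortion = [2 * (p + q) for p, q in zip(poly_mul(f0, h0), poly_mul(f1, h1))]
# Aliasing term F0(z)H0(-z) + F1(z)H1(-z), which must vanish
aliasing = [p + q for p, q in zip(poly_mul(f0, alt(h0)), poly_mul(f1, alt(h1)))]

print(distortion)  # [0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0] -> 2z^-3, so l = 3
print(aliasing)    # all zeros
```

All coefficients are dyadic fractions, so the check is exact in floating point.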
To support the adaptive temporal decimated signal, the architecture for Mallat's algorithm is further developed, as shown in Fig. 7(c). The decimated samples only occur at specific clock cycles (the others are zeroed). A timing controller is therefore designed to ensure correct timing and reduce hardware activity. The enable signals En, En1, En2, En3, and En4 are successive divide-by-2 clock enables, each at half the rate of the previous one. The reconstruction filter bank for exact signal recovery is on the right of Fig. 7(c). A delay alignment block aligns the group delays of the filter-bank branches, so the total constant delay between the input and the reconstructed signal is 45 clock cycles. As the WT fulfills the perfect reconstruction condition, ATD should not affect the accuracy of the WT and the subsequent source coding. The WT after ATD is synchronized with the ATD timing through $a\_en$ to ensure proper signal reconstruction. To reduce the computational cost, all the $\sqrt{2}$ factors in (6) are extracted from the coefficients in the wavelet decomposition paths and moved to the reconstruction side instead. The amplitudes of the wavelet coefficients change proportionally, without affecting the perfect reconstruction of the signal. The wavelet filter coefficients can then be represented simply, so that all the filters can be implemented with only summations and binary-point shifts. Finally, the proposed architecture in Fig. 7(c) achieves significant gate count reductions of 58.4% and 64.4% compared with the other two architectures in Fig. 7.

Fig. 6. Selection of wavelet by comparing distortion (PRD), compression capability (sparsity of output signal), and computational cost (total decomposition filter tap number).

Fig. 7. Wavelet decomposition architectures and the perfect reconstruction algorithm. (a) "Algorithme à trous" for wavelet decomposition. (b) Mallat's algorithm for wavelet decomposition. (c) Proposed architecture with reduced circuit activity.
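A behavioral sketch of the decomposition path follows. It assumes the $\sqrt{2}$ factors have been moved to the reconstruction side as described, so each analysis filter reduces to small integer taps followed by a binary-point shift. The zero initial filter state and the downsampling phase (`[1::2]`) are assumptions. The ATD-driven enables ($a\_en$, En–En4) are omitted. With $\sqrt{2}$ removed, the dc gain per level becomes 1, which is the proportional amplitude change that the synthesis side compensates.

```python
def fir(x, taps, shift):
    """Causal FIR with small integer taps followed by a binary-point shift (divide by 2**shift)."""
    out = []
    for n in range(len(x)):
        acc = 0
        for k, t in enumerate(taps):
            if n - k >= 0:
                acc += t * x[n - k]   # taps are +-1 or +-3: additions and a 1-bit shift
        out.append(acc / 2 ** shift)
    return out


LO = ([-1, 3, 3, -1], 2)   # h0 / sqrt(2) = [-1, 3, 3, -1] / 4
HI = ([-1, 3, -3, 1], 3)   # h1 / sqrt(2) = [-1, 3, -3, 1] / 8


def mallat(x, levels=4):
    """Decompose x into [d1, ..., d_levels] and a_levels, halving the rate at each level."""
    details, approx = [], list(x)
    for _ in range(levels):
        details.append(fir(approx, *HI)[1::2])   # keep every second output
        approx = fir(approx, *LO)[1::2]
    return details, approx


const_in = [8] * 64                      # hypothetical constant input, 64 samples
details, a4 = mallat(const_in)
print([len(d) for d in details], len(a4))   # [32, 16, 8, 4] 4 -> periods 2, 4, 8, 16, 16
```

Because $h_1 \propto [-1, 3, -3, 1]$ is a third-order difference, d1 vanishes after warm-up for any linear or quadratic input segment. This is one way to see why smooth ECG segments yield small detail coefficients.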
3) Shrinkage Architecture and Optimization: WT linearly transforms the signal into wavelet coefficients with the energy compaction characteristic, generating only a small portion of large-valued coefficients. A sparse signal representation is produced by thresholding the insignificant coefficients. As shown in Fig. 8, the ECG signal $ecg\_d$ is fed into the WT block, which processes only the decimated samples according to the ATD timing signal $a\_en$. The d1–d4 scales are estimated in real time, and thresholds are adaptively applied. The stationary points (also called peak samples) in scale d1 are detected by selecting the larger of neighboring samples. The peak samples are further processed for threshold estimation.

Fig. 8. Proposed WS architecture.

The wavelet coefficients of scales d1–d4 are sparsified with thresholds. Scale a4 is excluded because of its low data rate and small data amount. The threshold estimation equations are given as

$$\begin{aligned}
&|peak[n]| \ge TH_c \rightarrow \text{Signal Peak}\\
&|peak[n]| < TH_c \rightarrow \text{Noise Peak}\\
&TH_c = CC \times ESPA\\
&TH_w = ENPA + TC \times (ESPA - ENPA).
\end{aligned}$$ (7)

Here, each peak sample is categorized as either a Signal Peak or a Noise Peak according to the classifying threshold $TH_c$. The maximum values of the Signal Peaks and Noise Peaks are monitored every 1.5 s. From them, the estimated signal peak amplitude ($ESPA$) and the estimated noise peak amplitude ($ENPA$) are extracted. Finally, the threshold for the wavelet coefficients, $TH_w$, and the classifying threshold $TH_c$ are computed accordingly. Instead of global thresholding with the same threshold applied to all the wavelet scales, this paper employs by-level thresholding. This approach accounts for the amplitude differences among the wavelet scales and therefore thresholds more accurately (Fig. 9). Yet, it requires four sets of estimation circuits for d1–d4.
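The threshold estimation of (7) can be sketched as follows. The following are hypothetical placeholders, not the chip's values:

- local maxima of $|d1|$ as peak samples;
- non-overlapping windows (about $0.75 f_s$ d1 samples for 1.5 s at full rate);
- hard thresholding;
- $CC = 0.5$, $TC = 0.25$, and an initial $TH_c = 0$;
- the by-level gains, which the text derives from white-noise statistics.

```python
def peaks(seg):
    """Local maxima of |seg| (interior samples only), used as d1 peak samples."""
    return [abs(seg[i]) for i in range(1, len(seg) - 1)
            if abs(seg[i]) >= abs(seg[i - 1]) and abs(seg[i]) >= abs(seg[i + 1])]


def d1_thresholds(d1, win_len, cc=0.5, tc=0.25, th_c=0.0):
    """Per-window TH_w following (7); cc, tc, and the initial th_c are hypothetical."""
    th_w_list = []
    for start in range(0, len(d1), win_len):       # one window ~ 1.5 s of d1 samples
        pk = peaks(d1[start:start + win_len])
        espa = max((p for p in pk if p >= th_c), default=0.0)   # Signal Peaks
        enpa = max((p for p in pk if p < th_c), default=0.0)    # Noise Peaks
        th_w_list.append(enpa + tc * (espa - enpa))
        th_c = cc * espa                           # classifying threshold for the next window
    return th_w_list


def shrink(details, th_w, gains):
    """Hard-threshold d1..d4 with TH_w scaled by by-level gains (gains[0] = 1 for d1)."""
    return [[c if abs(c) >= th_w * g else 0 for c in d] for d, g in zip(details, gains)]


# Hypothetical d1: unit-amplitude noise with three large "QRS" coefficients
d1_demo = [20.0 if i in (10, 40, 70) else (1.0 if i % 2 else -1.0) for i in range(100)]
print(d1_thresholds(d1_demo, win_len=50))   # [5.0, 5.75]
```

In the first window, the initial $TH_c = 0$ classifies every peak as a Signal Peak. From the second window on, the unit-amplitude peaks fall below $TH_c$ and set $ENPA$, which raises $TH_w$ above the noise floor.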
To minimize the hardware cost, the threshold estimation circuit is applied only in scale d1. The thresholds in d2–d4 are scaled from the d1 threshold with predefined by-level gains. These gains are determined by feeding Gaussian white noise into the wavelet filter bank and taking the standard deviations of the wavelet coefficients in d1–d4. After synthesis with Cadence RTL Compiler, the optimized shrinkage architecture shows a total gate count reduction of 67.5% compared with the original one.

Fig. 9. Bior3.1 WT and shrinkage signals. Left: wavelet coefficients and adaptive threshold applied on D1–D4. Right: wavelet coefficients after shrinkage.

#### C. Source Coding and Transmission

Source coding is necessary to encode the signal with a short average code length, which enhances CR and reduces power consumption. A 12-bit coefficient has 4096 possible values, so Huffman coding would require a large memory to accommodate the codebook. To reduce the number of codebook entries, the coding format is divided into prefix and suffix parts. Each symbol code is thus a prefix code followed by the value of the data sample in a predefined code length. The prefix symbols ZR and B2–B12 represent the range of the data value and indicate the number of bits in the suffix. Considering the sparsity of the wavelet coefficients, an extra prefix, RLZ, is defined to represent consecutive zero symbols. Its format is the run-length-of-zero prefix followed by the number of consecutive zeros. Since the sampling rate varies over time after ATD, each wavelet coefficient sample has to be marked with either a high or a low sampling rate. Marking every sample with an extra rate bit would be expensive. Instead, RH and RL are designed to represent switching between the high and low sampling rates. All these prefix symbols are designed to reduce the average code length, so that the CR is increased.
The FS symbol is defined as the frame header for synchronizing every 64 sets of wavelet coefficients. These symbol definitions effectively reduce the dictionary size (from 4096 for Huffman coding down to 16), align the ATD samples, and enhance CR. Table I shows the final coding design.

TABLE I
SOURCE CODING PROTOCOL WITH MODIFIED HUFFMAN CODING AND RUN-LENGTH CODING FOR CODING VALUE "ZERO" AND DECIMATION RATE

| Symbol | Prefix Code | Meaning | Value | Total Word Length (bits) |
|--------|-------------|---------|-------|--------------------------|
| ZR | 01 | Zero | 0 | 2 |
| B2 | 11 | 2-bit data | -2, -1, 1 | 4 |
| B3 | 100 | 3-bit data | [-4, 3] | 6 |
| B4 | 1010 | 4-bit data | [-8, 7] | 8 |
| B5 | 00100 | 5-bit data | [-16, 15] | 10 |
| B6 | 10110 | 6-bit data | [-32, 31] | 11 |
| B7 | 10111 | 7-bit data | [-64, 63] | 12 |
| B8 | 000100 | 8-bit data | [-128, 127] | 14 |
| B9 | 000101 | 9-bit data | [-256, 255] | 15 |
| B10 | 001010 | 10-bit data | [-512, 511] | 16 |
| B11 | 0011 | 11-bit data | [-1024, 1023] | 15 |
| B12 | 00011 | 12-bit data | [-2048, 2047] | 17 |
| RLZ | 0000 | Run of consecutive zeros | | 8 |
| FS | 00101101 | Frame start & decimation mode | | 11 |
| RH | 00101100 | Switch to high sampling rate | | 8 |
| RL | 0010111 | Switch to low sampling rate | | 7 |

Fig. 10. Scheduling of wavelet coefficients for sequential transmission with state machine and FIFOs for $d3_t$–$d1_t$. Colors are highlighted for FIFO push (yellow), pop (blue), and push and pop in the same clock cycle (green).

The lossless source coding scheme is designed by considering: 1) the common characteristics of ECG wavelet coefficients; and 2) its compatibility with ATD (the decimation rate of each encoded sample). The design is based on the data rate relationship between the wavelet scales and on the sparsity of the wavelet coefficients after shrinkage. All the internal thresholds are adaptive.
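The prefix/suffix scheme of Table I can be sketched as an encoder/decoder pair. The prefix codes and word lengths are taken from the table, and they form a prefix-free set, so a decoder can read bits until it matches a prefix. The following are assumptions, not details confirmed by the text:

- the suffix is the value in two's complement;
- the 4-bit RLZ field holds the run length directly;
- the 3-bit FS field (11 − 8 bits) carries $set\_dec$;
- RLZ is used for runs of five or more zeros (`RLZ_MIN`, hypothetical).

```python
# Prefix codes from Table I; integer keys are the suffix widths of B2-B12
PREFIX = {"ZR": "01", 2: "11", 3: "100", 4: "1010", 5: "00100", 6: "10110",
          7: "10111", 8: "000100", 9: "000101", 10: "001010", 11: "0011",
          12: "00011", "RLZ": "0000", "FS": "00101101", "RH": "00101100",
          "RL": "0010111"}
DECODE = {code: sym for sym, code in PREFIX.items()}
RLZ_BITS, FS_BITS = 4, 3   # inferred from Table I word lengths: 8 - 4 and 11 - 8
RLZ_MIN = 5                # hypothetical: RLZ (8 bits) beats ZR (2 bits each) from 5 zeros


def width(v):
    """Smallest k >= 2 such that v fits in k-bit two's complement."""
    k = 2
    while not -(1 << (k - 1)) <= v < (1 << (k - 1)):
        k += 1
    if k > 12:
        raise ValueError("coefficient exceeds 12 bits")
    return k


def encode(items):
    """items: ints (coefficients), 'RH', 'RL', or ('FS', set_dec). Returns a bit string."""
    bits, i = [], 0
    while i < len(items):
        it = items[i]
        if it == 0:                                   # zero run: ZR per zero or one RLZ
            run = 1
            while i + run < len(items) and items[i + run] == 0 and run < 2 ** RLZ_BITS - 1:
                run += 1
            if run >= RLZ_MIN:
                bits.append(PREFIX["RLZ"] + format(run, f"0{RLZ_BITS}b"))
            else:
                bits.append(PREFIX["ZR"] * run)
            i += run
            continue
        if isinstance(it, tuple):                     # ('FS', set_dec)
            bits.append(PREFIX["FS"] + format(it[1], f"0{FS_BITS}b"))
        elif it in ("RH", "RL"):
            bits.append(PREFIX[it])
        else:                                         # B2-B12: prefix + two's-complement suffix
            k = width(it)
            bits.append(PREFIX[k] + format(it & ((1 << k) - 1), f"0{k}b"))
        i += 1
    return "".join(bits)


def decode(bits):
    """Inverse of encode; works because the Table I prefixes are prefix-free."""
    out, pos = [], 0
    while pos < len(bits):
        code = ""
        while code not in DECODE:
            code += bits[pos]
            pos += 1
        sym = DECODE[code]
        if sym == "ZR":
            out.append(0)
        elif sym == "RLZ":
            out.extend([0] * int(bits[pos:pos + RLZ_BITS], 2))
            pos += RLZ_BITS
        elif sym == "FS":
            out.append(("FS", int(bits[pos:pos + FS_BITS], 2)))
            pos += FS_BITS
        elif sym in ("RH", "RL"):
            out.append(sym)
        else:                                         # suffix of width sym
            raw = int(bits[pos:pos + sym], 2)
            pos += sym
            out.append(raw - (1 << sym) if raw >= 1 << (sym - 1) else raw)
    return out


frame = [("FS", 5), 0, 0, 0, 0, 0, 0, 3, -2, "RH", 100, 0, -2048, 2047, "RL", 1]
stream = encode(frame)
```

For a single value, the encoded length matches the "Total Word Length" column. For example, `encode([3])` is 6 bits (B3), and `encode([2047])` is 17 bits (B12).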
To achieve real-time, high-throughput ECG data output, the data sample scheduler rearranges the parallel-input wavelet coefficients, as shown in Fig. 10. The source coder then encodes the coefficients sequentially. Fig. 11 depicts the complete circuit architecture. The wavelet coefficients in the low scales are highly sparse, so the output coefficients are grouped by scale. This increases the likelihood of long zero sequences and hence the CR. The data scheduling is realized by FIFOs for $d3_t$–$d1_t$. The data sample scheduler outputs the push/pop signals according to $a\_en$ for timing synchronization. It also outputs the $wcoef\_seq$ signal, which contains the wavelet coefficient samples in sequence. The highlighted colors in specific clock cycles illustrate the timing of each scale's FIFO operations: yellow for push, blue for pop, and green for push and pop. One set of wavelet coefficients contains 16 samples (8 from d1, 4 from d2, 2 from d3, and 1 each from d4 and a4). Grouping more sets sequentially by scale therefore increases the likelihood of long zero sequences, at the expense of deeper FIFOs. The coder reads the $wcoef\_seq$, $a\_en$, and $rmark$ signals and outputs the encoded signal and $bit\_length$ according to: 1) the predefined modified Huffman with run-length coding; 2) the frame format for encoding the wavelet coefficients; and 3) the instantaneous decimation rate for data reconstruction. Simulations using the MIT-BIH Arrhythmia database show that the source coding protocol reduces the average code length to 3.49 bits. The baseline is 13.68 bits, which includes the 12-bit wavelet coefficient, the 1-bit $rmark$, $set\_dec$, and the frame header FS. The result is a significant CR improvement.

Fig. 11. Proposed source coding architecture.

As the data communication protocol is layered, the low-level transmission mechanism is not implemented on chip.
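The effect of grouping coefficients by scale can be sketched as follows. The per-set sample counts follow from the sampling periods of 2, 4, 8, 16, and 16 cycles. The scale output order and the example coefficient values are hypothetical, and FIFO timing and $a\_en$ handling are omitted. The last line simply works out the average code-length reduction reported above.

```python
# Samples per set of 16 input samples (sampling periods 2, 4, 8, 16, 16 cycles)
PER_SET = {"d1": 8, "d2": 4, "d3": 2, "d4": 1, "a4": 1}
ORDER = ("a4", "d4", "d3", "d2", "d1")   # hypothetical output order of the scales


def schedule(sets):
    """Emit several coefficient sets scale by scale instead of in time order."""
    for s in sets:
        if any(len(s[k]) != n for k, n in PER_SET.items()):
            raise ValueError("malformed coefficient set")
    return [c for scale in ORDER for s in sets for c in s[scale]]


def longest_zero_run(seq):
    """Length of the longest run of consecutive zeros."""
    best = run = 0
    for c in seq:
        run = run + 1 if c == 0 else 0
        best = max(best, run)
    return best


# Two hypothetical sets after shrinkage: d1 is entirely zero in both
set_a = {"d1": [0] * 8, "d2": [0, 0, 5, 0], "d3": [0, -3], "d4": [7], "a4": [40]}
set_b = {"d1": [0] * 8, "d2": [0] * 4, "d3": [1, 0], "d4": [-2], "a4": [38]}
grouped = schedule([set_a, set_b])
print(longest_zero_run(grouped))   # 21: both d1 groups plus the trailing d2 zeros

# Average code length reported above (bits per coefficient)
print(13.68 / 3.49)                # about 3.92x shorter on average
```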
The wireless module (such as the TI CC2500 used here) is expected to provide transparent, reliable data communication through its packet handling, with features including forward error correction and cyclic redundancy check (CRC). In the rare case that the encoded signal is corrupted before it is fed into the wireless module, the receiver can resume correct decoding at the start of the next frame.

#### V. ULTRA-LOW-VOLTAGE DIGITAL CIRCUITS DESIGN

For the logic implementation, the energy consumption $E_T$ depends approximately quadratically on the supply voltage $V_{\rm DD}$:

$$E_T = C_{\text{eff}} V_{\text{DD}}^2 + W_{\text{eff}} I_{\text{leak}} V_{\text{DD}} t_d L_{\text{DP}}$$ (8)

where $E_T$ is the total energy, $C_{\rm eff}$ and $W_{\rm eff}$ are the effective capacitance and width, $I_{\rm leak}$ is the leakage current, $t_d$ is the propagation delay, and $L_{\rm DP}$ is the logic depth. Reducing $V_{\rm DD}$ therefore lowers the quadratic switching-energy term. It has been demonstrated in [29] that digital circuits operating in the subthreshold region achieve minimum energy consumption, and sub/near-threshold digital cells can significantly benefit applications such as ECG signal processing, where ultra-low power consumption is required and the processing-speed requirements are relaxed. However, logic gates operating below the threshold voltage are susceptible to PVT variations, which must be thoroughly verified. In this paper, a customized standard cell library based on our previous work [30] is designed to reduce the power consumption. To achieve this, the inverse-narrow-width effect is exploited to lower the threshold voltages of the nMOS and pMOS devices. Pass-gate-based architectures are also applied to sequential cells (e.g., DFFs and latches) and XOR/XNOR gates to reduce the logical effort.
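The trade-off behind the minimum-energy result of [29] can be illustrated numerically from (8). In the hedged sketch below, the delay $t_d$ is modeled as growing exponentially as $V_{\rm DD}$ falls, so the leakage term rises while the $C_{\rm eff} V_{\rm DD}^2$ term shrinks. All constants (`leak_weight`, `n_vt`, the voltage range) are invented for illustration, and the exponential current model is only meaningful around and below threshold.

```python
import math


def total_energy(vdd, c_eff=1.0, leak_weight=1000.0, n_vt=0.039):
    """Normalized energy per operation following the two terms of (8).

    Hypothetical subthreshold model: I_on ~ exp((VDD - Vth) / (n*vT)) and
    I_leak ~ exp(-Vth / (n*vT)), so t_d ~ VDD / I_on. The leakage term
    W_eff*I_leak*VDD*t_d*L_DP then reduces to
    leak_weight * VDD**2 * exp(-VDD / (n*vT)), where leak_weight lumps
    W_eff, L_DP and the delay constant together (made-up value).
    """
    dynamic = c_eff * vdd ** 2
    leakage = leak_weight * vdd ** 2 * math.exp(-vdd / n_vt)
    return dynamic + leakage


def minimum_energy_point(v_lo=0.1, v_hi=0.6, steps=501):
    """Grid-search the supply voltage that minimizes total_energy."""
    grid = [v_lo + (v_hi - v_lo) * i / (steps - 1) for i in range(steps)]
    return min(grid, key=total_energy)
```

With these made-up parameters, the grid search lands near 0.31 V: below that point the exponentially longer delay lets leakage energy dominate, and above it the quadratic switching term does. The location of the minimum depends entirely on the assumed constants, so the sketch does not predict the 0.45-V operating point used in this work.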
Unbalanced pull-up/pull-down networks, which can improve the energy efficiency of logic gates operating in the near-threshold region, are also employed for further power optimization. We have implemented a total of 56 cells, exhibiting an average area and power reduction of 7.13% and 35.62%, respectively. More detailed discussions on the optimization procedures can be found in [30]. Similar to the characterization process of commercial libraries, we developed three test cases, i.e., the best case (FF corner, +10% nominal voltage, and -40 °C), the typical case (TT corner, 0.45-V nominal voltage, and 25 °C), and the worst case (SS corner, -10% nominal voltage, and 125 °C), each including the power, timing, and functional information of all the logic gates in the digital library. The functional correctness of each logic cell in all cases is validated using ELC under different input slew and output loading conditions corresponding to the specified PVT points. Cells that failed were redesigned to increase the noise margin and then recharacterized until ELC reported no further errors. ELC simulation results show that, relative to the typical case at 0.45 V and 500 kHz, the resultant digital library exhibits 0.63× (4.58×) the power dissipation and a 0.84× faster (1.353× slower) operating frequency for the best (worst) case. This design margin should be sufficient for biosignal processing applications. We have also validated the customized library through silicon measurement of a 14-tap, 8-bit FIR filter [30]. By using the custom-designed energy-efficient circuit library instead of the commercial one, the complete ECG processor achieves a 30.02% power reduction when operating at 0.45 V.

#### VI. EXPERIMENTAL RESULTS

The single-channel ECG data-compression processor has a gate count of 19500. The chip photograph is shown in Fig. 12.
Fig. 12. Chip photo of the fabricated ECG processor.

Fig. 13. Power consumption measured from ten chips.

The chip occupies an active area of 0.86 mm² in a 1P6M 0.18-μm CMOS process, with a layout density of around 70%. The test prototype is enclosed in a plastic mold package. All measurements are performed with a 0.45-V supply and a 360-Hz external clock. An Agilent 16902B Modular Logic Analyzer System is employed both to generate the ECG input patterns (based on the MIT-BIH Arrhythmia database) and to monitor the processor outputs. The power consumption is measured using an Agilent 3458A multimeter at a room temperature of around 25 °C. A total of ten chips were measured in mode (1,1,3), with the power consumption distribution summarized in Fig. 13. The average power is 213.5 nW. Fig. 14 shows the power consumption of the different modes and the corresponding waveforms for the ($en\_ws$, $en\_nuf$, $set\_dec$) modes. The ECG signal is ATD-decimated according to the decimation control signal $a_{en}$. As shown in Fig. 14(d), the decimated signal can be recovered by interpolation with low PRD. The compressed signal after WS is shown in Fig. 14(e). The power consumption decreases as $set\_dec$ varies from 0 to 3 and from 4 to 7, as a result of the reduction in sampling rate. The power consumption is larger in mode (1,1,0) than in mode (1,0,0) because the ATD is enabled. Lossless compression is realized in mode (0,0,0) for accuracy, while lossy compression is enabled in the other modes for various levels of enhanced CR and lower power consumption. The larger power in mode (1,0,0) than in mode (0,0,0) is due to the enabling of wavelet coefficient shrinkage.

Fig. 14. Power consumptions of decimation modes and related signals. (a) Input ECG signal. (b) Adaptive temporal decimated ECG with the sampling rate in (c). (d) and (e) ATD recovered ECG and final recovered ECG after WS. Mode = $(en\_ws, en\_nuf, set\_dec)$.

TABLE II
MEASURED PERFORMANCE SUMMARY

| Parameter | Value |
|-----------|-------|
| Process | 0.18-μm CMOS |
| Logic Count | 19.5 k |
| Active Area (mm²) | 0.86 |
| Operation Frequency (Hz) | 360 (tunable from 60 to 1k) |
| Supply Voltage (V) | 0.45 |
| Input Signal (Sa/s) | 360 (tunable from 60 to 1k) |
| Compression Type | Lossless to Lossy |
| CR | 2.89 to 26.91 |
| PRD | 0% to 3.11% |
| Power Consumption (nW) | 147–375 |

The data are recovered on the PC side by decoding (according to Table I), inverse WT, and interpolation. The chip performance is summarized in Table II. Table III lists the CRs and PRDs of all the compression modes. Here, the MIT-BIH Arrhythmia database is employed for testing [16], [17]. Because baseline wander lies in the low-frequency band, it generally does not affect the data compression performance. However, a large dc drift can saturate the analog front end and degrade system performance. Wideband noise, in turn, can increase the signal distortion after ATD due to aliasing. Therefore, the test signals from the MIT-BIH Arrhythmia database are preprocessed by baseline wander removal and noise filtering. It should be noted that clinical evaluation by cardiologists should be carried out to further validate the quality of the compressed ECG signal in real applications. Lossless compression is realized with a CR of 2.89 when shrinkage and ATD are disabled. This compression capability stems from the energy compaction of the selected WT, which provides a sparse output, and from the source coding, which reduces the code length. Thresholding makes the wavelet coefficients sparser, providing a lossy compression CR of 3.12 at a small PRD of 0.12%.
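To make the interpolation-based recovery and the PRD figures more concrete, the sketch below compares uniform 4× decimation with an ATD-like scheme that keeps a narrow QRS-like window at full rate and decimates elsewhere by 4 (in the spirit of the 1|4 setting). It assumes linear interpolation and the common PRD definition, $\mathrm{PRD} = 100\sqrt{\sum (x-\hat{x})^2 / \sum x^2}$, without mean removal; the processor's actual interpolation method and PRD variant may differ, and the test signal is synthetic.

```python
import math


def prd(original, recovered):
    """Percentage RMS difference, common form without mean removal (assumed)."""
    num = sum((x - y) ** 2 for x, y in zip(original, recovered))
    den = sum(x ** 2 for x in original)
    return 100.0 * math.sqrt(num / den)


def decimate(signal, keep):
    """Keep (index, value) pairs where keep(i) is True, plus both endpoints."""
    last = len(signal) - 1
    return [(i, v) for i, v in enumerate(signal) if keep(i) or i in (0, last)]


def interpolate(samples):
    """Linear interpolation of the kept samples back onto every index."""
    out = []
    for (i0, v0), (i1, v1) in zip(samples, samples[1:]):
        for i in range(i0, i1):
            out.append(v0 + (v1 - v0) * (i - i0) / (i1 - i0))
    out.append(samples[-1][1])
    return out


# Toy "ECG": slow wave plus a narrow QRS-like pulse centered at sample 100.
x = [0.1 * math.sin(2 * math.pi * i / 200) + math.exp(-((i - 100) / 2) ** 2 / 2)
     for i in range(200)]
uniform = interpolate(decimate(x, lambda i: i % 4 == 0))
adaptive = interpolate(decimate(x, lambda i: i % 4 == 0 or abs(i - 100) <= 8))
```

Because the adaptive variant keeps every sample of the pulse, its PRD stays far below that of uniform decimation, while both keep the same samples on the slow wave. This is the intuition behind spending samples on QRS complexes and decimating the P/T regions more aggressively.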
When ATD is enabled, the total CR ranges from 5.24 to 26.91, while the total PRD ranges from 0.42% to 3.11%. The power saving at the wireless transmitter can be estimated from the data transmission time. For the CC2500 wireless module [31]–[33], the data transmission time is determined by the baud rate and the number of data bits to be transmitted. Neglecting the preamble, synchronization, and CRC bits, the power saving in wireless data transmission can be estimated as Power Saving (%) = $(1-1/CR) \times 100$. For example, a CR of 10 saves an estimated 90% of the wireless data transmission power. Table IV benchmarks this work against various ECG compression processors. With comparable data throughput and good data resolution (12 b), this work provides a higher CR and lower power consumption in the lossless mode. It also provides a wide range of CR across lossless and lossy compression while preserving a low PRD and low power consumption. The various compression modes can be set in real time by control bits to adapt to the application requirements. This optimized performance is achieved by jointly considering the algorithm, architecture, and circuit implementation. The power reductions obtained from the WS architecture optimization, ATD, and near-threshold circuits are shown in Fig. 15. The WT and shrinkage architecture optimizations contribute a 46% reduction in the overall design power. ATD reduces the overall circuit activity and lowers the power consumption by as much as 56.8%, according to Fig. 14. The near-threshold circuits contribute a 30.02% power reduction compared with the commercial circuit library operating at 0.45 V. Other techniques (e.g., word-length optimization and clock gating) also contribute considerably to the power efficiency.
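The estimates in this paragraph can be reproduced with simple arithmetic. The sketch below computes the transmit-time saving $(1-1/CR)\times 100$ and, as a first-order model, the sample reduction an ATD setting would give if a fixed fraction of the input lay in high-rate QRS regions. Reading the Table III footnote pairs (1|2, ..., 2|32) as QRS|P/T decimation factors is an interpretation, and `qrs_fraction` is a hypothetical input, not a measured quantity.

```python
# set_dec -> (QRS decimation, P/T decimation); reading the "1|2, ..., 2|32"
# list in the Table III footnote as QRS|P/T factors is an assumption.
DECIMATION = {0: (1, 2), 1: (1, 4), 2: (1, 8), 3: (1, 16),
              4: (2, 4), 5: (2, 8), 6: (2, 16), 7: (2, 32)}


def atd_ratio(set_dec, qrs_fraction):
    """Idealized ATD sample reduction when qrs_fraction of the input lies in
    high-rate (QRS) regions; ignores RH/RL switch words and frame headers."""
    d_qrs, d_pt = DECIMATION[set_dec]
    kept_fraction = qrs_fraction / d_qrs + (1.0 - qrs_fraction) / d_pt
    return 1.0 / kept_fraction


def wireless_saving_percent(cr):
    """(1 - 1/CR) x 100: transmit-time saving, neglecting preamble/sync/CRC."""
    return (1.0 - 1.0 / cr) * 100.0
```

For example, `wireless_saving_percent(26.91)` is about 96.3%. With a hypothetical QRS fraction of 0.15, `atd_ratio(0, 0.15)` is about 1.74 and `atd_ratio(4, 0.15)` about 3.48; since the on-chip ATD adapts its regions and inserts switch symbols, such estimates are only a first-order sanity check against the $CR_{\rm atd}$ column of Table III.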
While the near-threshold circuits (optimized for low-voltage operation, low capacitance, and high speed with the inverse-narrow-width effect) are generally applicable to low-power applications with moderate clock rates, algorithm and architecture designs with hardware-efficient considerations can tremendously reduce the power consumption, although they require designers with knowledge spanning the application and the whole stack of design hierarchies.

TABLE III
CR AND PRD (%) UNDER DIFFERENT DECIMATION RATE SETTINGS (TESTED WITH WHOLE MIT-BIH ARRHYTHMIA DATABASE)

| Compression Type | Compression Mode\* | $CR_{\rm atd}$ | $CR_{\rm ws}$ | $CR_{\rm tot}$ | $PRD_{\rm atd}$ (%) | $PRD_{\rm ws}$ (%) | $PRD_{\rm tot}$ (%) |
|---|---|---|---|---|---|---|---|
| Lossless | (0, 0, 0) | 1.00 | 2.89 | 2.89 | 0.00 | 0.00 | 0.00 |
| Lossy | (1, 0, 0) | 1.00 | 3.12 | 3.12 | 0.00 | 0.12 | 0.12 |
| Lossy | (1, 1, 0) | 1.74 | 3.01 | 5.24 | 0.17 | 0.36 | 0.42 |
| Lossy | (1, 1, 1) | 2.80 | 3.59 | 10.00 | 0.66 | 0.47 | 0.86 |
| Lossy | (1, 1, 2) | 4.04 | 3.32 | 13.29 | 1.07 | 0.64 | 1.29 |
| Lossy | (1, 1, 3) | 5.23 | 3.11 | 15.92 | 1.31 | 0.83 | 1.61 |
| Lossy | (1, 1, 4) | 3.48 | 3.48 | 12.11 | 0.69 | 0.69 | 1.02 |
| Lossy | (1, 1, 5) | 5.57 | 2.77 | 15.34 | 1.07 | 0.84 | 1.39 |
| Lossy | (1, 1, 6) | 8.01 | 2.30 | 18.11 | 1.30 | 0.95 | 1.64 |
| Lossy | (1, 1, 7) | 10.21 | 2.70 | 26.91 | 1.78 | 2.44 | 3.11 |

\* The compression mode consists of ($en\_ws$, $en\_nuf$, $set\_dec$), i.e., shrinkage enable, ATD enable, and decimation rate setting ($set\_dec$ = 0–7), corresponding to QRS-wave|P/T-wave decimation rates of 1|2, 1|4, 1|8, 1|16, 2|4, 2|8, 2|16, and 2|32.

TABLE IV
PERFORMANCE COMPARISON

| | This Work | | | | Tran. CE'11 [11] | EL'13 [12] |
|---|---|---|---|---|---|---|
| Verification Level | Experiment | | | Experiment | Simulation | Simulation |
| CMOS Tech. (nm) | 180 | | | 180 | 65 | 180 |
| $V_{\rm DD}$ (V) | 0.45 | | | 1.8 | 1 | 1 |
| Operation Freq. (Hz) | 360 (typical) | | | 250–1k | 24 M | 100 M |
| Sampling Rate (Sa/s) | 360 (typical) | | | 250–1k | 256 | NA |
| BW (bits) | 12 | | | 16 | 10 | NA |
| CR | 2.89 | 5.24 | 26.91 | 8.4 or 2.1 | 2.38 | 2.43 |
| PRD (%) | 0 | 0.12 | 3.11 | 0.641 | 0 | 0 |
| Power/Channel (μW) | 0.313 | 0.375 | 0.147 | 6 | 56.6 | 36.4 |

Fig. 15. Power reduction breakdown using the proposed techniques.

#### VII. CONCLUSION

This paper reported a power-efficient real-time ECG processor suitable for long-term wireless cardiac monitoring. It innovates at the algorithmic, architectural, and circuit levels to achieve power-efficient configurable data compression, namely through ATD, WS, modified Huffman and run-length source coding, and near-threshold digital logic. The achieved low power consumption (147–375 nW at 0.45 V), together with a wide range of CR (2.89–26.91) and low PRD (0%–3.11%), makes the proposed ECG compression processor suitable for long-term ECG monitoring. The processor is also fully validated with the MIT-BIH Arrhythmia database.

## REFERENCES

- [1] E. S. Winokur, M. K. Delano, and C. G. Sodini, "A wearable cardiac monitor for long-term data acquisition and analysis," *IEEE Trans. Biomed. Eng.*, vol. 60, no. 1, pp. 189–192, Jan. 2013.
- [2] A. Burns et al., "SHIMMER—A wireless sensor platform for non-invasive biomedical research," *IEEE Sensors J.*, vol. 10, no. 9, pp. 1527–1534, Sep. 2010.
- [3] S. S. Lobodzinski, "ECG patch monitors for assessment of cardiac rhythm abnormalities," *Prog. Cardiovascular Diseases*, vol. 56, no. 2, pp. 224–229, 2013.
- [4] M. Shoaib, N. K. Jha, and N. Verma, "Algorithm-driven architectural design space exploration of domain-specific medical-sensor processors," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 21, no. 10, pp. 1849–1862, Oct. 2013.
- [5] C. Piguet, "Ultra-low-power processor design," in *High-Performance Energy-Efficient Microprocessor Design*. New York, NY, USA: Springer, 2006, pp. 1–30.
- [6] C.-I. Ieong et al., "A 0.83-μW QRS detection processor using quadratic spline wavelet transform for wireless ECG acquisition in 0.35-μm CMOS," IEEE Trans. Biomed.
Circuits Syst., vol. 6, no. 6, pp. 586–595, Dec. 2012.
- [7] N. Bayasi, T. Tekeste, H. Saleh, B. Mohammad, A. Khandoker, and M. Ismail, "Low-power ECG-based processor for predicting ventricular arrhythmia," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 24, no. 5, pp. 1962–1974, May 2016.
- [8] M. Zare and M. Maymandi-Nejad, "A fully digital front-end architecture for ECG acquisition system with 0.5 V supply," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 24, no. 1, pp. 256–265, Jan. 2016.
- [9] F. Chen, A. P. Chandrakasan, and V. Stojanović, "A signal-agnostic compressed sensing acquisition system for wireless and implantable sensors," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, Sep. 2010, pp. 1–4.
- [10] Y. Zhang et al., "A batteryless 19 μW MICS/ISM-band energy harvesting body area sensor node SoC for ExG applications," in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC)*, Jan. 2012, pp. 298–300.
- [11] E. Chua and W.-C. Fang, "Mixed bio-signal lossless data compressor for portable brain-heart monitoring systems," *IEEE Trans. Consum. Electron.*, vol. 57, no. 1, pp. 267–273, Feb. 2011.
- [12] S. L. Chen and J. G. Wang, "VLSI implementation of low-power cost-efficient lossless ECG encoder design for wireless healthcare monitoring application," *Electron. Lett.*, vol. 49, no. 2, pp. 91–93, Jan. 2013.
- [13] H. Mamaghanian, N. Khaled, D. Atienza, and P. Vandergheynst, "Compressed sensing for real-time energy-efficient ECG compression on wireless body sensor nodes," *IEEE Trans. Biomed. Eng.*, vol. 58, no. 9, pp. 2456–2466, Sep. 2011.
- [14] B. R. S. Reddy and I. S. N. Murthy, "ECG data compression using Fourier descriptors," *IEEE Trans. Biomed. Eng.*, vol. BME-33, no. 4, pp. 428–434, Apr. 1986.
- [15] H. Kim, R. F. Yazicioglu, T. Torfs, P. Merken, H. J. Yoo, and C. van Hoof, "A low power ECG signal processor for ambulatory arrhythmia monitoring system," in *Proc. Symp. VLSI Circuits*, Jun. 2010, pp. 19–20.
- [16] Y. Zigel, A.
Cohen, and A. Katz, "The weighted diagnostic distortion (WDD) measure for ECG signal compression," *IEEE Trans. Biomed. Eng.*, vol. 47, no. 11, pp. 1422–1430, Nov. 2000.
- [17] G. B. Moody and R. G. Mark, "The impact of the MIT-BIH arrhythmia database," *IEEE Eng. Med. Biol. Mag.*, vol. 20, no. 3, pp. 45–50, May/Jun. 2001.
- [18] S. Lee, J. Kim, and J. Lee, "A real-time ECG data compression and transmission algorithm for an e-health device," *IEEE Trans. Biomed. Eng.*, vol. 58, no. 9, pp. 2448–2455, Sep. 2011.
- [19] J.-J. Wei, C.-J. Chang, N.-K. Chou, and G.-J. Jan, "ECG data compression using truncated singular value decomposition," *IEEE Trans. Inf. Technol. Biomed.*, vol. 5, no. 4, pp. 290–299, Dec. 2001.
- [20] Z. Lu, D. Y. Kim, and W. A. Pearlman, "Wavelet compression of ECG signals by the set partitioning in hierarchical trees algorithm," *IEEE Trans. Biomed. Eng.*, vol. 47, no. 7, pp. 849–856, Jul. 2000.
- [21] A. Djohan, T. Q. Nguyen, and W. J. Tompkins, "ECG compression using discrete symmetric wavelet transform," in *Proc. IEEE 17th Annu. Conf. Eng. Med. Biol. Soc.*, Sep. 1995, pp. 167–168.
- [22] G. Strang and T. Nguyen, *Wavelets and Filter Banks*. Philadelphia, PA, USA: SIAM, 1996.
- [23] I. H. Witten, R. M. Neal, and J. G. Cleary, "Arithmetic coding for data compression," *Commun. ACM*, vol. 30, no. 6, pp. 520–540, Jun. 1987.
- [24] R. R. Osorio and J. D. Bruguera, "A new architecture for fast arithmetic coding in H.264 advanced video coder," in *Proc. IEEE 8th Eur. Conf. Digit. Syst. Design*, Aug. 2005, pp. 298–305.
- [25] R. Benzid, F. Marir, A. Boussaad, M. Benyoucef, and D. Arar, "Fixed percentage of wavelet coefficients to be zeroed for ECG compression," *Electron. Lett.*, vol. 39, no. 11, pp. 830–831, May 2003.
- [26] S. Poornachandra, "Wavelet-based denoising using subband dependent threshold for ECG signals," *Digit. Signal Process.*, vol. 18, no. 1, pp. 49–55, Jan. 2008.
- [27] D. A. Dipersio and R. C.
Barr, "Evaluation of the fan method of adaptive sampling on human electrocardiograms," *Med. Biol. Eng. Comput.*, vol. 23, no. 5, pp. 401–410, Sep. 1985.
- [28] H. Kim, R. F. Yazicioglu, P. Merken, C. Van Hoof, and H.-J. Yoo, "ECG signal compression and classification algorithm with quad level vector for ECG holter system," *IEEE Trans. Inf. Technol. Biomed.*, vol. 14, no. 1, pp. 93–100, Jan. 2010.
- [29] B. H. Calhoun and A. Chandrakasan, "Characterizing and modeling minimum energy operation for subthreshold circuits," in *Proc. Int. Symp. Low Power Electron. Design (ISLPED)*, Aug. 2004, pp. 90–95.
- [30] M. Z. Li et al., "Energy optimized subthreshold VLSI logic family with unbalanced pull-up/down network and inverse narrow-width techniques," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 23, no. 12, pp. 3119–3123, Dec. 2015.
- [31] Texas Instruments. Wireless Sensor Monitor Using the eZ430-RF2500. [Online]. Available: http://www.ti.com/lit/an/slaa378d/slaa378d.pdf
- [32] Texas Instruments. CC2500 Low-Cost Low-Power 2.4 GHz RF Transceiver. [Online]. Available: http://www.ti.com/lit/ds/symlink/cc2500.pdf
- [33] Texas Instruments Low Power RF Transceiver IC CC1100/CC2500 Power Consumption Test, accessed Dec. 22, 2016. [Online]. Available: http://www.silicontra.com/admin/bookpic/20120110150308555.pdf

Chio-In Ieong received the B.Eng. degree, majoring in communication engineering with a minor in computer science and technology, from Sun Yat-Sen University, Guangzhou, China, in 2003, and the M.Sc. and Ph.D. degrees in electrical and electronics engineering from the University of Macau, Macau, China, in 2008 and 2016, respectively, where he carried out his M.Sc. and Ph.D. work with the Biomedical Engineering Laboratory and the State Key Laboratory of Analog and Mixed-Signal VLSI. He was a Teacher at a middle school in Macau, teaching mathematics and physics and leading a student robotics interest team.
He has been a Graduate Assistant/Teaching Assistant, tutoring the bachelor-level courses Microprocessors, Digital Controllers, Digital Signal Processing, Communication System, and Data Network, and a Research Assistant on ECG signal processing and brain-computer interface research topics at the University of Macau. Since 2011, he has been a Senior Laboratory Technician, and later also the Laboratory Safety Officer, at the State Key Laboratory of Analog and Mixed-Signal VLSI. He has participated in several research projects supported by the Research Committee of the University of Macau and the Macau Science and Technology Development Fund, designed algorithms and electronic systems, and taped out at least ten digital ICs. He was awarded a studentship by the University of Macau for his Ph.D. study. He will soon join Huawei Technologies Co., Ltd., Shenzhen, China, as a Researcher. He has authored over 15 papers in peer-reviewed academic journals (e.g., TBioCAS and TVLSI) and conferences (e.g., EMBC, CinC, ASP-DAC, and ISSCC SRP) on energy-efficient processor design, bio-signal detection, and data compression. His current research interests include energy-efficient very large scale integration, bio-signal detection and data compression, and machine intelligence.

Dr. Ieong was a recipient of the Student Travel Grant Award from the IEEE Solid-State Circuits Society for his Student Research Preview presentation at the International Solid-State Circuits Conference.

Mingzhong Li received the B.E. degree in biomedical engineering from the Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, China, in 2011, and the M.Sc. degree in electrical and electronics engineering from the University of Macau, Macau, China, in 2014, where he is currently pursuing the Ph.D. degree in electrical and computer engineering. His current research interests include biomedical signal processing, low-power and ultralow-power VLSI circuits and systems development, as well as hologram image processing and digital microfluidics.
Man-Kay Law (M'11–SM'16) received the B.Sc. degree in computer engineering and the Ph.D. degree in electronic and computer engineering from the Hong Kong University of Science and Technology (HKUST), Hong Kong, in 2006 and 2011, respectively. In 2011, he joined HKUST as a Visiting Assistant Professor. He is currently an Assistant Professor with the State Key Laboratory of Analog and Mixed-Signal VLSI, Faculty of Science and Technology, University of Macau, Macau, China. He developed an ultra-low-power fully integrated CMOS temperature-sensing passive UHF RFID tag together with the Zhejiang Advanced Manufacturing Institute and HKUST. He has authored or co-authored over 70 technical journal and conference papers and holds three U.S. patents. His current research interests include the development of ultralow-power sensing circuits and integrated energy harvesting techniques for wireless and biomedical applications.

Dr. Law was a Technical Program Committee Member of the Asia Symposium on Quality Electronic Design from 2012 to 2013, a Review Committee Member of the IEEE International Symposium on Circuits and Systems from 2012 to 2015, the Biomedical Circuits and Systems Conference from 2012 to 2015, and the International Symposium on Integrated Circuits in 2014, and the University Design Contest Co-Chair of the Asia and South Pacific Design Automation Conference in 2016. He is a member of the IEEE CAS Committees on Sensory Systems and on Biomedical Circuits and Systems. He was a co-recipient of the ASQED Best Paper Award in 2013, the A-SSCC Distinguished Design Award in 2015, and the ASPDAC Best Design Award in 2016. He was a recipient of the Macao Science and Technology Invention Award (2nd class) from the Macau Government-FDCT in 2014.

Pui-In Mak (S'00–M'08–SM'11) received the Ph.D. degree from the University of Macau (UM), Macau, China, in 2006.
He is currently an Associate Professor with the Faculty of Science and Technology - ECE, UM, and an Associate Director (Research) with the State Key Laboratory of Analog and Mixed-Signal VLSI, UM. His current research interests include analog and radio-frequency circuits and systems for wireless, biomedical, and physical chemistry applications. His group has contributed seven state-of-the-art chips at ISSCC: wideband receivers ('11, '14, '15), micro-power amplifiers ('12, '14), and ultralow-power receivers ('13, '14). His team also pioneered the world-first Intelligent Digital Microfluidic Technology (iDMF) with micro-Nuclear Magnetic Resonance (μNMR) and Polymerase Chain Reaction (PCR) capabilities. He has co-authored three books: *Analog-Baseband Architectures and Circuits for Multistandard and Low-Voltage Wireless Transceivers* (Springer'07), *High-/Mixed-Voltage Analog and RF Circuit Techniques for Nanoscale CMOS* (Springer'12), and *Ultra-Low-Power and Ultra-Low-Cost Short-Range Wireless Receivers in Nanoscale CMOS* (Springer'15).

Dr. Mak was an Editorial Board Member of the IEEE Press from 2014 to 2016, an IEEE Distinguished Lecturer from 2014 to 2015, a Member of the Board of Governors of the IEEE Circuits and Systems Society from 2009 to 2011, a Senior Editor of the IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS from 2014 to 2015, a Guest Editor of the IEEE RFIC VIRTUAL JOURNAL in 2014, and an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I from 2010 to 2011 and from 2014 to 2015 and of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II from 2010 to 2013. He is the TPC Vice Co-Chair of ASP-DAC 2016. He was a co-recipient of the DAC/ISSCC Student Paper Award in 2005.
He was a recipient of the CASS Outstanding Young Author Award in 2010, the SSCS Pre-Doctoral Achievement Awards in 2014 and 2015, the National Scientific and Technological Progress Award in 2011, and the Best Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II from 2012 to 2013. In 2005, he was decorated with the Honorary Title of Value for scientific merits by the Macau Government.

Mang I. Vai (M'92–SM'06) received the Ph.D. degree in electrical and electronics engineering from the University of Macau (UM), Macau, China, in 2002. He is currently the Coordinator of the Biomedical IC Research Line, State Key Laboratory of Analog and Mixed-Signal VLSI, UM, and an Associate Professor of Electrical and Computer Engineering with the Faculty of Science and Technology, UM. His current research interests include digital signal processing and embedded systems.

Rui P. Martins (M'88–SM'99–F'08) was born in 1957. He received the bachelor's, master's, and Ph.D. degrees, as well as the Habilitation for Full-Professor, in electrical engineering and computers from the Department of Electrical and Computer Engineering (DECE), Instituto Superior Técnico (IST), Technical University of Lisbon, Lisbon, Portugal, in 1980, 1985, 1992, and 2001, respectively. He has been with DECE, IST, since 1980. Since 1992, he has been on leave from IST, TU of Lisbon (University of Lisbon since 2013). He has also been the Vice-Rector of the University of Macau (UM), Macau, China, since 1997, and is also with DECE, Faculty of Science and Technology, UM, where he has been a Chair-Professor since 2013. Within the scope of his teaching and research activities, he has taught 21 bachelor's and master's courses and, at UM, has supervised (or co-supervised) 40 theses: Ph.D. (19) and master's (21).
He has co-authored six books and nine book chapters, 355 papers in scientific journals (104) and conference proceedings (251), as well as 60 other academic works, for a total of 448 publications, and holds 18 patents: U.S. (16) and Taiwan (2). He was a Co-Founder of Chipidea Microelectronics, Macao [now Synopsys], in 2001/2002, and created the Analog and Mixed-Signal VLSI Research Laboratory, UM, in 2003, which was elevated to the State Key Laboratory of China (the first in engineering in Macau) in 2011, with him as its Founding Director.

Prof. Martins was the Founding Chairman of both the IEEE Macau Section during 2003–2005 and the IEEE Macau Joint-Chapter on Circuits and Systems (CAS)/Communications during 2005–2008 [2009 World Chapter of the Year of the IEEE CAS Society (CASS)]. He was the General Chair of the 2008 IEEE Asia-Pacific Conference on CAS (APCCAS'2008) and the Vice President for Region 10 (Asia, Australia, and the Pacific) of the IEEE CASS during 2009–2011. Subsequently, he was the Vice President (World) of Regional Activities and Membership of the IEEE CASS during 2012–2013, and an Associate Editor of the IEEE TRANSACTIONS ON CAS (T-CAS) II: EXPRESS BRIEFS during 2010–2013, being nominated the Best Associate Editor of T-CAS II for 2012 to 2013. He was a member of the IEEE CASS Fellow Evaluation Committee in 2013 and 2014, and the CAS Society representative on the Nominating Committee for the 2014 election of the IEEE Division I (CASS/EDS/SSCS) Director. He was the General Chair of the ACM/IEEE Asia South Pacific Design Automation Conference (ASP-DAC'2016). He is currently a Nominations Committee Member of the IEEE CASS. He was a recipient of two government decorations: the Medal of Professional Merit from the Macao Government (Portuguese Administration) in 1999 and the Honorary Title of Value from the Macao SAR Government (Chinese Administration) in 2001.
In 2010, he was elected, unanimously, as a corresponding member of the Portuguese Academy of Sciences (in Lisbon), and is the only Portuguese Academician living in Asia.