## Low-complexity, full-resolution, mirror-switching digital predistortion scheme for polar-modulated power amplifiers

W.-H. Yu, W.-F. Cheng, Y. Li, C.-F. Cheang, P.-I. Mak and R.P. Martins

Proposed is a mirror-switching digital predistortion (DPD) scheme with low complexity and full resolution, tailored for power-efficient polar-modulated power amplifiers with nonlinear AM-AM and AM-PM characteristics. The digital circuitry comprises just one CORDIC operator and two 1D look-up tables, avoiding any real-time interpolation or analogue operation. The DPD scheme is verified on an FPGA, and its estimated power in a 65 nm CMOS technology is 5.2 mW. The training time is ~82 $\mu$s at a clock rate of 100 MHz. System-level simulations in MATLAB show significant improvements in error vector magnitude, from 104.8 to 2%, and in adjacent channel leakage ratio, from 21.36 to 49.27 dB, under a 20 MHz-bandwidth 64-QAM OFDM test signal.

*Introduction:* Nonlinear power amplifiers (PAs) featuring high power-added efficiency are of particular interest in portable wireless devices, where they extend battery life. Various analogue, digital and mixed-signal PA linearisation techniques have been reported, aimed at managing the adjacent channel leakage ratio (ACLR) and error vector magnitude (EVM) with minimum power and area overheads. With the rapid downscaling of CMOS technologies into the nanoscale regime, purely digital predistortion (DPD) techniques have become the most promising candidates owing to their scalability, accuracy, and power and area efficiency.

Among the existing DPD techniques, complex gain [1], the 2D lookup table (LUT) [2], polynomial approximation [3] and the iterative LUT [4] are the most representative. Complex gain and 2D LUT techniques suffer from a hard trade-off between accuracy and memory: either a massive amount of data must be stored, or interpolation is required, which increases the design complexity and introduces inaccuracy. Although the polynomial approximation and iterative LUT techniques can enhance the accuracy-to-memory ratio, an iterative algorithm is required to continuously compute the polynomial coefficients or update the LUT entries in real time, demanding a long training time and being prone to instability for highly nonlinear PAs (e.g. class-E). Finally, lowering the resolution by reducing the number of coefficients or table entries is also undesirable, as it simply degrades the effectiveness of the DPD.

In this Letter a mirror-switching DPD scheme is proposed, exploiting the principle that the desired AM-AM predistortion curve and the PA distortion curve are *mirrored* about the ideal x = y line. Thus, once the input and output of the LUTs are exchanged, the desired AM-AM predistortion curve is obtained with low complexity and no resolution degradation. This matters because stability and resolution are known to be the prime concerns of DPD techniques for highly nonlinear PAs. The principle and implementation are detailed next.

*Proposed DPD scheme:* The nonlinearity of a PA can be characterised by its AM-AM and AM-PM distortion curves. Since the desired AM-AM predistortion curve and the PA distortion curve are mirrored about the ideal x = y line, exchanging the input and output of the LUT directly yields the desired AM-AM predistortion curve with a high accuracy-to-memory ratio (e.g. $2 \times N$ memory points yield $N^2$ points in accuracy). In implementation, exchanging the x- and y-axes amounts to swapping the AM address and data terminals of the LUTs between the training and DPD operation phases (Fig. 1a). For the AM-PM curve, no such inversion is necessary. The AM-PM data is combined directly with the output via a simple subtraction during the operation phase (Fig. 1b).
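The mirror principle can be illustrated with a minimal Python sketch. The toy curves `pa_am_am` and `pa_am_pm`, the normalised amplitude range and the use of `np.interp` as a stand-in for the table read-out are all assumptions made for illustration; they are not the PA model or the hardware of this Letter.

```python
import numpy as np

def pa_am_am(x):
    # Toy compressive AM-AM curve mapping [0, 1] onto [0, 1] (assumption).
    return np.tanh(2.0 * x) / np.tanh(2.0)

def pa_am_pm(x):
    # Toy AM-PM curve: phase shift in radians versus PA input amplitude (assumption).
    return 0.6 * x ** 2

# "Training": record (input, output) amplitude pairs of the PA over a ramp.
x = np.linspace(0.0, 1.0, 4096)
y = pa_am_am(x)

def predistort(m, phi):
    # Mirror switching: the PA output acts as the abscissa, the PA input as the ordinate.
    m_pd = np.interp(m, y, x)
    # AM-PM needs no inversion: subtract the phase shift expected at the PA input level.
    phi_pd = phi - pa_am_pm(m_pd)
    return m_pd, phi_pd

# Check: predistorter followed by the PA should return the wanted amplitude and phase.
m = np.linspace(0.05, 0.95, 19)
phi = np.linspace(-np.pi / 2, np.pi / 2, 19)
m_pd, phi_pd = predistort(m, phi)
max_am_err = np.max(np.abs(pa_am_am(m_pd) - m))
max_pm_err = np.max(np.abs(phi_pd + pa_am_pm(m_pd) - phi))
```

Swapping the roles of `x` and `y` is the only "inversion" performed; no curve fitting or iteration is involved, which is the property the scheme relies on.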

In the training phase a preset ramp signal in polar form is applied to the PA. The frequency-modulated output signal is then looped back by re-using the receiver path, for digital time and gain alignment. The two training steps are: 1. write the received data into the LUTs, and 2. fill in the missing points by linear interpolation. Note that this is *not* a real-time interpolation; it is executed only once per training. The overall training time is the loop delay time plus twice the training signal length, covering the table writing and the interpolation process. The CORDIC and predistortion LUTs are re-used in the operation phase, minimising the total add-on hardware.
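The two training steps can be sketched as follows. This is a hedged illustration, not the authors' implementation: the toy curve `pa_am_am`, the `-1` marker for unwritten entries and the assumption that the ramp has one sample per table entry (4096, the depth listed in Table 2) are all introduced here.

```python
import numpy as np

N = 4096        # table depth, as listed for this work in Table 2
F_CLK = 100e6   # clock rate quoted in the Letter

def pa_am_am(x):
    # Toy compressive AM-AM curve on [0, 1] (assumption, not the Letter's PA model).
    return np.tanh(2.0 * x) / np.tanh(2.0)

def train_am_lut(n=N):
    ramp = np.arange(n)                                            # preset ramp codes
    rx = np.rint(pa_am_am(ramp / (n - 1)) * (n - 1)).astype(int)   # looped-back codes
    lut = np.full(n, -1, dtype=int)                                # -1 marks "not written"
    # Step 1: table writing with swapped roles (received code = address, ramp code = data).
    # Where several ramp codes hit the same address, the last write wins.
    lut[rx] = ramp
    # Step 2: one-off linear interpolation over the addresses that were never written.
    written = np.flatnonzero(lut >= 0)
    missing = np.flatnonzero(lut < 0)
    lut[missing] = np.rint(np.interp(missing, written, lut[written])).astype(int)
    return lut, missing.size

lut, n_filled = train_am_lut()

# Residual nonlinearity after predistortion, measured in output codes.
codes = np.arange(N)
max_code_error = np.max(np.abs(pa_am_am(lut / (N - 1)) * (N - 1) - codes))

# Two passes over the table (writing, then interpolation), loop delay excluded.
training_time_s = 2 * N / F_CLK
```

Under the one-sample-per-entry assumption, the two passes take $2 \times 4096$ clock cycles, i.e. 81.92 $\mu$s at 100 MHz, which is of the same order as the training time quoted in the Letter.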

The use of a CORDIC algorithm for Cartesian [I, Q] and polar $[M, \Phi]$ co-ordinate conversion has several distinct benefits for polar-modulated PAs. First, the distortion arises from the AM rather than from I/Q, allowing the use of 1D tables instead of a 2D table and saving much memory. Secondly, the CORDIC greatly reduces the complexity of the time alignment block, given that the phase information need not be de-rotated to compensate for the delay of the whole feedback loop, while the amplitude data is unaffected by it. Thus, the amplitude information alone can be used to determine the time delay effectively.
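As a rough illustration of amplitude-only time alignment, the sketch below estimates an integer loop delay by correlating the transmitted and received amplitudes. The function `estimate_delay`, the random test signal, and the loop gain, phase rotation and noise level are all hypothetical; the Letter does not describe its alignment block at this level of detail.

```python
import numpy as np

def estimate_delay(tx_amp, rx_amp, max_delay):
    """Return the integer delay (in samples) that best aligns rx_amp with tx_amp."""
    tx = tx_amp - tx_amp.mean()
    scores = []
    for d in range(max_delay + 1):
        seg = rx_amp[d:d + tx.size]
        seg = seg - seg.mean()
        denom = np.linalg.norm(tx) * np.linalg.norm(seg)
        # Normalised correlation, so the unknown loop gain does not matter.
        scores.append(np.dot(tx, seg) / denom if denom > 0 else 0.0)
    return int(np.argmax(scores))

rng = np.random.default_rng(0)
n, true_delay = 1024, 37
tx = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Loop-back path: unknown gain, unknown phase rotation, pure delay, small noise.
loop = 0.8 * np.exp(1j * 1.1) * tx
rx = np.concatenate([np.zeros(true_delay, complex), loop, np.zeros(64, complex)])
rx += 0.01 * (rng.standard_normal(rx.size) + 1j * rng.standard_normal(rx.size))

# Only |.| is used, so the phase rotation of the loop never has to be undone.
delay = estimate_delay(np.abs(tx), np.abs(rx), max_delay=64)
```

Because only the magnitude output of the polar conversion enters the correlation, the estimate is insensitive to the phase offset of the feedback path.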



**Fig. 1** *Proposed DPD scheme for polar-modulated PA*

*a* Training phase (receiver re-used for loop back)

*b* Operation phase (transmitter and receiver are independent)

The accuracy of the conventional CORDIC algorithm [5] is limited by the resolution of the analogue-to-digital converter (one sign bit), since the maximum number of iterations is set equal to its number of bits. In this Letter, the accuracy is improved by extending the bits and stages with trailing zeros. An optimised 18-stage pipelined zero-oriented architecture balances the performance with power, as shown in Fig. 2. Its power consumption is determined using Cadence Encounter<sup>TM</sup> with the 65 nm CMOS process. The power of the CORDIC is 1.3 mW when clocked at 100 MHz, of which 793 $\mu$W is due to the bit extension.
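The effect of the bit extension can be explored with a small integer CORDIC model in vectoring mode. The split into a 12-bit input plus 6 trailing zeros is an assumption chosen only to reach the 18 stages mentioned above, and the phase accumulator is kept in floating point for brevity, whereas hardware would use a fixed-point arctangent table.

```python
import math

def cordic_vectoring(i, q, in_bits=12, ext_bits=6):
    """Integer CORDIC in vectoring mode; returns (magnitude, phase in rad)."""
    stages = in_bits + ext_bits             # one stage per (extended) bit
    x, y = i << ext_bits, q << ext_bits     # bit extension: append trailing zeros
    phase = 0.0
    if x < 0:                               # fold the left half-plane into the right
        x, y = -x, -y
        phase = math.pi if q >= 0 else -math.pi
    for k in range(stages):
        dx, dy = x >> k, y >> k             # arithmetic shifts, as in hardware
        if y >= 0:                          # rotate clockwise towards the x-axis
            x, y = x + dy, y - dx
            phase += math.atan(2.0 ** -k)
        else:                               # rotate anticlockwise
            x, y = x - dy, y + dx
            phase -= math.atan(2.0 ** -k)
    gain = math.prod(math.sqrt(1.0 + 4.0 ** -k) for k in range(stages))
    return x / (gain * 2 ** ext_bits), phase

def max_phase_error(ext_bits, radius=1000, points=64):
    """Worst phase error over integer points close to a circle of the given radius."""
    worst = 0.0
    for n in range(points):
        a = 2.0 * math.pi * n / points
        i, q = round(radius * math.cos(a)), round(radius * math.sin(a))
        _, est = cordic_vectoring(i, q, in_bits=12, ext_bits=ext_bits)
        diff = (est - math.atan2(q, i) + math.pi) % (2.0 * math.pi) - math.pi
        worst = max(worst, abs(diff))
    return worst

err_plain = max_phase_error(ext_bits=0)      # 12 stages, no extension
err_extended = max_phase_error(ext_bits=6)   # 18 stages, 6 trailing zeros
```

Comparing `err_plain` with `err_extended` shows how the extra stages and the finer internal word length reduce the residual angle error, at the cost of wider datapaths.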



**Fig. 2** *CORDIC algorithm optimisation with bit extensions*

*Test results:* The system-level fixed-point simulations were carried out in MATLAB. The nonlinear PA model features considerable AM-AM and AM-PM distortions, as shown in Fig. 3. With the DPD applied after a training time of ~83 $\mu$s, the ACLR is enhanced from 21.36 to 49.27 dB (Fig. 4*a*) at 11 MHz, while the EVM is reduced from 104.8 to 2% (Fig. 4*b*). The entire DPD scheme was tested on an FPGA at a clock rate of 100 MHz and its functionality was verified. The power breakdown is summarised in Table 1. Table 2 benchmarks this work against the prior art [2–4]. This work is advantageous for its high accuracy-to-memory ratio, fast training time and low circuit complexity.
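For readers who wish to reproduce the EVM figure of merit, a minimal sketch is given below. It assumes the common RMS definition normalised to the reference power; the Letter does not state which normalisation its MATLAB simulations use, and the 2% gain error is a made-up example rather than simulated PA data.

```python
import numpy as np

def evm_percent(ref, meas):
    """RMS error vector magnitude in percent, normalised to the RMS reference power."""
    ref = np.asarray(ref)
    err = np.asarray(meas) - ref
    return 100.0 * np.sqrt(np.mean(np.abs(err) ** 2) / np.mean(np.abs(ref) ** 2))

# Ideal 64-QAM constellation points on the odd-integer grid.
levels = np.arange(-7, 8, 2)
ref = (levels[:, None] + 1j * levels[None, :]).ravel()

# Hypothetical measurement: a pure 2% gain error on every symbol.
meas = ref * 1.02
evm = evm_percent(ref, meas)
```

Under this definition an EVM above 100% simply means that the error power exceeds the reference power.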



**Fig. 3** *PA model*



**Fig. 4** *System-level fixed-point simulations*

*a* Spectrum before (left) and after (right) DPD

*b* Constellation before (left) and after (right) DPD

Table 1: Power consumption simulated with 65 nm CMOS process

|               | CORDIC | LUTs    | Interpolation | Averaged<br>total power                     |
|---------------|--------|---------|---------------|---------------------------------------------|
| Dynamic power | 1.3 mW | 50.1 μW | 412 μW        | 5.2 mW (timing and gain alignment excluded) |
| Leakage power | 134 μW | 3.5 mW  | 0.256 μW      |                                             |

Table 2: Performance summary and comparison

|                      |                                        | [2]      | [3]        | [4]                 | This work                       |
|----------------------|----------------------------------------|----------|------------|---------------------|---------------------------------|
| Methods              |                                        | 2D LUT   | Polynomial | Iterative<br>1D LUT | Mirror switching<br>plus 1D LUT |
| Accuracy             |                                        | 256      | 11th order | $64^2$              | $4096^2$<br>(full resolution)   |
| Memory (data points) |                                        | 256      | NA         | $2 \times 64$       | $2 \times 4096$                 |
| Training time        |                                        | 570.4 μs | NA         | 900 μs <sup>1</sup> | 83 μs <sup>2</sup>              |
| Analogue blocks      |                                        | Yes      | No         | No                  | No                              |
| Digital<br>blocks    | CORDIC                                 | 0        | NA         | 2                   | 1                               |
|                      | Iterative                              | No       | Yes        | Yes                 | No                              |
|                      | Real-time interpolation                | Yes      | No         | Yes                 | No                              |
|                      | Real-time<br>polynomial<br>calculation | No       | Yes        | No                  | No                              |

<sup>1</sup> No training signal required

<sup>2</sup> Loop delay time is excluded

*Conclusion:* A low-complexity, full-resolution mirror-switching DPD scheme is proposed for nonlinear polar-modulated PAs. The key principle is that the desired AM-AM predistortion curve and the PA distortion curve are mirrored about the ideal x = y line; exchanging the input and output of the LUT leads directly to the desired AM-AM predistortion curve. Tested with a highly nonlinear PA model, the scheme improves the EVM from 104.8 to 2% and the ACLR from 21.36 to 49.27 dB, with a power consumption of 5.2 mW in a 65 nm CMOS process.

© The Institution of Engineering and Technology 2012

*13 June 2012* doi: 10.1049/el.2012.2073

One or more of the Figures in this Letter are available in colour online.

W.-H. Yu, W.-F. Cheng, Y. Li, C.-F. Cheang, P.-I. Mak and R.P. Martins (*State-Key Laboratory of Analog and Mixed-Signal VLSI, University of Macau, Macao, China*)

E-mail: pimak@umac.mo

R.P. Martins: Also with the Instituto Superior Técnico (IST)/TU of Lisbon, Portugal

## References

- 1 Nagata, Y.: 'Linear amplification technique for digital mobile communications'. Proc. IEEE Vehicular Technology Conf., Kawasaki-City, Japan, 1989, Vol. 1, pp. 159–164
- 2 Chung, S.W., Holloway, J.W., and Dawson, J.L.: 'Energy-efficient digital predistortion with lookup table training using analog Cartesian feedback', *IEEE Trans. Microw. Theory Tech.*, 2008, **56**, (10), pp. 2248–2258
- 3 Hong, S., Woo, Y.-Y., Kim, J., Cha, J., Kim, I., Moon, J., Yi, J., and Kim, B.: 'Weighted polynomial digital predistortion for low memory effect Doherty power amplifier', *IEEE Trans. Microw. Theory Tech.*, 2007, **55**, (5), pp. 925–931
- 4 Presti, C.D., Kimball, D.F., and Asbeck, P.M.: 'Closed-loop digital predistortion system with fast real-time-adaptation applied to a handset WCDMA PA module', *IEEE Trans. Microw. Theory Tech.*, 2012, **60**, (3), pp. 604–618
- 5 Andraka, R.: 'A survey of CORDIC algorithms for FPGA based computers'. Proc. ACM/SIGDA Conf., New York, USA, 1998, p. 191