# A 10-Gb/s Low Jitter Single-Loop Clock and Data Recovery Circuit With Rotational Phase Frequency Detector

Fan-Ta Chen, Min-Sheng Kao, Yu-Hao Hsu, *Student Member, IEEE*, Jen-Ming Wu, *Member, IEEE*, Ching-Te Chiu, *Member, IEEE*, Shawn S. H. Hsu, *Member, IEEE*, and Mau-Chung Frank Chang, *Fellow, IEEE* 

Abstract—This paper presents a rotational phase frequency detector (RPFD) for a reference-less clock and data recovery (CDR) circuit. The proposed RPFD changes the bang-bang phase detector (BBPD) characteristic from bidirectional phase detection to unilateral phase detection in order to capture the clock frequency. The phase-and-frequency lock loop (PFLL) locks the clock frequency and the clock phase alternately. The single-loop CDR replaces the dual-loop CDR so as to eliminate the noise contribution from the frequency lock loop (FLL). The proposed design is fabricated in a TSMC mixed-signal 1P9M 90-nm standard CMOS process with an overall die size of 0.71 mm<sup>2</sup>. With 10-Gb/s input data of a  $2^{31} - 1$  PRBS, the CDR tracks a free-running clock over a capture range of 1.48 GHz and locks within an acquisition time of 20  $\mu$s. The measured peak-to-peak jitter is 5.0 ps in the recovered clock and 15.11 ps in the recovered data. The measured chip consumes 92 mW from a 1.0-V supply.

*Index Terms*—Bang-bang phase detector (BBPD), clock and data recovery (CDR), frequency detector (FD).

## I. INTRODUCTION

CLOCK AND DATA recovery (CDR) plays a significant role in modern wireline communication for 3R regeneration (reamplifying, reshaping, and retiming). At the receiving front end, the internal clock is synchronized with incoming data before data processing. A built-in CDR circuit is required for high-speed commercial I/O, such as the high-definition multimedia interface (HDMI), Intel Thunderbolt, and the universal serial bus (USB) [1]. Because of limitations of the transmission interface, high-speed data is usually transmitted alone without a reference clock, which makes clock synchronization difficult. Thus, a simple and efficient CDR that achieves reference-less clock synchronization is appealing.

Nowadays, modern CDR design mainly uses a dual-loop architecture of frequency lock loop (FLL) and phase lock loop (PLL) [2]. As shown in Fig. 1(a), FLL is responsible for frequency capture because the capture range of PLL is limited to

Manuscript received January 12, 2014; revised April 05, 2014; accepted April 29, 2014. Date of publication July 15, 2014; date of current version October 24, 2014. This work was supported by the National Science Council, Taiwan (under grants NSC-102-2221-E-007-144- and NSC-100-2221-E-007-089) and National Chip Implementation Center (CIC), Taiwan. This paper was recommended by Associate Editor N. Krishnapura.

The authors are with Institute of Communications Engineering, Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan 30013 (e-mail:jmwu@ee.nthu.edu.tw; fanta524cf@gmail.com; kaom0711@gmail.com; yhhsup@tsmc.com; ctchiu@cs.nthu.edu.tw; shhsu@ee.nthu.edu.tw; mfchangucla@gmail.com).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCSI.2014.2327291



Fig. 1. (a) Dual loops CDR with FLL and PLL. (b) Proposed single loop CDR with PFLL.

keep clock quality up. The frequency detector (FD) has been divided into two main categories. The one category with reference clock uses phase-locked loops (PLLs) to capture clock frequency before data comes [3]. The burst-mode CDR makes the frequency-locked clock aligned with input data in several bits [4]–[7]. The other category without reference clock directly compares input data and internal clock for frequency capture, such as the rotational FD (RFD) [8]–[10], the Pottbacker FD [11], [12], the training signal generator (TSG) [13], and the quadrature divider FD (QDFD) [14], [15]. It is noteworthy that the bang-bang phase and frequency detector (BBPFD) [16] and the modified transconductance  $G_m$ -stage [17] take advantage of the BBPD to capture clock frequency.

A dual-loop CDR provides flexibility because the designer can optimize the bandwidths of the FLL and the PLL independently. The bandwidth of the FLL is much lower than that of the PLL so as to reduce the noise contribution from the FD and preserve the clock quality of the voltage-controlled oscillator [18]. In practice, the FD must be turned off after frequency capture to prevent it from interfering with the sensitive control voltage  $V_{CTRL}$  and is turned on again when frequency lock is lost. Compared with the FLL, the PLL has a smaller frequency capture range of  $< \pm 5\%$  [16]. If the frequency capture range of the PLL can be enhanced without increasing the

1549-8328 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.



Fig. 2. PFLL operation and RPFD characteristic diagram. (a) Phase-tracking mode with bidirectional phase detection. (b) Frequency-tracking mode with unilateral phase detection.

bandwidth, a CDR can ensure a low-jitter clock by eliminating the noise contributed by the FLL [19].

In this paper, as shown in Fig. 1(b), a rotational phase frequency detector (RPFD) is proposed to track both phase and frequency. The conventional bang-bang phase detector (BBPD) is well known for detecting clock phase in high-speed data recovery [20], but its limited phase detection range is a drawback for capturing frequency. The proposed RPFD extends the phase detection range of the BBPD to achieve frequency detection. Building on our previous work [21], the proposed RPFD simplifies the phase roller (PR) to reduce the Data-to-SW response time and increases the phase swapping probability for frequency acquisition. A phase-and-frequency lock loop (PFLL) is therefore implemented to replace the dual-loop CDR with a single-loop CDR, in which the PFLL alternately tracks clock frequency and phase. As a result, the PFLL does not need the noise-contributing FLL to increase the frequency capture range, which brings substantial benefits to clock quality.

The remainder of this paper is organized as follows. Section II depicts the conventional BBPD to build the overall RPFD with characteristic analysis, circuit architecture, and waveform proof. The discussion between frequency acquisition simulation and tracking time analysis is shown in Section III. Experimental results and literature comparison are shown in Section IV, and finally they are followed by a conclusion in Section V.

## II. PROPOSED PFLL ARCHITECTURE

The proposed PFLL alternates between the frequency-tracking mode and the phase-tracking mode to achieve the function of the dual-loop CDR. We propose the RPFD with dual-mode phase and frequency detection to synchronize the internal clock CK with the incoming data  $D_{IN}$ . As shown in Fig. 2(a), the inner BBPD performs bidirectional phase detection of phase LEAD and phase LAG. When the clock is frequency locked, the PR turns off the crossbar switch (XBSW), which then passes the current from the V/I converter unchanged. Thus, the direction of the control current  $I_{CTRL}$  directly reflects phase LEAD and phase LAG as it accumulates in the loop filters (LFs). Finally, the proposed PFLL functions as a PLL to decrease the phase difference  $\Delta\phi$  between the rising edge of the clock and the data center for phase alignment.

When input data arrives, the frequency difference  $\Delta f$  between the clock frequency  $f_{CK}$  and the desired clock frequency  $f_D$  causes the phase difference to accumulate. As shown in Fig. 2(b), the RPFD exhibits unilateral phase detection when the clock is frequency unlocked: phase LEAD for a fast clock ( $f_{CK} > f_D$ ) and phase LAG for a slow clock ( $f_{CK} < f_D$ ). For clock synchronization, phase LEAD of the slow clock and phase LAG of the fast clock are swapped to the opposite phase detection in the phase swapping range  $[\pm(2k-1)\pi,\pm 2k\pi]$ , where k is a positive integer. In practice, the phase swapping PR alternately switches on the XBSW as SW generates a periodic pulse while the phase changes. The XBSW exchanges the current direction of  $I_{CTRL}$  to give the opposite phase detection of the BBPD. Thus, the bidirectional phase detection of phase LEAD and phase LAG is transformed into phase LEAD for a fast clock and phase LAG for a slow clock. Finally, the RPFD changes the clock frequency through  $V_{CTRL}$  as the alternating voltage  $V_{BBPD}$  is transformed into a direct current  $I_{CTRL}$  that accumulates steadily in the LFs. The dual-mode RPFD thus exhibits unilateral phase detection for frequency capture and bidirectional phase detection for phase alignment independently.

## A. Bang-Bang Phase Detector

The conventional BBPD is well-known for handling high speed data for phase detection. As shown in Fig. 3(a), the BBPD consists of time decision D-type flip-flops (DFF) and logic decision exclusive-or gates (XOR) [22]. To deal with ten gigabit data, DFF and XOR adopt the current-mode-logic (CML) architecture [6]. First of all, input data is recovered by the rising and the falling edge-triggered DFF in Q1 and Q2, respectively. Next, at the next rising edge of clock, the reference signal Q3 and the compare signal Q4 are simultaneously retimed. Finally, the XOR X1 and X2 compare Q4 with lagging Q3 and leading Q1 for  $UP = Q3 \oplus Q4$  and  $DN = Q1 \oplus Q4$ , respectively. We define the BBPD output  $V_{BBPD} = UP - DN$  indicating the phase detection result.
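The XOR decision stage described above can be sketched in a few lines of Python. This is only a behavioral illustration: the function name `bbpd` is hypothetical, the inputs are idealized logic levels (0 or 1) taken after retiming, and CML circuit effects are ignored.

```python
def bbpd(q1: int, q3: int, q4: int) -> int:
    """Return V_BBPD = UP - DN for one retimed sample (inputs are 0 or 1)."""
    up = q3 ^ q4  # UP = Q3 xor Q4
    dn = q1 ^ q4  # DN = Q1 xor Q4
    return up - dn


# No data transition: all samples agree, so neither output pulses.
print(bbpd(1, 1, 1))
# Q3 differs from Q4 while Q1 matches it: only UP pulses.
print(bbpd(1, 0, 1))
# Q1 differs from Q4 while Q3 matches it: only DN pulses.
print(bbpd(0, 1, 1))
```

The three calls cover the idle case and the two single-pulse cases; the sign of the result is the quantity the text calls $V_{BBPD}$.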

As the characteristic diagram shown in Fig. 2(a), the BBPD generates positive output  $(V_{BBPD} > 0)$  for phase LAG and



Fig. 3. RPFD. (a) BBPD. (b) V/I converter with XBSW. (c) Characteristic.

negative output  $(V_{BBPD} < 0)$  for phase LEAD in the frequency-locked range  $[-\pi, +\pi]$ . However, this holds only for a frequency-locked clock or a frequency-unlocked clock within the limited frequency capture range. While the accumulated phase difference stays in the frequency-locked range, a minor frequency difference can be easily corrected by the BBPD. Once the clock phase exceeds the frequency-locked range, the accumulated phase difference steadily increases for a slow clock and decreases for a fast clock. Because the long-term average of  $V_{BBPD}$  then approaches 0, the PLL fails to align the clock phase and remains in a metastable state.

The proposed RPFD leverages the conventional BBPD characteristic and changes the phase detection result for capturing frequency. When the clock is frequency unlocked, the phase difference increases for a slow clock ( $0 < \Delta \phi < +k\pi$ ) and decreases for a fast clock ( $-k\pi < \Delta \phi < 0$ ). As long as the phase detection of the BBPD keeps phase LEAD for a fast clock and phase LAG for a slow clock, the phase difference will converge to zero ( $\Delta \phi \rightarrow 0$ ). Because the opposite polarity of  $V_{BBPD}$  is swapped, the proposed RPFD shows unilateral phase detection, namely phase LEAD for a fast clock ( $V_{CTRL} < 0$ ) and phase LAG for a slow clock ( $V_{CTRL} > 0$ ). As a result, the frequency-locked range [ $-\pi$ ,  $+\pi$ ] of the BBPD is enhanced to [ $-k\pi$ ,  $+k\pi$ ] in the RPFD.
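The effect of swapping the polarity inside the phase swapping range can be illustrated with a small idealized model. The sketch below assumes, purely for illustration, that the BBPD polarity is periodic in $2\pi$ and follows the sign of $\sin\Delta\phi$ (positive for LAG on $(0, \pi)$); the function names are hypothetical, and exact multiples of $\pi$ are treated by a simple floor convention.

```python
import math


def bbpd_sign(dphi: float) -> int:
    """Idealized BBPD polarity: periodic in 2*pi, positive on (0, pi)."""
    s = math.sin(dphi)
    return (s > 0) - (s < 0)


def in_swap_range(dphi: float) -> bool:
    """True when |dphi| lies between an odd and the next even multiple of pi."""
    return int(abs(dphi) / math.pi) % 2 == 1


def rpfd_sign(dphi: float) -> int:
    """BBPD polarity with the swap applied inside the swapping range."""
    sign = bbpd_sign(dphi)
    return -sign if in_swap_range(dphi) else sign


for k in (0.5, 1.5, 2.5, 3.5):
    dphi = k * math.pi
    print(f"{k:>4} pi: BBPD {bbpd_sign(dphi):+d}, RPFD {rpfd_sign(dphi):+d}")
```

In this model the plain BBPD polarity alternates as the phase difference grows, whereas the swapped polarity keeps the sign of $\Delta\phi$ over the whole range, which is the unilateral behavior described above.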

## B. V/I Converter With XBSW

Because of output swing and propagation delay, swapping the BBPD outputs (UP and DN) directly is hard to implement for high-speed data transmission. A simpler approach is for the V/I converter to convert  $V_{BBPD}$  into the alternating control current  $I_{CTRL}$  (charge and discharge). As shown in Fig. 3(b), M1-M2 sense the voltage UP and mirror the charging current in M7-M8. At the same time, M3-M4 convert the voltage DN to the discharging current for balancing  $I_{CTRL}$ . The differential control current,  $I_{CTRL} = +i_{CTRL} - (-i_{CTRL})$ , of the V/I converter makes it possible to change the polarity of  $V_{BBPD}$  [23]. Fig. 3(c) displays the average differential current  $I_{CTRL}$  with maximum current  $I_P$  when  $\overline{V_{BBPD}}$  is in the linear region,  $|\Delta\phi| < \phi_m$ .

The XBSW has been developed between the V/I converter and the LFs. To exchange the current direction, the cross and the bar transmission gates are alternatively conducted by the on-off SW. In frequency-unlocked clock, the polarization of the control current  $I_{CTRL}$  is swapped in SW = 1 and restored in SW = 0. For the unilateral phase detection (LEAD or LAG), the XBSW is activated in phase LEAD of slow clock and phase LAG of fast clock so that the alternative current of  $I_{CTRL}$  is transformed into the direct current (discharge or charge). While the XBSW is continuously turned off in frequency-locked clock, the bar transmission gates are short circuited to easily conduct current for minimal impact on clock quality.

## C. Phase Roller

The phase roller (PR) determines the operation mode of the RPFD. When the PR is in the phase-tracking mode, the RPFD detects the phase bidirectionally. When the PR is in the frequency-tracking mode, the RPFD detects the phase unilaterally. As the phase difference changes, the inner BBPD alternatively determines the bidirectional phase detection of phase LEAD and phase LAG. The PR turns off the XBSW in the frequency-locked range  $[-\pi, +\pi]$ . Once the accumulated phase difference exceeds  $[-(2k-1)\pi, +(2k-1)\pi]$ , the PR turns on the XBSW until the phase difference passes through  $[-2k\pi, +2k\pi]$ . The RPFD detects unilateral phase detection while the bidirectional phase detection is swapped in the phase swapping range  $[\pm(2k-1)\pi,\pm 2k\pi]$ . Moreover, the phase swapping PR must distinguish phase change between the beginning phase difference of odd  $\pi$ ,  $\Delta \phi = \pm (2k-1)\pi$ , and the ending phase difference of even  $\pi$ ,  $\Delta \phi = \pm 2k\pi$ . We set the window area in the center of data bit ranging  $\left[-\pi/2, +\pi/2\right]$  to distinguish them.

To extract the window area, the PR uses a sequential delay chain to equally divide a bit time  $T_b$  at the 10 Gb/s data rate ( $T_b =$ 100 ps). As shown in Fig. 4(a), the input *n*th data bit (d[n]) is shifted by the fixed delay times of  $T_b/4$  and  $3T_b/4$ , which are generated by CML delay buffers with an external control bias voltage  $V_{BIAS}$  [24]. In Fig. 4(b), the delayed data D' and D'' are recovered at the falling edge of the clock, and then the XOR X3 extracts the out-of-phase (OP) signal, which is asserted when consecutive bits differ,  $OP[n] = d[n-1] \oplus d[n]$ . When the falling edge or the rising edge of the clock samples in the window area ranging  $[-\pi/2, +\pi/2]$ , the phase state is indicated as out-of-phase (OP = 1) or in-phase (OP = 0). Thus, we can set the phase swapping process to begin at odd  $\pi$  and to end at even  $\pi$ .

When the clock samples at the edge of the data, the phase detection changes between LEAD and LAG at phase differences of integer multiples of  $\pi$  ( $\Delta \phi = \pm k\pi$ ). To copy the phase detection result, the phase swapping PR takes advantage of the recovered data Q1 and Q2 from the first stage of the BBPD. In the DFF F9, the rising edge of Q1 samples the inverse of Q2 to compare the order of the rising and the falling clock edges. The lead detection signal LD reflects the phase detection result: LD goes high for phase LEAD and low for phase LAG. Because the DFF F9 is rising edge-triggered, LD only transits at a rising data transition ( $\overline{d[n-1]}d[n] = 1$ ).

Fig. 4(c) shows the XBSW operation, in which OP and LD together determine the on-off SW. In the DFFs F7 and F8, LD is responsible for sampling OP and then resetting the phase swapping process. To avoid interference between the sample



Fig. 4. (a) PR architecture and operation. (b) Out-of-phase extraction. (c) XBSW operation.

and the reset, the propagation delay of LD-to-CK is longer than that of LD-to-Reset with one more inverter N1 stage.

As shown in Fig. 2(b), when a fast clock decreases the phase difference to negative odd  $\pi$ , the phase detection changes from LEAD to LAG. The falling edge-triggered DFF F7 senses that the falling LD samples the out-of-phase state and turns on the XBSW until LD rises to reset the DFF at negative even  $\pi$ . In contrast, LD goes high in the rising edge-triggered DFF F8 when a slow clock changes the phase detection from LAG to LEAD at positive odd  $\pi$ . The XBSW is turned on until LD falls to reset the DFF at positive even  $\pi$ . Finally, the OR gate in the next stage combines the time-division outputs Q7 and Q8 into the on-off SW. To sum up, the phase swapping PR is activated by a transition of LD in the out-of-phase state (OP = 1) and remains active until LD transits again in the in-phase state (OP = 0). The phase swapping process still functions properly with the centered window area as long as the input data jitter is no more than  $T_b/2$ .

## D. RPFD Waveform

To further analyze frequency acquisition, we define that the frequency difference  $\Delta f$  is the difference between the internal clock frequency  $f_{CK}$  and the desired-clock frequency  $f_D$ , which synchronizes with input data rate. Let the frequency deviation h be the ratio of the frequency difference to the desired-clock frequency. Thus, the h is defined as

$$h = \frac{f_{CK} - f_D}{f_D} = \frac{\Delta f}{f_D}, \quad 0 \le |h| \le \frac{1}{2} \tag{1}$$

where |h| is no more than 1/2 to ensure the phase difference increasing in slow clock and decreasing in fast clock. The inverse frequency deviation,  $h^{-1}$ , represents the number of passing-through data bits while the phase difference accumulates  $2\pi$ . As shown in Fig. 5, the  $h^{-1}$  exhibits 6 bits and -8 bits in the fast and the slow clock, respectively. The following paragraph addresses the RPFD simulation result and also summarizes the PR operation in the state diagram.
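As a numerical illustration of (1), the sketch below evaluates $h$ and $h^{-1}$ for an 11.4 GHz clock against a 10 GHz desired clock, the starting point of the acquisition simulation reported later in the paper. The function names are hypothetical helpers, not part of the design.

```python
def frequency_deviation(f_ck: float, f_d: float) -> float:
    """Frequency deviation h = (f_CK - f_D) / f_D, following (1)."""
    h = (f_ck - f_d) / f_d
    if abs(h) > 0.5:
        raise ValueError("|h| must not exceed 1/2")
    return h


def bits_per_two_pi(h: float) -> float:
    """Inverse deviation: data bits that pass while 2*pi of phase accumulates."""
    return 1.0 / h


h = frequency_deviation(11.4e9, 10e9)
print(f"h = {h:.3f}, 1/h = {bits_per_two_pi(h):.2f} bits")
```

A positive result corresponds to a fast clock and a negative result to a slow clock, matching the sign convention of $h^{-1}$ in the text.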

1) Frequency-Tracking Mode: The waveform of the conventional BBPD operation is shown in Fig. 5(a) and 5(b). As the data sequence enters the RPFD, the input data are recovered in the rising-triggered Q1 and the falling-triggered Q2. The order of the sequential data Q1 and Q2 implies the clock phase: Q1 leads Q2 for phase LEAD and Q1 lags Q2 for phase LAG. After the falling-triggered Q2, the rising-triggered Q4 is compared with Q1 and Q3 to decide phase LEAD and phase LAG, respectively. At a data transition  $(d[n-1] \oplus d[n] = 1)$ , the XOR gates determine the outputs UP and DN.

Fig. 5(c) and 5(d) show the PR operation in the timing diagram. To swap the bidirectional phase detection, the phase swapping PR is designed to detect the phase change with LD and the phase state with OP. The first phase change occurs in d[4], which follows the over-sampled and the under-sampled data bit. While the rising clock edge samples at the edge of the data, the falling clock edge samples in the window area to identify the out-of-phase state (OP = 1) at a data transition, such as d[3], d[4] = 01 or 10. At the same time, LD transits as the order of Q1 and Q2 changes at a rising data transition (i.e., d[3], d[4] = 01). The transition of LD activates the phase swapping process until LD transits again in the in-phase state (OP = 0). Therefore, the phase swapping PR switches on the XBSW with the on-off SW when the fast clock is in phase LAG, d[4:6], and the slow clock is in phase LEAD, d[4:7]. The unilateral phase detection is thus realized for the frequency-unlocked clock.

2) Phase-Tracking Mode: As the clock frequency approaches the data rate, it becomes difficult for the falling edge of the clock to reach the window area of the input data. While the phase difference is locked in the frequency-locked range, the rising edge of the clock occupies the window area instead. Furthermore, LD transits in the in-phase state when OP is logic low, so the DFFs F7 and F8 turn off SW to terminate the phase swapping process. Thus,  $V_{BBPD}$  directly controls  $V_{CTRL}$  to change the clock phase. With a phase-locked clock, the RPFD behaves as the bidirectional phase detection of the BBPD to align the clock phase as  $\Delta \phi \rightarrow 0$ .

3) State Diagram: The RPFD operation is illustrated as the state diagram shown in Fig. 6. The phase state begins with in-phase when SW is logic low. As the phase difference accumulates in frequency-tracking mode, the falling edge of clock triggers in the data center with the out-of-phase (OP = 1). In step A, the transition of  $LD (LD[n-1] \oplus LD[n] = 1)$  reflects the phase change at odd  $\pi$  and triggers the phase swapping when SW goes high. Next, the rising edge of clock returns to the data center with the in-phase (OP = 0). In step B, LD transits at even  $\pi$  to stop the phase swapping process when SW goes low. The process is turned off until LD transits again in step A. In phase-tracking mode, clock is frequency locked in



Fig. 5. Timing diagram in frequency-tracking mode. (a) BBPD with fast clock. (b) BBPD with slow clock. (c) PR with fast clock. (d) PR with slow clock.



Fig. 6. RPFD state diagram.

step C while the rising edge of clock occupies the data center with the in-phase. The phase swapping process between step A and step B is terminated while clock phase is locked at  $\Delta \phi \rightarrow 0$ .
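The state diagram can be summarized as a small behavioral model: on every transition of LD, SW takes the current value of OP, and otherwise SW holds. The sketch below is an abstraction of that rule only; the function names and the example sequences are hypothetical, and the separate edge polarities of F7 and F8 are folded into a single update.

```python
def pr_next_sw(sw: int, ld_prev: int, ld: int, op: int) -> int:
    """Next SW state: an LD transition loads OP into SW, otherwise SW holds."""
    if ld != ld_prev:
        return op  # step A when OP = 1 (start), step B when OP = 0 (stop)
    return sw


def run_pr(ld_seq, op_seq, ld_init=0):
    """Apply the update rule to per-bit LD and OP sequences; return SW per bit."""
    sw, ld_prev, out = 0, ld_init, []
    for ld, op in zip(ld_seq, op_seq):
        sw = pr_next_sw(sw, ld_prev, ld, op)
        ld_prev = ld
        out.append(sw)
    return out


# LD rises while out-of-phase (swap starts), then falls while in-phase (swap stops).
print(run_pr([0, 1, 1, 1, 0, 0], [0, 1, 1, 0, 0, 0]))
```

When LD only transits with OP = 0, as in step C, SW never rises and the loop stays in bidirectional detection.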

## E. Voltage Controlled Oscillator

Fig. 7(a) shows the cross-coupled pair LC-tank voltage controlled oscillator (VCO) and the inductive load CML clock buffer (BUF). The control voltage  $V_{CTRL}$  is operated by the RPFD to correct clock phase and frequency for phase alignment [12]. In practice, the differential voltage of the  $V_{CTRL}$  linearly tunes the capacitance of the reverse-biased varactors V1-V4 to change clock frequency. Fig. 7(b) exhibits the frequency tuning range from 9 GHz to 12 GHz with three process variations and reveals the frequency gain  $K_V = 4 \text{ GHz/V}$  in the linear region. The cross-coupled CMOS M1-M4 ensures the activating gain for clock oscillating and complements the parasitic loss in the LC-tank of the VCO. The inductive-load L2-L3 of the BUF amplifies periodic clock CK. The output CK drives the time decision DFF and the external measuring equipment for the recovered clock and data.

## III. ACQUISITION TIME ANALYSIS

## A. PFLL Model

1) Frequency-Tracking Mode: The negative-feedback PFLL provides a dynamic system to diminish the phase difference  $\Delta\phi$ . Fig. 8(a) shows the frequency-tracking PFLL. Because of its limited phase detection capability, the BBPD-based RPFD with phase detection gain  $K_{PD}$  cannot fully follow high-speed frequency changes. The BBPD therefore introduces an internal transfer function  $1/(1 + s/\omega_{LPF})$ , which is equivalent to a low-pass filter (LPF) with a -3-dB bandwidth of  $\omega_{LPF}$  [2]. The frequency-tracking transfer function  $(\omega_{out}/\omega_{in})(s)$  is equivalent to the ratio of the excess phases of the output clock and the input data,  $(\Phi_{out}/\Phi_{in})(s)$ , because frequency variation is the rate of phase change. As a result, the open-loop transfer function can be expressed as

$$\frac{\omega_{out}}{\omega_{in}}(s) \mid_{open} = \frac{\Phi_{out}}{\Phi_{in}}(s) \mid_{open} = \frac{K_{PD}I_P}{1 + \frac{s}{\omega_{LPF}}} \cdot \frac{1 + R_P C_P s}{C_P s} \cdot \frac{K_V}{s}, \tag{2}$$

where  $\Phi_{out}(s)$  and  $\Phi_{in}(s)$  are the excess phases of the output clock and the input data, respectively. In the real LFs, the parallel capacitor  $C_P$  accumulates the control current with the transfer function  $1/C_Ps$ . The open-loop transfer function has one pole at  $s = -\omega_{LPF}$  and two poles at the origin, so we have a type-II PLL. The PFLL is stabilized by the LPF even if the parallel resistor  $R_P$  is chosen to be 0.

2) Phase-Tracking Mode: In this mode, the BBPD fully reflects phase change in the frequency locked range  $[-\pi, +\pi]$  and further locks clock phase at  $\Delta \phi \rightarrow 0$ . Without the LPF, the PFLL is an unstable type-II PLL for only two poles at origin. Thus, we introduce a zero to stabilize the system by connecting



Fig. 7. (a) VCO and BUF. (b) Simulated tuning clock frequency.



Fig. 8. PFLL model. (a) Frequency-tracking mode. (b) Phase-tracking mode. (c) Phase-tracking mode with VCO jitter.

a parallel resistor  $R_P$  with  $C_P$ . Fig. 8(b) displays the phase-tracking PFLL. We have the open-loop transfer function as:

$$\frac{\Phi_{out}}{\Phi_{in}}(s)\mid_{open} = K_{PD}I_P \cdot \frac{1 + R_P C_P s}{C_P s} \cdot \frac{K_V}{s}. \tag{3}$$

The PFLL is a stable type-II PLL because of the zero at  $s = -1/(R_P C_P)$ .

To decide  $R_P$  and  $C_P$ , the closed-loop transfer function is given by

$$\frac{\Phi_{out}}{\Phi_{in}}(s) \mid_{closed} = \frac{\frac{K_{PD}I_PK_V}{C_P}(1+R_PC_Ps)}{s^2 + K_{PD}I_PK_VR_Ps + \frac{K_{PD}I_PK_V}{C_P}}. \tag{4}$$
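For readers who want the intermediate step, the closed-loop form follows from the open-loop transfer function of the phase-tracking mode under standard unity negative feedback. The shorthand $K = K_{PD} I_P K_V$ and the symbol $G(s)$ are introduced here only for brevity and are not the paper's notation:

$$\begin{aligned}
G(s) &= \frac{K\,(1 + R_P C_P s)}{C_P s^2}, \\
\frac{G(s)}{1 + G(s)} &= \frac{K\,(1 + R_P C_P s)}{C_P s^2 + K R_P C_P s + K}
 = \frac{\frac{K}{C_P}(1 + R_P C_P s)}{s^2 + K R_P s + \frac{K}{C_P}}.
\end{aligned}$$

Comparing the denominator with the standard second-order form $s^2 + 2\zeta\omega_n s + \omega_n^2$ gives $\omega_n = \sqrt{K/C_P}$ and $2\zeta\omega_n = K R_P$, so that $\zeta = \tfrac{R_P}{2}\sqrt{K C_P}$.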

A large capacitor  $C_P = 1$  nF is chosen to achieve a lower loop bandwidth, which simplifies the closed-loop transfer function to  $(\Phi_{out}/\Phi_{in})(s)|_{closed} \approx 2\zeta \omega_n/(s+2\zeta \omega_n)$ [6]. Thus, the -3-dB bandwidth of the loop is  $\omega_{-3 \text{ dB}} \approx 2\zeta \omega_n = K_{PD} I_P K_V R_P$ . In SONET OC-192, the jitter tolerance mask sets the fourth jitter tolerance corner frequency at  $f_4 = 4$  MHz, so we choose  $\omega_{-3 \text{ dB}} > f_4$ . Assume  $\phi_m = 1$  so that  $K_{PD} = 1/\phi_m$ , with the maximum current  $I_P = 50 \ \mu A$ . Therefore, we have the parallel resistor  $R_P = 200 \ \Omega$  at  $\omega_{-3 \ dB} = 6.36 \ MHz$ . Fig. 8(c) shows the phase-tracking PFLL with the VCO jitter,  $\Phi_{VCO}$ , included. Because the VCO jitter is high-frequency, a smaller  $\omega_{-3 \text{ dB}}$  is avoided, since  $(\Phi_{out}/\Phi_{VCO})(s)|_{closed} \approx s/(s+2\zeta\omega_n)$  acts as a high-pass filter.
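The quoted loop bandwidth can be checked numerically. The sketch below rests on an assumption about units that the text does not spell out: the value of $K_V$ (4 GHz/V) is inserted numerically into the product $K_{PD} I_P K_V R_P$, and the result is divided by $2\pi$ to express it in hertz. Under that reading the product reproduces the quoted 6.36 MHz. The natural frequency and damping factor are obtained by matching the closed-loop denominator to $s^2 + 2\zeta\omega_n s + \omega_n^2$.

```python
import math

# Values quoted in the text; the unit handling of K_V is an assumption.
K_PD = 1.0    # 1 / phi_m with phi_m = 1
I_P = 50e-6   # maximum control current, A
K_V = 4e9     # VCO gain, used numerically as given
R_P = 200.0   # loop filter resistor, ohm
C_P = 1e-9    # loop filter capacitor, F

K = K_PD * I_P * K_V
omega_3db = K * R_P                # 2 * zeta * omega_n
f_3db = omega_3db / (2 * math.pi)  # expressed in Hz
omega_n = math.sqrt(K / C_P)       # natural frequency
zeta = omega_3db / (2 * omega_n)   # damping factor

print(f"f_3dB ~ {f_3db / 1e6:.2f} MHz, zeta ~ {zeta:.3f}")
```

With these numbers the bandwidth sits above the 4 MHz corner frequency, and the damping factor is greater than one, which is the regime in which the first-order approximation of the closed-loop response is usually applied.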

## B. Frequency Acquisition Time

In the frequency-tracking mode,  $\overline{V_{BBPD}}$  depends on the random input data. Only in the data transition does the



Fig. 9. Phase swapping probability and frequency deviation analysis (horizontal axis: frequency difference  $\Delta f$  in GHz).

BBPD determine the bidirectional phase detection with the phase swapping. Since  $h^{-1}$  denotes the number of bits that the data stream passes through to accumulate  $2\pi$  of phase difference, the unilateral phase detection probability can be expressed as

$$P_{RPFD}(h) = P_{BBPD}(h)P_{SW}(h), \tag{5}$$


where  $P_{BBPD}(h)$  is the phase detection probability and  $P_{SW}(h)$  is the phase swapping probability.  $P_{RPFD}(h)$  is a function of h, and |h| converges to 0 over time during the frequency acquisition.

The phase detecting BBPD reflects phase LEAD and phase LAG by  $UP = Q3 \oplus Q4$  and  $DN = Q1 \oplus Q4$ , respectively. Thus, only after a data transition does the BBPD output a pulse, with a data transition probability  $(P_{DT})$ . However, as the timing diagram in Fig. 5 shows, the rising edge of the clock over-samples and under-samples d[3], which causes repeated or missing data recovery in the outputs Q1 and Q3. As a result, the outputs UP and DN generate a one-bit net-zero output in every  $h^{-1}$  bits. The phase detection probability  $P_{BBPD}(h)$  can be written as

$$P_{BBPD}(h) = \frac{P_{DT}|h|^{-1} - 1}{|h|^{-1}} = P_{DT} - |h|. \tag{6}$$

Assume the input data sequence comes from an *m*-bit linear feedback shift register (LFSR). It generates a pseudo-random binary sequence (PRBS) of length  $2^{m} - 1$ , which includes  $2^{(m-1)}$  data transitions. Thus,  $P_{DT}$  is given by  $2^{(m-1)}/(2^m - 1)$ , which is close to 1/2 for large *m*.
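The transition count can be verified on a short sequence. The sketch below uses the common PRBS7 polynomial $x^7 + x^6 + 1$ as a stand-in for the much longer $2^{31} - 1$ pattern used in the measurements; the choice of $m = 7$, the seed, and the function names are assumptions made only to keep the example small. Transitions are counted cyclically over one period.

```python
def prbs7(seed: int = 0x7F):
    """One period (127 bits) of the PRBS7 sequence x^7 + x^6 + 1."""
    state = seed & 0x7F  # any nonzero 7-bit seed works
    bits = []
    for _ in range(127):
        new = ((state >> 6) ^ (state >> 5)) & 1  # taps at stages 7 and 6
        state = ((state << 1) | new) & 0x7F
        bits.append(new)
    return bits


def transition_probability(bits):
    """Count cyclic bit transitions and return (count, count / length)."""
    n = len(bits)
    count = sum(bits[i] != bits[(i + 1) % n] for i in range(n))
    return count, count / n


count, p_dt = transition_probability(prbs7())
print(count, round(p_dt, 4))
```

For $m = 7$ the count is $2^{6} = 64$ transitions in 127 bits, so $P_{DT}$ is slightly above 1/2 and approaches 1/2 as $m$ grows.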

The PR determines the phase swapping process with the on-off SW for frequency detection. As mentioned in Fig. 4(c), the delay data D' and D'' decides the window area ranging  $\left[-\pi/2, +\pi/2\right]$  and divides clock phase between the out-of-phase and the in-phase. The phase swapping process must begin in the phase difference of  $\pi/2$  after odd  $\pi$  and ends in the same area after even  $\pi$ . In other words, the accumulated




Fig. 10. Simulated frequency acquisition from 11.4 GHz to 10 GHz.

phase difference of  $\pi/2$  consumes a quarter of the  $h^{-1}$  bits to switch the XBSW. While the BBPD determines phase detection with  $P_{DT}$ , the transition of LD only happens at a rising data transition, with a conditional probability of 1/2. Thus, the phase swapping probability can be given by

$$P_{SW}(h) = \left[1 - \left(\frac{1}{2}\right)^{\frac{1}{4|h|}}\right]^2. \tag{7}$$

As shown in Fig. 9, the phase swapping probability  $P_{SW}(h)$  approaches 1 while the frequency deviation drops exponentially. Even if h increases to 0.14 at  $\Delta f = 1.4$  GHz,  $P_{SW}(h)$  still holds 1/2. Moreover,  $P_{SW}(h)$  shows the clock frequency is captured in the range of  $|h| \le 1/4$ , which includes the frequency tuning range of the VCO, to ensure frequency acquisition in the PFLL. In our previous work [21], the window area decreases to a range of  $[-\pi/3, +\pi/3]$  so that the phase swapping probability is reduced to  $P_{SW}(h) = [1 - (1/2)^{1/6|h|}]^2$ .
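The probabilities in (5) to (7) are easy to evaluate numerically. The sketch below is a direct transcription of those formulas; the function names, the `window_div` parameter (4 for the $[-\pi/2, +\pi/2]$ window, 6 for the narrower window of the previous work), and the default $P_{DT} = 0.5$ are illustrative choices rather than anything specified by the design.

```python
def p_sw(h: float, window_div: int = 4) -> float:
    """Phase swapping probability [1 - (1/2)^(1/(window_div*|h|))]^2."""
    h = abs(h)
    if h == 0.0:
        return 1.0  # limit as |h| -> 0
    return (1.0 - 0.5 ** (1.0 / (window_div * h))) ** 2


def p_bbpd(h: float, p_dt: float = 0.5) -> float:
    """Phase detection probability P_DT - |h|."""
    return p_dt - abs(h)


def p_rpfd(h: float, p_dt: float = 0.5) -> float:
    """Unilateral phase detection probability P_BBPD * P_SW."""
    return p_bbpd(h, p_dt) * p_sw(h)


for h in (0.02, 0.05, 0.10, 0.14, 0.25):
    print(f"h={h:.2f}  P_SW={p_sw(h):.3f}  P_SW(prev)={p_sw(h, 6):.3f}  "
          f"P_RPFD={p_rpfd(h):.3f}")
```

At $h = 0.14$ the swapping probability evaluates to about 0.50, in line with the value quoted above, while the narrower window gives a visibly smaller probability at the same deviation.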

In the PFLL, the RPFD generates a unilateral phase detection output to accumulate the direct current  $I_{CTRL}$  in the LFs. Assume the frequency-unlocked clock with the frequency difference  $\Delta f$  is captured into the frequency captured range  $\Delta f_C$ , where the PR stops the phase swapping process. To charge the large capacitor  $C_P$ , the V/I converter takes the acquisition time  $t_a$ , which is expressed as

$$\frac{C_P(\Delta f - \Delta f_C)}{K_V} = I_P \int_0^{t_a} P_{RPFD}(h) dt \cong I_P \overline{P_{RPFD}} t_a \quad (8)$$

where  $\overline{P_{RPFD}}$  is the average of the unilateral frequency detection probability. Note that the  $P_{RPFD}(h)$  increases monotonically to 0.5 over the convergence as |h| decreases to 0. Thus, the theoretical frequency acquisition time is given by

$$t_a \cong \frac{C_P(\Delta f - \Delta f_C)}{K_V I_P \overline{P_{RPFD}}}. \tag{9}$$

Fig. 10 shows the frequency acquisition simulation with the HSPICE tool. The single-loop PFLL synchronizes the fast clock frequency of 11.4 GHz with the 10 Gb/s data of a PRBS  $2^{31} - 1$ . In the RPFD, the BBPD alternatively pulses in the output UP and DN to reflect the phase LAG and LEAD, respectively. To



Fig. 11. Die micrograph of the 0.71  $\rm mm^2$  chip area.

capture the fast clock, the on-off SW switches with the logic state of UP: SW goes high in phase LAG and goes low in phase LEAD. The rearranged phase detection of phase LEAD makes the differential control voltage  $V_{CTRL}$  continuously drop to zero voltage. After 15.3  $\mu$s, the frequency difference  $\Delta f$  comes into the frequency captured range  $\Delta f_C$  of 250 MHz. The dual-mode RPFD returns to bidirectional phase detection for phase alignment while the XBSW is turned off, SW = 0. According to the parameters in Fig. 10, the theoretical analysis shows that the estimated  $t_a$  is 14  $\mu s$ , which matches the simulation result.
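The acquisition-time estimate in (9) can be evaluated with a short script. The parameter set used for the 14 $\mu$s estimate is given in Fig. 10 and is not reproduced in the text, so the values below are assumptions gathered from elsewhere in the paper ($C_P = 1$ nF, $K_V = 4$ GHz/V, $I_P = 50\ \mu$A, $\Delta f = 1.4$ GHz, $\Delta f_C = 250$ MHz). The average probability $\overline{P_{RPFD}}$ is left as an input because its value is not stated; 0.5 is its upper limit.

```python
def acquisition_time(c_p, delta_f, delta_f_c, k_v, i_p, p_avg):
    """Frequency acquisition time t_a from (9), in seconds."""
    return c_p * (delta_f - delta_f_c) / (k_v * i_p * p_avg)


# Assumed parameters (see lead-in); p_avg = 0.5 is the upper limit of P_RPFD.
t_min = acquisition_time(c_p=1e-9, delta_f=1.4e9, delta_f_c=0.25e9,
                         k_v=4e9, i_p=50e-6, p_avg=0.5)
print(f"lower bound on t_a: {t_min * 1e6:.1f} us")
```

Because $\overline{P_{RPFD}}$ stays below 0.5 during acquisition, the actual acquisition time under these assumptions is longer than this lower bound.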

## IV. MEASUREMENT RESULT

The CDR circuit has been fabricated in a TSMC 1P9M 90 nm CMOS process with an overall chip size of 0.71 mm<sup>2</sup>. As shown in Fig. 11, the implemented chip includes the LC-tank VCO, the inductive-load BUF, the time-decision DFF, and the proposed RPFD. The LFs adopt poly-silicon resistors  $R_P = 200 \ \Omega$  and off-chip capacitances  $C_P = 1 \ nF$ . The overall chip consumes 92 mW at 1.0-V supply voltage.

The measurement environment is shown in Fig. 12, followed by the measured results in Figs. 13 to 16. The input PRBS data is provided by an Anritsu MT1810A, and the recovered clock and data are measured with an Agilent 86112A. Because of the measuring speed limitation (<7 GHz), the recovered clock frequency is divided by two with a prescaler (Centellax UXD20P) and then measured by a signal source analyzer (Agilent E5052B) to obtain the frequency acquisition time.

Fig. 12. Measurement environment.

Fig. 13. Measured clock frequency from 11.36 GHz to 9.88 GHz.

Fig. 14. Measured clock frequency from 9.28 GHz to 12.46 GHz.

Fig. 13 and Fig. 14 show the frequency acquisition with a $2^{31} - 1$ PRBS in the practical CDR application. The recovered fast clock with $\Delta f = 1.48$ GHz moves downward to 9.88 GHz with a frequency acquisition time of 20 $\mu$s. When the desired clock frequency $f_D$ deviates from the chosen 10 GHz, the upward-moving slow clock takes 48 $\mu$s to go from 9.28 GHz to 12.46 GHz. The upward acquisition time is longer than expected because the phase swapping probability $P_{SW}(h)$ is also decreased by the shifted window area.
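The two acquisition measurements can be put on a common footing by computing an average frequency slew. The short sketch below takes the endpoints and times quoted above at face value and assumes the quoted time covers the whole quoted frequency span; the helper name is hypothetical, and the result is only an average that says nothing about how the slew varies during acquisition.

```python
def average_slew_mhz_per_us(f_start_ghz, f_end_ghz, t_acq_us):
    """Average frequency slew in MHz/us over an acquisition sweep."""
    span_mhz = abs(f_end_ghz - f_start_ghz) * 1e3  # GHz -> MHz
    return span_mhz / t_acq_us


# Endpoints and times as quoted in the text for Fig. 13 and Fig. 14.
downward = average_slew_mhz_per_us(11.36, 9.88, 20.0)
upward = average_slew_mhz_per_us(9.28, 12.46, 48.0)

print(f"downward: {downward:.2f} MHz/us, upward: {upward:.2f} MHz/us")
```

Under these assumptions the upward sweep has the lower average slew, which is consistent with the reduced phase swapping probability noted above.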

With 10 Gb/s $2^{31} - 1$ PRBS input data, Fig. 15 shows that the time-domain clock jitter is 5.0 ps peak-to-peak (P-P) and 558 fs root mean square (RMS). Fig. 16 shows the recovered data with 15.11 ps P-P jitter and 2.28 ps RMS jitter under $2^{11} - 1$ PRBS input data. The implemented CDR is measured with a signal quality analyzer (Anritsu MP1800A) for a 10 Gb/s jitter tolerance measurement. In Fig. 17, the high-frequency jitter tolerance is 0.22 unit interval peak-to-peak (UI<sub>PP</sub>) above the fourth corner frequency, $f_4 = 4$ MHz. Thus, the proposed CDR meets the SONET OC-192 specification.

Fig. 15. Measured history diagram of 10 GHz recovered clock.

Fig. 16. Measured eye-diagram of 10 Gb/s recovered data.

Fig. 17. Measured jitter tolerance with SONET OC-192 Mask.

The overall performance of this work and of modern dual-loop CDRs is summarized in Table I. To compare the fabricated chip fairly with the other CDRs, the peak-to-peak jitter of the recovered clock is normalized by dividing it by one bit time (i.e., $T_b = 100 \text{ ps}$ for 10 Gb/s). With this normalization, the proposed PFLL exhibits a P-P clock jitter of 0.05 unit interval (UI).
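The normalization is a single division, sketched below for the entries of Table I. The helper names and the dictionary layout are hypothetical. The jitter and data-rate values are copied from the table, and the bit time is recomputed from the data rate, so small differences from the rounded bit times and UI values printed in the table are to be expected.

```python
def bit_time_ps(data_rate_gbps):
    """Bit time T_b in picoseconds for a data rate given in Gb/s."""
    return 1000.0 / data_rate_gbps


def pp_jitter_ui(pp_jitter_ps, data_rate_gbps):
    """Peak-to-peak clock jitter normalized to one bit time (UI)."""
    return pp_jitter_ps / bit_time_ps(data_rate_gbps)


# (P-P clock jitter in ps, data rate in Gb/s), as listed in Table I.
TABLE_I = {
    "This Work": (5.0, 10.0),
    "[5]": (21.8, 10.0),
    "[6]": (10.0, 20.0),
    "[12]": (4.22, 20.0),
    "[15]": (19.3, 16.0),
    "[16]": (30.2, 2.0),
    "[19]": (13.2, 2.7),
}

for name, (jitter_ps, rate_gbps) in TABLE_I.items():
    ui = pp_jitter_ui(jitter_ps, rate_gbps)
    print(f"{name:>9}: T_b = {bit_time_ps(rate_gbps):6.1f} ps, {ui:.3f} UI")
```

Normalizing to UI removes the advantage that a higher data rate would otherwise have in an absolute (picosecond) comparison.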

# V. CONCLUSION

This paper proposes the dual-mode RPFD for both phase and frequency detection by changing the conventional BBPD characteristic. The bidirectional phase detection of phase LEAD and phase LAG is changed to unilateral phase detection (LEAD or LAG) to increase the frequency capture range. During operation, the phase swapping PR turns on the XBSW to swap the differential control currents, $+i_{CTRL}$ and $-i_{CTRL}$, from the V/I converter. Therefore, the single-loop PFLL alternately tracks clock frequency and phase. The measured CDR captures 10 Gb/s $2^{31} - 1$ PRBS input data over a range of 1.48 GHz and exhibits a clock peak-to-peak jitter of 5.0 ps, or 0.05 UI. The proposed CDR removes the noise contribution of the FLL in the dual-loop CDR to improve clock quality.

|                    | This Work           | TCAS-I'10             | JSSC'08             | JSSC'09              | TCAS-II'11             | TCAS-I'08              | TCAS-I'13             |
|--------------------|---------------------|-----------------------|---------------------|----------------------|------------------------|------------------------|-----------------------|
|                    | FT. Chen            | [5] CF. Liang         | [6] J. Lee          | [12] J. Lee          | [15] CL. Hsieh         | [16] SH. Lin           | [19] J. Song          |
| Data Rate          | 10Gb/s              | 10Gb/s                | 20Gb/s              | 20Gb/s               | 16Gb/s                 | 2Gb/s                  | 2.7Gb/s               |
| Phase Detector     | Binary              | Burst                 | Burst               | Linear               | Binary                 | Binary                 | Linear                |
| Frequency Detector | RPFD                | PFD                   | PFD                 | Pottbacker           | QDFD                   | BBPFD                  | WPFD                  |
| Reference Clock    | No                  | Yes                   | Yes                 | No                   | No                     | No                     | No                    |
| Acquisition Time   | 20µs                | 500ps                 | 50ps                | _                    | 1ms                    | 320µs                  | 0.96µs                |
| Bit Time           | 100ps               | 100ps                 | 50ps                | 50ps                 | 62.5ps                 | 500ps                  | 370ps                 |
| P-P CK Jitter      | 5.0ps               | 21.8ps                | 10ps                | 4.22ps               | 19.3ps                 | 30.2ps                 | 13.2ps                |
| P-P CK Jitter (UI) | 0.05UI              | 0.21UI                | 0.2UI               | 0.084UI              | 0.31UI                 | 0.06UI                 | 0.035UI               |
| RMS CK Jitter      | 558fs               | 3.4ps                 | 1.26ps              | 480fs                | 2.84ps                 | 4.02ps                 | 1.57ps                |
| Power Supply       | 1.0V                | 1.8V                  | 1.5V                | 1.5V                 | 1.5V                   | 1.8V                   | 1.2V                  |
| Power Consumption  | 92mW                | 56mW★                 | 102mW               | 154mW*               | 160mW                  | 56mW*                  | 23mW                  |
| Die Area           | 0.71mm <sup>2</sup> | 0.25mm <sup>2</sup> * | 0.96mm <sup>2</sup> | 0.85 mm <sup>2</sup> | 0.134mm <sup>2</sup> * | 0.133mm <sup>2</sup> * | 0.07mm <sup>2</sup> * |
| Technology         | 90nm<br>CMOS        | 180nm<br>CMOS         | 90nm<br>CMOS        | 90nm<br>CMOS         | 130nm<br>CMOS          | 180nm<br>CMOS          | 130nm<br>CMOS         |

TABLE I MODERN CDR PERFORMANCE COMPARISON

\* w/o I/O Buffer \* Core Area

#### ACKNOWLEDGMENT

The authors would like to thank CIC and TSMC for fabrication support. The authors would also like to thank Prof. Wei-Zen Chen of National Chiao Tung University for measurement help.

#### REFERENCES

- [1] M.-S. Lin and C.-C. Tsai *et al.*, "A 5 Gb/s low-power PCI express/USB3.0 ready PHY in 40 nm CMOS technology with high-jitter immunity," in *Proc. IEEE Asian Solid-State Circuits Conf.*, Nov. 2009, pp. 177–180.
- [2] B. Razavi, *Design of Integrated Circuits for Optical Communication Systems*. New York: McGraw-Hill, 2003.
- [3] J.-K. Woo, H. Lee, W.-Y. Shin, H. Song, D.-K. Jeong, and S. Kim, "A fast-locking CDR circuit with an autonomously reconfigurable charge pump and loop filter," in *Proc. IEEE Asian Solid-State Circuits Conf.*, Nov. 2006, pp. 411–414.
- [4] N. Suzuki, K. Nakura, S. Kozaki, H. Tagami, M. Nogami, and J. Nakagawa, "Single platform 10G-EPON 10.3-Gbps/1.25-Gbps dual-rate CDR with fast burst-mode lock time employing 82.5 GS/s sampling IC and bit-rate adaptive decision logic circuit," in *Proc. IEEE ECOC*, Sep. 2010.
- [5] C.-F. Liang, H.-L. Chu, and S.-I. Liu, "10-Gb/s inductorless CDRs with digital frequency calibration," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 55, no. 9, pp. 2514–2524, Oct. 2008.
- [6] J. Lee and M. Liu, "A 20-Gb/s burst-mode clock and data recovery circuit using injection-locking technique," *IEEE J. Solid-State Circuits*, vol. 43, no. 3, pp. 619–630, Mar. 2008.
- [7] J. Terada, K. Nishimura, S. Kimura, H. Katsurai, N. Yoshimoto, and Y. Ohtomo, "A 10.3 Gb/s burst-mode CDR using a ΔΣ DAC," *IEEE J. Solid-State Circuits*, vol. 43, no. 12, pp. 2921–2928, Dec. 2008.

- [8] D. Messerschmitt, "Frequency detectors for PLL acquisition in timing and carrier recovery," *IEEE Trans. Commun.*, vol. COM-27, no. 9, pp. 1288–1295, Sep. 1979.
- [9] M.-S. Hwang and S.-Y. Lee *et al.*, "A 180-Mb/s to 3.2-Gb/s, continuous-rate, fast-locking CDR without using external reference clock," in *Proc. IEEE Asian Solid-State Circuits Conf.*, Nov. 2007, pp. 144–147.
- [10] D. Dalton *et al.*, "A 12.5-Mb/s to 2.7-Gb/s continuous-rate CDR with automatic frequency acquisition and data-rate readback," *IEEE J. Solid-State Circuits*, vol. 40, no. 12, pp. 2713–2725, Dec. 2005.
- [11] A. Pottbäcker, U. Langmann, and H.-U. Schreiber, "A Si bipolar phase and frequency detector IC for clock extraction up to 8 Gb/s," *IEEE J. Solid-State Circuits*, vol. 27, no. 12, pp. 1747–1751, Dec. 1992.
- [12] J. Lee and K.-C. Wu, "A 20-Gb/s full-rate linear clock and data recovery circuit with automatic frequency acquisition," *IEEE J. Solid-State Circuits*, vol. 44, no. 12, pp. 3590–3602, Dec. 2009.
- [13] I. Jung and D. Shin *et al.*, "A 140 Mb/s to 1.96 Gb/s referenceless transceiver with 7.2 μs frequency acquisition time," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 19, no. 7, pp. 1310–1314, Jul. 2011.
- [14] R. J. Yang, K. H. Chao, S. C. Hwu, C. K. Liang, and S. I. Liu, "A 155.52 Mbps-3.125 Gbps continuous-rate clock and data recovery circuit," *IEEE J. Solid-State Circuits*, vol. 41, no. 6, pp. 1380–1390, Jun. 2006.
- [15] C.-L. Hsieh and S.-I. Liu, "A 1–16-Gb/s wide-range clock/data recovery circuit with a bidirectional frequency detector," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 58, no. 8, pp. 487–491, Aug. 2011.
- [16] S.-H. Lin and S.-I. Liu, "Full-rate bang-bang phase/frequency detectors for unilateral continuous-rate CDRs," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 55, no. 12, pp. 1214–1218, Dec. 2008.
- [17] N. Dodel, H. Klar, and S. Otte, "A 9.8–10.7 Gb/s bang-bang CDR with automatic frequency acquisition capability," in *Proc. IEEE Int. Midwest Symp. Circuits Syst. (MWSCAS)*, 2006, vol. 2, pp. 46–49.
- [18] B. Razavi, "Challenges in the design of high-speed clock data recovery circuits," *IEEE Commun. Mag.*, vol. 40, no. 8, pp. 94–101, Aug. 2002.
- [19] J. Song, I. Jung, M. Song, Y.-H. Kwak, S. Hwang, and C. Kim, "A 1.62 Gb/s–2.7 Gb/s referenceless transceiver for displayport v1.1a with weighted phase and frequency detection," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 60, no. 2, pp. 268–278, Feb. 2013.
- [20] J. D. H. Alexander, "Clock recovery from random binary data," *IEEE Electron. Lett.*, vol. 11, pp. 541–542, Oct. 1975.
- [21] F.-T. Chen and J.-M. Wu *et al.*, "A 10 to 11.5 GHz rotational phase and frequency detector for clock recovery circuit," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, May 2011, pp. 185–188.

- [22] J. Lee, K. Kundert, and B. Razavi, "Analysis and modeling of bang-bang clock and data recovery circuits," *IEEE J. Solid-State Circuits*, vol. 39, pp. 1571–1580, Sep. 2004.
- [23] J. Savoj and B. Razavi, "A 10-Gb/s CMOS clock and data recovery circuit with a half-rate binary phase/frequency detector," *IEEE J. Solid-State Circuits*, vol. 38, no. 1, pp. 13–21, Jan. 2003.
- [24] M. Nogawa, K. Nishimura, and S. Kimura *et al.*, "A 10 Gb/s burst-mode CDR IC in 0.13 μm CMOS," in *IEEE Int. Solid-State Circuit Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2005, pp. 228–595.



**Jen-Ming Wu** (M'98) received the B.S. degree from National Taiwan University, Taiwan, and the Ph.D. degree from the University of Southern California, Los Angeles, CA, USA.

From 1998 to 2003, he was with Sun Microsystems Inc., CA, USA, as a Member of Technical Staff. Since 2003, he has been with the faculty of the Department of Electrical Engineering, National Tsing Hua University, Taiwan, where he currently holds an Associate Professor position. Prof. Wu has worked on various fields of electrical engineering including high speed chip-to-chip interconnections, I/O interface for optical transceivers, digital signal processing, wireless communication transceiver design, and microprocessor architecture. Currently, his research focuses on high speed interconnections, wireless baseband communication technologies, and embedded SoC design for multimedia communications.



**Fan-Ta Chen** was born in Hsinchu, Taiwan, in 1983. He received the B.S. degree from Yuan Ze University, Taiwan, in 2005 and the M.S. degree from National Tsing Hua University, Taiwan, in 2007. He is currently working toward the Ph.D. degree in the Department of Electrical Engineering at National Tsing Hua University. His research interests are high-speed analog transmission interfaces for wireline communications.



**Ching-Te Chiu** (M'04) received her B.S. and M.S. degrees from National Taiwan University, and the Ph.D. degree from the University of Maryland, College Park, MD, USA, all in electrical engineering. She was a Member of Technical Staff with AT&T, Lucent Technologies, and Agere Systems. She is currently with the Computer Science Department and Institute of Communications Engineering, National Tsing Hua University, Hsinchu, Taiwan, as a full Professor. Her research interests include high-speed SerDes design, high dynamic range image and video processing, image super-resolution, and pattern recognition.



**Min-Sheng Kao** received the Ph.D. degree in communications engineering from National Tsing Hua University, Taiwan, in 2011. His research interests are high-speed analog front-end circuits for wireline communications. He was with the Industrial Technology Research Institute (ITRI) Optical Communication and Optical Display Division from 2000 to 2003. He joined Mindspeed Technologies, Taiwan, in 2004 and is currently a Director of Marketing and Product Application in the High Performance Analog business unit.



**Shawn S. H. Hsu** (M'04) was born in Tainan, Taiwan. He received the B.S. degree from National Tsing Hua University, Hsinchu, Taiwan, in 1992, and the M.S. and Ph.D. degrees from the University of Michigan, Ann Arbor, MI, USA, in 1997 and 2003, respectively. He is currently a Professor with the Department of Electrical Engineering, National Tsing Hua University. His research interests include MMIC/RFIC design for high speed communications and 3D integrated circuits.



**Yu-Hao Hsu** (S'10) received the B.S. degree in electrical engineering and the Ph.D. degree in communications engineering from National Tsing Hua University, Taiwan. He is currently a Principal Engineer with the Memory Design Program (MDP), Taiwan Semiconductor Manufacturing Company Limited (TSMC). His research interests are high-speed switch architecture design, high-speed SerDes interface design, and SRAM compiler design.



**Mau-Chung Frank Chang** (M'79–SM'84–F'96) is currently the Department Chairman and the Wintek Distinguished Chair Professor of Electrical Engineering at the University of California, Los Angeles, CA, USA. Throughout his career, his research has focused on the development of high-speed semiconductor devices and high frequency integrated circuits for communication, radar and imaging systems. He is a member of the U.S. National Academy of Engineering and an Academician of Taiwan's Academia Sinica. He also received the IEEE David Sarnoff Award in 2006 for developing and commercializing HBT power amplifiers for modern mobile phones.