# A Sierpinski Space-Filling Clock Tree Using Multiply-by-3 Fractal-Coupled Ring Oscillators

Yi-Wei Lin, Student Member, IEEE, and Shawn S. H. Hsu, Member, IEEE

Abstract—A space-filling clock tree is presented, which takes advantage of LC resonant clocking to obtain a multiply-by-3 clock with uniform phase and amplitude from a unique Sierpinski-coupled ring oscillator (SCRO) array. The three-stage interleaved SCROs resemble the Sierpinski triangle and are all synchronized to a common frequency. The SCRO further provides an aligned output phase relationship to reduce skew. The triangle clock grid with a side length of 3.2 mm is filled by a space-filling clock tree. The 3-D stacked transformers at the tree endpoints extract the third harmonic of the SCRO array oscillation and scale the voltage amplitude of the extracted clock. The transformers also perform a built-in bandpass filtering function to remove injected noise. An experimental prototype integrated in 90-nm CMOS operates at 2.85-4.3 GHz and consumes 19.2-49 mW under 0.7-1 V supply voltages. With 300-mV added supply noise, jitter was measured as 3.4 ps (rms) and 17.7 ps (pp). The measured results reveal substantial improvements in both power and jitter from this approach.

*Index Terms*—Adler equation, bandpass filter, clock distribution, coupled ring oscillators (CROs), harmonic extraction, injection locking, jitter transfer function, Sierpinski triangle fractal, space-filling tree, stacked 3-D transformer.

## I. INTRODUCTION

THE increased chip sizes in multi-core processors, very-large-scale integration (VLSI)/system-on-chip (SoC) systems, and recent 2.5/3-D IC systems will lead to an increase in the power and complexity of clock distribution networks. The network in conventional buffered H-trees contains many levels of energy-consuming inverter buffers to distribute a single clock source across the whole chip, with a large buffer latency to which both skew and jitter are proportional [1]. Different approaches have been proposed to solve these problems [1]–[15]. Among them, oscillator array clocks are particularly promising [4]–[12], since multiple clock generators that are phase locked together in the array can eliminate large amounts of latency from the clock distribution, reducing both skew and jitter. As shown in Fig. 1, the clock distributions with directly coupled oscillators via interconnects consume less power than those that use phase detectors to synchronize a chip or multiple 3-D IC chips [2], [3],

Manuscript received December 8, 2016; revised April 20, 2017; accepted July 18, 2017. Date of publication August 21, 2017; date of current version October 23, 2017. This work was supported by the Ministry of Science and Technology, Taiwan, under Grant MOST 103-2220-E-007-003. This paper was approved by Associate Editor Tony Chan Carusone. (*Corresponding author: Shawn S. H. Hsu.*)

The authors are with the Department of Electrical Engineering, National Tsing Hua University, Hsinchu 30013, Taiwan (e-mail: yiwailin@gmail.com; shhsu@ee.nthu.edu.tw).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSSC.2017.2732730

including standing-wave oscillators [4], [5] [Fig. 1(a) and (b)], traveling-wave oscillators [8]-[10] [Fig. 1(c)], distributed LC-based oscillators [11] [Fig. 1(d)], and coupled ring oscillators (CROs) [12] [Fig. 1(e)]. The first approach [4], [5] generates high-frequency standing waves on loss-compensated interconnects and has been implemented as a clock distribution network. A clock grid realized by transmission-line-based interconnects is proposed in [4], where a standing wave is generated along a transmission line that acts as a half-wave resonator. Note that the standing-wave clocks in Fig. 1(a) have the same phase at all points but a varying amplitude along the transmission lines. The rotary oscillator array (ROA) [8] in Fig. 1(c) is obtained by distributing CMOS inverters around a Möbius strip to sustain rotational traveling waves on the loop, and by interconnecting many of these rings to constitute an ROA over a chip. The Möbius strip has the mathematical property of being non-orientable, and the cross-connected ring structure based on this concept allows a wave to travel on the ring indefinitely. Traveling-wave oscillators provide a uniformly high amplitude and a multiphase (360°) rotary clock. Yet, the uncertainty of the signal rotation direction on the ring in a complex network remains an issue, because oscillation occurs in the direction of the lowest loss, which depends on the field couplings between the ring and neighboring structures [8], [9].

A single rotary ring structure determines rather explicitly in which direction startup occurs, and is treated as a superposition of multiple standing-wave oscillators in the analysis of [10]. In practice, several cascaded standing-wave oscillators form a single closed loop, and are exploited as a space-filling curve clock distribution network in [5]. The clock in Fig. 1(b) features a standing-wave clock with almost uniform amplitude, obtained by attaching several inductors at the ends of each transmission line section on the loop. The resonant frequency depends on the inductive loads, which shorten the length of the transmission line and thus make their actual lengths comparable. Lumped capacitors can also shorten the length of the transmission line, but the voltage swing of the standing wave becomes smaller than that obtained with lumped inductors [5]. Notice that standing- and traveling-wave distributions require high-quality on-chip transmission lines, which limit how slow the clock frequency can be. Exploiting inductance for clock distribution has attracted significant attention recently [4]–[11], [15], [16], since it is able to return energy to the clock generator, leading to low-power operation. Wire inductance or a set of on-chip inductors is therefore used to resonate with the clock load capacitance. In distributed LC-based oscillators [Fig. 1(d)] [11], determining the target resonance of the large network before tapeout consumes a considerable amount of

0018-9200 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications\_standards/publications/rights/index.html for more information.










Fig. 1. Coupled oscillator array clocks (a) standing-wave oscillator clock, (b) inductively loaded standing-wave oscillator clock, (c) rotary traveling-wave oscillator clock, (d) distributed *LC*-based oscillator clock, and (e) CRO clock and the electrical equivalent.

computing resources. Moreover, the center-tapped inductors in the coupled resonant oscillators short the differential clock phases at low frequencies, resulting in multiple oscillation modes in these oscillators.

In contrast to the above-mentioned approaches, the nonresonant clock in Fig. 1(e) is generated and delivered by spatially distributed and electrically parallel buffers in a CRO array [12]. The high degree of parallelism among the buffers provides strong signal aggregation that approximates clock phase alignment without complex synchronization schemes. The overlapping ring oscillators can be integrated into VLSI chips and chosen to oscillate at the desired frequency, based on the interconnect delay estimation. One feature that could be either an advantage or a drawback is that this array generates a multiphase clock. This is useful if multiple phases are needed, such as in scheduled-skew logic design [13], [14], but the remaining phases are wasted if only one is desired.

In the first part of this paper, we propose using the topology of the Sierpinski triangle to design a CRO array clock, which generates an aligned three-phase clock without phase offsets between any two corresponding delay stages of the rings in the array. In addition to minimizing clock skew, the aligned phase relationship simplifies the clock network design, because the phase shift between the parallel elements in the array need not be taken into consideration. As mentioned before, a multiphase clock could be wasteful in a single-phase clock network design. Therefore, the second part of this paper develops a low-power, high-frequency, single-phase Sierpinski space-filling clock tree, which further takes advantage of LC resonant clocking. This is done using LC tanks to extract the third harmonic from the three-phase clock signal, i.e., Nth harmonic extraction on the N-phase clock. The LC tank is implemented as a 3-D stacked transformer, which can amplify the output swing of harmonic clocks with little energy consumption. The circuit response of the transformer is a bandpass filter; hence, it also offers jitter filtering in addition to clock buffering.

This paper is organized as follows. In Section II, two CRO arrays are analyzed using the generalized form of the Adler equation. The proposed array is then verified to be a ring-oscillator-based clock distribution that can achieve the synchronous state, with not only the same period but also the same phase at the corresponding nodes of all rings. Section III explores the effects of the proposed modification on extracting the single-phase harmonic clock from the array. In Section IV, experiments give the power consumption and the measured jitter and skew of the prototype chip. Section V concludes this work.

## II. RING OSCILLATOR ARRAY CLOCKS

A ring oscillator is a simple structure, similar to the conventional inverter-based clock buffer. Given this similarity, a ring oscillator array clock is closely related to inverter-buffer-chain-based clocking practice. When a group of oscillators is synchronized, it further establishes a coupled system, which is more resistant to impulsive perturbations than a single oscillator. In previous studies [17]–[19], the net improvement in phase noise is shown to be $10\log(M)$ decibels in a system of M coupled electrical oscillators. Jitter and phase noise are associated embodiments of oscillator noise in the time and frequency domains, respectively, so a coupled system can also improve jitter performance. However, an oscillator array clock requires a coupled system that delivers not only minimum jitter but also minimum skew and equal delay when distributing the clock signal to the clock sinks. Meeting these requirements calls for a well-defined coupling structure that assures minimum spatial variation in the array.
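As a quick numerical illustration of the $10\log(M)$ figure quoted above, the following minimal Python sketch tabulates the improvement for a few array sizes. It assumes the logarithm is base 10 (the usual decibel convention); the function name `phase_noise_improvement_db` and the chosen values of M are illustrative and do not come from the paper.

```python
import math


def phase_noise_improvement_db(num_oscillators: int) -> float:
    """Net phase-noise improvement in dB for M coupled oscillators: 10*log10(M).

    Assumption: the 'log' in the text is base 10, as is usual for decibels.
    """
    if num_oscillators < 1:
        raise ValueError("need at least one oscillator")
    return 10.0 * math.log10(num_oscillators)


# Hypothetical array sizes, chosen only for illustration.
for m in (1, 3, 9, 27):
    print(f"M = {m:2d}: {phase_noise_improvement_db(m):5.2f} dB")
```

Under this rule, each tripling of the number of coupled rings adds roughly 4.8 dB, so the benefit grows slowly with array size.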

The CRO array in Fig. 1(e) [12] demonstrates the space-filling nature of the array, which builds a triangle mesh topology, and its electrical equivalent, which forms a parallel array. The mesh-CRO array is phase aligned by mutual injection locking, which can be viewed as a phase-averaging effect (or signal aggregation), since the phases of the corresponding nodes in the parallel delay elements eventually balance. However, multiple inputs are necessary to couple the rings together and to avoid the signal conflict in [12], where three inputs race into a single-input inverter. On the other hand, it is more natural to lay out a ring oscillator array in a parallel structure [19], [20]. Both arrays interconnect the ring oscillators through well-defined inputs to form a system, which can then be described by a set of differential equations. The generalized Adler equation is developed in [19], [21], and [22] to model the injection-locking behavior of coupled oscillators; the equations also describe the phase relationship of the oscillator array. Two CRO arrays, with cyclic and fractal natures, are modeled in the following sections. The topological analysis assumes single-ended stages, but it applies to differential ones as well. The fractal is adopted in a ring oscillator array clock to reduce asymmetry: a fractal-based distribution provides a self-similar structure that properly couples identical rings, which have the same footprint and the same oscillation frequency. Another practical consideration regarding the phase accuracy of the array is to construct a CRO with fewer stages, because stage mismatches and phase errors among the CROs are then reduced. The minimum stage number N of a ring oscillator is three, since a two-stage ring needs a buffer to pair with.

## A. Cyclic-Coupled Ring Oscillator

In cyclic-CROs (CCROs), Fig. 2(a) shows *M* coupled three-stage (N = 3) ring oscillators that extend horizontally and are coupled vertically. Fig. 2(b) represents the injection-locked behavior model at the second delay stage of the second horizontal oscillator [node (2, 2)] [19]. A single-input inverter is modeled as a hard-limited transconductor, and the *RC* loads represent the interconnects. The node voltage waveform can be written in phasor form as $A(t)e^{j\theta_{22}(t)}$, with a time-varying amplitude and phase, where $I_{OSC}e^{j(\theta_{21}+\pi)}$ and $I_{CP}e^{j(\theta_{12}+\pi)}$ are the currents generated by the inverters in the horizontal ring and vertical ring, respectively. Both currents are antiphase with their input voltages. Summing the currents injected at node (2, 2) gives

$$\frac{Ae^{j\theta_{22}}}{R} + \frac{Cd(Ae^{j\theta_{22}})}{dt} = I_{OSC}e^{j(\theta_{21}+\pi)} + I_{CP}e^{j(\theta_{12}+\pi)}. \tag{1}$$

The hard limiter in the behavior model only commutates the currents at the zero crossings, and the amplitude modulation is neglected in the hard limiter, such that  $dA/dt \cong 0$  [22]. By calculating the real and imaginary parts of (1), the generalized Adler equation at node (2, 2) can be given as

$$\frac{d\theta_{22}}{dt} = \frac{1}{RC} \frac{I_{\text{OSC}} \sin(\theta_{21} - \theta_{22}) + I_{\text{CP}} \sin(\theta_{12} - \theta_{22})}{I_{\text{OSC}} \cos(\theta_{21} - \theta_{22}) + I_{\text{CP}} \cos(\theta_{12} - \theta_{22})}. \tag{2}$$

In addition, note that the coupling factor is defined as the ratio of the coupling current to the oscillation current  $k = I_{CP}/I_{OSC}$ . Then, we can similarly obtain the phase dynamic at node (3, 2) as

$$\frac{d\theta_{32}}{dt} = \frac{1}{RC} \frac{I_{\text{OSC}} \sin(\theta_{31} - \theta_{32}) + I_{\text{CP}} \sin(\theta_{22} - \theta_{32})}{I_{\text{OSC}} \cos(\theta_{31} - \theta_{32}) + I_{\text{CP}} \cos(\theta_{22} - \theta_{32})}. \tag{3}$$



Fig. 2. CCRO (a) schematic of CCRO, (b) injection-locked behavior model of CCRO, and (c) output phase relationship.

Each phase equation describes the rate of change of the phase with time, and equivalently defines the oscillation frequency at each node. When oscillators in a coupled system are locked to a common frequency of  $\omega_0$ , we expect that  $d\theta_{22}/dt = d\theta_{32}/dt = \omega_0$ . Accordingly, we can have the phase shifts in two directions as shown in the following:

$$\theta_{21} - \theta_{22} = \theta_{31} - \theta_{32} = \phi_0 \tag{4}$$

$$\theta_{12} - \theta_{22} = \theta_{22} - \theta_{32} = \psi_0 \tag{5}$$

where $\phi_0$ is the horizontal phase shift and $\psi_0$ is the vertical phase shift. Since the output phases of the corresponding nodes between adjacent horizontal and vertical rings are shifted by $\phi_0$ and $\psi_0$, respectively, the phase relationship among those nodes is skewed, as illustrated in the left-hand side of Fig. 2(c). Also, note in Fig. 2(a) that neither $\phi_0$ nor $\psi_0$ is well defined, because of the long closing interconnect that feeds the output of the *N*th or *M*th delay stage back to the input of the first one in the horizontal or vertical ring. The impacts of *RC* delay variation in the CRO array include an oscillation frequency shift, an abnormal phase offset, and amplitude distortion of the oscillation signal generated at any node that is unbalanced with the others. The unbalanced nodes span an entire row and column at the last delay stages of the horizontal and vertical rings in the CCRO structure, as shown in Fig. 2(a). Increasing the coupling factor k widens the locking range of the oscillators, pulling most of them to lock at an oscillation frequency close to the others. Yet it is still difficult to couple the ring oscillators without the aforementioned impacts if a row and a column of long interconnects remain. A modified structure that ensures that all nodes are RC balanced is therefore required for the array to output equal oscillation waveforms. As proposed in this paper, equilateral polygon meshes can be adopted in the layout design of N-stage rings to establish interconnects of equal path length between every two delay stages in a CRO array.
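The locked-state condition in (2) to (5) can be explored numerically. The sketch below is a minimal Python evaluation of the right-hand side of the generalized Adler equation, written in terms of the coupling factor $k = I_{CP}/I_{OSC}$. The function name `adler_rate`, the normalized RC value, and the sample phase shifts are illustrative assumptions, not values from the paper.

```python
import math


def adler_rate(theta_self, theta_ring, theta_couple, k, rc):
    """Right-hand side of (2)/(3): d(theta_self)/dt.

    theta_ring   : phase of the previous stage in the same ring
    theta_couple : phase of the coupled stage in the neighboring ring
    k            : coupling factor I_CP / I_OSC
    rc           : RC time constant of the node
    """
    d_ring = theta_ring - theta_self
    d_couple = theta_couple - theta_self
    # Numerator and denominator of (2), both divided by I_OSC.
    num = math.sin(d_ring) + k * math.sin(d_couple)
    den = math.cos(d_ring) + k * math.cos(d_couple)
    return num / (rc * den)


RC = 1.0  # hypothetical, normalized time constant

# Equal shifts (phi_0 = psi_0): the rate no longer depends on k.
print(adler_rate(0.0, math.pi / 3, math.pi / 3, k=0.2, rc=RC))
print(adler_rate(0.0, math.pi / 3, math.pi / 3, k=0.8, rc=RC))

# Unequal shifts (hypothetical psi_0 = pi/4): the rate now varies with k.
print(adler_rate(0.0, math.pi / 3, math.pi / 4, k=0.2, rc=RC))
print(adler_rate(0.0, math.pi / 3, math.pi / 4, k=0.8, rc=RC))
```

When the two shifts are equal the expression collapses to $\tan(\phi_0)/RC$; when they differ, the node frequency shifts with k, which is one way to see why poorly defined $\phi_0$ and $\psi_0$ disturb the common frequency.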

## B. Fractal-Coupled Ring Oscillator

A main contribution of this work is to introduce a fractal-based ring oscillator array containing all nodes equivalent



Fig. 3. Sierpinski gasket clock distribution network of a 9-SCRO (a) high-level schematic illustration and (b) simplified circuit block diagram implemented with 27 dual-input inverters.

in *RC* delay, and hence, the outputs from all rings have a common period. The array provides a three-phase ring oscillator clock by connecting identical rings to assemble the well-known Sierpinski triangle. A Sierpinski triangle is a self-similar set with the overall shape of an equilateral triangle, and can be subdivided recursively into smaller ones. The topology of an interleaved three-stage CRO is an equilateral triangle mesh, and it is therefore represented by a triangle mesh in Fig. 3.

As indicated by (4) and (5), even if $\phi_0$ and $\psi_0$ of the CCROs both remain constant and equal, the output phase relationship in Fig. 2(c) is still skewed. This is because there are two phase shifts, in the horizontal and vertical directions, and the cyclic unilateral coupling in Fig. 2(b) indicates that the extra phase shift (skew) in the plot of the output phase relationship comes from the vertical coupling. If the extra phase shift is removed, the output phase relationship can be aligned as illustrated in the right-hand side of Fig. 2(c).

In the proposed Sierpinski-CROs (SCROs) shown in Fig. 3(a), $\phi_0$ and $\psi_0$ overlap and aggregate at every bilaterally coupled delay stage, which removes the extra phase shift. The unilateral coupling in CCROs is changed to bilateral coupling in SCROs, so that every mutually coupled delay stage senses $\phi_0$ from its own ring and $\psi_0$ from the neighboring ring. The oscillation waves of all rings are phase aligned and triggered by the aggregate of $\phi_0$ and $\psi_0$ to travel in the counterclockwise direction, as indicated by the arrows.

The numbers in the high-level schematic [Fig. 3(a)] follow a definite order from phase 1 to phase 3, representing the phase sequence of the delay stages, and this order keeps repeating itself. A mutually coupled delay stage is drawn as two adjoining circles, both marked with the same phase number to indicate that the coupled delay stage is phase averaged. Each blue edge in the Sierpinski triangle clock network corresponds to the electrical wire that connects the ring input of one inverter to the output of the previous one. Each red curved arrow is a branch split from an edge to drive the coupling input in the neighboring ring, and must be routed with the same length as the edge. Similar to the equal propagation delay that an H-tree uses to approach zero skew, self-similarity in the Sierpinski triangle ensures an equal stage delay from each output to a ring input in its own ring and, through the edge and the curved arrow, to a coupling input in a neighboring ring. Symmetry at the boundary of the SCROs is obtained by connecting the stage delay $\phi_0$ to both the ring input and the stage's own coupling input in the three purple delay stages at the outermost vertices, such that all inputs are well defined. In contrast to [12], the boundary of the SCRO array is self-terminated to reduce variation. Since a coupled system is more immune to noise perturbation [18], a group of in-phase coupled oscillators needs more excitation for startup than a single ring oscillator. The SCRO array is thus designed to be initialized by the external injection signal CLK<sub>INJ</sub> at the center for startup, as illustrated in Fig. 3. Although CLK<sub>INJ</sub> is injected into the delay stages of phase 3 in the array, the phase sequence in the high-level schematic is consistent no matter at which node the startup of oscillation occurs. However, only delay stages of the same phase can be chosen to be stimulated by CLK<sub>INJ</sub>; once started, the SCROs can oscillate properly without CLK<sub>INJ</sub>.

The simplified circuit block diagram of a 9-SCRO is illustrated in Fig. 3(b), where all interconnects are routed orthogonally to meet the design rules from the foundry. A dual-input inverter schematic that is composed of two single-input inverters with the outputs shunted together is shown in the left side [20]. Neither the transition at the ring input nor the coupling input of the inverter alone may be able to cause a complete transition at the output, such that the transitions at both inputs have to overlap as an aggregate. At any of the bilaterally coupled delay stages, the overlapping transitions are apparently identical at both sides, and the two inverters generate two complete output transitions at the same time. The generated oscillation waves have the same phase, and the two delay stages are in-phase coupled. In addition, the differential pairs for external injection are used in Fig. 3(b), because inverters in the prototype chip are all implemented with differential structures.

Fig. 4(a) shows all nodes in SCROs, two of which will be analyzed by the generalized Adler's equation. Fig. 4(b) demonstrates the mutual injection-locked models to derive the phase dynamics at output nodes (M, 1) of the first and second SCROs of the array. The dual-input inverters are modeled as



Fig. 4. SCRO (a) node denomination, and (b) injection-locked behavior model of SCRO.

two hard limiters, with the outputs wired together at the nodes, whose oscillation voltages are  $Ae^{j\theta_{11}}$  and  $Ae^{j\theta_{21}}$ , respectively. With a similar steady-state analysis as before, the mutually coupled phase equations between the two frequency entrainment nodes (1, 1) and (2, 1) can be expressed as

$$\frac{d\theta_{11}}{dt} = \frac{1}{RC} \frac{I_{\text{OSC}} \sin(\theta_{13} - \theta_{11}) + I_{\text{CP}} \sin(\theta_{23} - \theta_{11})}{I_{\text{OSC}} \cos(\theta_{13} - \theta_{11}) + I_{\text{CP}} \cos(\theta_{23} - \theta_{11})} \tag{6}$$

$$\frac{d\theta_{21}}{dt} = \frac{1}{RC} \frac{I_{\text{OSC}} \sin(\theta_{23} - \theta_{21}) + I_{\text{CP}} \sin(\theta_{13} - \theta_{21})}{I_{\text{OSC}} \cos(\theta_{23} - \theta_{21}) + I_{\text{CP}} \cos(\theta_{13} - \theta_{21})} \tag{7}$$



Fig. 5. Multiply-by-3 space-filling clock tree.

where $\theta_{13} - \theta_{11} = \theta_{23} - \theta_{21} = \phi_0$ is the stage delay, which equals $\pi/3$ if the three-stage ring is interleaved to balance the interconnect RCs. When modeling with hard limiters, the coupling phase shift $\theta_{23} - \theta_{11} = \theta_{13} - \theta_{21} = \psi_0$ needs to be equal to $\phi_0$; the phase equations can then easily be solved, with the result that $\theta_{13} = \theta_{23}$ and $\theta_{11} = \theta_{21}$. The derived phase dynamics thus show that adjacent delay stages are in phase. In practice, however, for the dual-input inverter to function properly, the input transitions of $\phi_0$ and $\psi_0$ only need to overlap, which allows a small delay between the matched $\phi_0$ and $\psi_0$. In addition, note that a zero solution also satisfies the above phase equations, which suggests that a startup signal is required to initialize the SCRO array.
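As a worked check of the in-phase solution, under the hard-limiter model and the locked state $\theta_{13} = \theta_{23}$, $\theta_{11} = \theta_{21}$, $\psi_0 = \phi_0$ stated above, substituting into (6) gives

$$\frac{d\theta_{11}}{dt} = \frac{1}{RC}\,\frac{(I_{\text{OSC}} + I_{\text{CP}})\sin\phi_0}{(I_{\text{OSC}} + I_{\text{CP}})\cos\phi_0} = \frac{\tan\phi_0}{RC}$$

and (7) yields the same expression for $d\theta_{21}/dt$. Both nodes therefore advance at one common rate that does not depend on the coupling factor. For $\phi_0 = \pi/3$, this rate evaluates to $\omega_0 = \sqrt{3}/RC$ within this simplified model.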

## III. MULTIPLY-BY-3 SPACE-FILLING CLOCK TREE

The rest of this work is to design a clock tree with a unified single-phase clock, more power savings, and jitter suppression. The multiply-by-3 technique, using a 3-D stacked transformer, can be applied to the SCROs to obtain such a clock tree. The shortest possible connection of the center points in all subtriangles of the Sierpinski triangle is another self-similar structure, and develops into the Sierpinski space-filling clock tree [23]. The space-filling tree that evenly fills the Sierpinski triangle is adopted to design a clock tree that extracts the third harmonic from SCROs as multiple synchronized clock sources spread all over the chip. The transformer is attached electrically to the clock tree at the center of each SCRO, to obtain the harmonic clock signal on the power-supply rail. As shown in Fig. 5, it is phase aligned by mutual injection locking in SCROs, and extends iteratively like an H-tree by fractal properties.
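The recursive subdivision behind the tree can be sketched in a few lines of Python. The example below generates the sub-triangles of a Sierpinski gasket and their center points, which are the candidate tap points the tree connects. It assumes that each of the nine rings of the 9-SCRO maps to one smallest triangle of a depth-2 gasket and uses the 3.2-mm side length quoted in the abstract; the function names and the coordinate placement are illustrative, and the routing between centers is not modeled.

```python
import math


def sierpinski_triangles(a, b, c, depth):
    """Return the corner sub-triangles of triangle (a, b, c) after `depth` subdivisions."""
    if depth == 0:
        return [(a, b, c)]

    def mid(p, q):
        return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)

    ab, bc, ca = mid(a, b), mid(b, c), mid(c, a)
    # The Sierpinski construction keeps the three corner triangles
    # and discards the inverted middle one.
    out = []
    for tri in ((a, ab, ca), (ab, b, bc), (ca, bc, c)):
        out.extend(sierpinski_triangles(tri[0], tri[1], tri[2], depth - 1))
    return out


def centroid(tri):
    """Center point of a triangle given as three (x, y) vertices."""
    return (sum(p[0] for p in tri) / 3.0, sum(p[1] for p in tri) / 3.0)


SIDE_MM = 3.2  # side length of the triangle clock grid, from the abstract
outer = ((0.0, 0.0), (SIDE_MM, 0.0), (SIDE_MM / 2.0, SIDE_MM * math.sqrt(3) / 2.0))

rings = sierpinski_triangles(outer[0], outer[1], outer[2], depth=2)  # assumed depth
centers = [centroid(t) for t in rings]

print(len(rings), "rings,", 3 * len(rings), "delay stages")
for x, y in centers:
    print(f"tap point at ({x:.3f} mm, {y:.3f} mm)")
```

Each extra subdivision level triples the number of rings while halving their side length, which is the self-similar growth the text compares to an H-tree.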

## A. Multiply-by-3 Ring Oscillator

It is shown in [24] that an inductor can extract the third harmonic as the high-frequency output of a multiply-by-3 ring oscillator. Fig. 6 illustrates the schematic and output waveform of the multiply-by-3 CRO with a voltage amplification gain $A_v$. The replenished pulses from the three delay stages of a CRO are injected one after another into the LC tank, three times in a period $T = 2\pi/\omega_0$, to maintain oscillation at $3\omega_0$. When the harmonic extraction technique is applied, a three-stage ring offers the largest



Fig. 6. Extraction of third harmonic with amplification gain.

voltage swing on the obtained signal, because the selection of stage number N depends on the amplitudes of the odd-harmonic components in the Fourier expansion of the square wave. Although the Nth harmonic is extracted, the fundamental operating frequency $\omega_0$ still depends inversely on the RC time constant, as shown in (6) and (7).
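The dependence on the square-wave Fourier expansion can be made concrete with a short Python sketch. It assumes ideal unit-amplitude square waves and models the tank as simply summing N equally spaced phases, which is an idealization of the pulse injection described above; the function names are illustrative.

```python
import cmath
import math


def square_wave_harmonic(n):
    """Amplitude of the n-th harmonic of an ideal +/-1 square wave (zero for even n)."""
    return 4.0 / (math.pi * n) if n % 2 == 1 else 0.0


def n_phase_sum(n, phases=3):
    """Magnitude of the n-th harmonic after summing `phases` equally spaced copies."""
    total = sum(cmath.exp(-1j * n * 2.0 * math.pi * k / phases) for k in range(phases))
    return abs(total) * square_wave_harmonic(n)


print("n  single-phase  three-phase sum")
for n in (1, 3, 5, 7, 9):
    print(f"{n}  {square_wave_harmonic(n):.4f}        {n_phase_sum(n):.4f}")
```

In this idealized sum the fundamental and the fifth and seventh harmonics cancel, while harmonics that are multiples of three add in phase. The third harmonic carries 1/3 of the fundamental amplitude, whereas a five-stage ring would rely on the fifth harmonic at only 1/5, consistent with the three-stage ring giving the largest swing.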

In a coupled oscillator system, supply noise is the most detrimental, which is correlated in all oscillators [25]. An *LC* tank placed at the top of the SCRO provides an energy storage element in the oscillator and equivalently increases the quality factor of the ring oscillator. When a transformer further takes the place of the *LC* tank in the SCRO, the transformer serves as a bandpass filter to amplify a narrowband of the extracted signal's frequency spectrum, resulting in the substantial removal of low-frequency noise and unwanted higher order harmonic components.

## B. Bandpass Filtering

Bandpass filtering is shown in [26] and [27] to effectively suppress both random and deterministic jitter components in high-frequency clock signals. The transfer function of a transformer is a second-order bandpass filter, such that the filtering operation is to shift the dominant component of the signal through the bandpass filter, and the noise occurring beyond the filter's bandwidth can be suppressed. As shown in Fig. 6, two of the four terminals in the primary and secondary windings are short-circuited together to power supply  $V_{\text{DD}}$ , and the T-arrangement in Fig. 7 can be used to model the coils. Examining the circuit of Fig. 7, straightforward analysis yields two mesh current equations: one mesh with a primary inductance of  $L_P$  and a primary winding resistance of  $r_P$  and a second mesh with a secondary inductance of  $L_S$ 



Fig. 7. T-equivalent circuit model for transformer.

and winding resistance of  $r_S$ . Furthermore, the two meshes must have a common inductance of M. Let the secondary be terminated by a load with an input impedance of Z(s), and replaced with a resistive load of R to simplify the analysis. The voltage-transfer function of the transformer is then obtained in (8) [28]

$$H(s) = \frac{\pm sMR}{s^2(L_PL_S-M^2)+s(r_PL_S+r_SL_P+RL_P)+r_P(r_S+R)}. \tag{8}$$

Mutual inductance *M* is related to the coefficient of coupling $k_C$ by $M = k_C \sqrt{L_P L_S}$. For perfect coupling $(k_C = 1)$ and zero winding resistance $(r_P = r_S = 0)$, the transfer function in (8) reduces to the turns ratio, $H(s) = \pm M/L_P = \pm \sqrt{L_S/L_P} = \pm n$, where *n* is the ratio of secondary turns to primary turns. Rearranging (8) gives

$$H(s) = \pm n \cdot k_C \frac{s}{s^2 \left(\frac{(1-k_C^2)L_S}{R}\right) + s \left(1 + \frac{r_S}{R} + \frac{r_P}{R} \frac{L_S}{L_P}\right) + \frac{r_P}{L_P} \left(\frac{r_S}{R} + 1\right)}. \tag{9}$$

Every transformer has losses in the form of winding resistance, or less-than-ideal coupling, and (9) reveals that such a nonideal transformer results in a bandpass system [28]. Moreover, (9) approaches a high-pass filter with $k_C = 1$, and approaches a low-pass filter with $r_P = r_S = 0$. The lower and upper cutoff frequencies $f_L$ and $f_H$ can thus be estimated by approximating the bandpass filter as the product of these two filters, subject to one restriction: the two cutoff frequencies must be far enough apart that the passband gain does not drop. With this requirement satisfied, the analysis identifies the two most important parameters for determining the frequency selectivity of a coupled-resonator bandpass filter, i.e., the winding resistances and the coupling strength in the transformer.
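The bandpass behavior of (8) can be seen by evaluating it numerically. The following Python sketch computes $|H(j2\pi f)|$ directly from (8). All component values (inductances, winding resistances, coupling coefficient, and load) are hypothetical numbers chosen only to give a turns ratio of 6 and a visible passband; they are not taken from the prototype.

```python
import math


def transformer_gain(f_hz, l_p, l_s, k_c, r_p, r_s, r_load):
    """|H(j*2*pi*f)| of the T-model transformer, evaluated from (8) with the + sign."""
    s = 1j * 2.0 * math.pi * f_hz
    m = k_c * math.sqrt(l_p * l_s)  # mutual inductance M = k_C * sqrt(L_P * L_S)
    num = s * m * r_load
    den = (
        s ** 2 * (l_p * l_s - m ** 2)
        + s * (r_p * l_s + r_s * l_p + r_load * l_p)
        + r_p * (r_s + r_load)
    )
    return abs(num / den)


# Hypothetical values: n = sqrt(L_S / L_P) = 6.
L_P, L_S = 0.5e-9, 18e-9       # H
R_P, R_S = 2.0, 20.0           # ohm, winding resistances
K_C, R_LOAD = 0.8, 1000.0      # coupling coefficient, resistive load in ohm

for f in (1e6, 1e8, 1e9, 1e10, 1e11, 1e12):
    gain = transformer_gain(f, L_P, L_S, K_C, R_P, R_S, R_LOAD)
    print(f"f = {f:8.1e} Hz  |H| = {gain:.4f}")

# Ideal limit: perfect coupling and lossless windings give the turns ratio.
print(transformer_gain(3e9, L_P, L_S, 1.0, 0.0, 0.0, R_LOAD))
```

With these assumed values the gain rolls off at both ends of the sweep: the winding resistance sets the low-frequency corner and the imperfect coupling sets the high-frequency corner, matching the high-pass and low-pass limits discussed above.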

## C. Jitter Transfer Function


To investigate the jitter-filtering characteristic provided by the transformer-based harmonic extraction technique, consider the following time-domain expression of a jittery clock waveform at the primary side of the transformer, at frequency $f_P$ (with amplitude $A_P$), which contains a sinusoidal jitter component with phase noise amplitude $\beta_P$ and phase modulation frequency $f_m$

$$V_P(t) = A_P \cos(2\pi f_P t + \beta_P \sin 2\pi f_m t). \tag{10}$$



Fig. 8. Frequency response of a jittery clock signal after bandpass filtering.

For small  $\beta_P$  values, (10) can be expressed in the frequency domain as follows:

$$S_P(f) = \frac{A_P}{2}\delta(f - f_P) - \frac{\beta_P A_P}{4}[\delta(f - f_L) - \delta(f - f_H)] \tag{11}$$

where $f_L = f_P - f_m$ and $f_H = f_P + f_m$. The frequency response of the received clock at the secondary side of the transformer is obtained by multiplying the frequency-domain expression $S_P(f)$ by the signal transfer function of the transformer. The frequency-domain expression of the received clock signal $S_S(f)$ can be written as

$$\begin{aligned} S_S(f) &= \frac{H(f_P)A_P}{2}\delta(f - f_P) \\ &\quad -\frac{\beta_P A_P}{4}[H(f_L)\delta(f - f_L) - H(f_H)\delta(f - f_H)] \end{aligned} \tag{12}$$

where $H(f_P)$, $H(f_L)$, and $H(f_H)$ are the values of the transformer's signal transfer function at $f_P$, $f_L$, and $f_H$, respectively. We may also express the time-domain received clock with jitter $V_S(t)$ at the secondary side as follows:

$$V_S(t) = A_S \cos(2\pi f_P t + \beta_S \sin 2\pi f_m t) \tag{13}$$

where $A_S$ is the received clock amplitude and $\beta_S$ is the received phase noise amplitude at the secondary side. The frequency-domain expression of the received clock signal at the secondary side $S'_S(f)$ is given by the Fourier transform of $V_S(t)$

$$S_{S}'(f) = \frac{A_{S}}{2}\delta(f - f_{P}) - \frac{\beta_{S}A_{S}}{4}[\delta(f - f_{L}) - \delta(f - f_{H})]. \tag{14}$$

Equating (12) and (14), and assuming small amplitude phase noise

$$A_S \approx H(f_P)A_P \tag{15}$$

$$\beta_S \approx \frac{H(f_L) + H(f_H)}{H(f_P)} \beta_P. \tag{16}$$

The jitter transfer function can be defined as the ratio of the output to the input phase noise magnitude as follows [29], [30]:

$$\text{JTF}_{\text{XFMR}} \equiv \frac{\beta_S}{\beta_P} = \frac{H(f_L) + H(f_H)}{H(f_P)}. \tag{17}$$

For typical bandpass filtering, $|H(f_L)| \approx |H(f_H)| < |H(f_P)|$ and $|H(f_P)| > 1$. Thus, $|\text{JTF}_{\text{XFMR}}(j2\pi f_m)| \ll 1$ and the jitter of the received clock at the secondary side of the transformer is filtered by bandpass filtering. The jitter-filtering process of the clock signal is illustrated in Fig. 8: a frequency-modulated clock with lowered sidebands is obtained.
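For reference, the small-$\beta_P$ step from (10) to (11) can be written out explicitly. Using $\cos(x + \epsilon) \approx \cos x - \epsilon \sin x$ for small $\epsilon$, and the identity $\sin a \sin b = \tfrac{1}{2}[\cos(a - b) - \cos(a + b)]$,

$$\begin{aligned} V_P(t) &\approx A_P \cos(2\pi f_P t) - \beta_P A_P \sin(2\pi f_m t)\sin(2\pi f_P t) \\ &= A_P \cos(2\pi f_P t) - \frac{\beta_P A_P}{2}\left[\cos(2\pi f_L t) - \cos(2\pi f_H t)\right] \end{aligned}$$

with $f_L = f_P - f_m$ and $f_H = f_P + f_m$. Keeping only the positive-frequency half of each cosine, so that a cosine of amplitude $X$ contributes $(X/2)\,\delta(f - \cdot)$, recovers the carrier term $A_P/2$ and the two sideband terms $\mp \beta_P A_P/4$ of (11).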

## D. 3-D Stacked Transformer

Fig. 9(a) shows the proposed 1-to-6 3-D stacked transformer. The transformer occupies an area of only  $60 \times 60 \ \mu m^2$ , and the secondary turns are offset with respect to the primary turns to minimize the capacitive coupling arising from the fringing electric fields [31], as shown in the 3-D view of Fig. 9(a). The secondary coil is thus routed with increasing diameters, by adding a line-spacing-wide offset in each following turn in the two metal layer groups enclosed by two circular truncated cones, which run from  $M_4$  to  $M_2$  and from  $M_6$  to  $M_8$ , respectively. Fig. 9(b) shows the simulated  $S_{11}$  and  $Z_{in}$  characteristics, and Fig. 9(c) shows the voltage-conversion ratio. The operating frequency of the SCROs is varied by sweeping the supply voltage  $V_{DD}$  in the experiments, and the proposed LC tank at the primary side is tuned to act like an open circuit at all possible third-harmonic frequencies to output the oscillation waves. The  $S_{11}$  and  $Z_{in}$  curves indicate the frequency range in which signals are reflected and a high tank impedance is obtained at the primary coil. The third-harmonic output voltage swing is determined by the product of the third-harmonic current at one of the delay stages and the tank impedance. Because the third-harmonic voltage swing can be smaller than one-tenth of the supply voltage [24], the primary is followed by a secondary coil to form a bandpass filter with a certain passband gain that enlarges the signal swing. The transformer provides additional noise and jitter filtering when it is tuned to amplify only the spectral region containing the majority of the signal energy from the primary side. The bandpass filter's center frequency is tuned to the third harmonic, which is in the GHz range suitable for modern VLSI chips. The frequency selection behavior of the transformer can be seen from the voltage-amplification response curve in Fig. 9(c). 
The voltage-conversion ratio in Fig. 9(c) varies with the input frequency. At 5 GHz, a maximum conversion ratio of 3.55 is obtained. The lower and upper cutoff frequencies  $f_L$  and  $f_H$  are near 2.5 and 6.5 GHz, respectively, and the 3-dB bandwidth of the filter is 4 GHz. A final observation is that the voltage-scaling ratio from the secondary to the primary windings is below 0.19, or -14.42 dB, over all simulated frequencies. With such a small voltage-scaling ratio from the secondary to the primary side, signals on the secondary can hardly disturb the SCRO active circuits through magnetic coupling. Note that the ratio of 0.19 is not the reciprocal of the peak gain of 3.55, because the parasitic resistances of the turns in the 3-D stacked transformer are connected in series and form a voltage divider. The outermost turn is the longest routing with the highest resistance in the secondary coil.

Fig. 9. Transformer characteristics (a) 3-D stacked transformer, (b) simulated responses  $S_{11}$  and  $Z_{in}$, and (c) voltage gain of the transformer.

When a signal is applied to one of the two terminals of the secondary coil, most of the signal is dropped across the top  $M_8$  or bottom  $M_2$  metal layer routing [see Fig. 9(a)] and is not coupled to the primary coil. As a result, the conversion ratio from the secondary to the primary is lower than expected.
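The ratios quoted for the transformer can be converted to decibels with a few lines of arithmetic. The sketch below uses only the numbers stated in the text (peak ratio 3.55, reverse ratio 0.19, cutoffs near 2.5 and 6.5 GHz); the comparison against a simple reciprocal is an illustrative assumption, not a model of the actual transformer.

```python
import math


def ratio_to_db(voltage_ratio):
    """Convert a voltage ratio to decibels (20 * log10)."""
    return 20.0 * math.log10(voltage_ratio)


PEAK_FORWARD_RATIO = 3.55         # primary to secondary, at 5 GHz
MAX_REVERSE_RATIO = 0.19          # secondary to primary, upper bound
F_LOW_GHZ, F_HIGH_GHZ = 2.5, 6.5  # approximate cutoff frequencies

forward_db = ratio_to_db(PEAK_FORWARD_RATIO)
reverse_db = ratio_to_db(MAX_REVERSE_RATIO)
# Illustrative assumption: the value a plain reciprocal of the peak gain would give.
reciprocal_ratio = 1.0 / PEAK_FORWARD_RATIO
bandwidth_ghz = F_HIGH_GHZ - F_LOW_GHZ

print(f"forward gain  : {forward_db:.2f} dB")
print(f"reverse ratio : {reverse_db:.2f} dB")
print(f"1 / peak gain : {reciprocal_ratio:.3f} (reverse ratio is {MAX_REVERSE_RATIO})")
print(f"3-dB bandwidth: {bandwidth_ghz:.1f} GHz")
```

The reverse ratio of 0.19 is smaller than the plain reciprocal of the peak gain, consistent with the series-resistance voltage divider described in the text.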

## E. Ring Oscillator Design

In a circuit with a differential structure, power-supply noise is usually suppressed because it appears as common-mode noise. The circuit diagram in Fig. 10(a) is the differential version of the circuit shown in Fig. 6, which couples two three-stage ring oscillators through inverter latches. Two transformers are attached to the circuit to extract the differential  $3\omega_0$  signal. In Fig. 10(b), a multiple-input inverter gate-level symbol represents the transistor-level implementation of the delay stage in the proposed oscillator structure, which adds coupling inputs and integrates the latch to enable differential signaling. The circuit diagram in Fig. 10(b) then illustrates a closed-loop connection of three delay stages that provides a phase shift of 180° for the ring to oscillate.

Fig. 10. (a) Coupled three-stage ring oscillator and (b) schematic of three-stage differential ring oscillator.

Fig. 11. Differential ring oscillator delay stage.

The complementary nMOS/pMOS differential circuit with multiple inputs in Fig. 11 implements the transistor-level delay stage, and  $3\omega_0+$  and  $3\omega_0-$  are the nodes connected to the two transformers. The delay stage receives the outputs from the previous stage of its own ring and of the neighboring ring. The transistor pairs  $M_{1-2}$  and  $M_{3-4}$ , which serve as the ring and coupling inputs, respectively, have very similar aspect (W/L) ratios, since the larger the coupling factor, the wider the locking range and the wider the bandwidth of phase noise and jitter improvement in the *M* coupled oscillators [19].  $M_{7-8}$  are cross-coupled to maintain a 180° phase shift between the two sides of the differential circuit, and to form a CMOS latch with  $M_{5-6}$ .  $M_{5-6}$  maintain the previous state at the outputs against supply and substrate disturbances [32], so that the stage self-regulates its output against variations. Another advantage of including the latch  $M_{5-8}$  in the delay stage is that its positive feedback helps the outputs switch faster. A faster transition time is important for reducing phase noise and jitter [33], because noise from the active devices is injected into the LC tank over the short window of time when both pMOS and nMOS conduct.

Fig. 12. Frequency spectrum of  $3\omega_0$  clock waveform.

Fig. 13. (a) Frequency variation versus process variation and mismatch using the TSMC Monte Carlo transistor model and (b) SCRO skew performance.

Moreover, the ring oscillators in the SCROs can be converted into voltage-controlled oscillators (VCOs) by employing a digitally controlled capacitor array or varactors to enable frequency tuning. For instance, an MIM capacitor array distributed on the clock network, or varactors inserted between the differential pairs in the ring oscillator delay stages, could be used with the SCROs.

## F. Simulated Results

The output waveform of the extracted third harmonic in the SCROs is expected to be nearly sinusoidal, with spectral peaks at  $3\omega_0$ ,  $6\omega_0$ , and  $9\omega_0$ . Fig. 12 gives the simulated spectrum of the third-harmonic clock at 4.18 GHz.



Fig. 14. Die photograph.

The spectrum also confirms the cancellation and filtering of the first and second harmonics of the fundamental  $\omega_0$  signal at 1.39 GHz.
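One way to see why only multiples of  $3\omega_0$  are expected in the extracted clock is to sum three copies of a waveform spaced 120° apart. The sketch below assumes that the cancellation arises from combining the three equally spaced phases of a three-stage ring; the text does not spell out the summation, so this is an illustration rather than the authors' derivation.

```python
import cmath
import math


def combined_harmonic(n, stages=3):
    """Magnitude of the n-th harmonic after summing equally phase-shifted copies."""
    total = sum(cmath.exp(1j * n * 2 * math.pi * k / stages)
                for k in range(stages))
    return abs(total)


# Harmonics of the fundamental that survive the three-phase summation.
surviving = [n for n in range(1, 10) if combined_harmonic(n) > 1e-9]
print(surviving)  # [3, 6, 9]

# Fundamental implied by the simulated 4.18-GHz third-harmonic clock.
fundamental_ghz = 4.18 / 3
print(f"{fundamental_ghz:.2f} GHz")
```

Under this assumption the first and second harmonics cancel while the third, sixth, and ninth add in phase, which matches the peaks listed for Fig. 12 and the 1.39-GHz fundamental.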

Skew is another source of timing uncertainty in a clock distribution network. It arises mainly from mismatches, which can only be adequately quantified by measuring a considerable number of dies. Monte Carlo simulations of the SCRO performance against process variation and mismatch are shown in Fig. 13. Fig. 13(a) shows the Gaussian distribution of the SCRO operating frequency in a Monte Carlo analysis with 500 samples. The mean  $\mu$  is 4.31 GHz, and the standard deviation  $s_d$  is 269.9 MHz. The skews between the output waveform of the first SCRO [labeled #1 in Fig. 4(a)] and the two most distant SCROs [labeled #5 and #9 in Fig. 4(a)] are shown in Fig. 13(b), and the Monte Carlo simulation supports the claim of low skew between different oscillators.
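The Monte Carlo statistics can be put in perspective with a short calculation. This sketch assumes that the reported distribution refers to the third-harmonic output frequency and that a ±3σ interval is a reasonable summary of the Gaussian spread; both are assumptions made for illustration. The passband limits are the cutoff frequencies quoted for Fig. 9(c).

```python
MEAN_GHZ = 4.31              # Monte Carlo mean (from the text)
STD_GHZ = 0.2699             # Monte Carlo standard deviation (from the text)
PASSBAND_GHZ = (2.5, 6.5)    # transformer cutoff frequencies quoted for Fig. 9(c)

relative_spread = STD_GHZ / MEAN_GHZ
three_sigma = (MEAN_GHZ - 3 * STD_GHZ, MEAN_GHZ + 3 * STD_GHZ)
inside_passband = (PASSBAND_GHZ[0] <= three_sigma[0]
                   and three_sigma[1] <= PASSBAND_GHZ[1])

print(f"sigma / mean   : {relative_spread:.1%}")
print(f"3-sigma range  : {three_sigma[0]:.2f} to {three_sigma[1]:.2f} GHz")
print(f"inside passband: {inside_passband}")
```

Under these assumptions the spread is roughly 6% of the mean, and the ±3σ range stays between the two cutoff frequencies of the transformer.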

## G. Considerations of Practical Implementation

Energy recycling enabled by LC circuitry in the few-GHz range is a promising trend in low-power microprocessor design [34]–[36]. The proposed SCROs operate in a similar frequency range, and the following considerations provide guidance for a possible microprocessor implementation with SCROs. As indicated by the Monte Carlo simulations, the SCRO operating frequency varies across process corners. The operating frequency of a clock network in processor circuits also varies with temperature fluctuations. A PVT calibration technique [37] and phase-locked loop-based control loops [38], [39] can be implemented with SCROs to enable a PVT-variation-tolerant clock distribution. Accordingly, the SCROs need to be converted into VCOs by employing a digitally controlled capacitor array or varactors to enable frequency tuning from a constant supply voltage. If there is a separate supply voltage for the SCROs, a delay-locked loop-based control loop can be adopted as well. However, signals crossing from one voltage domain to another have to be interfaced through level shifters. Level shifters are known to be a jitter source in clock buffering and data level shifting, and proper care should be taken in designing the voltage-domain crossing. For instance, supply regulation that minimizes  $V_{DD}$  fluctuation and ground bounce using supply regulators and bypass capacitances is essential to establish a steady reference voltage and a fixed trigger threshold in the clock/data receiving circuitry.

Fig. 15. Measurement results (a) clock frequency and dc current consumption versus supply voltage and (b) output waveform.

## IV. EXPERIMENTS

Fig. 14 shows the die photograph of the prototype chip built in a TSMC 90-nm CMOS 1P9M standard process. The test chip features a Sierpinski triangle clock distribution network and a Sierpinski space-filling clock tree. The topology of an SCRO is an equilateral triangle with a side length of 800  $\mu$m, for distributing a 0.95–1.43 GHz fundamental frequency clock and a 2.85–4.3 GHz third-harmonic clock, and the side length of the Sierpinski clock grid is 3200  $\mu$m. The total length of the clock grid is 9.6 mm, but the chip area is only 1.45 × 2.1 mm<sup>2</sup>, because the interconnects between delay stages are all routed in a meander-line style to save chip area. This clocking scheme exploits subharmonic injection locking at  $\omega_0$  and obtains a single-phase harmonic clock at  $3\omega_0$. As can be seen in Fig. 14, there are three differential transformer sets at the top of the test chip, and six are placed at the bottom. The placement of the transformers is not symmetric, because the width of the test chip is already pad limited, but the height can be reduced by placing them compactly to save chip area. Also, each differential transformer is confined within a guard ring to make sure that the parasitic effects around the transformer are similar at different locations on the chip, as shown in the die photograph. The output waveform of each differential transformer set is probed by using a ground-signal-ground (G-S-G) RF probing pad to avoid a pad-limited design. Also, a G-S-G-S-G differential RF probing pad on the left-hand side of the chip is able to initiate the startup of the oscillator array.

Fig. 16. (a) Output open-drain buffer and (b) noise generation and monitoring circuits.

Fig. 17. (a) Measured waveform of power-supply noise. (b) Measured jitter as a function of injected power-supply noise.
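The layout and frequency figures quoted for the prototype follow from simple arithmetic, sketched below with the values stated in the text. The variable names are illustrative only.

```python
SCRO_SIDE_UM = 800                 # side length of one SCRO triangle
GRID_SIDE_UM = 3200                # side length of the Sierpinski clock grid
FUNDAMENTAL_GHZ = (0.95, 1.43)     # fundamental clock range
CHIP_MM = (1.45, 2.1)              # chip dimensions

scro_sides_per_grid_side = GRID_SIDE_UM // SCRO_SIDE_UM   # SCRO sides along one grid side
grid_length_mm = 3 * GRID_SIDE_UM / 1000                  # three sides of the grid triangle
third_harmonic_ghz = tuple(3 * f for f in FUNDAMENTAL_GHZ)
chip_area_mm2 = CHIP_MM[0] * CHIP_MM[1]

print(scro_sides_per_grid_side)
print(f"{grid_length_mm:.1f} mm")
print(f"{third_harmonic_ghz[0]:.2f} to {third_harmonic_ghz[1]:.2f} GHz")
print(f"{chip_area_mm2:.3f} mm^2")
```

The three grid sides give the 9.6-mm total length, and tripling the fundamental range gives approximately the 2.85–4.3 GHz harmonic clock range.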

Measurements of the actual performance versus  $V_{\text{DD}}$  are shown in Fig. 15(a). The oscillator is functional down to a 0.7-V supply voltage, where it draws a supply current of 27.8 mA, resulting in a total power consumption of only 19.4 mW. The red and blue lines are the clock frequency and the measured  $I_{\text{DD}}$ , respectively, and both vary almost linearly with  $V_{\text{DD}}$ . The oscillation frequency of a ring oscillator is inversely proportional to the *RC* delay, and hence,  $I_{\text{DD}} \cong C \times V_{\text{pp}} \times f_{\text{osc}}$ , where  $V_{\text{pp}}$  is the voltage swing at the output and is equal to  $V_{\text{DD}}$  in a simple ring oscillator. Since  $f_{\text{osc}}$  varies linearly with  $V_{\text{DD}}$  in the experiments,  $I_{\text{DD}}$  would be expected to increase quadratically with  $V_{\text{DD}}$  from the above equation. However,  $V_{\text{pp}}$  in a multiply-by-3 SCRO is the  $3\omega_0$  output swing, which is smaller than  $V_{\text{DD}}$ . In addition, the current recycled in the *LC* tank can help charge the interconnection nodes within the SCROs. Given these two effects, the nearly linear measured  $I_{\text{DD}}$  is plausible and agrees with expectation.
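The power figure and the quadratic expectation for a simple ring oscillator can be reproduced as follows. This is a minimal sketch: the supply voltage and current are the measured values from the text, while the capacitance `C_EFF` and the frequency slope `K_HZ_PER_V` are hypothetical numbers used only to show the scaling.

```python
V_DD = 0.7        # V, lowest functional supply (from the text)
I_DD = 27.8e-3    # A, measured supply current at 0.7 V (from the text)

power_mw = V_DD * I_DD * 1e3   # about 19.46 mW, reported as 19.4 mW


def simple_ring_current(c_farads, v_dd, k_hz_per_volt):
    """I_DD = C * V_pp * f_osc with V_pp = V_DD and f_osc = k * V_DD."""
    f_osc = k_hz_per_volt * v_dd
    return c_farads * v_dd * f_osc


# Hypothetical values, chosen only to illustrate the V_DD**2 dependence.
C_EFF, K_HZ_PER_V = 10e-12, 1.4e9
ratio = (simple_ring_current(C_EFF, 1.0, K_HZ_PER_V)
         / simple_ring_current(C_EFF, 0.7, K_HZ_PER_V))

print(f"power at 0.7 V: {power_mw:.2f} mW")
print(f"I_DD(1.0 V) / I_DD(0.7 V) for a simple ring: {ratio:.2f}")
```

For a simple ring oscillator the current ratio between 1.0 V and 0.7 V equals $(1/0.7)^2$, independent of the hypothetical constants, which is the quadratic behavior that the measured SCRO current does not follow.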

The measured harmonic clock frequency varies from 2.83 GHz at a 0.7-V supply to 4.3 GHz at a 1-V supply. The measured harmonic clock frequencies are all in the passband of the proposed transformer-based bandpass filter, as shown in Fig. 9(c). Fig. 15(b) shows the waveform at one SCRO, triggered by another, measured on a Keysight 86100C wide-bandwidth sampling oscilloscope. It shows a 176-mV sinusoidal swing at a frequency of 2.83 GHz, without using any built-in averaging function of the sampling oscilloscope to smooth the observed waveform. The jitter is measured as 1.25 ps (rms) and 15.56 ps (pp).

## A. Jitter

Fig. 16(a) shows the open-drain buffer, which converts the differential signal to a single-ended one, such that the output waveform of each transformer set can be measured by probing at a G-S-G RF pad. The noise generation and monitoring circuits in Fig. 16(b) are used to disturb the supply voltage  $V_{DD}$  of the SCROs and to view the injected noise signal at the same time. The integrated circuit package resonant frequency is often near one-tenth of the clock frequency in recent GHz-rate VLSI chips, and the excitation at this frequency contributes the most supply noise [11], [15]. In the experiment, injected noise at 283 MHz is used to mimic the supply noise. It is induced by turning ON and OFF the two large (~1000  $\mu$m wide) and differently sized MOS switches  $M_{1-2}$  to short-circuit the on-chip supply rails at frequencies near 283 MHz. The supply noise is injected at both the on-chip supply pin and the gate of  $M_3$  in Fig. 16(b), and a bias tee is used to bias the noise monitor  $M_3$ , which is an open-drain buffer. The buffered noise at the drain of  $M_3$  is observed at the RF output of the bias tee with a real-time oscilloscope. Fig. 17(a) shows the measured power-supply noise. The measured jitter in Fig. 17(b) is based on the jitter histogram, which is statistically generated by the sampling oscilloscope. The precision time-base module equipped with the oscilloscope is a Keysight 86116A, which has a 63-GHz bandwidth. In Fig. 17(b), the measured jitter is plotted as a function of supply noise, and shows a flat, slowly increasing response to impulsive perturbations; for instance, jitter (rms) is below 3.4 ps, and jitter (pp) is below 17.7 ps under 300-mV injected supply noise. The plot is measured under a bias voltage of 0.7 V. In contrast to previous work [11], the test chip is biased without the need to choose an optimum bias. The measured rms jitter is comparable to that of another LC-tank-based resonant oscillator [11], because the SCRO is basically a ring-type oscillator with the proposed harmonic extraction technique. The better peak-to-peak jitter performance under 300-mV injected supply noise, as shown in Fig. 17(b) and Table I, can be attributed to the fact that the ring oscillators in the SCRO coupling structure are mutually injection locked at every corresponding delay stage to oscillate at a common frequency (or period).

TABLE I Summary of the Proposed Sierpinski Clock Tree and Comparison With Previous Works

| Ref. | This Work | 2003 [4] | 2007 [6] | 2008 [7] | 2006 [11] | 2015 [40] |
|------|-----------|----------|----------|----------|-----------|-----------|
| Technology | 90nm | 0.18µm | 0.18µm | 0.18µm | 0.18µm | 0.13µm |
| Frequency (GHz) | 2.83-4.3<br>(By V<sub>DD</sub>) | 10 | 12 | 9.5 | 1.1-1.6<br>(By C<sub>tune</sub>) | 1.7-2.0<br>(By C<sub>Array</sub>) |
| Area (mm<sup>2</sup>) | 1.45×2.1 | 1.4×3 | 1×1 | 2.2×2.2 | ~2.4×2.4 | 4.4×2.25 |
| Length (µm) | 3200×3 | 3000×6 | 1000×4 | 1320×12 | N/A | N/A |
| Skew (ps) | 10 | 0.6-2.6 | 8 | 6 | N/A | N/A |
| Jitter (rms) (ps) | 1.25 | 2.3 | 0.83 | 0.77 | 0.51 | 0.0668 |
| Jitter (pp) (ps) | 15.56 | N/A | 4.89 | 4.3 | 34 | N/A |
| <b>Jitter (rms) (ps)</b><br>(With 300mV added noise) | 3.4 | N/A | N/A | N/A | 3.1 | N/A |
| <b>Jitter (pp) (ps)</b><br>(With 300mV added noise) | 17.7 | N/A | N/A | N/A | 42 | N/A |
| V<sub>DD</sub> (V) | 0.7 | 1.8 | 0.9 | 1.8 | 1.8 | N/A |
| DC power (mW) | 19.4 | 378 | 80 | ~329 | 50.4 | 30.2 |
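The jitter numbers with injected noise can be expressed as a fraction of the clock period. The sketch below assumes the 2.83-GHz harmonic clock measured at 0.7 V, since the jitter plot was taken at that bias; the noise frequency and jitter values are those reported in the text and in Table I.

```python
CLOCK_GHZ = 2.83        # harmonic clock at 0.7 V (assumed operating point)
NOISE_MHZ = 283.0       # injected supply-noise frequency
JITTER_RMS_PS = 3.4     # with 300-mV injected noise
JITTER_PP_PS = 17.7     # with 300-mV injected noise

period_ps = 1e3 / CLOCK_GHZ                      # clock period in picoseconds
noise_to_clock = NOISE_MHZ / (CLOCK_GHZ * 1e3)   # noise frequency relative to the clock
rms_fraction = JITTER_RMS_PS / period_ps
pp_fraction = JITTER_PP_PS / period_ps

print(f"period        : {period_ps:.1f} ps")
print(f"noise / clock : {noise_to_clock:.2f}")
print(f"rms jitter    : {rms_fraction:.2%} of the period")
print(f"pp jitter     : {pp_fraction:.2%} of the period")
```

At this operating point the injected noise sits at one-tenth of the clock frequency, the rms jitter is just under 1% of the period, and the peak-to-peak jitter is about 5% of the period.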

## B. Skew

To confirm the clock skew, six harmonic clock waves are probed at the bottom SCROs one by one and superimposed on the display using the waveform persistence feature of the sampling oscilloscope [5]–[7]. The clock skew measurement is similar in concept to a data eye diagram measurement on a sampling scope, since both need a trigger source for the scope, and the sampled waveforms persist on the screen. Any of the clock waves at the top three SCROs can be used as the trigger source to sample the outputs of the six bottom SCROs on the test chip. The observed clock skew is 10 ps, as shown in Fig. 18, which is comparable to but slightly larger than other measured results. Table I summarizes the comparison with previous works. The skew of 10 ps is believed to be due to the delay between the ring and the coupling input transitions in the mutually coupled delay stage of a 9-SCRO. The delay originates from the extended routing in the test chip layout, which conveniently connects the coupling inputs with each other's previous-stage outputs in the adjacent delay stages. The SCROs are closely placed, and therefore, the delay is small, but it could limit the best achievable skew to some degree.

Fig. 18. Measured skew by overlapping output clock waves of bottom six SCROs.

## C. Practical Testing Consideration

Since the SCRO operating frequency could change from die to die due to process variations, testing becomes an important consideration in an actual microprocessor implementation. A demultiplexer (DEMUX) and multiplexer (MUX) can be adopted in the proposed clock network design with processor circuits so that fewer pads need to be probed, reducing testing cost. The clock distribution design of an SoC implementation for low-cost IC testing can further be realized with the built-in self-test (BIST) technique [40]. For example, the BIST circuit in [40] takes a chip area of only 0.025 mm<sup>2</sup> in a 130-nm CMOS process and allows testing of the integrity of the clock distribution system.

## V. CONCLUSION

In this paper, we propose the SCROs and the Sierpinski space-filling clock tree. Both exploit fractal properties to establish a well-defined structure and facilitate future scaling in a network design. The SCRO utilizes the injection-locking technique to reduce the clock skew, and provides an aligned phase relationship. The clock tree with the harmonic extraction technique provides a unified single-phase global clock, and minimizes the clock jitter through the built-in bandpass filtering function of the 3-D stacked transformer. The topologies and techniques improve ring-oscillator-based clock distribution with low timing uncertainties, and remain compatible with current practices. The harmonic clock frequency is 2.85–4.3 GHz under 0.7–1 V supply voltages. The power consumption is 19.4 mW at a supply voltage of 0.7 V, which indicates a significant power saving from resonant clocking with frequency multiplication. The measured jitter shows strong resistance to supply noise of up to 300 mV: with 300-mV added noise, the measured jitter is 3.4 ps (rms) and 17.7 ps (pp).

## ACKNOWLEDGMENT

The authors would like to thank the National Chip Implementation Center and the National Applied Research Laboratories for the chip fabrication and measurement on this work. They would also like to thank Prof. Y.-J. Hsu, K.-M. Feng, S.-Y. Huang, and T.-C. Wang at National Tsing Hua University for their kind assistance.

## REFERENCES

- [1] P. J. Restle *et al.*, "A clock distribution network for microprocessors," *IEEE J. Solid-State Circuits*, vol. 36, no. 5, pp. 792–799, May 2001.
- [2] V. Gutnik and A. P. Chandrakasan, "Active GHz clock network using distributed PLLs," *IEEE J. Solid-State Circuits*, vol. 35, no. 11, pp. 1553–1560, Nov. 2000.
- [3] T. S. Sandhu and K. El-Sankary, "A mismatch-insensitive skew compensation architecture for clock synchronization in 3-D ICs," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 24, no. 6, pp. 2026–2039, Jun. 2016.
- [4] F. O'Mahony, C. P. Yue, M. A. Horowitz, and S. S. Wong, "A 10-GHz global clock distribution using coupled standing-wave oscillators," *IEEE J. Solid-State Circuits*, vol. 38, no. 11, pp. 1813–1820, Nov. 2003.
- [5] M. Sasaki, "A high-frequency clock distribution network using inductively loaded standing-wave oscillators," *IEEE J. Solid-State Circuits*, vol. 44, no. 10, pp. 2800–2807, Oct. 2009.
- [6] M. Sasaki, M. Shiozaki, A. Mori, A. Iwata, and H. Ikeda, "12 GHz low-area-overhead standing-wave clock distribution with inductively-loaded and coupled technique," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2007, pp. 180–595.
- [7] M. Sasaki, "A 9.5 GHz 6 ps-skew space-filling-curve clock distribution with 1.8 V full-swing standing-wave oscillators," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2008, pp. 518–633.

- [8] J. Wood, T. C. Edwards, and S. Lipa, "Rotary traveling-wave oscillator arrays: A new clock technology," *IEEE J. Solid-State Circuits*, vol. 36, no. 11, pp. 1654–1665, Nov. 2001.
- [9] Y. Teng and B. Taskin, "ROA-brick topology for low-skew rotary resonant clock network design," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 23, no. 11, pp. 2519–2530, Nov. 2015.
- [10] K. Takinami, R. Walsworth, S. Osman, and S. Beccue, "Phase-noise analysis in rotary traveling-wave oscillators using simple physical model," *IEEE Trans. Microw. Theory Techn.*, vol. 58, no. 6, pp. 1465–1474, Jun. 2010.
- [11] S. C. Chan, K. L. Shepard, and P. J. Restle, "Distributed differential oscillators for global clock networks," *IEEE J. Solid-State Circuits*, vol. 41, no. 9, pp. 2083–2094, Sep. 2006.
- [12] L. Hall, M. Clements, W. Liu, and G. Bilbro, "Clock distribution using cooperative ring oscillators," in *Proc. 17th Conf. Adv. Res. (VLSI)*, Sep. 1997, pp. 62–75.
- [13] I. S. Kourtev and E. G. Friedman, "Clock skew scheduling for improved reliability via quadratic programming," in *IEEE/ACM Int. Conf. Comput.-Aided Des. Dig. Tech. Papers*, Nov. 1999, pp. 239–243.
- [14] B. Taskin and I. S. Kourtev, "Delay insertion method in clock skew scheduling," *IEEE Trans. Comput.-Aided Des. Integr.*, vol. 25, no. 4, pp. 651–663, Apr. 2006.
- [15] S. C. Chan, K. L. Shepard, and P. J. Restle, "Uniform-phase uniform-amplitude resonant-load global clock distributions," *IEEE J. Solid-State Circuits*, vol. 40, no. 1, pp. 102–109, Jan. 2005.
- [16] A. J. Drake, K. J. Nowka, T. Y. Nguyen, J. L. Burns, and R. B. Brown, "Resonant clocking using distributed parasitic capacitance," *IEEE J. Solid-State Circuits*, vol. 39, no. 9, pp. 1520–1528, Sep. 2004.
- [17] H.-C. Chang, X. Cao, U. K. Mishra, and R. A. York, "Phase noise in coupled oscillators: Theory and experiment," *IEEE Trans. Microw. Theory Techn.*, vol. 45, no. 5, pp. 604–615, May 1997.
- [18] D. K. Shaeffer and S. Kudszus, "Performance-optimized microstrip coupled VCOs for 40-GHz and 43-GHz OC-768 optical transmission," *IEEE J. Solid-State Circuits*, vol. 38, no. 7, pp. 1130–1138, Jul. 2003.
- [19] M. M. Abdul-Latif and E. Sanchez-Sinencio, "Low phase noise wide tuning range N-push cyclic-coupled ring oscillators," *IEEE J. Solid-State Circuits*, vol. 47, no. 6, pp. 1278–1294, Jun. 2012.
- [20] J. G. Maneatis and M. A. Horowitz, "Precise delay generation using coupled oscillators," *IEEE J. Solid-State Circuits*, vol. 28, no. 12, pp. 1273–1282, Dec. 1993.
- [21] A. Mirzaei, M. E. Heidari, R. Bagheri, S. Chehrazi, and A. A. Abidi, "The quadrature LC oscillator: A complete portrait based on injection locking," *IEEE J. Solid-State Circuits*, vol. 42, no. 9, pp. 1916–1932, Sep. 2007.
- [22] A. Mirzaei, M. E. Heidari, R. Bagheri, and A. A. Abidi, "Multiphase injection widens lock range of ring-oscillator-based frequency dividers," *IEEE J. Solid-State Circuits*, vol. 43, no. 3, pp. 656–671, Mar. 2008.
- [23] J. J. Kuffner and S. M. LaValle, "Space-filling trees: A new perspective on incremental search for motion planning," in *Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS)*, Sep. 2011, pp. 2199–2206.
- [24] S. Verma, J. Xu, and T. H. Lee, "A multiply-by-3 coupled-ring oscillator for low-power frequency synthesis," *IEEE J. Solid-State Circuits*, vol. 39, no. 4, pp. 709–713, Apr. 2004.
- [25] F. Herzel and B. Razavi, "A study of oscillator jitter due to supply and substrate noise," *IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process.*, vol. 46, no. 1, pp. 56–62, Jan. 1999.
- [26] T. M. Hollis and D. J. Comer, "Bandpass filtering of high-speed forwarded clocks," *Anal. Integr. Circuits Signal Process.*, vol. 54, pp. 171–184, Mar. 2008.
- [27] H. Song, J. Song, A. Dey, and Y. Song, "Jitter transfer function model and VLSI jitter filter circuits," in *Proc. IEEE SOCC*, Sep. 2010, pp. 48–51.
- [28] K. S. S. Kumar, *Electric Circuits and Networks*. New Delhi, India: Dorling Kindersley, 2009, pp. 667–673.
- [29] G. Balamurugan and N. Shanbhag, "Modeling and mitigation of jitter in high-speed source-synchronous interchip communication systems," in *Proc. 37th Asilomar Conf. Signals, Syst. Comput.*, vol. 2. Nov. 2003, pp. 1681–1687.
- [30] A. Ragab, Y. Liu, K. Hu, P. Chiang, and S. Palermo, "Receiver jitter tracking characteristics in high-speed source synchronous links," *J. Electr. Comput. Eng.*, vol. 2011, 2011, Art. no. 982314.

- [31] A. Zolfaghari, A. Chan, and B. Razavi, "Stacked inductors and transformers in CMOS technology," *IEEE J. Solid-State Circuits*, vol. 36, no. 4, pp. 620–628, Apr. 2001.
- [32] I.-C. Hwang, C. Kim, and S.-M. Kang, "A CMOS self-regulating VCO with low supply sensitivity," *IEEE J. Solid-State Circuits*, vol. 39, no. 1, pp. 42–48, Jan. 2004.
- [33] A. Hajimiri and T. H. Lee, "Design issues in CMOS differential LC oscillators," *IEEE J. Solid-State Circuits*, vol. 34, no. 5, pp. 717–724, May 1999.
- [34] A. T. Ishii, J. C. Kao, V. S. Sathe, and M. C. Papaefthymiou, "A resonant-clock 200 MHz ARM926EJ-S<sup>TM</sup> microcontroller," in *Proc. ESSCIRC*, 2009, pp. 356–359.
- [35] F. U. Rahman and V. S. Sathe, "Voltage-scalable frequency-independent quasi-resonant clocking implementation of a 0.7-to-1.2 V DVFS system," in *IEEE ISSCC Dig. Tech. Papers*, Dec. 2016, pp. 334–335.
- [36] P. Restle *et al.*, "Wide-frequency-range resonant clock with on-the-fly mode changing for the POWER8<sup>TM</sup> microprocessor," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2014, pp. 100–101.
- [37] S. Choi, S. Yoo, Y. Lim, and J. Choi, "A PVT-robust and low-jitter ring-VCO-based injection-locked clock multiplier with a continuous frequency-tracking loop using a replica-delay cell and a dual-edge phase detector," *IEEE J. Solid-State Circuits*, vol. 51, no. 8, pp. 1878–1889, Aug. 2016.
- [38] I. Galton, D. A. Towne, J. J. Rosenberg, and H. T. Jensen, "Clock distribution using coupled oscillators," in *Proc. IEEE Int. Symp. Circuits Syst.*, vol. 3, May 1996, pp. 217–220.
- [39] H. Mizuno and K. Ishibashi, "A noise-immune GHz-clock distribution scheme using synchronous distributed oscillators," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 1998, pp. 404–405.
- [40] Z. Bai, X. Zhou, R. D. Mason, and G. Allan, "Low-phase noise clock distribution network using rotary traveling-wave oscillators and built-in self-test phase tuning technique," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 62, no. 1, pp. 41–45, Jan. 2015.



Yi-Wei Lin (S'11) received the B.S. degree in electronic engineering from Chung Yuan Christian University, Chungli, Taiwan, in 2006, and the M.S. degree from the University of Southern California, Los Angeles, CA, USA, in 2008. He is currently pursuing the Ph.D. degree in electronics engineering with National Tsing Hua University, Hsinchu, Taiwan.

From 2013 to 2015, he participated in the NTHU NPIE Bridge Program, where his research focused on 3-D IC signaling and clock synchronization. His current research interests include low-power VLSI, data signal transmission, clock generation and distribution, and 3-D IC technology.



Shawn S. H. Hsu (M'04) was born in Tainan, Taiwan. He received the B.S. degree from National Tsing Hua University, Hsinchu, Taiwan, in 1992, and the M.S. and Ph.D. degrees from the University of Michigan, Ann Arbor, MI, USA, in 1997 and 2003, respectively.

In 2003, he joined the Electrical Engineering Department, National Tsing Hua University, Hsinchu, Taiwan, as an Assistant Professor. In August 2014, he was appointed a Distinguished Professor with National Tsing Hua University. He is involved with the design, fabrication, and modeling of high-frequency transistors and interconnects. He is also interested in heterogeneous integration using system-in-package and 3-D integrated circuit technology for high-speed wireless/optical communications. He is currently a Professor with the Institute of Electronics Engineering and the Electrical Engineering Department, National Tsing Hua University. His current research interests include the design of monolithic microwave integrated circuits and RF integrated circuits using Si/III–V-based technologies.