

# A Novel MUX-FF Circuit for Low Power and High Speed Serial Link Interfaces

Wei-Yu Tsai, Ching-Te Chiu, Jen-Ming Wu, Shuo-Hung Hsu, Yar-Sun Hsu

Department of Electrical Engineering

National Tsing Hua University

Hsinchu, Taiwan 30013, R.O.C.

Email: s9861629@m98.nthu.edu.tw

**Abstract**: In this paper, a novel multiplexer-flip-flop (MUX-FF) topology using current mode logic (CML) is presented. A CML multiplexer-latch (MUX-latch) is proposed that combines a multiplexer and the loopback storage part of a latch into a single module, so that the buffer part of the latch can be removed. A MUX-FF is implemented by cascading two stages of MUX-latches. The output of a MUX-FF is edge-triggered, so it is insensitive to input noise. All paths from the inputs to the output are symmetric. Power and area are reduced because the D-FFs are removed. Simulation results show that a MUX-FF achieves an operating frequency similar to that of a conventional tree-type MUX while saving 56 % of the area and 72 % of the power consumption.

## I. INTRODUCTION

In recent years, serialized data transmission has been widely adopted in modern communication systems and I/O devices. Multiplexers (MUXs) convert low-speed parallel data into a high-speed serial data stream. Apart from a high operating frequency, low power and small area have also become important requirements for embedded systems.

Conventionally, serialization is implemented with several stages of 2-1 MUX cells in a tree-type structure [1], which keeps the data paths nearly balanced. Fig. 1 shows the conventional 2-1 tree-type MUX schematic [2][3][4] and its timing diagram. At each stage, D-latches are inserted to hold the data temporarily so that the two input data are offset in time from each other. The sequencing D-flip-flops (D-FFs) guarantee sufficient setup and hold time so that the MUX can achieve high bandwidth.

The conventional 2-1 MUX contains a sequencing block and a 2-1 MUX. The sequencing block synchronizes both inputs with the clock and provides a half-cycle difference between the two inputs. It has two paths, one containing a single D-FF and the other containing one D-FF and a latch. The sequencing D-FFs sample the data at the rising or falling clock edges. As a result, the data is held for the MUX and is less sensitive to input noise or timing jitter. The outputs of the sequencing block (Seq.1 and Seq.2) differ by half a cycle.

Fig. 1. (a) Conventional 2-1 MUX; (b) Conventional 2-1 MUX timing diagram

Nevertheless, the sequencing block contains asymmetric data paths for Data1 and Data2, which causes timing jitter between these two output signals. Data1 and Data2 propagate through two and three latches, respectively, so there is one extra latch in the data path of Data2. The more latches a path contains, the longer the data takes to pass through it. For a multi-level $2^n$-to-1 serializing circuit, the maximum number of extra latches between two data paths is $n$.
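The latch-count argument can be checked with a short sketch. It assumes only what the text states: at every level of the tree one path passes through two latches (a D-FF) and the other through three (a D-FF plus a latch). The function name `latch_counts` is illustrative and not part of the original design.

```python
def latch_counts(n_levels: int) -> tuple[int, int]:
    """Return (fewest, most) latches on any input-to-output path
    of a 2^n-to-1 tree MUX built from conventional 2-1 cells."""
    short_path = 2  # one D-FF, i.e. two latches, per level
    long_path = 3   # one D-FF plus one extra latch per level
    return n_levels * short_path, n_levels * long_path


for n in (1, 2, 4):
    fewest, most = latch_counts(n)
    print(f"2^{n}-to-1: {fewest} to {most} latches, difference {most - fewest}")
```

The difference between the longest and shortest path grows by one latch per level, which gives the value of $n$ quoted above.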

In this paper, a combined multiplexer-flip-flop (MUX-FF) topology is presented. A MUX-FF is composed of two stages of multiplexer-latches (MUX-latches). The storage part of a latch and the MUX structure are retained, and the remaining circuitry is eliminated. Since the sequencing block is removed, the data paths can be fully balanced and better matched. The proposed MUX-FF architecture also reduces area overhead and power consumption.

The rest of the paper is organized as follows. Section II describes the background of the conventional 4-1 CML MUX. The proposed MUX-latch and MUX-FF are presented in Section III. Simulation results are shown in Section IV. A brief conclusion is given in Section V.



Fig. 2. (a) Conventional 4-1 MUX; (b) Conventional 4-1 MUX timing diagram

## II. CONVENTIONAL CML MUX

Fig. 2 shows the conventional tree-type 4-1 MUX and its timing diagram. The clock frequency of the second stage is twice that of the first stage. At each stage, the phase difference between the input data needed by the 2-1 multiplexer is half a clock period, which is a quarter of the clock period of the previous stage. Generating clock signals that differ by a quarter of a period is difficult. This is why a sequencing block is required to provide a half-cycle delay between the two inputs, instead of generating the data with two different phases in the previous stage.
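As a numeric illustration of this timing relation, the sketch below uses the 10 Gbits/s output rate from Section IV. It assumes that each 2-1 MUX emits one bit per half clock period, so the last stage is clocked at half the output bit rate; the helper `stage_clock_ghz` is a hypothetical name introduced only for this example.

```python
def stage_clock_ghz(output_rate_gbps: float, stages_before_output: int) -> float:
    """Clock frequency of a tree-MUX stage, assuming each 2-1 MUX
    emits one bit per half clock period (assumption, see lead-in)."""
    return output_rate_gbps / 2 ** (stages_before_output + 1)


second_stage_ghz = stage_clock_ghz(10.0, 0)  # last (fastest) stage
first_stage_ghz = stage_clock_ghz(10.0, 1)   # stage feeding it

# Periods in picoseconds: 1000 / f[GHz]
half_period_second_ps = 0.5 * 1000.0 / second_stage_ghz
quarter_period_first_ps = 0.25 * 1000.0 / first_stage_ghz

print(second_stage_ghz, first_stage_ghz)               # 5.0 2.5
print(half_period_second_ps, quarter_period_first_ps)  # 100.0 100.0
```

Under these assumptions, half a period of the second-stage clock and a quarter period of the first-stage clock are the same 100 ps interval, which is the offset the sequencing block has to provide.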

In GHz-range circuit design, current mode logic (CML) circuits are usually used. The main advantages of CML are its high operating speed and a constant power consumption that is independent of the operating frequency. Fig. 3 shows the MUX [4] and latch [5] circuits designed in CML topology. In this paper, the drain of the transistor for $V_{in}$ is assigned as $V_{out}$, so the latch works as a buffer rather than an inverter.

A CML latch contains two parts: a buffer at the front and a storage loop at the back, which we call the storage part. If the clock signal ($V_c$) is high, data propagates directly through the circuit from $V_{in}$ to the output $V_{out}$. When the clock goes low, current flows through the storage part, and the input has no effect on the output. Similarly, the multiplexer circuit can be separated into two buffers gated by the select signals, so that only the selected data propagates to the output. In a design containing CML latches followed by CML MUXs, many of the repeated buffer stages may therefore be unnecessary.

Fig. 3. (a) Circuit of CML latch; (b) Circuit of CML MUX

Fig. 4. (a) MUX-latch circuit; (b) MUX-latch timing diagram

## III. CML MUX-LATCH AND MUX-FF CIRCUIT

### A. MUX-LATCH

Fig. 4 shows the proposed combination of a multiplexer and a latch, called a MUX-latch, and its timing diagram. The clock $V_{c0}$ used in the storage part has the same frequency as the output data. The two select signals $V_{c1}$ and $V_{c2}$ have the same frequency as the input data but are each valid for only one-fourth of the input data period. During the first half clock period, with $V_{c1}$ valid, $Data1$ is selected and propagates to the output, and the data is held in the storage part for the next half clock period. After this first clock period, $Data2$ is selected and stored in the same way as $Data1$.



Fig. 5. (a) Function of latch; (b) Function of MUX-latch

The frequency of the output data is the same as that of the clock signal. In the new structure, the phase difference needed by the multiplexer between Data1 and Data2 is one clock period. This phase difference is easy to implement because it equals half a clock period of the previous stage. A sequencing block is therefore not needed in each stage to provide the phase difference between the input signals.

Fig. 5 compares the function of a latch and a MUX-latch. Like a latch, a MUX-latch propagates data while the clock is high and holds the data otherwise. The only difference is time sharing: successive high levels of the clock propagate Data1 and Data2 alternately.
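The MUX-latch function can be summarized in a purely behavioural sketch. It is a logic-level model only, not the CML circuit of Fig. 4. It assumes that the storage part holds the output whenever neither select signal is high and that the two selects never overlap; the class name `MuxLatch` and its method are illustrative.

```python
class MuxLatch:
    """Behavioural MUX-latch: transparent to the selected input, else holds."""

    def __init__(self, initial: int = 0) -> None:
        self.out = initial  # value kept by the storage part

    def step(self, data1: int, data2: int, sel1: bool, sel2: bool) -> int:
        if sel1 and sel2:
            raise ValueError("select signals must not overlap")
        if sel1:
            self.out = data1  # Data1 propagates to the output
        elif sel2:
            self.out = data2  # Data2 propagates to the output
        # Neither select high: the storage loop keeps the previous output.
        return self.out


latch = MuxLatch()
# One input data period in four quarter-period steps:
# select Data1, hold, select Data2, hold.
selects = [(True, False), (False, False), (False, True), (False, False)]
trace = [latch.step(1, 0, s1, s2) for s1, s2 in selects]
print(trace)  # [1, 1, 0, 0]
```

Each input appears at the output for half of the input data period: one quarter while it is selected and one quarter while it is held.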

### B. MUX-FF

Like a latch, a MUX-latch is level-triggered, so more than one stage of MUX-latches is needed to achieve an edge-triggered function. A MUX-FF, as shown in Fig. 6, is defined as the combination of two stages of MUX-latches. The first stage contains two MUX-latches (top and bottom), and the second stage has only one MUX-latch.

In the first stage, p0 and p2 are the control signals for the top MUX-latch, and p1 and p3 are those for the bottom MUX-latch. Each of p0 to p3 is valid for one-fourth of the input data period and controls the selection of one input. The four input data are propagated by the MUX-latches one by one. While the top MUX-latch is propagating a datum, the bottom MUX-latch holds its stored datum.

In the second stage, the signals phase0 and phase1 are used to select the data. Each of phase0 and phase1 is high for one-fourth of the data period of out1 and out2. As shown in Fig. 6(b), when phase1 is high and out2 propagates to the final output, the latching part of the bottom MUX-latch in the first stage is in the store-and-hold state. In this state, the output of the second-stage MUX-latch is independent of the inputs Data1 and Data3 and is edge-triggered by the rising edge of phase1. Therefore, the combination of the two stages of MUX-latches can be viewed as a MUX-FF.
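The two-stage operation can be illustrated with a behavioural sketch at the logic level. Several details are assumptions made for this example rather than taken from Fig. 6: the inputs are indexed Data0 to Data3 to match p0 to p3, with Data0 and Data2 on the top MUX-latch and Data1 and Data3 on the bottom one; each input word is held stable for the whole input data period; one input data period is divided into eight ticks; and all latches start at 0. The function names are hypothetical.

```python
def mux_latch(held, d_a, d_b, sel_a, sel_b):
    """MUX-latch output: the selected input, else the previously held value."""
    if sel_a:
        return d_a
    if sel_b:
        return d_b
    return held


def serialize(words):
    """Serialize 4-bit words with a two-stage MUX-FF model (8 ticks per word)."""
    out1 = out2 = y = 0  # latch states, assumed to start at 0
    stream = []
    for data0, data1, data2, data3 in words:
        for tick in range(8):
            slot = tick // 2  # p0..p3 are each high for two ticks
            out1 = mux_latch(out1, data0, data2, slot == 0, slot == 2)  # top
            out2 = mux_latch(out2, data1, data3, slot == 1, slot == 3)  # bottom
            # The second stage only selects a first-stage latch that is holding.
            phase0 = tick in (2, 6)  # take out1 while the top latch holds
            phase1 = tick in (0, 4)  # take out2 while the bottom latch holds
            y = mux_latch(y, out1, out2, phase0, phase1)
            if tick % 2 == 1:
                stream.append(y)  # one output bit every two ticks
    return stream


words = [(1, 0, 1, 1), (0, 1, 0, 0)]
print(serialize(words))  # [0, 1, 0, 1, 1, 0, 1, 0]
```

In this model the output is the input bit sequence delayed by one output bit. Because the second stage samples a first-stage latch only while that latch is holding, a change at the data inputs during that time cannot reach the final output, which is the edge-triggered behaviour described above.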

Fig. 6. (a) MUX-FF schematic; (b) Function of MUX-FF; (c) Timing diagram of MUX-FF

Generally, any multi-stage arrangement of MUX-latches can be combined to edge-trigger the data. The connection of MUX-latches in the MUX-FF shown in Fig. 6(a) can be applied to implement a multi-level $2^n$-to-1 serializing circuit. The $2^n$ input data must be provided with different phases. Since the input data frequency is low, the inputs are less sensitive to timing jitter, and a simple circuit with only a few latches is enough to generate the differently phased inputs at the very beginning of the design. By contrast, the conventional MUX circuit needs $n$ stages of the sequencing module. The imbalance between the data propagation paths is therefore also reduced.



Fig. 7. (a) Simulated 4-1 Conventional MUX Module; (b) Simulated 4-1 MUX-FF module

## IV. IMPLEMENTATION AND SIMULATION RESULTS

The bottleneck of the operating frequency in a serializing circuit is the last stage, which operates at the highest speed in the system. A 4-1 MUX-FF is the smallest unit in the proposed MUX-FF scheme, so a conventional 4-1 MUX is implemented for comparison. Fig. 7 shows the conventional MUX and MUX-FF modules.

Both structures contain an output buffer with a $50\ \Omega$ load resistor connected at the output node. As shown in Fig. 7(b), a phase generator takes the place of the previous stage of MUX-latches and provides the data with the required phases.

In the CML topology, a latch or a multiplexer each contains 7 transistors. A conventional 2-1 MUX cell, which is composed of 5 latches and a multiplexer, therefore has 42 transistors. A MUX-latch, in contrast, has only 10 transistors.
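These counts can be reproduced with a few lines of arithmetic. The per-cell figures come from the text; the assumption that a conventional 4-1 MUX consists of three 2-1 cells in a two-level tree and that the 4-1 MUX-FF uses three MUX-latches follows the structures described in Sections II and III, and the constant names are illustrative.

```python
TRANSISTORS_PER_CML_CELL = 7    # one CML latch or one CML multiplexer
TRANSISTORS_PER_MUX_LATCH = 10  # one proposed MUX-latch

# Conventional 2-1 MUX cell: 5 latches plus 1 multiplexer.
conventional_2to1 = (5 + 1) * TRANSISTORS_PER_CML_CELL

# Conventional 4-1 MUX: assumed to be three 2-1 cells in a two-level tree.
conventional_4to1 = 3 * conventional_2to1

# 4-1 MUX-FF: two first-stage MUX-latches plus one second-stage MUX-latch.
mux_ff_4to1 = 3 * TRANSISTORS_PER_MUX_LATCH

print(conventional_2to1, conventional_4to1, mux_ff_4to1)  # 42 126 30
```

The results of 126 and 30 transistors agree with the transistor counts reported in Table I.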

However, to implement the MUX-latch, the drains of the storage-part transistors are connected to the multiplexer output, so the output parasitic capacitance of a MUX-latch can be much larger than that of a conventional one. To operate both designs at the same frequency, we enlarge the transistors in the MUX-latch topology so that they carry more current and maintain the operating speed.

Figs. 8(a) and 8(b) show the output eye diagrams of a conventional 4-1 MUX and a 4-1 MUX-FF, respectively. The simulated modules are built in a 90 nm CMOS process technology, and the output data rate is 10 Gbits/s.

Fig. 8. 10 Gbits/s eye diagram of (a) Conventional 4-1 MUX; (b) 4-1 MUX-FF

Table I compares the size and power consumption of a MUX-FF with those of a conventional MUX at the same operating speed. Many of the transistors are eliminated, so the ratio of the transistor count of a conventional MUX to that of a MUX-FF is about 4:1. Since we enlarge the transistors in the MUX-FF, the ratio of the total area is 2.25:1, lower than the transistor-count ratio. The current and power of the MUX-FF are reduced to 28 percent of those of the conventional MUX.

TABLE I.

|                  | Conventional 4-1 MUX                       | MUX-FF                                    |
|------------------|--------------------------------------------|-------------------------------------------|
| Transistor count | 126                                        | 30                                        |
| Total area       | $1080 \times 4\ \mu\text{m} \times 0.1\ \mu\text{m}$ | $480 \times 4\ \mu\text{m} \times 0.1\ \mu\text{m}$ |
| Total current    | 45.2115 mA                                 | 12.6703 mA                                |
| Total power      | 54.2538 mW                                 | 15.2043 mW                                |
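The ratios quoted in the text follow directly from the entries of Table I, as the short check below shows. The area is expressed in multiples of the common $4\ \mu\text{m} \times 0.1\ \mu\text{m}$ unit, and the dictionary keys are illustrative names. The supply voltage computed at the end is only inferred from the ratio of power to current; it is not stated in the paper.

```python
conventional = {"transistors": 126, "area_units": 1080,
                "current_mA": 45.2115, "power_mW": 54.2538}
mux_ff = {"transistors": 30, "area_units": 480,
          "current_mA": 12.6703, "power_mW": 15.2043}

# MUX-FF value as a fraction of the conventional 4-1 MUX value.
ratio = {key: mux_ff[key] / conventional[key] for key in conventional}

for key, value in ratio.items():
    print(f"{key}: {value:.1%} of conventional, saving {1 - value:.1%}")

# Inferred, not stated in the paper: P / I suggests the supply voltage.
implied_supply_v = conventional["power_mW"] / conventional["current_mA"]
print(f"implied supply: {implied_supply_v:.2f} V")
```

The area fraction of about 44 % and the power fraction of about 28 % correspond to the 56 % and 72 % savings quoted in the abstract.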

## V. CONCLUSION

In this paper, we propose a MUX-FF topology that combines multiplexers with the storage part of latches. Compared with the conventional topology, a MUX-FF provides the edge-triggered function with fewer transistors. When implementing a 4-1 multiplexing function at 10 Gbits/s, the area and power consumption of a MUX-FF are only 44 % and 28 % of those of a conventional MUX, respectively.

## REFERENCES

- [1] M. Ida, N. Kato, and T. Takada, "A 4 Gb/s GaAs 16:1 multiplexer/1:16 demultiplexer LSI chip," *IEEE J. Solid-State Circuits*, vol. 24, no. 4, pp. 928-932, Aug. 1989.
- [2] D. Kehrer, H. Wohlmuth, H. Knapp, M. Wurzer, and A. L. Scholtz, "40 Gb/s 2:1 multiplexer and 1:2 demultiplexer in 120 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 38, no. 11, pp. 1830-1837, Nov. 2003.
- [3] K. Kanda, D. Yamazaki, T. Yamamoto, M. Horinaka, J. Ogawa, H. Tamura, and H. Onodera, "40 Gb/s 4:1 MUX/1:4 DEMUX in 90 nm standard CMOS," in *IEEE ISSCC*, pp. 152-153, Feb. 2005.
- [4] A. Tanabe, M. Umetani, I. Fujiwara, T. Ogura, K. Kataoka, M. Okihara, H. Sakuraba, T. Endoh, and F. Masuoka, "0.18- $\mu\text{m}$  CMOS 10-Gb/s multiplexer/demultiplexer ICs using current mode logic with tolerance to threshold voltage fluctuation," *IEEE J. Solid-State Circuits*, vol. 36, no. 6, pp. 988-996, Jun. 2001.
- [5] P. Heydari and R. Mohanavelu, "Design of ultrahigh-speed low-voltage CMOS CML buffers and latches," *IEEE Trans. VLSI Syst.*, vol. 12, pp. 1081-1093, Oct. 2004.