# A 20 Gbps Scalable Load Balanced Birkhoff-von Neumann Symmetric TDM Switch IC with SERDES Interfaces

Yu-Hao Hsu, Min-Sheng Kao, Hou-Cheng Tzeng, Ching-Te Chiu\*, Jen-Ming Wu, Shuo-Hung Hsu \*Institute of Communications Engineering, National Tsing Hua University, Hsinchu, 300, Taiwan. E-mail:ctchiu@cs.nthu.edu.tw

Abstract—For the first time, we implemented a reconfigurable load-balanced TDM switch IC with SERDES interface circuits for high-speed networking applications. An N×N TDM switch can be constructed recursively from the TDM switch IC to achieve a switching capacity of hundreds of gigabits per second or higher. The TDM switch IC contains a digital 8×8 TDM switch core with 8B10B CODECs and analog SERDES I/O interfaces. In the I/O interfaces, eight 2.56/3.2 Gbps dual-mode 16/20:1 SERDES with CML buffers were developed. The 16/20:1 serializer and deserializer were used instead of 8/10:1 ones to halve the required operating frequency in the switch core. New half-rate architectures and all-static CMOS gates were used in the 16/20:1 serializer and deserializer for low power consumption. A wide-band CML I/O buffer with our patented PMOS active load scheme was developed. All implementations were based on 0.18 µm CMOS technology. Our implementation showed a 20 Gbps switching capacity for the 8×8 TDM switch IC.

### I. Introduction

There is an urgent need to build high-speed switches that scale with the transmission speed of fiber optics. Because the key limitation of an electronic switch is memory access speed, input-buffered switches, which can perform parallel reads and writes, have received a lot of attention recently. High-end routers, such as the Cisco 12000 and Juniper T640, are based on conflict resolution among parallel buffers. However, conflict resolution requires additional computation and communication overhead, which prevents this approach from scaling to switches of much higher speed.

Several load-balanced switches have been proposed to resolve memory access conflicts and eliminate the extra computation and communication overhead by uniformly distributing the input traffic [1,2,3]. These load-balanced switches provide a promising approach to constructing terabit switch fabrics with 100% throughput and without conflict resolution. The load-balanced switch consists of parallel buffers and symmetric TDM switches, as shown in [1,2]. The parallel buffers can be implemented inside the line cards with various queuing algorithms. Here we focus on the construction of an $N \times N$ symmetric TDM switch.

### II. The Architecture of the Symmetric TDM IC

The overall architecture of the 8×8 TDM switch is shown in Figure 1. It includes eight receiving modules for the eight input ports, eight transmitting modules for the eight output ports, and the 8×8 symmetric TDM switch core with 8B10B CODECs. Each receiving module contains a CML input buffer and a deserializer that converts the serial input into an internal parallel data bus. Each transmitting module contains a serializer that converts the output data bus of the 8×8 TDM switch into a serial data stream, which is sent out through the CML output buffer. One SERDES module is shown in Figure 2.



Fig. 1. Symmetric TDM switch with SERDES interfaces



Fig. 2. One channel SERDES interface

The SERDES interface is used to reduce the overall pin count. Each input or output port at the receiving/transmitting interface is designed to support a data transmission rate of up to 3.2 Gbps. After the 10B-to-8B conversion, the internal data rate at the 8×8 TDM switch core is about 20 Gbps, with 2.56 Gbps per port.
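As a quick check of these figures, and assuming the only overhead between the line and the core is the 8B10B code (8 payload bits per 10 line bits), the rates work out as:

$$3.2\ \text{Gbps} \times \frac{8}{10} = 2.56\ \text{Gbps per port}, \qquad 8 \times 2.56\ \text{Gbps} = 20.48\ \text{Gbps} \approx 20\ \text{Gbps}.$$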

### A. Symmetric TDM Switch Core with 8B10B CODEC

The connection patterns of an $N \times N$ crossbar switch fabric can be described as follows. During the $t^{th}$ time slot, input port $i$ is connected to output port $j$ if

$$(i+j) = (t+1) \bmod N. \tag{1}$$

A switch fabric that implements the connection patterns in Eq. (1) is called a symmetric Time Division Multiplexing (TDM) switch. In our design, a 64×64 symmetric TDM switch can be constructed from sixteen 8×8 symmetric TDM switch ICs via perfect shuffle interconnection.
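The connection rule in Eq. (1) can be illustrated with a short Python sketch. It is not taken from the chip's implementation: it assumes ports and time slots are numbered from 0 and reads Eq. (1) as a congruence modulo $N$, and the function names `tdm_output` and `connection_pattern` are hypothetical. The checks confirm that each slot yields a one-to-one and symmetric mapping, and that every input reaches every output once in $N$ slots.

```python
from typing import List


def tdm_output(i: int, t: int, n: int) -> int:
    """Output port j paired with input port i in slot t, from (i + j) = (t + 1) mod n.

    Assumption: ports and slots are 0-indexed.
    """
    return (t + 1 - i) % n


def connection_pattern(t: int, n: int) -> List[int]:
    """Output port for every input port 0..n-1 during slot t."""
    return [tdm_output(i, t, n) for i in range(n)]


N = 8
for t in range(N):
    pattern = connection_pattern(t, N)
    # Each slot is a permutation: no two inputs share an output.
    assert sorted(pattern) == list(range(N))
    # The pattern is symmetric: if i connects to j, then j connects to i.
    assert all(pattern[pattern[i]] == i for i in range(N))
    print(f"slot {t}: {pattern}")

# Over N consecutive slots, each input is connected to each output exactly once.
for i in range(N):
    assert sorted(tdm_output(i, t, N) for t in range(N)) == list(range(N))
```

Because the pattern depends only on the slot counter, no scheduler or conflict resolution is needed, which is the property the load-balanced architecture relies on.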

The 8B10B CODEC can be enabled or bypassed. When the 8B10B CODEC is enabled, the deserializer converts the serial data stream into a 20-bit data bus for the 8B10B decoder. The 8B10B CODEC is used for error detection and for generating DC-balanced binary signals that ease clock and data recovery (CDR).

### B. Ultra-low-power 16/20:1 Dual Mode Serializer and Deserializer

When the serializer/deserializer is integrated with the TDM switch core, an 8/10:1 serializer (deserializer) requires a 320 MHz clock in the core to achieve the 2.56/3.2 Gbps data rate. We propose a 16/20:1 serializer (deserializer) working with two 8B10B encoders (decoders), which halves the operating frequency required in the digital core and therefore substantially reduces the core power consumption of the system. Moreover, implementing the 16/20:1 serializer (deserializer) entirely in static CMOS gates reduces its power from 50 mW to 5 mW compared with an implementation in Source Coupled Logic (SCL).
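The clock-rate argument above amounts to dividing the line rate by the serialization ratio. The following Python sketch works through that arithmetic; the helper name `core_clock_mhz` is hypothetical, and the pairing of 2.56 Gbps with the 8:1 and 16:1 ratios and of 3.2 Gbps with the 10:1 and 20:1 ratios is read from the "2.56/3.2 Gbps" and "16/20:1" notation in the text.

```python
def core_clock_mhz(line_rate_mbps: int, ratio: int) -> float:
    """Parallel-side word clock in MHz for a ratio:1 serializer or deserializer."""
    return line_rate_mbps / ratio


# (mode label, line rate in Mbps, narrow ratio, wide ratio)
MODES = [
    ("8B10B bypassed", 2560, 8, 16),
    ("8B10B enabled", 3200, 10, 20),
]

for label, rate, narrow, wide in MODES:
    f_narrow = core_clock_mhz(rate, narrow)
    f_wide = core_clock_mhz(rate, wide)
    # Doubling the parallel word width halves the required core clock.
    assert f_wide == f_narrow / 2
    print(f"{label}: {narrow}:1 -> {f_narrow} MHz, {wide}:1 -> {f_wide} MHz")
```

In both modes the wider ratio brings the core clock from 320 MHz down to 160 MHz, at the cost of a data path that is twice as wide.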

The 16/20:1 serializer converts the sixteen-bit (or twenty-bit) input data into a serial data stream. In the 20:1 mode, since 20 is not a power of 2, the general tree-type multiplexer cannot be used. Therefore, we adopt a shift register approach that stores the parallel inputs and sends them out serially. The block diagram of the 16/20:1 serializer is shown in Fig. 3. The last 2:1 MUX and the two DFFs in the dashed box in Fig. 3 can be treated as a double-edge-triggered DFF, which means that only a half-rate clock is needed. For example, a 3.2 Gbps data stream requires a 1.6 GHz clock. A similar approach is used in the 16/20:1 deserializer.
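A behavioral sketch in Python may help make the half-rate idea concrete. It is only a bit-level model under stated assumptions, not the circuit of Fig. 3: the split of the word into an even-indexed and an odd-indexed shift register, and the names `serialize_half_rate` and `deserialize`, are hypothetical choices made for illustration.

```python
from collections import deque
from typing import List, Tuple


def serialize_half_rate(word: List[int]) -> Tuple[List[int], int]:
    """Shift a 16- or 20-bit word out serially, two bits per clock cycle.

    Returns the serial bit stream and the number of half-rate clock cycles used.
    """
    if len(word) not in (16, 20):
        raise ValueError("word must hold 16 or 20 bits")
    # Assumed split: even-indexed bits leave while the clock is high,
    # odd-indexed bits while it is low (the double-edge-triggered output stage).
    high_phase = deque(word[0::2])
    low_phase = deque(word[1::2])
    serial: List[int] = []
    cycles = 0
    while high_phase:
        serial.append(high_phase.popleft())  # clock high: MUX selects first DFF
        serial.append(low_phase.popleft())   # clock low: MUX selects second DFF
        cycles += 1
    return serial, cycles


def deserialize(serial: List[int], width: int) -> List[List[int]]:
    """Regroup a serial stream into width-bit parallel words."""
    return [serial[k:k + width] for k in range(0, len(serial), width)]


word20 = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1]
stream, cycles = serialize_half_rate(word20)
assert stream == word20          # bit order is preserved
assert cycles == 10              # 20 bits in 10 cycles: clock is half the bit rate
assert deserialize(stream, 20) == [word20]
```

Since a shift register has no power-of-2 constraint, the same structure serves both the 16:1 and 20:1 modes; only the word length changes.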



Fig. 3 Block diagram of a 16/20:1 serializer



Fig. 4 (i) CML input interface (ii) CML output interface



Fig. 5 Block diagram of a basic CML buffer

### C. CML I/O Interface

The architecture of the CML I/O interface is shown in Figure 4. The CML input interface consists of an equalizer, an inductive-peaking active-feedback CML limiting amplifier, and a DC offset cancellation circuit. The typical input sensitivity is 4 mV, and the limiting amplifier output swing is around 250 mV. The CML output interface, used as a backplane driver, consists of a level-shift circuit, a voltage-peaking circuit, and three-stage CML buffers. The last stage of the CML output buffer provides approximately 8 mA of drive current into a 50 ohm load, giving an output swing of up to 250 mV.

The architecture of a basic differential current-mode logic buffer circuit is shown in Figure 5. It includes an active inductor load, formed by PMOS transistors that act as active resistors connected to NMOS load transistors. This load takes the place of an on-chip inductor to provide inductive peaking. Compared with on-chip inductors, active inductors require much less chip area and consume less power while providing the same frequency response. The CML buffer circuit also incorporates active feedback and negative Miller capacitance to meet the high-speed requirement.
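To see why such a load behaves inductively, a common textbook small-signal model can be used. This is an assumed model, not an analysis of the circuit in Figure 5: a load transistor with transconductance $g_m$ and gate-source capacitance $C_{gs}$ has its gate driven through an active resistance $R_G$, and the output resistance and other parasitics are neglected. The symbols $g_m$, $C_{gs}$, $R_G$ and $L_{eq}$ are introduced here for illustration only. The impedance seen at the output node is then

$$Z(s) = \frac{1 + s R_G C_{gs}}{g_m + s C_{gs}}, \qquad Z(0) = \frac{1}{g_m}, \qquad Z(\infty) = R_G .$$

When $R_G > 1/g_m$ the impedance rises with frequency, and at low frequencies it can be approximated by a resistance in series with an inductance:

$$Z(s) \approx \frac{1}{g_m} + s L_{eq}, \qquad L_{eq} = \frac{C_{gs}\left(R_G - 1/g_m\right)}{g_m} .$$

The rising impedance partly offsets the roll-off caused by the load capacitance, which is the bandwidth-extension effect that inductive peaking aims for.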



Fig. 6 Die micrograph



Fig. 7 Measurement of the one-channel 16/20:1 multiplexer with CML output buffer at 3.2 Gbps



Fig. 8 Eye diagram of the CML output buffer at 3.2 Gbps with a $2^7-1$ PRBS input

### III. Measurement

This symmetric TDM switch IC was fabricated in 0.18 µm CMOS technology. The overall chip area (including the PLL) is 3.65 mm × 3.57 mm. Fig. 6 shows the die micrograph. Fig. 7 shows the measurement result of the one-channel 16/20:1 multiplexer with CML output buffer at 3.2 Gbps. The eye diagram of the CML output buffer at 3.2 Gbps is shown in Fig. 8. The differential output voltage is 250 mVp-p with 20.2 ps jitter at 3.2 Gbps. The receiver sensitivity is 25 mVp-p. The power consumption of the serializer, deserializer, and PLL is 31 mW per channel, 28 mW per channel, and 24 mW, respectively. The implementation results show that the 8×8 TDM switch with SERDES interface can achieve a 20 Gbps switching rate.
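For context, the measured figures can be put on a common scale with a few lines of Python. The derived numbers below are not reported in the text; they are plain arithmetic on the quoted measurements. The sketch assumes that the per-channel power figures apply equally to all eight channels, and the text does not state whether the 20.2 ps jitter is an rms or a peak-to-peak value. All function and constant names are hypothetical.

```python
LINE_RATE_MBPS = 3200   # measured line rate (from the text)
JITTER_PS = 20.2        # measured jitter (from the text; rms vs. p-p not stated)
SER_MW_PER_CH = 31      # serializer power per channel (from the text)
DES_MW_PER_CH = 28      # deserializer power per channel (from the text)
PLL_MW = 24             # PLL power (from the text)
CHANNELS = 8            # assumption: all eight channels draw the quoted power


def unit_interval_ps(rate_mbps: int) -> float:
    """Duration of one bit in picoseconds."""
    return 1e6 / rate_mbps


def jitter_fraction_of_ui(jitter_ps: float, rate_mbps: int) -> float:
    """Jitter expressed as a fraction of the unit interval."""
    return jitter_ps / unit_interval_ps(rate_mbps)


def serdes_power_mw(channels: int) -> int:
    """Serializer, deserializer and PLL power summed over all channels."""
    return channels * (SER_MW_PER_CH + DES_MW_PER_CH) + PLL_MW


print(f"unit interval: {unit_interval_ps(LINE_RATE_MBPS)} ps")
print(f"jitter: {jitter_fraction_of_ui(JITTER_PS, LINE_RATE_MBPS):.3f} UI")
print(f"SERDES + PLL power: {serdes_power_mw(CHANNELS)} mW")
```

Under these assumptions the jitter is a small fraction of the 312.5 ps bit period, and the SERDES and PLL together draw roughly half a watt.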

### IV. Summary and Conclusions

For the first time, a reconfigurable load-balanced TDM switch module has been implemented. The module is simple and can easily be scaled up to an N×N TDM switch. A $64\times64$ TDM switch can be constructed recursively from the 8×8 TDM modules to reach a 160 Gbps switching capacity in 0.18 µm CMOS technology. In the SERDES interface, we developed low-power 8-channel CML transceivers using half-rate, dual-mode 16/20:1 multiplexing schemes.

### References

- [1] C. S. Chang, D. S. Lee and Y. S. Jou, "Load balanced Birkhoff-von Neumann switches, part I: one-stage buffering," Computer Communications, Vol. 25, pp. 611-622, 2002.
- [2] C. S. Chang, D. S. Lee and C. M. Lien, "Load balanced Birkhoff-von Neumann switches, part II: multi-stage buffering," Computer Communications, Vol. 25, pp. 623-634, 2002.
- [3] I. Keslassy, S. T. Chuang and N. McKeown, "A load-balanced switch with arbitrary number of line cards," Infocom 2004, pp. 2007-2016, March 2004.