

# **Structured Approach for Designing 4:2 Compressor**

Hima Kantamneni, Mahmoud Jawharji, and Damu Radhakrishnan

Department of Electrical and Computer Engineering

**Division of Engineering Programs** 

State University of New York, New Paltz, NY 12561

kantamnenihima1993@gmail.com, m.jawharji@gmail.com, damu@engr.newpaltz.edu

**Abstract** — 4:2 compressors play a large role in the design of arithmetic units, especially multipliers. In this study we compare different implementations of 4:2 compressors in terms of power, delay, power-delay product and transistor count. A total of 36 implementations were simulated using LTSPICE. At 0.6V in a 45nm bulk CMOS technology, the power consumption was found to vary from 0.12µW to 0.38µW and the power-delay product from 0.12fJ to 0.30fJ. The transistor counts vary from 30 to 38 across the implementations.

**Keywords** — 4:2 compressor, XOR gate, multiplexer, power, power-delay product.

## Introduction

Power reduction is one of the major challenges facing today's VLSI designers. With every new generation, circuit complexity and processing speed increase. Clock frequency, however, is reaching an upper limit and settling in the 4-5GHz range because of the high cost of removing heat from these chips. Hence, designers are always looking for ways to minimize power consumption in their designs.

In this paper we present various designs of a popular device called the 4:2 compressor [1]. The 4:2 compressor is significant because of its use in hardware multipliers and in digital signal processing units such as Fast Fourier Transform (FFT) hardware. Multipliers predominantly use 4:2 compressors in their partial product reduction stage because of their structured layout compared to normal full adder cells. Many designs for 4:2 compressors focusing on power reduction exist in the literature [2-9]. Our focus here is to design the individual submodules of the 4:2 compressor in several ways and to build different compressor designs by combining these submodules. The resulting designs are then ranked in terms of their power, delay and power-delay product.

## 4:2 Compressor

A 4:2 compressor has five inputs and three outputs and can be implemented with two full adders (FA) connected one after the other, as shown in Figure 1. Usually the sum output of a full adder is implemented by cascading two XOR gates. If the full adders in Figure 1 are implemented this way, the total delay for the sum output (S) of the 4:2 compressor is 4 XOR delays. Various approaches have been proposed in the literature to improve this speed. A novel design of a 4:2 compressor based on a modified set of equations for the sum and carry outputs is shown in Figure 2 [2]. The output equations are:

$$\begin{split} & \mathrm{S} = \mathrm{X}_1 \oplus \mathrm{X}_2 \oplus \mathrm{X}_3 \oplus \mathrm{X}_4 \oplus \mathrm{C}_{\mathrm{in}} \\ & \mathrm{C} = (\mathrm{X}_1 \oplus \mathrm{X}_2 \oplus \mathrm{X}_3 \oplus \mathrm{X}_4) \mathrm{C}_{\mathrm{in}} + \left( \overline{\mathrm{X}_1 \oplus \mathrm{X}_2 \oplus \mathrm{X}_3 \oplus \mathrm{X}_4} \right) \! \mathrm{X}_4 \\ & \mathrm{C}_{\mathrm{out}} = (\mathrm{X}_1 \oplus \mathrm{X}_2) \mathrm{X}_3 + \left( \overline{\mathrm{X}_1 \oplus \mathrm{X}_2} \right) \! \mathrm{X}_1 \end{split}$$
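The output equations above can be checked exhaustively. The following is a minimal Python sketch, not the authors' implementation: the function names (`mux`, `compressor_mux`, `compressor_fa`, `verify`) are hypothetical, and the cascaded full-adder wiring (the first FA takes X1, X2, X3; the second takes its sum together with X4 and Cin) is an assumption about Figure 1. The sketch confirms that the XOR/MUX form and the two-FA form agree on all 32 input combinations and that the arithmetic weight is preserved, i.e. X1 + X2 + X3 + X4 + Cin = S + 2(C + Cout).

```python
from itertools import product


def mux(sel, when0, when1):
    """2:1 multiplexer: returns when1 if sel is 1, otherwise when0."""
    return when1 if sel else when0


def compressor_mux(x1, x2, x3, x4, cin):
    """4:2 compressor written directly from the S, C and Cout equations."""
    h12 = x1 ^ x2              # X1 xor X2
    h1234 = h12 ^ x3 ^ x4      # X1 xor X2 xor X3 xor X4
    s = h1234 ^ cin
    c = mux(h1234, x4, cin)    # Cin when h1234 = 1, X4 otherwise
    cout = mux(h12, x1, x3)    # X3 when h12 = 1, X1 otherwise
    return s, c, cout


def full_adder(a, b, c):
    """Returns (sum, carry) of three input bits."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def compressor_fa(x1, x2, x3, x4, cin):
    """Two cascaded full adders (assumed wiring for Figure 1)."""
    s1, cout = full_adder(x1, x2, x3)
    s, c = full_adder(s1, x4, cin)
    return s, c, cout


def verify():
    """Check equivalence and weight preservation over all 32 inputs."""
    for bits in product((0, 1), repeat=5):
        s, c, cout = compressor_mux(*bits)
        assert (s, c, cout) == compressor_fa(*bits)
        assert sum(bits) == s + 2 * (c + cout)
    return True


if __name__ == "__main__":
    print("equations consistent:", verify())
```

Writing C and Cout as multiplexers selected by the XOR outputs is what lets the carries be produced without extra XOR stages in the carry path.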



Fig. 1. 4:2 Compressor Composed of Two FAs



Fig. 2. Architecture of 4:2 Compressor

Although their complementary pass-transistor logic (CPL) implementation is very efficient for realizing multiplexers (MUX) and XORs, it needs pull-up circuits and inverters to counter reduced-swing switching and weak signal transmission. A similar 4:2 compressor circuit in CPL that uses a minimum of 40 transistors is presented in [3]. CPL-style designs have small input loads, provide good output driving capability due to their output inverters, and have a fast differential stage. This differential stage, on the other hand, leads to considerably larger short-circuit currents. Furthermore, the substantial number of nodes in the circuit accounts for increased switching activity. A purely MUX-based implementation of a 4:2 compressor using CMOS pass transistors is given in [4]. Their implementation needed CMOS inverters for inverting the input bits and the outputs of some intermediate MUXs.



The inverters at the input have the maximum switching activity compared to all other nodes in the circuit and hence the power dissipation of this circuit is increased. Modified multiplexer based designs for 4:2 compressors are presented in [5]. The XOR gates were replaced by multiplexers to minimize delay in the critical path. Multiplexers in the internal paths were implemented in the transmission gate style. The delay, power and area were reduced due to the combined XOR-XNOR module and multiplexers.

A number of low power 4:2 compressors are presented in [6]-[8]. The designs in [8] are driven by input signals without using any direct path to supply voltages, and result in lower short circuit power. A number of high speed, low power 3:2, 4:2 and 5:2 compressors capable of operating at ultra-low voltages are presented in [9]. 4:2 compressor designs in the cascaded full adder style by using optimized full adders are presented in [10], [11] and are shown to have 9% less latency and 16% reduction in power delay product (PDP) compared to earlier designs. In [12] the conventional 4:2 compressors were modified to merge with the partial product generation block of an n×n-bit multiplier, while eliminating n<sup>2</sup> inverters. The proposed compressors were reported to provide up to 20% energy reduction in 65nm CMOS technology.

A current mode fully differential 4:2 compressor in 65-nm CMOS technology is presented in [13], [14]. In [13] they claimed area reduction and speed improvement up to 45% compared to other high speed conventional compressor circuits, while maintaining lower PDP of 3.26fJ at 1.2V.

## New Design of 4:2 Compressor

For convenience, the XORs in Figure 2 are numbered as XOR<sub>1</sub>, XOR<sub>2</sub>, XOR<sub>3</sub> and XOR<sub>4</sub>. Three different designs for XOR<sub>1</sub> and XOR<sub>2</sub> are shown in Figure 3 [15], [16]. All three designs generate both XOR (H) and XNOR (H<sub>P</sub>) outputs. The transistor counts in these designs vary from 6 to 10. Figure 3(a) uses a six-transistor XOR-XNOR design. Figures 3(b) and 3(c) use explicit inverters for generating the XNOR output, so the XNOR output exhibits extra delay compared to the XOR output. Two designs for the XOR<sub>3</sub> module are shown in Figure 4. This module receives its inputs in both normal and complementary form, and hence its implementations use fewer transistors than those of XOR<sub>1</sub> and XOR<sub>2</sub>. Similarly, two designs of the XOR<sub>4</sub> module are shown in Figure 5. Figure 5(a) uses six transistors and Figure 5(b) uses four transistors. The multiplexer design uses two transmission gates and is shown in Figure 6.



(a) Six Transistor XOR-XNOR



(b) Ten Transistor XOR-XNOR



(c) Nine Transistor XOR-XNOR

Fig. 3. XOR<sub>1</sub> and XOR<sub>2</sub> Modules

By combining the individual designs for the XOR gates at the different levels a total of 36 designs are possible for the 4:2 compressors.



(b) Six Transistor XOR-XNOR

Fig. 4. Two Implementations of XOR<sub>3</sub> Module

The designs (a), (b) and (c) are referred to as A, B and C respectively in the rest of this paper. Each 4:2 compressor is given a unique four-letter label by combining the designs chosen for the four XOR modules. For example, design ACAB refers to design A of XOR<sub>1</sub>, design C of XOR<sub>2</sub>, design A of XOR<sub>3</sub> and design B of XOR<sub>4</sub>. The 2-to-1 multiplexer design used in the 4:2 compressor is shown in Figure 6.
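The labelling scheme can be enumerated mechanically. The short Python sketch below is illustrative only; the names `design_labels` and `simulated_at_0v6` are hypothetical. It builds the four-letter labels from three options for XOR<sub>1</sub> and XOR<sub>2</sub> and two options each for XOR<sub>3</sub> and XOR<sub>4</sub>, giving 3 × 3 × 2 × 2 = 36 designs. It also filters out the labels that use design A (the six-transistor XOR-XNOR) in XOR<sub>1</sub> or XOR<sub>2</sub>, which are the designs this paper reports as not simulated at 0.6V.

```python
from itertools import product

XOR12_OPTIONS = "ABC"  # Fig. 3(a), 3(b), 3(c), used for both XOR1 and XOR2
XOR3_OPTIONS = "AB"    # Fig. 4(a), 4(b)
XOR4_OPTIONS = "AB"    # Fig. 5(a), 5(b)


def design_labels():
    """All four-letter labels in XOR1, XOR2, XOR3, XOR4 order."""
    combos = product(XOR12_OPTIONS, XOR12_OPTIONS, XOR3_OPTIONS, XOR4_OPTIONS)
    return ["".join(combo) for combo in combos]


def simulated_at_0v6(label):
    """False when XOR1 or XOR2 uses design A (six-transistor XOR-XNOR)."""
    return "A" not in label[:2]


labels = design_labels()
low_voltage_labels = [label for label in labels if simulated_at_0v6(label)]

if __name__ == "__main__":
    print(len(labels), "designs in total")
    print(len(low_voltage_labels), "designs without design A in XOR1/XOR2")
```

Restricting XOR<sub>1</sub> and XOR<sub>2</sub> to designs B and C leaves 2 × 2 × 2 × 2 = 16 labels.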



## Simulation Results

LTSPICE was used for simulation with the 45nm BSIM4 bulk CMOS model [17]. All PMOS transistors were sized at 450nm and all NMOS transistors at 225nm. Three supply voltages were used: 1.2V, 0.8V and 0.6V. Each XOR gate module was first simulated individually, and the results are tabulated in Table 1. The average power for design 3(a) at 0.6V is left blank in Table 1 because the output waveform of the six-transistor XOR-XNOR gate did not reach valid logic levels. Comparing the XOR modules in Figures 3(b) and 3(c), 3(b) consumes more power. This is because the input inverter feeds the gates of the PMOS transistors of the transmission gate.



(a) Six Transistor XOR



(b) Four Transistor XOR

Fig. 5. Two Implementations of XOR<sub>4</sub> Module



Fig. 6. Multiplexer Module

These PMOS transistors are double the size of the NMOS transistors used in the transmission gate, and hence present a larger capacitive load to the inverter. The same is true for the designs in Figure 4(b) and Figure 5(a). The average power, delay and power-delay product for the different 4:2 compressor designs are tabulated in Table 2, arranged in ascending order of power at 1.2V. At 0.6V, the power varies from 0.12µW to 0.38µW and the power-delay product from 0.12fJ to 0.30fJ. The designs using the six-transistor XOR-XNOR gates were not simulated at 0.6V, and their entries are left blank in Table 2. The transistor count of each design is also shown in Table 2 (in the Index).
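As a worked check of how the power-delay product relates to the other two columns, the Python sketch below multiplies average power by delay for a few 0.6V rows copied from Table 2. Since 1µW × 1ns = 10<sup>-15</sup>J, the product of the tabulated values is already in fJ. The helper name `pdp_fj` and the choice of rows are illustrative assumptions, and small differences from the reported values come from the two-decimal rounding of the table entries.

```python
# (cell, average power in µW, delay in ns, reported PDP in fJ) at 0.6V,
# copied from Table 2.
ROWS_0V6 = [
    ("CCAB", 0.12, 1.36, 0.16),
    ("CBBB", 0.28, 0.52, 0.15),
    ("BBAA", 0.27, 1.12, 0.30),
    ("BBBA", 0.38, 0.50, 0.19),
]


def pdp_fj(power_uw, delay_ns):
    """Power-delay product in fJ: µW * ns = 1e-6 W * 1e-9 s = 1e-15 J."""
    return power_uw * delay_ns


if __name__ == "__main__":
    for cell, power_uw, delay_ns, reported in ROWS_0V6:
        computed = pdp_fj(power_uw, delay_ns)
        print(f"{cell}: computed {computed:.3f} fJ, reported {reported:.2f} fJ")
```

For these rows the computed products agree with the tabulated power-delay products to within rounding of the second decimal place.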

| Design | Average Power, 45nm Technology (1.2V) | Average Power (0.8V) | Average Power (0.6V) | No. of Transistors |  |
|--------|--------------|------------------------|-----------------------|----|--|
| 3(a)   | 0.21         | 0.07                   | -                     | 6  |  |
| 3(b)   | 1.24         | 0.48                   | 0.25                  | 10 |  |
| 3(c)   | 0.38         | 0.16                   | 0.09                  | 9  |  |
| 4(a)   | 0.28         | 0.12                   | 0.07                  | 6  |  |
| 4(b)   | 0.79         | 0.34                   | 0.17                  | 6  |  |
| 5(a)   | 0.83         | 0.50                   | 0.21                  | 6  |  |
| 5(b)   | 0.14         | 0.05                   | 0.03                  | 4  |  |

#### Table 1. Power Consumption of XOR Modules

## Conclusion

This paper presents a comparative study of 4:2 compressors. The 4:2 compressor is divided into four XOR modules, and different designs are presented for each module. Combining them yields 36 different compressor designs, all implemented in a 45nm bulk CMOS technology. Simulations were carried out using LTSPICE at three supply voltages: 0.6V, 0.8V and 1.2V. At 0.6V, the six-transistor XOR-XNOR gate designs used in XOR<sub>1</sub> and XOR<sub>2</sub> did not produce good output waveforms, so the power consumption of compressors using these gates was not calculated and the corresponding entries are left blank in Table 2. At 0.6V, the power varied from 0.12µW to 0.38µW and the power-delay product from 0.12fJ to 0.30fJ. The transistor count varies from 30 to 38 across the implementations.

## References

- [1]. A. Weinberger, "4-2 carry-save adder module," IBM Tech. Discl. Bulletin, vol. 23, Jan. 1981.
- [2]. D. Ghosh, S.K. Nandy and K. Parthasarathy, "TWTXBB: A Low Latency, High Throughput Multiplier Architecture Using a New 4-2 Compressor," 7th Intl. Conf. on VLSI Design, Calcutta, India, pp. 77-82, Jan. 1994.
- [3]. Y. Kanie, Y. Kubota, S. Toyoyama, Y. Iwase and S. Tsuchimoto, "4-2 Compressor with Complementary Pass-Transistor Logic," IEICE Trans. Electron., vol. E77-C, no. 4, pp. 647-649, April 1994.
- [4]. N. Ohkubo, M. Suzuki, T. Shinbo, T. Yamanaka, A. Shimizu, K. Sasaki and Y. Nakagome, "A 4.4-ns CMOS 54X54-b Multiplier Using Pass-transistor Multiplexer," Proc. IEEE Custom Integrated Circuits Conf., pp. 26.4.1-26.4.4, 1994.
- [5]. J. Tonfat and R. Reis, "Low power 3–2 and 4–2 adder compressors implemented using ASTRAN," 2012 IEEE 3rd Latin American Symposium on Circuits and Systems (LASCAS), pp. 1-4, Playa del Carmen, Mexico, March 2012.

- [6]. C. H. Chang, J. Gu, and M. Zhang, "Ultra low-voltage low-power CMOS 4-2 and 5-2 compressors for fast arithmetic circuits," IEEE Trans. Circuits and Syst., vol. 51, issue 10, pp. 1985-1997, Oct. 2004.
- [7]. K. Prasad and K.K. Parhi, "Low-power 4-2 and 5-2 compressors," Thirty-Fifth Asilomar Conference on Signals, Systems and Computers, vol. 1, pp. 129-133, Pacific Grove, CA, USA, Nov. 2001.
- [8]. P. D. Gopineedi, H. Thapliyal, M. B. Srinivas and H. Arabnia, "Novel and Efficient 4:2 and 5:2 Compressors with Minimum Number of Transistors Designed for Low-Power Operations," Proceedings of the 2006 International Conference on Embedded Systems & Applications, Las Vegas, Nevada, USA, ESA 2006: pp. 160-168, June 2006.
- [9]. S. Veeramachaneni, et al., "Novel Architectures for High-Speed and Low-Power 3-2, 4-2 and 5-2 Compressors," 20th International Conference on VLSI Design, pp. 324-329, Jan. 2007.
- [10]. A. Pishvaie, G. Jaberipur and A. Jahanian, "High Performance CMOS (4; 2) compressors," International Journal of Electronics, vol. 101, issue 11, pp. 1511-1525, Jan. 2014.
- [11]. A. Pishvaie, G. Jaberipur and A. Jahanian, "Improved CMOS (4; 2) compressor designs for parallel multipliers," Computers & Electrical Engineering, vol. 38, issue 6, pp. 1703-1716, Nov. 2012.
- [12]. D. Baran, M. Aktan and V.G. Oklobdzija, "Energy efficient implementation of parallel CMOS multipliers with improved compressors," 16th ACM/IEEE international symposium on low-power electronics and design, pp. 147–152, Austin, TX, USA, Aug. 2010.
- [13]. P. Aliparast, Z.D. Koozehkanani, A.M. Khiavi, G. Karimian and H.B. Bahar, "A very high-speed CMOS 4-2 compressor using fully differential current-mode circuit techniques," Analog Integrated Circuits and Signal Processing, vol. 66, Issue 2, pp. 235–243, Feb. 2011.
- [14]. P. Aliparast, Z.D. Koozehkanani and F. Nazari, "An Ultra High Speed Digital 4-2 Compressor in 65-nm CMOS," International Journal of Computer Theory and Engineering, vol. 5, no. 4, pp. 593-597, Aug. 2013.
- [15]. A. M. Shams and M.A. Bayoumi, "A Structured Approach for Designing Low Power Adders," Proc. 31st ASILOMAR Conf. on Signals, Systems and Computers, vol. 1, pp. 751-761, 1998.

- [16]. D. Radhakrishnan, "Low Voltage CMOS Full Adder," IEE Proc-Circuits, Devices Syst., vol. 148, no. 1, pp. 19-24, Feb. 2001.
- [17]. http://ptm.asu.edu

### **Authors' Profiles**



**Hima Kantamneni** - Hima Kantamneni completed her MS degree in Electrical Engineering from the State University of New York, New Paltz, New York in 2017. Her research interests are in Digital IC Design, especially in the area of Low Power High Speed Designs. She is an active student member of IEEE.



**Mahmoud Jawharji** - Mahmoud Jawharji finished his BS in Computer Engineering and MS in Electrical Engineering from the State University of New York, New Paltz, New York in 2017, and then moved back to Saudi Arabia. His MS thesis was on "Low Power Partial Product Reduction Stage for Booth Multipliers". During the last year of his MS program, Mahmoud also worked as a teaching assistant for the Department of Engineering. His passion is to continue research and pursue an academic career.



**Damu Radhakrishnan** - Damu Radhakrishnan is an Associate Professor and Graduate Program Coordinator in the Department of Electrical and Computer Engineering at State University of New York, New Paltz, New York. His research interests are in Low Power VLSI Design and Design of High Performance Digital Architectures. He is a Life member of IEEE.



#### Table 2. Average Power, Delay and Power Delay Product Comparisons

| No. | Cell | Average Power (µW), 1.2V | Average Power (µW), 0.8V | Average Power (µW), 0.6V | Delay (ns), 1.2V | Delay (ns), 0.8V | Delay (ns), 0.6V | Power Delay Product (fJ), 1.2V | Power Delay Product (fJ), 0.8V | Power Delay Product (fJ), 0.6V | No. of Transistors |  |
|-----|------|------|------|------|-------|------|------|------|------|------|-------------|--|
| 1                                                                 | CAAB          | 0.40 | 0.14 | -    | 0.11  | 0.25 | -    | 0.04                | 0.03 | -    | 31          |  |
| 2                                                                 | CCAB          | 0.62 | 0.23 | 0.12 | 0.15  | 0.51 | 1.36 | 0.09                | 0.12 | 0.16 | 34          |  |
| 3                                                                 | ACAB          | 0.63 | 0.37 | -    | 0.18  | 1.52 | -    | 0.11                | 0.56 | -    | 31          |  |
| 4                                                                 | CAAA          | 0.63 | 0.27 | -    | 0.15  | 0.35 | -    | 0.09                | 0.09 | -    | 33          |  |
| 5                                                                 | AAAB          | 0.70 | 0.42 | -    | 0.20  | 0.43 | -    | 0.14                | 0.18 | -    | 30          |  |
| 6                                                                 | CBAB          | 0.87 | 0.31 | 0.18 | 0.20  | 0.46 | 1.22 | 0.18                | 0.14 | 0.22 | 35          |  |
| 7                                                                 | CABB          | 0.88 | 0.24 | -    | 0.10  | 0.20 | -    | 0.09                | 0.05 | -    | 31          |  |
| 8                                                                 | ABAB          | 0.96 | 0.31 | -    | 0.17  | 0.36 | -    | 0.17                | 0.11 | -    | 32          |  |
| 9                                                                 | CCAA          | 0.96 | 0.33 | 0.15 | 0.20  | 0.44 | 1.19 | 0.20                | 0.15 | 0.18 | 36          |  |
| 10                                                                | BCAB          | 0.98 | 0.35 | 0.22 | 0.16  | 0.35 | 0.97 | 0.15                | 0.12 | 0.21 | 35          |  |
| 11                                                                | AAAA          | 1.00 | 0.48 | -    | 0.15  | 0.34 | -    | 0.15                | 0.16 | -    | 32          |  |
| 12                                                                | BAAB          | 1.01 | 0.43 | -    | 0.10  | 0.26 | -    | 0.10                | 0.11 | -    | 32          |  |
| 13                                                                | ACAA          | 1.05 | 0.47 | -    | 0.20  | 0.41 | -    | 0.21                | 0.19 | -    | 33          |  |
| 14                                                                | CCBB          | 1.07 | 0.39 | 0.22 | 0.23  | 0.50 | 1.21 | 0.25                | 0.19 | 0.27 | 34          |  |
| 15                                                                | ACBB          | 1.15 | 0.10 | -    | 0.45  | 3.25 | -    | 0.52                | 0.32 | -    | 31          |  |
| 16                                                                | AABB          | 1.16 | 0.56 | -    | 0.12  | 0.25 | -    | 0.14                | 0.14 | -    | 30          |  |
| 17                                                                | CBAA          | 1.22 | 0.43 | 0.22 | 0.20  | 0.44 | 1.19 | 0.25                | 0.19 | 0.26 | 37          |  |
| 18                                                                | CABA          | 1.24 | 0.40 | -    | 0.11  | 0.22 | -    | 0.13                | 0.09 | -    | 33          |  |
| 19                                                                | BBAB          | 1.25 | 0.44 | 0.24 | 0.16  | 0.35 | 0.97 | 0.19                | 0.15 | 0.23 | 34          |  |
| 20                                                                | CBBB          | 1.25 | 0.46 | 0.28 | 0.10  | 0.21 | 0.52 | 0.12                | 0.09 | 0.15 | 35          |  |
| 21                                                                | ABAA          | 1.33 | 0.59 | -    | 0.20  | 0.41 | -    | 0.26                | 0.24 | -    | 34          |  |
| 22                                                                | ABBB          | 1.33 | 0.20 | -    | 0.08  | 0.16 | -    | 0.10                | 0.03 | -    | 32          |  |
| 23                                                                | BAAA          | 1.34 | 0.56 | -    | 0.11  | 0.30 | -    | 0.15                | 0.17 | -    | 34          |  |
| 24                                                                | BCAA          | 1.35 | 0.46 | 0.25 | 0.18  | 0.41 | 1.12 | 0.24                | 0.19 | 0.28 | 37          |  |
| 25                                                                | BCBB          | 1.39 | 0.49 | 0.25 | 0.07  | 0.17 | 0.46 | 0.10                | 0.08 | 0.12 | 35          |  |
| 26                                                                | BABB          | 1.46 | 0.59 | -    | 0.10  | 0.23 | -    | 0.15                | 0.13 | -    | 32          |  |
| 27                                                                | CCBA          | 1.47 | 0.53 | 0.31 | 0.11  | 0.22 | 0.57 | 0.16                | 0.12 | 0.18 | 36          |  |
| 28                                                                | AABA          | 1.51 | 0.65 | -    | 0.13  | 0.26 | -    | 0.20                | 0.17 | -    | 32          |  |
| 29                                                                | ACBA          | 1.52 | 0.24 | -    | 0.08  | 0.18 | -    | 0.13                | 0.04 | -    | 33          |  |
| 30                                                                | BBBB          | 1.57 | 0.56 | 0.28 | 0.07  | 0.17 | 0.46 | 0.12                | 0.09 | 0.13 | 36          |  |
| 31                                                                | BBAA          | 1.62 | 0.55 | 0.27 | 0.18  | 0.41 | 1.12 | 0.29                | 0.22 | 0.30 | 38          |  |
| 32                                                                | CBBA          | 1.65 | 0.60 | 0.36 | 0.11  | 0.22 | 0.57 | 0.17                | 0.13 | 0.20 | 37          |  |
| 33                                                                | ABBA          | 1.70 | 0.32 | -    | 0.08  | 0.17 | -    | 0.14                | 0.06 | -    | 34          |  |
| 34                                                                | BCBA          | 1.75 | 0.63 | 0.37 | 0.08  | 0.18 | 0.50 | 0.14                | 0.12 | 0.19 | 37          |  |
| 35                                                                | BABA          | 1.83 | 0.74 | -    | 0.11  | 0.25 | -    | 0.20                | 0.18 | -    | 34          |  |
| 36                                                                | BBBA          | 1.93 | 0.68 | 0.38 | 0.08  | 0.18 | 0.50 | 0.16                | 0.13 | 0.19 | 38          |  |

Index