
1 - Introduction

3D Integration and Near-Field Coupling

Published online by Cambridge University Press:  17 September 2021

Tadahiro Kuroda
Affiliation:
University of Tokyo
Wai-Yeung Yip
Affiliation:
University of Tokyo

Summary

Chapter 1 starts by tracing the history of the computer, integrated circuit (IC), and connector in the last 60 years. In particular, it describes how the goal of IC development evolved from high-performance IC to low-power IC and interface, and then to high energy efficiency. This provides the background to help the reader understand current and future challenges faced by the IC and connector in addressing the diverging performance needs of various emerging applications. This in turn sets the stage for the introduction of 3D IC integration, which is evolving from low-cost wirebond to high-performance and high-density TSV-based solutions to offer More than Moore performance improvement. The challenges faced by 3D integration are then enumerated, and 2.5D integration and wireless interface technologies are presented as current and future solutions respectively. A brief overview of wireless technologies is then provided, followed by an explanation of why near-field coupling has been applied to develop two wireless interface technologies – ThruChip Interface (TCI) and Transmission Line Coupler (TLC). The chapter concludes with an overview of TCI and TLC and an elaboration of how they address respectively the challenges in 3D IC integration and connector performance scaling.

Information

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2021


1.1 A Short History of the Computer, IC, and Connector

The advance in computer performance has been nothing short of spectacular. In just 60 years, within the lifetime of the average human being, computer performance has increased by 100,000-fold, while power consumption has decreased by a similar factor of 100,000, physical dimensions by a factor of 1,000,000, and price by a factor of 1,000. However, advance in integrated circuit (IC) technology, which has been fueling this explosive growth, is now facing severe challenges that may bring the growth to a stop. Before we ponder where we are going to go from here, it is worth looking back to see how we have come to this point.

1.1.1 History of the Computer

The 2014 British film The Imitation Game portrays the development of a code-breaking machine by Alan Turing to break the secret code of Nazi Germany’s Enigma communications machine. Turing is considered one of the fathers of the modern-day computer for his conception of the Turing machine, which provides the model for a general-purpose computer. Nevertheless, his code-breaking machine, completed in 1940, had the sole purpose of breaking the Enigma cipher.

Indeed, early computers were specific-purpose computers whose range of solvable problems was determined by how their components were wired together. In other words, it was a wired-logic computer architecture. Therefore, what we think of as programming today was done by rewiring the components, if at all possible. This hardwired architecture faced two severe challenges:

  1. Challenge of scale: the size of problems that can be solved is limited by the scale of the network of components.

  2. Challenge of wiring: as the system scale goes up, the wiring complexity grows exponentially and beyond what can be handled manually.

These two challenges were overcome with two groundbreaking inventions: the von Neumann computer with stored programs and the IC. In a von Neumann computer, the data to be processed and a list of commands in the form of a program to control how the hardware processes the data are stored in memory. The computer operates by retrieving input data and the program from memory. Then in each operational cycle, a different command is executed resulting in a different set of operations. Different types of problems can thus be solved by feeding different programs with different sets of commands to the same piece of hardware with the same wiring. A von Neumann computer is therefore a general-purpose computer. More complex problems can be solved with more complex programs, overcoming the challenge of scale.
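The fetch–decode–execute cycle described above can be illustrated with a toy accumulator machine (a hypothetical sketch, not any historical instruction set), in which the program and its data occupy the same memory:

```python
def run(memory, pc=0):
    """Minimal stored-program (von Neumann) machine: instructions and data
    share one memory; each cycle fetches, decodes, and executes one command."""
    acc = 0                          # single accumulator register
    while True:
        op, arg = memory[pc]         # fetch and decode the command at pc
        pc += 1
        if op == "LOAD":
            acc = memory[arg]        # read data from memory
        elif op == "ADD":
            acc += memory[arg]
        elif op == "STORE":
            memory[arg] = acc        # write the result back to memory
        elif op == "HALT":
            return memory

# Program (cells 0-3) and data (cells 4-6) live in the same memory:
mem = [("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT", 0), 2, 3, 0]
run(mem)  # computes 2 + 3 into cell 6
```

Feeding a different list of commands into the same `run` function solves a different problem with the same unmodified "hardware" — the essence of the stored-program idea.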

Meanwhile, invention of the IC, chronicled in the next section, has enabled a large number of circuit elements to be integrated and interconnected all on a single chip. Using photolithography, all interconnections can be made in parallel as part of the chip-making process, overcoming the challenge of wiring. The IC enables integration and parallelization of simplified and progressively downsized computing resources, driving rapid advance in chip computational performance.

However, as computational performance continues to rise, a new challenge in interconnecting chips is emerging. In particular, economics requires that the processing unit and the bulk of the memory it uses be implemented on separate chips. Connecting these two chips has become increasingly difficult as the amount and required speed of data flowing across the chip interface grow, resulting in what is known as the von Neumann bottleneck, which limits overall system computational performance. Consequently, continuous advance in computer performance requires increasing both chip and interface performance.

On the other hand, the rapid advance in artificial intelligence technology based on neural networks and deep learning is renewing interest in the wired-logic computer architecture, where the function performed is implemented through hardwiring, like the neural network in the human brain. As a result, history has come full circle: the challenges of scale and wiring are once again pushed to the forefront, necessitating innovative solutions to drive the next phase of computing technology advancement.

1.1.2 The Four Seasons of IC Development

1.1.2.1 The “Big Bang”: Invention of the IC

One of the first general-purpose computers built was ENIAC (Figure 1.1), developed in the United States in 1945 and introduced in 1946. It was a gigantic machine built with 100,000 components and weighing 27 tons. It consumed so much power – 150 kW – that, rumor has it, streetlights in the neighborhood dimmed when it was operated. It had about 5,000,000 solder connections, all made by hand. Its heat-generating vacuum tubes had such poor reliability that a few of them broke, and the system went down, every day.

Figure 1.1 Programming ENIAC. Photo by US Army, public domain

The transistor, a more reliable replacement for the vacuum tube, was invented at Bell Labs in 1947 by John Bardeen and Walter Brattain, with William Shockley developing the junction transistor the following year. With much less heat generation and higher reliability, it enabled construction of much larger-scale systems. However, the increasing scale of computer systems led to exponential growth in the number of interconnections between components, and hence in wiring complexity. It was such a challenging problem that it became known as the tyranny of numbers. The winning solution that emerged, among the various proposed, was the IC.

The IC was invented by Jack Kilby and Robert Noyce in 1958–1959. At the time, Texas Instruments (TI) was working on the Micro-Module solution to the tyranny of numbers, which enabled circuit components to be densely packed on a printed circuit board (PCB). Kilby, although a TI engineer, developed an alternative solution – the groundbreaking concept of a monolithic IC, in which all circuit elements are integrated on a single (mono) piece of semiconductor substrate (lithic). In the following year, Robert Noyce developed the silicon implementation of the IC.

1.1.2.2 Spring: Explosive Growth through Scaling

Around the same time in December 1959, American physicist Richard Feynman delivered a lecture entitled “There’s Plenty of Room at the Bottom,” where he discussed scientific manipulation at the atomic level. This could be considered the start of nanotechnology research, a field that fueled the advance in IC technology and hence its explosive growth.

In 1965, Gordon Moore of Fairchild Semiconductor (and later of Intel Corporation) published a paper entitled “Cramming More Components into Integrated Circuits,” where he described the potential of the IC with its exponentially growing integration density. He observed that the number of components in an IC had doubled approximately every year, and speculated that such a growth rate would continue. This speculation would later become known as Moore’s law and set the collective target for the advance of the semiconductor industry.

Figure 1.2 illustrates how the dominant computing device has evolved as transistor count increases following Moore’s law, and the changing driving force behind the evolution. From around 1980 to 1995, transistor channel length shrank from 4 to 0.35 µm. Meanwhile, the number of transistors that can be integrated on an IC increased from 100,000 to 100,000,000. As a result of this increased chip-level integration, the computer evolved from being an engineering workstation to a personal computer (PC), and its adoption spread from being one per group to one per person. As the trend continued, the smartphone replaced the PC as the dominant computing device, the Internet of Things (IoT) became ubiquitous, and artificial intelligence (AI) computing finally entered into the mainstream.

Figure 1.2 Evolution of the integrated circuit [Reference Kuroda1].

In this manner, scaling fueled the explosive growth of the IC. Unfortunately, there was a trade-off with an undesirable consequence that lurked in the background as computational performance rose. Eventually, around 1995, it reared its ugly head in the form of a wall that threatened to stop further scaling.

1.1.2.3 Summer: The Power Wall

Around 1995, process scaling ran into a power wall. Scaling resulted in a continuous increase in power consumption, to the point that the IC released so much heat as to degrade its reliability, bringing further scaling to a halt. As an illustration, from 1980 to 1995, in 15 years, the power consumption of an IC increased 1,000-fold. On a per-unit-area basis, an IC generated as much heat as an electric cooktop – you could fry an egg by powering an array of these chips under the frying pan! How did that happen?

In 1974, Robert Dennard, inventor of dynamic random-access memory (DRAM), formulated with his colleagues the scaling theory that describes an ideal scaling scenario where the electric field within the semiconductor device is kept constant. If you scale the supply voltage and physical dimensions of the device by the same factor, the electric field, which is a function of the ratio of voltage to distance, will remain constant. As a result, transistors whose operation is driven by manipulation of the electric field will perform the same way after scaling. Under this scaling scenario, power density would remain unchanged as well. Therefore, for the same chip size, power remains the same after scaling, and hence no heat generation or dissipation problems should arise (see the column “Constant electric field” in Table 1.1). Meanwhile, as the device gets smaller, more devices can be integrated in the same area, leading to lower manufacturing costs. Furthermore, the circuit resistor–capacitor (RC) time constant is reduced, resulting in higher operating speed. It is indeed a beautiful scenario that, if realized, would have no undesirable consequences.

Table 1.1. Transistor scaling scenarios

Scaling scenario               Constant electric field   Constant voltage
                               (ideal)                   (actual, 1980–1995)
Scaling factor
  Device size, x               1/2                       1/2
  Gate oxide thickness, tox    1/2                       1/2
  Voltage, V                   1/2                       1
Resultant change
  Electric field, E            1                         2
  Current, I = V^2/tox         1/2                       2
  Resistance, R = V/I          1                         1/2
  Capacitance, C = x^2/tox     1/2                       1/2
  RC delay, τ = CR             1/2                       1/4
  Power, P = CV^2/τ            1/4                       2
  Power density, p = P/x^2     1                         8

Source: © 1995 IEICE, [Reference Kuroda and Sakurai2], table 1.

The reality was that scaling was implemented with no reduction in supply voltage. In other words, it was not a constant electric field but rather constant voltage scaling. In this case, the circuit ran even faster as RC delay was further reduced, IC computational performance went up further, and more chips were sold bringing in higher revenues. The flip side was a large increase in power (see the “Constant voltage” column in Table 1.1). Given that IC power consumption was small to begin with, its increase had initially only a secondary effect. Nevertheless, continuous scaling resulted in rapid increase in power consumption. Before long, it hit a painful limit. This power wall was therefore the result of an unintended side effect of nonideal scaling.
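The entries in Table 1.1 follow mechanically from the three scaling factors; a short sketch that reproduces both columns from the same relations:

```python
def scale(x, tox, V):
    """Derive the resultant changes in Table 1.1 from the scaling factors
    for device size x, gate oxide thickness tox, and voltage V."""
    E = V / tox              # electric field
    I = V**2 / tox           # current (ideal square-law device)
    R = V / I                # resistance
    C = x**2 / tox           # capacitance
    tau = C * R              # RC delay
    P = C * V**2 / tau       # power
    return dict(E=E, I=I, R=R, C=C, tau=tau, P=P, p=P / x**2)

const_field   = scale(x=0.5, tox=0.5, V=0.5)  # ideal: halve the voltage too
const_voltage = scale(x=0.5, tox=0.5, V=1.0)  # actual practice, 1980-1995
# const_field:   E=1, tau=1/2, P=1/4, p=1  (no heat problem)
# const_voltage: E=2, tau=1/4, P=2,   p=8  (faster, but 8x power density)
```

The two dictionaries differ only in the voltage argument, which is exactly the choice the industry faced: constant-voltage scaling buys extra speed at the cost of an eightfold jump in power density.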

The power consumption increase with scaling was the result of the laws of physics. It is therefore not an easy problem to solve. Faced with the power wall, various schemes were devised to handle the extra power that came with performance increase. For instance, in power gating, circuit operations are diligently monitored. Whenever a circuit is not in use, its power is cut off. In another scheme, supply voltage is lowered whenever high circuit performance is not required. Such power-saving actions may sound obvious given that we practice similar power conservation in our daily life. But with hundreds of millions of transistors on a large-scale IC, uncovering waste is no simple task. For any waste we can find and eliminate, that much power can be exchanged for increased performance. In other words, “no gain in power efficiency, no gain in IC performance.” This is a trade-off IC development has continued to face to this day.

1.1.2.4 Fall: The Leakage Wall

Around 2005, scaling hit yet another wall – the leakage wall. Unlike the power wall, the leakage wall was not a result of the progression of scaling; rather, it was the consequence of scaling reaching its limits. As transistor channel length was scaled down to 100 nm, gate oxide thickness was reduced to 1.2 nm, about four molecular layers thick. If the oxide layer were thinned further, it would be subject to quantum effects under which tunneling current would flow. But without proportional reduction in gate oxide thickness, further transistor scaling would weaken the ability of the gate to properly turn off the channel, resulting in leakage current. It is analogous to a faucet not properly turned off, allowing water to drip.

The options for fixing the “leaky faucet” include changing materials, modifying the manufacturing process, or adjusting how the device is constructed, as shown in Figure 1.3. For example, dielectric materials with higher dielectric constant (high-k dielectrics) were introduced to maintain strong gate control without further thinning the dielectric layer. In parallel, manufacturing processes were developed to introduce distortions into the silicon structure (strained silicon) that stretch the silicon lattice to boost carrier mobility, and hence current when the transistor is on, without increasing leakage when the transistor is off. Furthermore, transistor gate construction was transformed from planar to three-dimensional, as in the fin field-effect transistor (FinFET), in order to strengthen gate control of the channel.

Figure 1.3 Low leakage device technologies [Reference Kuroda3].

Nevertheless, all these are measures to provide temporary relief only; there are no good ways to counter quantum effects that result from the laws of physics.

1.1.2.5 Winter: The End of Scaling?

The cover of the April 2015 issue of IEEE Spectrum (Figure 1.4) depicts a flipped-over computer chip in the shape of a belly-up, dead bug, with the competing headlines of “Moore’s Law is dead” vs. “Long live Moore’s Law.” “Is Moore’s Law dead?” is the billion-dollar question faced by the IC industry: Is scaling, which has propelled the industry for the last half-century, coming to an end? Has IC development entered its winter season?

Figure 1.4 IEEE Spectrum cover, April 2015.

© 2015 IEEE

Over the prior 40 years, the manufacturing cost per transistor had dropped by a factor of 4 million. However, as channel length shrank below 28 nm, this unit cost started to rise, and the economic benefits of scaling began to disappear. IC manufacturing has become a privilege that only a few companies with deep pockets can afford, due to the enormous investment required. Even the cost of making an R&D test chip has risen from several hundred thousand dollars to several million dollars. Faced with such adverse economic realities, once-thriving IC research in both academia and industry is gradually losing momentum.

1.1.2.6 The Second Spring

Nonetheless, innovation is not slowing; it is only taking a different form. Innovation in the semiconductor industry over the last 60 years has been mainly driven by Moore’s law in process scaling that has fueled an exponential proliferation in IC quantity and made it ubiquitous in our society. With the slowing or even ending of scaling, innovation is shifting to enhancing quality, in terms of how we apply IC technology in various innovative ways to enrich our life. The excitement in dramatically improving the price performance of an IC has been replaced by excitement in research and development of novel applications, including automated driving, artificial intelligence, big data, and IoT. Each of these areas is exploring different innovative ways to realize the potential provided by the performance achieved in scaling – IC’s first act. After experiencing the excitement of achieving a 100 million-fold improvement in the price performance of a computer, we are witnessing the opening of the curtain on IC’s second act, where advance in IC technology enters its second spring to deliver enrichment across various facets of our life.

1.1.3 History of the Connector

Regardless of where we are on the transistor count growth curve, there is a limit on how many functions can be packed into an IC. Similarly, there is a limit on how big a package, a module, and a printed circuit board can practically and economically be manufactured. Furthermore, not all desired functions can be performed by transistors. For instance, many functions require the use of sensors, actuators, and other input/output devices. On the other hand, economics oftentimes requires sourcing components made with different processes by different suppliers, and economy of scale favors the use of standardized parts. As a result, a computing system is generally constructed by connecting separate parts together, which necessitates not only interchip integration but also module integration using connectors.

In contrast to the IC, the history of the connector over the last 70 years has been characterized more by advances in quality instead of quantity, by how connections are made instead of exponential increase in the number of connections. The major technological innovations during this time include the following:

  • Crimped termination

  • Insulation displacement contact

  • Wire wrapping

  • Press-fit

  • Optical connector

In particular, the crimped termination represented a major paradigm shift since it gave rise to a solderless connection.

1.1.3.1 The Solderless Connection

The solderless electrical connection was invented by American engineer Uncas A. Whitaker in 1941 [4]. Known as a crimped termination, it is a connection formed by inserting bare wire ends into a small metal tube with a ring on the other end, and pressing the ring against the wires using a specialized tool (Figure 1.5). Compared to a rigid, soldered connection, this revolutionary technology delivered a much faster way to make and break a connection in a smaller space. Furthermore, it offered higher reliability in the face of shock and vibration. Consequently, when the United States entered World War II, the technology was widely adopted in military applications, since it maintained electrical connections even under harsh operating conditions. It also significantly increased the speed of equipment repair, including that of fighter aircraft on carriers, a critical advantage in wartime. After the war ended, the application of crimped termination was extended to commercial uses, especially those requiring operation in harsh environments, including automotive and appliances, such as power tools for the homebuilding boom that ensued.

Figure 1.5 A crimped termination.

© 2010 SMK Corporation [5]

The solderless electrical connection opened up the possibility for an easily breakable connection and hence the idea of an electronic connector, which enabled modularization and flexible selection of different combinations of modules in building a large-scale electronic system, resulting in flexibility, expandability, and scalability. System assembly can be done by users in the field, and it is easy to replace or upgrade part of the system. Consequently, it represented an advance that is parallel to the IC in solving the connection problem of large-scale systems.

1.1.3.2 Recent Challenges for the Connector

The introduction of the iPhone in 2007 set in motion the rapid shift from the PC to the smartphone as the dominant form of electronic and computing system. Even the first-generation iPhone represented an order-of-magnitude shrink in form factor compared to the PC. The subsequent exponential growth in smartphone functionality and performance, with continuous, albeit incremental, shrinking of the form factor over the last decade, has only added to the pressure to shrink both the footprint and height of the components within. Nevertheless, scaling of connector dimensions is not keeping pace with that of the IC, as illustrated by the historical trend in the scaling of flexible printed circuit (FPC) connector pitch. Because of its flexibility, low profile, and small dimensions, the FPC and its connector are commonly used in mobile devices. From 1996 to 2004, FPC connector pitch scaled by roughly 0.4× and height by roughly 0.25× in eight years, corresponding to average scaling of 0.89× and 0.84× per year, respectively (Figure 1.6). By contrast, over roughly the same period, from 1995 to 2003, Intel’s process migrated from 350 to 90 nm, scaling gate length from 350 to 50 nm, or 0.14× in eight years [6]. More recently, based on the roadmap published by the Japan Electronics and Information Technology Industries Association (JEITA), the mainstream pitch of FPC-to-PCB connectors used in small form-factor devices is projected to shift from 0.3–0.4 mm in 2016 to 0.2–0.3 mm in 2026 (Figure 1.7), which translates to total scaling of 0.714× over 10 years, or a modest average scaling of 0.967× per year. This represents a rapid deceleration in the scaling of FPC connector pitch.
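The per-year averages quoted above follow from the total scaling factors (the 0.714× total corresponds to the midpoint pitch moving from 0.35 to 0.25 mm); a quick check:

```python
def per_year(total_scaling, years):
    """Average per-year scaling factor implied by a total scaling over a period."""
    return total_scaling ** (1.0 / years)

print(round(per_year(0.40, 8), 2))    # FPC connector pitch, 1996-2004
print(round(per_year(0.25, 8), 2))    # FPC connector height, 1996-2004
print(round(per_year(0.714, 10), 3))  # JEITA pitch roadmap, 2016-2026
```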

Figure 1.6 FPC connector dimensions scaling, 1996–2004.

© 2010 SMK Corporation [5]

Figure 1.7 FPC connector pitch roadmap, 2016–2026.

© 2017 JEITA [7]

The challenges in scaling connector dimensions are not surprising given the mechanical processes required in manufacturing connectors. Furthermore, it is difficult to significantly scale the dimensions of the connector housing that is needed to provide mechanical stability to ensure a reliable connection, especially when it is subject to shock and vibration, such as in an automotive or space application. This is on top of the challenge in maintaining signal integrity as the data rates of signals passing through connectors rise rapidly. The time is ripe for another revolutionary connector technology to enable a leap in miniaturization and performance of electronic systems.

1.1.4 Closing Thoughts

Figure 1.8 depicts the evolution of large-scale computer systems, their current challenges, and proposed solutions. The connection problem of the early large-scale computers, the tyranny of numbers, was solved by the invention of the von Neumann computer, the integrated circuit, and the solderless connection. However, the advances achieved in all these technologies have made them victims of their own success, with Moore’s law scaling and connector miniaturization each reaching its limits, and the von Neumann bottleneck worsening. Furthermore, the adoption of neural networks and deep learning in AI has once again brought to the forefront the challenges of scale and wiring in building hardwired computers. We believe the solutions can be found in our two wireless coupling technologies – ThruChip Interface (TCI) and Transmission Line Coupler (TLC). TCI, a magnetic coupling technology, enables stacking DRAMs with the system on a chip (SoC) to alleviate the von Neumann bottleneck. The same technology also enables stacking of static random-access memory (SRAM) with neural network chips implemented in field-programmable gate arrays (FPGAs) or reconfigurable processors to solve the challenges of scale and wiring in an AI computer. On the other hand, TLC, an electromagnetic coupling technology, enables a contactless connector that overcomes the scaling limits of its electromechanical counterpart. Each of these two wireless coupling technologies will be introduced in the ensuing sections.

Figure 1.8 Evolution of solutions to the connection problem of a large-scale computer.

1.2 Energy-Efficient Computing

1.2.1 High-Performance IC

As noted in Figure 1.2 in Section 1.1, early IC development was performance driven, with process scaling delivering rapid performance increases as anticipated by Moore’s law. As an illustration, comparing the Intel Pentium of 1995 with the 8086 processor of 1980: in the span of 15 years, transistor count increased from 30,000 to 3,000,000, clock frequency from 5 to 300 MHz, and computing power from less than 1 MIPS to several hundred MIPS, in line with the target rate of a twofold increase every two years. Since minimum line width shrank from 3 to 0.35 µm, device size scaled by approximately 0.75 times every two years. Such exponential growth was fueled by smooth progress in process scaling, including advances in lithography, process technology, wafer size, and manufacturing yield.

As explained in Section 1.1.2.3, under an ideal constant electric field scaling scenario where both dimensions and voltage are halved, delay is halved and power reduced to 1/4, while power density remains unchanged. However, if voltage is kept unchanged, delay is further reduced by half. At a time when the engineering workstation was the technology driver, and hence performance the driving factor, and with the switch from n-channel metal–oxide–semiconductor (NMOS) to complementary metal–oxide–semiconductor (CMOS), which produced a three-orders-of-magnitude drop in power, constant voltage scaling became an easy choice over constant electric field scaling. But since constant voltage scaling results in a twofold increase in power and an eightfold increase in power density, from 1980 to around 1992, in 12 years IC power skyrocketed by increasing fourfold every three years (Figure 1.9).

Figure 1.9 Evolution of IC power and power density.

© 1999 IPSJ [Reference Kuroda8]

Eventually, when IC power exceeded 10 W, the problem of increasing power could no longer be ignored. Consequently, constant electric field scaling was introduced as a countermeasure, and voltage started to drop. Nevertheless, the electric field had risen to the point where charge carrier mobility had reached saturation. As a result, while ideally current scales as the square of the voltage, in reality it scaled only as V^1.3 (Table 1.2). Furthermore, leakage current prevented V from scaling fully in proportion to device dimensions. For example, while V scaled from 2.5 to 1.8 and then to 1.3 V for the 250, 180, and 130 nm process nodes respectively, its scaling slowed to 1.2, 1.0, and 0.9 V as the process scaled to 90, 65, and 45 nm. As a result, the power increase from scaling was not suppressed as much as desired, and IC power continued to climb, doubling every six years (Figure 1.9). Eventually, in the second half of the 1990s, it approached 100 W, and the IC heat generation problem became quite severe.

Table 1.2. Constant electric field scaling changes from around 1995.

Time period                    Pre-1995     Post-1995
Scaling factor
  Device size, x               1/2          1/2
  Gate oxide thickness, tox    1/2          1/2
  Voltage, V                   1/2          1/2
Resultant change
  Electric field, E            1            1
  Current dependency on V, β   2 (ideal)    1.3
  Current, I = V^β/tox         1/2          1/1.23
  Resistance, R = V/I          1            1/1.62
  Capacitance, C = x^2/tox     1/2          1/2
  RC delay, τ = CR             1/2          1/3.25
  Power, P = CV^2/τ            1/4          1/2.46
  Power density, p = P/x^2     1            1.62

Source: © 2007 IEICE, [Reference Kuroda9], table 1.
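The post-1995 column of Table 1.2 can be reproduced by replacing the ideal square-law current with the observed V^1.3 dependency:

```python
def scale_sat(x, tox, V, beta):
    """Table 1.2 relations, with current I ∝ V**beta / tox to model
    mobility saturation (beta = 2 ideal, ~1.3 observed post-1995)."""
    I = V**beta / tox
    R = V / I
    C = x**2 / tox
    tau = C * R
    P = C * V**2 / tau
    return dict(I=I, R=R, tau=tau, P=P, p=P / x**2)

pre  = scale_sat(0.5, 0.5, 0.5, beta=2.0)  # ideal square-law device
post = scale_sat(0.5, 0.5, 0.5, beta=1.3)  # mobility-saturated device
# post ≈ {I: 1/1.23, R: 1/1.62, tau: 1/3.25, P: 1/2.46, p: 1.62}
```

Note how the weaker current dependency leaves power density rising by 1.62× per generation even under nominally constant-field scaling, which is why IC power kept climbing.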

1.2.2 Low-Power IC

In the decade from around 1995 to 2005, low power became a key design goal in IC development, in addition to performance. Solutions were developed to scale the power wall, while process scaling continued to raise the transistor count. During this period, the technology driver shifted from the engineering workstation to personal computer, which further exacerbated the power problem due to the shrinking system form factor.

One low-power solution that drew a lot of attention was the joint optimization of the supply voltage (VDD) and the transistor threshold voltage (VTH). Figure 1.10 depicts how IC power and RC delay vary as a function of VDD and VTH. When VDD is lowered to reduce power, delay goes up. But if VTH is lowered accordingly, it is possible to suppress the rise in delay while significantly reducing power. For instance, if VDD and VTH are lowered simultaneously to move from operating point A to B in Figure 1.10b along the equi-delay line of 100 ps, there is no change in delay, but total power is reduced, as shown in Figure 1.10a.

Figure 1.10 IC power and RC delay as a function of VDD and VTH.

© 1995 IEICE, [Reference Kuroda and Sakurai2], figure 4

The optimization of VDD and VTH for the best trade-off of power and delay can be performed both dynamically during chip operation and spatially based on the design requirements of individual circuit blocks. With close monitoring of chip activity, VDD and VTH can be adjusted dynamically in accordance with the speed requirement at any particular time.

Spatially, circuit blocks that demand high speed can be supplied with a high VDD combined with a low VTH, while blocks that can afford to run slower are supplied with a low VDD combined with a high VTH, trading off execution time for lower power. The drawback is an increase in the complexity of the power supply, especially in the case of spatial optimization, where multiple voltages have to be generated at the same time.
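The move from point A to point B along an equi-delay line can be illustrated with the well-known alpha-power-law delay model and a simple subthreshold leakage model; the constants below are our own illustrative assumptions, not values from the chapter or from Figure 1.10:

```python
ALPHA = 1.3  # velocity-saturation exponent (assumed)

def delay(VDD, VTH):
    """Alpha-power-law gate delay: d ∝ VDD / (VDD - VTH)**ALPHA."""
    return VDD / (VDD - VTH) ** ALPHA

def power(VDD, VTH, C=1.0, f=1.0, I0=1e-3, S=0.1):
    """Dynamic switching power plus subthreshold leakage; leakage grows
    roughly 10x for every S volts that VTH is lowered."""
    return C * VDD**2 * f + VDD * I0 * 10 ** (-VTH / S)

# Operating point A: high VDD, high VTH.
VDD_A, VTH_A = 2.5, 0.5

# Point B: lower VDD, with VTH solved so that delay is unchanged,
# i.e. B sits on the same equi-delay line as A.
VDD_B = 1.8
VTH_B = VDD_B - (VDD_B / delay(VDD_A, VTH_A)) ** (1 / ALPHA)

# Delay is identical by construction, while total power drops:
# dynamic power falls much faster than leakage rises.
```

With these assumed constants, moving from A to B roughly halves total power at equal delay, which is the trade-off the figure illustrates.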

By moving the on-board DC–DC converter that supplies VDD to on-chip for internal adjustment, and by varying the substrate bias to adjust VTH, it is possible to dynamically optimize these two parameters. Figure 1.11 depicts an example processor that implemented an on-chip, variable VDD supply. By setting VDD to its minimum acceptable value for correct operation at any particular frequency, power consumption was reduced by up to 50%.

Figure 1.11 Power consumption reduction through VDD optimization.

Figure 1.12 depicts an example image processor that implemented an on-chip substrate bias circuit to vary VTH to achieve significant reduction in leakage current.

Figure 1.12 Leakage current reduction through VTH optimization. (b)

© 2002 IEEE. Reprinted, with permission, from [Reference Kuroda11]

Between 1996 and 1998, Toshiba in Japan successfully developed the world’s first processors with controllable VDD and VTH (Figure 1.13, [Reference Suzuki, Mita, Fujita, Yamane, Sano, Chiba, Watanabe, Matsuda, Maeda and Kuroda10, Reference Kuroda, Fujita, Mita, Nagamatsu, Yoshioka, Suzuki, Sano, Norishima, Murota, Kako, Kinugawa, Kakumu and Sakurai12, Reference Takahashi, Hamada, Nishikawa, Arakida, Fujita, Hatori, Mita, Suzuki, Chiba, Terazawa, Sano, Watanabe, Usami, Igarashi, Ishikawa, Kanazawa, Kuroda and Furuyama13]). The technology was soon adopted by Transmeta, followed by Intel and AMD, validating the need for processor developers to counter the pressing power problem.

Nonetheless, even though Intel had been aggressively driving up processor performance by raising clock frequency, in October 2004 the company changed course and abandoned its push beyond 4 GHz. It was more a business decision than an engineering one: it was technically possible to raise the clock frequency further, but the resulting leakage and heat problems would have become too expensive to solve. Instead, Intel shifted to boosting performance by increasing the number of processor cores.

1.2.3 Low-Power IC Interface

An IC must communicate with the outside world to perform useful functions. Therefore, in minimizing overall system power, in addition to minimizing power of the IC itself, IC interface power must also be taken into consideration. Conceptually, the IC interface can be divided into two components – the on-chip input/output (I/O) circuits (transmitters or drivers, and receivers) and the off-chip physical interconnections. Power optimization of the IC interface thus requires both low power I/O circuits and off-chip interconnect design.

Signal transmission in and out of the IC involves generating an analog waveform that represents the transmitted digital information, preserving the integrity (quality) of the waveform as it propagates, and receiving and correctly extracting the embedded digital information. It is therefore an operation in both the digital and analog domains. As a result, unlike their digital counterparts, I/O circuits' performance, including power consumption, does not scale proportionally with process. As transmission speed increases, circuit design must change to support the higher data rates, leading to rising power. For example, buffers must be inserted in the clock path for higher speed clock distribution to compensate for the higher attenuation at higher frequency. Furthermore, increased adoption of current-mode logic (CML) circuits to support faster digital switching also contributes to higher power. As a result, while IC interface power roughly followed the scaling of CMOS gate power in earlier processes, from around the 130 nm node it reversed course and rose steadily thereafter (Figure 1.14). To counter this trend, an effective solution is to limit the increase in signal data rate and instead increase the number of parallel signals transmitted to increase total interface bandwidth. This was facilitated by the shift in the placement of I/O circuits from along the perimeter to across the surface of the IC, which in turn led to the shift from wirebond to flip-chip packaging.

Figure 1.14 IC interface power scaling with process.

© 2007 IEICE, [Reference Kuroda9], figure 4

The off-chip physical interconnections of an IC generally consist of bond wires or flip-chip bumps and solder balls, package traces, vias, PCB traces, and connector pins, with a total length more than an order of magnitude greater than the size of the IC (Figure 1.15). Consequently, off-chip interconnects have different electrical behavior than on-chip interconnects. As a rule of thumb, when an interconnect has delay that satisfies the following condition, it appears as a transmission line to signal propagation:

2td > 0.1tr, (1.1)

where td is the propagation delay of the interconnect, and tr is the edge rate of the signal. Delay of a typical PCB interconnect is around 6 ps/mm, while the typical edge rate of a 1 Gbps signal is around 200 ps. Therefore, a PCB interconnect only needs to be longer than 1.67 mm to appear as a transmission line to a 1 Gbps signal. Hence, typical PCB interconnects in high-performance computing systems that are on the order of a centimeter or more in length behave as transmission lines. Furthermore, due to the combination of high conductivity and sufficiently large surface area, PCB interconnects are generally low-loss transmission lines with relatively low resistive loss and hence low power consumption.
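As a quick numerical check of Eq. (1.1), the following minimal sketch (the function names are ours, not from the text) uses the 6 ps/mm PCB delay and 200 ps edge rate quoted above:

```python
# Sketch of the transmission-line rule of thumb 2*t_d > 0.1*t_r, Eq. (1.1).
# Default numbers are the text's examples: 6 ps/mm PCB delay, 200 ps edge rate.

def is_transmission_line(length_mm, delay_ps_per_mm=6.0, edge_rate_ps=200.0):
    """True if the interconnect must be treated as a transmission line."""
    t_d = length_mm * delay_ps_per_mm       # one-way propagation delay, ps
    return 2.0 * t_d > 0.1 * edge_rate_ps   # Eq. (1.1)

def critical_length_mm(delay_ps_per_mm=6.0, edge_rate_ps=200.0):
    """Shortest length at which Eq. (1.1) starts to hold."""
    return 0.1 * edge_rate_ps / (2.0 * delay_ps_per_mm)

print(critical_length_mm())        # ≈ 1.67 mm, matching the text
print(is_transmission_line(10.0))  # a 1 cm PCB trace: True
```

With these numbers the critical length works out to 20/12 ≈ 1.67 mm, so any centimeter-scale PCB trace behaves as a transmission line to a 1 Gbps signal.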

Figure 1.15 An off-chip interconnection in a typical PC memory system.

The approximate electrical behavior of a low-loss transmission line is fully defined by its characteristic impedance Z0 and propagation delay γ, which can be computed as follows:

Z0 = √(L0/C0) (1.2)
γ = √(L0C0), (1.3)

where L0 and C0 are the per unit length inductance and capacitance respectively of the transmission line. In a low-loss transmission line, attenuation of signal amplitude is small. However, if the transmission line Z0 varies along its length, some of the signal energy will be reflected. When that happens, the shape of the received signal in a certain time slot depends not only on the shape of the transmitted signal in that time slot, but also the shape of reflection of the transmitted signal in prior time slots, resulting in what is known as intersymbol interference (ISI). Therefore, impedance variations along a transmission line result in ISI, which degrades integrity of the received signal.
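Equations (1.2) and (1.3) can be checked against typical PCB values. In the sketch below, the per-unit-length L0 and C0 are assumed values, chosen only to reproduce a 50 Ω trace with the ~6 ps/mm delay quoted earlier:

```python
import math

# Sketch of Eqs. (1.2)-(1.3): Z0 = sqrt(L0/C0), delay per unit length = sqrt(L0*C0).
# L0 and C0 below are illustrative (assumed), not measured values from the text.
L0 = 300e-9   # inductance per unit length, H/m
C0 = 120e-12  # capacitance per unit length, F/m

Z0 = math.sqrt(L0 / C0)    # characteristic impedance, ohms
t_pd = math.sqrt(L0 * C0)  # propagation delay, s/m

print(Z0)        # ≈ 50 ohms
print(t_pd)      # ≈ 6e-9 s/m, i.e. 6 ps/mm as quoted in the text
```

Note that Z0 depends only on the ratio L0/C0 while the delay depends on their product, which is why a geometry change can alter Z0 without much change in delay.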

When a voltage step VI arrives at an impedance discontinuity, a voltage step VR is reflected, resulting in a voltage step of VT being transmitted (Figure 1.16a). The voltage steps are related through the following two equations:

ρ = VR/VI = (Z2 − Z1)/(Z2 + Z1) (1.4)
τ = VT/VI = 1 + ρ = 2Z2/(Z2 + Z1), (1.5)

where ρ and τ are known as the reflection and transmission coefficients, and Z1 and Z2 are the characteristic impedances of the transmission line before and after the discontinuity respectively. Therefore, the goal of off-chip interconnect design is to maintain a smooth characteristic impedance profile along its entire length to minimize reflection in order to preserve the integrity of the transmitted signal. Nevertheless, given the complexity of the off-chip interconnect, characteristic impedance variations along its length are unavoidable even with meticulous design optimization (Figure 1.16b).
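Equations (1.4) and (1.5) translate directly into code. The sketch below evaluates them for an assumed discontinuity where a 50 Ω trace meets a 75 Ω trace; the impedance values are illustrative only:

```python
# Sketch of the reflection and transmission coefficients, Eqs. (1.4)-(1.5).

def reflection_coeff(Z1, Z2):
    """rho = (Z2 - Z1) / (Z2 + Z1), Eq. (1.4)."""
    return (Z2 - Z1) / (Z2 + Z1)

def transmission_coeff(Z1, Z2):
    """tau = 1 + rho = 2*Z2 / (Z2 + Z1), Eq. (1.5)."""
    return 1.0 + reflection_coeff(Z1, Z2)

# Assumed example: a 50-ohm trace meeting a 75-ohm trace.
rho = reflection_coeff(50.0, 75.0)    # 0.2: 20% of the incident step reflects
tau = transmission_coeff(50.0, 75.0)  # 1.2
print(rho, tau)
```

A matched line (Z1 = Z2) gives ρ = 0 and τ = 1, which is exactly the smooth impedance profile the text identifies as the design goal.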

Figure 1.16 Impedance variations of off-chip interconnects.

To compensate for the loss in integrity of the transmitted signal due to impedance variations of off-chip interconnects, I/O circuits must be modified accordingly, for example by adding equalization circuits, resulting in increased complexity and power consumption. Furthermore, the driver at the near end of the interconnect generally has a lower impedance, and the receiver at the far end a higher impedance, than the interconnect itself, representing additional discontinuities. Minimizing reflection due to these discontinuities requires adding resistors at the driver and/or receiver to alter their impedances, resulting in additional area, cost, as well as power consumption. Consequently, a more effective way to preserve signal integrity of a low-power IC interface is to minimize the communication distance to limit transmission line effects in order to reduce I/O circuit complexity and power consumption.

Putting the entire system on a single chip in the form of an SoC will result in the shortest communication distances. However, in practice, there is a limit on how much of the system can be brought on-chip. In addition to the high manufacturing costs of a bigger die, a larger and more complex SoC increases development costs and time to market. Furthermore, an SoC precludes tailoring the manufacturing process toward individual functions to minimize manufacturing costs. For instance, logic and memory chips are manufactured in substantially different processes for economic reasons. Therefore, putting the two functions on the same chip means the manufacturing process would be suboptimal for one or both functions, leading to suboptimal costs and performance. An alternate solution to minimize communication distance that is compatible with a multichip system is a system-in-package (SiP). The best example of an SiP is the memory subsystem of a smartphone implemented in a package-on-package (PoP), where a packaged memory chip is stacked on top of the application processor (AP) mounted on a second package substrate, thus eliminating the PCB connection that exists when the two chips are mounted separately on the board (Figure 1.17).

Figure 1.17 A package-on-package in a smartphone.

In the smartphone PoP, the AP and memory chips are connected indirectly through bond wires, solder bumps, and package substrate traces. In a later section, we will introduce a wireless chip interface that allows direct connection between multiple chips, thereby reducing the communication distance to a minimum.

1.2.4 Energy-Efficient IC

Around 2005, the technology driver for the IC industry changed again, from the PC to the smartphone, which demands continuous, aggressive downsizing. Minimum silicon line width has continued to shrink from 0.13 µm to 28 nm, and down to 7 nm in the A12 application processor of the iPhone XS released in 2018, for example. Meanwhile, the battery has become a severe performance limiter, since it occupies a large percentage of both the weight and the size of a smartphone. Furthermore, the perceived performance of the device is greatly affected by how often the battery needs to be recharged, and hence by how much energy its components consume. As a result, low energy consumption has become a key IC design objective.

With the smartphone market growth slowing, new technology drivers are emerging. Two of them are undoubtedly data center and IoT, each of which presents different design challenges than the smartphone. Servers and supercomputers for the data center drive the advance in power efficiency. Furthermore, given their tremendous power consumption, the heat they generate becomes a limiting factor on performance, while cooling becomes another large adder to operating costs. On the other hand, IoT, which includes wearables and implantables together with smartphones, demands aggressive reduction in energy consumption. Some of their applications require very small form factor, which significantly limits battery capacity.

Power and energy consumption are interrelated. But while power P = αCVDD²f is a linear function of switching frequency f and capacitance C, and a quadratic function of voltage VDD, energy E = CVDD² varies only with C and VDD (α is the switching probability). Reducing VDD is therefore an effective way to reduce both power and energy consumption. In practice, when VDD is initially lowered, power and energy will indeed decrease proportionally. But when VDD drops below a certain value, power and energy will reverse course and rise (Figure 1.18). This is due to rising leakage current. When VDD is lowered, VTH needs to be lowered accordingly to avoid performance loss. But when VTH is lowered, leakage current rises exponentially. Consequently, leakage current must be taken into account when considering overall power and energy efficiency. In the case of a metal–oxide–semiconductor field-effect transistor (MOSFET), transistor switching is controlled by the electric field between the gate and the substrate. This electric field is like the packing attached to the handle of a water faucet that controls the water flow. If this packing is weakened, the faucet will leak. In the MOS transistor, when the gate length is reduced without a corresponding reduction in oxide thickness, the electric field controlling switching is weakened, resulting in increased leakage. Therefore, as we lower VDD to reduce power and energy consumption, we must implement countermeasures against rising leakage, such as the material and structural changes described in Section 1.1.2.4 and Figure 1.3.
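The interplay between the dynamic term P = αCVDD²f and leakage can be sketched numerically. In the toy model below, the leakage expression and every device number in it are assumed values, used only to illustrate the qualitative behavior behind Figure 1.18:

```python
import math

# Sketch of the text's relations: dynamic power P = alpha*C*VDD^2*f, plus an
# assumed exponential subthreshold leakage term. All device numbers are invented.

def dynamic_power(alpha, C, VDD, f):
    """Dynamic switching power, P = alpha*C*VDD^2*f."""
    return alpha * C * VDD**2 * f

def leakage_power(VDD, VTH, I0=1e-3, S=0.1):
    """Toy leakage model: leakage current grows exponentially as VTH drops."""
    return VDD * I0 * math.exp(-VTH / S)

# Quadratic VDD dependence: halving VDD quarters the dynamic power.
p1 = dynamic_power(0.1, 1e-9, 1.0, 1e9)
p2 = dynamic_power(0.1, 1e-9, 0.5, 1e9)
print(p1 / p2)  # 4.0
```

Lowering VTH along with VDD (to preserve speed) makes the leakage term grow exponentially, which is why total power eventually turns back up at low VDD.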

Figure 1.18 IC power as a function of VDD [Reference Kuroda3].

Given that the overall power consumption of a chip is the sum of its active power Pactive and leakage power Pleak, it is a function of how active the chip is. A logic chip that spends a lot of time performing active computation has a larger proportion of active power than a memory chip that is active only during a memory access in a localized area and during management operations. As a result, the optimal VDD and VTH for the lowest power and energy are different between logic and memory chips (Figure 1.19). It is therefore necessary to understand the operation of a chip when we design for low power and low energy.

Figure 1.19 Logic and memory energy consumption as a function of supply voltage.

As an illustration, experience shows that the lowest overall power is achieved when active and leakage power are around 80% and 20% respectively of overall processor power. For example, both Intel's Pentium 4 and IBM's POWER5 had about 20% leakage power.

Figure 1.20 depicts the energy consumption as a function of logic and memory VDD for an example processor in a 32 nm CMOS process [Reference De15]. Based on these data, overall energy is minimized at a logic VDD of around 0.45 V. The Low-Power Electronics Association & Project (LEAP) in Japan has successfully developed a new device known as silicon on thin buried oxide (SOTB) that can operate down to 0.4 V [Reference Ishigaki, Tsuchiya, Morita, Sugii, Kimura and Swart16], which makes it possible to operate the processor at its optimal VDD for low power.

Figure 1.20 Optimized VDD for minimum total energy.

© 2014 IEEE. Reprinted, with permission, from [Reference De15]

1.2.5 Energy-Efficient Computing Trends

With SOTB, a low voltage of 0.4 V has been achieved. Yet, there is headroom for further reduction. For logic circuits to function correctly, the output of a CMOS inverter must be able to achieve full swing up to its supply voltage of VDD to drive the inverter in the next stage. Theoretically, VDD can be lowered to 0.036 V before the inverter gain drops below 1. Therefore, VDD can be lowered by another order of magnitude. Since energy is a quadratic function of voltage, there is room to reduce energy further by two orders of magnitude.
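The headroom argument is simple arithmetic, which the following sketch makes explicit using the 0.4 V and 0.036 V figures from the text:

```python
# Arithmetic behind the headroom claim: SOTB's 0.4 V operating point vs. the
# ~36 mV theoretical inverter gain limit quoted in the text.
v_now = 0.4    # V, demonstrated SOTB supply voltage
v_min = 0.036  # V, theoretical limit before inverter gain drops below 1

voltage_headroom = v_now / v_min         # ≈ 11x: about one order of magnitude
energy_headroom = voltage_headroom ** 2  # ≈ 123x: about two orders (E ∝ VDD^2)
print(voltage_headroom, energy_headroom)
```

Since E = CVDD², an order-of-magnitude reduction in voltage translates into roughly two orders of magnitude in energy, as the text states.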

Nevertheless, in practice, a lower VDD is difficult to achieve. When VDD is lowered further, gate operation becomes unstable. For example, since the on and off (leakage) currents of the gates inside a flip-flop work against each other, when the on-to-off current ratio becomes too small, the circuit will stop functioning. It is therefore necessary to maintain a certain balance between on and off currents to ensure proper operation. Furthermore, the increase in leakage current resulting from lower voltage, as discussed in the previous section, will need to be addressed. As a result, various research efforts are under way to develop steep-slope devices with steeper subthreshold characteristics [Reference Topaloglu and Wong17].

In addition, as voltage is lowered further, manufacturing variations will surface as another challenge. When VDD approaches the threshold voltage, small changes in VTH will lead to large changes in current, resulting in large variations in circuit speed that cannot be ignored. As a result, synchronous circuit designs may not be possible anymore [Reference Fant18–Reference Maruyama, Hamada and Kuroda19]. One solution is to allow a certain amount of functional errors, as long as the probability of an error is sufficiently low. In other words, the chip will be functionally correct most of the time, but not 100% of the time. It is a statistical approach.

Of course, this is not acceptable for mission-critical applications such as airplane and automotive navigation, as well as financial transaction computation. On the other hand, there are applications where small errors do not alter the outcome, such as image and face recognition (Figure 1.21).

Figure 1.21 Statistical system design approach and its application.

The statistical system design approach is compatible with some of the trending new applications such as image recognition and artificial intelligence. A hot area of technology development in image recognition and artificial intelligence is machine learning, where a deep neural network (DNN) is used to create an artificial intelligence system. The excitement in DNN research started when one such network achieved a breakthrough in dramatically reducing the error rate from 25% to 16% in image recognition in the 2012 ImageNet Large Scale Visual Recognition Challenge. Technology bellwethers such as Google, Microsoft, Amazon, and Baidu are all competing to deliver the next breakthrough in machine learning, in addition to startups funded by billions of dollars in venture capital [Reference Rowley20]. The technology is rooted in mimicking the algorithms humans use for image recognition. Since human actions are based on statistical and probability analysis, a statistical design approach can be a good fit for developing energy-efficient machine learning systems.

To see how much we need to improve power efficiency for machine learning applications, the natural benchmark for comparison is the human brain. The human brain consumes about 0.3 fJ of energy per computation, corresponding to a power efficiency of 15 W/50 PFLOPS. By contrast, Intel processors from 2010 consume about 0.3 nJ of energy per computation, resulting in an efficiency of 15 W/50 GFLOPS (Figure 1.22). In other words, the human brain is six orders of magnitude more power efficient. It will take a lot of effort, including introducing the statistical system design approach, running the machine nonstop to gain an advantage over the human brain, which needs to sleep every day, and so forth, in order to close this gap.
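The six-orders-of-magnitude claim follows directly from the quoted figures, as the short calculation below confirms:

```python
# Arithmetic behind Figure 1.22's comparison, using the figures quoted in the text.
brain_energy_per_op = 15.0 / 50e15  # 15 W / 50 PFLOPS -> joules per operation
cpu_energy_per_op = 15.0 / 50e9     # 15 W / 50 GFLOPS -> joules per operation

print(brain_energy_per_op)                      # ≈ 3e-16 J = 0.3 fJ
print(cpu_energy_per_op / brain_energy_per_op)  # ≈ 1e6: six orders of magnitude
```

Dividing the same 15 W budget by throughputs that differ by a factor of a million yields exactly the 0.3 fJ vs. 0.3 nJ per-operation gap cited in the text.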

Figure 1.22 Power efficiency comparison: machine vs. human.

Another trend in energy-efficient computing is distributed computing. At the system level, the trend is to distribute processing over many physically small processing elements that are connected through short-range wireless communication so no one element has to work very hard. Early development efforts include Ubiquitous Computing (PARC/Weiser), Things That Think (MIT), Disappearing Computer (EU), Invisible Computing (Microsoft), Pervasive Computing (IBM), Paintable Computing (MIT), and Pushpin Computing (MIT). The proliferation of IoT can be considered an implementation of distributed computing at a more global level.

Figure 1.23 depicts an example of distributed computing and its benefits. It illustrates how the problem of searching for a parking space can be solved efficiently with short-range wireless communication and small capacity memory. First, a parking space broadcasts its availability to a passing car (Car3). Then the information is passed from this car to another (Car2) when they cross each other, and the process continues. A car (Car1) searching for a space can capture this information as it crosses Car2 without being in the vicinity of the parking space to directly receive the broadcast information. This is akin to the spread of fragrance, where locating the parking space is like tracing the fragrance back to its origin. The “fragrance” together with the group of cars that spread it makes the environment intelligent, or smart, to use the buzzword of the IoT era. With this smart environment, only short-range communication is required. Therefore, the amount of energy needed to locate the parking space can be dramatically reduced compared to random search using the traditional, centralized processing approach, as illustrated in Figure 1.23b.

Figure 1.23 Using “fragrance” to create a smart environment.

© 2007 IEICE, [Reference Kuroda9], figure 5

1.2.6 Closing Thoughts

Low-power and energy-efficient designs have allowed the IC to continue to scale. Using a 10 nm process, it is now possible to put 100 million transistors on 1 mm² of silicon [Reference Tyson21], resulting in 0.1 trillion transistors on a 1,000 mm² chip. While still two orders of magnitude away, the transistor count is starting to approach the 60 trillion cell count of the human body (Figure 1.24). When the chip transistor count finally crosses over the human body cell count, IC development may become dominated by innovative application of the large quantity of transistors made available through advances in power and energy efficiency to achieve the next level of enrichment of our life.

Figure 1.24 Critical point in IC technology development [Reference Kuroda1].

1.3 Evolution from 2D to 3D Integration

1.3.1 Motivation for 3D Integration

IC chips and other semiconductors are generally packaged and mounted individually on a PCB to build a computer system. Since the individual components are mounted on a 2D surface and connected primarily through traces running in the x- and y-direction (plus vertical vias), this is known as 2D integration. However, as discussed in Section 1.2.3, this complicated off-chip interconnect structure can result in signal integrity (SI) problems due to transmission line effects, which lead to more complex I/O circuit design and higher power. To circumvent these SI problems, 3D integration is adopted where chips are placed on top of each other vertically and connected using primarily interconnects in the z-direction to minimize the communication distance between chips (Figure 1.25). Moreover, stacking chips on top of each other instead of spreading them across a 2D surface minimizes package footprint, which is critical for applications with limited board area such as mobile. This is exemplified by the use of PoP in a smartphone as introduced in Section 1.2.3. But in recent years, a new driver of 3D integration has emerged in the form of More than Moore. As the name suggests, this is an effort to offer more performance than what Moore’s law delivers.

Figure 1.25 From 2D to 3D integration.

As explained in Section 1.1, we can no longer achieve IC performance increase through Moore's law scaling without having to improve power efficiency. Furthermore, around the 28 nm process node, manufacturing cost per transistor started to rise. In addition, nonrecurring engineering (NRE) and fixed costs including process development cost, IC design cost, mask cost, and manufacturing equipment cost are all going up exponentially with advanced process scaling. For example, it was predicted in 2018 that 3 nm would cost $4 billion to $5 billion in process development, while total IC design cost including software development and validation was estimated to almost triple from 16 to 7 nm [Reference Hruska22]. Meanwhile, extreme ultraviolet (EUV) lithography manufacturing equipment development was estimated to have already cost $14 billion by 2014 [Reference Manners23]; yet another four years of development were required before its initial production use was announced in 2018 [Reference Moore24]. It is therefore imperative for the IC industry to explore alternatives to Moore's law scaling to continue to improve IC cost performance. Since Moore's law has been about scaling of 2D device structures, it is only natural to explore using the third dimension as an alternative or complement, hence the emergence of 3D solutions.

1.3.2 Monolithic 3D IC

The most advanced IC processes today employ a FinFET, which is a 3D device. This is because in a FinFET the transistor gate is no longer two-dimensional but instead is wrapped around the channel, utilizing the z-direction as well (Figure 1.26a). Another way to extend the 2D IC to 3D is a monolithic IC where multiple device layers are integrated in the vertical direction within the IC. A 3D NAND flash memory chip is one such example where multiple layers of memory cells are stacked on top of each other (Figure 1.26b). Scaling is achieved by increasing the number of layers. One merit of this solution is that since memory cell layers are only nanometers thick, stacking does not appreciably increase the overall thickness of the IC. In 2018, the most advanced 3D NAND flash in production had a total of more than 90 cell layers.

Figure 1.26 Monolithic 3D ICs – FinFET and 3D NAND.

(b) © 2011 Toshiba Corporation. Reprinted, with permission, from [Reference Aochi, Katsumata and Fukuzumi25]

However, since IC manufacturing cost and yield are highly dependent on the number of processing steps and processing complexity, monolithic 3D IC has significant cost and yield disadvantage, such that its adoption for mass production has been limited to NAND flash. Instead, 3D integration of IC chips is the leading 3D solution for logic and DRAM memory chips.

1.3.3 Conventional 3D Integration Solutions

In addition to lowering IC manufacturing cost, 3D integration of IC offers two advantages over monolithic 3D IC and Moore's law scaling of 2D IC. First, by keeping digital, analog, and memory functions implemented as separate chips, the manufacturing process for each function can be independently optimized to achieve better overall performance. Second, 3D integration enables reuse of intellectual property (IP) by mixing and matching different chips to develop different systems without IC redesign, significantly reducing development cost in terms of both engineering cost and time to market.

Due to these benefits, 3D integration of IC chips based on conventional packaging technologies has been in use for more than two decades. Figure 1.27 (from [26] with added PoP data point based on [Reference Yano, Sugiyama, Ishihara, Fukui, Juso, Miyata, Sota and Fujita27]) depicts the transition of IC packaging solutions from 2D to 3D integration based on conventional packaging technologies in the late 1990s and the evolution of 3D integration technology thereafter. The initial solution was a stacked IC wirebond (WB) package where multiple chips are stacked face-up and wirebonded to a ball grid array (BGA) or chip scale package (CSP) substrate. The chips can be stacked directly on top of each other to minimize overall package height, or with spacers in between to create vertical clearance for wirebonding when the stacked chips are square and of the same size (Figure 1.28a and b). Then in early 2000, PoP was introduced, which stacks single- or multichip BGA packages on top of each other (Figure 1.28c).

Figure 1.27 Evolution from 2 to 3D IC integration.

© 2005 Nikkei BP [26]; photo: © 2016 Toshiba Corporation [Reference Matsudera and Kawasaki28]

Figure 1.28 Conventional 3D IC integration solutions.

(a) © 2007 IEEE. Reprinted, with permission, from [Reference Worwag and Dory29]

Table 1.3 compares the stacked IC WB package with PoP. While innovations are necessary to reduce wire length and loop height in the stacked IC WB package to increase the number of stacked chips, wirebonding has been in mass production for years and is therefore low cost and reliable. However, since bare dice are difficult to test, it is challenging to provide known good dice. As a result, the packaged part may have a yield problem, especially as the number of chips increases. By contrast, individual packages in a PoP are tested by their respective suppliers before they are stacked. Furthermore, the PoP assembly is done by the system user who is free to mix and match the individual packages. This provides flexibility not only in changing the content of each packaged part, such as changing the mix of memory types or capacity, but also in multisourcing. Meanwhile, since the interconnection between individual packages consists of solder balls and substrate traces, its electrical performance is generally better than wirebond interconnection, which is long and narrow. The lower package, with a full ball grid array, has high interconnection density. However, the upper package, with depopulation of the ball grid array in the center to accommodate the die on the lower package, may have limited interconnection density, especially when the lower package has a large die. Individual packages can use either wirebonding or flip-chip for die attachment. In fact, as previously noted, PoP is commonly used to build a memory SiP in a smartphone where the SoC is housed in a high-performance flip-chip package at the bottom, while memory dice in the upper package are connected to the package substrate with bond wires. Nevertheless, PoP has a disadvantage in package height since individual packages, with a substrate, solder balls, bond wires, and molding compound, are significantly thicker than individual chips.

Table 1.3. Comparison of conventional 3D IC integration solutions.

                                        Stacked IC WB package   PoP
Cost                                    +                       −
Yield (known good die)                  −                       +
Electrical performance                  −                       +
Interconnection density                 −                       Upper package: −
                                                                Lower package: +
System flexibility (IC mix-and-match)   −                       +
Package height                          +                       −
Application                             High-capacity memory    SoC memory SiP

Rather than stacking heterogeneous chips to create an SiP, the stacked IC WB package, given its drawbacks, is commonly used for stacking homogeneous chips to create a high-capacity memory solution. This is because stacking of low-cost memory chips requires a low-cost solution. Furthermore, being low cost and having high yield means known good die is not as important a requirement. Finally, many chips need to be stacked for capacity, making a stacked IC WB package a better fit than PoP. For instance, [Reference James30] reports a nine-stack, 32 Gb micro-SD flash memory card from SanDisk, as well as a 16-stack, 64 Gb NAND part in an iPhone from Samsung.

Meanwhile, despite the advantages of PoP over the stacked IC WB package, its height disadvantage limits its applicability. When Sharp introduced its PoP technology in 2002, it showed a three-stack example [Reference Yano, Sugiyama, Ishihara, Fukui, Juso, Miyata, Sota and Fujita27], so the technology supports stacking of more than two packages. However, the total stack thickness increases rapidly as the number of packages increases, since package thickness is on the order of 500 µm (without including the solder ball standoff) as opposed to die thickness that is on the order of 50 µm after thinning. As a result, direct die stacking is the preferred solution for stacking many ICs together.

Nevertheless, the use of wirebonding technology in the stacked IC WB package makes it difficult to overcome its disadvantages. In particular, the poor electrical characteristics of bond wires, which result in poor signal and power integrity, prevent the stacked IC WB package from being used for high-performance chips such as processors, especially as wire length increases with the number of chips being stacked. Furthermore, since wirebonding is a peripheral interconnect technology where bond pads are placed along the perimeter of the IC, most commonly in one or two staggered rows, it is difficult to significantly increase the interconnection density to increase bandwidth. Consequently, an area array interconnect technology with better electrical characteristics and shorter interconnects is needed for heterogeneous 3D chip integration for high-performance systems.

1.3.4 Advanced 3D Integration Solutions

To address the limitations of conventional 3D IC integration solutions, various advanced solutions have been in development to dramatically increase interconnection density and/or the number of ICs that can be integrated. These solutions can generally be classified as wired vs. wireless, and nst = 2 (face-to-face) vs. nst ≥ 2 (free orientation) solutions, where nst is the number of stacked chips (Figure 1.29, [Reference Ezaki, Kondo, Ozaki, Sasaki, Yonemura, Kitano, Tanaka and Hirayama31]–[Reference Mizoguchi, Yusof, Miura, Sakurai and Kuroda34]). All four classes of solutions offer significant improvement in interconnection density and communication distance compared to conventional solutions. Since connection terminals are placed in an area array instead of a peripheral array in all solutions, the number of interconnections can be increased in a quadratic fashion. On the other hand, because adjacent chips communicate with each other directly instead of through a package substrate, communication distance and hence interconnect lengths are greatly reduced.

Figure 1.29 Advanced 3D IC integration solutions.

Two variants of wired 3D IC integration solutions using microbumps [Reference Ezaki, Kondo, Ozaki, Sasaki, Yonemura, Kitano, Tanaka and Hirayama31], [Reference Kumagai, Yang, Izumino, Narita, Shinjo, Iwashita, Nakaoka, Kawamura, Komabashiri, Minato, Ambo, Suzuki, Liu, Song, Goto, Ikenaga, Mabuchi and Yoshida35] and through-silicon-vias (TSVs) [Reference Burns, McIlrath, Keast, Lewis, Loomis, Warner and Wyatt32] respectively have entered production. Microbump technology employs small solder bumps formed on the chip surface to vertically connect two chips together. Since connection terminals are on the front side of the chips, the two chips must be oriented face to face for interconnection, which precludes integration of more than two chips. Hence, nst = 2. On the other hand, with TSV, as the name suggests, via holes through the silicon substrate are created to add connection terminals to the backside of the chip as well. Interconnection between chips may still be created using microbumps. But since interconnection can be established using both the front and backside of the chip, integration of more than two chips (nst ≥ 2) becomes possible. TSV therefore has broader applicability than microbump alone.

For wireless 3D IC integration, the initial solution proposed uses capacitive coupling [Reference Kanda, Antono, Ishida, Kawaguchi, Kuroda and Sakurai33], [Reference Hopkins, Chow, Bosnyak, Coates, Ebergen, Fairbanks, Gainsley, Ho, Lexau, Liu, Ono, Schauer, Sutherland and Drost36], [Reference Gu, Xu, Ko and Chang37]. It establishes an electrical connection by aligning connection terminals in the form of metal pads on the surfaces of the mating chips across a dielectric gap to form a capacitor. Since the impedance of a capacitor is inversely proportional to frequency, this is an alternating current (AC) only connection with no direct physical contact – information is passed through the gap between mating chips by varying the electric field associated with the capacitor. Given that an electric field cannot effectively penetrate a silicon substrate, this capacitive coupling solution can only be applied to two chips oriented face to face. Hence, nst = 2. To overcome this and other shortcomings of the capacitive coupling solution, an alternative solution using inductive coupling has been developed that replaces the matching metal pads with matching inductive coils and communicates by changing the magnetic field linked to the coils [Reference Mizoguchi, Yusof, Miura, Sakurai and Kuroda34], [Reference Miura, Mizoguchi, Inoue, Tsuji, Sakurai and Kuroda38]–[Reference Miura, Kohama, Sugimori, Ishikuro, Sakurai and Kuroda41].

1.3.4.1 Nature of a TSV

Since electronic circuits are formed on the top surface of an IC, the straightforward way to connect to them is from the front side. Therefore, the natural way to stack more than two dice over each other is to stack all of them facing up and connect the top surfaces of the individual dice to a common point to create interconnections, which is how it is done in a conventional wirebond stack. To enable a drastically different integration scheme, the die must be additionally processed in such a way as to enable its electronic circuits to be connected from the back side as well. Such is the concept of a TSV. It is an electrical connection that traverses the silicon substrate, thereby enabling connection to the active circuits of the chip from its back side.

Figure 1.30 depicts the basic structure of a TSV fabricated using one of two commonly employed processes, known as the via-middle process [Reference Denda42]. It connects the metal layers on the front side of the die to its back side. A microbump on each of the front and back sides allows the die to connect with an adjacent die above and below respectively. While the die in a single-chip package is generally 200–500 µm thick, for TSV 3D integration it is thinned to about 50 µm. A thinner die helps to keep the stack height low, even when many dice are stacked. More importantly, it helps to keep the TSV short and hence its diameter small, while maintaining a certain via aspect ratio for manufacturability. A shorter TSV in turn results in a shorter fabrication time. A 50 µm die thickness results in a via diameter between 3 µm and 20 µm, with 5 µm being a typical value. The via wall consists of a SiO2 insulating layer; a barrier layer made, for example, of tantalum (Ta), titanium oxide (TiO2), or titanium nitride (TiN) to prevent copper from diffusing into the silicon substrate; and a copper seed layer. The copper seed layer serves as an electrode for plating to fill the via with copper, thereby completing the electrical connection from the front to the back side of the die. Even from this simplified picture, it is clear that the TSV consists of multiple structural components made of different materials. This has implications for processing complexity as well as for yield and reliability due to thermomechanical stresses.

Figure 1.30 Structure of a TSV.

Reproduced by permission from Sei-ichi Denda, 半導体の高次元化技術 [Enabling Technology for Higher Dimensional Semiconductors], Japan: Tokyo Denki University Press, 2015. © Denda Sei-ichi 2015
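The interplay of die thickness, via aspect ratio, and via diameter described above can be sketched with a quick calculation. This is a minimal illustration; the maximum aspect ratio of 10 is an assumed round number chosen to be consistent with the 50 µm thickness and 5 µm typical diameter quoted in the text.

```python
# Why die thinning matters for TSVs: for a given maximum manufacturable
# via aspect ratio (depth / diameter), the minimum via diameter grows
# linearly with die thickness, and so does the via plating time.

def min_tsv_diameter_um(die_thickness_um, max_aspect_ratio=10.0):
    """Minimum via diameter (in µm) for a TSV spanning the die thickness."""
    return die_thickness_um / max_aspect_ratio

# Thinned die for 3D stacking vs. typical single-chip-package thicknesses
for thickness_um in (50, 200, 500):
    print(f"{thickness_um:3d} um die -> via diameter >= "
          f"{min_tsv_diameter_um(thickness_um):.0f} um")
```

At the 50 µm thickness used for TSV stacking, this reproduces the typical 5 µm diameter; at unthinned 200–500 µm thicknesses, the vias would become impractically wide.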

The via-middle process is so named because the TSV formation occurs in the middle of the IC fabrication process, after the front end of line (FEOL) process to create the transistors, but before the back end of line (BEOL) process to form the metal layers. A second common process known as via-last creates the TSVs at the end of the IC fabrication process, after wafer thinning and bump formation. Figure 1.31 depicts where the TSV formation occurs within the IC fabrication flow, including the temperature profiles for each of the fabrication steps. The two TSV processes are different not only because of the different positions within the IC fabrication process flow, but also because of the very different process temperature profiles they are exposed to, resulting in different process complexities and cost and yield equations. Furthermore, for outsourced manufacturing, while the via-middle TSV process falls naturally within the flow of the foundry, the via-last process may occur either at the foundry or outsourced semiconductor assembly and test (OSAT) vendor. As a result, the choice of the TSV fabrication process has supply chain implications as well.

Figure 1.31 TSV formation within the IC fabrication process flow.

Reproduced by permission from Sei-ichi Denda, 半導体の高次元化技術 [Enabling Technology for Higher Dimensional Semiconductors], Japan: Tokyo Denki University Press, 2015. © 2015 Denda Sei-ichi

Figure 1.32 contrasts the use of TSV versus wirebond in 3D IC integration. The following key differences can be observed:

  • TSV enables a smaller form factor: the wirebond stack requires additional area and height to accommodate the wire loops as well as bond pads on the package substrate. In addition, while the example in the figure uses a 90-degree rotation of every other die to create clearance for the wire loops, when the dice are square and of the same size, spacers must be inserted between adjacent dice to create such clearance, which increases the stack height further.

  • TSV enables more robust interconnection: the visual comparison in Figure 1.32 illustrates the contrast in interconnection complexity between the two solutions. While the interconnects in the TSV stack are short and tucked away, those in the wirebond stack are long and susceptible to mechanical force that can tangle them and cause electrical shorts.

  • TSV enables a larger number of dice to be stacked: in the wirebond stack, the number of wires and their maximum length increase with the number of dice. Therefore, the maximum number of dice in the stack is limited by manufacturability and electrical requirements. Longer wires are more susceptible to sweeping force that can cause electrical shorts. Furthermore, longer wires have higher inductance that can lead to unacceptably large power supply noise and signal degradation, especially at high frequency.

  • The TSV stack enables a higher interconnect density: while both TSV and wirebond pitches are on the order of 50 µm, TSVs are placed across the surface of the die while wirebonds are limited to its perimeter. Consequently, TSV increases interconnect density in a quadratic fashion.

TSV is therefore an advanced technology offering a small-form-factor, robust, high-capacity, and high-interconnect-density stacking solution for 3D IC integration.
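The quadratic advantage of area-array terminals over peripheral pads noted above can be made concrete with a simple count. This is an illustrative sketch; the 10 mm die size is an assumed example, and only the 50 µm pitch comes from the text.

```python
# Peripheral (wirebond) vs. area-array (TSV) terminal counts for a
# square die at the same pitch. Peripheral capacity grows linearly
# with the die edge length; area-array capacity grows quadratically.

def peripheral_count(die_mm, pitch_um):
    """Pads in a single row along the four die edges (corners counted once)."""
    per_edge = int(die_mm * 1000 / pitch_um)
    return 4 * per_edge - 4

def area_count(die_mm, pitch_um):
    """Terminals on a full square grid across the die face."""
    per_side = int(die_mm * 1000 / pitch_um)
    return per_side * per_side

die_mm, pitch_um = 10.0, 50.0  # assumed 10 mm die, 50 µm pitch from the text
print(peripheral_count(die_mm, pitch_um))  # hundreds of peripheral pads
print(area_count(die_mm, pitch_um))        # tens of thousands of area terminals
```

Doubling the die edge roughly doubles the peripheral count but quadruples the area-array count, which is the quadratic scaling described above.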

Figure 1.32 3D IC integration technology comparison: wirebond vs. TSV.

Reproduced by permission from Sei-ichi Denda, 半導体の高次元化技術 [Enabling Technology for Higher Dimensional Semiconductors], Japan: Tokyo Denki University Press, 2015. © 2015 Denda Sei-ichi
1.3.4.2 TSV for Wired IC Integration

The first industrywide effort to commercialize the application of TSV to wired 3D IC integration was driven by the Joint Electron Device Engineering Council (JEDEC) through its release of the Wide I/O specification in 2011. At the time, the mobile DRAM industry was looking for a solution to deliver a quantum leap in energy efficiency. The transition from the first- to second-generation low-power DRAM technology, LPDDR to LPDDR2, achieved an almost 50% reduction in I/O energy per bit in conjunction with a doubling of bandwidth, resulting in essentially the same total power. However, the next transition achieved only another 35% reduction in energy per bit while the bandwidth again doubled, resulting in a 31% increase in total power. This is summarized in Table 1.4, which is based on data published in [Reference Denda42] and [Reference Kim43]. The continuous increase in data rate demanded by the industry was anticipated to make improvement in energy efficiency more and more difficult as LPDDR evolved. In particular, for the memory interface, higher data rates exacerbate signal integrity problems, including signal reflection, intersymbol interference (ISI), and crosstalk. Furthermore, skew in delay across LPDDR’s parallel data bus becomes a larger and larger percentage of bit time, especially as the bus width increases to achieve higher bandwidth. Meanwhile, a higher-frequency clock is harder to generate and distribute due to higher jitter, higher sensitivity to noise, and higher attenuation. All these problems require architectural and circuit improvements to solve, such as the use of equalization circuits, transmission line terminations, signal phase alignment circuits, phase locked loop (PLL), delay locked loop (DLL), on-chip power supply bypassing, and insertion of clock buffers, all of which lead to increased power. As a result, the Wide I/O solution was proposed to take low-power memory development in a different direction to enable continuous bandwidth scaling while drastically improving energy efficiency.

Table 1.4. Evolution of mobile DRAM energy efficiency.

Specification | Release | Data rate (Mb/s) | Bandwidth (GB/s) | Relative I/O energy per bit (%) | Relative total I/O power @ min BW (%)
LPDDR | 2007 | 400 | 1.6 | 100 | 100
LPDDR2 | 2009 | 800–1,066 | 3.2–4.3 | 50.3 | 101
LPDDR3 | 2012 | 1,600–1,866 | 6.4–7.5 | 32.7 | 131
Wide I/O | 2011 | 200 | 12.8 | 4.5 | 36
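The last column of Table 1.4 can be cross-checked against the other two: relative total I/O power at minimum bandwidth is simply the relative energy per bit multiplied by the bandwidth ratio versus first-generation LPDDR. A quick consistency check on the published numbers:

```python
# Total I/O power ∝ (energy per bit) × (bandwidth); values relative to LPDDR.
rows = {
    # name: (minimum bandwidth in GB/s, relative I/O energy per bit in %)
    "LPDDR":    (1.6, 100.0),
    "LPDDR2":   (3.2, 50.3),
    "LPDDR3":   (6.4, 32.7),
    "Wide I/O": (12.8, 4.5),
}
base_bw = rows["LPDDR"][0]
for name, (bw, energy_pct) in rows.items():
    rel_power = energy_pct * (bw / base_bw)
    print(f"{name:8s} relative total I/O power ≈ {rel_power:5.1f} %")
```

The results (about 101%, 131%, and 36% for LPDDR2, LPDDR3, and Wide I/O) reproduce the table's last column, confirming its internal consistency.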

Figure 1.33 depicts the construction of a Wide I/O memory system. It consists of one or more Wide I/O DRAM dice stacked on top of a processor die. The 3D die stack is interconnected with TSVs and microbumps, while the bottom processor die is flip-chip attached to the package substrate.

Figure 1.33 Construction of a Wide I/O memory system.

As the name suggests, Wide I/O adopts a very wide data bus to achieve high bandwidth while keeping the data rate and hence I/O power low. Despite running at ¼ the data rate, it delivers four times the bandwidth of LPDDR2. This is made possible by the use of TSV to enable the 512-bit wide data bus, which is 16 times that of LPDDR2. The use of TSV instead of the conventional PCB interconnect results in drastic reduction in both the interconnect area (5 µm TSV diameter vs. 100 µm PCB trace width) and length (50 µm TSV length vs. tens of mm of PCB trace length), thereby avoiding an order-of-magnitude increase in interface area while significantly reducing interconnect capacitance and transmission line effects. Furthermore, since the interface bypasses the package, the bus width is not limited by the pin count and cost of the package. As a result of the lower data rate and simpler interconnects, the energy per bit is reduced by an order of magnitude while bandwidth is more than tripled compared to LPDDR2 (Table 1.4).
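The bandwidth arithmetic above can be verified directly, since peak bandwidth is just bus width times per-pin data rate. The 32-bit LPDDR2 bus width used here is an assumption implied by the "16 times" comparison in the text:

```python
def bandwidth_GBps(bus_width_bits, data_rate_Mbps):
    """Peak interface bandwidth in GB/s for a parallel bus."""
    return bus_width_bits * data_rate_Mbps / 8 / 1000

lpddr2  = bandwidth_GBps(32, 800)    # assumed 32-bit bus at 800 Mb/s per pin
wide_io = bandwidth_GBps(512, 200)   # 512-bit bus at 200 Mb/s per pin

print(lpddr2, wide_io)      # GB/s for each interface
print(wide_io / lpddr2)     # Wide I/O: 4x the bandwidth at 1/4 the data rate
```

A 16-times-wider bus at one quarter the data rate yields exactly four times the bandwidth, matching both the text and Table 1.4.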

1.3.4.3 2.5D IC Integration Using TSV

Despite the promise of dramatically improved energy efficiency, Wide I/O (both versions 1 and 2 released in 2011 and 2014 respectively) was not able to replace LPDDR as the low-power memory solution for mobile applications. After the release of the Wide I/O specification, LPDDR continued to evolve from LPDDR3 in 2012 to LPDDR5 in 2019, doubling the data rate in every generation, as shown in Figure 1.34 based on [Reference Kim43]. Meanwhile, energy efficiency improved by about 35% in every generation. This is not sufficient to fully offset the increase in power due to doubling of the data rate, so total memory I/O power goes up at maximum bandwidth. Nevertheless, the industry has so far decided to stay with this evolutionary path, due to challenges in applying TSV to 3D integration of heterogeneous dice.

Figure 1.34 Evolution of LPDDR bandwidth and energy efficiency.

The challenges in applying TSV to 3D integration of heterogeneous dice include added manufacturing cost, yield loss, thermomechanical reliability problems, and device performance degradation. The added manufacturing cost is the result of both added processing for TSV formation and die area overhead. Meanwhile, the complexity of the TSV structure and the multitude of materials used that have diverse coefficients of thermal expansion (CTEs) lead to creation of thermomechanical stresses when the die stack is subject to thermal cycling during both manufacturing and normal operation. Such stresses result in lower manufacturing yield, lower reliability, and semiconductor device performance degradation [Reference Karmarkar, Xu and Moroz44]. Furthermore, thermomechanical stress also occurs at the flip-chip bump interface between the bottom die in the stack and the package substrate below, due to CTE mismatch between silicon and the package substrate material, such as epoxy resin. Such stress exists in single-die packages as well. But the much thinner die in a 3D stack means it is more susceptible to warpage.

There are solutions to overcome some of these challenges, such as adding redundant TSVs and enlarging the TSV keep-out zone to keep active circuits away. But such solutions increase the already large area overhead. Figure 1.35 depicts a relative size comparison between TSVs and NAND gates. The study in [Reference Samal, Nayak, Ichihashi, Banna and Lim45] from 2016 estimated a die area overhead of almost 40% for a TSV diameter of 5 µm in a 14 nm FinFET process, while noting that practical TSV diameters could be as large as 10 µm.

Figure 1.35 Relative size comparison: TSV vs. NAND gate.

© 2016 IEEE. Reprinted, with permission, from [Reference Samal, Nayak, Ichihashi, Banna and Lim45]

One additional challenge for TSV 3D integration of heterogeneous dice arises when a high-power chip is placed toward the bottom of the stack, which is the case with Wide I/O, where the processor chip sits at the bottom. In this case, heat removal from the high-power chip can be a challenge even if there is a heat sink at the top, especially if there are many chips in the stack.

To alleviate these problems, an intermediate solution known as 2.5D IC integration has emerged that has achieved commercial success especially in the high-end FPGA and graphics processing unit (GPU) markets.

Figure 1.36 depicts the cross section of the first 2.5D integrated FPGA commercial product announced by Xilinx in late 2011 [Reference Bolsens46]. It illustrates the basic construction of a 2.5D IC using TSV. First, the ICs are not stacked and hence have no TSVs. This facilitates heat removal from the individual ICs and eliminates IC manufacturing and reliability problems related to the use of TSV. Second, there is a passive silicon interposer inserted between the ICs and package substrate. The interposer serves the sole purpose of interconnecting the ICs and the package substrate and hence contains no active circuits. Interchip connections consist of microbumps and metal traces only without using TSVs. TSVs in the interposer are only used for power, ground, and external I/Os. Consequently, the TSV density is much lower than in 3D IC integration. As a result, the interposer can be fabricated in a lower-performance, more mature, and hence less expensive process. Furthermore, since the use of TSV is limited to the passive interposer, there is no TSV-induced device performance degradation.

Figure 1.36 Cross section of a 2.5D integrated FPGA.

© 2012 IEEE. Reprinted, with permission, from [Reference Madden, Wu, Kim, Banijamali, Abugharbieh, Ramalingam and Wu47]

Figure 1.36 shows an example of homogeneous 2.5D integration where all the chips are of the same type. The same technology can be applied to integration of heterogeneous chips. For instance, in Virtex-7 HT, which is also introduced in [Reference Bolsens46], three FPGA chips are integrated with two 28G SerDes transceiver chips in a 2.5D package. Furthermore, 2.5D can be combined with 3D integration where one of the chips mounted on the silicon interposer is a 3D integrated IC. A representative example is high bandwidth memory (HBM), which is introduced later in this section.

As elaborated in [Reference Bolsens46], compared to 3D, 2.5D integration represents an evolutionary solution in terms of design flow, test, thermal, and reliability. Furthermore, a 2.5D integrated FPGA offers high logic cell capacity at much lower power. As an illustration, the Virtex-7 2000T FPGA from Xilinx is a homogeneous 2.5D integration of four identical 28 nm FPGA chips on top of a 65 nm silicon interposer. It contains a total of 2 million logic cells and about 10,000 connections between adjacent chips and consumes 19 W of power. Using conventional FPGAs, four individually packaged chips would be required to match the capacity, while consuming almost six times the power, with 28.6% consumed by the interconnections.

Without vertically stacking the ICs, 2.5D integration requires a larger area than 3D. But it offers an interconnection density similar to that of 3D integration, as well as shorter interconnections, higher IC packing density, and hence a smaller footprint than 2D integration, while overcoming some key challenges faced by 3D integration. The pros and cons of 2.5D integration compared to 2D and 3D are summarized in Table 1.5.

Table 1.5. Pros and cons of 2.5D IC integration.

2.5D vs. 2D
Pros:
– Higher interconnection density
– Shorter IC interconnections (lower power, higher speed)
– Smaller footprint
Cons:
– Added interposer cost including TSVs
– Added microbump assembly complexity

2.5D vs. 3D
Pros:
– No TSV cost adder to ICs
– No TSV-related yield and reliability problems for ICs
– No TSV-induced device performance degradation
– Easy heat removal
Cons:
– Larger footprint
– Longer IC interconnections

Given the smaller footprint, 3D is still the preferred solution over 2.5D when many dice need to be integrated, and when the application is less cost sensitive. A good example can be found in high-end GPU systems that have adopted the HBM DRAM solution to meet their large memory capacity and high-speed requirements. HBM utilizes a combination of 2.5D and 3D integration to deliver the highest memory performance. HBM DRAM is a 3D stack of DRAM core chips on top of a logic chip. This memory stack is then placed side by side with a high-performance processor on a passive silicon interposer to form a 2.5D integrated memory system. It thus combines the best of both worlds. Three-dimensional integration allows the DRAM footprint to remain small while providing large capacity. Meanwhile, 2.5D integration of the processor keeps TSVs out of the large and expensive die of the system, while enabling effective removal of the large amount of heat it generates.

Figure 1.37 shows both a conceptual drawing and an optical image of the cross section of an HBM solution from [Reference Chen, Hu, Ting, Wei, Yu, Huang, Chang, Wang, Hou, Wu and Yu48]. The optical image illustrates the relative dimensions of the different parts of the memory system. Note that the use of an eight-stack HBM memory component incurs no height penalty since it stays within the height of the SoC. However, the interposer does add to the overall thickness of the package even though it is relatively thin.

Figure 1.37 Cross section of an HBM memory subsystem.

© 2017 IEEE. Reprinted, with permission, from [Reference Chen, Hu, Ting, Wei, Yu, Huang, Chang, Wang, Hou, Wu and Yu48]
1.3.4.4 Wireless 3D Integration

Wireless communication, as opposed to wired communication, does not require physical contact, e.g., by the use of a wire, to establish the communication channel. As any user of WiFi knows, wireless communication frees you from being tethered and hence offers you great mobility and convenience. However, since the receiver can be anywhere within the communication range, the transmitted signal must likewise be available everywhere. As a result, wireless communication is generally higher cost and offers lower bandwidth (throughput) than wired communication. By contrast, due to its very short communication distance of <100 µm, wireless 3D IC integration can be optimized to compete with its wired counterpart in cost and performance, while offering other benefits.

First, the connection terminals in the form of either metal pads or inductive coils for wireless integration are manufactured as part of the standard IC metal layer fabrication process. As a result, there is minimal added processing cost. By contrast, TSVs require complicated mechanical and wafer-level processing, adding significant manufacturing cost. Second, since wireless integration uses AC coupling, it can be applied to chips supplied with different voltage levels without any level shifters. Third, because there is no physical connection, wirelessly integrated chips are detachable – it is as easy to disconnect as it is to connect them together. Likewise, individual chips can easily be wirelessly connected to and disconnected from probe cards, making them easy to test without concern for damage that a physically connected probe can inflict.

Performance of wireless integration in terms of communication data rate and reliability, power consumption, and chip area can also be optimized to rival that of wired integration, for three reasons. First, unlike WiFi, the very short communication distance makes it possible to create large capacitive and inductive coupling coefficients. As a result, high communication reliability can be achieved without consuming a lot of power. Second, since the connection terminals of the integrated chips do not require a DC path to the external world, they can be insulated to avoid electrostatic discharge (ESD) so as to eliminate ESD protection circuits, thereby reducing capacitive loading on the interconnection. This enables both high data rate and low power consumption. Circuit area is reduced as well. Third, in the case of inductive coupling, since the magnetic field can penetrate through circuits on the silicon substrate, the inductive coils can be placed above active circuits with no keep-out area. (Keep-out area is, however, needed to suppress crosstalk to sensitive circuits.) Consequently, die area overhead is small. Contrast this with the use of TSVs, where each via requires a keep-out area, thus adding significant area overhead, especially when they are numbered in the thousands. To experimentally confirm these and other merits of the inductive coupling solution and to advance its development, an inductively coupled wireless interface was prototyped in several test chips between 2006 and 2008 (Figure 1.38).

Figure 1.38 Inductive coupling interface test chips from 2006–2008.

© 2008 The Japan Institute of Electronics Packaging. Reprinted, with permission, from [Reference Miura and Kuroda49]

Figure 1.39 depicts the schematics of the transmitter and receiver circuits, and simulated and measured waveforms from an inductive coupling interface fabricated in a 180 nm CMOS process in 2006 (top-left of Figure 1.38 and [Reference Miura, Mizoguchi, Inoue, Niitsu, Nakagawa, Tago, Fukaishi, Sakurai and Kuroda39]). A data rate of 1 Gbps at a bit error rate (BER) < 10⁻¹³ was achieved, delivering reliable communication comparable to that of a wired interface. To minimize chip area, the inductive coils were placed above the transmitter and receiver circuits. Yet, no appreciable degradation in communication reliability was detected. Furthermore, by implementing a four-phase time-multiplexing scheme to skew the timing of data switching in adjacent communication channels (interconnections), total crosstalk from neighboring channels was significantly reduced. This enabled operating 1,024 parallel channels at a 30 µm pitch simultaneously at a per-channel data rate of 1 Gbps at BER < 10⁻¹³, resulting in a total interface throughput of 1 Tb/s.

Figure 1.39 Inductive coupling test chip, 2006.

© 2008 The Japan Institute of Electronics Packaging. Reprinted, with permission, from [Reference Miura and Kuroda49]

Figure 1.40 compares the test interface from 2006 with alternate solutions from around the same time frame, in both total channel throughput and area/throughput, confirming the merits of the inductive coupling solution.

Figure 1.40 Benchmarking the 2006 inductive coupling test chip.

© 2008 The Japan Institute of Electronics Packaging. Reprinted, with permission, from [Reference Miura and Kuroda49]

The inductive coupling interface was further refined in a test chip in 2007 (middle of Figure 1.38 and [Reference Miura, Ishikuro, Sakurai and Kuroda40]) that achieved an energy efficiency comparable to that of wired communication. The energy efficiency of the 2006 test chip interface was dominated by the transmitter circuits, which consumed 2.2 pJ/b out of a total energy consumption of 2.8 pJ/b. By optimizing the shape of the transmitted pulse and reducing its width, in addition to migrating to a 90 nm process and reducing VDD, the combined transmitter and receiver energy consumption was reduced to 1/20 of its original value, achieving 0.14 pJ/b and making it comparable to alternative wired solutions (Figure 1.41, [Reference Miura, Ishikuro, Sakurai and Kuroda40]).

Figure 1.41 Inductive coupling test chip, 2007.

© 2008 The Japan Institute of Electronics Packaging. Reprinted, with permission, from [Reference Miura and Kuroda49]
1.3.4.5 Inductive vs. Capacitive Coupling for 3D Integration

The key advantage of inductive coupling over capacitive coupling for 3D IC integration is twofold. First, signal loss through the silicon substrate is small. While capacitive coupling relies on the vertical electric field between the two coupled terminals, inductive coupling utilizes the magnetic field linking the two terminals instead. Figure 1.42 depicts the simulated S21 of the communication channel as a function of substrate resistivity when the substrate is in the transmission path, comparing inductive against capacitive coupling. S21, which is defined as the ratio of output to input signal, is a parameter that quantifies signal transmission efficiency in the frequency domain. The results reveal that the electric field is significantly more susceptible to substrate loss than the magnetic field used in inductive coupling. When resistivity drops to around 0.1–1 Ωcm, the typical value for p+ Si, capacitive coupling suffers a dramatic increase in loss. As a result, capacitive coupling cannot be used to communicate through a silicon substrate effectively. Instead, it is confined to face-to-face IC integration. The magnetic field does suffer loss due to eddy currents induced in the substrate. However, the loss is small enough that the magnetic field can effectively penetrate a p+ Si substrate, making it possible to apply inductive coupling to both face-up and back-to-back chip stacking.

Figure 1.42 Simulated S21 dependency on substrate resistivity.

© 2008 The Japan Institute of Electronics Packaging. Reprinted, with permission, from [Reference Miura and Kuroda49]
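Since S21 is a ratio of output to input signal, it is conventionally plotted in decibels. The small helper below (with purely illustrative values) shows the standard conversion used when reading a plot like Figure 1.42:

```python
import math

def s21_db(v_out, v_in):
    """S21 in dB for a voltage (amplitude) ratio between output and input."""
    return 20 * math.log10(v_out / v_in)

print(s21_db(1.0, 1.0))   # 0 dB: lossless transmission
print(s21_db(0.1, 1.0))   # -20 dB: output amplitude at 10% of input
```

So a sharp downward step in an S21-vs-resistivity plot, as described for capacitive coupling near p+ Si resistivity, corresponds to a large multiplicative loss in received signal amplitude.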

A second advantage of inductive over capacitive coupling is its support for a longer communication distance. The equations defining the coupling and the transceiver circuits are compared in Figure 1.43. The voltage of the received signal through capacitive coupling is defined by the following equation:

VR = VT × CC / (CC + CSUB),  (1.6)

where VT is the transmitted voltage, CC the capacitance between the coupled terminals, and CSUB the capacitance between the receiver terminal and the underlying substrate (Figure 1.43a). As the communication distance and hence the distance between the coupled terminals increases, CC and hence VR decreases. This loss in VR cannot be compensated by increasing the area of the coupled terminals since larger area increases both CC and CSUB proportionally. By contrast, the voltage of the received signal through inductive coupling is defined by the following equation:

VR = M × dIT/dt,  (1.7)

where IT is the transmitted current and M the mutual inductance between the coupled terminals (Figure 1.43b). The received voltage is proportional to M, which can be increased by increasing the area of the coupled coils. Therefore, within practical limits, longer communication distance can be supported by enlarging the inductive coils.
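Equations (1.6) and (1.7) capture why the two schemes scale differently with terminal area: enlarging capacitive pads scales CC and CSUB together, leaving VR unchanged, while enlarging the coils raises M and hence VR. A numeric sketch, with all component values chosen purely for illustration:

```python
# Capacitive coupling: VR = VT * CC / (CC + CSUB)  -- Eq. (1.6)
def v_r_capacitive(v_t, c_c, c_sub):
    """Received voltage across the capacitive divider formed by CC and CSUB."""
    return v_t * c_c / (c_c + c_sub)

# Doubling the terminal area doubles both CC and CSUB, so VR is unchanged
small = v_r_capacitive(1.0, 10e-15, 40e-15)   # 10 fF coupling, 40 fF to substrate
large = v_r_capacitive(1.0, 20e-15, 80e-15)   # 2x area: same divider ratio
print(small, large)  # identical: enlarging capacitive pads buys nothing

# Inductive coupling: VR = M * dIT/dt  -- Eq. (1.7)
def v_r_inductive(m, di_dt):
    """Received voltage induced through mutual inductance M."""
    return m * di_dt

# Enlarging the coils raises M, and VR rises in proportion
print(v_r_inductive(1e-9, 1e8))   # M = 1 nH, dI/dt = 0.1 A/ns
print(v_r_inductive(2e-9, 1e8))   # doubled M doubles the received voltage
```

This is the quantitative basis for the claim that, within practical limits, inductive coupling supports longer communication distance simply by enlarging the coils.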

Figure 1.43 Mechanism of capacitive and inductive coupling.

© 2008 The Japan Institute of Electronics Packaging. Reprinted, with permission, from [Reference Miura and Kuroda49]

A test chip from 2008 (top-right of Figure 1.38 and [Reference Miura, Kohama, Sugimori, Ishikuro, Sakurai and Kuroda41]) confirmed the scalability of communication distance of inductive coupling. Figure 1.44 compares the performance of the test chip with an alternate capacitive coupling interface from 2007. The same data rate of 11 Gb/s at comparable BER was realized using a smaller area for five times the communication distance. Furthermore, communication distance could be extended from 15 to 45 µm with only a 23% drop in data rate to 8.5 Gb/s at the same BER.

Figure 1.44 Performance comparison between inductive and capacitive coupling.

© 2008 The Japan Institute of Electronics Packaging. Reprinted, with permission, from [Reference Miura and Kuroda49]
1.3.4.6 Wireless <3D Integration

While 2.5D integration using TSVs as introduced in Section 1.3.4.3 offers a lower-cost alternative to TSV-based 3D integration, the introduction of TSVs and a silicon interposer, albeit fabricated in a more mature process, still represents a significant cost adder. Furthermore, the integrated package contains three levels of bumps and solder balls – microbumps between chips and interposer, C4 or flip-chip bumps between interposer and package substrate, and solder balls connecting package substrate to the PCB (Figure 1.45a). As a result, thermomechanical stress can occur in three physical interfaces, degrading reliability of the integrated part. The problem is exacerbated by the increasing silicon interposer size to accommodate larger processor chips and/or more integrated chips placed side by side. For instance, the CoWoS-2 integration technology from TSMC supports a large interposer of 1,200 mm2 in size. To alleviate the cost and reliability problems of conventional 2.5D integration, inductive coupling (wireless, nst ≥ 2) has been explored for enhancing <3D integration. In all three solutions of TCI <3D integration in Figure 1.45, both TSVs and microbumps are eliminated. Furthermore, in TCI 2.5D integration, the large silicon interposer in conventional 2.5D integration is replaced by a small piece of silicon interposer between each pair of adjacent chips. Together they add up to attractive cost saving and higher reliability to enhance <3D integration solutions. These solutions will be further discussed in Section 2.3.

Figure 1.45 TCI <3D integration solutions.

1.3.5 Closing Thoughts

The slowing of Moore’s law scaling has spawned a multitude of More than Moore innovations to provide additional improvement over what Moore’s law can deliver. They include monolithic 3D IC such as 3D NAND flash memory, as well as 3D and 2.5D IC integration. Among 3D IC integration solutions, there are wired vs. wireless, nst = 2 vs. nst ≥ 2 solutions utilizing microbumps, TSVs, and capacitive and inductive coupling. TSVs are also utilized in 2.5D integration, which offers better cost with some disadvantages, such as a larger footprint than its 3D counterpart. Meanwhile, inductive coupling can also be used for various <3D More than Moore solutions. To determine the optimal solution requires careful comparison of a host of factors.

As we learned from looking back at the history of the IC, the industry has evolved from designing for performance only, to designing for performance, power, and energy efficiency. A frequently used IC evaluation metric is PPA, as in performance, power, and area. But even that is insufficient for evaluating an integrated solution. For instance, since the early days of integrating multiple chips in one package, known good die (KGD) has been a major technical challenge. The overall yield of a multichip package is the product of the yields of individual chips and the yield of the integration process. As a result, the yield of the integrated subsystem falls off rapidly as the number of integrated chips increases, unless the chips are known to be good before integration. Unfortunately, the mainstream IC test infrastructure has been set up to test packaged chips instead of bare chips. Therefore, 2.5 and 3D integration of bare IC chips has the additional challenge of KGD. The point of this discussion is that, given the complex performance and cost equations of an integrated IC subsystem, many factors must be considered in comparing different More than Moore solutions. Some key factors are the following:

  • PPA

  • Fabrication costs of individual chips, including overhead for integration

  • Packaging or integration cost

  • Chip yield (KGD)

  • Circuit design complexity and power and signal integrity

  • Development cost and time to market

  • Thermal management, especially heat removal

  • Form factor (footprint and height) of integrated subsystem

Careful trade-offs between these factors must be made based on your specific product requirements in order to choose the optimal solution. One size does not fit all.
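The compounding of chip yields described above can be illustrated with a short calculation; the 98% chip yield and 99% integration yield below are hypothetical figures chosen only to show the trend:

```python
# Overall yield of a multichip package = (product of the individual chip
# yields) x (yield of the integration process). All values hypothetical.
def package_yield(chip_yield, n_chips, integration_yield):
    return (chip_yield ** n_chips) * integration_yield

# Without KGD screening, overall yield falls quickly as the stack grows:
for n in (2, 8, 64):
    print(n, round(package_yield(0.98, n, 0.99), 3))
# 2 -> ~0.951, 8 -> ~0.842, 64 -> ~0.272
```

This is why KGD testing of bare die becomes essential once more than a handful of chips are integrated.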

1.4 Near-Field Coupling Interconnect Technology

1.4.1 Motivation for Using Near-Field Coupling for Interconnection

As enumerated in Section 1.3.4.4, wireless technology has numerous merits for IC integration compared to wired technology. These merits include flexibility in the placement of the receiver relative to the transmitter, elimination of level shifters between chips supplied with different voltage levels, and removal of ESD protection circuits, which reduces capacitive loading on the interconnection and hence enables a higher data rate, lower power consumption, and smaller circuit area. In addition, chips can be tested wirelessly without concern for damage from a physically connected probe.

However, wireless is actually a catch-all term that covers a multitude of technologies with very different characteristics. In particular, one useful way to classify wireless technologies is to divide them into far-field and near-field. As the names suggest, far-field technology is used for communication over long distances, while near-field technology is used over short distances, by means of a changing electric and/or magnetic field. Figure 1.46 from [50] illustrates the comparison between near- and far-field. The two wireless technologies widely adopted in smartphones – Wi-Fi for Internet connection and near-field communication (NFC) for mobile payment – are examples of far-field and near-field technologies respectively (Figure 1.46a). This agrees with our experience – Wi-Fi supports a communication distance of more than 10 meters, while NFC only a few centimeters. Nevertheless, the technical distinction is more complicated, since the boundary between far-field and near-field is not absolute; it depends on factors such as the size of the antenna and the wavelength of the signal being transmitted. As a rule of thumb, near-field communication occurs when the communication distance d meets the following criterion:

d < λ/(2π),  (1.8)

where λ is the wavelength of the signal. Therefore, for a 50 GHz signal in free space, a communication distance of <1 mm is within the near-field region.
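Criterion (1.8) can be checked numerically for the 50 GHz free-space example quoted in the text:

```python
# Near-field boundary d < lambda / (2*pi), evaluated for a 50 GHz
# signal in free space.
import math

C = 3.0e8                             # speed of light in free space, m/s
f = 50e9                              # signal frequency, Hz

wavelength = C / f                    # 6 mm
boundary = wavelength / (2 * math.pi) # ~0.95 mm

print(f"near-field boundary: {boundary * 1e3:.2f} mm")
```

The boundary comes out just under 1 mm, matching the statement that a sub-millimeter communication distance at 50 GHz lies in the near-field region.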

Figure 1.46 Comparison between near- and far-field.

© 2010 IEEE. Reprinted, with permission, from [50]

Another important distinction between the two modes of wireless communication lies in how the field strength varies with distance. While the field strength in far-field communication decays inversely with distance from the source, it decays significantly faster in near-field communication, following an inverse-square law for the electric field and an inverse-cube law for the magnetic field (Figure 1.46b). As a result, the transmitted energy in near-field communication is concentrated in a small space, leading to more efficient use of energy (Figure 1.46c). Furthermore, parallel communication channels can be placed close to each other without excessive crosstalk, enabling a large number of interconnections and hence high integration and aggregate bandwidth.
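A quick calculation makes the difference in decay rates concrete; the distances below are arbitrary illustrative values:

```python
# Relative field strength vs. distance for the far-field (1/d),
# near-field electric (1/d^2), and near-field magnetic (1/d^3)
# components, normalized to a reference distance d0.
def relative_strength(d, d0, exponent):
    """Field strength at distance d relative to reference distance d0."""
    return (d0 / d) ** exponent

d0 = 1e-3   # 1 mm reference distance (illustrative)
d = 10e-3   # 10x farther away

far = relative_strength(d, d0, 1)      # far-field: drops 10x
near_e = relative_strength(d, d0, 2)   # near-field electric: drops 100x
near_m = relative_strength(d, d0, 3)   # near-field magnetic: drops 1000x
print(far, near_e, near_m)  # ~0.1, ~0.01, ~0.001
```

Moving ten times farther away thus costs one, two, or three orders of magnitude of field strength depending on the mode, which is why near-field energy stays confined near the source.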

Because of these merits of near-field communication, we have adopted it in our TCI, using inductive coupling for wireless IC integration to deliver More than Moore (Figure 1.47a). In addition, we have used it to implement our TLC using electromagnetic coupling to create a contactless connector to enable LEGO-like modular construction of electronic systems (Figure 1.47b).

Figure 1.47 Near-field coupling interconnect technologies.

© 2017 IEEE. Reprinted, with permission, from [51]

1.4.2 Near-Field IC Interconnect Technology

1.4.2.1 Motivation for a Near-Field IC Interface

ThruChip Interface (TCI) is a near-field IC interconnect technology that has been developed to address two major challenges facing the IC industry. The first challenge is the slowing of Moore’s law, which calls for innovative More than Moore solutions for 3D IC integration as discussed in Section 1.3. Using inductive coupling, TCI offers a better nst ≥ 2 IC integration solution than TSV, enabling integration of more than two chips at lower cost while delivering high data rate at low power.

The second challenge addressed by TCI is the performance gap between chip core (computational performance) and I/O (interface communication bandwidth). Enabled by Moore’s law, performance of digital circuits in the chip core has been increasing by about 71% every year. This is the result of an annual increase in transistor speed of about 15% coupled with an annual increase in chip core functional density of about 49% (1.15 × 1.49 ≈ 1.71). Based on Rent’s rule, to take full advantage of this increased computational performance, the I/O bandwidth should increase at an annual rate of 46% (1.71^0.7 ≈ 1.46). In reality, however, process scaling only drives an annual I/O bandwidth increase of 28%. While flip-chip packaging is commonplace for high-performance processors today, for many years wirebond was the dominant packaging technology. Since wirebond packaging uses peripheral pads for I/O interconnection, chip I/O functional density improves only linearly with process scaling, by about 11% annually. Together with the annual increase of 15% in transistor speed, total I/O bandwidth improvement from scaling has been around 28% (1.15 × 1.11 ≈ 1.28). To make up for the deficit, circuit innovations have been employed to boost the data rate to sustain a total annual bandwidth increase of about 46% (Figure 1.48).
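The growth rates quoted above can be verified with a few lines of arithmetic:

```python
# Annual growth rates cited in the text, expressed as multipliers.
transistor_speed = 1.15   # ~15%/year
core_density = 1.49       # ~49%/year
rent_exponent = 0.7       # Rent's rule exponent

core_perf = transistor_speed * core_density   # ~1.71 => 71%/year
needed_io_bw = core_perf ** rent_exponent     # ~1.46 => 46%/year needed

io_density = 1.11                             # peripheral pads: ~11%/year
scaled_io_bw = transistor_speed * io_density  # ~1.28 => 28%/year from scaling

print(round(core_perf, 2), round(needed_io_bw, 2), round(scaled_io_bw, 2))
# 1.71 1.46 1.28
```

The gap between the 46% needed and the 28% delivered by scaling is the deficit that circuit innovation has had to cover.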

Figure 1.48 Disparity in performance scaling: chip vs. I/O.

Unfortunately, continuous increase in data rate is not sustainable, because of the high circuit power and off-chip interconnect signal integrity problems that result. For instance, to enable high data rate, a high-speed clock must be distributed across the chip, which requires insertion of buffers to compensate for loss, significantly increasing power consumption of the clock tree. In some cases, I/O power has risen to be more than 30% of total chip power. Therefore, in order to deliver increasing I/O bandwidth without breaking the power budget, instead of disproportionally boosting data rate, a better approach is to increase the number of parallel interconnections. The first step is to shift from using a peripheral to an area array of I/O connection terminals. This has resulted in the wide adoption of flip-chip packaging for high-performance IC. Nevertheless, since the flip-chip bumping process has not been scaling as fast as the IC fabrication process, flip-chip bump pitch and hence I/O interconnection density scaling has not been keeping pace with IC performance increase in terms of process node scaling, as shown in Figure 1.49, based on the industry roadmap published in 2015 [52]. Consequently, an alternate area array interconnect technology that follows Moore’s law scaling is desired.

Figure 1.49 Scaling comparison: flip-chip bump pitch vs. process.

The Wide I/O Mobile DRAM solution introduced in Section 1.3.4.2 was a solution that used TSV to create a large number of parallel interconnections to deliver high I/O bandwidth. But as previously noted, TSV carries a significant cost premium both because of the added processing cost and the area overhead resulting from its large keep-out zone. For a memory chip, the added manufacturing cost of TSV can amount to 10% of the total cost. Furthermore, thermomechanical stresses can lead to loss in reliability. The problem is magnified by the tens of thousands of TSVs and microbumps utilized by each chip. Therefore, a lower fabrication cost, lower area overhead, and higher reliability solution is warranted.

1.4.2.2 Characteristics and Merits of TCI

TCI is an inductive near-field coupling, wireless interconnect technology offering an alternate solution to TSV for 3D IC integration and for delivering high interface bandwidth at low power. Figure 1.50 depicts the basic structure of TCI. Specifically, it consists of a pair of coupled coils that communicate by varying the magnetic field linked between them (Figure 1.50a). Each coil is implemented as a multiturn loop (n = number of turns) across multiple interconnect layers (m = number of layers) as part of the BEOL IC fabrication process (Figure 1.50b). The basic transceiver consists of simple digital CMOS circuits – the transmitter is an H-bridge circuit that switches current in the transmitter coil at the transition of Txdata, while the receiver is a sense-amplifier flip-flop that detects the resultant small voltage magnetically induced in the receiver coil to create Rxdata (Figure 1.50c). With proper mitigation measures, electromagnetic interference (EMI) from the coils can be made negligible, allowing transmitter and receiver circuits to be placed directly underneath the coils with no keep-out zone to form a compact I/O design (Figure 1.50d). In fact, test chip results have shown that TCI coils can be placed directly on top of active circuits and memory cells, including SRAM, NAND, and DRAM, with no appreciable degradation.

Figure 1.50 Basic structure of TCI.

(c) and (d) © 2017 IEEE. Reprinted, with permission, from [51]

Table 1.6 enumerates the general merits of TCI given its characteristics, and the value delivered to 3D IC integration and to a low-power, high-bandwidth chip interface respectively. Meanwhile, Table 1.7 highlights how TCI compares favorably against TSV. In contrast to TSV, which requires mechanical processing, TCI is an electrical and digital solution fabricated as part of the standard wafer process. As a result, TCI incurs much lower manufacturing cost. Because OSAT is not involved in implementing the interconnection, TCI can utilize the existing manufacturing ecosystem. Furthermore, unlike TSV, TCI can be placed directly over circuits, resulting in minimal area overhead. Additionally, being a contactless interconnection, TCI eliminates the need for ESD protection circuits, resulting in much smaller parasitic capacitance and hence significantly lower power and higher speed.

Table 1.6. Merits of TCI interconnect technology.

TCI characteristic | Merit | Value delivered (3D IC integration) | Value delivered (low-power, high-BW I/F)
Wireless | Contactless ⇒ no ESD | Low power, small area, high speed | Low power, small area, high BW
Wireless | Contactless ⇒ no level shifter | Integration flexibility | –
Wireless | Contactless probing | KGD | Easy probing
Wireless | High placement tolerance | High integration yield | –
Wireless | No thermal stress in interface | High reliability | –
Near-field coupling | Concentration of transmitted power | – | Low transmission power
Near-field coupling | Small crosstalk ⇒ high interconnection density | Small form factor | High BW
Near-field coupling | Short communication distance | Small form factor | No transmission line effects ⇒ high BW
Inductive | Communication through silicon substrate | Integration of more than 2 chips | –
Coils fabricated in BEOL | No mechanical processing ⇒ high yield | Low cost | –
Coils fabricated in BEOL | No fabrication cost adder | Low cost | –
Coils fabricated in BEOL | Benefit from process scaling | Headroom for PPA improvement | –
Transceiver uses digital CMOS circuits | Benefit from process scaling | Headroom for PPA improvement | –
Low EMI | Circuits under coils ⇒ small area overhead | Low cost | –
Low EMI | No keep-out zone ⇒ high interconnection density | High integration | High BW

Table 1.7. Comparison of TCI and TSV.

 | TSV | TCI
Connection mechanism | Mechanical | Electrical
Fabrication process | Additional steps required | Standard CMOS
Dimensional scaling | Difficult | Easy
Yield | Low, difficult to improve | High (100%)
Ecosystem | New model required | Conventional
Cost adder | >40% | A few %
Placement | Dedicated area with keep-out | Flexible
Bandwidth | <512 Gb/s | >512 Gb/s
ESD protection | Required | Not required
Power consumption | High | Low
Source: © 2017 IEEE. Reprinted, with permission, from [51].

In summary, TCI is a low-cost, low-power, high-bandwidth technology with high interconnection density and reliability, which enables flexible and high-level integration of ICs in small form factor. Furthermore, its PPA will continue to improve as process scales under Moore’s law, enabling it to advance together with IC performance.

The table in Figure 1.51 from [50] illustrates how TCI performance scales with process. Under this scaling scenario, the coupling coefficient and crosstalk, and hence the signal-to-noise ratio, remain unchanged, while both aggregate data rate per area and energy per bit improve drastically, in a cubic fashion. This represents an optimistic scenario – in reality, supply voltage and chip thickness may not scale as fast as transistor size. Nevertheless, it illustrates how TCI performance benefits from simultaneous scaling of transistor size, power supply voltage, and chip thickness. Hence, there is plenty of headroom for its performance to improve as process scaling continues.

Figure 1.51 TCI performance scaling scenario.

© 2010 IEEE. Reprinted, with permission, from [50]

The potential of TCI has been demonstrated in multiple test chips. In 2010, a data rate of 30 Gb/s/channel and 2.2 Tb/s/mm² was reported in [53], an energy efficiency of 0.01 pJ/b in [54], and successful random memory access to a stack of 128 NAND flash chips in [55].

1.4.2.3 Application of TCI

Given the myriad of benefits delivered by TCI, it is expected to offer improvement to a large variety of applications. Figure 1.52 illustrates some of these applications.

Figure 1.52 Example TCI applications.

© 2010 IEEE. Reprinted, with permission, from [50]

For instance, a conventional SSD with a stack of eight wirebonded NAND flash memory chips can be replaced with a stack of 64 TCI-equipped chips. Even with eight times the capacity, the TCI-enabled SSD would use fewer packages (9→1) and bond wires (1500→200), hence be lower cost with lower assembly complexity in a smaller form factor, at almost half the power (Figure 1.53). Additional details are provided in Section 2.7.

Figure 1.53 High-capacity SSD enabled with TCI.

On the other hand, using TCI, a processor can be directly connected to a stack of SRAM chips fabricated in a different process optimized for SRAM and supplied with a different voltage (Figure 1.54, [57, 58]). This would offer an alternate memory solution to DRAM with better energy efficiency and bandwidth per unit area. Additional details are provided in Sections 2.3.1 and 4.1.

Figure 1.54 Processor and SRAM stack integrated with TCI.

1.4.3 Near-Field Module Connector Technology

1.4.3.1 Motivation for a Near-Field Coupled Module Connector

TCI interconnect technology offers a versatile low-power, high-performance solution for connecting multiple IC chips directly together. However, it is not always possible to integrate all the IC chips that need to communicate with each other into a single package. For instance, to build a large-scale computer system such as a supercomputer, it is necessary to modularize smaller building blocks such as graphics or memory cards. This is done for multiple reasons, including cost, design simplicity, ease of customization, serviceability (ease of replacing defective parts), and scalability. As a result, while it is best to keep communication distance minimal to suppress signal integrity problems and interface power, it is also necessary to develop interface technology that supports long communication distances, traversing multiple printed circuit boards and connectors.

Interface technology that supports long communication distances must address challenges of delivering high bandwidth at low power similar to those described in Section 1.4.1. As an illustration, the communication data rate between the application processor and display in a smartphone continues to climb exponentially as new display formats emerge (Figure 1.55, [59]). The Full HD format, which has enjoyed widespread adoption starting around 2016, requires a data rate of 3 Gb/s. To support the next upgrade in resolution to 4K/8K, the maximum data rate will need to increase to 50 Gb/s. Even higher data rates will be necessary to support richer color and higher refresh rates. With these trends as the backdrop, the MIPI Alliance, which is responsible for setting standards for mobile applications, has created its high-performance interface physical layer M-PHY specification to support 10 Gb/s-class internal data communication.

Figure 1.55 Data rate as a function of display format.

© 2016 Atsutake Kosuge [59]

Meanwhile, the widely adopted DDR external memory interface, USB and PCIe peripheral interfaces, automotive LAN interface, as well as satellite processor-memory interfaces are all experiencing similar increases in data rate (Figure 1.56, [59]).

Figure 1.56 Data rate trends of widely adopted interfaces.

© 2016 Atsutake Kosuge [59]

With the exception of the automotive LAN interface, all these interfaces have a data rate exceeding 1 Gb/s/lane. Given that communication distance in these interfaces is longer than a centimeter, at such high data rates, signal transmission is subject to transmission line effects. (Remember that a PCB interconnect only needs to be longer than 1.67 mm to appear as a transmission line to a 1 Gb/s signal.) As a result, signal transmission is subject to reflection and hence distortion at any discontinuity in characteristic impedance Z0 along the interface (Section 1.2.3).

Figure 1.57 depicts a typical interconnect structure for some of these interfaces constructed with modules. To minimize signal distortion, the signal must see Z0 close to being constant as it traverses this interconnect structure. For practical reasons, the default Z0 for most applications is 50 Ω. Therefore, the target Z0 for module PCB1, connector1, main PCB, connector2, and module PCB2 should all be 50 Ω. As data rate goes up such that smaller and smaller interconnect features begin to exhibit transmission line effects, designing for constant Z0 becomes increasingly challenging. Z0, which governs how an electromagnetic signal propagates, is determined by the electric and magnetic field patterns around the transmission line. It is therefore highly dependent on any large metallic component of the interconnect structure. Conversely, it is easier to control Z0 of an interconnect when there is a large inherent metallic component. As a result, it is easier to control Z0 of a PCB interconnect than a connector since the former generally contains large voltage and ground planes running parallel to signal paths while the latter does not (Figures 1.57 and 1.58). Figure 1.59 shows the simulated signal degradation due to an interconnect consisting of a backplane connector plus a 5 cm PCB trace. The significant degradation due to the connector is evident in both the frequency and time domain.
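To see why a PCB trace with a nearby ground plane lends itself to Z0 control, a widely used closed-form approximation (the IPC-2141 surface microstrip formula) ties Z0 directly to the trace geometry and the dielectric constant. The FR-4 geometry below is a hypothetical example, not taken from the text:

```python
import math

def microstrip_z0(w, h, t, er):
    """Approximate characteristic impedance (ohms) of a surface microstrip
    per the common IPC-2141 formula. w: trace width, h: dielectric height
    above the ground plane, t: trace thickness (all in the same units),
    er: relative permittivity. Valid roughly for 0.1 < w/h < 2.0."""
    return (87.0 / math.sqrt(er + 1.41)) * math.log(5.98 * h / (0.8 * w + t))

# Hypothetical FR-4 stackup chosen to land near the 50-ohm target
# (dimensions in mm):
z0 = microstrip_z0(w=0.28, h=0.17, t=0.035, er=4.3)
print(f"{z0:.1f} ohm")  # ~49.8 ohm
```

Because Z0 depends only on these well-controlled board parameters, a PCB designer can hold it near 50 Ω along the whole trace; a connector, lacking such reference planes, offers no comparable handle.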

Figure 1.57 A typical interconnect structure in a modular system.

© 2016 Atsutake Kosuge [59]

Figure 1.58 Internal structure of a backplane connector.

© 2016 Atsutake Kosuge [59]

Figure 1.59 Simulated signal degradation of a backplane interconnect.

© 2016 Atsutake Kosuge [59]

To compensate for the severe degradation due to transmission line effects, complicated equalization circuits are commonplace in the implementation of high-speed interfaces. Common equalization circuit solutions include TX (transmitter) deemphasis to boost high-frequency components while attenuating low-frequency components of the transmitted signal (Figure 1.60a, [60]), the continuous time linear equalizer (CTLE) to boost high-frequency components of the received signal (Figure 1.60b, [61]), and the decision feedback equalizer (DFE), which determines how to compensate each received bit based on the preceding bit pattern (Figure 1.60c, [62]). As the name implies, equalization circuits serve to equalize the transmission gain of an interconnect across different frequencies. For proper data transmission, such circuits are necessary to compensate for the wild variation in gain over frequency, as shown in Figure 1.59a. While CTLE is relatively simple, TX deemphasis and DFE circuits grow in complexity as the interconnect complexity increases. Equalization circuits can therefore consume a lot of both power and chip area, significantly increasing interface power and area overhead.
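The TX deemphasis idea can be sketched as a generic 2-tap FIR filter; this is an illustrative model, not the circuits of the cited papers, and the coefficients are arbitrary:

```python
# 2-tap FIR TX deemphasis sketch: y[n] = c0*x[n] + c1*x[n-1].
# With c1 < 0, a bit that repeats its predecessor is attenuated
# (deemphasized), while transitions keep full swing - boosting the
# high-frequency content of the transmitted waveform.
def deemphasize(bits, c0=0.75, c1=-0.25):
    """bits: sequence of +1/-1 symbols; the line is assumed idle at -1."""
    out = []
    prev = -1
    for b in bits:
        out.append(c0 * b + c1 * prev)
        prev = b
    return out

tx = deemphasize([+1, +1, +1, -1, +1])
print(tx)  # [1.0, 0.5, 0.5, -1.0, 1.0]
```

Note how the repeated +1 bits are sent at reduced amplitude (0.5) while every transition is sent at full swing (±1.0), pre-compensating the channel's high-frequency loss.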

Figure 1.60 Common equalization circuit solutions.

© 2016 Atsutake Kosuge [59]

Another transmission line effect degrading signal integrity is crosstalk. In addition to controlling characteristic impedance, large voltage and ground planes also serve to suppress crosstalk since they limit the spreading of electric and magnetic fields from one transmission line to another. Conversely, the lack of such planes in a connector contributes to not only poorly controlled characteristic impedance, but also significant crosstalk. As a result, from an electrical signal integrity standpoint, a connector is generally the weakest link in an interconnect structure for a high-speed interface.

From a mechanical standpoint too, the connector has weaknesses in terms of environmental robustness, given that a connection is made through physical contact. This can be problematic when the connector must operate under harsh environmental conditions, such as when it is used in consumer applications and subject to user mishandling, or in an automobile and subject to heat and vibration as well as a high density of contaminants, or in a satellite and subject to stressful conditions during launch and a harsh environment in space. In particular, to allow for physical contact to make a connection, connector pins must be exposed and hence subject to deterioration due to environmental conditions. Furthermore, strong vibration can weaken the connection, especially when it is solderless. On the other hand, water can create a short, while contaminants can create a short or increase contact resistance. Additionally, for applications where the connector must endure multiple cycles of mating and disengaging, the physical connection can be weakened through wear and tear.

The ever-shrinking form factor demanded by applications such as mobile and IoT poses additional challenges to the conventional connector. Smaller physical dimensions further degrade the environmental robustness of the connector. Smaller pins and pin pitches are more susceptible to the effects of contaminants, shorting, and wear and tear. Furthermore, smaller pins produce a smaller contact force, making it easier for vibration to break a connection. On the other hand, since connector manufacturing involves mechanical processing, it is difficult to scale down its dimensions, making it challenging to support not only a shrinking form factor but also increasing interconnection density.

As a result, an alternate connector technology is necessary to enable better characteristic impedance and crosstalk control for high-speed interfaces, better robustness for operating under harsh environment, and better scaling to support increasing pin count in shrinking form factor.

While TCI can address many of these challenges, it suffers from two limitations when applied to a connector. First, since the coil in the coupler is inductive, its impedance varies with frequency. As a result, when it is used to connect two transmission lines, it creates a discontinuity in the characteristic impedance of the interconnect. This is demonstrated by the simulated received signal through a TCI connector attached to a 5 cm, 50 Ω transmission line in Figure 1.61. The impedance discontinuity results in large ripples in the frequency domain gain, and closes the data eye in the time domain. Second, the coil inductance together with its parasitic capacitance forms a resonant circuit. Since coil size varies linearly with communication distance, the large communication distance required in connector applications means that the coil size must grow, simultaneously increasing its inductance and capacitance and resulting in a lower resonant frequency. Consequently, a TCI connector has limited bandwidth.
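This bandwidth limitation follows from the coil's self-resonant frequency, f_res = 1/(2π√(LC)): enlarging the coil to span a longer distance raises both L and C, lowering f_res. The component values below are hypothetical, chosen only to show the trend:

```python
import math

def self_resonant_freq(L, C):
    """Self-resonant frequency of a coil: f = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(L * C))

# Hypothetical values: scaling a coil up 10x in size grows both its
# inductance and its parasitic capacitance ~10x, dropping the
# resonant frequency (and hence the usable bandwidth) ~10x.
f_small = self_resonant_freq(1e-9, 10e-15)    # 1 nH, 10 fF
f_large = self_resonant_freq(10e-9, 100e-15)  # 10 nH, 100 fF
print(f"{f_small / 1e9:.1f} GHz vs {f_large / 1e9:.1f} GHz")
```

The ten-fold drop in resonant frequency is exactly the penalty a TCI coupler pays when its coils are enlarged for connector-scale distances.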

Figure 1.61 Simulated received signal through a TCI connector and transmission line.

© 2016 IEEE. Reprinted, with permission, from [63]
1.4.3.2 Characteristics and Merits of TLC Technology

TLC is an electromagnetic near-field coupling, wireless interconnect technology developed to address the challenges faced by conventional connectors enumerated in the previous section. Figure 1.62a depicts the basic structure of TLC, which consists of two pairs of differential transmission lines aligned face to face with each other and can be implemented in a PCB manufacturing process. Each pair is designed to have well-controlled characteristic impedance and minimal crosstalk by virtue of a proximal ground plane (Figure 1.62b). This is evident from the simulated transmission gain in the frequency domain of an example TLC connector in Figure 1.63, which exhibits relative flatness over a wide frequency range. Note that since TLC is a wireless technology, there is no DC gain and hence the curve falls off at low frequency.

Figure 1.62 Basic structure of TLC.

(a) and (b) © 2016 Atsutake Kosuge [59]; (c) © 2015 IEEE. Reprinted, with permission, from [64]

Figure 1.63 Transmission gain of an example TLC connector.

© 2015 IEEE. Reprinted, with permission, from [65]

Figure 1.62c shows the basic structure of the TLC transceiver. It consists of a simple 50 Ω terminated current mode logic driver and a hysteresis comparator in the receiver. In contrast to conventional connectors, the well-controlled characteristic impedance of a TLC connector significantly reduces the need for complex equalization circuits.

Table 1.8 enumerates the general merits of TLC given its characteristics, and how they address the challenges of implementing a high-speed interface that includes connectors. It lists the characteristics that strengthen the environmental robustness of the connector and explains how TLC enables a low-power, high-bandwidth interface at a lower cost than conventional mechanical connectors. While the PCB manufacturing process does not scale as fast as the IC fabrication process, it enjoys better scaling than the mechanical processes used to manufacture conventional connectors. As a result, there is room for scaling the TLC connector form factor. In addition, using digital CMOS circuits for its transceiver allows its PPA to improve as the IC process scales. Finally, not requiring level shifters can be a real benefit as more and more heterogeneous subsystems are connected together and semiconductors continue to proliferate, especially in IoT and automotive applications.

Table 1.8. Merits of TLC connector technology.

Characteristic of TLC | Merit | Challenge addressed
Wireless | Contactless ⇒ connection terminals can be sealed | No shorting by water or contaminants; no deterioration from direct contact with environment
Wireless | Contactless ⇒ no level shifter | Integration flexibility
Wireless | No physical interconnection | No wear and tear; vibration tolerance
Near-field coupling | Small crosstalk ⇒ high interconnection density | Small form factor; high bandwidth
Near-field coupling | Short communication distance | Small form factor
Fabricated in PCB process | No mechanical processing ⇒ high yield | Low cost
Fabricated in PCB process | No fabrication cost adder | Low cost
Fabricated in PCB process | Benefit from PCB process scaling | Continuous improvement of PPA
Transceiver using only digital CMOS circuits | Benefit from IC process scaling | Continuous improvement of PPA
Implemented with transmission lines | Well-controlled characteristic impedance and low crosstalk | High bandwidth
Implemented with transmission lines | No complicated equalization circuits | Low power, small area, simple transceiver design
1.4.3.3 Application of TLC

Given the myriad of benefits delivered by TLC, it is expected to offer improvement to a large variety of applications. Figure 1.64 from [51] illustrates some of these applications, which range from consumer applications, including memory cards, displays, smartphones, and PC memory DIMM modules, to automotive and space applications.

Figure 1.64 Example TLC applications.

© 2017 IEEE. Reprinted, with permission, from [51]

For instance, TLC can enable a low-cost, low-power, high-bandwidth solution for building a modular smartphone with various heterogeneous subsystems, with a low profile and small footprint (Figure 1.65, [68]), while offering headroom for future PPA improvement. Further discussion of TLC application in smartphones is provided in Section 3.3.

Figure 1.65 A modular smartphone design.

© 2015 IEEE. Reprinted, with permission, from [68]

1.4.4 Closing Thoughts

A key merit of wireless interconnect technology can be appreciated by examining the conceptual structure of an interconnect. It is basically a perpendicular structure that connects two parts fabricated in a layering process – the IC process for chips and the PCB process for boards attached with a connector. While the layering process can be scaled over time to downsize the parts, the perpendicular structure is not as scalable. Furthermore, relative movement between the two parts due to thermal expansion or vibration can induce stress in the perpendicular structure, thereby reducing its reliability. By using a wireless interconnect technology, this perpendicular structure is eliminated, resulting in a more scalable and reliable solution. Moreover, eliminating physical contact improves the environmental robustness of the solution and facilitates interconnection of two heterogeneous parts, such as parts with different supply voltages.

Within the wireless solution space, using near-field coupling suppresses crosstalk and focuses the transmitted energy within the short communication distance, resulting in higher bandwidth, lower power, and higher interconnection density for smaller form factor.

In TCI, near-field coupling is realized through inductive coupling between coils implemented using BEOL of the IC fabrication process. This enables very small coils and very small communication distance, resulting in simple transceiver design in a low power solution for 3D IC integration and for delivering high interface bandwidth that scales with the IC fabrication process.

Meanwhile in TLC, near-field coupling is realized through electromagnetic coupling between transmission lines implemented using the PCB manufacturing process. This enables an interconnect with well-controlled characteristic impedance and crosstalk and hence simple transceiver design, resulting in a low-power and high-bandwidth connector solution that scales with the PCB manufacturing process.

With the myriad of benefits offered by TCI and TLC, they are expected to open up possibilities for performance improvement in various applications. For instance, although the concept of a modular smartphone design has not yet taken root, the LEGO-like modularization enabled by TLC should allow system designers to create more customized products in different application areas from the same set of building-block modules, driving further and more rapid proliferation of IC and electronics in different facets of our lives.

References

Kuroda, T. “Semiconductor industry in 2025,” Panel Discussion (presented but not published). 2010 IEEE International Solid-State Circuits Conference, Feb. 2010. Updated Nov. 2019.
Kuroda, T. and Sakurai, T. (1995, Apr.). “Overview of low-power ULSI circuit techniques.” IEICE Transactions on Electronics. E78-C(4), pp. 334–344.
Kuroda, T. “Will SOI ever become a mainstream technology?” Panel Discussion (presented but not published), 2002 IEEE International Electron Devices Meeting, Dec. 2002.
TE Connectivity Ltd. Retrieved on Jun. 11, 2019. Available: www.te.com/usa-en/products/brands/amp.html?tab=pgp-story
SMK Corporation, “電線対基板圧着コネクタの動向 [Crimped connector trends],” in Japanese. Retrieved on Jun. 11, 2019. Available: www.smk.co.jp/products/connectors/technology/1002dempaCSbtow/
Process technology history – Intel. WikiChip. Retrieved on Jun. 11, 2019. Available: https://en.wikichip.org/wiki/intel/process
Japan Electronics and Information Technology Industries Association (JEITA). (2017, Mar.). “2026年までの電子部品技術ロードマップ [Roadmap of electronic component technology up to year 2026].” In Japanese. Retrieved on Jun. 11, 2019. Available: www.jeita.or.jp/japanese/assets/pdf/letter/vol21/15.pdf
Kuroda, T. (1999, Apr.). “ディープサブミクロン時代の半導体集積回路の技術課題とEDAへの期待 [Technological challenges of IC in the deep submicron age and expectations for EDA].” 情報処理学会論文誌 [IPSJ Journal], 40(4), pp. 1500–1506.
Kuroda, T. (2007, Nov.). “システムLSIの低電力技術 [Low-power technology for system LSI].” 電子情報通信学会誌 [Journal of Institute of Electronics, Information and Communication Engineers], 90(11), pp. 977–981.
Suzuki, K., Mita, S., Fujita, T., Yamane, F., Sano, F., Chiba, A., Watanabe, Y., Matsuda, K., Maeda, T., and Kuroda, T. “A 300MIPS/W RISC core processor with variable supply-voltage scheme in variable threshold-voltage CMOS.” 1997 IEEE Custom Integrated Circuits Conference, pp. 587–590, May 1997.
Kuroda, T. “Optimization and control of VDD and VTH for low-power, high-speed CMOS design.” 2002 IEEE/ACM International Conference on Computer-Aided Design, pp. 28–34, 2002.
Kuroda, T., Fujita, T., Mita, S., Nagamatsu, T., Yoshioka, S., Suzuki, K., Sano, F., Norishima, M., Murota, M., Kako, M., Kinugawa, M., Kakumu, M., and Sakurai, T. (1996, Nov.). “A 0.9V 150MHz 10mW 4mm2 2-D discrete cosine transform core processor with variable-threshold-voltage scheme.” IEEE Journal of Solid-State Circuits. 31(11), pp. 1770–1779.
Takahashi, M., Hamada, M., Nishikawa, T., Arakida, H., Fujita, T., Hatori, F., Mita, S., Suzuki, K., Chiba, A., Terazawa, T., Sano, F., Watanabe, Y., Usami, K., Igarashi, M., Ishikawa, T., Kanazawa, M., Kuroda, T., and Furuyama, T. (1998, Nov.). “A 60-mW MPEG4 video codec using clustered voltage scaling with variable supply-voltage scheme.” IEEE Journal of Solid-State Circuits. 33(11), pp. 1772–1780.
Fuketa, H., Yasufuku, T., Iida, S., Takamiya, M., Nomura, M., Shinohara, H., and Sakurai, T. “Device-circuit interactions in extremely low voltage CMOS designs (invited).” 2011 IEEE International Electron Devices Meeting, pp. 559–562, Dec. 2011.
De, V. “Energy efficient computing in nanoscale CMOS: Challenges and opportunities (plenary),” 2014 IEEE Asian Solid-State Circuits Conference, pp. 5–8, Nov. 2014.
Ishigaki, T., Tsuchiya, R., Morita, Y., Sugii, N., and Kimura, S. (2010). “Ultralow-power LSI technology with silicon on thin buried oxide (SOTB) CMOSFET,” in Solid State Circuits Technology, Swart, J. W., Ed. Croatia: INTECH, 2010, ch. 7, pp. 145–156.
Topaloglu, R. O. and Wong, H.-S. P., Eds. (2015). Beyond-CMOS Technologies for Next Generation Computer Design. Cham, Switzerland: Springer.
Fant, K. (2005). Logically Determined Design: Clockless System Design with NULL Convention Logic. Hoboken: Wiley-Interscience.
Maruyama, T., Hamada, M., and Kuroda, T. (2018, Aug.). “Comparative performance analysis of dual-rail domino logic and CMOS logic under near-threshold operation,” IEEE International Midwest Symposium on Circuits and Systems, pp. 25–28.
Rowley, J. D. (2018, Mar.). “Venture funding into AI and machine learning levels off as tech matures.” Retrieved on Jun. 12, 2019. Available: news.crunchbase.com/news/venture-funding-ai-machine-learning-levels-off-tech-matures/
Tyson, M. (2018, Jun.). “Intel 10nm density is 2.7× improved over its 14nm node.” Retrieved on Jun. 12, 2019. Available: https://hexus.net/tech/news/cpu/119699-intel-10nm-density-27x-improved-14nm-node
Hruska, J. (2018, Jun.). “As chip design costs skyrocket, 3nm process node is in jeopardy.” Retrieved on Jun. 12, 2019. Available: www.extremetech.com/computing/272096-3nm-process-node
Manners, D. (2014, Apr.). “EUV cost is $14bn and counting.” Retrieved on Jun. 12, 2019. Available: www.electronicsweekly.com/news/business/finance/euv-cost-14bn-counting-2014-04/
Moore, S. (2018, Jan.). “EUV lithography finally ready for chip manufacturing.” Retrieved on Jun. 12, 2019. Available: https://spectrum.ieee.org/semiconductors/nanotechnology/euv-lithography-finally-ready-for-chip-manufacturing
Aochi, H., Katsumata, R., and Fukuzumi, Y. (2011). “BiCS flash memory for realization of ultrahigh-density nonvolatile storage devices.” In Japanese. Toshiba Review. 66(9), pp. 16–19.
(2005, Oct.). “LSIは平面から立体へ チップを貫く伝送路で実現 [Shift from 2D to 3D LSI integration realized using through-chip interconnects].” In Japanese. Nikkei Electronics, (2005/10/10) pp. 82–91.
Yano, Y., Sugiyama, T., Ishihara, S., Fukui, Y., Juso, H., Miyata, K., Sota, Y., and Fujita, K. “Three-dimensional very thin stacked packaging technology for SiP.” 52nd Electronic Components and Technology Conference, pp. 1329–1334, May 2002.
Matsudera, K. and Kawasaki, K. (2016). “World’s first 16-die stacked NAND flash memory package fabricated using TSV technology.” Toshiba Review. 71(6), pp. 20–23.
Worwag, W. and Dory, T. “Copper via plating in three dimensional interconnects.” 2007 Electronic Components and Technology Conference, pp. 842–846, May 2007.
James, D. (2014). “3D ICs in the real world.” 25th Annual SEMI Advanced Semiconductor Manufacturing Conference, pp. 113–119.
Ezaki, T., Kondo, K., Ozaki, H., Sasaki, N., Yonemura, H., Kitano, M., Tanaka, S., and Hirayama, T. “A 160Gb/s interface design configuration for multichip LSI.” 2004 IEEE International Solid-State Circuits Conference, pp. 140–141, Feb. 2004.
Burns, J., McIlrath, L., Keast, C., Lewis, C., Loomis, A., Warner, K., and Wyatt, P. “Three-dimensional integrated circuits for low-power, high-bandwidth systems on a chip.” 2001 IEEE International Solid-State Circuits Conference, pp. 268–269, Feb. 2001.
Kanda, K., Antono, D. D., Ishida, K., Kawaguchi, H., Kuroda, T., and Sakurai, T. “1.27Gb/s/pin 3mW/pin wireless superconnect (WSC) interface scheme.” 2003 IEEE International Solid-State Circuits Conference, pp. 186–187, Feb. 2003.
Mizoguchi, D., Yusof, Y. B., Miura, N., Sakurai, T., and Kuroda, T. “A 1.2Gb/s/pin wireless superconnect based on inductive inter-chip signaling (IIS).” 2004 IEEE International Solid-State Circuits Conference, pp. 142–143, Feb. 2004.
Kumagai, K., Yang, C., Izumino, H., Narita, N., Shinjo, K., Iwashita, S., Nakaoka, Y., Kawamura, T., Komabashiri, H., Minato, T., Ambo, A., Suzuki, T., Liu, Z., Song, Y., Goto, S., Ikenaga, T., Mabuchi, Y., and Yoshida, K. “System-in-silicon architecture and its application to H.264/AVC motion estimation for 1080HDTV.” 2006 IEEE International Solid-State Circuits Conference, pp. 430–431, Feb. 2006.
Hopkins, D., Chow, A., Bosnyak, R., Coates, B., Ebergen, J., Fairbanks, S., Gainsley, J., Ho, R., Lexau, J., Liu, F., Ono, T., Schauer, J., Sutherland, I., and Drost, R. “Circuit techniques to enable 430Gb/s/mm2 proximity communication.” 2007 IEEE International Solid-State Circuits Conference, pp. 368–369, Feb. 2007.
Gu, Q., Xu, Z., Ko, J., and Chang, M.-C. F. “Two 10Gb/s/pin low-power interconnect methods for 3D ICs.” 2007 IEEE International Solid-State Circuits Conference, pp. 448–449, Feb. 2007.
Miura, N., Mizoguchi, D., Inoue, M., Tsuji, H., Sakurai, T., and Kuroda, T. “A 195Gb/s 1.2W 3D-stacked inductive inter-chip wireless superconnect with transmit power control scheme.” 2005 IEEE International Solid-State Circuits Conference, pp. 264–265, Feb. 2005.
Miura, N., Mizoguchi, D., Inoue, M., Niitsu, K., Nakagawa, Y., Tago, M., Fukaishi, M., Sakurai, T., and Kuroda, T. “A 1Tb/s 3W inductive-coupling transceiver for inter-chip clock and data link.” 2006 IEEE International Solid-State Circuits Conference, pp. 424–425, Feb. 2006.
Miura, N., Ishikuro, H., Sakurai, T., and Kuroda, T. “A 0.14pJ/b inductive-coupling inter-chip data transceiver with digitally-controlled precise pulse shaping.” 2007 IEEE International Solid-State Circuits Conference, pp. 358–359, Feb. 2007.
Miura, N., Kohama, Y., Sugimori, Y., Ishikuro, H., Sakurai, T., and Kuroda, T. “An 11Gb/s inductive-coupling link with burst transmission.” 2008 IEEE International Solid-State Circuits Conference, pp. 298–299, Feb. 2008.
Denda, S. (2015). 半導体の高次元化技術 [Enabling Technology for Higher Dimensional Semiconductors], in Japanese. Japan: Tokyo Denki University Press.
Kim, J. “The future of graphic and mobile memory for new applications.” 2016 IEEE Hot Chips 28 Symposium, Aug. 2016.
Karmarkar, A. P., Xu, X., and Moroz, V. “Performance and reliability analysis of 3D-integration structures employing Through Silicon Via (TSV).” IEEE International Reliability Physics Symposium, pp. 682–687, Apr. 2009.
Samal, S. K., Nayak, D., Ichihashi, M., Banna, S., and Lim, S. K. “Monolithic 3D IC vs. TSV-based 3D IC in 14nm FinFET technology.” IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference, pp. 1–2, 2016.
Bolsens, I. (2011). “2.5D ICs: Just a stepping stone or a long term alternative to 3D.” Retrieved Aug. 8, 2019. Available: www.xilinx.com/publications/about/3-D_Architectures.pdf
Madden, L., Wu, E., Kim, N., Banijamali, B., Abugharbieh, K., Ramalingam, S., and Wu, X. “Advancing high performance heterogeneous integration through die stacking.” 2012 European Solid-State Circuit Conference, pp. 18–24, Sep. 2012.
Chen, W. C., Hu, C., Ting, K. C., Wei, V., Yu, T. H., Huang, S. Y., Chang, V. C. Y., Wang, C. T., Hou, S. Y., Wu, C. H., and Yu, D. “Wafer level integration of an advanced logic-memory system through 2nd generation CoWoS® technology.” 2017 Symposium on VLSI Technology, pp. T54–T55, Jun. 2017.
Miura, N. and Kuroda, T. (2008). “3次元実装のための低電力・広帯域誘導結合通信 [Low-power and wideband inductive-coupling communication for 3D integration].” In Japanese. エレクトロニクス実装学会誌 [Journal of The Japan Institute of Electronics Packaging]. 11(3), pp. 174–181. Available: www.jstage.jst.go.jp/article/jiep1998/11/3/11_3_174/_pdf/-char/ja
Ishikuro, H. and Kuroda, T. (2010, Oct.). “Wireless proximity interfaces with a pulse-based inductive coupling technique.” IEEE Communications Magazine. 48(10), pp. 192–199.
Kuroda, T. “System integration in a package for cloud and edge.” 2017 IEEE Electron Devices Technology and Manufacturing Conference, pp. 42–43, Feb. 2017.
(2015). International Technology Roadmap for Semiconductors 2.0, 2015 Edition, Heterogeneous Integration. Available: www.dropbox.com/sh/3jfh5fq634b5yqu/AADYT8V2Nj5bX6C5q764kUg4a?dl=0&preview=2_2015+ITRS+2.0+Herogeneous+Integration.pdf
Take, Y., Miura, N., and Kuroda, T. “A 30 Gb/s/Link 2.2 Tb/s/mm2 inductively-coupled injection-locking CDR for high-speed DRAM interface.” 2010 IEEE Asian Solid-State Circuits Conference, pp. 81–84, Nov. 2010.
Miura, N., Shidei, T., Yuxiang, Y., Kawai, S., Takatsu, K., Kiyota, Y., Asano, Y., and Kuroda, T. “A 0.7V 20fJ/bit inductive-coupling data link with dual-coil transmission scheme.” 2010 Symposium on VLSI Circuits, pp. 201–202, Jun. 2010.
Saito, M., Miura, N., and Kuroda, T. “A 2Gb/s 1.8pJ/b/chip inductive-coupling through-chip bus for 128-Die NAND-Flash memory stacking.” 2010 IEEE International Solid-State Circuits Conference, pp. 440–441, Feb. 2010.
Saito, M., Sugimori, Y., Kohama, Y., Yoshida, Y., Miura, N., Ishikuro, H., Sakurai, T., and Kuroda, T. (2010, Jan.). “2 Gb/s 15 pJ/b/chip inductive-coupling programmable bus for NAND Flash memory stacking.” IEEE Journal of Solid-State Circuits. 45(1), pp. 134–141.
Niitsu, K., Shimazaki, Y., Sugimori, Y., Kohama, Y., Kasuga, K., Nonomura, I., Saen, M., Komatsu, S., Osada, K., Irie, N., Hattori, T., Hasegawa, A., and Kuroda, T. “An inductive-coupling link for 3D integration of a 90nm CMOS processor and a 65nm CMOS SRAM.” 2009 IEEE International Solid-State Circuits Conference, pp. 480–481, Feb. 2009.
Osada, K., Saen, M., Okuma, Y., Niitsu, K., Shimazaki, Y., Sugimori, Y., Kohama, Y., Kasuga, K., Nonomura, I., Irie, N., Hattori, T., Hasegawa, A., and Kuroda, T. “3D system integration of processor and multi-stacked SRAMs by using inductive-coupling links.” 2009 Symposium on VLSI Circuits, pp. 256–257, Jun. 2009.
Kosuge, A. “伝送線路結合器を用いた高信頼非接触インタフェース [High reliability contactless interface using transmission line coupler],” in Japanese, Ph.D. dissertation, Department of Electronics and Electrical Engineering, Keio University, Yokohama, Kanagawa Prefecture, Japan, 2016.
Meghelli, M., Rylov, S., Bulzacchelli, J., Rhee, W., Rylyakov, A., Ainspan, H., Parker, B., Beakes, M., Chung, A., Beukema, T., Pepeljugoski, P., Shan, L., Kwark, Y., Gowda, S., and Friedman, D. “A 10Gb/s 5-Tap-DFE/4-Tap-FFE transceiver in 90nm CMOS.” 2006 IEEE International Solid-State Circuits Conference, pp. 213–214, Feb. 2006.
Shekhar, S., Jaussi, J. E., O’Mahony, F., Mansuri, M., and Casper, B. “Design considerations for low-power receiver front-end in high-speed data links.” 2013 IEEE Custom Integrated Circuits Conference, pp. 1–8, Sep. 2013.
Musah, T., Jaussi, J. E., Balamurugan, G., Hyvonen, S., Hsueh, T.-C., Keskin, G., Shekhar, S., Kennedy, J., Sen, S., Inti, R., Mansuri, M., Leddige, M., Horine, B., Roberts, C., Mooney, R., and Casper, B. (2014, Dec.). “A 4-32 Gb/s bidirectional link with 3-Tap FFE/6-Tap DFE and collaborative CDR in 22 nm CMOS.” IEEE Journal of Solid-State Circuits. 49(12), pp. 3079–3090.
Kosuge, A., Kadomoto, J., and Kuroda, T. (2016, Jun.). “A 6 Gb/s 6 pJ/b 5 mm-distance non-contact interface for modular smartphones using two-fold transmission line coupler and high EMC tolerant pulse transceiver.” IEEE Journal of Solid-State Circuits. 51(6), pp. 1446–1456.
Kosuge, A., Hashiba, J., Kawajiri, T., Hasegawa, S., Shidei, T., Ishikuro, H., Kuroda, T., and Takeuchi, K. “Inductively-powered wireless solid-state drive (SSD) system with merged error correction of high-speed non-contact data links and NAND flash memory.” 2015 Symposium on VLSI Circuits, pp. c218–c219, Jun. 2015.
Kosuge, A., Ishizuka, S., Taguchi, M., Ishikuro, H., and Kuroda, T. (2015, Aug.). “Analysis and design of an 8.5-Gb/s/link multi-drop bus using energy-equipartitioned transmission line couplers.” IEEE Transactions on Circuits and Systems. 62(8), pp. 2122–2131.
Miura, N., Saito, M., Taguchi, M., and Kuroda, T. “A 6nW inductive-coupling wake-up transceiver for reducing standby power of non-contact memory card by 500×.” 2013 IEEE International Solid-State Circuits Conference, pp. 214–215, Feb. 2013.
Mizuhara, W., Shidei, T., Kosuge, A., Takeya, T., Miura, N., Taguchi, M., Ishikuro, H., and Kuroda, T. “A 0.15mm-thick non-contact connector for MIPI using vertical directional coupler.” 2013 IEEE International Solid-State Circuits Conference, pp. 200–201, Feb. 2013.
Kosuge, A., Ishizuka, S., Kadomoto, J., and Kuroda, T. “A 6Gb/s 6pJ/b 5mm-distance non-contact interface for modular smartphones using two-fold transmission-line coupler and EMC-qualified pulse transceiver.” 2015 IEEE International Solid-State Circuits Conference, pp. 176–177, Feb. 2015.
Kosuge, A., Ishizuka, S., Liu, L., Okada, A., Taguchi, M., Ishikuro, H., and Kuroda, T. “An electromagnetic clip connector for in-vehicle LAN to reduce wire harness weight by 30%.” 2014 IEEE International Solid-State Circuits Conference, pp. 496–497, Feb. 2014.
Kosuge, A., Ishizuka, S., Abe, M., Ichikawa, S., and Kuroda, T. “A 6.5Gb/s shared bus using electromagnetic connectors for downsizing and lightening satellite processor system by 60%.” 2015 IEEE International Solid-State Circuits Conference, pp. 434–435, Feb. 2015.
Figure 1.1 Programming ENIAC. Photo by US Army, public domain.

Figure 1.2 Evolution of the integrated circuit [1].

Table 1.1. Transistor scaling scenarios. Source: © 1995 IEICE, [2], table 1.

Figure 1.3 Low leakage device technologies [3].

Figure 1.4 IEEE Spectrum cover, April 2015. © 2015 IEEE.

Figure 1.5 A crimped termination. © 2010 SMK Corporation [5].

Figure 1.6 FPC connector dimensions scaling, 1996–2004. © 2010 SMK Corporation [5].

Figure 1.7 FPC connector pitch roadmap, 2016–2026. © 2017 JEITA [7].

Figure 1.8 Evolution of solutions to the connection problem of a large-scale computer.

Figure 1.9 Evolution of IC power and power density. © 1999 IPSJ [8].

Table 1.2. Constant electric field scaling changes from around 1995. Source: © 2007 IEICE, [9], table 1.

Figure 1.10 IC power and RC delay as a function of VDD and VTH. © 1995 IEICE, [2], figure 4.

Figure 1.11 Power consumption reduction through VDD optimization. © 1997 IEEE. Reprinted, with permission, from [10].

Figure 1.12 Leakage current reduction through VTH optimization. © 2002 IEEE. Reprinted, with permission, from [11].

Figure 1.13 Example ICs with variable VDD (VS) and VTH (VT). © 1996, 1997, 1998 IEEE. Reprinted, with permission, from (a) [12]; (b) [10]; (c) [13].

Figure 1.14 IC interface power scaling with process. © 2007 IEICE, [9], figure 4.

Figure 1.15 An off-chip interconnection in a typical PC memory system.

Figure 1.16 Impedance variations of off-chip interconnects.

Figure 1.17 A package-on-package in a smartphone.

Figure 1.18 IC power as a function of VDD [3].

Figure 1.19 Logic and memory energy consumption as a function of supply voltage. © 2011 IEEE. Reprinted, with permission, from [14].

Figure 1.20 Optimized VDD for minimum total energy. © 2014 IEEE. Reprinted, with permission, from [15].

Figure 1.21 Statistical system design approach and its application.

Figure 1.22 Power efficiency comparison: machine vs. human.

Figure 1.23 Using “fragrance” to create a smart environment. © 2007 IEICE, [9], figure 5.

Figure 1.24 Critical point in IC technology development [1].

Figure 1.25 From 2D to 3D integration.

Figure 1.26 Monolithic 3D ICs – FinFET and 3D NAND. (b) © 2011 Toshiba Corporation. Reprinted, with permission, from [25].

Figure 1.27 Evolution from 2D to 3D IC integration. © 2005 Nikkei BP [26]; photo: © 2016 Toshiba Corporation [28].

Figure 1.28 Conventional 3D IC integration solutions. (a) © 2007 IEEE. Reprinted, with permission, from [29].

Table 1.3. Comparison of conventional 3D IC integration solutions.

Figure 1.29 Advanced 3D IC integration solutions.

Figure 1.30 Structure of a TSV. Reproduced by permission from Sei-ichi Denda, 半導体の高次元化技術 [Enabling Technology for Higher Dimensional Semiconductors], Japan: Tokyo Denki University Press, 2015. © 2015 Denda Sei-ichi.

Figure 1.31 TSV formation within the IC fabrication process flow. Reproduced by permission from Sei-ichi Denda, 半導体の高次元化技術 [Enabling Technology for Higher Dimensional Semiconductors], Japan: Tokyo Denki University Press, 2015. © 2015 Denda Sei-ichi.

Figure 1.32 3D IC integration technology comparison: wirebond vs. TSV. Reproduced by permission from Sei-ichi Denda, 半導体の高次元化技術 [Enabling Technology for Higher Dimensional Semiconductors], Japan: Tokyo Denki University Press, 2015. © 2015 Denda Sei-ichi.

Table 1.4. Evolution of mobile DRAM energy efficiency.

Figure 1.33 Construction of a Wide I/O memory system.

Figure 1.34 Evolution of LPDDR bandwidth and energy efficiency.

Figure 1.35 Relative size comparison: TSV vs. NAND gate. © 2016 IEEE. Reprinted, with permission, from [45].

Figure 1.36 Cross section of a 2.5D integrated FPGA. © 2012 IEEE. Reprinted, with permission, from [47].

Table 1.5. Pros and cons of 2.5D IC integration.

Figure 1.37 Cross section of an HBM memory subsystem. © 2017 IEEE. Reprinted, with permission, from [48].

Figure 1.38 Inductive coupling interface test chips, 2006–2008. © 2008 The Japan Institute of Electronics Packaging. Reprinted, with permission, from [49].

Figure 1.39 Inductive coupling test chip, 2006. © 2008 The Japan Institute of Electronics Packaging. Reprinted, with permission, from [49].

Figure 1.40 Benchmarking the 2006 inductive coupling test chip. © 2008 The Japan Institute of Electronics Packaging. Reprinted, with permission, from [49].

Figure 1.41 Inductive coupling test chip, 2007. © 2008 The Japan Institute of Electronics Packaging. Reprinted, with permission, from [49].

Figure 1.42 Simulated S21 dependency on substrate resistivity. © 2008 The Japan Institute of Electronics Packaging. Reprinted, with permission, from [49].

Figure 1.43 Mechanism of capacitive and inductive coupling. © 2008 The Japan Institute of Electronics Packaging. Reprinted, with permission, from [49].

Figure 1.44 Performance comparison between inductive and capacitive coupling. © 2008 The Japan Institute of Electronics Packaging. Reprinted, with permission, from [49].
Figure 1.45 TCI 3D integration solutions.

Figure 1.46 Comparison between near- and far-field. © 2010 IEEE. Reprinted, with permission, from [50].

Figure 1.47 Near-field coupling interconnect technologies. © 2017 IEEE. Reprinted, with permission, from [51].

Figure 1.48 Disparity in performance scaling: chip vs. I/O.

Figure 1.49 Scaling comparison: flip-chip bump pitch vs. process.

Figure 1.50 Basic structure of TCI. (c) and (d) © 2017 IEEE. Reprinted, with permission, from [51].

Table 1.6. Merits of TCI interconnect technology.

Table 1.7. Comparison of TCI and TSV. Source: © 2017 IEEE. Reprinted, with permission, from [51].

Figure 1.51 TCI performance scaling scenario. © 2010 IEEE. Reprinted, with permission, from [50].

Figure 1.52 Example TCI applications. © 2010 IEEE. Reprinted, with permission, from [50].

Figure 1.53 High-capacity SSD enabled with TCI. © 2010 IEEE. Reprinted, with permission, from [56].

Figure 1.54 Processor and SRAM stack integrated with TCI. (a) © 2009 IEEE. Reprinted, with permission, from [57]; (b) Copyright 2009 The Japan Society of Applied Physics [58].

Figure 1.55 Data rate as a function of display format. © 2016 Atsutake Kosuge [59].

Figure 1.56 Data rate trends of widely adopted interfaces. © 2016 Atsutake Kosuge [59].

Figure 1.57 A typical interconnect structure in a modular system. © 2016 Atsutake Kosuge [59].

Figure 1.58 Internal structure of a backplane connector. © 2016 Atsutake Kosuge [59].

Figure 1.59 Simulated signal degradation of a backplane interconnect. © 2016 Atsutake Kosuge [59].

Figure 1.60 Common equalization circuit solutions. © 2016 Atsutake Kosuge [59].

Figure 1.61 Simulated received signal through a TLC connector and transmission line. © 2016 IEEE. Reprinted, with permission, from [63].

Figure 1.62 Basic structure of TLC. (a) and (b) © 2016 Atsutake Kosuge [59]; (c) © 2015 IEEE. Reprinted, with permission, from [64].

Figure 1.63 Transmission gain of an example TLC connector. © 2015 IEEE. Reprinted, with permission, from [65].

Table 1.8. Merits of TLC connector technology.

Figure 1.64 Example TLC applications. © 2017 IEEE. Reprinted, with permission, from [51].

Figure 1.65 A modular smartphone design. © 2015 IEEE. Reprinted, with permission, from [68].

  • Introduction
  • Tadahiro Kuroda, University of Tokyo, Wai-Yeung Yip, University of Tokyo
  • Book: Wireless Interface Technologies for 3D IC and Module Integration
  • Online publication: 17 September 2021
  • Chapter DOI: https://doi.org/10.1017/9781108893299.002