FreshRSS

Zobrazení pro čtení

Jsou dostupné nové články, klikněte pro obnovení stránky.

3.5D: The Great Compromise

The semiconductor industry is converging on 3.5D as the next best option in advanced packaging, a hybrid approach that includes stacking logic chiplets and bonding them separately to a substrate shared by other components.

This assembly model satisfies the need for big increases in performance while sidestepping some of the thorniest issues in heterogeneous integration. It establishes a middle ground between 2.5D, which already is in widespread use inside of data centers, and full 3D-ICs, which the chip industry has been struggling to commercialize for the better part of a decade.

A 3.5D architecture offers several key advantages:

  • It creates enough physical separation to effectively address thermal dissipation and noise.
  • It provides a way to add more SRAM into high-speed designs. SRAM has been the go-to choice for processor cache since the mid-1960s, and remains an essential element for faster processing. But SRAM no longer scales at the same rate as digital transistors, so it is consuming more real estate (in percentage terms) at each new node. And because the size of a reticle is fixed, the best available option is to add area by stacking chiplets vertically.
  • By thinning the interface between processing elements and memory, a 3.5D approach also can shorten the distances that signals need to travel and greatly improve processing speeds well beyond a planar implementation. This is essential with large language models and AI/ML, where the amount of data that needs to be processed quickly is exploding.

Chipmakers still point to fully integrated 3D-ICs as the best performing alternative to a planar SoC, but packing everything into a 3D configuration makes it harder to deal with physical effects. Thermal dissipation is probably the most difficult to contend with. Workloads can vary significantly, creating dynamic thermal gradients and trapping heat in unexpected places, which in turn reduce the lifespan and reliability of chips. On top of that, power and substrate noise become more problematic at each new node, as do concerns about electromagnetic interference.

“What the market has adopted first is high-performance chips, and those produce a lot of heat,” said Marc Swinnen, director of product marketing at Ansys. “They have gone for expensive cooling systems with a huge number of fans and heat sinks, and they have opted for silicon interposers, which arguably are some of the most expensive technologies for connecting chips together. But it also gives the highest performance and is very good for thermal because it matches the coefficient of thermal expansion. Thermal is one of the big reasons that’s been successful. In addition to that, you may want bigger systems with more stuff that you can’t fit on one chip. That’s just a reticle-size limitation. Another is heterogeneous integration, where you want multiple different processes, like an RF process or the I/O, which don’t need to be in 5nm.”

A 3.5D assembly also provides more flexibility to add additional processor cores, and higher yield because known good die can be manufactured and tested separately, a concept first pioneered by Xilinx in 2011 at 28nm.

3.5D is a loose amalgamation of all these approaches. It can include two to three chiplets stacked on top of each other, or even multiple stacks laid out horizontally.

“It’s limited vertical, and not just for thermal reasons,” said Bill Chen, fellow and senior technical advisor at ASE Group. “It’s also for performance reasons. But thermal is the limiting factor, and we’ve talked about many different materials to help with that — diamond and graphene — but that limitation is still there.”

This is why the most likely combination, at least initially, will be processors stacked on SRAM, which simplifies the cooling. The heat generated by high utilization of different processing elements can be removed with heat sinks or liquid cooling. And with one or more thinned out substrates, signals will travel shorter distances, which in turn uses less power to move data back and forth between processors and memory.

“Most likely, this is going to be logic over memory on a logic process,” said Javier DeLaCruz, fellow and senior director of Silicon Ops Engineering at Arm. “These are all contained within an SoC normally, but a portion of that is going to be SRAM, which does not scale very well from node to node. So having logic over memory and a logic process is really the winning solution, and that’s one of the better use cases for 3D because that’s what really shortens your connectivity. A processor generally doesn’t talk to another processor. They talk to each other through memory, so having the memory on a different floor with no latency between them is pretty attractive.”

The SRAM doesn’t necessarily have to be at the same node as the processors advanced node, which also helps with yield, and reliability. At a recent Samsung Foundry event, Taejoong Song, the company’s vice president of foundry business development, showed a roadmap of a 3.5D configuration using a 2nm chiplet stacked on a 4nm chiplet next year, and a 1.4nm chiplet on top of a 2nm chiplet in 2027.


Fig. 1: Samsung’s heterogeneous integration roadmap showing stacked DRAM (HBM), chiplets and co-packaged optics. Source: Samsung Foundry

Intel Foundry’s approach is similar in many ways. “Our 3.5D technology is implemented on a substrate with silicon bridges,” said Kevin O’Buckley, senior vice president and general manager of Foundry Services at Intel. “This is not an incredibly costly, low-yielding, multi-reticle form-factor silicon, or even RDL. We’re using thin silicon slices in a much more cost-efficient fashion to enable that die-to-die connectivity — even stacked die-to-die connectivity — through a silicon bridge. So you get the same advantages of silicon density, the same SI (signal integrity) performance of that bridge without having to put a giant monolithic interposer underneath the whole thing, which is both cost- and capacity-prohibitive. It’s working. It’s in the lab and it’s running.”


Fig. 2: Intel’s 3.5D model. Source: Intel

The strategy here is partly evolutionary — 3.5D has been in R&D for at least several years — and part revolutionary, because thinning out the interconnect layer, figuring out a way to handle these thinner interconnect layers, and how to bond them is still a work in progress. There is a potential for warping, cracking, or other latent defects, and dynamically configuring data paths to maximize throughput is an ongoing challenge. But there have been significant advances in thermal management on two- and three-chiplet stacks.

“There will be multiple solutions,” said C.P. Hung, vice president of corporate R&D at ASE. “For example, besides the device itself and an external heat sink, a lot of people will be adding immersion cooling or local liquid cooling. So for the packaging, you can probably also expect to see the implementation of a vapor chamber, which will add a good interface from the device itself to an external heat sink. With all these challenges, we also need to target a different pitch. For example, nowadays you see mass production with a 45 to 40 pitch. That is a typical bumping solution. We expect the industry to move to a 25 to 20 micron bump pitch. Then, to go further, we need hybrid bonding, which is a less than 10 micron pitch.”


Fig. 3: Today’s interposers support more than 100,000 I/Os at a 45m pitch. Source: ASE

Hybrid bonding solves another thorny problem, which is co-planarity across thousands of micro-bumps. “People are starting to realize that the densities we’re interconnecting require a level of flatness, which the guys who make traditional things to be bonded are having a hard time meeting with reasonable yield,” David Fromm, COO at Promex Industries. “That makes it hard to build them, and the thinking is, ‘So maybe we’ve got to do something else.’ You’re starting to see some of that.”

Taming the Hydra
Managing heat remains a challenge, even with all the latest advances and a 3.5D assembly, but the ability to isolate the thermal effects from other components is the best option available today, and possibly well into the future. Still, there are other issues to contend with. Even 2.5D isn’t easy, and a large percentage of the 2.5D implementations have been bespoke designs by large systems companies with very deep pockets.

One of the big remaining challenges is closing timing so that signals arrive at the right place at the right fraction of a second. This becomes harder as more elements are added into chips, and in a 3.5D or 3D-IC, this can be incredibly complex.

“Timing ultimately is the key,” said Sutirtha Kabir, R&D director at Synopsys. “It’s not guaranteed that at whatever your temperature is, you can use the same library for timing. So the question is how much thermal- and IR-aware timing do you have to do? These are big systems. You have to make sure your sign-off is converging. There are two things coming out. There are a bunch of multi-physics effects that are all clumped together. And yes, you could traditionally do one at a time as sign-off, but that isn’t going to work very well. You need to figure out how to solve these problems simultaneously. Ultimately, you’re doing one design. It’s not one for thermal, one for IR, one for timing. The second thing is the data is exploding. How do you efficiently handle the data, because you cannot wait for days and days of runtime and simulation and analysis?”

Physically assembling these devices isn’t easy, either. “The challenge here is really in the thermal, electrical, and mechanical connection of all these various die with different thicknesses and different coefficients of thermal expansion,” said Intel’s O’Buckley. “So with three die, you’ve got the die and an active base, and those are substantially thinned to enable them to come together. And then EMIB is in the substrate. There’s always intense thermal-mechanical qualification work done to manage not just the assembly, but to ensure in the final assembly — the second-level assembly when this is going through system-level card attach — that this thing stays together.”

And depending upon demands for speed, the interconnects and interconnect materials can change. “Hybrid bonding gives you, by far, the best signal and power density,” said Arm’s DeLaCruz. “And it gives you the best thermal conductivity, because you don’t have that underfill that you would otherwise have to put in between the die, which is a pretty significant barrier. This is likely where the industry will go. It’s just a matter of having the production base.”

Hybrid bonding has been used for years for image sensors using wafer-on-wafer connections. “The tricky part is going into the logic space, where you’re moving from wafer-on-wafer to a die-on-wafer process, which is more complex,” DeLaCruz said. “While it currently would cost more, that’s a temporary problem because there’s not much of an installed base to support it and drive down the cost. There’s really no expensive material or equipment costs.”

Toward mass customization
All of this is leading toward the goal of choosing chiplets from a menu and then rapidly connecting them into some sort of architecture that is proven to work. That may not materialize for years. But commercial chiplets will show up in advanced designs over the next couple years, most likely in high-bandwidth memory with a customized processor in the stack, with more following that path in the future.

At least part of this will depend on how standardized the processes for designing, manufacturing, and testing become. “We’re seeing a lot of 2.5D from customers able to secure silicon interposers,” said Ruben Fuentes, vice president for the Design Center at Amkor Technology. “These customers want to place their chiplets on an interposer, then the full module is placed on a flip-chip substrate package. We also have customers who say they either don’t want to use a silicon interposer or cannot secure them. They prefer an RDL interconnect with S-SWIFT or with S-Connect, which serves as an interposer in very dense areas.”

But with at least a third of these leading designs only for internal use, and the remainder confined to large processor vendors, the rest of the market hasn’t caught up yet. Once it does, that will drive economies of scale and open the door to more complete assembly design kits, commercial chiplets, and more options for customization.

“Everybody is generally going in the same direction,” said Fuentes. “But not everything is the same height. HBMs are pre-packaged and are taller than ICs. HBMs could have 12 or 16 ICs stacked inside. It makes a difference from a co-planarity and thermal standpoint, and metal balancing on different layers. So now vendors are having a hard time processing all this data because suddenly you have these huge databases that are a lot bigger than the standard packaging databases. We’re seeing bridges, S-Connect, SWIFT, and then S-SWIFT. This is new territory, and we’re seeing a performance gap in the packaging tools. There’s work that needs to be done here, but software vendors have been very proactive in finding solutions. Additionally, these packages need to be routed. There is limited automated routing, so a good amount of interactive routing is still required, so it takes a lot of time.”


Fig. 4: Packaging roadmap showing bridge and hybrid bonding connections for modules and chiplets, respectively. Source: Amkor Technology

What’s missing
The key challenges ahead for 3.5D are proven reliability and customizability — requirements that are seemingly contradictory, and which are beyond the control of any single company. There are four major pieces to making all of this work.

EDA is the first important piece of the puzzle, and the challenge extends just beyond a single chip. “The IC designers have to think about a lot of things concurrently, like thermal, signal integrity, and power integrity,” said Keith Lanier, technical product management director at Synopsys. “But even beyond that, there’s a new paradigm in terms of how people need to work. Traditional packaging folks and IC designers need to work closely together to make these 3.5D designs successful.”

It’s not just about doing more with the same or fewer people. It’s doing more with different people, too. “It’s understanding the architecture definition, the functional requirements, constraints, and having those well-defined,” Lanier said. “But then it’s also feasibility, which includes partitioning and technology selection, and then prototyping and floor-planning. This is lots and lots of data that is required to be generated, and you need analysis-driven exploration, design, and implementation. And AI will be required to help designers and system design teams manage the sheer complexity of these 3.5D designs.”

Process/assembly design kits are a second critical piece, and this is likely to be split between the foundries and the OSATs. “If the customer wants a silicon interposer for a 2.5D package, it would be up to the foundry that’s going to manufacture the interposer to provide the PDK. We would provide the PDK for all of our products, such as S-SWIFT and S-Connect,” said Amkor’s Fuentes.

Setting realistic parameters is the third piece of the puzzle. While the type of processing elements and some of the analog functions may change — particularly those involving power and communication — most of the components will remain the same. That determines what can be pre-built and pre-tested, and the speed and ease of assembly.

“A lot of the standards that are being deployed, like UCIe interfaces and HBM interfaces are heading to where 20% is customization and 80% is on the shelf,” said Intel’s O’Buckley. “But we aren’t there today. At the scale that our customers are deploying these products, the economics of spending that extra time to optimize an implementation is a decimal point. It’s not leveraging 80/20 standards. We’ll get there. But most of these designs you can count on your fingers and toes because of the cost and scale required to do them. And until the infrastructure for standards-based chiplets gets mature, the barrier of entry for companies that want to do this without that scale is just too high. Still, it is going to happen.”

Ensuring processes are consistent is the fourth piece of the puzzle. The tools and the individual processes don’t need to change. “The customer has a ‘target’ for the outcome they want for a particular tool, which typically is a critical dimension measured by a metrology tool,” said David Park, vice president of marketing at Tignis. “As long as there is some ‘measurement’ that determines the goodness of some outcome, which typically is the result of a process step, we can either predict the bad outcome — and engineers have to take some corrective or preventive action — or we can optimize the recipe of that tool in real time to keep the result in the range they want.”

Park noted there is a recipe that controls the inputs. “The tool does whatever it is supposed to do,” he said. “Then you measure the output to see how far you deviated from the acceptable output.”

The challenge is that inside of a 3.5D system, what is considered acceptable output is still being defined. There are many processes with different tolerances. Defining what is consistent enough will require a broad understanding of how all the pieces work together under specific workloads, and where the potential weaknesses are that need to be adjusted.

“One of the problems here is as these densities get higher and the copper pillars get smaller, the amount of space you need between the pillar and the substrate have to be highly controlled,” said Dick Otte, president and CEO of Promex. “There’s a conflict — not so much with how you fabricate the chip, because it usually has the copper pillars on it — but with the substrate. A lot of the substrate technologies are not inherently flat. It’s the same issue with glass. You’ve got a really nice flat piece of glass. The first thing you’re going to do is put down a layer of metal and you’re going to pattern it. And then you put down a layer of dielectric, and suddenly you’ve got a lump where the conductor goes. And now, where do you put the contact points? So you always have the one plan which is going to be the contact point where all the pillars come in. But what if I only need one layer and I don’t need three?”

Conclusion
For the past decade, the chip industry has been trying to figure out a way to balance faster processing, domain-specific designs, limited reticle size, and the enormous cost of scaling an SoC. After investigating nearly every possible packaging approach, interconnect, power delivery method, substrate and dielectric material, 3.5D has emerged as the front runner — at least for now.

This approach provides the chip industry with a common thread on which to begin developing assembly design kits, commercial chiplets, and to fill in the missing tools and services throughout the supply chain. Whether this ultimately becomes a springboard for full 3D-ICs, or a platform on which to use 3D stacking more effectively, remains to be seen. But for the foreseeable future, large chipmakers have converged on a path forward to provide orders of magnitude performance improvements and a way to contain costs. The rest of the industry will be working to smooth out that path for years to come.

Related Reading
Intel Vs. Samsung Vs. TSMC
Foundry competition heats up in three dimensions and with novel technologies as planar scaling benefits diminish.
3D Metrology Meets Its Match In 3D Chips And Packages
Next-generation tools take on precision challenges in three dimensions.
Design Flow Challenged By 3D-IC Process, Thermal Variation
Rethinking traditional workflows by shifting left can help solve persistent problems caused by process and thermal variations.
Floor-Planning Evolves Into The Chiplet Era
Automatically mitigating thermal issues becomes a top priority in heterogeneous designs.

The post 3.5D: The Great Compromise appeared first on Semiconductor Engineering.

Chip Industry Week In Review

Rapidus and IBM are jointly developing mass production capabilities for chiplet-based advanced packages. The collaboration builds on an existing agreement to develop 2nm process technology.

Vanguard and NXP will jointly establish VisionPower Semiconductor Manufacturing Company (VSMC) in Singapore to build a $7.8 billion, 12-inch wafer plant. This is part of a global supply chain shift “Out of China, Out of Taiwan,” according to TrendForce.

Alphawave joined forces with Arm to develop an advanced chiplet based on Arm’s Neoverse Compute Subystems for AI/ML. The chiplet contains the Neoverse N3 CPU core cluster and Arm Coherent Mesh Network, and will be targeted at HPC in data centers, AI/ML applications, and 5G/6G infrastructure.

ElevATE Semiconductor and GlobalFoundries will partner for high-voltage chips to be produced at GF’s facility in Essex Junction, Vermont, which GF bought from IBM. The chips are essential for semiconductor testing equipment, aerospace, and defense systems.

NVIDIA, OpenAI, and Microsoft are under investigation by the U.S. Federal Trade Commission and Justice Department for violation of antitrust laws in the generative AI industry, according to the New York Times.

Quick links to more news:

Market Reports
Global
In-Depth
Education and Training
Security
Product News
Research
Events and Further Reading


Global

Apollo Global Management will invest $11 billion in Intel’s Fab 34 in Ireland, thereby acquiring a 49% stake in Intel’s Irish manufacturing operations.

imec and ASML opened their jointly run High-NA EUV Lithography Lab in Veldhoven, the Netherlands. The lab will be used to prepare  the next-generation litho for high-volume manufacturing, expected to begin in 2025 or 2026.

Expedera opened a new semiconductor IP design center in India. The location, the sixth of its kind for the company, is aimed at helping to make up for a shortfall in trained technicians, researchers, and engineers in the semiconductor sector.

Foxconn will build an advanced computing center in Taiwan with NVIDIA’s Blackwell platform at its core. The site will feature GB200 servers, which consist of 64 racks and 4,608 GPUs, and will be completed by 2026.

Intel and its 14 partner companies in Japan will use Sharp‘s LCD plants to research semiconductor production technology, a cost reduction move that should also produce income for Sharp, according to Nikkei Asia.

Japan is considering legislation to support the commercial production of advanced semiconductors, per Reuters.

Saudi Arabia aims to establish at least 50 semiconductor design companies as part of a new National Semiconductor Hub, funded with over $266 million.

Air Liquide is opening a new industrial gas production facility in Idaho, which will produce ultra-pure nitrogen and other gases for Micron’s new fab.

Microsoft will invest 33.7 billion Swedish crowns ($3.2 billion) to expand its cloud and AI infrastructure in Sweden over a two-year period, reports Bloomberg. The company also will invest $1 billion to establish a new data center in northwest Indiana.

AI data centers could consume as much as 9.1% of the electricity generated in the U.S. by 2030, according to a white paper published by the Electric Power Research Institute. That would more than double the electricity currently consumed by data centers, though EPRI notes this is a worst case scenario and advances in efficiency could be a mitigating factor.


Markets and Money

The Semiconductor Industry Association (SIA) announced global semiconductor sales increased 15.8% year-over-year in April, and the group projected a market growth of 16% in 2024. Conversely, global semiconductor equipment billings contracted 2% year-over-year to US$26.4 billion in Q1 2024, while quarter-over-quarter billings dropped 6% during the same period, according to SEMI‘s Worldwide Semiconductor Equipment Market Statistics (WWSEMS) Report.

Cadence completed its acquisition of BETA CAE Systems International, a provider of multi-domain, engineering simulation solutions.

Cisco‘s investment arm launched a $1 billion fund to aid AI startups as part of its AI innovation strategy. Nearly $200 million has already been earmarked.

The power and RF GaN markets will grow beyond US$2.45 billion and US$1.9 billion in 2029, respectively, according to Yole, which is offering a webinar on the topic.

The micro LED chip market is predicted to reach $580 million by 2028, driven by head-mounted devices and automotive applications, according to TrendForce. The cost of Micro LED chips may eventually come down due to size miniaturization.


In-Depth

Semiconductor Engineering published its Automotive, Security, and Pervasive Computing newsletter this week, featuring these top stories:

More reporting this week:


Security

Scott Best, Rambus senior director of Silicon Security Products, delivered a keynote at the Hardwear.io conference this week (below), detailing a $60 billion reverse engineering threat for hardware in just three markets — $30 billion for printer consumables, $20 billion for rechargeable batteries with some type of authentication, and $10 billion for medical devices such as sonogram probes.


Photo source: Ed Sperling/Semiconductor Engineering

wolfSSL debuted wolfHSM for automotive hardware security modules, with its cryptographic library ported to run in automotive HSMs like Infineon’s Aurix Tricore TC3XX.

Cisco integrated AMD Pensando data processing units (DPUs) with its Hypershield security architecture for defending AI-scale data centers.

OMNIVISION released an intelligent CMOS image sensor for human presence detection, infrared facial authentication, and always-on technology with a single sensing camera. And two new image sensors for industrial and consumer security surveillance cameras.

Digital Catapult announced a new cohort of companies will join Digital Security by Design’s Technology Access Program, gaining access to an Arm Morello prototype evaluation hardware kit based on Capability Hardware Enhanced RISC Instructions (CHERI), to find applications across critical UK sectors.

University of Southampton researchers used formal verification to evaluate the hardware reliability of a RISC-V ibex core in the presence of soft errors.

Several institutions published their students’ master’s and PhD work:

  • Virginia Tech published a dissertation proposing sPACtre, a defense mechanism that aims to prevent Spectre control-flow attacks on existing hardware.
  • Wright State University published a thesis proposing an approach that uses various machine learning models to bring an improvement in hardware Trojan identification with power signal side channel analysis
  • Wright State University published a thesis examining the effect of aging on the reliability of SRAM PUFs used for secure and trusted microelectronics IC applications.
  • Nanyang Technological University published a Final Year Project proposing a novel SAT-based circuit preprocessing attack based on the concept of logic cones to enhance the efficacy of SAT attacks on complex circuits like multipliers.

The Cybersecurity and Infrastructure Security Agency (CISA) issued a number of alerts/advisories.


Education and Training

Renesas and the Indian Institute of Technology Hyderabad (IIT Hyderabad) signed a three-year MoU to collaborate on VLSI and embedded semiconductor systems, with a focus on R&D and academic interactions to advance the “Make in India” strategy.

Charlie Parker, senior machine learning engineer at Tignis, presented a talk on “Why Every Fab Should Be Using AI.

Penn State and the National Sun Yat-Sen University (NSYSU) in Taiwan partnered to develop educational and research programs focused on semiconductors and photonics.

Rapidus and Hokkaido University partnered on education and research to enhance Japan’s scientific and technological capabilities and develop human resources for the semiconductor industry.

The University of Minnesota named Steve Koester its first “Chief Semiconductor Officer,” and launched a website devoted to semiconductor and microelectronics research and education.

The state of Michigan invested $10 million toward semiconductor workforce development.


Product News

Siemens reported breakthroughs in high-level C++ verification that will be used in conjunction with its Catapult software. Designers will be able to use formal property checking via the Catapult Formal Assert software and reachability coverage analysis through Catapult Formal CoverCheck.

Infineon released several products:

Augmental, an MIT Media Lab spinoff, released a tongue-based computer controller, dubbed the MouthPad.

NVIDIA revealed a new line of products that will form the basis of next-gen AI data centers. Along with partners ASRock Rack, ASUS, GIGABYTE, Ingrasys, and others, the NVIDIA GPUs and networking tech will offer cloud, on-premises, embedded, and edge AI systems. NVIDIA founder and CEO Jensen Huang showed off the company’s upcoming Rubin platform, which will succeed its current Blackwell platform. The new system will feature new GPUs, an Arm-based CPU and advanced networking with NVLink 6, CX9 SuperNIC and X1600 converged InfiniBand/Ethernet switch.

Intel showed off its Xeon 6 processors at Computex 2024. The company also unveiled architectural details for its Lunar Lake client computing processor, which will use 40% less SoC power, as well as a new NPU, and X2 graphic processing unit cores for gaming.


Research

imec released a roadmap for superconducting digital technology to revolutionize AI/ML.

CEA-Leti reported breakthroughs in three projects it considers key to the next generation of CMOS image sensors. The projects involved embedding AI in the CIS and stacking multiple dies to create 3D architectures.

Researchers from MIT’s Computer Science & Artificial Intelligence Laboratory (MIT-CSAIL) used a type of generative AI, known as diffusion models, to train multi-purpose robots, and designed the Grasping Neural Process for more intelligent robotic grasping.

IBM and Pasqal partnered to develop a common approach to quantum-centric supercomputing and to promote application research in chemistry and materials science.

Stanford University and Q-NEXT researchers investigated diamond to find the source of its temperamental nature when it comes to emitting quantum signals.

TU Wien researchers investigated how AI categorizes images.

In Canada:

  • Simon Fraser University received funding of over $80 million from various sources to upgrade the supercomputing facility at the Cedar National Host Site.
  • The Digital Research Alliance of Canada announced $10.28 million to renew the University of Victoria’s Arbutus cloud infrastructure.
  • The Canadian government invested $18.4 million in quantum research at the University of Waterloo.

Events and Further Reading

Find upcoming chip industry events here, including:

Event Date Location
SNUG Europe: Synopsys User Group Jun 10 – 11 Munich
IEEE RAS in Data Centers Summit: Reliability, Availability and Serviceability Jun 11 – 12 Santa Clara, CA
AI for Semiconductors (MEPTEC) Jun 12 – 13 Online
3D & Systems Summit Jun 12 – 14 Dresden, Germany
PCI-SIG Developers Conference Jun 12 – 13 Santa Clara, CA
Standards for Chiplet Design with 3DIC Packaging (Part 1) Jun 14 Online
AI Hardware and Edge AI Summit: Europe Jun 18 – 19 London, UK
Standards for Chiplet Design with 3DIC Packaging (Part 2) Jun 21 Online
DAC 2024 Jun 23 – 27 San Francisco
RISC-V Summit Europe 2024 Jun 24 – 28 Munich
Leti Innovation Days 2024 Jun 25 – 27 Grenoble, France
Find All Upcoming Events Here

Upcoming webinars are here.


Semiconductor Engineering’s latest newsletters:

Automotive, Security and Pervasive Computing
Systems and Design
Low Power-High Performance
Test, Measurement and Analytics
Manufacturing, Packaging and Materials

 

The post Chip Industry Week In Review appeared first on Semiconductor Engineering.

Predicting And Preventing Process Drift

Increasingly tight tolerances and rigorous demands for quality are forcing chipmakers and equipment manufacturers to ferret out minor process variances, which can create significant anomalies in device behavior and render a device non-functional.

In the past, many of these variances were ignored. But for a growing number of applications, that’s no longer possible. Even minor fluctuations in deposition rates during a chemical vapor deposition (CVD) process, for example, can lead to inconsistencies in layer uniformity, which can impact the electrical isolation properties essential for reliable circuit operation. Similarly, slight variations in a photolithography step can cause alignment issues between layers, leading to shorts or open circuits in the final device.

Some of these variances can be attributed to process error, but more frequently they stem from process drift — the gradual deviation of process parameters from their set points. Drift can occur in any of the hundreds of process steps involved in manufacturing a single wafer, subtly altering the electrical properties of chips and leading to functional and reliability issues. In highly complex and sensitive ICs, even the slightest deviations can cause defects in the end product.

“All fabs already know drift. They understand drift. They would just like a better way to deal with drift,” said David Park, vice president of marketing at Tignis. “It doesn’t matter whether it’s lithography, CMP (chemical mechanical polishing), CVD or PVD (chemical/physical vapor deposition), they’re all going to have drift. And it’s all going to happen at various rates because they are different process steps.”

At advanced nodes and in dense advanced packages, where a nanometer can be critical, controlling process drift is vital for maintaining high yield and ensuring profitability. By rigorously monitoring and correcting for drift, engineers can ensure that production consistently meets quality standards, thereby maximizing yield and minimizing waste.

“Monitoring and controlling hundreds of thousands of sensors in a typical fab requires the ability to handle petabytes of real-time data from a large variety of tools,” said Vivek Jain, principal product manager, smart manufacturing at Synopsys. “Fabs can only control parameters or behaviors they can measure and analyze. They use statistical analysis and error budget breakdowns to define upper control limits (UCLs) and lower control limits (LCLs) to monitor the stability of measured process parameters and behaviors.”

Dialing in legacy fabs
In legacy fabs — primarily 200mm — most of the chips use 180nm or older process technology, so process drift does not need to be as precisely monitored as in the more advanced 300mm counterparts. Nonetheless, significant divergence can lead to disparities in device performance and reliability, creating a cascade of operational challenges.

Manufacturers operating at older technology nodes might lack the sophisticated, real-time monitoring and control methods that are standard in cutting-edge fabs. While the latter have embraced ML to predict and correct for drift, many legacy operations still rely heavily on periodic manual checks and adjustments. Thus, the management of process drift in these settings is reactive rather than proactive, making changes after problems are detected rather than preventing them.

“There is a separation between 300-millimeter and 200-millimeter fabs,” said Park. “The 300-millimeter guys are all doing some version of machine learning. Sometimes it’s called advanced process control, and sometimes it’s actually AI-powered process control. For some of the 200-millimeter fabs with more mature process nodes, they basically have a recipe they set and a bunch of technicians looking at machines and looking at the CDs. When the drift happens, they go through their process recipe and manually adjust for the out-of-control processes, and that’s just what they’ve always done. It works for them.”

For these older fabs, however, the repercussions of process drift can be substantial. Minor deviations in process parameters, such as temperature or pressure during the deposition or etching phases, gradually can lead to changes in the physical structure of the semiconductor devices. Over time, these minute alterations can compound, resulting in layers of materials that deviate from their intended characteristics. Such deviations affect critical dimensions and ultimately can compromise the electrical performance of the chip, leading to slower processing speeds, higher power consumption, or outright device failure.

The reliability equation is equally impacted by process drift. Chips are expected to operate consistently over extended periods, often under a range of environmental conditions. However, when process-induced variability can weaken the device’s resilience, precipitating early wear-out mechanisms and reducing its lifetime. In situations where dependability is non-negotiable, such as in automotive or medical applications, those variations can have dire consequences.

But with hundreds of process steps for a typical IC, eliminating all variability in fabs is simply not feasible.

“Process drift is never going to not happen, because the processes are going to have some sort of side effect,” Park said. “The machines go out of spec and things like pumps and valves and all sorts of things need to be replaced. You’re still going to have preventive maintenance (PM). But if the critical dimensions are being managed correctly, which is typically what triggers the drift, you can go a longer period of time between cleanings or the scheduled PMs and get more capacity.”

Process drift pitfalls
Managing process drift in semiconductor manufacturing presents several complex challenges. Hysteresis, for example, is a phenomenon where the output of a process varies not solely because of current input conditions, but also based on the history of the states through which the process already has passed. This memory effect can significantly complicate precision control, as materials and equipment might not reset to a baseline state after each operational cycle. Consequently, adjustments that were effective in previous cycles may not yield the same outcomes due to accumulated discrepancies.

One common cause of hysteresis is thermal cycling, where repeated heating and cooling create mechanical stresses. Those stresses can be additive, releasing inconsistently based on temperature history.  That, in turn, can lead to permanent changes in the output of a circuit, such as a voltage reference, which affects its precision and stability.

In many field-effect transistors (FETs), hysteresis also can occur due to charge trapping. This happens when charges are captured in ‘trap states’ within the semiconductor material or at the interface with another material, such as an oxide layer. The trapped charges then can modulate the threshold voltage of the device over time and under different electrical biases, potentially leading to operational instability and variability in device performance.

Human factors also play a critical role in process drift, with errors stemming from incorrect settings adjustments, mishandling of materials, misinterpretation of operational data, or delayed responses to process anomalies. Such errors, though often minor, can lead to substantial variations in manufacturing processes, impacting the consistency and reliability of semiconductor devices.

“Once in production, the biggest source of variability is human error or inconsistency during maintenance,” said Russell Dover, general manager of service product line at Lam Research. “Wet clean optimization (WCO) and machine learning through equipment intelligence solutions can help address this.”

The integration of new equipment into existing production lines introduces additional complexities. New machinery often features increased speed, throughput, and tighter tolerances, but it must be integrated thoughtfully to maintain the stringent specifications required by existing product lines. This is primarily because the specifications and performance metrics of legacy chips have been long established and are deeply integrated into various applications with pre-existing datasheets.

“From an equipment supplier perspective, we focus on tool matching,” said Dover. “That includes manufacturing and installing tools to be identical within specification, ensuring they are set up and running identically — and then bringing to bear systems, tooling, software and domain knowledge to ensure they are maintained and remain as identical as possible.”

The inherent variability of new equipment, even those with advanced capabilities, requires careful calibration and standardization.

“Some equipment, like transmission electron microscopes, are incredibly powerful,” said Jian-Min Zuo, a materials science and engineering professor at the University of Illinois’ Grainger College of Engineering. “But they are also very finicky, depending on how you tune the machine. How you set it up under specific conditions may vary slightly every time. So there are a number of things that can be done when you try to standardize those procedures, and also standardize the equipment. One example is to generate a curate, like a certain type of test case, where you can collect data from different settings and make sure you’re taking into account the variability in the instruments.”

Process drift solutions
As semiconductor manufacturers grapple with the complexities of process drift, a diverse array of strategies and tools has emerged to address the problem. Advanced process control (APC) systems equipped with real-time monitoring capabilities can extract patterns and predictive insights from massive data sets gathered from various sensors throughout the manufacturing process.

By understanding the relationships between different process variables, APC can predict potential deviations before they result in defects. This predictive capability enables the system to make autonomous adjustments to process parameters in real-time, ensuring that each process step remains within the defined control limits. Essentially, APC acts as a dynamic feedback mechanism that continuously fine-tunes the production process.

Fig. 1: Reduced process drift with AI/ML advanced process control. Source: Tignis

Fig. 1: Reduced process drift with AI/ML advanced process control. Source: Tignis

While APC proactively manages and optimizes the process to prevent deviations, fault detection and classification (FDC) reacts to deviations by detecting and classifying any faults that still occur.

FDC data serves as an advanced early-warning system. This system monitors the myriad parameters and signals during the chip fabrication process, rapidly detecting any variances that could indicate a malfunction or defect in the production line. The classification component of FDC is particularly crucial, as it does more than just flag potential issues. It categorizes each detected fault based on its characteristics and probable causes, vastly simplifying the trouble-shooting process. This allows engineers to swiftly pinpoint the type of intervention needed, whether it’s recalibrating instruments, altering processing recipes, or conducting maintenance repairs.

Statistical process control (SPC) is primarily focused on monitoring and controlling process variations using statistical methods to ensure the process operates efficiently and produces output that meets quality standards. SPC involves plotting data in real-time against control limits on control charts, which are statistically determined to represent the expected normal process behavior. When process measurements stray outside these control limits, it signals that the process may be out of control due to special causes of variation, requiring investigation and correction. SPC is inherently proactive and preventive, aiming to detect potential problems before they result in product defects.

“Statistical process control (SPC) has been a fundamental methodology for the semiconductor industry almost from its very foundation, as there are two core factors supporting the need,” said Dover. “The first is the need for consistent quality, meaning every product needs to be as near identical as possible, and second, the very high manufacturing volume of chips produced creates an excellent workspace for statistical techniques.”

While SPC, FDC, and APC might seem to serve different purposes, they are deeply interconnected. SPC provides the baseline by monitoring process stability and quality over time, setting the stage for effective process control. FDC complements SPC by providing the tools to quickly detect and address anomalies and faults that occur despite the preventive measures put in place by SPC. APC takes insights from both SPC and FDC to adjust process parameters proactively, not just to correct deviations but also to optimize process performance continually.

Despite their benefits, integrating SPC, FDC and APC systems into existing semiconductor manufacturing environments can pose challenges. These systems require extensive configuration and tuning to adapt to specific manufacturing conditions and to interface effectively with other process control systems. Additionally, the success of these systems depends on the quality and granularity of the data they receive, necessitating high-fidelity sensors and a robust data management infrastructure.

“For SPC to be effective you need tight control limits,” adds Dover. “A common trap in the world of SPC is to keep adding control charts (by adding new signals or statistics) during a process ramp, or maybe inheriting old practices from prior nodes without validating their relevance. The result can be millions of control charts running in parallel. It is not a stretch to state that if you are managing a million control charts you are not really controlling much, as it is humanly impossible to synthesize and react to a million control charts on a daily basis.”

This is where AI/ML becomes invaluable, because it can monitor the performance and sustainability of the new equipment more efficiently than traditional methods. By analyzing data from the new machinery, AI/ML can confirm observations, such as reduced accumulation, allowing for adjustments to preventive maintenance schedules that differ from older equipment. This capability not only helps in maintaining the new equipment more effectively but also in optimizing the manufacturing process to take full advantage of the technological upgrades.

AI/ML also facilitate a smoother transition when integrating new equipment, particularly in scenarios involving ‘copy exact’ processes where the goal is to replicate production conditions across different equipment setups. AI and ML can analyze the specific outputs and performance variations of the new equipment compared to the established systems, reducing the time and effort required to achieve optimal settings while ensuring that the new machinery enhances production without compromising the quality and reliability of the legacy chips being produced.

AI/ML
Being more proactive in identifying drift and adjusting parameters in real-time is a necessity. With a very accurate model of the process, you can tune your recipe to minimize that variability and improve both quality and yield.

“The ability to quickly visualize a month’s worth of data in seconds, and be able to look at windows of time, is a huge cost savings because it’s a lot more involved to get data for the technicians or their process engineers to try and figure out what’s wrong,” said Park. “AI/ML has a twofold effect, where you have fewer false alarms, and just fewer alarms in general. So you’re not wasting time looking at things that you shouldn’t have to look at in the first place. But when you do find issues, AI/ML can help you get to the root cause in the diagnostics associated with that much more quickly.”

When there is a real alert, AI/ML offers the ability to correlate multiple parameters and inputs that are driving that alert.

“Traditional process control systems monitor each parameter separately or perform multivariate analysis for key parameters that require significant effort from fab engineers,” adds Jain. “With the amount of fab data scaling exponentially, it is becoming humanly impossible to extract all the actionable insights from the data. Machine learning and artificial intelligence can handle big data generated within a fab to provide effective process control with minimal oversight.”

AI/ML also can look for more other ways of predicting when the drift is going to take your process out of specification. Those correlations can be bivariate and multivariate, as well as univariate. And a machine learning engine that is able to sift through tremendous amounts of data and a larger number of variables than most humans also can turn up some interesting correlations.

“Another benefit of AI/ML is troubleshooting when something does trigger an alarm or alert,” adds Park. “You’ve got SPC and FDC that people are using, and a lot of them have false positives, or false alerts. In some cases, it’s as high as 40% of the alerts that you get are not relevant for what you’re doing. This is where AI/ML becomes vital. It’s never going to take false alerts to zero, but it can significantly reduce the amount of false alerts that you have.”

Engaging with these modern drift solutions, such as AI/ML-based systems, is not mere adherence to industry trends but an essential step towards sustainable semiconductor production. Going beyond the mere mitigation of process drift, these technologies empower manufacturers to optimize operations and maintain the consistency of critical dimensions, allowed by the intelligent analysis of extensive data and automation of complex control processes.

Conclusion
Monitoring process drift is essential for maintaining quality of the device being manufactured, but it also can ensure that the entire fabrication lifecycle operates at peak efficiency. Detecting and managing process drift is a significant challenge in volume production because these variables can be subtle and may compound over time. This makes identifying the root cause of any drift difficult, particularly when measurements are only taken at the end of the production process.

Combating these challenges requires a vigilant approach to process control, regular equipment servicing, and the implementation of AI/ML algorithms that can assist in predicting and correcting for drift. In addition, fostering a culture of continuous improvement and technological adaptation is crucial. Manufacturers must embrace a mindset that prioritizes not only reactive measures, but also proactive strategies to anticipate and mitigate process drift before it affects the production line. This includes training personnel to handle new technologies effectively and to understand the dynamics of process control deeply. Such education enables staff to better recognize early signs of drift and respond swiftly and accurately.

Moreover, the integration of comprehensive data analytics platforms can revolutionize how fabs monitor and analyze the vast amounts of data they generate. These platforms can aggregate data from multiple sources, providing a holistic view of the manufacturing process that is not possible with isolated measurements. With these insights, engineers can refine their process models, enhance predictive maintenance schedules, and optimize the entire production flow to reduce waste and improve yields.

Related Reading
Tackling Variability With AI-Based Process Control
How AI in advanced process control reduces equipment variability and corrects for process drift.

The post Predicting And Preventing Process Drift appeared first on Semiconductor Engineering.

Tackling Variability With AI-based Process Control

Jon Herlocker, co-founder and CEO of Tignis, sat down with Semiconductor Engineering to talk about how AI in advanced process control reduces equipment variability and corrects for process drift. What follows are excerpts of that conversation.

SE: How is AI being used in semiconductor manufacturing and what will the impact be?

Herlocker: AI is going to create a completely different factory. The real change is going to happen when AI gets integrated, from the design side all the way through the manufacturing side. We are just starting to see the beginnings of this integration right now. One of the biggest challenges in the semiconductor industry is it can take years from the time an engineer designs a new device to that device reaching high-volume production. Machine learning is going to cut that to half, or even a quarter. The AI technology that Tignis offers today accelerates that very last step — high-volume manufacturing. Our customers want to know how to tune their tools so that every time they process a wafer the process is in control. Traditionally, device makers get the hardware that meets their specifications from the equipment manufacturer, and then the fab team gets their process recipes working. Depending on the size of the fab, they try to physically replicate that process in a ‘copy exact’ manner, which can take a lot of time and effort. But now device makers can use machine learning (ML) models to autonomously compensate for the differences in equipment variation to produce the exact same outcome, but with significantly less effort by process engineers and equipment technicians.

SE: How is this typically done?

Herlocker: A classic APC system on the floor today might model three input parameters using linear models. But if you need to model 20 or 30 parameters, these linear models don’t work very well. With AI controllers and non-linear models, customers can ingest all of their rich sensor data that shows what is happening in the chamber, and optimally modulate the recipe settings to ensure that the outcome is on-target. AI tools such as our PAICe Maker solution can control any complex process with a greater degree of precision.

SE: So, the adjustments AI process control software makes is to tweak inputs to provide consistent outputs?

Herlocker: Yes, I preach this all the time. By letting AI automate the tasks that were traditionally very manual and time-consuming, engineers and technicians in the fab can remove a lot of the manual precision tasks they needed to do to control their equipment, significantly reducing module operating costs. AI algorithms also can help identify integration issues — interacting effects between tools that are causing variability. We look at process control from two angles. Software can autonomously control the tool by modulating the recipe parameters in response to sensor readings and metrology. But your autonomous control cannot control the process if your equipment is not doing what it is supposed to do, so we developed a separate AI learning platform that ensures equipment is performing to specification. It brings together all the different data silos across the fab – the FDC trace data, metrology data, test data, equipment data, and maintenance data. The aggregation of all that data is critical to understanding the causes of a variation in equipment. This is where ML algorithms can automatically sift through massive amount of data to help process engineers and data scientists determine what parameters are most influencing their process outcomes.

SE: Which process tools benefit the most from AI modeling of advanced process control?

Herlocker: We see the most interest in thin film deposition tools. The physics involved in plasma etching and plasma-enhanced CVD are non-linear processes. That is why you can get much better control with ML modeling. You also can model how the process and equipment evolves over time. For example, every time you run a batch through the PECVD chamber you get some amount of material accumulation on the chamber walls, and that changes the physics and chemistry of the process. AI can build a predictive model of that chamber. In addition to reacting to what it sees in the chamber, it also can predict what the chamber is going to look like for the next run, and now the ML model can tweak the input parameters before you even see the feedback.

SE: How do engineers react to the idea that the AI will be shifting the tool recipe?

Herlocker: That is a good question. Depending on the customer, they have different levels of comfort about how frequently things should change, and how much human oversight there needs to be for that change. We have seen everything from, ‘Just make a recommendation and one of our engineers will decide whether or not to accept that recommendation,’ to adjusting the recipe once a day, to autonomously adjusting for every run. The whole idea behind these adjustments is for variability reduction and drift management, and customers weigh the targeted results versus the perceived risk of taking a novel approach.

SE: Does this involve building confidence in AI-based approaches?

Herlocker: Absolutely, and our systems have a large number of fail-safes, and some limits are hard-coded. We have people with PhDs in chemical engineering and material science who have operated these tools for years. These experts understand the physics of what is happening in these tools, and they have the practical experience to know what level of change can be expected or not.

SE: How much of your modeling is physics-based?

Herlocker: In the beginning, all of our modeling was physics-based, because we were working with equipment makers on their next-generation tools. But now we are also bringing our technology to device makers, where we can also deliver a lot of value by squeezing the most juice out of a data-driven approach. The main challenge with physics models is they are usually IP-protected. When we work with equipment makers, they typically pay us to build those physics-based models so they cannot be shared with other customers.

SE: So are your target customers the toolmakers or the fabs?

Herlocker: They are both our target customers. Most of our sales and marketing efforts are focused on device makers with legacy fabs. In most cases, the fab manager has us engage with their team members to do an assessment. Frequently, that team includes a cross section of automation, process, and equipment teams. The automation team is most interested in reducing the time to detect some sort of deviation that is going to cause yield loss, scrap, or tool downtime. The process and equipment engineers are interested in reducing variability or controlling drift, which also increases chamber life.

For example, let’s consider a PECVD tool. As I mentioned, every time you run the process, byproducts such as polymer materials build up on the chamber walls. You want a thickness of x in your deposition, but you are getting a slightly different wafer thickness uniformity due to drift of that chamber because of plasma confinement changes. Eventually, you must shut down the tool, wet clean the chamber, replace the preventive maintenance kit parts, and send them through the cleaning loop (i.e., to the cleaning vendor shop). Then you need to season the chamber and bring it back online. By controlling the process better, the PECVD team does not have to vent the chamber as often to clean parts. Just a 5% increase in chamber life can be quite significant from a maintenance cost reduction perspective (e.g., parts spend, refurb spend, cleaning spend, etc.). Reducing variability has a similarly large impact, particularly if it is a bottleneck tool, because then that reduction directly contributes to higher or more stable yields via more ‘sweet spot’ processing time, and sometimes better wafer throughput due to the longer chamber lifetime. The ROI story is more nuanced on non-bottleneck tools because they don’t modulate fab revenue, but the ROI there is still there. It is just more about chamber life stability.

SE: Where does this go next?

Herlocker: We also are working with OEMs on next-generation toolsets. Using AI/ML as the core of process control enables equipment makers to control processes that are impossible to implement with existing control strategies and software. For example, imagine on each process step there are a million different parameters that you can control. Further imagine that changing any one parameter has a global effect on all the other parameters, and only by co-varying all the million parameters in just the right way will you get the ideal outcome. And to further complicate things, toss in run-to-run variance, so that the right solution continues to change over time. And then there is the need to do this more than 200 times per hour to support high-volume manufacturing. AI/ML enables this kind of process control, which in turn will enable a step function increase in the ability to produce more complex devices more reliably.

SE: What additional changes do you see from AI-based algorithms?

Herlocker: Machine learning will dramatically improve the agility and productivity of the facility broadly. For example, process engineers will spend less time chasing issues and have more time to implement continuous improvement. Maintenance engineers will have time to do more preventive maintenance. Agility and resiliency — the ability to rapidly adjust to or maintain operations, despite disturbances in the factory or market — will increase. If you look at ML combined with upcoming generative AI capabilities, within a year or two we are going to have agents that effectively will understand many aspects of how equipment or a process works. These agents will make good engineers great, and enable better capture, aggregation, and transfer of manufacturing knowledge. In fact, we have some early examples of this running in our labs. These ML agents capture and ingest knowledge very quickly. So when it comes to implementing the vision of smart factories, machine learning automation will have a massive impact on manufacturing in the future.

The post Tackling Variability With AI-based Process Control appeared first on Semiconductor Engineering.

❌