FreshRSS

Zobrazení pro čtení

Jsou dostupné nové články, klikněte pro obnovení stránky.

AI/ML’s Role In Design And Test Expands

The role of AI and ML in test keeps growing, providing significant time and money savings that often exceed initial expectations. But it doesn’t work in all cases, sometimes even disrupting well-tested process flows with questionable return on investment.

One of the big attractions of AI is its ability to apply analytics to large data sets that are otherwise limited by human capabilities. In the critical design-to-test realm, AI can address problems such as tool incompatibilities between the design set-up, simulation, and ATE test program, which typically slows debugging and development efforts. Some of the most time-consuming and costly aspects of design-to-test arise from incompatibilities between tools.

“During device bring-up and debug, complex software/hardware interactions can expose the need for domain knowledge from multiple teams or stakeholders, who may not be familiar with each other’s tools,” said Richard Fanning, lead software engineer at Teradyne. “Any time spent doing conversions or debugging differences in these set-ups is time wasted. Our toolset targets this exact problem by allowing all set-ups to use the same set of source files so everyone can be sure they are running the same thing.”

ML/AI can help keep design teams on track, as well. “As we drive down this technology curve, the analytics and the compute infrastructure that we have to bring to bear becomes increasingly more complex and you want to be able to make the right decision with a minimal amount of overkill,” said Ken Butler, senior director of business development in the ACS data analytics platform group at Advantest. “In some cases, we are customizing the test solution on a die-by-die type of basis.”

But despite the hype, not all tools work well in every circumstance. “AI has some great capabilities, but it’s really just a tool,” said Ron Press, senior director of technology enablement at Siemens Digital Industries Software, in a recent presentation at a MEPTEC event. “We still need engineering innovation. So sometimes people write about how AI is going to take away everybody’s job. I don’t see that at all. We have more complex designs and scaling in our designs. We need to get the same work done even faster by using AI as a tool to get us there.”

Speeding design to characterization to first silicon
In the face of ever-shrinking process windows and the lowest allowable defectivity rates, chipmakers continually are improving the design-to-test processes to ensure maximum efficiency during device bring-up and into high volume manufacturing. “Analytics in test operations is not a new thing. This industry has a history of analyzing test data and making product decisions for more than 30 years,” said Advantest’s Butler. “What is different now is that we’re moving to increasingly smaller geometries, advanced packaging technologies and chiplet-based designs. And that’s driving us to change the nature of the type of analytics that we do, both in terms of the software and the hardware infrastructure. But from a production test viewpoint, we’re still kind of in the early days of our journey with AI and test.”

Nonetheless, early adopters are building out the infrastructure needed for in-line compute and AI/ML modeling to support real-time inferencing in test cells. And because no one company has all the expertise needed in-house, partnerships and libraries of applications are being developed with tool-to-tool compatibility in mind.

“Protocol libraries provide out-of-the-box solutions for communicating common protocols. This reduces the development and debug effort for device communication,” said Teradyne’s Fanning. “We have seen situations where a test engineer has been tasked with talking to a new protocol interface, and saved significant time using this feature.”

In fact, data compatibility is a consistent theme, from design all the way through to the latest developments in ATE hardware and software. “Using the same test sequences between characterization and production has become key as the device complexity has increased exponentially,” explained Teradyne’s Fanning. “Partnerships with EDA tool and IP vendors is also key. We have worked extensively with industry leaders to ensure that the libraries and test files they output are formats our system can utilize directly. These tools also have device knowledge that our toolset does not. This is why the remote connect feature is key, because our partners can provide context-specific tools that are powerful during production debug. Being able to use these tools real-time without having to reproduce a setup or use case in a different environment has been a game changer.”

Serial scan test
But if it seems as if all the configuration changes are happening on the test side, it’s important to take stock of substantial changes on the approach to multi-core design for test.

Tradeoffs during the iterative process of design for test (DFT) have become so substantial in the case of multi-core products that a new approach has become necessary.

“If we look at the way a design is typically put together today, you have multiple cores that are going to be produced at different times,” said Siemens’ Press. “You need to have an idea of how many I/O pins you need to get your scan channels, the deep serial memory from the tester that’s going to be feeding through your I/O pins to this core. So I have a bunch of variables I need to trade off. I have the number of pins going to the core, the pattern size, and the complexity of the core. Then I’ll try to figure out what’s the best combination of cores to test together in what is called hierarchical DFT. But as these designs get more complex, with upwards of 2,500 cores, that’s a lot of tradeoffs to figure out.”

Press noted that applying AI with the same architecture can provide a 20% to 30% higher efficiency, but an improved methodology based on packetized scan test (see figure 1) actually makes more sense.


Fig. 1: Advantages to the serial scan network (SSN) approach. Source: Siemens

“Instead of having tester channels feeding into the scan channels that go to each core, you have a packetized bus and packets of data that feed through all the cores. Then you instruct the cores when their packet information is going to be available. By doing this, you don’t have as many variables you need to trade off,” he said. At the core level, each core can be optimized for any number of scan channels and patterns, and the I/O pin count is no longer a variable in the calculation. “Then, when you put it into this final chip, it deliver from the packets the amount of data you need for that core, that can work with any size serial bus, in what is called a serial scan network (SSN).”

Some of the results reported by Siemens EDA customers (see figure 2) highlight both supervised and unsupervised machine learning implementation for improvements in diagnosis resolution and failure analysis. DFT productivity was boosted by 5 to 10X using the serial scan network methodology.


Fig. 2: Realized benefits using machine learning and the serial scan network approach. Source: Siemens

What slows down AI implementation in HVM?
In the transition from design to testing of a device, the application of machine learning algorithms can enable a number of advantages, from better pairing of chiplet performance for use in an advanced package to test time reduction. For example, only a subset of high-performing devices may require burn-in.

“You can identify scratches on wafers, and then bin out the dies surrounding those scratches automatically within wafer sort,” said Michael Schuldenfrei, fellow at NI/Emerson Test & Measurement. “So AI and ML all sounds like a really great idea, and there are many applications where it makes sense to use AI. The big question is, why isn’t it really happening frequently and at-scale? The answer to that goes into the complexity of building and deploying these solutions.”

Schuldenfrei summarized four key steps in ML’s lifecycle, each with its own challenges. In the first phase, the training, engineering teams use data to understand a particular issue and then build a model that can be used to predict an outcome associated with that issue. Once the model is validated and the team wants to deploy it in the production environment, it needs to be integrated with the existing equipment, such as a tester or manufacturing execution system (MES). Models also mature and evolve over time, requiring frequent validation of the data going into the model and checking to see that the model is functioning as expected. Models also must adapt, requiring redeployment, learning, acting, validating and adapting, in a continuous circle.

“That eats up a lot of time for the data scientists who are charged with deploying all these new AI-based solutions in their organizations. Time is also wasted in the beginning when they are trying to access the right data, organizing it, connecting it all together, making sense of it, and extracting features from it that actually make sense,” said Schuldenfrei.

Further difficulties are introduced in a distributed semiconductor manufacturing environment in which many different test houses are situated in various locations around the globe. “By the time you finish implementing the ML solution, your model is stale and your product is probably no longer bleeding edge so it has lost its actionability, when the model needs to make a decision that actually impacts either the binning or the processing of that particular device,” said Schuldenfrei. “So actually deploying ML-based solutions in a production environment with high-volume semiconductor test is very far from trivial.”

He cited a 2014 Google article that stated how the ML code development part of the process is both the smallest and easiest part of the whole exercise, [1] whereas the various aspects of building infrastructure, data collection, feature extraction, data verification, and managing model deployments are the most challenging parts.

Changes from design through test ripple through the ecosystem. “People who work in EDA put lots of effort into design rule checking (DRC), meaning we’re checking that the work we’ve done and the design structure are safe to move forward because we didn’t mess anything up in the process,” said Siemens’ Press. “That’s really important with AI — what we call verifiability. If we have some type of AI running and giving us a result, we have to make sure that result is safe. This really affects the people doing the design, the DFT group and the people in test engineering that have to take these patterns and apply them.”

There are a multitude of ML-based applications for improving test operations. Advantest’s Butler highlighted some of the apps customers are pursuing most often, including search time reduction, shift left testing, test time reduction, and chiplet pairing (see figure 3).

“For minimum voltage, maximum frequency, or trim tests, you tend to set a lower limit and an upper limit for your search, and then you’re going to search across there in order to be able to find your minimum voltage for this particular device,” he said. “Those limits are set based on process split, and they may be fairly wide. But if you have analytics that you can bring to bear, then the AI- or ML-type techniques can basically tell you where this die lies on the process spectrum. Perhaps it was fed forward from an earlier insertion, and perhaps you combine it with what you’re doing at the current insertion. That kind of inference can help you narrow the search limits and speed up that test. A lot of people are very interested in this application, and some folks are doing it in production to reduce search time for test time-intensive tests.”


Fig. 3: Opportunities for real-time and/or post-test improvements to pair or bin devices, improve yield, throughput, reliability or cost using the ACS platform. Source: Advantest

“The idea behind shift left is perhaps I have a very expensive test insertion downstream or a high package cost,” Butler said. “If my yield is not where I want it to be, then I can use analytics at earlier insertions to be able to try to predict which devices are likely to fail at the later insertion by doing analysis at an earlier insertion, and then downgrade or scrap those die in order to optimize downstream test insertions, raising the yield and lowering overall cost. Test time reduction is very simply the addition or removal of test content, skipping tests to reduce cost. Or you might want to add test content for yield improvement,” said Butler.

“If I have a multi-tiered device, and it’s not going to pass bin 1 criteria – but maybe it’s bin 2 if I add some additional content — then people may be looking at analytics to try to make those decisions. Finally, two things go together in my mind, this idea of chiplet designs and smart pairing. So the classic example is a processor die with a stack of high bandwidth memory on top of it. Perhaps I’m interested in high performance in some applications and low power in others. I want to be able to match the content and classify die as they’re coming through the test operation, and then downstream do pick-and-place and put them together in such a way that I maximize the yield for multiple streams of data. Similar kinds of things apply for achieving a low power footprint and carbon footprint.”

Generative AI
The question that inevitably comes up when discussing the role of AI in semiconductors is whether or not large language models like ChatGPT can prove useful to engineers working in fabs. Early work shows some promise.

“For example, you can ask the system to build an outlier detection model for you that looks for parts that are five sigma away from the center line, saying ‘Please create the script for me,’ and the system will create the script. These are the kinds of automated, generative AI-based solutions that we’re already playing with,” says Schuldenfrei. “But from everything I’ve seen so far, there is still quite a lot of work to be done to get these systems to provide outputs with high enough quality. At the moment, the amount of human interaction that is needed afterward to fix problems with the algorithms or models that generative AI is producing is still quite significant.”

A lingering question is how to access the test programs needed to train the new test programs when everyone is protecting important test IP? “Most people value their test IP and don’t necessarily want to set up guardrails around the training and utilization processes,” Butler said. “So finding a way to accelerate the overall process of developing test programs while protecting IP is the challenge. It’s clear this kind of technology is going to be brought to bear, just like we already see in the software development process.”

Failure analysis
Failure analysis is typically a costly and time-consuming endeavor for fabs because it requires a trip back in time to gather wafer processing, assembly, and packaging data specific to a particular failed device, known as a returned material authorization (RMA). Physical failure analysis is performed in an FA lab, using a variety of tools to trace the root cause of the failure.

While scan diagnostic data has been used for decades, a newer approach involves pairing a digital twin with scan diagnostics data to find the root cause of failures.

“Within test, we have a digital twin that does root cause deconvolution based on scan failure diagnosis. So instead of having to look at the physical device and spend time trying to figure out the root cause, since we have scan, we have millions and millions of virtual sample points,” said Siemens’ Press. “We can reverse-engineer what we did to create the patterns and figure out where the mis-compare happened within the scan cells deep within the design. Using YieldInsight and unsupervised machine learning with training on a bunch of data, we can very quickly pinpoint the fail locations. This allows us to run thousands, or tens of thousands fail diagnoses in a short period of time, giving us the opportunity to identify the systematic yield limiters.”

Yet another approach that is gaining steam is using on-die monitors to access specific performance information in lieu of physical FA. “What is needed is deep data from inside the package to monitor performance and reliability continuously, which is what we provide,” said Alex Burlak, vice president of test and analytics at proteanTecs. “For example, if the suspected failure is from the chiplet interconnect, we can help the analysis using deep data coming from on-chip agents instead of taking the device out of context and into the lab (where you may or may not be able to reproduce the problem). Even more, the ability to send back data and not the device can in many cases pinpoint the problem, saving the expensive RMA and failure analysis procedure.”

Conclusion
The enthusiasm around AI and machine learning is being met by robust infrastructure changes in the ATE community to accommodate the need for real-time inferencing of test data and test optimization for higher yield, higher throughput, and chiplet classifications for multi-chiplet packages. For multi-core designs, packetized test, commercialized as an SSN methodology, provides a more flexible approach to optimizing each core for the number of scan chains, patterns and bus width needs of each core in a device.

The number of testing applications that can benefit from AI continues to rise, including test time reduction, Vmin/Fmax search reduction, shift left, smart pairing of chiplets, and overall power reduction. New developments like identical source files for all setups across design, characterization, and test help speed the critical debug and development stage for new products.

Reference

  1. https://proceedings.neurips.cc/paper_files/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf

The post AI/ML’s Role In Design And Test Expands appeared first on Semiconductor Engineering.

Controlling Warpage In Advanced Packages

Warpage is becoming a serious concern in advanced packaging, where a heterogeneous mix of materials can cause uneven stress points during assembly and packaging, and under real workloads in the field.

Warpage plays a critical role in determining whether an advanced package can be assembled successfully and meet long-term reliability targets. New advances, such as molding compounds with improved thermal properties, advanced modeling techniques, and creative architectures involving two molding steps are enabling greater control over package warpage, while also providing more flexibility to optimize a robust multi-chiplet system.

Warpage is the inevitable result of the mismatch in coefficients of thermal expansion (CTEs) between the silicon chip, molding compound, copper, polyimide, and other materials. It changes throughout the assembly process, and can cause cracking or delamination failures. The most vulnerable spots include low-k cores, which are subject to cracking and shorts, or non-wet failures in micro-bumps.

“One thing that’s very hot these days is the discussion around warpage and stress of the package,” said Kenneth Larsen, senior director of product management at Synopsys. “This is not only when you’re going through the manufacturing process, where you change temperatures. That can cause warpage. But it’s also when the device you’re building needs to be inserted into a socket. You can have issues around warpage there, as well.”

Even when warpage is effectively addressed during assembly and packaging, a device still may warp under heavy usage in the field. This is particularly true with heterogeneous designs, where chiplets are developed using different materials or processes, and where logic is concentrated in specific areas of an asymmetrical package.

The transition to multi-chiplet packaging is accelerating rapidly due to demands for ever-higher processing speeds and low latency, especially in mobile, automotive and high-performance compute/AI applications. Engineers increasingly are turning to modeling and simulation to understand temperature-dependent warpage, which can vary depending on die thickness, mold-to-silicon ratio, and substrate type. Organic substrates are very attractive because they are inexpensive and can be customized to any size, but they are much more flexible and susceptible to warpage than silicon substrates.

All these considerations point to the need for thermal and structural models of complex heterogeneous assemblies and packages. “Advanced modeling allows companies to simulate the behavior of different materials, thermal dynamics, and mechanical stresses during the assembly process,” said Mike Kelly, vice president of chiplets/FCBGA integration at Amkor. “Through this virtual experimentation, one can predict and mitigate potential challenges, ensuring that the final product meets stringent quality and reliability standards.”

How warpage happens
The assembly process includes multiple heating and cooling steps, which induce a certain amount of deformation between adjacent materials with different thermal and mechanical properties. In advanced packaging, warpage in the 100 micron range is not unheard of.

One of the reasons warpage is such a problem today is the large size of chiplets and the very tight process windows for chiplets, redistribution layers (RDLs), substrates, and bumps of various sizes. The relative expansion and contraction of neighboring materials depends on differences in the material’s CTE, which spells out the increase in size with each degree change of temperature (ppm/°C).

“Chiplets are typically relatively large die,” said Dick Otte, CEO of Promex Industries. “In the iPad, it’s 20 x 30 millimeters, with as many as 10,000 I/Os — usually copper pillar. Just simply taking a single die and putting it down on a substrate can be quite a challenge because the pitches are so small. So what’s critical for these assemblies is controlling warpage and planarity. It needs to stay planar through the whole reflow solder process to bridge that gap between the copper pillar and the contact on the circuit board without warping.”

Warpage can either happen upward, bending at the edges (smiling), or downward (crying), depending on the relative CTEs of the materials in the stack. Silicon, for example, is 2.8; copper is 17; FR4 PCB is 14 to 17 ppm/°C. The worst CTE mismatch is between a silicon interposer and an organic substrate.

It helps to envision stacks in packaging as groups of materials. “You have to look at the CTE of the materials and their reaction at temperatures, so you’ve got relatively low expansion copper on the top and solder at the bottom,” Otte said. “They’re kind of equal with a high expansion dielectric in the middle, so that when you heat this thing up, it kind of expands by the same amount. If you just put all the copper on the top, that thing is going to warp toward the copper side when you heat it up. Copper is 15 ppm per degree C. The organics are more like twice that, at 25 to 30 ppm/°C.

Other key metrics are the modulus, or the elasticity of a material, and the glass transition temperature (Tg), the temperature at which a material begins to flow. These values are related, too. For example, when it comes to the thermal behavior of polymers like epoxy molding compound (EMC), the modulus tends to plummet above its glass transition temperature. That happens because polymer chains tend to slide freely in the liquid state, whereas they are stiffer in a solid form.

In addition to solder reflow, warpage tends to occur at the post-molding curing step. Hung-Chun Yang and colleagues at ASE recently determined that die thickness substantially influences warpage levels measured at multiple steps in an existing process for chip-first fan-out chip on substrate package. [1] They noted that “severe wafer warpage occurred after curing, resulting in misalignment and difficulty in handling in the subsequent process.” To reduce package warpage, the team replaced a metal carrier/thin film approach with a glass carrier. The team also determined that a 3D finite element method (FEM) captures the warpage behavior and agreed well with actual test vehicle data.


Fig. 1: The glass carrier in the improved flow (right) induced less warpage than the original flow. Increasing the die thickness also dramatically reduced warpage. Source: ASE

The chip-first process begins with probing the fabricated wafers, thinning and then electroplating copper studs prior to sawing and placement of known good die in two schemes. The initial process used a metal carrier that is removed after molding and replaced with a thin film. The improved process uses a glass carrier that remained through molding, curing, mold grinding, RDL, and copper pillar processes, and then wass de-bonded.

Warpage reaches its maximum level during post-mold curing, and it changes most dramatically at the curing step and after glass carrier debonding. The glass carrier flow reduces warpage overall. In addition, the ASE engineers determined they can reduce warpage an additional 35% by increasing the wafer thickness from 0.54mm to 0.7mm.

A second strategy for reducing warpage involves using EMCs with different thermal properties, especially when the process calls for two molding steps. Amkor engineers recently evaluated the reliability performance of two high-performance multi-chiplet packages by modeling and fabricating two high-performance test vehicles. One used a module approximately the size of one reticle, containing 1 ASIC, 2 HBMs and 2 bridge die (33 x 26mm). The second module was 3 reticles in size, with 2 ASICs, 8 HBMs and 10 bridge dies (54 x 46mm). [2] Heejun Jang and colleagues at Amkor Technology Korea carried out modeling and simulation using the Ansys Parametric Design Language (APDL) version 16.1 simulator and compared results with test vehicles containing dummy dies.

Amkor’s die-last S-Connect process starts with a carrier wafer, on which copper studs for the bridge die and copper pillars are fabricated (see figure 2). The integrated passives and bridge die are embedded in the first mold, which is cured and then ground back. RDL is deposited on the mold and solder capture pads and dies attached to the pads using micro-bumps. Then, the solder is reflowed and underfilled. The second mold around the face-up die is cured and ground back, followed by C4 bumping on the bottom for flip-chip connect to the substrate. The simulation analyzes warpage with 9 combinations of 3 different EMCs with high, medium, and low CTEs (7 to 12 ppm below Tg, 22 to 46 ppm above Tg) and high-to-low glass transition temperatures (145°C to 175°C). [2]


Fig. 2: Process flow for S-Connect Package. Source: Amkor

Warpage as a function of EMC choice showed all materials followed the same smile pattern at room temperature, and cry pattern at high temperature (250°C). The EMCs with the lower CTEs caused less warpage. And in cases where the mold occupies more area relative to chip area, the warpage level is more pronounced. More importantly, the warpage levels were roughly 50% higher for 450µm die relative to 650µm-thick die. Interestingly, the thicker silicon die was 3X more effective in controlling warpage relative to EMC material selection on overall module warpage, so die thickness is the biggest lever in reducing warpage in cases where it can be increased.

Reliability testing is paramount once the package configuration is chosen. Amkor ran its advanced packaging test vehicles through moisture resistance testing, highly accelerated stress testing, thermal cycling condition B, and high temperature storage tests. These are needed to root out infant mortality issues, and cross-sectional analysis can reveal any cracks or latent defects that could precipitate into failures in field use.

While the above example may constitute a large multi-chiplet package today, package sizes are growing larger still, which means even more attention to warpage will be needed. More and more this will drive assembly lines toward digital twin or virtual representations to enable process and package optimization.

“By creating virtual representations of the semiconductor assembly line, one can identify potential areas of concern and optimize control strategies,” said Amkor’s Kelly. “Virtual fabrication in package assembly enables companies to assess the impact of design changes on manufacturing processes before physical prototypes are even created. This not only accelerates the product development cycle, but also minimizes the risk of costly errors.”

The early identification of potential bottlenecks further shortens cycle times, and enhances overall efficiency.

Conclusion
Going forward, even greater attention to mechanical and thermal properties will be required by teams comprised of designers and packaging engineers. “Tight tolerances in new packaging design require an accurate analysis of mechanical and electrical tolerances during stack up,” said Curtis Zwenger, vice president of engineering and technical marketing at Amkor. “Increasingly higher levels of process capability are required, with common metrics like CpK. Identification of these critical interactions in the design can be accomplished early in process development with this type of modeling. In turn, these analyses guide the investment of advanced process control to ensure process capability is maintained.”

References

  1. C. Yang, et al, “Investigation of Wafer Warpage Evolution Based on Fan-out Chip-first Process,” 2024 International Conference on Electronics Packaging (ICEP), Toyama, Japan, 2024, pp. 151-152, doi: 10.23919/ICEP61562.2024.10535572.
  2. H. Jang et al., “Reliability Performance of S-Connect Module (Bridge Technology) for Heterogeneous Integration Packaging,” 2023 IEEE 73rd Electronic Components and Technology Conference (ECTC), Orlando, FL, USA, 2023, pp. 1027-1031, doi: 10.1109/ECTC51909.2023.00175.

Related Reading
What Works Best For Chiplets
Not all chiplets are interchangeable, and options will be limited.

The post Controlling Warpage In Advanced Packages appeared first on Semiconductor Engineering.

Electromigration Concerns Grow In Advanced Packages

The incessant demand for more speed in chips requires forcing more energy through ever-smaller devices, increasing current density and threatening long-term chip reliability. While this problem is well understood, it’s becoming more difficult to contain in leading-edge designs.

Of particular concern is electromigration, which is becoming more troublesome in advanced packages with multiple chiplets, where various bonding and interconnect schemes create abrupt changes in materials and geometries. For example, electrons may travel from a copper trace to a solder bump of SAC (tin-silver-copper), then to an underbump metal based on nickel, and finally to an interposer copper pad. That, in turn, can cause atoms to shift, resulting in failures in solder joints or in copper redistribution layers in high-density fan-out packages.

“From an electromigration perspective, advanced packaging causes increased packaging density, reduced packaging size, and the dimensions of interconnects to shrink, so the current density is now in close proximity to the maximum current density limit per EM design rules,” said Dermott Lynch, director of technical product management in Synopsys‘ EDA Group.

Any additional stresses the package may be subjected to during assembly and use, whether mechanical or thermal, also can help induce or accelerate electromigration. “Electromigration, in general, gets worse due to temperature and stress, both of which advanced packaging increases,” said Lynch. “Electromigration is also cumulative, so essentially it integrates all the temperature highs and stress over the lifetime until an interconnect breaks down or shorts. Larger processing temperature and operation temperature will make it worse, but it also depends on time under that temperature.”

In fact, managing thermal pathways is perhaps the greatest challenge associated the movement toward the ultimate package, a 3D-IC. “Electromigration is very temperature-sensitive,” said Marc Swinnen, director of product marketing in Ansys’ Semiconductor Division. “Depending on your thermal map, your power integrity will have to adapt to the local temperature profile that you have. So when you look at a chip, you can calculate how much power the chip is putting out, but you cannot tell how hot the chip will get because ‘it depends.’ Is it sitting on a cold plate or sitting in the sun in the Sahara? System concerns come in, and multi-physics modeling is important to understanding these co-dependent effects.”

Thermal engineering also means moving heat away from the most vulnerable points of failure, such as solder bumps. “Effective thermal management is essential for bump reliability,” said Curtis Zwenger, vice president of engineering and technical marketing at Amkor. “Engineers are incorporating thermal enhancement techniques, such as the use of thermal interface materials and advanced heat dissipation solutions, to ensure that bumps are not subjected to excessive temperature-related stresses.”

Zwenger noted that engineers are looking into new materials, while optimizing the use of existing materials to minimize the possibility of electromigration. “Semiconductor packaging engineers are implementing a range of measures to enhance bump reliability and maximize bump yield. These strategies include new materials for solder bumps and underbump metallization, optimizing bump size, pitch and shape for reliability, advanced process control methods to control variability and maximize yield, and simulating and modeling reliability.”

What is electromigration?
Electromigration is the mass transport of metal atoms caused by the electron wind from current flowing through a conductor, typically copper. When current density is high enough, metal will diffuse in the direction of current flow, creating tiny hillocks downstream and leaving behind vacancies or voids. With enough electromigration, failures occur due to severe line thinning, causing opens, or due to hillocks that bridge adjacent lines, causing short circuits.

Electromigration is a diffusion-controlled mechanism that can take three forms — bulk, grain boundary, or surface diffusion, depending on the metal. Aluminum migrates by grain boundary diffusion whereas copper migrates on the surface or at its grain boundaries.

For most of the semiconductor industry’s history, electromigration was primarily an on-chip concern, but on-chip EM is largely under control by reliability engineers. But with the scaling and rapid developments in advanced packaging — implementing TSVs, fan-out packaging with redistribution layers, and copper pillar bumps — electromigration has emerged as a major threat at the package level. Current flowing through the solder bump causes joule heating, and heat from other parts of the package may also dissipate through the solder bumps. EM can become an issue for solder joint connections between chip and interposer, or chip and PCB, as well as in RDLs. Solder joint failures typically manifest as voids or cracks.

Fig. 1: Electromigration can create short circuits between two interconnects through the development of hillocks, or an open circuit through the creation of voids in interconnect. Source: Ansys

Fig. 1: Electromigration can create short circuits between two interconnects through the development of hillocks, or an open circuit through the creation of voids in interconnect. Source: Ansys

Electromigration progresses more quickly at higher temperatures, at higher currents, under greater mechanical stress and in the presence of defects or impurities in the metal. Black’s equation describes an interconnect’s mean time-to-failure with respect to its temperature, current density and the activation energy needed to dislodge a metal atom as:

Black's equation

J is the current density, k is Boltzmann’s constant, T is temperature, Ea is the activation energy, and N is a scaling factor that depends on the metal’s properties. Black’s equation is useful because it easily shows how shorter, wider interconnects will tend to have longer MTTF. In addition, electromigration time-to-failure very strongly depends on the interconnect’s temperature. That temperature is primarily the result of the chip’s environmental temperature, self-heating of the conductor caused by current flow, the heat from neighboring interconnects or transistors, and the thermal conductivity of the surrounding material.

It is also important to note that electromigration is a runaway process. As current density and/or temperature increases, electromigration increases, which raises current density, causing more metal to migrate in a destructive feedback loop.

EM failure modes and allowable current density
In the case of copper redistribution layers in polyimide material, as current flows through the RDL, heat accumulates in the conductor due to Joule heating generation, which can degrade performance. As the required current density and Joule heating temperature is increasing in the fine-line Cu RDL structures (<5nm lines and spaces), self-heating is considered a key factor in the reliability of high-density fan out packages.

JiHye Kwon, senior manager of R&D at Amkor, recently used EM testing and Black’s equation to determine the electromigration failure mechanisms for a given RDL stack and high-density fan-out package with 2µm or 10µm wide RDL layers, 1,000µm long. [1]

High density fan-out is an emerging technology, as it features more aggressive scaling than wafer level fan-out packages. The three layers of copper RDL (3µm thick with Ta/Cu seed) were fabricated followed by polyimide fill, copper pillar deposition, die attach, and overmold. Kwon’s team tested both 2 and 10µm RDL at different current densities and temperatures until resistance increased by 100% (EM failure), but the maximum allowed current density corresponded with a 20% resistance increase. The failure modes occurred in two stages, first by void nucleation and growth and second with copper reduction and oxidation. The study yielded Ea and current density exponent values that can be useful in future designs of RDLs.

Meanwhile, a team of researchers from ASE recently demonstrated how susceptibility to electromigration is determined on copper pillar interconnects in flip chip quad flat no-lead (FCQFN) for high-power automotive applications. The multi-layered copper pillar bumps with a Cu/Ni/Sn1.8Ag configuration were bonded to a silver-plated copper leadframe and tested under extreme EM conditions of 10 kA/cm2 current density and temperatures of 150°C, 160°C and 180°C, while taking in-situ resistance measurements. [2] The EM failures corresponded with rapid rises in electrical resistance that corresponded with the formation of intermetallic compounds and voids at the Cu/solder interfaces. The team built an EM prediction model of interconnects based on a Black-type EM equation, following the JEDEC standard with five test conditions.

After the statistic calculation from the lifetime of samples, the ASE team determined activation energy of Cu pillar interconnects in the FCQFN package (1.12 ± 0.03 eV). The maximum current of the Cu pillar interconnects allowable lasting 10 years at a 105°C operating temperature at a 0.1% failure rate was larger than 2A for the FCQFN Cu pillar structure. “The FCQFN package has great potential in terms of its excellent anti-EM performance for future high-power applications,” the article said.

Designing/manufacturing for EM resiliency
Building electromigration resilience into advanced devices begins with using only EM-compliant linewidths in circuit designs based on the current density and heat profile that the interconnects will experience during operation over the lifetime of the device. Electromigration mitigation also requires process and materials engineering to ensure durability, for instance, of copper pillar bumps under BGA packages. It also calls for an optimized assembly process window and tight process control to prevent tiny violations of design rules that can later precipitate as EM failures.

As the industry makes its way toward true 3D packages, and eventually 3D-ICs, it seems clear that modeling and simulation will play an increasing role in determining many of the guard rails for manufacturing and assembly before manufacturing and assembly even begins. “Reliability modeling and simulation tools are being used to better understand the reliability of bump structures. This proactive approach helps in identifying potential issues before they arise, enabling engineers to implement preventive measures,” said Zwenger.

Modeling and simulation at the system level also will be essential to understanding the complex interplay between reliability mechanisms with thermal and mechanical stress in multi-chiplet systems during operation.

“Electromigration for stacked die is challenging,” said Synopsys’ Lynch. “Localized, die-to-die workloads cause repetitive current flow in specific areas. This generates local heat, increasing EM resulting in wire degradation, while producing even more heat. Reducing the thermal issue becomes critical to ensuring EM reliability.”

As stated previously, solder bumps can become a site for EM reliability failure. “Engineers fine-tune bump design in terms of bump size, pitch, and shape to ensure uniformity and reliability across the entire package. This includes the adoption of innovative Cu bump structures for improved mechanical and electrical properties,” said Amkor’s Zwenger.

In flip-chip BGA and other flip-chip applications, underfill materials — typically thermoset epoxies — are used to reduce the thermal stresses on solder bumps. “Underfill materials play a critical role in providing mechanical support and thermal stability to the bumps,” Zwenger said. “Engineers are investing in the development of advanced underfill formulations with enhanced properties, such as improved adhesion, thermal conductivity, and stress relief.”

Conclusion
Because of its dependence on temperature, electromigration is a failure mechanism to watch and plan for as devices continue to scale and systems integrators continue to cram more and more chiplets of various functions into advanced packages.

“In advanced technologies, the current density is now in close proximity to the maximum density,” said Synopsys’ Lynch. “Anything that causes an increase in temperature poses a threat. Designers of multi-die systems need to understand the impact of temperature and design systems to remove the heat.”

References

  1. JiHye Kwon, “Electromigration Performance Of Fine-Line Cu Redistribution Layer (RDL) For HDFO Packaging,” Semiconductor Engineering, Jan. 18, 2024, https://semiengineering.com/electromigration-performance-of-fine-line-cu-redistribution-layer-rdl-for-hdfo-packaging/
  2. -Y. Tsai, et al., “An Electromigration Study of Cu Pillar Interconnects in Flip-chip QFN Packaging under Extreme Conditions for High-power Applications,” 2023 IEEE 25th Electronics Packaging Technology Conference (EPTC), Singapore, 2023, pp. 326-332, doi: 10.1109/EPTC59621.2023.10457564.

Related Reading
What Can Go Wrong In Heterogeneous Integration
Workflows and tools are disconnected, mechanical stress is ill-defined, and complete co-planarity is nearly impossible. But there are solutions on the horizon.
Thermal Integrity Challenges Grow In 2.5D
Work is underway to map heat flows in interposer-based designs, but there’s much more to be done.
Chiplets: 2023 (EBook)
What chiplets are, what they are being used for today, and what they will be used for in the future.

The post Electromigration Concerns Grow In Advanced Packages appeared first on Semiconductor Engineering.

Electromigration Concerns Grow In Advanced Packages

The incessant demand for more speed in chips requires forcing more energy through ever-smaller devices, increasing current density and threatening long-term chip reliability. While this problem is well understood, it’s becoming more difficult to contain in leading-edge designs.

Of particular concern is electromigration, which is becoming more troublesome in advanced packages with multiple chiplets, where various bonding and interconnect schemes create abrupt changes in materials and geometries. For example, electrons may travel from a copper trace to a solder bump of SAC (tin-silver-copper), then to an underbump metal based on nickel, and finally to an interposer copper pad. That, in turn, can cause atoms to shift, resulting in failures in solder joints or in copper redistribution layers in high-density fan-out packages.

“From an electromigration perspective, advanced packaging causes increased packaging density, reduced packaging size, and the dimensions of interconnects to shrink, so the current density is now in close proximity to the maximum current density limit per EM design rules,” said Dermott Lynch, director of technical product management in Synopsys‘ EDA Group.

Any additional stresses the package may be subjected to during assembly and use, whether mechanical or thermal, also can help induce or accelerate electromigration. “Electromigration, in general, gets worse due to temperature and stress, both of which advanced packaging increases,” said Lynch. “Electromigration is also cumulative, so essentially it integrates all the temperature highs and stress over the lifetime until an interconnect breaks down or shorts. Larger processing temperature and operation temperature will make it worse, but it also depends on time under that temperature.”

In fact, managing thermal pathways is perhaps the greatest challenge associated the movement toward the ultimate package, a 3D-IC. “Electromigration is very temperature-sensitive,” said Marc Swinnen, director of product marketing in Ansys’ Semiconductor Division. “Depending on your thermal map, your power integrity will have to adapt to the local temperature profile that you have. So when you look at a chip, you can calculate how much power the chip is putting out, but you cannot tell how hot the chip will get because ‘it depends.’ Is it sitting on a cold plate or sitting in the sun in the Sahara? System concerns come in, and multi-physics modeling is important to understanding these co-dependent effects.”

Thermal engineering also means moving heat away from the most vulnerable points of failure, such as solder bumps. “Effective thermal management is essential for bump reliability,” said Curtis Zwenger, vice president of engineering and technical marketing at Amkor. “Engineers are incorporating thermal enhancement techniques, such as the use of thermal interface materials and advanced heat dissipation solutions, to ensure that bumps are not subjected to excessive temperature-related stresses.”

Zwenger noted that engineers are looking into new materials, while optimizing the use of existing materials to minimize the possibility of electromigration. “Semiconductor packaging engineers are implementing a range of measures to enhance bump reliability and maximize bump yield. These strategies include new materials for solder bumps and underbump metallization, optimizing bump size, pitch and shape for reliability, advanced process control methods to control variability and maximize yield, and simulating and modeling reliability.”

What is electromigration?
Electromigration is the mass transport of metal atoms caused by the electron wind from current flowing through a conductor, typically copper. When current density is high enough, metal will diffuse in the direction of current flow, creating tiny hillocks downstream and leaving behind vacancies or voids. With enough electromigration, failures occur due to severe line thinning, causing opens, or due to hillocks that bridge adjacent lines, causing short circuits.

Electromigration is a diffusion-controlled mechanism that can take three forms — bulk, grain boundary, or surface diffusion, depending on the metal. Aluminum migrates by grain boundary diffusion whereas copper migrates on the surface or at its grain boundaries.

For most of the semiconductor industry’s history, electromigration was primarily an on-chip concern, but on-chip EM is largely under control by reliability engineers. But with the scaling and rapid developments in advanced packaging — implementing TSVs, fan-out packaging with redistribution layers, and copper pillar bumps — electromigration has emerged as a major threat at the package level. Current flowing through the solder bump causes joule heating, and heat from other parts of the package may also dissipate through the solder bumps. EM can become an issue for solder joint connections between chip and interposer, or chip and PCB, as well as in RDLs. Solder joint failures typically manifest as voids or cracks.

Fig. 1: Electromigration can create short circuits between two interconnects through the development of hillocks, or an open circuit through the creation of voids in interconnect. Source: Ansys

Fig. 1: Electromigration can create short circuits between two interconnects through the development of hillocks, or an open circuit through the creation of voids in interconnect. Source: Ansys

Electromigration progresses more quickly at higher temperatures, at higher currents, under greater mechanical stress and in the presence of defects or impurities in the metal. Black’s equation describes an interconnect’s mean time-to-failure with respect to its temperature, current density and the activation energy needed to dislodge a metal atom as:

Black's equation

J is the current density, k is Boltzmann’s constant, T is temperature, Ea is the activation energy, and N is a scaling factor that depends on the metal’s properties. Black’s equation is useful because it easily shows how shorter, wider interconnects will tend to have longer MTTF. In addition, electromigration time-to-failure very strongly depends on the interconnect’s temperature. That temperature is primarily the result of the chip’s environmental temperature, self-heating of the conductor caused by current flow, the heat from neighboring interconnects or transistors, and the thermal conductivity of the surrounding material.

It is also important to note that electromigration is a runaway process. As current density and/or temperature increases, electromigration increases, which raises current density, causing more metal to migrate in a destructive feedback loop.

EM failure modes and allowable current density
In the case of copper redistribution layers in polyimide material, as current flows through the RDL, heat accumulates in the conductor due to Joule heating generation, which can degrade performance. As the required current density and Joule heating temperature is increasing in the fine-line Cu RDL structures (<5nm lines and spaces), self-heating is considered a key factor in the reliability of high-density fan out packages.

JiHye Kwon, senior manager of R&D at Amkor, recently used EM testing and Black’s equation to determine the electromigration failure mechanisms for a given RDL stack and high-density fan-out package with 2µm or 10µm wide RDL layers, 1,000µm long. [1]

High density fan-out is an emerging technology, as it features more aggressive scaling than wafer level fan-out packages. The three layers of copper RDL (3µm thick with Ta/Cu seed) were fabricated followed by polyimide fill, copper pillar deposition, die attach, and overmold. Kwon’s team tested both 2 and 10µm RDL at different current densities and temperatures until resistance increased by 100% (EM failure), but the maximum allowed current density corresponded with a 20% resistance increase. The failure modes occurred in two stages, first by void nucleation and growth and second with copper reduction and oxidation. The study yielded Ea and current density exponent values that can be useful in future designs of RDLs.

Meanwhile, a team of researchers from ASE recently demonstrated how susceptibility to electromigration is determined on copper pillar interconnects in flip chip quad flat no-lead (FCQFN) for high-power automotive applications. The multi-layered copper pillar bumps with a Cu/Ni/Sn1.8Ag configuration were bonded to a silver-plated copper leadframe and tested under extreme EM conditions of 10 kA/cm2 current density and temperatures of 150°C, 160°C and 180°C, while taking in-situ resistance measurements. [2] The EM failures corresponded with rapid rises in electrical resistance that corresponded with the formation of intermetallic compounds and voids at the Cu/solder interfaces. The team built an EM prediction model of interconnects based on a Black-type EM equation, following the JEDEC standard with five test conditions.

After the statistic calculation from the lifetime of samples, the ASE team determined activation energy of Cu pillar interconnects in the FCQFN package (1.12 ± 0.03 eV). The maximum current of the Cu pillar interconnects allowable lasting 10 years at a 105°C operating temperature at a 0.1% failure rate was larger than 2A for the FCQFN Cu pillar structure. “The FCQFN package has great potential in terms of its excellent anti-EM performance for future high-power applications,” the article said.

Designing/manufacturing for EM resiliency
Building electromigration resilience into advanced devices begins with using only EM-compliant linewidths in circuit designs based on the current density and heat profile that the interconnects will experience during operation over the lifetime of the device. Electromigration mitigation also requires process and materials engineering to ensure durability, for instance, of copper pillar bumps under BGA packages. It also calls for an optimized assembly process window and tight process control to prevent tiny violations of design rules that can later precipitate as EM failures.

As the industry makes its way toward true 3D packages, and eventually 3D-ICs, it seems clear that modeling and simulation will play an increasing role in determining many of the guard rails for manufacturing and assembly before manufacturing and assembly even begins. “Reliability modeling and simulation tools are being used to better understand the reliability of bump structures. This proactive approach helps in identifying potential issues before they arise, enabling engineers to implement preventive measures,” said Zwenger.

Modeling and simulation at the system level also will be essential to understanding the complex interplay between reliability mechanisms with thermal and mechanical stress in multi-chiplet systems during operation.

“Electromigration for stacked die is challenging,” said Synopsys’ Lynch. “Localized, die-to-die workloads cause repetitive current flow in specific areas. This generates local heat, increasing EM resulting in wire degradation, while producing even more heat. Reducing the thermal issue becomes critical to ensuring EM reliability.”

As stated previously, solder bumps can become a site for EM reliability failure. “Engineers fine-tune bump design in terms of bump size, pitch, and shape to ensure uniformity and reliability across the entire package. This includes the adoption of innovative Cu bump structures for improved mechanical and electrical properties,” said Amkor’s Zwenger.

In flip-chip BGA and other flip-chip applications, underfill materials — typically thermoset epoxies — are used to reduce the thermal stresses on solder bumps. “Underfill materials play a critical role in providing mechanical support and thermal stability to the bumps,” Zwenger said. “Engineers are investing in the development of advanced underfill formulations with enhanced properties, such as improved adhesion, thermal conductivity, and stress relief.”

Conclusion
Because of its dependence on temperature, electromigration is a failure mechanism to watch and plan for as devices continue to scale and systems integrators continue to cram more and more chiplets of various functions into advanced packages.

“In advanced technologies, the current density is now in close proximity to the maximum density,” said Synopsys’ Lynch. “Anything that causes an increase in temperature poses a threat. Designers of multi-die systems need to understand the impact of temperature and design systems to remove the heat.”

References

  1. JiHye Kwon, “Electromigration Performance Of Fine-Line Cu Redistribution Layer (RDL) For HDFO Packaging,” Semiconductor Engineering, Jan. 18, 2024, https://semiengineering.com/electromigration-performance-of-fine-line-cu-redistribution-layer-rdl-for-hdfo-packaging/
  2. -Y. Tsai, et al., “An Electromigration Study of Cu Pillar Interconnects in Flip-chip QFN Packaging under Extreme Conditions for High-power Applications,” 2023 IEEE 25th Electronics Packaging Technology Conference (EPTC), Singapore, 2023, pp. 326-332, doi: 10.1109/EPTC59621.2023.10457564.

Related Reading
What Can Go Wrong In Heterogeneous Integration
Workflows and tools are disconnected, mechanical stress is ill-defined, and complete co-planarity is nearly impossible. But there are solutions on the horizon.
Thermal Integrity Challenges Grow In 2.5D
Work is underway to map heat flows in interposer-based designs, but there’s much more to be done.
Chiplets: 2023 (EBook)
What chiplets are, what they are being used for today, and what they will be used for in the future.

The post Electromigration Concerns Grow In Advanced Packages appeared first on Semiconductor Engineering.

❌