AI/ML’s Role In Design And Test Expands

By: Laura Peters

August 5, 2024 at 09:03

The role of AI and ML in test keeps growing, providing significant time and money savings that often exceed initial expectations. But it doesn’t work in all cases, sometimes even disrupting well-tested process flows with questionable return on investment.

One of the big attractions of AI is its ability to apply analytics to data sets too large for humans to analyze unaided. In the critical design-to-test realm, AI can address problems such as tool incompatibilities between the design set-up, simulation, and ATE test program, which typically slow debugging and development efforts and account for some of the most time-consuming and costly aspects of design-to-test.

“During device bring-up and debug, complex software/hardware interactions can expose the need for domain knowledge from multiple teams or stakeholders, who may not be familiar with each other’s tools,” said Richard Fanning, lead software engineer at Teradyne. “Any time spent doing conversions or debugging differences in these set-ups is time wasted. Our toolset targets this exact problem by allowing all set-ups to use the same set of source files so everyone can be sure they are running the same thing.”

ML/AI can help keep design teams on track, as well. “As we drive down this technology curve, the analytics and the compute infrastructure that we have to bring to bear becomes increasingly more complex and you want to be able to make the right decision with a minimal amount of overkill,” said Ken Butler, senior director of business development in the ACS data analytics platform group at Advantest. “In some cases, we are customizing the test solution on a die-by-die type of basis.”

But despite the hype, not all tools work well in every circumstance. “AI has some great capabilities, but it’s really just a tool,” said Ron Press, senior director of technology enablement at Siemens Digital Industries Software, in a recent presentation at a MEPTEC event. “We still need engineering innovation. So sometimes people write about how AI is going to take away everybody’s job. I don’t see that at all. We have more complex designs and scaling in our designs. We need to get the same work done even faster by using AI as a tool to get us there.”

Speeding design to characterization to first silicon
In the face of ever-shrinking process windows and the lowest allowable defectivity rates, chipmakers continually are improving the design-to-test processes to ensure maximum efficiency during device bring-up and into high volume manufacturing. “Analytics in test operations is not a new thing. This industry has a history of analyzing test data and making product decisions for more than 30 years,” said Advantest’s Butler. “What is different now is that we’re moving to increasingly smaller geometries, advanced packaging technologies and chiplet-based designs. And that’s driving us to change the nature of the type of analytics that we do, both in terms of the software and the hardware infrastructure. But from a production test viewpoint, we’re still kind of in the early days of our journey with AI and test.”

Nonetheless, early adopters are building out the infrastructure needed for in-line compute and AI/ML modeling to support real-time inferencing in test cells. And because no one company has all the expertise needed in-house, partnerships and libraries of applications are being developed with tool-to-tool compatibility in mind.

“Protocol libraries provide out-of-the-box solutions for communicating common protocols. This reduces the development and debug effort for device communication,” said Teradyne’s Fanning. “We have seen situations where a test engineer has been tasked with talking to a new protocol interface, and saved significant time using this feature.”

In fact, data compatibility is a consistent theme, from design all the way through to the latest developments in ATE hardware and software. “Using the same test sequences between characterization and production has become key as the device complexity has increased exponentially,” explained Teradyne’s Fanning. “Partnerships with EDA tool and IP vendors is also key. We have worked extensively with industry leaders to ensure that the libraries and test files they output are formats our system can utilize directly. These tools also have device knowledge that our toolset does not. This is why the remote connect feature is key, because our partners can provide context-specific tools that are powerful during production debug. Being able to use these tools real-time without having to reproduce a setup or use case in a different environment has been a game changer.”

Serial scan test
But if it seems as if all the configuration changes are happening on the test side, it’s important to take stock of substantial changes in the approach to multi-core design for test.

Tradeoffs during the iterative process of design for test (DFT) have become so substantial in the case of multi-core products that a new approach has become necessary.

“If we look at the way a design is typically put together today, you have multiple cores that are going to be produced at different times,” said Siemens’ Press. “You need to have an idea of how many I/O pins you need to get your scan channels, the deep serial memory from the tester that’s going to be feeding through your I/O pins to this core. So I have a bunch of variables I need to trade off. I have the number of pins going to the core, the pattern size, and the complexity of the core. Then I’ll try to figure out what’s the best combination of cores to test together in what is called hierarchical DFT. But as these designs get more complex, with upwards of 2,500 cores, that’s a lot of tradeoffs to figure out.”

Press noted that applying AI with the same architecture can provide 20% to 30% higher efficiency, but an improved methodology based on packetized scan test (see figure 1) actually makes more sense.

Fig. 1: Advantages to the serial scan network (SSN) approach. Source: Siemens

“Instead of having tester channels feeding into the scan channels that go to each core, you have a packetized bus and packets of data that feed through all the cores. Then you instruct the cores when their packet information is going to be available. By doing this, you don’t have as many variables you need to trade off,” he said. At the core level, each core can be optimized for any number of scan channels and patterns, and the I/O pin count is no longer a variable in the calculation. “Then, when you put it into this final chip, it delivers from the packets the amount of data you need for that core, that can work with any size serial bus, in what is called a serial scan network (SSN).”

Some of the results reported by Siemens EDA customers (see figure 2) highlight both supervised and unsupervised machine learning implementations for improvements in diagnosis resolution and failure analysis. DFT productivity was boosted by 5X to 10X using the serial scan network methodology.

Fig. 2: Realized benefits using machine learning and the serial scan network approach. Source: Siemens

What slows down AI implementation in HVM?
In the transition from design to testing of a device, the application of machine learning algorithms can enable a number of advantages, from better pairing of chiplet performance for use in an advanced package to test time reduction. For example, only a subset of high-performing devices may require burn-in.

“You can identify scratches on wafers, and then bin out the dies surrounding those scratches automatically within wafer sort,” said Michael Schuldenfrei, fellow at NI/Emerson Test & Measurement. “So AI and ML all sounds like a really great idea, and there are many applications where it makes sense to use AI. The big question is, why isn’t it really happening frequently and at-scale? The answer to that goes into the complexity of building and deploying these solutions.”
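
As a rough illustration of the scratch example, the sketch below takes a wafer map in which dies already identified as part of a scratch are marked, then bins out every good die next to them. The grid encoding, the `bin_out_scratch_neighbors` name, and the one-die guard band are assumptions made for illustration. They do not describe any vendor's implementation, and detecting the scratch itself is outside the scope of the sketch.

```python
from typing import List

GOOD, SCRATCH, BINNED = "G", "S", "B"  # hypothetical die states


def bin_out_scratch_neighbors(wafer: List[List[str]], radius: int = 1) -> List[List[str]]:
    """Return a copy of the wafer map with good dies near a scratch marked BINNED."""
    rows, cols = len(wafer), len(wafer[0])
    result = [row[:] for row in wafer]  # leave the input map untouched
    for r in range(rows):
        for c in range(cols):
            if wafer[r][c] != SCRATCH:
                continue
            # Visit every die within `radius` rows/columns of the scratched die.
            for nr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for nc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    if wafer[nr][nc] == GOOD:
                        result[nr][nc] = BINNED
    return result


# Hypothetical 5x5 map with a diagonal scratch.
wafer_map = [
    list("GGGGG"),
    list("GSGGG"),
    list("GGSGG"),
    list("GGGSG"),
    list("GGGGG"),
]
guarded = bin_out_scratch_neighbors(wafer_map)
for row in guarded:
    print("".join(row))
```

A wider guard band is a matter of raising `radius`. In production the choice would trade yield loss against the risk of shipping latent defects.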

Schuldenfrei summarized four key steps in ML’s lifecycle, each with its own challenges. In the first phase, the training, engineering teams use data to understand a particular issue and then build a model that can be used to predict an outcome associated with that issue. Once the model is validated and the team wants to deploy it in the production environment, it needs to be integrated with the existing equipment, such as a tester or manufacturing execution system (MES). Models also mature and evolve over time, requiring frequent validation of the data going into the model and checking to see that the model is functioning as expected. Models also must adapt, requiring redeployment, learning, acting, validating and adapting, in a continuous circle.

“That eats up a lot of time for the data scientists who are charged with deploying all these new AI-based solutions in their organizations. Time is also wasted in the beginning when they are trying to access the right data, organizing it, connecting it all together, making sense of it, and extracting features from it that actually make sense,” said Schuldenfrei.

Further difficulties are introduced in a distributed semiconductor manufacturing environment in which many different test houses are situated in various locations around the globe. “By the time you finish implementing the ML solution, your model is stale and your product is probably no longer bleeding edge so it has lost its actionability, when the model needs to make a decision that actually impacts either the binning or the processing of that particular device,” said Schuldenfrei. “So actually deploying ML-based solutions in a production environment with high-volume semiconductor test is very far from trivial.”

He cited a 2014 Google article that stated how the ML code development part of the process is both the smallest and easiest part of the whole exercise, [1] whereas the various aspects of building infrastructure, data collection, feature extraction, data verification, and managing model deployments are the most challenging parts.

Changes from design through test ripple through the ecosystem. “People who work in EDA put lots of effort into design rule checking (DRC), meaning we’re checking that the work we’ve done and the design structure are safe to move forward because we didn’t mess anything up in the process,” said Siemens’ Press. “That’s really important with AI — what we call verifiability. If we have some type of AI running and giving us a result, we have to make sure that result is safe. This really affects the people doing the design, the DFT group and the people in test engineering that have to take these patterns and apply them.”

There are a multitude of ML-based applications for improving test operations. Advantest’s Butler highlighted some of the apps customers are pursuing most often, including search time reduction, shift left testing, test time reduction, and chiplet pairing (see figure 3).

“For minimum voltage, maximum frequency, or trim tests, you tend to set a lower limit and an upper limit for your search, and then you’re going to search across there in order to be able to find your minimum voltage for this particular device,” he said. “Those limits are set based on process split, and they may be fairly wide. But if you have analytics that you can bring to bear, then the AI- or ML-type techniques can basically tell you where this die lies on the process spectrum. Perhaps it was fed forward from an earlier insertion, and perhaps you combine it with what you’re doing at the current insertion. That kind of inference can help you narrow the search limits and speed up that test. A lot of people are very interested in this application, and some folks are doing it in production to reduce search time for test time-intensive tests.”
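
The effect Butler describes can be sketched with a simple binary search for minimum voltage. The code below is a minimal illustration under stated assumptions: the device passes at every voltage at or above its true V_min, the upper limit always passes, and the narrowed limits come from some upstream model. All names and values (`vmin_search`, the 5 mV step, the voltage windows) are hypothetical.

```python
from typing import Callable, Tuple


def vmin_search(passes: Callable[[int], bool], lo_mv: int, hi_mv: int,
                step_mv: int = 5) -> Tuple[int, int]:
    """Binary-search the lowest passing voltage in [lo_mv, hi_mv].

    Assumes the device passes at every voltage at or above its true Vmin
    and that hi_mv itself passes. Returns (vmin_mv, number_of_test_runs).
    """
    runs = 0
    lo_idx, hi_idx = 0, (hi_mv - lo_mv) // step_mv
    while lo_idx < hi_idx:
        mid = (lo_idx + hi_idx) // 2
        runs += 1
        if passes(lo_mv + mid * step_mv):
            hi_idx = mid        # passing: Vmin is at or below this point
        else:
            lo_idx = mid + 1    # failing: Vmin is above this point
    return lo_mv + lo_idx * step_mv, runs


TRUE_VMIN_MV = 615  # hypothetical device under test


def device(mv: int) -> bool:
    return mv >= TRUE_VMIN_MV


# Wide limits set from process splits (hypothetical values).
wide = vmin_search(device, 500, 900)
# Narrowed limits, e.g. inferred from data fed forward from an earlier insertion.
narrow = vmin_search(device, 600, 640)
print("wide limits:  ", wide)
print("narrow limits:", narrow)
```

Both searches land on the same voltage, but the narrowed window needs fewer test runs. A real flow would also need a fallback to the wide limits when the predicted window turns out not to contain the device's V_min.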

Fig. 3: Opportunities for real-time and/or post-test improvements to pair or bin devices, improve yield, throughput, reliability or cost using the ACS platform. Source: Advantest

“The idea behind shift left is perhaps I have a very expensive test insertion downstream or a high package cost,” Butler said. “If my yield is not where I want it to be, then I can use analytics at earlier insertions to be able to try to predict which devices are likely to fail at the later insertion by doing analysis at an earlier insertion, and then downgrade or scrap those die in order to optimize downstream test insertions, raising the yield and lowering overall cost. Test time reduction is very simply the addition or removal of test content, skipping tests to reduce cost. Or you might want to add test content for yield improvement,” said Butler.

“If I have a multi-tiered device, and it’s not going to pass bin 1 criteria – but maybe it’s bin 2 if I add some additional content — then people may be looking at analytics to try to make those decisions. Finally, two things go together in my mind, this idea of chiplet designs and smart pairing. So the classic example is a processor die with a stack of high bandwidth memory on top of it. Perhaps I’m interested in high performance in some applications and low power in others. I want to be able to match the content and classify die as they’re coming through the test operation, and then downstream do pick-and-place and put them together in such a way that I maximize the yield for multiple streams of data. Similar kinds of things apply for achieving a low power footprint and carbon footprint.”
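
One way to picture smart pairing is a simple rank-based heuristic: sort both populations by measured speed and pair like with like, on the assumption that an assembled part is limited by its slower member. The sketch below uses that assumption. The die values, the 3.0 threshold, and the function names are hypothetical, and real pairing flows weigh more than one parameter.

```python
from typing import List, Tuple

Pair = Tuple[float, float]


def pair_by_rank(cpu_fmax: List[float], hbm_fmax: List[float]) -> List[Pair]:
    """Pair the fastest processor die with the fastest memory stack, and so on down."""
    return list(zip(sorted(cpu_fmax, reverse=True), sorted(hbm_fmax, reverse=True)))


def count_high_perf(pairs: List[Pair], threshold: float) -> int:
    """Count pairs whose slower member still meets the performance threshold."""
    return sum(1 for cpu, hbm in pairs if min(cpu, hbm) >= threshold)


# Hypothetical per-die speed measurements, in arbitrary units.
cpu = [3.2, 2.6, 3.0, 2.4]
hbm = [2.5, 3.1, 2.7, 3.3]

arrival_order = list(zip(cpu, hbm))   # pairing dies as they come off the line
ranked = pair_by_rank(cpu, hbm)       # pairing after classification

print("high-performance pairs, arrival order:", count_high_perf(arrival_order, 3.0))
print("high-performance pairs, ranked:       ", count_high_perf(ranked, 3.0))
```

With these made-up numbers, pairing in arrival order wastes every fast die on a slow partner, while ranked pairing recovers some high-performance parts from the same inventory.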

Generative AI
The question that inevitably comes up when discussing the role of AI in semiconductors is whether or not large language models like ChatGPT can prove useful to engineers working in fabs. Early work shows some promise.

“For example, you can ask the system to build an outlier detection model for you that looks for parts that are five sigma away from the center line, saying ‘Please create the script for me,’ and the system will create the script. These are the kinds of automated, generative AI-based solutions that we’re already playing with,” says Schuldenfrei. “But from everything I’ve seen so far, there is still quite a lot of work to be done to get these systems to provide outputs with high enough quality. At the moment, the amount of human interaction that is needed afterward to fix problems with the algorithms or models that generative AI is producing is still quite significant.”
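
A script of the kind described might look like the following sketch. It assumes the center line is the mean of a reference population and that sigma is that population's sample standard deviation. The part IDs, readings, and the `find_outliers` name are hypothetical.

```python
import statistics
from typing import Dict, List


def find_outliers(baseline: List[float], parts: Dict[str, float],
                  n_sigma: float = 5.0) -> List[str]:
    """Return IDs of parts measuring more than n_sigma from the baseline mean."""
    center = statistics.fmean(baseline)
    sigma = statistics.stdev(baseline)  # sample standard deviation
    return [part_id for part_id, value in parts.items()
            if abs(value - center) > n_sigma * sigma]


# Hypothetical reference readings and newly tested parts.
baseline = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9]
new_parts = {"die_001": 10.05, "die_002": 11.5, "die_003": 9.7}

print(find_outliers(baseline, new_parts))
```

Computing the limits from a separate baseline matters: if a gross outlier is included in a small sample, it inflates sigma enough to hide itself.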

A lingering question is how to access the test programs needed to train the new test programs when everyone is protecting important test IP. “Most people value their test IP and don’t necessarily want to set up guardrails around the training and utilization processes,” Butler said. “So finding a way to accelerate the overall process of developing test programs while protecting IP is the challenge. It’s clear this kind of technology is going to be brought to bear, just like we already see in the software development process.”

Failure analysis
Failure analysis is typically a costly and time-consuming endeavor for fabs because it requires a trip back in time to gather wafer processing, assembly, and packaging data specific to a particular failed device, known as a returned material authorization (RMA). Physical failure analysis is performed in an FA lab, using a variety of tools to trace the root cause of the failure.

While scan diagnostic data has been used for decades, a newer approach involves pairing a digital twin with scan diagnostics data to find the root cause of failures.

“Within test, we have a digital twin that does root cause deconvolution based on scan failure diagnosis. So instead of having to look at the physical device and spend time trying to figure out the root cause, since we have scan, we have millions and millions of virtual sample points,” said Siemens’ Press. “We can reverse-engineer what we did to create the patterns and figure out where the mis-compare happened within the scan cells deep within the design. Using YieldInsight and unsupervised machine learning with training on a bunch of data, we can very quickly pinpoint the fail locations. This allows us to run thousands, or tens of thousands fail diagnoses in a short period of time, giving us the opportunity to identify the systematic yield limiters.”

Yet another approach that is gaining steam is using on-die monitors to access specific performance information in lieu of physical FA. “What is needed is deep data from inside the package to monitor performance and reliability continuously, which is what we provide,” said Alex Burlak, vice president of test and analytics at proteanTecs. “For example, if the suspected failure is from the chiplet interconnect, we can help the analysis using deep data coming from on-chip agents instead of taking the device out of context and into the lab (where you may or may not be able to reproduce the problem). Even more, the ability to send back data and not the device can in many cases pinpoint the problem, saving the expensive RMA and failure analysis procedure.”

Conclusion
The enthusiasm around AI and machine learning is being met by robust infrastructure changes in the ATE community to accommodate the need for real-time inferencing of test data and test optimization for higher yield, higher throughput, and chiplet classifications for multi-chiplet packages. For multi-core designs, packetized test, commercialized as an SSN methodology, provides a more flexible approach to optimizing each core in a device for its own number of scan chains, patterns, and bus width.

The number of testing applications that can benefit from AI continues to rise, including test time reduction, V_min/F_max search reduction, shift left, smart pairing of chiplets, and overall power reduction. New developments like identical source files for all setups across design, characterization, and test help speed the critical debug and development stage for new products.

Reference

[1] https://proceedings.neurips.cc/paper_files/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf

The post AI/ML’s Role In Design And Test Expands appeared first on Semiconductor Engineering.

Chip Industry Week In Review

Semiconductor Engineering

By: The SE Staff

May 17, 2024 at 09:01

President Biden will raise the tariff rate on Chinese semiconductors from 25% to 50% by 2025, among other measures to protect U.S. businesses from China’s trade practices. Also, as part of President Biden’s AI Executive Order, the Administration released steps to protect workers from AI risks, including human oversight of systems and transparency about what systems are being used.

Intel is in advanced talks with Apollo Global Management for the equity firm to provide more than $11 billion to build a fab in Ireland, reported the Wall Street Journal. Also, Intel’s Foundry Services appointed Kevin O’Buckley as the senior vice president and general manager.

Polar is slated to receive up to $120 million in CHIPS Act funding to establish an independent American foundry in Minnesota. The company expects to invest about $525 million in the expansion of the facility over the next two years, with a $75 million investment from the State of Minnesota.

Arm plans to develop AI chips for launch next year, reports Nikkei Asia.

South Korea is planning a support package worth more than 10 trillion won ($7.3 billion) aimed at chip materials, equipment makers, and fabless companies throughout the semiconductor supply chain, according to Reuters.

Global

Edwards opened a new facility in Asan City, South Korea. The 15,000m² factory provides a key production site for abatement systems, and integrated vacuum and abatement systems for semiconductor manufacturing.

France’s courtship with mega-tech is paying off. Microsoft is investing more than US $4 billion to expand its cloud computing and AI infrastructure, including bringing up to 25,000 advanced GPUs to the country by the end of 2025. The “Choose France” campaign also snagged US $1.3 billion from Amazon for cloud infrastructure expansion, genAI and more.

Toyota, Nissan, and Honda are teaming up on AI and chips for next-gen cars with support from Japan’s Ministry of Economy, Trade and Industry (METI), reports Nikkei Asia.

Meanwhile, IBM and Honda are collaborating on long-term R&D of next-gen technologies for software-defined vehicles (SDV), including chiplets, brain-inspired computing, and hardware-software co-optimization.

Siemens and Foxconn plan to collaborate on global manufacturing processes in electronics, information and communications technology, and electric vehicles (EV).

TSMC confirmed a Q4 2024 construction start date for its first European plant in Dresden, Germany.

Amazon Web Services (AWS) plans to invest €7.8 billion (~$8.4B) in the AWS European Sovereign Cloud in Germany through 2040. The system is designed to serve public sector organizations and customers in highly regulated industries.

In-Depth

Semiconductor Engineering published its Low Power-High Performance newsletter this week, featuring these stories:

- Will Domain-Specific ICs Become Ubiquitous?
- Running More Efficient AI/ML Code With Neuromorphic Engines
- Power/Performance Costs In Chip Security

And this week’s Test, Measurement & Analytics newsletter featured these stories:

- Using Predictive Maintenance To Boost IC Manufacturing Efficiency
- The Future Of Fault Coverage In Chips
- Doing More At Functional Test

Markets and Money

The U.S. National Institute of Standards and Technology (NIST) awarded more than $1.2 million to 12 businesses in 8 states under the Small Business Innovation Research (SBIR) Program to fund R&D of products relating to cybersecurity, quantum computing, health care, semiconductor manufacturing, and other critical areas.

Engineering services and consulting company Infosys completed the acquisition of InSemi Technology, a provider of semiconductor design and embedded software development services.

The quantum market, which includes quantum networking and sensors alongside computing, is predicted to grow from $838 million in 2024 to $1.8 billion in 2029, reports Yole.

Shipments of OLED monitors reached about 200,000 units in Q1 2024, a year over year growth of 121%, reports TrendForce.

Global EV sales grew 18% in Q1 2024 with plug-in hybrid electric vehicles (PHEV) sales seeing 46% YoY growth and battery electric vehicle (BEV) sales growing just 7%, according to Counterpoint. China leads global EV sales with 28% YoY growth, while the US grew just 2%. Tesla saw a 9% YoY drop, but topped BEV sales with a 19% market share. BYD grew 13% YoY and exported about 100,000 EVs with 152% YoY growth, mainly in Southeast Asia.

DeepX raised $80.5 million in Series C funding for its on-device NPU IP and AI SoCs tailored for applications including physical security, robotics, and mobility.

MetisX raised $44 million in Series A funding for its memory solutions built on Compute Express Link (CXL) for accelerating large-scale data processing applications.

Security

While security experts have been warning of a growing threat in electronics for decades, there have been several recent fundamental changes that elevate the risk.

Synopsys and the Ponemon Institute released a report showing 54% of surveyed organizations suffered a software supply chain attack in the past year and 20% were not effective in their response. And 52% said their development teams use AI tools to generate code, but only 32% have processes to evaluate it for license, security, and quality risks.

Researchers at Ruhr University Bochum and TU Darmstadt presented a solution for the automated generation of fault-resistant circuits (AGEFA) and assessed the security of examples generated by AGEFA against side-channel analysis and fault injection.

TXOne reported on operational technology security and the most effective method for preventing production interruptions caused by cyber-attacks.

CrowdStrike and NVIDIA are collaborating to accelerate the use of analytics and AI in cybersecurity to help security teams combat modern cyberattacks, including AI-powered threats.

The National Institute of Standards and Technology (NIST) finalized its guidelines for protecting sensitive data, known as controlled unclassified information, aimed at organizations that do business with the federal government.

The Defense Advanced Research Projects Agency (DARPA) awarded BAE Systems a $12 million contract to solve thermal challenges limiting electronic warfare systems, particularly in GaN transistors.

Sigma Defense won a $4.7 million contract from the U.S. Army for an AI-powered virtual training environment, partnering with Brightline Interactive on a system that uses spatial computing and augmented intelligence workflows.

SkyWater’s advanced packaging operation in Florida has been accredited as a Category 1A Trusted Supplier by the Defense Microelectronics Activity (DMEA) of the U.S. Department of Defense (DoD).

Videos of two CWE-focused sessions from CVE/FIRST VulnCon 2024 were made available on the CWE YouTube Channel.

The Cybersecurity and Infrastructure Security Agency (CISA) issued a number of alerts/advisories.

Supercomputing

Supercomputers are battling for top dog.

The Frontier supercomputer at Oak Ridge National Laboratory (ORNL) retained the top spot on the Top500 list of the world’s fastest systems with an HPL score of 1.206 EFlop/s. The as-yet incomplete Aurora system at Argonne took second place, becoming the world’s second exascale system at 1.012 EFlop/s. The Green500 list, which tracks energy efficiency of compute, saw three new entrants take the top places.

Cerebras Systems, Sandia National Laboratory, Lawrence Livermore National Laboratory, and Los Alamos National Laboratory used Cerebras’ second generation Wafer Scale Engine to perform atomic scale molecular dynamics simulations at the millisecond scale, which they claim is 179X faster than the Frontier supercomputer.

UT Austin’s Stampede3 supercomputer is now in full production, serving the open science community through 2029.

Education and Training

SEMI announced the SEMI University Semiconductor Certification Programs to help alleviate the workforce skills gap. Its first two online courses are designed for new talent seeking careers in the industry, and experienced workers looking to keep their skills current. Also, SEMI and other partners launched a European Chip Skills Academy Summer School in Italy.

Siemens created an industry credential program for engineering students that supplements a formal degree by validating industry knowledge and skills. Nonprofit agency ABET will provide accreditation. The first two courses are live at the University of Colorado Boulder (CU Boulder) and a series is planned with Pennsylvania State University (Penn State).

Syracuse University launched a $20 million Center for Advanced Semiconductor Manufacturing, with co-funding from Onondaga County.

Starting young is a good thing. An Arizona school district, along with the University of Arizona, is creating a semiconductor program for high schoolers.

Product News

Siemens and Sony partnered to enable immersive engineering via a spatial content creation system, NX Immersive Designer, which includes Sony’s XR head-mounted display. The integration of hardware and software gives designers and engineers natural ways to interact with a digital twin. Siemens also extended its Xcelerator as a Service portfolio with solutions for product engineering and lifecycle management, cloud-based high-performance simulation, and manufacturing operations management. It will be available on Microsoft Azure, as well.

Advantest announced the newest addition to its portfolio of power supplies for the V93000 EXA Scale SoC test platform. The DC Scale XHC32 power supply offers 32 channels with single-instrument total current of up to 640A.

Fig. 1: Advantest’s DC Scale XHC32. Source: Advantest

Infineon released its XENSIV TLE49SR angle sensors, which can withstand stray magnetic fields of up to 8 mT, ideal for applications of safety-critical automotive chassis systems.

Google debuted its sixth generation Cloud TPU, 4.7X faster and 67% more energy-efficient than the previous generation, with double the high-bandwidth memory.

X-Silicon uncorked a RISC-V vector CPU, coupled with a Vulkan-enabled GPU ISA and AI/ML acceleration in a single processor core, aimed at embedded and IoT applications.

IBM expanded its Qiskit quantum software stack, including the stable release of its SDK for building, optimizing, and visualizing quantum circuits.

Northeastern University announced the general availability of testing and integration solutions for Open RAN through the Open6G Open Testing and Integration Center (Open 6G OTIC).

Research

The University of Glasgow received £3 million (~$3.8M) from the Engineering and Physical Sciences Research Council (EPSRC)’s Strategic Equipment Grant scheme to help establish “Analogue,” an Automated Nano Analysing, Characterisation and Additive Packaging Suite to research silicon chip integration and packaging.

EPFL researchers developed scalable photonic ICs, based on lithium tantalate.

DISCO developed a way to increase the diameter of diamond wafers that uses the KABRA process, a laser ingot slicing method.

CEA-Leti developed two complementary approaches for high performance photon detectors — a mercury cadmium telluride-based avalanche photodetector and a superconducting single photon detector.

Toshiba demonstrated storage capacities of over 30TB with two next-gen large capacity recording technologies for hard disk drives (HDDs): Heat Assisted Magnetic Recording (HAMR) and Microwave Assisted Magnetic Recording (MAMR).

Caltech neuroscientists reported that their brain-machine interface (BMI) worked successfully in a second human patient, following 2022’s first instance, proving the device is not dependent on one particular brain or one location in a brain.

Linköping University researchers developed a cheap, sustainable battery made from zinc and lignin, while ORNL researchers developed carbon-capture batteries.

Events and Further Reading

Find upcoming chip industry events here, including:

Event	Date	Location
European Test Symposium	May 20 – 24	The Hague, Netherlands
NI Connect Austin 2024	May 20 – 22	Austin, Texas
ITF World 2024 (imec)	May 21 – 22	Antwerp, Belgium
Embedded Vision Summit	May 21 – 23	Santa Clara, CA
ASIP Virtual Seminar 2024	May 22	Online
Electronic Components and Technology Conference (ECTC) 2024	May 28 – 31	Denver, Colorado
Hardwear.io Security Trainings and Conference USA 2024	May 28 – Jun 1	Santa Clara, CA
SW Test	Jun 3 – 5	Carlsbad, CA
IITC2024: Interconnect Technology Conference	Jun 3 – 6	San Jose, CA
VOICE Developer Conference	Jun 3 – 5	La Jolla, CA
CHIPS R&D Standardization Readiness Level Workshop	Jun 4 – 5	Online and Boulder, CO

Semiconductor Engineering’s latest newsletters:

Automotive, Security and Pervasive Computing
Systems and Design
Low Power-High Performance
Test, Measurement and Analytics
Manufacturing, Packaging and Materials

The post Chip Industry Week In Review appeared first on Semiconductor Engineering.

2.5D Integration: Big Chip Or Small PCB?

Semiconductor Engineering

By: Brian Bailey

February 29, 2024 at 09:08

Defining whether a 2.5D device is a printed circuit board shrunk down to fit into a package, or is a chip that extends beyond the limits of a single die, may seem like hair-splitting semantics, but it can have significant consequences for the overall success of a design.

Planar chips always have been limited by the size of the reticle, which is about 858mm². Beyond that, yield issues make the silicon uneconomical. For years, that has limited the number of features that could be crammed onto a planar substrate. Any additional features would need to be designed into additional chips and connected with a printed circuit board (PCB).

The advent of 2.5D packaging technology has opened up a whole new axis for expansion, allowing multiple chiplets to be interconnected inside an advanced package. But the starting point for this packaged design can have a big impact on how the various components are assembled, who is involved, and which tools are deployed and when.

There are several reasons why 2.5D is gaining ground today. One is cost. “If you can build smaller chips, or chiplets, and those chiplets have been designed and optimized to be integrated into a package, it can make the whole thing smaller,” says Tony Mastroianni, advanced packaging solutions director at Siemens Digital Industries Software. “And because the yield is much higher, that has a dramatic impact on cost. Rather than having 50% or below yield for die-sized chips, you can get that up into the 90% range.”
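The yield argument can be illustrated with the simple Poisson yield model, Y = exp(-A x D0), a common first-order approximation. The sketch below is not from the article: the defect density is a hypothetical value picked so that a reticle-sized die yields about 50%, and the split into eight equal chiplets is likewise an assumption.

```python
import math

def poisson_yield(area_mm2: float, defects_per_mm2: float) -> float:
    """Fraction of good dies under the simple Poisson model Y = exp(-A * D0)."""
    return math.exp(-area_mm2 * defects_per_mm2)

RETICLE_MM2 = 858.0
# Hypothetical defect density, chosen so a reticle-sized die yields 50%.
D0 = math.log(2) / RETICLE_MM2

big_die = poisson_yield(RETICLE_MM2, D0)
chiplet = poisson_yield(RETICLE_MM2 / 8, D0)  # same area split into 8 chiplets

print(f"reticle-sized die yield: {big_die:.0%}")
print(f"1/8-size chiplet yield:  {chiplet:.0%}")
```

Under these assumptions the per-chiplet yield lands in the low 90% range, which is consistent with the figures Mastroianni quotes. Real yield models also account for defect clustering and assembly losses, which this sketch ignores.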

Interconnecting chips using a PCB also limits performance. “Historically, we had chips packaged separately, put on the PCB, and connected with some routing,” says Ramin Farjadrad, CEO and co-founder of Eliyan. “The problems people started to face were twofold. One was that the bandwidth between these chips was limited by going through the PCB, and then a limited number of balls on the package limited the connectivity between these chips.”

The key difference with 2.5D compared to a PCB is that 2.5D uses chip dimensions. There are much finer-grain wires, and various components can be packed much closer together on an interposer or in a package than on a board. For those reasons, wires can be shorter, there can be more of them, and bandwidth is increased.

That impacts performance at multiple levels. “Since they are so close, you don’t have the long transport RC or LC delays, so it’s much faster,” says Siemens’ Mastroianni. “You don’t need big drivers on a chip to drive long traces over the board, so you have lower power. You get orders of magnitude better performance, and lower power. A common metric is to talk about picojoules per bit. The amount of energy it takes to move bits makes 2.5D compelling.”
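The picojoules-per-bit metric converts directly into interface power: power equals energy per bit times bit rate. The short sketch below shows that arithmetic. The bandwidth and both energy figures are hypothetical placeholders, not measurements from the article or from any vendor.

```python
def link_power_watts(picojoules_per_bit: float, bits_per_second: float) -> float:
    """Interface power = energy per bit x bit rate, with pJ converted to joules."""
    return picojoules_per_bit * 1e-12 * bits_per_second

# All three numbers below are hypothetical, chosen only to show the scaling.
BANDWIDTH_BPS = 1e12       # 1 Tb/s of chip-to-chip traffic
BOARD_PJ_PER_BIT = 5.0     # assumed cost of driving a long PCB trace
PACKAGE_PJ_PER_BIT = 0.5   # assumed cost of a short in-package link

board_w = link_power_watts(BOARD_PJ_PER_BIT, BANDWIDTH_BPS)
package_w = link_power_watts(PACKAGE_PJ_PER_BIT, BANDWIDTH_BPS)

print(f"board-level link: {board_w:.1f} W")
print(f"in-package link:  {package_w:.1f} W")
```

Because power scales linearly with both terms, any reduction in energy per bit carries straight through to the power budget at a fixed bandwidth.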

Still, the mindset affects the initial design concept, and that has repercussions throughout the flow. “If you talk to a die designer, they’re probably going to say that it is just a big chip,” says John Park, product management group director in the Custom IC & PCB Group at Cadence. “But if you talk to a package designer, or a board designer, they’re going to say it’s basically a tiny PCB.”

Who is right? “The internal organizational structure within the company often decides how this is approached,” says Marc Swinnen, director of product marketing at Ansys. “Longer term, you want to make sure that your company is structured to match the physics and not try to match the physics to your company.”

What is clear is that nothing is certain. “The digital world was very regular in that every two years we got a new node that was half size,” says Cadence’s Park. “There would be some new requirements, but it was very evolutionary. Packaging is the Wild West. We might get 8 new packaging technologies this year, 3 next year, 12 the next year. Many of these are coming from the foundries, whereas it used to be just from the outsourced semiconductor assembly and test companies (OSATs) and the substrate providers. While the foundries are a new entrant, the OSATs are offering some really interesting packaging technologies at a lower cost.”

Part of the reason for this is that different groups of people have different requirement sets. “The government and the military see the primary benefits as heterogeneous integration capabilities,” says Ansys’ Swinnen. “They are not pushing the edge of processing technology. Instead, they are designing things like monolithic microwave integrated circuits (MMICs), where they need waveguides for very high-speed signals. They approach it from a packaging assembly point of view. Conversely, the high-performance compute (HPC) companies approach it from a pile of 5nm and 3nm chips with high performance high-bandwidth memory (HBM). They see it as a silicon assembly problem. The benefit they see is the flexibility of the architecture, where they can throw in cores and interfaces and create products for specific markets without having to redesign each chiplet. They see flexibility as the benefit. Military sees heterogeneous integration as the benefit.”

Materials
There are several materials used as the substrate in 2.5D packaging technology, each of which has different tradeoffs in terms of cost, density, and bandwidth, along with each having a selection of different physical issues that must be overcome. One of the primary points of differentiation is the bump pitch, as shown in figure 1.

Fig 1. Chiplet interconnection for various substrate configurations. Source: Eliyan

When talking about an interposer, it generally is considered to be silicon. “The interposer could be a large piece of silicon (Fig 1 top), or just silicon bridges between the chips (Fig 1 middle) to provide the connectivity,” says Eliyan’s Farjadrad. “Both of these solutions use micro-bumps, which have high density. Interposers and bridges provide a lot of high-density bumps and traces, and that gives you bandwidth. If you utilize 1,000 wires each running at 5Gb/s, you get 5Tb/s. If you have 10,000, you get 50Tb/s. But those signals cannot go more than two or three millimeters. Alternatively, if you avoid the silicon interposer and you stay with an organic package (Fig 1 bottom), such as a flip-chip package, the density of the traces is 5X to 10X less. However, the thickness of the wires can be 5X to 10X more. That’s a significant advantage, because the resistance of the wires will go down by the square of the thickness of the wires. The cross section of that wire goes up by the square of that thickness, so the resistance comes down significantly. If it’s 5X less density, that means you can run signals almost 25X further.”
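Farjadrad's two pieces of arithmetic can be checked with a short sketch: aggregate bandwidth is wire count times per-wire rate, and DC resistance per unit length falls with the wire's cross section (R = rho x L / (w x t)). The trace dimensions below are hypothetical, and the calculation uses bulk copper resistivity and ignores capacitance, skin effect, and dielectric loss, so the reach figure is only a first-order estimate.

```python
COPPER_RESISTIVITY_OHM_M = 1.68e-8  # bulk copper near room temperature

def aggregate_bandwidth_tbps(wires: int, gbps_per_wire: float) -> float:
    """Total bandwidth in Tb/s for parallel wires at a common data rate."""
    return wires * gbps_per_wire / 1000.0

def resistance_per_mm(width_um: float, thickness_um: float) -> float:
    """DC resistance in ohms of 1 mm of wire: R = rho * L / (w * t)."""
    area_m2 = (width_um * 1e-6) * (thickness_um * 1e-6)
    return COPPER_RESISTIVITY_OHM_M * 1e-3 / area_m2

bw_1k = aggregate_bandwidth_tbps(1_000, 5)
bw_10k = aggregate_bandwidth_tbps(10_000, 5)

fine = resistance_per_mm(1.0, 1.0)    # hypothetical silicon interposer trace
coarse = resistance_per_mm(5.0, 5.0)  # hypothetical organic trace, 5X wider and thicker
reach_gain = fine / coarse            # extra length at equal total resistance

print(f"{bw_1k:.0f} Tb/s, {bw_10k:.0f} Tb/s")
print(f"{fine:.2f} ohm/mm vs {coarse:.3f} ohm/mm, reach gain {reach_gain:.0f}X")
```

Scaling width and thickness by 5X each raises the cross section by 25X, which is where the "almost 25X further" figure comes from when resistance is the limiting factor.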

For some people, it is all about bandwidth per millimeter. “If you have a parallel bus, or a parallel interface that is high speed, and you want bandwidth per millimeter, then you would probably pick a silicon interposer,” says Kent Stahn, senior manager of hardware engineering in Synopsys’ Solutions Group. “An organic substrate is low-loss, low-cost, but it doesn’t have the density. In between, there are a bunch of solutions that deliver on some of that, but not for the same cost.”

There are other reasons to pick a substrate material, as well. “Silicon interposer comes from a foundry, so availability is a problem,” says Manuel Mota, senior staff product manager in Synopsys’ Solutions Group. “Some companies are facing challenges in sourcing advanced packages because capacity is taken. By going to other technologies that have a little less bandwidth density, but perhaps enough for your application, you can find them elsewhere. That’s becoming a critical aspect.”

All of these technologies are progressing rapidly, however. “The reticle limit is about 858mm²,” says Park. “People are talking about interposers that are perhaps four times that size, but we have laminates that go much bigger. Some of the laminate substrates coming from Japan are approaching that same level of interconnect density that we can get from silicon. I personally see more push towards organic substrates. Chip-on-Wafer-on-Substrate (CoWoS) from TSMC uses a silicon interposer and has been the technology of choice for about 12 years. More recently they introduced CoWoS-R, which uses film polyamide, closer to an organic type of substrate. Now we hear a lot about glass substrates.”

Over time, the total real estate inside the package may grow. “It doesn’t make sense for foundries to continue to build things the size of a 30-inch printed circuit board,” adds Park. “There are materials that are capable of addressing the bigger designs. Where we really need density is die-to-die. We want those chiplets right next to each other, a couple of millimeters of interconnect length. We want things very short. But the rest of it is just fanning out the I/O so that it connects to the PCB.”

This is why bridges are popular. “We do see a progression to bridges for the high-speed part of the interface,” says Synopsys’ Stahn. “The back side of it would be fanout, like RDL fanout. We see RDL packages that are going to be more like traditional packages going forward.”

Interposers offer additional capabilities. “Today, 99% of the interposers are passive,” says Park. “There’s no front end of line, there are no device layers. It’s purely back end of line processing. You are adding three, four, five metal layers to that silicon. That’s what we call a passive interposer. It’s just creating that die-to-die interconnect. But there are people taking that die and making it an active interposer, basically adding logic to that.”

That can happen for different purposes. “You already see some companies doing active interposers, where they add power management or some of the controls logic,” says Mota. “When you start putting active circuits on interposer, is it still a 2.5D integration, or does it become a 3D integration? We don’t see a big trend toward active interposers today.”

There are some new issues, though. “You have to consider coefficients of thermal expansion (CTE) mismatches,” says Stahn. “This happens whenever two materials with different CTEs are bonded together. Let’s start with the silicon interposer. You can get higher wattage systems, where the SoCs can be talking to their peers, and that can consume a lot of power. A silicon interposer still has to go in a package. The CTE mismatches are between the silicon to the package material. And with the bridge, you’re using it where you need it, but it’s still silicon die-to-die. You have to do the thermal mechanical analysis to make sure that the power that you’re delivering, and the CTE mismatches that you have, result in a viable system.”
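A first-order feel for the CTE mismatch Stahn describes comes from the free-expansion difference between two bonded materials: the CTE difference times the temperature swing times the length. In the sketch below, silicon's CTE of roughly 2.6 ppm/°C is a standard textbook value, while the organic substrate CTE, the temperature swing, and the interposer length are hypothetical. The result is not a stress calculation, since it ignores stiffness, underfill, and warpage.

```python
def differential_expansion_um(cte_a_ppm: float, cte_b_ppm: float,
                              delta_t_c: float, length_mm: float) -> float:
    """Unconstrained expansion difference |a - b| * dT * L, in micrometres."""
    mismatch_per_c = abs(cte_a_ppm - cte_b_ppm) * 1e-6
    return mismatch_per_c * delta_t_c * length_mm * 1000.0

SILICON_CTE_PPM = 2.6    # approximate textbook value for silicon
ORGANIC_CTE_PPM = 15.0   # assumed value for an organic package substrate
DELTA_T_C = 100.0        # assumed temperature swing
INTERPOSER_MM = 30.0     # assumed interposer edge length

mismatch_um = differential_expansion_um(
    SILICON_CTE_PPM, ORGANIC_CTE_PPM, DELTA_T_C, INTERPOSER_MM
)
print(f"differential expansion: {mismatch_um:.1f} um")
```

Even tens of micrometres of differential movement is large relative to micro-bump pitch, which is why the thermal-mechanical analysis Stahn mentions is needed before a design can be considered viable.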

While signal lengths in theory can get longer, this poses some problems. “When you’re making those long connections inside a chip, you typically limit those routes to a couple of millimeters, and then you buffer it,” says Mastroianni. “The problem with a passive silicon interposer is there are no buffers. That can really become a serious issue. If you do need to make those connections, you need to plan those out very carefully. And you do need to make sure you’re running timing analysis. Typically, your package guys are not going to be doing that analysis. That’s more of a problem that’s been solved with static timing analysis by silicon engineers. We do need to introduce an STA flow and deal with all the extractions that include organic and silicon type traces, and it becomes a new problem. When you start getting into some of those very long traces, your simple RC timing delays, which are assumed in normal STA delay calculators, don’t account for some of the inductance and mutual inductance between those traces, so you can get serious accuracy issues for those long traces.”
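The reason unbuffered routes become a problem can be seen in the Elmore approximation for a distributed RC wire, where delay grows with the square of the length (0.5 x r x c x L²). The sketch below compares a short on-chip-style route with a long unbuffered one, and then with the same long route split into repeated segments. The per-millimetre resistance and capacitance are hypothetical, buffer delay is ignored, and inductance is left out entirely, which is the accuracy gap Mastroianni points to for very long traces.

```python
def elmore_delay_ps(r_ohm_per_mm: float, c_ff_per_mm: float,
                    length_mm: float, segments: int = 1) -> float:
    """Elmore delay of a distributed RC wire split into equal buffered segments.

    Each segment contributes 0.5 * r * c * (L / N)^2. Buffer delay is ignored.
    """
    seg_mm = length_mm / segments
    per_segment_fs = 0.5 * r_ohm_per_mm * c_ff_per_mm * seg_mm ** 2  # ohm * fF = fs
    return segments * per_segment_fs / 1000.0

R_OHM_PER_MM = 20.0   # hypothetical wire resistance
C_FF_PER_MM = 200.0   # hypothetical wire capacitance

short_ps = elmore_delay_ps(R_OHM_PER_MM, C_FF_PER_MM, 2.0)
long_ps = elmore_delay_ps(R_OHM_PER_MM, C_FF_PER_MM, 20.0)
buffered_ps = elmore_delay_ps(R_OHM_PER_MM, C_FF_PER_MM, 20.0, segments=10)

print(f"2 mm unbuffered:      {short_ps:.0f} ps")
print(f"20 mm unbuffered:     {long_ps:.0f} ps")
print(f"20 mm in 10 segments: {buffered_ps:.0f} ps")
```

A 10X longer unbuffered wire is 100X slower in this model, while splitting it into ten repeated segments brings the growth back to linear. A passive interposer has no devices to provide those repeaters.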

Active interposers help. “With active interposers, you can overcome some of the long-distance problems by putting in buffers or signal repeaters,” says Swinnen. “Then it starts looking more like a chip again, and you can only do it on silicon. You have the EMIB technology from Intel, where they embedded a chiplet into the interposer and that’s an active bridge. The chip talks to the EMIB chip, and they both talk to you through this little active bridge chip, which is not exactly an active interposer, but acts almost like an active interposer.”

But even passive components add value. “The first thing that’s being done is including trench capacitors in the interposer,” says Mastroianni. “That gives you the ability to do some good decoupling, where it counts, close to the die. If you put them out on the board, you lose a lot of the benefits for the high-speed interfaces. If you can get them in the interposer, sitting right under where you have the fast-switching speed signals, you can get some localized decoupling.”

In addition to different materials, there is the question of who designs the interposer. “The industry seems to think of it as a little PCB in the context of who’s doing the design,” says Matt Commens, senior manager for product management at Ansys. “The interposers are typically being designed by packaging engineers, even though they are silicon processes. This is especially true for the high-performance ones. It seems counterintuitive, but they have that signal integrity background, they’ve been designing transmission lines and minimizing mismatch at interconnects. A traditional IC designer works from a component point of view. So certainly, the industry is telling us that the people they’re assigning to do that design work are packaging type of personas.”

Power
There are some considerable differences in routing between PCBs and interposers. “Interposer routing is much easier, as the number of components is drastically reduced compared to the PCB,” says Andy Heinig, head of department for efficient electronics at Fraunhofer IIS/EAS. “On the other hand, the power grid on the interposer is much more complex due to the higher resistance of the metal layers and the fact that the power grid is cut out by signal wires. The routing for the die-to-die interface is more complex due to the routing density.”

Power delivery looks very different. “If you look at a PCB, they put these big metal pour areas embedded in the layers, and they void out areas where things need to go through,” says Park. “You put down a bunch of copper and then you void out the others. We can’t build an interposer that way. We have to deposit the interconnect, so the power and ground structures on a silicon interposer will look more like a digital chip. But the signal will look more like a PCB or laminate package.”

Routing does look more like a PCB than a chip. “You’ll see things like teardrops or fillets where it makes a connection to a pad or via to create better yield,” adds Park. “The routing styles today are more aligned to PCBs than they are to a digital IC, where you just have 90° orthogonal corners and clean routing channels. For interposers, whether it’s silicon or organic, the via is often bigger than the wire, which is a classic PCB problem. The routing, if we’re talking about digital, is again more like a small PCB than a die.”

TSVs can create problems, too. “If you’re going to treat them as square, you’re losing a lot of space at the corners,” says Swinnen. “You really want 45° around those objects. Silicon routers are traditionally Manhattan, although there has been a long tradition of RDL routing, which is the top layer where the bumps are connected. That has traditionally used octagonal bumps or round bumps, and then 45° routing. It’s not as flexible as the PCB routing, but they have redistribution layer routers, and also they have some routers that come from the full custom side which have full river routing.”
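Swinnen's point about losing space at the corners can be roughed out geometrically by comparing the area blocked by a square keep-out around a round TSV with the area blocked by an octagonal keep-out, which is what 45° routing permits. The sketch below is purely illustrative: the radius is hypothetical, and real keep-out rules depend on the process design kit.

```python
import math

def keepout_areas_um2(radius_um: float) -> dict:
    """Blocked area for three keep-out shapes enclosing a circle of given radius."""
    return {
        "circle": math.pi * radius_um ** 2,
        "octagon": 8 * radius_um ** 2 * math.tan(math.pi / 8),  # 45-degree edges
        "square": (2 * radius_um) ** 2,                         # Manhattan edges
    }

areas = keepout_areas_um2(5.0)  # hypothetical 5 um keep-out radius
saved_fraction = 1 - areas["octagon"] / areas["square"]

for shape, area in areas.items():
    print(f"{shape:8s} {area:6.1f} um^2")
print(f"octagon frees {saved_fraction:.1%} of the square's area")
```

The octagon recovers roughly a sixth of the area a square would block, and that saving repeats at every TSV in the array.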

Related Reading
True 3D Is Much Tougher Than 2.5D
While terms often are used interchangeably, they are very different technologies with different challenges.
Thermal Integrity Challenges Grow In 2.5D
Work is underway to map heat flows in interposer-based designs, but there’s much more to be done.

The post 2.5D Integration: Big Chip Or Small PCB? appeared first on Semiconductor Engineering.