Achieving Zero Defect Manufacturing Part 2: Finding Defect Sources

Semiconductor manufacturing creates a wealth of data – from materials, products, factory subsystems and equipment. But how do we best utilize that information to optimize processes and reach the goal of zero defect manufacturing?

This is a topic we first explored in our previous blog, “Achieving Zero Defect Manufacturing Part 1: Detect & Classify.” In it, we examined real-time defect classification at the defect, die and wafer level. In this blog, the second in our three-part series, we will discuss how to use root cause analysis to determine the source of defects. For starters, we will address the software tools needed to properly conduct root cause analysis for a faster understanding of visual, non-visual and latent defect sources.

About software

The software platform fabs choose impacts how well users are able to integrate data, conduct database analytics and perform server-side and real-time analytics. Manufacturers want the ability to choose a platform that can scale by data volume, type and multisite integration. In addition, all of this data – whether it is coming from metrology, inspection or testing – must be normalized before fabs can apply predictive modeling and machine learning based analytics to find the root cause of defects and failures. This search, however, goes beyond a simple examination of process steps and tools; manufacturers also need a clear understanding of each device’s genealogy. In addition, fabs should employ an AI-based yield optimizer capable of running multiple models and offering potential optimization measures that can be taken in the factory to improve the process.

Now that we have discussed software needs, we will turn our attention to two use cases to further our examination of root cause analysis in zero defect manufacturing.

Root Cause Case No. 1

The first root cause use case we would like to discuss involves the integration of wafer probe, photoluminescence and epitaxial (epi) data. Previously, integrating these three kinds of data was not possible because the identifiers for wafers and lots, pre- and post-epi, were generally not linked. Wafers and lots were often identified by entirely different names before and after the epi step. This was a huge hindrance to advancing the goal of zero defect manufacturing because the impact of the epi process on yield was not detected in a timely manner, resulting in higher defectivity and yield loss.

But the challenge is not as simple as identification and naming practices. Typical wafer ID trackers are not applied prior to the post-epi step because of technical and logistical constraints. The solution is for fabs to employ defect and yield analytics software that will enable genealogy that can link data from the epi and pre-epi processes to post-epi processes. The real innovation occurs when the genealogical information is normalized and interpolated with electrical test data. Once integrated, this data offers users a more complete understanding of where yield limiting events are occurring.
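
To make the genealogy idea concrete, here is a minimal Python sketch, not the actual implementation of any yield analytics platform. It assumes a hypothetical lookup table that maps each pre-epi substrate ID to the wafer ID assigned after epi, and uses that link to roll wafer probe yield up by epi tool and chamber. Every ID, field name and yield value below is invented for illustration.

```python
from collections import defaultdict


def yield_by_epi_chamber(epi_runs, genealogy, probe_yield):
    """Average post-epi probe yield per (epi tool, chamber).

    epi_runs:    list of dicts with keys "pre_epi_id", "tool", "chamber"
    genealogy:   dict mapping pre-epi ID -> post-epi wafer ID (hypothetical link table)
    probe_yield: dict mapping post-epi wafer ID -> yield in percent
    """
    grouped = defaultdict(list)
    for run in epi_runs:
        post_id = genealogy.get(run["pre_epi_id"])
        if post_id is None or post_id not in probe_yield:
            # Without a genealogy link the epi run cannot be tied to test data.
            continue
        grouped[(run["tool"], run["chamber"])].append(probe_yield[post_id])
    return {key: sum(values) / len(values) for key, values in grouped.items()}


# Illustrative data only.
epi_runs = [
    {"pre_epi_id": "S-01", "tool": "EPI1", "chamber": "A"},
    {"pre_epi_id": "S-02", "tool": "EPI1", "chamber": "A"},
    {"pre_epi_id": "S-03", "tool": "EPI2", "chamber": "B"},
    {"pre_epi_id": "S-04", "tool": "EPI2", "chamber": "B"},  # no link recorded
]
genealogy = {"S-01": "W-101", "S-02": "W-102", "S-03": "W-103"}
probe_yield = {"W-101": 90, "W-102": 80, "W-103": 50}

summary = yield_by_epi_chamber(epi_runs, genealogy, probe_yield)
```

In this toy data the low yield lines up with one tool and chamber, which is the kind of signal the linked data is meant to expose. The unlinked run is simply skipped, mirroring how unlinked wafers drop out of the analysis.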

Fig. 1: Photoluminescence map (left) and electrical test performance by epi tool (right).

For example, let us consider the following scenario: in figure 1 (left) we show a group of dies that negatively affect performance on the upper left edge of the wafer. Through more traditional measures, this pocket of defectivity may have gone unnoticed, allowing for bad die to move forward in the process. But by applying integrated data, genealogical information and electrical test data, this trouble-plagued area was identified down to the epi tool and chamber (figure 1, right), and the defective material was prevented from going forward in the process. As significant as this is, with the right software platform this approach enables root cause analysis to be conducted in minutes, not days.

Now, onto the second use case in which we look at how to problem solve within the supply chain.

Root Cause Case No. 2

During final test and measurement, chips sometimes fail. In many cases, the faulty chips were previously determined to be good chips and were advanced forward in the process as a result of combining multiple chips coming from different products, lots, or wafers. The important thing here is to understand why this happens.

When there is a genealogy model in a yield software platform, fabs are able to pick the lots and wafers where bad chips come from and then run this information through pattern analysis software. In one particular scenario (figure 2), users were able to apply pattern analysis software to discover that all of the defective die arose from a spin coater issue, in this case, a leak negatively impacting the underbump metallization area following typical preventive maintenance measures.

To compensate for this, the team used integrated analytics to create a fault detection and classification (FDC) model to identify similar circumstances going forward. In this case, the FDC model monitors the suction power of the spin coater. If suction power is above the set limit for more than 10 consecutive samples, alarms are triggered and an appropriate Out of Control Action Plan (OCAP) process is executed that includes notification of the tool owner.
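
The consecutive-sample rule described above can be sketched in a few lines of Python. This is an illustration of the rule only, not the FDC model the team built; the function name, the sample values and the limit are assumptions made for the example.

```python
def first_alarm_index(samples, limit, run_length=10):
    """Return the index of the sample that triggers the alarm, or None.

    The alarm fires once MORE than `run_length` consecutive samples
    exceed `limit`, so with the default it fires on the 11th in a row.
    """
    consecutive = 0
    for index, value in enumerate(samples):
        if value > limit:
            consecutive += 1
        else:
            consecutive = 0  # any in-limit sample resets the run
        if consecutive > run_length:
            return index
    return None


# Hypothetical suction power readings (arbitrary units) and limit.
LIMIT = 4.0
steady_excursion = [1.0] * 3 + [5.0] * 11
interrupted = [5.0] * 10 + [1.0] + [5.0] * 10

alarm_at = first_alarm_index(steady_excursion, LIMIT)
no_alarm = first_alarm_index(interrupted, LIMIT)
```

Requiring a run of samples rather than a single excursion is a common way to avoid alarming on one noisy reading. In a real deployment the returned index would be where the OCAP notification is raised.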

Fig. 2: Proactive zero defect manufacturing at-a-glance.

The above explains how fabs are able to turn reactive root cause analytics into proactive monitoring. With such an approach, manufacturers can monitor for this and other issues and avoid the advancement of future defective die. Furthermore, the number of defect signatures that can be monitored inline can be as high as 40 different signatures, if not more. And in case these defects are missed at the process level, they can be identified at the inspection level or post-inspection, avoiding hundreds of issues further along in the process.

Conclusion

Zero defect manufacturing is not so much a goal as it is a commitment to root out defects before they happen. To accomplish this, fabs need a wealth of data from the entire process to achieve a clear picture of what is going wrong, where it is going wrong and why it is going wrong. In this blog, we offered specific scenarios where root cause analysis was used to find defects across wafers and dies. However, these are just a few examples of how software can be used to find hard-to-detect defects. It can be beneficial in many different areas across the entire process, with each application further strengthening a fab’s efforts to employ a zero defect manufacturing approach, increasing yield and meeting the stringent requirements of some of the industry’s most advanced customers.

In our next blog, we will discuss how to detect dormant defects, use feedback and feedforward measures, and monitor the health of process control equipment. We hope you join us as we continue to explore methods for achieving zero defect manufacturing.

The post Achieving Zero Defect Manufacturing Part 2: Finding Defect Sources appeared first on Semiconductor Engineering.

Ensure Reliability In Automotive ICs By Reducing Thermal Effects

By: Lee Wang

In the relentless pursuit of performance and miniaturization, the semiconductor industry has increasingly turned to 3D integrated circuits (3D-ICs) as a cutting-edge solution. Stacking dies in a 3D assembly offers numerous benefits, including enhanced performance, reduced power consumption, and more efficient use of space. However, this advanced technology also introduces significant thermal dissipation challenges that can impact the electrical behavior, reliability, performance, and lifespan of the chips (figure 1). For automotive applications, where safety and reliability are paramount, managing these thermal effects is of utmost importance.

Fig. 1: Illustration of a 3D-IC with heat dissipation.

3D-ICs have become particularly attractive for safety-critical devices like automotive sensors. Advanced driver-assistance systems (ADAS) and autonomous vehicles (AVs) rely on these compact, high-performance chips to process vast amounts of sensor data in real time. Effective thermal management in these devices is a top priority to ensure that they function reliably under various operating conditions.

The thermal challenges of 3D-ICs in automotive applications

The stacked configuration of 3D-ICs inherently leads to complex thermal dynamics. In traditional 2D designs, heat dissipation occurs across a single plane, making it relatively straightforward to manage. However, in 3D-ICs, multiple active layers generate heat, creating significant thermal gradients and hotspots. These thermal issues can adversely affect device performance and reliability, which is particularly critical in automotive applications where components must operate reliably under extreme temperatures and harsh conditions.

These thermal effects in automotive 3D-ICs can impact the electrical behavior of the circuits, causing timing errors, increased leakage currents, and potential device failure. Therefore, accurate and comprehensive thermal analysis throughout the design flow is essential to ensure the reliability and performance of automotive ICs.

The importance of early and continuous thermal analysis

Traditionally, thermal analysis has been performed at the package and system levels, often as a separate process from IC design. However, with the advent of 3D-ICs, this approach is no longer sufficient.

To address the thermal challenges of 3D-ICs for automotive applications, it is crucial to incorporate die-level thermal analysis early in the design process and continue it throughout the design flow (figure 2). Early-stage thermal analysis can help identify potential hotspots and thermal bottlenecks before they become critical issues, enabling designers to make informed decisions about chiplet placement, power distribution, and cooling strategies. These early decisions reduce the risks of thermal-induced failures, improving the reliability of 3D automotive ICs.

Fig. 2: Die-level detailed thermal analysis using accurate package and boundary conditions should be fully integrated into the ASIC design flow to allow for fast thermal exploration.

Early package design, floorplanning and thermal feasibility analysis

During the initial package design and floorplanning stage, designers can use high-level power estimates and simplified models to perform thermal feasibility studies. These early analyses help identify configurations that are likely to cause thermal problems, allowing designers to rule out problematic designs before investing significant time and resources in detailed implementation.

Fig. 3: Thermal analysis as part of the package design, floorplanning and implementation flows.

For example, thermal analysis can reveal issues such as overlapping heat sources in stacked dies or insufficient cooling paths. By identifying these problems early, designers can explore alternative floorplans and adjust power distribution to mitigate thermal risks. This proactive approach reduces the likelihood of encountering critical thermal issues late in the design process, thereby shortening the overall design cycle.

Iterative thermal analysis throughout design refinement

As the design progresses and more detailed information becomes available, thermal analysis should be performed iteratively to refine the thermal model and verify that the design remains within acceptable thermal limits. At each stage of design refinement, additional details such as power maps, layout geometries and their material properties can be incorporated into the thermal model to improve accuracy.

This iterative approach lets designers continuously monitor and address thermal issues, ensuring that the design evolves in a thermally aware manner. By integrating thermal analysis with other design verification tasks, such as timing and power analysis, designers can achieve a holistic view of the design’s performance and reliability.

A robust thermal analysis tool should support various stages of the design process, providing value from initial concept to final signoff:

  1. Early design planning: At the conceptual stage, designers can apply high-level power estimates to explore the thermal impact of different design options. This includes decisions related to 3D partitioning, die assembly, block and TSV floorplan, interface layer design, and package selection. By identifying potential thermal issues early, designers can make informed decisions that avoid costly redesigns later.
  2. Detailed design and implementation: As designs become more detailed, thermal analysis should be used to verify that the design stays within its thermal budget. This involves analyzing the maturing package and die layout representations to account for their impact on thermally sensitive electrical circuits. Fine-grained power maps are crucial at this stage to capture hotspot effects accurately.
  3. Design signoff: Before finalizing the design, it is essential to perform comprehensive thermal verification. This ensures that the design meets all thermal constraints and reliability requirements. Automated constraints checking and detailed reporting can expedite this process, providing designers with clear insights into any remaining thermal issues.
  4. Connection to package-system analysis: Models from IC-level thermal analysis can be used in thermal analysis of the package and system. The integration lets designers build a streamlined flow through the entire development process of a 3D electronic product.

Tools and techniques for accurate thermal analysis

To effectively manage thermal challenges in automotive ICs, designers need advanced tools and techniques that can provide accurate and fast thermal analysis throughout the design flow. Modern thermal analysis tools are equipped with capabilities to handle the complexity of 3D-IC designs, from early feasibility studies to final signoff.

High-fidelity thermal models

Accurate thermal analysis requires high-fidelity thermal models that capture the intricate details of the 3D-IC assembly. These models should account for non-uniform material properties, fine-grained power distributions, and the thermal impact of through-silicon vias (TSVs) and other 3D features. Advanced tools can generate detailed thermal models based on the actual design geometries, providing a realistic representation of heat flow and temperature distribution.

For instance, tools like Calibre 3DThermal embed an optimized custom 3D solver from Simcenter Flotherm to perform precise thermal analysis down to the nanometer scale. By leveraging detailed layer information and accurate boundary conditions, these tools can produce reliable thermal models that reflect the true thermal behavior of the design.

Automation and results viewing

Automation is a key feature of modern thermal analysis tools, enabling designers to perform complex analyses without requiring deep expertise in thermal engineering. An effective thermal analysis tool must offer advanced automation to facilitate use by non-experts. Key automation features include:

  1. Optimized gridding: Automatically applying finer grids in critical areas of the model to ensure high resolution where needed, while using coarser grids elsewhere for efficiency.
  2. Time step automation: In transient analysis, smaller time steps can be automatically generated during power transitions to capture key impacts accurately.
  3. Equivalent thermal properties: Automatically reducing model complexity while maintaining accuracy by applying different bin sizes for critical (hotspot) vs non-critical regions when generating equivalent thermal properties.
  4. Power map compression: Using adaptive bin sizes to compress very large power maps to improve tool performance.
  5. Automated reporting: Generating summary reports that highlight key results for easy review and decision-making (figure 4).
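
As an illustration of the time step automation item in the list above, the following Python sketch builds a transient-analysis time grid that is coarse overall and fine around power transitions. It is a simplified stand-in and not how any particular thermal tool works; the function name, the integer-microsecond units and the fixed refinement window are all assumptions.

```python
def build_time_points(t_end_us, transitions_us, coarse_us, fine_us, window_us):
    """Return sorted time points (integer microseconds) from 0 to t_end_us.

    A coarse step is used everywhere; within +/- window_us of each power
    transition a fine step is added so the transient is resolved there.
    """
    points = set(range(0, t_end_us + 1, coarse_us))
    points.add(t_end_us)  # always include the final time
    for t in transitions_us:
        start = max(0, t - window_us)
        stop = min(t_end_us, t + window_us)
        points.update(range(start, stop + 1, fine_us))
        points.add(t)  # land exactly on the transition itself
    return sorted(points)


# Hypothetical 100 us simulation with one power transition at 50 us.
grid = build_time_points(
    t_end_us=100, transitions_us=[50], coarse_us=25, fine_us=5, window_us=10
)
```

Integer time units are used here only to keep the grid free of floating point rounding. A production solver would more likely adapt the step from an error estimate than from a fixed window.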

Fig. 4: Ways to view thermal analysis results.

Automated thermal analysis tools can also integrate seamlessly with other design verification and implementation tools, providing a unified environment for managing thermal, electrical, and mechanical constraints. This integration ensures that thermal considerations are consistently addressed throughout the design flow, from initial feasibility analysis to final tape-out and even connecting with package-level analysis tools.

Real-world application

The practical benefits of integrated thermal analysis solutions are evident in real-world applications. For instance, a leading research organization, CEA, utilized an advanced thermal analysis tool from Siemens EDA to study the thermal performance of their 3DNoC demonstrator. The high-fidelity thermal model they developed showed a worst-case difference of just 3.75% and an average difference within 2% between simulation and measured data, demonstrating the accuracy and reliability of the tool (figure 5).

Fig. 5: Correlation of simulation versus measured results.

The path forward for automotive 3D-IC thermal management

As the automotive industry continues to embrace advanced technologies, the importance of accurate thermal analysis throughout the design flow of 3D-ICs cannot be overstated. By incorporating thermal analysis early in the design process and iteratively refining thermal models, designers can mitigate thermal risks, reduce design time, and enhance chip reliability.

Advanced thermal analysis tools that integrate seamlessly with the broader design environment are essential for achieving these goals. These tools enable designers to perform high-fidelity thermal analysis, automate complex tasks, and ensure that thermal considerations are addressed consistently from package design, through implementation to signoff.

By embracing these practices, designers can unlock the full potential of 3D-IC technology, delivering innovative, high-performance devices that meet the demands of today’s increasingly complex automotive applications.

For more information about die-level 3D-IC thermal analysis, read Conquer 3DIC thermal impacts with Calibre 3DThermal.

The post Ensure Reliability In Automotive ICs By Reducing Thermal Effects appeared first on Semiconductor Engineering.

How do you keep music synced to action, especially in an in-engine cutscene or something that is highly music dependent like Necrodancer, B.P.M., etc.? It seems like it would be really easy for a brief hitch to completely throw off the music timing.

If keeping the beat is the most important thing in the game, then you build the game around keeping the beat. There are many different ways to approach the problem, but if I were building such a system myself, I would start with a system to handle a data-driven beat (e.g. this level/song sets the beat to X, that level/song sets the beat to Y) and then build all of my visuals and gameplay on top of that. The key component to making this work would likely be an animation system that could scale animations faster or slower in order to match the timing of the music.

On the data side, this would mean all animations would be built so they could be sped up by dropping frames, or slowed down by holding certain frames for additional length. All animations would also need to be the same length (or a multiple of a standard length), so that I can ensure the animations will fit into a standard musical measure. If I wanted to have a variety of attack animations and hit reactions, I would probably also establish a set of rules that each attacking and hit reaction animation must always be the same number of frames. I could further standardize each attack and hit reaction active frame happening on the same frame each time.

The code side would then play my animations to those musical measures along the beat. It would do so by scaling the animations longer or shorter based on the beat. The system could add or cut animation frames so that each animation can play in sync with the music. Once I've got the animation system integrated with the beat-keeping system, I can then ensure each animation should start playing on the appropriate frame to keep the beat. As long as the animations are scaled to the musical measure and the music keeps the same beat for its entire duration, the animations should always sync to the beat of the music.
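
Here is a rough Python sketch of the timing math behind that idea. It is one possible way to do it, not a description of how any shipped game works. The function names are made up, and driving the frame index from the song's playback position (rather than from accumulated frame times) is my own assumption about how to survive a brief hitch.

```python
def playback_rate(anim_frames, anim_fps, bpm, beats_per_animation):
    """Speed multiplier that makes the animation last exactly the given beats.

    A value above 1.0 means frames must be dropped (play faster);
    below 1.0 means frames must be held (play slower).
    """
    natural_seconds = anim_frames / anim_fps
    target_seconds = beats_per_animation * 60.0 / bpm
    return natural_seconds / target_seconds


def frame_at(song_time, start_beat, bpm, anim_frames, beats_per_animation):
    """Animation frame to show, derived from the music clock.

    Because the frame comes from the song position, a hitch skips
    frames instead of letting the animation drift off the beat.
    """
    current_beat = song_time * bpm / 60.0
    progress = (current_beat - start_beat) / beats_per_animation
    progress = min(max(progress, 0.0), 1.0)  # clamp before start / after end
    return min(int(progress * anim_frames), anim_frames - 1)


# A 60-frame, 30 fps attack squeezed into 2 beats at 120 BPM plays at 2x.
rate = playback_rate(anim_frames=60, anim_fps=30, bpm=120, beats_per_animation=2)
# Half a second into the song at 120 BPM is beat 1, halfway through a 2-beat animation.
frame = frame_at(song_time=0.5, start_beat=0, bpm=120, anim_frames=30, beats_per_animation=2)
```

If every animation is authored to a standard length, as described above, the rate only changes when the song's BPM changes, which keeps the data side simple.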

[Join us on Discord] and/or [Support us on Patreon]

Got a burning question you want answered?

ML Method To Predict IR Drop Levels

A new technical paper titled “IR drop Prediction Based on Machine Learning and Pattern Reduction” was published by researchers at National Tsing Hua University, National Taiwan University of Science and Technology, and MediaTek.

Abstract (partial)
“In this paper, we propose a machine learning-based method to predict IR drop levels and present an algorithm for reducing simulation patterns, which could reduce the time and computing resources required for IR drop analysis within the ECO flow. Experimental results show that our approach can reduce the number of patterns by approximately 50%, thereby decreasing the analysis time while maintaining accuracy.”

Find the technical paper here. Published June 2024.

Yong-Fong Chang, Yung-Chih Chen, Yu-Chen Cheng, Shu-Hong Lin, Che-Hsu Lin, Chun-Yuan Chen, Yu-Hsuan Chen, Yu-Che Lee, Jia-Wei Lin, Hsun-Wei Pao, Shih-Chieh Chang, Yi-Ting Li, and Chun-Yao Wang. 2024. IR drop Prediction Based on Machine Learning and Pattern Reduction. In Proceedings of the Great Lakes Symposium on VLSI 2024 (GLSVLSI ’24). Association for Computing Machinery, New York, NY, USA, 516–519. https://doi.org/10.1145/3649476.3658775

The post ML Method To Predict IR Drop Levels appeared first on Semiconductor Engineering.

Vertical Slice Breakdown - Dragon Age Veilguard

It’s been a few days since the Dragon Age Veilguard gameplay video was released. I posted a challenge for aspiring developers to identify as many specific features and systems as they could spot. My expertise is in gameplay, so that’s where I will be focusing. Expertise on visuals like lighting, rendering, shaders, etc. should be directed elsewhere.

0:22 - In-Game Cinematic with moving cameras
0:30 - Seamless cinematic transfer to gameplay, quest tracking UI element, different walking speeds
0:36 - Interactable element with UI
0:43 - Camera movement - orbital motion, but likely not detachable
0:53 - Party member movement, including waiting for the player as part of an escort sequence
2:08 - Uninteractable NPC actors perform animations
2:13 - Scriptable terrain changes/destruction
2:18 - Scriptable interactions with multiple actors
2:29 - Uninterrupted conversations when transitioning from gameplay to in-game cinematic
2:39 - Context-specific traversal method with special traversal animation (balancing across a thin beam)
2:50 - Small sequence that is likely unloading the last area and loading in data for the next environment. Likely also locks players off from returning to the previous area.
3:22 - Conversation wheel with “personality” icons and paraphrased words
3:39 - Dynamic inventory in game cinematics, show player’s items
3:46 - Scripted Player equipment change during cinematic
4:04 - Quest variables (e.g. player background) result in different NPC response
4:27 - Combat UI including current target (four red dots), Combat log
4:30 - Player can jump
4:33 - UI Melee danger indicator for incoming attacks - silver for enemy attacking, gold for shortly impending damage
4:35 - Player can dash/dodge
4:39 - Event log - Items/Loot notification
4:42 - Shooting UI including hit/miss indicator (red reticle), time scaling, arrow charging (rounded purple bar above arrow count), arrow refill cooldown
5:03 - Some kind of special charge/jumping attack
5:09 - XP gain UI, Quest objective completion UI, Quest objective map indicator UI
5:12 - Auto sheath weapons
5:15 - Potion use/Health recovery
5:18 - Recover potions from the environment
5:40 - Quest objective indicator change on approach
5:49 - Ranged attack danger indicator
5:51 - Defensive action (player reflects damage back on ranged attacker)
6:06 - Enemies can be knocked off edges when fatal
6:10 - Destructible objects in combat, can be scripted
6:16 - Some kind of “special” dodge skill with VFX, likely a rogue class skill
6:51 - Second context-specific traversal method (sliding down a slope), also likely a second “can’t go back” type of lockoff
7:01 - Action/Command UI (party/self ability commands)
7:06 - Specific skill used, skill cooldown, enemy debuffed + UI (weakened), resource used (purple bar at bottom of screen)
7:07 - Quick use button mapping, likely for controller face buttons
7:09 - Resource bar refills on its own and on attack damage
10:47 - Different kinds of health bars (likely magical shield and armor)
11:59 - Boss UI with both magical shield and armor bars. Not sure what the number 4 there indicates
12:15 - Telegraphed danger zones projected onto the floor
12:22 - Quick recover timing event
14:45 - Conversation option for branching cinematic
14:51 - Follower approval UI event log
18:49 - Destructible object with health bar and UI highlighting

Each of these elements is something that would need to be designed and implemented by someone on the gameplay team working with UI, engineering, and art. See anything I missed? Which did you get?

Looks like they released the Dragon Age: Veilguard gameplay reveal today. Aspiring developers,…

Looks like they released the Dragon Age: Veilguard gameplay reveal today. Aspiring developers, here’s a homework assignment. Treat this as a video of a vertical slice. Make a list of the individual features and systems you observe working from the video. I’ll work my own list and post it on Friday. If you’re feeling confident, tag me and we’ll compare lists!

If you need an idea for what such a list might look like, read my [Vertical Slice Glossary Entry].

Study: Three skulls of medieval Viking women were deliberately elongated

Artificially modified skull from a female Viking individual in Havor, Hablingbo parish, Gotland. (credit: © SHM/Johnny Karlsson 2008-11-05/CC BY 2.5 SE)

German archaeologists discovered that the skulls of three medieval Viking women found on the Swedish island of Gotland in the Baltic Sea showed evidence of an unusual procedure to elongate their skulls. The process gave them an unusual and distinctive appearance, according to a paper published in the journal Current Swedish Archaeology. Along with evidence that the Viking men from the island may have deliberately filed their teeth, the discovery sheds light on the role body modification may have played in Viking culture.

When people hear about Viking body modification, they probably think of Viking tattoos, particularly since the History Channel series Vikings popularized that notion. But whether actual Vikings sported tattoos is a matter of considerable debate. There is no mention of tattoos in the few Norse sagas and poetry that have survived, although other unusual physical characteristics are often mentioned, such as scars.

The only real evidence comes from a 10th-century Arab traveler and trader named Ahmad Ibn Fadlan, whose travel account, Mission to the Volga, describes the Swedish Viking traders ("Rusiyyah") he met in the Middle Volga region of Russia. "They are dark from the tips of their toes right up to their necks—trees, pictures, and the like," Ibn Fadlan wrote. But the precise Arabic translation is unclear, and there is no hard archaeological evidence, since human skin typically doesn't preserve for centuries after a Viking burial.

AI For Data Management

Data management is becoming a significant new challenge for the chip industry, as well as a brand new opportunity, as the amount of data collected at every step of design through manufacturing continues to grow.

Exacerbating the problem is the rising complexity of designs, many of which are highly customized and domain-specific at the leading edge, as well as increasing demands for reliability and traceability. There also is a growing focus on chiplets developed using different processes, including some from different foundries, and new materials such as glass substrates and ruthenium interconnects. On the design side, EDA and verification tools can generate terabytes of data on a weekly or even a daily basis, unlike in the past when this was largely done on a per-project basis.

While more data can be used to provide insights into processes and enable better designs, it’s an ongoing challenge to manage the current volumes being generated. The entire industry must rethink some well-proven methodologies and processes, as well as invest in a variety of new tools and approaches. At the same time, these changes are generating concern in an industry used to proceeding cautiously, one step at a time, based on silicon- and field-proven strategies. Increasingly, AI/ML is being added into design tools to identify anomalies and patterns in large data sets, and many of those tools are being regularly updated as algorithms are updated and new features are added, making it difficult to know exactly when and where to invest, which data to focus on, and with whom to share it.

“Every company has its own design flow, and almost every company has its own methodology around harvesting that data, or best practices about what reports should or should not be written out at what point,” said Rob Knoth, product management director in Cadence’s Digital & Signoff group. “There’s a death by 1,000 cuts that can happen in terms of just generating titanic volumes of data because, in general, disk space is cheap. People don’t think about it a lot, and they’ll just keep generating reports. The problem is that just because you’re generating reports doesn’t mean you’re using them.”

Fig. 1: Rising design complexity is driving increased need for data management. Source: IEEE Rising Stars 2022/Cadence

As with any problem in chip design, there is opportunity in figuring out a path forward. “You can always just not use the data, and then you’re back where you started,” said Tony Chan Carusone, CTO at Alphawave Semi. “The reason it becomes a problem for organizations is because they haven’t architected things from the beginning to be scalable, and therefore, to be able to handle all this data. Now, there’s an opportunity to leverage data, and it’s a different way. So it’s disruptive because you have to tear things apart, from re-architecting systems and processes to how you collect and store data, and organize it in order to take advantage of the opportunity.”

Buckets of data, buckets of problems
The challenges that come with this influx of data can be divided into three buckets, said Jim Schultz, senior staff product manager at Synopsys. The first is figuring out what information is actually critical to keep. “If you make a run, designers tend to save that run because if they need to do a follow up run, they have some data there and they may go, ‘Okay, well, what’s the runtime? How long did that run take, because my manager is going to ask me what I think the runtime is going to be on the next project or the next iteration of the block.’ While that data may not be necessary, designers and engineers have a tendency to hang onto it anyway, just in case.”

The second challenge is that once the data starts to pour in, it doesn’t stop, raising questions about how to manage collection. And third, once the data is collected, how can it be put to best use?

“Data analytics have been around, with other types of companies exploring different types of data analytics, but the difference is that those can be very generic solutions,” said Schultz. “What we need for our industry is going to be very specific data analytics. If I have a timing issue, I want you to help me pinpoint what the cause of that timing violation is. That’s very specific to what we do in EDA. When we talk about who is cutting through the noise, we don’t want data that’s just presented. We want the data that is what the designer most cares about.”

Data security
The sheer number of tools being used and companies and people involved along the design pathway raises another challenge — security.

“There’s a lot of thought and investment going into the security aspect of data, and just as much as the problem of what data to save and store is the type of security we have to have without hindering the user day-to-day,” said Simon Rance, director of product management at Keysight. “That’s becoming a bigger challenge. Things like the CHIPS Act and the geopolitical scenarios we have at the moment are compounding that problem because a lot of the companies that used to create all these devices by themselves are having to collaborate, even with companies in different regions of the globe.”

This requires a balancing act. “It’s almost like a recording studio where you have all these knobs and dials to fine tune it, to make sure we have security of the data,” said Rance. “But we’re also able to get the job done as smoothly and as easily as we can.”

Further complicating the security aspect is that designing chips is not a one-man job. As leading-edge chips become increasingly complex and heterogeneous, they can involve hundreds of people in multiple companies.

“An important thing to consider when you’re talking about big data and analytics is what you’re going to share and with whom you’re going to share it,” said Synopsys’ Schultz. “In particular, when you start bringing in and linking data from different sources, if you start bringing in data related to silicon performance, you don’t want everybody to have access to that data. So the whole security protocol is important.”

Even the mundane matters — having a ton of data makes it likely, at some point, that data will be moved.

“The more places the data has to be transferred to, the more delays,” said Rance. “The bigger the data set, the longer it takes to go from A to B. For example, a design team in the U.S. may be designing during the day. Then, another team in Singapore or Japan will pick up on that design in their time zone, but they’re across the world. So you’re going to have to sync the data back and forth between these kinds of design sites. The bigger the data, the harder to sync.”

Solutions
The first step toward solving the issue of too much data is figuring out what data is actually needed. Rance said his team has found success using smart algorithms that help figure out which data is essential, which in turn can help optimize storage and transfer times.

There are less-technical problems that can rear their heads, as well. Gina Jacobs, head of global communications and brand marketing at Arteris, said that engineers who use a set methodology, particularly those who are used to working on a problem by themselves and “brute forcing” a solution, also can find themselves overwhelmed by data.

“Engineers and designers can also switch jobs, taking with them institutional knowledge,” Jacobs said. “But all three problems can be solved with a single solution — having data stored in a standardized way that is easily accessible and sortable. It’s about taking data and requirements and specifications in different forms and then having it in the one place so that the different teams have access to it, and then being able to make changes so there is a single source of truth.”

Here, EDA design and data management tools are increasingly relying on artificial intelligence to help. Schultz forecasted a future where generative AI will touch every facet of chip development. “Along with that is the advanced data analytics that is able to mine all of that data you’ve been collecting, instead of going beyond the simple things that people have been doing, like predicting how long runtime is going to be or getting an idea what the performance is going to be,” he said. “Tools are going to be able to deal with all of that data and recognize trends much faster.”

Those all-encompassing AI tools, capable of complex analysis, are still years away. Cadence’s Knoth said he’s already encountered clients that are reluctant to bring AI into the mix due to fears over the costs involved in disk space, compute resources, and licenses. Others, however, have been a bit more open-minded.

“Initially, AI can use a lot of processors to generate a lot of data because it’s doing a lot of things in parallel when it’s doing the inferencing, but it usually gets to the result faster and more predictably,” he said. So while a machine learning algorithm may generate even more vast amounts of data, on top of the piles currently available, “a good machine learning algorithm could be watching and smartly killing or restarting jobs where needed.”

As for the humans who are still an essential component to chip design, Alphawave’s Carusone said hardware engineers should take a page from lessons learned years ago from their counterparts in the software development world.

These include:

  • Having an organized and automated way to collect data, file it in a repository, and not do anything manually;
  • Developing ways to run verification and lab testing and everything in between in parallel, but with the data organized in a way that can be mined; and
  • Creating methods for rigorously checking in and out of different test cases that you want to consider.

“The big thing is you’ve got all this data collected, but then what is each of those files, each of those collections of data?” said Carusone. “What does that correspond to? What test conditions was that collected in? The software community dealt with that a while ago, and the hardware community also needs to have this under its belt, taking it to the next level and recognizing we really need to be able to do this en masse. We need to be able to have dozens of people work in parallel, collecting data and have it all on there. We can test a big collection of our designs in the lab without anyone having to touch a thing, and then also try refinements of the firmware, scale them out, then have all the data come in and be analyzed. Being able to have all that done in an automated way lets you track down and fix problems a lot more quickly.”
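
One way to keep track of what each collected file corresponds to is to record its test conditions in a manifest at the moment of collection. The following is a minimal sketch under stated assumptions, not a description of any vendor's flow: the function names, the manifest fields, and the example conditions are all hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def manifest_entry(data: bytes, design: str, test_case: str, conditions: dict) -> dict:
    """Describe one collected data file so it can be mined later (hypothetical schema)."""
    return {
        "sha256": hashlib.sha256(data).hexdigest(),  # identifies the exact file contents
        "design": design,
        "test_case": test_case,
        "conditions": dict(conditions),  # e.g. temperature, firmware build
        "collected_at": datetime.now(timezone.utc).isoformat(),
    }


def find_entries(manifest: list, **wanted) -> list:
    """Return entries whose recorded conditions match every requested key/value."""
    return [
        entry
        for entry in manifest
        if all(entry["conditions"].get(key) == value for key, value in wanted.items())
    ]


# Two made-up lab captures of the same test case under different conditions.
manifest = [
    manifest_entry(b"eye-diagram-run-1", "serdes_a", "eye_scan", {"temp_c": 25, "firmware": "fw1"}),
    manifest_entry(b"eye-diagram-run-2", "serdes_a", "eye_scan", {"temp_c": 85, "firmware": "fw1"}),
]
hot_runs = find_entries(manifest, temp_c=85)
```

Because every entry carries its own conditions, a later query such as "all runs at 85 degrees with this firmware" needs no manual bookkeeping.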

Conclusion
The influx of new tools used to analyze and test chip designs has increased productivity, but those designs come with additional considerations. Institutions and individual engineers and designers have never had access to so much data, but that data is of limited value if it’s not used effectively.

Strategies to properly store and order that data are essential. Some powerful tools are already in place to help do that, and the AI revolution promises to make even more powerful resources available to quickly cut down on the time needed to run tests and analyze the results.

For now, handling all that data remains a tricky balance, according to Cadence’s Knoth. “If this was an easy problem, it wouldn’t be a problem. Being able to communicate effectively, hierarchically — not just from a people management perspective, but also hierarchically from a chip and project management perspective — is difficult. The teams that do this well invest resources into that process, specifically the communication of top-down tightening of budgets or top-down floorplan constraints. These are important to think about because every engineer is looking at chip-level timing reports, but the problem that they’re trying to solve might not ever be visible. But if they have a report that says, ‘Here is your view of what your problems are to solve,’ you can make some very effective work.”

Further Reading
EDA Pushes Deeper Into AI
AI is both evolutionary and revolutionary, making it difficult to assess where and how it will be used, and what problems may crop up.
Optimizing EDA Cloud Hardware And Workloads
Algorithms written for GPUs can slice simulation time from weeks to hours, but not everything is optimized or benefits equally.

The post AI For Data Management appeared first on Semiconductor Engineering.

Chip Aging Becoming Key Factor In Data Center Economics

Chip aging is becoming a much bigger concern inside of data centers, where it can impact server uptime, utilization rates, and the amount of energy needed to drive signals and cool entire server racks.

Aging in chips is the result of both higher logic utilization and increasing transistor density. This is problematic for data centers, in general, but especially for AI chips where digital logic is expected to run at maximum speed. That generates more heat, which becomes harder to dissipate as the number of specialized and general-purpose processing elements per square millimeter of silicon continues to rise. Heat typically gets trapped between the fins of finFETs and gate-all-around FETs, accelerating electromigration and reducing the time it takes for dielectrics to break down. It also can cause warpage, which can rupture the bonds and contacts between different components in an advanced package or on a PCB.

For data centers, that creates a number of challenges:

  • Thermal management: This requires a deep understanding of workloads and the resulting transient thermal gradients as processing is load-balanced on-chip, between chips or chiplets, and between servers;
  • More data: Data from sensors everywhere, along with larger training sets, all need to be processed faster than in the past to keep up with the flood of data, but all of that needs to happen in the same or smaller footprint without overheating any part of a device; and
  • In-circuit monitoring: Sensors can be added into chips to detect variations in heat and data speeds in different paths, but it’s much more difficult to keep track of tens of thousands of these monitors as they collect data from heterogeneous processing elements, each of which can age at different rates depending on process variation, defectivity, varying workloads, and ambient thermal conditions.

“Servers are much more capable today than they were 10 years ago, and the issue is that power hasn’t scaled like it used to,” said Steven Woo, Rambus fellow and distinguished inventor. “Now, if you want to do lots more work in your server, you have to burn more power to do it. Twenty years ago, a server might dissipate a couple hundred watts. But with the latest servers that NVIDIA just announced around Grace Blackwell, the whole rack is 120 kilowatts, and the individual servers are many kilowatts. Just delivering power into those racks is causing changes in the infrastructure in the industry. Now that you have to bring in and dissipate more power in a small space, you get all kinds of interesting things that could happen over time. The heat that’s being dissipated can have effects on the chip, and you have to worry sometimes about thermal cycling where, as the chip is doing a lot of work, maybe part of the chip stops and then it does more work. You get these rapid cycles of dissipating a lot of power, then not, then dissipating a lot of power, then not. That cycling causes local heating and cooling, leading to thermal stresses, and this impacts all chips, including memory.”

As a result, everyone from the data center manager to the chip architect now has to understand how a chip behaves in the field, and how increasingly customized chip and system architectures will function over time. Downtime is costly for a data center, but under-utilization and reduced performance also carry a high price tag. That, in turn, affects how much margin is considered essential, such as extra data paths if some of them are fully or partially closed off by electromigration, and how that margin will impact performance, power, and area/cost over a chip’s projected lifetime, especially in a heterogeneous design with specialized compute elements.

“When it comes to the hyper-scalers and high powered, highly customized, heterogeneous chips for various different workloads, these chips are on 24/7, so consistent uptime is critical,” said Dan Lee, product management director at Cadence. “Since all of these chips are done at the really advanced nodes, with the smaller device sizes, more developers are looking to do aging analysis, and derive the wear and tear so they can see if the chip is going to last a year or five years. At the same time, an important consideration is also thermal — especially when we’re talking about these heterogeneous integrations, and you don’t really get the thermal conductivity that you would in a straightforward, monolithic design. There’s a bit more thought or planning that needs to be a part of this because aging and heating are related. All things being equal, if you’re operating in a very hot environment, you’re going to expect a lower lifespan.”

Still, determining how much shorter that lifespan will be isn’t always a precise calculation. “Data center SoCs that execute mission-critical workloads need to provide scalable visibility, predict problems before they occur, provide deep-dive analyses into problems, and be optimized to increase longevity of investment,” said Padmakumar Karthik, senior technology manager at Arm. “Data center diagnostic patterns are often deployed to measure the health of an SoC post-manufacturing to prevent silent data corruption (SDC) issues. But on-chip sensors provide an additional layer of insights, detecting droops or aging or thermal events on-chip, all of which can cause SDC incidents. For this reason, scalable, customizable sensor frameworks that can monitor and adapt throughout the useful life of the device, enabling continuous design optimization and preventive maintenance, will be increasingly important.”

There are multiple ways to achieve this, but each data center can be very different. In some cases, chips are designed by systems companies for internal use. And in most cases, there is a mix of different hardware and software, not all of which is state-of-the-art. “Many data centers have legacy infrastructure that may not be inherently designed for optimal power efficiency,” noted Noam Brousard, vice president of systems at proteanTecs, in a recent blog. “Upgrading or retrofitting such infrastructure poses challenges in achieving comprehensive power optimization.”

Even within a single rack, stresses can vary greatly from one server to the next, and from one chip to the next even in the same server. “You can imagine when you have a very big chip, toward the edges of the chip it will expand more than in a small chip, and that can add stress,” said Rambus’ Woo. “You have to really be careful about how you cool things, and memory is no different. You have very specific things you worry about with memory, like the ability to retain data, depending on how hot the chip is.”

In addition, as chips age, parameters drift. Marc Swinnen, director of product marketing in Ansys’ semiconductor division, said the traditional approach has been to use a library that’s characterized as a brand new chip. “The library is characterized at 1 year, 5 years, 10 years, 15 years, and you can run all your analysis multiple times with these different aged libraries. That sounds good on paper, and that’s what a lot of people do, but the problem is that not all parts of the chip age at the same rate. This is why aging is often associated with activity and temperature. Some parts of the chip are more active and hotter than other parts of the chip, so the aging time runs differently for different parts. This means you want to apply some of the old library to some parts of the chip, and the younger library to other parts of the chip, because if signals run between them you have setup and hold issues. If everything slows down at the same time — or one slows down and the other one doesn’t — you’re going to get mismatches, and that’s the difficulty. At the bottom level, it’s easy. Every gate is assigned its right age. That’s simple. You do an analysis with every gate. But how do you assign the age to every gate? Where do you get that information from? You need a lot of realistic activity, and then predict that over the lifespan and with temperature. That’s the problem. How do you actually construct this aging map? Once you have it, the analysis is not that hard.”

Aging maps are application- and workload-specific. Every chip will age differently depending on the functions it performs.
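
To make the idea of an aging map concrete, here is a minimal sketch. It assumes each block's combined activity and temperature stress has already been reduced to a single multiplier, which is the hard part Swinnen describes and is not modeled here. The block names, stress factors, and the rule of rounding up to the next characterized library are illustrative assumptions, not any tool's method.

```python
from bisect import bisect_left

# Fresh library plus the aged characterization points quoted above, in years.
LIBRARY_AGES_YEARS = [0, 1, 5, 10, 15]


def effective_age(calendar_years: float, stress_factor: float) -> float:
    """Scale calendar time by a per-block stress multiplier (assumed to be given)."""
    return calendar_years * stress_factor


def pick_library(age_years: float) -> int:
    """Pick the first characterized age at or above the effective age, capped at the oldest."""
    index = bisect_left(LIBRARY_AGES_YEARS, age_years)
    return LIBRARY_AGES_YEARS[min(index, len(LIBRARY_AGES_YEARS) - 1)]


def build_aging_map(stress_by_block: dict, calendar_years: float) -> dict:
    """Map every block to the aged library it should be analyzed with."""
    return {
        block: pick_library(effective_age(calendar_years, stress))
        for block, stress in stress_by_block.items()
    }


# Hypothetical workload: a busy, hot CPU core and a mostly idle I/O ring.
stress_by_block = {"cpu_core": 1.5, "io_ring": 0.3, "sram": 1.0}
aging_map = build_aging_map(stress_by_block, calendar_years=5)
# aging_map == {"cpu_core": 10, "io_ring": 5, "sram": 5}
```

After five calendar years the busy core is analyzed with the 10-year library while the other blocks use the 5-year library, which is the kind of mismatch that produces setup and hold issues on signals crossing between them.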

But aging is just one of many factors that affect data center uptime. “When we look at data center, we look at the whole application first, then whittle it down to what that means for chips and packages,” said Kelly Morgan, senior principal application engineer at Ansys. “From the mechanical reliability lens of the data center operation, we go through thermal cycling, obviously. We’re in a controlled environment. But what does that influence? How does that influence the integrity of the chips as you go through thermal cycles? Typically, we’ll look at things like solder fatigue and other effects.”

Another factor to consider is shipping and handling, which can affect the aging of a chip, package, and board.

“Even before the device is put in place, there are opportunities for vibration,” Morgan said. “You might hit something, which is a bit of a shock. We have customers who are looking at things like drop, shock, and vibration, and they have goals they need to test to. Typically, the standard process is to do a lot of physical testing. Now as you can imagine, that can be pretty challenging. You have to be pretty far along in the design process before you really start to go and test, and if there’s an issue, then you’ve got to go back and retest. Early simulation helps here, especially for those larger-scale events, and that comes down to the chassis, the board, to all the components, including the ICs.”


Fig. 1: Components of complete electronic system analysis. Source: Ansys

Quality control remains a big challenge when it comes to mechanical stresses that can affect aging. Adam Cron, distinguished architect at Synopsys, pointed to a recent Intel white paper, which noted that at the current acceptable defectivity rates, one core fails every two days. To account for this, Cron noted that certain commercial tools support in-system delay testing in a BiST mode. By adding specific IP, any ATPG patterns could be added to that. (Intel’s paper said its solution only applies to stuck-at testing.)

“In very large, millions-of-cores data center-type environments, the implication is that you’d better be ready,” Cron said. “One of the things they were talking about in this paper was in-system scan. Intel was bringing a database of test patterns in, and then applying it in-system after isolating a core. And then, upon a failure, they’d quarantine and move on. But the data centers are apparently running out of that opportunistic time slot to do any of this. We’ve heard some interesting conversations about the fact that people do run a lot of things during certain times. However, other times are cheaper, so all the holes are just getting filled in terms of runtime. Monitors are certainly something to look at, but monitors are looking at systemic degradation. That’s known, if you will. And so as things degrade, Vmin will change, maybe frequency will change. And they’ll be on a pace. They can figure out when to do that. That’s easy enough to figure out. However, if there’s a marginality or some broken component in there, it is not up to the tool to find that. And frankly, the in-system scan wasn’t addressing all components on the die. It was only up to like 80% of stuck-at coverage, which isn’t that much, especially when you’re not looking at all of the pieces inside the die. The point is, there are still opportunities to do better.”

Cron noted that one big systems company suggested a dual-core lockstep mechanism, starting out the data center in dual-core lockstep mode for X number of months. “When it looks like you’ve squeezed the major part of the curve out, in terms of finding these defective components, then unlock them, double your capacity, run like that for a while, and periodically hook some back up again. That means everything is utilized, at least. Of course, some are working at half capacity here and there, but it’s not the whole die. And there are some implications there from a design standpoint, at least for the hardware, but also possibly the operating system, depending on who decides what physical core is used versus what virtual core is used.”
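
The screening idea behind lockstep can be sketched in a few lines: run the same work on two cores, compare results, and quarantine the pair on any mismatch. This is a software toy under stated assumptions, not a hardware lockstep implementation. The two "cores" are plain functions, and the defect in `marginal_core` is invented for illustration.

```python
from typing import Callable, Iterable, List

Core = Callable[[int], int]


def healthy_core(x: int) -> int:
    """Stand-in for a computation executed on a defect-free core."""
    return x * x + 1


def marginal_core(x: int) -> int:
    """Stand-in for a core with a hypothetical defect that corrupts some results."""
    result = x * x + 1
    return result ^ 1 if x % 7 == 0 else result


def lockstep_mismatches(core_a: Core, core_b: Core, workload: Iterable[int]) -> List[int]:
    """Run every input on both cores and collect the inputs where they disagree."""
    return [x for x in workload if core_a(x) != core_b(x)]


def screen_pair(core_a: Core, core_b: Core, workload: Iterable[int]) -> str:
    """Decide whether a pair can be unlocked for independent use or must be quarantined."""
    return "quarantine" if lockstep_mismatches(core_a, core_b, workload) else "unlock"


good_pair = screen_pair(healthy_core, healthy_core, range(50))
bad_pair = screen_pair(healthy_core, marginal_core, range(50))
```

A mismatch only shows that one of the two cores is wrong, not which one, so a pair-level quarantine is the conservative response in this sketch.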

Approaches to measuring aging
Any discussion around aging circuits really boils down to extending the life of the machines in the data center, and not getting caught by surprise when failures occur.

“How do you do that? You have to measure the aging of those machines,” said Neil Hand, director of marketing, IC segment at Siemens EDA. “Right now, if you speak to the CIOs of these big companies with big data centers, they say, ‘We’ve got to get rid of the machines after three years because we can’t risk it going down.’ If you look at embedded analytics capabilities, you can start to embed aging monitors in those devices, you can start to monitor those in real time. It doesn’t look that different than what it does from an automotive perspective. It’s all the same technologies, effectively, but you’re monitoring them. And then you can say, ‘We’re now at 90% of our life for this server.’ We can then just replace that server.”

This feeds into corporate goals around sustainability, as well. “It comes down to building the best thing to begin with, then building it with design for manufacturing in mind so that you don’t get waste during manufacturing, achieve better yields, and finally extend the life of products and build them in environmentally-sustainable ways,” Hand said. “If you can extend the data center lifecycle from three years to five years, that’s big. And especially if you start going to these high-performance, application-specific type of clusters, you may not need to change them as often, because if the underlying capabilities aren’t changing, that might drive the cycling of it. In the case of a biological computer, if there’s no new change to the underlying protein folding mechanisms, you might say, ‘We don’t need a new compute platform. This is really good.’”

The longer the product life can be extended, the better. Design for aging is a matter of, first, performing the aging analysis with the foundry models. “Run the simulations and observe the effects,” said Cadence’s Lee. “When you’re doing the simulation, you want to have the right mission profiles, so you come up with an accurate prediction of how your device is going to behave after a certain number of years in deployment. You may want to combine that with thermal analysis, for example, because how that aging is going to behave will depend on what temperature this design is going to be working at. You may think it’s 22 degrees Celsius, but maybe through some thermal analysis you realize it’s actually going to be operating at 35 or 40 degrees most of the time. That may change the outcome of your aging analysis.”
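
To illustrate why the assumed operating temperature matters, the sketch below applies the textbook Arrhenius acceleration factor to a mission profile. It is not a foundry aging model: real mechanisms each have their own temperature and voltage dependence, and the 0.7 eV activation energy and the profile fractions are assumptions chosen only for illustration.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_factor(temp_c: float, ref_temp_c: float, ea_ev: float) -> float:
    """Acceleration of a thermally activated mechanism at temp_c relative to ref_temp_c."""
    temp_k = temp_c + 273.15
    ref_k = ref_temp_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / ref_k - 1.0 / temp_k))


def profile_factor(profile, ref_temp_c: float, ea_ev: float) -> float:
    """Time-weighted acceleration for a profile of (fraction_of_time, temp_c) pairs."""
    total = sum(fraction for fraction, _ in profile)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("profile fractions must sum to 1")
    return sum(
        fraction * arrhenius_factor(temp_c, ref_temp_c, ea_ev)
        for fraction, temp_c in profile
    )


EA_EV = 0.7  # assumed activation energy, for illustration only

# Hypothetical mission profile: mostly 35 to 40 degrees instead of the assumed 22.
profile = [(0.2, 22.0), (0.5, 35.0), (0.3, 40.0)]
acceleration = profile_factor(profile, ref_temp_c=22.0, ea_ev=EA_EV)
```

A factor above 1 means the device ages faster than an analysis run at the 22-degree assumption would predict, which is the effect Lee describes.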

In terms of the associated thermal analysis, this can extend beyond a single device. “It’s also how that heat is moving,” Lee said. “Let’s say you have this integrated design, where you have some power devices alongside some logic, or some other functionality that is lower power. What you may want to understand is, if those bandgaps or power circuits are generating a lot of heat, that may be shifting over into other parts of your design. So when you run your aging analysis, you may assume that you’re running at 25 degrees, whereas the power devices are at 40 or 45 degrees. They’re on the same chip, they’re very close to each other, and you have to understand how much of that heat is moving over to your logic and what that’s going to bring the temperature up to. You want to know that so you can perform the aging analysis based on that higher temperature.”

Another consideration is combining aging analysis and interconnect parasitics, which is especially relevant for advanced nodes due to the parasitics in the interconnect. “They’re dominant when it comes to performance and functionality,” Lee added. “So when thinking about aging, you also have to think about it being an aged device that has to push the electrons through this interconnect. That’s a pretty heavy load. When you’re doing the aging analysis, you probably will have to be doing it with extracted parasitics. You just can’t do it on a pure schematic design. It doesn’t give you enough detail about what’s really happening physically. This may be included in the aging analysis tool. When most people talk about aging, they may not think about the parasitic aspect to it.”

Combating aging, thermal in memory
While standards don’t work in custom silicon, they do work for some standard components in those devices, such as memory. Over the past 10 to 15 years, memory standards have started to address the impact of heat.

“If you start to exceed certain temperature limits, you’ve got to refresh the device more frequently because the charge can leak off the cells more quickly,” said Rambus’ Woo. “So there are temperature-dependent refresh rates. There are other things that can be exacerbated, like the capacitors are getting smaller, they’re holding fewer electrons because there are so many more of them on a chip now, so we’ve seen memories adopt on-die error correction. This on-die error correction is something that is hidden from the outside world. In many cases, you don’t even know an error has occurred and been corrected on the chip. Those kinds of technologies become even more important now because the temperatures can be higher.”
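
The principle behind on-die error correction can be shown with a toy single-error-correcting code. The sketch below uses the classic Hamming(7,4) code purely as an illustration. It is an assumption-laden stand-in, since production memories use much wider codewords and their exact schemes are not described in this article.

```python
from typing import List, Tuple


def hamming74_encode(data: List[int]) -> List[int]:
    """Encode 4 data bits into a 7-bit codeword laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword: List[int]) -> Tuple[List[int], bool]:
    """Return (data bits, corrected flag), fixing at most one flipped bit."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the flipped bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], syndrome != 0


stored = hamming74_encode([1, 0, 1, 1])
stored[4] ^= 1  # simulate one cell losing its charge
recovered, corrected = hamming74_decode(stored)
# recovered == [1, 0, 1, 1] and corrected is True
```

The caller receives the original data either way, which mirrors Woo's point that the correction is hidden from the outside world.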

There also is growing demand for more telemetry to provide monitoring information. “You just want to know if anything is overheating,” said Woo. “Does something seem like it’s malfunctioning? The data center manager will get regular updates about the status of the major components of the system. A lot of boards now in servers have baseboard management controllers (BMCs), which are little chips that sit on each board and are responsible for, among other things, reporting back the health of that board when a server might have five or six boards. We’re frequently seeing more of these BMC chips.”

Design for aging
While the goal is to be able to guarantee a certain lifetime for the chips in a data center, the challenges for achieving that are expanding. “There’s a growing list of things that can be harmful to devices over their lifetime,” Woo said. “It’s a balance between not adding too much cost, even though you have to increase the reliability and maybe add new features, and all of these things are in play with each other.”

Whether it is liquid cooling or higher levels of RAS ECC in the system, there is no single best answer for every application. In general, the industry is moving toward higher reliability and increasing resilience, but there are many ways to get there and challenges with each of them.

“Just as 15 years ago we didn’t necessarily always think we had to talk about power, now we have to talk about it all the time,” Woo said. “The same thing is going to be true for resilience and reliability. It’s going to be required to become part of the way people think about architectures, and part of that is how the memory system improves its reliability. You can’t really do anything unless you can compute on some data, and you have to make sure that data is reliable. It will touch how memory is stored in a DRAM. It will touch how memory is communicated across links. And it even will touch how processors manipulate data once they get a hold of it in their caches, and in the compute pipelines. Also, one of the key things people will worry about is how much of that susceptibility is brought about by age-related issues, like heating cycles, etc.”

Finally, there are even issues around the quality of the power that comes into a system. “The servers get noise on the power rails, and it’s a balance between how much money you’re willing to pay for the power delivery versus the quality of power,” said Woo. “You have to be tolerant of those kinds of things, too. Power management becomes more challenging, as well as the amount of power that these systems are using today. NVIDIA systems bring 48-volt power into the racks, and there is talk about even higher voltage levels. Those changes in infrastructure can all impact heat, and can age components differently.”

The post Chip Aging Becoming Key Factor In Data Center Economics appeared first on Semiconductor Engineering.

Interoperability And Automation Yield A Scalable And Efficient Safety Workflow

By Ann Keffer, Arun Gogineni, and James Kim

Cars deploying ADAS and AV features rely on complex digital and analog systems to perform critical real-time applications. The large number of faults that need to be tested in these modern automotive designs makes performing safety verification using a single technology impractical.

Yet, developing an optimized safety methodology with specific fault lists automatically targeted for simulation, emulation and formal is challenging. Another challenge is consolidating fault resolution results from various fault injection runs for final metric computation.

The good news is that interoperability of fault injection engines, optimization techniques, and an automated flow can effectively reduce overall execution time to quickly close the loop from safety analysis to safety certification.

Figure 1 shows some of the optimization techniques in a safety flow. Advanced methodologies such as safety analysis for optimization and fault pruning, concurrent fault simulation, fault emulation, and formal based analysis can be deployed to validate the safety requirements for an automotive SoC.

Fig. 1: Fault list optimization techniques.

Proof of concept: an automotive SoC

Using an SoC-level test case, we will demonstrate how this automated, multi-engine flow handles the large number of faults that need to be tested in advanced automotive designs. The SoC design we used in this test case had approximately three million gates. First, we used both simulation and emulation fault injection engines to efficiently complete the fault campaigns for final metrics. Then we performed formal analysis as part of finishing the overall fault injection.

Fig. 2: Automotive SoC top-level block diagram.

Figure 3 is a representation of the safety island block from figure 2. The color-coded areas show where simulation, emulation, and formal engines were used for fault injection and fault classification.

Fig. 3: Detailed safety island block diagram.

Fault injection using simulation was too time- and resource-consuming for the CPU core and cache memory blocks. Those blocks were targeted for fault injection with an emulation engine for efficiency. The CPU core is protected by a software test library (STL) and the cache memory is protected by ECC. The bus interface requires end-to-end protection where fault injection with simulation was determined to be efficient. The fault management unit was not part of this experiment. Fault injection for the fault management unit will be completed using formal technology as a next step.

Table 1 shows the register count for the blocks in the safety island.

Table 1: Block register count.

The fault lists generated for each of these blocks were optimized to focus on the safety-critical nodes which have safety mechanisms/protection.

SafetyScope, a safety analysis tool, was run to create the fault lists for the failure modes (FMs) for both the Veloce Fault App (fault emulator) and the fault simulator, and it wrote those fault lists to the functional safety (FuSa) database.

For the CPU and cache memory blocks, the emulator took as input the synthesized blocks and the fault injection/fault detection nets (FIN/FDN). Next, it executed the stimulus and captured the states of all the FDNs. These states were saved and used as a “gold” reference for comparison against the fault injection runs. For each fault in the optimized fault list, the faulty behavior was emulated and the FDNs were compared against the reference values generated during the golden run. The results were then classified and written to the fault database with attributes.
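The golden-run comparison described above can be sketched in a few lines of Python. This is only an illustration, not the tool's implementation: the `RunTrace` structure, the split of monitored nets into "observation" and "detection" groups, and the `injected` flag are all assumptions made for the example, and the class names simply mirror the categories used in the tables that follow.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RunTrace:
    """Captured net values for one run (hypothetical format)."""
    observation: Dict[str, int]  # functional outputs (assumed grouping)
    detection: Dict[str, int]    # safety-mechanism alarm nets (assumed grouping)
    injected: bool = True        # False if the stimulus never activated the fault


def differs(golden: Dict[str, int], faulty: Dict[str, int]) -> bool:
    """True if any net in the golden reference has a different value in the faulty run."""
    return any(faulty.get(net) != value for net, value in golden.items())


def classify(golden: RunTrace, faulty: RunTrace) -> str:
    """Classify one fault by comparing its run against the golden reference."""
    if not faulty.injected:
        return "not_injected"
    detected = differs(golden.detection, faulty.detection)
    observed = differs(golden.observation, faulty.observation)
    return (
        f"{'detected' if detected else 'undetected'}_"
        f"{'observed' if observed else 'unobserved'}"
    )


golden = RunTrace(observation={"out": 1}, detection={"alarm": 0})
print(classify(golden, RunTrace({"out": 0}, {"alarm": 1})))  # detected_observed
print(classify(golden, RunTrace({"out": 0}, {"alarm": 0})))  # undetected_observed
```

In a real flow the comparison would run per sample point rather than on a single final value, but the four-way outcome is the same.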

Fig. 4: CPU cluster. (Source from https://developer.arm.com/Processors/Cortex-R52)

For each of the sub parts shown in the block diagram, we generated an optimized fault list using the analysis engine. The fault lists are saved into individual sessions in the FuSa database. We then used statistical random sampling over the full fault population to draw a random sample from the FuSa database.

Now let’s look at what happens when we take one random sample all the way through fault injection using emulation. To completely close the fault injection campaign, however, we processed N samples.

Table 2: Detected faults by safety mechanisms.

Table 3 shows that the overall fault distribution for total faults is in line with the fault distribution of the randomly sampled faults. The table further captures the total detected faults: 3125 out of 4782 total faults. We were also able to model the detected faults per sub part and provide an overall detected fault ratio of 65.35%. Based on the faults in the random sample and our coverage goal of 90%, we calculated that the margin of error (MOE) is ±1.19%.
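As a rough illustration of the arithmetic, the sketch below recomputes the detected fault ratio from the two counts quoted above and shows a textbook margin-of-error formula for a sampled proportion. The article does not state the confidence level or the size of the full fault population it used, so the `z` value and the optional finite population correction are assumptions, and the sketch does not try to reproduce the ±1.19% figure.

```python
import math
from typing import Optional


def detected_ratio(detected: int, total: int) -> float:
    """Fraction of sampled faults that were detected."""
    return detected / total


def margin_of_error(p: float, n: int, z: float = 1.96,
                    population: Optional[int] = None) -> float:
    """Normal-approximation margin of error for a proportion p from n samples.

    z = 1.96 corresponds to 95% confidence (an assumption, not from the article).
    If the full population size is given, a finite population correction is applied.
    """
    moe = z * math.sqrt(p * (1 - p) / n)
    if population is not None and population > 1:
        moe *= math.sqrt((population - n) / (population - 1))
    return moe


ratio = detected_ratio(3125, 4782)
print(f"{ratio:.2%}")  # 65.35%
```

The formula makes one property of the flow visible: removing faults from the sample (smaller `n`) widens the margin of error, which is consistent with the MOE growing after the not injected faults are dropped later in the analysis.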

Table 3: Results of fault injection in CPU and cache memory.

The 3125 total detected faults (observed + unobserved) have a clear fault classification. The undetected observed faults also have a clear classification as residual faults. We did further analysis of the undetected unobserved and not injected faults.

Table 4: Fault classification after fault injection.

We used many debug techniques to analyze the 616 undetected unobserved (UU) faults. First, we used formal analysis to check the cone of influence (COI) of these UU faults. The faults which were outside the COI were deemed safe, and there were five faults which were further dropped from analysis. For the faults which were inside the COI, we used engineering judgment, with justification based on various configurations such as ECC, timer, and flash memory. Finally, using formal analysis and engineering judgment, we were able to classify part of the 616 UU faults as safe faults and, conservatively, the remaining UU faults as residual faults. We also reviewed the 79 residual faults and were able to classify 10 of them as safe faults. The not injected faults were also tested against the simulation model to check whether any further stimulus was able to inject those faults. Since no stimulus was able to inject these faults, we decided to drop them from consideration and adjusted the margin of error accordingly. With this change our new MOE is ±1.293%.

In parallel, the fault simulator pulled the optimized fault lists for the failure modes of the bus block and ran fault simulations using stimulus from functional verification. The initial set of stimuli didn’t provide enough coverage, so higher quality stimuli (test vectors) were prepared, and additional fault campaigns were run on the new stimuli. All the fault classifications were written into the FuSa database. All runs were parallel and concurrent for overall efficiency and high performance.

Safety analysis using SafetyScope helped improve accuracy and reduce the number of fault simulation iterations. Emulation of the CPU and cache memory across various tests resulted in an overall SPFM of over 90%, as shown in Table 5.

Table 5: Overall results.

At this time, not all of the fault simulation tests for the BUS block (end-to-end protection) had been completed. Table 6 shows that the first test was able to resolve 9.8% of the faults very quickly.

Table 6: Percentage of detected faults for BUS block by E2E SM.

We are integrating more tests which have high traffic on the BUS to mimic the runtime operation state of the SoC. The results of these independent fault injections (simulation and emulation) were combined for calculating the final metrics on the above blocks, with the results shown in Table 7.

Table 7: Final fault classification post analysis.

Conclusion

In this article we shared the details of a new functional safety methodology used in an SoC level automotive test case, and we showed how our methodology produces a scalable, efficient safety workflow using optimization techniques for fault injection with formal, simulation, and emulation verification engines. Performing safety analysis prior to running the fault injection was critical and saved time. Interoperability across multiple engines, with results read from a common FuSa database, is likewise necessary for a project of this scale.

For more information on this highly effective functional safety flow for ADAS and AV automotive designs, please download the Siemens EDA whitepaper Complex safety mechanisms require interoperability and automation for validation and metric closure.

Arun Gogineni is an engineering manager and architect for IC functional safety at Siemens EDA.

James Kim is a technical leader at Siemens EDA.

The post Interoperability And Automation Yield A Scalable And Efficient Safety Workflow appeared first on Semiconductor Engineering.

Maximizing Energy Efficiency For Automotive Chips

Silicon chips are central to today’s sophisticated advanced driver assistance systems, smart safety features, and immersive infotainment systems. Industry sources estimate that there are now over 1,000 integrated circuits (ICs), or chips, in an average ICE car, and twice as many in an average EV. Such a large amount of electronics translates into kilowatts of power being consumed – equivalent to a couple of dishwashers running continuously. For an ICE vehicle, this puts a lot of stress on the vehicle’s electrical and charging system, leading automotive manufacturers to consider moving to 48V systems (vs. today’s mainstream 12V systems). These 48V systems reduce the current levels in the vehicle’s wiring, enabling the use of lower cost smaller-gauge wire, as well as delivering higher reliability. For EVs, higher energy efficiency of on-board electronics translates directly into longer range, which for many EV buyers is second only to price as a consideration. Driver assistance and safety features often employ redundant component techniques to ensure reliability, further increasing vehicle energy consumption. Lack of energy efficiency for an EV also means more frequent charging, further stressing the power grid and producing a detrimental effect on the environment. All these considerations call for a comprehensive energy-efficient design methodology for automotive ICs.

What’s driving demand for compute power in cars?

Classification and processing of massive amounts of data from multiple sources in automotive applications – video, audio, radar, lidar – results in a high degree of complexity in automotive ICs as software algorithms require large amounts of compute power. Hardware architectural decisions, and even hardware-software partitioning, must be done with energy efficiency in mind. There is a plethora of tradeoffs at this stage:

  • Flexibility of a general-purpose CPU-based architecture vs. efficiency of a dedicated digital signal processor (DSP) vs. a hardware accelerator
  • Memory sub-system design: how much is required, how it will be partitioned, how much precision is really needed, just to name a few considerations

In order to enable reliable decisions, architects must have access to a system that models, in a robust manner, power, performance, and area (PPA) characteristics of the hardware, as well as use cases. The idea is to eliminate error-prone estimates and guesswork.

To improve energy efficiency, automotive IC designers also must adopt many of the power reduction techniques traditionally used by architects and engineers in the low-power application space (e.g. mobile or handheld devices), such as power domain shutoff, voltage and frequency scaling, and effective clock and data gating. These techniques can be best evaluated at the hardware design level (register transfer level, or RTL) – but with the realistic system workload. As a system workload – either a boot sequence or an application – is millions of clock cycles long, only an emulation-based solution delivers a practical turnaround time (TAT) for power analysis at this stage. This power analysis can reveal intervals of wasted power – power consumption bugs – whether due to active clocks when the data stream is not active, redundant memory access when the address for the read operation doesn’t change for many clock cycles (and/or when the address and data input don’t change for the write operation over many cycles), or unnecessary data toggles while clocks are gated off.
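One of the power bugs described above, a clock that keeps running while the data stream is idle, can be illustrated with a small trace scan. The sketch below is an assumption-laden simplification: it takes two per-cycle boolean traces (`clock_enabled` and `data_valid`, both hypothetical signal names) and reports intervals where the clock was active with no valid data for at least `min_cycles` cycles. It is not how any particular emulation or power analysis tool works.

```python
from typing import Iterable, List, Tuple


def wasted_clock_intervals(clock_enabled: Iterable[bool],
                           data_valid: Iterable[bool],
                           min_cycles: int = 4) -> List[Tuple[int, int]]:
    """Return (first_cycle, last_cycle) spans where the clock ran without valid data."""
    intervals: List[Tuple[int, int]] = []
    start = None   # first cycle of the current wasted span, if any
    cycle = -1     # stays -1 if the traces are empty
    for cycle, (clk, valid) in enumerate(zip(clock_enabled, data_valid)):
        wasted = bool(clk) and not bool(valid)
        if wasted and start is None:
            start = cycle
        elif not wasted and start is not None:
            if cycle - start >= min_cycles:
                intervals.append((start, cycle - 1))
            start = None
    # Close a span that is still open at the end of the trace.
    if start is not None and cycle + 1 - start >= min_cycles:
        intervals.append((start, cycle))
    return intervals


clk = [1] * 10
valid = [1, 1, 0, 0, 0, 0, 0, 1, 0, 0]
print(wasted_clock_intervals(clk, valid))  # [(2, 6)]
```

The `min_cycles` threshold reflects a practical choice: gating a clock for only a cycle or two rarely pays for itself, so short idle gaps are ignored.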

To cope with the huge amount of data and the requirement to process that data in real time (or near real time), automotive designers employ artificial intelligence (AI) algorithms, both in software and in hardware. Millions of multiply-accumulate (MAC) operations per second and other arithmetic-intensive computations to process these algorithms give rise to a significant amount of wasted power due to glitches – multiple signal transitions per clock cycle. At the RTL stage, with the advanced RTL power analysis tools available today, it is possible to measure the amount of wasted power due to glitches as well as to identify glitch sources. Equipped with this information, an RTL design engineer can modify their RTL source code to lower the glitch activity, reduce the size of the downstream logic, or both, to reduce power.

Working together with the RTL design engineer is another critical persona – the verification engineer. In order to verify the functional behavior of the design, the verification engineer is no longer dealing just with the RTL source: they also have to verify the proper functionality of the global power reduction techniques such as power shutoff and voltage/frequency scaling. Doing so requires a holistic approach that leverages a comprehensive description of power intent, such as the Unified Power Format (UPF). All verification technologies – static, formal, emulation, and simulation – can then correctly interpret this power intent to form an effective verification methodology.

Power intent also carries through to the implementation part of the flow, as well as signoff. During the implementation process, power can be further optimized through physical design techniques while conforming to timing and area constraints. Highly accurate power signoff is then used to check conformance to power specifications before tape-out.

Design and verification flow for more energy-efficient automotive SoCs

Synopsys delivers a complete end-to-end solution that allows IC architects and designers to drive energy efficiency in automotive designs. This solution spans the entire design flow from architecture to RTL design and verification, to emulation-driven power analysis, to implementation and, ultimately, to power signoff. Automotive IC design teams can now put in place a rigorous methodology that enables intelligent architectural decisions, RTL power analysis with consistent accuracy, power-aware physical design, and foundry-certified power signoff.

The post Maximizing Energy Efficiency For Automotive Chips appeared first on Semiconductor Engineering.

Why A.A. Milne and his son Christopher Robin ended up hating Winnie the Pooh, and stopped speaking with each other

If you've ever suspected that the seemingly innocent Winnie the Pooh had a hidden sinister side, here's your proof.

The lovable, roly-poly, honey-aficionado became a cherished figure in children's literature almost overnight with the publication of author A.A. Milne's When We Were Very Young in 1924.

The post Why A.A. Milne and his son Christopher Robin ended up hating Winnie the Pooh, and stopped speaking with each other appeared first on Boing Boing.

Design Tool Think Tank Required

When I was in the EDA industry as a technologist, there were three main parts to my role. The first was to tell customers about new technologies being developed and tool extensions that would be appearing in the next release. These were features they might find beneficial both in the projects they were undertaking today, and even more so, would apply to future projects. Second, I would try and find out what new issues they were finding, or where the tools were not delivering the capabilities they required. This would feed into tool development planning. And finally, I would take those features selected by the marketing team for implementation and try to work out how best to implement them if it wasn’t obvious to the development teams.

By far the most difficult task out of the three was getting new requirements from customers. Most engineers have their heads down, concentrating on getting their latest chip out. When you ask them about new features, the only things they offer are their current pain points. These usually involve incremental features or bugs, where the workaround is disliked, or insufficient performance.

Thirty years ago, when I first started doing that role, there were dedicated methodology groups within the larger companies whose job it was to develop flows and methodologies for future projects. These would appear to be the ideal people to ask, but in many cases they were so disconnected from the development team that what they asked for would never actually be used by the development team. These groups were idealists who wanted to instill revolutionary changes, whereas the development teams wanted evolutionary tools. The furthest many of those developments went was pilot projects that never became mainstream.

It seems as if the industry needs a better path to get requirements into the EDA companies. This used to be defined by the ITRS, which would look forward and project the new capabilities that would be required and the timeframes for them. That no longer exists. Today, standards are being driven by semiconductor companies. This is a change from the past, where we used to see the EDA companies driving the developments done within groups like Accellera. When I look at their recent undertakings, most of them are driven by end users.

Getting a standards group started today happens fairly late in the process. It implies an immediate need, but does not really allow time for solutions to be developed ahead of time. It appears that a think tank is required where the industry can discuss issues and problems for which new tool development is required. That can then be built into the EDA roadmaps so that the technology becomes available when it is needed.

One such area is power analysis. I have been writing stories about how important power and energy are becoming, and how they may soon become the limiter for many of the most complex designs. Some of the questions I always ask are:

  • What tools are being developed for doing power analysis of software?
  • How can you calculate the energy consumed for a given function?
  • How can users optimize a design for power or energy?

I rarely get straight answers to any of these questions. Instead, I’m often given vague ideas about how a user could do this in a manual fashion given the tools currently available.

I was beginning to think I was barking up the wrong tree and perhaps these were not legitimate concerns. My sanity was restored by a comment on one of my recent power related stories. Allan Cantle, OCP HPC Sub-Project Leader at Open Compute Project Foundation, wrote: “While it’s great to see articles like this highlight the need for us all to focus on energy centric computing, the sad news is that our tools don’t report energy in any obvious way to show the stupid architectural mistakes we often make from an energy consumption perspective. We are solving all the problems from a bottoms-up perspective by bringing things closer together. While that does bring tremendous energy efficiency benefits, it also creates massively increasing energy density. There is so much low-hanging fruit from a top-down system architecture approach that the industry is missing because we need to think outside the box and across our silos.”

Cantle went on to say: “A trivial improvement in tools that report energy consumption as a first-class metric will make it far easier for us to understand and rectify the mistakes we make as we build new energy-centric, domain-specific computers for each application. Alternatively, the silicon gods that rule our industry would be wise to take a step backward and think about the problem from a systems level perspective.”

I couldn’t agree more, and I find it frustrating that no EDA company seems to be listening. I am sure part of the problem is that the large customers are working on their own internal solutions, and they feel it will provide them with a competitive advantage. Once it becomes clear that all of their competitors have similar solutions, and that they no longer get an advantage from it, they will look to transfer those solutions to the EDA companies so they do not have to maintain them. The EDA companies will then start to fight to make the solution they have acquired the standard. It all takes a long time.

In partial defense of the EDA companies, they are facing so many new issues these days that they are spread very thin dealing with new nodes, 2.5D, 3D, shift left, multi-physics, AI algorithms – to name just a few. They already spend more on R&D than most technology companies as a percentage of revenue.

Perhaps Accellera could start to include discussion forums in events like DVCon. This would allow for an open discussion about the problems they need to have solved. Perhaps they could start to produce the EDA equivalent of the old ITRS roadmap. It sure would save a lot of time and energy (pun intended).

The post Design Tool Think Tank Required appeared first on Semiconductor Engineering.

Figuring Out Semiconductor Manufacturing's Climate Footprint



Samuel K. Moore: Hi. I’m Samuel K. Moore for IEEE Spectrum’s Fixing the Future podcast. Before we start, I want to tell you that you can get the latest coverage from some of Spectrum’s most important beats, including AI, climate change, and robotics, by signing up for one of our free newsletters. Just go to spectrum.ieee.org/newsletters to subscribe. The semiconductor industry is in the midst of a major expansion driven by the seemingly insatiable demands of AI, the addition of more intelligence in transportation, and national security concerns, among many other things. Governments and the industry itself are starting to worry what this expansion might mean for chip-making’s carbon footprint and its sustainability generally. Can we make everything in our world smarter without worsening climate change? I’m here with someone who’s helping figure out the answer. Lizzie Boakes is a life cycle analyst in the Sustainable Semiconductor Technologies and Systems Program at IMEC, the Belgium-based nanotech research organization. Welcome, Lizzie.

Lizzie Boakes: Hello.

Moore: Thanks very much for coming to talk with us.

Boakes: You’re welcome. Pleasure to be here.

Moore: So let’s start with, just how big is the carbon footprint of the semiconductor industry? And is it really big enough for us to worry about?

Boakes: Yeah. So quantifying the carbon footprint of the semiconductor industry is not an easy task at all, and that’s because semiconductors are now embedded in so many industries. So the most obvious industry is the ICT industry, which is estimated to be approximately 3 percent of the global emissions. However, semiconductors can also be found in so many other industries, and their embedded nature is increasing dramatically. So they’re embedded in automotives, they’re embedded in healthcare applications, as well as aerospace and defense applications too. So the expansion and adoption of semiconductors in all of these different industries just makes it very hard to quantify.

And the global impact of the semiconductor chip manufacturing itself is expected to increase as well because of the fact that we need more and more of these chips. So the global chip market is projected to have a 7 percent compound annual growth rate in the next coming years. And bearing in mind that the manufacturing of the IC chips itself often accounts for the largest share of the life cycle climate impact, especially for consumer electronics, for instance. This increase in demand for so many chips and the demand for the manufacturing of those chips will have a significant impact on the climate impact of the semiconductor industry. So it’s really crucial that we focus on this and we identify the challenges and try to work towards reducing the impact to achieve any of our ambitions at reaching net zero before 2050.

Moore: Okay. So the way you looked at this, it was sort of a— it was cradle-to-gate life cycle. Can you sort of explain what that entails, what that really means?

Boakes: Yeah. So cradle to gate here means that we quantify the climate impacts, not only of the IC manufacturing processes that occur inside the semiconductor fab, but also we quantify the embedded impact of all of the energy and material flows that are entering the fab that are necessary for the fab to operate. So in other words, we try to quantify the climate impact of the value chain upstream to the fab itself, and that’s where the cradle begins. So the extraction of all of the materials that you need, all of the energy sources. For instance, the extraction of coal for electricity production. That’s the cradle. And the gate refers to the point where you stop the analysis, you stop the quantification of the impact. And in our case, that is the end of the processing of the silicon wafer for a specific technology node.

Moore: Okay. So it stops basically when you’ve got the die, but it hasn’t been packaged and put in a computer.

Boakes: Exactly.

Moore: And so why do you feel like you have to look at all the upstream stuff that a chip-maker may not really have any control over, like coal and such like that?

Boakes: So there is a big need to analyze your scope through what is called— in greenhouse gas protocol, you have three different scopes. Your scope one is your direct emissions. Your scope two is the emissions related to the electricity consumption and the production of electricity that you have consumed in your operation. And scope three is basically everything else, and a lot of people start with scope three, all of their upstream materials. And it does have— it’s obviously the largest scope because it’s everything else other than what you’re doing. And I think it’s necessary to coordinate your supply chain so that you make sure you’re doing the most sustainable solution that you can. So if there are— you have power in your purchasing, you have power over how you choose your supply chain. And if you can manipulate it in a way where you have reduced emissions, then that should be done. Often, scope three is the largest proportion of the total impact, A, because it’s one of the biggest groups, but B, because there is a lot of materials and things coming in. So yeah, it’s necessary to have a look up there and see how you can best reduce your emissions. And yeah, you can have power in your influence over what you choose in the end, in terms of what you’re purchasing.

Moore: All right. So in your analysis, what did you see as sort of the biggest contributors to the chip fabs carbon output?

Boakes: So without effective abatement, the process gases that are released as direct emissions, they would really dominate the total emissions of the IC chip manufacturing. And this is because the process gases that are often consumed in IC manufacturing, they have a very high GWP value. So if you do not abate them and you do not destroy them in a small abatement system, then their emissions and contribution to global warming are very large. However, you can drastically reduce that emission already by deploying effective abatements on specific process areas, the high-impact process areas. And if you do that, then this distribution shifts.

So then you would see that the contribution of the direct emissions would reduce because you’ve reduced your direct emission output. But then the next-biggest contributor would be the electrical energy. So the scope two emissions that are related to the production of the electricity that you’re consuming. And as you can imagine, IC manufacturing is very energy-intensive. So there’s a lot of electricity coming in, so it’s necessary then to try to start to decarbonize your electricity provider or reduce the carbon intensity of the electricity that you’re purchasing.

And then once you do that step, you would also see that again the distribution changes, and your scope three, your upstream materials, would then be the largest contributors to the total impact. And the materials that we’ve identified as being the most or the largest contributors to that impact would be, for instance, the silicon wafers themselves, the raw wafers before you start processing, as well as wet chemicals. So these are chemicals that are very specific to the semiconductor industry. There’s a lot of consumption there, and they’re very specific and have a high GWP value.
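The shifting distribution Boakes describes (direct emissions dominate first, then electricity, then upstream materials) can be made concrete with a toy calculation. All numbers below are invented for illustration and are not taken from the IMEC analysis; the function names and the `abatement_eff` and `grid_reduction` parameters are likewise assumptions.

```python
from typing import Dict


def footprint(scope1_unabated: float, scope2: float, scope3: float,
              abatement_eff: float = 0.0,
              grid_reduction: float = 0.0) -> Dict[str, float]:
    """Per-scope emissions (arbitrary units) after abatement and grid decarbonization."""
    return {
        "scope1": scope1_unabated * (1 - abatement_eff),  # direct process-gas emissions
        "scope2": scope2 * (1 - grid_reduction),          # purchased electricity
        "scope3": scope3,                                 # upstream materials
    }


def dominant(fp: Dict[str, float]) -> str:
    """Name of the scope with the largest contribution."""
    return max(fp, key=fp.get)


# Hypothetical starting point: 60 / 30 / 20 units for scopes 1 / 2 / 3.
print(dominant(footprint(60, 30, 20)))                                          # scope1
print(dominant(footprint(60, 30, 20, abatement_eff=0.95)))                      # scope2
print(dominant(footprint(60, 30, 20, abatement_eff=0.95, grid_reduction=0.8)))  # scope3
```

Each mitigation step shrinks the total and hands the top spot to the next scope, which is why the analysis has to be repeated after every intervention.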

Moore: Okay. So if we could start with— unpack a few of those. First off, what are some of these chemicals, and are they generally abated well these days? Or is this sort of something that’s still a coming problem?

Boakes: Yeah. So they could be from specific photoresists to... there is a very heavy consumption of basic chemicals for neutralization of wastewater, these types of things. So there’s a combination of having a high embedded GWP value, which means that it has a very large impact to produce the chemical itself, or you just have a lot that you’re consuming of it. So it might have a low embedded impact, but you’re just using so much of it that, in the end, it’s the higher contributor anyway. So you have two kind of buckets there. And yeah, it would just be a matter of, you have to multiply through the amounts by your embedded emission to see which ones come on top. But yeah, we see that often, the wastewater treatment uses a lot of these chemicals just for neutralization and treatment of wastewater on site, as well as very specific chemicals for the semiconductor industry such as photoresists and CMP cleans, those types of very specific chemistries which, again, it’s difficult to quantify the embedded impact of because often they’re proprietary, you don’t exactly know what goes into them, and it’s a lot of difficulty trying to actually characterize those chemicals appropriately. So often we apply a proxy value to those. So this is something that we would really like to improve in the future would be having more communication with our supply chain and really understanding what the real embedded impact of those chemicals would be. This is something that we really would need to work on to really identify the high-impact chemicals and try anything we can to reduce them.

Moore: Okay. And what about those direct greenhouse gas emission chemicals? Are those generally abated, or is that something that’s still being worked on?

Boakes: So there is quite, yeah, a substantial amount of work going into the abatement system. So we have the usual methane combustion of process gases. There’s also now development in plasma abatement systems. So there are different abatement systems being developed, and their effectiveness is quite high. However, we don’t have such a good oversight at the moment on the amount of abatement that’s being deployed in high-volume manufacturing. This, again, is quite a sensitive topic to discuss from a research perspective when you don’t have insight into the fab itself. So asking particular questions about how much abatement is deployed on certain tools is not such easy data to come across.

So we often go with models. So we apply the IPCC Tier 2c model where, basically, you calculate the direct emissions by how much you’ve used. So it’s a mathematical model based on how much you’ve consumed. There is a model that generates the amounts that would be emitted directly into the atmosphere. So this is the model that we’ve applied. And we see that, yeah, it does correlate sometimes with the top-down reporting that comes from the industry. So yeah, I think there is a lot of way forward where we can start comparing top-down reporting to these bottom-up models that we’ve been generating from a kind of research perspective. So yeah, there’s still a lot of work to do to match those.
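A consumption-based estimate of this kind can be sketched as follows. The function follows the general shape of the IPCC Tier 2 methods for electronics manufacturing (gas consumed, minus what stays in the container, minus what is used up in the process, minus what the abatement destroys), but it is a simplification: the actual Tier 2c method relies on process-specific default factors and also accounts for byproduct gases, none of which are reproduced here. All parameter names and example values are assumptions.

```python
def direct_emission_kg(consumed_kg: float, heel_fraction: float,
                       utilization: float, abated_fraction: float,
                       destruction_eff: float) -> float:
    """Estimate the mass of a process gas emitted to the atmosphere.

    consumed_kg     -- gas purchased/fed to the process
    heel_fraction   -- share left unused in the shipping container
    utilization     -- share transformed or destroyed inside the process tool
    abated_fraction -- share of gas volume routed through abatement
    destruction_eff -- share destroyed by the abatement system
    """
    for name, value in [("heel_fraction", heel_fraction),
                        ("utilization", utilization),
                        ("abated_fraction", abated_fraction),
                        ("destruction_eff", destruction_eff)]:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return (consumed_kg
            * (1 - heel_fraction)
            * (1 - utilization)
            * (1 - abated_fraction * destruction_eff))


# Hypothetical example: 100 kg consumed, 10% heel, 50% utilization.
unabated = direct_emission_kg(100, 0.10, 0.50, abated_fraction=0.0, destruction_eff=0.0)
abated = direct_emission_kg(100, 0.10, 0.50, abated_fraction=1.0, destruction_eff=0.90)
print(round(unabated, 3), round(abated, 3))
```

Because every input is a consumption figure or a fraction, the model can be run without access to fab exhaust measurements, which is what makes it usable from a research position outside the fab.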

Moore: Okay. Are there any particular nasties in terms of what those chemicals are? I don’t think people are familiar with really what comes out of the smokestack of chip fab.

Boakes: So one of the highest GWP gases, for instance, would be the sulfur hexafluoride, so SF6. This has a GWP value of 25,200 kilograms of CO2 equivalent per kilogram. So that really means that it has over 25,000 times more damaging effects to the climate compared to the equivalent amount of CO2. So this is extremely high. But there are also others like NF4 that are over 1,000 times more damaging to the climate than CO2. However, they can be abated. So in these abatement systems, you can destroy them and they’re no longer being released.
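The GWP figure translates into CO2-equivalent mass by simple multiplication. The short sketch below uses the SF6 value quoted in the interview; the released mass is a made-up number and the function name is an assumption.

```python
SF6_GWP = 25_200  # kg CO2e per kg of SF6, as quoted in the interview


def co2e_kg(released_kg: float, gwp: float) -> float:
    """Convert a released gas mass to kilograms of CO2 equivalent."""
    return released_kg * gwp


# Hypothetical release of 2 kg of SF6.
print(co2e_kg(2, SF6_GWP))  # 50400
```

Even a few kilograms of an unabated high-GWP gas therefore corresponds to tens of tonnes of CO2 equivalent.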

There are also efforts going into replacing high GWP gases such as these that I’ve mentioned to use alternatives which have a lower GWP value. However, this is going to take a lot of process development and a lot of effort to go into changing those process flows to adapt to these new alternatives. And this will then be a slow adoption into the high-volume fabs because, as we know, this industry is quite rigid to any changes that you suggest. So yeah, it will be a slow adoption if there are any alternatives. And for the meantime, effective abatement can destroy quite a lot. But it would really be having to employ and really have those abatement systems on those high-impact process areas.

Moore: As Moore’s Law continues, each step or manufacturing node might have a different carbon footprint. What were some of the big trends your research revealed regarding that?

Boakes: So in our model, we’ve assumed a constant fab operation condition, and this means that we’ve assumed the same abatement systems, the same electrical carbon intensities, for all of the different technology nodes, which-- yeah. So we see that there is a general increase in total emissions under these assumptions, and we double in total climate impact from N28 to A14. So when we evolve in that technology node, we do see it doubling between N28 and A14. And this can be attributed to the increased process complexity as well as the increased number of steps, in process steps, as well as the different chemistries being used, different materials that are being embedded in the chips. This all contributes to it. So generally, there is an increase because of the process complexities that’s required to really reach those aggressive pitches in the more advanced technology nodes.

Moore: I see. Okay. So as things are progressing, they’re also kind of getting worse in some ways. Is there anything—?

Boakes: Yeah.

Moore: Is this inevitable, or is there—?

Boakes: [laughter] Yeah. If you make things more complicated, it will probably take more energy and more materials to do it. Also, when you make things smaller, you need to change your processes and use-- yeah, for instance, with interconnect metals, we’ve really reached the physical limits sometimes because it’s gotten so small that the physical limits of really traditional metals like copper or tungsten have been reached. And now they’re looking for new alternatives like ruthenium, yeah, or platinum. Different types of metals which-- again, if it’s a platinum group metal, of course it’s going to have a higher embedded impact. So when we hit those limits, physical limits or limits to the current technology and we need to change it in a way that makes it more complicated, more energy-intensive-- again, the move to EUV. EUV is an extremely energy-intensive tool compared to DUV.

But an interesting point there on the EUV topic would be that it’s really important to keep this holistic view because even though moving from a DUV tool to an EUV tool, it has a large jump in energy intensity. The power intensity of the tool is much higher. However, you’re able to reduce the number of total steps to achieve a certain deposition or etch. So you’re able to overall reduce your emissions, or you’re able to reduce your energy intensity of the process flow. So even though we make all these changes and we might think, “Oh, that’s a very powerful tool,” it could go and cut down on process steps in the holistic view. So it’s always good to keep a kind of life cycle perspective to be able to see, “Okay, if I implement this tool, it does have a higher power intensity, but I can reduce half of the number of steps to achieve the same result. So it’s overall better.” So it’s always good to keep that kind of holistic view when we’re doing any type of sustainability assessment.
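The trade-off described here can be sketched as a simple product: energy for a patterning sequence is roughly tool power times time per step times number of steps. The class name and every number below are hypothetical and in arbitrary units; they are not measurements of real DUV or EUV tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatterningOption:
    """Hypothetical description of one way to pattern a layer."""
    name: str
    tool_power: float      # relative power draw while running (arbitrary units)
    hours_per_step: float  # assumed processing time per step
    steps: int             # number of steps needed for the same result

    def energy(self) -> float:
        # Total energy for the whole sequence: power x time x steps.
        return self.tool_power * self.hours_per_step * self.steps


# Made-up values: the second tool draws 3x the power but needs 1 step instead of 4.
duv = PatterningOption("lower-power tool, multi-step", tool_power=1.0, hours_per_step=1.0, steps=4)
euv = PatterningOption("higher-power tool, single step", tool_power=3.0, hours_per_step=1.0, steps=1)

best = min((duv, euv), key=PatterningOption.energy)
print(f"{duv.name}: {duv.energy()}, {euv.name}: {euv.energy()}, lower total: {best.name}")
```

With these invented numbers the higher-power tool comes out ahead only because the step reduction outweighs the per-step increase; a different ratio would flip the result, which is the reason for looking at the whole flow rather than a single tool.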

Moore: Oh, that’s interesting. That’s interesting. So you also looked at— as sort of the nodes get more advanced and processes get more complex. What did that do to water consumption?

Boakes: Also, so again, the number of steps in a similar sense. If you’re increasing your number of process steps, there would be an increase in the number of those wet clean steps as well that are often the high-water-consumption steps. So if you have an increased number of those particular process steps, then you’re going to have a higher water consumption in the end. So it is just based on the number of steps and the complexity of the process as we advance into the more advanced technology nodes.

Moore: Okay. So it sounds like complexity is kind of king in this field.

Boakes: Yeah.

Moore: What should the industry be focusing on most to achieve its carbon goals going forward?

Boakes: Yeah. So I think to start off, you need to think of the largest contributors and prioritize those. So of course, if you’re looking at the total impact and we’re looking at a system that doesn’t have effective abatement, then of course, direct emissions would be the first thing that you want to try to focus on and reducing, as they would be the largest contributors. However, once you start moving into a system which already has effective abatement, then your next objective would be to decarbonize your electricity production, go for a lower-carbon-intensity electricity provider, so you’re moving more towards green energy.

And at the same time, you would also want to try to target your high-impact value chain. So your materials and energy that are coming into the fab, you need to look at the ones that are the most highly impacting and then try to find a way to find a provider that does a kind of decarbonized version of the same material or try to design a way where you don’t need that certain material. So not necessarily that it has to be done in a sequential order. Of course, you can do it all in parallel. It would be better. So it doesn’t have to be one, two, three, but the idea and the prioritizing comes from targeting the largest contributors. And that would be direct emissions, decarbonizing your electricity production, and then looking at your supply chain and looking into those high-impact materials.
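The prioritization logic in this answer amounts to ranking contributors by their share of the total, then re-ranking once abatement is in place. A minimal sketch follows; the three categories mirror the ones named above, while the emission values and the 90 percent abatement figure are placeholders invented for illustration.

```python
def rank_contributors(emissions: dict[str, float]) -> list[tuple[str, float]]:
    """Return (source, share of total) pairs, largest contributor first."""
    total = sum(emissions.values())
    if total <= 0:
        raise ValueError("total emissions must be positive")
    ranked = sorted(emissions.items(), key=lambda item: item[1], reverse=True)
    return [(name, value / total) for name, value in ranked]


# Hypothetical footprint in arbitrary CO2e units for a fab without abatement.
no_abatement = {
    "direct emissions": 50.0,
    "electricity": 30.0,
    "upstream materials": 20.0,
}

# Same fab with an assumed 90% abatement applied to direct emissions only.
with_abatement = dict(no_abatement)
with_abatement["direct emissions"] *= 1.0 - 0.90

for label, scenario in (("no abatement", no_abatement), ("with abatement", with_abatement)):
    top_source, top_share = rank_contributors(scenario)[0]
    print(f"{label}: largest contributor is {top_source} ({top_share:.0%})")
```

Under these assumed numbers the top priority shifts from direct emissions to electricity once abatement is applied, matching the order of priorities described in the answer.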

Moore: Okay. And as a researcher, I’m sure there’s data you would love to have that you probably don’t have. What could industry do better about providing that kind of data to make these models work?

Boakes: So for a lot of our scope three, so that upstream, that cradle-to-fab, let’s call it-- those impacts, we’ve had to rely quite a lot on life cycle assessment literature or life cycle assessment databases, which are available through purchasing, or sometimes if you’re lucky, you have a free database. So I would say-- and that’s also because my role in my research group is more looking at that LCA and upstream materials and quantifying the environmental impact of that. So from my perspective, I really think that this industry needs to work on providing data through the supply chain, which is standardized in a way that people can understand, which is product-specific so that we can really allocate embedded impact to a specific product and multiply that through then by our inventory, which we have data on. So for me, it’s really having a standardized way of communicating sustainability impact of production, upstream production, throughout the supply chain. Not only tier one, but all the way up to the cradle, the beginning of the value chain. So this is something-- and I know it is evolving and it will be slow, and it does need a lot of cooperation. But I do think that that would be very, very useful for really making our work more realistic, more representative. And then people can rely on it better when they start using our data in their product carbon footprints, for instance.
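The "multiply that through by our inventory" step can be sketched as a sum of quantity times product-specific emission factor. The material names, quantities, and factors below are placeholders, not data from any LCA database or from the imec model; the function name is likewise hypothetical.

```python
def embedded_impact(inventory_kg: dict[str, float],
                    factor_kgco2e_per_kg: dict[str, float]) -> float:
    """Sum quantity x emission factor over every material in the inventory.

    Raises KeyError if a material has no emission factor, which is the kind of
    data gap the answer above describes.
    """
    missing = sorted(set(inventory_kg) - set(factor_kgco2e_per_kg))
    if missing:
        raise KeyError(f"no emission factor for: {missing}")
    return sum(qty * factor_kgco2e_per_kg[name] for name, qty in inventory_kg.items())


# Placeholder inventory (kg used) and supplier-reported factors (kg CO2e per kg).
inventory = {"material_a": 2.0, "material_b": 0.5}
factors = {"material_a": 10.0, "material_b": 40.0}

total = embedded_impact(inventory, factors)
print(f"upstream embedded impact: {total} kg CO2e")
```

The calculation itself is trivial; the difficulty raised in the answer is obtaining standardized, product-specific factors for every entry, which is why the sketch fails loudly when one is missing.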

Moore: Okay. And speaking of sort of your work, can you tell me what imec.netzero is and how that works?

Boakes: Yeah. This is a web app that’s been developed in our program, so the SSTS program at IMEC. And this web app is a way for people to interact with the model that we’ve been building, the LCA model. So it’s based on life cycle assessment, and it’s really what we’ve been talking about with this cradle-to-gate model of the IC-chip-manufacturing process. It tries to model a generic fab. So we don’t necessarily point to any specific fab or process flow from a certain company. But we try to make a very generic industry average that people can use to estimate and get a more realistic view on the modern IC chip. Because we noticed that, in literature and what’s available in LCA databases, the semiconductor data is extremely old, and we know that this industry moves very quickly. So there is a huge gap between what’s happening now and what is going into your phones and what’s going into the computers and the LCA data that’s available to try to quantify that from a sustainability perspective. So imec.netzero, we work with all of-- we have the benefit of being connected with the industry and our position in IMEC, and we have a view on those more advanced technology nodes.

So not only do we have models for the nodes that are being generated and produced today, but we also predict the future nodes. And we have models to predict what will happen in 5 years’ time, in 10 years’ time. So it’s a really powerful tool, and it’s available publicly. We have a public version, which is a limited-- it has limited functionality in comparison to the program partner version. So we work with our program partners who have access to a much more complicated and, yeah, deep way of using the web app, as well as the other work that we do in our program. And our program partners also contribute data to the model, and we’re constantly evolving the model to improve always. So that’s a bit of an overview.

Moore: Cool. Cool. Thank you very much, Lizzie. I have been speaking to Lizzie Boakes, a life cycle analyst in the Sustainable Semiconductor Technologies and Systems Program at IMEC, the Belgium-based nanotech research organization. Thank you again, Lizzie. This has been fantastic.
