Semiconductor Engineering
Blog Review: Aug. 21Jesse Allen
Cadence’s Reela Samuel explores the critical role of PCIe 6.0 equalization in maintaining signal integrity and solutions to mitigate verification challenges, such as creating checkers to verify all symbols of TS0, ensuring the correct functioning of scrambling, and monitoring phase and LTSSM state transitions. Siemens’ John McMillan introduces an advanced packaging flow for Intel’s Embedded Multi-die Interconnect Bridge (EMIB) technology, including technical challenges, design methodologies, and
21. Srpen 2024 v 09:01

Blog Review: Aug. 21

Od: Jesse Allen

21. Srpen 2024 v 09:01

Cadence’s Reela Samuel explores the critical role of PCIe 6.0 equalization in maintaining signal integrity and solutions to mitigate verification challenges, such as creating checkers to verify all symbols of TS0, ensuring the correct functioning of scrambling, and monitoring phase and LTSSM state transitions.

Siemens’ John McMillan introduces an advanced packaging flow for Intel’s Embedded Multi-die Interconnect Bridge (EMIB) technology, including technical challenges, design methodologies, and the integration of EMIBs in system-level package designs.

Synopsys’ Dustin Todd checks out what’s next for the U.S. CHIPS and Science Act, including the establishment of the National Semiconductor Technology Center and the allocation of $13 billion for research and development efforts.

Keysight’s Roberto Piacentini Filho explores the challenges of managing the large design files and massive volumes of data involved in a modern chip design project, which can take up as much as a terabyte of disk space and involve hundreds of thousands of files.

Arm’s Sandeep Mistry shows how ML models developed for mobile computer vision applications and requiring tens to hundreds of millions of multiply-accumulate (MACs) operations per inference can be deployed to a modern microcontroller.

Ansys’ Aliyah Mallak explores an effort to manufacture biotech products in microgravity and how simulation helps ensure payloads containing delicate, temperature-sensitive spore samples and bioreactors make it safely to the International Space Station or low Earth orbit safely.

Micron Technology’s Amit Srivastava, ULVAC’s Brian Coppa, and SEMI’s Mark da Silva suggest tackling corporate sustainability goals with a bottom-up approach that leverages various sensing technologies, at the cleanroom, sub-fab, and facilities levels for both greenfield and brownfield device-making facilities, to enable predictive analytics.

And don’t miss the blogs featured in the latest Manufacturing, Packaging & Materials newsletter:

Amkor’s JeongMin Ju shows how to prevent critical failures in copper RDLs caused by overcurrent-induced fusing.

Synopsys’ Al Blais discusses curvilinear checking and fracture requirements for the MULTIGON era.

Lam Research’s Dempsey Deng compares the parasitic capacitance of a 6F² honeycomb DRAM device to a 4F² VCAT DRAM structure.

Brewer Science’s Jessica Albright covers debonding methods, thermal, topography, adhesion, and thickness variation.

SEMI’s John Cooney reviews a fireside chat between the President of SEMI Americas and the U.S. Under Secretary of State for Economic Growth, Energy, and the Environment on securing supply chains.

The post Blog Review: Aug. 21 appeared first on Semiconductor Engineering.

Semiconductor Engineering
AI/ML’s Role In Design And Test ExpandsLaura Peters
The role of AI and ML in test keeps growing, providing significant time and money savings that often exceed initial expectations. But it doesn’t work in all cases, sometimes even disrupting well-tested process flows with questionable return on investment. One of the big attractions of AI is its ability to apply analytics to large data sets that are otherwise limited by human capabilities. In the critical design-to-test realm, AI can address problems such as tool incompatibilities between the des
5. Srpen 2024 v 09:03

AI/ML’s Role In Design And Test Expands

Semiconductor Engineering

Od: Laura Peters

5. Srpen 2024 v 09:03

The role of AI and ML in test keeps growing, providing significant time and money savings that often exceed initial expectations. But it doesn’t work in all cases, sometimes even disrupting well-tested process flows with questionable return on investment.

One of the big attractions of AI is its ability to apply analytics to large data sets that are otherwise limited by human capabilities. In the critical design-to-test realm, AI can address problems such as tool incompatibilities between the design set-up, simulation, and ATE test program, which typically slows debugging and development efforts. Some of the most time-consuming and costly aspects of design-to-test arise from incompatibilities between tools.

“During device bring-up and debug, complex software/hardware interactions can expose the need for domain knowledge from multiple teams or stakeholders, who may not be familiar with each other’s tools,” said Richard Fanning, lead software engineer at Teradyne. “Any time spent doing conversions or debugging differences in these set-ups is time wasted. Our toolset targets this exact problem by allowing all set-ups to use the same set of source files so everyone can be sure they are running the same thing.”

ML/AI can help keep design teams on track, as well. “As we drive down this technology curve, the analytics and the compute infrastructure that we have to bring to bear becomes increasingly more complex and you want to be able to make the right decision with a minimal amount of overkill,” said Ken Butler, senior director of business development in the ACS data analytics platform group at Advantest. “In some cases, we are customizing the test solution on a die-by-die type of basis.”

But despite the hype, not all tools work well in every circumstance. “AI has some great capabilities, but it’s really just a tool,” said Ron Press, senior director of technology enablement at Siemens Digital Industries Software, in a recent presentation at a MEPTEC event. “We still need engineering innovation. So sometimes people write about how AI is going to take away everybody’s job. I don’t see that at all. We have more complex designs and scaling in our designs. We need to get the same work done even faster by using AI as a tool to get us there.”

Speeding design to characterization to first silicon
In the face of ever-shrinking process windows and the lowest allowable defectivity rates, chipmakers continually are improving the design-to-test processes to ensure maximum efficiency during device bring-up and into high volume manufacturing. “Analytics in test operations is not a new thing. This industry has a history of analyzing test data and making product decisions for more than 30 years,” said Advantest’s Butler. “What is different now is that we’re moving to increasingly smaller geometries, advanced packaging technologies and chiplet-based designs. And that’s driving us to change the nature of the type of analytics that we do, both in terms of the software and the hardware infrastructure. But from a production test viewpoint, we’re still kind of in the early days of our journey with AI and test.”

Nonetheless, early adopters are building out the infrastructure needed for in-line compute and AI/ML modeling to support real-time inferencing in test cells. And because no one company has all the expertise needed in-house, partnerships and libraries of applications are being developed with tool-to-tool compatibility in mind.

“Protocol libraries provide out-of-the-box solutions for communicating common protocols. This reduces the development and debug effort for device communication,” said Teradyne’s Fanning. “We have seen situations where a test engineer has been tasked with talking to a new protocol interface, and saved significant time using this feature.”

In fact, data compatibility is a consistent theme, from design all the way through to the latest developments in ATE hardware and software. “Using the same test sequences between characterization and production has become key as the device complexity has increased exponentially,” explained Teradyne’s Fanning. “Partnerships with EDA tool and IP vendors is also key. We have worked extensively with industry leaders to ensure that the libraries and test files they output are formats our system can utilize directly. These tools also have device knowledge that our toolset does not. This is why the remote connect feature is key, because our partners can provide context-specific tools that are powerful during production debug. Being able to use these tools real-time without having to reproduce a setup or use case in a different environment has been a game changer.”

Serial scan test
But if it seems as if all the configuration changes are happening on the test side, it’s important to take stock of substantial changes on the approach to multi-core design for test.

Tradeoffs during the iterative process of design for test (DFT) have become so substantial in the case of multi-core products that a new approach has become necessary.

“If we look at the way a design is typically put together today, you have multiple cores that are going to be produced at different times,” said Siemens’ Press. “You need to have an idea of how many I/O pins you need to get your scan channels, the deep serial memory from the tester that’s going to be feeding through your I/O pins to this core. So I have a bunch of variables I need to trade off. I have the number of pins going to the core, the pattern size, and the complexity of the core. Then I’ll try to figure out what’s the best combination of cores to test together in what is called hierarchical DFT. But as these designs get more complex, with upwards of 2,500 cores, that’s a lot of tradeoffs to figure out.”

Press noted that applying AI with the same architecture can provide a 20% to 30% higher efficiency, but an improved methodology based on packetized scan test (see figure 1) actually makes more sense.

Fig. 1: Advantages to the serial scan network (SSN) approach. Source: Siemens

“Instead of having tester channels feeding into the scan channels that go to each core, you have a packetized bus and packets of data that feed through all the cores. Then you instruct the cores when their packet information is going to be available. By doing this, you don’t have as many variables you need to trade off,” he said. At the core level, each core can be optimized for any number of scan channels and patterns, and the I/O pin count is no longer a variable in the calculation. “Then, when you put it into this final chip, it deliver from the packets the amount of data you need for that core, that can work with any size serial bus, in what is called a serial scan network (SSN).”

Some of the results reported by Siemens EDA customers (see figure 2) highlight both supervised and unsupervised machine learning implementation for improvements in diagnosis resolution and failure analysis. DFT productivity was boosted by 5 to 10X using the serial scan network methodology.

Fig. 2: Realized benefits using machine learning and the serial scan network approach. Source: Siemens

What slows down AI implementation in HVM?
In the transition from design to testing of a device, the application of machine learning algorithms can enable a number of advantages, from better pairing of chiplet performance for use in an advanced package to test time reduction. For example, only a subset of high-performing devices may require burn-in.

“You can identify scratches on wafers, and then bin out the dies surrounding those scratches automatically within wafer sort,” said Michael Schuldenfrei, fellow at NI/Emerson Test & Measurement. “So AI and ML all sounds like a really great idea, and there are many applications where it makes sense to use AI. The big question is, why isn’t it really happening frequently and at-scale? The answer to that goes into the complexity of building and deploying these solutions.”

Schuldenfrei summarized four key steps in ML’s lifecycle, each with its own challenges. In the first phase, the training, engineering teams use data to understand a particular issue and then build a model that can be used to predict an outcome associated with that issue. Once the model is validated and the team wants to deploy it in the production environment, it needs to be integrated with the existing equipment, such as a tester or manufacturing execution system (MES). Models also mature and evolve over time, requiring frequent validation of the data going into the model and checking to see that the model is functioning as expected. Models also must adapt, requiring redeployment, learning, acting, validating and adapting, in a continuous circle.

“That eats up a lot of time for the data scientists who are charged with deploying all these new AI-based solutions in their organizations. Time is also wasted in the beginning when they are trying to access the right data, organizing it, connecting it all together, making sense of it, and extracting features from it that actually make sense,” said Schuldenfrei.

Further difficulties are introduced in a distributed semiconductor manufacturing environment in which many different test houses are situated in various locations around the globe. “By the time you finish implementing the ML solution, your model is stale and your product is probably no longer bleeding edge so it has lost its actionability, when the model needs to make a decision that actually impacts either the binning or the processing of that particular device,” said Schuldenfrei. “So actually deploying ML-based solutions in a production environment with high-volume semiconductor test is very far from trivial.”

He cited a 2014 Google article that stated how the ML code development part of the process is both the smallest and easiest part of the whole exercise, [1] whereas the various aspects of building infrastructure, data collection, feature extraction, data verification, and managing model deployments are the most challenging parts.

Changes from design through test ripple through the ecosystem. “People who work in EDA put lots of effort into design rule checking (DRC), meaning we’re checking that the work we’ve done and the design structure are safe to move forward because we didn’t mess anything up in the process,” said Siemens’ Press. “That’s really important with AI — what we call verifiability. If we have some type of AI running and giving us a result, we have to make sure that result is safe. This really affects the people doing the design, the DFT group and the people in test engineering that have to take these patterns and apply them.”

There are a multitude of ML-based applications for improving test operations. Advantest’s Butler highlighted some of the apps customers are pursuing most often, including search time reduction, shift left testing, test time reduction, and chiplet pairing (see figure 3).

“For minimum voltage, maximum frequency, or trim tests, you tend to set a lower limit and an upper limit for your search, and then you’re going to search across there in order to be able to find your minimum voltage for this particular device,” he said. “Those limits are set based on process split, and they may be fairly wide. But if you have analytics that you can bring to bear, then the AI- or ML-type techniques can basically tell you where this die lies on the process spectrum. Perhaps it was fed forward from an earlier insertion, and perhaps you combine it with what you’re doing at the current insertion. That kind of inference can help you narrow the search limits and speed up that test. A lot of people are very interested in this application, and some folks are doing it in production to reduce search time for test time-intensive tests.”

Fig. 3: Opportunities for real-time and/or post-test improvements to pair or bin devices, improve yield, throughput, reliability or cost using the ACS platform. Source: Advantest

“The idea behind shift left is perhaps I have a very expensive test insertion downstream or a high package cost,” Butler said. “If my yield is not where I want it to be, then I can use analytics at earlier insertions to be able to try to predict which devices are likely to fail at the later insertion by doing analysis at an earlier insertion, and then downgrade or scrap those die in order to optimize downstream test insertions, raising the yield and lowering overall cost. Test time reduction is very simply the addition or removal of test content, skipping tests to reduce cost. Or you might want to add test content for yield improvement,” said Butler.

“If I have a multi-tiered device, and it’s not going to pass bin 1 criteria – but maybe it’s bin 2 if I add some additional content — then people may be looking at analytics to try to make those decisions. Finally, two things go together in my mind, this idea of chiplet designs and smart pairing. So the classic example is a processor die with a stack of high bandwidth memory on top of it. Perhaps I’m interested in high performance in some applications and low power in others. I want to be able to match the content and classify die as they’re coming through the test operation, and then downstream do pick-and-place and put them together in such a way that I maximize the yield for multiple streams of data. Similar kinds of things apply for achieving a low power footprint and carbon footprint.”

Generative AI
The question that inevitably comes up when discussing the role of AI in semiconductors is whether or not large language models like ChatGPT can prove useful to engineers working in fabs. Early work shows some promise.

“For example, you can ask the system to build an outlier detection model for you that looks for parts that are five sigma away from the center line, saying ‘Please create the script for me,’ and the system will create the script. These are the kinds of automated, generative AI-based solutions that we’re already playing with,” says Schuldenfrei. “But from everything I’ve seen so far, there is still quite a lot of work to be done to get these systems to provide outputs with high enough quality. At the moment, the amount of human interaction that is needed afterward to fix problems with the algorithms or models that generative AI is producing is still quite significant.”

A lingering question is how to access the test programs needed to train the new test programs when everyone is protecting important test IP? “Most people value their test IP and don’t necessarily want to set up guardrails around the training and utilization processes,” Butler said. “So finding a way to accelerate the overall process of developing test programs while protecting IP is the challenge. It’s clear this kind of technology is going to be brought to bear, just like we already see in the software development process.”

Failure analysis
Failure analysis is typically a costly and time-consuming endeavor for fabs because it requires a trip back in time to gather wafer processing, assembly, and packaging data specific to a particular failed device, known as a returned material authorization (RMA). Physical failure analysis is performed in an FA lab, using a variety of tools to trace the root cause of the failure.

While scan diagnostic data has been used for decades, a newer approach involves pairing a digital twin with scan diagnostics data to find the root cause of failures.

“Within test, we have a digital twin that does root cause deconvolution based on scan failure diagnosis. So instead of having to look at the physical device and spend time trying to figure out the root cause, since we have scan, we have millions and millions of virtual sample points,” said Siemens’ Press. “We can reverse-engineer what we did to create the patterns and figure out where the mis-compare happened within the scan cells deep within the design. Using YieldInsight and unsupervised machine learning with training on a bunch of data, we can very quickly pinpoint the fail locations. This allows us to run thousands, or tens of thousands fail diagnoses in a short period of time, giving us the opportunity to identify the systematic yield limiters.”

Yet another approach that is gaining steam is using on-die monitors to access specific performance information in lieu of physical FA. “What is needed is deep data from inside the package to monitor performance and reliability continuously, which is what we provide,” said Alex Burlak, vice president of test and analytics at proteanTecs. “For example, if the suspected failure is from the chiplet interconnect, we can help the analysis using deep data coming from on-chip agents instead of taking the device out of context and into the lab (where you may or may not be able to reproduce the problem). Even more, the ability to send back data and not the device can in many cases pinpoint the problem, saving the expensive RMA and failure analysis procedure.”

Conclusion
The enthusiasm around AI and machine learning is being met by robust infrastructure changes in the ATE community to accommodate the need for real-time inferencing of test data and test optimization for higher yield, higher throughput, and chiplet classifications for multi-chiplet packages. For multi-core designs, packetized test, commercialized as an SSN methodology, provides a more flexible approach to optimizing each core for the number of scan chains, patterns and bus width needs of each core in a device.

The number of testing applications that can benefit from AI continues to rise, including test time reduction, V_min/F_max search reduction, shift left, smart pairing of chiplets, and overall power reduction. New developments like identical source files for all setups across design, characterization, and test help speed the critical debug and development stage for new products.

Reference

https://proceedings.neurips.cc/paper_files/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf

The post AI/ML’s Role In Design And Test Expands appeared first on Semiconductor Engineering.

Semiconductor Engineering
Ensure Reliability In Automotive ICs By Reducing Thermal EffectsLee Wang
In the relentless pursuit of performance and miniaturization, the semiconductor industry has increasingly turned to 3D integrated circuits (3D-ICs) as a cutting-edge solution. Stacking dies in a 3D assembly offers numerous benefits, including enhanced performance, reduced power consumption, and more efficient use of space. However, this advanced technology also introduces significant thermal dissipation challenges that can impact the electrical behavior, reliability, performance, and lifespan of
5. Srpen 2024 v 09:01

Ensure Reliability In Automotive ICs By Reducing Thermal Effects

Semiconductor Engineering

Od: Lee Wang

5. Srpen 2024 v 09:01

In the relentless pursuit of performance and miniaturization, the semiconductor industry has increasingly turned to 3D integrated circuits (3D-ICs) as a cutting-edge solution. Stacking dies in a 3D assembly offers numerous benefits, including enhanced performance, reduced power consumption, and more efficient use of space. However, this advanced technology also introduces significant thermal dissipation challenges that can impact the electrical behavior, reliability, performance, and lifespan of the chips (figure 1). For automotive applications, where safety and reliability are paramount, managing these thermal effects is of utmost importance.

Fig. 1: Illustration of a 3D-IC with heat dissipation.

3D-ICs have become particularly attractive for safety-critical devices like automotive sensors. Advanced driver-assistance systems (ADAS) and autonomous vehicles (AVs) rely on these compact, high-performance chips to process vast amounts of sensor data in real time. Effective thermal management in these devices is a top priority to ensure that they function reliably under various operating conditions.

The thermal challenges of 3D-ICs in automotive applications

The stacked configuration of 3D-ICs inherently leads to complex thermal dynamics. In traditional 2D designs, heat dissipation occurs across a single plane, making it relatively straightforward to manage. However, in 3D-ICs, multiple active layers generate heat, creating significant thermal gradients and hotspots. These thermal issues can adversely affect device performance and reliability, which is particularly critical in automotive applications where components must operate reliably under extreme temperatures and harsh conditions.

These thermal effects in automotive 3D-ICs can impact the electrical behavior of the circuits, causing timing errors, increased leakage currents, and potential device failure. Therefore, accurate and comprehensive thermal analysis throughout the design flow is essential to ensure the reliability and performance of automotive ICs.

The importance of early and continuous thermal analysis

Traditionally, thermal analysis has been performed at the package and system levels, often as a separate process from IC design. However, with the advent of 3D-ICs, this approach is no longer sufficient.

To address the thermal challenges of 3D-ICs for automotive applications, it is crucial to incorporate die-level thermal analysis early in the design process and continue it throughout the design flow (figure 2). Early-stage thermal analysis can help identify potential hotspots and thermal bottlenecks before they become critical issues, enabling designers to make informed decisions about chiplet placement, power distribution, and cooling strategies. These early decisions reduce the risks of thermal-induced failures, improving the reliability of 3D automotive ICs.

Fig. 2: Die-level detailed thermal analysis using accurate package and boundary conditions should be fully integrated into the ASIC design flow to allow for fast thermal exploration.

Early package design, floorplanning and thermal feasibility analysis

During the initial package design and floorplanning stage, designers can use high-level power estimates and simplified models to perform thermal feasibility studies. These early analyses help identify configurations that are likely to cause thermal problems, allowing designers to rule out problematic designs before investing significant time and resources in detailed implementation.

Fig. 3: Thermal analysis as part of the package design, floorplanning and implementation flows.

For example, thermal analysis can reveal issues such as overlapping heat sources in stacked dies or insufficient cooling paths. By identifying these problems early, designers can explore alternative floorplans and adjust power distribution to mitigate thermal risks. This proactive approach reduces the likelihood of encountering critical thermal issues late in the design process, thereby shortening the overall design cycle.

Iterative thermal analysis throughout design refinement

As the design progresses and more detailed information becomes available, thermal analysis should be performed iteratively to refine the thermal model and verify that the design remains within acceptable thermal limits. At each stage of design refinement, additional details such as power maps, layout geometries and their material properties can be incorporated into the thermal model to improve accuracy.

This iterative approach lets designers continuously monitor and address thermal issues, ensuring that the design evolves in a thermally aware manner. By integrating thermal analysis with other design verification tasks, such as timing and power analysis, designers can achieve a holistic view of the design’s performance and reliability.

A robust thermal analysis tool should support various stages of the design process, providing value from initial concept to final signoff:

Early design planning: At the conceptual stage, designers can apply high-level power estimates to explore the thermal impact of different design options. This includes decisions related to 3D partitioning, die assembly, block and TSV floorplan, interface layer design, and package selection. By identifying potential thermal issues early, designers can make informed decisions that avoid costly redesigns later.
Detailed design and implementation: As designs become more detailed, thermal analysis should be used to verify that the design stays within its thermal budget. This involves analyzing the maturing package and die layout representations to account for their impact on thermally sensitive electrical circuits. Fine-grained power maps are crucial at this stage to capture hotspot effects accurately.
Design signoff: Before finalizing the design, it is essential to perform comprehensive thermal verification. This ensures that the design meets all thermal constraints and reliability requirements. Automated constraints checking and detailed reporting can expedite this process, providing designers with clear insights into any remaining thermal issues.
Connection to package-system analysis: Models from IC-level thermal analysis can be used in thermal analysis of the package and system. The integration lets designers build a streamlined flow through the entire development process of a 3D electronic product.

Tools and techniques for accurate thermal analysis

To effectively manage thermal challenges in automotive ICs, designers need advanced tools and techniques that can provide accurate and fast thermal analysis throughout the design flow. Modern thermal analysis tools are equipped with capabilities to handle the complexity of 3D-IC designs, from early feasibility studies to final signoff.

High-fidelity thermal models

Accurate thermal analysis requires high-fidelity thermal models that capture the intricate details of the 3D-IC assembly. These models should account for non-uniform material properties, fine-grained power distributions, and the thermal impact of through-silicon vias (TSVs) and other 3D features. Advanced tools can generate detailed thermal models based on the actual design geometries, providing a realistic representation of heat flow and temperature distribution.

For instance, tools like Calibre 3DThermal embeds an optimized custom 3D solver from Simcenter Flotherm to perform precise thermal analysis down to the nanometer scale. By leveraging detailed layer information and accurate boundary conditions, these tools can produce reliable thermal models that reflect the true thermal behavior of the design.

Automation and results viewing

Automation is a key feature of modern thermal analysis tools, enabling designers to perform complex analyses without requiring deep expertise in thermal engineering. An effective thermal analysis tool must offer advanced automation to facilitate use by non-experts. Key automation features include:

Optimized gridding: Automatically applying finer grids in critical areas of the model to ensure high resolution where needed, while using coarser grids elsewhere for efficiency.
Time step automation: In transient analysis, smaller time steps can be automatically generated during power transitions to capture key impacts accurately.
Equivalent thermal properties: Automatically reducing model complexity while maintaining accuracy by applying different bin sizes for critical (hotspot) vs non-critical regions when generating equivalent thermal properties.
Power map compression: Using adaptive bin sizes to compress very large power maps to improve tool performance.

Automated reporting: Generating summary reports that highlight key results for easy review and decision-making (figure 4).

Fig. 4: Ways to view thermal analysis results.

Automated thermal analysis tools can also integrate seamlessly with other design verification and implementation tools, providing a unified environment for managing thermal, electrical, and mechanical constraints. This integration ensures that thermal considerations are consistently addressed throughout the design flow, from initial feasibility analysis to final tape-out and even connecting with package-level analysis tools.

Real-world application

The practical benefits of integrated thermal analysis solutions are evident in real-world applications. For instance, a leading research organization, CEA, utilized an advanced thermal analysis tool from Siemens EDA to study the thermal performance of their 3DNoC demonstrator. The high-fidelity thermal model they developed showed a worst-case difference of just 3.75% and an average difference within 2% between simulation and measured data, demonstrating the accuracy and reliability of the tool (figure 5).

Fig. 5: Correlation of simulation versus measured results.

The path forward for automotive 3D-IC thermal management

As the automotive industry continues to embrace advanced technologies, the importance of accurate thermal analysis throughout the design flow of 3D-ICs cannot be overstated. By incorporating thermal analysis early in the design process and iteratively refining thermal models, designers can mitigate thermal risks, reduce design time, and enhance chip reliability.

Advanced thermal analysis tools that integrate seamlessly with the broader design environment are essential for achieving these goals. These tools enable designers to perform high-fidelity thermal analysis, automate complex tasks, and ensure that thermal considerations are addressed consistently from package design, through implementation to signoff.

By embracing these practices, designers can unlock the full potential of 3D-IC technology, delivering innovative, high-performance devices that meet the demands of today’s increasingly complex automotive applications.

For more information about die-level 3D-IC thermal analysis, read Conquer 3DIC thermal impacts with Calibre 3DThermal.

The post Ensure Reliability In Automotive ICs By Reducing Thermal Effects appeared first on Semiconductor Engineering.

Semiconductor Engineering
Chip Industry Week in ReviewThe SE Staff
Okinawa Institute of Science and Technology proposed a new EUV litho technology using only four reflective mirrors and a new method of illumination optics that it claims will use 1/10 the power and cost half as much as existing EUV technology from ASML. Applied Materials may not receive expected U.S. funding to build a $4 billion research facility in Sunnyvale, CA, due to internal government disagreements over how to fund chip R&D, according to Bloomberg. SEMI published a position paper this
2. Srpen 2024 v 09:01

Chip Industry Week in Review

Semiconductor Engineering

Od: The SE Staff

2. Srpen 2024 v 09:01

Okinawa Institute of Science and Technology proposed a new EUV litho technology using only four reflective mirrors and a new method of illumination optics that it claims will use 1/10 the power and cost half as much as existing EUV technology from ASML.

Applied Materials may not receive expected U.S. funding to build a $4 billion research facility in Sunnyvale, CA, due to internal government disagreements over how to fund chip R&D, according to Bloomberg.

SEMI published a position paper this week cautioning the European Union against imposing additional export controls to allow companies, encouraging them to be “as free as possible in their investment decisions to avoid losing their agility and relevance across global markets.” SEMI’s recommendations on outbound investments are in response to the European Economic Security Strategy and emphasize the need for a transparent and predictable regulatory framework.

The U.S. may restrict China’s access to HBM chips and the equipment needed to make them, reports Bloomberg. Today those chips are manufactured by two Korean-based companies, Samsung and SK hynix, but U.S.-based Micron expects to begin shipping 12-high stacks of HBM3E in 2025, and is currently working on HBM4.

Synopsys executive chair and founder Dr. Aart de Geus was named the winner of the Semiconductor Industry Association’s Robert N. Noyce Award. De Geus was selected due to his contributions to EDA technology over a career spanning more than four decades.

The top three foundries plan to implement high-NA EUV lithography as early as 2025 for the 18 angstrom generation, but the replacement of single exposure high-NA (0.55) over double patterning with standard EUV (NA = 0.33) depends on whether it provides better results at a reasonable cost per wafer.

Quick links to more news:

Global
In-Depth
Market Reports and Earnings
Education and Training
Security
Product News
Research
Events and Further Reading

Global

Belgium-based Imec released part 2 of its chiplets series, addressing testing strategies and standardization efforts, as well as guidelines and research “towards efficient ESD protection strategies for advanced 3D systems-on-chip.”

Also in Belgium, BelGan, maker of GaN chips, filed for bankruptcy according to the Brussels Times.

TSMC‘s Dresden, Germany, plant will break ground this month.

The UK will dole out more than £100 million (~US $128 million) in funding to develop five new quantum research hubs in Glasgow, Edinburgh, Birmingham, Oxford, and London.

MassPhoton is opening Hong Kong‘s first ultra-high vacuum GaN epitaxial wafer pilot line and will establish a GaN research center.

Infineon completed the sale of its manufacturing sites in the Philippines and South Korea to ASE.

Israel-based RAAAM Memory Technologies received a €5.25 million grant from the European Innovation Council (EIC) to support the development and commercialization of its innovative memory solutions. This funding will enable RAAAM to advance its research in high-performance and energy-efficient memory technologies, accelerating their integration into various applications and markets.

In-Depth

Semiconductor Engineering published its Automotive, Security and Pervasive Computing newsletter this week, featuring these top stories and video:

Chip Security Now Depends On Widening Supply Chain
Defining Chip Threat Models To Determining Security Risks
Post-Quantum Computing Threatens Fundamental Transport Protocols

And:

Where Power Savings Really Count
Making Electronics More Efficient (video)

Market Reports and Earnings

The semiconductor equipment industry is on a positive trajectory in 2024, with moderate revenue growth observed in Q2 after a subdued Q1, according to a new report from Yole Group. Wafer Fab Equipment revenue is projected to grow by 1.3% year-on-year, despite a 12% drop in Q1. Test equipment lead times are normalizing, improving order conditions. Key areas driving growth include memory and logic capital expenditures and high-bandwidth memory demand.

Worldwide silicon wafer shipments increased by 7% in Q2 2024, according to SEMI‘s latest report. This growth is attributed to robust demand from multiple semiconductor sectors, driven by advancements in AI, 5G, and automotive technologies.

The RF GaN market is projected to grow to US $2 billion by 2029, a 10% CAGR, according to Yole Group.

Counterpoint released their Q2 smartphone top 10 report.

Renesas completed their acquisition of EDA firm Altium, best known for its EDA platform and freeware CircuitMaker package.

It’s earnings season and here are recently released financials in the chip industry:

AMD Advantest Amkor Ansys Arteris Arm ASE ASM ASML
Cadence IBM Intel Lam Research Lattice Nordson   NXP   Onsemi
Qualcomm   Rambus Samsung SK Hynix   STMicro   Teradyne TI
Tower  TSMC UMC Western Digital

Industry stock price impacts are here.

Education and Training

Rochester Institute of Technology is leading a new pilot program to prepare community college students in areas such as cleanroom operations, new materials, simulation, and testing processes, with the intent of eventual transfer into RIT’s microelectronic engineering program.

Purdue University inked a deal with three research institutions — University of Piraeus, Technical University of Crete, and King’s College London —to develop joint research programs for semiconductors, AI and other critical technology fields.

The European Chips Skills Academy formed the Educational Leaders Board to help bridge the talent gap in Europe’s microelectronics sector. The Board includes representatives from universities, vocational training providers, educators and research institutions who collaborate on strategic initiatives to strengthen university networks and build academic expertise through ECSA training programs.

Security

The Cybersecurity and Infrastructure Security Agency (CISA) is encouraging Apple users to review and apply this week’s recent security updates.

Microsoft Azure experienced a nearly 10 hour DDoS attack this week, leading to global service disruption for many customers. “While the initial trigger event was a Distributed Denial-of-Service (DDoS) attack, which activated our DDoS protection mechanisms, initial investigations suggest that an error in the implementation of our defenses amplified the impact of the attack rather than mitigating it,” stated Microsoft in a release.

NIST published:

“Recommendations For Increasing U.S. Participation and Leadership in Standards Development,” a report outlining cybersecurity recommendations and mitigation strategies.
Final guidance documents and software to help improve the “safety, security and trustworthiness of AI systems.”
Cloud Computing Forensic Reference Architecture guide.

Delta Air Lines plans to seek damages after losing $500 million in lost revenue due to security company CrowdStrike‘s software update debacle. And shareholders are also angry.

Recent security research:

Physically Secure Logic Locking With Nanomagnet Logic (UT Dallas)
WBP: Training-time Backdoor Attacks through HW-based Weight Bit Poisoning (UCF)
S-Tune: SOT-MTJ Manufacturing Parameters Tuning for Secure Next Generation of Computing ( U. of Arizona, UCF)
Diffie Hellman Picture Show: Key Exchange Stories from Commercial VoWiFi Deployments (CISPA, SBA Research, U. of Vienna)

Product News

Lam Research introduced a new version of its cryogenic etch technology designed to enhance the manufacturing of 3D NAND for AI applications. This technology allows for the precise etching of high aspect ratio features, crucial for creating 1,000-layer 3D NAND.

Fig.1: 3D NAND etch. Source: Lam Research

Alphawave Semi launched its Universal Chiplet Interconnect Express Die-to-Die IP. The subsystem offers 8 Tbps/mm bandwidth density and supports operation at 24 Gbps for D2D connectivity.

Infineon introduced a new MCU series for industrial and consumer motor controls, as well as power conversion system applications. The company also unveiled its new GoolGaN Drive product family of integrated single switches and half-bridges with integrated drivers.

Rambus released its DDR5 Client Clock Driver for next-gen, high-performance desktops and notebooks. The chips include Gen1 to Gen4 RCDs, power management ICs, Serial Presence Detect Hubs, and temperature sensors for leading-edge servers.

SK hynix introduced its new GDDR7 graphics DRAM. The product has an operating speed of 32Gbps, can process 1.5TB of data per second and has a 50% power efficiency improvement compared to the previous generation.

Intel launched its new Lunar Lake Ultra processors. The long awaited chips will be included in more than 80 laptop designs and has more than 40 NPU tera operations per second as well as over 60 GPU TOPS delivering more than 100 platform TOPS.

Brewer Science achieved recertification as a Certified B Corporation, reaffirming its commitment to sustainable and ethical business practices.

Panasonic adopted Siemens’ Teamcenter X cloud product lifecycle management solution, citing Teamcenter X’s Mendix low-code platform, improved operational efficiency and flexibility for its choice.

Keysight validated its 5G NR FR1 1024-QAM demodulation test cases for the first time. The 5G NR radio access technology supports eMBB and was validated on the 3GPP TS 38.521-4 test specification.

Research

In a 47-page deep-dive report, the Center for Security and Emerging Technology delved into all of the scientific breakthroughs from 1980 to present that brought EUV lithography to commercialization, including lessons learned for the next emerging technologies.

Researchers at the Paul Scherrer Institute developed a high-performance X-ray tomography technique using burst ptychography, achieving a resolution of 4nm. This method allows for non-destructive imaging of integrated circuits, providing detailed views of nanostructures in materials like silicon and metals.

MIT signed a four-year agreement with the Novo Nordisk Foundation Quantum Computing Programme at University of Copenhagen, focused on accelerating quantum computing hardware research.

MIT’s Research Laboratory of Electronics (RLE) developed a mechanically flexible wafer-scale integrated photonics fabrication platform. This enables the creation of flexible photonic circuits that maintain high performance while being bendable and stretchable. It offers significant potential for integrating photonic circuits into various flexible substrate applications in wearable technology, medical devices, and flexible electronics.

The Naval Research Lab identified a new class of semiconductor nanocrystals with bright ground-state excitons, emphasizing an important advancement in optoelectronics.

Researchers from National University of Singapore developed a novel method, known as tension-driven CHARM3D, to fabricate 3D self-healing circuits, enabling the 3D printing of free-standing metallic structures without the need for support materials and external pressure.

Find more research in our Technical Papers library.

Events and Further Reading

Find upcoming chip industry events here, including:

Event	Date	Location
Atomic Layer Deposition (ALD 2024)	Aug 4 – 7	Helsinki
Flash Memory Summit	Aug 6 – 8	Santa Clara, CA
USENIX Security Symposium	Aug 14 – 16	Philadelphia, PA
SPIE Optics + Photonics 2024	Aug 18 – 22	San Diego, CA
Cadence Cloud Tech Day	Aug 20	San Jose, CA
Hot Chips 2024	Aug 25- 27	Stanford University/ Hybrid
Optica Online Industry Meeting: PIC Manufacturing, Packaging and Testing (imec)	Aug 27	Online
SEMICON Taiwan	Sep 4 -6	Taipei
DVCON Taiwan	Sep 10 – 11	Hsinchu
AI HW and Edge AI Summit	Sep 9 – 12	San Jose, CA
GSA Executive Forum	Sep 26	Menlo Park, CA
SPIE Photomask Technology + EUVL	Sep 29 – Oct 3	Monterey, CA
Strategic Materials Conference: SMC 2024	Sep 30 – Oct 2	San Jose, CA

Find All Upcoming Events Here

Upcoming webinars are here, including topics such as quantum safe cryptography, analytics for high-volume manufacturing, and mastering EMC simulations for electronic design.

Find Semiconductor Engineering’s latest newsletters here:

Automotive, Security and Pervasive Computing
Systems and Design
Low Power-High Performance
Test, Measurement and Analytics
Manufacturing, Packaging and Materials

The post Chip Industry Week in Review appeared first on Semiconductor Engineering.

Semiconductor Engineering
Chip Industry Week in ReviewThe SE Staff
Okinawa Institute of Science and Technology proposed a new EUV litho technology using only four reflective mirrors and a new method of illumination optics that it claims will use 1/10 the power and cost half as much as existing EUV technology from ASML. Applied Materials may not receive expected U.S. funding to build a $4 billion research facility in Sunnyvale, CA, due to internal government disagreements over how to fund chip R&D, according to Bloomberg. SEMI published a position paper this
2. Srpen 2024 v 09:01

Chip Industry Week in Review

Semiconductor Engineering

Od: The SE Staff

2. Srpen 2024 v 09:01

Okinawa Institute of Science and Technology proposed a new EUV litho technology using only four reflective mirrors and a new method of illumination optics that it claims will use 1/10 the power and cost half as much as existing EUV technology from ASML.

Applied Materials may not receive expected U.S. funding to build a $4 billion research facility in Sunnyvale, CA, due to internal government disagreements over how to fund chip R&D, according to Bloomberg.

SEMI published a position paper this week cautioning the European Union against imposing additional export controls to allow companies, encouraging them to be “as free as possible in their investment decisions to avoid losing their agility and relevance across global markets.” SEMI’s recommendations on outbound investments are in response to the European Economic Security Strategy and emphasize the need for a transparent and predictable regulatory framework.

The U.S. may restrict China’s access to HBM chips and the equipment needed to make them, reports Bloomberg. Today those chips are manufactured by two Korean-based companies, Samsung and SK hynix, but U.S.-based Micron expects to begin shipping 12-high stacks of HBM3E in 2025, and is currently working on HBM4.

Synopsys executive chair and founder Dr. Aart de Geus was named the winner of the Semiconductor Industry Association’s Robert N. Noyce Award. De Geus was selected due to his contributions to EDA technology over a career spanning more than four decades.

The top three foundries plan to implement high-NA EUV lithography as early as 2025 for the 18 angstrom generation, but the replacement of single exposure high-NA (0.55) over double patterning with standard EUV (NA = 0.33) depends on whether it provides better results at a reasonable cost per wafer.

Quick links to more news:

Global
In-Depth
Market Reports and Earnings
Education and Training
Security
Product News
Research
Events and Further Reading

Global

Belgium-based Imec released part 2 of its chiplets series, addressing testing strategies and standardization efforts, as well as guidelines and research “towards efficient ESD protection strategies for advanced 3D systems-on-chip.”

Also in Belgium, BelGan, maker of GaN chips, filed for bankruptcy according to the Brussels Times.

TSMC‘s Dresden, Germany, plant will break ground this month.

The UK will dole out more than £100 million (~US $128 million) in funding to develop five new quantum research hubs in Glasgow, Edinburgh, Birmingham, Oxford, and London.

MassPhoton is opening Hong Kong‘s first ultra-high vacuum GaN epitaxial wafer pilot line and will establish a GaN research center.

Infineon completed the sale of its manufacturing sites in the Philippines and South Korea to ASE.

Israel-based RAAAM Memory Technologies received a €5.25 million grant from the European Innovation Council (EIC) to support the development and commercialization of its innovative memory solutions. This funding will enable RAAAM to advance its research in high-performance and energy-efficient memory technologies, accelerating their integration into various applications and markets.

In-Depth

Semiconductor Engineering published its Automotive, Security and Pervasive Computing newsletter this week, featuring these top stories and video:

Chip Security Now Depends On Widening Supply Chain
Defining Chip Threat Models To Determining Security Risks
Post-Quantum Computing Threatens Fundamental Transport Protocols

And:

Where Power Savings Really Count
Making Electronics More Efficient (video)

Market Reports and Earnings

The semiconductor equipment industry is on a positive trajectory in 2024, with moderate revenue growth observed in Q2 after a subdued Q1, according to a new report from Yole Group. Wafer Fab Equipment revenue is projected to grow by 1.3% year-on-year, despite a 12% drop in Q1. Test equipment lead times are normalizing, improving order conditions. Key areas driving growth include memory and logic capital expenditures and high-bandwidth memory demand.

Worldwide silicon wafer shipments increased by 7% in Q2 2024, according to SEMI‘s latest report. This growth is attributed to robust demand from multiple semiconductor sectors, driven by advancements in AI, 5G, and automotive technologies.

The RF GaN market is projected to grow to US $2 billion by 2029, a 10% CAGR, according to Yole Group.

Counterpoint released their Q2 smartphone top 10 report.

Renesas completed their acquisition of EDA firm Altium, best known for its EDA platform and freeware CircuitMaker package.

It’s earnings season and here are recently released financials in the chip industry:

AMD Advantest Amkor Ansys Arteris Arm ASE ASM ASML
Cadence IBM Intel Lam Research Lattice Nordson   NXP   Onsemi
Qualcomm   Rambus Samsung SK Hynix   STMicro   Teradyne TI
Tower  TSMC UMC Western Digital

Industry stock price impacts are here.

Education and Training

Rochester Institute of Technology is leading a new pilot program to prepare community college students in areas such as cleanroom operations, new materials, simulation, and testing processes, with the intent of eventual transfer into RIT’s microelectronic engineering program.

Purdue University inked a deal with three research institutions — University of Piraeus, Technical University of Crete, and King’s College London —to develop joint research programs for semiconductors, AI and other critical technology fields.

The European Chips Skills Academy formed the Educational Leaders Board to help bridge the talent gap in Europe’s microelectronics sector. The Board includes representatives from universities, vocational training providers, educators and research institutions who collaborate on strategic initiatives to strengthen university networks and build academic expertise through ECSA training programs.

Security

The Cybersecurity and Infrastructure Security Agency (CISA) is encouraging Apple users to review and apply this week’s recent security updates.

Microsoft Azure experienced a nearly 10 hour DDoS attack this week, leading to global service disruption for many customers. “While the initial trigger event was a Distributed Denial-of-Service (DDoS) attack, which activated our DDoS protection mechanisms, initial investigations suggest that an error in the implementation of our defenses amplified the impact of the attack rather than mitigating it,” stated Microsoft in a release.

NIST published:

“Recommendations For Increasing U.S. Participation and Leadership in Standards Development,” a report outlining cybersecurity recommendations and mitigation strategies.
Final guidance documents and software to help improve the “safety, security and trustworthiness of AI systems.”
Cloud Computing Forensic Reference Architecture guide.

Delta Air Lines plans to seek damages after losing $500 million in lost revenue due to security company CrowdStrike‘s software update debacle. And shareholders are also angry.

Recent security research:

Physically Secure Logic Locking With Nanomagnet Logic (UT Dallas)
WBP: Training-time Backdoor Attacks through HW-based Weight Bit Poisoning (UCF)
S-Tune: SOT-MTJ Manufacturing Parameters Tuning for Secure Next Generation of Computing ( U. of Arizona, UCF)
Diffie Hellman Picture Show: Key Exchange Stories from Commercial VoWiFi Deployments (CISPA, SBA Research, U. of Vienna)

Product News

Lam Research introduced a new version of its cryogenic etch technology designed to enhance the manufacturing of 3D NAND for AI applications. This technology allows for the precise etching of high aspect ratio features, crucial for creating 1,000-layer 3D NAND.

Fig.1: 3D NAND etch. Source: Lam Research

Alphawave Semi launched its Universal Chiplet Interconnect Express Die-toDie IP. The subsystem offers 8 Tbps/mm bandwidth density and supports operation at 24 Gbps for D2D connectivity.

Infineon introduced a new MCU series for industrial and consumer motor controls, as well as power conversion system applications. The company also unveiled its new GoolGaN Drive product family of integrated single switches and half-bridges with integrated drivers.

Rambus released its DDR5 Client Clock Driver for next-gen, high-performance desktops and notebooks. The chips include Gen1 to Gen4 RCDs, power management ICs, Serial Presence Detect Hubs, and temperature sensors for leading-edge servers.

SK hynix introduced its new GDDR7 graphics DRAM. The product has an operating speed of 32Gbps, can process 1.5TB of data per second and has a 50% power efficiency improvement compared to the previous generation.

Intel launched its new Lunar Lake Ultra processors. The long awaited chips will be included in more than 80 laptop designs and has more than 40 NPU tera operations per second as well as over 60 GPU TOPS delivering more than 100 platform TOPS.

Brewer Science achieved recertification as a Certified B Corporation, reaffirming its commitment to sustainable and ethical business practices.

Panasonic adopted Siemens’ Teamcenter X cloud product lifecycle management solution, citing Teamcenter X’s Mendix low-code platform, improved operational efficiency and flexibility for its choice.

Keysight validated its 5G NR FR1 1024-QAM demodulation test cases for the first time. The 5G NR radio access technology supports eMBB and was validated on the 3GPP TS 38.521-4 test specification.

Research

In a 47-page deep-dive report, the Center for Security and Emerging Technology delved into all of the scientific breakthroughs from 1980 to present that brought EUV lithography to commercialization, including lessons learned for the next emerging technologies.

Researchers at the Paul Scherrer Institute developed a high-performance X-ray tomography technique using burst ptychography, achieving a resolution of 4nm. This method allows for non-destructive imaging of integrated circuits, providing detailed views of nanostructures in materials like silicon and metals.

MIT signed a four-year agreement with the Novo Nordisk Foundation Quantum Computing Programme at University of Copenhagen, focused on accelerating quantum computing hardware research.

MIT’s Research Laboratory of Electronics (RLE) developed a mechanically flexible wafer-scale integrated photonics fabrication platform. This enables the creation of flexible photonic circuits that maintain high performance while being bendable and stretchable. It offers significant potential for integrating photonic circuits into various flexible substrate applications in wearable technology, medical devices, and flexible electronics.

The Naval Research Lab identified a new class of semiconductor nanocrystals with bright ground-state excitons, emphasizing an important advancement in optoelectronics.

Researchers from National University of Singapore developed a novel method, known as tension-driven CHARM3D, to fabricate 3D self-healing circuits, enabling the 3D printing of free-standing metallic structures without the need for support materials and external pressure.

Find more research in our Technical Papers library.

Events and Further Reading

Find upcoming chip industry events here, including:

Event	Date	Location
Atomic Layer Deposition (ALD 2024)	Aug 4 – 7	Helsinki
Flash Memory Summit	Aug 6 – 8	Santa Clara, CA
USENIX Security Symposium	Aug 14 – 16	Philadelphia, PA
SPIE Optics + Photonics 2024	Aug 18 – 22	San Diego, CA
Cadence Cloud Tech Day	Aug 20	San Jose, CA
Hot Chips 2024	Aug 25- 27	Stanford University/ Hybrid
Optica Online Industry Meeting: PIC Manufacturing, Packaging and Testing (imec)	Aug 27	Online
SEMICON Taiwan	Sep 4 -6	Taipei
DVCON Taiwan	Sep 10 – 11	Hsinchu
AI HW and Edge AI Summit	Sep 9 – 12	San Jose, CA
GSA Executive Forum	Sep 26	Menlo Park, CA
SPIE Photomask Technology + EUVL	Sep 29 – Oct 3	Monterey, CA
Strategic Materials Conference: SMC 2024	Sep 30 – Oct 2	San Jose, CA

Find All Upcoming Events Here

Upcoming webinars are here, including topics such as quantum safe cryptography, analytics for high-volume manufacturing, and mastering EMC simulations for electronic design.

Find Semiconductor Engineering’s latest newsletters here:

Automotive, Security and Pervasive Computing
Systems and Design
Low Power-High Performance
Test, Measurement and Analytics
Manufacturing, Packaging and Materials

The post Chip Industry Week in Review appeared first on Semiconductor Engineering.

Semiconductor Engineering
RISC-V Heralds New Era Of CooperationBrian Bailey
RISC-V is paving the way for open source to become accepted within the hardware community, creating a level of industry collaboration never seen in the past, while revitalizing the connection between academia and industry. The big question is whether this arrangement is just a placeholder while the industry re-learns how to develop processors, or whether this processor architecture is something very different. In either case, there is a clear and pressing need for more flexible processor archite
30. Květen 2024 v 09:05

RISC-V Heralds New Era Of Cooperation

Semiconductor Engineering

Od: Brian Bailey

30. Květen 2024 v 09:05

RISC-V is paving the way for open source to become accepted within the hardware community, creating a level of industry collaboration never seen in the past, while revitalizing the connection between academia and industry.

The big question is whether this arrangement is just a placeholder while the industry re-learns how to develop processors, or whether this processor architecture is something very different. In either case, there is a clear and pressing need for more flexible processor architectures, and at least for now, RISC-V has filled a void.

“RISC-V was born out of academia and has had strong collaboration within universities from day one,” says Loren Hobbs, vice president of product and business development at Bluespec. “This collaboration continues today, with many of the most popular open-source RISC-V processors having come from universities. Organizations such as OpenHW Group and CHIPS Alliance serve a central and critical role in driving the collaboration, which is bi-directional between the academic community and industry.”

Collaboration of this type has not existed with the industrial community in the past. “We are learning from each other,” says Florian Wohlrab, CEO at OpenHW. “We are learning best practices for verification. At the same time, we are learning what things to avoid. It is growing where people say, ‘Yes, I really get benefit from sharing ideas.'”

The need for processor flexibility exists within industry as well as academia. “There is a need within the industry for diversification on the processor front,” says Neil Hand, director of marketing at Siemens EDA. “In the past, this led to a fragmented set of companies that couldn’t work together. They didn’t see the value of working together. But RISC-V has a cohesive central organization where anyone who wants to get into the processor space can collaborate. They don’t have to expose their secret sauce, but they get to benefit from each other. A rising tide lifts all boats, and that’s really the situation we’re at with RISC-V.”

Longevity
Whether the industry can build upon this success, or whether it fizzles out over time, remains to be seen. But at least for now, RISC-V’s momentum is growing. “We are at the beginning of a revolution in hardware design,” says OpenHW’s Wohlrab. “We saw the same thing for software when Linux came out 20 or so years ago. No one was really thinking about sharing software or collaboratively developing software. There were some small open-source ventures, but working together on a big project took a long time to develop. Now we are all sharing software, all co-working. But for hardware, we’re just at the beginning of this new concept, and a lot of people need to understand that we can do the same for hardware as we did for software.”

Underlying RISC-V’s success is widespread collaboration. “One of the pillars sustaining the success of RISC-V is customization that works with the ecosystem and leverages a well-defined process,” says Sergio Marchese, vice president of application engineering at SmartDV. “RISC-V vendors face the challenge of showing how their processor customization capabilities serve an application and demonstrating the complete process on real hardware. Without strategic partnerships, RISC-V vendors must walk a much more challenging, time-consuming, and resource-intensive road.”

That framework is what makes it unique. “RISC-V has formed this framework for collaboration, and it fixes everything,” says Siemens’ Hand. “Now, when a university has a really cool idea for memory tagging in a processor design, they don’t have to build the compilers, they don’t have to build the reference platform. They already exist. Maybe a compiler optimization startup has this great idea for handling code optimization. They don’t have to build the rest of the ecosystem. When a processor IP company has this great idea, they can become focused within this bigger picture. That’s the unique nature of it. It’s not just a processor specification.”

Historically, one of the problems associated with open-source hardware was quality, because finding bugs in silicon is expensive. OpenHW is an important piece of the puzzle. “Why should everyone reinvent the wheel by themselves?” asks Wohlrab. “Why can’t we get the basic building blocks, some basic chips, take some design from academia, which has reasonably good quality, and build on them, verify them together. We are verifying with different tools, making sure we get a high coverage, and then everyone can go off and use them in their own chips for mass production, for volume shipment.”

This benefits companies both large and small. “There are several processor vendors that have switched to RISC-V,” says Hand. “Synopsys has moved to RISC-V. Andes has moved to RISC-V. MIPS has moved to RISC-V. Why? Because they can leverage the whole ecosystem. The downside of it is commoditization, which as a customer is really beneficial because you can delay choosing a processor till later in the design flow. Your early decision is to use the Arm ecosystem or RISC-V, and then you can work through it. That creates an interesting set of dynamics. You can start to create new opportunities for companies that develop and deliver IP, because you can benchmark them, swap them in and out, and see which one works. On the flip side, it makes it awful from a lock-in perspective once you’re in that socket.”

Fragmentation
Of course, there will be some friction in the system. “In the early days of RISC-V there was nearly a 1:1 balance between contributors and consumers of the technology,” says Geir Eide, director, product management for Siemens EDA. “Today there are thousands of RISC-V consumers, but only a small percentage of those will be contributors. There is a risk that there will be a disconnect between them. If, for instance, a particular market or regional segment is growing at a higher pace than others, or other market segments and regions are more conservative, they tend to stick to established solutions longer. That increases the risk that it could lead to fragmentation.”

Is that likely to impact development long term? “We do not believe that RISC-V will become regionally concentrated, although though there may be regional concentrations of focus within the broad set of implementation choices provided by RISC-V,” says Bluespec’s Hobbs. “A prime example of this is the Barcelona Supercomputer Center, creating a regional focus area for high-performance computing using RISC-V. However, while there may be regional focus areas, this does not mean that the RISC-V standard is, or will become, fragmented. In fact, one of the key tenets of the creation and foundation of RISC-V was preventing fragmentation of the ISA, and it continues to be a key function of RISC-V international.”

China may be a different story. “A lot of companies in China are creating RISC-V cores for internal consumption — for political reasons mostly,” says John Min, vice president of customer service at Arteris. “I think China will go 100% RISC-V for embedded, but it’s a one-way street. They will keep leveraging what the Western companies do and enhance it. China will continue sucking all advancements, such as vectorization, or the special domain-specific acceleration enhancements. They will create their own and make it their own internally, but they will give nothing back.”

Such splits have occurred in the past. “Design languages are the most recent example of that,” says Hand. “There was a regional split, and you had Europe focus on VHDL while America went with Verilog. With RISC-V, there will be that regional split where people will go off and do their things regionally. Europe has focused projects, India has theirs, but they’re still doing it within this framework. It’s this realization that everyone benefits. They’re not doing it to benefit the other people. They’re doing it ultimately to save themselves effort, to save themselves cost, but they realize that by doing it in that framework it is a net benefit to everyone.”

Bi-directionality
An important element is that everyone benefits, and that has to stretch across the academic/commercial boundary. “RISC-V has propelled a new degree of collaboration between academia and commercial organizations,” says Dave Kelf, CEO at Breker. “It’s noticeable that institutions such as Harvey Mudd College in Claremont, California, and ETH in Zurich, Switzerland, to name two, have produced advanced processor designs as a teaching aid, and have collaborated with multiple companies on their verification and design. This has been further advanced by OpenHW Group, which has made these designs accessible to the industry. This bi-directional collaboration benefits the tool providers to further enhance their offerings working on advanced, open devices, while also enabling academia to improve their designs to a commercial quality level. The virtuous circle created is essential if we are to see RISC-V established as a mainstream, industry-wide capability.”

Academia has a lot to offer in hardware advancement. “Researchers in universities are developing innovative new software and hardware to push the limits of RISC-V innovation,” says Dave Miller, head of corporate communications at SiFive. “Many of the RISC-V projects in academia are focused on optimizing performance and energy efficiency for AI workloads, and are open source so the entire ecosystem can benefit. Researchers are also actively contributing to RISC-V working groups to share their knowledge and collaborate with industry participants. These working groups are split evenly between representatives from APAC, Europe, and North America, all working together towards common goals.”

In many cases, industry is willing to fund such projects. “It makes it easy to have research topics that don’t need to boil the ocean,” says Hand. “If you’re a PhD student and you have a great idea, you can go do it. It’s easy for an industry partner to say, ‘I’ll sponsor that. That’s an interesting thing, and I am not required to allocate ridiculous amounts of money into an open-ended project. It’s like I can see the connection of how that research will go into a commercial product later.'”

This feeds back into academia. “The academics have been jumping on board with OpenHW,” says Wohlrab. “By taking their cores and productizing them, they get a chip back that could be shipped in high volume. Then they can do their research on a real commercial product and can see if their idea would fly in real life. They get real numbers and can see real figures for the benefits of a new branch predictor.”

It can also have a long-term benefit for tools. “There are areas where they want to collaborate with us, especially around security,” says Kiran Vittal, executive director for alliances marketing management at Synopsys. “They are building RISC-V based sub-systems using open-source RISC-V processors, and then academia wants to look at not only the AI part, but the security part. There are post-doc students or PhD students looking into using our tools to verify or to implement whatever they’re doing on security.”

That provides an incentive for EDA to offer better tools for use in universities. “Although there has always been collaboration between universities and the industry, where industry provides the universities with access to EDA tools, IP cores, etc., there’s often a bit of a lag,” says Siemens’ Eide. “In many situations (especially outside of the core area of a particular project), universities have access to older versions of the commercial solutions. If you for instance look at a new grad’s resume, where you in the past would see references to old tech, now you see a lot of references to relatively sophisticated use of RISC-V.”

Moving Forward
This collaboration needs to keep pushing forward. “We had an initiative to create a standardized interface for accelerators,” says Wohlrab. “RISC-V International standardized how to add custom instructions in the ISA, but there was no standard for the hardware interface. So we built this. It was a cool discussion. There were people from Silicon Labs, people from NXP, people from Thales, plus several startups. They all came together and asked, ‘How can we make it future proof and put the accelerators inside?'”

The application space for RISC-V is changing. “The big inflection point is Linux and Android,” says Arteris’ Min. “Android already has some support, but when both Android and Linux are really supported, it will change the mobile apps processor game. The number of designs will proliferate. The number of high-end designs will explode. It will take the whole industry to enable that because RISC-V companies are not big enough to create this by themselves. All the RISC-V companies are partners, because we enable this high-end design at the processor levels.”

That would deepen the software community’s engagement. “An embedded software developer needs to understand the underlying hardware if they want to run Linux on a RISC-V processor that uses custom instructions/ accelerators,” says Bluespec’s Hobbs. “To develop complex embedded hardware/software systems, both embedded software developers and embedded hardware developments must possess contextual understanding of the interoperability of hardware and software. The developer must understand how the customized processor is leveraging the custom instructions in hardware for Linux to efficiently manage and execute the accelerated workloads.”

This collaboration could reinvigorate research into EDA, as well. “With AI you can build predictive models,” says Hand. “Could that be used to identify the change effects from making an extension? What does that mean? There’s a cloud of influence — not directly gate-wise, because that immediately explodes — but perhaps based on test suites. ‘I know that something that touches that logic touches this downstream, which touches the rest of the design.’ That’s where AI plays a big role, and it is one of the interesting areas because in verification there are so many unknowns. When AI comes along, any guidance or any visibility that you can give is incredibly powerful. Even if it is not right 100% of the time, that’s okay, as long as it generates false negatives and not false positives.”

There is a great opportunity for EDA companies. “We collaborate with many of the open-source providers, with OpenHW group, with ETH in Zurich,” says Synopsys’ Vittal. “We want to promote our solutions when it comes to any processor design and you need standard tools like synthesis, place and route, simulation. But then there are also other kinds of unique solutions because RISC-V is so customizable, you can build your own custom instructions. You need something specific to verify these custom instructions and that’s why the Imperas golden models are important. We also collaborated with Bluespec to develop a verification methodology to take you through functional verification and debug.”

There are still some wrinkles to be worked out for customizations. “RISC-V gives us predictability,” says Hand. “We can create a compliance test suite, give you a processor optimization package if you’re on the implementation side. We can create analytics and testing solutions because we know what it’s going to look like. But for non-standard processors, it is effectively as a service, because everyone’s processor is a little bit different. The reason you see a lot of focus on the verification, from platform architecture exploration all the way through, is because if you change one little thing, such as an addressing mode, it impacts pretty much 100% of your processor verification. You’ve got to retest the whole processor. Most people aren’t set up like an Arm or an Intel with huge processor verification teams and the infrastructure, and so they need automation to do it for them.”

Conclusion
RISC-V has enabled the industry to create a framework for collaboration, which enables everyone to work together for selfish reasons. It is a symbiotic relationship that continues to build, and it is creating a wider sphere of influence over time.

“It’s unique in the modern era of semiconductor,” says Hand. “You have such a wide degree of collaboration, where you have processor manufacturers, the software industry leaders, EDA companies, all working on a common infrastructure.”

Related Reading
RISC-V Micro-Architectural Verification
Verifying a processor is much more than making sure the instructions work, but the industry is building from a limited knowledge base and few dedicated tools.
RISC-V Wants All Your Cores
It is not enough to want to dominate the world of CPUs. RISC-V has every core in its sights, and it’s starting to take steps to get there.

The post RISC-V Heralds New Era Of Cooperation appeared first on Semiconductor Engineering.

Semiconductor Engineering
Chip Aging Becoming Key Factor In Data Center EconomicsAnn Mutschler
Chip aging is becoming a much bigger concern inside of data centers, where it can impact server uptime, utilization rates, and the amount of energy needed to drive signals and cool entire server racks. Aging in chips is the result of both higher logic utilization and increasing transistor density. This is problematic for data centers, in general, but especially for AI chips where digital logic is expected to run at maximum speed. That generates more heat, which becomes harder to dissipate as the
20. Květen 2024 v 09:01

Chip Aging Becoming Key Factor In Data Center Economics

Semiconductor Engineering

Od: Ann Mutschler

20. Květen 2024 v 09:01

Chip aging is becoming a much bigger concern inside of data centers, where it can impact server uptime, utilization rates, and the amount of energy needed to drive signals and cool entire server racks.

Aging in chips is the result of both higher logic utilization and increasing transistor density. This is problematic for data centers, in general, but especially for AI chips where digital logic is expected to run at maximum speed. That generates more heat, which becomes harder to dissipate as the number specialized and general-purpose processing elements per square millimeter of silicon continues to rise. Heat typically gets trapped between the fins of finFETs and gate-all-around FETs, accelerating electromigration and reducing the time it takes for dielectrics to break down. It also can cause warpage, which can rupture the bonds and contacts between different components in an advanced package or on a PCB.

For data centers, that creates a number of challenges:

Thermal management: This requires a deep understanding of workloads and the resulting transient thermal gradients as processing is load-balanced on-chip, between chips or chiplets, and between servers;
More data: Data from sensors everywhere, along with larger training sets, all need to be processed faster than in the past to keep up with the flood of data, but all of that needs to happen in the same or smaller footprint without overheating any part of a device, and
In-circuit monitoring: Sensors can be added into chips to detect variations in heat and data speeds in different paths, but it’s much more difficult to keep track of tens of thousands of these monitors as they collect data from heterogeneous processing elements, each of which can age at different rates depending on process variation, defectivity, varying workloads, and ambient thermal conditions.

“Servers are much more capable today than they were 10 years ago, and the issue is that power hasn’t scaled like it used to,” said Steven Woo, Rambus fellow and distinguished inventor. “Now, if you want to do lots more work in your server, you have to burn more power to do it. Twenty years ago, a server might dissipate a couple hundred watts. But with the latest servers that NVIDIA just announced around Grace Blackwell, the whole rack is 120 kilowatts, and the individual servers are many kilowatts. Just delivering power into those racks is causing changes in the infrastructure in the industry. Now that you have to bring in and dissipate more power in a small space, you get all kinds of interesting things that could happen over time. The heat that’s being dissipated can have effects on the chip, and you have to worry sometimes about thermal cycling where, as the chip is doing a lot of work, maybe part of the chip stops and then it does more work. You get these rapid cycles of dissipating a lot of power, then not, then dissipating a lot of power, then not. That cycling causes local heating and cooling, leading to thermal stresses, and this impacts all chips, including memory.”

As a result, everyone from the data center manager to the chip architect now has to understand how a chip behaves in the field, and how increasingly customized chip and system architectures will function over time. Downtime is costly for a data center, but under-utilization and reduced performance also carries a high price tag. That, in turn, affects how much margin is considered essential, such as extra data paths if some of them are fully or partially closed off by electromigration, and how that margin will impact performance, power, and area/cost over a chip’s projected lifetime — especially in a heterogeneous design with specialized compute elements.

“When it comes to the hyper-scalers and high powered, highly customized, heterogeneous chips for various different workloads, these chips are on 24/7, so consistent uptime is critical,” said Dan Lee, product management director at Cadence. “Since all of these chips are done at the really advanced nodes, with the smaller device sizes, more developers are looking to do aging analysis, and derive the wear and tear so they can see if the chip is going to last a year or five years. At the same time, an important consideration is also thermal — especially when we’re talking about these heterogeneous integrations, and you don’t really get the thermal conductivity that you would in a straightforward, monolithic design. There’s a bit more thought or planning that needs to be a part of this because aging and heating are related. All things being equal, if you’re operating in a very hot environment, you’re going to expect a lower lifespan.”

Still, determining how much shorter that lifespan will be isn’t always a precise calculation. “Data center SoCs that execute mission-critical workloads need to provide scalable visibility, predict problems before they occur, provide deep-dive analyses into problems, and be optimized to increase longevity of investment,” said Padmakumar Karthik, senior technology manager at Arm. “Data center diagnostic patterns are often deployed to measure the health of an SoC post-manufacturing to prevent silent data corruption (SDC) issues. But on-chip sensors provide an additional layer of insights, detecting droops or aging or thermal events on-chip, all of which can cause SDC incidents. For this reason, scalable, customizable sensor frameworks that can monitor and adapt throughout the useful life of the device, enabling continuous design optimization and preventive maintenance, will be increasingly important.”

There are multiple ways to achieve this, but each data center can be very different. In some cases, chips are designed by systems companies for internal use. And in most cases, there is a mix of different hardware and software, not all of which is state-of-the-art. “Many data centers have legacy infrastructure that may not be inherently designed for optimal power efficiency,” noted Noam Brousard, vice president of systems at proteanTecs, in a recent blog. “Upgrading or retrofitting such infrastructure poses challenges in achieving comprehensive power optimization.”

Even within a single rack, stresses can vary greatly from one server to the next, and from one chip to the next even in the same server. “You can imagine when you have a very big chip, toward the edges of the chip it will expand more than in a small chip, and that can add stress,” said Rambus’ Woo. “You have to really be careful about how you cool things, and memory is no different. You have very specific things you worry about with memory, like the ability to retain data, depending on how hot the chip is.”

In addition, as chips age, parameters drift. Marc Swinnen, director of product marketing in Ansys’ semiconductor division, said the traditional approach has been to use a library that’s characterized as a brand new chip. “The library is characterized at 1 year, 5 years, 10 years, 15 years, and you can run all your analysis multiple times with these different aged libraries. That sounds good on paper, and that’s what a lot of people do, but the problem is that not all parts of the chip age at the same rate. This is why aging is often associated with activity and temperature. Some parts of the chip are more active and hotter than other parts of the chip, so the aging time runs differently for different parts. This means you want to apply some of the old library to some parts of the chip, and the younger library to other parts of the chip, because if signals run between them you have setup and hold issues. If everything slows down at the same time — or one slows down and the other one doesn’t — you’re going to get mismatches, and that’s the difficulty. At the bottom level, it’s easy. Every gate is assigned its right age. That’s simple. You do an analysis with every gate. But how do you assign the age to every gate? Where do you get that information from? You need a lot of realistic activity, and then predict that over the lifespan and with temperature. That’s the problem. How do you actually construct this aging map? Once you have it, the analysis is not that hard.”

Aging maps are application- and workload-specific. Every chip will age differently depending on the functions it performs.

But aging is just one of many factors that affect data center uptime. “When we look at data center, we look at the whole application first, then whittle it down to what that means for chips and packages,” said Kelly Morgan, senior principal application engineer at Ansys. “From the mechanical reliability lens of the data center operation, we go through thermal cycling, obviously. We’re in a controlled environment. But what does that influence? How does that influence the integrity of the chips as you go through thermal cycles? Typically, we’ll look at things like solder fatigue and other effects.”

Another factor to consider is shipping and handling, which can affect the aging of a chip, package, and board.

“Even before the device is put in place, there are opportunities for vibration,” Morgan said. “You might hit something, which is a bit of a shock. We have customers who are looking at things like drop, shock, and vibration, and they have goals they need to test to. Typically, the standard process is to do a lot of physical testing. Now as you can imagine, that can be pretty challenging. You have to be pretty far along in the design process before you really start to go and test, and if there’s an issue, then you’ve got to go back and retest. Early simulation helps here, especially for those larger-scale events, and that comes down to the chassis, the board, to all the components, including the ICs.”

Fig. 1: Components of complete electronic system analysis. Source: Ansys

Quality control remains a big challenge when it comes to mechanical stresses that can affect aging. Adam Cron, distinguished architect at Synopsys, pointed to a recent Intel white paper, which noted that at the current acceptable defectivity rates, one core fails every two days. To account for this, Cron noted that certain commercial tools support in-system delay testing in a BiST mode. By adding specific IP, any ATPG patterns could be added to that. (Intel’s paper said its solution only applies to stuck-at testing.)

“In very large, millions-of-cores data center-type environments, the implication is that you’d better be ready,” Cron said. “One of the things they were talking about in this paper was in-system scan. Intel was bringing a database of test patterns in, and then applying it in-system after isolating a core. And then, upon a failure, they’d quarantine and move on. But the data centers are apparently running out of that opportunistic time slot to do any of this. We’ve heard some interesting conversations about the fact that people do run a lot of things during certain times. However, other times are cheaper, so all the holes are just getting filled in terms of runtime. Monitors are certainly something to look at, but monitors are looking at systemic degradation. That’s known, if you will. And so as things degrade, V_min will change, maybe frequency will change. And they’ll be on a pace. They can figure out when to do that. That’s easy enough to figure out. However, if there’s a marginality or some broken component in there, it is not up to the tool to find that. And frankly, the in-system scan wasn’t addressing all components on the die. It was only up to like 80% of stuck-at coverage, which isn’t that much, especially when you’re not looking at all of the pieces inside the die. The point is, there are still opportunities to do better.”

Cron noted that one big systems company suggested a dual-core lockstep mechanism, starting out the data center in dual-core lock-step mode for X number of months. “When it looks like you’ve squeezed the major part of the curve out, in terms of finding these defective components, then unlock them, double your capacity, run like that for a while, and periodically hook some back up again. That means everything is utilized, at least. Of course, some are working at half capacity here and there, but it’s not the whole die. And there are some implications there from a design standpoint, at least for the hardware, but also possibly the operating system, depending on who decides what physical core is used versus what virtual core is used.”

Approaches to measuring aging
Any discussion around aging circuits really boils down to extending the life of the machines in the data center, and not getting caught by surprise when failures occur.

“How do you do that? You have to measure the aging of those machines,” said Neil Hand, director of marketing, IC segment at Siemens EDA. “Right now, if you speak to the CIOs of these big companies with big data centers, they say, ‘We’ve got to get rid of the machines after three years because we can’t risk it going down.’ If you look at embedded analytics capabilities, you can start to embed aging monitors in those devices, you can start to monitor those in real time. It doesn’t look that different than what it does from an automotive perspective. It’s all the same technologies, effectively, but you’re monitoring them. And then you can say, ‘We’re now at 90% of our life for this server.’ We can then just replace that server.”

This feeds into corporate goals around sustainability, as well. “It comes down to building the best thing to begin with, then building it with design for manufacturing in mind so that you don’t get waste during manufacturing, achieve better yields, and finally extend the life of products and build them in environmentally-sustainable ways,” Hand said. “If you can extend the data center lifecycle from three years to five years, that’s big. And especially if you start going to these high-performance, application-specific type of clusters, you may not need to change them as often, because if the underlying capabilities aren’t changing, that might drive the cycling of it. In the case of a biological computer, if there’s no new change to the underlying protein folding mechanisms, you might say, ‘We don’t need a new compute platform. This is really good.”

The longer the product life can be extended, the better. Design for aging is a matter of, first, performing the aging analysis with the foundry models. “Run the simulations and observe the effects,” said Cadence’s Lee. “When you’re doing the simulation, you want to have the right mission profiles, so you come up with an accurate prediction of how your device is going to behave after a certain number of years in deployment. You may want to combine that with thermal analysis, for example, because how that aging is going to behave will depend on what temperature this design is going to be working at. You may think it’s 22 degrees Celsius, but maybe through some thermal analysis you realize it’s actually going to be operating at 35 or 40 degrees most of the time. That may change the outcome of your aging analysis.”

In terms of the associated thermal analysis, this can extend beyond a single device. “It’s also how that heat is moving,” Lee said. “Let’s say you have this integrated design, where you have some power devices alongside some logic, or some other functionality that is lower power. What you may want to understand is, if those bandgaps or power circuits are generating a lot of heat, that may be shifting over into other parts of your design. So when you run your aging analysis, you may assume that you’re running at 25 degrees, whereas the power devices are at 40 or 45 degrees. They’re on the same chip, they’re very close to each other, and you have to understand how much of that heat is moving over to your logic and what that’s going to bring the temperature up to. You want to know that so you can perform the aging analysis based on that higher temperature.”

Another consideration is combining aging analysis and interconnect parasitics, which is especially relevant for advanced nodes due to the parasitics in the interconnect. “They’re dominant when it comes to performance and functionality,” Lee added. “So when thinking about aging, you also have to think about it being an aged device that has to push the electrons through this interconnect. That’s a pretty heavy load. When you’re doing the aging analysis, you probably will have to be doing it with extracted parasitics. You just can’t do it on a pure schematic design. It doesn’t give you enough detail about what’s really happening physically. This may be included in the aging analysis tool. When most people talk about aging, they may not think about the parasitic aspect to it.”

Combating aging, thermal in memory
While standards don’t work in custom silicon, they do work for some standard components in those devices, such as memory. Over the past 10 to 15 years, memory standards have started to address the impact of heat.

“If you start to exceed certain temperature limits, you’ve got to refresh the device more frequently because the charge can leak off the cells more quickly,” said Rambus’ Woo. “So there are temperature-dependent refresh rates. There are other things that can be exacerbated, like the capacitors are getting smaller, they’re holding fewer electrons because there are so many more of them on a chip now, so we’ve seen memories adopt on-die error correction. This on-die error correction is something that is hidden from the outside world. In many cases, you don’t even know an error has occurred and been corrected on the chip. Those kinds of technologies become even more important now because the temperatures can be higher.”

There also is growing demand for more telemetry to provide monitoring information. “You just want to know if anything is overheating,” said Woo. “Does something seem like it’s malfunctioning? The data center manager will get regular updates about the status of the major components of the system. A lot of boards now in servers have baseboard management controllers (BMCs), which are little chips that sit on each board and are responsible for, among other things, reporting back the health of that board when a server might have five or six boards. We’re frequently seeing more of these BMC chips.”

Design for aging
While the goal is to be able to guarantee a certain lifetime for the chips in a data center, the challenges for achieving that are expanding. “There’s a growing list of things that can be harmful to devices over their lifetime,” Woo said. “It’s a balance between not adding too much cost, even though you have to increase the reliability and maybe add new features, and all of these things are in play with each other.”

Whether it is liquid cooling or higher levels of RAS ECC in the system, there is no single best answer for every application. In general, the industry is moving toward higher reliability and increasing resilience, but there are many ways to get there and challenges with each of them.

“Just as 15 years ago we didn’t necessarily always think we had to talk about power, now we have to talk about it all the time,” Woo said. “The same thing is going to be true for resilience and reliability. It’s going to be required to become part of the way people think about architectures, and part of that is how the memory system improves its reliability. You can’t really do anything unless you can compute on some data, and you have to make sure that data is reliable. It will touch how memory is stored in a DRAM. It will touch how memory is communicated across links. And it even will touch how processors manipulate data once they get a hold of it in their caches, and in the compute pipelines. Also, one of the key things people will worry about is how much of that susceptibility is brought about by age-related issues, like heating cycles, etc.”

Finally, there are even issues around the quality of the power that comes into a system. “The servers get noise on the power rails, and it’s a balance between how much money you’re willing to pay for the power delivery versus the quality of power,” said Woo. “You have to be tolerant of those kinds of things, too. Power management becomes more challenging, as well as the amount of power that these systems are using today. NVIDIA systems bring 48-volt power into the racks, and there is talk about even higher voltage levels. Those changes in infrastructure can all impact heat, and can age components differently.”

The post Chip Aging Becoming Key Factor In Data Center Economics appeared first on Semiconductor Engineering.

Semiconductor Engineering
Chip Industry Week In ReviewThe SE Staff
President Biden will raise the tariff rate on Chinese semiconductors from 25% to 50% by 2025, among other measures to protect U.S. businesses from China’s trade practices. Also, as part of President Biden’s AI Executive Order, the Administration released steps to protect workers from AI risks, including human oversight of systems and transparency about what systems are being used. Intel is in advanced talks with Apollo Global Management for the equity firm to provide more than $11 billion to bui
17. Květen 2024 v 09:01

Chip Industry Week In Review

Semiconductor Engineering

Od: The SE Staff

17. Květen 2024 v 09:01

President Biden will raise the tariff rate on Chinese semiconductors from 25% to 50% by 2025, among other measures to protect U.S. businesses from China’s trade practices. Also, as part of President Biden’s AI Executive Order, the Administration released steps to protect workers from AI risks, including human oversight of systems and transparency about what systems are being used.

Intel is in advanced talks with Apollo Global Management for the equity firm to provide more than $11 billion to build a fab in Ireland, reported the Wall Street Journal. Also, Intel’s Foundry Services appointed Kevin O’Buckley as the senior vice president and general manager.

Polar is slated to receive up to $120 million in CHIPS Act funding to establish an independent American foundry in Minnesota. The company expects to invest about $525 million in the expansion of the facility over the next two years, with a $75 million investment from the State of Minnesota.

Arm plans to develop AI chips for launch next year, reports Nikkei Asia.

South Korea is planning a support package worth more than 10 trillion won ($7.3 billion) aimed at chip materials, equipment makers, and fabless companies throughout the semiconductor supply chain, according to Reuters.

Quick links to more news:

Global
In-Depth
Markets and Money
Security
Supercomputing
Education and Training
Product News
Research
Events and Further Reading

Global

Edwards opened a new facility in Asan City, South Korea. The 15,000m² factory provides a key production site for abatement systems, and integrated vacuum and abatement systems for semiconductor manufacturing.

France’s courtship with mega-tech is paying off. Microsoft is investing more than US $4 billion to expand its cloud computing and AI infrastructure, including bringing up to 25,000 advanced GPUs to the country by the end of 2025. The “Choose France” campaign also snagged US $1.3 billion from Amazon for cloud infrastructure expansion, genAI and more.

Toyota, Nissan, and Honda are teaming up on AI and chips for next-gen cars with support from Japan’s Ministry of Economy, Trade and Industry, (METI), reports Nikkei Asia.

Meanwhile, IBM and Honda are collaborating on long-term R&D of next-gen technologies for software-defined vehicles (SDV), including chiplets, brain-inspired computing, and hardware-software co-optimization.

Siemens and Foxconn plan to collaborate on global manufacturing processes in electronics, information and communications technology, and electric vehicles (EV).

TSMC confirmed a Q424 construction start date for its first European plant in Dresden, Germany.

Amazon Web Services (AWS) plans to invest €7.8 billion (~$8.4B) in the AWS European Sovereign Cloud in Germany through 2040. The system is designed to serve public sector organizations and customers in highly regulated industries.

In-Depth

Semiconductor Engineering published its Low Power-High Performance newsletter this week, featuring these stories:

- Will Domain-Specific ICs Become Ubiquitous?
- Running More Efficient AI/ML Code With Neuromorphic Engines
- Power/Performance Costs In Chip Security

And this week’s Test, Measurement & Analytics newsletter featured these stories:

Using Predictive Maintenance To Boost IC Manufacturing Efficiency
The Future Of Fault Coverage In Chips
Doing More At Functional Test

Markets and Money

The U.S. National Institute of Standards and Technology (NIST) awarded more than $1.2 million to 12 businesses in 8 states under the Small Business Innovation Research (SBIR) Program to fund R&D of products relating to cybersecurity, quantum computing, health care, semiconductor manufacturing, and other critical areas.

Engineering services and consulting company Infosys completed the acquisition of InSemi Technology, a provider of semiconductor design and embedded software development services.

The quantum market, which includes quantum networking and sensors alongside computing, is predicted to grow from $838 million in 2024 to $1.8 billion in 2029, reports Yole.

Shipments of OLED monitors reached about 200,000 units in Q1 2024, a year over year growth of 121%, reports TrendForce.

Global EV sales grew 18% in Q1 2024 with plug-in hybrid electric vehicles (PHEV) sales seeing 46% YoY growth and battery electric vehicle (BEV) sales growing just 7%, according to Counterpoint. China leads global EV sales with 28% YoY growth, while the US grew just 2%. Tesla saw a 9% YoY drop, but topped BEV sales with a 19% market share. BYD grew 13% YoY and exported about 100,000 EVs with 152% YoY growth, mainly in Southeast Asia.

DeepX raised $80.5 million in Series C funding for its on-device NPU IP and AI SoCs tailored for applications including physical security, robotics, and mobility.

MetisX raised $44 million in Series A funding for its memory solutions built on Compute Express Link (CXL) for accelerating large-scale data processing applications.

Security

While security experts have been warning of a growing threat in electronics for decades, there have been several recent fundamental changes that elevate the risk.

Synopsys and the Ponemon Institute released a report showing 54% of surveyed organizations suffered a software supply chain attack in the past year and 20% were not effective in their response. And 52% said their development teams use AI tools to generate code, but only 32% have processes to evaluate it for license, security, and quality risks.

Researchers at Ruhr University Bochum and TU Darmstadt presented a solution for the automated generation of fault-resistant circuits (AGEFA) and assessed the security of examples generated by AGEFA against side-channel analysis and fault injection.

TXOne reported on operational technology security and the most effective method for preventing production interruptions caused by cyber-attacks.

CrowdStrike and NVIDIA are collaborating to accelerate the use of analytics and AI in cybersecurity to help security teams combat modern cyberattacks, including AI-powered threats.

The National Institute of Standards and Technology (NIST) finalized its guidelines for protecting sensitive data, known as controlled unclassified information, aimed at organizations that do business with the federal government.

The Defense Advanced Research Projects Agency (DARPA) awarded BAE Systems a $12 million contract to solve thermal challenges limiting electronic warfare systems, particularly in GaN transistors.

Sigma Defense won a $4.7 million contract from the U.S. Army for an AI-powered virtual training environment, partnering with Brightline Interactive on a system that uses spatial computing and augmented intelligence workflows.

SkyWater’s advanced packaging operation in Florida has been accredited as a Category 1A Trusted Supplier by the Defense Microelectronics Activity (DMEA) of the U.S. Department of Defense (DoD).

Videos of two CWE-focused sessions from CVE/FIRST VulnCon 2024 were made available on the CWE YouTube Channel.

The Cybersecurity and Infrastructure Security Agency (CISA) issued a number of alerts/advisories.

Supercomputing

Supercomputers are battling for top dog.

The Frontier supercomputer at Oak Ridge National Laboratory (ORNL) retained the top spot on the Top500 list of the world’s fastest systems with an HPL score of 1.206 EFlop/s. The as-yet incomplete Aurora system at Argonne took second place, becoming the world’s second exascale system at 1.012 EFlop/s. The Green500 list, which tracks energy efficiency of compute, saw three new entrants take the top places.

Cerebras Systems, Sandia National Laboratory, Lawrence Livermore National Laboratory, and Los Alamos National Laboratory used Cerebras’ second generation Wafer Scale Engine to perform atomic scale molecular dynamics simulations at the millisecond scale, which they claim is 179X faster than the Frontier supercomputer.

UT Austin‘s Stampede3 Supercomputer is now in full production, serving the open science community through 2029.

Education and Training

SEMI announced the SEMI University Semiconductor Certification Programs to help alleviate the workforce skills gap. Its first two online courses are designed for new talent seeking careers in the industry, and experienced workers looking to keep their skills current. Also, SEMI and other partners launched a European Chip Skills Academy Summer School in Italy.

Siemens created an industry credential program for engineering students that supplements a formal degree by validating industry knowledge and skills. Nonprofit agency ABET will provide accreditation. The first two courses are live at the University of Colorado Boulder (CU Boulder) and a series is planned with Pennsylvania State University (Penn State).

Syracuse University launched a $20 million Center for Advanced Semiconductor Manufacturing, with co-funding from Onondaga County.

Starting young is a good thing. An Arizona school district, along with the University Of Arizona, is creating a semiconductor program for high schoolers.

Product News

Siemens and Sony partnered to enable immersive engineering via a spatial content creation system, NX Immersive Designer, which includes Sony’s XR head-mounted display. The integration of hardware and software gives designers and engineers natural ways to interact with a digital twin. Siemens also extended its Xcelerator as a Service portfolio with solutions for product engineering and lifecycle management, cloud-based high-performance simulation, and manufacturing operations management. It will be available on Microsoft Azure, as well.

Advantest announced the newest addition to its portfolio of power supplies for the V93000 EXA Scale SoC test platform. The DC Scale XHC32 power supply offers 32 channels with single-instrument total current of up to 640A.

Fig. 1: Advantest’s DC Scale XHC32. Source: Advantest

Infineon released its XENSIV TLE49SR angle sensors, which can withstand stray magnetic fields of up to 8 mT, ideal for applications of safety-critical automotive chassis systems.

Google debuted its sixth generation Cloud TPU, 4.7X faster and 67% more energy-efficient than the previous generation, with double the high-bandwidth memory.

X-Silicon uncorked a RISC-V vector CPU, coupled with a Vulkan-enabled GPU ISA and AI/ML acceleration in a single processor core, aimed at embedded and IoT applications.

IBM expanded its Qiskit quantum software stack, including the stable release of its SDK for building, optimizing, and visualizing quantum circuits.

Northeastern University announced the general availability of testing and integration solutions for Open RAN through the Open6G Open Testing and Integration Center (Open 6G OTIC).

Research

The University of Glasgow received £3 million (~$3.8M) from the Engineering and Physical Sciences Research Council (EPSRC)’s Strategic Equipment Grant scheme to help establish “Analogue,” an Automated Nano Analysing, Characterisation and Additive Packaging Suite to research silicon chip integration and packaging.

EPFL researchers developed scalable photonic ICs, based on lithium tantalate.

DISCO developed a way to increase the diameter of diamond wafers that uses the KABRA process, a laser ingot slicing method.

CEA-Leti developed two complementary approaches for high performance photon detectors — a mercury cadmium telluride-based avalanche photodetector and a superconducting single photon detector.

Toshiba demonstrated storage capacities of over 30TB with two next-gen large capacity recording technologies for hard disk drives (HDDs): Heat Assisted Magnetic Recording (HAMR) and Microwave Assisted Magnetic Recording (MAMR).

Caltech neuroscientists reported that their brain-machine interface (BMI) worked successfully in a second human patient, following 2022’s first instance, proving the device is not dependent on one particular brain or one location in a brain.

Linköping University researchers developed a cheap, sustainable battery made from zinc and lignin, while ORNL researchers developed carbon-capture batteries.

Events and Further Reading

Find upcoming chip industry events here, including:

Event	Date	Location
European Test Symposium	May 20 – 24	The Hague, Netherlands
NI Connect Austin 2024	May 20 – 22	Austin, Texas
ITF World 2024 (imec)	May 21 – 22	Antwerp, Belgium
Embedded Vision Summit	May 21 – 23	Santa Clara, CA
ASIP Virtual Seminar 2024	May 22	Online
Electronic Components and Technology Conference (ECTC) 2024	May 28 – 31	Denver, Colorado
Hardwear.io Security Trainings and Conference USA 2024	May 28 – Jun 1	Santa Clara, CA
SW Test	Jun 3 – 5	Carlsbad, CA
IITC2024: Interconnect Technology Conference	Jun 3 – 6	San Jose, CA
VOICE Developer Conference	Jun 3 – 5	La Jolla, CA
CHIPS R&D Standardization Readiness Level Workshop	Jun 4 – 5	Online and Boulder, CO

Find All Upcoming Events Here

Upcoming webinars are here.

Semiconductor Engineering’s latest newsletters:

Automotive, Security and Pervasive Computing
Systems and Design
Low Power-High Performance
Test, Measurement and Analytics
Manufacturing, Packaging and Materials

The post Chip Industry Week In Review appeared first on Semiconductor Engineering.

Semiconductor Engineering
Will Domain-Specific ICs Become Ubiquitous?Brian Bailey
Questions are surfacing for all types of design, ranging from small microcontrollers to leading-edge chips, over whether domain-specific design will become ubiquitous, or whether it will fall into the historic pattern of customization first, followed by lower-cost, general-purpose components. Custom hardware always has been a double-edged sword. It can provide a competitive edge for chipmakers, but often requires more time to design, verify, and manufacture a chip, which can sometimes cost a mar
16. Květen 2024 v 09:05

Will Domain-Specific ICs Become Ubiquitous?

Semiconductor Engineering

Od: Brian Bailey

16. Květen 2024 v 09:05

Questions are surfacing for all types of design, ranging from small microcontrollers to leading-edge chips, over whether domain-specific design will become ubiquitous, or whether it will fall into the historic pattern of customization first, followed by lower-cost, general-purpose components.

Custom hardware always has been a double-edged sword. It can provide a competitive edge for chipmakers, but often requires more time to design, verify, and manufacture a chip, which can sometimes cost a market window. In addition, it’s often too expensive for all but the most price-resilient applications. This is a well-understood equation at the leading edge of design, particularly where new technologies such as generative AI are involved.

But with planar scaling coming to an end, and with more features tailored to specific domains, the chip industry is struggling to figure out whether the business/technical equation is undergoing a fundamental and more permanent change. This is muddied further by the fact that some 30% to 35% of all design tools today are being sold to large systems companies for chips that will never be sold commercially. In those applications, the collective savings from improved performance per watt may dwarf the cost of designing, verifying, and manufacturing a highly optimized multi-chip/multi-chiplet package across a large data center, leaving the debate about custom vs. general-purpose more uncertain than ever.

“If you go high enough in the engineering organization, you’re going to find that what people really want to do is a software-defined whatever it is,” says Russell Klein, program director for high-level synthesis at Siemens EDA. “What they really want to do is buy off-the-shelf hardware, put some software on it, make that their value-add, and ship that. That paradigm is breaking down in a number of domains. It is breaking down where we need either extremely high performance, or we need extreme efficiency. If we need higher performance than we can get from that off-the-shelf system, or we need greater efficiency, we need the battery to last longer, or we just can’t burn as much power, then we’ve got to start customizing the hardware.”

Even the selection of processing units can make a solution custom. “Domain-specific computing is already ubiquitous,” says Dave Fick, CEO and cofounder of Mythic. “Modern computers, whether in a laptop, phone, security camera, or in farm equipment, consist of a mix of hardware blocks co-optimized with software. For instance, it is common for a computer to have video encode or decode hardware units to allow a system to connect to a camera efficiently. It is common to have accelerators for encryption so that we can safely communicate. Each of these is co-optimized with software algorithms to make commonly used functions highly efficient and flexible.”

Steve Roddy, chief marketing officer at Quadric, agrees. “Heterogeneous processing in SoCs has been de rigueur in the vast majority of consumer applications for the past two decades or more. SoCs for mobile phones, tablets, televisions, and automotive applications have long been required to meet a grueling combination of high-performance plus low-cost requirements, which has led to the proliferation of function-specific processors found in those systems today. Even low-cost SoCs for mobile phones today have CPUs for running Android, complex GPUs to paint the display screen, audio DSPs for offloading audio playback in a low-power mode, video DSPs paired with NPUs in the camera subsystem to improve image capture (stabilization, filters, enhancement), baseband DSPs — often with attached NPUs — for high speed communications channel processing in the Wi-Fi and 5G subsystems, sensor hub fusion DSPs, and even power-management processors that maximize battery life.”

It helps to separate what you call general-purpose and what is application-specific. “There is so much benefit to be had from running your software on dedicated hardware, what we call bespoke silicon, because it gives you an advantage over your competitors,” says Marc Swinnen, director of product marketing in Ansys’ Semiconductor Division. “Your software runs faster, lower power, and is designed to run specifically what you want to run. It’s hard for a competitor with off-the-shelf hardware to compete with you. Silicon has become so central to the business value, the business model, of many companies that it has become important to have that optimized.”

There is a balance, however. “If there is any cost justification in terms of return on investment and deployment costs, power costs, thermal costs, cooling costs, then it always makes sense to build a custom ASIC,” says Sharad Chole, chief scientist and co-founder of Expedera. “We saw that for cryptocurrency, we see that right now for AI. We saw that for edge computing, which requires extremely ultra-low power sensors and ultra-low power processes. But there also has been a push for general-purpose computing hardware, because then you can easily make the applications more abstract and scalable.”

Part of the seeming conflict is due to the scope of specificity. “When you look at the architecture, it’s really the scope that determines the application specificity,” says Frank Schirrmeister, vice president of solutions and business development at Arteris. “Domain-specific computing is ubiquitous now. The important part is the constant moving up of the domain specificity to something more complex — from the original IP, to configurable IP, to subsystems that are configurable.”

In the past, it has been driven more by economics. “There’s an ebb and a flow to it,” says Paul Karazuba, vice president of marketing at Expedera. “There’s an ebb and a flow to putting everything into a processor. There’s an ebb and a flow to having co-processors, augmenting functions that are inside of that main processor. It’s a natural evolution of pretty much everything. It may not necessarily be cheaper to design your own silicon, but it may be more expensive in the long run to not design your own silicon.”

An attempt to formalize that ebb and flow was made by Tsugio Makimoto in the 1990s, when he was Sony’s CTO. He observed that electronics cycled between custom solutions and programmable ones approximately every 10 years. What’s changed is that most custom chips from the time of his observation contained highly programmable standard components.

Technology drivers
Today, it would appear that technical issues will decide this. “The industry has managed to work around power issues and push up the thermal envelope beyond points I personally thought were going to be reasonable, or feasible,” says Elad Alon, co-founder and CEO of Blue Cheetah. “We’re hitting that power limit, and when you hit the power limit it drives you toward customization wherever you can do it. But obviously, there is tension between flexibility, scalability, and applicability to the broadest market possible. This is seen in the fast pace of innovation in the AI software world, where tomorrow there could be an entirely different algorithm, and that throws out almost all the customizations one may have done.”

The slowing of Moore’s Law will have a fundamental influence on the balance point. “There have been a number of bespoke silicon companies in the past that were successful for a short period of time, but then failed,” says Ansys’ Swinnen. “They had made some kind of advance, be it architectural or addressing a new market need, but then the general-purpose chips caught up. That is because there’s so much investment in them, and there’s so many people using them, there’s an entire army of people advancing, versus your company, just your team, that’s advancing your bespoke solution. Inevitably, sooner or later, they bypass you and the general-purpose hardware just gets better than the specific one. Right now, the pendulum has swung toward custom solutions being the winner.”

However, general-purpose processors do not automatically advance if companies don’t keep up with adoption of the latest nodes, and that leads to even more opportunities. “When adding accelerators to a general-purpose processor starts to break down, because you want to go faster or become more efficient, you start to create truly customized implementations,” says Siemens’ Klein. “That’s where high-level synthesis starts to become really interesting, because you’ve got that software-defined implementation as your starting point. We can take it through high-level synthesis (HLS) and build an accelerator that’s going to do that one specific thing. We could leave a bunch of registers to define its behavior, or we can just hard code everything. The less general that system is, the more specific it is, usually the higher performance and the greater efficiency that we’re going to take away from it. And it almost always is going to be able to beat a general-purpose accelerator or certainly a general-purpose processor in terms of both performance and efficiency.”

At the same time, IP has become massively configurable. “There used to be IP as the building blocks,” says Arteris’ Schirrmeister. “Since then, the industry has produced much larger and more complex IP that takes on the role of sub-systems, and that’s where scope comes in. We have seen Arm with what they call the compute sub-systems (CSS), which are an integration and then hardened. People care about the chip as a whole, and then the chip and the system context with all that software. Application specificity has become ubiquitous in the IP space. You either build hard cores, you use a configurable core, or you use high-level synthesis. All of them are, by definition, application-specific, and the configurability plays in there.”

Put in perspective, there is more than one way to build a device, and an increasing number of options for getting it done. “There’s a really large market for specialized computing around some algorithm,” says Klein. “IP for that is going to be both in the form of discrete chips, as well as IP that could be built into something. Ultimately, that has to become silicon. It’s got to be hardened to some degree. They can set some parameters and bake it into somebody’s design. Consider an Arm processor. I can configure how many CPUs I want, I can configure how big I want the caches, and then I can go bake that into a specific implementation. That’s going to be the thing that I build, and it’s going to be more targeted. It will have better efficiency and a better cost profile and a better power profile for the thing that I’m doing. Somebody else can take it and configure it a little bit differently. And to the degree that the IP works, that’s a great solution. But there will always be algorithms that don’t have a big enough market for IP to address. And that’s where you go in and do the extreme customization.”

Chiplets
Some have questioned if the emerging chiplet industry will reverse this trend. “We will continue to see systems composed of many hardware accelerator blocks, and advanced silicon integration technologies (i.e., 3D stacking and chiplets) will make that even easier,” says Mythic’s Fick. “There are many companies working on open standards for chiplets, enabling communication bandwidth and energy efficiency that is an order of magnitude greater than what can be built on a PCB. Perhaps soon, the advanced system-in-package will overtake the PCB as the way systems are designed.”

Chiplets are not likely to be highly configurable. “Configuration in the chiplet world might become just a function of switching off things you don’t need,” says Schirrmeister. “Configuration really means that you do not use certain things. You don’t get your money back for those items. It’s all basically applying math and predicting what your volumes are going to be. If it’s an incremental cost that has one more block on it to support another interface, or making the block the Ethernet block with time triggered stuff in it for automotive, that gives you an incremental effort of X. Now, you have to basically estimate whether it also gives you a multiple of that incremental effort as incremental profit. It works out this way because chips just become very configurable. Chiplets are just going in the direction or finding the balance of more generic usage so that you can apply them in more chiplet designs.”

The chiplet market is far from certain today. “The promise of chiplets is that you use only the function that you want from the supplier that you want, in the right node, at the right location,” says Expedera’s Karazuba. “The idea of specialization and chiplets are at arm’s length. They’re actually together, but chiplets have a long way to go. There’s still not that universal agreement of the different things around a chiplet that have to be in order to make the product truly mass market.”

While chiplets have been proven to work, nearly all of the chiplets in use today are proprietary. “To build a viable [commercial] chiplet company, you have to be going after a broad enough market, large enough from a dollar perspective, then you can make all the investment, have success and get everything back accordingly,” says Blue Cheetah’s Alon. “There’s a similar tension where people would like to build a general-purpose chiplet that can be used anywhere, by anyone. That is the plug-and-play discussion, but you could finish up with something that becomes so general-purpose, with so much overhead, that it’s just not attractive in any particular market. In the chiplet case, for technical reasons, it might not actually really work that way at all. You might try to build it for general purpose, and it turns out later that it doesn’t plug into particular sockets that are of interest.”

The economics of chiplet viability have not yet been defined. “The thing about chiplets is they can be small,” says Klein. “Being small means that we don’t need as big a market for them as we would for a very large chip. We can also build them on different technologies. We can have some that are on older technologies, where transistors are cheaper, and we can combine those with other chiplets that might be leading-edge nodes where we could have general-purpose CPUs or NPU accelerators. There’s a mix-and-match, and we can do chiplets smaller than we can general-purpose chips. We can do smaller runs of them. We can take that IP and customize it for a particular market vertical and create some chiplets for that, change the configuration a bit, and do another run for something else. There’s a level of customization that can be deployed and supported by the market that’s a little bit more than we’ve seen in full-size chips, where the entire thing has to be built into one package.

Conclusion
What it means for a design to be general-purpose or custom is changing. All designs will contain some of each. Some companies will develop novel architectures using general-purpose processors, and these will be better than a fully general-purpose solution. Others will create highly customized hardware for some functions that are known to be stable, and general purpose for things that are likely to change. One thing has never changed, however. A company is not likely to add more customization than necessary to satisfy the needs of the market they are targeting.

Further Reading
Challenges With Chiplets And Power Delivery
Benefits and challenges in heterogeneous integration.
Chiplets: 2023 (EBook)
What chiplets are, what they are being used for today, and what they will be used for in the future.

The post Will Domain-Specific ICs Become Ubiquitous? appeared first on Semiconductor Engineering.

Kotaku
Final Fantasy Ghosts, Warframe Starting Help, And More TipsKotaku Staff
A game doesn’t have to be newly released for it to require some guidance to get through, as this week’s batch of tips will prove. From the OG Final Fantasy 7 to Metal Gear Solid 2, there’s some older games that people have been recently dipping into that may require some support to get through. Thats’ what we’re here…Read more...
18. Květen 2024 v 16:00

Final Fantasy Ghosts, Warframe Starting Help, And More Tips

Kotaku

Od: Kotaku Staff

18. Květen 2024 v 16:00

A game doesn’t have to be newly released for it to require some guidance to get through, as this week’s batch of tips will prove. From the OG Final Fantasy 7 to Metal Gear Solid 2, there’s some older games that people have been recently dipping into that may require some support to get through. Thats’ what we’re here…

Reset Domain Crossing Verification

Semiconductor Engineering

Od: Siemens EDA

13. Květen 2024 v 09:01

By Reetika and Sulabh Kumar Khare, Siemens EDA DI SW

To meet low-power and high-performance requirements, system on chip (SoC) designs are equipped with several asynchronous and soft reset signals. These reset signals help to safeguard software and hardware functional safety as they can be asserted to speedily recover the system onboard to an initial state and clear any pending errors or events.

By definition, a reset domain crossing (RDC) occurs when a path’s transmitting flop has an asynchronous reset, and the receiving flop either has a different asynchronous reset than the transmitting flop or has no reset. The multitude of asynchronous reset sources found in today’s complex automotive designs means there are a large number of RDC paths, which can lead to systematic faults and hence cause data-corruption, glitches, metastability, or functional failures — along with other issues.

This issue is not covered by standard, static verification methods, such as clock domain crossing (CDC) analysis. Therefore, a proper reset domain crossing verification methodology is required to prevent errors in the reset design during the RTL verification stage.

A soft reset is an internally generated reset (register/latch/black-box output is used as a reset) that allows the design engineer to reset a specific portion of the design (specific module/subsystem) without affecting the entire system. Design engineers frequently use a soft reset mechanism to reset/restart the device without fully powering it off, as this helps to conserve power by selectively resetting specific electronic components while keeping others in an operational state. A soft reset typically involves manipulating specific registers or signals to trigger the reset process. Applying soft resets is a common technique used to quickly recover from a problem or test a specific area of the design. This can save time during simulation and verification by allowing the designer to isolate and debug specific issues without having to restart the entire simulation. Figure 1 shows a simple soft reset and its RTL to demonstrate that SoftReg is a soft reset for flop Reg.

Fig. 1: SoftReg is a soft reset for register Reg.

This article presents a systematic methodology to identify RDCs, with different soft resets, that are unsafe, even though the asynchronous reset domain is the same on the transmitter and receiver ends. Also, with enough debug aids, we will identify the safe RDCs (safe from metastability only if it meets the static timing analysis), with different asynchronous reset domains, that help to avoid silicon failures and minimize false crossing results. As a part of static analysis, this systematic methodology enables designers to intelligently identify critical reset domain bugs associated with soft resets.

A methodology to identify critical reset domain bugs

With highly complex reset architectures in automotive designs, there arises the need for a proper verification method to detect RDC issues. It is essential to detect unsafe RDCs systematically and apply appropriate synchronization techniques to tackle the issues that may arise due to delays in reset paths caused by soft resets. Thus designers can ensure proper operation of their designs and avoid the associated risks. By handling RDCs effectively, designers can mitigate potential issues and enhance the overall robustness and performance of a design. This systematic flow involves several steps to assist in RDC verification closure using standard RDC verification tools (see figure 2).

Fig. 2: Flowchart of methodology for RDC verification.

Specification of clock and reset signals

Signals that are intended to generate a clock and reset pulse should be specified by the user as clock or reset signals, respectively, during the set-up step in RDC verification. By specifying signals as clocks or resets (according to their expected behavior), designers can perform design rule checking and other verification checks to ensure compliance with clock and reset related guidelines and standards as well as best practices. This helps identify potential design issues and improve the overall quality of the design by reducing noise in the results.

Clock detection

Ideally, design engineers should define the clock signals and then the verification tool should trace these clocks down to the leaf clocks. Unfortunately, with complex designs, this is not possible as the design might have black boxes that originate clocks, or it may have some combinational logic in the clock signals that do not cover all the clocks specified by the user. All the un-specified clocks need to be identified and mapped to the user-specified primary clocks. An exhaustive detection of clocks is required in RDC verification, as potential metastability may occur if resets are used in different clock domains than the sequential element itself, leading to critical bugs.

Reset detection

Ideally, design engineers should define the reset signals, but again, due to the complexity of automotive and other modern designs, it is not possible to specify all the reset signals. Therefore a specialized verification tool is required for detection of resets. All the localized, black-box, gated, and primary resets need to be identified, and based on their usage in the RTL, they should be classified as synchronous, asynchronous, or dual type and then mapped to the user-specified primary resets.

Soft reset detection

The soft resets — i.e., the internally generated resets by flops and latches — need to be systematically detected as they can cause critical metastability issues when used in different clock domains, and they require static timing analysis when used in the same clock domain. Detecting soft resets helps identify potential metastability problems and allows designers to apply proper techniques for resolving these issues.

Reset tree analysis

Analysis of reset trees helps designers identify issues early in the design process, before RDC analysis. It helps to highlight some important errors in the reset design that are not commonly caught by lint tools. These include:

Dual synchronicity reset signals, i.e., the reset signal with a sample synchronous reset flop and a sample asynchronous reset flop
An asynchronous set/reset signal used as a data signal can result in incorrect data sampling because the reset state cannot be controlled

Reset domain crossing analysis

This step involves analyzing a design to determine the logic across various reset domains and identify potential RDCs. The analysis should also identify common reset sequences of asynchronous and soft reset sources at the transmitter and receiver registers of the crossings to avoid detection of false crossings that might appear as potential issues due to complex combinations of reset sources. False crossings are where a transmitter register and receiver register are asserted simultaneously due to dependencies among the reset assertion sequences, and as a result, any metastability that might occur on the receiver end is mitigated.

Analyze and fix RDC issues

The concluding step is to analyze the results of the verification steps to verify if data paths crossing reset domains are safe from metastability. For the RDCs identified as unsafe — which may occur either due to different asynchronous reset domains at the transmitter and receiver ends or due to the soft reset being used in a different clock domain than the sequential element itself — design engineers can develop solutions to eliminate or mitigate metastability by restructuring the design, modifying reset synchronization logic, or adjusting the reset ordering. Traditionally safe RDCs — i.e., crossings where a soft reset is used in the same clock domain as the sequential element itself — need to be verified using static timing analysis.

Figure 3 presents our proposed flow for identifying and eliminating metastability issues due to soft resets. After implementing the RDC solutions, re-verify the design to ensure that the reset domain crossing issues have been effectively addressed.

Fig. 3: Flowchart for proposed methodology to tackle metastability issues due to soft resets.

This methodology was used on a design with 374,546 register bits, 8 latch bits, and 45 RAMs. The Questa RDC verification tool using this new methodology identified around 131 reset domains, which consisted of 19 asynchronous domains defined by the user, as well as 81 asynchronous reset domains inferred by the tool.

The first run analyzed data paths crossing asynchronous reset domains without any soft reset analysis. It reported nearly 40,000 RDC crossings (as shown in table 1).

Reset domain crossings without soft reset analysis	Severity	Number of crossings
Reset domain crossing from a reset to a reset	Violation	28408
Reset domain crossing from a reset to non-reset	Violation	11235

Table 1: RDC analysis without soft resets.

In the second run, we did soft reset analysis and detected 34 soft resets, which resulted in additional violations for RDC paths with transmitter soft reset sources in different clock domains. These were critical violations that were missed in the initial run. Also, some RDC violations were converted to cautions (RDC paths with a transmitter soft reset in the same clock domain) as these paths would be safe from metastability as long as they meet the setup time window (as shown in table 2).

Reset domain crossings with soft reset analysis	Severity	Number of crossings
Reset domain crossing from a reset to a reset	Violation	26957
Reset domain crossing from a reset to non-reset	Violation	10523
Reset domain crossing with tx reset source in different clock	Violation	880
Reset domain crossing from a reset to Rx with same clock	Caution	2412

Table 2: RDC analysis with soft resets.

To gain a deeper understanding of RDC, metastability, and soft reset analysis in the context of this new methodology, please download the full paper Techniques to identify reset metastability issues due to soft resets.

The post Reset Domain Crossing Verification appeared first on Semiconductor Engineering.

Semiconductor Engineering
SRAM Security Concerns GrowKaren Heyman
SRAM security concerns are intensifying as a combination of new and existing techniques allow hackers to tap into data for longer periods of time after a device is powered down. This is particularly alarming as the leading edge of design shifts from planar SoCs to heterogeneous systems in package, such as those used in AI or edge processing, where chiplets frequently have their own memory hierarchy. Until now, most cybersecurity concerns involving volatile memory have focused on DRAM, because it
9. Květen 2024 v 09:08

SRAM Security Concerns Grow

Semiconductor Engineering

Od: Karen Heyman

9. Květen 2024 v 09:08

SRAM security concerns are intensifying as a combination of new and existing techniques allow hackers to tap into data for longer periods of time after a device is powered down.

This is particularly alarming as the leading edge of design shifts from planar SoCs to heterogeneous systems in package, such as those used in AI or edge processing, where chiplets frequently have their own memory hierarchy. Until now, most cybersecurity concerns involving volatile memory have focused on DRAM, because it is often external and easier to attack. SRAM, in contrast, does not contain a component as obviously vulnerable as a heat-sensitive capacitor, and in the past it has been harder to pinpoint. But as SoCs are disaggregated and more features are added into devices, SRAM is becoming a much bigger security concern.

The attack scheme is well understood. Known as cold boot, it was first identified in 2008, and is essentially a variant of a side-channel attack. In a cold boot approach, an attacker dumps data from internal SRAM to an external device, and then restarts the system from the external device with some code modification. “Cold boot is primarily targeted at SRAM, with the two primary defenses being isolation and in-memory encryption,” said Vijay Seshadri, distinguished engineer at Cycuity.

Compared with network-based attacks, such as DRAM’s rowhammer, cold boot is relatively simple. It relies on physical proximity and a can of compressed air.

The vulnerability was first described by Edward Felton, director of Princeton University’s Center for Information Technology Policy, J. Alex Halderman, currently director of the Center for Computer Security & Society at the University of Michigan, and colleagues. The breakthrough in their research was based on the growing realization in the engineering research community that data does not vanish from memory the moment a device is turned off, which until then was a common assumption. Instead, data in both DRAM and SRAM has a brief “remanence.”[1]

Using a cold boot approach, data can be retrieved, especially if an attacker sprays the chip with compressed air, cooling it enough to slow the degradation of the data. As the researchers described their approach, “We obtained surface temperatures of approximately −50°C with a simple cooling technique — discharging inverted cans of ‘canned air’ duster spray directly onto the chips. At these temperatures, we typically found that fewer than 1% of bits decayed even after 10 minutes without power.”

Unfortunately, despite nearly 20 years of security research since the publication of the Halderman paper, the authors’ warning still holds true. “Though we discuss several strategies for mitigating these risks, we know of no simple remedy that would eliminate them.”

However unrealistic, there is one simple and obvious remedy to cold boot — never leave a device unattended. But given human behavior, it’s safer to assume that every device is vulnerable, from smart watches to servers, as well as automotive chips used for increasingly autonomous driving.

While the original research exclusively examined DRAM, within the last six years cold boot has proven to be one of the most serious vulnerabilities for SRAM. In 2018, researchers at Germany’s Technische Universität Darmstadt published a paper describing a cold boot attack method that is highly resistant to memory erasure techniques, and which can be used to manipulate the cryptographic keys produced by the SRAM physical unclonable function (PUF).

As with so many security issues, it’s been a cat-and-mouse game between remedies and counter-attacks. And because cold boot takes advantage of slowing down memory degradation, in 2022 Yang-Kyu Choi and colleagues at the Korea Advanced Institute of Science and Technology (KAIST), described a way to undo the slowdown with an ultra-fast data sanitization method that worked within 5 ns, using back bias to control the device parameters of CMOS.

Fig. 1: Asymmetric forward back-biasing scheme for permanent erasing. (a) All the data are reset to 1. (b) All the data are reset to 0. Whether all the data where reset to 1 or 0 is determined by the asymmetric forward back-biasing scheme. Source: KAIST/Creative Commons [2]

Their paper, as well as others, have inspired new approaches to combating cold boot attacks.

“To mitigate the risk of unauthorized access from unknown devices, main devices, or servers, check the authenticated code and unique identity of each accessing device,” said Jongsin Yun, memory technologist at Siemens EDA. “SRAM PUF is one of the ways to securely identify each device. SRAM is made of two inverters cross-coupled to each other. Although each inverter is designed to be the same device, normally one part of the inverter has a somewhat stronger NMOS than the other due to inherent random dopant fluctuation. During the initial power-on process, SRAM data will be either data 1 or 0, depending on which side has a stronger device. In other words, the initial data state of the SRAM array at the power on is decided by this unique random process variation and most of the bits maintain this property for life. One can use this unique pattern as a fingerprint of a device. The SRAM PUF data is reconstructed with other coded data to form a cryptographic key. SRAM PUF is a great way to anchor its secure data into hardware. Hackers may use a DFT circuit to access the memory. To avoid insecurely reading the SRAM information through DFT, the security-critical design makes DFT force delete the data as an initial process of TEST mode.”

However, there can be instances where data may be required to be kept in a non-volatile memory (NVM). “Data is considered insecure if the NVM is located outside of the device,” said Yun. “Therefore, secured data needs to be stored within the device with write protection. One-time programmable (OTP) memory or fuses are good storage options to prevent malicious attackers from tampering with the modified information. OTP memory and fuses are used to store cryptographic keys, authentication information, and other critical settings for operation within the device. It is useful for anti-rollback, which prevents hackers from exploiting old vulnerabilities that have been fixed in newer versions.”

Chiplet vulnerabilities
Chiplets also could present another vector for attack, due to their complexity and interconnections. “A chiplet has memory, so it’s going to be attacked,” said Cycuity’s Seshadri. “Chiplets, in general, are going to exacerbate the problem, rather than keeping it status quo, because you’re going to have one chiplet talking to another. Could an attack on one chiplet have a side effect on another? There need to be standards to address this. In fact, they’re coming into play already. A chiplet provider has to say, ‘Here’s what I’ve done for security. Here’s what needs to be done when interfacing with another chiplet.”

Yun notes there is a further physical vulnerability for those working with chiplets and SiPs. “When multiple chiplets are connected to form a SiP, we have to trust data coming from an external chip, which creates further complications. Verification of the chiplet’s authenticity becomes very important for SiPs, as there is a risk of malicious counterfeit chiplets being connected to the package for hacking purposes. Detection of such counterfeit chiplets is imperative.”

These precautions also apply when working with DRAM. In all situations, Seshardi said, thinking about security has to go beyond device-level protection. “The onus of protecting DRAM is not just on the DRAM designer or the memory designer,” he said. “It has to be secured by design principles when you are developing. In addition, you have to look at this holistically and do it at a system level. You must consider all the other things that communicate with DRAM or that are placed near DRAM. You must look at a holistic solution, all the way from software down to things like the memory controller and then finally, the DRAM itself.”

Encryption as a backup
Data itself always must be encrypted as second layer of protection against known and novel attacks, so an organization’s assets will still be protected even if someone breaks in via cold boot or another method.

“The first and primary method of preventing a cold boot attack is limiting physical access to the systems, or physically modifying the systems case or hardware preventing an attacker’s access,” said Jim Montgomery, market development director, semiconductor at TXOne Networks. “The most effective programmatic defense against an attack is to ensure encryption of memory using either a hardware- or software-based approach. Utilizing memory encryption will ensure that regardless of trying to dump the memory, or physically removing the memory, the encryption keys will remain secure.”

Montgomery also points out that TXOne is working with the Semiconductor Manufacturing Cybersecurity Consortium (SMCC) to develop common criteria based upon SEMI E187 and E188 standards to assist DM’s and OEM’s to implement secure procedures for systems security and integrity, including controlling the physical environment.

What kind and how much encryption will depend on use cases, said Jun Kawaguchi, global marketing executive for Winbond. “Encryption strength for a traffic signal controller is going to be different from encryption for nuclear plants or medical devices, critical applications where you need much higher levels,” he said. “There are different strengths and costs to it.”

Another problem, in the post-quantum era, is that encryption itself may be vulnerable. To defend against those possibilities, researchers are developing post-quantum encryption schemes. One way to stay a step ahead is homomorphic encryption [HE], which will find a role in data sharing, since computations can be performed on encrypted data without first having to decrypt it.

Homomorphic encryption could be in widespread use as soon as the next few years, according to Ronen Levy, senior manager for IBM’s Cloud Security & Privacy Technologies Department, and Omri Soceanu, AI Security Group manager at IBM. However, there are still challenges to be overcome.

“There are three main inhibitors for widespread adoption of homomorphic encryption — performance, consumability, and standardization,” according to Levy. “The main inhibitor, by far, is performance. Homomorphic encryption comes with some latency and storage overheads. FHE hardware acceleration will be critical to solving these issues, as well as algorithmic and cryptographic solutions, but without the necessary expertise it can be quite challenging.”

An additional issue is that most consumers of HE technology, such as data scientists and application developers, do not possess deep cryptographic skills, HE solutions that are designed for cryptographers can be impractical. A few HE solutions require algorithmic and cryptographic expertise that inhibit adoption by those who lack these skills.

Finally, there is a lack of standardization. “Homomorphic encryption is in the process of being standardized,” said Soceanu. “But until it is fully standardized, large organizations may be hesitant to adopt a cryptographic solution that has not been approved by standardization bodies.”

Once these issues are resolved, they predicted widespread use as soon as the next few years. “Performance is already practical for a variety of use cases, and as hardware solutions for homomorphic encryption become a reality, more use cases would become practical,” said Levy. “Consumability is addressed by creating more solutions, making it easier and hopefully as frictionless as possible to move analytics to homomorphic encryption. Additionally, standardization efforts are already in progress.”

A new attack and an old problem
Unfortunately, security never will be as simple as making users more aware of their surroundings. Otherwise, cold boot could be completely eliminated as a threat. Instead, it’s essential to keep up with conference talks and the published literature, as graduate students keep probing SRAM for vulnerabilities, hopefully one step ahead of genuine attackers.

For example, SRAM-related cold boot attacks originally targeted discrete SRAM. The reason is that it’s far more complicated to attack on-chip SRAM, which is isolated from external probing and has minimal intrinsic capacitance. However, in 2022, Jubayer Mahmod, then a graduate student at Virginia Tech and his advisor, associate professor Matthew Hicks, demonstrated what they dubbed “Volt Boot,” a new method that could penetrate on-chip SRAM. According to their paper, “Volt Boot leverages asymmetrical power states (e.g., on vs. off) to force SRAM state retention across power cycles, eliminating the need for traditional cold boot attack enablers, such as low-temperature or intrinsic data retention time…Unlike other forms of SRAM data retention attacks, Volt Boot retrieves data with 100% accuracy — without any complex post-processing.”

Conclusion
While scientists and engineers continue to identify vulnerabilities and develop security solutions, decisions about how much security to include in a design is an economic one. Cost vs. risk is a complex formula that depends on the end application, the impact of a breach, and the likelihood that an attack will occur.

“It’s like insurance,” said Kawaguchi. “Security engineers and people like us who are trying to promote security solutions get frustrated because, similar to insurance pitches, people respond with skepticism. ‘Why would I need it? That problem has never happened before.’ Engineers have a hard time convincing their managers to spend that extra dollar on the costs because of this ‘it-never-happened-before’ attitude. In the end, there are compromises. Yet ultimately, it’s going to cost manufacturers a lot of money when suddenly there’s a deluge of demands to fix this situation right away.”

References

S. Skorobogatov, “Low temperature data remanence in static RAM”, Technical report UCAM-CL-TR-536, University of Cambridge Computer Laboratory, June 2002.
Han, SJ., Han, JK., Yun, GJ. et al. Ultra-fast data sanitization of SRAM by back-biasing to resist a cold boot attack. Sci Rep 12, 35 (2022). https://doi.org/10.1038/s41598-021-03994-2

The post SRAM Security Concerns Grow appeared first on Semiconductor Engineering.

Semiconductor Engineering
Software-Defined Vehicle Momentum GrowsAnn Mutschler
Experts at the Table: The automotive ecosystem is undergoing a transformation toward software-defined vehicles, spurring new architectures with more software. Semiconductor Engineering sat down to discuss the impact of these changes with Suraj Gajendra, vice president of products and solutions in Arm‘s automotive line of business; Chuck Alpert, R&D automotive fellow at Cadence; Steve Spadoni, zone controller and power distribution application manager at Infineon; Rebeca Delgado, chief techno
9. Květen 2024 v 09:06

Software-Defined Vehicle Momentum Grows

Semiconductor Engineering

Od: Ann Mutschler

9. Květen 2024 v 09:06

Experts at the Table: The automotive ecosystem is undergoing a transformation toward software-defined vehicles, spurring new architectures with more software. Semiconductor Engineering sat down to discuss the impact of these changes with Suraj Gajendra, vice president of products and solutions in Arm‘s automotive line of business; Chuck Alpert, R&D automotive fellow at Cadence; Steve Spadoni, zone controller and power distribution application manager at Infineon; Rebeca Delgado, chief technology officer and principal AI engineer at Intel Automotive; Cyril Clocher, senior director in the automotive product line for high-performance computing at Renesas; David Fritz, vice president, hybrid and virtual systems at Siemens EDA; and Marc Serughetti, senior director, systems design group at Synopsys. What follows are excerpts of that discussion.

L-R: Arm’s Gajendra, Cadence’s Alpert, Infineon’s Spadoni, Intel’s Delgado, Renesas’ Clocher, Siemens’ Fritz, Synopsys’ Serughetti.

L-R: Arm’s Gajendra, Cadence’s Alpert, Infineon’s Spadoni, Intel’s Delgado, Renesas’ Clocher, Siemens’ Fritz, Synopsys’ Serughetti.

SE: The automotive ecosystem is undergoing a technology evolution the likes of which has not been seen, including the move to software-defined vehicles. To set a baseline for this discussion, what is your definition of an SDV?

Gajendra: A software-defined vehicle is a concept, a trend, an idea, where the whole ecosystem can drive new capabilities and new user experiences into the car, even after it rolls out of the showroom or dealership. It’s a pretty loaded concept. There’s a lot of infrastructure that needs to come together, such as software development in the cloud, seamless deployment of that software development onto the car, the whole deployment of over-the-air updates, and the connectivity. In short, the concept of a software-defined vehicle is expecting a world where we can drive new experiences, new capabilities, and new features into the car throughout its lifetime.

Alpert: In thinking about what SDV means, one example is the battery — especially in an EV. I’m not talking about the technology of the battery that’s evolved, but rather the idea that in the past when you wanted to charge your car in your garage and you were worried about starting a fire, you’d think, ‘No, don’t do that because your whole house could burn down.’ The idea is that in the past, maybe we might put a temperature sensor on the battery, but now we actually have software that can monitor it. It might even have AI to predict if the battery is reaching some state that might cause a fire in the future. You also might have something that connects to the power grid and learns when is a good time to charge, because it’s a low-usage period so it’s cheaper. This is just one part of the car, but you can imagine a whole bunch of software that you want to put on top of it in order to connect to the universe. You need a software-defined vehicle platform in order for this, or in all the other parts of your car, to communicate with the world and provide the best user experience.

Spadoni: Infineon’s definition of a software-defined vehicle is a redefining of architecture — specifically, electrical and electronic architecture, feature allocation, and the entire topology of the vehicle, from power generation and storage to power distribution and high compute. It really means new electrical architectures, and it has consequences for the business model of every OEM and Tier 1 involved. It’s a major change to previous methodologies in the last 30 years.

Delgado: Software-defined vehicle is not just over-the-air updates. It’s truly a new methodology and a new philosophy for how to architect every ingredient of the vehicle to continue to deliver value over time, in which the value is very tightly attached to the software that delivers the user experience. Ultimately, this architecture must enable the different practices on how to deliver this new value over time. What’s very interesting is that these practices of moving to software-defined architecture has been done by many other industries already. Intel has a ton of heritage, and actually helped those industries transform. That transformation is truly what we’re observing here. It’s an incredible opportunity, and possibly a crisis if not done right.

Clocher: To apply an analogy here, the car is the new smartphone. But for us, it’s more than that. I’ve heard about the platform, yes, and it’s the major architecture evolution that we’ll see in the next decade. For us at Renesas, it will be a journey that will take time to enhance the user experience, to generate new revenue streams for the industry as it moves from decentralized to centralized classic compute with zonal architecture. We can apply all those buzzwords to a software-defined vehicle. Those platform will need big computers and heavy complex hardware solutions and this will generate evolutions, upgrades to the car during its entire lifetime, but underneath we know — at least at Renesas, and certainly at some other players and silicon vendors — that this will need a huge amount of hardware resources to manage what we have in mind to deploy this platform.

Fritz: I see software-defined vehicles a bit differently than what’s been mentioned so far. For many years, you’d have the hardware team doing their design, and the software team doing their design, and it all needs to come together. There’s an English natural language discussion about what needs to happen, and as we all know, that never really goes terribly well. In automotive that becomes an integration storm, and it is a nightmare. With the new compute requirements that have been mentioned already, that just compounds the issue. So the way I see this is that we tend, as people who have an engineering background, to dive into how we’re going to do things. We hear ‘software-defined vehicle,’ we immediately think about how to do that. There’s not a lot of thought about why it needs to be done, and what needs to happen. We jump into the ‘how’ too early, and a lot of the discussion here is exemplary of that kind of approach. When I’m looking at software-defined vehicles, I’m looking at why it’s important that the software needs to run effectively on a piece of hardware. And for that hardware, why is it important for it to actually operate properly on the software? Then you can decide how to put together a new methodology that’s going to bring those things together. In the past, it’s been called hardware/software co-design. There have been attempts many times, and as has been mentioned, other industries have made this transition. What’s unique about automotive is that it’s not just one transition that needs to happen. It’s hundreds or thousands of transitions. The ecosystem needs to be turned upside down, which we’re seeing happen right now, and you need to bring all that together. It really is a methodology where you need the tooling, you need the processes, you need the thinking, you need the organizations to change so that they can make this transition in a realistic way. SDV is a huge transition. It is a way for the automotive industry to morph into something that has longevity and can meet customer expectations, which it really hasn’t met for some time now.

Serughetti: At the end of the day, if we look starting at the top from our perspective, SDV is a means to bring and enhance the car experience for the customer. That’s the end result that the OEMs look at, but they look at it from the perspective of how that improves the OEM efficiencies, and how that creates new business opportunities. The way we look at it, and what’s important, is the impact it has on the industry, the impact on the processes, on the methodologies, on the people, on the ecosystem, on the technology. It’s really a transformation of the automotive market that is going to fundamentally change how the industry moves forward and bring the OEM into a world in which they are really looking at how they become efficient in delivering cars, how they bring new features, but at the same time, how they evolve their business as well.

SE: As you’ve all described, SDV requires many inter-dependencies, and the entire ecosystem has to have an understanding of the ‘why,’ which should then lead back to laying out the plan for how to get there. Where does the ecosystem stand today in terms of realizing SDV?

Fritz: OEMs have decided in the last few years that they’ve got to take control of their own destiny. They cannot simply take what the suppliers provide. They need a methodology — like this whole SDV concept, and any tooling necessary to provide that — to push down into their suppliers, such that, ‘Here’s what I need. If you can’t do this for me, I will go find someone that will.’ This is not the old ecosystem that bubbled up from the IP to the Tier 2s, to the Tier 1s, and then to the OEMs, which gave them limited choices to go from. So when I say, “Turn the ecosystem upside down,” that’s what is happening. But every OEM has their own ecosystem, and they’re not all in the same place. Even region-to-region, they can be very different.

Delgado: This is a critical discussion, and effectively where the industry has to eventually settle. The magnitude of the transformation of the ecosystem includes roles in the technology evolution. The silicon content is expected to quadruple over the next few years in the vehicle for defining the in-cabin experience of the end user. At the end of the day, the complexity of the transition of roles is of such magnitude that the proprietary, fragmented, and broken approaches that David articulated are really not going to enable the industry to transform at the speed it requires to deliver and meet the experiences. But more than anything, they are not going to address the actual technology changes necessary to implement and allow for this value delivery mechanism. At the end of the day, this is where Intel really believes collaboration is key, and anybody who wants to participate in this ecosystem must provide scalability — also known as top-to-bottom support of the different product lines that our OEMs and Tier 1s are having to support, versus a broken-up approach on these ever-evolving higher performance and higher performance compute needs. It has to be future-proof, because you’re going to launch the vehicle eventually. So certain hardware has to be future-proofed to a certain affordability envelope, and there has to be a strategy around that. And then the ecosystem and that collaboration must be able to deliver that aggregation. It has to be done with certain anchoring technology that will allow us to deliver that performance. Collaboration is key in the sense that these technologies cannot be single-handedly owned, developed, let alone owned, defined, developed, and integrated by OEMs in silos with a proprietary end-to-end architecture definition. There obviously will be differentiations on the actual implementation, but the technologies at large have to have a sense of reuse, particularly from other verticals that have already done software-defined transformations and then tuned in the right ways toward the automotive requirements.

Spadoni: There are probably a wide variety of implementations. At Infineon, we partner with OEMs and Tier 1s and we see different approaches. For example, General Motors has more of a modular approach that emulates what happened in in the mobile phone space. It seems that Ford has a more pragmatic approach, along with Stellantis, but all of them are facing very similar challenges in that affordability has become a big problem. There are multiple generations of implementations that are going to occur, and you’ll see a striving toward how to pay for this extra hardware. It leads to tradeoffs in implementations of other systems that have to have savings in order for them to afford these vehicles. No one ever goes into a dealership and says, ‘Give me a software-defined vehicle.’ Everyone’s looking for value, and you can see it now with volumes going down. There’s a saturation of people buying at the high level. The OEMs want to get more sales, which means they’ll have to go to the lower-cost-value vehicles, and that’s going to affect the electrical and electronic architectures and the software-defined vehicle.

Clocher: What we’re seeing I would summarize as the impact on the ecosystem. We’re moving to an OEM-centric ecosystem. One size does not fit all, meaning OEMs will have their different tastes, their different definitions of levels of integration they want to have in their software-defined vehicle — especially given more complex tasks that we all have to do, rather than the challenge we have to solve, because we’re not talking about a common umbrella of software-defined vehicle. But it really does mean different implementations and different meanings for OEM A from OEM B. I would fully agree with David and Steve that we are far from having a common understanding of, at least, the market itself. And that’s fine, because this will bring differentiation, and ultimately that’s why a customer will go to Dealership A versus Dealership B. This is what the industry wants to see — continue to differentiate, continue to add value to the ultimate product, which is the car.

Serughetti: The important point in all this is, of course, you’re breaking the model that exists today. That’s one of the big challenges. We used to have Tier 1s that were building boxes, and delivering software. This was a complete black box. When it would go to integration, there were all sorts of problems. And now you’re going to break this? The challenge for the OEM is how they do this. They want to control software, but are they equipped to do this today? We see the problems today that some of the legacy OEMs have in setting up their software organizations, the challenges of CARIAD and all such organizations that are trying to do this. It’s not easy to change those companies. Of course, the new entrants don’t have this problem because they are coming from a brand new design versus the ones that deal with legacy. So for the OEM, it’s about how to take control of the software. What does that mean in terms of the processes, in terms of agile development, digital twins, and all of these technologies everybody’s talking about? The other side is, ‘It’s all nice, this software,’ but this software runs on all the companies that are delivering hardware, and that becomes essential to it. You can have the best software, but if your hardware is not there to support performance, power, and all of those aspects, you’re not going to be successful. So the ecosystem is evolving how hardware, software, and all of this comes together. The OEM wants to be the central point. That’s what we’re talking about in terms of the process methodology aspects that are making this transition evolve.

Gajendra: Where are we in this journey? How far have we come? And where are we going? Going back to the point that David mentioned earlier about supply chain evolving and the supply chain turned upside down, five years ago, if we sat here in this sort of a panel and discussed software-defined vehicles, the conversation would have been entirely different. It would have been stuck with the traditional supply chain that we’ve seen for the last 35 or 40 years in the automotive industry. There are fundamentally two aspects here. The supply chain is evolving, and the infrastructure that we, as a community — this team, for example, and many others in the community — are trying to enable is going to be key to making our EDA partners happy. The use of virtual platforms today in the cloud to try and shift left and develop and validate some of these technologies and software wasn’t even there five years ago, so we’ve come a long way. We’ve made a lot of progress together as an industry. Yes, we have a long way to go until we actually have a truly software-defined vehicle. We can go and ask for a software-defined vehicle in the dealership. But the changes we are seeing in terms of all sorts of technology providers trying to make sure that the technology that we eventually will have in the hardware is provided in some sort of virtual form, be it fast models or whatever it is in the cloud, for the vast majority of software ecosystem in automotive this is a big change. I was at Embedded World, and the amount of virtual platforms and the demos that people were actually showing — silicon partners like we have here, Intel, Renesas, Infineon, EDA companies — pointed to a strong movement of, ‘Let’s build the infrastructure that we can build, and then provide that infrastructure to the OEMs to take it from there.’ There is a lot of work going on. Together we will make the infrastructure across the board, be it virtual platform or others, richer and more capable.

Alpert: For sure, OEMs have to control their own destiny. In the past, they would do it by differentiating maybe because they had better engine performance, or some other feature. But going forward, the differentiation is going to be their software. Whoever can make software that will provide additional value, and brand it, that’s going to be the differentiator and that’s the trend. In terms of how you get there, a shared ecosystem is important. SOAFEE is a potential way that, together with virtual platforms, you can provide a shared ecosystem for development, but still allow everyone to differentiate and plug-and-play. That’s one reason we’re working closely with Arm on trying to have a reference design specifically for this purpose. But again, we’re not saying, ‘This is the design you use. This is how you do it.’ That’s not it. The point is, let’s start somewhere, and then people can start swapping out pieces and doing different things. As long as OEMs can plug-and-play, then they can still differentiate. But they don’t have to invent everything themselves, which would be too costly.

Related Reading
Software-Defined Vehicles Ready To Roll
New approach could have big effects on cost, safety, security, and time to market.

The post Software-Defined Vehicle Momentum Grows appeared first on Semiconductor Engineering.

Semiconductor Engineering
SRAM Security Concerns GrowKaren Heyman
SRAM security concerns are intensifying as a combination of new and existing techniques allow hackers to tap into data for longer periods of time after a device is powered down. This is particularly alarming as the leading edge of design shifts from planar SoCs to heterogeneous systems in package, such as those used in AI or edge processing, where chiplets frequently have their own memory hierarchy. Until now, most cybersecurity concerns involving volatile memory have focused on DRAM, because it
9. Květen 2024 v 09:08