Semiconductor Engineering

Blog Review: Aug. 21

By: Jesse Allen

August 21, 2024, 09:01

Cadence’s Reela Samuel explores the critical role of PCIe 6.0 equalization in maintaining signal integrity and solutions to mitigate verification challenges, such as creating checkers to verify all symbols of TS0, ensuring the correct functioning of scrambling, and monitoring phase and LTSSM state transitions.

Siemens’ John McMillan introduces an advanced packaging flow for Intel’s Embedded Multi-die Interconnect Bridge (EMIB) technology, including technical challenges, design methodologies, and the integration of EMIBs in system-level package designs.

Synopsys’ Dustin Todd checks out what’s next for the U.S. CHIPS and Science Act, including the establishment of the National Semiconductor Technology Center and the allocation of $13 billion for research and development efforts.

Keysight’s Roberto Piacentini Filho explores the challenges of managing the large design files and massive volumes of data involved in a modern chip design project, which can take up as much as a terabyte of disk space and involve hundreds of thousands of files.

Arm’s Sandeep Mistry shows how ML models developed for mobile computer vision applications and requiring tens to hundreds of millions of multiply-accumulate (MAC) operations per inference can be deployed to a modern microcontroller.

Ansys’ Aliyah Mallak explores an effort to manufacture biotech products in microgravity and how simulation helps ensure payloads containing delicate, temperature-sensitive spore samples and bioreactors make it safely to the International Space Station or low Earth orbit.

Micron Technology’s Amit Srivastava, ULVAC’s Brian Coppa, and SEMI’s Mark da Silva suggest tackling corporate sustainability goals with a bottom-up approach that leverages various sensing technologies, at the cleanroom, sub-fab, and facilities levels for both greenfield and brownfield device-making facilities, to enable predictive analytics.

And don’t miss the blogs featured in the latest Manufacturing, Packaging & Materials newsletter:

Amkor’s JeongMin Ju shows how to prevent critical failures in copper RDLs caused by overcurrent-induced fusing.

Synopsys’ Al Blais discusses curvilinear checking and fracture requirements for the MULTIGON era.

Lam Research’s Dempsey Deng compares the parasitic capacitance of a 6F² honeycomb DRAM device to a 4F² VCAT DRAM structure.

Brewer Science’s Jessica Albright covers debonding methods, thermal, topography, adhesion, and thickness variation.

SEMI’s John Cooney reviews a fireside chat between the President of SEMI Americas and the U.S. Under Secretary of State for Economic Growth, Energy, and the Environment on securing supply chains.

The post Blog Review: Aug. 21 appeared first on Semiconductor Engineering.

Semiconductor Engineering

Ensure Reliability In Automotive ICs By Reducing Thermal Effects

By: Lee Wang

August 5, 2024, 09:01

In the relentless pursuit of performance and miniaturization, the semiconductor industry has increasingly turned to 3D integrated circuits (3D-ICs) as a cutting-edge solution. Stacking dies in a 3D assembly offers numerous benefits, including enhanced performance, reduced power consumption, and more efficient use of space. However, this advanced technology also introduces significant thermal dissipation challenges that can impact the electrical behavior, reliability, performance, and lifespan of the chips (figure 1). For automotive applications, where safety and reliability are paramount, managing these thermal effects is of utmost importance.

Fig. 1: Illustration of a 3D-IC with heat dissipation.

3D-ICs have become particularly attractive for safety-critical devices like automotive sensors. Advanced driver-assistance systems (ADAS) and autonomous vehicles (AVs) rely on these compact, high-performance chips to process vast amounts of sensor data in real time. Effective thermal management in these devices is a top priority to ensure that they function reliably under various operating conditions.

The thermal challenges of 3D-ICs in automotive applications

The stacked configuration of 3D-ICs inherently leads to complex thermal dynamics. In traditional 2D designs, heat dissipation occurs across a single plane, making it relatively straightforward to manage. However, in 3D-ICs, multiple active layers generate heat, creating significant thermal gradients and hotspots. These thermal issues can adversely affect device performance and reliability, which is particularly critical in automotive applications where components must operate reliably under extreme temperatures and harsh conditions.

These thermal effects in automotive 3D-ICs can impact the electrical behavior of the circuits, causing timing errors, increased leakage currents, and potential device failure. Therefore, accurate and comprehensive thermal analysis throughout the design flow is essential to ensure the reliability and performance of automotive ICs.
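
To make the coupling between temperature and leakage concrete, the following minimal sketch iterates a lumped electrothermal model to a steady state. It is an illustration only, not the method of any tool mentioned here: the exponential leakage model, the single junction-to-ambient thermal resistance, and every numeric value (`k_per_c`, `r_theta_c_per_w`, the power figures) are assumptions chosen for readability.

```python
import math


def steady_state_temperature(p_dynamic_w, p_leak_ref_w, r_theta_c_per_w,
                             t_ambient_c, t_ref_c=25.0, k_per_c=0.02,
                             tol_c=1e-6, max_iter=200, t_limit_c=1000.0):
    """Solve T = T_amb + R_theta * (P_dyn + P_leak(T)) by fixed-point iteration.

    Leakage is modeled as P_leak(T) = P_leak_ref * exp(k * (T - T_ref)),
    a common first-order approximation (assumed here, not from the article).
    Returns (junction temperature in C, leakage power in W).
    """
    t = t_ambient_c
    for _ in range(max_iter):
        p_leak = p_leak_ref_w * math.exp(k_per_c * (t - t_ref_c))
        t_new = t_ambient_c + r_theta_c_per_w * (p_dynamic_w + p_leak)
        if t_new > t_limit_c:
            raise RuntimeError("temperature diverges: thermal runaway in this model")
        if abs(t_new - t) < tol_c:
            return t_new, p_leak
        t = t_new
    raise RuntimeError("no convergence within max_iter")


# Hypothetical operating point: 40 W dynamic, 2 W leakage at 25 C,
# 0.5 C/W to an 85 C ambient.
t_j, p_leak = steady_state_temperature(40.0, 2.0, 0.5, 85.0)
print(f"junction: {t_j:.1f} C, leakage: {p_leak:.1f} W")
```

In this toy model the leakage term raises the junction temperature above what the dynamic power alone would produce, and a sufficiently poor thermal path makes the iteration diverge, which is one way to picture the device-failure risk described above.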

The importance of early and continuous thermal analysis

Traditionally, thermal analysis has been performed at the package and system levels, often as a separate process from IC design. However, with the advent of 3D-ICs, this approach is no longer sufficient.

To address the thermal challenges of 3D-ICs for automotive applications, it is crucial to incorporate die-level thermal analysis early in the design process and continue it throughout the design flow (figure 2). Early-stage thermal analysis can help identify potential hotspots and thermal bottlenecks before they become critical issues, enabling designers to make informed decisions about chiplet placement, power distribution, and cooling strategies. These early decisions reduce the risks of thermal-induced failures, improving the reliability of 3D automotive ICs.

Fig. 2: Die-level detailed thermal analysis using accurate package and boundary conditions should be fully integrated into the ASIC design flow to allow for fast thermal exploration.

Early package design, floorplanning and thermal feasibility analysis

During the initial package design and floorplanning stage, designers can use high-level power estimates and simplified models to perform thermal feasibility studies. These early analyses help identify configurations that are likely to cause thermal problems, allowing designers to rule out problematic designs before investing significant time and resources in detailed implementation.

Fig. 3: Thermal analysis as part of the package design, floorplanning and implementation flows.

For example, thermal analysis can reveal issues such as overlapping heat sources in stacked dies or insufficient cooling paths. By identifying these problems early, designers can explore alternative floorplans and adjust power distribution to mitigate thermal risks. This proactive approach reduces the likelihood of encountering critical thermal issues late in the design process, thereby shortening the overall design cycle.
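
A feasibility study of this kind can be approximated, very roughly, with a one-dimensional thermal resistance ladder. The sketch below assumes a single heat path from the bottom die up through the stack to a heat sink, with hypothetical resistances and power values. Real 3D-IC analysis accounts for lateral spreading, multiple heat paths, and detailed geometry, so treat this only as a way to compare floorplan options at the concept stage.

```python
def stack_temperatures(powers_w, r_layers_c_per_w, t_ambient_c):
    """Estimate die temperatures in a stack with one heat path to the sink.

    powers_w[i]          power of die i in W; die 0 is nearest the heat sink
    r_layers_c_per_w[i]  thermal resistance in C/W between die i and the next
                         node toward the sink (for die 0: die to ambient)
    All heat generated at die i and farther from the sink flows through layer i.
    """
    if len(powers_w) != len(r_layers_c_per_w):
        raise ValueError("need one resistance per die")
    temps = []
    t_node = t_ambient_c
    heat_through_w = sum(powers_w)
    for power, resistance in zip(powers_w, r_layers_c_per_w):
        t_node = t_node + resistance * heat_through_w
        temps.append(t_node)
        heat_through_w -= power  # this die's heat does not cross deeper layers
    return temps


# Two hypothetical stacking orders for a 5 W die and a 10 W die.
resistances = [0.5, 0.3]  # assumed values, C/W
hot_die_far = stack_temperatures([5.0, 10.0], resistances, 85.0)
hot_die_near = stack_temperatures([10.0, 5.0], resistances, 85.0)
print("peak with 10 W die far from sink: ", max(hot_die_far))
print("peak with 10 W die next to sink:  ", max(hot_die_near))
```

Under these assumptions, placing the higher-power die next to the heat sink lowers the peak temperature, which is the kind of ordering decision an early study is meant to expose.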

Iterative thermal analysis throughout design refinement

As the design progresses and more detailed information becomes available, thermal analysis should be performed iteratively to refine the thermal model and verify that the design remains within acceptable thermal limits. At each stage of design refinement, additional details such as power maps, layout geometries and their material properties can be incorporated into the thermal model to improve accuracy.

This iterative approach lets designers continuously monitor and address thermal issues, ensuring that the design evolves in a thermally aware manner. By integrating thermal analysis with other design verification tasks, such as timing and power analysis, designers can achieve a holistic view of the design’s performance and reliability.

A robust thermal analysis tool should support various stages of the design process, providing value from initial concept to final signoff:

Early design planning: At the conceptual stage, designers can apply high-level power estimates to explore the thermal impact of different design options. This includes decisions related to 3D partitioning, die assembly, block and TSV floorplan, interface layer design, and package selection. By identifying potential thermal issues early, designers can make informed decisions that avoid costly redesigns later.
Detailed design and implementation: As designs become more detailed, thermal analysis should be used to verify that the design stays within its thermal budget. This involves analyzing the maturing package and die layout representations to account for their impact on thermally sensitive electrical circuits. Fine-grained power maps are crucial at this stage to capture hotspot effects accurately.
Design signoff: Before finalizing the design, it is essential to perform comprehensive thermal verification. This ensures that the design meets all thermal constraints and reliability requirements. Automated constraints checking and detailed reporting can expedite this process, providing designers with clear insights into any remaining thermal issues.
Connection to package-system analysis: Models from IC-level thermal analysis can be used in thermal analysis of the package and system. The integration lets designers build a streamlined flow through the entire development process of a 3D electronic product.

Tools and techniques for accurate thermal analysis

To effectively manage thermal challenges in automotive ICs, designers need advanced tools and techniques that can provide accurate and fast thermal analysis throughout the design flow. Modern thermal analysis tools are equipped with capabilities to handle the complexity of 3D-IC designs, from early feasibility studies to final signoff.

High-fidelity thermal models

Accurate thermal analysis requires high-fidelity thermal models that capture the intricate details of the 3D-IC assembly. These models should account for non-uniform material properties, fine-grained power distributions, and the thermal impact of through-silicon vias (TSVs) and other 3D features. Advanced tools can generate detailed thermal models based on the actual design geometries, providing a realistic representation of heat flow and temperature distribution.

For instance, tools like Calibre 3DThermal embed an optimized custom 3D solver from Simcenter Flotherm to perform precise thermal analysis down to the nanometer scale. By leveraging detailed layer information and accurate boundary conditions, these tools can produce reliable thermal models that reflect the true thermal behavior of the design.

Automation and results viewing

Automation is a key feature of modern thermal analysis tools, enabling designers to perform complex analyses without requiring deep expertise in thermal engineering. An effective thermal analysis tool must offer advanced automation to facilitate use by non-experts. Key automation features include:

Optimized gridding: Automatically applying finer grids in critical areas of the model to ensure high resolution where needed, while using coarser grids elsewhere for efficiency.
Time step automation: In transient analysis, smaller time steps can be automatically generated during power transitions to capture key impacts accurately.
Equivalent thermal properties: Automatically reducing model complexity while maintaining accuracy by applying different bin sizes for critical (hotspot) vs non-critical regions when generating equivalent thermal properties.
Power map compression: Using adaptive bin sizes to compress very large power maps to improve tool performance.
Automated reporting: Generating summary reports that highlight key results for easy review and decision-making (figure 4).

Fig. 4: Ways to view thermal analysis results.
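
The power map compression item in the list above can be pictured with a quadtree-style merge: regions where power is nearly uniform collapse into one large bin, while regions around a hotspot keep small bins. This is a generic sketch under assumed names and thresholds (`compress_power_map`, `tolerance_w`); it does not describe how any particular tool implements adaptive binning.

```python
def compress_power_map(grid, tolerance_w):
    """Compress a 2D power map into variable-size bins.

    grid         list of rows, each a list of per-cell power values in W
    tolerance_w  a tile is merged into one bin when (max - min) <= tolerance_w
    Returns a list of (row, col, height, width, mean_power_w) tuples.
    """
    bins = []

    def visit(r0, c0, h, w):
        cells = [grid[r][c] for r in range(r0, r0 + h) for c in range(c0, c0 + w)]
        uniform = max(cells) - min(cells) <= tolerance_w
        if uniform or (h == 1 and w == 1):
            bins.append((r0, c0, h, w, sum(cells) / len(cells)))
            return
        h2, w2 = max(h // 2, 1), max(w // 2, 1)
        # Split into up to four sub-tiles; skip empty ones for 1-wide tiles.
        for rr, hh in ((r0, h2), (r0 + h2, h - h2)):
            for cc, ww in ((c0, w2), (c0 + w2, w - w2)):
                if hh > 0 and ww > 0:
                    visit(rr, cc, hh, ww)

    visit(0, 0, len(grid), len(grid[0]))
    return bins


# Hypothetical 4x4 map: uniform background with one hotspot cell.
power_map = [[0.1] * 4 for _ in range(4)]
power_map[0][0] = 2.0
compressed = compress_power_map(power_map, tolerance_w=0.05)
print(len(compressed), "bins instead of 16 cells")
```

Because each bin stores the mean of its cells, the total power of the map is preserved (up to floating-point rounding) while the number of entries drops in the uniform regions.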

Automated thermal analysis tools can also integrate seamlessly with other design verification and implementation tools, providing a unified environment for managing thermal, electrical, and mechanical constraints. This integration ensures that thermal considerations are consistently addressed throughout the design flow, from initial feasibility analysis to final tape-out and even connecting with package-level analysis tools.

Real-world application

The practical benefits of integrated thermal analysis solutions are evident in real-world applications. For instance, a leading research organization, CEA, utilized an advanced thermal analysis tool from Siemens EDA to study the thermal performance of their 3DNoC demonstrator. The high-fidelity thermal model they developed showed a worst-case difference of just 3.75% and an average difference within 2% between simulation and measured data, demonstrating the accuracy and reliability of the tool (figure 5).

Fig. 5: Correlation of simulation versus measured results.
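
Correlation figures like these are typically summarized as a worst-case and an average percentage difference across measurement points. The sketch below shows one plausible way to compute them. The article does not state the exact formula or temperature reference CEA used, so the relative-to-measured definition here is an assumption, and the sample temperatures are invented purely for illustration.

```python
def percent_differences(simulated, measured):
    """Per-point absolute difference as a percentage of the measured value."""
    if len(simulated) != len(measured):
        raise ValueError("simulated and measured must have the same length")
    return [abs(s - m) / abs(m) * 100.0 for s, m in zip(simulated, measured)]


def correlation_summary(simulated, measured):
    """Return (worst-case %, average %) difference between two data sets."""
    diffs = percent_differences(simulated, measured)
    return max(diffs), sum(diffs) / len(diffs)


# Invented sample data in degrees C (not the CEA measurements).
measured_c = [60.0, 72.5, 80.0, 91.0]
simulated_c = [61.2, 71.8, 82.4, 90.1]
worst, average = correlation_summary(simulated_c, measured_c)
print(f"worst-case: {worst:.2f}%, average: {average:.2f}%")
```

Note that a percentage computed on Celsius values differs from one computed on Kelvin or on temperature rise above ambient, so the reference should be stated whenever such numbers are compared.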

The path forward for automotive 3D-IC thermal management

As the automotive industry continues to embrace advanced technologies, the importance of accurate thermal analysis throughout the design flow of 3D-ICs cannot be overstated. By incorporating thermal analysis early in the design process and iteratively refining thermal models, designers can mitigate thermal risks, reduce design time, and enhance chip reliability.

Advanced thermal analysis tools that integrate seamlessly with the broader design environment are essential for achieving these goals. These tools enable designers to perform high-fidelity thermal analysis, automate complex tasks, and ensure that thermal considerations are addressed consistently from package design, through implementation to signoff.

By embracing these practices, designers can unlock the full potential of 3D-IC technology, delivering innovative, high-performance devices that meet the demands of today’s increasingly complex automotive applications.

For more information about die-level 3D-IC thermal analysis, read Conquer 3DIC thermal impacts with Calibre 3DThermal.

The post Ensure Reliability In Automotive ICs By Reducing Thermal Effects appeared first on Semiconductor Engineering.

Semiconductor Engineering

RISC-V Heralds New Era Of Cooperation

By: Brian Bailey

May 30, 2024, 09:05

RISC-V is paving the way for open source to become accepted within the hardware community, creating a level of industry collaboration never seen in the past, while revitalizing the connection between academia and industry.

The big question is whether this arrangement is just a placeholder while the industry re-learns how to develop processors, or whether this processor architecture is something very different. In either case, there is a clear and pressing need for more flexible processor architectures, and at least for now, RISC-V has filled a void.

“RISC-V was born out of academia and has had strong collaboration within universities from day one,” says Loren Hobbs, vice president of product and business development at Bluespec. “This collaboration continues today, with many of the most popular open-source RISC-V processors having come from universities. Organizations such as OpenHW Group and CHIPS Alliance serve a central and critical role in driving the collaboration, which is bi-directional between the academic community and industry.”

Collaboration of this type has not existed with the industrial community in the past. “We are learning from each other,” says Florian Wohlrab, CEO at OpenHW. “We are learning best practices for verification. At the same time, we are learning what things to avoid. It is growing where people say, ‘Yes, I really get benefit from sharing ideas.’”

The need for processor flexibility exists within industry as well as academia. “There is a need within the industry for diversification on the processor front,” says Neil Hand, director of marketing at Siemens EDA. “In the past, this led to a fragmented set of companies that couldn’t work together. They didn’t see the value of working together. But RISC-V has a cohesive central organization where anyone who wants to get into the processor space can collaborate. They don’t have to expose their secret sauce, but they get to benefit from each other. A rising tide lifts all boats, and that’s really the situation we’re at with RISC-V.”

Longevity
Whether the industry can build upon this success, or whether it fizzles out over time, remains to be seen. But at least for now, RISC-V’s momentum is growing. “We are at the beginning of a revolution in hardware design,” says OpenHW’s Wohlrab. “We saw the same thing for software when Linux came out 20 or so years ago. No one was really thinking about sharing software or collaboratively developing software. There were some small open-source ventures, but working together on a big project took a long time to develop. Now we are all sharing software, all co-working. But for hardware, we’re just at the beginning of this new concept, and a lot of people need to understand that we can do the same for hardware as we did for software.”

Underlying RISC-V’s success is widespread collaboration. “One of the pillars sustaining the success of RISC-V is customization that works with the ecosystem and leverages a well-defined process,” says Sergio Marchese, vice president of application engineering at SmartDV. “RISC-V vendors face the challenge of showing how their processor customization capabilities serve an application and demonstrating the complete process on real hardware. Without strategic partnerships, RISC-V vendors must walk a much more challenging, time-consuming, and resource-intensive road.”

That framework is what makes it unique. “RISC-V has formed this framework for collaboration, and it fixes everything,” says Siemens’ Hand. “Now, when a university has a really cool idea for memory tagging in a processor design, they don’t have to build the compilers, they don’t have to build the reference platform. They already exist. Maybe a compiler optimization startup has this great idea for handling code optimization. They don’t have to build the rest of the ecosystem. When a processor IP company has this great idea, they can become focused within this bigger picture. That’s the unique nature of it. It’s not just a processor specification.”

Historically, one of the problems associated with open-source hardware was quality, because finding bugs in silicon is expensive. OpenHW is an important piece of the puzzle. “Why should everyone reinvent the wheel by themselves?” asks Wohlrab. “Why can’t we get the basic building blocks, some basic chips, take some design from academia, which has reasonably good quality, and build on them, verify them together. We are verifying with different tools, making sure we get a high coverage, and then everyone can go off and use them in their own chips for mass production, for volume shipment.”

This benefits companies both large and small. “There are several processor vendors that have switched to RISC-V,” says Hand. “Synopsys has moved to RISC-V. Andes has moved to RISC-V. MIPS has moved to RISC-V. Why? Because they can leverage the whole ecosystem. The downside of it is commoditization, which as a customer is really beneficial because you can delay choosing a processor till later in the design flow. Your early decision is to use the Arm ecosystem or RISC-V, and then you can work through it. That creates an interesting set of dynamics. You can start to create new opportunities for companies that develop and deliver IP, because you can benchmark them, swap them in and out, and see which one works. On the flip side, it makes it awful from a lock-in perspective once you’re in that socket.”

Fragmentation
Of course, there will be some friction in the system. “In the early days of RISC-V there was nearly a 1:1 balance between contributors and consumers of the technology,” says Geir Eide, director, product management for Siemens EDA. “Today there are thousands of RISC-V consumers, but only a small percentage of those will be contributors. There is a risk that there will be a disconnect between them. If, for instance, a particular market or regional segment is growing at a higher pace than others, or other market segments and regions are more conservative, they tend to stick to established solutions longer. That increases the risk that it could lead to fragmentation.”

Is that likely to impact development long term? “We do not believe that RISC-V will become regionally concentrated, although there may be regional concentrations of focus within the broad set of implementation choices provided by RISC-V,” says Bluespec’s Hobbs. “A prime example of this is the Barcelona Supercomputing Center, creating a regional focus area for high-performance computing using RISC-V. However, while there may be regional focus areas, this does not mean that the RISC-V standard is, or will become, fragmented. In fact, one of the key tenets of the creation and foundation of RISC-V was preventing fragmentation of the ISA, and it continues to be a key function of RISC-V International.”

China may be a different story. “A lot of companies in China are creating RISC-V cores for internal consumption — for political reasons mostly,” says John Min, vice president of customer service at Arteris. “I think China will go 100% RISC-V for embedded, but it’s a one-way street. They will keep leveraging what the Western companies do and enhance it. China will continue sucking all advancements, such as vectorization, or the special domain-specific acceleration enhancements. They will create their own and make it their own internally, but they will give nothing back.”

Such splits have occurred in the past. “Design languages are the most recent example of that,” says Hand. “There was a regional split, and you had Europe focus on VHDL while America went with Verilog. With RISC-V, there will be that regional split where people will go off and do their things regionally. Europe has focused projects, India has theirs, but they’re still doing it within this framework. It’s this realization that everyone benefits. They’re not doing it to benefit the other people. They’re doing it ultimately to save themselves effort, to save themselves cost, but they realize that by doing it in that framework it is a net benefit to everyone.”

Bi-directionality
An important element is that everyone benefits, and that has to stretch across the academic/commercial boundary. “RISC-V has propelled a new degree of collaboration between academia and commercial organizations,” says Dave Kelf, CEO at Breker. “It’s noticeable that institutions such as Harvey Mudd College in Claremont, California, and ETH in Zurich, Switzerland, to name two, have produced advanced processor designs as a teaching aid, and have collaborated with multiple companies on their verification and design. This has been further advanced by OpenHW Group, which has made these designs accessible to the industry. This bi-directional collaboration benefits the tool providers to further enhance their offerings working on advanced, open devices, while also enabling academia to improve their designs to a commercial quality level. The virtuous circle created is essential if we are to see RISC-V established as a mainstream, industry-wide capability.”

Academia has a lot to offer in hardware advancement. “Researchers in universities are developing innovative new software and hardware to push the limits of RISC-V innovation,” says Dave Miller, head of corporate communications at SiFive. “Many of the RISC-V projects in academia are focused on optimizing performance and energy efficiency for AI workloads, and are open source so the entire ecosystem can benefit. Researchers are also actively contributing to RISC-V working groups to share their knowledge and collaborate with industry participants. These working groups are split evenly between representatives from APAC, Europe, and North America, all working together towards common goals.”

In many cases, industry is willing to fund such projects. “It makes it easy to have research topics that don’t need to boil the ocean,” says Hand. “If you’re a PhD student and you have a great idea, you can go do it. It’s easy for an industry partner to say, ‘I’ll sponsor that. That’s an interesting thing, and I am not required to allocate ridiculous amounts of money into an open-ended project. It’s like I can see the connection of how that research will go into a commercial product later.’”

This feeds back into academia. “The academics have been jumping on board with OpenHW,” says Wohlrab. “By taking their cores and productizing them, they get a chip back that could be shipped in high volume. Then they can do their research on a real commercial product and can see if their idea would fly in real life. They get real numbers and can see real figures for the benefits of a new branch predictor.”

It can also have a long-term benefit for tools. “There are areas where they want to collaborate with us, especially around security,” says Kiran Vittal, executive director for alliances marketing management at Synopsys. “They are building RISC-V based sub-systems using open-source RISC-V processors, and then academia wants to look at not only the AI part, but the security part. There are post-doc students or PhD students looking into using our tools to verify or to implement whatever they’re doing on security.”

That provides an incentive for EDA to offer better tools for use in universities. “Although there has always been collaboration between universities and the industry, where industry provides the universities with access to EDA tools, IP cores, etc., there’s often a bit of a lag,” says Siemens’ Eide. “In many situations (especially outside of the core area of a particular project), universities have access to older versions of the commercial solutions. If you for instance look at a new grad’s resume, where you in the past would see references to old tech, now you see a lot of references to relatively sophisticated use of RISC-V.”

Moving Forward
This collaboration needs to keep pushing forward. “We had an initiative to create a standardized interface for accelerators,” says Wohlrab. “RISC-V International standardized how to add custom instructions in the ISA, but there was no standard for the hardware interface. So we built this. It was a cool discussion. There were people from Silicon Labs, people from NXP, people from Thales, plus several startups. They all came together and asked, ‘How can we make it future proof and put the accelerators inside?’”

The application space for RISC-V is changing. “The big inflection point is Linux and Android,” says Arteris’ Min. “Android already has some support, but when both Android and Linux are really supported, it will change the mobile apps processor game. The number of designs will proliferate. The number of high-end designs will explode. It will take the whole industry to enable that because RISC-V companies are not big enough to create this by themselves. All the RISC-V companies are partners, because we enable this high-end design at the processor levels.”

That would deepen the software community’s engagement. “An embedded software developer needs to understand the underlying hardware if they want to run Linux on a RISC-V processor that uses custom instructions/accelerators,” says Bluespec’s Hobbs. “To develop complex embedded hardware/software systems, both embedded software developers and embedded hardware developers must possess contextual understanding of the interoperability of hardware and software. The developer must understand how the customized processor is leveraging the custom instructions in hardware for Linux to efficiently manage and execute the accelerated workloads.”

This collaboration could reinvigorate research into EDA, as well. “With AI you can build predictive models,” says Hand. “Could that be used to identify the change effects from making an extension? What does that mean? There’s a cloud of influence — not directly gate-wise, because that immediately explodes — but perhaps based on test suites. ‘I know that something that touches that logic touches this downstream, which touches the rest of the design.’ That’s where AI plays a big role, and it is one of the interesting areas because in verification there are so many unknowns. When AI comes along, any guidance or any visibility that you can give is incredibly powerful. Even if it is not right 100% of the time, that’s okay, as long as it generates false negatives and not false positives.”

There is a great opportunity for EDA companies. “We collaborate with many of the open-source providers, with OpenHW group, with ETH in Zurich,” says Synopsys’ Vittal. “We want to promote our solutions when it comes to any processor design and you need standard tools like synthesis, place and route, simulation. But then there are also other kinds of unique solutions because RISC-V is so customizable, you can build your own custom instructions. You need something specific to verify these custom instructions and that’s why the Imperas golden models are important. We also collaborated with Bluespec to develop a verification methodology to take you through functional verification and debug.”

There are still some wrinkles to be worked out for customizations. “RISC-V gives us predictability,” says Hand. “We can create a compliance test suite, give you a processor optimization package if you’re on the implementation side. We can create analytics and testing solutions because we know what it’s going to look like. But for non-standard processors, it is effectively as a service, because everyone’s processor is a little bit different. The reason you see a lot of focus on the verification, from platform architecture exploration all the way through, is because if you change one little thing, such as an addressing mode, it impacts pretty much 100% of your processor verification. You’ve got to retest the whole processor. Most people aren’t set up like an Arm or an Intel with huge processor verification teams and the infrastructure, and so they need automation to do it for them.”

Conclusion
RISC-V has enabled the industry to create a framework for collaboration, which enables everyone to work together for selfish reasons. It is a symbiotic relationship that continues to build, and it is creating a wider sphere of influence over time.

“It’s unique in the modern era of semiconductor,” says Hand. “You have such a wide degree of collaboration, where you have processor manufacturers, the software industry leaders, EDA companies, all working on a common infrastructure.”

Related Reading
RISC-V Micro-Architectural Verification
Verifying a processor is much more than making sure the instructions work, but the industry is building from a limited knowledge base and few dedicated tools.
RISC-V Wants All Your Cores
It is not enough to want to dominate the world of CPUs. RISC-V has every core in its sights, and it’s starting to take steps to get there.

The post RISC-V Heralds New Era Of Cooperation appeared first on Semiconductor Engineering.

Semiconductor Engineering
High-Level Synthesis Propels Next-Gen AI Accelerators
By: Russell Klein
May 20, 2024 at 09:01

Everything around you is getting smarter. Artificial intelligence is not just a data center application but will be deployed in all kinds of embedded systems that we interact with daily. We expect to talk to and gesture at them. We expect them to recognize and understand us. And we expect them to operate with just a little bit of common sense. This intelligence is making these systems not just more functional and easier to use, but safer and more secure as well.

All this intelligence comes from advances in deep neural networks. One of the key challenges of neural networks is their computational complexity. Small neural networks can take millions of multiply-accumulate (MAC) operations to produce a result. Larger ones can take billions. Large language models, and similarly complex networks, can take trillions. At the upper end, this level of computation is beyond what embedded processors can deliver.
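To give a feel for where these operation counts come from, the sketch below counts the MACs in a single convolution layer. The layer dimensions are hypothetical and chosen only for illustration; they are not taken from any network discussed here.

```python
def conv2d_macs(out_h, out_w, out_channels, in_channels, k_h, k_w):
    """MAC count for one standard 2-D convolution layer (bias ignored)."""
    # Every output value needs one MAC per input channel per kernel tap.
    return out_h * out_w * out_channels * in_channels * k_h * k_w


# Hypothetical first layer: 112x112 output, 64 filters, 3 input channels, 3x3 kernel.
layer_macs = conv2d_macs(112, 112, 64, 3, 3, 3)
print(f"{layer_macs:,} MACs for one layer, per inference")
```

A network stacks many such layers, which is how even a modest model reaches millions of MACs per inference.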

In some cases, the computation of these inferences can be off-loaded over a network to a data center. Increasingly, devices have fast and reliable network connections, making this a viable option for many systems. However, there are also a lot of systems that have hard real-time requirements that cannot be met by even the fastest and most reliable networks. For example, any system that has autonomous mobility, such as self-driving cars or self-piloted drones, needs to make decisions faster than could be done through an off-site data center. There are also systems where sensitive data is being processed that should not be sent over networks. And anything that goes over a network introduces an additional attack surface for hackers. For all of these reasons (performance, privacy, and security), some inferencing will need to be done on embedded systems.

For very simple networks, embedded CPUs can handle the task. Even a Raspberry Pi can deploy a simple object recognition algorithm. For more complex tasks there are embedded GPUs, as well as neural processing units (NPUs) targeted at embedded systems that can deliver greater computational capability. But for the highest levels of performance and efficiency, building a bespoke AI accelerator can enable applications that would otherwise be impractical.

Engineering a new piece of hardware is a daunting undertaking, whether for ASIC or FPGA, but it enables developers to reach a level of performance and efficiency not possible with off-the-shelf components. How can the average development team build a better machine learning accelerator than the designers creating the most leading-edge commercial AI accelerators, with multiple generations under their belt? By highly customizing the implementation to the specific inference being performed, the implementation can be an order of magnitude better than more generalized solutions.

When a general-purpose AI accelerator developer creates an NPU, their goal is to support any neural network that anyone might conceive. They want to get thousands of design-ins, so they have to make the design as general as possible. Not only that, but they also aim to have some level of “future proofing” built into their designs. They want to be able to support any network that might be imagined for several years into the future. That is not an easy task in a technology that is evolving so rapidly.

A bespoke accelerator needs to support only the one, or perhaps several, networks to be used. This freedom allows many programmable elements in the implementation of the accelerator to be fixed in hardware. This creates hardware that is both smaller and faster than something general purpose. For example, a dedicated convolution accelerator, with a fixed image and filter size, can be up to 10 times faster than a well-designed general-purpose TPU.

General-purpose accelerators usually use floating-point numbers. This is because virtually all neural networks are developed in Python on general-purpose computers using floating-point numbers. To ensure correct support of those neural networks, the accelerator must, of course, support floating-point numbers. However, most neural networks use numbers close to 0 and require a lot of precision there, so much of the dynamic range that floating point provides goes unused. And floating-point multipliers are huge. If they are not needed, omitting them from the design saves a lot of area and power.

Some NPUs support integer representation, sometimes in a variety of sizes. But supporting multiple numeric representation formats adds circuitry, which consumes power and adds propagation delays. Choosing one representation and using that exclusively enables a smaller, faster implementation.

When building a bespoke accelerator, one is not limited to 8 bits or 16 bits; any size can be used. Picking the correct numeric representation, or “quantizing” a neural network, allows the data and the operators to be optimally sized. Quantization can significantly reduce the data needed to be stored, moved, and operated on. Reducing the memory footprint for the weight database and shrinking the multipliers can really improve the area and power of a design. For example, a 10-bit fixed-point multiplier is about 20 times smaller than a 32-bit floating-point multiplier and, correspondingly, will use about 1/20th the power. This means the design can either be much smaller and more energy efficient by using the smaller multiplier, or the designer can opt to use the same area and deploy 20 multipliers that operate in parallel, producing much higher performance using the same resources.
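The sizing trade-off can be worked through with a few lines of arithmetic. This is only a sketch: the weight count and the multiplier budget are hypothetical, and the 20x area ratio is the approximate figure quoted above, not a measured value.

```python
def weight_storage_bytes(num_weights, bits_per_weight):
    """Storage needed for the weights, rounded up to whole bytes."""
    return (num_weights * bits_per_weight + 7) // 8


NUM_WEIGHTS = 1_000_000   # hypothetical network size
AREA_RATIO = 20           # approx. 32-bit float vs. 10-bit fixed multiplier area

float32_bytes = weight_storage_bytes(NUM_WEIGHTS, 32)
fixed10_bytes = weight_storage_bytes(NUM_WEIGHTS, 10)
saving = 1 - fixed10_bytes / float32_bytes
print(f"weights: {float32_bytes:,} B as float32, {fixed10_bytes:,} B as 10-bit fixed")
print(f"memory saved: {saving:.1%}")

# Option 1: same multiplier count, roughly 1/20 of the multiplier area and power.
# Option 2: same area, roughly 20x as many multipliers working in parallel.
float_multipliers = 4     # hypothetical area budget, in float multipliers
fixed_multipliers_same_area = float_multipliers * AREA_RATIO
print(f"{float_multipliers} float multipliers or about "
      f"{fixed_multipliers_same_area} fixed-point multipliers in the same area")
```

Note that 10-bit weights do not align to byte boundaries, which matters little in a bespoke memory but would waste space in a byte-addressed one.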

One of the key challenges in building a bespoke machine learning accelerator is that the data scientists who created the neural network usually do not understand hardware design, and the hardware designers do not understand data science. In a traditional design flow, they would use “meetings” and “specifications” to transfer knowledge and share ideas. But, honestly, no one likes meetings or specifications. And they are not particularly good at effecting an information exchange.

High-Level Synthesis (HLS) allows an implementation produced by the data scientists to be used, not just as an executable reference, but as a machine-readable input to the hardware design process. This eliminates the manual reinterpretation of the algorithm in the design flow, which is slow and extremely error-prone. HLS synthesizes an RTL implementation from an algorithmic description. Usually, the algorithm is described in C++ or SystemC, but a number of design flows like HLS4ML are enabling HLS tools to take neural network descriptions directly from machine learning frameworks.

HLS enables a practical exploration of quantization in a way that is not yet practical in machine learning frameworks. Fully understanding the impact of quantization requires a bit-accurate implementation of the algorithm, including the characterization of the effects of overflow, saturation, and rounding. Today this is only practical in hardware description languages (HDLs) or HLS bit-accurate data types (https://hlslibs.org).
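The effects named above can be illustrated with a small emulation of a signed fixed-point type. This is a minimal sketch in Python for readability, not the API of any HLS library; the function name `quantize` and its parameters are hypothetical, and round-half-up is just one of several rounding modes a real bit-accurate type would offer.

```python
import math


def quantize(x, total_bits, frac_bits, saturate=True):
    """Return the real value a signed fixed-point word would hold for x.

    total_bits: word width including the sign bit.
    frac_bits:  number of bits to the right of the binary point.
    """
    scale = 1 << frac_bits
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    code = math.floor(x * scale + 0.5)          # rounding: round half up
    if saturate:
        code = max(lowest, min(highest, code))  # saturation: clamp to range
    else:
        # overflow: wrap around as two's-complement hardware would
        code = (code - lowest) % (1 << total_bits) + lowest
    return code / scale


# 10-bit word with 8 fractional bits: range is -2.0 to just under 2.0.
print(quantize(0.3, 10, 8))                   # rounding error near zero
print(quantize(5.0, 10, 8))                   # clamped at the top of the range
print(quantize(5.0, 10, 8, saturate=False))   # wrapped to an unrelated value
```

Running a network's weights and activations through a model like this, layer by layer, is what shows whether a given word width preserves accuracy.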

As machine learning becomes ubiquitous, more embedded systems will need to deploy inferencing accelerators. HLS is a practical and proven way to create bespoke accelerators, optimized for a very specific application, that deliver higher performance and efficiency than general purpose NPUs.

For more information on this topic, read the paper: High-Level Synthesis Enables the Next Generation of Edge AI Accelerators.

The post High-Level Synthesis Propels Next-Gen AI Accelerators appeared first on Semiconductor Engineering.

Semiconductor Engineering
Chip Aging Becoming Key Factor In Data Center Economics
By: Ann Mutschler
May 20, 2024 at 09:01

Chip aging is becoming a much bigger concern inside of data centers, where it can impact server uptime, utilization rates, and the amount of energy needed to drive signals and cool entire server racks.

Aging in chips is the result of both higher logic utilization and increasing transistor density. This is problematic for data centers, in general, but especially for AI chips where digital logic is expected to run at maximum speed. That generates more heat, which becomes harder to dissipate as the number of specialized and general-purpose processing elements per square millimeter of silicon continues to rise. Heat typically gets trapped between the fins of finFETs and gate-all-around FETs, accelerating electromigration and reducing the time it takes for dielectrics to break down. It also can cause warpage, which can rupture the bonds and contacts between different components in an advanced package or on a PCB.

For data centers, that creates a number of challenges:

Thermal management: This requires a deep understanding of workloads and the resulting transient thermal gradients as processing is load-balanced on-chip, between chips or chiplets, and between servers;
More data: Data from sensors everywhere, along with larger training sets, all need to be processed faster than in the past to keep up with the flood of data, but all of that needs to happen in the same or smaller footprint without overheating any part of a device, and
In-circuit monitoring: Sensors can be added into chips to detect variations in heat and data speeds in different paths, but it’s much more difficult to keep track of tens of thousands of these monitors as they collect data from heterogeneous processing elements, each of which can age at different rates depending on process variation, defectivity, varying workloads, and ambient thermal conditions.

“Servers are much more capable today than they were 10 years ago, and the issue is that power hasn’t scaled like it used to,” said Steven Woo, Rambus fellow and distinguished inventor. “Now, if you want to do lots more work in your server, you have to burn more power to do it. Twenty years ago, a server might dissipate a couple hundred watts. But with the latest servers that NVIDIA just announced around Grace Blackwell, the whole rack is 120 kilowatts, and the individual servers are many kilowatts. Just delivering power into those racks is causing changes in the infrastructure in the industry. Now that you have to bring in and dissipate more power in a small space, you get all kinds of interesting things that could happen over time. The heat that’s being dissipated can have effects on the chip, and you have to worry sometimes about thermal cycling where, as the chip is doing a lot of work, maybe part of the chip stops and then it does more work. You get these rapid cycles of dissipating a lot of power, then not, then dissipating a lot of power, then not. That cycling causes local heating and cooling, leading to thermal stresses, and this impacts all chips, including memory.”

As a result, everyone from the data center manager to the chip architect now has to understand how a chip behaves in the field, and how increasingly customized chip and system architectures will function over time. Downtime is costly for a data center, but under-utilization and reduced performance also carry a high price tag. That, in turn, affects how much margin is considered essential, such as extra data paths if some of them are fully or partially closed off by electromigration, and how that margin will impact performance, power, and area/cost over a chip’s projected lifetime, especially in a heterogeneous design with specialized compute elements.

“When it comes to the hyper-scalers and high powered, highly customized, heterogeneous chips for various different workloads, these chips are on 24/7, so consistent uptime is critical,” said Dan Lee, product management director at Cadence. “Since all of these chips are done at the really advanced nodes, with the smaller device sizes, more developers are looking to do aging analysis, and derive the wear and tear so they can see if the chip is going to last a year or five years. At the same time, an important consideration is also thermal — especially when we’re talking about these heterogeneous integrations, and you don’t really get the thermal conductivity that you would in a straightforward, monolithic design. There’s a bit more thought or planning that needs to be a part of this because aging and heating are related. All things being equal, if you’re operating in a very hot environment, you’re going to expect a lower lifespan.”

Still, determining how much shorter that lifespan will be isn’t always a precise calculation. “Data center SoCs that execute mission-critical workloads need to provide scalable visibility, predict problems before they occur, provide deep-dive analyses into problems, and be optimized to increase longevity of investment,” said Padmakumar Karthik, senior technology manager at Arm. “Data center diagnostic patterns are often deployed to measure the health of an SoC post-manufacturing to prevent silent data corruption (SDC) issues. But on-chip sensors provide an additional layer of insights, detecting droops or aging or thermal events on-chip, all of which can cause SDC incidents. For this reason, scalable, customizable sensor frameworks that can monitor and adapt throughout the useful life of the device, enabling continuous design optimization and preventive maintenance, will be increasingly important.”

There are multiple ways to achieve this, but each data center can be very different. In some cases, chips are designed by systems companies for internal use. And in most cases, there is a mix of different hardware and software, not all of which is state-of-the-art. “Many data centers have legacy infrastructure that may not be inherently designed for optimal power efficiency,” noted Noam Brousard, vice president of systems at proteanTecs, in a recent blog. “Upgrading or retrofitting such infrastructure poses challenges in achieving comprehensive power optimization.”

Even within a single rack, stresses can vary greatly from one server to the next, and from one chip to the next even in the same server. “You can imagine when you have a very big chip, toward the edges of the chip it will expand more than in a small chip, and that can add stress,” said Rambus’ Woo. “You have to really be careful about how you cool things, and memory is no different. You have very specific things you worry about with memory, like the ability to retain data, depending on how hot the chip is.”

In addition, as chips age, parameters drift. Marc Swinnen, director of product marketing in Ansys’ semiconductor division, said the traditional approach has been to use a library that’s characterized as a brand new chip. “The library is characterized at 1 year, 5 years, 10 years, 15 years, and you can run all your analysis multiple times with these different aged libraries. That sounds good on paper, and that’s what a lot of people do, but the problem is that not all parts of the chip age at the same rate. This is why aging is often associated with activity and temperature. Some parts of the chip are more active and hotter than other parts of the chip, so the aging time runs differently for different parts. This means you want to apply some of the old library to some parts of the chip, and the younger library to other parts of the chip, because if signals run between them you have setup and hold issues. If everything slows down at the same time — or one slows down and the other one doesn’t — you’re going to get mismatches, and that’s the difficulty. At the bottom level, it’s easy. Every gate is assigned its right age. That’s simple. You do an analysis with every gate. But how do you assign the age to every gate? Where do you get that information from? You need a lot of realistic activity, and then predict that over the lifespan and with temperature. That’s the problem. How do you actually construct this aging map? Once you have it, the analysis is not that hard.”

Aging maps are application- and workload-specific. Every chip will age differently depending on the functions it performs.
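One way to picture an aging map is as a table that assigns each block an effective age, then picks the aged library that covers it. The sketch below is a deliberate simplification: it assumes aging scales linearly with activity and follows a single Arrhenius temperature term, whereas real mechanisms differ and are modeled by foundry data. The activation energy, reference temperature, block names, and workloads are all hypothetical; only the 1/5/10/15-year library points come from the quote above.

```python
import math

BOLTZMANN_EV = 8.617e-5        # Boltzmann constant in eV/K
ACTIVATION_EV = 0.7            # assumed activation energy (mechanism-specific)
REF_TEMP_C = 105.0             # assumed temperature of library characterization
LIBRARY_AGES = [1, 5, 10, 15]  # characterized library ages, in years


def acceleration(temp_c):
    """Arrhenius acceleration factor relative to REF_TEMP_C."""
    t_ref = REF_TEMP_C + 273.15
    t = temp_c + 273.15
    return math.exp(ACTIVATION_EV / BOLTZMANN_EV * (1 / t_ref - 1 / t))


def effective_age(years, activity, temp_c):
    """Calendar years scaled by duty cycle and temperature."""
    return years * activity * acceleration(temp_c)


def pick_library(age):
    """Smallest characterized age that covers the effective age (clamped)."""
    for library_age in LIBRARY_AGES:
        if age <= library_age:
            return library_age
    return LIBRARY_AGES[-1]


# Hypothetical blocks: (fraction of time active, average temperature in C).
blocks = {"mac_array": (0.9, 105.0), "cache": (0.6, 85.0), "control": (0.3, 55.0)}
aging_map = {
    name: pick_library(effective_age(10, activity, temp_c))
    for name, (activity, temp_c) in blocks.items()
}
print(aging_map)
```

After the same 10 calendar years, the three blocks land on three different libraries, which is the mismatch that creates setup and hold problems on paths between them.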

But aging is just one of many factors that affect data center uptime. “When we look at data center, we look at the whole application first, then whittle it down to what that means for chips and packages,” said Kelly Morgan, senior principal application engineer at Ansys. “From the mechanical reliability lens of the data center operation, we go through thermal cycling, obviously. We’re in a controlled environment. But what does that influence? How does that influence the integrity of the chips as you go through thermal cycles? Typically, we’ll look at things like solder fatigue and other effects.”

Another factor to consider is shipping and handling, which can affect the aging of a chip, package, and board.

“Even before the device is put in place, there are opportunities for vibration,” Morgan said. “You might hit something, which is a bit of a shock. We have customers who are looking at things like drop, shock, and vibration, and they have goals they need to test to. Typically, the standard process is to do a lot of physical testing. Now as you can imagine, that can be pretty challenging. You have to be pretty far along in the design process before you really start to go and test, and if there’s an issue, then you’ve got to go back and retest. Early simulation helps here, especially for those larger-scale events, and that comes down to the chassis, the board, to all the components, including the ICs.”

Fig. 1: Components of complete electronic system analysis. Source: Ansys

Quality control remains a big challenge when it comes to mechanical stresses that can affect aging. Adam Cron, distinguished architect at Synopsys, pointed to a recent Intel white paper, which noted that at the current acceptable defectivity rates, one core fails every two days. To account for this, Cron noted that certain commercial tools support in-system delay testing in a BiST mode. By adding specific IP, any ATPG patterns could be added to that. (Intel’s paper said its solution only applies to stuck-at testing.)

“In very large, millions-of-cores data center-type environments, the implication is that you’d better be ready,” Cron said. “One of the things they were talking about in this paper was in-system scan. Intel was bringing a database of test patterns in, and then applying it in-system after isolating a core. And then, upon a failure, they’d quarantine and move on. But the data centers are apparently running out of that opportunistic time slot to do any of this. We’ve heard some interesting conversations about the fact that people do run a lot of things during certain times. However, other times are cheaper, so all the holes are just getting filled in terms of runtime. Monitors are certainly something to look at, but monitors are looking at systemic degradation. That’s known, if you will. And so as things degrade, V_min will change, maybe frequency will change. And they’ll be on a pace. They can figure out when to do that. That’s easy enough to figure out. However, if there’s a marginality or some broken component in there, it is not up to the tool to find that. And frankly, the in-system scan wasn’t addressing all components on the die. It was only up to like 80% of stuck-at coverage, which isn’t that much, especially when you’re not looking at all of the pieces inside the die. The point is, there are still opportunities to do better.”

Cron noted that one big systems company suggested a dual-core lockstep mechanism, starting out the data center in dual-core lock-step mode for X number of months. “When it looks like you’ve squeezed the major part of the curve out, in terms of finding these defective components, then unlock them, double your capacity, run like that for a while, and periodically hook some back up again. That means everything is utilized, at least. Of course, some are working at half capacity here and there, but it’s not the whole die. And there are some implications there from a design standpoint, at least for the hardware, but also possibly the operating system, depending on who decides what physical core is used versus what virtual core is used.”
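The screening scheme described here can be sketched as a toy simulation. Everything in it is hypothetical: cores are modeled as plain functions, the defect is injected by hand, and the 16-value loop stands in for months of real workload.

```python
def run_lockstep(core_a, core_b, value):
    """Run the same work on both cores and report whether the results agree."""
    result_a = core_a(value)
    result_b = core_b(value)
    return result_a, result_a == result_b


def good_core(x):
    return x * x


def marginal_core(x):
    # Hypothetical defect: one input pattern produces a flipped low bit.
    return (x * x) ^ 1 if x == 7 else x * x


pairs = {
    "pair0": (good_core, good_core),
    "pair1": (good_core, marginal_core),
}

quarantined = set()
for name, (core_a, core_b) in pairs.items():
    for value in range(16):            # stand-in for the screening workload
        _, agree = run_lockstep(core_a, core_b, value)
        if not agree:
            quarantined.add(name)      # a mismatch cannot say which core is wrong
            break

# Pairs that survived screening can be unlocked to double their capacity.
unlocked = [name for name in pairs if name not in quarantined]
print("quarantined:", sorted(quarantined), "unlocked:", unlocked)
```

Because a comparison only reveals disagreement, the whole pair is set aside on a mismatch; identifying the faulty core would need a further test.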

Approaches to measuring aging
Any discussion around aging circuits really boils down to extending the life of the machines in the data center, and not getting caught by surprise when failures occur.

“How do you do that? You have to measure the aging of those machines,” said Neil Hand, director of marketing, IC segment at Siemens EDA. “Right now, if you speak to the CIOs of these big companies with big data centers, they say, ‘We’ve got to get rid of the machines after three years because we can’t risk it going down.’ If you look at embedded analytics capabilities, you can start to embed aging monitors in those devices, you can start to monitor those in real time. It doesn’t look that different than what it does from an automotive perspective. It’s all the same technologies, effectively, but you’re monitoring them. And then you can say, ‘We’re now at 90% of our life for this server.’ We can then just replace that server.”
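A replacement policy driven by such monitors might look like the sketch below. It assumes, hypothetically, that each server exposes a monitor frequency (for example from an on-chip oscillator), that a 10% slowdown marks end of life, and that life is consumed linearly with the slowdown. None of these assumptions come from the article, and real degradation is generally not linear.

```python
def life_consumed(fresh_mhz, current_mhz, end_of_life_drop=0.10):
    """Fraction of the allowed slowdown already used, from one monitor."""
    drop = (fresh_mhz - current_mhz) / fresh_mhz
    return max(0.0, drop / end_of_life_drop)


def servers_to_replace(readings, threshold=0.90):
    """Names of servers at or beyond the given fraction of their life."""
    return sorted(
        name
        for name, (fresh_mhz, current_mhz) in readings.items()
        if life_consumed(fresh_mhz, current_mhz) >= threshold
    )


# Hypothetical fleet: (monitor frequency when new, monitor frequency today).
readings = {
    "srv-01": (1000.0, 985.0),
    "srv-02": (1000.0, 908.0),
    "srv-03": (1000.0, 950.0),
}
print(servers_to_replace(readings))
```

The point of the design is that replacement is triggered by measured wear on each machine rather than by a fixed three-year schedule.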

This feeds into corporate goals around sustainability, as well. “It comes down to building the best thing to begin with, then building it with design for manufacturing in mind so that you don’t get waste during manufacturing, achieve better yields, and finally extend the life of products and build them in environmentally-sustainable ways,” Hand said. “If you can extend the data center lifecycle from three years to five years, that’s big. And especially if you start going to these high-performance, application-specific type of clusters, you may not need to change them as often, because if the underlying capabilities aren’t changing, that might drive the cycling of it. In the case of a biological computer, if there’s no new change to the underlying protein folding mechanisms, you might say, ‘We don’t need a new compute platform. This is really good.’”

The longer the product life can be extended, the better. Design for aging is a matter of, first, performing the aging analysis with the foundry models. “Run the simulations and observe the effects,” said Cadence’s Lee. “When you’re doing the simulation, you want to have the right mission profiles, so you come up with an accurate prediction of how your device is going to behave after a certain number of years in deployment. You may want to combine that with thermal analysis, for example, because how that aging is going to behave will depend on what temperature this design is going to be working at. You may think it’s 22 degrees Celsius, but maybe through some thermal analysis you realize it’s actually going to be operating at 35 or 40 degrees most of the time. That may change the outcome of your aging analysis.”

In terms of the associated thermal analysis, this can extend beyond a single device. “It’s also how that heat is moving,” Lee said. “Let’s say you have this integrated design, where you have some power devices alongside some logic, or some other functionality that is lower power. What you may want to understand is, if those bandgaps or power circuits are generating a lot of heat, that may be shifting over into other parts of your design. So when you run your aging analysis, you may assume that you’re running at 25 degrees, whereas the power devices are at 40 or 45 degrees. They’re on the same chip, they’re very close to each other, and you have to understand how much of that heat is moving over to your logic and what that’s going to bring the temperature up to. You want to know that so you can perform the aging analysis based on that higher temperature.”

Another consideration is combining aging analysis and interconnect parasitics, which is especially relevant for advanced nodes due to the parasitics in the interconnect. “They’re dominant when it comes to performance and functionality,” Lee added. “So when thinking about aging, you also have to think about it being an aged device that has to push the electrons through this interconnect. That’s a pretty heavy load. When you’re doing the aging analysis, you probably will have to be doing it with extracted parasitics. You just can’t do it on a pure schematic design. It doesn’t give you enough detail about what’s really happening physically. This may be included in the aging analysis tool. When most people talk about aging, they may not think about the parasitic aspect to it.”

Combating aging, thermal in memory
While standards don’t work in custom silicon, they do work for some standard components in those devices, such as memory. Over the past 10 to 15 years, memory standards have started to address the impact of heat.

“If you start to exceed certain temperature limits, you’ve got to refresh the device more frequently because the charge can leak off the cells more quickly,” said Rambus’ Woo. “So there are temperature-dependent refresh rates. There are other things that can be exacerbated, like the capacitors are getting smaller, they’re holding fewer electrons because there are so many more of them on a chip now, so we’ve seen memories adopt on-die error correction. This on-die error correction is something that is hidden from the outside world. In many cases, you don’t even know an error has occurred and been corrected on the chip. Those kinds of technologies become even more important now because the temperatures can be higher.”
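The cost of a temperature-dependent refresh rate can be estimated with simple arithmetic. The numbers below (a 7.8 µs refresh interval, halved above 85 °C, and a 350 ns refresh cycle time) are illustrative assumptions in the range commonly quoted for DDR-class DRAM, not values from the article; a real design would take them from the device datasheet.

```python
def refresh_interval_us(temp_c, base_us=7.8, hot_threshold_c=85.0):
    """Average interval between refresh commands; halved when running hot."""
    return base_us / 2 if temp_c > hot_threshold_c else base_us


def refresh_overhead(temp_c, refresh_cycle_ns=350.0):
    """Fraction of time the device is busy refreshing instead of serving data."""
    return refresh_cycle_ns / (refresh_interval_us(temp_c) * 1000.0)


for temp_c in (70.0, 90.0):
    print(f"{temp_c:.0f} C: refresh every {refresh_interval_us(temp_c)} us, "
          f"overhead {refresh_overhead(temp_c):.1%}")
```

Under these assumptions, crossing the temperature threshold doubles the share of time lost to refresh, which is one way heat turns directly into lost memory bandwidth.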

There also is growing demand for more telemetry to provide monitoring information. “You just want to know if anything is overheating,” said Woo. “Does something seem like it’s malfunctioning? The data center manager will get regular updates about the status of the major components of the system. A lot of boards now in servers have baseboard management controllers (BMCs), which are little chips that sit on each board and are responsible for, among other things, reporting back the health of that board when a server might have five or six boards. We’re frequently seeing more of these BMC chips.”

Design for aging
While the goal is to be able to guarantee a certain lifetime for the chips in a data center, the challenges for achieving that are expanding. “There’s a growing list of things that can be harmful to devices over their lifetime,” Woo said. “It’s a balance between not adding too much cost, even though you have to increase the reliability and maybe add new features, and all of these things are in play with each other.”

Whether it is liquid cooling or higher levels of RAS ECC in the system, there is no single best answer for every application. In general, the industry is moving toward higher reliability and increasing resilience, but there are many ways to get there and challenges with each of them.

“Just as 15 years ago we didn’t necessarily always think we had to talk about power, now we have to talk about it all the time,” Woo said. “The same thing is going to be true for resilience and reliability. It’s going to be required to become part of the way people think about architectures, and part of that is how the memory system improves its reliability. You can’t really do anything unless you can compute on some data, and you have to make sure that data is reliable. It will touch how memory is stored in a DRAM. It will touch how memory is communicated across links. And it even will touch how processors manipulate data once they get a hold of it in their caches, and in the compute pipelines. Also, one of the key things people will worry about is how much of that susceptibility is brought about by age-related issues, like heating cycles, etc.”

Finally, there are even issues around the quality of the power that comes into a system. “The servers get noise on the power rails, and it’s a balance between how much money you’re willing to pay for the power delivery versus the quality of power,” said Woo. “You have to be tolerant of those kinds of things, too. Power management becomes more challenging, as well as the amount of power that these systems are using today. NVIDIA systems bring 48-volt power into the racks, and there is talk about even higher voltage levels. Those changes in infrastructure can all impact heat, and can age components differently.”

The post Chip Aging Becoming Key Factor In Data Center Economics appeared first on Semiconductor Engineering.

Semiconductor Engineering
Will Domain-Specific ICs Become Ubiquitous?
By: Brian Bailey
May 16, 2024 at 09:05

Questions are surfacing for all types of design, ranging from small microcontrollers to leading-edge chips, over whether domain-specific design will become ubiquitous, or whether it will fall into the historic pattern of customization first, followed by lower-cost, general-purpose components.

Custom hardware always has been a double-edged sword. It can provide a competitive edge for chipmakers, but often requires more time to design, verify, and manufacture a chip, which can sometimes cost a market window. In addition, it’s often too expensive for all but the most price-resilient applications. This is a well-understood equation at the leading edge of design, particularly where new technologies such as generative AI are involved.

But with planar scaling coming to an end, and with more features tailored to specific domains, the chip industry is struggling to figure out whether the business/technical equation is undergoing a fundamental and more permanent change. This is muddied further by the fact that some 30% to 35% of all design tools today are being sold to large systems companies for chips that will never be sold commercially. In those applications, the collective savings from improved performance per watt may dwarf the cost of designing, verifying, and manufacturing a highly optimized multi-chip/multi-chiplet package across a large data center, leaving the debate about custom vs. general-purpose more uncertain than ever.

“If you go high enough in the engineering organization, you’re going to find that what people really want to do is a software-defined whatever it is,” says Russell Klein, program director for high-level synthesis at Siemens EDA. “What they really want to do is buy off-the-shelf hardware, put some software on it, make that their value-add, and ship that. That paradigm is breaking down in a number of domains. It is breaking down where we need either extremely high performance, or we need extreme efficiency. If we need higher performance than we can get from that off-the-shelf system, or we need greater efficiency, we need the battery to last longer, or we just can’t burn as much power, then we’ve got to start customizing the hardware.”

Even the selection of processing units can make a solution custom. “Domain-specific computing is already ubiquitous,” says Dave Fick, CEO and cofounder of Mythic. “Modern computers, whether in a laptop, phone, security camera, or in farm equipment, consist of a mix of hardware blocks co-optimized with software. For instance, it is common for a computer to have video encode or decode hardware units to allow a system to connect to a camera efficiently. It is common to have accelerators for encryption so that we can safely communicate. Each of these is co-optimized with software algorithms to make commonly used functions highly efficient and flexible.”

Steve Roddy, chief marketing officer at Quadric, agrees. “Heterogeneous processing in SoCs has been de rigueur in the vast majority of consumer applications for the past two decades or more. SoCs for mobile phones, tablets, televisions, and automotive applications have long been required to meet a grueling combination of high-performance plus low-cost requirements, which has led to the proliferation of function-specific processors found in those systems today. Even low-cost SoCs for mobile phones today have CPUs for running Android, complex GPUs to paint the display screen, audio DSPs for offloading audio playback in a low-power mode, video DSPs paired with NPUs in the camera subsystem to improve image capture (stabilization, filters, enhancement), baseband DSPs — often with attached NPUs — for high speed communications channel processing in the Wi-Fi and 5G subsystems, sensor hub fusion DSPs, and even power-management processors that maximize battery life.”

It helps to separate what you call general-purpose and what is application-specific. “There is so much benefit to be had from running your software on dedicated hardware, what we call bespoke silicon, because it gives you an advantage over your competitors,” says Marc Swinnen, director of product marketing in Ansys’ Semiconductor Division. “Your software runs faster, lower power, and is designed to run specifically what you want to run. It’s hard for a competitor with off-the-shelf hardware to compete with you. Silicon has become so central to the business value, the business model, of many companies that it has become important to have that optimized.”

There is a balance, however. “If there is any cost justification in terms of return on investment and deployment costs, power costs, thermal costs, cooling costs, then it always makes sense to build a custom ASIC,” says Sharad Chole, chief scientist and co-founder of Expedera. “We saw that for cryptocurrency, we see that right now for AI. We saw that for edge computing, which requires extremely ultra-low power sensors and ultra-low power processes. But there also has been a push for general-purpose computing hardware, because then you can easily make the applications more abstract and scalable.”

Part of the seeming conflict is due to the scope of specificity. “When you look at the architecture, it’s really the scope that determines the application specificity,” says Frank Schirrmeister, vice president of solutions and business development at Arteris. “Domain-specific computing is ubiquitous now. The important part is the constant moving up of the domain specificity to something more complex — from the original IP, to configurable IP, to subsystems that are configurable.”

In the past, it has been driven more by economics. “There’s an ebb and a flow to it,” says Paul Karazuba, vice president of marketing at Expedera. “There’s an ebb and a flow to putting everything into a processor. There’s an ebb and a flow to having co-processors, augmenting functions that are inside of that main processor. It’s a natural evolution of pretty much everything. It may not necessarily be cheaper to design your own silicon, but it may be more expensive in the long run to not design your own silicon.”

An attempt to formalize that ebb and flow was made by Tsugio Makimoto in the 1990s, when he was Sony’s CTO. He observed that electronics cycled between custom solutions and programmable ones approximately every 10 years. What’s changed is that most custom chips from the time of his observation contained highly programmable standard components.

Technology drivers
Today, it would appear that technical issues will decide this. “The industry has managed to work around power issues and push up the thermal envelope beyond points I personally thought were going to be reasonable, or feasible,” says Elad Alon, co-founder and CEO of Blue Cheetah. “We’re hitting that power limit, and when you hit the power limit it drives you toward customization wherever you can do it. But obviously, there is tension between flexibility, scalability, and applicability to the broadest market possible. This is seen in the fast pace of innovation in the AI software world, where tomorrow there could be an entirely different algorithm, and that throws out almost all the customizations one may have done.”

The slowing of Moore’s Law will have a fundamental influence on the balance point. “There have been a number of bespoke silicon companies in the past that were successful for a short period of time, but then failed,” says Ansys’ Swinnen. “They had made some kind of advance, be it architectural or addressing a new market need, but then the general-purpose chips caught up. That is because there’s so much investment in them, and there’s so many people using them, there’s an entire army of people advancing, versus your company, just your team, that’s advancing your bespoke solution. Inevitably, sooner or later, they bypass you and the general-purpose hardware just gets better than the specific one. Right now, the pendulum has swung toward custom solutions being the winner.”

However, general-purpose processors do not automatically advance if companies don’t keep up with adoption of the latest nodes, and that leads to even more opportunities. “When adding accelerators to a general-purpose processor starts to break down, because you want to go faster or become more efficient, you start to create truly customized implementations,” says Siemens’ Klein. “That’s where high-level synthesis starts to become really interesting, because you’ve got that software-defined implementation as your starting point. We can take it through high-level synthesis (HLS) and build an accelerator that’s going to do that one specific thing. We could leave a bunch of registers to define its behavior, or we can just hard code everything. The less general that system is, the more specific it is, usually the higher performance and the greater efficiency that we’re going to take away from it. And it almost always is going to be able to beat a general-purpose accelerator or certainly a general-purpose processor in terms of both performance and efficiency.”

At the same time, IP has become massively configurable. “There used to be IP as the building blocks,” says Arteris’ Schirrmeister. “Since then, the industry has produced much larger and more complex IP that takes on the role of sub-systems, and that’s where scope comes in. We have seen Arm with what they call the compute sub-systems (CSS), which are an integration and then hardened. People care about the chip as a whole, and then the chip and the system context with all that software. Application specificity has become ubiquitous in the IP space. You either build hard cores, you use a configurable core, or you use high-level synthesis. All of them are, by definition, application-specific, and the configurability plays in there.”

Put in perspective, there is more than one way to build a device, and an increasing number of options for getting it done. “There’s a really large market for specialized computing around some algorithm,” says Klein. “IP for that is going to be both in the form of discrete chips, as well as IP that could be built into something. Ultimately, that has to become silicon. It’s got to be hardened to some degree. They can set some parameters and bake it into somebody’s design. Consider an Arm processor. I can configure how many CPUs I want, I can configure how big I want the caches, and then I can go bake that into a specific implementation. That’s going to be the thing that I build, and it’s going to be more targeted. It will have better efficiency and a better cost profile and a better power profile for the thing that I’m doing. Somebody else can take it and configure it a little bit differently. And to the degree that the IP works, that’s a great solution. But there will always be algorithms that don’t have a big enough market for IP to address. And that’s where you go in and do the extreme customization.”

Chiplets
Some have questioned if the emerging chiplet industry will reverse this trend. “We will continue to see systems composed of many hardware accelerator blocks, and advanced silicon integration technologies (i.e., 3D stacking and chiplets) will make that even easier,” says Mythic’s Fick. “There are many companies working on open standards for chiplets, enabling communication bandwidth and energy efficiency that is an order of magnitude greater than what can be built on a PCB. Perhaps soon, the advanced system-in-package will overtake the PCB as the way systems are designed.”

Chiplets are not likely to be highly configurable. “Configuration in the chiplet world might become just a function of switching off things you don’t need,” says Schirrmeister. “Configuration really means that you do not use certain things. You don’t get your money back for those items. It’s all basically applying math and predicting what your volumes are going to be. If it’s an incremental cost that has one more block on it to support another interface, or making the block the Ethernet block with time triggered stuff in it for automotive, that gives you an incremental effort of X. Now, you have to basically estimate whether it also gives you a multiple of that incremental effort as incremental profit. It works out this way because chips just become very configurable. Chiplets are just going in the direction or finding the balance of more generic usage so that you can apply them in more chiplet designs.”

The chiplet market is far from certain today. “The promise of chiplets is that you use only the function that you want from the supplier that you want, in the right node, at the right location,” says Expedera’s Karazuba. “The idea of specialization and chiplets are at arm’s length. They’re actually together, but chiplets have a long way to go. There’s still not that universal agreement of the different things around a chiplet that have to be in order to make the product truly mass market.”

While chiplets have been proven to work, nearly all of the chiplets in use today are proprietary. “To build a viable [commercial] chiplet company, you have to be going after a broad enough market, large enough from a dollar perspective, then you can make all the investment, have success and get everything back accordingly,” says Blue Cheetah’s Alon. “There’s a similar tension where people would like to build a general-purpose chiplet that can be used anywhere, by anyone. That is the plug-and-play discussion, but you could finish up with something that becomes so general-purpose, with so much overhead, that it’s just not attractive in any particular market. In the chiplet case, for technical reasons, it might not actually really work that way at all. You might try to build it for general purpose, and it turns out later that it doesn’t plug into particular sockets that are of interest.”

The economics of chiplet viability have not yet been defined. “The thing about chiplets is they can be small,” says Klein. “Being small means that we don’t need as big a market for them as we would for a very large chip. We can also build them on different technologies. We can have some that are on older technologies, where transistors are cheaper, and we can combine those with other chiplets that might be leading-edge nodes where we could have general-purpose CPUs or NPU accelerators. There’s a mix-and-match, and we can do chiplets smaller than we can general-purpose chips. We can do smaller runs of them. We can take that IP and customize it for a particular market vertical and create some chiplets for that, change the configuration a bit, and do another run for something else. There’s a level of customization that can be deployed and supported by the market that’s a little bit more than we’ve seen in full-size chips, where the entire thing has to be built into one package.”

Conclusion
What it means for a design to be general-purpose or custom is changing. All designs will contain some of each. Some companies will develop novel architectures using general-purpose processors, and these will be better than a fully general-purpose solution. Others will create highly customized hardware for some functions that are known to be stable, and general-purpose hardware for things that are likely to change. One thing has never changed, however. A company is not likely to add more customization than necessary to satisfy the needs of the market it is targeting.

Further Reading
Challenges With Chiplets And Power Delivery
Benefits and challenges in heterogeneous integration.
Chiplets: 2023 (EBook)
What chiplets are, what they are being used for today, and what they will be used for in the future.

The post Will Domain-Specific ICs Become Ubiquitous? appeared first on Semiconductor Engineering.

Reset Domain Crossing Verification

Semiconductor Engineering

By: Siemens EDA

May 13, 2024 at 09:01

By Reetika and Sulabh Kumar Khare, Siemens EDA DI SW

To meet low-power and high-performance requirements, system on chip (SoC) designs are equipped with several asynchronous and soft reset signals. These reset signals help safeguard software and hardware functional safety, as they can be asserted to quickly return the system to an initial state and clear any pending errors or events.

By definition, a reset domain crossing (RDC) occurs when a path’s transmitting flop has an asynchronous reset, and the receiving flop either has a different asynchronous reset than the transmitting flop or has no reset. The multitude of asynchronous reset sources found in today’s complex automotive designs means there are a large number of RDC paths, which can lead to systematic faults and hence cause data corruption, glitches, metastability, or functional failures, among other issues.
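
The definition above can be expressed as a small predicate. The following is a minimal sketch in Python, not the behavior of any particular RDC tool; the `Flop` record and the `is_rdc` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Flop:
    """Hypothetical record for a register; not taken from any tool's data model."""
    name: str
    async_reset: Optional[str]  # name of the asynchronous reset, or None if the flop has no reset


def is_rdc(tx: Flop, rx: Flop) -> bool:
    """Return True if the path tx -> rx is a reset domain crossing per the definition above."""
    if tx.async_reset is None:
        # The transmitter has no asynchronous reset, so the definition does not apply.
        return False
    # A crossing exists if the receiver has no reset or a different asynchronous reset.
    return rx.async_reset != tx.async_reset


tx = Flop("tx_reg", "rst_a")
print(is_rdc(tx, Flop("rx1", "rst_b")))  # receiver has a different asynchronous reset
print(is_rdc(tx, Flop("rx2", None)))     # receiver has no reset
print(is_rdc(tx, Flop("rx3", "rst_a")))  # receiver shares the transmitter's reset
```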

This issue is not covered by standard, static verification methods, such as clock domain crossing (CDC) analysis. Therefore, a proper reset domain crossing verification methodology is required to prevent errors in the reset design during the RTL verification stage.

A soft reset is an internally generated reset (register/latch/black-box output is used as a reset) that allows the design engineer to reset a specific portion of the design (specific module/subsystem) without affecting the entire system. Design engineers frequently use a soft reset mechanism to reset/restart the device without fully powering it off, as this helps to conserve power by selectively resetting specific electronic components while keeping others in an operational state. A soft reset typically involves manipulating specific registers or signals to trigger the reset process. Applying soft resets is a common technique used to quickly recover from a problem or test a specific area of the design. This can save time during simulation and verification by allowing the designer to isolate and debug specific issues without having to restart the entire simulation. Figure 1 shows a simple soft reset and its RTL to demonstrate that SoftReg is a soft reset for flop Reg.

Fig. 1: SoftReg is a soft reset for register Reg.

This article presents a systematic methodology to identify RDCs that are unsafe because they involve different soft resets, even though the asynchronous reset domain is the same at the transmitter and receiver ends. In addition, with sufficient debug aids, it identifies RDCs with different asynchronous reset domains that are safe (safe from metastability only if they meet static timing analysis), which helps avoid silicon failures and minimize false crossing results. As part of static analysis, this methodology enables designers to identify critical reset domain bugs associated with soft resets.

A methodology to identify critical reset domain bugs

With highly complex reset architectures in automotive designs, there arises the need for a proper verification method to detect RDC issues. It is essential to detect unsafe RDCs systematically and apply appropriate synchronization techniques to tackle the issues that may arise due to delays in reset paths caused by soft resets. Thus designers can ensure proper operation of their designs and avoid the associated risks. By handling RDCs effectively, designers can mitigate potential issues and enhance the overall robustness and performance of a design. This systematic flow involves several steps to assist in RDC verification closure using standard RDC verification tools (see figure 2).

Fig. 2: Flowchart of methodology for RDC verification.

Specification of clock and reset signals

Signals that are intended to generate a clock and reset pulse should be specified by the user as clock or reset signals, respectively, during the set-up step in RDC verification. By specifying signals as clocks or resets (according to their expected behavior), designers can perform design rule checking and other verification checks to ensure compliance with clock and reset related guidelines and standards as well as best practices. This helps identify potential design issues and improve the overall quality of the design by reducing noise in the results.

Clock detection

Ideally, design engineers should define the clock signals and then the verification tool should trace these clocks down to the leaf clocks. Unfortunately, with complex designs, this is not possible as the design might have black boxes that originate clocks, or it may have some combinational logic in the clock signals that do not cover all the clocks specified by the user. All the un-specified clocks need to be identified and mapped to the user-specified primary clocks. An exhaustive detection of clocks is required in RDC verification, as potential metastability may occur if resets are used in different clock domains than the sequential element itself, leading to critical bugs.
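
Mapping unspecified clocks back to user-specified primary clocks can be pictured as an upstream trace through the clock network. The sketch below is a simplified illustration, assuming each derived clock has a single driver; the `driver` dictionary, the signal names, and `map_to_primary` are hypothetical and do not reflect how any specific verification tool represents a design.

```python
from typing import Dict, Optional, Set


def map_to_primary(clock: str, driver: Dict[str, str], primaries: Set[str]) -> Optional[str]:
    """Trace `clock` upstream through `driver` until a user-specified primary clock is found.

    Returns None when tracing stops at an unknown source (for example a black-box
    output) or loops, meaning the clock still needs to be specified by hand.
    """
    seen = set()
    current = clock
    while current not in primaries:
        if current in seen or current not in driver:
            return None
        seen.add(current)
        current = driver[current]
    return current


# Hypothetical netlist: each derived clock maps to the signal that drives it.
driver = {"clk_div2": "clk_sys", "clk_gated": "clk_div2", "clk_bb": "blackbox_out"}
primaries = {"clk_sys"}

for leaf in ("clk_gated", "clk_bb"):
    print(leaf, "->", map_to_primary(leaf, driver, primaries))
```

A `None` result corresponds to the case described above, where a black box originates a clock that the user-specified clocks do not cover.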

Reset detection

Ideally, design engineers should define the reset signals, but again, due to the complexity of automotive and other modern designs, it is not possible to specify all the reset signals. Therefore a specialized verification tool is required for detection of resets. All the localized, black-box, gated, and primary resets need to be identified, and based on their usage in the RTL, they should be classified as synchronous, asynchronous, or dual type and then mapped to the user-specified primary resets.

Soft reset detection

The soft resets — i.e., the internally generated resets by flops and latches — need to be systematically detected as they can cause critical metastability issues when used in different clock domains, and they require static timing analysis when used in the same clock domain. Detecting soft resets helps identify potential metastability problems and allows designers to apply proper techniques for resolving these issues.

Reset tree analysis

Analysis of reset trees helps designers identify issues early in the design process, before RDC analysis. It helps to highlight some important errors in the reset design that are not commonly caught by lint tools. These include:

Dual-synchronicity reset signals, i.e., a reset signal that drives both a synchronous-reset flop and an asynchronous-reset flop
Asynchronous set/reset signals used as data signals, which can result in incorrect data sampling because the reset state cannot be controlled
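
The first of these checks can be sketched as a simple grouping pass. This is an illustrative Python sketch under the assumption that reset usage has already been extracted as `(reset, flop, kind)` tuples; that input format and the function name `find_dual_synchronicity` are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def find_dual_synchronicity(reset_uses: Iterable[Tuple[str, str, str]]) -> List[str]:
    """Return reset signals that are used both synchronously and asynchronously.

    Each entry is (reset_signal, flop_name, kind) with kind "sync" or "async".
    """
    kinds = defaultdict(set)
    for reset, _flop, kind in reset_uses:
        if kind not in ("sync", "async"):
            raise ValueError(f"unknown reset kind: {kind}")
        kinds[reset].add(kind)
    return sorted(reset for reset, used in kinds.items() if used == {"sync", "async"})


# Hypothetical usage data for three reset signals.
uses = [
    ("rst_a", "ff1", "async"),
    ("rst_a", "ff2", "sync"),   # rst_a is used both ways
    ("rst_b", "ff3", "async"),
    ("rst_c", "ff4", "sync"),
]
print(find_dual_synchronicity(uses))
```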

Reset domain crossing analysis

This step involves analyzing a design to determine the logic across various reset domains and identify potential RDCs. The analysis should also identify common reset sequences of asynchronous and soft reset sources at the transmitter and receiver registers of the crossings to avoid detection of false crossings that might appear as potential issues due to complex combinations of reset sources. False crossings are those where the resets of the transmitter register and receiver register are asserted simultaneously due to dependencies among the reset assertion sequences. As a result, any metastability that might occur at the receiver end is mitigated.
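
One way to picture false-crossing filtering is as a reachability question over reset dependencies. The sketch below assumes a simplified model in which asserting one reset always asserts the resets it depends on; real tools reason about assertion sequences in more detail. The `implies` map and both function names are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set


def implied_resets(reset: str, implies: Dict[str, Iterable[str]]) -> Set[str]:
    """All resets asserted whenever `reset` asserts, following `implies` transitively."""
    seen = {reset}
    stack = [reset]
    while stack:
        current = stack.pop()
        for nxt in implies.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_false_crossing(tx_reset: str, rx_reset: Optional[str],
                      implies: Dict[str, Iterable[str]]) -> bool:
    """A crossing is treated as false if the receiver is always reset together with the transmitter."""
    if rx_reset is None:
        return False  # a receiver without a reset can never be reset alongside the transmitter
    return rx_reset in implied_resets(tx_reset, implies)


# Hypothetical dependencies: power-on reset asserts both domains, and rst_a also asserts rst_b.
implies = {"por": ["rst_a", "rst_b"], "rst_a": ["rst_b"]}
print(is_false_crossing("rst_a", "rst_b", implies))  # rst_b always asserts with rst_a
print(is_false_crossing("rst_b", "rst_a", implies))  # rst_a does not assert with rst_b
```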

Analyze and fix RDC issues

The concluding step is to analyze the results of the verification steps to verify if data paths crossing reset domains are safe from metastability. For the RDCs identified as unsafe — which may occur either due to different asynchronous reset domains at the transmitter and receiver ends or due to the soft reset being used in a different clock domain than the sequential element itself — design engineers can develop solutions to eliminate or mitigate metastability by restructuring the design, modifying reset synchronization logic, or adjusting the reset ordering. Traditionally safe RDCs — i.e., crossings where a soft reset is used in the same clock domain as the sequential element itself — need to be verified using static timing analysis.
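
The triage described in this step can be summarized as a small decision function. This is a sketch of one possible reading of the rules, assuming that violation conditions take precedence over the caution case; the argument names and the `Verdict` enum are hypothetical and not part of any tool.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    VIOLATION = "violation"      # unsafe: needs a design or synchronization fix
    CAUTION = "caution"          # traditionally safe: verify with static timing analysis
    NO_CROSSING = "no crossing"


def classify_crossing(tx_async_reset: Optional[str],
                      rx_async_reset: Optional[str],
                      tx_clock: str,
                      tx_soft_reset_clock: Optional[str] = None) -> Verdict:
    """Classify one tx -> rx path using the rules stated in the text.

    tx_soft_reset_clock is the clock domain of the register that generates the
    transmitter's soft reset, or None if the transmitter has no soft reset.
    """
    # Unsafe: different asynchronous reset domains at the transmitter and receiver.
    if tx_async_reset is not None and rx_async_reset != tx_async_reset:
        return Verdict.VIOLATION
    if tx_soft_reset_clock is not None:
        # Unsafe: soft reset used in a different clock domain than the register itself.
        if tx_soft_reset_clock != tx_clock:
            return Verdict.VIOLATION
        # Same clock domain: safe only if timing is met, so flag for STA.
        return Verdict.CAUTION
    return Verdict.NO_CROSSING


print(classify_crossing("rst_a", "rst_b", "clk1"))
print(classify_crossing("rst_a", "rst_a", "clk1", tx_soft_reset_clock="clk2"))
print(classify_crossing("rst_a", "rst_a", "clk1", tx_soft_reset_clock="clk1"))
```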

Figure 3 presents our proposed flow for identifying and eliminating metastability issues due to soft resets. After implementing the RDC solutions, re-verify the design to ensure that the reset domain crossing issues have been effectively addressed.

Fig. 3: Flowchart for proposed methodology to tackle metastability issues due to soft resets.

This methodology was used on a design with 374,546 register bits, 8 latch bits, and 45 RAMs. The Questa RDC verification tool using this new methodology identified around 131 reset domains, which consisted of 19 asynchronous domains defined by the user, as well as 81 asynchronous reset domains inferred by the tool.

The first run analyzed data paths crossing asynchronous reset domains without any soft reset analysis. It reported nearly 40,000 RDC crossings (as shown in table 1).

Reset domain crossings without soft reset analysis	Severity	Number of crossings
Reset domain crossing from a reset to a reset	Violation	28408
Reset domain crossing from a reset to non-reset	Violation	11235

Table 1: RDC analysis without soft resets.

In the second run, we did soft reset analysis and detected 34 soft resets, which resulted in additional violations for RDC paths with transmitter soft reset sources in different clock domains. These were critical violations that were missed in the initial run. Also, some RDC violations were converted to cautions (RDC paths with a transmitter soft reset in the same clock domain) as these paths would be safe from metastability as long as they meet the setup time window (as shown in table 2).

Reset domain crossings with soft reset analysis	Severity	Number of crossings
Reset domain crossing from a reset to a reset	Violation	26957
Reset domain crossing from a reset to non-reset	Violation	10523
Reset domain crossing with tx reset source in different clock	Violation	880
Reset domain crossing from a reset to Rx with same clock	Caution	2412

Table 2: RDC analysis with soft resets.
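
As a quick arithmetic check on the two tables, the totals can be tallied directly. The short Python snippet below only sums the reported counts; the dictionary keys are shortened labels for the table rows, and no further interpretation of the differences between runs is implied.

```python
# Counts copied from Table 1 (without soft reset analysis).
run1 = {"reset to reset": 28408, "reset to non-reset": 11235}

# Counts copied from Table 2 (with soft reset analysis).
run2_violations = {
    "reset to reset": 26957,
    "reset to non-reset": 10523,
    "tx reset source in different clock": 880,
}
run2_cautions = {"reset to Rx with same clock": 2412}

total_run1 = sum(run1.values())
total_run2_violations = sum(run2_violations.values())
total_run2 = total_run2_violations + sum(run2_cautions.values())

print(total_run1)             # 39643, the "nearly 40,000" crossings of the first run
print(total_run2_violations)  # 38360
print(total_run2)             # 40772
```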

To gain a deeper understanding of RDC, metastability, and soft reset analysis in the context of this new methodology, please download the full paper Techniques to identify reset metastability issues due to soft resets.

The post Reset Domain Crossing Verification appeared first on Semiconductor Engineering.

SRAM Security Concerns Grow

Semiconductor Engineering

By: Karen Heyman

May 9, 2024 at 09:08

SRAM security concerns are intensifying as a combination of new and existing techniques allow hackers to tap into data for longer periods of time after a device is powered down.

This is particularly alarming as the leading edge of design shifts from planar SoCs to heterogeneous systems in package, such as those used in AI or edge processing, where chiplets frequently have their own memory hierarchy. Until now, most cybersecurity concerns involving volatile memory have focused on DRAM, because it is often external and easier to attack. SRAM, in contrast, does not contain a component as obviously vulnerable as a heat-sensitive capacitor, and in the past it has been harder to pinpoint. But as SoCs are disaggregated and more features are added into devices, SRAM is becoming a much bigger security concern.

The attack scheme is well understood. Known as cold boot, it was first identified in 2008, and is essentially a variant of a side-channel attack. In a cold boot approach, an attacker dumps data from internal SRAM to an external device, and then restarts the system from the external device with some code modification. “Cold boot is primarily targeted at SRAM, with the two primary defenses being isolation and in-memory encryption,” said Vijay Seshadri, distinguished engineer at Cycuity.

Compared with network-based attacks, such as DRAM’s rowhammer, cold boot is relatively simple. It relies on physical proximity and a can of compressed air.

The vulnerability was first described by Edward Felten, director of Princeton University’s Center for Information Technology Policy, J. Alex Halderman, currently director of the Center for Computer Security & Society at the University of Michigan, and colleagues. The breakthrough in their research was based on the growing realization in the engineering research community that data does not vanish from memory the moment a device is turned off, which until then was a common assumption. Instead, data in both DRAM and SRAM has a brief “remanence.”[1]

Using a cold boot approach, data can be retrieved, especially if an attacker sprays the chip with compressed air, cooling it enough to slow the degradation of the data. As the researchers described their approach, “We obtained surface temperatures of approximately −50°C with a simple cooling technique — discharging inverted cans of ‘canned air’ duster spray directly onto the chips. At these temperatures, we typically found that fewer than 1% of bits decayed even after 10 minutes without power.”

Unfortunately, despite more than 15 years of security research since the publication of the Halderman paper, the authors’ warning still holds true. “Though we discuss several strategies for mitigating these risks, we know of no simple remedy that would eliminate them.”

However unrealistic, there is one simple and obvious remedy to cold boot — never leave a device unattended. But given human behavior, it’s safer to assume that every device is vulnerable, from smart watches to servers, as well as automotive chips used for increasingly autonomous driving.

While the original research exclusively examined DRAM, within the last six years cold boot has proven to be one of the most serious vulnerabilities for SRAM. In 2018, researchers at Germany’s Technische Universität Darmstadt published a paper describing a cold boot attack method that is highly resistant to memory erasure techniques, and which can be used to manipulate the cryptographic keys produced by the SRAM physical unclonable function (PUF).

As with so many security issues, it’s been a cat-and-mouse game between remedies and counter-attacks. And because cold boot takes advantage of slowing down memory degradation, in 2022 Yang-Kyu Choi and colleagues at the Korea Advanced Institute of Science and Technology (KAIST) described a way to undo the slowdown with an ultra-fast data sanitization method that worked within 5 ns, using back bias to control the device parameters of CMOS.

Fig. 1: Asymmetric forward back-biasing scheme for permanent erasing. (a) All the data are reset to 1. (b) All the data are reset to 0. Whether all the data were reset to 1 or 0 is determined by the asymmetric forward back-biasing scheme. Source: KAIST/Creative Commons [2]

Their paper, as well as others, has inspired new approaches to combating cold boot attacks.

“To mitigate the risk of unauthorized access from unknown devices, main devices, or servers, check the authenticated code and unique identity of each accessing device,” said Jongsin Yun, memory technologist at Siemens EDA. “SRAM PUF is one of the ways to securely identify each device. SRAM is made of two inverters cross-coupled to each other. Although each inverter is designed to be the same device, normally one part of the inverter has a somewhat stronger NMOS than the other due to inherent random dopant fluctuation. During the initial power-on process, SRAM data will be either data 1 or 0, depending on which side has a stronger device. In other words, the initial data state of the SRAM array at the power on is decided by this unique random process variation and most of the bits maintain this property for life. One can use this unique pattern as a fingerprint of a device. The SRAM PUF data is reconstructed with other coded data to form a cryptographic key. SRAM PUF is a great way to anchor its secure data into hardware. Hackers may use a DFT circuit to access the memory. To avoid insecurely reading the SRAM information through DFT, the security-critical design makes DFT force delete the data as an initial process of TEST mode.”
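
The fingerprint idea in this description can be illustrated with a toy simulation. The sketch below is a simplified model, assuming each cell has a fixed random mismatch plus a small amount of noise at every power-up; the Gaussian distributions, the `noise_sigma` value, and all function names are invented for illustration, and the reconstruction of a key from the pattern with other coded data is not modeled.

```python
import random
from typing import List


def make_device(n_cells: int, seed: int) -> List[float]:
    """Fixed per-cell mismatch between the two inverters, set at manufacturing (toy model)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_cells)]


def power_on(mismatch: List[float], rng: random.Random, noise_sigma: float = 0.1) -> List[int]:
    """Each cell settles to 1 or 0 depending on which side is stronger, plus a little noise."""
    return [1 if m + rng.gauss(0.0, noise_sigma) > 0 else 0 for m in mismatch]


def hamming_fraction(a: List[int], b: List[int]) -> float:
    """Fraction of bit positions where two power-up patterns differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


noise = random.Random(99)
dev_a = make_device(1024, seed=1)
dev_b = make_device(1024, seed=2)

a1 = power_on(dev_a, noise)
a2 = power_on(dev_a, noise)
b1 = power_on(dev_b, noise)

print("same device, two power-ups:", hamming_fraction(a1, a2))
print("two different devices:     ", hamming_fraction(a1, b1))
```

In this model, two power-ups of the same device differ in only a small fraction of bits, while two different devices differ in roughly half, which is what makes the pattern usable as a fingerprint.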

However, there can be instances where data may be required to be kept in a non-volatile memory (NVM). “Data is considered insecure if the NVM is located outside of the device,” said Yun. “Therefore, secured data needs to be stored within the device with write protection. One-time programmable (OTP) memory or fuses are good storage options to prevent malicious attackers from tampering with the modified information. OTP memory and fuses are used to store cryptographic keys, authentication information, and other critical settings for operation within the device. It is useful for anti-rollback, which prevents hackers from exploiting old vulnerabilities that have been fixed in newer versions.”
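
Anti-rollback with one-time programmable storage can be sketched as a counter that only moves forward. The example below is a minimal Python model, assuming the minimum accepted firmware version is recorded as the number of blown fuses; the `OtpFuseBank` class and `try_boot` function are hypothetical and do not describe any specific device.

```python
class OtpFuseBank:
    """Hypothetical model of one-time programmable fuses: a bit can go 0 -> 1, never back."""

    def __init__(self, n_fuses: int) -> None:
        self._fuses = [0] * n_fuses

    def blow(self, index: int) -> None:
        self._fuses[index] = 1  # irreversible: the class offers no way to clear a fuse

    def blown(self) -> int:
        return sum(self._fuses)

    def size(self) -> int:
        return len(self._fuses)


def try_boot(fuses: OtpFuseBank, fw_version: int) -> bool:
    """Accept firmware only if its version is not older than the fused minimum."""
    if fw_version < fuses.blown():
        return False  # rollback attempt: older than a version already recorded
    if fw_version > fuses.size():
        raise ValueError("version exceeds fuse capacity")
    for i in range(fuses.blown(), fw_version):
        fuses.blow(i)  # record the new minimum version
    return True


bank = OtpFuseBank(8)
print(try_boot(bank, 2))  # first boot of version 2
print(try_boot(bank, 1))  # attempt to go back to version 1
print(try_boot(bank, 3))  # upgrade to version 3
```

Because the fuses cannot be cleared, an attacker who reinstalls an older, vulnerable image cannot lower the recorded minimum version.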

Chiplet vulnerabilities
Chiplets also could present another vector for attack, due to their complexity and interconnections. “A chiplet has memory, so it’s going to be attacked,” said Cycuity’s Seshadri. “Chiplets, in general, are going to exacerbate the problem, rather than keeping it status quo, because you’re going to have one chiplet talking to another. Could an attack on one chiplet have a side effect on another? There need to be standards to address this. In fact, they’re coming into play already. A chiplet provider has to say, ‘Here’s what I’ve done for security. Here’s what needs to be done when interfacing with another chiplet.’”

Yun notes there is a further physical vulnerability for those working with chiplets and SiPs. “When multiple chiplets are connected to form a SiP, we have to trust data coming from an external chip, which creates further complications. Verification of the chiplet’s authenticity becomes very important for SiPs, as there is a risk of malicious counterfeit chiplets being connected to the package for hacking purposes. Detection of such counterfeit chiplets is imperative.”
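One common way to check that a connected part is genuine is a challenge-response exchange. The sketch below is an assumption-laden illustration, not a description of any chiplet standard: it assumes a secret key was provisioned into the genuine chiplet at manufacture and is known to the verifier, and the function names are hypothetical. Production designs may instead rely on asymmetric keys and certificates.

```python
import hashlib
import hmac
import secrets

def respond(device_key: bytes, challenge: bytes) -> bytes:
    """What a chiplet holding `device_key` returns for a given challenge."""
    return hmac.new(device_key, challenge, hashlib.sha256).digest()

def verify_chiplet(expected_key: bytes, respond_fn) -> bool:
    """Send a fresh random challenge and check the keyed response."""
    challenge = secrets.token_bytes(32)  # fresh nonce, so old answers cannot be replayed
    answer = respond_fn(challenge)
    expected = hmac.new(expected_key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(answer, expected)  # constant-time comparison

provisioned_key = secrets.token_bytes(32)  # hypothetical key set at manufacture

def genuine(challenge: bytes) -> bytes:
    return respond(provisioned_key, challenge)

def counterfeit(challenge: bytes) -> bytes:
    # A counterfeit part does not know the provisioned key.
    return respond(secrets.token_bytes(32), challenge)

genuine_ok = verify_chiplet(provisioned_key, genuine)
counterfeit_ok = verify_chiplet(provisioned_key, counterfeit)
print(genuine_ok, counterfeit_ok)
```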

These precautions also apply when working with DRAM. In all situations, Seshadri said, thinking about security has to go beyond device-level protection. “The onus of protecting DRAM is not just on the DRAM designer or the memory designer,” he said. “It has to be secured by design principles when you are developing. In addition, you have to look at this holistically and do it at a system level. You must consider all the other things that communicate with DRAM or that are placed near DRAM. You must look at a holistic solution, all the way from software down to things like the memory controller and then finally, the DRAM itself.”

Encryption as a backup
Data itself always must be encrypted as a second layer of protection against known and novel attacks, so an organization’s assets will still be protected even if someone breaks in via cold boot or another method.

“The first and primary method of preventing a cold boot attack is limiting physical access to the systems, or physically modifying the system’s case or hardware to prevent an attacker’s access,” said Jim Montgomery, market development director, semiconductor at TXOne Networks. “The most effective programmatic defense against an attack is to ensure encryption of memory using either a hardware- or software-based approach. Utilizing memory encryption will ensure that regardless of trying to dump the memory, or physically removing the memory, the encryption keys will remain secure.”

Montgomery also points out that TXOne is working with the Semiconductor Manufacturing Cybersecurity Consortium (SMCC) to develop common criteria based on the SEMI E187 and E188 standards to help DMs and OEMs implement secure procedures for systems security and integrity, including controlling the physical environment.

What kind and how much encryption will depend on use cases, said Jun Kawaguchi, global marketing executive for Winbond. “Encryption strength for a traffic signal controller is going to be different from encryption for nuclear plants or medical devices, critical applications where you need much higher levels,” he said. “There are different strengths and costs to it.”

Another problem, in the post-quantum era, is that encryption itself may be vulnerable. To defend against those possibilities, researchers are developing post-quantum encryption schemes. One way to stay a step ahead is homomorphic encryption [HE], which will find a role in data sharing, since computations can be performed on encrypted data without first having to decrypt it.
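The core property, computing on data without decrypting it, can be shown with a toy example. The sketch below uses Paillier, a classical scheme that supports only addition on ciphertexts. It is neither fully homomorphic nor post-quantum, and it was chosen here purely because it fits in a few lines. The primes are far too small for real use, and all function names are illustrative.

```python
import math
import random

def keygen(p, q):
    """Build a toy key pair from two small primes (far too small for real use)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid because the generator is n + 1
    return n, (lam, mu)

def encrypt(n, m, rng):
    """Encrypt integer m (0 <= m < n) under public modulus n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """Recover the plaintext using the private values (lam, mu)."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)

rng = random.Random(7)
n, priv = keygen(293, 433)
c_sum = add_encrypted(n, encrypt(n, 20, rng), encrypt(n, 22, rng))
total = decrypt(n, priv, c_sum)  # the party doing the addition never saw 20 or 22
print(total)
```

The party that calls `add_encrypted` needs only the public modulus, which is why this kind of scheme is attractive for sharing data with an untrusted processor.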

Homomorphic encryption could be in widespread use as soon as the next few years, according to Ronen Levy, senior manager for IBM’s Cloud Security & Privacy Technologies Department, and Omri Soceanu, AI Security Group manager at IBM. However, there are still challenges to be overcome.

“There are three main inhibitors for widespread adoption of homomorphic encryption — performance, consumability, and standardization,” according to Levy. “The main inhibitor, by far, is performance. Homomorphic encryption comes with some latency and storage overheads. FHE hardware acceleration will be critical to solving these issues, as well as algorithmic and cryptographic solutions, but without the necessary expertise it can be quite challenging.”

An additional issue is that most consumers of HE technology, such as data scientists and application developers, do not possess deep cryptographic skills, so HE solutions designed for cryptographers can be impractical. Some HE solutions require algorithmic and cryptographic expertise, which inhibits adoption by those who lack these skills.

Finally, there is a lack of standardization. “Homomorphic encryption is in the process of being standardized,” said Soceanu. “But until it is fully standardized, large organizations may be hesitant to adopt a cryptographic solution that has not been approved by standardization bodies.”

Once these issues are resolved, they predicted widespread use as soon as the next few years. “Performance is already practical for a variety of use cases, and as hardware solutions for homomorphic encryption become a reality, more use cases would become practical,” said Levy. “Consumability is addressed by creating more solutions, making it easier and hopefully as frictionless as possible to move analytics to homomorphic encryption. Additionally, standardization efforts are already in progress.”

A new attack and an old problem
Unfortunately, security never will be as simple as making users more aware of their surroundings. Otherwise, cold boot could be completely eliminated as a threat. Instead, it’s essential to keep up with conference talks and the published literature, as graduate students keep probing SRAM for vulnerabilities, hopefully one step ahead of genuine attackers.

For example, SRAM-related cold boot attacks originally targeted discrete SRAM. The reason is that it’s far more complicated to attack on-chip SRAM, which is isolated from external probing and has minimal intrinsic capacitance. However, in 2022, Jubayer Mahmod, then a graduate student at Virginia Tech, and his advisor, associate professor Matthew Hicks, demonstrated what they dubbed “Volt Boot,” a new method that could penetrate on-chip SRAM. According to their paper, “Volt Boot leverages asymmetrical power states (e.g., on vs. off) to force SRAM state retention across power cycles, eliminating the need for traditional cold boot attack enablers, such as low-temperature or intrinsic data retention time…Unlike other forms of SRAM data retention attacks, Volt Boot retrieves data with 100% accuracy — without any complex post-processing.”

Conclusion
While scientists and engineers continue to identify vulnerabilities and develop security solutions, the decision about how much security to include in a design is an economic one. Cost vs. risk is a complex formula that depends on the end application, the impact of a breach, and the likelihood that an attack will occur.

“It’s like insurance,” said Kawaguchi. “Security engineers and people like us who are trying to promote security solutions get frustrated because, similar to insurance pitches, people respond with skepticism. ‘Why would I need it? That problem has never happened before.’ Engineers have a hard time convincing their managers to spend that extra dollar on the costs because of this ‘it-never-happened-before’ attitude. In the end, there are compromises. Yet ultimately, it’s going to cost manufacturers a lot of money when suddenly there’s a deluge of demands to fix this situation right away.”

References

S. Skorobogatov, “Low temperature data remanence in static RAM”, Technical report UCAM-CL-TR-536, University of Cambridge Computer Laboratory, June 2002.
Han, SJ., Han, JK., Yun, GJ. et al. Ultra-fast data sanitization of SRAM by back-biasing to resist a cold boot attack. Sci Rep 12, 35 (2022). https://doi.org/10.1038/s41598-021-03994-2

The post SRAM Security Concerns Grow appeared first on Semiconductor Engineering.

Semiconductor Engineering
Software-Defined Vehicle Momentum Grows
By: Ann Mutschler
May 9, 2024 at 09:06

Experts at the Table: The automotive ecosystem is undergoing a transformation toward software-defined vehicles, spurring new architectures with more software. Semiconductor Engineering sat down to discuss the impact of these changes with Suraj Gajendra, vice president of products and solutions in Arm’s automotive line of business; Chuck Alpert, R&D automotive fellow at Cadence; Steve Spadoni, zone controller and power distribution application manager at Infineon; Rebeca Delgado, chief technology officer and principal AI engineer at Intel Automotive; Cyril Clocher, senior director in the automotive product line for high-performance computing at Renesas; David Fritz, vice president, hybrid and virtual systems at Siemens EDA; and Marc Serughetti, senior director, systems design group at Synopsys. What follows are excerpts of that discussion.

L-R: Arm’s Gajendra, Cadence’s Alpert, Infineon’s Spadoni, Intel’s Delgado, Renesas’ Clocher, Siemens’ Fritz, Synopsys’ Serughetti.

SE: The automotive ecosystem is undergoing a technology evolution the likes of which has not been seen, including the move to software-defined vehicles. To set a baseline for this discussion, what is your definition of an SDV?

Gajendra: A software-defined vehicle is a concept, a trend, an idea, where the whole ecosystem can drive new capabilities and new user experiences into the car, even after it rolls out of the showroom or dealership. It’s a pretty loaded concept. There’s a lot of infrastructure that needs to come together, such as software development in the cloud, seamless deployment of that software development onto the car, the whole deployment of over-the-air updates, and the connectivity. In short, the concept of a software-defined vehicle is expecting a world where we can drive new experiences, new capabilities, and new features into the car throughout its lifetime.

Alpert: In thinking about what SDV means, one example is the battery — especially in an EV. I’m not talking about the technology of the battery that’s evolved, but rather the idea that in the past when you wanted to charge your car in your garage and you were worried about starting a fire, you’d think, ‘No, don’t do that because your whole house could burn down.’ The idea is that in the past, maybe we might put a temperature sensor on the battery, but now we actually have software that can monitor it. It might even have AI to predict if the battery is reaching some state that might cause a fire in the future. You also might have something that connects to the power grid and learns when is a good time to charge, because it’s a low-usage period so it’s cheaper. This is just one part of the car, but you can imagine a whole bunch of software that you want to put on top of it in order to connect to the universe. You need a software-defined vehicle platform in order for this, or in all the other parts of your car, to communicate with the world and provide the best user experience.

Spadoni: Infineon’s definition of a software-defined vehicle is a redefining of architecture — specifically, electrical and electronic architecture, feature allocation, and the entire topology of the vehicle, from power generation and storage to power distribution and high compute. It really means new electrical architectures, and it has consequences for the business model of every OEM and Tier 1 involved. It’s a major change to previous methodologies in the last 30 years.

Delgado: Software-defined vehicle is not just over-the-air updates. It’s truly a new methodology and a new philosophy for how to architect every ingredient of the vehicle to continue to deliver value over time, in which the value is very tightly attached to the software that delivers the user experience. Ultimately, this architecture must enable the different practices on how to deliver this new value over time. What’s very interesting is that these practices of moving to software-defined architecture have been adopted by many other industries already. Intel has a ton of heritage, and actually helped those industries transform. That transformation is truly what we’re observing here. It’s an incredible opportunity, and possibly a crisis if not done right.

Clocher: To apply an analogy here, the car is the new smartphone. But for us, it’s more than that. I’ve heard about the platform, yes, and it’s the major architecture evolution that we’ll see in the next decade. For us at Renesas, it will be a journey that will take time to enhance the user experience, to generate new revenue streams for the industry as it moves from decentralized to centralized classic compute with zonal architecture. We can apply all those buzzwords to a software-defined vehicle. Those platforms will need big computers and heavy complex hardware solutions and this will generate evolutions, upgrades to the car during its entire lifetime, but underneath we know — at least at Renesas, and certainly at some other players and silicon vendors — that this will need a huge amount of hardware resources to manage what we have in mind to deploy this platform.

Fritz: I see software-defined vehicles a bit differently than what’s been mentioned so far. For many years, you’d have the hardware team doing their design, and the software team doing their design, and it all needs to come together. There’s an English natural language discussion about what needs to happen, and as we all know, that never really goes terribly well. In automotive that becomes an integration storm, and it is a nightmare. With the new compute requirements that have been mentioned already, that just compounds the issue. So the way I see this is that we tend, as people who have an engineering background, to dive into how we’re going to do things. We hear ‘software-defined vehicle,’ we immediately think about how to do that. There’s not a lot of thought about why it needs to be done, and what needs to happen. We jump into the ‘how’ too early, and a lot of the discussion here is exemplary of that kind of approach. When I’m looking at software-defined vehicles, I’m looking at why it’s important that the software needs to run effectively on a piece of hardware. And for that hardware, why is it important for it to actually operate properly on the software? Then you can decide how to put together a new methodology that’s going to bring those things together. In the past, it’s been called hardware/software co-design. There have been attempts many times, and as has been mentioned, other industries have made this transition. What’s unique about automotive is that it’s not just one transition that needs to happen. It’s hundreds or thousands of transitions. The ecosystem needs to be turned upside down, which we’re seeing happen right now, and you need to bring all that together. It really is a methodology where you need the tooling, you need the processes, you need the thinking, you need the organizations to change so that they can make this transition in a realistic way. SDV is a huge transition. It is a way for the automotive industry to morph into something that has longevity and can meet customer expectations, which it really hasn’t met for some time now.

Serughetti: At the end of the day, if we look starting at the top from our perspective, SDV is a means to bring and enhance the car experience for the customer. That’s the end result that the OEMs look at, but they look at it from the perspective of how that improves the OEM efficiencies, and how that creates new business opportunities. The way we look at it, and what’s important, is the impact it has on the industry, the impact on the processes, on the methodologies, on the people, on the ecosystem, on the technology. It’s really a transformation of the automotive market that is going to fundamentally change how the industry moves forward and bring the OEM into a world in which they are really looking at how they become efficient in delivering cars, how they bring new features, but at the same time, how they evolve their business as well.

SE: As you’ve all described, SDV requires many inter-dependencies, and the entire ecosystem has to have an understanding of the ‘why,’ which should then lead back to laying out the plan for how to get there. Where does the ecosystem stand today in terms of realizing SDV?

Fritz: OEMs have decided in the last few years that they’ve got to take control of their own destiny. They cannot simply take what the suppliers provide. They need a methodology — like this whole SDV concept, and any tooling necessary to provide that — to push down into their suppliers, such that, ‘Here’s what I need. If you can’t do this for me, I will go find someone that will.’ This is not the old ecosystem that bubbled up from the IP to the Tier 2s, to the Tier 1s, and then to the OEMs, which gave them limited choices to go from. So when I say, “Turn the ecosystem upside down,” that’s what is happening. But every OEM has their own ecosystem, and they’re not all in the same place. Even region-to-region, they can be very different.

Delgado: This is a critical discussion, and effectively where the industry has to eventually settle. The magnitude of the transformation of the ecosystem includes roles in the technology evolution. The silicon content is expected to quadruple over the next few years in the vehicle for defining the in-cabin experience of the end user. At the end of the day, the complexity of the transition of roles is of such magnitude that the proprietary, fragmented, and broken approaches that David articulated are really not going to enable the industry to transform at the speed it requires to deliver and meet the experiences. But more than anything, they are not going to address the actual technology changes necessary to implement and allow for this value delivery mechanism. At the end of the day, this is where Intel really believes collaboration is key, and anybody who wants to participate in this ecosystem must provide scalability — also known as top-to-bottom support of the different product lines that our OEMs and Tier 1s are having to support, versus a broken-up approach on these ever-evolving higher performance and higher performance compute needs. It has to be future-proof, because you’re going to launch the vehicle eventually. So certain hardware has to be future-proofed to a certain affordability envelope, and there has to be a strategy around that. And then the ecosystem and that collaboration must be able to deliver that aggregation. It has to be done with certain anchoring technology that will allow us to deliver that performance. Collaboration is key in the sense that these technologies cannot be single-handedly owned, developed, let alone owned, defined, developed, and integrated by OEMs in silos with a proprietary end-to-end architecture definition. There obviously will be differentiations on the actual implementation, but the technologies at large have to have a sense of reuse, particularly from other verticals that have already done software-defined transformations and then tuned in the right ways toward the automotive requirements.

Spadoni: There are probably a wide variety of implementations. At Infineon, we partner with OEMs and Tier 1s and we see different approaches. For example, General Motors has more of a modular approach that emulates what happened in the mobile phone space. It seems that Ford has a more pragmatic approach, along with Stellantis, but all of them are facing very similar challenges in that affordability has become a big problem. There are multiple generations of implementations that are going to occur, and you’ll see a striving toward how to pay for this extra hardware. It leads to tradeoffs in implementations of other systems that have to have savings in order for them to afford these vehicles. No one ever goes into a dealership and says, ‘Give me a software-defined vehicle.’ Everyone’s looking for value, and you can see it now with volumes going down. There’s a saturation of people buying at the high level. The OEMs want to get more sales, which means they’ll have to go to the lower-cost-value vehicles, and that’s going to affect the electrical and electronic architectures and the software-defined vehicle.

Clocher: What we’re seeing I would summarize as the impact on the ecosystem. We’re moving to an OEM-centric ecosystem. One size does not fit all, meaning OEMs will have their different tastes, their different definitions of levels of integration they want to have in their software-defined vehicle — especially given more complex tasks that we all have to do, rather than the challenge we have to solve, because we’re not talking about a common umbrella of software-defined vehicle. But it really does mean different implementations and different meanings for OEM A from OEM B. I would fully agree with David and Steve that we are far from having a common understanding of, at least, the market itself. And that’s fine, because this will bring differentiation, and ultimately that’s why a customer will go to Dealership A versus Dealership B. This is what the industry wants to see — continue to differentiate, continue to add value to the ultimate product, which is the car.

Serughetti: The important point in all this is, of course, you’re breaking the model that exists today. That’s one of the big challenges. We used to have Tier 1s that were building boxes, and delivering software. This was a complete black box. When it would go to integration, there were all sorts of problems. And now you’re going to break this? The challenge for the OEM is how they do this. They want to control software, but are they equipped to do this today? We see the problems today that some of the legacy OEMs have in setting up their software organizations, the challenges of CARIAD and all such organizations that are trying to do this. It’s not easy to change those companies. Of course, the new entrants don’t have this problem because they are coming from a brand new design versus the ones that deal with legacy. So for the OEM, it’s about how to take control of the software. What does that mean in terms of the processes, in terms of agile development, digital twins, and all of these technologies everybody’s talking about? The other side is, ‘It’s all nice, this software,’ but this software runs on all the companies that are delivering hardware, and that becomes essential to it. You can have the best software, but if your hardware is not there to support performance, power, and all of those aspects, you’re not going to be successful. So the ecosystem is evolving how hardware, software, and all of this comes together. The OEM wants to be the central point. That’s what we’re talking about in terms of the process methodology aspects that are making this transition evolve.

Gajendra: Where are we in this journey? How far have we come? And where are we going? Going back to the point that David mentioned earlier about supply chain evolving and the supply chain turned upside down, five years ago, if we sat here in this sort of a panel and discussed software-defined vehicles, the conversation would have been entirely different. It would have been stuck with the traditional supply chain that we’ve seen for the last 35 or 40 years in the automotive industry. There are fundamentally two aspects here. The supply chain is evolving, and the infrastructure that we, as a community — this team, for example, and many others in the community — are trying to enable is going to be key to making our EDA partners happy. The use of virtual platforms today in the cloud to try and shift left and develop and validate some of these technologies and software wasn’t even there five years ago, so we’ve come a long way. We’ve made a lot of progress together as an industry. Yes, we have a long way to go until we actually have a truly software-defined vehicle. We can go and ask for a software-defined vehicle in the dealership. But the changes we are seeing in terms of all sorts of technology providers trying to make sure that the technology that we eventually will have in the hardware is provided in some sort of virtual form, be it fast models or whatever it is in the cloud, for the vast majority of software ecosystem in automotive this is a big change. I was at Embedded World, and the amount of virtual platforms and the demos that people were actually showing — silicon partners like we have here, Intel, Renesas, Infineon, EDA companies — pointed to a strong movement of, ‘Let’s build the infrastructure that we can build, and then provide that infrastructure to the OEMs to take it from there.’ There is a lot of work going on. Together we will make the infrastructure across the board, be it virtual platform or others, richer and more capable.

Alpert: For sure, OEMs have to control their own destiny. In the past, they would do it by differentiating maybe because they had better engine performance, or some other feature. But going forward, the differentiation is going to be their software. Whoever can make software that will provide additional value, and brand it, that’s going to be the differentiator and that’s the trend. In terms of how you get there, a shared ecosystem is important. SOAFEE is a potential way that, together with virtual platforms, you can provide a shared ecosystem for development, but still allow everyone to differentiate and plug-and-play. That’s one reason we’re working closely with Arm on trying to have a reference design specifically for this purpose. But again, we’re not saying, ‘This is the design you use. This is how you do it.’ That’s not it. The point is, let’s start somewhere, and then people can start swapping out pieces and doing different things. As long as OEMs can plug-and-play, then they can still differentiate. But they don’t have to invent everything themselves, which would be too costly.

The post Software-Defined Vehicle Momentum Grows appeared first on Semiconductor Engineering.

Semiconductor Engineering
SRAM Security Concerns Grow
By: Karen Heyman
SRAM security concerns are intensifying as a combination of new and existing techniques allow hackers to tap into data for longer periods of time after a device is powered down. This is particularly alarming as the leading edge of design shifts from planar SoCs to heterogeneous systems in package, such as those used in AI or edge processing, where chiplets frequently have their own memory hierarchy. Until now, most cybersecurity concerns involving volatile memory have focused on DRAM, because it
May 9, 2024 at 09:08