FreshRSS

Zobrazení pro čtení

Jsou dostupné nové články, klikněte pro obnovení stránky.

Survey of Energy Efficient PIM Processors

A new technical paper titled “Survey of Deep Learning Accelerators for Edge and Emerging Computing” was published by researchers at University of Dayton and the Air Force Research Laboratory.

Abstract

“The unprecedented progress in artificial intelligence (AI), particularly in deep learning algorithms with ubiquitous internet connected smart devices, has created a high demand for AI computing on the edge devices. This review studied commercially available edge processors, and the processors that are still in industrial research stages. We categorized state-of-the-art edge processors based on the underlying architecture, such as dataflow, neuromorphic, and processing in-memory (PIM) architecture. The processors are analyzed based on their performance, chip area, energy efficiency, and application domains. The supported programming frameworks, model compression, data precision, and the CMOS fabrication process technology are discussed. Currently, most commercial edge processors utilize dataflow architectures. However, emerging non-von Neumann computing architectures have attracted the attention of the industry in recent years. Neuromorphic processors are highly efficient for performing computation with fewer synaptic operations, and several neuromorphic processors offer online training for secured and personalized AI applications. This review found that the PIM processors show significant energy efficiency and consume less power compared to dataflow and neuromorphic processors. A future direction of the industry could be to implement state-of-the-art deep learning algorithms in emerging non-von Neumann computing paradigms for low-power computing on edge devices.”

Find the technical paper here. Published July 2024.

Alam, Shahanur, Chris Yakopcic, Qing Wu, Mark Barnell, Simon Khan, and Tarek M. Taha. 2024. “Survey of Deep Learning Accelerators for Edge and Emerging Computing” Electronics 13, no. 15: 2988. https://doi.org/10.3390/electronics13152988.

The post Survey of Energy Efficient PIM Processors appeared first on Semiconductor Engineering.

Survey of Energy Efficient PIM Processors

A new technical paper titled “Survey of Deep Learning Accelerators for Edge and Emerging Computing” was published by researchers at University of Dayton and the Air Force Research Laboratory.

Abstract

“The unprecedented progress in artificial intelligence (AI), particularly in deep learning algorithms with ubiquitous internet connected smart devices, has created a high demand for AI computing on the edge devices. This review studied commercially available edge processors, and the processors that are still in industrial research stages. We categorized state-of-the-art edge processors based on the underlying architecture, such as dataflow, neuromorphic, and processing in-memory (PIM) architecture. The processors are analyzed based on their performance, chip area, energy efficiency, and application domains. The supported programming frameworks, model compression, data precision, and the CMOS fabrication process technology are discussed. Currently, most commercial edge processors utilize dataflow architectures. However, emerging non-von Neumann computing architectures have attracted the attention of the industry in recent years. Neuromorphic processors are highly efficient for performing computation with fewer synaptic operations, and several neuromorphic processors offer online training for secured and personalized AI applications. This review found that the PIM processors show significant energy efficiency and consume less power compared to dataflow and neuromorphic processors. A future direction of the industry could be to implement state-of-the-art deep learning algorithms in emerging non-von Neumann computing paradigms for low-power computing on edge devices.”

Find the technical paper here. Published July 2024.

Alam, Shahanur, Chris Yakopcic, Qing Wu, Mark Barnell, Simon Khan, and Tarek M. Taha. 2024. “Survey of Deep Learning Accelerators for Edge and Emerging Computing” Electronics 13, no. 15: 2988. https://doi.org/10.3390/electronics13152988.

The post Survey of Energy Efficient PIM Processors appeared first on Semiconductor Engineering.

Dedicated Approximate Computing Framework To Efficiently Compute PCs On Hardware

A technical paper titled “On Hardware-efficient Inference in Probabilistic Circuits” was published by researchers at Aalto University and UCLouvain.

Abstract:

“Probabilistic circuits (PCs) offer a promising avenue to perform embedded reasoning under uncertainty. They support efficient and exact computation of various probabilistic inference tasks by design. Hence, hardware-efficient computation of PCs is highly interesting for edge computing applications. As computations in PCs are based on arithmetic with probability values, they are typically performed in the log domain to avoid underflow. Unfortunately, performing the log operation on hardware is costly. Hence, prior work has focused on computations in the linear domain, resulting in high resolution and energy requirements. This work proposes the first dedicated approximate computing framework for PCs that allows for low-resolution logarithm computations. We leverage Addition As Int, resulting in linear PC computation with simple hardware elements. Further, we provide a theoretical approximation error analysis and present an error compensation mechanism. Empirically, our method obtains up to 357x and 649x energy reduction on custom hardware for evidence and MAP queries respectively with little or no computational error.”

Find the technical paper here. Published May 2024 (preprint). CODE: https://github.com/lingyunyao/AAI_Probabilistic_Circuits

Yao, Lingyun, Martin Trapp, Jelin Leslin, Gaurav Singh, Peng Zhang, Karthekeyan Periasamy, and Martin Andraud. “On Hardware-efficient Inference in Probabilistic Circuits.” arXiv preprint arXiv:2405.13639 (2024).

Related Reading
Architecting Chips For High-Performance Computing
Data center IC designs are evolving, based on workloads, but making the tradeoffs for those workloads is not always straightforward.
AI Tradeoffs At The Edge
The best ways to optimize AI efficiency today, and other options under development.

The post Dedicated Approximate Computing Framework To Efficiently Compute PCs On Hardware appeared first on Semiconductor Engineering.

Efficient TNN Inference on RISC-V Processing Cores With Minimal HW Overhead

A new technical paper titled “xTern: Energy-Efficient Ternary Neural Network Inference on RISC-V-Based Edge Systems” was published by researchers at ETH Zurich and Universita di Bologna.

Abstract
“Ternary neural networks (TNNs) offer a superior accuracy-energy trade-off compared to binary neural networks. However, until now, they have required specialized accelerators to realize their efficiency potential, which has hindered widespread adoption. To address this, we present xTern, a lightweight extension of the RISC-V instruction set architecture (ISA) targeted at accelerating TNN inference on general-purpose cores. To complement the ISA extension, we developed a set of optimized kernels leveraging xTern, achieving 67% higher throughput than their 2-bit equivalents. Power consumption is only marginally increased by 5.2%, resulting in an energy efficiency improvement by 57.1%. We demonstrate that the proposed xTern extension, integrated into an octa-core compute cluster, incurs a minimal silicon area overhead of 0.9% with no impact on timing. In end-to-end benchmarks, we demonstrate that xTern enables the deployment of TNNs achieving up to 1.6 percentage points higher CIFAR-10 classification accuracy than 2-bit networks at equal inference latency. Our results show that xTern enables RISC-V-based ultra-low-power edge AI platforms to benefit from the efficiency potential of TNNs.”

Find the technical paper here. Published May 2024.

Rutishauser, Georg, Joan Mihali, Moritz Scherer, and Luca Benini. “xTern: Energy-Efficient Ternary Neural Network Inference on RISC-V-Based Edge Systems.” arXiv preprint arXiv:2405.19065 (2024).

The post Efficient TNN Inference on RISC-V Processing Cores With Minimal HW Overhead appeared first on Semiconductor Engineering.

CAM-Based CMOS Implementation Of Reference Frames For Neuromorphic Processors (Carnegie Mellon U.)

A technical paper titled “NeRTCAM: CAM-Based CMOS Implementation of Reference Frames for Neuromorphic Processors” was published by researchers at Carnegie Mellon University.

Abstract:

“Neuromorphic architectures mimicking biological neural networks have been proposed as a much more efficient alternative to conventional von Neumann architectures for the exploding compute demands of AI workloads. Recent neuroscience theory on intelligence suggests that Cortical Columns (CCs) are the fundamental compute units in the neocortex and intelligence arises from CC’s ability to store, predict and infer information via structured Reference Frames (RFs). Based on this theory, recent works have demonstrated brain-like visual object recognition using software simulation. Our work is the first attempt towards direct CMOS implementation of Reference Frames for building CC-based neuromorphic processors. We propose NeRTCAM (Neuromorphic Reverse Ternary Content Addressable Memory), a CAM-based building block that supports the key operations (store, predict, infer) required to perform inference using RFs. NeRTCAM architecture is presented in detail including its key components. All designs are implemented in SystemVerilog and synthesized in 7nm CMOS, and hardware complexity scaling is evaluated for varying storage sizes. NeRTCAM system for biologically motivated MNIST inference with a storage size of 1024 entries incurs just 0.15 mm^2 area, 400 mW power and 9.18 us critical path latency, demonstrating the feasibility of direct CMOS implementation of CAM-based Reference Frames.”

Find the technical paper here. Published May 2024 (preprint).

Nair, Harideep, William Leyman, Agastya Sampath, Quinn Jacobson, and John Paul Shen. “NeRTCAM: CAM-Based CMOS Implementation of Reference Frames for Neuromorphic Processors.” arXiv preprint arXiv:2405.11844 (2024).

Related Reading
Running More Efficient AI/ML Code With Neuromorphic Engines
Once a buzzword, neuromorphic engineering is gaining traction in the semiconductor industry.

The post CAM-Based CMOS Implementation Of Reference Frames For Neuromorphic Processors (Carnegie Mellon U.) appeared first on Semiconductor Engineering.

RISC-V Heralds New Era Of Cooperation

RISC-V is paving the way for open source to become accepted within the hardware community, creating a level of industry collaboration never seen in the past, while revitalizing the connection between academia and industry.

The big question is whether this arrangement is just a placeholder while the industry re-learns how to develop processors, or whether this processor architecture is something very different. In either case, there is a clear and pressing need for more flexible processor architectures, and at least for now, RISC-V has filled a void.

“RISC-V was born out of academia and has had strong collaboration within universities from day one,” says Loren Hobbs, vice president of product and business development at Bluespec. “This collaboration continues today, with many of the most popular open-source RISC-V processors having come from universities. Organizations such as OpenHW Group and CHIPS Alliance serve a central and critical role in driving the collaboration, which is bi-directional between the academic community and industry.”

Collaboration of this type has not existed with the industrial community in the past. “We are learning from each other,” says Florian Wohlrab, CEO at OpenHW. “We are learning best practices for verification. At the same time, we are learning what things to avoid. It is growing where people say, ‘Yes, I really get benefit from sharing ideas.'”

The need for processor flexibility exists within industry as well as academia. “There is a need within the industry for diversification on the processor front,” says Neil Hand, director of marketing at Siemens EDA. “In the past, this led to a fragmented set of companies that couldn’t work together. They didn’t see the value of working together. But RISC-V has a cohesive central organization where anyone who wants to get into the processor space can collaborate. They don’t have to expose their secret sauce, but they get to benefit from each other. A rising tide lifts all boats, and that’s really the situation we’re at with RISC-V.”

Longevity
Whether the industry can build upon this success, or whether it fizzles out over time, remains to be seen. But at least for now, RISC-V’s momentum is growing. “We are at the beginning of a revolution in hardware design,” says OpenHW’s Wohlrab. “We saw the same thing for software when Linux came out 20 or so years ago. No one was really thinking about sharing software or collaboratively developing software. There were some small open-source ventures, but working together on a big project took a long time to develop. Now we are all sharing software, all co-working. But for hardware, we’re just at the beginning of this new concept, and a lot of people need to understand that we can do the same for hardware as we did for software.”

Underlying RISC-V’s success is widespread collaboration. “One of the pillars sustaining the success of RISC-V is customization that works with the ecosystem and leverages a well-defined process,” says Sergio Marchese, vice president of application engineering at SmartDV. “RISC-V vendors face the challenge of showing how their processor customization capabilities serve an application and demonstrating the complete process on real hardware. Without strategic partnerships, RISC-V vendors must walk a much more challenging, time-consuming, and resource-intensive road.”

That framework is what makes it unique. “RISC-V has formed this framework for collaboration, and it fixes everything,” says Siemens’ Hand. “Now, when a university has a really cool idea for memory tagging in a processor design, they don’t have to build the compilers, they don’t have to build the reference platform. They already exist. Maybe a compiler optimization startup has this great idea for handling code optimization. They don’t have to build the rest of the ecosystem. When a processor IP company has this great idea, they can become focused within this bigger picture. That’s the unique nature of it. It’s not just a processor specification.”

Historically, one of the problems associated with open-source hardware was quality, because finding bugs in silicon is expensive. OpenHW is an important piece of the puzzle. “Why should everyone reinvent the wheel by themselves?” asks Wohlrab. “Why can’t we get the basic building blocks, some basic chips, take some design from academia, which has reasonably good quality, and build on them, verify them together. We are verifying with different tools, making sure we get a high coverage, and then everyone can go off and use them in their own chips for mass production, for volume shipment.”

This benefits companies both large and small. “There are several processor vendors that have switched to RISC-V,” says Hand. “Synopsys has moved to RISC-V. Andes has moved to RISC-V. MIPS has moved to RISC-V. Why? Because they can leverage the whole ecosystem. The downside of it is commoditization, which as a customer is really beneficial because you can delay choosing a processor till later in the design flow. Your early decision is to use the Arm ecosystem or RISC-V, and then you can work through it. That creates an interesting set of dynamics. You can start to create new opportunities for companies that develop and deliver IP, because you can benchmark them, swap them in and out, and see which one works. On the flip side, it makes it awful from a lock-in perspective once you’re in that socket.”

Fragmentation
Of course, there will be some friction in the system. “In the early days of RISC-V there was nearly a 1:1 balance between contributors and consumers of the technology,” says Geir Eide, director, product management for Siemens EDA. “Today there are thousands of RISC-V consumers, but only a small percentage of those will be contributors. There is a risk that there will be a disconnect between them. If, for instance, a particular market or regional segment is growing at a higher pace than others, or other market segments and regions are more conservative, they tend to stick to established solutions longer. That increases the risk that it could lead to fragmentation.”

Is that likely to impact development long term? “We do not believe that RISC-V will become regionally concentrated, although though there may be regional concentrations of focus within the broad set of implementation choices provided by RISC-V,” says Bluespec’s Hobbs. “A prime example of this is the Barcelona Supercomputer Center, creating a regional focus area for high-performance computing using RISC-V. However, while there may be regional focus areas, this does not mean that the RISC-V standard is, or will become, fragmented. In fact, one of the key tenets of the creation and foundation of RISC-V was preventing fragmentation of the ISA, and it continues to be a key function of RISC-V international.”

China may be a different story. “A lot of companies in China are creating RISC-V cores for internal consumption — for political reasons mostly,” says John Min, vice president of customer service at Arteris. “I think China will go 100% RISC-V for embedded, but it’s a one-way street. They will keep leveraging what the Western companies do and enhance it. China will continue sucking all advancements, such as vectorization, or the special domain-specific acceleration enhancements. They will create their own and make it their own internally, but they will give nothing back.”

Such splits have occurred in the past. “Design languages are the most recent example of that,” says Hand. “There was a regional split, and you had Europe focus on VHDL while America went with Verilog. With RISC-V, there will be that regional split where people will go off and do their things regionally. Europe has focused projects, India has theirs, but they’re still doing it within this framework. It’s this realization that everyone benefits. They’re not doing it to benefit the other people. They’re doing it ultimately to save themselves effort, to save themselves cost, but they realize that by doing it in that framework it is a net benefit to everyone.”

Bi-directionality
An important element is that everyone benefits, and that has to stretch across the academic/commercial boundary. “RISC-V has propelled a new degree of collaboration between academia and commercial organizations,” says Dave Kelf, CEO at Breker. “It’s noticeable that institutions such as Harvey Mudd College in Claremont, California, and ETH in Zurich, Switzerland, to name two, have produced advanced processor designs as a teaching aid, and have collaborated with multiple companies on their verification and design. This has been further advanced by OpenHW Group, which has made these designs accessible to the industry. This bi-directional collaboration benefits the tool providers to further enhance their offerings working on advanced, open devices, while also enabling academia to improve their designs to a commercial quality level. The virtuous circle created is essential if we are to see RISC-V established as a mainstream, industry-wide capability.”

Academia has a lot to offer in hardware advancement. “Researchers in universities are developing innovative new software and hardware to push the limits of RISC-V innovation,” says Dave Miller, head of corporate communications at SiFive. “Many of the RISC-V projects in academia are focused on optimizing performance and energy efficiency for AI workloads, and are open source so the entire ecosystem can benefit. Researchers are also actively contributing to RISC-V working groups to share their knowledge and collaborate with industry participants. These working groups are split evenly between representatives from APAC, Europe, and North America, all working together towards common goals.”

In many cases, industry is willing to fund such projects. “It makes it easy to have research topics that don’t need to boil the ocean,” says Hand. “If you’re a PhD student and you have a great idea, you can go do it. It’s easy for an industry partner to say, ‘I’ll sponsor that. That’s an interesting thing, and I am not required to allocate ridiculous amounts of money into an open-ended project. It’s like I can see the connection of how that research will go into a commercial product later.'”

This feeds back into academia. “The academics have been jumping on board with OpenHW,” says Wohlrab. “By taking their cores and productizing them, they get a chip back that could be shipped in high volume. Then they can do their research on a real commercial product and can see if their idea would fly in real life. They get real numbers and can see real figures for the benefits of a new branch predictor.”

It can also have a long-term benefit for tools. “There are areas where they want to collaborate with us, especially around security,” says Kiran Vittal, executive director for alliances marketing management at Synopsys. “They are building RISC-V based sub-systems using open-source RISC-V processors, and then academia wants to look at not only the AI part, but the security part. There are post-doc students or PhD students looking into using our tools to verify or to implement whatever they’re doing on security.”

That provides an incentive for EDA to offer better tools for use in universities. “Although there has always been collaboration between universities and the industry, where industry provides the universities with access to EDA tools, IP cores, etc., there’s often a bit of a lag,” says Siemens’ Eide. “In many situations (especially outside of the core area of a particular project), universities have access to older versions of the commercial solutions. If you for instance look at a new grad’s resume, where you in the past would see references to old tech, now you see a lot of references to relatively sophisticated use of RISC-V.”

Moving Forward
This collaboration needs to keep pushing forward. “We had an initiative to create a standardized interface for accelerators,” says Wohlrab. “RISC-V International standardized how to add custom instructions in the ISA, but there was no standard for the hardware interface. So we built this. It was a cool discussion. There were people from Silicon Labs, people from NXP, people from Thales, plus several startups. They all came together and asked, ‘How can we make it future proof and put the accelerators inside?'”

The application space for RISC-V is changing. “The big inflection point is Linux and Android,” says Arteris’ Min. “Android already has some support, but when both Android and Linux are really supported, it will change the mobile apps processor game. The number of designs will proliferate. The number of high-end designs will explode. It will take the whole industry to enable that because RISC-V companies are not big enough to create this by themselves. All the RISC-V companies are partners, because we enable this high-end design at the processor levels.”

That would deepen the software community’s engagement. “An embedded software developer needs to understand the underlying hardware if they want to run Linux on a RISC-V processor that uses custom instructions/ accelerators,” says Bluespec’s Hobbs. “To develop complex embedded hardware/software systems, both embedded software developers and embedded hardware developments must possess contextual understanding of the interoperability of hardware and software. The developer must understand how the customized processor is leveraging the custom instructions in hardware for Linux to efficiently manage and execute the accelerated workloads.”

This collaboration could reinvigorate research into EDA, as well. “With AI you can build predictive models,” says Hand. “Could that be used to identify the change effects from making an extension? What does that mean? There’s a cloud of influence — not directly gate-wise, because that immediately explodes — but perhaps based on test suites. ‘I know that something that touches that logic touches this downstream, which touches the rest of the design.’ That’s where AI plays a big role, and it is one of the interesting areas because in verification there are so many unknowns. When AI comes along, any guidance or any visibility that you can give is incredibly powerful. Even if it is not right 100% of the time, that’s okay, as long as it generates false negatives and not false positives.”

There is a great opportunity for EDA companies. “We collaborate with many of the open-source providers, with OpenHW group, with ETH in Zurich,” says Synopsys’ Vittal. “We want to promote our solutions when it comes to any processor design and you need standard tools like synthesis, place and route, simulation. But then there are also other kinds of unique solutions because RISC-V is so customizable, you can build your own custom instructions. You need something specific to verify these custom instructions and that’s why the Imperas golden models are important. We also collaborated with Bluespec to develop a verification methodology to take you through functional verification and debug.”

There are still some wrinkles to be worked out for customizations. “RISC-V gives us predictability,” says Hand. “We can create a compliance test suite, give you a processor optimization package if you’re on the implementation side. We can create analytics and testing solutions because we know what it’s going to look like. But for non-standard processors, it is effectively as a service, because everyone’s processor is a little bit different. The reason you see a lot of focus on the verification, from platform architecture exploration all the way through, is because if you change one little thing, such as an addressing mode, it impacts pretty much 100% of your processor verification. You’ve got to retest the whole processor. Most people aren’t set up like an Arm or an Intel with huge processor verification teams and the infrastructure, and so they need automation to do it for them.”

Conclusion
RISC-V has enabled the industry to create a framework for collaboration, which enables everyone to work together for selfish reasons. It is a symbiotic relationship that continues to build, and it is creating a wider sphere of influence over time.

“It’s unique in the modern era of semiconductor,” says Hand. “You have such a wide degree of collaboration, where you have processor manufacturers, the software industry leaders, EDA companies, all working on a common infrastructure.”

Related Reading
RISC-V Micro-Architectural Verification
Verifying a processor is much more than making sure the instructions work, but the industry is building from a limited knowledge base and few dedicated tools.
RISC-V Wants All Your Cores
It is not enough to want to dominate the world of CPUs. RISC-V has every core in its sights, and it’s starting to take steps to get there.

The post RISC-V Heralds New Era Of Cooperation appeared first on Semiconductor Engineering.

DRAM Microarchitectures And Their Impacts On Activate-Induced Bitflips Such As RowHammer 

A technical paper titled “DRAMScope: Uncovering DRAM Microarchitecture and Characteristics by Issuing Memory Commands” was published by researchers at Seoul National University and University of Illinois at Urbana-Champaign.

Abstract:

“The demand for precise information on DRAM microarchitectures and error characteristics has surged, driven by the need to explore processing in memory, enhance reliability, and mitigate security vulnerability. Nonetheless, DRAM manufacturers have disclosed only a limited amount of information, making it difficult to find specific information on their DRAM microarchitectures. This paper addresses this gap by presenting more rigorous findings on the microarchitectures of commodity DRAM chips and their impacts on the characteristics of activate-induced bitflips (AIBs), such as RowHammer and RowPress. The previous studies have also attempted to understand the DRAM microarchitectures and associated behaviors, but we have found some of their results to be misled by inaccurate address mapping and internal data swizzling, or lack of a deeper understanding of the modern DRAM cell structure. For accurate and efficient reverse-engineering, we use three tools: AIBs, retention time test, and RowCopy, which can be cross-validated. With these three tools, we first take a macroscopic view of modern DRAM chips to uncover the size, structure, and operation of their subarrays, memory array tiles (MATs), and rows. Then, we analyze AIB characteristics based on the microscopic view of the DRAM microarchitecture, such as 6F^2 cell layout, through which we rectify misunderstandings regarding AIBs and discover a new data pattern that accelerates AIBs. Lastly, based on our findings at both macroscopic and microscopic levels, we identify previously unknown AIB vulnerabilities and propose a simple yet effective protection solution.”

Find the technical paper here. Published May 2024.

Nam, Hwayong, Seungmin Baek, Minbok Wi, Michael Jaemin Kim, Jaehyun Park, Chihun Song, Nam Sung Kim, and Jung Ho Ahn. “DRAMScope: Uncovering DRAM Microarchitecture and Characteristics by Issuing Memory Commands.” arXiv preprint arXiv:2405.02499 (2024).

Related Reading
Securing DRAM Against Evolving Rowhammer Threats
A multi-layered, system-level approach is crucial to DRAM protection.

 

The post DRAM Microarchitectures And Their Impacts On Activate-Induced Bitflips Such As RowHammer  appeared first on Semiconductor Engineering.

Competitive Open-Source EDA Tools

A technical paper titled “Basilisk: Achieving Competitive Performance with Open EDA Tools on an Open-Source Linux-Capable RISC-V SoC” was published by researchers at ETH Zurich and University of Bologna.

Abstract:

“We introduce Basilisk, an optimized application-specific integrated circuit (ASIC) implementation and design flow building on the end-to-end open-source Iguana system-on-chip (SoC). We present enhancements to synthesis tools and logic optimization scripts improving quality of results (QoR), as well as an optimized physical design with an improved power grid and cell placement integration enabling a higher core utilization. The tapeout-ready version of Basilisk implemented in IHP’s open 130 nm technology achieves an operation frequency of 77 MHz (51 logic levels) under typical conditions, a 2.3x improvement compared to the baseline open-source EDA design flow presented in Iguana, and a higher 55% core utilization compared to 50% in the baseline design. Through collaboration with EDA tool developers and domain experts, Basilisk exemplifies a synergistic effort towards competitive open-source electronic design automation (EDA) tools for research and industry applications.”

Find the technical paper here. Published May 2024.

Sauter, Phillippe, Thomas Benz, Paul Scheffler, Zerun Jiang, Beat Muheim, Frank K. Gürkaynak, and Luca Benini. “Basilisk: Achieving Competitive Performance with Open EDA Tools on an Open-Source Linux-Capable RISC-V SoC.” arXiv preprint arXiv:2405.03523 (2024).

Related Reading
EDA Back On Investors’ Radar
Big changes are fueling growth, and it’s showing in EDA revenue, acquisitions, and stock prices.
RISC-V Wants All Your Cores
It is not enough to want to dominate the world of CPUs. RISC-V has every core in its sights, and it’s starting to take steps to get there.

The post Competitive Open-Source EDA Tools appeared first on Semiconductor Engineering.

Merging Power and Arithmetic Optimization Via Datapath Rewriting (Intel, Imperial College London)

A new technical paper titled “Combining Power and Arithmetic Optimization via Datapath Rewriting” was published by researchers at Intel Corporation and Imperial College London.

Abstract:
“Industrial datapath designers consider dynamic power consumption to be a key metric. Arithmetic circuits contribute a major component of total chip power consumption and are therefore a common target for power optimization. While arithmetic circuit area and dynamic power consumption are often correlated, there is also a tradeoff to consider, as additional gates can be added to explicitly reduce arithmetic circuit activity and hence reduce power consumption. In this work, we consider two forms of power optimization and their interaction: circuit area reduction via arithmetic optimization, and the elimination of redundant computations using both data and clock gating. By encoding both these classes of optimization as local rewrites of expressions, our tool flow can simultaneously explore them, uncovering new opportunities for power saving through arithmetic rewrites using the e-graph data structure. Since power consumption is highly dependent upon the workload performed by the circuit, our tool flow facilitates a data dependent design paradigm, where an implementation is automatically tailored to particular contexts of data activity. We develop an automated RTL to RTL optimization framework, ROVER, that takes circuit input stimuli and generates power-efficient architectures. We evaluate the effectiveness on both open-source arithmetic benchmarks and benchmarks derived from Intel production examples. The tool is able to reduce the total power consumption by up to 33.9%.”

Find the technical paper here. Published April 2024.

Samuel Coward, Theo Drane, Emiliano Morini, George Constantinides; arXiv:2404.12336v1.

The post Merging Power and Arithmetic Optimization Via Datapath Rewriting (Intel, Imperial College London) appeared first on Semiconductor Engineering.

Powering The Automotive Revolution: Advanced Packaging For Next-Generation Vehicle Computing

Automotive processors are rapidly adopting advanced process nodes. NXP announced the development of 5 nm automotive processors in 2020 [1], Mobileye announced EyeQ Ultra using 5 nm technology during CES 2022 [2], and TSMC announced its “Auto Early” 3 nm processes in 2023 [3]. In the past, the automotive industry was slow to adopt the latest semiconductor technologies due to reliability concerns and lack of a compelling need. Not anymore.

The use of advanced processes necessitates the use of advanced packaging as seen in high performance computing (HPC) and mobile applications because [4][5]:

  1. While transistor density has skyrocketed, I/O density has not increased proportionally and is holding back chip size reductions.
  2. Processors have heterogeneous, specialized blocks to support today’s workloads.
  3. Maximum chip sizes are limited by the slowdown of transistor scaling, photo reticle limits and lower yields.
  4. Cost per transistor improvements have slowed down with advanced nodes.
  5. Off-package dynamic random-access memory (DRAM) throttles memory bandwidth.

These have been drivers for the use of advanced packages like fan-out in mobile and 2.5D/3D in HPC. In addition, these drivers are slowly but surely showing up in automotive compute units in a variety of automotive architectures as well (see figure 1).

Fig. 1: Vehicle E/E architectures. (Image courtesy of Amkor Technology)

Vehicle electrical/electronic (E/E) architectures have evolved from 100+ distributed electronic control units (ECUs) to 10+ domain control units (DCUs) [6]. The most recent architecture introduces zonal or zone ECUs that are clustered in physical locations in cars and connect to powerful central computing units for processing. These newer architectures improve scalability, cost, and reliability of software-defined vehicles (SDVs) [7]. The processors in each of these architectures are more complex than those in the previous generation.

Multiple cameras, radar, lidar and ultrasonic sensors and more feed data into the compute units. Processing and inferencing this data require specialized functional blocks on the processor. For example, the Tesla Full Self-Driving (FSD) HW 3.0 system on chip (SoC) has central processing units (CPUs), graphic processing units (GPUs), neural network processing units, Low-Power Double Data Rate 4 (LPDDR4) controllers and other functional blocks – all integrated on a single piece of silicon [8]. Similarly, Mobileye EyeQ6 has functional blocks of CPU clusters, accelerator clusters, GPUs and an LPDDR5 interface [9]. As more functional blocks are introduced, the chip size and complexity will continue to increase. Instead of a single, monolithic silicon chip, a chiplet approach with separate functional blocks allows intellectual property (IP) reuse along with optimal process nodes for each functional block [10]. Additionally, large, monolithic pieces of silicon built on advanced processes tend to have yield challenges, which can also be overcome using chiplets.

Current advanced driver-assistance systems (ADAS) applications require a DRAM bandwidth of less than 60GB/s, which can be supported with standard double data rate (DDR) and LPDDR solutions. However, ADAS Level 4 and Level 5 will need up to 1024 GB/s memory bandwidth, which will require the use of solutions such as Graphic DDR (GDDR) or High Bandwidth Memory (HBM) [11][12].

Fig. 2: Automotive compute package roadmap. (Image courtesy of Amkor Technology)

Automotive processors have been using Flip Chip BGA (FCBGA) packages since 2010. FCBGA has become the mainstay of several automotive SoCs, such as EyeQ from Mobileye, Tesla FSD and NVIDIA Drive. Consumer applications of FCBGA packaging started around 1995 [13], so it took more than 15 years for this package to be adopted by the automotive industry. Computing units in the form of multichip modules (MCMs) or System-in-Package (SiP) have also been in automotive use since the early 2010s for infotainment processors. The use of MCMs is likely to increase in automotive compute to enable components like the SoC, DRAM and power management integrated circuit (PMIC) to communicate with each other without sending signals off-package.

As cars move to a central computing architecture, the SoCs will become more complex and run into size and cost challenges. Splitting these SoCs into chiplets becomes a logical solution and packaging these chiplets using fan-out or 2.5D packages becomes necessary. Just as FCBGA and MCMs transitioned into automotive from non-automotive applications, so will fan-out and 2.5D packaging for automotive compute processors (see figure 2). The automotive industry is cautious but the abovementioned architecture changes are pushing faster adoption of advanced packages. Materials, processes, and factory controls are key considerations for successful qualification of these packages in automotive compute applications.

In summary, the automotive industry is adopting advanced semiconductor technologies, such as 5 nm and 3 nm processes, which require the use of advanced packaging due to limitations in I/O density, chip size reductions, and memory bandwidth. Processors in the latest vehicle E/E architectures are more complex and require specialized functional blocks to process data from multiple sensors. As cars move to the central computing architecture, the SoCs will become more complex and run into size and cost challenges. Splitting these SoCs into chiplets becomes a logical solution and packaging these chiplets using fan-out or 2.5D technology becomes necessary.

Sources

  1. NXP. “NXP Selects TSMC 5nm Process for Next-Generation High-Performance Automotive Platform.” NXP, https://www.nxp.com/company/about-nxp/nxp-selects-tsmc-5nm-process-for-next-generation-high-performance-automotive-platform:NW-TSMC-5NM-HIGH-PERFORMANCE.
  2. Mobileye. “Mobileye at CES 2022.” Mobileye, https://www.mobileye.com/news/mobileye-ces-2022-tech-news/.
  3. Business Wire. “TSMC Showcases New Technology Developments at 2023 Technology Symposium.” Business Wire, https://www.businesswire.com/news/home/20230426005359/en/TSMC-Showcases-New-Technology-Developments-at-2023-Technology-Symposium.
  4. Swaminathan, Raja. “Advanced Packaging: Enabling Moore’s Law’s Next Frontier Through Heterogeneous Integration.” HotChips33, https://hc33.hotchips.org/assets/program/tutorials/2021%20Hot%20Chips%20AMD%20Advanced%20Packaging%20Swaminathan%20Final%20%2020210820.pdf
  5. SemiAnalysis. “Advanced Packaging Part 1” SemiAnalysis, https://www.semianalysis.com/p/advanced-packaging-part-1-pad-limited?utm_source=%2Fsearch%2Fadvanced%2520packaging&utm_medium=reader2.
  6. McKinsey & Company. “Getting Ready for Next-Generation EE Architecture with Zonal Compute.” McKinsey & Company, https://www.mckinsey.com/industries/semiconductors/our-insights/getting-ready-for-next-generation-ee-architecture-with-zonal-compute.
  7. NXP. “How Zonal E/E Architectures with Ethernet are Enabling Software-Defined Vehicles.” NXP, https://www.nxp.com/company/blog/how-zonal-e-e-architectures-with-ethernet-are-enabling-software-defined-vehicles:BL-HOW-ZONAL-EE-ARCHITECTURES.
  8. WikiChip. “Tesla (Car Company)/FSD Chip.” WikiChip, https://en.wikichip.org/wiki/tesla_(car_company)/fsd_chip.
  9. Mobileye. “EyeQ Chip.” Mobileye, https://www.mobileye.com/technology/eyeq-chip/.
  10. Ziadeh, Bassam. “Driving Adoption of Advanced IC Packaging in Automotive Applications.” Presentation at IMAPS DPC, March 2023. General Motors, Fountain Hills AZ, March 16, 2023.
  11. K Matthias Jung and Norbert Wehn. “Driving Against the Memory Wall: The Role of Memory for Autonomous Driving.” Fraunhofer IESE, Kaiserslautern, Germany, and Microelectronic Systems Design Research Group, University of Kaiserslautern, Kaiserslautern, Germany. Kluedo, https://kluedo.ub.rptu.de/frontdoor/deliver/index/docId/5286/file/_memory.pdf.
  12. Micron. “Cinco de Play: Memory – Is That Critical to Autonomous Driving?” Micron, https://www.micron.com/about/blog/2017/october/cinco-play-memory-is-that-critical-to-autonomous-driving.
  13. McKinsey & Company. “Advanced Chip Packaging: How Manufacturers Can Play to Win.” McKinsey & Company, https://www.mckinsey.com/industries/semiconductors/our-insights/advanced-chip-packaging-how-manufacturers-can-play-to-win.

The post Powering The Automotive Revolution: Advanced Packaging For Next-Generation Vehicle Computing appeared first on Semiconductor Engineering.

Merging Power and Arithmetic Optimization Via Datapath Rewriting (Intel, Imperial College London)

A new technical paper titled “Combining Power and Arithmetic Optimization via Datapath Rewriting” was published by researchers at Intel Corporation and Imperial College London.

Abstract:
“Industrial datapath designers consider dynamic power consumption to be a key metric. Arithmetic circuits contribute a major component of total chip power consumption and are therefore a common target for power optimization. While arithmetic circuit area and dynamic power consumption are often correlated, there is also a tradeoff to consider, as additional gates can be added to explicitly reduce arithmetic circuit activity and hence reduce power consumption. In this work, we consider two forms of power optimization and their interaction: circuit area reduction via arithmetic optimization, and the elimination of redundant computations using both data and clock gating. By encoding both these classes of optimization as local rewrites of expressions, our tool flow can simultaneously explore them, uncovering new opportunities for power saving through arithmetic rewrites using the e-graph data structure. Since power consumption is highly dependent upon the workload performed by the circuit, our tool flow facilitates a data dependent design paradigm, where an implementation is automatically tailored to particular contexts of data activity. We develop an automated RTL to RTL optimization framework, ROVER, that takes circuit input stimuli and generates power-efficient architectures. We evaluate the effectiveness on both open-source arithmetic benchmarks and benchmarks derived from Intel production examples. The tool is able to reduce the total power consumption by up to 33.9%.”

Find the technical paper here. Published April 2024.

Samuel Coward, Theo Drane, Emiliano Morini, George Constantinides; arXiv:2404.12336v1.

The post Merging Power and Arithmetic Optimization Via Datapath Rewriting (Intel, Imperial College London) appeared first on Semiconductor Engineering.

Powering The Automotive Revolution: Advanced Packaging For Next-Generation Vehicle Computing

Automotive processors are rapidly adopting advanced process nodes. NXP announced the development of 5 nm automotive processors in 2020 [1], Mobileye announced EyeQ Ultra using 5 nm technology during CES 2022 [2], and TSMC announced its “Auto Early” 3 nm processes in 2023 [3]. In the past, the automotive industry was slow to adopt the latest semiconductor technologies due to reliability concerns and lack of a compelling need. Not anymore.

The use of advanced processes necessitates the use of advanced packaging as seen in high performance computing (HPC) and mobile applications because [4][5]:

  1. While transistor density has skyrocketed, I/O density has not increased proportionally and is holding back chip size reductions.
  2. Processors have heterogeneous, specialized blocks to support today’s workloads.
  3. Maximum chip sizes are limited by the slowdown of transistor scaling, photo reticle limits and lower yields.
  4. Cost per transistor improvements have slowed down with advanced nodes.
  5. Off-package dynamic random-access memory (DRAM) throttles memory bandwidth.

These have been drivers for the use of advanced packages like fan-out in mobile and 2.5D/3D in HPC. In addition, these drivers are slowly but surely showing up in automotive compute units in a variety of automotive architectures as well (see figure 1).

Fig. 1: Vehicle E/E architectures. (Image courtesy of Amkor Technology)

Vehicle electrical/electronic (E/E) architectures have evolved from 100+ distributed electronic control units (ECUs) to 10+ domain control units (DCUs) [6]. The most recent architecture introduces zonal or zone ECUs that are clustered in physical locations in cars and connect to powerful central computing units for processing. These newer architectures improve scalability, cost, and reliability of software-defined vehicles (SDVs) [7]. The processors in each of these architectures are more complex than those in the previous generation.

Multiple cameras, radar, lidar and ultrasonic sensors and more feed data into the compute units. Processing and inferencing this data require specialized functional blocks on the processor. For example, the Tesla Full Self-Driving (FSD) HW 3.0 system on chip (SoC) has central processing units (CPUs), graphic processing units (GPUs), neural network processing units, Low-Power Double Data Rate 4 (LPDDR4) controllers and other functional blocks – all integrated on a single piece of silicon [8]. Similarly, Mobileye EyeQ6 has functional blocks of CPU clusters, accelerator clusters, GPUs and an LPDDR5 interface [9]. As more functional blocks are introduced, the chip size and complexity will continue to increase. Instead of a single, monolithic silicon chip, a chiplet approach with separate functional blocks allows intellectual property (IP) reuse along with optimal process nodes for each functional block [10]. Additionally, large, monolithic pieces of silicon built on advanced processes tend to have yield challenges, which can also be overcome using chiplets.

Current advanced driver-assistance systems (ADAS) applications require a DRAM bandwidth of less than 60GB/s, which can be supported with standard double data rate (DDR) and LPDDR solutions. However, ADAS Level 4 and Level 5 will need up to 1024 GB/s memory bandwidth, which will require the use of solutions such as Graphic DDR (GDDR) or High Bandwidth Memory (HBM) [11][12].

Fig. 2: Automotive compute package roadmap. (Image courtesy of Amkor Technology)

Automotive processors have been using Flip Chip BGA (FCBGA) packages since 2010. FCBGA has become the mainstay of several automotive SoCs, such as EyeQ from Mobileye, Tesla FSD and NVIDIA Drive. Consumer applications of FCBGA packaging started around 1995 [13], so it took more than 15 years for this package to be adopted by the automotive industry. Computing units in the form of multichip modules (MCMs) or System-in-Package (SiP) have also been in automotive use since the early 2010s for infotainment processors. The use of MCMs is likely to increase in automotive compute to enable components like the SoC, DRAM and power management integrated circuit (PMIC) to communicate with each other without sending signals off-package.

As cars move to a central computing architecture, the SoCs will become more complex and run into size and cost challenges. Splitting these SoCs into chiplets becomes a logical solution and packaging these chiplets using fan-out or 2.5D packages becomes necessary. Just as FCBGA and MCMs transitioned into automotive from non-automotive applications, so will fan-out and 2.5D packaging for automotive compute processors (see figure 2). The automotive industry is cautious but the abovementioned architecture changes are pushing faster adoption of advanced packages. Materials, processes, and factory controls are key considerations for successful qualification of these packages in automotive compute applications.

In summary, the automotive industry is adopting advanced semiconductor technologies, such as 5 nm and 3 nm processes, which require the use of advanced packaging due to limitations in I/O density, chip size reductions, and memory bandwidth. Processors in the latest vehicle E/E architectures are more complex and require specialized functional blocks to process data from multiple sensors. As cars move to the central computing architecture, the SoCs will become more complex and run into size and cost challenges. Splitting these SoCs into chiplets becomes a logical solution and packaging these chiplets using fan-out or 2.5D technology becomes necessary.

Sources

  1. NXP. “NXP Selects TSMC 5nm Process for Next-Generation High-Performance Automotive Platform.” NXP, https://www.nxp.com/company/about-nxp/nxp-selects-tsmc-5nm-process-for-next-generation-high-performance-automotive-platform:NW-TSMC-5NM-HIGH-PERFORMANCE.
  2. Mobileye. “Mobileye at CES 2022.” Mobileye, https://www.mobileye.com/news/mobileye-ces-2022-tech-news/.
  3. Business Wire. “TSMC Showcases New Technology Developments at 2023 Technology Symposium.” Business Wire, https://www.businesswire.com/news/home/20230426005359/en/TSMC-Showcases-New-Technology-Developments-at-2023-Technology-Symposium.
  4. Swaminathan, Raja. “Advanced Packaging: Enabling Moore’s Law’s Next Frontier Through Heterogeneous Integration.” HotChips33, https://hc33.hotchips.org/assets/program/tutorials/2021%20Hot%20Chips%20AMD%20Advanced%20Packaging%20Swaminathan%20Final%20%2020210820.pdf
  5. SemiAnalysis. “Advanced Packaging Part 1” SemiAnalysis, https://www.semianalysis.com/p/advanced-packaging-part-1-pad-limited?utm_source=%2Fsearch%2Fadvanced%2520packaging&utm_medium=reader2.
  6. McKinsey & Company. “Getting Ready for Next-Generation EE Architecture with Zonal Compute.” McKinsey & Company, https://www.mckinsey.com/industries/semiconductors/our-insights/getting-ready-for-next-generation-ee-architecture-with-zonal-compute.
  7. NXP. “How Zonal E/E Architectures with Ethernet are Enabling Software-Defined Vehicles.” NXP, https://www.nxp.com/company/blog/how-zonal-e-e-architectures-with-ethernet-are-enabling-software-defined-vehicles:BL-HOW-ZONAL-EE-ARCHITECTURES.
  8. WikiChip. “Tesla (Car Company)/FSD Chip.” WikiChip, https://en.wikichip.org/wiki/tesla_(car_company)/fsd_chip.
  9. Mobileye. “EyeQ Chip.” Mobileye, https://www.mobileye.com/technology/eyeq-chip/.
  10. Ziadeh, Bassam. “Driving Adoption of Advanced IC Packaging in Automotive Applications.” Presentation at IMAPS DPC, March 2023. General Motors, Fountain Hills AZ, March 16, 2023.
  11. K Matthias Jung and Norbert Wehn. “Driving Against the Memory Wall: The Role of Memory for Autonomous Driving.” Fraunhofer IESE, Kaiserslautern, Germany, and Microelectronic Systems Design Research Group, University of Kaiserslautern, Kaiserslautern, Germany. Kluedo, https://kluedo.ub.rptu.de/frontdoor/deliver/index/docId/5286/file/_memory.pdf.
  12. Micron. “Cinco de Play: Memory – Is That Critical to Autonomous Driving?” Micron, https://www.micron.com/about/blog/2017/october/cinco-play-memory-is-that-critical-to-autonomous-driving.
  13. McKinsey & Company. “Advanced Chip Packaging: How Manufacturers Can Play to Win.” McKinsey & Company, https://www.mckinsey.com/industries/semiconductors/our-insights/advanced-chip-packaging-how-manufacturers-can-play-to-win.

The post Powering The Automotive Revolution: Advanced Packaging For Next-Generation Vehicle Computing appeared first on Semiconductor Engineering.

White-Box Fuzzer With Static Analysis To Detect And Locate Timing Vulnerabilities In RISC-V Processors 

A technical paper titled “WhisperFuzz: White-Box Fuzzing for Detecting and Locating Timing Vulnerabilities in Processors” was published by researchers at Indian Institute of Technology Madras, Texas A&M University, and
Technische Universität Darmstadt.

Abstract:

“Timing vulnerabilities in processors have emerged as a potent threat. As processors are the foundation of any computing system, identifying these flaws is imperative. Recently fuzzing techniques, traditionally used for detecting software vulnerabilities, have shown promising results for uncovering vulnerabilities in large-scale hardware designs, such as processors. Researchers have adapted black-box or grey-box fuzzing to detect timing vulnerabilities in processors. However, they cannot identify the locations or root causes of these timing vulnerabilities, nor do they provide coverage feedback to enable the designer’s confidence in the processor’s security.
To address the deficiencies of the existing fuzzers, we present WhisperFuzz–the first white-box fuzzer with static analysis–aiming to detect and locate timing vulnerabilities in processors and evaluate the coverage of microarchitectural timing behaviors. WhisperFuzz uses the fundamental nature of processors’ timing behaviors, microarchitectural state transitions, to localize timing vulnerabilities. WhisperFuzz automatically extracts microarchitectural state transitions from a processor design at the register-transfer level (RTL) and instruments the design to monitor the state transitions as coverage. Moreover, WhisperFuzz measures the time a design-under-test (DUT) takes to process tests, identifying any minor, abnormal variations that may hint at a timing vulnerability. WhisperFuzz detects 12 new timing vulnerabilities across advanced open-sourced RISC-V processors: BOOM, Rocket Core, and CVA6. Eight of these violate the zero latency requirements of the Zkt extension and are considered serious security vulnerabilities. Moreover, WhisperFuzz also pinpoints the locations of the new and the existing vulnerabilities.”

Find the technical paper here. Published February 2024 (preprint).

Borkar, Pallavi, Chen Chen, Mohamadreza Rostami, Nikhilesh Singh, Rahul Kande, Ahmad-Reza Sadeghi, Chester Rebeiro, and Jeyavijayan Rajendran. “WhisperFuzz: White-Box Fuzzing for Detecting and Locating Timing Vulnerabilities in Processors.” arXiv preprint arXiv:2402.03704 (2024).

Related Reading
RISC-V Micro-Architectural Verification
Verifying a processor is much more than making sure the instructions work, but the industry is building from a limited knowledge base and few dedicated tools.
What’s Required To Secure Chips
There is no single solution, and the most comprehensive security may be too expensive.

The post White-Box Fuzzer With Static Analysis To Detect And Locate Timing Vulnerabilities In RISC-V Processors  appeared first on Semiconductor Engineering.

Ultra-Low Power CiM Design For Practical Edge Scenarios

A technical paper titled “Low Power and Temperature-Resilient Compute-In-Memory Based on Subthreshold-FeFET” was published by researchers at Zhejiang University, University of Notre Dame, Technical University of Munich, Munich Institute of Robotics and Machine Intelligence, and the Laboratory of Collaborative Sensing and Autonomous Unmanned Systems of Zhejiang Province.

Abstract:

“Compute-in-memory (CiM) is a promising solution for addressing the challenges of artificial intelligence (AI) and the Internet of Things (IoT) hardware such as ‘memory wall’ issue. Specifically, CiM employing nonvolatile memory (NVM) devices in a crossbar structure can efficiently accelerate multiply-accumulation (MAC) computation, a crucial operator in neural networks among various AI models. Low power CiM designs are thus highly desired for further energy efficiency optimization on AI models. Ferroelectric FET (FeFET), an emerging device, is attractive for building ultra-low power CiM array due to CMOS compatibility, high ION /IOF  ratio, etc. Recent studies have explored FeFET based CiM designs that achieve low power consumption. Nevertheless, subthreshold-operated FeFETs, where the operating voltages are scaled down to the subthreshold region to reduce array power consumption, are particularly vulnerable to temperature drift, leading to accuracy degradation. To address this challenge, we propose a temperature-resilient 2T-1FeFET CiM design that performs MAC operations reliably at subthreahold region from 0 to 85 Celsius, while consuming ultra-low power. Benchmarked against the VGG neural network architecture running the CIFAR-10 dataset, the proposed 2T-1FeFET CiM design achieves 89.45% CIFAR-10 test accuracy. Compared to previous FeFET based CiM designs, it exhibits immunity to temperature drift at an 8-bit wordlength scale, and achieves better energy efficiency with 2866 TOPS/W.”

Find the technical paper here. Published January 2024 (preprint).

Zhou, Yifei, Xuchu Huang, Jianyi Yang, Kai Ni, Hussam Amrouch, Cheng Zhuo, and Xunxhao Yin. “Low Power and Temperature-Resilient Compute-In-Memory Based on Subthreshold-FeFET.” arXiv preprint arXiv:2312.17442 (2023).

Related Reading
Increasing AI Energy Efficiency With Compute In Memory
How to process zettascale workloads and stay within a fixed power budget.
Modeling Compute In Memory With Biological Efficiency
Generative AI forces chipmakers to use compute resources more intelligently.

The post Ultra-Low Power CiM Design For Practical Edge Scenarios appeared first on Semiconductor Engineering.

❌