FreshRSS

Zobrazení pro čtení

Jsou dostupné nové články, klikněte pro obnovení stránky.

Heterogeneity Of 3DICs As A Security Vulnerability

A new technical paper titled “Harnessing Heterogeneity for Targeted Attacks on 3-D ICs” was published by Drexel University.

Abstract
“As 3-D integrated circuits (ICs) increasingly pervade the microelectronics industry, the integration of heterogeneous components presents a unique challenge from a security perspective. To this end, an attack on a victim die of a multi-tiered heterogeneous 3-D IC is proposed and evaluated. By utilizing on-chip inductive circuits and transistors with low voltage threshold (LVT), a die based on CMOS technology is proposed that includes a sensor to monitor the electromagnetic (EM) emissions from the normal function of a victim die, without requiring physical probing. The adversarial circuit is self-powered through the use of thermocouples that supply the generated current to circuits that sense EM emissions. Therefore, the integration of disparate technologies in a single 3-D circuit allows for a stealthy, wireless, and non-invasive side-channel attack. A thin-film thermo-electric generator (TEG) is developed that produces a 115 mV voltage source, which is amplified 5 × through a voltage booster to provide power to the adversarial circuit. An on-chip inductor is also developed as a component of a sensing array, which detects changes to the magnetic field induced by the computational activity of the victim die. In addition, the challenges associated with detecting and mitigating such attacks are discussed, highlighting the limitations of existing security mechanisms in addressing the multifaceted nature of vulnerabilities due to the heterogeneity of 3-D ICs.”

Find the technical paper here. Published June 2024.

Alec Aversa and Ioannis Savidis. 2024. Harnessing Heterogeneity for Targeted Attacks on 3-D ICs. In Proceedings of the Great Lakes Symposium on VLSI 2024 (GLSVLSI ’24). Association for Computing Machinery, New York, NY, USA, 246–251. https://doi.org/10.1145/3649476.3660385.

The post Heterogeneity Of 3DICs As A Security Vulnerability appeared first on Semiconductor Engineering.

A Hybrid ECO Detailed Placement Flow for Mitigating Dynamic IR Drop (UC San Diego)

A new technical paper titled “A Hybrid ECO Detailed Placement Flow for Improved Reduction of Dynamic IR Drop” was published by researchers at UC San Diego.

Abstract:

“With advanced semiconductor technology progressing well into sub-7nm scale, voltage drop has become an increasingly challenging issue. As a result, there has been extensive research focused on predicting and mitigating dynamic IR drops, leading to the development of IR drop engineering change order (ECO) flows – often integrated with modern commercial EDA tools. However, these tools encounter QoR limitations while mitigating IR drop. To address this, we propose a hybrid ECO detailed placement approach that is integrated with existing commercial EDA flows, to mitigate excessive peak current demands within power and ground rails. Our proposed hybrid approach effectively optimizes peak current levels within a specified “clip”– complementing and enhancing commercial EDA dynamic IR-driven ECO detailed placements. In particular, we: (i) order instances in a netlist in decreasing order of worst voltage drop; (ii) extract a clip around each instance; and (iii) solve an integer linear programming (ILP) problem to optimize instance placements. Our approach optimizes dynamic voltage drops (DVD) across ten designs by up to 15.3% compared to original conventional flows, with similar timing quality and 55.1% less runtime.”

Find the technical paper here. Published June 2024.

Andrew B. Kahng, Bodhisatta Pramanik, and Mingyu Woo. 2024. A Hybrid ECO Detailed Placement Flow for Improved Reduction of Dynamic IR Drop. In Proceedings of the Great Lakes Symposium on VLSI 2024 (GLSVLSI ’24). Association for Computing Machinery, New York, NY, USA, 390–396. https://doi.org/10.1145/3649476.3658727.

The post A Hybrid ECO Detailed Placement Flow for Mitigating Dynamic IR Drop (UC San Diego) appeared first on Semiconductor Engineering.

Survey of Energy Efficient PIM Processors

A new technical paper titled “Survey of Deep Learning Accelerators for Edge and Emerging Computing” was published by researchers at University of Dayton and the Air Force Research Laboratory.

Abstract

“The unprecedented progress in artificial intelligence (AI), particularly in deep learning algorithms with ubiquitous internet connected smart devices, has created a high demand for AI computing on the edge devices. This review studied commercially available edge processors, and the processors that are still in industrial research stages. We categorized state-of-the-art edge processors based on the underlying architecture, such as dataflow, neuromorphic, and processing in-memory (PIM) architecture. The processors are analyzed based on their performance, chip area, energy efficiency, and application domains. The supported programming frameworks, model compression, data precision, and the CMOS fabrication process technology are discussed. Currently, most commercial edge processors utilize dataflow architectures. However, emerging non-von Neumann computing architectures have attracted the attention of the industry in recent years. Neuromorphic processors are highly efficient for performing computation with fewer synaptic operations, and several neuromorphic processors offer online training for secured and personalized AI applications. This review found that the PIM processors show significant energy efficiency and consume less power compared to dataflow and neuromorphic processors. A future direction of the industry could be to implement state-of-the-art deep learning algorithms in emerging non-von Neumann computing paradigms for low-power computing on edge devices.”

Find the technical paper here. Published July 2024.

Alam, Shahanur, Chris Yakopcic, Qing Wu, Mark Barnell, Simon Khan, and Tarek M. Taha. 2024. “Survey of Deep Learning Accelerators for Edge and Emerging Computing” Electronics 13, no. 15: 2988. https://doi.org/10.3390/electronics13152988.

The post Survey of Energy Efficient PIM Processors appeared first on Semiconductor Engineering.

Heterogeneity Of 3DICs As A Security Vulnerability

A new technical paper titled “Harnessing Heterogeneity for Targeted Attacks on 3-D ICs” was published by Drexel University.

Abstract
“As 3-D integrated circuits (ICs) increasingly pervade the microelectronics industry, the integration of heterogeneous components presents a unique challenge from a security perspective. To this end, an attack on a victim die of a multi-tiered heterogeneous 3-D IC is proposed and evaluated. By utilizing on-chip inductive circuits and transistors with low voltage threshold (LVT), a die based on CMOS technology is proposed that includes a sensor to monitor the electromagnetic (EM) emissions from the normal function of a victim die, without requiring physical probing. The adversarial circuit is self-powered through the use of thermocouples that supply the generated current to circuits that sense EM emissions. Therefore, the integration of disparate technologies in a single 3-D circuit allows for a stealthy, wireless, and non-invasive side-channel attack. A thin-film thermo-electric generator (TEG) is developed that produces a 115 mV voltage source, which is amplified 5 × through a voltage booster to provide power to the adversarial circuit. An on-chip inductor is also developed as a component of a sensing array, which detects changes to the magnetic field induced by the computational activity of the victim die. In addition, the challenges associated with detecting and mitigating such attacks are discussed, highlighting the limitations of existing security mechanisms in addressing the multifaceted nature of vulnerabilities due to the heterogeneity of 3-D ICs.”

Find the technical paper here. Published June 2024.

Alec Aversa and Ioannis Savidis. 2024. Harnessing Heterogeneity for Targeted Attacks on 3-D ICs. In Proceedings of the Great Lakes Symposium on VLSI 2024 (GLSVLSI ’24). Association for Computing Machinery, New York, NY, USA, 246–251. https://doi.org/10.1145/3649476.3660385.

The post Heterogeneity Of 3DICs As A Security Vulnerability appeared first on Semiconductor Engineering.

A Hybrid ECO Detailed Placement Flow for Mitigating Dynamic IR Drop (UC San Diego)

A new technical paper titled “A Hybrid ECO Detailed Placement Flow for Improved Reduction of Dynamic IR Drop” was published by researchers at UC San Diego.

Abstract:

“With advanced semiconductor technology progressing well into sub-7nm scale, voltage drop has become an increasingly challenging issue. As a result, there has been extensive research focused on predicting and mitigating dynamic IR drops, leading to the development of IR drop engineering change order (ECO) flows – often integrated with modern commercial EDA tools. However, these tools encounter QoR limitations while mitigating IR drop. To address this, we propose a hybrid ECO detailed placement approach that is integrated with existing commercial EDA flows, to mitigate excessive peak current demands within power and ground rails. Our proposed hybrid approach effectively optimizes peak current levels within a specified “clip”– complementing and enhancing commercial EDA dynamic IR-driven ECO detailed placements. In particular, we: (i) order instances in a netlist in decreasing order of worst voltage drop; (ii) extract a clip around each instance; and (iii) solve an integer linear programming (ILP) problem to optimize instance placements. Our approach optimizes dynamic voltage drops (DVD) across ten designs by up to 15.3% compared to original conventional flows, with similar timing quality and 55.1% less runtime.”

Find the technical paper here. Published June 2024.

Andrew B. Kahng, Bodhisatta Pramanik, and Mingyu Woo. 2024. A Hybrid ECO Detailed Placement Flow for Improved Reduction of Dynamic IR Drop. In Proceedings of the Great Lakes Symposium on VLSI 2024 (GLSVLSI ’24). Association for Computing Machinery, New York, NY, USA, 390–396. https://doi.org/10.1145/3649476.3658727.

The post A Hybrid ECO Detailed Placement Flow for Mitigating Dynamic IR Drop (UC San Diego) appeared first on Semiconductor Engineering.

Survey of Energy Efficient PIM Processors

A new technical paper titled “Survey of Deep Learning Accelerators for Edge and Emerging Computing” was published by researchers at University of Dayton and the Air Force Research Laboratory.

Abstract

“The unprecedented progress in artificial intelligence (AI), particularly in deep learning algorithms with ubiquitous internet connected smart devices, has created a high demand for AI computing on the edge devices. This review studied commercially available edge processors, and the processors that are still in industrial research stages. We categorized state-of-the-art edge processors based on the underlying architecture, such as dataflow, neuromorphic, and processing in-memory (PIM) architecture. The processors are analyzed based on their performance, chip area, energy efficiency, and application domains. The supported programming frameworks, model compression, data precision, and the CMOS fabrication process technology are discussed. Currently, most commercial edge processors utilize dataflow architectures. However, emerging non-von Neumann computing architectures have attracted the attention of the industry in recent years. Neuromorphic processors are highly efficient for performing computation with fewer synaptic operations, and several neuromorphic processors offer online training for secured and personalized AI applications. This review found that the PIM processors show significant energy efficiency and consume less power compared to dataflow and neuromorphic processors. A future direction of the industry could be to implement state-of-the-art deep learning algorithms in emerging non-von Neumann computing paradigms for low-power computing on edge devices.”

Find the technical paper here. Published July 2024.

Alam, Shahanur, Chris Yakopcic, Qing Wu, Mark Barnell, Simon Khan, and Tarek M. Taha. 2024. “Survey of Deep Learning Accelerators for Edge and Emerging Computing” Electronics 13, no. 15: 2988. https://doi.org/10.3390/electronics13152988.

The post Survey of Energy Efficient PIM Processors appeared first on Semiconductor Engineering.

MTJ-Based CRAM Array

A new technical paper titled “Experimental demonstration of magnetic tunnel junction-based computational random-access memory” was published by researchers at University of Minnesota and University of Arizona, Tucson.

Abstract

“The conventional computing paradigm struggles to fulfill the rapidly growing demands from emerging applications, especially those for machine intelligence because much of the power and energy is consumed by constant data transfers between logic and memory modules. A new paradigm, called “computational random-access memory (CRAM),” has emerged to address this fundamental limitation. CRAM performs logic operations directly using the memory cells themselves, without having the data ever leave the memory. The energy and performance benefits of CRAM for both conventional and emerging applications have been well established by prior numerical studies. However, there is a lack of experimental demonstration and study of CRAM to evaluate its computational accuracy, which is a realistic and application-critical metric for its technological feasibility and competitiveness. In this work, a CRAM array based on magnetic tunnel junctions (MTJs) is experimentally demonstrated. First, basic memory operations, as well as 2-, 3-, and 5-input logic operations, are studied. Then, a 1-bit full adder with two different designs is demonstrated. Based on the experimental results, a suite of models has been developed to characterize the accuracy of CRAM computation. Scalar addition, multiplication, and matrix multiplication, which are essential building blocks for many conventional and machine intelligence applications, are evaluated and show promising accuracy performance. With the confirmation of MTJ-based CRAM’s accuracy, there is a strong case that this technology will have a significant impact on power- and energy-demanding applications of machine intelligence.”

Find the technical paper here. Published July 2024.  Find the University of Minnesota’s news release here.

Lv, Y., Zink, B.R., Bloom, R.P. et al. Experimental demonstration of magnetic tunnel junction-based computational random-access memory. npj Unconv. Comput. 1, 3 (2024). https://doi.org/10.1038/s44335-024-00003-3.

The post MTJ-Based CRAM Array appeared first on Semiconductor Engineering.

ML Method To Predict IR Drop Levels

A new technical paper titled “IR drop Prediction Based on Machine Learning and Pattern Reduction” was published by researchers at National Tsing Hua University, National Taiwan University of Science and Technology, and MediaTek.

Abstract (partial)
“In this paper, we propose a machine learning-based method to predict IR drop levels and present an algorithm for reducing simulation patterns, which could reduce the time and computing resources required for IR drop analysis within the ECO flow. Experimental results show that our approach can reduce the number of patterns by approximately 50%, thereby decreasing the analysis time while maintaining accuracy.”

Find the technical paper here. Published June 2024.

Yong-Fong Chang, Yung-Chih Chen, Yu-Chen Cheng, Shu-Hong Lin, Che-Hsu Lin, Chun-Yuan Chen, Yu-Hsuan Chen, Yu-Che Lee, Jia-Wei Lin, Hsun-Wei Pao, Shih-Chieh Chang, Yi-Ting Li, and Chun-Yao Wang. 2024. IR drop Prediction Based on Machine Learning and Pattern Reduction. In Proceedings of the Great Lakes Symposium on VLSI 2024 (GLSVLSI ’24). Association for Computing Machinery, New York, NY, USA, 516–519. https://doi.org/10.1145/3649476.3658775

The post ML Method To Predict IR Drop Levels appeared first on Semiconductor Engineering.

Power Electronic Packaging for Discrete Dies

A technical paper titled “Substrate Embedded Power Electronics Packaging for Silicon Carbide MOSFETs” was published by researchers at University of Cambridge, University of Warwick, Chongqing University, and SpaceX.

Abstract:

“This paper proposes a new power electronic packaging for discrete dies, namely Standard Cell which consists of a step-etched active metal brazed (AMB) substrate and a flexible printed circuit board (flex-PCB). The standard cell exhibits high thermal conductivity, complete electrical insulation, and low stray inductance, thereby enhancing the performance of SiC MOSFET devices. The standard cell has a stray power loop inductance of less than 1 nH and a gate loop inductance of less than 1.5 nH . The standard cell has a flat body with surface-mounting electrical connections on one side and direct thermal connections on the other. The use of flex-PCB die interconnection enables maximum utilization of source pads while providing a flexible gate-source connection and the converter PCB. This paper presents the design concept of the standard cell and experimentally validates its effectiveness in a converter system.”

Find the technical paper here. Published May 2024.

A. Janabi et al., “Substrate Embedded Power Electronics Packaging for Silicon Carbide MOSFETs,” in IEEE Transactions on Power Electronics, doi: 10.1109/TPEL.2024.3396779.

Related Reading
Big Shifts In Power Electronics Packaging
Packages are becoming more complex to endure high power, high temperature conditions across a variety of applications.
Power Semiconductors: A Deep Dive Into Materials, Manufacturing & Business
Premium Content: How these devices are made and work, challenges in manufacturing, related startups, as well as the reasons why so much effort and resources are being spent to develop new materials, and new processes.</

The post Power Electronic Packaging for Discrete Dies appeared first on Semiconductor Engineering.

Characterizing and Evaluating A Quantum Processor Unit In A HPC Center

A new technical paper titled “Calibration and Performance Evaluation of a Superconducting Quantum Processor in an HPC Center” was published by researchers at Leibniz Supercomputing Centre, IQM Quantum Computers, and Technical University of Munich.

Abstract

“As quantum computers mature, they migrate from laboratory environments to HPC centers. This movement enables large-scale deployments, greater access to the technology, and deep integration into HPC in the form of quantum acceleration. In laboratory environments, specialists directly control the systems’ environments and operations at any time with hands-on access, while HPC centers require remote and autonomous operations with minimal physical contact. The requirement for automation of the calibration process needed by all current quantum systems relies on maximizing their coherence times and fidelities and, with that, their best performance. It is, therefore, of great significance to establish a standardized and automatic calibration process alongside unified evaluation standards for quantum computing performance to evaluate the success of the calibration and operation of the system. In this work, we characterize our in-house superconducting quantum computer, establish an automatic calibration process, and evaluate its performance through quantum volume and an application-specific algorithm. We also analyze readout errors and improve the readout fidelity, leaning on error mitigation.”

Find the technical paper here. Published May 2024.

X. Deng, S. Pogorzalek, F. Vigneau, P. Yang, M. Schulz and L. Schulz, “Calibration and Performance Evaluation of a Superconducting Quantum Processor in an HPC Center,” ISC High Performance 2024 Research Paper Proceedings (39th International Conference), Hamburg, Germany, 2024, pp. 1-9, doi: 10.23919/ISC.2024.10528924.

The post Characterizing and Evaluating A Quantum Processor Unit In A HPC Center appeared first on Semiconductor Engineering.

Device Characteristics of GAA-Structured CMOS and CTFET Under Varying Temperatures

A new technical paper titled “Vertical-Stack Nanowire Structure of MOS Inverter and TFET Inverter in Low-temperature Application” was published by researchers at National Tsing Hua University and National United University in Taiwan.

Abstract
“Tunneling field effect transistors (TFET) have emerged as promising candidates for integrated circuits beyond conventional metal oxide semiconductor field effect transistors (MOSFET) and could overcome the physical limit, which results in the subthreshold swing (SS) < 60mV/dec at room temperature. In this study, we compare the complementary TFET (CTFET) with complementary metal oxide semiconductor (CMOS) at low temperatures (70K) by using the Gate-All-Around (GAA) architecture. The experiment result clearly shows that the CTEFT inverter has better characteristics than the CMOS inverter in various temperatures. While operating at a fixed temperature, the CMOS inverter performs an excellent on/off ratio and SS, etc. However, when a CMOS inverter operates at varying temperatures, CMOS performs worse than CTFET. This is attributed to the influence of lattice scattering, leading to the instability of CMOS characteristics. Therefore, the CTFET inverter is suitable for operation in environments with varying temperatures, exhibiting high stability, which can be applied in space technology. The simulation tool TCAD has been used to investigate the characteristics of CMOS and CTFET at low temperatures.”

Find the technical paper here. Published June 2024.

C. -C. Tien and Y. -H. Lin, “Vertical-Stack Nanowire Structure of MOS Inverter and TFET Inverter in Low-temperature Application,” in IEEE Access, doi: 10.1109/ACCESS.2024.3410677.

The post Device Characteristics of GAA-Structured CMOS and CTFET Under Varying Temperatures appeared first on Semiconductor Engineering.

Efficient TNN Inference on RISC-V Processing Cores With Minimal HW Overhead

A new technical paper titled “xTern: Energy-Efficient Ternary Neural Network Inference on RISC-V-Based Edge Systems” was published by researchers at ETH Zurich and Universita di Bologna.

Abstract
“Ternary neural networks (TNNs) offer a superior accuracy-energy trade-off compared to binary neural networks. However, until now, they have required specialized accelerators to realize their efficiency potential, which has hindered widespread adoption. To address this, we present xTern, a lightweight extension of the RISC-V instruction set architecture (ISA) targeted at accelerating TNN inference on general-purpose cores. To complement the ISA extension, we developed a set of optimized kernels leveraging xTern, achieving 67% higher throughput than their 2-bit equivalents. Power consumption is only marginally increased by 5.2%, resulting in an energy efficiency improvement by 57.1%. We demonstrate that the proposed xTern extension, integrated into an octa-core compute cluster, incurs a minimal silicon area overhead of 0.9% with no impact on timing. In end-to-end benchmarks, we demonstrate that xTern enables the deployment of TNNs achieving up to 1.6 percentage points higher CIFAR-10 classification accuracy than 2-bit networks at equal inference latency. Our results show that xTern enables RISC-V-based ultra-low-power edge AI platforms to benefit from the efficiency potential of TNNs.”

Find the technical paper here. Published May 2024.

Rutishauser, Georg, Joan Mihali, Moritz Scherer, and Luca Benini. “xTern: Energy-Efficient Ternary Neural Network Inference on RISC-V-Based Edge Systems.” arXiv preprint arXiv:2405.19065 (2024).

The post Efficient TNN Inference on RISC-V Processing Cores With Minimal HW Overhead appeared first on Semiconductor Engineering.

Comparing Thermal Properties In Molybdenum Substrate To Si And Glass For A System-On-Foil Integration (RIT, Lux)

A technical paper titled “Comparative Analysis of Thermal Properties in Molybdenum Substrate to Silicon and Glass for a System-on-Foil Integration” was published by researchers at Rochester Institute of Technology and Lux Semiconductors.

Abstract:

“Advanced electronics technology is moving towards smaller footprints and higher computational power. In order to achieve this, advanced packaging techniques are currently being considered, including organic, glass, and semiconductor-based substrates that allow for 2.5D or 3D integration of chips and devices. Metal-core substrates are a new alternative with similar properties to those of semiconductor-based substrates but with the added benefits of higher flexibility and metal ductility. This work comprehensively compares the thermal properties of a novel metal-based substrate, molybdenum, and silicon and fused silica glass substrates in the context of system-on-foil (SoF) integration. A simple electronic technique is used to simulate the heat generated by a typical CPU and to measure the heat dissipation properties of the substrates. The results indicate that molybdenum and silicon are able to effectively dissipate a continuous power density of 2.3 W/mm2 as the surface temperature only increases by ~15°C. In contrast, the surface temperature of fused silica glass substrates increases by >140°C for the same applied power. These simple techniques and measurements were validated with infrared camera measurements as well as through finite element analysis via COMSOL simulation. The results validate the use of molybdenum as an advanced packaging substrate and can be used to characterize new substrates and approaches for advanced packaging.”

Find the technical paper here. Published May 2024.

Huang, Tzu-Jung, Tobias Kiebala, Paul Suflita, Chad Moore, Graeme Housser, Shane McMahon, and Ivan Puchades. 2024. “Comparative Analysis of Thermal Properties in Molybdenum Substrate to Silicon and Glass for a System-on-Foil Integration” Electronics 13, no. 10: 1818. https://doi.org/10.3390/electronics13101818

Related Reading
The Race To Glass Substrates
Replacing silicon and organic substrates requires huge shifts in manufacturing, creating challenges that will take years to iron out.

 

The post Comparing Thermal Properties In Molybdenum Substrate To Si And Glass For A System-On-Foil Integration (RIT, Lux) appeared first on Semiconductor Engineering.

CAM-Based CMOS Implementation Of Reference Frames For Neuromorphic Processors (Carnegie Mellon U.)

A technical paper titled “NeRTCAM: CAM-Based CMOS Implementation of Reference Frames for Neuromorphic Processors” was published by researchers at Carnegie Mellon University.

Abstract:

“Neuromorphic architectures mimicking biological neural networks have been proposed as a much more efficient alternative to conventional von Neumann architectures for the exploding compute demands of AI workloads. Recent neuroscience theory on intelligence suggests that Cortical Columns (CCs) are the fundamental compute units in the neocortex and intelligence arises from CC’s ability to store, predict and infer information via structured Reference Frames (RFs). Based on this theory, recent works have demonstrated brain-like visual object recognition using software simulation. Our work is the first attempt towards direct CMOS implementation of Reference Frames for building CC-based neuromorphic processors. We propose NeRTCAM (Neuromorphic Reverse Ternary Content Addressable Memory), a CAM-based building block that supports the key operations (store, predict, infer) required to perform inference using RFs. NeRTCAM architecture is presented in detail including its key components. All designs are implemented in SystemVerilog and synthesized in 7nm CMOS, and hardware complexity scaling is evaluated for varying storage sizes. NeRTCAM system for biologically motivated MNIST inference with a storage size of 1024 entries incurs just 0.15 mm^2 area, 400 mW power and 9.18 us critical path latency, demonstrating the feasibility of direct CMOS implementation of CAM-based Reference Frames.”

Find the technical paper here. Published May 2024 (preprint).

Nair, Harideep, William Leyman, Agastya Sampath, Quinn Jacobson, and John Paul Shen. “NeRTCAM: CAM-Based CMOS Implementation of Reference Frames for Neuromorphic Processors.” arXiv preprint arXiv:2405.11844 (2024).

Related Reading
Running More Efficient AI/ML Code With Neuromorphic Engines
Once a buzzword, neuromorphic engineering is gaining traction in the semiconductor industry.

The post CAM-Based CMOS Implementation Of Reference Frames For Neuromorphic Processors (Carnegie Mellon U.) appeared first on Semiconductor Engineering.

Distributing RTL Simulation Across Thousands Of Cores On 4 IPU Sockets (EPFL)

A technical paper titled “Parendi: Thousand-Way Parallel RTL Simulation” was published by researchers at EPFL.

Abstract:

“Hardware development relies on simulations, particularly cycle-accurate RTL (Register Transfer Level) simulations, which consume significant time. As single-processor performance grows only slowly, conventional, single-threaded RTL simulation is becoming less practical for increasingly complex chips and systems. A solution is parallel RTL simulation, where ideally, simulators could run on thousands of parallel cores. However, existing simulators can only exploit tens of cores.
This paper studies the challenges inherent in running parallel RTL simulation on a multi-thousand-core machine (the Graphcore IPU, a 1472-core machine). Simulation performance requires balancing three factors: synchronization, communication, and computation. We experimentally evaluate each metric and analyze how it affects parallel simulation speed, drawing on contrasts between the large-scale IPU and smaller but faster x86 systems.
Using this analysis, we build Parendi, an RTL simulator for the IPU. It distributes RTL simulation across 5888 cores on 4 IPU sockets. Parendi runs large RTL designs up to 4x faster than a powerful, state-of-the-art x86 multicore system.”

Find the technical paper here. Published March 2024 (preprint).

Emami, Mahyar, Thomas Bourgeat, and James Larus. “Parendi: Thousand-Way Parallel RTL Simulation.” arXiv preprint arXiv:2403.04714 (2024).

Related Reading
Anatomy Of A System Simulation
Balancing the benefits of a model with the costs associated with that model is tough, but it becomes even trickier when dissimilar models are combined.

The post Distributing RTL Simulation Across Thousands Of Cores On 4 IPU Sockets (EPFL) appeared first on Semiconductor Engineering.

Voltage Reference Architectures For Harsh Environments: Quantum Computing And Space

A technical paper titled “Cryo-CMOS Voltage References for the Ultrawide Temperature Range From 300 K Down to 4.2 K” was published by researchers at Delft University of Technology, QuTech, Kavli Institute of Nanoscience Delft, and École Polytechnique Fédérale de Lausanne (EPFL).

Abstract:

“This article presents a family of sub-1-V, fully-CMOS voltage references adopting MOS devices in weak inversion to achieve continuous operation from room temperature (RT) down to cryogenic temperatures. Their accuracy limitations due to curvature, body effect, and mismatch are investigated and experimentally validated. Implemented in 40-nm CMOS, the references show a line regulation better than 2.7%/V from a supply as low as 0.99 V. By applying dynamic element matching (DEM) techniques, a spread of 1.2% (3 σ ) from 4.2 to 300 K can be achieved, resulting in a temperature coefficient (TC) of 111 ppm/K. As the first significant statistical characterization extending down to cryogenic temperatures, the results demonstrate the ability of the proposed architectures to work under cryogenic harsh environments, such as space-and quantum-computing applications.”

Find the technical paper here. Published April 2024.

J. van Staveren et al., “Cryo-CMOS Voltage References for the Ultrawide Temperature Range From 300 K Down to 4.2 K,” in IEEE Journal of Solid-State Circuits, doi: 10.1109/JSSC.2024.3378768.

Further Reading
The Race Toward Quantum Advantage
Enormous amounts of money have been invested into quantum computing, but so far it has not surpassed conventional computers. When will that change?
Managing P/P Tradeoffs With Voltage Droop Gets Trickier
Higher current densities set against lower power envelopes makes meeting specs more challenging, especially at advanced nodes.

The post Voltage Reference Architectures For Harsh Environments: Quantum Computing And Space appeared first on Semiconductor Engineering.

Merging Power and Arithmetic Optimization Via Datapath Rewriting (Intel, Imperial College London)

A new technical paper titled “Combining Power and Arithmetic Optimization via Datapath Rewriting” was published by researchers at Intel Corporation and Imperial College London.

Abstract:
“Industrial datapath designers consider dynamic power consumption to be a key metric. Arithmetic circuits contribute a major component of total chip power consumption and are therefore a common target for power optimization. While arithmetic circuit area and dynamic power consumption are often correlated, there is also a tradeoff to consider, as additional gates can be added to explicitly reduce arithmetic circuit activity and hence reduce power consumption. In this work, we consider two forms of power optimization and their interaction: circuit area reduction via arithmetic optimization, and the elimination of redundant computations using both data and clock gating. By encoding both these classes of optimization as local rewrites of expressions, our tool flow can simultaneously explore them, uncovering new opportunities for power saving through arithmetic rewrites using the e-graph data structure. Since power consumption is highly dependent upon the workload performed by the circuit, our tool flow facilitates a data dependent design paradigm, where an implementation is automatically tailored to particular contexts of data activity. We develop an automated RTL to RTL optimization framework, ROVER, that takes circuit input stimuli and generates power-efficient architectures. We evaluate the effectiveness on both open-source arithmetic benchmarks and benchmarks derived from Intel production examples. The tool is able to reduce the total power consumption by up to 33.9%.”

Find the technical paper here. Published April 2024.

Samuel Coward, Theo Drane, Emiliano Morini, George Constantinides; arXiv:2404.12336v1.

The post Merging Power and Arithmetic Optimization Via Datapath Rewriting (Intel, Imperial College London) appeared first on Semiconductor Engineering.

Merging Power and Arithmetic Optimization Via Datapath Rewriting (Intel, Imperial College London)

A new technical paper titled “Combining Power and Arithmetic Optimization via Datapath Rewriting” was published by researchers at Intel Corporation and Imperial College London.

Abstract:
“Industrial datapath designers consider dynamic power consumption to be a key metric. Arithmetic circuits contribute a major component of total chip power consumption and are therefore a common target for power optimization. While arithmetic circuit area and dynamic power consumption are often correlated, there is also a tradeoff to consider, as additional gates can be added to explicitly reduce arithmetic circuit activity and hence reduce power consumption. In this work, we consider two forms of power optimization and their interaction: circuit area reduction via arithmetic optimization, and the elimination of redundant computations using both data and clock gating. By encoding both these classes of optimization as local rewrites of expressions, our tool flow can simultaneously explore them, uncovering new opportunities for power saving through arithmetic rewrites using the e-graph data structure. Since power consumption is highly dependent upon the workload performed by the circuit, our tool flow facilitates a data dependent design paradigm, where an implementation is automatically tailored to particular contexts of data activity. We develop an automated RTL to RTL optimization framework, ROVER, that takes circuit input stimuli and generates power-efficient architectures. We evaluate the effectiveness on both open-source arithmetic benchmarks and benchmarks derived from Intel production examples. The tool is able to reduce the total power consumption by up to 33.9%.”

Find the technical paper here. Published April 2024.

Samuel Coward, Theo Drane, Emiliano Morini, George Constantinides; arXiv:2404.12336v1.

The post Merging Power and Arithmetic Optimization Via Datapath Rewriting (Intel, Imperial College London) appeared first on Semiconductor Engineering.

Ultrathin vdW Ferromagnet at Room Temperature (MIT)

A technical paper titled “Current-induced switching of a van der Waals ferromagnet at room temperature” was published by researchers at Massachusetts Institute of Technology (MIT).

Abstract:

“Recent discovery of emergent magnetism in van der Waals magnetic materials (vdWMM) has broadened the material space for developing spintronic devices for energy-efficient computation. While there has been appreciable progress in vdWMM discovery, a solution for non-volatile, deterministic switching of vdWMMs at room temperature has been missing, limiting the prospects of their adoption into commercial spintronic devices. Here, we report the first demonstration of current-controlled non-volatile, deterministic magnetization switching in a vdW magnetic material at room temperature. We have achieved spin-orbit torque (SOT) switching of the PMA vdW ferromagnet Fe3GaTe2  using a Pt spin-Hall layer up to 320 K, with a threshold switching current density as low as Jsw = 1.69 × 106 A cm-2 at room temperature. We have also quantitatively estimated the anti-damping-like SOT efficiency of our Fe3GaTe2/Pt bilayer system to be ξDL = 0:093, using the second harmonic Hall voltage measurement technique. These results mark a crucial step in making vdW magnetic materials a viable choice for the development of scalable, energy-efficient spintronic devices.”

Find the technical paper here. Published February 2024. MIT’s related news article and video is here.

Kajale, S.N., Nguyen, T., Chao, C.A. et al. Current-induced switching of a van der Waals ferromagnet at room temperature. Nat Commun 15, 1485 (2024). https://doi.org/10.1038/s41467-024-45586-4

 

 

The post Ultrathin vdW Ferromagnet at Room Temperature (MIT) appeared first on Semiconductor Engineering.

Rapid Exchange Cooling With Trapped Ions For Implementation In A Quantum Charge-Coupled Device

A technical paper titled “Rapid exchange cooling with trapped ions” was published by researchers at Georgia Tech Research Institute.

Abstract:

“The trapped-ion quantum charge-coupled device (QCCD) architecture is a leading candidate for advanced quantum information processing. In current QCCD implementations, imperfect ion transport and anomalous heating can excite ion motion during a calculation. To counteract this, intermediate cooling is necessary to maintain high-fidelity gate performance. Cooling the computational ions sympathetically with ions of another species, a commonly employed strategy, creates a significant runtime bottleneck. Here, we demonstrate a different approach we call exchange cooling. Unlike sympathetic cooling, exchange cooling does not require trapping two different atomic species. The protocol introduces a bank of “coolant” ions which are repeatedly laser cooled. A computational ion can then be cooled by transporting a coolant ion into its proximity. We test this concept experimentally with two 40Ca+ ions, executing the necessary transport in 107 μs, an order of magnitude faster than typical sympathetic cooling durations. We remove over 96%, and as many as 102(5) quanta, of axial motional energy from the computational ion. We verify that re-cooling the coolant ion does not decohere the computational ion. This approach validates the feasibility of a single-species QCCD processor, capable of fast quantum simulation and computation.”

Find the technical paper here. Published February 2024.  A related news release, including a video, can be found here.

Fallek, S.D., Sandhu, V.S., McGill, R.A. et al. Rapid exchange cooling with trapped ions. Nat Commun 15, 1089 (2024). https://doi.org/10.1038/s41467-024-45232-z

Related Reading
The Race Toward Quantum Advantage
Enormous amounts of money have been invested into quantum computing, but so far it has not surpassed conventional computers. When will that change?

The post Rapid Exchange Cooling With Trapped Ions For Implementation In A Quantum Charge-Coupled Device appeared first on Semiconductor Engineering.

Ultra-Low Power CiM Design For Practical Edge Scenarios

A technical paper titled “Low Power and Temperature-Resilient Compute-In-Memory Based on Subthreshold-FeFET” was published by researchers at Zhejiang University, University of Notre Dame, Technical University of Munich, Munich Institute of Robotics and Machine Intelligence, and the Laboratory of Collaborative Sensing and Autonomous Unmanned Systems of Zhejiang Province.

Abstract:

“Compute-in-memory (CiM) is a promising solution for addressing the challenges of artificial intelligence (AI) and the Internet of Things (IoT) hardware such as ‘memory wall’ issue. Specifically, CiM employing nonvolatile memory (NVM) devices in a crossbar structure can efficiently accelerate multiply-accumulation (MAC) computation, a crucial operator in neural networks among various AI models. Low power CiM designs are thus highly desired for further energy efficiency optimization on AI models. Ferroelectric FET (FeFET), an emerging device, is attractive for building ultra-low power CiM array due to CMOS compatibility, high ION /IOF  ratio, etc. Recent studies have explored FeFET based CiM designs that achieve low power consumption. Nevertheless, subthreshold-operated FeFETs, where the operating voltages are scaled down to the subthreshold region to reduce array power consumption, are particularly vulnerable to temperature drift, leading to accuracy degradation. To address this challenge, we propose a temperature-resilient 2T-1FeFET CiM design that performs MAC operations reliably at subthreahold region from 0 to 85 Celsius, while consuming ultra-low power. Benchmarked against the VGG neural network architecture running the CIFAR-10 dataset, the proposed 2T-1FeFET CiM design achieves 89.45% CIFAR-10 test accuracy. Compared to previous FeFET based CiM designs, it exhibits immunity to temperature drift at an 8-bit wordlength scale, and achieves better energy efficiency with 2866 TOPS/W.”

Find the technical paper here. Published January 2024 (preprint).

Zhou, Yifei, Xuchu Huang, Jianyi Yang, Kai Ni, Hussam Amrouch, Cheng Zhuo, and Xunxhao Yin. “Low Power and Temperature-Resilient Compute-In-Memory Based on Subthreshold-FeFET.” arXiv preprint arXiv:2312.17442 (2023).

Related Reading
Increasing AI Energy Efficiency With Compute In Memory
How to process zettascale workloads and stay within a fixed power budget.
Modeling Compute In Memory With Biological Efficiency
Generative AI forces chipmakers to use compute resources more intelligently.

The post Ultra-Low Power CiM Design For Practical Edge Scenarios appeared first on Semiconductor Engineering.

❌