FreshRSS

Zobrazení pro čtení

Jsou dostupné nové články, klikněte pro obnovení stránky.

Chip Industry Technical Paper Roundup: August 20

New technical papers recently added to Semiconductor Engineering’s library:

Technical Paper Research Organizations
Design Technology Co-Optimization and Time-Efficient Verification for Enhanced Pin Accessibility in the Post-3-nm Node Samsung Electronics and Kyungpook National University (KNU)
Search-in-Memory (SiM): Reliable, Versatile, and Efficient Data Matching in SSD’s NAND Flash Memory Chip for Data Indexing Acceleration TU Dortmund, Academia Sinica, and National Taiwan University
Achieving Sustainability in the Semiconductor Industry: The Impact of Simulation and AI Lam Research
HMComp: Extending Near-Memory Capacity using Compression in Hybrid Memory Chalmers University of Technology and ZeroPoint Technologies
Retention-aware zero-shifting technique for Tiki-Taka algorithm-based analog deep learning accelerator Pohang University of Science and Technology, Korea University, and Kyungpook National University
Improvement of Contact Resistance and 3D Integration of 2D Material Field-Effect Transistors Using Semi-Metallic PtSe2 Contacts Yonsei University, Korea Advanced Institute of Science and Technology (KAIST), Lincoln University College, Korea Institute of Science and Technology (KIST), and Ewha Womans University
Ultra-steep slope cryogenic FETs based on bilayer graphene RWTH Aachen University, Forschungszentrum Julich, National Institute for Materials Science (Japan), and AMO GmbH

More Reading
Technical Paper Library home

The post Chip Industry Technical Paper Roundup: August 20 appeared first on Semiconductor Engineering.

A Generic Approach For Fuzzing Arbitrary Hypervisors

A technical paper titled “HYPERPILL: Fuzzing for Hypervisor-bugs by Leveraging the Hardware Virtualization Interface” was presented at the August 2024 USENIX Security Symposium by researchers at EPFL, Boston University, and Zhejiang University.

Abstract:

“The security guarantees of cloud computing depend on the isolation guarantees of the underlying hypervisors. Prior works have presented effective methods for automatically identifying vulnerabilities in hypervisors. However, these approaches are limited in scope. For instance, their implementation is typically hypervisor-specific and limited by requirements for detailed grammars, access to source-code, and assumptions about hypervisor behaviors. In practice, complex closed-source and recent open-source hypervisors are often not suitable for off-the-shelf fuzzing techniques.

HYPERPILL introduces a generic approach for fuzzing arbitrary hypervisors. HYPERPILL leverages the insight that although hypervisor implementations are diverse, all hypervisors rely on the identical underlying hardware-virtualization interface to manage virtual-machines. To take advantage of the hardware-virtualization interface, HYPERPILL makes a snapshot of the hypervisor, inspects the snapshotted hardware state to enumerate the hypervisor’s input-spaces, and leverages feedback-guided snapshot-fuzzing within an emulated environment to identify vulnerabilities in arbitrary hypervisors. In our evaluation, we found that beyond being the first hypervisor-fuzzer capable of identifying vulnerabilities in arbitrary hypervisors across all major attack-surfaces (i.e., PIO/MMIO/Hypercalls/DMA), HYPERPILL also outperforms state-of-the-art approaches that rely on access to source-code, due to the granularity of feedback provided by HYPERPILL’s emulation-based approach. In terms of coverage, HYPERPILL outperformed past fuzzers for 10/12 QEMU devices, without the API hooking or source-code instrumentation techniques required by prior works. HYPERPILL identified 26 new bugs in recent versions of QEMU, Hyper-V, and macOS Virtualization Framework across four device-categories.”

Find the technical paper here. Published August 2024. Distinguished Paper Award Winner.

Bulekov, Alexander, Qiang Liu, Manuel Egele, and Mathias Payer. “HYPERPILL: Fuzzing for Hypervisor-bugs by Leveraging the Hardware Virtualization Interface.” In 33rd USENIX Security Symposium (USENIX Security 24). 2024.

Further Reading
SRAM Security Concerns Grow
Volatile memory threat increases as chips are disaggregated into chiplets, making it easier to isolate memory and slow data degradation.

The post A Generic Approach For Fuzzing Arbitrary Hypervisors appeared first on Semiconductor Engineering.

A Novel Attack For Depleting DNN Model Inference With Runtime Code Fault Injections

A technical paper titled “Yes, One-Bit-Flip Matters! Universal DNN Model Inference Depletion with Runtime Code Fault Injection” was presented at the August 2024 USENIX Security Symposium by researchers at Peng Cheng Laboratory, Shanghai Jiao Tong University, CSIRO’s Data61, University of Western Australia, and University of Waterloo.

Abstract:

“We propose, FrameFlip, a novel attack for depleting DNN model inference with runtime code fault injections. Notably, Frameflip operates independently of the DNN models deployed and succeeds with only a single bit-flip injection. This fundamentally distinguishes it from the existing DNN inference depletion paradigm that requires injecting tens of deterministic faults concurrently. Since our attack performs at the universal code or library level, the mandatory code snippet can be perversely called by all mainstream machine learning frameworks, such as PyTorch and TensorFlow, dependent on the library code. Using DRAM Rowhammer to facilitate end-to-end fault injection, we implement Frameflip across diverse model architectures (LeNet, VGG-16, ResNet-34 and ResNet-50) with different datasets (FMNIST, CIFAR-10, GTSRB, and ImageNet). With a single bit fault injection, Frameflip achieves high depletion efficacy that consistently renders the model inference utility as no better than guessing. We also experimentally verify that identified vulnerable bits are almost equally effective at depleting different deployed models. In contrast, transferability is unattainable for all existing state-of-the-art model inference depletion attacks. Frameflip is shown to be evasive against all known defenses, generally due to the nature of current defenses operating at the model level (which is model-dependent) in lieu of the underlying code level.”

Find the technical paper here. Published August 2024. Distinguished Paper Award Winner.

Li, Shaofeng, Xinyu Wang, Minhui Xue, Haojin Zhu, Zhi Zhang, Yansong Gao, Wen Wu, and Xuemin Sherman Shen. “Yes, One-Bit-Flip Matters! Universal DNN Model Inference Depletion with Runtime Code Fault Injection.” In Proceedings of the 33th USENIX Security Symposium. 2024.

Related Reading
Why It’s So Hard To Secure AI Chips
Much of the hardware is the same, but AI systems have unique vulnerabilities that require novel defense strategies.

The post A Novel Attack For Depleting DNN Model Inference With Runtime Code Fault Injections appeared first on Semiconductor Engineering.

Uncovering A Significant Residual Attack Surface For Cross-Privilege Spectre-V2 Attacks

A technical paper titled “InSpectre Gadget: Inspecting the Residual Attack Surface of Cross-privilege Spectre v2” was presented at the August 2024 USENIX Security Symposium by researchers at Vrije Universiteit Amsterdam.

Abstract:

“Spectre v2 is one of the most severe transient execution vulnerabilities, as it allows an unprivileged attacker to lure a privileged (e.g., kernel) victim into speculatively jumping to a chosen gadget, which then leaks data back to the attacker. Spectre v2 is hard to eradicate. Even on last-generation Intel CPUs, security hinges on the unavailability of exploitable gadgets. Nonetheless, with (i) deployed mitigations—eIBRS, no-eBPF, (Fine)IBT—all aimed at hindering many usable gadgets, (ii) existing exploits relying on now-privileged features (eBPF), and (iii) recent Linux kernel gadget analysis studies reporting no exploitable gadgets, the common belief is that there is no residual attack surface of practical concern.

In this paper, we challenge this belief and uncover a significant residual attack surface for cross-privilege Spectre-v2 attacks. To this end, we present InSpectre Gadget, a new gadget analysis tool for in-depth inspection of Spectre gadgets. Unlike existing tools, ours performs generic constraint analysis and models knowledge of advanced exploitation techniques to accurately reason over gadget exploitability in an automated fashion. We show that our tool can not only uncover new (unconventionally) exploitable gadgets in the Linux kernel, but that those gadgets are sufficient to bypass all deployed Intel mitigations. As a demonstration, we present the first native Spectre-v2 exploit against the Linux kernel on last-generation Intel CPUs, based on the recent BHI variant and able to leak arbitrary kernel memory at 3.5 kB/sec. We also present a number of gadgets and exploitation techniques to bypass the recent FineIBT mitigation, along with a case study on a 13th Gen Intel CPU that can leak kernel memory at 18 bytes/sec.”

Find the technical paper here. Published August 2024. Distinguished Paper Award Winner.  Find additional information here on VU Amsterdam’s site.

Wiebing, Sander, Alvise de Faveri Tron, Herbert Bos, and Cristiano Giuffrida. “InSpectre Gadget: Inspecting the residual attack surface of cross-privilege Spectre v2.” In USENIX Security. 2024.

Further Reading
Defining Chip Threat Models To Identify Security Risks
Not every device has the same requirements, and even the best security needs to adapt.

The post Uncovering A Significant Residual Attack Surface For Cross-Privilege Spectre-V2 Attacks appeared first on Semiconductor Engineering.

Data Memory-Dependent Prefetchers Pose SW Security Threat By Breaking Cryptographic Implementations

A technical paper titled “GoFetch: Breaking Constant-Time Cryptographic Implementations Using Data Memory-Dependent Prefetchers” was presented at the August 2024 USENIX Security Symposium by researchers at University of Illinois Urbana-Champaign, University of Texas at Austin, Georgia Institute of Technology, University of California Berkeley, University of Washington, and Carnegie Mellon University.

Abstract:

“Microarchitectural side-channel attacks have shaken the foundations of modern processor design. The cornerstone defense against these attacks has been to ensure that security-critical programs do not use secret-dependent data as addresses. Put simply: do not pass secrets as addresses to, e.g., data memory instructions. Yet, the discovery of data memory-dependent prefetchers (DMPs)—which turn program data into addresses directly from within the memory system—calls into question whether this approach will continue to remain secure.

This paper shows that the security threat from DMPs is significantly worse than previously thought and demonstrates the first end-to-end attacks on security-critical software using the Apple m-series DMP. Undergirding our attacks is a new understanding of how DMPs behave which shows, among other things, that the Apple DMP will activate on behalf of any victim program and attempt to “leak” any cached data that resembles a pointer. From this understanding, we design a new type of chosen-input attack that uses the DMP to perform end-to-end key extraction on popular constant-time implementations of classical (OpenSSL Diffie-Hellman Key Exchange, Go RSA decryption) and post-quantum cryptography (CRYSTALS-Kyber and CRYSTALS-Dilithium).”

Find the technical paper here. Published August 2024.

Chen, Boru, Yingchen Wang, Pradyumna Shome, Christopher W. Fletcher, David Kohlbrenner, Riccardo Paccagnella, and Daniel Genkin. “GoFetch: Breaking constant-time cryptographic implementations using data memory-dependent prefetchers.” In Proc. USENIX Secur. Symp, pp. 1-21. 2024.

Further Reading
Chip Security Now Depends On Widening Supply Chain
How tighter HW-SW integration and increasing government involvement are changing the security landscape for chips and systems.

 

The post Data Memory-Dependent Prefetchers Pose SW Security Threat By Breaking Cryptographic Implementations appeared first on Semiconductor Engineering.

A New Low-Cost HW-Counterbased RowHammer Mitigation Technique

A technical paper titled “ABACuS: All-Bank Activation Counters for Scalable and Low Overhead RowHammer Mitigation” was presented at the August 2024 USENIX Security Symposium by researchers at ETH Zurich.

Abstract:

“We introduce ABACuS, a new low-cost hardware-counterbased RowHammer mitigation technique that performance-, energy-, and area-efficiently scales with worsening RowHammer vulnerability. We observe that both benign workloads and RowHammer attacks tend to access DRAM rows with the same row address in multiple DRAM banks at around the same time. Based on this observation, ABACuS’s key idea is to use a single shared row activation counter to track activations to the rows with the same row address in all DRAM banks. Unlike state-of-the-art RowHammer mitigation mechanisms that implement a separate row activation counter for each DRAM bank, ABACuS implements fewer counters (e.g., only one) to track an equal number of aggressor rows.

Our comprehensive evaluations show that ABACuS securely prevents RowHammer bitflips at low performance/energy overhead and low area cost. We compare ABACuS to four state-of-the-art mitigation mechanisms. At a nearfuture RowHammer threshold of 1000, ABACuS incurs only 0.58% (0.77%) performance and 1.66% (2.12%) DRAM energy overheads, averaged across 62 single-core (8-core) workloads, requiring only 9.47 KiB of storage per DRAM rank. At the RowHammer threshold of 1000, the best prior lowarea-cost mitigation mechanism incurs 1.80% higher average performance overhead than ABACuS, while ABACuS requires 2.50× smaller chip area to implement. At a future RowHammer threshold of 125, ABACuS performs very similarly to (within 0.38% of the performance of) the best prior performance- and energy-efficient RowHammer mitigation mechanism while requiring 22.72× smaller chip area. We show that ABACuS’s performance scales well with the number of DRAM banks. At the RowHammer threshold of 125, ABACuS incurs 1.58%, 1.50%, and 2.60% performance overheads for 16-, 32-, and 64-bank systems across all single-core workloads, respectively. ABACuS is freely and openly available at https://github.com/CMU-SAFARI/ABACuS.”

Find the technical paper here.

Olgun, Ataberk, Yahya Can Tugrul, Nisa Bostanci, Ismail Emir Yuksel, Haocong Luo, Steve Rhyner, Abdullah Giray Yaglikci, Geraldo F. Oliveira, and Onur Mutlu. “Abacus: All-bank activation counters for scalable and low overhead rowhammer mitigation.” In USENIX Security. 2024.

Further Reading
Securing DRAM Against Evolving Rowhammer Threats
A multi-layered, system-level approach is crucial to DRAM protection.

The post A New Low-Cost HW-Counterbased RowHammer Mitigation Technique appeared first on Semiconductor Engineering.

Flexible-Wafer Platform And CMOS-Compatible 300mm Wafer-Scale Integrated-Photonics Fabrication

A new technical paper titled “Mechanically-flexible wafer-scale integrated-photonics fabrication platform” was published by researchers at MIT and New York Center for Research, Economic Advancement, Technology, Engineering, and Science (NY CREATES).

Abstract
“The field of integrated photonics has advanced rapidly due to wafer-scale fabrication, with integrated-photonics platforms and fabrication processes being demonstrated at both infrared and visible wavelengths. However, these demonstrations have primarily focused on fabrication processes on silicon substrates that result in rigid photonic wafers and chips, which limit the potential application spaces. There are many application areas that would benefit from mechanically-flexible integrated-photonics wafers, such as wearable healthcare monitors and pliable displays. Although there have been demonstrations of mechanically-flexible photonics fabrication, they have been limited to fabrication processes on the individual device or chip scale, which limits scalability. In this paper, we propose, develop, and experimentally characterize the first 300-mm wafer-scale platform and fabrication process that results in mechanically-flexible photonic wafers and chips. First, we develop and describe the 300-mm wafer-scale CMOS-compatible flexible platform and fabrication process. Next, we experimentally demonstrate key optical functionality at visible wavelengths, including chip coupling, waveguide routing, and passive devices. Then, we perform a bend-durability study to characterize the mechanical flexibility of the photonic chips, demonstrating bending a single chip 2000 times down to a bend diameter of 0.5 inch with no degradation in the optical performance. Finally, we experimentally characterize polarization-rotation effects induced by bending the flexible photonic chips. This work will enable the field of integrated photonics to advance into new application areas that require flexible photonic chips.”

Find the technical paper here. Published May 2024. Find MIT’s news release here.

Notaros, M., Dyer, T., Garcia Coleto, A. et al. Mechanically-flexible wafer-scale integrated-photonics fabrication platform. Sci Rep 14, 10623 (2024). https://doi.org/10.1038/s41598-024-61055-w.

The post Flexible-Wafer Platform And CMOS-Compatible 300mm Wafer-Scale Integrated-Photonics Fabrication appeared first on Semiconductor Engineering.

Heterogeneity Of 3DICs As A Security Vulnerability

A new technical paper titled “Harnessing Heterogeneity for Targeted Attacks on 3-D ICs” was published by Drexel University.

Abstract
“As 3-D integrated circuits (ICs) increasingly pervade the microelectronics industry, the integration of heterogeneous components presents a unique challenge from a security perspective. To this end, an attack on a victim die of a multi-tiered heterogeneous 3-D IC is proposed and evaluated. By utilizing on-chip inductive circuits and transistors with low voltage threshold (LVT), a die based on CMOS technology is proposed that includes a sensor to monitor the electromagnetic (EM) emissions from the normal function of a victim die, without requiring physical probing. The adversarial circuit is self-powered through the use of thermocouples that supply the generated current to circuits that sense EM emissions. Therefore, the integration of disparate technologies in a single 3-D circuit allows for a stealthy, wireless, and non-invasive side-channel attack. A thin-film thermo-electric generator (TEG) is developed that produces a 115 mV voltage source, which is amplified 5 × through a voltage booster to provide power to the adversarial circuit. An on-chip inductor is also developed as a component of a sensing array, which detects changes to the magnetic field induced by the computational activity of the victim die. In addition, the challenges associated with detecting and mitigating such attacks are discussed, highlighting the limitations of existing security mechanisms in addressing the multifaceted nature of vulnerabilities due to the heterogeneity of 3-D ICs.”

Find the technical paper here. Published June 2024.

Alec Aversa and Ioannis Savidis. 2024. Harnessing Heterogeneity for Targeted Attacks on 3-D ICs. In Proceedings of the Great Lakes Symposium on VLSI 2024 (GLSVLSI ’24). Association for Computing Machinery, New York, NY, USA, 246–251. https://doi.org/10.1145/3649476.3660385.

The post Heterogeneity Of 3DICs As A Security Vulnerability appeared first on Semiconductor Engineering.

A Hybrid ECO Detailed Placement Flow for Mitigating Dynamic IR Drop (UC San Diego)

A new technical paper titled “A Hybrid ECO Detailed Placement Flow for Improved Reduction of Dynamic IR Drop” was published by researchers at UC San Diego.

Abstract:

“With advanced semiconductor technology progressing well into sub-7nm scale, voltage drop has become an increasingly challenging issue. As a result, there has been extensive research focused on predicting and mitigating dynamic IR drops, leading to the development of IR drop engineering change order (ECO) flows – often integrated with modern commercial EDA tools. However, these tools encounter QoR limitations while mitigating IR drop. To address this, we propose a hybrid ECO detailed placement approach that is integrated with existing commercial EDA flows, to mitigate excessive peak current demands within power and ground rails. Our proposed hybrid approach effectively optimizes peak current levels within a specified “clip”– complementing and enhancing commercial EDA dynamic IR-driven ECO detailed placements. In particular, we: (i) order instances in a netlist in decreasing order of worst voltage drop; (ii) extract a clip around each instance; and (iii) solve an integer linear programming (ILP) problem to optimize instance placements. Our approach optimizes dynamic voltage drops (DVD) across ten designs by up to 15.3% compared to original conventional flows, with similar timing quality and 55.1% less runtime.”

Find the technical paper here. Published June 2024.

Andrew B. Kahng, Bodhisatta Pramanik, and Mingyu Woo. 2024. A Hybrid ECO Detailed Placement Flow for Improved Reduction of Dynamic IR Drop. In Proceedings of the Great Lakes Symposium on VLSI 2024 (GLSVLSI ’24). Association for Computing Machinery, New York, NY, USA, 390–396. https://doi.org/10.1145/3649476.3658727.

The post A Hybrid ECO Detailed Placement Flow for Mitigating Dynamic IR Drop (UC San Diego) appeared first on Semiconductor Engineering.

Classification and Localization of Semiconductor Defect Classes in Aggressive Pitches (imec, Screen)

A new technical paper titled “An Evaluation of Continual Learning for Advanced Node Semiconductor Defect Inspection” was published by Imec and SCREEN SPE Germany.

Abstract

“Deep learning-based semiconductor defect inspection has gained traction in recent years, offering a powerful and versatile approach that provides high accuracy, adaptability, and efficiency in detecting and classifying nano-scale defects. However, semiconductor manufacturing processes are continually evolving, leading to the emergence of new types of defects over time. This presents a significant challenge for conventional supervised defect detectors, as they may suffer from catastrophic forgetting when trained on new defect datasets, potentially compromising performance on previously learned tasks. An alternative approach involves the constant storage of previously trained datasets alongside pre-trained model versions, which can be utilized for (re-)training from scratch or fine-tuning whenever encountering a new defect dataset. However, adhering to such a storage template is impractical in terms of size, particularly when considering High-Volume Manufacturing (HVM). Additionally, semiconductor defect datasets, especially those encompassing stochastic defects, are often limited and expensive to obtain, thus lacking sufficient representation of the entire universal set of defectivity. This work introduces a task-agnostic, meta-learning approach aimed at addressing this challenge, which enables the incremental addition of new defect classes and scales to create a more robust and generalized model for semiconductor defect inspection. We have benchmarked our approach using real resist-wafer SEM (Scanning Electron Microscopy) datasets for two process steps, ADI and AEI, demonstrating its superior performance compared to conventional supervised training methods.”

Find the technical paper here.  Published July 2024 (preprint).

Prasad, Amit, Bappaditya Dey, Victor Blanco, and Sandip Halder. “An Evaluation of Continual Learning for Advanced Node Semiconductor Defect Inspection.” arXiv preprint arXiv:2407.12724 (2024).

The post Classification and Localization of Semiconductor Defect Classes in Aggressive Pitches (imec, Screen) appeared first on Semiconductor Engineering.

Survey of Energy Efficient PIM Processors

A new technical paper titled “Survey of Deep Learning Accelerators for Edge and Emerging Computing” was published by researchers at University of Dayton and the Air Force Research Laboratory.

Abstract

“The unprecedented progress in artificial intelligence (AI), particularly in deep learning algorithms with ubiquitous internet connected smart devices, has created a high demand for AI computing on the edge devices. This review studied commercially available edge processors, and the processors that are still in industrial research stages. We categorized state-of-the-art edge processors based on the underlying architecture, such as dataflow, neuromorphic, and processing in-memory (PIM) architecture. The processors are analyzed based on their performance, chip area, energy efficiency, and application domains. The supported programming frameworks, model compression, data precision, and the CMOS fabrication process technology are discussed. Currently, most commercial edge processors utilize dataflow architectures. However, emerging non-von Neumann computing architectures have attracted the attention of the industry in recent years. Neuromorphic processors are highly efficient for performing computation with fewer synaptic operations, and several neuromorphic processors offer online training for secured and personalized AI applications. This review found that the PIM processors show significant energy efficiency and consume less power compared to dataflow and neuromorphic processors. A future direction of the industry could be to implement state-of-the-art deep learning algorithms in emerging non-von Neumann computing paradigms for low-power computing on edge devices.”

Find the technical paper here. Published July 2024.

Alam, Shahanur, Chris Yakopcic, Qing Wu, Mark Barnell, Simon Khan, and Tarek M. Taha. 2024. “Survey of Deep Learning Accelerators for Edge and Emerging Computing” Electronics 13, no. 15: 2988. https://doi.org/10.3390/electronics13152988.

The post Survey of Energy Efficient PIM Processors appeared first on Semiconductor Engineering.

Flexible-Wafer Platform And CMOS-Compatible 300mm Wafer-Scale Integrated-Photonics Fabrication

A new technical paper titled “Mechanically-flexible wafer-scale integrated-photonics fabrication platform” was published by researchers at MIT and New York Center for Research, Economic Advancement, Technology, Engineering, and Science (NY CREATES).

Abstract
“The field of integrated photonics has advanced rapidly due to wafer-scale fabrication, with integrated-photonics platforms and fabrication processes being demonstrated at both infrared and visible wavelengths. However, these demonstrations have primarily focused on fabrication processes on silicon substrates that result in rigid photonic wafers and chips, which limit the potential application spaces. There are many application areas that would benefit from mechanically-flexible integrated-photonics wafers, such as wearable healthcare monitors and pliable displays. Although there have been demonstrations of mechanically-flexible photonics fabrication, they have been limited to fabrication processes on the individual device or chip scale, which limits scalability. In this paper, we propose, develop, and experimentally characterize the first 300-mm wafer-scale platform and fabrication process that results in mechanically-flexible photonic wafers and chips. First, we develop and describe the 300-mm wafer-scale CMOS-compatible flexible platform and fabrication process. Next, we experimentally demonstrate key optical functionality at visible wavelengths, including chip coupling, waveguide routing, and passive devices. Then, we perform a bend-durability study to characterize the mechanical flexibility of the photonic chips, demonstrating bending a single chip 2000 times down to a bend diameter of 0.5 inch with no degradation in the optical performance. Finally, we experimentally characterize polarization-rotation effects induced by bending the flexible photonic chips. This work will enable the field of integrated photonics to advance into new application areas that require flexible photonic chips.”

Find the technical paper here. Published May 2024. Find MIT’s news release here.

Notaros, M., Dyer, T., Garcia Coleto, A. et al. Mechanically-flexible wafer-scale integrated-photonics fabrication platform. Sci Rep 14, 10623 (2024). https://doi.org/10.1038/s41598-024-61055-w.

The post Flexible-Wafer Platform And CMOS-Compatible 300mm Wafer-Scale Integrated-Photonics Fabrication appeared first on Semiconductor Engineering.

Heterogeneity Of 3DICs As A Security Vulnerability

A new technical paper titled “Harnessing Heterogeneity for Targeted Attacks on 3-D ICs” was published by Drexel University.

Abstract
“As 3-D integrated circuits (ICs) increasingly pervade the microelectronics industry, the integration of heterogeneous components presents a unique challenge from a security perspective. To this end, an attack on a victim die of a multi-tiered heterogeneous 3-D IC is proposed and evaluated. By utilizing on-chip inductive circuits and transistors with low voltage threshold (LVT), a die based on CMOS technology is proposed that includes a sensor to monitor the electromagnetic (EM) emissions from the normal function of a victim die, without requiring physical probing. The adversarial circuit is self-powered through the use of thermocouples that supply the generated current to circuits that sense EM emissions. Therefore, the integration of disparate technologies in a single 3-D circuit allows for a stealthy, wireless, and non-invasive side-channel attack. A thin-film thermo-electric generator (TEG) is developed that produces a 115 mV voltage source, which is amplified 5 × through a voltage booster to provide power to the adversarial circuit. An on-chip inductor is also developed as a component of a sensing array, which detects changes to the magnetic field induced by the computational activity of the victim die. In addition, the challenges associated with detecting and mitigating such attacks are discussed, highlighting the limitations of existing security mechanisms in addressing the multifaceted nature of vulnerabilities due to the heterogeneity of 3-D ICs.”

Find the technical paper here. Published June 2024.

Alec Aversa and Ioannis Savidis. 2024. Harnessing Heterogeneity for Targeted Attacks on 3-D ICs. In Proceedings of the Great Lakes Symposium on VLSI 2024 (GLSVLSI ’24). Association for Computing Machinery, New York, NY, USA, 246–251. https://doi.org/10.1145/3649476.3660385.

The post Heterogeneity Of 3DICs As A Security Vulnerability appeared first on Semiconductor Engineering.

A Hybrid ECO Detailed Placement Flow for Mitigating Dynamic IR Drop (UC San Diego)

A new technical paper titled “A Hybrid ECO Detailed Placement Flow for Improved Reduction of Dynamic IR Drop” was published by researchers at UC San Diego.

Abstract:

“With advanced semiconductor technology progressing well into sub-7nm scale, voltage drop has become an increasingly challenging issue. As a result, there has been extensive research focused on predicting and mitigating dynamic IR drops, leading to the development of IR drop engineering change order (ECO) flows – often integrated with modern commercial EDA tools. However, these tools encounter QoR limitations while mitigating IR drop. To address this, we propose a hybrid ECO detailed placement approach that is integrated with existing commercial EDA flows, to mitigate excessive peak current demands within power and ground rails. Our proposed hybrid approach effectively optimizes peak current levels within a specified “clip”– complementing and enhancing commercial EDA dynamic IR-driven ECO detailed placements. In particular, we: (i) order instances in a netlist in decreasing order of worst voltage drop; (ii) extract a clip around each instance; and (iii) solve an integer linear programming (ILP) problem to optimize instance placements. Our approach optimizes dynamic voltage drops (DVD) across ten designs by up to 15.3% compared to original conventional flows, with similar timing quality and 55.1% less runtime.”

Find the technical paper here. Published June 2024.

Andrew B. Kahng, Bodhisatta Pramanik, and Mingyu Woo. 2024. A Hybrid ECO Detailed Placement Flow for Improved Reduction of Dynamic IR Drop. In Proceedings of the Great Lakes Symposium on VLSI 2024 (GLSVLSI ’24). Association for Computing Machinery, New York, NY, USA, 390–396. https://doi.org/10.1145/3649476.3658727.

The post A Hybrid ECO Detailed Placement Flow for Mitigating Dynamic IR Drop (UC San Diego) appeared first on Semiconductor Engineering.

Classification and Localization of Semiconductor Defect Classes in Aggressive Pitches (imec, Screen)

A new technical paper titled “An Evaluation of Continual Learning for Advanced Node Semiconductor Defect Inspection” was published by Imec and SCREEN SPE Germany.

Abstract

“Deep learning-based semiconductor defect inspection has gained traction in recent years, offering a powerful and versatile approach that provides high accuracy, adaptability, and efficiency in detecting and classifying nano-scale defects. However, semiconductor manufacturing processes are continually evolving, leading to the emergence of new types of defects over time. This presents a significant challenge for conventional supervised defect detectors, as they may suffer from catastrophic forgetting when trained on new defect datasets, potentially compromising performance on previously learned tasks. An alternative approach involves the constant storage of previously trained datasets alongside pre-trained model versions, which can be utilized for (re-)training from scratch or fine-tuning whenever encountering a new defect dataset. However, adhering to such a storage template is impractical in terms of size, particularly when considering High-Volume Manufacturing (HVM). Additionally, semiconductor defect datasets, especially those encompassing stochastic defects, are often limited and expensive to obtain, thus lacking sufficient representation of the entire universal set of defectivity. This work introduces a task-agnostic, meta-learning approach aimed at addressing this challenge, which enables the incremental addition of new defect classes and scales to create a more robust and generalized model for semiconductor defect inspection. We have benchmarked our approach using real resist-wafer SEM (Scanning Electron Microscopy) datasets for two process steps, ADI and AEI, demonstrating its superior performance compared to conventional supervised training methods.”

Find the technical paper here.  Published July 2024 (preprint).

Prasad, Amit, Bappaditya Dey, Victor Blanco, and Sandip Halder. “An Evaluation of Continual Learning for Advanced Node Semiconductor Defect Inspection.” arXiv preprint arXiv:2407.12724 (2024).

The post Classification and Localization of Semiconductor Defect Classes in Aggressive Pitches (imec, Screen) appeared first on Semiconductor Engineering.

Survey of Energy Efficient PIM Processors

A new technical paper titled “Survey of Deep Learning Accelerators for Edge and Emerging Computing” was published by researchers at University of Dayton and the Air Force Research Laboratory.

Abstract

“The unprecedented progress in artificial intelligence (AI), particularly in deep learning algorithms with ubiquitous internet connected smart devices, has created a high demand for AI computing on the edge devices. This review studied commercially available edge processors, and the processors that are still in industrial research stages. We categorized state-of-the-art edge processors based on the underlying architecture, such as dataflow, neuromorphic, and processing in-memory (PIM) architecture. The processors are analyzed based on their performance, chip area, energy efficiency, and application domains. The supported programming frameworks, model compression, data precision, and the CMOS fabrication process technology are discussed. Currently, most commercial edge processors utilize dataflow architectures. However, emerging non-von Neumann computing architectures have attracted the attention of the industry in recent years. Neuromorphic processors are highly efficient for performing computation with fewer synaptic operations, and several neuromorphic processors offer online training for secured and personalized AI applications. This review found that the PIM processors show significant energy efficiency and consume less power compared to dataflow and neuromorphic processors. A future direction of the industry could be to implement state-of-the-art deep learning algorithms in emerging non-von Neumann computing paradigms for low-power computing on edge devices.”

Find the technical paper here. Published July 2024.

Alam, Shahanur, Chris Yakopcic, Qing Wu, Mark Barnell, Simon Khan, and Tarek M. Taha. 2024. “Survey of Deep Learning Accelerators for Edge and Emerging Computing” Electronics 13, no. 15: 2988. https://doi.org/10.3390/electronics13152988.

The post Survey of Energy Efficient PIM Processors appeared first on Semiconductor Engineering.

Secure Low-Cost In-DRAM Trackers For Mitigating Rowhammer (Georgia Tech, Google, Nvidia)

A new technical paper titled “MINT: Securely Mitigating Rowhammer with a Minimalist In-DRAM Tracker” was published by researchers at Georgia Tech, Google, and Nvidia.

Abstract
“This paper investigates secure low-cost in-DRAM trackers for mitigating Rowhammer (RH). In-DRAM solutions have the advantage that they can solve the RH problem within the DRAM chip, without relying on other parts of the system. However, in-DRAM mitigation suffers from two key challenges: First, the mitigations are synchronized with refresh, which means we cannot mitigate at arbitrary times. Second, the SRAM area available for aggressor tracking is severely limited, to only a few bytes. Existing low-cost in-DRAM trackers (such as TRR) have been broken by well-crafted access patterns, whereas prior counter-based schemes require impractical overheads of hundreds or thousands of entries per bank. The goal of our paper is to develop an ultra low-cost secure in-DRAM tracker.

Our solution is based on a simple observation: if only one row can be mitigated at refresh, then we should ideally need to track only one row. We propose a Minimalist In-DRAM Tracker (MINT), which provides secure mitigation with just a single entry. At each refresh, MINT probabilistically decides which activation in the upcoming interval will be selected for mitigation at the next refresh. MINT provides guaranteed protection against classic single and double-sided attacks. We also derive the minimum RH threshold (MinTRH) tolerated by MINT across all patterns. MINT has a MinTRH of 1482 which can be lowered to 356 with RFM. The MinTRH of MINT is lower than a prior counter-based design with 677 entries per bank, and is within 2x of the MinTRH of an idealized design that stores one-counter-per-row. We also analyze the impact of refresh postponement on the MinTRH of low-cost in-DRAM trackers, and propose an efficient solution to make such trackers compatible with refresh postponement.”

Find the technical paper here. Preprint published July 2024.

Qureshi, Moinuddin, Salman Qazi, and Aamer Jaleel. “MINT: Securely Mitigating Rowhammer with a Minimalist In-DRAM Tracker.” arXiv preprint arXiv:2407.16038 (2024).

The post Secure Low-Cost In-DRAM Trackers For Mitigating Rowhammer (Georgia Tech, Google, Nvidia) appeared first on Semiconductor Engineering.

MTJ-Based CRAM Array

A new technical paper titled “Experimental demonstration of magnetic tunnel junction-based computational random-access memory” was published by researchers at University of Minnesota and University of Arizona, Tucson.

Abstract

“The conventional computing paradigm struggles to fulfill the rapidly growing demands from emerging applications, especially those for machine intelligence because much of the power and energy is consumed by constant data transfers between logic and memory modules. A new paradigm, called “computational random-access memory (CRAM),” has emerged to address this fundamental limitation. CRAM performs logic operations directly using the memory cells themselves, without having the data ever leave the memory. The energy and performance benefits of CRAM for both conventional and emerging applications have been well established by prior numerical studies. However, there is a lack of experimental demonstration and study of CRAM to evaluate its computational accuracy, which is a realistic and application-critical metric for its technological feasibility and competitiveness. In this work, a CRAM array based on magnetic tunnel junctions (MTJs) is experimentally demonstrated. First, basic memory operations, as well as 2-, 3-, and 5-input logic operations, are studied. Then, a 1-bit full adder with two different designs is demonstrated. Based on the experimental results, a suite of models has been developed to characterize the accuracy of CRAM computation. Scalar addition, multiplication, and matrix multiplication, which are essential building blocks for many conventional and machine intelligence applications, are evaluated and show promising accuracy performance. With the confirmation of MTJ-based CRAM’s accuracy, there is a strong case that this technology will have a significant impact on power- and energy-demanding applications of machine intelligence.”

Find the technical paper here. Published July 2024.  Find the University of Minnesota’s news release here.

Lv, Y., Zink, B.R., Bloom, R.P. et al. Experimental demonstration of magnetic tunnel junction-based computational random-access memory. npj Unconv. Comput. 1, 3 (2024). https://doi.org/10.1038/s44335-024-00003-3.

The post MTJ-Based CRAM Array appeared first on Semiconductor Engineering.

NeuroHammer Attacks on ReRAM-Based Memories

A new technical paper titled “NVM-Flip: Non-Volatile-Memory BitFlips on the System Level” was published by researchers at Ruhr-University Bochum, University of Duisburg-Essen, and Robert Bosch.

Abstract
“Emerging non-volatile memories (NVMs) are promising candidates to substitute conventional memories due to their low access latency, high integration density, and non-volatility. These superior properties stem from the memristor representing the centerpiece of each memory cell and is branded as the fourth fundamental circuit element. Memristors encode information in the form of its resistance by altering the physical characteristics of their filament. Hence, each memristor can store multiple bits increasing the memory density and positioning it as a potential candidate to replace DRAM and SRAM-based memories, such as caches.

However, new security risks arise with the benefits of these emerging technologies, like the recent NeuroHammer attack, which allows adversaries to deliberately flip bits in ReRAMs. While NeuroHammer has been shown to flip single bits within memristive crossbar arrays, the system-level impact remains unclear. Considering the significance of the Rowhammer attack on conventional DRAMs, NeuroHammer can potentially cause crucial damage to applications taking advantage of emerging memory technologies.

To answer this question, we introduce NVgem5, a versatile system-level simulator based on gem5. NVgem5 is capable of injecting bit-flips in eNVMs originating from NeuroHammer. Our experiments evaluate the impact of the NeuroHammer attack on main and cache memories. In particular, we demonstrate a single-bit fault attack on cache memories leaking the secret key used during the computation of RSA signatures. Our findings highlight the need for improved hardware security measures to mitigate the risk of hardware-level attacks in computing systems based on eNVMs.”

Find the technical paper here. Published June 2024.

Felix Staudigl, Jan Philipp Thoma, Christian Niesler, Karl Sturm, Rebecca Pelke, Dominik Germek, Jan Moritz Joseph, Tim Güneysu, Lucas Davi, and Rainer Leupers. 2024. NVM-Flip: Non-Volatile-Memory BitFlips on the System Level. In Proceedings of the 2024 ACM Workshop on Secure and Trustworthy Cyber-Physical Systems (SaT-CPS ’24). Association for Computing Machinery, New York, NY, USA, 11–20. https://doi.org/10.1145/3643650.3658606

The post NeuroHammer Attacks on ReRAM-Based Memories appeared first on Semiconductor Engineering.

ML Method To Predict IR Drop Levels

A new technical paper titled “IR drop Prediction Based on Machine Learning and Pattern Reduction” was published by researchers at National Tsing Hua University, National Taiwan University of Science and Technology, and MediaTek.

Abstract (partial)
“In this paper, we propose a machine learning-based method to predict IR drop levels and present an algorithm for reducing simulation patterns, which could reduce the time and computing resources required for IR drop analysis within the ECO flow. Experimental results show that our approach can reduce the number of patterns by approximately 50%, thereby decreasing the analysis time while maintaining accuracy.”

Find the technical paper here. Published June 2024.

Yong-Fong Chang, Yung-Chih Chen, Yu-Chen Cheng, Shu-Hong Lin, Che-Hsu Lin, Chun-Yuan Chen, Yu-Hsuan Chen, Yu-Che Lee, Jia-Wei Lin, Hsun-Wei Pao, Shih-Chieh Chang, Yi-Ting Li, and Chun-Yao Wang. 2024. IR drop Prediction Based on Machine Learning and Pattern Reduction. In Proceedings of the Great Lakes Symposium on VLSI 2024 (GLSVLSI ’24). Association for Computing Machinery, New York, NY, USA, 516–519. https://doi.org/10.1145/3649476.3658775

The post ML Method To Predict IR Drop Levels appeared first on Semiconductor Engineering.

Power Electronic Packaging for Discrete Dies

A technical paper titled “Substrate Embedded Power Electronics Packaging for Silicon Carbide MOSFETs” was published by researchers at University of Cambridge, University of Warwick, Chongqing University, and SpaceX.

Abstract:

“This paper proposes a new power electronic packaging for discrete dies, namely Standard Cell which consists of a step-etched active metal brazed (AMB) substrate and a flexible printed circuit board (flex-PCB). The standard cell exhibits high thermal conductivity, complete electrical insulation, and low stray inductance, thereby enhancing the performance of SiC MOSFET devices. The standard cell has a stray power loop inductance of less than 1 nH and a gate loop inductance of less than 1.5 nH . The standard cell has a flat body with surface-mounting electrical connections on one side and direct thermal connections on the other. The use of flex-PCB die interconnection enables maximum utilization of source pads while providing a flexible gate-source connection and the converter PCB. This paper presents the design concept of the standard cell and experimentally validates its effectiveness in a converter system.”

Find the technical paper here. Published May 2024.

A. Janabi et al., “Substrate Embedded Power Electronics Packaging for Silicon Carbide MOSFETs,” in IEEE Transactions on Power Electronics, doi: 10.1109/TPEL.2024.3396779.

Related Reading
Big Shifts In Power Electronics Packaging
Packages are becoming more complex to endure high power, high temperature conditions across a variety of applications.
Power Semiconductors: A Deep Dive Into Materials, Manufacturing & Business
Premium Content: How these devices are made and work, challenges in manufacturing, related startups, as well as the reasons why so much effort and resources are being spent to develop new materials, and new processes.</

The post Power Electronic Packaging for Discrete Dies appeared first on Semiconductor Engineering.

Demonstrating Programmable Nonlinear Quantum Photonic ICs

A technical paper titled “Programmable Nonlinear Quantum Photonic Circuits” was published by researchers at Niels Bohr Institute, University of Copenhagen, University of Bristol, and Ruhr-Universitat Bochum.

Abstract:

“The lack of interactions between single photons prohibits direct nonlinear operations in quantum optical circuits, representing a central obstacle in photonic quantum technologies. Here, we demonstrate multi-mode nonlinear photonic circuits where both linear and direct nonlinear operations can be programmed with high precision at the single-photon level. Deterministic nonlinear interaction is realized with a tunable quantum dot embedded in a nanophotonic waveguide mediating interactions between individual photons within a temporal linear optical interferometer. We demonstrate the capability to reprogram the nonlinear photonic circuits and implement protocols where strong nonlinearities are required, in particular for quantum simulation of anharmonic molecular dynamics, thereby showcasing the new key functionalities enabled by our technology.”

Find the technical paper here. Published May 2024 (preprint).

Nielsen, Kasper H., Ying Wang, Edward Deacon, Patrik I. Sund, Zhe Liu, Sven Scholz, Andreas D. Wieck et al. “Programmable Nonlinear Quantum Photonic Circuits.” arXiv preprint arXiv:2405.17941 (2024).

Related Reading
The Race Toward Quantum Advantage
Enormous amounts of money have been invested into quantum computing, but so far it has not surpassed conventional computers. When will that change?
Photonics: The Former And Future Solution
Twenty-five years ago, photonics was supposed to be the future of high technology. Has that future finally arrived?

The post Demonstrating Programmable Nonlinear Quantum Photonic ICs appeared first on Semiconductor Engineering.

Dedicated Approximate Computing Framework To Efficiently Compute PCs On Hardware

A technical paper titled “On Hardware-efficient Inference in Probabilistic Circuits” was published by researchers at Aalto University and UCLouvain.

Abstract:

“Probabilistic circuits (PCs) offer a promising avenue to perform embedded reasoning under uncertainty. They support efficient and exact computation of various probabilistic inference tasks by design. Hence, hardware-efficient computation of PCs is highly interesting for edge computing applications. As computations in PCs are based on arithmetic with probability values, they are typically performed in the log domain to avoid underflow. Unfortunately, performing the log operation on hardware is costly. Hence, prior work has focused on computations in the linear domain, resulting in high resolution and energy requirements. This work proposes the first dedicated approximate computing framework for PCs that allows for low-resolution logarithm computations. We leverage Addition As Int, resulting in linear PC computation with simple hardware elements. Further, we provide a theoretical approximation error analysis and present an error compensation mechanism. Empirically, our method obtains up to 357x and 649x energy reduction on custom hardware for evidence and MAP queries respectively with little or no computational error.”

Find the technical paper here. Published May 2024 (preprint). CODE: https://github.com/lingyunyao/AAI_Probabilistic_Circuits

Yao, Lingyun, Martin Trapp, Jelin Leslin, Gaurav Singh, Peng Zhang, Karthekeyan Periasamy, and Martin Andraud. “On Hardware-efficient Inference in Probabilistic Circuits.” arXiv preprint arXiv:2405.13639 (2024).

Related Reading
Architecting Chips For High-Performance Computing
Data center IC designs are evolving, based on workloads, but making the tradeoffs for those workloads is not always straightforward.
AI Tradeoffs At The Edge
The best ways to optimize AI efficiency today, and other options under development.

The post Dedicated Approximate Computing Framework To Efficiently Compute PCs On Hardware appeared first on Semiconductor Engineering.

Characterizing and Evaluating A Quantum Processor Unit In A HPC Center

A new technical paper titled “Calibration and Performance Evaluation of a Superconducting Quantum Processor in an HPC Center” was published by researchers at Leibniz Supercomputing Centre, IQM Quantum Computers, and Technical University of Munich.

Abstract

“As quantum computers mature, they migrate from laboratory environments to HPC centers. This movement enables large-scale deployments, greater access to the technology, and deep integration into HPC in the form of quantum acceleration. In laboratory environments, specialists directly control the systems’ environments and operations at any time with hands-on access, while HPC centers require remote and autonomous operations with minimal physical contact. The requirement for automation of the calibration process needed by all current quantum systems relies on maximizing their coherence times and fidelities and, with that, their best performance. It is, therefore, of great significance to establish a standardized and automatic calibration process alongside unified evaluation standards for quantum computing performance to evaluate the success of the calibration and operation of the system. In this work, we characterize our in-house superconducting quantum computer, establish an automatic calibration process, and evaluate its performance through quantum volume and an application-specific algorithm. We also analyze readout errors and improve the readout fidelity, leaning on error mitigation.”

Find the technical paper here. Published May 2024.

X. Deng, S. Pogorzalek, F. Vigneau, P. Yang, M. Schulz and L. Schulz, “Calibration and Performance Evaluation of a Superconducting Quantum Processor in an HPC Center,” ISC High Performance 2024 Research Paper Proceedings (39th International Conference), Hamburg, Germany, 2024, pp. 1-9, doi: 10.23919/ISC.2024.10528924.

The post Characterizing and Evaluating A Quantum Processor Unit In A HPC Center appeared first on Semiconductor Engineering.

Device Characteristics of GAA-Structured CMOS and CTFET Under Varying Temperatures

A new technical paper titled “Vertical-Stack Nanowire Structure of MOS Inverter and TFET Inverter in Low-temperature Application” was published by researchers at National Tsing Hua University and National United University in Taiwan.

Abstract
“Tunneling field effect transistors (TFET) have emerged as promising candidates for integrated circuits beyond conventional metal oxide semiconductor field effect transistors (MOSFET) and could overcome the physical limit, which results in the subthreshold swing (SS) < 60mV/dec at room temperature. In this study, we compare the complementary TFET (CTFET) with complementary metal oxide semiconductor (CMOS) at low temperatures (70K) by using the Gate-All-Around (GAA) architecture. The experiment result clearly shows that the CTEFT inverter has better characteristics than the CMOS inverter in various temperatures. While operating at a fixed temperature, the CMOS inverter performs an excellent on/off ratio and SS, etc. However, when a CMOS inverter operates at varying temperatures, CMOS performs worse than CTFET. This is attributed to the influence of lattice scattering, leading to the instability of CMOS characteristics. Therefore, the CTFET inverter is suitable for operation in environments with varying temperatures, exhibiting high stability, which can be applied in space technology. The simulation tool TCAD has been used to investigate the characteristics of CMOS and CTFET at low temperatures.”

Find the technical paper here. Published June 2024.

C. -C. Tien and Y. -H. Lin, “Vertical-Stack Nanowire Structure of MOS Inverter and TFET Inverter in Low-temperature Application,” in IEEE Access, doi: 10.1109/ACCESS.2024.3410677.

The post Device Characteristics of GAA-Structured CMOS and CTFET Under Varying Temperatures appeared first on Semiconductor Engineering.

Efficient TNN Inference on RISC-V Processing Cores With Minimal HW Overhead

A new technical paper titled “xTern: Energy-Efficient Ternary Neural Network Inference on RISC-V-Based Edge Systems” was published by researchers at ETH Zurich and Universita di Bologna.

Abstract
“Ternary neural networks (TNNs) offer a superior accuracy-energy trade-off compared to binary neural networks. However, until now, they have required specialized accelerators to realize their efficiency potential, which has hindered widespread adoption. To address this, we present xTern, a lightweight extension of the RISC-V instruction set architecture (ISA) targeted at accelerating TNN inference on general-purpose cores. To complement the ISA extension, we developed a set of optimized kernels leveraging xTern, achieving 67% higher throughput than their 2-bit equivalents. Power consumption is only marginally increased by 5.2%, resulting in an energy efficiency improvement by 57.1%. We demonstrate that the proposed xTern extension, integrated into an octa-core compute cluster, incurs a minimal silicon area overhead of 0.9% with no impact on timing. In end-to-end benchmarks, we demonstrate that xTern enables the deployment of TNNs achieving up to 1.6 percentage points higher CIFAR-10 classification accuracy than 2-bit networks at equal inference latency. Our results show that xTern enables RISC-V-based ultra-low-power edge AI platforms to benefit from the efficiency potential of TNNs.”

Find the technical paper here. Published May 2024.

Rutishauser, Georg, Joan Mihali, Moritz Scherer, and Luca Benini. “xTern: Energy-Efficient Ternary Neural Network Inference on RISC-V-Based Edge Systems.” arXiv preprint arXiv:2405.19065 (2024).

The post Efficient TNN Inference on RISC-V Processing Cores With Minimal HW Overhead appeared first on Semiconductor Engineering.

Adoption of Chiplet Technology in the Automotive Industry

A technical paper titled “Chiplets on Wheels: Review Paper on Holistic Chiplet Solutions for Autonomous Vehicles” was published by researchers at the Indian Institute of Technology, Madras.

Abstract
“On the advent of the slow death of Moore’s law, the silicon industry is moving towards a new era of chiplets. The automotive industry is experiencing a profound transformation towards software-defined vehicles, fueled by the surging demand for automotive compute chips, expected to reach 20-22 billion by 2030. High-performance compute (HPC) chips become instrumental in meeting the soaring demand for computational power. Various strategies, including centralized electrical and electronic architecture and the innovative Chiplet Systems, are under exploration. The latter, breaking down System-on-Chips (SoCs) into functional units, offers unparalleled customization and integration possibilities. The research accentuates the crucial open Chiplet ecosystem, fostering collaboration and enhancing supply chain resilience. In this paper, we address the unique challenges that arise when attempting to leverage chiplet-based architecture to design a holistic silicon solution for the automotive industry. We propose a throughput-oriented micro-architecture for ADAS and infotainment systems alongside a novel methodology to evaluate chiplet architectures. Further, we develop in-house simulation tools leveraging the gem5 framework to simulate latency and throughput. Finally, we perform an extensive design of thermally-aware chiplet placement and develop a micro-fluids-based cooling design.”

Find the technical paper here. Published May 2024.

Narashiman, Swathi, Divyaratna Joshi, Deepak Sridhar, Harish Rajesh, Sanjay Sattva, and Varun Manjunath. “Chiplets on Wheels: Review Paper on Holistic Chiplet Solutions for Autonomous Vehicles.” arXiv preprint arXiv:2406.00182 (2024).

The post Adoption of Chiplet Technology in the Automotive Industry appeared first on Semiconductor Engineering.

Comparing Thermal Properties In Molybdenum Substrate To Si And Glass For A System-On-Foil Integration (RIT, Lux)

A technical paper titled “Comparative Analysis of Thermal Properties in Molybdenum Substrate to Silicon and Glass for a System-on-Foil Integration” was published by researchers at Rochester Institute of Technology and Lux Semiconductors.

Abstract:

“Advanced electronics technology is moving towards smaller footprints and higher computational power. In order to achieve this, advanced packaging techniques are currently being considered, including organic, glass, and semiconductor-based substrates that allow for 2.5D or 3D integration of chips and devices. Metal-core substrates are a new alternative with similar properties to those of semiconductor-based substrates but with the added benefits of higher flexibility and metal ductility. This work comprehensively compares the thermal properties of a novel metal-based substrate, molybdenum, and silicon and fused silica glass substrates in the context of system-on-foil (SoF) integration. A simple electronic technique is used to simulate the heat generated by a typical CPU and to measure the heat dissipation properties of the substrates. The results indicate that molybdenum and silicon are able to effectively dissipate a continuous power density of 2.3 W/mm2 as the surface temperature only increases by ~15°C. In contrast, the surface temperature of fused silica glass substrates increases by >140°C for the same applied power. These simple techniques and measurements were validated with infrared camera measurements as well as through finite element analysis via COMSOL simulation. The results validate the use of molybdenum as an advanced packaging substrate and can be used to characterize new substrates and approaches for advanced packaging.”

Find the technical paper here. Published May 2024.

Huang, Tzu-Jung, Tobias Kiebala, Paul Suflita, Chad Moore, Graeme Housser, Shane McMahon, and Ivan Puchades. 2024. “Comparative Analysis of Thermal Properties in Molybdenum Substrate to Silicon and Glass for a System-on-Foil Integration” Electronics 13, no. 10: 1818. https://doi.org/10.3390/electronics13101818

Related Reading
The Race To Glass Substrates
Replacing silicon and organic substrates requires huge shifts in manufacturing, creating challenges that will take years to iron out.

 

The post Comparing Thermal Properties In Molybdenum Substrate To Si And Glass For A System-On-Foil Integration (RIT, Lux) appeared first on Semiconductor Engineering.

Using Formal Verification To Evaluate The HW Reliability Of A RISC-V Ibex Core In The Presence Of Soft Errors

A technical paper titled “Using Formal Verification to Evaluate Single Event Upsets in a RISC-V Core” was published by researchers at University of Southampton.

Abstract:

“Reliability has been a major concern in embedded systems. Higher transistor density and lower voltage supply increase the vulnerability of embedded systems to soft errors. A Single Event Upset (SEU), which is also called a soft error, can reverse a bit in a sequential element, resulting in a system failure. Simulation-based fault injection has been widely used to evaluate reliability, as suggested by ISO26262. However, it is practically impossible to test all faults for a complex design. Random fault injection is a compromise that reduces accuracy and fault coverage. Formal verification is an alternative approach. In this paper, we use formal verification, in the form of model checking, to evaluate the hardware reliability of a RISC-V Ibex Core in the presence of soft errors. Backward tracing is performed to identify and categorize faults according to their effects (no effect, Silent Data Corruption, crashes, and hangs). By using formal verification, the entire state space and fault list can be exhaustively explored. It is found that misaligned instructions can amplify fault effects. It is also found that some bits are more vulnerable to SEUs than others. In general, most of the bits in the Ibex Core are vulnerable to Silent Data Corruption, and the second pipeline stage is more vulnerable to Silent Data Corruption than the first.”

Find the technical paper here. Published May 2024 (preprint).

Xue, Bing, and Mark Zwolinski. “Using Formal Verification to Evaluate Single Event Upsets in a RISC-V Core.” arXiv preprint arXiv:2405.12089 (2024).

Related Reading
Formal Verification’s Usefulness Widens
Demand for IC reliability pushes formal into new applications, where complex interactions and security risks are difficult to solve with other tools.
RISC-V Micro-Architectural Verification
Verifying a processor is much more than making sure the instructions work, but the industry is building from a limited knowledge base and few dedicated tools.

The post Using Formal Verification To Evaluate The HW Reliability Of A RISC-V Ibex Core In The Presence Of Soft Errors appeared first on Semiconductor Engineering.

CAM-Based CMOS Implementation Of Reference Frames For Neuromorphic Processors (Carnegie Mellon U.)

A technical paper titled “NeRTCAM: CAM-Based CMOS Implementation of Reference Frames for Neuromorphic Processors” was published by researchers at Carnegie Mellon University.

Abstract:

“Neuromorphic architectures mimicking biological neural networks have been proposed as a much more efficient alternative to conventional von Neumann architectures for the exploding compute demands of AI workloads. Recent neuroscience theory on intelligence suggests that Cortical Columns (CCs) are the fundamental compute units in the neocortex and intelligence arises from CC’s ability to store, predict and infer information via structured Reference Frames (RFs). Based on this theory, recent works have demonstrated brain-like visual object recognition using software simulation. Our work is the first attempt towards direct CMOS implementation of Reference Frames for building CC-based neuromorphic processors. We propose NeRTCAM (Neuromorphic Reverse Ternary Content Addressable Memory), a CAM-based building block that supports the key operations (store, predict, infer) required to perform inference using RFs. NeRTCAM architecture is presented in detail including its key components. All designs are implemented in SystemVerilog and synthesized in 7nm CMOS, and hardware complexity scaling is evaluated for varying storage sizes. NeRTCAM system for biologically motivated MNIST inference with a storage size of 1024 entries incurs just 0.15 mm^2 area, 400 mW power and 9.18 us critical path latency, demonstrating the feasibility of direct CMOS implementation of CAM-based Reference Frames.”

Find the technical paper here. Published May 2024 (preprint).

Nair, Harideep, William Leyman, Agastya Sampath, Quinn Jacobson, and John Paul Shen. “NeRTCAM: CAM-Based CMOS Implementation of Reference Frames for Neuromorphic Processors.” arXiv preprint arXiv:2405.11844 (2024).

Related Reading
Running More Efficient AI/ML Code With Neuromorphic Engines
Once a buzzword, neuromorphic engineering is gaining traction in the semiconductor industry.

The post CAM-Based CMOS Implementation Of Reference Frames For Neuromorphic Processors (Carnegie Mellon U.) appeared first on Semiconductor Engineering.

DRAM Microarchitectures And Their Impacts On Activate-Induced Bitflips Such As RowHammer 

A technical paper titled “DRAMScope: Uncovering DRAM Microarchitecture and Characteristics by Issuing Memory Commands” was published by researchers at Seoul National University and University of Illinois at Urbana-Champaign.

Abstract:

“The demand for precise information on DRAM microarchitectures and error characteristics has surged, driven by the need to explore processing in memory, enhance reliability, and mitigate security vulnerability. Nonetheless, DRAM manufacturers have disclosed only a limited amount of information, making it difficult to find specific information on their DRAM microarchitectures. This paper addresses this gap by presenting more rigorous findings on the microarchitectures of commodity DRAM chips and their impacts on the characteristics of activate-induced bitflips (AIBs), such as RowHammer and RowPress. The previous studies have also attempted to understand the DRAM microarchitectures and associated behaviors, but we have found some of their results to be misled by inaccurate address mapping and internal data swizzling, or lack of a deeper understanding of the modern DRAM cell structure. For accurate and efficient reverse-engineering, we use three tools: AIBs, retention time test, and RowCopy, which can be cross-validated. With these three tools, we first take a macroscopic view of modern DRAM chips to uncover the size, structure, and operation of their subarrays, memory array tiles (MATs), and rows. Then, we analyze AIB characteristics based on the microscopic view of the DRAM microarchitecture, such as 6F^2 cell layout, through which we rectify misunderstandings regarding AIBs and discover a new data pattern that accelerates AIBs. Lastly, based on our findings at both macroscopic and microscopic levels, we identify previously unknown AIB vulnerabilities and propose a simple yet effective protection solution.”

Find the technical paper here. Published May 2024.

Nam, Hwayong, Seungmin Baek, Minbok Wi, Michael Jaemin Kim, Jaehyun Park, Chihun Song, Nam Sung Kim, and Jung Ho Ahn. “DRAMScope: Uncovering DRAM Microarchitecture and Characteristics by Issuing Memory Commands.” arXiv preprint arXiv:2405.02499 (2024).

Related Reading
Securing DRAM Against Evolving Rowhammer Threats
A multi-layered, system-level approach is crucial to DRAM protection.

 

The post DRAM Microarchitectures And Their Impacts On Activate-Induced Bitflips Such As RowHammer  appeared first on Semiconductor Engineering.

Competitive Open-Source EDA Tools

A technical paper titled “Basilisk: Achieving Competitive Performance with Open EDA Tools on an Open-Source Linux-Capable RISC-V SoC” was published by researchers at ETH Zurich and University of Bologna.

Abstract:

“We introduce Basilisk, an optimized application-specific integrated circuit (ASIC) implementation and design flow building on the end-to-end open-source Iguana system-on-chip (SoC). We present enhancements to synthesis tools and logic optimization scripts improving quality of results (QoR), as well as an optimized physical design with an improved power grid and cell placement integration enabling a higher core utilization. The tapeout-ready version of Basilisk implemented in IHP’s open 130 nm technology achieves an operation frequency of 77 MHz (51 logic levels) under typical conditions, a 2.3x improvement compared to the baseline open-source EDA design flow presented in Iguana, and a higher 55% core utilization compared to 50% in the baseline design. Through collaboration with EDA tool developers and domain experts, Basilisk exemplifies a synergistic effort towards competitive open-source electronic design automation (EDA) tools for research and industry applications.”

Find the technical paper here. Published May 2024.

Sauter, Phillippe, Thomas Benz, Paul Scheffler, Zerun Jiang, Beat Muheim, Frank K. Gürkaynak, and Luca Benini. “Basilisk: Achieving Competitive Performance with Open EDA Tools on an Open-Source Linux-Capable RISC-V SoC.” arXiv preprint arXiv:2405.03523 (2024).

Related Reading
EDA Back On Investors’ Radar
Big changes are fueling growth, and it’s showing in EDA revenue, acquisitions, and stock prices.
RISC-V Wants All Your Cores
It is not enough to want to dominate the world of CPUs. RISC-V has every core in its sights, and it’s starting to take steps to get there.

The post Competitive Open-Source EDA Tools appeared first on Semiconductor Engineering.

Distributing RTL Simulation Across Thousands Of Cores On 4 IPU Sockets (EPFL)

A technical paper titled “Parendi: Thousand-Way Parallel RTL Simulation” was published by researchers at EPFL.

Abstract:

“Hardware development relies on simulations, particularly cycle-accurate RTL (Register Transfer Level) simulations, which consume significant time. As single-processor performance grows only slowly, conventional, single-threaded RTL simulation is becoming less practical for increasingly complex chips and systems. A solution is parallel RTL simulation, where ideally, simulators could run on thousands of parallel cores. However, existing simulators can only exploit tens of cores.
This paper studies the challenges inherent in running parallel RTL simulation on a multi-thousand-core machine (the Graphcore IPU, a 1472-core machine). Simulation performance requires balancing three factors: synchronization, communication, and computation. We experimentally evaluate each metric and analyze how it affects parallel simulation speed, drawing on contrasts between the large-scale IPU and smaller but faster x86 systems.
Using this analysis, we build Parendi, an RTL simulator for the IPU. It distributes RTL simulation across 5888 cores on 4 IPU sockets. Parendi runs large RTL designs up to 4x faster than a powerful, state-of-the-art x86 multicore system.”

Find the technical paper here. Published March 2024 (preprint).

Emami, Mahyar, Thomas Bourgeat, and James Larus. “Parendi: Thousand-Way Parallel RTL Simulation.” arXiv preprint arXiv:2403.04714 (2024).

Related Reading
Anatomy Of A System Simulation
Balancing the benefits of a model with the costs associated with that model is tough, but it becomes even trickier when dissimilar models are combined.

The post Distributing RTL Simulation Across Thousands Of Cores On 4 IPU Sockets (EPFL) appeared first on Semiconductor Engineering.

A Micro Light-Emitting Transistor With An N-Channel GaN FET In Series With A GaN LED

A technical paper titled “Tunnel Junction-Enabled Monolithically Integrated GaN Micro-Light Emitting Transistor” was published by researchers at the Ohio State University and Sandia National Laboratory.

Abstract:

“GaN/InGaN microLEDs are a very promising technology for next generation displays. Switching control transistors and their integration are key components in achieving high-performance, efficient displays. Monolithic integration of microLEDs with GaN switching devices provides an opportunity to control microLED output power with capacitive (voltage) control rather than current controlled schemes. This approach can greatly reduce system complexity for the driver circuit arrays while maintaining device opto-electronic performance. In this work, we demonstrate a 3-terminal GaN micro-light emitting transistor that combines a GaN/InGaN blue tunneling-based microLED with a GaN n-channel FET. The integrated device exhibits excellent gate control, drain current control and optical emission control. This work provides a promising pathway for future monolithic integration of GaN FETs with microLED to enable fast switching high efficiency microLED display and communication systems.”

Find the technical paper here. Published April 2024.

Rahman, Sheikh Ifatur, Mohammad Awwad, Chandan Joishi, Zane-Jamal Eddine, Brendan Gunning, Andrew Armstrong, and Siddharth Rajan. “Tunnel Junction-Enabled Monolithically Integrated GaN Micro-Light Emitting Transistor.” arXiv preprint arXiv:2404.05095 (2024).

The post A Micro Light-Emitting Transistor With An N-Channel GaN FET In Series With A GaN LED appeared first on Semiconductor Engineering.

Voltage Reference Architectures For Harsh Environments: Quantum Computing And Space

A technical paper titled “Cryo-CMOS Voltage References for the Ultrawide Temperature Range From 300 K Down to 4.2 K” was published by researchers at Delft University of Technology, QuTech, Kavli Institute of Nanoscience Delft, and École Polytechnique Fédérale de Lausanne (EPFL).

Abstract:

“This article presents a family of sub-1-V, fully-CMOS voltage references adopting MOS devices in weak inversion to achieve continuous operation from room temperature (RT) down to cryogenic temperatures. Their accuracy limitations due to curvature, body effect, and mismatch are investigated and experimentally validated. Implemented in 40-nm CMOS, the references show a line regulation better than 2.7%/V from a supply as low as 0.99 V. By applying dynamic element matching (DEM) techniques, a spread of 1.2% (3 σ ) from 4.2 to 300 K can be achieved, resulting in a temperature coefficient (TC) of 111 ppm/K. As the first significant statistical characterization extending down to cryogenic temperatures, the results demonstrate the ability of the proposed architectures to work under cryogenic harsh environments, such as space-and quantum-computing applications.”

Find the technical paper here. Published April 2024.

J. van Staveren et al., “Cryo-CMOS Voltage References for the Ultrawide Temperature Range From 300 K Down to 4.2 K,” in IEEE Journal of Solid-State Circuits, doi: 10.1109/JSSC.2024.3378768.

Further Reading
The Race Toward Quantum Advantage
Enormous amounts of money have been invested into quantum computing, but so far it has not surpassed conventional computers. When will that change?
Managing P/P Tradeoffs With Voltage Droop Gets Trickier
Higher current densities set against lower power envelopes makes meeting specs more challenging, especially at advanced nodes.

The post Voltage Reference Architectures For Harsh Environments: Quantum Computing And Space appeared first on Semiconductor Engineering.

Framework For Early Anomaly Detection In AMS Components Of Automotive SoCs

A technical paper titled “Enhancing Functional Safety in Automotive AMS Circuits through Unsupervised Machine Learning” was published by researchers at University of Texas at Dallas, Intel Corporation, NXP Semiconductors, and Texas Instruments.

Abstract:

“Given the widespread use of safety-critical applications in the automotive field, it is crucial to ensure the Functional Safety (FuSa) of circuits and components within automotive systems. The Analog and Mixed-Signal (AMS) circuits prevalent in these systems are more vulnerable to faults induced by parametric perturbations, noise, environmental stress, and other factors, in comparison to their digital counterparts. However, their continuous signal characteristics present an opportunity for early anomaly detection, enabling the implementation of safety mechanisms to prevent system failure. To address this need, we propose a novel framework based on unsupervised machine learning for early anomaly detection in AMS circuits. The proposed approach involves injecting anomalies at various circuit locations and individual components to create a diverse and comprehensive anomaly dataset, followed by the extraction of features from the observed circuit signals. Subsequently, we employ clustering algorithms to facilitate anomaly detection. Finally, we propose a time series framework to enhance and expedite anomaly detection performance. Our approach encompasses a systematic analysis of anomaly abstraction at multiple levels pertaining to the automotive domain, from hardware- to block-level, where anomalies are injected to create diverse fault scenarios. By monitoring the system behavior under these anomalous conditions, we capture the propagation of anomalies and their effects at different abstraction levels, thereby potentially paving the way for the implementation of reliable safety mechanisms to ensure the FuSa of automotive SoCs. Our experimental findings indicate that our approach achieves 100% anomaly detection accuracy and significantly optimizes the associated latency by 5X, underscoring the effectiveness of our devised solution.”

Find the technical paper here. Published April 2024 (preprint).

Arunachalam, Ayush, Ian Kintz, Suvadeep Banerjee, Arnab Raha, Xiankun Jin, Fei Su, Viswanathan Pillai Prasanth, Rubin A. Parekhji, Suriyaprakash Natarajan, and Kanad Basu. “Enhancing Functional Safety in Automotive AMS Circuits through Unsupervised Machine Learning.” arXiv preprint arXiv:2404.01632 (2024).

Related Reading
Creating IP In The Shadow Of ISO 26262
Automotive regulations can turn an interesting chip design project into a complex and often frustrating checklist exercise. In the case of ISO 26262, that includes a 12-part standard for automotive safety.
Shifting Left Using Model-Based Engineering
MBSE becomes useful for identifying potential problems earlier in the design flow, but it’s not perfect.

 

The post Framework For Early Anomaly Detection In AMS Components Of Automotive SoCs appeared first on Semiconductor Engineering.

Metrology For 2D Materials: A Review From The International Roadmap For Devices And Systems (NIST, Et Al.)

A technical paper titled “Metrology for 2D materials: a perspective review from the international roadmap for devices and systems” was published by researchers at Arizona State University, IBM Research, Unity-SC, and the National Institute of Standards and Technology (NIST).

Abstract:

“The International Roadmap for Devices and Systems (IRDS) predicts the integration of 2D materials into high-volume manufacturing as channel materials within the next decade, primarily in ultra-scaled and low-power devices. While their widespread adoption in advanced chip manufacturing is evolving, the need for diverse characterization methods is clear. This is necessary to assess structural, electrical, compositional, and mechanical properties to control and optimize 2D materials in mass-produced devices. Although the lab-to-fab transition remains nascent and a universal metrology solution is yet to emerge, rapid community progress underscores the potential for significant advancements. This paper reviews current measurement capabilities, identifies gaps in essential metrology for CMOS-compatible 2D materials, and explores fundamental measurement science limitations when applying these techniques in high-volume semiconductor manufacturing.”

Find the technical paper here. Published April 2024.

Changming Wu et al., Freeform direct-write and rewritable photonic integrated circuits in phase-change thin films.Sci. Adv.10,eadk1361(2024).DOI:10.1126/sciadv.adk1361

Further Reading
Closing The Test And Metrology Gap In 3D-IC Packages
Finding defects in stacked die is a daunting challenge. Equipment, processes, and methodologies all need modifications, and that’s just for starters.
Pressure Builds On Failure Analysis Labs
Goal is to find the causes of failures faster and much earlier — preferably before first silicon.

The post Metrology For 2D Materials: A Review From The International Roadmap For Devices And Systems (NIST, Et Al.) appeared first on Semiconductor Engineering.

Merging Power and Arithmetic Optimization Via Datapath Rewriting (Intel, Imperial College London)

A new technical paper titled “Combining Power and Arithmetic Optimization via Datapath Rewriting” was published by researchers at Intel Corporation and Imperial College London.

Abstract:
“Industrial datapath designers consider dynamic power consumption to be a key metric. Arithmetic circuits contribute a major component of total chip power consumption and are therefore a common target for power optimization. While arithmetic circuit area and dynamic power consumption are often correlated, there is also a tradeoff to consider, as additional gates can be added to explicitly reduce arithmetic circuit activity and hence reduce power consumption. In this work, we consider two forms of power optimization and their interaction: circuit area reduction via arithmetic optimization, and the elimination of redundant computations using both data and clock gating. By encoding both these classes of optimization as local rewrites of expressions, our tool flow can simultaneously explore them, uncovering new opportunities for power saving through arithmetic rewrites using the e-graph data structure. Since power consumption is highly dependent upon the workload performed by the circuit, our tool flow facilitates a data dependent design paradigm, where an implementation is automatically tailored to particular contexts of data activity. We develop an automated RTL to RTL optimization framework, ROVER, that takes circuit input stimuli and generates power-efficient architectures. We evaluate the effectiveness on both open-source arithmetic benchmarks and benchmarks derived from Intel production examples. The tool is able to reduce the total power consumption by up to 33.9%.”

Find the technical paper here. Published April 2024.

Samuel Coward, Theo Drane, Emiliano Morini, George Constantinides; arXiv:2404.12336v1.

The post Merging Power and Arithmetic Optimization Via Datapath Rewriting (Intel, Imperial College London) appeared first on Semiconductor Engineering.

Merging Power and Arithmetic Optimization Via Datapath Rewriting (Intel, Imperial College London)

A new technical paper titled “Combining Power and Arithmetic Optimization via Datapath Rewriting” was published by researchers at Intel Corporation and Imperial College London.

Abstract:
“Industrial datapath designers consider dynamic power consumption to be a key metric. Arithmetic circuits contribute a major component of total chip power consumption and are therefore a common target for power optimization. While arithmetic circuit area and dynamic power consumption are often correlated, there is also a tradeoff to consider, as additional gates can be added to explicitly reduce arithmetic circuit activity and hence reduce power consumption. In this work, we consider two forms of power optimization and their interaction: circuit area reduction via arithmetic optimization, and the elimination of redundant computations using both data and clock gating. By encoding both these classes of optimization as local rewrites of expressions, our tool flow can simultaneously explore them, uncovering new opportunities for power saving through arithmetic rewrites using the e-graph data structure. Since power consumption is highly dependent upon the workload performed by the circuit, our tool flow facilitates a data dependent design paradigm, where an implementation is automatically tailored to particular contexts of data activity. We develop an automated RTL to RTL optimization framework, ROVER, that takes circuit input stimuli and generates power-efficient architectures. We evaluate the effectiveness on both open-source arithmetic benchmarks and benchmarks derived from Intel production examples. The tool is able to reduce the total power consumption by up to 33.9%.”

Find the technical paper here. Published April 2024.

Samuel Coward, Theo Drane, Emiliano Morini, George Constantinides; arXiv:2404.12336v1.

The post Merging Power and Arithmetic Optimization Via Datapath Rewriting (Intel, Imperial College London) appeared first on Semiconductor Engineering.

Ultrathin vdW Ferromagnet at Room Temperature (MIT)

A technical paper titled “Current-induced switching of a van der Waals ferromagnet at room temperature” was published by researchers at Massachusetts Institute of Technology (MIT).

Abstract:

“Recent discovery of emergent magnetism in van der Waals magnetic materials (vdWMM) has broadened the material space for developing spintronic devices for energy-efficient computation. While there has been appreciable progress in vdWMM discovery, a solution for non-volatile, deterministic switching of vdWMMs at room temperature has been missing, limiting the prospects of their adoption into commercial spintronic devices. Here, we report the first demonstration of current-controlled non-volatile, deterministic magnetization switching in a vdW magnetic material at room temperature. We have achieved spin-orbit torque (SOT) switching of the PMA vdW ferromagnet Fe3GaTe2  using a Pt spin-Hall layer up to 320 K, with a threshold switching current density as low as Jsw = 1.69 × 106 A cm-2 at room temperature. We have also quantitatively estimated the anti-damping-like SOT efficiency of our Fe3GaTe2/Pt bilayer system to be ξDL = 0:093, using the second harmonic Hall voltage measurement technique. These results mark a crucial step in making vdW magnetic materials a viable choice for the development of scalable, energy-efficient spintronic devices.”

Find the technical paper here. Published February 2024. MIT’s related news article and video is here.

Kajale, S.N., Nguyen, T., Chao, C.A. et al. Current-induced switching of a van der Waals ferromagnet at room temperature. Nat Commun 15, 1485 (2024). https://doi.org/10.1038/s41467-024-45586-4

 

 

The post Ultrathin vdW Ferromagnet at Room Temperature (MIT) appeared first on Semiconductor Engineering.

K-Fault Resistant Partitioning To Assess Redundancy-Based HW Countermeasures To Fault Injections

A technical paper titled “Fault-Resistant Partitioning of Secure CPUs for System Co-Verification against Faults” was published by researchers at Université Paris-Saclay, Graz University of Technology, lowRISC, University Grenoble Alpes, Thales, and Sorbonne University.

Abstract:

“To assess the robustness of CPU-based systems against fault injection attacks, it is necessary to analyze the consequences of the fault propagation resulting from the intricate interaction between the software and the processor. However, current formal methodologies that combine both hardware and software aspects experience scalability issues, primarily due to the use of bounded verification techniques. This work formalizes the notion of k-fault resistant partitioning as an inductive solution to this fault propagation problem when assessing redundancy-based hardware countermeasures to fault injections. Proven security guarantees can then reduce the remaining hardware attack surface to consider in a combined analysis with the software, enabling a full co-verification methodology. As a result, we formally verify the robustness of the hardware lockstep countermeasure of the OpenTitan secure element to single bit-flip injections. Besides that, we demonstrate that previously intractable problems, such as analyzing the robustness of OpenTitan running a secure boot process, can now be solved by a co-verification methodology that leverages a k-fault resistant partitioning. We also report a potential exploitation of the register file vulnerability in two other software use cases. Finally, we provide a security fix for the register file, verify its robustness, and integrate it into the OpenTitan project.”

Find the technical paper here. Published 2024 (preprint).

Tollec, Simon, Vedad Hadžić, Pascal Nasahl, Mihail Asavoae, Roderick Bloem, Damien Couroussé, Karine Heydemann, Mathieu Jan, and Stefan Mangard. “Fault-Resistant Partitioning of Secure CPUs for System Co-Verification against Faults.” Cryptology ePrint Archive (2024).

Related Reading
RISC-V Micro-Architectural Verification
Verifying a processor is much more than making sure the instructions work, but the industry is building from a limited knowledge base and few dedicated tools.
New Concepts Required For Security Verification
Why it’s so difficult to ensure that hardware works correctly and is capable of detecting vulnerabilities that may show up in the field.

The post K-Fault Resistant Partitioning To Assess Redundancy-Based HW Countermeasures To Fault Injections appeared first on Semiconductor Engineering.

White-Box Fuzzer With Static Analysis To Detect And Locate Timing Vulnerabilities In RISC-V Processors 

A technical paper titled “WhisperFuzz: White-Box Fuzzing for Detecting and Locating Timing Vulnerabilities in Processors” was published by researchers at Indian Institute of Technology Madras, Texas A&M University, and
Technische Universität Darmstadt.

Abstract:

“Timing vulnerabilities in processors have emerged as a potent threat. As processors are the foundation of any computing system, identifying these flaws is imperative. Recently fuzzing techniques, traditionally used for detecting software vulnerabilities, have shown promising results for uncovering vulnerabilities in large-scale hardware designs, such as processors. Researchers have adapted black-box or grey-box fuzzing to detect timing vulnerabilities in processors. However, they cannot identify the locations or root causes of these timing vulnerabilities, nor do they provide coverage feedback to enable the designer’s confidence in the processor’s security.
To address the deficiencies of the existing fuzzers, we present WhisperFuzz–the first white-box fuzzer with static analysis–aiming to detect and locate timing vulnerabilities in processors and evaluate the coverage of microarchitectural timing behaviors. WhisperFuzz uses the fundamental nature of processors’ timing behaviors, microarchitectural state transitions, to localize timing vulnerabilities. WhisperFuzz automatically extracts microarchitectural state transitions from a processor design at the register-transfer level (RTL) and instruments the design to monitor the state transitions as coverage. Moreover, WhisperFuzz measures the time a design-under-test (DUT) takes to process tests, identifying any minor, abnormal variations that may hint at a timing vulnerability. WhisperFuzz detects 12 new timing vulnerabilities across advanced open-sourced RISC-V processors: BOOM, Rocket Core, and CVA6. Eight of these violate the zero latency requirements of the Zkt extension and are considered serious security vulnerabilities. Moreover, WhisperFuzz also pinpoints the locations of the new and the existing vulnerabilities.”

Find the technical paper here. Published February 2024 (preprint).

Borkar, Pallavi, Chen Chen, Mohamadreza Rostami, Nikhilesh Singh, Rahul Kande, Ahmad-Reza Sadeghi, Chester Rebeiro, and Jeyavijayan Rajendran. “WhisperFuzz: White-Box Fuzzing for Detecting and Locating Timing Vulnerabilities in Processors.” arXiv preprint arXiv:2402.03704 (2024).

Related Reading
RISC-V Micro-Architectural Verification
Verifying a processor is much more than making sure the instructions work, but the industry is building from a limited knowledge base and few dedicated tools.
What’s Required To Secure Chips
There is no single solution, and the most comprehensive security may be too expensive.

The post White-Box Fuzzer With Static Analysis To Detect And Locate Timing Vulnerabilities In RISC-V Processors  appeared first on Semiconductor Engineering.

Impact of Scaling and BEOL Technology Solutions At The 7nm Node On MRAM

A technical paper titled “Impact of Technology Scaling and Back-End-of-the-Line Technology Solutions on Magnetic Random-Access Memories” was published by researchers at Georgia Institute of Technology.

Abstract:

“While magnetic random-access memories (MRAMs) are promising because of their nonvolatility, relatively fast speeds, and high endurance, there are major challenges in adopting them for the advanced technology nodes. One of the major challenges in scaling MRAM devices is caused by the ever-increasing resistances of interconnects. In this article, we first study the impact of shrunk interconnect dimensions on MRAM performance at various technology nodes. Then, we investigate the impact of various potential back-end-of-the-line (BEOL) technology solutions at the 7 nm node. Based on interconnect resistance values from technology computer-aided design (TCAD) simulations and MRAM device characteristics from experimentally validated/calibrated physical models, we quantify the potential array-level performance of MRAM using SPICE simulations. We project that some potential BEOL technology solutions can reduce the write energy by up to 34.6% with spin-orbit torque (SOT) MRAM and 29.0% with spin-transfer torque (STT) MRAM. We also observe up to 21.4% reduction in the read energy of the SOT-MRAM arrays.”

Find the technical paper here. Published January 2024.

P. Kumar, D. E. Shim, S. Narla and A. Naeemi, “Impact of Technology Scaling and Back-End-of-the-Line Technology Solutions on Magnetic Random-Access Memories,” in IEEE Journal on Exploratory Solid-State Computational Devices and Circuits, vol. 10, pp. 13-21, 2024, doi: 10.1109/JXCDC.2024.3357625.

Related Reading
MRAM Getting More Attention At Smallest Nodes
Why this 25-year-old technology may be the memory of choice for leading edge designs and in automotive applications.
ReRAM Seeks To Replace NOR
There is increased interest in ReRAM for embedded computing, especially in automotive applications, as more of its known issues are solved. Nevertheless, there is no one-size-fits-all NVM.

The post Impact of Scaling and BEOL Technology Solutions At The 7nm Node On MRAM appeared first on Semiconductor Engineering.

Rapid Exchange Cooling With Trapped Ions For Implementation In A Quantum Charge-Coupled Device

A technical paper titled “Rapid exchange cooling with trapped ions” was published by researchers at Georgia Tech Research Institute.

Abstract:

“The trapped-ion quantum charge-coupled device (QCCD) architecture is a leading candidate for advanced quantum information processing. In current QCCD implementations, imperfect ion transport and anomalous heating can excite ion motion during a calculation. To counteract this, intermediate cooling is necessary to maintain high-fidelity gate performance. Cooling the computational ions sympathetically with ions of another species, a commonly employed strategy, creates a significant runtime bottleneck. Here, we demonstrate a different approach we call exchange cooling. Unlike sympathetic cooling, exchange cooling does not require trapping two different atomic species. The protocol introduces a bank of “coolant” ions which are repeatedly laser cooled. A computational ion can then be cooled by transporting a coolant ion into its proximity. We test this concept experimentally with two 40Ca+ ions, executing the necessary transport in 107 μs, an order of magnitude faster than typical sympathetic cooling durations. We remove over 96%, and as many as 102(5) quanta, of axial motional energy from the computational ion. We verify that re-cooling the coolant ion does not decohere the computational ion. This approach validates the feasibility of a single-species QCCD processor, capable of fast quantum simulation and computation.”

Find the technical paper here. Published February 2024.  A related news release, including a video, can be found here.

Fallek, S.D., Sandhu, V.S., McGill, R.A. et al. Rapid exchange cooling with trapped ions. Nat Commun 15, 1089 (2024). https://doi.org/10.1038/s41467-024-45232-z

Related Reading
The Race Toward Quantum Advantage
Enormous amounts of money have been invested into quantum computing, but so far it has not surpassed conventional computers. When will that change?

The post Rapid Exchange Cooling With Trapped Ions For Implementation In A Quantum Charge-Coupled Device appeared first on Semiconductor Engineering.

Ultra-Low Power CiM Design For Practical Edge Scenarios

A technical paper titled “Low Power and Temperature-Resilient Compute-In-Memory Based on Subthreshold-FeFET” was published by researchers at Zhejiang University, University of Notre Dame, Technical University of Munich, Munich Institute of Robotics and Machine Intelligence, and the Laboratory of Collaborative Sensing and Autonomous Unmanned Systems of Zhejiang Province.

Abstract:

“Compute-in-memory (CiM) is a promising solution for addressing the challenges of artificial intelligence (AI) and the Internet of Things (IoT) hardware such as ‘memory wall’ issue. Specifically, CiM employing nonvolatile memory (NVM) devices in a crossbar structure can efficiently accelerate multiply-accumulation (MAC) computation, a crucial operator in neural networks among various AI models. Low power CiM designs are thus highly desired for further energy efficiency optimization on AI models. Ferroelectric FET (FeFET), an emerging device, is attractive for building ultra-low power CiM array due to CMOS compatibility, high ION /IOF  ratio, etc. Recent studies have explored FeFET based CiM designs that achieve low power consumption. Nevertheless, subthreshold-operated FeFETs, where the operating voltages are scaled down to the subthreshold region to reduce array power consumption, are particularly vulnerable to temperature drift, leading to accuracy degradation. To address this challenge, we propose a temperature-resilient 2T-1FeFET CiM design that performs MAC operations reliably at subthreahold region from 0 to 85 Celsius, while consuming ultra-low power. Benchmarked against the VGG neural network architecture running the CIFAR-10 dataset, the proposed 2T-1FeFET CiM design achieves 89.45% CIFAR-10 test accuracy. Compared to previous FeFET based CiM designs, it exhibits immunity to temperature drift at an 8-bit wordlength scale, and achieves better energy efficiency with 2866 TOPS/W.”

Find the technical paper here. Published January 2024 (preprint).

Zhou, Yifei, Xuchu Huang, Jianyi Yang, Kai Ni, Hussam Amrouch, Cheng Zhuo, and Xunxhao Yin. “Low Power and Temperature-Resilient Compute-In-Memory Based on Subthreshold-FeFET.” arXiv preprint arXiv:2312.17442 (2023).

Related Reading
Increasing AI Energy Efficiency With Compute In Memory
How to process zettascale workloads and stay within a fixed power budget.
Modeling Compute In Memory With Biological Efficiency
Generative AI forces chipmakers to use compute resources more intelligently.

The post Ultra-Low Power CiM Design For Practical Edge Scenarios appeared first on Semiconductor Engineering.

Electronic Noise in vdW Layered AFMS (UCLA)

A technical paper titled “Electronic Noise Spectroscopy of Quasi-2D van der Waals Antiferromagnetic Semiconductors” was published by researchers at University of California Los Angeles.

Abstract:

“We investigated low-frequency current fluctuations, i.e. electronic noise, in FePS3 van der Waals, layered antiferromagnetic semiconductor. The noise measurements have been used as noise spectroscopy for advanced materials characterization of the charge carrier dynamics affected by spin ordering and trapping states. Owing to the high resistivity of the material, we conducted measurements on vertical device configuration. The measured noise spectra reveal pronounced Lorentzian peaks of two different origins. One peak is observed only near the Neel temperature and it is attributed to the corresponding magnetic phase transition. The second Lorentzian peak, visible in the entire measured temperature range, has the characteristics of the trap-assisted generation-recombination processes similar to those in conventional semiconductors but shows a clear effect of the spin order reconfiguration near the Neel temperature. The obtained results contribute to understanding the electron and spin dynamics in this type of antiferromagnetic semiconductors and demonstrate the potential of electronic noise spectroscopy for advanced materials characterization.”

Find the technical paper here. Published January 2024 (preprint).

Ghosh, Subhajit, Zahra Ebrahim Nataj, Fariborz Kargar, and Alexander A. Balandin. “Electronic Noise Spectroscopy of Quasi-2D van der Waals Antiferromagnetic Semiconductors.” arXiv preprint arXiv:2401.12432 (2024).

The post Electronic Noise in vdW Layered AFMS (UCLA) appeared first on Semiconductor Engineering.

❌