FreshRSS

Next-Gen High-Speed Communication In Data Centers

By Ed Sperling | August 21, 2024, 09:15

Data centers are being flooded with data. While more of it needs to be processed locally, much of it also needs to be moved around within a system and between systems. This has put a spotlight on a variety of new optical technologies and methodologies. Yang Zhang, senior product marketing manager at Cadence, talks about the rapid increase in different types of optics and optical scenarios being developed to improve performance, reduce latency, and reduce overall power.

The post Next-Gen High-Speed Communication In Data Centers appeared first on Semiconductor Engineering.

Research Bits: Aug. 20

By Jesse Allen | August 20, 2024, 09:01

EUV mirror interference lithography

Researchers from the Paul Scherrer Institute developed an EUV lithography technique that can produce conductive tracks with a separation of just five nanometers by exposing the sample indirectly rather than directly.

Called EUV mirror interference lithography (MIL), the technique uses two mutually coherent beams that are reflected onto the wafer by two identical mirrors. The beams then create an interference pattern whose period depends on both the angle of incidence and the wavelength of the light. In addition to the 5nm resolution, the conductive tracks were found to have high contrast and sharp edges.
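
For context, the period of a two-beam interference pattern follows the standard interference-lithography relation (general textbook background, not a formula quoted in the article):

\Lambda = \frac{\lambda}{2\sin\theta}

where \lambda is the EUV wavelength (13.5 nm) and \theta is the angle of incidence of each beam measured from the wafer normal. The printed half-pitch is \Lambda/2, so a roughly 5 nm half-pitch implies a period near 10 nm and \sin\theta \approx 0.675, i.e., an incidence angle of about 42 degrees.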

“Our results show that EUV lithography can produce extremely high resolutions, indicating that there are no fundamental limitations yet. This is really exciting since it extends the horizon of what we deem as possible and can also open up new avenues for research in the field of EUV lithography and photoresist materials,” said Dimitrios Kazazis of the Laboratory of X-ray Nanoscience and Technologies at PSI in a statement.

The method is currently too slow for industrial chip production and can produce only simple and periodic structures. However, the team sees it as a resource for early development of new photoresists and plans to continue research to improve its performance and capabilities. [1]

Artificial sapphire dielectrics

Researchers from Shanghai Institute of Microsystem and Information Technology created artificial sapphire dielectric wafers made of single-crystalline aluminum oxide (Al2O3).

“The aluminum oxide we created is essentially artificial sapphire, identical to natural sapphire in terms of crystal structure, dielectric properties and insulation characteristics,” said Tian Zi’ao, a researcher at SIMIT, in a release.

“By using intercalation oxidation technology on single-crystal aluminum, we were able to produce this single-crystal aluminum oxide dielectric material,” added Di Zengfeng, a researcher at SIMIT, in a release. “Unlike traditional amorphous dielectric materials, our crystalline sapphire can achieve exceptionally low leakage at just one-nanometer level.”

The researchers hope the improved dielectric properties could lead to more power-efficient devices. [2]

Accelerating computation on sparse data sets

Researchers from Lehigh University and Lawrence Berkeley National Laboratory developed specialized hardware that enables faster computation on data sets with a high number of zero values, which are common in bioinformatics and the physical sciences. The hardware is portable and can be integrated into general-purpose multi-core computers.

“The accelerating sparse accumulation (ASA) architecture includes a hardware buffer, a hardware cache, and a hardware adder. It takes two sparse matrices, performs a matrix multiplication, and outputs a sparse matrix. The ASA only uses non-zero data when it performs this operation, which makes the architecture more efficient. The hardware buffer and the cache allow the computer processor to easily manage the flow of data; the hardware adder allows the processor to quickly generate values to fill up the empty matrices,” explained Berkeley Lab’s Ingrid Ockert in a press release. “Once these values are calculated, the ASA system produces an output. This operation is a building block that the researcher can then use in other functions. For instance, researchers could use these outputs to generate graphs or they could process these outputs through other algorithms such as a Sparse General Matrix-Matrix Multiplication (SpGEMM) algorithm.”
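
To make the accumulation step concrete, here is a minimal software sketch of column-wise SpGEMM of the kind ASA accelerates in hardware. It is an illustration only; the function and variable names are ours, not from the paper or the press release.

# Minimal sketch of column-wise sparse matrix multiplication (SpGEMM).
# ASA accelerates the per-column accumulation step in hardware; this is
# an illustrative software analogue, not code from the paper.
from collections import defaultdict

def spgemm_columnwise(A_cols, B_cols):
    """A_cols, B_cols: dict mapping column index -> {row index: value}, non-zeros only."""
    C_cols = {}
    for j, b_col in B_cols.items():
        acc = defaultdict(float)                       # sparse accumulator for column j of C
        for k, b_kj in b_col.items():                  # non-zeros B[k, j]
            for i, a_ik in A_cols.get(k, {}).items():  # non-zeros A[i, k]
                acc[i] += a_ik * b_kj                  # only non-zero products are touched
        if acc:
            C_cols[j] = dict(acc)
    return C_cols

# Tiny example: A[0,0]=1, A[1,2]=2; B[0,1]=3, B[2,1]=4  ->  C[0,1]=3, C[1,1]=8
A = {0: {0: 1.0}, 2: {1: 2.0}}
B = {1: {0: 3.0, 2: 4.0}}
print(spgemm_columnwise(A, B))   # {1: {0: 3.0, 1: 8.0}}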

The ASA architecture could accelerate a variety of algorithms. Microbiome research is presented as an example, where it could be used to run metagenomic assembly and similarity clustering algorithms such as Markov Cluster Algorithms that quickly characterize the genetic markers of all of the organisms in a soil sample. [3]

References

[1] I. Giannopoulos, I. Mochi, M. Vockenhuber, Y. Ekinci & D. Kazazis. Extreme ultraviolet lithography reaches 5 nm resolution. Nanoscale, 12.08.2024 https://doi.org/10.1039/D4NR01332H

[2] Zeng, D., Zhang, Z., Xue, Z. et al. Single-crystalline metal-oxide dielectrics for top-gate 2D transistors. Nature (2024). https://doi.org/10.1038/s41586-024-07786-2

[3] Chao Zhang, Maximilian Bremer, Cy Chan, John M Shalf, and Xiaochen Guo. ASA: Accelerating Sparse Accumulation in Column-wise SpGEMM. ACM Transactions on Architecture and Code Optimization (TACO) Volume 19, Issue 4, Article No.: 49, Pages 1-24 https://doi.org/10.1145/3543068

The post Research Bits: Aug. 20 appeared first on Semiconductor Engineering.

Real-World Applications Of Computational Fluid Dynamics

By Ed Sperling | August 5, 2024, 09:15

More powerful chips are enabling chips to process more data faster, but they’re also having a revolutionary impact on how that data can be used. Simulations that used to take days or weeks now can be completed in a matter of hours, and multi-physics simulations that were implausible to even consider are now very much in the realm of what is possible. Parviz Moin, professor of mechanical engineering and director of the Center for Turbulence Research at Stanford University, talks about a future filled with “what if” scenarios, more grid points to capture tiny anomalies in wind or the behavior of jet engines, and much more detailed high-fidelity numerical simulations of waves, chemical reactions, and phase changes.

The post Real-World Applications Of Computational Fluid Dynamics appeared first on Semiconductor Engineering.

Research Bits: Aug. 5

By Jesse Allen | August 5, 2024, 09:01

Measuring temperature with neutrons

Researchers from Osaka University, National Institutes for Quantum Science and Technology, Hokkaido University, Japan Atomic Energy Agency, and Tokamak Energy developed a way to rapidly measure the temperature of electronic components inside a device using neutrons.

The technique, called ‘neutron resonance absorption’ (NRA), examines neutrons being absorbed by atomic nuclei at certain energy levels to determine the properties of the material. After being generated using high-intensity laser beams, the neutrons were then decelerated to a very low energy level before being passed through the sample, in this case plates of tantalum and silver. The temporal signal of the NRA was altered in a predictable manner when the sample material’s temperature was changed.

“This technology makes it possible to instantaneously and accurately measure temperature,” said Zechen Lan of Osaka University, in a statement. “As our method is non-destructive, it can be used to monitor devices like batteries and semiconductor devices.”

The technique can acquire temperature data in a window of 100 nanoseconds, and the measurement device itself is about a tenth of the size of similar equipment.

“Using lasers to generate and accelerate ions and neutrons is nothing new, but the techniques we’ve developed in this study represent an exciting advance,” added Akifumi Yogo of Osaka University, in a statement. “We expect that the high temporal resolution will allow electronics to be examined in greater detail, help us to understand normal operating conditions, and pinpoint abnormalities.” [1]

Mapping heat transfer

Researchers from the University of Rochester applied optical super-resolution fluorescence microscopy techniques used in biological imaging to map heat transfer in electronic devices using luminescent nanoparticles.

By applying highly doped upconverting nanoparticles to the surface of a device, the researchers were able to achieve super-high resolution thermometry at the nanoscale level from up to 10 millimeters away.

Fig.: Rochester researchers demonstrated their super-high-resolution thermometry technique on an electrical heater structure designed to produce sharp temperature gradients. (Credit: University of Rochester / J. Adam Fenster)

“The building blocks of our modern electronics are transistors with nanoscale features, so to understand which parts are overheating, the first step is to get a detailed temperature map,” said Andrea Pickel, an assistant professor from the University of Rochester’s Department of Mechanical Engineering, in a release. “But you need something with nanoscale resolution to do that.”

The researchers demonstrated the technique using an electrical heater structure designed to produce sharp temperature gradients. To improve the process, the team hopes to lower the laser power used and refine the methods for applying layers of nanoparticles to the devices. [2]

ML for predicting thermal properties

Researchers from MIT, Argonne National Laboratory, Harvard University, the University of South Carolina, Emory University, the University of California at Santa Barbara, and Oak Ridge National Laboratory propose a new machine learning framework that provides much faster prediction of phonon dispersion relations, an important measurement for determining the thermal properties of a material and how heat moves through semiconductors and insulators.

Heat-carrying phonons have an extremely wide frequency range, and the particles interact and travel at different speeds. “Phonons are the culprit for the thermal loss, yet obtaining their properties is notoriously challenging, either computationally or experimentally,” said Mingda Li, associate professor of nuclear science and engineering at MIT, in a release.

The researchers started with a graph neural network (GNN) that converts a material’s atomic structure into a crystal graph comprising multiple nodes, which represent atoms, connected by edges, which represent the interatomic bonding between atoms.

To make it suitable for predicting phonon dispersion relations, they created a virtual node graph neural network (VGNN) by adding a series of flexible virtual nodes to the fixed crystal structure to represent phonons. This enabled the VGNN to skip many complex calculations when estimating phonon dispersion relations, making it a more efficient method than a standard GNN.
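
As a rough illustration of the virtual-node idea, here is a small sketch of a crystal graph extended with virtual nodes. It is our own simplification for intuition only; the "connect every virtual node to every atom" wiring and all names are assumptions, not necessarily the paper's exact scheme.

# Illustrative sketch of a crystal graph extended with virtual nodes for phonon bands.
# The wiring choice and the function names here are our assumptions.
def build_crystal_graph(atomic_numbers, bonds):
    """atomic_numbers: list of ints; bonds: list of (i, j) atom-index pairs."""
    return {"nodes": list(atomic_numbers), "edges": list(bonds), "virtual_nodes": []}

def add_virtual_phonon_nodes(graph, num_bands):
    """Attach num_bands virtual nodes, each linked to every atom node, so a GNN
    can read out one phonon-band value per virtual node."""
    n_atoms = len(graph["nodes"])
    for b in range(num_bands):
        v = n_atoms + b                              # index of the new virtual node
        graph["virtual_nodes"].append(v)
        graph["edges"].extend((v, i) for i in range(n_atoms))
    return graph

# Example: a 2-atom unit cell has 3 * 2 = 6 phonon branches.
g = build_crystal_graph(atomic_numbers=[31, 33], bonds=[(0, 1)])
g = add_virtual_phonon_nodes(g, num_bands=3 * len(g["nodes"]))
print(len(g["virtual_nodes"]), len(g["edges"]))      # 6 13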

Li noted that a VGNN could be used to calculate phonon dispersion relations for a few thousand materials in a few seconds with a personal computer. The technique could also be used to predict challenging optical and magnetic properties. [3]

References

[1] Lan, Z., Arikawa, Y., Mirfayzi, S.R. et al. Single-shot laser-driven neutron resonance spectroscopy for temperature profiling. Nat Commun 15, 5365 (2024). https://doi.org/10.1038/s41467-024-49142-y

[2] Ziyang Ye et al., Optical super-resolution nanothermometry via stimulated emission depletion imaging of upconverting nanoparticles. Sci. Adv. 10, eado6268 (2024) https://doi.org/10.1126/sciadv.ado6268

[3] Okabe, R., Chotrattanapituk, A., Boonkird, A. et al. Virtual node graph neural network for full phonon prediction. Nat Comput Sci 4, 522–531 (2024). https://doi.org/10.1038/s43588-024-00661-0

The post Research Bits: Aug. 5 appeared first on Semiconductor Engineering.

Heterogeneity Of 3DICs As A Security Vulnerability

A new technical paper titled “Harnessing Heterogeneity for Targeted Attacks on 3-D ICs” was published by Drexel University.

Abstract
“As 3-D integrated circuits (ICs) increasingly pervade the microelectronics industry, the integration of heterogeneous components presents a unique challenge from a security perspective. To this end, an attack on a victim die of a multi-tiered heterogeneous 3-D IC is proposed and evaluated. By utilizing on-chip inductive circuits and transistors with low voltage threshold (LVT), a die based on CMOS technology is proposed that includes a sensor to monitor the electromagnetic (EM) emissions from the normal function of a victim die, without requiring physical probing. The adversarial circuit is self-powered through the use of thermocouples that supply the generated current to circuits that sense EM emissions. Therefore, the integration of disparate technologies in a single 3-D circuit allows for a stealthy, wireless, and non-invasive side-channel attack. A thin-film thermo-electric generator (TEG) is developed that produces a 115 mV voltage source, which is amplified 5 × through a voltage booster to provide power to the adversarial circuit. An on-chip inductor is also developed as a component of a sensing array, which detects changes to the magnetic field induced by the computational activity of the victim die. In addition, the challenges associated with detecting and mitigating such attacks are discussed, highlighting the limitations of existing security mechanisms in addressing the multifaceted nature of vulnerabilities due to the heterogeneity of 3-D ICs.”

Find the technical paper here. Published June 2024.

Alec Aversa and Ioannis Savidis. 2024. Harnessing Heterogeneity for Targeted Attacks on 3-D ICs. In Proceedings of the Great Lakes Symposium on VLSI 2024 (GLSVLSI ’24). Association for Computing Machinery, New York, NY, USA, 246–251. https://doi.org/10.1145/3649476.3660385.

The post Heterogeneity Of 3DICs As A Security Vulnerability appeared first on Semiconductor Engineering.

A Hybrid ECO Detailed Placement Flow for Mitigating Dynamic IR Drop (UC San Diego)

A new technical paper titled “A Hybrid ECO Detailed Placement Flow for Improved Reduction of Dynamic IR Drop” was published by researchers at UC San Diego.

Abstract:

“With advanced semiconductor technology progressing well into sub-7nm scale, voltage drop has become an increasingly challenging issue. As a result, there has been extensive research focused on predicting and mitigating dynamic IR drops, leading to the development of IR drop engineering change order (ECO) flows – often integrated with modern commercial EDA tools. However, these tools encounter QoR limitations while mitigating IR drop. To address this, we propose a hybrid ECO detailed placement approach that is integrated with existing commercial EDA flows, to mitigate excessive peak current demands within power and ground rails. Our proposed hybrid approach effectively optimizes peak current levels within a specified “clip”– complementing and enhancing commercial EDA dynamic IR-driven ECO detailed placements. In particular, we: (i) order instances in a netlist in decreasing order of worst voltage drop; (ii) extract a clip around each instance; and (iii) solve an integer linear programming (ILP) problem to optimize instance placements. Our approach optimizes dynamic voltage drops (DVD) across ten designs by up to 15.3% compared to original conventional flows, with similar timing quality and 55.1% less runtime.”

Find the technical paper here. Published June 2024.

Andrew B. Kahng, Bodhisatta Pramanik, and Mingyu Woo. 2024. A Hybrid ECO Detailed Placement Flow for Improved Reduction of Dynamic IR Drop. In Proceedings of the Great Lakes Symposium on VLSI 2024 (GLSVLSI ’24). Association for Computing Machinery, New York, NY, USA, 390–396. https://doi.org/10.1145/3649476.3658727.

The post A Hybrid ECO Detailed Placement Flow for Mitigating Dynamic IR Drop (UC San Diego) appeared first on Semiconductor Engineering.

Survey of Energy Efficient PIM Processors

A new technical paper titled “Survey of Deep Learning Accelerators for Edge and Emerging Computing” was published by researchers at University of Dayton and the Air Force Research Laboratory.

Abstract

“The unprecedented progress in artificial intelligence (AI), particularly in deep learning algorithms with ubiquitous internet connected smart devices, has created a high demand for AI computing on the edge devices. This review studied commercially available edge processors, and the processors that are still in industrial research stages. We categorized state-of-the-art edge processors based on the underlying architecture, such as dataflow, neuromorphic, and processing in-memory (PIM) architecture. The processors are analyzed based on their performance, chip area, energy efficiency, and application domains. The supported programming frameworks, model compression, data precision, and the CMOS fabrication process technology are discussed. Currently, most commercial edge processors utilize dataflow architectures. However, emerging non-von Neumann computing architectures have attracted the attention of the industry in recent years. Neuromorphic processors are highly efficient for performing computation with fewer synaptic operations, and several neuromorphic processors offer online training for secured and personalized AI applications. This review found that the PIM processors show significant energy efficiency and consume less power compared to dataflow and neuromorphic processors. A future direction of the industry could be to implement state-of-the-art deep learning algorithms in emerging non-von Neumann computing paradigms for low-power computing on edge devices.”

Find the technical paper here. Published July 2024.

Alam, Shahanur, Chris Yakopcic, Qing Wu, Mark Barnell, Simon Khan, and Tarek M. Taha. 2024. “Survey of Deep Learning Accelerators for Edge and Emerging Computing” Electronics 13, no. 15: 2988. https://doi.org/10.3390/electronics13152988.

The post Survey of Energy Efficient PIM Processors appeared first on Semiconductor Engineering.

MTJ-Based CRAM Array

A new technical paper titled “Experimental demonstration of magnetic tunnel junction-based computational random-access memory” was published by researchers at University of Minnesota and University of Arizona, Tucson.

Abstract

“The conventional computing paradigm struggles to fulfill the rapidly growing demands from emerging applications, especially those for machine intelligence because much of the power and energy is consumed by constant data transfers between logic and memory modules. A new paradigm, called “computational random-access memory (CRAM),” has emerged to address this fundamental limitation. CRAM performs logic operations directly using the memory cells themselves, without having the data ever leave the memory. The energy and performance benefits of CRAM for both conventional and emerging applications have been well established by prior numerical studies. However, there is a lack of experimental demonstration and study of CRAM to evaluate its computational accuracy, which is a realistic and application-critical metric for its technological feasibility and competitiveness. In this work, a CRAM array based on magnetic tunnel junctions (MTJs) is experimentally demonstrated. First, basic memory operations, as well as 2-, 3-, and 5-input logic operations, are studied. Then, a 1-bit full adder with two different designs is demonstrated. Based on the experimental results, a suite of models has been developed to characterize the accuracy of CRAM computation. Scalar addition, multiplication, and matrix multiplication, which are essential building blocks for many conventional and machine intelligence applications, are evaluated and show promising accuracy performance. With the confirmation of MTJ-based CRAM’s accuracy, there is a strong case that this technology will have a significant impact on power- and energy-demanding applications of machine intelligence.”

Find the technical paper here. Published July 2024.  Find the University of Minnesota’s news release here.

Lv, Y., Zink, B.R., Bloom, R.P. et al. Experimental demonstration of magnetic tunnel junction-based computational random-access memory. npj Unconv. Comput. 1, 3 (2024). https://doi.org/10.1038/s44335-024-00003-3.

The post MTJ-Based CRAM Array appeared first on Semiconductor Engineering.

What's the fastest collision detection?

Asked by user289661

I want to build a simulator using Game Maker in which there are many (up to hundreds) of obj_Cell, each with a radius of 10 (they don't have sprites; they just draw a circle of radius 10).

Each step, I want each cell to check around itself for obj_A to see if any obj_A is "touching" it (has a distance of less than 10 from the cell). However, since I have so many obj_Cell and so many obj_A, the most straightforward code (below) has to be executed tens of thousands of times (number of obj_A * number of obj_Cell):

// In obj_Cell's Step event
with (obj_A)
{
    // 'other' refers to the calling obj_Cell instance
    if (distance_to_object(other) <= 10)
    {
        // do stuff
    }
}

Is there any possible way to make this faster?

LPDDR Memory Is Key For On-Device AI Performance

By Nidish Kamath | June 24, 2024, 09:02

Low-Power Double Data Rate (LPDDR) emerged as a specialized high performance, low power memory for mobile phones. Since its first release in 2006, each new generation of LPDDR has delivered the bandwidth and capacity needed for major shifts in the mobile user experience. Once again, LPDDR is at the forefront of another key shift as the next wave of generative AI applications will be built into our mobile phones and laptops.

AI on endpoints is all about efficient inference. The process of employing trained AI models to make predictions or decisions requires specialized memory technologies with greater performance that are tailored to the unique demands of endpoint devices. Memory for AI inference on endpoints requires getting the right balance between bandwidth, capacity, power and compactness of form factor.

LPDDR evolved from DDR memory technology as a power-efficient alternative; LPDDR5, and the optional extension LPDDR5X, are the most recent updates to the standard. LPDDR5X is focused on improving performance, power, and flexibility; it offers data rates up to 8.533 Gbps, significantly boosting speed and performance. Compared to DDR5 memory, LPDDR5/5X limits the data bus width to 32 bits, while increasing the data rate. The switch to a quarter-speed clock, as compared to a half-speed clock in LPDDR4, along with a new feature – Dynamic Voltage Frequency Scaling – keeps the higher data rate LPDDR5 operation within the same thermal budget as LPDDR4-based devices.

Given the space constraints of mobile devices, combined with the greater memory needs of advanced applications, LPDDR5X can support capacities of up to 64GB by using multiple DRAM dies in a multi-die package. Consider the example of a 7B LLaMa 2 model: the model consumes 3.5GB of memory capacity if quantized to INT4. An x64 LPDDR5X package, with two LPDDR5X devices per package, provides an aggregate bandwidth of 68 GB/s; therefore, a LLaMa 2 model can run inference at 19 tokens per second.

As demand for memory performance grows, LPDDR5 continues to evolve in the market, with the major vendors announcing a further extension of LPDDR5 known as LPDDR5T, the T standing for turbo. LPDDR5T boosts the data rate to 9.6 Gbps, enabling an aggregate bandwidth of 76.8 GB/s in an x64 package of multiple LPDDR5T stacks. The 7B LLaMa 2 model from the example above can therefore run inference at 21 tokens per second.
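
The bandwidth and tokens-per-second figures above follow from simple arithmetic. The sketch below reproduces them under the assumption (ours, a common rule of thumb for memory-bound inference) that the full 3.5GB of INT4 weights must be streamed from DRAM for every generated token.

# Back-of-the-envelope check of the figures quoted above. Assumes each generated
# token requires streaming all 3.5 GB of INT4 LLaMa 2 7B weights from DRAM.
def aggregate_bandwidth_gb_s(data_rate_gbps_per_pin, bus_width_bits=64):
    return data_rate_gbps_per_pin * bus_width_bits / 8   # GB/s across the package

def tokens_per_second(bandwidth_gb_s, model_size_gb=3.5):
    return bandwidth_gb_s / model_size_gb

for name, rate in [("LPDDR5X", 8.533), ("LPDDR5T", 9.6)]:
    bw = aggregate_bandwidth_gb_s(rate)
    print(f"{name}: {bw:.1f} GB/s -> ~{int(tokens_per_second(bw))} tokens/s")
# LPDDR5X: 68.3 GB/s -> ~19 tokens/s
# LPDDR5T: 76.8 GB/s -> ~21 tokens/s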

With its low power consumption and high bandwidth capabilities, LPDDR5 is a great choice of memory not just for cutting-edge mobile devices, but also for AI inference on endpoints where power efficiency and compact form factor are crucial considerations. Rambus offers a new LPDDR5T/5X/5 Controller IP that is fully optimized for use in applications requiring high memory throughput and low latency. The Rambus LPDDR5T/5X/5 Controller enables cutting-edge LPDDR5T memory devices and supports all third-party LPDDR5 PHYs. It maximizes bus bandwidth and minimizes latency via look-ahead command processing, bank management and auto-precharge. The controller can be delivered with additional cores such as the In-line ECC or Memory Analyzer cores to improve in-field reliability, availability and serviceability (RAS).

The post LPDDR Memory Is Key For On-Device AI Performance appeared first on Semiconductor Engineering.

Powering Next-Generation Insightful Design

By Sara Louie | June 24, 2024, 09:01

The Ansys team is gearing up for an exciting time at DAC this week, where we’ll be sharing a whole new way of visualizing physical phenomena in 3D-IC designs, powered by NVIDIA Omniverse, a platform for developing OpenUSD and RTX-enabled 3D applications and workflows. Please attend our Exhibitor Forum session so we can show you the valuable design insights you can gain by interactively viewing surface currents, temperatures and mechanical deformations in a representative 3D-IC design.

Visualizing physical phenomena in 3D is a new paradigm for IC packaging signal and power integrity (SI/PI) engineers who are more familiar with schematics and 2D results plots (TDR, eye diagrams, SYZ parameters, etc). There’s a good reason for that – it really hasn’t been practical to save 3D physics data – or even to run full 3D simulations – for complex IC package designs until more recent (~5-10 years) advancements in Ansys solver technologies along with increasing accessibility to high performance compute power. I have had the pleasure of supporting a few early adopters in the SI/PI engineering community as they used 3D field plotting in HFSS to gain design insights that helped them avoid costly tape-out delays and chip re-spin. Their experiences motivate my desire to share this invaluable capability with the greater IC packaging design community.

I started my career using HFSS to design antennas for biomedical applications. Like me, the greater antenna and RF component design community has been plotting fields in 3D from Day 1 – I don’t know of any antenna engineer that hasn’t plotted a 3D radiation pattern after running an HFSS simulation. What I do know is that I’ve met hundreds of SI/PI engineers who have never plotted surface currents in their package or PCB models after running an HFSS simulation. And that must change.

…but why exactly? What value does one gain by plotting fields in 3D? If you don’t know what kind of design insights you gain by plotting fields, please allow me to show you because seeing is revealing.

Let’s say I send a signal from point A and it reflects off 3 different plates before returning back to point A:

Fig. 1: The plot below shows the received power from a sensor placed at Point A (the same position as the emitter). Can you tell which bump in this 2D plot is the original signal returned back home?

Fig. 2: No? Neither can I. Even if you somehow guessed correctly, could you explain what caused all those other bumps with certainty?

With HFSS field plotting, everything becomes crystal clear:

Video 1: 3-bounce animation.

The original signal returns after a travel distance of 6 meters around the circle while the rest of the bumps result from other reflections – this is the kind of physical insight that a 2D plot simply can’t deliver.
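
(For orientation: assuming free-space propagation at the speed of light, that 6 m round trip corresponds to a delay of t = d/c = 6 m / (3×10^8 m/s) ≈ 20 ns, which is where the true return would sit if the received power in Fig. 1 is plotted against time.)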

Advanced packaging for 3D-IC design can’t be done with generic rules of thumb. To meet stringent specifications (operating at higher frequencies and speeds, with lower latency and power consumption), engineers are demanding the use of HFSS because its gold-standard reputation has come from countless validation studies comparing simulated results against measurement, and there is no room to compromise on accuracy for these very complex and costly designs. As more electronics are packed into tighter spaces, the risk of unwanted coupling between the different components (often stacked vertically) increases, and being able to identify the true aggressors becomes more challenging. That’s when field visualization as a means to debug – i.e., uncover, learn, truly understand – what’s happening becomes an invaluable tool.

What exactly is causing unwanted radiation or reflections? What exactly is the source of noise coupling into this line? Are we seeing any current crowding on conductors or significant volume losses in dielectrics that may lead to thermal problems? Plot the fields and you will have the information required to diagnose issues and design the exact right solution – nothing more (over-designed), nothing less (failing design).

If you’re one of the early adopters who has plotted fields in highly complex designs, you will know that fields post-processing can be very graphics intensive – especially if you want to immerse yourself in the physical phenomena taking place in your design by plotting in a full 3D volume and cutting through the 3D space layer by layer. Ansys uses the enhanced graphics and visual rendering capabilities offered by NVIDIA Omniverse core technologies, available as APIs, to provide a seamlessly interactive and more intuitive experience, increasing accessibility to design insights that engineers can only gain by physics visualization.

I’m not asking my SI/PI engineering colleagues to ditch the schematics and 2D results plots – I just believe that they should (and inevitably will!) add 3D field plotting into their design process. The rise in design complexity coupled with advancements in Ansys and NVIDIA technologies is poised to make 3D field plotting – an invaluable yet heretofore underutilized tool in the world of signal and power integrity design – a practical requirement. For a first look into what will one day be commonplace design practice, please attend our Exhibitor Forum session on Wed., June 26, 1:45 PM -3:00 PM at DAC to experience our interactive demonstration of Ansys physics visualization in a representative 3D-IC design, powered by NVIDIA Omniverse.

The post Powering Next-Generation Insightful Design appeared first on Semiconductor Engineering.

ML Method To Predict IR Drop Levels

June 21, 2024, 18:04

A new technical paper titled “IR drop Prediction Based on Machine Learning and Pattern Reduction” was published by researchers at National Tsing Hua University, National Taiwan University of Science and Technology, and MediaTek.

Abstract (partial)
“In this paper, we propose a machine learning-based method to predict IR drop levels and present an algorithm for reducing simulation patterns, which could reduce the time and computing resources required for IR drop analysis within the ECO flow. Experimental results show that our approach can reduce the number of patterns by approximately 50%, thereby decreasing the analysis time while maintaining accuracy.”

Find the technical paper here. Published June 2024.

Yong-Fong Chang, Yung-Chih Chen, Yu-Chen Cheng, Shu-Hong Lin, Che-Hsu Lin, Chun-Yuan Chen, Yu-Hsuan Chen, Yu-Che Lee, Jia-Wei Lin, Hsun-Wei Pao, Shih-Chieh Chang, Yi-Ting Li, and Chun-Yao Wang. 2024. IR drop Prediction Based on Machine Learning and Pattern Reduction. In Proceedings of the Great Lakes Symposium on VLSI 2024 (GLSVLSI ’24). Association for Computing Machinery, New York, NY, USA, 516–519. https://doi.org/10.1145/3649476.3658775

The post ML Method To Predict IR Drop Levels appeared first on Semiconductor Engineering.

Power Electronic Packaging for Discrete Dies

June 21, 2024, 00:19

A technical paper titled “Substrate Embedded Power Electronics Packaging for Silicon Carbide MOSFETs” was published by researchers at University of Cambridge, University of Warwick, Chongqing University, and SpaceX.

Abstract:

“This paper proposes a new power electronic packaging for discrete dies, namely Standard Cell which consists of a step-etched active metal brazed (AMB) substrate and a flexible printed circuit board (flex-PCB). The standard cell exhibits high thermal conductivity, complete electrical insulation, and low stray inductance, thereby enhancing the performance of SiC MOSFET devices. The standard cell has a stray power loop inductance of less than 1 nH and a gate loop inductance of less than 1.5 nH . The standard cell has a flat body with surface-mounting electrical connections on one side and direct thermal connections on the other. The use of flex-PCB die interconnection enables maximum utilization of source pads while providing a flexible gate-source connection and the converter PCB. This paper presents the design concept of the standard cell and experimentally validates its effectiveness in a converter system.”

Find the technical paper here. Published May 2024.

A. Janabi et al., “Substrate Embedded Power Electronics Packaging for Silicon Carbide MOSFETs,” in IEEE Transactions on Power Electronics, doi: 10.1109/TPEL.2024.3396779.

Related Reading
Big Shifts In Power Electronics Packaging
Packages are becoming more complex to endure high power, high temperature conditions across a variety of applications.
Power Semiconductors: A Deep Dive Into Materials, Manufacturing & Business
Premium Content: How these devices are made and work, challenges in manufacturing, related startups, as well as the reasons why so much effort and resources are being spent to develop new materials, and new processes.

The post Power Electronic Packaging for Discrete Dies appeared first on Semiconductor Engineering.

Characterizing and Evaluating A Quantum Processor Unit In A HPC Center

June 11, 2024, 04:36

A new technical paper titled “Calibration and Performance Evaluation of a Superconducting Quantum Processor in an HPC Center” was published by researchers at Leibniz Supercomputing Centre, IQM Quantum Computers, and Technical University of Munich.

Abstract

“As quantum computers mature, they migrate from laboratory environments to HPC centers. This movement enables large-scale deployments, greater access to the technology, and deep integration into HPC in the form of quantum acceleration. In laboratory environments, specialists directly control the systems’ environments and operations at any time with hands-on access, while HPC centers require remote and autonomous operations with minimal physical contact. The requirement for automation of the calibration process needed by all current quantum systems relies on maximizing their coherence times and fidelities and, with that, their best performance. It is, therefore, of great significance to establish a standardized and automatic calibration process alongside unified evaluation standards for quantum computing performance to evaluate the success of the calibration and operation of the system. In this work, we characterize our in-house superconducting quantum computer, establish an automatic calibration process, and evaluate its performance through quantum volume and an application-specific algorithm. We also analyze readout errors and improve the readout fidelity, leaning on error mitigation.”

Find the technical paper here. Published May 2024.

X. Deng, S. Pogorzalek, F. Vigneau, P. Yang, M. Schulz and L. Schulz, “Calibration and Performance Evaluation of a Superconducting Quantum Processor in an HPC Center,” ISC High Performance 2024 Research Paper Proceedings (39th International Conference), Hamburg, Germany, 2024, pp. 1-9, doi: 10.23919/ISC.2024.10528924.

The post Characterizing and Evaluating A Quantum Processor Unit In A HPC Center appeared first on Semiconductor Engineering.

Device Characteristics of GAA-Structured CMOS and CTFET Under Varying Temperatures

June 11, 2024, 04:17

A new technical paper titled “Vertical-Stack Nanowire Structure of MOS Inverter and TFET Inverter in Low-temperature Application” was published by researchers at National Tsing Hua University and National United University in Taiwan.

Abstract
“Tunneling field effect transistors (TFET) have emerged as promising candidates for integrated circuits beyond conventional metal oxide semiconductor field effect transistors (MOSFET) and could overcome the physical limit, which results in the subthreshold swing (SS) < 60mV/dec at room temperature. In this study, we compare the complementary TFET (CTFET) with complementary metal oxide semiconductor (CMOS) at low temperatures (70K) by using the Gate-All-Around (GAA) architecture. The experiment result clearly shows that the CTFET inverter has better characteristics than the CMOS inverter in various temperatures. While operating at a fixed temperature, the CMOS inverter performs an excellent on/off ratio and SS, etc. However, when a CMOS inverter operates at varying temperatures, CMOS performs worse than CTFET. This is attributed to the influence of lattice scattering, leading to the instability of CMOS characteristics. Therefore, the CTFET inverter is suitable for operation in environments with varying temperatures, exhibiting high stability, which can be applied in space technology. The simulation tool TCAD has been used to investigate the characteristics of CMOS and CTFET at low temperatures.”

Find the technical paper here. Published June 2024.

C. -C. Tien and Y. -H. Lin, “Vertical-Stack Nanowire Structure of MOS Inverter and TFET Inverter in Low-temperature Application,” in IEEE Access, doi: 10.1109/ACCESS.2024.3410677.

The post Device Characteristics of GAA-Structured CMOS and CTFET Under Varying Temperatures appeared first on Semiconductor Engineering.

Efficient TNN Inference on RISC-V Processing Cores With Minimal HW Overhead

June 11, 2024, 02:28

A new technical paper titled “xTern: Energy-Efficient Ternary Neural Network Inference on RISC-V-Based Edge Systems” was published by researchers at ETH Zurich and Universita di Bologna.

Abstract
“Ternary neural networks (TNNs) offer a superior accuracy-energy trade-off compared to binary neural networks. However, until now, they have required specialized accelerators to realize their efficiency potential, which has hindered widespread adoption. To address this, we present xTern, a lightweight extension of the RISC-V instruction set architecture (ISA) targeted at accelerating TNN inference on general-purpose cores. To complement the ISA extension, we developed a set of optimized kernels leveraging xTern, achieving 67% higher throughput than their 2-bit equivalents. Power consumption is only marginally increased by 5.2%, resulting in an energy efficiency improvement by 57.1%. We demonstrate that the proposed xTern extension, integrated into an octa-core compute cluster, incurs a minimal silicon area overhead of 0.9% with no impact on timing. In end-to-end benchmarks, we demonstrate that xTern enables the deployment of TNNs achieving up to 1.6 percentage points higher CIFAR-10 classification accuracy than 2-bit networks at equal inference latency. Our results show that xTern enables RISC-V-based ultra-low-power edge AI platforms to benefit from the efficiency potential of TNNs.”

Find the technical paper here. Published May 2024.

Rutishauser, Georg, Joan Mihali, Moritz Scherer, and Luca Benini. “xTern: Energy-Efficient Ternary Neural Network Inference on RISC-V-Based Edge Systems.” arXiv preprint arXiv:2405.19065 (2024).

The post Efficient TNN Inference on RISC-V Processing Cores With Minimal HW Overhead appeared first on Semiconductor Engineering.


Margin Sensors In The Wild

June 10, 2024, 09:01

Back in March, I wrote up an article here that looked at how a proxy circuit could be used to measure variations in circuit performance as conditions changed in the operating environment. There were a couple of recent presentations on margin sensors at two of the big EDA vendors’ customer engineering forums that we’ll look at, as well as another product with an upcoming presentation at DAC. Margin sensors have applications in silicon health and performance monitoring for SoCs, characterization, yield, reliability, safety, power, and performance. How they are configured, though, determines the tasks they are best suited for.

The first presentation was given at Synopsys’ SNUG Silicon Valley on March 20, 2024, titled “Diagnosis of Timing Margin on Silicon with PMM (Path Margin Monitor)”, by Gurnrack Moon, Principal Engineer at Samsung. One of the key aspects of the PMM that Samsung appreciated was the closer correlation between the PMM and the actual paths versus, say, using a Ring Oscillator approach.

Fig. 1: Synopsys Path Margin Monitor diagram. (Source: Synopsys)

My previous article described how the “Monitor Logic” portion of the PMM diagram shown above in figure 1 would conceptually work. Taps taken along the synthetic circuit of buffers could be compared to see how far the signal made it down the path, and thus determine how much margin is available. A strength of this approach is that it allows one PMM to be used on multiple paths. A disadvantage, though, is that it introduces additional control overhead and adds extra delay components into the monitor path.
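To make that tap-comparison idea concrete, here is a minimal Python sketch of the concept, not Synopsys’ implementation; the function name, the 20ps tap delay, the clock period, and the path delay are all illustrative assumptions:

```python
# Minimal sketch of the tap-comparison concept behind a path margin monitor.
# All names and numbers are illustrative, not taken from the Synopsys PMM.

def estimate_margin_ps(path_delay_ps, clock_period_ps, tap_delay_ps, num_taps):
    """Launch an edge down a chain of delay taps appended to the monitored path
    and count how many taps still capture it before the clock edge arrives."""
    taps_passed = 0
    for tap in range(1, num_taps + 1):
        arrival_ps = path_delay_ps + tap * tap_delay_ps
        if arrival_ps <= clock_period_ps:
            taps_passed += 1
        else:
            break
    # Each tap the edge clears represents roughly one tap delay of timing slack.
    return taps_passed * tap_delay_ps

# Hypothetical 1 GHz clock (1000 ps period), 820 ps monitored path, 20 ps taps.
print(estimate_margin_ps(path_delay_ps=820, clock_period_ps=1000,
                         tap_delay_ps=20, num_taps=16))  # -> 180 ps of margin
```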

The PMMs on the chip are connected in a daisy-chain fashion, which reduces the number of signals needed to send information from the PMMs to the Path Margin Monitor Controller. This setup uses chip area efficiently while still providing information about the state of the silicon. Typically, one might expect this type of capability to be exercised in a “diagnostic” mode, where data would be captured, analyzed, and then used to determine appropriate voltage and frequency settings, as opposed to a more dynamic or adaptive approach.

Samsung appreciated being able to “determine if there are problems or what is different from what is designed, and what needs to be improved. In addition, PMM data fed to the Synopsys Silicon.da analytics platform provides rich analytics, shortening the debug/analysis time.” This was used on production silicon. Synopsys also has other blog articles here and here for the interested reader.

The second presentation was given at CadenceLIVE Silicon Valley, April 17, 2024, titled “Challenges in Datacenters: Search for Advanced Power Management Mechanisms”, and presented by Ziv Paz, Vice President of Business Development at proteanTecs. His presentation focused on proteanTecs’ Margin Agents and noted how these sensors were sensitive to process, aging, workload stress, latent defects, operating conditions, DC IR drops, and local Vdroops.

Fig. 2: Reducing voltage while staying within margin. (Source: proteanTecs, CadenceLIVE)

Figure 2 shows how designers must handle “worst-case” scenarios and often do so by creating enough margin to operate under those conditions. In the diagram shown here, that margin shows up as a higher operating VDD. If the normal operating mode is 650mV with an allowance for a -10% change in VDD, then the design is implemented to run at 585mV (90% * 650mV). Most of the time, though, the circuitry will operate properly below 650mV, so running at 650mV is just wasting energy.

proteanTecs then presented a case study of a chip designed in TSMC’s 5nm technology. The chip incorporated 448 margin agents consisting of buffers with a unit delay of 7ps.

Fig. 3: Example margin agents and corresponding voltage. (Source: proteanTecs, CadenceLIVE)

Figure 3 above shows all 448 margin agents on the left side, with the thicker black line showing the worst case across them. The right side shows the voltage. It also demonstrates that when the threshold is lowered, the voltage drops to 614mV and the design continues to operate properly.

Fig. 4: Example margin agents with droop and corresponding voltage. (Source: proteanTecs, CadenceLIVE)

Figure 4 shows that as the voltage on the right drops, the worst-case margin agent values also drop, and once they cross the yellow(-ish) line the voltage is signaled to return to the pre-AVS voltage of 650mV. The margin agent values then improve and the AVS voltage of 614mV kicks back in. By reacting when the margin agents cross the yellow line, the scheme allows time for the voltage to increase and adjust before it hits the red (585mV) line, thus always keeping the design in the proper operating zone.
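As a rough illustration of that control behavior, the following Python sketch shows the basic decision loop. This is my own simplification, not proteanTecs’ algorithm; the guard threshold and the margin readings are hypothetical, and only the 650mV/614mV/585mV levels come from the case study above:

```python
# Simplified adaptive voltage scaling (AVS) decision loop based on the behavior
# described above. Not proteanTecs' algorithm; the guard threshold and the
# margin readings are hypothetical.

NOMINAL_MV = 650  # pre-AVS operating voltage
AVS_MV = 614      # reduced voltage applied while margin is healthy
GUARD_THRESHOLD = 12.0  # hypothetical worst-case margin value that triggers a step back up

def next_voltage_mv(margin_agent_readings):
    """Return the supply voltage for the next control interval."""
    worst_case = min(margin_agent_readings)  # the thick black line in figures 3 and 4
    if worst_case < GUARD_THRESHOLD:
        # A droop or workload event eroded the margin: step back to nominal
        # well before the design could reach the 585mV floor.
        return NOMINAL_MV
    # Margin is healthy, so the lower AVS voltage can be (re)applied.
    return AVS_MV

print(next_voltage_mv([14.2, 13.1, 11.8, 15.0]))  # droop detected -> 650
print(next_voltage_mv([14.2, 13.1, 13.6, 15.0]))  # margin healthy -> 614
```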

For this case, proteanTecs saw a 10.77% power saving and said that they’ve typically seen savings in the 9%-14% range. For this data center-oriented customer, this was important because of a limited power budget per rack, cooling limitations, carbon neutrality requirements (PUE), and a high CAPEX. Other benefits are a higher MTTF, lower maintenance costs, and a prolonged system lifetime. proteanTecs claimed a minimal impact on area and that currently most of their designs are in 7nm, 5nm, and below.

The third vendor, Movellus, announced its Aeonic Insight product line, including a droop detector, on November 14, 2023. Movellus’ Michael Durr, Director of Application Engineering, is scheduled to give a talk at DAC on Wednesday, June 26, 2024, titled “Droop! There it is!” Movellus has long been known for its digital clock generation IP and, as one might guess, its design uses a synthetic circuit for detecting changes in the operating environment. Leveraging that clock generation expertise, the company is initially targeting an adaptive frequency (or clock) scaling (AFS) approach built around the same digital clock generation IP.

The post Margin Sensors In The Wild appeared first on Semiconductor Engineering.

Comparing Thermal Properties In Molybdenum Substrate To Si And Glass For A System-On-Foil Integration (RIT, Lux)

May 31, 2024, 18:39

A technical paper titled “Comparative Analysis of Thermal Properties in Molybdenum Substrate to Silicon and Glass for a System-on-Foil Integration” was published by researchers at Rochester Institute of Technology and Lux Semiconductors.

Abstract:

“Advanced electronics technology is moving towards smaller footprints and higher computational power. In order to achieve this, advanced packaging techniques are currently being considered, including organic, glass, and semiconductor-based substrates that allow for 2.5D or 3D integration of chips and devices. Metal-core substrates are a new alternative with similar properties to those of semiconductor-based substrates but with the added benefits of higher flexibility and metal ductility. This work comprehensively compares the thermal properties of a novel metal-based substrate, molybdenum, and silicon and fused silica glass substrates in the context of system-on-foil (SoF) integration. A simple electronic technique is used to simulate the heat generated by a typical CPU and to measure the heat dissipation properties of the substrates. The results indicate that molybdenum and silicon are able to effectively dissipate a continuous power density of 2.3 W/mm2 as the surface temperature only increases by ~15°C. In contrast, the surface temperature of fused silica glass substrates increases by >140°C for the same applied power. These simple techniques and measurements were validated with infrared camera measurements as well as through finite element analysis via COMSOL simulation. The results validate the use of molybdenum as an advanced packaging substrate and can be used to characterize new substrates and approaches for advanced packaging.”

Find the technical paper here. Published May 2024.

Huang, Tzu-Jung, Tobias Kiebala, Paul Suflita, Chad Moore, Graeme Housser, Shane McMahon, and Ivan Puchades. 2024. “Comparative Analysis of Thermal Properties in Molybdenum Substrate to Silicon and Glass for a System-on-Foil Integration” Electronics 13, no. 10: 1818. https://doi.org/10.3390/electronics13101818

Related Reading
The Race To Glass Substrates
Replacing silicon and organic substrates requires huge shifts in manufacturing, creating challenges that will take years to iron out.

 

The post Comparing Thermal Properties In Molybdenum Substrate To Si And Glass For A System-On-Foil Integration (RIT, Lux) appeared first on Semiconductor Engineering.

CAM-Based CMOS Implementation Of Reference Frames For Neuromorphic Processors (Carnegie Mellon U.)

May 31, 2024, 18:29

A technical paper titled “NeRTCAM: CAM-Based CMOS Implementation of Reference Frames for Neuromorphic Processors” was published by researchers at Carnegie Mellon University.

Abstract:

“Neuromorphic architectures mimicking biological neural networks have been proposed as a much more efficient alternative to conventional von Neumann architectures for the exploding compute demands of AI workloads. Recent neuroscience theory on intelligence suggests that Cortical Columns (CCs) are the fundamental compute units in the neocortex and intelligence arises from CC’s ability to store, predict and infer information via structured Reference Frames (RFs). Based on this theory, recent works have demonstrated brain-like visual object recognition using software simulation. Our work is the first attempt towards direct CMOS implementation of Reference Frames for building CC-based neuromorphic processors. We propose NeRTCAM (Neuromorphic Reverse Ternary Content Addressable Memory), a CAM-based building block that supports the key operations (store, predict, infer) required to perform inference using RFs. NeRTCAM architecture is presented in detail including its key components. All designs are implemented in SystemVerilog and synthesized in 7nm CMOS, and hardware complexity scaling is evaluated for varying storage sizes. NeRTCAM system for biologically motivated MNIST inference with a storage size of 1024 entries incurs just 0.15 mm^2 area, 400 mW power and 9.18 us critical path latency, demonstrating the feasibility of direct CMOS implementation of CAM-based Reference Frames.”

Find the technical paper here. Published May 2024 (preprint).

Nair, Harideep, William Leyman, Agastya Sampath, Quinn Jacobson, and John Paul Shen. “NeRTCAM: CAM-Based CMOS Implementation of Reference Frames for Neuromorphic Processors.” arXiv preprint arXiv:2405.11844 (2024).

Related Reading
Running More Efficient AI/ML Code With Neuromorphic Engines
Once a buzzword, neuromorphic engineering is gaining traction in the semiconductor industry.

The post CAM-Based CMOS Implementation Of Reference Frames For Neuromorphic Processors (Carnegie Mellon U.) appeared first on Semiconductor Engineering.

ARM Unveils Cortex-X925, A725 and A520 Cores for Flagships

June 1, 2024, 00:10

ARM has just unveiled its next-generation CPU cores ahead of the next generation of flagship chipsets. Therefore, chipmakers like Qualcomm, Samsung, and MediaTek will have ...

The post ARM Unveils Cortex-X925, A725 and A520 Cores for Flagships appeared first on Gizchina.com.

Asus NUC 14 Performance is a Meteor Lake mini PC with NVIDIA RTX 40 Series graphics (a ROG NUC by another name)

May 28, 2024, 16:56

The Asus NUC 14 Performance is a small desktop computer with the guts of a gaming laptop. It supports up to an Intel Core Ultra 9 185H Meteor Lake processor and up to NVIDIA GeForce RTX 4070 graphics. But since the NUC 14 Performance is a desktop rather than a laptop, there’s room for more […]

The post Asus NUC 14 Performance is a Meteor Lake mini PC with NVIDIA RTX 40 Series graphics (a ROG NUC by another name) appeared first on Liliputing.


High-Level Synthesis Propels Next-Gen AI Accelerators

May 20, 2024, 09:01

Everything around you is getting smarter. Artificial intelligence is not just a data center application but will be deployed in all kinds of embedded systems that we interact with daily. We expect to talk to and gesture at them. We expect them to recognize and understand us. And we expect them to operate with just a little bit of common sense. This intelligence is making these systems not just more functional and easier to use, but safer and more secure as well.

All this intelligence comes from advances in deep neural networks. One of the key challenges of neural networks is their computational complexity. Small neural networks can take millions of multiply accumulate operations (MACs) to produce a result. Larger ones can take billions. Large language models, and similarly complex networks, can take trillions. This level of computation is beyond what can be delivered by embedded processors.

In some cases, the computation of these inferences can be off-loaded over a network to a data center. Increasingly, devices have fast and reliable network connections – making this a viable option for many systems. However, there are also a lot of systems that have hard real time requirements that cannot be met by even the fastest and most reliable networks. For example, any system that has autonomous mobility – self-driving cars or self-piloted drones – needs to make decisions faster than could be done through an off-site data center. There are also systems where sensitive data is being processed that should not be sent over networks. And anything that goes over a network introduces an additional attack surface for hackers. For all of these reasons – performance, privacy, and security – some inferencing will need to be done on embedded systems.

For very simple networks, embedded CPUs can handle the task. Even a Raspberry Pi can deploy a simple object recognition algorithm. For more complex tasks there are embedded GPUs, as well as neural processing units (NPUs) targeted at embedded systems that can deliver greater computational capability. But for the highest levels of performance and efficiency, building a bespoke AI (Artificial Intelligence) accelerator can enable applications that would otherwise be impractical.

Engineering a new piece of hardware is a daunting undertaking, whether for an ASIC or an FPGA, but it enables developers to reach a level of performance and efficiency not possible with off-the-shelf components. How can the average development team build a better machine learning accelerator than the designers creating the most leading-edge commercial AI accelerators, with multiple generations under their belts? By highly customizing the implementation to the specific inference being performed, the implementation can be an order of magnitude better than more generalized solutions.

When a general-purpose AI accelerator developer creates an NPU, their goal is to support any neural network that anyone might conceive. They want to get thousands of design ins, so they have to make the design as general as possible. Not only that, but they also aim to have some level of “future proofing” built into their designs. They want to be able to support any network that might be imagined for several years into the future. Not an easy task in a technology that is evolving so rapidly.

A bespoke accelerator needs to only support the one, or perhaps several, networks to be used. This freedom allows many programmable elements in the implementation of the accelerator to be fixed in hardware. This creates hardware that is both smaller and faster than something general purpose. For example, a dedicated convolution accelerator, with a fixed image and filter size, can be up to 10 times faster than a well-designed general purpose TPU.

General purpose accelerators usually use floating point numbers. This is because virtually all neural networks are developed in Python on general purpose computers using floating point numbers. To ensure correct support of those neural networks, the accelerator must, of course, support floating point numbers. However, most neural networks use numbers close to 0, and require a lot of precision there. And floating-point multipliers are huge. If they are not needed, omitting them from the design saves a lot of area and power.

Some NPUs support integer representation, and sometimes with a variety of sizes. But supporting multiple numeric representation formats adds circuitry, which consumes power and adds propagation delays. Choosing one representation and using that exclusively enables a smaller faster implementation.

When building a bespoke accelerator, one is not limited to 8 bits or 16 bits, any size can be used. Picking the correct numeric representation, or “quantizing” a neural network, allows the data and the operators to be optimally sized. Quantization can significantly reduce the data needed to be stored, moved, and operated on. Reducing the memory footprint for the weight database and shrinking the multipliers can really improve the area and power of a design. For example, a 10-bit fixed-point multiplier is about 20 times smaller than a 32-bit floating-point multiplier, and, correspondingly, will use about 1/20th the power. This means the design can either be much smaller and energy efficient by using the smaller multiplier, or the designer can opt to use the area and deploy 20 multipliers that can operate in parallel, producing much higher performance using the same resources.

One of the key challenges in building a bespoke machine learning accelerator is that the data scientists who created the neural network usually do not understand hardware design, and the hardware designers do not understand data science. In a traditional design flow, they would use “meetings” and “specifications” to transfer knowledge and share ideas. But, honestly, no one likes meetings or specifications. And they are not particularly good at effecting an information exchange.

High-Level Synthesis (HLS) allows an implementation produced by the data scientists to be used, not just as an executable reference, but as a machine-readable input to the hardware design process. This eliminates the manual reinterpretation of the algorithm in the design flow, which is slow and extremely error prone. HLS synthesizes an RTL implementation from an algorithmic description. Usually, the algorithm is described in C++ or SystemC, but a number of design flows like HLS4ML are enabling HLS tools to take neural network descriptions directly from machine learning frameworks.

HLS enables exploration of quantization in a way that is not yet practical in machine learning frameworks. To fully understand the impact of quantization requires a bit-accurate implementation of the algorithm, including characterization of the effects of overflow, saturation, and rounding. Today this is only practical in hardware description languages (HDLs) or HLS bit-accurate data types (https://hlslibs.org).
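As a rough illustration of why bit-accurate modeling matters, the short Python sketch below quantizes values to a signed fixed-point format and exposes the rounding and saturation effects that a framework-level floating-point model hides. It is an illustrative stand-in for HDL or HLS bit-accurate types, not an implementation of any particular library, and the weight values are hypothetical:

```python
# Illustrative bit-accurate quantization to a signed fixed-point format.
# A stand-in for HLS bit-accurate types, not an implementation of any library.

def quantize_fixed(x, total_bits=10, frac_bits=7):
    """Round-to-nearest and saturate x to a signed fixed-point number with
    total_bits bits, frac_bits of which are fractional."""
    scale = 1 << frac_bits
    max_code = (1 << (total_bits - 1)) - 1   # +511 for 10 bits
    min_code = -(1 << (total_bits - 1))      # -512 for 10 bits
    code = round(x * scale)                  # rounding effect
    code = max(min_code, min(max_code, code))  # saturation effect
    return code / scale

weights = [0.01317, -0.25, 1.9, -5.0]  # hypothetical trained float weights
print([quantize_fixed(w) for w in weights])
# -> [0.015625, -0.25, 1.8984375, -4.0]  (note the rounding of the first value
#    and the saturation of the last; 10-bit Q3.7 only covers about [-4.0, +4.0))
```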

As machine learning becomes ubiquitous, more embedded systems will need to deploy inferencing accelerators. HLS is a practical and proven way to create bespoke accelerators, optimized for a very specific application, that deliver higher performance and efficiency than general purpose NPUs.

For more information on this topic, read the paper: High-Level Synthesis Enables the Next Generation of Edge AI Accelerators.

The post High-Level Synthesis Propels Next-Gen AI Accelerators appeared first on Semiconductor Engineering.


Chip Aging Becoming Key Factor In Data Center Economics

May 20, 2024, 09:01

Chip aging is becoming a much bigger concern inside of data centers, where it can impact server uptime, utilization rates, and the amount of energy needed to drive signals and cool entire server racks.

Aging in chips is the result of both higher logic utilization and increasing transistor density. This is problematic for data centers, in general, but especially for AI chips where digital logic is expected to run at maximum speed. That generates more heat, which becomes harder to dissipate as the number of specialized and general-purpose processing elements per square millimeter of silicon continues to rise. Heat typically gets trapped between the fins of finFETs and gate-all-around FETs, accelerating electromigration and reducing the time it takes for dielectrics to break down. It also can cause warpage, which can rupture the bonds and contacts between different components in an advanced package or on a PCB.

For data centers, that creates a number of challenges:

  • Thermal management: This requires a deep understanding of workloads and the resulting transient thermal gradients as processing is load-balanced on-chip, between chips or chiplets, and between servers;
  • More data: Data from sensors everywhere, along with larger training sets, all need to be processed faster than in the past to keep up with the flood of data, but all of that needs to happen in the same or smaller footprint without overheating any part of a device, and
  • In-circuit monitoring: Sensors can be added into chips to detect variations in heat and data speeds in different paths, but it’s much more difficult to keep track of tens of thousands of these monitors as they collect data from heterogeneous processing elements, each of which can age at different rates depending on process variation, defectivity, varying workloads, and ambient thermal conditions.

“Servers are much more capable today than they were 10 years ago, and the issue is that power hasn’t scaled like it used to,” said Steven Woo, Rambus fellow and distinguished inventor. “Now, if you want to do lots more work in your server, you have to burn more power to do it. Twenty years ago, a server might dissipate a couple hundred watts. But with the latest servers that NVIDIA just announced around Grace Blackwell, the whole rack is 120 kilowatts, and the individual servers are many kilowatts. Just delivering power into those racks is causing changes in the infrastructure in the industry. Now that you have to bring in and dissipate more power in a small space, you get all kinds of interesting things that could happen over time. The heat that’s being dissipated can have effects on the chip, and you have to worry sometimes about thermal cycling where, as the chip is doing a lot of work, maybe part of the chip stops and then it does more work. You get these rapid cycles of dissipating a lot of power, then not, then dissipating a lot of power, then not. That cycling causes local heating and cooling, leading to thermal stresses, and this impacts all chips, including memory.”

As a result, everyone from the data center manager to the chip architect now has to understand how a chip behaves in the field, and how increasingly customized chip and system architectures will function over time. Downtime is costly for a data center, but under-utilization and reduced performance also carries a high price tag. That, in turn, affects how much margin is considered essential, such as extra data paths if some of them are fully or partially closed off by electromigration, and how that margin will impact performance, power, and area/cost over a chip’s projected lifetime — especially in a heterogeneous design with specialized compute elements.

“When it comes to the hyper-scalers and high powered, highly customized, heterogeneous chips for various different workloads, these chips are on 24/7, so consistent uptime is critical,” said Dan Lee, product management director at Cadence. “Since all of these chips are done at the really advanced nodes, with the smaller device sizes, more developers are looking to do aging analysis, and derive the wear and tear so they can see if the chip is going to last a year or five years. At the same time, an important consideration is also thermal — especially when we’re talking about these heterogeneous integrations, and you don’t really get the thermal conductivity that you would in a straightforward, monolithic design. There’s a bit more thought or planning that needs to be a part of this because aging and heating are related. All things being equal, if you’re operating in a very hot environment, you’re going to expect a lower lifespan.”

Still, determining how much shorter that lifespan will be isn’t always a precise calculation. “Data center SoCs that execute mission-critical workloads need to provide scalable visibility, predict problems before they occur, provide deep-dive analyses into problems, and be optimized to increase longevity of investment,” said Padmakumar Karthik, senior technology manager at Arm. “Data center diagnostic patterns are often deployed to measure the health of an SoC post-manufacturing to prevent silent data corruption (SDC) issues. But on-chip sensors provide an additional layer of insights, detecting droops or aging or thermal events on-chip, all of which can cause SDC incidents. For this reason, scalable, customizable sensor frameworks that can monitor and adapt throughout the useful life of the device, enabling continuous design optimization and preventive maintenance, will be increasingly important.”

There are multiple ways to achieve this, but each data center can be very different. In some cases, chips are designed by systems companies for internal use. And in most cases, there is a mix of different hardware and software, not all of which is state-of-the-art. “Many data centers have legacy infrastructure that may not be inherently designed for optimal power efficiency,” noted Noam Brousard, vice president of systems at proteanTecs, in a recent blog. “Upgrading or retrofitting such infrastructure poses challenges in achieving comprehensive power optimization.”

Even within a single rack, stresses can vary greatly from one server to the next, and from one chip to the next even in the same server. “You can imagine when you have a very big chip, toward the edges of the chip it will expand more than in a small chip, and that can add stress,” said Rambus’ Woo. “You have to really be careful about how you cool things, and memory is no different. You have very specific things you worry about with memory, like the ability to retain data, depending on how hot the chip is.”

In addition, as chips age, parameters drift. Marc Swinnen, director of product marketing in Ansys’ semiconductor division, said the traditional approach has been to use a library that’s characterized as a brand new chip. “The library is characterized at 1 year, 5 years, 10 years, 15 years, and you can run all your analysis multiple times with these different aged libraries. That sounds good on paper, and that’s what a lot of people do, but the problem is that not all parts of the chip age at the same rate. This is why aging is often associated with activity and temperature. Some parts of the chip are more active and hotter than other parts of the chip, so the aging time runs differently for different parts. This means you want to apply some of the old library to some parts of the chip, and the younger library to other parts of the chip, because if signals run between them you have setup and hold issues. If everything slows down at the same time — or one slows down and the other one doesn’t — you’re going to get mismatches, and that’s the difficulty. At the bottom level, it’s easy. Every gate is assigned its right age. That’s simple. You do an analysis with every gate. But how do you assign the age to every gate? Where do you get that information from? You need a lot of realistic activity, and then predict that over the lifespan and with temperature. That’s the problem. How do you actually construct this aging map? Once you have it, the analysis is not that hard.”

Aging maps are application- and workload-specific. Every chip will age differently depending on the functions it performs.
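A minimal sketch of the bookkeeping such an aging map implies is shown below. It assumes a hypothetical per-block activity and temperature profile and a deliberately crude acceleration heuristic; the thresholds, the doubling-per-10°C rule, and the bucketing into characterized library ages are illustrative, not any vendor’s method:

```python
# Illustrative construction of a coarse "aging map": each block is assigned one
# of the pre-characterized aged libraries based on its activity and temperature.
# The acceleration heuristic, thresholds, and block data are all hypothetical.

CHARACTERIZED_AGES = [0, 1, 5, 10, 15]  # years at which libraries were characterized

def effective_age_years(calendar_years, activity, temp_c, ref_temp_c=85.0):
    """Crude stress model: higher switching activity and temperature make a
    block 'age' faster than calendar time. Purely illustrative."""
    thermal_accel = 2.0 ** ((temp_c - ref_temp_c) / 10.0)  # assumed doubling per 10C
    return calendar_years * (0.5 + activity) * thermal_accel

def assign_library(calendar_years, activity, temp_c):
    age = effective_age_years(calendar_years, activity, temp_c)
    # Pick the closest characterized library at or above the effective age.
    return min((a for a in CHARACTERIZED_AGES if a >= age), default=CHARACTERIZED_AGES[-1])

blocks = {"cpu_cluster": (0.9, 95.0), "io_ring": (0.2, 60.0)}  # (activity, temp_c)
aging_map = {name: assign_library(5, act, t) for name, (act, t) in blocks.items()}
print(aging_map)  # -> {'cpu_cluster': 15, 'io_ring': 1}
```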

But aging is just one of many factors that affect data center uptime. “When we look at data center, we look at the whole application first, then whittle it down to what that means for chips and packages,” said Kelly Morgan, senior principal application engineer at Ansys. “From the mechanical reliability lens of the data center operation, we go through thermal cycling, obviously. We’re in a controlled environment. But what does that influence? How does that influence the integrity of the chips as you go through thermal cycles? Typically, we’ll look at things like solder fatigue and other effects.”

Another factor to consider is shipping and handling, which can affect the aging of a chip, package, and board.

“Even before the device is put in place, there are opportunities for vibration,” Morgan said. “You might hit something, which is a bit of a shock. We have customers who are looking at things like drop, shock, and vibration, and they have goals they need to test to. Typically, the standard process is to do a lot of physical testing. Now as you can imagine, that can be pretty challenging. You have to be pretty far along in the design process before you really start to go and test, and if there’s an issue, then you’ve got to go back and retest. Early simulation helps here, especially for those larger-scale events, and that comes down to the chassis, the board, to all the components, including the ICs.”


Fig. 1: Components of complete electronic system analysis. Source: Ansys

Quality control remains a big challenge when it comes to mechanical stresses that can affect aging. Adam Cron, distinguished architect at Synopsys, pointed to a recent Intel white paper, which noted that at the current acceptable defectivity rates, one core fails every two days. To account for this, Cron noted that certain commercial tools support in-system delay testing in a BiST mode. By adding specific IP, any ATPG patterns could be added to that. (Intel’s paper said its solution only applies to stuck-at testing.)

“In very large, millions-of-cores data center-type environments, the implication is that you’d better be ready,” Cron said. “One of the things they were talking about in this paper was in-system scan. Intel was bringing a database of test patterns in, and then applying it in-system after isolating a core. And then, upon a failure, they’d quarantine and move on. But the data centers are apparently running out of that opportunistic time slot to do any of this. We’ve heard some interesting conversations about the fact that people do run a lot of things during certain times. However, other times are cheaper, so all the holes are just getting filled in terms of runtime. Monitors are certainly something to look at, but monitors are looking at systemic degradation. That’s known, if you will. And so as things degrade, Vmin will change, maybe frequency will change. And they’ll be on a pace. They can figure out when to do that. That’s easy enough to figure out. However, if there’s a marginality or some broken component in there, it is not up to the tool to find that. And frankly, the in-system scan wasn’t addressing all components on the die. It was only up to like 80% of stuck-at coverage, which isn’t that much, especially when you’re not looking at all of the pieces inside the die. The point is, there are still opportunities to do better.”

Cron noted that one big systems company suggested a dual-core lockstep mechanism, starting out the data center in dual-core lock-step mode for X number of months. “When it looks like you’ve squeezed the major part of the curve out, in terms of finding these defective components, then unlock them, double your capacity, run like that for a while, and periodically hook some back up again. That means everything is utilized, at least. Of course, some are working at half capacity here and there, but it’s not the whole die. And there are some implications there from a design standpoint, at least for the hardware, but also possibly the operating system, depending on who decides what physical core is used versus what virtual core is used.”

Approaches to measuring aging
Any discussion around aging circuits really boils down to extending the life of the machines in the data center, and not getting caught by surprise when failures occur.

“How do you do that? You have to measure the aging of those machines,” said Neil Hand, director of marketing, IC segment at Siemens EDA. “Right now, if you speak to the CIOs of these big companies with big data centers, they say, ‘We’ve got to get rid of the machines after three years because we can’t risk it going down.’ If you look at embedded analytics capabilities, you can start to embed aging monitors in those devices, you can start to monitor those in real time. It doesn’t look that different than what it does from an automotive perspective. It’s all the same technologies, effectively, but you’re monitoring them. And then you can say, ‘We’re now at 90% of our life for this server.’ We can then just replace that server.”

This feeds into corporate goals around sustainability, as well. “It comes down to building the best thing to begin with, then building it with design for manufacturing in mind so that you don’t get waste during manufacturing, achieve better yields, and finally extend the life of products and build them in environmentally-sustainable ways,” Hand said. “If you can extend the data center lifecycle from three years to five years, that’s big. And especially if you start going to these high-performance, application-specific type of clusters, you may not need to change them as often, because if the underlying capabilities aren’t changing, that might drive the cycling of it. In the case of a biological computer, if there’s no new change to the underlying protein folding mechanisms, you might say, ‘We don’t need a new compute platform. This is really good.’”

The longer the product life can be extended, the better. Design for aging is a matter of, first, performing the aging analysis with the foundry models. “Run the simulations and observe the effects,” said Cadence’s Lee. “When you’re doing the simulation, you want to have the right mission profiles, so you come up with an accurate prediction of how your device is going to behave after a certain number of years in deployment. You may want to combine that with thermal analysis, for example, because how that aging is going to behave will depend on what temperature this design is going to be working at. You may think it’s 22 degrees Celsius, but maybe through some thermal analysis you realize it’s actually going to be operating at 35 or 40 degrees most of the time. That may change the outcome of your aging analysis.”

In terms of the associated thermal analysis, this can extend beyond a single device. “It’s also how that heat is moving,” Lee said. “Let’s say you have this integrated design, where you have some power devices alongside some logic, or some other functionality that is lower power. What you may want to understand is, if those bandgaps or power circuits are generating a lot of heat, that may be shifting over into other parts of your design. So when you run your aging analysis, you may assume that you’re running at 25 degrees, whereas the power devices are at 40 or 45 degrees. They’re on the same chip, they’re very close to each other, and you have to understand how much of that heat is moving over to your logic and what that’s going to bring the temperature up to. You want to know that so you can perform the aging analysis based on that higher temperature.”

Another consideration is combining aging analysis and interconnect parasitics, which is especially relevant for advanced nodes due to the parasitics in the interconnect. “They’re dominant when it comes to performance and functionality,” Lee added. “So when thinking about aging, you also have to think about it being an aged device that has to push the electrons through this interconnect. That’s a pretty heavy load. When you’re doing the aging analysis, you probably will have to be doing it with extracted parasitics. You just can’t do it on a pure schematic design. It doesn’t give you enough detail about what’s really happening physically. This may be included in the aging analysis tool. When most people talk about aging, they may not think about the parasitic aspect to it.”

Combating aging, thermal in memory
While standards don’t work in custom silicon, they do work for some standard components in those devices, such as memory. Over the past 10 to 15 years, memory standards have started to address the impact of heat.

“If you start to exceed certain temperature limits, you’ve got to refresh the device more frequently because the charge can leak off the cells more quickly,” said Rambus’ Woo. “So there are temperature-dependent refresh rates. There are other things that can be exacerbated, like the capacitors are getting smaller, they’re holding fewer electrons because there are so many more of them on a chip now, so we’ve seen memories adopt on-die error correction. This on-die error correction is something that is hidden from the outside world. In many cases, you don’t even know an error has occurred and been corrected on the chip. Those kinds of technologies become even more important now because the temperatures can be higher.”
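For example, a common convention in DDR-class memory standards is to halve the refresh interval once the device exceeds its normal temperature range. The Python sketch below illustrates that idea only; the 85°C/95°C thresholds and the 7.8µs/3.9µs intervals are typical values and should be checked against a specific device’s datasheet:

```python
# Illustrative temperature-dependent refresh-interval selection, following the
# common DDR-class convention of halving the refresh interval above the normal
# temperature range. Check the actual device datasheet for real values.

def refresh_interval_us(case_temp_c):
    if case_temp_c <= 85.0:
        return 7.8   # normal operating range
    elif case_temp_c <= 95.0:
        return 3.9   # extended range: refresh twice as often to fight leakage
    raise ValueError("beyond rated operating temperature")

for t in (70.0, 88.0):
    print(t, refresh_interval_us(t))  # 70.0 -> 7.8, 88.0 -> 3.9
```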

There also is growing demand for more telemetry to provide monitoring information. “You just want to know if anything is overheating,” said Woo. “Does something seem like it’s malfunctioning? The data center manager will get regular updates about the status of the major components of the system. A lot of boards now in servers have baseboard management controllers (BMCs), which are little chips that sit on each board and are responsible for, among other things, reporting back the health of that board when a server might have five or six boards. We’re frequently seeing more of these BMC chips.”

Design for aging
While the goal is to be able to guarantee a certain lifetime for the chips in a data center, the challenges for achieving that are expanding. “There’s a growing list of things that can be harmful to devices over their lifetime,” Woo said. “It’s a balance between not adding too much cost, even though you have to increase the reliability and maybe add new features, and all of these things are in play with each other.”

Whether it is liquid cooling or higher levels of RAS ECC in the system, there is no single best answer for every application. In general, the industry is moving toward higher reliability and increasing resilience, but there are many ways to get there and challenges with each of them.

“Just as 15 years ago we didn’t necessarily always think we had to talk about power, now we have to talk about it all the time,” Woo said. “The same thing is going to be true for resilience and reliability. It’s going to be required to become part of the way people think about architectures, and part of that is how the memory system improves its reliability. You can’t really do anything unless you can compute on some data, and you have to make sure that data is reliable. It will touch how memory is stored in a DRAM. It will touch how memory is communicated across links. And it even will touch how processors manipulate data once they get a hold of it in their caches, and in the compute pipelines. Also, one of the key things people will worry about is how much of that susceptibility is brought about by age-related issues, like heating cycles, etc.”

Finally, there are even issues around the quality of the power that comes into a system. “The servers get noise on the power rails, and it’s a balance between how much money you’re willing to pay for the power delivery versus the quality of power,” said Woo. “You have to be tolerant of those kinds of things, too. Power management becomes more challenging, as well as the amount of power that these systems are using today. NVIDIA systems bring 48-volt power into the racks, and there is talk about even higher voltage levels. Those changes in infrastructure can all impact heat, and can age components differently.”

The post Chip Aging Becoming Key Factor In Data Center Economics appeared first on Semiconductor Engineering.


Efficient Electronics

May 16, 2024, 09:07

Attention nowadays has turned to the energy consumption of systems that run on electricity. At the moment, the discussion is focused on electricity consumption in data centers: if this continues to rise at its current rate, it will account for a significant proportion of global electricity consumption in the future. Yet there are other, less visible electricity consumers whose power needs are also constantly growing. One example is mobile communications, where ongoing expansion – especially with the current 5G standard and the future 6G standard – is pushing up the number of base stations required. This, too, will drive up electricity demand, since that demand increases linearly with the number of stations, at least if the demand per base station is not reduced. Another example is electronics for the management of household appliances and in the industrial sector: more and more such systems are being installed, and their electronics are becoming significantly more powerful. They are not currently optimized for power consumption, but rather for performance.

This state of affairs simply cannot continue into the future for two reasons: first, the price of electricity will continue to rise worldwide; and second, many companies are committed to becoming carbon neutral. Their desire for carbon neutrality in turn makes electricity yet more expensive and restricts the overall quantity much more severely. As a result, there will be a significant demand for efficient electronics in the coming years, particularly as regards electricity consumption.

This development is already evident today, especially in power electronics, where the use of new semiconductor materials such as GaN or SiC has made it possible to reduce power consumption. A key driver for the development and introduction of such new materials was the electric car market, as reduced losses in the electronics lead directly to increased vehicle range. In the future, these materials will also find their way into other areas; for instance, they are already beginning to establish themselves in voltage transformers in various industries. However, this shift requires more factories and more suppliers for production, and further work also needs to be carried out to develop appropriate circuit concepts for these technologies.

In addition to the use of new materials, other concepts to reduce energy consumption are needed. The data center sector will require increasingly better-adapted circuits – ones that have been developed for a specific task, and as a result can perform this task much more efficiently than universal processors. This involves striking the optimum balance between universal architectures, such as microprocessors and graphics cards, and highly specialized architectures that are suitable for only one use case. Some products will also fall between these two extremes. The increased energy efficiency is then “purchased” through the effort and expense of developing these highly specialized architectures. It’s important to note that the more specialized an architecture is, the smaller the market for it. That means the only way such architectures will be economically viable is if they can be developed efficiently. This calls for new approaches that derive these architectures directly from high-level hardware/software optimization, without the additional implementation steps that are still necessary today. In sum, the only way to make this approach possible is by using novel concepts and tools to generate circuits directly from a high-level description.

The post Efficient Electronics appeared first on Semiconductor Engineering.


How To Successfully Deploy GenAI On Edge Devices

May 16, 2024, 09:06

Generative AI (GenAI) burst onto the scene and into the public’s imagination with the launch of ChatGPT in late 2022. Users were amazed at the natural language processing chatbot’s ability to turn a short text prompt into coherent humanlike text including essays, language translations, and code examples. Technology companies – impressed with ChatGPT’s abilities – have started looking for ways to improve their own products or customer experiences with this innovative technology. Since the ‘cost’ of adding GenAI includes a significant jump in computational complexity and power requirements versus previous AI models, can this class of AI algorithms be applied to practical edge device applications where power, performance and cost are critical? It depends.

What is GenAI?

A simple definition of GenAI is ‘a class of machine learning algorithms that can produce various types of content including human like text and images.’ Early machine learning algorithms focused on detecting patterns in images, speech or text and then making predictions based on the data. For example, predicting the percentage likelihood that a certain image included a cat. GenAI algorithms take the next step – they perceive and learn patterns and then generate new patterns on demand by mimicking the original dataset. They generate a new image of a cat or describe a cat in detail.

While ChatGPT might be the most well-known GenAI algorithm, there are many available, with more being released on a regular basis. Two major types of GenAI algorithms are text-to-text generators – aka chatbots – like ChatGPT, GPT-4, and Llama2, and text-to-image generative models like DALLE-2, Stable Diffusion, and Midjourney. You can see example prompts and their returned outputs from these two types of GenAI models in figure 1. Because one is text based and one is image based, these two types of outputs will demand different resources from edge devices attempting to implement these algorithms.

Fig. 1: Example GenAI outputs from a text-to-image generator (DALLE-2) and a text-to-text generator (ChatGPT).

Edge device applications for Gen AI

Common GenAI use cases require connection to the internet and from there access to large server farms to compute the complex generative AI algorithms. However, for edge device applications, the entire dataset and neural processing engine must reside on the individual edge device. If the generative AI models can be run at the edge, there are potential use cases and benefits for applications in automobiles, cameras, smartphones, smart watches, virtual and augmented reality, IoT, and more.

Deploying GenAI on edge devices has significant advantages in scenarios where low latency, privacy or security concerns, or limited network connectivity are critical considerations.

Consider the possible uses of GenAI in automotive applications. A vehicle is not always in range of a wireless signal, so GenAI needs to run with resources available on the edge. GenAI could be used for improving roadside assistance and converting a manual into an AI-enhanced interactive guide. In-car uses could include a GenAI-powered virtual voice assistant, improving the ability to set navigation, play music or send messages with your voice while driving. GenAI could also be used to personalize your in-cabin experience.

Other edge applications could benefit from generative AI. Augmented Reality (AR) edge devices could be enhanced by locally generating overlay computer-generated imagery and relying less heavily on cloud processing. While connected mobile devices can use generative AI for translation services, disconnected devices should be able to offer at least a portion of the same capabilities. Like our automotive example, voice assistant and interactive question-and-answer systems could benefit a range of edge devices.

While use cases for GenAI at the edge exist now, implementations must overcome the challenges related to computational complexity and model size, as well as the limitations of power, area, and performance inherent in edge devices.

What technology is required to enable GenAI?

To understand GenAI’s architectural requirements, it is helpful to understand its building blocks. At the heart of GenAI’s rapid development are transformers, a relatively new type of neural network introduced in a Google Brain paper in 2017. Transformers have outperformed established AI models like Recurrent Neural Networks (RNNs) for natural language processing and Convolutional Neural Networks (CNNs) for images, video or other two- or three-dimensional data. A significant architectural improvement of a transformer model is its attention mechanism. Transformers can pay more attention to specific words or pixels than legacy AI models, drawing better inferences from the data. This allows transformers to better learn contextual relationships between words in a text string compared to RNNs and to better learn and express complex relationships in images compared to CNNs.

Fig. 2: Parameter sizes for various machine learning algorithms.

GenAI models are pre-trained on vast amounts of data, which allows them to better recognize and interpret human language or other types of complex data. The larger the datasets, the better the model can process human language, for instance. Compared to CNN or vision transformer machine learning models, GenAI algorithms have parameters – the pretrained weights or coefficients used in the neural network to identify patterns and create new ones – that are orders of magnitude larger. We can see in figure 2 that ResNet50 – a common CNN algorithm used for benchmarking – has 25 million parameters (or coefficients). Some transformers like BERT and Vision Transformer (ViT) have parameters in the hundreds of millions, while other transformers, like MobileViT, have been optimized to better fit embedded and mobile applications. MobileViT is comparable to the CNN model MobileNet in parameter count.

Compared to CNN and vision transformers, ChatGPT requires 175 billion parameters and GPT-4 requires 1.75 trillion parameters. Even GPUs implemented in server farms struggle to execute these high-end large language models. How could an embedded neural processing unit (NPU) hope to handle so many parameters given the limited memory resources of edge devices? The answer is they cannot. However, there is a trend toward making GenAI more accessible in edge device applications, which have more limited computation resources. Some LLM models are tuned to reduce the resource requirements with a reduced parameter set. For example, Llama-2 offers a 70-billion-parameter version of the model, but smaller versions with fewer parameters have also been created. Llama-2 with seven billion parameters is still large, but it is within reach of a practical embedded NPU implementation.

There is no hard threshold for generative AI running on the edge; however, text-to-image generators like Stable Diffusion, with one billion parameters, can run comfortably on an NPU, and the expectation is for edge devices to run LLMs with up to six to seven billion parameters. MLCommons has added GPT-J, a six-billion-parameter GenAI model, to its MLPerf edge AI benchmark list.

Running GenAI on the edge

GenAI algorithms require a significant amount of data movement and computational complexity (with transformer support). The balance of those two requirements can determine whether a given architecture is compute-bound – not enough multiplications for the data available – or memory-bound – not enough memory and/or bandwidth for all the multiplications required for processing. Text-to-image has a better mix of compute and bandwidth requirements – more computations needed for processing two-dimensional images and fewer parameters (in the one billion range). Large language models are more lopsided. There is less compute required, but a significantly larger amount of data movement. Even the smaller (6-7B parameter) LLMs are memory-bound.

The obvious solution is to choose the fastest memory interface available. From figure 3, you can see that a typical memory used in edge devices, LPDDR5, has a bandwidth of 51 Gbps, while HBM2E can support up to 461 Gbps. This does not, however, take into consideration the power-down benefits of LPDDR memory over HBM. While HBM interfaces are often used in high-end server-type AI implementations, LPDDR is almost exclusively used in power-sensitive applications because of its power-down abilities.

Fig. 3: The bandwidth and power difference between LPDDR and HBM.

Using an LPDDR memory interface automatically limits the maximum data bandwidth to well below what is achievable with an HBM memory interface. That means edge applications will automatically have less bandwidth for GenAI algorithms than an NPU or GPU used in a server application. One way to address bandwidth limitations is to increase the amount of on-chip L2 memory. However, this impacts area and, therefore, silicon cost. While embedded NPUs often implement hardware and software techniques to reduce bandwidth, these will not allow LPDDR to approach HBM bandwidths. The embedded AI engine will be limited to the amount of LPDDR bandwidth available.
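To see why the memory interface, rather than the MAC count, sets the ceiling for LLMs, here is a back-of-the-envelope Python sketch using the bandwidth figures above. The assumptions are illustrative: 8-bit weights, every weight streamed once per generated token, and no on-chip caching or batching:

```python
# Back-of-the-envelope estimate of memory-bound token rate for an LLM.
# Assumptions (illustrative): 8-bit weights, all weights read once per token,
# no on-chip caching or batching; bandwidth numbers taken from figure 3.

def tokens_per_second(params_billion, bandwidth_gbps, bytes_per_param=1):
    bytes_per_token = params_billion * 1e9 * bytes_per_param
    bytes_per_sec = bandwidth_gbps * 1e9 / 8  # Gbits/s -> bytes/s
    return bytes_per_sec / bytes_per_token

for name, bw in (("LPDDR5 (51 Gbps)", 51), ("HBM2E (461 Gbps)", 461)):
    print(f"{name}: ~{tokens_per_second(7, bw):.1f} tokens/s for a 7B model")
# -> roughly 0.9 tokens/s with LPDDR5 vs. about 8 tokens/s with HBM2E
```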

Implementation of GenAI on an NPX6 NPU IP

The Synopsys ARC NPX6 NPU IP family is based on a sixth-generation neural network architecture designed to support a range of machine learning models including CNNs and transformers. The NPX6 family is scalable with a configurable number of cores, each with its own independent matrix multiplication engine, generic tensor accelerator (GTA), and dedicated direct memory access (DMA) units for streamlined data processing. The NPX6 can scale from applications requiring less than one TOPS of performance to those requiring thousands of TOPS, using the same development tools to maximize software reuse.

The matrix multiplication engine, GTA, and DMAs have all been optimized to support transformers, which allows the ARC NPX6 to support GenAI algorithms. Each core’s GTA is expressly designed and optimized to efficiently perform nonlinear functions such as ReLU, GELU, and sigmoid. These are implemented using a flexible lookup-table approach so that future nonlinear functions can also be supported. The GTA also handles other critical operations, including SoftMax and the L2 normalization needed in transformers. Complementing this, the matrix multiplication engine within each core can perform 4,096 multiplications per cycle. Because GenAI is based on transformers, computation is not the limiting factor for running GenAI on the NPX6 processor.
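
As a generic illustration of the lookup-table idea for nonlinear activations (not Synopsys’ actual implementation; the table size, input range, and interpolation scheme here are arbitrary choices), the sketch below approximates GELU from a 257-entry table with linear interpolation.

import numpy as np

# Reference GELU (tanh approximation) and a table-driven version of it.
def gelu(x):
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

xs = np.linspace(-6.0, 6.0, 257)       # 257-entry table over a clamped input range
table = gelu(xs)

def gelu_lut(x):
    x = np.clip(x, -6.0, 6.0)          # saturate outside the tabulated range
    return np.interp(x, xs, table)     # linear interpolation between table entries

test = np.linspace(-6.0, 6.0, 10001)
print("max |LUT - exact| on [-6, 6]:", float(np.max(np.abs(gelu_lut(test) - gelu(test)))))

Swapping in a different table turns the same datapath into a sigmoid, SiLU, or whatever nonlinearity a future model requires, which is the point of keeping the function table-driven rather than hard-wired.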

Efficient NPU design for transformer-based models like GenAI requires complex multi-level memory management. The ARC NPX6 processor has a flexible memory hierarchy and supports scalable L2 memory of up to 64MB of on-chip SRAM. Furthermore, each NPX6 core is equipped with independent DMAs dedicated to fetching feature maps and coefficients and to writing new feature maps. This segregation of tasks allows for an efficient, pipelined data flow that minimizes bottlenecks and maximizes processing throughput. The family also includes a range of hardware and software bandwidth-reduction techniques to make the best use of the available bandwidth.
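
The payoff of per-core DMAs is that fetching the next tile of coefficients and feature maps can proceed while the current tile is being computed. The schematic Python below only models the ping-pong buffer bookkeeping of that pattern; the function names and three-stage split are illustrative, not the NPX6 programming model, and in real hardware the fetch, compute, and write-back stages run concurrently rather than sequentially as they do here.

# Ping-pong (double-buffered) tile pipeline: conceptually, while the MAC engine
# works on tile i, one DMA prefetches tile i+1 and another writes back results.
# This sequential sketch only captures the buffer rotation and ordering.
def run_layer(num_tiles, fetch, compute, writeback):
    buffers = [None, None]                       # two on-chip tile buffers
    buffers[0] = fetch(0)                        # prime the pipeline
    for i in range(num_tiles):
        if i + 1 < num_tiles:
            buffers[(i + 1) % 2] = fetch(i + 1)  # "DMA" prefetch of the next tile
        writeback(i, compute(buffers[i % 2]))    # compute current tile, then store

# Toy usage with stand-in callables.
run_layer(4,
          fetch=lambda i: i,
          compute=lambda tile: tile * tile,
          writeback=lambda i, r: print(f"tile {i} -> {r}"))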

In an embedded GenAI application, the ARC NPX6 family will only be limited by the LPDDR available in the system. The NPX6 successfully runs Stable Diffusion (text-to-image) and Llama-2 7B (text-to-text) GenAI algorithms with efficiency dependent on system bandwidth and the use of on-chip SRAM. While larger GenAI models could run on the NPX6, they will be slower – measured in tokens per second – than server implementations. Learn more at www.synopsys.com/npx

The post How To Successfully Deploy GenAI On Edge Devices appeared first on Semiconductor Engineering.


Will Domain-Specific ICs Become Ubiquitous?

16 May 2024, 09:05

Questions are surfacing for all types of design, ranging from small microcontrollers to leading-edge chips, over whether domain-specific design will become ubiquitous, or whether it will fall into the historic pattern of customization first, followed by lower-cost, general-purpose components.

Custom hardware always has been a double-edged sword. It can provide a competitive edge for chipmakers, but often requires more time to design, verify, and manufacture a chip, which can sometimes cost a market window. In addition, it’s often too expensive for all but the most price-resilient applications. This is a well-understood equation at the leading edge of design, particularly where new technologies such as generative AI are involved.

But with planar scaling coming to an end, and with more features tailored to specific domains, the chip industry is struggling to figure out whether the business/technical equation is undergoing a fundamental and more permanent change. This is muddied further by the fact that some 30% to 35% of all design tools today are being sold to large systems companies for chips that will never be sold commercially. In those applications, the collective savings from improved performance per watt may dwarf the cost of designing, verifying, and manufacturing a highly optimized multi-chip/multi-chiplet package across a large data center, leaving the debate about custom vs. general-purpose more uncertain than ever.

“If you go high enough in the engineering organization, you’re going to find that what people really want to do is a software-defined whatever it is,” says Russell Klein, program director for high-level synthesis at Siemens EDA. “What they really want to do is buy off-the-shelf hardware, put some software on it, make that their value-add, and ship that. That paradigm is breaking down in a number of domains. It is breaking down where we need either extremely high performance, or we need extreme efficiency. If we need higher performance than we can get from that off-the-shelf system, or we need greater efficiency, we need the battery to last longer, or we just can’t burn as much power, then we’ve got to start customizing the hardware.”

Even the selection of processing units can make a solution custom. “Domain-specific computing is already ubiquitous,” says Dave Fick, CEO and cofounder of Mythic. “Modern computers, whether in a laptop, phone, security camera, or in farm equipment, consist of a mix of hardware blocks co-optimized with software. For instance, it is common for a computer to have video encode or decode hardware units to allow a system to connect to a camera efficiently. It is common to have accelerators for encryption so that we can safely communicate. Each of these is co-optimized with software algorithms to make commonly used functions highly efficient and flexible.”

Steve Roddy, chief marketing officer at Quadric, agrees. “Heterogeneous processing in SoCs has been de rigueur in the vast majority of consumer applications for the past two decades or more.  SoCs for mobile phones, tablets, televisions, and automotive applications have long been required to meet a grueling combination of high-performance plus low-cost requirements, which has led to the proliferation of function-specific processors found in those systems today.  Even low-cost SoCs for mobile phones today have CPUs for running Android, complex GPUs to paint the display screen, audio DSPs for offloading audio playback in a low-power mode, video DSPs paired with NPUs in the camera subsystem to improve image capture (stabilization, filters, enhancement), baseband DSPs — often with attached NPUs — for high speed communications channel processing in the Wi-Fi and 5G subsystems, sensor hub fusion DSPs, and even power-management processors that maximize battery life.”

It helps to separate what you call general-purpose and what is application-specific. “There is so much benefit to be had from running your software on dedicated hardware, what we call bespoke silicon, because it gives you an advantage over your competitors,” says Marc Swinnen, director of product marketing in Ansys’ Semiconductor Division. “Your software runs faster, lower power, and is designed to run specifically what you want to run. It’s hard for a competitor with off-the-shelf hardware to compete with you. Silicon has become so central to the business value, the business model, of many companies that it has become important to have that optimized.”

There is a balance, however. “If there is any cost justification in terms of return on investment and deployment costs, power costs, thermal costs, cooling costs, then it always makes sense to build a custom ASIC,” says Sharad Chole, chief scientist and co-founder of Expedera. “We saw that for cryptocurrency, we see that right now for AI. We saw that for edge computing, which requires extremely ultra-low power sensors and ultra-low power processes. But there also has been a push for general-purpose computing hardware, because then you can easily make the applications more abstract and scalable.”

Part of the seeming conflict is due to the scope of specificity. “When you look at the architecture, it’s really the scope that determines the application specificity,” says Frank Schirrmeister, vice president of solutions and business development at Arteris. “Domain-specific computing is ubiquitous now. The important part is the constant moving up of the domain specificity to something more complex — from the original IP, to configurable IP, to subsystems that are configurable.”

In the past, it has been driven more by economics. “There’s an ebb and a flow to it,” says Paul Karazuba, vice president of marketing at Expedera. “There’s an ebb and a flow to putting everything into a processor. There’s an ebb and a flow to having co-processors, augmenting functions that are inside of that main processor. It’s a natural evolution of pretty much everything. It may not necessarily be cheaper to design your own silicon, but it may be more expensive in the long run to not design your own silicon.”

An attempt to formalize that ebb and flow was made by Tsugio Makimoto in the 1990s, when he was Sony’s CTO. He observed that electronics cycled between custom solutions and programmable ones approximately every 10 years. What’s changed is that most custom chips from the time of his observation contained highly programmable standard components.

Technology drivers
Today, it would appear that technical issues will decide this. “The industry has managed to work around power issues and push up the thermal envelope beyond points I personally thought were going to be reasonable, or feasible,” says Elad Alon, co-founder and CEO of Blue Cheetah. “We’re hitting that power limit, and when you hit the power limit it drives you toward customization wherever you can do it. But obviously, there is tension between flexibility, scalability, and applicability to the broadest market possible. This is seen in the fast pace of innovation in the AI software world, where tomorrow there could be an entirely different algorithm, and that throws out almost all the customizations one may have done.”

The slowing of Moore’s Law will have a fundamental influence on the balance point. “There have been a number of bespoke silicon companies in the past that were successful for a short period of time, but then failed,” says Ansys’ Swinnen. “They had made some kind of advance, be it architectural or addressing a new market need, but then the general-purpose chips caught up. That is because there’s so much investment in them, and there’s so many people using them, there’s an entire army of people advancing, versus your company, just your team, that’s advancing your bespoke solution. Inevitably, sooner or later, they bypass you and the general-purpose hardware just gets better than the specific one. Right now, the pendulum has swung toward custom solutions being the winner.”

However, general-purpose processors do not automatically advance if companies don’t keep up with adoption of the latest nodes, and that leads to even more opportunities. “When adding accelerators to a general-purpose processor starts to break down, because you want to go faster or become more efficient, you start to create truly customized implementations,” says Siemens’ Klein. “That’s where high-level synthesis starts to become really interesting, because you’ve got that software-defined implementation as your starting point. We can take it through high-level synthesis (HLS) and build an accelerator that’s going to do that one specific thing. We could leave a bunch of registers to define its behavior, or we can just hard code everything. The less general that system is, the more specific it is, usually the higher performance and the greater efficiency that we’re going to take away from it. And it almost always is going to be able to beat a general-purpose accelerator or certainly a general-purpose processor in terms of both performance and efficiency.”

At the same time, IP has become massively configurable. “There used to be IP as the building blocks,” says Arteris’ Schirrmeister. “Since then, the industry has produced much larger and more complex IP that takes on the role of sub-systems, and that’s where scope comes in. We have seen Arm with what they call the compute sub-systems (CSS), which are an integration and then hardened. People care about the chip as a whole, and then the chip and the system context with all that software. Application specificity has become ubiquitous in the IP space. You either build hard cores, you use a configurable core, or you use high-level synthesis. All of them are, by definition, application-specific, and the configurability plays in there.”

Put in perspective, there is more than one way to build a device, and an increasing number of options for getting it done. “There’s a really large market for specialized computing around some algorithm,” says Klein. “IP for that is going to be both in the form of discrete chips, as well as IP that could be built into something. Ultimately, that has to become silicon. It’s got to be hardened to some degree. They can set some parameters and bake it into somebody’s design. Consider an Arm processor. I can configure how many CPUs I want, I can configure how big I want the caches, and then I can go bake that into a specific implementation. That’s going to be the thing that I build, and it’s going to be more targeted. It will have better efficiency and a better cost profile and a better power profile for the thing that I’m doing. Somebody else can take it and configure it a little bit differently. And to the degree that the IP works, that’s a great solution. But there will always be algorithms that don’t have a big enough market for IP to address. And that’s where you go in and do the extreme customization.”

Chiplets
Some have questioned if the emerging chiplet industry will reverse this trend. “We will continue to see systems composed of many hardware accelerator blocks, and advanced silicon integration technologies (i.e., 3D stacking and chiplets) will make that even easier,” says Mythic’s Fick. “There are many companies working on open standards for chiplets, enabling communication bandwidth and energy efficiency that is an order of magnitude greater than what can be built on a PCB. Perhaps soon, the advanced system-in-package will overtake the PCB as the way systems are designed.”

Chiplets are not likely to be highly configurable. “Configuration in the chiplet world might become just a function of switching off things you don’t need,” says Schirrmeister. “Configuration really means that you do not use certain things. You don’t get your money back for those items. It’s all basically applying math and predicting what your volumes are going to be. If it’s an incremental cost that has one more block on it to support another interface, or making the block the Ethernet block with time triggered stuff in it for automotive, that gives you an incremental effort of X. Now, you have to basically estimate whether it also gives you a multiple of that incremental effort as incremental profit. It works out this way because chips just become very configurable. Chiplets are just going in the direction or finding the balance of more generic usage so that you can apply them in more chiplet designs.”

The chiplet market is far from certain today. “The promise of chiplets is that you use only the function that you want from the supplier that you want, in the right node, at the right location,” says Expedera’s Karazuba. “The idea of specialization and chiplets are at arm’s length. They’re actually together, but chiplets have a long way to go. There’s still not that universal agreement of the different things around a chiplet that have to be in order to make the product truly mass market.”

While chiplets have been proven to work, nearly all of the chiplets in use today are proprietary. “To build a viable [commercial] chiplet company, you have to be going after a broad enough market, large enough from a dollar perspective, then you can make all the investment, have success and get everything back accordingly,” says Blue Cheetah’s Alon. “There’s a similar tension where people would like to build a general-purpose chiplet that can be used anywhere, by anyone. That is the plug-and-play discussion, but you could finish up with something that becomes so general-purpose, with so much overhead, that it’s just not attractive in any particular market. In the chiplet case, for technical reasons, it might not actually really work that way at all. You might try to build it for general purpose, and it turns out later that it doesn’t plug into particular sockets that are of interest.”

The economics of chiplet viability have not yet been defined. “The thing about chiplets is they can be small,” says Klein. “Being small means that we don’t need as big a market for them as we would for a very large chip. We can also build them on different technologies. We can have some that are on older technologies, where transistors are cheaper, and we can combine those with other chiplets that might be leading-edge nodes where we could have general-purpose CPUs or NPU accelerators. There’s a mix-and-match, and we can do chiplets smaller than we can general-purpose chips. We can do smaller runs of them. We can take that IP and customize it for a particular market vertical and create some chiplets for that, change the configuration a bit, and do another run for something else. There’s a level of customization that can be deployed and supported by the market that’s a little bit more than we’ve seen in full-size chips, where the entire thing has to be built into one package.”

Conclusion
What it means for a design to be general-purpose or custom is changing. All designs will contain some of each. Some companies will develop novel architectures using general-purpose processors, and these will be better than a fully general-purpose solution. Others will create highly customized hardware for some functions that are known to be stable, and general purpose for things that are likely to change. One thing has never changed, however. A company is not likely to add more customization than necessary to satisfy the needs of the market they are targeting.

Further Reading
Challenges With Chiplets And Power Delivery
Benefits and challenges in heterogeneous integration.
Chiplets: 2023 (EBook)
What chiplets are, what they are being used for today, and what they will be used for in the future.

The post Will Domain-Specific ICs Become Ubiquitous? appeared first on Semiconductor Engineering.


DDR5 PMICs Enable Smarter, Power-Efficient Memory Modules

16 May 2024, 09:05

Power management has received increasing focus in microelectronic systems as the need for greater power density, efficiency and precision have grown apace. One of the important ongoing trends in service of these needs has been the move to localizing power delivery. To optimize system power, it’s best to deliver as high a voltage as possible to the endpoint where the power is consumed. Then at the endpoint, that incoming high voltage can be regulated into the lower voltages with higher currents required by the endpoint components.

We saw this same trend play out in the architecting of the DDR5 generation of computer main memory. In planning for DDR5, the industry laid out ambitious goals for memory bandwidth and capacity. Concurrently, the aim was to maintain power within the same envelope as DDR4 on a per module basis. In order to achieve these goals, DDR5 required a smarter DIMM architecture; one that would embed more intelligence in the DIMM and increase its power efficiency. One of the biggest architectural changes of this smarter DIMM architecture was moving power management from the motherboard to an on-module Power Management IC (PMIC) on each DDR5 RDIMM.

In previous DDR generations, the power regulator on the motherboard had to deliver a low voltage at high current across the motherboard, through a connector and then onto the DIMM. As supply voltages were reduced over time (to maintain power levels at higher data rates), it was a growing challenge to maintain the desired voltage level because of IR drop. By implementing a PMIC on the DDR5 RDIMM, the problem with IR drop was essentially eliminated.
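
The IR-drop argument is easy to quantify to first order: for a fixed power, a higher delivery voltage means proportionally lower current through the board and connector resistance, so the droop (I·R) and the resistive loss (I²·R) both shrink sharply. The path resistance, power level, and voltage choices below are invented purely for illustration.

# First-order IR-drop comparison for delivering ~15 W to a DIMM through an
# assumed 10 milliohm motherboard-plus-connector path.
def delivery(power_w, volts, path_mohm=10.0):
    r = path_mohm / 1000.0
    i = power_w / volts                    # current drawn at the delivery voltage
    return i, i * r * 1000.0, i * i * r    # amps, droop in mV, loss in W

for v in (1.1, 12.0):   # low-voltage direct delivery vs. a bulk input regulated on-DIMM
    amps, droop_mv, loss_w = delivery(15.0, v)
    print(f"{v:5.1f} V delivery: {amps:5.2f} A, droop {droop_mv:6.1f} mV, loss {loss_w:5.2f} W")

At 1.1 V the droop alone is a large fraction of the rail, while with a high-voltage bulk input regulated down to the DRAM rails by the on-DIMM PMIC, both droop and loss become negligible.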

In addition, the on-DIMM PMIC allows very fine-grained control of the voltage levels supplied to the various components on the DIMM. As such, DIMM suppliers can dial in the best power levels for the performance target of a particular DIMM configuration. On-DIMM PMICs also offer an economic benefit. Power management on the motherboard meant the regulator had to be designed to support a system with fully populated DIMMs. With on-DIMM PMICs, you pay only for the power management capacity needed to support your specific system memory configuration.

The upshot is that power management has become a major enabler of increasing memory performance. Advancing memory performance has been the mission of Rambus for nearly 35 years. We’re intimate with memory subsystem design on modules, with expertise across many critical enabling technologies, and have demonstrated the disciplines required to successfully develop chips for the challenging module environment with its increased power density, space constraints and complex thermal management challenges.

As part of the development of our DDR5 memory interface chipset, Rambus built a world-class power management team and has now introduced a new family of DDR5 server PMICs. This new server PMIC product family lays the foundation for a roadmap of future power management chips. As AI continues to expand from training to inference, increasing demands on memory performance will extend beyond servers to client systems and drive the need for new PMIC solutions tailored for emerging use cases and form factors across the computing landscape.


The post DDR5 PMICs Enable Smarter, Power-Efficient Memory Modules appeared first on Semiconductor Engineering.


How Quickly Can You Take Your Idea To Chip Design?

16 May 2024, 09:04

Gone are the days of expensive tapeouts done only by commercial companies. Thanks to Tiny Tapeout, students, hobbyists, and more can create a simple ASIC or PCB design and actually send it to a foundry for a small fraction of the usual cost. Learners from all walks of life can use the resources to learn how to design a chip, without signing an NDA or installing licenses, faster than ever before. Whether you’re a digital, analog, or mixed-signal designer, there’s support for you.

We’re excited to support our academic network in participating in this initiative to gain more hands-on experience that will prepare them for a career in the semiconductor industry. We have professors incorporating it into the classroom, giving students the exciting opportunity to take their ideas from concept to reality.

“It gives people this joy when we design the chip in class. The 50 students that took the class last year, they designed a chip and Google funded it, and every time they got their design on the chip, their eyes got really big. I love being able to help students do that, and I want to do that all over the country,” said Matt Morrison, associate teaching professor in computer science and engineering, University of Notre Dame.

We also advise and encourage extracurricular design teams to challenge themselves outside the classroom. We partner with multiple design teams focused on creating an environment for students to explore the design flow process from RTL-to-GDS, and Tiny Tapeout provides an avenue for them.

“Just last year, SiliconJackets was founded by Zachary Ellis and me as a Georgia Tech club that takes ideas to SoC tapeout. Today, I am excited to share that we submitted the club’s first-ever design to Tiny Tapeout 6. This would not have been possible without the help from our advisors, and industry partners at Apple and Cadence,” said Nealson Li, SiliconJackets vice president and co-founder.

Tiny Tapeout also creates a culture of knowledge sharing, allowing participants to share their designs online, collaborate with one another, and build off an existing design. This creates a unique opportunity to learn from others’ experiences, enabling faster learning and more exposure.

“One of my favorite things about this project is that you’re not only going to get your design, but everybody else’s as well. You’ll be able to look through the chips’ data sheet and try out someone else’s design. In our previous runs, we’ve seen some really interesting designs, including RISC-V CPUs, FPGAs, ring oscillators, synthesizers, USB devices, and loads more,” said Matt Venn, science & technology communicator and electronic engineer.

Tiny Tapeout is on its seventh run, launched on April 22, 2024, and will remain open until June 1, 2024, or until all the slots fill up! With each run, more unique designs are created, more knowledge is shared, and more of the future workforce is developed. Check out the designs that were just submitted for Tiny Tapeout 6.

What can you expect when you participate?

  • Access to training materials
  • Ability to create your own design using one of the templates
  • Support from the FAQs or Tiny Tapeout community

Interested in learning more? Check out their webpage. Want to use Cadence tools for your design? Check out our University Program and what tools students can access for free. We can’t wait to see what you come up with and how it’ll help you launch a career in the electronics industry!

The post How Quickly Can You Take Your Idea To Chip Design? appeared first on Semiconductor Engineering.


Research Bits: May 13

13 May 2024, 09:01

On-chip microcapacitors

Scientists from Lawrence Berkeley National Laboratory and University of California Berkeley developed microcapacitors with ultrahigh energy and power density that could be used for on-chip energy storage.

The microcapacitors were made with thin films of hafnium oxide (HfO2) and zirconium oxide (ZrO2) engineered to achieve a negative capacitance effect, which increased overall capacitance and enabled it to store greater amounts of charge.

“We’ve shown that it’s possible to store a lot of energy in microcapacitors made from engineered thin films, much more than what is possible with ordinary dielectrics,” said Sayeef Salahuddin, a Berkeley Lab faculty senior scientist and UC Berkeley professor, in a release. “What’s more, we’re doing this with a material that can be processed directly on top of microprocessors.”

The films were grown with atomic layer deposition. The ratio of HfO2 and ZrO2 leads the films to be either ferroelectric or antiferroelectric. Balancing the composition at the tipping point between the two gives rise to the negative capacitance effect where the material can be very easily polarized by even a small electric field.

By interspersing atomically thin layers of aluminum oxide after every few layers of HfO2-ZrO2, they could grow the films up to 100 nm thick and integrate them into 3D trench capacitor structures. The researchers claim the microcapacitor shows 9x higher energy density and 170x higher power density compared to today’s electrostatic capacitors. They are working on scaling up the technology and integrating it into full-size microchips. [1]

Deformable micro-supercapacitor

Researchers from Pohang University of Science and Technology (POSTECH) and Korea Institute of Industrial Technology (KITECH) built a micro-supercapacitor (MSC) capable of stretching, twisting, folding, and wrinkling.

The team used laser ablation for fine patterning of both eutectic gallium-indium liquid metal (EGaIn) and graphene layers on a stretchable polystyrene-block-poly(ethylene-co-butylene)-block-polystyrene copolymer (SEBS) substrate.

The MSC retained its areal capacitance after up to 1,000 stretching cycles and operated stably while being mechanically deformed. [2]

Oriented 2D nanofillers

Researchers from the University of Houston, Jackson State University, and Howard University developed a flexible high-energy-density capacitor using layered polymers with mechanically exfoliated flakes of 2D materials as nanofillers.

By arranging materials like mica and hexagonal boron nitride (hBN) in specific layers, they created a thin sandwich-like structure with higher energy density and efficiency than capacitors with randomly blended-in nanofillers.

“Our work demonstrates the development of high energy and high-power density capacitors by blocking electrical breakdown pathways in polymeric materials using the oriented 2D nanofillers,” said Maninderjeet Singh, a University of Houston chemical engineering PhD graduate and now a postdoctoral research scientist at Columbia University, in a release. “We achieved an ultra-high energy density of approximately 75 J/cm³, the highest reported for a polymeric dielectric capacitor to date.” [3]

References

[1] Cheema, S.S., Shanker, N., Hsu, SL. et al. Giant energy storage and power density negative capacitance superlattices. Nature (2024). https://doi.org/10.1038/s41586-024-07365-5

[2] Kim, KW., Park, S.J., Park, SJ. et al. Deformable micro-supercapacitor fabricated via laser ablation patterning of Graphene/liquid metal. npj Flex Electron 8, 18 (2024). https://doi.org/10.1038/s41528-024-00306-2

[3] Singh, M., Das, P., Samanta, P. N., et al. Ultrahigh Capacitive Energy Density in Stratified 2D Nanofiller-Based Polymer Dielectric Films. ACS Nano 2023 17 (20), 20262-20272. https://doi.org/10.1021/acsnano.3c06249

The post Research Bits: May 13 appeared first on Semiconductor Engineering.


Distributing RTL Simulation Across Thousands Of Cores On 4 IPU Sockets (EPFL)

A technical paper titled “Parendi: Thousand-Way Parallel RTL Simulation” was published by researchers at EPFL.

Abstract:

“Hardware development relies on simulations, particularly cycle-accurate RTL (Register Transfer Level) simulations, which consume significant time. As single-processor performance grows only slowly, conventional, single-threaded RTL simulation is becoming less practical for increasingly complex chips and systems. A solution is parallel RTL simulation, where ideally, simulators could run on thousands of parallel cores. However, existing simulators can only exploit tens of cores.
This paper studies the challenges inherent in running parallel RTL simulation on a multi-thousand-core machine (the Graphcore IPU, a 1472-core machine). Simulation performance requires balancing three factors: synchronization, communication, and computation. We experimentally evaluate each metric and analyze how it affects parallel simulation speed, drawing on contrasts between the large-scale IPU and smaller but faster x86 systems.
Using this analysis, we build Parendi, an RTL simulator for the IPU. It distributes RTL simulation across 5888 cores on 4 IPU sockets. Parendi runs large RTL designs up to 4x faster than a powerful, state-of-the-art x86 multicore system.”

Find the technical paper here. Published March 2024 (preprint).

Emami, Mahyar, Thomas Bourgeat, and James Larus. “Parendi: Thousand-Way Parallel RTL Simulation.” arXiv preprint arXiv:2403.04714 (2024).

Related Reading
Anatomy Of A System Simulation
Balancing the benefits of a model with the costs associated with that model is tough, but it becomes even trickier when dissimilar models are combined.

The post Distributing RTL Simulation Across Thousands Of Cores On 4 IPU Sockets (EPFL) appeared first on Semiconductor Engineering.


Voltage Reference Architectures For Harsh Environments: Quantum Computing And Space

A technical paper titled “Cryo-CMOS Voltage References for the Ultrawide Temperature Range From 300 K Down to 4.2 K” was published by researchers at Delft University of Technology, QuTech, Kavli Institute of Nanoscience Delft, and École Polytechnique Fédérale de Lausanne (EPFL).

Abstract:

“This article presents a family of sub-1-V, fully-CMOS voltage references adopting MOS devices in weak inversion to achieve continuous operation from room temperature (RT) down to cryogenic temperatures. Their accuracy limitations due to curvature, body effect, and mismatch are investigated and experimentally validated. Implemented in 40-nm CMOS, the references show a line regulation better than 2.7%/V from a supply as low as 0.99 V. By applying dynamic element matching (DEM) techniques, a spread of 1.2% (3σ) from 4.2 to 300 K can be achieved, resulting in a temperature coefficient (TC) of 111 ppm/K. As the first significant statistical characterization extending down to cryogenic temperatures, the results demonstrate the ability of the proposed architectures to work under cryogenic harsh environments, such as space- and quantum-computing applications.”

Find the technical paper here. Published April 2024.

J. van Staveren et al., “Cryo-CMOS Voltage References for the Ultrawide Temperature Range From 300 K Down to 4.2 K,” in IEEE Journal of Solid-State Circuits, doi: 10.1109/JSSC.2024.3378768.

Further Reading
The Race Toward Quantum Advantage
Enormous amounts of money have been invested into quantum computing, but so far it has not surpassed conventional computers. When will that change?
Managing P/P Tradeoffs With Voltage Droop Gets Trickier
Higher current densities set against lower power envelopes makes meeting specs more challenging, especially at advanced nodes.

The post Voltage Reference Architectures For Harsh Environments: Quantum Computing And Space appeared first on Semiconductor Engineering.


Design Considerations In Photonics

1 May 2024, 09:05

Experts at the Table: Semiconductor Engineering sat down to talk about what CMOS and photonics engineers need to know to successfully collaborate, with James Pond, fellow at Ansys; Gilles Lamant, distinguished engineer at Cadence; and Mitch Heins, business development manager for photonic solutions at Synopsys. What follows are excerpts of that conversation. To view part one of this discussion, click here. Part two is here.


L-R: Ansys’s Pond, Cadence’s Lamant, Synopsys’ Heins

SE: What do engineers who have spent their careers in CMOS need to know about designing for photonics?

Lamant:  It’s hard, no illusion. I had good mentors, including both James and Mitch, so I actually did that transition. Ten years ago, I knew nearly nothing about photonics. It takes having good mentors who can help you. That’s the biggest thing. It’s not enough to just try the software on your own. In addition, having an RF background is very useful in many ways. Photonics is the multiplication of RF. In photonics, you have multiple modes. In RF, you tend to only consider one mode, but a lot of the theory behind photonics is very much a generalization of RF.

Heins: We try to make our photonics flow look as much as we can like our electronics flow. We try to take the last 30 to 40 years of learning in EDA and apply it to photonics. One thing we see a lot is that when people are coming right out of school in photonics, they don’t necessarily have a deep background in how to do IC design. There are a lot of things we’ve learned, like design rule checking, that we now take for granted. It’s like breathing. You’ve got to do it. Layout versus schematic, you’ve got to do it. Even circuit-level simulation. As CMOS veterans, you’d think, of course, you always simulate your circuit before you go to manufacture, but that’s not the case in photonics.

Lamant: Those people actually know photonics, but they don’t know how to create a system. This is a different type of challenge. People who know photonics, know how to make a device. They’re expert at that. But they have no idea how to take that device and bring it to a full system that they can sell. I see that in so many startups. It’s not to make the point for EDA software. They use free software. They use Klayout and all those things that they have access to in the university. But all of those tools are not part of the ecosystem of trying to make a system. They say, ‘We wrote a custom simulator to simulate our ring.’ But the question then is, ‘How do you simulate the driver for your ring that goes with it?’ I see many startups fail because they don’t have that ability to take it from academic thinking to production.

You have the electronics people trying to do photonics, they have some methodology background, and other things, but they have a gap in knowledge. Fortunately, they can get caught up, especially if they’re an analog designer or an RF designer. They can close that gap by talking to the right people. Unfortunately, the people who know photonics do not have the knowledge of how to make a full system out of it, and this is greatly hurting the photonics world.

Pond: I would agree. We have two worlds of engineers who have been coming together over the last decade or so. Those who came from an EDA background — electrical circuit design, especially RF — have probably had the easiest time. We’ve been doing better and better for them. Ten years ago there was nothing. Now, there’s a more traditional workflow that looks more like an EDA workflow. Still, they have a lot to learn. But the workflow, the cockpit, and so on, follows along with the EDA model.

In the other direction, maybe we haven’t done quite as good a job because people coming from a photonics background can be really thrown off by the scale and complexity of EDA tools. My impression, coming from photonics, is EDA tools have been developed over many decades. When that happens, you end up with tools that are incredibly powerful, but you wonder if they’d been developed more recently, maybe things wouldn’t be done this way. There’s a resistance on the photonic engineer side to dive into that world because there’s a lot to learn about the EDA workflows. People from photonics have to embrace and take on that EDA world, because, as Gilles says, it’s necessary, it really has to be done.

Heins:  Now, you’re seeing a ton of work going into how to apply AI to help folks bring these kinds of more complex flows under control. There’s so much to learn, but if AI can help you take care of the plumbing, if you will, you can advance much faster. We already extensively use AI for SoCs or packaged designs where you have tens and hundreds of billions of transistors. Photonics is a different vector. The signal itself is much more complex than electrical. The optimization that you have to go through is much more complex. But AI can help get a handle on that, so as we go forward, you’ll start to see these kinds of complexities simplified for people.

SE: Is there something analogous to error correction/parity checks in the photonics world?

Lamant: That can’t be analogized to photonics, because that’s about knowing the original signal and comparing it to the others. Once you have reconstituted your data, and it’s back to being a digital set of bits, then you have a parity check or different types of things that today have nothing to do with photonics because it’s the physical link. In physical links, you can do retiming or a lot of things, but the error correction happens independently, on both sides.

Heins: Tuning might be something closer to it. If my resonance frequencies are not as expected, can I detect that and then adjust for it? That happens a lot. You could think of those kinds of things as error correction.

Pond: Most of the kind of error correction we’re talking about is just using all the standard methods, whether you have an optical link or a copper link. But there are some really interesting things. We had a workflow, developed between Ansys and Cadence a few years ago on a PAM-4 system, where we did a driver simulation and the photonic link together. You look into shifting the timing of signals to compensate for different effects. If you look at the eye at different locations, it may look completely distorted and wrong, because you’re pre-compensating for an effect that’s going to come later through the photonic portion of the link. That’s one of the reasons why it’s important to be able to do the full system simulation. You can’t just independently optimize the driver electronics and the photonics. They have to be done together, so you can perform the signal correction work.

Heins: You do things like equalization. Dispersion is another one. You get different wavelengths traveling at different speeds, and we compensate for that. At the physical level, there are some corrections that do take place, depending on the kind of system you’re trying to make. If you’re in coherent systems, where path links matter, phase matters, that’s more like trying to make the circuit correct by construction, so that you don’t encounter problems.

That raises another issue, which is manufacturing variances. There, you’re back to doing lots of sensitivity analysis through Monte Carlo-type simulations, parameterized simulations, etc., where you’re trying to get a feel for the sensitivity of your device, to a shift that could occur, either through the manufacturing process or just as this system sits in its ecosystem of whatever’s around it. It’s not quite error correction, per se, but certainly trying to design for that is something we care about.

SE: Any concluding thoughts?

Lamant: There is a lot of wondering and pondering right now, but it’s also exciting. We’ve reached the point where photonics is here to stay and will be part of more and more things. Looking forward, the interesting question is where it will become part of the actual data processing. Sensing is a terrific application for photonics, but I am not totally sold on the actual data processing. I’m not even using the word “computing” here, because processing and computing are very different things. Photonics is probably never going to be doing general computing. It may be doing specialized niche, like a Fourier transform-type of processing, and it needs to be part of a system.

Heins: It comes down to two things. What will really happen with quantum computing? And will quantum computing use photonics? A lot of people are looking at photonics for quantum computing because you can do a lot more of that work at room temperature than at 4 Kelvin or something like that — not all of it, but big chunks of it. If quantum computing actually becomes more than prototypes, and photonics is a big part of that, that could shift the answer. The other big issue in compute is we don’t have memory for photonics. If someone makes a breakthrough where suddenly states can be stored in some fashion, then all bets are off and everything changes again. But at this point, I don’t see anything promising.

One of the biggest challenges we have going forward for the whole ecosystem, in general, is lack of standards in this space, which makes interoperability between tools from our companies very difficult. The signal in photonics is very complex. It’s actually complex math, with real and imaginary parts. There are a lot of extra things that we have to take into account, and a lot of times we don’t even have common nomenclature or agreement on metrics and how to measure things. This is going to take time, but it’s being pushed by customers driving us to work together. For example, chiplets are great for photonics because a photonic IC is a chiplet. But all of a sudden, now you’re in a mixed domain, multi-physics type of environment, and there are some huge challenges to make that all work together. We have a pretty good handle on system functional verification, design-for-test, and all these things in the electronic IC world. In photonics, we’ve got a lot of work to do.

Pond: For me, it’s been exciting. I’ve been doing this for more than 20 years. In 2022, when I saw the first product with fibers actually coming out of the package, that was the dream from 20 years back. It took a lot of effort to get there. Things have been maturing very fast, especially in the last decade. That’s really promising from an EDA/EPDA-type of workflow perspective. The datacoms, as we’ve all said, are proven and not going to go away, given the investment from foundries, which is going to continue and even accelerate. It’s exciting times for all these other applications, from sensing to quantum and so on. There’s a lot of innovation possible. It’s not clear what’s going to be a winner yet and what’s not, but it’s a great time to be in photonics.

Read parts one and two of the discussion:
Photonics: The Former And Future Solution
Twenty-five years ago, photonics was supposed to be the future of high technology. Has that future finally arrived?
The Challenges Of Working With Photonics
From curvilinear designs to thermal vulnerabilities, what engineers need to know about the advantages and disadvantages of photonics.

The post Design Considerations In Photonics appeared first on Semiconductor Engineering.

Merging Power and Arithmetic Optimization Via Datapath Rewriting (Intel, Imperial College London)

A new technical paper titled “Combining Power and Arithmetic Optimization via Datapath Rewriting” was published by researchers at Intel Corporation and Imperial College London.

Abstract:
“Industrial datapath designers consider dynamic power consumption to be a key metric. Arithmetic circuits contribute a major component of total chip power consumption and are therefore a common target for power optimization. While arithmetic circuit area and dynamic power consumption are often correlated, there is also a tradeoff to consider, as additional gates can be added to explicitly reduce arithmetic circuit activity and hence reduce power consumption. In this work, we consider two forms of power optimization and their interaction: circuit area reduction via arithmetic optimization, and the elimination of redundant computations using both data and clock gating. By encoding both these classes of optimization as local rewrites of expressions, our tool flow can simultaneously explore them, uncovering new opportunities for power saving through arithmetic rewrites using the e-graph data structure. Since power consumption is highly dependent upon the workload performed by the circuit, our tool flow facilitates a data dependent design paradigm, where an implementation is automatically tailored to particular contexts of data activity. We develop an automated RTL to RTL optimization framework, ROVER, that takes circuit input stimuli and generates power-efficient architectures. We evaluate the effectiveness on both open-source arithmetic benchmarks and benchmarks derived from Intel production examples. The tool is able to reduce the total power consumption by up to 33.9%.”

Find the technical paper here. Published April 2024.

Samuel Coward, Theo Drane, Emiliano Morini, George Constantinides; arXiv:2404.12336v1.

The post Merging Power and Arithmetic Optimization Via Datapath Rewriting (Intel, Imperial College London) appeared first on Semiconductor Engineering.

Inside Google’s Secret Lab: How Google Pixel Phone Cameras Are Tested

22 April 2024, 13:12
Google Pixel Secret Camera Lab

Google Pixel devices are best known for their camera capabilities. Take a look at DxOMark’s smartphone camera rankings. You will notice that the latest device ...

The post Inside Google’s Secret Lab: How Google Pixel Phone Cameras Are Tested appeared first on Gizchina.com.


GDC 2024: We reveal incredible Work Graphs perf, AMD FSR 3.1, GI with Brixelizer, and so much more

22 March 2024, 17:00

AMD GPUOpen - Graphics and game developer resources

Learn about our GDC 2024 activities, including AMD FSR 3.1, AMD FidelityFX Brixelizer, work graphs, mesh shaders, tools, CPU, and more.

The post GDC 2024: We reveal incredible Work Graphs perf, AMD FSR 3.1, GI with Brixelizer, and so much more appeared first on AMD GPUOpen.


Ultrathin vdW Ferromagnet at Room Temperature (MIT)

A technical paper titled “Current-induced switching of a van der Waals ferromagnet at room temperature” was published by researchers at Massachusetts Institute of Technology (MIT).

Abstract:

“Recent discovery of emergent magnetism in van der Waals magnetic materials (vdWMM) has broadened the material space for developing spintronic devices for energy-efficient computation. While there has been appreciable progress in vdWMM discovery, a solution for non-volatile, deterministic switching of vdWMMs at room temperature has been missing, limiting the prospects of their adoption into commercial spintronic devices. Here, we report the first demonstration of current-controlled non-volatile, deterministic magnetization switching in a vdW magnetic material at room temperature. We have achieved spin-orbit torque (SOT) switching of the PMA vdW ferromagnet Fe3GaTe2 using a Pt spin-Hall layer up to 320 K, with a threshold switching current density as low as Jsw = 1.69 × 10⁶ A cm⁻² at room temperature. We have also quantitatively estimated the anti-damping-like SOT efficiency of our Fe3GaTe2/Pt bilayer system to be ξDL = 0.093, using the second harmonic Hall voltage measurement technique. These results mark a crucial step in making vdW magnetic materials a viable choice for the development of scalable, energy-efficient spintronic devices.”

Find the technical paper here. Published February 2024. MIT’s related news article and video is here.

Kajale, S.N., Nguyen, T., Chao, C.A. et al. Current-induced switching of a van der Waals ferromagnet at room temperature. Nat Commun 15, 1485 (2024). https://doi.org/10.1038/s41467-024-45586-4


The post Ultrathin vdW Ferromagnet at Room Temperature (MIT) appeared first on Semiconductor Engineering.

Google Chrome begins testing shared dictionary compression technology

By: Efe Udin
8 March 2024, 20:41
Chrome extensions

Google Chrome, a leading web browser, is at the forefront of testing innovative technologies to enhance web performance. One such advancement is the implementation of ...

The post Google Chrome begins testing shared dictionary compression technology appeared first on Gizchina.com.


CPU profiling for Unity

By: GPUOpen
5 January 2024, 16:56

AMD GPUOpen - Graphics and game developer resources

This is a general guide focusing on CPU profiling for Unity, including which tools are useful for profiling and how to use these tools to find hotspots in your code.

The post CPU profiling for Unity appeared first on AMD GPUOpen.

  • ✇AMD GPUOpen
  • Unreal Engine performance guideGPUOpen
    AMD GPUOpen - Graphics and game developer resources Our one-stop guide to performance with Unreal Engine. The post Unreal Engine performance guide appeared first on AMD GPUOpen.
     

Rapid Exchange Cooling With Trapped Ions For Implementation In A Quantum Charge-Coupled Device

A technical paper titled “Rapid exchange cooling with trapped ions” was published by researchers at Georgia Tech Research Institute.

Abstract:

“The trapped-ion quantum charge-coupled device (QCCD) architecture is a leading candidate for advanced quantum information processing. In current QCCD implementations, imperfect ion transport and anomalous heating can excite ion motion during a calculation. To counteract this, intermediate cooling is necessary to maintain high-fidelity gate performance. Cooling the computational ions sympathetically with ions of another species, a commonly employed strategy, creates a significant runtime bottleneck. Here, we demonstrate a different approach we call exchange cooling. Unlike sympathetic cooling, exchange cooling does not require trapping two different atomic species. The protocol introduces a bank of “coolant” ions which are repeatedly laser cooled. A computational ion can then be cooled by transporting a coolant ion into its proximity. We test this concept experimentally with two 40Ca+ ions, executing the necessary transport in 107 μs, an order of magnitude faster than typical sympathetic cooling durations. We remove over 96%, and as many as 102(5) quanta, of axial motional energy from the computational ion. We verify that re-cooling the coolant ion does not decohere the computational ion. This approach validates the feasibility of a single-species QCCD processor, capable of fast quantum simulation and computation.”
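
For a rough feel of the energy scale involved, the sketch below converts the quoted 102 quanta of axial motion into joules and an equivalent temperature, assuming a hypothetical axial trap frequency that is not taken from the paper:

```python
import math

# Illustrative only: energy scale of the quoted "102 quanta" of axial motion.
# The axial trap frequency below is an ASSUMED value, not taken from the paper.

HBAR = 1.054571817e-34   # reduced Planck constant, J*s
KB = 1.380649e-23        # Boltzmann constant, J/K
F_AXIAL_HZ = 1.0e6       # assumed axial secular frequency (hypothetical)

quanta_removed = 102
energy_j = quanta_removed * HBAR * 2 * math.pi * F_AXIAL_HZ

print(f"{energy_j:.2e} J removed, roughly {energy_j / KB * 1e3:.1f} mK of axial motional energy")
```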

Find the technical paper here. Published February 2024. A related news release, including a video, can be found here.

Fallek, S.D., Sandhu, V.S., McGill, R.A. et al. Rapid exchange cooling with trapped ions. Nat Commun 15, 1089 (2024). https://doi.org/10.1038/s41467-024-45232-z

Related Reading
The Race Toward Quantum Advantage
Enormous amounts of money have been invested into quantum computing, but so far it has not surpassed conventional computers. When will that change?

The post Rapid Exchange Cooling With Trapped Ions For Implementation In A Quantum Charge-Coupled Device appeared first on Semiconductor Engineering.

  • ✇Semiconductor Engineering
  • Ultra-Low Power CiM Design For Practical Edge ScenariosTechnical Paper Link
    A technical paper titled “Low Power and Temperature-Resilient Compute-In-Memory Based on Subthreshold-FeFET” was published by researchers at Zhejiang University, University of Notre Dame, Technical University of Munich, Munich Institute of Robotics and Machine Intelligence, and the Laboratory of Collaborative Sensing and Autonomous Unmanned Systems of Zhejiang Province. Abstract: “Compute-in-memory (CiM) is a promising solution for addressing the challenges of artificial intelligence (AI) and th
     

Ultra-Low Power CiM Design For Practical Edge Scenarios

A technical paper titled “Low Power and Temperature-Resilient Compute-In-Memory Based on Subthreshold-FeFET” was published by researchers at Zhejiang University, University of Notre Dame, Technical University of Munich, Munich Institute of Robotics and Machine Intelligence, and the Laboratory of Collaborative Sensing and Autonomous Unmanned Systems of Zhejiang Province.

Abstract:

“Compute-in-memory (CiM) is a promising solution for addressing the challenges of artificial intelligence (AI) and the Internet of Things (IoT) hardware such as the ‘memory wall’ issue. Specifically, CiM employing nonvolatile memory (NVM) devices in a crossbar structure can efficiently accelerate multiply-accumulation (MAC) computation, a crucial operator in neural networks among various AI models. Low power CiM designs are thus highly desired for further energy efficiency optimization on AI models. Ferroelectric FET (FeFET), an emerging device, is attractive for building ultra-low power CiM array due to CMOS compatibility, high ION/IOFF ratio, etc. Recent studies have explored FeFET based CiM designs that achieve low power consumption. Nevertheless, subthreshold-operated FeFETs, where the operating voltages are scaled down to the subthreshold region to reduce array power consumption, are particularly vulnerable to temperature drift, leading to accuracy degradation. To address this challenge, we propose a temperature-resilient 2T-1FeFET CiM design that performs MAC operations reliably in the subthreshold region from 0 to 85 Celsius, while consuming ultra-low power. Benchmarked against the VGG neural network architecture running the CIFAR-10 dataset, the proposed 2T-1FeFET CiM design achieves 89.45% CIFAR-10 test accuracy. Compared to previous FeFET based CiM designs, it exhibits immunity to temperature drift at an 8-bit wordlength scale, and achieves better energy efficiency with 2866 TOPS/W.”
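
The multiply-accumulate operation that such a crossbar accelerates can be sketched conceptually as below; this is a minimal illustration of the compute-in-memory idea with made-up values, not the paper's 2T-1FeFET design:

```python
import numpy as np

# Conceptual sketch of the crossbar MAC that CiM arrays accelerate: each bitline
# sums cell currents I = G * V, so the column current is a dot product of the
# input voltages (activations) with the stored conductances (weights).
# All values here are made up for illustration; this is not the 2T-1FeFET circuit.

rng = np.random.default_rng(0)
G = rng.uniform(0.0, 1.0, size=(64, 16))   # stored conductances: 64 wordlines x 16 bitlines
V = rng.uniform(0.0, 0.2, size=64)         # input voltages applied on the wordlines

I_bitline = V @ G                          # Kirchhoff current summation per column = MAC result
print(I_bitline.shape, I_bitline[:4])

# For scale: the reported 2866 TOPS/W corresponds to roughly
# 1 / 2866e12 ≈ 3.5e-16 J, i.e. about 0.35 fJ per operation.
```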

Find the technical paper here. Published January 2024 (preprint).

Zhou, Yifei, Xuchu Huang, Jianyi Yang, Kai Ni, Hussam Amrouch, Cheng Zhuo, and Xunzhao Yin. “Low Power and Temperature-Resilient Compute-In-Memory Based on Subthreshold-FeFET.” arXiv preprint arXiv:2312.17442 (2023).

Related Reading
Increasing AI Energy Efficiency With Compute In Memory
How to process zettascale workloads and stay within a fixed power budget.
Modeling Compute In Memory With Biological Efficiency
Generative AI forces chipmakers to use compute resources more intelligently.

The post Ultra-Low Power CiM Design For Practical Edge Scenarios appeared first on Semiconductor Engineering.

  • ✇Semiconductor Engineering
  • Research Bits: Feb. 19Jesse Allen
    DNA assembly of 3D nanomaterials Scientists from Brookhaven National Laboratory, Columbia University, and Stony Brook University developed a method that uses DNA to instruct molecules to organize themselves into targeted 3D patterns and produce a wide variety of designed metallic and semiconductor 3D nanostructures. “We have been using DNA to program nanoscale materials for more than a decade,” said corresponding author Oleg Gang, a professor of chemical engineering and of applied physics and ma
     

Research Bits: Feb. 19

19 February 2024 at 09:01

DNA assembly of 3D nanomaterials

Scientists from Brookhaven National Laboratory, Columbia University, and Stony Brook University developed a method that uses DNA to instruct molecules to organize themselves into targeted 3D patterns and produce a wide variety of designed metallic and semiconductor 3D nanostructures.

“We have been using DNA to program nanoscale materials for more than a decade,” said corresponding author Oleg Gang, a professor of chemical engineering and of applied physics and materials science at Columbia Engineering, in a release. “Now, by building on previous achievements, we have developed a method for converting these DNA-based structures into many types of functional inorganic 3D nano-architectures, and this opens tremendous opportunities for 3D nanoscale manufacturing.”

Researchers program strands of DNA to “direct” the self-assembly process towards molecular arrangements that give rise to properties such as electrical conductivity, photosensitivity, and magnetism, which can then be scaled up to functional materials.

The team used the method to grow silica on a DNA lattice, which helped to create a robust structure. They then used vapor-phase infiltration and liquid-phase infiltration, which bonds a precursor chemical in vapor or liquid form to a nanoscale lattice, to produce a variety of 3D metallic structures.

Image: Electron microscope image of one of the 3D metallic and semiconductor nanostructures created with the new method; the scale bar represents one micrometer. Superimposed graphics indicate the combination of techniques used to layer silicon dioxide, then alumina-doped zinc oxide, and finally platinum on top of a DNA “scaffolding.” (Credit: Brookhaven National Laboratory)

“Stacking these techniques showed much more depth of control than has ever been accomplished before,” said Aaron Michelson, a postdoctoral researcher at Brookhaven’s Center for Functional Nanomaterials, in a release. “Whatever vapors are available as precursors for vapor-phase infiltration can be coupled with various metal salts compatible with liquid-phase infiltration to create more complex structures. For example, we were able to combine platinum, aluminum, and zinc on top of one nanostructure.”

They were also able to add on semiconducting metal oxides, such as zinc oxide, to an insulating nanostructure, providing it with electrical conductivity and photoluminescent properties. [1]

Mott insulator transistor

Researchers from the University of Nebraska-Lincoln, Brookhaven National Laboratory, University of the Basque Country, and NYU Shanghai propose a way to make transistors out of Mott insulators.

The researchers were able to drive the Mott transition from insulator to metal and back again by topping a Mott insulator with a gate insulator made of a ferroelectric material and using a voltage to flip the ferroelectric’s polarization. Adding a third layer beneath the Mott channel, into which charge can migrate from the Mott layer, improved control over the insulator-metal transition, yielding an on/off ratio of 385.

Additionally, the researchers claim that the Mott-ferroelectric pairing is more energy-efficient than other non-volatile but magnetism-based memory, including MRAM.

“We can have very high-performance devices, retaining many manufacturing processes of conventional semiconductors and overcoming some fundamental limitations of them,” said Xia Hong, professor of physics at the University of Nebraska-Lincoln, in a release. “I think it’s ready. It’s really competitive with other non-volatile memory technologies.” [2]

Faster wireless data speeds

Researchers from Osaka University and IMRA America suggest a way to increase wireless data transmission speeds by reducing the noise in the system using lasers.

Future 6G transmitters and receivers are expected to use the sub-terahertz band, which extends from 100 GHz to 300 GHz, using an approach called “multi-level signal modulation” to further increase the data transmission rate. However, this approach is highly sensitive to noise at the upper end of the frequency range.
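
A toy numerical sketch, not the authors' system and with an assumed phase-noise level, shows why dense multi-level formats such as 16-QAM are sensitive to phase error:

```python
import numpy as np

# Toy illustration only: dense constellations such as 16-QAM leave little margin
# for phase error, so oscillator phase noise directly translates into symbol
# errors. The RMS phase error below is an assumed value.

rng = np.random.default_rng(1)
levels = np.array([-3, -1, 1, 3])
constellation = np.array([complex(i, q) for i in levels for q in levels])  # 16-QAM

symbols = rng.choice(constellation, size=10_000)
phase_noise_rms = 0.15  # radians RMS, hypothetical
received = symbols * np.exp(1j * rng.normal(0.0, phase_noise_rms, symbols.size))

# Nearest-neighbor decision back onto the ideal constellation.
decisions = constellation[np.argmin(np.abs(received[:, None] - constellation[None, :]), axis=1)]
print("symbol error rate:", np.mean(decisions != symbols))
```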

“This problem has limited 300-GHz communications so far,” said Keisuke Maekawa of Osaka University in a statement. “However, we found that at high frequencies, a signal generator based on a photonic device had much less phase noise than a conventional electrical signal generator.”

The team used a stimulated Brillouin scattering laser, which employs interactions between sound and light waves, to generate a precise signal. They then set up a 300 GHz-band wireless communication system that employs the laser-based signal generator in both the transmitter and receiver. The system also used on-line digital signal processing (DSP) to demodulate the signals in the receiver and increase the data rate.

“Our team achieved a single-channel transmission rate of 240 gigabits per second,” said Tadao Nagatsuma, a professor at Osaka University, in a release. “This is the highest transmission rate obtained so far in the world using on-line DSP.” The researchers expect that with multiplexing techniques and more sensitive receivers, the data rate can be increased to 1 terabit per second. [3]

References

[1] Aaron Michelson et al., Three-dimensional nanoscale metal, metal oxide, and semiconductor frameworks through DNA-programmable assembly and templating. Sci. Adv. 10, eadl0604 (2024). https://doi.org/10.1126/sciadv.adl0604

[2] Hao, Y., Chen, X., Zhang, L. et al. Record high room temperature resistance switching in ferroelectric-gated Mott transistors unlocked by interfacial charge engineering. Nat Commun 14, 8247 (2023). https://doi.org/10.1038/s41467-023-44036-x

[3] Keisuke Maekawa, Tomoya Nakashita, Toki Yoshioka, Takashi Hori, Antoine Rolland, Tadao Nagatsuma, Single-channel 240-Gbit/s sub-THz wireless communications using ultra-low phase noise receiver, IEICE Electronics Express, Article ID 20.20230584, Advance online publication December 25, 2023, Online ISSN 1349-2543, https://doi.org/10.1587/elex.20.20230584

The post Research Bits: Feb. 19 appeared first on Semiconductor Engineering.
