Semiconductor Engineering

Comparing Thermal Properties In Molybdenum Substrate To Si And Glass For A System-On-Foil Integration (RIT, Lux)

By: Technical Paper Link

May 31, 2024 at 18:39

A technical paper titled “Comparative Analysis of Thermal Properties in Molybdenum Substrate to Silicon and Glass for a System-on-Foil Integration” was published by researchers at Rochester Institute of Technology and Lux Semiconductors.

Abstract:

“Advanced electronics technology is moving towards smaller footprints and higher computational power. In order to achieve this, advanced packaging techniques are currently being considered, including organic, glass, and semiconductor-based substrates that allow for 2.5D or 3D integration of chips and devices. Metal-core substrates are a new alternative with similar properties to those of semiconductor-based substrates but with the added benefits of higher flexibility and metal ductility. This work comprehensively compares the thermal properties of a novel metal-based substrate, molybdenum, and silicon and fused silica glass substrates in the context of system-on-foil (SoF) integration. A simple electronic technique is used to simulate the heat generated by a typical CPU and to measure the heat dissipation properties of the substrates. The results indicate that molybdenum and silicon are able to effectively dissipate a continuous power density of 2.3 W/mm² as the surface temperature only increases by ~15°C. In contrast, the surface temperature of fused silica glass substrates increases by >140°C for the same applied power. These simple techniques and measurements were validated with infrared camera measurements as well as through finite element analysis via COMSOL simulation. The results validate the use of molybdenum as an advanced packaging substrate and can be used to characterize new substrates and approaches for advanced packaging.”
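As a rough aid to reading the figures in the abstract, the temperature rises can be normalized by the applied power density. The sketch below is only back-of-the-envelope arithmetic on the quoted numbers, not the paper's measurement or simulation method. It treats "~15°C" and ">140°C" as point values, which is an assumption, and the function name is made up for illustration.

```python
# Figures quoted in the abstract. The "~" and ">" qualifiers are dropped here,
# so the fused silica result should be read as a lower bound.
POWER_DENSITY_W_PER_MM2 = 2.3

TEMP_RISE_C = {
    "molybdenum": 15.0,
    "silicon": 15.0,
    "fused silica": 140.0,
}


def area_normalized_rise(delta_t_c: float, power_density: float) -> float:
    """Temperature rise per unit power density, in degC*mm^2/W."""
    return delta_t_c / power_density


for name, delta_t in TEMP_RISE_C.items():
    value = area_normalized_rise(delta_t, POWER_DENSITY_W_PER_MM2)
    print(f"{name}: {value:.1f} degC*mm^2/W")
```

On these numbers, molybdenum and silicon come out at roughly 6.5 °C·mm²/W, while fused silica is at least about 60.9 °C·mm²/W, a gap of more than nine times.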

Find the technical paper here. Published May 2024.

Huang, Tzu-Jung, Tobias Kiebala, Paul Suflita, Chad Moore, Graeme Housser, Shane McMahon, and Ivan Puchades. 2024. “Comparative Analysis of Thermal Properties in Molybdenum Substrate to Silicon and Glass for a System-on-Foil Integration” Electronics 13, no. 10: 1818. https://doi.org/10.3390/electronics13101818

Related Reading
The Race To Glass Substrates
Replacing silicon and organic substrates requires huge shifts in manufacturing, creating challenges that will take years to iron out.

The post Comparing Thermal Properties In Molybdenum Substrate To Si And Glass For A System-On-Foil Integration (RIT, Lux) appeared first on Semiconductor Engineering.


Using Formal Verification To Evaluate The HW Reliability Of A RISC-V Ibex Core In The Presence Of Soft Errors

By: Technical Paper Link

May 31, 2024 at 18:33

A technical paper titled “Using Formal Verification to Evaluate Single Event Upsets in a RISC-V Core” was published by researchers at University of Southampton.

Abstract:

“Reliability has been a major concern in embedded systems. Higher transistor density and lower voltage supply increase the vulnerability of embedded systems to soft errors. A Single Event Upset (SEU), which is also called a soft error, can reverse a bit in a sequential element, resulting in a system failure. Simulation-based fault injection has been widely used to evaluate reliability, as suggested by ISO26262. However, it is practically impossible to test all faults for a complex design. Random fault injection is a compromise that reduces accuracy and fault coverage. Formal verification is an alternative approach. In this paper, we use formal verification, in the form of model checking, to evaluate the hardware reliability of a RISC-V Ibex Core in the presence of soft errors. Backward tracing is performed to identify and categorize faults according to their effects (no effect, Silent Data Corruption, crashes, and hangs). By using formal verification, the entire state space and fault list can be exhaustively explored. It is found that misaligned instructions can amplify fault effects. It is also found that some bits are more vulnerable to SEUs than others. In general, most of the bits in the Ibex Core are vulnerable to Silent Data Corruption, and the second pipeline stage is more vulnerable to Silent Data Corruption than the first.”
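To make the idea of an exhaustively explored fault list concrete, here is a minimal sketch that flips every bit of a toy register for every possible value and classifies the outcome. It is plain brute-force enumeration on a hypothetical 8-bit design, not the model checking used in the paper and not a model of the Ibex core. The names `toy_output`, `classify_flip`, and `exhaustive_campaign` are invented for illustration, and only two of the paper's four categories are represented.

```python
from collections import Counter

WIDTH = 8  # bits in the toy register (hypothetical design, not Ibex)


def toy_output(reg: int) -> int:
    """Toy datapath: only the low nibble of the register reaches the output."""
    return (reg & 0x0F) ^ 0x5


def classify_flip(reg: int, bit: int) -> str:
    """Compare the golden and faulty outputs for one single-bit upset."""
    faulty = reg ^ (1 << bit)
    if toy_output(faulty) == toy_output(reg):
        return "no effect"
    return "silent data corruption"


def exhaustive_campaign() -> Counter:
    """Enumerate every register value and every bit position."""
    results = Counter()
    for reg in range(1 << WIDTH):
        for bit in range(WIDTH):
            results[classify_flip(reg, bit)] += 1
    return results


if __name__ == "__main__":
    for category, count in sorted(exhaustive_campaign().items()):
        print(f"{category}: {count}")
```

Even this toy shows why some bits are more vulnerable than others: flips in the masked upper nibble never reach the output, while every flip in the lower nibble does. A real core has far too many states for this kind of enumeration, which is the gap the abstract says formal methods are meant to close.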

Find the technical paper here. Published May 2024 (preprint).

Xue, Bing, and Mark Zwolinski. “Using Formal Verification to Evaluate Single Event Upsets in a RISC-V Core.” arXiv preprint arXiv:2405.12089 (2024).

Related Reading
Formal Verification’s Usefulness Widens
Demand for IC reliability pushes formal into new applications, where complex interactions and security risks are difficult to solve with other tools.
RISC-V Micro-Architectural Verification
Verifying a processor is much more than making sure the instructions work, but the industry is building from a limited knowledge base and few dedicated tools.



CAM-Based CMOS Implementation Of Reference Frames For Neuromorphic Processors (Carnegie Mellon U.)

By: Technical Paper Link

May 31, 2024 at 18:29

A technical paper titled “NeRTCAM: CAM-Based CMOS Implementation of Reference Frames for Neuromorphic Processors” was published by researchers at Carnegie Mellon University.

Abstract:

“Neuromorphic architectures mimicking biological neural networks have been proposed as a much more efficient alternative to conventional von Neumann architectures for the exploding compute demands of AI workloads. Recent neuroscience theory on intelligence suggests that Cortical Columns (CCs) are the fundamental compute units in the neocortex and intelligence arises from CC’s ability to store, predict and infer information via structured Reference Frames (RFs). Based on this theory, recent works have demonstrated brain-like visual object recognition using software simulation. Our work is the first attempt towards direct CMOS implementation of Reference Frames for building CC-based neuromorphic processors. We propose NeRTCAM (Neuromorphic Reverse Ternary Content Addressable Memory), a CAM-based building block that supports the key operations (store, predict, infer) required to perform inference using RFs. NeRTCAM architecture is presented in detail including its key components. All designs are implemented in SystemVerilog and synthesized in 7nm CMOS, and hardware complexity scaling is evaluated for varying storage sizes. NeRTCAM system for biologically motivated MNIST inference with a storage size of 1024 entries incurs just 0.15 mm^2 area, 400 mW power and 9.18 us critical path latency, demonstrating the feasibility of direct CMOS implementation of CAM-based Reference Frames.”
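For readers unfamiliar with content-addressable memory, the sketch below shows a conventional ternary CAM lookup in software: each stored entry carries a mask of "don't care" bits, and a key matches when it agrees with the entry on every bit that is cared about. This is a generic illustration only. It does not model the "reverse ternary" behavior or the store, predict, and infer operations of NeRTCAM, which the abstract does not detail, and the entry layout and function name are assumptions.

```python
from typing import List, Optional, Tuple

# Each entry is (value, care_mask). Bits where care_mask is 0 are "don't care".
Entry = Tuple[int, int]


def tcam_lookup(entries: List[Entry], key: int) -> Optional[int]:
    """Return the index of the first entry that matches key, or None.

    Hardware compares all entries in parallel; the loop order here stands in
    for a priority encoder that picks the lowest matching index.
    """
    for index, (value, care_mask) in enumerate(entries):
        if (key ^ value) & care_mask == 0:
            return index
    return None


entries = [
    (0b1010, 0b1111),  # exact match on 1010
    (0b1000, 0b1100),  # matches 10xx
    (0b0000, 0b0000),  # matches any key
]

if __name__ == "__main__":
    for key in (0b1010, 0b1011, 0b0111):
        print(f"{key:04b} -> entry {tcam_lookup(entries, key)}")
```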

Find the technical paper here. Published May 2024 (preprint).

Nair, Harideep, William Leyman, Agastya Sampath, Quinn Jacobson, and John Paul Shen. “NeRTCAM: CAM-Based CMOS Implementation of Reference Frames for Neuromorphic Processors.” arXiv preprint arXiv:2405.11844 (2024).

Related Reading
Running More Efficient AI/ML Code With Neuromorphic Engines
Once a buzzword, neuromorphic engineering is gaining traction in the semiconductor industry.



Chip Industry Week In Review

By: The SE Staff

May 31, 2024 at 09:01

JEDEC and the Open Compute Project rolled out a new set of guidelines for standardizing chiplet characterization details, such as thermal properties, physical and mechanical requirements, and behavior specs. Those details have been a sticking point for commercial chiplets, because without them it’s not possible to choose the best chiplet for a particular application or workload. The guidelines are a prerequisite for a multi-vendor chiplet marketplace.

AMD, Broadcom, Cisco, Google, HPE, Intel, Meta, and Microsoft proposed a new high-speed, low-latency interconnect specification, Ultra Accelerator Link (UALink), between accelerators and switches in AI computing pods. The 1.0 specification will enable the connection of up to 1,024 accelerators within a pod and allow for direct loads and stores between the memory attached to accelerators.

Arm debuted a range of new CPUs, including the Cortex-X925 for on-device generative AI, and the Cortex-A725 with improved efficiency for AI and mobile gaming. It also announced the Immortalis-G925 GPU for flagship smartphones, and the Mali-G725/625 GPUs for consumer devices. Additionally, Arm announced Compute Subsystems (CSS) for Client to provide foundational computing elements for AI smartphone and PC SoCs, and it introduced KleidiAI, a set of compute kernels for developers of AI frameworks. The Armv9-A architecture also added support for the Scalable Matrix Extension to accelerate AI workloads.

TSMC said its 2nm process is on target to begin mass production in 2025. Meanwhile, Samsung is expected to release its 1nm plan next month, targeting mass production for 2026 — a year ahead of schedule, reports Business Korea.

CHIPS for America and NATCAST released a 2024 roadmap for the U.S. National Semiconductor Technology Center (NSTC), identifying priorities for facilities, research, workforce development, and membership.

China is investing CNY 344 billion (~$47.5 billion) into the third phase of its National Integrated Circuit Industry Investment Fund, also known as the Big Fund, to support its semiconductor sector and supply chain, according to numerous reports.

Malaysia plans to invest $5.3 billion in seed capital and support for semiconductor manufacturing in an effort to attract more than $100 billion in foreign investments, reports Reuters. Prime Minister Anwar Ibrahim announced the effort to create at least 10 companies focused on IC design, advanced packaging, and equipment manufacturing.

imec demonstrated a die-to-wafer hybrid bonding flow for Cu-Cu and SiCN-SiCN at pitches down to 2µm at the IEEE’s ECTC conference. This breakthrough could enable die and wafer-level optical interconnects.

The chip industry is racing to develop glass for advanced packaging, setting the stage for one of the biggest shifts in chip materials in decades — and one that will introduce a broad new set of challenges that will take years to fully resolve.

Quick links to more news:

In-Depth
Global
Product News
Markets and Money
Security
Research and Training
Quantum
Events and Further Reading

In-Depth

Semiconductor Engineering published its Systems & Design newsletter featuring these top stories:

RISC-V Heralds New Era Of Cooperation
AI For Data Management
Trouble Ahead For IC Verification

Global

STMicroelectronics is building a fully integrated SiC facility in Catania, Italy. The high-volume 200mm facility is projected to cost over $5 billion.

Siliconware Precision Industries Co. Ltd. (SPIL) broke ground on an RM 6 billion (~$1.3 billion) advanced packaging and testing facility in Malaysia. Also, Google will invest $2 billion in Malaysia for its first data center, and a Google Cloud hub to meet growing demand for cloud services and AI literacy programs, reports AP.

In an SEC filing, Applied Materials disclosed that it received additional subpoenas from the U.S. Department of Commerce’s (DoC) Bureau of Industry and Security related to shipments of advanced semiconductor equipment to China. This comes on the heels of similar subpoenas issued last year.

A Chinese contractor working for SK hynix was arrested in South Korea and is being charged with funneling more than 3,000 copies of a paper on solving process failure issues to Huawei, reports South Korea’s Union News.

VSORA, CEA-Grenoble, and Valeo were awarded $7 million from the French government to build low-latency, low-power AI inference co-processors for autonomous driving and other applications.

In the U.S., the National Highway Traffic Safety Administration (NHTSA) is investigating unexpected driving behaviors of vehicles equipped with Waymo’s 5th Generation automated driving system (ADS), with details of nine new incidents on top of the first 22.

Product News

ASE introduced powerSIP, a power delivery platform designed to reduce signal and transmission loss while addressing current density challenges.

Infineon announced a roadmap for energy-efficient power supply units based on Si, SiC, and GaN to address the energy needs of AI data centers, featuring new 8 kW and 12 kW PSUs, in addition to the 3 kW and 3.3 kW units available today. The company also released its CoolSiC MOSFET 400 V family, specially developed for use in the AC/DC stage of AI servers, complementing the PSU roadmap.

Fig. 1: Infineon’s 8kW PSU. Source: Infineon

Infineon also introduced two new generations of high voltage (HV) and medium voltage (MV) CoolGaN™ devices, enabling customers to use GaN in voltage classes from 40 V to 700 V. The devices are built using Infineon’s 8-inch foundry processes.

Ansys launched Ansys Access on Microsoft Azure to provide pre-configured simulation products optimized for HPC on Azure infrastructure.

Foxconn Industrial Internet used Keysight Technologies’ Open RAN Studio solution to certify an outdoor Open Radio Unit (O-RU).

Andes Technology announced an SoC and development board for the development and porting of large RISC-V applications.

MediaTek uncorked a pair of mobile chipsets built on a 4nm process that use an octa-core CPU consisting of 4X Arm Cortex-A78 cores operating at up to 2.5GHz paired with 4X Arm Cortex-A55 cores.

The NVIDIA H200 Blackwell platform is expected to begin shipping in Q3 of 2024 and will be available to data centers by Q4, according to TrendForce.

A room-temperature direct fusion hybrid bonding system from Be Semiconductor has shipped to the NHanced advanced packaging facility in North Carolina. The new system offers faster throughput for copper interconnects with submicron pad sizes, greater accuracy and reduced warpage.

Markets and Money

Frore Systems raised $80 million for its solid-state active cooling module, which removes heat from the top of a chip without fans. The device can be used in systems ranging from notebooks and network edge gateways to data centers.

Axus Technology received $12.5 million in capital equity funding to make its chemical mechanical planarization (CMP) equipment for semiconductor wafer polishing, thinning, and cleaning, including silicon carbide (SiC) wafers.

Elon Musk’s xAI announced a series B funding round of $6 billion.

Micron was ordered to pay $445 million in damages to Netlist for patent infringement of the company’s DDR4 memory module technology between 2021 and 2024.

Global revenue from AI semiconductors is predicted to total $71 billion in 2024, up 33% from 2023, according to Gartner. In 2025, it is expected to jump to $91.9 billion. The value of AI accelerators used in servers is expected to total $21 billion in 2024 and reach $33 billion by 2028.
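The Gartner figures above imply a few numbers that are not stated outright. The sketch below works them out from the quoted values only. The helper names are invented for illustration, and the compound annual growth rate assumes smooth growth between 2024 and 2028, which the forecast itself does not claim.

```python
def implied_prior(value: float, pct: float) -> float:
    """Value one period earlier, given the current value and growth in percent."""
    return value / (1 + pct / 100)


def growth_pct(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new / old - 1) * 100


def cagr_pct(start: float, end: float, years: int) -> float:
    """Compound annual growth rate in percent over the given number of years."""
    return ((end / start) ** (1 / years) - 1) * 100


# All inputs are in billions of U.S. dollars, as quoted in the text.
print(f"Implied 2023 AI chip revenue: ${implied_prior(71.0, 33.0):.1f}B")
print(f"Implied 2024-2025 growth: {growth_pct(71.0, 91.9):.1f}%")
print(f"Server accelerator CAGR, 2024-2028: {cagr_pct(21.0, 33.0, 4):.1f}%")
```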

NAND flash revenue was $14.71 billion in Q1 2024, an increase of 28.1%, according to TrendForce.

The optical transceiver market dipped from $11 billion in 2022 to $10.9 billion in 2023, but it is predicted to reach $22.4 billion by 2029, driven by AI, 800G applications, and the transition to 200G/lane ecosystem technologies, reports Yole.

Yole also found that ultra-wideband technical choices and packaging types used by NXP, Apple, and Qorvo vary considerably, ranging from 7nm to 90nm, with both CMOS and finFET transistors.

The global market share of GenAI-capable smartphones increased to 6% in Q1 2024 from 1.3% in the previous quarter, reports Counterpoint. The premium segment accounted for over 70% of sales with Samsung on top and contributing 58%. Meanwhile, global foldable smartphone shipments were up 49% YoY in Q1 2024, led by Huawei, HONOR, and Motorola.

Security

The National Science Foundation awarded Worcester Polytechnic Institute researcher Shahin Tajik almost $0.6 million to develop new technologies to address hardware security vulnerabilities.

The Hyperform consortium was formed to develop European sovereignty in post-quantum cryptography, funded by the French government and EU credits. Members include IDEMIA Secure Transactions, CEA Leti, and the French cybersecurity agency (ANSSI).

In security research:

University of California Davis and University of Arizona researchers proposed a framework leveraging generative pre-trained transformer (GPT) models to automate the obfuscation process.
Columbia University and Intel researchers presented a secure digital low dropout regulator that integrates an attack detector and a detection-driven protection scheme to mitigate correlation power analysis.
Pohang University of Science and Technology (POSTECH) researchers analyzed threshold switch devices and their performance in hardware security.

The U.S. Defense Advanced Research Projects Agency (DARPA) seeks proposals for its AI Quantified program to develop technology to help deploy generative AI safely and effectively across the Department of Defense (DoD) and society.

Vanderbilt University and Oak Ridge National Laboratory (ORNL) partnered to develop dependable AI for national security applications.

The Cybersecurity and Infrastructure Security Agency (CISA) issued a number of alerts/advisories.

Research and Training

New York continues to amp up its semiconductor offerings. NY CREATES and Raytheon unveiled a semiconductor workforce training program, and Syracuse University is hosting a free virtual course focused on the semiconductor industry this summer.

In research news:

A team of researchers at MIT and other universities found that extreme temperatures up to 500°C did not significantly degrade GaN materials or contacts.
University of Cambridge researchers developed adaptive and eco-friendly sensors that can be directly and imperceptibly printed onto biological surfaces, such as a finger or flower petal.
Researchers at Rice University and Hanyang University developed an elastic material that moves like skin and can adjust its dielectric frequency to stabilize RF communications and counter disruptive frequency shifts that interfere with electronics when a substrate is twisted or stretched, with potential for stretchable wearable electronic devices.

The National Science Foundation (NSF) awarded $36 million to three projects chosen for their potential to revolutionize computing. The University of Texas at Austin-led project aims to create a next-gen open-source intelligent and adaptive OS. The Harvard University-led project targets sustainable computing. The University of Massachusetts Amherst-led project will develop computational decarbonization.

Quantum

Singapore will invest close to S$300 million (~$222 million) into its National Quantum Strategy to support the development and deployment of quantum technologies, including an initiative to design and build a quantum processor within the country.

Several quantum partnerships were announced:

Riverlane and Alice & Bob will integrate Riverlane’s quantum error correction stack within Alice & Bob’s larger quantum computing system based on cat qubit technology.
New York University and the University of Copenhagen will collaborate to explore the viability of hybrid superconductor-semiconductor quantum materials for the production of quantum chips and integration with CMOS processes.
NXP, eleQtron, and ParityQC showed off a full-stack, ion-trap based quantum computer demonstrator for Germany’s DLR Quantum Computing Initiative.
Photonic says it demonstrated distributed entanglement between quantum modules using optically-linked silicon spin qubits with a native telecom networking interface as part of a quantum internet effort with Microsoft.
Classiq and HPE say they developed a rapid method for solving large-scale combinatorial optimization problems by combining quantum and classical HPC approaches.

Events and Further Reading

Find upcoming chip industry events here, including:

Event	Date	Location
Hardwear.io Security Trainings and Conference USA 2024	May 28 – Jun 1	Santa Clara, CA
SWTest	Jun 3 – 5	Carlsbad, CA
IITC2024: Interconnect Technology Conference	Jun 3 – 6	San Jose, CA
VOICE Developer Conference	Jun 3 – 5	La Jolla, CA
CHIPS R&D Standardization Readiness Level Workshop	Jun 4 – 5	Online and Boulder, CO
SNUG Europe: Synopsys User Group	Jun 10 – 11	Munich
IEEE RAS in Data Centers Summit: Reliability, Availability and Serviceability	Jun 11 – 12	Santa Clara, CA
3D & Systems Summit	Jun 12 – 14	Dresden, Germany
PCI-SIG Developers Conference	Jun 12 – 13	Santa Clara, CA
AI Hardware and Edge AI Summit: Europe	Jun 18 – 19	London, UK
DAC 2024	Jun 23 – 27	San Francisco


Upcoming webinars are here, including sessions on an integrated SLM analytics solution, prototyping and validation of perception sensor systems, and improving PCB designs for performance and reliability.

Semiconductor Engineering’s latest newsletters:

Automotive, Security and Pervasive Computing
Systems and Design
Low Power-High Performance
Test, Measurement and Analytics
Manufacturing, Packaging and Materials



DAC Panel Could Spark Fireworks

By: Brian Bailey

May 30, 2024 at 09:08

Panels can often become love fests. While a title may sound controversial, it turns out that everyone quickly finds that all the panelists agree on the major points. This is sometimes the result of how the panel was put together – the proposal came from one company, and they wanted to get their customers or clients onto the panel. They are unlikely to ask a major competitor to be part of the event.

These panels can become livelier if they have a moderator who opens up a panel to audience questions and they decide to throw the spanner in the works. This tends to happen a lot more in the technical panels, because each researcher, who may have taken a different approach to a problem, wants to introduce the audience to their alternative solution. But the pavilion panels tend to be a little more sedate – in part because nobody wants to burn bridges within such a tight industry.

It is quite common for me to moderate a panel each DAC, and this year is no exception. I will be moderating a technical panel whose title is directly confrontational: “Why Is EDA Playing Catchup to Disruptive Technologies Like AI? What Can We Do to Change This?”

The abstract for the panel talks about EDA having a closed mindset, consistently missing disruptive changes by choosing incremental approaches. I know that when I first read it – when I was invited to be the chair for it – I was immediately up in arms.

Twenty years ago, while working at an EDA company, I attempted to drive such disruptive changes in the verification industry. Several times a year, I would go out and talk to our customers and exchange ideas with them about the problems they were facing. We would present ideas about both incremental and disruptive developments we had underway. The message was always the same. “Can we have the incremental changes yesterday? And we don’t have time to think about the longer-term ideas.” It reminded me of the cartoon where a stone-age person is pulling a cart with square wheels and doesn’t have time to listen to the person offering him round ones.

Even so, we did go ahead and develop some of them, and a few of them did achieve an element of success. But to go from first adopters to more mainstream interests often took 10 years. Even today, many of those are still niche tools, and probably money sinks for the companies that developed them. Examples are high-level synthesis and virtual prototypes, the only two pieces of the whole ESL movement that survived. Still, those companies believe that, long term, the industry will need them. Many other pieces completely fell by the wayside, such as hardware/software co-design. That, however, may start to resurface thanks to RISC-V.

Many of the tools associated with ESL were direct collaborations between EDA companies and researchers. I established a research collaboration program with the University of Washington that looked at multi-abstraction simulation and protocol checking, and had elements of system synthesis. The only thing that came out of that was hardware/software co-verification. Protocol checking, in the form of VIP, also has become popular, although not directly because of this program. Co-verification had a useful life of about five years before SystemC made the solution obsolete.

Many disruptive innovations actually have come from industry, then were commercialized by EDA companies. SystemC is one example of that. Constrained random verification is another. Portable Stimulus, while still nascent, also was developed within industry. These solutions have an advantage in that they were developed to solve a significant enough problem within the industry that they have broader appeal. There is little that has actually come from academia in recent decades.

The panel title also talks specifically about AI and accuses EDA of being behind already. It is not clear that they are. Thirty years ago, you could go to DAC and see all the new tools and flows that EDA companies were working on. Many of them might be ready within a year or two. But today, EDA companies will make no announcements until at least a few of their customers, that they chose as development partners, have had silicon success.

A typical chip cycle is 18 months. That we are beginning to hear about some of these tools today means they may have been in use for a good part of those 18 months. Plus, development of those tools must have started about a year before that. Let’s remember that ChatGPT only came to the fore 18 months ago, and it should be quite obvious why few generative AI products have yet been announced. The fact that there are so many EDA AI announcements would make me think that EDA companies were very quick off the starting blocks.

The panelists are Prith Banerjee of Ansys, who has written a book about disruption; Jan Rabaey, professor in the Graduate School in Electrical Engineering and Computer Sciences at the University of California, Berkeley, who also serves as the CTO of the Systems Technology Co-Optimization division at imec; Samir Mittal, corporate VP for Silicon Systems AI at Micron Technology; James Scapa, founder and CEO of Altair; and Charles Alpert, fellow at Cadence Design Systems.

If you are going to be at DAC and have access to the technical program, this 90-minute panel may be worth your time. It takes place Wednesday, June 26, at 10:30 a.m. Come ready with your questions, because I will certainly be opening this panel up to the audience very quickly. While sparks may fly, please try to keep your cool and be respectful.



Vision Is Why LLMs Matter On The Edge

By: Ben Gomes

May 30, 2024 at 09:05

Large Language Models (LLMs) have taken the world by storm since the 2017 Transformers paper, but pushing them to the edge has proved problematic. Just this year, Google had to revise its plans to roll out Gemini Nano on all new Pixel models — the down-spec’d hardware options proved unable to host the model as part of a positive user experience. But the implementation of language-focused models at the edge is perhaps the wrong metric to look at. If you are forced to host a language-focused model for your phone or car in the cloud, that may be acceptable as an intermediate step in development. Vision applications of AI, on the other hand, are not so flexible: many of them rely on low latency and high dependability. If a vehicle relies on AI to identify that it should not hit the obstacle in front of it, a blip in contacting the server can be fatal. Accordingly, the most important LLMs to fit on the edge are vision models — the models whose purpose is most undermined by the reliance on remote resources.

“Large Language Models” can be an imprecise term, so it is worth defining. The original 2017 Transformer LLM that many see as kickstarting the AI rush was 215 million parameters. BERT was giant for its time (2018) at 335 million parameters. Both of these models might be relabeled as “Small Language Models” by some today to distinguish from models like GPT4 and Gemini Ultra with as much as 1.7 trillion parameters, but for the purposes here, all fall under the LLM category. All of these are language models though, so why does it matter for vision? The trick here is that language is an abstract system of deriving meaning from a structured ordering of arbitrary objects. There is no “correct” association of meaning and form in language which we could base these models on. Accordingly, these arbitrary units are substitutable — nothing forces architecture developed for language to only be applied to language, and all the language objects are converted to multidimensional vectors anyway. LLM architecture is thus highly generalizable, and typically retains the core strength from having been developed for language: a strong ability to carry through semantic information. Thus, when we talk about LLMs at the edge, it can be a language model cross-trained on image data, or it might be a vision-only model which is built on the foundation of technology designed for language. At the software and hardware levels, for bringing models to the edge, this distinction makes little difference.

Vision LLMs on the edge apply flexibly across many use cases, but the key applications where they show the greatest advantages are: embodied agents (an especially striking example of the benefits of cross-training embodied agents on language data is Dynalang’s advantage over DreamerV3 in interpreting the world, due to superior semantic parsing), inpainting (as seen with latent diffusion models), decision-making in self-driving vehicles (LINGO-2), context-aware security (such as ViViT), information extraction (Gemini’s ability to find and report data from video), and user assistance (physician aids, driver assist, etc.). Especially notable and exciting here is the ability of vision LLMs to leverage language as a lossy storage and abstraction of visual data for decision-making algorithms to then interact with, as seen in LINGO-2 and Dynalang. Many of these vision-oriented LLMs depend on edge deployment to realize their value, and they benefit from the work that has already been done for optimizing language-oriented LLMs. Despite this, vision LLMs are still struggling for edge deployment just as the language-oriented models are. The improvements for edge deployments come in three classes: model architecture, system resource utilization, and hardware optimization. We will briefly review the first two and look more closely at the third since it often gets the least attention.

Model architecture optimizations include the optimizations that must be made at the model level: “distilling” models to create leaner imitators, restructuring where models spend their resource budget (such as the redistribution of transformer modules in Stable Diffusion XL) and pursuing alternate architectures (state-space models, H3 modules, etc.) to escape the quadratically scaling costs of transformers.
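To make the "quadratically scaling costs" mentioned above concrete, here is a minimal sketch that counts the pairwise scores a single self-attention head computes for a given sequence length and compares that with a linear-time layer. The function names are hypothetical, and the linear case is only a loose stand-in for alternatives such as state-space models, whose real costs depend on the implementation.

```python
def attention_score_count(seq_len):
    """Pairwise scores one self-attention head computes: every token attends to every token."""
    return seq_len * seq_len


def linear_layer_steps(seq_len):
    """Steps for a hypothetical linear-time sequence layer (assumption for comparison only)."""
    return seq_len


# Doubling the sequence length quadruples the attention work but only doubles the linear work.
for n in (1_024, 2_048, 4_096):
    print(n, attention_score_count(n), linear_layer_steps(n))
```

Under these assumptions, going from 1,024 to 4,096 tokens multiplies the attention work by 16 while the linear layer grows only by 4.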

System resource optimizations are all the things that can be done in software to an already complete model. Quantization (to INT8, INT4, or even INT2) is a common focus here for both latency and memory burden, but of course compromises accuracy. Speculative decoding can improve utilization and latency. And of course, tiling, such as seen with FlashAttention, has become near-ubiquitous for improving utilization and latency.
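As a rough illustration of the quantization trade-off described above, the following sketch applies symmetric per-tensor INT8 quantization to a handful of weights in plain Python. The function names and sample weights are hypothetical, and production toolchains typically use more elaborate schemes (per-channel scales, calibration data), so this is a minimal sketch rather than any particular library's method.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale for the whole tensor; guard against an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integer codes."""
    return [v * scale for v in quantized]


weights = [0.82, -1.27, 0.003, 0.45, -0.6]  # hypothetical sample weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The accuracy cost: each weight can be off by up to half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight now fits in one byte instead of four (assuming FP32 originals), and the smallest weight collapses to zero, which is the kind of accuracy loss the text refers to.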

Finally, there are hardware optimizations. The first option here is a general-purpose GPU, TPU, NPU or similar, but those tend to be best suited for settings where capability is needed without demanding streamlined optimization, such as might be the case on a home computer. Custom hardware, such as purpose-built NPUs, generally has the advantage when the application is especially sensitive to latency or resource consumption, and this covers many of the applications for vision LLMs.

Exploring this trade-off further: Stable Diffusion’s architecture and resource demands have been discussed here before, but it is worth circling back to it as an example of why hardware solutions are so important in this space. Using Stable Diffusion 1.5 for simplicity, let us focus specifically on the U-Net component of the model. In this diagram, you can see the rough construction of the model: it downsamples repeatedly on the left until it hits the bottom of the U, and then upsamples up the right side, bringing back in residual connections from the left at each stage.

This U-Net implementation has 865 million parameters and entails 750 billion operations. The parameters are a fair proxy for the memory burden, and the operations are a direct representation of the compute demands. The distribution of these burdens on resources is not even, however. If we plot the parameters and operations for each layer, a clear picture emerges:

These graphs show a model that is destined for gross inefficiencies at every step. Most of the memory burden peaks in the center, whereas the compute is heavily taxed at the two tails but underutilized in the center. These inefficiencies come with costs. The memory peak can overwhelm on-chip storage, thus incurring I/O operations, or else requiring a large excess of unused memory for most of the graph. Similarly, storing residuals for later incurs I/O latency and higher power draws. The underutilization of the compute power at the center of the graph means that the processor will have wasteful power draw as it cannot use the tail of the power curve as it does sparser operations. While software interventions can also help here, this is exactly the kind of problem that custom hardware solutions are meant to address. Custom silicon tailored to the model can let you offload some of that memory burden into additional compute cycles at the center of the graph without incurring extra I/O operations by recomputing the residual connections instead of kicking them out to memory. In doing so, the total required memory drops, and the processor can remain at full utilization. Rightsizing the resource allotment and finding ways to redistribute the burdens are key components to how these models can be best deployed at the edge.
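To put the memory burden discussed above into numbers, the sketch below estimates the weight storage for the 865-million-parameter U-Net at several precisions. The INT8, INT4, and INT2 rows follow the quantization levels mentioned earlier in the article; the FP32 and FP16 rows are added as assumed baselines. The estimate covers weights only and ignores activations and stored residuals, which add to the real footprint.

```python
PARAMS = 865_000_000  # U-Net parameter count quoted in the article


def weight_bytes(params, bits_per_weight):
    """Bytes needed to store the weights alone at a given precision."""
    return params * bits_per_weight // 8


# FP32 and FP16 are assumed baselines; INT8/INT4/INT2 follow the article's quantization levels.
for name, bits in [("FP32", 32), ("FP16", 16), ("INT8", 8), ("INT4", 4), ("INT2", 2)]:
    print(f"{name}: {weight_bytes(PARAMS, bits) / 1e9:.2f} GB")
```

Even at INT8 the weights alone come to roughly 0.87 GB, far more than typical on-chip storage, which is why I/O traffic features so heavily in the discussion above.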

Despite their name, LLMs are important to the vision domain for their flexibility in handling different inputs and their strength at interpreting meaning in images. Whether used for embodied agents, context-aware security, or user assistance, their use at the edge requires a dependable low latency which precludes cloud-based solutions, in contrast to other AI applications on edge devices. Bringing them successfully to the edge asks for optimizations at every level, and we have seen already some of the possibilities at the hardware level. Conveniently, the common architecture with language-oriented LLMs means that many of the solutions needed to bring these most essential models to the edge in turn may also generalize back to the language-oriented models which donated the architecture in the first place.

The post Vision Is Why LLMs Matter On The Edge appeared first on Semiconductor Engineering.


RISC-V Heralds New Era Of Cooperation

Semiconductor Engineering

By: Brian Bailey

May 30, 2024 at 09:05

RISC-V is paving the way for open source to become accepted within the hardware community, creating a level of industry collaboration never seen in the past, while revitalizing the connection between academia and industry.

The big question is whether this arrangement is just a placeholder while the industry re-learns how to develop processors, or whether this processor architecture is something very different. In either case, there is a clear and pressing need for more flexible processor architectures, and at least for now, RISC-V has filled a void.

“RISC-V was born out of academia and has had strong collaboration within universities from day one,” says Loren Hobbs, vice president of product and business development at Bluespec. “This collaboration continues today, with many of the most popular open-source RISC-V processors having come from universities. Organizations such as OpenHW Group and CHIPS Alliance serve a central and critical role in driving the collaboration, which is bi-directional between the academic community and industry.”

Collaboration of this type has not existed with the industrial community in the past. “We are learning from each other,” says Florian Wohlrab, CEO at OpenHW. “We are learning best practices for verification. At the same time, we are learning what things to avoid. It is growing where people say, ‘Yes, I really get benefit from sharing ideas.’”

The need for processor flexibility exists within industry as well as academia. “There is a need within the industry for diversification on the processor front,” says Neil Hand, director of marketing at Siemens EDA. “In the past, this led to a fragmented set of companies that couldn’t work together. They didn’t see the value of working together. But RISC-V has a cohesive central organization where anyone who wants to get into the processor space can collaborate. They don’t have to expose their secret sauce, but they get to benefit from each other. A rising tide lifts all boats, and that’s really the situation we’re at with RISC-V.”

Longevity
Whether the industry can build upon this success, or whether it fizzles out over time, remains to be seen. But at least for now, RISC-V’s momentum is growing. “We are at the beginning of a revolution in hardware design,” says OpenHW’s Wohlrab. “We saw the same thing for software when Linux came out 20 or so years ago. No one was really thinking about sharing software or collaboratively developing software. There were some small open-source ventures, but working together on a big project took a long time to develop. Now we are all sharing software, all co-working. But for hardware, we’re just at the beginning of this new concept, and a lot of people need to understand that we can do the same for hardware as we did for software.”

Underlying RISC-V’s success is widespread collaboration. “One of the pillars sustaining the success of RISC-V is customization that works with the ecosystem and leverages a well-defined process,” says Sergio Marchese, vice president of application engineering at SmartDV. “RISC-V vendors face the challenge of showing how their processor customization capabilities serve an application and demonstrating the complete process on real hardware. Without strategic partnerships, RISC-V vendors must walk a much more challenging, time-consuming, and resource-intensive road.”

That framework is what makes it unique. “RISC-V has formed this framework for collaboration, and it fixes everything,” says Siemens’ Hand. “Now, when a university has a really cool idea for memory tagging in a processor design, they don’t have to build the compilers, they don’t have to build the reference platform. They already exist. Maybe a compiler optimization startup has this great idea for handling code optimization. They don’t have to build the rest of the ecosystem. When a processor IP company has this great idea, they can become focused within this bigger picture. That’s the unique nature of it. It’s not just a processor specification.”

Historically, one of the problems associated with open-source hardware was quality, because finding bugs in silicon is expensive. OpenHW is an important piece of the puzzle. “Why should everyone reinvent the wheel by themselves?” asks Wohlrab. “Why can’t we get the basic building blocks, some basic chips, take some design from academia, which has reasonably good quality, and build on them, verify them together. We are verifying with different tools, making sure we get a high coverage, and then everyone can go off and use them in their own chips for mass production, for volume shipment.”

This benefits companies both large and small. “There are several processor vendors that have switched to RISC-V,” says Hand. “Synopsys has moved to RISC-V. Andes has moved to RISC-V. MIPS has moved to RISC-V. Why? Because they can leverage the whole ecosystem. The downside of it is commoditization, which as a customer is really beneficial because you can delay choosing a processor till later in the design flow. Your early decision is to use the Arm ecosystem or RISC-V, and then you can work through it. That creates an interesting set of dynamics. You can start to create new opportunities for companies that develop and deliver IP, because you can benchmark them, swap them in and out, and see which one works. On the flip side, it makes it awful from a lock-in perspective once you’re in that socket.”

Fragmentation
Of course, there will be some friction in the system. “In the early days of RISC-V there was nearly a 1:1 balance between contributors and consumers of the technology,” says Geir Eide, director, product management for Siemens EDA. “Today there are thousands of RISC-V consumers, but only a small percentage of those will be contributors. There is a risk that there will be a disconnect between them. If, for instance, a particular market or regional segment is growing at a higher pace than others, or other market segments and regions are more conservative, they tend to stick to established solutions longer. That increases the risk that it could lead to fragmentation.”

Is that likely to impact development long term? “We do not believe that RISC-V will become regionally concentrated, although there may be regional concentrations of focus within the broad set of implementation choices provided by RISC-V,” says Bluespec’s Hobbs. “A prime example of this is the Barcelona Supercomputing Center, creating a regional focus area for high-performance computing using RISC-V. However, while there may be regional focus areas, this does not mean that the RISC-V standard is, or will become, fragmented. In fact, one of the key tenets of the creation and foundation of RISC-V was preventing fragmentation of the ISA, and it continues to be a key function of RISC-V International.”

China may be a different story. “A lot of companies in China are creating RISC-V cores for internal consumption — for political reasons mostly,” says John Min, vice president of customer service at Arteris. “I think China will go 100% RISC-V for embedded, but it’s a one-way street. They will keep leveraging what the Western companies do and enhance it. China will continue sucking all advancements, such as vectorization, or the special domain-specific acceleration enhancements. They will create their own and make it their own internally, but they will give nothing back.”

Such splits have occurred in the past. “Design languages are the most recent example of that,” says Hand. “There was a regional split, and you had Europe focus on VHDL while America went with Verilog. With RISC-V, there will be that regional split where people will go off and do their things regionally. Europe has focused projects, India has theirs, but they’re still doing it within this framework. It’s this realization that everyone benefits. They’re not doing it to benefit the other people. They’re doing it ultimately to save themselves effort, to save themselves cost, but they realize that by doing it in that framework it is a net benefit to everyone.”

Bi-directionality
An important element is that everyone benefits, and that has to stretch across the academic/commercial boundary. “RISC-V has propelled a new degree of collaboration between academia and commercial organizations,” says Dave Kelf, CEO at Breker. “It’s noticeable that institutions such as Harvey Mudd College in Claremont, California, and ETH in Zurich, Switzerland, to name two, have produced advanced processor designs as a teaching aid, and have collaborated with multiple companies on their verification and design. This has been further advanced by OpenHW Group, which has made these designs accessible to the industry. This bi-directional collaboration benefits the tool providers to further enhance their offerings working on advanced, open devices, while also enabling academia to improve their designs to a commercial quality level. The virtuous circle created is essential if we are to see RISC-V established as a mainstream, industry-wide capability.”

Academia has a lot to offer in hardware advancement. “Researchers in universities are developing innovative new software and hardware to push the limits of RISC-V innovation,” says Dave Miller, head of corporate communications at SiFive. “Many of the RISC-V projects in academia are focused on optimizing performance and energy efficiency for AI workloads, and are open source so the entire ecosystem can benefit. Researchers are also actively contributing to RISC-V working groups to share their knowledge and collaborate with industry participants. These working groups are split evenly between representatives from APAC, Europe, and North America, all working together towards common goals.”

In many cases, industry is willing to fund such projects. “It makes it easy to have research topics that don’t need to boil the ocean,” says Hand. “If you’re a PhD student and you have a great idea, you can go do it. It’s easy for an industry partner to say, ‘I’ll sponsor that. That’s an interesting thing, and I am not required to allocate ridiculous amounts of money into an open-ended project. It’s like I can see the connection of how that research will go into a commercial product later.’”

This feeds back into academia. “The academics have been jumping on board with OpenHW,” says Wohlrab. “By taking their cores and productizing them, they get a chip back that could be shipped in high volume. Then they can do their research on a real commercial product and can see if their idea would fly in real life. They get real numbers and can see real figures for the benefits of a new branch predictor.”

It can also have a long-term benefit for tools. “There are areas where they want to collaborate with us, especially around security,” says Kiran Vittal, executive director for alliances marketing management at Synopsys. “They are building RISC-V based sub-systems using open-source RISC-V processors, and then academia wants to look at not only the AI part, but the security part. There are post-doc students or PhD students looking into using our tools to verify or to implement whatever they’re doing on security.”

That provides an incentive for EDA to offer better tools for use in universities. “Although there has always been collaboration between universities and the industry, where industry provides the universities with access to EDA tools, IP cores, etc., there’s often a bit of a lag,” says Siemens’ Eide. “In many situations (especially outside of the core area of a particular project), universities have access to older versions of the commercial solutions. If you for instance look at a new grad’s resume, where you in the past would see references to old tech, now you see a lot of references to relatively sophisticated use of RISC-V.”

Moving Forward
This collaboration needs to keep pushing forward. “We had an initiative to create a standardized interface for accelerators,” says Wohlrab. “RISC-V International standardized how to add custom instructions in the ISA, but there was no standard for the hardware interface. So we built this. It was a cool discussion. There were people from Silicon Labs, people from NXP, people from Thales, plus several startups. They all came together and asked, ‘How can we make it future proof and put the accelerators inside?’”

The application space for RISC-V is changing. “The big inflection point is Linux and Android,” says Arteris’ Min. “Android already has some support, but when both Android and Linux are really supported, it will change the mobile apps processor game. The number of designs will proliferate. The number of high-end designs will explode. It will take the whole industry to enable that because RISC-V companies are not big enough to create this by themselves. All the RISC-V companies are partners, because we enable this high-end design at the processor levels.”

That would deepen the software community’s engagement. “An embedded software developer needs to understand the underlying hardware if they want to run Linux on a RISC-V processor that uses custom instructions/accelerators,” says Bluespec’s Hobbs. “To develop complex embedded hardware/software systems, both embedded software developers and embedded hardware developers must possess contextual understanding of the interoperability of hardware and software. The developer must understand how the customized processor is leveraging the custom instructions in hardware for Linux to efficiently manage and execute the accelerated workloads.”

This collaboration could reinvigorate research into EDA, as well. “With AI you can build predictive models,” says Hand. “Could that be used to identify the change effects from making an extension? What does that mean? There’s a cloud of influence — not directly gate-wise, because that immediately explodes — but perhaps based on test suites. ‘I know that something that touches that logic touches this downstream, which touches the rest of the design.’ That’s where AI plays a big role, and it is one of the interesting areas because in verification there are so many unknowns. When AI comes along, any guidance or any visibility that you can give is incredibly powerful. Even if it is not right 100% of the time, that’s okay, as long as it generates false negatives and not false positives.”

There is a great opportunity for EDA companies. “We collaborate with many of the open-source providers, with OpenHW group, with ETH in Zurich,” says Synopsys’ Vittal. “We want to promote our solutions when it comes to any processor design and you need standard tools like synthesis, place and route, simulation. But then there are also other kinds of unique solutions because RISC-V is so customizable, you can build your own custom instructions. You need something specific to verify these custom instructions and that’s why the Imperas golden models are important. We also collaborated with Bluespec to develop a verification methodology to take you through functional verification and debug.”

There are still some wrinkles to be worked out for customizations. “RISC-V gives us predictability,” says Hand. “We can create a compliance test suite, give you a processor optimization package if you’re on the implementation side. We can create analytics and testing solutions because we know what it’s going to look like. But for non-standard processors, it is effectively as a service, because everyone’s processor is a little bit different. The reason you see a lot of focus on the verification, from platform architecture exploration all the way through, is because if you change one little thing, such as an addressing mode, it impacts pretty much 100% of your processor verification. You’ve got to retest the whole processor. Most people aren’t set up like an Arm or an Intel with huge processor verification teams and the infrastructure, and so they need automation to do it for them.”

Conclusion
RISC-V has enabled the industry to create a framework for collaboration, which enables everyone to work together for selfish reasons. It is a symbiotic relationship that continues to build, and it is creating a wider sphere of influence over time.

“It’s unique in the modern era of semiconductor,” says Hand. “You have such a wide degree of collaboration, where you have processor manufacturers, the software industry leaders, EDA companies, all working on a common infrastructure.”

Related Reading
RISC-V Micro-Architectural Verification
Verifying a processor is much more than making sure the instructions work, but the industry is building from a limited knowledge base and few dedicated tools.
RISC-V Wants All Your Cores
It is not enough to want to dominate the world of CPUs. RISC-V has every core in its sights, and it’s starting to take steps to get there.

The post RISC-V Heralds New Era Of Cooperation appeared first on Semiconductor Engineering.


Turbocharging Cost-Conscious SoCs With Cache

Semiconductor Engineering

By: John Min

May 30, 2024 at 09:04

Some design teams creating system-on-chip (SoC) devices are fortunate to work with the latest and greatest technology nodes coupled with a largely unconstrained budget for acquiring intellectual property (IP) blocks from trusted third-party vendors. However, many engineers are not so privileged. For every “spare no expense” project, there are a thousand “do the best you can with a limited budget” counterparts.

One way to squeeze the most performance out of lower-cost, earlier generation, mid-range processor and accelerator cores is to employ the judicious application of caches.

Cutting costs

A simplified example of a typical cost-conscious SoC scenario is illustrated in figure 1. Although the SoC may be composed of many IPs, only three are shown here for clarity.

Fig. 1: Portion of a cost-conscious, non-cache-coherent SoC. (Source: Arteris)

The predominant technology for connecting the IPs inside an SoC is network-on-chip (NoC) interconnect IP. This may be thought of as an IP that spans the entire device. The example shown in figure 1 may be assumed to reflect a non-cache-coherent scenario. In this case, any coherency requirements will be handled in software.

Let’s assume the SoC’s clock is running at 1GHz. Suppose a central processing unit (CPU) based on a reduced instruction set computer (RISC) architecture running a typical instruction will consume a single clock cycle. However, access to external DRAM memory can take anywhere between 100 and 200 processor clock cycles (we’ll average this out to be 150 cycles for the purposes of this article). This means that if the CPU lacked a Level 1 (L1) cache and was connected directly to the DRAM via the NoC and DDR memory controller, each instruction would consume 150 processor clock cycles, resulting in a CPU utilization of only 1/150 = 0.67%.

This is why CPUs, along with some accelerators and other IPs, employ cache memories to increase processor utilization and application performance. The underlying premise upon which the cache concept is based is the principle of locality. The idea is that only a small amount of the main memory is being employed at any given time and that locations in that space are being accessed multiple times. Mainly due to loops, nested loops, and subroutines, instructions and their associated data experience temporal, spatial, and sequential locality. This means that once a block of instructions and data has been copied from the main memory into an IP’s cache, the IP will typically access it repeatedly.

Today’s high-end CPU IPs usually have a minimum of a Level 1 (L1) and Level 2 (L2) cache, and they often have a Level 3 (L3) cache. Also, some accelerator IPs, like graphics processing units (GPUs), often have their own internal caches. However, these latest-generation high-end IPs often cost 5X to 10X as much as their previous-generation mid-range counterparts. As a result, as illustrated in figure 1, the CPU in a cost-conscious SoC may come equipped with only an L1 cache.

Let’s consider the CPU and its L1 cache in a little more depth. When the CPU requests something in its cache, the result is called a cache hit. Since the L1 cache typically runs at the same speed as the processor core, a cache hit will be processed in a single processor clock cycle. By comparison, if the requested data is not in the cache, the result, called a cache miss, will require access to the main memory, which will consume 150 processor clock cycles.

Now consider running 1,000,000 instructions. If the cache were large enough to contain the whole program, then this would consume only 1,000,000 clock cycles, resulting in a CPU efficiency of 1,000,000 instructions/1,000,000 clock cycles = 100%.

Unfortunately, the L1 cache in a mid-range CPU will typically be only 16KB to 64KB in size. If we assume a 95% cache hit rate, then 950,000 of our 1,000,000 instructions will take one processor clock cycle. The remaining 50,000 instructions will each consume 150 clock cycles. Thus, the CPU efficiency in this case can be calculated as 1,000,000/((950,000 * 1) + (50,000 * 150)) = ~12%.
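The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch of the article's simplified model (one cycle per cache hit, 150 cycles per miss); the function name and its parameter names are hypothetical.

```python
def cpu_efficiency(instructions, hit_rate, hit_cycles=1, miss_cycles=150):
    """Instructions completed per clock cycle under the simplified hit/miss model."""
    hits = instructions * hit_rate
    misses = instructions - hits
    total_cycles = hits * hit_cycles + misses * miss_cycles
    return instructions / total_cycles


no_cache = cpu_efficiency(1_000_000, hit_rate=0.0)   # every access goes to DRAM
whole_program_cached = cpu_efficiency(1_000_000, hit_rate=1.0)
l1_only = cpu_efficiency(1_000_000, hit_rate=0.95)   # the 95% hit-rate case

print(f"no cache: {no_cache:.2%}")
print(f"whole program cached: {whole_program_cached:.0%}")
print(f"L1 at 95% hits: {l1_only:.1%}")
```

Varying `hit_rate` shows how sensitive the result is: because each miss costs 150 cycles, even a 5% miss rate leaves the CPU idle for most of the time.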

Turbocharging performance

A cost-effective way of turbocharging the performance of a cost-conscious SoC is to add cache IPs. For example, CodaCache from Arteris is a configurable, standalone non-coherent cache IP. Each CodaCache instance can be up to 8MB in size, and multiple copies can be instantiated in the same SoC, as demonstrated in figure 2.

Fig. 2: Portion of a turbocharged, non-cache-coherent SoC. (Source: Arteris)

It is not the intention of this article to suggest that every IP should be equipped with a CodaCache. Figure 2 is intended only to provide examples of potential CodaCache deployments.

If a CodaCache instance is associated with an IP, it’s known as a dedicated cache (DC). Alternatively, if a CodaCache instance is associated with a DDR memory controller, it’s referred to as a last-level cache (LLC). A DC will accelerate the performance of the IP with which it is associated, while an LLC will enhance the performance of the entire SoC.

As an example of the type of performance boost we might expect, consider the CPU shown in figure 2. Let’s assume the CodaCache DC instance associated with this IP is running at half the processor speed and that any accesses to this cache consume 20 processor clock cycles. If we also assume a 95% cache hit rate for this DC, then, for 1,000,000 instructions, our overall CPU+L1+DC efficiency can be calculated as 1,000,000/((950,000 * 1) + (47,500 * 20) + (2,500 * 150)) = ~44%. Compared with the ~12% of the L1-only case, that’s a performance boost of roughly 270%, or about a 3.7X speedup!
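The same calculation generalizes to any number of cache levels. The sketch below follows the article's simplified model, in which an access is charged only the cost of the level that finally serves it; the function name and the `levels` structure are hypothetical.

```python
def efficiency(instructions, levels, memory_cycles):
    """Instructions per cycle for a cache hierarchy.

    levels: list of (hit_rate, access_cycles) pairs, nearest cache first.
    Accesses that miss every level are served by main memory.
    """
    remaining = instructions
    total_cycles = 0.0
    for hit_rate, cycles in levels:
        hits = remaining * hit_rate
        total_cycles += hits * cycles
        remaining -= hits  # misses fall through to the next level
    total_cycles += remaining * memory_cycles
    return instructions / total_cycles


l1_only = efficiency(1_000_000, [(0.95, 1)], memory_cycles=150)
l1_plus_dc = efficiency(1_000_000, [(0.95, 1), (0.95, 20)], memory_cycles=150)
speedup = l1_plus_dc / l1_only

print(f"L1 only: {l1_only:.1%}, L1 + DC: {l1_plus_dc:.1%}, speedup: {speedup:.2f}x")
```

With the article's figures, the efficiency rises from about 12% to about 44%, a speedup of roughly 3.7X. A more detailed model might also charge the L1 lookup time on every miss, which would lower both figures slightly.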

Conclusion

In the past, embedded programmers relished the challenge of squeezing the highest performance possible out of small processors with low clock speeds and limited memory resources. In fact, it was common for computer magazines to issue challenges to their readers along the lines of, “Who can perform task X on processor Y in the minimum number of clock cycles using the smallest amount of memory?”

Today, many SoC developers enjoy the challenge of squeezing the highest performance possible out of their designs, especially if they are constrained to use lower-performing mid-range IPs. Deploying CodaCache IPs as dedicated and last-level caches provides an affordable way for engineers to turbocharge their cost-conscious SoCs. To learn more about CodaCache from Arteris, visit arteris.com.

The post Turbocharging Cost-Conscious SoCs With Cache appeared first on Semiconductor Engineering.

AI-Powered Data Analytics To Revolutionize The Semiconductor Industry

Semiconductor Engineering

By: Reela Samuel

May 30, 2024, 09:03

In the age where data reigns supreme, the semiconductor industry stands on the cusp of revolutionary change, redefining complexity and productivity through a lens crafted by artificial intelligence (AI). The intersection of AI and the semiconductor industry is not merely an emerging trend—it is the fulcrum upon which the next generation of technological innovation balances. Semiconductor companies are facing a critical juncture where the burgeoning complexity of chip designs is outpacing the growth of skilled human resources. This is where the infusion of AI-powered data analytics catalyzes a seismic shift in the industry’s approach to efficiency and productivity.

AI in semiconductor design: A revolution beckons

With technological leaps like 5G, AI, and autonomous vehicles driving chip demand, the status quo for semiconductor design is no longer sustainable. Traditional design methodologies fall short in addressing the challenges presented by these new technologies, and the need for a new approach is non-negotiable. AI, with its capacity to process massive datasets and learn from patterns, offers a revolutionary solution. Fed with vast amounts of electronic design automation (EDA) data, machine learning algorithms can chart an efficient path through design complexities.

Navigating the complexity of EDA data with AI

The core of semiconductor design lies in the complexity of EDA data, which is often disparate, unstructured, and immensely intricate, existing in various formats ranging from simple text to sophisticated binary machine-readable data. AI presents a beacon of hope in taming this beast by enabling the industry to store, process, and analyze data with unprecedented efficiency.

AI-enabled data analytics offer a path through the labyrinth of EDA complexity, providing a scalable and sophisticated data storage and processing solution. By harnessing AI’s capabilities, the semiconductor industry can dissect, organize, and distill data into actionable insights, elevating the efficacy of chip design processes.

Leveraging AI in design excellence

Informed decisions are the cornerstone of successful chip design, and the fusion of AI-driven analytics with semiconductor engineering marks a watershed moment in the industry. AI’s ability to comprehend and process unstructured data at scale enables a deeper understanding of design challenges, yielding solutions that optimize SoCs’ power, performance, and area (PPA).

AI models, fed by fragmented data points from EDA compilation, can predict bottlenecks, performance constraints, or power inefficiencies before they impede the design process. This foresight empowers engineers with informed design decisions, fostering an efficient and anticipatory design culture.

Reimagining engineering team efficiency

One of the most significant roadblocks in the semiconductor industry has been aligning designer resources with the exponential growth of chip demand. As designs become more complex, they evolve into multifaceted systems-on-chip (SoCs) housing myriad hierarchical blocks that accumulate vast amounts of data throughout the iterative development cycle. When harnessed effectively, this data possesses untapped potential to elevate the efficiency of engineering teams.

Consolidating data review into a systematic, knowledge-driven process paves the way for accelerated design closure and seamless knowledge transfer between projects. This refined approach can significantly enhance the productivity of engineering teams, a crucial factor if the semiconductor industry is to meet the burgeoning chip demand without exponentially expanding design teams.

Ensuring a systemic AI integration

For the full potential of AI to be realized, a systemic integration across the semiconductor ecosystem is paramount. This integration spans the collection and storage of data and the development of AI models attuned to the industry’s specific needs. Robust AI infrastructure, equipped to handle the diverse data formats, is the cornerstone of this integration. AI models must complement it and be fine-tuned to the peculiarities of semiconductor design, ensuring that the insights they produce are accurate and actionable.

Cultivating AI competencies within engineering teams

As AI plays a central role in the semiconductor industry, it highlights the need for AI competencies within engineering teams. Upskilling the workforce to leverage AI tools and platforms is a critical step toward a harmonized AI ecosystem. This journey toward proficiency entails familiarization with AI concepts and a collaborative approach that blends domain expertise with AI acumen. Engineering teams adept at harnessing AI can unlock its full potential and become pioneers of innovation in the semiconductor landscape.

Intelligent system design

At Cadence, the conception of technological ecosystems is encapsulated within a framework of three concentric circles—a model neatly epitomized by the sophistication of an electric vehicle. The first circle represents the data used by the car; the second circle represents the physical car, including the mechanical, electrical, hardware, and software components. The third circle represents the silicon that powers the entire system.

The Cadence.AI Platform operates at the vanguard of pervasive intelligence, harnessing data and AI-driven analytics to propel system and silicon design to unprecedented levels of excellence. By deploying Cadence.AI, we converge our computational software innovations, from Cadence’s Verisium AI-Driven Verification Platform to the Cadence Cerebrus Intelligent Chip Explorer’s AI-driven implementation.

The AI-driven future of semiconductor innovation

The implications are far-reaching as the semiconductor industry charts its course into an AI-driven era. AI promises to redefine design efficiency, expedite time to market, and pioneer new frontiers in chip innovation. The path forward demands a concerted effort to integrate AI seamlessly into the semiconductor fabric, cultivating an ecosystem primed for the challenges and opportunities ahead.

Semiconductor firms that champion AI adoption will set the standard for the industry’s evolution, carving a niche for themselves as pioneers of a new chip design and production paradigm. The future of semiconductor innovation is undoubtedly AI, and the time to embrace this transformative force is now.

Cadence is already at the forefront of this AI-led revolution. Our Cadence.AI Platform is a testament to AI’s power in redefining systems and silicon design. By enabling the concurrent creation of multiple designs, optimizing team productivity, and pioneering leaner design approaches, Cadence.AI illustrates the true potential of AI in semiconductor innovation.

The harmonized suite of our AI tools equips our customers with the ability to employ AI-driven optimization and debugging, facilitating the concurrent creation of multiple designs while optimizing the productivity of engineering teams. It empowers a leaner workforce to achieve more, elevating their capability to generate a spectrum of designs in parallel with unmatched efficiency and precision. The result is a new frontier in design excellence, where AI acts as a co-pilot to the engineering team, steering the way to unparalleled chip performance. Learn more about the power of AI to forge intelligent designs.

The post AI-Powered Data Analytics To Revolutionize The Semiconductor Industry appeared first on Semiconductor Engineering.

AI For Data Management

Semiconductor Engineering

By: Adam Kovac

May 30, 2024, 09:03

Data management is becoming a significant new challenge for the chip industry, as well as a brand new opportunity, as the amount of data collected at every step of design through manufacturing continues to grow.

Exacerbating the problem is the rising complexity of designs, many of which are highly customized and domain-specific at the leading edge, as well as increasing demands for reliability and traceability. There also is a growing focus on chiplets developed using different processes, including some from different foundries, and new materials such as glass substrates and ruthenium interconnects. On the design side, EDA and verification tools can generate terabytes of data on a weekly or even a daily basis, unlike in the past when this was largely done on a per-project basis.

While more data can be used to provide insights into processes and enable better designs, it’s an ongoing challenge to manage the current volumes being generated. The entire industry must rethink some well-proven methodologies and processes, as well as invest in a variety of new tools and approaches. At the same time, these changes are generating concern in an industry used to proceeding cautiously, one step at a time, based on silicon- and field-proven strategies. Increasingly, AI/ML is being added into design tools to identify anomalies and patterns in large data sets, and many of those tools are being regularly updated as algorithms are updated and new features are added, making it difficult to know exactly when and where to invest, which data to focus on, and with whom to share it.

“Every company has its own design flow, and almost every company has its own methodology around harvesting that data, or best practices about what reports should or should not be written out at what point,” said Rob Knoth, product management director in Cadence’s Digital & Signoff group. “There’s a death by 1,000 cuts that can happen in terms of just generating titanic volumes of data because, in general, disk space is cheap. People don’t think about it a lot, and they’ll just keep generating reports. The problem is that just because you’re generating reports doesn’t mean you’re using them.”

Fig. 1: Rising design complexity is driving increased need for data management. Source: IEEE Rising Stars 2022/Cadence

As with any problem in chip design, there is opportunity in figuring out a path forward. “You can always just not use the data, and then you’re back where you started,” said Tony Chan Carusone, CTO at Alphawave Semi. “The reason it becomes a problem for organizations is because they haven’t architected things from the beginning to be scalable, and therefore, to be able to handle all this data. Now, there’s an opportunity to leverage data, and it’s a different way. So it’s disruptive because you have to tear things apart, from re-architecting systems and processes to how you collect and store data, and organize it in order to take advantage of the opportunity.”

Buckets of data, buckets of problems
The challenges that come with this influx of data can be divided into three buckets, said Jim Schultz, senior staff product manager at Synopsys. The first is figuring out what information is actually critical to keep. “If you make a run, designers tend to save that run because if they need to do a follow up run, they have some data there and they may go, ‘Okay, well, what’s the runtime? How long did that run take, because my manager is going to ask me what I think the runtime is going to be on the next project or the next iteration of the block.’” While that data may not be necessary, designers and engineers have a tendency to hang onto it anyway, just in case.

The second challenge is that once the data starts to pour in, it doesn’t stop, raising questions about how to manage collection. And third, once the data is collected, how can it be put to best use?

“Data analytics have been around with other types of companies exploring different types of data analytics, but the difference is that those can be very generic solutions,” said Schultz. “What we need for our industry is going to be very specific data analytics. If I have a timing issue, I want you to help me pinpoint what the cause of that timing violation is. That’s very specific to what we do in EDA. When we talk about who is cutting through the noise, we don’t want data that’s just presented. We want the data that is what the designer most cares about.”

Data security
The sheer number of tools being used and companies and people involved along the design pathway raises another challenge — security.

“There’s a lot of thought and investment going into the security aspect of data, and just as much as the problem of what data to save and store is the type of security we have to have without hindering the user day-to-day,” said Simon Rance, director of product management at Keysight. “That’s becoming a bigger challenge. Things like the CHIPS Act and the geopolitical scenarios we have at the moment are compounding that problem because a lot of the companies that used to create all these devices by themselves are having to collaborate, even with companies in different regions of the globe.”

This requires a balancing act. “It’s almost like a recording studio where you have all these knobs and dials to fine tune it, to make sure we have security of the data,” said Rance. “But we’re also able to get the job done as smoothly and as easily as we can.”

Further complicating the security aspect is that designing chips is not a one-man job. As leading-edge chips become increasingly complex and heterogeneous, they can involve hundreds of people in multiple companies.

“An important thing to consider when you’re talking about big data and analytics is what you’re going to share and with whom you’re going to share it,” said Synopsys’ Schultz. “In particular, when you start bringing in and linking data from different sources, if you start bringing in data related to silicon performance, you don’t want everybody to have access to that data. So the whole security protocol is important.”

Even the mundane matters — having a ton of data makes it likely, at some point, that data will be moved.

“The more places the data has to be transferred to, the more delays,” said Rance. “The bigger the data set, the longer it takes to go from A to B. For example, a design team in the U.S. may be designing during the day. Then, another team in Singapore or Japan will pick up on that design in their time zone, but they’re across the world. So you’re going to have to sync the data back and forth between these kinds of design sites. The bigger the data, the harder to sync.”
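As a rough illustration of Rance's point, transfer time grows linearly with data size for a fixed link. The sketch below is a back-of-the-envelope estimate only: the data-set sizes and the 1 Gbit/s sustained link speed are hypothetical numbers, not figures from the article, and the calculation ignores latency, protocol overhead, and incremental sync.

```python
def transfer_hours(size_gb, link_gbit_per_s):
    """Estimate hours needed to move size_gb gigabytes over a link."""
    size_gbit = size_gb * 8  # gigabytes -> gigabits
    seconds = size_gbit / link_gbit_per_s
    return seconds / 3600


# Hypothetical data-set sizes synced between two design sites.
for size_gb in (50, 500, 5000):
    print(f"{size_gb:>5} GB -> {transfer_hours(size_gb, 1.0):.2f} h")
```

Even with these idealized numbers, the largest data set would take most of a working day to move, which is why reducing what gets synced matters as much as the link itself.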

Solutions
The first step toward solving the issue of too much data is figuring out what data is actually needed. Rance said his team has found success using smart algorithms that help figure out which data is essential, which in turn can help optimize storage and transfer times.

There are also less-technical problems that can rear their heads. Gina Jacobs, head of global communications and brand marketing at Arteris, said that engineers who use a set methodology, particularly those who are used to working on a problem by themselves and “brute forcing” a solution, can also find themselves overwhelmed by data.

“Engineers and designers can also switch jobs, taking with them institutional knowledge,” Jacobs said. “But all three problems can be solved with a single solution — having data stored in a standardized way that is easily accessible and sortable. It’s about taking data and requirements and specifications in different forms and then having it in the one place so that the different teams have access to it, and then being able to make changes so there is a single source of truth.”

Here, EDA design and data management tools are increasingly relying on artificial intelligence to help. Schultz forecasted a future where generative AI will touch every facet of chip development. “Along with that is the advanced data analytics that is able to mine all of that data you’ve been collecting, instead of going beyond the simple things that people have been doing, like predicting how long runtime is going to be or getting an idea what the performance is going to be,” he said. “Tools are going to be able to deal with all of that data and recognize trends much faster.”

Still, those all-encompassing AI tools, capable of complex analysis, are years away. Cadence’s Knoth said he’s already encountered clients that are reluctant to bring AI into the mix due to fears over the costs involved in disk space, compute resources, and licenses. Others, however, have been a bit more open-minded.

“Initially, AI can use a lot of processors to generate a lot of data because it’s doing a lot of things in parallel when it’s doing the inferencing, but it usually gets to the result faster and more predictably,” he said. So while a machine learning algorithm may generate even more vast amounts of data, on top of the piles currently available, “a good machine learning algorithm could be watching and smartly killing or restarting jobs where needed.”

As for the humans who are still an essential component to chip design, Alphawave’s Carusone said hardware engineers should take a page from lessons learned years ago from their counterparts in the software development world.

These include:

Having an organized and automated way to collect data, file it in a repository, and not do anything manually;
Developing ways to run verification and lab testing and everything in between in parallel, but with the data organized in a way that can be mined; and
Creating methods for rigorously checking in and out of different test cases that you want to consider.

“The big thing is you’ve got all this data collected, but then what is each of those files, each of those collections of data?” said Carusone. “What does that correspond to? What test conditions was that collected in? The software community dealt with that a while ago, and the hardware community also needs to have this under its belt, taking it to the next level and recognizing we really need to be able to do this en masse. We need to be able to have dozens of people work in parallel, collecting data and have it all on there. We can test a big collection of our designs in the lab without anyone having to touch a thing, and then also try refinements of the firmware, scale them out, then have all the data come in and be analyzed. Being able to have all that done in an automated way lets you track down and fix problems a lot more quickly.”
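One way to picture the practices Carusone describes is a small, automatically written record that travels with every data file and states what it corresponds to and under which test conditions it was collected. The following is a minimal sketch, not a description of any vendor's tool: the `RunRecord` fields, the `log_run` and `find_runs` helpers, the JSON-lines index file, and the example design and file names are all hypothetical.

```python
import json
import tempfile
from dataclasses import dataclass, asdict, field
from pathlib import Path


@dataclass
class RunRecord:
    """Metadata stored alongside one collected data file (hypothetical schema)."""
    design: str
    test_case: str
    conditions: dict
    data_file: str
    tags: list = field(default_factory=list)


def log_run(repo: Path, record: RunRecord) -> None:
    """Append one record to the repository index as a JSON line."""
    with repo.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")


def find_runs(repo: Path, **conditions) -> list:
    """Return records whose test conditions match all given key/value pairs."""
    matches = []
    with repo.open(encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            if all(rec["conditions"].get(k) == v for k, v in conditions.items()):
                matches.append(rec)
    return matches


# Illustrative usage with made-up design names and test conditions.
repo = Path(tempfile.mkdtemp()) / "runs.jsonl"
log_run(repo, RunRecord("serdes_a", "eye_scan", {"temp_c": 25, "vdd": 0.8}, "run_001.csv"))
log_run(repo, RunRecord("serdes_a", "eye_scan", {"temp_c": 85, "vdd": 0.8}, "run_002.csv"))

hot_runs = find_runs(repo, temp_c=85)
print([r["data_file"] for r in hot_runs])
```

Because every record is written by the same helper rather than by hand, many engineers can log runs in parallel and the index can still be mined later by test condition.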

Conclusion
The influx of new tools used to analyze and test chip designs has increased productivity, but those tools come with additional considerations. Institutions and individual engineers and designers have never had access to so much data, but that data is of limited value if it’s not used effectively.

Strategies to properly store and order that data are essential. Some powerful tools are already in place to help do that, and the AI revolution promises to make even more powerful resources available to quickly cut down on the time needed to run tests and analyze the results.

For now, handling all that data remains a tricky balance, according to Cadence’s Knoth. “If this was an easy problem, it wouldn’t be a problem. Being able to communicate effectively, hierarchically — not just from a people management perspective, but also hierarchically from a chip and project management perspective — is difficult. The teams that do this well invest resources into that process, specifically the communication of top-down tightening of budgets or top-down floorplan constraints. These are important to think about because every engineer is looking at chip-level timing reports, but the problem that they’re trying to solve might not ever be visible. But if they have a report that says, ‘Here is your view of what your problems are to solve,’ you can make some very effective work.”

Further Reading
EDA Pushes Deeper Into AI
AI is both evolutionary and revolutionary, making it difficult to assess where and how it will be used, and what problems may crop up.
Optimizing EDA Cloud Hardware And Workloads
Algorithms written for GPUs can slice simulation time from weeks to hours, but not everything is optimized or benefits equally.

The post AI For Data Management appeared first on Semiconductor Engineering.