FreshRSS

Zobrazení pro čtení

Jsou dostupné nové články, klikněte pro obnovení stránky.

Giant Chips Give Supercomputers a Run for Their Money



As large supercomputers keep getting larger, Sunnyvale, California-based Cerebras has been taking a different approach. Instead of connecting more and more GPUs together, the company has been squeezing as many processors as it can onto one giant wafer. The main advantage is in the interconnects—by wiring processors together on-chip, the wafer-scale chip bypasses many of the computational speed losses that come from many GPUs talking to each other, as well as losses from loading data to and from memory.

Now, Cerebras has flaunted the advantages of their wafer-scale chips in two separate but related results. First, the company demonstrated that its second generation wafer-scale engine, WSE-2, was significantly faster than world’s fastest supercomputer, Frontier, in molecular dynamics calculations—the field that underlies protein folding, modeling radiation damage in nuclear reactors, and other problems in material science. Second, in collaboration with machine learning model optimization company Neural Magic, Cerebras demonstrated that a sparse large language model could perform inference at one-third of the energy cost of a full model without losing any accuracy. Although the results are in vastly different fields, they were both possible because of the interconnects and fast memory access enabled by Cerebras’ hardware.

Speeding Through the Molecular World

“Imagine there’s a tailor and he can make a suit in a week,” says Cerebras CEO and co-founder Andrew Feldman. “He buys the neighboring tailor, and she can also make a suit in a week, but they can’t work together. Now, they can now make two suits in a week. But what they can’t do is make a suit in three and a half days.”

According to Feldman, GPUs are like tailors that can’t work together, at least when it comes to some problems in molecular dynamics. As you connect more and more GPUs, they can simulate more atoms at the same time, but they can’t simulate the same number of atoms more quickly.

Cerebras’ wafer-scale engine, however, scales in a fundamentally different way. Because the chips are not limited by interconnect bandwidth, they can communicate quickly, like two tailors collaborating perfectly to make a suit in three and a half days.

“It’s difficult to create materials that have the right properties, that have a long lifetime and sufficient strength and don’t break.” —Tomas Oppelstrup, Lawrence Livermore National Laboratory

To demonstrate this advantage, the team simulated 800,000 atoms interacting with each other, calculating the interactions in increments of one femtosecond at a time. Each step took just microseconds to compute on their hardware. Although that’s still 9 orders of magnitude slower than the actual interactions, it was also 179 times as fast as the Frontier supercomputer. The achievement effectively reduced a year’s worth of computation to just two days.

This work was done in collaboration with Sandia, Lawrence Livermore, and Los Alamos National Laboratories. Tomas Oppelstrup, staff scientist at Lawrence Livermore National Laboratory, says this advance makes it feasible to simulate molecular interactions that were previously inaccessible.

Oppelstrup says this will be particularly useful for understanding the longer-term stability of materials in extreme conditions. “When you build advanced machines that operate at high temperatures, like jet engines, nuclear reactors, or fusion reactors for energy production,” he says, “you need materials that can withstand these high temperatures and very harsh environments. It’s difficult to create materials that have the right properties, that have a long lifetime and sufficient strength and don’t break.” Being able to simulate the behavior of candidate materials for longer, Oppelstrup says, will be crucial to the material design and development process.

Ilya Sharapov, principal engineer at Cerebras, say the company is looking forward to extending applications of its wafer-scale engine to a larger class of problems, including molecular dynamics simulations of biological processes and simulations of airflow around cars or aircrafts.

Downsizing Large Language Models

As large language models (LLMs) are becoming more popular, the energy costs of using them are starting to overshadow the training costs—potentially by as much as a factor of ten in some estimates. “Inference is is the primary workload of AI today because everyone is using ChatGPT,” says James Wang, director of product marketing at Cerebras, “and it’s very expensive to run especially at scale.”

One way to reduce the energy cost (and speed) of inference is through sparsity—essentially, harnessing the power of zeros. LLMs are made up of huge numbers of parameters. The open-source Llama model used by Cerebras, for example, has 7 billion parameters. During inference, each of those parameters is used to crunch through the input data and spit out the output. If, however, a significant fraction of those parameters are zeros, they can be skipped during the calculation, saving both time and energy.

The problem is that skipping specific parameters is a difficult to do on a GPU. Reading from memory on a GPU is relatively slow, because they’re designed to read memory in chunks, which means taking in groups of parameters at a time. This doesn’t allow GPUs to skip zeros that are randomly interspersed in the parameter set. Cerebras CEO Feldman offered another analogy: “It’s equivalent to a shipper, only wanting to move stuff on pallets because they don’t want to examine each box. Memory bandwidth is the ability to examine each box to make sure it’s not empty. If it’s empty, set it aside and then not move it.”

“There’s a million cores in a very tight package, meaning that the cores have very low latency, high bandwidth interactions between them.” —Ilya Sharapov, Cerebras

Some GPUs are equipped for a particular kind of sparsity, called 2:4, where exactly two out of every four consecutively stored parameters are zeros. State-of-the-art GPUs have terabytes per second of memory bandwidth. The memory bandwidth of Cerebras’ WSE-2 is more than one thousand times as high, at 20 petabytes per second. This allows for harnessing unstructured sparsity, meaning the researchers can zero out parameters as needed, wherever in the model they happen to be, and check each one on the fly during a computation. “Our hardware is built right from day one to support unstructured sparsity,” Wang says.

Even with the appropriate hardware, zeroing out many of the model’s parameters results in a worse model. But the joint team from Neural Magic and Cerebras figured out a way to recover the full accuracy of the original model. After slashing 70 percent of the parameters to zero, the team performed two further phases of training to give the non-zero parameters a chance to compensate for the new zeros.

This extra training uses about 7 percent of the original training energy, and the companies found that they recover full model accuracy with this training. The smaller model takes one-third of the time and energy during inference as the original, full model. “What makes these novel applications possible in our hardware,” Sharapov says, “Is that there’s a million cores in a very tight package, meaning that the cores have very low latency, high bandwidth interactions between them.”

Chip Industry Week In Review

President Biden will raise the tariff rate on Chinese semiconductors from 25% to 50% by 2025, among other measures to protect U.S. businesses from China’s trade practices. Also, as part of President Biden’s AI Executive Order, the Administration released steps to protect workers from AI risks, including human oversight of systems and transparency about what systems are being used.

Intel is in advanced talks with Apollo Global Management for the equity firm to provide more than $11 billion to build a fab in Ireland, reported the Wall Street Journal. Also, Intel’s Foundry Services appointed Kevin O’Buckley as the senior vice president and general manager.

Polar is slated to receive up to $120 million in CHIPS Act funding to establish an independent American foundry in Minnesota. The company expects to invest about $525 million in the expansion of the facility over the next two years, with a $75 million investment from the State of Minnesota.

Arm plans to develop AI chips for launch next year, reports Nikkei Asia.

South Korea is planning a support package worth more than 10 trillion won ($7.3 billion) aimed at chip materials, equipment makers, and fabless companies throughout the semiconductor supply chain, according to Reuters.

Quick links to more news:

Global
In-Depth
Markets and Money
Security
Supercomputing
Education and Training
Product News
Research
Events and Further Reading


Global

Edwards opened a new facility in Asan City, South Korea. The 15,000m² factory provides a key production site for abatement systems, and integrated vacuum and abatement systems for semiconductor manufacturing.

France’s courtship with mega-tech is paying off.  Microsoft is investing more than US $4 billion to expand its cloud computing and AI infrastructure, including bringing up to 25,000 advanced GPUs to the country by the end of 2025. The “Choose France” campaign also snagged US $1.3 billion from Amazon for cloud infrastructure expansion, genAI and more.

Toyota, Nissan, and Honda are teaming up on AI and chips for next-gen cars with support from Japan’s Ministry of Economy, Trade and Industry, (METI), reports Nikkei Asia.

Meanwhile, IBM and Honda are collaborating on long-term R&D of next-gen technologies for software-defined vehicles (SDV), including chiplets, brain-inspired computing, and hardware-software co-optimization.

Siemens and Foxconn plan to collaborate on global manufacturing processes in electronics, information and communications technology, and electric vehicles (EV).

TSMC confirmed a Q424 construction start date for its first European plant in Dresden, Germany.

Amazon Web Services (AWS) plans to invest €7.8 billion (~$8.4B) in the AWS European Sovereign Cloud in Germany through 2040. The system is designed to serve public sector organizations and customers in highly regulated industries.


In-Depth

Semiconductor Engineering published its Low Power-High Performance newsletter this week, featuring these stories:

And this week’s Test, Measurement & Analytics newsletter featured these stories:


Markets and Money

The U.S. National Institute of Standards and Technology (NIST) awarded more than $1.2 million to 12 businesses in 8 states under the Small Business Innovation Research (SBIR) Program to fund R&D of products relating to cybersecurity, quantum computing, health care, semiconductor manufacturing, and other critical areas.

Engineering services and consulting company Infosys completed the acquisition of InSemi Technology, a provider of semiconductor design and embedded software development services.

The quantum market, which includes quantum networking and sensors alongside computing, is predicted to grow from $838 million in 2024 to $1.8 billion in 2029, reports Yole.

Shipments of OLED monitors reached about 200,000 units in Q1 2024, a year over year growth of 121%, reports TrendForce.

Global EV sales grew 18% in Q1 2024 with plug-in hybrid electric vehicles (PHEV) sales seeing 46% YoY growth and battery electric vehicle (BEV) sales growing just 7%, according to Counterpoint. China leads global EV sales with 28% YoY growth, while the US grew just 2%. Tesla saw a 9% YoY drop, but topped BEV sales with a 19% market share. BYD grew 13% YoY and exported about 100,000 EVs with 152% YoY growth, mainly in Southeast Asia.

DeepX raised $80.5 million in Series C funding for its on-device NPU IP and AI SoCs tailored for applications including physical security, robotics, and mobility.

MetisX raised $44 million in Series A funding for its memory solutions built on Compute Express Link (CXL) for accelerating large-scale data processing applications.


Security

While security experts have been warning of a growing threat in electronics for decades, there have been several recent fundamental changes that elevate the risk.

Synopsys and the Ponemon Institute released a report showing 54% of surveyed organizations suffered a software supply chain attack in the past year and 20% were not effective in their response. And 52% said their development teams use AI tools to generate code, but only 32% have processes to evaluate it for license, security, and quality risks.

Researchers at Ruhr University Bochum and TU Darmstadt presented a solution for the automated generation of fault-resistant circuits (AGEFA) and assessed the security of examples generated by AGEFA against side-channel analysis and fault injection.

TXOne reported on operational technology security and the most effective method for preventing production interruptions caused by cyber-attacks.

CrowdStrike and NVIDIA are collaborating to accelerate the use of analytics and AI in cybersecurity to help security teams combat modern cyberattacks, including AI-powered threats.

The National Institute of Standards and Technology (NIST) finalized its guidelines for protecting sensitive data, known as controlled unclassified information, aimed at organizations that do business with the federal government.

The Defense Advanced Research Projects Agency (DARPA) awarded BAE Systems a $12 million contract to solve thermal challenges limiting electronic warfare systems, particularly in GaN transistors.

Sigma Defense won a $4.7 million contract from the U.S. Army for an AI-powered virtual training environment, partnering with Brightline Interactive on a system that uses spatial computing and augmented intelligence workflows.

SkyWater’s advanced packaging operation in Florida has been accredited as a Category 1A Trusted Supplier by the Defense Microelectronics Activity (DMEA) of the U.S. Department of Defense (DoD).

Videos of two CWE-focused sessions from CVE/FIRST VulnCon 2024 were made available on the CWE YouTube Channel.

The Cybersecurity and Infrastructure Security Agency (CISA) issued a number of alerts/advisories.


Supercomputing

Supercomputers are battling for top dog.

The Frontier supercomputer at Oak Ridge National Laboratory (ORNL) retained the top spot on the Top500 list of the world’s fastest systems with an HPL score of 1.206 EFlop/s. The as-yet incomplete Aurora system at Argonne took second place, becoming the world’s second exascale system at 1.012 EFlop/s. The Green500 list, which tracks energy efficiency of compute, saw three new entrants take the top places.

Cerebras Systems, Sandia National Laboratory, Lawrence Livermore National Laboratory, and Los Alamos National Laboratory used Cerebras’ second generation Wafer Scale Engine to perform atomic scale molecular dynamics simulations at the millisecond scale, which they claim is 179X faster than the Frontier supercomputer.

UT Austin‘s Stampede3 Supercomputer is now in full production, serving the open science community through 2029.


Education and Training

SEMI announced the SEMI University Semiconductor Certification Programs to help alleviate the workforce skills gap. Its first two online courses are designed for new talent seeking careers in the industry, and experienced workers looking to keep their skills current.  Also, SEMI and other partners launched a European Chip Skills Academy Summer School in Italy.

Siemens created an industry credential program for engineering students that supplements a formal degree by validating industry knowledge and skills. Nonprofit agency ABET will provide accreditation. The first two courses are live at the University of Colorado Boulder (CU Boulder) and a series is planned with Pennsylvania State University (Penn State).

Syracuse University launched a $20 million Center for Advanced Semiconductor Manufacturing, with co-funding from Onondaga County.

Starting young is a good thing.  An Arizona school district, along with the University Of Arizona,  is creating a semiconductor program for high schoolers.


Product News

Siemens and Sony partnered to enable immersive engineering via a spatial content creation system, NX Immersive Designer, which includes Sony’s XR head-mounted display. The integration of hardware and software gives designers and engineers natural ways to interact with a digital twin. Siemens also extended its Xcelerator as a Service portfolio with solutions for product engineering and lifecycle management, cloud-based high-performance simulation, and manufacturing operations management. It will be available on Microsoft Azure, as well.

Advantest announced the newest addition to its portfolio of power supplies for the V93000 EXA Scale SoC test platform. The DC Scale XHC32 power supply offers 32 channels with single-instrument total current of up to 640A.

Fig. 1: Advantest’s DC Scale XHC32. Source: Advantest

Infineon released its XENSIV TLE49SR angle sensors, which can withstand stray magnetic fields of up to 8 mT, ideal for applications of safety-critical automotive chassis systems.

Google debuted its sixth generation Cloud TPU, 4.7X faster and 67% more energy-efficient than the previous generation, with double the high-bandwidth memory.

X-Silicon uncorked a RISC-V vector CPU, coupled with a Vulkan-enabled GPU ISA and AI/ML acceleration in a single processor core, aimed at embedded and IoT applications.

IBM expanded its Qiskit quantum software stack, including the stable release of its SDK for building, optimizing, and visualizing quantum circuits.

Northeastern University announced the general availability of testing and integration solutions for Open RAN through the Open6G Open Testing and Integration Center (Open 6G OTIC).


Research

The University of Glasgow received £3 million (~$3.8M) from the Engineering and Physical Sciences Research Council (EPSRC)’s Strategic Equipment Grant scheme to help establish “Analogue,” an Automated Nano Analysing, Characterisation and Additive Packaging Suite to research silicon chip integration and packaging.

EPFL researchers developed scalable photonic ICs, based on lithium tantalate.

DISCO developed a way to increase the diameter of diamond wafers that uses the KABRA process, a laser ingot slicing method.

CEA-Leti developed two complementary approaches for high performance photon detectors — a mercury cadmium telluride-based avalanche photodetector and a superconducting single photon detector.

Toshiba demonstrated storage capacities of over 30TB with two next-gen large capacity recording technologies for hard disk drives (HDDs): Heat Assisted Magnetic Recording (HAMR) and Microwave Assisted Magnetic Recording (MAMR).

Caltech neuroscientists reported that their brain-machine interface (BMI) worked successfully in a second human patient, following 2022’s first instance, proving the device is not dependent on one particular brain or one location in a brain.

Linköping University researchers developed a cheap, sustainable battery made from zinc and lignin, while ORNL researchers developed carbon-capture batteries.


Events and Further Reading

Find upcoming chip industry events here, including:

Event Date Location
European Test Symposium May 20 – 24 The Hague, Netherlands
NI Connect Austin 2024 May 20 – 22 Austin, Texas
ITF World 2024 (imec) May 21 – 22 Antwerp, Belgium
Embedded Vision Summit May 21 – 23 Santa Clara, CA
ASIP Virtual Seminar 2024 May 22 Online
Electronic Components and Technology Conference (ECTC) 2024 May 28 – 31 Denver, Colorado
Hardwear.io Security Trainings and Conference USA 2024 May 28 – Jun 1 Santa Clara, CA
SW Test Jun 3 – 5 Carlsbad, CA
IITC2024: Interconnect Technology Conference Jun 3 – 6 San Jose, CA
VOICE Developer Conference Jun 3 – 5 La Jolla, CA
CHIPS R&D Standardization Readiness Level Workshop Jun 4 – 5 Online and Boulder, CO
Find All Upcoming Events Here

Upcoming webinars are here.


Semiconductor Engineering’s latest newsletters:

Automotive, Security and Pervasive Computing
Systems and Design
Low Power-High Performance
Test, Measurement and Analytics
Manufacturing, Packaging and Materials

 

The post Chip Industry Week In Review appeared first on Semiconductor Engineering.

Expect a Wave of Wafer-Scale Computers



At TSMC’s North American Technology Symposium on Wednesday, the company detailed both its semiconductor technology and chip-packaging technology road maps. While the former is key to keeping the traditional part of Moore’s Law going, the latter could accelerate a trend toward processors made from more and more silicon, leading quickly to systems the size of a full silicon wafer. Such a system, Tesla’s next generation Dojo training tile is already in production, TSMC says. And in 2027 the foundry plans to offer technology for more complex wafer-scale systems than Tesla’s that could deliver 40 times as much computing power as today’s systems.

For decades chipmakers increased the density of logic on processors largely by scaling down the area that transistors take up and the size of interconnects. But that scheme has been running out of steam for a while now. Instead, the industry is turning to advanced packaging technology that allows a single processor to be made from a larger amount of silicon. The size of a single chip is hemmed in by the largest pattern that lithography equipment can make. Called the reticle limit, that’s currently about 800 square millimeters. So if you want more silicon in your GPU you need to make it from two or more dies. The key is connecting those dies so that signals can go from one to the other as quickly and with as little energy as if they were all one big piece of silicon.

TSMC already makes a wafer-size AI accelerator for Cerebras, but that arrangement appears to be unique and is different from what TSMC is now offering with what it calls System-on-Wafer.

In 2027, you will get a full-wafer integration that delivers 40 times as much compute power, more than 40 reticles’ worth of silicon, and room for more than 60 high-bandwidth memory chips, TSMC predicts

For Cerebras, TSMC makes a wafer full of identical arrays of AI cores that are smaller than the reticle limit. It connects these arrays across the “scribe lines,” the areas between dies that are usually left blank, so the wafer can be diced up into chips. No chipmaking process is perfect, so there are always flawed parts on every wafer. But Cerebras designed in enough redundancy that it doesn’t matter to the finished computer.

However, with its first round of System-on-Wafer, TSMC is offering a different solution to the problems of both reticle limit and yield. It starts with already tested logic dies to minimize defects. (Tesla’s Dojo contains a 5-by-5 grid of pretested processors.) These are placed on a carrier wafer, and the blank spots between the dies are filled in. Then a layer of high-density interconnects is constructed to connect the logic using TSMC’s integrated fan-out technology. The aim is to make data bandwidth among the dies so high that they effectively act like a single large chip.

By 2027, TSMC plans to offer wafer-scale integration based on its more advanced packaging technology, chip-on-wafer-on-substrate (CoWoS). In that technology, pretested logic and, importantly, high-bandwidth memory, is attached to a silicon substrate that’s been patterned with high-density interconnects and shot through with vertical connections called through-silicon vias. The attached logic chips can also take advantage of the company’s 3D-chip technology called system-on-integrated chips (SoIC).

The wafer-scale version of CoWoS is the logical endpoint of an expansion of the packaging technology that’s already visible in top-end GPUs. Nvidia’s next GPU, Blackwell, uses CoWos to integrate more than 3 reticle sizes’ worth of silicon, including 8 high-bandwidth memory (HBM) chips. By 2026, the company plans to expand that to 5.5 reticles, including 12 HBMs. TSMC says that would translate to more than 3.5 times as much compute power as its 2023 tech allows. But in 2027, you can get a full wafer integration that delivers 40 times as much compute, more than 40 reticles’ worth of silicon and room for more than 60 HBMs, TSMC predicts.

What Wafer Scale Is Good For

The 2027 version of system-on-wafer somewhat resembles technology called Silicon-Interconnect Fabric, or Si-IF, developed at UCLA more than five years ago. The team behind SiIF includes electrical and computer-engineering professor Puneet Gupta and IEEE Fellow Subramanian Iyer, who is now charged with implementing the packaging portion of the United States’ CHIPS Act.

Since then, they’ve been working to make the interconnects on the wafer more dense and to add other features to the technology. “If you want this as a full technology infrastructure, it needs to do many other things beyond just providing fine-pitch connectivity,” says Gupta, also an IEEE Fellow. “One of the biggest pain points for these large systems is going to be delivering power.” So the UCLA team is working on ways to add good-quality capacitors and inductors to the silicon substrate and integrating gallium nitride power transistors.

AI training is the obvious first application for wafer-scale technology, but it is not the only one, and it may not even be the best, says University of Illinois Urbana-Champaign computer architect and IEEE Fellow Rakesh Kumar. At the International Symposium on Computer Architecture in June, his team is presenting a design for a wafer-scale network switch for data centers. Such a system could cut the number of advanced network switches in a very large—16,000-rack—data center from 4,608 to just 48, the researchers report. A much smaller, enterprise-scale, data center for say 8,000 servers could get by using a single wafer-scale switch.

❌