IEEE Spectrum
Ansys SimAI Software Predicts Fully Transient Vehicle Crash OutcomesAnsys
The Ansys SimAI™ cloud-enabled generative artificial intelligence (AI) platform combines the predictive accuracy of Ansys simulation with the speed of generative AI. Because of the software’s versatile underlying neural networks, it can extend to many types of simulation, including structural applications. This white paper shows how the SimAI cloud-based software applies to highly nonlinear, transient structural simulations, such as automobile crashes, and includes: Vehicle kinematics and d
20. Srpen 2024 v 20:09

Ansys SimAI Software Predicts Fully Transient Vehicle Crash Outcomes

Od: Ansys

20. Srpen 2024 v 20:09

The Ansys SimAI™ cloud-enabled generative artificial intelligence (AI) platform combines the predictive accuracy of Ansys simulation with the speed of generative AI. Because of the software’s versatile underlying neural networks, it can extend to many types of simulation, including structural applications.
This white paper shows how the SimAI cloud-based software applies to highly nonlinear, transient structural simulations, such as automobile crashes, and includes:

Vehicle kinematics and deformation
Forces acting upon the vehicle
How it interacts with its environment
How understanding the changing and rapid sequence of events helps predict outcomes

These simulations can reduce the potential for occupant injuries and the severity of vehicle damage and help understand the crash’s overall dynamics. Ultimately, this leads to safer automotive design.

Download this free whitepaper now!

IEEE Spectrum
A New Type of Neural Network Is More InterpretableMatthew Hutson
Artificial neural networks—algorithms inspired by biological brains—are at the center of modern artificial intelligence, behind both chatbots and image generators. But with their many neurons, they can be black boxes, their inner workings uninterpretable to users. Researchers have now created a fundamentally new way to make neural networks that in some ways surpasses traditional systems. These new networks are more interpretable and also more accurate, proponents say, even when they’re smaller.
5. Srpen 2024 v 17:00

A New Type of Neural Network Is More Interpretable

IEEE Spectrum

Od: Matthew Hutson

5. Srpen 2024 v 17:00

Artificial neural networks—algorithms inspired by biological brains—are at the center of modern artificial intelligence, behind both chatbots and image generators. But with their many neurons, they can be black boxes, their inner workings uninterpretable to users.

Researchers have now created a fundamentally new way to make neural networks that in some ways surpasses traditional systems. These new networks are more interpretable and also more accurate, proponents say, even when they’re smaller. Their developers say the way they learn to represent physics data concisely could help scientists uncover new laws of nature.

“It’s great to see that there is a new architecture on the table.” —Brice Ménard, Johns Hopkins University

For the past decade or more, engineers have mostly tweaked neural-network designs through trial and error, says Brice Ménard, a physicist at Johns Hopkins University who studies how neural networks operate but was not involved in the new work, which was posted on arXiv in April. “It’s great to see that there is a new architecture on the table,” he says, especially one designed from first principles.

One way to think of neural networks is by analogy with neurons, or nodes, and synapses, or connections between those nodes. In traditional neural networks, called multi-layer perceptrons (MLPs), each synapse learns a weight—a number that determines how strong the connection is between those two neurons. The neurons are arranged in layers, such that a neuron from one layer takes input signals from the neurons in the previous layer, weighted by the strength of their synaptic connection. Each neuron then applies a simple function to the sum total of its inputs, called an activation function.

black text on a white background with red and blue lines connecting on the left and black lines connecting on the right In traditional neural networks, sometimes called multi-layer perceptrons [left], each synapse learns a number called a weight, and each neuron applies a simple function to the sum of its inputs. In the new Kolmogorov-Arnold architecture [right], each synapse learns a function, and the neurons sum the outputs of those functions.The NSF Institute for Artificial Intelligence and Fundamental Interactions

In the new architecture, the synapses play a more complex role. Instead of simply learning how strong the connection between two neurons is, they learn the full nature of that connection—the function that maps input to output. Unlike the activation function used by neurons in the traditional architecture, this function could be more complex—in fact a “spline” or combination of several functions—and is different in each instance. Neurons, on the other hand, become simpler—they just sum the outputs of all their preceding synapses. The new networks are called Kolmogorov-Arnold Networks (KANs), after two mathematicians who studied how functions could be combined. The idea is that KANs would provide greater flexibility when learning to represent data, while using fewer learned parameters.

“It’s like an alien life that looks at things from a different perspective but is also kind of understandable to humans.” —Ziming Liu, Massachusetts Institute of Technology

The researchers tested their KANs on relatively simple scientific tasks. In some experiments, they took simple physical laws, such as the velocity with which two relativistic-speed objects pass each other. They used these equations to generate input-output data points, then, for each physics function, trained a network on some of the data and tested it on the rest. They found that increasing the size of KANs improves their performance at a faster rate than increasing the size of MLPs did. When solving partial differential equations, a KAN was 100 times as accurate as an MLP that had 100 times as many parameters.

In another experiment, they trained networks to predict one attribute of topological knots, called their signature, based on other attributes of the knots. An MLP achieved 78 percent test accuracy using about 300,000 parameters, while a KAN achieved 81.6 percent test accuracy using only about 200 parameters.

What’s more, the researchers could visually map out the KANs and look at the shapes of the activation functions, as well as the importance of each connection. Either manually or automatically they could prune weak connections and replace some activation functions with simpler ones, like sine or exponential functions. Then they could summarize the entire KAN in an intuitive one-line function (including all the component activation functions), in some cases perfectly reconstructing the physics function that created the dataset.

“In the future, we hope that it can be a useful tool for everyday scientific research,” says Ziming Liu, a computer scientist at the Massachusetts Institute of Technology and the paper’s first author. “Given a dataset we don’t know how to interpret, we just throw it to a KAN, and it can generate some hypothesis for you. You just stare at the brain [the KAN diagram] and you can even perform surgery on that if you want.” You might get a tidy function. “It’s like an alien life that looks at things from a different perspective but is also kind of understandable to humans.”

Dozens of papers have already cited the KAN preprint. “It seemed very exciting the moment that I saw it,” says Alexander Bodner, an undergraduate student of computer science at the University of San Andrés, in Argentina. Within a week, he and three classmates had combined KANs with convolutional neural networks, or CNNs, a popular architecture for processing images. They tested their Convolutional KANs on their ability to categorize handwritten digits or pieces of clothing. The best one approximately matched the performance of a traditional CNN (99 percent accuracy for both networks on digits, 90 percent for both on clothing) but using about 60 percent fewer parameters. The datasets were simple, but Bodner says other teams with more computing power have begun scaling up the networks. Other people are combining KANs with transformers, an architecture popular in large language models.

One downside of KANs is that they take longer per parameter to train—in part because they can’t take advantage of GPUs. But they need fewer parameters. Liu notes that even if KANs don’t replace giant CNNs and transformers for processing images and language, training time won’t be an issue at the smaller scale of many physics problems. He’s looking at ways for experts to insert their prior knowledge into KANs—by manually choosing activation functions, say—and to easily extract knowledge from them using a simple interface. Someday, he says, KANs could help physicists discover high-temperature superconductors or ways to control nuclear fusion.

IEEE Spectrum
Vodafone Launches Private 5G Tech to Compete With Wi-FiMargo Anderson
As the world’s 5G rollout continues with its predictable fits and starts, the cellular technology is also starting to move into a space already dominated by another wireless tech: Wi-Fi. Private 5G networks—in which a person or company sets up their own facility-wide cellular network—are today finding applications where Wi-Fi was once the only viable game in town. This month, the Newbury, England–based telecom company Vodafone is releasing a Raspberry Pi–based private 5G base station that it is
20. Červen 2024 v 17:01

Vodafone Launches Private 5G Tech to Compete With Wi-Fi

IEEE Spectrum

Od: Margo Anderson

20. Červen 2024 v 17:01

As the world’s 5G rollout continues with its predictable fits and starts, the cellular technology is also starting to move into a space already dominated by another wireless tech: Wi-Fi. Private 5G networks—in which a person or company sets up their own facility-wide cellular network—are today finding applications where Wi-Fi was once the only viable game in town. This month, the Newbury, England–based telecom company Vodafone is releasing a Raspberry Pi–based private 5G base station that it is now being aimed at developers, who might then jump-start a wave of private 5G innovation.

“The Raspberry Pi is the most affordable CPU[-based] computer that you can get,” says Santiago Tenorio, network architecture director at Vodafone. “Which means that what we build, in essence, has a similar bill of materials as a good quality Wi-Fi router.”

The company has teamed with the Surrey, England–based Lime Microsystems to release a crowd-funded range of private 5G base-station kits ranging in price from US $800 to $12,000.

“In a Raspberry Pi—in this case, a Raspberry Pi 4 is what we used—then you can be sure you can run that anywhere, because it’s the tiniest processor that you can have,” Tenorio says.

a person holding a black box in their hand Santiago Tenorio holds one of Lime Microsystems’ private 5G base-station kits.Vodafone

Private 5G for Drones and Bakeries

There are a range of reasons, Tenorio says, why someone might want their own private 5G network. At the moment, the scenarios mostly concern companies and organizations—although individual expert users could still be drawn to, for instance, 5G’s relatively low latency and network flexibility.

Tenorio highlighted security and mobility as two big selling points for private 5G.

A commercial storefront business, for instance, might be attracted to the extra security protections that a SIM card can provide compared to password-based wireless network security. Because each SIM card contains its own unique identifier and encryption keys, thereby also enabling a network to be able to recognize and authorize each individual connection, Tenorio says private 5G network security is a considerable selling point.

Plus, Tenorio says, it’s simpler for customers to access. Envisioning a use case of a bakery with its own privately deployed 5G network, he says, “You don’t need a password. You don’t need a conversation [with a clerk behind a counter] or a QR code. You simply walk into the bakery, and you are into the bakery’s network.”

As to mobility, Tenorio suggests one emergency relief and rescue application that might rely on the presence of a nearby 5G station that causes devices in its range to ping.

Setting up a private 5G base station on a drone, Tenorio says, would enable that drone to fly over a disaster area and, via its airborne network, send a challenge signal to all devices in its coverage area to report in. Any device receiving that signal with a compatible SIM card then responds with its unique identification information.

“Then any phone would try to register,” Tenorio says. “And then you would know if there is someone [there].”

Not only that, but because the ping would be from a device with a SIM card, the private 5G rescue drone in the above scenario could potentially provide crucial information about each individual, just based on the device’s identifier alone. And that user-identifying feature of private 5G isn’t exactly irrelevant to a bakery owner—or to any other commercial customer—either, Tenorio says.

“If you are a bakery,” he says, “You could actually know who your customers are, because anyone walking into the bakery would register on your network and would leave their [International Mobile Subscriber Identity].”

Winning the Lag Race

According to Christian Wietfeld, professor of electrical engineering at the Technical University of Dortmund in Germany, private 5G networks also bring the possibility of less lag. His team has tested private 5G deployments—although, Wietfeld says that they haven’t yet tested the present Vodafone/Lime Microsystem base station—and have found private 5G to provide reliably better connectivity.

Wietfeld’s team will present their research at the IEEE International Symposium on Personal, Indoor and Mobile Radio Communications in September in Valencia, Spain. They found that private 5G can deliver connections up to 10 times as fast as connections in networks with high loads, compared to Wi-Fi (the IEEE 802.11 wireless standard).

“The additional cost and effort to operate a private 5G network pays off in lower downtimes of production and less delays in delivery of goods,” Wietfeld says. “Also, for safety-critical use cases such as on-campus teleoperated driving, private 5G networks provide the necessary reliability and predictability of performance.”

For Lime Networks, according to the company’s CEO and founder Ebrahim Bushehri, the challenge comes in developing a private 5G base station that maximized versatility and openness to whatever kinds of applications developers could envision—while still being reasonably inexpensive and retaining a low-power envelope.

“The solution had to be ultraportable and with an optional battery pack which could be mounted on drones and autonomous robots, for remote and tactical deployments, such as emergency-response scenarios and temporary events,” Bushehri says.

Meanwhile, the crowdfunding behind the device’s rollout, via the website Crowd Supply, allows both companies to keep tabs on the kinds of applications the developer community is envisioning for this technology, he says.

“Crowdfunding,” Bushehri says, “Is one of the key indicators of community interest and engagement. Hence the reason for launching the campaign on Crowd Supply to get feedback from early adopters.”

IEEE Spectrum
Nvidia Conquers Latest AI TestsSamuel K. Moore
For years, Nvidia has dominated many machine learning benchmarks, and now there are two more notches in its belt. MLPerf, the AI benchmarking suite sometimes called “the Olympics of machine learning,” has released a new set of training tests to help make more and better apples-to-apples comparisons between competing computer systems. One of MLPerf’s new tests concerns fine-tuning of large language models, a process that takes an existing trained model and trains it a bit more with specialized
12. Červen 2024 v 17:00

Nvidia Conquers Latest AI Tests

IEEE Spectrum

Od: Samuel K. Moore

12. Červen 2024 v 17:00

For years, Nvidia has dominated many machine learning benchmarks, and now there are two more notches in its belt.

MLPerf, the AI benchmarking suite sometimes called “the Olympics of machine learning,” has released a new set of training tests to help make more and better apples-to-apples comparisons between competing computer systems. One of MLPerf’s new tests concerns fine-tuning of large language models, a process that takes an existing trained model and trains it a bit more with specialized knowledge to make it fit for a particular purpose. The other is for graph neural networks, a type of machine learning behind some literature databases, fraud detection in financial systems, and social networks.

Even with the additions and the participation of computers using Google’s and Intel’s AI accelerators, systems powered by Nvidia’s Hopper architecture dominated the results once again. One system that included 11,616 Nvidia H100 GPUs—the largest collection yet—topped each of the nine benchmarks, setting records in five of them (including the two new benchmarks).

“If you just throw hardware at the problem, it’s not a given that you’re going to improve.” —Dave Salvator, Nvidia

The 11,616-H100 system is “the biggest we’ve ever done,” says Dave Salvator, director of accelerated computing products at Nvidia. It smashed through the GPT-3 training trial in less than 3.5 minutes. A 512-GPU system, for comparison, took about 51 minutes. (Note that the GPT-3 task is not a full training, which could take weeks and cost millions of dollars. Instead, the computers train on a representative portion of the data, at an agreed-upon point well before completion.)

Compared to Nvidia’s largest entrant on GPT-3 last year, a 3,584 H100 computer, the 3.5-minute result represents a 3.2-fold improvement. You might expect that just from the difference in the size of these systems, but in AI computing that isn’t always the case, explains Salvator. “If you just throw hardware at the problem, it’s not a given that you’re going to improve,” he says.

“We are getting essentially linear scaling,” says Salvator. By that he means that twice as many GPUs lead to a halved training time. “[That] represents a great achievement from our engineering teams,” he adds.

Competitors are also getting closer to linear scaling. This round Intel deployed a system using 1,024 GPUs that performed the GPT-3 task in 67 minutes versus a computer one-fourth the size that took 224 minutes six months ago. Google’s largest GPT-3 entry used 12-times the number of TPU v5p accelerators as its smallest entry and performed its task nine times as fast.

Linear scaling is going to be particularly important for upcoming “AI factories” housing 100,000 GPUs or more, Salvator says. He says to expect one such data center to come online this year, and another, using Nvidia’s next architecture, Blackwell, to startup in 2025.

Nvidia’s streak continues

Nvidia continued to boost training times despite using the same architecture, Hopper, as it did in last year’s training results. That’s all down to software improvements, says Salvator. “Typically, we’ll get a 2-2.5x [boost] from software after a new architecture is released,” he says.

For GPT-3 training, Nvidia logged a 27 percent improvement from the June 2023 MLPerf benchmarks. Salvator says there were several software changes behind the boost. For example, Nvidia engineers tuned up Hopper’s use of less accurate, 8-bit floating point operations by trimming unnecessary conversions between 8-bit and 16-bit numbers and better targeting of which layers of a neural network could use the lower precision number format. They also found a more intelligent way to adjust the power budget of each chip’s compute engines, and sped communication among GPUs in a way that Salvator likened to “buttering your toast while it’s still in the toaster.”

Additionally, the company implemented a scheme called flash attention. Invented in the Stanford University laboratory of Samba Nova founder Chris Re, flash attention is an algorithm that speeds transformer networks by minimizing writes to memory. When it first showed up in MLPerf benchmarks, flash attention shaved as much as 10 percent from training times. (Intel, too, used a version of flash attention but not for GPT-3. It instead used the algorithm for one of the new benchmarks, fine-tuning.)

Using other software and network tricks, Nvidia delivered an 80 percent speedup in the text-to-image test, Stable Diffusion, versus its submission in November 2023.

New benchmarks

MLPerf adds new benchmarks and upgrades old ones to stay relevant to what’s happening in the AI industry. This year saw the addition of fine-tuning and graph neural networks.

Fine tuning takes an already trained LLM and specializes it for use in a particular field. Nvidia, for example took a trained 43-billion-parameter model and trained it on the GPU-maker’s design files and documentation to create ChipNeMo, an AI intended to boost the productivity of its chip designers. At the time, the company’s chief technology officer Bill Dally said that training an LLM was like giving it a liberal arts education, and fine tuning was like sending it to graduate school.

The MLPerf benchmark takes a pretrained Llama-2-70B model and asks the system to fine tune it using a dataset of government documents with the goal of generating more accurate document summaries.

There are several ways to do fine-tuning. MLPerf chose one called low-rank adaptation (LoRA). The method winds up training only a small portion of the LLM’s parameters leading to a 3-fold lower burden on hardware and reduced use of memory and storage versus other methods, according to the organization.

The other new benchmark involved a graph neural network (GNN). These are for problems that can be represented by a very large set of interconnected nodes, such as a social network or a recommender system. Compared to other AI tasks, GNNs require a lot of communication between nodes in a computer.

The benchmark trained a GNN on a database that shows relationships about academic authors, papers, and institutes—a graph with 547 million nodes and 5.8 billion edges. The neural network was then trained to predict the right label for each node in the graph.

Future fights

Training rounds in 2025 may see head-to-head contests comparing new accelerators from AMD, Intel, and Nvidia. AMD’s MI300 series was launched about six months ago, and a memory-boosted upgrade the MI325x is planned for the end of 2024, with the next generation MI350 slated for 2025. Intel says its Gaudi 3, generally available to computer makers later this year, will appear in MLPerf’s upcoming inferencing benchmarks. Intel executives have said the new chip has the capacity to beat H100 at training LLMs. But the victory may be short-lived, as Nvidia has unveiled a new architecture, Blackwell, which is planned for late this year.

Semiconductor Engineering
Efficient TNN Inference on RISC-V Processing Cores With Minimal HW OverheadTechnical Paper Link
A new technical paper titled “xTern: Energy-Efficient Ternary Neural Network Inference on RISC-V-Based Edge Systems” was published by researchers at ETH Zurich and Universita di Bologna. Abstract “Ternary neural networks (TNNs) offer a superior accuracy-energy trade-off compared to binary neural networks. However, until now, they have required specialized accelerators to realize their efficiency potential, which has hindered widespread adoption. To address this, we present xTern, a lightweight e
11. Červen 2024 v 02:28

Efficient TNN Inference on RISC-V Processing Cores With Minimal HW Overhead

Semiconductor Engineering

Od: Technical Paper Link

11. Červen 2024 v 02:28

A new technical paper titled “xTern: Energy-Efficient Ternary Neural Network Inference on RISC-V-Based Edge Systems” was published by researchers at ETH Zurich and Universita di Bologna.

Abstract
“Ternary neural networks (TNNs) offer a superior accuracy-energy trade-off compared to binary neural networks. However, until now, they have required specialized accelerators to realize their efficiency potential, which has hindered widespread adoption. To address this, we present xTern, a lightweight extension of the RISC-V instruction set architecture (ISA) targeted at accelerating TNN inference on general-purpose cores. To complement the ISA extension, we developed a set of optimized kernels leveraging xTern, achieving 67% higher throughput than their 2-bit equivalents. Power consumption is only marginally increased by 5.2%, resulting in an energy efficiency improvement by 57.1%. We demonstrate that the proposed xTern extension, integrated into an octa-core compute cluster, incurs a minimal silicon area overhead of 0.9% with no impact on timing. In end-to-end benchmarks, we demonstrate that xTern enables the deployment of TNNs achieving up to 1.6 percentage points higher CIFAR-10 classification accuracy than 2-bit networks at equal inference latency. Our results show that xTern enables RISC-V-based ultra-low-power edge AI platforms to benefit from the efficiency potential of TNNs.”

Find the technical paper here. Published May 2024.

Rutishauser, Georg, Joan Mihali, Moritz Scherer, and Luca Benini. “xTern: Energy-Efficient Ternary Neural Network Inference on RISC-V-Based Edge Systems.” arXiv preprint arXiv:2405.19065 (2024).

The post Efficient TNN Inference on RISC-V Processing Cores With Minimal HW Overhead appeared first on Semiconductor Engineering.

IEEE Spectrum
This Japanese Aircraft Became a 5G Base StationTim Hornyak
Skies over Tokyo are thick with air traffic these days amid an influx of international tourists. But one plane recently helped revive the dream of airborne Internet access for all. Researchers in Japan announced on 28 May that they have successfully tested 5G communications equipment in the 38 gigahertz band from an altitude of 4 kilometers.The experiment was aimed at developing an aerial relay backhaul with millimeter-wave band links between ground stations and a simulated High-Altitude Platfor
6. Červen 2024 v 16:51

This Japanese Aircraft Became a 5G Base Station

IEEE Spectrum

Od: Tim Hornyak

6. Červen 2024 v 16:51

Skies over Tokyo are thick with air traffic these days amid an influx of international tourists. But one plane recently helped revive the dream of airborne Internet access for all. Researchers in Japan announced on 28 May that they have successfully tested 5G communications equipment in the 38 gigahertz band from an altitude of 4 kilometers.

The experiment was aimed at developing an aerial relay backhaul with millimeter-wave band links between ground stations and a simulated High-Altitude Platform Station (HAPS), a radio station aboard an uncrewed aircraft that stays aloft in the stratosphere for extended periods of time. A Cessna flying out of Chofu Airfield in western Tokyo was outfitted with a 38 GHz 5G base station and core network device, and three ground stations were equipped with lens antennas with automatic tracking.

With the Cessna as a relay station, the setup enabled communication between one ground station connected to the 5G terrestrial network and a terrestrial base station connected to a user terminal, according to a consortium of Japanese companies and the National Institute of Information and Communications Technology.

“We developed technology that enables communication using 5G [New Radio] by correctly directing 38 GHz beams toward three ground stations while adapting to the flight attitude, speed, direction, position, altitude, etc. during aircraft rotation,” said Shinichi Tanaka, a manager in broadcaster SKY Perfect JSAT’s Space Business Division. “We confirmed that the onboard system, designed for the stratosphere, has adequate communication and tracking performance even under the flight speed and attitude fluctuations of a Cessna aircraft, which are more severe than those of HAPS.”

The sharpest beam width of the ground station antenna is 0.8 degrees, and the trial demonstrated a tracking method that always captures the Cessna in this angular range, Tanaka added.

A diagram with photos shows Cessna in the air alongside a photo of the onboard antenna, as well as a ground station consisting of a platform with antennas. A Cessna [top left] carried a 38 GHz antenna [top right] during a flight, functioning as a 5G base station for receivers on the ground [bottom right]. The plane was able to connect to multiple ground stations at once [illustration, bottom left].NTT Docomo

Millimeter wave bands, such as the 38 GHz band, have the highest data capacity for 5G and are suited for crowded venues such as stadiums and shopping centers. When used outdoors, however, the signals can be attenuated by rain and other moisture in the atmosphere. To counter this, the consortium successfully tested an algorithm that automatically switches between multiple ground stations to compensate for moisture-weakened signals.

Unlike Google’s failed Loon effort, which focused on providing direct communication to user terminals, the HAPS trial is aimed at creating backhaul lines for base stations. Led by Japan’s Ministry of Internal Affairs and Communications, the experiment is designed to deliver high-speed, high-capacity communications both for the development of 5G and 6G networks as well as emergency response. The latter is critical in disaster-prone Japan—in January, communication lines around the Noto Peninsula on the Sea of Japan were severed following a magnitude-7 earthquake that caused over 1,500 casualties.

“This is the world’s first successful 5G communication experiment via the sky using the Q-band frequency,” said Hinata Kohara, a researcher with mobile carrier NTT Docomo’s 6G Network Innovation Department. “In addition, the use of 5G communication base stations and core network equipment on the aircraft for communication among multiple ground stations enables flexible and fast route switching of the ground [gateway] station for a feeder link, and is robust against propagation characteristics such as rainfall. Another key feature is the use of a full digital beamforming method for beam control, which uses multiple independent beams to improve frequency utilization efficiency.”

Doppler shift compensation was a challenge in the experiment, Kohara said, adding that the researchers will conduct further tests to find a solution with the aim of commercializing a HAPS service in 2026. Aside from SKY Perfect JSAT and NTT Docomo, the consortium includes Panasonic Holdings, known for its electronics equipment.

The HAPS push comes as NTT Docomo announced it has led another consortium in a $100 million investment in Airbus’ AALTO HAPS, operator of the Zephyr fixed-wing uncrewed aerial vehicle. The solar-powered wing can be used for 5G direct-to-device communications or Earth observation, and has set records including 64 days of stratospheric flight. According to Airbus, it has a reach of “up to 250 terrestrial towers in difficult mountainous terrain.” Docomo said the investment is aimed at commercializing Zephyr services in Japan, including coverage of rural areas and disaster zones, and around the world in 2026.

IEEE Spectrum
Will Scaling Solve Robotics?Nishanth J. Kumar
This post was originally published on the author’s personal blog. Last year’s Conference on Robot Learning (CoRL) was the biggest CoRL yet, with over 900 attendees, 11 workshops, and almost 200 accepted papers. While there were a lot of cool new ideas (see this great set of notes for an overview of technical content), one particular debate seemed to be front and center: Is training a large neural network on a very large dataset a feasible way to solve robotics?1 Of course, some version of
28. Květen 2024 v 12:00

Will Scaling Solve Robotics?

IEEE Spectrum

Od: Nishanth J. Kumar

28. Květen 2024 v 12:00

This post was originally published on the author’s personal blog.

Last year’s Conference on Robot Learning (CoRL) was the biggest CoRL yet, with over 900 attendees, 11 workshops, and almost 200 accepted papers. While there were a lot of cool new ideas (see this great set of notes for an overview of technical content), one particular debate seemed to be front and center: Is training a large neural network on a very large dataset a feasible way to solve robotics?¹

Of course, some version of this question has been on researchers’ minds for a few years now. However, in the aftermath of the unprecedented success of ChatGPT and other large-scale “foundation models” on tasks that were thought to be unsolvable just a few years ago, the question was especially topical at this year’s CoRL. Developing a general-purpose robot, one that can competently and robustly execute a wide variety of tasks of interest in any home or office environment that humans can, has been perhaps the holy grail of robotics since the inception of the field. And given the recent progress of foundation models, it seems possible that scaling existing network architectures by training them on very large datasets might actually be the key to that grail.

Given how timely and significant this debate seems to be, I thought it might be useful to write a post centered around it. My main goal here is to try to present the different sides of the argument as I heard them, without bias towards any side. Almost all the content is taken directly from talks I attended or conversations I had with fellow attendees. My hope is that this serves to deepen people’s understanding around the debate, and maybe even inspire future research ideas and directions.

I want to start by presenting the main arguments I heard in favor of scaling as a solution to robotics.

Why Scaling Might Work

It worked for Computer Vision (CV) and Natural Language Processing (NLP), so why not robotics? This was perhaps the most common argument I heard, and the one that seemed to excite most people given recent models like GPT4-V and SAM. The point here is that training a large model on an extremely large corpus of data has recently led to astounding progress on problems thought to be intractable just 3 to 4 years ago. Moreover, doing this has led to a number of emergent capabilities, where trained models are able to perform well at a number of tasks they weren’t explicitly trained for. Importantly, the fundamental method here of training a large model on a very large amount of data is general and not somehow unique to CV or NLP. Thus, there seems to be no reason why we shouldn’t observe the same incredible performance on robotics tasks.
- We’re already starting to see some evidence that this might work well: Chelsea Finn, Vincent Vanhoucke, and several others pointed to the recent RT-X and RT-2 papers from Google DeepMind as evidence that training a single model on large amounts of robotics data yields promising generalization capabilities. Russ Tedrake of Toyota Research Institute (TRI) and MIT pointed to the recent Diffusion Policies paper as showing a similar surprising capability. Sergey Levine of UC Berkeley highlighted recent efforts and successes from his group in building and deploying a robot-agnostic foundation model for navigation. All of these works are somewhat preliminary in that they train a relatively small model with a paltry amount of data compared to something like GPT4-V, but they certainly do seem to point to the fact that scaling up these models and datasets could yield impressive results in robotics.
Progress in data, compute, and foundation models are waves that we should ride: This argument is closely related to the above one, but distinct enough that I think it deserves to be discussed separately. The main idea here comes from Rich Sutton’s influential essay: The history of AI research has shown that relatively simple algorithms that scale well with data always outperform more complex/clever algorithms that do not. A nice analogy from Karol Hausman’s early career keynote is that improvements to data and compute are like a wave that is bound to happen given the progress and adoption of technology. Whether we like it or not, there will be more data and better compute. As AI researchers, we can either choose to ride this wave, or we can ignore it. Riding this wave means recognizing all the progress that’s happened because of large data and large models, and then developing algorithms, tools, datasets, etc. to take advantage of this progress. It also means leveraging large pre-trained models from vision and language that currently exist or will exist for robotics tasks.
Robotics tasks of interest lie on a relatively simple manifold, and training a large model will help us find it: This was something rather interesting that Russ Tedrake pointed out during a debate in the workshop on robustly deploying learning-based solutions. The manifold hypothesis as applied to robotics roughly states that, while the space of possible tasks we could conceive of having a robot do is impossibly large and complex, the tasks that actually occur practically in our world lie on some much lower-dimensional and simpler manifold of this space. By training a single model on large amounts of data, we might be able to discover this manifold. If we believe that such a manifold exists for robotics—which certainly seems intuitive—then this line of thinking would suggest that robotics is not somehow different from CV or NLP in any fundamental way. The same recipe that worked for CV and NLP should be able to discover the manifold for robotics and yield a shockingly competent generalist robot. Even if this doesn’t exactly happen, Tedrake points out that attempting to train a large model for general robotics tasks could teach us important things about the manifold of robotics tasks, and perhaps we can leverage this understanding to solve robotics.
Large models are the best approach we have to get at “commonsense” capabilities, which pervade all of robotics: Another thing Russ Tedrake pointed out is that “common sense” pervades almost every robotics task of interest. Consider the task of having a mobile manipulation robot place a mug onto a table. Even if we ignore the challenging problems of finding and localizing the mug, there are a surprising number of subtleties to this problem. What if the table is cluttered and the robot has to move other objects out of the way? What if the mug accidentally falls on the floor and the robot has to pick it up again, re-orient it, and place it on the table? And what if the mug has something in it, so it’s important it’s never overturned? These “edge cases” are actually much more common that it might seem, and often are the difference between success and failure for a task. Moreover, these seem to require some sort of ‘common sense’ reasoning to deal with. Several people argued that large models trained on a large amount of data are the best way we know of to yield some aspects of this ‘common sense’ capability. Thus, they might be the best way we know of to solve general robotics tasks.

As you might imagine, there were a number of arguments against scaling as a practical solution to robotics. Interestingly, almost no one directly disputes that this approach could work in theory. Instead, most arguments fall into one of two buckets: (1) arguing that this approach is simply impractical, and (2) arguing that even if it does kind of work, it won’t really “solve” robotics.

Why Scaling Might Not Work

It’s impractical

We currently just don’t have much robotics data, and there’s no clear way we’ll get it: This is the elephant in pretty much every large-scale robot learning room. The Internet is chock-full of data for CV and NLP, but not at all for robotics. Recent efforts to collect very large datasets have required tremendous amounts of time, money, and cooperation, yet have yielded a very small fraction of the amount of vision and text data on the Internet. CV and NLP got so much data because they had an incredible “data flywheel”: tens of millions of people connecting to and using the Internet. Unfortunately for robotics, there seems to be no reason why people would upload a bunch of sensory input and corresponding action pairs. Collecting a very large robotics dataset seems quite hard, and given that we know that a lot of important “emergent” properties only showed up in vision and language models at scale, the inability to get a large dataset could render this scaling approach hopeless.
Robots have different embodiments: Another challenge with collecting a very large robotics dataset is that robots come in a large variety of different shapes, sizes, and form factors. The output control actions that are sent to a Boston Dynamics Spot robot are very different to those sent to a KUKA iiwa arm. Even if we ignore the problem of finding some kind of common output space for a large trained model, the variety in robot embodiments means we’ll probably have to collect data from each robot type, and that makes the above data-collection problem even harder.
There is extremely large variance in the environments we want robots to operate in: For a robot to really be “general purpose,” it must be able to operate in any practical environment a human might want to put it in. This means operating in any possible home, factory, or office building it might find itself in. Collecting a dataset that has even just one example of every possible building seems impractical. Of course, the hope is that we would only need to collect data in a small fraction of these, and the rest will be handled by generalization. However, we don’t know how much data will be required for this generalization capability to kick in, and it very well could also be impractically large.
Training a model on such a large robotics dataset might be too expensive/energy-intensive: It’s no secret that training large foundation models is expensive, both in terms of money and in energy consumption. GPT-4V—OpenAI’s biggest foundation model at the time of this writing—reportedly cost over US $100 million and 50 million KWh of electricity to train. This is well beyond the budget and resources that any academic lab can currently spare, so a larger robotics foundation model would need to be trained by a company or a government of some kind. Additionally, depending on how large both the dataset and model itself for such an endeavor are, the costs may balloon by another order-of-magnitude or more, which might make it completely infeasible.

Even if it works as well as in CV/NLP, it won’t solve robotics

The 99.X problem and long tails: Vincent Vanhoucke of Google Robotics started a talk with a provocative assertion: Most—if not all—robot learning approaches cannot be deployed for any practical task. The reason? Real-world industrial and home applications typically require 99.X percent or higher accuracy and reliability. What exactly that means varies by application, but it’s safe to say that robot learning algorithms aren’t there yet. Most results presented in academic papers top out at 80 percent success rate. While that might seem quite close to the 99.X percent threshold, people trying to actually deploy these algorithms have found that it isn’t so: getting higher success rates requires asymptotically more effort as we get closer to 100 percent. That means going from 85 to 90 percent might require just as much—if not more—effort than going from 40 to 80 percent. Vincent asserted in his talk that getting up to 99.X percent is a fundamentally different beast than getting even up to 80 percent, one that might require a whole host of new techniques beyond just scaling.
- Existing big models don’t get to 99.X percent even in CV and NLP: As impressive and capable as current large models like GPT-4V and DETIC are, even they don’t achieve 99.X percent or higher success rate on previously-unseen tasks. Current robotics models are very far from this level of performance, and I think it’s safe to say that the entire robot learning community would be thrilled to have a general model that does as well on robotics tasks as GPT-4V does on NLP tasks. However, even if we had something like this, it wouldn’t be at 99.X percent, and it’s not clear that it’s possible to get there by scaling either.
Self-driving car companies have tried this approach, and it doesn’t fully work (yet): This is closely related to the above point, but important and subtle enough that I think it deserves to stand on its own. A number of self-driving car companies—most notably Tesla and Wayve—have tried training such an end-to-end big model on large amounts of data to achieve Level 5 autonomy. Not only do these companies have the engineering resources and money to train such models, but they also have the data. Tesla in particular has a fleet of over 100,000 cars deployed in the real world that it is constantly collecting and then annotating data from. These cars are being teleoperated by experts, making the data ideal for large-scale supervised learning. And despite all this, Tesla has so far been unable to produce a Level 5 autonomous driving system. That’s not to say their approach doesn’t work at all. It competently handles a large number of situations—especially highway driving—and serves as a useful Level 2 (i.e., driver assist) system. However, it’s far from 99.X percent performance. Moreover, data seems to suggest that Tesla’s approach is faring far worse than Waymo or Cruise, which both use much more modular systems. While it isn’t inconceivable that Tesla’s approach could end up catching up and surpassing its competitors performance in a year or so, the fact that it hasn’t worked yet should serve as evidence perhaps that the 99.X percent problem is hard to overcome for a large-scale ML approach. Moreover, given that self-driving is a special case of general robotics, Tesla’s case should give us reason to doubt the large-scale model approach as a full solution to robotics, especially in the medium term.
Many robotics tasks of interest are quite long-horizon: Accomplishing any task requires taking a number of correct actions in sequence. Consider the relatively simple problem of making a cup of tea given an electric kettle, water, a box of tea bags, and a mug. Success requires pouring the water into the kettle, turning it on, then pouring the hot water into the mug, and placing a tea-bag inside it. If we want to solve this with a model trained to output motor torque commands given pixels as input, we’ll need to send torque commands to all 7 motors at around 40 Hz. Let’s suppose that this tea-making task requires 5 minutes. That requires 7 * 40 * 60 * 5 = 84,000 correct torque commands. This is all just for a stationary robot arm; things get much more complicated if the robot is mobile, or has more than one arm. It is well-known that error tends to compound with longer-horizons for most tasks. This is one reason why—despite their ability to produce long sequences of text—even LLMs cannot yet produce completely coherent novels or long stories: small deviations from a true prediction over time tend to add up and yield extremely large deviations over long-horizons. Given that most, if not all robotics tasks of interest require sending at least thousands, if not hundreds of thousands, of torques in just the right order, even a fairly well-performing model might really struggle to fully solve these robotics tasks.

Okay, now that we’ve sketched out all the main points on both sides of the debate, I want to spend some time diving into a few related points. Many of these are responses to the above points on the ‘against’ side, and some of them are proposals for directions to explore to help overcome the issues raised.

Miscellaneous Related Arguments

We can probably deploy learning-based approaches robustly

One point that gets brought up a lot against learning-based approaches is the lack of theoretical guarantees. At the time of this writing, we know very little about neural network theory: we don’t really know why they learn well, and more importantly, we don’t have any guarantees on what values they will output in different situations. On the other hand, most classical control and planning approaches that are widely used in robotics have various theoretical guarantees built-in. These are generally quite useful when certifying that systems are safe.

However, there seemed to be general consensus amongst a number of CoRL speakers that this point is perhaps given more significance than it should. Sergey Levine pointed out that most of the guarantees from controls aren’t really that useful for a number of real-world tasks we’re interested in. As he put it: “self-driving car companies aren’t worried about controlling the car to drive in a straight line, but rather about a situation in which someone paints a sky onto the back of a truck and drives in front of the car,” thereby confusing the perception system. Moreover, Scott Kuindersma of Boston Dynamics talked about how they’re deploying RL-based controllers on their robots in production, and are able to get the confidence and guarantees they need via rigorous simulation and real-world testing. Overall, I got the sense that while people feel that guarantees are important, and encouraged researchers to keep trying to study them, they don’t think that the lack of guarantees for learning-based systems means that they cannot be deployed robustly.

What if we strive to deploy Human-in-the-Loop systems?

In one of the organized debates, Emo Todorov pointed out that existing successful ML systems, like Codex and ChatGPT, work well only because a human interacts with and sanitizes their output. Consider the case of coding with Codex: it isn’t intended to directly produce runnable, bug-free code, but rather to act as an intelligent autocomplete for programmers, thereby making the overall human-machine team more productive than either alone. In this way, these models don’t have to achieve the 99.X percent performance threshold, because a human can help correct any issues during deployment. As Emo put it: “humans are forgiving, physics is not.”

Chelsea Finn responded to this by largely agreeing with Emo. She strongly agreed that all successfully-deployed and useful ML systems have humans in the loop, and so this is likely the setting that deployed robot learning systems will need to operate in as well. Of course, having a human operate in the loop with a robot isn’t as straightforward as in other domains, since having a human and robot inhabit the same space introduces potential safety hazards. However, it’s a useful setting to think about, especially if it can help address issues brought on by the 99.X percent problem.

Maybe we don’t need to collect that much real-world data for scaling

A number of people at the conference were thinking about creative ways to overcome the real-world data bottleneck without actually collecting more real world data. Quite a few of these people argued that fast, realistic simulators could be vital here, and there were a number of works that explored creative ways to train robot policies in simulation and then transfer them to the real world. Another set of people argued that we can leverage existing vision, language, and video data and then just ‘sprinkle in’ some robotics data. Google’s recent RT-2 model showed how taking a large model trained on internet scale vision and language data, and then just fine-tuning it on a much smaller set robotics data can produce impressive performance on robotics tasks. Perhaps through a combination of simulation and pretraining on general vision and language data, we won’t actually have to collect too much real-world robotics data to get scaling to work well for robotics tasks.

Maybe combining classical and learning-based approaches can give us the best of both worlds

As with any debate, there were quite a few people advocating the middle path. Scott Kuindersma of Boston Dynamics titled one of his talks “Let’s all just be friends: model-based control helps learning (and vice versa)”. Throughout his talk, and the subsequent debates, his strong belief that in the short to medium term, the best path towards reliable real-world systems involves combining learning with classical approaches. In her keynote speech for the conference, Andrea Thomaz talked about how such a hybrid system—using learning for perception and a few skills, and classical SLAM and path-planning for the rest—is what powers a real-world robot that’s deployed in tens of hospital systems in Texas (and growing!). Several papers explored how classical controls and planning, together with learning-based approaches can enable much more capability than any system on its own. Overall, most people seemed to argue that this ‘middle path’ is extremely promising, especially in the short to medium term, but perhaps in the long-term either pure learning or an entirely different set of approaches might be best.

What Can/Should We Take Away From All This?

If you’ve read this far, chances are that you’re interested in some set of takeaways/conclusions. Perhaps you’re thinking “this is all very interesting, but what does all this mean for what we as a community should do? What research problems should I try to tackle?” Fortunately for you, there seemed to be a number of interesting suggestions that had some consensus on this.

We should pursue the direction of trying to just scale up learning with very large datasets

Despite the various arguments against scaling solving robotics outright, most people seem to agree that scaling in robot learning is a promising direction to be investigated. Even if it doesn’t fully solve robotics, it could lead to a significant amount of progress on a number of hard problems we’ve been stuck on for a while. Additionally, as Russ Tedrake pointed out, pursuing this direction carefully could yield useful insights about the general robotics problem, as well as current learning algorithms and why they work so well.

We should also pursue other existing directions

Even the most vocal proponents of the scaling approach were clear that they don’t think everyone should be working on this. It’s likely a bad idea for the entire robot learning community to put its eggs in the same basket, especially given all the reasons to believe scaling won’t fully solve robotics. Classical robotics techniques have gotten us quite far, and led to many successful and reliable deployments: pushing forward on them or integrating them with learning techniques might be the right way forward, especially in the short to medium terms.

We should focus more on real-world mobile manipulation and easy-to-use systems

Vincent Vanhoucke made an observation that most papers at CoRL this year were limited to tabletop manipulation settings. While there are plenty of hard tabletop problems, things generally get a lot more complicated when the robot—and consequently its camera view—moves. Vincent speculated that it’s easy for the community to fall into a local minimum where we make a lot of progress that’s specific to the tabletop setting and therefore not generalizable. A similar thing could happen if we work predominantly in simulation. Avoiding these local minima by working on real-world mobile manipulation seems like a good idea.

Separately, Sergey Levine observed that a big reason why LLM’s have seen so much excitement and adoption is because they’re extremely easy to use: especially by non-experts. One doesn’t have to know about the details of training an LLM, or perform any tough setup, to prompt and use these models for their own tasks. Most robot learning approaches are currently far from this. They often require significant knowledge of their inner workings to use, and involve very significant amounts of setup. Perhaps thinking more about how to make robot learning systems easier to use and widely applicable could help improve adoption and potentially scalability of these approaches.

We should be more forthright about things that don’t work

There seemed to be a broadly-held complaint that many robot learning approaches don’t adequately report negative results, and this leads to a lot of unnecessary repeated effort. Additionally, perhaps patterns might emerge from consistent failures of things that we expect to work but don’t actually work well, and this could yield novel insight into learning algorithms. There is currently no good incentive for researchers to report such negative results in papers, but most people seemed to be in favor of designing one.

We should try to do something totally new

There were a few people who pointed out that all current approaches—be they learning-based or classical—are unsatisfying in a number of ways. There seem to be a number of drawbacks with each of them, and it’s very conceivable that there is a completely different set of approaches that ultimately solves robotics. Given this, it seems useful to try think outside the box. After all, every one of the current approaches that’s part of the debate was only made possible because the few researchers that introduced them dared to think against the popular grain of their times.

Acknowledgements: Huge thanks to Tom Silver and Leslie Kaelbling for providing helpful comments, suggestions, and encouragement on a previous draft of this post.

—
¹ In fact, this was the topic of a popular debate hosted at a workshop on the first day; many of the points in this post were inspired by the conversation during that debate.

IEEE Spectrum
Three New Supercomputers Reach Top of Green500 ListDina Genkina
Over just the past couple of years, supercomputing has accelerated into the exascale era—with the world’s most massive machines capable of performing over a billion billion operations per second. But unless big efficiency improvements can intervene along its exponential growth curve, computing is also anticipated to require increasingly impractical and unsustainable amounts of energy—even, according to one widely cited study, by 2040 demanding more energy than the world’s total present-day outpu
24. Květen 2024 v 17:45

Three New Supercomputers Reach Top of Green500 List

IEEE Spectrum

Od: Dina Genkina

24. Květen 2024 v 17:45

Over just the past couple of years, supercomputing has accelerated into the exascale era—with the world’s most massive machines capable of performing over a billion billion operations per second. But unless big efficiency improvements can intervene along its exponential growth curve, computing is also anticipated to require increasingly impractical and unsustainable amounts of energy—even, according to one widely cited study, by 2040 demanding more energy than the world’s total present-day output.

Fortunately, the high-performance computing community is shifting focus now toward not just increased performance (measured in raw petaflops or exaflops) but also higher efficiency, boosting the number of operations per watt.

The Green500 list saw newcomers enter into the top three spots, suggesting that some of the world’s newest high-performance systems may be chasing efficiency at least as much as sheer power.

The newest ranking of the Top500 supercomputers (a list of the world’s most powerful machines) and its cousin the Green500 (ranking instead the world’s highest-efficiency machines) came out last week. The leading 10 of the Top 500 largest supercomputers remains mostly unchanged, headed up by Oak Ridge National Laboratory’s Frontier exascale computer. There was only one new addition in the top 10, at No. 6: Swiss National Supercomputing Center’s Alps system. Meanwhile, Argonne National Laboratory’s Aurora doubled its size, but kept its second-tier ranking.

On the other hand, The Green500 list saw newcomers enter into the top three spots, suggesting that some of the world’s newest high-performance systems may be chasing efficiency at least as much as sheer power.

Heading up the new Green500 list was JEDI, Jülich Supercomputing Center’s prototype system for its impending JUPITER exascale computer. The No. 2 and No. 3 spots went to the University of Bristol’s Isambard AI, also the first phase of a larger planned system, and the Helios supercomputer from the Polish organization Cyfronet. In fourth place is the previous list’s leader, the Simons Foundation’s Henri.

A Hopper Runs Through It

The top three systems on the Green500 list have one thing in common—they are all built with Nvidia’s Grace Hopper superchips, a combination of the Hopper (H100) GPU and the Grace CPU. There are two main reasons why the Grace Hopper architecture is so efficient, says Dion Harris, director of accelerated data center go-to-market strategy at Nvidia. The first is the Grace CPU, which benefits from the ARM instruction set architecture’s superior power performance. Plus, he says, it incorporates a memory structure, called LPDDR5X, that’s commonly found in cellphones and is optimized for energy efficiency.

Close-up of the NVIDIA logo on computing equipment Nvidia’s GH200 Grace Hopper superchip, here deployed in Jülich’s JEDI machine, now powers the world’s top three most efficient HPC systems. Jülich Supercomputing Center

The second advantage of the Grace Hopper, Harris says, is a newly developed interconnect between the Hopper GPU and the Grace CPU. The connection takes advantage of the CPU and GPU’s proximity to each other on one board, and achieves a bandwidth of 900 gigabits per second, about 7 times as fast as the latest PCIe gen5 interconnects. This allows the GPU to access the CPU’s memory quickly, which is particularly important for highly parallel applications such as AI training or graph neural networks, Harris says.

All three top systems use Grace Hoppers, but Jülich’s JEDI still leads the pack by a noticeable margin—72.7 gigaflops per watt, as opposed to 68.8 gigaflops per watt for the runner-up (and 65.4 gigaflops per watt for the previous champion). The JEDI team attributes their added success to the way they’ve connected their chips together. Their interconnect fabric was also from Nvidia—Quantum-2 InfiniBand—rather than the HPE Slingshot used by the other two top systems.

The JEDI team also cites specific optimizations they did to accommodate the Green500 benchmark. In addition to using all the latest Nvidia gear, JEDI cuts energy costs with its cooling system. Instead of using air or chilled water, JEDI circulates hot water throughout its compute nodes to take care of the excess heat. “Under normal weather conditions, the excess heat can be taken care of by free cooling units without the need of additional cold-water cooling,” says Benedikt von St. Vieth, head of the division for high-performance computing at Jülich.

JUPITER will use the same architecture as its prototype, JEDI, and von St. Vieth says he aims for it to maintain much of the prototype’s energy efficiency—although with increased scale, he adds, more energy may be lost to interconnecting fabric.

Of course, most crucial is the performance of these systems on real scientific tasks, not just on the Green500 benchmark. “It was really exciting to see these systems come online,” Nvidia’s Harris says, “But more importantly, I think we’re really excited to see the science come out of these systems, because I think [the energy efficiency] will have more impact on the applications even than on the benchmark.”

Semiconductor Engineering
High-Level Synthesis Propels Next-Gen AI AcceleratorsRussell Klein
Everything around you is getting smarter. Artificial intelligence is not just a data center application but will be deployed in all kinds of embedded systems that we interact with daily. We expect to talk to and gesture at them. We expect them to recognize and understand us. And we expect them to operate with just a little bit of common sense. This intelligence is making these systems not just more functional and easier to use, but safer and more secure as well. All this intelligence comes from
20. Květen 2024 v 09:01

High-Level Synthesis Propels Next-Gen AI Accelerators

Semiconductor Engineering

Od: Russell Klein

20. Květen 2024 v 09:01

Everything around you is getting smarter. Artificial intelligence is not just a data center application but will be deployed in all kinds of embedded systems that we interact with daily. We expect to talk to and gesture at them. We expect them to recognize and understand us. And we expect them to operate with just a little bit of common sense. This intelligence is making these systems not just more functional and easier to use, but safer and more secure as well.

All this intelligence comes from advances in deep neural networks. One of the key challenges of neural networks is their computational complexity. Small neural networks can take millions of multiply accumulate operations (MACs) to produce a result. Larger ones can take billions. Large language models, and similarly complex networks, can take trillions. This level of computation is beyond what can be delivered by embedded processors.

In some cases, the computation of these inferences can be off-loaded over a network to a data center. Increasingly, devices have fast and reliable network connections – making this a viable option for many systems. However, there are also a lot of systems that have hard real time requirements that cannot be met by even the fastest and most reliable networks. For example, any system that has autonomous mobility – self-driving cars or self-piloted drones – needs to make decisions faster than could be done through an off-site data center. There are also systems where sensitive data is being processed that should not be sent over networks. And anything that goes over a network introduces an additional attack surface for hackers. For all of these reasons – performance, privacy, and security – some inferencing will need to be done on embedded systems.

For very simple networks, embedded CPUs can handle the task. Even a Raspberry Pi can deploy a simple object recognition algorithm. For more complex tasks there are embedded GPUs, as well as neural processing units (NPUs) targeted at embedded systems that can deliver greater computational capability. But for the highest levels of performance and efficiency, building a bespoke AI (Artificial Intelligence) accelerator can enable applications that would otherwise be impractical.

Engineering a new piece of hardware is a daunting undertaking, whether for ASIC or FPGA. But it enables developers to reach a level of performance and efficiency not possible with off-the-shelf components. But how can the average development team build a better machine learning accelerator than the designers creating the most leading-edge commercial AI accelerators, with multiple generations under their belt? By highly customizing the implementation to the specific inference being performed, the implementation can be an order of magnitude better than more generalized solutions.

When a general-purpose AI accelerator developer creates an NPU, their goal is to support any neural network that anyone might conceive. They want to get thousands of design ins, so they have to make the design as general as possible. Not only that, but they also aim to have some level of “future proofing” built into their designs. They want to be able to support any network that might be imagined for several years into the future. Not an easy task in a technology that is evolving so rapidly.

A bespoke accelerator needs to only support the one, or perhaps several, networks to be used. This freedom allows many programmable elements in the implementation of the accelerator to be fixed in hardware. This creates hardware that is both smaller and faster than something general purpose. For example, a dedicated convolution accelerator, with a fixed image and filter size, can be up to 10 times faster than a well-designed general purpose TPU.

General purpose accelerators usually use floating point numbers. This is because virtually all neural networks are developed in Python on general purpose computers using floating point numbers. To ensure correct support of those neural networks, the accelerator must, of course, support floating point numbers. However, most neural networks use numbers close to 0, and require a lot of precision there. And floating-point multipliers are huge. If they are not needed, omitting them from the design saves a lot of area and power.

Some NPUs support integer representation, and sometimes with a variety of sizes. But supporting multiple numeric representation formats adds circuitry, which consumes power and adds propagation delays. Choosing one representation and using that exclusively enables a smaller faster implementation.

When building a bespoke accelerator, one is not limited to 8 bits or 16 bits, any size can be used. Picking the correct numeric representation, or “quantizing” a neural network, allows the data and the operators to be optimally sized. Quantization can significantly reduce the data needed to be stored, moved, and operated on. Reducing the memory footprint for the weight database and shrinking the multipliers can really improve the area and power of a design. For example, a 10-bit fixed-point multiplier is about 20 times smaller than a 32-bit floating-point multiplier, and, correspondingly, will use about 1/20^th the power. This means the design can either be much smaller and energy efficient by using the smaller multiplier, or the designer can opt to use the area and deploy 20 multipliers that can operate in parallel, producing much higher performance using the same resources.

One of the key challenges in building a bespoke machine learning accelerator is that the data scientists who created the neural network usually do not understand hardware design, and the hardware designers do not understand data science. In a traditional design flow, they would use “meetings” and “specifications” to transfer knowledge and share ideas. But, honestly, no one likes meetings or specifications. And they are not particularly good at effecting an information exchange.

High-Level Synthesis (HLS) allows an implementation produced by the data scientists to be used, not just as an executable reference, but as a machine-readable input to the hardware design process. This eliminates the manual reinterpretation of the algorithm in the design flow, which is slow and extremely error prone. HLS synthesizes an RTL implementation from an algorithmic description. Usually, the algorithm is described in C++ or SystemC, but a number of design flows like HLS4ML are enabling HLS tools to take neural network descriptions directly from machine learning frameworks.

HLS enables a practical exploration of quantization in a way that is not yet practical in machine learning frameworks. To fully understand the impact of quantization requires a bit accurate implementation of the algorithm, including the characterization of the effects of overflow, saturation, and rounding. Today this in only practical in hardware description languages (HDLs) or HLS bit accurate data types (https://hlslibs.org).

As machine learning becomes ubiquitous, more embedded systems will need to deploy inferencing accelerators. HLS is a practical and proven way to create bespoke accelerators, optimized for a very specific application, that deliver higher performance and efficiency than general purpose NPUs.

For more information on this topic, read the paper: High-Level Synthesis Enables the Next Generation of Edge AI Accelerators.

The post High-Level Synthesis Propels Next-Gen AI Accelerators appeared first on Semiconductor Engineering.

Semiconductor Engineering
Chip Industry Week In ReviewThe SE Staff
Synopsys refocused its security priorities around chips, striking a deal to sell off its Software Integrity Group subsidiary to private equity firms Clearlake Capital Group and Francisco Partners for about $2.1 billion. That deal comes on the heels of Synopsys’ recent acquisition of Intrinsic ID, which develops physical unclonable function IP. Sassine Ghazi, Synopsys’ president and CEO, said in an interview that the sale of the software group “gives us the ability to have management bandwidth, c
10. Květen 2024 v 09:01

Chip Industry Week In Review

Semiconductor Engineering

Od: The SE Staff

10. Květen 2024 v 09:01

Synopsys refocused its security priorities around chips, striking a deal to sell off its Software Integrity Group subsidiary to private equity firms Clearlake Capital Group and Francisco Partners for about $2.1 billion. That deal comes on the heels of Synopsys’ recent acquisition of Intrinsic ID, which develops physical unclonable function IP. Sassine Ghazi, Synopsys’ president and CEO, said in an interview that the sale of the software group “gives us the ability to have management bandwidth, capital, and to double down on what we’re doing in our core business.”

The U.S. Commerce Department reportedly pulled export licenses from Intel and Qualcomm that permitted them to ship semiconductors to Huawei, the Financial Times reported. The move comes after advanced chips from Intel reportedly were used in new laptops and smartphones from the China-based company.

Apple debuted its second-generation 3nm M4 chip with the launch of the new iPad Pro. The CPU and GPU each have up to 10 cores, with a neural engine capable of 38 TOPS, and a total of 28 billion transistors. Apple also is working with TSMC to develop its own AI processors for running software in data centers, reports The Wall Street Journal.

The U.S. is expected to triple its semiconductor manufacturing capacity by 2032, according to a new report by the Semiconductor Industry Association and Boston Consulting. By that year, the U.S. is projected to have 28% of global capacity for advanced logic manufacturing and over a quarter of total global capital expenditures.

Fig. 1: Source: Semiconductor Industry Association and Boston Consulting Group.

Quick links to more news:

Global
Market Reports
Automotive
Security
Product News
Education and Training
Research
In-Depth
Events
Further Reading

Around The Globe

The U.S. Commerce Department plans to solicit bids from organizations interested in creating and managing a new CHIPS Manufacturing USA institute focused on digital twins in the semiconductor sector. The government will award up to $285 million to the selected proposal.

The U.S. National Science Foundation and Department of Energy announced the first 35 projects to be supported with computational time through the National Artificial Intelligence Research Resource (NAIRR) Pilot. The initial selected projects will gain access to several U.S. supercomputing centers and other resources, with the goal of advancing responsible AI research.

Through its new Federal AI Sandbox, MITRE is offering up its computing power to U.S. government agencies. “Our new Federal AI Sandbox will help level the playing field, making the high-quality compute power needed to train and test custom AI solutions available to any agency,” stated Charles Clancy, MITRE, senior vice president and chief technology officer, in the release.

Saudi Arabia’s $100 billion investment fund for semiconductor and AI technology pledged it would divest from China if requested by the U.S, reported Bloomberg.

Japan’s SoftBank is holding talks with UK-based AI Chip firm Graphcore about a possible acquisition, reports Bloomberg.

India’s chip industry is heating up. Mindgrove launched the country’s first SoC, named Secure IoT. The chip clocks at 700 MHz, and the company is touting its key security algorithms, secure boot, and on-chip OTP memory. Meanwhile, Lam Research is expanding its global semiconductor fabrication supply chain to include India.

Microsoft will build a $3.3 billion AI data center in Racine, Wisconsin, the same location as the failed Foxconn investment touted six years ago.

Markets And Money

The SIA announced first-quarter global semiconductor sales grew more than 15% YoY, still 5.7% below Q4 2023, but a big improvement over last year. Consider that the semiconductor materials market contracted 8.2% in 2023 to $66.7 billion, down from a record $72.7 billion in 2022, according to a new report from SEMI.

The demand for AI-powered consumer electronics will drive global AI chipset shipments to 1.3 billion by 2030, according to ABI Research.

TrendForce released several new industry reports this week. Among the highlights:

HBM prices are expected to increase by up to 10% in 2025, representing more than 30% of total DRAM value.
In Q2, DRAM contract prices rose 13% to 18%, while NAND flash prices increased 15% to 20%.
The top 10 design firms’ combined revenue increased 12% in 2023, with NVIDIA taking the lead for the first time.

A number of acquisitions were announced recently:

High-voltage IC company, Power Integrations, will purchase the assets of Odyssey Semiconductor Technologies, a developer of gallium nitride (GaN) transistors.
Mobix Labs agreed to buy RF design company RaGE Systems for $20 million in cash, stock, and incentives.
V-Tek, a packaging services and inspection company, acquired A&J Programming, a manufacturer of automated handling and programming equipment.

The global smartphone market grew 6% year-over-year, shipping 296.9 million units in Q124, according to a Counterpoint report. Samsung toppled Apple for the top spot with a 20% share.

Automotive

U.S. Justice Department is investigating whether Tesla committed securities or wire fraud for misleading consumers and investors about its EV’s autopilot capabilities, according to Reuters.

The automotive ecosystem is undergoing a huge transformation toward software-defined vehicles, spurring new architectures that can be future-proofed and customized with software.

Infineon introduced a microcontroller for the automotive battery management sector, integrating high-precision analog and high-voltage subsystems on a single chip. Infineon also inked a deal with China’s Xiaomi to provide SiC power modules for Xiaomi’s new SU7 smart EV.

Keysight and ETAS are teaming up to embed ETAS fuzz testing software into Keysight’s automotive cybersecurity platform.

Also, Keysight’s device security research lab, Riscure Security Solutions, can now conduct vehicle type approval evaluations under United Nations R155/R156 regulations. Keysight acquired Riscure in March.

Two autonomous driving companies received big funding. British AI company Wayve received a $1.05 billion Series C investment from SoftBank, with contributions from NVIDIA and Microsoft. Hyundai spent an additional $475 million on Motional, according its recent earnings report.

The automotive imaging market grew to U.S. $5.7 billion in 2023 due to increased production, autonomy demand, and higher-resolution offerings.

Automotive Grade Linux (AGL), a collaborative cross-industry effort developing an open source platform for all Software-Defined Vehicles (SDVs), released cloud-native functionality, RISC-V architecture and flutter applications.

Security

SRAM security concerns are intensifying as a combination of new and existing techniques allow hackers to tap into data for longer periods of time after a device is powered down. This is particularly alarming as the leading edge of design shifts to heterogeneous systems in package, where chiplets frequently have their own memory hierarchy.

Machine learning is being used by hackers to find weaknesses in chips and systems, but it also is starting to be used to prevent breaches by pinpointing hardware and software design flaws.

txOne Networks, provider of Cyber-Physical Systems security, raised $51 million in Series B extension round of funding.

The U.S. Department of Justice charged a Russian national with his role as the creator, developer and administrator of the LockBit, a prolific ramsomware group, that allegedly stole $100 million in payments from 2,000 victims.

The Cybersecurity and Infrastructure Security Agency (CISA) launched “We Can Secure Our World,” a new public awareness program promoting “basic cyber hygiene” and the agency also issues a number of alerts/advisories.

Product News

Siemens unveiled its Solido IP Validation Suite software, an automated quality assurance product designed to work across all design IP types and formats. The suite includes Solido Crosscheck and IPdelta software, which both provide in-view, cross-view and version-to-version QA checks.

proteanTecs announced its lifecycle monitoring solution is being integrated into SAPEON’s new AI processors.

SpiNNcloud Systems revealed their SpiNNaker2 system, an event-based AI platform supercomputer containing chips that are a mesh of 152 ARM-based cores. The platform has the ability to emulate 10 billion neurons while still maintaining power efficiency and reliability.

Ansys partnered with Schrodinger to develop new computational materials. The collaboration will see Schrodinger’s molecular modeling technology used in Ansys’ simulation tools to evaluate performance ahead of the prototype phase.

Keysight introduced a pulse generator to its handheld radio frequency analyzer software options. The Option 357 pulse generator is downloadable on B- and C-Series FieldFox analyzers.

Education and Training

Semiconductor fever is hitting academia:

Penn State discussed its role in leading 15 universities to drive advances in chip integration and packaging.
Georgia Tech’s explained its research is happening at all the levels of the “semiconductor stack,” touting its 28,500 square feet of academic cleanroom space.
And in the past month Purdue University, Dassault Systems and Lam Research expanded an existing deal to use virtual twins and simulation tools in workforce development.

Arizona State University is beefing up their technology programs with a new bachelor’s and doctoral degree in robotics and autonomous systems.

Microsoft is partnering with Gateway Technical College in Wisconsin to create a Data Center Academy to train Wisconsinites for data center and STEM roles by 2030.

Research

Stanford-led researchers used ordinary-appearing glasses for an augmented reality headset, utilizing waveguide display techniques, holographic imaging, and AI.

UC Berkeley, LLNL, and MIT engineered a miniaturized on-chip energy storage and power delivery, using an atomic-scale approach to modify electrostatic capacitors.

ORNL and other researchers observed a “surprising isotope effect in the optoelectronic properties of a single layer of molybdenum disulfide” when they substituted heavier isotope of molybdenum in the crystal.

Three U.S. national labs are partnering with NVIDIA to develop advanced memory technologies for high performance computing.

In-Depth

In addition to this week’s Automotive, Security and Pervasive Computing newsletter, here are more top stories and tech talk from the week:

Communication Is Key To Finding And Fixing Bugs In ICs
Can Models Created With AI Be Trusted?
Adapting To Evolving IC Requirements

Events

Find upcoming chip industry events here, including:

Event	Date	Location
ASMC: Advanced Semiconductor Manufacturing Conference	May 13 – 16	Albany, NY
ISES Taiwan 2024: International Semiconductor Executive Summit	May 14 – 15	New Taipei City
Ansys Simulation World 2024	May 14 – 16	Online
Women In Semiconductors	May 16	Albany, NY
European Test Symposium	May 20 – 24	The Hague, Netherlands
NI Connect Austin 2024	May 20 – 22	Austin, Texas
ITF World 2024 (imec)	May 21 – 22	Antwerp, Belgium
Embedded Vision Summit	May 21 – 23	Santa Clara, CA
ASIP Virtual Seminar 2024	May 22	Online
Electronic Components and Technology Conference (ECTC) 2024	May 28 – 31	Denver, Colorado
Hardwear.io Security Trainings and Conference USA 2024	May 28 – Jun 1	Santa Clara, CA
Find All Upcoming Events Here

Upcoming webinars are here.

SRAM Security Concerns Grow

Semiconductor Engineering

Od: Karen Heyman

9. Květen 2024 v 09:08

SRAM security concerns are intensifying as a combination of new and existing techniques allow hackers to tap into data for longer periods of time after a device is powered down.

This is particularly alarming as the leading edge of design shifts from planar SoCs to heterogeneous systems in package, such as those used in AI or edge processing, where chiplets frequently have their own memory hierarchy. Until now, most cybersecurity concerns involving volatile memory have focused on DRAM, because it is often external and easier to attack. SRAM, in contrast, does not contain a component as obviously vulnerable as a heat-sensitive capacitor, and in the past it has been harder to pinpoint. But as SoCs are disaggregated and more features are added into devices, SRAM is becoming a much bigger security concern.

The attack scheme is well understood. Known as cold boot, it was first identified in 2008, and is essentially a variant of a side-channel attack. In a cold boot approach, an attacker dumps data from internal SRAM to an external device, and then restarts the system from the external device with some code modification. “Cold boot is primarily targeted at SRAM, with the two primary defenses being isolation and in-memory encryption,” said Vijay Seshadri, distinguished engineer at Cycuity.

Compared with network-based attacks, such as DRAM’s rowhammer, cold boot is relatively simple. It relies on physical proximity and a can of compressed air.

The vulnerability was first described by Edward Felton, director of Princeton University’s Center for Information Technology Policy, J. Alex Halderman, currently director of the Center for Computer Security & Society at the University of Michigan, and colleagues. The breakthrough in their research was based on the growing realization in the engineering research community that data does not vanish from memory the moment a device is turned off, which until then was a common assumption. Instead, data in both DRAM and SRAM has a brief “remanence.”[1]

Using a cold boot approach, data can be retrieved, especially if an attacker sprays the chip with compressed air, cooling it enough to slow the degradation of the data. As the researchers described their approach, “We obtained surface temperatures of approximately −50°C with a simple cooling technique — discharging inverted cans of ‘canned air’ duster spray directly onto the chips. At these temperatures, we typically found that fewer than 1% of bits decayed even after 10 minutes without power.”

Unfortunately, despite nearly 20 years of security research since the publication of the Halderman paper, the authors’ warning still holds true. “Though we discuss several strategies for mitigating these risks, we know of no simple remedy that would eliminate them.”

However unrealistic, there is one simple and obvious remedy to cold boot — never leave a device unattended. But given human behavior, it’s safer to assume that every device is vulnerable, from smart watches to servers, as well as automotive chips used for increasingly autonomous driving.

While the original research exclusively examined DRAM, within the last six years cold boot has proven to be one of the most serious vulnerabilities for SRAM. In 2018, researchers at Germany’s Technische Universität Darmstadt published a paper describing a cold boot attack method that is highly resistant to memory erasure techniques, and which can be used to manipulate the cryptographic keys produced by the SRAM physical unclonable function (PUF).

As with so many security issues, it’s been a cat-and-mouse game between remedies and counter-attacks. And because cold boot takes advantage of slowing down memory degradation, in 2022 Yang-Kyu Choi and colleagues at the Korea Advanced Institute of Science and Technology (KAIST), described a way to undo the slowdown with an ultra-fast data sanitization method that worked within 5 ns, using back bias to control the device parameters of CMOS.

Fig. 1: Asymmetric forward back-biasing scheme for permanent erasing. (a) All the data are reset to 1. (b) All the data are reset to 0. Whether all the data where reset to 1 or 0 is determined by the asymmetric forward back-biasing scheme. Source: KAIST/Creative Commons [2]

Their paper, as well as others, have inspired new approaches to combating cold boot attacks.

“To mitigate the risk of unauthorized access from unknown devices, main devices, or servers, check the authenticated code and unique identity of each accessing device,” said Jongsin Yun, memory technologist at Siemens EDA. “SRAM PUF is one of the ways to securely identify each device. SRAM is made of two inverters cross-coupled to each other. Although each inverter is designed to be the same device, normally one part of the inverter has a somewhat stronger NMOS than the other due to inherent random dopant fluctuation. During the initial power-on process, SRAM data will be either data 1 or 0, depending on which side has a stronger device. In other words, the initial data state of the SRAM array at the power on is decided by this unique random process variation and most of the bits maintain this property for life. One can use this unique pattern as a fingerprint of a device. The SRAM PUF data is reconstructed with other coded data to form a cryptographic key. SRAM PUF is a great way to anchor its secure data into hardware. Hackers may use a DFT circuit to access the memory. To avoid insecurely reading the SRAM information through DFT, the security-critical design makes DFT force delete the data as an initial process of TEST mode.”

However, there can be instances where data may be required to be kept in a non-volatile memory (NVM). “Data is considered insecure if the NVM is located outside of the device,” said Yun. “Therefore, secured data needs to be stored within the device with write protection. One-time programmable (OTP) memory or fuses are good storage options to prevent malicious attackers from tampering with the modified information. OTP memory and fuses are used to store cryptographic keys, authentication information, and other critical settings for operation within the device. It is useful for anti-rollback, which prevents hackers from exploiting old vulnerabilities that have been fixed in newer versions.”

Chiplet vulnerabilities
Chiplets also could present another vector for attack, due to their complexity and interconnections. “A chiplet has memory, so it’s going to be attacked,” said Cycuity’s Seshadri. “Chiplets, in general, are going to exacerbate the problem, rather than keeping it status quo, because you’re going to have one chiplet talking to another. Could an attack on one chiplet have a side effect on another? There need to be standards to address this. In fact, they’re coming into play already. A chiplet provider has to say, ‘Here’s what I’ve done for security. Here’s what needs to be done when interfacing with another chiplet.”

Yun notes there is a further physical vulnerability for those working with chiplets and SiPs. “When multiple chiplets are connected to form a SiP, we have to trust data coming from an external chip, which creates further complications. Verification of the chiplet’s authenticity becomes very important for SiPs, as there is a risk of malicious counterfeit chiplets being connected to the package for hacking purposes. Detection of such counterfeit chiplets is imperative.”

These precautions also apply when working with DRAM. In all situations, Seshardi said, thinking about security has to go beyond device-level protection. “The onus of protecting DRAM is not just on the DRAM designer or the memory designer,” he said. “It has to be secured by design principles when you are developing. In addition, you have to look at this holistically and do it at a system level. You must consider all the other things that communicate with DRAM or that are placed near DRAM. You must look at a holistic solution, all the way from software down to things like the memory controller and then finally, the DRAM itself.”

Encryption as a backup
Data itself always must be encrypted as second layer of protection against known and novel attacks, so an organization’s assets will still be protected even if someone breaks in via cold boot or another method.

“The first and primary method of preventing a cold boot attack is limiting physical access to the systems, or physically modifying the systems case or hardware preventing an attacker’s access,” said Jim Montgomery, market development director, semiconductor at TXOne Networks. “The most effective programmatic defense against an attack is to ensure encryption of memory using either a hardware- or software-based approach. Utilizing memory encryption will ensure that regardless of trying to dump the memory, or physically removing the memory, the encryption keys will remain secure.”

Montgomery also points out that TXOne is working with the Semiconductor Manufacturing Cybersecurity Consortium (SMCC) to develop common criteria based upon SEMI E187 and E188 standards to assist DM’s and OEM’s to implement secure procedures for systems security and integrity, including controlling the physical environment.

What kind and how much encryption will depend on use cases, said Jun Kawaguchi, global marketing executive for Winbond. “Encryption strength for a traffic signal controller is going to be different from encryption for nuclear plants or medical devices, critical applications where you need much higher levels,” he said. “There are different strengths and costs to it.”

Another problem, in the post-quantum era, is that encryption itself may be vulnerable. To defend against those possibilities, researchers are developing post-quantum encryption schemes. One way to stay a step ahead is homomorphic encryption [HE], which will find a role in data sharing, since computations can be performed on encrypted data without first having to decrypt it.

Homomorphic encryption could be in widespread use as soon as the next few years, according to Ronen Levy, senior manager for IBM’s Cloud Security & Privacy Technologies Department, and Omri Soceanu, AI Security Group manager at IBM. However, there are still challenges to be overcome.

“There are three main inhibitors for widespread adoption of homomorphic encryption — performance, consumability, and standardization,” according to Levy. “The main inhibitor, by far, is performance. Homomorphic encryption comes with some latency and storage overheads. FHE hardware acceleration will be critical to solving these issues, as well as algorithmic and cryptographic solutions, but without the necessary expertise it can be quite challenging.”

An additional issue is that most consumers of HE technology, such as data scientists and application developers, do not possess deep cryptographic skills, HE solutions that are designed for cryptographers can be impractical. A few HE solutions require algorithmic and cryptographic expertise that inhibit adoption by those who lack these skills.

Finally, there is a lack of standardization. “Homomorphic encryption is in the process of being standardized,” said Soceanu. “But until it is fully standardized, large organizations may be hesitant to adopt a cryptographic solution that has not been approved by standardization bodies.”

Once these issues are resolved, they predicted widespread use as soon as the next few years. “Performance is already practical for a variety of use cases, and as hardware solutions for homomorphic encryption become a reality, more use cases would become practical,” said Levy. “Consumability is addressed by creating more solutions, making it easier and hopefully as frictionless as possible to move analytics to homomorphic encryption. Additionally, standardization efforts are already in progress.”

A new attack and an old problem
Unfortunately, security never will be as simple as making users more aware of their surroundings. Otherwise, cold boot could be completely eliminated as a threat. Instead, it’s essential to keep up with conference talks and the published literature, as graduate students keep probing SRAM for vulnerabilities, hopefully one step ahead of genuine attackers.

For example, SRAM-related cold boot attacks originally targeted discrete SRAM. The reason is that it’s far more complicated to attack on-chip SRAM, which is isolated from external probing and has minimal intrinsic capacitance. However, in 2022, Jubayer Mahmod, then a graduate student at Virginia Tech and his advisor, associate professor Matthew Hicks, demonstrated what they dubbed “Volt Boot,” a new method that could penetrate on-chip SRAM. According to their paper, “Volt Boot leverages asymmetrical power states (e.g., on vs. off) to force SRAM state retention across power cycles, eliminating the need for traditional cold boot attack enablers, such as low-temperature or intrinsic data retention time…Unlike other forms of SRAM data retention attacks, Volt Boot retrieves data with 100% accuracy — without any complex post-processing.”

Conclusion
While scientists and engineers continue to identify vulnerabilities and develop security solutions, decisions about how much security to include in a design is an economic one. Cost vs. risk is a complex formula that depends on the end application, the impact of a breach, and the likelihood that an attack will occur.

“It’s like insurance,” said Kawaguchi. “Security engineers and people like us who are trying to promote security solutions get frustrated because, similar to insurance pitches, people respond with skepticism. ‘Why would I need it? That problem has never happened before.’ Engineers have a hard time convincing their managers to spend that extra dollar on the costs because of this ‘it-never-happened-before’ attitude. In the end, there are compromises. Yet ultimately, it’s going to cost manufacturers a lot of money when suddenly there’s a deluge of demands to fix this situation right away.”

References

S. Skorobogatov, “Low temperature data remanence in static RAM”, Technical report UCAM-CL-TR-536, University of Cambridge Computer Laboratory, June 2002.
Han, SJ., Han, JK., Yun, GJ. et al. Ultra-fast data sanitization of SRAM by back-biasing to resist a cold boot attack. Sci Rep 12, 35 (2022). https://doi.org/10.1038/s41598-021-03994-2

The post SRAM Security Concerns Grow appeared first on Semiconductor Engineering.

Semiconductor Engineering
Enhancing HMI Security: How To Protect ICS Environments From Cyber ThreatsJim Montgomery
HMIs (Human Machine Interfaces) can be broadly defined as just about anything that allows humans to interface with their machines, and so are found throughout the technical world. In OT environments, operators use various HMIs to interact with industrial control systems in order to direct and monitor the operational systems. And wherever humans and machines intersect, security problems can ensue. Protecting HMI in cybersecurity plans, particularly in OT/ICS environments, can be a challenge, as H
9. Květen 2024 v 09:05

Enhancing HMI Security: How To Protect ICS Environments From Cyber Threats

Semiconductor Engineering

Od: Jim Montgomery

9. Květen 2024 v 09:05

HMIs (Human Machine Interfaces) can be broadly defined as just about anything that allows humans to interface with their machines, and so are found throughout the technical world. In OT environments, operators use various HMIs to interact with industrial control systems in order to direct and monitor the operational systems. And wherever humans and machines intersect, security problems can ensue.

Protecting HMI in cybersecurity plans, particularly in OT/ICS environments, can be a challenge, as HMIs offer a variety of vulnerabilities that threat actors can exploit to achieve any number of goals, from extortion to sabotage.

Consider the sort of OT environments HMIs are found in, including water and power utilities, manufacturing facilities, chemical production, oil and gas infrastructure, smart buildings, hospitals, and more. The HMIs in these environments offer bad actors a range of attack vectors through which they can enter and begin to wreak havoc, either financial, physical, or both.

What’s the relationship between HMI and SCADA?

SCADA (supervisory control and data acquisition) systems are used to acquire and analyze data and control industrial systems. Because of the role SCADA plays in these settings — generally overseeing the control of hugely complex, expensive, and dangerous-if-misused industrial equipment, processes, and facilities — they are extremely attractive to threat actors.

Unfortunately, the HMIs that operators use to interface with these systems may contain a number of vulnerabilities that are among the most highly exploitable and frequently breached vectors for attacks against SCADA systems.

Once an attacker gains access, they can seize from operators the ability to control the system. They can cause machinery to malfunction and suffer irreparable damage; they can taint products, steal information, and extort ransom. Even beyond ransom demands, the cost of production stoppages, lost sales, equipment replacement, and reputational damage can swallow some companies and create shortages in the market. Attacks can also cause equipment to perform in ways that threaten human life and safety.

Three types of HMIs in ICS that are vulnerable to attack

HMI security has to account for a range of “vulnerability options” available for exploitation by bad actors, such as keyboards, touch screens, and tablets, as well as more sophisticated interface points. Among the more frequently attacked are the Graphical User Interface and mobile and remote access.

Graphical User Interface

Attackers can use the Graphical User Interface or GUI to gain complete access to the system and manipulate it at will. They can often gain access by exploiting misconfigured access controls or bugs and other vulnerabilities that exist in a lot of software, including GUI software. If the system is web- or network-connected, their work is easier, especially if introducing malware is a goal. Once in, they can also move laterally, exploring or compromising interconnected systems and widening the attack.

Mobile and remote access

Even before COVID-19, mobile and remote access techniques were already being incorporated into managing a growing number of OT networks. When the pandemic hit hard, remote access often became a necessity. As the crisis faded, however, mobile and remote access became even more entrenched.

Remote access points are especially vulnerable. For one, remote access software can contain its own security vulnerabilities, like unpatched flaws and bugs or misconfigurations. Attackers may find openings in VPNs (virtual private networks) or RDP (remote desktop protocol) and use these holes to slip past security measures and carry out their mission.

Access controls

Attackers can compromise access control mechanisms to acquire the same permissions and privileges as authorized users, and once they gain access, they can do pretty much anything they want regarding system operations and data access. Access can be gained in many of the usual ways, such as an outdated VPN or stolen or purchased credentials. (Stolen or other credentials are readily available through online markets.)

The initial attack may just be a toe in the network while reconnaissance for holes in the access control system is conducted. Weak passwords, unnecessary access rights, and the usual misconfigurations and software vulnerabilities are all an attacker needs. As further walls are breached, attackers can then escalate their level of privilege to do whatever a legitimate user can do.

Understanding attack techniques in ICS HMI cybersecurity

Code injection

When attackers insert or inject malicious code into a software program or system, that’s code injection, and it can give the attacker access to core system functions. The resulting mayhem can include manipulation of control software, leading to shutdowns, equipment damage, and dangerous, even life-threatening situations if system changes result in hazardous chemical releases, changed formulas, explosions, or the misbehavior of large, heavy machinery. Code injections can corrupt, delete, or steal data and may result in compliance failure and fines in certain situations.

Malware virus infection

Malware can enter a network through various access points in addition to HMIs, even ones no one would ever expect, such as manufacturer-provided software updates or factory-fresh physical assets added to the production environment. A technician connecting a laptop or an employee plugging in a flash drive without knowing it’s infected will work just as well. As the walls between IT and OT thin, that attack surface widens as well. Once in the network, the attacker can escalate privileges, look around a bit, and see what’s worth doing or stealing. When enough has been learned, the attacker executes the malicious code, which can include ransomware or spyware. As in other attacks, operations can be interfered with, sometimes dangerously so.

Data tampering

Data tampering simply means that data is altered without authorization, including data used to operate, control, and monitor industrial systems. Attackers gain access through vulnerabilities in the system software or HMI devices or through passageways between IT and OT. Once in, they can explore the system to give themselves even greater access to more sensitive areas, where they can steal valuable and confidential system data, interrupt operations, compromise equipment, and damage the company’s business interests and competitive advantage.

Memory corruption

Memory corruption can happen in any computer network and may not represent anything nefarious. Yet memory corruption has also been used as an attack technique that can be deployed against OT networks and is thus potentially extremely damaging since data controls machinery, processes, formulas, and other essential functions. Attackers find software vulnerabilities in HMI or other access points through which the memory of an application or system can be reached and corrupted. This can lead to crashes, data leakage, denial of services (DoS), and even attacker takeovers of ICS and SCADA systems.

Spear phishing

Spear phishing attacks are generally launched against IT networks, which can then be used to open a corridor to the OT network. Spear phishing is basically a more targeted version of phishing attacks, in which an attacker will impersonate a legitimate, trusted source via email or web page, for example. In 2014, attackers targeted a German steel mill with an email suspected of carrying malicious code. They then used access to the business network to get to the SCADA/ICS network, where they modified the PLCs (programmable logic controllers) and took over the furnace’s operations. The physical damage they inflicted forced the plant to shut down.

DoS and DDoS attacks

Denial of Service (DoS) and Distributed Denial of Service (DDoS) work by overwhelming HMI points with excessive traffic or requests so they are unable to handle authorized control and monitoring functions. In 2016, some particularly vicious malware dubbed Industroyer (also Crashoveride) was deployed in an attack against Ukraine’s power grid and blacked out a substantial section of Kyiv. Industroyer was developed specifically to attack ICS and SCADA systems. The multipronged attack began by exploiting vulnerabilities in digital substation relays. A timer regulating the attack executed a distributed denial-of-service (DDoS) attack on every protection relay on the network that used any of four specific communication protocols. Simultaneously, it deleted all MicroSCADA-related files from the workstations’ hard drives. As the relays stopped functioning, lights went out across the city.

Exploiting remote access

The growing use of remote access to HMI systems during and after COVID-19 has provided threat actors with a wealth of newly available attack vectors. Less-than-airtight remote access security protocols make them very enticing for ICS-specific malware. HAVEX malware, for example, uses a remote access trojan (RAT) downloaded from OT vendor websites. The RAT can then scan for devices on the ports commonly used OT assets, collect information, and send it back to the attacker’s command and control server. A long-term attack used just such a method to gain remote access to energy networks in the U.S. and internationally, during which data thieves collected and “exfiltrated” (stole) enterprise and ICS-related data.

Credential theft

Obtaining unauthorized credentials is not all that difficult these days, with a robust online marketplace making it easier than ever. Phishing and spear phishing, malware, weak passwords, and vulnerabilities or misconfigurations that grant access to places where unencrypted credentials are all sources. With credentials in hand, attackers can move past security, including MFA (multifactor authentication), conduct reconnaissance, and give themselves whatever level of privilege they need to complete whatever their mission is. Or they simply persist and observe, learning all they can before finally acting against the ICS or SCADA system.

Zero-day attacks

Zero-day attacks got their name because they’re generally carried out against a previously existing yet unknown vulnerability; the vendor has zero days to fix it because the attack is already underway. Vulnerabilities that are completely unknown to either the software developer or the cybersecurity community exist throughout the software world, including in OT networks and their HMIs. Unsuspected and thus unpatched, they give fast-moving threat actors the opportunity to carry out a zero-day attack without resistance. The 2010 Stuxnet attack against Iran’s nuclear program used zero-day vulnerabilities in Windows to access the network and spread, eventually destroying the centrifuges. One thousand machines sustained physical damage.

Best practices for enhancing HMI security

Network segmentation for isolation

Network segmentation should be a core defense in securing industrial networks. Segmentation creates an environment that’s naturally resistant to intruders. Many of the attack techniques described above give attackers the ability to move laterally through the network. Segmenting the network prevents this lateral movement, limiting the attack radius and potential for damage. As OT networks become more connected to the world and the line between IT and OT continues to blur, network segmentation can segregate HMI systems from other parts of the network and the outside world. It can also segment defined zones within the OT network from each other so attacks can be contained.

Software and firmware updates

Software and firmware updates are recommended in all cybersecurity situations, but installing patches and updates in OT networks is easier said than done. OT networks prioritize continuous operations. There are compatibility issues, unpatchable legacy systems, and other roadblocks. The solution is virtual patching. Virtual patching is achieved by identifying all vulnerabilities within an OT network and applying a security mechanism such as a physical IPS (intrusion prevention system) or firewall. Rules are created, traffic is inspected and filtered, and attacks can be blocked and investigated.

Employee training on cybersecurity awareness

The more employees know about network operations, vulnerabilities, and cyberattack methods, the more they can do to help protect the network. Since few organizations have the internal staff to provide the necessary training, third-party training partners can be a viable solution. In any event, all employees should be trained in a company’s written policies, the general threat landscape, security best practices, how to handle physical assets like flash drives or laptops, how to recognize an attack, and what the company’s response protocol is. Specific training should be provided for employees who work remotely.

The evolving HMI security threat landscape

Concrete predictions about future threats and responses are hard to make, but the HMI security threat landscape will most likely evolve much the same way the entire security landscape will, with one major addition.

Air-gapped environments are going away

For a long time, many OT networks were air-gapped off from the world, physically and digitally isolated from the risks of contamination. Data and malware transfer alike required physical media, but inconvenience was safety. As OT networks continue to merge with the connected world, that kind of protection is going away. Remote work is becoming more prevalent, and the very connected IoT (Internet of Things) is now all over the automated factory floor. If wireless access points are left hanging from equipment, no one gives it a thought, except threat actors looking for a way in. (This is where basic employee training might help.)

Threat actors are innovators

Threat actors are becoming increasingly sophisticated. They devote much more time and thought to innovative ways to penetrate HMI and other OT network points than the people who operate them. AI and machine learning techniques are further empowering bad actors.

The statistics bear this out, especially as IT and OT networks continue to converge. In a study on 2023 OT/ICS cybersecurity activities, 76% of organizations were moving toward converged networks, and 97% reported IT security incidents also affected OT environments. Nearly half (47%) of businesses reported OT/ICS ransomware attacks, and 76% had significant concerns about state-sponsored actors.

On the positive side, however, pressure from regulators, insurance companies, and boards of directors is pushing organizations to think and act on cybersecurity for HMI points and throughout the network far more aggressively than many currently do. According to the study, 68% of organizations were increasing their budgets, 38% had dedicated OT security teams, and 77% had achieved a level-3 maturity in OT/ICS security.

Complete OT security

Cybersecurity in industrial environments presents challenges far different than those in IT networks. TXOne specializes in OT cybersecurity, with OT-native solutions designed for the equipment, environment, and day-to-day realities of industrial settings.

The post Enhancing HMI Security: How To Protect ICS Environments From Cyber Threats appeared first on Semiconductor Engineering.

Semiconductor Engineering
Chip Industry Week In ReviewThe SE Staff
Synopsys refocused its security priorities around chips, striking a deal to sell off its Software Integrity Group subsidiary to private equity firms Clearlake Capital Group and Francisco Partners for about $2.1 billion. That deal comes on the heels of Synopsys’ recent acquisition of Intrinsic ID, which develops physical unclonable function IP. Sassine Ghazi, Synopsys’ president and CEO, said in an interview that the sale of the software group “gives us the ability to have management bandwidth, c
10. Květen 2024 v 09:01

Chip Industry Week In Review

Semiconductor Engineering

Od: The SE Staff

10. Květen 2024 v 09:01

Synopsys refocused its security priorities around chips, striking a deal to sell off its Software Integrity Group subsidiary to private equity firms Clearlake Capital Group and Francisco Partners for about $2.1 billion. That deal comes on the heels of Synopsys’ recent acquisition of Intrinsic ID, which develops physical unclonable function IP. Sassine Ghazi, Synopsys’ president and CEO, said in an interview that the sale of the software group “gives us the ability to have management bandwidth, capital, and to double down on what we’re doing in our core business.”

The U.S. Commerce Department reportedly pulled export licenses from Intel and Qualcomm that permitted them to ship semiconductors to Huawei, the Financial Times reported. The move comes after advanced chips from Intel reportedly were used in new laptops and smartphones from the China-based company.

Apple debuted its second-generation 3nm M4 chip with the launch of the new iPad Pro. The CPU and GPU each have up to 10 cores, with a neural engine capable of 38 TOPS, and a total of 28 billion transistors. Apple also is working with TSMC to develop its own AI processors for running software in data centers, reports The Wall Street Journal.

The U.S. is expected to triple its semiconductor manufacturing capacity by 2032, according to a new report by the Semiconductor Industry Association and Boston Consulting. By that year, the U.S. is projected to have 28% of global capacity for advanced logic manufacturing and over a quarter of total global capital expenditures.

Fig. 1: Source: Semiconductor Industry Association and Boston Consulting Group.

Quick links to more news:

Global
Market Reports
Automotive
Security
Product News
Education and Training
Research
In-Depth
Events
Further Reading

Around The Globe

The U.S. Commerce Department plans to solicit bids from organizations interested in creating and managing a new CHIPS Manufacturing USA institute focused on digital twins in the semiconductor sector. The government will award up to $285 million to the selected proposal.

The U.S. National Science Foundation and Department of Energy announced the first 35 projects to be supported with computational time through the National Artificial Intelligence Research Resource (NAIRR) Pilot. The initial selected projects will gain access to several U.S. supercomputing centers and other resources, with the goal of advancing responsible AI research.

Through its new Federal AI Sandbox, MITRE is offering up its computing power to U.S. government agencies. “Our new Federal AI Sandbox will help level the playing field, making the high-quality compute power needed to train and test custom AI solutions available to any agency,” stated Charles Clancy, MITRE, senior vice president and chief technology officer, in the release.

Saudi Arabia’s $100 billion investment fund for semiconductor and AI technology pledged it would divest from China if requested by the U.S, reported Bloomberg.

Japan’s SoftBank is holding talks with UK-based AI Chip firm Graphcore about a possible acquisition, reports Bloomberg.

India’s chip industry is heating up. Mindgrove launched the country’s first SoC, named Secure IoT. The chip clocks at 700 MHz, and the company is touting its key security algorithms, secure boot, and on-chip OTP memory. Meanwhile, Lam Research is expanding its global semiconductor fabrication supply chain to include India.

Microsoft will build a $3.3 billion AI data center in Racine, Wisconsin, the same location as the failed Foxconn investment touted six years ago.

Markets And Money

The SIA announced first-quarter global semiconductor sales grew more than 15% YoY, still 5.7% below Q4 2023, but a big improvement over last year. Consider that the semiconductor materials market contracted 8.2% in 2023 to $66.7 billion, down from a record $72.7 billion in 2022, according to a new report from SEMI.

The demand for AI-powered consumer electronics will drive global AI chipset shipments to 1.3 billion by 2030, according to ABI Research.

TrendForce released several new industry reports this week. Among the highlights:

HBM prices are expected to increase by up to 10% in 2025, representing more than 30% of total DRAM value.
In Q2, DRAM contract prices rose 13% to 18%, while NAND flash prices increased 15% to 20%.
The top 10 design firms’ combined revenue increased 12% in 2023, with NVIDIA taking the lead for the first time.

A number of acquisitions were announced recently:

High-voltage IC company, Power Integrations, will purchase the assets of Odyssey Semiconductor Technologies, a developer of gallium nitride (GaN) transistors.
Mobix Labs agreed to buy RF design company RaGE Systems for $20 million in cash, stock, and incentives.
V-Tek, a packaging services and inspection company, acquired A&J Programming, a manufacturer of automated handling and programming equipment.