FreshRSS

Zobrazení pro čtení

Jsou dostupné nové články, klikněte pro obnovení stránky.

Next-Gen High-Speed Communication In Data Centers

Data centers are being flooded with data. While more of it needs to be processed locally, much of it also needs to be moved around within a system and between systems. This has put a spotlight on a variety of new optical technologies and methodologies. Yang Zhang, senior product marketing manager at Cadence, talks about the rapid increase in different types of optics and optical scenarios being developed to improve performance, reduce latency, and reduce overall power.

The post Next-Gen High-Speed Communication In Data Centers appeared first on Semiconductor Engineering.

Blog Review: Aug. 21

Cadence’s Reela Samuel explores the critical role of PCIe 6.0 equalization in maintaining signal integrity and solutions to mitigate verification challenges, such as creating checkers to verify all symbols of TS0, ensuring the correct functioning of scrambling, and monitoring phase and LTSSM state transitions.

Siemens’ John McMillan introduces an advanced packaging flow for Intel’s Embedded Multi-die Interconnect Bridge (EMIB) technology, including technical challenges, design methodologies, and the integration of EMIBs in system-level package designs.

Synopsys’ Dustin Todd checks out what’s next for the U.S. CHIPS and Science Act, including the establishment of the National Semiconductor Technology Center and the allocation of $13 billion for research and development efforts.

Keysight’s Roberto Piacentini Filho explores the challenges of managing the large design files and massive volumes of data involved in a modern chip design project, which can take up as much as a terabyte of disk space and involve hundreds of thousands of files.

Arm’s Sandeep Mistry shows how ML models developed for mobile computer vision applications and requiring tens to hundreds of millions of multiply-accumulate (MACs) operations per inference can be deployed to a modern microcontroller.

Ansys’ Aliyah Mallak explores an effort to manufacture biotech products in microgravity and how simulation helps ensure payloads containing delicate, temperature-sensitive spore samples and bioreactors make it safely to the International Space Station or low Earth orbit safely.

Micron Technology’s Amit Srivastava, ULVAC’s Brian Coppa, and SEMI’s Mark da Silva suggest tackling corporate sustainability goals with a bottom-up approach that leverages various sensing technologies, at the cleanroom, sub-fab, and facilities levels for both greenfield and brownfield device-making facilities, to enable predictive analytics.

And don’t miss the blogs featured in the latest Manufacturing, Packaging & Materials newsletter:

Amkor’s JeongMin Ju shows how to prevent critical failures in copper RDLs caused by overcurrent-induced fusing.

Synopsys’ Al Blais discusses curvilinear checking and fracture requirements for the MULTIGON era.

Lam Research’s Dempsey Deng compares the parasitic capacitance of a 6F2 honeycomb DRAM device to a 4F2 VCAT DRAM structure.

Brewer Science’s Jessica Albright covers debonding methods, thermal, topography, adhesion, and thickness variation.

SEMI’s John Cooney reviews a fireside chat between the President of SEMI Americas and the U.S. Under Secretary of State for Economic Growth, Energy, and the Environment on securing supply chains.

The post Blog Review: Aug. 21 appeared first on Semiconductor Engineering.

3.5D: The Great Compromise

The semiconductor industry is converging on 3.5D as the next best option in advanced packaging, a hybrid approach that includes stacking logic chiplets and bonding them separately to a substrate shared by other components.

This assembly model satisfies the need for big increases in performance while sidestepping some of the thorniest issues in heterogeneous integration. It establishes a middle ground between 2.5D, which already is in widespread use inside of data centers, and full 3D-ICs, which the chip industry has been struggling to commercialize for the better part of a decade.

A 3.5D architecture offers several key advantages:

  • It creates enough physical separation to effectively address thermal dissipation and noise.
  • It provides a way to add more SRAM into high-speed designs. SRAM has been the go-to choice for processor cache since the mid-1960s, and remains an essential element for faster processing. But SRAM no longer scales at the same rate as digital transistors, so it is consuming more real estate (in percentage terms) at each new node. And because the size of a reticle is fixed, the best available option is to add area by stacking chiplets vertically.
  • By thinning the interface between processing elements and memory, a 3.5D approach also can shorten the distances that signals need to travel and greatly improve processing speeds well beyond a planar implementation. This is essential with large language models and AI/ML, where the amount of data that needs to be processed quickly is exploding.

Chipmakers still point to fully integrated 3D-ICs as the best performing alternative to a planar SoC, but packing everything into a 3D configuration makes it harder to deal with physical effects. Thermal dissipation is probably the most difficult to contend with. Workloads can vary significantly, creating dynamic thermal gradients and trapping heat in unexpected places, which in turn reduce the lifespan and reliability of chips. On top of that, power and substrate noise become more problematic at each new node, as do concerns about electromagnetic interference.

“What the market has adopted first is high-performance chips, and those produce a lot of heat,” said Marc Swinnen, director of product marketing at Ansys. “They have gone for expensive cooling systems with a huge number of fans and heat sinks, and they have opted for silicon interposers, which arguably are some of the most expensive technologies for connecting chips together. But it also gives the highest performance and is very good for thermal because it matches the coefficient of thermal expansion. Thermal is one of the big reasons that’s been successful. In addition to that, you may want bigger systems with more stuff that you can’t fit on one chip. That’s just a reticle-size limitation. Another is heterogeneous integration, where you want multiple different processes, like an RF process or the I/O, which don’t need to be in 5nm.”

A 3.5D assembly also provides more flexibility to add additional processor cores, and higher yield because known good die can be manufactured and tested separately, a concept first pioneered by Xilinx in 2011 at 28nm.

3.5D is a loose amalgamation of all these approaches. It can include two to three chiplets stacked on top of each other, or even multiple stacks laid out horizontally.

“It’s limited vertical, and not just for thermal reasons,” said Bill Chen, fellow and senior technical advisor at ASE Group. “It’s also for performance reasons. But thermal is the limiting factor, and we’ve talked about many different materials to help with that — diamond and graphene — but that limitation is still there.”

This is why the most likely combination, at least initially, will be processors stacked on SRAM, which simplifies the cooling. The heat generated by high utilization of different processing elements can be removed with heat sinks or liquid cooling. And with one or more thinned out substrates, signals will travel shorter distances, which in turn uses less power to move data back and forth between processors and memory.

“Most likely, this is going to be logic over memory on a logic process,” said Javier DeLaCruz, fellow and senior director of Silicon Ops Engineering at Arm. “These are all contained within an SoC normally, but a portion of that is going to be SRAM, which does not scale very well from node to node. So having logic over memory and a logic process is really the winning solution, and that’s one of the better use cases for 3D because that’s what really shortens your connectivity. A processor generally doesn’t talk to another processor. They talk to each other through memory, so having the memory on a different floor with no latency between them is pretty attractive.”

The SRAM doesn’t necessarily have to be at the same node as the processors advanced node, which also helps with yield, and reliability. At a recent Samsung Foundry event, Taejoong Song, the company’s vice president of foundry business development, showed a roadmap of a 3.5D configuration using a 2nm chiplet stacked on a 4nm chiplet next year, and a 1.4nm chiplet on top of a 2nm chiplet in 2027.


Fig. 1: Samsung’s heterogeneous integration roadmap showing stacked DRAM (HBM), chiplets and co-packaged optics. Source: Samsung Foundry

Intel Foundry’s approach is similar in many ways. “Our 3.5D technology is implemented on a substrate with silicon bridges,” said Kevin O’Buckley, senior vice president and general manager of Foundry Services at Intel. “This is not an incredibly costly, low-yielding, multi-reticle form-factor silicon, or even RDL. We’re using thin silicon slices in a much more cost-efficient fashion to enable that die-to-die connectivity — even stacked die-to-die connectivity — through a silicon bridge. So you get the same advantages of silicon density, the same SI (signal integrity) performance of that bridge without having to put a giant monolithic interposer underneath the whole thing, which is both cost- and capacity-prohibitive. It’s working. It’s in the lab and it’s running.”


Fig. 2: Intel’s 3.5D model. Source: Intel

The strategy here is partly evolutionary — 3.5D has been in R&D for at least several years — and part revolutionary, because thinning out the interconnect layer, figuring out a way to handle these thinner interconnect layers, and how to bond them is still a work in progress. There is a potential for warping, cracking, or other latent defects, and dynamically configuring data paths to maximize throughput is an ongoing challenge. But there have been significant advances in thermal management on two- and three-chiplet stacks.

“There will be multiple solutions,” said C.P. Hung, vice president of corporate R&D at ASE. “For example, besides the device itself and an external heat sink, a lot of people will be adding immersion cooling or local liquid cooling. So for the packaging, you can probably also expect to see the implementation of a vapor chamber, which will add a good interface from the device itself to an external heat sink. With all these challenges, we also need to target a different pitch. For example, nowadays you see mass production with a 45 to 40 pitch. That is a typical bumping solution. We expect the industry to move to a 25 to 20 micron bump pitch. Then, to go further, we need hybrid bonding, which is a less than 10 micron pitch.”


Fig. 3: Today’s interposers support more than 100,000 I/Os at a 45m pitch. Source: ASE

Hybrid bonding solves another thorny problem, which is co-planarity across thousands of micro-bumps. “People are starting to realize that the densities we’re interconnecting require a level of flatness, which the guys who make traditional things to be bonded are having a hard time meeting with reasonable yield,” David Fromm, COO at Promex Industries. “That makes it hard to build them, and the thinking is, ‘So maybe we’ve got to do something else.’ You’re starting to see some of that.”

Taming the Hydra
Managing heat remains a challenge, even with all the latest advances and a 3.5D assembly, but the ability to isolate the thermal effects from other components is the best option available today, and possibly well into the future. Still, there are other issues to contend with. Even 2.5D isn’t easy, and a large percentage of the 2.5D implementations have been bespoke designs by large systems companies with very deep pockets.

One of the big remaining challenges is closing timing so that signals arrive at the right place at the right fraction of a second. This becomes harder as more elements are added into chips, and in a 3.5D or 3D-IC, this can be incredibly complex.

“Timing ultimately is the key,” said Sutirtha Kabir, R&D director at Synopsys. “It’s not guaranteed that at whatever your temperature is, you can use the same library for timing. So the question is how much thermal- and IR-aware timing do you have to do? These are big systems. You have to make sure your sign-off is converging. There are two things coming out. There are a bunch of multi-physics effects that are all clumped together. And yes, you could traditionally do one at a time as sign-off, but that isn’t going to work very well. You need to figure out how to solve these problems simultaneously. Ultimately, you’re doing one design. It’s not one for thermal, one for IR, one for timing. The second thing is the data is exploding. How do you efficiently handle the data, because you cannot wait for days and days of runtime and simulation and analysis?”

Physically assembling these devices isn’t easy, either. “The challenge here is really in the thermal, electrical, and mechanical connection of all these various die with different thicknesses and different coefficients of thermal expansion,” said Intel’s O’Buckley. “So with three die, you’ve got the die and an active base, and those are substantially thinned to enable them to come together. And then EMIB is in the substrate. There’s always intense thermal-mechanical qualification work done to manage not just the assembly, but to ensure in the final assembly — the second-level assembly when this is going through system-level card attach — that this thing stays together.”

And depending upon demands for speed, the interconnects and interconnect materials can change. “Hybrid bonding gives you, by far, the best signal and power density,” said Arm’s DeLaCruz. “And it gives you the best thermal conductivity, because you don’t have that underfill that you would otherwise have to put in between the die, which is a pretty significant barrier. This is likely where the industry will go. It’s just a matter of having the production base.”

Hybrid bonding has been used for years for image sensors using wafer-on-wafer connections. “The tricky part is going into the logic space, where you’re moving from wafer-on-wafer to a die-on-wafer process, which is more complex,” DeLaCruz said. “While it currently would cost more, that’s a temporary problem because there’s not much of an installed base to support it and drive down the cost. There’s really no expensive material or equipment costs.”

Toward mass customization
All of this is leading toward the goal of choosing chiplets from a menu and then rapidly connecting them into some sort of architecture that is proven to work. That may not materialize for years. But commercial chiplets will show up in advanced designs over the next couple years, most likely in high-bandwidth memory with a customized processor in the stack, with more following that path in the future.

At least part of this will depend on how standardized the processes for designing, manufacturing, and testing become. “We’re seeing a lot of 2.5D from customers able to secure silicon interposers,” said Ruben Fuentes, vice president for the Design Center at Amkor Technology. “These customers want to place their chiplets on an interposer, then the full module is placed on a flip-chip substrate package. We also have customers who say they either don’t want to use a silicon interposer or cannot secure them. They prefer an RDL interconnect with S-SWIFT or with S-Connect, which serves as an interposer in very dense areas.”

But with at least a third of these leading designs only for internal use, and the remainder confined to large processor vendors, the rest of the market hasn’t caught up yet. Once it does, that will drive economies of scale and open the door to more complete assembly design kits, commercial chiplets, and more options for customization.

“Everybody is generally going in the same direction,” said Fuentes. “But not everything is the same height. HBMs are pre-packaged and are taller than ICs. HBMs could have 12 or 16 ICs stacked inside. It makes a difference from a co-planarity and thermal standpoint, and metal balancing on different layers. So now vendors are having a hard time processing all this data because suddenly you have these huge databases that are a lot bigger than the standard packaging databases. We’re seeing bridges, S-Connect, SWIFT, and then S-SWIFT. This is new territory, and we’re seeing a performance gap in the packaging tools. There’s work that needs to be done here, but software vendors have been very proactive in finding solutions. Additionally, these packages need to be routed. There is limited automated routing, so a good amount of interactive routing is still required, so it takes a lot of time.”


Fig. 4: Packaging roadmap showing bridge and hybrid bonding connections for modules and chiplets, respectively. Source: Amkor Technology

What’s missing
The key challenges ahead for 3.5D are proven reliability and customizability — requirements that are seemingly contradictory, and which are beyond the control of any single company. There are four major pieces to making all of this work.

EDA is the first important piece of the puzzle, and the challenge extends just beyond a single chip. “The IC designers have to think about a lot of things concurrently, like thermal, signal integrity, and power integrity,” said Keith Lanier, technical product management director at Synopsys. “But even beyond that, there’s a new paradigm in terms of how people need to work. Traditional packaging folks and IC designers need to work closely together to make these 3.5D designs successful.”

It’s not just about doing more with the same or fewer people. It’s doing more with different people, too. “It’s understanding the architecture definition, the functional requirements, constraints, and having those well-defined,” Lanier said. “But then it’s also feasibility, which includes partitioning and technology selection, and then prototyping and floor-planning. This is lots and lots of data that is required to be generated, and you need analysis-driven exploration, design, and implementation. And AI will be required to help designers and system design teams manage the sheer complexity of these 3.5D designs.”

Process/assembly design kits are a second critical piece, and this is likely to be split between the foundries and the OSATs. “If the customer wants a silicon interposer for a 2.5D package, it would be up to the foundry that’s going to manufacture the interposer to provide the PDK. We would provide the PDK for all of our products, such as S-SWIFT and S-Connect,” said Amkor’s Fuentes.

Setting realistic parameters is the third piece of the puzzle. While the type of processing elements and some of the analog functions may change — particularly those involving power and communication — most of the components will remain the same. That determines what can be pre-built and pre-tested, and the speed and ease of assembly.

“A lot of the standards that are being deployed, like UCIe interfaces and HBM interfaces are heading to where 20% is customization and 80% is on the shelf,” said Intel’s O’Buckley. “But we aren’t there today. At the scale that our customers are deploying these products, the economics of spending that extra time to optimize an implementation is a decimal point. It’s not leveraging 80/20 standards. We’ll get there. But most of these designs you can count on your fingers and toes because of the cost and scale required to do them. And until the infrastructure for standards-based chiplets gets mature, the barrier of entry for companies that want to do this without that scale is just too high. Still, it is going to happen.”

Ensuring processes are consistent is the fourth piece of the puzzle. The tools and the individual processes don’t need to change. “The customer has a ‘target’ for the outcome they want for a particular tool, which typically is a critical dimension measured by a metrology tool,” said David Park, vice president of marketing at Tignis. “As long as there is some ‘measurement’ that determines the goodness of some outcome, which typically is the result of a process step, we can either predict the bad outcome — and engineers have to take some corrective or preventive action — or we can optimize the recipe of that tool in real time to keep the result in the range they want.”

Park noted there is a recipe that controls the inputs. “The tool does whatever it is supposed to do,” he said. “Then you measure the output to see how far you deviated from the acceptable output.”

The challenge is that inside of a 3.5D system, what is considered acceptable output is still being defined. There are many processes with different tolerances. Defining what is consistent enough will require a broad understanding of how all the pieces work together under specific workloads, and where the potential weaknesses are that need to be adjusted.

“One of the problems here is as these densities get higher and the copper pillars get smaller, the amount of space you need between the pillar and the substrate have to be highly controlled,” said Dick Otte, president and CEO of Promex. “There’s a conflict — not so much with how you fabricate the chip, because it usually has the copper pillars on it — but with the substrate. A lot of the substrate technologies are not inherently flat. It’s the same issue with glass. You’ve got a really nice flat piece of glass. The first thing you’re going to do is put down a layer of metal and you’re going to pattern it. And then you put down a layer of dielectric, and suddenly you’ve got a lump where the conductor goes. And now, where do you put the contact points? So you always have the one plan which is going to be the contact point where all the pillars come in. But what if I only need one layer and I don’t need three?”

Conclusion
For the past decade, the chip industry has been trying to figure out a way to balance faster processing, domain-specific designs, limited reticle size, and the enormous cost of scaling an SoC. After investigating nearly every possible packaging approach, interconnect, power delivery method, substrate and dielectric material, 3.5D has emerged as the front runner — at least for now.

This approach provides the chip industry with a common thread on which to begin developing assembly design kits, commercial chiplets, and to fill in the missing tools and services throughout the supply chain. Whether this ultimately becomes a springboard for full 3D-ICs, or a platform on which to use 3D stacking more effectively, remains to be seen. But for the foreseeable future, large chipmakers have converged on a path forward to provide orders of magnitude performance improvements and a way to contain costs. The rest of the industry will be working to smooth out that path for years to come.

Related Reading
Intel Vs. Samsung Vs. TSMC
Foundry competition heats up in three dimensions and with novel technologies as planar scaling benefits diminish.
3D Metrology Meets Its Match In 3D Chips And Packages
Next-generation tools take on precision challenges in three dimensions.
Design Flow Challenged By 3D-IC Process, Thermal Variation
Rethinking traditional workflows by shifting left can help solve persistent problems caused by process and thermal variations.
Floor-Planning Evolves Into The Chiplet Era
Automatically mitigating thermal issues becomes a top priority in heterogeneous designs.

The post 3.5D: The Great Compromise appeared first on Semiconductor Engineering.

Chip Industry Technical Paper Roundup: August 20

New technical papers recently added to Semiconductor Engineering’s library:

Technical Paper Research Organizations
Design Technology Co-Optimization and Time-Efficient Verification for Enhanced Pin Accessibility in the Post-3-nm Node Samsung Electronics and Kyungpook National University (KNU)
Search-in-Memory (SiM): Reliable, Versatile, and Efficient Data Matching in SSD’s NAND Flash Memory Chip for Data Indexing Acceleration TU Dortmund, Academia Sinica, and National Taiwan University
Achieving Sustainability in the Semiconductor Industry: The Impact of Simulation and AI Lam Research
HMComp: Extending Near-Memory Capacity using Compression in Hybrid Memory Chalmers University of Technology and ZeroPoint Technologies
Retention-aware zero-shifting technique for Tiki-Taka algorithm-based analog deep learning accelerator Pohang University of Science and Technology, Korea University, and Kyungpook National University
Improvement of Contact Resistance and 3D Integration of 2D Material Field-Effect Transistors Using Semi-Metallic PtSe2 Contacts Yonsei University, Korea Advanced Institute of Science and Technology (KAIST), Lincoln University College, Korea Institute of Science and Technology (KIST), and Ewha Womans University
Ultra-steep slope cryogenic FETs based on bilayer graphene RWTH Aachen University, Forschungszentrum Julich, National Institute for Materials Science (Japan), and AMO GmbH

More Reading
Technical Paper Library home

The post Chip Industry Technical Paper Roundup: August 20 appeared first on Semiconductor Engineering.

Research Bits: Aug. 20

EUV mirror interference lithography

Researchers from the Paul Scherrer Institute developed an EUV lithography technique that can produce conductive tracks with a separation of just five nanometers by exposing the sample indirectly rather than directly.

Called EUV mirror interference lithography (MIL), the technique uses two mutually coherent beams that are reflected onto the wafer by two identical mirrors. The beams then create an interference pattern whose period depends on both the angle of incidence and the wavelength of the light. In addition to the 5nm resolution, the conductive tracks were found to have high contrast and sharp edges.

“Our results show that EUV lithography can produce extremely high resolutions, indicating that there are no fundamental limitations yet. This is really exciting since it extends the horizon of what we deem as possible and can also open up new avenues for research in the field of EUV lithography and photoresist materials,” said Dimitrios Kazazis of the Laboratory of X-ray Nanoscience and Technologies at PSI in a statement.

The method is currently too slow for industrial chip production and can produce only simple and periodic structures. However, the team sees it as a resource for early development of new photoresists and plans to continue research to improve its performance and capabilities. [1]

Artificial sapphire dielectrics

Researchers from Shanghai Institute of Microsystem and Information Technology created artificial sapphire dielectric wafers made of single-crystalline aluminum oxide (Al2O3).

“The aluminum oxide we created is essentially artificial sapphire, identical to natural sapphire in terms of crystal structure, dielectric properties and insulation characteristics,” said Tian Zi’ao, a researcher at SIMIT, in a release.

“By using intercalation oxidation technology on single-crystal aluminum, we were able to produce this single-crystal aluminum oxide dielectric material,” added Di Zengfeng, a researcher at SIMIT, in a release. “Unlike traditional amorphous dielectric materials, our crystalline sapphire can achieve exceptionally low leakage at just one-nanometer level.”

The researchers hope the improved dielectric properties could lead to more power-efficient devices. [2]

Accelerating computation on sparse data sets

Researchers from Lehigh University and Lawrence Berkeley National Laboratory developed specialized hardware that enables faster computation on data sets that have a high number of zero values, frequent in the fields of bioinformatics and physical sciences. The hardware is portable and can be integrated into general-purpose multi-core computers.

“The accelerating sparse accumulation (ASA) architecture includes a hardware buffer, a hardware cache, and a hardware adder. It takes two sparse matrices, performs a matrix multiplication, and outputs a sparse matrix. The ASA only uses non-zero data when it performs this operation, which makes the architecture more efficient. The hardware buffer and the cache allow the computer processor to easily manage the flow of data; the hardware adder allows the processor to quickly generate values to fill up the empty matrices,” explained Berkely Lab’s Ingrid Ockert in a press release. “Once these values are calculated, the ASA system produces an output. This operation is a building block that the researcher can then use in other functions. For instance, researchers could use these outputs to generate graphs or they could process these outputs through other algorithms such as a Sparse General Matrix-Matrix Multiplication (SpGEMM) algorithm.”

The ASA architecture could accelerate a variety of algorithms. Microbiome research is presented as an example, where it could be used to run metagenomic assembly and similarity clustering algorithms such as Markov Cluster Algorithms that quickly characterize the genetic markers of all of the organisms in a soil sample. [3]

References

[1] I. Giannopoulos, I. Mochi, M. Vockenhuber, Y. Ekinci & D. Kazazis. Extreme ultraviolet lithography reaches 5 nm resolution. Nanoscale, 12.08.2024 https://doi.org/10.1039/D4NR01332H

[2] Zeng, D., Zhang, Z., Xue, Z. et al. Single-crystalline metal-oxide dielectrics for top-gate 2D transistors. Nature (2024). https://doi.org/10.1038/s41586-024-07786-2

[3] Chao Zhang, Maximilian Bremer, Cy Chan, John M Shalf, and Xiaochen Guo. ASA: Accelerating Sparse Accumulation in Column-wise SpGEMM. ACM Transactions on Architecture and Code Optimization (TACO) Volume 19, Issue 4, Article No.: 49, Pages 1-24 https://doi.org/10.1145/3543068

The post Research Bits: Aug. 20 appeared first on Semiconductor Engineering.

A Generic Approach For Fuzzing Arbitrary Hypervisors

A technical paper titled “HYPERPILL: Fuzzing for Hypervisor-bugs by Leveraging the Hardware Virtualization Interface” was presented at the August 2024 USENIX Security Symposium by researchers at EPFL, Boston University, and Zhejiang University.

Abstract:

“The security guarantees of cloud computing depend on the isolation guarantees of the underlying hypervisors. Prior works have presented effective methods for automatically identifying vulnerabilities in hypervisors. However, these approaches are limited in scope. For instance, their implementation is typically hypervisor-specific and limited by requirements for detailed grammars, access to source-code, and assumptions about hypervisor behaviors. In practice, complex closed-source and recent open-source hypervisors are often not suitable for off-the-shelf fuzzing techniques.

HYPERPILL introduces a generic approach for fuzzing arbitrary hypervisors. HYPERPILL leverages the insight that although hypervisor implementations are diverse, all hypervisors rely on the identical underlying hardware-virtualization interface to manage virtual-machines. To take advantage of the hardware-virtualization interface, HYPERPILL makes a snapshot of the hypervisor, inspects the snapshotted hardware state to enumerate the hypervisor’s input-spaces, and leverages feedback-guided snapshot-fuzzing within an emulated environment to identify vulnerabilities in arbitrary hypervisors. In our evaluation, we found that beyond being the first hypervisor-fuzzer capable of identifying vulnerabilities in arbitrary hypervisors across all major attack-surfaces (i.e., PIO/MMIO/Hypercalls/DMA), HYPERPILL also outperforms state-of-the-art approaches that rely on access to source-code, due to the granularity of feedback provided by HYPERPILL’s emulation-based approach. In terms of coverage, HYPERPILL outperformed past fuzzers for 10/12 QEMU devices, without the API hooking or source-code instrumentation techniques required by prior works. HYPERPILL identified 26 new bugs in recent versions of QEMU, Hyper-V, and macOS Virtualization Framework across four device-categories.”

Find the technical paper here. Published August 2024. Distinguished Paper Award Winner.

Bulekov, Alexander, Qiang Liu, Manuel Egele, and Mathias Payer. “HYPERPILL: Fuzzing for Hypervisor-bugs by Leveraging the Hardware Virtualization Interface.” In 33rd USENIX Security Symposium (USENIX Security 24). 2024.

Further Reading
SRAM Security Concerns Grow
Volatile memory threat increases as chips are disaggregated into chiplets, making it easier to isolate memory and slow data degradation.

The post A Generic Approach For Fuzzing Arbitrary Hypervisors appeared first on Semiconductor Engineering.

A Novel Attack For Depleting DNN Model Inference With Runtime Code Fault Injections

A technical paper titled “Yes, One-Bit-Flip Matters! Universal DNN Model Inference Depletion with Runtime Code Fault Injection” was presented at the August 2024 USENIX Security Symposium by researchers at Peng Cheng Laboratory, Shanghai Jiao Tong University, CSIRO’s Data61, University of Western Australia, and University of Waterloo.

Abstract:

“We propose, FrameFlip, a novel attack for depleting DNN model inference with runtime code fault injections. Notably, Frameflip operates independently of the DNN models deployed and succeeds with only a single bit-flip injection. This fundamentally distinguishes it from the existing DNN inference depletion paradigm that requires injecting tens of deterministic faults concurrently. Since our attack performs at the universal code or library level, the mandatory code snippet can be perversely called by all mainstream machine learning frameworks, such as PyTorch and TensorFlow, dependent on the library code. Using DRAM Rowhammer to facilitate end-to-end fault injection, we implement Frameflip across diverse model architectures (LeNet, VGG-16, ResNet-34 and ResNet-50) with different datasets (FMNIST, CIFAR-10, GTSRB, and ImageNet). With a single bit fault injection, Frameflip achieves high depletion efficacy that consistently renders the model inference utility as no better than guessing. We also experimentally verify that identified vulnerable bits are almost equally effective at depleting different deployed models. In contrast, transferability is unattainable for all existing state-of-the-art model inference depletion attacks. Frameflip is shown to be evasive against all known defenses, generally due to the nature of current defenses operating at the model level (which is model-dependent) in lieu of the underlying code level.”

Find the technical paper here. Published August 2024. Distinguished Paper Award Winner.

Li, Shaofeng, Xinyu Wang, Minhui Xue, Haojin Zhu, Zhi Zhang, Yansong Gao, Wen Wu, and Xuemin Sherman Shen. “Yes, One-Bit-Flip Matters! Universal DNN Model Inference Depletion with Runtime Code Fault Injection.” In Proceedings of the 33th USENIX Security Symposium. 2024.

Related Reading
Why It’s So Hard To Secure AI Chips
Much of the hardware is the same, but AI systems have unique vulnerabilities that require novel defense strategies.

The post A Novel Attack For Depleting DNN Model Inference With Runtime Code Fault Injections appeared first on Semiconductor Engineering.

Uncovering A Significant Residual Attack Surface For Cross-Privilege Spectre-V2 Attacks

A technical paper titled “InSpectre Gadget: Inspecting the Residual Attack Surface of Cross-privilege Spectre v2” was presented at the August 2024 USENIX Security Symposium by researchers at Vrije Universiteit Amsterdam.

Abstract:

“Spectre v2 is one of the most severe transient execution vulnerabilities, as it allows an unprivileged attacker to lure a privileged (e.g., kernel) victim into speculatively jumping to a chosen gadget, which then leaks data back to the attacker. Spectre v2 is hard to eradicate. Even on last-generation Intel CPUs, security hinges on the unavailability of exploitable gadgets. Nonetheless, with (i) deployed mitigations—eIBRS, no-eBPF, (Fine)IBT—all aimed at hindering many usable gadgets, (ii) existing exploits relying on now-privileged features (eBPF), and (iii) recent Linux kernel gadget analysis studies reporting no exploitable gadgets, the common belief is that there is no residual attack surface of practical concern.

In this paper, we challenge this belief and uncover a significant residual attack surface for cross-privilege Spectre-v2 attacks. To this end, we present InSpectre Gadget, a new gadget analysis tool for in-depth inspection of Spectre gadgets. Unlike existing tools, ours performs generic constraint analysis and models knowledge of advanced exploitation techniques to accurately reason over gadget exploitability in an automated fashion. We show that our tool can not only uncover new (unconventionally) exploitable gadgets in the Linux kernel, but that those gadgets are sufficient to bypass all deployed Intel mitigations. As a demonstration, we present the first native Spectre-v2 exploit against the Linux kernel on last-generation Intel CPUs, based on the recent BHI variant and able to leak arbitrary kernel memory at 3.5 kB/sec. We also present a number of gadgets and exploitation techniques to bypass the recent FineIBT mitigation, along with a case study on a 13th Gen Intel CPU that can leak kernel memory at 18 bytes/sec.”

Find the technical paper here. Published August 2024. Distinguished Paper Award Winner.  Find additional information here on VU Amsterdam’s site.

Wiebing, Sander, Alvise de Faveri Tron, Herbert Bos, and Cristiano Giuffrida. “InSpectre Gadget: Inspecting the residual attack surface of cross-privilege Spectre v2.” In USENIX Security. 2024.

Further Reading
Defining Chip Threat Models To Identify Security Risks
Not every device has the same requirements, and even the best security needs to adapt.

The post Uncovering A Significant Residual Attack Surface For Cross-Privilege Spectre-V2 Attacks appeared first on Semiconductor Engineering.

Data Memory-Dependent Prefetchers Pose SW Security Threat By Breaking Cryptographic Implementations

A technical paper titled “GoFetch: Breaking Constant-Time Cryptographic Implementations Using Data Memory-Dependent Prefetchers” was presented at the August 2024 USENIX Security Symposium by researchers at University of Illinois Urbana-Champaign, University of Texas at Austin, Georgia Institute of Technology, University of California Berkeley, University of Washington, and Carnegie Mellon University.

Abstract:

“Microarchitectural side-channel attacks have shaken the foundations of modern processor design. The cornerstone defense against these attacks has been to ensure that security-critical programs do not use secret-dependent data as addresses. Put simply: do not pass secrets as addresses to, e.g., data memory instructions. Yet, the discovery of data memory-dependent prefetchers (DMPs)—which turn program data into addresses directly from within the memory system—calls into question whether this approach will continue to remain secure.

This paper shows that the security threat from DMPs is significantly worse than previously thought and demonstrates the first end-to-end attacks on security-critical software using the Apple m-series DMP. Undergirding our attacks is a new understanding of how DMPs behave which shows, among other things, that the Apple DMP will activate on behalf of any victim program and attempt to “leak” any cached data that resembles a pointer. From this understanding, we design a new type of chosen-input attack that uses the DMP to perform end-to-end key extraction on popular constant-time implementations of classical (OpenSSL Diffie-Hellman Key Exchange, Go RSA decryption) and post-quantum cryptography (CRYSTALS-Kyber and CRYSTALS-Dilithium).”

Find the technical paper here. Published August 2024.

Chen, Boru, Yingchen Wang, Pradyumna Shome, Christopher W. Fletcher, David Kohlbrenner, Riccardo Paccagnella, and Daniel Genkin. “GoFetch: Breaking constant-time cryptographic implementations using data memory-dependent prefetchers.” In Proc. USENIX Secur. Symp, pp. 1-21. 2024.

Further Reading
Chip Security Now Depends On Widening Supply Chain
How tighter HW-SW integration and increasing government involvement are changing the security landscape for chips and systems.

 

The post Data Memory-Dependent Prefetchers Pose SW Security Threat By Breaking Cryptographic Implementations appeared first on Semiconductor Engineering.

A New Low-Cost HW-Counterbased RowHammer Mitigation Technique

A technical paper titled “ABACuS: All-Bank Activation Counters for Scalable and Low Overhead RowHammer Mitigation” was presented at the August 2024 USENIX Security Symposium by researchers at ETH Zurich.

Abstract:

“We introduce ABACuS, a new low-cost hardware-counterbased RowHammer mitigation technique that performance-, energy-, and area-efficiently scales with worsening RowHammer vulnerability. We observe that both benign workloads and RowHammer attacks tend to access DRAM rows with the same row address in multiple DRAM banks at around the same time. Based on this observation, ABACuS’s key idea is to use a single shared row activation counter to track activations to the rows with the same row address in all DRAM banks. Unlike state-of-the-art RowHammer mitigation mechanisms that implement a separate row activation counter for each DRAM bank, ABACuS implements fewer counters (e.g., only one) to track an equal number of aggressor rows.

Our comprehensive evaluations show that ABACuS securely prevents RowHammer bitflips at low performance/energy overhead and low area cost. We compare ABACuS to four state-of-the-art mitigation mechanisms. At a nearfuture RowHammer threshold of 1000, ABACuS incurs only 0.58% (0.77%) performance and 1.66% (2.12%) DRAM energy overheads, averaged across 62 single-core (8-core) workloads, requiring only 9.47 KiB of storage per DRAM rank. At the RowHammer threshold of 1000, the best prior lowarea-cost mitigation mechanism incurs 1.80% higher average performance overhead than ABACuS, while ABACuS requires 2.50× smaller chip area to implement. At a future RowHammer threshold of 125, ABACuS performs very similarly to (within 0.38% of the performance of) the best prior performance- and energy-efficient RowHammer mitigation mechanism while requiring 22.72× smaller chip area. We show that ABACuS’s performance scales well with the number of DRAM banks. At the RowHammer threshold of 125, ABACuS incurs 1.58%, 1.50%, and 2.60% performance overheads for 16-, 32-, and 64-bank systems across all single-core workloads, respectively. ABACuS is freely and openly available at https://github.com/CMU-SAFARI/ABACuS.”

Find the technical paper here.

Olgun, Ataberk, Yahya Can Tugrul, Nisa Bostanci, Ismail Emir Yuksel, Haocong Luo, Steve Rhyner, Abdullah Giray Yaglikci, Geraldo F. Oliveira, and Onur Mutlu. “Abacus: All-bank activation counters for scalable and low overhead rowhammer mitigation.” In USENIX Security. 2024.

Further Reading
Securing DRAM Against Evolving Rowhammer Threats
A multi-layered, system-level approach is crucial to DRAM protection.

The post A New Low-Cost HW-Counterbased RowHammer Mitigation Technique appeared first on Semiconductor Engineering.

❌