Deep dive into Arm’s new Cortex-X925 and Immortalis-G925 for mobile

Arm Cortex-X925

Mobile chipset development continues to advance at a brisk pace, bringing us superior gaming performance, accelerating the latest AI features, and more power-efficient PCs. Arm, one of the companies charting the course, has announced its 2024 selection of CPU and GPU cores to power these growing use cases.

Some (but not all) of 2025’s next-generation top-tier smartphones will be powered by Arm’s newly announced cores. Arm has been giving out fewer details on its CPU and GPU technologies in recent years, but let’s examine the announcements in closer detail to see what we can expect.

The big one: Arm Cortex-X925 core

The flagship CPU in Arm’s 2024 portfolio is the powerhouse Arm Cortex-X925. Despite the name change, this is the direct successor to last generation’s Armv9.2 Cortex-X4 found in processors like the Qualcomm Snapdragon 8 Gen 3. We had anticipated this core to be called the Cortex-X5, but Arm has changed the moniker to match other products in this year’s portfolio.

Headline figures for the Arm Cortex-X925 include a 15% IPC (instructions per clock) improvement over the Cortex-X4. This extends to 36% once the gains from moving to 3nm manufacturing, higher clock speeds in excess of 3.6GHz, and larger caches are factored in. AI performance sees even bigger potential gains, running some models 46% faster on the CPU than the X4. The bottom line is that single-core CPU capabilities will see a significant uplift next-gen.

| | Cortex-X925 | Arm Cortex-X4 | Arm Cortex-X3 | Arm Cortex-X2 |
| --- | --- | --- | --- | --- |
| Peak clock speed | ~3.6GHz | ~3.4GHz | ~3.25GHz | ~3.0GHz |
| Decode Width | 10 instructions | 10 instructions | 6 instructions (8 mop) | 5 instructions |
| Dispatch Pipeline Depth | 10 cycles | 10 cycles | 11 cycles for instructions (9 cycles for mop) | 10 cycles |
| OoO Execution Window | 1,500 (2x 750) | 768 (2x 384) | 640 (2x 320) | 448 (2x 288) |
| Execution Units | (assumed) 6x ALU (some 2-cycle), 2x ALU/MAC, 2x ALU/MAC/DIV, 3x Branch | 6x ALU, 1x ALU/MAC, 1x ALU/MAC/DIV, 3x Branch | 4x ALU, 1x ALU/MUL, 1x ALU/MAC/DIV, 2x Branch | 2x ALU, 1x ALU/MAC, 1x ALU/MAC/DIV, 2x Branch |
| Architecture | ARMv9.2 | ARMv9.2 | ARMv9 | ARMv9 |

The gains from 3nm are an important part of the performance uplift expected for this generation. Arm has worked extensively to optimize its design for its partners on both FinFET and GAA processes (aka TSMC and Samsung). That leaves the 15% like-for-like improvement over the previous model, which comes down to several key changes in the X925’s microarchitecture.
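To get a rough feel for how the headline numbers fit together, here is a back-of-the-envelope sketch. It assumes the individual gains simply multiply, and it takes the quoted peak clocks (3.6GHz versus 3.4GHz) at face value, even though Arm's own test conditions may differ. The function name and the decomposition itself are illustrative assumptions, not Arm's methodology.

```python
# Figures quoted in the article.
TOTAL_UPLIFT = 1.36     # claimed overall single-core gain, X925 vs X4
IPC_UPLIFT = 1.15       # claimed like-for-like IPC gain
X925_CLOCK_GHZ = 3.6    # quoted peak clock, X925
X4_CLOCK_GHZ = 3.4      # quoted peak clock, X4


def residual_uplift(total, ipc, new_clock, old_clock):
    """Return the share of the total gain not explained by IPC and clock.

    Assumption: the factors combine multiplicatively, which is a
    simplification of how real workloads behave.
    """
    clock_uplift = new_clock / old_clock
    return total / (ipc * clock_uplift)


clock_gain = X925_CLOCK_GHZ / X4_CLOCK_GHZ
rest = residual_uplift(TOTAL_UPLIFT, IPC_UPLIFT, X925_CLOCK_GHZ, X4_CLOCK_GHZ)

print(f"clock uplift: {clock_gain - 1:.1%}")                  # 5.9%
print(f"left for caches and other factors: {rest - 1:.1%}")   # 11.7%
```

Under those assumptions, the clock bump alone is worth about 6%, leaving roughly 12% to be explained by larger caches and anything else that differs between the two test platforms.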

In the processing core, for example, the X925 now has six SIMD units (the powerful number crunchers that batch compute floating point math and AI workloads), up from four, allowing the core to do more heavy math in parallel. This likely accounts for most of the core’s AI/ML performance boost. There’s also an additional integer multiply unit and an extra floating point compare unit, which again increase the core’s sheer number-crunching capabilities when fully fed. Arm is reluctant to discuss die area size these days, but the X925 must be getting pretty big.

Arm Client 2024 CPU Reference Cluster

Credit: Robert Triggs / Android Authority

Another interesting change is that some of the ALUs have been switched to dedicated 2-cycle instruction versions. This helps avoid stalls in the regular 1-cycle units but presumably means that these ALUs can’t perform some of the simpler arithmetic. This seems like the sort of design change that only detailed use-case data would point to.

Instruction dispatch remains 10-wide, but Arm has doubled the X925’s maximum number of instructions in flight, now a colossal 1,500. Likewise, there’s twice the L1 instruction cache bandwidth and double the L1 instruction lookup table size to speed up instruction fetching. Meanwhile, the backend gains an extra load pipeline to bring more data in from memory. In other words, there are plenty of out-of-order instructions floating around to keep those number-crunching units busy.

That’s a lot of jargon, but the themes are very familiar from previous years: an ever-wider front end feeding an increasingly insatiable execution engine. In that sense, the X925 is an update to the X4 rather than a wholesale redesign. Even so, performance will take a solid leap forward again in 2025, though a fair chunk of the benefits also comes from the move to 3nm.

Power-efficient Arm Cortex-A725 and A520

Sadly, Arm hasn’t provided as many details about the equally important Cortex-A725, the new middle core that’ll form the backbone of upcoming mobile SoCs.

Arm claims that the A725 is 25% more efficient than the A720 and offers the option for higher peak performance if required. Again, though, this figure assumes the move to 3nm, and Arm hasn’t given us a standard metric for IPC performance gains. However, it claims a 20% boost to L3 traffic, which helps realize some extra performance.

On the microarchitecture level, Arm increased the re-order buffer and instruction issue queue sizes, improving throughput. A new 1MB L2 cache configuration also allows the core to reach a higher performance level. But if that’s it, the A725 is a minor revision of the A720, which was already an optimization of 2022’s A710 core.

Arm Cortex A725 efficiency graph

Credit: Robert Triggs / Android Authority

This leads us to the refreshed Cortex-A520, certainly the least exciting model in this year’s CPU trio. The core architecture remains unchanged. Instead, Arm has optimized the A520 footprint for upcoming 3nm processes, resulting in a 15% energy-efficiency gain.

Looking at Arm’s power efficiency curves, this generation has an even greater crossover between the Cortex-A725 and A520. While the A520 can still reach the very lowest power levels for standby and low-clock tasks, the A725 can deliver vastly more performance for the same power as a maxed-out A520. In other words, many tasks run much faster and just as efficiently on the A725. It’s little wonder that Arm’s 2024 reference design suggests just two A520s, further reducing the number of small cores from what we see in current-generation chipsets.

Vastly improved gaming with the Immortalis G925

Arm continues to upgrade its GPU line-up too, with the Immortalis G925, Mali G725, and Mali G625. As with last year’s range, silicon partners need to use a larger core count to ensure robust ray tracing performance and leverage the Immortalis branding. Ten to 24 cores (the maximum is up from 16 last gen) is classed as Immortalis, six to nine cores make a G725 implementation, and one to five cores make a budget G625 setup.
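The branding tiers boil down to a simple lookup on shader core count. The sketch below encodes the ranges quoted above; the function name and the choice to raise an error outside the 1 to 24 range are assumptions made for illustration.

```python
def classify_gpu(core_count: int) -> str:
    """Map a shader core count to Arm's 2024 GPU branding tiers.

    Ranges are the ones quoted in the article; anything outside
    1 to 24 cores is treated as invalid (an assumption of this sketch).
    """
    if 10 <= core_count <= 24:
        return "Immortalis-G925"
    if 6 <= core_count <= 9:
        return "Mali-G725"
    if 1 <= core_count <= 5:
        return "Mali-G625"
    raise ValueError(f"unsupported core count: {core_count}")


for cores in (4, 8, 14):
    print(cores, "cores ->", classify_gpu(cores))
```

A 14-core part, the configuration Arm uses in its own comparisons, lands in the Immortalis tier.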

Regardless of the configuration, each G925 core promises a 30% reduction in power consumption when built on 3nm, up to 37% improved performance, and a whopping 52% gain in ray tracing over last-gen’s Immortalis G720. That last metric has a big caveat: it requires developers to leverage new APIs to designate targets as “intricate objects,” which the G925 then traces with reduced fidelity. Think leaves or grass that are very expensive to compute individually but that players won’t notice if ray traced at lower accuracy. It’s a neat idea, but it depends entirely on developers knowing about the feature and then coding for it.

Arm Immortalis G925 Performance

In real-world games, Arm is claiming even more significant gains with 14 Immortalis G925 cores versus 12 older G720. Of course, that’s not a like-for-like comparison, so take it with a pinch of salt. But giving Arm the benefit of the doubt, I’d guess that you can fit 14 G925 cores in the space of 12 of the previous G720s, but that’s entirely my speculation.

Still, for just two more cores, Arm touts a 72% performance improvement in Call of Duty, 49% in Genshin Impact, 46% in Diablo Immortal, and a 29% gain in Fortnite. The key lies in the core’s new Fragment Prepass technique. The TL;DR is that this vastly improves hidden object culling (think a player or object hidden behind a wall), reducing CPU load for these big performance gains. Games with complex geometry benefit most, hence the performance differences between CoD and Fortnite.

If you want a more in-depth explanation, Arm has replaced the traditional Z-buffer Hidden Surface Removal (HSR) technique, like forward pixel kill or primitive re-ordering, with its fragment prepass technology. The key difference is that it removes the need to re-order the Z-buffer (depth buffer) to make culling decisions, reducing driver CPU cycles by up to 43% per thread. This is all done in hardware, meaning there is no overhead for developers, but it doesn’t benefit all games equally.
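Arm hasn’t detailed how the hardware works here, but the general idea of a depth prepass is easy to model in software. The sketch below is a toy illustration of that generic technique, not Arm’s implementation: a first pass records the nearest depth at each pixel, and a second pass runs the (expensive) shading step only for fragments that match it. The `Fragment` type, the function names, and the sample scene are all hypothetical.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    x: int
    y: int
    depth: float  # smaller means closer to the camera
    color: str    # stand-in for the result of an expensive shader


def depth_prepass(fragments):
    """Pass 1: record the nearest depth seen at each pixel."""
    nearest = {}
    for f in fragments:
        key = (f.x, f.y)
        if key not in nearest or f.depth < nearest[key]:
            nearest[key] = f.depth
    return nearest


def shade_visible(fragments, nearest):
    """Pass 2: shade only fragments that survived the depth test."""
    framebuffer = {}
    shaded = 0
    for f in fragments:
        if f.depth == nearest[(f.x, f.y)]:
            framebuffer[(f.x, f.y)] = f.color
            shaded += 1
    return framebuffer, shaded


# A wall in front of a player at pixel (0, 0); the player is
# also visible on its own at pixel (1, 0).
scene = [
    Fragment(0, 0, 1.0, "wall"),
    Fragment(0, 0, 5.0, "player"),
    Fragment(1, 0, 5.0, "player"),
]

nearest = depth_prepass(scene)
framebuffer, shaded = shade_visible(scene, nearest)
print(framebuffer)                      # {(0, 0): 'wall', (1, 0): 'player'}
print(f"shaded {shaded} of {len(scene)} fragments")
```

Note that no sorting of the incoming fragments is needed, which is the property the article highlights. The more geometry that is hidden, the more shading work is skipped, which fits the observation that complex scenes benefit most.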

What about AI?

Arm Immortalis G925 Machine Learning

No 2024 announcement is complete without AI, and Arm had a fair bit to say here despite not having a dedicated AI accelerator to augment its more traditional CPU and GPU parts. Instead, Arm is banking on the more developer-friendly and universal appeal of the CPU and, to a lesser extent, the GPU to tout its AI capabilities.

For instance, Arm points out that most third-party AI Android apps run on the CPU rather than an accelerator, as few have invested the development resources to support the numerous SoC API platforms. In lieu of a more universal API, Arm is banking on the CPU to remain an essential component for AI. That said, this is much easier to say when you don’t have skin in the mobile AI accelerator market.

Still, Arm has some performance numbers to trot out here. The Arm Cortex-X925 boasts a 42% faster time to first token with an 8-billion-parameter Llama 3 model and a 46% faster one with a 3.8-billion-parameter Phi 3 model. AI CPU inference is also up 59% compared to the Cortex-X4, with GPU inference capabilities receiving a 36% boost over last year’s reference platform. Similarly, the new GPU (in a 14-core versus 12-core configuration) is up to 50% faster in natural language processing, 41% faster in image segmentation, and 32% faster for speech-to-text.

Those are all very welcome improvements to help make AI apps more responsive, but it’s worth remembering that neither a CPU nor a GPU is as fast and efficient as a dedicated AI accelerator.

What to expect from next-gen products

Samsung Galaxy S24 homescreen in hand

Credit: Robert Triggs / Android Authority

Arm’s next-gen cores are destined for 2025 flagship smartphones, with Samsung and MediaTek likely to be the biggest mobile silicon vendors to leverage these cutting-edge technologies. Qualcomm is moving to a new custom CPU core for the Snapdragon 8 Gen 4, which means that the majority of flagship Android phones in 2025 probably won’t use Arm Cortex-X925 or Immortalis-G925.

Likewise, the upcoming major wave of Windows on Arm laptops is powered entirely by Qualcomm’s Snapdragon X Elite platform. Again, this platform uses custom CPU cores rather than Arm’s Cortex. Arm didn’t have much to say about specific plans for Arm-based PCs, likely given Qualcomm’s exclusivity deal with Microsoft, which is rumored to end in 2024. Still, it’s entirely possible that we might see other silicon vendors use Arm Cortex-X cores, quite possibly the new X925, for rival chipsets at some point in 2025. For instance, Arm envisions a PC chip with up to 12 Cortex-X925 CPU cores to push performance well beyond mobile.

Although Arm announced its latest client technologies in the first half of the year, partner chipsets will be announced near the end of 2024 at the earliest. Smartphones powered by the Cortex-X925 and/or Immortalis-G925 are expected to land in consumer hands in early 2025.

Arm’s new CPUs and GPUs will power 2025 phones, and here’s what to know

It’s that time of year again. Arm has revealed a new line of CPUs and GPUs that could power your flagship Android phone in 2025. What does this mean for your next phone, though? Join us as we dive into these new announcements and demystify them.

Want a deeper dive into Arm’s latest technologies? Then we’ve got a dedicated CPU and GPU article for you as well. But here are the key points worth knowing.

The biggest upgrade to a big core yet?

Arm Cortex-X925

Arm traditionally offers a Cortex-X core as its most powerful CPU core, and rumors pointed to the Cortex-X5 being the biggest year-on-year performance leap in Cortex-X history. Only, this new core isn’t called the Cortex-X5 but rather the Cortex-X925.

The new core brings a notable clock speed boost, reaching 3.6GHz compared to the Cortex X4’s maximum touted speed of 3.4GHz. The new CPU core also has a few other tweaks (e.g. four load pipelines, two-cycle ALUs) to ensure it stays fed with instructions.

X925 offers big performance gains for next-gen phones, but 3nm is a key component.

In any event, Arm is touting a significant 36% boost to single-core performance using the Geekbench 6 single-core score as a reference. This is compared to an unnamed “premium Android” device with the Cortex-X4 CPU. That uplift will definitely make for even more responsive flagship phones next year. However, this comparison has a couple of caveats, such as the X925’s aforementioned clock speed boost over the Cortex-X4 and the smaller 3nm manufacturing process.

Arm says you can expect a 15% boost to IPC (instructions per clock) performance when both cores are compared at the same clock speed and on the same manufacturing process. That’s generally in line with the leap from Cortex-X3 to Cortex-X4.

What about the medium and little cores?

The Arm Cortex-A520 at 3nm.

Credit: Supplied by Arm

In addition to the Cortex-X925, Arm has also announced the Cortex-A725 as a successor to the Cortex-A720 medium core. This will likely be the workhorse for most tasks on your high-end smartphone, so we’re glad to see a 25% efficiency boost over the previous CPU core. This is particularly good news for companies like MediaTek that intend to ditch power-sipping little CPU cores in their flagship chips.

The UK chip designer also touted a somewhat nebulous 35% “performance efficiency” gain over the Cortex-A720, describing this as the improvement in performance divided by the improvement in power at that performance. It didn’t dish out a traditional performance gain figure.
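Arm’s wording leaves room for interpretation, so treat the following as one plausible reading rather than the company’s definition: divide the performance ratio by the power ratio between the new and old core. The function name and the input numbers are hypothetical, chosen only to show that quite different combinations can produce the same 35% figure.

```python
def performance_efficiency_gain(perf_ratio: float, power_ratio: float) -> float:
    """Return the fractional gain in performance per unit of power.

    perf_ratio:  new performance / old performance
    power_ratio: new power / old power
    Assumption: "performance efficiency" means the ratio of these ratios.
    """
    return perf_ratio / power_ratio - 1


# Hypothetical scenario A: 8% faster while drawing 20% less power.
print(f"{performance_efficiency_gain(1.08, 0.80):.0%}")   # 35%

# Hypothetical scenario B: 35% faster at identical power.
print(f"{performance_efficiency_gain(1.35, 1.00):.0%}")   # 35%
```

This is why a combined metric is hard to interpret without a separate performance figure: it cannot tell you whether the gain shows up as speed, battery life, or a mix of both.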

So what about the little core? Arm is still offering the Cortex-A520, and it’s basically unchanged from previous generations. Arm mentioned a 15% efficiency gain, but this comes from optimizations for the 3nm manufacturing process. The key takeaway is that next-gen smartphones will be more frugal with power, but that’s mostly down to the move to 3nm.

A few small but key GPU upgrades

Arm’s Mali graphics parts are the backbone of many affordable phones today and a few high-end devices. Those expecting a giant step forward might be a little disappointed at first glance.

A big reason why the Immortalis-G925 GPU appears much more powerful than the current G720 is because Arm is increasing the GPU’s maximum shader core count for its comparisons. The current-generation Immortalis-G720 offers 10 to 16 shader cores, while the brand-new G925 sports options for 10 to 24 shader cores.

The Immortalis-G925 GPU seemingly takes a brute-force approach to boost performance, but also packs new features for in-game performance and efficiency.

Arm says you can expect 37% better performance in “graphics apps” or 30% less power consumption in leading games versus the G720. However, these figures compare a 14-core Immortalis-G925 to a 12-core Immortalis-G720, suggesting smaller gains in most of these areas when making an apples-to-apples comparison. Still, that performance boost is undoubtedly good for gamers.
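For a rough idea of what an apples-to-apples figure might look like, the sketch below scales Arm’s claims back by the core-count difference. It assumes performance scales linearly with shader core count, which real GPUs rarely manage perfectly, so read the results as a ballpark only. The function name is an illustrative assumption.

```python
def per_core_gain(total_gain: float, new_cores: int, old_cores: int) -> float:
    """Estimate the per-core gain hidden inside a mixed-core-count claim.

    Assumption: performance scales linearly with shader core count.
    """
    return (1 + total_gain) * old_cores / new_cores - 1


# Arm's comparison: 14-core Immortalis-G925 vs 12-core Immortalis-G720.
graphics = per_core_gain(0.37, new_cores=14, old_cores=12)
call_of_duty = per_core_gain(0.72, new_cores=14, old_cores=12)

print(f"graphics apps, per core: {graphics:.1%}")         # 17.4%
print(f"Call of Duty, per core:  {call_of_duty:.1%}")     # 47.4%
```

Under that linear-scaling assumption, the 37% headline shrinks to roughly 17% per core, while the largest in-game claim still works out to a sizable per-core improvement.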

The Arm-Immortalis G925 GPU.

Credit: Supplied by Arm

The company is serving up some real-world performance improvements thanks to a fragment prepass feature. This handles functionality related to object occlusion. This means up to 43% fewer CPU cycles are spent on the render thread. In English, this improvement significantly frees up the CPU, improving performance and efficiency in the process. This improvement is a big reason why you can apparently expect up to 72% better performance in Call of Duty Mobile, a 46% performance boost to Diablo Immortal, a 49% boost to Genshin Impact, and a 46% boost to Roblox.

Arm will also offer this GPU core as the mid-range Mali-G725, scaling from six to nine shader cores. Meanwhile, the Mali-G625 is available for low-end devices and scales from one to five shader cores. But we typically see chips for budget phones and Android TV boxes sticking with older Arm GPUs.

Who will actually use Arm’s new CPUs and GPUs?

A slide confirming that Qualcomm's mobile platform will get the Oryon CPU in 2024.

Credit: Hadlee Simons / Android Authority

Perhaps the biggest takeaway for Arm’s newest CPUs and GPUs is that they won’t appear in the Qualcomm Snapdragon 8 Gen 4 processor that’s tipped to power loads of high-end Android phones. This isn’t a surprise for the GPU, as Qualcomm has long used its own Adreno GPU for its Snapdragon processors. However, this is a major turn-of-events for the CPU side, as Snapdragon processors have used Arm Cortex CPU cores since 2017’s Snapdragon 835. Instead, Qualcomm will use its custom Oryon CPU in the 8 Gen 4.

That’s not to say we won’t see Arm’s Cortex CPU cores in Snapdragon silicon in 2025. However, Qualcomm typically uses older Cortex CPUs rather than the latest Arm cores in its mid-tier chipsets.

For the first time since 2016, top-end Snapdragon chips will use custom CPUs instead of Arm hardware.

So who will be using these newest CPU and GPU parts? MediaTek has long used the latest and greatest Arm CPUs and GPUs in its flagship Dimensity 9000 series of smartphone processors. That means the Taiwanese brand is a good bet to use these new parts in the upcoming Dimensity 9400. MediaTek went aggressive with the current Dimensity 9300, featuring four Cortex-X4 CPU cores and four Cortex-A720 CPU cores, with no little cores at all. It also adopted the Arm Immortalis-G720 GPU.

We’re also expecting Samsung to use Arm’s latest CPUs in the rumored Exynos 2500 processor. This chipset will likely power Galaxy S25 models in some regions. The S24’s Exynos 2400 processor opted for a rather exotic CPU arrangement, featuring one Cortex-X4, five Cortex-A720s, and four Cortex-A520 little cores. So it’s a safe bet that the Exynos 2500 could offer an eclectic CPU layout too. Samsung uses AMD-based GPUs instead of Arm’s graphics parts, and we expect this to be the case in 2025 as well.

Google uses Arm CPUs and GPUs for its Tensor smartphone processors, but the company tends to use older parts. Don’t be surprised if the Tensor G4, expected in the Pixel 9 series, continues this trend.

Laying the foundation for future Windows laptops

Arm Cortex-X925 PC performance

Credit: Supplied by Arm

Qualcomm has had an exclusivity deal with Microsoft since 2017, with Snapdragon chipsets exclusively powering Windows on Arm laptops. This deal is reportedly set to expire at the end of 2024, and AMD, NVIDIA, and MediaTek are apparently readying Arm-based Windows processors.

Therefore, it’s no surprise to see that Arm is touting the Cortex-X925 big core as delivering “ultimate performance for PCs.” Arm envisions a PC chip with up to 12 Cortex-X925 CPU cores for high-end performance. The company also claimed up to 25% more single-threaded performance compared to “shipping PC laptops” like the ASUS ROG Zephyrus G14 and Razer Blade 15.

If companies like MediaTek are indeed preparing to launch Windows on Arm chips using Arm CPUs in 2025, then there’s a good chance that the Cortex-X925 (or a successor) could form the basis for these processors. So we hope Arm’s claims bear out here.

No NPU acceleration? No problem!

Samsung Galaxy S24 Ultra on device AI toggle 1

Credit: Lanh Nguyen / Android Authority

Arm also insists that the CPU and GPU have a major role to play in AI processing, and it’s hard to disagree. Many AI models still run on the CPU and GPU because they might not be optimized for specific NPUs from the various chipmakers. In fact, Arm claims that 70% of AI apps on the Play Store default to the CPU. It also points to the fact that many of the apps that default to the NPU are actually first-party apps (e.g. Google, Samsung).

We’ve also seen some chipmakers like Qualcomm attempt to make life easier for app developers. The company has an AI Hub for app developers, offering AI models that are optimized for Qualcomm NPUs. These models can still run on MediaTek, Tensor, or Exynos processors but default to the CPU or GPU.

Arm is also offering AI-specific improvements for its CPU and GPU. This is handy because many AI apps still default to the CPU.

Fortunately, Arm says it’s bringing plenty of CPU and GPU improvements for AI. For starters, it says the Cortex-X925 CPU core offers a 46% performance boost over the Cortex-X4 when measuring time-to-first-token in the Phi 3 small language model (3.8 billion parameters). For less demanding AI tasks, Arm envisions the Cortex-A725 as the reliable workhorse.

The chip designer also says that its Immortalis-G925 GPU can bring 36% faster AI inference than the previous GPU. It adds that the new GPU is up to 50% faster in natural language processing, up to 41% faster in image segmentation, and up to 32% faster for speech-to-text. But again, these comparisons are for an Immortalis-G925 14-core GPU versus a 12-core Immortalis-G720 part. So we’re curious about a like-for-like comparison.


Competitive Open-Source EDA Tools

A technical paper titled “Basilisk: Achieving Competitive Performance with Open EDA Tools on an Open-Source Linux-Capable RISC-V SoC” was published by researchers at ETH Zurich and University of Bologna.

Abstract:

“We introduce Basilisk, an optimized application-specific integrated circuit (ASIC) implementation and design flow building on the end-to-end open-source Iguana system-on-chip (SoC). We present enhancements to synthesis tools and logic optimization scripts improving quality of results (QoR), as well as an optimized physical design with an improved power grid and cell placement integration enabling a higher core utilization. The tapeout-ready version of Basilisk implemented in IHP’s open 130 nm technology achieves an operation frequency of 77 MHz (51 logic levels) under typical conditions, a 2.3x improvement compared to the baseline open-source EDA design flow presented in Iguana, and a higher 55% core utilization compared to 50% in the baseline design. Through collaboration with EDA tool developers and domain experts, Basilisk exemplifies a synergistic effort towards competitive open-source electronic design automation (EDA) tools for research and industry applications.”

Find the technical paper on arXiv (arXiv:2405.03523). Published May 2024.

Sauter, Philippe, Thomas Benz, Paul Scheffler, Zerun Jiang, Beat Muheim, Frank K. Gürkaynak, and Luca Benini. “Basilisk: Achieving Competitive Performance with Open EDA Tools on an Open-Source Linux-Capable RISC-V SoC.” arXiv preprint arXiv:2405.03523 (2024).
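The abstract’s headline numbers can be unpacked with a little arithmetic. The sketch below derives the implied baseline frequency, the clock period, and the average time budget per logic level from the quoted figures. The per-level average is a simplification that lumps gate delay, wiring, and timing margins together, and the variable names are assumptions for illustration.

```python
# Figures quoted in the abstract.
BASILISK_MHZ = 77.0         # achieved operating frequency
SPEEDUP_VS_BASELINE = 2.3   # improvement over the Iguana baseline flow
LOGIC_LEVELS = 51           # logic levels quoted alongside the frequency
UTIL_NEW = 0.55             # core utilization, Basilisk
UTIL_OLD = 0.50             # core utilization, baseline

baseline_mhz = BASILISK_MHZ / SPEEDUP_VS_BASELINE
period_ns = 1000.0 / BASILISK_MHZ        # 1 / (77 MHz), expressed in ns
ns_per_level = period_ns / LOGIC_LEVELS  # average budget per logic level
util_gain = UTIL_NEW / UTIL_OLD - 1

print(f"implied baseline frequency: {baseline_mhz:.1f} MHz")    # 33.5 MHz
print(f"clock period: {period_ns:.2f} ns")                      # 12.99 ns
print(f"average per logic level: {ns_per_level:.3f} ns")        # 0.255 ns
print(f"relative utilization gain: {util_gain:.0%}")            # 10%
```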

The post Competitive Open-Source EDA Tools appeared first on Semiconductor Engineering.
