Overcoming Chiplet Integration Challenges With Adaptability

By: Jayson Bethurem

May 9, 2024, 09:06

Chiplets are exploding in popularity due to key benefits such as lower cost, lower power, higher performance and greater flexibility to meet specific market requirements. More importantly, chiplets can reduce time-to-market, thus decreasing time-to-revenue! Heterogeneous and modular SoC design can accelerate innovation and adaptation for many companies. What’s not to like about chiplets? Well, as chiplets come to fruition, we are starting to realize many of the complications of chiplet designs.

Fig. 1: Expanded chiplet system view.

The interface challenge

The primary concept of chiplets is the integration of ICs (Integrated Circuits) from multiple companies. However, many of these ICs were not designed with interoperation in mind. That’s partially due to a lack of interconnect standards around chiplets. Moreover, each IC has its own computational and bandwidth requirements. This is further complicated as competing interface standards vie for adoption, as shown in Table 1.

Standard	Throughput	Density	Max Delay
Advanced Interface Bus (Intel, AIB)	2 Gbps	504 Gbps/mm	5 ns
Bandwidth Engine	10.3 Gbps	N/A	2.4 ns
BoW (Bunch of Wires)	16 Gbps	1280 Gbps/mm	5 ns
HBM3 (JEDEC)	4.8 Gbps	N/A	N/A
Infinity Fabric (AMD)	10.6 Gbps	N/A	9 ns
Lipincon (TSMC)	2.8 Gbps	536 Gbps/mm	14 ns
Multi-Die I/O (Intel)	5.4 Gbps	1600 Gbps/mm	N/A
XSR/USR (Rambus)	112 Gbps	N/A	N/A
UCIe	32 Gbps	1350 Gbps/mm	2 ns

Table 1: Chiplet interconnect options.
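To make the tradeoffs in Table 1 concrete, here is a minimal sketch that turns the Density column into a rough die-edge budget. It assumes the density figures mean bandwidth per millimeter of die edge, which the table does not state explicitly, and the 2 Tbps target is a hypothetical number chosen only for illustration.

```python
# Densities in Gbps/mm copied from Table 1; standards listed as N/A are omitted.
# Assumption: "Density" is bandwidth per millimeter of die edge.
DENSITY_GBPS_PER_MM = {
    "AIB": 504,
    "BoW": 1280,
    "Lipincon": 536,
    "Multi-Die I/O": 1600,
    "UCIe": 1350,
}


def edge_length_mm(target_gbps: float, standard: str) -> float:
    """Die-edge length needed to carry target_gbps at the table's density."""
    return target_gbps / DENSITY_GBPS_PER_MM[standard]


if __name__ == "__main__":
    target = 2000.0  # hypothetical 2 Tbps die-to-die link
    ranked = sorted(DENSITY_GBPS_PER_MM, key=lambda name: edge_length_mm(target, name))
    for name in ranked:
        print(f"{name:14s} {edge_length_mm(target, name):5.2f} mm")
```

The same arithmetic can be repeated with latency or per-lane throughput as the deciding metric, which is why different designs end up favoring different interfaces.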

The chiplet interconnect landscape is dominated by UCIe (Universal Chiplet Interconnect Express) and the unimaginatively named BoW (Bunch of Wires). UCIe began with the 1.0 spec, and as with any first edition of a specification, updates were inevitable. UCIe 1.1 closes several gaps in 1.0: it addresses gray areas, missing definitions, ECNs and more. It is very likely not the last update, as UCIe’s vision is to grow up the stack by adding protocol layers on top of the system layers.

Because the UCIe and BoW protocols are new and expected to evolve, designing them in is risky. Additionally, there will always be a place for multiple die-to-die interfaces beyond UCIe. Specific use cases and designs will inherently be matched to different metrics, leading many designs to fall back on proprietary interfaces.

As you can see, there are many choices, and many of these have tradeoffs. A chiplet that integrates a variety of these protocols would greatly benefit from data/protocol adaptation, which can easily be enabled with embedded programmable logic, or eFPGA. A lightweight protocol shim implemented in eFPGA IP can not only reformat data but also buffer it to maximize internal processing. Finally, consider that data between ICs in a chiplet can be globally asynchronous, another task easily handled by eFPGA IP using FIFO synchronizers.
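One common way to build the FIFO synchronizers mentioned above is to pass read and write pointers between clock domains in Gray code, so that only one bit changes per increment. The article does not say which scheme is used in eFPGA IP, so the following is only a behavioral sketch of that general technique; the function names and the 16-entry depth are illustrative assumptions.

```python
def bin_to_gray(n: int) -> int:
    """Convert a binary pointer value to its Gray-code equivalent."""
    return n ^ (n >> 1)


def gray_to_bin(g: int) -> int:
    """Convert a Gray-code value back to binary by XOR-ing all right shifts."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def bits_changed(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def single_bit_steps(depth: int) -> bool:
    """Check that every pointer increment, including wraparound, flips one bit.

    depth is assumed to be a power of two, as in typical asynchronous FIFOs.
    """
    return all(
        bits_changed(bin_to_gray(i), bin_to_gray((i + 1) % depth)) == 1
        for i in range(depth)
    )


if __name__ == "__main__":
    print(single_bit_steps(16))
```

Because only one bit changes per step, a pointer sampled in the other clock domain is either the old or the new value, never a mix of the two.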

The security challenge

Beyond the interfaces, security is another emerging challenge. A few characteristics of chiplets must be carefully considered:

ICs may come from unknown and possibly disreputable manufacturers
Each IC can contain internal IP from additional third-party sources
Each IC may receive and introduce external data into the system

Naturally, this begs for attestation and provenance to ensure vendor confidence. As such, root of trust generally starts with the supply chain and auditing all vendors. However, it only takes one failed component, the least secure component, to jeopardize the entire system.

Root of trust then raises a further question: which IC, or ICs, in the chiplet manage it? As we’ve seen time and time again, security threats evolve at an alarming rate. But chiplets have an opportunity here. Again, embedded FPGAs have the flexibility to adapt, thus thwarting these evolving security threats. eFPGA IP can also physically disable unused interfaces, minimizing the attack surface.

Adaptable cryptography cores can perform a variety of tasks with high performance in eFPGA IP. These tasks include authentication/digital signing, key generation, encapsulation/decapsulation, random number generation and much more. Further, post-quantum security cores that run very efficiently on eFPGA are becoming available. Figure 2 shows an ML-KEM (Kyber) key encapsulation core from Xiphera that fits into only four Flex Logix EFLX tiles, efficiently packed at 98% utilization with a throughput of over 2 Gbps.

Fig. 2: ML-KEM IP core from Xiphera implemented on Flex Logix EFLX eFPGA IP.

Managing all data communication within a chiplet seems daunting; however, it is feasible. Designers can implement eFPGA on every IC in the chiplet for adaptable data signing, or place it standalone on the interposer, where system designers can define a secure enclave in which all data is authenticated and encrypted by an independent IC with eFPGA. eFPGA can also process streaming data at a very high rate and in most cases can keep up with line rate, as seen with programmable data planes in SmartNICs.
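As a software illustration of what authenticating die-to-die traffic involves, the sketch below tags each frame with an HMAC-SHA256 and a sequence number using only the Python standard library. The frame layout, function names and choice of HMAC are assumptions made for this example and are not taken from any Flex Logix or Xiphera design; encryption is deliberately left out to keep it short.

```python
import hashlib
import hmac
import secrets

SEQ_LEN = 8   # bytes used for the sequence number (illustrative choice)
TAG_LEN = 32  # SHA-256 digest size in bytes


def seal(key: bytes, seq: int, payload: bytes) -> bytes:
    """Build a frame: sequence number, payload, then an HMAC over both."""
    header = seq.to_bytes(SEQ_LEN, "big")
    tag = hmac.new(key, header + payload, hashlib.sha256).digest()
    return header + payload + tag


def open_frame(key: bytes, frame: bytes, expected_seq: int) -> bytes:
    """Verify the tag and sequence number, then return the payload."""
    if len(frame) < SEQ_LEN + TAG_LEN:
        raise ValueError("frame too short")
    header = frame[:SEQ_LEN]
    payload = frame[SEQ_LEN:-TAG_LEN]
    tag = frame[-TAG_LEN:]
    expected_tag = hmac.new(key, header + payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected_tag):  # constant-time compare
        raise ValueError("authentication failed")
    if int.from_bytes(header, "big") != expected_seq:
        raise ValueError("unexpected sequence number")  # replay or reordering
    return payload


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # hypothetical shared key between two ICs
    frame = seal(key, 7, b"sensor data")
    print(open_frame(key, frame, 7))
```

In hardware the same check would run on streaming data, and swapping the algorithm later means loading a different configuration rather than respinning silicon.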

eFPGA can add another critical security benefit. Every instance of eFPGA in the chiplet offers the ability to obfuscate critical algorithms, cryptography and protocols. This enables manufacturers to protect design secrets by not only programming these features in a controlled environment, but also adapting these as threats evolve.

The validation problem

Again, the absence of fully defined industry standards presents integration challenges. Conventional methods of qualification, testing, and validation become increasingly complex. Yet this becomes another opportunity for eFPGA IP. It can be configured as an in-system diagnostic tool that provides testing, debugging and observability, not only during IC bring-up but also during run time, eliminating finger pointing between independent companies.

The reconfigurability solution

While we’ve discussed a few different chiplet issues and solutions with adaptable eFPGA, it is important to realize that a single instance of this IP can perform all these functions in a chiplet, as eFPGA IP is completely reconfigurable. It can be time-sliced and configured differently during specific operational phases of the chiplet. As mentioned in the examples above, during IC bring-up it can provide insightful debug visibility into the system. During boot, it can enable secure boot and attested firmware updates to all ICs in the chiplet. During run time, it can perform cryptographic functions as well as independently manage a secure enclave environment. eFPGA is also well suited to any other software acceleration your applications need, as its heavily parallel and pipelined nature is ideal for complex signal processing tasks. Lastly, during an RMA process it can help investigate and determine system failures. This is just a short list of the features eFPGA IP can enable in a chiplet.
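The time-slicing idea can be pictured as a small controller that loads a different configuration into the same eFPGA instance for each operational phase. The sketch below models that in software only; the phase names follow the paragraph above, while the configuration names and the `EfpgaSlot` class are hypothetical and do not correspond to any real tool or API.

```python
from enum import Enum, auto


class Phase(Enum):
    BRING_UP = auto()
    BOOT = auto()
    RUN_TIME = auto()
    RMA = auto()


# Hypothetical configuration names, one per operational phase.
CONFIG_FOR_PHASE = {
    Phase.BRING_UP: "debug_visibility",
    Phase.BOOT: "secure_boot",
    Phase.RUN_TIME: "crypto_and_enclave",
    Phase.RMA: "failure_analysis",
}


class EfpgaSlot:
    """Software model of one reconfigurable eFPGA instance."""

    def __init__(self) -> None:
        self.loaded = None  # name of the configuration currently loaded
        self.reloads = 0    # how many reconfigurations have happened

    def enter(self, phase: Phase) -> str:
        """Load the configuration for a phase, skipping redundant reloads."""
        wanted = CONFIG_FOR_PHASE[phase]
        if wanted != self.loaded:
            self.loaded = wanted
            self.reloads += 1
        return self.loaded


if __name__ == "__main__":
    slot = EfpgaSlot()
    for phase in (Phase.BRING_UP, Phase.BOOT, Phase.RUN_TIME, Phase.RUN_TIME):
        print(phase.name, "->", slot.enter(phase))
    print("reloads:", slot.reloads)
```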

Customizable for the perfect solution

Flex Logix EFLX IP delivers excellent PPA (Power, Performance and Area) and is available on the most advanced nodes, including Intel 18A and TSMC 7nm and 5nm. Furthermore, Flex Logix eFPGA IP is scalable – enabling you to choose the best balance of programmable logic, embedded memory and signal processing resources.

Fig. 3: Scalable Flex Logix eFPGA IP.

Want to learn more about Flex Logix IP? Contact us at [email protected] or visit our website https://flex-logix.com.

The post Overcoming Chiplet Integration Challenges With Adaptability appeared first on Semiconductor Engineering.

Accelerate Complex Algorithms With Adaptable Signal Processing Solutions

Semiconductor Engineering

By: Jayson Bethurem

March 7, 2024, 09:03

Technology is continuously advancing and exponentially increasing the amount of data produced. Data comes from a multitude of sources and formats, requiring systems to process different algorithms. Each of these algorithms presents its own challenges, including low-latency and deterministic processing to keep up with incoming data rates and rapid response times. Considering that many semiconductors are designed years in advance of the technology they must serve, this poses a challenging problem for IC designers. Take video systems, for example: the resolution and color depth of imaging sensors is doubling every few years, affecting how the generated data gets processed. Similarly, AI algorithms update nearly every year. In both cases, not only do the data paths need to increase in width and throughput, but so does the memory for weights and activations. It’s nearly unimaginable to build an IC today that doesn’t have some level of adaptability.

Fig. 1: Common adaptive noise filter design.

This is not a new concept. For years, many engineers have utilized FPGAs for this type of processing, which spans from I/Q data from radio comms to video streams from image sensors to BLDC motor control algorithms to AI models. FPGAs are perfect for handling complex algorithms that benefit from parallel and pipelined processing. Additionally, FPGA architectures are loaded with embedded memory, which can be tightly coupled to increase determinism and performance of algorithms, whereas processors can be bogged down with memory fetching, cache misses and low-level interrupts.

As data evolves and becomes more complex, FPGAs have followed suit. They have greatly increased their processing capability by adding hardened signal processing blocks, which have themselves grown in capability over time. Many SoCs and ASICs sit next to a discrete FPGA that covers these processing challenges. However, discrete FPGA implementations have a few drawbacks, namely price and power, but also limited data transactions between the FPGA and external components like processors. But with Flex Logix eFPGA IP, any device can adopt this level of capability and reduce the cost and power overhead of a discrete FPGA by nearly 90%.

Like traditional FPGAs, Flex Logix EFLX IP includes 6-input LUT programmable logic, embedded memory and DSP blocks with 22×22 multipliers with 48-bit accumulators.

Fig. 2: EFLX eFPGA IP.

Unlike traditional FPGAs, these IP blocks can be scaled to fit your specific application. Depending on your application, you can select higher or lower ratios of DSP to logic and of memory to logic. Thus, algorithms needing more memory and multipliers than logic can use higher ratios of DSP cores.

Flex Logix IP has evolved with signal processing demands and recently introduced InferX IP, which can dramatically increase performance and lower power consumption. InferX is effectively a scalable one-dimensional tensor processor (vector & matrix) controlled by the eFPGA fabric, which allows this IP to adapt to any signal processing algorithm implementation, including AI models. InferX has roughly 10 times the DSP performance of the aforementioned DSP IP and uses only one-quarter of the area. And while many associate TPUs with AI applications, this IP is ideal for any vector/matrix computation.

Fig. 3: InferX IP scalable from 1/8th of a tile to > 8 tiles.

InferX achieves up to dozens of TeraMACs/second on the TSMC 5nm node. It is ideal for applications including FFT, FIR, IIR, Beam Forming, Matrix/Vector operations, Matrix Inversions, Kalman functions and more. It handles real or complex INT16x16 multiplication with accumulation at INT40 for accuracy. Multiple DSP operations can be pipelined in streaming mode or packet mode. See below for more benchmarks for common algorithms running on TSMC’s 5nm node.
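To see why a 40-bit accumulator pairs well with 16x16-bit multiplies, the sketch below models a direct-form FIR filter in plain Python with explicit range checks. It is a behavioral illustration only, not a model of InferX itself; raising an error on overflow is an assumption made for clarity, since the article does not say whether the hardware saturates or wraps.

```python
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1
ACC_BITS = 40
ACC_MIN, ACC_MAX = -(1 << (ACC_BITS - 1)), (1 << (ACC_BITS - 1)) - 1


def max_worst_case_products() -> int:
    """How many worst-case 16x16 products fit in the 40-bit accumulator."""
    worst_product = INT16_MIN * INT16_MIN  # 2**30, the largest magnitude
    return ACC_MAX // worst_product


def fir_int16(samples, taps):
    """Direct-form FIR on INT16 data with a range-checked 40-bit accumulator."""
    for value in (*samples, *taps):
        if not INT16_MIN <= value <= INT16_MAX:
            raise ValueError("input does not fit in INT16")
    history = [0] * len(taps)  # delay line, newest sample first
    outputs = []
    for x in samples:
        history = [x] + history[:-1]
        acc = 0
        for sample, coeff in zip(history, taps):
            acc += sample * coeff
            if not ACC_MIN <= acc <= ACC_MAX:
                raise OverflowError("accumulator exceeded 40 bits")
        outputs.append(acc)
    return outputs


if __name__ == "__main__":
    print(max_worst_case_products())
    print(fir_int16([1, 2, 3], [1, 1]))
```

The extra bits above the 32-bit product width act as guard bits, so long filters can accumulate without intermediate rounding.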

InferX DSP solutions are easily programmed via common tools like Matlab Simulink. Flex Logix has built a ready-to-use standard Simulink block set that provides a simplified configuration, bit-accurate modeling with flexible precision.

Fig. 4: Simulink design flow.

Fig. 5: Cycle-accurate simulation of InferX soft logic driving InferX TPUs ensures functionality.

InferX IP works seamlessly with Flex Logix EFLX eFPGA IP and can be reconfigured in microseconds, enabling ICs to adapt to any data stream and the appropriate algorithm in near real-time. For ASIC manufacturers to accomplish this, they would have to multiplex between several hardened algorithms, forfeiting the ability to accommodate future algorithms and new data streams. Flex Logix IP is an adaptable accelerator for all semiconductors and is available for many nodes, including advanced nodes like TSMC 5nm and 3nm, and is planned for Intel 18A.

Want to learn more about Flex Logix EFLX IP and signal processing solutions?
Contact us at [email protected] to learn more or visit our website https://flex-logix.com.

The post Accelerate Complex Algorithms With Adaptable Signal Processing Solutions appeared first on Semiconductor Engineering.
