Semiconductor Engineering

Margin Sensors In The Wild

By: Barry Pangrle

June 10, 2024 at 09:01

Back in March, I wrote an article here that looked at how a proxy circuit could be used to measure variations in circuit performance as conditions change in the operating environment. Two recent presentations on margin sensors were given at the customer engineering forums of two of the big EDA vendors; we’ll look at those, as well as at another product with an upcoming presentation at DAC. Margin sensors have applications for silicon health and performance monitoring for SoCs, characterization, yield, reliability, safety, power, and performance. How they are configured, though, determines the tasks for which they are best suited.

The first presentation was given at Synopsys’ SNUG Silicon Valley on March 20, 2024, titled “Diagnosis of Timing Margin on Silicon with PMM (Path Margin Monitor)”, by Gurnrack Moon, Principal Engineer at Samsung. One of the key aspects of the PMM that Samsung appreciated was the closer correlation between the PMM and the actual paths versus, say, using a Ring Oscillator approach.

Fig. 1: Synopsys Path Margin Monitor diagram. (Source: Synopsys)

My previous article described how the “Monitor Logic” portion of the PMM diagram shown above in figure 1 would conceptually work. Taps taken along the synthetic circuit of buffers can be compared to see how far the signal made it down the path and thus determine how much margin is available. A strength of this approach is that it allows one PMM to be used on multiple paths. It does have a disadvantage, though, of introducing additional control overhead and adding delay components into the monitor path.
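The tap comparison described above can be sketched in a few lines of Python. This is only a conceptual illustration, not Synopsys' implementation: the function name, the thermometer-style tap encoding, and the 10ps unit delay are all assumptions made for the example.

```python
def margin_from_taps(taps, unit_delay_ps):
    """Estimate timing margin from a capture of the buffer-chain taps.

    taps[i] is True if the signal reached tap i before the capturing
    clock edge (hypothetical encoding). Counting stops at the first tap
    the signal did not reach, which gives a conservative estimate.
    """
    reached = 0
    for hit in taps:
        if not hit:
            break
        reached += 1
    return reached * unit_delay_ps


# Illustrative capture: the signal made it through three buffers.
capture = [True, True, True, False, False]
print(margin_from_taps(capture, unit_delay_ps=10.0))  # 30.0
```

The further the signal travels down the chain before the clock edge, the more slack the monitored path has; a capture with no taps reached would indicate no measurable margin.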

The PMMs on the chip are connected in a daisy-chain fashion, which reduces the number of signals needed to send information from the PMMs to the Path Margin Monitor Controller. This setup efficiently uses chip area to provide information about the state of the silicon. Typically, one might expect this type of capability to be exercised in a “diagnostic” mode where data would be captured, analyzed, and then used to determine appropriate voltage and frequency settings, as opposed to a more dynamic or adaptive approach.

Samsung appreciated being able to “determine if there are problems or what is different from what is designed, and what needs to be improved. In addition, PMM data fed to the Synopsys Silicon.da analytics platform provides rich analytics, shortening the debug/analysis time.” This was used on production silicon. Synopsys also has other blog articles here and here for the interested reader.

The second presentation was given at CadenceLIVE Silicon Valley, April 17, 2024, titled “Challenges in Datacenters: Search for Advanced Power Management Mechanisms”, and presented by Ziv Paz, Vice President of Business Development at proteanTecs. His presentation focused on proteanTecs’ Margin Agents and noted how these sensors were sensitive to process, aging, workload stress, latent defects, operating conditions, DC IR drops, and local Vdroops.

Fig. 2: Reducing voltage while staying within margin. (Source: proteanTecs, CadenceLIVE)

Figure 2 shows how designers must handle “worst-case” scenarios and often do so by creating enough margin to operate under those conditions. In the diagram shown here, that margin shows up as a higher operating V_DD. If the normal operating mode is 650mV with an allowance for a -10% change in V_DD, then the design is implemented to still run correctly at 585mV (90% * 650mV). Most of the time, though, the circuitry will operate properly below 650mV, so running at 650mV just wastes energy.

proteanTecs then presented a case study that was designed using TSMC’s 5nm technology. The chip incorporated 448 margin agents consisting of buffers with a unit delay of 7ps.

Fig. 3: Example margin agents and corresponding voltage. (Source: proteanTecs, CadenceLIVE)

Figure 3 above shows the readings of all 448 margin agents on the left side, with the thicker black line showing the worst case across all 448. The right side shows the voltage. It also demonstrates that when the threshold is lowered, the voltage drops to 614mV and the design continues to operate properly.

Fig. 4: Example margin agents with droop and corresponding voltage. (Source: proteanTecs, CadenceLIVE)

Figure 4 shows that as the voltage on the right drops, the worst-case margin agent values also drop, and once they cross the yellow(-ish) line the voltage is signaled to return to the pre-AVS (adaptive voltage scaling) voltage of 650mV. The margin agent values then improve and the AVS voltage of 614mV kicks back in. Reacting when the margin agents cross the yellow line allows time for the voltage to increase and adjust before it hits the red (585mV) line, thus always keeping it in the proper operating zone.
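The behavior described for figure 4 amounts to a small threshold-driven control loop. The Python sketch below illustrates that idea and is not proteanTecs' algorithm: the function names, the margin units, and the threshold values are hypothetical, and the second (recovery) threshold is an assumption added here so the voltage does not toggle on every reading. Only the 650mV and 614mV settings come from the presentation.

```python
V_NOMINAL_MV = 650  # pre-AVS voltage quoted in the article
V_AVS_MV = 614      # reduced AVS voltage quoted in the article


def next_voltage_mv(current_mv, agent_margins, warn_level, recover_level):
    """Pick the next supply setting from the worst-case margin reading.

    warn_level plays the role of the yellow line. recover_level is an
    assumed higher threshold that must be reached before the voltage is
    lowered again (hysteresis).
    """
    worst = min(agent_margins)
    if worst <= warn_level:
        return V_NOMINAL_MV   # margin too thin: restore full voltage
    if worst >= recover_level:
        return V_AVS_MV       # margin healthy: save power
    return current_mv         # in between: hold the current setting


def simulate(readings, warn_level=6, recover_level=10, start_mv=V_AVS_MV):
    """Run the controller over a sequence of per-agent margin readings."""
    voltage = start_mv
    history = []
    for agent_margins in readings:
        voltage = next_voltage_mv(voltage, agent_margins, warn_level, recover_level)
        history.append(voltage)
    return history


# Made-up readings from two agents, in arbitrary margin units.
readings = [[12, 14], [9, 13], [5, 11], [8, 12], [11, 15]]
print(simulate(readings))  # [614, 614, 650, 650, 614]
```

In the third sample the worst-case agent falls to the warning level, so the controller returns to 650mV; it drops back to 614mV only once the worst-case margin has recovered.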

For this case, proteanTecs saw a 10.77% power saving and said that they’ve typically seen savings in the 9%-14% range. For this data center-oriented customer, this was important because of a limited power budget per rack, cooling limitations, carbon neutrality requirements (PUE), and a high CAPEX. Other benefits are a higher MTTF, lower maintenance costs, and a prolonged system lifetime. proteanTecs claimed a minimal impact on area and said that most of its current designs are at 7nm, 5nm, and below.
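The presentation summary does not say how the 10.77% figure was derived. As a rough cross-check, the sketch below assumes that dynamic power scales with the square of the supply voltage at a fixed frequency and switched capacitance. That scaling model and the function name are assumptions introduced here, not something stated by proteanTecs.

```python
def dynamic_power_saving(v_nominal_mv, v_reduced_mv):
    """Fractional dynamic-power saving under an assumed P ~ V^2 model."""
    ratio = v_reduced_mv / v_nominal_mv
    return 1.0 - ratio ** 2


# Voltages quoted in the article: 650mV pre-AVS, 614mV with AVS.
saving = dynamic_power_saving(650, 614)
print(f"{saving:.2%}")  # 10.77%
```

Under that assumption, lowering the supply from 650mV to 614mV gives (614/650)^2 ≈ 0.8923, which is consistent with the reported saving.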

The third vendor, Movellus, announced its Aeonic Insight product line, including a droop detector, on November 14, 2023. Movellus’ Michael Durr, Director of Application Engineering, is scheduled to give a talk at DAC on Wednesday, June 26, 2024, titled “Droop! There it is!” Movellus has long been known for its digital clock generation IP and, as one might guess, its design uses a synthetic circuit for detecting changes in the operating environment. Leveraging that clock generation expertise, the company is initially targeting an adaptive frequency (or clock) scaling (AFS) approach that builds on the same IP.

Semiconductor Engineering

Brain-Inspired, Silicon Optimized

By: Barry Pangrle

February 29, 2024 at 09:06

The 2024 International Solid State Circuits Conference was held this week in San Francisco. Submissions were up 40% and contributed to the quality of the papers accepted and the presentations given at the conference.

The mood about the future of semiconductor technology was decidedly upbeat, with predictions of a $1 trillion industry by 2030 and many expecting the soaring demand for AI-enabling silicon to speed up that timeline.

Dr. Kevin Zhang, Senior Vice President, Business Development and Overseas Operations Office for TSMC, showed the following slide during his opening plenary talk.

Fig. 1: TSMC semiconductor industry revenue forecast to 2030.

The 2030 semiconductor market by platform was broken out as 40% HPC, 30% Mobile, 15% Automotive, 10% IoT and 5% “Others”.

Dr. Zhang also outlined several new generations of transistor technologies, showing that there are still more improvements to come.

Fig. 2: TSMC transistor architecture projected roadmap.

TSMC’s N2 will be going into production next year and transitions TSMC from finFET to nanosheet, and the figure shows a further step of stacking NMOS and PMOS transistors to increase density in silicon.

Lip Bu Tan, Chairman, Walden International, also backed up the $1T prediction.

Fig. 3: Walden semiconductor market drivers.

Mr. Tan also referenced an MIT paper from September 2023 titled, “AI Models are devouring energy. Tools to reduce consumption are here, if data centers will adopt.” It states that huge, popular models like ChatGPT signal a trend of large-scale AI, boosting some forecasts that predict data centers could draw up to 21% of the world’s electricity supply by 2030. That is, astoundingly, more than one-fifth of the world’s electricity.

There also appears to be a virtuous cycle of using this new AI technology to create even better computing machines.

Fig. 4: Walden design productivity improvements.

The figure above shows a history of order-of-magnitude improvements in design productivity that help engineers make use of all the transistors that have been scaling with Moore’s Law. There are also advances in packaging, and companies like AMD, Intel and Meta all presented papers on implementations using fine-pitch hybrid bonding to build systems with even higher densities. Mr. Tan presented data attributed to market.us predicting that AI will drive a CAGR of 42% in 3D-IC chiplet growth between 2023 and 2033.

Jonah Alben, Senior Vice President of GPU Engineering for NVIDIA, further backed up the claim of generative AI enabling better productivity and better designs. Figure 5 below shows how NVIDIA was able to use its PrefixRL AI system to produce better designs along a whole design curve. He stated that this technology was used to design nearly 13,000 circuits in NVIDIA’s Hopper.

There was also a Tuesday night panel session on generative AI for design, and the fairly recent Si Catalyst panel discussion held last November was covered here. This is definitely an area that is growing and gaining momentum.

Fig. 5: NVIDIA example improvements from PrefixRL.

To wrap up, let’s look at some work that has been reporting best-in-class efficiency metrics: IBM’s NorthPole. Researchers at IBM published and presented paper 11.4, “IBM NorthPole: An Architecture for Neural Network Inference with a 12nm Chip.” Last September, after HotChips, the article IBM’s Energy-Efficient NorthPole AI Unit included many of the industry competition comparisons, so those won’t be repeated here, but we will look at some of the other results that were reported.

The brain-inspired research team has been working for over a decade at IBM. In fact, in October 2014 their earlier spike-based research was reported in the article Brain-Inspired Power. Like many so-called asynchronous approaches, the information and communication overhead for the spikes meant that the energy efficiency didn’t pan out and the team re-thought how to best incorporate brain model concepts into silicon, hence the brain-inspired, silicon optimized tag line.

NorthPole makes use of what IBM refers to as near memory compute. As pointed out and shown here, the memory is tightly integrated with the compute blocks, which reduces how far data must travel and saves energy. As shown in figure 6, for ResNet-50 NorthPole is most efficient running at approximately 680mV and approximately 200MHz (in 12nm FinFET technology). This yields an energy metric of ~1100 frames/joule (equivalently fps/W).

Fig. 6: NorthPole voltage/frequency scaling results for ResNet-50.
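The equivalence between frames/joule and fps/W noted above follows directly from the units, since a watt is a joule per second. The short Python sketch below illustrates the identity and converts the reported ~1100 frames/joule into energy per frame. The function names and the sample run (1000 frames, 2 seconds, 4 joules) are made up for illustration and are not NorthPole measurements.

```python
def frames_per_joule(frames, seconds, joules):
    """Compute fps/W, which reduces to frames/joule because 1 W = 1 J/s."""
    fps = frames / seconds
    watts = joules / seconds
    return fps / watts


def millijoules_per_frame(frames_per_joule_value):
    """Invert the efficiency metric to get the energy spent on each frame."""
    return 1000.0 / frames_per_joule_value


# Made-up run: 1000 frames in 2 s using 4 J.
print(frames_per_joule(1000, 2.0, 4.0))        # 250.0
# Reported NorthPole ResNet-50 efficiency of ~1100 frames/joule.
print(round(millijoules_per_frame(1100), 3))   # 0.909
```

At ~1100 frames/joule, each ResNet-50 inference therefore costs a little under one millijoule.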

To optimize the communication for NorthPole, IBM created four NoCs (networks-on-chip):

Partial Sum NoC (PSNoC) communicates within a layer – for spatial computing
Activation NoC (ANoC) reorganizes activations between layers
Model NoC (MNoC) delivers weights during layer execution
Instruction NoC (INoC) delivers the program for each layer prior to layer start

The Instruction and Model NoCs share the same architecture. The protocols are full-custom and optimized for 0 stall cycles, and the networks are 2-D meshes. The PSNoC communicates across short distances and could be said to be NoC-ish. The ANoC is again its own custom protocol implementation. Software compiles executables that are fully deterministic and perform no speculation, and it optimizes the bit width of computations among 8-, 4- and 2-bit calculations. Together, this all leads to a very efficient implementation.

Fig. 7: NorthPole exploded view of PCIe assembly.

IBM had a demonstration of NorthPole running at ISSCC. The unit is well designed for server use and the team is looking forward to the possibility of implementing NorthPole in a more advanced technology node. My thanks to John Arthur from IBM for taking some time to discuss NorthPole.
