By Ann Keffer, Arun Gogineni, and James Kim
Cars deploying ADAS and AV features rely on complex digital and analog systems to perform critical real-time applications. The large number of faults that need to be tested in these modern automotive designs make performing safety verification using a single technology impractical.
Yet, developing an optimized safety methodology with specific fault lists automatically targeted for simulation, emulation and formal is challenging. Another challenge is consolidating fault resolution results from various fault injection runs for final metric computation.
The good news is that interoperability of fault injection engines, optimization techniques, and an automated flow can effectively reduce overall execution time to quickly close-the-loop from safety analysis to safety certification.
Figure 1 shows some of the optimization techniques in a safety flow. Advanced methodologies such as safety analysis for optimization and fault pruning, concurrent fault simulation, fault emulation, and formal based analysis can be deployed to validate the safety requirements for an automotive SoC.
Fig. 1: Fault list optimization techniques.
Proof of concept: an automotive SoC
Using an SoC level test case, we will demonstrate how this automated, multi-engine flow handles the large number of faults that need to be tested in advanced automotive designs. The SoC design we used in this test case had approximately three million gates. First, we used both simulation and emulation fault injection engines to efficiently complete the fault campaigns for final metrics. Then we performed formal analysis as part of finishing the overall fault injection.
Fig. 2: Automotive SoC top-level block diagram.
Figure 3 is a representation of the safety island block from figure 2. The color-coded areas show where simulation, emulation, and formal engines were used for fault injection and fault classification.
Fig. 3: Detailed safety island block diagram.
Fault injection using simulation was too time and resource consuming for the CPU core and cache memory blocks. Those blocks were targeted for fault injection with an emulation engine for efficiency. The CPU core is protected by a software test library (STL) and the cache memory is protected by ECC. The bus interface requires end-to-end protection where fault injection with simulation was determined to be efficient. The fault management unit was not part of this experiment. Fault injection for the fault management unit will be completed using formal technology as a next step.
Table 1 shows the register count for the blocks in the safety island.
Table 1: Block register count.
The fault lists generated for each of these blocks were optimized to focus on the safety critical nodes which have safety mechanisms/protection.
SafetyScope, a safety analysis tool, was run to create the fault lists for the FMs for both the Veloce Fault App (fault emulator) and the fault simulator and wrote the fault lists to the functional safety (FuSa) database.
For the CPU and cache memory blocks, the emulator inputs the synthesized blocks and fault injection/fault detection nets (FIN/FDN). Next, it executed the stimulus and captured the states of all the FDNs. The states were saved and used as a “gold” reference for comparison against fault inject runs. For each fault listed in the optimized fault list, the faulty behavior was emulated, and the FDNs were compared against the reference values generated during the golden run, and the results were classified and updated in the fault database with attributes.
Fig. 4: CPU cluster. (Source from https://developer.arm.com/Processors/Cortex-R52)
For each of the sub parts shown in the block diagram, we generated an optimized fault list using the analysis engine. The fault lists are saved into individual session in the FuSa database. We used the statistical random sampling on the overall faults to generate the random sample from the FuSa database.
Now let’s look at what happens when we take one random sample all the way through the fault injection using emulation. However, for this to completely close on the fault injection, we processed N samples.
Table 2: Detected faults by safety mechanisms.
Table 3 shows that the overall fault distribution for total faults is in line with the fault distribution of the random sampled faults. The table further captures the total detected faults of 3125 out of 4782 total faults. We were also able model the detected faults per sub part and provide an overall detected fault ratio of 65.35%. Based on the faults in the random sample and our coverage goal of 90%, we calculated that the margin of error (MOE) is ±1.19%.
Table 3: Results of fault injection in CPU and cache memory.
The total detected (observed + unobserved) 3125 faults provide a clear fault classification. The undetected observed also provide a clear classification for Residual faults. We did further analysis of undetected unobserved and not injected faults.
Table 4: Fault classification after fault injection.
We used many debug techniques to analyze the 616 Undetected Unobserved faults. First, we used formal analysis to check the cone of influence (COI) of these UU faults. The faults which were outside the COI were deemed safe, and there were five faults which were further dropped from analysis. For the faults which were inside the COI, we used engineering judgment with justification of various configurations like, ECC, timer, flash mem related etc. Finally, using formal and engineering judgment we were able to further classify 616 UU faults into safe faults and remaining UU faults into conservatively residual faults. We also reviewed the 79 residual faults and were able to classify 10 faults into safe faults. The not injected faults were also tested against the simulation model to check if any further stimulus is able to inject those faults. Since no stimulus was able to inject these faults, we decided to drop these faults from our consideration and against the margin of error accordingly. With this change our new MOE is ±1.293%.
In parallel, the fault simulator pulled the optimized fault lists for the failure modes of the bus block and ran fault simulations using stimulus from functional verification. The initial set of stimuli didn’t provide enough coverage, so higher quality stimuli (test vectors) were prepared, and additional fault campaigns were run on the new stimuli. All the fault classifications were written into the FuSa database. All runs were parallel and concurrent for overall efficiency and high performance.
Safety analysis using SafetyScope helped to provide more accuracy and reduce the iteration of fault simulation. CPU and cache mem after emulation on various tests resulted an overall SPFM of over 90% as shown in Table 5.
Table 5: Overall results.
At this time not all the tests for BUS block (end to end protection) doing the fault simulation had been completed. Table 6 shows the first initial test was able to resolve the 9.8% faults very quickly.
Table 6: Percentage of detected faults for BUS block by E2E SM.
We are integrating more tests which have high traffic on the BUS to mimic the runtime operation state of the SoC. The results of these independent fault injections (simulation and emulation) were combined for calculating the final metrics on the above blocks, with the results shown in Table 7.
Table 7: Final fault classification post analysis.
Conclusion
In this article we shared the details of a new functional safety methodology used in an SoC level automotive test case, and we showed how our methodology produces a scalable, efficient safety workflow using optimization techniques for fault injection using formal, simulation, and emulation verification engines. Performing safety analysis prior to running the fault injection was very critical and time saving. Therefore, the interoperability for using multiple engines and reading the results from a common FuSa database is necessary for a project of this scale.
For more information on this highly effective functional safety flow for ADAS and AV automotive designs, please download the Siemens EDA whitepaper Complex safety mechanisms require interoperability and automation for validation and metric closure.
Arun Gogineni is an engineering manager and architect for IC functional safety at Siemens EDA.
James Kim is a technical leader at Siemens EDA.
The post Interoperability And Automation Yield A Scalable And Efficient Safety Workflow appeared first on Semiconductor Engineering.