### What would you do when chips fail?

- Is it due to design bugs?
  - Likely if most chips fail with the same syndrome when running an application
- Is it due to parametric yield loss?
  - Timing-related failure?
    - Insufficient silicon speed?
  - Noise-induced failure?
    - Supply noise, cross-talk, leakage, etc.?
  - Lack of manufacturability?
    - Inappropriate layout?
- Is it due to random defects?
  - Via misalignment, via/contact void, mouse bite,
  - Unintentional short/open wires, etc.

# Problem: Fault Diagnosis

This chapter focuses on the diagnosis of defects or faults, not design bugs.

Question: Where are the fault locations?

## **Quality Metrics of Diagnosis**

#### Success rate

- The percentage of cases in which physical failure analysis hits at least one defect
- This is the ultimate goal of failure analysis

#### Diagnostic resolution

- Total <u>number of fault candidates reported</u> by a tool
- The perfect diagnostic resolution is 1
- Perfect resolution does not necessarily imply a high hit rate

#### First-hit index

- Used for a tool that reports a ranked list of candidates
- Refers to the index of the first candidate in the ranked list that turns out to be a true defect site
- A smaller first-hit index indicates higher accuracy

#### Top-10 hit

- Used when there are multiple defects in the failing chip
- The number of true defects among the top 10 candidates

## Possible Assumptions Used in Diagnosis

- Stuck-At Fault Model Assumption
  - The defect behaves like a stuck-at fault
- Single Fault Assumption
  - Only one fault affects any faulty output
- Logical Fault Assumption
  - A fault manifests itself as a logical error
- Full-Scan Assumption
  - The chip under diagnosis must be a full-scan design

Note: A diagnosis approach that depends less on these fault assumptions is more capable of dealing with practical situations.
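The quality metrics defined earlier can be computed directly from a tool's ranked candidate list once physical failure analysis has confirmed the true defect sites. The following is a minimal sketch, not a tool interface: the function names, the signal names, and the choice of a 1-based rank are all assumptions made for illustration.

```python
from typing import Hashable, Optional, Sequence, Set


def diagnostic_resolution(candidates: Sequence[Hashable]) -> int:
    """Number of fault candidates reported by the tool (1 is perfect)."""
    return len(candidates)


def first_hit_index(ranked: Sequence[Hashable],
                    true_defects: Set[Hashable]) -> Optional[int]:
    """1-based rank (assumed convention) of the first true defect site, or None."""
    for rank, candidate in enumerate(ranked, start=1):
        if candidate in true_defects:
            return rank
    return None


def top_k_hit(ranked: Sequence[Hashable],
              true_defects: Set[Hashable], k: int = 10) -> int:
    """Number of distinct true defects among the top-k candidates."""
    return len(set(ranked[:k]) & true_defects)


# Hypothetical data: a ranked candidate list and the confirmed defect sites.
ranked = ["n17", "n4", "n92", "n8"]
defects = {"n4", "n8"}

print(diagnostic_resolution(ranked))     # 4
print(first_hit_index(ranked, defects))  # 2
print(top_k_hit(ranked, defects))        # 2
```

Here the resolution is 4 candidates, the first true defect appears at rank 2, and both defects fall within the top 10.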
### **Outline**

- Introduction
- Combinational Logic Diagnosis
  - Cause-Effect Analysis
  - Effect-Cause Analysis
  - Chip-Level Strategy
  - Diagnostic Test Pattern Generation
- Scan Chain Diagnosis
- Logic BIST Diagnosis
- Conclusion

### Cause-Effect Analysis

- Fault dictionary (pre-analysis of all causes)
  - Records the test response of every fault under the applied test set
  - Built by an intensive fault simulation process
- A chip is diagnosed (effect matching)
  - By matching the failing syndromes observed at the tester against the pre-stored fault dictionary

## Backtrace Algorithm

#### Trace back from each mismatched PO

To find the suspicious fault locations

#### Functional Pruning

During the traceback, some signals can be disqualified from the fault candidate set based on their signal values.

#### Rules

- (1) At a controlling case (some fanin carries the controlling value, e.g., 0 for a NAND gate): the fanin signals with non-controlling values (e.g., 1) are excluded from the candidate set.
- (2) At a non-controlling case (every fanin carries the non-controlling value, e.g., 1 for a NAND gate): every fanin signal remains in the candidate set.

## Why Curable Vector?

### Information theory

- A less probable event contains more information
- Curable output is an easy-to-satisfy criterion, so it has high aliasing
- Curable vector is a hard-to-satisfy criterion, so it has low aliasing

### Not all failing input vectors are equal!
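The information-theory argument can be made concrete with Shannon self-information, -log2(p): the less likely it is that a signal satisfies a criterion by accident, the more a match says about the real fault location. The sketch below uses made-up probabilities purely for illustration. They are assumptions, not measured aliasing rates.

```python
import math


def self_information_bits(probability: float) -> float:
    """Shannon self-information -log2(p) of an event with probability p."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


# Hypothetical chance that a fault-free signal meets each criterion by accident.
p_curable_output = 0.5      # easy to satisfy, so high aliasing
p_curable_vector = 0.03125  # hard to satisfy, so low aliasing

print(self_information_bits(p_curable_output))  # 1.0
print(self_information_bits(p_curable_vector))  # 5.0
```

Under these assumed numbers, a signal that meets the harder criterion carries five times as many bits of evidence as one that meets the easier criterion.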
### Niche input vector

- A failing input vector that activates only one fault
- Likely to be a curable vector of certain signals
- Few in number, but tells more about the real fault locations

Signals in the CUD, by failing input vector:

| Failing Input Vectors | f<sub>1</sub> | f<sub>2</sub> | f<sub>3</sub> | f<sub>4</sub> | f<sub>5</sub> | f<sub>6</sub> | f<sub>7</sub> |
|-----------------------|----|----|----|----|----|----|----|
| v<sub>1</sub>  | * |   |   |   | * |   |   |
| v<sub>2</sub>  | * | * | * |   |   |   |   |
| v<sub>3</sub>  |   |   | * | * |   |   | * |
| v<sub>4</sub>  |   |   |   |   | * | * |   |
| v<sub>5</sub>  |   | * |   |   | * |   |   |
| v<sub>6</sub>  |   | * |   |   | * |   |   |
| v<sub>7</sub>  | * |   | * |   |   |   |   |
| v<sub>8</sub>  |   |   | * |   |   |   | * |
| v<sub>9</sub>  |   |   | * |   | * |   |   |
| v<sub>10</sub> |   |   |   |   | * |   | * |

A mark \* … a SLAT ve… is a va… (f<sub>3</sub> and f… lid fault

### **Outline**

- Introduction
- Combinational Logic Diagnosis
  - Cause-Effect Analysis
  - Effect-Cause Analysis
  - Chip-Level Strategy
  - Diagnostic Test Pattern Generation
- Scan Chain Diagnosis
- Logic BIST Diagnosis
- Conclusion

## Main Strategy: Detach-Divide-and-then-Conquer

- Phase 1: Isolate Independent Faults
  - Search for prime candidates
  - Use word-level information
- Phase 2: Locate Dependent Faults As Well
  - Perform partitioning
  - Aim at finding one fault in each block

## Word-Level Registers and Outputs

Signals in a design are often defined in words. This property can be used to differentiate fake prime candidates from the real ones.

**Word-Level Output: O1**

Word-Level Registers: R1, R2, State

```
module design(O1, ...);
  output [31:0] O1;    // word-level output
  reg [31:0] R1, R2;   // word-level registers
  reg [5:0] State;     // word-level state register
  ...
endmodule
```

## Efficiency of Using Word-Level Info.

- Without word-level information
  - 2.4 real faults out of 72.3 candidates
- With word-level information
  - 1.23 real faults out of 3.65 candidates

| # of candidates       | Original | After Filtering | Filtering Ratio |
|-----------------------|----------|-----------------|-----------------|
| Prime Candidates      | 2.375    | 1.23            | 48.2 %          |
| Fake Prime Candidates | 69.96    | 2.42            | 96.5 %          |

### **Summary**

- Strategy
  - (1) Search for word-level prime candidates
  - (2) Identify independent faults first
  - (3) Locate dependent faults as well
- Effectiveness
  - Identifies 2.98 faults in 5 signal inspections
  - Finds 3.8 faults in 10 signal inspections

## Computation of Average-Sum Filtering

- (Average-sum filtering) Assume that the difference profile is given and denoted as D[i], where i is the index of a flip-flop. We use the following formula to compute a smoothed difference profile, SD[i]:

$$SD[i] = 0.2 \times (D[i-2] + D[i-1] + D[i] + D[i+1] + D[i+2])$$

## Computation of Edge Detection

- The true location of the faulty flip-flop is likely to be the *left boundary of the transition region in the difference profile*. To detect this boundary, we can use the simple *edge detection formula* defined below.
- (Edge detection) On the smoothed difference profile SD[i], the following formula can be used to compute the faulty frequency of each flip-flop as a suspicion profile.

$$
\text{suspicion}[i] = [-1, -1, -1, 1, 1, 1] \cdot
\begin{bmatrix}
|SD[i] - SD[i-3]| \\
|SD[i] - SD[i-2]| \\
|SD[i] - SD[i-1]| \\
|SD[i] - SD[i+1]| \\
|SD[i] - SD[i+2]| \\
|SD[i] - SD[i+3]|
\end{bmatrix}
$$

### Summary of Scan Chain Diagnosis

- Hardware Assisted
  - Extra logic on the scan chain
  - Good for stuck-at faults
- Fault Simulation Based
  - To find a faulty circuit matching the syndromes [Kundu 1993] [Cheney 2000] [Stanley 2000]
  - Tightening heuristic → upper & lower bound [Guo 2001] [Y. Huang 2005]
  - Use single-excitation patterns for better resolution [Li 2005]
- Profiling-Based Method
  - Locate the fault directly from the difference profiles obtained by run-and-scan test
  - Applicable to bridging faults
  - Use signal processing techniques such as filtering and edge detection

### **Outline**

- Introduction
- Combinational Logic Diagnosis
- Scan Chain Diagnosis
- Logic BIST Diagnosis
  - Overview
  - Interval-Based Method
  - Masking-Based Method
- Conclusion

## Diagnosis for BISTed Logic

#### Diagnosis in a BIST environment requires

- determining from compacted output responses which test vectors have produced a faulty response (time information)
- determining from compacted output responses which scan cells have captured errors (space information)

#### The true fault location inside the logic

Can then be inferred from the above space and time information using the previously discussed combinational logic diagnosis

### **Conclusions**

- Logic diagnosis for combinational logic
  - Is mature
  - Good for not just stuck-at faults, but also bridging faults
- Scan chain diagnosis
  - Making good progress ...
  - Fault-simulation-based, or signal-profiling-based
- Diagnosis of scan-based logic BIST
  - Hardware support is often required
  - Interval-unloading, or masking-based
- Future challenges
  - Performance (speed) debug
  - Diagnosis for logic with on-chip test compression and decompression
  - Diagnosis for parametric yield loss due to nanometer effects
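As a companion to the profiling-based scan chain method, the average-sum filter and the edge detection step can be sketched in a few lines. This is an illustration under stated assumptions, not the published implementation. The weight vector is taken to have six elements, [-1, -1, -1, 1, 1, 1], one per difference term. Indices that fall outside the chain are clamped to the nearest end, which is a boundary rule chosen here. The difference profile is made up.

```python
from typing import List, Sequence


def _clamped(profile: Sequence[float], i: int) -> float:
    """Read profile[i], replicating the end values outside the chain (assumption)."""
    return profile[min(max(i, 0), len(profile) - 1)]


def average_sum_filter(d: Sequence[float]) -> List[float]:
    """SD[i] = 0.2 * (D[i-2] + D[i-1] + D[i] + D[i+1] + D[i+2])."""
    return [0.2 * sum(_clamped(d, i + k) for k in range(-2, 3))
            for i in range(len(d))]


def edge_detection(sd: Sequence[float]) -> List[float]:
    """Weight |SD[i] - SD[i+k]| by -1 for k in {-3,-2,-1} and +1 for k in {1,2,3}."""
    offsets_and_weights = [(-3, -1), (-2, -1), (-1, -1), (1, 1), (2, 1), (3, 1)]
    return [
        sum(w * abs(sd[i] - _clamped(sd, i + k)) for k, w in offsets_and_weights)
        for i in range(len(sd))
    ]


# Hypothetical difference profile over 12 flip-flops: flat, then a step up.
d = [0.0] * 6 + [1.0] * 6
suspicion = edge_detection(average_sum_filter(d))
suspect = max(range(len(suspicion)), key=suspicion.__getitem__)
print(suspect)  # 3
```

In this example the raw step sits at index 6, the 5-tap smoothing spreads it over indices 4 to 8, and the suspicion peak at index 3 marks the left boundary of that widened transition region in the smoothed profile.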