Wednesday, November 24, 2021

Diagnostic Performance Metrics for PHM Algorithms

Carl Byington developed and patented diagnostic and prognostic technologies for Impact Technologies, Sikorsky Aircraft, and Lockheed Martin, becoming an expert in prognostics and health management (PHM) technologies and next-generation condition-based maintenance (CBM) solutions. He currently consults in these technical areas through his company, PHM Design, located in Georgia. In this latest blog series he discusses how to verify and validate diagnostic algorithms for a specific application. See the Additional Resources below for more about Carl Stewart Byington.

Algorithm Concept

Diagnostic algorithms are typically qualified on specific fault types using limited test data and engineering judgment about their intended applicability. Mechanical ailments in gearboxes, for instance, range from root-cause conditions such as shaft misalignment, to fatigue events such as spalled bearings, to slower wear processes such as scuffing wear on gears. For such a wide array of problems, the accuracy of fault detection and diagnosis depends not only on the algorithm's sensitivity to signal-to-noise ratio but also on load level, failure type, and flight condition. Diagnostic algorithms that are sensitive to faulted conditions yet relatively insensitive to confounding conditions are desirable for a broad range of equipment health monitoring applications. Such generalized algorithms, though, may be less sensitive for early fault detection. To assess the risk associated with using certain diagnostic algorithms, qualify them for a range of use, set thresholds that produce known false alarm rates, and develop fusion approaches, we need to evaluate detection performance and diagnostic accuracy using established performance metrics. Here are some specific metrics we can use for each step.

Decision Matrix Construct

The following Decision Matrix defines the cases used to evaluate fault detection. It is based on hypothesis testing methodology and represents the possible fault-detection combinations that may occur.

Detection Decision Matrix

Outcome                        | Fault (F1)                      | No Fault (F0)                           | Total
Positive (D1) (detected)       | a = number of detected faults   | b = number of false alarms              | a+b = total number of alarms
Negative (D0) (not detected)   | c = number of missed faults     | d = number of correct rejections        | c+d = total number of non-alarms
Total                          | a+c = total number of faults    | b+d = total number of fault-free cases  | a+b+c+d = total number of cases

From this matrix, the detection metrics can readily be computed. The probability of detection given a fault (POD, a.k.a. sensitivity) assesses the detected faults relative to all potential fault cases:

POD = a / (a + c)                                        (1)

The probability of false alarm (POFA) is the proportion of all fault-free cases that trigger a fault detection alarm:

POFA = b / (b + d)                                       (2)

Accuracy measures the effectiveness of the algorithm in correctly distinguishing between fault-present and fault-free conditions. The metric uses all available data (both fault and no-fault cases):

Accuracy = (a + d) / (a + b + c + d)                     (3)
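
To make the arithmetic concrete, here is a minimal Python sketch that computes all three detection metrics from the Decision Matrix cells. The counts in the example are hypothetical, chosen only for illustration.

def detection_metrics(a, b, c, d):
    """Compute detection metrics from the Decision Matrix cells.

    a: detected faults (true positives)
    b: false alarms (false positives)
    c: missed faults (false negatives)
    d: correct rejections (true negatives)
    """
    pod = a / (a + c)                       # Eq. (1): probability of detection (sensitivity)
    pofa = b / (b + d)                      # Eq. (2): probability of false alarm
    accuracy = (a + d) / (a + b + c + d)    # Eq. (3): overall accuracy
    return pod, pofa, accuracy

# Hypothetical counts: 45 detected faults, 5 false alarms,
# 5 missed faults, 95 correct rejections
pod, pofa, acc = detection_metrics(45, 5, 5, 95)
print(f"POD = {pod:.2f}, POFA = {pofa:.2f}, Accuracy = {acc:.2f}")
# POD = 0.90, POFA = 0.05, Accuracy = 0.93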

Diagnostic metrics are used to evaluate classification algorithms, typically consider multiple fault cases, and are based on the confusion matrix concept. The matrix illustrates the results of classifying data into several categories: a confusion matrix shows the actual defects (as headings across the top of the table) and how they were classified (as headings down the first column). The shaded diagonal represents the number of correct classifications for each fault, and the remaining numbers in each column represent incorrect classifications. Ideally, the numbers along the diagonal should dominate. The confusion matrix can be constructed using percentages or actual case counts.
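
As a concrete illustration of the construction, the short Python sketch below tallies a confusion matrix from paired lists of actual and assigned fault labels; the fault labels and classification outcomes are hypothetical.

# Tally a confusion matrix from hypothetical classification results.
# Columns correspond to the actual fault, rows to the classifier's
# assignment, matching the layout described above.

actual   = ["gear", "gear", "bearing", "shaft", "bearing", "gear", "shaft"]
assigned = ["gear", "bearing", "bearing", "shaft", "bearing", "gear", "gear"]

labels = ["gear", "bearing", "shaft"]
idx = {lab: k for k, lab in enumerate(labels)}

cm = [[0] * len(labels) for _ in labels]   # cm[row][col] = cm[assigned][actual]
for act, asg in zip(actual, assigned):
    cm[idx[asg]][idx[act]] += 1

for lab, row in zip(labels, cm):
    print(f"{lab:>8}: {row}")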

The probability of isolation (Fault Isolation Rate - FIR) is the percentage of all component failures that the classifier is able to unambiguously isolate. It is calculated using:

FIRi = Ai / (Ai + Ci)                                    (4)

FIR = Σ Ai / Σ (Ai + Ci)                                 (5)

Ai = the number of detected faults in component i that the monitor is able to isolate unambiguously as due to any failure mode (the entries on the diagonal of the confusion matrix).

Ci = the number of detected faults in component i that the monitor is unable to isolate unambiguously as due to any failure mode (the entries off the diagonal of the confusion matrix).
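
Under this convention (actual faults as columns, classifier assignments as rows), the FIR calculation reduces to a short loop over the confusion matrix. The sketch below is a minimal Python example on a hypothetical three-fault matrix.

# Hypothetical confusion matrix: columns are actual faults, rows are the
# classifier's assignments; the diagonal holds correct isolations.
cm = [
    [18,  1,  0],
    [ 2, 15,  3],
    [ 0,  4, 17],
]

n = len(cm)
A = [cm[i][i] for i in range(n)]                     # Ai: unambiguously isolated (diagonal)
C = [sum(cm[r][i] for r in range(n)) - cm[i][i]      # Ci: not isolated (off-diagonal, column i)
     for i in range(n)]

fir_i = [A[i] / (A[i] + C[i]) for i in range(n)]     # Eq. (4): per-component isolation rate
fir = sum(A) / sum(a + c for a, c in zip(A, C))      # Eq. (5): overall FIR

print("Per-component FIR:", [round(f, 2) for f in fir_i])
print("Overall FIR =", round(fir, 2))
# Per-component FIR: [0.9, 0.75, 0.85]; Overall FIR = 0.83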

Another useful metric is the Kappa Coefficient, which represents how well an algorithm correctly classifies faults, with a correction for chance agreement.

Kappa = [N(obs in agreement) - N(exp in agreement)] / [N(total) - N(exp in agreement)]          (6)

where:

N(obs in agreement) = sum of the diagonal entries of the confusion matrix

N(exp in agreement) = sum over the diagonal cells of [(row sum) × (column sum)] / N(total)

N(total) = total number of observations
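
The same style of hypothetical confusion matrix illustrates the kappa computation. The sketch below implements equation (6) directly; a kappa of 1 indicates perfect agreement, while 0 indicates agreement no better than chance.

# Cohen's kappa (Eq. 6) on a hypothetical confusion matrix.
cm = [
    [18,  1,  0],
    [ 2, 15,  3],
    [ 0,  4, 17],
]

n = len(cm)
n_total = sum(sum(row) for row in cm)                # N(total)
n_obs = sum(cm[i][i] for i in range(n))              # N(obs in agreement): diagonal sum
row_sums = [sum(row) for row in cm]
col_sums = [sum(cm[r][j] for r in range(n)) for j in range(n)]
n_exp = sum(row_sums[i] * col_sums[i] / n_total      # N(exp in agreement)
            for i in range(n))

kappa = (n_obs - n_exp) / (n_total - n_exp)
print("kappa =", round(kappa, 2))   # kappa = 0.75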

The metrics presented here form the basis for analyzing the effectiveness of current detection and diagnostic algorithms. The challenge confronting the PHM designer and researcher is generating realistic estimates of these metrics from the limited baseline and faulted data typically available. The basis for an effective statistical analysis, given these limitations, will be discussed in a separate entry.

Additional Resources

A full review of detection and diagnostic metrics appears in a paper by Carl Byington et al., available here:

http://www.humsconference.com.au/Papers2003/HUMSp404.pdf

Carl Byington’s related publications can be found at:

https://www.researchgate.net/profile/Carl-Byington

Carl Byington may be contacted for specific consulting engagements at:

https://phmdesign.com/contact-us/
