Alarm tree analysis using statistical techniques

Omar Aizpurúa, Humberto Álvarez
Department of Electrical Engineering
Technological University of Panamá

Ramón Galán, Agustín Jiménez, Alfonso Romero
School of Industrial Engineers
Polytechnic University of Madrid

Abstract- This work presents a systematic method for generating and processing alarm graphs, with the final objective of finding the root cause of the massive alarms produced in dispatching centers. Although many works on this subject have already been developed, the problem of alarm management in industry is still not completely solved. In this paper, a simple statistical analysis of the historical database is conducted. The results obtained from the alarm acquisition system are used to generate a directed graph from which the most significant alarms are extracted, after analyzing every possible case in which a large number of alarms is produced.

Keywords- Avalanche of alarms, Root Cause Analysis, Alarm Tree Analysis, Electrical Power Administration

Resumen- Este trabajo presenta un método sistemático para la generación y tratamiento de los gráficos de las alarmas, siendo su objetivo final el de encontrar la causa principal de las alarmas masivas que se producen en los centros de distribución. Aunque se han desarrollado muchos trabajos sobre este tema, el problema de la gestión de alarmas en la industria no está todavía completamente resuelto. En este trabajo, se lleva a cabo un análisis estadístico simple de la base de datos histórica. Los resultados obtenidos por el sistema de adquisición de las alarmas, se utilizan para generar un gráfico dirigido en el que se extraen las alarmas más importantes, analizando previamente todos los posibles casos en los que se produjeron una gran cantidad de alarmas.

Palabras Claves- Avalancha de alarmas, Análisis de Causa Raíz, Análisis del árbol de alarmas, Administración de Alarmas

Paper type: Original
Received: January 24, 2011
Accepted: July 13, 2011

1. INTRODUCTION

With today's digital control systems it is possible to configure a large number of alarms in industrial processes. As a result, many centers generate a large number of alarms under normal conditions (2,000 alarms per day and operator), and this number multiplies when there is an abnormality in the process (exceeding 40,000 alarms per day) [1]. This large number of alarms during a failure is called an "avalanche of alarms". In addition, a few alarms are the "root cause" of more than 90% of all alarms [2].

A group of researchers in intelligent control at the Polytechnic University of Madrid and the Technological University of Panama has been working on the analysis of avalanches of alarms occurring in electrical power transmission and distribution centers. This issue can be addressed with different methods such as Bayesian networks [3], neural networks [4], expert systems [5], decision theory [6], etc., or even by a mixture of several of these methods, such as neuro-fuzzy models [7] and integrated methodologies [1].

In this study, statistical concepts are used to build a directed graph of the alarms of an industrial process from the database provided by the data acquisition system, and to identify the most significant alarms among those that occur in a given time. That is, for an avalanche of alarms produced at any time, the system obtains the root alarms that generate the rest of the alarms, reducing the number of alarms received by the operator responsible for supervision.

An alarm file is generated from the available data of an electric power distribution center, and certain operations are carried out on it. These operations produce the adjacency matrix of a directed graph, i.e., they reveal how the network of alarms that can occur is structured. Once this matrix is obtained, the most significant alarms that occurred in an interval of time (time window) can be deduced by Boolean operations that filter the alarms of interest.

The second section of this document develops the processing of data, presenting the correlations, partial correlations and conditional probabilities calculated among all alarms, which result in the adjacency matrix of the directed graph of alarms. The third section explains the adjacency matrix and some of its characteristics and properties. The fourth section performs the analysis of significant alarms, i.e., the extraction of the alarms that are the root cause in a case of avalanche. The fifth section is a brief description of the software created for the generation and analysis of alarms. Finally, conclusions and possible future work are outlined, followed by the references. To illustrate the application, an example is presented with its matrices and results (see Fig. 1).

2. DATA PROCESSING

From the data acquisition system of an electrical energy distribution center, a set of data encoded as 0's and 1's is obtained. In this matrix, the columns correspond to the alarms in the acquisition system database and the rows are the time intervals considered when creating the encoded file. When a 1 appears in a row (time window), it indicates that the alarm corresponding to that column is active at that time. This file provides the relationships between alarms needed to obtain the adjacency matrix of a directed graph.

The example presented (Fig. 1) is based on a given directed graph for which the alarm file is built. The file, in this case, consists of 10 columns, corresponding to the 10 alarms to be studied, and a large number of rows, corresponding to the time windows that, in a more general case, would be obtained from the database of events.

Figure 1. Example
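As a minimal sketch of the encoding described in this section, the following Python fragment turns a list of time-stamped alarm events into a matrix of 0's and 1's with one row per time window and one column per alarm. The event format, the function name `encode_alarm_file` and the fixed window length are assumptions made for illustration; the paper does not specify how the acquisition system stores its events.

```python
def encode_alarm_file(events, n_alarms, window, n_windows):
    """Return a list of rows (time windows) of 0/1 flags, one flag per alarm.

    events: iterable of (timestamp, alarm_index) pairs, alarm_index 0-based.
    window: window length in the same units as the timestamps (assumed fixed).
    """
    matrix = [[0] * n_alarms for _ in range(n_windows)]
    for timestamp, alarm in events:
        row = int(timestamp // window)
        if 0 <= row < n_windows and 0 <= alarm < n_alarms:
            matrix[row][alarm] = 1  # alarm active at least once in this window
    return matrix


# Hypothetical events: (seconds since start, alarm index)
events = [(0.5, 0), (1.2, 1), (12.0, 0), (13.9, 2), (25.0, 1)]
alarm_file = encode_alarm_file(events, n_alarms=3, window=10.0, n_windows=3)
for row in alarm_file:
    print(row)
```

Each printed row is one time window; a 1 in column k means that alarm k was active in that window.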

A. Correlation Coefficients

Correlation is a dimensionless measure of the linear relationship between two variables. The correlation between two alarms x and y is given by the ratio between their covariance and the product of the standard deviations of the two variables, i.e.:

$$r_{xy} = \frac{\operatorname{cov}(x, y)}{\sigma_x \, \sigma_y} \qquad (1)$$

Thus the correlation is a measure of the degree of relationship between two variables, no matter what the cause or the effects are.

For this analysis, only positive correlations greater than a certain value (threshold) are needed to draw a graph of relations between alarms.

For the example, the correlations between alarms i and j are presented in the following table, considering a convenient threshold value of 0.2. The correlation between alarms i and j is given by the (i, j) position of the matrix.

Table 1. Correlation coefficients

As seen in Fig. 2, it is possible to build a correlation graph from the resulting matrix, although this graph contains several redundant connections.


Figure 2. Alarm tree correlations
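The correlation step of this subsection can be sketched in a few lines of Python. The alarm file below is hypothetical (it is not the 10-alarm example of the paper), and the helper names `correlation` and `correlation_matrix` are illustrative. Treating a constant column as having zero correlation with everything is an assumption of this sketch.

```python
import math


def correlation(x, y):
    """Pearson correlation of two equal-length 0/1 sequences (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(alarm_file):
    """Correlation between every pair of columns (alarms) of the 0/1 alarm file."""
    columns = list(zip(*alarm_file))
    return [[correlation(ci, cj) for cj in columns] for ci in columns]


# Hypothetical alarm file: rows are time windows, columns are alarms 1..3.
alarm_file = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
    [1, 0, 1],
    [0, 0, 0],
]
THRESHOLD = 0.2
r = correlation_matrix(alarm_file)
n = len(r)
# Keep only pairs whose positive correlation exceeds the threshold.
edges = [(i + 1, j + 1) for i in range(n) for j in range(i + 1, n) if r[i][j] > THRESHOLD]
print(edges)
```

With this data only the pair (1, 2) exceeds the threshold, so the undirected correlation graph would have a single edge.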

B. Partial Correlations

The excess connections found with the correlations of the previous subsection can be reduced using partial correlations between two alarms while holding a third alarm constant. Namely, the correlation between two alarms (x and y) can be positive and greater than the threshold, while their partial correlation, holding alarm z constant, is less than the threshold. In that case, the relationship between the two alarms (x and y) is explained by the relationship of each of them with the third (z). Partial correlations thus help to delete redundant connections between nodes separated by more than one edge, since those connections are explained by the nodes lying between them. This process deletes connections between grandchildren and grandparents, because they are explained by the parent-child relationships that the two alarms share with the intermediate alarm.

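Partial correlations between all alarms are calculated to eliminate the redundant connections of the graph in Fig. 2. The partial correlation between two alarms (x and y), holding a third (z) constant, is calculated from the pairwise correlations between the three alarms as follows, where r_{xy} denotes the correlation between alarms x and y:

$$r_{xy \cdot z} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{\left(1 - r_{xz}^{2}\right)\left(1 - r_{yz}^{2}\right)}} \qquad (2)$$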

As with previous correlations, only values greater than the threshold are considered for the analysis.

For the example discussed, a portion of the calculated partial correlations is shown below.

Table 2. Partial Correlations

For example, the partial correlations between alarms 1 and 2, for any given z, are all greater than the threshold of 0.2. It has been found experimentally that a threshold of 0.2 is a good value for deciding whether or not two variables are correlated.

The same holds for the partial correlations between alarms 1 and 3. On the other hand, between alarms 1 and 4 there are values that are less than the threshold: the partial correlation between 1 and 4, holding alarm 2 constant, has a value of 0.000. Alarm 2 is a son of alarm 1 and the father of alarm 4; thus the relationship between grandfather (1) and grandson (4) is explained by the alarm they share as son and father (2). The same is seen in the relationship between alarms 1 and 5. The resulting graph, after eliminating the redundant relationships, is shown in Fig. 3.

Figure 3. Undirected alarm tree
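The pruning described in this subsection can be illustrated with a short Python sketch. It uses the standard first-order partial correlation formula and a hypothetical 3x3 correlation matrix for a chain 1 - 2 - 3, in which the correlation between alarms 1 and 3 is exactly the product of the other two. The function names, and the rule that a single partial correlation at or below the threshold is enough to drop a connection, are assumptions of this sketch.

```python
import math


def partial_correlation(r, x, y, z):
    """First-order partial correlation of x and y, holding z constant."""
    denom = math.sqrt((1 - r[x][z] ** 2) * (1 - r[y][z] ** 2))
    if denom == 0:
        return 0.0
    return (r[x][y] - r[x][z] * r[y][z]) / denom


def prune_edges(r, threshold):
    """Keep pair (x, y) only if its correlation and all its partial
    correlations exceed the threshold (pairs are returned 1-based)."""
    n = len(r)
    kept = []
    for x in range(n):
        for y in range(x + 1, n):
            if r[x][y] <= threshold:
                continue
            others = [z for z in range(n) if z not in (x, y)]
            if all(partial_correlation(r, x, y, z) > threshold for z in others):
                kept.append((x + 1, y + 1))
    return kept


# Hypothetical correlation matrix for a chain 1 - 2 - 3
r = [
    [1.0, 0.6, 0.36],
    [0.6, 1.0, 0.6],
    [0.36, 0.6, 1.0],
]
print(prune_edges(r, threshold=0.2))
```

Here the direct correlation between alarms 1 and 3 (0.36) is above the threshold, but it vanishes once alarm 2 is held constant, so only the connections 1 - 2 and 2 - 3 survive.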

C. Conditional Probability

Conditional probabilities between alarms, which are also calculated from the alarm file, are used to direct the graph. The conditional probability of alarm x given alarm y is:

$$P(x \mid y) = \frac{P(x \cap y)}{P(y)} \qquad (3)$$

For example, the matrix of probabilities shown below gives, in position (i, j), the conditional probability of alarm i given alarm j for i different from j, and the probability of occurrence of alarm i when i is equal to j (diagonal).

Table 3. Probability matrix

By hypothesis, when an alarm occurs, its descendants should also be activated.

The probability of the father given the occurrence of the son will then be lower than the probability of the son given the occurrence of the father. The directed graph is constructed with these conditional probabilities: when the probability of alarm i given alarm j, P(Ai | Aj), is less than the probability of alarm j given alarm i, P(Aj | Ai), the link goes from alarm i to alarm j.

Applying this rule to the previous undirected graph yields the directed graph that was assumed at first, as shown in Fig. 4:

Figure 4. Directed alarm tree
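The orientation rule of this subsection can be sketched as follows. Conditional probabilities are estimated by counting time windows in a hypothetical two-alarm file; the function names are illustrative, and returning `None` when both probabilities are equal is an assumption, since the paper does not say how ties are handled.

```python
def conditional_probability(alarm_file, x, y):
    """Estimate P(x | y) as (#windows with x and y active) / (#windows with y active)."""
    both = sum(1 for row in alarm_file if row[x] and row[y])
    given = sum(1 for row in alarm_file if row[y])
    return both / given if given else 0.0


def direct_edge(alarm_file, i, j):
    """Return (parent, child) for the undirected pair (i, j), or None on a tie."""
    p_i_given_j = conditional_probability(alarm_file, i, j)
    p_j_given_i = conditional_probability(alarm_file, j, i)
    if p_i_given_j < p_j_given_i:
        return (i, j)  # i is the parent: the link goes from i to j
    if p_j_given_i < p_i_given_j:
        return (j, i)
    return None


# Hypothetical alarm file: whenever column 0 is active, column 1 is active too,
# but column 1 is also active in windows where column 0 is not.
alarm_file = [
    [1, 1],
    [1, 1],
    [0, 1],
    [0, 0],
    [0, 1],
]
print(direct_edge(alarm_file, 0, 1))
```

In this data P(A0 | A1) = 0.5 and P(A1 | A0) = 1, so the link is directed from alarm 0 (father) to alarm 1 (son).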

3. ADJACENCY MATRIX

The adjacency matrix of a directed graph is a square matrix of dimension n, where n is the number of nodes (alarms). Its elements are 0 or 1: element (i, j) is 1 when there is a link or edge from node i to node j, and 0 otherwise. For the analyzed example, the following adjacency matrix was obtained:

Table 4. Adjacency matrix of directed graph

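As shown, the adjacency matrix can be calculated in a simple way from the alarm data file. Following Sections 2.A to 2.C, the three conditions to be met for a pair of alarms i and j are:

$$r_{ij} > u, \qquad r_{ij \cdot z} > u \;\; \text{for every other alarm } z, \qquad P(A_i \mid A_j) < P(A_j \mid A_i)$$

where u is the threshold. If all the conditions are fulfilled, then there is a link from alarm i to alarm j.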
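Putting the three tests of Section 2 together, a compact sketch of the adjacency matrix calculation might look like the Python below. It assumes that a link from i to j requires the correlation and every partial correlation of the pair to exceed the threshold, and P(Ai | Aj) to be smaller than P(Aj | Ai). The alarm file is a hypothetical three-alarm chain, not the paper's example, and all function names are illustrative.

```python
import math


def corr(x, y):
    """Pearson correlation of two 0/1 columns (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def partial(r, i, j, z):
    """Partial correlation of i and j, holding z constant."""
    d = math.sqrt((1 - r[i][z] ** 2) * (1 - r[j][z] ** 2))
    return (r[i][j] - r[i][z] * r[j][z]) / d if d else 0.0


def cond_prob(cols, x, y):
    """P(x | y) estimated from window counts."""
    given = sum(cols[y])
    both = sum(a and b for a, b in zip(cols[x], cols[y]))
    return both / given if given else 0.0


def adjacency_matrix(alarm_file, threshold=0.2):
    cols = [list(c) for c in zip(*alarm_file)]
    n = len(cols)
    r = [[corr(cols[i], cols[j]) for j in range(n)] for i in range(n)]
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            cond1 = r[i][j] > threshold
            cond2 = all(partial(r, i, j, z) > threshold
                        for z in range(n) if z not in (i, j))
            cond3 = cond_prob(cols, i, j) < cond_prob(cols, j, i)
            if cond1 and cond2 and cond3:
                adj[i][j] = 1  # edge from alarm i to alarm j
    return adj


# Hypothetical alarm file for a chain 1 -> 2 -> 3 (columns are alarms).
alarm_file = (
    [[1, 1, 1]] * 2 + [[0, 1, 1]] * 2 + [[0, 0, 1]] * 2 + [[0, 0, 0]] * 4
)
for row in adjacency_matrix(alarm_file):
    print(row)
```

For this data the result contains only the links 1 -> 2 and 2 -> 3; the correlation between alarms 1 and 3 is above the threshold, but it is removed by the partial correlation test.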

The adjacency matrix has an important feature. Raising the adjacency matrix to different powers results in several matrices indicating the connections between nodes in different generations: raising it to the nth power gives the connections between nodes separated by n edges. For example, the following connection matrices are obtained:

Table 5. Connection matrices

As shown, row i of the nth-order matrix gives the descendants of order n of alarm i. Conversely, column i gives the ancestors of order n of that alarm.
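The power property can be checked with a small Python sketch. The four-node graph used here (1 -> 2, 1 -> 3, 2 -> 4) is hypothetical, and `connection_matrices` is an illustrative name. The loop stops at the first null power and is capped at n powers, since the nth power of the adjacency matrix of an acyclic graph with n nodes is always null.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def connection_matrices(adj):
    """Return [A, A^2, ...] up to the last non-null power."""
    powers, current = [], adj
    for _ in range(len(adj)):
        if not any(any(row) for row in current):
            break
        powers.append(current)
        current = mat_mul(current, adj)
    return powers


# Hypothetical graph: 1 -> 2, 1 -> 3, 2 -> 4
adj = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
powers = connection_matrices(adj)
second_order = powers[1]
print(second_order[0])                    # row 1: second-order descendants of alarm 1
print([row[3] for row in second_order])   # column 4: second-order ancestors of alarm 4
```

Row 1 of the squared matrix has its only non-zero entry in column 4, so alarm 4 is the single second-order descendant of alarm 1, and column 4 shows alarm 1 as its single second-order ancestor.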

4. DISCUSSION

For the analysis of any particular case, the Problem Vector approach is used. The Problem Vector is a vector of dimension n, where n is the number of alarms, formed by 1's and 0's: an element is 1 if the corresponding alarm is active and 0 otherwise. Multiplying this vector by the successive powers of the adjacency matrix gives non-zero values in the positions of the alarms that are nth-order descendants of the Problem Vector alarms. Applying the NOT operation to each of these products gives a series of vectors with zeros in the positions of the nth-order descendants. The Problem Vector can then be logically multiplied (AND) by these results to filter out all the alarms that are descendants of other Problem Vector alarms. For example, a Problem Vector with alarms 2, 8 and 10 active is:

P = (0 1 0 0 0 0 0 1 0 1)

For the example:

Problem * [A] = (0 0 0 1 1 1 0 0 0 1)

Problem * [A]^2 = (0 0 0 0 0 0 0 0 1 0)

Problem * [A]^3 = (0 0 0 0 0 0 0 0 0 0)

Problem * [A]^4 = (0 0 0 0 0 0 0 0 0 0)

Applying the Boolean logical operations, the solution of significant alarms at a given time is:

Solution = (P) and {not (P * [A])} and {not (P * [A]^2)} ... and {not (P * [A]^n)}

where P is the Problem Vector and n ≥ 3 is such that [A]^n is the null matrix. For the example:

Solution = (0 1 0 0 0 0 0 1 0 0)

Note that, for the problem vector used, the most significant alarms are 2 and 8, as seen in the following graph (Fig. 5).


Figure 5. Example of analysis
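The Boolean filtering of this section can be sketched in Python as follows. Because the adjacency matrix of the example is not reproduced in this text, the edge list below is hypothetical: it was chosen only so that the products of the Problem Vector with the powers of the matrix match the vectors listed above. The function names are illustrative.

```python
def vec_mat(v, m):
    """Row vector times square matrix."""
    n = len(m)
    return [sum(v[k] * m[k][j] for k in range(n)) for j in range(n)]


def significant_alarms(problem, adj):
    """Solution = P and not(P*A) and not(P*A^2) ... until the product is null."""
    solution = list(problem)
    product = vec_mat(problem, adj)
    for _ in range(len(adj)):
        if not any(product):
            break
        solution = [s and not p for s, p in zip(solution, product)]
        product = vec_mat(product, adj)  # next power: (P * A^k) * A
    return [int(s) for s in solution]


# Hypothetical edges (parent, child), 1-based, consistent with the products in the text.
edges = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 7), (8, 6), (8, 10), (6, 9)]
adj = [[0] * 10 for _ in range(10)]
for parent, child in edges:
    adj[parent - 1][child - 1] = 1

problem = [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]  # alarms 2, 8 and 10 active
print(vec_mat(problem, adj))
print(significant_alarms(problem, adj))
```

With these assumed edges, alarm 10 is a first-order descendant of alarm 8, so it is filtered out and only alarms 2 and 8 remain as significant alarms.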

5. SOFTWARE

To make all the calculations required for the creation of the directed graph of alarms, as well as for the analysis of any possible case of avalanche, a software application has been developed. This custom software ("Alarm Tree") was programmed in Visual C++. It allows the user to open an alarm file, consisting of 0's and 1's, and to calculate the adjacency matrix, which in turn allows the analysis of different alarm avalanches. As seen in Fig. 6, the graphical interface of the program is very simple and is formed by three clearly differentiated blocks: alarm file, calculation of the adjacency matrix, and analysis of the alarm avalanche.

The first block opens the alarm file to be studied. The file is read in order to count both the number of rows (time windows) and columns (alarms).

The second block, the adjacency matrix calculation, computes the correlations between alarms, the partial correlations among all of them and their conditional probabilities. The adjacency matrix is built from all these calculations. Once the calculation is finished, all the resulting matrices are displayed.

The last block allows the analysis of the active alarms to obtain the significant root alarms of the system.

In addition, the software has an Exit key and contact information about the author of the program.


Figure 6. Program "Árbol de Alarmas"

6. CONCLUSIONS AND FUTURE WORK

The systematic treatment of massive alarms in electricity distribution and control centers typically consists of a set of different methods that work together to solve the problem. Fig. 7 shows the methodology proposed in [1]. This work is intended to simplify the problem of alarm avalanches, providing an easy way to obtain the significant alarms in any case, starting only from the data provided by the data acquisition system.

Future work is concerned with improving the software application. Possible improvements include incorporating a database with specific data about the alarms and a suggested response for each of them. Such improvements will help in analyzing any avalanche and in the decision-making process, in terms of the root causes and possible failures in the system. Moreover, an application that allows the construction of the directed graphs and sub-graphs will be incorporated. Finally, other functions that facilitate the handling of calculations, modifications or the simulation of different conditions can be added.


Figure 7. Usual methodology in the processing of massive alarms.

7. REFERENCES

[1] O. Aizpurúa, et al., "A New Methodology of Massive Alarm System, in Electrical Power Administration". 7th Latin American and Caribbean Conference for Engineering and Technology. LACCEI 2009.

[2] K. Julisch, "Clustering Intrusion Detection Alarms to Support Root Cause Analysis". ACM Transactions on Information and System Security 6. 2003.

[3] I. A. Beinlich, H. J. Suermondt, R. M. Chavez, and G. F. Cooper, "The ALARM Monitoring System: A Case Study with Two Probabilistic Inference Techniques for Belief Networks". 1989. J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann. 1988.

[4] J. Orr and D. Westenskow, "A Breathing Circuits Alarm System Based on Neural Networks". Journal of Clinical Monitoring. Vol. 10, no. 2, pp. 101-109. Orr, J., et al., 1994 & Windsor, C. G., 1993.

[5] B. Buchanan and E. Shortliffe, Rule-Based Expert Systems. Reading (MA), USA: Addison-Wesley. 1984.

[6] E. Horvitz and M. Barry, "Display of Information for Time-Critical Decision Making". Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence, pp. 296-305. 1995.

[7] L. Blázquez, and L. De Miguel, "Diagnóstico Automático de Fallos para Sistemas Dinámicos No-Lineales". Universidad de Nuevo León y Universidad de Valladolid. 1999.