Anomaly 2 Trainer
Early detection of significant abnormal changes is highly desirable for oil refinery processes, which consist of sophisticated unit operations handling hazardous and flammable inventories and operating at high temperature and pressure. Close monitoring and anomaly detection are vital for avoiding major accidents and losses, and they enable intervention before failure occurs. A new big data analytics tool called Plant Health Index (PHI) is proposed in this work. PHI is statistical anomaly detection software that trains its model on online data from normal plant operation and then uses statistical analytics to detect anomalies. To detect anomalies, it combines multivariate analysis of residuals with nonparametric models of the process. The methodology provides a structured representation of the plant variables that eases the detection of problems as well as of changes in system operation. The PHI system has been tested on a hydrotreating unit in a refinery, which consists of catalytic reactors and separators. The current implementation tagged 170 process variables and proved effective in capturing the normal operational conditions of the plant. When placed online, PHI was able to detect anomalies that are difficult to detect using the control system, and it did so before the alarm system detected them.
Oil refineries are among the most complicated dynamical structures, requiring smooth, effective and safe operation to continuously produce high quality products at competitive costs. Extremely sophisticated surveillance systems are necessary, with early identification of plant malfunctions and anomalous behavior. Machine learning algorithms can be used effectively to discover anomalies from online and historical data, which supports system health monitoring. When studying real-world data sets, a common requirement is to know which examples stand out as being different from all others. Such examples are called anomalies, and the purpose of outlier detection or anomaly detection is to find all of them using online operational data [1].
Condition-based maintenance (CBM) can reduce cost expenditure by reducing unnecessary and time-consuming maintenance activities and reducing human errors. One of the essential methodologies in CBM is to predict the anticipated normal state and compare it with the measured observation. If the difference between the expected state and the observed state increases, one can suspect an anomaly in the system. Two types of models are used to predict normal states in the framework of CBM. In the first type, the model is derived from basic physical principles; in the second, the model is inferred from historical observations [5]. This research focused on the latter type, which is sometimes considered an empirical model based on statistical analytics. Empirical models are more practical for the following reasons [6]:
Research activities carried out in this study focused on statistical learning strategies for supporting condition-based maintenance (CBM). The proposed system is composed of a training mode and an execution mode. In the training mode, an empirical model is developed using data collected under normal working conditions. In the execution mode, the system inspects incoming operational data and decides on an anomaly band by studying how far the data deviates from the output of the model built in the training mode.
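The two modes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PHI implementation: a Gaussian kernel smoother stands in for the nonparametric model, the anomaly band is taken as the mean plus three standard deviations of leave-one-out residuals on normal data, and all function names, the bandwidth, and the synthetic data are hypothetical.

```python
import math
import statistics


def estimate_normal_state(query, memory, bandwidth):
    """Gaussian-kernel weighted average of stored normal observations."""
    sq_dists = [sum((q - m) ** 2 for q, m in zip(query, row)) for row in memory]
    nearest = min(sq_dists)  # shift so the largest weight is 1 (avoids a zero sum)
    weights = [math.exp(-(d - nearest) / (2 * bandwidth ** 2)) for d in sq_dists]
    total = sum(weights)
    return [
        sum(w * row[j] for w, row in zip(weights, memory)) / total
        for j in range(len(query))
    ]


def residual(query, memory, bandwidth):
    """Distance between the observation and its expected normal state."""
    return math.dist(query, estimate_normal_state(query, memory, bandwidth))


def train(normal_data, bandwidth=0.1, n_sigma=3.0):
    """Training mode: keep normal data and derive a residual threshold."""
    # Leave-one-out residuals: each point is estimated from all the others.
    loo = [
        residual(row, normal_data[:i] + normal_data[i + 1:], bandwidth)
        for i, row in enumerate(normal_data)
    ]
    threshold = statistics.mean(loo) + n_sigma * statistics.pstdev(loo)
    return {"memory": normal_data, "bandwidth": bandwidth, "threshold": threshold}


def is_anomaly(model, observation):
    """Execution mode: flag observations whose residual leaves the band."""
    r = residual(observation, model["memory"], model["bandwidth"])
    return r > model["threshold"]


# Synthetic "normal operation" (assumed): two correlated variables, x2 close to 2 * x1.
normal_data = [[i / 50, 2 * i / 50 + 0.01 * math.sin(i)] for i in range(51)]
model = train(normal_data)
print(is_anomaly(model, [0.5, 1.0]))  # consistent with the learned correlation
print(is_anomaly(model, [0.5, 3.0]))  # breaks the correlation between the variables
```

The second observation has a plausible value for each variable on its own scale, but the pair violates the relationship seen during training, which is the kind of deviation a residual-based check is meant to expose.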
An operation support system was developed using the process pattern recognition technology proposed in this research. The system is designed to provide a graphical user interface consisting of the main display, success tree display, trends display, counseling display, trainer, and runtime. Using the success tree display, operators can configure the tree and the weight of each node. The trends display is designed to include actual values and model estimates of process variables. The counseling display is provided to support operators in diagnosing the detected faults. Finally, operators can choose sampling methods, grouping options, and kernel optimization methods in the trainer and runtime modules.
After having finished the implementation of the Transformer architecture, we can start experimenting and apply it to various tasks. In this notebook, we will focus on two tasks: parallel Sequence-to-Sequence, and set anomaly detection. The two tasks focus on different properties of the Transformer architecture, and we go through them below.
Transformers offer the perfect architecture for this as the Multi-Head Attention is permutation-equivariant, and thus, outputs the same values no matter in what order we enter the inputs (inputs and outputs are permuted equally). The task we are looking at for sets is Set Anomaly Detection, which means that we try to find the element(s) in a set that do not fit the others. In the research community, the common application of anomaly detection is performed on a set of images, where \(N-1\) images belong to the same category/have the same high-level features while one belongs to another category. Note that category does not necessarily have to relate to a class in a standard classification problem, but could be the combination of multiple features. For instance, on a face dataset, this could be people with glasses, male, beard, etc. An example of distinguishing different animals can be seen below. The first four images show foxes, while the last represents a different animal. We want to recognize that the last image shows a different animal, but it is not relevant which class of animal it is.
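The permutation-equivariance claim can be checked numerically. The sketch below is a simplification, not the tutorial's implementation: it uses a single attention head with queries, keys, and values all equal to the input (no learned projections, no positional encoding). Learned projections are applied to each element independently, so they would not change the argument.

```python
import math


def self_attention(x):
    """Single-head scaled dot-product self-attention with Q = K = V = x."""
    d = len(x[0])
    outputs = []
    for query in x:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in x]
        top = max(scores)  # subtract the maximum for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        norm = sum(exps)
        outputs.append(
            [sum(e / norm * value[j] for e, value in zip(exps, x)) for j in range(d)]
        )
    return outputs


def permute(rows, order):
    """Reorder the set elements according to `order`."""
    return [rows[i] for i in order]


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]
order = [2, 0, 3, 1]

out_then_permute = permute(self_attention(x), order)
permute_then_out = self_attention(permute(x, order))

# Equivariance: permuting the inputs permutes the outputs in the same way.
equivariant = all(
    math.isclose(a, b, abs_tol=1e-9)
    for row_a, row_b in zip(out_then_permute, permute_then_out)
    for a, b in zip(row_a, row_b)
)
print(equivariant)
```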
Next, we can set up our datasets and data loaders below. Here, we will use a set size of 10, i.e. 9 images from one category + 1 anomaly. Feel free to change it if you want to experiment with the sizes.
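As a rough illustration of how such a set could be assembled, the sketch below samples 9 items from one category and 1 from another and returns the position of the anomaly as the label. It is not the tutorial's dataset class: the function name, the `items_by_category` mapping, and the string items standing in for images are all hypothetical, and shuffling the set is a design choice made here.

```python
import random


def sample_anomaly_set(items_by_category, set_size=10, rng=random):
    """Return (items, label), where label is the index of the single anomaly."""
    # Pick two distinct categories: one for the normal items, one for the anomaly.
    normal_cat, anomaly_cat = rng.sample(sorted(items_by_category), 2)
    normal_items = rng.sample(items_by_category[normal_cat], set_size - 1)
    anomaly_item = rng.choice(items_by_category[anomaly_cat])

    # Append the anomaly last, then shuffle and track where it ends up.
    items = normal_items + [anomaly_item]
    order = list(range(set_size))
    rng.shuffle(order)
    shuffled = [items[i] for i in order]
    label = order.index(set_size - 1)
    return shuffled, label


# Placeholder "images": strings tagged with their category.
data = {
    name: [f"{name}_{i}" for i in range(20)]
    for name in ("fox", "bus", "train")
}
example_set, example_label = sample_anomaly_set(data, set_size=10, rng=random.Random(0))
print(example_set, example_label)
```

Because the model is permutation-equivariant, shuffling is not strictly required. It only guards against anything downstream relying on the anomaly's position.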
We can already see that for some sets the task might be easier than for others. Difficulties can especially arise if the anomaly is in a different yet visually similar class (e.g. train vs bus, flour vs worm, etc.).
After having prepared the data, we can look closer at the model. Here, we have a classification of the whole set. For the prediction to be permutation-equivariant, we will output one logit for each image. Over these logits, we apply a softmax and train the anomaly image to have the highest score/probability. This is a bit different than a standard classification layer as the softmax is applied over images, not over output classes in the classical sense. However, if we swap the positions of two images, we effectively swap their positions in the output softmax. Hence, the prediction is equivariant with respect to the input. We implement this idea below in the subclass of the Transformer Lightning module.
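The idea of applying the softmax over the images of a set, rather than over classes, can be made concrete with a small standalone sketch. It is independent of the Lightning module mentioned above, and the logit values and helper names are made up for illustration.

```python
import math


def softmax_over_set(logits):
    """Softmax across the elements of one set (one logit per image)."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def anomaly_loss(logits, anomaly_index):
    """Cross-entropy where the 'class' is the position of the anomaly."""
    return -math.log(softmax_over_set(logits)[anomaly_index])


# Hypothetical logits for a set of four images; the image at index 2 scores highest.
logits = [0.2, -1.0, 3.0, 0.5]
probs = softmax_over_set(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)

# Swapping images 0 and 2 swaps their probabilities; the others are unchanged.
swapped_logits = [3.0, -1.0, 0.2, 0.5]
swapped_probs = softmax_over_set(swapped_logits)

print(predicted, anomaly_loss(logits, 2))
```

Training then amounts to minimizing `anomaly_loss` with the true anomaly position as the target, which pushes that image's probability toward 1.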
In this example, the model confuses a palm tree with a building, giving a probability of 90% to image 2, and 8% to the actual anomaly. However, the difficulty here is that the picture of the building has been taken at a similar angle as the palms. Meanwhile, image 2 shows a rather unusual palm with a different color palette, which is why the model fails here. Nevertheless, in general, the model performs quite well.
In this tutorial, we took a closer look at the Multi-Head Attention layer which uses a scaled dot product between queries and keys to find correlations and similarities between input elements. The Transformer architecture is based on the Multi-Head Attention layer and applies multiple of them in a ResNet-like block. The Transformer is a very important, recent architecture that can be applied to many tasks and datasets. Although it is best known for its success in NLP, there is so much more to it. We have seen its application on sequence-to-sequence tasks and set anomaly detection. Its property of being permutation-equivariant, if we do not provide any positional encodings, allows it to generalize to many settings. Hence, it is important to know the architecture, but also its possible issues such as the gradient problem during the first iterations, which is solved by learning rate warm-up. If you are interested in continuing with the study of the Transformer architecture, please have a look at the blog posts listed at the beginning of the tutorial notebook.
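For readers who want a concrete picture of learning rate warm-up, the sketch below shows one common form: a linear ramp over the first steps multiplied by a cosine decay. The function name, the base learning rate, and the step counts are assumptions chosen for illustration and are not taken from the text above.

```python
import math


def warmup_cosine_factor(step, warmup_steps, max_steps):
    """Multiplier for the base learning rate at a given training step."""
    # Cosine decay from 1 at step 0 down to 0 at max_steps.
    factor = 0.5 * (1.0 + math.cos(math.pi * step / max_steps))
    # Linear ramp from 0 to 1 during the warm-up phase.
    if step <= warmup_steps:
        factor *= step / warmup_steps
    return factor


base_lr = 5e-4  # hypothetical base learning rate
schedule = [base_lr * warmup_cosine_factor(s, 100, 2000) for s in range(2001)]
peak_step = max(range(len(schedule)), key=schedule.__getitem__)
print(peak_step, schedule[peak_step])
```

The learning rate starts at zero, so the first updates are small while the gradient statistics are still unreliable, and it reaches its peak at the end of the warm-up phase.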