CableInspect-AD: An Expert-Annotated Anomaly Detection Dataset [NeurIPS 2024]

Akshatha Arodi1*, Margaux Luck1*, Jean-Luc Bedwani2, Aldo Zaimi1, Ge Li1, Nicolas Pouliot2, Julien Beaudry2, Gaétan Marceau Caron1
1Mila - Quebec AI Institute, 2Institut de recherche d'Hydro-Québec

*Denotes equal contribution

CableInspect-AD features power line cables with several types of defects, replicated with high fidelity by domain experts from Hydro-Québec, a Canadian public utility, to reproduce realistic conditions for robotic inspection.

Abstract

Machine learning models are increasingly being deployed in real-world contexts. However, systematic studies on their transferability to specific and critical applications are underrepresented in the research literature. An important example is visual anomaly detection (VAD) for robotic power line inspection. While existing AD methods perform well in controlled environments, real-world scenarios present diverse and unexpected anomalies that current datasets fail to capture. To address this gap, we introduce CableInspect-AD, a high-quality, publicly available dataset created and annotated by domain experts from Hydro-Québec, a Canadian public utility. This dataset includes high-resolution images with challenging real-world anomalies, covering defects with varying severity levels. To address the challenges of collecting diverse anomalous and nominal examples for setting a detection threshold, we propose Enhanced-PatchCore, an enhancement to the celebrated PatchCore algorithm. This enhancement enables its use in scenarios with limited labeled data. We also present a comprehensive evaluation protocol based on cross-validation to assess models' performances. We evaluate our Enhanced-PatchCore for few-shot and many-shot detection, and Vision-Language Models for zero-shot detection. While promising, these models struggle to detect all anomalies, highlighting the dataset's value as a challenging benchmark for the broader research community.

Dataset

Robotic power line inspection is a specialized and highly challenging domain characterized by a wide range of anomalies, further complicated by the changing appearance of cables due to natural wear. Given the importance of open science and transparency in evaluating machine learning models for such complex real-world applications, there is a clear need for more public industrial datasets. To this end, we introduce CableInspect-AD, which features 4,798 high-resolution images and 6,023 annotated anomalies across three types of power line cables that differ in color, texture, and braiding. These anomalies represent the seven most common defect types listed by Hydro-Québec, with varying severity levels. They are meticulously crafted by experts and are annotated at the image level, at the pixel level, and with bounding boxes, providing a detailed categorization of the anomalies by both type and severity level.

Mosaic

The figure shows some examples of anomalies created on the cables by experts. On each image, anomalies are labeled as type (grade) and their masks are outlined. The grades are (I)mportant, (L)ight, (C)omplete, (E)xtracted and (D)eep. Anomalies such as long scratches (I) are hard to spot, whereas deposit (I) and spaced strands (I) are easier.

Prototype

The data associated with each instance was acquired through a meticulous manual process. To achieve this, experts selected three cables in operation and identified seven types of anomalies, each categorized by severity grades. Some of these anomalies were manually created by experts. Here, we show the apparatus used to simulate the power line inspection robot and to control the background and the lighting during the acquisition.

robot

Annotation

Here, we show example images after annotation. The images below show instances with more than one anomaly type in the same image, with their masks. They also highlight the variation in appearance between the different cables.

annotation annotation

Results

To address the challenges of collecting diverse anomalous and nominal examples for setting a detection threshold, we introduce Enhanced-PatchCore, an improved version of PatchCore that sets thresholds using only a training set with a few nominal images. We define a comprehensive evaluation protocol based on cross-validation and evaluate Enhanced-PatchCore for few-shot and many-shot detection. To further eliminate the need for a training set, we use open-source conversational Vision-Language Models (LLaVA, BakLLaVA and CogVLM) and WinCLIP in a zero-shot setting. Our findings indicate that the baselines show promising results in detecting anomalies on the cables. However, they struggle with certain types and grades of anomalies, highlighting the need for further research in real-world industrial contexts. By introducing CableInspect-AD, we aim to push the frontiers of VAD and demonstrate its potential to generalize to complex, real-world domains.
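As a rough illustration of setting a threshold from nominal images only, the sketch below picks a nearest-rank quantile of anomaly scores computed on held-out nominal images. This is an assumption made for illustration, not necessarily the thresholding strategy used by Enhanced-PatchCore; the function names, the `quantile` parameter and the example scores are all hypothetical.

```python
import math


def nominal_quantile_threshold(nominal_scores, quantile=0.95):
    """Pick a threshold from anomaly scores of nominal images only.

    Illustrative assumption: the threshold is the nearest-rank quantile of
    the held-out nominal scores, so roughly `quantile` of the nominal images
    fall at or below it.
    """
    if not nominal_scores:
        raise ValueError("need at least one nominal score")
    if not 0.0 < quantile <= 1.0:
        raise ValueError("quantile must be in (0, 1]")
    ordered = sorted(nominal_scores)
    # Nearest-rank rule: smallest value with at least `quantile` of the
    # scores at or below it.
    rank = math.ceil(quantile * len(ordered))
    return ordered[rank - 1]


def predict_anomalous(scores, threshold):
    """Flag an image as anomalous when its score exceeds the threshold."""
    return [score > threshold for score in scores]


# Hypothetical scores produced by some anomaly detector on nominal images.
held_out_nominal = [0.4, 0.1, 0.3, 0.2]
threshold = nominal_quantile_threshold(held_out_nominal, quantile=1.0)
flags = predict_anomalous([0.1, 0.5], threshold)
```

A lower quantile flags more anomalies at the cost of more false positives on nominal images, so the value would need to be chosen with the application's tolerance for false alarms in mind.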

F1 score FPR

The figure above shows the image-level F1 score and FPR of the VLMs and Enhanced-PatchCore with different thresholding strategies on our dataset. First, we observe that CogVLM-17B has the best F1 score, whereas CogVLM2-19B has the lowest FPR. Enhanced-PatchCore has a better F1 score than all VLMs except CogVLM-17B. There are large variations across VLMs, indicating the need for careful selection. Enhanced-PatchCore, even with limited nominal images, remains competitive while offering the added advantage of pixel-level evaluation.
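For reference, the two image-level metrics discussed here can be computed from binary labels and predictions as in the minimal sketch below. It assumes that 1 denotes an anomalous image and 0 a nominal one; the function name and the toy labels are illustrative and are not taken from the code repository.

```python
def image_level_metrics(y_true, y_pred):
    """Compute the F1 score and false positive rate for binary labels.

    Assumes 1 (or True) marks an anomalous image and 0 (or False) a
    nominal one.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    # F1 = 2TP / (2TP + FP + FN): harmonic mean of precision and recall.
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    # FPR = FP / (FP + TN): share of nominal images wrongly flagged.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {"f1": f1, "fpr": fpr}


# Toy example: three anomalous and two nominal images.
metrics = image_level_metrics([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

Reporting both numbers matters because a model can reach a high F1 score by flagging almost everything, which the FPR then exposes.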

segmentation

Enhanced-PatchCore outperforms WinCLIP in the segmentation task on our cropped dataset (background removed), with an AUPRO of 0.53 ± 0.08 compared to 0.27 ± 0.06 for WinCLIP. The figure above displays example outputs from Enhanced-PatchCore on different cables: the input images (top), the pixel-level prediction heatmaps (middle), and the ground truth masks (bottom), coloured by anomaly type. The rightmost image is nominal (green); the rest contain anomalies (red). The model effectively identifies larger anomalies (left) but misses subtler ones (middle). The nominal image on the right shows texture changes from wear; such variations can distract the model, adding complexity to the task.
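AUPRO builds on the per-region overlap (PRO): for each connected ground truth region, the fraction of its pixels that the thresholded heatmap covers, averaged over regions so that small anomalies count as much as large ones. The sketch below computes PRO at a single threshold on small nested-list masks. It is a simplified illustration, not the evaluation code from the repository: the full metric sweeps the threshold and integrates PRO against the pixel-level false positive rate, and the 4-connectivity choice and function names here are assumptions.

```python
from collections import deque


def connected_regions(mask):
    """Return ground truth regions as lists of (row, col), 4-connectivity."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited anomalous pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            region = []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def per_region_overlap(heatmap, gt_mask, threshold):
    """Mean fraction of each ground truth region scored above threshold."""
    regions = connected_regions(gt_mask)
    if not regions:
        return None  # PRO is undefined for images without anomalies.
    overlaps = [
        sum(1 for y, x in region if heatmap[y][x] > threshold) / len(region)
        for region in regions
    ]
    return sum(overlaps) / len(overlaps)


# Toy example: a two-pixel region half detected, a one-pixel region detected.
gt = [[1, 1, 0, 0], [0, 0, 0, 1]]
heat = [[0.9, 0.2, 0.1, 0.1], [0.1, 0.1, 0.1, 0.8]]
pro = per_region_overlap(heat, gt, threshold=0.5)
```

In the toy example the one-pixel region weighs as much as the two-pixel region, which is why this family of metrics penalizes models that only find large anomalies.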

We find that, in general, the baselines show promising results in detecting anomalies on the cables, but struggle to detect anomalies of certain types and grades. Overall, this use case presents an important challenge for the development of new models that perform well on this task. The dataset is publicly available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

For more information, please refer to the datasheet provided in the paper. The dataset downloaded using this link includes images and annotation files in COCO format. We provide detailed explanations and scripts to generate labels and masks, along with instructions on how to read the dataset and code to reproduce the results in the code repository. For any issues regarding data download, please contact the authors.
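As a starting point for reading the annotation files, the sketch below groups the annotations of a COCO-format file by image and derives an image-level label from them. It relies only on the standard COCO keys (`images`, `annotations`, `categories`, `image_id`, `category_id`, `bbox`). The category names in the toy example are placeholders, and treating images without annotations as nominal is an assumption; any dataset-specific fields, such as severity grades, should be checked against the scripts in the code repository.

```python
import json
from collections import defaultdict


def load_coco(path):
    """Load a COCO-format annotation file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_coco(coco):
    """Map each image file name to its boxes, anomaly types and label.

    Assumption: an image with no annotation entries is treated as nominal.
    """
    categories = {cat["id"]: cat["name"] for cat in coco.get("categories", [])}
    per_image = defaultdict(list)
    for ann in coco.get("annotations", []):
        per_image[ann["image_id"]].append(ann)

    summary = {}
    for image in coco.get("images", []):
        anns = per_image.get(image["id"], [])
        summary[image["file_name"]] = {
            "is_anomalous": bool(anns),
            # COCO boxes are [x, y, width, height] in pixels.
            "boxes": [tuple(ann["bbox"]) for ann in anns],
            "types": sorted(
                {categories.get(ann["category_id"], "unknown") for ann in anns}
            ),
        }
    return summary


# Toy annotation dictionary with placeholder names (not the real files).
toy = {
    "images": [
        {"id": 1, "file_name": "a.png"},
        {"id": 2, "file_name": "b.png"},
    ],
    "categories": [{"id": 7, "name": "placeholder_type"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 7, "bbox": [5, 6, 20, 10]},
    ],
}
summary = summarize_coco(toy)
```

On the real data, `summarize_coco(load_coco(path))` would be called with the path to one of the downloaded annotation files.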

Acknowledgement

This research was enabled in part by compute resources, software and technical help provided by Mila. We thank Ali Harakeh and Pierre-Luc St-Charles from the Mila Applied Machine Learning Research Team (AMLRT) for fruitful discussions, brainstorming and feedback. We also thank Hydro-Québec and IREQ for their involvement throughout the project. The project received funding from Hydro-Québec and was further supported by governmental contributions from the Ministère de l'Économie, de l'Innovation et de l'Énergie (MEIE) and Innovation, Science and Economic Development Canada (ISED). This website template is adapted from Nerfies, and is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.