Kerb and channel and pavement condition monitoring using YOLO

By developing computer vision (CV) based models and automated reporting systems, the project aims to improve the accuracy and efficiency of infrastructure inspections, including pavements, kerbs, channels, and sewage pipes. The expected outcome is a proof-of-concept solution that incorporates state-of-the-art techniques, helping Beca remain competitive in innovative infrastructure management solutions.

Beca aims to automate asset management workflows, including roading defect detection and underground sewage pipeline defect detection. These inspections have conventionally been carried out by people, which is labour-intensive and costly. Beca’s internal AI scope explores the possibilities of incorporating AI-powered workflows to complement the existing workflow pipelines. Key deliverables include:

Prototype for a pavement defect detection model
Kerb and channel defect detection for Auckland Transport
Pipevision defect detection for Dunedin City Council

Detecting pavement defects using YOLO

Kerb and channel defect detection using YOLO

Pipe defect detection using YOLO

Research Method

The research flowchart is presented as follows. First, a literature review was conducted to identify the state-of-the-art techniques that should be employed. The research method also includes data collection and model development. The following sub-sections will provide details on each stage.

Literature review

The literature review outlines innovative solutions within asset condition monitoring and defect detection, examining their integration within our competitors’ platforms. It highlights the strategic importance of incorporating these functionalities into Beca’s BEYON offering for upcoming Asset Management contracts. To stay competitive, feasibility and adoption considerations are crucial. The current Asset Management space, along with ongoing ML and LLM initiatives for defect detection, is described. A comparative analysis of competitors/partners (Bentley, Conelabs) is provided. The review also considered incorporating 3D model defect labelling/viewing and dashboard functionality into BEYON based on competitor features. Moreover, it covered the advantages and limitations of state-of-the-art potential competitors in terms of data collection instrumentation requirements and services provided.

Data collection

The data used in the research come from both Beca’s existing data repositories and open-source online resources. The research aims to develop CV-based models for three asset inspection tasks: pavement, kerb and channel (K&C), and sewage pipe. Each model is trained on a different dataset, drawn from a different source. The open-source data was collected from repositories shared by other researchers who addressed the same problems.

Model development

The model development contains three stages, namely “Defect Detection and Classification”, “Segmentation and Measurement”, and “Reasoning and Reporting Automation”, which are explained in the following sub-sections.

Defect detection and classification

This module aims to detect and classify defects in the photogrammetry data. Computer vision (CV) based models were trained for these purposes. The module involves “training dataset collection and labelling” and “computer vision model training”.
The labelling process was completed in Label Studio and Roboflow. Label Studio is a locally installed software package, while Roboflow is an online platform for labelling and augmenting training data, as shown in Figure 1. Both tools were used in this research to generate labelled samples. Besides manual labelling, automatic labelling was also adopted: Grounding DINO was deployed to generate labels from a text prompt, as shown in Figure 2 and Figure 3.

Figure 1 Roboflow labelling interface

Figure 2 Prompt text for Grounding DINO

Figure 3 Example outputs generated by Grounding DINO
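
As an illustration of how automatically generated boxes could be turned into training labels, the following is a minimal sketch that converts one bounding box to a YOLO-format label line (class index followed by the normalised box centre, width and height). It assumes the auto-labeller's boxes are available as pixel corner coordinates; the function name and the example values are hypothetical and are not taken from the project code.

```python
def to_yolo_line(class_id, box_xyxy, img_w, img_h):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a YOLO label line.

    Assumption: boxes arrive as pixel corner coordinates. Some tools return
    normalised centre/size boxes instead, in which case no conversion is needed.
    """
    x_min, y_min, x_max, y_max = box_xyxy
    # Clamp the box to the image so the normalised values stay within [0, 1].
    x_min, x_max = max(0.0, x_min), min(float(img_w), x_max)
    y_min, y_max = max(0.0, y_min), min(float(img_h), y_max)
    # YOLO labels store the box centre and size as fractions of the image size.
    x_c = (x_min + x_max) / 2 / img_w
    y_c = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"


# Hypothetical example: class 0, a 200 x 200 px box in a 1000 x 800 px image.
line = to_yolo_line(0, (100, 200, 300, 400), 1000, 800)
```

One line of this form per defect, written to a text file that shares the image's base name, is the label layout YOLO training expects.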

Computer vision model training

The CV model used in this research is You Only Look Once (YOLO), a state-of-the-art object detection model. It supports real-time object detection and is known for its speed and accuracy in detecting objects in images of various sizes. Separate YOLO models were trained for pavement, K&C, and Pipevision, using the GPU of the Beca computer NZ4400 for training.

Segmentation and measurement

After the defects are detected, segmentation and measurement is a key step in quantifying defect size, especially for pavement and bridge surface defects. The state-of-the-art (SOTA) semantic segmentation architecture U-Net was adopted: an open-source U-Net model pre-trained for crack detection was fine-tuned on our datasets. After segmentation, the length and width of the cracks are measured using an open-source algorithm, the output of which is shown in Figure 4.

Figure 4 Crack size measurement
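
To make the measurement step more concrete, the following is a simplified sketch of a pixel-counting estimate on a binary segmentation mask. It is not the open-source algorithm used in the project: it assumes a single, roughly horizontal crack and a known scale in millimetres per pixel (`mm_per_px`, a hypothetical parameter that would be derived from the image dimensions). A curved or branching crack would need a skeleton-based method instead.

```python
def measure_crack(mask, mm_per_px):
    """Estimate crack length and mean width from a binary mask.

    mask: list of rows, each a list of 0/1 values (1 = crack pixel).
    Assumption: one crack running roughly left to right across the image.
    Returns (length_mm, mean_width_mm).
    """
    n_cols = len(mask[0]) if mask else 0
    # Count the crack pixels in each image column.
    col_counts = [sum(row[c] for row in mask) for c in range(n_cols)]
    occupied = [n for n in col_counts if n > 0]
    if not occupied:
        return 0.0, 0.0
    # Length: number of columns the crack passes through, scaled to mm.
    length_mm = len(occupied) * mm_per_px
    # Mean width: average crack thickness per occupied column, scaled to mm.
    mean_width_mm = sum(occupied) / len(occupied) * mm_per_px
    return length_mm, mean_width_mm


# Hypothetical 4 x 5 mask with a short crack, at 0.5 mm per pixel.
example_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
length_mm, width_mm = measure_crack(example_mask, 0.5)
```

Because the width is counted in whole pixels, a coarse scale limits how well thin cracks can be resolved.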

Reporting automation

Report generation is the final stage of the pipeline. It aims to generate inspection reports from the defects picked up by the YOLO model, which requires a large language model (LLM) with vision capability. In our approach, YOLO detects the defects and its outputs are sent to GPT-4 Vision, which interprets each image and translates that information into a written description. Combining the two, defects in kerbs and channels are first detected using YOLO, and GPT-4 Vision is then applied to the images containing defects to describe each defect and suggest likely causes visible in the image, as shown in Figure 5.


YOLOv5: Outputs frames containing defects	GPT-4 Vision: Text-based description of defects

Figure 5 YOLO and GPT-4 Vision outputs
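
The hand-over between the two models can be sketched as follows. This example assumes that the detections are saved as YOLO-style text files, one per frame, with the frame number as a suffix of the file name and one line per box (class index, normalised centre, width and height, optionally followed by a confidence value). The file naming and the function name are assumptions for illustration only.

```python
import re


def parse_detection_file(file_name, text):
    """Read one YOLO-style detection file.

    Assumption: the file name ends in "_<frame>.txt" and each line holds
    "class x_center y_center width height [confidence]".
    Returns (frame_number or None, list of detections).
    """
    match = re.search(r"_(\d+)\.txt$", file_name)
    frame = int(match.group(1)) if match else None
    detections = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        x_c, y_c, w, h = map(float, parts[1:5])
        detections.append({
            "class_id": int(parts[0]),
            "box": (x_c, y_c, w, h),
            "confidence": float(parts[5]) if len(parts) > 5 else None,
        })
    return frame, detections


# Hypothetical file content with two detections on frame 42.
frame, detections = parse_detection_file(
    "survey_42.txt", "0 0.5 0.5 0.2 0.1 0.83\n\n1 0.1 0.2 0.3 0.4\n"
)
```

Only frames with at least one detection would then be passed on to the reporting stage.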

Research Results and Outcomes

The research results and outcomes are presented for the modules “Defect Detection and Classification”, “Segmentation and Measurement” and “Reasoning and Reporting Automation”.

Defect Detection and Classification

The performance of the YOLO model trained for pavement defect detection is shown in Figure 6. The trained model achieved a precision of approximately 70%, while the sewage pipe defect detector (Figure 7) achieved 55%. The kerb & channel defect detector (Figure 8) performed worse than the other two. Improvements that could be made in future are discussed in the “Future Research” section.

Figure 6 Model performance of pavement defect detector

Figure 7 Model performance of sewage pipe defect detector

Figure 8 Model performance of kerb & channel defect detector
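
For reference, precision is the share of predicted boxes that match a ground-truth defect. The following is a minimal single-class sketch, assuming boxes are given as pixel corner coordinates, predictions are already sorted by descending confidence, and a match requires an intersection-over-union (IoU) of at least 0.5. The threshold and the example boxes are assumptions; the report does not state the settings used to produce the figures above.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision(predictions, ground_truth, iou_threshold=0.5):
    """Precision = true positives / all predictions (single class).

    Each ground-truth box can be matched by at most one prediction.
    """
    matched = set()
    true_positives = 0
    for pred in predictions:
        best_j, best_iou = None, 0.0
        for j, gt in enumerate(ground_truth):
            if j in matched:
                continue
            value = iou(pred, gt)
            if value > best_iou:
                best_j, best_iou = j, value
        if best_j is not None and best_iou >= iou_threshold:
            matched.add(best_j)
            true_positives += 1
    return true_positives / len(predictions) if predictions else 0.0


# Hypothetical example: two of the three predictions match a labelled defect.
gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
pred_boxes = [(1, 1, 10, 10), (50, 50, 60, 60), (20, 20, 30, 31)]
p = precision(pred_boxes, gt_boxes)
```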

Segmentation and measurement

The performance of the segmentation and measurement algorithm was validated against human measurements. The size of a single crack is estimated by counting the pixels the crack contains to obtain its length and width in pixels, then converting to physical units using the known image dimensions. The results are shown in Table 2 and Figure 9.

*Table 2 Performance of crack measurement*
Measurement	Ground truth	Model’s result
Crack length	212 mm	197 mm
Crack width	1 mm	7 mm

Figure 9 Outputs of the crack measurement algorithm
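
The gap between the two columns of Table 2 can be expressed as a relative error. The short calculation below uses only the values reported in the table; the function name is hypothetical.

```python
def relative_error(measured, reference):
    """Absolute difference as a fraction of the reference value."""
    return abs(measured - reference) / reference


# Values from Table 2 (mm): model result versus human ground truth.
length_error = relative_error(197, 212)  # about 0.071, i.e. roughly 7%
width_error = relative_error(7, 1)       # 6.0, i.e. 600%
```

On this single sample the length estimate is within about 7% of the human measurement, while the width is overestimated by a factor of seven.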

Reasoning and Reporting Automation

The output from the YOLO model includes the frame number and bounding box of each defect and is saved in a .txt file. These outputs are then fed into the next stage, “Reasoning and Reporting Automation”. To engage GPT-4 Vision, the first step was to provide a prompt that tells the model the role it needs to play. The prompt used in this research is shown in Figure 10. The output schema was also specified, which tells the model the format to follow when reporting. The final outputs are shown in Table 3.

system_message = {
    "role": "system",
    "content": "You are a crack detection engineer, specializing in identifying defects on kerbs and channels. Your task is to conduct a visual road condition rating survey, focusing on the provided image. Pay attention to details such as overall curb integrity, visibility of joints, debris in the channel, and transitions to the road surface. Emphasize a balanced tone, expressing confidence while acknowledging potential limitations due to shadows. Your responses should maintain a consistent structure, and when addressing debris, align with the notion that minor debris is typical and not necessarily indicative of a defect" 
}

Figure 10 GPT prompting messages
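
As a sketch of how a system message like the one in Figure 10 might be combined with a defect image, the example below builds a message list in the chat format that accepts an image as a base64 data URL. It only constructs the payload and does not call any API. The user instruction text, the function name and the shortened system prompt are placeholders, not the wording used in the project.

```python
import base64


def build_messages(system_prompt, image_bytes, mime_type="image/jpeg"):
    """Build chat messages pairing a system prompt with one image.

    Assumption: the image is sent inline as a base64 data URL.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return [
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": [
                # Hypothetical instruction; the project's wording is not shown.
                {"type": "text", "text": "Assess the kerb and channel in this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
                },
            ],
        },
    ]


# Placeholder inputs: a shortened prompt and three dummy bytes as the image.
messages = build_messages("You are a crack detection engineer.", b"abc")
```

Keeping the system message separate from the per-image user message means the same role description and output format apply to every frame that YOLO flags.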

*Table 3: Inspection report outputs (GPT generated)*
Criteria of Interest	Description and Rating
Curb Integrity	The curb shows signs of cracking and potential shifting, particularly around the area where it meets the tree base. This could indicate structural weakness or damage due to root growth. The condition is rated as 2.5 (between Average and Poor). Confidence level: 75% due to shadows that may obscure smaller defects.
Surfaced Channel	The channel appears to be an earth surface channel with some accumulation of natural debris such as leaves and small branches. There is no visible evidence of significant blockages that would prevent water from flowing, but the proximity to the tree suggests potential for blockage due to falling debris. The general condition of the channel in regard to its effectiveness is rated as 2 (Average). Confidence level: 70% due to limited visibility of the entire channel.
Joint Condition	The joints between curb sections are partially visible, with some displacement noticeable near the tree, likely due to root growth. The condition of the joints is rated as 2 (Average). Confidence level: 80% as the joints are mostly visible and the defects are apparent.
Debris in Channel	There is minor debris in the form of leaves and small twigs, which is typical and does not indicate a significant defect. The condition is rated as 1 (Good) in terms of debris impact. Confidence level: 90% as the debris is clearly visible and typical for an outdoor environment.
Transition to Road Surface	The transition to the road surface is not fully visible in the image, but there does not appear to be any significant gaps or sinking at the visible edge of the curb. The condition is rated as 1.5 (Good to Average). Confidence level: 60% due to the limited view of the transition area.
Overall Condition	The curb and channel show signs of wear and potential damage near the tree, which may require attention to prevent further deterioration. There are no visible faults in catchpits/sumps or issues requiring urgent attention in the visible area. The overall condition is rated as 2 (Average). Confidence level: 75% based on the visible evidence and potential for unseen issues.
Additional Considerations	The presence of the tree should be noted as it may affect the long-term integrity of the curb and channel due to root growth. Regular maintenance to clear debris and monitor the condition near the tree base is recommended. Please note that the assessment is based on the visible aspects of the image and is subject to limitations such as shadows, lighting, image resolution, and the extent and angle of image capture.


Summary of Research

The purpose of this project is to develop asset condition monitoring and defect detection technologies. By developing CV-based models and automated reporting systems, the project aims to improve the accuracy and efficiency of infrastructure inspections, including pavements, kerbs, channels, and sewage pipes. The expected outcome is a proof-of-concept solution that incorporates state-of-the-art techniques, helping Beca remain competitive in innovative infrastructure management solutions.