Proactive sewer asset management requires accurate condition assessment, yet CCTV inspections remain costly because interpretation is manual. We evaluated 18 vision-language models (VLMs) in a zero-shot setting for automated classification of six sewer defect types using a curated dataset. Each model produced a defect label, a short explanation, and a confidence score. OpenAI proprietary models outperformed open-source ones. GPT-4.1 mini achieved the highest macro-F1 score (0.50), outperforming much larger models, especially for surface damage and cracks/breaks. Some open-source models, such as LLaMa 4 (16x17B) and Qwen2.5-VL (32B), performed above random guessing but remained behind the proprietary models. All models failed to detect production errors, the most difficult class, and performed poorly on deformations. Confidence scores were generally unreliable, with little distinction between correct and incorrect predictions. Textual-output analysis showed that models sometimes described defects accurately even when the assigned label was wrong, although major hallucinations remained. We conclude that VLMs show some promise for sewer asset management, but they are not ready for deployment. Future work should focus on adding asset metadata to prompts and on fine-tuning open-source models. These directions matter especially because larger, newer, and more expensive OpenAI models did not outperform smaller ones, a finding that still needs confirmation by a more thorough statistical analysis.
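For readers unfamiliar with the headline metric, the following is a minimal sketch of how a macro-F1 score is computed: the per-class F1 scores are averaged with equal weight, so a class that is never detected contributes zero regardless of how rare it is. The label names and the toy predictions are illustrative assumptions, not data from the study, and the convention of scoring an undefined F1 as 0 is one common choice rather than the authors' stated procedure.

```python
def macro_f1(y_true, y_pred, labels):
    """Return the unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # Assumed convention: F1 is 0 when the class is neither present nor predicted.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels and predictions, for illustration only.
labels = ["crack", "deformation", "production_error"]
y_true = ["crack", "crack", "deformation", "production_error"]
y_pred = ["crack", "deformation", "deformation", "crack"]

print(round(macro_f1(y_true, y_pred, labels), 3))
```

In this toy case the never-detected class scores 0 and pulls the average down, which is the same mechanism by which a class that every model misses limits the achievable macro-F1.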

Multi-class sewer defect detection with vision-language models
