Segmenting the seabed from side scan sonar data is an important task, because seabed exploration is crucial in many areas of human activity, from ecology to oil exploration. Sonars and survey conditions vary widely, so the resulting data is heterogeneous and any analysis method has to cope with that. In addition, the sheer volume of data makes manual analysis difficult for an operator, especially for segmentation tasks. Developing automated solutions is therefore an important challenge.
To train the neural network, we used manually labeled datasets. The following main seabed surface categories were used as classes: blind, rock, hills, sand.
Overview
Detectron2 is an object detection and segmentation framework built on the PyTorch library. It offers a comprehensive set of tools for training, evaluation, and deployment of various computer vision models. Detectron2 has gained widespread adoption in both academic research and industrial applications due to its flexibility, performance, and ease of use.
Key Features:
Output results
Metrics/Model evaluation:
Overview
Segmentation Models is a library that provides image segmentation with a large number of neural network architectures. The library is widely used in segmentation tasks and has a strong track record in relevant competitions. The main neural networks that can be used, depending on the task, are UNet++, FPN, PSPNet, and others. The architectures of these neural networks are shown in Fig. 4 to Fig. 6. Because of the way the library is built, one advantage of this solution is that the architecture of the segmentation model can be changed without significant changes to the code.
In addition, this library supports a large number of backbones, which can significantly improve segmentation results.
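The architecture and backbone swap can be sketched as follows. This is a minimal illustration, assuming the `segmentation_models_pytorch` package is installed. The `ARCHITECTURES` mapping, the `build_model` helper, the 3-channel input, and the four output classes (blind, rock, hills, sand) are assumptions made for the example, not the code used in this study.

```python
# Map the architecture names used in the text to class names in
# segmentation_models_pytorch (smp). The mapping itself is illustrative.
ARCHITECTURES = {
    "UNet++": "UnetPlusPlus",
    "FPN": "FPN",
    "PSPNet": "PSPNet",
}


def build_model(architecture, backbone="mobilenet_v2", num_classes=4):
    """Build an smp model; only the two names change between experiments."""
    # Imported lazily so the mapping above can be inspected without smp installed.
    import segmentation_models_pytorch as smp

    model_class = getattr(smp, ARCHITECTURES[architecture])
    return model_class(
        encoder_name=backbone,       # backbone (encoder) name
        encoder_weights="imagenet",  # start from ImageNet-pretrained weights
        in_channels=3,               # assumption: 3-channel input images
        classes=num_classes,         # assumption: blind, rock, hills, sand
    )
```

With such a helper, switching from `build_model("UNet++", "mobilenet_v2")` to `build_model("FPN", "resnet34")` leaves the rest of a training script unchanged.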
Output results
The results of the seabed segmentation model are shown in Fig. 7.
Metrics/Model evaluation
Because each neural network model suits different conditions, three neural networks with different backbones were tested. The IoU (Intersection over Union) metric is used as the main indicator for evaluating the segmentation results:
IoU = Area of Intersection / Area of Union
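As a rough illustration of the formula, the sketch below computes per-class IoU for two small label maps with NumPy. The function name `iou_per_class`, the toy arrays, and the class ids 0 to 3 are hypothetical; the evaluation code used in this study may differ.

```python
import numpy as np


def iou_per_class(pred, target, num_classes):
    """Return IoU for each class id; NaN where a class is absent from both maps."""
    ious = []
    for class_id in range(num_classes):
        pred_mask = pred == class_id
        target_mask = target == class_id
        intersection = np.logical_and(pred_mask, target_mask).sum()
        union = np.logical_or(pred_mask, target_mask).sum()
        ious.append(intersection / union if union > 0 else float("nan"))
    return ious


# Toy 2x3 label maps with hypothetical class ids 0..3.
pred = np.array([[0, 0, 1], [2, 2, 3]])
target = np.array([[0, 1, 1], [2, 3, 3]])

ious = iou_per_class(pred, target, num_classes=4)
mean_iou = float(np.nanmean(ious))  # average over the classes that are present
```

In this toy case every class overlaps in one pixel out of a two-pixel union, so each per-class IoU and the mean IoU are 0.5.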
To estimate model size, we used the size on disk of the file that stores the neural network weights. This way, we can compare the models by size and by inference cost.
The table below shows the main indicators obtained.
These indicators show that the best IoU obtained for this dataset is about 83%, with no further growth observed. This suggests that the dataset itself is the limiting factor. The best architecture for this dataset is UNet++. In addition, taking model size and backbone into account, UNet++ with the mobilenet_v2 backbone offers the best quality/size ratio. Note that for a larger dataset with more complex patterns, the results may differ slightly, so the corresponding measurements should be made in future studies.
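Measuring the size on disk needs only the standard library. The sketch below uses a temporary file filled with zeros as a stand-in for a saved weights file; the helper `file_size_mb` and the `.pth` suffix are assumptions for illustration.

```python
import os
import tempfile


def file_size_mb(path):
    """Return the size of a file on disk in megabytes (1 MB = 1024 * 1024 bytes)."""
    return os.path.getsize(path) / (1024 * 1024)


# Stand-in for a saved weights file: 2 MB of zeros written to a temp file.
with tempfile.NamedTemporaryFile(suffix=".pth", delete=False) as handle:
    handle.write(b"\0" * (2 * 1024 * 1024))
    weights_path = handle.name

size_mb = file_size_mb(weights_path)
os.remove(weights_path)  # clean up the stand-in file
```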
Overview
In January 2023, Ultralytics, the authors of YOLOv5, released YOLOv8. It is not the first YOLO architecture to support segmentation, but it is the most efficient and the fastest. The most common way to use YOLOv8 is through the ultralytics Python library (AGPL-3.0 license). There is also an implementation based on the Ikomia library (Apache 2.0 license), but we have not tested it. YOLOv8 was considered and tested as one of the state-of-the-art segmentation tools. It provides only instance segmentation, that is, it distinguishes every distinct object instance within a class.
The difference between semantic and instance segmentation
Output results
The YOLOv8 architecture comes with several pre-trained models, from the smallest and fastest (YOLOv8n, nano) to the largest and most accurate (YOLOv8x, extra large). The full list is available in the Ultralytics documentation. The authors recommend starting from a pre-trained model when training on a custom dataset instead of training a model from scratch. The results are presented for a small and a large model.
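Fine-tuning from a pre-trained checkpoint could look roughly like the sketch below, assuming the `ultralytics` package is installed. The helpers `pretrained_seg_weights` and `fine_tune`, the default epoch count, and the image size are hypothetical choices for illustration, not the settings used in this study.

```python
MODEL_SIZES = ("n", "s", "m", "l", "x")  # nano, small, medium, large, extra large


def pretrained_seg_weights(size):
    """Return the pre-trained segmentation checkpoint name for a YOLOv8 size."""
    if size not in MODEL_SIZES:
        raise ValueError(f"unknown YOLOv8 size: {size!r}")
    return f"yolov8{size}-seg.pt"


def fine_tune(size, data_yaml, epochs=100, image_size=640):
    """Fine-tune a pre-trained YOLOv8 segmentation model on a custom dataset."""
    # Imported lazily so the helper above works without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(pretrained_seg_weights(size))  # fetches the checkpoint if missing
    return model.train(data=data_yaml, epochs=epochs, imgsz=image_size)
```

A call such as `fine_tune("n", "seabed.yaml")` would train the nano model, where `seabed.yaml` is a hypothetical dataset description file listing the image folders and class names.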
Metrics/Model evaluation
As a precision metric, we used the mean average precision (mAP) at a given intersection over union (IoU) threshold. This metric estimates how accurately the detections match the ground truth. There are mAP versions for both bounding boxes and masks (segmentation polygons). Usually, the mAP at an IoU threshold of 0.50 and the mAP averaged over IoU thresholds from 0.50 to 0.95 are reported. They are denoted mAP50 and mAP50-95. In addition, a confusion matrix is presented to show the statistics of false positives.
Image segmentation tasks like this one can be solved with frameworks and libraries such as Detectron2, YOLOv8, and Segmentation Models PyTorch. The choice of a particular model depends primarily on hardware and software requirements, namely model size, inference time, license, and so on.
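To make the relationship between mAP50 and mAP50-95 concrete, here is a simplified single-class sketch. It assumes each prediction has already been matched to its own distinct ground-truth object, and it integrates the precision-recall curve directly. Real implementations, including the one in ultralytics, handle matching, interpolation, and averaging over classes in more detail. All names and numbers below are hypothetical.

```python
import numpy as np


def average_precision(confidences, ious, num_ground_truth, iou_threshold):
    """AP for one class at one IoU threshold (area under the precision-recall curve).

    Simplification: ious[i] is the IoU of prediction i with its own, distinct
    ground-truth object (0.0 if it matches none).
    """
    order = np.argsort(-np.asarray(confidences))  # highest confidence first
    is_true_positive = np.asarray(ious)[order] >= iou_threshold
    tp = np.cumsum(is_true_positive)
    fp = np.cumsum(~is_true_positive)
    recall = tp / num_ground_truth
    precision = tp / (tp + fp)
    # Make precision monotonically non-increasing, then integrate over recall.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    recall = np.concatenate(([0.0], recall))
    return float(np.sum((recall[1:] - recall[:-1]) * precision))


# Three hypothetical predictions against two ground-truth objects.
confidences = [0.9, 0.8, 0.7]
ious = [0.97, 0.62, 0.0]

ap50 = average_precision(confidences, ious, num_ground_truth=2, iou_threshold=0.5)
thresholds = np.linspace(0.50, 0.95, 10)  # 0.50, 0.55, ..., 0.95
ap50_95 = float(np.mean(
    [average_precision(confidences, ious, 2, t) for t in thresholds]
))
```

Here the second prediction counts as correct only at thresholds up to 0.60, so the value averaged over all ten thresholds (0.65) is lower than the value at 0.50 (1.0). This is why mAP50-95 is the stricter of the two metrics.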
Performance, flexibility, and license comparison
Advantages:
Disadvantages:
Advantages:
Disadvantages:
Advantages:
Disadvantages:
*The MIT License is a simple, permissive software license that allows the use, modification, and distribution of software with minimal restrictions (the copyright and license notice must be retained).
**AGPL-3.0 is a copyleft license that requires you to provide the source code of any modified AGPL software if you distribute it or provide access to it over a network.
All three frameworks require a Python environment for model training.
To segment the seabed reliably from side scan sonar data, the dataset should contain at least about 1000 images with relatively evenly distributed classes.
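A quick check of dataset size and class balance can be sketched as below. The function `class_balance`, the input layout (one list of class names per image), and the toy data are assumptions for illustration. The 1000-image default mirrors the guideline above and is not a hard rule.

```python
from collections import Counter


def class_balance(labels_per_image, min_images=1000):
    """Report dataset size and the share of each class among labeled regions."""
    counts = Counter(label for labels in labels_per_image for label in labels)
    total = sum(counts.values())
    shares = {name: count / total for name, count in counts.items()}
    return {
        "enough_images": len(labels_per_image) >= min_images,
        "shares": shares,
        # Ratio of the most frequent class to the least frequent one.
        "imbalance_ratio": max(counts.values()) / min(counts.values()),
    }


# Hypothetical toy dataset: each entry lists the classes labeled in one image.
toy_dataset = [["sand", "rock"], ["sand", "hills"], ["blind", "rock"], ["sand"]]
report = class_balance(toy_dataset)
```

For the toy data, `report["enough_images"]` is `False` and the imbalance ratio is 3.0 (three sand labels against one each for hills and blind).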