Side Scan Sonar Semantic Segmentation

Introduction


Segmenting the seabed in side scan sonar data is an important task, because seabed exploration is crucial in many areas of human activity, from ecology to oil exploration. Sonar models and survey conditions vary widely, so the resulting data is heterogeneous and has to be analyzed as such. In addition, the volume of data is far too large for an operator to analyze manually, especially for segmentation tasks. The development of automated solutions is therefore an important challenge.

Dataset overview


To train the neural networks, we used manually labeled datasets. The following main seabed surface categories were used as classes: blind, rock, hills, and sand.
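Manual labels are often stored as color-coded images, while segmentation libraries expect one class index per pixel. A minimal sketch of that conversion, assuming a hypothetical RGB palette (the colors below are illustrative, not the ones used in our labeling):

```python
import numpy as np

CLASSES = ["blind", "rock", "hills", "sand"]  # class index = position in this list

# Hypothetical annotation palette (RGB color -> class index); adjust to the labeling tool.
PALETTE = {
    (0, 0, 0): 0,      # blind
    (255, 0, 0): 1,    # rock
    (0, 255, 0): 2,    # hills
    (255, 255, 0): 3,  # sand
}
IGNORE_INDEX = 255  # assigned to pixels whose color is not in the palette


def rgb_label_to_index(label_rgb):
    """Convert an (H, W, 3) color-coded label image to an (H, W) class-index mask."""
    index = np.full(label_rgb.shape[:2], IGNORE_INDEX, dtype=np.uint8)
    for color, class_id in PALETTE.items():
        # A pixel matches when all three channels equal the palette color.
        index[np.all(label_rgb == np.array(color), axis=-1)] = class_id
    return index
```

Keeping unknown colors at a separate ignore value, instead of silently mapping them to a class, makes labeling mistakes easy to spot.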

Frameworks

Detectron2

Overview

Detectron2 is an object detection and segmentation framework built on the PyTorch library. It offers a comprehensive set of tools for training, evaluation, and deployment of various computer vision models. Detectron2 has gained widespread adoption in both academic research and industrial applications due to its flexibility, performance, and ease of use.

Key Features:

  1. Modular Architecture: Detectron2 follows a modular design, allowing users to easily customize and extend various components such as backbones, feature extractors, and heads. This modular approach facilitates experimentation with different model architectures and components.
  2. Support for Multiple Tasks: Detectron2 supports a wide range of computer vision tasks, including object detection, instance segmentation, semantic segmentation, keypoint detection, and panoptic segmentation. This versatility makes it suitable for applications such as object tracking, image understanding, and scene understanding.
  3. Pre-trained Models: Detectron2 provides pre-trained models for popular architectures such as Faster R-CNN, Mask R-CNN, and RetinaNet. These pre-trained models serve as a starting point for training on custom datasets, enabling faster convergence and better performance.
  4. Efficient Training and Inference: Detectron2 supports mixed precision training and distributed training across multiple GPUs, and its export utilities can convert trained models to deployment formats such as TorchScript.
  5. Rich Set of Utilities: Detectron2 includes a rich set of utilities for data augmentation, visualization, and model evaluation. These utilities streamline the entire workflow from data preparation to model deployment, making it easier for users to build and deploy computer vision applications.
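As an illustration of the pre-trained models and training utilities listed above, here is a sketch of fine-tuning a COCO-pretrained Mask R-CNN from the Detectron2 model zoo. It assumes the sonar annotations were exported in COCO format; the dataset name `sonar_train`, the paths, and the solver values are hypothetical and are not the settings behind our results.

```python
def build_sonar_cfg(train_json, train_images, num_classes=4, max_iter=3000):
    """Return a Detectron2 config for fine-tuning Mask R-CNN on sonar images.

    Imports are local so this module can be imported without Detectron2 installed.
    """
    from detectron2 import model_zoo
    from detectron2.config import get_cfg
    from detectron2.data.datasets import register_coco_instances

    # Register the (hypothetical) COCO-format dataset under a name the config can use.
    register_coco_instances("sonar_train", {}, train_json, train_images)

    zoo_cfg = "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"
    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file(zoo_cfg))
    cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(zoo_cfg)  # COCO pre-trained weights
    cfg.DATASETS.TRAIN = ("sonar_train",)
    cfg.DATASETS.TEST = ()
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = num_classes  # blind, rock, hills, sand
    cfg.SOLVER.IMS_PER_BATCH = 2
    cfg.SOLVER.BASE_LR = 2.5e-4
    cfg.SOLVER.MAX_ITER = max_iter
    cfg.SOLVER.AMP.ENABLED = True  # mixed precision training
    return cfg


def train(cfg):
    """Run training with Detectron2's default training loop."""
    from detectron2.engine import DefaultTrainer

    trainer = DefaultTrainer(cfg)
    trainer.resume_or_load(resume=False)  # start from cfg.MODEL.WEIGHTS
    trainer.train()
```

For example, `train(build_sonar_cfg("annotations/train.json", "images/train"))` would start fine-tuning; swapping the model zoo config name is enough to try a different architecture.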

 

Output results

Metrics/Model evaluation

Segmentation models pytorch


Overview

Segmentation Models PyTorch is a library for image segmentation that offers a large number of neural network architectures. It is widely used for segmentation tasks, including in machine learning competitions. Available architectures include UNet++, FPN, PSPNet, and others, so the architecture can be chosen to suit the task. The architectures of these neural networks are shown in Fig. 4 to Fig. 6. Because all architectures share a common interface, one advantage of the library is that the segmentation architecture can be changed without significant changes to the code.

In addition, the library supports a large number of backbones (encoders), most of them with pre-trained weights; choosing a suitable backbone can significantly improve segmentation results.
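A sketch of how the architecture and backbone can each be swapped with a single argument, using the library's `create_model` factory. The defaults (UNet++ with a `mobilenet_v2` encoder and four classes) mirror the configuration discussed in this article, but the function name and the rest of the setup are our own illustration.

```python
def build_segmentation_model(arch="UnetPlusPlus", encoder="mobilenet_v2",
                             num_classes=4, in_channels=3):
    """Create an SMP model; only `arch` and `encoder` change between experiments."""
    import segmentation_models_pytorch as smp  # local import: optional dependency

    return smp.create_model(
        arch,                        # e.g. "UnetPlusPlus", "FPN", "PSPNet"
        encoder_name=encoder,        # backbone, e.g. "mobilenet_v2", "resnet34"
        encoder_weights="imagenet",  # start from ImageNet-pretrained encoder weights
        in_channels=in_channels,     # can be set to 1 for single-channel images
        classes=num_classes,         # blind, rock, hills, sand
    )
```

For example, `build_segmentation_model("FPN", "resnet34")` returns an FPN model with a ResNet-34 encoder, while the data loading and training loop stay unchanged.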

Output results

The results of the seabed segmentation model are shown in Fig. 7.

 

Metrics/Model evaluation

Since each neural network architecture has its own strengths and suits different conditions, three neural networks with different backbones were tested. The IoU (intersection over union) metric is used as the main indicator to evaluate the segmentation results:

IoU = Area of Intersection / Area of Union
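For pixel masks, the intersection for a class is the number of correctly predicted pixels of that class, and the union adds its false positives and false negatives. A minimal NumPy sketch that computes per-class IoU from a pixel confusion matrix, assuming integer label masks with values in `[0, num_classes)`; averaging over classes gives the mean IoU:

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Pixel confusion matrix: rows are ground-truth classes, columns are predictions."""
    idx = num_classes * y_true.ravel().astype(np.int64) + y_pred.ravel().astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def per_class_iou(cm):
    """IoU_c = TP / (TP + FP + FN); NaN for classes absent from both masks."""
    tp = np.diag(cm).astype(float)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp  # predicted + true - intersection
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(union > 0, tp / union, np.nan)


y_true = np.array([[0, 0], [1, 1]])
y_pred = np.array([[0, 1], [1, 1]])
ious = per_class_iou(confusion_matrix(y_true, y_pred, num_classes=2))
mean_iou = np.nanmean(ious)
```

Here class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean IoU is about 0.583.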

To estimate the model size, we used the size on disk of the file that stores the neural network weights. This allows us to compare the models by size and, indirectly, by the cost of inference.
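A small sketch of two ways to express model size: measuring the saved weights file on disk, as done here, and estimating it from the parameter count, assuming 32-bit float weights (4 bytes per parameter). With PyTorch, the file would typically be written with `torch.save(model.state_dict(), path)`; the helper names are our own.

```python
import os

BYTES_PER_MIB = 1024 ** 2


def file_size_mib(path):
    """Size of a saved weights file on disk, in MiB."""
    return os.path.getsize(path) / BYTES_PER_MIB


def float32_size_mib(num_params):
    """Approximate size of float32 weights: 4 bytes per parameter."""
    return num_params * 4 / BYTES_PER_MIB
```

The two numbers are close but not identical, because the file also stores parameter names and buffers such as batch normalization statistics.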

The table below shows the main results.

The results show that the best IoU obtained on this dataset is about 83%, and no further growth is observed across the tested models, which indicates that the dataset itself is the limiting factor. Among the tested architectures, UNet++ performs best on this dataset. Taking model size into account as well, UNet++ with the mobilenet_v2 backbone offers the best quality/size ratio. Note that with a larger dataset containing more complex patterns, the results may differ slightly, so corresponding measurements should be made in future studies.

Yolo


Overview

In January 2023, Ultralytics, the authors of YOLOv5, released YOLOv8. It is not the first YOLO architecture to support segmentation, but at its release it was among the most efficient and fastest. The most common way to use YOLOv8 is through the ultralytics Python library (AGPL-3.0 license). There is also an implementation in the ikomia library (Apache 2.0 license), but we have not tested it. We considered and tested YOLOv8 as one of the state-of-the-art segmentation tools. It provides only instance segmentation: it distinguishes each separate object instance within a class, instead of only assigning a class to every pixel as semantic segmentation does.
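Training YOLOv8 segmentation models with the ultralytics library requires labels in the YOLO segmentation format: one text line per object, containing the class index followed by the polygon vertices normalized by image width and height. A minimal sketch of writing such a line from a polygon in pixel coordinates; the function name is hypothetical, and extracting polygons from raster masks is not shown.

```python
def yolo_seg_line(class_id, polygon, img_w, img_h):
    """Format one object as '<class> x1 y1 x2 y2 ...' with coordinates in [0, 1]."""
    values = [str(class_id)]
    for x, y in polygon:
        # Normalize by image size and clamp to [0, 1] in case vertices lie on the border.
        values.append(f"{min(max(x / img_w, 0.0), 1.0):.6f}")
        values.append(f"{min(max(y / img_h, 0.0), 1.0):.6f}")
    return " ".join(values)
```

Each image gets a `.txt` file with one such line per labeled region.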

The difference between semantic and instance segmentation
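The difference can be shown in a few lines: instance masks keep each object separate, while a semantic mask records only a class per pixel. A NumPy sketch that collapses per-object masks (for example, those predicted by an instance segmentation model) into one semantic label map; where instances overlap the later one wins, and unlabeled pixels get the hypothetical ignore value 255.

```python
import numpy as np


def instances_to_semantic(instance_masks, instance_classes, shape, unlabeled=255):
    """Merge per-instance boolean masks into a single (H, W) class-index map."""
    semantic = np.full(shape, unlabeled, dtype=np.uint8)
    for mask, class_id in zip(instance_masks, instance_classes):
        semantic[mask.astype(bool)] = class_id  # instance identity is discarded here
    return semantic


# Two separate "rock" (class 1) instances become one class region in the semantic map.
rock_a = np.array([[1, 0], [0, 0]])
rock_b = np.array([[0, 0], [0, 1]])
semantic = instances_to_semantic([rock_a, rock_b], [1, 1], shape=(2, 2))
```

The reverse direction is not possible without extra information, which is why an instance segmentation model can serve a semantic task but not the other way round.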

Output results

YOLOv8 provides several pre-trained models, from the smallest and fastest (YOLOv8n, nano) to the largest and most accurate (YOLOv8x, extra large). The full list of models is published by Ultralytics. The authors recommend fine-tuning a pre-trained model on a custom dataset instead of training a model from scratch. The results are presented for a small and a large model.
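A sketch of fine-tuning a pre-trained YOLOv8 segmentation checkpoint with the ultralytics library, as the authors recommend. `sonar-seg.yaml` is a hypothetical dataset file (image paths and class names), and the epoch count and image size are placeholders rather than our settings. Replacing `yolov8n-seg.pt` with `yolov8x-seg.pt` selects the extra large model.

```python
def finetune_yolov8_seg(data_yaml="sonar-seg.yaml", weights="yolov8n-seg.pt",
                        epochs=100, imgsz=640):
    """Fine-tune a pre-trained YOLOv8 segmentation model and return mask mAP values."""
    from ultralytics import YOLO  # local import: optional dependency (AGPL-3.0)

    model = YOLO(weights)  # pre-trained segmentation checkpoint
    model.train(data=data_yaml, epochs=epochs, imgsz=imgsz)
    metrics = model.val()  # evaluates on the validation split defined in data_yaml
    return metrics.seg.map50, metrics.seg.map  # mask mAP50 and mAP50-95
```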

Metrics/Model evaluation

As the accuracy metric, we used the mean average precision (mAP) at given intersection over union (IoU) thresholds. This metric estimates how accurately the detections match the ground truth. mAP can be computed both for bounding boxes and for masks (segmentation polygons). Usually, mAP at an IoU threshold of 0.50 and mAP averaged over thresholds from 0.50 to 0.95 in steps of 0.05 are reported; they are denoted mAP50 and mAP50-95. In addition, the confusion matrix is presented to show the statistics of false positives.
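To make the metric concrete, here is a simplified NumPy sketch of average precision (AP) for one class at one IoU threshold, and of its average over the ten thresholds 0.50, 0.55, ..., 0.95. It assumes predictions have already been matched one-to-one to ground-truth objects (the matching step is omitted) and uses all-point interpolation; COCO-style evaluation uses 101-point interpolation, so values can differ slightly from those reported by evaluation tools. mAP is the mean of these per-class values.

```python
import numpy as np


def average_precision(scores, ious, n_gt, iou_thr):
    """AP for one class at one IoU threshold (all-point interpolation).

    Assumption: ious[i] is the IoU of prediction i with its matched ground-truth
    object (0.0 if unmatched), and no ground-truth object is matched twice.
    """
    order = np.argsort(-np.asarray(scores, dtype=float))  # highest confidence first
    tp = (np.asarray(ious, dtype=float)[order] >= iou_thr).astype(float)
    tp_cum, fp_cum = np.cumsum(tp), np.cumsum(1.0 - tp)
    recall = tp_cum / max(n_gt, 1)
    precision = tp_cum / np.maximum(tp_cum + fp_cum, 1e-12)
    # Add sentinels and make precision non-increasing from right to left.
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]
    # Area under the precision-recall curve: sum over points where recall changes.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))


def ap50_95(scores, ious, n_gt):
    """Average AP over IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = np.linspace(0.5, 0.95, 10)
    return float(np.mean([average_precision(scores, ious, n_gt, t) for t in thresholds]))
```

With two predictions at IoU 0.9 and 0.8 for two objects, AP is 1.0 at the 0.50 threshold but only 0.5 at 0.85, which is why mAP50-95 is always the stricter number.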

Conclusions

 

The image segmentation tasks considered here can be solved with Detectron2, YOLOv8, or Segmentation Models PyTorch. The choice of a particular model depends primarily on hardware and software requirements, namely model size, inference time, license, etc.

Performance, flexibility, and license comparison

Detectron2


Advantages:

  • High accuracy: Detectron2 is a powerful framework for segmentation with high precision.
  • Flexibility and customizability: Detectron2 provides numerous parameters for tuning models and algorithms, allowing for optimal results for specific tasks.
  • Apache 2.0 license (permissive, similar to MIT, with an explicit patent grant)


Disadvantages:

  • Requires significant computational resources, especially for complex models and large datasets; models are 300 MB or larger.


YOLO


Advantages:

  • Fast training and inference.
  • Lightweight models.


Disadvantages:

  • Less accurate than some other segmentation tools.
  • AGPL-3.0 license**


Segmentation Models Pytorch


Advantages:

  • Models can be fairly small.
  • For semantic segmentation, it is in most cases an ideal solution, and the trade-off between accuracy and speed can be tuned by choosing a different backbone.
  • MIT license*


Disadvantages:

  • May not offer the same level of performance or flexibility as frameworks like Detectron2, especially for complex segmentation tasks.
  • For better results, it may be necessary to use a larger dataset.


* The MIT License is a short, permissive software license that allows use, modification, and distribution of the software, including commercially, provided that the copyright and license notice is retained.

** AGPL-3.0 is a copyleft license that requires you to provide the source code of any modified AGPL software if you distribute it or make it available to users over a network.

All three frameworks require a Python environment for model training.

Dataset requirements


To reliably segment the seabed in side scan sonar data, the dataset should contain at least about 1000 images with relatively evenly distributed classes.
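A quick way to check how evenly the classes are distributed is to count labeled pixels per class across all masks. A NumPy sketch, assuming integer masks with class indices `0..num_classes-1` (other values, such as an ignore label, are skipped); what counts as "relatively even" remains a judgment call that this sketch does not fix.

```python
import numpy as np


def class_pixel_share(masks, num_classes):
    """Fraction of labeled pixels belonging to each class across all masks."""
    counts = np.zeros(num_classes, dtype=np.int64)
    for mask in masks:
        values = mask.ravel().astype(np.int64)
        values = values[(values >= 0) & (values < num_classes)]  # drop ignore labels
        counts += np.bincount(values, minlength=num_classes)
    return counts / max(counts.sum(), 1)
```

Together with the number of images (`len(masks)`), this gives a first check of a new dataset against the guideline above.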
