AI matches human experts in classifying microscopic organisms

October 10, 2025

A new study shows how deep learning can achieve human-level performance in estimating uncertainty when classifying foraminifera.

By Petter Bjørklund, Communications Officer at SFI Visual Intelligence

Foraminifera (forams) are shelled microorganisms that are abundant in the Earth’s seabed. Analyzing different species of forams provides important information about climate change, the state of the marine environment, and suitable areas for carbon capture and storage.

Past research has attempted to automate foram classification, usually a laborious and time-consuming manual process, with deep learning (DL) methods. Several studies show significant promise, but few have focused on the uncertainty of the methods’ classifications.

PhD Candidate Iver Martinsen. Photo: Petter Bjørklund / SFI Visual Intelligence

“Uncertainty estimation is crucial to avoid misclassifications that could overlook rare and ecologically significant species. It is important to develop DL methods that accurately calculate how uncertain their predictions are,” says Iver Martinsen, PhD Candidate at UiT The Arctic University of Norway and SFI Visual Intelligence (VI).

In a recently published study, Martinsen and researchers at UiT, VI, Nofima, and NSE show how deep learning can achieve human-level performance in estimating uncertainty when classifying forams. The researchers trained several DL methods to classify these microscopic organisms and evaluated them on a test set of 260 images of forams and sediment grains.

Evaluating the performance of such methods remains a significant challenge, Martinsen says. To address this, they created a human-derived set of uncertainty estimations based on classification task responses from four senior geoscientists.

“The geoscientists were given the same 260 images and were asked to classify each of them and to state their confidence level. This formed a comparative baseline that allowed us to assess the models’ estimations against those of human experts,” Martinsen explains.

The study also demonstrates how human uncertainty estimations may provide a relevant and valuable baseline for comparison, he adds. Results show that the DL methods’ estimations can match, and at times outperform, those of expert geoscientists.

“We gain valuable insights into how these methods’ estimations compare to each other and to human experts. We believe this research is a leap towards making these automated tools more reliable, trustworthy, and applicable in real-world settings,” Martinsen says.

Publication

Quantifying uncertainty in foraminifera classification: How deep learning methods compare to human experts

July 16, 2025

Iver Martinsen, Steffen Aagaard Sørensen, Samuel Ortega, Fred Godtliebsen, Miguel Tejedor, Eirik Myrvoll-Nilsen

Paper abstract

Foraminifera are shell-bearing microorganisms that are commonly found in marine deposits on the seabed. They are important indicators in many analyses, are used in climate change research, monitoring marine environments, evolutionary studies, and are also frequently used in the oil and gas industry. Although some research has focused on automating the classification of foraminifera images, few have addressed the uncertainty in these classifications. Although foraminifera classification is not a safety-critical task, estimating uncertainty is crucial to avoid misclassifications that could overlook rare and ecologically significant species that are informative indicators of the environment in which they lived. Uncertainty estimation in deep learning has gained significant attention and many methods have been developed. However, evaluating the performance of these methods in practical settings remains a challenge. To create a benchmark for uncertainty estimation in the classification of foraminifera, we administered a multiple choice questionnaire containing classification tasks to four senior geologists. By analyzing their responses, we generated human-derived uncertainty estimates for a test set of 260 images of foraminifera and sediment grains. These uncertainty estimates served as a baseline for comparison when training neural networks in classification. We then trained multiple deep neural networks using a range of uncertainty quantification methods to classify and state the uncertainty about the classifications. The results of the deep learning uncertainty quantification methods were then analyzed and compared with the human benchmark, to see how the methods performed individually and how the methods aligned with humans. Our results show that human-level performance can be achieved with deep learning and that test-time data augmentation and ensembling can help improve both uncertainty estimation and classification performance. 
Our results also show that human uncertainty estimates are helpful indicators for detecting classification errors and that deep learning-based uncertainty estimates can improve calibration and classification accuracy.
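For readers curious about what ensembling and an uncertainty estimate can look like in practice, here is a minimal sketch. It is not the authors' implementation: the logits are invented for illustration, the function names are hypothetical, and predictive entropy is only one common way to express uncertainty, chosen here as an assumption.

```python
import math

def softmax(logits):
    """Convert raw model scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_predict(member_logits):
    """Average the softmax outputs of several models for a single image."""
    probs = [softmax(logits) for logits in member_logits]
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]

def predictive_entropy(probs):
    """Shannon entropy in nats: 0 means fully confident, log(K) means maximally unsure."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical logits from three models for one image and three classes
# (illustrative values only, not data from the study).
member_logits = [
    [2.0, 0.5, 0.1],
    [1.5, 1.4, 0.2],
    [0.3, 2.2, 0.1],
]

mean_probs = ensemble_predict(member_logits)
predicted_class = max(range(len(mean_probs)), key=mean_probs.__getitem__)
uncertainty = predictive_entropy(mean_probs)
print(predicted_class, round(uncertainty, 3))
```

When the ensemble members disagree, as they do here, the averaged probabilities flatten and the entropy rises, which flags the image as one a human expert may want to review.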
