Multi-modal land cover mapping of remote sensing images using pyramid attention and gated fusion networks

August 1, 2022

Advances in remote sensing technologies and the fast-growing volume of remotely sensed multi-modality data have dramatically changed how we observe the Earth. One of the key applications in Earth observation is land-cover classification (semantic mapping), which is highly valuable in environmental monitoring, agriculture, urban planning, and the prediction of natural disasters and hazardous events. Data from multi-modality remote sensing (RS) can provide rich and varied details about the Earth's surface. For example, light detection and ranging (LiDAR) data can supplement common multi-spectral imagery (e.g. RGB images) with additional information about the same area of land, such as the elevation of land-cover objects.

However, because multi-modal data commonly contains unequal, redundant, or even contradictory information, effectively fusing and utilizing such varied information to better map land cover is very difficult. Current multimodal land cover mapping methods mainly employ two independent encoders running in parallel to extract features separately, which tends to ignore the effects of noisy and redundant features from very different multimodal data.

In this work, we present a novel pyramid attention and gated fusion method (MultiModNet) for multi-modality land cover mapping in remote sensing. It addresses the questions of what, how, and where to fuse multi-source features effectively, and aims to learn optimal joint representations of the different modalities efficiently.

We have conducted extensive experiments on two publicly available remote sensing benchmark datasets to test MultiModNet. Our results demonstrate the effectiveness and superiority of the proposed method for multi-modal land cover classification. The MultiModNet framework has a scalable structure that enables it to be generalized to various multi-modal data and to be extended to more than two modalities.



July 1, 2022

Qinghui Liu, Michael Kampffmeyer, Robert Jenssen, Arnt-Børre Salberg

Paper abstract

Multi-modality data is becoming readily available in remote sensing (RS) and can provide complementary information about the Earth's surface. Effective fusion of multi-modal information is thus important for various applications in RS, but also very challenging due to large domain differences, noise, and redundancies. There is a lack of effective and scalable fusion techniques for bridging multiple modality encoders and fully exploiting complementary information. To this end, we propose a new multi-modality network (MultiModNet) for land cover mapping of multi-modal remote sensing data based on a novel pyramid attention fusion (PAF) module and a gated fusion unit (GFU). The PAF module is designed to efficiently obtain rich fine-grained contextual representations from each modality with a built-in cross-level and cross-view attention fusion mechanism, and the GFU module utilizes a novel gating mechanism for early merging of features, thereby diminishing hidden redundancies and noise. This enables supplementary modalities to effectively extract the most valuable and complementary information for late feature fusion. Extensive experiments on two representative RS benchmark datasets demonstrate the effectiveness, robustness, and superiority of the MultiModNet for multi-modal land cover classification.
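
To give a rough intuition for what gating between modalities can look like, the sketch below shows a generic element-wise sigmoid gate that decides how much of a supplementary feature vector (e.g. LiDAR-derived) is added to a primary one (e.g. RGB-derived). This is not the GFU from the paper: the abstract does not state its formulation, so the function `gated_fusion`, its parameters `w_primary`, `w_supplementary`, and `bias`, and the additive fusion rule are all assumptions made purely for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(primary, supplementary, w_primary, w_supplementary, bias=0.0):
    """Fuse two equal-length feature vectors with an element-wise gate.

    Hypothetical illustration only, not the paper's GFU:
        gate_i  = sigmoid(w_primary_i * p_i + w_supplementary_i * s_i + bias)
        fused_i = p_i + gate_i * s_i
    A gate near 0 suppresses the supplementary feature; a gate near 1
    passes it through almost unchanged.
    """
    lengths = {len(primary), len(supplementary), len(w_primary), len(w_supplementary)}
    if len(lengths) != 1:
        raise ValueError("all inputs must have the same length")

    fused = []
    for p, s, wp, ws in zip(primary, supplementary, w_primary, w_supplementary):
        gate = sigmoid(wp * p + ws * s + bias)
        fused.append(p + gate * s)
    return fused


# Made-up feature values for two modalities at a single spatial location.
rgb_features = [0.8, 0.1, 0.5]
lidar_features = [0.2, 0.9, 0.4]
fused = gated_fusion(rgb_features, lidar_features, [1.0] * 3, [1.0] * 3)
```

In a trained network the gate weights would be learned, so the model itself could learn to down-weight supplementary features that are noisy or redundant; here they are fixed by hand only to keep the example self-contained.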