Introduction
Galileo is a highly multimodal foundation model for processing, analyzing, and understanding a variety of Earth observation (EO) data streams, including optical, radar, elevation, climate, and auxiliary maps. It was developed with the support of researchers at McGill University, NASA Harvest, Ai2, Carleton University, the University of British Columbia, the Vector Institute, and Arizona State University. Galileo aims to provide a unified, generalist solution for key applications such as agricultural land mapping, disaster response, and environmental monitoring.
Unlike previous remote sensing models, which were limited to a single data type or scale, Galileo flexibly fuses multiple sensing modalities and aims to recognize phenomena ranging from tiny objects (such as fishing boats that span only 1-2 pixels) to broad, slowly changing features.

Key features and architecture
Multimodal transformer design
Galileo is based on a Vision Transformer (ViT) architecture, adapted to process:
- Multispectral optical imagery (e.g., Sentinel-2)
- Synthetic Aperture Radar (SAR) (e.g., Sentinel-1)
- Elevation and slope data (e.g., NASA SRTM)
- Weather/climate data (e.g., precipitation and temperature from ERA5)
- Land cover maps, population, night lights, and more
Flexible input processing:
Galileo’s tokenization pipeline splits remote sensing inputs into spatial patches, timesteps, and logical channel groups. This allows the model to process images, time series, and static tabular data in a single architectural configuration.
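As a rough illustration of this kind of tokenization, the sketch below enumerates one token per spatial patch, timestep, and channel group. The group names, image size, and patch size are hypothetical values chosen for the example; this is not Galileo's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Identifies one token by its position in space, time, and channel group."""
    row: int    # patch row index
    col: int    # patch column index
    step: int   # timestep index
    group: str  # channel group name


def tokenize(height, width, timesteps, patch_size, channel_groups):
    """Enumerate one token per (spatial patch, timestep, channel group)."""
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    rows, cols = height // patch_size, width // patch_size
    return [
        Token(r, c, t, g)
        for r in range(rows)
        for c in range(cols)
        for t in range(timesteps)
        for g in channel_groups
    ]


# Hypothetical channel groups, for illustration only.
space_time_groups = ["s2_rgb", "s2_nir", "s1_sar"]  # vary in space and time
static_groups = ["elevation"]                        # vary in space only

tokens = tokenize(32, 32, timesteps=4, patch_size=8, channel_groups=space_time_groups)
# Static inputs are treated here as a single timestep.
tokens += tokenize(32, 32, timesteps=1, patch_size=8, channel_groups=static_groups)
print(len(tokens))
```

Because every input ends up as the same kind of token, a single transformer can attend over imagery, time series, and static layers together.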
Unified local and global feature learning
The core innovation is Galileo’s self-supervised pretraining algorithm, which combines:
- Global losses: encourage abstraction over wide spatial or temporal contexts, which is ideal for recognizing “big” or slowly changing features (glaciers, forest loss).
- Local losses: improve sensitivity to fine detail, which is crucial for detecting small, rapidly changing objects (boats, debris).
The local and global objectives differ in:
- Prediction depth: global tasks target deep latent representations; local tasks target shallow, linearly projected features.
- Masking strategy: global tasks use structured, correlated space-time masks (forcing predictions over large intervals); local tasks use random, unstructured masks.
This dual-objective pretraining strengthens multi-scale feature representations, making Galileo generalizable across tasks and robust even with limited labels.
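The contrast between the two masking strategies can be sketched on a small grid of (timestep, patch) tokens. This is a minimal illustration: the mask ratio, the grid size, and the choice to hide whole timesteps as the "structured" case are assumptions made for the example, not the authors' implementation.

```python
import random


def random_mask(num_steps, num_patches, ratio, rng):
    """Unstructured mask: hide individual tokens uniformly at random."""
    cells = [(t, p) for t in range(num_steps) for p in range(num_patches)]
    hidden = set(rng.sample(cells, int(ratio * len(cells))))
    return [[(t, p) in hidden for p in range(num_patches)] for t in range(num_steps)]


def structured_mask(num_steps, num_patches, ratio, rng):
    """Structured mask: hide whole timesteps, so the model must predict
    across a large correlated interval rather than isolated tokens."""
    hidden_steps = set(rng.sample(range(num_steps), int(ratio * num_steps)))
    return [[t in hidden_steps] * num_patches for t in range(num_steps)]


rng = random.Random(0)
local_mask = random_mask(num_steps=4, num_patches=6, ratio=0.5, rng=rng)
global_mask = structured_mask(num_steps=4, num_patches=6, ratio=0.5, rng=rng)

# Print each mask as a grid: '#' is hidden, '.' is visible.
for name, mask in [("random", local_mask), ("structured", global_mask)]:
    print(name)
    for row in mask:
        print("".join("#" if hidden else "." for hidden in row))
```

With a random mask, a hidden token usually has visible neighbors to interpolate from; with a structured mask, it does not, which pushes the model toward more abstract, longer-range reasoning.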
Pretraining dataset and strategy
To ensure semantic and geographical diversity, Galileo’s pretraining dataset covers the entire globe, sampled with a clustering approach to maximize both land cover variety and geographic spread. The dataset contains more than 127,000 spatiotemporally aligned samples, each including four categories and nine remote sensing data types.
Pretraining ran for 500 epochs on large compute resources. Key aspects:
- Batch size: effective batch size of 512.
- Data augmentation: flips, rotations, and variable patch sizes.
- Optimization: AdamW with a scheduled learning rate and weight decay.
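The flip and rotation augmentations can be sketched as below. This is a simplified example on nested lists, assuming that one randomly chosen transform is applied identically to every timestep so that frames stay aligned; the function names are hypothetical, and variable patch sizes are not covered.

```python
import random


def hflip(grid):
    """Mirror a 2D grid left to right."""
    return [row[::-1] for row in grid]


def rot90(grid):
    """Rotate a 2D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def augment(series, rng):
    """Apply one randomly chosen flip/rotation to every timestep alike,
    so the frames remain spatially aligned with each other."""
    flip = rng.random() < 0.5
    turns = rng.randrange(4)  # number of 90-degree clockwise turns
    out = []
    for frame in series:
        frame = [list(row) for row in frame]  # copy; leave the input untouched
        if flip:
            frame = hflip(frame)
        for _ in range(turns):
            frame = rot90(frame)
        out.append(frame)
    return out


# Two 2x2 timesteps of a toy single-channel time series.
series = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
augmented = augment(series, random.Random(0))
print(augmented)
```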


Benchmark results
Overall performance
Galileo was benchmarked on 11 different datasets and 15 downstream tasks, spanning image classification, pixel time series classification, and segmentation. Specifically, it leads on public datasets such as EuroSAT, BigEarthNet, So2Sat, MADOS (marine debris), Sen1Floods11 (SAR flood mapping), and CropHarvest (multimodal crop classification).
Performance highlights of Galileo-Base (ViT-Base):
- Classification (fine-tuning):
  - EuroSAT: 97.7% (top-1 accuracy, 100% training data)
  - Outperforms specialist models such as CROMA (96.6%) and SatMAE (96.6%)
- Pixel time series:
  - CropHarvest (Kenya): 84.2% (ahead of Presto and AnySat)
  - BreizhCrops: 73.0%
- Segmentation (mIoU):
  - MADOS: 67.6%
  - PASTIS: 79.4%
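For readers unfamiliar with the segmentation metric, mean intersection-over-union (mIoU) can be computed roughly as follows. This is a generic sketch on flattened toy label maps; conventions differ between benchmarks (for example, in how classes absent from an image are handled), so it should not be read as the exact evaluation protocol used for these results.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target."""
    ious = []
    for cls in range(num_classes):
        inter = sum(p == cls and t == cls for p, t in zip(pred, target))
        union = sum(p == cls or t == cls for p, t in zip(pred, target))
        if union:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Flattened toy label maps (hypothetical values, not benchmark data).
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
print(round(mean_iou(pred, target, num_classes=3), 3))
```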
Model flexibility:
Galileo is the best overall performer across all benchmarks, ahead of specialist competitors on both image and time series tasks. Notably, the small model variants (ViT-Nano, ViT-Tiny) also achieve top or near-top results, which is critical for resource-constrained settings.


Ablation and input importance
Removing any individual modality (e.g., VIIRS night lights, ERA5, Dynamic World maps) from pretraining degrades performance, even on benchmarks that do not use that input type directly. For example, leaving out VIIRS data reduced MADOS mIoU from 67.8% to 63.5%, demonstrating the value of full multimodality for feature generalization.
Open source and real-world impact
- Open weights and code: All code, model weights, and pretraining data are available on GitHub, facilitating transparency and adoption in the global EO community.
- Societal benefits: Galileo supports mission-critical NASA Harvest activities such as global crop type mapping, rapid disaster mapping (floods, wildfires), and marine pollution detection. The model’s ability to work with limited labeled data makes it particularly valuable in regions where ground truth is scarce, supporting food security and climate adaptation efforts.
Technical summary table
Model | Parameters | Supported tasks | Rank (lower = better) | Input modalities |
---|---|---|---|---|
Galileo-Base | 85M | Images, time series | 1 (overall) | Optical, SAR, weather, etc. |
Specialist SoTA | Various | Usually 1 or 2 types | 3–10 | Limited |
Galileo-Base: consistently superior performance and flexibility across all major EO benchmarks.
Conclusion
Galileo’s methodological and engineering advances (multimodal inputs, multi-scale local-global feature learning, and large-scale, globally diverse pretraining) set a new standard for generalist remote sensing AI. Its flexibility supports practical deployments from environmental monitoring to climate resilience, providing reliable, high-quality maps and predictions regardless of task or geography.
With open-source access and active development, Galileo is positioned to catalyze a new wave of innovation in Earth system science, empowering practitioners everywhere.

Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of Marktechpost, an artificial intelligence media platform with in-depth coverage of machine learning and deep learning news that is both technically sound and understandable to a wide audience. The platform has over 2 million views per month, demonstrating its popularity among its audience.
