Google DeepMind’s WeatherNext 2 uses Functional Generative Networks to speed up probabilistic weather forecasting by 8x

Google DeepMind has released WeatherNext 2, an AI-based medium-range global weather forecasting system that now powers upgraded forecasts in Google Search, Gemini, Pixel Weather, and the Google Maps Platform’s Weather API, with Google Maps integration coming next. At its core is the new Functional Generative Network (FGN), a massive-ensemble architecture that delivers faster, more accurate, and higher-resolution probabilistic forecasts than previous WeatherNext systems. The model is also exposed as a data product in Earth Engine and BigQuery, and as an early-access model on Vertex AI.

From deterministic grids to functional ensembles

The core of WeatherNext 2 is the FGN model. Rather than predicting a single deterministic future field, the model samples directly from a joint distribution over 15-day global weather trajectories. Each state 𝑋ₜ comprises 6 atmospheric variables at 13 pressure levels plus 6 surface variables on a 0.25° latitude-longitude grid, with a 6-hour time step. The model learns to approximate 𝑝(𝑋ₜ ∣ 𝑋ₜ₋₂, 𝑋ₜ₋₁) and runs autoregressively from two initial analysis frames to generate ensemble trajectories, as sketched below.
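
Here is a minimal sketch of that autoregressive sampling loop. `fgn_step` is a hypothetical stand-in for the trained network, and the demo grid is shrunk so the example runs anywhere; the production grid is 721 × 1440 points at 0.25°, with 6 × 13 + 6 = 84 channels per grid point.

```python
import numpy as np

N_LAT, N_LON, N_CHANNELS = 19, 36, 84   # toy grid; real: 721, 1440, 84

def fgn_step(x_prev2, x_prev1, eps):
    """Placeholder for one FGN pass sampling X_t ~ p(. | X_{t-2}, X_{t-1})."""
    return 0.5 * (x_prev1 + x_prev2) + 0.01 * eps.mean()  # dummy dynamics

def sample_trajectory(x0, x1, n_steps=60, seed=0):
    """One 15-day trajectory: 60 steps x 6 hours, from two analysis frames."""
    rng = np.random.default_rng(seed)
    states = [x0, x1]
    for _ in range(n_steps):
        eps = rng.standard_normal(32)      # shared 32-dim noise vector per step
        states.append(fgn_step(states[-2], states[-1], eps))
    return np.stack(states[2:])            # (n_steps, channels, lat, lon)

x0 = np.zeros((N_CHANNELS, N_LAT, N_LON))
x1 = np.zeros((N_CHANNELS, N_LAT, N_LON))
print(sample_trajectory(x0, x1).shape)     # (60, 84, 19, 36)
```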

Architecturally, each FGN instance follows a layout similar to the GenCast denoiser. Graph neural network encoders and decoders map between representations on the regular latitude-longitude grid and a spherical, 6-times-refined icosahedral mesh, and a graph transformer runs on the mesh nodes. The production FGN used by WeatherNext 2 is larger than GenCast, with approximately 180 million parameters, a 768-dimensional latent space, and 24 transformer layers per model seed, versus 57 million parameters, a 512-dimensional latent space, and 16 layers for GenCast. FGN also runs at a 6-hour time step, while GenCast uses a 12-hour step.
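
For easy comparison, the headline configuration numbers from the paragraph above can be collected in one place. The field names here are illustrative, not DeepMind’s:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    params_millions: int   # trainable parameters per model seed
    latent_dim: int        # width of the graph-transformer latents
    layers: int            # graph-transformer layers
    step_hours: int        # autoregressive time step

FGN = Config(params_millions=180, latent_dim=768, layers=24, step_hours=6)
GENCAST = Config(params_millions=57, latent_dim=512, layers=16, step_hours=12)
```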

Epistemic and aleatoric uncertainty modeling in function space

FGN separates epistemic and aleatoric uncertainty in a way that scales to production forecasting workloads. Epistemic uncertainty, arising from limited data and imperfect learning, is handled by a deep ensemble of 4 independently initialized and trained models. Each model seed shares the architecture above, and at prediction time the system draws the same number of ensemble members from each seed, as in the sketch below.
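
The following sketch shows the two-level sampling scheme under those assumptions: independently trained seeds capture epistemic uncertainty, and fresh noise per sample captures aleatoric uncertainty. `predict` is a hypothetical stand-in for one trained FGN seed.

```python
import numpy as np

def predict(seed_id, eps):
    """Placeholder for one trained FGN seed; returns a dummy forecast field."""
    rng = np.random.default_rng(seed_id)
    return rng.standard_normal(8) + 0.1 * eps.mean()

def sample_ensemble(n_members=8, n_seeds=4, seed=0):
    rng = np.random.default_rng(seed)
    members = []
    for seed_id in range(n_seeds):
        for _ in range(n_members // n_seeds):   # equal share per seed
            eps = rng.standard_normal(32)       # fresh 32-dim noise per member
            members.append(predict(seed_id, eps))
    return np.stack(members)                    # (n_members, ...)

print(sample_ensemble().shape)                  # (8, 8)
```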

Aleatoric uncertainty represents inherent atmospheric variability and unresolved processes, and is addressed through functional perturbations. At each prediction step, the model samples a 32-dimensional Gaussian noise vector 𝜖ₜ and feeds it through parameter-shared conditional normalization layers inside the network, effectively sampling a new set of weights 𝜃ₜ for that forward pass. For the same initial conditions, different values of 𝜖ₜ yield different but dynamically consistent predictions, so ensemble members look like distinct plausible weather outcomes rather than independent noise at each grid point.
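
One plausible form of such noise-conditioned normalization is the FiLM / adaptive-LayerNorm pattern sketched below; this is an assumption for illustration, and the paper’s exact parameterization may differ. The key point is that the same 32-dim noise vector conditions every such layer, so one draw effectively re-samples the weights for the whole forward pass.

```python
import torch
import torch.nn as nn

class ConditionalLayerNorm(nn.Module):
    def __init__(self, dim: int, noise_dim: int = 32):
        super().__init__()
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        self.to_scale_shift = nn.Linear(noise_dim, 2 * dim)

    def forward(self, h: torch.Tensor, eps: torch.Tensor) -> torch.Tensor:
        # h: (..., dim) latent features; eps: (noise_dim,) shared globally
        scale, shift = self.to_scale_shift(eps).chunk(2, dim=-1)
        return self.norm(h) * (1 + scale) + shift

layer = ConditionalLayerNorm(dim=768)
h, eps = torch.randn(10, 768), torch.randn(32)
print(layer(h, eps).shape)  # torch.Size([10, 768])
```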

Marginal CRPS training that learns joint structure

A key design choice is that FGN is trained only on per-location, per-variable marginals, with no explicit multivariate objective. The training loss is the Continuous Ranked Probability Score (CRPS), computed with a fair estimator over the ensemble samples at each grid point and averaged across variables, levels, and lead times. CRPS encourages sharp, well-calibrated predictive distributions for each scalar quantity. In a later training phase, the authors introduce short autoregressive rollouts of up to 8 steps and backpropagate through the rollout, which improves long-horizon stability but is not a strict requirement for good joint behavior.
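
A minimal sketch of the fair (unbiased) CRPS estimator for an M-member ensemble: the mean absolute error to the target, minus the pairwise member spread with the fair 1/(M(M−1)) normalization.

```python
import numpy as np

def fair_crps(ensemble: np.ndarray, target: np.ndarray) -> np.ndarray:
    """ensemble: (M, ...) samples; target: (...) verifying analysis."""
    m = ensemble.shape[0]
    skill = np.abs(ensemble - target).mean(axis=0)
    pairwise = np.abs(ensemble[:, None] - ensemble[None, :]).sum(axis=(0, 1))
    spread = pairwise / (2 * m * (m - 1))
    return skill - spread

# Example: 8 members scoring a scalar observation
rng = np.random.default_rng(0)
print(fair_crps(rng.standard_normal(8), 0.3))
```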

Although supervision is purely marginal, the low-dimensional noise and shared functional perturbations force the model to learn realistic joint structure. Because a single 32-dimensional noise vector affects the entire global field, the cheapest way to reduce CRPS everywhere is to encode physically consistent spatial and cross-variable correlations along this low-dimensional manifold, rather than independent fluctuations at each point. Experiments confirm that the resulting ensemble captures realistic regional aggregates and derived quantities.

Measured gains relative to GenCast and traditional baselines

On marginal metrics, WeatherNext 2’s FGN is clearly better than GenCast overall. FGN achieves better CRPS on 99.9% of variable, level, and lead-time combinations, with statistically significant gains, an average improvement of about 6.5%, and maximum gains of nearly 18% for some variables at shorter lead times. Ensemble-mean root mean square error also improves while maintaining a healthy spread-skill relationship, indicating that ensemble spread remains consistent with forecast error out to 15 days.
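
The two quantities behind that spread-skill claim can be computed as below (a standard textbook formulation, not DeepMind’s evaluation code): the RMSE of the ensemble mean measures skill, the ensemble standard deviation measures spread, and a well-calibrated ensemble keeps them roughly matched as lead time grows.

```python
import numpy as np

def spread_and_skill(ensemble: np.ndarray, target: np.ndarray):
    """ensemble: (M, ...) samples; target: (...) verifying analysis."""
    skill = np.sqrt(np.mean((ensemble.mean(axis=0) - target) ** 2))
    spread = np.sqrt(np.mean(ensemble.var(axis=0, ddof=1)))
    return spread, skill
```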

To probe the joint structure, the team evaluated CRPS pooled over spatial windows at different scales, as well as derived quantities such as 10-meter wind speed and the geopotential-height difference between 300 hPa and 500 hPa. Relative to GenCast, FGN improves both average-pooled and max-pooled CRPS, showing that it models region-level aggregates and multivariate relationships better, not just point-wise values.
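
A sketch of those evaluations, assuming derived fields are computed per ensemble member before scoring (variable names are illustrative, and `fair_crps` is the sketch from earlier in this article):

```python
import numpy as np

def wind_speed_10m(u10, v10):
    return np.hypot(u10, v10)       # 10-meter wind speed from u/v components

def thickness_300_500(z300, z500):
    return z300 - z500              # geopotential-height difference

def avg_pool2d(field, win):
    """Average-pool a (..., lat, lon) field over non-overlapping windows."""
    *lead, h, w = field.shape
    f = field[..., : h - h % win, : w - w % win]
    f = f.reshape(*lead, h // win, win, w // win, win)
    return f.mean(axis=(-3, -1))

# Pooled CRPS: pool both the ensemble and the target, then score, e.g.
# crps_region = fair_crps(avg_pool2d(ens_field, 8), avg_pool2d(target, 8))
```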

Tropical cyclone tracking is a particularly high-stakes use case. Using an external tracker, the research team computed ensemble-mean track errors. Compared to GenCast, FGN’s position errors correspond to roughly one extra day of useful prediction skill. Even a variant restricted to GenCast’s 12-hour time step still outperforms GenCast, with useful lead times about 2 days longer. Relative economic value analysis of track probability fields also favors FGN over GenCast across a range of cost-loss ratios, which matters for decision-makers planning evacuations and asset protection.
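
Below is a sketch of the standard cost-loss relative-economic-value calculation (the general textbook formulation, not DeepMind’s exact evaluation code). A user with cost/loss ratio `a` protects whenever the forecast probability of the event, such as a cyclone strike, is at least `a`; value 1 means a perfect forecast, 0 means no better than climatology.

```python
import numpy as np

def relative_economic_value(probs: np.ndarray, events: np.ndarray, a: float):
    """probs: forecast event probabilities; events: bool outcomes; a: C/L."""
    s = events.mean()                       # climatological base rate
    act = probs >= a                        # protect when p >= cost/loss ratio
    hits = np.mean(act & events)
    false_alarms = np.mean(act & ~events)
    misses = np.mean(~act & events)
    e_forecast = (hits + false_alarms) * a + misses   # expense with forecast
    e_climate = min(a, s)                   # best of always / never protecting
    e_perfect = s * a                       # protect exactly when needed
    return (e_climate - e_forecast) / (e_climate - e_perfect)
```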

Key Takeaways

  1. Functional Generative Network core: WeatherNext 2 is built on the Functional Generative Network (FGN), an ensemble of graph transformers that predicts complete 15-day global trajectories at a 6-hour time step on a 0.25° grid, modeling 6 atmospheric variables at 13 pressure levels plus 6 surface variables.
  2. Explicit modeling of epistemic and aleatoric uncertainty: The system combines 4 independently trained FGN seeds for epistemic uncertainty with a shared 32-dimensional noise input that perturbs the network’s normalization layers for aleatoric uncertainty, so each sample is a dynamically consistent alternative forecast rather than point-wise noise.
  3. Marginal training improves joint structure: FGN is trained only with a fair, per-location marginal CRPS, yet still improves joint spatial and cross-variable structure compared to the previous diffusion-based WeatherNext Gen model, including lower pooled CRPS on region-level aggregate fields and derived variables such as 10-meter wind speed and geopotential thickness.
  4. Consistent accuracy gains over GenCast and WeatherNext Gen: WeatherNext 2 achieves better CRPS than earlier GenCast-based WeatherNext models on 99.9% of variable, level, and lead-time combinations, with an average CRPS improvement of approximately 6.5%, improved ensemble-mean RMSE, and higher relative economic value at extreme-event thresholds and for tropical cyclone tracks.

Check out the paper and project page for the full technical details.


Michal Sutter is a data science professional with a master’s degree in data science from the University of Padua. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex data sets into actionable insights.

