The spatial SWE dataset in the Po River District has been generated using a hybrid approach called J-snow6, used over the last 10 years to support hydrological balance assessments in different Italian alpine regions (https://aineva.it/wp-content/uploads/2019/12/NV93_2b.pdf). The modeling approach is built on the following procedures (see Fig. 2): (a) preprocessing of the meteorological data through the MeteoIO library (https://meteoio.slf.ch/)15; (b) snow computation through GEOtop 2.016,17; (c) assimilation of in-situ snow measurements; (d) assimilation of EO snow products. Finally, a tiling approach18 has been used to efficiently manage the computational effort needed for the simulations.

Flow chart showing the modelling chain used to generate the data.
Meteo data preprocessing
The modelling chain has been driven using the following inputs (see Fig. 2): (a) meteorological data (i.e., air temperature, precipitation, direct shortwave radiation, wind velocity, air humidity) available at different locations within the computational domain, properly validated and homogenized. These data have been provided by various environmental agencies and government bodies located in northern Italy and surrounding countries (see Acknowledgments for a full reference and their availability); (b) ERA5-Land19 meteorological reanalysis data, used to cover those areas and years where in-situ meteo data were not available (https://cds.climate.copernicus.eu/datasets/reanalysis-era5-land?tab=overview); (c) point snow depth (HS) data, at automatic and manual stations, collected from various avalanche warning agencies (see Acknowledgments for a full reference), integrated with the HS dataset produced by Matiu et al.20 (https://doi.org/10.5281/zenodo.4064128); (d) Snow Cover Area (SCA) maps generated from MODIS data produced by Eurac Research21,22, available since the 2002-2003 winter season and having a 250 m spatial resolution (https://edp-portal.eurac.edu/discovery/ed636c24-24b3-11ef-9957-8d8b4d692a59); (e) a Digital Elevation Model (DEM) with a 500 m resolution23, necessary to create the geomorphological input maps for the physical model (https://cis2.eea.europa.eu/data/260/).
As in-situ meteorological records are often affected by quality24, reliability, or representativeness issues7, the following processing procedures have been applied: (a) data cleaning, (b) calculation of the cloud transmittance, (c) gap filling of meteo data, (d) spatial interpolation of meteorological variables.
Data cleaning includes the application of different filters for the meteorological variables24,25 that help to remove noise from the data and ensure its integrity. We used the filters provided in the MeteoIO library15; in particular, Table 1 reports a list of the filters, processing and interpolation algorithms applied for each meteorological variable, namely: air temperature (TA), water equivalent of precipitation, either solid or liquid (PSUM), relative humidity (RH), wind velocity (VW), incoming short wave radiation (ISWR), air pressure (P), height of the snow, or snow depth (HS), and cloud transmissivity (TAU_CLD). In the MeteoIO library, the MIN_MAX filter imposes minimum and maximum limits on meteorological data values, removing or limiting those that fall outside these realistic ranges. The RATE filter checks the rate of change between consecutive data points, identifying and correcting changes that are too rapid, which may indicate sensor errors. UNHEATED_RAINGAUGE is a specific filter for correcting data from unheated rain gauges, which can be unreliable in freezing conditions. The DESPIKING filter removes sudden, anomalous spikes in the data, often caused by sensor malfunctions or electronic noise, while UNDERCATCH_WMO corrects the underestimation of precipitation by applying guidelines from the World Meteorological Organization (WMO), particularly in windy or snowy conditions26. To fill temporal gaps in the data, a linear interpolation is applied, but only for limited periods of missing data and depending on the specific variable; this approach ensures that the interpolation remains reliable, avoiding excessive assumptions over long gaps.
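As an illustration, the effect of a range filter and of the short-gap linear interpolation described above can be sketched in a few lines of Python (a minimal sketch, not the MeteoIO implementation; the thresholds and the maximum gap length are illustrative):

```python
import numpy as np

def min_max_filter(values, vmin, vmax):
    """Replace values outside the plausible [vmin, vmax] range with NaN
    (illustrative analogue of the MIN_MAX filter)."""
    out = np.asarray(values, dtype=float).copy()
    out[(out < vmin) | (out > vmax)] = np.nan
    return out

def fill_short_gaps(values, max_gap=3):
    """Linearly interpolate interior NaN runs no longer than max_gap samples;
    longer gaps are left unfilled, avoiding assumptions over long gaps."""
    out = np.asarray(values, dtype=float).copy()
    isnan = np.isnan(out)
    idx = np.arange(out.size)
    # interpolate everywhere first, then keep only short interior gaps
    filled = np.interp(idx, idx[~isnan], out[~isnan])
    i = 0
    while i < out.size:
        if isnan[i]:
            j = i
            while j < out.size and isnan[j]:
                j += 1
            if j - i <= max_gap and i > 0 and j < out.size:
                out[i:j] = filled[i:j]
            i = j
        else:
            i += 1
    return out
```

Applied in sequence, these steps mirror the cleaning-then-gap-filling order used in the preprocessing chain.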
Calculation of the cloud transmittance from direct shortwave radiation data, a necessary input for the GEOtop model, has been obtained using the TAU_CLD generator present in the MeteoIO library.
Meteo data spatial gap-filling, a common problem in meteorological time series and observational networks27, has been done through a spatial interpolation of ERA5-Land data considering a 12 km radius around each measurement point. If no weather station is available within this radius, the centroid of the ERA5-Land cell is taken to create a virtual station. At this virtual station, we extract all the meteorological variables that are consistent with the in-situ data. Figure 3 shows the location of ERA5-Land centroids (blue circles) and in-situ stations (orange circles) in each tile of the computational domain, giving rise to a final input that is homogeneous in terms of spatial granularity. Where both neighbouring automatic measurements of HS and daily manual measurements of HS (HS24) are available, priority has been given to the latter.
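The virtual-station selection can be sketched as follows (a simplified illustration, assuming latitude/longitude coordinates and a great-circle distance; the 12 km radius is the one stated above):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points (haversine formula)."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def virtual_stations(era5_centroids, stations, radius_km=12.0):
    """Return the ERA5-Land cell centroids with no in-situ station within
    radius_km; these centroids become virtual stations feeding the
    interpolation. Points are (lat, lon) tuples."""
    return [c for c in era5_centroids
            if all(haversine_km(c[0], c[1], s[0], s[1]) > radius_km
                   for s in stations)]
```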

Location of ERA5-Land centroids (blue circles) and in-situ stations (orange circles) used as input for the physical model. Computational tiles, with the identification number, are highlighted by the red grid.
Tables 2, 3, 4 report the number of sensors (for each meteo variable, where HS24 and HN24 are daily manual measurements of HS and fresh snow, respectively) used as input for the physical model in every tile, averaged over each decade. The increase in the number of sensors during the second and third decades is noticeable; this increase is also highlighted in Table 5, where the total number of sensors per year and variable over the entire domain is reported.
Spatial interpolation of meteorological variables has been obtained using the Inverse Distance Weighting (IDW)28 interpolation method present in the MeteoIO library. The algorithm performs spatial interpolation by weighting observations inversely to their distance from the target location. IDW_LAPSE enhances this by incorporating a lapse rate, adjusting interpolated values based on elevation differences, which is ideal for variables like temperature. When combined with a TREND of type GRAD (linear gradient) or FRAC (fractional), these methods account for broader spatial variations, refining the interpolation. The LISTON_RH algorithm implements a method for the spatial interpolation of RH29. The process begins by calculating the dew point temperature for each input station; the dew point temperatures are then spatially interpolated using the IDW_LAPSE algorithm. After the interpolation, the local dew point temperatures are converted back into relative humidity values. The STD_PRESS algorithm in MeteoIO calculates standard atmospheric pressure as a function of the elevation. It provides a baseline pressure estimate under standard conditions, mainly for correcting pressure-dependent meteorological variables; however, in GEOtop modeling, this has minimal impact on snow modelling. Table 1 reports the interpolation algorithms used for each meteo variable.
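A minimal sketch of elevation-aware IDW in the spirit of IDW_LAPSE (not the MeteoIO implementation; the Euclidean distance, the default lapse rate and the power parameter are illustrative assumptions):

```python
import numpy as np

def idw_lapse(x, y, z, values, xi, yi, zi, lapse_rate=-0.0065, power=2.0):
    """Inverse-distance weighting with a linear elevation lapse rate.
    Station values are detrended to remove the elevation dependence,
    interpolated with inverse-distance weights, then re-trended to the
    target elevation zi. lapse_rate is in value units per metre (default:
    the standard air temperature lapse of -6.5 K/km)."""
    x, y, z, values = map(np.asarray, (x, y, z, values))
    detrended = values - lapse_rate * z          # remove the elevation trend
    d = np.hypot(x - xi, y - yi)
    if np.any(d == 0):                           # target coincides with a station
        return float(values[np.argmin(d)])
    w = 1.0 / d ** power
    return float(np.sum(w * detrended) / np.sum(w) + lapse_rate * zi)
```

For example, two equidistant stations at 1000 m and 2000 m that both lie exactly on the lapse-rate trend yield the trend value at the target elevation.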
Snow processes model
GEOtop 2.016,17 is a process-based hydrological model that includes a multi-layer snow scheme which efficiently simulates the evolution of the snowpack by solving the 1D mass and energy balance16, accounting for the physical processes affecting the snowpack such as compaction and melting. For the purpose of this work, instead of using the full GEOtop model (i.e., coupled water and energy balance, including the integration of the 3D Richards equation), only the energy balance (i.e., the snow module) has been used; this module is 1D and thus has no lateral transfer of energy or mass. In doing so, we were able to parallelise the computational burden across different machines without losing any representation of the physical processes30. In order to consider the morphological complexity of the terrain, responsible for the large snow heterogeneity in the mountains, the model is fed with geomorphological maps of elevation, slope, aspect and sky view factor. In configurations similar to the one used in this paper, GEOtop has already been used in the literature to model the snowpack evolution at the basin scale30,31,32. No calibration of the snow parameters has been done, and the standard values reported in the literature have been considered as valid31,32,33,34. More information about the interpolation in GEOtop can be found in Appendix C of Endrizzi et al.16.
Snow measurements assimilation
The assimilation of snow gauge data is approached in two key ways: through mass correction of the precipitation amount and correction of the energy inflow.
Mass correction
Measuring precipitation at high elevations is challenging due to the problem of undercatch, which may reach a 20%–50% reduction in winter35. Several techniques exist to correct the undercatch error36,37,38. Building on the approach introduced by Mair et al.39, we developed a technique to transform the snow gauge time series into virtual rain measurements. We selected about 250 points across the domain (reported as orange circles in Fig. 4), based on criteria such as quality, territorial distribution and climate representativeness. Snow measurement time series have been filtered and processed to remove outliers and smooth the signal. Then, analysing the HS signal, the various snowfall events along the time series have been identified, and the corresponding volumetric amount of accumulation has been multiplied by the density of the fresh snow estimated through empirical formulations40. This activity requires a thorough match of simulations with the corresponding rain measurements to avoid false precipitation events and to estimate the actual snowfall elevation and amount. The procedure is recursively applied at all stations until an acceptable comparison between simulated and measured HS time series is reached (see “Validation” section): if the discrepancy between in-situ data and simulations is below a given threshold, the modeling exercise proceeds to satellite assimilation; otherwise, a correction is calculated and applied in an iterative procedure. Finally, once the final time series of solid precipitation at the various snow gauges have been obtained, they are used in the spatial interpolation procedure previously described to recalculate the final precipitation dataset.
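The event-based conversion from HS increments to virtual precipitation can be sketched as follows (illustrative only: `fresh_snow_density` is a placeholder for the empirical formulation actually used (ref. 40), and the event-detection threshold is an assumption):

```python
def fresh_snow_density(ta_celsius):
    """Illustrative fresh-snow density (kg/m3) as a function of air
    temperature; a placeholder, not the formulation of ref. 40."""
    return max(50.0, 50.0 + 1.7 * max(ta_celsius + 15.0, 0.0) ** 1.5)

def hs_to_precip_mm(hs_m, ta_celsius, min_event_m=0.01):
    """Convert daily snow-depth increments (m) into virtual precipitation
    (mm w.e.). hs_m and ta_celsius are equally sampled daily series."""
    precip = [0.0]
    for prev, curr, ta in zip(hs_m[:-1], hs_m[1:], ta_celsius[1:]):
        dh = curr - prev
        if dh >= min_event_m:             # accumulation event detected
            rho = fresh_snow_density(ta)  # kg/m3
            precip.append(dh * rho)       # m * kg/m3 = kg/m2 = mm w.e.
        else:
            precip.append(0.0)            # settling or melt: no precipitation
    return precip
```

The resulting virtual precipitation series would then feed the iterative comparison against simulated HS described above.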

Tiles used to computationally represent the Po River District (red grid). Points of HS assimilation and validation described in the “Validation” section are reported as orange and blue circles, respectively, with the total number in each tile reported with the same color.
Energy correction
Snow measurements are also used to correct the 1D melting rate simulated by the model, to avoid either a too rapid or a too slow snow melting rate. This is done by retrieving the melting rate at a single measurement point from the slope of the HS time series during a melting phase; the measured melting rate is then used to correct the 1D melting rate evaluated by the physical model. This procedure is performed over about 250 measurement points. Finally, by interpolating these melting rate corrections while accounting for the geomorphology of the area of interest, 2D melting rate correction maps are derived.
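A point-wise sketch of this correction, assuming the melting rate is taken as the least-squares slope of HS versus time (an illustrative simplification; the operational processing may differ):

```python
def melting_rate_correction(days, hs_obs, hs_sim):
    """Estimate a multiplicative melting-rate correction from the slopes of
    observed vs simulated HS time series during a melt phase. Slopes are
    obtained by least-squares regression of HS against time (days)."""
    def slope(y):
        n = len(days)
        mx = sum(days) / n
        my = sum(y) / n
        return (sum((d - mx) * (v - my) for d, v in zip(days, y))
                / sum((d - mx) ** 2 for d in days))
    obs, sim = slope(hs_obs), slope(hs_sim)
    if sim >= 0:          # simulation shows no melting: nothing to correct
        return 1.0
    return obs / sim      # >1: model melts too slowly; <1: too fast
```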
EO assimilation
After the evaluation of melting rate maps through the use of snow gauge data, MODIS-retrieved SCA maps are used to correct the melting rate in 2D. In order to compare MODIS and simulated SCA, we used the so-called “k index”, a metric that takes into account both the elevation and the aspect classes along the cardinal directions. This metric is used to efficiently compare the SCA of simulations and EO maps, enabling an iterative assimilation procedure. The goal of the comparison is not to exactly mimic the SCA evolution but to evaluate the representativeness of our results over the study area (expressed in terms of snowline along the cardinal directions), in coherence with geomorphological features such as aspect and elevation, i.e., the most important morphological features to explain the snow melting heterogeneity in the terrain.
The k index is based on the Intersection over Union (IoU), or Jaccard Index, a metric for comparing the similarity between two arbitrary shapes commonly used in computer vision41,42 and also employed for the classification of EO images43,44. For our application (i.e., an automatic assimilation of EO-retrieved SCA images to correct the melting rate when the discrepancy between EO images and simulated SCA exceeds a threshold), this index proves appropriate because it enables a quick comparison between the simulated and the MODIS-retrieved SCA on a dedicated plot, the so-called radar plot. The radar plot represents the snowline as a function of the cardinal directions at a global scale (see Fig. 5 for a graphic representation). The k index thus highlights the performance of the simulations in representing the snow melting process, expressed as snowline elevation, and triggers the correction of the melting rate as a function of both elevation and aspect.

(a) Illustration of intersection and difference between EO and simulated areas in the cardinal plane (i.e., the radar plot). (b) Example of a radar plot comparing EO-retrieved and simulated SCA images.
Defining the union area (U) as the sum of the intersecting (I) and differing (D) SCA areas, the k index is expressed as the ratio of I over U (Fig. 5a):
$$k=\frac{I}{U}=\frac{I}{(D+I)}.$$
(1)
For specific cases, some modifications to the original index were made. In particular, if both I = 0 and D = 0 (i.e., no snow on the terrain), no EO data assimilation is needed in the physical model. In such a case, the agreement between the EO image and the simulation is perfect, and k has been forced to 1. Considering the extreme cases: with no intersection (I = 0), k equals 0, whereas with no difference (D = 0), k equals 1. For other values of I and D, k ranges between 0 and 1.
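Equation 1, together with the no-snow modification described above, can be written compactly for binary SCA masks:

```python
import numpy as np

def k_index(sca_eo, sca_sim):
    """Jaccard-style k index between two binary SCA masks (Eq. 1).
    Returns 1.0 when both masks are snow-free (I = 0 and D = 0): perfect
    agreement, no assimilation needed."""
    eo = np.asarray(sca_eo, dtype=bool)
    sim = np.asarray(sca_sim, dtype=bool)
    intersection = np.logical_and(eo, sim).sum()   # I
    union = np.logical_or(eo, sim).sum()           # U = I + D
    if union == 0:                                 # no snow anywhere
        return 1.0
    return float(intersection / union)
```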
During the EO SCA assimilation procedure, a radar plot is generated (Fig. 5b) for each pair of corresponding images (simulation vs MODIS) to visualize differences in SCA across cardinal directions, offering a means to understand the value of the obtained k index. If the k index is below a threshold, thus indicating a significant discrepancy, the model calculates and applies a correction of the melting rate in the next iteration of GEOtop model simulations. Otherwise, the modelling proceeds to the retrieval of SWE maps. Since they are used for the correction of the melting rate, the MODIS images have been assimilated (and similarly used for the validation) every five days (or a multiple thereof, in case of cloudy images), a frequency appropriate to the evolution of the snowpack.
Computational approach
In our modelling approach, we had to deal with the daunting task of managing a huge spatial domain (about 74,000 km2), a fine spatial resolution (500 m) and a long (30 year, daily) temporal dataset. This results in computationally intensive simulations that demand substantial computational resources and efficient processing methods. Therefore, we employed a tiling approach18 to enhance the efficiency and scalability of our simulations, dividing the computational domain into 18 square tiles (100 km side length) to ease the simulation control and split the calculation burden over different machines (see Fig. 3). To effectively implement this approach, we adopted methods that address the problem of coherence across the tiles in 2D simulations. In particular, to maintain the coherence of the interpolated meteorological data across the tiles, a buffer zone (4 km wide) has been considered to ensure smooth transitions between adjacent tiles. Similarly, to maintain the coherence of the melting rate maps, in-situ data have been interpolated following the same approach used for the meteorological data.
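The tile layout with buffer zones can be sketched as follows (a minimal illustration in projected km coordinates; the 100 km tile side and 4 km buffer are those stated above, while the bounding-box values are hypothetical):

```python
def make_tiles(xmin, ymin, xmax, ymax, tile_km=100.0, buffer_km=4.0):
    """Split a bounding box (projected coordinates in km) into square tiles,
    each expanded by a buffer so that interpolated fields stay coherent
    across tile edges."""
    tiles = []
    y = ymin
    while y < ymax:
        x = xmin
        while x < xmax:
            core = (x, y, min(x + tile_km, xmax), min(y + tile_km, ymax))
            buffered = (core[0] - buffer_km, core[1] - buffer_km,
                        core[2] + buffer_km, core[3] + buffer_km)
            tiles.append({"core": core, "buffered": buffered})
            x += tile_km
        y += tile_km
    return tiles
```

Each tile is simulated over its buffered extent, but only the core extent contributes to the merged output, which suppresses edge artefacts between adjacent tiles.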
The dataset, consisting of 30 years of data with 273 GeoTIFF maps per year, occupies approximately 1.5 GB on disk due to Lempel-Ziv-Welch (LZW) compression. However, when decompressed and loaded into memory, each map requires significantly more space. Each map has a resolution of 999 × 659 pixels with a single band, where each pixel is stored as a 16-bit integer (Int16), requiring 2 bytes per pixel. The memory required for a single map is calculated as:
$$\,{\rm{Memory\; (in\; bytes)}}={\rm{Width}}\times {\rm{Height}}\times {\rm{Bytes\; per\; pixel}}\,\times {\rm{Number}}\,{\rm{of}}\,{\rm{bands}},$$
i.e., 999 × 659 × 2 × 1 = 1,316,682 bytes (~1.26 MB). For the entire dataset, the total memory required to load all maps is:
$$\,{\rm{Total\; Memory\; (in\; GB)}}\,=\frac{1.26\,\,{\rm{MB}}\,\times 273\,{\rm{maps}}/{\rm{year}}\times 30\,{\rm{years}}}{1024\,{\rm{MB}}/{\rm{GB}}}\approx 10\,{\rm{GB}}.$$
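The arithmetic above can be checked directly:

```python
# Reproduce the memory estimate from the text (values taken from the dataset
# description: 999 x 659 Int16 pixels, one band, 273 maps/year, 30 years).
width, height, bytes_per_pixel, bands = 999, 659, 2, 1
maps_per_year, years = 273, 30

bytes_per_map = width * height * bytes_per_pixel * bands    # 1,316,682 bytes
mb_per_map = bytes_per_map / 1024**2                        # ~1.26 MB
total_gb = bytes_per_map * maps_per_year * years / 1024**3  # ~10 GB

print(f"{bytes_per_map} bytes per map (~{mb_per_map:.2f} MB), "
      f"total ~{total_gb:.1f} GB")
```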
To efficiently handle this dataset and perform computations, a computer with a multi-core processor, such as an Intel Core i7/i9 or AMD Ryzen 7/9, is recommended to ensure sufficient computational power for managing large datasets and performing geospatial analyses. At least 16 GB of RAM is required, although 32 GB or more is preferable to allow for loading multiple maps into memory simultaneously without performance slowdowns, especially considering the ~10 GB memory requirement estimated for the entire dataset. For storage, a Solid-State Drive (SSD) with at least 512 GB of available space is ideal for fast read/write speeds. While the compressed dataset occupies only 1.5 GB, additional space is necessary for intermediate files and computation outputs. A dedicated GPU is not required, as all necessary calculations can be efficiently performed using the CPU.
Assumptions and limitations
The modeling of SWE using GEOtop presents several limitations, such as the following: (i) the model does not account for wind-driven snow transport, avalanche dynamics, or snow accumulation at the base of steep slopes; (ii) the density of fresh snow and the precipitation phase depend mainly on the 2 m air temperature, without considering the atmospheric temperature profile. These limitations, and the physical validity of the results under these assumptions, are further described in the GEOtop model’s documentation16.
Additional limitations stem from the quality24 of the in-situ meteorological data, as highlighted in the “Meteo data preprocessing” section. The in-situ datasets suffer from quality and availability issues, necessitating the integration of reanalysis datasets such as ERA5-Land45; at the same time, these reanalysis datasets are not a substitute for local-scale meteorological conditions24. Geostatistical methods and filtering techniques used for interpolation can sometimes fail to adequately represent complex phenomena such as thermal inversions or the intricacies of wind fields, which influence the snowpack energy exchanges29.
Finally, the main limitations related to data assimilation are the following: (i) the in-situ measurements provide only indirect estimates of SWE, as they are based on volumetric observations, and (ii) the assimilation of the energy balance is constrained, as it can only be calculated for specific points within the domain, making it difficult to spatialize, even with satellite-based observations. Despite these limitations, the error associated with the modeling of SWE using our methodology remains within acceptable bounds, as highlighted in the “Technical Validation” section.