Machine Learning Applications for Imputing Missing Hydroclimatic Data: The Case of Tormes Catchment in Spain

Source: IAHR Document Library, Book of Abstracts of the 16th International Conference on Hy...

Author(s): Hector Gonzalez Lopez; Majid Niazkar; Jaroslav Mysiak; Carlos Dionisio Perez Blanco

Keywords: Hydroclimatic variables; Precipitation; Temperature; Machine learning; Missing data imputation

Abstract: Ground-based observations are essential for hydrological modelling, yet station records often contain missing values. Reanalysis products provide more continuous series, although they may be biased. This study evaluates eight machine learning (ML) models for imputing missing temperature and precipitation records in the Tormes catchment (Spain), using the CFSR and ERA5 reanalysis products as predictors. The tested models are Multiple Linear Regression (MLR), Decision Tree Regression, Random Forest Regression, Support Vector Regression, K-Nearest Neighbors, AdaBoost, Gradient Boosting Regressor, and XGBoost. Results show that both MLR and XGBoost impute temperature accurately, whereas precipitation remains more difficult because of its nonlinearity and high frequency of zero values. Additional strategies (time lags, seasonal information, a hybrid SARIMA-ML approach, logarithmic transformation, and SHAP-based feature selection) did not improve on the direct application of XGBoost. Overall, XGBoost performed best for precipitation imputation, while simpler models were also effective for temperature.
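
The imputation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, fits the simplest of the eight listed models (MLR) by ordinary least squares, and runs on small synthetic temperature values invented for illustration. The names `fit_mlr`, `impute`, `cfsr`, `era5`, and `station` are hypothetical. In practice a library such as scikit-learn or XGBoost would likely replace the hand-written solver.

```python
def fit_mlr(X, y):
    """Fit y = b0 + b1*x1 + ... by least squares (normal equations)."""
    rows = [[1.0] + list(x) for x in X]  # prepend intercept column
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        if abs(A[p][c]) < 1e-12:
            raise ValueError("predictors are collinear; cannot fit")
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]  # [intercept, coef_1, ..., coef_n]


def impute(station, predictors):
    """Fill None entries in `station` from a model fitted on observed days."""
    observed = [(x, y) for x, y in zip(predictors, station) if y is not None]
    coef = fit_mlr([x for x, _ in observed], [y for _, y in observed])
    filled = []
    for x, y in zip(predictors, station):
        if y is None:  # gap: predict from the reanalysis values of that day
            y = coef[0] + sum(c * v for c, v in zip(coef[1:], x))
        filled.append(y)
    return filled


# Synthetic daily temperatures (degrees C), invented for illustration only.
cfsr = [10.0, 12.5, 15.0, 11.0, 9.0, 14.0, 13.0, 8.5]
era5 = [9.0, 13.0, 14.0, 12.0, 8.0, 15.0, 12.0, 9.5]
predictors = list(zip(cfsr, era5))

# Assumed "true" station series: an exact linear function of both predictors.
truth = [1.0 + 0.6 * c + 0.3 * e for c, e in predictors]

# Station record with two missing days (indices 2 and 5).
station = [None if i in (2, 5) else t for i, t in enumerate(truth)]

filled = impute(station, predictors)
print(filled[2], filled[5])
```

Because the synthetic station series is an exact linear function of the two predictors, the gaps are recovered almost perfectly here. Real precipitation records, with their nonlinearity and many zero values, are the case where the abstract reports that a linear model is less suitable than XGBoost.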

DOI:

Year: 2026