Author(s): Thitipoom Chailert; Mark A. Trigg; Abdulrahman Altahhan; Evangelos Pournaras
Keywords: Deep learning; Flash floods; Flood forecast; Input data; Long short-term memory (LSTM); Rare events
Abstract: The rapid onset of flash floods presents a major challenge for data-driven forecasting models, primarily due to the scarcity of relevant training samples within large historical datasets. This study addresses this limitation by optimising the input data size for Long Short-Term Memory (LSTM) networks. We compared the performance of models trained on subsets of threshold-exceeding river level events defined by four selection criteria. The findings indicate that training on smaller, relevant datasets yields higher forecasting accuracy than using the full historical record. The subset of events characterised by a river level rate of change exceeding 0.22 metres per 180 minutes (Criterion D) resulted in the lowest Root Mean Square Error (RMSE) and the highest Nash-Sutcliffe Efficiency (NSE), suggesting that targeted data selection is a viable strategy to overcome data imbalance in deep learning-based flash flood forecasting.
Year: 2026
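
The selection rule and the two skill scores named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 15-minute sampling interval, and the reading of the criterion as "rise over the preceding 180 minutes" are all assumptions made for illustration. Only the 0.22 m per 180 min threshold and the standard RMSE and NSE definitions come from the abstract.

```python
import math


def rmse(observed, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def nse(observed, predicted):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


def rising_indices(levels, step_minutes=15, window_minutes=180, threshold_m=0.22):
    """Indices where the level rose by more than threshold_m over the window.

    ASSUMPTION: levels are sampled at a fixed interval (step_minutes), and the
    rate of change is taken as the difference across the preceding window.
    """
    lag = window_minutes // step_minutes
    return [
        i for i in range(lag, len(levels))
        if levels[i] - levels[i - lag] > threshold_m
    ]


# Hypothetical hourly river levels in metres (made-up values, not study data).
levels = [1.0, 1.0, 1.0, 1.1, 1.3, 1.6, 1.6]
selected = rising_indices(levels, step_minutes=60)
```

Time steps flagged in this way could then be grouped into events to form a smaller training subset, in the spirit of the targeted data selection the abstract describes.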