Author(s): Chia-Ling Chang; Wen-Yun Fu; Shien-Tsung Chen
Keywords: Machine learning; Pollutant load; Random forest; River water quality
Abstract: This study compares two machine learning (ML) algorithms, Random Forest (RF) and Linear Regression (LR), for multi-station water quality prediction in the Xindian River Basin. The research focuses on four key parameters: Dissolved Oxygen (DO), Biochemical Oxygen Demand (BOD5), Ammonia Nitrogen (NH3-N), and Suspended Solids (SS). The models draw on long-term water quality monitoring data, with flow rate as an essential input and rainfall and pollution loads as additional candidate predictors. To ensure model robustness, the training/testing split accounts for the spatial independence of monitoring stations, and performance is assessed using standard metrics. Feature importance analysis is included to improve model interpretability, and the results examine how different input combinations influence predictive accuracy. Ultimately, this research presents an integrated workflow for developing a water quality prediction model for the Xindian River that combines prediction with interpretability, aiming to support data-driven decisions for prioritizing river management efforts and optimizing the water quality monitoring network.
Year: 2026
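
Illustrative note: the abstract states that the training/testing split considers the spatial independence of monitoring stations but does not say how. One common way to do this is to hold out entire stations, so that no station contributes samples to both sets. The following is a minimal sketch of that idea under stated assumptions, not the authors' procedure. The function name `split_by_station`, the field names (`station`, `flow`, `do`), the station labels, and all values are hypothetical placeholders, not data from the study.

```python
import random


def split_by_station(records, test_fraction=0.25, seed=0):
    """Hold out whole stations so none appears in both train and test.

    Assumption: each record is a dict carrying a 'station' key.
    """
    # Unique station labels, sorted so the shuffle is reproducible.
    stations = sorted({r["station"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(stations)

    # Reserve at least one station for testing.
    n_test = max(1, round(len(stations) * test_fraction))
    test_stations = set(stations[:n_test])

    train = [r for r in records if r["station"] not in test_stations]
    test = [r for r in records if r["station"] in test_stations]
    return train, test


# Synthetic placeholder records: 4 hypothetical stations, 5 samples each.
records = [
    {"station": f"S{s}", "flow": 10.0 + s + t, "do": 6.0 + 0.1 * t}
    for s in range(4)
    for t in range(5)
]

train, test = split_by_station(records)
train_stations = {r["station"] for r in train}
test_stations = {r["station"] for r in test}
print(len(train), len(test), train_stations & test_stations)
```

With a station-wise holdout, test scores reflect how a model performs at locations it has never seen, which is a stricter check than a random row-wise split. In a scikit-learn workflow, `GroupShuffleSplit` with the station label passed as `groups` serves the same purpose.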