Author(s): J. P. Carbajal; V. Bellos
Linked Author(s): Vasilis Bellos
Keywords: Machine learning; Hydraulic modeling; Hydrological modeling; Supervised learning
Abstract: We provide an overview of Machine Learning (ML) and its role in hydrology and hydraulics. The aim is to ease the access of researchers in these fields to the techniques of ML. ML has received increasing attention in recent years. It promises to ease the problem of modeling from observations. It is a heavily mathematized field, with strong statistical jargon. Therefore, it can be difficult to access for researchers from the fields of hydrology and hydraulics (hydro-research). A bird's-eye view of ML and its role can be beneficial for the hydro-research community. Herein we address the questions of what ML is in general terms and what its roles are in hydro-research. We discuss some important features of each of these roles. ML could be described as "the use of a set of observations to uncover an underlying process". The process is usually stated as a mathematical relation between the observations. Herein we consider only the case in which the observations include both the inputs and the outputs of the process, i.e. supervised learning. The sought functional pattern, called the unknown target function, establishes the relation between the inputs and the outputs, i.e. it is a model of the process. The training examples are input-output samples from the underlying process, and they represent all the direct information we have about it. To search for the target function we choose an extensive set of functions, which we call the hypothesis set. For example, we could choose all the linear functions between inputs and outputs (i.e. linear models), or all the functions generated by a given neural network. The key point is that the hypothesis set is built so as to contain the unknown target function, or at least a very good approximation of it. This choice is based on our previous experience and the available expert knowledge about the underlying process, e.g. mathematical or phenomenological models.
The learning algorithm uses the training examples to select the best candidate function from the hypothesis set, i.e. the final hypothesis. Since all prior information about the unknown target function is encoded in the hypothesis set, the quality of the best candidate depends heavily on the elements of the set. All fundamental research in supervised ML consists of developing new learning algorithms (mainly optimization algorithms), novel or concise descriptions of different hypothesis sets, and useful representations of input-output data (encodings). ML as described above can be useful for hydro-research in at least three situations: i) (artificial science) learning new models from measured data; ii) (scientific numerical modeling) using data to find the values of parameters of known models; iii) (emulation) replacing a model with a simpler version while maintaining the quality of the predictions.
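The supervised-learning vocabulary used in the abstract (training examples, hypothesis set, learning algorithm, final hypothesis) can be illustrated with a minimal sketch. It is not taken from the paper: the data values, the choice of linear functions as the hypothesis set, and the names `fit_linear` and `final_hypothesis` are assumptions made purely for illustration, with ordinary least squares standing in for the learning algorithm.

```python
# Hypothetical training examples: input-output samples of an unknown process.
# The numbers are made up for illustration and follow y = 2x + 1 exactly.
inputs = [0.0, 1.0, 2.0, 3.0, 4.0]
outputs = [1.0, 3.0, 5.0, 7.0, 9.0]


def fit_linear(xs, ys):
    """Learning algorithm (assumed here: ordinary least squares).

    Searches the hypothesis set {h(x) = a*x + b} and returns the
    parameters (a, b) of the best candidate, i.e. the final hypothesis.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def final_hypothesis(x, params):
    """Evaluate the selected linear function at a new input x."""
    a, b = params
    return a * x + b


params = fit_linear(inputs, outputs)
prediction = final_hypothesis(5.0, params)
print("slope, intercept:", params)
print("prediction at x = 5:", prediction)
```

In this toy case the hypothesis set contains the process that generated the data, so the final hypothesis recovers it. If the underlying process were nonlinear, no choice of learning algorithm could compensate for the restricted set, which is the dependence on the hypothesis set that the abstract emphasizes.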
DOI: https://doi.org/10.3850/978-981-11-2731-1_386-cd
Year: 2018