Probabilistic Flood Loss Models for Companies

Flood loss modeling is a central component of flood risk analysis. Conventionally, this involves univariable and deterministic stage‐damage functions. Recent advancements in the field promote the use of multivariable and probabilistic loss models, which consider variables beyond inundation depth and account for prediction uncertainty. Although companies contribute significantly to total loss figures, novel modeling approaches for companies are lacking. Scarce data and the heterogeneity among companies impede the development of company flood loss models. We present three multivariable flood loss models for companies from the manufacturing, commercial, financial, and service sector that intrinsically quantify prediction uncertainty. Based on object‐level loss data (n = 1,306), we comparatively evaluate the predictive capacity of Bayesian networks, Bayesian regression, and random forest in relation to deterministic and probabilistic stage‐damage functions, serving as benchmarks. The company loss data stem from four postevent surveys in Germany between 2002 and 2013 and include information on flood intensity, company characteristics, emergency response, private precaution, and resulting loss to building, equipment, and goods and stock. We find that the multivariable probabilistic models successfully identify and reproduce essential relationships of flood damage processes in the data. The assessment of model skill focuses on the precision of the probabilistic predictions and reveals that the candidate models outperform the stage‐damage functions, while differences among the proposed models are negligible. Although the combination of multivariable and probabilistic loss estimation improves predictive accuracy over the entire data set, wide predictive distributions stress the necessity for the quantification of uncertainty.


Introduction
Flooding poses immense risk to life and economic goods. Over the past four decades, 40% of globally recorded natural catastrophes were caused by pluvial or fluvial flooding and the share of hydrological events is rising (Munich Re, 2018). Severe fluvial flooding such as the 2002 event (Ulbrich et al., 2003) or 2013 event  in Germany can harm all components of society such as private households, infrastructure, or economy. Damage to companies constitutes a high share of total flood losses. For instance, businesses accounted for € 1.4 billion (15.9%) of the total direct flood loss of € 9.1 billion in 2002 (Mechler & Weichselgartner, 2003). In the 2013 flood, companies suffered € 1.3 billion (19%) of the total € 6.7 billion damage (German Federal Ministry of the Interior, 2013; Thieken et al., 2016). Despite the substantial contribution of companies to overall damage, previous flood loss research addressed residential damage for the most part (Gerl et al., 2016;Gissing & Blong, 2004).
Flood risk assessment comprises the evaluation of flood hazard, exposure, and vulnerability (Merz, Hall, et al., 2010;Olsen et al., 2015). Vulnerability describes the susceptibility of exposed assets, such as buildings or contents, to sustain damage during a flood. The assessment of monetary loss through loss models represents a cornerstone in flood risk analysis and directly influences flood management practice, for instance in the cost-benefit analysis of flood management measures or in the calculation of insurance premiums . Conventionally, flood loss estimation engages univariable stage-damage functions, which relate the hazard intensity at an asset, that is, inundation depth, to the damage grade or absolute damage (Alfieri et al., 2016;Grigg & Helweg, 1975;Huizinga et al., 2017;G. F. White, 1945). Most flood loss models feature a variety of distinct stage-damage functions differentiating between occupancy (e.g., residential, commercial, and industrial), asset type (e.g., building, contents, and equipment), and asset characteristics (e.g., building type, building material, and number of stories). Several models include explicit stage-damage functions for the commercial and industrial sector, for instance, the Multi-Coloured Manual (Penning-Rowsell et al., 2005), HAZUS-MH (Scawthorn et al., 2006), the stage-damage functions of the International Commission for the Protection of the Rhine (2016), or the global data set of stage-damage functions by Huizinga et al. (2017). Still, stage-damage functions often omit other damage influencing factors such as inundation duration or preparedness (Kelman & Spence, 2004;Middelmann-Fernandes, 2010;Thieken et al., 2005) and, more importantly, cannot account for interactions among the variables. As a result, stage-damage functions can only partially describe the damage processes (Gissing & Blong, 2004;Merz et al., 2004;Rözer et al., 2019;Schröter et al., 2014;. The advance of machine learning and data mining promoted the development of multivariable flood loss models, which jointly consider a variety of damage influencing factors and their interdependency. The modeling community proposed ample methods for flood loss estimation including multivariate generalized regression (Rözer et al., 2019;Van Ootegem et al., 2015;Zhai et al., 2005), rule-based models (Elmer et al., 2010;Kreibich et al., 2010;Thieken et al., 2008), tree-based approaches (Carisi et al., 2018;Hasanzadeh Nafari, Ngo, & Mendis, 2016;Kreibich et al., 2017;Sieg et al., 2017;Sultana et al., 2018), and Bayesian networks (Lüdtke et al., 2019;Vogel et al., 2012Vogel et al., , 2014Wagenaar et al., 2018).
Another advantage of such flood loss models is their ability to quantify the predictive uncertainty in their loss estimates. By returning predictive distributions instead of deterministic point estimates, probabilistic models inherently provide reliability information alongside their predictions (e.g., Lüdtke et al., 2019;Rözer et al., 2019;Wagenaar et al., 2018). Despite the evidently large uncertainties governing loss estimation, only a small number of existing models is probabilistic (Gerl et al., 2016). However, the explicit consideration of predictive uncertainty bears concrete value for flood risk management practice. For instance, flood loss estimates are central components of risk-based decision making in flood protection planning (Merz & Thieken, 2009;Wagenaar et al., 2016). Decision-making frameworks such as expected utility theory or multicriteria analysis regard uncertainty information as integral for evaluating competing protection strategies (De Brito & Evers, 2016;Kreibich et al., 2014;Kunreuther et al., 2013). Probabilistic loss models inherently provide this uncertainty information and, hence, fit neatly into different decision support tools (Lüdtke et al., 2019). In this context, they represent an alternative to multimodel ensembles of deterministic flood models (see, e.g., Figueiredo et al., 2018), where a sufficient number of models is lacking or the setup of an ensemble is too expensive. Furthermore,  showed that probabilistic loss models can aid in bridging the gap between flood risk assessment at different scales, as they provide more accurate and informative loss estimates than deterministic models on the object level and are capable of propagating predictive uncertainty to aggregated levels, such as municipalities or states. Since both modelers and decision makers benefit from the transparent communication of uncertainty in damage estimates, further efforts should aim at the implementation of probabilistic loss models Meyer et al., 2013).
Company flood loss models that account for variable interactions and predictive uncertainty at the same time are still an exception. Several aspects impede the development of novel flood loss estimation techniques for companies. First, the damage processes of companies and residential buildings differ, which, in turn, requires the separate setup of company loss models . Second, companies are more heterogeneous than private households, for instance, with respect to building type, size, or occupancy. Namely, company equipment ranges from heavy machinery over technical devices to office items depending on the business sector, whereas the composition of the contents varies less across private households, and the size of companies ranges from self-employed persons to production facilities with large numbers of employees, while household sizes range in the same order of magnitude. This heterogeneity reflects in the loss data as variance (Gissing & Blong, 2004). Third, flood loss data are scarce and often inaccurate, especially for companies Molinari et al., 2014;Seifert et al., 2010;Sieg et al., 2017). Examples of multivariable flood loss models for companies that account for variable interactions are the empirical-synthetic FLFAcs model (Hasanzadeh Nafari, Ngo, & Lehman, 2016), the rule-based FLEMOcs model Seifert et al., 2010), and the random forest model of Sultana et al. (2018). Sieg et al. (2017) and  explored the capability of random forests to predict company flood loss for different economic sectors and spatial scales. Despite the necessity of proper model benchmarking (Gerl et al., 2016), an intercomparison of different multivariable probabilistic company flood loss models is still missing.
In this study, we present three multivariable probabilistic flood loss models for companies: Bayesian networks, Bayesian zero-and-one-inflated beta regression, and random forest. These models performed well in loss prediction exercises for the residential sector (Rözer et al., 2019;Schröter et al., 2014), where they outperformed other approaches such as rule-based models, probabilistic Gaussian regression models, or deterministic stage-damage functions; but except for random forest they have not been implemented for companies to date. The random forest model for companies of Sieg et al. (2017) and  achieved promising performance scores but has not yet been tested against equally complex models. We aim at closing these gaps by implementing the models for the estimation of company flood loss and conducting a thorough comparison of their predictive capacity on basis of the same data. Since the three candidate models can deal with multidimensional, heterogeneously scaled model data and return predictive distributions of flood loss, they fulfill the requirements of modern flood loss estimation and match the highly variable company loss data. We benchmark the proposed models against a probabilistic and a deterministic stage-damage function, serving as standard reference models. We fit and validate all models separately for direct tangible loss to the company assets building (BUI), equipment (EQU), and goods and stock (GNS) on the basis of object-level company loss data (n = 1,306) collected in postevent surveys in Germany between 2002 and 2013. The multivariable candidate models use information on flood intensity, company characteristics, and private precaution to estimate the flood damage, whereas the stage-damage functions solely depend on water depth. The objective of this study is the comparative examination and assessment of 1. the predictive capacity of multivariable models against the established, univariable stage-damage functions and 2. differences in predictive power among the multivariable probabilistic candidate models with a particularly focus on probabilistic forecasting. The results of this study offer new insights into flood damage processes of companies, the added value of complex modeling approaches, and the potential of probabilistic modeling in flood risk assessment.

Survey Data
The empirical company flood loss data used in this study stem from four individual postevent surveys after major floods in Germany that occurred in the period from 2002 to 2013 (Kreibich et al., 2007;Thieken et al., 2016). The survey questionnaires remained consistent over all four surveys and gathered information on flood intensity, company characteristics, emergency and private precautionary measures, flood experience, and flood loss. Large flood events in the Danube and Elbe river catchments in 2002 and 2013 contribute the largest share (n = 1,014) to the total number of 1,346 completed company interviews. The remaining company loss data were collected in the aftermath of events in 2005, 2006, and 2010-2012 in the Danube, Elbe, Oder, and Rhine catchments. The data set is dominated by small-and medium-sized companies with less than 250 employees. For details on the survey data set and the collection methodology see Kreibich et al. (2007). Table 1 lists a subset of variables from the survey data set, which we used for modeling in this study. The selection of the variable subset is primarily based on the studies of Kreibich et al. (2010) and Sieg et al. (2017), in which the authors quantitatively investigated variable importance with respect to relative loss on subsets of the same survey data. Furthermore, the composition of the predictor set was influenced by existing residential flood loss models (Elmer et al., 2010;Schröter et al., 2014;Wagenaar et al., 2018). In the following we motivate the predictor set and reference to studies, where the predictor was identified as influential or used in a loss model. Water depth Penning-Rowsell et al., 2005;Scawthorn et al., 2006;Sieg et al., 2017) and inundation duration Sieg et al., 2017;Vogel et al., 2014;Wagenaar et al., 2017) describe the intensity of the damaging flood event and are widely used in the prediction of flood loss. We augmented the surveyed flood intensity information on water depth and inundation duration by regionalized estimates of flood return periods (Elmer et al., 2010;Wagenaar et al., 2017Wagenaar et al., , 2018. Regional return period estimates provide additional insight on the general magnitude of the flood independent of spatially volatile inundation depths. Moreover, return periods allow for implications on the flood experience of affected companies, since severe events might have an impact on infrequently inundated neighborhoods with low risk awareness (Elmer et al., 2010). The calculation of the return period estimates involved a statistical extreme value analysis of time series of annual maximum discharge at river gauges in affected regions and was carried out in analogy to Elmer et al. (2010). Company characteristics are included into the model through the business sector in which the company operates, the company size expressed by the number of employees, and the spatial situation of the premises at the affected site Sieg et al., 2017). We assume that the flood experience of a company is tied to the number of previous floods that the company experienced Schröter et al., 2014;Wagenaar et al., 2018). In that sense, companies that were flooded once or several times before the surveyed event exhibit higher flood experience than companies, which never encountered flooding before.

Water Resources Research
For the assessment of companies' flood precaution (Kreibich et al., 2007Thieken et al., 2008;Vogel et al., 2018), we computed a ratio from a set of individual adaptation, mitigation, and emergency measures similar to Sieg et al. (2017). In contrast to Sieg et al., we combined adaption, mitigation, and emergency measures in one precaution ratio in order to reduce the number of predictor variables. The precaution ratio is defined as the number of precautionary measures that a specific company actually implemented prior to the damaging flood (nI) divided by the number of relevant measures that this company could have possibly implemented (nP) (1) Hence, company precaution is a ratio on the interval [0, 1], where well-prepared companies are assigned high ratios and poorly prepared companies are assigned low ratios. The observed values range from 4 to 10 for nP and from 0 to 10 for nI. The individual measures from which the precaution ratio was calculated are listed in Table 2. Except for the measure "saving equipment/saving goods and stock," which allowed for ordered answers depending on the amount of saved assets, all measures are treated as binary variables meaning that they were either implemented at the occurrence of the flood or not.
The damage to assets is expressed relative to their replacement value in order to facilitate the transferability of the derived models in space and time . Consequently, losses to building, equipment, and goods and stock are a ratio on the interval [0, 1], where a relative loss of 0 corresponds to no damage and a relative loss of one corresponds to the total loss of the asset.
Furthermore, we excluded companies (n = 8) with extraordinary long-lasting inundation durations (>30 days) since we found evidence for erroneous survey answers in these cases. Prior to the model derivation, we removed companies with missing predictor values and subdivided the resulting data set (n = 1,306) into three asset-specific data sets (n bui = 545, n equ = 829, n gns = 928). Figure 1 shows the distributions of the predictor and response variables for the three asset-specific data sets in the form of violin plots (Hintze & Nelson, 1998). The variable distributions are estimated through kernel density estimation (Silverman, 1998). The response variable, relative loss, contains considerable shares of no (value: 0) and total (value: 1) loss cases for building (0: 32%, 1: 4%), equipment (0: 37%, 1: 17%), and goods and stock (0: 51%, 1: 20%). This results in bimodality of the relative loss distributions, which is particularly pronounced for equipment and goods and stock.

Development of Probabilistic Loss Estimation Models 2.2.1. Random Forest
Random forest (RF) is a machine learning technique, which uses ensembles of decision trees for classification and regression problems (for details see Breiman, 2001;Liaw & Wiener, 2002). RFs are capable of Note. We estimate the degree of precaution for each company on the basis of the listed measures. a Ordered answer with four levels: from 0 = "nothing was saved" to 3 = "everything was saved"; this measure is possible for all companies. handling high-dimensional, nonlinear data and offer large flexibility as they accept discrete and continuous predictors at the same time (James et al., 2013).
RF is a supervised learning algorithm, which fits a large number of individual decision trees to data. The tree ensemble draws its predictive power from two techniques: bootstrap aggregation (bagging) and random feature selection. Bagging generates bootstrap samples of the original data before growing the trees and aggregates predictions of the individual trees afterward. During tree construction, random feature selection constrains the set of possible split variables at each splitting node, introducing additional randomness. The combination of bagging and random feature selection decreases the correlation among trees, which prevents overfitting and increases the prediction accuracy of the forest. Moreover, RFs inherently provide estimates of predictor importance. During tree construction, the algorithm randomly permutes each predictor and tracks the resulting mean decrease in prediction accuracy of the RF. A strong deterioration in prediction accuracy indicates that the respective predictor is more relevant for the predictive capacity of the RF.
The standard implementation of RF uses the classification and regression tree algorithm to construct the individual decision trees by recursively partitioning the training data into homogeneous subsets (Breiman et al., 1984). However, during recursive partitioning, this algorithm favors predictors with many possible splits (e.g., continuous variables) over predictors with few splits (e.g., categorical variables), leading to a variable selection bias (A. P. White & Liu, 1994). Hothorn et al. (2006) developed a recursive partitioning routine based on permutation tests, termed conditional inference tree algorithm, which overcomes this bias. Since the company loss data set used in this study consists of continuous, ordinal, and nominal variables, we used the conditional inference tree algorithm. In addition, we obtained conditional response distributions of relative loss instead of mean values by employing the quantile regression forest methodology of Meinshausen (2006). The majority of previous studies on flood loss modeling used the conventional classification and regression tree algorithm (e.g., Carisi et al., 2018;Kreibich et al., 2017;Schröter et al., 2014), but recent works increasingly applied the conditional inference tree algorithm (Sieg et al., 2017;Sultana et al., 2018) or a combination of conditional inference trees and quantile regression forests Sieg, Schinko, et al., 2019), which we also applied in this study.
Our RF model is controlled by two parameters, the number of trees n tree and the number of randomly sampled predictors m try during partitioning. We decided for a common parameter choice with n tree = 1,000 and m try = 3 (Hastie et al., 2009;Liaw & Wiener, 2002). The supporting information (SI) to this paper provide information on the computational implementation of the RF model (Hothorn & Zeileis, 2015).

Bayesian Network
A Bayesian network (BN) is a probabilistic graphical model. It does not distinguish between predictor and response variable but represents the joint probability distribution of all variables in form of a directed acyclic graph (for details see Jensen & Nielsen, 2007;Nagarajan et al., 2013;Pearl, 2009). BNs encode the statistical dependence structure of the random variables into a set of nodes and arcs. Each variable is symbolized by a node, while the conditional dependence or independence of two variables is expressed by the presence or absence of a connecting arc between their corresponding nodes. This independence mapping of a BN facilitates efficient probabilistic computation as the global, joint distribution of the variable set can be factorized into a product of local, conditional probability distributions.
In theory, BNs are applicable to continuous and discrete variables. Yet in practice, continuous BNs are usually restricted to normally distributed variables in order to maintain closed-form expressions of the associated probability distributions (Scutari, 2010). Since our flood loss data contain both discrete and continuous variables, which partly have skewed distributions, we implement discrete BNs in this study. The factorized formulation of the joint probability distribution for a discrete BN reads where X i are all n variables of the BN and Π Xi are the respective parent nodes of X i in the directed acyclic graph, that is, nodes whose arcs point toward X i . In a discrete BN, all probability distributions are multinomial, and the local distributions of the nodes are defined in conditional probability tables, which represent the parameters of the model ( Consequently, the implementation of a BN requires (1) the definition of the graph structure and (2) the estimation of the conditional probability table values. We learned three separate network structures and their corresponding parameters to receive individual BN models for the company assets building, equipment, and goods and stock. For prediction we employed Bayesian inference.
The continuous variables in the survey data demanded for adjustments before we could use them for learning and prediction in a discrete BN. Therefore, we binned all continuous variables into intervals (Koller & Friedman, 2009;Vogel et al., 2012Vogel et al., , 2014 by means of an equal-frequency discretization scheme (e.g., Wagenaar et al., 2018). This discretization routine calculates interval boundaries in a way that the resulting bins contain an equal amount of observations. In the interest of model accuracy, we assigned 10 bins to the presumably most influential predictors water depth and precaution ratio Sieg et al., 2017) and to the target variable relative loss. The number of classes for the other continuous variables inundation duration, company size, and return period was set to 5. By definition, a discrete BN returns a probability mass function of the target variable. However, the other two candidate models of this study provide continuous predictive distributions on the interval [0,1] for the relative loss. For the purpose of comparability, we derive a continuous probability density for the binned BN, by fitting a distribution to data that we sampled from the observed relative loss cases with sampling weights according to the probability that the BN predicted. For further details on the BNs we refer to the SI (Højsgaard, 2012).

Bayesian Regression
In the Bayesian regression (BR) (for details see Gelman et al., 2013;McElreath, 2018), we model relative loss with a zero-and-one-inflated beta distribution. The conventional beta distribution is a common choice for modeling fractional data, which range from 0 to 1 such as relative loss (Ferrari & Cribari-Neto, 2004). However, the beta distribution is not defined on those boundaries and, hence, cannot reproduce extreme cases of no (0) or total loss (1), which are abundant in the study data. The zero-and-one-inflated beta distribution (Ospina & Ferrari, 2010) overcomes this limitation by combining the beta with the Bernoulli distribution, which accounts for the excess in zeros and ones in the model data. The resulting mixture distribution has the following cumulative distribution function (CDF): where y is the response, relative loss, λ is the zero-and-one-inflation probability (i.e., the probability that the response is 0 or 1), F Bernoulli (·| γ) is the CDF of the Bernoulli distribution with parameter γ, which is the conditional one-inflation probability (i.e., the probability that the response is 1 rather than 0). F Beta (·| μ, ϕ) is the CDF of the reparameterized beta distribution with μ and ϕ as mean and precision parameter (Ferrari & Cribari-Neto, 2004).
We configure the BR as a distributional model, which means that not only the mean μ of the beta distribution is predicted but also the remaining parameters λ, γ, and ϕ. We use different sets of predictor variables, X λ , X γ , X μ , and X ϕ , for each parameter, receiving the following functions in the regression model where Y i denotes the response variable for observation i (i.e., the relative loss of one company) and X par,i the respective values of the predictor variables for the corresponding parameter. The parameters α par and β par are the intercept and regression coefficients of the corresponding distribution parameter in the combined regression model.
We estimate the mean μ of the beta distribution from all available predictors. In contrast, the inflation parameters λ and γ and the precision parameter ϕ are predicted by a selection of the most influential predictor variables for the respective asset. In this way, we reduce the number of model parameters, which improves the model convergence during parameter estimation and accounts for differences in the flood damage processes across the assets. The analysis of Sieg et al. (2017), which has been conducted on a subset of the data 10.1029/2020WR027649

Water Resources Research
SCHOPPA ET AL.
used in this study, suggests that the spatial situation of a company is a major factor for flood loss to buildings. Furthermore, the predictor importance measures for equipment, and goods and stock vary particularly strong across the economic sectors of the companies. Water depth and precaution exhibited high predictor importance across all assets. Table 3 shows which variables we used for predicting the zero-andone-inflated beta parameters in the individual asset loss models. Before model fitting, we transformed continuous predictors by a Yeo-Johnson transformation in order to treat the pronounced skewness in the predictors variables (Yeo, 2000). In addition, we centered and scaled continuous predictors. The regression terms contain individual coefficients for each level of the categorical predictors, sector, and spatial situation (i.e., dummy encoding; McElreath (2018)), and we model the ordinal variable flood experience as a monotonic effect (Bürkner, 2018).

Comparison to Stage-Damage Functions
We compare the previously presented candidate models to univariable stage-damage functions (SDF). SDFs predict flood loss solely from water depth, wd, and represent the conventional approach to flood loss modeling . Following similar studies (Rözer et al., 2019;Schröter et al., 2014;Wagenaar et al., 2017), we employ a square root SDF as a baseline model for comparison. Square root SDFs outperformed other functional forms (linear and polynomial) before (Elmer et al., 2010) and are arguably the most common instance of a SDF (Wagenaar et al., 2017). We implement a deterministic (SDF-D) and a probabilistic version (SDF-P) of the square root SDF, in order to differentiate between the added value of multivariable and probabilistic prediction separately.
The deterministic SDF represents an established standard approach to flood loss estimation. The model is a simple, least square regression, which is defined as follows: where Y i is the observed relative loss, α and β are the intercept and regression coefficient, and ε i is the error for observation i. During model fitting, the error sum of squares is minimized.
We implement the probabilistic SDF in a Bayesian framework in order to assure comparability with the probabilistic candidate models. Like in the BR model, we assume that relative loss follows a zero-and-oneinflated Beta distribution. The SDF model formulation reads Other than in the BR model, we only predict the mean parameter μ of the beta distribution. The remaining distribution parameters, λ, γ, and ϕ are assumed to be constant across observations; that is, we estimate them during the inference but do not predict them. We estimated the parameters of SDF-P in analogy to the BR model via MCMC. The SI contains further information on the prior choice for the SDF-P model.

Table 3 Predictor Sets of the Zero-and-One-Inflated Beta Regression
Building Equipment/goods and stock μ -beta mean all predictors all predictors ϕ -beta precision water depth and precaution water depth and precaution λ -zero-and-one-inflation water depth, precaution, and spatial situation water depth, precaution, and sector γ -conditional one-inflation water depth, precaution, and spatial situation water depth, precaution, and sector Note. The predictors vary over the parameters and over the assets. Differences between the models for building and equipment/goods and stock are indicated in italics.

Water Resources Research
SCHOPPA ET AL.

Model Validation
We validate the predictive performance of BN, BR, RF, and SDF individually for the three assets building, equipment, and goods and stock. This results in 12 asset-model combinations. All candidate models return probabilistic predictions rather than deterministic loss estimates. However, the models do not provide analytical predictive distributions but simulated approximations in the form of samples. For each model, we sampled 1,000 values from the conditional response distribution and evaluated this probabilistic response with respect to accuracy, sharpness, and calibration. Within one asset data set, we estimated the model test errors through repeated tenfold cross validation in order to receive robust estimates of true model performance. That is, we initiated 100 independent runs of tenfold cross validation with varying, random data partitioning. In each of the tenfold cross-validation runs, every company is held out of the training set for prediction exactly once. We validate model performance for each cross-validation fold by means of three performance metrics: 1. The mean absolute error (MAE) for the mean of the predictive distribution. The MAE evaluates the accuracy of a point forecast and averages the absolute difference between the observed response and the predicted point estimate over the number of observations. 2. The mean bias error (MBE), which quantifies model overestimation and underestimation in the mean of the predictive distributions. 3. The continuous ranked probability score (CRPS), which is a proper scoring rule that evaluates the entire continuous distribution of a probabilistic forecast. It jointly assesses the sharpness and calibration of the predictive distribution and generalizes the absolute error (Gneiting & Katzfuss, 2014;Matheson & Winkler, 1976). Hence, the CRPS can be compared directly to the MAE. The CRPS for one observation y i is defined as where F i (x) is the CDF of the predictive distribution f i (x) and 1{·} is the indicator function. We compute the CRPS with an empirical CDF estimated from samples of f i (x). For details on the numerical implementation of the CRPS for simulated forecasts, we refer to the corresponding literature (Jordan et al., 2019;Krüger et al., 2016). For the proportional response variable, relative loss, the CRPS is defined on the interval [0, 1] with the optimum at 0. Note that the CRPS is calculated individually for each observation. For the comparison with the MAE, we computed the mean CRPS value in each cross-validation fold.

Variable Importance in Multivariable Models
We compare the fitted multivariable models with respect to plausibility and consistency. First, the model fits should be in line with physical principles governing flood damage processes, for example, that loss increases with larger water depths. Second, the relative effect and influence of the predictors should be similar across models, since they are fit to the same training data. Figure 2 compares the learned BN structures, the estimated BR regression parameters for the mean parameter of the beta distribution (μ), and the RF predictor importance measures for the three study assets. In the BN structures, variables with the strongest statistical dependence on relative loss are directly connected to its node. The relative magnitude and sign of the estimated BR regression coefficients yield information on the effect of the corresponding predictor on relative loss. The coefficients for the categorical predictors, spatial situation, and sector express the deviation in flood loss for each variable group individually and relative to the first group of the respective variable, which acts as a reference (see "dummy encoding" in section 2.2.3 ). For example, companies that operate in the second sector group, "commercial," experienced considerably higher building loss than companies belonging to the first sector group, that is, "manufacturing," since the respective coefficient "sec[com]" is positive in the building model. Ultimately, RF expresses variable 10.1029/2020WR027649

Water Resources Research
SCHOPPA ET AL.
importance through the change of model accuracy that is induced by simulating the absence of a particular variable. The stronger the decrease in RF accuracy, the more relevant is the withheld predictor.
Water depth is a dominant influencing factor for flood loss to all three assets, as indicated by direct arc connections to relative loss in all BNs. This is confirmed by high absolute values of BR regression coefficients and RF predictor importance. The relevance of water depth deteriorates from building over equipment to goods and stock. The estimated signs of the BR regression coefficients show that water depth has a positive effect on relative loss. The high variable importance of water depth is in accordance with the majority of

Water Resources Research
company flood loss models (e.g., Hasanzadeh Nafari, Ngo, & Lehman, 2016;Kreibich et al., 2010;Penning-Rowsell et al., 2005;Sieg et al., 2017), where water depth is the most influential predictor. The remaining flood intensity variables, return period, and duration predominantly drive relative loss as well, yet to a lesser degree than water depth, confirming findings from similar studies for private households and companies (e.g., Kreibich et al., 2010;Sieg et al., 2017;Vogel et al., 2018).
Precaution is a likewise important predictor in the proposed models with direct arc connections to relative loss in all BNs. BR and RF reveal that the effect of precaution becomes more important for losses to equipment and, especially, goods and stock. Precaution was identified as an influential variable before Sieg et al., 2017), but it exhibits striking relevance in the presented models, which might trace back to different approaches to estimating company precaution. The large negative impact of precaution on relative loss in the BR models implies that precautionary measures can reduce flood loss effectively.
The spatial situation is more significant for losses to building than to equipment, and goods and stock. In contrast, the economic sector exhibits higher explanatory power for equipment, and goods and stock as indicated by the direct arc connections from sector to relative loss in the respective BNs. The effect of the company size is negative with maximum magnitude for losses to goods and stock. Sieg et al. (2017) found similar patterns of predictor importance for the spatial situation and company size. Flood experience plays a minor role for all assets and seems to reduce relative loss as well. The BN graphs imply that the predictive power of company size and flood experience is covered by correlated variables in adjacent nodes that are closer to relative loss (spatial situation and precaution), which explains their inferior overall importance. Kreibich et al. (2010) also identified flood experience as a subordinate predictor.
The variation in the predictor effects across the assets suggests that damage processes differ for losses to building, equipment, and goods and stock. This was also reported by Sieg et al. (2017), who observed fluctuating predictor importance across asset types for a subset of the same survey data. In general, building loss is controlled by variables describing the hazard intensity, precaution, and the spatial situation. In contrast, variables that describe company characteristics (sector and size) and precaution bear most information for losses to equipment, and goods and stock and sometimes even exceed the effect of water depth. Considering the pronounced variable effect of the sector for these assets, it seems that the heterogeneity among companies mainly reflects in the damage processes for equipment, and goods and stock. For instance, company equipment ranges from heavy machinery over technical devices to office items depending on the business sector. Conversely, the low RF predictor importance of the sector in the building loss models suggests that the damage processes for buildings are more alike over different company types. These findings are in line with the results of Sieg et al. (2017), where damage processes across sectors diverged more for equipment, and goods and stock.
We conclude that the fitted candidate models satisfy the criteria of plausibility because the predictor effects agree with previous findings and match the physical understanding of damage processes. The dissimilarity in the model fits for different assets justifies the development of distinct loss models for building, equipment, and goods and stock and highlights the benefit of multivariable loss modeling approaches. Overall, the candidate models consistently identify the same predictors as most relevant (water depth, precaution, and sector) and agree well within one asset. Minor discrepancies occur primarily for predictors with moderate to weak predictive power such as return period or flood experience. Figure 3 shows the results of the repeated cross-validation runs for the three performance metrics MAE, MBE, and mean CRPS. Each boxplot summarizes 100 repetitions of a tenfold cross validation for a specific asset (x axis), metric (plot panels), and model (color coded). Comparing MAE, we observe that all models achieve the lowest errors for building loss. The multivariable models (BN, BR, and RF) perform similarly and exhibit median MAE values of 0.149, 0.158, and 0.150, respectively. The probabilistic and deterministic SDFs reach slightly higher MAEs of 0.174 and 0.165 in the median. MAE scores for equipment and goods and stock deteriorate in comparison to building loss and are in the same range across the multivariable models with values of approximately 0.27. For the SDFs, however, MAEs increase stronger for goods and stock (SDF-P: 0.355 and SDF-D: 0.348) than for equipment (SDF-P: 0.329 and SDF-D: 0.316). Among the multivariable models, BR shows slightly higher MAEs than BN and RF.

Water Resources Research
The cross-validated mean CRPS shows almost the same relative ranking of the models. Medians of mean CRPS values for the multivariable models are approximately 0.10 for building and 0.16 for equipment and goods and stock. With mean CRPS values of 0.109 (BUI), 0.195 (EQU), and 0.200 (GNS), SDF-P is outperformed by the complex models, especially for equipment and goods and stock. RF reaches the best mean CRPS for all three assets, yet the difference to the other multivariable models is small. CRPS cannot be calculated for the deterministic SDF, as it requires probabilistic predictions.
The boxplots of the MBE reveal that all models neither underestimate nor overestimate relative loss considerably in the median.
As described in section 2.3, the CRPS generalizes the MAE, which facilitates the direct comparison of deterministic and probabilistic forecasts. The larger values of MAE in comparison to mean CRPS suggest a loss of information about the observed response, when the predictive distribution is condensed to a single value, namely, the mean. Moreover, in case of the probabilistic models, the MAE produces biased estimates of model skill, as the mean of the predictive distribution commonly deviates from the most probable loss. This could also explain why the deterministic SDF outperforms the probabilistic SDF when considering the MAE. Yet computing the MAE on basis of the mode is likewise biased in this application since the predictive distributions are often bimodal (see Figure 4). We reason that scoring rules that evaluate entire predictive distributions rather than response means or modes, are more robust estimates of true model skill; at least for response distributions other than the normal. Figure 4 compares the predictive distributions for building loss of the candidate and benchmark models for nine randomly selected companies. The predictive distributions are color coded according to the models, and the actually observed relative loss is indicated by a vertical, black line. The yellow line shows the predicted loss of the deterministic stage-damage function (SDF-D). The predictive distributions of BN, BR, and RF are flexible and vary considerably from company to company. Conversely, the SDF-P distributions fluctuate less and their medians rarely exceed 0.10. The deterministic predictions of SDF-D vary the least across individual companies, as the model lacks a component that explicitly accounts for extreme losses at 0 and 1. In contrast to SDF-P, which predicts constant shares of 0 and 1, the multivariable models inflate and deflate the modes of their predictive densities at 0 and 1 dynamically, reflecting the actual observations of relative loss (see, e.g.,

Water Resources Research
IDs 165 and 136 in Figure 4). The invariant shape of the SDF-P predictive densities leads to overall higher errors of their predictions (Figure 3). In general, the prediction accuracy and sharpness is larger for companies with low loss magnitudes as compared to companies with more severe losses. Figure 5 confirms that the differences in the example predictive distributions between the multivariable models and the SDF-P also apply to the entire data set. Every point represents the CRPS error of the predictive distribution for one company, while the stepwise, black line indicates the mean CRPS in the corresponding interval of observed relative loss. The scatter plots show that the CRPS of the probabilistic predictions changes over different magnitudes of observed relative loss. The variation in the CRPS is stronger for the multivariable models than for the SDF-P, for which errors disperse less around the interval mean.
The steady predictive distributions of the SDF-P, and hence its errors, do not change significantly across observations. While this generalizing behavior of the SDF-P is favorable in principle, its mean CRPS values exceed the ones of the multivariable models. In addition, prediction errors tend to increase with larger values of relative loss. We encounter this trend for all models and assets, and it is more pronounced for building loss than for losses to equipment, and goods and stock. The striking difference in the scatter point clouds between Figure 4. Examples of predictive densities from the four probabilistic models (color coded) for the building loss of nine randomly selected companies (identified by ID). Black and yellow lines display the observed loss and the predicted loss of the deterministic stage-damage function, respectively. The panels are sorted according to the observed loss, which is indicated by the black, vertical lines. Colored lines beneath the distributions indicate the quartiles of the respective predictive density. The scaling of the four y axes within each panel is consistent, ensuring the comparability of the predictive densities. The displayed densities originate from a cross-validation run.

10.1029/2020WR027649
Water Resources Research buildings on the one side and equipment, and goods and stock on the other side, traces back to stronger bimodality for the observed losses to equipment, and goods and stock (see rloss distributions in Figure 1).

Model Comparison 3.3.1. Multivariable Models and Stage-Damage Functions
The cross-validated performance metrics in Figure 3 show that BN, BR, and RF are superior to the deterministic and probabilistic SDFs with respect to predictive capacity for all three study assets. In general, the prediction accuracy is higher for buildings than for equipment and goods and stock. We identify two reasons for the difference in prediction skill. First, Figure 6 shows that the relationship between water depth and relative loss is volatile for all assets and only insufficiently discriminates between severe and minor relative loss. BN, BR, and RF have access to information contained in predictors other than water depth, which fosters a more accurate determination of the loss magnitude. SDFs base their predictions solely on water depth and, thus, fail to explain irregular loss cases, for instance, when low water depth causes high relative loss. Second, the multivariable models exhibit higher structural complexity than the SDFs, which allows for closer fits to the training data. For instance, SDF-P and BR both model relative loss with an inflated beta distribution. However, while SDF-P assumes constant inflation and precision parameters, BR predicts these parameters for each company individually. The increased number of parameters leads to higher flexibility in the predictive densities for the BR model. The capability to deflate and inflate the modes at 0 and 1 (see Figure 4) Figure 5. Scatter plots of observed relative loss versus cross-validated continuous ranked probability scores (CRPS) for all combinations of assets (rows; symbol coded) and probabilistic models (columns, color coded). Each symbol represents the prediction error incurred by the respective model for one company. The black, stepwise lines show the average CRPS in different intervals of observed relative loss. The labels in the top left corner of each panel contain the mean CRPS over all predictions (=symbols) of the respective asset-model combination.

10.1029/2020WR027649
Water Resources Research enables the complex models to capture both extremes of relative loss at the same time, while the SDFs have to find a balance between covering no and total loss cases. The boxplots in Figure 3 show that the larger shares of 0 and 1 in the data for equipment and goods and stock lead to larger performance difference between the models with complex (BN, BR, and RF) and simple structure (SDFs).
However, the flexibility in the predictive distributions of the multivariable models propagates to the CRPS, resulting in considerable variance in the errors for individual companies ( Figure 5). This observation reflects the bias-variance trade-off, a typical phenomenon in predictive statistical modeling (see, e.g., James et al., 2013). It describes that complex, multivariable models, such as BN, BR, and RF, incur lower bias than models with fewer parameters, such as the SDFs, at the cost of larger variance in their predictions and errors. While overly flexible models are at risk of undesirably capturing random noise in the data (i.e., overfitting), inflexible models might be unable to reproduce essential features of the data generating process (i.e., underfitting). The required degree of model complexity depends on the data and the question under consideration. We assume that the heterogeneity of companies and damage processes demands for a fair amount of model complexity. Given the results of the validation experiment, we conclude that it is the combination of multivariable and probabilistic modeling, which causes the candidate models to outperform the benchmark models, albeit the large variation in CRPS error. Schröter et al. (2014), who developed and validated multivariable probabilistic models for the residential sector, also observed that complex models perform better than simpler modeling approaches. The ability of the proposed models to account for variable interactions and to capture complicated data-generating processes (i.e., zero-one-inflation) might even be more useful for modeling company loss data, where heterogeneity across company types leads to particularly noisy relationships between predictors and loss.
Further, the notable difference in the values of the mean CRPS and the MAE within the same models in Figure 3 shows that the predictions of the probabilistic models are more informative than the loss forecasts of the deterministic SDF. This gain in information can be employed for practical applications in risk analysis or decision making, where estimates of prediction reliability provide additional decision support.

Intercomparison of Multivariable Models
The intercomparison of the different multivariable models does not reveal clear performance differences. BN and RF outperform BR slightly. Yet the magnitude of the performance differences is small. The high agreement on the aggregated and company-level performance metrics of BN, BR, and RF implies that the predictive capacity of the multivariable approaches is rather constrained by the information content in the training data than by model-specific characteristics. It remains an open question, whether limited knowledge about flood damage processes hinders the composition of more meaningful predictor sets, or whether the inherent variation in the flood damage processes restricts the forecasting capacity of existing models at a certain threshold. Either way, the model choice should be guided by the study task and data availability. BNs allow for intuitive inference on the flood damage processes through the graphical dependency structure and have advantages in the treatment of missing data. The strength of BR lies in the flexibility of the Bayesian framework, where multilevel modeling and the definition of strong priors facilitate predictions even with few loss data. RF provides accurate predictions with relatively small implementation effort and is tolerant with respect to differently scaled model variables. However, modelers have less influence on the internal model structure, and the interpretation of the RF functionality is difficult.
BN, BR, and RF outperform other multivariable company loss models, which have been validated on subsets of the same survey data. For instance, Seifert et al. (2010) reported MAE values of 0.23 (BUI), 0.30 (EQU), and 0.30 (GNS) for their FLEMOcs+ model. The RF model of Sieg et al. (2017) achieved MAE values of 0.18 (BUI), 0.31 (EQU), and 0.37 (GNS). We assume that the performance advantages of the presented models are a joint result of different model configurations, changes in the predictor variables, and a larger data basis in this study.
Although the multivariable probabilistic models improve the accuracy and sharpness of the loss estimations over the entire data set, they incur considerable CRPS errors for severe losses. This is problematic, since large relative loss cases can have a strong effect on the total estimates of postevent loss in a flooded area. The poor performance for severe losses arises from the imbalance between frequent but small and infrequent but major damages, which is common in natural disaster loss data sets (Pisarenko & Rodkin, 2010). The number of high losses provides too few training samples for the algorithms to reliably identify whether a company experiences severe flood loss or not. The problem of undersampled extremes might eventually resolve when observational periods become long enough to contain a sufficient number of severe losses. Yet in the analysis of natural hazards the required time horizons quickly exceed decades (Zöller, 2013). Here, the enrichment of loss data sets with severe loss cases from other regions, as practiced in the modeling of extreme earthquakes (i.e., method of analogs; see, e.g., Holschneider et al., 2011or Wheeler, 2009, could represent an alternative. Additionally, if available in the data, further refinements of the predictor set could improve the predictive power of the models, for example, by including variables that describe the structural characteristic of buildings more accurately (see, e.g., Hasanzadeh Nafari, Ngo, & Lehman, 2016;Scawthorn et al., 2006). In general, the predictive distributions of the multivariable models are relatively wide, especially for companies that experienced large relative loss and for the assets equipment, and goods and stock. Hence, further analysis of the distinct uncertainty sources and the potential to reduce their contribution to the overall variance in the loss estimates could improve the reliability of the proposed models.

Conclusions
This study presents three multivariable flood loss models for companies, which return probabilistic loss predictions. Referring to the objectives of this study: 1. Bayesian networks, Bayesian regression, and random forest outperform the deterministic and probabilistic stage-damage functions due to additional information contained in predictors other than water depth and larger flexibility in their predictive densities. 2. The predictive capabilities across the multivariable models are very similar and constrained by the explanatory power of the predictor set rather than by model choice.
Although the cross-validated performance metrics for the multivariable models confirm higher predictive skill in comparison to existing company flood loss models, our analysis identified substantial uncertainty in the predictive distributions and deteriorating predictive power for large losses.
Since we have to accept the inherent complexity of flood damage processes and poor coverage of severe losses in the data, we advocate the proper treatment of the resulting uncertainties. Probabilistic modeling explicitly quantifies the associated uncertainties and, hence, provides more honest loss estimates than 10.1029/2020WR027649

Water Resources Research
deterministic approaches. Moreover, the additional uncertainty information could directly contribute to flood risk management practice, for instance, by providing the probabilistic foundation for an informed decision making, where the attractiveness of a certain flood protection measure not only depends on the expected reduction in damage but also on the confidence in the predicted damage reduction, or by facilitating the seamless propagation of predictive uncertainty across different exposure aggregation levels. Therefore, in our opinion, probabilistic models should become the standard approach in flood loss estimation. Further, this study underlines that the demand for probabilistic loss estimation is particularly strong for companies, given the large variation of loss influencing variables across individual companies and their exposed assets. In conclusion, the combination of multivariable and probabilistic modeling advances the representation of company vulnerability in flood risk assessment through improved loss estimations and transparent communication of their reliability.

Data Availability Statement
The survey data are available at the German flood damage database HOWAS21 (http://howas21.gfz-potsdam.de/howas21/).