Bayesian Quantification and Reduction of Uncertainties in 3D Geomechanical‐Numerical Models

The distance to failure of the upper crustal rock in the prevalent stress field is of importance to better understand fault reactivation by natural and induced processes as well as to plan and manage georeservoirs. In particular, the contemporary stress state is one of the key ingredients for this assessment. To provide a continuous description of the 3D absolute stress state geomechanical‐numerical models are used. However, stress magnitude data for model calibration are sparse and incomplete and thus, the resulting model uncertainties are large. In order to reduce the uncertainties, we incorporate additional constraints on stress magnitudes to check the plausibility of different data‐based stress states. We use formation integrity tests, borehole breakouts, drilling induced fractures, and observations of seismicity and distinct seismological quiescence. This information is weighted according to its confidence and the agreement with the different modeled stress states is assessed. The information is introduced to a Bayesian approach to estimate weights of the modeled stress states and thereby identify their plausibility. A case study in southern Germany shows the ability of the approach to identify from a wide range of stress states a small number of plausible ones and reject implausible stress states. This significantly reduces the number of stress states and thus lowers the model uncertainties.

For example, in the gas field of Groningen in The Netherlands, induced seismicity increased significantly in the past years even though exploitation started decades earlier (van Wees et al., 2014). This reflects that induced seismicity and other unwanted failure processes depend also on the initial conditions of the reservoir. A key parameter regarding stability aspects is the undisturbed stress state before any anthropogenic interventions as it defines the distance to failure and quantifies how much stress changes a reservoir can sustain without reaching a critical stress state Schoenball et al., 2018). The distance to failure is often expressed by means of the fracture potential of intact rock (Connolly & Cosgrove, 1999) or slip tendency (ST) for the reactivation of pre-existing faults (Morris et al., 1996). The further the undisturbed stress state is away from failure the more stress changes it takes before a critical stress state is reached (Figure 1).
To quantify the 3D undisturbed stress state and its spatial variability 3D geomechanical-numerical modeling is used (Fischer & Henk, 2013;Lecampion & Lei, 2010;Roche & van der Baan, 2017;Ziegler et al., 2016). These models are based on static 3D geological models that provide the spatial distribution of rock properties, faults, and pore pressure. The stress state is modeled as a superposition of gravitational volume forces and surface forces due to plate tectonics that are imposed by means of lateral displacement boundary conditions. These boundary conditions are either determined using model-independent stress magnitude data in an inversion scheme (Lecampion & Lei, 2010) or data-driven approaches to determine a best-fit model (Hergert et al., 2015;Reiter & Heidbach, 2014). This search for best-fit boundary conditions is referred to as model calibration using stress data records. Ziegler and Heidbach (2020) showed that the stress data records that are used for the model calibration result in high uncertainties of the 3D stress tensor (Figure 1c). Their 3D model of the Bavarian Molasse Basin in the Northern Alpine Foreland addresses the geomechanics of the induced seismicity at some of the abundant geothermal projects using hot Jurassic water for district heating (Megies & Wassermann, 2014). In the model, the 1σ standard deviation is 5-10 MPa for the magnitude of the minimum horizontal stress S hmin and 25-45 MPa for the magnitude of the maximum horizontal stress S Hmax . In the model approach of Ziegler and Heidbach (2020), it is assumed that the stress magnitude data that are used for the model calibration are free of errors, which is not the case. The two key sources for the resulting high standard deviation are the stress magnitude data records used for the model calibration. They are subject to individual and potentially large measurement errors. Furthermore, some of these data records are not representative for a larger rock volume, but instead represent a local deviation that is not resolved with the achieved model resolution. Outliers are the key drivers for the model uncertainties. They are either due to a priori unknown measurement errors or data that are not representative for a larger rock volume, but only a local deviation.
Thus, the paper addresses two open issues. First, the introduction of an individual error for each calibration data record of the S hmin and S Hmax magnitude and second, an objective identification of data records that are not suitable for the model calibration. For the latter, we show that additional indirect stress information can be integrated to derive weights for the model calibration. This stress information comes from observations that provide upper or lower bounds of certain stress components. In particular, we use Formation Integrity Tests (FITs). In these, the lithological formation is subjected to a known pressure (White et al., 2002) and, provided the rock does not fail, the FIT pressure has to be smaller than the least principal stress. Thus, the FIT pressure provides a minimum lower boundary of all principal stress components. The second additional constraint results from seismological  Ziegler and Heidbach (2020). Dotted line shows the best-fit model result w.r.t. the calibration data and the light and dark shaded areas show the 1σ and 2σ standard deviation, respectively.
3 of 23 observation (Shen et al., 2019). If seismicity is observed, the stress state is at failure while no observed seismicity (in well instrumented areas) indicates a subcritical stress state. This observed criticality or stability of the stress state can be compared to the modeled criticality in combination with an assumed failure criterion (Figure 1). The agreement or disagreement of the model results with the observations provides a measure how reliable a modeled stress state is. The third additional constraint is the observation of borehole failures such as borehole breakouts (BOs) or drilling induced tensile fractures (DITFs). These failures occur if the maximum circumferential stress exceeds the rocks compressive strength (BOs) or if the rocks tensile strength exceeds the minimum circumferential stress (DITFs; Aadnoy, 1990;Amadei & Stephansson, 1997). In synthesis with an assumption on the rock strength, observations of BOs or DITFs thus provide indication on whether the modeled circumferential stress agrees with the observed borehole failures.
We show that this approach allows us a posteriori to identify the a priori unknown outliers of the stress data records that are either erroneous or not representative for a larger rock volume. Data-based weights for individual stress data records will be estimated which results in an improved significance of the modeled stress states. For such a task the Bayesian approach is common practice in other research fields (Gómez Zapata et al., 2022;Nawaz & Curtis, 2019). It allows a formal weighting, but to our knowledge, this has not been applied in forward geomechanical modeling of the 3D stress field.
To exemplify the approach, we use the 3D geomechanical-numerical model of the Bavarian Molasse Basin from Ziegler and Heidbach (2020) that has been calibrated against nine S hmin and four S Hmax stress magnitude data records. To lower the uncertainties of this model result we will use now in addition 53 FITs from geothermal boreholes and seismological observations from 20 geothermal projects. Furthermore, we also use 52 BO and five DITF data records from 57 wells taken from the World Stress Map database . This information is used as input for the described Bayesian workflow to estimate weights. We thereby show that we can improve the significance of the modeled absolute stress magnitudes significantly.

Modeling the Undisturbed Stress State
The contemporary undisturbed absolute stress field in the upper crust can be modeled through the superposition of body forces due to gravity and surface forces due to tectonic processes (Fischer & Henk, 2013;van Wees et al., 2003). The contribution of gravitational body forces is controlled by the 3D density distribution and tectonic processes by lateral displacement at the model boundary (Hergert et al., 2015;Reiter & Heidbach, 2014). We describe briefly the general approach of model calibration and estimation of its uncertainties.

Model Calibration
Mathematically, the stress state at a point is described with the Cauchy stress tensor: As it is common in geosciences, a compression-positive convention is used in contrast to engineering applications. Due to the stress tensors symmetry only six components (three normal components ij, = and three shear components ij, ≠ ) are independent from each other (Jaeger et al., 2007) and after transformation into a principal axis system the three perpendicular absolute principal stresses S 1 , S 2 , and S 3 remain which reduces the stress tensor to: By definition, S 1 is the largest principal stress and S 3 the smallest. In the upper crust, it is assumed that one of the principal stresses is the vertical stress S v . This assumption reduces the independent components to four: The magnitudes of S v and the two horizontal principal stresses S Hmax and S hmin and the orientation of S Hmax (Anderson, 1951).
To model the absolute 3D undisturbed stress state, a geological model that describes the distribution of the rock properties (Young's modulus, Poisson's ratio and density) is needed. The gravitational body forces are well constrained by the density distribution whereas surface forces due to tectonic processes are usually imposed with displacement boundary conditions (Hergert et al., 2015). Data of S Hmax orientation is provided in the World Stress Map database , but this information is only helpful for a rough estimation of the criticality of a faults orientation in the prevailing stress field. In the context of the modeling approach, the orientation of S Hmax helps to determine the orientation of the model boundaries and may be used as additional constraints to assess the quality of the stress state. For the calibration of the undisturbed stress state, however, stress magnitude data are essential. The task is to determine the displacement boundary conditions that satisfy the stress magnitude data records. This is the calibration procedure briefly described in the following or in more detail in Reiter and Heidbach (2014) or Ziegler and Heidbach (2021b).
Assuming linearity and a rectangular model where boundary conditions are applied perpendicular to the model boundaries, the displacement solution space is a line that fits the observed S hmin or S Hmax magnitude, respectively (Figure 2a). In a linear system, there is only one pair of displacement boundary conditions that fit a pair of S Hmax and S hmin magnitude data records ( Figure 2b). The more data record pairs of S hmin or S Hmax magnitudes are available for the model calibration, the more lines defining possible combinations of displacement boundary conditions are generated (Figure 2c). This requires the usage of an average set of displacement boundary conditions (white circle in Figure 2c) for the computation of the displacement boundary condition that is the best approximation to all stress magnitude data records, that is, a model which minimizes the difference between the model stress state and all stress magnitude data . This entire process is referred to as model calibration and is assisted by the tool FAST Calibration v2.0 (Ziegler & Heidbach, 2021).  Reiter and Heidbach (2014) and Ziegler and Heidbach (2020). Top row shows the usage of combination of displacements at the lateral model boundaries. Lower row shows the associated implementation in the model with stress magnitude data that are used for the model calibration. Red (S Hmax ) and blue (S hmin ) indicate the horizontal stress magnitudes. Associated possible boundary conditions that fit their respective value are indicated at the model boundaries. Solid white dot indicates the pair of boundary conditions that satisfies a pair of S Hmax and S hmin magnitude data. (a) For a single S hmin data record, the combinations of boundary conditions that satisfy the stress magnitude data record are a linear function. One possible set of boundary conditions is indicated here as an example. (b) A single S hmin and S Hmax data record can be modeled individually again by boundary conditions providing two linear functions. However, if both S Hmax and S hmin should be satisfied the intersection of these two lines is the only valid set of boundary conditions. (c) If several S Hmax and S hmin data records are available, the averaged intersection of all lines is used as the valid set of boundary conditions that minimizes the differences between modeled and observed stress state. (d) Adding uncertainties to the stress data records increases the uncertainties in the averaged intersection of all lines.

Definition of Model Scenarios
We use the term uncertainty following the common definition of for example, Begg (2014) in contrast to the term variability. Begg (2014) states that uncertainty refers to the likelihood of what the single, true value of the uncertain quality is and variability refers to the range of multiple instances of the quantity, derived from observed data. The model uncertainties are represented by the average of each of the six independent components of the 3D stress tensor and their standard deviations. Our approach to quantify model uncertainties is purely data driven. In contrast to this, Lecampion and Lei (2010) use data records of a defined reliability to invert for a stress state. To quantify the uncertainties due to the calibration of a model of the 3D undisturbed stress state we use the approach described in Ziegler and Heidbach (2020). We focus in this study on the epistemic uncertainties due to the stress magnitude data records and their uncertainties. Further epistemic model uncertainties that result for example, from rock property and its spatial variability (Ziegler, 2022), model geometry (Bjorge et al., 2022;Wellmann & Regenauer-Lieb, 2012), or the assumption of linear elasticity are not considered.
When using several data records to calibrate a model of the undisturbed stress state, the necessary averaging leads to a loss of information. We mitigate this by setting up a range of different model scenarios. The term model scenario here refers to the same geological model and distribution of rock properties but a different set of boundary conditions. Each model scenario uses only one S Hmax and one S hmin data record which can be perfectly fitted ( Figure 2b). The number of individual model scenarios is thus the number of all possible combinations of S Hmax and S hmin stress magnitude data pairs that can be used for model calibration.
The generated model scenarios provide a range of possible absolute stress states based on stress magnitude data.
To account also for pore pressure, we estimate the effective stresses from the model results by assumption of hydrostatic pore pressure if not specified differently in distinct maps of pore pressure deviations. Thus, for each model scenario, the effective stress can be estimated. Initially, no information on the plausibility of an individual model scenario is available. This usually leads to a significant variety of stress states which inhibits an interpretation. In order to reduce the variety, a weighting of the model scenarios is indicated. This is accomplished by using estimates on the quality of stress data records and additional constraints on the stress state in a Bayesian framework.

Bayesian Framework
Bayes theorem is a way to estimate the probability of an event-here a model scenario-with the help of prior knowledge that may be related to the used stress data records ( Figure 3). Translated to this geomechanical modeling approach the Bayes theorem allows to estimate the probability P(S|C) of model scenario S in relation with the additional constraint C and it reads as: with P(C|S) the probability of the additional constraint in the model scenario, that is, does an individual additional constraint agree with the stress state of an individual model scenario (Figure 3 green). P(S) is the initial probability of this model scenario based on the data records qualities and therefore regarding the inherent uncertainties (see above, Figure 3 red). P(C) is the probability of the additional constraint used to check the consistency of the modeled stress state (Figure 3 blue).
First, a probability P(S|C) based on each individual additional constraint is estimated for each model scenario.
Then, for each type of additional constraints a single probability that is based on all data from the group is computed for each model scenario. Finally, the probabilities of the types of constraints are combined to a single final weight for each model scenario. In this step, the impact of the individual constraints is modulated by using a weighted mean. The weights assigned to the different types of additional constraints take into consideration their possible uncertainties and the general degree of believe.

Introduction of Calibration Data Uncertainties
In order to regard the assumed or measured uncertainties in the stress data records, we use the data quality assessment based on a quality ranking scheme provided by Morawietz et al. (2020). Each stress data record is assigned a quality based on the type and circumstances of measurement as well as the quality of documentation and availability of metadata. We use these qualities to assign a probability density function (PDF) around the reported stress magnitude in the shape of a normal distribution to each data record. The better the quality is, the narrower is the distribution since it is assumed that the uncertainties in the estimation of the data record are smaller. The standard deviation σ, that is, the width of the distribution, is defined by the weight w of the data records following an inverse scheme that reads with the reported stress magnitude m and the scaling coefficient c. Herein, c is set to c = 30. For each original data record m four auxiliary stress magnitudes are extracted from the according distribution at m − 2σ, m − 1σ, m + 1σ, and m + 2σ. These are then used in the same way as an actually reported stress magnitude but are assigned a weight according to the given PDF. This consideration of the calibration data uncertainties results in a significant increase in model scenarios and an overall increase of the model uncertainties.

Additional Constraints
We integrate indirect stress information as additional constraints in order to identify less likely model scenarios and thereby limit the range of possible stress states. This indirect information cannot be used in the model calibration procedure that determines the best-fit displacement boundary conditions. But the occurrence or non-occurrence of stress-controlled failure of the rock in terms of seismic events, information from FITs and the observation of borehole wall failures provide upper and/or lower boundaries of the undisturbed stress state. This . Proposed Bayesian workflow to estimate the weight of a modeled stress state based on weighted stress data records and additional constraints. Red: stress data records are assigned to a distribution based on their uncertainties. From the distribution pairs of S Hmax and S hmin data records are selected to setup model scenarios. An according weight is estimated for each model scenario (P(S)). Blue: additional constraints on the stress state from within the model volume are compiled. If required, further data to relate the constraints to the stress state are added. For each constraint, an individual weight is estimated (P(C)). Green: model scenarios are evaluated for their agreement or disagreement with the additional constraints (P(C|S)). Yellow: Bayesian weight (P(S|C)) for a model scenario is estimated.
7 of 23 is used to assess the reasonability of each individual model scenario. We present in the following subsections the three different types of additional information that we use.

Formation Integrity Tests
An FIT is a test of the strength and integrity of a formation by subjecting it to a known pressure (White et al., 2002). If the rock does not fail, the FIT pressure has to be smaller than the S 3 magnitude which is the intention of the test. Thus, all successful FITs provide a minimum lower boundary of the S 3 magnitude.
In terms of the Bayesian weighting approach the stress state at the location of a known FIT is extracted from a model scenario. Then the observed FIT pressure and the modeled stress state are compared ( Figure 4a). If the FIT pressure is smaller than the modeled S 3 magnitude the individual model scenario agrees with this additional constraint, thus supporting this model scenario which is reflected in P(C FIT |S) = 1. On the contrary, if the FIT pressure is larger than the modeled S 3 magnitude, the individual model scenario disagrees with the additional constraint which is reflected in P(C FIT |S) = 0. In the latter case, if the modeled stress state would be the real stress state the FIT should have resulted in fracturing the rock and fluid loss into the formation. An FIT measurement itself is also subject to uncertainties. The probability of each FIT that serves as an additional limit P(C FIT ) can hence be individually defined.
For normal faulting (S v > S Hmax > S hmin ) or strike slip (S Hmax > S v > S hmin ) stress regimes the agreement or disagreement of the S hmin magnitude from an individual model scenario with respect to the FIT pressure provides information on the reliability of this model scenario. A disagreement indicates that the FIT data or/and the calibration data is not reliable. In a thrust faulting regime (S Hmax > S hmin > S v ), the information provided by the assessment of FIT pressures is limited to the magnitude of S v which is the least principal stress component S 3 and mainly a result of the overburden. In case of a disagreement this may indicate that the density of the overburden is not correctly chosen. Thus, in a thrust faulting setting an evaluation of the reliability of the S hmin calibration is not possible using FIT pressure data.

Seismicity-Based Constraints
The observation of seismicity indicates that the stress state has reached the fault strength. On the other hand, if no seismicity occurs, the undisturbed stress state does not reach critical values with respect to the fault strength. On a first order, the fault strength is described by a failure envelope defined by cohesion and friction angle (Figure 4b).
For a given fault orientation, the criticality of the stress state is often communicated with the normalized scalar value of the ST: where τ max and σ n are the maximum shear stress and normal effective stress acting on the fault, respectively, and μ is the static friction coefficient of fault material (Morris et al., 1996). In addition, the cohesion C can be In a location and depth where seismicity is observed the modeled stress state from a scenario has to touch the failure envelope in order to be reliable. For locations with explicitly no observed seismicity the requirements of the stress state and in this representation the color of the Mohr circles is changed.

of 23
considered which is assumed to be 0 for faults at failure (Ziegler et al., 2016a(Ziegler et al., , 2016b. ST values range between 0 and 1 with 1 being the critical stress at failure and 0 indicating that the differential stress is zero. The original estimation of ST does not include pore pressure (Morris et al., 1996) however, effective stresses are often used (Moeck et al., 2009;Vadacca et al., 2021).
In order to use the information of seismicity to assess the reliability of the modeled stress state, ST is computed for each model scenario at locations of observed seismicity and also explicitly observed seismic quiescence. If the fault orientation of an observed event is known (either from earthquake focal mechanism or inferred from structural geological information) ST can be computed for these distinct orientations. If the fault orientation is unknown, ST is computed for faults that are optimally oriented in the stress field of each model scenario and thus are the fault orientations that have the highest ST.
Then, it is evaluated whether the computed ST value from a model scenario agrees with the observations of seismicity. If ST ≥ 1 in an area where seismicity has been recorded or ST < 1 in an area of explicit seismic quiescence, the model scenario agrees with the constraints derived from seismological observations. This is reflected in an assignment of P(C Seismicity |S) = 1. If ST < 1 in an area where seismicity has been recorded or ST ≥ 1 in an area of explicit seismic quiescence, the model scenario does not agree with the constraints derived from seismological observations (P(C Seismicity |S) = 0).
In case of seismicity-based constraints that use induced seismicity, this approach is not necessarily justified. The modeled stress tensor used for estimation of ST provides an undisturbed stress state. The observed induced seismicity, however, was caused by a stress tensor that was disturbed by human activity such as fluid injection and production that alter the pore pressure an therefore the effective stress. In order to obtain this disturbed stress state, a modeling of the process that lead to the inducing of seismicity can be performed (Blöcher et al., 2010;Jeanne et al., 2014;Rutqvist et al., 2015;Ziegler et al., 2015). For a few important individual events, such a modeling approach can be a viable solution. If multiple induced seismic events are considered, the effort to generate individual models and the required computation time is likely not feasible. Instead, an analytical solution can provide a rough estimation of the changes between the initial undisturbed stress state and the disturbed stress state which lead to the induced seismicity (Altmann et al., 2014(Altmann et al., , 2010Müller et al., 2018;Rudnicki, 1986). Consequently, the change in ST due to fluid injection or extraction operations is estimated as explained in the following.
The ST is available for an initial undisturbed stress state (ST i ) from the in situ stress model scenario and for an altered disturbed stress state (ST d ) from the analytical estimation of stress changes if needed. The estimation of P(C Seismicity |S) is thus altered. In case of observed induced seismicity, the desired model results are ST i < 1 and ST d > 1 which leads to an assignment of P(C Seismicity |S) = 1. In contrast, observation of ST i > 1 and ST d < 1 shows that the man-made processes actually stabilized the stress field which means induced seismicity became less likely and leads to an assignment of P(C Seismicity |S) = 0. The two additional combinations (ST i < 1 and ST d < 1, ST i > 1 and ST d > 1) are per se wrong. However, considering the associated uncertainties these cases can be considered as well as long as the stress state moves closer to failure, that is, ST d -ST i > 0. P(C Seismicity |S) is then assigned the inverse value of the number of times ST d -ST i has to be added or subtracted, respectively, before the desired outcome ST i < 1 and ST d > 1 is achieved.
The observation of seismicity requires a seismological network whose density determines the magnitude of completeness that may vary locally. In particular detection of small-magnitude events (M w < 2.5) is challenging if the network is not purpose-built. Furthermore, the accuracy of the velocity model is variable. Thus, the location, depth, and focal mechanism solution are subject to uncertainties. If various events in various locations are regarded, the uncertainties of the information are likely variable. A weighting of the individual events is indicated. Factors such as the magnitude of completeness at the location, the azimuthal coverage, event magnitude, and number and density of stations should be considered.

Constraints From Borehole Failure Data
Compressive and tensile failures of the borehole wall are well known and used as an indicator for the orientation of the horizontal stress components (Aadnoy & Bell, 1998;Bell & Gough, 1979;Schmitt et al., 2012). Their occurrence is linked to the compressive or tensile strength of the rock, respectively, and the stress state at the borehole wall. If such borehole failures are observed in an image or caliper well-log and data on the rock strength is known, information on the in situ stress state can be derived ( Figure 6).
BOs occur if the maximum effective circumferential stress is larger than the uniaxial compressive strength of the rock C 0 expressed as with the mud pressure P w and the pore pressure P p . Usage of the uniaxial compressive strength is a simplification over more sophisticated strength data (Valley & Evans, 2019) that may be required due to limited data availability.
This circumferential stress in a plane normal to the vertical borehole orientation is estimated from the full stress tensor by using the Kirsch equation (Kirsch, 1898) as shown by for example, Scheidegger (1962). In case of a vertical borehole, the maximum circumferential stress immediately at the borehole wall can be estimated from the effective stresses S hmin and S Hmax (Figure 5a, Hillis & Reynolds, 2000) by the following equation: Analogously, DITFs occur if the minimum circumferential stress around the borehole wall is smaller than the tensile strength of the rock T 0 expressed as: In case of a vertical borehole, the minimum circumferential stress immediately at the borehole wall can be estimated from the effective stresses S hmin and S Hmax by as shown in Figure 5b (Aadnoy & Bell, 1998).
If a BO is observed and the model scenario stress state indicates that the at the location of the breakout is larger than C 0 the model agrees with the observation and P(C BO |S) = 1. On the contrary, if breakouts are observed but the model indicates a that is smaller than C 0 the model does not agree with the observations (P(C BO |S) = 0). This works analogously for DITFs and T 0 . If at the location of the DITF is smaller than T 0 the model scenario agrees with the observation (P(C DITF |S) = 1). However, if is larger than T 0 at the location of an observed DITF the model scenarios stress state disagrees with the observations (P(C DITF |S) = 0).
To assign a confidence factor for each additional constraint based on borehole failure P(C) we cannot use the quality ranking of the World Stress Map project . This is designed to assess the reliability of the mean S Hmax orientation along the entire borehole length and not directly related to the stress magnitudes that caused the borehole failure in individual sections of the borehole. Thus, the WSM weighting scheme is adapted in a way that only the length of the observed failures is considered. All breakouts with a total length of ≥300 m observed with a caliper log are assigned P(C) = 1 while less total length of the breakout zones is weighted based on their length. DITFs and BOs with a total length of ≥100 m that are observed in an image or televiewer log are assigned P(C) = 1 while <100 m of total length of the failure zones are weighted based on their lengths. The cap on the length-based weighting is introduced to prevent over-representation of extremely long failure zones.  (Hillis & Reynolds, 2000). Red line shows a stress situation that is subcritical as neither the compressional (a) nor the tensile strength (b) are exceeded. Green line results from a stress state that is generating borehole breakouts (BOs) and drilling induced tensile fractures (DITFs), respectively.

Model Setup
To test the presented approach, we use the regional-scale 3D geomechanical-numerical model from Ziegler and Heidbach (2020). It is located in the Bavarian part of the Northern Alpine Molasse Basin ( Figure 6) and based on a geologic model by Przybycin et al. (2015Przybycin et al. ( , 2017. Below the sedimentary Molasse units an Upper Jurassic karst lithology holds geothermal water that is locally extensively used for energy supply mainly in terms of district heating and a few electric power plants (Agemar et al., 2012). The 13 implemented lithological units with according rock properties (Table S1 in Supporting Information S1) dip south toward the Alps reaching a maximum thickness of the sediments of approx. 5,000 m. The geometry is discretized with approximately three million hexahedrons to solve the problem numerically with the finite element method. The vertical and lateral resolution is 18 and 1,500 m, respectively. The high vertical resolution is to ensure a good numerical representation of the individual lithologies that requires at least three vertical elements.
For the model calibration, nine S hmin and four S Hmax magnitudes are available (Table B1, Figure S1 in Supporting Information S1). The data records are assigned a quality between A (high confidence) and E (data not suitable for calibration) using the quality assignment of Morawietz et al. (2020). The encountered data records have qualities C and D. According to data records qualities in the World Stress Map are commonly assigned weights of 0.6 and 0.375, respectively . To allow for continuity, the same weights are used herein, even though applied to stress magnitude data. According to the previously described approach, the expected uncertainties in the stress magnitude data records is estimated by construction of a probability distribution around the reported stress magnitude. This is done to represent possible uncertainties in the reported stress magnitude data records resulting in five individual values for each data record. Each possible combination of S Hmax and S hmin magnitudes results in (4 S Hmax data records × 5) × (9 S hmin data records × 5) = 900 combinations. For each of these pairs the best-fit displacement is solved in 900 individual model scenarios for the absolute stress state. Furthermore, each model scenario is assigned a weight P(S) based on the probability of the utilized data records and the position in the distribution.

Introduction of Additional Limits
To evaluate the predictive value of each model scenario indirect stress observations are used (Figure 3 blue). In the Bavarian Molasse, we have access to 53 FIT pressure recordings from 24 boreholes at 10 geothermal sites ( Figure 6). The disclosed 20 FIT data records from three sites are presented in Table B2 and Figure S1 in Supporting Information S1 while the other 33 FITs from seven additional sites are confidential, but were used in the same way as the disclosed ones. Since no additional information on the FIT recordings are available, they are weighted equally with P(C) = 1.
However, the evaluation of P(C|S) (Figure 3 green) has to consider the lithological unit since the stress state can be highly variable even in neighboring units (Hergert et al., 2015;Warpinski & Teufel, 1991). Due to this effect, a mismatch between the model lithology and in situ lithology may lead to a wrongly assumed stress state which in turn impedes interpretation of the data. If the lithology in the model agrees with the data from the FIT recording, P(C|S) = 1 for an agreement of the model with the observation and P(C|S) = 0 for a disagreement. On the other hand, if the lithology in the model deviates from the lithology according to the FIT data, P(C|S) = 0.5 for an agreement of the model with the observation and P(C|S) = 0.25 for a disagreement. The latter is to ensure the possibility that either the model geometry is wrong at the location of the FIT or that the lithology in the FIT report is erroneous. Analogously, if no data on the lithology is available from the FIT report, P(C|S) = 0.75 for an agreement and P(C|S) = 0.25 for a disagreement.
Seismological observations from regional and local networks are used to assess the agreement with the criticality of a model scenario. While no natural seismicity has been recorded in the model area, since the onset of geothermal exploitation induced seismic events of up to M l 2.4 have been recorded in close association with some geothermal projects (Grünthal, 2014;Megies & Wassermann, 2014. Thus herein, we use recorded induced seismic events in the immediate vicinity of eight geothermal projects (Table S2 in Supporting Information S1, red in Figure 6). As of 2022 the other 12 geothermal projects, however, remained seismologically quiet (Table S3 in Supporting Information S1, green in Figure 6) and are introduced as such to our Bayesian tree. For both types of site, we subtract a hydrostatic pore pressure from the normal components of the modeled stress tensor. Where indicated, we correct for a natural local overpressure (Drews et al., 2018(Drews et al., , 2020. Then we analytically estimate the stress changes due to operations and use the accordingly changed stress state for estimation of ST (Section 3.3.2).
Seismic events are known to lead to stress changes in their vicinity (Hardebeck et al., 1998;King et al., 1994). However, given that the moment magnitude of induced events is generally low (Blöcher et al., 2018;Grünthal, 2014) the resulting stress changes are small in comparison to the uncertainties considered in Figure 1c. Thus at least for this study, the effect of static stress changes due to induced events can be neglected.
The localization and estimation of magnitude for seismic events is highly dependent on the quality and azimuthal coverage of the seismological network. Therefore, P(C Seismicity ) is used to provide an individual weighting to each recorded event. The larger the magnitude and the denser and more local the network, the higher is the expected quality (Table 1). Thus, a small seismic event which has been recorded by a global network only is assumed to have a larger uncertainty, for example, in terms of localization. Thus, it is assigned a smaller weight compared to a large event that has been recorded by a local network. The presented weights are a first order approach that demonstrates the applicability of the weighting and could be refined in future studies.
Explicitly observed seismological quiescence is prone to misinterpretation due to not detectable seismic events with magnitudes that are below the resolution of the installed observation network. This would result in an erroneous assessment of a model's stress state. Therefore, a strict weighting has to be applied. Here, the general rule applies, that quiescence in an area covered by a local network is assigned P(C No Seismicity ) = 1, for a regional network P(C No seismicity ) = 0.5, and for a global network P(C No seismicity ) = 0.25.
We use 54 BO and five DITF data records ( Figure 6) taken from the World Stress Map database release 2016 (Heidbach et al., 2016). This borehole failure information is used to identify model scenarios in which the stress state agrees with the observations. P(C BO/DITF ) is derived according to the previously mentioned scheme. We also account for the effect of the pore pressure, a hydrostatic pore pressure is added to the normal components of the stress tensor. Where necessary, an overpressure according to Drews et al. (2018Drews et al. ( , 2020 is realized. The estimation of P(C BO/DITF |S), however, requires to regard the large uncertainties in the compressive or tensile strength of the   (Tables S4 and S5 in Supporting Information S1, Bär et al., 2020;Bohnsack et al., 2021;Ortiz Rojas et al., 2017;Potten, 2020;Potten et al., 2022). The agreement rate of the range of rock strength values to the stress state is reflected in P(C BO/DITF |S) accordingly.

Bayesian Quantification of Model Uncertainties
The first step to estimate a weight P(S|C) for the 900 model scenario is based on each individual additional constraint, for example, based on a single FIT pressure recording (Figure 3 green, Figure 7). This may be helpful information for a local application, however, the aim of this study is an integrated weighting of model scenarios.
In a second step, the median of the weights of each type of additional constraints is computed for each model scenario. Now, each model scenario is weighted based on all BOs, all DITFs, all FIT pressure recordings, all observed seismicity, and all explicitly seismically quiet locations.
Eventually, these weights are merged in order to estimate a final weight for each model scenario. There are significant differences between the constraints concerning their reliability. These differences origin in associated uncertainties due to measurements methods or documentation, required assumptions or additional variables, or leverage in interpretation. While an FIT pressure is quite robust being only a pressure value readout, borehole observations not only require an interpretation of caliper or image logs but also the knowledge of the rock compressional and tensile strength which is subject to large uncertainties. Seismological observations and even more explicitly quiescence require a dense seismological network with a good azimuthal coverage and a significant magnitude of completeness, as well as a velocity model of the subsurface.
This illustrates the need to define a confidence in each type of constraint in addition to a potentially previously defined confidence in the individual data record (Figure 7). The final weight is the weighted median of the groups of constraints. FITs with the least uncertainties are assigned a weight of 1. Observed seismicity is recorded by a regional and partly local network and has a relatively high confidence and as such a weight of 0.75. Borehole failure related constraints are heavily influenced by the uncertain and sparse data on rock strength and are thus assigned a weight of 0.5. Non-observed seismicity is based on the same network as seismicity but is also subject to municipal noise and the possibility of a non-detection of small events. Thus, a weight of 0.25 is assigned.
The final weight for each of the 900 model scenarios are shown scaled to 1 in Figure 8.

Results and Discussion of the Case Study
In order to display the results, the stress state is extracted from all model scenarios at the reservoir depth of 2,625 m TVD at a geothermal project that provides district heating to Aschheim/Feldkirchen/Kirchheim (AFK), an eastern suburb of Munich (star in Figure 6). The according pore pressure changes as a result of 12 yr of production and reinjection of 75 l/s have been accounted for in the analysis. Therefore, the pore pressure diffusion was estimated analytically (Altmann et al., 2010(Altmann et al., , 2014. As of 2022, the location has not been subject to recorded (induced) seismicity. Each model scenario stress state is displayed in a Mohr-Coulomb diagram with the Mohr circles transparency and linewidth based on the weight for each of the 900 model scenarios (Figure 9). The weights based on the individual additional constraints are displayed (Figure 9) as well as the combined final weight (Figure 10a). The resulting Mohr circles and their weights indicate that an intermediate magnitude differential stress state is preferred even though a stress state with a very small differential stress seems also possible.
The example AFK shows that the prediction of the stress state in the model generally agrees well with the observation of an uncritical stress state at this particular location (Figure 10a). In contrast, the neighboring geothermal  site Poing where seismicity of up to M w 2.1 was recorded  is also for many model scenarios critical (Figure 10b). This is a valuable information for future operations and induced seismic hazard assessments in the area since the model also provides information on the stress state at other possible sites. The assignment of weights to the model scenarios allows to estimate the probability of occurrence for a given stress state.
There is generally a good agreement between the weighting based on the different additional constraints (Figure 9). The only exception is the non-observation of seismicity which behaves as expected in preferring model scenarios with small differential stresses that are far away from the failure envelope. This may also be a result of the possible inconsistencies associated with non-observations since for the presented study no distinct seismological networks were set up. This is reflected in the low overall impact of no seismicity on the final weight ( Figure 7). Furthermore, it is noteworthy that the DITFs are preferring large differential stress scenarios over the model scenarios with a small differential stress which is supported in particular by FITs and seismicity-based weights. We assume that the deviating behavior of the DITFs is due to their low occurrence in the model volume.
In the presented case study, only in five boreholes DITFs were interpreted which is a small number compared to the BOs (54) and FITs (55). This shows that it is not necessarily a systematic deviating behavior but could be a site-effect. Another source of the deviation is that we do not consider additional factors that control their creation which is the dependency on the drilling procedure and variables such as the mud pressure, the rotation of the drill-bit, and the borehole width. However, these effects are small in magnitude and would not have a significant effect and thus we think that the distribution and lower number of DITFs is not sufficient to constrain the model scenarios. In summary, there is generally no major contradiction between the constraints which shows their applicability.
With the exception of the no-seismicity based weighting (Figure 9e) and to a lesser extend the DITF based weighting (Figure 9b), the bi-modal distribution of model scenario weights (Figure 8) is also significantly reflected in the stress states. Roughly two prevailing possibilities are indicated. Either a high differential stress of approx. 100 MPa or a smaller differential stress of approx. 40 MPa are predicted. This bi-modal characteristic of the stress states is not solely due to the additional constraints. It is rather a result of the stress magnitude data records used for calibration. They provide the basic stress states which are then evaluated with respect to the additional constraints. In particular, the S Hmax magnitudes can be separated into two clusters-with a large or small S Hmax magnitude (Table B1). The model scenarios based on these data records are evaluated and accordingly weighted.
The resulting weights of model scenarios with an intermediate S Hmax magnitude (which is not supported by data records) cannot be predicted but should be the subject of future research.

Discussion
We present an approach to geomechanical-numerical modeling that uses Bayes theorem as a formalized way to estimate the uncertainties of the model results due to uncertainties in the stress magnitude data records that are used for model calibration. The importance of further uncertainties such as those in the geological model (Bjorge  , 2022;Wellmann & Regenauer-Lieb, 2012) which is in close relation to the rock properties systematic lateral distribution and stochastic heterogeneity (Ziegler, 2022) is indicated but not considered herein. The available stress magnitude data records are combined in pairs of one S Hmax and one S hmin data record each. Each pair is used to setup a model scenario that represents one possible stress state. BOs, DITFs, FITs, observed seismicity, and explicitly observed seismic quiescence are used as additional constraints on these model scenarios. This way, a data-driven weighting is able to single out reliable model scenarios that are favorable for interpretation. The applicability of the approach to identify realistic stress states and reject unrealistic ones is shown in a generic truth model (Appendix A). The approach was able to identify the correct boundary conditions even though the input data was subject to noise.

Input Data
Additional observations and data that cannot be used for model calibration still carry valuable information that can be used to separate reliable from less reliable model scenarios. Their integration to the approach increases the predictive value of the model results by the integration of constraints that can judge the modeled stress states.
However, the additional observations are subject to uncertainties themselves. For example, a ST value larger than 1 does not necessarily lead to recorded seismicity. The affected area may not be large enough to induce an event that can be recorded or an event at all. Alternatively, the stress relief can be accommodated in aseismic creep which does not emit a signal that is recognized as seismicity.
This illustrates that the reliability of the additional constraints themselves cannot always be guaranteed. Therefore, their impact on the weighting approach is moderated by the assignment of weights. These weights of the additional constraints are based on a "degree of believe" or expert elicitation which is possible source for subjectivity. Still, even with a minimum weighting of the additional data, the case study indicated that the usage of additional constraints indeed identifies distinguished model scenarios that are an exceptionally good representation of the stress state and at the same time discards unreliable ones.
Such a clear identification of favorable model scenarios is not necessarily always the case. It depends on the amount and quality of the additional constraints. This is in turn dependent on the specific setting and area. The available data is likely unevenly distributed throughout the model area. Furthermore, local characteristics may be variable, such as the seismological network density, and therefore the weighting needs to be flexibly designed. Accordingly, the desired impact of the different types of constraints within the Bayesian framework needs to be assigned specifically for each model. The possibility of disagreement or even contradictions between the additional constraints exists. At least a deviation of the weights based on no seismicity and DITFs compared to the other additional constraints is indicated in the case study (Figures 8 and 9). Therefore, we investigate if the various types of constraint are allowed to be used in conjunction. A correlation between the model scenarios weights based on the different constraints was computed for the case study ( Figure 11). Generally, there is a good agreement between the different constraints with correlation coefficients of >0.90. However, the slightly deviating behavior of the DITFs-based weights already observed in the model scenarios is also reflected in the correlation table. The slightly smaller correlation coefficient of >0.83 is still sufficiently high in order to allow an interpretation.
The deviating behavior of no-seismicity based weights to only allow modeled stress states with a small differential stress is also reflected in the correlation table (Figure 11). The most significantly reduced correlation coefficient is 0.33 with the DITFs. While the latter have a slight preference for large differential stress, no-seismicity based constraints do not allow such a stress state. Furthermore, the other additional constraints (FITs, BOs, seismic- ity) are in a better correlation with no-seismicity (0.51-0.69) even though clearly less correlated than the other constraints amongst themselves.
If such a clear disagreement is observed, it is required to scrutinize the affected data. Even more, if an anti-correlation should emerge. On the one hand this may identify errors or incorrect assumptions in the additional data that shows a deviating behavior. On the other hand, a deviation could indicate local anomalies which are not reflected sufficiently well in the model's lithology. Therefore, particular attention should be paid whether a clustering of deviating data is observed either laterally or in a particular lithology.

Bayesian Approach
The Bayesian approach assists in providing a technical framework for the weighting of model scenarios. It calls for a weighting or estimation of the probability of all the associated data, that is, the model scenarios and the additional constraints. The main benefit of using a Bayesian approach is the objectiveness and formal way to estimate the predictive value of model scenarios and the simplicity of introduction of large amounts of data. The general framework is fixed and as such provides objective results. The assignment of weights, however, is highly flexible as it needs to adapt to the characteristics of the model. Thus, the weighting is a major source for subjective decisions.
The model scenarios weighting P(S) is realized following the stress magnitude data quality ranking by Morawietz et al. (2020). This approach results in a rather objective estimate of the confidence in a data record. However, the stress magnitude quality ranking does not provide an estimate of the uncertainty that is associated with a certain quality. The translation of qualities to uncertainties needs to be estimated individually for data records or more in general for each modeling approach. It can be partly based on the absolute stress magnitude and therefore the depth of the data record. Furthermore, it needs to be ensured that the data records are representative for the model area in terms of the tectonic unit and its lithological origin.
The estimation of the weight of an additional data record P(C) allows a detailed assessment of the individual quality. Generally speaking, the more information is available, the more detailed the weighting can be. In a comprehensive approach, the individual data records should be evaluated and rated by according experts which then provide a founded estimate on the confidence. The quality of FIT pressure recordings and seismicity observations largely can be objectively assessed. In contrast, the interpretation of well-logs for BOs and DITFs requires proficiency and experience and thus involves a certain degree of subjectivity. Furthermore, additional data on the compressive or tensile strength are required. Due to challenges in data availability, an uniaxial compressive strength was assumed, which is a simplification (Valley & Evans, 2019).
In particular, challenging is the usage of non-observations that add significant value to the approach. Potentially a large amount of additional information can be compiled which help to further narrow down the possible model scenarios. Of particular value is the back-to-back usage of observations and non-observations that together provide lower and upper constraints to the stress state. Furthermore, if a lateral or lithology-dependent clustering of disagreements can be observed this can be an indicator for inaccuracies in the model. Future research should aim to use this to generate maps of reliability of model scenarios that identify areas and lithologies that are particularly well represented in the model. It is likely that such areas exist since the availability of data is often limited to certain areas and lithologies while others are almost completely disregarded. Such a reliability mapping would help to further increase the interpretative value of the resulting modelled stress states. At the same time, areas and lithologies that may need more data are identified.
However, the case study indicated that even a simple weighting provides additional legitimacy to the approach. If applied correctly and objectively weighted, the Bayesian approach contributes to a significant modelled stress state. Furthermore, it helps in identification of additional constraints that are not representative in that they disagree with most or all model scenarios. Such an observation is not necessarily an indication for an erroneous constraint but can also indicate areas or locations that are not well represented in the model. This kind of feedback can then be used to identify the importance for the stress state of lateral deviations in rock properties or other features that are not included in the model. This highlights the fact that the Bayesian approach also provides valuable information for all parts of the modeling workflow. A detailed analysis of the data records and weights serves as a control mechanism to consolidate the value of an entire model.

Conclusion
We present an approach to geomechanical modeling with quantified uncertainties that uses stress magnitude data to setup a range of possible model scenarios and additional constraints to evaluate the scenarios plausibility. We integrate the individual calibration data uncertainties and additional indirect stress information that provide upper and lower bounds of differential stress into a Bayesian framework. This results in individual weights of model scenarios and an identification of calibration data that area either erroneous or not suitable for the model calibration. Our Bayesian approach clearly manages in a formalized way the weights of the stress magnitudes and additional constraints and provides a final weight to the different model scenarios that describe the stress state. This allows an evidence-based interpretation of the model results in terms of its probability. Our case study shows that the presented approach positively identifies a small cluster of reliable model scenarios and discards less reliable ones through assignment of a very low weight. This results in a significant lower uncertainty of the modeled stress state.

Appendix A: Truth Model
The presented Bayesian approach aims to improve a stress model's significance by a weighting of model scenarios. In the weighting approach, a range of model scenarios is setup and various indirect stress information are used to assess whether a model scenario is realistic or not. The applicability of the approach to correctly identify the correct stress state is verified by the application of a so-called truth model. Therefore, a generic 5 × 5 × 5 km 3 model with three distinct units is setup. It is populated with realistic elastic rock properties (Table A1). The model is subjected to displacement boundary conditions (x: −5.87 m, y: 6.11 m) that result in an arbitrary, yet realistic stress state throughout the model. This stress state is defined as the "truth" and used as a reference to test the ability of the presented Bayesian approach to identify the correct stress state.
Two magnitudes of each S hmin and S Hmax are extracted from the reference truth model. Noise and errors are added to these data in order to mimic a realistic pool of available stress magnitude data records that can be used for calibration (Table A2). Furthermore, the stress magnitude data are assigned a confidence that is based on expert elicitation. The described approach is applied which assigns a distribution to each of the stress data records. From each distribution new magnitudes are extracted at +/−2σ, +/−1σ and the mean value (Table A2). The total number of stress scenarios is the number of possible permutations of derived S Hmax and S hmin stress data records. From each initial data record five individual data records are derived to account for measurement errors. From initially two S Hmax data records 2 × 5 = 10 data points are derived. The same is for initially two S hmin data records. Eventually 10 S Hmax and 10 S hmin data points are available. The possible number of combinations is 10 × 10 = 100 which is the number of stress scenarios that are solved.  Indirect additional data on the stress state is available in terms of FIT pressures, expected BOs and DITFs and on the stability of the lithological units are available (Table A3). Each of the additional information is assigned a confidence which for the borehole features is based on the proposed weighting schemes. Both BOs and DITFs feature a correct observation and an incorrect identification of natural fractures as a BO or DITF, respectively. The FITs are assumed to be very reliable and thus have a high confidence. The same is for the observation of seismicity and no seismicity where a high-quality broadband network is assumed to be in place for a long time already.
All the available information and weights are introduced in the Bayesian weighting approach. The final weights are shown in Figure A1 and are in good agreement with the true stress state, here identified by the boundary conditions that lead to the true stress state. The method is thus able to identify the optimal stress state even with noise added to the input data, which is expected to be the case for real data as well. Due to the generic nature of the truth model, the results in this test are exceptionally good. In the realistic case study with heterogeneities, anomalies, and additional sources of uncertainties the result of the Bayesian weighting approach cannot be assumed to be the single true stress state.
Furthermore, the truth model allows exploration of the individual contribution of the different additional data on the final result. This is indicated in Figure A2 where the final weight can be compared to the individual methods weights. In particular for the FIT based weights a significant deviation can be identified. For the rest of the methods, a general agreement with only slight deviations is observed. These findings indicate the necessity to apply more than one method in the weighting approach. Note. The true magnitudes are extracted from the truth model and noise is added where indicated by *. The distributions are created with a width according to the confidence. From these distributions data points for calibration are extracted at +/−2σ, +/−1σ and the mean value.

Table A2
The Stress Data Records Available for the Calibration of the Model

Appendix B: Data
The disclosed direct data for model calibration (Table B1) and indirect data (Table B2) (Backers et al., 2017) Note. The qualities are assigned according to Morawietz et al. (2020). The asterisk signifies that the magnitudes are assumed for a strike slip regime. The corresponding magnitudes for a transtensional stress regime are documented in the according to references. The locations of the data records are displayed in Figure 6. Note. The locations are displayed in Figure 6.

Table B2
List of the Disclosed Formation Integrity Test (FIT) Data Records Used as Additional Constraints