Inference and dynamic decision-making for deteriorating systems
with probabilistic dependencies through Bayesian networks
and deep reinforcement learning

P.G. Morato (pgmorato@uliege.be), C.P. Andriotis, K.G. Papakonstantinou, P. Rigo

ANAST, Department of ArGEnCo, University of Liege, 4000, Liege, Belgium
Faculty of Architecture and the Built Environment, Delft University of Technology, 2628 BL Delft, The Netherlands
Department of Civil & Environmental Engineering, The Pennsylvania State University, University Park, PA 16802, USA

Abstract

In the context of modern engineering, environmental, and societal concerns, there is an increasing demand for methods able to identify rational management strategies for civil engineering systems, minimizing structural failure risks while optimally planning inspection and maintenance (I&M) processes. Most available methods simplify the I&M decision problem to the component level, often assuming statistical, structural, or cost independence among components, due to the computational complexity associated with global optimization methodologies under joint system-level state descriptions. In this paper, we propose an efficient algorithmic framework for inference and decision-making under uncertainty for engineering systems exposed to deteriorating environments, providing optimal management strategies directly at the system level. In our approach, the decision problem is formulated as a factored partially observable Markov decision process, whose dynamics are encoded in Bayesian network conditional structures. The methodology can handle environments under equal or general, unequal deterioration correlations among components, through Gaussian hierarchical structures and dynamic Bayesian networks, decoupling the originally joint system state space to component networks conditional on shared random variables. In terms of policy optimization, we adopt a deep decentralized multi-agent actor-critic (DDMAC) reinforcement learning approach, in which the policies are approximated by actor neural networks guided by a critic network. By including deterioration dependence in the simulated environment, and by formulating the cost model at the system level, DDMAC policies intrinsically consider the underlying system-effects. This is demonstrated through numerical experiments conducted for both a 9-out-of-10 system and a steel frame under fatigue deterioration. Results demonstrate that DDMAC policies offer substantial benefits when compared to state-of-the-art heuristic approaches. 
The inherent consideration of system-effects by DDMAC strategies is also interpreted based on the learned policies.

Keywords: Infrastructure management; Decision analysis; Deep reinforcement learning; Partially Observable Markov Decision Processes; System reliability analysis; Dynamic Bayesian networks

Journal: Reliability Engineering & System Safety

1 Introduction

Managing engineering systems, by controlling the risks of adverse events and optimally allocating inspection and repair resources, is crucial for securing societal progress, improving the quality of life at the community level and maximizing economic returns from an individual perspective Rackwitz et al. (2005); Faber and Stewart (2003). Research efforts devoted to the development of risk-based inspection and maintenance planning methods have increased considerably during the last decade Frangopol and Soliman (2016); Frangopol et al. (2004); Frangopol and Liu (2007). Increasing societal consciousness on sustainability, along with the expanding wealth of data from our structural systems and infrastructure, require, and enable, more efficient management policies Frangopol and Soliman (2016). Such policies need to support decision-making, for both newly designed systems and existing ones, through life-cycle plans that integrally account for interventions (e.g., repairs, retrofits, etc.) and data collection methods (e.g., inspections, structural health monitoring, etc).

Most available inspection and maintenance (I&M) planning methods assume independence among the constitutive components, primarily driven by the practical need to tame the involved computational complexities associated with solving such a decision-making optimization problem under uncertainty Straub (2004, 2009). At the component level, existing risk-based I&M methods can be classified according to their capabilities of modeling physically-based deterioration processes, e.g., fatigue or corrosion deterioration Papakonstantinou and Shinozuka (2013); Lotsberg et al. (2016); Yang and Frangopol (2018), and depending on their policy optimization approaches, namely, static decision rules, adaptive decision rules prescribed by heuristics or adaptive decision rules defined as a function of the dynamically updated history of actions and observations.

Some methods focus on the optimization of predefined static decision rules, planning inspections at equidistant intervals or when a prescribed failure probability threshold is surpassed, and prescribing maintenance interventions if a certain damage indicator is observed, e.g., crack detection Straub (2004); Nielsen and Sørensen (2018); Long et al. (2020); Hlaing et al. (2020). While these approaches can provide reasonable and effective policies in some specific scenarios, the quality of the resulting policies depends greatly on the designer’s experience in defining the heuristic combinations for the policy search. Such approaches cannot consider all policies within the vast available policy space, some of which may outperform the originally considered predefined heuristics Morato et al. (2022); Hlaing et al. (2022). In other existing methods, while inspection planning decision rules are defined a priori, the maintenance policy is adaptive, properly updating the involved thresholds based on new information Bismut and Straub (2018, 2021). In these cases, action planning is formulated based on optimization techniques that must be repeated at every desired update. While still operating on a limited policy subspace, such sophisticated methods correctly identify the need to go beyond static thresholds and accordingly provide solution approaches.

Methods based on Markov Decision Processes (MDPs) and Partially Observable MDPs (POMDPs) aim, on the other hand, at addressing the problem in a global optimization sense, outside the limitations of threshold-based or time-based formulations. Early works on the application of Markov decision processes for managing deteriorating engineering cases include Corotis et al. (2005); Papakonstantinou and Shinozuka (2014a, b, c). Founded on the principled mathematical properties offered by dynamic programming, either under full or partial observability, additional formulations have been proposed, e.g., in Andriotis et al. (2021); Memarzadeh et al. (2016, 2015). In the same class of applications, a recent POMDP-based approach proposed in Morato et al. (2022) demonstrated that POMDP policies outperform heuristic-based policies, as exemplified in physically-based numerical examples featuring fatigue deterioration. POMDP policies are defined as a function of the belief about the condition states, i.e., the probability distribution over states, which is a sufficient statistic of the prior history of actions and observations, recursively encoding it through forward Bayesian updates.

As mentioned before, many of the existing I&M methods formulate the decision-making problem at the component level. However, disregarding the essential interrelations among the system constituents, although allowing for a substantial simplification of the decision-making problem, may result in sub-optimal and even non-conservative policies in some cases. The need for I&M methods capable of determining policies at the system level has long been identified by the risk research community. Early works approaching the problem at the system level include Thoft-Christensen and Dalsgård Sørensen (1982); Ito et al. (1992); Deodatis et al. (1996); Enevoldsen and Sørensen (1993). In Straub and Papaioannou (2015), the fatigue details were classified according to the fatigue design factor, establishing a simplified approach for identifying system policies. More recently, Luque and Straub (2019, 2016) proposed a static I&M planning optimization relying on dynamic Bayesian networks to efficiently model deterioration, cost and reliability dependence among the structural elements. In this method, the policy is computed by optimizing static heuristic decision rules, with decision variables including, among others, equidistant inspections, number of inspected components, component prioritization, and repair thresholds based on observations. As with all static policy optimization methods, explained before, the policies are constrained to the set of predefined heuristic rules, out of the immense space of possible policies, which is substantially enlarged now in structural system settings.

Addressing the important complexities of managing large engineering systems, a deep reinforcement learning (DRL) method has been introduced in Andriotis and Papakonstantinou (2019), motivated by the success of deep reinforcement learning algorithms in complex game environments, e.g., in Silver et al. (2016, 2017); Mnih et al. (2013). In particular, a multi-agent actor-critic DRL scheme is developed in Andriotis and Papakonstantinou (2019), relying on (PO)MDPs for simulating the deteriorating environment, and demonstrating the capabilities of deep reinforcement learning approaches for identifying optimal policies in vast high-dimensional state, action and observation spaces. A modified version of this method has since been applied to system I&M decision-making problems under constraints, e.g., imposed risk thresholds or budget limitations Andriotis and Papakonstantinou (2021). In general, DRL approaches offer computational benefits in high-dimensional state spaces, mitigating the need for exhaustive state exploration by leveraging a function parametrization over the state space Wei et al. (2020); Fan et al. (2022); Yang (2022).

In this work, we formalize an efficient modeling framework for inference and decision-making under uncertainty for engineering systems, directly generating management strategies at the system level. In terms of inference, and addressing the general computational challenges associated with probabilistic analysis of multi-component engineering systems, our proposed methodology builds on top of adept Dynamic Bayesian Network (DBN) formulations Straub (2009); Luque and Straub (2016), modeling environments described by deterioration dependencies among components through Gaussian hierarchical structures, with the objective of decoupling the joint system space to independent component networks conditional on common influencing random variables. This decomposition results in a linear computational complexity with the number of components that otherwise increases exponentially in the joint system space. Furthermore, in this paper, the Gaussian hierarchical model is originally expanded and enhanced to enable the treatment of general, unequal deterioration correlation scenarios and dependence alterations after a maintenance action is taken for some of the components. In our developed framework, the transitional probabilistic model should appropriately consider the common random variables, and thus the algorithmic steps for properly updating the belief state under deterioration correlation within the DBN framework are described in detail.

Our developed generalized inference framework is applicable regardless of the decision-making method used for the generation of management policies. By formulating the decision-making problem as a factored POMDP, whose dynamics are encoded as Bayesian network conditional structures, the aforementioned efficient modeling and inference framework can be seamlessly and naturally integrated with sophisticated decision-making optimization methods. In this regard, we adopt a deep decentralized multi-agent actor-critic (DDMAC) scheme, in which the system policies are approximated by actor neural networks, at a component level, guided via system level value function estimates approximated by a critic network Andriotis and Papakonstantinou (2019). As DDMAC adjusts the weights of the actor networks according to noisy rewards collected at the system level, DDMAC policies intrinsically consider system-effects stemming from structural and statistical dependencies. Through numerical experiments, we demonstrate the efficacy of the proposed method for I&M planning of structural systems exposed to fatigue deterioration. In particular, the effects of including deterioration dependence and campaign cost models are explored for the case of a 9-out-of-10 system. In the second application studied, featuring a steel frame structural system, the focus is on examining and interpreting the inherent allocation of maintenance interventions by DDMAC policies according to the element importance to the global structural reliability. In all experiments analyzed, DDMAC policies are compared against state-of-the-art optimized heuristic policies.

The remainder of the paper is structured as follows: an overview of POMDP methods along with the proposed factored formulation are presented in Section 2. In Section 3, the definition and modeling of Gaussian hierarchical structures are introduced, together with a belief update algorithm, applicable to environments under general deterioration dependence. The integration of the simulator, defined as a factored POMDP, with DDMAC is presented in Section 4. The numerical experiments are then introduced and discussed in Section 5, concluding with some final remarks in Section 6.

2 I&M decision problem formulated as a factored POMDP

2.1 Factored POMDP definition

The inspection and maintenance (I&M) planning decision-making problem is formulated here as a Partially Observable Markov Decision Process (POMDP), whose transition and observation models are defined by Bayesian network structures. POMDPs provide a principled mathematical framework for optimal planning and decision-making under uncertainty, formally specified by the tuple $⟨ S, A, O, T, Z, R, γ ⟩$. A decision maker (henceforth agent) interacts with a stochastic environment, described by the state $s \in S$, taking actions $a \in A$ over a finite or infinite horizon $t_{N}$. The dynamics correspond to those in an MDP: at each time step $t$, the agent takes an action $a_{t} \in A$, and the environment evolves from state $s_{t} \in S$ to state $s_{t + 1} \in S$, according to the transition model $T \coloneqq p (s_{t + 1} | s_{t}, a_{t})$. In an MDP, the agent receives a reward based on the cost model $R \coloneqq r_{t} (s_{t}, a_{t}, s_{t + 1})$, discounted by the factor $γ$, and the objective is to find the policy $π^{*}$ that induces the optimal value function $V^{*} (s_{t})$:

V^{*} (s_{t}) = \max_{a_{t} \in A} \left\{ r (s_{t}, a_{t}) + γ \sum_{s_{t + 1} \in S} p (s_{t + 1} | s_{t}, a_{t}) V^{*} (s_{t + 1}) \right\}

(1)
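As a minimal sketch of how Eq. 1 can be solved numerically, the snippet below applies value iteration to a toy fully observable problem. The three condition states (intact, damaged, failed), the two actions (do nothing, repair), and all probabilities and costs are hypothetical values chosen only for illustration; they are not taken from the numerical experiments of this paper.

```python
def value_iteration(transition, reward, gamma, tol=1e-10, max_iter=100_000):
    """Fixed-point iteration on Eq. (1).

    transition[a][s][s2] = p(s2 | s, a)
    reward[a][s]         = r(s, a), costs expressed as negative rewards
    """
    n_actions, n_states = len(transition), len(transition[0])

    def q_value(s, a, values):
        expected = sum(transition[a][s][s2] * values[s2] for s2 in range(n_states))
        return reward[a][s] + gamma * expected

    values = [0.0] * n_states
    for _ in range(max_iter):
        new_values = [
            max(q_value(s, a, values) for a in range(n_actions))
            for s in range(n_states)
        ]
        converged = max(abs(n - o) for n, o in zip(new_values, values)) < tol
        values = new_values
        if converged:
            break

    # Greedy policy with respect to the converged value function.
    policy = [
        max(range(n_actions), key=lambda a: q_value(s, a, values))
        for s in range(n_states)
    ]
    return values, policy


# Hypothetical toy model: states 0 = intact, 1 = damaged, 2 = failed.
DO_NOTHING, REPAIR = 0, 1
transition = [
    [[0.9, 0.1, 0.0], [0.0, 0.8, 0.2], [0.0, 0.0, 1.0]],  # do nothing
    [[0.9, 0.1, 0.0], [0.9, 0.1, 0.0], [0.9, 0.1, 0.0]],  # repair, then deteriorate
]
reward = [
    [0.0, 0.0, -100.0],      # do nothing: only the failed state is penalized
    [-10.0, -10.0, -110.0],  # repair: additional repair cost of 10
]
values, policy = value_iteration(transition, reward, gamma=0.95)
```

In this toy setting the greedy policy leaves the intact state untouched and repairs the failed state, since repairing an intact component only adds cost without changing its transition probabilities.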

In a POMDP, however, states $s \in S$ are not directly observed and, instead, observations $o \in O$ can be collected according to the observation model $Z \coloneqq p (o_{t + 1} | s_{t + 1}, a_{t})$. Note that the observation model is the likelihood of collecting an observation $o_{t + 1} \in O$ after taking an action $a_{t}$ and having transitioned to state $s_{t + 1}$. In an I&M context, the observation model is often modeled by Probability of Detection (PoD) curves or according to inspection/monitoring measurement noise Morato et al. (2022). A POMDP policy is a mapping of the dynamically updated history of actions and observations, $a_{0 : t - 1}, o_{0 : t}$, to the current action $a_{t}$. This history is sufficiently encoded in the belief $b$, which is the probability distribution over system states, $b (s)$. The optimal policy $π^{*}$, therefore, corresponds to the value function Kurniawati et al. (2008) that satisfies the Bellman equation:

V^{*} (b_{t}) = \max_{a_{t} \in A} \left\{ \sum_{s_{t} \in S} r (s_{t}, a_{t}) b (s_{t}) + γ \sum_{o_{t + 1} \in O} p (o_{t + 1} | b_{t}, a_{t}) V^{*} (b_{t + 1}) \right\},

(2)

where $p (o_{t + 1} | b_{t}, a_{t})$ is the probability of collecting an observation $o_{t + 1} \in O$ given the belief $b_{t}$ and action $a_{t} \in A$ . Assuming a Markovian environment is reasonable in most practical applications with the aid of state augmentation techniques (Papakonstantinou and Shinozuka, 2014a), hence, any general I&M planning decision problem can be efficiently formulated as a POMDP. The determination of the optimal I&M policy $π^{*}$ becomes the main objective, inducing a minimization of the expected life-cycle costs $r_{t o t}$ , by balancing structural failure risk against inspection and maintenance costs:

E [r_{tot}] = \sum_{t = 0}^{t_{N}} [γ^{t} (r_{ins, t} + r_{rep, t} + r_{F, t})],

(3)

where $r_{i n s}$ , $r_{r e p}$ and $r_{F}$ stand for inspection, repair and failure costs, respectively, defined as negative rewards. In terms of utilities, the failure risk $r_{F}$ is typically defined in a structural reliability context as the annual probability of a failure event weighted by the consequence of a structural failure, which might also include environmental and societal consequences, specified in equivalent units. The definition of the failure risk at the system level will be further elaborated in Section 3.
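The discounted sum in Eq. 3 can be illustrated, for a single realization of the cost sequence, with the short sketch below. The function name and the yearly cost values are hypothetical; in practice the expectation would be estimated by averaging such realizations over many simulated episodes.

```python
def discounted_life_cycle_cost(inspection, repair, failure_risk, gamma):
    """Discounted sum of Eq. (3) for one realization of yearly costs.

    Each argument is a list with one (negative) reward per time step t = 0..t_N.
    """
    return sum(
        gamma ** t * (r_ins + r_rep + r_f)
        for t, (r_ins, r_rep, r_f) in enumerate(zip(inspection, repair, failure_risk))
    )


# Hypothetical three-year episode: inspections in years 0 and 2, a repair in year 1.
total_cost = discounted_life_cycle_cost(
    inspection=[-1.0, 0.0, -1.0],
    repair=[0.0, -10.0, 0.0],
    failure_risk=[-0.5, -0.5, -0.5],
    gamma=0.95,
)
```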

Existing I&M planning applications often model the deterioration evolution $d$, at the component level, conditional on a set of random variables $\uptheta_{d}$ Straub (2009); Luque and Straub (2019, 2016) or as a function of the deterioration rate $τ$ Papakonstantinou and Shinozuka (2014b); Andriotis and Papakonstantinou (2019). Both formulations are equivalent for modeling deterioration processes, as already discussed and demonstrated in Morato et al. (2022) and shown in Fig. 1. When observations are collected, through inspections or monitoring, Bayesian updating can then be conducted. Available algorithms allow exact Bayesian inference if the problem is formulated in a discrete state space Murphy (2002), whereas the computation of Bayes’ normalization constant is a challenging task in continuous state spaces Papakonstantinou et al. (2022). In order to utilize discrete-state algorithms, the involved continuous random variables can be discretized. The quality of the discretization strongly affects the accuracy of the results and should be treated carefully Straub (2009); Morato et al. (2022), especially when the problem deals with rare events, e.g., failure events. In general, an efficient discretization aims at minimizing the computational expense while preserving the required level of accuracy.

In a POMDP, the states cannot be directly observed and the decision maker reasons under partial observability, only informed by a belief $b$, which is defined as the probability distribution over states. At each time step, the belief is dynamically updated, based on Bayes’ rule, depending on the current belief, $b_{t}$, the action taken, $a_{t}$, and the collected observation, $o_{t + 1}$, following three main steps: (i) the belief evolves according to the transition model $p (s_{t + 1} | s_{t}, a_{t})$, (ii) the belief is updated based on the collected observation with probability $p (o_{t + 1} | s_{t + 1}, a_{t})$, and (iii) the belief state is normalized. This belief update operation is known as the forward pass within the context of hidden Markov models Murphy (2002). At the system level, the belief of each component can be updated by implementing the steps listed in Algorithm 1.

function updateBelief($b_{t}, a_{t}, o_{t + 1}$)
    for $l = 1, \dots, N_{C}$ do
        $b (s_{t + 1}) \leftarrow b (s_{t}) p (s_{t + 1} | s_{t}, a_{t})$    ▹ propagation step
        $b (s_{t + 1}) \leftarrow b (s_{t + 1}) p (o_{t + 1} | s_{t + 1}, a_{t})$    ▹ estimation step
        $b (s_{t + 1}) \leftarrow b (s_{t + 1}) / p (o_{t + 1} | b_{t}, a_{t})$    ▹ normalization step
    end for
end function

Algorithm 1: Belief update for a system of $N_{C}$ components
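The three steps of the forward pass can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the implementation used in this work: the propagation step is written as an explicit sum over $s_{t}$, and the function names, the three-state toy component, and the probability-of-detection values are hypothetical.

```python
def update_belief(belief, transition, obs_likelihood):
    """One forward-pass step for a single component.

    belief[s]          = b(s_t)
    transition[s][s2]  = p(s_{t+1} = s2 | s_t = s, a_t)
    obs_likelihood[s2] = p(o_{t+1} | s_{t+1} = s2, a_t) for the observation received
    Returns the updated belief and the evidence p(o_{t+1} | b_t, a_t).
    """
    n = len(belief)
    # Propagation step: push the belief through the transition model.
    predicted = [sum(belief[s] * transition[s][s2] for s in range(n)) for s2 in range(n)]
    # Estimation step: weight by the likelihood of the collected observation.
    weighted = [p * z for p, z in zip(predicted, obs_likelihood)]
    # Normalization step: divide by the evidence.
    evidence = sum(weighted)
    if evidence == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return [w / evidence for w in weighted], evidence


def update_system_beliefs(beliefs, transitions, obs_likelihoods):
    """Apply the single-component update to every component (independent case)."""
    return [
        update_belief(b, T, z)[0]
        for b, T, z in zip(beliefs, transitions, obs_likelihoods)
    ]


# Hypothetical component with states (intact, damaged, failed), inspected with
# a "no detection" outcome; probability of detection per state: 0.05, 0.5, 0.9.
prior = [0.7, 0.2, 0.1]
transition = [[0.9, 0.1, 0.0], [0.0, 0.8, 0.2], [0.0, 0.0, 1.0]]
no_detection = [0.95, 0.5, 0.1]
posterior, evidence = update_belief(prior, transition, no_detection)
```

When a component is not inspected, passing a likelihood vector of ones leaves the propagated belief unchanged, so the same routine covers both cases.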

State-of-the-art POMDP solvers often require the modeling of the POMDPs in a flat structure, which can be usually encoded by augmenting the state space Papakonstantinou and Shinozuka (2014a), particularly if the process is described by multiple random variables. However, POMDPs can also be formulated in a factored fashion, exploiting the dependence structure among random variables and thus significantly alleviating the required computational effort. We specify here the transition and observation models based on conditional structures described by dynamic Bayesian networks (DBNs), and while the belief state $b$ remains the same as that for flat POMDPs, the transition and observation models are now constructed by taking advantage of the involved dependencies. For instance, the deterioration rate model can be constructed as $p (d_{t + 1} | d_{t}, τ_{t + 1}) p (τ_{t + 1} | τ_{t})$ instead of $p (d_{t + 1}, τ_{t + 1} | d_{t}, τ_{t})$ . This incorporation of conditional structures allows a reduction of the transition model dimensionality from $| S_{d} |^{2} | S_{τ} |^{2}$ to $| S_{d} |^{2} | S_{τ} | + | S_{τ} |^{2}$ and can achieve significant computational benefits when multiple random variables are involved. This formulation can be seamlessly applied to simulate the deterioration environment, as will be explained in Section 4, due to the flexibility naturally offered by the proposed deep reinforcement learning approach.
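To make the dimensionality argument concrete, the sketch below counts the entries of the flat and factored transition models and propagates a joint belief over $(d, τ)$ using only the two factors $p (τ_{t + 1} | τ_{t})$ and $p (d_{t + 1} | d_{t}, τ_{t + 1})$. The function names, array layouts, and the numbers of discrete states are assumptions made for illustration.

```python
def flat_transition_size(n_d, n_tau):
    """Entries of p(d', tau' | d, tau): |S_d|^2 |S_tau|^2."""
    return (n_d * n_tau) ** 2


def factored_transition_size(n_d, n_tau):
    """Entries of p(d' | d, tau') and p(tau' | tau): |S_d|^2 |S_tau| + |S_tau|^2."""
    return n_d ** 2 * n_tau + n_tau ** 2


def propagate_factored(belief, p_tau, p_d):
    """Propagate a joint belief b(d, tau) without building the flat model.

    belief[d][tau]   = b(d_t, tau_t)
    p_tau[tau][tau2] = p(tau_{t+1} = tau2 | tau_t = tau)
    p_d[tau2][d][d2] = p(d_{t+1} = d2 | d_t = d, tau_{t+1} = tau2)
    """
    n_d, n_tau = len(belief), len(belief[0])
    new_belief = [[0.0] * n_tau for _ in range(n_d)]
    for d in range(n_d):
        for tau in range(n_tau):
            mass = belief[d][tau]
            if mass == 0.0:
                continue
            for tau2 in range(n_tau):
                weight = mass * p_tau[tau][tau2]
                if weight == 0.0:
                    continue
                for d2 in range(n_d):
                    new_belief[d2][tau2] += weight * p_d[tau2][d][d2]
    return new_belief


# Hypothetical discretization: 30 damage states and 20 deterioration rates.
flat = flat_transition_size(30, 20)
factored = factored_transition_size(30, 20)

# Tiny two-by-two example: the rate advances deterministically to its last value.
b0 = [[1.0, 0.0], [0.0, 0.0]]
p_tau = [[0.0, 1.0], [0.0, 1.0]]
p_d = [
    [[1.0, 0.0], [0.0, 1.0]],  # tau' = 0: no damage growth
    [[0.8, 0.2], [0.0, 1.0]],  # tau' = 1: damage may grow
]
b1 = propagate_factored(b0, p_tau, p_d)
```

For the hypothetical discretization above, the flat model requires 360,000 entries, whereas the factored model requires 18,400.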

3 System effects in I&M planning

3.1 Deterioration dependence in a hierarchical Gaussian structure

Figure 1: Graphical representation of dynamic Bayesian networks for modeling deterioration processes. At the component level, the damage $d_{t}$ evolves over time $t$ as a function of the deterioration rate $τ_{t}$ (left) or conditional on a set of parameters $\uptheta_{d_{t}}$ (right). While $d_{t}$ and $\uptheta_{d_{t}}$ are hidden states, partially observed through $o_{d_{t}}$, $τ_{t}$ is fully observable. The deterioration dependence among components is encoded by the hyperparameter or set of hyperparameters $\upalpha, \upbeta$. The binary observable node $F_{sys_{t}}$ indicates either the survival or failure state of the system, depending on the hidden failure states $F_{t}$ of the components. Note that other variants of the system failure formulation can also be represented accordingly.

Existing methods model the deterioration correlation among components either via random fields or through common influencing factors. Whereas the former are particularly useful for applications in which the dependence is attributed to the geometrical distance between components, the latter are more suitable for systems in which identical attributes of physical phenomena, e.g., similar manufacturing techniques or similar loading, lead to shared sources of model uncertainties among the components Luque and Straub (2016). In a hierarchical structure, the deterioration of each component is defined conditional on a set of common influencing variables, shared among all the components and represented at the highest level of the hierarchy. In theory, the state space of a system under deterioration dependence can be modeled directly as the joint space of all the parameters involved in the deterioration process of the system. In this case, the discretized state space would grow exponentially with the number of components $N_{C}$, reaching a size of $| S |^{N_{C}}$ joint states. To overcome this increase in dimensionality, we adopt the hierarchical Gaussian structure previously proposed in Luque and Straub (2016), in which the belief state of each component is encoded conditional on a hyperparameter (common influencing variable) $α$ or set of hyperparameters $\upalpha$. The central idea behind this hierarchical structure is that component beliefs for a given hyperparameter $b (s | α)$ are independent, enabling an efficient decoupling of the components' joint space. This decoupling reduces the computational complexity from the original joint space $| S |^{N_{C}}$ to a space $| S | \cdot | S_{α} | \cdot N_{C}$ that grows linearly with the number of considered components $N_{C}$. Note that the state space now includes the states of the hyperparameter(s), which should also be properly discretized. The increase of the state space due to the incorporation of the hyperparameter(s) is, however, far smaller than that incurred by the joint state space.

A graphical representation of the proposed hierarchical structure is illustrated in Fig. 1, applicable to deterioration processes modeled either as a function of the deterioration rate or conditional on a set of parameters Morato et al. (2022). In either case, the deterioration process $d$ is encoded conditional on the hyperparameter(s) $\upalpha$, along with the deterioration rate $τ$ or parameters $\uptheta_{d}$. Evidence collected through observations $o_{d_{t}}$ serves not only to update the damage state, but also to update the hyperparameters. Since the hyperparameters are parent nodes for all the components, once a component is inspected, the hyperparameters are also updated, influencing all the other components, even those for which evidence was not directly available. The reliability of the system is represented in Fig. 1 by the binary node $F_{sys}$, conditional on the failure state of the components $F^{(l)}$. At a component level, the failure state is modeled by the binary variable $F^{(l)}$ and corresponds to the subset of the deterioration space classified as failure, $S_{F} \subseteq S$.

This Gaussian hierarchical structure is a mathematically motivated model induced by the convenient formulation available for normal random variables, i.e., the conditional and joint distributions of jointly normal random variables are also normally distributed. Let us first consider the special case in which the marginal distribution of each considered component deterioration is defined as a standard normal random variable $Y_{i}$. Under correlation, and conditional on a common hyperparameter $α$, the variables $Y_{i}$ are, however, defined as normal random variables with mean $λ_{i} α$ and standard deviation $\sqrt{1 - λ_{i}^{2}}$ Dunnett and Sobel (1955):

Y_{i} = \sqrt{1 - λ_{i}^{2}} X_{i} + λ_{i} α

(4)

Since both $X_{i}$ and $α$ are independent standard normal random variables, the covariance of $Y_{i}$ and $Y_{j}$ can be formulated as:

\mathrm{cov} (Y_{i}, Y_{j}) = \sqrt{1 - λ_{i}^{2}} \sqrt{1 - λ_{j}^{2}} \, \mathrm{cov} (X_{i}, X_{j}) + \sqrt{1 - λ_{i}^{2}} \, λ_{j} \, \mathrm{cov} (X_{i}, α) + \sqrt{1 - λ_{j}^{2}} \, λ_{i} \, \mathrm{cov} (X_{j}, α) + λ_{i} λ_{j} \, \mathrm{cov} (α, α)

(5)

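After removing all the terms associated with zero covariance, i.e., $\mathrm{cov} (X_{i}, X_{j})$, $\mathrm{cov} (X_{i}, α)$, and $\mathrm{cov} (X_{j}, α)$, and noting that $\mathrm{cov} (α, α) = 1$ and that $Y_{i}$ and $Y_{j}$ have unit variance, we can define the correlation coefficient between $Y_{i}$ and $Y_{j}$, for $i \neq j$, as: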

ρ (Y_{i}, Y_{j}) = λ_{i} λ_{j}

(6)
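Eq. 6 can be checked numerically by sampling from the construction in Eq. 4. The sketch below is only a sanity check under hypothetical values of $λ_{i}$ and $λ_{j}$; the function name, the sample size, and the seed are arbitrary choices.

```python
import math
import random


def sample_correlation(lam_i, lam_j, n_samples=100_000, seed=0):
    """Monte Carlo estimate of rho(Y_i, Y_j) under the construction of Eq. (4)."""
    rng = random.Random(seed)
    ys_i, ys_j = [], []
    for _ in range(n_samples):
        alpha = rng.gauss(0.0, 1.0)  # shared hyperparameter
        x_i = rng.gauss(0.0, 1.0)    # component-specific terms
        x_j = rng.gauss(0.0, 1.0)
        ys_i.append(math.sqrt(1.0 - lam_i ** 2) * x_i + lam_i * alpha)
        ys_j.append(math.sqrt(1.0 - lam_j ** 2) * x_j + lam_j * alpha)

    mean_i = sum(ys_i) / n_samples
    mean_j = sum(ys_j) / n_samples
    cov = sum((a - mean_i) * (b - mean_j) for a, b in zip(ys_i, ys_j)) / n_samples
    var_i = sum((a - mean_i) ** 2 for a in ys_i) / n_samples
    var_j = sum((b - mean_j) ** 2 for b in ys_j) / n_samples
    return cov / math.sqrt(var_i * var_j)


# Hypothetical coefficients: the estimate should be close to 0.8 * 0.5 = 0.4.
rho_hat = sample_correlation(0.8, 0.5)
```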

If all the components are equi-correlated, then $λ_{i} = λ_{j} = \sqrt{ρ (Y_{i}, Y_{j})}$ , for all $i, j$ . This presented Gaussian structure is further generalized in this work for the general case of unequally correlated components, by preserving the validity of Eq. 6. For complex correlation configurations, one hyperparameter $α$ might not be sufficient to satisfy Eq. 6, and in that case, one can incorporate additional hyperparameters $α$ , at the expense of a higher computational cost. When multiple hyperparameters $α$ are included, the best fit for $ρ (Y_{i}, Y_{j}) = \sum_{k = 1}^{m} (λ_{i k} λ_{j k})$ can be found via optimization procedures, e.g., least squares Song and Kang (2009). Once the Gaussian correlation structure is specified through the parameters $λ$ , the cumulative distribution of $Y_{i}$ conditional on the hyperparameter(s) $α$ can be defined as:

F_{Y_{i} | α} (y_{i}) = Φ \left[ \frac{y_{i} - λ_{i} α}{\sqrt{1 - λ_{i}^{2}}} \right]

(7)

For the cases in which the deterioration process is modeled by random variables other than Gaussian and considering that a Nataf transformation is applicable Luque and Straub (2016), then Eq. 7 can be also redefined as:

F_{D_{i} | α} (d_{i}) = Φ \left[ \frac{Φ^{- 1} [F_{d} (d_{i})] - λ_{i} α}{\sqrt{1 - λ_{i}^{2}}} \right]

(8)

where $F_{D_{i} | α} (d_{i})$ stands for the cumulative distribution function of a variable $d_{i}$ conditional on the hyperparameter(s) $\upalpha$ , and $Φ$ is the standard normal cumulative distribution function. In a discrete state space, the belief conditional on the hyperparameters is equal to the difference between the cumulative distribution function at the upper boundary and at the lower boundary of each belief interval:

b (d_{i} | α) = F_{D_{i} | α} (d_{i}^{+}) - F_{D_{i} | α} (d_{i}^{-})

(9)
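A minimal sketch of Eqs. 7 and 9 for a standard normal marginal is given below, using the standard library `statistics.NormalDist` for $Φ$. The interval boundaries, the value of $λ_{i}$, and the hyperparameter value are hypothetical. For a non-Gaussian marginal, Eq. 8 suggests that the boundaries would first be mapped through $Φ^{- 1} [F_{d} (\cdot)]$ before calling the same routine.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution, provides Phi via .cdf


def conditional_belief(edges, lam, alpha):
    """Discrete belief b(y | alpha) over the intervals defined by `edges`.

    Implements Eq. (7) at each boundary and Eq. (9) as the difference between
    the upper and lower boundary of each interval. Requires |lam| < 1.
    """
    scale = math.sqrt(1.0 - lam ** 2)
    cdf = [STD_NORMAL.cdf((edge - lam * alpha) / scale) for edge in edges]
    return [upper - lower for lower, upper in zip(cdf[:-1], cdf[1:])]


# Hypothetical discretization into four intervals covering the real line.
edges = [-math.inf, -1.0, 0.0, 1.0, math.inf]

uncorrelated = conditional_belief(edges, lam=0.0, alpha=1.0)  # equals the marginal
shifted = conditional_belief(edges, lam=0.6, alpha=1.0)       # mass moves upward
```

With $λ_{i} = 0$ the hyperparameter carries no information and the conditional belief reduces to the marginal one, whereas a positive $α$ combined with a positive $λ_{i}$ shifts probability mass toward the upper intervals.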

3.2 Belief update under deterioration dependence

We reformulate here the belief update algorithmic scheme introduced in Section 2 for a system under deterioration dependence among components. All necessary implementation steps are listed in Algorithm 2. Bayesian inference is first conducted for the conditional beliefs $b (s_{t + 1} | α)$ and hyperparameters $b (α)$, propagating uncertainty according to the transition model $p (s_{t + 1} | s_{t}, a_{t})$ and observation model $p (o_{t + 1} | s_{t + 1}, a_{t})$. The likelihood of collecting an observation given the hyperparameter(s), $p (o_{t + 1} | α, a_{t})$, later necessary to update $b (α)$, can be easily computed by marginalizing out the states other than $α$:

p (o_{t + 1} | α, a_{t}) = \sum_{s_{t + 1} \in S} [b (s_{t + 1} | α) p (o_{t + 1} | s_{t + 1}, a_{t})]

(10)

Bayesian inference is then conducted for the hyperparameters:

p (α | o_{t + 1}, a_{t}) = b (α) p (o_{t + 1} | α, a_{t}) / p (o_{t + 1} | a_{t})

(11)

After updating the conditional beliefs and common influencing variables, the marginal deterioration beliefs can be computed by marginalizing out the hyperparameters $α$ as:

b (s_{t + 1}) = \sum_{α \in Γ} [p (s_{t + 1} | α) b (α)]

(12)

function updateBelief($b (s_{t} | α), b (α), a_{t}, o_{t + 1}$)
    for $l = 1, \dots, N_{C}$ do
        $b (s_{t + 1} | α) \leftarrow b (s_{t} | α) p (s_{t + 1} | s_{t}, a_{t})$    ▹ propagation step
        $b (s_{t + 1} | α) \leftarrow b (s_{t + 1} | α) p (o_{t + 1} | s_{t + 1}, a_{t})$    ▹ estimation step
        $b (s_{t + 1} | α) \leftarrow b (s_{t + 1} | α) / p (o_{t + 1} | b, a_{t})$    ▹ normalization step
        $p (o_{t + 1} | α) \leftarrow \sum_{s_{t + 1} \in S} [b (s_{t + 1} | α) p (o_{t + 1} | s_{t + 1}, a_{t})]$    ▹ likelihood
        $b (α) \leftarrow b (α) p (o_{t + 1} | α, a_{t}) / p (o_{t + 1} | a_{t})$    ▹ hyperparameter(s) update
    end for
    for $l = 1, \dots, N_{C}$ do
        $b (s_{t + 1}) \leftarrow \sum_{α \in Γ} [b (s_{t + 1} | α) b (α)]$    ▹ marginalizing out hyperparameter(s)
    end for
    return $b (s_{t + 1})$
end function

Algorithm 2: Belief update under deterioration dependence for a system of $N_{C}$ components
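The following sketch illustrates one possible reading of the update under deterioration dependence for a single discretized hyperparameter. It rests on several assumptions that are not prescribed by the text: the likelihood of Eq. 10 is evaluated on the propagated conditional belief (before the observation is absorbed), each conditional belief is normalized by its own likelihood, and components are processed sequentially. All names and numerical values are hypothetical.

```python
def update_dependent_beliefs(cond_beliefs, alpha_belief, transitions, obs_likelihoods):
    """Belief update with a shared, discretized hyperparameter alpha.

    cond_beliefs[l][k][s]  = b(s_t | alpha_k) for component l
    alpha_belief[k]        = b(alpha_k)
    transitions[l][s][s2]  = p(s_{t+1} = s2 | s_t = s, a_t) for component l
    obs_likelihoods[l][s2] = p(o_{t+1} | s_{t+1} = s2, a_t); all ones if not inspected
    Assumes every observation has non-zero probability under each alpha_k.
    """
    alpha_belief = list(alpha_belief)
    new_cond = []
    for belief_l, trans_l, lik_l in zip(cond_beliefs, transitions, obs_likelihoods):
        n = len(lik_l)
        updated_l, likelihood_alpha = [], []
        for b_k in belief_l:
            # Propagation step for the belief conditional on alpha_k.
            predicted = [sum(b_k[s] * trans_l[s][s2] for s in range(n)) for s2 in range(n)]
            # Estimation step.
            weighted = [p * z for p, z in zip(predicted, lik_l)]
            # Eq. (10): likelihood of the observation given alpha_k.
            evidence = sum(weighted)
            likelihood_alpha.append(evidence)
            # Normalization step.
            updated_l.append([w / evidence for w in weighted])
        new_cond.append(updated_l)
        # Eq. (11): Bayesian update of the hyperparameter belief.
        joint = [b * z for b, z in zip(alpha_belief, likelihood_alpha)]
        total = sum(joint)
        alpha_belief = [j / total for j in joint]

    # Eq. (12): marginalize out the hyperparameter for every component.
    marginals = [
        [sum(b_k[s] * a for b_k, a in zip(belief_l, alpha_belief))
         for s in range(len(belief_l[0]))]
        for belief_l in new_cond
    ]
    return new_cond, alpha_belief, marginals


# Two components, states (intact, damaged), two hyperparameter values (low, high).
identity = [[1.0, 0.0], [0.0, 1.0]]  # no deterioration, to isolate the updating effect
cond = [[[0.9, 0.1], [0.5, 0.5]], [[0.9, 0.1], [0.5, 0.5]]]
alpha_prior = [0.5, 0.5]
detection = [0.1, 0.8]    # component 0 inspected, damage detected
no_inspection = [1.0, 1.0]  # component 1 not inspected

_, alpha_post, marginals = update_dependent_beliefs(
    cond, alpha_prior, [identity, identity], [detection, no_inspection]
)
```

In this toy example, detecting damage in the first component raises the probability of the high hyperparameter value, which in turn increases the marginal damage probability of the second, uninspected component from 0.30 to approximately 0.39.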

To the best of the authors' knowledge, the effect of maintenance actions on the Gaussian dependence structure has not been explored in the existing literature Luque and Straub (2019, 2016). Whereas the defined deterioration dependence is preserved if no maintenance interventions are planned, structural interventions can potentially disrupt the underlying correlation structure. For instance, if a structural system is specified with a correlated initial crack size among fatigue hotspots, this correlation structure will be perturbed after a component is repaired, in addition to the correlation reduction naturally experienced by the system over time. The correlation evolution associated with the latter is intrinsically quantified through the uncertainty propagation and updating operations formulated in Eqs. 10-11. The correlation disruption associated with the former can be modeled by defining the transition model $p (s_{t + 1} | s_{t}, a_{t}, α)$ of the involved components also conditional on the hyperparameter(s) $α$. This enables the removal or modification of the deterioration dependence by redefining the relevant correlation coefficients $λ_{i}$ in Eqs. 7-9. Additional discussion and implementation of this aspect are presented in the numerical experiments section.

3.3 System structural reliability and system cost model

As input to the I&M decision-making problem (Section 2), utilities $r_{ins}$ and $r_{rep}$ are assigned to inspection and repair actions, respectively, specified according to the available options and settings in each problem. The annual risk of a system failure, $r_{F}$ , is defined as the increase in the system failure probability between two consecutive time steps (e.g., years), multiplied by the associated system failure cost, $r_{f}$ :

r_{F} = (p_{F_{s y s, t + 1}} - p_{F_{s y s, t}}) r_{f}

(13)

The system structural failure event, illustrated in Figure 1 by the node $F_{sys}$ , is specified by a binary variable $p_{F_{sys}}$ , indicating the failure and survival states, conditional on the belief state $b (s)$ of the structural components, which is in turn determined by the I&M actions performed over time. In principle, $p_{F_{sys}}$ could be directly defined as a function of the components' belief state; in practice, however, $p_{F_{sys}}$ is conditioned only on the component failure events, whose probability $p_{F}$ is specified as:

p_{F} = \sum_{s \in S_{F}} b (s)

(14)

where $S_{F} \subseteq S$ is the subset of component states classified as failure. Within the deep reinforcement learning approach presented in Section 4, $p_{F_{sys}}$ can be computed via closed-form procedures and/or supported by efficient matrix-based algorithms Song and Kang (2009); alternatively, it can be obtained following a general scheme, through a simulator Der Kiureghian (2022). By assigning utilities to the system state, the importance of each structural element to the global risk of a system failure is implicitly accounted for. To illustrate the effect of defining the failure risks at the system level, I&M strategies for a redundant 2-dimensional frame structure are later explored in Section 5.
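As a small illustration of Eqs. 13 and 14, the following sketch sums the belief mass over the failure states and converts the yearly increase in system failure probability into a risk term. The function names and the numerical values in the usage lines are hypothetical, chosen only to show the arithmetic.

```python
import numpy as np

def component_failure_probability(belief, failure_states):
    """Eq. 14: sum of the belief over the states classified as failure."""
    belief = np.asarray(belief, dtype=float)
    return float(belief[list(failure_states)].sum())

def annual_failure_risk(p_sys_t, p_sys_t1, r_f):
    """Eq. 13: increase in system failure probability times the failure cost."""
    return (p_sys_t1 - p_sys_t) * r_f

# Illustrative values only: a three-state belief whose last state is failure.
p_F = component_failure_probability([0.7, 0.2, 0.1], failure_states=[2])
risk = annual_failure_risk(p_sys_t=0.010, p_sys_t1=0.015, r_f=-10_000)
```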

In most structural systems, from bridges to offshore platforms or wind farms, inspection and repair actions are not planned separately for each structural element. Instead, maintenance campaigns are scheduled, collecting information or performing repairs on a group of structural components. The cost model can thus be adapted from Eq. 3 to include a fixed campaign cost, $r_{camp}$ , incurred every time a campaign is planned, along with inspection, $r_{ins}^{(l)}$ , and repair, $r_{rep}^{(l)}$ , costs assigned to the individual components and aggregated through any function of choice, $H (\cdot)$ , possibly nonlinear, e.g., a simple sum operator, as:

r_{t o t} = r_{c a m p} + H (r_{i n s}^{(l)}, r_{r e p}^{(l)}) + r_{F}

(15)
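A minimal sketch of Eq. 15 follows, assuming that $H$ is a plain sum over components and that the campaign cost is charged whenever at least one component is inspected or repaired. The function name and argument layout are illustrative assumptions.

```python
def total_reward(inspected, repaired, r_ins, r_rep, r_camp, r_F):
    """Eq. 15 with H taken as a simple sum (an assumption of this sketch).

    inspected, repaired: sequences of booleans, one entry per component
    r_ins, r_rep       : per-component inspection and repair rewards (negative)
    r_camp             : campaign reward, charged once if any action is taken
    r_F                : annual failure risk term from Eq. 13
    """
    n_ins = sum(bool(x) for x in inspected)
    n_rep = sum(bool(x) for x in repaired)
    campaign = r_camp if (n_ins + n_rep) > 0 else 0.0
    return campaign + n_ins * r_ins + n_rep * r_rep + r_F
```

Setting `r_camp = 0` recovers a cost model in which inspections and repairs are charged individually.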

4 Optimal I&M planning via deep reinforcement learning

I&M planning decision problems, formulated as POMDPs (as explained in Sections 2 and 3), can be solved by dynamic programming algorithms, e.g., via exact alpha-vector value iteration Pineau et al. (2003). In practice, however, exact value iteration can be applied only to problems with very small state spaces, due to the complexity associated with the exponential increase in the number of alpha vectors with the number of observations at every iteration. Recently, I&M planning decision problems at the component level, formulated as POMDPs and characterized by multiple states, have been efficiently solved via point-based POMDP algorithms Morato et al. (2022); Papakonstantinou and Shinozuka (2014b); Papakonstantinou et al. (2018). Point-based solvers exploit the fact that the value function $V (b)$ (Eq. 2) is piece-wise linear and convex, and can thus be parameterized by a set of $v_{p} \in V_{p}$ vectors, each of which is associated with a specific action $a \in A$ . The optimal value function, $V^{*} (b)$ , can therefore be defined in terms of a set of $v_{p}$ vectors Puterman (2014):

V^{*} (b) = \max_{v_{p} \in V_{p}} [\sum_{s \in S} b (s) v_{p} (s)]

(16)
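Eq. 16 can be evaluated with a single matrix-vector product, as in the sketch below. The alpha vectors, action labels, and function name are hypothetical and serve only to show how the maximizing vector also identifies the associated action.

```python
import numpy as np

def value_and_action(belief, alpha_vectors, actions):
    """Evaluate Eq. 16 and return the action tied to the maximizing vector.

    belief       : (S,) belief over states
    alpha_vectors: (P, S) array, one row per vector v_p
    actions      : length-P sequence, the action associated with each vector
    """
    values = np.asarray(alpha_vectors, dtype=float) @ np.asarray(belief, dtype=float)
    best = int(np.argmax(values))
    return float(values[best]), actions[best]

# Illustrative two-state example with made-up vectors.
vectors = np.array([[0.0, -10.0],    # hypothetical "do-nothing" vector
                    [-3.0, -3.0]])   # hypothetical "repair" vector
labels = ["do-nothing", "repair"]
value, action = value_and_action([0.5, 0.5], vectors, labels)
```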

State-of-the-art point-based POMDP solvers differ mainly in how they sample reachable belief points and in how Bellman backup operations are executed, e.g., in Kurniawati et al. (2008); Spaan and Vlassis (2005); Smith and Simmons (2006). The reader is directed to Papakonstantinou et al. (2018) for a detailed comparison of point-based solvers applied to infrastructure I&M settings. While point-based solvers are able to efficiently provide optimal policies at the component level and for realistically large systems, the dimensionality still becomes a limiting factor in high-dimensional state, action, and observation space settings, typical in structural systems. Deep Reinforcement Learning (DRL) then provides a powerful solution in such settings, as the value or policy function can be parameterized with deep artificial neural networks. Thereby, the planning task reduces to finding a number of parameters that is much lower than the number of original states and actions of the problem. The interested reader is directed to Sutton and Barto (2018); Li (2017) for a thorough introduction and discussion on DRL. In our proposed approach, we integrate the factored POMDP formulation introduced in Section 2 with a Deep Decentralized Multi-agent Actor-Critic (DDMAC) scheme, adopted from Andriotis and Papakonstantinou (2019). This combination provides an efficient algorithmic platform for inspection and maintenance planning of structural systems under deterioration, reliability, and cost dependencies, in large-scale multi-component environments.

Figure 2: On the left: Representation of a factored POMDP derived from the deterioration rate dynamic Bayesian network introduced in Fig. 1. The deterioration process $d_{t}$ , influenced by the deterioration rate $τ_{t}$ , conditional on the hyperparameters $α$ and partially observed through $o_{d_{t}}$ , is controlled by the action decision node $a_{t}$ . A reward $r_{t}$ is collected as a result of taking action $a_{t}$ at state $d_{t}$ . On the right: Deep Decentralized Multi-Agent Actor Critic (DDMAC) featuring the critic network at the top, and a group of actor networks, one for each component, at the bottom.

Each component of the system is controlled by the stochastic policy $π (a | b, θ^{π})$ provided by a group of multi-agent actor networks, defined as a function of parameters $θ^{π}$ , as illustrated on the right side of Fig. 2 with light blue bars. In many applications, DRL policies are nearly deterministic after training, suggesting one action in particular, although stochastic policies are often optimal in constrained environments Andriotis and Papakonstantinou (2021). In our implementation, agents act as independent units in a decentralized manner, i.e., given the belief $b$ , the action selected by one actor is not affected by the actions selected by the other actors:

π (a | b) = \prod_{l = 1}^{N_{C}} π_{l} (a^{(l)} | b)

(17)

The input to the actor networks corresponds to the marginal belief states of all components along with the deterioration rate and the time step encoded as a zero-one vector. For instance, if the environment is described by the factored POMDP represented on the left side of Fig. 2, the actor networks receive the deterioration belief states $b (s_{d})$ and deterioration rate states $b (s_{τ})$ for all components (in $| S_{d} | \times N_{c}$ and $| S_{τ} | \times N_{c}$ matrix formats, respectively), plus an input indicating the time step $t \in t_{N}$ . If deterioration dependence is included through a hierarchical Gaussian model, as explained in Section 3, then the conditional beliefs $b (s_{d} | α)$ and the hyperparameter beliefs $b (α)$ should also be tracked while simulating the deterioration environment. Even for environments under deterioration dependence, the neural networks only receive as input the components’ marginal beliefs $b^{(l)}$ , for all $l$ , computed by following the steps listed in Algorithm 2. ReLU activation functions are used for the hidden layers of the actor networks, and the output layer is activated by a softmax function, generating the output policy as a probability distribution over the available actions.
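The actor input and the factorized policy of Eq. 17 can be sketched in plain NumPy. The layer sizes, the random initialization, the one-hot time encoding of length $t_{N} + 1$, and all function names below are assumptions made for illustration; they do not reproduce the trained networks.

```python
import numpy as np

def build_actor_input(b_d, b_tau, t, t_N):
    """Concatenate marginal beliefs of all components with a one-hot time step."""
    time_one_hot = np.zeros(t_N + 1)
    time_one_hot[t] = 1.0
    return np.concatenate([b_d.ravel(), b_tau.ravel(), time_one_hot])

def init_actor(n_in, n_hidden, n_actions, rng):
    """Hypothetical random initialization of a two-hidden-layer actor."""
    return (rng.normal(0, 0.1, (n_hidden, n_in)), np.zeros(n_hidden),
            rng.normal(0, 0.1, (n_hidden, n_hidden)), np.zeros(n_hidden),
            rng.normal(0, 0.1, (n_actions, n_hidden)), np.zeros(n_actions))

def actor_forward(x, params):
    """ReLU hidden layers followed by a softmax over the available actions."""
    W1, b1, W2, b2, W3, b3 = params
    h = np.maximum(0.0, W1 @ x + b1)
    h = np.maximum(0.0, W2 @ h + b2)
    z = W3 @ h + b3
    e = np.exp(z - z.max())          # shift for numerical stability
    return e / e.sum()

def joint_action_probability(x, actors, actions):
    """Eq. 17: product of the per-component action probabilities."""
    return float(np.prod([actor_forward(x, p)[a] for p, a in zip(actors, actions)]))
```

Because every actor receives the same system-level input `x`, each component policy is informed by the beliefs of all components even though the actions are sampled independently.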

The actor network weights are updated according to the noisy rewards collected from a batch of previous experiences, following an off-policy training approach that is more sample-efficient than on-policy training algorithms. A replay buffer Schaul et al. (2015) stores beliefs $b_{t}$ , actions $a_{t}$ , rewards $r (b_{t}, a_{t})$ and behavior policies $μ_{t}$ , experienced during the simulations of the deterioration environment. The off-policy gradient estimator is thus formulated with samples generated by a behavior policy $μ$ , different from $π$ , and corrected with the truncated importance sampling weight $w_{t} = \min \{c, π (a_{t} | b_{t}) / μ (a_{t} | b_{t})\}$ , with $c > 0$ Andriotis and Papakonstantinou (2019):

g_{θ^{π}} = E_{a_{t} \sim μ} [w_{t} \{\sum_{i = 1}^{N_{c}} \nabla_{θ^{π}} \log π_{i} (a_{t}^{(i)} | b_{t}, θ^{π})\} A^{π} (b_{t}, a_{t} | θ^{V})]

(18)

The advantage function $A^{π} (b_{t}, a_{t})$ indicates how advantageous action $a_{t}$ is with respect to the current estimate of the value function $V^{π} (b_{t})$ , and is defined in a temporal-difference fashion as:

A^{π} (b_{t}, a_{t} | θ^{V}) \simeq r (b_{t}, a_{t}) + γ V (b_{t + 1} | θ^{V}) - V (b_{t} | θ^{V})

(19)

The value function is approximated by the critic network, defined as a function of parameters $θ^{V}$ , as illustrated on the right side of Fig. 2. Whereas the critic network receives the same input as the actor networks (components' marginal beliefs, deterioration rates, and a time step indicator), its output is the value function, i.e., one scalar value that indicates the expected cumulative reward of the system. The critic network provides the value function used by the advantage function $A^{π} (b_{t}, a_{t} | θ^{V})$ , thereby acting as a critic that assesses how good the actions taken by the actor networks are. The training of the critic network also follows a temporal difference approach, collecting experiences from the replay buffer and adjusting the critic parameters $θ^{V}$ according to the gradient:

g_{θ^{V}} = E_{a_{t} \sim μ} [w_{t} \nabla_{θ^{V}} V^{π} (b_{t} | θ^{V}) A^{π} (b_{t}, a_{t} | θ^{V})]

(20)
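The two scalar quantities that weight the gradients in Eqs. 18 and 20 can be computed as in the following sketch. The truncation constant `c = 2.0` is an arbitrary placeholder (the text only requires $c > 0$), and the function names are hypothetical; the gradients themselves would be obtained with an automatic differentiation library, which is omitted here.

```python
import numpy as np

def truncated_is_weight(pi_prob, mu_prob, c=2.0):
    """w = min(c, pi(a|b) / mu(a|b)); c = 2.0 is a placeholder value."""
    ratio = np.asarray(pi_prob, dtype=float) / np.asarray(mu_prob, dtype=float)
    return np.minimum(c, ratio)

def td_advantage(r, v, v_next, gamma=0.95, terminal=False):
    """Eq. 19, dropping the bootstrap term when the next belief is terminal."""
    bootstrap = 0.0 if terminal else gamma * v_next
    return r + bootstrap - v

# Each sampled experience contributes w * A times grad(log pi) to the actor
# gradient (Eq. 18) and w * A times grad(V) to the critic gradient (Eq. 20).
w = truncated_is_weight(pi_prob=0.9, mu_prob=0.3)
A = td_advantage(r=-1.0, v=-10.0, v_next=-8.0)
sample_weight = w * A
```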

All the algorithmic steps are described in Algorithm 3. With our proposed method, we are able to find optimal I&M policies for structural systems featuring very high-dimensional state, action, and observation spaces. Moreover, the obtained DDMAC policies intrinsically account for system effects (Section 3), as the actor networks are adjusted according to the rewards collected by simulating the deteriorating environment at the system level. Specifically, the integration of DDMAC with a deterioration environment simulated with a factored POMDP (Section 2) enables the identification of optimal I&M policies that concurrently consider the following system effects:

Deterioration dependence among components (statistical dependence): A Gaussian hierarchical model efficiently captures the deterioration dependence, e.g., in the initial crack size or loading. The belief of each component, $b (s_{d} | α)$ , is conditional on the common hyperparameter(s). Under statistical deterioration dependence, information collected by inspecting one component informs the belief updates of other components, according to the specified deterioration correlation. The influence of this system effect on the policy is explored via numerical experiments in Section 5, for a 9-out-of-10 system and for a steel frame structural system subject to fatigue deterioration.
System structural reliability (structural dependence): Failure risk is computed at the system level by multiplying the annual increase in the system failure probability, which is defined as a function of the components' structural health, by a negative reward $r_{f}$ , as shown in Eq. 13. The actors, even though acting individually, are all conditioned on the beliefs of all components, and are thus informed about the system reliability. DDMAC is able to intrinsically adjust the policy according to the relative importance of each component to the system structural reliability, as demonstrated with the numerical experiments conducted for the steel frame structural system (Section 5).
Inspection and maintenance cost model (cost dependence): A campaign cost $r_{c a m p}$ is included, in the applications, as a base cost, if at least one component is inspected or repaired, plus an additional inspection or repair cost for each inspected or repaired component, as shown in Eq. 15. Since DDMAC collects rewards at the system level, the campaign cost model affects the resulting I&M policies, concentrating inspection and repair actions at particular time steps, as observed in the numerical experiments conducted for the 9-out-of-10 system (Section 5).

Initialize replay buffer

Initialize actor and critic network weights $θ^{π}, θ^{V}$

for $episode = 1, M$

for $t = 1, t_{N}$

Select action $a_{t}$ at random according to exploration noise

Otherwise select action $a_{t} \sim μ_{t} = \{π_{j} (\cdot | b_{t}, θ^{π})\}_{j = 1}^{N_{c}}$

Collect reward $r (b_{t}, a_{t})$

Observe $o_{t + 1}^{(l)} \sim p (o_{t + 1}^{(l)} | b_{t}, a_{t})$ for $l = 1, 2, . . ., N_{c}$

Compute beliefs $b_{t + 1}$ : updateBelief( $b_{t}, a_{t}, o_{t + 1}$ )

Store experience $(b_{t}, a_{t}, μ_{t}, r (b_{t}, a_{t}), b_{t + 1})$ in replay buffer

Sample batch of $(b_{i}, a_{i}, μ_{i}, r (b_{i}, a_{i}), b_{i + 1})$ from replay buffer

If $b_{i + 1}$ is terminal state: $A_{i}^{π} = r (b_{i}, a_{i}) - V^{π} (b_{i}, θ^{V})$

Otherwise: $A_{i}^{π} = r (b_{i}, a_{i}) + γ V^{π} (b_{i + 1}, θ^{V}) - V^{π} (b_{i}, θ^{V})$

Update actor parameters $θ^{π}$ according to gradient: $g_{θ^{π}} \simeq \sum_{i} w_{i} \{\sum_{j = 1}^{N_{c}} \nabla_{θ^{π}} \log π_{j} (a_{i}^{(j)} | b_{i}, θ^{π})\} A_{i}^{π}$

Update critic parameters $θ^{V}$ according to gradient: $g_{θ^{V}} \simeq \sum_{i} w_{i} \nabla_{θ^{V}} V^{π} (b_{i} | θ^{V}) A_{i}^{π}$

end for

Algorithm 3: Deep Decentralized Multi-agent Actor Critic (DDMAC)

5 Numerical experiments

DDMAC inspection and maintenance policies are tested for a 9-out-of-10 system under fatigue deterioration, exploring the different statistical, structural, and cost dependencies. A second set of numerical experiments is conducted to investigate the efficiency of DDMAC policies for a 2D steel frame, also known as Zayas frame, used as a benchmark structural system for offshore engineering collapse analyses Popov et al. (1980); Moan et al. (1991). The numerical experiments are conducted on an Intel Core i9-7900X processor with a clock speed of 3.30 GHz.

Fatigue deterioration model

The components explored throughout the numerical investigations are assumed to be exposed to a similar fatigue deterioration, described according to the Markovian model, originally proposed in Ditlevsen and Madsen (2007):

d_{t + 1} = [(1 - \frac{m}{2}) C_{F M} S_{R}^{m} π^{m / 2} n + d_{t}^{1 - m / 2}]^{2 / (2 - m)}

(21)

where the evolution of the crack depth, $d$ , over time, $t$ , follows a linear-elastic fracture mechanics law with material parameters $C_{FM}$ and $m$ , stress range $S_{R}$ , and $n$ annual stress cycles. At the component level, failure occurs if the crack depth, $d$ , exceeds a critical size, $d_{c}$ , that corresponds to the plate thickness. In a stochastic environment, the initial crack depth, $d_{0}$ , and the fracture mechanics model parameters are represented either by random variables or by deterministic parameters, as listed in Table 1.

Variable	Distribution	Mean	SD
$\ln (C_{FM})$	Normal	$-35.2$	$0.5$
$S_{R}$ (N/mm$^{2}$)	Normal	$70$	$10$
$d_{0}$ (mm)	Exponential	$1$	$1$
$m$	Deterministic	$3.5$	-
$n$ (cycles)	Deterministic	$10^{6}$	-
$t_{N}$ (yr)	Deterministic	$30$	-
$d_{c}$ (mm)	Deterministic	$20$	-

Table 1: Random variables and deterministic parameters utilized to model the fatigue deterioration of the components in the numerical experiments.

The failure probability $p_{F_{t}}$ , defined as $p_{F_{t}} = P r [g_{t} \leq 0]$ , can be computed following, for instance, a through-thickness failure criterion Hlaing et al. (2020) by formulating the failure limit state at time step $t$ as:

g_{t} = d_{c} - d_{t}

(22)
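Eqs. 21 and 22 can be combined into a crude Monte Carlo estimate of the component failure probability, sampling the random variables of Table 1. This is only a sketch: the function names, the sample size, the clipping of negative stress-range samples, and the treatment of a non-positive bracket in Eq. 21 as an unbounded crack are assumptions of this example, and the paper itself relies on the discretized DBN rather than on sampling.

```python
import numpy as np

def crack_growth_step(d, C, S_R, m=3.5, n=1e6):
    """One application of Eq. 21 (valid for m > 2).

    When the bracketed term is not positive, the crack is treated as having
    grown without bound (an assumption of this sketch).
    """
    d = np.asarray(d, dtype=float)
    base = (1 - m / 2) * C * S_R**m * np.pi ** (m / 2) * n + d ** (1 - m / 2)
    safe = np.where(base > 0, base, 1.0)          # avoid powers of negatives
    return np.where(base > 0, safe ** (2 / (2 - m)), np.inf)

def simulate_failure_probability(n_samples=20_000, t_N=30, d_c=20.0, seed=0):
    """Estimate p_F,t = Pr[d_c - d_t <= 0] for t = 1..t_N by sampling Table 1."""
    rng = np.random.default_rng(seed)
    C = np.exp(rng.normal(-35.2, 0.5, n_samples))            # ln(C_FM) ~ Normal
    S_R = np.maximum(rng.normal(70.0, 10.0, n_samples), 0.0)  # clip negatives
    d = rng.exponential(1.0, n_samples)                       # d_0, mean 1 mm
    p_F = np.empty(t_N)
    for t in range(t_N):
        d = crack_growth_step(d, C, S_R)
        p_F[t] = np.mean(d_c - d <= 0)   # limit state of Eq. 22
    return p_F
```

Because the crack depth can only grow under Eq. 21, the estimated failure probability is non-decreasing in time.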

The fatigue deterioration is encoded in a deterioration rate DBN model, ultimately shaping a factored POMDP, as shown on the left side of Fig. 2 and presented in Section 2. The continuous crack depth, $d$ , is discretized into $| S_{d} | = 30$ states conditional on $| S_{τ} | = 31$ fully observable deterioration rate states. The intervals and state space utilized for this deterioration rate model are listed in Table 2.

Variable	Interval boundaries
Deterioration rate model
$S_{d}$	$0, \exp \{\ln (10^{-4}) : \frac{\ln (d_{c}) - \ln (10^{-4})}{28} : \ln (d_{c})\}, \infty$
$S_{τ}$	$0 : 1 : 30$

Table 2: Description of the discretization scheme implemented for the factored deterioration rate POMDP.
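The crack-depth boundaries of Table 2 can be generated as follows: 29 logarithmically spaced points between $10^{-4}$ and $d_{c}$ , plus the outer boundaries 0 and $\infty$ , give 31 boundaries and hence 30 states. The function names and the convention that the last interval collects all crack depths beyond $d_{c}$ are assumptions of this sketch.

```python
import numpy as np

def crack_state_edges(d_c=20.0, d_min=1e-4, n_log_points=29):
    """Interval boundaries for S_d following the scheme listed in Table 2."""
    inner = np.exp(np.linspace(np.log(d_min), np.log(d_c), n_log_points))
    return np.concatenate(([0.0], inner, [np.inf]))

def discretize(d, edges):
    """Map crack depths to state indices 0 .. len(edges) - 2."""
    idx = np.searchsorted(edges, d, side="right") - 1
    return np.clip(idx, 0, len(edges) - 2)

edges = crack_state_edges()
n_states = len(edges) - 1   # number of discrete crack states
```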

In terms of the observation model, the inspection quality is quantified with a Probability of Detection curve $PoD (d) \sim Exp [μ = 8]$ . Further details on the fatigue deterioration and observation models, including an extensive investigation of the discretization scheme, can be found in Morato et al. (2022).
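One plausible reading of the notation $PoD (d) \sim Exp [μ = 8]$ is that the detection probability follows the cumulative distribution function of an exponential distribution with mean 8 mm. The sketch below adopts that interpretation, which is an assumption here and should be checked against Morato et al. (2022); the function names are hypothetical.

```python
import numpy as np

def pod(d, mu=8.0):
    """Assumed PoD curve: exponential CDF with mean mu, evaluated at depth d."""
    return 1.0 - np.exp(-np.asarray(d, dtype=float) / mu)

def observation_likelihood(d, detected, mu=8.0):
    """Likelihood of a binary inspection outcome given crack depth d."""
    p_detect = pod(d, mu)
    return p_detect if detected else 1.0 - p_detect
```

Under this reading, a crack of depth 8 mm would be detected with probability $1 - e^{-1}$ , i.e., roughly 63% of the time.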

5.1 I&M planning for a 9-out-of-10 system

The system explored in this application is composed of ten components, each of which is subjected to a non-stationary fatigue deterioration, as described earlier in this Section. The system is assumed to be functional if at least 9 out of 10 components are operational (not failed), and is thus characterized by a single step change in terms of redundancy with respect to a series system, which would correspond to a 10-out-of-10 system. The system failure probability $p_{F_{sys}}$ is efficiently computed here, as a function of the failure state of all components, by following the recursive method proposed in Barlow and Heidtmann (1984).
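A generic recursion for the failure probability of a k-out-of-n system is sketched below. It builds the distribution of the number of working components one component at a time, and it assumes that component failures are independent given the supplied probabilities; it is written in the spirit of recursive k-out-of-n methods and is not claimed to reproduce the exact algorithm of Barlow and Heidtmann (1984). The function name is hypothetical.

```python
def k_out_of_n_failure_probability(p_fail, k):
    """Probability that fewer than k of the n components are working.

    p_fail: sequence of component failure probabilities (assumed independent)
    k     : minimum number of working components for the system to function
    """
    n = len(p_fail)
    # dist[j] = probability that exactly j of the processed components work
    dist = [1.0] + [0.0] * n
    for i, p in enumerate(p_fail):
        # Update in place from high to low so dist[j - 1] is still the old value.
        for j in range(i + 1, 0, -1):
            dist[j] = dist[j] * p + dist[j - 1] * (1.0 - p)
        dist[0] *= p
    return 1.0 - sum(dist[k:])

# Illustrative use: ten components, each with a 10% failure probability.
p_sys = k_out_of_n_failure_probability([0.1] * 10, k=9)
```

Setting `k = n` recovers the series system mentioned in the text, for which a single component failure implies system failure.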

Description of the I&M decision problem

A total of eight I&M planning scenarios are investigated, exploring different deterioration, risk, and cost dependencies among components. In terms of deterioration dependence, some environments are specified with an equally correlated initial crack size, $d_{0}$ , among components, defined by an equal correlation $ρ_{e q} = 0$ , $ρ_{e q} = 0.4$ and $ρ_{e q} = 0.8$ , respectively. Additionally, a deterioration environment is examined with an unequally correlated $ρ_{u q}$ initial crack size, $d_{0}$ , among components. The unequal deterioration dependence case is originally specified with a different correlation among components of either $ρ = 0.4$ , $ρ = 0.6$ , or $ρ = 0.8$ , as shown on the left side of Fig. 3. After a Gaussian hierarchical structure with two hyperparameters is optimized, by computing the $λ$ parameters with the objective of satisfying Eq. 6, an approximated correlation structure is obtained with relatively small errors, as shown on the right side of Fig. 3. The approximated correlation structure with two hyperparameters is deemed to be sufficiently accurate for the conducted experiments. Otherwise, a more accurate correlation structure can be achieved by adding more hyperparameters, at the expense of additional computational cost, as explained in Section 3. For each of the aforementioned environments, specified with different deterioration dependencies, two I&M cost models are further investigated, i.e., an I&M cost model that incurs inspection and repair costs individually; and an I&M cost model in which an initial campaign cost exists, if at least one component is inspected or repaired, plus a cost surplus per inspected or repaired component.

Figure 3: Representation of the initial crack size dependence among the components of the unequally correlated 9-out-of-10 system. The original deterioration correlation is represented by the colored matrix on the left. The approximated correlation structure, resulting from the derived Gaussian hierarchical model with two hyperparameters, is displayed on the right colored matrix.

Since each component, herein denoted as fatigue hotspot, contains 930 states, defined by the joint space of 30 crack states $S_{d}$ and 31 deterioration rate states $S_{τ}$ , there are a total of 9,300 states in the experiments that do not consider deterioration correlation ( $ρ_{eq} = 0$ ). For experiments under equal correlation ( $ρ_{eq}$ ), the total number of states becomes 744,080, rising to $5.95 \cdot 10^{7}$ for deterioration environments under unequal correlation ( $ρ_{uq}$ ). The increase of the state space results from the incorporation of the Gaussian hierarchical model, in which crack and deterioration rate states are formulated conditional on the hyperparameter states. When the deterioration correlation is modeled equally for all components, one hyperparameter is sufficient to satisfy Eq. 6, while two hyperparameters are used for the case of unequal correlation, as explained earlier. Each hyperparameter is discretized into 80 states, initially prescribed with equal probability for each state. Note here the importance of optimizing the number of hyperparameters included in the model, as the state space grows exponentially with the number of considered random variables. By formulating the POMDPs’ transition model as dynamic Bayesian networks, the dimensionality is reduced from $| S_{d} |^{2} | S_{τ} |^{2}$ , in a flat structure, to $| S_{d} |^{2} | S_{τ} | + | S_{τ} |^{2}$ for the uncorrelated scenario. In that case, the transition model of a single component is reduced from 864,900 to 28,861 elements. Moreover, formulating the environment through a hierarchical deterioration dependence model enables the decoupling of the joint state space at the system level, which would grow exponentially with the number of components for a flat POMDP structure, but instead grows linearly now. For instance, in the setting under unequal deterioration dependence, the joint space would be described by $(| S_{d} | | S_{τ} |)^{N_{C}}$ , equaling $930^{10}$ states, while it is now instead defined by $| S_{d} | | S_{τ} | N_{C} | S_{α} | | S_{β} | + | S_{α} | + | S_{β} |$ , thus resulting in $930 \cdot 10 \cdot 80^{2} + 80 \cdot 2 \simeq 5.95 \cdot 10^{7}$ states in the hierarchical model, with two hyperparameters ( $α$ and $β$ ) discretized into 80 states each.
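The state-space figures quoted above follow from simple counting, which the short sketch below reproduces. The function name and the dictionary keys are illustrative only.

```python
def state_space_sizes(S_d=30, S_tau=31, N_c=10, S_hyper=80):
    """Reproduce the state and transition-model counts quoted in the text."""
    per_component = S_d * S_tau
    return {
        "per_component": per_component,
        "uncorrelated": per_component * N_c,
        "equal_correlation": per_component * N_c * S_hyper + S_hyper,
        "unequal_correlation": per_component * N_c * S_hyper**2 + 2 * S_hyper,
        "flat_transition_elements": S_d**2 * S_tau**2,
        "factored_transition_elements": S_d**2 * S_tau + S_tau**2,
    }

sizes = state_space_sizes()
```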

In terms of the neural network architecture, DDMAC is configured in this application with two hidden fully-connected layers of 100 neurons for each actor network, and two hidden fully-connected layers of 200 neurons for the critic network. The learning rate is adjusted during the training of the networks from $10^{-4}$ to $10^{-5}$ for the actor, and from $10^{-3}$ to $10^{-4}$ for the critic. Exploration is initially set to 100% random noise, decreasing linearly over the first 20,000 episodes to 1%, which is then held constant for the remaining episodes. Training was found to be more stable and efficient when do-nothing actions are prioritized at the beginning of the training, because this allows more states to be visited and thus improves exploration.
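The exploration schedule described above amounts to a linear decay followed by a constant floor, as in this minimal sketch. The function name and the convention that episodes are counted from zero are assumptions.

```python
def exploration_rate(episode, start=1.0, end=0.01, decay_episodes=20_000):
    """Probability of selecting a random action at a given training episode."""
    if episode >= decay_episodes:
        return end
    return start + (end - start) * episode / decay_episodes
```

For example, halfway through the decay (episode 10,000) the random-action probability under this sketch is 0.505.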

Following typical fatigue I&M planning settings, inspection and repair decisions are combined into three available actions per component: do-nothing / no-inspection, do-nothing / inspection, and perfect repair / no-inspection. The action perfect-repair / inspection is considered a priori suboptimal, without loss of generality, as it would be unusual to plan an inspection just after a component returns to its initial state. Inspections provide binary indications, i.e., detection or no-detection of a crack, according to the observation model. In terms of costs, two different scenarios are considered. In the first case, inspection and repair costs are incurred independently per component, i.e., $r_{ins} = -1$ and $r_{rep} = -20$ , respectively. In the second case, a campaign cost of $r_{camp} = -5$ is incurred if at least one component is inspected or repaired, plus a surplus per inspected or repaired component of $r_{ins} = -0.2$ and $r_{rep} = -20$ money units, respectively. The consequence of a system structural failure is $r_{f} = -10{,}000$ money units for both cases, and the discount factor $γ$ is 0.95 in all the experiments.

In order to verify the optimality of the obtained DDMAC policies, predefined heuristic decision rules, adopted from Luque and Straub (2019), are optimized and compared against the results provided by DDMAC strategies. The investigated heuristic-based policies are dictated by (i) the interval between equidistant inspections, $Δ_{ins}$ , (ii) the number of components, $n_{ins}$ , inspected at each campaign, in which the $n_{ins}$ components with the highest failure probability $p_{F}$ are prioritized, and (iii) the rule that a perfect repair is undertaken whenever a crack is detected. Initially, all the combinations of heuristics, i.e., interval between inspections $Δ_{ins}$ and number of components inspected per campaign $n_{ins}$ , are evaluated over 3,000 policy realizations. Then, the 5 sets of heuristic rules that yielded the minimum expected costs are evaluated again, this time over 10,000 policy realizations. Finally, the set of heuristics that minimizes the expected total costs is selected for comparison against DDMAC-based policies, which are also evaluated over 10,000 policy realizations. The resulting set of optimized heuristics is listed in Appendix A (Table A1).

Results and discussion

The life-cycle expected costs obtained by evaluating the investigated policies are displayed in Fig. 4, sorted in two main categories according to the specified cost model, comparing DDMAC and optimized heuristic policies and investigating the effect of adding campaign I&M costs. For each category, four degrees of deterioration correlation are compared, i.e., no correlation ( $ρ_{eq} = 0$ ), equal correlation ( $ρ_{eq} = 0.4$ ), equal correlation ( $ρ_{eq} = 0.8$ ), and unequal correlation ( $ρ_{uq}$ ). In all explored numerical examples, DDMAC outperforms the optimized heuristics, yielding life-cycle cost reductions ranging from 9.7% to 21.9%. The difference is more pronounced for the case in which inspections and repairs are planned separately, because the explored heuristic decision rules plan inspections for a group of $n_{ins}$ components and are thus more tailored to the campaign I&M setting. A closer examination reveals that the DDMAC policy provides lower inspection, repair, and failure expected costs than the heuristics-based policies for the uncorrelated deterioration experiment specified with the individual I&M cost model. In this case, the savings on repairs are more significant, probably because the heuristic policy prescribes a repair any time a crack is detected, while the DDMAC-based policy usually requires more evidence than a single detection.

With regard to deterioration dependence, highly correlated environments generally result in lower expected total costs, as observed in Fig. 4. In environments under deterioration correlation, information collected on one component is also informative about the states of the other components in the system. In the policy realizations of Fig. 5, one can clearly observe the effect of inspections and repairs on the resulting component and system failure probabilities. One can also observe the oscillation of the hyperparameters around an uncorrelated mean of 0 when inspection outcomes among components are not consistent, i.e., some inspections indicate damage and others do not, whereas the opposite is observed when all inspections are consistent, e.g., no damage indications. For instance, a crack detection observed on component 9 for the case of equal correlation $ρ_{eq} = 0.8$ leads to an increase in the failure probability of the other non-repaired components, as indicated inside the green rectangles in the lower-left plot of Fig. 5. This effect can also be visualized when observing the impact of a crack detection on component 4, for the case under unequal deterioration dependence, marked by green rectangles in the lower-right plot of Fig. 5. In this case, components 3 and 5, which are highly correlated with component 4, as indicated in Fig. 3, are clearly affected by the observed crack detection.

Figure 4: Expected cost results of all the numerical experiments conducted for the 9-out-of-10 system, divided into campaign $E [r_{camp}]$ , inspection $E [r_{ins}]$ , perfect-repair $E [r_{rep}]$ and failure $E [r_{F}]$ expected costs. On the left, DDMAC and heuristic policies, specified with an individual I&M cost model, are compared for different deterioration correlation environments. Likewise, on the right, DDMAC and heuristic policies are compared for different levels of deterioration dependence, yet specified with a campaign I&M cost model.

Figure 5: 9-out-of-10 system policy realizations: (Upper-left) uncorrelated deterioration and individual cost model; (Upper-right) uncorrelated deterioration and campaign cost model; (Lower-left) equally correlated environment ( $ρ_{e q} = 0.8$ ) and individual cost model; (Lower-right) unequally correlated deterioration and individual cost model. Failure probabilities at the component level are depicted by blue lines, inspection indications are represented by upwards (detection) or downwards (no-detection) triangles and repairs are circled in red. At the system level, the failure probability is represented by green diagrams and the evolution of the hyperparameters, under correlated deterioration, is described by light-blue graphs. As mentioned in the text, specific policy and inference effects are marked with coloured rectangles.

Figure 6: Comparison of DDMAC policies specified with either individual (light blue) or campaign (dark blue) I&M cost models. For each case, the number of inspected components per time step is represented in a histogram based on 10,000 policy realizations.

Moreover, highly correlated deteriorating environments induce higher variability in the expected total costs, as shown by the black error bars in Fig. 4. The variability can be attributed to the very different policy paths that follow inspections, depending on whether the collected observations indicate cracks or not. If a crack is detected on one component, the other components’ failure probabilities increase, and repair actions or additional inspections will be planned thereafter. Conversely, if a crack is not detected, the failure probability of all the correlated components will decrease, inducing fewer repair actions in the future. Interestingly, policies under dependent environments do not always plan fewer inspections, as might be expected given the additional information gained through the underlying correlation among components; instead, policies in highly correlated environments might plan more inspections, often resulting in significant failure risk reductions, as displayed for the case with $ρ_{eq} = 0.8$ in Fig. 4. The effect of repair actions on the deterioration dependence structure can also be clearly observed. Once an element is repaired, its damage belief becomes independent from the global hyperparameter(s), and thus inspection outcomes from other components do not influence the repaired component, and vice versa. As illustrated with a red rectangle in the lower-left plot of Fig. 5 (i.e., equal correlation $ρ_{eq} = 0.8$ setting), after component 4 is repaired, its failure probability is no longer influenced by inspection results retrieved from other components, e.g., a crack detection observed on component 9.

To further investigate the effect of including campaign utilities within the cost model, a histogram over 10,000 policy realizations is shown in Fig. 6 for a DDMAC policy in which inspection and repair costs are incurred separately (light blue) and another DDMAC policy considering the expense of campaign actions (dark blue). The deterioration environment for both DDMAC policies does not consider deterioration correlation among components. The emphasis of Fig. 6 is on the number of components inspected on every occasion an inspection is planned. If inspections and repairs are paid individually for each component, information on only one or two components usually suffices, whereas inspection of more than five components per time step is rare. In contrast, eight-component inspections become the predominant inspection decision if an initial campaign cost is included in the cost model. This system effect can be visualized, for example, in the policy realizations shown at the top of Fig. 5, in which the components marked with black rectangles are inspected in the same year for the DDMAC policy under campaign costs. The policy for the campaign cost model, therefore, tends to group inspection and repair actions in the same year, avoiding, if possible, unnecessary campaign costs associated with one or two inspected components. In some cases, however, campaigns are planned for only one or two components, in contrast with the static inspection decision rules imposed by heuristics, where a specific number of inspections is fixed for all the inspection campaigns. Based on the above, we observe that DDMAC is able to devise dynamic I&M policies according to the specified cost model, whether under campaign costs or individual inspection and repair costs, and to provide an advanced, flexible, and adaptive decision-making framework.

5.2 I&M planning for Zayas frame

In the first set of numerical experiments, conducted for a 9-out-of-10 system, the focus was mainly on the deterioration dependence among identical components and the effects of including a campaign cost within the cost model. In this second application, we further explore how I&M DDMAC policies are able to inherently capture the relative importance of each element with respect to the system structural reliability. The structural system of study, in this case, is the 2-dimensional Zayas frame, well studied in many structural reliability analysis applications Popov et al. (1980); Moan et al. (1991); Schneider et al. (2017). The Zayas frame is composed of two columns which, along with 13 braces, sustain a rigid beam at the top. The geometry and material properties used in this work are presented in Appendix B (Fig. B1) and are the same as the ones used in Schneider et al. (2017).

Description of the I&M decision problem

In this application, DDMAC policies are identified for two I&M settings: (i) under equal deterioration dependence among components with $ρ_{e q} = 0.4$, and (ii) assuming independence among components’ deterioration. The state space for the latter includes 30 crack states along with 31 deterioration rate states for each of the 22 hotspots (i.e., components), resulting in a total of 20,460 input variables ($30 \cdot 31 \cdot 22$), whereas the number of input variables for the former climbs to approximately $1.6 \cdot 10^{6}$, including 80 states for the one discretized hyperparameter. The benefits of the proposed decoupled hierarchical structure are very significant, since the state space would have a dimension of $930^{22}$ if the joint states of all hotspots were explicitly considered.
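The input sizes quoted above can be checked with a few lines of arithmetic. The following is only a bookkeeping sketch with illustrative variable names; the count for the correlated setting assumes that the 930 states of each hotspot are paired with the 80 hyperparameter states, which is one reading consistent with the quoted figure of approximately $1.6 \cdot 10^{6}$.

```python
# Bookkeeping sketch for the Zayas frame input sizes (names are illustrative).
n_crack_states = 30   # discretized crack states per hotspot
n_rate_states = 31    # deterioration rate states per hotspot
n_hotspots = 22       # fatigue hotspots (components)
n_hyper_states = 80   # states of the single discretized hyperparameter

# States of one hotspot taken in isolation.
states_per_hotspot = n_crack_states * n_rate_states

# Independent setting: one belief vector per hotspot.
inputs_independent = states_per_hotspot * n_hotspots

# Correlated setting (assumed reading): each hotspot belief is conditional
# on the hyperparameter, so its size is multiplied by the 80 hyperparameter states.
inputs_correlated = states_per_hotspot * n_hyper_states * n_hotspots

# Joint description of all hotspots, avoided by the decoupled structure.
joint_states = states_per_hotspot ** n_hotspots

print(states_per_hotspot, inputs_independent, inputs_correlated)
print(f"joint state space has {len(str(joint_states))} digits")
```

Under these assumptions the decoupled representations remain in the range of $10^{4}$ to $10^{6}$ inputs, while the joint description is a 66-digit number.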

Similarly to the experiments reported in Section 5.1, the decision maker can select among three actions per hotspot at each time step: do-nothing / no-inspection, do-nothing / inspection, and perfect repair / no-inspection. Again, inspections provide binary crack indications (detected or not detected), modeled identically for each component by the observation model described at the beginning of this section. As for the cost model, inspections and repairs (planned individually for each component) cost $r_{i n s} = - 1$ and $r_{p r} = - 15$ money units, respectively, while the system failure cost is defined as $r_{F} = - 50, 000$ money units. All costs are discounted to the present value by a $γ = 0.95$ factor. DDMAC’s architecture is similar to that of the first application, featuring two hidden fully-connected layers of 150 neurons for each actor, and two hidden layers of 300 neurons for the critic network. The learning rate, prioritization of actions, and additional exploration settings are defined as in the first application. The investigated heuristic-based policies rely on the same set of decision rules introduced in the former numerical experiments, accounting in this case for inspection intervals $Δ_{i n s}$ and inspected hotspots per campaign $n_{C}$. Both DDMAC and heuristic policies are evaluated over 30,000 episodes and the results, in terms of expected total costs, are showcased in Fig. 7.
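To make the role of the discount factor concrete, the short sketch below accumulates a sequence of yearly costs into a present value. It is an illustration rather than the cost model used in the experiments: the function name and the sample cost sequence are hypothetical, and the convention that the cost incurred at year $t$ is weighted by $γ^{t}$ with $t$ starting at zero is an assumption.

```python
def discounted_total(costs, gamma=0.95):
    """Present value of a cost sequence, weighting year t by gamma**t.

    Assumption: the first entry corresponds to t = 0 and is not discounted.
    """
    return sum(gamma ** t * c for t, c in enumerate(costs))


# Hypothetical realization for one hotspot: an inspection at year 0
# (cost -1), nothing at year 1, and a perfect repair at year 2 (cost -15).
yearly_costs = [-1.0, 0.0, -15.0]
print(discounted_total(yearly_costs))
```

With $γ = 0.95$, a cost incurred late in a 30-year horizon carries roughly one fifth of its nominal weight, so the timing of actions matters as much as their number.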

System failure probability

Offshore structures are exposed to fatigue and corrosion deterioration due to the combined cyclic effect of waves and wind in a harsh marine environment. Initial defects at geometric discontinuities or at welded regions (hotspots) grow over time, becoming critical if maintenance actions are not undertaken in time. In this study, and following the experiments conducted in Luque and Straub (2016); Schneider et al. (2017), a total of 22 hotspots are considered, located at the joints of the braces or columns. Each brace is associated with either one or two hotspots, as illustrated in Fig. 8, at critical locations for fatigue deterioration. The fatigue deterioration is assumed similar for all hotspots, modeled by the same deterioration process as for the 9-out-of-10 structural system (Section 5.1).

The failure of the system is defined here as the incapacity of the frame to withstand the concentrated horizontal load applied at the upper-left corner. At the component level, the health of each hotspot is described by the vector $F_{h}$ , in which $F_{h}$ is a binary variable with $F_{h} = 0$ indicating a hotspot failure and $F_{h} = 1$ corresponding to a hotspot survival. The failure probability of a hotspot $p_{F}^{(h)}$ corresponds thus to the probability of being in state $F_{h} = 0$ . At an element level, the state of each brace is represented by a vector $x_{e l}$ , considering $x_{e l} = 0$ if the element has failed and $x_{e l} = 1$ otherwise. Assuming that a brace fails if any of its associated hotspots fail, the failure probability of an element $p_{F}^{(e l)}$ , i.e., $P r (x_{e l} = 0)$ , can be therefore computed as a series system:

$$p_{F}^{(el)} = 1 - \prod_{h \in N_{h}} \left[ 1 - p_{F}^{(h)} \right] \qquad (23)$$
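The series-system expression in Eq. (23) can be sketched in a few lines. The function name and the numerical values are illustrative only, and the product form treats the hotspot failure probabilities passed in as independent inputs, as the equation is written.

```python
import math


def element_failure_probability(hotspot_failure_probs):
    """Eq. (23): a brace fails if any of its hotspots fails (series system)."""
    survival = math.prod(1.0 - p for p in hotspot_failure_probs)
    return 1.0 - survival


# Hypothetical brace with two hotspots.
print(element_failure_probability([0.01, 0.02]))
# Hypothetical brace with a single hotspot: the element inherits its probability.
print(element_failure_probability([0.03]))
```

For a brace with a single hotspot the expression reduces to the hotspot failure probability itself, while additional hotspots can only increase the element failure probability.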

At the system scale, the health of the frame depends on the state of all its constitutive elements, i.e., 13 braces, and the failure probability of the system $p_{F_{s y s}}$ is computed herein as a function of all the element state combinations. A total of 8,192 ($= 2^{13}$) non-linear static push-over simulations were run before the training of DDMAC with the computer code ‘USFOS’ (available within the software package Sesam) Søreide et al. (1993), so that the failure probability of the system conditional on each element state combination is explicitly and directly defined. The element configuration for each push-over simulation is arranged according to the element state vector $x_{e l}$, removing the braces associated with a failed state $x_{e l} = 0$. The resistance $L_{c o l} (x_{e l})$ of each element state combination is retrieved from the conducted push-over simulations.

The collapse probability of the frame is defined as the probability of the external horizontal load exceeding the structural system resistance, $P r (L > L_{c o l})$. In this case, the horizontal load is modeled as a lognormal random variable with mean $μ_{L} = 70$ kN and a 25% coefficient of variation, while no uncertainty is associated with the resistance, a reasonable assumption when the external load is highly uncertain in comparison with the resistance Schneider et al. (2017). The failure probability of the system, $p_{F_{s y s}}$, conditional on the element state vector $x_{e l}$ can then be defined directly from the probability density function of the load, $f_{L}$:

$$p_{F_{sys}}^{(x_{el})} = \int_{L_{col}}^{\infty} f_{L}(x) \, dx \qquad (24)$$

In the undamaged case, i.e., no elements are removed from the original configuration, the collapse load is 247 kN, resulting in a failure probability of approximately $10^{- 4}$. The state of the frame is, however, computed conditional on the state of all the elements, and to do so the probability of being in each state combination should be computed. We follow the iterative procedure proposed in Song and Kang (2009) to compute the probability of being in each element state $q \doteq p (x_{e l})$ as a function of the element failure probability $p_{F}^{(e l)}$ and the element survival probability $\bar{p}_{F}^{(e l)}$:

$$q_{[1]} = \begin{bmatrix} p_{F}^{(1)} & \bar{p}_{F}^{(1)} \end{bmatrix}^{T}, \qquad q_{[i]} = \begin{bmatrix} q_{[i-1]} \cdot p_{F}^{(i)} \\ q_{[i-1]} \cdot \bar{p}_{F}^{(i)} \end{bmatrix} \qquad (25)$$

Finally, the system failure probability $p_{F_{s y s}}$ is obtained by multiplying the system failure probability conditional on each element state, $p_{F_{s y s}}^{(x_{e l})}$, by the probability of being in that state, $q^{(x_{e l})}$, and summing over all element state combinations:

$$p_{F_{sys}} = \sum_{x_{el} \in X_{el}} \left[ p_{F_{sys}}^{(x_{el})} \cdot q^{(x_{el})} \right] \qquad (26)$$
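The recursion of Eq. (25) and the weighted sum of Eq. (26) can be illustrated on a small example. The sketch below is not the implementation used for the frame: the function names, the three-element system, and the table of conditional failure probabilities are hypothetical, and the element failure probabilities are treated as independent inputs, as the recursion is written. With this stacking order, entry $k$ of the state vector corresponds to the element states given by the binary digits of $k$, where digit $i-1$ equal to 1 means that element $i$ survives.

```python
import numpy as np


def state_probabilities(p_fail):
    """Eq. (25): stack q*p_F (element failed) over q*(1 - p_F) (element survived)."""
    q = np.array([p_fail[0], 1.0 - p_fail[0]])
    for p in p_fail[1:]:
        q = np.concatenate([q * p, q * (1.0 - p)])
    return q


def system_failure_probability(p_fail, p_sys_given_state):
    """Eq. (26): conditional failure probabilities weighted by state probabilities."""
    q = state_probabilities(p_fail)
    return float(np.dot(p_sys_given_state, q))


# Hypothetical 3-element system: it collapses with certainty once two or
# more elements have failed, i.e., when at most one element survives.
p_fail = [0.1, 0.2, 0.3]
n_states = 2 ** len(p_fail)
p_sys_given_state = np.array(
    [1.0 if bin(k).count("1") <= 1 else 0.0 for k in range(n_states)]
)

q = state_probabilities(p_fail)
print(f"sum of state probabilities: {q.sum():.3f}")
print(f"system failure probability: "
      f"{system_failure_probability(p_fail, p_sys_given_state):.3f}")
```

For the frame, the table of conditional probabilities would hold the 8,192 values obtained from the push-over simulations, so the state vector only needs to be rebuilt when the element failure probabilities are updated.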

Results and discussion

The comparison between DDMAC and optimized heuristics follows the same trend as that of the 9-out-of-10 structural system experiments. In terms of expected life-cycle costs, DDMAC policies outperform heuristic-based policies in the two tested settings, as shown in Fig. 7, with cost savings ranging from 20.1% to 22.8%. A slight decrease in the expected life-cycle costs can also be observed for the case under deterioration correlation, as a result of the reduction of failure risk. This is expected, since under deterioration dependence, i.e., when the initial crack size among the hotspots is correlated, an observation collected at one hotspot also provides information about other hotspots, accordingly updating their damage state belief. These beliefs are updated for both detection and no-detection observation outcomes, as illustrated in Fig. 8. At year 12, a crack is detected at the lower X-brace, and this observation shows up as a failure probability update for the other components, marked with a green rectangle in the plots, an effect that can also be observed clearly in the updated mean of the hyperparameter, $α$. In most policy realizations and hotspot inspections, however, the most likely observation outcome is no-detection, explaining the observed risk reduction in cases under deterioration correlation. The effect of deterioration dependencies among components on the resulting system failure probability can also be visualized in Fig. 8, e.g., a crack detection at hotspot 6 ultimately induces a kink in the system failure probability around year 20.

Figure 7: Expected cost results for the numerical experiments conducted for the Zayas frame, divided into inspection $E [r_{i n s}]$ , perfect-repair $E [r_{r e p}]$ and failure $E [r_{F}]$ expected costs. DDMAC and heuristics are compared under an uncorrelated deterioration environment at the top, and under an equally correlated environment ( $ρ_{e q} = 0.4$ ) at the bottom.

Essentially, DDMAC is able to discover the importance of each hotspot regarding the structural reliability of the frame. To explore this system effect, the Single Element Importance (SEI) measure is calculated for each hotspot. The concept of SEI, as defined in Straub and Der Kiureghian (2011), determines the importance of each element to the system structural reliability by subtracting the undamaged system failure probability $p_{F_{s y s}}$ from the system failure probability with the element removed ($\sim e l$). In this case, and since each element is defined as a series system of hotspots, the SEI can be directly computed for each hotspot $h$, determining each hotspot’s importance as:

$$\mathrm{SEI}_{h} = p_{F_{sys}}^{(\sim h)} - p_{F_{sys}} \qquad (27)$$
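Eq. (27) amounts to a table lookup once the system failure probability conditional on each element state combination is available. The sketch below uses a hypothetical two-element system with made-up conditional failure probabilities and a hypothetical hotspot-to-element mapping; the indexing convention (binary digit $i$ of the table index equal to 1 when element $i$ survives) is also an assumption made for illustration.

```python
def single_element_importance(p_sys_given_state, n_elements, hotspot_to_element):
    """Eq. (27): failure probability with the hotspot's element removed,
    minus the failure probability of the undamaged system."""
    intact = (1 << n_elements) - 1          # all elements surviving
    p_intact = p_sys_given_state[intact]
    sei = {}
    for hotspot, element in hotspot_to_element.items():
        removed = intact & ~(1 << element)  # clear the bit of the removed element
        sei[hotspot] = p_sys_given_state[removed] - p_intact
    return sei


# Hypothetical table for two elements, indexed by surviving-element bits:
# 0 -> both failed, 1 -> only element 0 survives,
# 2 -> only element 1 survives, 3 -> undamaged system.
p_sys_given_state = [1.0, 0.3, 0.05, 0.001]

# Hypothetical mapping: hotspots 1 and 2 sit on element 0, hotspot 3 on element 1.
hotspot_to_element = {1: 0, 2: 0, 3: 1}

for hotspot, value in single_element_importance(
    p_sys_given_state, 2, hotspot_to_element
).items():
    print(f"hotspot {hotspot}: SEI = {value:.3f}")
```

Because a brace fails when any of its hotspots fails, hotspots located on the same element share the same SEI in this sketch.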

The SEI of a vital element for the structural system is thus higher than the SEI of a less important component. The Single Element Importance (SEI) of each hotspot is shown in Fig. 9, along with histograms of the actions taken at each component during 30,000 DDMAC (dark blue) and optimized-heuristic (light blue) realizations. As represented by the dark green bar diagram at the top-right corner of Fig. 9, and in agreement with the findings reported in Schneider et al. (2017), the critical hotspots are located at the X-braces, whereas the less critical hotspots are the ones connecting the horizontal braces. While the do-nothing action is dominant and inspection actions are distributed similarly among components, the distribution of repair actions among hotspots differs between DDMAC and heuristic-based decision rules. DDMAC plans repairs mainly for elements that are important with respect to the global structural reliability, i.e., with a high SEI, such as hotspots 6 and 7, whereas components less important to the system structural reliability are less frequently repaired. In contrast, the heuristic-based policy plans component repairs nearly evenly, disregarding the influence of each hotspot on the reliability of the system. We can therefore conclude that DDMAC policies are able to inherently identify the system effects attributed to the structural and reliability importance of each element for the entire system.

Figure 8: Zayas frame sample policy realization in an equal deterioration correlation environment ( $ρ_{e q} = 0.4$ ). The failure probability of each hotspot is depicted by a blue line, inspection indications are represented by upwards (detection) or downwards (no-detection) triangles, and perfect repairs are circled in red. At the system level, the failure probability, at the top-left corner, and system-effects, as these are explained in the main text, are represented by a green line and squares, respectively. The evolution of the hyperparameters over time is plotted in a light blue diagram, at the top-right corner.

Figure 9: Histograms of DDMAC and heuristics-based policy actions of 30,000 realizations of this Zayas frame setting for a 30-year policy period. The Single Element Importance metric (SEI) associated with each fatigue hotspot is indicated at the top of each histogram and summarized at the green top-right diagram. The relative importance of each hotspot is also represented by color, with a darker red being a more critical element for the structural reliability of the system.

6 Concluding remarks

This paper introduces an efficient algorithmic framework for inference and optimal decision-making under uncertainty for engineering systems exposed to deteriorating environments. In terms of inference, a Gaussian hierarchical structure is presented, within a dynamic Bayesian network model, and further formalized here with the objective of enabling the treatment of engineering systems under general, unequal deterioration correlation settings, considering also the effect of maintenance actions. The proposed efficient inference framework is then seamlessly integrated with principled optimization methods by formulating the decision-making problem as a factored Partially Observable Markov Decision Process (POMDP), with its dynamics encoded by Bayesian network conditional structures. The system life-cycle realizations collected by simulating the specified POMDP are employed by a multi-agent actor-critic deep reinforcement learning algorithm, able to identify optimal strategies in very high dimensional state, action, and observation spaces, commonly found in practical structural and engineering systems. In particular, we demonstrate through numerical experiments that the proposed approach provides efficient inspection and maintenance (I&M) policies, outperforming state-of-the-art policies, and enables a systematic treatment of system effects, which are autonomously and intrinsically reflected in the identified strategies.

POMDP-based policies, parametrized in high-dimensional settings through the Deep Decentralized Multi-agent Actor Critic (DDMAC) algorithm, map the current belief state of the system to a probability distribution of possible actions. These stochastic policies are thus prescribed as a function of the belief state, which is a sufficient statistic of the history of actions and observations. Constructing the policies based on a sufficient statistic feature enables more effective optimal decision-making strategies than static optimization approaches, which are constrained by the limited space explored during the policy search. POMDP-based policies provide an additional flexibility to the decision maker, who can opt for an alternative decision at some point, for any reason, and the policy through the updated belief state will be automatically adapted thereafter, yielding near-optimal results.

DDMAC policies are approximated by actor neural networks, whose weights are learned according to noisy rewards collected at the system level. By including deterioration dependence among components in the simulated environment, and by formulating the cost model at the system level, DDMAC policies are able to intrinsically consider the following system effects:

(i) In deterioration dependent environments, observing the state of one component provides indirect information about the other components of the system, modulated by their degree of correlation. In the tested I&M planning scenarios, environments with higher correlation resulted in a reduction of expected costs, usually characterized by lower expected failure risk. As structural systems are designed according to high reliability standards, demanding a low failure risk, observations mostly indicate sound structural states, which in highly correlated environments results in a global reduction of failure risk. As opposed to independent deterioration settings, higher variability in the expected costs is observed in dependent environments, in which very different I&M policy scenarios can be experienced based on the acquired observations.
(ii) A clustering effect on inspections and repairs is observed in settings that include a campaign cost model, i.e., a fixed cost is activated if at least one component is repaired or inspected. In this case, policies seek to avoid planning single or few inspections and repairs at one time step. Instead, inspection and maintenance actions are generally grouped, saving the additional campaign cost associated with inspecting and repairing only a few components within one campaign.
(iii) Maintenance actions are influenced by the relative importance of the components to the system structural reliability. As observed in the steel frame application, repairs were mostly allocated to critical elements, whereas components less important to the global reliability were less often repaired.

In this work, the deterioration environment is formulated as a discrete state POMDP, in which exact Bayesian inference can be conducted. Further research can be focused on the proper development of continuous state POMDPs and/or on modeling, inference, and optimization procedures that would allow for further reduction of the state/action space dimensionality.

Acknowledgements

This research is funded by the National Fund for Scientific Research in Belgium F.R.I.A. - F.N.R.S. This support is gratefully acknowledged. Dr. Papakonstantinou would further like to acknowledge that this material is also based upon work supported by the U.S. National Science Foundation under Grant No. 1751941. Dr. Andriotis would like to acknowledge the support of the TU Delft AI Labs program. Finally, the authors would like to acknowledge the support provided by DNV GL Digital Solutions for granting access to the software package ‘Sesam’.

Appendix A. Optimized heuristic decision rules.

Setting	Deterioration correlation	Cost model	$Δ_{i n s}$	$n_{C}$
9-out-of-10 system	$ρ_{e q} = 0$	Individual	$6$	$10$
9-out-of-10 system	$ρ_{e q} = 0.4$	Individual	$6$	$10$
9-out-of-10 system	$ρ_{e q} = 0.8$	Individual	$6$	$8$
9-out-of-10 system	$ρ_{u q}$	Individual	$5$	$7$
9-out-of-10 system	$ρ_{e q} = 0$	Campaign	$5$	$10$
9-out-of-10 system	$ρ_{e q} = 0.4$	Campaign	$6$	$10$
9-out-of-10 system	$ρ_{e q} = 0.8$	Campaign	$5$	$7$
9-out-of-10 system	$ρ_{u q}$	Campaign	$8$	$10$
Zayas frame	$ρ_{e q} = 0$	Individual	$10$	$16$
Zayas frame	$ρ_{e q} = 0.4$	Individual	$10$	$16$

Table A1: List of optimized heuristic decision rules employed in the numerical experiments. For each considered setting, the resulting heuristic decision rules dictate inspections for $n_{C}$ components at equidistant intervals of $Δ_{i n s}$ years. $ρ$ indicates equal ($e q$) or unequal ($u q$) deterioration correlation among components.

Appendix B. Zayas frame geometry and material properties.

Figure B1: Zayas frame representation Popov et al. (1980); Schneider et al. (2017). Elements are denoted with lower-case letters and fatigue hotspots are designated with numbers. The outer diameters, OD, wall thicknesses, WT, and mechanical properties corresponding to each element of the frame are specified in the diagram. An external horizontal load, $L$ , is applied at the upper-left corner of the frame.

References

C. P. Andriotis and K. G. Papakonstantinou (2019) Managing engineering systems with large state and action spaces through deep reinforcement learning. Reliability Engineering and System Safety 191, pp. 106483. External Links: Document, ISSN 09518320 Cited by: §1, §1, §2.1, §4, §4.
C. P. Andriotis, K. G. Papakonstantinou, and E. N. Chatzi (2021) Value of structural health information in partially observable stochastic environments. Structural Safety 93, pp. 102072. External Links: Document Cited by: §1.
C. P. Andriotis and K. G. Papakonstantinou (2021) Deep reinforcement learning driven inspection and maintenance planning under incomplete information and constraints. Reliability Engineering & System Safety 212, pp. 107551. External Links: Document Cited by: §1, §4.
R. E. Barlow and K. D. Heidtmann (1984) Computing k-out-of-n system reliability. IEEE Transactions on Reliability 33 (4), pp. 322–323. Cited by: §5.1.
E. Bismut and D. Straub (2018) Adaptive direct policy search for inspection and maintenance planning in structural systems. In Proc. 6th International Symposium on Life-Cycle Civil Engineering (IALCCE), Cited by: §1.
E. Bismut and D. Straub (2021) Optimal adaptive inspection and maintenance planning for deteriorating structural systems. Reliability Engineering & System Safety 215, pp. 107891. External Links: Document Cited by: §1.
R. B. Corotis, J. Hugh Ellis, and M. Jiang (2005) Modeling of risk-based inspection, maintenance and life-cycle cost with partially observable Markov decision processes. Structure and Infrastructure Engineering 1 (1), pp. 75–84. External Links: Document, ISSN 1573-2479 Cited by: §1.
G. Deodatis, H. Asada, and S. Ito (1996) Reliability of aircraft structures under non-periodic inspection: a Bayesian approach. Engineering Fracture Mechanics 53 (5), pp. 789–805. External Links: Document Cited by: §1.
A. Der Kiureghian (2022) Structural and System Reliability. Cambridge University Press. External Links: Document Cited by: §3.3.
O. Ditlevsen and H. O. Madsen (2007) Structural Reliability Methods. Department of Mechanical Engineering. Technical University of Denmark.. External Links: ISBN 0 471 96086 1 Cited by: §5.
C. W. Dunnett and M. Sobel (1955) Approximations to the probability integral and certain percentage points of a multivariate analogue of Student’s t-distribution. Biometrika 42, pp. 258–260. External Links: Document Cited by: §3.1.
I. Enevoldsen and J. D. Sørensen (1993) Reliability-based optimization of series systems of parallel systems. Journal of structural engineering 119 (4), pp. 1069–1084. External Links: Document Cited by: §1.
M. H. Faber and M. G. Stewart (2003) Risk assessment for civil engineering facilities: Critical overview and discussion. Reliability Engineering and System Safety 80 (2), pp. 173–184. External Links: Document, ISSN 09518320 Cited by: §1.
L. Fan, H. Su, W. Wang, E. Zio, L. Zhang, Z. Yang, S. Peng, W. Yu, L. Zuo, and J. Zhang (2022) A systematic method for the optimization of gas supply reliability in natural gas pipeline network based on Bayesian networks and deep reinforcement learning. Reliability Engineering & System Safety 225, pp. 108613. External Links: Document Cited by: §1.
D. M. Frangopol, M. Kallen, and J. M. v. Noortwijk (2004) Probabilistic models for life-cycle performance of deteriorating structures: review and future directions. Progress in structural engineering and Materials 6 (4), pp. 197–212. External Links: Document Cited by: §1.
D. M. Frangopol and M. Liu (2007) Maintenance and management of civil infrastructure based on condition, safety, optimization, and life-cycle cost. Structure and infrastructure engineering 3 (1), pp. 29–41. External Links: Document Cited by: §1.
D. M. Frangopol and M. Soliman (2016) Life-cycle of structural systems: recent achievements and future directions. Structure and Infrastructure Engineering 12 (1), pp. 1–20. External Links: Link, Document, ISSN 1573-2479 Cited by: §1.
N. Hlaing, P. G. Morato Dominguez, P. Rigo, P. Amirafshari, A. Kolios, and J. S. Nielsen (2020) The effect of failure criteria on risk-based inspection planning of offshore wind support structures. In Proc. 7th International Symposium on Life-Cycle Civil Engineering (IALCCE), Cited by: §1, §5.
N. Hlaing, P. G. Morato, J. S. Nielsen, P. Amirafshari, A. Kolios, and P. Rigo (2022) Inspection and maintenance planning for offshore wind structural components: integrating fatigue failure criteria with Bayesian networks and Markov decision processes. Structure and Infrastructure Engineering 18 (7), pp. 983–1001. External Links: Document Cited by: §1.
S. Ito, G. Deodatis, Y. Fujimoto, H. Asada, and M. Shinozuka (1992) Non-periodic inspection by Bayesian method II: structures with elements subjected to different stress levels. Probabilistic engineering mechanics 7 (4), pp. 205–215. External Links: Document Cited by: §1.
H. Kurniawati, D. Hsu, and W. Sun Lee (2008) SARSOP: Efficient Point-Based POMDP Planning by Approximating Optimally Reachable Belief Spaces. In Proceedings of Robotics: Science and Systems IV, Zurich, Switzerland. External Links: Link, ISBN 9780262513098, Document Cited by: §2.1, §4.
Y. Li (2017) Deep reinforcement learning: an overview. arXiv preprint arXiv:1701.07274. Cited by: §4.
L. Long, Q. A. Mai, P. G. Morato, J. D. Sørensen, and S. Thöns (2020) Information value-based optimization of structural and environmental monitoring for offshore wind turbines support structures. Renewable Energy 159, pp. 1036–1046. External Links: Document Cited by: §1.
I. Lotsberg, G. Sigurdsson, A. Fjeldstad, and T. Moan (2016) Probabilistic methods for planning of inspection for fatigue cracks in offshore structures. Marine Structures 46, pp. 167–192. External Links: ISBN 978-1-61399-379-8, Document, ISSN 09518339 Cited by: §1.
J. Luque and D. Straub (2016) Reliability analysis and updating of deteriorating systems with dynamic Bayesian networks. Structural Safety 62, pp. 34–46. External Links: Document, ISSN 01674730 Cited by: §1, §1, §2.1, §3.1, §3.1, §3.2, §5.
J. Luque and D. Straub (2019) Risk-based optimal inspection strategies for structural systems using dynamic Bayesian networks. Structural Safety 76, pp. 68–80. External Links: Link, Document, ISSN 01674730 Cited by: §1, §2.1, §3.2, §5.
M. Memarzadeh, M. Pozzi, and J. Z. Kolter (2016) Hierarchical modeling of systems with similar components: A framework for adaptive monitoring and control. Reliability Engineering & System Safety 153, pp. 159–169. External Links: Document, ISSN 0951-8320 Cited by: §1.
M. Memarzadeh, M. Pozzi, and J. Zico Kolter (2015) Optimal Planning and Learning in Uncertain Environments for the Management of Wind Farms. Journal of Computing in Civil Engineering 29 (5), pp. 04014076. External Links: Link, Document, ISSN 0887-3801 Cited by: §1.
V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller (2013) Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602. Cited by: §1.
T. Moan, J. Amdahl, T. Granli, and O. Hellan (1991) Collapse behaviour of offshore structural systems. Advances in Marine Structures–2, pp. 469–494. Cited by: §5.2, §5.
P. G. Morato, K. G. Papakonstantinou, C. P. Andriotis, J. S. Nielsen, and P. Rigo (2022) Optimal inspection and maintenance planning for deteriorating structural components through dynamic Bayesian networks and Markov decision processes. Structural Safety 94, pp. 102140. External Links: Document Cited by: §1, §1, §2.1, §2.1, §3.1, §4, §5.
K. P. Murphy (2002) Dynamic Bayesian Networks: Representation, Inference and Learning. Ph.D. Thesis, University of California, Berkeley. Cited by: §2.1, §2.1.
J. S. Nielsen and J. D. Sørensen (2018) Computational framework for risk-based planning of inspections, maintenance and condition monitoring using discrete Bayesian networks. Structure and Infrastructure Engineering 14 (8), pp. 1082–1094. External Links: Document Cited by: §1.
K. G. Papakonstantinou and M. Shinozuka (2014a) Planning structural inspection and maintenance policies via dynamic programming and Markov processes. Part I: Theory. Reliability Engineering and System Safety 130, pp. 202–213. External Links: Link, Document, ISSN 09518320 Cited by: §1, §2.1, §2.1.
K. G. Papakonstantinou and M. Shinozuka (2014b) Planning structural inspection and maintenance policies via dynamic programming and Markov processes. Part II: POMDP implementation. Reliability Engineering and System Safety 130, pp. 214–224. External Links: Link, Document, ISSN 09518320 Cited by: §1, §2.1, §4.
K. G. Papakonstantinou, M. Amir, and G. P. Warn (2022) A Scaled Spherical Simplex Filter (S3F) with a decreased n+ 2 sigma points set size and equivalent 2n+ 1 Unscented Kalman Filter (UKF) accuracy. Mechanical Systems and Signal Processing 163, pp. 107433. External Links: Document Cited by: §2.1.
K. G. Papakonstantinou and M. Shinozuka (2014c) Optimum inspection and maintenance policies for corroded structures using partially observable Markov decision processes and stochastic, physically based models. Probabilistic Engineering Mechanics 37, pp. 93–108. External Links: Document Cited by: §1.
K. G. Papakonstantinou and M. Shinozuka (2013) Probabilistic model for steel corrosion in reinforced concrete structures of large dimensions considering crack effects. Engineering Structures 57, pp. 306–326. External Links: Document Cited by: §1.
K. G. Papakonstantinou, C. P. Andriotis, and M. Shinozuka (2018) POMDP and MOMDP solutions for structural life-cycle cost minimization under partial and mixed observability. Structure and Infrastructure Engineering 14 (7), pp. 869–882. External Links: Document, ISSN 17448980 Cited by: §4.
J. Pineau, G. Gordon, and S. Thrun (2003) Point-based value iteration: An anytime algorithm for POMDPs. In IJCAI International Joint Conference on Artificial Intelligence, pp. 1025–1030. External Links: ISSN 10450823 Cited by: §4.
E. P. Popov, V. A. Zayas, and S. A. Mahin (1980) Inelastic cyclic behavior of tubular braced frames. Journal of the Structural Division 106 (12), pp. 2375–2390. External Links: Document Cited by: §5.2, §5, Figure B1.
M. L. Puterman (2014) Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons. Cited by: §4.
R. Rackwitz, A. Lentz, and M. Faber (2005) Socio-economically sustainable civil engineering infrastructures by optimization. Structural safety 27 (3), pp. 187–229. External Links: Document Cited by: §1.
T. Schaul, J. Quan, I. Antonoglou, and D. Silver (2015) Prioritized experience replay. arXiv preprint arXiv:1511.05952. Cited by: §4.
R. Schneider, S. Thöns, and D. Straub (2017) Reliability analysis and updating of deteriorating systems with subset simulation. Structural Safety 64, pp. 20–36. External Links: Document, ISSN 01674730, Link Cited by: §5.2, §5, §5, §5, Figure B1.
D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, et al. (2016) Mastering the game of go with deep neural networks and tree search. Nature 529 (7587), pp. 484–489. External Links: Document Cited by: §1.
D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, et al. (2017) Mastering the game of go without human knowledge. Nature 550 (7676), pp. 354–359. External Links: Document Cited by: §1.
T. Smith and R. Simmons (2006) Focused real-time dynamic programming for MDPs: Squeezing more out of a heuristic. Proceedings of the National Conference on Artificial Intelligence 2 (January), pp. 1227–1232. ISBN 1577352815.
J. Song and W. Kang (2009) System reliability and sensitivity under statistical dependence by matrix-based system reliability method. Structural Safety 31 (2), pp. 148–156.
T. Søreide, J. Amdahl, E. Eberg, T. Holmås, and Ø. Hellan (1993) USFOS—A computer program for progressive collapse analysis of steel offshore structures. Theory Manual, SINTEF, Trondheim, Norway.
M. T. J. Spaan and N. Vlassis (2005) Perseus: Randomized point-based value iteration for POMDPs. Journal of Artificial Intelligence Research 24, pp. 195–220. ISSN 10769757.
D. Straub and A. Der Kiureghian (2011) Reliability acceptance criteria for deteriorating elements of structural systems. Journal of Structural Engineering 137 (12), pp. 1573–1582.
D. Straub and I. Papaioannou (2015) Bayesian updating with structural reliability methods. Journal of Engineering Mechanics 141 (3), pp. 34. ISSN 0733-9399.
D. Straub (2004) Generic Approaches to Risk Based Inspection Planning for Steel Structures. Ph.D. Thesis, Swiss Federal Institute of Technology Zürich (ETH). ISBN 3728129690.
D. Straub (2009) Stochastic Modeling of Deterioration Processes through Dynamic Bayesian Networks. Journal of Engineering Mechanics 135 (10), pp. 1089–1099. ISSN 0733-9399.
R. S. Sutton and A. G. Barto (2018) Reinforcement learning: an introduction. MIT Press.
P. Thoft-Christensen and J. Dalsgård Sørensen (1982) Reliability of structural systems with correlated elements. Applied Mathematical Modelling 6 (3), pp. 171–178. ISSN 0307904X.
S. Wei, Y. Bao, and H. Li (2020) Optimal policy for structure maintenance: A deep reinforcement learning framework. Structural Safety 83, pp. 101906.
D. Y. Yang and D. M. Frangopol (2018) Probabilistic optimization framework for inspection/repair planning of fatigue-critical details using dynamic Bayesian networks. Computers & Structures 198, pp. 40–50.
D. Y. Yang (2022) Deep Reinforcement Learning-Enabled Bridge Management Considering Asset and Network Risks. Journal of Infrastructure Systems 28 (3), pp. 04022023.

