An important problem in terrain analysis is modeling how water flows across a terrain and creates floods by filling up depressions. In this paper, we study a number of *flood-risk* related problems: given a terrain *Σ*, represented as a triangulated *xy*-monotone surface with *n* vertices, a rain distribution R, and a volume of rain Ψ, determine which portions of *Σ* are flooded. We give an overview of efficient algorithms for these problems as well as explore the efficacy and efficiency of these algorithms on real terrains.

### 1. Introduction

Flooding can be extremely dangerous and damaging. The United States experienced the wettest 12-month period from June 2018 to May 2019, with major flooding in the Midwest affecting millions of people and causing several billion dollars in damages. Being able to accurately and quickly model flooding can help predict and prepare for the risks. Flood-risk analysis has been studied widely across multiple research communities including environmental science, engineering, machine learning, and GIS communities: see Section 7.

Flood risk analysis also has been a focus of a number of companies as well. SCALGO^{22} is a software development and services company that uses massive terrain dataprocessing technology to provide a flood risk platform for Scandinavian countries. 3Di Water Management^{1} provides interactive hydrodynamic simulation software combining overland, channel, sewer and ground water flow. Fathom^{13} uses high-resolution global data-sets and hydrological modeling to provide flood hazard data for many applications, including insurance and disaster response.

In this paper, we will focus on two key problems related to flood-risk analysis, which are defined more formally in later sections:

**Terrain-flood query:** given a terrain *Σ* and a rain pattern, determine which portions of *Σ* will be flooded. (See Figure 1 for an example.)

**Figure 1. A terrain-flood query over a region of Philadelphia, PA, USA. The areas marked in blue are flooded, with regions that water flows over marked in orange.**

**Point-flood query:** In some applications, the terrain *Σ* is fixed and we wish to know whether a query point on *Σ* will be flooded for a given rain pattern. Preprocess *Σ* into a data structure so that for a given rain pattern and query point *q* ∈ *Σ*, one can quickly determine whether *q* is flooded. Alternatively, one can ask how long rain must fall before *q* is flooded.

When rain falls, the rate at which a depression fills up depends not only on its shape and the size of its watershed (the area of the terrain that contributes water to the depression), but also on other depressions filling up. Water falling on the watershed of a depression that is already filled flows to a neighboring depression, effectively making its watershed larger and thus making it fill up faster. Maintaining how depressions fill and spill into other depressions during a flash flood makes the above problem challenging.

We present an efficient algorithm for the terrain-flood query in Section 4. The algorithm works by sweeping descending contours on the terrain, tracking where water flows and determining which depressions become full. A key feature of the algorithm is that once we find a contour delimiting a flooded region, we can mark the enclosed region as flooded and prune it from consideration.

For the point-flood query, we present in Section 5 an algorithm that preprocesses the terrain into a data structure so that queries can be answered quickly. If we assume the single-flow direction (SFD) model in which water from a vertex flows along the steepest descending edge, we describe an algorithm that, after preprocessing the terrain, can answer point-flood queries in time logarithmic in the number of vertices on the terrain. We also briefly discuss algorithms for the *flood-time query* that asks *when*, rather than *if*, a point will become flooded. These algorithms all work by recognizing that not all depressions are equally important from the perspective of the query point. The terrain can be simplified into a set of depressions called the tributaries. The local behavior within each tributary can then be ignored. Instead, the algorithms depend only on the global behavior: does the tributary become full, and if so, which downstream tributaries does it spill into?

We have implemented the algorithms for terrain-flood and point-flood queries and tested them on real terrains. We show that the algorithms are efficient in practice, across a variety of queries, and give some analysis as to how varying the query affects the running time. We also compare terrain-flood queries qualitatively under two variants, the multi-flow direction (MFD) and single-flow direction (SFD) models, showing cases where the flooded regions are significantly different under the two (Section 6).

### 2. Preliminaries

**Terrains.** Let M be a triangulation of R^{2}, and let V be the set of vertices of M; set *n* = |V|. We assume that V contains a vertex *v*_{∞} at infinity and that each edge {*u*, *v*_{∞}} is a ray emanating from *u*; the triangles in M incident to *v*_{∞} are unbounded. Let *h* : M → R be a height function. We assume that the restriction of *h* to each triangle of M is a linear map, that *h* approaches +∞ at *v*_{∞}, and that the heights of all vertices are distinct. Given M and *h*, the graph of *h*, called a *terrain* and denoted by *Σ* = (M, *h*), is an *xy*-monotone triangulated surface whose triangulation is induced by M.

**Critical vertices.** There is a natural cyclic order on the neighbor vertices of a vertex *v* of M, and each such vertex *u* is either an *upslope* or *downslope* neighbor, that is, *h*(*u*) > *h*(*v*) or *h*(*u*) < *h*(*v*), respectively. If *v* has no downslope (resp. upslope) neighbor, then *v* is a *minimum* (resp. *maximum*). We also refer to a minimum as a *sink.* If *v* has four neighbors *w*_{1}, *w*_{2}, *w*_{3}, *w*_{4} in clockwise order such that max(*h*(*w*_{1}), *h*(*w*_{3}) ) < *h*(*v*) < min(*h*(*w*_{2}), *h*(*w*_{4}) ), then *v* is a *saddle* vertex. If a vertex is not a critical point, call it *regular.* See Figure 2.

**Figure 2. From left to right: minimum (sink), maximum, saddle, and regular vertices. Upslope and downslope neighbors are marked in white and black, respectively.**

**Level sets, contours, depressions.** Given *l* ∈ R, the *l*–*sublevel set of h* is the set *h*_{=l} = {*x* ∈ R^{2} | *h*(*x*) = *l*}, and the *l-level set of h* is the set *h*_{=l} = {*x* ∈ R^{2} | *h*(*x*) = *l*}. Each connected component of *h _{<l}* is called a

*depression.*Each connected component of the boundary of a depression is a

*contour.*A contour not passing through a critical vertex is a simple polygonal cycle. Note that a depression is not necessarily simply connected, as a maximum can cause a hole to appear.

For a point *x* ∈ M, a depression *β*_{x} of *h _{<l}* is said to be

*delimited by the point x*if

*x*lies on the boundary of

*β*, which implies that

*h*(

*x*) =

*l.*A depression

*β*

_{1}is

*maximal*if every depression

*β*

_{2}³

*β*

_{1}contains strictly more sinks than

*β*

_{1}. A maximal depression that contains exactly one sink is called an

*elementary depression.*Note that each maximal depression is delimited by a saddle, and a saddle that delimits more than one maximal depression is called a

*negative saddle.*For a maximal depression

*β*, let Sd(

*β*) denote the saddle delimiting

*β*, and let Sk(

*β*) denote the set of sinks in

*β*.

**Merge tree.** Suppose we sweep a horizontal plane from –∞ to ∞. As we vary *l*, the depressions in *h _{<l}* vary continuously, but their structure changes only at sinks and negative saddles. If we increase

*l*, then a new depression appears at a sink, and two depressions merge at a negative saddle. The merge tree, denoted by T

_{h}, is a tree that tracks these changes. Its leaves are the sinks of the terrain, and its internal nodes are the negative saddles. The edges of T

_{h}are in one-to-one correspondence with the maximal depressions of

*Σ*

_{h}, that is, we associate each edge (

*u*,

*v*), for

*h*(

*u*) >

*h*(

*v*), with the maximal depression delimited by

*u*and containing

*v.*The point of height

*l*∈ [

*h*(

*v*),

*h*(

*u*)] on edge (

*u*,

*v*) represents the depression of

*h*contained in

_{<l}*β*

_{v}. We assume that T

_{h}has an edge from the root of T

_{h}extending to +∞, corresponding to the depression that extends to ∞. See Figure 4. For simplicity, we assume that T

_{h}is binary, that is, each negative saddle delimits exactly two depressions. Nonsimple saddles can be unfolded into a number of simple saddles.

^{12}

Let *u* be a negative saddle, let (*u*, *v*_{1}) and (*u*, *v*_{2}) be two down edges in T_{h} from *u*, and let (*w*, *u*) be the up edge from *u.* We call the depression associated with (*u*, *v*^{2}) (resp. with (*w*, *u*) ) as the *sibling* (resp. *parent*) (depression) of that associated with (*u*, *v*^{1}). Van Kreveld et al.^{16} gave an *O*(*n* log *n*)-time algorithm for constructing the merge tree in 2D. The algorithm was later extended to 3D by Tarasov and Vyalyi,^{23} and to arbitrary dimensions by Carr et al.^{8}

T_{h} can be preprocessed in *O*(*n*) additional time so that for a point *x* ∈ R^{2}, Vol(*β*_{x}), the volume of the depression delimited by *x* can be computed in *O*(log *n*) time. In the following sections, we will be working with a fixed height function, so will drop the subscript *h* from T_{h}, Vol_{h}, etc.

### 3. Flooding Model

**Flow graph and flow functions.** We transform M into a directed acyclic graph M, referred to as the *flow graph*, by directing each edge {*u*, *v*} of M in downward direction, that is, from *u* to *v* if *h*(*u*) > *h*(*v*), and from *v* to *u* otherwise. For each (directed) edge (*u*, *v*), we define the *local flow* λ(*u*, *v*) to be the portion of water arriving at *u* that flows along the edge (*u*, *v*) to *v.*

The value of λ(*u*, *v*) is, in general, based on relative heights of the downslope neighbors of *u.* We will refer to the general version in which water can flow along multiple downward edges from *u* as the *multi-flow direction* (MFD) model. If λ(*u*, *v*) > 0 for exactly one downslope edge from *u*, we will refer to this as the *single-flow direction* (SFD) model. See Figure 3 for an example of how these models differ. In some cases, notably for point-flood and flood-time queries, the SFD model will admit more efficient algorithms. We will not focus on the specifics of the local flow function, and only assume that it is specified and for a pair *u*, *v* can be retrieved in *O*(1) time.

Following the edges of M, water reaches a set of sinks of M. We define a *flow function* Φ: V^{2} → [0, 1], which specifies the proportion of water that flows from a vertex *u* to another vertex *v.* Note that under the MFD model, water can flow from *u* to *v* along many paths. The flow function is defined recursively as follows:

**Figure 3. Rain falls at the local maximum at the green circle toward local minima marked with squares. Left: an SFD model, water flows along a single path to a single minimum. Right: an MFD model, water flows along multiple paths to three local minima.**

For a maximal depression *β*, we define

to be the portion of water that reaches from a vertex *u* to *β*. Recall that Sk(*β*) is the set of sinks in *β*.

If a maximal depression *β* delimited by saddle *u* is full, we define Φ_{β}(*u*, *v*) to be the modified flow function, computed as if the flooded vertices have a height of *h*(*u*).

**Rain distribution.** We let R denote a *rain distribution*, which is specified as a probability distribution on the vertices of the terrain, that is, for each vertex *v* ∈ V, R(*v*) ≥ 0 indicates the rate at which it rains on *v*, and we require that *Σ*_{v} R(*v*) = 1. For a given depression *β*, let R(*β*) = *Σ**v*∈*β* R(*v*) be the portion of rain falling directly into *β*. We denote by |R| the number of vertices with positive rainfall in R, and we assume that R is represented as a list of |R| pairs (*v*, R(*v*)). In practice, |R| << *n.*

**Flood propagation.** Our flooding model follows a similar depression-filling model as Liu and Snoeyink.^{17} As rain falls according to a distribution R on *Σ*, water flows along the downward edges according to the flow function and accumulates in depressions of *Σ*. When a maximal depression *β*_{i} fills up, water spills from the saddle *v _{i}* delimiting

*β*

_{i}toward sinks in the sibling depression. We refer to this event as a

*spill event.*

The above process defines a sequence of spill events, each event marking a sink *u* as full, and redistributing the rain falling on *u* to other sinks. See Figure 4 for an example. In our model, the maximal depressions of *Σ* fill up at a constant rate between any two consecutive spill events. That is, after a spill event occurs at time *t*_{1} and until the next occurs at time *t*_{2}, the volume of water in each maximal depression is a nondecreasing linear function of time.

**Figure 4. Example terrain and point-flood query ( p, q). Sinks are marked with squares, and saddles are marked with labels 1–8 indicating the saddle elevation. Dotted lines indicate spill events. Top: Merge tree with tributaries of q, β_{1}–β_{5} marked. Middle: Terrain seen from the side. Bottom: Terrain seen from above.**

**Tributaries.** Any given point *q* ∈ M is contained in a sequence of maximal depressions α_{1} ³ ··· ³ α_{k} ∋ *q*, each α_{i} delimited by a saddle *v _{i}* with sibling depression

*β*

_{i}. These saddles form a path in T from

*q*to the root. We refer to the maximal depressions

*β*

_{1}, …,

*β*

_{k}as the

*tributaries of q*and denote them by Γ

_{q}. See Figure 4.

**Fill and spill rates.** For a maximal depression *β*, we define the *fill rate F*_{β} : R_{≥0} → [0, 1] as the rate at which water is arriving in the depression *β* as a function of time. That is, the rate at which rain is falling directly in *β* plus the rate at which other depressions are spilling water into *β*. The fill rate *F*_{β} is a monotonically nondecreasing piecewise-constant function. Similarly, we define the *spill rate S*_{β} : R_{≥0} → [0, 1] as the rate (as a function of time) at which water spills from *β* through the saddle that delimits *β*. Let τ_{β}, called the *fill time*, be the time at which *β* becomes full. Let *β*‘ be the sibling depression of *β*. Then,

By the definition of the fill rate, for any maximal depression *β*, its initial fill rate is

which is how much rain water flows initially to the sinks of *β*. We can define *F*_{β}(0) recursively using (1) as follows: abusing the notation a little, let *F*_{v}(0) denote the water reaching a vertex *v* at time 0. Then,

For any *t* ≥ 0, the fill rate of *β* at time *t* is the direct rain reaching *β* plus the water spilling into *β* from its tributaries that are full. That is,

**Fill and spill volumes.** For a depression *β*, we similarly define F_{β}, S_{β}: R_{≥0} →R_{≥0} as *fill- and spill-volume* functions of *β*, that is, F_{β}(*t*) tells how much water has arrived in depression *β* by time *t*, and S_{β}(*t*) tells how much water has spilled from *β* by time *t.*

By definition of the spill rate, letting *β*‘ be the sibling depression of *β*, we have

### 4. Terrain-Flood Queries

In this section, we describe an algorithm for answering a terrain-flood query. That is, given a rain distribution R and a volume ψ, determine which vertices of M will be flooded if a volume of ψ rain falls according to the distribution R. A key idea of the algorithm is that if a vertex *v* is flooded, then all points lying in the depression *β*_{v} are also flooded, so we do not have to process the vertices lying in *β*_{v}. We just need to know how much (if any) water will spill to its downstream tributaries. Using this simple observation, we compute the flooded vertices of M for a given R as follows.

**Overview of the algorithm.** As a preprocessing step, we construct the merge tree T, and in doing so, we augment T with a linear-size data structure so that for each point *q* (i) the edge of T containing *q* and (ii) Vol(*β*_{q}) can each be computed in *O*(log *n*) time.

For a given rain distribution R, we first compute how much rain directly falls in each maximal depression initially. For a maximal depression *β*, let
be the rate of rain falling on vertices in *β* which are not contained in any other maximal depression. Using the recurrence

R(α) for all maximal depressions α ∈ T can be computed in *O*(|R| + *m*) time.

Next, we process the vertices in descending height order, and at each vertex *v*, we maintain the following:

- The set of
*active depressions*that are not completely filled depressions in the*h*(*v*)-sublevel set - For each active depression α (i) the fill volume F
_{α}, that is, the volume of water in α, and (ii) the set of edges crossing into α, denoted*E*(α), and finally - For each edge
*e*∈*E*(α), the volume of rain flowing along*e*denoted by Λ(*e*)

As we sweep the vertices from top to bottom, we will find the height at which depressions are flooded. At this point, we mark the corresponding depression as flooded and do not consider any vertices contained in the flooded depression.

We now describe in detail how we process each vertex. **Processing a nonsaddle vertex.** Let α_{v} be the smallest maximal depression containing *β*_{v}. A key observation is that F_{α} is the same as F_{αv}. Therefore, it does not change at a nonsaddle vertex and thus has already been computed at an earlier step. Then, if Vol(*β*_{v}) ≤ F_{β}_{v}, that is, *β*_{v} is flooded, we mark all vertices contained in *β*_{v} as flooded and remove *β*_{v} from the set of active depressions. If *β*_{v} is not flooded, for each edge (*v*, *w*) in T, we compute the water flowing along it as:

**Processing a saddle vertex.** If *v* is a nonflooded saddle vertex delimiting two maximal depressions α, α’, in addition to the above process, we must also compute the volume of rain in each of the two depressions. To do so, partition the edges *E*(*β*_{v}) into the two sets *E*(α) and *E*(α’) and compute the volume of rain crossing into α (resp. α’) as *Σ*_{e∈E(α)}Λ(e) (*e*) (resp. *Σ*_{e∈E(α)}Λ(e).) See Figure 5 for an example. These sums can be computed in a total of *O*(*n* log *n*) time summed over all saddles. Using this value along with the value of R(α) (resp. R(α’) ) in (7), we can compute F_{α} (resp. F_{α’}).

**Figure 5. (a) A single active depression β with active edges marked in red. (b) Saddle vertex v (marked in purple) delimits two maximal depressions α and α’. The active edges are partitioned into two sets: those crossing into α (resp. α’) marked in red (resp. blue); the edges connecting to v in (a) are no longer active (corresponding to upslope edges), and downslope edges from v are now active and partitioned accordingly.**

Then, the volume of water in a depression *β* is the amount falling directly in it, plus the amount of water flowing into it,

Finally, we use this value along with the depression volumes to check if α or α’ is flooded. Note that because *v* is not flooded, at most, one of these depressions will be flooded. If one is flooded, without loss of generality, let it be α. In this case, mark α as flooded, and add α’ to the set of active depressions. Then, the volume of rain spilling from α into α’ will be F_{α} – Vol(α). Let λ'(*v*, *w*) be the modified flow function, computed as if the flooded neighboring vertices have a height of *l.* Then, we update the flow of water along the edges from *v* as follows:

If neither α nor α’ is full, add them both to the set of active depressions.

THEOREM 4.1. *Given a triangulation of* M *of* R^{2} *with n vertices, a height function h* : M → R *that is linear on each face of* M, *a rain distribution R, and a volume of rain ψ, the flooded vertices of* M *can be computed in O*(*n* log *n*) *time.*

### 5. Point-Flood Queries

Given a rain distribution R, a query point *q* ∈ M, and a rain volume ψ, the point-flood query asks whether the point *q* is flooded if a volume ψ rain falls with distribution R. Of course, the terrain-flood query procedure described in Section 4 can answer this query, but our goal is to answer this query more efficiently. We first discuss an algorithm for the MFD model and then describe how the running time can be further improved for the SFD model. We also briefly discuss a variant of the point-flood query, the *flood-time query*, which asks how much it must rain before a query point *q* ∈ M is flooded.

**5.1. Point-flood query under MFD**

Under the MFD model, the algorithm exploits two observations. First, we need not compute the fill volume of all maximal depressions. In particular, suppose *q* lies in a maximal depression *β* and there are two children depressions *β*_{1} and *β*_{2} of the sibling depression *β*‘ of *β*. We do not have to compute the fill volume of *β*_{1} and *β*_{2}. It suffices to compute if *β*‘ fills and how much, if any, water spills from *β*‘ to the depression *β*_{q}. In fact, the only depressions we need to consider are the tributaries of *q.* Second, we introduce the notion of tributary graph that describes how water flows between the tributaries of *q.* The tributary graph is used to quickly compute the fill and spill volumes of the tributaries of *q.*

Using these ideas, we describe an *O*(*nm*)-size data structure for the MFD model that answers a point-flood query in *O*(|R|*k* + *k*^{2}) time, where *m* is the number of sinks in M and *k* is the number of tributaries of *q.*

**Tributary graph.** For a point *q* ∈ M, the tributary graph *G*[*q*] = (X_{q}, E_{q}) is a directed acyclic graph. X_{q} is the depression *β*_{q} plus its tributaries. For a pair of depressions α, *β* ∈ X_{q}, we add the directed edge (α, *β*) to E_{q} if *h*(Sd(α) ) > *h*(Sd(*β*) ) and Φ_{α}(Sd(α), *β*) > 0 (if *β* = *β*_{q}; then, by Sd(*β*_{q}), we mean *q.*) Recalling that Φ_{α} is the flow function when the depression α is flooded, we set the weight of the edge (α, *β*) to be Φ_{α}(Sd(α), *β*). For the MFD model, this can be computed as

that is, the weights are normalized so that the weighted out-degree of each node in G[*q*] is 1. See Figure 6 for an example. **Overview of algorithm.** It is expensive to compute the fill and spill volume functions F_{β} and S_{β} exactly for each tributary of *q*, so we define slightly different functions
for all *β* ∈ X_{q}. They are fill, spill volume functions under the assumption that every tributary of *β*_{q} fills before its sibling, that is, water spills from a tributary *β* to various sinks in the sibling *β*‘ of *β*; note that *β*‘ is a depression containing *q.*

**Figure 60 (a) A merge tree T, with tributaries ( β_{1}, β_{2}, β_{3}) of q delimited and flow from v_{1} to each sink s, Φ(v_{1}, s), marked. (b) Tributary graph G[q], with the edge weights from β_{1}.**

We define
recursively using the tributary graph G[*q*] as follows:

For any given
if and only if
< Vol(*β*_{q}). So the point-flood query can be answered by computing
and returning yes if this quantity is at least Vol(*β*_{q}).

** Preprocessing step.** In the preprocessing step, we construct the merge tree T and preprocess it so that for a point

*q*∈ M, (i) the edge of T containing

*q*and (ii) Vol(

*β*

_{q}) can be computed in

*O*(log

*n*) time.

Additionally, for each vertex *v* ∈ M and for each of *O*(*m*) maximal depressions *β*, we store the value of Φ(*v*, *β*). Actually, we need to store only nonzero values; in practice, the number of such pairs is much smaller than *mn.*

Preprocessing takes *O*(*n* log *n* + *nm*) time, and the size of the data structure is *O*(*nm*).

** Query procedure.** For a query rain distribution R and a query point, we first find the edge

*e*of T containing

*q*and Vol(

*β*

_{q}). Given

*e*, we compute the tributaries of

*q*in

*O*(

*k*) time by traversing T upward from

*e.*We now construct the tributary graph G[

*q*] = (X

_{q}, E

_{q}) in

*O*(

*k*

^{2}) time, using the precomputed values of Φ(

*v*,

*β*) in (9).

To compute
for all depressions *β* ∈ X_{q}, we first compute *F*_{β}(0) for each *β* ∈ X_{q} using the formula

and then use the recurrence 10. The total time spent by the query procedure is *O*(|R|*k* + *k*^{2}). Putting everything together, we obtain the following:

THEOREM 5.1. *Given a triangulation of* M *of* R^{2} *with n vertices, a height function h* : M → R *that is linear on each face of* M, *a data structure of size O*(*mn*) *can be constructed in O*(*n* log *n* + *mn*) *time so that a point-flood query can be answered in O*(|R|*k* + *k*^{2}) *time, where* |R| *is the complexity of the query rain distribution, k is the number of maximal depressions containing the query point, and m is the number of sinks in the terrain* (M, *h*).

**5.2. Point-flood query under SFD**

If the water flows according to an SFD model, point-flood queries can be answered even more efficiently. Under the SFD model, a key observation is that the tributary graph G[*q*] is a tree because each tributary has out degree 1; see Figure 7. To see why, note that for each vertex *v*, water flows from *v* to exactly one sink γ. Importantly, each tributary α upon becoming full will spill to a single sink γ, that is, Φ_{α}(Sd(α), γ) = 1, and for all other sinks, this will be 0. Letting *β* be the tributary containing γ, the edge (α, *β*) will have unit weight in G[*q*], and there will be no other edges from α.

**Figure 7. Left: G[ q] is a tree under the SFD model. Right: When rain falls at p ∈ β_{4}, the tributaries along the path from β_{4} to β_{q} in G[q] fill and spill in order.**

**Single-point source.** First consider the case when rain falls only at a single point *p* (contained in tributary, say *β*, of *q*). As G[*q*] is a tree, there is a unique path π from *β* to *β*_{q}. When a tributary becomes full, the water begins spilling to the next tributary in π. For *q* to become flooded, all the tributaries in π must be flooded. We can answer the point-flood query by simply following the tributaries π, pushing excess water to the next tributary until either *q* is flooded or we come to a tributary that does not get filled. See Figure 7. However, if *q* has *k* tributaries, this algorithm takes *O*(*k*) time, and in the worst case, *k* = *Ω*(*n*).

The query can be expedited using a well-known data structure called a *heavy-path tree decomposition* on T. In this data structure, T is partitioned into *heavy*-paths, such that every path intersects *O*(log *n*) heavy-paths. By precomputing prefix sums of tributary volumes along each heavy-path, we can process in amortized *O*(1) time all the tributaries in π intersecting a given heavy-path. Therefore, point-flood queries for a single-point source can be answered in *O*(log *n*) time. We can again use the heavy-path decomposition data structure to add the volumes of tributaries, but the running time now depends on the number of tributaries in which rain is falling.

**Region source.** This approach can be extended to work for rain falling in multiple tributaries. When paths from two of these tributaries intersect, both spilling into a common tributary, we simply add the spill volume from both. See Figure 8.

**Figure 8. Top: Under the SFD model, water spilling from two tributaries intersects at a single downstream tributary. Bottom: Under the MFD model, (a fraction of) water spilling from two tributaries may intersect at many downstream tributaries.**

THEOREM 5.2. *Given a triangulation of* M *of* R^{2} *with n vertices, a height function h*: M → R *that is linear on each face of* M, *a data structure of size O*(*n*) *can be constructed in O*(*n* log *n*) *time so that a point-flood query under the SFD model can be answered in O*(|R| + *k* log *n*) *time, where* |R| *is the complexity of the query rain distribution and k is the number of tributaries of q on which it is raining.*

Given a rain distribution R and a query point *q* ∈ M, we can also ask how much it must rain before *q* becomes flooded. Now, instead of just maintaining the spill and fill volumes for a fixed volume of rain ψ, we must maintain the functions for spill and fill rates as defined in (2) and (5). Under the SFD model, the algorithm described above can be extended to answer the flood-time queries without increasing the space or time complexity. The main idea is that given the rates at the predecessors of a node α in G[*q*], the fill and spill rates at α can be computed in *O*(log *n*) amortized time. See Figure 9.

**Figure 9. The fill rate F_{α} is computed from the spill rates S_{β}_{1} and S_{β}_{2}; the spill rate S_{α} is computed from F_{α}.**

However, under the MFD model, computing spill and fill rates becomes significantly more complex as the water spilling from a tributary may split and fill multiple downstream tributaries. We can, however, do better than the naive approach adapting ideas from the algorithm for the SFD model. The main idea is that by using matrix multiplication, the downstream effect of spilling from multiple tributaries can be computed in a single step. If the product of two *k* × *k* matrices can be computed in *O*(*k*^{ω}) time, a flood-time query can be answered in *O*(|R|*k* + *k*^{ω} + *k*^{2} log *n*), where |R| is the complexity of the query rain distribution and *k* is the number of tributaries of *q.*

### 6. Experiments

In this section, we present experiments we have conducted on real terrain data to demonstrate the efficiency of these algorithms and compare qualitatively the flooding under SFD and MFD models.

We have implemented the terrain-flood algorithm, described in Section 4, in C++, as well as a version of the point-flood algorithm.

We study the performance of the algorithms on three publicly available grid DEMs:

- The
*Indiana dataset*, a 0.89 mi^{2}model of a suburban area 0.5 mi northeast of Holland, IN, USA - The
*Philadelphia dataset*, a 225 km^{2}model of an urban area in the northwest area of Philadelphia, PA, USA - The
*Norway dataset*, a 10,000 km^{2}model of a mountainous region located in the Jotunheimen National Park, Norway

For the SFD model, water flows from a vertex to its lowest neighbor. For the MFD model, water flows from a vertex to all downslope neighbors with the relative rates proportional to the gradient of the slope.

**SFD vs. MFD flooding.** We considered the rain distribution R to be rain falling: (i) on a single vertex or (ii) uniformly over a region.

Our experiments show that when rain is falling at a single point, the areas that are flooded under the SFD and MFD models can be quite different. Under the MFD model, some large areas may become flooded that would not under the SFD model (Figure 10). As we increase the region on which rain is falling, we still see differences in the areas flooded, although they may be less pronounced (Figure 11). For example, the same general regions may be flooded, but under the MFD model, more water might end up in one location as opposed to that in SFD model, or water may reach more depressions. When we expand the rain distribution to be falling over the whole terrain, the regions that are flooded tend to be very similar.

**Figure 10. 10 ^{5} m^{3} (resp. 10^{7} m^{3}) of rain falls at p in the Indiana dataset. The flooded regions of Σ are marked in blue, with regions that water flows over marked in red. Water flows according to SFD in (a) and MFD in (b).**

**Figure 11. 100 m of rain falls uniformly over the square outlined in white in the Norway dataset. (a) Water flows according to MFD, (b) water flows according to SFD. (c) and (d) show a 3 km × 3 km area around the region it is raining on.**

Another difference in the two models, irrespective of the area of the region where rain is falling, is how water flows over the terrain. Under 6the SFD model, water flows along disjoint paths, whereas under the MFD model, it spreads more on the terrain (Figures 10 and 11). We illustrate these observations here with a few examples.

For the case when rain falls at a single point, we computed the flooded areas with rain volume of 10^{5} m^{3} on the Indiana dataset and 10^{7} m^{3} on the Norway dataset. Figure 10 shows the terrain-flood query for two single point rain distributions under both the SFD and MFD models. In Figure 10(a), we see that under the SFD model water follows a single path from *p* north east, first filling a large region before spilling and filling a series of smaller regions as the water flows west toward a feature corresponding to a river. In Figure 10(b), under the MFD model, the water splits at *p* and fills a number of depressions to the south west of *p.* We additionally note that under the MFD model in (b) water spreads out more and flows across a wider path between full depressions.

For the case where rain is falling uniformly over a small square, we set the rain distribution to be uniform over a square of size 1 km × 1 km and set ψ = 10^{6} m^{3} and computed the flooded area for the Norway dataset, under both SFD and MFD models. Figure 11 shows the queries along with enlarged images of the region on which it is raining. We see that although similar areas become flooded, water spreads out across the peak more under the MFD model, and a larger fraction of the water flows to the southwest. In contrast, under the SFD model, the rain follows narrow bands outside of the rain region, and a larger fraction of the water flows to the north.

**Performance of algorithms.** Building the merge tree and preprocessing it to compute the depression volume of any point as well as to perform lowest common ancestor queries took an average of 1.33 s over 5 trials for the Indiana dataset containing 10^{6} vertices.

We also ran tests over the Philadelphia dataset, taking subregions with 1.6 × 10^{7}, 9 × 10^{6} and 10^{6} vertices, and on the Norway dataset. Our experiments show that in practice, the preprocessing time is near linear in the size of the terrain. Although preprocessing does require sorting the nodes, which takes *O*(*n* log *n*) time in the worst case, in practice, the constant on this term is much smaller than that of the linear steps in the preprocessing.

We examined the running time of the terrain-flood query on the Indiana dataset as we change the amount of rain ψ. Increasing the volume of rain first increases the running time until it reaches a peak and then decreases, becoming very fast with the largest volumes of rain. When a small amount of rain is falling in a small area, water reaches very few depressions and thus only a small portion of the merge tree is explored by the algorithm. As the volume and area of rain increases, the algorithm explores larger portions of the merge tree leading to an increase in the running time. However, once the volume of rain increases further, large depressions get filled and the algorithm succeeds in pruning large portions of the merge tree. Roughly speaking, the running time of the algorithm is proportional to the number of depressions that are partially filled.

Finally, we examined the running time of the terrain-flood query on the Indiana and Philadelphia datasets as we change the number of points with positive rain in R, raining uniformly over squares of varying sizes. Although the running time increases with the number of vertices with positive rainfall, it grows slower than the linear dependence indicated by the worst-case time analysis.

### 7. Related Work

The flood-risk problem has been widely studied in multiple research communities, and many different approaches have been proposed to tackle this problem. One such approach, coming from the hydrology community, simulates fluid dynamics, using nonlinear partial differential equations such as the Navier-Stokes equations.^{7, 14} They often account for additional factors, such as the effects of different terrain types and drainage networks. Although these models tend to be the most accurate, naive applications are computationally expensive and many techniques such as multiresolution representation of the terrain and approximate computation have been proposed to expedite the computation.^{7, 14, 25} Not withstanding much work on to reducing the computational cost of these methods, these algorithms are hard to scale to large terrains.

Recently, machine-learning-based approaches have been proposed for predicting flood risk.^{10, 24} These approaches are relatively fast, while maintaining a reasonable level of predictive power. However, these methods generally are used for predicting fluvial floods (i.e., flooding of rivers) and rely on stream gauges or other sensors already being installed in the area of interest. Tehrany et al.^{24} tested the efficacy of various support vector machine (SVM) kernels at predicting the overall flood hazard of points on a terrain using historical flood events and a number of terrain features such as slope, altitude, surface runoff, and distance from a river. Chang et al.^{10} used self-organizing maps and neural networks to forecast the flood inundation in the near future (1–3 h) given the current inundated areas.

There has been extensive work on modeling water flow on a terrain in the GIS community.^{2, 3, 4, 5, 6, 9, 11, 15, 17, 19, 20, 21} These approaches use simpler models, focusing on the geometry of the terrain. These tend to be more computationally efficient and suitable for large datasets. However, the simplifying assumptions mean that they may not be as accurate as PDE-based models in all situations. For example, they do not take into account absorption of water into the ground and are thus more suitable for flash floods wherein most of the flooding occurs over a shorter timespan. Liu and Snoeyink^{17} (see also Arge et al.^{6}) proposed an *O*(*n* log *n*)-time algorithm under the single-flow direction (SFD) model that computes the fill times of all depressions assuming rain is falling at a constant rate on the entire terrain.

Arge et al.^{4} described an *O*(*n* log *n*)-time algorithm, under the SFD model, to compute the set of flooded vertices when a given volume of rain ψ ≥ 0 falls on a given region of the terrain.

### 8. Conclusion

In this paper, we have presented efficient algorithms for a few flood-risk queries: the *terrain-flood* query that asks which vertices of a terrain will be flooded and the *point-flood* query that asks if a given point will be flooded. The work is only a small step toward performing flood-risk analysis in real time over a large terrain, and there are many open questions:

Can these algorithms be extended to incorporate uncertainty in data as well as in the model? There have been some preliminary results for uncertainty of terrain height under the SFD model,^{21} but this problem remains largely open.

The flooding model described in this paper depends only on the elevation of the terrain data. In reality, there are other factors that affect flooding such as terrain type and permeability as well as drainage networks. Can a model be developed that incorporates additional terrain data? Furthermore, can historical flood data be incorporated into this model to more accurately compute flood risk?

The flooding model described here also assumes that water flows only along edges of the terrain and that water flows instantaneously to the sinks. Although these assumptions are reasonable in some settings (e.g., flash floods and high-resolution terrain models), can a model be developed that incorporates the velocity of the water and allows for channel flow?

### Acknowledgments

Work by Lowe and Agarwal is supported by NSF under grants CCF-15-13816, CCF-15-46392, and IIS-14-08846, by ARO grant W911NF-15-1-0408, and by grant 2012/229 from the U.S.-Israel Binational Science Foundation. Work by Rav is partially supported by the Innovation Fund Denmark.

## Join the Discussion (0)

## Become a Member or Sign In to Post a Comment