Home/Magazine Archive/March 2018 (Vol. 61, No. 3)/Time-Inconsistent Planning: A Computational Problem.../Full Text

Research highlights
## Time-Inconsistent Planning: A Computational Problem in Behavioral Economics

In many settings, people exhibit behavior that is inconsistent across timewe allocate a block of time to get work done and then procrastinate, or put effort into a project and then later fail to complete it. An active line of research in behavioral economics and related fields has developed and analyzed models for this type of time-inconsistent behavior.

Here we propose a graph-theoretic model of tasks and goals, in which dependencies among actions are represented by a directed graph, and a time-inconsistent agent constructs a path through this graph. We first show how instances of this path-finding problem on different input graphs can reconstruct a wide range of qualitative phenomena observed in the literature on time-inconsistency, including procrastination, abandonment of long-range tasks, and the benefits of reduced sets of choices. We then explore a set of analyses that quantify over the set of all graphs; among other results, we find that in any graph, there can be only polynomially many distinct forms of time-inconsistent behavior; and any graph in which a time-inconsistent agent incurs significantly more cost than an optimal agent must contain a large "procrastination" structure as a minor. Finally, we use this graph-theoretic model to explore ways in which tasks can be designed to motivate agents to reach designated goals.

A fundamental issue in behavioral economicsand in the modeling of individual decision-making more generallyis to understand the effects of decisions that are inconsistent over time. Examples of such inconsistency are widespread in everyday life: we make plans for completing a task but then procrastinate; we put work into getting a project partially done but then abandon it; we pay for gym memberships but then fail to make use of them. In addition to analyzing and modeling these effects, there has been increasing interest in incorporating them into the design of policies and incentives in domains that range from health to personal finance.

These types of situations have a recurring structure: a person makes a plan at a given point in time for something they will do in the future (finishing homework, exercising, paying off a loan), but at a later point in time they fail to follow through on the plan. Sometimes this failure is the result of unforeseen circumstances that render the plan invalida person might join a gym but then break their leg and be unable to exercisebut in many cases the plan is abandoned even though the circumstances are essentially the same as they were at the moment the plan was made. This presents a challenge to any model of planning based on optimizing a utility function that is consistent over time: in an optimization framework, the plan must have been an optimal choice at the outset, but later it was optimal to abandon it. A line of work in the economics literature has thus investigated the properties of planning with objective functions that vary over time in certain natural and structured ways.

**A basic example and model**

To introduce these models, it is useful to briefly describe an example due to George Akerlof,^{1} with the technical details adapted slightly for the discussion here. (The story will be familiar to readers who know Akerlof's paper; we cover it in some detail because it will motivate a useful and recurring construction later in the work.) Imagine a decision-making agentAkerlof himself, in his storywho needs to ship a package sometime during one of the next *n* days, labeled *t* = 1, 2, ..., *n*, and must decide on which day *t* to do so. Each day that the package has not been sent results in a fixed cost of 1 (per day), due to the lack of use of the package's contents; this means a cost of *t* if it is shipped on day *t.* (For simplicity, we will disregard the time the package spends in transit, which is a constant additional cost regardless of when it is shipped.) Also, shipping the package is an elaborate operation that will result in one-time cost of *c* > 1, due to the amount of time involved in getting it sent out. The package must be shipped during one of the *n* specified days.

What is the optimal plan for shipping the package? Clearly the cost *c* will be incurred exactly once regardless of the day on which it is shipped, and there will also be a cost of *t* if it is shipped on day *t.* Thus we are minimizing *c* + *t* subject to 1 *t* *n*; the cost is clearly minimized by setting *t* = 1. In other words, the agent should ship the package right away.

But in Akerlof's story, he did something that should be familiar from one's own everyday experience: he procrastinated. Although there were no unexpected changes to the trade-offs involved in shipping the package, when each new day arrived there seemed to be other things that were more crucial than sending it out that day, and so each day he resolved that he would instead send it tomorrow. The result was that the package was not sent out until the end of the time period. (In fact, he sent it a few months into the time period once something unexpected happened to change the cost structurea friend offered to send it as part of a larger shipmentthough this wrinkle is not crucial for the story.)

There is a natural way to model an agent's decision to procrastinate, using the notion of *present bias*the tendency to view costs and benefits that are incurred at the present moment to be more salient than those incurred in the future. In particular, suppose that for a constant *b* > 1, costs that one must incur in the current time period are increased by a factor of *b* in one's evaluation.^{a} Then in Akerlof's example, when the agent on day *t* is considering the decision to send the package, the cost of sending it on day *t* is *bc* + *t*, while the cost of sending it on day *t* + 1 is *c* + *t* + 1. The difference between these two costs is (*b* 1)*c* 1, and so if (*b* 1)*c* > 1, the agent will decide on each day *t* that the optimal plan is to wait until day *t* + 1; things will continue this way until day *n*, when waiting is no longer an option and the package must be sent.

**Quasi-hyperbolic discounting**

Building on considerations such as those above, and others in earlier work in economics,^{16, 19} a significant amount of work has developed around a model of time-inconsistency known as *quasi-hyperbolic discounting.*^{12} In this model, parametrized by quantities , 1, a cost or reward of value *c* that will be realized at a point *t* 1 time units into the future is evaluated as having a present value of ^{t}*c.* (In other words, values at time *t* are discounted by a factor of ^{t}.) With = 1 this is the standard functional form for exponential discounting, but when < 1 the function captures present bias as well: values in the present time period are scaled up by ^{1} relative to all other periods. (In what follows, we will consistently use *b* to denote ^{1}.)

Research on this (, )-model of discounting has been extensive, and has proceeded in a wide variety of directions; see Ref. Frederick et al.^{7} for a review. To keep our analysis clearly delineated in scope, we make certain decisions at the outset relative to the full range of possible research questions: we focus on a model of agents who are *naive*, in that they do not take their own time-inconsistency into account when planning; we do not attempt to derive the (, )-model from more primitive assumptions but rather take it as a self-contained description of the agent's observed behavior; and we discuss the case of = 1 so as to focus attention on the present-bias parameter . Note that the initial Akerlof example has all these properties; it is essentially described in terms of the (, )-model with an agent who is naive about his own time-inconsistency, with = 1, and with = *b*^{1} (using the parameter *b* from that discussion).

Our starting point in this paper is to think about some of the qualitative predictions of the (, )-model, and how to analyze them in a unified framework. In particular, research in behavioral economics has shown how agents making plans in this model can exhibit the following behaviors.

*Procrastination*, as discussed above.*Abandonment*of long-range tasks, in which a person starts on a multi-stage project but abandons it in the middle, even though the costs and benefits of the project have remained essentially unchanged.^{15, b}- The benefits of
*choice reduction*, in which reducing the set of options available to an agent can actually help them reach a goal more efficiently.^{9, 14}A canonical example is the way in which imposing a deadline can help people complete a task that might not get finished in the absence of a deadline.^{4}

These consequences of time-inconsistency, as well as a number of others, have in general each required their own separate and sometimes quite intricate modeling efforts. It is natural to ask whether there might instead be a single framework for representing tasks and goals in which all of these effects could instead emerge "mechanically," each just as a different instance of the same generic computational problem. With such a framework, it would become possible to search for worst-case guarantees, by quantifying over all instances, and to talk about designing or modifying given task structures to induce certain desired behaviors.

**The present work: A graph-theoretic model**

Here we propose such a framework, using a graph-theoretic formulation. We consider an agent with present-bias parameter who must construct a path in a directed acyclic graph *G* with edge costs, from a designated start node *s* to a designated target node *t.* We assume without loss of generality that all the edges of *G* lie on some *s* *t* path. We will call such a structure a *task graph.* Informally, the nodes of the task graph represent states of intermediate progress toward the goal *t*, and the edges represent transitions between them. Directed graphs have been shown to have considerable expressive power for planning problems in the artificial intelligence literature;^{18} this provides evidence for the robustness of a graph-based approach in representing these types of decision environments. Our concerns in this work, however, are quite distinct from the set of graph-based planning problems in artificial intelligence, since our aim is to study the consequences of time-inconsistency in these domains.

A sample instance of this problem is depicted in Figure 1, with the costs drawn on the edges. When the agent is standing at a node *v*, it determines the minimum cost of a path from *v* to *t*, but it does so using its present-biased evaluation of costs: the cost of the first edge on the path (starting from *v*) is evaluated according to the true cost, and all subsequent edges have their costs reduced by . If the agent chooses path *P*, it follows just the first edge (*v, w*) of *P*, and then it re-evaluates which path to follow using this same present-biased evaluation but now from *w.* In this way, the agent iteratively constructs a path from *s* to *t.*

**Figure 1. A present-biased agent must choose a path from s to t.**

In the next section we will show how our graph-theoretic model easily captures time-inconsistency phenomena including procrastination, abandonment, and choice reduction. But to make the definitions concrete, it is useful to work through the agent's computation on the graph depicted in Figure 1. As shown in Figure 1, an agent that has a present-bias parameter of = 1/2 needs to go from *s* to *t.* From *s*, the agent evaluates the path *s-a-b-t* as having cost 16 + 2 + 2 = 18, the path *s-c-d-t* as having cost 8 + 8 + 8 = 16, and the path *s-c-e-t* as having cost 8 + 2 + 16 = 17. Thus the agent traverses the edge (*s, c*) and ends up at *c.* From *c*, the agent now evaluates the path *c-d-t* as having cost 8 + 8 = 12 and the path *c-e-t* as having cost 2 + 16 = 10, and so the agent traverses the edge (*c, e*) and then (having no further choices) continues on the edge (*e, t*).

This example illustrates a few points. First, when the agent set out on the edge (*s, c*), it was intending to next follow the edge (*c, d*), but when it got to *c*, it changed its mind and followed the edge (*c, e*). A time-consistent agent (with = 1), in contrast, would never do this; the path it decides to take starting at *s* is the path it will continue to follow all the way to *t.* Second, we are interested in whether the agent minimizes the cost of traveling from *s* to *t* according to the real costs, not according to its evaluation of the costs, and in this regard it fails to do so; the shortest path is *s-a-b-t*, with a cost of 20, while the agent incurs a cost of 26.

**Overview of results**

Our graph-theoretic framework makes it possible to reason about time-inconsistency effects that arise in very different settings, provided simply that the underlying decisions faced by the agent can be modeled as the search for a path through a graph-structured sequence of options. And perhaps more importantly, since it is tractable to ask questions that quantify over all possible graphs, we can cleanly compare different scenarios, and search for the best or worst possible structures relative to specific objectives. This is difficult to do without an underlying combinatorial structure. For example, suppose we were inspired by Akerlof's example to try identifying the scenario in which time-inconsistency leads to the greatest waste of effort. *A priori*, it is not clear how to formalize the search over all possible "scenarios." But as we will see, this is precisely something we can do if we simply ask for the graph in which time-inconsistency produces the greatest ratio between the cost of the path traversed and cost of the optimal path.

Moreover, with this framework in place, it becomes easier to express formal questions about design for these contexts: if as a designer of a complex task we are able to specify the underlying graph structure, which graphs will lead time-inconsistent agents to reach the goal efficiently?

Our core questions are based on quantifying the inefficiency from time-inconsistent behavior, designing task structures to reduce this inefficiency, and comparing the behavior of agents with different levels of time-inconsistency. Specifically, we ask:

- In which graph structures does time-inconsistent planning have the potential to cause the greatest waste of effort relative to optimal planning?
- How do agents with different levels of present bias (encoded as different values of ) follow combinatorially different paths through a graph toward the same goal?
- Can we increase an agent's efficiency in reaching a goal by deleting nodes and/or edges from the underlying graph, thus reducing the number of options available?

In what follows, we address these questions in turn. For the first question, we consider *n*-node graphs and ask how large the *cost ratio* can be between the path followed by a present-biased agent and the path of minimum total cost. Since deviations from the minimum-cost plan due to present bias are sometimes viewed as a form of "irrational" behavior, this cost ratio effectively serves as a "price of irrationality" for our system. We give a characterization of the worst-case graphs in terms of *graph minors*; this enables us to show, roughly speaking, that any instance with sufficiently high cost ratio must contain a large instance of the Akerlof example embedded inside it.

For the second question, we consider the possible paths followed by agents with different present-bias parameters . As we sweep over the interval [0, 1], we have a type of *parametric* path problem, where the choice of path is governed by a continuous parameter ( in this case). We show that in any instance, the number of distinct paths is bounded by a polynomial function of *n*, which forms an interesting contrast with canonical formulations of the parametric shortest-path problem, in which the number of distinct paths can be super polynomial in *n.*^{5, 13}

Lastly, for the third question, we show how it is possible for agents to be more efficient when nodes and/or edges are deleted from the underlying graph; on the other hand, if we want to motivate an agent to follow a particular path *P* through the graph, it can be crucial to present the agent with a subgraph that includes not just *P* but also certain additional nodes and edges that do not belong to *P.* We give a graph-theoretic characterization of the possible subgraphs supporting efficient traversal.

Before turning to these questions, we first discuss the basic graph-theoretic problem in more detail, showing how instances of this problem capture the time-inconsistency phenomena discussed earlier in this section.

In order to argue that our graph-theoretic model captures a variety of phenomena that have been studied in connection with time-inconsistency, we present a sequence of examples to illustrate some of the different behaviors that the model exhibits. We note that the example as shown in Figure 1 already illustrates two simple points: that the path chosen by the agent can be suboptimal; and that even if the agent traverses an edge *e* with the intention of following a path *P* that begins with *e*, it may end up following a different path *P'* that also begins with *e.*

For an edge *e* in *G*, let *c*(*e*) denote the cost of *e*; and for a path *P* in *G*, let *e _{i}*(

We begin by observing that Figure 2(a) represents a version of the Akerlof example from the introduction. (Recall that here we use *b* to denote ^{1}.) Node *t* represents the state in which the agent has sent the package, and node *v _{i}* represents the state in which the agent has reached day

**Figure 2. Path problems that exhibit procrastination, abandonment, and choice reduction.**

**Extending the model to include rewards**

Thus far we can not talk about an agent who abandons its pursuit of the goal midway through, since our model requires the agent to construct a path that goes all the way to *t.* A simple extension of the model enables to consider such situations.

Suppose we place a reward of *r* at the target node *t*, which will be claimed if the agent reaches *t.* Standing at a node *v*, the agent now has an expanded set of options: it can follow an edge out of *v* as before, or it can quit taking steps, incurring no further cost but also not claiming the reward. The agent will choose the latter option precisely when either there is no *vt* path, or when the minimum cost of a *v-t* path exceeds the value of the reward, evaluated in light of present bias: for all *v-t* paths *P.* It is important to note a key feature of this evaluation: the reward is always discounted by relative to the cost that is being incurred in the current period, even if the reward will be received right after this cost is incurred. (e.g., if the path *P* has a single edge, then the agent is comparing *c*(*e*_{1}(*P*) ) to *r.*)

In what follows, we will consider both these models: the former *fixed-goal model*, in which the agent must reach *t* and seeks to minimize its cost; and the latter *reward model* in which the agent trades off cost incurred against reward at *t*, and has the option of stopping partway to *t.* Aside from this distinction, both models share the remaining ingredients, based on traversing an *s-t* path in *G.*

It is easy to see that the reward model displays the phenomenon of *abandonment*, in which the agent spends some cost to try reaching *t*, but then subsequently gives up without receiving the reward. Consider for example a three-node path on nodes *s*, *v*_{1}, and *t*, with an edge *c*(*s, v*_{1}) = 1 and *c*(*v*_{1}, *t*) = 4. If = 1/2 and there is a reward of 7 at *t*, then the agent will traverse the edge (*s, v*_{1}) because it evaluates the total cost of the path at 1 + 4 = 3 < 7 = 3.5. But once it reaches *v*_{1}, it evaluates the cost of completing the path at 4 > 7 = 3.5, and so it quits without reaching *t.*

**An example involving choice reduction**

It is useful to describe a more complex example that shows the modeling power of this shortest-path formalism, and also shows how we can use the model to analyze deadlines as a form of beneficial choice reduction. (As should be clear, with a time-consistent agent it can never help to reduce the set of choices; such a phenomenon requires some form of time-inconsistency.) First we describe the example in text, and then show how to represent it as a graph.

Imagine a student taking a three-week short course in which the required work is to complete two small projects by the end of the course. It is up to the student when to do the projects, as long as they are done by the end. The student incurs an effort cost of one from any week in which she does no projects (since even without projects there is still the lower-level effort of attending class), a cost of four from any week in which she does one project, and a cost of nine from any week in which she does both projects. Finally, the student receives a reward of 16 for completing the course, and she has a present-bias parameter of = 1/2.

Figure 2(b) shows how to represent this scenario using a graph. Node *v _{ij}* corresponds to a state in which

How does the student's construction of an *s-t* path work out? From *s*, she goes to *v*_{10} and then to *v*_{20}, intending to complete the path to *t* via the edge (*v*_{20}, *t*). But at *v*_{20}, she evaluates the cost of the edge (*v*_{20}, *t*) as 9 > *r* = 16/2 = 8, and so she quits without reaching *t.* The story is thus a familiar one: the student plans to do both projects in the final week of the course, but when she reaches the final week, she concludes that it would be too costly and so she drops the course instead.

The instructor can prevent this from happening through a very simple intervention. If he requires that the first project be completed by the end of the second week of the course, this corresponds simply to deleting node *v*_{20} from the graph. With *v*_{20} gone, the path-finding problem changes: now the student starting at s decides to follow the path *s-v*_{10} -*v*_{21} -*t*, and at *v*_{10} and then *v*_{21} she continues to select this path, thereby reaching *t.* Thus, by reducing the set of options available to the studentand in particular, by imposing an intermediate deadline to enforce progressthe instructor is able to induce the student to complete the course.

There are many stories like this one about homework and deadlines, and our point is not to focus too closely on it in particular. Indeed, to return to one of the underpinnings of our graph-theoretic formalism, our point is in a sense the opposite: it is hard to reason about the space of possible "stories," whereas it is much more tractable to think about the space of possible graphs. Thus by encoding the set of stories mechanically in the form of graphs, it becomes feasible to reason about them as a whole.

We have thus seen how a number of different time-inconsistency phenomena arise in simple instances of the path-finding problem. The full power of the model, however, lies in proving statements that quantify over all graphs; we begin this next.

Our path-finding model naturally motivates a basic quantity of interest: the *cost ratio*, defined as the ratio between the cost of the path found by the agent and the cost of the shortest path. We work here within the fixed-goal version of the model, in which the agent is required to reach the goal *t* and the objective is to minimize the cost of the path used.

To fix notation for this discussion, given a directed acyclic graph *G* on *n* nodes with positive edge costs, we let *d*(*v, w*) denote the cost of the shortest *v-w* path in *G* (using the true edge costs, not modified by present bias). Let *P*_{}(*v, t*) denote the the *v-t* path followed by an agent with present-bias , and let *c*_{}(*v, t*) be the total cost of this path. The cost ratio can thus be written as *c*_{}(*s, t*)/*d*(*s, t*).

**A bad example for the cost ratio**

We first describe a simple construction showing that the cost ratio can be exponential in the number of nodes *n.* We then move on to the main result of this section, which is a characterization of the instances in which the cost ratio achieves this exponential lower bound.

Our construction is an adaptation of the Akerlof example from the introduction. We have a graph that consists of a directed path *s* = *v*_{0}, *v*_{1}, *v*_{2}, ..., *v _{n}*, and with each

Now, when the agent is standing at node *v _{j}*, it evaluates the cost of going directly to

The following observation establishes this is the highest possible cost ratio

OBSERVATION 3.1. *Consider an agent currently at v and let u be the next node on P*_{}(*v, t*). *Then d*(*u, t*) < *b* · *d*(*v, t*).

PROOF. If *u* is on the shortest path from *v* to *t* then clearly the claim holds. Else, since the agent chose to continue to *u* instead of continuing on the shortest path from *v* to *t* we have *c*(*v, u*) + *d*(*u, t*) *d*(*v, t*). This implies that *d*(*u, t*) *b* · *d*(*v, t*).

The observation essentially implies that with each step that the agent takes the cost of the path that it plans to take increases by an extra factor of at most *b* relatively to *d*(*s, t*). Hence, we have a tight upper bound on the cost ratio:

COROLLARY 3.2. *The cost ratio for a graph G with n nodes is at most b*^{n2}.

**A graph minor characterization**

We now provide a structural description of the graphs on which the cost ratio can be exponential in the number of nodes *n*essentially we show that a constant fraction of the nodes in such a graph must have the structure of the Akerlof example.

We make this precise using the notion of a *minor.* Given two undirected graphs *H* and *K*, we say that *H contains a K-minor* if we can map each node *k* of *K* to a connected subgraph *S _{k}* in

Our goal here is to show that if *G* has exponential cost ratio, then its undirected version must contain a large copy of the graph underlying the Akerlof example as a minor. In other words, the Akerlof example is not only a way to produce a large cost ratio, but it is in a sense an unavoidable signature of any example in which the cost ratio is very large.

We set this up as follows. Let (*G*) denote the skeleton of *G*, the undirected graph obtained by removing the directions on the edges of *G.* Let *F _{k}* denote the graph with nodes

We now claim

THEOREM 3.3. *For every* > 1, *if n is sufficiently large and the cost ratio is greater than* ^{n}, *then* (*G*) *contains an F _{k}-minor for k* = (

**Proof sketch.**^{c} We now provide a sketch of the proof. The basic idea is to pin down *k* = (*n*) nodes, *v*_{1}, ..., *v _{k}*, on the path

**Figure 3. The construction of an F_{k}-minor in Theorem 3.3.**

Once we have this structure, we obtain an *F _{k}* minor by partitioning

For simplicity we normalize the edges costs such that *d*(*s, t*) = 1. We choose the nodes *v*_{1}, ..., *v _{k}* such that for every

Finally, we should show that indeed for every 1 *i* *k*, there exists a distinct node *v _{i}* with the property that (i)

After the initial publication of our original paper, Tang et al.^{20} extended Theorem 3.3 as follows:

THEOREM 3.4. *If the cost ratio is greater than b*^{k2}, then (*G*) *contains an F _{k}-minor.*

Both Theorem 3.3 and its tighter counterpart Theorem 3.4 offer some qualitatively relevant advice for thinking about the structure of complicated tasks: to avoid creating inefficient behavior due to present bias, it is better to organize tasks so that they do not contain large fan-like structures. The point to appreciate is that such fan-like structures are not purely graph-theoretic abstractions; they arise in real settings whenever a task has a series of "branches" (as in Akerlof's story) that allows an agent to repeatedly put off completing the task. The theorems are a way of formalizing the idea that such sets of repeated branches are the crux of the reason why present-biased individuals incur unnecessary inefficiency in completing large tasks. And correspondingly, organizing tasks in a way that breaks this type of branchingfor example, with intermediate deadlines or subgoalscan be a way of reducing inefficiency.

Thus far we have focused on the behavior of a single agent with a given present-bias parameter . Now we consider all possible values of , and ask the following basic question: how large can the set {*P*_{}(*s, t*) : [0, 1]} be? In other words, if for each , an agent with parameter were to construct an *s-t* path in *G*, how many different paths would be constructed across all the agents? Bounding this quantity tells us how many genuinely "distinct" types of behaviors there are for the instance defined by *G.* Let *P*(*G*) denote the set {*P*_{}(*s, t*) : [0, 1]}. Despite the fact that comes from the continuum [0, 1], the set *P*(*G*) is clearly finite, since *G* only has finitely many *s-t* paths. The question is whether we can obtain a nontrivial upper bound on the size of *P*(*G*), and in particular one that does not grow exponentially in the number of nodes *n.* In fact this is possible, and our main goal in this section is to prove the following theorem.

THEOREM 4.1. *For every directed acyclic graph G, the size of* *P*(*G*) *is O*(*n*^{2}). *Moreover, there exists a graph for which the size of P*(*G*) *is* (*n*^{2}).

**Proof idea.** We use the following procedure to "discover" all the paths in *P*(*G*). We start by taking = 0 and let *P*_{0} the path that the agent with = 0 takes. Now, we gradually increase till we reach * such that an agent with * will take a different path. We claim that there is at least one edge (*v, u*) in *P*_{0} that will not take part in any path that an agent with > * will take. More generally, each time that we discover a new path, we essentially delete at least one edge from the graph. Hence the number of paths in *P*(*G*) is bounded by the number of edges in the graph.

Theorem 4.1 tells us that the effect of present-bias on the path that an agent takes is in some sense limited as the specific value of an agent has determines which path will the agent take from a precomputed set of at most *O*(*n*^{2}) different paths. Since this *O*(*n*^{2}) quantity is much smaller in general than the full set of possible paths, it says that the possible heterogeneity in agent behavior based on different levels of present bias is not as extensive as it might initially seem. Furthermore, quantifying this heterogeneity is a first step in designing efficient task graphs for populations of agents that are heterogeneous.

We now consider the version of the model with rewards: there is a reward at *t*, and the agent has the additional option of quitting if it perceivesunder its present-biased evaluationthat the value of the reward is not worth the remaining cost in the path.

Note that the presence of the reward does not affect the agent's *choice* of path, only whether it continues along the path. Thus we can clearly determine the minimum reward *r* required to motivate the agent to reach the goal in *G* by simply having it construct a path to *t* according to our standard fixed-goal model, identifying the node at which it perceives the remaining cost to be the greatest (due to present bias this might not be *s*), and assigning this maximum perceived cost as a reward at *t.*

A more challenging question is suggested by the possibility of deleting nodes and edges from *G*; recall that Figure 2(b) showed a basic example in which the instructor of a course was able to motivate a student to finish the course-work by deleting a node from the underlying graph. (This deletion essentially corresponded to introducing a deadline for the first piece of work.) This shows that even if the reward remains fixed, in general it may be possible for a designer to remove parts of the graph, thereby reducing the set of options available to the agent, so as to get the agent to reach the goal. We now consider the structure of the subgraphs that naturally arise from this process.

**Motivating subgraphs: A fundamental example**

The basic set-up we consider is the following. Suppose the agent in the reward model is trying to construct a path from *s* to *t* in *G*; the reward *r* is not under our controlperhaps it is defined by a third party, or represents an intrinsic reward that we can not augmentbut we are able to remove nodes and edges from the graph (essentially by declaring certain activities invalid, as the deadline did in Figure 2(b) ). Let us say that a subgraph *G'* of *G motivates* the agent if in *G'* with reward *r*, the agent reaches the goal node *t.* (We will also refer to *G'* as a *motivating subgraph.*) Note that it is possible for the full graph *G* to be a motivating subgraph.

It would be natural to conjecture that if there is any subgraph *G'* of *G* that motivates the agent, then there is a motivating subgraph consisting simply of an *s-t* path *P* that the agent follows. In fact this is not the case. Figure 4 shows a graph illustrating a phenomenon that we find somewhat surprising *a priori*, though not hard to verify from the example. In the graph *G* depicted in the figure, an agent with = 1/2 will reach the goal *t.* However, there is no proper subgraph of *G* in which the agent will reach the goal. The point is that the agent starts out expecting to follow the path *s-a-b-t*, but when it gets to node *a* it finds the remainder of the path *a-b-t* too expensive to justify the reward, and it switches to *a-c-t* for remainder. With just the path *s-a-b-t* in isolation, the agent would get stuck at *a*; and with just *s-a-c-t*, the agent would never start out from *s.* It is crucial the agent mistakenly believes the upper path is an option in order to eventually use the lower path to reach the goal.

**Figure 4. A minimal subgraph for getting an agent to reach t.**

It is interesting, of course, to consider real-life analogues of this phenomenon. In some settings, the structure in Figure 4 could correspond to deceptive practices on the part of the designer of *G*in other words, inducing the agent to reach the goal by misleading them at the outset. But there are other settings in real life where one could argue that the type of deception represented here is more subtle, not any one party's responsibility, and potentially even salutary. For example, suppose the graph schematically represents the learning of a skill such as a musical instrument. There's the initial commitment corresponding to the edge (*s, a*), and then the fork at *a* where one needs to decide whether to "get serious about it" (taking the expensive edge (*a, b*) ) or not (taking the cheaper edge (*a, c*) ). In this case, the agent's trajectory could describe the story of someone who derived personal value from learning the violin (the lower path) even though at the outset they believed incorrectly that they would be willing to put the work into becoming a concert violinist (the upper path).

**The structure of minimal motivating subgraphs**

Given that there is sometimes no single path in *G* that is motivating, how rich a subgraph do we necessarily need to motivate the agent? Let us say that a subgraph *G** of *G* is a *minimal motivating subgraph* if (i) *G** is motivating, and (ii) no proper subgraph of *G** is motivating. Thus, for example, in Figure 4, the graph *G* is a minimal motivating subgraph of itself; no proper subgraph of *G* is motivating.

Concretely, then, we can ask the following question: what can a minimal motivating subgraph look like? For example, could it be arbitrarily dense with edges?

In fact, minimal motivating subgraphs necessarily have a sparse structure, which we now describe in our next theorem. To set up this result, we need the following definition. Given a directed acyclic graph *G* and a path *P* in *G*, we say that a path *Q* in *G* is a *P-bypass* if the first and last nodes of *Q* lie on *P*, and no other nodes of *Q* do; in other words, *P* *Q* is equal to the two ends of *Q.* We now have

THEOREM 5.1. *If G* is a minimal motivating subgraph, then it contains an s-tpath P* with the properties that*

*Every edge of G* is either part of P* or lies on a P*-bypass in G*; and**Every node of G* has at most one outgoing edge that does not lie on P*.*

**Proof sketch.** Roughly speaking, there are two types of edges that should be included in a minimal motivating subgraph: edges that the agent will actually take (these are the edges of the path *P**) and edges that at some point the agent (wrongly) believes that it will take (these are the edges on the *P**-bypasses). It is clearly the case that an edge *e* that the agent never plans to take can be safely removed from the graph without affecting the agent's decisions. Furthermore, since all the bypass edges are only used in shortest path computations it is impossible for a node *v* in *P** to have two neighbors *w*_{1} and *w*_{2} not on *P** such that the agent at some *v*_{1} on *P** plans to follow a path that includes the edge (*v, w*_{1}) and an agent at some *v*_{2} on *P** plans to follow a path that includes the edge (*v, w*_{2}). This is simply because if (*v, w*_{1}) + *d*(*w*_{1}, *t*) (*v, w*_{2}) + *d*(*w*_{2}, *t*) the agent will choose the path that contains (*v*, *w*_{1}) both when standing at *v*_{1} and at *v*_{2} and otherwise it will choose the path that contains (*v, w*_{2}) in both cases.

After the publication of our original paper, Tang et al.^{20} and Albers and Kraft^{2} independently showed that determining whether a task graph admits a motivating subgraph is NP-complete. One way of circumventing these hardness results is to identify task graphs that are more common in practice and asking whether on these graphs the motivating subgraph problem can be solved in polynomial time. Alternatively, we can consider approximation algorithms. Albers and Kraft^{2} considered a variant of this question: what is the minimum reward *r* for which a motivating subgraph exists? They showed that this problem can not be approximated by a factor better than , and they presented a approximation algorithm. Interestingly, the subgraph achieving the approximation is a path. Surprisingly, in a different paper, Albers and Kraft^{3} were able to break the barrier and presented a 2-approximation algorithm for a more powerful designer who is able to increase the costs of edges.

An alternative way to motivate an agent to reach *t* is to place intermediate rewards on specific nodes or edges; the agent will claim each reward if it reaches the node or edge on which it is placed. Now the question is to place rewards on the nodes or edges of an instance *G* such that the agent reaches the goal *t* while claiming as little total reward as possible; this corresponds to the designer's objective to pay out as little as possible while still motivating the agent to reach the goal. Following our paper, Albers and Kraft^{2} and Tang et al.^{20} studied different versions of this question and showed that the problem of assigning intermediate rewards in an optimal way is NP-complete. A version worth mentioning is the one in which the designer only cares about minimizing the rewards that the agent actually takes. Such a formulation of the problem comes with the danger that a designer would be able to create "exploitive" solutions in which the agent is motivated by intermediate rewards that it will never claim, because these rewards are on nodes that the agent will never reach.

We have developed a graph-theoretic model in which an agent constructs a path from a start node *s* to a goal node *t* in an underlying graph *G* representing a sequence of tasks. Time-inconsistent agents may plan an *s-t* path that is different from the one they actually follow, and this type of behavior in the model can reproduce a range of qualitative phenomena including procrastination, abandonment of long-range tasks, and the benefits of a reduced set of options. Our results provide characterizations for a set of basic structures in this model, including for graphs achieving the highest cost ratios between time-inconsistent agents and shortest paths, and we have investigated the structure of minimal graphs on which an agent is motivated to reach the goal node.

There is a wide range of broader issues for further work. These include finding structural properties beyond our graph-minor characterization that have a bearing on the cost ratio of a given instance; obtaining a deeper understanding of the relationship between agents with different levels of time-inconsistency as measured by different values of ; and developing algorithms for designing graph structures that motivate effort as efficiently as possible, including for multiple agents with diverse time-inconsistency properties.

In work following the initial appearance of our paper, the results were extended in several subsequent lines of research. We have discussed several of these further results in the text thus far, including a tighter graph-minor characterization of instances with high cost ratio,^{20} and a set of hardness results and approximation algorithms for motivating subgraphs.^{2, 3, 20} Further results beyond these have considered the behavior of different types of agents. Gravin et al.^{8} studied the behavior of a present-biased agent whose present-bias parameter is not fixed over time; rather, after each step it re-samples the parameter from some fixed distribution. In Kleinberg et al.,^{10} the authors of the present paper together with Manish Raghavan studied the behavior of *sophisticated* agents (building on a formalization from O'Donoghue and Rabin^{14}), who are aware of their present bias and can take it into account in their planning. Such agents are not able to ignore or disregard their present biasthey still prefer avoiding costs in the present time periodbut their sophistication does mean that they can take measures to reduce the future effects of present bias in their plans. Finally Kleinberg et al.,^{11} studied the behavior of agents that exhibit two biases concurrently: present bias, as in this paper, and *sunk-cost bias*, which is the tendency to reason about costs already incurred in the formulation of plans for the future.

We thank Supreet Kaur, Sendhil Mullainathan, and Ted O'Donoghue for valuable discussions and suggestions.

1. Akerlof, G.A. Procrastination and obedience. *American Econ. Rev.: Papers and Proc. 81* (1991).

2. Albers, S., Kraft, D. Motivating time-inconsistent agents: A computational approach. In *Intl. Conf. Web and Internet Econ.* (2016).

3. Albers, S., Kraft, D. On the value of penalties in time-inconsistent planning. In *Proc. 44th Intl. Colloq. on Automata, Languages and Programming* (2017).

4. Ariely, D., Wertenbroch, K. Procrastination, deadlines, and performance: self-control by precommitment. *Psych. Sci. 13* (2002).

5. Carstensen, P. The complexity of some problems in parametric linear and combinatorial programming. PhD thesis, U. Michigan (1983).

6. Diestel, R. *Graph Theory*, 3 edn. Springer (Berlin/Heidelberg, 2005).

7. Frederick, S., Loewenstein, G., O'Donoghue, T. Time discounting and time preference. *J. Econ. Lit. 40*, 2 (June 2002), 351401.

8. Gravin, N., Immorlica, N., Lucier, B., Pountourakis, E. Procrastination with variable present bias. In *Proc. ACM Conf. Econ. Comp.* (2016).

9. Kaur, S., Kremer, M., Mullainathan, S. Self-control and the development of work arrangements. *American Econ. Rev.: Papers and Proceedings 100* (2002).

10. Kleinberg, J., Oren, S., Raghavan, M. Planning problems for sophisticated agents with present bias. In *Proc. ACM Conf. Econ. Comp.* (2016).

11. Kleinberg, J., Oren, S., Raghavan, M. Planning with multiple biases. In *Proc. ACM Conf. Econ. Comp.* (2017).

12. Laibson, D. Golden eggs and hyperbolic discounting. *Q. J. Econ. 112*, 2 (1997), 443478.

13. Nikolova, E., Kelner, J.A., Brand, M., Mitzenmacher, M. Stochastic shortest paths via quasi-convex maximization. In *Proc. 14th European Symposium on Algorithms* (2006), 552563.

14. O'Donoghue, T., Rabin, M. Doing it now or later. *American Econ. Rev. 89* (1999).

15. O'Donoghue, T., Rabin, M. Procrastination on long-term projects. *J. Econ. Behav. Org. 66* (2008).

16. Pollak, R.A. Consistent planning. *Rev. Econ. Studies 35*, 2 (Apr. 1968), 201208.

17. Roughgarden, T. Lecture 19: Time-inconsistent planning, 2016. http://theory.stanford.edu/tim/f16/l/l19.pdf.

18. Russell, S.L., Norvig, P. *Artificial Intelligence: A Modern Approach.* Prentice Hall (Upper Saddle River NJ, USA, 1994).

19. Strotz, R.H. Myopia and inconsistency in dynamic utility maximization. *Rev. Econ. Studies 23* (1955).

20. Tang, P., Teng, Y., Wang, Z., Xiao, S., Xu, Y. Computational issues in time-inconsistent planning. In *Proc. 31st AAAI Conf. Artificial Intelligence* (2017).

a. Note that there is no time-discounting in this example, so the factor of *b* is only applied to the present time period, while all future time periods are treated equally. We will return to the issue of discounting shortly.

b. For purposes of our discussion, we distinguish abandonment of a task from the type of procrastination exhibited by Akerlof's example, in which the task is eventually finished, but at a much higher cost due to the effect of procrastination.

c. This sketch is based on the full proof in our original paper, and also draws on the exposition in Tim Roughgarden's lecture notes about our paper.^{17}

The original version of this paper was published in the *Proceedings of the 15th ACM Conference on Economics and Computation* (Palo Alto, CA, June 812, 2014), 547564.

**©2018 ACM 0001-0782/18/03**

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from permissions@acm.org or fax (212) 869-0481.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2018 ACM, Inc.

No entries found