The expected value of extinction risk reduction is positive

December 06, 2018

By Jan M. Brauner and Friederike M. Grosse-Holz

Work on this article has been funded by the Centre for Effective Altruism, but the article represents the personal views of the authors.

Abstract

If most expected value or disvalue lies in the billions of years to come, altruists should plausibly focus their efforts on improving the long-term future. It is not clear whether reducing the risk of human extinction would, in expectation, improve the long-term future, because a future with humanity may be better or worse than one without it.

From a consequentialist, welfarist view, most expected value (EV) or disvalue of the future comes from scenarios in which (post-)humanity colonizes space, because these scenarios contain most expected beings. If we simply extrapolate the current welfare (part 1.1) of humans, farmed animals and wild animals, it is unclear whether we should support spreading sentient beings to other planets.

From a more general perspective (part 1.2), future agents will likely care morally about the things we find valuable, or about things we are neutral towards. It seems very unlikely that they would see value exactly where we see disvalue. If future agents are powerful enough to shape the world according to their preferences, this asymmetry implies that the EV of future agents colonizing space is positive from many welfarist perspectives.

If we can defer the decision about whether to colonize space to future agents with more moral and empirical insight, doing so creates option value (part 1.3). However, most expected future disvalue plausibly comes from futures controlled by indifferent or malicious agents. Such “bad” agents will make worse decisions than we, currently, could. Thus, the option value in reducing the risk of human extinction is small.

The universe may not stay empty, even if humanity goes extinct (part 2.1). A non-human animal civilization, extraterrestrials or uncontrolled artificial intelligence that was created by humanity might colonize space. These scenarios may be worse than (post-)human space colonization in expectation. Additionally, with more moral or empirical insight, we might realize that the universe is already filled with beings or things we care about (part 2.2). If the universe is already filled with disvalue that future agents could alleviate, this gives further reason to reduce extinction risk.

In practice, many efforts to reduce the risk of human extinction also have other effects of long-term significance. Such efforts might often reduce the risk of global catastrophes (part 3.1) from which humanity would recover, but which might set technological and social progress on a worse track than it is on now. Furthermore, such efforts often promote global coordination, peace and stability (part 3.2), which is crucial for the safe development of pivotal technologies and for avoiding negative trajectory changes in general.

Aggregating these considerations, efforts to reduce extinction risk seem positive in expectation from most consequentialist views, ranging from neutral on some views to extremely positive on others. As efforts to reduce extinction risk also seem highly leveraged and time-sensitive, they should probably hold a prominent place in the long-termist EA portfolio.

Introduction and background

The future of Earth-originating life might be vast, lasting millions of years and containing many times more beings than are currently alive (Bostrom, 2003). If future beings matter morally, it should plausibly be a major moral concern that the future plays out well. So how should we, today, prioritise our efforts aimed at improving the future?

We could try to reduce the risk of human extinction. A future with humanity would be drastically different from one without it. Few other factors seem as pivotal for what the world will look like in the millions of years to come as whether or not humanity survives the next few centuries and millennia. Effective efforts to reduce the risk of human extinction could thus have immense long-term impact. If we were sure that this impact was positive, extinction risk reduction would plausibly be one of the most effective ways to improve the future.

However, it is not at first glance clear that reducing extinction risk is positive from an impartial altruistic perspective. For example, future humans might have terrible lives that they can’t escape from, or humane values might exert little control over the future, resulting in future agents causing great harm to other beings. If indeed it turned out that we weren’t sure if extinction risk reduction was positive, we would prioritize other ways to improve the future without making extinction risk reduction a primary goal.

To inform this prioritisation, in this article we estimate the expected value of efforts to reduce the risk of human extinction.

Moral assumptions

Throughout this article, we base our considerations on two assumptions:

  1. That it morally matters what happens in the billions of years to come. From this very long-term view, making sure the future plays out well is a primary moral concern.
  2. That we should aim to satisfy our reflected moral preferences. Most people would want to act according to the preferences they would have upon idealized reflection, rather than according to their current preferences. The process of idealized reflection will differ between people. Some people might want to revise their preferences after becoming much smarter and more rational and spending millions of years in philosophical discussion. Others might want to largely keep their current moral intuitions, but learn empirical facts about the world (e.g. about the nature of consciousness).

Most arguments further assume that the state the world is brought into by one’s actions is what matters morally (as opposed to e.g. the actions following a specific rule). We thus take a consequentialist view, judging potential actions by their consequences.

Parts 1.1 and 1.2 further take a welfarist perspective, assuming that what matters morally in states of the world is the welfare of sentient beings. In a way, that means assuming our reflected preferences are welfarist. Welfare will be broadly defined as including pleasure and pain, but also complex values or the satisfaction of preferences. From this perspective, a state of the world is good if it is good for the individuals in this world. Across several beings, welfare will be aggregated additively[^fn-1], no matter how far in the future an expected being lives. Additional beings with positive (negative) welfare coming into existence will count as morally good (bad). In short, parts 1.1 and 1.2 take the view of welfarist consequentialism with a total view on population ethics (see e.g. (Greaves, 2017)), but the arguments also hold for other similar views.
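As a minimal formal sketch of this aggregation (one possible formalization, not a definition we commit to):

$$V(\text{world}) \;=\; \sum_{i \,\in\, \text{all sentient beings, across all times}} w_i$$

where $w_i$ is the lifetime welfare of being $i$, with no temporal discounting; bringing an additional being with $w_i > 0$ (or $w_i < 0$) into existence increases (or decreases) $V$.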

If we make the assumptions outlined above, nearly all expected value or disvalue in a future with humanity arises from scenarios in which (post-)humans colonize space. The colonizable universe seems very large, so scenarios with space colonization likely contain a lot more beings than scenarios with earthbound life only (Bostrom, 2003). Conditional on human survival, space colonization also does not seem too unlikely, so nearly all expected future beings live in scenarios with space colonization[^fn-2]. We thus take “a future with humanity” to mean “(post-)human space colonization” for the main text and briefly discuss what a future with only earthbound humanity might look like in Appendix 1.

Outline of the article

Ultimately, we want to know “What is the expected value (EV) of efforts to reduce the risk of human extinction?”. We will address this question in three parts:

  • In part 1, we ask “What is the EV of (post-)human space colonization[^fn-3]?”. We first attempt to extrapolate the EV from the amounts of value and disvalue in today’s world and how they would likely develop with space colonization. We then turn toward a more general examination of what future agents’ tools and preferences might look like and how they will, in expectation, shape the future. Finally, we consider if future agents could make a better decision on whether to colonize space (or not) than we can, so that it seems valuable to let them decide (option value).
  • In part 1 we tacitly assumed the universe without humanity is and stays empty. In part 2, we drop that assumption. We evaluate how the possibility of space colonization by alternative agents and the possibility of existing but tractable disvalue in the universe change the EV of keeping humans around.
  • In part 3, we ask “Besides reducing extinction risk, what will be the consequences of our efforts?”. We look at how different efforts to reduce extinction risk might influence the long-term future by reducing global catastrophic risk and by promoting global coordination and stability.

We stress that the conclusions of the different parts should not be separated from the context. Since we are reasoning about a topic as complex and uncertain as the long-term future, we take several views, aiming to ultimately reach a verdict by aggregating across them.

A note on disvalue-focus

The moral view on which this article is based is very broad and can include enormously different value systems, in particular different degrees of ‘disvalue-focus’. We consider a moral view disvalue-focused if it holds that the prevention/reduction of disvalue is (vastly) more important than the creation of value. Examples include views that regard the prevention or reduction of suffering as an especially high moral priority.

The degree of disvalue-focus one takes strongly influences the EV of reducing extinction risk.

From very disvalue-focused views, (post-)human space colonization may not seem desirable even if the future contains a much better ratio of value to disvalue than today. There is little to gain from space colonization if the creation of value (e.g. happy beings) matters little morally. On the other hand, space colonization would multiply the number of sentient beings and thereby multiply the absolute amount of disvalue.

At first glance it thus seems that reducing the risk of human extinction is not a good idea from a strongly disvalue-focused perspective. However, the value of extinction risk reduction for disvalue-focused views gets shifted upwards considerably by the arguments in parts 2 and 3 of this article.


Part 1: What is the EV of (post-)human space colonization?[^fn-4]

1.1: Extrapolating from today’s world

Space colonization is hard. By the time our technology is advanced enough, human civilization will possibly have changed considerably in many ways. However, to get a first grasp of the expected value of the long-term future, we can model it as a rough extrapolation of the present. What if humanity as we know it colonized space? There would be vastly more sentient beings, including humans, farmed animals and wild animals[^fn-5]. To estimate the expected value of this future, we will consider three questions:

  1. How many humans, farmed animals and wild animals will exist?

  2. How should we weigh the welfare of different beings?

  3. For each of humans, farmed animals and wild animals:

     • Is the current average welfare net positive/average life worth living?

     • How will welfare develop in the future?

We will then attempt to draw a conclusion. Note that throughout this consideration, we take an individualistic welfarist perspective on wild animals. This perspective stands in contrast to e.g. valuing functional ecosystems and might seem unusual, but is increasingly popular.

There will likely be more farmed and wild animals than humans, but the ratio will decrease compared to the present

In today’s world, both farmed and wild animals outnumber humans by far. There are about 3-4 times more farmed land animals and about 13 times more farmed fish[^fn-6] than humans alive. Wild animals, in turn, outnumber farmed animals, with about 10 times more wild birds than farmed birds and 100 times more wild mammals than farmed mammals alive at any point. Moving on to smaller wild animals, the numbers increase again, with 10 000 times more vertebrates than humans, and between 100 000 000 and 10 000 000 000 times more insects and spiders than humans[^fn-7].

In the future, the relative number of animals compared to humans will likely decrease considerably.

Farmed animals will not be alive if animal farming substantially decreases or stops, which seems more likely than not for both moral and economic reasons. Humanity’s moral circle seems to have been expanding throughout history (Singer, 2011), and further expansion to animals may well lead us to stop farming them.[^fn-8] Financially, plant-based meat alternatives or lab-grown meat will likely become more efficient to produce than meat from animals (Tuomisto and Teixeira de Mattos, 2011). However, none of these developments seems unequivocally destined to end factory farming[^fn-9], and the historical track record shows that meat consumption per head has been growing for > 50 years[^fn-10]. Overall, it seems likely but not absolutely clear that the number of farmed animals relative to humans will be smaller in the future. For wild animals, we can extrapolate from a historical trend of decreasing wild animal populations. Even if wild animals were spread to other planets for terraforming, the animal/human ratio would likely be lower than today.

Welfare of different beings can be weighted by (expected) consciousness

To determine the EV of the future, we need to aggregate welfare across different beings. It seems like we should weigh the experience of a human, a cow and a beetle differently when adding up, but by how much? This is a hard question with no clear answer, but we outline some approaches here. The degree to which an animal is conscious (“the lights are on”, the being is aware of its experiences, emotions and thoughts), or the confidence we have in an animal being conscious, can serve as a parameter by which to weight welfare. To arrive at a number for this parameter, we can use proxies such as brain mass, neuron count and mental abilities directly. Alternatively, we may aggregate these proxies with other considerations into an estimate of confidence that a being is conscious. For instance, the Open Philanthropy Project estimates the probability that cows are conscious at 80%.

The EV of (post-)human lives is likely positive

Currently, the average human life seems to be perceived as worth living. Survey data and experience sampling suggest that most humans are quite content with their lives and experience more positive than negative emotions on a day-to-day basis[^fn-11]. Humans who find their lives not worth living can end them, but relatively few people commit suicide (suicide accounts for 1.7% of all deaths in the US).[^fn-12] We could conclude that human welfare is positive.

We should, however, note two caveats to this conclusion. First, a life can be perceived as worth living even if it is negative from a welfarist perspective.[^fn-13] Second, the average life might not be worth living if the suffering of the worst off was sufficiently more intense than the happiness of the majority of people.

Overall, it seems that from a large majority of consequentialist views, the current aggregated human welfare is positive.

In the future, we will probably make progress that will improve the average human life. Historical trends have been positive across many indicators of human well-being, knowledge, intelligence and capability. On a global scale, violence is declining and cooperation increasing (Pinker, 2011). Yet the trend does not extend to all indicators: subjective welfare has (in recent times) remained stable or improved very little, and mental health problems have become more prevalent. These developments have sparked research into positive psychology and mental health treatment, which is slowly bearing fruit. As more fundamental issues are gradually improved, humanity will likely shift more resources towards actively improving welfare and mental health. Powerful tools like genetic design and virtual reality could be used to further improve the lives of the broad majority as well as the worst-off. While there are good reasons to assume that human welfare in the future will be more positive than now, we still face uncertainties (e.g. from low-probability events like malicious but very powerful autocratic regimes, and from unknown unknowns).

The EV of farmed animals’ lives is probably negative

Currently, 93% of farmed animals live on factory farms in conditions that likely make their lives not worth living. Although there are positive sides to animal life on farms compared to life in the wild[^fn-14], these are likely outweighed by negative experiences[^fn-15]. Most farmed animals also lack opportunities to exhibit naturally desired behaviours like grooming. While there is clearly room for improvement in factory farming conditions, the question “is the average life worth living?” must be answered separately for each situation and remains controversial[^fn-16]. On average, a factory farm animal life today probably has negative welfare.

In the future, factory farming is likely to be abolished or modified to improve animal welfare as our moral circle expands to animals (see above). We can thus be moderately optimistic that farmed animal welfare will improve and/or that fewer farmed animals will be alive.

The EV of wild animals’ lives is very unclear, but potentially negative

Currently, we know too little about the lives and experiences of wild animals to judge whether their average welfare is positive or negative. We see evidence of both positive[^fn-17] and negative[^fn-18] experiences. Meanwhile, our perspective on wild animals might be skewed towards charismatic big mammals living relatively good lives. We thus overlook the vast majority of wild animals, based both on biomass and on neuron count. Most smaller wild animal species (invertebrates, insects etc.) are r-selected, with most individuals living very short lives before dying painfully. While vast numbers of those lives seem negative from a welfarist perspective, we may choose to weight them less based on the considerations outlined above. In summary, most welfarist views would probably judge the aggregated welfare of wild animals as negative. The more one thinks that smaller, r-selected animals matter morally, the more negative average wild animal welfare becomes.

In the future, we may reduce the suffering of wild animals, but it is unclear whether their welfare would become positive. Future humans may be driven by the expansion of the moral circle and empowered by technological progress (e.g. biotechnology) to improve wild animal lives. However, if average wild animal welfare remains negative, it would still be bad to increase wild animal numbers through space colonization.

Conclusion

It remains unclear whether the EV of a future in which a human civilization similar to the one we know colonized space is positive or negative.

To quantify the above considerations from a welfarist perspective, we created a mathematical model. This model yields a positive EV for a future with space colonization if different beings are weighted by neuron count and a negative EV if they are weighted by sqrt(neuron count). In the first case, average welfare is positive, driven by the spreading of happy (post-)humans. In the second case, average welfare is negative as suffering wild animals are spread. The model is also based on a series of low-confidence assumptions[^fn-19], alteration of which could flip the sign of the outcome again.
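The gist of that weighting comparison can be illustrated with a toy calculation. All population sizes, welfare levels and neuron counts below are made-up placeholders, not the inputs of the actual model:

```python
# Toy illustration of how the weighting scheme can flip the sign of aggregate
# welfare. All numbers are placeholders, not the estimates used in the model.
import math

populations = {
    # name: (number of individuals, average welfare per individual, neurons per individual)
    "humans":             (1e10, +0.6, 8.6e10),
    "farmed animals":     (1e11, -0.4, 2e8),
    "small wild animals": (1e15, -0.1, 1e5),
}

def aggregate(weight_fn):
    """Sum welfare across populations, weighting each individual by weight_fn(neuron count)."""
    return sum(count * welfare * weight_fn(neurons)
               for count, welfare, neurons in populations.values())

ev_linear = aggregate(lambda n: n)            # weight by neuron count
ev_sqrt   = aggregate(lambda n: math.sqrt(n)) # weight by sqrt(neuron count)

print(f"weighted by neuron count:       {ev_linear:+.2e}")
print(f"weighted by sqrt(neuron count): {ev_sqrt:+.2e}")
```

With these placeholder numbers, the linear weighting is dominated by humans’ positive welfare, while the square-root weighting is dominated by the far more numerous small wild animals, mirroring the sign flip described above.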

More qualitatively, the EV of an extrapolated future heavily depends on one’s moral views. The degree to which one is focused on avoiding disvalue seems especially important. Consider that every day, humans and animals are tortured, murdered, or left in psychological despair. Those who would walk away from Omelas might also walk away from current and extrapolated future worlds.

Finally, we should note how little we know about the world and how this impacts our confidence in considerations about an extrapolated future. To illustrate the extent of our empirical uncertainty, consider that we are extrapolating from 100 000 years of human existence, 10 000 years of civilizational history and 200 years of industrial history to potentially 500 million years on Earth (and much longer in the rest of the universe). If people in the past had guessed about the EV of the future in a similar manner, they would most likely have gotten it wrong (e.g. they might not have considered the moral relevance of animals, or not have known that there is a universe to potentially colonize). We might be missing crucial considerations now in analogous ways.


1.2: Future agents’ tools and preferences

While part 1.1 extrapolates directly from today’s world, part 1.2 takes a more abstract approach. To estimate the EV of (post-)human space-colonization in more broadly applicable terms, we consider three questions:

  1. Will future agents have the tools to shape the world according to their preferences?
  2. Will future agents’ preferences resemble our reflected preferences?
  3. Can we expect the net welfare of future agents and powerless beings to be positive or negative?

We then attempt to estimate the EV of future agents colonizing space from a welfarist consequentialist view.

Future agents will have powerful tools to shape the world according to their preferences

Since climbing down from the trees, humanity has changed the world a great deal. We have done this by developing increasingly powerful tools to satisfy our preferences, such as preferences to eat, stay healthy and warm, and communicate with friends even if they are far away. Insofar as humans have altruistic preferences, powerful tools have made acting on them less costly. For instance, if you see someone who is badly hurt and want to help, you don’t have to carry them home and care for them yourself anymore; you can just call an ambulance. However, powerful tools have also made it easier to cause harm, either by satisfying harmful preferences (e.g. with weapons of mass destruction) or as a side-effect of actions we are indifferent to. Technologies that enable factory farming do enormous harm to animals, although they were developed to satisfy a preference for eating meat, not for harming animals[^fn-20].

It seems likely that future agents will have much more powerful tools than we do today. These tools could be used to make the future better or worse. For instance, biotechnology and genetic engineering could help us cure diseases and live longer, but they could also entrench inequality if treatments are too expensive for most people. Advanced AI could make all kinds of services much cheaper but could also be misused. For more potent and complex tools, the stakes are even higher. Consider the example of technologies that facilitate space colonization. These tools could be used to cause the existence of many times more happy lives than would be possible on Earth, but also to spread suffering.

In summary, future agents will have the tools to create enormous value (more examples here) or disvalue (more examples here).[^fn-21] It is thus important to consider the values/preferences that future agents might have.

We can expect future agents to have other-regarding preferences that we would, after reflection, find somewhat positive

When referring to future agents’ preferences, we distinguish between ‘self-regarding preferences’, i.e. preferences about states of affairs that directly affect an agent, and ‘other-regarding preferences’, i.e. preferences about the world that remain even if an agent is not directly affected (see footnote[^fn-22] for a precise definition). Future agents’ other-regarding preferences will be crucial for the value of the future. For example, if the future contains powerless beings in addition to powerful agents, the welfare of the former will depend to a large degree on the other-regarding preferences of the latter (much more about that later).

We can expect a considerable fraction of future agents’ preferences to be other-regarding

Most people alive today clearly have (positive and negative) other-regarding preferences, but will this be the case for future agents? It has been argued that over time, other-regarding preferences could be stripped away by Darwinian selection. We explore this argument and several counterarguments in appendix 2. We conclude that future agents will, in expectation, have a considerable fraction of other-regarding preferences.

Future agents’ preferences will in expectation be parallel rather than anti-parallel to our reflected preferences

We want to estimate the EV of a future shaped by powerful tools according to future agents’ other-regarding preferences. In this article we assume that we should ultimately aim to satisfy our reflected moral preferences, the preferences we would have after an idealized reflection process (as discussed in the "Moral assumptions" section above). Thus, we must establish how future agents’ other-regarding preferences (FAP) compare to our reflected other-regarding preferences (RP). Briefly put, we need to ask: “would we want the same things as these future agents who will shape the world?”

FAP can be somewhere on a spectrum from parallel to orthogonal to anti-parallel to RP. If FAP and RP are parallel, future agents agree exactly with our reflected preferences. If they are anti-parallel, future agents see value exactly where we see disvalue. And if they are orthogonal, future agents value what we regard as neutral, and vice versa. We now examine how FAP will be distributed on this spectrum.

Assume that future agents care about moral reflection. They will then have better conditions for an idealized reflection process than we have, for several reasons:

  • Future agents will probably be more intelligent and rational[^fn-23]
  • Empirical advances will help inform moral intuitions (e.g. experience machines might allow agents to get a better idea of other beings’ experiences)
  • Philosophy will advance further
  • Future agents will have more time and resources to deliberate

Given these prerequisites, it seems that future agents’ moral reflection would in expectation lead to FAP that are parallel rather than anti-parallel to RP. How much overlap between FAP and RP to expect remains difficult to estimate.[^fn-24]

However, scenarios in which future agents do not care about moral reflection might substantially influence the EV of the future. For example, it might be likely that humanity loses control and the agents shaping the future bear no resemblance to humans. This could be the case if developing controlled artificial general intelligence (AGI) is very hard and the probability that misaligned AGI will be developed is high (in this case, the future agent is a misaligned AI).[^fn-25]

Even if (post-)humans remain in control, human moral intuitions might turn out to be contingent on the starting conditions of the reflection process and not very convergent across the species. Thus, FAP may not develop in any clear direction, but rather drift randomly[^fn-26]. Very strong and fast goal drift might be possible if future agents include digital (human) minds, because such minds would not be constrained by the cultural universals rooted in physical brain architecture.

If it turns out that FAP develop differently from RP, FAP will in expectation be orthogonal to RP rather than anti-parallel. The space of possible preferences is vast, so it seems much more likely that FAP will be completely different from RP, rather than exactly opposite[^fn-27] (See footnote[^fn-28] for an example). In summary, FAP parallel or orthogonal to RP both seem likely, but a large fraction of FAP being anti-parallel to RP seems fairly unlikely. This main claim seems true for most “idealized reflection processes” that people would choose.

However, FAP being between parallel and orthogonal to RP in expectation does not necessarily imply the future will be good. Actions driven by (orthogonal) FAP could have very harmful side-effects, as judged by our reflected preferences. Harmful side-effects could be devastating especially if future agents are indifferent towards beings we (would on reflection) care about morally. Such negative side-effects might outweigh positive intended effects, as has happened in the past[^fn-29]. Indeed, some of the most discussed “risks of astronomical future suffering” are examples of negative side-effects.[^fn-30]

Future agents’ tools and preferences will in expectation shape a world with probably net positive welfare

Above we argued that we can expect some overlap between future agents’ other-regarding preferences (FAP) and our reflected other-regarding preferences (RP). We can thus be somewhat optimistic about the future in a very general way, independent of our first-order moral views, if we ultimately aim to satisfy our reflected preferences. In the following section, we will drop some of that generality. We will examine what future agents’ preferences imply for the welfare of future beings. In doing so, we assume that we would on reflection hold an aggregative, welfarist altruistic view (as explained in the background section).

If we assume these specific RP, can we still expect FAP to overlap with them? After all, other-regarding preferences anti-parallel to welfarist altruism – such as sadistic, hateful or vengeful preferences – clearly exist within present-day humanity. If current human values transferred broadly into the future, should we then expect a large fraction of FAP to be anti-parallel to welfarist altruism? Probably not. We argue in appendix 3 that, although this is hard to quantify, the large majority of human other-regarding preferences seem positive.

Assuming somewhat welfarist FAP, we explore what the future might be like for two types of beings: Future agents (post-humans) who have powerful tools to shape the world, and powerless future beings. To aggregate welfare for moral evaluation, we need to estimate how many beings of each type will exist. Powerful agents will likely be able to create powerless beings as “tools” if this seems useful for them. Sentient “tools” could include animals, farmed for meat production or spread to other planets for terraforming (e.g. insects), but also digital sentient minds, like sentient robots for task performance or simulated minds created for scientific experimentation or entertainment. The last example seems especially relevant, as digital minds could be created in vast amounts if digital sentience is possible at all, which does not seem unlikely. If we find we morally care about these “tools” upon reflection, the future would contain many times more powerless beings than powerful agents.

The EV of the future thus depends on the welfare of both powerful agents and powerless beings, with the latter potentially much more relevant than the former. We now consider each in turn, asking:

  • How will their expected welfare be affected by intended effects and side-effects of future agents’ actions?
  • How to evaluate this morally?

The aggregated welfare of powerful future agents is in expectation positive

Future agents will have powerful tools to satisfy their self-regarding preferences and be somewhat benevolent towards each other. Thus, we can expect future agents’ welfare to be increased through intended effects of their actions.

Side-effects of future agents’ actions that are negative for other agents’ welfare would mainly arise if their civilization is not well coordinated. However, compromise and cooperation seem to usually benefit all involved parties, indicating that we can expect future agents to develop good tools for coordination and to use them a lot.[^fn-31] Coordination also seems essential for averting many extinction risks. Thus, a civilization that avoided extinction so successfully that it colonizes space can be expected to be quite coordinated.

Taken together, vastly more resources will likely be used in ways that improve the welfare of powerful agents than in ways that diminish their welfare. From the large majority of welfarist views, future agents’ aggregated welfare is thus expected to be positive. This conclusion is also supported by human history, as improved tools, cooperation and altruism have increased the welfare of most humans, and average human lives are seen as worth living by many (see part 1.1).

The aggregated welfare of powerless future beings may in expectation be positive

Assuming that future agents are mostly indifferent towards the welfare of their “tools”, their actions would affect powerless beings only via (in expectation random) side-effects. It is thus relevant to know the “default” level of welfare of powerless beings. If the affected powerless beings were animals shaped by evolution, their default welfare might be net negative. This is because evolutionary pressure might result in a pain-pleasure asymmetry with suffering being much more intense than pleasure (see footnote for further explanation[^fn-32]). Such evolutionary pressure would not apply for designed digital sentience. Given that our experience with welfare is restricted to animals (incl. humans) shaped by evolution, it is unclear what the default welfare of digital sentients would be. If there is at least some moral concern for digital sentience, it seems fairly likely that the creating agents would prefer to give their sentient tools net positive welfare[^fn-33].

If future agents intend to affect the welfare of powerless beings, they might - besides treating their sentient “tools” accordingly - create (dis-)value optimized sentience: minds that are optimized for extreme positive or negative welfare. For example, future agents could simulate many minds in bliss, or many minds in agony. The motivation for creating (dis-)value optimized sentience could be altruism, sadism or strategic reasons[^fn-34]. Creating (dis-)value optimized sentience would likely produce much more (negative) welfare per unit of invested resources than the side-effects on sentient tools mentioned above, as sentient tools are optimized for task performance, not for the production of (dis-)value[^fn-35]. (Dis-)value optimized sentience, and not side-effects on sentient tools, would then be the main determinant of the expected value of post-human space colonization.

FAP may be orthogonal to welfarist altruism, in which case little (dis-)value optimized sentience will be produced. However, we expect a much larger fraction of FAP to be parallel to welfarist altruism than anti-parallel to it, and thus expect that future agents will use many more resources to create value-optimized sentience than disvalue-optimized sentience. The possibility of (dis-)value optimized sentience should therefore increase the net expected welfare of powerless future beings. However, there is considerable uncertainty about the moral implications of one unit of resources spent on value- or disvalue-optimized sentience (see e.g. here and here). On the one hand, (dis-)value optimized sentience created without evolutionary pressure might be equally efficient at producing moral value and disvalue, but would be used a lot more to produce value. On the other hand, disvalue-optimized sentience might lead to especially intense suffering. Many people intuitively give more moral importance to the prevention of suffering the worse it gets (e.g. prioritarianism).

In summary, it seems plausible that a little concern for the welfare of sentient tools could go a long way. Even if most future agents were completely indifferent towards sentient tools (i.e. the majority of FAP were orthogonal to RP), positive intended effects – the creation of value-optimized sentience – could plausibly outweigh side-effects.
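A toy calculation makes the comparison between intended effects and side-effects concrete. All numbers are hypothetical and chosen only for illustration:

```python
# Hypothetical numbers only: even a small share of resources spent on
# value-optimized sentience can outweigh negative side-effects on sentient
# tools, if optimized sentience produces far more welfare per resource unit.

resources = 1.0                      # normalized resource budget of future agents
share_value_optimized = 0.01         # 1% of resources spent on value-optimized sentience
share_sentient_tools = 0.5           # 50% of resources involve sentient tools as a side-effect
welfare_per_unit_optimized = +100.0  # welfare per resource unit when optimized for value
welfare_per_unit_tools = -1.0        # average (default) welfare per resource unit of sentient tools

intended = resources * share_value_optimized * welfare_per_unit_optimized   # +1.0
side_effects = resources * share_sentient_tools * welfare_per_unit_tools    # -0.5

print(intended + side_effects)  # +0.5: intended effects outweigh side-effects here
```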

Conclusion

Morally evaluating the future scenarios sketched in part 1.2 is hard because we are uncertain: empirically uncertain about what the future will be like, and morally uncertain about what our intuitions will be upon reflection. The key unanswered questions are:

  • How much can we expect the preferences that shape the future to overlap with our reflected preferences?
  • In the absence of concern for the welfare of sentient tools, how good or bad is their default welfare?
  • How will the scales of intended effects and side-effects compare?

Taken together, we believe that the arguments in this section indicate that the EV of (post-)human space colonization would only be negative from relatively strongly disvalue-focused views. From the majority, but not overwhelming majority, of welfarist views the EV of (post-)human space colonization seems positive.[^fn-36],[^fn-37]

In parts 1.1 and 1.2, we directly estimated the EV of (post-)human space colonization and found it to be very uncertain. In the remaining parts, we will improve our estimate via other approaches that are less dependent on specific predictions about how (post-)humans will shape the future.


1.3: Future agents could later decide not to colonize space (option value)

We are often uncertain about what the right thing to do is. If we can defer the decision to someone wiser than ourselves, this is generally a good call. We can also defer across time: we can keep our options open for now, and hope our descendants will be able to make better decisions. This option value may give us a reason to prefer to keep our options open.

For instance, our descendants may be in a better position to judge whether space colonization would be good or bad. If they can see that space colonization would be negative, they can refrain from (further) colonizing space: they have the option to limit the harm. In contrast, if humanity goes extinct, the option of (post-)human space colonization is forever lost. Avoiding extinction thus creates ‘option value’ (e.g. MacAskill).[^fn-38] This specific type of ‘option value’ - arising from future agents choosing not to colonize space - rather than the more general value of keeping options open, is what we will be referring to throughout this section.[^fn-39] This type of option value exists for nearly all moral views, and is very unlikely to be negative.[^fn-40] However, as we will discuss in this chapter, this value is rather small compared to other considerations.

A considerable fraction of futures contains option value

Reducing the risk of human extinction only creates option value if future agents will make a better decision, by our (reflected) lights, about whether to colonize space than we could. If they will make worse decisions than us, we would rather decide ourselves.

In order for future agents to make better decisions than us and actually act on them, they need to surpass us in at least one of the following aspects:

  • Better values
  • Better judgement of what space colonization will be like (based on increased empirical understanding and rationality)
  • Greater willingness and ability to make decisions based on moral values (non-selfishness and coordination)

Values

Human values change. We are disgusted by many of our ancestors’ moral views, and they would find ours equally repugnant. We can even look back on our own moral views and disagree. There is no reason for these trends to stop exactly now: human morality will likely continue to change.

Yet at each stage in the change, we are likely to view our values as obviously correct. This should encourage a greater degree of moral uncertainty than feels natural. We should expect that our moral views would change after idealized reflection (although this also depends on which meta-ethical theory is correct and how idealized reflection works).

We argued in part 1.2 that future agents’ preferences will in expectation have some overlap with our reflected preferences. Even if that overlap is not very high, a high degree of moral uncertainty would indicate that we would often prefer future agents’ preferences over our current, unreflected preferences. In a sizeable fraction of future scenarios, future agents, with more time and better tools to reflect, can be expected to make better decisions than we could today.

Empirical understanding and rationality

We now understand the world better than our ancestors, and are able to think more clearly. If those trends continue, future agents may understand better what space colonization will be like, and so better understand how good it will be on a given set of values.

For example, future agents’ estimate of the EV of space colonization will benefit from

  • Better empirical understanding of the universe (for instance about questions discussed in part 2.2)[^fn-41] and better predictions, fuelled by more scientific knowledge and better forecasting techniques
  • Increased intelligence and rationality[^fn-42], allowing them to more accurately determine what the best action is based on their values.

As long as there is some overlap between their preferences and one’s reflected preferences, this gives an additional reason to defer to future agents’ decisions (example see footnote).[^fn-43]

Non-selfishness and coordination

We often know what’s right, but don’t follow through on it anyway. What is true for diets also applies here:

  • Future agents would need to actually make the decision about space colonization based on moral reasoning[^fn-44]. This might imply acting against strong economic incentives pushing towards space colonization.
  • Future agents need to be coordinated well enough to avoid space colonization. That might be a challenge in non-singleton futures, since future civilization would need ways to ensure that no single agent starts space colonization.

It seems likely that future agents will surpass our current level of empirical understanding, rationality, and coordination, and in a considerable fraction of possible futures they might also do better on values and non-selfishness. However, we should note that to actually refrain from colonizing space, they would have to surpass a certain threshold in all of these fields, which may be quite high. Thus, a little bit of progress doesn’t help: option value is only created by deferring the decision to future agents if they surpass this threshold.

Only the relatively good futures contain option value

For any future scenario to contain option value, the agents in that future need to surpass us in various ways, as outlined above. This has an implication that further diminishes the relevance of the option value argument. Future agents need to have relatively good values and be relatively non-selfish to decide, for moral reasons, not to colonize space. But even if these agents colonized space, they would probably do it in a relatively good manner. Most expected future disvalue plausibly comes from futures controlled by indifferent or malicious agents (like misaligned AI). Such “bad” agents will make worse decisions about whether or not to colonize space than we, currently, could, because their preferences are very different from our (reflected) preferences. Potential space colonization by indifferent or malicious agents thus generates large amounts of expected future disvalue, which cannot be alleviated by option value. Option value doesn’t help in the cases where it is most needed (see footnote for an explanatory example).[^fn-45]

Conclusion

If future agents are good enough, there is option value in deferring the decision whether to colonize space to them. In some not-too-small fraction of possible futures, agents will fulfill the criteria, and thus option value adds positively to the EV of reducing extinction risk. However, the futures accounting for most expected future disvalue are likely controlled by indifferent or malicious agents. Such “bad” agents would likely make worse decisions than we could. A large amount of expected future disvalue is thus not amenable to alleviation through option value. Overall, we think the option value in reducing the risk of human extinction is probably fairly moderate, but there is a lot of uncertainty, and the answer is contingent on one’s specific moral and empirical views[^fn-46]. Modelling the considerations of this section showed that if the 90% confidence interval for the value of the future ranged from -0.9 to 0.9 (arbitrary value units), the option value was 0.07.
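To show how a number of this kind can be computed, here is a minimal Monte Carlo sketch. The normal distribution, the assumed 30% probability that future agents are able and willing to refrain from a negative colonization, and all other parameters are our illustrative assumptions; this is not the exact model that produced the 0.07 figure:

```python
# Minimal sketch of an option-value calculation (illustrative assumptions only).
import random

random.seed(0)
N = 100_000

# Value of (post-)human space colonization: normal with a 90% CI of roughly
# [-0.9, 0.9], i.e. mean 0 and sigma = 0.9 / 1.645.
SIGMA = 0.9 / 1.645

# Assumed probability that future agents are good enough (values, understanding,
# coordination) to refrain from colonization when it would in fact be negative.
P_GOOD_DECIDERS = 0.3

ev_without_option = 0.0
ev_with_option = 0.0
for _ in range(N):
    v = random.gauss(0.0, SIGMA)
    ev_without_option += v
    # With the option: if the future is controlled by "good" agents and
    # colonization would be negative, they choose not to colonize (value 0).
    if v < 0 and random.random() < P_GOOD_DECIDERS:
        v = 0.0
    ev_with_option += v

option_value = (ev_with_option - ev_without_option) / N
print(f"Estimated option value: {option_value:.3f}")
```

The point is only to show the structure of such a calculation: option value is the difference in expected value between a future where the colonization decision can be revisited and one where it cannot.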