
June 4, 2017

In this 2017 talk, the NYU philosopher [Amanda Askell](http://www.amandaaskell.com/p/me.html) argues that we often underestimate the value of new information or knowledge when thinking of how to do good. This means that interventions which give us information, such as research, are often more valuable than it might naively seem.


The below transcript is lightly edited for readability.

So, I'm going to start the talk with some spoilers. Basically, I want nothing here to be a surprise. The first thing that I'm going to claim, and I hope you find it plausible, is that we generally prefer interventions with more evidential support, all else being equal. I'll go into detail about what that means. The second is, I'm going to argue that having less evidence in favor of a given intervention means that your credences about the effectiveness of that intervention are what I call "low resilience".

This is something that has been explored in decision theory to some extent. That's true even if your credences about the effectiveness of that intervention are the same value. So, if I thought there was a 50% chance that I would get $100, there's actually a difference between a low resilience 50% and a high resilience 50%.

I'm going to argue that, if your credences are low resilience, then the value of information in this domain is generally higher than it would be in a domain where your credences are high resilience. And, I'm going to argue that this means that actually in many cases, we should prefer interventions with less evidential support, all else being equal. Hopefully you’ll find that counterintuitive and interesting.

The first thing to say is that we generally think that expected value calculations are a pretty decent way of estimating the effectiveness of a given intervention. An example here is one where we imagine that there is a disease, very novelly and interestingly named Disease A, and another disease, equally interestingly named Disease B.

Basically, the idea is that these two diseases are virtually impossible to differentiate. They both have the same symptoms, they cause the same reduction in life expectancy, etc. The key difference is that they respond very differently to different treatments, so any doctor who finds themselves with a patient with one of these conditions is in a difficult situation.

Expected value and evidence

They can prescribe Drug A. Drug A costs $100. If the patient has Disease A, then Drug A will basically extend their life by another 10 years. If on the other hand they have Disease B, it won't extend their life at all. They will die of Disease B, because Disease B is completely non-responsive to Drug A. So the expected years of life that we get from Drug A is 0.05 per dollar. Drug B works in a very similar way, except it is used to treat Disease B: if the patient has Disease A, Disease A will be completely non-responsive to it. So, it's got the same expected value.

Then, we have Drug C. Drug C costs $100, but regardless of whether you have Disease A or Disease B, your disease will in fact respond to Drug C. So, this is a new and interesting drug. This means that the expected value for Drug C is greater than the expected value for either Drug A or Drug B. So we think, "Okay, great. Kind of obvious that you should prescribe Drug C."

Suppose that Drug A and Drug B have been heavily tested in numerous trials, they've been shown in meta-analyses to be highly effective, and the estimates I just gave you are extremely robust. Drug C, on the other hand, is completely new. It's only had a single trial, in which it increased patients' lives by many years. We assume that the trial included patients with both diseases.

So, you have a conservative prior about the effectiveness of a drug. You think, "In all likelihood, most drugs that we selected at random would be either net neutral or net negative," so that's your conservative prior. If you see one trial in which a drug massively extends someone's life, then your prior might bring you down to something like six years, regardless of whether they have Disease A or Disease B. Now the expected values are the same as before, but suddenly it seems a bit more questionable whether we should prescribe Drug C.
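To make those numbers concrete, here is a minimal sketch of the expected-value figures this example relies on. It assumes, as the setup suggests, a 50/50 chance that a given patient has Disease A or Disease B, and uses the prior-tempered 6-year estimate for Drug C:

```python
# A minimal sketch of the expected-value figures in this example, assuming
# a 50/50 chance that a given patient has Disease A or Disease B. The 6-year
# figure for Drug C is the prior-tempered estimate described above.

COST = 100  # dollars per course of any of the drugs

# Drug A: 10 extra years if the patient has Disease A, nothing for Disease B.
drug_a = (0.5 * 10 + 0.5 * 0) / COST   # 0.05 expected years of life per dollar
# Drug B mirrors Drug A, but for Disease B.
drug_b = (0.5 * 0 + 0.5 * 10) / COST   # 0.05
# Drug C: roughly 6 expected extra years whichever disease the patient has.
drug_c = 6 / COST                      # 0.06

print(drug_a, drug_b, drug_c)          # 0.05 0.05 0.06
```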

This idea that we should favor interventions with more evidence, and that expected utility theory can't capture this, is summed up in this blog post from GiveWell, I think from a couple of years ago.

"There seems to be nothing in explicit expected value that penalizes relative ignorance, or relatively pearly grounded estimates. If I can literally save a child I see drowning by ruining a $1,000 suit, but in the same moment that I make a wild guess that this $1,000 could save two lives if I put it toward medical research, then explicit expected value seems to indicate that I should opt for the latter."

The idea is that there's something wrong with expected value calculations because they kind of tell us to take wild guesses, as long as the expected value is higher. I want to argue that there are kind of two claims that we might want to vindicate in these sorts of cases. The first claim is one that I think I, and hopefully you, find quite plausible, and it's the claim that evidence matters. So, how much evidence we have about an intervention can make a difference to what we should do.

The second claim is one that I think is implied by the previous quote, which is that we should favor more evidence, all else being equal. So, if the expected value of two interventions is similar, we should generally favor investing in interventions that have more evidence supporting them.

In a case involving Drug A and Drug B and Drug C, maybe we would say something like, "These are relevantly similar." In a case where you have a lot of evidence that Drug A and Drug B have the effects that you saw, this might actually favor giving a more well-known drug over a new one that's only been shown to be effective in a single trial.

I'm basically going to consider both of these claims, and whether expected value calculations can vindicate either or both of them. As you kind of know from the spoilers, I'm going to argue that they can support the first claim but actually reject the second. Okay. So, I want to turn to this notion of resilience, and how we represent how much evidence you have, in terms of the credences you assign to propositions like "This drug will cure this disease."

Probabilities and resilience

Take the first case, which is this untested coin. I've given you no information about how biased this coin is. It could be completely biased in favor of heads, it could be completely biased in favor of tails, or it could be a completely fair coin. You have no information to distinguish between any of these hypotheses. So in a case where you have no idea what the bias of the coin is and I ask you, "What is the chance it lands heads on the next flip?", you're going to have to say, "It's about 50%," because you have no reason to favor a heads bias over a tails bias.

Now consider a different case, which is the well-tested coin. With the well-tested coin, you flip it and you get the following sequence, "Heads, heads, heads, tails, heads, heads, tails, tails," and so on, until the coin has been flipped a million times. You had a very, very boring series of days with this coin.

In the first case, in answer to the question, "What's the probability that the coin will land heads in the next flip?" you should say, "0.5 or 50%." In the second case, where you tested the coin a bunch and it's come up heads roughly 50% of the time, tails roughly 50% of the time, you should also say that the next flip is 50% likely to be heads.

The difference in these cases is reflected in the resilience levels of your credences. One kind of simple formulation of resilience (I think we can get a bit more specific with this, but for the purposes of this talk it doesn't matter too much) is that credo-resilience is how stable you expect your credences to be in response to new evidence. If my credences are high resilience, then there's more stability. I don't expect them to vary that much as new evidence comes in, even if the evidence is good and pertinent to the question. If they're low resilience, then they have low stability. I expect them to change quite a lot in response to new evidence. That's true in the case of the untested coin, where I just have no data about how biased it is, so the resilience of my credence of 50% is fairly low.

It's worth noting that resilience levels can reflect either the set of evidence that you have about a proposition, or your prior about the proposition. So suppose it's just incredibly plausible that coins are generally fair. If you saw me simply pick the coin up out of a stack of otherwise fair coins, then you would have evidence that it's fair. But if you simply live in a world that doesn't include a lot of very biased coins, then your prior might be doing a lot of the work that your evidence would otherwise do. These are the two things that generate credo-resilience.

In both cases, with the coin, your credence that the coin will land heads on the next flip is the same, it's 0.5. Your credence of 0.5 about the tested coin is resilient, because you've done a million trials of this coin. Whereas, your credence about the untested coin is quite fragile. It could easily move in response to new evidence, as we see here.

Probabilities and resilience 2

Take this third case. You start to test the untested coin, so you perform a series of flips with the coin, and you start to see a pattern. In a case like this, it looks like the coin in front of you is pretty heavily heads biased, or you at least start to quite rapidly increase your credence that it's heads biased. So, your credence that it's going to come up heads next time is much higher. Because you had less evidence before, this credence was much more fragile, so now you've seen a change.

This would not happen if you got this sequence with the well-tested coin, because more evidence means that your credences are more resilient. If you saw a series of five heads after performing a million trials in which the coin landed heads roughly half the time, this is just not going to make a huge difference to what you expect the next coin flip to be.
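To see what this looks like numerically, here is a minimal sketch of the two coin cases. It assumes, purely for illustration (the talk doesn't commit to this model), that we represent uncertainty about the coin's heads-bias with a Beta distribution and update it by counting heads and tails:

```python
# Minimal sketch: credal resilience for the untested vs. well-tested coin,
# assuming a Beta(1 + heads, 1 + tails) model of the coin's heads-bias.

def prob_next_heads(heads, tails):
    """Posterior predictive probability of heads under Beta(1 + heads, 1 + tails)."""
    return (1 + heads) / (2 + heads + tails)

untested = (0, 0)                     # no flips observed yet
well_tested = (500_000, 500_000)      # a million flips, roughly half heads

for name, (h, t) in [("untested", untested), ("well-tested", well_tested)]:
    before = prob_next_heads(h, t)
    after = prob_next_heads(h + 5, t)  # now observe five heads in a row
    print(f"{name}: P(heads) = {before:.3f} before, {after:.3f} after five heads")

# untested:    0.500 before, 0.857 after five heads  (low resilience)
# well-tested: 0.500 before, 0.500 after five heads  (high resilience)
```

Both credences start at 0.5, but five new heads move the untested coin's credence a long way while barely touching the well-tested coin's; that gap is the difference in resilience.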

I think credo-resilience has some interesting effects. A lot of people seem to be kind of unwilling to assert probability estimates about whether something is going to work or not. I think a really good explanation for this is that, in cases where we don't have a lot of evidence, our credences about how good our credences are, are fairly low.

We basically think it's really likely that we're going to move around a lot in response to new evidence. We're just not willing to assert a credence that we think is just going to be false, or inaccurate once we gain a little bit more evidence. Sometimes people think you have mushy credences, that you don't actually have precise probabilities that you can assign to claims like, "This intervention is effective to Degree N." I actually think resilience might be a good way of explaining that away, to say, "No. You can have really precise estimates. You just aren't willing to assert them."

One thing that this has a huge influence on, and is kind of the theme of this talk, is the value of information. To return to our drug case: the idea, as I hope you'll see, is that this is supposed to be somewhat analogous to interventions. However, I don't want to put any real interventions there, because I don't want to make people think that I think their interventions don't have enough evidence behind them.

Value of information

In the original case, we had the following kind of scenario, where we had expected values of 0.05, 0.05 and 0.06 years of life per dollar for the three drugs. Of course, one thing that we can do here is gain valuable evidence about the world. Consider this case, where diagnosis is invented, at least as far as Disease A and Disease B are concerned. So, we can now diagnose whether you have Disease A or Disease B, and it costs an additional $60 to do so. Given this, if I diagnose you, then conditional on diagnosis, if you have Disease A, you will live for another 10 years, because I will then be able to pay an additional $100 to give you Drug A. If you have Disease B, I'll be able to pay an additional $100 to give you Drug B.

So in this case, the value of diagnosis, including the cost of then curing you of the disease, is actually higher than the value of any of the original interventions. Rather than giving you Drug A, Drug B, or Drug C, I should diagnose you and give you the correct drug. Hopefully this is intuitive.
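Here is a minimal sketch of that comparison, using the example's own figures ($100 per drug, $60 for the diagnosis) and again assuming a 50/50 chance of each disease and a perfectly accurate test:

```python
# Minimal sketch of the diagnosis comparison, using the example's figures:
# $100 per drug, $60 for a (perfectly accurate) diagnosis, and a 50/50
# chance that the patient has Disease A or Disease B.

DRUG_COST = 100
DIAGNOSIS_COST = 60

drug_a_or_b = (0.5 * 10 + 0.5 * 0) / DRUG_COST           # 0.05 years per dollar
drug_c = 6 / DRUG_COST                                    # 0.06 years per dollar

# Diagnose first ($60), then spend $100 on whichever drug matches the
# disease, gaining the full 10 years either way.
diagnose_then_treat = 10 / (DIAGNOSIS_COST + DRUG_COST)   # 0.0625 years per dollar

print(drug_a_or_b, drug_c, diagnose_then_treat)
# 0.05 0.06 0.0625 -> diagnosing and then treating beats any drug on its own
```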

Okay. That was information about the world, which maybe we think is valuable anyway. Suppose I care about global poverty and I want to find good interventions. I can find out about deficiencies that exist in India, for example, and then I can see if there are good ways to improve that. So, that's finding out about the world.

Obviously a different way we can gain valuable information here is by finding out about interventions themselves. An example would be to look at the actual intervention of Drug C, and at how effective it is.

Value of information 2

Suppose that the cost of a trial of Drug C that would basically bring you to certainty about its effectiveness is, in this wonderfully ideal world, $5,000. Somehow, you know that it can only be very low impact or very high impact. You have a credence of about 0.5 that it's going to turn out that, for both Disease A and Disease B, it only actually extends life by two years. Let that be the kind of skeptical prior. But you also have a credence of about 0.5 that it will extend life by 10 years in both cases. Let's assume diagnosis has gone out the window.

Okay. So, you're currently prescribing Drug C. You're ignoring the fact that there's not much evidence here. You obviously don't exist in any modern medical system. You're going with the expected value as is, basically. Then the question is, "What is the value of doing this trial, especially given that you're already prescribing Drug C?" If your credence in the low-impact hypothesis goes to one, i.e., you suddenly discover that this drug is much less effective than you thought it would be, then you're going to switch from Drug C to prescribing Drug A or B again.

Value of information 3

The per patient benefit is going to go from two years of expected life to five years of expected life in this case. Whereas, if you perform no trial, you won't spend anything, but you'll only get two years of additional life per $100, if it is in fact a low impact drug. You'll be prescribing Drug C continuously every time you see Disease A and Disease B, and it'll only give people an additional two years of life. Whereas if it's high impact, then you'll still get the ten year benefit because you'll have been accidentally prescribing something that's very good.
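That reasoning can be made explicit with a minimal sketch, using the 50/50 low-impact/high-impact credence from above; the figures are expected years of life per $100 treatment:

```python
# Minimal sketch of where the 1.5-years figure below comes from, assuming
# the 50/50 low-impact/high-impact credence about Drug C described above.

P_LOW_IMPACT = 0.5

# Expected years per $100 treatment if we keep prescribing Drug C blindly:
# low impact -> 2 years, high impact -> 10 years.
no_trial = P_LOW_IMPACT * 2 + (1 - P_LOW_IMPACT) * 10        # 6.0

# Expected years per $100 treatment once the trial settles the question:
# low impact  -> switch back to Drug A/B, worth 5 expected years
#                (a 50% chance of matching the right disease for 10 years),
# high impact -> keep Drug C and get the full 10 years.
with_trial = P_LOW_IMPACT * 5 + (1 - P_LOW_IMPACT) * 10      # 7.5

print(with_trial - no_trial)   # 1.5 expected years gained per future treatment
```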

So, the trial adds 1.5 years of expected life per future treatment. The trial is therefore better than just prescribing Drug C if there are more than 2,000 patients. So, if there are more than 2,000 patients, the value of investing in a trial of Drug C is higher than the value of giving any of the drugs currently available. That means that the information value actually swamps the direct value of the intervention here.

The value of investing in testing Drug A or B is going to be negligible, because our credences about their effectiveness are already resilient. So, this all builds up to what I think is the result of this talk, which is a really counterintuitive one: expected utility theory, or expected value calculations, might actually say that in cases where all else is equal, we should favor investing in interventions that have less evidence, rather than interventions that have more.

That means that if the expected concrete value of two interventions is similar, we should generally favor investing in interventions that have less evidence supporting them. I'm going to use "concrete value" to just mean non-informational value. The idea here is that in such cases, the concrete value is the same, but the information value for one of them is much higher, namely the one where a much lower-resilience credence is generating your expected value calculation.
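To see why the lower-resilience option carries the higher information value, here is a minimal sketch with made-up numbers (they're purely illustrative and not from the talk): two interventions with the same expected concrete value, one well evidenced and one not, and a study that settles the uncertain one:

```python
# Purely illustrative numbers: two interventions with the same expected
# concrete value (5 units of good per grant), but different resilience.

# Intervention X: heavily studied, value known to be 5 (high resilience).
value_x = 5.0

# Intervention Y: little evidence; 50/50 between being worth 2 or 8
# (low resilience), so its expected value is also 5.
p_low, low, high = 0.5, 2.0, 8.0
expected_y = p_low * low + (1 - p_low) * high            # 5.0

# Acting now, you're indifferent: both options are worth 5 in expectation.
act_now = max(value_x, expected_y)                        # 5.0

# If a study first reveals Y's true value, you pick the better option in
# each scenario: Y if it turns out to be worth 8, X if Y is only worth 2.
act_after_study = (p_low * max(value_x, low)
                   + (1 - p_low) * max(value_x, high))    # 6.5

print(act_after_study - act_now)   # 1.5 -> the information value of studying Y
# Studying X adds nothing here: its value is already known.
```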

So, this gets us to the "What does this mean, and what should we do?" part. Hopefully I've convinced you that, despite the intuitive appeal of the claim that we should favor things with more evidence, there's actually some argument that we should favor things that have less evidence.

When we're considering information value, there are basically three options available to us. I used to call this "Look, leap and retreat," and then I discovered that I really like things that sound the same, so I went for "Explore, exploit or evade."

Explore, exploit, evade

So we can choose to explore, and this means investing resources in interventions primarily for their information value. So things like research, funding to gather data, career trials. We can exploit, which means investing resources in an intervention for its concrete value. That means, things like large project grants and entire career choices. Or we can evade. We can decide just not to invest in a given intervention. We either invest elsewhere, or we completely delay investment.

The main difference between these is the reason for action. Take three people. Amy donates $100 to an existential risk charity to protect the future of humanity, so she's just exploiting the value. She's just looking at the direct concrete value of this intervention.

Bella donates $100 to the same charity to find out how much good they can do in the world. So, she's doing it mainly to explore, and then later she'll exploit. She'll think to herself "Okay. Let's see how valuable this is." If it's very valuable, then she'll basically mine the value of it.

Carla donates $100 to Charity C, so that we have more time to discover what the best causes are. So she is exploiting currently, by investing in it for its direct value for reducing existential risk, so that we can find out what the best cause is, so that we can exploit that. So, she's exploiting to explore to exploit.

So, when is exploring especially cost effective? Essentially, when there are three features. When there's more uncertainty about the direct value of an intervention, so this means options that have high expected value, but low resilience. When there are high benefits of certainty about the direct value, so when we can basically repeatedly mine something for value. And when there are low information costs, so when information's not too costly to obtain and the delay is low cost (you don't really want to be looking for information when cars are driving towards you, as the cost of not just taking action and getting out of the way is pretty high!).

The question I have is basically "Is gaining information especially valuable for effective altruists?" Maybe a different way to put this is "Is information essentially its own cause area, within effective altruism?"

There's a lot of uncertainty within and across good cause areas, especially if we consider long-term indirect effects. We don't know about the long-term indirect effects of a lot of our interventions. So, I think there's a lot of uncertainty here. We see that in terms of the progress that EA Research has made. I think this is evidence of a lot of uncertainty.

There are also high benefits of certainty in this case. We expect to use this information in the long term. Effective altruism isn't like a short-term intervention. So, in multi-generational projects, you expect the value of information to be higher, because people can essentially explore for longer and find optimal interventions.

To some degree, there are low information costs, insofar as the movement is young and there's still a lot of low-hanging fruit. This is with caveats: maybe you're a bit like Carla. Maybe you're very worried that we're just screwing up the climate or that nuclear war is going to go terribly wrong, in which case, maybe you think we should just be directly intervening in those areas.

So, what difference would exploring more make to effective altruism, if you accept my argument that this is an important cause area? Well, I think we could probably invest a lot more time and resources in interventions that are plausibly good, in order to get more evidence about them. I think we should probably do more research, but I realize that this point is kind of self-serving. I think that larger donors should probably diversify their giving more, if the value of information diminishes steeply enough, which I think might be the case.

Psychologically, I think we should be a bit more resilient to failure and change. I think when people consider the idea that they might be giving to cause areas that could turn out to just be completely fruitless, they find it psychologically difficult. In some ways just thinking "Look, I'm just exploring this to get the information about how good it is, and if it's bad, I'll just change. Or, if it doesn't do as well as I thought, I'll just change." I actually find this quite psychologically comforting, if you worry about these things.

The extreme view that you could have is "We should just start investing time and money in interventions with high expected value, but little or no evidential support." A more modest proposal, which is the one that I'm going to kind of endorse, is "We should probably start explicitly including the value of information in assessments of causes and interventions, rather than treating it as an afterthought to concrete value." With some of the things that I've looked at, I really think information value can swamp concrete value. If that's the case, it really shouldn't be an afterthought. It should be one of the primary drivers of value, not an afterthought in your calculation summary.

In summary, evidence does make a difference to expected value calculations via the value of information. If the expected concrete value of two interventions is the same, this will favor testing out the intervention with less evidential support, rather than the one with more. And taking the value of information seriously would change what effective altruists invest their resources, i.e., their time and money, in.

Question: One person did have a question about what it means to have a credence in a credence. Maybe an 80% chance that there's a 50% chance of it working, etc., etc. Does it recurse all the way down, was the person's question.

Amanda Askell: It's not that you have a credence that you have a credence, but rather your credence in your credence staying the same or changing in response to new evidence. There are a lot of related concepts here. There are things like your credence about the accuracy of your credence. So, it's not "I have a credence that I have a credence of 0.8." This is a separate thing: my credence that, in response to this trial, I will adjust my credence from 0.5 to either 0.7 or 0.2 is the kind of credence that I'm talking about.

Question: Do you think there's a way to avoid falling into the rabbit hole, of the nesting credences of the kind that the person might have been referring to?

Amanda Askell: I guess my view, in the boring philosophical jargon, is that credences are dispositional. So, I do think that you probably have credences over infinitely many propositions. I mean, if I actually ask you about the proposition, you'll give me an answer. So, this is a really boring kind of answer, which is to say "No, the rabbit hole totally exists and I just try and get away from it by giving you a weird non-psychological account of credences."

Question: I'll take the side step. I'm not sure I'm parsing the question correctly but, I'll give it a go. They say, "Is information about the resilience captured by a full description of your current credences across the hypothesis space? If not, is there a parsimonious way to convey the extra information about resilience?"

Amanda Askell: Okay. I'm trying to think about the best way of parsing that. So, take your credences across the hypotheses in question. Let's imagine that I'm just asking for your credence that the intervention has value N, for each N I'm considering. That will not capture the resilience of your credence, because resilience is about how you think those credences are going to adjust in response to new evidence. If you include how things are going to adjust in response to new evidence in your hypothesis space, then yes, that should cover resilience. So yeah, it just depends on how you're carving up the space.