July 23, 2020
This session covers the strengths and limitations of using data and evidence to allocate resources across priorities within the field of global health and development. Ruth Levine, the CEO of IDinsight, and Caitlin Tulloch, Associate Director for Best Use of Resources at the International Rescue Committee, highlight cases where rigorous evidence contributed to high-impact decisions, yielding large benefits. They also address key critiques of the use of evidence as it has been promoted over the past 15 years, and suggest ways in which the evidence-informed policymaking agenda could evolve.
Nathan Labenz (Moderator): Hello, and welcome to this session: “The good, bad and ugly of evidence-informed decision making in global health and development,” with Ruth Levine and Caitlin Tulloch. Following a 25-minute exchange between Ruth and Caitlin, we'll move on to a live Q&A session in which they will respond to [audience members’] questions. […]
Now, I would like to introduce our speakers for this session. Ruth Levine is the CEO of IDinsight, a global advisory, data analytics, and research organization. She is a development economist and expert in international development, global health, and education. Ruth was previously a policy fellow at Stanford University. And from 2011 to 2019, she served as Director of the Global Development and Population Program at the William and Flora Hewlett Foundation. Before that, Dr. Levine led the development of USAID's evaluation policy, and spent nearly a decade at the Center for Global Development in Washington, DC.
Caitlin Tulloch is the Associate Director for Best Use of Resources at the International Rescue Committee, where she leads the team assessing the cost effectiveness of humanitarian programs. Since 2015, this team has conducted 10 cost-effectiveness analyses and more than 100 cost-efficiency studies, as well as creating new software to enable rapid, field-based analysis. Prior to this, Caitlin worked at the World Bank in Indonesia, and spent four years at the Poverty Action Lab, where she managed their portfolio of cost-effectiveness analyses.
Here are Ruth and Caitlin.
Ruth: Hi, I'm Ruth Levine. I'm the CEO of IDinsight, which is an organization that works in many parts of the world to provide decision makers in government, nonprofits, and funding agencies with analytic support so that they can make the best decisions. The kinds of work we do range from rigorous, multi-year impact evaluations, to much shorter engagements for a one-time data analysis, to setting up monitoring systems for implementation. We have a lot of experience with a wide range of decisions, mostly in health, education, and social protection, as well as agriculture.
Before that, I worked at the Hewlett Foundation for eight years as the Director of the Global Development and Population Program; before that, at USAID, leading evaluation policy development; and before that, at the Center for Global Development.
I've been around this evidence-informed policy space for quite some time, and what I’ve found over the years in observing its challenges (not only to do high quality work, but to have it used) is that there is tremendous potential in the evaluation agenda. There are also many challenges ahead of us. Caitlin and I will be talking about that today.
One of the things that I have focused on a great deal in recent years is the intersection between the evidence agenda and our larger ambitions around social change. In particular, I've spent a fair amount of time thinking about and trying to work on important issues around representation, voice, government accountability, and responsiveness to citizens — and how that can be reflected in the way we do our work collecting and analyzing data, and undertaking evaluations.
One of the only ways that decision makers in government and in NGO headquarters actually hear from the people who are affected by their decisions is through the data that we collect and [communicate] — on public opinions, the conditions of people's lives, and how public and private programs affect them. That work is hugely important. It brings the experiences of people who otherwise would not be heard from into public policymaking. I see the evaluation and monitoring agenda as [critical to] amplifying the voices of people whose lives and livelihoods we're seeking to improve.
Caitlin, what about you?
Caitlin: Thank you. That is a hard CV to follow, but my name is Caitlin Tulloch. I am the Associate Director for Best Use of Resources, which is a fancy way of saying cost effectiveness, at the International Rescue Committee. The IRC is a humanitarian organization working in 30 to 35 countries at any given time to provide a wide range of services — health, education, civilian protection — to people affected by crisis or conflict. Prior to my work at the IRC, I worked with the public financial management team in World Bank Indonesia on public budgeting.
I led the policy strategy for the scale-up of an evidence-based — and hopefully evidence-generating — education policy in the Dominican Republic. And I worked at the Poverty Action Lab at MIT for a number of years leading their portfolio of cost-effectiveness analysis.
There's definitely a trend [over my career], which is: How do you spend money on things that you think are going to have the most impact? With that in mind, I'm really excited to share with the audience today some examples, stories, and challenges from [the IRC’s last five years].
It has been an interesting experience, because a lot of my previous work was as an external evaluator, seeking to influence [and assist] decision makers in Bappenas, the Indonesian government's planning ministry, or MINERD, the Ministry of Education of the Dominican Republic. Being in-house at the IRC has really let me see how decisions happen, in a way that I think speaks to the need for data at multiple points along that decision-making process. That's been really exciting for me, [as has seeing] the constraints that Ruth referenced — all of the challenges and considerations that people face and weigh in making decisions.
But also, as we've tried to build structures and processes — and, more than anything, relationships — I’ve seen how we need to feed the right data to the right people at the right time. I’m pleased to get to share some of that with the audience today.
Ruth, I'd love to start by [asking you to describe] what you see as the main changes in the use of evidence in global health and development in the last five years.
Ruth: It's been a very exciting past five, eight, ten years. I'm not 100% sure where the demarcation is, but I think a few of the changes that I've seen — and I'd be curious to hear your thoughts on this, too — are that we've grown from focusing on the implications of a single study to [focusing on] what the body of evidence tells us. As more and more high-quality evaluations have been done, we've been able to ask, “Well, does an intervention that works in Bangladesh also work in Malawi? What about in Zambia?” And we've been able to see whether something that works at a small scale in Kenya also works on the national level in Kenya.
I think that perhaps we've become more sophisticated in not drawing too many inferences or making too many policy decisions based on a single study. Instead, we focus on: What does the aggregate evidence tell us? That's one big change.
I would say another is an important advance in transparency: a resetting of the norm around making your data available for re-analysis and being transparent about your methods and who's funding your work. I think all of those have contributed to both higher quality and greater perceived integrity of the work.
Also, I’ve seen a great trend toward more research productivity and capacity in not only Latin America and South Asia, but also many countries in Sub-Saharan Africa. There's a generation of new research from the people who live in, work in, were raised in, and understand the [places where] that research is being done. That is a tremendously important positive development.
The last thing that I'll mention is an absolute explosion in the use of non-traditional data sources. What do I mean by that? I mean using satellite imagery and other forms of remote imagery to look at geospatial information, using cell-phone data to look at patterns of mobility, using remote sensors to look at climate and environmental changes. I think there’s a lot of excitement and learning around using what's often referred to as “big data” to answer some questions, or at least generate some hypotheses, that might give us insights that we don't get from the smaller, more intense evaluations that we were exclusively focused on maybe 10 or 15 years ago. So those are some of the changes that I'm pretty excited about.
I'd love to hear your thoughts on the same question. Also, Caitlin, it would be great to hear what you think the big wins have been in the use of evidence in global health and development — the ones that you've seen and that inspire you.
Caitlin: Definitely. I'll start by saying that I think I agree with everything you said there, especially your first comment about the move away from weighting so much certainty onto individual studies, and toward looking at the landscape of research from many angles, with many methods, and thinking about what that means for a specific decision, [made by] a specific decision maker, in a specific place.
I think that if you'd asked me five years ago about the big win that we talk about when we think about global health and development, the one — at least for evaluation nerds — that would come up is PROGRESA. This landmark was a conditional cash transfer program; cash transfers were conditional on children attending school or participating in regular health visits in their formative years. It was the [central program in the] social safety net of the Mexican government, starting in about 1997 or 1998.
It was evaluated very rigorously by IFPRI [the International Food Policy Research Institute]. There was a huge cost-benefit analysis — I want to say it was about 110 pages — that I still go back to sometimes. It was a big program, a big evaluation, and had a proven impact and benefit-to-cost ratio, and it has continued for many years since then.
But thinking in line with what you said about a shift toward understanding mechanisms and using multiple sources of data, big wins embody the use of data in the formation and conception of new ways of reaching people and helping them meet their needs. Big wins [also entail] using that data on an ongoing basis.
Staying [on the topic of wins related to cash transfers], GiveDirectly [comes to mind]. There is now a very good body of impact evaluation evidence showing that, [perhaps unsurprisingly], giving people cash helps sustain their consumption and reduce negative coping mechanisms. But GiveDirectly has continued doing studies [on other variations of their program], asking, “How can we deliver this more efficiently? How can we reach these kinds of populations?” I think they’ve uncovered why cash works beyond the very basic [idea] that if someone has cash, they can buy things. But what does it help people do? Is it meeting credit constraints? Is it supporting consumption? Or is it preventing people from going into debt?
When you know those things, it becomes much easier to think about whether [an intervention] is the right tool in another environment. Although we at the IRC work in places very different from Western Kenya, which has a wonderful mobile money network, we use a lot of that evidence to inform how we think about giving cash in emergencies.
More broadly, I think there's a way of using data that the GiveDirectly folks embody, and that I think you're seeing in different places. For example, DIV [Development Innovation Ventures], a branch of USAID, gives multistage funding for small pilots of promising ideas that entail collecting a lot of data, and it [applies basic measures]: Is [the intervention] being delivered properly? Are people using what you give? Moving on to [the next level of] evaluation, is it showing impact in a pilot setting? And do we think that can be sustained? Before eventually getting to a big evaluation, [organizations are] learning something at all of these different phases.
If we're allowed to toot our own horns a bit, one of the IRC’s biggest programs right now is called Ahlan Simsim. It's a partnership with Sesame Workshop to bring early childhood development materials to children displaced by the Syrian conflict. The concept was based on the dense literature on highly cost-effective early childhood interventions in Jamaica, in the United States, and in parts of Latin America.
But it's not simply a case of taking one study — for example, the one in Jamaica, Reach Up and Learn — and plopping it down in a refugee camp in Lebanon. The first two years of Ahlan Simsim have [entailed] local, small-scale piloting, gathering data on what's working and what isn't, what people are or aren't responding to, and how home visiting works in a refugee-camp setting as compared to a formal settlement. From there, we’ve built it into a program that is going to be scalable, and used data as an element in our partnerships with the government. And we're now moving into the final phase, where we expect to do bigger evaluations of this product that we've developed. The ability to use data at all points in [a program’s] cycle strengthens the final product.
The other thing I'll mention, which [we may cover] more later, is [the power of determining] what the challenges of evidence-based policy are — challenges that effective altruism often faces, such as buy-in from, and developing data iteratively with, partners who are going to be the ones stuck with the bill of implementing the program at the end.
It has been, I would say, a very exciting five or eight years. [I’ve just painted a rosy picture — but what do you see as the main critiques of the evidence agenda?]
Ruth: Those are really good questions. There's really no shortage of critiques. Many of them are very pointed about the evidence agenda. The ones that come to mind when you ask that question are, just as you say: Does a randomized controlled trial done in one place yield information that can be applied to other places? Is it worth the typically considerable amount of money and time?
I think other questions have to do with asking funders to use the results from randomized controlled trials as the source of information for their funding decisions. Does that drive money to [programs] that can be measured using that method? That is certainly a subset of all important things in the world, and not the universe of them.
Those are a few critiques that come up around external validity and [the question of whether] we’re driving money toward intervention-level priorities that can be measured using RCTs. There are reams and reams that have been written on this.
There are also reams and reams that have been written on the extent to which RCTs reinforce an image of decision making that is not very realistic — that it's removed from politics and the political economy of particular countries or districts. I think those critiques are interesting. They're valid, and are pushing the field ahead in important ways.
But I think the more important way to think about what's right or wrong in the field is to ask, “To what extent are we actually able to solve problems for decision makers — to answer questions for people who are faced with making resource allocation decisions, or targeting decisions?” [Those people are trying to determine whether] to invest in health or education, or how to scale up a secondary school program. Are we providing the decision-making support to help with those sorts of consequential [choices], such that [the decisions] have better impacts for real people living real lives?
I think when you ask that question — “Are we supporting decision makers?” — the answer is to engage with them. We must understand the time, budget, and political constraints within which they're operating [in order to determine] how a whole set of methods can be brought to bear to help inform those decisions. Some of those methods are going to be RCTs. There are definitely times when the right way to figure out the net impact of a program is through an RCT — when there are [adequate] time and resources, and it's an important enough question to dedicate that time and those resources to answering it.
There are many other times when what decision makers need is basic descriptive information about the current conditions in which the population is living. There are times when what the decision makers need is information that will help design the implementation plan for a large-scale program. How can it be rolled out? Where should we [implement the program] first? How can we get ongoing feedback about whether that implementation is going well or not?
My main point is that the more we can leave arguments about the finer points of which method is better or worse aside, in favor of looking at real-world decision makers and the kinds of support they need to make the best decisions, the better off we're going to be. Then, the critiques can [follow], and [we can consider whether] we’re doing a good job of supporting decision makers. And if not, we can ask, “Are we doing a good job of generating robust evidence?”
I'd love to hear your thoughts about that. You face these sorts of tradeoffs and questions every day, and you have decision makers in the IRC who are counting on you. How do you think about the tradeoffs and the critiques that you've heard? And what do you see as the agenda for the organization’s work and [approach to] data and evidence going forward?
Caitlin: In many ways, it is what you've described: the shift from a supply-side view of evaluation and data to a demand-led view. What do people want to know to make the particular decisions that they’re faced with? And there are often about seven people who need to be involved for [the process] to move in the right direction.
Our priority is often to get those people all working together and understanding [the context around their decisions]. In some ways, [we’re trying to] simply do a better job of communicating the idea that impact evaluations — randomized controlled trials — are the least biased method of answering one very particular type of question, not all of the [relevant] questions, and not a lot of the really important questions.
But I would agree with you that sometimes the focus on litigating that has obscured one of the most valuable things, I think, that comes out of many RCTs: the baseline study, which is a fabulous set of descriptive statistics about what's going on and who's getting services.
My favorite example from within the IRC is this: We have been working for a number of years on an alternative protocol for treating severe acute and moderate acute malnutrition in children (“moderate acute” means they are still quite malnourished, though less severely so). Currently, [these two conditions] are treated separately; we've been exploring the impact, as well as the feasibility [in terms of] the cost and the management structure, of combining them into one set of treatments at one set of facilities. We think it could have a lot of benefits.
But implementing this [approach], which we think is evidence-based and cost-effective, raises a huge number of questions. We're trying to organize ourselves to answer those in a holistic fashion. There has been an impact evaluation — an RCT at one site in Kenya and one site in South Sudan — [addressing the question] “Is this more effective at reducing acute malnutrition than the current standard?”
One of the really interesting things to come out of some of the cost-effectiveness analysis is that the coverage rate — the share of local children who get treatment — is, I would say, the biggest determinant of cost effectiveness. That raises several questions way beyond “What intervention are you doing?” You may be [implementing the] combined protocol in Nigeria, but if only 30% of the kids show up, your coverage is too low to be truly cost-effective. Then we enter the realm of performance data — what we've called “monitoring for action” — where you're monitoring coverage on a periodic basis.
When the coverage falls below some critical point, it's bad for cost-effectiveness. It also means you're not reaching the children you're supposed to be reaching at a more fundamental level. That [information] can be used to redeploy resources to get kids into treatment, or to consider whether an alternative strategy is needed. That's how smaller-scale piloting and tinkering [works]; you continue to gather data and reflect. There's not a counterfactual or a control group, and you're not going to publish the results in QJE [the Quarterly Journal of Economics], but [the approach] still works, I think, together with a broader understanding of what we’re trying to do: What's the theory of change behind it? Where might we be falling down? And what else can we do to be serving the people whom we're trying to serve?
I think viewing it from the supply side has kept us focused on what we know how to evaluate and what we like to do, but the demand side [helps us] see how this looks from the perspective of the Ministry of Health in Mali (which has to make policy recommendations), versus the IRC, which is a partner in certain countries but not others, versus funders who are thinking about where to spend their money. We really need a lot of data to answer that set of questions, and it doesn't need to be an ideological fistfight. Ideally, we’re all pulling in the same direction.
Ruth: That is a great way to end. And I think we will both look forward to a lively conversation.
Nathan: Thank you, Ruth and Caitlin, for that talk. We have had a number of questions submitted already. Let's begin our Q&A with the first one: Could you comment on the randomista economic growth discussion that's been happening on the EA Forum, and that also was touched upon in the EAGx talk by Hauke Hillebrandt?
Caitlin: Ruth, do you want to kick us off with this one?
Ruth: Sure. I was going to invite you to do the same. I'm sure [Caitlin] can pick up and add to this. Unfortunately, I haven't been tuned in to the EA Forum, so I'm going to guess that the debate is between [people] looking at micro-level evidence through randomized controlled trials and [those] looking at larger macro phenomena through the lens of what is resulting in stalled economic growth.
I would say that it's a false choice. I'm not trying to [throw a] punch here, but basically, the policies related to countries’ prospects for economic growth — what can lift their populations out of poverty through expansion of the economy, through supporting some industries versus others, through educating their populations to a certain level via various kinds of social investments — are all hugely important, and so is how those policies are implemented. We have the opportunity to use evidence from randomized controlled trials and many, many other sources to help governments [determine] — and learn as they go — how to design and implement the programs that are part of an economic growth strategy.
I think it's quite unfortunate that there has been a debate fostered when the genuine answer is that you need both the macro policies and many of the insights that come from more micro-level data collection and analysis to figure out how to implement some of the most important government programs in health, education, and social protection that contribute overall to economic growth.
Caitlin, over to you.
Caitlin: The most crystallized thought I have about this is entirely borrowed from Rachel Glennerster. We were talking about this issue one day and she said, “Yes, economic growth is important. Yes, the micro-level [work] we can learn from is important. But why do we care about economic growth? It is because we think it helps improve people's standards of living. These are not either/or; they're different and probably complementary pathways to the same thing.”
I think that speaks to your point, Ruth, that it's about [determining what] the right tool is to address the relevant constraint in a particular country.
I do think it's exciting that RCTs have begun to move more into the realm of industrialization and [larger] organizations to get at some market-level questions. But in the end, I think they all have the same goal. Therefore, they should be seen as complementary.
Nathan: Thank you both. Moving on to the next question, how concerned are you about running into the McNamara fallacy — that is, ignoring potentially highly valuable interventions because they're not easy to quantify?
Caitlin: I'm happy to start. Ruth, I'm really curious to hear your thoughts as well.
In my experience, there are two elements of this problem. One is that the intervention itself is difficult to measure, quantify, or describe. And the other is that the outcome is hard to describe. Only more recently has significant work been done on violence against women and girls. That work has come as we've had more validated measures of such violence that work across a lot of contexts. It's hard to measure the impact of something when you can't measure the outcome.
That means a lot of the early micro-empirical work was focused on health and education outcomes that we know how to measure. Over time, our ability to look at interventions that aim at less tangible — but very important — outcomes has increased as our ability to measure those outcomes has improved.
Therefore, I think that we need to be cautious. Right now, the bulk of our evidence is disproportionately concentrated in areas with outcomes that we know how to measure. […] But I'm optimistic that [our approach is] getting better as people take their measurement toolbox into areas outside of just health and education.
I think there's a second question embedded [in this issue], which is that sometimes the interventions themselves are hard to quantify and measure — for example, building social movements. I am more conflicted about that, not only because the outcome is difficult to measure, but the thing itself is diffuse. It's important to notice that if the intervention itself is diffuse, it's hard to know what to do with the evidence you generate. How would you go and replicate a thing when you don't know what the thing is?
I think there is some room for missing these kinds of things if we focus too much on what can be put in a logframe and measured in that way. But I think there are enough really passionate people out there that they're not going to stop [an intervention] because there's not an RCT about it. So that is a cause of some concern for me, but I don't flatter myself or my community [by thinking that a] lack of evaluation attention is going to stop social movements from happening.
Ruth: I totally agree with Caitlin. Let me just add a few more thoughts.
I think that one of the key elements of better practice — and in the end, better results — from all of our work is that we have not done a good job at all of engaging with and learning from people who are experiencing the failure of the systems that we are trying to work on and [using that to] inform the research and evaluation work that we do.
Engaging with and listening to people who are experiencing those system failures is important for a million reasons. But two are particularly relevant to this question. First, they are the ones who can help define what's important to them and what's not — what is affecting their lives. That's separate from whether it is measurable or not. The question is: Is it important to them?
Second, as Caitlin said, we're getting better at measuring less tangible kinds of outcomes. And for that kind of measurement, clearly, the engagement of people who have experience with those outcomes is absolutely essential, and should be at the center of that measurement agenda. So I think that we have a long way to go, but the questions are not “Can we measure it or not? And are we missing something if we can't measure it?” From my perspective, it's about who's doing the measuring.
Nathan: Thank you. Next question: What's the bigger obstacle to progress in your opinion — the ignorance or inability of policymakers to implement evidence-based approaches or the unwillingness to do so due to electoral or other concerns on their part?
Ruth: Maybe I'll start this one, and Caitlin, feel free to take over.
It depends on the setting, obviously. There's no sweeping generalization that can be made. But regarding the point of the question, which [centers on] why elected officials do not have the will to implement effective policies that can improve the health and well-being of their citizens, I think a key aspect is certainly that there are broken relationships of accountability between citizens and their governments in many, many countries. I would not exclude our own [the US] from that. Therefore, there's a real question about how to strengthen that relationship of accountability.
Actually, there's a lot of interesting evaluation work that's been done looking at different kinds of interventions (typically quite separate from the ballot box) and how collective action can make demands on government so that government is responsive. I think that there are ways to strengthen civil society so that there's greater ability to collectively call on government [to act].
Caitlin: I would agree that a lot of this is about misalignment in incentives. The one thing I would add is that viewing it as a pathological [condition] misses a lot about how governments actually work. You can't have everyone in an entire organization focus solely on one outcome. That's not how the private sector works, and that's certainly not how the public sector works. If we wait for everyone to be focused only on health outcomes or the SDGs [sustainable development goals], I think we're missing opportunities where alignment exists.
There are places where I've worked on scale-ups of evidence-based programs in which you find alignment with electoral incentives. There was a movement to spend 4% of GDP on education in the Dominican Republic. There was tension around that issue, but we had an evidence-based policy, and there were some fancy people attached to it who made it look really good electorally. There are really a lot of alignments, even within systems where the incentives are not perfect.
One thing that's really helpful [for us] is to see ourselves not just as neutral producers of evidence about the thing that is objectively right to do, but as people who understand what others in the system already value, and where there's [alignment] that can be built upon.
Nathan: I think we're almost out of time for additional questions, but let's do one more quickly. Caitlin, this one will be more for you: To what extent has the IRC managed to overcome scope neglect in its allocation of resources? And do you have an opinion on which organizations do the best or the worst in terms of making sure they take scope into account?
Caitlin: I'll start by [sharing my definition of] scope neglect to make sure we’re on the same page. The idea is that in assessing the importance of working on a particular question, we neglect the size of the population that is affected by this issue versus any other issue.
This is a really hard issue, particularly for a humanitarian agency. I'm an economist. I come from a fairly utilitarian background. The fact that a humanitarian agency chose to build a team of [economists] in-house shows that it was really interested in confronting this issue of where marginal benefit is greatest. [At the same time,] the fundamental philosophy on which humanitarianism is built is that if people are in need of assistance, they deserve to get it.
What I've come to see, as a somewhat reformed utilitarian working with a lot of humanitarians, is that it’s neither one nor the other. I believe both are true, and the question is: Where can you find the most middle ground between them? What we've really pushed for is to say, “At the IRC, let’s define our goals: Whom do we seek to serve, and what are the outcomes we seek to get for them?”
Within that space — for example, maybe we've decided that within Sierra Leone, we think health is an incredibly important outcome — there is a huge amount of opportunity for optimization in a neoclassical sense. But you don't have to start from the framework of saying, “All need is equal and it doesn't matter where it happens or who it happens to.”
I have been really impressed by the way in which the IRC has taken on that issue, though I don't think it has solved it. I'm not sure any organization will, because in many ways, it's a very personal thing. It relates to your [personal ethical beliefs] and whether you are purely utilitarian. You could get into the ethics of it: Are you Rawlsian? Where do you sit [on that ethical spectrum]? Those are very personal things. So I think understanding scope neglect within the framework of the value system that a particular organization espouses [is key]. I've been quite impressed with the IRC, but it is a journey, and it is a very personal one.
I'm not sure that fully answers the question. I can mostly speak to the IRC experience rather than others’. Ruth, I don't know if there are others who rise to the top of your list.
Ruth: No. But now I'm interested in the question and I'm going to be thinking about it as I go forward. Thanks to the person who asked that question.[…]
Thank you so much for watching.