Adoption of Effective Practices in Schools | Todd Rogers || Radcliffe Institute

– Thank you, Meredith, for
that nice introduction. What I want to start
with is I’ll just give a brief overview
of my background as it relates to the
work we’re about to do, and then we’ll launch into it. So I am a behavioral
scientist, which means I work on developing
scalable interventions that use the insights of
psychology and other fields to change behavior to help
people make better choices. And then the second element
is we always use randomized controlled trials, which I
will illustrate in a moment, to figure out what
works and what doesn’t. And I used to work in politics. I no longer do that. Everything I do now is
about mobilizing families to help families help kids. I want to start by describing
a project that we did in my lab two years ago. I recruited 10 school districts to let us send attendance awards to 15,000 high school kids who had at least one month of perfect attendance in the fall. So in January we sent half of these kids an award saying congratulations on a perfect month of attendance. This is joint work with Carly Robinson, Jana Gallus, and Monica Lee. Monica's a grad student at Stanford, Jana is a professor at UCLA, and Carly is currently a doctoral student here at the Ed School. We mailed the awards to the kids. This is a relatively common practice in some schools, giving awards for good attendance. The other half we sent nothing. We randomly chose who got the award, so
it’s a randomized experiment. I want you guys to guess
what you think the effect is on subsequent attendance. I’ll give you five
response options. Very large decrease
in attendance, small decrease in attendance,
no effect, which I probably wouldn’t be talking about
if that’s the result– [LAUGHTER] –a small increase
in attendance, large increase in attendance. OK, so raise your hand– we’ll
start with large increase in attendance. Raise your hand if you
think that’s what happens. Raise your hand if you think a
small increase in attendance. Raise your hand if
you think no effect. I hope no one says this. OK. Raise your hand if you think
small decrease in attendance. OK. Raise your hand if you think
large decrease in attendance. OK. So the vast majority
of you, like me, thought that this would
increase subsequent attendance. This is a common practice. 92% of schools in the US report
that they give attendance awards. What we learned by
doing this experiment is that sending these
awards decreased subsequent attendance. Students who receive these awards show up to school less after getting them than they did before, relative to a control group. And there is no way to know this without running the experiment. So this practice, it turns out,
is actually counterproductive. But when they send
these awards to kids who have excellent attendance,
those kids generally continue to have
excellent attendance, and so they might believe
that these awards are what’s causing the ongoing
excellent attendance. But they’re selecting the kids
with excellent attendance, slightly decreasing their
subsequent attendance, and attributing the relative
excellence to the awards, right? They miss that. The point of this one is that experiments are important. Here's another one: correlation is not causation. This example is a decade old now, but it's one of my favorite examples of a widespread practice in medicine, practiced for a couple of decades, that it turns out wasn't awesome. "Studies question using cement for spine injuries." This is in the New York Times. It's reporting a JAMA paper. The treatment, vertebroplasty, injects an acrylic cement into
bones in the spinal column to ease the pain
from cracks caused by osteoporosis, the
bone thinning disorder common in older people. Doctors began performing
it in the ’90s. Patients swore by it. Most got better, and it caught
on without any rigorous trials showing that it works. Last year, meaning 2009, 73,000 Americans had this treatment. Just to be clear: you have back pain, and so they inject acrylic cement between your vertebrae as a way of reducing your back pain. At that point, what they knew is that people who get it tend to get better. So they did some randomized experiments, and it turns out it has no effect. I know there are probably doctors here. It turns out that people who have back pain tend to get better anyway. But when you inject the cement, they tend to get better, and they attribute that improvement to the vertebroplasty. It turns out it's useless and expensive and probably not healthy in the long run to have acrylic cement in your spinal column. The point is that the correlation is that people who get it tend to get better, and they attributed that to the treatment causing it. There was no way to know that this is not what was causing it unless they did a randomized experiment. In a randomized experiment, a treatment group gets it and a control group gets a placebo treatment; I think they may have gotten an incision without any injection. And over time, they looked
identical, the two groups. So what I’m going to do is
I'm going to describe three things. The first is a project that we've replicated a bunch of times, really powerful for helping kids do better in school: it reduces course failure, improves GPA, and improves standardized test scores, and it has died on the vine. Zero schools that we've ever run the studies in have continued doing it. The second is a project we've replicated 15 times that reduces absenteeism, and this year we'll deliver 3 million of these interventions. We started with 10,000, and it's just rapidly growing. Everyone wants to do it. That has led me to the third thing we'll talk about, which is how do people decide what works and what to adopt in education? So this is a parent portal. Any of you who have kids probably have one; your school district offers something like this. You need to get a
password, verify it with your social
security number, usually your kid’s social
security number, confirmation of the school,
confirmation of birth date. Then a few clicks later
you can eventually access parent information. A friend of mine, Peter Bergman,
developed this intervention in Los Angeles Unified
School District, where he automatically pushed
out from a digital grade book that teachers already put
all their information in when your kid has not turned
in their homework. So at 5:00 PM, you’d
get a text saying, Todd didn’t turn in
his homework today. Please get him to do it. Or Todd skipped class today. Please talk to him about it. Or– and this is one that
we think is most useful– Todd is– just fell below
passing today in this class. So anyone who works
with high school kids may know that parents
on average hear nothing ever from their schools. So this was a radical
increase in useful information from their schools. Peter has done
this, studied this in LA Unified; Grand
Rapids, Michigan; Morgantown, West Virginia. I’ve studied it throughout
London and the UK with the prime
minister’s office. And I’m going to show
you a replication that we did in Washington DC Public
Schools with 7,000 families across 12 schools. The point is Peter has now shown in three replications, and I've shown in two, that this reduces
course failure, increases GPA, and increases
standardized test scores. But what we're going to show is something cool, an illustration of behavioral insights. The way it's normally offered is that families are alerted by text that they can receive this information. They are then texted and told, you can enroll by logging in at some website that is not obvious. But in order to know what to enter into the website, you need to call your school and get the relevant information. This is the way this enrollment
process typically works. We simplified it. So one of the least
inspired but most powerful things in behavioral
science is that when you make something easy, people
are more likely to do it. Instead of having them log in, we just said, text back Start if you want to enroll. Then, here's the most powerful thing in behavioral science: when you shift the default, people tend to accept the default. Right now, the default is
that if you do nothing, you’re not enrolled, but if
we shift the default, we say, if you do nothing,
you are enrolled. Text back Stop to opt out. And then a control group. So first, let’s start. What percent do you
think enroll here? They’ve got to call
their school and log in. Throw out a number. Somebody– – 20. – 20, 5. OK, less than 1%
of parents enroll. – No! – And if you’re a
superintendent here, you’re not wrong to
think this is useless. All right, now what number? Come on, would you yell one out? – 20? [LAUGHTER] – OK. All right, now I've tricked you. 11%. So that may not seem like a lot. It is a 10x increase in
the percent of parents who sign up for it, right? That’s massive. It happens to be the parents
of the best-performing kids, so not necessarily the ones
that I’m most worried about. We should definitely
inform everybody. But these aren’t
the ones who I think are going to benefit
the most from it. So now, your intuition
is right on this one. What percent do you think stay
enrolled when it’s opt-out? 95% of parents stay enrolled. And what– – So it’s not even 89? [LAUGHTER] – It’s not 89. It’s 95. – It’s not even the converse. – No, no. It’s better than that. Yeah, exactly. So one thing that’s important
about the opt-in/opt-out. Oh, I should say that for these students, roughly 1 in 4 course failures is prevented relative to the other groups, so it reduces course failure. GPA goes up. And my favorite
part of this study is we text parents
during the summer, after this program
is over, and we say, would you like more information
than you’ve been getting this past year from the school? And so this group has
been getting a ton. Every other group has been
getting what they normally get, which is nothing. And this group is more
likely than the others to say they want
still more, right? Which is that parents don’t
know what they don’t know. When you give them useful
information, they act on it. They improve
student achievement. They want more. But one thing worth noting,
just on opt-in/opt-out, they mean something
totally different. Here, would your parents– would you like extra
information we’re offering you about your kid? Yeah, I mean, more extra,
above what is expected. Here, would you not
like the information we’re planning on
giving you about what’s going on with your kid? Like, that means– what kind of
a parent are you if you don’t. I mean, like, you may not want
it for a variety of reasons, but it’s more
diagnostic about, like, do you not want this
information that we’re already going to give you,
versus would you like to elect to get
all this extra stuff? So the default is
really powerful. Simplification’s
really powerful. You'll see a bunch of these over the course of the next however many minutes. I teach superintendents and principals all the time, so I always use them as participants and have them fill out surveys. So I asked superintendents how many families they thought would sign up in each condition. And so remember, this is
in the standard process, 1% sign up; simplified, 11%; opt-out, 95%. So the difference between automatic enrollment and the active, painful opt-in is 94 percentage points. The superintendents thought that the standard process would enroll 34% of families. They thought that when you simplified, it would go to 48%, and when you changed the default it would go to 66%, right? So they thought that the way you implement the technology would only lead to a relatively modest increase in take up, whereas in fact it makes a profound difference in take up. Leaders don't anticipate that how you implement the technology changes its likelihood to be adopted, which changes its downstream effectiveness, right? This is part of this theme
makers decide what works? A superintendent here would have
to conclude this is useless, but it’s entirely because
they implemented it wrong. And they would miss that
it was– that it ends up being among the most
effective education technology interventions
that we know exist. We’ve now replicated
this, between Peter and me and my lab and his, five
times with 100,000 families. So that puts it about the second
most-replicated intervention in education. And what we know is that
it reduces course failures. It increases GPA. It increases parent demand
for more information. And leaders don’t anticipate
that how you implement things like this matters. And zero schools we've
ever implemented it in ever continue doing it. So it’s now been
super well replicated, involving perhaps over 100
schools, and none of them continued using it. I’ll revisit that in
a second when we talk about the next intervention. OK, so putting that one
aside, the other intervention is modeled after this
mailing that you all get, which is comparing your
energy use to your neighbors. Raise your hand if you get this. Do you get a mailing
coming saying, do you use more energy
than your neighbors? OK. This intervention is delivered
on behalf of your utility, probably Eversource or
something like that, by an organization
called Opower, started by two friends of mine
who are alums at the college. They were reading– they
were starting a clean energy company in 2007, and they read
this journal, this psychology journal, and saw
this study that when you tell people they use more
energy than their neighbors, they reduce their
energy meaningfully. This company now delivers
something like 15 million of these reports a
month in 14 countries. They went public for a billion
dollars five years ago. And they reduce energy
use the equivalent of half of all solar
production in the world. It’s something like
the most effective way to reduce energy
use anyone has ever uncovered, other than
a technology change. And I studied it– 250,000 families that
were getting it in five– four different zones in
the US over five years. And the effect grows
over time, which I would not have predicted. The other thing
that we learned– that they have learned
is that delivering it digitally is way less
effective, if at all effective, relative to
delivering it by mail. And the reason is
the mail becomes what we call a social artifact. It sticks around in the home. People put it on the fridge. They complain to
their spouse about it. They show it to the
kids and tell them they need to turn
their lights off, whereas a digital communication
is consumed one time, and then disappears. Texting is great if you tell a parent their kid didn't turn in their homework tonight: get them to do their homework now. But if it's a behavior that spans time, like generally reducing energy use, the language I use is, do we need to bridge time for the intervention to work? In which case, you need to have some kind of shelf life, a strategy for surviving in the home, maintaining and capturing attention. So this is super effective. The least inspired intervention
that I’ve ever been involved in is like, let’s
directly translate this into an education intervention,
reduce absenteeism. You’re going to see. It looks exactly like this. We’ll show it in a second. Let me tell you about
student absenteeism. It turns out that missing school is negatively correlated with all the things that we think school does. Missing school seems to be related to dropout. Missing school seems to be related to lower test scores. Missing school actually seems to be related to later life earnings and contact with the juvenile justice system. It appears that attending
school is necessary for learning in school. Reducing absenteeism is
really expensive and difficult to do at scale, so the
best evidence we have is that texting
this kind of personalized info automatically is useless. The best evidence we have
on reducing absenteeism is truancy officers
or social workers, mentors devoted to
reducing absenteeism. And they can reduce
a day of absenteeism at a cost of about
$500 per day generated. So Chicago Public Schools,
Jens Ludwig, Jonathan Guryan, and the team ran a randomized experiment in Chicago Public Schools where they showed that, at best, mentors can reduce absenteeism by up to three days over the course of a year. The mentors check in with the kid twice a week and check in with the parent twice a month. They do this for a year to two years, and at most they can reduce absenteeism by up to three days. Turns out it's just
hard and expensive to reduce absenteeism. If anybody is wondering about the energy intervention, you may wonder, why would a utility that sells energy hire a company to get its consumers to consume less of what it sells? Think about that for a second. It's a really weird business model. It's because it's a regulated niche: regulators tell utilities, you get a fixed amount of revenue per user, and if you can demonstrate reductions, you get additional revenue. It's just a niche that policy makers created. Similarly for schools, 40% of kids attend schools that get paid per kid per day. So Los Angeles Unified
gets $63 per kid per day. The state has to come up with
a strategy for allocating aid, and Texas, California, and
a bunch of other states have decided that’s
going to be based on average daily attendance. So there’s revenue tied
to increasing attendance in addition to
academic performance. And then finally, the
federal government passed a law that’s resulted
in nearly every district in
the US now being held accountable for the first
time to reducing absenteeism. OK, so you can see the
completely uninspired intervention. We did this in
Philadelphia in 2015, district-wide with
30,000 families. We randomly assigned them to different conditions and sent four rounds of these reports, choosing the kids who missed the most school and saying, your kid misses more school than their classmates. This is how many days. It's worth noting that parents
have two really widespread false beliefs almost
universally held. They underestimate their
own kid’s absences by 50%. So if my kid has missed 20 days,
I think my kid has missed 10. Every district we've done a survey in, this is the pattern. And the other is about parents of kids who miss more school than their classmates, so 100% of these kids miss more school than their classmates: the majority of their parents think they miss less or the same. It's like the Lake Wobegon Effect. Every parent thinks their kid is above average on attendance, even parents of kids who miss more than their classmates. And there's no way
these false beliefs, and it proves to
be very effective. Some key beliefs really matter. For example, how many
days your kid has missed. My kid has missed 20 days. I thought it was 10. Correcting that belief
seems to really matter. People read mail, contrary to– you know, we’re not
living in the future yet. We do these experiments where
in the summer you call people and you say, do you
remember receiving this? In the control group,
0 people received it, but 25% say they got it,
which is totally common. They’re lying, they’re being
nice, they’re misremembering, whatever. 25% of people think they got it. But in the treatment
group, 75% of people remember getting it and
reading it, which is– a set of mailings, at least
50% read it closely enough to remember it into
the summer, right? So that’s a
conservative estimate. And it turns out it’s
really effective. It reduces this chronic
absenteeism by more than 10%, and it’s consistent
across grade levels, K-12, which is surprising. Happy to talk about that,
but why we think that is. It’s consistent across race,
gender, free and reduced lunch status. One thing that’s cool. If you assume that–
if we’re siblings and I’m the focal student, we
randomized so that only one student is communicated
about, but all siblings in the household
attend school more. So we’re really changing
parent motivation to reduce absenteeism. It’s not just with
regards to me. It’s with regards
to all my children– all my parents’ children. And so we reduce absenteeism
at a cost of $7 a day, generated per day of
additional attendance. So it’s about 100 times
more cost effective than the next best intervention. It can be implemented at scale. And in a lot of
districts, it actually is massively revenue-generating
for the district. We’ve now replicated
it 14 times. We replicated it in Chicago Public Schools. We replicated it in 10 districts in California. We then replicated it in LA Unified. We replicated it in other, undisclosed districts. It consistently reduces
absenteeism with this kind of effectiveness, and it seems
to be getting more effective. These districts that we
initially ran the studies in asked me and my lab
at Harvard if we would implement it for them to
continue to reduce absenteeism. And Harvard said it
was not research, because it's not anymore; now it's just implementation. So we spun out an organization, In Class Today, that now
implements these for districts. And two facts that are
really cool about it. The first is that it now has a staff and data scientists, and it runs constant experiments that make the intervention more effective. We now have a much better heterogeneous treatment effect model, so we can actually target those who would be most treatment-responsive to different messages. We also have improved timing, and there are now 10,000 variations of the message across something like 13 languages. All these different variations have resulted, we think, in a 50% to 100% increase in the effectiveness of the intervention over the last two years, right? What usually
happens in academic science is we have papers with
small sample sizes showing giant effects. Then the world tries
to implement them, and if they replicate
it at all, they replicate at a tiny fraction of the effect size. But this one, because we keep
and more effective. And additionally,
it actually has proven to be really
revenue generating for a bunch of districts. So we did this pilot in Los
Angeles Unified three years ago where they spent
$150,000, and we did a control group
and a treatment group so we could see exactly how
many net days we generated. And we generated enough net
days that we increased revenue by almost 10 times as
much as they spent. So now– this is the
second year in a row. They’re now doing
it district-wide. And that was sort of– that
was compelling evidence, or at least it seemed
compelling evidence, to the superintendent. So the organization
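To make the revenue arithmetic concrete, here is a rough back-of-the-envelope sketch. It assumes the $63 per kid per day funding figure mentioned earlier and rounds "almost 10 times as much as they spent" to a factor of 10, so the implied day count is illustrative rather than a reported figure.

```python
# Back-of-the-envelope sketch of the LA Unified pilot economics.
# Assumes the $63 per kid per day attendance funding mentioned earlier
# and rounds "almost 10 times as much as they spent" to a factor of 10;
# the implied day count is an illustration, not a reported result.
pilot_cost = 150_000       # dollars spent on the pilot
dollars_per_day = 63       # attendance-based funding per kid per day
revenue_multiple = 10      # "almost 10 times as much as they spent"

implied_revenue = pilot_cost * revenue_multiple        # ~$1,500,000
implied_net_days = implied_revenue / dollars_per_day   # ~23,800 days

print(f"Implied additional revenue: ${implied_revenue:,}")
print(f"Implied net days of attendance generated: ~{implied_net_days:,.0f}")
```

Under those assumptions, the pilot would have had to generate on the order of 24,000 net days of additional attendance across the district to produce that return.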
So the organization in the first year only implemented a few thousand. Then the next year, something like five times as many. And then this year, they're implementing something like 3 million of these interventions. And the expectation is that it'll probably double or triple again next year. That's just a fact: districts seem to want to implement this, even though it doesn't move the thing we ultimately care about, which is actual learning; it just moves an input, which is attendance. So the impact: it reduces absenteeism. It spills over to siblings. Some beliefs really matter, in particular how many total absences my kid had. And it's scaled up, much to my surprise; it seems to have good product-market fit, as they describe it. And I contrast that with
the other intervention. These two interventions, I think I can immodestly say, are the most-replicated education interventions, with the largest-scale replications of any education interventions in the US, because they're so easy to study. Most interventions are really, really hard to study. With these, we've done lots of experiments, because they're super easy, and they work. But the texting one has gone nowhere, and the absenteeism one, every district we've done it in wants to continue doing it. So that has led to this question: why is that? Could it be about whether there's an important outcome? The texting-parents intervention actually moves the outcome everyone cares about. The absenteeism one doesn't necessarily move the one we care about; it moves an input to it. Is it about evidence that it works? There's lots of evidence that both of them work. Is there a regulatory mandate? Are they held
accountable to moving these? In both cases they are held
accountable to improving student performance and
reducing absenteeism. Do they understand
that it works? In both cases, we’ve
communicated back the results. We do trainings
with their staff. What about the relative marginal cost of intervention delivery, how much it costs to deliver the intervention? The texting one actually dominates there, right? The modality is just digital communications. It's actually cheaper, it would appear. So how about teacher effort to
administer the intervention? This is where I think
the real sweet spot is. In order to implement
this intervention, we need teachers to continue
to use the same grade book and keep it up to date. And as anyone who's ever worked in schools knows, it's incredibly hard to change teacher behavior. When we run these studies, the principals hound the teachers to use the same digital grade book and keep it up to date. They don't like it, and it's really hard to compel them to. And teachers have learned over time that there are always new initiatives, and the average tenure of an urban superintendent is like 2 and 1/2 years. They just have to wait it out, and whatever they're being pushed to do will change. So even though it appears to be easy, costless technology, it requires teachers to change
their behavior in order for it to be implemented at scale. Additionally, it requires
district stability for the intervention
to be possible. So one thing we’ve learned
is that the backend databases keep changing as the district gets new staff and new leadership. And so one of the things that teachers complain about is that they'll learn a new technology because they're asked to, and then the district will discontinue it and move to a new one. For me, when I make sense of these two, I think the reason one has scaled well and the other hasn't is that one requires teachers to change their behavior. With the other, I talk
and within two weeks, district-wide, we can
reduce chronic absenteeism by 10% to 15%. The other one is going
to require an 18 month initiative to get
teachers to come on board. And they have a lot
of other initiatives that may or may
not be productive, but they have a lot
of other initiatives. This was a surprise to me,
because I started both, thinking that if they work,
they’re easily scalable. Those are the only
projects we start in my lab, which is can we learn
something about human behavior? And if it works, does it
suggest a scalable intervention? We started it, and much to
my surprise, only one of them actually can scale. So that has led to
this question, for me, of how do we scale– like, what leads some things
to be adopted and other things not to be? And one approach to this
that has left me baffled– so I’ve been inspired by the
talks of the other Fellows where– some– I have a hard
time sitting with not knowing the answer to stuff. And I see many of you were
just wrestling with something, openly and continuously. I have a high need for closure. But nonetheless– so I’m– so
that’s the closure on what I know. Both of these
interventions work. One seems to be scaling really
well, and the other doesn’t. But the question I’m
really interested in is how do decision makers,
leaders, decide what to adopt and what not to? And a dimension
I think is really interesting on that is what
counts as valid evidence. For me, I just– I started with those
examples of the spinal cement and the attendance awards, where
we do randomized experiments and we show that
things that people do, and they all believe work, are
actually counterproductive. So I’ve run a bunch of– so I talk with school
leaders and principals and superintendents all the
time, and one very common thing where I’m talking to the
absenteeism intervention– I say, we’ve got 15
randomized experiments involving 300,000 families
across urban, rural, diverse districts. And they consistently
show this result. We know it works in
a way that nothing you’ve ever encountered in your
lives as an educator we know works. And they’re like, great. Let’s do a pilot
in our district. And I say, what does that mean? They say, let’s see if we
can do it in one school and see if it
reduces absenteeism. And so I had my team run a simulation. The intervention has a modest but real effect, an effect we know is real. And then we ran the simulation knowing what the natural noise in a school is,
year over year, week over week, month over month. A kid’s absence varies a ton. A school’s absence varies a ton. It’s a noisy world. And so given that, if you
implement it in one school and you just look pre and
post, what’s the probability that attendance is going to
decrease after administering the intervention? You have about a 50% chance that attendance will decrease after administering the intervention, and a 49% chance that it will increase, just based on noise, and only about a 1% chance that there will be no change. Basically, the effect is small but real, cheap and easy to implement. It's not a panacea, but it works, and it's outrageously cost effective.
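To make that concrete, here is a minimal Monte Carlo sketch of the kind of simulation we ran. The effect size and the school-level noise below are assumed, illustrative numbers, not the values from the actual simulation.

```python
# Minimal Monte Carlo sketch of the one-school pre/post pilot problem.
# The effect size and noise level are hypothetical, chosen only to show
# why a single-school comparison is close to a coin flip; they are not
# the figures from the actual simulation.
import random

TRUE_EFFECT = -0.1     # assumed: intervention removes 0.1 absence days per student
SCHOOL_NOISE = 1.0     # assumed: sd of year-to-year swing in a school's mean absences
TRIALS = 100_000

looks_worse = 0        # pilot year shows more absences than the baseline year
for _ in range(TRIALS):
    pre = random.gauss(10.0, SCHOOL_NOISE)                  # baseline year mean absences
    post = random.gauss(10.0 + TRUE_EFFECT, SCHOOL_NOISE)   # pilot year mean absences
    if post > pre:
        looks_worse += 1

print(f"Chance the pilot school looks worse despite a real benefit: "
      f"{looks_worse / TRIALS:.0%}")
# With these assumed numbers, roughly 47% of one-school pilots "show"
# absenteeism going up even though the intervention genuinely helps.
```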
And if you run a pilot in one school, you have a 49% chance of falsely believing that it increased absenteeism and only about a 50% chance of correctly seeing a decrease. For those of you unfamiliar with this, you may read that as, oh, it doesn't work. It works. 15 randomized experiments
involving 300,000 families around the United States. But the world is super
noisy, and effects are small. And so given that, a
pilot is utterly useless, but everybody wants to run
one, because they think that they can interpret it. It actually creates
what my mentor, Don Green in political science, calls, and I don't want to blame him for this, I understand he calls it
negative knowledge, which is like, they will falsely
believe they learn something. They are worse off
from a knowledge base for having done this than
had they not done it. But it’s completely widespread. So I ran an experiment
with 200 or so principals where I said to these
school principals, I said, imagine there’s an absence
reduction intervention, and a large, randomized
controlled trial in a district like yours
shows it works great. In the other condition,
I said, imagine there’s an absence
reduction intervention that a single principal
in a different district says it works great. So from a truth– from a true value in terms
of what we know to be true, a large randomized
controlled trial gives us more information
about the actual causal effect than a single
principal observing the impact in their district,
in their single school. And I said, how
interested would you be in working to implement
the program in your school, just based on this information? Between subjects, so they don’t
know that they’re different. And roughly the same. So basically like, they– and actually, the
data actually– they slightly prefer
a single principal to any randomized experiment. We then do a bunch of– so now we’re moving from peers– and I have a big
field experiment where we’re working on
this, where we’re actually going to deliver to several
thousand superintendents marketing materials where
we quote a principal or we quote a
randomized experiment, and the outcome measures whether
they hire this organization that we started. And it’s almost certain they’re
going to prefer the principal. Not certain. How about, it’s almost
certain that they’re not going to prefer the RCT. And all that is like,
why do we care– we’re the only people
who care so much about this level of evidence. So here’s a hypothetical
press release. This is an online
sample I’ll then replicate with some
superintendents in the next study. “According to a
new meta-analysis, researchers found
that people whose diets were high
in vitamin K2 had a 23% lower risk
of hip fractures compared to those with
a lower consumption.” So this is just correlation. We just look at people who have
this versus people who don’t. People who eat– so
there are a lot of things that could be different between
people who eat vitamin K2 diets. I don’t even– is that even– we maybe even made
up the vitamin. I don’t know if
that’s a real vitamin. [LAUGHTER] And we said hypothetical twice. See, that’s the ethical part. But then in the
other condition, we said, according to a new
randomized controlled trial, people were randomly assigned
to the vitamin versus those in the control group. OK, it may seem subtle, but
one is observational data, just correlation. The other is causal. And this is just lay people. They did understand– they did
read it closely enough to be able to know that in the
randomized controlled condition, when we said,
were they randomly assigned to treatment condition? They knew they were. And in the
correlation condition, they knew they were not. 74% of people correctly
recalled what condition, that it was not
randomly assigned or it was randomly assigned. So they processed it. They know that it’s
random assignment or not. But again, does it
prevent hip fractures? The same. If you were a doctor,
would you prescribe it? The same. OK, so normal people,
even when they know random assignment
happens, don’t really think it’s that important
for inferring the causality. So now I did superintendents. I did this a couple of weeks ago
with a group of superintendents that were on campus. There are 64 of them. I asked them, researchers
found that those– so you guys can do the
counterfactual here, trying to figure out what’s
going on, causally, here. Research found that students
who enrolled in peer tutoring had a 12% higher
likelihood of passing all their classes compared
to those who did not enroll. One could imagine that those
who enroll on their own are different than
those who do not, and so then it may not
be the peer tutoring, but it may be those who elect to
enroll are different than those who do not. This is the danger of this
kind of observational data. Then in another condition, we
say, these are superintendents. Those who were randomly assigned
to participate in peer tutoring had a 12% higher likelihood
of passing all their classes compared to those
who were randomly assigned to not participate. So these are random assignments,
same populations, 12%. This should give
you a much higher prior on whether this matters,
whether this works, peer tutoring. Again, they can infer. They read it closely enough. That graph, I find confusing. 84% of the superintendents in
the next question can answer, were they randomly assigned
in the vignette you just read? And those in the RCT said yes. Those in the correlational
study said no. So they read it and they
understand that they were or were not randomly assigned. And now it’s– these
are superintendents. Does peer tutoring cause
higher course passing? Same across conditions. Based on the press
release, would you recommend it to students? Same across conditions, even
though we’ve given them– if this were true,
and it is true, they can’t infer this sort
of evidentiary difference between the correlational
and causal study. I say this as,
like, I am obsessed with inferring causality, how
do we know whether something caused anything? All of my colleagues
are obsessed with it. And many of you, I share your– we’re too obsessed with it. One, there’s like,
we should be more interested in descriptive
data in the first place. And second, these
kind of causal claims take on this dominating role in
policy and public discussion. Fine. But nonetheless, it does
take on a dominating role. And the audience doesn’t
seem to know the difference and doesn’t– and slightly
prefers hearing an anecdote from a single principal over
evidence of a randomized experiment. I talked to superintendents,
and this is exactly– they are much more moved by
a quote from a single parent than me saying that we have 15
randomized experiments showing it. And lots of possible
alternative explanations. But I find it baffling. And more than that, I find it– it leads to this
deep introspection of what am I doing? [LAUGHTER] Why bother? So in summary, some effective
practices scale rapidly. Others don’t, sometimes, I
think, because of attributes of the practice, whether
it requires teachers to implement it or not,
and sometimes because of decisions by leaders. And leaders have an intuition
that pilot studies are useful. And I just published something I
really like in EdWeek this week on how effect sizes in
education are really small. It just turns out it’s
hard to change people’s behavior in any domain. In voting, we’ve now learned
that the typical Get Out the Vote mailing
increases turnout. Generate your own intuition. A good mailing saying
remember to vote on Tuesday, and you got it on a Monday. Think about what
impact it has on increasing people’s
likelihood of voting, OK? It has a 0.2 percentage
point effect. I’m going to bet
that everybody had a higher estimate,
everybody who’s not familiar with this research. I’ve run probably over 100
Get Out the Vote experiments using mail, like Don Green and
others have this meta analysis. 0.19 percentage points. Is that good or bad? It’s neither. That’s just what it is. It turns out it’s hard to
change people’s behavior with light touch interventions. In education, nearly everything
has a small effect, nearly everything. We wrote about it. You guys have probably
heard about growth mindset, this growth mindset stuff,
where you teach kids that their intelligence
is malleable versus their intelligence is fixed. Carol Dweck has a series
of great papers on this. The original paper in schools
showed that it reduced– that when you convince kids
that a growth mindset is real and that they should work
hard and that intelligence is malleable, it led
to a 0.8 GPA increase. There were less than 100
students in that study. And a recent– including
Carol and others, massive replication
of 11,000 students, finds it’s actually 0.04
GPA points, which is still outrageously worth it for
a 30 minute intervention, because it’s really
hard to improve student achievement at a 30 minute
intervention that increases GPA by 0.04. It’s still worth it. It’s just, there’s
no silver bullet. Turns out it’s hard
to change behavior. So effects sizes are small
when samples are small and the world is super
noisy, but people still think pilot studies are informative. Peer reference, seen as at
least as informative as RCTs. Correlation experiments
appear equally compelling as causal information. And this is all provisional. Like, I don’t need to replicate
all of this, but these are– this is– the way I learn is by running
poorly-designed survey experiments. But ultimately, the
question that I’m wrestling with and
reading about is what are attributes of
organizations, leaders, policy environments,
and interventions that make organizations
more likely to adopt effective practices? And that is the
motivating question that comes from this
body of work, from– these are some of
the studies we’ve done over the last couple of
years, and the two that work, these actually work,
and only one of them, people want to adopt. [LIGHT MUSIC PLAYING] That is all, and I am
happy to take questions. Thank you guys for coming. [APPLAUSE]
