Summary: Researchers question the statistical evidence behind psychotherapies designated as empirically supported treatments by the American Psychological Association (APA).
Source: University of Kansas
A paper appearing today in a special edition of the Journal of Abnormal Psychology questions much of the statistical evidence underpinning therapies designated as “Empirically Supported Treatments,” or ESTs, by Division 12 of the American Psychological Association.
For years, ESTs have represented a “gold standard” in research-supported psychotherapies for conditions like depression, schizophrenia, eating disorders, substance abuse, generalized anxiety, and post-traumatic stress disorder. But recent concerns about the replicability of research findings in clinical psychology prompted the re-examination of their evidence.
The new study, led by researchers at the University of Kansas and University of Victoria, concluded that while underlying evidence for a small number of empirically supported treatments is strong, “power and replicability estimates were concerningly low across almost all ESTs, and individually, some ESTs scored poorly across multiple metrics.”
“By some accounts, there are over 600 approaches to psychotherapy, and some are going to be more effective than others,” said co-lead author Alexander Williams, program director of psychology and director of the Psychological Clinic for KU’s Edwards Campus. “Since the 1970s, people have been trying to figure out which are most effective using clinical trials just like in medicine, where some subjects are assigned to therapy and some to a control group. Division 12 of the APA has a list of therapies with strong scientific evidence from these trials, called ESTs. Ours is the first attempt anyone has made using this broad suite of statistical tools to evaluate the EST literature.”
The researchers analyzed 78 ESTs with “strong” or “modest” research support, as determined by the APA’s Society of Clinical Psychology (Division 12), drawing on more than 450 published articles. Four metrics of evidential value were assessed: rates of misreported statistics, statistical power, R-index and Bayes factors. Among the key conclusions:
- 56% (44 of 78) of ESTs fared poorly across most metric scores.
- 19% (15 of 78) of ESTs fared strongly across most metric scores.
- 52% (26 of 50) of ESTs deemed by Division 12 of the APA as having Strong Research Support fared poorly across most metric scores.
- 22% (11 of 50) of ESTs deemed by Division 12 of the APA as having Strong Research Support fared strongly across most metric scores.
- 64% (18 of 28) of ESTs deemed by Division 12 of the APA as having Modest Research Support fared poorly across most metric scores.
- 14% (4 of 28) of ESTs deemed by Division 12 of the APA as having Modest Research Support fared strongly across most metric scores.
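The percentages in the list above follow directly from the reported counts. As a quick arithmetic check, the minimal Python sketch below recomputes each one and confirms that the Strong and Modest subgroups add up to the overall totals. The counts are taken from the list; the variable and function names are illustrative only and do not come from the study.

```python
# Counts as reported in the list above: (description, ESTs in category, group size).
# Names such as `tallies` and `percent` are illustrative, not from the study.
tallies = [
    ("All ESTs, fared poorly", 44, 78),
    ("All ESTs, fared strongly", 15, 78),
    ("Strong Research Support, fared poorly", 26, 50),
    ("Strong Research Support, fared strongly", 11, 50),
    ("Modest Research Support, fared poorly", 18, 28),
    ("Modest Research Support, fared strongly", 4, 28),
]


def percent(count, total):
    """Return count/total as a whole-number percentage."""
    return round(100 * count / total)


for description, count, total in tallies:
    print(f"{description}: {count} of {total} = {percent(count, total)}%")

# Consistency checks: the two subgroups should sum to the overall figures.
print("Poor subtotals match overall:", 26 + 18 == 44)
print("Strong subtotals match overall:", 11 + 4 == 15)
print("Group sizes match overall:", 50 + 28 == 78)
```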
“Our findings don’t mean that therapy doesn’t work, they don’t mean that anything goes or everything is the same,” said co-lead author John Sakaluk, assistant professor in the University of Victoria’s Department of Psychology, who earned his doctorate at KU. “But based on this evidence, we don’t know if most therapies designated as ESTs do actually have better science on their side compared to the alternative, research-supported forms of therapy.”
According to Williams, the field of clinical psychology may be ripe for a broad-scale reassessment of therapies that, until now, were thought to be supported by rigorous scientific evidence.
“Medical researchers coined a term called ‘medical reversal,’” the KU researcher said. “Sometimes these are medical practices that doctors use across the country, but they are discontinued after it’s found they don’t work or aren’t more effective than less-costly alternatives, or they’re actually harmful. Pending replications of our results, we may need broad systems-level psychotherapy reversals. Some of these ESTs are widely implemented in big systems like the Veterans Health Administration. If we find evidence for them isn’t as strong as believed, it may be worth looking at. Let’s say, hypothetically, there are two therapies for depression, and people have said, ‘Well, Therapy A has stronger evidence for it than Therapy B.’ But we know Therapy B works, too, and it’s less costly. Today, if we find the evidence for Therapy A isn’t actually stronger, it may be time to promote Therapy B.”
Further, Williams advised clinicians and patients to continually evaluate progress in therapy and adjust therapeutic approaches based more on patient progress than research evidence of a given therapy’s effectiveness.
“For clinicians and clients, this speaks to the importance of frequently assessing how well a client is doing in therapy,” he said. “Routine outcome monitoring is always a good thing to be doing, but it may be a particularly good idea based on new evidence that we don’t know if some therapies are effective. So, if I’m a patient, I want to assess how I’m doing — and there are different measures for doing that. This study suggests it’s even more important than previously believed.”
For the research community, the authors recommended a reassessment of the size and power of clinical trials and more collaborations between labs to increase the precision of analyses, along with fresh approaches to how research is appraised, published and evaluated.
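To illustrate why trial size matters for statistical power, here is a rough Python sketch using a normal approximation for a two-arm comparison. It is not the method used in the study. The effect size (a standardized mean difference of 0.5) and the group sizes are hypothetical values chosen for illustration, and `approx_power` is an illustrative helper name.

```python
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is a standardized mean difference (Cohen's d).
    This is a normal approximation, not an exact t-test calculation.
    """
    standard_normal = NormalDist()
    critical_value = standard_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative hypothesis.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability of landing in either rejection region.
    upper_tail = 1 - standard_normal.cdf(critical_value - shift)
    lower_tail = standard_normal.cdf(-critical_value - shift)
    return upper_tail + lower_tail


# Hypothetical trials: same assumed effect, different numbers of participants per arm.
for n in (20, 64, 150):
    print(f"n per group = {n}: approximate power = {approx_power(0.5, n):.2f}")
```

Under these assumptions, a trial with a few dozen participants per arm detects a moderate effect far less reliably than a larger one, which is the kind of gap that larger samples and multi-lab collaborations aim to close.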
“One of the things that becomes really obvious when you look at the literature is researchers are collecting and analyzing their data in ways that are extremely flexible,” Sakaluk said. “If you don’t follow certain rules of statistical inference, you can inadvertently trick yourself into claiming effects that aren’t really there. For EST research, it may become important to define in advance what researchers are going to do — like how they’ll analyze data — and go on record in a way that restricts what they’re going to do. This would coincide with a movement to encourage researchers to propose what they’d like to do and get reviewers and journal editors to weigh in before — not after — scientists do research, and to publish it irrespective of what they find.”
Williams said the studies supporting clinical treatments should improve over time as researchers adopt more exacting statistical approaches.
“This is a system-level issue that will get better as our field begins to grapple with replication,” he said. “We think you’ll see improvement in study design going forward. There wasn’t a fieldwide appreciation for these problems until a decade ago. It takes time for the field to improve. We think our results will complement ongoing efforts by Division 12 to increase the quality of EST research and evaluation.”
Williams and Sakaluk’s co-authors were Robyn Kilshaw of the University of Utah and Kathleen Teresa Rhyner of the Canandaigua VA Medical Center, the latter of whom also earned her doctorate at KU.
Brendan M. Lynch – University of Kansas
Original Research: Closed access
“Evaluating the evidential value of empirically supported psychological treatments (ESTs): A meta-scientific review”. Sakaluk, John Kitchener, Williams, Alexander J., Kilshaw, Robyn E., Rhyner, Kathleen Teresa.
Journal of Abnormal Psychology. doi:10.1037/abn0000421
Evaluating the evidential value of empirically supported psychological treatments (ESTs): A meta-scientific review
Empirically supported treatments (or therapies; ESTs) are the gold standard in therapeutic interventions for psychopathology. Based on a set of methodological and statistical criteria, the APA has assigned particular treatment-diagnosis combinations EST status and has further rated their empirical support as Strong, Modest, and/or Controversial. Emerging concerns about the replicability of research findings in clinical psychology highlight the need to critically examine the evidential value of EST research. We therefore conducted a metascientific review of the EST literature, using clinical trials reported in an existing online APA database of ESTs, and a set of novel evidential value metrics (i.e., rates of misreported statistics, statistical power, R-Index, and Bayes Factors). Our analyses indicated that power and replicability estimates were concerningly low across almost all ESTs, and individually, some ESTs scored poorly across multiple metrics, with Strong ESTs failing to continuously outperform their Modest counterparts. Lastly, we found evidence of improvements over time in statistical power within the EST literature, but not for the strength of evidence of EST efficacy. We describe the implications of our findings for practicing psychotherapists and offer recommendations for improving the evidential value of EST research moving forward.