Reproducibility Rounds

Stanford, California

About


The Stanford CTSA Program on Research Rigor & Reproducibility (SPORR), in collaboration with Columbia, Duke, Harvard, and Indiana, cordially invites you to the Reproducibility Rounds webinar "What Biomedicine Can Learn about Reproducibility from Social & Behavioral Research: The SCORE Project."

Registration Link

Speaker: Brian Nosek, PhD

Brian Nosek is the founder and Executive Director of the Center for Open Science (COS) and a professor at the University of Virginia. COS has long been a leader in creating infrastructure to foster open and reproducible science, as well as a pioneer in metascience scholarship, having conducted three major reproducibility projects: in Psychological Science (Science, 2015), in Cancer Biology (eLife, 2021), and in Social and Behavioral Science (SCORE, Nature, 2026).

Brian’s research interests are understanding how people and systems produce values-misaligned behavior; developing, implementing, and evaluating solutions to align behavior with values; and improving research methods and culture to accelerate progress in science. For this work he has received honorary doctorates from the Universities of Ghent (2019) and Bristol (2022).

Abstract: SCORE, a collaboration of 865 researchers, has now been released as three papers in Nature, six preprints, and a large volume of data (https://cos.io/score/). SCORE examined the repeatability of findings from the social-behavioral sciences and tested whether human and automated methods could predict replicability. A representative subset of 600 claims was available for three kinds of repeatability tests: reproductions (same data, same analysis), robustness tests (same data, different analyses), and replications (same question, different data). For reproducibility, we could obtain data for only 24% of the 600 papers. Of the 143 papers (551 claims) assessed, we precisely reproduced 54% and approximately reproduced 74%. We were much more likely to succeed when authors shared both data and code than when they shared only data or when we had to reconstruct data from original sources. For robustness, 34% of reanalyses showed the same result within a narrow tolerance (±0.05 Cohen’s d), and 57% within a wider tolerance (±0.20). Restricting the comparison to statistical conclusions (is p < .05?), 74% of reanalyses reached the same conclusion, 24% observed no effect, and 2% observed an opposing effect. For replicability, we tested findings from 164 papers and successfully replicated 49% of them by the common statistical significance criterion. Original studies had an average effect size of r = 0.25; replication studies averaged r = 0.10. The best-performing human methods achieved about 75% accuracy in predicting replication outcomes. The talk will also discuss implications for researchers and institutions, including ways to improve the credibility of research.
