From biomedicine to political science, researchers increasingly use machine learning as a tool to make predictions on the basis of patterns in their data. But the claims in many such studies are likely to be overblown, according to Sayash Kapoor and Arvind Narayanan at Princeton University. They want to sound an alarm about what they call a "brewing reproducibility crisis" in machine-learning-based sciences.
Machine learning is being sold as a tool that researchers can learn and use by themselves, and many follow that advice, Kapoor says. That can create issues and lead to reproducibility failures, according to their paper, "Leakage and the Reproducibility Crisis in ML-based Science." The researchers have created guidelines for scientists to avoid such pitfalls, including an explicit checklist to submit with each paper.
Their rallying cry has struck a chord. More than 1,200 people have signed up for an online workshop, to be held July 28, designed to come up with and disseminate solutions. "Unless we do something like this, each field will continue to find these problems over and over again," Kapoor says.