Communications of the ACM

ACM Careers

Tech Leaders Launch Open Database of Scientific Articles to Fight Coronavirus

COVID-19 Open Research Dataset (CORD-19) logo

Five organizations have released a new open dataset of over 29,000 scientific articles published in journals and on preprint servers, in the hopes of spurring America's artificial intelligence experts to develop new techniques for mining data and text that could help answer some of the most pressing questions about the novel coronavirus and the disease it causes.

The dataset is believed to be the most extensive collection of its kind concerning the coronavirus. Crucially, it is machine-readable, meaning it is in a format that a computer can process easily, which makes it much easier for AI specialists to work with.

The dataset's coverage varies from article to article. Only about 13,000 of the articles include full text, meaning that all of the figures and words within the article are available. The other roughly 16,000 articles include only metadata, such as the authors' names or the abstract of the paper, in large part because they are behind paywalls.

The COVID-19 Open Research Dataset was built by a collaboration of five organizations: Microsoft, the Allen Institute for AI, the National Institutes of Health's National Library of Medicine, the Chan Zuckerberg Initiative, and Georgetown University's Center for Security and Emerging Technology.

Michael Kratsios, the U.S. chief technology officer, told reporters that the Trump administration is issuing "a call to action" to the tech community to use the dataset to develop AI techniques and insights that could be useful in the response to the coronavirus.
