One of the formidable challenges healthcare providers face is putting medical data to maximum use. Somewhere between the quest to unlock the mysteries of medicine and the effort to design better treatments, therapies, and procedures lies the real world of applying data and protecting patient privacy.
"Today, there are many barriers to putting data to work in the most effective way possible," observes Drew Harris, director of health policy and population health at Thomas Jefferson University's College of Population Health in Philadelphia, PA. "The goals of protecting patients and finding answers are frequently at odds."
It is a critical issue and one that will define the future of medicine. Medical advances are increasingly dependent on the analysis of enormous datasets, as well as data that extends beyond any one agency or enterprise. What's more, as connected healthcare devices flourish, at-home and remote monitoring blossoms, and big data analytics advances at a staggering rate, the stakes grow significantly, as does the ability to use, misuse, and abuse confidential data.
"Healthcare is at a very important crossroads. To move to a more value-based framework and one that rewards patient and doctor behavior, we need to have systems in place that manage data and protect individuals," says Ophir Frieder, professor of computer science and information processing at Georgetown University in Washington, D.C., and professor of biostatistics, bioinformatics, and biomathematics at the Georgetown University Medical Center.
Make no mistake, researchers are exploring ways to better manage and protect patient data. These methods revolve largely around machine learning, integrating blockchain into electronic healthcare records (EHRs) and other systems, and finding other ways to anonymize data, validate records, and prevent data leaks. "We must strike a balance between data ownership, interoperability, security, and dynamic consent for patients, so that data can be used and shared at the right times and under the right circumstances," says Jim Nasr, chief software architect for the Centers for Disease Control and Prevention (CDC).
The level of disruption rippling through the healthcare industry is staggering. According to research firm IDC, the overall volume of data in the industry will increase from 153 exabytes in 2013 to 2,314 exabytes in 2020.
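To put the IDC figures in perspective, the short calculation below converts them into a growth multiple and an implied yearly rate. It is only a back-of-the-envelope sketch: it assumes smooth compounding between 2013 and 2020, which the source figures do not state, and the function name is illustrative.

```python
def compound_annual_growth_rate(start: float, end: float, years: int) -> float:
    """Return the constant yearly growth rate that turns `start` into `end`."""
    if start <= 0 or end <= 0 or years <= 0:
        raise ValueError("start, end, and years must be positive")
    return (end / start) ** (1 / years) - 1


# IDC figures quoted above, in exabytes.
volume_2013 = 153
volume_2020 = 2314

growth_factor = volume_2020 / volume_2013
cagr = compound_annual_growth_rate(volume_2013, volume_2020, 2020 - 2013)

print(f"Total growth: {growth_factor:.1f}x")
print(f"Implied annual growth (assuming smooth compounding): {cagr:.1%}")
```

Under that assumption, the volume grows roughly fifteenfold over seven years, which works out to a little over 47 percent a year.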
There also is a greater variety of data to manage. Electronic healthcare records, personal fitness devices, connected home monitoring systems, and a variety of other sensors, machines, and systems are pushing the boundaries of medicine in new directions. As a result, researchers, physicians, and other practitioners—using big data analytics and machine learning—can spot patterns, trends, and causalities that would otherwise escape human detection. This makes it possible to improve therapies, procedures, and drugs, while improving diagnostics and care for individual patients.
Yet the risks are also enormous—and they are magnified by the fact that there are no clear boundaries for what constitutes appropriate or inappropriate use. In some cases, it's possible to trace aliases, codes, and metadata used for anonymous tracking back to individuals. At the same time, a substantial amount of health data—particularly information from activity trackers, website searches, and credit card records—remains unregulated in the U.S. and many other countries. Because all this data falls outside the scope of the U.S. Health Insurance Portability and Accountability Act (HIPAA), data scientists can circumvent privacy protections by combining publicly available data with anonymized data to gain deep insights into personal behavior and health.
Marketers and companies looking to target consumers can tap into this data. "The result is a blizzard of transactions hidden to the public in which companies (called data miners) buy, sell, and barter anonymized but intimate profiles of hundreds of millions of Americans," says Adam Tanner, who authored a 2017 report for The Century Foundation, Strengthening Protection of Patient Medical Data. "While the anonymization of patient data may seem like a good firewall for protecting privacy, it increasingly is not." In fact, Tanner says, "Data scientists can now circumvent HIPAA's privacy protections by making very sophisticated guesses, marrying anonymized patient dossiers with named consumer profiles available elsewhere—with a surprising degree of accuracy."
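The kind of re-identification Tanner describes can be illustrated with a toy linkage attack. The sketch below is a simplification under stated assumptions: every record is fabricated, the choice of ZIP code, birth year, and sex as linking fields is an assumption made for illustration, and real attacks rely on far larger datasets and probabilistic matching rather than exact joins.

```python
from collections import Counter

# Toy linkage attack. Every record below is fabricated for illustration.
# Assumed quasi-identifiers: fields that are not names but can narrow down a person.
QUASI_IDENTIFIERS = ("zip_code", "birth_year", "sex")

# "Anonymized" health records: names removed, quasi-identifiers kept.
anonymized_records = [
    {"zip_code": "19107", "birth_year": 1961, "sex": "F", "diagnosis": "type 2 diabetes"},
    {"zip_code": "19107", "birth_year": 1984, "sex": "M", "diagnosis": "asthma"},
    {"zip_code": "20057", "birth_year": 1984, "sex": "M", "diagnosis": "hypertension"},
    {"zip_code": "20057", "birth_year": 1984, "sex": "M", "diagnosis": "migraine"},
]

# Named consumer profiles of the kind available outside the health system.
consumer_profiles = [
    {"name": "Alice Example", "zip_code": "19107", "birth_year": 1961, "sex": "F"},
    {"name": "Bob Example", "zip_code": "19107", "birth_year": 1984, "sex": "M"},
    {"name": "Carl Example", "zip_code": "20057", "birth_year": 1984, "sex": "M"},
]


def linking_key(record):
    """Build the tuple of quasi-identifier values used to join the two datasets."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(anonymized, profiles):
    """Return {name: diagnosis} where the key is unique in both datasets."""
    records_by_key = {}
    for record in anonymized:
        records_by_key.setdefault(linking_key(record), []).append(record)
    profile_key_counts = Counter(linking_key(profile) for profile in profiles)

    matches = {}
    for profile in profiles:
        key = linking_key(profile)
        candidates = records_by_key.get(key, [])
        if len(candidates) == 1 and profile_key_counts[key] == 1:
            matches[profile["name"]] = candidates[0]["diagnosis"]
    return matches


reidentified = link(anonymized_records, consumer_profiles)
print(reidentified)
```

In this toy data, two of the three named profiles match exactly one "anonymous" record and are re-identified. The third shares its quasi-identifiers with two records, so it stays ambiguous, which is the protection that generalizing or suppressing such fields is meant to provide.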
"Healthcare data represents enormous value to both legitimate businesses and the hacker community," says Andreas Holzinger, professor of machine learning at the Medical University of Graz in Austria and founder and lead of the university's Human-Computer Interaction and Knowledge Discovery and Data Mining (HCI-KDD) group. For example, employers could potentially use private medical and healthcare data to guide decisions about hiring and firing. Insurance companies could use personal data to make coverage and pricing decisions, and individuals could find the public release of personal health data embarrassing or costly in other ways. "If we are unable to protect people but at the same time enable the use of data in an appropriate manner, we risk the public losing confidence in the system and medical researchers losing opportunities to solve problems," Holzinger says.
The need for more sophisticated research methods and controls is redefining healthcare. Researchers are exploring new and more sophisticated ways to collect, manage, and exchange data. In addition, regulatory requirements in many countries are adding to the urgency. For instance, the General Data Protection Regulation (GDPR) in the European Union requires any organization handling data for even a single European citizen to abide by strict privacy guidelines, or risk a substantial fine. In Japan, a law introduced last year requires stricter controls over how healthcare providers manage data. While all electronic healthcare records must be searchable for academic researchers, drug companies, and others, facilities must make the data completely anonymous.
Holzinger is developing a human-in-the-loop machine learning approach that offers a high level of data traceability and the ability to explain how the system arrives at a conclusion. "We must understand how and why an algorithm makes decisions and that data is verifiable. Consequently, we have to move beyond a black box approach to have full confidence that data is accurate and systems work as advertised," he explains.
The central problem Holzinger is attempting to address is that medical data is intrinsically complex, high-dimensional, and noisy, and contains much unstructured information. A prime example is the use of Gaussian processes, where automated machine learning (aML) systems with standard kernel machines attempt to find answers through stochastic modeling. These systems struggle with basic extrapolation functions that remain very simple for humans.
An interactive machine learning (iML) model, on the other hand, allows a researcher, doctor, or other expert in the loop to select specific parameters or reduce an exponential search space through heuristic selection of samples. Such a model can also help guide causality, though it can also reflect biases and introduce or amplify human errors.
By combining cross-functional expertise, it also is possible to explore data models in entirely different ways—all while maintaining tight security and privacy controls through a tool such as blockchain. The goal is a glass-box approach to data processing. "It introduces the concept of explainable medicine. A medical doctor can retrace what a certain algorithm has done, and this may provide insight that ultimately delivers a medical explanation," Holzinger says.
At the CDC, Nasr and an accelerator development team are building a software framework that incorporates blockchain to integrate disparate systems used to address public health issues such as opioid abuse and infectious disease. Blockchain would guarantee the anonymous data is accurate, and that it comes from a legitimate source. This is critical because different groups and agencies, across a spectrum of public and private entities, must share data feeds and databases while ensuring errors, intentional or inadvertent, are not introduced into the data stream.
"We need to have greater flexibility than current data service architectures provide," Nasr explains.
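The property that makes a blockchain attractive for shared public health feeds, namely that an altered past record becomes detectable, can be shown with a minimal hash chain. This is a sketch under stated assumptions and not the CDC framework: the record fields and source names are hypothetical, and the example leaves out what a real deployment would need to establish that data comes from a legitimate source, such as digital signatures and agreement among multiple ledger holders.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(payload: dict, previous_hash: str) -> str:
    """Hash an entry's payload together with the hash of the entry before it."""
    body = json.dumps({"payload": payload, "previous_hash": previous_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(ledger: list, payload: dict) -> None:
    """Add a new entry that is cryptographically linked to the current last entry."""
    previous_hash = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({
        "payload": payload,
        "previous_hash": previous_hash,
        "hash": entry_hash(payload, previous_hash),
    })


def verify(ledger: list) -> bool:
    """Recompute every hash; any edit to an earlier entry breaks the chain."""
    previous_hash = GENESIS
    for entry in ledger:
        if entry["previous_hash"] != previous_hash:
            return False
        if entry["hash"] != entry_hash(entry["payload"], previous_hash):
            return False
        previous_hash = entry["hash"]
    return True


# Hypothetical, fabricated feed entries from two reporting sources.
ledger = []
append(ledger, {"source": "state-lab-A", "metric": "overdose_visits", "count": 12})
append(ledger, {"source": "county-clinic-B", "metric": "overdose_visits", "count": 7})
intact = verify(ledger)

ledger[0]["payload"]["count"] = 2  # simulate an error introduced after the fact
tampered_detected = not verify(ledger)

print(intact, tampered_detected)
```

Because each entry's hash covers the previous entry's hash, changing any earlier count invalidates the chain from that point on, whether the change was intentional or inadvertent.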
At the center of this emerging model is a simple but profound issue, he says. "We must be able to effectively communicate software decisions and direction to a large customer base of physicians, epidemiologists, and public health experts. We have to ensure large numbers of disparate groups working on unconnected, separately funded, contract-based projects can access, share, and process data efficiently."
Nasr has his sights set on designing an interoperable software framework that can tie together databases, IoT devices, and more. The approach uses open source software, deployed through Docker containers, to create a mesh of functions and applications—Nasr calls this the "Software Theme Park"—that can be connected and rearranged to address broad data analytics requirements. These functions could carry out much of what is needed for public health data surveillance, including automatically validating anonymous data from different sources and across different application programming interfaces (APIs). The key, Nasr says, is standard-definition APIs regulated through an API gateway. This makes it possible to verify data across local, state, federal, and international agencies, as well as private organizations and other third-party data sources. The end result is a "robust information supply chain," he says.
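One way to picture the validation step an API gateway might perform is a check of each incoming record against a shared, standard definition. The sketch below is illustrative only: the schema, field names, and sample records are assumptions and do not describe the CDC's actual APIs.

```python
from datetime import date

# Hypothetical standard record definition shared by all data sources.
SCHEMA = {
    "source_id": str,
    "reported_on": str,     # ISO 8601 date, for example "2018-05-01"
    "condition_code": str,
    "case_count": int,
}


def validate(record: dict) -> list:
    """Return a list of problems; an empty list means the record conforms."""
    errors = []
    for field, expected_type in SCHEMA.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif type(record[field]) is not expected_type:
            errors.append(f"wrong type for {field}")
    # Reject anything outside the definition, so stray personal details are not passed on.
    for field in record:
        if field not in SCHEMA:
            errors.append(f"unexpected field: {field}")
    if type(record.get("reported_on")) is str:
        try:
            date.fromisoformat(record["reported_on"])
        except ValueError:
            errors.append("reported_on is not an ISO 8601 date")
    if type(record.get("case_count")) is int and record["case_count"] < 0:
        errors.append("case_count is negative")
    return errors


# Fabricated sample records.
good_record = {
    "source_id": "state-lab-A",
    "reported_on": "2018-05-01",
    "condition_code": "EXAMPLE-CODE",
    "case_count": 12,
}
bad_record = {
    "source_id": "county-clinic-B",
    "reported_on": "05/01/2018",
    "condition_code": "EXAMPLE-CODE",
    "case_count": -3,
    "patient_name": "Bob Example",
}

good_errors = validate(good_record)
bad_errors = validate(bad_record)
print(good_errors)
print(bad_errors)
```

Rejecting unexpected fields, and not only malformed ones, is a deliberate choice here: it keeps a source from accidentally sending identifying details through a feed that is supposed to carry anonymous data.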
Finding ways to improve healthcare at the patient level is also at the center of this revolution in informatics. Georgetown's Frieder, who also serves as chief scientific officer for Umbra Health, has developed a framework for connecting and merging disparate medical data automatically while protecting the identity of patients. The system allows authorized healthcare practitioners—doctors, dentists, therapists, nutritionists, and others—to access EHRs and view only the specific information they require to do their jobs.
Frieder's motivation? Medical errors account for about 250,000 deaths a year in the U.S. alone, according to a 2016 study conducted by Johns Hopkins Medicine. In addition, errors account for many other injuries and therapy failures globally. "Part of the problem is inconsistent systems and processes," Frieder says. "There are almost no standards for data. As a result, doctors rarely have access to a complete and accurate medical record."
The approach, which the company calls Lifeography, ties a person's data together in a secure cloud-based HIPAA-compliant environment. It harnesses blockchain to create a traceable ledger that spans healthcare touchpoints. Frieder says the system serves as a lingua franca for healthcare data, and delivers a longitudinal view from birth to death. The software framework delivers only the data the patient and attendant providers deem necessary for a specific interaction or transaction and strips out any unnecessary personal information, while also providing a forensic trail of users and devices. The end goal is to deliver "more precise orders, more accurate diagnoses, and more coordinated healthcare, including predictive and preventative capabilities," he says.
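The idea of releasing only the fields needed for a given interaction, while recording who looked and from which device, can be sketched as a simple role-based filter. This is not Lifeography's implementation: the roles, field names, identifiers, and the fixed role-to-field mapping (which stands in for whatever the patient and providers agree on) are all hypothetical, and the record is fabricated.

```python
from datetime import datetime, timezone

# Hypothetical mapping of practitioner roles to the fields each may see.
VISIBLE_FIELDS = {
    "dentist": {"allergies", "medications", "dental_history"},
    "nutritionist": {"allergies", "lab_results"},
}

# Simple in-memory trail of who accessed what, and from which device.
access_log = []


def view_for(record: dict, role: str, user_id: str, device_id: str) -> dict:
    """Return only the fields the role is allowed to see, and log the access."""
    allowed = VISIBLE_FIELDS.get(role, set())  # unknown roles see nothing
    visible = {field: value for field, value in record.items() if field in allowed}
    access_log.append({
        "when": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "device_id": device_id,
        "role": role,
        "fields_released": sorted(visible),
    })
    return visible


# Fabricated patient record.
record = {
    "name": "Alice Example",
    "home_address": "1 Sample Street",
    "allergies": ["penicillin"],
    "medications": ["metformin"],
    "lab_results": {"hba1c_percent": 6.9},
    "dental_history": ["crown, 2015"],
}

nutritionist_view = view_for(record, "nutritionist", "user-17", "tablet-03")
unknown_view = view_for(record, "receptionist", "user-42", "desktop-01")
print(nutritionist_view)
print(access_log[-1]["fields_released"])
```

Defaulting unknown roles to an empty view means a missing configuration entry results in too little data being shown instead of too much, and every request, including a refused one, still leaves an entry in the trail.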
Of course, the ultimate challenge is to ensure these next-generation systems and processes make data widely accessible, while locking it down. The goal is to mine data to the extent possible, but protect personal privacy.
For now, many questions remain about who should hold and maintain blockchain ledgers, who should be granted privileges to modify or view data, and how identities should be managed and displayed on a blockchain. No clear consensus or direction has emerged.
Nevertheless, the future of healthcare is clear: "We need greater portability of data, greater interoperability between systems, and a more coordinated approach to patient care," Harris says. "As we incorporate fitness devices, medical monitoring devices, and more advanced analytics, systems must address the often-competing interests of putting data to maximum use but also protecting it."
Beyond Data Mining: Integrative Machine Learning for Health Informatics. April, 2016; https://link.springer.com/article/10.1007/s40708-016-0042-6.
Machine Learning for Health Informatics. ML for Health Informatics, LNAI 9605, pp. 1–24, 2016; https://link.springer.com/book/10.1007/978-3-319-50478-0.
Aitken, M., de St. Jorre, J., Pagliari, C., Jepson, R., and Cunningham-Burley, S. Public responses to the sharing and linkage of health data for research purposes: a systematic review and thematic synthesis of qualitative studies. BMC Medical Ethics, Vol. 17, No. 1. Jan. 12, 2016.
Confidentiality, Privacy and Security of Health Information: Balancing Interests. Dec. 8, 2014. https://healthinformatics.uic.edu/resources/articles/confidentiality-privacy-and-security-of-health-information-balancing-interests/
Johns Hopkins Medicine, News Release, Study Suggests Medical Errors Now Third Leading Cause of Death in the U.S., May 3, 2016; http://bit.ly/2BeXY5T
©2018 ACM 0001-0782/18/5