What Is the National COVID Cohort Collaborative (N3C) Data Enclave?
What Is N3C, and What’s the Technology Behind It?
Dr. Kenneth Gersing, director of informatics at the National Center for Advancing Translational Sciences (NCATS), says the organization brings groups together and turns data silos into a network through shared services. To make a program such as N3C possible, NCATS has to find economies of scale.
NCATS also leads the Rare Diseases Clinical Research Network, a precursor of N3C, and was able to reuse some of the same technology and processes for data cleanup, analysis and output to support the N3C initiative. In 2017, NCATS started piloting Palantir instances in the cloud on Amazon Web Services to create a secure analytic environment, and in 2019 it deployed Google Workspace so the research community could easily share findings.
NCATS also harmonizes the COVID-19 data collected from healthcare institutions, which use a variety of common data models. It provides the tools researchers need to access and analyze the data, secures the enclave and protects patient privacy.
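At its simplest, harmonization means mapping each source's field names and codes onto one shared record layout. The sketch below is illustrative only: it assumes two hypothetical source models, `source_a` and `source_b`, with invented field names and sex codes. Real N3C harmonization targets the OMOP common data model and is far more involved.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Patient:
    """Hypothetical harmonized layout; field names are illustrative only."""
    person_id: str
    birth_year: int
    sex: str  # normalized to "F", "M" or "U" (unknown)

# Hypothetical sex-code tables for two invented source models.
SEX_CODES = {
    "source_a": {"female": "F", "male": "M"},
    "source_b": {"2": "F", "1": "M"},
}

def harmonize(source: str, row: dict) -> Patient:
    """Map one source row onto the shared Patient layout."""
    if source == "source_a":
        # source_a stores a full birth date and spells out sex.
        pid, birth_year, sex_raw = row["patid"], row["birth_date"].year, row["sex"]
    elif source == "source_b":
        # source_b stores year of birth as text and codes sex numerically.
        pid, birth_year, sex_raw = row["person_num"], int(row["yob"]), row["gender_cd"]
    else:
        raise ValueError(f"unknown source model: {source}")
    # Unrecognized codes fall back to "U" instead of being guessed.
    sex = SEX_CODES[source].get(str(sex_raw).strip().lower(), "U")
    # Prefix IDs with the source so identifiers from different sites cannot collide.
    return Patient(f"{source}:{pid}", birth_year, sex)

a = harmonize("source_a", {"patid": "17", "birth_date": date(1950, 3, 2), "sex": "Female"})
b = harmonize("source_b", {"person_num": 42, "yob": "1961", "gender_cd": 1})
print(a)
print(b)
```

After mapping, both rows share one layout and one coding scheme, so they can be queried together.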
Gersing explains that NCATS handles identity authentication, cloud deployment, security, Software as a Service support, single sign-on, ticketing and compliance concerns so that researchers can focus on science.
The goal of N3C is to share information with the community. If a researcher brings an algorithm to run against the enclave, that algorithm becomes one of the assets available to the community, but it must first pass a security evaluation before it is approved for use.
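The review-then-share workflow can be sketched as a small state machine. This is a hypothetical model, not N3C's actual tooling: the `AlgorithmRegistry` class, its method names and the three status values are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum

class Status(Enum):
    PENDING_REVIEW = "pending_review"
    APPROVED = "approved"
    REJECTED = "rejected"

@dataclass
class AlgorithmRegistry:
    """Hypothetical registry: submissions are shared only after a security review."""
    _status: dict[str, Status] = field(default_factory=dict)

    def submit(self, name: str) -> None:
        # New submissions always start out waiting for review.
        self._status[name] = Status.PENDING_REVIEW

    def record_review(self, name: str, passed: bool) -> None:
        if self._status.get(name) is not Status.PENDING_REVIEW:
            raise ValueError(f"{name!r} is not awaiting review")
        self._status[name] = Status.APPROVED if passed else Status.REJECTED

    def shared_assets(self) -> list[str]:
        # Only reviewed-and-approved algorithms become community assets.
        return sorted(n for n, s in self._status.items() if s is Status.APPROVED)

registry = AlgorithmRegistry()
registry.submit("risk_model")
registry.submit("cluster_v2")
registry.record_review("risk_model", passed=True)
print(registry.shared_assets())
```

Until `record_review` approves it, `cluster_v2` never appears among the shared assets.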
How Is N3C Data Accessed by Researchers?
N3C’s enclave includes data from at least 5.9 million COVID-19-positive patients, plus data from two control patients for every positive patient. New data is collected, harmonized to the Observational Medical Outcomes Partnership (OMOP) common data model and released weekly.
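A two-controls-per-case design can be illustrated with simple greedy exact matching. The article does not say how N3C selects its controls, so the matching keys here (a hypothetical age band and sex) and the greedy strategy are assumptions for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    person_id: str
    age_band: str  # hypothetical matching key, e.g. "60-69"
    sex: str       # hypothetical matching key

def match_controls(cases: list[Person], candidates: list[Person],
                   ratio: int = 2) -> dict[str, list[str]]:
    """Greedy exact matching: assign up to `ratio` unused controls to each case."""
    pool: dict[tuple[str, str], list[Person]] = defaultdict(list)
    for c in candidates:
        pool[(c.age_band, c.sex)].append(c)
    matches = {}
    for case in cases:
        bucket = pool[(case.age_band, case.sex)]
        matches[case.person_id] = [c.person_id for c in bucket[:ratio]]
        del bucket[:ratio]  # each control is used at most once
    return matches

cases = [Person("p1", "60-69", "F"), Person("p2", "60-69", "F")]
candidates = [Person("c1", "60-69", "F"), Person("c2", "60-69", "M"),
              Person("c3", "60-69", "F"), Person("c4", "60-69", "F")]
print(match_controls(cases, candidates))
```

Because the matching is greedy, `p2` receives only one control once the matching pool runs out. A production design would need a policy for such shortfalls.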
“We have almost 4,000 volunteers. There is no way this would be possible without a community coming together and helping,” says Gersing.
Some of the patient data collected dates back to January 1, 2018, offering researchers a fuller picture of patient journeys.
N3C uses a centralized rather than a federated model. In a federated model, the data stays at each contributing site. Researchers can submit a question such as, “How many of the female patients over 60 have hypertension?” and receive a count, but they don’t get access to row-level data. A centralized model, which pools the data in one enclave, removes that limitation.
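The difference between the two models can be shown with the example query above. In this sketch, which uses a hypothetical `Record` type, the federated function returns only an aggregate count, while the centralized function returns the matching rows themselves for further analysis.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    sex: str
    age: int
    hypertension: bool

def _matches(r: Record) -> bool:
    # "Female patients over 60 with hypertension"
    return r.sex == "F" and r.age > 60 and r.hypertension

def federated_count(sites: list[list[Record]]) -> int:
    # Each site runs the query locally and returns only an aggregate count;
    # the coordinator never sees individual rows.
    return sum(sum(1 for r in site if _matches(r)) for site in sites)

def centralized_rows(sites: list[list[Record]]) -> list[Record]:
    # In a centralized enclave the harmonized rows are pooled in one place,
    # so researchers can inspect and iterate over the matching rows themselves.
    pooled = [r for site in sites for r in site]
    return [r for r in pooled if _matches(r)]

sites = [
    [Record("F", 65, True), Record("M", 70, True)],
    [Record("F", 72, True), Record("F", 55, True)],
]
print(federated_count(sites))
print(centralized_rows(sites))
```

Only the centralized version exposes row-level values, such as individual ages, that a model-training or exploratory workflow could iterate over.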
“We wanted researchers to be able to reach the data directly and iterate over it,” says Gersing. “Particularly, we wanted to be able to use technologies like machine learning, which is difficult to do in a federated model.”
N3C also ensures that definitions are consistent, which is essential when harmonizing data across models. For instance, Gersing explains, contributing organizations must agree on what counts as a visit.
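An agreed visit definition can be expressed as per-site mapping tables that translate local codes into shared categories. The site names, local codes and three-category scheme below are invented for illustration. N3C harmonizes to the OMOP common data model, whose actual visit vocabulary is not reproduced here.

```python
# Hypothetical per-site tables mapping local visit codes to shared, agreed categories.
SITE_VISIT_MAPS = {
    "site_a": {"IP": "inpatient", "AV": "outpatient", "ED": "emergency"},
    # site_b's telehealth code is counted as outpatient: a definitional choice
    # that every contributor would have to agree on.
    "site_b": {"ADMIT": "inpatient", "CLINIC": "outpatient",
               "TELEHEALTH": "outpatient", "ER": "emergency"},
}

def normalize_visit(site: str, local_code: str) -> str:
    """Translate a site's local visit code into the shared visit definition."""
    try:
        return SITE_VISIT_MAPS[site][local_code.strip().upper()]
    except KeyError:
        # Unmapped codes are rejected rather than guessed, so gaps surface early.
        raise ValueError(f"no agreed mapping for {site}:{local_code}") from None

print(normalize_visit("site_a", "ip"))
print(normalize_visit("site_b", "TELEHEALTH"))
```

Rejecting unknown codes, rather than defaulting them to a category, forces each new local code to be discussed and added to the shared definition explicitly.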