How to Make Sense of This Fall’s Messy COVID Data
It is a truth universally acknowledged among health experts that official COVID-19 data are a mess right now. Since the Omicron surge last winter, case counts from public-health agencies have become less reliable. PCR tests have become harder to access and at-home tests are typically not counted.
Official case numbers now represent “the tip of the iceberg” of actual infections, Denis Nash, an epidemiologist at the City University of New York, told me. Although case rates may seem low now, true infections may be up to 20 times higher. And even those case numbers are no longer available on a daily basis in many places, as the CDC and most state agencies have switched to updating their data once a week instead of every day.
How, then, is anyone supposed to actually keep track of the COVID-19 risk in their area—especially when cases are expected to increase this fall and winter? Using newer data sources, such as wastewater surveillance and population surveys, experts have already noticed potential signals of a fall surge: Official case counts are trending down across the U.S., but Northeast cities such as Boston are seeing more coronavirus in their wastewater, and the CDC reports that this region is a hot spot for further-mutated versions of the Omicron variant. Even if you’re not an expert, you can still get a clearer picture of how COVID-19 is hitting your community in the weeks ahead. You’ll simply need to understand how to interpret these alternate data sources.
The problem with case data goes right to the source. Investment in COVID-19 tracking at the state and local levels has been in free fall, says Sam Scarpino, a surveillance expert at the Rockefeller Foundation’s Pandemic Prevention Initiative. “More recently, we’ve started to see lots of states sunsetting their reporting,” Scarpino told me. Since the Pandemic Prevention Initiative and the Pandemic Tracking Collective started publishing a state-by-state scorecard of breakthrough-case reporting in December 2021, the number of states with a failing grade has doubled. Scarpino considers this trend a “harbinger of what’s coming” as departments continue to shift resources away from COVID-19 reporting.
Hospitalization data don’t suffer from the same reporting problems, because the federal government collects information directly from thousands of facilities across the country. But “hospitalizations often lag behind cases by a matter of weeks,” says Caroline Hugh, an epidemiologist and volunteer with the People’s CDC, an organization providing COVID-19 data and guidance while advocating for improved safety measures. Hospitalizations also don’t necessarily reflect transmission rates, which still matter if you want to stay safe. Some studies suggest, for example, that long COVID might now be more likely than hospitalization after an infection.
For a better sense of how much the coronavirus is circulating, many experts are turning to wastewater surveillance. Samples from our sewage can provide an advance warning of increased COVID-19 spread because everyone in a public-sewer system contributes data; the biases that hinder PCR test results don’t apply. As a result, Hugh and her colleagues at the People’s CDC consider wastewater trends to be more “consistent” than constantly fluctuating case numbers.
When Omicron first began to wreak havoc in December 2021, “the wastewater data started to rise very steeply, almost two weeks before we saw the same rise” in case counts, Newsha Ghaeli, the president and a co-founder of the wastewater-surveillance company Biobot Analytics, told me. Biobot is now working with hundreds of sewage-sampling sites in all 50 states, Ghaeli said. The company’s national and regional dashboard incorporates data from every location in its network, but for more local data, you might need to go to a separate dashboard run by the CDC or by your state health department. Some states have wastewater surveillance in every county, while others have just a handful of sites. If your location is not represented, Ghaeli said, “the wastewater data from communities nearby is still very applicable.” And even if your county does have tracking, checking up on neighboring communities might be good practice. “A surge in a state next door … could very quickly turn into a surge locally,” Ghaeli explained.
Ghaeli recommends watching how coronavirus levels in wastewater shift over time, rather than homing in on individual data points. Look at both “directionality” and “magnitude”: Are viral levels increasing or decreasing, and how do these levels compare with earlier points in the pandemic? A 10 percent uptick when levels are low is less concerning than a 10 percent uptick when the virus is already spreading widely.
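For readers who download wastewater numbers from a dashboard, the "directionality" and "magnitude" check described above can be sketched in a few lines of Python. This is only an illustration, not Biobot's or the CDC's method: the function name `describe_trend`, the three-week averaging window, the 5 percent cutoff for calling a trend flat, and the sample readings are all assumptions made for the example.

```python
from statistics import mean


def describe_trend(levels, window=3, flat_band=5.0):
    """Summarize direction and magnitude for weekly wastewater readings.

    levels: viral concentrations, oldest first (hypothetical units).
    window: weeks averaged on each side of the comparison (an assumption).
    flat_band: percent change treated as "roughly flat" (an assumption).
    """
    if len(levels) < 2 * window:
        raise ValueError("need at least 2 * window readings")

    recent = mean(levels[-window:])
    previous = mean(levels[-2 * window:-window])
    if previous == 0:
        raise ValueError("previous window averages to zero; percent change is undefined")

    # Directionality: are levels rising or falling versus the prior window?
    pct_change = (recent - previous) / previous * 100
    if pct_change > flat_band:
        direction = "increasing"
    elif pct_change < -flat_band:
        direction = "decreasing"
    else:
        direction = "roughly flat"

    # Magnitude: how does the recent level compare with the highest reading on record?
    share_of_peak = recent / max(levels)

    return {
        "direction": direction,
        "pct_change": pct_change,
        "share_of_peak": share_of_peak,
    }


# Made-up weekly readings: an earlier surge, then a low period with a recent uptick.
sample = [100, 400, 900, 600, 200, 150, 100, 100, 100, 110, 120, 130]
summary = describe_trend(sample)
print(summary["direction"], round(summary["pct_change"], 1), round(summary["share_of_peak"], 2))
```

Averaging a few weeks at a time reflects Ghaeli's advice not to home in on individual data points, and reporting the share of the past peak alongside the percent change keeps a rise from a low baseline in perspective.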
Researchers are still working to understand how wastewater data correlate with actual infections, because every community has unique waste patterns. For example, big cities differ from rural areas, and in some places, environmental factors such as rainfall or nearby agriculture may interfere with coronavirus tracking. Still, long-term-trend data are generally thought to be a good tool that can help sound the alarm on new surges.
Wastewater data can help you figure out how much COVID-19 is spreading in a community and can even track all the variants circulating locally, but they can’t tell you who’s getting sick. To answer the latter question, epidemiologists turn to what Nash calls “active surveillance”: Rather than relying on the COVID-19 test results that happen to get reported to a public-health agency, researchers actively seek people out and ask whether they recently got sick or tested positive.
Nash and his team at CUNY have conducted population surveys in New York City and at the national level. The team’s most recent survey (which hasn’t yet been peer-reviewed), conducted from late June to early July, included questions about at-home test results and COVID-like symptoms. From a nationally representative survey of about 3,000 people, Nash and his team found that more than 17 percent of U.S. adults had COVID-19 during the two-week period—about 24 times higher than the CDC’s case counts at that time.
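The arithmetic behind that comparison is simple enough to check by hand, and a rough Python sketch makes the pieces explicit. It uses only the figures quoted above (17 percent, a factor of 24, about 3,000 respondents). The margin-of-error step assumes a simple random sample, which is a simplification: a weighted, nationally representative survey like this one would have somewhat different uncertainty. The function names are invented for illustration.

```python
import math


def implied_reported_share(survey_prevalence, undercount_ratio):
    """Share of adults that official counts would imply, given the survey
    estimate and how many times higher it is than the official figure."""
    return survey_prevalence / undercount_ratio


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion.

    Assumes a simple random sample (an assumption; the survey described
    in the article was designed to be nationally representative, so its
    true uncertainty would be calculated differently).
    """
    return z * math.sqrt(p * (1 - p) / n)


prevalence = 0.17      # "more than 17 percent" of U.S. adults, per the survey
ratio = 24             # "about 24 times higher" than CDC case counts
respondents = 3000     # "about 3,000 people"

reported = implied_reported_share(prevalence, ratio)
moe = margin_of_error(prevalence, respondents)

print(f"Implied share captured by official counts: {reported:.2%}")
print(f"Rough 95% margin of error on the survey estimate: +/- {moe:.1%}")
```

Put another way, if roughly one in six adults reported an infection while official counts captured well under 1 percent, most infections in that window never appeared on a dashboard.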
Studies like these “capture people who might not be counted by the health system,” Nash told me. His team found that Black and Hispanic Americans and those with low incomes were more likely to get sick during the survey period, compared with the national estimate. The CDC and Census Bureau take a similar approach through the ongoing Household Pulse Survey.
These surveys are “a gold mine of data,” though they need to be “carefully designed,” Maria Pyra, an epidemiologist and volunteer with the People’s CDC, told me. By showing the gap between true infections and officially reported cases, surveys like Nash’s can allow researchers to approximate how much COVID-19 is really spreading.
Survey results may be delayed by weeks or months, however, and are typically published in preprints or news reports rather than on a health agency’s dashboard. They might also be biased by who chooses to respond or how questions are worded. Scarpino suggested a more timely option: data collected from cellphone locations or social media. The Delphi Group at Carnegie Mellon University, for example, provides data on how many people are Googling coldlike symptoms or seeking COVID-related doctor visits. While such trends aren’t a perfect proxy for case rates, they can be a helpful warning that transmission patterns are changing.
Readers seeking to monitor COVID-19 this fall should “look as local as you can,” Scarpino recommended. That means examining county- or zip-code-level data, depending on what’s available for you. Nash suggested checking multiple data sources and attempting to “triangulate” between them. For example, if case data suggest that transmission is down, do wastewater data say the same thing? And how do the data match with local behavior? If a popular community event or holiday happened recently, low case numbers might need to be taken with a grain of salt.
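Nash's "triangulate" advice can also be written down as a small check: collect the recent percent change from each source you trust, and see whether they point the same way. The sketch below is one hypothetical way to do that. The source names, the sample numbers, and the function `triangulate` are all made up for illustration, and it deliberately ignores the lags between sources (hospitalizations, for instance, often trail cases by weeks).

```python
def triangulate(signals):
    """Compare the direction of several local COVID-19 signals.

    signals: dict mapping a source name to its percent change over the
    same recent period (hypothetical inputs gathered by the reader).
    """
    directions = {}
    for name, change in signals.items():
        if change > 0:
            directions[name] = "up"
        elif change < 0:
            directions[name] = "down"
        else:
            directions[name] = "flat"

    # The sources agree only if every one of them points the same way.
    agree = len(set(directions.values())) == 1
    if agree:
        note = "Sources agree; the trend is more believable."
    else:
        note = "Sources disagree; treat any single number with caution."

    return {"directions": directions, "agree": agree, "note": note}


# Made-up example: reported cases are falling while wastewater levels rise.
result = triangulate({
    "reported cases": -12.0,
    "wastewater": 18.0,
    "hospital admissions": -3.0,
})
for source, direction in result["directions"].items():
    print(f"{source}: {direction}")
print(result["note"])
```

A disagreement like the one in this example does not say which source is right, but it is exactly the situation in which low case numbers deserve a grain of salt.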
“We’re heading into a period where it’s going to be increasingly harder to know what’s going on with the virus,” Nash told me. Case numbers will continue to be undercounted, and dashboards may be updated less frequently. Pundits on Twitter are turning to Yankee Candle reviews for signs of surges. Helpful sources still exist, but piecing together the disparate data can be exhausting—after all, data reporting and interpretation should be a job for our public-health agencies, not for concerned individuals.
Rather than accept this fragmented data status quo, experts would like to see improved public-health systems for COVID-19 and other diseases, such as monkeypox and polio. “If we get better at collecting and making available local, relevant infectious-disease data for decision making, we’re going to lead healthier, happier lives,” Scarpino said.