Documenting Disparity: The Challenges of Collecting Racial Data on Coronavirus

By Cynthia Greenlee Apr 29, 2020

As of April 15, each of the nine people known to have died from coronavirus in Richmond, Virginia, was Black. 

It’s a dire statistic—and part of a growing mountain of data proving just how prevalent and deadly COVID-19 is in communities of color, which already had less healthcare access pre-pandemic. 

Earlier this month, reports from Louisiana showed that Black people made up 70 percent of deaths, compared to their 30 percent share of the population. Also in early April, Latinx people constituted 33 percent of coronavirus fatalities in New York City. Black Chicagoans are dying at a rate six times higher than their White neighbors. In New Mexico, Native Americans are less than 10 percent of the population, but a third of confirmed coronavirus cases. And Asian-Americans are slightly more likely to die from the virus than their White counterparts.
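
One rough way to read a pair of figures like Louisiana's is to divide a group's share of deaths by its share of the population. The sketch below is illustrative only: the function name is hypothetical, it uses just the two percentages reported above, and it ignores age, geography, and the missing-data problems discussed later in this article.

```python
def representation_ratio(share_of_deaths: float, share_of_population: float) -> float:
    """Return a group's share of deaths divided by its share of the population.

    A value above 1 means the group is overrepresented among deaths.
    Both arguments must be in the same unit (here, percent).
    """
    if share_of_population <= 0:
        raise ValueError("share_of_population must be positive")
    return share_of_deaths / share_of_population


# Louisiana figures as reported in the text: 70 percent of deaths,
# 30 percent of the population.
louisiana_ratio = representation_ratio(70, 30)
print(f"{louisiana_ratio:.2f}")
```

By this crude measure, Black Louisianans were dying at a bit more than twice the rate their population share alone would predict.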

But that data mountain about disparities isn’t growing fast or evenly enough. Last week, Congress passed a $480 billion coronavirus relief package that included a provision to collect and publish information about how coronavirus has affected all Americans. It built on the Equitable Data Collection and Disclosure on COVID-19 Act, previously introduced by Rep. Robin Kelly of Illinois and scores of other congressional Democrats. The relief package will require the Department of Health and Human Services and associated agencies such as its Centers for Disease Control and Prevention to collect and report data broken down by race, ethnicity, age, sex, geographic region, and other relevant factors.

Just how hard is it to gather such data? Colorlines spoke to various public-health researchers and advocates, who said generating data can be fraught with complications, particularly when that data is about race and health disparities. But many also say it’s doable—if there’s the political will and funding at various levels of government and in public health and scientific communities.


The Difficulty With Data: Process and Politics

Consider the life cycle of data—be it information a patient gives at a medical appointment or through big, national telephone-based surveys. Someone has to come up with questions, ask them, answer them, record the answers, enter them uniformly into medical records systems or databases, know how to extract responses from files, figure out what’s missing, analyze responses in relationship to other information, summarize in internal reports or journal articles, and disseminate results. And that’s a much simplified and abbreviated chain of steps.   

Before a given statistic comes to life, the data must travel through these and other stages, all of them potential places for problems. Sociodemographic information should be routinely collected in most health-care settings. But patients forget to fill out questionnaires and may not answer race questions if they feel they’re intrusive or if they don’t trust the health-care provider. People sometimes don’t share their complete health histories. Use of outdated terms about race or gender identity can offend patients. Staff who collect information can skip sections, incorrectly record answers, or make assumptions about race or ethnicity based on a person’s physical appearance rather than asking the individual how they define themselves. (Native Americans are particularly vulnerable to misidentification.) Furthermore, low literacy cuts across U.S. society, and many native English speakers and those who speak other languages struggle with navigating medical forms.

In addition to those challenges, there’s another overarching concern: Science and data collection are anything but neutral. As Daniel Dawes, director of the David Satcher Center at the Morehouse School of Medicine in Georgia, said: “Data is political.” He’s hopeful that the push for data collection is a step in the right direction, but he’s also clear-eyed about the opposition. Dawes is author of the book “The Political Determinants of Health” and previously chaired the National Working Group on Health Disparities and Health Reform, a consortium of 300 groups that advocated for the Affordable Care Act (ACA).

He continued: “When the Obama administration made ACA provisions for collection of data about racial disparities, [administration and congressional] aides were willing to cut those provisions when the political fire got hot with the Tea Party in 2009. Those provisions were saved only by the intervention of the president himself. So, even within the Democratic Party, with a friendly president, reporting on health disparities was one of the easiest things” to abandon when negotiations got rough. Dawes further noted that hospitals and health-care lobbyists have often vehemently objected to proposed reporting requirements about race and racial disparities.  

Dawes conceded that health-care providers may balk at asking their staff to ask about race and additional details in the middle of a pandemic or when a person needs emergency care. 

“I know people are probably saying, ‘Our doctors and nurses don’t have time to be asking if you’re Black or White. We’re trying to save people’s lives.’ And that may be a fair argument. But I argue that there’s no way you can provide healthcare for the most vulnerable without data.”

Northwestern University medical anthropologist and researcher Ricky Hill pointed to problems that emanate from the very institutions that generate data. 

“Disparities are always understudied. And this is a moment when you can really see that science is determined by who’s in the room. You know, science is really fucking white. I work primarily in LGBTQ health, but we do a lot of racial and ethnic data collection because we want to know what the disparities are in our communities. And we can tell when someone who’s not from a community has created surveys; when someone who is not queer is trying to ask questions about queer people (usually, the glaring giveaway is that they use the word ‘homosexual’). The way to address that is to have the people you’re studying come in and give you feedback on your questions. We need to get out of our weird scientific bubbles and talk like humans to other humans,” said Hill. 

Getting good, inclusive data is hampered by structural inequality both in health care and in health research, and the original Equitable Data Act mentioned the critical role of historically Black colleges and universities in turning out data on communities of color. Yet, it did not make specific suggestions about how they can participate in ensuring diverse information about coronavirus cases. 

The Problem With a State-by-State Patchwork

As COVID-19 cases have multiplied in almost every state and U.S. territory, the pandemic has also amplified the tensions in having a federal system lead the charge, especially when President Trump dragged his feet, allowed unqualified relatives to run the response, and released an ill-advised plan to “reopen” the nation before the novel coronavirus has run its course. Governors largely have to figure it out on their own, amid ongoing shortages of tests and necessary medical equipment.

While states are fending for themselves, opening prematurely or collaborating regionally, they are collecting data in ways that aren’t standardized. There is no overarching national requirement for reporting race in healthcare interactions. Most federally funded programs use the Office of Management and Budget categories as a minimal standard or basis for more expanded ways to define identity. Those categories are: White, Black or African American, Asian or Pacific Islander, American Indian or Alaskan Native. “Hispanic” is counted as an ethnicity, meaning that people can claim a race as well as Latinx origins. An additional list helps to flesh out more detailed information about ethnicity, say, allowing an Asian American to specify that she is of Korean or Vietnamese descent.

But states and cities may have entirely different ways of categorizing race, ethnicity, and gender. And not every state is willing to offer up data that reveals disparities within its borders, even when it has that data.

As George Washington University epidemiologist and contributor Emily Smith said, “Often the difficult part of epidemiological research [which focuses on disease patterns, including the spread of contagious illness] is not the mathematical models. Rather, it’s finding the right information, about the right people, in a harmonized way. And given that many cities and states aren’t collecting data on the race or ethnicity of people with COVID, it’s not only difficult, it’s impossible to document racial disparities.”

Linda Goler Blount, an epidemiologist and executive director of the Black Women’s Health Imperative, noted widely disparate reporting of racial data and COVID-19 according to geography and type of testing sites. Sociodemographic data (SDD)  “is unevenly collected as a part of COVID-19 testing—especially at drive-through sites. However, patients seen at drive-through sites have a provider’s order to be there, so the provider has [that information] and can link it if the person tests positive.” Among other problems, Goler Blount emphasized that place matters. For example, “Jackson, Mississippi has very little SDD while New York City has fairly complete SDD from its hospitals.”

Take numbers coming from the CDC, which say race is “missing or unspecified” for 65 percent of all confirmed U.S. cases. That’s because states, counties, and cities are falling short. Look at Georgia, where an April 21 report from the Department of Public Health listed “unknown race” for 8,945 cases and 32 deaths. According to a Virginia Department of Health report from the same date, 34 percent of cases there didn’t report race.

Without that data, Goler Blount fears “we may never know the true infection or [death] rate in general, and we certainly won’t know by race and ethnicity. The Native population is being hit especially hard, and there is almost no data collection taking place in Indian country.”

But basic data collection doesn’t have to be difficult. LaShannon Spencer is chief executive director of Community Health Centers of Arkansas, a network of a dozen community health centers and their 100 satellite programs. 

When asked if it’s onerous for her clinics to pull racial statistics about coronavirus testing at the network’s member clinics, she said, “Not at all. We can pull that data by looking at our patient encounters and what happened at their appointments.” She added that could be done in a few hours, and that a patient could easily be asked to confirm their identity when they’re asked for their birthdate or address at checkups or other visits. It could be routine. 

But Spencer acknowledged that some data is more difficult to collect and stressed that even the best data collectors, whether they are records clerks or physicians, need training to generate better data. Any legislative attempt to root out coronavirus disparities will need to take that into account. Most of all, Spencer added, trusting relationships with patients are key. She sees the incomplete data about coronavirus and race as a symptom of a larger problem.

“The lack of primary care providers and the workforce shortage that’s plaguing so many communities absolutely play a role in this. … We’re losing that patient-provider communication, and that primary-care provider can tell you so many things about their patients. So many people will be moving to telehealth, but I think about that Black male patient who’s now looking at that doctor via screen. We’re losing that eye contact, that nonverbal communication that’s letting the patient know ‘I have you. It’s going to be OK.’” That translates into lost opportunities for optimal care and optimal data collection. 

Coronavirus Data Challenges

The Equitable Data Collection and Disclosure on COVID-19 Act called for the collection and publication of data including, but not limited to, the numbers of tests administered, positive results, hospitalizations, intensive-care admissions and deaths. It also would have required information about denial of care, particularly for people with disabilities or pre-existing conditions. 

Those numbers aren’t generated by simple counting. It can be harder to measure things that didn’t happen, such as lack of testing or denial of care, unless people’s self-reports are considered and substantiated. How sex and gender are defined also can vary. 

The state-by-state patchwork approach to collecting and reporting data also has implications for national tallies of something as seemingly straightforward as the number of deaths. With coronavirus raging near the end of flu season, it’s unclear how many deaths could be attributed to it and how many people, assuming they had the flu, stayed at home when they had the coronavirus. All sorts of different professionals and actors check the boxes that are the raw material of statistics. Who’s determining the race of people who died when no one can confirm their background? In many cases, it can fall to the coroner or the funeral home to make a guess or abstain from speculating. 

“If folks are dying with respiratory symptoms or cardiac failure [during the pandemic, but without a diagnosis], are we just going to say, ‘Well, because it happened at this time, they probably had COVID-19’ and call that a coronavirus death? Some authorities and coroners may allow that. Some may not. What’s the deciding factor? Some may only call it COVID-19 if there’s a positive test, some may do it if you’re in the hospital with respiratory symptoms and you were intubated,” said Caitlin Williams, a North Carolina-based maternal health researcher and consultant who hosts #CoronaChat videos in both English and Spanish. 

Any official count will be an undercount because of lack of testing, undiagnosed cases, and home deaths that might not register on a coroner’s radar as a coronavirus death. Epidemiologists, however, can compare patterns of deaths and estimate “excess deaths” that might be attributable to coronavirus or other ailments. 
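
The basic arithmetic behind an “excess deaths” estimate can be sketched in a few lines: compare the deaths observed in a period with the number expected from the same period in earlier years. This is a simplified illustration, not how any particular agency computes its estimates. The function name is hypothetical, the counts are invented for the example, and real methods adjust for trends, seasonality, and reporting delays.

```python
from statistics import mean


def excess_deaths(observed: int, baseline_counts: list[int]) -> float:
    """Observed deaths minus the average for the same period in earlier years."""
    if not baseline_counts:
        raise ValueError("need at least one baseline year")
    expected = mean(baseline_counts)
    return observed - expected


# HYPOTHETICAL counts for one week in one place; these are not real data.
same_week_in_prior_years = [1000, 1040, 980, 1020]
observed_this_year = 1350

estimate = excess_deaths(observed_this_year, same_week_in_prior_years)
print(estimate)
```

A positive result suggests more deaths than usual. On its own it cannot say which of those deaths were caused by coronavirus, nor break them down by race where race was never recorded.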

Williams pointed to the challenges around gathering data on the U.S. maternal mortality crisis, in which women die in childbirth and from other pregnancy-related causes, as one example. “Maternal health had to grapple with this kind of ambiguity in thinking about HIV or AIDS: When is a death HIV-related versus pregnancy-related? I anticipate those kinds of decisions will need to be made here, too.”

She went on: “We have a federal system that can put out guidance and say this is the information that we would like states to report to us. But states can decide what they’re actually going to do. So when a federal form for a new standardized death certificate form [that noted recent pregnancy] was published in 2003, a decade later, not all states had adopted it.” The result: The data the CDC receives from states may vary widely, making it hard to create comparable estimates across different sub-groups.

And if states moved to even piecemeal standardization—say, deciding to collect a few new critical pieces of data—Spencer is interested in more than just numbers. When Colorlines spoke with Spencer, she did a quick brainstorm of what she wants to know about coronavirus and how it’s playing out at the Arkansas clinics she oversees. 

“At some point, I’m going to bring together my members of the health centers to debrief and talk about our questions. I want to know things like: How did patients arrive at our clinic? Were they transported by family members? How did they actually pay for the tests? Were they uninsured, underinsured, Medicaid, Medicare? What’s their gender, age? Is their housing stable? What other chronic illnesses did they have? Did they go somewhere before us? You can have the information, but you still have to interpret it.” 

Williams added, “This kind of information can be pulled together by a team working within an individual hospital or practice, or in certain specific health systems like Kaiser or Geisinger, where insurers and hospitals are one and the same. But in general, the U.S. health-care system is so fractured that a state health department, much less CDC, could never answer those kinds of questions at a population level, which is a real problem. Because these are the kinds of questions we have to be able to answer to make sure that we’re actually meeting people’s needs.”

Data helps build knowledge about how the virus spreads and makes the case for more resources: where to expand testing, where to send more staff and ventilators, where people are dying the most, and how long people had mild or severe symptoms. And when it comes to health disparities, figuring out where resources to contain or prevent outbreaks are actually going, versus where they should be deployed, is crucial.

This will demand more descriptive data, including about why people make certain health-care choices. Not everything is quantitative, said Hill, the Northwestern researcher.

“Coronavirus disparities have revealed themselves in the numbers. But they also reveal themselves in people’s stories. In data collection, the focus is on numbers and getting a sample size that is large enough to be able to run a statistical analysis. I have empathy for that, and I’m also part of one of those communities that gets left out. Just because the numbers of trans people aren’t high enough to get statistical power doesn’t mean that we’re not worth thinking about. I believe that about every community. I believe that about people of color. I believe that about folks living with disabilities.”

Dawes agreed that the narratives of people worrying about their coughs and sick people trying to navigate care during a crisis are critical to this moment. He talked to an African-American man who experienced a heart attack and feared going to the hospital because he thought he was at high risk of contracting COVID-19 there. He could have been an indirect coronavirus casualty—if he had delayed or not sought care because he was weighing “which of these things could kill him quicker.” 

These are choices no one should have to make. Goler Blount of the Black Women’s Health Imperative and many others are thinking about other ominous decisions that can further endanger communities of color. 

“Those of us in public health and especially we epidemiologists fear now that the rates among Black and Brown populations are so high, this will be further reason for racist treatment and reduced access to care, employment, housing, insurance. That will accelerate both the downward economic spiral, health conditions and avoidable [deaths].”

Dr. Cynthia Greenlee is a North Carolina-based journalist and historian.