Jonathan Schwabish, Author at Nightingale

Understanding the Evolution of Race and Ethnicity Data in Federal Surveys: How You Can Engage in Shaping Future Census Categories

Jonathan Schwabish — Tue, 29 Oct 2024 15:47:23 +0000

For many researchers and analysts, using large federal, state, and local surveys involves tabulating or organizing categories of race, ethnicity, gender, religion, sexual orientation, or a host of other identities and demographic characteristics. The specific categories that are available in federal data and how they are collected, ordered, and coded are defined by individual federal data collection agencies as well as overarching guidelines published by the Office of Management and Budget (OMB). Here, we discuss the large changes to federal data collection guidelines from OMB and how you can be an active participant in how OMB and other agencies decide what data to collect and how to go about collecting it.

Existing data collection and usage guidelines are not fixed in space or time. Racial categories, for example, have changed dramatically over the 24 decennial censuses conducted since 1790. In the first three censuses, in 1790, 1800, and 1810, there were three racial categories: “Free white males, Free white females”; “All other free persons”; and “Slaves.” Those categories have obviously changed over the past two hundred-plus years to be more inclusive and representative of the US population. The most recent census, in 2020, included 15 separate identified categories, to which survey respondents could add detail by writing in specific racial and ethnic identities.

Today, OMB, the Census Bureau, and many other data collection agencies—as well as other organizations, researchers, and advocacy groups—discuss and debate the best ways to collect identity information. The Census Bureau, for example, is currently debating whether and how to change the set of questions used to collect information on disability status.

How did we get to the 15 racial categories that appear on most Census Bureau surveys (and most other government agency surveys), and how will things change in the future?

OMB Creating the Census 1977 categories

US government agencies have long collected demographic data, including information on race and ethnicity. Prior to 1977, there was no comprehensive, official guidance on the best ways to collect these data. In response to the variety of racial and ethnic categories in use across federal agencies at the time, a federal interagency committee was convened and recommended a common set of categories for use across the federal government. That work culminated in what is known as Statistical Policy Directive No. 15, or SPD 15, in which OMB established five major racial and ethnic categories for the U.S. Census Bureau to use in its data collection: “American Indian or Alaskan Native,” “Asian or Pacific Islander,” “Black,” “Hispanic,” and “White.” These categories were designed to provide a consistent and standardized framework for collecting and reporting racial and ethnic data across various federal agencies.

SPD 15 also noted that “to provide flexibility, it is preferable to collect data on race and ethnicity separately.” This recommendation was implemented by asking one question about race that consisted of four categories (American Indian or Alaskan Native; Asian or Pacific Islander; Black; and White) and a separate question on ethnicity (Hispanic origin/Not of Hispanic origin).

Census 1997 categories

Over the subsequent 20 years, many began to argue that the five categories (including the ethnicity category) did not accurately capture the diversity of the American population. Thus, beginning with Congressional hearings in 1994, OMB started the process of researching a revised set of guidelines. Those discussions resulted in a new set of recommendations that were published in 1997. This update introduced several key changes:

First, the “Asian or Pacific Islander” category was divided into two distinct categories: “Asian” and “Native Hawaiian or Other Pacific Islander.”
Second, the term “Hispanic” was replaced with “Hispanic or Latino” to encompass a broader range of identities and terminology preferences within this ethnic group.
Third, the revisions allowed individuals to select more than one racial category, recognizing the increasing prevalence of multiracial identities. (Between the 2010 and 2020 censuses, the number of people who identified with more than one race increased from 9 million people to almost 34 million people.)

All of these changes aimed to provide a more nuanced and inclusive framework for collecting and reporting data on race and ethnicity, enabling better policy-making and resource allocation.

New Race & Ethnicity Categories

Earlier this year, OMB published a final set of revisions to how racial and ethnic data will be collected by US federal data collection agencies. The process started in June 2022 with an Interagency Technical Working Group, which engaged in nearly 100 listening sessions with members of the public and reviewed around 20,000 submitted comments.

After nearly two years of work, OMB is now ready to implement three large revisions to the data collection efforts.

First, the separate race and ethnicity questions will be combined into a single question. Respondents will be encouraged to select as many options as they wish to capture their identity.
Second, a new “Middle Eastern or North African” category will be added to the set of options, creating a total of eight possible answers: American Indian or Alaska Native; Asian; Black or African American; Hispanic or Latino; Middle Eastern or North African; Native Hawaiian or Pacific Islander; White; and Other.
OMB is also recommending that additional detail beyond these eight possible categories be provided as options for survey respondents, to “ensure further disaggregation in the collection, tabulation, and presentation of data when useful and appropriate.”
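For analysts, a combined question that allows multiple selections changes how responses are tabulated, because one respondent can count toward several categories. The sketch below is a minimal illustration, not an official tabulation method: the responses are made up, and the `tabulate` helper is a hypothetical name. It counts each category two ways, "alone" (the only option selected) and "alone or in combination" (selected with or without other options).

```python
from collections import Counter

# Hypothetical responses: each respondent may select one or more categories.
responses = [
    {"White"},
    {"Black or African American", "Hispanic or Latino"},
    {"Asian"},
    {"Middle Eastern or North African", "White"},
    {"Hispanic or Latino"},
]


def tabulate(responses):
    """Count each category when selected alone and when selected at all."""
    alone = Counter()
    alone_or_combo = Counter()
    for selected in responses:
        if len(selected) == 1:
            # Exactly one category selected: counts toward "alone".
            alone.update(selected)
        # Every selected category counts toward "alone or in combination".
        alone_or_combo.update(selected)
    return alone, alone_or_combo


alone, alone_or_combo = tabulate(responses)
for category in sorted(alone_or_combo):
    print(
        f"{category}: alone={alone[category]}, "
        f"alone or in combination={alone_or_combo[category]}"
    )
```

Note that the "alone or in combination" counts add up to more than the number of respondents, so they should not be summed into a population total.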

How to get involved

You might have been surprised to see nearly 100 listening sessions and 20,000 comments as part of the working group’s process over nearly two years. Many people and groups are caught off guard as to when the federal government (as well as state and local governments) seeks input from the public. Unfortunately, there is not a single source that provides all of the notices of government requests for comment or information.

We spoke with several groups that regularly track and participate in these kinds of comment periods, who recommended a few proactive steps you can take to be more engaged and part of the comment process:

Subscribe to Email Notifications. Many government agencies, including the Census Bureau, offer email subscription services to notify the public about updates, news releases, and requests for comments. Signing up for these services ensures timely notifications.
Monitor the Federal Register. The Federal Register publishes daily updates on government activities, including requests for information and public comments. Individuals can access it online and even subscribe to receive email notifications about specific topics of interest.
Visit Agency Websites Regularly. Regularly checking the official websites of agencies such as the Census Bureau, OMB, and other relevant bodies can help individuals stay informed about upcoming requests and deadlines.
Engage with Professional and Community Organizations. Many professional and community organizations track government announcements and share relevant information with their members. Joining such organizations can provide an additional layer of information.
Set Up Alerts. Using search engines and news services, individuals can set up alerts for specific keywords related to government requests for information or public comments. This can automate the process of staying informed.
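The keyword-alert idea in the last step can be sketched in a few lines. This is only an illustration under stated assumptions: the notice titles and agencies below are invented, the `matching_notices` helper is a hypothetical name, and a real setup would pull titles from whichever feed or email digest you subscribe to.

```python
def matching_notices(notices, keywords):
    """Return notices whose title mentions any keyword (case-insensitive)."""
    lowered = [keyword.lower() for keyword in keywords]
    return [
        notice
        for notice in notices
        if any(keyword in notice["title"].lower() for keyword in lowered)
    ]


# Invented example notices; not real Federal Register entries.
notices = [
    {"title": "Request for Comments on Race and Ethnicity Statistical Standards",
     "agency": "OMB"},
    {"title": "Notice of Public Meeting on Fishery Management",
     "agency": "NOAA"},
    {"title": "Proposed Information Collection; Comment Request; American Community Survey",
     "agency": "Census Bureau"},
]

keywords = ["race and ethnicity", "american community survey"]

for notice in matching_notices(notices, keywords):
    print(f'{notice["agency"]}: {notice["title"]}')
```

Simple substring matching like this will miss notices that use different wording, so it works best with a broad keyword list that you revise over time.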

The work of collecting, analyzing, and communicating better, more effective, and more representative data is an ongoing process. That work is not, and should not be, confined to the walls of government agencies and survey collection organizations. Participating in the process, while often difficult and time-consuming, can be an important way to get your voice, and that of your organization or community, heard and ultimately represented in government data.


Data4Kids: A Collaborative Project to Teach Kids about Data

Jonathan Schwabish & Claire McKay Bowen — Wed, 09 Feb 2022 14:00:00 +0000

When the COVID-19 pandemic first shut down in-person learning, we knew education wouldn’t be the same. How could kids—especially younger kids in elementary and middle schools—learn and socialize if they weren’t in a physical classroom? In late 2021, we worked with a team of people from various groups—including the Concord Consortium, Esri, Found Spatial, the American Statistical Association, the Launch Years initiative at the University of Texas at Austin, and faculty members at the University of Memphis and the University of Tennessee—to provide an open repository of data-related material for educators. This resulted in our Data4Kids project.

Our goal for this project was to create an online repository of materials that help educators prepare children to be better data users, stewards, and consumers. We wanted to develop a set of easy-to-use and easy-to-access learning materials, with well-defined learning goals and step-by-step curricula, designed for online learning but also usable during in-person instruction. We envisioned the materials helping educators bring the world of data, data science, and data visualization to their students.

We also recognized that students of different ages have different skills and interests in data. This motivated us to incorporate three main groups into our final product: Band 1 for grades 3-5 (roughly ages 9-11); Band 2 for grades 6-8 (roughly ages 12-14); and Band 3 for grades 9-12 (roughly ages 15-18).

With our collaborators and these criteria in mind, we created six “Data Stories” for educators to use with their students. The Data Stories reflect the students’ potential differences in learning styles, complexity of the material, and student maturity. Each story contains four items:

Instructors’ Guide (docx, Google Doc)
Data (xlsx, Google Sheets, CSV)
Data Dictionary (docx, Google Sheets)
Slides (PowerPoint, Google Slides)

The Instructors’ Guide walks the instructor through the entire project with notes that describe how to use the other data story items. For instance, the Guide provides data questions with answers and tips to engage their students for the accompanying annotated slides. This allows the instructor to start teaching without much effort—we want our materials to be as easy as possible for teachers so they can focus on the instruction and content.

We also structured the Guides with five sections:

Data Question. We prompt the instructor to ask the kids to think about the data by using phrases like “I notice that….”, “I wonder if…”, “I wonder why…” and “I wonder how…”. We alternatively provide a list of specific data questions with answers.
Data Collection. We provide prompts about the basics of data collection. The youngest kids, for example, are asked to think about measuring their own height while the oldest are asked about why the poverty rate is an imperfect measure of poverty.
Data Analysis. In this section, we give them some “messy” data—with errors we’ve added and marked for the instructors—so the kids can explore and see if they can find (obvious) mistakes, in contrast to the clean data.
Data Visualization. We created a set of graphs and charts (in Excel and Google Sheets). Here, our priming question is, “Which data visualization is best to tell the story?” We give the instructor ideas about what data questions each visualization could answer.
Data Equity, Ethics, and Privacy. Finally, we delve into Data Equity, Ethics, and Privacy. We ask priming questions like, “Who is represented in the data?”, “How should we report the data?”, and “Are we telling the right data story?”

Although each section contains lots of detail and options, we don’t expect any educator to get into the weeds on any specific topic. We wanted to give educators the opportunity and flexibility to emphasize various aspects of being a responsible data consumer in their curriculum.

To help you better understand what is available on the Data4Kids site, we’ll walk you through the data story on City Health Equity.

In this data story, we encourage students to imagine they are helping a family decide where to live in the United States and consider what makes a city “livable.” We use a dataset that includes information about different US cities, including the region in which they are located, how their population is changing, and what access they offer to parks, healthy food, breathable air, and other amenities. Some potential data questions ask, “Which region or city type has higher home ownership?” and “What is the average unemployment rate for each city type?”
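For older students working in code rather than a spreadsheet, the second question could be answered with a short grouping calculation. The sketch below is an illustration only: the city names, city types, and rates are invented, and the column names `city_type` and `unemployment_rate` are assumptions rather than the actual headers in the Data4Kids dataset.

```python
from collections import defaultdict

# Invented example rows; the real dataset's columns and values may differ.
cities = [
    {"city": "City A", "city_type": "Large", "unemployment_rate": 5.0},
    {"city": "City B", "city_type": "Large", "unemployment_rate": 7.0},
    {"city": "City C", "city_type": "Small", "unemployment_rate": 3.0},
    {"city": "City D", "city_type": "Small", "unemployment_rate": 4.0},
    {"city": "City E", "city_type": "Small", "unemployment_rate": 5.0},
]


def average_by_group(rows, group_key, value_key):
    """Return the simple (unweighted) mean of value_key for each group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += row[value_key]
        counts[row[group_key]] += 1
    return {group: totals[group] / counts[group] for group in totals}


averages = average_by_group(cities, "city_type", "unemployment_rate")
for city_type, rate in sorted(averages.items()):
    print(f"{city_type}: {rate:.1f}%")
```

This treats every city equally regardless of population, which is itself a useful discussion point: a population-weighted average could give a different answer.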

These questions may be a little bit difficult for the youngest kids to answer. As another option, the educator can hand out index cards and drawing utensils to the students. The students then pick a city from the dataset and make a data card with various data points from the dataset. Children learn in different ways at different ages, so we wanted to provide the educators with this alternative method.

We are hopeful that educators and parents will use these materials to teach kids about collecting, analyzing, and communicating data. We are also excited to broaden the project by inviting anyone to create their own data story. There is an open form on the site, and all the materials we have created are essentially templates, so anyone can assemble a dataset, formulate questions and answers, and develop data visualizations to help educators around the world. We hope you will consider collaborating with us to provide more data stories and data science resources!


Six Ways to Bring Empathy into your Data

Jonathan Schwabish & Alice Feng — Wed, 09 Jun 2021 13:00:12 +0000

One of the big challenges in visualizing data, and quantitative research in general, is helping readers connect with the content. Connecting directly with people and communities, and trying to better understand their lived experiences, can help content producers create visualizations and tell stories that better reflect the true experiences of different people. Our recent report on applying racial equity awareness to how you and your organization work with and communicate your data and research focuses on this important aspect.

Embracing empathy in data and data visualization is a key dimension for people working with data to help put their work into the hands of policymakers, stakeholders, and community members who can use it to effect change. Inclusive and thoughtful data visualization that respectfully reflects the people and communities of focus can also help researchers build trust with those communities.

We think of empathy as it applies to communicating data across six main themes:

1. Put people first. First and foremost, we need to remember and communicate that the data shown reflect the lives and experiences of real people. Data communicators must help readers understand and recognize the people behind the data.

2. Use personal stories to help readers and users better connect with the material. Pairing data-driven charts with personal stories centered on individual experiences can help readers understand and identify with the people represented in the research and data visualizations. Techniques that can be used in tandem with data visualizations to help lift up personal stories include photography, illustrations, pull quotes, and oral histories.

3. Use a mix of quantitative and qualitative approaches to telling a story. Most charts and graphs are built on top of spreadsheets or databases of quantitative data. However, focusing on numbers alone without any context can overlook important aspects of a story including the “why” and the “how.”

4. Create a platform for engagement. This can take the form of interactivity in which users are able to manipulate buttons, sliders, tooltips, and other elements to make selections, filter the dataset, or create customized views of a chart. Such engagement can be leveraged as a way to allow users to find themselves in the data or discover the stories that most interest them. Another form of engagement is offering audiences a means of providing feedback about a data tool or visualization.

5. Consider how your framing of an issue can create a biased emotional response. Carefully consider how the data you visualize presents a particular perspective on the content. Take the examples ProPublica journalist Lena Groeger discusses in this post on different ways to visualize the impact of crime on local communities. Maps that show the locations of where crimes occurred versus maps that show the percentage of residents in a neighborhood who were in prisons are two different ways to visualize data related to the criminal justice system. What data we choose to focus on and what we choose to ignore can bias our audiences’ perceptions of the issues about which we are communicating.

6. Recognize the needs of your audience. Taking an empathetic view of the readers’ needs as they read or perceive information is an important step to better data communication. This kind of empathy can also be couched in terms of producing visualizations that are accessible to people with visual, physical, or intellectual impairments; reducing overly technical or jargon-laden language; and translating your work into the languages most used by your target audiences.

Being empathetic to the people and communities of focus does not imply sacrificing the data and methods used in responsible, in-depth, sophisticated research. In fact, the opposite is true: high-quality research and empathy for people and communities can be complementary. Effective research necessarily means understanding someone else’s point of view nonjudgmentally and recording that perspective as accurately and truthfully as possible. Empathy underlies research and data visualizations that uphold diversity, equity, and inclusion, so data communicators should seek to find ways to help their audiences understand and connect with the people that the data represent.

Read the full Do No Harm guide here.
