Where does our data come from?
The Environment and Health Data Portal hosts over 200 sets of measures from data sources that cover topics like air quality, climate, pests, housing, and much more. These data can tell us how environmental factors like housing, air quality, and socioeconomic status can shape health from neighborhood to neighborhood. But where does this data come from? How often is it updated? And how do we pick which datasets to feature?
Where do you get your data?
We get data from many sources: some from the NYC Health Department or other city agencies like NYC Parks or Department of Investigations; and some from outside sources, like the CDC, SPARCS, the U.S. Census, and the EPA. Here are some of the core datasets:
What is the Community Health Survey (CHS)? This survey is conducted by the NYC Health Department and interviews about 10,000 New Yorkers each year. Running since 2002, CHS reports detailed data on many chronic diseases and health behaviors, helping us see trends at the neighborhood, borough, and citywide level.
Some CHS indicators include:
What is New York Statewide Planning and Research Cooperative System (SPARCS)? SPARCS is a billing claims data system that collects patient-level data, like diagnoses, treatments, and characteristics for both inpatient and outpatient stays in every hospital throughout New York state. It is a collaboration between the NY state government and the healthcare system. At the NYC Health Department, we restrict data to hospitals within NYC and sometimes to NYC residents.
Some SPARCS indicators include:
What is the Housing and Vacancy Survey (HVS)? The NYC Department of Housing Preservation and Development (HPD) and the US Census Bureau together conduct the HVS every 3 years. The main purpose of HVS is to describe how many rental units are vacant to understand more about rent control and stabilization and the housing market.
Some HVS indicators include:
What is the American Community Survey (ACS)? The US Census Bureau conducts the ACS annually, collecting population, housing, and workforce data like unemployment, income, insurance, and more.
Some ACS indicators include:
Some NYCCAS indicators include: What we use it for: It helps us inform PlaNYC, track changes in air quality over time, estimate exposures for health research, inform the public about local topics, such as air quality improvements, health benefits of public transit to air quality, and efforts to reduce the health impacts of air pollution.
Reporting all vital events in NYC since the 1800s, the NYC Bureau of Vital Statistics’ records information about birth and death rates, infant mortality, and causes of death.
Some Bureau of Vital Statistics indicators include:
Vital stats data, like premature death rates, can help us get a snapshot of the general health of New Yorkers. When we analyze these data alongside social determinants of health, it can help us understand the burden of factors like neighborhood poverty on health outcomes. In one analysis, we found a higher minimum wage could save thousands of lives. We use cause of death records in our annual heat mortality report, to calculate how many deaths can be attributed to heat-related causes, and understand how race, income, and AC access shape vulnerability to heat-related illness and mortality.
What types of data sources are there?
There are many kinds of data, which are collected, cleaned, and updated in different ways. Here are some of the categories data can fall into, and some may even fall into multiple categories.
Collecting this type of data is mandated by the local, state, or federal government, which typically means it is updated regularly and reliably. In New York state, blood lead level testing is mandated. Because it is mandatory, it also means that blood lead levels of NYC populations are regularly updated, so there are many years of this data available. Still, not everyone goes to the doctor, even if testing is required. Federal air quality monitoring required by the Clean Air Act is another example of regulatory data.
A selection of survey respondents answer questions online, via phone, or e-mail. Surveys like the Community Health Survey, the Housing and Vacancy Survey, and the American Community Survey are conducted regularly at different intervals. While most surveys are voluntary, some, like ACS, are compulsory.
Sometimes, a survey conducted every year drops a question, and we have to decide how to continue to track that dataset. In 2015, CHS dropped a question about recent cycling, so we looked for other indicators in both the CHS and ACS, with the goal of finding something with many years of data so we could see the change over time. We found monthly bicycle use, another survey question from CHS. The reliability of survey data depends on the willingness of respondents, as well as their honesty, the framing of the questions, and many other factors.
Registry and population data is standardized information that must be collected about every person or event. This includes birth and death records, and Census data, which have all been recorded for a long time.
Premature mortality from the NYC Bureau of Vital Statistics and the US Census; or cancers in children from the New York State Cancer Registry both fall into this category.
Collected continuously and systematically, near real-time data can include environmental data, like real-time air quality (PM2.5) monitoring, which is updated hourly. It can also include near real-time health data, such as the total daily visits to the Emergency Department during the hot weather season.
There are other kinds of data, too. There is administrative data, which is collected by healthcare or government organizations as part of conducting routine business or activities. An example of this are evictions (court-ordered), which are available through the Department of Investigations (DOI). There is also operational data, like our litter basket coverage data, which is from the Department of Sanitation (DSNY), but available through NYC Open Data.
NYC’s Local Law 11 requires city agencies to make data considered “public” available through a single data portal so that anyone can access and use it. This type of transparency reflects the idea that public data belongs to the public, and empowers all New Yorkers to understand key information about civic life. Note that not all data used by city agencies is considered public; due to privacy laws, much health data is excluded from this requirement.
How do you choose datasets?
We use datasets from many sources to quantify the state of various measures of health, and explanatory text to frame it, provide context, and add meaning. No single dataset can tell us everything, but together, they can paint a picture of how environments shape health in NYC across time. That said, there are tons of datasets out there – so how do we choose? We have frequent conversations with our data experts to determine what datasets would add the most value to the Portal.
But sometimes we see something on NYC Open Data, or another source, that provides interesting context to NYC’s environment and health, for example, our litter basket coverage data. These humble amenities may be overlooked, but have a strong connection to health: when there are more litter baskets, there is less litter, and fewer pests. Fewer pests are healthier for a neighborhood and cleaner streets have a positive impact on mental health and feelings of safety and positivity. Public bathrooms also make it easier for people to partake in public life.
Transit datasets like accessible subway stations and bus stops with audio announcements are also from NYC Open Data, and illustrate how accessible transit (and thus, all of New York City) is to New Yorkers with disabilities, caregivers, older adults, and everyone!
Why aren’t some of your data more recent?
These data (ranging from neighborhood poverty and cold-stress hospitalizations, to Citi bike station density and cockroach sightings) aren’t all measured, collected, recorded, organized, and reported in the same way, or within the same time period. Sometimes data are also aggregated into multi-year batches to protect privacy while being stable enough to show impacts at the neighborhood level.
As a result, some types of data aren’t updated as frequently as others. But that doesn’t mean that older datasets don’t tell us valuable information. Significant trends in health can take a long time to show up. When it comes to Fall-related hospitalizations (age 65+), for instance, our most recent dataset is from 2023. However, the chart tells us that borough-level trends have been relatively stable since 2018. Any programs and outreach we are developing to address these issues will still be relevant year after year, even as we await a batch of newer data.
Combining data from different sources and looking at them in the context of one another is part of what makes the Portal such a powerful tool, and improves our understanding across data types and time frames. Together, each of the Portal’s data sources captures diverse, valuable information that together show how the environment – built, social, economic – shapes population health in NYC.
Published on:
January 20, 2026