Step 7b: Join 2013 ACS Block-Group Demographics into Master
Author
Russell Blessing
Overview
Pulls 2013 American Community Survey (5-year) demographic data at the census block-group level for the eastern NC study area, spatial-joins each parcel to its containing block group via centroid, and appends the demographic columns to parcels_master.gpkg.
Inputs:
parcels_master.gpkg (Step 7 output)
2013 ACS 5-year estimates via tidycensus
2010 block-group TIGER polygons via tigris (2013 ACS uses 2010 census geography)
Outputs:
parcels_master.gpkg updated in-place with new bg_* columns
acs_bg_2013_nc.gpkg — standalone block-group layer with demographics for future use
acs_corelogic_value_comparison.csv — diagnostic comparing CoreLogic median TVC to ACS median home value per block group
library(sf)
Linking to GEOS 3.12.0, GDAL 3.11.0, PROJ 9.2.1; sf_use_s2() is TRUE
library(dplyr)
library(readr)
library(tigris)
library(tidycensus)  # get_acs(), used below to pull the ACS tables
# B03002: Hispanic or Latino origin by race (race-ethnicity cross-tab)
# B19013: Median household income (last 12 months, 2013 inflation-adjusted)
# B25077: Median value of owner-occupied housing units
acs_vars <- c(
  pop_total         = "B03002_001",
  pop_nh_white      = "B03002_003",
  pop_nh_black      = "B03002_004",
  pop_nh_aian       = "B03002_005",
  pop_nh_asian      = "B03002_006",
  pop_hispanic      = "B03002_012",
  median_income     = "B19013_001",
  median_home_value = "B25077_001"
)
# Pull all block groups for the study area's counties. With output = "wide",
# tidycensus returns one row per block group, with an estimate column (<name>E)
# and a margin-of-error column (<name>M) for each variable.
acs_raw <- get_acs(
  geography = "block group",
  variables = acs_vars,
  state = "NC",
  county = study_counties_3digit,
  year = 2013,
  survey = "acs5",
  geometry = TRUE,
  output = "wide",
  cache_table = TRUE
)
# Save the BG layer for future use
st_write(bg_2013, bg_out_path, delete_dsn = TRUE, quiet = TRUE)
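The step that turns acs_raw into bg_2013 is not shown in this excerpt. With output = "wide", get_acs() names each estimate `<name>E` and its margin of error `<name>M` (for example pop_totalE and pop_totalM), alongside GEOID and NAME. The base-R sketch below shows one way to rename estimates into bg_* columns and derive race-ethnicity shares; the function name, the bg_ naming, and the share definitions are assumptions (the notebook's own names differ at least in places, e.g. bg_median_home_value_2013 in Section 6), and margins of error are simply dropped.

```r
# Hypothetical sketch: turn a get_acs(output = "wide") table into bg_* columns.
# Column names and share definitions are assumptions, not the notebook's code.
acs_to_bg <- function(acs_wide) {
  # Estimate columns end in "E"; NAME also ends in "E", so exclude it
  est_cols <- setdiff(grep("E$", names(acs_wide), value = TRUE), "NAME")
  out <- data.frame(bg_geoid = acs_wide$GEOID, stringsAsFactors = FALSE)
  for (col in est_cols) {
    out[[paste0("bg_", sub("E$", "", col))]] <- acs_wide[[col]]
  }
  # Percent of total population; NA where the BG has zero or missing population
  tot <- out$bg_pop_total
  share <- function(x) ifelse(!is.na(tot) & tot > 0, 100 * x / tot, NA_real_)
  out$bg_pct_white    <- share(out$bg_pop_nh_white)
  out$bg_pct_black    <- share(out$bg_pop_nh_black)
  out$bg_pct_hispanic <- share(out$bg_pop_hispanic)
  out
}

# Toy input mimicking the wide layout (illustrative values only)
acs_wide <- data.frame(
  GEOID = c("370010201001", "370010201002"),
  NAME = c("Block Group 1", "Block Group 2"),
  pop_totalE = c(1000, 0), pop_totalM = c(120, 11),
  pop_nh_whiteE = c(640, 0), pop_nh_whiteM = c(90, 11),
  pop_nh_blackE = c(250, 0), pop_nh_blackM = c(60, 11),
  pop_hispanicE = c(80, 0), pop_hispanicM = c(40, 11),
  median_home_valueE = c(145000, NA), median_home_valueM = c(12000, NA),
  stringsAsFactors = FALSE
)
bg <- acs_to_bg(acs_wide)
print(bg)
```

A zero-population block group gets NA shares rather than NaN, which keeps it from silently entering medians downstream.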
3 Spatial join parcels to block groups
# Build sf points from parcel centroids
parcel_pts <- master_attrs |>
  filter(!is.na(cent_lat), !is.na(cent_lng)) |>
  st_as_sf(coords = c("cent_lng", "cent_lat"), crs = 4326, remove = FALSE)

cat("Parcels with valid centroids:",
    format(nrow(parcel_pts), big.mark = ","), "\n")
Parcels with valid centroids: 4,216,239
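# Point-in-polygon: assign each parcel the GEOID of the block group containing it.
# st_join() requires both layers to share a CRS. The parcel points are WGS84
# (EPSG:4326), so transform them to bg_2013's CRS first (a no-op if they match).
parcel_bg <- st_join(
  parcel_pts |> select(parcel_index) |> st_transform(st_crs(bg_2013)),
  bg_2013 |> select(bg_geoid),
  join = st_intersects,
  left = TRUE
) |>
  st_drop_geometry() |>
  distinct(parcel_index, .keep_all = TRUE)  # in case any point lands on a BG edge

cat("Parcels with assigned BG GEOID: ",
    format(sum(!is.na(parcel_bg$bg_geoid)), big.mark = ","), "\n")
cat("Parcels missing BG GEOID (off-NC or out-of-BG):",
    format(sum(is.na(parcel_bg$bg_geoid)), big.mark = ","), "\n")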
Parcels missing BG GEOID (off-NC or out-of-BG): 2,342
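distinct(parcel_index, .keep_all = TRUE) keeps the first matching row per parcel, so a point that matched two block groups is assigned to whichever came first in the st_join() output. Before deduplicating, it can help to count how many parcels actually matched more than one BG. A base-R sketch of both steps on a hypothetical toy join result:

```r
# Toy st_join() result: parcel 2 matched two block groups; parcel 3 matched none
parcel_bg_raw <- data.frame(
  parcel_index = c(1L, 2L, 2L, 3L),
  bg_geoid = c("370010201001", "370010201001", "370010201002", NA),
  stringsAsFactors = FALSE
)

# Parcels with more than one BG match (edge cases worth inspecting)
matched <- parcel_bg_raw$parcel_index[!is.na(parcel_bg_raw$bg_geoid)]
match_counts <- table(matched)
multi_match <- as.integer(names(match_counts)[match_counts > 1])

# Keep the first row per parcel, as distinct(parcel_index, .keep_all = TRUE) does
parcel_bg_dedup <- parcel_bg_raw[!duplicated(parcel_bg_raw$parcel_index), ]
n_missing <- sum(is.na(parcel_bg_dedup$bg_geoid))
print(parcel_bg_dedup)
```

If multi_match turns out to be non-empty at scale, a deterministic tie-break (for example the lowest GEOID) would make the assignment independent of row order.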
# Join BG demographics onto each parcel via GEOID
parcel_bg_demo <- parcel_bg |>
  left_join(st_drop_geometry(bg_2013), by = "bg_geoid")

cat("Coverage by demographic column:\n")
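The rest of this cell is not shown in the excerpt. One way to produce the per-column coverage it announces is a small helper that reports the share of non-missing values in each bg_* column; the helper name is hypothetical and the toy table is illustrative only.

```r
# Hypothetical helper: share of non-missing values in every bg_* column
coverage_by_column <- function(df, prefix = "bg_") {
  cols <- names(df)[startsWith(names(df), prefix)]
  vapply(cols, function(col) mean(!is.na(df[[col]])), numeric(1))
}

# Toy parcel table: parcel 4 has no BG; parcel 3's BG has a suppressed median
toy <- data.frame(
  parcel_index = 1:4,
  bg_geoid = c("370010201001", "370010201001", "370010201002", NA),
  bg_median_income = c(41000, 41000, NA, NA),
  stringsAsFactors = FALSE
)
coverage <- coverage_by_column(toy)
print(round(100 * coverage, 1))  # percent non-missing per column
```

Coverage of bg_geoid bounds the coverage of every demographic column, so a gap between the two isolates suppressed ACS values from failed spatial matches.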
# Reload the full master (with geometry), join in the BG columns, save back.
master_full <- st_read(master_path, quiet = TRUE)

# Drop any prior bg_* columns so we don't end up with bg_*.x / bg_*.y duplicates
master_full <- master_full |>
  select(-any_of(grep("^bg_", names(master_full), value = TRUE)))

# parcel_bg_demo is attribute-only (geometry was dropped above), so the join
# keeps master_full's parcel geometry
master_with_bg <- master_full |>
  left_join(parcel_bg_demo, by = "parcel_index")

cat("master_with_bg dimensions:",
    format(nrow(master_with_bg), big.mark = ","), "rows ×",
    ncol(master_with_bg), "cols\n")
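Dropping existing bg_* columns before the join is what makes this cell safe to rerun. Without it, a second run joins bg_* columns that already exist and both copies get suffixed. The base-R sketch below reproduces that failure and the fix with merge(), which uses the same default .x/.y suffixes as dplyr's left_join() (unlike left_join(), merge() sorts rows by the key). The toy tables are hypothetical.

```r
# Toy master that already carries a stale bg_ column from an earlier run
master <- data.frame(
  parcel_index = 1:3, apn = c("a", "b", "c"),
  bg_geoid = c("old1", "old2", "old3"), stringsAsFactors = FALSE
)
bg_demo <- data.frame(
  parcel_index = 1:3,
  bg_geoid = c("370010201001", "370010201001", NA),
  bg_pct_white = c(64, 64, NA), stringsAsFactors = FALSE
)

# Naive rerun: both tables carry bg_geoid, so merge() suffixes them
naive <- merge(master, bg_demo, by = "parcel_index", all.x = TRUE)

# Idempotent version: drop stale bg_* columns first, then join
master_clean <- master[, !startsWith(names(master), "bg_"), drop = FALSE]
joined <- merge(master_clean, bg_demo, by = "parcel_index", all.x = TRUE)
print(names(naive))
print(names(joined))
```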
Funded median below Flooded by ~15pp → cleaner shift than the EIF version
If median_pct_white lands near 64 for Applied and near 59 for Funded, the block-group switch reproduces Julian’s qualitative story. The table above is parcel-weighted (each parcel contributes its BG’s race share); Step 8’s Fig. 4 will compute the cell-level version over unique block groups, which will differ slightly.
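The difference between the two versions is purely one of weighting: in the parcel-weighted median, a block group with many parcels is counted many times. A toy base-R comparison, with hypothetical cell labels c1/c2 and illustrative shares (not results from this study):

```r
# Illustrative toy data: one row per parcel, carrying its BG's % non-Hispanic white
parcels <- data.frame(
  cell = c("c1", "c1", "c1", "c1", "c1", "c2", "c2"),
  bg_geoid = c("bg1", "bg1", "bg1", "bg2", "bg3", "bg4", "bg5"),
  bg_pct_white = c(80, 80, 80, 50, 40, 60, 70),
  stringsAsFactors = FALSE
)

# Parcel-weighted: every parcel contributes its BG's share
parcel_weighted <- tapply(parcels$bg_pct_white, parcels$cell, median)

# Unique-BG: each block group counted once per cell
unique_bg <- unique(parcels[, c("cell", "bg_geoid", "bg_pct_white")])
bg_weighted <- tapply(unique_bg$bg_pct_white, unique_bg$cell, median)

print(rbind(parcel_weighted, bg_weighted))
```

In c1, the three parcels in bg1 pull the parcel-weighted median to 80, while the unique-BG median is 50; in c2, where each BG has one parcel, the two agree.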
6 CoreLogic-vs-ACS dollar-level value diagnostic
The ACS B25077 median home value is in dollars (unlike the EIF vigintiles), so we can do the cell-vs-cell dollar comparison Miyuki originally expected. Per block group:
ACS median home value (2013 dollars, owner-occupied units)
Median CoreLogic TOTAL.VALUE.CALCULATED across our 1–4 family residential parcels (2022 dollars)
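The vintage mismatch (ACS 2013 vs. CoreLogic 2022) will bias the CoreLogic-to-ACS ratio upward: NC home values appreciated roughly 30–40% between 2013 and 2022, and the 2013 5-year estimates pool responses from 2009–2013, which widens the gap further. We therefore document agreement in both ratio and correlation terms, since a uniform appreciation factor would shift the ratio but leave the correlation unchanged.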
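A quick check of why correlation is the more vintage-robust measure: multiplying every CoreLogic median by the same factor scales the median ratio by that factor but leaves Pearson and Spearman correlations unchanged. The sketch assumes a single uniform 35% appreciation (the midpoint of the range above) and illustrative toy medians; real appreciation varied across block groups, so this is a best case.

```r
# Toy per-BG medians (illustrative values only)
acs_2013 <- c(95000, 120000, 150000, 210000, 310000)
cl_same_year <- c(100000, 115000, 160000, 200000, 330000)

# Assumption: one uniform appreciation factor applied to every BG
appreciation <- 1.35
cl_2022 <- cl_same_year * appreciation

ratio_same_year <- median(cl_same_year / acs_2013)
ratio_2022 <- median(cl_2022 / acs_2013)
pearson_same_year <- cor(cl_same_year, acs_2013)
pearson_2022 <- cor(cl_2022, acs_2013)
spearman_same_year <- cor(cl_same_year, acs_2013, method = "spearman")
spearman_2022 <- cor(cl_2022, acs_2013, method = "spearman")

cat(sprintf("median ratio: %.3f -> %.3f\n", ratio_same_year, ratio_2022))
cat(sprintf("Pearson: %.4f -> %.4f\n", pearson_same_year, pearson_2022))
```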
cl_per_bg <- master_with_bg |>
  st_drop_geometry() |>
  filter(
    fr_1_4 == 1L,
    interpolate == 0L,  # raw CoreLogic, no interpolation
    !is.na(TOTAL.VALUE.CALCULATED),
    as.numeric(TOTAL.VALUE.CALCULATED) > 0,
    !is.na(bg_geoid)
  ) |>
  group_by(bg_geoid) |>
  summarise(
    cl_n_parcels = dplyr::n(),
    cl_median_tvc = median(as.numeric(TOTAL.VALUE.CALCULATED), na.rm = TRUE),
    .groups = "drop"
  )

# Keep BGs with at least 5 qualifying parcels and a non-missing ACS median
cmp <- inner_join(
  cl_per_bg |> filter(cl_n_parcels >= 5),
  bg_2013 |>
    st_drop_geometry() |>
    select(bg_geoid, bg_median_home_value_2013) |>
    filter(!is.na(bg_median_home_value_2013)),
  by = "bg_geoid"
)

cat("BGs with both CL and ACS medians:",
    format(nrow(cmp), big.mark = ","), "\n")
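The ratio and correlation summary itself is not shown in this excerpt. A base-R sketch, assuming cmp has the cl_median_tvc and bg_median_home_value_2013 columns built above (the function name is hypothetical): it reports ratio quartiles, a Pearson correlation on log values (home values are right-skewed), and a rank-based Spearman correlation as a check. The toy table stands in for cmp.

```r
# Hypothetical summary of the CL-vs-ACS comparison table
summarise_cmp <- function(cmp) {
  ratio <- cmp$cl_median_tvc / cmp$bg_median_home_value_2013
  list(
    n_bg = nrow(cmp),
    ratio_quartiles = quantile(ratio, c(0.25, 0.5, 0.75), names = FALSE),
    pearson_log = cor(log(cmp$cl_median_tvc), log(cmp$bg_median_home_value_2013)),
    spearman = cor(cmp$cl_median_tvc, cmp$bg_median_home_value_2013,
                   method = "spearman")
  )
}

# Toy stand-in for cmp (illustrative values only)
cmp_toy <- data.frame(
  bg_geoid = c("bg1", "bg2", "bg3", "bg4"),
  cl_n_parcels = c(12L, 40L, 7L, 25L),
  cl_median_tvc = c(130000, 180000, 95000, 260000),
  bg_median_home_value_2013 = c(100000, 140000, 80000, 190000),
  stringsAsFactors = FALSE
)
s <- summarise_cmp(cmp_toy)
str(s)
```

Working with the ratio's quartiles rather than its mean keeps a handful of mis-assessed or tiny-sample BGs from dominating the summary.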
Block groups follow 2010 Census geography because the 2013 ACS 5-year estimates are published on that geography. If you later switch to 2020-vintage ACS, you’ll need 2020 BG boundaries, plus a 2010-to-2020 crosswalk to compare the two vintages.
ACS median home value (B25077) covers owner-occupied housing units only; renter-occupied units don’t contribute. This matches the conceptual scope of CoreLogic parcel values (assessor-recorded ownership) reasonably well, though 1–4 family parcels also include rented and vacant homes.
Some block groups have suppressed median income or home value estimates when the underlying counts are too small. These appear as NA in the demographic columns and propagate to every parcel in those BGs.
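NA bg_geoid parcels are most likely outside NC or coastal parcels whose centroids fall over water, or, less likely, sit in edge positions the spatial join missed. By default, get_acs(geometry = TRUE) returns generalized cartographic-boundary polygons clipped to the shoreline, so a waterfront centroid can miss every BG; passing cb = FALSE requests the full TIGER/Line polygons, which include water area. Check the count printed in Section 3 before publishing: 2,342 of 4,216,239 parcels with centroids (about 0.06%) are unmatched. If that share ever exceeds 1%, a fallback rule may be needed (nearest BG via st_nearest_feature).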
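The threshold check and a fallback can be sketched in base R. The fallback here is a swapped-in, cruder technique: it assigns each unmatched centroid to the nearest BG *centroid* by great-circle (haversine) distance, whereas sf's st_nearest_feature() measures to the nearest polygon, which matters for long or irregular coastal BGs. The helper names and the toy BG centroids are hypothetical.

```r
# Unmatched share from the Section 3 output
n_missing <- 2342
n_with_centroid <- 4216239
pct_missing <- 100 * n_missing / n_with_centroid
needs_fallback <- pct_missing > 1  # the 1% threshold
cat(sprintf("Unmatched parcels: %.3f%% (fallback needed: %s)\n",
            pct_missing, needs_fallback))

# Great-circle distance in km between points given in decimal degrees
haversine_km <- function(lat1, lng1, lat2, lng2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlng <- (lng2 - lng1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlng / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Hypothetical fallback: nearest BG centroid for each unmatched point
nearest_bg_centroid <- function(pt_lat, pt_lng, bg_cent) {
  vapply(seq_along(pt_lat), function(i) {
    d <- haversine_km(pt_lat[i], pt_lng[i], bg_cent$lat, bg_cent$lng)
    bg_cent$bg_geoid[which.min(d)]
  }, character(1))
}

# Toy example: two unmatched coastal centroids, two candidate BG centroids
bg_cent <- data.frame(
  bg_geoid = c("bgA", "bgB"),
  lat = c(35.00, 35.50),
  lng = c(-76.00, -76.50),
  stringsAsFactors = FALSE
)
assigned <- nearest_bg_centroid(c(35.05, 35.45), c(-75.98, -76.40), bg_cent)
print(assigned)
```

At the current unmatched share the check reports no fallback is needed; the nearest-centroid rule is only worth considering if that changes, and even then the polygon-distance version is preferable.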