Step 7b: Join 2013 ACS Block-Group Demographics into Master

Overview

Pulls 2013 American Community Survey (5-year) demographic data at the census block-group level for the eastern NC study area, spatial-joins each parcel to its containing block group via centroid, and appends the demographic columns to parcels_master.gpkg.

Inputs:

parcels_master.gpkg (Step 7 output)
2013 ACS 5-year estimates via tidycensus
2010 block-group TIGER polygons via tigris (2013 ACS uses 2010 census geography)

Outputs:

parcels_master.gpkg updated in-place with new bg_* columns
acs_bg_2013_nc.gpkg — standalone block-group layer with demographics for future use
acs_corelogic_value_comparison.csv — diagnostic comparing CoreLogic median TVC to ACS median home value per block group

In [1]:

Show / hide code

library(sf)

Linking to GEOS 3.12.0, GDAL 3.11.0, PROJ 9.2.1; sf_use_s2() is TRUE

Show / hide code

library(dplyr)


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

Show / hide code

library(readr)
library(tigris)

To enable caching of data, set `options(tigris_use_cache = TRUE)`
in your R script or .Rprofile.

Show / hide code

library(tidycensus)
library(stringr)
library(tidyr)
options(tigris_use_cache = TRUE)

In [2]:

Show / hide code

out_dir     <- "/proj/mhinolab/users/rbless/data/Obstacles_Output"
master_path <- file.path(out_dir, "parcels_master.gpkg")
bg_out_path <- file.path(out_dir, "acs_bg_2013_nc.gpkg")
cmp_out_path <- file.path(out_dir, "acs_corelogic_value_comparison.csv")

1 Identify study-area counties

Pull the unique county FIPS codes from parcels_master.gpkg so we only download block groups for counties that contain analysis-sample parcels.

In [3]:

Show / hide code

master_attrs <- st_read(master_path, quiet = TRUE) |>
  st_drop_geometry() |>
  select(parcel_index, cntyfips, cent_lat, cent_lng, fr_1_4, interpolate,
         TOTAL.VALUE.CALCULATED) |>
  mutate(cntyfips = as.character(cntyfips))

study_county_fips <- master_attrs |>
  distinct(cntyfips) |>
  filter(!is.na(cntyfips)) |>
  pull(cntyfips)

# tigris/tidycensus expect 3-digit county codes within a state.
# cntyfips is stored as "37063" (state + county); strip the leading state.
study_counties_3digit <- str_sub(study_county_fips, 3, 5)

cat("Study area counties: ", length(study_counties_3digit), "\n")

Study area counties:  78

Show / hide code

cat(head(study_counties_3digit, 10), "...\n")

001 007 057 059 009 013 061 063 065 067 ...

2 Pull 2013 ACS demographics + block-group geometries

In [4]:

Show / hide code

# B03002 — Hispanic or Latino origin by race (race-ethnicity cross-tab)
# B19013 — Median household income (last 12 months, 2013 inflation-adjusted)
# B25077 — Median value of owner-occupied housing units

acs_vars <- c(
  pop_total          = "B03002_001",
  pop_nh_white       = "B03002_003",
  pop_nh_black       = "B03002_004",
  pop_nh_aian        = "B03002_005",
  pop_nh_asian       = "B03002_006",
  pop_hispanic       = "B03002_012",
  median_income      = "B19013_001",
  median_home_value  = "B25077_001"
)

In [5]:

Show / hide code

# Pull all block groups for the study area's counties. tidycensus returns
# a long-format tibble; pivot_wider gives one row per block group.
acs_raw <- get_acs(
  geography = "block group",
  variables = acs_vars,
  state     = "NC",
  county    = study_counties_3digit,
  year      = 2013,
  survey    = "acs5",
  geometry  = TRUE,
  output    = "wide",
  cache_table = TRUE
)

Getting data from the 2009-2013 5-year ACS

Show / hide code

cat("Block groups pulled:", nrow(acs_raw), "\n")

Block groups pulled: 5147

In [6]:

Show / hide code

bg_2013 <- acs_raw |>
  st_transform(4326) |>                    # WGS84 to match parcel centroids
  rename_with(\(x) str_remove(x, "E$"),    # drop the "E" suffix on estimates
              ends_with("E")) |>
  mutate(
    bg_geoid             = GEOID,
    bg_pop_total_2013    = pop_total,
    bg_pct_white_2013    = ifelse(pop_total > 0,
                                  pop_nh_white / pop_total * 100, NA_real_),
    bg_pct_black_2013    = ifelse(pop_total > 0,
                                  pop_nh_black / pop_total * 100, NA_real_),
    bg_pct_hispanic_2013 = ifelse(pop_total > 0,
                                  pop_hispanic / pop_total * 100, NA_real_),
    bg_pct_asian_2013    = ifelse(pop_total > 0,
                                  pop_nh_asian / pop_total * 100, NA_real_),
    bg_pct_aian_2013     = ifelse(pop_total > 0,
                                  pop_nh_aian / pop_total * 100, NA_real_),
    bg_median_income_2013 = median_income,
    bg_median_home_value_2013 = median_home_value
  ) |>
  select(bg_geoid,
         starts_with("bg_pop_"),
         starts_with("bg_pct_"),
         starts_with("bg_median_"))

cat("BG demographic summary (NC study area, 2013):\n")

BG demographic summary (NC study area, 2013):

Show / hide code

bg_2013 |>
  st_drop_geometry() |>
  summarise(
    n_bgs                = dplyr::n(),
    median_pop           = median(bg_pop_total_2013,    na.rm = TRUE),
    median_pct_white     = median(bg_pct_white_2013,    na.rm = TRUE),
    median_pct_black     = median(bg_pct_black_2013,    na.rm = TRUE),
    median_pct_hispanic  = median(bg_pct_hispanic_2013, na.rm = TRUE),
    median_inc           = median(bg_median_income_2013,  na.rm = TRUE),
    median_hv            = median(bg_median_home_value_2013, na.rm = TRUE)
  ) |>
  print()

  n_bgs median_pop median_pct_white median_pct_black median_pct_hispanic
1  5147       1443         67.37084          16.9464             4.16133
  median_inc median_hv
1      43531    133900

Show / hide code

# Save the BG layer for future use
st_write(bg_2013, bg_out_path, delete_dsn = TRUE, quiet = TRUE)

3 Spatial join parcels to block groups

In [7]:

Show / hide code

# Build sf points from parcel centroids
parcel_pts <- master_attrs |>
  filter(!is.na(cent_lat), !is.na(cent_lng)) |>
  st_as_sf(coords = c("cent_lng", "cent_lat"), crs = 4326, remove = FALSE)

cat("Parcels with valid centroids:",
    format(nrow(parcel_pts), big.mark = ","), "\n")

Parcels with valid centroids: 4,216,239

In [8]:

Show / hide code

# Point-in-polygon — assigns each parcel its containing block-group GEOID.
parcel_bg <- st_join(
  parcel_pts |> select(parcel_index),
  bg_2013 |> select(bg_geoid),
  join = st_intersects,
  left = TRUE
) |>
  st_drop_geometry() |>
  distinct(parcel_index, .keep_all = TRUE)   # in case any point lands on a BG edge

cat("Parcels with assigned BG GEOID: ",
    format(sum(!is.na(parcel_bg$bg_geoid)), big.mark = ","), "\n")

Parcels with assigned BG GEOID:  4,213,897

Show / hide code

cat("Parcels missing BG GEOID (off-NC or out-of-BG): ",
    format(sum(is.na(parcel_bg$bg_geoid)),  big.mark = ","), "\n")

Parcels missing BG GEOID (off-NC or out-of-BG):  2,342

In [9]:

Show / hide code

# Join BG demographics onto each parcel via GEOID
parcel_bg_demo <- parcel_bg |>
  left_join(st_drop_geometry(bg_2013), by = "bg_geoid")

cat("Coverage by demographic column:\n")

Coverage by demographic column:

Show / hide code

parcel_bg_demo |>
  summarise(across(starts_with("bg_"),
                   \(x) sum(!is.na(x)),
                   .names = "{.col}")) |>
  pivot_longer(everything(), names_to = "column", values_to = "n_non_na") |>
  mutate(pct = sprintf("%.1f%%", 100 * n_non_na / nrow(parcel_bg_demo))) |>
  print(n = Inf)

# A tibble: 9 × 3
  column                    n_non_na pct  
  <chr>                        <int> <chr>
1 bg_geoid                   4213897 99.9%
2 bg_pop_total_2013          4213897 99.9%
3 bg_pct_white_2013          4213578 99.9%
4 bg_pct_black_2013          4213578 99.9%
5 bg_pct_hispanic_2013       4213578 99.9%
6 bg_pct_asian_2013          4213578 99.9%
7 bg_pct_aian_2013           4213578 99.9%
8 bg_median_income_2013      4212350 99.9%
9 bg_median_home_value_2013  4198773 99.6%

4 Update parcels_master.gpkg in place

In [10]:

Show / hide code

# Reload the full master (with geometry), join in the BG columns, save back.
master_full <- st_read(master_path, quiet = TRUE)

# Drop any prior bg_* columns so we don't end up with bg_*.x / bg_*.y duplicates
master_full <- master_full |>
  select(-any_of(grep("^bg_", names(master_full), value = TRUE)))

# Join — bg columns are attribute-only, so st_drop_geometry on parcel_bg_demo
master_with_bg <- master_full |>
  left_join(parcel_bg_demo, by = "parcel_index")

cat("master_with_bg dimensions:",
    format(nrow(master_with_bg), big.mark = ","), "rows ×",
    ncol(master_with_bg), "cols\n")

master_with_bg dimensions: 4,216,239 rows × 112 cols

Show / hide code

cat("New columns:\n")

New columns:

Show / hide code

new_cols <- setdiff(names(master_with_bg), names(master_full))
print(new_cols)

[1] "bg_geoid"                  "bg_pop_total_2013"        
[3] "bg_pct_white_2013"         "bg_pct_black_2013"        
[5] "bg_pct_hispanic_2013"      "bg_pct_asian_2013"        
[7] "bg_pct_aian_2013"          "bg_median_income_2013"    
[9] "bg_median_home_value_2013"

Show / hide code

# Overwrite the master file
st_write(master_with_bg, master_path,
         delete_dsn = TRUE, quiet = TRUE)
cat("Wrote (overwrote):", master_path, "\n")

Wrote (overwrote): /proj/mhinolab/users/rbless/data/Obstacles_Output/parcels_master.gpkg

5 Sanity checks

In [11]:

Show / hide code

# Quick verification: stages-level race medians, like Step 8 will compute
master_sanity <- master_with_bg |>
  st_drop_geometry() |>
  filter(fr_1_4 == 1L | interpolate == 1L,
         !is.na(flooded_any), as.numeric(flooded_any) >= 1)

stage_check <- bind_rows(
  master_sanity |>
    mutate(stage = "Flooded"),
  master_sanity |>
    filter(!is.na(eligible), as.numeric(eligible) >= 1) |>
    mutate(stage = "Eligible"),
  master_sanity |>
    filter(as.logical(applied)) |>
    mutate(stage = "Applied"),
  master_sanity |>
    filter(as.logical(funded)) |>
    mutate(stage = "Funded")
) |>
  group_by(stage) |>
  summarise(
    n_parcels         = dplyr::n(),
    n_with_bg         = sum(!is.na(bg_pct_white_2013)),
    median_pct_white  = round(median(bg_pct_white_2013,
                                     na.rm = TRUE), 1),
    median_pct_black  = round(median(bg_pct_black_2013,
                                     na.rm = TRUE), 1),
    median_income     = round(median(bg_median_income_2013,
                                     na.rm = TRUE), 0),
    median_hv         = round(median(bg_median_home_value_2013,
                                     na.rm = TRUE), 0),
    .groups = "drop"
  ) |>
  mutate(stage = factor(stage,
                        levels = c("Flooded", "Eligible",
                                   "Applied", "Funded")))

knitr::kable(stage_check |> arrange(stage),
             caption = "BG demographic medians by pipeline stage (parcel-weighted)")

BG demographic medians by pipeline stage (parcel-weighted)
stage	n_parcels	n_with_bg	median_pct_white	median_pct_black	median_income	median_hv
Flooded	202623	201923	74.1	12.4	43640	129500
Eligible	141253	140574	72.5	13.2	44549	140700
Applied	5866	5853	53.6	34.7	38325	98000
Funded	4706	4702	48.1	41.2	31382	87400

Reproduction targets (from Julian’s draft):

Panel A (% NH White): Flooded ~74.9, Eligible ~74.1, In-a-community ~75.2, Applied ~64.1, Funded ~59.3
Funded median below Flooded by ~15pp → cleaner shift than the EIF version

If median_pct_white for Applied lands near 64 and for Funded near 59, the block-group switch reproduces Julian’s qualitative story. The above table is parcel-weighted (each parcel contributes its BG’s race share); Step 8’s Fig. 4 will compute the cell-vs-unique-block-group version, which will be slightly different.

6 CoreLogic-vs-ACS dollar-level value diagnostic

The ACS B25077 median home value is in dollars (unlike the EIF vigintiles), so we can do the cell-vs-cell dollar comparison Miyuki originally expected. Per block group:

ACS median home value (2013 dollars, owner-occupied units)
Median CoreLogic TOTAL.VALUE.CALCULATED across our 1–4 family residential parcels (2022 dollars)

The vintage mismatch (ACS 2013 vs CL 2022) will bias the ratio upward — NC home values appreciated ~30–40% over that decade. We document the agreement in ratio and correlation terms.

In [12]:

Show / hide code

cl_per_bg <- master_with_bg |>
  st_drop_geometry() |>
  filter(fr_1_4 == 1L,
         interpolate == 0L,                         # raw CoreLogic, no interpolation
         !is.na(TOTAL.VALUE.CALCULATED),
         as.numeric(TOTAL.VALUE.CALCULATED) > 0,
         !is.na(bg_geoid)) |>
  group_by(bg_geoid) |>
  summarise(
    cl_n_parcels  = dplyr::n(),
    cl_median_tvc = median(as.numeric(TOTAL.VALUE.CALCULATED), na.rm = TRUE),
    .groups       = "drop"
  )

cmp <- inner_join(
  cl_per_bg |> filter(cl_n_parcels >= 5),
  bg_2013   |> st_drop_geometry() |>
    select(bg_geoid, bg_median_home_value_2013) |>
    filter(!is.na(bg_median_home_value_2013)),
  by = "bg_geoid"
)

cat("BGs with both CL and ACS medians:",
    format(nrow(cmp), big.mark = ","), "\n")

BGs with both CL and ACS medians: 4,369

Show / hide code

cmp_summary <- cmp |>
  summarise(
    spearman_rho     = cor(cl_median_tvc, bg_median_home_value_2013,
                            method = "spearman"),
    pearson_log      = cor(log(cl_median_tvc), log(bg_median_home_value_2013)),
    median_cl        = median(cl_median_tvc),
    median_acs       = median(bg_median_home_value_2013),
    median_ratio_cl_acs = median(cl_median_tvc / bg_median_home_value_2013)
  )
print(cmp_summary)

# A tibble: 1 × 5
  spearman_rho pearson_log median_cl median_acs median_ratio_cl_acs
         <dbl>       <dbl>     <dbl>      <dbl>               <dbl>
1        0.858       0.814    122265     130400               0.954

Show / hide code

write_csv(cmp, cmp_out_path)
cat("Wrote:", cmp_out_path, "\n")

Wrote: /proj/mhinolab/users/rbless/data/Obstacles_Output/acs_corelogic_value_comparison.csv

Output schema additions to `parcels_master.gpkg`

Column	Type	Source
`bg_geoid`	char (12-digit GEOID)	spatial join (Section 3)
`bg_pop_total_2013`	int	ACS B03002_001
`bg_pct_white_2013`	dbl, 0–100	derived: B03002_003 / B03002_001 × 100
`bg_pct_black_2013`	dbl, 0–100	derived: B03002_004 / B03002_001 × 100
`bg_pct_hispanic_2013`	dbl, 0–100	derived: B03002_012 / B03002_001 × 100
`bg_pct_asian_2013`	dbl, 0–100	derived: B03002_006 / B03002_001 × 100
`bg_pct_aian_2013`	dbl, 0–100	derived: B03002_005 / B03002_001 × 100
`bg_median_income_2013`	int (dollars)	ACS B19013_001
`bg_median_home_value_2013`	int (dollars)	ACS B25077_001

Notes

Block groups follow 2010 Census geography because the 2013 ACS 5-year is published against that geography. If you later want to use 2020-vintage ACS, you’ll need 2020 BG boundaries and a crosswalk if comparing both vintages.
ACS median home value (B25077) is owner-occupied housing units only. Renter-occupied units don’t contribute. This matches the conceptual scope of CoreLogic parcel values (assessor-recorded ownership) reasonably well.
Some block groups have suppressed values for median income or home value when the underlying counts are too small. These appear as NA in the demographic columns — propagating to any parcel in those BGs.
NA bg_geoid parcels are likely off-coast (over water) or in BG-edge positions that the spatial join missed. Investigate the count printed in Section 3 before publishing; if it’s >1% of parcels, may need a fallback rule (nearest-BG via st_nearest_feature).

Step 7b: Join 2013 ACS Block-Group Demographics into Master

Overview

1 Identify study-area counties

2 Pull 2013 ACS demographics + block-group geometries

3 Spatial join parcels to block groups

4 Update parcels_master.gpkg in place

5 Sanity checks

6 CoreLogic-vs-ACS dollar-level value diagnostic

Output schema additions to parcels_master.gpkg

Notes

Output schema additions to `parcels_master.gpkg`