This notebook maps Helena’s pre-existing flood event data onto R’s parcel_index, producing per-parcel flood and eligibility counts that approximately match Julian’s reported 67.5% eligibility ratio.
Data source decision. We use the pre-aggregated parcels_events.csv produced by Julian and crosswalk it to R’s parcel_index via (cntyfips, parno).
Inputs:
parcels_pri.gpkg — R’s parcels (from Step 5)
/proj/.../Raw Data Inputs/parcels_study_area.csv — Python’s parcel index registry, used to build the (cntyfips, parno) crosswalk
/proj/.../Intermediary Data/parcels_events.csv — pre-aggregated flood and eligibility columns per Python parcel_index, from Julian
Outputs (written to out_dir):
parcel_crosswalk_R_Py.csv — Python parcel_index ↔︎ R parcel_index crosswalk for future reference
parcels_events_R.csv — final per-parcel flood and eligibility table, keyed on R’s parcel_index, ready to join to parcels_pri.gpkg
Methodology
Why a crosswalk is necessary. Python’s parcel_index was created as the row position in pandas’s read of the source NC1 file, after Python-specific filtering and deduplication. R’s parcel_index was created independently during the R Step 1 pipeline. The two index sets diverge: R has ~29k more parcels, and the same numeric index can refer to different parcels in the two pipelines.
Crosswalk key choice. (cntyfips, parno) is the most stable real-world parcel identifier across both pipelines. This notebook uses a normalized parno (alphanumeric only, matching Step 3’s strip_alnum) for the primary crosswalk, and altparno as a fallback for R parcels that did not match on parno.
Flood and eligibility columns. The Original parcels_events.csv already contains pre-computed flooded and eligible counts (and per-event _max columns) following Python Step 12’s logic. This notebook carries those columns over unchanged to R’s parcel_index space via the crosswalk.
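The primary-then-fallback matching described above can be sketched in base R as follows. This is a minimal illustration on toy data, not the notebook's actual code: the strip_alnum body is an assumed stand-in for the Step 3 helper, and the column names parcel_index_r, parcel_index_py, key_parno and key_alt, along with all parcel values, are hypothetical.

```r
# Assumed stand-in for Step 3's strip_alnum: keep letters and digits only.
strip_alnum <- function(x) {
  x <- as.character(x)
  out <- gsub("[^[:alnum:]]", "", x)
  out[is.na(x)] <- NA_character_
  out
}

# Toy R-side and Python-side parcel tables (illustrative values only).
r_parcels <- data.frame(
  parcel_index_r = 1:4,
  cntyfips = c("37001", "37001", "37003", "37003"),
  parno    = c("12-345 A", "99.100", "0007", "8888"),
  altparno = c(NA, "55-200", NA, NA),
  stringsAsFactors = FALSE
)
py_parcels <- data.frame(
  parcel_index_py = c(10L, 11L, 12L),
  cntyfips = c("37001", "37001", "37003"),
  parno    = c("12345A", "55200", "0007"),
  stringsAsFactors = FALSE
)

# Composite (cntyfips, normalized parno) key; missing or empty parno gives NA.
make_key <- function(fips, parno) {
  ifelse(is.na(parno) | parno == "", NA_character_, paste(fips, parno, sep = "|"))
}

py_key <- make_key(py_parcels$cntyfips, strip_alnum(py_parcels$parno))
key_parno <- make_key(r_parcels$cntyfips, strip_alnum(r_parcels$parno))
key_alt   <- make_key(r_parcels$cntyfips, strip_alnum(r_parcels$altparno))

# Primary match on parno; NA keys are never allowed to match each other.
hit <- match(key_parno, py_key, incomparables = NA)

# Fallback: retry unmatched R parcels on altparno.
miss <- is.na(hit)
hit[miss] <- match(key_alt[miss], py_key, incomparables = NA)

crosswalk <- data.frame(
  parcel_index_r  = r_parcels$parcel_index_r,
  parcel_index_py = py_parcels$parcel_index_py[hit]
)
print(crosswalk)
```

In this toy case the second R parcel is recovered only through altparno, and the fourth stays unmatched, which mirrors the situation where R holds parcels that the Python registry does not.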
In [1]:
library(sf)
Linking to GEOS 3.12.0, GDAL 3.11.0, PROJ 9.2.1; sf_use_s2() is TRUE
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
Python’s aggregated event CSV is keyed on Python’s parcel_index. We need to translate to R’s parcel_index so the flood data can be joined back to parcels_pri.gpkg. The join key is (cntyfips, parno), normalized alphanumerically to handle formatting differences.
Crosswalk saved to: /proj/mhinolab/users/rbless/data/Obstacles_Output/parcel_crosswalk_R_Py.csv
2 Crosswalk the Original aggregated event data to R parcel space
Read parcels_events.csv and translate its Python parcel_index keys to R’s parcel_index via the crosswalk built above. The flooded and eligible columns in this file are pre-computed and used as-is.
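As a rough sketch of this translation step, the base R snippet below joins a toy event table to a toy crosswalk and computes the match rate the same way the notebook reports it. The object names py_events_aggregated and post_merge follow the notebook; the crosswalk column names and all values are assumptions made for illustration.

```r
# Toy crosswalk: Python parcel_index -> R parcel_index (hypothetical columns).
crosswalk <- data.frame(
  parcel_index_py = c(10L, 11L, 12L),
  parcel_index_r  = c(1L, 2L, 3L)
)

# Toy event table keyed on Python's parcel_index; counts arrive as character.
py_events_aggregated <- data.frame(
  parcel_index_py = c(10L, 11L, 12L, 13L),
  flooded_any = c("1", "0", "2", "1"),
  eligible    = c("1", "0", "0", "1"),
  stringsAsFactors = FALSE
)

# A duplicated crosswalk key would duplicate event rows, so check it first.
stopifnot(!anyDuplicated(crosswalk$parcel_index_py))

# Inner join: event rows without an R counterpart are dropped.
post_merge <- merge(py_events_aggregated, crosswalk, by = "parcel_index_py")

match_rate <- 100 * nrow(post_merge) / nrow(py_events_aggregated)
cat(sprintf("Crosswalk match rate: %.1f%%\n", match_rate))
```

The inner join is a deliberate choice in this sketch: the match rate then measures exactly how many Python-keyed rows survive into R parcel space.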
cat("Crosswalk match rate: ", sprintf("%.1f%%", 100 * nrow(post_merge) / nrow(py_events_aggregated)), "\n")
Crosswalk match rate: 96.5%
3 Sanity check vs paper
In [8]:
# Coerce flooded / eligible to integer; they were read as character
post_merge <- post_merge |>
  mutate(
    flooded  = as.integer(flooded_any),
    eligible = as.integer(eligible)
  )

flooded_n  <- sum(post_merge$flooded > 0, na.rm = TRUE)
eligible_n <- sum(post_merge$eligible > 0, na.rm = TRUE)

# Residential-only subset (fr_1_4 == "1" in the Original CSV)
res_rows <- post_merge |>
  filter(as.integer(fr_1_4) == 1)

# Use the coerced integer column here, not the character flooded_any
res_flooded_n  <- sum(res_rows$flooded > 0, na.rm = TRUE)
res_eligible_n <- sum(res_rows$eligible > 0, na.rm = TRUE)

sanity_tbl <- tibble(
  Metric = c(
    "Total parcels (crosswalked)",
    "Parcels flooded ≥ 1 time",
    "Parcels eligible ≥ 1 declared event",
    "Eligible / Flooded — all parcels",
    "Eligible / Flooded — residential only (fr_1_4 == 1)",
    "Paper reports"
  ),
  Value = c(
    scales::comma(nrow(post_merge)),
    scales::comma(flooded_n),
    scales::comma(eligible_n),
    sprintf("%.1f%%", 100 * eligible_n / flooded_n),
    sprintf("%.1f%%", 100 * res_eligible_n / res_flooded_n),
    "67.5%"
  )
)

knitr::kable(sanity_tbl, caption = "Sanity check: eligibility rate vs paper claim")
Sanity check: eligibility rate vs paper claim
| Metric | Value |
|:---|:---|
| Total parcels (crosswalked) | 4,041,245 |
| Parcels flooded ≥ 1 time | 372,779 |
| Parcels eligible ≥ 1 declared event | 256,376 |
| Eligible / Flooded — all parcels | 68.8% |
| Eligible / Flooded — residential only (fr_1_4 == 1) | 69.4% |
| Paper reports | 67.5% |
Expected: all-parcel ratio ~69.0%, residential ~69.4%. If either ratio is far from these values, check that aggregated_events_path points to the correct parcels_events.csv file.
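One way to make this check mechanical is to recompute the all-parcel ratio from the reported counts and compare it to the expected value within a tolerance. The counts below are the ones shown in the sanity table; the 1 percentage point tolerance is an arbitrary assumption for this sketch, not a threshold taken from the notebook.

```r
# Counts reported in the sanity-check table.
flooded_n  <- 372779
eligible_n <- 256376

# Expected all-parcel ratio from the text; tolerance is a hypothetical choice.
expected_all <- 69.0
tolerance_pp <- 1.0

ratio_all <- 100 * eligible_n / flooded_n
cat(sprintf("Eligible / Flooded, all parcels: %.1f%%\n", ratio_all))

# Warn rather than stop, so the rest of the notebook can still render.
if (abs(ratio_all - expected_all) > tolerance_pp) {
  warning("All-parcel ratio is far from the expected value; check aggregated_events_path.")
}
```

With the reported counts the ratio rounds to 68.8%, inside the assumed tolerance, so no warning is raised.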
Prerequisite. Run the Step 6 per-cell summary chunk first so eif_cells_summary.csv exists in out_dir. Without it, the cell demographics join silently produces NAs.
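Because the failure mode here is silent, an explicit guard at the top of the chunk may help. The sketch below is one possible form: check_prerequisite is a hypothetical helper name, and tempdir() stands in for the notebook's out_dir so the example runs anywhere.

```r
# Hypothetical helper: stop with a clear message if a required input is missing.
check_prerequisite <- function(path) {
  if (!file.exists(path)) {
    stop("Missing prerequisite: ", path,
         ". Run the Step 6 per-cell summary chunk first.", call. = FALSE)
  }
  invisible(path)
}

# Stand-in for out_dir; the real notebook defines its own output directory.
out_dir <- file.path(tempdir(), "obstacles_output_demo")
cells_path <- file.path(out_dir, "eif_cells_summary.csv")

# Demonstrate the guard without aborting the session.
result <- tryCatch(
  check_prerequisite(cells_path),
  error = function(e) conditionMessage(e)
)
print(result)
```

In the notebook itself the call would be made without tryCatch, so a missing file halts the chunk instead of letting the join fill the demographics columns with NAs.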
Centroids in WGS84 are the canonical parcel location. CoreLogic’s PARCEL.LEVEL.LATITUDE / LONGITUDE are not used — they have ~34% missingness for late-pipeline parcels (completed buyouts, voided addresses) and known precision issues. The EIF cell key uses Step 6’s convention (cell-center, 3-decimal rounded) so the join to eif_cells_summary.csv works directly.