Lab 2: Spatial Analysis and Visualization

Healthcare Access and Equity in Pennsylvania

Author

Wil Rantala

Published

March 17, 2026

Assignment Overview

Learning Objectives: - Apply spatial operations to answer policy-relevant research questions - Integrate census demographic data with spatial analysis - Create publication-quality visualizations and maps - Work with spatial data from multiple sources - Communicate findings effectively for policy audiences

Part 1: Healthcare Access for Vulnerable Populations

Research Question

Which Pennsylvania counties have the highest proportion of vulnerable populations (elderly + low-income) living far from hospitals?

Your analysis should identify counties that should be priorities for healthcare investment and policy intervention.

Required Analysis Steps

Complete the following analysis, documenting each step with code and brief explanations:

Step 1: Data Collection (5 points)

Load the required spatial data: - Pennsylvania county boundaries - Pennsylvania hospitals (from lecture data) - Pennsylvania census tracts

Your Task:

# Load required packages

library(sf)
library(tidyverse)
library(ggplot2)
library(tigris)
library(tidycensus)

# Load spatial data

pa_counties <- st_read("data/Pennsylvania_County_Boundaries.shp")

Reading layer `Pennsylvania_County_Boundaries' from data source 
  `/Users/wilrantala/Desktop/Wil-Rantala---CPLN-5920-Portfolio/labs/lab_2/data/Pennsylvania_County_Boundaries.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 67 features and 19 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -8963377 ymin: 4825316 xmax: -8314404 ymax: 5201413
Projected CRS: WGS 84 / Pseudo-Mercator

districts <- st_read("data/districts.geojson")

Reading layer `U.S._Congressional_Districts_for_Pennsylvania' from data source 
  `/Users/wilrantala/Desktop/Wil-Rantala---CPLN-5920-Portfolio/labs/lab_2/data/districts.geojson' 
  using driver `GeoJSON'
Simple feature collection with 17 features and 8 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: -80.51939 ymin: 39.71986 xmax: -74.68956 ymax: 42.26935
Geodetic CRS:  WGS 84

hospitals <- st_read("data/hospitals.geojson")

Reading layer `hospitals' from data source 
  `/Users/wilrantala/Desktop/Wil-Rantala---CPLN-5920-Portfolio/labs/lab_2/data/hospitals.geojson' 
  using driver `GeoJSON'
Simple feature collection with 223 features and 11 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -80.49621 ymin: 39.75163 xmax: -74.86704 ymax: 42.13403
Geodetic CRS:  WGS 84

hospitals <- hospitals %>%
  st_transform(st_crs(pa_counties))

census_tracts <- tracts(state = "PA", cb = TRUE)


  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |=                                                                     |   1%
  |                                                                            
  |=                                                                     |   2%
  |                                                                            
  |==                                                                    |   3%
  |                                                                            
  |===                                                                   |   4%
  |                                                                            
  |===                                                                   |   5%
  |                                                                            
  |====                                                                  |   5%
  |                                                                            
  |====                                                                  |   6%
  |                                                                            
  |=====                                                                 |   7%
  |                                                                            
  |======                                                                |   8%
  |                                                                            
  |======                                                                |   9%
  |                                                                            
  |=======                                                               |  10%
  |                                                                            
  |========                                                              |  11%
  |                                                                            
  |=========                                                             |  12%
  |                                                                            
  |=========                                                             |  13%
  |                                                                            
  |==========                                                            |  14%
  |                                                                            
  |===========                                                           |  15%
  |                                                                            
  |===========                                                           |  16%
  |                                                                            
  |============                                                          |  17%
  |                                                                            
  |=============                                                         |  18%
  |                                                                            
  |=============                                                         |  19%
  |                                                                            
  |==============                                                        |  20%
  |                                                                            
  |===============                                                       |  21%
  |                                                                            
  |===============                                                       |  22%
  |                                                                            
  |================                                                      |  23%
  |                                                                            
  |=================                                                     |  24%
  |                                                                            
  |=================                                                     |  25%
  |                                                                            
  |==================                                                    |  26%
  |                                                                            
  |===================                                                   |  27%
  |                                                                            
  |===================                                                   |  28%
  |                                                                            
  |====================                                                  |  29%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |======================                                                |  31%
  |                                                                            
  |======================                                                |  32%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |========================                                              |  34%
  |                                                                            
  |========================                                              |  35%
  |                                                                            
  |=========================                                             |  36%
  |                                                                            
  |==========================                                            |  37%
  |                                                                            
  |==========================                                            |  38%
  |                                                                            
  |===========================                                           |  39%
  |                                                                            
  |============================                                          |  40%
  |                                                                            
  |============================                                          |  41%
  |                                                                            
  |=============================                                         |  42%
  |                                                                            
  |==============================                                        |  42%
  |                                                                            
  |==============================                                        |  43%
  |                                                                            
  |===============================                                       |  44%
  |                                                                            
  |================================                                      |  45%
  |                                                                            
  |================================                                      |  46%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |==================================                                    |  48%
  |                                                                            
  |===================================                                   |  49%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |====================================                                  |  51%
  |                                                                            
  |=====================================                                 |  52%
  |                                                                            
  |=====================================                                 |  53%
  |                                                                            
  |======================================                                |  54%
  |                                                                            
  |=======================================                               |  55%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |========================================                              |  57%
  |                                                                            
  |=========================================                             |  58%
  |                                                                            
  |=========================================                             |  59%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |===========================================                           |  61%
  |                                                                            
  |===========================================                           |  62%
  |                                                                            
  |============================================                          |  63%
  |                                                                            
  |=============================================                         |  64%
  |                                                                            
  |=============================================                         |  65%
  |                                                                            
  |==============================================                        |  66%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |===============================================                       |  68%
  |                                                                            
  |================================================                      |  69%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |==================================================                    |  71%
  |                                                                            
  |==================================================                    |  72%
  |                                                                            
  |===================================================                   |  73%
  |                                                                            
  |====================================================                  |  74%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |=====================================================                 |  76%
  |                                                                            
  |======================================================                |  77%
  |                                                                            
  |======================================================                |  78%
  |                                                                            
  |=======================================================               |  79%
  |                                                                            
  |========================================================              |  80%
  |                                                                            
  |========================================================              |  81%
  |                                                                            
  |=========================================================             |  81%
  |                                                                            
  |==========================================================            |  82%
  |                                                                            
  |==========================================================            |  83%
  |                                                                            
  |===========================================================           |  84%
  |                                                                            
  |============================================================          |  85%
  |                                                                            
  |============================================================          |  86%
  |                                                                            
  |=============================================================         |  87%
  |                                                                            
  |==============================================================        |  88%
  |                                                                            
  |===============================================================       |  89%
  |                                                                            
  |===============================================================       |  90%
  |                                                                            
  |================================================================      |  91%
  |                                                                            
  |=================================================================     |  92%
  |                                                                            
  |=================================================================     |  93%
  |                                                                            
  |==================================================================    |  94%
  |                                                                            
  |===================================================================   |  95%
  |                                                                            
  |===================================================================   |  96%
  |                                                                            
  |====================================================================  |  97%
  |                                                                            
  |===================================================================== |  98%
  |                                                                            
  |===================================================================== |  99%
  |                                                                            
  |======================================================================| 100%

census_tracts <- census_tracts %>%
  st_transform(st_crs(pa_counties))

# Check that all data loaded correctly: Run glimpse() on all the loaded data


#checking the reference system -- hospitals and tracts should already match pa_counties' reference system since they've been transformed

### Ran st_crs() on all the loaded data

Questions to answer: - How many hospitals are in your dataset? - How many census tracts? - What coordinate reference system is each dataset in?

There are 223 hospitals, 3445 census tracts, and all of the datasets are in WGS 84 (world geodetic system 1984)

Step 2: Get Demographic Data

Use tidycensus to download tract-level demographic data for Pennsylvania.

Required variables: - Total population - Median household income - Population 65 years and over (you may need to sum multiple age categories)

Your Task:

# Get demographic data from ACS
pa_demographics <- get_acs(
  geography = "tract",
  state = "PA",
  variables = c(
    total_pop = "B01001_001",
    median_income = "B19013_001",
    #To get 65+, I'm summing male and female populations by age ranges 65-85+
    male_65_66 = "B01001_020",
    male_67_69 = "B01001_021",
    male_70_74 = "B01001_022",
    male_75_79 = "B01001_023",
    male_80_84 = "B01001_024",
    male_85_over = "B01001_025",
    
    female_65_66 = "B01001_044",
    female_67_69 = "B01001_045",
    female_70_74 = "B01001_046",
    female_75_79 = "B01001_047",
    female_80_84 = "B01001_048",
    female_85_over = "B01001_049"
  ),
  year = 2024,
  output = "wide"
)

# Here, I'm mutating to make pop_65_over a single column

pa_demographics <- pa_demographics %>%
  mutate(
    pop_65_overE = male_65_66E + male_67_69E + male_70_74E +
      male_75_79E + male_80_84E + male_85_overE + female_65_66E +
      female_67_69E + female_70_74E + female_75_79E +
      female_80_84E+ female_85_overE,
    
    pop_65_overM = male_65_66M + male_67_69M + male_70_74M +
      male_75_79M + male_80_84M + male_85_overM + female_65_66M +
      female_67_69M + female_70_74M + female_75_79M +
      female_80_84M+ female_85_overM
  )

# Join to tract boundaries
census_tracts <- census_tracts %>%
  left_join(pa_demographics, by = "GEOID")

#find missing data and median income
sum(is.na(census_tracts$median_incomeE))

[1] 68

median(census_tracts$median_incomeE, na.rm = TRUE)

[1] 74844

Questions to answer: - What year of ACS data are you using? - How many tracts have missing income data? - What is the median income across all PA census tracts?

I’m using 2024 5-year ACS data, as it is the most recent available. There are 68 tracts with missing income data. The median income across all PA census tracts (excluding those without income data) is $74,844

Step 3: Define Vulnerable Populations

Identify census tracts with vulnerable populations based on TWO criteria: 1. Low median household income (choose an appropriate threshold) 2. Significant elderly population (choose an appropriate threshold)

Your Task:

# Filter for vulnerable tracts based on your criteria
vulnerable_tracts <- census_tracts %>%
  mutate(
    pct_65_over = pop_65_overE / total_popE * 100
  ) %>%
  filter(
    median_incomeE < 35000|
    pct_65_over > 25 
  )
#These are tracts in which the median income is less than 40,000, or the percent of population aged 65 and up is greater than 25%

Questions to answer: - What income threshold did you choose and why? - What elderly population threshold did you choose and why? - How many tracts meet your vulnerability criteria? - What percentage of PA census tracts are considered vulnerable by your definition?

I chose $35,000 as the threshold as it is a clean number well below the median tract median income of ~75,000. I could have chosen a lower number around 35,000, but that would exclude many households that are vulnerable but not in abject poverty.

I chose 25% as the elderly population threshold as the national average is 17%. Comparatively, a quarter of a tract’s population being over the age of 65 is a large proportion.

There are 847 tracts within both of the vulnerability criteria I set.

24.6% of PA tracts are considered vulnerable by my definition

Step 4: Calculate Distance to Hospitals

For each vulnerable tract, calculate the distance to the nearest hospital.

Your Task:

# Transform to appropriate projected CRS
## I transformed to PA State Plane South so we can measure distances in feet to later convert to miles.

vulnerable_tracts_proj <- vulnerable_tracts %>%
  st_transform(2272)

hosputals_proj <- hospitals %>%
  st_transform(2272)


# Calculate distance from each tract centroid to nearest hospital
## First I'll get the tract centroids
tract_centroids <- st_centroid(vulnerable_tracts_proj)

## Now I'll make a matrix that has the distances from centroids to hospitals and select the nearest ones, then convert feet to miles
hospital_dists <- st_distance(tract_centroids, hosputals_proj)

vulnerable_tracts_proj$nearest_hospital_dist <-
  apply(hospital_dists, 1, min) / 5280

Requirements: - Use an appropriate projected coordinate system for Pennsylvania - Calculate distances in miles - Explain why you chose your projection

Questions to answer: - What is the average distance to the nearest hospital for vulnerable tracts? - What is the maximum distance? - How many vulnerable tracts are more than 15 miles from the nearest hospital?

For vulnerable tracts, the average distance to the nearest hospital is 4.7 miles away from the tract centroid. The maximum distance is 29.1 miles for tract 9501.04 in Pike County. There are 29 vulnerable tracts with centroids greater than 15 miles from a hospital. I chose the WGS83, Pennsylvania South projection as it uses feet, which is simple to convert into miles.

Step 5: Identify Underserved Areas

Define “underserved” as vulnerable tracts that are more than 15 miles from the nearest hospital.

Your Task:

# Create underserved variable

vulnerable_tracts_proj <- vulnerable_tracts_proj %>%
  mutate(underserved = nearest_hospital_dist > 15)

sum(vulnerable_tracts_proj$underserved)

[1] 29

Questions to answer: - How many tracts are underserved? - What percentage of vulnerable tracts are underserved? - Does this surprise you? Why or why not?

There are 29 underserved vulnerable tracts. That’s 3.4%. That’s much lower than I thought – I was expecting to more vulnerable tracts in the undreserved category based on my assumption that poorer, more elderly areas often have limited access to necessities.

Step 6: Aggregate to County Level

Use spatial joins and aggregation to calculate county-level statistics about vulnerable populations and hospital access.

Your Task:

# Spatial join tracts to counties
## First, I have to project the counties to fit the tract and hospital data.

pa_counties_proj <- pa_counties %>%
  st_transform(2272)

## NOW I can do the join
vulnerable_w_county <- vulnerable_tracts_proj %>%
  st_join(pa_counties_proj, join = st_within)

## It gave me some NAs for COUNTY_NAM when I tried using st_within -- I think because tracts were on the edges of counties? To fix it, I'll take the county names from the NAMELSADCO column that already exists.

vulnerable_w_county <- vulnerable_tracts_proj %>%
  mutate(COUNTY_NAM = str_remove(NAMELSADCO, " County"))

# Aggregate statistics by county

county_stats <- vulnerable_w_county %>%
  st_drop_geometry() %>% ## dropping geometry to avoid issues with grouping and summarizing
  group_by(COUNTY_NAM) %>%
  summarise(
    n_vulnerable_tracts = n(),
    n_underserved = sum(underserved, na.rm = TRUE),
    pct_underserved = mean(underserved, na.rm = TRUE) * 100,
    avg_dist_to_hospital = mean(nearest_hospital_dist, na.rm = TRUE),
    total_vulnerable_pop = sum(total_popE, na.rm = TRUE)
  )

Required county-level statistics: - Number of vulnerable tracts - Number of underserved tracts
- Percentage of vulnerable tracts that are underserved - Average distance to nearest hospital for vulnerable tracts - Total vulnerable population

Questions to answer: - Which 5 counties have the highest percentage of underserved vulnerable tracts? - Which counties have the most vulnerable people living far from hospitals? - Are there any patterns in where underserved counties are located?

The five counties with the highest percentages of underserved vulnerable tracts are: Bradford (100%), Cameron (100%), Forest (100%), Sullivan (100%), and Pike County (58%)

The counties with the most vulnerable people living far from hospitals are Lancaster, Chester, Erie, Dauphin, and Somerset Counties. They don’t all have very high underserved percentages, but their county populations are rather high.

The most underserved counties (percentage-wise) are often ones with the lower populations, and not near major city centers.

Step 7: Create Summary Table

Create a professional table showing the top 10 priority counties for healthcare investment.

Your Task:

# Create and format priority counties table

library(knitr)

### I want to prioritize based on percent underserved, distance to hospital, and total vulnerable population (in that order).

county_stats %>%
  filter(n_vulnerable_tracts > 0) %>%
  mutate(
    priority_score = pct_underserved * 0.5 +
      (avg_dist_to_hospital / max(avg_dist_to_hospital)) * 100 * 0.3 +
      (total_vulnerable_pop / max(total_vulnerable_pop)) * 100 * 0.2
  ) %>%
  
  ## Now that I've made the priority score, I'll arrange a table with the top 10 priority counties with descriptive names, rounded numbers, and commas for large numbers.
  
  arrange(desc(priority_score)) %>%
  slice(1:10) %>%
  mutate(
    total_vulnerable_pop = formatC(total_vulnerable_pop, format = "d", big.mark = ","),
    pct_underserved = round(pct_underserved, 1),
    avg_dist_to_hospital = round(avg_dist_to_hospital, 1)
  ) %>%
  
  ## Now I'll rename the columns with descriptive titles
  
select(
  County = COUNTY_NAM,
  `Vulnerable Tracts` = n_vulnerable_tracts,
  `Underserved Tracts` = n_underserved,
  `Percent Underserved` = pct_underserved,
  `Avg. Distance to Hospital (mi)` = avg_dist_to_hospital,
  `Vulnerable Population` = total_vulnerable_pop
  ) %>%
  
## Now I'll make the kable to display
  
  kable(
    caption = "Ten Pennsylvania Counties Prioritized for Healthcare Investment, ranked by compositing underserved population, distances to hospitals, and vulnerable populations",
    format.args = list(big.mark = ",")
  )

Ten Pennsylvania Counties Prioritized for Healthcare Investment, ranked by compositing underserved population, distances to hospitals, and vulnerable populations
County	Vulnerable Tracts	Underserved Tracts	Percent Underserved	Avg. Distance to Hospital (mi)	Vulnerable Population
Sullivan	4	4	100.0	22.7	5,888
Cameron	1	1	100.0	19.5	2,452
Forest	1	1	100.0	18.1	2,552
Bradford	1	1	100.0	16.7	5,171
Pike	12	7	58.3	17.2	19,162
Huntingdon	5	2	40.0	15.2	12,675
Union	3	1	33.3	8.0	11,300
Wayne	10	2	20.0	11.3	24,884
Clearfield	4	1	25.0	9.5	16,867
Potter	5	1	20.0	9.7	10,887

Requirements: - Use knitr::kable() or similar for formatting - Include descriptive column names - Format numbers appropriately (commas for population, percentages, etc.) - Add an informative caption - Sort by priority (you decide the metric)

Part 2: Comprehensive Visualization

Using the skills from Week 3 (Data Visualization), create publication-quality maps and charts.

Map 1: County-Level Choropleth

Create a choropleth map showing healthcare access challenges at the county level.

Your Task:

# Create county-level access map

## First I'll join county_stats and pa_counties_proj to a new dataset. I need to change COUNTY_NAM in county_stats to uppercase first so they match up.

county_stats <- county_stats %>%
  mutate(COUNTY_NAM = str_to_upper(COUNTY_NAM))

county_map_data <- pa_counties_proj %>%
  left_join(county_stats, by = "COUNTY_NAM")

## Now I'll make the map using ggplot2 -- I fiddled around with colors, line widths, and viridis color palettes to find what I thought looked best. I made sure to account for any NAs that may pop up.

ggplot() +
  geom_sf(data = county_map_data,
          aes(fill = pct_underserved),
          color = "white", linewidth = 0.3) +
  geom_sf(data = hosputals_proj,
          shape = 3,
          size = 1.5,
          color = "red")+
  scale_fill_viridis_c(
    name = "Percent Underserved",
    option = "magma",
    direction = -1,
    na.value = "grey70",
    labels = function(x) paste0(x, "%")
  ) +
  labs(
    title = "Access to Healthcare in Pennsylvania",
    Subtitle = "Percent of vulnerable tracts 15+ miles from the nearest hospital by county",
    caption = "Vulnerable tracts: median income < $35,000 ; elderly population > 25%. \nHospitals marked as red crosses."
  ) +
  theme_void() +
  theme(
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    plot.subtitle = element_text(size = 10, hjust = 0.5, color = "grey45"),
    plot.caption = element_text(size = 8, color = "grey55"),
    legend.position = "right",
    legend.title = element_text(size = 9),
    legend.text = element_text(size = 8),
    plot.margin = margin(10, 10, 10, 10)
  )

Requirements: - Fill counties by percentage of vulnerable tracts that are underserved - Include hospital locations as points - Use an appropriate color scheme - Include clear title, subtitle, and caption - Use theme_void() or similar clean theme - Add a legend with formatted labels

Map 2: Detailed Vulnerability Map

Create a map highlighting underserved vulnerable tracts.

Your Task:

# Create detailed tract-level map

ggplot() +
  geom_sf(data = pa_counties_proj,
          fill = "grey90", color = "black", linewidth = 0.3) +
  ### Mapped the counties just to have their outlines for better spatial comprehension. 
  ### Next, I'll map vulnerable, adequately served and underserved tracts in two different colors (and make them a little see-through to see the counties underneath)
  
  geom_sf(data = vulnerable_tracts_proj %>% filter(!underserved),
          fill = "burlywood", color = NA, alpha = 0.7) +
  
  geom_sf(data = vulnerable_tracts_proj %>% filter(underserved),
          fill = "darkmagenta", color = NA, alpha = 0.7) +
  
  geom_sf(data = hosputals_proj,
          shape = 3, size = 1.5, color = "red", alpha = 0.7) +
  
  labs(title = "Pennsylvania Vulnerable Census Tracts by Healthcare Access",
       subtitle = "PA Vulnerable tracts by access to hospitals with county and hospital locations") +
  theme_void()

Requirements: - Show underserved vulnerable tracts in a contrasting color - Include county boundaries for context - Show hospital locations - Use appropriate visual hierarchy (what should stand out?) - Include informative title and subtitle

Chart: Distribution Analysis

Create a visualization showing the distribution of distances to hospitals for vulnerable populations.

Your Task:

# Create distribution visualization
### I've elected to do a scatterplot of distance vs. vulnerable population size

ggplot(vulnerable_tracts_proj, aes(x = nearest_hospital_dist, y = total_popE)) +
  geom_point(color = "darkmagenta") +
  scale_x_continuous(labels = function(x) paste0(x, " mi")) +
  labs(
    title = "Distance to Hospital vs. Population Size of Vulnerable Census Tracts in PA",
    subtitle = "Each point = one census tract",
    x = "Distance to nearest hospital",
    y = "Census Tract Population"
  )

Suggested chart types: - Histogram or density plot of distances - Box plot comparing distances across regions - Bar chart of underserved tracts by county - Scatter plot of distance vs. vulnerable population size

Requirements: - Clear axes labels with units - Appropriate title - Professional formatting - Brief interpretation (1-2 sentences as a caption or in text)

Whereas at low hospital distances, tract population ranges from 0 to nearly 8,000, once hopsital distances surpass 20 miles, only tracts with populations between 1,000 and 3,000 remain. The scatterplot suggests at lower distances, the tract’s population matters little, but at higher distances, tract populations are far more likely to be smalller than average in the state.

Part 3: Bring Your Own Data Analysis

Choose your own additional spatial dataset and conduct a supplementary analysis.

Challenge Options

Choose ONE of the following challenge exercises, or propose your own research question using OpenDataPhilly data (https://opendataphilly.org/datasets/).

Note these are just loose suggestions to spark ideas - follow or make your own as the data permits and as your ideas evolve. This analysis should include bringing in your own dataset, ensuring the projection/CRS of your layers align and are appropriate for the analysis (not lat/long or geodetic coordinate systems). The analysis portion should include some combination of spatial and attribute operations to answer a relatively straightforward question

Environmental Justice

Option C: Green Space Equity - Data: Parks, Street Trees, Census tracts (race/income demographics) - Question: “Do low-income and minority neighborhoods have equitable access to green space?” - Operations: Buffer parks (10-minute walk = 0.5 mile), calculate tree canopy or park acreage per capita, compare by demographics - Policy relevance: Climate resilience, environmental justice, urban forestry investment

Data Sources

OpenDataPhilly: https://opendataphilly.org/datasets/ - Most datasets available as GeoJSON, Shapefile, or CSV with coordinates - Always check the Metadata for a data dictionary of the fields.

Additional Sources: - Pennsylvania Open Data: https://data.pa.gov/ - Census Bureau (via tidycensus): Demographics, economic indicators, commute patterns - TIGER/Line (via tigris): Geographic boundaries

Recommended Starting Points

If you’re feeling confident: Choose an advanced challenge with multiple data layers. If you are a beginner, choose something more manageable that helps you understand the basics

If you have a different idea: Propose your own question! Just make sure: - You can access the spatial data - You can perform at least 2 spatial operations

Your Analysis

Your Task:

Find and load additional data
- Document your data source
- Check and standardize the CRS
- Provide basic summary statistics

# Load your additional dataset

philly_demo <- get_acs(
  geography = "tract",
  state = "PA",
  county = "Philadelphia",
  variables = c(
    total_pop       = "B01003_001",  # Total population
    median_income   = "B19013_001",  # Median household income
    white           = "B02001_002",  # White alone
    black           = "B02001_003",  # Black or African American alone
    asian           = "B02001_005",  # Asian alone
    hispanic        = "B03003_003"   # Hispanic or Latino
  ),
  year = 2024,
  output = "wide"
)

## Now we've got our demographic data for philly census tracts. Let's make some percent population columns for racial data.

philly_demo <- philly_demo %>%
  mutate(
    pct_white = whiteE / total_popE * 100,
    pct_black = blackE / total_popE * 100,
    pct_asian = asianE / total_popE * 100,
    pct_hispanic = hispanicE / total_popE * 100,
    majority_minority = (pct_white < 50)
  )

philly_tracts <- census_tracts %>%
  filter(COUNTYFP == "101") %>%
  left_join(philly_demo, by = "GEOID") %>%
  st_transform(2272)

## Had to fix the variable names because there were duplicates
philly_tracts <- philly_tracts %>%
  mutate(
    total_pop = total_popE.x,
    median_income = median_incomeE.x
  ) %>%
  select(-ends_with(".x"), -ends_with(".y"), -ends_with("M"),
         -NAME)

## Now let's load the data from OpenDataPhilly for trees and parks. Once loaded, we also have to transform them to be in the same projection.

philly_trees <- st_read("data/ppr_tree_inventory_2025.shp")

Reading layer `ppr_tree_inventory_2025' from data source 
  `/Users/wilrantala/Desktop/Wil-Rantala---CPLN-5920-Portfolio/labs/lab_2/data/ppr_tree_inventory_2025.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 151726 features and 6 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -8380434 ymin: 4847791 xmax: -8344373 ymax: 4885938
Projected CRS: WGS 84 / Pseudo-Mercator

philly_parks <- st_read("data/PPR_Properties.shp")

Reading layer `PPR_Properties' from data source 
  `/Users/wilrantala/Desktop/Wil-Rantala---CPLN-5920-Portfolio/labs/lab_2/data/PPR_Properties.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 504 features and 25 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -8380525 ymin: 4847239 xmax: -8344359 ymax: 4885123
Projected CRS: WGS 84 / Pseudo-Mercator

philly_trees <- philly_trees %>%
  st_transform(2272)

philly_parks <- philly_parks %>%
  st_transform(2272)

st_crs(philly_parks)

Coordinate Reference System:
  User input: EPSG:2272 
  wkt:
PROJCRS["NAD83 / Pennsylvania South (ftUS)",
    BASEGEOGCRS["NAD83",
        DATUM["North American Datum 1983",
            ELLIPSOID["GRS 1980",6378137,298.257222101,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4269]],
    CONVERSION["SPCS83 Pennsylvania South zone (US survey foot)",
        METHOD["Lambert Conic Conformal (2SP)",
            ID["EPSG",9802]],
        PARAMETER["Latitude of false origin",39.3333333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",-77.75,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",40.9666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",39.9333333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",1968500,
            LENGTHUNIT["US survey foot",0.304800609601219],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",0,
            LENGTHUNIT["US survey foot",0.304800609601219],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["easting (X)",east,
            ORDER[1],
            LENGTHUNIT["US survey foot",0.304800609601219]],
        AXIS["northing (Y)",north,
            ORDER[2],
            LENGTHUNIT["US survey foot",0.304800609601219]],
    USAGE[
        SCOPE["Engineering survey, topographic mapping."],
        AREA["United States (USA) - Pennsylvania - counties of Adams; Allegheny; Armstrong; Beaver; Bedford; Berks; Blair; Bucks; Butler; Cambria; Chester; Cumberland; Dauphin; Delaware; Fayette; Franklin; Fulton; Greene; Huntingdon; Indiana; Juniata; Lancaster; Lawrence; Lebanon; Lehigh; Mifflin; Montgomery; Northampton; Perry; Philadelphia; Schuylkill; Snyder; Somerset; Washington; Westmoreland; York."],
        BBOX[39.71,-80.53,41.18,-74.72]],
    ID["EPSG",2272]]

st_crs(philly_trees)

Coordinate Reference System:
  User input: EPSG:2272 
  wkt:
PROJCRS["NAD83 / Pennsylvania South (ftUS)",
    BASEGEOGCRS["NAD83",
        DATUM["North American Datum 1983",
            ELLIPSOID["GRS 1980",6378137,298.257222101,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4269]],
    CONVERSION["SPCS83 Pennsylvania South zone (US survey foot)",
        METHOD["Lambert Conic Conformal (2SP)",
            ID["EPSG",9802]],
        PARAMETER["Latitude of false origin",39.3333333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",-77.75,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",40.9666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",39.9333333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",1968500,
            LENGTHUNIT["US survey foot",0.304800609601219],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",0,
            LENGTHUNIT["US survey foot",0.304800609601219],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["easting (X)",east,
            ORDER[1],
            LENGTHUNIT["US survey foot",0.304800609601219]],
        AXIS["northing (Y)",north,
            ORDER[2],
            LENGTHUNIT["US survey foot",0.304800609601219]],
    USAGE[
        SCOPE["Engineering survey, topographic mapping."],
        AREA["United States (USA) - Pennsylvania - counties of Adams; Allegheny; Armstrong; Beaver; Bedford; Berks; Blair; Bucks; Butler; Cambria; Chester; Cumberland; Dauphin; Delaware; Fayette; Franklin; Fulton; Greene; Huntingdon; Indiana; Juniata; Lancaster; Lawrence; Lebanon; Lehigh; Mifflin; Montgomery; Northampton; Perry; Philadelphia; Schuylkill; Snyder; Somerset; Washington; Westmoreland; York."],
        BBOX[39.71,-80.53,41.18,-74.72]],
    ID["EPSG",2272]]

st_crs(philly_tracts)

Coordinate Reference System:
  User input: EPSG:2272 
  wkt:
PROJCRS["NAD83 / Pennsylvania South (ftUS)",
    BASEGEOGCRS["NAD83",
        DATUM["North American Datum 1983",
            ELLIPSOID["GRS 1980",6378137,298.257222101,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4269]],
    CONVERSION["SPCS83 Pennsylvania South zone (US survey foot)",
        METHOD["Lambert Conic Conformal (2SP)",
            ID["EPSG",9802]],
        PARAMETER["Latitude of false origin",39.3333333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8821]],
        PARAMETER["Longitude of false origin",-77.75,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8822]],
        PARAMETER["Latitude of 1st standard parallel",40.9666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8823]],
        PARAMETER["Latitude of 2nd standard parallel",39.9333333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8824]],
        PARAMETER["Easting at false origin",1968500,
            LENGTHUNIT["US survey foot",0.304800609601219],
            ID["EPSG",8826]],
        PARAMETER["Northing at false origin",0,
            LENGTHUNIT["US survey foot",0.304800609601219],
            ID["EPSG",8827]]],
    CS[Cartesian,2],
        AXIS["easting (X)",east,
            ORDER[1],
            LENGTHUNIT["US survey foot",0.304800609601219]],
        AXIS["northing (Y)",north,
            ORDER[2],
            LENGTHUNIT["US survey foot",0.304800609601219]],
    USAGE[
        SCOPE["Engineering survey, topographic mapping."],
        AREA["United States (USA) - Pennsylvania - counties of Adams; Allegheny; Armstrong; Beaver; Bedford; Berks; Blair; Bucks; Butler; Cambria; Chester; Cumberland; Dauphin; Delaware; Fayette; Franklin; Fulton; Greene; Huntingdon; Indiana; Juniata; Lancaster; Lawrence; Lebanon; Lehigh; Mifflin; Montgomery; Northampton; Perry; Philadelphia; Schuylkill; Snyder; Somerset; Washington; Westmoreland; York."],
        BBOX[39.71,-80.53,41.18,-74.72]],
    ID["EPSG",2272]]

### mapped it just to make sure it all looks alright
plot(st_geometry(philly_tracts), border = "grey60")
plot(st_geometry(philly_trees), col = "darkgreen", add = TRUE)

### Now let's make a summary table of some basic info about the data
summary_table <- data.frame(
  Dataset = c("Census Tracts", "Parks", "Street Trees"),
  Features = c(nrow(philly_tracts), nrow(philly_parks), nrow(philly_trees)),
  Geometry = c("Polygon", "Polygon", "Point"),
  CRS = c("EPSG 2272", "EPSG 2272", "EPSG 2272")
)

kable(summary_table,
      col.names = c("Dataset", "No. of Features", "Geometry Type", "CRS"),
      caption = "Summary of Datasets Used in Green Space Equity Analysis")

Summary of Datasets Used in Green Space Equity Analysis
Dataset	No. of Features	Geometry Type	CRS
Census Tracts	408	Polygon	EPSG 2272
Parks	504	Polygon	EPSG 2272
Street Trees	151726	Point	EPSG 2272

Questions to answer: - What dataset did you choose and why? - What is the data source and date? - How many features does it contain? - What CRS is it in? Did you need to transform it?

I chose to use a tree inventory dataset from OpenDataPhilly, a park polygon shapefile from PPR, and census tracts including race and income demographics from the 5-year ACS 2024. I decided to use this data to observe any trends between access to green space/nature and demographics in Philly.

The tree inventory point dataset is from 2025 and contains over 150,000 entries with 7 columns, including the tree’s coordinates, type/species, and year. The parks shapefile from PPR is from 2018, which is older than the other data in this analysis, but is less concerning since parkspace changes much less over time than trees or demographics. It contains 504 entries with 26 columns providing details on the type, use, geometry, and more. The ACS demographic and tract data is from 2024. There are 408 entries and columns for geometry, demographics, etc.

All the data was transformed to NAD83, Pennsylvania South (ftUS).

Pose a research question

Write a clear research statement that your analysis will answer.

“Do low-income and minority neighborhoods have equitable access to green space?”

Conduct spatial analysis

Use at least TWO spatial operations to answer your research question.

Required operations (choose 2+): - Buffers - Spatial joins - Spatial filtering with predicates - Distance calculations - Intersections or unions - Point-in-polygon aggregation

Your Task:

# Your spatial analysis

## First, make a half-mile buffer around parks. dist = 2640 because units are in feet. A half mile is a good measure of convenient walking distance.
parks_buffer <- st_buffer(philly_parks, dist = 2640)
parks_buffer_union <- st_union(parks_buffer)
## I also combined them with st_union so they don't all overlap and count tracts twice in later analysis

## Now let's count how many trees are in each tract
trees_per_tract <- st_join(philly_tracts, philly_trees) %>%
  st_drop_geometry() %>%
  group_by(GEOID) %>%
  summarise(tree_count = n())

## Calculate park area in each tract (and fix with st_make_valid)
philly_parks <- st_make_valid(philly_parks)
philly_tracts <- st_make_valid(philly_tracts)

park_tract_intersection <- st_intersection(philly_tracts, philly_parks)

park_area_per_tract <- park_tract_intersection %>%
  st_drop_geometry() %>%
  mutate(park_area = as.numeric(st_area(park_tract_intersection))) %>%
  group_by(GEOID) %>%
  summarise(total_park_area = sum(park_area, na.rm = TRUE))

## Alright now we join them back to philly_tracts and get per capita info
philly_tracts <- philly_tracts %>%
  left_join(trees_per_tract, by = "GEOID") %>%
  left_join(park_area_per_tract, by = "GEOID") %>%
  mutate(
    tree_count = replace_na(tree_count, 0),
    total_park_area = replace_na(total_park_area, 0),
    trees_per_capita = tree_count / total_pop,
    park_area_per_capita = total_park_area / total_pop
  )

## When checking the data, there were some issues with tracts with 0 population, so I had to count them as NA instead to avoid dividing by 0.
philly_tracts <- philly_tracts %>%
  mutate(
    trees_per_capita = ifelse(total_pop > 0, tree_count / total_pop, NA),
    park_area_per_capita = ifelse(total_pop > 0, total_park_area / total_pop, NA)
  )

summary(philly_tracts$trees_per_capita)

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
 0.00516  0.04569  0.07550  0.14553  0.13335 12.45588       17

summary(philly_tracts$park_area_per_capita)

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
     0.0      0.0     32.8   2667.3    116.5 777882.3       17

## Ok, now that we have the parks buffered, our per capita data, and per tract data, we can start comparing green space allocation by income and minority statistics for census tracts.

#First, let's check if tracts have centroids within the .5 mile parks buffer 

tract_centroids <- st_centroid(philly_tracts)

philly_tracts <- philly_tracts %>%
  mutate(park_access = lengths(st_intersects(tract_centroids, parks_buffer_union)) > 0)

#Now we can break the demographics up into groups to better compare metrics

### First, let's make two income levels: above and below a median income of $40,000. With this, we can summarise park access (rounded) by income group.
philly_tracts %>%
  st_drop_geometry() %>%
  mutate(income_group = ifelse(median_income < 40000, "Low Income <$40k", "Higher Income >$40k")) %>%
  group_by(income_group) %>%
  summarise(n_tracts = n(),
            pct_park_access = round(mean(park_access, na.rm = TRUE) * 100, 1),
            avg_trees_per_capita = round(mean(trees_per_capita, na.rm = TRUE), 3),
            avg_park_area_per_capita = round(mean(park_area_per_capita, na.rm = TRUE), 1)
            ) %>%
### Now, let's make a table with clean column names and a caption to show the summary.
  kable(
    col.names = c("Income Level", "No. of Tracts", "% within .5 miles of a park", "Avg Trees per Capita", "Avg Park Area per Capita (sqft)"),
    caption = "Nature/Green Space Access by Income across Philadelphia Census Tracts"
  )

Nature/Green Space Access by Income across Philadelphia Census Tracts
Income Level	No. of Tracts	% within .5 miles of a park	Avg Trees per Capita	Avg Park Area per Capita (sqft)
Higher Income >$40k	296	97.0	0.117	837.9
Low Income <$40k	79	98.7	0.082	101.3
NA	33	75.8	0.984	49180.9

# Now let's make another summary table looking at green space access for majority-minority tracts vs. majority white tracts. It will mostly follow the same steps as the previous table.

philly_tracts %>%
  st_drop_geometry() %>%
  group_by(majority_minority) %>%
  summarise(
    n_tracts = n(),
    pct_park_access = round(mean(park_access, na.rm = TRUE) * 100, 1),
    avg_trees_per_capita = round(mean(trees_per_capita, na.rm = TRUE), 3),
    avg_park_area_per_capita = round(mean(park_area_per_capita, na.rm = TRUE), 1)
  ) %>%
  mutate(majority_minority = ifelse(majority_minority, "Majority-Minority", "Majority White")) %>%
  kable(
    col.names = c("Race Demographics", "No. of Tracts", "% within .5 miles of a park", "Avg Trees per Capita", "Avg Park Area per Capita (sqft)"),
    caption = "Nature/Green Space Access by Race Demographics across Philadelphia Census Tracts"
  )

Nature/Green Space Access by Race Demographics across Philadelphia Census Tracts
Race Demographics	No. of Tracts	% within .5 miles of a park	Avg Trees per Capita	Avg Park Area per Capita (sqft)
Majority White	141	97.9	0.252	7180.8
Majority-Minority	250	96.8	0.086	121.7
NA	17	58.8	NaN	NaN

# Now that we have two summary tables, let's make a map to visualize trees per capita across Philly census tracts with park buffers. It should give a general idea of which parts have differing levels of nature access.

ggplot() +
  geom_sf(data = philly_tracts, aes(fill = trees_per_capita), color = "white", linewidth = 0.1) + #Mapping trees per capita in each tract
  geom_sf(data = parks_buffer_union, fill = adjustcolor("darkgreen", alpha = 0.2), color = "darkgreen", linewidth = 0.3) + #Mapping the parks buffers and making them transparent as not to drown out the rest of the map
  geom_sf(data = philly_parks, fill = "darkgreen", color = NA) +
  scale_fill_viridis_c(
    name = "Trees per Capita",
    option = "cividis",
    na.value = "grey80",
    limits = c(0, 0.25),
    oob = scales::squish
  ) + #Here, I just made the parks dark green, and visualized the trees per capita using a color gradient in each tract. There were some very high outliers, so I used limits and oob = scales::squish to set a limit for the color scale, and make anything above it appear as the max value (0.25 in this case)
  labs(
    title = "Trees per Capita and Park Access in Philadelphia",
    caption = "Park buffers represent a 10-minute walking distance. Dark green areas are parks.") +
   #Added a title and caption
  theme_void() # Got rid of the grey background grid

Analysis requirements: - Clear code comments explaining each step - Appropriate CRS transformations - Summary statistics or counts - At least one map showing your findings - Brief interpretation of results (3-5 sentences)

Your interpretation:

While the higher and lower income groups have similar near-100% access to a park within 0.5 miles, their average trees per capita and average park area per capita vastly Differ. Higher income tracts have 1.4x as many trees per capita and over 8x as much park area per capita. Majority White and Majority Minority tracts also have similar access to a park within .5 miles, but too have vastly differing per capita metrics. Majority White tracts have nearly 3x more trees per capita, and a staggering 59x greater avg park area per capita. Mapped, it’s clear that white and wealth areas in Center City, NW Philly, and areas along the Delaware river have far greater trees per capita, which aligns with the trends seen in the summary tables.

Finally - A few comments about your incorporation of feedback!

Take a few moments to clean up your markdown document and then write a line or two or three about how you may have incorporated feedback that you recieved after your first assignment.

Submission Requirements

What to submit:

Rendered HTML document posted to your course portfolio with all code, outputs, maps, and text
- Use embed-resources: true in YAML so it’s a single file
- All code should run without errors
- All maps and charts should display correctly
Submit the correct and working links of your assignment on Canvas