Lab 3: Midterm
2026-03-17
Why?
How?
Property Sales from 2023-2024
ACS 5-year 2023 data at the Block Group level
Google Maps
OpenDataPhilly
Center City and Northwest Philly have many of the highest prices, with some areas having median sale values in the millions of dollars
Northeast Philly has median sale prices in the ~$500,000 range
Higher values in West Philly are concentrated near University City
Living area has a strong positive correlation with home sale price
In preliminary data analysis, living space came out as a key driver of housing prices
In our actual models, we use the natural log of sale price to achieve a more normal distribution
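A minimal sketch of why the log transform helps, assuming a small set of made-up sale prices (these are illustrative values, not the lab's data). It compares the skewness of the raw prices with the skewness of their natural logs; the `skewness` helper is a hypothetical name introduced here.

```python
import math
import statistics

# Hypothetical sale prices in dollars (illustrative only, not the lab's data).
prices = [95_000, 140_000, 180_000, 210_000, 250_000, 320_000, 450_000, 2_400_000]


def skewness(values):
    """Population skewness: mean cubed deviation over the cubed standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


# Natural log of each price, as used for the model's dependent variable.
log_prices = [math.log(p) for p in prices]

raw_skew = skewness(prices)
log_skew = skewness(log_prices)

print(f"skewness of raw prices: {raw_skew:.2f}")
print(f"skewness of log prices: {log_skew:.2f}")
```

In this toy sample the single multi-million-dollar sale dominates the raw distribution, and taking logs pulls it closer to the rest of the data.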
We made four models, each progressively more complex with varying predictive data types
Model 1: Spatial Data
Model 2: Structural and Census Data
Model 3: Structural, Census, and Spatial Data
Model 4: Structural, Census, Spatial, and Fixed Effect Data
Model Accuracy: RMSE = 0.53
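A short sketch of how an RMSE on log prices can be read, assuming the reported 0.53 is in log units (the slides say the models use the natural log of sale price, but do not state the RMSE units explicitly). The `rmse` helper and the example prices are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical actual and predicted sale prices, converted to log dollars.
actual_log = [math.log(p) for p in (150_000, 300_000, 600_000)]
predicted_log = [math.log(p) for p in (200_000, 250_000, 500_000)]
example_rmse = rmse(actual_log, predicted_log)

# Assumption: the reported RMSE of 0.53 is measured on log(sale price).
reported_rmse = 0.53
# An error of r in log space is a multiplicative error of exp(r) in dollars.
typical_factor = math.exp(reported_rmse)

print(f"example RMSE (log units): {example_rmse:.3f}")
print(f"multiplicative error implied by 0.53: x{typical_factor:.2f}")
```

Under that assumption, an RMSE of 0.53 corresponds to predictions that are typically off by a factor of about 1.7 in dollar terms, that is, roughly 70% too high or 41% too low.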
Top Predictors:
Worst Predictor:
Recommendations:
Comprehensive neighborhood investment (not just focusing on transit, green space, or crime)
Including interaction terms between spatial variables and income could be a key improvement, since spatial features may matter more in high- or low-income neighborhoods
Importance of distance to transit varies greatly in neighborhoods depending on their demographics and other spatial features
Census Tracts of NE Philly were difficult to predict
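The interaction-term recommendation above can be sketched as follows. This assumes a model of the form log(price) = b0 + b1 * distance + b2 * income + b3 * distance * income; the coefficient values and function names are hypothetical and are not results from the lab's models.

```python
def interaction_feature(transit_distance_km, median_income_k):
    """Product term added as an extra predictor column (distance x income)."""
    return transit_distance_km * median_income_k


def transit_marginal_effect(b_transit, b_interaction, median_income_k):
    """Change in log(price) per extra km from transit at a given income level.

    With an interaction term the effect is b_transit + b_interaction * income,
    so it is no longer a single number for the whole city.
    """
    return b_transit + b_interaction * median_income_k


# Hypothetical coefficients, chosen only to illustrate the mechanics.
B_TRANSIT = -0.10       # effect per km when income is zero
B_INTERACTION = 0.001   # shift in that effect per $1,000 of median income

low_income_effect = transit_marginal_effect(B_TRANSIT, B_INTERACTION, 30)
high_income_effect = transit_marginal_effect(B_TRANSIT, B_INTERACTION, 120)

print(f"interaction column value (2 km, $30k): {interaction_feature(2.0, 30)}")
print(f"effect per km at $30k income:  {low_income_effect:+.3f}")
print(f"effect per km at $120k income: {high_income_effect:+.3f}")
```

With these made-up coefficients, distance from transit lowers predicted prices in the lower-income case and slightly raises them in the higher-income case, which is the kind of varying relationship a single transit coefficient cannot capture.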
Limitations and Concerns
Multicollinearity
Variables as Proxies (e.g. crime density is dependent on police activity and arrests)
ACS margins of error (especially at the block group level)
Temporal constraint (applicable to other years?)
Next Steps
Interaction models with transit and other variables to understand varying relationships
Better distance metrics (walking/driving times)
Additional data (environmental threats, school quality)
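For context on the distance-metric item above, here is a minimal sketch of a straight-line (great-circle) distance, which is the kind of baseline that walking or driving times would replace. The slides do not say how distance was measured, so treating straight-line distance as the current approach is an assumption; the coordinates are approximate and the function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate coordinates, for illustration only.
city_hall = (39.9526, -75.1652)
station_30th_street = (39.9557, -75.1820)

straight_line_km = haversine_km(*city_hall, *station_30th_street)
print(f"straight-line distance: {straight_line_km:.2f} km")
```

A straight-line figure like this ignores rivers, rail yards, and the street grid; travel times from a routing service would capture those barriers, but that step is not shown here.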