A Beer Lover’s BFF? An Introduction to Geospatial Interpolation via Inverse Distance Weighting
--
Beer is good. It’s also increasingly easy to come by, thanks to the proliferation of American craft watering holes and breweries. Just how easy is a straightforward question to answer with a little geospatial interpolation. There are a number of techniques for that; in this article, the focus is Inverse Distance Weighted (IDW) interpolation. We use it on a public dataset of over 11,000 breweries and brewpubs to identify each state’s most hoppinin’ city and interpolate how far you could be from it at any given locale, making geospatial interpolation a la IDW a beer lover’s BFF.
Before diving into specific states, I want to know how breweries are roughly distributed across the contiguous U.S., based on all breweries and brewpubs in our dataset, which I found via the new Google Dataset Search.
Each orangish (pumpkin ale-colored?) dot is the geographic average location of a state’s beer scene, which I calculated with Lucy D’Agostino McGowan’s ad-hoc geographic_average() function. But it’s unhelpful in our context: because pubs are spread all across a state, the average gets pulled away from the metros where pub density is highest. We can see this in Wisconsin, my home state.
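For the curious, a bare-bones stand-in for that geographic average is just the mean of each state’s brewery coordinates; the real geographic_average() may differ, and averaging raw latitude/longitude ignores the Earth’s curvature, so treat this as a sketch:
# A rough stand-in for geographic_average(): unweighted mean of each
# state's brewery coordinates (ignores the Earth's curvature).
library(dplyr)

state_centers <- breweries_us %>%
  group_by(province) %>%
  summarise(avg_longitude = mean(longitude, na.rm = TRUE),
            avg_latitude = mean(latitude, na.rm = TRUE))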
That orange dot is nowhere near Milwaukee, WI, unquestionably the state’s pub hub.
If you’re interested in WI for its brew scene, go to Milwaukee, not some cornfield outside Ripon:
It’s clear that we don’t want to know the average location of all breweries in a state, but the best location, defined simply as the city with the most breweries. (Note that I could just weight our geographic average by population density or another variable of choice, but that’s not important to me right now.) Because I’m moving to Portland in a few days, let’s check Oregon.
As I suspected, Portland takes the cake.
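For reference, city_sum, which shows up again below, is essentially a per-city brewery count. A minimal sketch of how such a summary might be built (the column names are assumptions based on the rest of the post, and the original script’s version may differ):
# A rough sketch of a city-level summary: count breweries per city and
# average their coordinates. Column names are assumed, not taken from
# the original script.
library(dplyr)

city_sum <- breweries_us %>%
  group_by(state = province, city) %>%
  summarise(n_breweries = n(),
            longitude = mean(longitude, na.rm = TRUE),
            latitude = mean(latitude, na.rm = TRUE)) %>%
  arrange(desc(n_breweries))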
Having a decent handle now on the distribution of breweries, and having identified a metric that usefully measures the city with the “best” brew scene, let’s keep on-pint and hop to geospatial interpolation. The intuition is to predict the value of something in an area where you don’t have data using data from adjacent areas. Inverse Distance Weighting (IDW) is a particular flavor of this kind of interpolation: observed values are weighted so that their influence shrinks the farther they are from the point being predicted, on the assumption that points close together are more similar than points far apart. That’s a particularly useful strategy if, say, you know someone who wants to know how far they’ll be living from their state’s brew scene at any spot in their new home state (and they know a little R).
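To make that weighting concrete, an IDW prediction at a new point is just a weighted average of the observed values, with weights of one over distance raised to some power. A toy sketch with made-up numbers:
# Toy IDW: predict a value at a new point as a distance-weighted average
# of observed values, with weights 1 / distance^p (numbers are made up).
observed <- c(5, 12, 30)   # e.g., miles to the closest brewery at three known points
dists <- c(2, 10, 40)      # distances from the new point to those three points
p <- 2                     # the inverse distance weighting power

weights <- 1 / dists^p
sum(weights * observed) / sum(weights)  # prediction, dominated by the nearest point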
So let’s fit a model that can predict, for any location, the distance in miles to the nearest brewery. Then I’ll generate a dense grid of evenly spaced cells on which to run and map the model.
To do this, we’ll:
- Create a distance matrix of the distance between each brewery and its closest brewery neighbor using geosphere
- Get and prep an ESRI U.S. shapefile from the U.S. Census Bureau using rgdal
- Fit IDW using gstat
- Build a raster plot using ggplot2
- Compute a distance matrix
The point of this step is to figure out how far the nearest brewery is from every brewery, so we can try to predict that distance with a simple interpolation model. I’ll do this for Wisconsin and Oregon separately, but only WI is shown here:
# dplyr for wrangling, geosphere for the distance calculations
library(dplyr)
library(geosphere)

# subset to Wisconsin breweries and cities
breweries_wi <- breweries_us[breweries_us$province == "WI", ]
cities_wi <- city_sum[city_sum$state == "WI", ]

# pairwise distances (in meters) between every WI brewery and every other
mat_wi <- distm(breweries_wi[, c("longitude", "latitude")],
                breweries_wi[, c("longitude", "latitude")],
                fun = distVincentyEllipsoid)
The distm() function computes the distance between two sets of points according to the ‘Vincenty (ellipsoid)’ method, which is understood to be one of the most accurate ways to get the as-the-crow-flies distance between two locations on an ellipsoid like Earth.
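For a quick sense of what distm() returns, here it is on two single points, roughly Milwaukee and Madison (coordinates are approximate and only for illustration):
# distm() returns meters; dividing by 1609.344 converts to miles.
library(geosphere)
milwaukee <- c(-87.91, 43.04)  # longitude, latitude (approximate)
madison <- c(-89.40, 43.07)
distm(milwaukee, madison, fun = distVincentyEllipsoid) / 1609.344  # roughly 75 miles
Back to the full WI-by-WI matrix: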
# convert meters to miles
mat_wi <- mat_wi / 1609.344

# a brewery's closest brewery shouldn't be itself, so replace the zero
# self-distances with a big number that'll never be the smallest
mat_wi[mat_wi == 0] <- 1000000

# for each brewery, pull the name, city, address, and coordinates of its
# closest neighbor (the column with the smallest distance in its row)
breweries_wi %>%
  mutate(closest_pub = breweries_wi$name[max.col(-mat_wi)],
         closest_pub_city = breweries_wi$city[max.col(-mat_wi)],
         closest_pub_address = breweries_wi$address[max.col(-mat_wi)],
         closest_lat = breweries_wi$latitude[max.col(-mat_wi)],
         closest_lon = breweries_wi$longitude[max.col(-mat_wi)]) -> breweries_wi

# miles from each brewery to its closest neighbor
breweries_wi$miles_to_closest <- distVincentyEllipsoid(
  p1 = breweries_wi[, c("longitude", "latitude")],
  p2 = breweries_wi[, c("closest_lon", "closest_lat")]) / 1609.344
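A quick sanity check on the new column never hurts (this check is my addition, not part of the original script):
# distribution of nearest-brewery distances, in miles
summary(breweries_wi$miles_to_closest)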
- Get an ESRI shapefile that we project interpolations into, for WI, OR, and the contiguous U.S.
To interpolate, one must first know what to interpolate into. Since we’re predicting the distance to the nearest brewery, we need to slice and dice our map into many small squares into which we’ll eventually spit interpolations. The rgdal, gstat, and sp packages make a wide variety of geospatial/geotemporal computations on geospatial objects easy.
# rgdal and sp for reading and manipulating the shapefile
library(rgdal)
library(sp)

us <- rgdal::readOGR("./Data/Raw/US shapefile",
                     "tl_2017_us_state")

# isolate to the contiguous U.S.
no_thanks <- c("Alaska", "American Samoa", "Puerto Rico", "Guam",
               "Commonwealth of the Northern Mariana Islands",
               "United States Virgin Islands", "Hawaii")
us <- subset(us, !(us@data$NAME %in% no_thanks))
# subset the national shapefile to Wisconsin, then lay a fine, evenly
# spaced grid of points over it
wi <- subset(us, us@data$NAME %in% "Wisconsin")
makegrid(wi, n = 2000000) %>%
  SpatialPoints(proj4string = CRS(proj4string(us))) %>%
  .[wi, ] -> grid_wi
plot(grid_wi)
# convert the brewery data to a spatial data frame
sp::coordinates(breweries_wi) <- ~longitude + latitude
# ensure the projection matches the grid
proj4string(breweries_wi) <- CRS(proj4string(wi))
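If you’re paranoid (I am), you can confirm the points and the grid now share a projection; again, this check is my addition rather than the original script:
# both objects should report the same proj4 string
identical(proj4string(breweries_wi), proj4string(grid_wi))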
- Fit IDW interpolation model using gstat
Now let’s fit a simple interpolation model to predict the miles to the closest brewery for each of the tiny squares we just punched into our map.
idw_model <- gstat::idw(
formula = miles_to_closest ~ 1,
locations = breweries_wi,
newdata = grid_wi,
idp = 2)
Our formula is simple because how far you are from a brewery is just a function of your location. locations is our spatial data frame, newdata is the grid of squares we overlaid on the shapefile, and idp = 2 is the default inverse distance weighting power, which we’ll keep.
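If you’re curious how much the power matters, you can refit with a different idp and compare the two surfaces; this side experiment isn’t part of the original workflow:
# a lower power spreads influence farther, producing a smoother surface;
# a higher power hugs the nearest observations more tightly
idw_model_smooth <- gstat::idw(
  formula = miles_to_closest ~ 1,
  locations = breweries_wi,
  newdata = grid_wi,
  idp = 1)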
Now we need to extract the interpolated predictions.
interpolated_results <- as.data.frame(idw_model) %>% {
  names(.)[1:3] <- c("longitude", "latitude", "miles_to_closest")
  . } %>%
  select(longitude, latitude, miles_to_closest)
Here’s our output:
Locations without a distance value in the original dataset now have a predicted, or interpolated, one.
- Replot maps based on the interpolated distance from the nearest brewery.
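The plotting code itself isn’t shown here, but a raster-style map of the interpolated surface can be built from interpolated_results with ggplot2 along these lines (a minimal sketch; the finished maps presumably layer on state outlines, labels, and theming):
# Minimal sketch of a raster-style map of the interpolated surface.
library(ggplot2)

ggplot(interpolated_results,
       aes(x = longitude, y = latitude, fill = miles_to_closest)) +
  geom_tile() +
  coord_quickmap() +
  labs(fill = "Miles to\nclosest brewery")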
To explore the spadework behind this post, check out my .R script on GitHub.
References
- Brewery data are from Kaggle, found via Google Dataset Search
- U.S. Shapefile from the U.S. Census Bureau
- Interpolation Tutorial
- geosphere vignette
Further Reading