Spatial join is yet another classic GIS problem. Getting attributes from one layer and transferring them into another layer based on their spatial relationship is something you most likely need to do on a regular basis. The
previous materials focused on learning how to perform a Point in Polygon query. We could now apply those techniques and create our own function to perform a spatial join between two layers based on their spatial relationship. We could for example join the attributes of a polygon layer into a point layer where each point would get the
attributes of a polygon that Luckily, spatial join (
Sounds familiar? Yep, all of those spatial relationships were discussed in the previous materials, thus you should know how they work. Let’s perform a spatial join between the address-point Shapefile that we created and then reprojected and a Polygon layer that is a 250m x 250m grid showing the amount of people living in Helsinki Region. Download and clean the data¶For this lesson we will be using publicly available population data from Helsinki that can be downloaded from Helsinki Region Infroshare (HRI) which is an excellent source that provides all sorts of open data from Helsinki, Finland. From HRI download a Population grid for year 2015 that is a dataset (.shp) produced by Helsinki Region Environmental Services Authority (HSY) (see this page to access data from different years).
$ cd $ unzip Vaestotietoruudukko_2015.zip -d Pop15 $ ls Pop15 Vaestotietoruudukko_2015.dbf Vaestotietoruudukko_2015.shp Vaestotietoruudukko_2015.prj Vaestotietoruudukko_2015.shx You should now have a folder
import geopandas as gpd # Filepath fp = "/home/geo/Pop15/Vaestotietoruudukko_2015.shp" # Read the data pop = gpd.read_file(fp) # See the first rows In [1]: pop.head() Out[1]: INDEX ASUKKAITA ASVALJYYS IKA0_9 IKA10_19 IKA20_29 IKA30_39 \ 0 688 8 31.0 99 99 99 99 1 703 6 42.0 99 99 99 99 2 710 8 44.0 99 99 99 99 3 711 7 64.0 99 99 99 99 4 715 19 23.0 99 99 99 99 IKA40_49 IKA50_59 IKA60_69 IKA70_79 IKA_YLI80 \ 0 99 99 99 99 99 1 99 99 99 99 99 2 99 99 99 99 99 3 99 99 99 99 99 4 99 99 99 99 99 geometry 0 POLYGON ((25472499.99532626 6689749.005069185,... 1 POLYGON ((25472499.99532626 6685998.998064222,... 2 POLYGON ((25472499.99532626 6684249.004130407,... 3 POLYGON ((25472499.99532626 6683999.004997005,... 4 POLYGON ((25472499.99532626 6682998.998461431,... Okey so we have multiple columns in the dataset but the most important one here is the column
# Change the name of a column In [2]: pop = pop.rename(columns={'ASUKKAITA': 'pop15'}) # See the column names and confirm that we now have a column called 'pop15' In [3]: pop.columns Out[3]: Index(['INDEX', 'pop15', 'ASVALJYYS', 'IKA0_9', 'IKA10_19', 'IKA20_29', 'IKA30_39', 'IKA40_49', 'IKA50_59', 'IKA60_69', 'IKA70_79', 'IKA_YLI80', 'geometry'], dtype='object')
# Columns that will be sected In [4]: selected_cols = ['pop15', 'geometry'] # Select those columns In [5]: pop = pop[selected_cols] # Let's see the last 2 rows In [6]: pop.tail(2) Out[6]: pop15 geometry 5782 9 POLYGON ((25513499.99632164 6685498.999797418,... 5783 30244 POLYGON ((25513999.999929 6659998.998172711, 2... Now we have cleaned the data and have only those columns that we need for our analysis.
Join the layers¶Now we are ready to perform the spatial join between the two layers that we have. The aim here is to get information about how many people live in a polygon that contains an individual address-point . Thus, we want to join attributes from the population layer we just modified into the addresses
point layer
# Addresses filpath In [7]: addr_fp = r"/home/geo/addresses_epsg3879.shp" # Read data In [8]: addresses = gpd.read_file(addr_fp) # Check the head of the file In [9]: addresses.head(2) Out[9]: address id \ 0 Kampinkuja 1, 00100 Helsinki, Finland 1001 1 Kaivokatu 8, 00101 Helsinki, Finland 1002 geometry 0 POINT (25496123.30852197 6672833.941567578) 1 POINT (25496774.28242895 6672999.698581985)
# Check the crs of address points In [10]: addresses.crs Out[10]: {'ellps': 'GRS80', 'k': 1, 'lat_0': 0, 'lon_0': 25, 'no_defs': True, 'proj': 'tmerc', 'units': 'm', 'x_0': 25500000, 'y_0': 0} # Check the crs of population layer In [11]: pop.crs Out[11]: {'ellps': 'GRS80', 'k': 1, 'lat_0': 0, 'lon_0': 25, 'no_defs': True, 'proj': 'tmerc', 'units': 'm', 'x_0': 25500000, 'y_0': 0} # Do they match? - We can test that In [12]: addresses.crs == pop.crs Out[12]: True Indeed they are identical. Thus, we can be sure that when doing spatial queries between layers the locations match and we get the right results e.g. from the spatial join that we are conducting here.
# Make a spatial join In [13]: join = gpd.sjoin(addresses, pop, how="inner", op="within") # Let's check the result In [14]: join.head() Out[14]: address id \ 0 Kampinkuja 1, 00100 Helsinki, Finland 1001 1 Kaivokatu 8, 00101 Helsinki, Finland 1002 10 Rautatientori 1, 00100 Helsinki, Finland 1011 3 Itäväylä, 00900 Helsinki, Finland 1004 4 Tyynenmerenkatu 9, 00220 Helsinki, Finland 1005 geometry index_right pop15 0 POINT (25496123.30852197 6672833.941567578) 3326 173 1 POINT (25496774.28242895 6672999.698581985) 3449 31 10 POINT (25496808.64582102 6673146.836896984) 3449 31 3 POINT (25505098.34340289 6677972.568484426) 5112 353 4 POINT (25495639.56049686 6671520.343245601) 3259 1397 Awesome! Now we have performed a successful spatial join where we got two new columns into our
# Output path outfp = r"/home/geo/addresses_pop15_epsg3979.shp" # Save to disk join.to_file(outfp) Do the results make sense? Let’s evaluate this a bit by plotting the points where color intensity indicates the population numbers.
In [15]: import matplotlib.pyplot as plt # Plot the points with population info In [16]: join.plot(column='pop15', cmap="Reds", markersize=7, scheme='natural_breaks', legend=True); # Add title In [17]: plt.title("Amount of inhabitants living close the the point"); # Remove white space around the figure In [18]: plt.tight_layout() By knowing approximately how population is distributed in Helsinki, it seems that the results do make sense as the points with highest population are located in the south where the city center of Helsinki is. |