Geographic Book

An Introduction to GeoPandas

GeoPandas is an open-source Python library that makes working with geospatial data easier. It extends the data types used by pandas, the standard tool for manipulating data frames in Python, to allow spatial operations on geometric types.

Key Concepts

GeoPandas, as the name suggests, extends the popular data science library pandas by adding support for geospatial data. The core data structure in GeoPandas is the geopandas.GeoDataFrame, a subclass of pandas.DataFrame which can store geometry columns and perform spatial operations. The geopandas.GeoSeries, a subclass of pandas.Series, handles the geometries. A GeoDataFrame is therefore a combination of pandas.Series columns holding traditional data (numerical, boolean, text, etc.) and geopandas.GeoSeries columns holding geometries (points, polygons, etc.).

Each GeoSeries can contain any geometry type and has a GeoSeries.crs attribute, which stores information about its projection (CRS stands for Coordinate Reference System) [1]. Each GeoSeries in a GeoDataFrame can therefore use a different projection, so you can keep, for example, several versions of the same geometry in different projections.
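
As a minimal sketch of these concepts, the snippet below builds a small GeoDataFrame by hand and adds a second geometry column in another projection. The city names and coordinates are illustrative values, not data from this article, and the column name geometry_3857 is an arbitrary choice. Only one geometry column is the active one at a time; spatial methods called on the GeoDataFrame operate on it.

```python
import geopandas as gpd
from shapely.geometry import Point

# Illustrative data: two cities with approximate lon/lat coordinates
gdf = gpd.GeoDataFrame(
    {"city": ["Paris", "Berlin"]},
    geometry=[Point(2.35, 48.86), Point(13.40, 52.52)],
    crs="EPSG:4326",
)

# Add a second geometry column with the same points in Web Mercator
gdf["geometry_3857"] = gdf.geometry.to_crs(epsg=3857)

print(type(gdf["city"]).__name__)      # ordinary pandas column
print(type(gdf["geometry"]).__name__)  # geometry column
print(gdf.geometry.name)               # name of the active geometry column
print(gdf["geometry"].crs.to_epsg(), gdf["geometry_3857"].crs.to_epsg())
```

Calling gdf.set_geometry("geometry_3857") would return a GeoDataFrame whose active geometry is the projected column.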

Reading and Writing Files

Assuming you have a file containing both data and geometry (e.g. GeoPackage, GeoJSON, Shapefile), you can read it using geopandas.read_file(), which automatically detects the filetype and creates a GeoDataFrame. Here is an example of how to read a file:

import geopandas
from geodatasets import get_path
path_to_data = get_path("nybb")
gdf = geopandas.read_file(path_to_data)

Installation

To use GeoPandas, you first have to install it, like any other Python library. GeoPandas relies on a stack of open-source geospatial libraries, including shapely for geometry operations, pyproj for coordinate reference systems, and a file I/O backend (fiona, or pyogrio, which is the default since GeoPandas 1.0); older releases also used rtree for spatial indexing. All required dependencies must be installed, otherwise GeoPandas may not work as expected. Conda resolves them automatically, so the simplest way to install GeoPandas from the command line is:

conda install geopandas

Alternatively, you can install GeoPandas with pip, the standard package installer for Python, by running pip install geopandas.

Working with Geometries

GeoPandas uses the shapely library to handle geometric objects such as points, lines, and polygons. These geometric objects are stored in a GeoSeries. Here is an example of how to create a GeoSeries:

from shapely.geometry import Point, Polygon
import geopandas as gpd
# Create a GeoSeries from a list of shapely Point objects
gs = gpd.GeoSeries([Point(-120, 45), Point(-121.2, 46), Point(-122.9, 47.5)])
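
Once geometries are in a GeoSeries, their properties are available as vectorized attributes and methods. The sketch below uses two made-up polygons (a square and a right triangle) chosen so that the results are easy to check by hand.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

square = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
triangle = Polygon([(0, 0), (4, 0), (0, 3)])
shapes = gpd.GeoSeries([square, triangle])

print(shapes.area)      # areas: 4.0 and 6.0
print(shapes.length)    # perimeters: 8.0 and 12.0
print(shapes.centroid)  # one Point per polygon
print(shapes.contains(Point(1, 1)))  # element-wise test against a single point
```

The values are in the units of the coordinates, so for real data a projected CRS is usually needed before areas or lengths are meaningful.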

Spatial Operations

GeoPandas supports standard spatial operations like union, intersection, difference, and symmetric difference using the shapely library. Here is an example of a spatial operation:

# Create two GeoSeries
gs1 = gpd.GeoSeries([Polygon([(0, 0), (1, 0), (1, 1)]), Polygon([(1, 1), (2, 1), (2, 2)])])
gs2 = gpd.GeoSeries([Polygon([(0, 0), (1, 0), (1, 1)]), Polygon([(1, 0), (2, 0), (2, 1)])])

# Perform a union operation
gs_union = gs1.union(gs2)
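
To make the four operations easier to compare, here is a small sketch with two overlapping squares (illustrative shapes, not from the example above). Each operation is applied element-wise, pairing geometries by index, and the resulting areas can be verified by hand.

```python
import geopandas as gpd
from shapely.geometry import box

a = gpd.GeoSeries([box(0, 0, 2, 2)])  # 2 x 2 square
b = gpd.GeoSeries([box(1, 1, 3, 3)])  # same size, shifted by (1, 1)

print(a.intersection(b).area.iloc[0])          # overlap: 1.0
print(a.union(b).area.iloc[0])                 # 4 + 4 - 1 = 7.0
print(a.difference(b).area.iloc[0])            # 4 - 1 = 3.0
print(a.symmetric_difference(b).area.iloc[0])  # 7 - 1 = 6.0
```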

Plotting

GeoPandas makes it easy to create basic visualizations of GeoDataFrames and GeoSeries. It uses matplotlib under the hood for plotting. Here is an example of how to plot a GeoDataFrame:

import geopandas as gpd
from geodatasets import get_path

# Load a GeoDataFrame containing the boroughs of New York City
path_to_data = get_path("nybb")
gdf = gpd.read_file(path_to_data)

# Plot the GeoDataFrame
gdf.plot()
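
The plot method also accepts styling options. The sketch below colors each polygon by an attribute value, making a simple choropleth; the squares and the "value" column are invented for illustration, and the Agg backend is chosen only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this in a notebook
import matplotlib.pyplot as plt
import geopandas as gpd
from shapely.geometry import box

# Hypothetical data: three adjacent squares with one numeric attribute
gdf = gpd.GeoDataFrame(
    {"value": [10, 20, 30]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
)

fig, ax = plt.subplots(figsize=(6, 3))
# Color polygons by the "value" column and add a legend (a colorbar here)
gdf.plot(column="value", cmap="viridis", legend=True, edgecolor="black", ax=ax)
ax.set_title("Squares colored by value")
```

In an interactive session, plt.show() displays the figure, and fig.savefig() writes it to a file.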

Coordinate Reference Systems

GeoPandas allows for the management of coordinate reference systems (CRS) and for transformations between different CRS. Here is an example of how to change the CRS of a GeoDataFrame:

import geopandas as gpd
from geodatasets import get_path

# Load a GeoDataFrame
gdf = gpd.read_file(get_path("nybb"))

# Check the original CRS
print(gdf.crs)

# Change the CRS to EPSG 4326
gdf.to_crs(epsg=4326, inplace=True)

# Check the new CRS
print(gdf.crs)
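
It helps to distinguish declaring a CRS from transforming to one. In the sketch below, which uses two made-up points, set_crs only attaches CRS metadata to coordinates that are already in that system, while to_crs recomputes the coordinates.

```python
import geopandas as gpd
from shapely.geometry import Point

# Illustrative points given as (lon, lat), with no CRS attached yet
gs = gpd.GeoSeries([Point(0, 0), Point(1, 1)])
print(gs.crs)  # None

declared = gs.set_crs(epsg=4326)         # attach metadata; coordinates unchanged
projected = declared.to_crs(epsg=3857)   # reproject to Web Mercator (meters)

print(declared.iloc[1])   # still (1, 1)
print(projected.iloc[1])  # coordinates now in meters
```

Using set_crs on data whose coordinates are actually in a different system silently mislabels them, so it is best reserved for data that arrives without CRS information.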

Spatial Joins

Spatial joins are a common operation in GIS where the goal is to transfer information from one GeoDataFrame to another based on their spatial relationship. In GeoPandas, you can use the sjoin function to accomplish this. Here is an example:

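import geopandas as gpd
from geodatasets import get_path

# Load two GeoDataFrames: Chicago community areas (polygons) and grocery stores (points)
# gpd.datasets was removed in GeoPandas 1.0, so the data comes from geodatasets
polygons = gpd.read_file(get_path("geoda.chicago_commpop"))
points = gpd.read_file(get_path("geoda.groceries")).to_crs(polygons.crs)

# Perform a spatial join (the predicate argument replaces the deprecated op)
joined = gpd.sjoin(points, polygons, how="inner", predicate="intersects")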
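
The effect of the how argument is easier to see on a tiny example that needs no download. The sketch below uses invented zones and points: an inner join keeps only points that fall inside a zone, while a left join keeps every point and leaves the zone attributes empty where nothing matches.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical data: two adjacent square zones and three points
zones = gpd.GeoDataFrame(
    {"zone": ["west", "east"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:3857",
)
points = gpd.GeoDataFrame(
    {"label": ["p1", "p2", "p3"]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(5, 5)],
    crs="EPSG:3857",
)

inner = gpd.sjoin(points, zones, how="inner", predicate="within")
left = gpd.sjoin(points, zones, how="left", predicate="within")

print(inner[["label", "zone"]])  # p1 and p2 only
print(left[["label", "zone"]])   # all three points; p3 has no zone
```

Both inputs should share the same CRS before joining, which is why the two frames above are created with identical crs values.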

Overlays

Overlay operations combine two GeoDataFrames into new geometries based on where their shapes overlap, carrying over attributes from both inputs. The supported set operations include intersection, union, difference, and symmetric difference. Here is an example of an overlay operation:

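import geopandas as gpd
from geodatasets import get_path

# Load the New York City boroughs (polygons)
poly1 = gpd.read_file(get_path("nybb"))

# Build a second polygon layer: a 10,000 ft buffer around the first borough's centroid
# (the nybb CRS, EPSG:2263, uses feet)
poly2 = gpd.GeoDataFrame(geometry=poly1.centroid.iloc[[0]].buffer(10000))

# Perform an overlay operation
overlay = gpd.overlay(poly1, poly2, how='intersection')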
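
To compare the overlay modes side by side, the sketch below runs each one on two overlapping squares (made-up shapes and column names) and records the number of resulting rows and the total area. With a 2 x 2 square and a copy shifted by (1, 1), the overlap has area 1, so the totals can be checked by hand.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical inputs: two overlapping 2 x 2 squares
df1 = gpd.GeoDataFrame({"a": [1]}, geometry=[box(0, 0, 2, 2)])
df2 = gpd.GeoDataFrame({"b": [1]}, geometry=[box(1, 1, 3, 3)])

summary = {}
for how in ["intersection", "union", "difference", "symmetric_difference", "identity"]:
    result = gpd.overlay(df1, df2, how=how)
    summary[how] = (len(result), result.area.sum())
    print(how, summary[how])
```

Unlike the element-wise GeoSeries methods, overlay considers every pair of overlapping features and splits the output into one row per distinct piece.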

Conclusion

GeoPandas is a powerful tool for anyone interested in using Python for geospatial data analysis. It provides a high-level interface to the underlying geospatial data structures, making it easy to read, write, and process geospatial data in Python. Whether you’re a data scientist, a GIS professional, or a hobbyist, GeoPandas is a great addition to your data analysis toolkit.

References

[1] GeoPandas documentation: https://geopandas.org/en/stable/
