open-gira
This open-source snakemake workflow can be used to analyse environmental risks to infrastructure networks using global open data.
We can use open-gira
to analyse open data on roads, railways and
their exposure to river flooding, or electricity transmission lines and how
they're affected by tropical cyclones.
open-gira
is a work in progress.
Goals
- Automated pipeline for reproducible analysis anywhere in the world
- Maps per-country and of larger areas
- Charts/stats of exposure per admin region, per hazard type, scenario, epoch
- Consider transport, electricity, water, communications systems
- Consider river flooding, storm surge coastal flooding, tropical cyclones
- Estimate direct damages to physical networks
- Estimate indirect effects of disruption - people affected, economic activity disrupted
Non-goals
- Using closed data, which may be appropriate for other projects or use-cases
- Detailed operational/engineering level simulation
- Long-term planning
Installation
This guide will help you to install open-gira
and its dependencies on your computer.
Tests
Once everything is installed, run the tests to ensure everything is linked up properly:
python -m pytest tests
Linux / Mac
The major installation steps are to:
- Download
open-gira
- Set up a Python environment
- Install additional command-line tools
Clone repository
Install open-gira by cloning the repository:
git clone https://github.com/nismod/open-gira.git
Software environment
This repository comes with an environment.yml file describing almost all of the software dependencies required to run open-gira.
There are several ways to manage Python versions and install libraries. conda lets you install different versions of Python, Python libraries and other dependencies. micromamba is a drop-in replacement for conda, and is what the open-gira developers use.
The recommended approach for open-gira is to install micromamba, then use it to create and manage environments.
Local
To install the required dependencies on a local machine, create the open-gira
conda environment:
micromamba create -f environment.yml -y
Then activate it:
micromamba activate open-gira
You're now ready to configure workflows and request outputs.
Cluster
If installing on a cluster, you can work as above, or create a separate orchestrating environment containing only snakemake, e.g.
micromamba create -n snakemake python=3.9 snakemake
In this context, snakemake
itself can manage the other required dependencies,
creating other environments as necessary. To activate the orchestration
environment:
micromamba activate snakemake
To run the workflow on a cluster you will need to provide a profile, requesting targets as follows:
snakemake --profile <path_to_cluster_config> -- <target_file>
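A profile is a directory containing a config.yaml file whose keys mirror snakemake's command-line options. Below is a minimal sketch only, assuming a SLURM scheduler and a snakemake version that still supports the --cluster option; the partition and resource values are illustrative, not project defaults:
# <path_to_cluster_config>/config.yaml -- hypothetical SLURM profile sketch
jobs: 100                  # maximum number of jobs submitted concurrently
use-conda: true            # let snakemake create per-rule environments
cluster: "sbatch --partition=compute --time=01:00:00 --mem=8G"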
Other command-line tools
The following tools are not available through conda
and must be installed separately.
exactextract
exactextract is used for zonal statistics. Please see installation instructions here.
Windows
Native installation on Windows is extremely difficult because several of the tools that open-gira relies on are hard to build for Windows (binaries are not available).
To avoid this issue, we recommend Windows users install via the Windows Subsystem for Linux (WSL). To install WSL, follow the install instructions.
Once WSL is installed, open-gira can be installed inside the Linux subsystem by following the Linux install instructions.
Docker
It is possible to run open-gira
within a container using Docker. You must have Docker
installed.
- Clone this repository
git clone https://github.com/nismod/open-gira.git
- Build container image
docker build -t open-gira .
- Run container
We use the -it
options to make the container interactive and give it a terminal,
and the -d
option to keep it running in the background (daemon mode).
We give it a nice name we can refer to later, and make sure it is created using the
image we just built.
We tell it to run bash
so that it has something to do and doesn't close immediately.
docker run -itd --name og open-gira bash
- Enter container
We can now launch a bash shell in our new container:
docker exec -it og /bin/bash
And once inside we can navigate to the open-gira folder.
cd open-gira
We're now ready to run snakemake, launch a python prompt, or do whatever else we like inside the container.
Usage
open-gira comprises a set of snakemake rules which call scripts and library code to request data, process it and produce results.
Snakemake
The key idea of snakemake
is similar to make
in that the workflow is
determined from the end (the files users want) to the beginning (the files
users have, if any) by applying general rules with pattern matching on file and
folder names.
An example invocation looks like:
snakemake --cores 2 -- results/wales-latest_filter-road-primary/edges.gpq
Here, we ask snakemake
to use up to 2 CPUs to produce a target file, in this
case, the edges of the Welsh road network. snakemake pattern-matches wales-latest as the OSM dataset name and road-primary as the network type we want to filter for.
To check what work we're going to request before commencing, use the -n
flag:
snakemake -n --cores 2 -- results/wales-latest_filter-road-primary/edges.gpq
This will show which rules would need to run to produce the target file, without executing them. It may be helpful to visualise which rules are expected to run, too.
Configuration
The snakemake configuration details are in config/config.yaml. You can edit this to set the target OSM infrastructure datasets, number of spatial slices, and hazard datasets. See config/README.md and the documentation for each workflow for more details on the configuration variables.
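For orientation, the top-level keys referenced throughout this documentation are outlined below. This is an illustrative sketch only (values elided); see config/README.md for the authoritative schema:
# config/config.yaml -- outline sketch of keys used in this documentation
infrastructure_datasets: {}        # OSM extract name -> .osm.pbf URL
network_filters: {}                # network type -> osmium filter file
keep_tags: {}                      # network type -> OSM tags to retain
slice_count: 9                     # number of spatial slices (a square number)
hazard_datasets: {}                # hazard name -> file of hazard layers
hazard_types: {}                   # hazard name -> hazard type, e.g. flood
direct_damages: {}                 # damage curve configuration
storm_sets: {}                     # storm set name -> JSON file of storm IDs
transmission_windspeed_failure: [] # wind speed thresholds in m/s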
Network creation
open-gira
can currently create connected road, rail and electricity grid
networks from open data.
Road
We can create a topologically connected road network for a given area from OpenStreetMap (OSM) data. The resulting network can be annotated with data retrieved from OSM (e.g. highway classification, surface type), along with data looked up from user-supplied sources (e.g. rehabilitation costs).
Description
- Download or copy the .osm.pbf file to the input data directory.
- Filter the OSM file with osmium tags-filter for elements matching the filter file (see configuration, below).
- Define a bounding box for the OSM file; write to disk as JSON.
- Define a grid partitioning the bounding box into a square number of 'slices' as a series of JSON files, one for each slice.
- Cut the OSM file into these slices.
- Convert the sliced OSM files into geoparquet, retaining the keep_tags as configured.
- Clean and annotate features in the geoparquet files (joining additional data such as country, rehabilitation costs, etc.).
- Join sliced network components together.
Configuration
To specify a desired network:
- Review and amend the spreadsheets in bundled_data/transport; these supply information that is used to gap-fill or extend what can be determined from OSM alone.
- Review and amend config/config.yaml (a sketch follows this list):
  - The infrastructure_datasets map should contain a key pointing to an .osm.pbf file URL for the desired area. There are currently entries for the planet, for (some definition of) continents and several countries. We use the geofabrik service for continent and country-level OSM extracts.
  - Check the OSM filter file pointed to by network_filters.road. This file specifies which elements (nodes, ways or relations) to keep (or reject) from the multitude of data in an OSM file. See the filter expressions section here for more information on the syntax of these files.
  - Check and amend keep_tags.road. This list of strings specifies which tags (attributes) to retain on the filtered elements we extract from the .osm.pbf file.
  - Review slice_count. This controls the degree of parallelism possible. With it set to 1, there is no spatial slicing (we create the network in a single chunk). To speed network creation for large domains, it can be set to a larger square number. The first square number greater than your number of available CPUs is a good heuristic.
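As a concrete illustration of the keys above, a hypothetical config/config.yaml fragment for a road network of Wales might read as follows; the filter file path and tag list are assumptions for demonstration, not shipped values:
# illustrative fragment only
infrastructure_datasets:
  wales-latest: "https://download.geofabrik.de/europe/great-britain/wales-latest.osm.pbf"
network_filters:
  road: "config/osm_filters/road.txt"     # hypothetical filter file path
keep_tags:
  road: ["highway", "surface", "bridge"]  # example tags, not a complete list
slice_count: 9                            # square number; 1 disables slicing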
Creation
To create the network, by way of example:
snakemake --cores all -- results/egypt-latest_filter-road-primary/edges.gpq
Rail
The process for creating a rail network is essentially the same as for road.
We can create a topologically connected rail network for a given area from OpenStreetMap (OSM) data. The resulting network can be annotated with data retrieved from OSM, along with data looked up from user-supplied sources (e.g. rehabilitation costs).
Description
- Download or copy the .osm.pbf file to the input data directory.
- Filter the OSM file with osmium tags-filter for elements matching the filter file (see configuration, below).
- Define a bounding box for the OSM file; write to disk as JSON.
- Define a grid partitioning the bounding box into a square number of 'slices' as a series of JSON files, one for each slice.
- Cut the OSM file into these slices.
- Convert the sliced OSM files into geoparquet, retaining the keep_tags as configured.
- Clean and annotate features in the geoparquet files (joining additional data such as country, rehabilitation costs, etc.).
- Join sliced network components together.
Configuration
To specify a desired network:
- Review and amend the spreadsheets in bundled_data/transport; these supply information that is used to gap-fill or extend what can be determined from OSM alone.
- Review and amend config/config.yaml:
  - The infrastructure_datasets map should contain a key pointing to an .osm.pbf file URL for the desired area. There are currently entries for the planet, for (some definition of) continents and several countries. We use the geofabrik service for continent and country-level OSM extracts.
  - Check the OSM filter file pointed to by network_filters.rail. This file specifies which elements (nodes, ways or relations) to keep (or reject) from the multitude of data in an OSM file. See the filter expressions section here for more information on the syntax of these files.
  - Check and amend keep_tags.rail. This list of strings specifies which tags (attributes) to retain on the filtered elements we extract from the .osm.pbf file.
  - Review slice_count. This controls the degree of parallelism possible. With it set to 1, there is no spatial slicing (we create the network in a single chunk). To speed network creation for large domains, it can be set to a larger square number. The first square number greater than your number of available CPUs is a good heuristic.
Creation
To create the network, by way of example:
snakemake --cores all -- results/egypt-latest_filter-rail/edges.gpq
Note that the nodes file, results/egypt-latest_filter-rail/nodes.gpq
will by default contain the stations and their names as recorded in OSM.
Electricity grid
To create an electricity grid we rely heavily on gridfinder data. This dataset provides transmission and distribution edges with substantial global coverage. It also contains a set of 'targets' or electricity consuming areas, derived from Night Time Lights (NTL) satellite imagery. Our other major data source for electricity grid creation is the World Resources Institute's (WRI) powerplants database.
The workflow currently divides network creation by country. One may request one country, or several. Note that neighbouring countries' networks are not connected with one another.
Description
- Download gridfinder electricity grid data (line edges, 'target' consuming polygons)
- Create a global 'targets' file, where each electricity consuming target is annotated with the population and GDP of the enclosed area.
- Download WRI powerplant data
- Download GDP per capita data
- Download population data
- Construct an electricity grid for each country to be studied, where nodes are power generators or target consumers. Power consumption is proportional to the GDP enclosed by each target polygon.
Configuration
There aren't currently any options to configure when creating an electricity grid.
Creation
Here's an example grid creation command for Puerto Rico:
snakemake --cores all -- results/power/by_country/PRI/network/edges.geoparquet
Note that the folder name under by_country is an ISO 3166-1 alpha-3 country code, specifying the country in question.
Risk analysis
Having created networks, we can analyse the risks that certain hazards (currently tropical cyclones and flooding) may present to these networks.
Transport / flooding
The flooding risk analysis pipeline starts by creating an infrastructure network (road or rail) as described previously. Please refer to this section to configure the network creation.
Description
The pipeline has the following core steps:
- Download the raster files for the hazard dataset.
- Crop the raster files to the extent of the network in question.
- Split the network edges on the raster grid, so that no edge spans more than one pixel.
- Find the pixel value / hazard intensity (e.g. flood depth) for every hazard raster, for each split edge.
- For each asset type (e.g. unpaved road) where a damage curve is defined, calculate the damage fraction for each split edge, for each raster.
- Using rehabilitation cost estimates, calculate the expected monetary loss due to the calculated damage fraction.
- Concatenate the previously split edges, summing the expected monetary loss.
- Given hazard rasters at various return periods, calculate the Expected Annual Damages (EAD) by integrating losses over the annual probabilities of occurrence (the reciprocals of the return periods).
- Aggregate EAD to desired regional level.
Configuration
The hazard component of the analysis is configurable in config/config.yaml (a sketch follows this list):
- hazard_datasets contains hazard names pointing to files of hazard layers. These layers are currently flood rasters (inundation depths for a given return period). Add or amend an entry pointing to a file containing the rasters you wish to consider.
- Ensure hazard_types contains an identical key referencing the hazard types. This is currently limited to flood only.
- Configure the damage curves:
  - Check and amend direct_damages.asset_types so that it contains any assets you wish to calculate direct damage costs for. Currently implemented assets are available in src/open_gira/assets.py as the classes inheriting from Assets.
  - Ensure direct_damages.curves_dir is set to a path containing damage functions per asset type, organised by hazard type. See bundled_data/damage_curves for an example.
  - These damage function files should be named <asset_type>.csv, e.g. road_unpaved.csv. Their format is exemplified by bundled_data/damage_curves/flood/road_unpaved.csv.
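A hypothetical fragment tying these keys together (the hazard layer file path is an assumption for illustration):
# illustrative fragment only
hazard_datasets:
  aqueduct-river: "path/to/aqueduct-river_layers.txt"  # hypothetical path
hazard_types:
  aqueduct-river: flood
direct_damages:
  asset_types: ["road_unpaved"]            # classes defined in src/open_gira/assets.py
  curves_dir: "bundled_data/damage_curves" # expects e.g. flood/road_unpaved.csv within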
Outputs
Rasterised network (per slice)
To generate a network split by the hazard raster grid, we might issue something like:
snakemake --cores all -- results/splits/<dataset_name>_filter-<network_type>/hazard-<hazard_name>/slice-<slice_number>.geoparquet
For the egypt-latest
OSM dataset, road
network, aqueduct-river
hazard set and slice 0
, that would be:
snakemake --cores all -- results/splits/egypt-latest_filter-road/hazard-aqueduct-river/slice-0.geoparquet
Expected Annual Damages (per slice)
To request an evaluation of Expected Annual Damages (EAD) as a function of hazard Return Period (RP) for a given slice, we can request something like the following (here with a config.slice_count of 9):
snakemake --cores all -- results/direct_damages/egypt-latest_filter-road/hazard-aqueduct-river/EAD_and_cost_per_RP/slice-5.geoparquet
Expected Annual Damages (per admin region)
And to compute all the slices for a given domain and then aggregate to country level (admin level 0):
snakemake --cores all -- results/egypt-latest_filter-road/hazard-aqueduct-river/EAD_and_cost_per_RP/agg-sum/admin-level-0.geoparquet
For more possible outputs please refer to the detailed documentation and the
rules defined in workflow/rules/
.
Electricity grids - tropical cyclones
This analysis intersects electricity grids with tropical cyclone wind speed risk. Network creation is described previously. The hazards are event-based (as opposed to return period map based).
Description
- Download storm track data.
- Process storm track data into a common geoparquet format, ensuring meteorological variables are consistent.
- Define a wind speed grid, with an extent equal to the network bounding box, plus a buffer.
- Given a storm track, estimate the resulting wind speed field as a function of time using a modified Holland model.
- Downscale the wind speeds from the free atmosphere to surface level using a power law. The downscaling exponent is a function of the surface roughness, derived from a land surface categorisation.
- Take the maximum wind speed across time for each pixel in the wind grid.
- For a given threshold or set of thresholds, fail electricity grid edges (transmission and distribution lines) that experience wind speeds in excess of the threshold.
- Allocate power across the newly degraded grid and see how it differs from the nominal allocation.
- Report on the change in power supply and the numbers of people affected.
Configuration
- Review and amend config/config.yaml (a sketch follows this list):
  - storm_sets should contain a storm set name, pointing to a JSON file. This JSON file should contain an empty list to process all storms for this storm set, or a list of string storm IDs if only a subset is required.
  - Specify transmission_windspeed_failure as a list of wind speeds in meters per second at which to fail edges.
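As an illustration, assuming the IBTrACS storm set used in the examples below (the JSON file path is hypothetical):
# illustrative fragment only
storm_sets:
  IBTrACS: "config/storm_sets/ibtracs.json"   # hypothetical path
transmission_windspeed_failure: [20, 30, 40]  # example thresholds in m/s
The JSON file itself would then contain [] to process every storm, or an explicit subset such as ["2017242N16333"].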
See comments in the config/config.yaml
file for other less crucial
configuration options.
Outputs
Below you will find example commands necessary to create outputs for the case of PRI (Puerto Rico) and the storm 2017242N16333 (Irma, 2017) from the IBTrACS historic cyclone dataset. Later steps will trigger earlier, dependent steps if necessary.
Networks at risk
- Identify which storms are likely to impact which countries
snakemake --cores 1 -- results/power/by_storm_set/IBTrACS/storms_by_country_impacted.json
Maximum wind speed fields
- Download storm track data (historic or synthetic)
- Preprocess data into a (mostly) common event-set format.
snakemake --cores 1 -- results/storm_tracks/IBTrACS/tracks.geoparquet
- Download land surface categorical data
- Estimate surface roughness for the grid region
snakemake --cores 1 -- results/power/by_country/PRI/storms/surface_roughness.tiff
- Process any storm tracks within some proximity of the grid in question into maximum wind speed footprints for the electricity grid region, by means of a modified Holland wind field model. Downscale the winds to the surface by a power law, using the surface roughness data.
snakemake --cores 1 -- results/power/by_country/PRI/storms/IBTrACS/max_wind_field.nc
Network exposure
- Rasterise the electricity grid (split line segments such that no segment extends beyond a wind cell boundary).
snakemake --cores 1 -- results/power/by_country/PRI/exposure/edges_split.geoparquet
- For a given storm and country, remove edges from the grid which exceed a given wind speed damage threshold. Reallocate power from powerplants to targets, by GDP, on the degraded network. Store the ratio between the original power supplied and the degraded power supplied to each target.
snakemake --cores 1 -- results/power/by_country/PRI/exposure/IBTrACS/2017242N16333.nc
- Perform the above calculations for a single storm in a cyclone dataset and all the countries its track intersects.
snakemake --cores 1 -- results/power/by_storm_set/IBTrACS/by_storm/2017242N16333/exposure_by_target.nc
- Perform the above calculations for an entire cyclone dataset and all the countries its tracks intersect.
snakemake --cores 1 -- results/power/by_storm_set/IBTrACS/exposure.txt
Further analysis
TODO: Add additional rules to facilitate comparison across countries and cyclone datasets.
TODO: Document these additional rules.
- Map how electricity supply is impacted by a given storm for a variety of different damage thresholds.
snakemake --cores 1 -- results/power/by_storm_set/IBTrACS/by_storm/2017242N16333/outage_map/outage_map_by_threshold.gif
- Map the maximum wind speed experienced across the area impacted by a storm.
snakemake --cores 1 -- results/power/by_storm_set/IBTrACS/by_storm/2017242N16333/wind_field.png
There also exist other plotting and mapping steps to visualise intermediate and final outputs. Refer to workflow/rules/analysis
for a description of these.
Utilities
open-gira
comes with a few small utilities for data processing and file management.
Removing intermediate files
You can remove intermediate files by running the clean
rule.
snakemake --cores 1 -R clean
Geoparquet -> Geopackage
As standard we use the GeoParquet (.geoparquet
or
.gpq
) format to store vector data on disk. Unfortunately common GIS software
such as QGIS may not yet support this file format. To convert file(s) to
geopackage, use:
python workflow/scripts/pq_to_gpkg.py <path_to_geoparquet_1> <path_to_geoparquet_2> <...>
This will write .gpkg
files beside their source .geoparquet
.
Unpickling interactive plots
matplotlib plots can be interactive (zoom, pan, etc.), but not when saved as static images. Some rules therefore produce pickled plot files. To view these, use:
python workflow/scripts/unpickle_plot.py <path_to_pickled_plot>
Archiving results
The bash script archive_results.sh
can be used to back up analysis results.
Here's an example usage:
./archive_results.sh results/ /mnt/backup/open-gira
This will create a timestamped folder in the path given by the second argument, e.g.
/mnt/backup/open-gira/2023-07-24T101756+0100
, containing an archive of results
(excluding input files downloaded from the internet) and a README describing the
state of the repository at the time of archive creation.