open-gira
This open-source snakemake workflow can be used to analyse environmental risks to infrastructure networks using global open data.
We can use open-gira
to analyse open data on roads, railways and
their exposure to river flooding, or electricity transmission lines and how
they're affected by tropical cyclones.
open-gira
is a work in progress.
Goals
- Automated pipeline for reproducible analysis anywhere in the world
- Maps per-country and of larger areas
- Charts/stats of exposure per admin region, per hazard type, scenario, epoch
- Consider transport, electricity, water, communications systems
- Consider river flooding, storm surge coastal flooding, tropical cyclones
- Estimate direct damages to physical networks
- Estimate indirect effects of disruption - people affected, economic activity disrupted
Non-goals
- Using closed data, which may be appropriate for other projects or use-cases
- Detailed operational/engineering level simulation
- Long-term planning
Installation
This guide will help you to install open-gira
and its dependencies on your computer.
Tests
Once everything is installed, run the tests to ensure everything is linked up properly:
python -m pytest tests
Linux / Mac
The major installation steps are to:
- Download
open-gira
- Set up a Python environment
- Install additional command-line tools
Clone repository
Install open-gira by cloning the repository:
git clone https://github.com/nismod/open-gira.git
Software environment
This repository comes with an environment.yml file describing almost all of the software dependencies required to run open-gira.
There are several ways to manage Python versions and install libraries. conda lets you install different versions of Python, Python libraries and other dependencies. micromamba is a drop-in replacement for conda, and is what the open-gira developers use.
The recommended approach for open-gira is to install micromamba, then use it to create and manage environments.
Local
To install the required dependencies on a local machine, create the open-gira
conda environment:
micromamba create -f environment.yml -y
Then activate it:
micromamba activate open-gira
You're now ready to configure workflows and request outputs.
Cluster
If installing on a cluster, you can work as above, or create a separate orchestrating environment containing only snakemake, e.g.
micromamba create -n snakemake python=3.9 snakemake
In this context, snakemake
itself can manage the other required dependencies,
creating other environments as necessary. To activate the orchestration
environment:
micromamba activate snakemake
To run the workflow on a cluster you will need to provide a profile, requesting targets as follows:
snakemake --profile <path_to_cluster_config> -- <target_file>
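A profile is a directory containing a config.yaml file whose keys mirror snakemake's command-line options. Below is a minimal sketch only, assuming a SLURM scheduler and a snakemake version that still supports the --cluster option; the partition and resource values are illustrative, not project defaults:
# <path_to_cluster_config>/config.yaml -- hypothetical SLURM profile sketch
jobs: 100                  # maximum number of jobs submitted concurrently
use-conda: true            # let snakemake create per-rule environments
cluster: "sbatch --partition=compute --time=01:00:00 --mem=8G"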
Other command-line tools
The following tools are not available through conda
and must be installed separately.
exactextract
exactextract is used for zonal statistics. Please see installation instructions here.
Windows
Native installation on Windows is extremely difficult because several of the tools that open-gira relies on are hard to build for Windows (binaries are not available).
To avoid this issue, we recommend Windows users install via the Windows Subsystem for Linux (WSL). To install WSL, follow the install instructions.
Once WSL is installed, open-gira can be installed inside the Linux subsystem by following the Linux install instructions.
Docker
It is possible to run open-gira
within a container using Docker. You must have Docker
installed.
- Clone this repository
git clone https://github.com/nismod/open-gira.git
- Build container image
docker build -t open-gira .
- Run container
We use the -it
options to make the container interactive and give it a terminal,
and the -d
option to keep it running in the background (daemon mode).
We give it a nice name we can refer to later, and make sure it is created using the
image we just built.
We tell it to run bash
so that it has something to do and doesn't close immediately.
docker run -itd --name og open-gira bash
- Enter container
We can now launch a bash shell in our new container:
docker exec -it og /bin/bash
And once inside we can navigate to the open-gira folder.
cd open-gira
We're now ready to run snakemake, launch a python prompt, or do whatever else we like inside the container.
Usage
open-gira comprises a set of snakemake rules which call scripts and library code to request data, process it and produce results.
Snakemake
The key idea of snakemake
is similar to make
in that the workflow is
determined from the end (the files users want) to the beginning (the files
users have, if any) by applying general rules with pattern matching on file and
folder names.
An example invocation looks like:
snakemake --cores 2 -- results/wales-latest_filter-road-primary/edges.gpq
Here, we ask snakemake
to use up to 2 CPUs to produce a target file, in this
case, the edges of the Welsh road network. snakemake pattern-matches wales-latest as the OSM dataset name and road-primary as the network type we want to filter for.
To check what work we're going to request before commencing, use the -n
flag:
snakemake -n --cores 2 -- results/wales-latest_filter-road-primary/edges.gpq
This will show which rules would need to run to produce the target file, without executing them. It may be helpful to visualise which rules are expected to run, too.
Configuration
The snakemake configuration details are in config/config.yaml. You can edit this to set the target OSM infrastructure datasets, number of spatial slices, and hazard datasets. See config/README.md and the documentation for each workflow for more details on the configuration variables.
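For orientation, the top-level keys referenced throughout this documentation are outlined below. This is an illustrative sketch only (values elided); see config/README.md for the authoritative schema:
# config/config.yaml -- outline sketch of keys used in this documentation
infrastructure_datasets: {}        # OSM extract name -> .osm.pbf URL
network_filters: {}                # network type -> osmium filter file
keep_tags: {}                      # network type -> OSM tags to retain
slice_count: 9                     # number of spatial slices (a square number)
hazard_datasets: {}                # hazard name -> file of hazard layers
hazard_types: {}                   # hazard name -> hazard type, e.g. flood
direct_damages: {}                 # damage curve configuration
storm_sets: {}                     # storm set name -> JSON file of storm IDs
transmission_windspeed_failure: [] # wind speed thresholds in m/s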
Network creation
open-gira
can currently create connected road, rail and electricity grid
networks from open data.
Road
We can create a topologically connected road network for a given area from OpenStreetMap (OSM) data. The resulting network can be annotated with data retrieved from OSM (e.g. highway classification, surface type), along with data looked up from user-supplied sources (e.g. rehabilitation costs).
Description
- Download or copy the .osm.pbf file to the input data directory.
- Filter the OSM file with osmium tags-filter for elements matching the filter file (see configuration, below).
- Define a bounding box for the OSM file; write to disk as JSON.
- Define a grid partitioning the bounding box into a square number of 'slices' as a series of JSON files, one for each slice.
- Cut the OSM file into these slices.
- Convert the sliced OSM files into geoparquet, retaining the keep_tags as configured.
- Clean and annotate features in the geoparquet files (joining additional data such as country, rehabilitation costs, etc.).
- Join sliced network components together.
Configuration
To specify a desired network:
- Review and amend the spreadsheets in bundled_data/transport; these supply information that is used to gap-fill or extend what can be determined from OSM alone.
- Review and amend config/config.yaml (a sketch follows this list):
  - The infrastructure_datasets map should contain a key pointing to an .osm.pbf file URL for the desired area. There are currently entries for the planet, for (some definition of) continents and several countries. We use the geofabrik service for continent and country-level OSM extracts.
  - Check the OSM filter file pointed to by network_filters.road. This file specifies which elements (nodes, ways or relations) to keep (or reject) from the multitude of data in an OSM file. See the filter expressions section here for more information on the syntax of these files.
  - Check and amend keep_tags.road. This list of strings specifies which tags (attributes) to retain on the filtered elements we extract from the .osm.pbf file.
  - Review slice_count. This controls the degree of parallelism possible. With it set to 1, there is no spatial slicing (we create the network in a single chunk). To speed network creation for large domains, it can be set to a larger square number. The first square number greater than your number of available CPUs is a good heuristic.
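As a concrete illustration of the keys above, a hypothetical config/config.yaml fragment for a road network of Wales might read as follows; the filter file path and tag list are assumptions for demonstration, not shipped values:
# illustrative fragment only
infrastructure_datasets:
  wales-latest: "https://download.geofabrik.de/europe/great-britain/wales-latest.osm.pbf"
network_filters:
  road: "config/osm_filters/road.txt"     # hypothetical filter file path
keep_tags:
  road: ["highway", "surface", "bridge"]  # example tags, not a complete list
slice_count: 9                            # square number; 1 disables slicing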
Creation
To create the network, by way of example:
snakemake --cores all -- results/egypt-latest_filter-road-primary/edges.gpq
Rail
The process for creating a rail network is essentially the same as for road.
We can create a topologically connected rail network for a given area from OpenStreetMap (OSM) data. The resulting network can be annotated with data retrieved from OSM, along with data looked up from user-supplied sources (e.g. rehabilitation costs).
Description
- Download or copy the .osm.pbf file to the input data directory.
- Filter the OSM file with osmium tags-filter for elements matching the filter file (see configuration, below).
- Define a bounding box for the OSM file; write to disk as JSON.
- Define a grid partitioning the bounding box into a square number of 'slices' as a series of JSON files, one for each slice.
- Cut the OSM file into these slices.
- Convert the sliced OSM files into geoparquet, retaining the keep_tags as configured.
- Clean and annotate features in the geoparquet files (joining additional data such as country, rehabilitation costs, etc.).
- Join sliced network components together.
Configuration
To specify a desired network:
- Review and amend the spreadsheets in bundled_data/transport; these supply information that is used to gap-fill or extend what can be determined from OSM alone.
- Review and amend config/config.yaml:
  - The infrastructure_datasets map should contain a key pointing to an .osm.pbf file URL for the desired area. There are currently entries for the planet, for (some definition of) continents and several countries. We use the geofabrik service for continent and country-level OSM extracts.
  - Check the OSM filter file pointed to by network_filters.rail. This file specifies which elements (nodes, ways or relations) to keep (or reject) from the multitude of data in an OSM file. See the filter expressions section here for more information on the syntax of these files.
  - Check and amend keep_tags.rail. This list of strings specifies which tags (attributes) to retain on the filtered elements we extract from the .osm.pbf file.
  - Review slice_count. This controls the degree of parallelism possible. With it set to 1, there is no spatial slicing (we create the network in a single chunk). To speed network creation for large domains, it can be set to a larger square number. The first square number greater than your number of available CPUs is a good heuristic.
Creation
To create the network, by way of example:
snakemake --cores all -- results/egypt-latest_filter-rail/edges.gpq
Note that the nodes file, results/egypt-latest_filter-rail/nodes.gpq
will by default contain the stations and their names as recorded in OSM.
Electricity grid
To create an electricity grid we rely heavily on gridfinder data. This dataset provides transmission and distribution edges with substantial global coverage. It also contains a set of 'targets' or electricity consuming areas, derived from Night Time Lights (NTL) satellite imagery. Our other major data source for electricity grid creation is the World Resources Institute's (WRI) powerplants database.
The workflow currently divides network creation by country. One may request one country, or several. Note that neighbouring countries' networks are not connected with one another.
Description
- Download gridfinder electricity grid data (line edges, 'target' consuming polygons)
- Create a global 'targets' file, where each electricity consuming target is annotated with the population and GDP of the enclosed area.
- Download WRI powerplant data
- Download GDP per capita data
- Download population data
- Construct an electricity grid for each country to be studied, where nodes are power generators or target consumers. Power consumption is proportional to the GDP enclosed by each target polygon.
Configuration
There aren't currently any options to configure when creating an electricity grid.
Creation
Here's an example grid creation command for Puerto Rico:
snakemake --cores all -- results/power/by_country/PRI/network/edges.geoparquet
Note that the folder name under by_country is an ISO 3166-1 alpha-3 country code, specifying the country in question.
Risk analysis
Having created networks, we can analyse the risks that certain hazards (currently tropical cyclones and flooding) may present to these networks.
Transport / flooding
The flooding risk analysis pipeline starts by creating an infrastructure network (road or rail) as described previously. Please refer to this section to configure the network creation.
Description
The pipeline has the following core steps:
- Download the raster files for the hazard dataset.
- Crop the raster files to the extent of the network in question.
- Split the network edges on the raster grid, so that no edge spans more than one pixel.
- Find the pixel value / hazard intensity (e.g. flood depth) for every hazard raster, for each split edge.
- For each asset type (e.g. unpaved road) where a damage curve is defined, calculate the damage fraction for each split edge, for each raster.
- Using rehabilitation cost estimates, calculate the expected monetary loss due to the calculated damage fraction.
- Concatenate the previously split edges, summing the expected monetary loss.
- Given hazard rasters at various return periods, calculate the Expected Annual Damages (EAD) by integrating losses over the annual probabilities of occurrence (the reciprocals of the return periods).
- Aggregate EAD to desired regional level.
Configuration
The hazard component of the analysis is configurable in config/config.yaml (a sketch follows this list):
- hazard_datasets contains hazard names pointing to files of hazard layers. These layers are currently flood rasters (inundation depths for a given return period). Add or amend an entry pointing to a file containing the rasters you wish to consider.
- Ensure hazard_types contains an identical key referencing the hazard types. This is currently limited to flood only.
- Configure the damage curves:
  - Check and amend direct_damages.asset_types so that it contains any assets you wish to calculate direct damage costs for. Currently implemented assets are available in src/open_gira/assets.py as the classes inheriting from Assets.
  - Ensure direct_damages.curves_dir is set to a path containing damage functions per asset type, organised by hazard type. See bundled_data/damage_curves for an example.
  - These damage function files should be named <asset_type>.csv, e.g. road_unpaved.csv. Their format is exemplified by bundled_data/damage_curves/flood/road_unpaved.csv.
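A hypothetical fragment tying these keys together (the hazard layer file path is an assumption for illustration):
# illustrative fragment only
hazard_datasets:
  aqueduct-river: "path/to/aqueduct-river_layers.txt"  # hypothetical path
hazard_types:
  aqueduct-river: flood
direct_damages:
  asset_types: ["road_unpaved"]            # classes defined in src/open_gira/assets.py
  curves_dir: "bundled_data/damage_curves" # expects e.g. flood/road_unpaved.csv within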
Outputs
Rasterised network (per slice)
To generate a network split by the hazard raster grid, we might issue something like:
snakemake --cores all -- results/splits/<dataset_name>_filter-<network_type>/hazard-<hazard_name>/slice-<slice_number>.geoparquet
For the egypt-latest
OSM dataset, road
network, aqueduct-river
hazard set and slice 0
, that would be:
snakemake --cores all -- results/splits/egypt-latest_filter-road/hazard-aqueduct-river/slice-0.geoparquet
Expected Annual Damages (per slice)
To request an evaluation of Expected Annual Damages (EAD) as a function of hazard Return Period (RP) for a given slice, we can request something like the following (here with a config.slice_count of 9):
snakemake --cores all -- results/direct_damages/egypt-latest_filter-road/hazard-aqueduct-river/EAD_and_cost_per_RP/slice-5.geoparquet
Expected Annual Damages (per admin region)
And to compute all the slices for a given domain and then aggregate to country level (admin level 0):
snakemake --cores all -- results/egypt-latest_filter-road/hazard-aqueduct-river/EAD_and_cost_per_RP/agg-sum/admin-level-0.geoparquet
For more possible outputs please refer to the detailed documentation and the
rules defined in workflow/rules/
.
Electricity grids - tropical cyclones
This analysis intersects electricity grids with tropical cyclone wind speed risk. Network creation is described previously. The hazards are event-based (as opposed to return period map based).
Description
- Download storm track data.
- Process storm track data into a common geoparquet format, ensuring meteorological variables are consistent.
- Define a wind speed grid, with an extent equal to the network bounding box, plus a buffer.
- Given a storm track, estimate the resulting wind speed field as a function of time using a modified Holland model.
- Downscale the wind speeds from the free atmosphere to surface level using a power law. The downscaling exponent is a function of the surface roughness, derived from a land surface categorisation.
- Take the maximum wind speed across time for each pixel in the wind grid.
- For a given threshold or set of thresholds, fail electricity grid edges (transmission and distribution lines) that experience wind speeds in excess of the threshold.
- Allocate power across the newly degraded grid and see how it differs from the nominal allocation.
- Report on the change in power supply and the numbers of people affected.
Configuration
- Review and amend config/config.yaml (a sketch follows this list):
  - storm_sets should contain a storm set name, pointing to a JSON file. This JSON file should contain an empty list to process all storms for this storm set, or a list of string storm IDs if only a subset is required.
  - Specify transmission_windspeed_failure as a list of wind speeds in meters per second at which to fail edges.
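As an illustration, assuming the IBTrACS storm set used in the examples below (the JSON file path is hypothetical):
# illustrative fragment only
storm_sets:
  IBTrACS: "config/storm_sets/ibtracs.json"   # hypothetical path
transmission_windspeed_failure: [20, 30, 40]  # example thresholds in m/s
The JSON file itself would then contain [] to process every storm, or an explicit subset such as ["2017242N16333"].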
See comments in the config/config.yaml
file for other less crucial
configuration options.
Outputs
Below you will find example commands necessary to create outputs for the case of PRI (Puerto Rico) and the storm 2017242N16333 (Irma, 2017) from the IBTrACS historic cyclone dataset. Later steps will trigger earlier, dependent steps if necessary.
Networks at risk
- Identify which storms are likely to impact which countries
snakemake --cores 1 -- results/power/by_storm_set/IBTrACS/storms_by_country_impacted.json
Maximum wind speed fields
- Download storm track data (historic or synthetic)
- Preprocess data into a (mostly) common event-set format.
snakemake --cores 1 -- results/storm_tracks/IBTrACS/tracks.geoparquet
- Download land surface categorical data
- Estimate surface roughness for the grid region
snakemake --cores 1 -- results/power/by_country/PRI/storms/surface_roughness.tiff
- Process any storm tracks within some proximity of the grid in question into maximum wind speed footprints for the electricity grid region, by means of a modified Holland wind field model. Downscale the winds to the surface by a power law, using the surface roughness data.
snakemake --cores 1 -- results/power/by_country/PRI/storms/IBTrACS/max_wind_field.nc
Network exposure
- Rasterise the electricity grid (split line segments such that no segment extends beyond a wind cell boundary).
snakemake --cores 1 -- results/power/by_country/PRI/exposure/edges_split.geoparquet
- For a given storm and country, remove edges from the grid which exceed a given wind speed damage threshold. Reallocate power from powerplants to targets, by GDP, on the degraded network. Store the ratio between the original power supplied and the degraded power supplied to each target.
snakemake --cores 1 -- results/power/by_country/PRI/exposure/IBTrACS/2017242N16333.nc
- Perform the above calculations for a single storm in a cyclone dataset and all the countries its track intersects.
snakemake --cores 1 -- results/power/by_storm_set/IBTrACS/by_storm/2017242N16333/exposure_by_target.nc
- Perform the above calculations for an entire cyclone dataset and all the countries its tracks intersect.
snakemake --cores 1 -- results/power/by_storm_set/IBTrACS/exposure.txt
Further analysis
TODO: Add additional rules to facilitate comparison across countries and cyclone datasets.
TODO: Document these additional rules.
- Map how electricity supply is impacted by a given storm for a variety of different damage thresholds.
snakemake --cores 1 -- results/power/by_storm_set/IBTrACS/by_storm/2017242N16333/outage_map/outage_map_by_threshold.gif
- Map the maximum wind speed experienced across the area impacted by a storm.
snakemake --cores 1 -- results/power/by_storm_set/IBTrACS/by_storm/2017242N16333/wind_field.png
There also exist other plotting and mapping steps to visualise intermediate and final outputs. Refer to workflow/rules/analysis
for a description of these.
Utilities
open-gira
comes with a few small utilities for data processing and file management.
Removing intermediate files
You can remove intermediate files by running the clean
rule.
snakemake --cores 1 -R clean
Geoparquet -> Geopackage
As standard we use the GeoParquet (.geoparquet
or
.gpq
) format to store vector data on disk. Unfortunately common GIS software
such as QGIS may not yet support this file format. To convert file(s) to
geopackage, use:
python workflow/scripts/pq_to_gpkg.py <path_to_geoparquet_1> <path_to_geoparquet_2> <...>
This will write .gpkg
files beside their source .geoparquet
.
Unpickling interactive plots
matplotlib plots can be interactive (zoom, pan, etc.), but not when saved as static images. Some rules therefore produce pickled plot files. To view these, use:
python workflow/scripts/unpickle_plot.py <path_to_pickled_plot>
Archiving results
The bash script archive_results.sh
can be used to back up analysis results.
Here's an example usage:
./archive_results.sh results/ /mnt/backup/open-gira
This will create a timestamped folder in the path given by the second argument, e.g.
/mnt/backup/open-gira/2023-07-24T101756+0100
, containing an archive of results
(excluding input files downloaded from the internet) and a README describing the
state of the repository at the time of archive creation.