Tools, tips and tricks (A-Z)

This is a grab-bag of examples, tools, notes, hints and pointers to good practice.

All the data conversions

ogr2ogr is a Swiss Army knife for spatial data formats.

Re-project a shapefile:

# -s_srs is the source srs, -t_srs is the target srs
# the output filename comes first, then the input filename
ogr2ogr \
    -s_srs "EPSG:4326" \
    -t_srs "EPSG:27700" \
    buildings_osgb.shp \
    buildings_latlon.shp

Convert from GeoDatabase to CSV, filtering and selecting:

# -f CSV                  output a CSV
# -limit 10               only 10 features
# -select ...             only 2 attributes
# -lco GEOMETRY=AS_WKT    output geometry as Well-Known-Text in the CSV
# then: output filename, filename to read from (this one's a geodatabase),
# and the input layer name to read
ogr2ogr \
    -f CSV \
    -limit 10 \
    -select FloorArea,AvgPop2016 \
    -lco GEOMETRY=AS_WKT \
    ./nrd_res.csv \
    ./nrd.gdb \
    NRD2014_LTIS_FloorArea_RES
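
When the same conversion has to be repeated for many files, it can help to call ogr2ogr from a script. The following is a minimal sketch, not a definitive wrapper: the function names build_ogr2ogr_command and reproject are made up for illustration, and it assumes ogr2ogr is installed and on the PATH. It builds the same re-projection command as above.

```python
import shlex
import subprocess


def build_ogr2ogr_command(src, dst, source_srs, target_srs):
    """Build the argument list for re-projecting src into dst with ogr2ogr."""
    return [
        "ogr2ogr",
        "-s_srs", source_srs,  # source srs
        "-t_srs", target_srs,  # target srs
        dst,                   # output filename comes first
        src,                   # input filename comes second
    ]


def reproject(src, dst, source_srs="EPSG:4326", target_srs="EPSG:27700"):
    """Run ogr2ogr (hypothetical helper); raises CalledProcessError on failure."""
    subprocess.run(build_ogr2ogr_command(src, dst, source_srs, target_srs), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this works without GDAL installed
    cmd = build_ogr2ogr_command(
        "buildings_latlon.shp", "buildings_osgb.shp", "EPSG:4326", "EPSG:27700"
    )
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with filenames that contain spaces.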

The docs are quite terse, but there are helpful blog posts and cheatsheets around.

Commit messages

Good commit messages are helpful for future collaborators (including your future self).

Rules:

  1. Capitalize the subject line
  2. Use the imperative mood in the subject line

One-line commit messages can be great - clear and succinct:

Fix typo in introduction to user guide

Sometimes more context is very useful:

  1. Separate subject from body with a blank line
  2. Use the body to explain what and why vs. how

Summarize changes in around 50 characters or less

Explain the problem that this commit is solving. Focus on why you
are making this change as opposed to how (the code explains that).

Further paragraphs come after blank lines.

 - Bullet points can be handy, too
 - Use a hyphen or an asterisk

If you use an issue tracker, put references to them at the bottom,
like this:

Resolves: #123
See also: #456, #789
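
The mechanical rules above (capitalized subject, roughly 50 characters, blank line before the body) can be checked automatically. This is a rough sketch only: check_commit_message is a hypothetical helper, the 50-character limit is treated as a hard cut-off for simplicity, and the imperative mood is not checked because that needs human judgement.

```python
def check_commit_message(message, max_subject_length=50):
    """Return a list of problems found in a commit message (empty if none)."""
    problems = []
    lines = message.splitlines()
    subject = lines[0] if lines else ""

    if not subject.strip():
        problems.append("Subject line is empty")
        return problems
    if not subject[0].isupper():
        problems.append("Capitalize the subject line")
    if len(subject) > max_subject_length:
        problems.append(f"Subject is longer than {max_subject_length} characters")
    # A body, if present, must be separated from the subject by a blank line
    if len(lines) > 1 and lines[1].strip():
        problems.append("Separate subject from body with a blank line")
    return problems


if __name__ == "__main__":
    print(check_commit_message("Fix typo in introduction to user guide"))
    print(check_commit_message("fix typo\nforgot the blank line"))
```

A function like this could be called from a git commit-msg hook, though that wiring is left out here.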

Databases

If you know shapely, you almost know PostGIS:

The Postgres docs are very good:

Start with the PostgreSQL tutorial for a database intro/refresher:

Work through ‘Intro to PostGIS’ with examples and exercises:

Excel

Pivot table/chart

Reading - just use pandas

Updating, calculating - xlwings interacts with Excel and is a strong alternative to VBA scripting!

GIFs

Making GIFs from a series of still images

# create discrete-image gif
convert -delay 20 flow_map_*.png -loop 0 flow_map.gif

# create smooth-blended gif
convert -delay 5 flow_map_*.png -loop 0 -morph 5 flow_map_smooth.gif

On Windows, the ‘portable’ version of ImageMagick doesn’t need admin rights: e.g. if the zip is extracted to Users\<username>\bin\imagemagick, replace convert with %USERPROFILE%\bin\imagemagick\convert.exe
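
One thing to watch with a pattern like flow_map_*.png: the shell typically expands it in alphabetical order, so flow_map_10.png comes before flow_map_2.png unless the frame numbers are zero-padded. The sketch below assumes frames are numbered without padding (the filenames are made up for illustration) and shows a small sort key, here called natural_key, that puts them in numeric order.

```python
import re


def natural_key(filename):
    """Split a filename into text and integer parts so that 2 sorts before 10."""
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", filename)]


# Hypothetical frame names, numbered without zero-padding
frames = ["flow_map_10.png", "flow_map_2.png", "flow_map_1.png"]

alphabetical = sorted(frames)                # plain string order
numeric = sorted(frames, key=natural_key)    # frame-number order

if __name__ == "__main__":
    print(alphabetical)
    print(numeric)
```

The simpler fix, where you control how the frames are written, is to zero-pad the numbers in the filenames (flow_map_001.png and so on).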

More Workflow pointers

Two overview papers from Wilson et al:

Best practices for scientific computing

Good enough practices in scientific computing

Pandoc

Convert text documents from one format to another - markdown/HTML/Word/LaTeX…

Project layout

Main ideas:

CITATION
README
LICENSE
requirements.txt
data
 |-- birds_count_table.csv
data_as_provided
 |-- SOURCES.md
 |-- ConservationAreas2017.zip
docs
 |-- notebook.md
 |-- manuscript.md
 |-- changelog.txt
results
 |-- summarized_results.csv
figures
 |-- map.png
src
 |-- map_sightings.py
 |-- sightings_analysis.py
 |-- runall.py

(adapted from Wilson et al, linked above)
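
To start a new project with this layout, the empty skeleton can be created by a short script. This is a sketch under assumptions: create_project_skeleton is a hypothetical helper, and it only creates the top-level files, the folders and SOURCES.md as empty placeholders. The example data, documents and scripts in the listing are not generated.

```python
import tempfile
from pathlib import Path

# Top-level files and folders taken from the layout above
TOP_LEVEL_FILES = ["CITATION", "README", "LICENSE", "requirements.txt"]
FOLDERS = ["data", "data_as_provided", "docs", "results", "figures", "src"]


def create_project_skeleton(root):
    """Create empty placeholder files and folders under root; return the root path."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    for name in TOP_LEVEL_FILES:
        (root / name).touch(exist_ok=True)
    for name in FOLDERS:
        (root / name).mkdir(exist_ok=True)
    # Keep a record of where the raw data came from, next to the raw data
    (root / "data_as_provided" / "SOURCES.md").touch(exist_ok=True)
    return root


if __name__ == "__main__":
    # Try it out in a temporary directory so nothing is left behind
    with tempfile.TemporaryDirectory() as tmp:
        project = create_project_skeleton(Path(tmp) / "example_project")
        for path in sorted(project.rglob("*")):
            print(path.relative_to(project))
```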

QGIS label rules

Label features using an expression - the line below concatenates the values of the from_id and to_id attributes to label some network edges, producing labels like 1<>2, 18<>27

concat(from_id, '<>', to_id)

R

R is something of a de facto standard language for stats - it has lots of good statistics packages, and ggplot2 is great for charts.

R for Data Science has a nice intro to R and data work:

Re-use code

snkit - a spatial networks toolkit - is an example of a package that’s a work in progress. The aim is to collect together bits of code that are often useful for data cleaning and are also tricky to get right (handling enough of the edge cases, running reasonably quickly).

Zoom

Tidy data

This is a simple but powerful idea about how to organise tabular data:

Readable paper which sets it out and runs through a few examples:

Untidy examples:

Untidy

The same dataset, tidied:

Tidy
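
As a rough illustration of the untidy-to-tidy step, the sketch below reshapes a small table with one column per year into one row per observation, using only the standard library. The site names and counts are made up, and the function name tidy is hypothetical. With pandas, pandas.melt does the same reshaping.

```python
# Hypothetical counts, one column per year ("wide", untidy)
untidy = [
    {"site": "A", "2016": 12, "2017": 15},
    {"site": "B", "2016": 7, "2017": 9},
]


def tidy(rows, id_column, variable_name, value_name):
    """Turn one row per site (a column per year) into one row per observation."""
    tidied = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue
            tidied.append(
                {id_column: row[id_column], variable_name: column, value_name: value}
            )
    return tidied


tidy_rows = tidy(untidy, id_column="site", variable_name="year", value_name="count")

# With tidy rows, grouping and summarising is a simple loop over observations
total_by_year = {}
for row in tidy_rows:
    total_by_year[row["year"]] = total_by_year.get(row["year"], 0) + row["count"]

if __name__ == "__main__":
    for row in tidy_rows:
        print(row)
    print(total_by_year)
```

Each variable (site, year, count) now has its own column and each observation its own row, so filtering, grouping and plotting all work on the same shape of data.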

What can we do with tidy data?

Use the source

Read open source code

Videos

Create videos from stills (good for lots of frames, where GIFs would start to struggle):

# convert pngs to mp4 (change framerate to change presentation speed)
ffmpeg -framerate 10 -i figures/map_%d.png -pix_fmt yuv420p -c:v libx264 figures/map.mp4

Workflow

Think about how to switch tools and add project structure as things grow:

Process

(image from R for data science, linked above)

  1. Explore data: fire up GIS, Excel, Jupyter notebook
  2. Write scripts: record/reproduce processing steps, scale up
  3. Make a project: more structure, collaboration

Zoom around

Handy websites to quickly create or edit spatial data:

Pick a point on a map, in any* coordinate system:

Simplify some spatial data:

Make a GIF:

Compress images:

Draw flowcharts:

Awesome lists: