
GitHub Composite Actions

GitHub quietly sneaked out a new feature in the last few weeks: Composite Actions. This solves one of my biggest issues with GitHub Actions, which is that YAML workflow steps couldn't be re-used. As a result, I have lots of repos with the exact same YAML workflow, or roughly the same workflow with slight variations. For example these 3 repos (and others) currently use the exact same workflow for CI:

Composite actions essentially allow YAML workflow steps to be packaged and re-used in the same way that JavaScript actions or Docker actions can be, so I can package this up as a shared action and use it in lots of repos. This is a feature I have been hoping would be added to Actions for some time and I'm looking forward to DRYing up my workflows across repos over the coming weeks.

Browse The Web Like A Crawler

Some sites sneakily serve different content to crawlers and interactive users. Setting your user agent to:

Mozilla/5.0 (compatible; Googlebot/2.1; +

allows you to see what Googlebot sees. Uaswitcher for Firefox has a preset for this out of the box.
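The same trick works outside the browser. Here is a minimal sketch using only Python's standard library; the function names are my own, and the user agent is passed in as a parameter so you can paste in whichever crawler string you want to impersonate.

```python
import urllib.request


def build_crawler_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a request that announces itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_as(url: str, user_agent: str, timeout: float = 10.0) -> str:
    """Fetch a page while presenting the given User-Agent and return its text."""
    request = build_crawler_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch_as twice for the same URL, once with a normal browser string and once with the crawler string, and comparing the results is a quick way to spot sites that serve crawlers something different.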

Learnings from a year on Heroku

We migrated to Heroku just over a year ago. Heroku is a great platform but through the process of running a high-traffic application (on a busy day we serve ~30 million requests) I've got to know some of the idiosyncrasies of the platform. In this post I will run through a few of the esoteric details that have surprised me along the way as I've discovered them.

📅 Daily Restarts

Heroku's dyno manager restarts or cycles your dynos once per day. This behaviour is non-configurable. With the default settings, these daily restarts may also lead to a bit of downtime due to Heroku's default dyno lifecycle, which leads us to..

🔛 Preboot

A surprising default behaviour of Heroku is that when you deploy or restart (or Heroku restarts your dyno for you), Heroku stops the existing set of web dynos before starting the new ones. Turning on the preboot feature means Heroku will ensure the new dynos are receiving traffic before shutting down the old ones.

heroku features:enable preboot -a <myapp>

💥 Crash Restart Policy

Sometimes bugs hit production. If your application crashes, for whatever reason, Heroku will automatically restart the dyno for you, but it implements a backoff policy. In some cases this can lead to a situation where some of your dynos spend over 5 hours not serving any traffic because they are in a cool-off period instead of recovering immediately. If this happens, the Heroku web UI will report you are running N dynos even when fewer than N of them are able to actively serve traffic because some are in a cool-off period waiting to start. Running

heroku ps -a <myapp>

does tell you which dynos are up and which (if any) are crashed. This brings us on to the next point..

👑 The CLI is King

If you're planning on working with Heroku in any depth, you need to get familiar with the CLI. There are certain operations which can be performed via the CLI or API which just aren't available via the Web UI or where the CLI allows you to see more detailed information than the Web UI can show you. Fortunately Heroku's CLI app is very high quality. The --help is very discoverable and it includes examples as well as a reference. This makes it really easy to just jump in and work things out as you go.

I've already hinted at heroku features earlier in the post, but there are a variety of useful flags which can be listed with

heroku features -a <myapp>  # stable
heroku labs -a <myapp>  # experimental

Many of these also can't be enabled/disabled via the Web UI.

Running a GitHub action on pull request merge

GitHub workflows can be triggered by a variety of events. The pull_request event has 14 different activity types. I recently wanted to write an action that runs after a pull request is merged and I was surprised to find that merged is not one of the supported activity types. This is easy enough to work around using the following snippet, but it is a surprising omission.

on:
  pull_request:
    types: [closed]

jobs:
  on-merge:  # hypothetical job name
    if: github.event.pull_request.merged == true
    steps: ...
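If you would rather make the same decision inside a script step, the check can be sketched in Python against the event payload. This assumes the runner exposes the webhook payload as JSON at the path in the GITHUB_EVENT_PATH environment variable; the function names are my own.

```python
import json
import os
from typing import Optional


def is_merged_pull_request(event: dict) -> bool:
    """True only for a pull_request event that was closed by merging."""
    pull_request = event.get("pull_request") or {}
    return event.get("action") == "closed" and pull_request.get("merged") is True


def load_event(path: Optional[str] = None) -> dict:
    """Load the webhook payload that the runner writes to disk."""
    path = path or os.environ["GITHUB_EVENT_PATH"]
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

A pull request that is closed without merging also fires the closed activity type, which is why the merged flag has to be checked as well.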

Identifying poorly indexed tables in Postgres

Out of the box the Postgres statistics collector gives us a variety of useful views we can query to understand the usage and performance of a database. This is a huge topic, so I'm just going to look at one very constrained topic in this post.

Sequential scans are OK on small tables or if we are making them infrequently, but usually if we are performing a lot of sequential scans on large tables this is a sign that we need to create an index. Fortunately Postgres collects some data in pg_stat_user_tables that can help us identify this:

  • Looking at seq_scan and idx_scan can show us tables where we are frequently performing sequential scans rather than indexed scans
  • Looking at seq_scan and seq_tup_read can show us tables where sequential scans typically process a large number of rows

Running this query:

SELECT
    relname, idx_scan, seq_scan, seq_tup_read,

    seq_tup_read / seq_scan
    AS average_tuples_per_scan,

    -- idx_scan is NULL for tables with no indexes, so treat it as 0
    ((seq_scan / (seq_scan + COALESCE(idx_scan, 0)) :: FLOAT) * 100)
    AS percent_sequential_scans
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY average_tuples_per_scan DESC;

Should provide a pretty clear picture of tables that need optimisation.
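To make the two derived columns concrete, here is the same arithmetic as a small Python sketch. The table names and counter values are invented for illustration, and a missing idx_scan (a table with no indexes) is treated as zero.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TableStats:
    relname: str
    seq_scan: int
    seq_tup_read: int
    idx_scan: Optional[int]  # None when the table has no indexes


def scan_metrics(table: TableStats) -> Tuple[int, float]:
    """Return (average_tuples_per_scan, percent_sequential_scans)."""
    # Callers must filter out seq_scan == 0, as the WHERE clause does.
    index_scans = table.idx_scan or 0
    average_tuples = table.seq_tup_read // table.seq_scan  # integer division
    percent_sequential = 100 * table.seq_scan / (table.seq_scan + index_scans)
    return average_tuples, percent_sequential


# Hypothetical sample rows, not real measurements.
rows = [
    TableStats("countries", seq_scan=5_000, seq_tup_read=1_000_000, idx_scan=None),
    TableStats("orders", seq_scan=900, seq_tup_read=45_000_000, idx_scan=100),
]

for table in sorted(rows, key=lambda t: scan_metrics(t)[0], reverse=True):
    average_tuples, percent_sequential = scan_metrics(table)
    print(f"{table.relname}: {average_tuples} tuples/scan, {percent_sequential:.0f}% sequential")
```

In this made-up data the orders table is the one to worry about: each sequential scan reads far more rows, even though the small countries table is scanned sequentially every time.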

Renovate Custom Managers

Tools like Dependabot and PyUp are amazing for automatically keeping dependencies updated. They work fine, as long as your dependencies are distributed via a supported language-specific package manager and specified in a manifest file. This doesn't account for 100% of application dependencies though, particularly when it comes to deployment. This is where Renovate has a killer feature: Custom Managers. The setup is a little fiddly, but this allows Renovate to submit pull requests against arbitrary files like a Dockerfile or an Ansible playbook, bumping applications like nginx, curl or Postgres which aren't specified in a requirements.txt or a package.json.
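The core idea is a regular expression with named capture groups that tells the tool where a version is pinned and what it belongs to. The Python sketch below only illustrates that idea and is not Renovate configuration: the annotation comment, the group names, and the sample Dockerfile are assumptions chosen for the example, and Renovate's own regex syntax for named groups differs from Python's.

```python
import re

# Matches an annotation comment followed by a pinned *_VERSION variable.
PATTERN = re.compile(
    r"#\s*renovate:\s*datasource=(?P<datasource>\S+)\s+depName=(?P<depName>\S+)\s*\n"
    r"(?:ENV|ARG)\s+\w+_VERSION=(?P<currentValue>\S+)"
)

# Hypothetical Dockerfile contents, for illustration only.
DOCKERFILE = """\
FROM debian:bookworm-slim
# renovate: datasource=github-releases depName=nginx/nginx
ENV NGINX_VERSION=1.25.3
"""


def find_pinned_versions(text: str) -> list:
    """Return one dict of named groups per annotated version pin."""
    return [match.groupdict() for match in PATTERN.finditer(text)]


for pin in find_pinned_versions(DOCKERFILE):
    print(f"{pin['depName']} is pinned to {pin['currentValue']} (via {pin['datasource']})")
```

Once a tool can extract the dependency name, its current value and where to look up newer releases, it has everything it needs to propose a bump in a pull request.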

git --word-diff

Git's default diff behaviour can make it difficult to parse edits to long lines of text. This is often an issue when editing documentation.

[Image: unhelpful git diff] Thanks git, but what actually changed?

In these situations, the --word-diff option can be used to generate a diff that is easier to read.

[Image: helpful git diff] Aah that's better!
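To show what a word-level diff does to a long line, here is a rough Python sketch built on difflib. It is not git's algorithm; it just splits on whitespace and borrows the [-removed-] and {+added+} markers from git's plain word-diff output.

```python
from difflib import SequenceMatcher


def word_diff(old: str, new: str) -> str:
    """Mark removed words as [-...-] and added words as {+...+}."""
    old_words, new_words = old.split(), new.split()
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(" ".join(old_words[i1:i2]))
            continue
        chunk = ""
        if tag in ("delete", "replace"):
            chunk += "[-" + " ".join(old_words[i1:i2]) + "-]"
        if tag in ("insert", "replace"):
            chunk += "{+" + " ".join(new_words[j1:j2]) + "+}"
        parts.append(chunk)
    return " ".join(parts)


print(word_diff("the quick brown fox", "the quick red fox"))
# prints: the quick [-brown-]{+red+} fox
```

The unchanged words stay in place, so the eye lands straight on the edit instead of having to compare two near-identical lines.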

Capturing stdout in Python

Sometimes it is helpful to capture stdout/stderr. I usually use this when writing tests, either to make assertions about terminal output or just to suppress it while the tests are running. Normally I would do a little switcheroo like this to manually manipulate sys.stdout:

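from io import StringIO
import sys

sys.stdout = StringIO()  # swap in an in-memory buffer
print("hello")  # anything printed here is captured
val = sys.stdout.getvalue()
sys.stdout = sys.__stdout__  # restore the real stdout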

but today I learned about Python's contextlib.redirect_stdout and contextlib.redirect_stderr. These standard library context managers provide a built-in abstraction over this operation:

from io import StringIO
from contextlib import redirect_stdout

with StringIO() as buf, redirect_stdout(buf):
    print("hello")  # captured in buf instead of the terminal
    val = buf.getvalue()
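For tests, both context managers can be wrapped in a small helper. This is just a sketch; capture_output and noisy_add are names I made up for the example.

```python
from contextlib import redirect_stderr, redirect_stdout
from io import StringIO
import sys


def capture_output(func, *args, **kwargs):
    """Call func and return (result, captured stdout, captured stderr)."""
    out, err = StringIO(), StringIO()
    with redirect_stdout(out), redirect_stderr(err):
        result = func(*args, **kwargs)
    return result, out.getvalue(), err.getvalue()


def noisy_add(a, b):
    """Example function that writes to both streams."""
    print("adding")
    print("careful", file=sys.stderr)
    return a + b


result, stdout_text, stderr_text = capture_output(noisy_add, 1, 2)
assert result == 3
assert stdout_text == "adding\n"
assert stderr_text == "careful\n"
```

Both streams are restored automatically when the with block exits, even if the function raises.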