Posts

  • HTML 5 Kitchen Sink

    HTML 5 Kitchen Sink is really useful for testing out themes and stylesheets. It also has the helpful side effect of introducing me to (or reminding me about) some of the less common HTML5 elements in the spec each time I use it.


  • Composite Actions vs Reusable Workflows

    A few days after I blogged about GitHub Composite Actions, GitHub launched another similar feature: Reusable Workflows.

    There is a lot of overlap between these features and there are certainly some tasks that could be accomplished with either. Simultaneously, there are some important differences that drive a bit of a wedge between them.

    • A composite action is presented as one “step” when it is invoked in a workflow, even if the action yaml contains multiple steps. Invoking a reusable workflow presents each step separately in the summary output. This can make debugging a failed composite action run harder.

    • Reusable workflows can use secrets, whereas a composite action can’t: you have to pass secrets into a composite action as ordinary inputs. Reusable workflows are also always implicitly passed secrets.GITHUB_TOKEN. This is often convenient, but another way to frame the tradeoff is: if you’re using a reusable workflow published by someone else, it can always read your GITHUB_TOKEN with whatever scopes that token has been granted, which may not always be desirable. A composite action can only read what you explicitly pass it.

    • Both can only take string, number or boolean as a param. Arrays are not allowed.

    • Only a subset of job keywords can be used when calling a reusable workflow. This places some restrictions on how they can be used. To give an example, reusable workflows can’t be used with a matrix but composite actions can, so

      jobs:
        build:
          strategy:
            matrix:
              param: ['foo', 'bar']
      
          uses: chris48s/my-reusable-workflow/.github/workflows/reuse-me.yml@main
          with:
            param: ${{ matrix.param }}
      

      will throw an error, but

      jobs:
        build:
          runs-on: ubuntu-latest
          strategy:
            matrix:
              param: ['foo', 'bar']
          steps:
            - uses: chris48s/my-shared-action@main
              with:
                param: ${{ matrix.param }}
      

      is valid

    • Steps in a composite action cannot use if: conditions, although there are workarounds.

    • A composite action is invoked as a job step, so a job that calls a composite action can have other steps (including calls to other composite actions). A job that calls a reusable workflow can only call that one workflow and can’t contain any other steps.


  • GitHub Composite Actions

    GitHub quietly sneaked out a new feature in the last few weeks: Composite Actions. This solves one of my biggest issues with GH Actions: yaml workflow steps can’t be re-used. As a result, I have lots of repos with the exact same yaml workflow, or roughly the same workflow with slight variations. For example, three of my repos (and others) currently use the exact same workflow for CI.

    Composite actions essentially allow yaml workflow steps to be packaged and re-used in the same way that javascript actions or docker actions can be, so I can package this workflow up as a shared action and use it in lots of repos. This is a feature I have been hoping would be added to Actions for some time and I’m looking forward to DRYing up my workflows across repos over the coming weeks.


  • Browse The Web Like A Crawler

    Some sites sneakily serve different content to crawlers and interactive users. Setting your user agent to:

    Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
    

    allows you to see what Googlebot sees. Uaswitcher for Firefox has a preset for this out of the box.
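
    The same trick works from a script too. As a rough sketch (using the requests library, with example.com as a stand-in URL):

    import requests

    # Pretend to be Googlebot when fetching a page
    GOOGLEBOT_UA = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'

    response = requests.get(
        'https://example.com/',
        headers={'User-Agent': GOOGLEBOT_UA},
    )
    print(response.text)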


  • Learnings from a year on Heroku

    We migrated shields.io to Heroku just over a year ago. Heroku is a great platform but through the process of running a high-traffic application (on a busy day we serve ~30 million requests) I’ve got to know some of the idiosyncrasies of the platform. In this post I will run through a few of the esoteric details that have surprised me along the way as I’ve discovered them.

    📅 Daily Restarts

    Heroku’s dyno manager restarts or cycles your dynos once per day. This behaviour is non-configurable. With the default settings, these daily restarts may also lead to a bit of downtime due to Heroku’s default dyno lifecycle, which leads us to..

    🔛 Preboot

    A surprising default behaviour of Heroku is that when you deploy or restart (or Heroku restarts your dyno for you), Heroku stops the existing set of web dynos before starting the new ones. Turning on the preboot feature means Heroku will ensure the new dynos are receiving traffic before shutting down the old ones.

    heroku features:enable preboot -a <myapp>
    

    💥 Crash Restart Policy

    Sometimes bugs hit production. If your application crashes, for whatever reason, Heroku will automatically restart the dyno for you, but it implements a backoff policy. In some cases this can lead to a situation where some of your dynos spend over 5 hours not serving any traffic because they are in a cool-off period instead of recovering immediately. If this happens, the Heroku web UI will report that you are running N dynos even when fewer than N of them are actively able to serve traffic. Running

    heroku ps -a <myapp>
    

    does tell you which dynos are up and which (if any) are crashed. This brings us on to the next point..

    👑 The CLI is King

    If you’re planning on working with Heroku in any depth, you need to get familiar with the CLI. There are certain operations which can only be performed via the CLI or API, and others where the CLI shows you more detailed information than the Web UI can. Fortunately Heroku’s CLI app is very high quality. The --help output is very discoverable and includes examples as well as a reference, which makes it really easy to just jump in and work things out as you go.

    I’ve already hinted at heroku features earlier in the post, but there are a variety of useful flags which can be listed with

    heroku features -a <myapp>  # stable
    heroku labs -a <myapp>  # experimental
    

    Many of these also can’t be enabled/disabled via the Web UI.


  • Running a GitHub action on pull request merge

    GitHub workflows can be triggered by a variety of events. The pull_request event has 14 different activity types. I recently wanted to write an action that runs after a pull request is merged and I was surprised to find that merged is not one of the supported activity types. This is easy enough to work around using the following snippet, but it is a surprising omission.

    on:
      pull_request:
        types: [closed]
    
    jobs:
      my-job:
        if: github.event.pull_request.merged == true
        runs-on: ubuntu-latest
        steps: ...
    

  • Identifying poorly indexed tables in Postgres

    Out of the box, the Postgres statistics collector gives us a variety of useful views we can query to understand the usage and performance of a database. This is a huge topic, so I’m just going to look at one very constrained question in this post.

    Sequential scans are OK on small tables, or if we are making them infrequently, but performing a lot of sequential scans on large tables is usually a sign that we need to create an index. Fortunately Postgres collects some data in pg_stat_user_tables that can help us identify this:

    • Looking at seq_scan and idx_scan can show us tables where we are frequently performing sequential scans rather than indexed scans
    • Looking at seq_scan and seq_tup_read can show us tables where sequential scans typically process a large number of rows

    Running this query:

    SELECT
        relname, idx_scan, seq_scan, seq_tup_read,
    
        seq_tup_read / seq_scan
        AS average_tuples_per_scan,
    
        ((seq_scan / (seq_scan + idx_scan) :: FLOAT) * 100)
        AS percent_sequential_scans
    FROM pg_stat_user_tables
    WHERE seq_scan > 0
    ORDER BY average_tuples_per_scan DESC;
    

    should provide a pretty clear picture of tables that need optimisation.


  • Renovate Custom Managers

    Tools like Dependabot and PyUp are amazing for automatically keeping dependencies updated. They work fine as long as your dependencies are distributed via a supported language-specific package manager and specified in a manifest file. This doesn’t account for 100% of application dependencies though, particularly when it comes to deployment. This is where Renovate has a killer feature: Custom Managers. The setup is a little fiddly, but it allows Renovate to submit pull requests against arbitrary files like a Dockerfile or an Ansible playbook, bumping applications like nginx, curl or Postgres which aren’t specified in a requirements.txt or a package.json.


  • git --word-diff

    Git’s default diff behaviour can make it difficult to parse edits to long lines of text. This is often an issue when editing documentation.

    [screenshot: an unhelpful git diff] Thanks git, but what actually changed?

    In these situations, the --word-diff option can be used to generate a diff that is easier to read.

    [screenshot: the same change with --word-diff] Aah, that’s better!


  • Capturing stdout in python

    Sometimes it is helpful to capture stdout/stderr. I usually use this when writing tests, to make assertions about terminal output or just to suppress it while the tests are running. Normally I would do a little switcheroo like this to manually manipulate sys.stdout:

    from io import StringIO 
    import sys
    
    sys.stdout = StringIO()
    print('foobar')
    val = sys.stdout.getvalue()
    sys.stdout = sys.__stdout__
    

    but today I learned about python’s contextlib.redirect_stdout and contextlib.redirect_stderr. These standard library context managers provide a built-in abstraction over this operation:

    from io import StringIO 
    from contextlib import redirect_stdout
    
    with StringIO() as buf, redirect_stdout(buf):
        print('foobar')
        val = buf.getvalue()
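
    In a test, that might look something like this (a sketch; greet() is a made-up function for illustration):

    import unittest
    from contextlib import redirect_stdout
    from io import StringIO

    def greet():
        print('hello')

    class TestGreet(unittest.TestCase):
        def test_greet_prints_hello(self):
            # capture anything greet() writes to stdout
            with StringIO() as buf, redirect_stdout(buf):
                greet()
                output = buf.getvalue()
            self.assertEqual(output, 'hello\n')

    if __name__ == '__main__':
        unittest.main()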
    

  • Failing the CI build if django migrations are out of date

    A common mistake in django is to make a model change but forget to run makemigrations to generate a migration for it. Sometimes it is not entirely obvious when this needs to happen. For example, let’s say I’m using the django-extensions library and I define a model like:

    # models.py
    
    from django.db import models
    from django_extensions.db.models import TimeStampedModel
    
    class MyModel(TimeStampedModel, models.Model):
        pass
    

    In this situation, upgrading django-extensions to a new version might require me to regenerate the migrations in my app, even though I haven’t made any changes to models.py, and overlooking this could cause unexpected results.

    Fortunately there is a simple thing I can do to detect and warn if this happens: If I run

    python manage.py makemigrations --check
    

    in my CI build, this will cause the build to fail if my migrations are out of sync with the models, warning me about the problem.


  • Changing the tyres while the car is moving

    I am currently working on a project where I need to migrate a legacy test suite from nose to pytest. The codebase has about 7,000 lines of test code. That isn’t an enormous test suite, but I’m the only person who will be working on it. It will take me a while to get through it all because moving from one test runner to another requires changes to fixtures, factories, etc. and maybe some light codebase refactoring, as well as reviewing/updating the test code itself. I also need to balance this with continuing to deliver bugfix and feature work. I can’t block all delivery on the test suite migration, so here’s the pattern I’m using:

    • Split the test suite into two dirs: /tests/nose and /tests/pt
    • Migrate one module of test code at a time
    • Have the CI build run the old tests with nose and the new ones with pytest and merge the coverage reports:

      - name: Run nose tests
        run: |
          nosetests \
            --with-coverage \
            --cover-package=dir \
            --cover-erase \
            path/to/tests/nose
      
      - name: Run pytest tests
        run: |
          pytest \
            --cov=dir \
            --cov-append \
            path/to/tests/pt
      
    • When I rewrite the last test module left in /tests/nose, we can finally drop nose, delete any nose-specific test helpers and move all the tests from /tests/pt back to /tests.

    This approach buys me several useful things:

    • I can tackle the test suite migration one module at a time
    • I can submit small pull requests that are easy to review
    • I can start writing tests for any new features or bugfixes in pytest right now
    • I can minimise the chance of conflicts between test suite migration and bugfix/feature delivery

    This is a very simple example of a high-level general pattern for managing non-trivial migrations (like moving a project from one web framework to another). It can be applied to larger, more complex migrations and enables us to “change the tyres while the car is moving” (a phrase I stole from my colleague Adrià):

    • Set up a structure that allows both $OLD_BORING_THING and $NEW_SHINY_THING to run at the same time
    • Gradually move code from $OLD_BORING_THING to $NEW_SHINY_THING
    • When we migrate the last bit of code from $OLD_BORING_THING to $NEW_SHINY_THING, remove the structure that allows $OLD_BORING_THING and $NEW_SHINY_THING to co-exist and delete $OLD_BORING_THING
    • Continue to deliver our roadmap in parallel to the migration

  • Using jq to format JSON on the clipboard

    Jq is a great tool. I use it frequently, but this recent article from Sequoia McDowell still has some great tips in it. Well worth a read. One thing that stood out to me from this post was the snippet:

    I like this pretty-printing/formatting capability so much, I have an alias that formats JSON I’ve copied (in my OS “clipboard”) & puts it back in my clipboard:

    alias jsontidy="pbpaste | jq '.' | pbcopy"
    

    That’s crazy-useful, but I need to tweak it a bit. Firstly, as noted, pbcopy and pbpaste are Mac utils. No worries. We can substitute xclip on Linux:

    xclip -selection clipboard -o | jq '.' | xclip -selection clipboard
    

    I spotted a minor problem here though: if I run this and the text on my clipboard isn’t JSON, then jq will exit with a non-zero code, output something like parse error: Invalid numeric literal at line 1, column 6 to stderr and nothing to stdout. The pipe doesn’t care about the non-zero exit code though, so we overwrite whatever was on the clipboard with an empty string. So let’s add a bit of error handling:

    alias jsontidy="xclip -selection clipboard -o | (jq '.' || xclip -selection clipboard -o) | xclip -selection clipboard"
    

    Now jsontidy will format whatever is on the clipboard if it can parse as JSON but leave it alone otherwise.


  • A python CLI app which accepts input from stdin or a file

    This excellent guide on Command Line Interface Guidelines has been shared widely over the last few days and it includes a huge variety of advice on writing elegant and well-behaved command line tools. The whole thing is well worth a read, but I’m going to pick out one quote to focus on in this post:

    If your command is expecting to have something piped to it and stdin is an interactive terminal, display help immediately and quit. This means it doesn’t just hang, like cat.

    Tools that can optionally accept input via a pipe are very useful, and this pattern is entirely possible with python and argparse, but not completely obvious. Here’s a simple example of a python program which:

    • Can accept input as a file: ./myscript.py /path/to/file.txt
    • Can accept input via a pipe: echo 'foo' | ./myscript.py
    • Will print help and exit if invoked interactively with no arguments: ./myscript.py (instead of just hanging and waiting for input)

    #!/usr/bin/env python3
    
    import argparse
    import sys
    
    def main():
        parser = argparse.ArgumentParser(
            description='A well-behaved CLI app which accepts input from stdin or a file'
        )
        parser.add_argument(
            'file',
            nargs='?',
            help='Input file, if empty stdin is used',
            type=argparse.FileType('r'),
            default=sys.stdin,
        )
        args = parser.parse_args()
    
        if args.file.isatty():
            parser.print_help()
            return 0
    
        sys.stdout.write(args.file.read())
        return 0
    
    if __name__ == '__main__':
        sys.exit(main())
    

  • pip 20.3

    There is a very long-standing issue on the pip repository: pip needs a dependency resolver. Most language package managers (e.g: composer, bundler, cargo, etc) either use a resolver to derive a consistent dependency tree and prevent incompatible installations, or in the case of NPM/yarn allow “broken diamond” resolution (where more than one version of the same package can be installed at the same time). For the longest time, pip has had no true resolver, allowing incompatible dependencies to be installed. Until now..

    Today’s release of Pip 20.3 is the culmination of a long programme of work to implement a proper dependency resolver in pip. This makes pip 20.3 the most significant pip release in a very long time, possibly ever.

    It has actually been possible to preview this feature for some time using the --use-feature=2020-resolver flag in pip 20.2.x or by installing the beta releases of pip 20.3, but pip 20.3 is the first stable release to enable the new resolver by default. This means that a command like:

    pip install virtualenv==20.0.2 six==1.11
    

    or

    printf "virtualenv==20.0.2\nsix==1.11" > requirements.txt
    pip install -r requirements.txt
    

    will now refuse to install anything and will throw a helpful error like:

    ERROR: Cannot install six==1.11 and virtualenv==20.0.2 because these package versions have conflicting dependencies.
    
    The conflict is caused by:
        The user requested six==1.11
        virtualenv 20.0.2 depends on six<2 and >=1.12.0
    
    To fix this you could try to:
    1. loosen the range of package versions you've specified
    2. remove package versions to allow pip attempt to solve the dependency conflict
    
    ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies
    

    which is a massive improvement over the previous behaviour. There is a gotcha though. Resolution only considers the packages being installed in that command. It doesn’t take into account other packages already installed in your (virtual) environment. This means running

    pip install virtualenv==20.0.2 && pip install six==1.11
    

    does not fail with a ResolutionImpossible error. Pip will still install the incompatible six==1.11 and warn you that it has done so:

    ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
    ERROR: virtualenv 20.0.2 requires six<2,>=1.12.0, but you'll have six 1.11.0 which is incompatible.
    

    This isn’t a bug. This behaviour is noted in the error message and documented in the release notes, but it is worth understanding this limitation. Even with the new resolver, pip will still install incompatible dependencies in your environment if given the right combination of commands. This is a situation it is possible to wander into without realising it, and without consciously enabling the legacy behaviour with --use-deprecated=legacy-resolver. The improved messaging around this is useful, but it is still a bit of a disappointment.

    I’ve previously written about poetry. I’m now using poetry for all projects where I’m the only person working on them and I have no plan to change that (poetry’s behaviour is still more advanced here and IMO preferable), but it is impossible to work in the wider python community without encountering pip. This feature reduces that point of friction and it is great to see, but it isn’t a silver bullet for preventing incompatible dependencies. It will be interesting to see how the wider community responds as users encounter this behaviour change for the first time.


  • Three useful psql settings

    • By default psql doesn’t display NULLs, but you can configure it to show NULLs as a printable character or string e.g: \pset null '█' or \pset null '(null)'. This makes it easier to distinguish NULL from an empty string.
    • \timing on appends an execution time to each result e.g: Time: 0.412 ms. \timing off turns it off.
    • By default, psql stores every line in history. Running \set HISTCONTROL ignoredups means psql ignores lines which match the previous history line. This is a common default for shells and REPLs.

    These can be used on an ad-hoc basis or added to .psqlrc.


  • Show table create statement in Postgres

    MySQL has a handy statement SHOW CREATE TABLE my-table;. Postgres doesn’t have this but it can be achieved with pg_dump and the --schema-only flag.

    pg_dump -t 'public.my-table' --schema-only my-database
    

  • Rich

    I gave this library a spin recently and it’s an absolute game-changer when it comes to building nice-looking terminal interfaces in python. Rich provides elegant high-level abstractions for using colours, tables, markdown and more. See the README and API docs for a more detailed showcase of Rich’s capabilities.
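
    As a taste of the API, here’s a minimal sketch (the package/version values are just made-up sample data):

    from rich.console import Console
    from rich.table import Table

    console = Console()

    # build a simple table with styled columns
    table = Table(title='Dependencies')
    table.add_column('Package', style='cyan')
    table.add_column('Version', style='magenta')
    table.add_row('rich', '9.2.0')
    table.add_row('requests', '2.25.0')

    # print the table, plus some inline markup, to the terminal
    console.print(table)
    console.print('[bold green]Done![/bold green]')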


  • Glob Pattern tester

    Little glob patterns are everywhere, but non-trivial ones can be tricky to construct. Digital Ocean have put together this handy tool for constructing glob patterns and testing them against lists of strings. This does for glob patterns what regexr does for regular expressions.
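
    For a quick offline sanity check, python’s fnmatch module can also test a pattern against a list of strings (a rough sketch; note that fnmatch’s semantics differ slightly from shell globs):

    from fnmatch import fnmatch

    pattern = 'src/*.py'
    candidates = ['src/main.py', 'src/utils/helpers.py', 'README.md']

    for candidate in candidates:
        # fnmatch's * also matches path separators, so 'src/utils/helpers.py'
        # matches here, unlike in many shell/gitignore-style implementations
        print(candidate, fnmatch(candidate, pattern))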


  • Documenting a python project with markdown

    Last time I looked at writing documentation for a python library from scratch (a few years ago), there wasn’t really a solution that allowed me to write my docs in markdown, render it to a static site and incorporate docstrings from my code into the generated documentation. MkDocs allows you to write your free text in markdown, but at the time there wasn’t really a good solution for pulling docstrings into the rendered docs. Conversely, Sphinx did have a good solution for pulling in docstrings but forced you to use reStructuredText for the free text.

    Things have changed a bit since then though, on both sides of that equation. I wanted to write a docs site for my geometry-to-spatialite library, so I decided to investigate a few different options.

    MkDocs

    In the MkDocs ecosystem there are now several plugins that allow you to incorporate docstrings into markdown documentation.

    Mkdocstrings

    Mkdocstrings currently has one handler which parses docstrings written in the popular Google style e.g:

    """
    Function description
    
    Args:
        param1 (int): The first parameter.
        param2 (str): The second parameter.
    
    Returns:
        bool: True for success, False otherwise.
    """
    

    At the time of writing the documentation says “This project currently only works with the Material theme”. This project is young, but it does seem to have some traction around it and I wouldn’t be surprised to see support for more themes and docstring styles added. The tabular rendered outputs of this plugin are really nice and it does a great job of mixing information about type hints, required params and default values from the function declarations with the descriptions from the docstrings. This minimises the need to repeat this information.

    MkAutoDoc

    MkAutoDoc takes a different approach. MkAutoDoc doesn’t support any of the popular python docstring formats (Google, reST, NumPy). Instead it allows you to embed markdown in code comments and include those in your documentation. On the one hand, this allows you to really go all-in on markdown on a completely new project, but on the other it doesn’t set out any particular structure for arguments, return types, etc (and hence can’t do anything smart with them, like mkdocstrings does), and if you’ve already got docstrings in your code in any of the common formats, it won’t parse them.

    MkAutoDoc is primarily used by projects under the encode organisation to build the docs for projects like httpx and starlette.

    Jetblack-markdown

    Honourable mention for this one. Jetblack-markdown uses docstring_parser under the hood, so in theory it should work with reST, Google, and Numpydoc-style docstrings, although the test suite currently only covers the Google style. It does also require a bit of extra CSS for styling. I gave this one a quick go with the readthedocs theme and it produces attractive rendered outputs. This one is a very young project, but worth keeping an eye on 👀

    Sphinx

    Meanwhile, in the Sphinx ecosystem there are already good solutions for working with docstrings: the Autodoc extension parses docstrings and includes them in the rendered output, and Napoleon adds support for the Google and NumPy formats. There’s also a new package that helps us out on the free text side.

    MyST Parser

    MyST parser is a markdown flavour and parser for Sphinx. This is a bit of a game changer because Sphinx has years of maturity and a large plugin ecosystem, and MyST basically solves my biggest issue with it. The main downside of MyST parser is that it is a slightly leaky abstraction: to access some plugin functionality we have to drop in an {eval-rst} block, and other Sphinx extensions still assume your source content is reST, so although it is possible to write the main body of text in markdown, you won’t be able to use markdown everywhere.

    Conclusion

    Predictably, there isn’t a single obvious conclusion. There are no solutions, only tradeoffs. However in 2020 there are multiple viable options for documenting your python project with markdown 🎉

    After conducting this roundup, I wrote a docs site for geometry-to-spatialite using the following (there’s a minimal conf.py sketch after the list):

    • Sphinx
    • MyST Parser for markdown parsing
    • Autodoc to parse docstrings and include them in the rendered output
    • Napoleon to pre-process Google-style docstrings for autodoc
    • ghp-import to deploy the generated HTML to github pages
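
    Wired together, the Sphinx configuration for that combination looks roughly like this (a minimal conf.py sketch; the theme and project metadata are illustrative rather than the exact config I used):

    # docs/conf.py
    project = 'geometry-to-spatialite'

    extensions = [
        'myst_parser',          # write the free text in markdown
        'sphinx.ext.autodoc',   # pull docstrings into the rendered output
        'sphinx.ext.napoleon',  # pre-process Google-style docstrings for autodoc
    ]

    html_theme = 'alabaster'  # Sphinx's default theme, as a placeholder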

    Being able to use Sphinx and write (mostly) markdown seems like a good place to be for now, but there are multiple projects in the MkDocs ecosystem which I’ll be monitoring and revisiting as they mature and gain traction.