Failing the CI build if django migrations are out of date

A common mistake in django is to make a model change but forget to run makemigrations to generate a migration for it. Sometimes it is not entirely obvious when this needs to happen. For example, let's say I'm using the django-extensions library and I define a model like:

# models.py

from django.db import models
from django_extensions.db.models import TimeStampedModel

class MyModel(TimeStampedModel, models.Model):
    pass

In this situation, upgrading django-extensions to a new version might require me to regenerate the migrations in my app, even though I haven't made any changes to models.py, and overlooking this could produce unexpected results.

Fortunately there is a simple way to detect and warn when this happens: if I run

python manage.py makemigrations --check

in my CI build, this will cause the build to fail if my migrations are out of sync with the models, warning me about the problem.
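
For example, in a GitHub Actions workflow (the same CI style as the snippets later in this post), a minimal step might look like this, assuming dependencies and database settings are already in place:

- name: Check for missing migrations
  run: python manage.py makemigrations --check --dry-run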

Changing the tyres while the car is moving

I am currently working on a project where I need to migrate a legacy test suite from nose to pytest. The codebase has about 7,000 lines of test code. That isn't an enormous test suite, but I'm the only person working on it, so it will take me a while to get through it all: moving from one test runner to another requires changes to fixtures, factories, etc, and maybe some light refactoring, as well as reviewing and updating the test code itself. I also need to balance this with continuing to deliver bugfix and feature work. I can't block all delivery on the test suite migration, so here's the pattern I'm using:

  • Split the test suite into two dirs: /tests/nose and /tests/pt
  • Migrate one module of test code at a time
  • Have the CI build run the old tests with nose and the new ones with pytest and merge the coverage reports:
- name: Run nose tests
  run: |
    nosetests \
      --with-coverage \
      --cover-package=dir \
      --cover-erase \
      path/to/tests/nose

- name: Run pytest tests
  run: |
    pytest \
      --cov=dir \
      --cov-append \
      path/to/tests/pt
  • When I rewrite the last test module left in /tests/nose, I can finally drop nose, delete any nose-specific test helpers, and move all the tests from /tests/pt back to /tests.

This approach buys me several useful things:

  • I can tackle the test suite migration one module at a time
  • I can submit small pull requests that are easy to review
  • I can start writing tests for any new features or bugfixes in pytest right now
  • I can minimise the chance of conflicts between test suite migration and bugfix/feature delivery

This is a very simple example of a general, high-level pattern for managing non-trivial migrations (like moving a project from one web framework to another). It can be applied to larger, more complex migrations, and enables us to "change the tyres while the car is moving" (a phrase I stole from my colleague Adrià):

  • Set up a structure that allows both $OLD_BORING_THING and $NEW_SHINY_THING to run at the same time
  • Gradually move code from $OLD_BORING_THING to $NEW_SHINY_THING
  • When we migrate the last bit of code from $OLD_BORING_THING to $NEW_SHINY_THING, remove the structure that allows $OLD_BORING_THING and $NEW_SHINY_THING to co-exist and delete $OLD_BORING_THING
  • Continue to deliver our roadmap in parallel to the migration

Using jq to format JSON on the clipboard

Jq is a great tool. I use it frequently, but this recent article from Sequoia McDowell still has some great tips in it. Well worth a read. One thing that stood out to me from this post was the snippet:

I like this pretty-printing/formatting capability so much, I have an alias that formats JSON I've copied (in my OS "clipboard") & puts it back in my clipboard:

alias jsontidy="pbpaste | jq '.' | pbcopy"

That's crazy-useful, but I need to tweak it a bit. Firstly, as noted, pbcopy and pbpaste are Mac utilities. No worries: we can substitute xclip on Linux:

xclip -selection clipboard -o | jq '.' | xclip -selection clipboard

I spotted a minor problem here though: if the text on my clipboard isn't JSON then jq will exit with a non-zero code, print something like parse error: Invalid numeric literal at line 1, column 6 to stderr, and write nothing to stdout. The pipe doesn't care about the non-zero exit code though, so we'd overwrite whatever was on the clipboard with an empty string. So let's add a bit of error handling:

alias jsontidy="xclip -selection clipboard -o | (jq '.' || xclip -selection clipboard -o) | xclip -selection clipboard"

Now jsontidy will format whatever is on the clipboard if it parses as JSON, but leave it alone otherwise.

A python CLI app which accepts input from stdin or a file

This excellent guide to Command Line Interface Guidelines has been shared widely over the last few days, and it includes a huge variety of advice on writing elegant and well-behaved command line tools. The whole thing is well worth a read, but I'm going to pick out one quote to focus on in this post:

If your command is expecting to have something piped to it and stdin is an interactive terminal, display help immediately and quit. This means it doesn't just hang, like cat.

Tools that can optionally accept input via a pipe are very useful. This pattern is entirely possible with python and argparse, but it isn't completely obvious. Here's a simple example of a python program which:

  • Can accept input as a file: ./myscript.py /path/to/file.txt
  • Can accept input via a pipe: echo 'foo' | ./myscript.py
  • Will print help and exit if invoked interactively with no arguments: ./myscript.py (instead of just hanging and waiting for input)

#!/usr/bin/env python3

import argparse
import sys

def main():
    parser = argparse.ArgumentParser(
        description='A well-behaved CLI app which accepts input from stdin or a file'
    )
    parser.add_argument(
        'file',
        nargs='?',
        help='Input file, if empty stdin is used',
        type=argparse.FileType('r'),
        default=sys.stdin,
    )
    args = parser.parse_args()

    # stdin is an interactive terminal: nothing was piped in and no file
    # argument was given, so print help instead of hanging like cat
    if args.file.isatty():
        parser.print_help()
        return 0

    sys.stdout.write(args.file.read())
    return 0

if __name__ == '__main__':
    sys.exit(main())

pip 20.3

There is a very long-standing issue on the pip repository: pip needs a dependency resolver. Most language package managers (e.g: composer, bundler, cargo, etc) either use a resolver to derive a consistent dependency tree and prevent incompatible installations, or in the case of NPM/yarn allow "broken diamond" resolution (where more than one version of the same package can be installed at the same time). For the longest time, pip has had no true resolver, allowing incompatible dependencies to be installed. Until now...

Today's release of pip 20.3 is the culmination of a long programme of work to implement a proper dependency resolver in pip. This makes it the most significant pip release in a very long time, possibly ever.

It has actually been possible to preview this feature for some time using the --use-feature=2020-resolver flag in pip 20.2.x or by installing the beta releases of pip 20.3, but pip 20.3 is the first stable release to enable the new resolver by default. This means that a command like:

pip install virtualenv==20.0.2 six==1.11

or

printf "virtualenv==20.0.2\nsix==1.11" > requirements.txt
pip install -r requirements.txt

will now refuse to install any packages and instead throw a helpful error like:

ERROR: Cannot install six==1.11 and virtualenv==20.0.2 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested six==1.11
    virtualenv 20.0.2 depends on six<2 and >=1.12.0

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies

which is a massive improvement over the previous behaviour. There is a gotcha though. Resolution only considers the packages being installed in that command. It doesn't take into account other packages already installed in your (virtual) environment. This means running

pip install virtualenv==20.0.2 && pip install six==1.11

will not fail with a ResolutionImpossible error when installing six==1.11. Instead it installs the incompatible package and warns you that it has done so:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
ERROR: virtualenv 20.0.2 requires six<2,>=1.12.0, but you'll have six 1.11.0 which is incompatible.

This isn't a bug. The behaviour is noted in the error message and documented in the release notes, but the limitation is worth understanding: even with the new resolver, pip will still install incompatible dependencies into your environment if given the right combination of commands. It is possible to wander into this situation without realising it, and without consciously enabling the legacy behaviour with --use-deprecated=legacy-resolver. The improved messaging around it is useful, but this is still a bit of a disappointment.
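
As an aside (mine, not the release notes'): pip's long-standing pip check command inspects the packages already installed in the environment and reports any broken requirements, so it can catch this state after the fact:

pip check

For the example above, it reports something along the lines of virtualenv 20.0.2 has requirement six<2,>=1.12.0, but you have six 1.11.0.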

I've previously written about poetry. I'm now using poetry for every project where I'm the sole developer, and I have no plan to change that (poetry's behaviour here is still more advanced and, IMO, preferable), but it is impossible to work in the wider python community without encountering pip. This feature reduces that point of friction and it is great to see, but it isn't a silver bullet for preventing incompatible dependencies. It will be interesting to see how the wider community responds as users encounter this behaviour change for the first time.

Three useful psql settings

  • By default psql doesn't display NULLs, but you can configure it to show NULLs as a printable character or string e.g: \pset null '█' or \pset null '(null)'. This makes it easier to distinguish NULL from an empty string.
  • \timing on appends an execution time to each result e.g: Time: 0.412 ms. \timing off turns it off.
  • By default, psql stores every line in history. \set HISTCONTROL ignoredups makes psql skip lines which match the previous history line. This is a common default for shells and REPLs.

These can be used on an ad-hoc basis or added to .psqlrc.
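
For example, a .psqlrc applying all three might look like this (the null marker is just my preference):

\pset null '(null)'
\timing on
\set HISTCONTROL ignoredups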

Show table create statement in Postgres

MySQL has a handy statement: SHOW CREATE TABLE my-table;. Postgres doesn't have an equivalent, but the same information can be extracted with pg_dump and the --schema-only flag.

pg_dump -t 'public.my-table' --schema-only my-database

Rich

I gave Rich a spin recently and it's an absolute game-changer when it comes to building nice looking terminal interfaces in python. Rich provides elegant high-level abstractions for using colours, tables, markdown and more. See the README and API docs for a more detailed showcase of Rich's capabilities.
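
As a taster, here's a small table example (the table contents and colour styles are my own invention):

from rich.console import Console
from rich.table import Table

console = Console()

# Build a table with styled columns, then render it to the terminal
table = Table(title="Useful tools")
table.add_column("Tool", style="cyan")
table.add_column("Purpose", style="magenta")
table.add_row("rich", "Pretty terminal output")
table.add_row("jq", "JSON wrangling")

console.print(table)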

Glob Pattern tester

Little glob patterns are everywhere, but non-trivial ones can be tricky to construct. Digital Ocean have put together this handy tool for constructing glob patterns and testing them against lists of strings. This does for glob patterns what regexr does for regular expressions.

Documenting a python project with markdown

Last time I looked at writing documentation for a python library from scratch (a few years ago), there wasn't really a solution that allowed me to write my docs in markdown, render them to a static site, and incorporate docstrings from my code into the generated documentation. MkDocs allowed you to write your free text in markdown, but at the time there wasn't a good way to pull docstrings into the rendered docs. Conversely, Sphinx had a good solution for pulling in docstrings but forced you to use reStructuredText for the free text.

Things have changed a bit since then though, on both sides of that equation. I wanted to write a docs site for my geometry-to-spatialite library, so I decided to investigate a few different options.

MkDocs

In the MkDocs ecosystem there are now several plugins that allow you to incorporate docstrings into markdown documentation.

Mkdocstrings

Mkdocstrings currently has one handler which parses docstrings written in the popular Google style e.g:

"""
Function description

Args:
    param1 (int): The first parameter.
    param2 (str): The second parameter.

Returns:
    bool: True for success, False otherwise.
"""

At the time of writing the documentation says "This project currently only works with the Material theme". This project is young, but it does seem to have some traction around it and I wouldn't be surprised to see support for more themes and docstring styles added. The tabular rendered outputs of this plugin are really nice and it does a great job of mixing information about type hints, required params and default values from the function declarations with the descriptions from the docstrings. This minimises the need to repeat this information.
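
Usage is pleasantly minimal: you drop an identifier into your markdown page with the ::: syntax and the plugin renders the docstring in place (the module path here is hypothetical):

::: my_package.my_module.my_function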

MkAutoDoc

MkAutoDoc takes a different approach. It doesn't support any of the popular python docstring formats (Google, reST, NumPy). Instead it allows you to embed markdown in code comments and include those in your documentation. On the one hand, this allows you to really go all-in on markdown on a completely new project; on the other, it doesn't set out any particular structure for arguments, return types, etc (and hence can't do anything smart with them, like mkdocstrings does), and if you've already got docstrings in your code in any of the common formats, it won't parse them.
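
The embedding syntax looks superficially similar to mkdocstrings: an ::: block in the markdown, with options controlling what gets pulled in (the class path here is hypothetical):

::: my_package.MyClass
    :docstring:
    :members: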

MkAutoDoc is primarily used by projects under the encode organisation to build the docs for projects like httpx and starlette.

Jetblack-markdown

Honourable mention for this one. Jetblack-markdown uses docstring_parser under the hood, so in theory it should work with reST, Google, and Numpydoc-style docstrings, although the test suite currently only covers the Google style. It does also require a bit of extra CSS for styling. I gave it a quick go with the readthedocs theme and it produces attractive rendered outputs. This is a very young project, but worth keeping an eye on 👀

Sphinx

Meanwhile in the Sphinx ecosystem, there are already good solutions for working with docstrings in the form of the Autodoc extension to parse docstrings and include them in the rendered output and Napoleon which adds support for the Google and NumPy formats. There's also a new package that helps us out on the free text side.

MyST Parser

MyST parser is a markdown flavour and parser for Sphinx. This is a bit of a game changer: Sphinx has years of maturity and a large plugin ecosystem, and MyST basically solves my biggest issue with it. The main downside is that MyST is a slightly leaky abstraction. To access some plugin functionality we have to drop in an {eval-rst} block, and other Sphinx extensions still assume your source content is reST, so although it is possible to write the main body of your text in markdown, you won't be able to use markdown everywhere.
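
For example, to use autodoc from a markdown page, you wrap the reST directive in an eval-rst block like this (the function path is hypothetical):

```{eval-rst}
.. autofunction:: my_package.my_function
```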

Conclusion

Predictably, there isn't a single obvious conclusion. There are no solutions, only tradeoffs. However in 2020 there are multiple viable options for documenting your python project with markdown 🎉

After conducting this roundup, I wrote a docs site for geometry-to-spatialite using the following combination (a minimal config sketch follows the list):

  • Sphinx
  • MyST Parser for markdown parsing
  • Autodoc to parse docstrings and include them in the rendered output
  • Napoleon to pre-process Google-style docstrings for autodoc
  • ghp-import to deploy the generated HTML to github pages
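
Wiring those together in conf.py takes just a few lines. A minimal sketch (the extension names are real; everything else about the project is assumed):

# conf.py (minimal sketch)
extensions = [
    "myst_parser",          # allow markdown source files
    "sphinx.ext.autodoc",   # pull docstrings into the rendered docs
    "sphinx.ext.napoleon",  # pre-process Google-style docstrings
]

# Treat .md files as MyST markdown alongside .rst
source_suffix = {
    ".rst": "restructuredtext",
    ".md": "markdown",
}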

Being able to use sphinx and write (mostly) markdown seems like a good place to be for now, but there are multiple projects in the MkDocs ecosystem which I'll be monitoring and revisiting as they mature and gain traction.