Skip to main content

Querying GitHub Repos in Bulk

I've recently been working on a tool called pip-abandoned, which allows you to search for abandoned and deprecated python packages.

In order to make this, one of the things I needed to do was fetch the archived property for lots of GitHub repositories. Using the GitHub v3 (rest) API, this would need one round-trip per repo. So if I want to fetch the archived property for

then I need to make three API calls:

..and if I had a list of 200 GitHub repos, that would require 200 individual network requests.

Fortunately, GitHub also has a GraphQL API. Using the GraphQL API, we can query more than one repo in a single request. To fetch the same data as above using the GraphQL API, I can make the following query

query {
  pygments: repository(
    owner: "pygments", name: "pygments"
  ) {
    isArchived
  }
  mccabe: repository(
    owner: "pycqa", name: "mccabe"
  ) {
    isArchived
  }
  commonmark: repository(
    owner: "readthedocs", name: "commonmark.py"
  ) {
    isArchived
  }
}

to retrieve the archived property for those three repos in a single round-trip over the network. When querying a large number of repos this is a big benefit.