Setting up a new repository

By default, all our repositories should be public and MIT licensed, unless there's a specific reason that's not possible.

This document describes how to set up a new repository:

  • Within GitHub (user permissions, branch protection)
  • Applying code linters (black, ruff, mypy)
  • As a package source (setup.py, pypi preparation)

Repository Template

The cpg-python-template-repo template contains a README & MIT software license, and has linter configuration & github actions configured. This can be used as a starting template to simplify setup.

To use this template you can either create a new repository and select this from the drop-down templates, or navigate to the template repository and click Use This Template to begin. Branch protection rules, user access, and pip packaging (where appropriate) still need to be added manually.


After you have created a GitHub repository, you should change the following settings:

[Screenshot: merge settings]

Allow merge commits only for forks into which you'll incorporate upstream changes. For everything else, prefer squash merging, which keeps the history much cleaner.

Under Branches, add a branch protection rule to enforce reviews for the main branch:

[Screenshot: branch rule]

Under Manage Access, add collaborators. Prefer to add teams instead of individual people. It's common to add populationgenomics/software-team and populationgenomics/genomic-analysis-team with write permissions.

The next step is to create a README.md, an MIT license, and a .gitignore file, unless these files were already added via the GitHub web interface.

Following that, you may want to set up linters for code style and error checks, and (if the project can be shipped as a package) set up versioning and automated artifact builds. This document provides tips on how to set these things up.

Linters

To help us in implementing a consistent coding style throughout our code base, we use git pre-commit hooks with a set of linters that check and/or reformat the files in the repository.

  • pre-commit comes with a set of hooks that perform some very useful inspections:
    • check-yaml checks YAML files for correctness,
    • end-of-file-fixer makes sure every file ends with exactly one newline character,
    • trailing-whitespace removes whitespace at the end of lines,
    • check-case-conflict checks for files whose names would conflict on a case-insensitive filesystem, such as the macOS default,
    • check-merge-conflict checks for files that contain merge conflict strings,
    • detect-private-key checks for the existence of private keys,
    • debug-statements checks for debugger imports and py37+ breakpoint() calls in Python source,
    • check-added-large-files prevents giant files (larger than 500kB) from being committed,
    • markdownlint checks the style of Markdown files;
  • ruff and mypy check Python code style in accordance with PEP8, and perform static analysis to catch potential programming errors.
  • black reformats Python code to make it conform to PEP8.
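
As a rough illustration of what two of these hooks do, the sketch below mimics trailing-whitespace and end-of-file-fixer on a string. The function names are hypothetical and the logic is simplified (for example, it ignores Windows line endings); the real hooks handle more edge cases.

```python
def strip_trailing_whitespace(text: str) -> str:
    """Remove spaces and tabs at the end of every line."""
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))


def fix_end_of_file(text: str) -> str:
    """Make non-empty text end with exactly one newline."""
    body = text.rstrip("\n")
    return body + "\n" if body else ""


# Trailing spaces, a trailing tab, and two surplus blank lines at the end.
messy = "x = 1   \ny = 2\t\n\n\n"
clean = fix_end_of_file(strip_trailing_whitespace(messy))
print(repr(clean))  # 'x = 1\ny = 2\n'
```

Because the hooks rewrite files in place, a commit that needed these fixes is stopped so that the cleaned files can be reviewed and staged.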

Setting up pre-commit

Rather than writing a fresh pre-commit configuration, start from the files that the template repository already provides:

  • .pre-commit-config.yaml - contains inspection settings for each tool
  • .markdownlint.json - settings for markdownlint
  • pyproject.toml - settings for black and ruff

Editing these files changes the behaviour of the individual tools. To add linting to an existing or non-templated repository, copy the files you need from the cpg-python-template-repo template.

Enabling pre-commit plugins

Install pre-commit globally, or into your virtual environment. You don't need to install ruff or other tools, as pre-commit will do that for you.

pip install pre-commit

You can run pre-commit manually:

pre-commit run --all-files

Optional: Run pre-commit hooks with Git

pre-commit install --install-hooks

Once the hooks are installed, the code is checked, and possibly reformatted, on every git commit. If any check fails, or any file is reformatted, the commit is aborted. You can then act on the linters' suggestions, stage the resulting changes, and re-run the git commit command.

Disabling inspections

Note that you may find some linters produce false positives, or just find some checks irrelevant for your particular project. In this case, you may want to modify the configuration files to disable additional inspections.

Consider carefully before disabling specific inspections for an entire project.

Ruff

To disable an inspection for ruff, you can either add a noqa comment:

# ruff: noqa: F401
# When at the start of the file, this disables F401 for the whole file

# disable F401 for this specific line
from foo import bar # noqa: F401

# disable ruff for this specific line
from foo import bar # noqa

Or add the rule code to ruff's ignore list in pyproject.toml to disable it for the whole project.

Mypy

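Note that, by default, mypy does not type-check the body of a function that has no type annotations. Annotate the parameters and return type to have the function checked.

You can suppress a type error on a specific line by adding a # type: ignore comment: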

x: int = "bla"  # type: ignore
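
The snippet below sketches both points; the function and variable names are made up for illustration. With mypy's default settings, the body of the unannotated function is skipped, while the annotated one is checked. The ignore comment carries an error code in brackets (assignment is the code mypy reports for this kind of mismatch), which keeps the suppression narrow.

```python
def label_untyped(sample_id, suffix):
    # No annotations at all: mypy skips this body by default,
    # so it would not flag a mistake made in here.
    return sample_id + suffix


def label_typed(sample_id: str, suffix: str) -> str:
    # Fully annotated: mypy checks the body and every call site.
    return sample_id + suffix


# Suppress one specific error on one specific line.
retries: int = "3"  # type: ignore[assignment]

print(label_typed("S1", "_v2"))  # S1_v2
```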

Black

To hide a piece of code from being reformatted with black, you can surround your code with # fmt: off and # fmt: on.
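
A hand-aligned table of values is a typical candidate. In this sketch (the constant and the function are hypothetical), black leaves the lines between the two markers untouched and formats the rest of the file as usual.

```python
# fmt: off
IDENTITY = [
    1, 0, 0,
    0, 1, 0,
    0, 0, 1,
]
# fmt: on


def trace(matrix: list, size: int = 3) -> int:
    """Sum the diagonal of a square matrix stored as a flat list."""
    return sum(matrix[i * size + i] for i in range(size))


print(trace(IDENTITY))  # 3
```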

Pylint / Flake8

In some of our repositories, we still use Pylint / Flake8. Flake8 uses the same noqa syntax as ruff. For pylint, disable a specific inspection with # pylint: disable=<inspection-id>. For examples, look at the Hail code base.
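
A minimal sketch of the pylint form, using unused-argument as the inspection and a hypothetical callback function:

```python
def on_event(event: dict, context: object) -> str:  # pylint: disable=unused-argument
    """Callback whose signature is fixed by the caller; context is unused."""
    return event["id"]


print(on_event({"id": "abc"}, None))  # abc
```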

Setting up setup.py

For Python projects, you want to create a setup.py file in the root of your project. See analysis-runner/setup.py as an example to bootstrap from.

After creating setup.py, you can install your code as a Python package into your dev environment in the "editable" mode with:

pip install --editable .

This will make sure that any changes to the code base are immediately reflected in the module and scripts you just installed, so you won't have to rerun pip install unless you change setup.py again.
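
To confirm that the install worked, you can ask Python for the installed version of the distribution. This is a sketch using the standard-library importlib.metadata module; the helper name and the distribution name my-package are placeholders for whatever name your setup.py declares.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "my-package" is a placeholder; use the name passed to setup(name=...).
print(installed_version("my-package"))
```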

Versioning project

To version projects, we use a tool called bump2version. It helps to avoid editing version tags manually, which can be very error-prone. Instead, bump2version can increment the version of the project for you with one command. It relies on a config file called .bumpversion.cfg to determine the current project version, as well as the locations in the code where this version should be updated.

See the analysis-runner/.bumpversion.cfg as an example.

To increment the version, run bump2version <mode>, where <mode> is one of:

  • major: X.0.0
  • minor: 0.X.0
  • patch: 0.0.X
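
The effect of each mode on a three-part version can be sketched as follows. This is not bump2version's implementation, only an illustration of the numbering rule, and bump is a hypothetical helper: incrementing one part resets the parts to its right to zero.

```python
def bump(version: str, mode: str) -> str:
    """Return version incremented in the given mode (major, minor or patch)."""
    major, minor, patch = (int(part) for part in version.split("."))
    if mode == "major":
        return f"{major + 1}.0.0"
    if mode == "minor":
        return f"{major}.{minor + 1}.0"
    if mode == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown mode: {mode}")


print(bump("1.4.2", "major"))  # 2.0.0
print(bump("1.4.2", "minor"))  # 1.5.0
print(bump("1.4.2", "patch"))  # 1.4.3
```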

If you get the error: bumpversion: error: the following arguments are required: --new-version, you're probably in a directory without the config.

GitHub Actions

It is useful to set up a GitHub Actions workflow for your repository that performs a set of automated tasks, such as checking the code with linters, running tests, and/or packaging and shipping the code.

To set up a workflow that checks the code with pre-commit on every git push or pull-request event, create a file called .github/workflows/lint.yaml based on the following: cpg-utils/lint.yaml.

# .github/workflows/lint.yaml
name: Lint
on: [push]

jobs:
  lint:
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash -l {0}

    steps:
    - uses: actions/checkout@v2

    - uses: actions/setup-python@v2
      with:
        python-version: '3.10'

    - name: Install packages
      run: pip install -r requirements.txt

    - name: pre-commit
      run: pre-commit run --all-files

After pushing .github/workflows/lint.yaml and requirements.txt, GitHub will know to run linters on every push and pull request, and display checks in the web interface.

Adding GitHub Actions

You can set up GitHub Actions to build the package and upload it to PyPI, so that it becomes available to install with pip install <package>. We traditionally publish to PyPI on every push to the main branch. Published versions are never overwritten, so make sure a bumpversion commit is included whenever you want to release the package.

First, you need to create a GitHub secret containing the PyPI token. To obtain the token, please contact a software team member.

Finally, set up another GitHub workflow:

# .github/workflows/deploy.yaml

name: Deploy
on:
  push:
    branches:
      - main

jobs:
  package:
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash -l {0}
    steps:
    - uses: actions/checkout@main

    - uses: actions/setup-python@v2
      with:
        python-version: '3.10'

    - name: Build
      run: python setup.py sdist

    - name: Test install
      run: pip install dist/*

    - name: Test import
      run: python -c "import <package>"

    - name: Publish the package to PyPI
      uses: pypa/gh-action-pypi-publish@release/v1
      with:
        user: __token__
        password: ${{ secrets.PYPI_API_TOKEN }}
        packages-dir: dist/
        skip-existing: true