Developing pyrealm
This page gives an overview of the process of contributing code to the pyrealm
package, along with the development environment and tools you will need to setup to work
with the codebase.
What is a package contributor?
Being a contributor is all about helping improve the pyrealm package. That could be
something very small, like fixing typos in the package website, or something large, like
adding a draft of an entirely new science module to the package.
We welcome all contributions, but we need to manage contributions of code and documentation to make sure everything works properly together and to keep the code and documentation consistent. We do a lot of this by using some automated tools that help keep the package well organised and ensure that it keeps giving the same results through time.
These tools take a bit of getting used to and the rest of this document sets out how to get your computer set up to run them. It is a good idea to start off with a small contribution in order to get used to the workflow - please do reach out to other developers for help in getting things to work if you run into problems. We will expect you to have read this document and the linked details pages, but we do not expect them to be a perfect or complete explanation!
Contributing code
The workflow for contributing to pyrealm currently follows the Gitflow strategy. The
basic workflow is described below but this AWS
link
provides an overview of the strategy.
Decide what you want to work on. This could be an existing bug or feature request or could be something new. If it is new, then create a new issue on Github describing what you want to change or add. The issue tracker provides templates for bugs and feature requests: please do provide as much detail as possible on the bug or the feature you would like to provide. If you want to work on an existing issue, then just add a comment and say you would like to work on it.
https://github.com/ImperialCollegeLondon/pyrealm/issues
Whatever issue you do want to work on, do give other developers a chance to comment on suggestions before putting a lot of effort in!
On Github issue pages, there is a development link to “create a branch” for the issue. The branch name will then start with the issue number, which makes branches much easier to track, and is explicitly linked to the issue. Feel free to shorten the branch name - it uses the issue title by default.
Check that branch out locally and make commits to it, pushing them to GitHub regularly. Do try and make frequent small commits with clear, specific commit messages: a commit does not mean that an issue is completed, just that you want to record your progress. The commit history can always be compressed at the merge stage (see below).
Create a pull request (PR) from the issue branch onto the
developbranch. The PR description should tag the issue being addressed and explain how the incoming code fixes the issue. You can start a PR as ‘draft’ PR: this can be a useful way to start describing a PR content and checking that testing is passing before opening a PR up for review.We prefer pull requests to be small, with the aim of reviewing and merging frequently the smallest functional unit of work that you can. This helps stop pull requests getting stalled on more and more complex tasks and makes code review fast.
Check that the continuous integration testing passes and fix any issues causing test failures.
Request reviews from other package developers using the Review section on the PR page. A PR cannot be merged into
developuntil at least one approving review has been added to the code. Reviews will often suggest changes to the code and you should discuss those suggestions and implement them.Hopefully, you will have talked to other developers during the process of writing the PR and should have some ideas of who to ask for a review. If not, please request
davidormeto review the PR and we can then work out which of the core team is best placed to give feedback.Once a PR has been approved, the PR can be merged into
developand the branch can be deleted.The
Merge Pull Requestbutton provides alternative merge strategies. The default is to create a “merge commit” - all of the commits on the PR are merged individually todevelop- but you can also “squash and commit” - which squashes all of the commits into a single commit and message before merging. Squashing commits can be really helpful to avoid a bunch of minor ‘typo’ commit messages, but can also make it harder to find commits that made bigger changes on a branch. In general, we use “merge commits”, but if the commit history on a branch is mostly a sequence of minor edits, feel free to squash.
The package development environment
The short descriptions below provide the key commands needed to set up your development
environment and provide links to more detailed descriptions of code development for
pyrealm. The example setup script below gathers
the commands together into a single script, currently only for Linux.
Python environment
You will need to install Python to develop pyrealm. The package is currently tested
against Python version 3.12. You should install this version for developing pyrealm.
We highly recommend using pyenv or
pyenv-win to manage your Python
installations. These tools allow you to manage multiple different python versions in
parallel and to switch between them. However, these extra steps are not necessary to get
started.
Package management
We use poetry for dependency
management and for managing development environments and you will need to install it.
The pyrealm package currently uses poetry version 2.0.1 and you should specify this
when installing to avoid conflicts with the package management process.
For the typical installation process, this would be as simple as:
curl -SSL https://install.python-poetry.org | python3 - --version 2.0.1
Installing pyrealm
To develop pyrealm, you will also need to install git and
then clone the pyrealm GitHub repository.
git clone https://github.com/ImperialCollegeLondon/pyrealm.git
You can now use poetry to install the package dependencies. This is not just the
package requirements for end users of the package, but also a wider set of tools used in
package development. poetry uses the
pyproject.toml
file to configure the dependencies that will be installed.
poetry install
Poetry uses a virtual environment for package development: all packages are installed to
a stand-alone python environment that is only used for pyrealm development. This makes
sure that the development environment is consistent across python versions and different
developers. However, when you are working on the command line, you need to explicitly
use the pyrealm environment to run any command that needs to use the pyrealm
environment - and that is pretty much everything described in this document. There are
two options to do this:
You can add
poetry runbefore a command to make sure that single command is run using thepoetryenvironment. This approach is used in the example commands below.You can also activate the Python environment so that commands can be run without having to use
poetry run. This is usually more convenient.The command
poetry env activatecan be used to print out the command needed to activate the environment. You can simply copy and paste this command, but you can use commands to activate it automatically. For the details for your system, see .
You should now be able to run the following command to see that pyrealm is installed
and is showing the current version.
poetry run python -c "import pyrealm; print(pyrealm.__version__)"
Updating poetry and package versions
You will not need to do this when setting up your development environment but one of the
things that poetry does is to maintain a fixed set of compatible required packages.
The pyproject.toml files sets constraints on package versions, but the particular
combination to be used for a given commit is resolved and stored in the poetry.lock
file.
If you want to add a package - either using
poetry addor by manually updatingpyproject.toml- you will then need to runpoetry updateto check that a compatible set of package versions exists and to update thepoetry.lockfile.If you want to update a package then
poetry updatewill update all the required packages and updatepoetry.lock. You can usepoetry update package_nameto only update a particular requirement.The
poetry installcommand - as shown above - can be re-run to re-install the package. You will typically only need to do this if commands provided by the package have changed and need to be updated.
If you pull code from GitHub that changes pyproject.toml and poetry.lock, you should
also run poetry update to bring your environment in line with other developers.
Installing and using pre-commit
Development of the pyrealm package uses pre-commit. This
is a python tool that runs a set of checks on git commits and stops the commit from
completing when any of those checks fail. We use pre-commit to help catch a wide range
of common issues and make sure that all code pushed to the GitHub repository meets some
simple quality assurance checks and uses some common formatting standards.
There is a detailed description of the pre-commit output and the configured checks
and update process on the code quality assurance page.
Briefly, we use pre-commit to catch inconsistent formatting and variable typing and to
run the widely-used flake8 code checking suite.
The pre-commit tool is installed by the poetry install step above, so you now need
to install the pyrealm configuration for pre-commit and run the tool to set up the
environment and check it is all working.
poetry run pre-commit install
poetry run pre-commit run --all-files
That might take a little while to run on the first use. Once you have done this, every
git commit will generate similar output and your commit will fail if issues are found.
Static typing with mypy
The python programming language does not require code objects to be typed, but the
pyrealm package uses type hints to annotate code.
Those type hints are then checked using the mypy static type checker, which is
installed by poetry and is run as one of the pre-commit checks.
The mypy package and the plugins we use are all installed by poetry. See the code
quality assurance page for more information on using mypy.
Package testing
All code in the pyrealm package should have accompanying unit tests, using pytest.
Look at the existing test suite in the tests/unit directory to see the structure and
get a feel for what they should do, but essentially unit tests should provide a set of
known inputs to a function and check that the expected answer (which could be an
Exception) is generated.
Again, the pytest package and plugins are installed by poetry. See the code testing
page for more details but you should be able to check the tests run
using the following command. Be warned that the mypy steps can be very time consuming
on the first run, but pytest does some caching that makes them quicker when they next
run.
poetry run pytest
Code profiling
We use a specific set of pytest tests to carry out code profiling using
the pytest-profile plugin. This makes it easier to spot functions that are running
slowly within the pyrealm code. We also use our own profiling/run-benchmarking.py
script to compare code profiles between versions to check that new code has not impacted
performance.
See the profiling and benchmarking page for more details on running profiling.
The pyrealm-build-data package
The pyrealm repository includes the pyrealm-build-data
package, which is used to provide a range of datasets used in
package testing and in documentation.
Documentation
We use sphinx to maintain the documentation for pyrealm and Google style docstrings
using the napoleon formatting to provide API documentation for the code. We use MyST
Markdown to provide dynamically built usage examples. See the documentation
page for details but to get started, the following code can be used
to build the documentation.
# Build docs using sphinx
cd docs
poetry run sphinx-build -W --keep-going source build
Once that command completes, the file docs/build/html/index.html can be opened to view
the built documentation.
GitHub Actions
We use GitHub Action workflows to update pre-commit, run code quality checks on pull
requests, and to automate profiling and release publication. See the GitHub Actions
page for details.
Package version releases
We use trusted publishing from GitHub releases to release new versions of pyrealm to
PyPI. Releases are also picked up and archived on
Zenodo. See the release process
page for details.
Setup script example
The scripts below bundle all the commands together to show the set up process, including
using pyenv to manage python versions, ending by running the unit tests. This sets
up everything you need, ready to start developing on pyrealm.
Setup script
!/bin/bash
# pyenv and poetry use sqlite3. You _may_ need to install these requirements first.
sudo apt install sqlite3 sqlite3-doc libsqlite3-dev
# install pyenv to manage parallel python environments
curl <https://pyenv.run> | bash
# Manually edit .bash_profile or .profile to setup pyenv:
# export PYENV_ROOT="$HOME/.pyenv":
# [[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH":
# eval "$(pyenv init -)"
# Install a python version
pyenv install 3.12
# Install poetry
curl -sSL https://install.python-poetry.org | python3 -
# Manually add poetry to path in profile file:
# export PATH="/home/validate/.local/bin:$PATH"
# Clone the repository
git clone https://github.com/ImperialCollegeLondon/pyrealm.git
# Configure the pyrealm repo to use python 3.12
cd pyrealm
pyenv local 3.12
poetry env use 3.12
# Install the package with poetry
poetry install
# Install pre-commit and check
poetry run pre-commit install
poetry run pre-commit run --all-files
# Run the test suite
poetry run pytest