Profiling and benchmarking
We use code profiling to assess the performance of pyrealm code and compare it to
previous code versions to identify bottlenecks and guard against degraded performance
when code changes.
Profiling and benchmarking can be run manually when you have changed the code and want to check that the changes do not degrade performance. The tools listed below can highlight issues in the code that should be fixed before the changes are integrated through a pull request.
Latest performance results
The plot below shows the current profiling results. This shows where time is spent in the different calls during the profiling tests on the codebase at a single commit.
Running code profiling
We use the pytest-profiling plugin to run
a set of profiling tests and generate profiling data. These tests are located in
tests/profiling and consist of a small set of high-level scripts that are intended to
use a large proportion of the pyrealm codebase with reasonably large inputs.
All tests in the profiling suite are decorated with @pytest.mark.profiling. Tests with
this mark are excluded from the standard pytest testing via the -m "not profiling"
argument in setup.cfg. Any test can be decorated with the profiling mark to move it
temporarily into the profiling test suite.
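As a minimal sketch of the marking described above, a test can be moved into the profiling suite like this. The test name and its workload are hypothetical stand-ins, not part of the pyrealm test suite; only the `@pytest.mark.profiling` decorator comes from the text.

```python
import pytest


@pytest.mark.profiling
def test_profiling_example():
    """Hypothetical stand-in; real profiling tests exercise pyrealm code."""
    # Placeholder workload: sum of squares over a moderately large range.
    total = sum(i * i for i in range(100_000))
    assert total > 0
```

With the `-m "not profiling"` default, a test marked this way is skipped by a plain pytest run and is only collected when `-m "profiling"` is passed.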
To run the profiling test suite and generate code profiling data, run pytest as
follows:
poetry run pytest --profile-svg -m "profiling"
This selects only the profiling tests and runs them using pytest-profiling. The
--profile-svg option both runs the profiling and generates a figure showing the call
hierarchy of code objects and the time spent in each call. Generating this graph
requires the graphviz command line library, which
provides the dot command for generating SVG graph diagrams. You will need to install
graphviz to use this option. Alternatively, you can use the following command to only
generate the profile data.
poetry run pytest --profile -m "profiling"
The pytest-profiling plugin saves data and graphs to the prof directory, which is
excluded from the git repository. The key files are the combined results:
prof/combined.prof and prof/combined.svg.
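Profile files of this kind are in the standard cProfile format, so they can be inspected with the standard library `pstats` module. The sketch below is illustrative only: it profiles a small hypothetical `workload` function into a temporary file so that it runs anywhere, and the helper name `top_cumulative` is an assumption. Pointing the helper at prof/combined.prof should work the same way.

```python
import cProfile
import pstats
import tempfile
from pathlib import Path


def top_cumulative(prof_path, n=5):
    """Return the n (name, calls, cumulative time) rows with the largest cumulative time."""
    stats = pstats.Stats(str(prof_path))
    # stats.stats maps (filename, lineno, funcname) to
    # (primitive calls, total calls, total time, cumulative time, callers).
    rows = [
        (func[2], nc, cumtime)
        for func, (cc, nc, tottime, cumtime, callers) in stats.stats.items()
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)[:n]


def workload():
    """Hypothetical workload standing in for the profiling tests."""
    return sum(i * i for i in range(200_000))


with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.prof"
    profiler = cProfile.Profile()
    profiler.runcall(workload)
    profiler.dump_stats(str(demo_path))
    for name, ncalls, cumtime in top_cumulative(demo_path):
        print(f"{name:30s} calls={ncalls:8d} cumtime={cumtime:.4f}s")
```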
Scaling of the profiling with problem size
The profiling tests use a couple of smaller datasets that are then tiled to scale up the problem size. The size of these scaling factors can be controlled from the command line.
This scaling up is partly to get more stable profiling results: small problems have a lot of runtime noise, which leads to spurious benchmarking failures.
Having variable dataset size also allows profiling to look at how the code performance scales.
The scaling up also affects the peak memory usage of the tests, which can lead to issues with running the tests on local machines and GitHub Action runners.
The data below shows how the peak memory usage changes with problem set scaling factors.
The peak memory size is estimated by running /usr/bin/time -l while varying the scaling
factors (on Linux systems, use -v instead of -l). For example:
/usr/bin/time -l pytest tests/profiling/test_profiling_pmodel.py \
-m "profiling" --pmodel-profile-scaleup 10
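A similar measurement can be sketched from Python using the standard library `resource` module (Unix only). This is an illustration rather than the method used to produce the figures in this section. The helper name `peak_child_memory_gb` and the stand-in command are assumptions, and GB is taken here as 10^9 bytes.

```python
import resource
import subprocess
import sys


def peak_child_memory_gb(command):
    """Run a command and return the peak resident set size of child processes in GB."""
    subprocess.run(command, check=True)
    max_rss = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss is reported in bytes on macOS but in kilobytes on Linux.
    bytes_per_unit = 1 if sys.platform == "darwin" else 1024
    return max_rss * bytes_per_unit / 1e9


# Stand-in command that allocates about 50 MB; replace it with the pytest call above.
demo_command = [sys.executable, "-c", "x = bytearray(50_000_000)"]
print(f"peak memory: {peak_child_memory_gb(demo_command):.3f} GB")
```

Note that `RUSAGE_CHILDREN` reports the largest value over all child processes waited for so far, so each measurement is best taken in a fresh Python process.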
| pmodel_profile_scaleup | peak memory footprint in GB |
|---|---|
| 40 | 35.84 |
| 20 | 19.74 |
| 10 | 9.95 |

| splash_profile_scaleup | peak memory footprint in GB |
|---|---|
| 500 | 22.47 |
| 250 | 11.54 |
| 125 | 5.95 |

The peak memory usage of the full profiling suite is currently around 11GB with the
defaults of --splash-profile-scaleup 125 and --pmodel-profile-scaleup 6.
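The measurements above are close to linear in the scaling factor, so a rough estimate of peak memory for other factors can be sketched as below. This assumes memory grows in proportion to the scaling factor with no fixed overhead, which is only an approximation. The function names are illustrative, and the numbers are copied from the tables in this section.

```python
# Peak memory measurements from the tables above: scaling factor -> GB.
pmodel = {10: 9.95, 20: 19.74, 40: 35.84}
splash = {125: 5.95, 250: 11.54, 500: 22.47}


def gb_per_unit(measurements):
    """Least-squares slope of a line through the origin: sum(x*y) / sum(x*x)."""
    numerator = sum(x * y for x, y in measurements.items())
    denominator = sum(x * x for x in measurements)
    return numerator / denominator


def estimate_peak_gb(measurements, scaleup):
    """Estimate peak memory in GB for a scaling factor, assuming linear scaling."""
    return gb_per_unit(measurements) * scaleup


print(f"pmodel: {gb_per_unit(pmodel):.3f} GB per unit of scaleup")
print(f"splash: {gb_per_unit(splash):.4f} GB per unit of scaleup")
print(f"pmodel at scaleup 6: about {estimate_peak_gb(pmodel, 6):.1f} GB")
```

Estimates like this are only a guide for choosing factors that fit on a given machine; measuring directly remains the reliable check.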
Benchmarking code performance
Simple benchmarking
The profiling directory contains a tool for performing simple regression testing,
performance_regression_checking.sh. This can identify if changes have affected the
speed of the code by comparing the overall time taken to run the profiling test
functions.
The script can be run without arguments to compare the current HEAD with the
origin/develop branch. Alternatively, command line arguments can be used to compare
any two commits:
./performance_regression_checking.sh -n [NEW-COMMIT] -o [OLD-COMMIT]
The -s flag can be used to increase the problem scaling relative to the defaults in
order to reduce the variability of the results. For example, -s 2 would result in
--splash-profile-scaleup 250 and --pmodel-profile-scaleup 12. Likewise, it can be
used to reduce the scaleup, for example -s 0.5 gives --splash-profile-scaleup 62 and
--pmodel-profile-scaleup 3.
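The arithmetic behind the -s examples can be sketched as follows. Rounding down is an assumption chosen to match the -s 0.5 example (125 × 0.5 = 62.5 becomes 62); the script itself may round differently, and the names used here are illustrative.

```python
import math

# Defaults quoted in this section.
DEFAULT_SPLASH_SCALEUP = 125
DEFAULT_PMODEL_SCALEUP = 6


def scaled_options(factor):
    """Return (splash, pmodel) scaleup values for a given -s factor."""
    # Assumption: fractional results are rounded down, as in 62.5 -> 62.
    splash = math.floor(DEFAULT_SPLASH_SCALEUP * factor)
    pmodel = math.floor(DEFAULT_PMODEL_SCALEUP * factor)
    return splash, pmodel


for factor in (2, 0.5):
    splash, pmodel = scaled_options(factor)
    print(f"-s {factor}: --splash-profile-scaleup {splash} --pmodel-profile-scaleup {pmodel}")
```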
Advanced benchmarking
When pytest-profiling runs, the resulting prof/combined.prof file contains detailed
information on all the calls invoked in the test code, including the number of times
each call is made and the time spent on each call. The prof/combined.svg shows where
time is spent during the test runs, which identifies bottlenecks, but it is also useful
to check that the time spent on each function call has not increased markedly when code
is revised.
This can be done by passing the -a flag to the regression test script:
./performance_regression_checking.sh -n [NEW-COMMIT] -o [OLD-COMMIT] -a
This will check if any function is more than a certain tolerance (5% by default) slower
than the old version. It will produce a plot similar to that below in
prof/performance-plot.png showing how the cumulative time per call has changed between
the two versions.
Note
Variance in the run time can be significant when looking at individual functions. This
can limit the effectiveness of the advanced benchmarking results. This may be reduced by
running on exclusive cores in a cluster or by scaling up using the -s flag.
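The tolerance check described above can be illustrated with a small sketch. This is not the implementation in the benchmarking tool: the function name, the dictionaries of timings, and the function labels are all hypothetical, and only the 5% default tolerance comes from the text.

```python
def find_regressions(old_times, new_times, tolerance=0.05):
    """Return {name: relative increase} for functions slower by more than tolerance."""
    regressions = {}
    for name, old in old_times.items():
        new = new_times.get(name)
        # Skip functions missing from the new run or with no measurable old time.
        if new is None or old <= 0:
            continue
        relative_change = (new - old) / old
        if relative_change > tolerance:
            regressions[name] = relative_change
    return regressions


# Hypothetical cumulative time per call, in seconds, for two code versions.
old_run = {"calc_a": 1.00, "calc_b": 2.00, "calc_c": 0.50}
new_run = {"calc_a": 1.03, "calc_b": 2.30, "calc_c": 0.45}

for name, change in find_regressions(old_run, new_run).items():
    print(f"{name} is {change:.0%} slower than the old version")
```

Because run-time noise for individual functions can exceed a few percent, a check like this is more trustworthy when the timings come from larger problem sizes or repeated runs.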

It is also possible to perform the advanced benchmarking manually using the
run_benchmarking.py tool in the profiling directory. This can be used to compare
against more than one previous version.
The usage of the tool is:
usage: run_benchmarking.py [-h] [--exclude EXCLUDE] [--n-runs N_RUNS]
                           [--tolerance TOLERANCE] [--update-on-pass]
                           [--plot-path PLOT_PATH]
                           prof_path database_path fail_data_path label

Run the package benchmarking.

This function runs the standard benchmarking for the pyrealm package. The profiling
tests in the test suite generate a set of combined profile data across the package
functionality. This command then reads in a set of combined profile data and
compares it to previous benchmark data.

The profiling excludes all profiled code objects matching regex patterns provided
using the `--exclude` argument. The defaults exclude standard and site packages,
built in code and various other standard code, and are intended to reduce the
benchmarking to only code objects within the package.

positional arguments:
  prof_path             Path to pytest-profiling output
  database_path         Path to benchmarking database
  fail_data_path        Output path for data on benchmark fails
  label                 A text label for the incoming profiling results, typically a
                        commit SHA

options:
  -h, --help            show this help message and exit
  --exclude EXCLUDE     Exclude profiled code matching a regex pattern, can be repeated
                        (default: ['{.*}', '<.*>', '/lib/'])
  --n-runs N_RUNS       Number of most recent runs to use in benchmarking (default: 5)
  --tolerance TOLERANCE
                        Tolerance of time cost increase in benchmarking (default: 0.05)
  --update-on-pass      Update the profiling database if benchmarking passes (default:
                        False)
  --plot-path PLOT_PATH
                        Generate a benchmarking plot to this path (default: None)

To perform the comparison it is necessary to first generate a performance database for
at least one previous version. This is likely to be origin/develop for incorporating
changes. The workflow is therefore:
1. Check out the previous commit to be used for comparison:
   git checkout origin/develop
2. Perform the profiling:
   poetry run pytest --profile -m "profiling"
3. Run run_benchmarking.py to generate the performance database, profiling/profiling-database.csv (remove it first if it already exists):
   poetry run python profiling/run_benchmarking.py \
   prof/combined.prof profiling/profiling-database.csv \
   profiling/benchmark-fails.csv PREVIOUS
4. Return to the commit to be benchmarked:
   git checkout -
5. Perform the profiling:
   poetry run pytest --profile -m "profiling"
6. Re-run run_benchmarking.py to benchmark the new code:
   poetry run python profiling/run_benchmarking.py \
   prof/combined.prof profiling/profiling-database.csv \
   profiling/benchmark-fails.csv INCOMING \
   --plot-path profiling/performance-plot.png
7. Check the results to see if the relative performance of the incoming code is notably slower than the previous performance.
The second call to run_benchmarking.py runs the benchmark checks and prints a
success or failure message to the screen, depending on whether any function has slowed
down by more than the tolerance. It also creates performance-plot.png, which shows how
the relative performance of each function has changed and highlights any functions
that have slowed down by more than the tolerance. If the benchmarking fails, the
file benchmark-fails.csv is created, containing the incoming and database performance
data for all code objects that failed the benchmark.
In the above code PREVIOUS and INCOMING are used as labels for the old and new
code, respectively. But it can be more useful to use the commit SHA to identify the
profiled code more explicitly. The SHA is a unique hash calculated from summary
information for each commit. The SHA is 40 characters long, but is usually truncated to
the first 7 characters. This can be shown for the last commit using git rev-parse --short HEAD.
Resolving failed benchmarking
If benchmarking fails, the incoming code may have introduced a performance regression. If the code can be made more efficient, submit commits to fix the performance and re-run the benchmarking.
Updating performance results
The results shown above can be updated by:
- Copying the call graph generated by profiling (prof/combined.svg) to profiling/call-graph.svg.
- Copying the advanced benchmarking plot (prof/performance-plot.png) to profiling/performance-plot.png.