--- jupytext: formats: md:myst text_representation: extension: .md format_name: myst format_version: 0.13 kernelspec: display_name: Python 3 language: python name: python3 language_info: codemirror_mode: name: ipython version: 3 file_extension: .py mimetype: text/x-python name: python nbconvert_exporter: python pygments_lexer: ipython3 version: 3.11.9 --- # Profiling and benchmarking We use code profiling to assess the performance of `pyrealm` code and compare it to previous code versions to identify bottlenecks and guard against degraded performance when code changes. Profiling and benchmarking can be run manually when you have made changes to the code and want to check it doesn't impact performance. The tools listed below can highlight where there may be issues in the code which should be fixed prior to integration by a pull request. ## Latest performance results The two plots below show the current profiling and benchmarking results. The **profiling plot** shows where time is spent in the different calls during the profiling tests on the codebase at a single commit. ![profiling plot](../../../profiling/call-graph.svg) The **benchmarking** plot shows the cumulative time per call for successive commits and allows developers to check if an incoming commit has caused code to slow down more than a given threshold (currently 5% longer run times or greater). ![benchmarking plot](../../../profiling/performance-plot.png) ## Running code profiling We use the [pytest-profiling](https://pypi.org/project/pytest-profiling/) plugin to run a set of profiling tests and generate profiling data. These tests are located in `tests/profiling` and consist of a small set of high-level scripts that are intended to use a large proportion of the `pyrealm` codebase with reasonably large inputs. All tests in the profiling suite are decorated with `@pytest.mark.profiling`. Tests with this mark are excluded from the standard `pytest` testing via the `-m "not profiling"` argument in `setup.cfg`. Any test can be decorated with the `profiling` mark to move it temporarily into the profiling test suite. To run the profiling test suite and generate code profiling data, run `pytest` as follows: ```bash poetry run pytest --profile-svg -m "profiling" ``` This selects _only_ the profiling tests and runs them using `pytest-profiling`. The `--profile-svg` both runs the profiling _and_ generates a figure showing the call hierachy of code objects and the time spent in each call. Generating this graph requires the [graphviz](https://pypi.org/project/graphviz/) command line library, which provides the `dot` command for generating SVG graph diagrams. You will need to install `graphviz` to use this option. Alternatively, you can use the following command to only generate the profile data. ```bash poetry run pytest --profile -m "profiling" ``` The `pytest-profiling` plugin saves data and graphs to the `prof` directory, which is excluded from the `git` repository. The key files are the combined results: `prof/combined.prof` and `prof/combined.svg`. ### Scaling of the profiling with problem size The profiling tests use a couple of smaller datasets that are then tiled to scale up the problem size. The size of these scaling factors can be controlled from the command line. * This scaling up is partly to get more stable profiling results - small problems have a lot of runtime noise, leading to benchmarking fails. * Having variable dataset size also allows profiling to look at how the code performance scales. * The scaling up also affects the peak memory usage of the tests, which can lead to issues with running the tests on local machines and GitHub Action runners. The data below shows how the peak memory usage changes with problem set scaling factors. The peak memory size is estimated using, `/usr/bin/time -l` and changing the scaling factors (note that `-v` is required instead on Linux systems). For example: ```sh /usr/bin/time -l pytest tests/profiling/test_profiling_pmodel.py \ -m "profiling" --pmodel-profile-scaleup 10 ``` :::{list-table} test_profiling_pmodel.py :header-rows: 1 * - pmodel_profile_scaleup - peak memory footprint in GB * - 40 - 35.84 * - 20 - 19.74 * - 10 - 9.95 ::: :::{list-table} test_profiling_splash.py :header-rows: 1 * - splash_profile_scaleup - peak memory footprint in GB * - 500 - 22.47 * - 250 - 11.54 * - 125 - 5.95 ::: The peak memory usage of the full profiling suite is currently around 11GB with the defaults of `--splash-profile-scaleup 125` and `--pmodel-profile-scaleup 6`. ## Benchmarking code performance ### Simple benchmarking The `profiling` directory contains a tool for performing simple regression testing, `performance_regression_checking.sh`. This can identify if changes have affected the speed of the code by comparing the overall time taken to run the profiling test functions. The script can be run without arguments to compare the current `HEAD` with the `origin/develop` branch. Alternatively, command line arguments can be used to compare any two commits: ```bash ./performance_regression_checking -n [NEW-COMMIT] -o [OLD-COMMIT] ``` ### Advanced benchmarking When `pytest-profiling` runs, the resulting `prof/combined.prof` file contains detailed information on all the calls invoked in the test code, including the number of times each call is made and the time spent on each call. The `prof/combined.svg` shows where time is spent during the test runs, which identifies bottlenecks, but it is also useful to check that the time spent on each function call has not increased markedly when code is revised. To do this, the `profiling` directory also contains the `run_benchmarking.py` tool. This will check if any function is more than a certain tolerance (5% by default) slower than a previous run. The usage of the tool is: ```text usage: run_benchmarking.py [-h] [--exclude EXCLUDE] [--n-runs N_RUNS] [--tolerance TOLERANCE] [--update-on-pass] [--plot-path PLOT_PATH] prof_path database_path fail_data_path label Run the package benchmarking. This function runs the standard benchmarking for the pyrealm package. The profiling tests in the test suite generate a set of combined profile data across the package functionality. This command then reads in a set of combined profile data and compares it to previous benchmark data. The profiling excludes all profiled code objects matching regex patterns provided using the `--exclude` argument. The defaults exclude standard and site packages, built in code and various other standard code, and are intended to reduce the benchmarking to only code objects within the package. positional arguments: prof_path Path to pytest-profiling output database_path Path to benchmarking database fail_data_path Output path for data on benchmark fails label A text label for the incoming profiling results, typically a commit SHA options: -h, --help show this help message and exit --exclude EXCLUDE Exclude profiled code matching a regex pattern, can be repeated (default: ['{.*}', '<.*>', '/lib/']) --n-runs N_RUNS Number of most recent runs to use in benchmarking (default: 5) --tolerance TOLERANCE Tolerance of time cost increase in benchmarking (default: 0.05) --update-on-pass Update the profiling database if benchmarking passes (default: False) --plot-path PLOT_PATH Generate a benchmarking plot to this path (default: None) ``` To perform the comparison it is necessary to first generate a performance database for at least one previous version. This is likely to be `origin/develop` for incorporating changes. The workflow is therefore: * Checkout the previous commit to be used for comparison: `git checkout origin/develop` * Perform the profiling: `poetry run pytest --profile -m "profiling"` * Run `run_benchmarking.py` to generate the performance database, `profiling/profiling-database.csv` (Remove it first if it already exists). ```bash poetry run python profiling/run_benchmarking.py \ prof/combined.prof profiling/profiling-database.csv \ profiling/benchmark-fails.csv PREVIOUS ``` * Return to the commit to be benchmarked: `git checkout -` * Perform the profiling: `poetry run pytest --profile -m "profiling"` * Re-run `run_benchmarking.py` to benchmark the new code. ```bash poetry run python profiling/run_benchmarking.py \ prof/combined.prof profiling/profiling-database.csv \ profiling/benchmark-fails.csv INCOMING \ --plot-path profiling/performance-plot.png ``` * Check the results to see if the relative performance of the incoming code is notably slower than the previous performance. The second call to `run_benchmarking.py` will run the benchmark checks and print a success or failure message to the screen depending upon if any functions have increased by more than the tolerance. It will also create `performance-plot.png` which shows how the relative performance of each function has changed, and highlights any functions which have slowed down by more than the tolerance. If the benchmarking fails, then the file `benchmark-fails.csv` is created, containing the incoming and database performance data for all processes that have failed benchmarks. In the above code `PREVIOUS` and `INCOMING` are used as labels for the old and new code, respectively. But it can be more useful to use the commit SHA to identify the profiled code more explicitly. The SHA is a unique hash calculated from summary information for each commit. The SHA is 40 characters long, but is usually truncated to the first 7 characters. This can be shown for the last commit using `git rev-parse --short HEAD`. ### Resolving failed benchmarking If benchmarking fails then the incoming code has introduced possibly troublesome performance issues. If the code can be made more efficient, then submit commits to fix the performance and re-run the benchmarking. ## Updating performance results The results shown [above](#latest-performance-results) can be updated by: * Copying the call graph generated by profiling (`prof/combined.svg`) to `profiling/call-graph.svg` * Committing the new version of `profiling/performance-plot.png` to show what has changed.