differences for PR #20
actions-user committed Mar 5, 2024
1 parent 657fd57 commit 3cf6ae7
Showing 29 changed files with 5,956 additions and 159 deletions.
Binary file modified fig/latency.png
4 changes: 2 additions & 2 deletions fig/latency.py
@@ -6,7 +6,7 @@
https://www.intel.com/content/www/us/en/developer/articles/technical/memory-performance-in-a-nutshell.html
"""
import matplotlib.pyplot as plt
label = ["L1 cache", "L2 cache", "L3 cache", "RAM", "SSD", "HDD", "Ldn->Ca->Ldn"]
label = ["L1 cache", "L2 cache", "L3 cache", "RAM", "SSD", "HDD", "Atlantic round trip"]
latency_ns = [1, 4, 40, 80, 8000, 80000, 140000000]

plt.figure().set_figheight(2)
@@ -17,4 +17,4 @@
plt.xscale('symlog')
plt.tight_layout()
plt.savefig('latency.png')
plt.show()
plt.show()
5,745 changes: 5,745 additions & 0 deletions fig/stack.ai

Large diffs are not rendered by default.

Binary file added fig/stack.png
Binary file added files/pred-prey/predprey_out.png
Binary file added files/schelling_out.prof
Binary file added files/travelling-sales/profiles.zip
Binary file added files/travelling-sales/profiles/out_01.prof
Binary file added files/travelling-sales/profiles/out_02.prof
Binary file added files/travelling-sales/profiles/out_03.prof
Binary file added files/travelling-sales/profiles/out_04.prof
Binary file added files/travelling-sales/profiles/out_05.prof
Binary file added files/travelling-sales/profiles/out_06.prof
Binary file added files/travelling-sales/profiles/out_07.prof
Binary file added files/travelling-sales/profiles/out_08.prof
Binary file added files/travelling-sales/profiles/out_09.prof
Binary file added files/travelling-sales/profiles/out_10.prof
22 changes: 11 additions & 11 deletions md5sum.txt
@@ -4,17 +4,17 @@
"config.yaml" "b413b2dfbce4f70e178cae4d6d2d6311" "site/built/config.yaml" "2024-02-08"
"index.md" "3a6d3683998a6b866c134a818f1bb46e" "site/built/index.md" "2024-02-15"
"links.md" "8184cf4149eafbf03ce8da8ff0778c14" "site/built/links.md" "2024-01-03"
"episodes/profiling-introduction.md" "a0163cbc57865b4fad063468ac4c0a41" "site/built/profiling-introduction.md" "2024-02-08"
"episodes/profiling-functions.md" "4ea67773010619ae5fbaa2dc69ecc4f6" "site/built/profiling-functions.md" "2024-02-08"
"episodes/profiling-lines.md" "8bd8cf015fcc38cdb004edf5fad75a65" "site/built/profiling-lines.md" "2024-02-08"
"episodes/profiling-conclusion.md" "340969a321636eb94fff540191a511e7" "site/built/profiling-conclusion.md" "2024-01-29"
"episodes/optimisation-introduction.md" "aff88de80645a433161ad48231f6fa7f" "site/built/optimisation-introduction.md" "2024-02-15"
"episodes/optimisation-data-structures-algorithms.md" "75dbff01d990fa1e99beec4b24b2b0ad" "site/built/optimisation-data-structures-algorithms.md" "2024-02-08"
"episodes/optimisation-minimise-python.md" "12d5c57fb3c31439d39c0d4997bdd323" "site/built/optimisation-minimise-python.md" "2024-02-15"
"episodes/optimisation-use-latest.md" "829f7a813b0a9a131fa22e6dbb534cf7" "site/built/optimisation-use-latest.md" "2024-02-08"
"episodes/optimisation-memory.md" "52c4b2884410050c9646cf987d2aa50e" "site/built/optimisation-memory.md" "2024-02-08"
"episodes/optimisation-conclusion.md" "1d608c565c199cea5e00dc5209f3da1b" "site/built/optimisation-conclusion.md" "2024-02-15"
"episodes/profiling-introduction.md" "7dae558b7851344dcb1746141b6fdf0a" "site/built/profiling-introduction.md" "2024-03-05"
"episodes/profiling-functions.md" "806c4d44bd7b0957030044a3010c3004" "site/built/profiling-functions.md" "2024-03-05"
"episodes/profiling-lines.md" "547d98ccf5dfb92abb63feb0e02fc8a2" "site/built/profiling-lines.md" "2024-03-05"
"episodes/profiling-conclusion.md" "a3c2deb1bc4efaaf4a2a70f966734b71" "site/built/profiling-conclusion.md" "2024-03-05"
"episodes/optimisation-introduction.md" "2c2bbafab97d4db78aa5735839516c81" "site/built/optimisation-introduction.md" "2024-03-05"
"episodes/optimisation-data-structures-algorithms.md" "e1242f561a4caca2071fd84ad515b79b" "site/built/optimisation-data-structures-algorithms.md" "2024-03-05"
"episodes/optimisation-minimise-python.md" "adbadeb1eedc1adfb0cf920c4ba4f341" "site/built/optimisation-minimise-python.md" "2024-03-05"
"episodes/optimisation-use-latest.md" "4c939e9dbde33a1f47fefe5e757ff256" "site/built/optimisation-use-latest.md" "2024-03-05"
"episodes/optimisation-memory.md" "69eb84dfc419083ff12856a80750a618" "site/built/optimisation-memory.md" "2024-03-05"
"episodes/optimisation-conclusion.md" "ccd780c447f0b0ce97b8da1b2572b9c1" "site/built/optimisation-conclusion.md" "2024-03-05"
"instructors/instructor-notes.md" "cae72b6712578d74a49fea7513099f8c" "site/built/instructor-notes.md" "2024-01-03"
"learners/setup.md" "50d49ff7eb0ea2d12d75773ce1decd45" "site/built/setup.md" "2024-01-29"
"learners/setup.md" "3465b1c09e7527d085eb32f647227dc6" "site/built/setup.md" "2024-03-05"
"learners/acknowledgements.md" "c4064263d442f147d3796cb3dfa7b351" "site/built/acknowledgements.md" "2024-02-08"
"profiles/learner-profiles.md" "60b93493cf1da06dfd63255d73854461" "site/built/learner-profiles.md" "2024-01-03"
2 changes: 1 addition & 1 deletion optimisation-conclusion.md
@@ -29,7 +29,7 @@ This course's website can be used as a reference manual when profiling your own

::::::::::::::::::::::::::::::::::::: keypoints

Data Structures & Algorithms
- Data Structures & Algorithms
- List comprehension should be preferred when constructing lists.
- Where appropriate, Tuples and Generator functions should be preferred over Python lists.
- Dictionaries and sets are appropriate for storing a collection of unique data with no intrinsic order for random access.
136 changes: 35 additions & 101 deletions optimisation-data-structures-algorithms.md
@@ -7,36 +7,56 @@ exercises: 0
:::::::::::::::::::::::::::::::::::::: questions

- What's the most efficient way to construct a list?
- When should Tuples be used?
- When should generator functions be used?
- When should tuples be used?
- When are sets appropriate?
- What is the best way to search a list?

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: objectives

- Able to summarise how Lists and Tuples work behind the scenes.
- Able to summarise how lists and tuples work behind the scenes.
- Able to identify appropriate use-cases for tuples.
- Able to use generator functions in appropriate situations.
- Able to utilise dictionaries and sets effectively.
- Able to use `bisect_left()` to perform a binary search of a list or array.

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: Instructor

The important information for students to learn within this episode is the patterns demonstrated via the benchmarks.

This episode introduces many complex topics; these are used to ground the performant patterns in understanding, to aid memorisation.

It should not be a concern if students find the data-structure/algorithm internals challenging, so long as they are still able to recognise the demonstrated patterns.

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: callout

## This episode is challenging!

Within this episode you will be introduced to how certain data-structures and algorithms work.

This is used to explain why one approach is likely to execute faster than another.

It matters that you are able to recognise the faster/slower approaches, not that you can describe or reimplement these data-structures and algorithms yourself.

:::::::::::::::::::::::::::::::::::::::::::::

## Lists

Lists are a fundamental data structure within Python.

They are implemented as a form of dynamic array, found in many programming languages under different names (C++: `std::vector`, Java: `ArrayList`, R: `vector`, Julia: `Vector`).

They allows direct and sequential element access, with the convenience to append items.
They allow direct and sequential element access, with the convenience to append items.

This is achieved by internally storing items in a static array.
This array however can be longer than the `List`, so the current length of the list is stored alongside the array.
When an item is appended, the `List` checks whether it has enough spare space to add the item to the end.
If it doesn't, it will reallocate a larger array, copy across the elements, and deallocate the old array.
Before copying the item to the end and incrementing the counter which tracks the list's length.
This array however can be longer than the list, so the current length of the list is stored alongside the array.
When an item is appended, the list checks whether it has enough spare space to add the item to the end.
If it doesn't, it will re-allocate a larger array, copy across the elements, and deallocate the old array.
The item to be appended is then copied to the end and the counter which tracks the list's length is incremented.

The amount the internal array grows by is dependent on the particular list implementation's growth factor.
CPython for example uses [`newsize + (newsize >> 3) + 6`](https://github.com/python/cpython/blob/a571a2fd3fdaeafdfd71f3d80ed5a3b22b63d0f7/Objects/listobject.c#L74), which works out to an over-allocation of roughly 12.5%.
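
A quick way to see this over-allocation in action is to watch `sys.getsizeof()` as items are appended (a minimal sketch; the exact sizes and growth points depend on your Python version and platform):

```python
import sys

items = []
last_size = sys.getsizeof(items)
print(f"len={len(items)} size={last_size} bytes")
for i in range(64):
    items.append(i)
    size = sys.getsizeof(items)
    if size != last_size:
        # The size only jumps on the appends that trigger a re-allocation;
        # every other append reuses the spare capacity.
        print(f"len={len(items)} size={size} bytes")
        last_size = size
```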
@@ -92,7 +112,7 @@ Results will vary between Python versions, hardware and list lengths. But in thi

## Tuples

In contrast, Python's Tuples are immutable static arrays (similar to strings), their elements cannot be modified and they cannot be resized.
In contrast, Python's tuples are immutable static arrays (similar to strings), their elements cannot be modified and they cannot be resized.

Their potential use-cases are greatly reduced due to these two limitations; they are only suitable for groups of immutable properties.

@@ -113,100 +133,14 @@ This can be easily demonstrated with Python's `timeit` module in your console.
It takes 3x as long to allocate a short list as a tuple of equal length. This gap only grows with the length, as the tuple cost remains roughly static whereas the cost of allocating the list grows slightly.
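
For example, a comparison along these lines can be run with `timeit` (a sketch; the exact ratio will vary with Python version and hardware):

```python
from timeit import timeit

# Allocate a small list and a small tuple one million times each.
reps = 1_000_000
list_s = timeit("[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]", number=reps)
tuple_s = timeit("(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)", number=reps)
print(f"list:  {list_s:.3f}s")
print(f"tuple: {tuple_s:.3f}s")
```

Part of the tuple's advantage here is that CPython can reuse the constant tuple, whereas the list must be rebuilt on every call.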


## Generator Functions

You may not even require your data be stored in a list or tuple if it is only accessed once and in sequence.

Generators are special functions that use `yield` rather than `return`. Each time the generator is called, it resumes computation until the next `yield` statement is hit to return the next value.

This avoids needing to allocate a data structure, and can greatly reduce memory utilisation.

Common examples for generators include:

* Reading from a large file that may not fit in memory.
* Any generated sequence where the required length is unknown.

The below example demonstrates how a generator function (`fibonacci_generator()`) differs from one that simply returns a constructed list (`fibonacci_list()`).

```python
from timeit import timeit

N = 1000000
repeats = 1000

def fibonacci_generator():
a=0
b=1
while True:
yield b
a,b= b,a+b

def fibonacci_list(max_val):
rtn = []
a=0
b=1
while b < max_val:
rtn.append(b)
a,b= b,a+b
return rtn

def test_generator():
t = 0
max_val = N
for i in fibonacci_generator():
if i > max_val:
break
t += i

def test_list():
li = fibonacci_list(N)
t = 0
for i in li:
t += i

def test_list_long():
t = 0
max_val = N
li = fibonacci_list(max_val*10)
for i in li:
if i > max_val:
break
t += i

print(f"Gen: {timeit(test_generator, number=repeats):.5f}ms")
print(f"List: {timeit(test_list, number=repeats):.5f}ms")
print(f"List_long: {timeit(test_list_long, number=repeats):.5f}ms")
```

The performance of `test_generator()` and `test_list()` is comparable, however `test_list_long()`, which generates a list with 5 extra elements (35 vs 30), is consistently slower.

```output
Gen: 0.00251ms
List: 0.00256ms
List_long: 0.00332ms
```

Unlike list comprehension, a generator function will normally involve a Python loop. Therefore, its performance is typically slower than list comprehension, where much of the computation can be offloaded to the CPython backend.

::::::::::::::::::::::::::::::::::::: callout

The use of `max_val` in the previous example moves the value of `N` from global to local scope.

The Python interpreter checks local scope first when resolving variables, so accessing local-scope variables is slightly faster than accessing global scope. This is most visible when a variable is accessed regularly, such as within a loop.

Replacing the use of `max_val` with `N` inside `test_generator()` causes the function to consistently perform a little slower than `test_list()`, whereas before the change it would normally be a little faster.
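
A rough way to measure the difference is a sketch like the following (the gap is small and varies between interpreter versions):

```python
from timeit import timeit

N = 1_000_000

def use_global():
    total = 0
    for i in range(1000):
        if i > N:  # N is looked up in the global namespace on every iteration
            break
        total += i

def use_local():
    max_val = N  # copy the global into a local once
    total = 0
    for i in range(1000):
        if i > max_val:  # local lookups resolve by index, slightly faster
            break
        total += i

print(f"global: {timeit(use_global, number=10000):.3f}s")
print(f"local:  {timeit(use_local, number=10000):.3f}s")
```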

:::::::::::::::::::::::::::::::::::::::::::::


## Dictionaries

Dictionaries are another fundamental Python data-structure.
They provide a key-value store, whereby unique keys with no intrinsic order map to attached values.

::::::::::::::::::::::::::::::::::::: callout

> no intrinsic order
## "no intrinsic order"

Since Python 3.6, the items within a dictionary will iterate in the order that they were inserted. This does not apply to sets.

@@ -224,7 +158,7 @@ If that index doesn't already contain another key, the key (and any associated v
When the index isn't free, a collision strategy is applied. CPython's [dictionary](https://github.com/python/cpython/blob/main/Objects/dictobject.c) and [set](https://github.com/python/cpython/blob/main/Objects/setobject.c) both use a form of open addressing whereby a hash is mutated and corresponding indices probed until a free one is located.
When the hashing data structure exceeds a given load factor (e.g. 2/3 of indices have been assigned keys), the internal storage must grow. This process requires every item to be re-inserted which can be expensive, but reduces the average probes for a key to be found.

![A visual explanation of linear probing; CPython uses an advanced form of this.](episodes/fig/hash_linear_probing.png){alt='A diagram demonstrating how the keys (hashes) 37, 64, 14, 94, 67 are inserted into a hash table with 11 indices. This is followed by the insertion of 59, 80 and 39 which require linear probing to be inserted due to collisions.'}
![A visual explanation of linear probing; CPython uses an advanced form of this.](episodes/fig/hash_linear_probing.png){alt="A diagram demonstrating how the keys (hashes) 37, 64, 14, 94, 67 are inserted into a hash table with 11 indices. This is followed by the insertion of 59, 80 and 39 which require linear probing to be inserted due to collisions."}

To retrieve or check for the existence of a key within a hashing data structure, the key is hashed again and a process equivalent to insertion is repeated. However, now the key at each index is checked for equality with the one provided. If any empty index is found before an equivalent key, then the key must not be present in the data structure.
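
As an illustration only (CPython's real implementation uses a more sophisticated perturbed probe sequence and resizing), a minimal open-addressing set with linear probing could be sketched as:

```python
class LinearProbingSet:
    """A toy hash set using open addressing with linear probing."""
    def __init__(self, capacity=11):
        self.slots = [None] * capacity

    def _probe(self, key):
        # Start at hash(key) % capacity, then step forward (wrapping around)
        # until we find the key or an empty slot.
        index = hash(key) % len(self.slots)
        while self.slots[index] is not None and self.slots[index] != key:
            index = (index + 1) % len(self.slots)
        return index

    def add(self, key):
        # Assumes the table never fills completely; a real implementation
        # grows the storage (re-inserting every key) once a load factor
        # such as 2/3 is exceeded.
        self.slots[self._probe(key)] = key

    def __contains__(self, key):
        return self.slots[self._probe(key)] == key


s = LinearProbingSet()
for key in (37, 64, 14, 94, 67, 59, 80, 39):
    s.add(key)
print(39 in s, 40 in s)  # True False
```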

@@ -253,7 +187,7 @@ class MyKey:
dict = {}
dict[MyKey("one", 2, 3.0)] = 12
```
The only limitation is that two objects where two objects are equal they must have the same hash, hence all member variables which contribute to `__eq__()` should also contribute to `__hash__()` and vice versa (it's fine to have irrelevant or redundant internal members contribute to neither).
The only limitation is that where two objects are equal they must have the same hash, hence all member variables which contribute to `__eq__()` should also contribute to `__hash__()` and vice versa (it's fine to have irrelevant or redundant internal members contribute to neither).
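
The full class definition is collapsed in this diff view; as an illustrative sketch (not necessarily the episode's exact `MyKey`, and with made-up member names), a hashable key class that follows this rule might look like:

```python
class Key:
    """All members used by __eq__ also feed __hash__, and vice versa."""
    def __init__(self, name, version, weight):
        self.name = name
        self.version = version
        self.weight = weight

    def __eq__(self, other):
        return (isinstance(other, Key)
                and (self.name, self.version, self.weight)
                == (other.name, other.version, other.weight))

    def __hash__(self):
        # Equal objects must produce equal hashes, so hash exactly the
        # members that __eq__ compares.
        return hash((self.name, self.version, self.weight))


lookup = {Key("one", 2, 3.0): 12}
print(lookup[Key("one", 2, 3.0)])  # prints 12
```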

## Sets

Expand All @@ -263,7 +197,7 @@ Sets are used for eliminating duplicates and checking for membership, and will n

::::::::::::::::::::::::::::::::::::: challenge

## Unique Collection
## Exercise: Unique Collection

There are four implementations in the below example code; each builds a collection of unique elements from 25,000 items, where 50% can be expected to be duplicates.

@@ -415,7 +349,7 @@ These results are subject to change based on the number of items and the proport
::::::::::::::::::::::::::::::::::::: keypoints

- List comprehension should be preferred when constructing lists.
- Where appropriate, Tuples and Generator functions should be preferred over Python lists.
- Where appropriate, tuples should be preferred over Python lists.
- Dictionaries and sets are appropriate for storing a collection of unique data with no intrinsic order for random access.
- When used appropriately, dictionaries and sets are significantly faster than lists.
- If searching a list or array is required, it should be sorted and searched using `bisect_left()` (binary search).
15 changes: 7 additions & 8 deletions optimisation-introduction.md
@@ -25,7 +25,7 @@ Now that you're able to find the most expensive components of your code with pro
In order to optimise code for performance, it is necessary to have an understanding of what a computer is doing to execute it.

<!-- Goal is to give you a high level understanding of how your code executes. You don't need to be an expert, even a vague general understanding will leave you in a stronger position. -->
Even a high-level understanding of a typical computer architecture; the most common data-structures and algorithms; and how Python executes your code, enable the identification of suboptimal approaches. If you have learned to write code informally out of necessity, to get something to work, it's not uncommon to have collected some bad habits along the way.
Even a high-level understanding of how your code executes, such as how Python and the most common data-structures and algorithms are implemented, can help you to identify suboptimal approaches when programming. If you have learned to write code informally out of necessity, to get something to work, it's not uncommon to have collected some bad habits along the way.

<!-- This is largely high-level/abstract knowledge applicable to the vast majority of programming languages, applies even more strongly if using compiled Python features like numba -->
The remaining content is often abstract knowledge that is transferable to the vast majority of programming languages. This is because the hardware architecture, data-structures and algorithms used are common to many languages and they hold some of the greatest influence over performance bottlenecks.
@@ -53,18 +53,16 @@ When optimising your code, you are making speculative changes. It's easy to make
Testing is hopefully already a seamless part of your research software development process.
Tests can be used to clarify how your software should perform, ensuring that new features work as intended and protecting against unintended changes to old functionality.

There are a plethora of methods for testing code. Most Python developers use the testing package [pytest](https://docs.pytest.org/en/latest/).
There are a plethora of methods for testing code.

## pytest Overview

Typically a developer will create a folder for their tests.
Tests can be split across one or more Python files. As a codebase grows so will the number of tests, so it's important to organise them sensibly.
Most Python developers use the testing package [pytest](https://docs.pytest.org/en/latest/); it's a great place to get started if you're new to testing code.

![The python tests directory of FLAMEGPU2.](episodes/fig/testsuite-dir.png){alt='A partial screenshot of windows file explorer, showing seven folders (codegen, detail, io, model, runtime, simulation, util) and two files conftest.py and test_version.py.'}
Here's a quick example of how a test can be used to check your function's output against an expected value.

Visible in the above screenshot, `conftest.py` is an optional configuration that pytest will parse; in this case it runs additional code before and after the tests to disable telemetry.
Tests should be created within a project's testing directory, by creating files named with the form `test_*.py` or `*_test.py`.

Tests should be created within your testing directory, by creating files named with the form `test_*.py` or `*_test.py`.
pytest looks for these files when running the test suite.

Within the created test file, any functions named in the form `test*` are considered tests that will be executed by pytest.
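
For instance, a minimal sketch of such a test file (the `times_two()` function is hypothetical and defined inline only to keep the example self-contained) could be saved as `test_doubling.py`:

```python
# test_doubling.py
def times_two(x):
    # Ordinarily this would be imported from your own package, e.g.
    # `from my_module import times_two`; it is defined inline here only
    # to keep the sketch self-contained.
    return 2 * x

def test_times_two():
    # pytest collects any function named `test*` in this file and treats
    # a failing `assert` as a test failure.
    assert times_two(2) == 4
    assert times_two(-3) == -6
```

Running `pytest` from the project root will then discover and execute it.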
@@ -123,10 +121,10 @@ You may already have a different testing workflow in-place for validating the co

:::

<!-- todo exercise, write a test (suite?) for a provided function, to catch people not handling edge-cases-->

<!-- todo callout FAIR: testing course (when it's ready) -->

<!--
## Coming Up
In the remainder of this course we will cover:
@@ -146,6 +144,7 @@ In the remainder of this course we will cover:
- How variables are accessed & the performance implications
- Latency in perspective
- Memory allocation isn't free
-->

::::::::::::::::::::::::::::::::::::: keypoints
