differences for PR #20
actions-user committed Mar 5, 2024
1 parent cff061a commit b5adb3e
Showing 27 changed files with 5,945 additions and 146 deletions.
5,745 changes: 5,745 additions & 0 deletions fig/stack.ai

Large diffs are not rendered by default.

Binary file added fig/stack.png
Binary file added files/pred-prey/predprey_out.png
Binary file added files/schelling_out.prof
Binary file not shown.
Binary file added files/travelling-sales/profiles.zip
Binary file not shown.
Binary file added files/travelling-sales/profiles/out_01.prof
Binary file not shown.
Binary file added files/travelling-sales/profiles/out_02.prof
Binary file not shown.
Binary file added files/travelling-sales/profiles/out_03.prof
Binary file not shown.
Binary file added files/travelling-sales/profiles/out_04.prof
Binary file not shown.
Binary file added files/travelling-sales/profiles/out_05.prof
Binary file not shown.
Binary file added files/travelling-sales/profiles/out_06.prof
Binary file not shown.
Binary file added files/travelling-sales/profiles/out_07.prof
Binary file not shown.
Binary file added files/travelling-sales/profiles/out_08.prof
Binary file not shown.
Binary file added files/travelling-sales/profiles/out_09.prof
Binary file not shown.
Binary file added files/travelling-sales/profiles/out_10.prof
Binary file not shown.
22 changes: 11 additions & 11 deletions md5sum.txt
Expand Up @@ -4,17 +4,17 @@
"config.yaml" "b413b2dfbce4f70e178cae4d6d2d6311" "site/built/config.yaml" "2024-02-08"
"index.md" "3a6d3683998a6b866c134a818f1bb46e" "site/built/index.md" "2024-02-15"
"links.md" "8184cf4149eafbf03ce8da8ff0778c14" "site/built/links.md" "2024-01-03"
"episodes/profiling-introduction.md" "a0163cbc57865b4fad063468ac4c0a41" "site/built/profiling-introduction.md" "2024-02-08"
"episodes/profiling-functions.md" "4ea67773010619ae5fbaa2dc69ecc4f6" "site/built/profiling-functions.md" "2024-02-08"
"episodes/profiling-lines.md" "8bd8cf015fcc38cdb004edf5fad75a65" "site/built/profiling-lines.md" "2024-02-08"
"episodes/profiling-conclusion.md" "340969a321636eb94fff540191a511e7" "site/built/profiling-conclusion.md" "2024-01-29"
"episodes/optimisation-introduction.md" "aff88de80645a433161ad48231f6fa7f" "site/built/optimisation-introduction.md" "2024-02-15"
"episodes/optimisation-data-structures-algorithms.md" "75dbff01d990fa1e99beec4b24b2b0ad" "site/built/optimisation-data-structures-algorithms.md" "2024-02-08"
"episodes/optimisation-minimise-python.md" "12d5c57fb3c31439d39c0d4997bdd323" "site/built/optimisation-minimise-python.md" "2024-02-15"
"episodes/optimisation-use-latest.md" "829f7a813b0a9a131fa22e6dbb534cf7" "site/built/optimisation-use-latest.md" "2024-02-08"
"episodes/optimisation-memory.md" "52c4b2884410050c9646cf987d2aa50e" "site/built/optimisation-memory.md" "2024-02-08"
"episodes/optimisation-conclusion.md" "1d608c565c199cea5e00dc5209f3da1b" "site/built/optimisation-conclusion.md" "2024-02-15"
"episodes/profiling-introduction.md" "7dae558b7851344dcb1746141b6fdf0a" "site/built/profiling-introduction.md" "2024-03-05"
"episodes/profiling-functions.md" "806c4d44bd7b0957030044a3010c3004" "site/built/profiling-functions.md" "2024-03-05"
"episodes/profiling-lines.md" "547d98ccf5dfb92abb63feb0e02fc8a2" "site/built/profiling-lines.md" "2024-03-05"
"episodes/profiling-conclusion.md" "a3c2deb1bc4efaaf4a2a70f966734b71" "site/built/profiling-conclusion.md" "2024-03-05"
"episodes/optimisation-introduction.md" "2c2bbafab97d4db78aa5735839516c81" "site/built/optimisation-introduction.md" "2024-03-05"
"episodes/optimisation-data-structures-algorithms.md" "660ab2356de1ff90e3c6cc339b1d6f31" "site/built/optimisation-data-structures-algorithms.md" "2024-03-05"
"episodes/optimisation-minimise-python.md" "adbadeb1eedc1adfb0cf920c4ba4f341" "site/built/optimisation-minimise-python.md" "2024-03-05"
"episodes/optimisation-use-latest.md" "4c939e9dbde33a1f47fefe5e757ff256" "site/built/optimisation-use-latest.md" "2024-03-05"
"episodes/optimisation-memory.md" "69eb84dfc419083ff12856a80750a618" "site/built/optimisation-memory.md" "2024-03-05"
"episodes/optimisation-conclusion.md" "ccd780c447f0b0ce97b8da1b2572b9c1" "site/built/optimisation-conclusion.md" "2024-03-05"
"instructors/instructor-notes.md" "cae72b6712578d74a49fea7513099f8c" "site/built/instructor-notes.md" "2024-01-03"
"learners/setup.md" "50d49ff7eb0ea2d12d75773ce1decd45" "site/built/setup.md" "2024-01-29"
"learners/setup.md" "3465b1c09e7527d085eb32f647227dc6" "site/built/setup.md" "2024-03-05"
"learners/acknowledgements.md" "c4064263d442f147d3796cb3dfa7b351" "site/built/acknowledgements.md" "2024-02-08"
"profiles/learner-profiles.md" "60b93493cf1da06dfd63255d73854461" "site/built/learner-profiles.md" "2024-01-03"
2 changes: 1 addition & 1 deletion optimisation-conclusion.md
Expand Up @@ -29,7 +29,7 @@ This course's website can be used as a reference manual when profiling your own

::::::::::::::::::::::::::::::::::::: keypoints

Data Structures & Algorithms
- Data Structures & Algorithms
- List comprehension should be preferred when constructing lists.
- Where appropriate, Tuples and Generator functions should be preferred over Python lists.
- Dictionaries and sets are appropriate for storing a collection of unique data with no intrinsic order for random access.
Expand Down
116 changes: 26 additions & 90 deletions optimisation-data-structures-algorithms.md
Expand Up @@ -24,13 +24,35 @@ exercises: 0

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: Instructor

The important information for students to learn within this episode is the set of patterns demonstrated via the benchmarks.

This episode introduces many complex topics; these are used to ground the performant patterns in understanding, to aid memorisation.

It should not be a concern to students if they find the data-structure/algorithm internals challenging, if they are still able to recognise the demonstrated patterns.

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: callout

## This episode is challenging!

Within this episode you will be introduced to how certain data-structures and algorithms work.

This is used to explain why one approach is likely to execute faster than another.

It matters that you are able to recognise the faster/slower approaches, not that you can describe or reimplement these data-structures and algorithms yourself.

:::::::::::::::::::::::::::::::::::::::::::::

## Lists

Lists are a fundamental data structure within Python.

They are implemented as a form of dynamic array, found within many programming languages under different names (C++: `std::vector`, Java: `ArrayList`, R: `vector`, Julia: `Vector`).

They allows direct and sequential element access, with the convenience to append items.
They allow direct and sequential element access, with the convenience to append items.

This is achieved by internally storing items in a static array.
This array however can be longer than the `List`, so the current length of the list is stored alongside the array.
Expand Down Expand Up @@ -113,100 +135,14 @@ This can be easily demonstrated with Python's `timeit` module in your console.
It takes 3x as long to allocate a short list as a tuple of equal length. This gap only grows with the length, as the tuple cost remains roughly static whereas the cost of allocating the list grows slightly.


## Generator Functions

You may not even require your data be stored in a list or tuple if it is only accessed once and in sequence.

Generators are special functions, that use `yield` rather than `return`. Each time the generator is called, it resumes computation until the next `yield` statement is hit to return the next value.

This avoids needing to allocate a data structure, and can greatly reduce memory utilisation.

Common examples for generators include:

* Reading from a large file that may not fit in memory.
* Any generated sequence where the required length is unknown.
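
As a minimal sketch of the first use case, the generator below yields one record at a time rather than building a list of every line. The function name `read_records()` is hypothetical, and `io.StringIO` is only a stand-in for a large file opened with `open()`.

```python
import io

def read_records(handle):
    """Yield one stripped line at a time instead of building a list of all lines."""
    for line in handle:
        yield line.rstrip("\n")

# io.StringIO is a placeholder for a real (potentially very large) file handle.
fake_file = io.StringIO("alpha\nbeta\ngamma\n")
records = read_records(fake_file)

first = next(records)      # runs the generator up to its first yield
remaining = list(records)  # consumes whatever values are left
print(first, remaining)
```

Only one line needs to be held in memory at a time while iterating, which is why this pattern suits inputs that may not fit in memory.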

The below example demonstrates how a generator function (`fibonacci_generator()`) differs from one that simply returns a constructed list (`fibonacci_list()`).

```python
from timeit import timeit

N = 1000000
repeats = 1000

def fibonacci_generator():
    a=0
    b=1
    while True:
        yield b
        a,b= b,a+b

def fibonacci_list(max_val):
    rtn = []
    a=0
    b=1
    while b < max_val:
        rtn.append(b)
        a,b= b,a+b
    return rtn

def test_generator():
    t = 0
    max_val = N
    for i in fibonacci_generator():
        if i > max_val:
            break
        t += i

def test_list():
    li = fibonacci_list(N)
    t = 0
    for i in li:
        t += i

def test_list_long():
    t = 0
    max_val = N
    li = fibonacci_list(max_val*10)
    for i in li:
        if i > max_val:
            break
        t += i

print(f"Gen: {timeit(test_generator, number=repeats):.5f}ms")
print(f"List: {timeit(test_list, number=repeats):.5f}ms")
print(f"List_long: {timeit(test_list_long, number=repeats):.5f}ms")
```

The performance of `test_generator()` and `test_list()` is comparable, however `test_list_long()`, which generates a list with 5 extra elements (35 vs 30), is consistently slower.

```output
Gen: 0.00251ms
List: 0.00256ms
List_long: 0.00332ms
```

Unlike list comprehension, a generator function will normally involve a Python loop. Therefore, their performance is typically slower than list comprehension where much of the computation can be offloaded to the CPython backend.

::::::::::::::::::::::::::::::::::::: callout

The use of `max_val` in the previous example moves the value of `N` from global to local scope.

The Python interpreter checks local scope first when finding variables, so accessing local scope variables is slightly faster than accessing global scope variables. This is most visible when a variable is accessed regularly, such as within a loop.

Replacing the use of `max_val` with `N` inside `test_generator()` causes the function to consistently perform a little slower than `test_list()`, whereas before the change it would normally be a little faster.

:::::::::::::::::::::::::::::::::::::::::::::
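
The scope effect described in this callout can be explored in isolation with a sketch like the one below. The function names and loop sizes are illustrative assumptions, and the measured difference is expected to be small and to vary between machines and Python versions.

```python
from timeit import timeit

N = 1_000_000  # a global, playing the same role as N in the example above

def sum_with_global():
    t = 0
    for i in range(1000):
        if i < N:        # N is looked up in global scope on every iteration
            t += i
    return t

def sum_with_local():
    max_val = N          # copy the global into local scope once
    t = 0
    for i in range(1000):
        if i < max_val:  # max_val is found in local scope
            t += i
    return t

# Total seconds for 2000 calls of each function.
global_time = timeit(sum_with_global, number=2000)
local_time = timeit(sum_with_local, number=2000)
print(f"global: {global_time:.4f}s")
print(f"local:  {local_time:.4f}s")
```

Both functions compute the same result; only the scope in which the comparison value is looked up differs.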


## Dictionaries

Dictionaries are another fundamental Python data-structure.
They provide a key-value store, whereby unique keys with no intrinsic order map to attached values.

::::::::::::::::::::::::::::::::::::: callout

> no intrinsic order
## "no intrinsic order"

Since Python 3.6, the items within a dictionary will iterate in the order that they were inserted. This does not apply to sets.
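
A short sketch of this behaviour follows; the keys are arbitrary placeholders. Insertion order was a CPython implementation detail in 3.6 and has been guaranteed by the language since Python 3.7.

```python
d = {}
for key in ["pear", "apple", "fig"]:
    d[key] = len(key)

# Iterating a dictionary visits keys in the order they were inserted.
ordered_keys = list(d)
print(ordered_keys)

# A set holds the same members, but its iteration order should not be relied upon.
s = {"pear", "apple", "fig"}
same_members = set(ordered_keys) == s
print(same_members)
```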

Expand All @@ -224,7 +160,7 @@ If that index doesn't already contain another key, the key (and any associated v
When the index isn't free, a collision strategy is applied. CPython's [dictionary](https://github.com/python/cpython/blob/main/Objects/dictobject.c) and [set](https://github.com/python/cpython/blob/main/Objects/setobject.c) both use a form of open addressing whereby a hash is mutated and corresponding indices probed until a free one is located.
When the hashing data structure exceeds a given load factor (e.g. 2/3 of indices have been assigned keys), the internal storage must grow. This process requires every item to be re-inserted which can be expensive, but reduces the average probes for a key to be found.

![A visual explanation of linear probing, CPython uses an advanced form of this.](episodes/fig/hash_linear_probing.png){alt='A diagram demonstrating how the keys (hashes) 37, 64, 14, 94, 67 are inserted into a hash table with 11 indices. This is followed by the insertion of 59, 80 and 39 which require linear probing to be inserted due to collisions.'}
![A visual explanation of linear probing, CPython uses an advanced form of this.](episodes/fig/hash_linear_probing.png){alt="A diagram demonstrating how the keys (hashes) 37, 64, 14, 94, 67 are inserted into a hash table with 11 indices. This is followed by the insertion of 59, 80 and 39 which require linear probing to be inserted due to collisions."}

To retrieve or check for the existence of a key within a hashing data structure, the key is hashed again and a process equivalent to insertion is repeated. However, now the key at each index is checked for equality with the one provided. If an empty index is found before an equivalent key, then the key must not be present in the data structure.
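
The insertion and lookup process can be sketched with plain linear probing, as below. This is a simplified illustration, not CPython's implementation (which mutates the hash rather than stepping by one); the class name `LinearProbingSet` is hypothetical, `key % capacity` stands in for a real hash function, and the table never grows.

```python
class LinearProbingSet:
    """A fixed-size hash set of integers using linear probing (illustration only)."""

    def __init__(self, capacity=11):
        self.slots = [None] * capacity

    def _probe(self, key):
        """Yield candidate indices, starting at the hashed index and stepping by one."""
        start = key % len(self.slots)  # stand-in for a real hash function
        for offset in range(len(self.slots)):
            yield (start + offset) % len(self.slots)

    def add(self, key):
        for index in self._probe(key):
            if self.slots[index] is None or self.slots[index] == key:
                self.slots[index] = key
                return index
        raise RuntimeError("table is full; a real implementation would grow first")

    def __contains__(self, key):
        for index in self._probe(key):
            if self.slots[index] is None:
                return False  # empty index reached before the key: not present
            if self.slots[index] == key:
                return True
        return False

table = LinearProbingSet()
# The same keys as the diagram: the last three collide and must be probed.
for key in (37, 64, 14, 94, 67, 59, 80, 39):
    table.add(key)
print(table.slots)
print(80 in table, 15 in table)
```

Looking up a key that collided on insertion repeats the same probe sequence, which is why a heavily loaded table needs more probes on average.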

Expand Down Expand Up @@ -253,7 +189,7 @@ class MyKey:
dict = {}
dict[MyKey("one", 2, 3.0)] = 12
```
The only limitation is that two objects where two objects are equal they must have the same hash, hence all member variables which contribute to `__eq__()` should also contribute to `__hash__()` and vice versa (it's fine to have irrelevant or redundant internal members contribute to neither).
The only limitation is that where two objects are equal they must have the same hash, hence all member variables which contribute to `__eq__()` should also contribute to `__hash__()` and vice versa (it's fine to have irrelevant or redundant internal members contribute to neither).
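
A minimal sketch of this rule, using a hypothetical `Version` class: the same two members feed both `__eq__()` and `__hash__()`, while `label` contributes to neither.

```python
class Version:
    def __init__(self, major, minor, label=""):
        self.major = major
        self.minor = minor
        self.label = label  # ignored by both __eq__ and __hash__

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return (self.major, self.minor) == (other.major, other.minor)

    def __hash__(self):
        # Built from exactly the members compared in __eq__.
        return hash((self.major, self.minor))

releases = {Version(3, 12, "first"): "stable"}
# A distinct but equal object hashes identically, so it finds the same entry.
found = releases[Version(3, 12, "second")]
print(found)
```

If `label` were included in `__hash__()` but not in `__eq__()`, two equal objects could hash to different values and the lookup above could fail.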

## Sets

Expand Down
15 changes: 7 additions & 8 deletions optimisation-introduction.md
Expand Up @@ -25,7 +25,7 @@ Now that you're able to find the most expensive components of your code with pro
In order to optimise code for performance, it is necessary to have an understanding of what a computer is doing to execute it.

<!-- Goal is to give you a high level understanding of how your code executes. You don't need to be an expert, even a vague general understanding will leave you in a stronger position. -->
Even a high-level understanding of a typical computer architecture; the most common data-structures and algorithms; and how Python executes your code, enable the identification of suboptimal approaches. If you have learned to write code informally out of necessity, to get something to work, it's not uncommon to have collected some bad habits along the way.
Even a high-level understanding of how your code executes, such as how Python and the most common data-structures and algorithms are implemented, can help you to identify suboptimal approaches when programming. If you have learned to write code informally out of necessity, to get something to work, it's not uncommon to have collected some bad habits along the way.

<!-- This is largely high-level/abstract knowledge applicable to the vast majority of programming languages, applies even more strongly if using compiled Python features like numba -->
The remaining content is often abstract knowledge, that is transferable to the vast majority of programming languages. This is because the hardware architecture, data-structures and algorithms used are common to many languages and they hold some of the greatest influence over performance bottlenecks.
Expand Down Expand Up @@ -53,18 +53,16 @@ When optimising your code, you are making speculative changes. It's easy to make
Testing is hopefully already a seamless part of your research software development process.
Tests can be used to clarify how your software should perform, ensuring that new features work as intended and protecting against unintended changes to old functionality.

There are a plethora of methods for testing code. Most Python developers use the testing package [pytest](https://docs.pytest.org/en/latest/).
There are a plethora of methods for testing code.

## pytest Overview

Typically a developer will create a folder for their tests.
Tests can be split across one or more Python files. As a codebase grows so will the number of tests, so it's important to organise them sensibly.
Most Python developers use the testing package [pytest](https://docs.pytest.org/en/latest/); it's a great place to get started if you're new to testing code.

![The python tests directory of FLAMEGPU2.](episodes/fig/testsuite-dir.png){alt='A partial screenshot of windows file explorer, showing seven folders (codegen, detail, io, model, runtime, simulation, util) and two files conftest.py and test_version.py.'}
Here's a quick example of how a test can be used to check your function's output against an expected value.

Visible in the above screenshot `conftest.py` is an optional configuration that pytest will parse, in this case it runs additional code before and after the tests to disable telemetry.
Tests should be created within a project's testing directory, by creating files named with the form `test_*.py` or `*_test.py`.

Tests should be created within your testing directory, by creating files named with the form `test_*.py` or `*_test.py`.
pytest looks for these files, when running the test suite.

Within the created test file, any functions named in the form `test*` are considered tests that will be executed by pytest.
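
As a rough sketch of these naming rules, a file called `test_mean.py` (a hypothetical name) might look like the following. The function `mean()` is a placeholder defined inline so the sketch is self-contained; in a real project it would be imported from the code under test.

```python
# Contents of a hypothetical file named test_mean.py

def mean(values):
    """Placeholder for the function under test; normally imported instead."""
    return sum(values) / len(values)

def test_mean_of_integers():
    # pytest treats a failing assert as a failed test.
    assert mean([1, 2, 3, 4]) == 2.5

def test_mean_of_single_value():
    assert mean([7]) == 7
```

Running `pytest` from the project directory would collect both `test_*` functions; `mean()` is not collected because its name does not start with `test`.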
Expand Down Expand Up @@ -123,10 +121,10 @@ You may already have a different testing workflow in-place for validating the co

:::

<!-- todo exercise, write a test (suite?) for a provided function, to catch people not handling edge-cases-->

<!-- todo callout FAIR: testing course (when it's ready) -->

<!--
## Coming Up
In the remainder of this course we will cover:
Expand All @@ -146,6 +144,7 @@ In the remainder of this course we will cover:
- How variables are accessed & the performance implications
- Latency in perspective
- Memory allocation isn't free
-->

::::::::::::::::::::::::::::::::::::: keypoints

Expand Down
4 changes: 2 additions & 2 deletions optimisation-memory.md
Expand Up @@ -29,7 +29,7 @@ Modern computers typically have a single processor (CPU), within this processor
Data held in memory by running software exists in RAM; this memory is faster to access than hard drives (and solid-state drives).
But the CPU has much smaller caches on-board, to make accessing the most recent variables even faster.

![An annotated photo of a computer's hardware.](episodes/fig/annotated-motherboard.jpg){alt='An annotated photo of inside a desktop computer's case. The CPU, RAM, power supply, graphics cards (GPUs) and harddrive are labelled.'}
![An annotated photo of a computer's hardware.](episodes/fig/annotated-motherboard.jpg){alt="An annotated photo of inside a desktop computer's case. The CPU, RAM, power supply, graphics cards (GPUs) and harddrive are labelled."}

<!-- Read/operate on variable ram->cpu cache->registers->cpu -->
When reading a variable, to perform an operation with it, the CPU will first look in its registers. These exist per core, and they are where computation is actually performed. Accessing them is incredibly fast, but there only exists enough storage for around 32 variables (typical number, e.g. 4 bytes).
Expand Down Expand Up @@ -160,7 +160,7 @@ An even greater overhead would apply.

Latency can have a big impact on the speed at which a program executes, as the below graph demonstrates. Note the log scale!

![A graph demonstrating the wide variety of latencies a programmer may experience when accessing data.](episodes/fig/latency.png){alt='A horizontal bar chart displaying the relative latencies for L1/L2/L3 cache, RAM, SSD, HDD and a packet being sent from London to California and back. These latencies range from 1 nanosecond to 140 milliseconds and are displayed with a log scale.'}
![A graph demonstrating the wide variety of latencies a programmer may experience when accessing data.](episodes/fig/latency.png){alt="A horizontal bar chart displaying the relative latencies for L1/L2/L3 cache, RAM, SSD, HDD and a packet being sent from London to California and back. These latencies range from 1 nanosecond to 140 milliseconds and are displayed with a log scale."}

Typically, the lower the latency, the higher the effective bandwidth: L1 and L2 cache have 1 TB/s, RAM 100 GB/s, SSDs up to 32 GB/s and HDDs up to 150 MB/s. This makes large memory transactions from the slower devices even slower.

Expand Down