diff --git a/index.md b/index.md
index 6a05026..214c104 100644
--- a/index.md
+++ b/index.md
@@ -2,10 +2,14 @@
 site: sandpaper::sandpaper_site
 ---
+
+
+**Welcome to Performance Profiling & Optimisation (Python) Training!**
 The training curriculum for this course is designed for researchers that are writing Python and lack formal computer science training. The curriculum covers how to assess where time is being spent during execution of a Python program, it also provides a high level understanding of how code executes and how this maps to the limiting factors of performance and good practice.
diff --git a/md5sum.txt b/md5sum.txt
index 71f12de..e1ed012 100644
--- a/md5sum.txt
+++ b/md5sum.txt
@@ -2,18 +2,18 @@
 "CODE_OF_CONDUCT.md" "c93c83c630db2fe2462240bf72552548" "site/built/CODE_OF_CONDUCT.md" "2024-01-03"
 "LICENSE.md" "b24ebbb41b14ca25cf6b8216dda83e5f" "site/built/LICENSE.md" "2024-01-03"
 "config.yaml" "b413b2dfbce4f70e178cae4d6d2d6311" "site/built/config.yaml" "2024-02-08"
-"index.md" "5d420b7de3ab84e1eda988e6bc4d58b4" "site/built/index.md" "2024-01-29"
+"index.md" "3a6d3683998a6b866c134a818f1bb46e" "site/built/index.md" "2024-02-13"
 "links.md" "8184cf4149eafbf03ce8da8ff0778c14" "site/built/links.md" "2024-01-03"
 "episodes/profiling-introduction.md" "a0163cbc57865b4fad063468ac4c0a41" "site/built/profiling-introduction.md" "2024-02-08"
 "episodes/profiling-functions.md" "4ea67773010619ae5fbaa2dc69ecc4f6" "site/built/profiling-functions.md" "2024-02-08"
 "episodes/profiling-lines.md" "8bd8cf015fcc38cdb004edf5fad75a65" "site/built/profiling-lines.md" "2024-02-08"
 "episodes/profiling-conclusion.md" "340969a321636eb94fff540191a511e7" "site/built/profiling-conclusion.md" "2024-01-29"
-"episodes/optimisation-introduction.md" "496655bd664412eacb982024994d60b0" "site/built/optimisation-introduction.md" "2024-02-08"
+"episodes/optimisation-introduction.md" "aff88de80645a433161ad48231f6fa7f" "site/built/optimisation-introduction.md" "2024-02-13"
"episodes/optimisation-data-structures-algorithms.md" "75dbff01d990fa1e99beec4b24b2b0ad" "site/built/optimisation-data-structures-algorithms.md" "2024-02-08" -"episodes/optimisation-minimise-python.md" "4af3642c2a613a36d8d0ffb056225083" "site/built/optimisation-minimise-python.md" "2024-02-08" +"episodes/optimisation-minimise-python.md" "12d5c57fb3c31439d39c0d4997bdd323" "site/built/optimisation-minimise-python.md" "2024-02-13" "episodes/optimisation-use-latest.md" "829f7a813b0a9a131fa22e6dbb534cf7" "site/built/optimisation-use-latest.md" "2024-02-08" "episodes/optimisation-memory.md" "52c4b2884410050c9646cf987d2aa50e" "site/built/optimisation-memory.md" "2024-02-08" -"episodes/optimisation-conclusion.md" "e4a79aa1713310c75bc0ae9e258641c2" "site/built/optimisation-conclusion.md" "2024-01-29" +"episodes/optimisation-conclusion.md" "1d608c565c199cea5e00dc5209f3da1b" "site/built/optimisation-conclusion.md" "2024-02-13" "instructors/instructor-notes.md" "cae72b6712578d74a49fea7513099f8c" "site/built/instructor-notes.md" "2024-01-03" "learners/setup.md" "50d49ff7eb0ea2d12d75773ce1decd45" "site/built/setup.md" "2024-01-29" "learners/acknowledgements.md" "c4064263d442f147d3796cb3dfa7b351" "site/built/acknowledgements.md" "2024-02-08" diff --git a/optimisation-conclusion.md b/optimisation-conclusion.md index 4579521..fe39929 100644 --- a/optimisation-conclusion.md +++ b/optimisation-conclusion.md @@ -29,6 +29,24 @@ This course's website can be used as a reference manual when profiling your own ::::::::::::::::::::::::::::::::::::: keypoints - +Data Structures & Algorithms + - List comprehension should be preferred when constructing lists. + - Where appropriate, Tuples and Generator functions should be preferred over Python lists. + - Dictionaries and sets are appropriate for storing a collection of unique data with no intrinsic order for random access. + - When used appropriately, dictionaries and sets are significantly faster than lists. 
+ - If searching a list or array is required, it should be sorted and searched using `bisect_left()` (binary search).
+- Minimise Python Written
+ - Python is an interpreted language, this adds an additional overhead at runtime to the execution of Python code. Many core Python and NumPy functions are implemented in faster C/C++, free from this overhead.
+ - NumPy can take advantage of vectorisation to process arrays, which can greatly improve performance.
+ - Pandas' data tables store columns as arrays, therefore operations applied to columns can take advantage of NumPys vectorisation.
+- Newer is Often Faster
+ - Where feasible, the latest version of Python and packages should be used as they can include significant free improvements to the performance of your code.
+ - There is a risk that updating Python or packages will not be possible to due to version incompatibilities or will require breaking changes to your code.
+ - Changes to packages may impact results output by your code, ensure you have a method of validation ready prior to attempting upgrades.
+- How the Computer Hardware Affects Performance
+ - Sequential accesses to memory (RAM or disk) will be faster than random or scattered accesses.
+ - This is not always natively possible in Python without the use of packages such as NumPy and Pandas
+ - One large file is preferable to many small files.
+ - Memory allocation is not free, avoiding destroying and recreating objects can improve performance.
 ::::::::::::::::::::::::::::::::::::::::::::::::
diff --git a/optimisation-introduction.md b/optimisation-introduction.md
index f2d2dcd..2957fd8 100644
--- a/optimisation-introduction.md
+++ b/optimisation-introduction.md
@@ -136,15 +136,16 @@ In the remainder of this course we will cover:
  - Sets
  - Generator Functions
  - Searching
-- How Python Executes
- - Why less Python is often faster
- - How to use NumPy for performance
- - How to get the most from pandas
+- Minimise Python Written
+ - built-ins
+ - NumPY
+ - Pandas
 - Newer is Often Faster
  - Keeping Python and packages upto date
 - How the Computer Hardware Affects Performance
- - Why some accessing some variables can be faster than others
- - Putting latencies in perspective
+ - How variables are accessed & the performance implications
+ - Latency in perspective
+ - Memory allocation isn't free
 ::::::::::::::::::::::::::::::::::::: keypoints
diff --git a/optimisation-minimise-python.md b/optimisation-minimise-python.md
index 36d275b..494abab 100644
--- a/optimisation-minimise-python.md
+++ b/optimisation-minimise-python.md
@@ -1,5 +1,5 @@
 ---
-title: "Minimise Python (Numpy/Pandas)"
+title: "Minimise Python (NumPY/Pandas)"
 teaching: 0
 exercises: 0
 ---
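
Reviewer note on the `bisect_left()` key point added to optimisation-conclusion.md: the sketch below is one minimal way to illustrate that key point, and is not code taken from the lesson. The helper names `contains_sorted` and `contains_linear` and the sample data are hypothetical; `bisect_left` itself is the standard-library function from the `bisect` module.

```python
from bisect import bisect_left


def contains_sorted(sorted_items, target):
    """Binary search: return True if target is in sorted_items.

    Assumes sorted_items is already sorted in ascending order
    (hypothetical helper, for illustration only).
    """
    # bisect_left returns the leftmost index at which target could be
    # inserted while keeping the list sorted.
    i = bisect_left(sorted_items, target)
    # target is present only if that index is in range and holds target.
    return i < len(sorted_items) and sorted_items[i] == target


def contains_linear(items, target):
    """Linear search over an unsorted list, shown for comparison."""
    return target in items


# Hypothetical sample data: sort once, then search many times.
data = sorted([42, 7, 19, 3, 88, 61])

found = contains_sorted(data, 19)
missing = contains_sorted(data, 20)
```

Binary search inspects on the order of log2(n) elements per lookup, whereas `target in items` on a list may inspect all n. The one-off cost of sorting is only worthwhile when the same collection is searched repeatedly.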