Notes and comments from trial run 2024-02-15 #18
Comments
File formats - reasons for covering:
Could suggest creating HDF5 or Parquet caches of CSV files if need to make repeated reads of files? EDIT:
See https://github.com/RSE-Sheffield/hi-perf-ipynb/blob/master/tutorials/01-multithreading.ipynb
Generate a diagram of or text info on the CPU core, CPU cache, mem and peripheral device connectivity/affinity within your own machine:
Native Python
https://projecteuler.net/ is fab and language agnostic.
Re references/objects in Numpy arrays and Pandas DataFrames being bad: recommend people look to see if
Use of decorators when profiling: add suggestions for how to enable profiling on dubious-quality 3rd party code? Edit files within packages in virtualenv/conda env or something cleaner than that?
'function' vs 'method': use 'function' everywhere for consistency unless explicitly meaning method of an object?
Function profiling: could comment that easier to introduce if have somewhat modular software architecture (a reminder of issues of having functions 1000s of lines long)?
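On the question of profiling third-party code without editing files inside the environment: one possible approach is to wrap the function at runtime from your own script. The sketch below is only an illustration under assumptions. `profiled` is a hypothetical helper name, and the standard-library `json.dumps` stands in for a function from a third-party package.

```python
import cProfile
import functools
import io
import json  # stand-in for a third-party package (assumption for this sketch)
import pstats


def profiled(func):
    """Hypothetical helper: run every call to func under cProfile."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        profiler = cProfile.Profile()
        try:
            # runcall profiles just this one call and returns its result
            return profiler.runcall(func, *args, **kwargs)
        finally:
            # Print the five most expensive entries by cumulative time
            stream = io.StringIO()
            stats = pstats.Stats(profiler, stream=stream)
            stats.sort_stats("cumulative").print_stats(5)
            print(stream.getvalue())
    return wrapper


# Patch the module attribute at runtime; no files in the package are edited.
original_dumps = json.dumps
json.dumps = profiled(json.dumps)
try:
    encoded = json.dumps({"a": [1, 2, 3]})
finally:
    # Restore the original so the patch does not leak into other code
    json.dumps = original_dumps
```

A limitation of this approach: only callers that look the function up through the module (`json.dumps(...)`) see the wrapper. Code that ran `from json import dumps` before the patch keeps the unwrapped function.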
Function Level Profiling
Profiling Summary
Optimisation
Testing
Data Structures and Algorithms
Minimise Python
Latency Overview
General comments
I spent some time this weekend profiling and trying to do some optimisation on one of my personal Python code projects. (Bear in mind that I am not a Python specialist and I wasn't working on particularly scientific or complex code.) The profiling part of the course worked as advertised and helped me identify exactly which bits of my code were slow and would benefit from some effort improving. Unfortunately, the result there was that the major slowdown in my code was caused by poor coding on my part and can only be improved by a better algorithm for tackling the problem, and not, as far as I can see, by taking advantage of any Python-specific quirks.

A few bits of the optimisation side of the course were still quite helpful, however. I think by far the most useful thing I learned was about variable scope and function calls causing slowdowns. The easiest and largest speed gains I got were from pre-allocating non-local variables to local copies and putting functions called only once inline. The scope change in particular sped up those functions by around 10%, I think, and it might be worth putting more emphasis on this than just a single callout.

I think it would be beneficial to acknowledge at some point in the course that there might not be any optimisations to be made. I would not like to put a researcher in a position of "these things should be helping me but I can't get them to work, I feel so disheartened".
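For anyone reading along, the scope point can be sketched roughly as below. This is not the code from my project: the function names and data are made up for illustration, and the size of any gain will depend on the Python version and the workload.

```python
import math
import timeit


def sum_sqrt_global(values):
    """Looks up the global name 'math' and its 'sqrt' attribute on every iteration."""
    total = 0.0
    for v in values:
        total += math.sqrt(v)
    return total


def sum_sqrt_local(values):
    """Binds math.sqrt to a local name once, so the loop only does a local lookup."""
    sqrt = math.sqrt
    total = 0.0
    for v in values:
        total += sqrt(v)
    return total


if __name__ == "__main__":
    data = list(range(1, 100_001))
    for fn in (sum_sqrt_global, sum_sqrt_local):
        seconds = timeit.timeit(lambda: fn(data), number=20)
        print(f"{fn.__name__}: {seconds:.3f}s")
```

Both functions return the same value, so the only difference being timed is how the name is resolved inside the loop.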
Thanks Fred, useful comments. I appreciate all this feedback, not too sure when I will have the time to address it though. I've got a bit of a busy month.
I think @gyengen currently plans for it to run on managed desktops in Hicks. So this may not be that simple.
With a wider view though, is it possible that some might want to use their own laptops? I never used the managed desktops so don't know if it's possible for people to install software in advance, i.e. they work like VMs/Remote desktops. If so it would seem sensible to ask people to download and install software and data in advance, as doing so at the start of a session wastes valuable face-to-face time. Also this course has the possibility of feeding up-stream into the Carpentries Incubator, where it could be used by others and may see contributions, so making it as general as possible would be useful. In that regard, having instructions for participants to download and install setups beforehand would be really useful.
Yes, eventually. Still a lot to resolve before then. I'm acknowledging the feedback (not going to hide it away), just not an immediate priority. Afaik the carpentries format does have a data page, which would serve this purpose. I'm just not a huge fan of having individual downloads that also need to be manually archived if they change. So I would want to look at whether I can fudge the carpentries CI to do that for me.
There's already Sheffield specific stuff in here (such as the Theme), I expect carpentries incubator would end up being a fork of this repository.
Cool, the main reason I mentioned it is that with the Git course it can delay the start of the session if people hadn't followed the setup instructions. If/when you get round to creating archives there seems to be a GitHub Action for everything... Create Archive · Actions · GitHub Marketplace!
Removed the scope callout whilst removing generator functions. Need to work out where it fits.
Hopefully useful capture of some of the points raised.
Introduction to Profiling
Function Level Profiling
to
in What is a Stack? traceback.print_stack()
travellingsales.py with 10 cities took about 5 minutes, that needs accounting for in the class. Perhaps get them to start this and then talk about something else whilst running and return to the output.
predprey.py ran in about 20 seconds.
numpy which isn't included by default.
Line Level Profiling
<script name/arguments> throughout.
cProfile down to line_profile.
Optimisation
Testing
Data Structures and Algorithms
They allows direct and sequential element access, with the convenience to append items.
extra "s" on "allows".
List
Generators
Sets
items.
Searching
load factor and collisions
Minimise Python
zip() from built-in operators?
Numpy
dtype when having arrays of mixed types.
Pandas
import numpy as np
Keeping Python up-to-date
"such changes to the JIT and GIL will provide" is missing an "as".
Memory
Accessing Disk
Latency Overview
Optimisation Conclusion
Useful resources to point people to (from @ns-rse)