-
-
Notifications
You must be signed in to change notification settings - Fork 561
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix IDAKLU solver crash when running in multiple threads + when telemetry is prompted for #4583
base: develop
Are you sure you want to change the base?
Fix IDAKLU solver crash when running in multiple threads + when telemetry is prompted for #4583
Conversation
This approach might seem overkill but this is what came to mind first and seems like the most robust way to me, so I'd like to test this a bit more – I've converted this to a draft for now. I triggered a wheel build here, let's see if it passes: https://github.com/agriyakhetarpal/PyBaMM/actions/runs/11808256997 I've confirmed the fix locally with a fresh PyBaMM installation (i.e., with the
across the following scenarios:
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #4583 +/- ##
===========================================
- Coverage 99.26% 99.26% -0.01%
===========================================
Files 302 302
Lines 22889 22866 -23
===========================================
- Hits 22721 22698 -23
Misses 168 168 ☔ View full report in Codecov by Sentry. |
The buttons aren't working in the notebooks, actually – I'm quite sure they were working when I was testing earlier, so I guess I made some additional changes which made them break. I'll debug later in the day, and I'll consider dropping the buttons in favour of a simple input in case I don't get them to work. |
I switched to a more basic implementation for Jupyter notebooks, and I've seen bug reports about how inputs don't work properly with JupyterLab <4, but I don't think (too) many people are using version 3 now, so we should be good. |
Additionally, I tested on Google Colab by installing PyBaMM in editable mode from my branch – we should be ready to go with this. Suggestions on cleaning up the code are welcome. The only place I haven't been able to test is Windows, because I don't have a Windows machine. |
Linux wheel builds passing: https://github.com/agriyakhetarpal/PyBaMM/actions/runs/11819747790/job/32930388878 |
Ok I will take a close look at this in an hour or so |
src/pybamm/config.py
Outdated
if is_notebook(): # pragma: no cover | ||
try: | ||
from IPython.display import clear_output | ||
|
||
user_input = input("Do you want to enable telemetry? (Y/n): ") | ||
clear_output() | ||
|
||
return user_input | ||
|
||
except Exception: # pragma: no cover | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a major issue, but this has no timeout so people notebooks will just hang if they don't realize this popped up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is intentional. I opine that notebooks are meant to be used interactively by default – I expect this to be a one-time issue, since importing PyBaMM and choosing "yes" or "no" once would mean that the config gets saved and this never comes up again (until the config file gets deleted, of course).
while time.time() - start_time < timeout: | ||
rlist, _, _ = select.select([sys.stdin], [], [], 0.1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are already using select, do we need the loop? The timeout isn't very long
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I checked and this loop ended up being required – since while normal input()
in Python blocks indefinitely, I found that there's no (built-in) way to timeout if we are in raw terminal mode (set above in tty.setraw()
). So we need to manually handle everything, which is to read the "y" or "n" or empty characters and flush them to the screen.
select
lets us say "wake me up when, either":
- there's input to read, or
- one-tenth of a second has passed
which the loop apparently allows us to do. But if you have something else in mind, I'm happy to try that.
import select | ||
|
||
# Save terminal settings for later | ||
old_settings = termios.tcgetattr(sys.stdin) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems excessive that we need to change the terminal settings for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What this allows you to do is that you don't get new indentation when you write to the terminal again. I wrapped this in a try-finally block based on the Python docs: https://docs.python.org/3/library/termios.html#example.
So, by saving the attributes, I'm trying to disable the fact that sys.stdout.write("\n")
will not return the cursor to the leftmost position (because we are in raw mode and we need to do that manually instead – for which I want to avoid).
We end up with this, otherwise:
❯ python -c "import pybamm; print(pybamm.IDAKLUSolver())"
PyBaMM can collect usage data and send it to the PyBaMM team to help us improve the software.
We do not collect any sensitive information such as models, parameters, or simulation results - only information on which parts of the code are being used and how frequently.
This is entirely optional and does not impact the functionality of PyBaMM.
For more information, see https://docs.pybamm.org/en/latest/source/user_guide/index.html#telemetry
Do you want to enable telemetry? (Y/n): n
Telemetry disabled.
<pybamm.solvers.idaklu_solver.IDAKLUSolver object at 0x102de3ad0>
%
and this would break one's terminal until one restarts it – a more robust solution could be to use TCSAFLUSH
instead of TCSADRAIN
, I'll switch to that.
src/pybamm/config.py
Outdated
else: | ||
print("Invalid input. Please enter 'Y/y' for yes or 'n/N' for no:") | ||
user_input = get_input_or_timeout(timeout) | ||
if user_input is None: | ||
print("\nTimeout reached. Defaulting to not enabling telemetry.") | ||
return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know I brought it up before, but if we just check for the yes options, we can skip having an infinite loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a person presses "Enter" in haste, that enables telemetry for them (because of your suggestion that I applied above), which might not be what they want – I'm trying to reduce the chance of that happening
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, should I keep it as "yes" if Enter is pressed or "no"? I feel the latter is the better option
Co-authored-by: Eric G. Kratz <[email protected]>
Continuing from my question above: though I pushed e4859dc which fixes the failing test, it also changes the behaviour on pressing "Enter" to enable telemetry – which, to me, does not seem like good UX because it can cause an inadvertent "yes" from a user. It might be better to be explicit rather than implicit: either allow only a "yes" or a "no" in the response and ask them to enter their response again if it's empty, or let "Enter" disable telemetry rather than enabling it). I understand that it would be good for us to get a reasonable amount of telemetry data from as many users as we can, but we should also be conscientious with our approach. @valentinsulzer, do you have any thoughts? I might have missed this when reviewing #4441. That said, if prompts in other applications do regard pressing Enter/an empty input as a "yes", then this approach is completely fine with me. |
Crashes should be resolved. I could not reproduce any of the notebook issues |
I'd say that this PR could still be useful because it avoids using |
@agriyakhetarpal I built the wheel with develop and did some testing on Windows. A few things I noticed:
The fact that tests are failing locally on windows bothers me, but it looks like some of this was already present and not caught by all of our tests. It is also possible that the ~45 failing Jax tests are due to issues with my Windows setup since the tests pass in CI. I won't know for sure until I do additional testing for the release. I will continue investigating the failures on windows, but I don't think removing the threading or adding extra logic to text output is needed at the moment. The changes to the installation guide, noxfile.py, and the pyproject.toml look good. Next steps:
|
Just a quick comment for now; I'll get back to the rest in a while: @kratman, if you can't reproduce the Windows crashes in CI, could you also check how old the machine that you are using locally is? Some JAX operations might crash or simply not work on very old machines because of the lack of AVX512 instructions, so maybe this is indeed just something that is for your setup and should be fine for slightly newer Windows computers. |
This is not a particularly old machine. It is an i5-1235U (2022), which is the Alder Lake group of chips that do not have AVX-512, it does have AVX and AVX2 though |
Description
This came up as an unrelated issue in #4582 when testing the wheels and apparently stems from #4441 where a thread was being used to check for the input. This caused the Linux wheel tests to fail: https://github.com/agriyakhetarpal/PyBaMM/actions/runs/11805727556/job/32890103930. This PR redoes that with a multi-platform approach and avoids the use of threads altogether by checking for different environments: Windows via
msvcrt
and Linux/macOS viatermios
.A possible corner case for Jupyter Notebooks (running in VS Code versus Jupyter Notebook/JupyterLab) has also been fixed.I've resorted to a more rudimentary fix using the previous method for now, since this fix was erroneous and did not work intermittently.Type of change
Please add a line in the relevant section of CHANGELOG.md to document the change (include PR #) - note reverse order of PR #s. If necessary, also add to the list of breaking changes.
Key checklist:
$ pre-commit run
(or$ nox -s pre-commit
) (see CONTRIBUTING.md for how to set this up to run automatically when committing locally, in just two lines of code)$ python run-tests.py --all
(or$ nox -s tests
)$ python run-tests.py --doctest
(or$ nox -s doctests
)You can run integration tests, unit tests, and doctests together at once, using
$ python run-tests.py --quick
(or$ nox -s quick
).Further checks: