-
Notifications
You must be signed in to change notification settings - Fork 102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: new wait for idle #1219
feat: new wait for idle #1219
Conversation
7e9bc99
to
2aa4634
Compare
@james-garner-canonical I'd appreciate an early review. I'm expecting integration tests to pass 🎉 There's still some refactoring to be done... and I'm happy to hear your thoughts about that and areas to improve in general. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, so much nicer than the previous implementation. Really cool to have the tests passing on both. Was wait_for_idle tested much previously?
810eb09
to
9a36b7a
Compare
@james-garner-canonical the big refactor is done, I hope you'll like the result :) still a draft, as some unit tests are missing and I have one internal concern, but should be ready for review :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't looked at the tests (though I noticed the addition of freezegun
), but I've left some comments on the new implementation. My main concern is that the generator/coroutine business would be a lot easier (for me) to follow if we used a regular class/object instead (see comment on def _loop
) -- it took me a while to get my head around. I do like the separation of the time-based checks. Lmk if you want me to look at anything else in particular
9a36b7a
to
3bcf321
Compare
@james-garner-canonical I've refactored the loop to be an [async] filter. It turns out that I've misunderstood the definition of "idle". In short:
This means that I have to update the Edit: done now. |
6710b0c
to
b6fbb0c
Compare
88cd329
to
065d3de
Compare
@james-garner-canonical idle implemented, most comments addressed, tests pass again. please review :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exciting updates, glad tests are passing now.
I'm finding it hard to follow the logical changes between versions on this PR, I think primarily because the generator adds a fair bit of mental overhead for me -- it's kind of nicer now (though the new outer async loop construct took me some time to digest), but I still have to put in more mental work to follow the various yield ... break
s and yield ... continue
s.
For now I've made some requests for changes to make it easier for me to follow what's going on. Sorry if that seems like wasted effort -- it did take me a while, but I kind of had to do it anyway to follow the code.
Let me know if you want me to look closer at specific logical changes (looks like there are changes to the idle_since
logic?). I still haven't looked in detail at your tests either, would you like further review there?
Also, what's the latest with running charmers' integration tests on the latest version? Are there still failures with the new wait_for_idle that pass with the old one?
async def status_on_demand(): | ||
while True: | ||
yield _idle.check( | ||
await self.get_status(), | ||
apps=apps, | ||
raise_on_error=raise_on_error, | ||
raise_on_blocked=raise_on_blocked, | ||
status=status, | ||
) | ||
|
||
async for done in _idle.loop( | ||
status_on_demand(), | ||
apps=apps, | ||
wait_for_exact_units=wait_for_exact_units, | ||
wait_for_units=wait_for_units, | ||
idle_period=idle_period, | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: simplify this for noobs like me ... btw at first I thought this could be simplified without changing anything else, but after making it to _idle.loop
I realised that my original simplification was totally incorrect, which suggests that the current version is a bit tricky to follow ... the below suggestion assumes a later suggestion to refactor _idle.loop
to use a class, which I think makes things a lot easier to follow -- it certainly does for me
async def status_on_demand(): | |
while True: | |
yield _idle.check( | |
await self.get_status(), | |
apps=apps, | |
raise_on_error=raise_on_error, | |
raise_on_blocked=raise_on_blocked, | |
status=status, | |
) | |
async for done in _idle.loop( | |
status_on_demand(), | |
apps=apps, | |
wait_for_exact_units=wait_for_exact_units, | |
wait_for_units=wait_for_units, | |
idle_period=idle_period, | |
): | |
timing_checker = _idle.TimingChecker() | |
while True: | |
full_status = await self.get_status() | |
check_status = _idle.check( | |
raw_status, | |
apps=apps, | |
raise_on_error=raise_on_error, | |
raise_on_blocked=raise_on_blocked, | |
status=status, | |
) | |
done = timing_checker.check( | |
check_status, | |
apps=apps, | |
wait_for_exact_units=wait_for_exact_units, | |
wait_for_units=wait_for_units, | |
idle_period=idle_period, | |
) |
if unit.machine: | ||
machine = full_status.machines[unit.machine] | ||
assert isinstance(machine, MachineStatus) | ||
assert machine.instance_status | ||
if machine.instance_status.status == "error" and raise_on_error: | ||
raise JujuMachineError( | ||
f"{unit_name!r} machine {unit.machine!r} has errored: {machine.instance_status.info!r}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: make it clear up front that we're only interested in unit.machine -- there's no elif/else coming
if unit.machine: | |
machine = full_status.machines[unit.machine] | |
assert isinstance(machine, MachineStatus) | |
assert machine.instance_status | |
if machine.instance_status.status == "error" and raise_on_error: | |
raise JujuMachineError( | |
f"{unit_name!r} machine {unit.machine!r} has errored: {machine.instance_status.info!r}" | |
) | |
if not unit.machine: | |
continue | |
machine = full_status.machines[unit.machine] | |
assert isinstance(machine, MachineStatus) | |
assert machine.instance_status | |
if machine.instance_status.status == "error" and raise_on_error: | |
raise JujuMachineError( | |
f"{unit_name!r} machine {unit.machine!r} has errored: {machine.instance_status.info!r}" | |
) |
|
||
for app_name in apps: | ||
units.update(_app_units(full_status, app_name)) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: I really like how the loops below all do one thing, much nicer to read and follow than the original wait_for_idle. I notice that the first few only do anything if raise_on_error
, while the next couple only do something if raise_on_blocked
. How about something like:
if raise_on_error:
_check_for_errors(full_status, apps, units)
if raise_on_blocked:
_check_for_blocked(full_status, apps, units)
if app.status.status == "blocked" and raise_on_blocked: | ||
raise JujuAppError(f"{app_name!r} is blocked: {app.status.info!r}") | ||
|
||
rv = CheckStatus(set(), set(), set()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor suggestion: in the CheckStatus
definition, use = field(default_factory=set)
. Reasoning: I think rv = CheckStatus()
will look more natural to readers (like me =))
async def loop( | ||
foo: AsyncIterable[CheckStatus | None], | ||
*, | ||
apps: AbstractSet[str], | ||
wait_for_exact_units: int | None = None, | ||
wait_for_units: int, | ||
idle_period: float, | ||
) -> AsyncIterable[bool]: | ||
"""The outer, time-dependents logic of a wait_for_idle loop.""" | ||
idle_since: dict[str, float] = {} | ||
|
||
async for status in foo: | ||
logger.info("wait_for_idle iteration %s", status) | ||
now = time.monotonic() | ||
|
||
if not status: | ||
yield False | ||
continue | ||
|
||
expected_idle_since = now - idle_period | ||
|
||
# FIXME there's some confusion about what a "busy" unit is | ||
# are we ready when over the last idle_period, every time sampled: | ||
# a. >=N units were ready (possibly different each time), or | ||
# b. >=N units were ready each time | ||
for name in status.units: | ||
if name in status.idle_units: | ||
idle_since[name] = min(now, idle_since.get(name, float("inf"))) | ||
else: | ||
idle_since[name] = float("inf") | ||
|
||
if busy := {n for n, t in idle_since.items() if t > expected_idle_since}: | ||
logger.info("Waiting for units to be idle enough: %s", busy) | ||
yield False | ||
continue | ||
|
||
for app_name in apps: | ||
ready_units = [ | ||
n for n in status.ready_units if n.startswith(f"{app_name}/") | ||
] | ||
if len(ready_units) < wait_for_units: | ||
logger.info( | ||
"Waiting for app %r units %s >= %s", | ||
app_name, | ||
len(status.ready_units), | ||
wait_for_units, | ||
) | ||
yield False | ||
break | ||
|
||
if ( | ||
wait_for_exact_units is not None | ||
and len(ready_units) != wait_for_exact_units | ||
): | ||
logger.info( | ||
"Waiting for app %r units %s == %s", | ||
app_name, | ||
len(ready_units), | ||
wait_for_exact_units, | ||
) | ||
yield False | ||
break | ||
else: | ||
yield True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: I really think that a class to encapsulate the state, and a single synchronous method call call with returns, is so much easier to follow than the async generator, and it seems like it's pretty straightforwardly logically equivalent -- just a lot easier to follow for noobs like me
async def loop( | |
foo: AsyncIterable[CheckStatus | None], | |
*, | |
apps: AbstractSet[str], | |
wait_for_exact_units: int | None = None, | |
wait_for_units: int, | |
idle_period: float, | |
) -> AsyncIterable[bool]: | |
"""The outer, time-dependents logic of a wait_for_idle loop.""" | |
idle_since: dict[str, float] = {} | |
async for status in foo: | |
logger.info("wait_for_idle iteration %s", status) | |
now = time.monotonic() | |
if not status: | |
yield False | |
continue | |
expected_idle_since = now - idle_period | |
# FIXME there's some confusion about what a "busy" unit is | |
# are we ready when over the last idle_period, every time sampled: | |
# a. >=N units were ready (possibly different each time), or | |
# b. >=N units were ready each time | |
for name in status.units: | |
if name in status.idle_units: | |
idle_since[name] = min(now, idle_since.get(name, float("inf"))) | |
else: | |
idle_since[name] = float("inf") | |
if busy := {n for n, t in idle_since.items() if t > expected_idle_since}: | |
logger.info("Waiting for units to be idle enough: %s", busy) | |
yield False | |
continue | |
for app_name in apps: | |
ready_units = [ | |
n for n in status.ready_units if n.startswith(f"{app_name}/") | |
] | |
if len(ready_units) < wait_for_units: | |
logger.info( | |
"Waiting for app %r units %s >= %s", | |
app_name, | |
len(status.ready_units), | |
wait_for_units, | |
) | |
yield False | |
break | |
if ( | |
wait_for_exact_units is not None | |
and len(ready_units) != wait_for_exact_units | |
): | |
logger.info( | |
"Waiting for app %r units %s == %s", | |
app_name, | |
len(ready_units), | |
wait_for_exact_units, | |
) | |
yield False | |
break | |
else: | |
yield True | |
class TimingChecker: | |
"""The outer, time-dependent logic of a wait_for_idle loop.""" | |
def __init__(self): | |
self.idle_since: dict[str, float] = {} | |
def check( | |
status: CheckStatus, | |
* | |
apps: AbstractSet[str], | |
wait_for_exact_units: int | None = None, | |
wait_for_units: int, | |
idle_period: float, | |
) -> bool: | |
logger.info("wait_for_idle iteration %s", status) | |
now = time.monotonic() | |
if not status: | |
return False | |
expected_idle_since = now - idle_period | |
# FIXME there's some confusion about what a "busy" unit is | |
# are we ready when over the last idle_period, every time sampled: | |
# a. >=N units were ready (possibly different each time), or | |
# b. >=N units were ready each time | |
for name in status.units: | |
if name in status.idle_units: | |
self.idle_since[name] = min(now, idle_since.get(name, float("inf"))) | |
else: | |
self.idle_since[name] = float("inf") | |
if busy := {n for n, t in self.idle_since.items() if t > expected_idle_since}: | |
logger.info("Waiting for units to be idle enough: %s", busy) | |
return False | |
for app_name in apps: | |
ready_units = [ | |
n for n in status.ready_units if n.startswith(f"{app_name}/") | |
] | |
if len(ready_units) < wait_for_units: | |
logger.info( | |
"Waiting for app %r units %s >= %s", | |
app_name, | |
len(status.ready_units), | |
wait_for_units, | |
) | |
return False | |
if ( | |
wait_for_exact_units is not None | |
and len(ready_units) != wait_for_exact_units | |
): | |
logger.info( | |
"Waiting for app %r units %s == %s", | |
app_name, | |
len(ready_units), | |
wait_for_exact_units, | |
) | |
return False | |
return True |
I've pushed a separate branch with this refactor only: #1245 PTAL |
closing in favour of #1245 |
Pull the new Model.wait_for_idle implementation from #1104