feat: new wait for idle #1219

dimaqq · 2024-11-26T08:53:15Z

Pull the new Model.wait_for_idle implementation from #1104

pull in the unit tests
pull in the implementation
new unit tests pass
all unit tests pass
feature flag
run integration tests with and without the feature flag
implement idle/ready distinction properly
merge chore: remove juju.loop, deprecated before 3.0 #1242 and rebase

dimaqq · 2024-12-04T07:50:11Z

@james-garner-canonical I'd appreciate an early review.

I'm expecting integration tests to pass 🎉

There's still some refactoring to be done... and I'm happy to hear your thoughts about that and areas to improve in general.

james-garner-canonical

Looks great, so much nicer than the previous implementation. Really cool to have the tests passing on both. Was wait_for_idle tested much previously?

juju/model.py

tests/unit/test_wait_for_idle.py

juju/unit.py

juju/client/facade.py

dimaqq · 2024-12-11T08:25:24Z

@james-garner-canonical the big refactor is done, I hope you'll like the result :)

still a draft, as some unit tests are missing and I have one internal concern, but should be ready for review :)

james-garner-canonical

I haven't looked at the tests (though I noticed the addition of freezegun), but I've left some comments on the new implementation. My main concern is that the generator/coroutine business would be a lot easier (for me) to follow if we used a regular class/object instead (see comment on def _loop) -- it took me a while to get my head around. I do like the separation of the time-based checks. Lmk if you want me to look at anything else in particular

juju/model/idle.py

juju/model/__init__.py

juju/model/idle.py

juju/model/__init__.py

dimaqq · 2024-12-17T08:29:40Z

@james-garner-canonical I've refactored the loop to be an [async] filter.

It turns out that I've misunderstood the definition of "idle".

In short:

ready units: depends on workload (and possibly agent, machine, etc.)
idle units: depends on agent only

This means that I have to update the idle.check code.

Edit: done now.
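A minimal sketch of the idle/ready distinction summarised above. The `UnitSnapshot` type and both helper names are hypothetical and not taken from this PR; the sketch assumes "ready" simply means the workload status matches the requested one, whereas the real check may also consult agent or machine state.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class UnitSnapshot:
    """Hypothetical, simplified view of one unit's status."""

    agent_status: str
    workload_status: str


def is_idle(unit: UnitSnapshot) -> bool:
    # Idle depends on the agent only.
    return unit.agent_status == "idle"


def is_ready(unit: UnitSnapshot, wanted: str = "active") -> bool:
    # Ready depends on the workload (assumption: a plain status match).
    return unit.workload_status == wanted


units = {
    "app/0": UnitSnapshot(agent_status="idle", workload_status="active"),
    "app/1": UnitSnapshot(agent_status="executing", workload_status="active"),
    "app/2": UnitSnapshot(agent_status="idle", workload_status="blocked"),
}
idle_units = {name for name, u in units.items() if is_idle(u)}
ready_units = {name for name, u in units.items() if is_ready(u)}
```

In this sketch `app/1` is ready but not idle, and `app/2` is idle but not ready, which is why the two sets have to be tracked separately.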

dimaqq · 2024-12-18T05:13:51Z

@james-garner-canonical idle implemented, most comments addressed, tests pass again. please review :)

juju/model/_idle.py

james-garner-canonical

Exciting updates, glad tests are passing now.

I'm finding it hard to follow the logical changes between versions on this PR, I think primarily because the generator adds a fair bit of mental overhead for me -- it's kind of nicer now (though the new outer async loop construct took me some time to digest), but I still have to put in more mental work to follow the various yield ... breaks and yield ... continues.

For now I've made some requests for changes to make it easier for me to follow what's going on. Sorry if that seems like wasted effort -- it did take me a while, but I kind of had to do it anyway to follow the code.

Let me know if you want me to look closer at specific logical changes (looks like there are changes to the idle_since logic?). I still haven't looked in detail at your tests either, would you like further review there?

Also, what's the latest with running charmers' integration tests on the latest version? Are there still failures with the new wait_for_idle that pass with the old one?

james-garner-canonical · 2024-12-18T23:08:02Z

juju/model/__init__.py

+        async def status_on_demand():
+            while True:
+                yield _idle.check(
+                    await self.get_status(),
+                    apps=apps,
+                    raise_on_error=raise_on_error,
+                    raise_on_blocked=raise_on_blocked,
+                    status=status,
+                )
+
+        async for done in _idle.loop(
+            status_on_demand(),
+            apps=apps,
+            wait_for_exact_units=wait_for_exact_units,
+            wait_for_units=wait_for_units,
+            idle_period=idle_period,
+        ):


Suggestion: simplify this for noobs like me ... btw at first I thought this could be simplified without changing anything else, but after making it to _idle.loop I realised that my original simplification was totally incorrect, which suggests that the current version is a bit tricky to follow ... the below suggestion assumes a later suggestion to refactor _idle.loop to use a class, which I think makes things a lot easier to follow -- it certainly does for me

Suggested change

-        async def status_on_demand():
-            while True:
-                yield _idle.check(
-                    await self.get_status(),
-                    apps=apps,
-                    raise_on_error=raise_on_error,
-                    raise_on_blocked=raise_on_blocked,
-                    status=status,
-                )
-
-        async for done in _idle.loop(
-            status_on_demand(),
-            apps=apps,
-            wait_for_exact_units=wait_for_exact_units,
-            wait_for_units=wait_for_units,
-            idle_period=idle_period,
-        ):
+        timing_checker = _idle.TimingChecker()
+        while True:
+            full_status = await self.get_status()
+            check_status = _idle.check(
+                full_status,
+                apps=apps,
+                raise_on_error=raise_on_error,
+                raise_on_blocked=raise_on_blocked,
+                status=status,
+            )
+            done = timing_checker.check(
+                check_status,
+                apps=apps,
+                wait_for_exact_units=wait_for_exact_units,
+                wait_for_units=wait_for_units,
+                idle_period=idle_period,
+            )

james-garner-canonical · 2024-12-18T23:10:05Z

juju/model/_idle.py

+        if unit.machine:
+            machine = full_status.machines[unit.machine]
+            assert isinstance(machine, MachineStatus)
+            assert machine.instance_status
+            if machine.instance_status.status == "error" and raise_on_error:
+                raise JujuMachineError(
+                    f"{unit_name!r} machine {unit.machine!r} has errored: {machine.instance_status.info!r}"
+                )


Suggestion: make it clear up front that we're only interested in unit.machine -- there's no elif/else coming

Suggested change

-        if unit.machine:
-            machine = full_status.machines[unit.machine]
-            assert isinstance(machine, MachineStatus)
-            assert machine.instance_status
-            if machine.instance_status.status == "error" and raise_on_error:
-                raise JujuMachineError(
-                    f"{unit_name!r} machine {unit.machine!r} has errored: {machine.instance_status.info!r}"
-                )
+        if not unit.machine:
+            continue
+        machine = full_status.machines[unit.machine]
+        assert isinstance(machine, MachineStatus)
+        assert machine.instance_status
+        if machine.instance_status.status == "error" and raise_on_error:
+            raise JujuMachineError(
+                f"{unit_name!r} machine {unit.machine!r} has errored: {machine.instance_status.info!r}"
+            )

james-garner-canonical · 2024-12-18T23:15:32Z

juju/model/_idle.py

+
+    for app_name in apps:
+        units.update(_app_units(full_status, app_name))
+


Suggestion: I really like how the loops below all do one thing, much nicer to read and follow than the original wait_for_idle. I notice that the first few only do anything if raise_on_error, while the next couple only do something if raise_on_blocked. How about something like:

    if raise_on_error:
        _check_for_errors(full_status, apps, units)
    if raise_on_blocked:
        _check_for_blocked(full_status, apps, units)
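A rough sketch of how that split could look, using a plain name-to-status mapping instead of the real status objects. `StatusErrorSketch`, the helper signatures, and the `check` wrapper are all hypothetical stand-ins for illustration; the actual helpers would take `full_status`, `apps` and `units` and raise the library's own error types.

```python
from __future__ import annotations


class StatusErrorSketch(Exception):
    """Hypothetical stand-in for the library's error types."""


def _check_for_errors(statuses: dict[str, str]) -> None:
    # Only called when raise_on_error is set, so no flag test is needed here.
    for name, status in statuses.items():
        if status == "error":
            raise StatusErrorSketch(f"{name!r} has errored")


def _check_for_blocked(statuses: dict[str, str]) -> None:
    # Only called when raise_on_blocked is set.
    for name, status in statuses.items():
        if status == "blocked":
            raise StatusErrorSketch(f"{name!r} is blocked")


def check(
    statuses: dict[str, str], *, raise_on_error: bool, raise_on_blocked: bool
) -> None:
    # Each flag guards one helper, so the flag is tested once, up front.
    if raise_on_error:
        _check_for_errors(statuses)
    if raise_on_blocked:
        _check_for_blocked(statuses)
```

With this shape, the loops inside each helper no longer need an `and raise_on_error` or `and raise_on_blocked` clause on every condition.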

james-garner-canonical · 2024-12-18T23:18:07Z

juju/model/_idle.py

+        if app.status.status == "blocked" and raise_on_blocked:
+            raise JujuAppError(f"{app_name!r} is blocked: {app.status.info!r}")
+
+    rv = CheckStatus(set(), set(), set())


Minor suggestion: in the CheckStatus definition, use = field(default_factory=set). Reasoning: I think rv = CheckStatus() will look more natural to readers (like me =))
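For illustration, a sketch of what that definition might look like. The field names are inferred from the attributes used in the diff (`units`, `ready_units`, `idle_units`); their order and the exact decorator arguments in the real `CheckStatus` are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CheckStatus:
    """Sketch only: field order and types are assumed, not copied from the PR."""

    # default_factory gives every instance its own fresh set; a bare
    # `= set()` default is rejected by dataclasses as a mutable default.
    units: set[str] = field(default_factory=set)
    ready_units: set[str] = field(default_factory=set)
    idle_units: set[str] = field(default_factory=set)


rv = CheckStatus()
other = CheckStatus()
rv.units.add("app/0")
```

Because each instance gets its own sets, adding to `rv.units` leaves `other.units` empty.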

james-garner-canonical · 2024-12-18T23:43:58Z

juju/model/_idle.py

+async def loop(
+    foo: AsyncIterable[CheckStatus | None],
+    *,
+    apps: AbstractSet[str],
+    wait_for_exact_units: int | None = None,
+    wait_for_units: int,
+    idle_period: float,
+) -> AsyncIterable[bool]:
+    """The outer, time-dependents logic of a wait_for_idle loop."""
+    idle_since: dict[str, float] = {}
+
+    async for status in foo:
+        logger.info("wait_for_idle iteration %s", status)
+        now = time.monotonic()
+
+        if not status:
+            yield False
+            continue
+
+        expected_idle_since = now - idle_period
+
+        # FIXME there's some confusion about what a "busy" unit is
+        # are we ready when over the last idle_period, every time sampled:
+        # a. >=N units were ready (possibly different each time), or
+        # b. >=N units were ready each time
+        for name in status.units:
+            if name in status.idle_units:
+                idle_since[name] = min(now, idle_since.get(name, float("inf")))
+            else:
+                idle_since[name] = float("inf")
+
+        if busy := {n for n, t in idle_since.items() if t > expected_idle_since}:
+            logger.info("Waiting for units to be idle enough: %s", busy)
+            yield False
+            continue
+
+        for app_name in apps:
+            ready_units = [
+                n for n in status.ready_units if n.startswith(f"{app_name}/")
+            ]
+            if len(ready_units) < wait_for_units:
+                logger.info(
+                    "Waiting for app %r units %s >= %s",
+                    app_name,
+                    len(status.ready_units),
+                    wait_for_units,
+                )
+                yield False
+                break
+
+            if (
+                wait_for_exact_units is not None
+                and len(ready_units) != wait_for_exact_units
+            ):
+                logger.info(
+                    "Waiting for app %r units %s == %s",
+                    app_name,
+                    len(ready_units),
+                    wait_for_exact_units,
+                )
+                yield False
+                break
+        else:
+            yield True


Suggestion: I really think that a class to encapsulate the state, and a single synchronous method call with returns, is so much easier to follow than the async generator, and it seems like it's pretty straightforwardly logically equivalent -- just a lot easier to follow for noobs like me

Suggested change

-async def loop(
-    foo: AsyncIterable[CheckStatus | None],
-    *,
-    apps: AbstractSet[str],
-    wait_for_exact_units: int | None = None,
-    wait_for_units: int,
-    idle_period: float,
-) -> AsyncIterable[bool]:
-    """The outer, time-dependents logic of a wait_for_idle loop."""
-    idle_since: dict[str, float] = {}
-
-    async for status in foo:
-        logger.info("wait_for_idle iteration %s", status)
-        now = time.monotonic()
-
-        if not status:
-            yield False
-            continue
-
-        expected_idle_since = now - idle_period
-
-        # FIXME there's some confusion about what a "busy" unit is
-        # are we ready when over the last idle_period, every time sampled:
-        # a. >=N units were ready (possibly different each time), or
-        # b. >=N units were ready each time
-        for name in status.units:
-            if name in status.idle_units:
-                idle_since[name] = min(now, idle_since.get(name, float("inf")))
-            else:
-                idle_since[name] = float("inf")
-
-        if busy := {n for n, t in idle_since.items() if t > expected_idle_since}:
-            logger.info("Waiting for units to be idle enough: %s", busy)
-            yield False
-            continue
-
-        for app_name in apps:
-            ready_units = [
-                n for n in status.ready_units if n.startswith(f"{app_name}/")
-            ]
-            if len(ready_units) < wait_for_units:
-                logger.info(
-                    "Waiting for app %r units %s >= %s",
-                    app_name,
-                    len(status.ready_units),
-                    wait_for_units,
-                )
-                yield False
-                break
-
-            if (
-                wait_for_exact_units is not None
-                and len(ready_units) != wait_for_exact_units
-            ):
-                logger.info(
-                    "Waiting for app %r units %s == %s",
-                    app_name,
-                    len(ready_units),
-                    wait_for_exact_units,
-                )
-                yield False
-                break
-        else:
-            yield True
+class TimingChecker:
+    """The outer, time-dependent logic of a wait_for_idle loop."""
+
+    def __init__(self):
+        self.idle_since: dict[str, float] = {}
+
+    def check(
+        self,
+        status: CheckStatus,
+        *,
+        apps: AbstractSet[str],
+        wait_for_exact_units: int | None = None,
+        wait_for_units: int,
+        idle_period: float,
+    ) -> bool:
+        logger.info("wait_for_idle iteration %s", status)
+        now = time.monotonic()
+
+        if not status:
+            return False
+
+        expected_idle_since = now - idle_period
+
+        # FIXME there's some confusion about what a "busy" unit is
+        # are we ready when over the last idle_period, every time sampled:
+        # a. >=N units were ready (possibly different each time), or
+        # b. >=N units were ready each time
+        for name in status.units:
+            if name in status.idle_units:
+                self.idle_since[name] = min(now, self.idle_since.get(name, float("inf")))
+            else:
+                self.idle_since[name] = float("inf")
+
+        if busy := {n for n, t in self.idle_since.items() if t > expected_idle_since}:
+            logger.info("Waiting for units to be idle enough: %s", busy)
+            return False
+
+        for app_name in apps:
+            ready_units = [
+                n for n in status.ready_units if n.startswith(f"{app_name}/")
+            ]
+            if len(ready_units) < wait_for_units:
+                logger.info(
+                    "Waiting for app %r units %s >= %s",
+                    app_name,
+                    len(status.ready_units),
+                    wait_for_units,
+                )
+                return False
+
+            if (
+                wait_for_exact_units is not None
+                and len(ready_units) != wait_for_exact_units
+            ):
+                logger.info(
+                    "Waiting for app %r units %s == %s",
+                    app_name,
+                    len(ready_units),
+                    wait_for_exact_units,
+                )
+                return False
+
+        return True
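To see how the `idle_since` bookkeeping in the suggestion behaves across successive samples, here is a stripped-down sketch that keeps only the timing part. `IdleTracker`, `FakeClock` and the injectable `clock` argument are hypothetical and exist only to make the example deterministic; the suggested class calls `time.monotonic()` directly and also checks per-app unit counts.

```python
from __future__ import annotations

import time
from typing import Callable


class IdleTracker:
    """Sketch of the idle-timer part only (names are hypothetical)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.idle_since: dict[str, float] = {}

    def all_idle_for(
        self, units: set[str], idle_units: set[str], idle_period: float
    ) -> bool:
        now = self.clock()
        for name in units:
            if name in idle_units:
                # Remember the earliest moment the unit was seen idle.
                self.idle_since[name] = min(
                    now, self.idle_since.get(name, float("inf"))
                )
            else:
                # A single busy sample resets that unit's timer.
                self.idle_since[name] = float("inf")
        # Done only when every tracked unit has been idle for idle_period.
        return all(t <= now - idle_period for t in self.idle_since.values())


class FakeClock:
    """Manually advanced clock, used instead of real time."""

    def __init__(self, now: float) -> None:
        self.now = now

    def __call__(self) -> float:
        return self.now


clock = FakeClock(100.0)
tracker = IdleTracker(clock=clock)
samples = []
for t, idle in [(100.0, {"app/0"}), (103.0, {"app/0"}), (106.0, {"app/0"}), (107.0, set())]:
    clock.now = t
    samples.append(tracker.all_idle_for({"app/0"}, idle, idle_period=5.0))
```

With an `idle_period` of 5 seconds, the unit first seen idle at t=100 only counts as idle enough at t=106, and the busy sample at t=107 resets its timer.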

dimaqq · 2024-12-19T01:59:59Z

I've pushed a separate branch with this refactor only: #1245 PTAL

dimaqq · 2024-12-19T08:50:26Z

closing in favour of #1245

#1245 Same as #1219 but using dumb classes instead of async generator.

dimaqq force-pushed the feat-new-wait-for-idle branch 3 times, most recently from 7e9bc99 to 2aa4634 Compare December 3, 2024 13:48

dimaqq mentioned this pull request Dec 4, 2024

draft: live jrpc calls using separate event in a helper thread #1104

Draft

dimaqq requested a review from james-garner-canonical December 4, 2024 07:49

james-garner-canonical reviewed Dec 5, 2024

dimaqq force-pushed the feat-new-wait-for-idle branch 2 times, most recently from 810eb09 to 9a36b7a Compare December 11, 2024 08:21

dimaqq requested a review from james-garner-canonical December 11, 2024 08:24

james-garner-canonical reviewed Dec 12, 2024

dimaqq force-pushed the feat-new-wait-for-idle branch from 9a36b7a to 3bcf321 Compare December 17, 2024 05:59

dimaqq force-pushed the feat-new-wait-for-idle branch from 6710b0c to b6fbb0c Compare December 17, 2024 08:48

dimaqq added 2 commits December 18, 2024 12:15

feat: new Model.wait_for_idle()

03e7d51

chore: remplement best guess for idle timer

065d3de

dimaqq force-pushed the feat-new-wait-for-idle branch from 88cd329 to 065d3de Compare December 18, 2024 03:15

dimaqq added 2 commits December 18, 2024 12:39

chore: better logging in integration tests

138d8f9

chore: studpi bug

1951425

dimaqq requested a review from james-garner-canonical December 18, 2024 05:13

dimaqq marked this pull request as ready for review December 18, 2024 06:01

dimaqq commented Dec 18, 2024

dimaqq added 4 commits December 18, 2024 15:21

fix: async filter should output only one result for each input

c6087a3

chore: add missing tests and refactor for readability

3e29b2b

chore: slightly better type hints

4276d99

chore: note on converted responses

d7abe10

james-garner-canonical requested changes Dec 19, 2024

dimaqq mentioned this pull request Dec 19, 2024

feat: new wait for idle #1245

Merged

dimaqq closed this Dec 19, 2024

jujubot added a commit that referenced this pull request Dec 20, 2024

Merge pull request #1245 from dimaqq/feat-new-wait-for-idle--dumb-class

e0199c8

#1245 Same as #1219 but using dumb classes instead of async generator.
