Integration test for bluechi-is-online node <name> --monitor #1017

nsimsolo · 2025-01-02T10:51:46Z

Adding integration test for bluechi-is-online node --monitor

Start node and agent, keep them running and verify --monitor does not return output.
Stop node and verify --monitor returns 1.

Fixes: #1010
Signed-off-by: Nisim simsolo [email protected]

coveralls · 2025-01-02T11:05:20Z

coverage: 82.356%. remained the same
when pulling 18b4c81 on nsimsolo:bluechi-is-online-node-monitor
into 7f73321 on eclipse-bluechi:main.

1. Start node and agent, keep it running and verify --monitor does not return output.
2. Stop node and verify --monitor returns 1.

Signed-off-by: nsimsolo <[email protected]>

engelmi · 2025-01-03T09:01:57Z

tests/tests/tier0/bluechi-is-online-node-monitor/test_bluechi_is_online_node_monitor.py

+    node_foo = nodes[NODE_FOO]
+    agent_one = nodes[AGENT_ONE]
+    # Test 1: Agent and node are running, no monitor output expected
+    LOGGER.debug("Starting NODE_FOO.")
+    node_foo.systemctl.start_unit("bluechi-agent")
+    # Verifying agent_one is online
+    agent_one.bluechi_is_online.agent_is_online()


All bluechi-agents and the bluechi-controller are started implicitly by the BluechiTest class, so there is no need to start them here.

Suggested change

-    node_foo = nodes[NODE_FOO]
-    agent_one = nodes[AGENT_ONE]
-    # Test 1: Agent and node are running, no monitor output expected
-    LOGGER.debug("Starting NODE_FOO.")
-    node_foo.systemctl.start_unit("bluechi-agent")
-    # Verifying agent_one is online
-    agent_one.bluechi_is_online.agent_is_online()
+    node_foo = nodes[NODE_FOO]
+    agent_one = nodes[AGENT_ONE]
+    # Test 1: Agent and node are running, no monitor output expected
+    agent_one.bluechi_is_online.agent_is_online()

engelmi · 2025-01-05

Thinking about it again, agent_is_online covers different functionality and shouldn't be used here. Please remove test 1 here.

engelmi · 2025-01-03T09:12:33Z

tests/tests/tier0/bluechi-is-online-node-monitor/test_bluechi_is_online_node_monitor.py

+    monitor_output_test_one = []
+    monitor_thread = threading.Thread(
+        target=monitor_command, args=(ctrl, NODE_FOO, monitor_output_test_one)
+    )
+    monitor_thread.start()
+    time.sleep(2)
+    assert (
+        not monitor_output_test_one
+    ), "Monitor command should not produce output when node is running."


I'd avoid the time.sleep here and simply use .join() on the thread. The monitor thread should exit immediately, since bluechi-is-online exits immediately with success when the node is online. And since the thread might get stuck if bluechi-is-online doesn't work that way, wrapping this in a Timeout should do the trick.

    monitor_output_test_one = []
    with Timeout(2, f"Timeout while monitoring {NODE_FOO}"):
        monitor_thread = threading.Thread(
            target=monitor_command, args=(ctrl, NODE_FOO, monitor_output_test_one)
        )
        monitor_thread.start()
        monitor_thread.join()
    assert (
        not monitor_output_test_one
    ), "Monitor command should directly return with exit code success when node is online."

The same applies for test case 2.

nsimsolo · 2025-01-05

I think this suggestion is incorrect. The intention of this test is to verify that 'node --monitor' keeps monitoring for 2 seconds without exiting, because all nodes are online.
According to bluechi-is-online --help:
--monitor: keeps monitoring as long as [agent|node|system] is online and exits if it detects an offline state.

engelmi · 2025-01-05

You are right. I confused it with --wait-time, sorry about that. In that case, let's keep the time.sleep.

However, the monitor_thread will keep running in that case, which is not good. You are even overwriting the monitor_thread variable in the lines below. I'm not sure how pytest handles pending threads here, but I'd prefer finding a way to shut the thread down. Maybe you can pass a timeout (higher than the sleep) to the thread function monitor_command and use the with Timeout in there? This way the thread receives the timeout signal and stops, and the main thread shouldn't be affected.
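One way to make sure the monitoring command cannot outlive the test is to bound it inside the thread function itself. The sketch below is only an illustration and not the BlueChi test API: `run_monitor` and the stand-in command (a Python one-liner that sleeps, imitating a `--monitor` call that keeps running while the node is online) are hypothetical, and it relies on `subprocess.run(..., timeout=...)` rather than the `Timeout` helper mentioned above.

```python
import subprocess
import sys
import threading
from typing import List, Optional

# Hypothetical stand-in for a long-running
# `bluechi-is-online node <name> --monitor` call: it simply sleeps,
# like a monitor that keeps seeing the node online.
STAND_IN_MONITOR = [sys.executable, "-c", "import time; time.sleep(30)"]


def run_monitor(cmd: List[str], timeout: float, result: List[Optional[int]]) -> None:
    """Run cmd and record its exit code, or None if it was still running at timeout."""
    try:
        completed = subprocess.run(cmd, timeout=timeout)
        result.append(completed.returncode)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising,
        # so no process is left behind.
        result.append(None)


result: List[Optional[int]] = []
monitor_thread = threading.Thread(
    target=run_monitor, args=(STAND_IN_MONITOR, 1.0, result)
)
monitor_thread.start()
monitor_thread.join(timeout=5)

# The thread ended on its own, and the command never exited by itself.
assert not monitor_thread.is_alive()
assert result == [None]
```

With this shape, "the monitor kept running for the whole window" shows up as a `None` entry, while an offline node would show up as the command's exit code.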

engelmi · 2025-01-03T09:14:30Z

tests/tests/tier0/bluechi-is-online-node-monitor/test_bluechi_is_online_node_monitor.py

+
+
+def monitor_command(
+    ctrl: BluechiControllerMachine, node_name: str, monitor_output: list


There is no need for using a list and adding a custom string to it. You can probably replace the monitor_output list with a simple bool variable which you set to the return of ctrl.bluechi_is_online.monitor_node.

nsimsolo · 2025-01-05

The --monitor option doesn’t behave like a simple boolean because it continuously monitors the node and only returns 1 when an offline state is detected. I tried several ways to implement your suggestion but couldn’t find a successful approach.
Could you please provide an example to clarify how to implement this change?

engelmi · 2025-01-05

> The --monitor option doesn’t behave like a simple boolean because it continuously monitors the node and only returns 1 when an offline state is detected.

This is not about the --monitor option, but about the return value of the monitor_command function, which is used as the thread runner. Currently, you pass monitor_output: list and append a message to it, which is not really necessary. A simple boolean should suffice. So you could try something like:

    def monitor_command(
        ctrl: BluechiControllerMachine, node_name: str, monitor_output: List[bool]
    ):
        """Run the node --monitor command and monitor output."""
        monitor_output.append(ctrl.bluechi_is_online.monitor_node(node_name))

You can't use a primitive type like bool as the parameter here, since an assignment inside the thread function would not be visible to the caller. But I wouldn't add a custom string to the list, just the return value of the monitor function. Or you could use a wrapping class, but that's probably too much.
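The difference between a plain `bool` and a list as the output parameter can be shown in isolation. This is a small sketch with hypothetical helper names (`set_flag`, `append_flag`) that are not part of the test suite:

```python
import threading
from typing import List


def set_flag(flag: bool) -> None:
    # Rebinds the local name only; the caller's variable is untouched.
    flag = True


def append_flag(flags: List[bool]) -> None:
    # Mutates the list object that is shared with the caller.
    flags.append(True)


flag = False
flags: List[bool] = []

for target, arg in ((set_flag, flag), (append_flag, flags)):
    worker = threading.Thread(target=target, args=(arg,))
    worker.start()
    worker.join()

print(flag)   # False
print(flags)  # [True]
```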

mkemel · 2025-01-05

I actually think that a small wrapping class for a thread with a field that is updated when the command returns is clearer than a list of one boolean. The timeout that is discussed in the other thread should also be added there. This will simplify the test code itself.
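A wrapper along these lines could look like the following sketch. `CommandThread`, its `result` field and its `wait` method are hypothetical names, and the callables passed in are stand-ins for `ctrl.bluechi_is_online.monitor_node`. How a timeout would be enforced on the real command is left open, so the sketch only bounds how long the test waits for a result.

```python
import threading
import time
from typing import Callable, Optional


class CommandThread(threading.Thread):
    """Thread that stores the return value of the wrapped command."""

    def __init__(self, command: Callable[[], bool]) -> None:
        # daemon=True so a command that never returns cannot block interpreter exit.
        super().__init__(daemon=True)
        self._command = command
        self.result: Optional[bool] = None

    def run(self) -> None:
        self.result = self._command()

    def wait(self, timeout: float) -> bool:
        """Wait up to timeout seconds; return True if the command finished."""
        self.join(timeout)
        return not self.is_alive()


def fake_monitor_online() -> bool:
    # Stand-in for a monitor call that keeps running while the node is online.
    time.sleep(2)
    return False


def fake_monitor_offline() -> bool:
    # Stand-in for a monitor call that returns at once because the node is offline.
    return False


online = CommandThread(fake_monitor_online)
online.start()
print(online.wait(0.2))   # False: still monitoring, no result yet

offline = CommandThread(fake_monitor_offline)
offline.start()
print(offline.wait(1.0))  # True: the command returned
print(offline.result)     # False
```

The test body then reduces to starting the wrapper, waiting, and asserting on `wait()` and `result`, without a shared list.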

nsimsolo requested review from alexlarsson, engelmi, mkemel and rhatdan as code owners January 2, 2025 10:51

nsimsolo force-pushed the bluechi-is-online-node-monitor branch 2 times, most recently from 17ff95a to 3519c49 Compare January 2, 2025 10:56

Integration test for bluechi-is-online node <name> --monitor

18b4c81


nsimsolo force-pushed the bluechi-is-online-node-monitor branch from 3519c49 to 18b4c81 Compare January 2, 2025 11:08

engelmi requested changes Jan 3, 2025




