Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integration test for bluechi-is-online node <name> --monitor #1017

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

nsimsolo
Copy link
Contributor

@nsimsolo nsimsolo commented Jan 2, 2025

Adding integration test for bluechi-is-online node --monitor

  1. Start node and agent, keep it running and verify --monitor does not return output.
  2. Stop node and verify --monitor returns 1.

Fixes: #1010
Signed-off-by: Nisim simsolo [email protected]

@nsimsolo nsimsolo force-pushed the bluechi-is-online-node-monitor branch 2 times, most recently from 17ff95a to 3519c49 Compare January 2, 2025 10:56
@coveralls
Copy link

coveralls commented Jan 2, 2025

Coverage Status

coverage: 82.356%. remained the same
when pulling 18b4c81 on nsimsolo:bluechi-is-online-node-monitor
into 7f73321 on eclipse-bluechi:main.

1. Start node and agent, keep it running and verify --monitor does not return output.
2. Stop node and verify --monitor returns 1.

Signed-off-by: nsimsolo <[email protected]>
@nsimsolo nsimsolo force-pushed the bluechi-is-online-node-monitor branch from 3519c49 to 18b4c81 Compare January 2, 2025 11:08
Comment on lines +30 to +36
node_foo = nodes[NODE_FOO]
agent_one = nodes[AGENT_ONE]
# Test 1: Agent and node are running, no monitor output expected
LOGGER.debug("Starting NODE_FOO.")
node_foo.systemctl.start_unit("bluechi-agent")
# Verifying agent_one is online
agent_one.bluechi_is_online.agent_is_online()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All bluechi-agents and the bluechi-controller get implicitly started by the BluechiTest class, so no need to start it here.

Suggested change
node_foo = nodes[NODE_FOO]
agent_one = nodes[AGENT_ONE]
# Test 1: Agent and node are running, no monitor output expected
LOGGER.debug("Starting NODE_FOO.")
node_foo.systemctl.start_unit("bluechi-agent")
# Verifying agent_one is online
agent_one.bluechi_is_online.agent_is_online()
node_foo = nodes[NODE_FOO]
agent_one = nodes[AGENT_ONE]
# Test 1: Agent and node are running, no monitor output expected
agent_one.bluechi_is_online.agent_is_online()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it again, agent_is_online is a different functionality and shouldn't be used here. Please remove the test 1 here.

Comment on lines +38 to +46
monitor_output_test_one = []
monitor_thread = threading.Thread(
target=monitor_command, args=(ctrl, NODE_FOO, monitor_output_test_one)
)
monitor_thread.start()
time.sleep(2)
assert (
not monitor_output_test_one
), "Monitor command should not produce output when node is running."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd avoid the time.sleep here and simply use .join() on the thread. The monitor thread should exit immediately since bluechi-is-online exits immediately with success when the node is online. And since the thread might get stuck if bluechi-is-online doesn' work that way, wrapping this in a Timeout should do the trick.

    monitor_output_test_one = []
    with Timeout(2, f"Timeout while monitoring {NODE_FOO}):
        monitor_thread = threading.Thread(
            target=monitor_command, args=(ctrl, NODE_FOO, monitor_output_test_one)
        )
        monitor_thread.start()
        monitor_thread.join()
        
    assert (
        not monitor_output_test_one
    ), "Monitor command should directly return with exit code success when node is online."

The same applies for test case 2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this suggestion is incorrect, The intention of this test is to verify 'node --monitor' is keeping monitoring for 2 seconds without exiting, because all nodes are online.
According to bluechi-is-online --help:
--monitor: keeps monitoring as long as [agent|node|system] is online and exits if it detects an offline state.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I confused it with --wait-time, sorry about that. In that case, lets keep it with the time.sleep.

However, the monitor_thread will keep running in that case - which is not good. You are even overwriting the monitor_thread variable in the lines below. Not sure how pytest handles pending threads here, but I'd prefer finding a way to shut the thread down. Maybe you can pass a timeout to the thread function monitor_command (which should be higher than the sleep) and use the with Timeout in there? This way the thread will receive the timeout signal, thus stopping and the main thread shouldn't be affected.



def monitor_command(
ctrl: BluechiControllerMachine, node_name: str, monitor_output: list
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need for using a list and adding a custom string to it. You can probably replace the monitor_output list with a simple bool variable which you set to the return of ctrl.bluechi_is_online.monitor_node.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The --monitor option doesn’t behave like a simple boolean because it continuously monitors the node and only returns 1 when an offline state is detected. I tried several ways to implement your suggestion but couldn’t find a successful approach.
Could you please provide an example to clarify how to implement this change?

Copy link
Member

@engelmi engelmi Jan 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The --monitor option doesn’t behave like a simple boolean because it continuously monitors the node and only returns 1 when an offline state is detected.
This is not about the --monitor option, but the return value of the monitor_command function which is used as thread runner. Currently, you use monitor_output: list here and add a message - which is not really necessary. A simple boolean should suffice. So you could try out something like:

def monitor_command(
    ctrl: BluechiControllerMachine, node_name: str, monitor_output: List[bool]):
    """Run the node --monitor command and monitor output."""
    monitor_output.append(ctrl.bluechi_is_online.monitor_node(node_name))

You can't use a primitive type like bool here, but I wouldn't add a custom string to the list - just the return value of the monitor function. Or you could use a wrapping class, but thats too much, probably.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think that a small wrapping class for a thread with a field that is updated when the command returns is clearer than a list of one boolean. The timeout that is discussed in the other thread should also be added there. This will simplify the test code itself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add integration tests for bluechi-is-online node with --monitor parameter
4 participants