Improved monitoring of replica state during large binlog load #151

WithSoull · 2024-12-23T12:10:21Z

Improved monitoring of replica state during large binlog load

Description:

This pull request addresses an issue where the ZooKeeper service mistakenly believed that a replica was dead when it was actually downloading a large binlog file.

To resolve this problem, I added a new field to the replica state: NodeState.IsLoadingBinlog. This field lets us track the replica's state more accurately, so the service correctly recognizes a replica that is still downloading a binlog instead of treating it as dead.
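
A minimal sketch of how such a flag could feed into the liveness decision, for illustration only. The flag name IsLoadingBinlog follows the commit messages of this PR; the PingOk field and the ShouldTreatAsDead helper are hypothetical names, not the project's actual code.

```go
package main

// NodeState is a trimmed-down, hypothetical stand-in for the real struct.
// Only IsLoadingBinlog is named in this PR; PingOk is an assumption.
type NodeState struct {
	PingOk          bool // assumption: the last health check succeeded
	IsLoadingBinlog bool // the replica is busy downloading a binlog
}

// ShouldTreatAsDead is a hypothetical helper: a replica that fails its
// health check is not considered dead while it is loading a binlog.
func ShouldTreatAsDead(s NodeState) bool {
	if s.IsLoadingBinlog {
		return false
	}
	return !s.PingOk
}
```

With this sketch, `ShouldTreatAsDead(NodeState{PingOk: false, IsLoadingBinlog: true})` returns false, while the same state with the flag unset returns true.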

I have also written unit and integration tests that cover the new functionality.

Small fixes

I have also made minor updates to the existing unit tests to improve their readability.

Finally, I have updated the Dockerfile to reflect the changes made in this pull request. This update removes two warnings and ensures that the latest version of the code is being used in the container.

- Some review of old unit tests

- removed 2 warnings

internal/app/data.go

noname0443 · 2024-12-24T07:51:37Z

I think aggregating the tests into two scripts is a good idea, because it avoids spending resources on creating a container for every test. However, I suspect flaky tests may become a problem in the future: if one test fails on a context deadline, we have to restart the whole test set for either 5.7 or 8.0.

I think we should move the CI/CD changes into a separate PR and maybe add some restart logic for flaky tests.

WithSoull · 2024-12-24T08:00:18Z

Yes, I think you are right: we should split the PR.

One thing to add: we need to set some kind of restart limit so that the retries do not go on indefinitely.

noname0443 · 2024-12-24T08:09:21Z

BTW: we currently launch the tests sequentially, so it may be a good idea to launch them in parallel.

tests/features/repl_mon.feature

WithSoull and others added 4 commits December 23, 2024 14:36

Add new field IsLoadingBinlog to NodeState

9ce7fa3

Add unit and feature tests for new flag IsLoadingBinlog of nodes

4eb9345

- Some review of old unit tests

Update Dockerfile:

e0c454f

- removed 2 warnings

Merge branch 'yandex:master' into master

9d88343

teem0n approved these changes Dec 23, 2024

internal/app/data.go (outdated, resolved)

internal/app/data.go (outdated, resolved)

WithSoull requested a review from teem0n December 23, 2024 12:33

WithSoull added 2 commits December 23, 2024 15:37

Add new func GetCurrentBinlogPosition() to SlaveState

e2c0cb1

Some fixes during PR

08f8575

noname0443 approved these changes Dec 23, 2024

rename variable in data.go for correctly linting

a073537

WithSoull force-pushed the master branch from 6d1d5e4 to a073537 Compare December 24, 2024 08:50

noname0443 approved these changes Dec 24, 2024

WithSoull force-pushed the master branch from 9cd56df to a073537 Compare December 24, 2024 09:24

Add testing binlog loading to repl_mon.feature testing

754d448

noname0443 approved these changes Dec 24, 2024

Microfix for linter

0b31923

noname0443 reviewed Dec 24, 2024

tests/features/repl_mon.feature (outdated, resolved)

WithSoull added 3 commits December 24, 2024 15:31

microfixes for linter

87fdb24

Remove gaps

6ad8809

Remove another gaps

315663f

noname0443 approved these changes Dec 24, 2024

noname0443 merged commit 3eeb942 into yandex:master Dec 24, 2024
52 checks passed