WSREP: exception from gcomm, backend must be restarted failed to form singleton view after exceeding max_install_timeouts 3, giving up (FATAL) #638

Open
suizouwuya opened this issue Mar 15, 2023 · 0 comments

suizouwuya commented Mar 15, 2023

[Version]
Galera 25.3.33
MariaDB 10.3.30

[Background of the problem]

  1. Three-node MariaDB Galera cluster:
    node1: 1.1.1.21
    node2: 1.1.1.22
    node3: 1.1.1.23

[Problem scenario]

  1. Time of failure: 2023-02-28 15:50:15
  2. Affected node: node2
  3. Last restart and sync of node2: 2023-02-27 20:07:44 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 10872869)
    The service was not restarted after 2023-02-27.
  4. Last state change logged on each node before the failure:
    node1: 2023-02-28 14:08:26 0 [Note] WSREP: Restored state OPEN -> SYNCED (11343237)
    node2: 2023-02-28 14:08:26 0 [Note] WSREP: Restored state OPEN -> SYNCED (11343237)
    node3: 2023-02-28 14:08:26 0 [Note] WSREP: Restored state OPEN -> SYNCED (11343237)
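
To compare state changes across the three attached logs, a small script along these lines may help. It is only a sketch: the regular expression is derived from the two message shapes quoted in this report ("Shifting ... (TO: n)" and "Restored state ... (n)"), and the function name `state_transitions` is made up for illustration. Other MariaDB/Galera versions may format these lines differently.

```python
import re

# Matches the two message shapes quoted in this report (an assumption about
# the log format, not an official grammar):
#   "WSREP: Shifting JOINED -> SYNCED (TO: 10872869)"
#   "WSREP: Restored state OPEN -> SYNCED (11343237)"
STATE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) .*?WSREP: "
    r"(?:Shifting|Restored state) (?P<old>[A-Z/]+) -> (?P<new>[A-Z/]+) "
    r"\((?:TO: )?(?P<seqno>-?\d+)\)"
)


def state_transitions(lines):
    """Yield (timestamp, old_state, new_state, seqno) for each matching line."""
    for line in lines:
        m = STATE_RE.search(line)
        if m:
            yield m["ts"], m["old"], m["new"], int(m["seqno"])


# Sample lines copied from the node2 log excerpts in this issue.
sample = [
    "2023-02-27 20:07:44 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 10872869)",
    "2023-02-28 14:08:26 0 [Note] WSREP: Restored state OPEN -> SYNCED (11343237)",
    "2023-02-28 15:50:15 0 [Note] WSREP: Shifting SYNCED -> CLOSED (TO: 11391296)",
]
transitions = list(state_transitions(sample))
for ts, old, new, seqno in transitions:
    print(ts, old, "->", new, seqno)
```

Running the same function over each node's log file (for example `state_transitions(open("node2.txt"))`) should give a per-node timeline that can be lined up by timestamp.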

[My analysis]

  1. The problem may be related to an unstable network environment.
  2. I tried many times but could not reproduce it.

[Logs]

  1. node1 log
    node1.txt

  2. node2 log
    node2.txt

  3. node3 log
    node3.txt

  4. Error output from node2:

2023-02-28 15:50:08 0 [Note] WSREP: max install timeouts reached, will isolate node for PT20S
2023-02-28 15:50:08 0 [Note] WSREP: no install message received
2023-02-28 15:50:14 0 [Note] WSREP: (511505cf, 'tcp://1.1.1.22:9601') turning message relay requesting off
2023-02-28 15:50:15 0 [Warning] WSREP: evs::proto(511505cf, GATHER, view_id(REG,511505cf,28)) install timer expired
evs::proto(evs::proto(511505cf, GATHER, view_id(REG,511505cf,28)), GATHER) {
current_view=view(view_id(REG,511505cf,28) memb {
511505cf,0
8e061fa0,0
e71b5018,0
} joined {
} left {
} partitioned {
}),
input_map=evs::input_map: {aru_seq=62468,safe_seq=62468,node_index=node: {idx=0,range=[62474,62473],safe_seq=62468} node: {idx=1,range=[62475,62474],safe_seq=62469} node: {idx=2,range=[62469,62468],safe_seq=62468} },
fifo_seq=1384760,
last_sent=62473,
known:
511505cf at 
{o=1,s=0,i=0,fs=-1,jm=
{v=0,t=4,ut=255,o=1,s=62468,sr=-1,as=62468,f=0,src=511505cf,srcvid=view_id(REG,511505cf,28),insvid=view_id(UNKNOWN,00000000,0),ru=00000000,r=[-1,-1],fs=1384760,nl=(
511505cf, {o=1,s=0,e=0,ls=-1,vid=view_id(REG,511505cf,28),ss=62468,ir=[62474,62473],}
8e061fa0, {o=0,s=1,e=0,ls=-1,vid=view_id(REG,511505cf,28),ss=62469,ir=[62475,62474],}
e71b5018, {o=0,s=1,e=0,ls=-1,vid=view_id(REG,511505cf,28),ss=62468,ir=[62469,62468],}
)
},
}
8e061fa0 at tcp://1.1.1.21:9601
{o=0,s=1,i=0,fs=6212585,}
e71b5018 at tcp://1.1.1.23:9601
{o=0,s=1,i=0,fs=5108065,}
}
2023-02-28 15:50:15 0 [Note] WSREP: going to give up, state dump for diagnosis:
evs::proto(evs::proto(511505cf, GATHER, view_id(REG,511505cf,28)), GATHER) {
current_view=view(view_id(REG,511505cf,28) memb {
511505cf,0
8e061fa0,0
e71b5018,0
} joined {
} left {
} partitioned {
}),
input_map=evs::input_map: {aru_seq=62468,safe_seq=62468,node_index=node: {idx=0,range=[62474,62473],safe_seq=62468} node: {idx=1,range=[62475,62474],safe_seq=62469} node: {idx=2,range=[62469,62468],safe_seq=62468} },
fifo_seq=1384760,
last_sent=62473,
known:
511505cf at 
{o=1,s=0,i=0,fs=-1,jm=
{v=0,t=4,ut=255,o=1,s=62468,sr=-1,as=62468,f=0,src=511505cf,srcvid=view_id(REG,511505cf,28),insvid=view_id(UNKNOWN,00000000,0),ru=00000000,r=[-1,-1],fs=1384760,nl=(
511505cf, {o=1,s=0,e=0,ls=-1,vid=view_id(REG,511505cf,28),ss=62468,ir=[62474,62473],}
8e061fa0, {o=0,s=1,e=0,ls=-1,vid=view_id(REG,511505cf,28),ss=62469,ir=[62475,62474],}
e71b5018, {o=0,s=1,e=0,ls=-1,vid=view_id(REG,511505cf,28),ss=62468,ir=[62469,62468],}
)
},
}
8e061fa0 at tcp://1.1.1.21:9601
{o=0,s=1,i=0,fs=6212585,}
e71b5018 at tcp://1.1.1.23:9601
{o=0,s=1,i=0,fs=5108065,}
}
2023-02-28 15:50:15 0 [ERROR] WSREP: exception from gcomm, backend must be restarted: evs::proto(511505cf, GATHER, view_id(REG,511505cf,28)) failed to form singleton view after exceeding max_install_timeouts 3, giving up (FATAL)
at gcomm/src/evs_proto.cpp:handle_install_timer():727
2023-02-28 15:50:15 0 [Note] WSREP: gcomm: terminating thread
2023-02-28 15:50:15 0 [Note] WSREP: gcomm: joining thread
2023-02-28 15:50:15 0 [Note] WSREP: gcomm: closing backend
2023-02-28 15:50:15 0 [Note] WSREP: Forced PC close
2023-02-28 15:50:15 0 [Warning] WSREP: discarding 3 messages from message index
2023-02-28 15:50:15 0 [Note] WSREP: gcomm: closed
2023-02-28 15:50:15 0 [Note] WSREP: Received self-leave message.
2023-02-28 15:50:15 0 [Note] WSREP: comp msg error in core 103
2023-02-28 15:50:15 0 [Note] WSREP: Closing send monitor...
2023-02-28 15:50:15 0 [Note] WSREP: Closed send monitor.
2023-02-28 15:50:15 0 [Note] WSREP: Closing replication queue.
2023-02-28 15:50:15 0 [Note] WSREP: Closing slave action queue.
2023-02-28 15:50:15 2 [Note] WSREP: New cluster view: global state: 00000000-0000-0000-0000-000000000000:0, view# -1: non-Primary, number of nodes: 0, my index: -1, protocol version -1
2023-02-28 15:50:15 0 [Note] WSREP: Shifting SYNCED -> CLOSED (TO: 11391296)
2023-02-28 15:50:15 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2023-02-28 15:50:15 0 [Note] WSREP: RECV thread exiting -103: Software caused connection abort
2023-02-28 15:50:15 2 [Note] WSREP: applier thread exiting (code:6)
  5. I checked the existing issues. The following two are somewhat similar to this one, but the version I am using should already include their fixes:
    WSREP: exception from gcomm, backend must be restarted: evs::proto #202
    exception from gcomm: mn.operational() == false #40
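
For anyone reading the state dump above, the per-node entries in the `nl=(...)` list can be pulled out with a short script. This is a sketch under assumptions: I read `o`, `s`, `e`, `ss` and `ir` as operational, suspected, evicted, safe_seq and input range, which is my interpretation of the gcomm output and not something confirmed by the maintainers. The function name `parse_node_list` is made up for illustration.

```python
import re

# One entry of the "nl=(" node list, e.g.
# 8e061fa0, {o=0,s=1,e=0,ls=-1,vid=view_id(REG,511505cf,28),ss=62469,ir=[62475,62474],}
NODE_RE = re.compile(
    r"^(?P<id>[0-9a-f]{8}), \{o=(?P<o>[01]),s=(?P<s>[01]),e=(?P<e>[01]),"
    r"ls=(?P<ls>-?\d+),vid=view_id\((?P<vid>[^)]*)\),"
    r"ss=(?P<ss>-?\d+),ir=\[(?P<lo>-?\d+),(?P<hi>-?\d+)\],\}$"
)


def parse_node_list(lines):
    """Return {node_id: fields} for every node-list entry found in lines.

    The field names are assumed meanings of the abbreviations in the dump.
    """
    nodes = {}
    for line in lines:
        m = NODE_RE.match(line.strip())
        if m:
            nodes[m["id"]] = {
                "operational": m["o"] == "1",   # assumed meaning of o=
                "suspected": m["s"] == "1",     # assumed meaning of s=
                "evicted": m["e"] == "1",       # assumed meaning of e=
                "safe_seq": int(m["ss"]),       # assumed meaning of ss=
                "input_range": (int(m["lo"]), int(m["hi"])),  # assumed meaning of ir=
            }
    return nodes


# Entries copied from the node2 state dump in this issue.
dump = [
    "511505cf, {o=1,s=0,e=0,ls=-1,vid=view_id(REG,511505cf,28),ss=62468,ir=[62474,62473],}",
    "8e061fa0, {o=0,s=1,e=0,ls=-1,vid=view_id(REG,511505cf,28),ss=62469,ir=[62475,62474],}",
    "e71b5018, {o=0,s=1,e=0,ls=-1,vid=view_id(REG,511505cf,28),ss=62468,ir=[62469,62468],}",
]
nodes = parse_node_list(dump)
for node_id, info in nodes.items():
    print(node_id, info)
```

If that reading of the fields is right, node2 (511505cf) considers only itself operational and suspects both 8e061fa0 (1.1.1.21) and e71b5018 (1.1.1.23), which would fit the network hypothesis in [My analysis].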