Primary Component not restored after cluster partition during SST #410

Open
PrzemekMalkowski opened this issue Jul 6, 2016 · 2 comments

When a donor node suffers connectivity problems during, and because of, SST (network saturation, IO overload, etc.), the other members may remove it from the cluster, so the SST fails. In the following scenario, however, the failed SST attempt leaves the cluster in a non-Primary state:

  • the cluster has 3 members; node1 and node2 are up and node3 needs SST to join
  • node3 joins the cluster and requests SST from node1
  • SST starts, but node1 has problems communicating with the other nodes on port 4567
  • node1 is removed from the cluster configuration by node2 and node3
  • SST fails and node3 (the joiner) has to abort
  • node2 switches to non-Primary because it cannot keep quorum alone
  • node1 restores connectivity with node2
  • neither node1 nor node2 can restore the Primary Component any more until manual intervention
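
The quorum steps in the list above can be illustrated with a small sketch. It assumes a simplified majority rule: every node has weight 1, no node left gracefully, and a partition is Primary only if it contains more than half of the members of the last Primary Component. The function name `has_quorum` is hypothetical, and this is a model of the reported behavior, not Galera's actual implementation.

```python
def has_quorum(last_primary, reachable):
    """Return True if `reachable` holds a strict majority of `last_primary`.

    Simplified model (assumption): every node has weight 1 and no node
    left the cluster gracefully.
    """
    present = set(last_primary) & set(reachable)
    return 2 * len(present) > len(last_primary)


# Membership of the Primary Component once node3 has joined and requested SST.
pc = {"node1", "node2", "node3"}

# node1 is dropped; node2 and node3 still see each other: 2 of 3 members.
after_node1_dropped = has_quorum(pc, {"node2", "node3"})
pc = {"node2", "node3"}  # new Primary Component without node1

# node3 aborts after the failed SST; node2 is alone: 1 of 2 members.
node2_alone = has_quorum(pc, {"node2"})

# node1 reconnects to node2, but node1 was not in the last Primary Component,
# so only node2 counts: still 1 of 2 members.
after_reconnect = has_quorum(pc, {"node1", "node2"})

print(after_node1_dropped, node2_alone, after_reconnect)
```

Under these assumptions the last Primary Component is {node2, node3}, so node1 coming back does not add a vote, which would be consistent with the cluster staying non-Primary until someone intervenes.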

In the usual split-brain situation, where node1 and node2 lose connectivity, both become non-Primary, but once the network is restored they restore the Primary Component and continue to operate. In this scenario, that is not the case.
Tested on PXC 5.6.30.

Example test case

-- percona3 service start

2016-07-06 10:34:51 29280 [Note] WSREP: Quorum results:
        version = 3,
        component = PRIMARY,
        conf_id = 47,
        members = 2/3 (joined/total),
        act_id = 3717,
        last_appl. = -1,
        protocols = 0/7/3 (gcs/repl/appl),
        group UUID = 405bb13f-aa42-11e4-96e9-da7dd046b9dd
2016-07-06 10:34:51 29280 [Note] WSREP: Flow-control interval: [28, 28]
2016-07-06 10:34:51 29280 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 3717)
2016-07-06 10:34:51 29280 [Note] WSREP: State transfer required:
        Group state: 405bb13f-aa42-11e4-96e9-da7dd046b9dd:3717
        Local state: 00000000-0000-0000-0000-000000000000:-1
(...)
2016-07-06 10:34:53 29280 [Note] WSREP: Member 2.1 (percona3) requested state transfer from 'percona1'. Selected 0.1 (percona1)(SYNCED) as donor.
2016-07-06 10:34:53 29280 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 3717)
2016-07-06 10:34:53 29280 [Note] WSREP: Requesting state transfer: success, donor: 0
WSREP_SST: [INFO] Proceeding with SST (20160706 10:34:54.924)
(...)

-- percona1 port 4567 blocked

2016-07-06 10:34:51 22287 [Note] WSREP: Member 2.1 (percona3) requested state transfer from 'percona1'. Selected 0.1 (percona1)(SYNCED) as donor.
2016-07-06 10:34:51 22287 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 3717)
(...)
WSREP_SST: [INFO] Sleeping before data transfer for SST (20160706 10:34:53.527)
2016-07-06 10:34:56 22287 [Note] WSREP: (39892c46, 'tcp://192.168.3.2:4567') turning message relay requesting on, nonlive peers: tcp://192.168.3.3:4567
2016-07-06 10:34:57 22287 [Note] WSREP: (39892c46, 'tcp://192.168.3.2:4567') reconnecting to 464fbed4 (tcp://192.168.3.3:4567), attempt 0
2016-07-06 10:34:59 22287 [Note] WSREP: (39892c46, 'tcp://192.168.3.2:4567') reconnecting to 81727c1c (tcp://192.168.3.4:4567), attempt 0
(...)
2016-07-06 10:35:01 22287 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2016-07-06 10:35:01 22287 [Note] WSREP: Flow-control interval: [16, 16]
2016-07-06 10:35:01 22287 [Note] WSREP: Received NON-PRIMARY.
2016-07-06 10:35:01 22287 [Note] WSREP: Shifting DONOR/DESYNCED -> OPEN (TO: 3717)
2016-07-06 10:35:01 22287 [Note] WSREP: New cluster view: global state: 405bb13f-aa42-11e4-96e9-da7dd046b9dd:3717, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 3
(...)

-- percona3 aborts due to failed SST

2016-07-06 10:35:06 29280 [Note] WSREP: Quorum results:
        version = 3,
        component = PRIMARY,
        conf_id = 48,
        members = 1/2 (joined/total),
        act_id = 3717,
        last_appl. = 0,
        protocols = 0/7/3 (gcs/repl/appl),
        group UUID = 405bb13f-aa42-11e4-96e9-da7dd046b9dd
2016-07-06 10:35:06 29280 [Warning] WSREP: Donor 39892c46-4354-11e6-a224-92c5aa111752 is no longer in the group. State transfer cannot be completed, need to abort. Aborting...
2016-07-06 10:35:06 29280 [Note] WSREP: /usr/sbin/mysqld: Terminated.
160706 10:35:06 mysqld_safe mysqld from pid file /var/lib/mysql/percona3.pid ended

-- percona2 loses PC

2016-07-06 08:35:12 9346 [Note] WSREP: view(view_id(NON_PRIM,464fbed4,754) memb {
        464fbed4,2
} joined {
} left {
} partitioned {
        81727c1c,1
})
2016-07-06 08:35:12 9346 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2016-07-06 08:35:12 9346 [Note] WSREP: Flow-control interval: [16, 16]
2016-07-06 08:35:12 9346 [Note] WSREP: Received NON-PRIMARY.
2016-07-06 08:35:12 9346 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 3717)

-- percona1 and percona2 communication is restored, but they cannot restore the original Primary Component

2016-07-06 08:35:12 9346 [Note] WSREP: New cluster view: global state: 405bb13f-aa42-11e4-96e9-da7dd046b9dd:3717, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 3
2016-07-06 08:35:12 9346 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2016-07-06 08:35:28 9346 [Note] WSREP: declaring 39892c46 at tcp://192.168.3.2:4567 stable
2016-07-06 08:35:28 9346 [Note] WSREP: view(view_id(NON_PRIM,39892c46,756) memb {
        39892c46,1
        464fbed4,2
} joined {
} left {
} partitioned {
        81727c1c,1
})
2016-07-06 08:35:28 9346 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 1, memb_num = 2
2016-07-06 08:35:28 9346 [Note] WSREP: Flow-control interval: [23, 23]
2016-07-06 08:35:28 9346 [Note] WSREP: Received NON-PRIMARY.
2016-07-06 08:35:28 9346 [Note] WSREP: New cluster view: global state: 405bb13f-aa42-11e4-96e9-da7dd046b9dd:3717, view# -1: non-Primary, number of nodes: 2, my index: 1, protocol version 3
2016-07-06 08:35:28 9346 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2016-07-06 08:35:51 9346 [Note] WSREP: (464fbed4, 'tcp://192.168.3.3:4567') reconnecting to 81727c1c (tcp://192.168.3.4:4567), attempt 30
2016-07-06 08:36:35 9346 [Note] WSREP: (464fbed4, 'tcp://192.168.3.3:4567') reconnecting to 81727c1c (tcp://192.168.3.4:4567), attempt 60
2016-07-06 08:37:19 9346 [Note] WSREP: (464fbed4, 'tcp://192.168.3.3:4567') reconnecting to 81727c1c (tcp://192.168.3.4:4567), attempt 90
2016-07-06 08:38:04 9346 [Note] WSREP: (464fbed4, 'tcp://192.168.3.3:4567') reconnecting to 81727c1c (tcp://192.168.3.4:4567), attempt 120
2016-07-06 08:38:48 9346 [Note] WSREP: (464fbed4, 'tcp://192.168.3.3:4567') reconnecting to 81727c1c (tcp://192.168.3.4:4567), attempt 150
(...)
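
To follow the Primary/non-Primary transitions in logs like the ones above, the `New COMPONENT` lines can be extracted mechanically. The sketch below assumes the line layout seen in these excerpts (timestamp, thread id, `[Note] WSREP:` prefix); the names `COMPONENT_RE` and `component_changes` are hypothetical, and other server versions may format the line differently.

```python
import re

# Matches lines such as:
# 2016-07-06 08:35:12 9346 [Note] WSREP: New COMPONENT: primary = no, ...
COMPONENT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \d+ \[Note\] WSREP: "
    r"New COMPONENT: primary = (?P<primary>yes|no), "
    r"bootstrap = (?P<bootstrap>yes|no), "
    r"my_idx = (?P<my_idx>\d+), memb_num = (?P<memb_num>\d+)"
)


def component_changes(lines):
    """Yield (timestamp, is_primary, member_count) for each matching line."""
    for line in lines:
        m = COMPONENT_RE.match(line)
        if m:
            yield m["ts"], m["primary"] == "yes", int(m["memb_num"])


# Sample lines copied from the percona2 log above.
sample = [
    "2016-07-06 08:35:12 9346 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1",
    "2016-07-06 08:35:12 9346 [Note] WSREP: Flow-control interval: [16, 16]",
    "2016-07-06 08:35:28 9346 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 1, memb_num = 2",
]

changes = list(component_changes(sample))
for ts, is_primary, members in changes:
    print(ts, "primary" if is_primary else "non-primary", members)
```

On the sample, both components are non-Primary even after the member count goes from 1 to 2, which is the symptom reported here.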

PrzemekMalkowski commented Oct 11, 2016

Still happening on the latest Galera Cluster binaries from Codership:
galera2 >select @@Version,@@version_comment;
+-----------+-------------------------------------------+
| @@Version | @@version_comment                         |
+-----------+-------------------------------------------+
| 5.6.33    | MySQL Community Server (GPL), wsrep_25.17 |
+-----------+-------------------------------------------+
1 row in set (0.00 sec)

Donor log:

2016-10-11 13:59:35 909 [Note] WSREP: (d5afd2f2, 'tcp://0.0.0.0:4567') connection established to f0f6e1f0 tcp://172.17.0.2:4567
2016-10-11 13:59:35 909 [Note] WSREP: (d5afd2f2, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 
2016-10-11 13:59:35 909 [Note] WSREP: declaring 4a54b31f at tcp://172.17.0.3:4567 stable
2016-10-11 13:59:35 909 [Note] WSREP: declaring f0f6e1f0 at tcp://172.17.0.2:4567 stable
2016-10-11 13:59:35 909 [Note] WSREP: Node 4a54b31f state prim
2016-10-11 13:59:35 909 [Note] WSREP: view(view_id(PRIM,4a54b31f,154) memb {
    4a54b31f,0
    d5afd2f2,0
    f0f6e1f0,0
} joined {
} left {
} partitioned {
})
2016-10-11 13:59:35 909 [Note] WSREP: save pc into disk
2016-10-11 13:59:35 909 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 3
2016-10-11 13:59:35 909 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2016-10-11 13:59:35 909 [Note] WSREP: STATE EXCHANGE: sent state msg: f143ee2e-8fba-11e6-94cd-5b4301e72b8b
2016-10-11 13:59:35 909 [Note] WSREP: STATE EXCHANGE: got state msg: f143ee2e-8fba-11e6-94cd-5b4301e72b8b from 0 (galera2)
2016-10-11 13:59:35 909 [Note] WSREP: STATE EXCHANGE: got state msg: f143ee2e-8fba-11e6-94cd-5b4301e72b8b from 1 (galera1)
2016-10-11 13:59:36 909 [Note] WSREP: STATE EXCHANGE: got state msg: f143ee2e-8fba-11e6-94cd-5b4301e72b8b from 2 (galera3)
2016-10-11 13:59:36 909 [Note] WSREP: Quorum results:
    version    = 4,
    component  = PRIMARY,
    conf_id    = 12,
    members    = 2/3 (joined/total),
    act_id     = 100,
    last_appl. = 0,
    protocols  = 0/7/3 (gcs/repl/appl),
    group UUID = c8d6c713-89c1-11e6-af94-2b7d3c8cd95f
2016-10-11 13:59:36 909 [Note] WSREP: Flow-control interval: [28, 28]
2016-10-11 13:59:36 909 [Note] WSREP: New cluster view: global state: c8d6c713-89c1-11e6-af94-2b7d3c8cd95f:100, view# 13: Primary, number of nodes: 3, my index: 1, protocol version 3
2016-10-11 13:59:36 909 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2016-10-11 13:59:36 909 [Note] WSREP: REPL Protocols: 7 (3, 2)
2016-10-11 13:59:36 909 [Note] WSREP: Service thread queue flushed.
2016-10-11 13:59:36 909 [Note] WSREP: Assign initial position for certification: 100, protocol version: 3
2016-10-11 13:59:36 909 [Note] WSREP: Service thread queue flushed.
2016-10-11 13:59:36 909 [Note] WSREP: Member 2.0 (galera3) requested state transfer from 'galera1'. Selected 1.0 (galera1)(SYNCED) as donor.
2016-10-11 13:59:36 909 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 100)
2016-10-11 13:59:36 909 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2016-10-11 13:59:36 909 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.17.0.2:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix ''   '' --gtid 'c8d6c713-89c1-11e6-af94-2b7d3c8cd95f:100''
2016-10-11 13:59:36 909 [Note] WSREP: sst_donor_thread signaled with 0
WSREP_SST: [INFO] Streaming with xbstream (20161011 13:59:36.596)
WSREP_SST: [INFO] Using socat as streamer (20161011 13:59:36.597)
WSREP_SST: [INFO] Using /tmp/tmp.CEPUrQr6XH as xtrabackup temporary directory (20161011 13:59:36.612)
WSREP_SST: [INFO] Using /tmp/tmp.nsywmYASeD as innobackupex temporary directory (20161011 13:59:36.614)
WSREP_SST: [INFO] Streaming GTID file before SST (20161011 13:59:36.618)
WSREP_SST: [INFO] Evaluating xbstream -c ${INFO_FILE} | socat -u stdio TCP:172.17.0.2:4444; RC=( ${PIPESTATUS[@]} ) (20161011 13:59:36.620)
WSREP_SST: [INFO] Sleeping before data transfer for SST (20161011 13:59:36.633)
2016-10-11 13:59:38 909 [Note] WSREP: (d5afd2f2, 'tcp://0.0.0.0:4567') turning message relay requesting off
2016-10-11 13:59:39 909 [Note] WSREP: (d5afd2f2, 'tcp://0.0.0.0:4567') connection to peer f0f6e1f0 with addr tcp://172.17.0.2:4567 timed out, no messages seen in PT3S
2016-10-11 13:59:39 909 [Note] WSREP: (d5afd2f2, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://172.17.0.2:4567 
2016-10-11 13:59:41 909 [Note] WSREP: (d5afd2f2, 'tcp://0.0.0.0:4567') reconnecting to f0f6e1f0 (tcp://172.17.0.2:4567), attempt 0
2016-10-11 13:59:41 909 [Note] WSREP: (d5afd2f2, 'tcp://0.0.0.0:4567') reconnecting to 4a54b31f (tcp://172.17.0.3:4567), attempt 0
2016-10-11 13:59:41 909 [Note] WSREP: evs::proto(d5afd2f2, OPERATIONAL, view_id(REG,4a54b31f,154)) suspecting node: f0f6e1f0
2016-10-11 13:59:41 909 [Note] WSREP: evs::proto(d5afd2f2, OPERATIONAL, view_id(REG,4a54b31f,154)) suspected node without join message, declaring inactive
2016-10-11 13:59:44 909 [Note] WSREP: (d5afd2f2, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.17.0.2:4567 timed out, no messages seen in PT3S
2016-10-11 13:59:44 909 [Note] WSREP: (d5afd2f2, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.17.0.3:4567 timed out, no messages seen in PT3S
2016-10-11 13:59:45 909 [Note] WSREP: evs::proto(d5afd2f2, GATHER, view_id(REG,4a54b31f,154)) suspecting node: 4a54b31f
2016-10-11 13:59:45 909 [Note] WSREP: evs::proto(d5afd2f2, GATHER, view_id(REG,4a54b31f,154)) suspected node without join message, declaring inactive
2016-10-11 13:59:45 909 [Note] WSREP: view(view_id(NON_PRIM,4a54b31f,154) memb {
    d5afd2f2,0
} joined {
} left {
} partitioned {
    4a54b31f,0
    f0f6e1f0,0
})
2016-10-11 13:59:45 909 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2016-10-11 13:59:45 909 [Note] WSREP: Flow-control interval: [16, 16]
2016-10-11 13:59:45 909 [Note] WSREP: Received NON-PRIMARY.
2016-10-11 13:59:45 909 [Note] WSREP: Shifting DONOR/DESYNCED -> OPEN (TO: 100)
2016-10-11 13:59:45 909 [Note] WSREP: New cluster view: global state: c8d6c713-89c1-11e6-af94-2b7d3c8cd95f:100, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 3
2016-10-11 13:59:45 909 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2016-10-11 13:59:45 909 [Note] WSREP: view(view_id(NON_PRIM,d5afd2f2,155) memb {
    d5afd2f2,0
} joined {
} left {
} partitioned {
    4a54b31f,0
    f0f6e1f0,0
})
2016-10-11 13:59:45 909 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2016-10-11 13:59:45 909 [Note] WSREP: Flow-control interval: [16, 16]
2016-10-11 13:59:45 909 [Note] WSREP: Received NON-PRIMARY.
2016-10-11 13:59:45 909 [Note] WSREP: New cluster view: global state: c8d6c713-89c1-11e6-af94-2b7d3c8cd95f:100, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 3
2016-10-11 13:59:45 909 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
WSREP_SST: [INFO] Streaming the backup to joiner at 172.17.0.2 4444 (20161011 13:59:46.635)
WSREP_SST: [INFO] Evaluating innobackupex --defaults-file=/etc/my.cnf  --defaults-group=mysqld --no-version-check  $tmpopts $INNOEXTRA --galera-info --stream=$sfmt $itmpdir 2>${DATA}/innobackup.backup.log | socat -u stdio TCP:172.17.0.2:4444; RC=( ${PIPESTATUS[@]} ) (20161011 13:59:46.637)
2016-10-11 13:59:48 909 [Note] WSREP: (d5afd2f2, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.17.0.2:4567 timed out, no messages seen in PT3S
2016-10-11 13:59:48 909 [Note] WSREP: (d5afd2f2, 'tcp://0.0.0.0:4567') connection established to 4a54b31f tcp://172.17.0.3:4567
2016-10-11 13:59:49 909 [Note] WSREP: (d5afd2f2, 'tcp://0.0.0.0:4567') reconnecting to 4a54b31f (tcp://172.17.0.3:4567), attempt 0
2016-10-11 13:59:49 909 [Note] WSREP: (d5afd2f2, 'tcp://0.0.0.0:4567') connection established to 4a54b31f tcp://172.17.0.3:4567
WSREP_SST: [ERROR] innobackupex finished with error: 1.  Check /var/lib/mysql//innobackup.backup.log (20161011 13:59:49.875)
WSREP_SST: [ERROR] Cleanup after exit with status:22 (20161011 13:59:49.881)
WSREP_SST: [INFO] Cleaning up temporary directories (20161011 13:59:49.888)
2016-10-11 13:59:49 909 [ERROR] WSREP: Failed to read from: wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.17.0.2:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix ''   '' --gtid 'c8d6c713-89c1-11e6-af94-2b7d3c8cd95f:100'
2016-10-11 13:59:49 909 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.17.0.2:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix ''   '' --gtid 'c8d6c713-89c1-11e6-af94-2b7d3c8cd95f:100': 22 (Invalid argument)
2016-10-11 13:59:49 909 [ERROR] WSREP: sst sent called when not SST donor, state CONNECTED
2016-10-11 13:59:49 909 [ERROR] WSREP: Command did not run: wsrep_sst_xtrabackup-v2 --role 'donor' --address '172.17.0.2:4444/xtrabackup_sst//1' --socket '/var/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --defaults-group-suffix ''   '' --gtid 'c8d6c713-89c1-11e6-af94-2b7d3c8cd95f:100'
2016-10-11 13:59:52 909 [Note] WSREP: declaring 4a54b31f at tcp://172.17.0.3:4567 stable
2016-10-11 13:59:52 909 [Note] WSREP: view(view_id(NON_PRIM,4a54b31f,156) memb {
    4a54b31f,0
    d5afd2f2,0
} joined {
} left {
} partitioned {
    f0f6e1f0,0
})
2016-10-11 13:59:52 909 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 1, memb_num = 2
2016-10-11 13:59:52 909 [Note] WSREP: Flow-control interval: [23, 23]
2016-10-11 13:59:52 909 [Note] WSREP: Received NON-PRIMARY.
2016-10-11 13:59:52 909 [Note] WSREP: New cluster view: global state: c8d6c713-89c1-11e6-af94-2b7d3c8cd95f:100, view# -1: non-Primary, number of nodes: 2, my index: 1, protocol version 3
2016-10-11 13:59:52 909 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2016-10-11 13:59:57 909 [Note] WSREP: (d5afd2f2, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.17.0.2:4567 timed out, no messages seen in PT3S
2016-10-11 13:59:57 909 [Note] WSREP: (d5afd2f2, 'tcp://0.0.0.0:4567') reconnecting to 4a54b31f (tcp://172.17.0.3:4567), attempt 0
2016-10-11 14:00:00 909 [Note] WSREP: (d5afd2f2, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.17.0.3:4567 timed out, no messages seen in PT3S
2016-10-11 14:00:01 909 [Note] WSREP: evs::proto(d5afd2f2, OPERATIONAL, view_id(REG,4a54b31f,156)) suspecting node: 4a54b31f
2016-10-11 14:00:01 909 [Note] WSREP: evs::proto(d5afd2f2, OPERATIONAL, view_id(REG,4a54b31f,156)) suspected node without join message, declaring inactive
2016-10-11 14:00:01 909 [Note] WSREP: (d5afd2f2, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.17.0.2:4567 timed out, no messages seen in PT3S
2016-10-11 14:00:02 909 [Note] WSREP: view(view_id(NON_PRIM,4a54b31f,156) memb {
    d5afd2f2,0
} joined {
} left {
} partitioned {
    4a54b31f,0
    f0f6e1f0,0
})
2016-10-11 14:00:02 909 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2016-10-11 14:00:02 909 [Note] WSREP: view(view_id(NON_PRIM,d5afd2f2,157) memb {
    d5afd2f2,0
} joined {
} left {
} partitioned {
    4a54b31f,0
    f0f6e1f0,0
})
2016-10-11 14:00:02 909 [Note] WSREP: Flow-control interval: [16, 16]
2016-10-11 14:00:02 909 [Note] WSREP: Received NON-PRIMARY.
2016-10-11 14:00:02 909 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2016-10-11 14:00:02 909 [Note] WSREP: Flow-control interval: [16, 16]
2016-10-11 14:00:02 909 [Note] WSREP: Received NON-PRIMARY.
2016-10-11 14:00:02 909 [Note] WSREP: New cluster view: global state: c8d6c713-89c1-11e6-af94-2b7d3c8cd95f:100, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 3
2016-10-11 14:00:02 909 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2016-10-11 14:00:02 909 [Note] WSREP: New cluster view: global state: c8d6c713-89c1-11e6-af94-2b7d3c8cd95f:100, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 3
2016-10-11 14:00:02 909 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2016-10-11 14:00:04 909 [Note] WSREP: (d5afd2f2, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://172.17.0.3:4567 timed out, no messages seen in PT3S
2016-10-11 14:00:05 909 [Note] WSREP: (d5afd2f2, 'tcp://0.0.0.0:4567') connection established to 4a54b31f tcp://172.17.0.3:4567
2016-10-11 14:00:05 909 [Note] WSREP: declaring 4a54b31f at tcp://172.17.0.3:4567 stable
2016-10-11 14:00:05 909 [Note] WSREP: view(view_id(NON_PRIM,4a54b31f,158) memb {
    4a54b31f,0
    d5afd2f2,0
} joined {
} left {
} partitioned {
    f0f6e1f0,0
})
2016-10-11 14:00:05 909 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 1, memb_num = 2
2016-10-11 14:00:05 909 [Note] WSREP: Flow-control interval: [23, 23]
2016-10-11 14:00:05 909 [Note] WSREP: Received NON-PRIMARY.
2016-10-11 14:00:05 909 [Note] WSREP: New cluster view: global state: c8d6c713-89c1-11e6-af94-2b7d3c8cd95f:100, view# -1: non-Primary, number of nodes: 2, my index: 1, protocol version 3
2016-10-11 14:00:05 909 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2016-10-11 14:00:35 909 [Note] WSREP: (d5afd2f2, 'tcp://0.0.0.0:4567') reconnecting to f0f6e1f0 (tcp://172.17.0.2:4567), attempt 30
2016-10-11 14:01:19 909 [Note] WSREP: (d5afd2f2, 'tcp://0.0.0.0:4567') reconnecting to f0f6e1f0 (tcp://172.17.0.2:4567), attempt 60
2016-10-11 14:02:03 909 [Note] WSREP: (d5afd2f2, 'tcp://0.0.0.0:4567') reconnecting to f0f6e1f0 (tcp://172.17.0.2:4567), attempt 90
2016-10-11 14:02:44 909 [Note] WSREP: (d5afd2f2, 'tcp://0.0.0.0:4567') reconnecting to f0f6e1f0 (tcp://172.17.0.2:4567), attempt 120
2016-10-11 14:03:15 909 [Note] WSREP: (d5afd2f2, 'tcp://0.0.0.0:4567') reconnecting to f0f6e1f0 (tcp://172.17.0.2:4567), attempt 150
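
The multi-line `view(...)` blocks in the donor log show which nodes are members and which are partitioned at each step. A minimal sketch for pulling that out follows; it assumes the block layout seen in this log, and the names `VIEW_RE`, `node_ids` and `parse_views` are hypothetical helpers, not part of any Galera tooling.

```python
import re

# One view block spans several lines, so "." must also match newlines.
VIEW_RE = re.compile(
    r"view\(view_id\((?P<kind>\w+),(?P<origin>[0-9a-f]+),(?P<seq>\d+)\) "
    r"memb \{(?P<memb>.*?)\} joined \{(?P<joined>.*?)\} "
    r"left \{(?P<left>.*?)\} partitioned \{(?P<part>.*?)\}\)",
    re.DOTALL,
)


def node_ids(block):
    """Return the 8-character node ids listed inside one {...} section."""
    return re.findall(r"([0-9a-f]{8}),\d+", block)


def parse_views(log_text):
    """Return one dict per view block found in the log text."""
    views = []
    for m in VIEW_RE.finditer(log_text):
        views.append({
            "kind": m["kind"],
            "seq": int(m["seq"]),
            "members": node_ids(m["memb"]),
            "partitioned": node_ids(m["part"]),
        })
    return views


# Sample block copied from the donor log above.
sample = """2016-10-11 13:59:52 909 [Note] WSREP: view(view_id(NON_PRIM,4a54b31f,156) memb {
    4a54b31f,0
    d5afd2f2,0
} joined {
} left {
} partitioned {
    f0f6e1f0,0
})
"""

views = parse_views(sample)
print(views)
```

Applied to the whole donor log, this would list every view in order, which makes it easier to see that f0f6e1f0 (the joiner) stays in the partitioned set of each later view.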


wuyafang commented Sep 29, 2020

Hi @PrzemekMalkowski, I have run into the same problem. Could you tell me how you solved it? I ran SET GLOBAL wsrep_provider_options='pc.bootstrap=YES'; on one node to recover my cluster manually, but I need an automatic solution.

I agree with this:
In usual case of split brain situation - when node1 and node2 would lose connectivity, they would become non-Primary, but when network is restored, they will restore Primary Component and continue to operate. But in this scenario, that's not the case. Tested on PXC 5.6.30.
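
As a starting point for automating the manual step mentioned above, the decision could be separated from the database call. The sketch below is only an illustration: `bootstrap_statement` and `min_nodes` are hypothetical names, the `status` dict is assumed to be filled from the `wsrep_cluster_status` and `wsrep_cluster_size` status variables, and the policy (bootstrap only when enough peers are visible) is an assumption, not a recommendation from the maintainers. The statement must be executed on one node only, otherwise two separate Primary Components could form.

```python
# Statement quoted from the comment above.
BOOTSTRAP_SQL = "SET GLOBAL wsrep_provider_options='pc.bootstrap=YES';"


def bootstrap_statement(status, min_nodes):
    """Return the bootstrap statement if this node looks stuck, else None.

    `status` maps wsrep status variable names to their string values.
    `min_nodes` is a hypothetical operator-chosen threshold for how many
    nodes must be visible before bootstrapping is considered.
    """
    if status.get("wsrep_cluster_status") == "Primary":
        return None  # already Primary, nothing to do
    if int(status.get("wsrep_cluster_size", 0)) < min_nodes:
        return None  # too few peers visible to bootstrap under this policy
    return BOOTSTRAP_SQL


# Values resembling the stuck state in this issue: two nodes, non-Primary.
stuck = {"wsrep_cluster_status": "non-Primary", "wsrep_cluster_size": "2"}
healthy = {"wsrep_cluster_status": "Primary", "wsrep_cluster_size": "3"}

print(bootstrap_statement(stuck, min_nodes=2))
print(bootstrap_statement(healthy, min_nodes=2))
```

Keeping the function free of any database driver makes the policy easy to test; the caller would still need to choose which single node runs the returned statement.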
