(t) replication spawn error #2766 #2769

Closed


phillxnet
Member

@phillxnet phillxnet commented Dec 16, 2023

  • Modernise previously missed replication imports re Py3.*
  • Force bytes format for replication messages and commands.
  • Minor modification re Python 3 behaviour of dict.keys():
    we relied on an implicit Python 2 behaviour.
  • Move to f-strings.
  • Temp debug prints.
  • Further parameter/return type typecasting.
  • Removed unused variable.
  • Black format update.
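
For readers unfamiliar with the dict.keys() change: the description does not say which implicit Python 2 behaviour was relied on, so the following is only a minimal sketch of two common cases. The `workers` dict and its contents are hypothetical and not taken from the replication code.

```python
# Hypothetical registry of running replication processes, keyed by id.
workers = {"task-1": "running", "task-2": "exited", "task-3": "exited"}

# Python 2: keys() returned a list, so workers.keys()[0] worked.
# Python 3: keys() returns a view object, which cannot be indexed.
try:
    workers.keys()[0]
    indexable = True
except TypeError:
    indexable = False

# Deleting entries while iterating the live view raises RuntimeError in
# Python 3. Iterating over a list() snapshot restores the Python 2 semantics.
for key in list(workers.keys()):
    if workers[key] == "exited":
        del workers[key]

remaining = sorted(workers)
```

Wrapping the call in list() is usually the smallest change that keeps old Python 2 code working unmodified.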

Closes #2766

Modernise previously missed replication imports re Py3.*
- Force bytes format for replication messages and commands.
- Minor modification re Python 3 behaviour of dict.keys():
we relied on an implicit Python 2 behaviour.
- Move to f-strings.
- Temp debug prints.
@phillxnet
Member Author

We still have a Python 2 to 3 bytes issue going on here, re:

[19/Dec/2023 18:30:02] DEBUG [smart_manager.replication.listener_broker:124] Starting a new Sender(029ea547-da0b-4c23-b4f9-53c02bb7c283_5).
[19/Dec/2023 18:30:02] ERROR [smart_manager.replication.sender:75] Id: 029ea547-da0b-4c23-b4f9-53c02bb7c283-5. b'Top level exception in sender: 029ea547-da0b-4c23-b4f9-53c02bb7c283-5'. Exception: %b requires a bytes-like object, or an object that implements __bytes__, not 'str'

But still a work in progress.
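
As a note on the exception text in the log above: the same TypeError can be reproduced by interpolating a str into a bytes template. Where exactly this happens in the sender is not established here, so this is only a sketch of the failure mode and two possible fixes. The message template is an illustrative assumption.

```python
sender_id = "029ea547-da0b-4c23-b4f9-53c02bb7c283-5"  # a str, as in the log above

# bytes %-formatting accepts only bytes-like arguments for %b (and %s).
try:
    b"Top level exception in sender: %b" % sender_id
    error = None
except TypeError as e:
    error = str(e)

# Option A: encode the argument before interpolating into bytes.
msg_a = b"Top level exception in sender: %b" % sender_id.encode("utf-8")

# Option B: build a str with an f-string and encode once at the boundary.
msg_b = f"Top level exception in sender: {sender_id}".encode("utf-8")
```

Option B keeps everything as str internally and converts only where bytes are required, which tends to be easier to reason about.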

Associated email:

req = <zmq.Socket(zmq.DEALER) at 0x7fb31bcda890>
poll.register = <bound method Poller.register of <zmq.sugar.poll.Poller object at 0x7fb3217eb5d0>>
req.connect = <bound method Socket.connect of <zmq.Socket(zmq.DEALER) at 0x7fb31bcda890>>, ipc_socket = /var/run/replication.sock
rcommand=b'SUCCESS', reply=b'A new Sender started successfully for Replication Task(5).'
b'A new Sender started successfully for Replication Task(5).'

- more str to bytes conversions for zmq use, given our Py3 base now.
- more fstring conversions.
- further parameter/return type typecasting.
- removed unused variable.
- black format update
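
To illustrate the str to bytes point in this commit: under Python 3, message frames handed to a pyzmq socket need to be bytes-like rather than str. A minimal sketch of normalising frames before sending follows. The helpers `to_bytes` and `to_frames` are hypothetical names and are not part of the replication code or of pyzmq.

```python
def to_bytes(part, encoding="utf-8"):
    """Return part as bytes: encode str, pass bytes through unchanged."""
    if isinstance(part, bytes):
        return part
    if isinstance(part, str):
        return part.encode(encoding)
    raise TypeError(f"unsupported frame type: {type(part).__name__}")


def to_frames(*parts):
    """Build a list of bytes frames from a mix of str and bytes parts."""
    return [to_bytes(p) for p in parts]


# Command and reply values taken from the debug output above.
frames = to_frames(
    "SUCCESS", b"A new Sender started successfully for Replication Task(5)."
)
# frames could then be passed to a socket's send_multipart().
```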
@phillxnet
Member Author

PR status update:

The current draft pull request branch has the following behaviour:

  1. The first read-only replication snapshot (ready to btrfs send) is created on the sender.
  2. The target sub-vol on the receiver is created.

From then on we have failures until, after 10 follow-up attempts, we successfully disable the replication task.


Sender:

tail -f /opt/rockstor/var/log/rockstor.log

[20/Dec/2023 17:40:03] DEBUG [smart_manager.replication.listener_broker:304] new-send request received for 13
[20/Dec/2023 17:40:03] DEBUG [smart_manager.replication.listener_broker:124] Starting a new Sender(029ea547-da0b-4c23-b4f9-53c02bb7c283_13).
[20/Dec/2023 17:40:03] DEBUG [smart_manager.replication.sender:108] Id: 029ea547-da0b-4c23-b4f9-53c02bb7c283-13 Initial greeting: {'pool': 'test-pool', 'share': 'rep-source-share', 'snap': 'rep-source-share_13_replication_1', 'incremental': False, 'uuid': '029ea547-da0b-4c23-b4f9-53c02bb7c283'}
[20/Dec/2023 17:40:04] DEBUG [system.osi:235] Running command: /usr/sbin/btrfs subvolume list /mnt2/rock-pool
[20/Dec/2023 17:40:06] DEBUG [system.osi:235] Running command: /usr/bin/systemctl --lines=0 status sshd
[20/Dec/2023 17:40:10] DEBUG [smart_manager.replication.sender:306] Id: 029ea547-da0b-4c23-b4f9-53c02bb7c283-13. No response from receiver. Number of retry attempts left: 9
[20/Dec/2023 17:40:10] DEBUG [smart_manager.replication.sender:108] Id: 029ea547-da0b-4c23-b4f9-53c02bb7c283-13 Initial greeting: {'pool': 'test-pool', 'share': 'rep-source-share', 'snap': 'rep-source-share_13_replication_1', 'incremental': False, 'uuid': '029ea547-da0b-4c23-b4f9-53c02bb7c283'}
[20/Dec/2023 17:40:10] ERROR [smart_manager.replication.sender:75] Id: 029ea547-da0b-4c23-b4f9-53c02bb7c283-13. b'b\'receiver-init-error\' received for 029ea547-da0b-4c23-b4f9-53c02bb7c283-13. extended reply: b"Receiver(b\'029ea547-da0b-4c23-b4f9-53c02bb7c283-13\') already exists. Will not start a new one.". Aborting.'. Exception: b'b\'receiver-init-error\' received for 029ea547-da0b-4c23-b4f9-53c02bb7c283-13. extended reply: b"Receiver(b\'029ea547-da0b-4c23-b4f9-53c02bb7c283-13\') already exists. Will not start a new one.". Aborting.'

...
[20/Dec/2023 17:41:03] DEBUG [smart_manager.replication.listener_broker:61] Sender(029ea547-da0b-4c23-b4f9-53c02bb7c283_13) exited. exitcode: 3

Receiver:

tail -f /opt/rockstor/var/log/gunicorn.log

127.0.0.1 - - [20/Dec/2023:17:40:05 +0000] "POST /o/token/ HTTP/1.1" 200 118 "-" "python-requests/2.31.0" 779ms
127.0.0.1 - - [20/Dec/2023:17:40:05 +0000] "POST /api/shares HTTP/1.1" 200 1692 "-" "python-requests/2.31.0" 118ms
127.0.0.1 - - [20/Dec/2023:17:40:05 +0000] "POST /api/sm/replicas/rshare HTTP/1.1" 200 221 "-" "python-requests/2.31.0" 62ms
127.0.0.1 - - [20/Dec/2023:17:40:05 +0000] "POST /api/sm/replicas/rtrail/rshare/8 HTTP/1.1" 200 224 "-" "python-requests/2.31.0" 39ms

AND

tail -f /opt/rockstor/var/log/rockstor.log

[20/Dec/2023 17:40:10] DEBUG [smart_manager.replication.listener_broker:254] initial greeting from b'029ea547-da0b-4c23-b4f9-53c02bb7c283-13'
[20/Dec/2023 17:40:10] ERROR [smart_manager.replication.listener_broker:274] Receiver(b'029ea547-da0b-4c23-b4f9-53c02bb7c283-13') already exists. Will not start a new one.
...

[20/Dec/2023 17:41:05] DEBUG [smart_manager.replication.receiver:149] Id: b'029ea547-da0b-4c23-b4f9-53c02bb7c283-13' command: b'receiver-ready' rcommand: None
[20/Dec/2023 17:41:05] ERROR [smart_manager.replication.receiver:288] Id: b'029ea547-da0b-4c23-b4f9-53c02bb7c283-13'. No response from the broker for receiver-ready command. Aborting.
[20/Dec/2023 17:41:10] DEBUG [smart_manager.replication.listener_broker:75] Receiver(b'029ea547-da0b-4c23-b4f9-53c02bb7c283-13') exited. exitcode: 3. Total messages processed: 2. Removing from the list.
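
A note on the b'...' wrappers visible in the ids and messages logged above: interpolating a bytes value into a str, for example via an f-string, uses its repr. The sketch below reproduces that effect with the id from the logs. It assumes the ids arrive as bytes off the wire, and it does not claim this is the cause of the receiver failures.

```python
receiver_id = b"029ea547-da0b-4c23-b4f9-53c02bb7c283-13"  # assumed to arrive as bytes

# Interpolating bytes into a str uses repr(), hence the b'...' wrapper.
as_logged = f"Receiver({receiver_id}) already exists. Will not start a new one."

# Decoding first gives the plain id.
clean = f"Receiver({receiver_id.decode('utf-8')}) already exists. Will not start a new one."

# Nesting compounds the effect: str -> bytes -> str adds another b"..." layer.
nested = f"{as_logged.encode('utf-8')}"

# In Python 3 a bytes id never compares equal to its str form, so the two
# would also be distinct dict keys.
same_key = receiver_id == receiver_id.decode("utf-8")
```

Decoding ids once on receipt, and encoding once on send, avoids both the nested wrappers and any bytes versus str key mismatch.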

@phillxnet phillxnet closed this Dec 21, 2023
@phillxnet phillxnet deleted the 2766-(t)-replication-spawn-error branch December 21, 2023 16:39
@phillxnet
Member Author

This draft has been superseded by #2772 by way of rebase & squash.
