(t) replication spawn error #2766

Closed
phillxnet opened this issue Dec 14, 2023 · 3 comments
@phillxnet
Member

Thanks to forum member niko for highlighting this issue. The testing branch exhibits the following error when attempting to activate a replication receiver:

Failed to start Replication due to an error: Error running a command. cmd = /opt/rockstor/.venv/bin/supervisorctl start replication. rc = 7. stdout = ['replication: ERROR (spawn error)', '']. stderr = ['']

Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/smart_manager/views/replication_service.py", line 82, in post
    superctl(service.name, command)
  File "/opt/rockstor/src/rockstor/system/services.py", line 147, in superctl
    out, err, rc = run_command([SUPERCTL_BIN, switch, service], throw=throw)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/rockstor/src/rockstor/system/osi.py", line 263, in run_command
    raise CommandException(cmd, out, err, rc)
system.exceptions.CommandException: Error running a command. cmd = /opt/rockstor/.venv/bin/supervisorctl start replication. rc = 7. stdout = ['replication: ERROR (spawn error)', '']. stderr = ['']

Forum reference: https://forum.rockstor.com/t/failed-to-start-replication-due-to-an-error-error-running-a-command-cmd-opt-rockstor-venv-bin-supervisorctl-start-replication-rc-7-stdout-replication-error-spawn-error-stderr/9136

@phillxnet
Member Author

supervisord.log for the above, from an rpm instance built against an ongoing PR (5.0.5-2764) and running on Leap 15.5:

tail -f /opt/rockstor/var/log/supervisord.log 
...
2023-12-14 17:09:48,073 INFO spawned: 'replication' with pid 6202
2023-12-14 17:09:50,468 INFO success: replication entered RUNNING state, process has stayed up for > than 2 seconds (startsecs)
2023-12-14 17:09:50,468 INFO exited: replication (exit status 1; not expected)
2023-12-14 17:09:50,470 INFO spawned: 'replication' with pid 6288
2023-12-14 17:09:51,758 INFO exited: replication (exit status 1; not expected)
2023-12-14 17:09:52,760 INFO spawned: 'replication' with pid 6373
2023-12-14 17:09:54,022 INFO exited: replication (exit status 1; not expected)
2023-12-14 17:09:56,026 INFO spawned: 'replication' with pid 6458
2023-12-14 17:09:57,302 INFO exited: replication (exit status 1; not expected)
2023-12-14 17:10:00,308 INFO spawned: 'replication' with pid 6547
2023-12-14 17:10:02,619 INFO success: replication entered RUNNING state, process has stayed up for > than 2 seconds (startsecs)
2023-12-14 17:10:02,620 INFO exited: replication (exit status 1; not expected)
2023-12-14 17:10:03,623 INFO spawned: 'replication' with pid 6879
2023-12-14 17:10:04,957 INFO exited: replication (exit status 1; not expected)
2023-12-14 17:10:05,960 INFO spawned: 'replication' with pid 6974
2023-12-14 17:10:07,282 INFO exited: replication (exit status 1; not expected)
2023-12-14 17:10:09,287 INFO spawned: 'replication' with pid 7078
2023-12-14 17:10:10,821 INFO exited: replication (exit status 1; not expected)
2023-12-14 17:10:13,827 INFO spawned: 'replication' with pid 7188
2023-12-14 17:10:15,233 INFO exited: replication (exit status 1; not expected)
2023-12-14 17:10:16,235 INFO gave up: replication entered FATAL state, too many start retries too quickly

and from the more specific log we have:

tail -f /opt/rockstor/var/log/supervisord_replication_stderr.log 
... repeating instances of:

Traceback (most recent call last):
  File "/opt/rockstor/.venv/bin/replicad", line 3, in <module>
    from smart_manager.replication.listener_broker import main
  File "/opt/rockstor/src/rockstor/smart_manager/replication/listener_broker.py", line 27, in <module>
    from sender import Sender
ModuleNotFoundError: No module named 'sender'

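So we may just have a leftover Python 2 style implicit relative import here (Python 3 no longer resolves `from sender import Sender` against the importing module's own package), or the like.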
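
A minimal sketch of why an import of this shape fails under Python 3, using a throwaway package built in a temporary directory. The names `demo_pkg`, `demo_sender`, and `broker` are hypothetical stand-ins, not Rockstor modules. Python 3 dropped implicit relative imports, so a sibling module has to be imported either explicitly relative (`from .x import ...`) or via its full package path.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def try_import(import_line: str) -> str:
    """Build a throwaway package whose broker module contains import_line,
    import it, and report whether the import succeeded."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"  # hypothetical package name
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "demo_sender.py").write_text("class Sender:\n    pass\n")
        (pkg / "broker.py").write_text(import_line + "\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            importlib.import_module("demo_pkg.broker")
            return "ok"
        except ModuleNotFoundError as e:
            return f"ModuleNotFoundError: {e}"
        finally:
            # Clean up so each call starts from a fresh import state.
            sys.path.remove(tmp)
            for name in [n for n in sys.modules if n.startswith("demo_pkg")]:
                del sys.modules[name]


if __name__ == "__main__":
    for line in (
        "from demo_sender import Sender",           # Python 2 style, implicit relative
        "from .demo_sender import Sender",          # explicit relative
        "from demo_pkg.demo_sender import Sender",  # absolute via package path
    ):
        print(f"{line!r} -> {try_import(line)}")
```

Only the first form fails, and it does so with the same kind of `ModuleNotFoundError` as in the stderr log above.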

@phillxnet phillxnet added this to the 5.1.X-X Stable release milestone Dec 15, 2023
@phillxnet phillxnet self-assigned this Dec 16, 2023
phillxnet added a commit to phillxnet/rockstor-core that referenced this issue Dec 16, 2023
Modernise previously missed replication imports re Py3.*
@phillxnet
Member Author

Diagnostic notes:

Receiver

rleap15-5:~ # ss -tulpn | grep 10002
tcp   LISTEN 0      100                       192.168.2.199:10002      0.0.0.0:*    users:(("replicad",pid=2505,fd=15))
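
The same listening check can be made over the network, for instance from the sender. A minimal sketch, assuming only that replicad on the receiver accepts TCP connections on the port shown above; `port_is_open` is a hypothetical helper, not part of Rockstor.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example, using the address and port from the ss output above:
#   port_is_open("192.168.2.199", 10002)
```

A successful connect only shows that something is listening on the port; it does not confirm that the replication handshake itself works.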

phillxnet added a commit to phillxnet/rockstor-core that referenced this issue Dec 19, 2023
- Force bytes format for replication messages and commands.
- Minor modification re Python 3 behaviour re dict.keys(),
we relied on an implicit Python 2 behaviour.
- Move to Fstrings.
- Temp debug prints.
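
For context on the `dict.keys()` item: in Python 2 `keys()` returned a list, while in Python 3 it returns a live view, so indexing it or changing the dict while iterating over it no longer works. The commit message does not say which of these the replication code relied on. A minimal sketch with made-up data (`prune_finished` and the dictionary contents are hypothetical):

```python
def prune_finished(workers: dict) -> dict:
    """Drop entries marked 'finished', iterating over a snapshot of the keys."""
    # list(...) copies the keys first. Iterating workers.keys() directly while
    # deleting raises "RuntimeError: dictionary changed size during iteration"
    # in Python 3, whereas Python 2's keys() was already a separate list.
    for name in list(workers.keys()):
        if workers[name] == "finished":
            del workers[name]
    return workers


def first_key(d: dict):
    """Return the first key; d.keys()[0] raises TypeError in Python 3."""
    return list(d)[0]


if __name__ == "__main__":
    print(prune_finished({"share-a": "finished", "share-b": "running"}))
    print(first_key({"share-b": "running"}))
```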
phillxnet added a commit to phillxnet/rockstor-core that referenced this issue Dec 20, 2023
- more str to byte for zmq use given our Py3 base now.
- more fstring conversions.
- further parameter/return type typecasting.
- removed unused variable.
- black format update
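
On the str-to-bytes items: Python 3 keeps `str` and `bytes` separate, and pyzmq's `send()` and `send_multipart()` expect bytes-like frames, so text commands have to be encoded before sending and decoded after receiving. A minimal sketch without a real socket; the helper names and the two-frame layout are assumptions for illustration, not the actual Rockstor message format.

```python
def encode_frames(command: str, payload: bytes = b"") -> list[bytes]:
    """Turn a text command plus a binary payload into bytes-only frames."""
    return [command.encode("utf-8"), payload]


def decode_frames(frames: list[bytes]) -> tuple[str, bytes]:
    """Inverse of encode_frames: command back to str, payload left as bytes."""
    command, payload = frames
    return command.decode("utf-8"), payload


if __name__ == "__main__":
    frames = encode_frames("receiver-ready", b"snap-1")
    print(frames)
    print(decode_frames(frames))
```

Keeping names as `str` internally and encoding only at the send boundary keeps the type hints simple.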
phillxnet added a commit to phillxnet/rockstor-core that referenced this issue Dec 21, 2023
Update replication code re Py3.*
- Modernise previously missed replication imports re Py3.*
- Force bytes format for replication messages and commands.
Required as zmq needs bytes format for these.
- Minor modification re Python 3 behaviour re dict.keys(),
we relied on an implicit Python 2 behaviour.
- Move to Fstrings.
- Parameter/return type hinting.
- Removed an unused local variable.
- black format update
phillxnet added a commit to phillxnet/rockstor-core that referenced this issue Dec 21, 2023
- complete fstrings conversion for all files modified.
phillxnet added a commit to phillxnet/rockstor-core that referenced this issue Dec 29, 2023
- Improve diagnostic content of receiver failing to
retrieve sender's IP address from sent appliance ID.
phillxnet added a commit to phillxnet/rockstor-core that referenced this issue Dec 29, 2023
- more debug logging
- remove receiver 'latest_snap or b""' to simplify for debug.
phillxnet added a commit to phillxnet/rockstor-core that referenced this issue Dec 30, 2023
- More debug logging.
- reduce retry iterations from 10 to 3.
- more type hints.
phillxnet added a commit to phillxnet/rockstor-core that referenced this issue Dec 31, 2023
- Yet more debug logging.
- Remove use of None from within zmq command/message passing:
to help with stricter type hinting.
phillxnet added a commit to phillxnet/rockstor-core that referenced this issue Dec 31, 2023
- return prior latest_snap name send as message in receiver-ready.
- remove libzmq socket.set_hwm.
phillxnet added a commit to phillxnet/rockstor-core that referenced this issue Jan 2, 2024
- refactor poll -> poller socks -> events for readability.
- additional explanatory comments re sockets etc.
- formatting typo re fstrings.
- additional typing.
phillxnet added a commit to phillxnet/rockstor-core that referenced this issue Jan 2, 2024
- Enable tracker on receiver's response: to assist debug.
phillxnet added a commit to phillxnet/rockstor-core that referenced this issue Jan 3, 2024
- Enable tracker on listener_broker and sender:
- add zmq_version and libzmq_version properties to sender,
receiver also now has these.
phillxnet added a commit to phillxnet/rockstor-core that referenced this issue Jan 4, 2024
- Enable trackers on listener_broker.
- more type-hinting.
phillxnet added a commit to phillxnet/rockstor-core that referenced this issue Jan 5, 2024
phillxnet added a commit to phillxnet/rockstor-core that referenced this issue Jan 5, 2024
- more str/byte issues.
- iostream behaviour differs, in-dev modifications.
phillxnet added a commit to phillxnet/rockstor-core that referenced this issue Jan 8, 2024
- additional type-hinting and fix for very low send byte count.
- harmonize on btrfs binary location to fs.btrfs for replication.
- readability refactoring.
- more byte/str fixes re Py2.7/Py3
phillxnet added a commit to phillxnet/rockstor-core that referenced this issue Jan 9, 2024
- minor additional refactoring for clarity.
- keep receiver self.share/snap naming as str, encode before
send only.
- more type-hints.
phillxnet added a commit to phillxnet/rockstor-core that referenced this issue Jan 9, 2024
- During debug logging we only show the first 180 bytes of the
message; this avoids log-spamming MBs of btrfs-send stream data.
- Send btrfs-send byte stream in 10MB chunks.
phillxnet added a commit to phillxnet/rockstor-core that referenced this issue Jan 10, 2024
- Avoid logging btrfs data stream contents entirely.
- Set read1() bytes read to 100MB max.
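
A minimal sketch of reading a byte stream in bounded chunks with `read1()`, assuming a buffered binary stream such as a subprocess's stdout. `io.BytesIO` stands in for the btrfs-send stream, and `iter_chunks` and the tiny chunk size are hypothetical; the commit above caps each read at 100MB.

```python
import io
from typing import Iterator


def iter_chunks(stream: io.BufferedIOBase, max_bytes: int) -> Iterator[bytes]:
    """Yield chunks of at most max_bytes until the stream is exhausted."""
    while True:
        # read1() makes at most one read on the underlying raw stream, so it
        # returns whatever is available (up to max_bytes) instead of waiting
        # to fill the whole request.
        chunk = stream.read1(max_bytes)
        if not chunk:  # b"" signals end of stream
            break
        yield chunk


if __name__ == "__main__":
    fake_stream = io.BytesIO(b"0123456789")
    for chunk in iter_chunks(fake_stream, max_bytes=4):
        print(len(chunk), chunk)
```

Each chunk could then be sent as its own message, which bounds memory use per send regardless of the total stream size.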
phillxnet added a commit to phillxnet/rockstor-core that referenced this issue Jan 15, 2024
Update replication code re Py2.7 to Py3.11.
- Modernise previously missed replication imports re Py3.*
- Force bytes format for replication messages and commands.
Zmq requires bytes format.
- Minor modification re Python 3 behaviour re dict.keys(),
we previously relied on an implicit Python 2 behaviour.
- Move to Fstrings for all issue focused files.
- Parameter/return type hinting.
- Removed an unused local variable.
- black format update
- Improve error diagnostic content of receiver failing to
retrieve sender's IP address from sent appliance ID.
- Improve debug logging.
- Remove receiver 'latest_snap or b""' argument to improve
readability.
- reduce retry iterations from 10 to 3.
- Remove use of None from within zmq command/message passing:
to help with stricter type hinting.
- remove libzmq socket.set_hwm.
- refactor poll -> poller socks -> events for readability.
- additional explanatory comments re sockets etc.
- Enable tracker on listener_broker, sender, and receiver's
response: improves robustness, and aids in debugging.
- add zmq_version and libzmq_version properties to sender and
receiver.
- adapt iostream behaviour: this differs between Py2.7 & Py3.*.
- Fix existing bug re very low send byte count.
- harmonize on btrfs binary location to fs.btrfs for replication.
- readability refactoring improvements.
- keep receiver self.share/snap naming as str, encode before
send only.
- Avoid logging btrfs data stream contents.
- Set read1() bytes read to 100MB max.
phillxnet added a commit that referenced this issue Jan 15, 2024
@phillxnet
Member Author

Closing as:
Fixed by #2777
