End-to-end test coverage for CLI commands output #304
Conversation
There's still the open question of whether this is enough for end-to-end testing (let alone unit tests, which are not covered by this PR), or even whether the behavior currently shown is appropriate. We'll have to iterate on it as we move forward.
Force-pushed from cbbaa69 to 07a0883.
OK, I'll stop here for the reasons given in this PR's description, and because this is getting larger than I'm usually comfortable with.
I haven't read the full patch yet - just a few high-level comments:
- The size of the tests makes me wonder whether we really can't write them more compactly. Is there anything we could do API- or feature-wise that would help reduce this?
- What is the reason that all the tests are bundled in the separate package test_ros2cli? What is preventing them from being implemented in the packages whose commands/verbs they are testing?
test_ros2cli/package.xml
Outdated
@@ -12,11 +12,21 @@
 <test_depend>action_tutorials</test_depend>
 <test_depend>ament_lint_auto</test_depend>
 <test_depend>ament_lint_common</test_depend>
+<test_depend>demo_nodes_py</test_depend>
This will create a repository-level cycle, since packages in the repository containing demo_nodes_py depend on packages in this repo. IMO we shouldn't do that.
OK, that is true for ros2run. It's another instance of an issue we have elsewhere, e.g. in launch_ros, and one I was consciously sidestepping to avoid scope creep. IMO we shouldn't be using demo nodes for system or integration testing, nor duplicating the ros2/system_tests talkers, listeners, servers, and so on. We need a package with test nodes that ros2/ros2cli, ros2/launch*, ros2/system_tests, etc. can all depend on, probably in its own repo, like we did for test_interface_files.
I can put this on hold, do that and come back, or fix it afterwards. What would you rather do?
IMO a package/repo like test_nodes sounds like a good idea.
test_ros2cli/CMakeLists.txt
Outdated
if("${TEST_RMW_IMPLEMENTATION}" STREQUAL "rmw_connext_cpp") | ||
# Connext startup is too slow. It needs a few extra seconds till | ||
# discovery completes. | ||
set(TEST_SETUP_DELAY 5.0) |
That looks problematic because it:
- is fragile in case Connext just takes a second longer
- slows down the tests in case Connext happens to be faster than the threshold
- potentially needs to be done for other RMW implementations as well
Instead, we should find a way to write the tests to work without this delay.
It is problematic, I know. Truth be told, all these tests have a delay to deal with CPU load fluctuations, uneven discovery times, etc., which makes them all flaky. I had to remove the publishing/echoing rate tests because they're completely inconsistent, time-wise.
Now, to fix this. Many tests don't need this delay at all, e.g. ros2 msg list, so we can cut down time there. For those that do need it, I see two kinds: those that are bound by communication latency and those that are bound by discovery time.
For the first kind, we can wait for specific output or events, e.g. for ros2 topic bw, wait until we have any (non-filtered) output before starting to test it.
For the second kind, e.g. ros2 topic list, it gets harder. Assuming the daemon is there, we may repeatedly execute the command, though that isn't really less fragile than a delay (and I can't figure out how to pull that off using ros_testing as it is today). We may also wait for the ROS graph to show some nodes, but since the CLI command, the CLI daemon, and the launch test are three separate processes, it may not work as intended.
Alternatively, I think we could make our CLI more predictable. Just as ros2 service call waits for the server to be available, ros2 service list could take an optional ROS graph condition to wait for, e.g. ros2 service list --when "'my_server' in nodes". The same goes for all discovery-time-bound commands. What do you think?
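For concreteness, a minimal sketch of the repeated-execution idea, using plain subprocess rather than ros_testing (which, as noted above, lacks support for this today); the topic name is illustrative:

```python
import re
import subprocess
import time


def wait_for_cli_output(command, pattern, timeout=30.0, period=1.0):
    """Re-run `command` until its stdout matches `pattern` or `timeout` expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = subprocess.run(command, capture_output=True, text=True)
        if re.search(pattern, result.stdout, re.MULTILINE):
            return True  # desired output observed; the test can proceed
        time.sleep(period)  # the graph may still be converging; try again
    return False  # discovery never completed within the allotted time


# e.g. block until a hypothetical /my_ns/chatter topic shows up:
assert wait_for_cli_output(['ros2', 'topic', 'list'], r'^/my_ns/chatter$')
```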
> Assuming the daemon is there, we may repeatedly execute the command, though that isn't really less fragile than a delay.

Why would that not be less fragile? It will pass the test as soon as the command returns the desired output, and only runs up to a maximum timeout if the command doesn't.
> Just like ros2 service call awaits for the server to be available, ros2 service list could have an optional ROS graph condition to wait for, e.g. ros2 service list --when "'my_server' in nodes".

This is a good example of how not to add conditions, IMO. If you want to wait for X (the services of a node being available), you can't use a condition Y (checking that the node has been discovered), simply because there is still a race: the node might become visible shortly before its services are actually advertised.
> Why would that not be less fragile? It will pass the test as soon as the command returns the desired output, and only runs up to a maximum timeout if the command doesn't.

I meant that neither option makes it not fragile. You may not delay long enough, or you may not wait long enough. Sure, we can increase the timeout a lot to increase the likelihood of discovery finding all peers, but that doesn't make it not fragile: if CI nodes are running under enough load, tests may still flake.
Anyway, even if we go down that route, we lack support in launch testing to do it, so we'd have to go back to that first.

> Simply because there is still a race since the node might become visible shortly before the services are actually advertised.

Yes, that doesn't make it not fragile either; I should've said that. I'm not well versed enough in DDS discovery mechanisms: is it atomic in any respect, e.g. are some chunks of information guaranteed to arrive in a single block?
> Sure, we can increase the timeout a lot to increase the likelihood of discovery finding all peers, but that doesn't make it not fragile. If CI nodes are running under enough load, tests may still flake.

I don't think that is the definition of flaky. The fact that we have to define an upper bound on how long we are willing to wait is fine.
What do you mean by size? These tests leverage the commonalities in the CLI commands to have a single parameterized test.
There's currently no way to set up launch tests in pure Python packages (see ros2/launch#237).
test_ros2cli/test/test_cli.py.in
Outdated
    OpaqueFunction(function=lambda context: ready_fn())
)
launch_description = LaunchDescription([
    # Always restart daemon to isolate tests.
Maybe instead of doing this it would be simpler to add a --no-daemon option to all the commands, so they don't start the daemon on demand?
Hmm, that is a good idea, but won't it result in test instances affecting each other through daemon state?
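For reference, a minimal sketch of what such an opt-out flag could look like; at the time of this discussion it's a proposal, not an existing ros2cli option, and the helper name is made up:

```python
import argparse


def add_daemon_arguments(parser):
    """Sketch: the suggested opt-out flag each command could expose."""
    parser.add_argument(
        '--no-daemon', action='store_true',
        help='Do not spawn nor use an already running daemon')


parser = argparse.ArgumentParser(prog='ros2 topic')  # any command
add_daemon_arguments(parser)
args = parser.parse_args(['--no-daemon'])
print(args.no_daemon)  # True: skip the daemon, query the graph directly
```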
test_ros2cli/CMakeLists.txt
Outdated
add_custom_cli_test(
  test_ros2interface
  CONFIG_MODULE ros2interface_test_configuration
  TIMEOUT 240)
The sum of all timeouts is 31 min. Is that really necessary?
It's insane. I was going to bring it up in tomorrow's meeting. And yes, it used to be half that, and tests would time out in CI. By cutting down delays we may be able to reduce it, but I wouldn't expect a large reduction. Connext is very slow.
Each actual test file is also super long and verbose. I don't see an obvious way to reduce that, though.
I think it would be very important to make it possible to write a test function which can run a launch test. The fact that this PR moves all the tests into a separate package is a good indicator that not having that functionality results in new code being written which immediately introduces tech debt.
If we like this way of specifying tests we may standardize it, but no, it's not provided by
I agree with that, though it's not a blocker for these tests to land.
I left some minimal comments.
ros2node/ros2node/verb/list.py
Outdated
@@ -37,4 +37,4 @@ def main(self, *, args):
     if args.count_nodes:
         print(len(node_names))
     elif node_names:
-        print(*[n.full_name for n in node_names], sep='\n')
+        print(*sorted(n.full_name for n in node_names), sep='\n')
Why sort them? Was the order random?
I believe they show up in discovery order, which is non-deterministic, yes.
output.extend((' ' + yaml.dump({
    'partial_sequence': sequence
})).splitlines())
output.append('')
same
sys.path.append(os.path.dirname(__file__))

from cli_test_configuration import CLITestConfiguration  # noqa
Prefer noqa: I100 over a wildcard noqa (I know the tests were already doing that, and that I wrote it 😄).
Same in many places.
'Waiting for an action server to become available...',
'Sending goal:',
'   order: {}'.format(order),
'',
Why?
'   order: {}'.format(order),
'',
re.compile('Goal accepted with ID: [a-f0-9]+'),
'',
same
),
CLITestConfiguration(
    command='interface',
    arguments=['show', 'std_msgs/msg/String'],
show should be tested with an action and a srv too.
),
CLITestConfiguration(
    command='msg',
    arguments=['show', 'std_msgs/msg/NotAMessageType'],
I would add a test case doing:
arguments=['show', 'std_msgs/srv/NotAMessageType']
and another doing:
arguments=['show', 'std_msgs/action/NotAMessageType']
Actually, I'm curious whether they work correctly.
Well, this is my result:
ros2 msg show std_msgs/action/String
string data
...so it isn't working correctly.
and this one is fun:
ros2 msg show std_msgs/asd/String
string data
That's interesting.
@ivanpauno I can confirm the interface introspection CLIs ignore the interface type namespace. I've commented out those tests; we should address that in a follow-up PR.
test_ros2cli/test/test_cli.py.in
Outdated
def @TEST_NAME@(self, proc_info, command_under_test, test_configuration):
    """Test that the command under test finished in a finite amount of time."""
    success = proc_info.waitForShutdown(
        process=command_under_test, timeout=test_configuration.timeout
waitForShutdown could take an optional keyword argument, e.g. assert: bool = False. That way, we wouldn't need the extra if ...: assert ... here. I think it may be useful.
Note: we also have assertWaitForShutdown, but it's less versatile than an extra argument.
Note: I won't delete the original waitForShutdown.
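A rough sketch of the suggested signature, on a stand-in class. Since assert is a reserved word in Python, a real keyword argument would need a different spelling; assert_on_timeout and _poll_for_shutdown are hypothetical names:

```python
class ProcInfoHandlerSketch:
    """Stand-in for launch_testing's process info handler (illustrative only)."""

    def waitForShutdown(self, process=None, timeout=10, assert_on_timeout=False):
        success = self._poll_for_shutdown(process, timeout)
        if assert_on_timeout:
            # Folds the caller's `if not success: assert ...` boilerplate
            # into the call itself.
            assert success, 'Command under test did not finish in time'
        return success

    def _poll_for_shutdown(self, process, timeout):
        # Placeholder for the real polling implementation.
        return True
```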
get_add_two_ints_server_action(service_name='_add_two_ints')],
expected_output=itertools.chain(
    # Cope with launch internal ROS 2 node.
    itertools.repeat(re.compile(r'/launch_ros/.*parameter.*'), 6),
We are checking for an exact number of lines here (6), for something that may easily change.
I don't know whether it's better to check just that some lines are in the output, or to check for the exact output. This applies in many places, but I don't have a strong opinion about what's better.
Checking that lines are in the output, pro: the test needs updating less often.
Checking that lines are in the output, con: maybe the test should have been updated (adding extra checks) after a change elsewhere, and it wasn't.
> Checking that lines are in the output, con: maybe the test should have been updated (adding extra checks) after a change elsewhere, and it wasn't.

That's the main reason why I went for a restrictive rather than a permissive test. If we're to trust CLI release testing to these tests, I'm more comfortable being as precise as possible about how the commands should behave.
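To make the trade-off concrete, here is a sketch of the two matching strategies under discussion. The helper names are made up; expected entries may be literal lines or compiled regexes, as in this PR's test configurations:

```python
import re


def matches(expected, actual):
    """`expected` is a literal line or a compiled regex."""
    if isinstance(expected, re.Pattern):
        return expected.match(actual) is not None
    return expected == actual


def strict_check(expected_lines, output_lines):
    """Restrictive: every line must match, in order, with nothing extra."""
    return len(expected_lines) == len(output_lines) and all(
        matches(e, a) for e, a in zip(expected_lines, output_lines))


def permissive_check(expected_lines, output_lines):
    """Permissive: each expected line appears somewhere in the output."""
    return all(
        any(matches(e, a) for a in output_lines) for e in expected_lines)
```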
),
CLITestConfiguration(
    command='topic',
    arguments=['pub', '/my_ns/chatter', 'std_msgs/msg/String', '{data: test}'],
Should the test also subscribe to the topic and check that the message is received?
I think that checking the output is not enough in this case.
That's why there's a listener, to check its output. Yes, we're trusting the listener in the process...
> I think that most of the code is just test cases for parameterizing the test.

I agree with this point. It's probably a bit difficult to implement. I made a comment about improving the API.
That is true, @dirk-thomas. Test parameterizations are as few and as compact as they can be; we simply have way too many combinations that need to be tested. But the test itself could use an API simplification.
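As an illustration of that parameterization, a simplified stand-in for how a list of configurations could drive a single test body. SimpleNamespace replaces the real CLITestConfiguration, the runner is much simpler than the actual launch-based one, and the expected output shown is just an example:

```python
import subprocess
from types import SimpleNamespace


# Stand-in for the PR's CLITestConfiguration; the real class has more fields.
def CLITestConfiguration(**kwargs):
    return SimpleNamespace(timeout=60, **kwargs)


test_configurations = [
    CLITestConfiguration(
        command='msg',
        arguments=['show', 'std_msgs/msg/String'],
        expected_output=['string data'],
    ),
    # ...one entry per command/verb combination under test
]

for config in test_configurations:
    result = subprocess.run(
        ['ros2', config.command, *config.arguments],
        capture_output=True, text=True, timeout=config.timeout)
    # Restrictive comparison, as discussed above: full output must match.
    assert result.stdout.splitlines() == config.expected_output
```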
Force-pushed from 08d5088 to e75d98e.
Force-pushed from e75d98e to 8a16abd.
Force-pushed from a15df96 to 7c6ae38.
@ivanpauno @dirk-thomas rebased and ready for re-review. bf24f0d is a major refactor of this PR, so anything before that commit is likely outdated. PTAL when you have time. Splitting it up is also an option if you find this PR hard to grok.
Considering how CI looks atm, I don't think we'll get to merge this patch today. There are test fixture executables failing on specific platforms in a variety of ways, some even segfaulting on macOS, while nightlies are all green. That means this is going to require further investigation on each platform.
IMO, this can be merged after the feature freeze. It's just adding tests, no features.
The same applies to that one. Test failures are mostly unrelated to this PR, and due to other packages.
Alright, recent builds from CI also experienced some issues, hence the current failed builds. I'll re-trigger after solving some other unrelated failures.
Force-pushed from 8f2f2d1 to 6dc5d0c.
Rebased to resolve conflicts.
CI (full, against Fast-RTPS, Connext, and OpenSplice)
- action command
- service command
- topic command
- interface command
- node command
- msg command
- srv command
Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
Improved test coverage for:
- ros2action
- ros2service
- ros2topic
- ros2msg
- ros2srv
- ros2interface
- ros2node
- ros2pkg
Signed-off-by: Michel Hidalgo <michel@ekumenlabs.com>
Force-pushed from 6dc5d0c to e6f5893.
Alright, @ivanpauno @mjcarroll, may I ask for a final review? Test failures on Linux and macOS are all unrelated to this patch, and I had to re-trigger Windows CI because it failed for unrelated reasons (yet again). I'd like to merge this ASAP to avoid further conflicts with any new changes.
LGTM!
It would be good to immediately follow up on the TODOs (which are bugs unrelated to this PR that should be fixed).
Please double-check that all CI failures are unrelated.
Alright, finally green! Though that means there's some flakiness that needs to be addressed. @mjcarroll, may I proceed with this PR and any follow-up ones after it?
This pull request aims to add full end-to-end test coverage to several ros2 CLI commands (originally all of them; param, run, component, lifecycle, daemon, and launch have since been struck from the list). Note: this list will be updated as progress is made.
Test coverage for the commands that have been struck through, i.e. param, component, lifecycle, etc., will not be provided here but in follow-up PRs. These commands have side effects beyond their execution lifetime that have to be checked in ways that are specific to each command; thus, they don't lend themselves well to integration with the "standardized" testing device used for the other commands.
Connected to ros2/ros2#739.