Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

invoke colcon instead of ament_tools #132

Merged
merged 15 commits into from
Apr 21, 2018
Merged

invoke colcon instead of ament_tools #132

merged 15 commits into from
Apr 21, 2018

Conversation

dirk-thomas
Copy link
Member

@dirk-thomas dirk-thomas commented Mar 21, 2018

I don't intended to merge this patch - it will stay "in progress" until the direction is decided. I will keep the branch up-to-date though to allow anyone to opt-in to use this on CI.


With this branch you can set CI_SCRIPTS_BRANCH to colcon in the job parameters to use colcon for the build. When using this branch you need to change the default value for CI_AMENT_BUILD_ARGS from:

--parallel --cmake-args -DSECURITY=ON --

to

--cmake-args " -DSECURITY=ON"

  • Parallel builds are the default (--executor parallel).
  • Since colcon uses argparse without custom parsing of greedy arguments the -D needs to be prefixed with a space so that argparse doesn't consider it an argument. The --cmake-args argument itself will then take care of removing the leading space. The terminating -- is not needed anymore.

I also often use the option --event-handler console_cohesion+ to print the output of each package "on block" once the package has finished. That makes sure the output is not interleaves but the build log still contains "all" the information.

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status
  • built without SECURITY=ON since that require changing the handling of the job parameters the PR has been updated to change the default parameters to keep security on the same way as before
  • only using FastRTPS since the RTI code generator keeps failing when building "many" packages in parallel
  • Linux coverage Build Status

Comparison of two Linux CI builds using different build tools:

@dirk-thomas dirk-thomas added the in progress Actively being worked on (Kanban column) label Mar 21, 2018
@dirk-thomas dirk-thomas self-assigned this Mar 21, 2018
@mikaelarguedas
Copy link
Member

Not stating an opinion here but I think it would be more fair to compare build time instead of the time including repeating flaky tests:

ament build time: 00:21:30
colcon build time: 00:20:04 (slightly faster)

@dirk-thomas
Copy link
Member Author

Not stating an opinion here but I think it would be more fair to compare build time instead of the time including repeating flaky tests:

With "build" time I was referring to the time the Jenkins build takes. I would expect both builds to have a similar rate of flakiness and would be surprised if they preferred happen in the ament_tools one. The overall result is still a significant speed up - I haven't looked into where the time difference originates from...

@dirk-thomas
Copy link
Member Author

Ready for review with updated CI builds:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

To iterate quickly the current state uses the default branches of the colcon repos. At some point we can / should (?) switch to the latest release.

@dirk-thomas dirk-thomas added in review Waiting for review (Kanban column) and removed in progress Actively being worked on (Kanban column) labels Apr 11, 2018
@mikaelarguedas
Copy link
Member

Is this able to build connext without crashing on the code generator ? I see that the windows job is not using connext

@dirk-thomas
Copy link
Member Author

Is this able to build connext without crashing on the code generator ?

At least the builds I ran on Jenkins didn't had that problem. But since the problem happened only very rarely in the first place I don't think I can answer that question after just a few builds.

I see that the windows job is not using connext

I triggered another Windows build with Connext: Build Status

@mikaelarguedas
Copy link
Member

I think I'm missing information to review this so forgive if the following questions have been answered elsewhere and are planned to be addressed (pointers appreciated)

If it's not desired to have the console output displayed by default in colcon. Is there a plan to add --event-handler console_cohesion+ or at least the error output to the jobs by default?

What is the plan for DSECURITY ? Do you plan on having the jobs fetch the meta information from somewhere to add the flag for fastrtps and test_security ? Or should the default values of the jobs also be modified accordingly?

Why is catkin_pkg required? Does this mean that we don;t need ament_package anymore?

@dirk-thomas
Copy link
Member Author

If it's not desired to have the console output displayed by default in colcon. Is there a plan to add --event-handler console_cohesion+ or at least the error output to the jobs by default?

Yes, we could make that the default. I can add more commits to this PR - I was mostly worried when also altering the job generation (to use different defaults and change the naming of some of the parameters) it might get too much for a review.

What is the plan for DSECURITY ? Do you plan on having the jobs fetch the meta information from somewhere to add the flag for fastrtps and test_security ? Or should the default values of the jobs also be modified accordingly?

I would keep that separate (to not make a single PR address too many things at the same time) and keep the current default value passing it to all packages.

Why is catkin_pkg required? Does this mean that we don;t need ament_package anymore?

colcon uses the API of catkin_pkg to parse the manifests and doesn't use ament_package. Either of the two would be sufficient. It just happens that the first is already available from Debian in a ROS distro independent way which isn't the case for the later.

@mikaelarguedas
Copy link
Member

mikaelarguedas commented Apr 11, 2018

I would keep that separate (to not make a single PR address too many things at the same time) and keep the current default value passing it to all packages.

So if we merge this we don't build or test security anymore, is that correct ?

I can add more commits to this PR - I was mostly worried when also altering the job generation (to use different defaults and change the naming of some of the parameters) it might get too much for a review.

I understand the concern for the review, nicely crafted commits should allow to reduce the review burden. If we want to convince ourselves about the job generation, what about a set of test jobs generated with the new config and manually triggered ?

I think that the behavior of the jobs should be the same after merging this PR, so not having console output or not building/testing part of our stack is a blocker for merging this in my opinion.

@dirk-thomas
Copy link
Member Author

So if we merge this we don't build or test security anymore, is that correct ?

No, that is not the case. Currently the default job parameters contain -DSECURITY=ON. The same will be the case after this change.

I think that the behavior of the jobs should be the same after merging this PR, so not having console output or not building/testing part of our stack is a blocker for merging this in my opinion.

I updated the PR to change the default parameters to match the same behavior for e.g. output handling (see https://github.com/ros2/ci/pull/132/files#diff-42a7cd652d15099c3f6b6c27e05dce31R82).

If we want to convince ourselves about the job generation, what about a set of test jobs generated with the new config and manually triggered ?

I intentionally didn't include any other job configuration changes in this PR. So I don't think we need to generate any temporary jobs. You can simply run trigger the current job using this branch and using the default values from this patch:

--cmake-args " -DSECURITY=ON" --event-handler console_cohesion+

@mikaelarguedas
Copy link
Member

Currently the default job parameters contain -DSECURITY=ON. The same will be the case after this change.

I must have misunderstood the description of this PR then "built without SECURITY=ON since that require changing the handling of the job parameters"

I updated the PR to change the default parameters to match the same behavior for e.g. output handling

👍 , without that change our jobs would have been broken: https://ci.ros2.org/job/ci_linux/4197/

@mikaelarguedas
Copy link
Member

I intentionally didn't include any other job configuration changes in this PR. So I don't think we need to generate any temporary jobs. You can simply run trigger the current job using this branch and using the default values from this patch:
--cmake-args " -DSECURITY=ON" --event-handler console_cohesion+

I tried to run a job with these arguments and the job crashed on argument parsing: https://ci.ros2.org/job/ci_windows/4301/

@dirk-thomas dirk-thomas force-pushed the colcon branch 2 times, most recently from 6ed5ea1 to d36619a Compare April 11, 2018 17:45
@mikaelarguedas mikaelarguedas added in progress Actively being worked on (Kanban column) and removed in review Waiting for review (Kanban column) labels Apr 11, 2018
@dirk-thomas dirk-thomas force-pushed the colcon branch 6 times, most recently from cd30776 to dc6cc3a Compare April 11, 2018 18:34
@dirk-thomas
Copy link
Member Author

dirk-thomas commented Apr 11, 2018

This is now colliding with the special handling from #86. I removed all of that logic in dc6cc3a which now requires separate jobs 😞:

@dirk-thomas
Copy link
Member Author

Rebased to fix conflicts with the recently merged PR.

if sys.platform != 'win32':
colcon_script = os.path.join(venv_path, 'bin', 'colcon')
else:
colcon_script = 'c:\\python36\\Scripts\\colcon.exe'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that we will now need to update the ci scripts on new python minor version?
Could we infer that from the environment ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that can be done automatically even better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 4fe9cfd.

rmdir /S /Q ws workspace

echo "# BEGIN SECTION: Determine arguments"
set "PATH=%PATH:"=%"
set "PATH=!PATH:"=!"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Can you clarify why we need to use ! everywhere now ? and how this plays with the delayed expansion?

Thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using % didn't work for me. I didn't spend too much effort trying to figure out why. Using ! made it work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I was wondering if this didnt work before you enabled the delayed expansion but would have worked after enabling it.

That's fine, I'm not planning on looking into it to find out

@@ -24,33 +24,37 @@


def build_and_test_and_package(args, job):
print('# BEGIN SUBSECTION: ament build')
print('# BEGIN SUBSECTION: build')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is "BEGIN SUBSECTION: colcon build" in the other file. I suggest homogenizing by just using "build", "test" and "test-result" everywhere

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 78d6cab.

@mikaelarguedas
Copy link
Member

This looks good to me now. 👍

Not sure if we prefer to roll this out before the weekend to have the time to see all jobs types running with it before Monday or if on the opposite we want to be around to monitor the jobs after deploying this

@dirk-thomas
Copy link
Member Author

I am planning on merging this in an hour so that the nightlies can run and there is time over the weekend to fix minor issues. Imo that is preferred over potentially disturbing everyone during the week. If the problems are too big (or my time too short) I will just revert it before the nightlies Sunday evening and will try again some time later... Let me know if you have any objections.

@dirk-thomas dirk-thomas merged commit 72fdf47 into master Apr 21, 2018
@dirk-thomas dirk-thomas deleted the colcon branch April 21, 2018 04:02
@dirk-thomas dirk-thomas removed the in review Waiting for review (Kanban column) label Apr 21, 2018
@dirk-thomas
Copy link
Member Author

dirk-thomas commented Apr 21, 2018

The first round of manually triggered builds (after reconfiguring the jobs) look promising:

CI builds:

  • Linux Build Status
  • Linux aarch Build Status
  • macOS Build Status
  • Windows Build Status

Packaging builds:

  • Linux Build Status
  • Linux aarch Build Status
  • macOS Build Status
  • Windows Build Status

@dirk-thomas
Copy link
Member Author

To summarize the current state:

  • all CI builds are green
  • all packaging builds are green
  • the nightly builds:
    • all release builds are green
    • the debug builds:
      • Linux, Linux aarch are green
      • macOS: the 26 test failures are all related to rviz rendering and plugins
      • Windows: the 4 test failures are from the test_showimage timing out
    • the repeated builds:
      • Linux: the 20 test failiures are composition tests, test_parameter_server_cpp, test_tutorial_parameter_events
      • Linux aarch: the 8 test failures are composition tests
      • macOS: the 10 test failures are composition, test_services_cpp, rclcpp.TestNode.now
      • Windows: the 2 test failures are from the test_parameter_server_cpp

Neither of the failing tests strikes me as likely being related to this patch.

@dhood
Copy link
Member

dhood commented Apr 25, 2018

If I may add my perspective on the current state:

The composition tests have always been a bit flaky with fastrtps, but since April 21 they started failing regularly again, including with connext. Perhaps because of the same cause as ros2/build_farmer#114 (whatever it is). It may have been caused by something unrelated to colcon and the timing was just a coincidence, but AFAICS colcon hasn't been ruled out yet.

The packaging jobs are green but the output from them is not yet usable (fix ready in #158 but overlays not tested yet). Users of nightlies also haven't been informed of the switch that has occurred.

@mikaelarguedas
Copy link
Member

Few additions that may help us segregate the failures:

It indeed looks like the various fixes in testing and packaging yesterday reduced the number of failures

macOS: the 26 test failures are all related to rviz rendering and plugins

👍 from today's meeting, @nuclearsandwich is about to disable them

Windows: the 4 test failures are from the test_showimage timing out

👍 not related to this patch, these failures are due to OpenCV libraries not being installed properly on the build machines:
see #146 (comment) and ros2/build_farmer#113 for details

all packaging builds are green

Packaging jobs are green, but the packaging artifacts are still unusable ATM. @dhood has been investigating and @mjcarroll @sloretz are testing the ones from #158 to see if the resulting workspaces can be source and used as underlays

@dirk-thomas
Copy link
Member Author

Users of nightlies also haven't been informed of the switch that has occurred.

I couldn't find anywhere in the docs where we point users to the nightly packaging jobs. Do you have a pointer for me? I am more than happy to add a note for the changed script name.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants