sometimes could not find cached deb package for test_depend? #539

Closed
k-okada opened this issue Apr 26, 2018 · 6 comments

@k-okada
Contributor

k-okada commented Apr 26, 2018

I have the following error on the buildfarm (always on arm, and sometimes on amd64/i386). It seems the jobs cannot find the deb package for a test_depend:

  • 51 agent-d0968e8a NG
  • 52 agent-6b80438 OK
  • 53 agent-9da19ae8 OK
  • 54 agent-0b3a0b30 NG
  • 55 agent-e26dbb61 NG
  • 56 agent-e26dbb61 OK

http://build.ros.org/job/Kbin_uX64__nextage_calibration__ubuntu_xenial_amd64__binary/57/console

# BEGIN SUBSECTION: append build timestamp
23:23:25 dpkg-parsechangelog: warning:     debian/changelog(l5): found trailer where expected start of change data
23:23:25 LINE:  -- TORK <dev@opensource-robotics.tokyo.jp>  Tue, 16 Jan 2018 00:00:00 -0000
23:23:25 Invoking 'debchange -v 0.8.4-0xenial-20180416-222325-0800 -p -D xenial -u high -m Append timestamp when binarydeb was built.' in '/tmp/binarydeb/ros-kinetic-nextage-calibration-0.8.4'
23:23:25 debchange: warning:     debian/changelog(l5): found trailer where expected start of change data
23:23:25 LINE:  -- TORK <dev@opensource-robotics.tokyo.jp>  Tue, 16 Jan 2018 00:00:00 -0000
23:23:25 # END SUBSECTION
23:23:28 Looking for the '.dsc' file of package 'ros-kinetic-nextage-calibration' with version '0.8.4-0'
23:23:29 Traceback (most recent call last):
23:23:29   File "/usr/lib/python3/dist-packages/apt/cache.py", line 194, in __getitem__
23:23:29     return self._weakref[key]
23:23:29   File "/usr/lib/python3.5/weakref.py", line 131, in __getitem__
23:23:29     o = self.data[key]()
23:23:29 KeyError: 'ros-kinetic-nextage-gazebo'
23:23:29 
23:23:29 During handling of the above exception, another exception occurred:
23:23:29 
23:23:29 Traceback (most recent call last):
23:23:29   File "/usr/lib/python3/dist-packages/apt/cache.py", line 198, in __getitem__
23:23:29     rawpkg = self._cache[key]
23:23:29 KeyError: 'ros-kinetic-nextage-gazebo'
23:23:29 
23:23:29 During handling of the above exception, another exception occurred:
23:23:29 
23:23:29 Traceback (most recent call last):
23:23:29   File "/tmp/ros_buildfarm/scripts/release/create_binarydeb_task_generator.py", line 163, in <module>
23:23:29     main()
23:23:29   File "/tmp/ros_buildfarm/scripts/release/create_binarydeb_task_generator.py", line 85, in main
23:23:29     apt_cache, debian_pkg_names)
23:23:29   File "/tmp/ros_buildfarm/ros_buildfarm/common.py", line 144, in get_binary_package_versions
23:23:29     pkg = apt_cache[debian_pkg_name]
23:23:29   File "/usr/lib/python3/dist-packages/apt/cache.py", line 200, in __getitem__
23:23:29     raise KeyError('The cache has no package named %r' % key)
23:23:29 KeyError: "The cache has no package named 'ros-kinetic-nextage-gazebo'"
23:23:30 Build step 'Execute shell' marked build as failure
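
The traceback above boils down to the strict dependency lookup in `get_binary_package_versions`: every declared dependency is resolved against the apt cache, and a single missing deb aborts the whole job. A minimal sketch of that behavior (a plain `dict` stands in for `apt.Cache`, and the simplified function body is an assumption based on the traceback, not the actual ros_buildfarm source):

```python
# Simplified model of ros_buildfarm/common.py:get_binary_package_versions,
# reconstructed from the traceback above. A plain dict stands in for
# apt.Cache; both raise KeyError for a package absent from the cache.
def get_binary_package_versions(apt_cache, debian_pkg_names):
    versions = {}
    for debian_pkg_name in debian_pkg_names:
        # apt.Cache.__getitem__ raises KeyError when the deb is not
        # present in any configured repository.
        versions[debian_pkg_name] = apt_cache[debian_pkg_name]
    return versions

# ros-kinetic-nextage-gazebo was deleted from the building repository
# mid-build, so the lookup fails just as in the console log.
cache = {'ros-kinetic-nextage-calibration': '0.8.4-0'}
try:
    get_binary_package_versions(
        cache,
        ['ros-kinetic-nextage-calibration', 'ros-kinetic-nextage-gazebo'])
except KeyError as exc:
    print('missing:', exc)
```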
@mikaelarguedas
Contributor

Were the two packages released around the same time?

This looks like a build race that can happen when the two packages build in parallel (nextage_calibration started building just before nextage_gazebo).

In that case the deb of nextage_gazebo is deleted from the building repository to make room for the new one being built. Since nextage_calibration has already started building, it cannot be marked as "blocked" in the build queue; by the time it reaches the point where it installs its dependencies, the nextage_gazebo deb has already been deleted and has not yet been replaced by the new one.

Maybe a better behavior would be that when the nextage_gazebo deb is deleted, any running downstream job is cancelled.

@nuclearsandwich do you think that is possible?

@mikaelarguedas
Contributor

Oh, I just saw the linked PR.
Disregard my previous answer: the problem is that, with that bloom PR, we install the test dependencies in the job.
That conflicts with #534, which removed the dependency between the two jobs.
So there is currently nothing ensuring that nextage_gazebo exists before the nextage_calibration build starts.

@dirk-thomas FYI

@k-okada
Contributor Author

k-okada commented Apr 26, 2018

@mikaelarguedas thanks for the investigation. I think we need to revert either ros-infrastructure/bloom#263 or #534.

@nuclearsandwich
Contributor

Reverting the change in Bloom won't resolve the issue until every package gets a new release with bloom. The goal of #534 was to avoid rebuilding packages when test dependencies change, but this has the knock-on effect of not blocking builds on the availability of test packages.

Reverting #534 is definitely the lowest-effort resolution to this issue. I haven't actually looked at how #534 changed the job trees to see what the differences between the old and new job dependency trees are. If it's really compelling to keep #534, we could discuss ways to try installing test dependencies while ignoring missing ones. However, because test dependencies show up as build dependencies in the Debian manifest, I'm not sure how feasible that approach would be.
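
One shape the "install but ignore missing" idea could take, if test-only dependencies could be told apart from hard build dependencies. All names below are hypothetical, not the actual ros_buildfarm API, and a plain `dict` again stands in for `apt.Cache`:

```python
# Hypothetical sketch: resolve hard build dependencies strictly, but let
# an absent test-only deb pass with a warning instead of failing the job.
# Function and parameter names are illustrative, not ros_buildfarm's.
def get_versions_tolerant(apt_cache, build_depends, test_depends):
    versions = {}
    for name in build_depends:
        # Still fatal: a real build dependency must be present.
        versions[name] = apt_cache[name]
    for name in test_depends:
        try:
            versions[name] = apt_cache[name]
        except KeyError:
            print("warning: test dependency %r not in cache, skipping" % name)
    return versions
```

The catch is the point made above: bloom folds test_depends into the Build-Depends field of the generated debian metadata, so the two lists cannot be separated from the manifest alone.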

@mikaelarguedas
Contributor

This has been happening quite a bit on lunar builds as well. Due to the lack of a relationship between the jobs, the failing jobs are not retriggered when the test dependency finishes building. And if we're unlucky, the deb gets nuked from the testing repo for the day.
[screenshot: failed lunar builds, 2018-04-26]

I'm fine with reverting #534 to fix the issue. That does imply more builds, though. I opened #540 to revert it.

@nuclearsandwich
Contributor

The change that caused this was reverted and this should no longer be an issue.

There will still be the odd failure that looks like this, due to some Jenkins issues with transitive upstream/downstream relationships.

If we want to enable something like this in the future, we'll need to take into account the presence of test dependencies in the bloomed build depends.
