[main] Update dependencies from dotnet/xliff-tasks dotnet/arcade #11954

Merged: 2 commits merged into main on Dec 15, 2022

@dotnet-maestro dotnet-maestro bot commented Dec 14, 2022

This pull request updates the following dependencies

From https://github.com/dotnet/xliff-tasks

  • Subscription: 82dea055-1e11-4bb2-1eba-08d8d8fef0ea
  • Build: 20221213.1
  • Date Produced: December 13, 2022 2:30:07 PM UTC
  • Commit: a04986beb8e9d819b6e550e492a22058274e6d05
  • Branch: refs/heads/main

From https://github.com/dotnet/arcade

  • Subscription: e1494738-68cf-4cfe-3661-08d8e287a9c2
  • Build: 20221213.4
  • Date Produced: December 13, 2022 10:05:04 PM UTC
  • Commit: 6ef9e13
  • Branch: refs/heads/main

…20221213.1

Microsoft.DotNet.XliffTasks
 From Version 1.0.0-beta.22612.1 -> To Version 1.0.0-beta.22613.1
…213.4

Microsoft.DotNet.Arcade.Sdk , Microsoft.DotNet.Build.Tasks.Feed , Microsoft.DotNet.Helix.Sdk , Microsoft.DotNet.SignTool , Microsoft.DotNet.SwaggerGenerator.MSBuild
 From Version 8.0.0-beta.22612.4 -> To Version 8.0.0-beta.22613.4
@dotnet-maestro dotnet-maestro bot changed the title [main] Update dependencies from dotnet/xliff-tasks [main] Update dependencies from dotnet/xliff-tasks dotnet/arcade Dec 14, 2022

premun commented Dec 14, 2022

@dotnet/dnceng any idea why this is failing? The job seems to have returned 0

https://helix.dot.net/api/jobs/79079c0d-2a7e-47f4-9007-eb1989d81b3f/workitems/System.Numerics.Vectors.Tests?api-version=2019-06-17

MattGal commented Dec 14, 2022

@dotnet/dnceng any idea why this is failing? The job seems to have returned 0

https://helix.dot.net/api/jobs/79079c0d-2a7e-47f4-9007-eb1989d81b3f/workitems/System.Numerics.Vectors.Tests?api-version=2019-06-17

Taking a look.

MattGal commented Dec 14, 2022

This is super weird. I took notes and the details are below, but the TL;DR is that Service Bus allowed re-delivery of this work item's message during the 28 minutes the first attempt was hanging and timing out, and by random luck the 2nd attempt succeeded just before the first one finished. There's not much we can do about this, since we didn't even log losing the lock on the work item in the first attempt; this is just transient bad behavior from Service Bus and the mitigation is to retry. If this starts happening frequently, we should open an IcM with the Service Bus team.

Details:

Since we never expect to have to handle erroneous multi-delivery of the same work item's attempts, and especially not the 2nd attempt finishing before the 1st attempt, both the times in the Kusto DB are wrong (they show both attempts starting at the same time) and the pass/fail API is wrong (because it gets the last-finished version of the work item, in this case the first attempt).

Based on what we see in https://helix.dot.net/api/jobs/79079c0d-2a7e-47f4-9007-eb1989d81b3f/workitems/System.Numerics.Vectors.Tests?api-version=2019-06-17, the work item is definitely considered passed (it timed out on attempt #1, then got retried and passed in the log you've noted).

However, in the log from attempt 1, on dci-mac-build-014:

https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-arcade-refs-pull-11954-merge-79079c0d2a7e47f490/System.Numerics.Vectors.Tests/d4f764ad-5fa2-4be9-88a3-23739e299469.log

Execution runs from 2022-12-14T14:21:39.911Z to 2022-12-14T14:49:50.244Z (quite a long timeout)

In the log from attempt 2, on dci-mac-build-188:

https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-arcade-refs-pull-11954-merge-79079c0d2a7e47f490/System.Numerics.Vectors.Tests.Attempt.2/331aff86-a1e7-412e-8b03-785bc1b9dd6d.log

Execution runs from 2022-12-14T14:48:59.598Z until 2022-12-14T14:49:49.520Z, finishing less than a second (about 0.7 s) before the first attempt.
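
To make the overlap concrete, the durations can be worked out from the four timestamps quoted in the two logs. This is only a sketch of the arithmetic; the helper and variable names are illustrative and not part of any Helix tooling.

```python
from datetime import datetime, timezone


def parse_utc(stamp: str) -> datetime:
    """Parse a UTC timestamp in the form that appears in the Helix logs."""
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


# Timestamps copied from the two attempt logs discussed in this thread.
first_start = parse_utc("2022-12-14T14:21:39.911Z")
first_end = parse_utc("2022-12-14T14:49:50.244Z")
second_start = parse_utc("2022-12-14T14:48:59.598Z")
second_end = parse_utc("2022-12-14T14:49:49.520Z")

first_duration = (first_end - first_start).total_seconds()
second_duration = (second_end - second_start).total_seconds()
# Positive when attempt 2 finished before attempt 1.
finish_gap = (first_end - second_end).total_seconds()
# True when attempt 2 started while attempt 1 was still running.
overlapped = second_start < first_end

print(f"attempt 1 ran for {first_duration:.3f} s")
print(f"attempt 2 ran for {second_duration:.3f} s")
print(f"attempt 2 finished {finish_gap:.3f} s before attempt 1")
print(f"attempts overlapped: {overlapped}")
```

Under these timestamps, attempt 1 runs for roughly 28 minutes, attempt 2 for roughly 50 seconds, and the two overlap for the whole of attempt 2.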

Since it finished last, the first attempt is considered the "last" attempt in the eyes of the pass/fail API (we might be able to tighten up its query to include the max attempt?), and thus a call to HelixApi.Job.PassFailAsync("79079c0d-2a7e-47f4-9007-eb1989d81b3f"); will return this test as failing.
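
The difference between "last finished" and "max attempt" can be sketched as follows. This is not the actual pass/fail query, which is not shown in this thread; the `AttemptRecord` type and both selector functions are hypothetical, and the exit codes (-3 for the timed-out attempt, 0 for the retry) are taken from the comments here rather than from the API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class AttemptRecord:
    """Hypothetical shape of one finished attempt of a work item."""
    attempt: int
    finished: datetime
    exit_code: int


def parse_utc(stamp: str) -> datetime:
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def pick_last_finished(records: List[AttemptRecord]) -> AttemptRecord:
    """Selection by finish time: the behavior described in this thread."""
    return max(records, key=lambda r: r.finished)


def pick_max_attempt(records: List[AttemptRecord]) -> AttemptRecord:
    """Selection by attempt number: the possible tightening suggested above."""
    return max(records, key=lambda r: r.attempt)


records = [
    AttemptRecord(1, parse_utc("2022-12-14T14:49:50.244Z"), exit_code=-3),
    AttemptRecord(2, parse_utc("2022-12-14T14:49:49.520Z"), exit_code=0),
]

by_time = pick_last_finished(records)
by_attempt = pick_max_attempt(records)
print(f"last finished: attempt {by_time.attempt}, passed={by_time.exit_code == 0}")
print(f"max attempt:   attempt {by_attempt.attempt}, passed={by_attempt.exit_code == 0}")
```

With the two attempts from this job, selecting by finish time picks the timed-out attempt 1, while selecting by attempt number picks the passing attempt 2.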

MattGal commented Dec 14, 2022

Ah, there was one more twist here: the machine was messed up, detected it, and tried to reboot (which would have prevented the -3 exit code from finishing), but the machine telling itself to reboot didn't actually work:

2022-12-14T14:48:02.812Z	INFO   	platformutil(52)	reboot_machine	Reboot has been triggered due to: peek-lock abort timer elapsed
2022-12-14T14:48:02.813Z	INFO   	job(100)	kill_any_leaked_workitem_processes	Killing leaked process : 22843 bash
2022-12-14T14:48:02.814Z	INFO   	job(100)	kill_any_leaked_workitem_processes	Killing leaked process : 22844 xcodebuild
2022-12-14T14:48:02.826Z	INFO   	job(100)	kill_any_leaked_workitem_processes	Killing leaked process : 23004 python3.9
2022-12-14T14:48:02.827Z	ERROR  	executor(877)	_execute_command	Executor timed out after 1320 seconds and was killed.
2022-12-14T14:48:03.171Z	INFO   	event(51)	send	Sending event type WorkItemTimeout
2022-12-14T14:48:03.242Z	INFO   	saferequests(87)	request_with_retry	Response complete with status code '201'
2022-12-14T14:48:03.943Z	INFO   	interval(70)	cancel_abort_after	Cancel abort_after for timer
2022-12-14T14:48:03.944Z	INFO   	executor(897)	_execute_command	The return code was not 0. Sleeping for 30 seconds in case it's still writing to disk.
2022-12-14T14:48:04.060Z	INFO   	platformutil(71)	reboot_machine	Rebooting!

(... goes on to not reboot)

I'll see if I can figure out why this is.
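
For anyone digging through similar logs, a small parser along these lines may help pull out the reboot and timeout entries. It is a rough sketch that assumes the fields are tab-separated in the order seen in the excerpt above (timestamp, level, source, function, message); the `LogLine` type and `parse_line` helper are illustrative names, not part of the Helix client.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class LogLine(NamedTuple):
    timestamp: datetime
    level: str
    source: str
    function: str
    message: str


def parse_line(line: str) -> LogLine:
    """Split one log line into its five tab-separated fields."""
    stamp, level, source, function, message = line.rstrip("\n").split("\t", 4)
    when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return LogLine(when.replace(tzinfo=timezone.utc), level.strip(),
                   source, function, message)


# Three lines copied from the excerpt in this comment.
EXCERPT = [
    "2022-12-14T14:48:02.812Z\tINFO   \tplatformutil(52)\treboot_machine\t"
    "Reboot has been triggered due to: peek-lock abort timer elapsed",
    "2022-12-14T14:48:02.827Z\tERROR  \texecutor(877)\t_execute_command\t"
    "Executor timed out after 1320 seconds and was killed.",
    "2022-12-14T14:48:04.060Z\tINFO   \tplatformutil(71)\treboot_machine\t"
    "Rebooting!",
]

lines = [parse_line(raw) for raw in EXCERPT]
reboot_lines = [entry for entry in lines if entry.function == "reboot_machine"]
errors = [entry for entry in lines if entry.level == "ERROR"]
# Time between the reboot being requested and the "Rebooting!" message.
reboot_delay = (reboot_lines[-1].timestamp - reboot_lines[0].timestamp).total_seconds()

for entry in errors:
    print(f"{entry.timestamp.isoformat()} {entry.source}: {entry.message}")
print(f"reboot requested to 'Rebooting!': {reboot_delay:.3f} s")
```

Any log entries timestamped after the "Rebooting!" line on the same machine would be a quick sign that the reboot did not take effect.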

premun commented Dec 15, 2022

Ah, interesting! I did smell something fishy. I then noticed the other console log so I was thinking a retry did happen at some point.

@dotnet-maestro dotnet-maestro bot merged commit 325bd53 into main Dec 15, 2022
@dotnet-maestro dotnet-maestro bot deleted the darc-main-4faca213-ca26-40cd-b580-b91f7b51284b branch December 15, 2022 10:18