Utilize Standalone Proxy #21129
Labels
blocking-release
Blocks release
Central-EngSys
This issue is owned by the Engineering System team.
EngSys
This issue is impacting the engineering system.
Context
This issue is prompted by what I believe is the real solution to another discussion that I was having with @gracewilcox and @benbp.
First, some backstory. @gracewilcox was seeing some really wonky errors in the `azadmin` build. She mentioned that this also occurred in `azfile` and `azblob`. These issues started June 28th, the same day that I touched the test-proxy version. This immediately made me suspect the proxy as the source of the problems, so I did a bit of digging.
Turns out, the test-proxy fix did introduce this `* is not a valid pathspec` item. It's because we made the proxy aware of what had already been checked out. If the proxy already has a targeted tag checked out, it will invoke a `clean` operation prior to restoring that tag. This `clean` operation is what was failing, and is the reason for the `proxy changed, error began`.
Given that we're in the middle of release week, I tried an alternative that merely removes the possibility of a race condition.
After I added this, azadmin continued failing. This new failure revolves around the fact that `azadmin` is actually multiple packages, and as such has multiple test suites. Given there is a single proxy running for all of these packages, and each test suite has no idea about the others, it is easy for them to stomp on each other. In this case, the new error occurs because as each test suite in the `azadmin` module finishes, it fires an `Admin/Reset` to clear out any of its customizations. Given that there are 4 test suites running, there is a VERY high chance that when test suite A fires `Admin/Reset`, test suite B is still running a test. When this happens, the test-proxy rightfully kicks back the reset with `400`, as one can't change a session-level sanitizer while a playback is happening.
The part that I'm still scratching my head at is why we haven't seen this at all prior to June 28th. I'd definitely expect there to be consistent failures given this.
I spoke to @gracewilcox about this not reproing locally. The reason is that these are different packages, so they are never run locally as a single unit like they are in CI!
Why do we see this new error? This is one piece of data that I don't understand.
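To make the collision concrete, here is a minimal Go sketch of the failure mode described above. It is a toy model, not the real test-proxy: the `toyProxy` type, the `/Playback/Start` and `/Playback/Stop` routes as modeled here, and the "reject reset while any playback is active" rule are simplifying assumptions made only to illustrate why a shared proxy answers `Admin/Reset` with `400`.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// toyProxy is a hypothetical stand-in for a single proxy shared by all suites.
// It only counts active playback sessions; it is NOT the real test-proxy.
type toyProxy struct {
	mu     sync.Mutex
	active int
}

func (p *toyProxy) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	p.mu.Lock()
	defer p.mu.Unlock()
	switch r.URL.Path {
	case "/Playback/Start":
		p.active++
		w.WriteHeader(http.StatusOK)
	case "/Playback/Stop":
		if p.active > 0 {
			p.active--
		}
		w.WriteHeader(http.StatusOK)
	case "/Admin/Reset":
		// Assumed rule: session-level state can't change during a playback.
		if p.active > 0 {
			w.WriteHeader(http.StatusBadRequest)
			return
		}
		w.WriteHeader(http.StatusOK)
	default:
		w.WriteHeader(http.StatusNotFound)
	}
}

// post sends an empty POST and returns the status code (0 on transport error).
func post(url string) int {
	resp, err := http.Post(url, "application/json", nil)
	if err != nil {
		return 0
	}
	defer resp.Body.Close()
	return resp.StatusCode
}

// sharedProxyReset models suite B mid-playback while suite A finishes and
// fires Admin/Reset against the same proxy. It returns the reset status
// seen during the playback and the status seen after the playback stops.
func sharedProxyReset() (during, after int) {
	srv := httptest.NewServer(&toyProxy{})
	defer srv.Close()

	post(srv.URL + "/Playback/Start")      // suite B starts a test
	during = post(srv.URL + "/Admin/Reset") // suite A finishes, resets
	post(srv.URL + "/Playback/Stop")       // suite B's test ends
	after = post(srv.URL + "/Admin/Reset")
	return during, after
}

func main() {
	during, after := sharedProxyReset()
	fmt.Println("reset while another suite is in playback:", during)
	fmt.Println("reset once nothing is in playback:", after)
}
```

In this model the outcome depends purely on timing between suites, which is consistent with the failure only showing up when the packages run together in CI.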
Prospective Solution
I think that the test-suite should swap to the standalone test-proxy, and utilize either a dynamic port or a configured port/protocol for each test suite.
At the beginning of each test suite, the proxy should be downloaded to `.proxy` at repo root, then utilized from there. If each test suite uses its own proxy and port, then a session-level sanitizer CANNOT stomp on another test run. They'll be totally split from each other!
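As a rough sketch of the dynamic-port option, a test suite could ask the OS for a free local port before launching its own proxy. The names `suiteProxyConfig`, `freeLocalPort`, and `newSuiteProxyConfig` are hypothetical, and the `http` scheme and `127.0.0.1` host are assumptions; how the standalone proxy is actually downloaded and told which port to use is left out because it isn't specified here.

```go
package main

import (
	"fmt"
	"net"
	"path/filepath"
)

// suiteProxyConfig is a hypothetical per-suite description of where the
// proxy binary lives and which address this suite's proxy should use.
type suiteProxyConfig struct {
	Dir  string // <repo root>/.proxy, as proposed above
	Port int    // port reserved for this suite only
	URL  string // base URL the suite's tests would target (scheme assumed)
}

// freeLocalPort asks the OS for an unused TCP port by binding to port 0.
// The listener is closed before returning, so there is a small window in
// which another process could grab the port before the proxy starts.
func freeLocalPort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port, nil
}

// newSuiteProxyConfig builds the settings one test suite would use.
func newSuiteProxyConfig(repoRoot string) (suiteProxyConfig, error) {
	port, err := freeLocalPort()
	if err != nil {
		return suiteProxyConfig{}, err
	}
	return suiteProxyConfig{
		Dir:  filepath.Join(repoRoot, ".proxy"),
		Port: port,
		URL:  fmt.Sprintf("http://127.0.0.1:%d", port),
	}, nil
}

func main() {
	for _, suite := range []string{"suite-a", "suite-b"} {
		cfg, err := newSuiteProxyConfig(".")
		if err != nil {
			fmt.Println("could not reserve a port:", err)
			return
		}
		fmt.Printf("%s -> %s (proxy dir: %s)\n", suite, cfg.URL, cfg.Dir)
	}
}
```

A configured port per suite would avoid the reserve-then-release window entirely, at the cost of having to keep the assignments unique by hand.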