Skip to content
This repository has been archived by the owner on Sep 26, 2019. It is now read-only.

Fix race condition in RunnerTest #1067

Closed
wants to merge 1 commit into from
Closed

Fix race condition in RunnerTest #1067

wants to merge 1 commit into from

Conversation

usmansaleem
Copy link
Contributor

@usmansaleem usmansaleem commented Mar 7, 2019

PR description

While attempting to build pantheon on a Windows machine, the RunnerTest.fastSyncFromGenesis was constantly failing at final Peer advertisedPeer = runnerAhead.getAdvertisedPeer().get(); because Runner isn't initialized completely. It is fixed by using same logic as tech.pegasys.pantheon.tests.acceptance.dsl.node.PantheonNodeRunner#waitForPortsFile

Fixed Issue(s)

Wait for Runner to start before attempt to access advertise peer.

Wait for Runner to start before attempt to access advertise peer.
@CLAassistant
Copy link

CLAassistant commented Mar 7, 2019

CLA assistant check
All committers have signed the CLA.

@ajsutton ajsutton self-assigned this Mar 7, 2019
ajsutton added a commit to ajsutton/pantheon that referenced this pull request Mar 7, 2019
…e open before it returns but doesn't block until shutdown. Expose an explicit awaitStop method for callers who want to block until Pantheon exits.

Fixes intermittency in RunnerTest with thanks to @usmansaleem for reporting via PegaSysEng#1067.
@ajsutton
Copy link
Contributor

ajsutton commented Mar 7, 2019

Thanks for reporting this and the contributed fix. While this fix would indeed work, we've recently changed the network startup to be more synchronous to try and avoid this kind of problem but didn't take the final step of exposing a synchronous call to start Runner without waiting for it to exit. While more invasive I believe it can prevent us needing to think about waiting for ports to open in embedded scenarios like this.

Would you be able to try the changes in https://github.com/ajsutton/pantheon/tree/runner-start and let me know if that fixes the issue for you please?

@usmansaleem
Copy link
Contributor Author

@ajsutton Thank you for the feedback.

Following would indeed do the trick, this was the alternate solution I had, but then I found existing code for waiting for "port files" so I copied that. I'll try it your branch today and then close my pull request.

Awaitility.await()
          .atMost(20, TimeUnit.SECONDS)
          .ignoreExceptions()
          .untilAsserted(() -> assertThat(runnerAhead.getAdvertisedPeer()).isNotEmpty());

@ajsutton
Copy link
Contributor

ajsutton commented Mar 7, 2019

It shouldn't even need that extra wait now because start() won't return until the ports are all open but I messed up and didn't remove it from my first attempt at investigating. I've removed it on my branch now so would be great to check it on your setup since we know it fails there (and it passes quite consistently for me and in our CI).

@usmansaleem
Copy link
Contributor Author

@ajsutton your branch (https://github.com/ajsutton/pantheon/tree/runner-start) build successfully in my setup.

@ajsutton
Copy link
Contributor

ajsutton commented Mar 8, 2019

Awesome thank you. I'll get a PR up and hopefully get that merged soon.

@usmansaleem usmansaleem closed this Mar 8, 2019
@usmansaleem
Copy link
Contributor Author

Closing as changes are/will be fixed from another contribution.

ajsutton added a commit that referenced this pull request Mar 8, 2019
start() guarantees ports are open before it returns but doesn't block until shutdown. Then awaitStop can be used by callers who want to block until Pantheon exits.

Fixes intermittency regression in RunnerTest with thanks to @usmansaleem for reporting via #1067.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants