This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

[Retiarii] Retry a failed multi-model trial by disabling CGO in CGOExecutionEngine #4098

Merged on Oct 11, 2021 (20 commits)

Conversation

hzhua
Contributor

@hzhua hzhua commented Aug 23, 2021

When multiple models are merged into one trial by cross-graph optimization (CGO), the trial can fail for more than one reason:

  1. The CGO policies merge the models incorrectly, e.g., the merged trial runs out of GPU memory (OOM);
  2. A mutated model is invalid even without CGO, e.g., a mutation produces a wrong shape.

A failed multi-model trial in CGOExecutionEngine should be disassembled into its individual models, each of which should then be resubmitted independently as a single-model trial with CGO disabled.
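The retry behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `MergedTrial`, `disassemble_and_resubmit`, and the `submit_single` callback are hypothetical names, and the choice not to retry a trial that already holds a single model is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MergedTrial:
    """Hypothetical record of one trial that bundles several models via CGO."""
    trial_id: int
    model_ids: List[int]
    status: str = "RUNNING"


def disassemble_and_resubmit(
    trial: MergedTrial,
    submit_single: Callable[[int], int],
) -> List[int]:
    """Resubmit each model of a failed merged trial as its own trial.

    `submit_single` is a hypothetical callback that submits one model
    without CGO and returns the new trial id.
    """
    if trial.status != "FAILED":
        return []  # nothing to retry
    if len(trial.model_ids) <= 1:
        # Assumption: a single-model trial already ran without merging,
        # so retrying it would not change the outcome.
        return []
    # One independent trial per model, in the original order.
    return [submit_single(model_id) for model_id in trial.model_ids]
```

With this shape, a failure caused by merging (such as GPU OOM) gives every model a second chance on its own, while a model that is invalid by itself fails again in isolation without affecting the others.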

This PR depends on #4086, which should be merged first.

@hzhua hzhua requested review from ultmaster and QuanluZhang August 23, 2021 07:13
@hzhua hzhua changed the title [Retiarii] Retry a failed multi-model trial by disabling CGO in CGOExecutionEngine [DO NOT MERGE][Retiarii] Retry a failed multi-model trial by disabling CGO in CGOExecutionEngine Sep 27, 2021
@QuanluZhang QuanluZhang merged commit 50dc05d into microsoft:master Oct 11, 2021
@hzhua hzhua changed the title [DO NOT MERGE][Retiarii] Retry a failed multi-model trial by disabling CGO in CGOExecutionEngine [Retiarii] Retry a failed multi-model trial by disabling CGO in CGOExecutionEngine Oct 11, 2021

3 participants