
[Retiarii] add validation in base trainers #3184

Merged
merged 26 commits into microsoft:dev-retiarii on Dec 15, 2020

Conversation

hzhua (Contributor) commented Dec 11, 2020

No description provided.

self._val_dataset = getattr(datasets, dataset_cls)(train=False,
                                                   transform=get_default_transform(dataset_cls),
                                                   **(dataset_kwargs or {}))
self._optimizer = getattr(torch.optim, optimizer_cls)(model.parameters(), **(optimizer_kwargs or {}))
self._trainer_kwargs = trainer_kwargs or {'max_epochs': 10}

# TODO: we will need at least two (maybe three) data loaders in future.
Contributor:

Remove TODO

Contributor Author:

removed
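
For context on the TODO, the trainer will eventually need separate loaders for training and validation. A minimal sketch of how the two datasets resolved above might be wrapped, assuming torchvision's MNIST and an arbitrary batch size (both are illustrative, not taken from this PR):

import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Illustrative defaults; the trainer resolves these from dataset_cls /
# dataset_kwargs instead of hard-coding them.
transform = transforms.ToTensor()
train_dataset = datasets.MNIST('data', train=True, download=True, transform=transform)
val_dataset = datasets.MNIST('data', train=False, download=True, transform=transform)

# One loader per split -- the "at least two data loaders" from the TODO.
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)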

def training_step(self, batch: Tuple[torch.Tensor, torch.Tensor], batch_idx: int) -> Dict[str, Any]:
    x, y = self.training_step_before_model(batch, batch_idx)
    y_hat = self.model(x)
    return self.training_step_after_model(x, y, y_hat)

-def training_step_before_model(self, batch: Tuple[torch.Tensor, torch.Tensor], batch_idx: int, device = None):
+def training_step_before_model(self, batch: Tuple[torch.Tensor, torch.Tensor], batch_idx: int, device=None):
Contributor:

Suggest using self.device

Contributor Author:

In MultiModel, different models' inputs may need to be placed on different devices (this method is called in _train). Currently, the trainer hard-codes one GPU per model.

BTW, train_step and validation_step are not used in PyTorchImageClassificationTrainer, so they have been removed.
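
To make the before/after hook split concrete, here is a hedged single-model sketch; SketchTrainer and its _device attribute are hypothetical names for illustration, and a multi-model trainer would pass a different device per model as described above:

from typing import Any, Dict, Tuple

import torch
import torch.nn as nn
import torch.nn.functional as F

class SketchTrainer:
    def __init__(self, model: nn.Module, device: torch.device):
        # Hypothetical per-trainer device; the PR keeps device as an explicit
        # argument because each model in MultiModel may live on a different GPU.
        self.model = model.to(device)
        self._device = device

    def training_step(self, batch: Tuple[torch.Tensor, torch.Tensor], batch_idx: int) -> Dict[str, Any]:
        x, y = self.training_step_before_model(batch, batch_idx, self._device)
        y_hat = self.model(x)
        return self.training_step_after_model(x, y, y_hat)

    def training_step_before_model(self, batch, batch_idx, device=None):
        # Pre-forward hook: move the raw batch onto the model's device.
        x, y = batch
        if device is not None:
            x, y = x.to(device), y.to(device)
        return x, y

    def training_step_after_model(self, x, y, y_hat) -> Dict[str, Any]:
        # Post-forward hook: compute the loss once the forward pass has run.
        return {'loss': F.cross_entropy(y_hat, y)}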

summed_loss = sum(losses)
summed_loss.backward()
for opt in self._optimizers:
    opt.step()
if batch_idx % 50 == 0:
    nni.report_intermediate_result(report_loss)
# if batch_idx % 50 == 0:
Contributor:

Why comment this?

Contributor Author:

It was for debugging; training_loss is not reported. Removed.
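
For readers following the loop above, a hedged sketch of the multi-model pattern this snippet comes from; the loss function, reporting cadence, and payload shape are assumptions, not taken from the PR:

import nni
import torch.nn.functional as F

def train_batch(models, optimizers, batch, batch_idx):
    # One loss per model; summing them lets a single backward() populate
    # gradients for every model before each optimizer steps independently.
    x, y = batch
    for opt in optimizers:
        opt.zero_grad()
    losses = [F.cross_entropy(model(x), y) for model in models]
    summed_loss = sum(losses)
    summed_loss.backward()
    for opt in optimizers:
        opt.step()
    if batch_idx % 50 == 0:
        # Report per-model losses as an intermediate result (cadence assumed).
        nni.report_intermediate_result([loss.item() for loss in losses])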

@ultmaster (Contributor) commented:

NNI's line limit is 140. You might need to configure your autopep8 to avoid unwanted linebreaks. :)
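
For reference, one way to align autopep8 with that limit; the setup.cfg section name is an assumption and may vary across autopep8 versions:

autopep8 --max-line-length 140 --in-place <file>.py

# or persistently, e.g. in setup.cfg:
[pep8]
max_line_length = 140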

@ultmaster ultmaster merged commit a0e2f8e into microsoft:dev-retiarii Dec 15, 2020