
Bind trainer __init__ arguments as properties/attributes #154

Closed
wants to merge 8 commits into dev from ravi/trainer_init_args_as_attributes

Conversation

Contributor

@ravi-mosaicml ravi-mosaicml commented Dec 15, 2021

Most arguments that could be passed to the __init__ of the trainer are now bound to the trainer either as raw attributes or properties. This will enable changing the trainer behavior when doing interactive development.

Making state a dataclass was more hassle than it was worth, since dataclasses do not easily allow getters to have a different type signature than the setters and the init. Instead, I switched to using a plain old Python class.

This change allowed for the following cleanup:

* Replaced `train_batch_size` and `eval_batch_size` with getters that read from the dataloader
* Added getters for `state.optimizers` and `state.schedulers` which return a list, regardless of the type passed to the setters or upon init
* Removed epoch, step, loss, batch, and outputs from the init args, as they would be updated by the trainer (and would not be set on init)
* Added setters for `optimizers`, `schedulers`, `callbacks`, and `algorithms` that ensure in-place operations, so references to previous state objects remain valid
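To make the pattern concrete, here is a minimal, hypothetical sketch (simplified names, not the actual Composer code): init arguments become attributes or properties, the getter always normalizes to a list, and the setter updates the backing list in place so references handed out earlier stay valid.

from typing import List, Sequence, Union

from torch.optim import Optimizer


class Trainer:
    """Hypothetical, simplified sketch of binding __init__ args as attributes/properties."""

    def __init__(self, optimizers: Union[Optimizer, Sequence[Optimizer]], grad_accum: int = 1):
        self._optimizers: List[Optimizer] = []
        self.optimizers = optimizers  # route through the property setter so normalization happens once
        self.grad_accum = grad_accum  # plain attribute, no extra logic needed

    @property
    def optimizers(self) -> List[Optimizer]:
        # The getter always returns a list, whatever type was passed to __init__ or the setter.
        return self._optimizers

    @optimizers.setter
    def optimizers(self, optimizers: Union[Optimizer, Sequence[Optimizer]]) -> None:
        new = [optimizers] if isinstance(optimizers, Optimizer) else list(optimizers)
        # In-place update: lists obtained from the getter earlier see the new contents.
        self._optimizers[:] = new

With something like this in place, interactive workflows such as setting trainer.grad_accum = 2 between runs become possible, which is the motivation stated above.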
@ravi-mosaicml ravi-mosaicml changed the title Ravi/trainer init args as attributes Bind trainer __init__ arguments as properties/attributes Dec 15, 2021
Contributor

@siriuslee siriuslee left a comment


Some small comments. There was a lot in here that didn't have anything to do with the getting and setting. From what I could tell, they looked sensible and fine, but I'll defer to folks that might be more aware of the reasons for those changes. The init stuff mostly looks good to me, though.

Base automatically changed from ravi/state_without_dataclass to dev December 16, 2021 19:29
self.state.optimizers = optimizer
self.state.schedulers = ComposedScheduler(schedulers=schedulers)
self.state.optimizers = optimizers
self.state.schedulers = [ComposedScheduler(schedulers=schedulers)]
Contributor


Not sure, but maybe the ComposedScheduler casting could happen on the underlying schedulers object? A little opinionated, but the setter could auto-set ComposedScheduler.
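For illustration, a rough sketch of what auto-wrapping in the setter could look like. ComposedScheduler(schedulers=...) is the call already used in the diff above; the import path and the rest of the structure here are assumptions, not the actual implementation.

from typing import Sequence, Union

from torch.optim.lr_scheduler import _LRScheduler

from composer.optim import ComposedScheduler  # assumed import path


class State:
    """Sketch only: callers pass plain schedulers and the setter composes them itself."""

    def __init__(self) -> None:
        self._schedulers: list = []

    @property
    def schedulers(self) -> list:
        return self._schedulers

    @schedulers.setter
    def schedulers(self, schedulers: Union[_LRScheduler, Sequence[_LRScheduler]]) -> None:
        if isinstance(schedulers, _LRScheduler):
            schedulers = [schedulers]
        # Composition happens in one place here, rather than at every call site in the Trainer.
        self._schedulers = [ComposedScheduler(schedulers=list(schedulers))]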

Comment on lines +396 to +410
@property
def model(self) -> BaseMosaicModel:
    """The original model"""
    return ddp.get_original_model(self.state.model)

@property
def train_dataloader(self) -> Union[DataLoader, DataloaderSpec]:
    """The train dataloader"""
    if self._train_split_fn is not None and self._train_device_transformation_fn is not None:
        return DataloaderSpec(self.state.train_dataloader, self._train_device_transformation_fn,
                              self._train_split_fn)
    else:
        return self.state.train_dataloader

@train_dataloader.setter
Contributor


I think it'd be nice to go to a TrainerProperties-style separation of all the properties, like PTL, for ease of reading them all; maybe split them into a separate file?
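Roughly what that split could look like, for reference; the file and class layout below are hypothetical, loosely modeled on PyTorch Lightning's TrainerProperties, and not an existing part of this codebase.

# trainer_properties.py (hypothetical module)
class TrainerProperties:
    """Gathers the trainer's properties so the main Trainer file stays focused on the loop."""

    @property
    def grad_accum(self) -> int:
        return self.state.grad_accum

    @property
    def model(self):
        return self.state.model


# trainer.py
class Trainer(TrainerProperties):
    """The training loop lives here; the property definitions live in the mixin."""

    def __init__(self, state):
        self.state = state  # the mixin's properties read from this

The mixin keeps the property definitions browsable in one file without changing the public API of Trainer.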

Contributor


As discussed, we should remove all the setters here

Contributor

@Averylamp Averylamp left a comment


lgtm, I think I'd very much prefer breaking out properties into a TrainerProperties interface just to keep the Trainer file a little smaller and separate out the bloat

@property
def grad_accum(self):
    """Gradient Accumulation"""
    return self.state.grad_accum

@grad_accum.setter
def grad_accum(self, grad_accum: int):
    if self.deepspeed_enabled:
        raise RuntimeError("Cannot change grad_accum when using deepspeed")
Contributor


Could you say what should be done instead? Is this another "If you'd like to change it, create a new Trainer" type of thing?
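Purely as an illustration of this question, a hedged sketch of what a more explicit error could look like; the "create a new Trainer" remedy is the reviewer's guess above, not something this PR confirms, and the final assignment to state is assumed.

@grad_accum.setter
def grad_accum(self, grad_accum: int):
    if self.deepspeed_enabled:
        # Hypothetical wording: point the user at a remedy instead of only stating the restriction.
        raise RuntimeError("Cannot change grad_accum when using deepspeed. "
                           "If you need a different value, construct a new Trainer without DeepSpeed.")
    self.state.grad_accum = grad_accum  # assumed: mirrors the getter shown above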

if val is not None and val > len(self.train_dataloader):
    warnings.warn(
        textwrap.dedent(f"""StepsPerEpochWarning: The desired steps_per_epoch({val})
            is greater than the number of batches in the training dataloader
Contributor


Should include a short description of what will happen / what behavior to expect in this case (even if it's just ignored)
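As an illustration only, the warning could be extended along these lines. The described behavior (the extra steps are effectively ignored and the epoch ends when the dataloader is exhausted) is an assumption here, not something stated in this diff, so the real wording should match whatever the trainer actually does; warnings and textwrap are assumed imported as in the snippet above.

if val is not None and val > len(self.train_dataloader):
    warnings.warn(
        textwrap.dedent(f"""\
            StepsPerEpochWarning: The desired steps_per_epoch ({val}) is greater than the
            number of batches in the training dataloader ({len(self.train_dataloader)}).
            The extra steps will never run: each epoch ends once the dataloader is exhausted."""))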

if checkpoint_interval_unit.lower() == "it":
    self._save_event = Event.BATCH_END
    return
raise RuntimeError(f"Invalid checkpoint_interval_unit: {checkpoint_interval_unit}")
Contributor


Suggested change
raise RuntimeError(f"Invalid checkpoint_interval_unit: {checkpoint_interval_unit}")
raise RuntimeError(f"Invalid checkpoint_interval_unit: {checkpoint_interval_unit} must be one of 'it', 'ep'")

Comment on lines +67 to +70
if backends is None:
    self.backends = []
else:
    self.backends = backends
Contributor


Suggested change
if backends is None:
    self.backends = []
else:
    self.backends = backends
self.backends = [] if backends is None else backends


if isinstance(train_dataloader, DataloaderSpec):
    train_dataloader_spec = train_dataloader
    self._train_device_transformation_fn = train_dataloader.device_transform_fn
Contributor


These changes don't seem related to binding the trainer arguments as properties?

I also find the previous logic to be more readable, and now the handling of train_dataloader and eval_dataloader is not consistent.

({self.state.steps_per_epoch})"""))
else:
self.state.steps_per_epoch = train_subset_num_batches
self.state.steps_per_epoch = train_subset_num_batches
Contributor


curious to understand why train_subset and eval_subset are now handled differently?

self.engine = Engine(self.state, self.state.algorithms, self.logger, self.state.callbacks)

self.validate_every_n_batches = validate_every_n_batches
self.validate_every_n_epochs = validate_every_n_epochs
self.compute_training_metrics = compute_training_metrics
self.grad_clip_norm = grad_clip_norm
self._grad_clip_norm = grad_clip_norm
Contributor


Since this is not a property of state, we don't need to make this a getter/private attribute.

Comment on lines +775 to +778
if self._train_split_fn is None:
    split_fn = default_batch_split_fn
else:
    split_fn = self._train_split_fn
Contributor


Suggested change
if self._train_split_fn is None:
    split_fn = default_batch_split_fn
else:
    split_fn = self._train_split_fn
split_fn = default_batch_split_fn if self._train_split_fn is None else self._train_split_fn

Or, put this logic closer to where self._train_split_fn is defined?

checkpoint_interval_unit=checkpoint_interval_unit)
if checkpoint_interval_unit is not None and self.deepspeed_enabled:
    raise NotImplementedError("Checkpointing is not yet supported with DeepSpeed.")
self._checkpointer = Checkpointer(checkpoint_folder=checkpoint_folder,
Contributor


this need not be private

@ravi-mosaicml
Contributor Author

Closing this PR as we need to revisit multiple calls to .fit

@Averylamp Averylamp deleted the ravi/trainer_init_args_as_attributes branch March 10, 2022 00:00