Bind trainer __init__ arguments as properties/attributes #154
Conversation
Making state a dataclass was more hassle than it was worth, since dataclasses do not easily allow getters to have a different type signature than setters and the init. Instead, switched to using a plain old Python class. This change allowed for the following cleanup:

* Replaced `train_batch_size` and `eval_batch_size` with getters that read from the dataloader
* Added getters for `state.optimizers` and `state.schedulers` which return a list, regardless of the type passed to the setters or upon init
* Removed epoch, step, loss, batch, and outputs from the init args, as they would be updated by the trainer (and would not be set on init)
* Added setters for `optimizers`, `schedulers`, `callbacks`, and `algorithms` that ensure in-place operations, so references to previous state objects remain valid
Most arguments that could be passed to the `__init__` of the trainer are now bound to the trainer either as raw attributes or properties. This will enable changing the trainer behavior when doing interactive development.
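For context, the list-normalizing, in-place setter pattern described above looks roughly like the following minimal sketch (illustrative only, not the actual implementation from this PR):

```python
from typing import List, Union

import torch


class State:
    """Minimal sketch of a plain Python state class with a list-normalizing,
    in-place optimizer setter (not the PR's implementation)."""

    def __init__(self, optimizers: Union[torch.optim.Optimizer, List[torch.optim.Optimizer]]):
        self._optimizers: List[torch.optim.Optimizer] = []
        self.optimizers = optimizers  # route through the setter below

    @property
    def optimizers(self) -> List[torch.optim.Optimizer]:
        # Getter always returns a list, regardless of what was passed in
        return self._optimizers

    @optimizers.setter
    def optimizers(self, optimizers: Union[torch.optim.Optimizer, List[torch.optim.Optimizer]]):
        if isinstance(optimizers, torch.optim.Optimizer):
            optimizers = [optimizers]
        # Mutate the existing list in place, so references held to
        # state.optimizers keep seeing the updated values
        self._optimizers[:] = list(optimizers)
```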
Some small comments. There was a lot in here that didn't have anything to do with the getting and setting. From what I could tell, they looked sensible and fine, but I'll defer to folks that might be more aware of the reasons for those changes. The init stuff mostly looks good to me, though.
- self.state.optimizers = optimizer
- self.state.schedulers = ComposedScheduler(schedulers=schedulers)
+ self.state.optimizers = optimizers
+ self.state.schedulers = [ComposedScheduler(schedulers=schedulers)]
Not sure, but maybe the ComposedScheduler casting could live in the underlying schedulers object? A little opinionated, but the setter could auto-wrap schedulers in a ComposedScheduler.
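A rough sketch of how that setter-side wrapping could look (just the setter, as a hypothetical fragment of the State class; only the `ComposedScheduler(schedulers=...)` call is taken from the diff above):

```python
@schedulers.setter
def schedulers(self, schedulers):
    # Hypothetical: accept a single scheduler or a list, and wrap anything that
    # is not already a ComposedScheduler so callers never need to cast manually.
    if not isinstance(schedulers, (list, tuple)):
        schedulers = [schedulers]
    self._schedulers = [
        s if isinstance(s, ComposedScheduler) else ComposedScheduler(schedulers=[s])
        for s in schedulers
    ]
```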
@property
def model(self) -> BaseMosaicModel:
    """The original model"""
    return ddp.get_original_model(self.state.model)

@property
def train_dataloader(self) -> Union[DataLoader, DataloaderSpec]:
    """The train dataloader"""
    if self._train_split_fn is not None and self._train_device_transformation_fn is not None:
        return DataloaderSpec(self.state.train_dataloader, self._train_device_transformation_fn,
                              self._train_split_fn)
    else:
        return self.state.train_dataloader

@train_dataloader.setter
I think it'd be nice to go to a TrainerProperties-style separation of all the properties, for ease of reading them in one place like PTL? Maybe separate them into their own file.
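For reference, the suggested separation might look roughly like this mixin pattern (hypothetical file names and layout; not code from the PR):

```python
# trainer_properties.py (hypothetical file split suggested in the comment above)
class TrainerProperties:
    """Mixin that collects the Trainer's properties, keeping the main Trainer file small."""

    @property
    def grad_accum(self) -> int:
        """Gradient accumulation, read from the state set by the concrete Trainer."""
        return self.state.grad_accum


# trainer.py
class Trainer(TrainerProperties):
    def __init__(self, state):
        self.state = state
```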
As discussed, we should remove all the setters here
lgtm, I think I'd very much prefer breaking out properties into a TrainerProperties interface just to keep the Trainer file a little smaller and separate out the bloat
@property
def grad_accum(self):
    """Gradient Accumulation"""
    return self.state.grad_accum

@grad_accum.setter
def grad_accum(self, grad_accum: int):
    if self.deepspeed_enabled:
        raise RuntimeError("Cannot change grad_accum when using deepspeed")
Could you say what should be done instead? Is this another "If you'd like to change it, create a new Trainer" type of thing?
if val is not None and val > len(self.train_dataloader):
    warnings.warn(
        textwrap.dedent(f"""StepsPerEpochWarning: The desired steps_per_epoch({val})
            is greater than the number of batches in the training dataloader
Should include a short description of what will happen / what behavior to expect in this case (even if it's just ignored)
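For example, the warning text could be extended along these lines (a hypothetical drop-in for the snippet above, assuming the oversized value is simply ignored, which is the behavior the comment asks the author to confirm):

```python
# Hypothetical extended warning; the described behavior is an assumption.
warnings.warn(
    textwrap.dedent(f"""\
        StepsPerEpochWarning: The desired steps_per_epoch ({val}) is greater than
        the number of batches in the training dataloader ({len(self.train_dataloader)}).
        The value will be ignored and each epoch will run over the full dataloader."""))
```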
if checkpoint_interval_unit.lower() == "it":
    self._save_event = Event.BATCH_END
    return
raise RuntimeError(f"Invalid checkpoint_interval_unit: {checkpoint_interval_unit}")
raise RuntimeError(f"Invalid checkpoint_interval_unit: {checkpoint_interval_unit}") | |
raise RuntimeError(f"Invalid checkpoint_interval_unit: {checkpoint_interval_unit} must be one of 'it', 'ep'") |
if backends is None:
    self.backends = []
else:
    self.backends = backends
Suggested change:
- if backends is None:
-     self.backends = []
- else:
-     self.backends = backends
+ self.backends = [] if backends is None else backends
if isinstance(train_dataloader, DataloaderSpec):
    train_dataloader_spec = train_dataloader
    self._train_device_transformation_fn = train_dataloader.device_transform_fn
These changes don't seem related to binding the trainer arguments as properties?
I also find the previous logic more readable, and now the handling of `train_dataloader` and `eval_dataloader` is not consistent.
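One way to keep the two paths symmetric would be a small shared helper along these lines (a sketch; the helper name and the `DataloaderSpec` field names are assumptions based on the constructor arguments visible in the diff above):

```python
def _unpack_dataloader(dataloader):
    """Hypothetical helper: normalize a DataLoader or DataloaderSpec into
    (dataloader, device_transform_fn, split_fn) so the train and eval
    paths can share the same logic."""
    if isinstance(dataloader, DataloaderSpec):
        return dataloader.dataloader, dataloader.device_transform_fn, dataloader.split_fn
    return dataloader, None, None
```

The trainer could then call it once for the train dataloader and once for the eval dataloader.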
            ({self.state.steps_per_epoch})"""))
else:
    self.state.steps_per_epoch = train_subset_num_batches
self.state.steps_per_epoch = train_subset_num_batches
curious to understand why train_subset and eval_subset are now handled differently?
self.engine = Engine(self.state, self.state.algorithms, self.logger, self.state.callbacks)

self.validate_every_n_batches = validate_every_n_batches
self.validate_every_n_epochs = validate_every_n_epochs
self.compute_training_metrics = compute_training_metrics
- self.grad_clip_norm = grad_clip_norm
+ self._grad_clip_norm = grad_clip_norm
Since this is not a property of state, we cannot make this a getter/private attribute.
if self._train_split_fn is None:
    split_fn = default_batch_split_fn
else:
    split_fn = self._train_split_fn
Suggested change:
- if self._train_split_fn is None:
-     split_fn = default_batch_split_fn
- else:
-     split_fn = self._train_split_fn
+ split_fn = default_batch_split_fn if self._train_split_fn is None else self._train_split_fn
Or... put this logic closer to where `self._train_split_fn` is defined?
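That is, the fallback could be applied once, at the point where `self._train_split_fn` is first assigned (a sketch; `split_fn` here stands for whatever value was extracted from the DataloaderSpec, or None):

```python
# Apply the default where the attribute is first set, so later call sites
# can use self._train_split_fn directly without re-checking for None.
self._train_split_fn = split_fn if split_fn is not None else default_batch_split_fn
```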
                                  checkpoint_interval_unit=checkpoint_interval_unit)
if checkpoint_interval_unit is not None and self.deepspeed_enabled:
    raise NotImplementedError("Checkpointing is not yet supported with DeepSpeed.")
self._checkpointer = Checkpointer(checkpoint_folder=checkpoint_folder,
this need not be private
Closing this PR as we need to revisit multiple calls to `.fit`.