Bind trainer __init__ arguments as properties/attributes #154
Conversation
Making state a dataclass was more hassle than it was worth, since dataclasses do not easily allow getters to have a different type signature than setters and the init. Instead, switched to using a plain old Python class. This change allowed for the following cleanup:

* Replaced `train_batch_size` and `eval_batch_size` with getters that read from the dataloader
* Added getters for `state.optimizers` and `state.schedulers` which return a list, regardless of the type passed to the setters or upon init
* Removed epoch, step, loss, batch, and outputs from the init args, as they would be updated by the trainer (and would not be set on init)
* Added setters for `optimizers`, `schedulers`, `callbacks`, and `algorithms` that ensure in-place operations, so references to previous state objects remain valid
Most arguments that could be passed to the `__init__` of the trainer are now bound to the trainer either as raw attributes or properties. This will enable changing the trainer behavior when doing interactive development.
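For context, the list-normalizing, in-place setter pattern described above looks roughly like the following minimal sketch (illustrative only, not the actual implementation from this PR):

```python
from typing import List, Union

import torch


class State:
    """Minimal sketch of a plain Python state class with a list-normalizing,
    in-place optimizer setter (not the PR's implementation)."""

    def __init__(self, optimizers: Union[torch.optim.Optimizer, List[torch.optim.Optimizer]]):
        self._optimizers: List[torch.optim.Optimizer] = []
        self.optimizers = optimizers  # route through the setter below

    @property
    def optimizers(self) -> List[torch.optim.Optimizer]:
        # Getter always returns a list, regardless of what was passed in
        return self._optimizers

    @optimizers.setter
    def optimizers(self, optimizers: Union[torch.optim.Optimizer, List[torch.optim.Optimizer]]):
        if isinstance(optimizers, torch.optim.Optimizer):
            optimizers = [optimizers]
        # Mutate the existing list in place, so references held to
        # state.optimizers keep seeing the updated values
        self._optimizers[:] = list(optimizers)
```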
Some small comments. There was a lot in here that didn't have anything to do with the getting and setting. From what I could tell, they looked sensible and fine, but I'll defer to folks that might be more aware of the reasons for those changes. The init stuff mostly looks good to me, though.
- self.state.optimizers = optimizer
- self.state.schedulers = ComposedScheduler(schedulers=schedulers)
+ self.state.optimizers = optimizers
+ self.state.schedulers = [ComposedScheduler(schedulers=schedulers)]
Not sure, but maybe the ComposedScheduler casting could live in the underlying schedulers object? A little opinionated, but the setter could auto-wrap schedulers in a ComposedScheduler.
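A rough sketch of how that setter-side wrapping could look (just the setter, as a hypothetical fragment of the State class; only the `ComposedScheduler(schedulers=...)` call is taken from the diff above):

```python
@schedulers.setter
def schedulers(self, schedulers):
    # Hypothetical: accept a single scheduler or a list, and wrap anything that
    # is not already a ComposedScheduler so callers never need to cast manually.
    if not isinstance(schedulers, (list, tuple)):
        schedulers = [schedulers]
    self._schedulers = [
        s if isinstance(s, ComposedScheduler) else ComposedScheduler(schedulers=[s])
        for s in schedulers
    ]
```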
@property
def model(self) -> BaseMosaicModel:
    """The original model"""
    return ddp.get_original_model(self.state.model)

@property
def train_dataloader(self) -> Union[DataLoader, DataloaderSpec]:
    """The train dataloader"""
    if self._train_split_fn is not None and self._train_device_transformation_fn is not None:
        return DataloaderSpec(self.state.train_dataloader, self._train_device_transformation_fn,
                              self._train_split_fn)
    else:
        return self.state.train_dataloader

@train_dataloader.setter
I think it'd be nice to go to a TrainerProperties-style separation of all the properties, for ease of reading them in one place like PTL? Maybe separate them into their own file.
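For reference, the suggested separation might look roughly like this mixin pattern (hypothetical file names and layout; not code from the PR):

```python
# trainer_properties.py (hypothetical file split suggested in the comment above)
class TrainerProperties:
    """Mixin that collects the Trainer's properties, keeping the main Trainer file small."""

    @property
    def grad_accum(self) -> int:
        """Gradient accumulation, read from the state set by the concrete Trainer."""
        return self.state.grad_accum


# trainer.py
class Trainer(TrainerProperties):
    def __init__(self, state):
        self.state = state
```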
As discussed, we should remove all the setters here
lgtm, I think I'd very much prefer breaking out properties into a TrainerProperties interface just to keep the Trainer file a little smaller and separate out the bloat
@property
def grad_accum(self):
    """Gradient Accumulation"""
    return self.state.grad_accum

@grad_accum.setter
def grad_accum(self, grad_accum: int):
    if self.deepspeed_enabled:
        raise RuntimeError("Cannot change grad_accum when using deepspeed")
Could you say what should be done instead? Is this another "If you'd like to change it, create a new Trainer" type of thing?
if val is not None and val > len(self.train_dataloader):
    warnings.warn(
        textwrap.dedent(f"""StepsPerEpochWarning: The desired steps_per_epoch({val})
            is greater than the number of batches in the training dataloader
Should include a short description of what will happen / what behavior to expect in this case (even if it's just ignored)
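For example, the warning text could be extended along these lines (a hypothetical drop-in for the snippet above, assuming the oversized value is simply ignored, which is the behavior the comment asks the author to confirm):

```python
# Hypothetical extended warning; the described behavior is an assumption.
warnings.warn(
    textwrap.dedent(f"""\
        StepsPerEpochWarning: The desired steps_per_epoch ({val}) is greater than
        the number of batches in the training dataloader ({len(self.train_dataloader)}).
        The value will be ignored and each epoch will run over the full dataloader."""))
```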
if checkpoint_interval_unit.lower() == "it":
    self._save_event = Event.BATCH_END
    return
raise RuntimeError(f"Invalid checkpoint_interval_unit: {checkpoint_interval_unit}")
raise RuntimeError(f"Invalid checkpoint_interval_unit: {checkpoint_interval_unit}") | |
raise RuntimeError(f"Invalid checkpoint_interval_unit: {checkpoint_interval_unit} must be one of 'it', 'ep'") |
if backends is None:
    self.backends = []
else:
    self.backends = backends
Suggested change:
- if backends is None:
-     self.backends = []
- else:
-     self.backends = backends
+ self.backends = [] if backends is None else backends
if isinstance(train_dataloader, DataloaderSpec):
    train_dataloader_spec = train_dataloader
    self._train_device_transformation_fn = train_dataloader.device_transform_fn
These changes don't seem related to binding the trainer arguments as properties?
I also find the previous logic more readable, and now the handling of `train_dataloader` and `eval_dataloader` is not consistent.
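One way to keep the two paths symmetric would be a small shared helper along these lines (a sketch; the helper name and the `DataloaderSpec` field names are assumptions based on the constructor arguments visible in the diff above):

```python
def _unpack_dataloader(dataloader):
    """Hypothetical helper: normalize a DataLoader or DataloaderSpec into
    (dataloader, device_transform_fn, split_fn) so the train and eval
    paths can share the same logic."""
    if isinstance(dataloader, DataloaderSpec):
        return dataloader.dataloader, dataloader.device_transform_fn, dataloader.split_fn
    return dataloader, None, None
```

The trainer could then call it once for the train dataloader and once for the eval dataloader.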
            ({self.state.steps_per_epoch})"""))
else:
    self.state.steps_per_epoch = train_subset_num_batches
self.state.steps_per_epoch = train_subset_num_batches
curious to understand why train_subset and eval_subset are now handled differently?
self.engine = Engine(self.state, self.state.algorithms, self.logger, self.state.callbacks)

self.validate_every_n_batches = validate_every_n_batches
self.validate_every_n_epochs = validate_every_n_epochs
self.compute_training_metrics = compute_training_metrics
- self.grad_clip_norm = grad_clip_norm
+ self._grad_clip_norm = grad_clip_norm
Since this is not a property of state, we cannot make this a getter/private attribute.
if self._train_split_fn is None:
    split_fn = default_batch_split_fn
else:
    split_fn = self._train_split_fn
Suggested change:
- if self._train_split_fn is None:
-     split_fn = default_batch_split_fn
- else:
-     split_fn = self._train_split_fn
+ split_fn = default_batch_split_fn if self._train_split_fn is None else self._train_split_fn
Or... put this logic closer to where `self._train_split_fn` is defined?
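That is, the fallback could be applied once, at the point where `self._train_split_fn` is first assigned (a sketch; `split_fn` here stands for whatever value was extracted from the DataloaderSpec, or None):

```python
# Apply the default where the attribute is first set, so later call sites
# can use self._train_split_fn directly without re-checking for None.
self._train_split_fn = split_fn if split_fn is not None else default_batch_split_fn
```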
                                  checkpoint_interval_unit=checkpoint_interval_unit)
if checkpoint_interval_unit is not None and self.deepspeed_enabled:
    raise NotImplementedError("Checkpointing is not yet supported with DeepSpeed.")
self._checkpointer = Checkpointer(checkpoint_folder=checkpoint_folder,
this need not be private
Closing this PR as we need to revisit multiple calls to `.fit`.