Run Directory Uploader #101

Merged
merged 58 commits into from
Dec 3, 2021
Changes from 53 commits
Commits
6357f2e
Added `run_event` to callback
ravi-mosaicml Nov 15, 2021
f395df4
Removed callback helper methods
ravi-mosaicml Nov 16, 2021
0f1aa69
Fixed tests
ravi-mosaicml Nov 16, 2021
06cac4b
Formatting
ravi-mosaicml Nov 16, 2021
d886af6
Addressed PR feedback
ravi-mosaicml Nov 18, 2021
9644ad9
Fixed tests
ravi-mosaicml Nov 18, 2021
cf5e533
Formatting
ravi-mosaicml Nov 18, 2021
b1bf400
Fixed _run_event
ravi-mosaicml Nov 18, 2021
9bffe3b
Merge branch 'dev' into ravi/run_event
ravi-mosaicml Nov 19, 2021
4ed9f4f
Formatting
ravi-mosaicml Nov 19, 2021
75944eb
Removed ip
ravi-mosaicml Nov 19, 2021
cee479f
Create dataloader on trainer __init__()
ravi-mosaicml Nov 19, 2021
f2f4ede
Merge branch 'ravi/run_event' into ravi/libcloud
ravi-mosaicml Nov 22, 2021
8bf1c67
Merge branch 'ravi/create_dataloaders_in_init' into ravi/libcloud
ravi-mosaicml Nov 22, 2021
8b3563e
Run Directory Uploader
ravi-mosaicml Nov 22, 2021
c8ccb49
Merge branch 'dev' into ravi/run_event
ravi-mosaicml Nov 22, 2021
5214f39
Supporting both styles for callbacks
ravi-mosaicml Nov 23, 2021
47158fb
Minimizing Diff
ravi-mosaicml Nov 23, 2021
35faa29
Fixed tests
ravi-mosaicml Nov 23, 2021
d20c914
Merge branch 'dev' into ravi/run_event
ravi-mosaicml Nov 23, 2021
254bd51
Merge branch 'dev' into ravi/run_event
ravi-mosaicml Nov 23, 2021
0d02d07
Merge branch 'ravi/run_event' into ravi/libcloud
ravi-mosaicml Nov 23, 2021
d568aa6
Added fasteners
ravi-mosaicml Nov 23, 2021
f0d2090
Lazy population of kwargs
ravi-mosaicml Nov 23, 2021
06ade34
1. Added object_name_prefix
ravi-mosaicml Nov 23, 2021
e561463
Addressed PR feedback
ravi-mosaicml Nov 23, 2021
f3aa6bd
Remove the composer.trainer.ddp class
ravi-mosaicml Nov 23, 2021
a28ce89
Merge branch 'ravi/run_event' into ravi/ddp_global
ravi-mosaicml Nov 23, 2021
568e3c9
Merge branch 'ravi/ddp_global' into ravi/libcloud
ravi-mosaicml Nov 23, 2021
f706ff8
Added in DDP barrier
ravi-mosaicml Nov 23, 2021
d7841ca
Fixed tests
ravi-mosaicml Nov 23, 2021
c30a274
Merge branch 'dev' into ravi/run_event
ravi-mosaicml Nov 23, 2021
0509df5
Merge branch 'ravi/run_event' into ravi/ddp_global
ravi-mosaicml Nov 23, 2021
9e12061
Update composer/utils/ddp.py
jbloxham Nov 30, 2021
87c0441
Update composer/utils/ddp.py
jbloxham Nov 30, 2021
6a6427e
Switched tqdm to using callback hooks
ravi-mosaicml Nov 30, 2021
c9ee85b
Merge branch 'dev' into ravi/run_event
ravi-mosaicml Nov 30, 2021
eb9def8
Fixed pyright
ravi-mosaicml Nov 30, 2021
20ba063
Merge branch 'ravi/run_event' into ravi/ddp_global
ravi-mosaicml Nov 30, 2021
b8863da
Fixed DDP barriers
ravi-mosaicml Nov 30, 2021
aa207f4
Merge branch 'ravi/ddp_global' of github.com:mosaicml/composer into r…
ravi-mosaicml Nov 30, 2021
8d3cceb
Merge branch 'ravi/ddp_global' into ravi/libcloud
ravi-mosaicml Nov 30, 2021
a913fa9
Increased timeout for run directory uploader
ravi-mosaicml Nov 30, 2021
00fcc33
Switched callback format for run directory uploader
ravi-mosaicml Nov 30, 2021
4214ce4
Merge branch 'dev' into ravi/libcloud
ravi-mosaicml Nov 30, 2021
5066bdc
Replaced `atexit` with cleanup methods
ravi-mosaicml Nov 30, 2021
3d5c86f
Merge branch 'ravi/remove_atexit' into ravi/libcloud
ravi-mosaicml Nov 30, 2021
5171468
Uncommented code
ravi-mosaicml Nov 30, 2021
97326bd
Running callbacks befor algorithms for the INIT event in the engine
ravi-mosaicml Nov 30, 2021
8c21260
Merge branch 'ravi/fix_engine_2' into ravi/remove_atexit
ravi-mosaicml Nov 30, 2021
d7f9514
Merge branch 'ravi/remove_atexit' into ravi/libcloud
ravi-mosaicml Nov 30, 2021
a4e3b24
Merge branch 'dev' into ravi/libcloud
ravi-mosaicml Dec 1, 2021
20dc896
Fixed tests
ravi-mosaicml Dec 1, 2021
42f9ab3
Addressed PR feedback
ravi-mosaicml Dec 1, 2021
481ab37
Fixed bug
ravi-mosaicml Dec 2, 2021
6fc5555
Fixed bugs
ravi-mosaicml Dec 2, 2021
ec7011e
Fixed rank 0 only uploads
ravi-mosaicml Dec 2, 2021
2d0b058
Using filesystem timestamps instead of python process timestamps to d…
ravi-mosaicml Dec 2, 2021
2 changes: 2 additions & 0 deletions composer/callbacks/__init__.py
@@ -6,8 +6,10 @@
from composer.callbacks.callback_hparams import GradMonitorHparams as GradMonitorHparams
from composer.callbacks.callback_hparams import LRMonitorHparams as LRMonitorHparams
from composer.callbacks.callback_hparams import MemoryMonitorHparams as MemoryMonitorHparams
from composer.callbacks.callback_hparams import RunDirectoryUploaderHparams as RunDirectoryUploaderHparams
from composer.callbacks.callback_hparams import SpeedMonitorHparams as SpeedMonitorHparams
from composer.callbacks.callback_hparams import TorchProfilerHparams as TorchProfilerHparams
from composer.callbacks.lr_monitor import LRMonitor as LRMonitor
from composer.callbacks.run_directory_uploader import RunDirectoryUploader as RunDirectoryUploader
from composer.callbacks.speed_monitor import SpeedMonitor as SpeedMonitor
from composer.callbacks.torch_profiler import TorchProfiler as TorchProfiler
63 changes: 62 additions & 1 deletion composer/callbacks/callback_hparams.py
@@ -4,8 +4,9 @@
from __future__ import annotations

import abc
import textwrap
from dataclasses import asdict, dataclass
from typing import TYPE_CHECKING, List
from typing import TYPE_CHECKING, Any, Dict, List, Optional

import yahp as hp

@@ -16,6 +17,7 @@
from composer.callbacks.grad_monitor import GradMonitor
from composer.callbacks.lr_monitor import LRMonitor
from composer.callbacks.memory_monitor import MemoryMonitor
from composer.callbacks.run_directory_uploader import RunDirectoryUploader
from composer.callbacks.speed_monitor import SpeedMonitor
from composer.callbacks.torch_profiler import TorchProfiler

@@ -153,3 +155,62 @@ class TorchProfilerHparams(CallbackHparams):
    def initialize_object(self) -> TorchProfiler:
        from composer.callbacks.torch_profiler import TorchProfiler
        return TorchProfiler(**asdict(self))


@dataclass
class RunDirectoryUploaderHparams(CallbackHparams):
    """:class:`~composer.callbacks.run_directory_uploader.RunDirectoryUploader` hyperparameters.
    See :class:`~composer.callbacks.run_directory_uploader.RunDirectoryUploader` for documentation.
    """

    provider: str = hp.required("Cloud provider to use.")
    container: str = hp.required("The name of the container (i.e. bucket) to use.")
    object_name_prefix: Optional[str] = hp.optional(textwrap.dedent("""A prefix to prepend to all object keys.
        An object's key is this prefix combined with its path relative to the run directory.
        If the prefix is non-empty, a trailing slash ('/') will
        be added if necessary. If not specified, then the prefix defaults to the run directory. To disable prefixing,
        set to the empty string."""),
                                                    default=None)
    key: Optional[str] = hp.optional(textwrap.dedent(
        """API key or username to use to connect to the provider. For security, do NOT hardcode the key in the YAML.
        Instead, please specify via CLI arguments, or even better, environment variables."""),
                                     default=None)
    secret: Optional[str] = hp.optional(textwrap.dedent(
        """API secret to use to connect to the provider. For security, do NOT hardcode the secret in the YAML.
        Instead, please specify via CLI arguments, or even better, environment variables."""),
                                        default=None)
    region: Optional[str] = hp.optional("Cloud region to use", default=None)
    host: Optional[str] = hp.optional("Override hostname for connections", default=None)
    port: Optional[int] = hp.optional("Override port for connections", default=None)
    num_concurrent_uploads: int = hp.optional("Maximum number of concurrent uploads. Defaults to 4.", default=4)
    use_procs: bool = hp.optional(
        "Whether to perform file uploads in background processes (as opposed to threads). Defaults to True.",
        default=True)
    upload_staging_folder: Optional[str] = hp.optional(
        "Staging folder for uploads. If not specified, will use a temporary directory.", default=None)
    extra_init_kwargs: Dict[str, Any] = hp.optional(
        "Extra keyword arguments to pass into the constructor for the specified provider.", default_factory=dict)
    upload_every_n_batches: int = hp.optional(
        textwrap.dedent("""Interval at which to scan the run directory for changes and to
            queue uploads of files. Uploads are also queued at the end of the epoch. Defaults to every 100 batches."""),
        default=100)

    def initialize_object(self) -> RunDirectoryUploader:
        from composer.callbacks.run_directory_uploader import RunDirectoryUploader
        # Forward only the connection options that were explicitly set,
        # so unset (None) values do not override the provider's defaults.
        init_kwargs = {}
        for key in ("key", "secret", "host", "port", "region"):
            kwarg = getattr(self, key)
            if kwarg is not None:
                init_kwargs[key] = kwarg
        # Extra kwargs are applied last and take precedence on conflicts.
        init_kwargs.update(self.extra_init_kwargs)
        return RunDirectoryUploader(
            provider=self.provider,
            container=self.container,
            object_name_prefix=self.object_name_prefix,
            num_concurrent_uploads=self.num_concurrent_uploads,
            upload_staging_folder=self.upload_staging_folder,
            use_procs=self.use_procs,
            provider_init_kwargs=init_kwargs,
            upload_every_n_batches=self.upload_every_n_batches,
        )
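
The kwargs filtering in `initialize_object` can be tried in isolation, together with the advice in the `key` and `secret` help text to take credentials from the environment. This is a minimal sketch, not code from this PR: `build_provider_kwargs` is a hypothetical helper, and `UPLOADER_KEY` / `UPLOADER_SECRET` are placeholder environment variable names.

```python
import os
from typing import Any, Dict, Mapping, Optional


def build_provider_kwargs(
    key: Optional[str] = None,
    secret: Optional[str] = None,
    host: Optional[str] = None,
    port: Optional[int] = None,
    region: Optional[str] = None,
    extra_init_kwargs: Optional[Mapping[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper mirroring the loop in ``initialize_object``."""
    candidates = {"key": key, "secret": secret, "host": host, "port": port, "region": region}
    # Drop unset (None) options so the provider's own defaults apply.
    init_kwargs: Dict[str, Any] = {name: value for name, value in candidates.items() if value is not None}
    # Extra kwargs are applied last, so they win when a name appears in both places.
    init_kwargs.update(extra_init_kwargs or {})
    return init_kwargs


# Credentials come from the environment instead of being hardcoded in YAML.
# UPLOADER_KEY and UPLOADER_SECRET are placeholder names (assumptions).
provider_kwargs = build_provider_kwargs(
    key=os.environ.get("UPLOADER_KEY"),
    secret=os.environ.get("UPLOADER_SECRET"),
    region="us-west-2",
)
```

If the environment variables are unset, `os.environ.get` returns `None` and the corresponding entries are simply left out, which matches the `is not None` check in the diff.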
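
The `object_name_prefix` help text describes how object keys are derived from file paths. A rough sketch of those rules follows, under stated assumptions: `object_key` is a hypothetical function (the uploader's own implementation is not shown in this diff), and the default prefix is taken to be the run directory's base name, which the help text does not spell out.

```python
import os
from typing import Optional


def object_key(file_path: str, run_directory: str, object_name_prefix: Optional[str] = None) -> str:
    """Hypothetical illustration of the prefixing rules in the help text."""
    if object_name_prefix is None:
        # Assumption: "defaults to the run directory" means its base name.
        object_name_prefix = os.path.basename(os.path.normpath(run_directory))
    # A non-empty prefix gets a trailing slash if it lacks one;
    # the empty string disables prefixing entirely.
    if object_name_prefix and not object_name_prefix.endswith("/"):
        object_name_prefix += "/"
    relative_path = os.path.relpath(file_path, run_directory)
    # Object keys use forward slashes regardless of the local path separator.
    return object_name_prefix + relative_path.replace(os.sep, "/")


# Example calls with made-up paths.
print(object_key("/runs/exp1/logs/train.log", "/runs/exp1"))
print(object_key("/runs/exp1/logs/train.log", "/runs/exp1", ""))
print(object_key("/runs/exp1/ckpt.pt", "/runs/exp1", "my-team/run-7"))
```

Distinguishing `None` (use the default) from `""` (no prefix) is why the field is typed `Optional[str]` with `default=None` rather than defaulting to an empty string.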