Merge branch 'master' into ci/bump-pt-2.6
Borda authored Feb 14, 2025
2 parents 3d8f484 + 4eda5a0 commit d54ab0b
Showing 30 changed files with 40 additions and 44 deletions.
2 changes: 1 addition & 1 deletion .github/CONTRIBUTING.md
@@ -189,7 +189,7 @@ We welcome any useful contribution! For your convenience here's a recommended wo
#### How can I help/contribute?

All types of contributions are welcome - reporting bugs, fixing documentation, adding test cases, solving issues, and preparing bug fixes.
-To get started with code contributions, look for issues marked with the label [good first issue](https://github.com/Lightning-AI/lightning/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22) or chose something close to your domain with the label [help wanted](https://github.com/Lightning-AI/lightning/issues?q=is%3Aopen+is%3Aissue+label%3A%22help+wanted%22). Before coding, make sure that the issue description is clear and comment on the issue so that we can assign it to you (or simply self-assign if you can).
+To get started with code contributions, look for issues marked with the label [good first issue](https://github.com/Lightning-AI/pytorch-lightning/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22) or chose something close to your domain with the label [help wanted](https://github.com/Lightning-AI/pytorch-lightning/issues?q=is%3Aopen+is%3Aissue+label%3A%22help+wanted%22). Before coding, make sure that the issue description is clear and comment on the issue so that we can assign it to you (or simply self-assign if you can).

#### Is there a recommendation for branch names?

2 changes: 1 addition & 1 deletion docs/source-fabric/_templates/theme_variables.jinja
@@ -1,6 +1,6 @@
{%- set external_urls = {
'github': 'https://github.com/Lightning-AI/lightning',
-'github_issues': 'https://github.com/Lightning-AI/lightning/issues',
+'github_issues': 'https://github.com/Lightning-AI/pytorch-lightning/issues',
'contributing': 'https://github.com/Lightning-AI/lightning/blob/master/.github/CONTRIBUTING.md',
'governance': 'https://lightning.ai/docs/pytorch/latest/community/governance.html',
'docs': 'https://lightning.ai/docs/fabric/',
2 changes: 1 addition & 1 deletion docs/source-fabric/links.rst
@@ -1,3 +1,3 @@
-.. _PyTorchJob: https://www.kubeflow.org/docs/components/training/pytorch/
+.. _PyTorchJob: https://www.kubeflow.org/docs/components/trainer/legacy-v1/user-guides/pytorch/
.. _Kubeflow: https://www.kubeflow.org
.. _Trainer: https://lightning.ai/docs/pytorch/stable/common/trainer.html
2 changes: 1 addition & 1 deletion docs/source-pytorch/_templates/theme_variables.jinja
@@ -1,6 +1,6 @@
{%- set external_urls = {
'github': 'https://github.com/Lightning-AI/lightning',
-'github_issues': 'https://github.com/Lightning-AI/lightning/issues',
+'github_issues': 'https://github.com/Lightning-AI/pytorch-lightning/issues',
'contributing': 'https://github.com/Lightning-AI/lightning/blob/master/.github/CONTRIBUTING.md',
'governance': 'https://lightning.ai/docs/pytorch/latest/community/governance.html',
'docs': 'https://lightning.ai/docs/pytorch/latest/',
2 changes: 1 addition & 1 deletion docs/source-pytorch/accelerators/accelerator_prepare.rst
@@ -123,7 +123,7 @@ It is possible to perform some computation manually and log the reduced result o
# When you call `self.log` only on rank 0, don't forget to add
# `rank_zero_only=True` to avoid deadlocks on synchronization.
-# Caveat: monitoring this is unimplemented, see https://github.com/Lightning-AI/lightning/issues/15852
+# Caveat: monitoring this is unimplemented, see https://github.com/Lightning-AI/pytorch-lightning/issues/15852
if self.trainer.is_global_zero:
self.log("my_reduced_metric", mean, rank_zero_only=True)
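For context, a minimal sketch of the pattern this doc snippet describes, assuming a LightningModule with a hypothetical `_shared_step` helper; only the `is_global_zero` guard and `rank_zero_only=True` come from the docs above:

```python
import lightning.pytorch as pl


class MyModel(pl.LightningModule):
    def validation_step(self, batch, batch_idx):
        loss = self._shared_step(batch)  # hypothetical helper returning a scalar
        # Gather the per-process values and reduce them manually.
        mean = self.all_gather(loss).mean()
        # Log only on rank 0; `rank_zero_only=True` avoids deadlocks on synchronization.
        if self.trainer.is_global_zero:
            self.log("my_reduced_metric", mean, rank_zero_only=True)
```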
4 changes: 0 additions & 4 deletions docs/source-pytorch/accelerators/gpu_intermediate.rst
@@ -25,10 +25,6 @@ Lightning supports multiple ways of doing distributed training.
.. note::
If you request multiple GPUs or nodes without setting a strategy, DDP will be automatically used.

-For a deeper understanding of what Lightning is doing, feel free to read this
-`guide <https://towardsdatascience.com/9-tips-for-training-lightning-fast-neural-networks-in-pytorch-8e63a502f565>`_.
-
-
----


2 changes: 1 addition & 1 deletion docs/source-pytorch/advanced/ddp_optimizations.rst
@@ -58,7 +58,7 @@ On a Multi-Node Cluster, Set NCCL Parameters
********************************************

`NCCL <https://developer.nvidia.com/nccl>`__ is the NVIDIA Collective Communications Library that is used by PyTorch to handle communication across nodes and GPUs.
-There are reported benefits in terms of speedups when adjusting NCCL parameters as seen in this `issue <https://github.com/Lightning-AI/lightning/issues/7179>`__.
+There are reported benefits in terms of speedups when adjusting NCCL parameters as seen in this `issue <https://github.com/Lightning-AI/pytorch-lightning/issues/7179>`__.
In the issue, we see a 30% speed improvement when training the Transformer XLM-RoBERTa and a 15% improvement in training with Detectron2.
NCCL parameters can be adjusted via environment variables.

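As a hedged illustration of adjusting NCCL through environment variables from Python: the variable names are real NCCL settings, but the values are placeholders to tune per cluster, not recommendations from this diff.

```python
import os

# NCCL reads these at initialization, so set them before the process group
# is created (i.e., before the Trainer or Fabric launches distributed training).
os.environ["NCCL_NSOCKS_PERTHREAD"] = "4"  # placeholder value
os.environ["NCCL_SOCKET_NTHREADS"] = "2"   # placeholder value
```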
2 changes: 1 addition & 1 deletion docs/source-pytorch/advanced/model_parallel/deepspeed.rst
@@ -319,7 +319,7 @@ Additionally, DeepSpeed supports offloading to NVMe drives for even larger model
)
trainer.fit(model)
-When offloading to NVMe you may notice that the speed is slow. There are parameters that need to be tuned based on the drives that you are using. Running the `aio_bench_perf_sweep.py <https://github.com/microsoft/DeepSpeed/blob/master/csrc/aio/py_test/aio_bench_perf_sweep.py>`__ script can help you to find optimum parameters. See the `issue <https://github.com/microsoft/DeepSpeed/issues/998>`__ for more information on how to parse the information.
+When offloading to NVMe you may notice that the speed is slow. There are parameters that need to be tuned based on the drives that you are using. Running the `aio_bench_perf_sweep.py <https://github.com/microsoft/DeepSpeed/blob/master/csrc/aio/py_test/aio_bench_perf_sweep.py>`__ script can help you to find optimum parameters. See the `issue <https://github.com/deepspeedai/DeepSpeed/issues/998>`__ for more information on how to parse the information.
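A minimal sketch of NVMe offloading with Lightning's DeepSpeed strategy, assuming a local NVMe mount at `/local_nvme`; device count and precision are illustrative:

```python
from lightning.pytorch import Trainer
from lightning.pytorch.strategies import DeepSpeedStrategy

trainer = Trainer(
    accelerator="gpu",
    devices=4,
    strategy=DeepSpeedStrategy(
        stage=3,
        offload_optimizer=True,
        offload_parameters=True,
        remote_device="nvme",
        offload_params_device="nvme",
        offload_optimizer_device="nvme",
        nvme_path="/local_nvme",  # assumed NVMe mount point
    ),
    precision="16-mixed",
)
```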

.. _deepspeed-activation-checkpointing:

2 changes: 1 addition & 1 deletion docs/source-pytorch/data/alternatives.rst
@@ -90,7 +90,7 @@ the desired GPU in your pipeline. When moving data to a specific device, you can
WebDataset
^^^^^^^^^^

-The `WebDataset <https://webdataset.github.io/webdataset>`__ makes it easy to write I/O pipelines for large datasets.
+The `WebDataset <https://github.com/webdataset/webdataset>`__ makes it easy to write I/O pipelines for large datasets.
Datasets can be stored locally or in the cloud. ``WebDataset`` is just an instance of a standard IterableDataset.
The webdataset library contains a small wrapper (``WebLoader``) that adds a fluid interface to the DataLoader (and is otherwise identical).

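A small sketch of that fluid interface, assuming shards named `shards-0000.tar` through `shards-0009.tar` containing `jpg`/`cls` pairs:

```python
import webdataset as wds

# WebDataset is a standard IterableDataset; shards may be local or remote.
dataset = (
    wds.WebDataset("shards-{0000..0009}.tar")
    .shuffle(1000)
    .decode("pil")
    .to_tuple("jpg", "cls")
)
# WebLoader wraps DataLoader and keeps the same fluid interface.
loader = wds.WebLoader(dataset, batch_size=32)
```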
2 changes: 1 addition & 1 deletion docs/source-pytorch/data/iterables.rst
@@ -50,7 +50,7 @@ To choose a different mode, you can use the :class:`~lightning.pytorch.utilities
Currently, the ``trainer.predict`` method only supports the ``"sequential"`` mode, while ``trainer.fit`` method does not support it.
-Support for this feature is tracked in this `issue <https://github.com/Lightning-AI/lightning/issues/16830>`__.
+Support for this feature is tracked in this `issue <https://github.com/Lightning-AI/pytorch-lightning/issues/16830>`__.

Note that when using the ``"sequential"`` mode, you need to add an additional argument ``dataloader_idx`` to some specific hooks.
Lightning will `raise an error <https://github.com/Lightning-AI/lightning/pull/16837>`__ informing you of this requirement.
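A sketch of selecting the mode, with illustrative loaders; in "sequential" mode the affected hooks take the extra `dataloader_idx` argument mentioned above:

```python
from torch.utils.data import DataLoader
from lightning.pytorch.utilities import CombinedLoader

iterables = {"a": DataLoader(range(8), batch_size=2), "b": DataLoader(range(16), batch_size=4)}
combined = CombinedLoader(iterables, mode="sequential")

# In the LightningModule, hooks then accept the extra index argument:
# def validation_step(self, batch, batch_idx, dataloader_idx=0): ...
```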
2 changes: 1 addition & 1 deletion docs/source-pytorch/links.rst
@@ -1,2 +1,2 @@
-.. _PyTorchJob: https://www.kubeflow.org/docs/components/training/pytorch/
+.. _PyTorchJob: https://www.kubeflow.org/docs/components/trainer/legacy-v1/user-guides/pytorch/
.. _Kubeflow: https://www.kubeflow.org
4 changes: 2 additions & 2 deletions docs/source-pytorch/versioning.rst
@@ -61,8 +61,8 @@ For API removal, renaming or other forms of backwards-incompatible changes, the
#. From that version onward, the deprecation warning gets converted into a helpful error, which will remain until next major release.

This policy is not strict. Shorter or longer deprecation cycles may apply to some cases.
-For example, in the past DDP2 was removed without a deprecation process because the feature was broken and unusable beyond fixing as discussed in `#12584 <https://github.com/Lightning-AI/lightning/issues/12584>`_.
-Also, `#10410 <https://github.com/Lightning-AI/lightning/issues/10410>`_ is an example that a longer deprecation applied to. We deprecated the accelerator arguments, such as ``Trainer(gpus=...)``, in 1.7, however, because the APIs were so core that they would impact almost all use cases, we decided not to introduce the breaking change until 2.0.
+For example, in the past DDP2 was removed without a deprecation process because the feature was broken and unusable beyond fixing as discussed in `#12584 <https://github.com/Lightning-AI/pytorch-lightning/issues/12584>`_.
+Also, `#10410 <https://github.com/Lightning-AI/pytorch-lightning/issues/10410>`_ is an example that a longer deprecation applied to. We deprecated the accelerator arguments, such as ``Trainer(gpus=...)``, in 1.7, however, because the APIs were so core that they would impact almost all use cases, we decided not to introduce the breaking change until 2.0.

Compatibility matrix
********************
2 changes: 1 addition & 1 deletion src/lightning/__setup__.py
@@ -104,7 +104,7 @@ def _setup_args() -> dict[str, Any]:
"install_requires": install_requires,
"extras_require": _prepare_extras(),
"project_urls": {
"Bug Tracker": "https://github.com/Lightning-AI/lightning/issues",
"Bug Tracker": "https://github.com/Lightning-AI/pytorch-lightning/issues",
"Documentation": "https://lightning.ai/lightning-docs",
"Source Code": "https://github.com/Lightning-AI/lightning",
},
2 changes: 1 addition & 1 deletion src/lightning/fabric/CHANGELOG.md
@@ -337,7 +337,7 @@ Removed legacy supoport for `lightning run model`. Use `fabric run` instead. ([#
### Fixed

- Fixed computing the next version folder in `CSVLogger` ([#17139](https://github.com/Lightning-AI/lightning/pull/17139))
-- Fixed inconsistent settings for FSDP Precision ([#17670](https://github.com/Lightning-AI/lightning/issues/17670))
+- Fixed inconsistent settings for FSDP Precision ([#17670](https://github.com/Lightning-AI/pytorch-lightning/issues/17670))


## [2.0.2] - 2023-04-24
2 changes: 1 addition & 1 deletion src/lightning/fabric/plugins/environments/kubeflow.py
@@ -28,7 +28,7 @@ class KubeflowEnvironment(ClusterEnvironment):
This environment, unlike others, does not get auto-detected and needs to be passed to the Fabric/Trainer
constructor manually.
-.. _PyTorchJob: https://www.kubeflow.org/docs/components/training/pytorch/
+.. _PyTorchJob: https://www.kubeflow.org/docs/components/trainer/legacy-v1/user-guides/pytorch/
.. _Kubeflow: https://www.kubeflow.org
"""
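Since the environment is never auto-detected, a minimal usage sketch (the Trainer arguments are illustrative; Fabric accepts the same plugin):

```python
from lightning.pytorch import Trainer
from lightning.pytorch.plugins.environments import KubeflowEnvironment

# Pass the environment explicitly; it does not get auto-detected.
trainer = Trainer(accelerator="gpu", devices=8, num_nodes=4, plugins=[KubeflowEnvironment()])
```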
@@ -78,7 +78,7 @@ def __init__(
def is_interactive_compatible(self) -> bool:
# The start method 'spawn' is not supported in interactive environments
# The start method 'fork' is the only one supported in Jupyter environments, with constraints around CUDA
-# initialization. For more context, see https://github.com/Lightning-AI/lightning/issues/7550
+# initialization. For more context, see https://github.com/Lightning-AI/pytorch-lightning/issues/7550
return self._start_method == "fork"

@override
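In practice this check is why only the fork-based launcher works in notebooks; a hedged example of opting into it via the documented `ddp_notebook` strategy alias:

```python
from lightning.pytorch import Trainer

# Inside Jupyter, 'spawn' is unsupported; "ddp_notebook" uses the 'fork'
# start method, which is interactive-compatible (with CUDA initialization caveats).
trainer = Trainer(accelerator="gpu", devices=2, strategy="ddp_notebook")
```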
@@ -156,7 +156,7 @@ def _check_can_spawn_children(self) -> None:


def _basic_subprocess_cmd() -> Sequence[str]:
-import __main__ # local import to avoid https://github.com/Lightning-AI/lightning/issues/15218
+import __main__ # local import to avoid https://github.com/Lightning-AI/pytorch-lightning/issues/15218

if __main__.__spec__ is None: # pragma: no-cover
return [sys.executable, os.path.abspath(sys.argv[0])] + sys.argv[1:]
@@ -167,7 +167,7 @@ def _hydra_subprocess_cmd(local_rank: int) -> tuple[Sequence[str], str]:
from hydra.core.hydra_config import HydraConfig
from hydra.utils import get_original_cwd, to_absolute_path

-import __main__ # local import to avoid https://github.com/Lightning-AI/lightning/issues/15218
+import __main__ # local import to avoid https://github.com/Lightning-AI/pytorch-lightning/issues/15218

# when user is using hydra find the absolute path
if __main__.__spec__ is None: # pragma: no-cover
14 changes: 7 additions & 7 deletions src/lightning/pytorch/CHANGELOG.md
@@ -199,27 +199,27 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
- Fixed handling checkpoint dirpath suffix in NeptuneLogger ([#18863](https://github.com/Lightning-AI/lightning/pull/18863))
- Fixed an edge case where `ModelCheckpoint` would alternate between versioned and unversioned filename ([#19064](https://github.com/Lightning-AI/lightning/pull/19064))
- Fixed broadcast at initialization in `MPIEnvironment` ([#19074](https://github.com/Lightning-AI/lightning/pull/19074))
-- Fixed the tensor conversion in `self.log` to respect the default dtype ([#19046](https://github.com/Lightning-AI/lightning/issues/19046))
+- Fixed the tensor conversion in `self.log` to respect the default dtype ([#19046](https://github.com/Lightning-AI/pytorch-lightning/issues/19046))


## [2.1.2] - 2023-11-15

### Fixed

-- Fixed an issue causing permission errors on Windows when attempting to create a symlink for the "last" checkpoint ([#18942](https://github.com/Lightning-AI/lightning/issues/18942))
-- Fixed an issue where Metric instances from `torchmetrics` wouldn't get moved to the device when using FSDP ([#18954](https://github.com/Lightning-AI/lightning/issues/18954))
-- Fixed an issue preventing the user to `Trainer.save_checkpoint()` an FSDP model when `Trainer.test/validate/predict()` ran after `Trainer.fit()` ([#18992](https://github.com/Lightning-AI/lightning/issues/18992))
+- Fixed an issue causing permission errors on Windows when attempting to create a symlink for the "last" checkpoint ([#18942](https://github.com/Lightning-AI/pytorch-lightning/issues/18942))
+- Fixed an issue where Metric instances from `torchmetrics` wouldn't get moved to the device when using FSDP ([#18954](https://github.com/Lightning-AI/pytorch-lightning/issues/18954))
+- Fixed an issue preventing the user to `Trainer.save_checkpoint()` an FSDP model when `Trainer.test/validate/predict()` ran after `Trainer.fit()` ([#18992](https://github.com/Lightning-AI/pytorch-lightning/issues/18992))


## [2.1.1] - 2023-11-06

### Fixed

- Fixed an issue when replacing an existing `last.ckpt` file with a symlink ([#18793](https://github.com/Lightning-AI/lightning/pull/18793))
-- Fixed an issue when `BatchSizeFinder` `steps_per_trial` parameter ends up defining how many validation batches to run during the entire training ([#18394](https://github.com/Lightning-AI/lightning/issues/18394))
-- Fixed an issue saving the `last.ckpt` file when using `ModelCheckpoint` on a remote filesystem and no logger is used ([#18867](https://github.com/Lightning-AI/lightning/issues/18867))
+- Fixed an issue when `BatchSizeFinder` `steps_per_trial` parameter ends up defining how many validation batches to run during the entire training ([#18394](https://github.com/Lightning-AI/pytorch-lightning/issues/18394))
+- Fixed an issue saving the `last.ckpt` file when using `ModelCheckpoint` on a remote filesystem and no logger is used ([#18867](https://github.com/Lightning-AI/pytorch-lightning/issues/18867))
- Refined the FSDP saving logic and error messaging when path exists ([#18884](https://github.com/Lightning-AI/lightning/pull/18884))
-- Fixed an issue parsing the version from folders that don't include a version number in `TensorBoardLogger` and `CSVLogger` ([#18897](https://github.com/Lightning-AI/lightning/issues/18897))
+- Fixed an issue parsing the version from folders that don't include a version number in `TensorBoardLogger` and `CSVLogger` ([#18897](https://github.com/Lightning-AI/pytorch-lightning/issues/18897))


## [2.1.0] - 2023-10-11
2 changes: 1 addition & 1 deletion src/lightning/pytorch/callbacks/stochastic_weight_avg.py
@@ -354,7 +354,7 @@ def _clear_schedulers(trainer: "pl.Trainer") -> None:
# Note that this relies on the callback state being restored before the scheduler state is
# restored, and doesn't work if restore_checkpoint_after_setup is True, but at the time of
# writing that is only True for deepspeed which is already not supported by SWA.
-# See https://github.com/Lightning-AI/lightning/issues/11665 for background.
+# See https://github.com/Lightning-AI/pytorch-lightning/issues/11665 for background.
if trainer.lr_scheduler_configs:
assert len(trainer.lr_scheduler_configs) == 1
trainer.lr_scheduler_configs.clear()
2 changes: 1 addition & 1 deletion src/lightning/pytorch/plugins/precision/xla.py
@@ -79,7 +79,7 @@ def optimizer_step( # type: ignore[override]
# we lack coverage here so disable this - something to explore if there's demand
raise MisconfigurationException(
"Skipping backward by returning `None` from your `training_step` is not implemented with XLA."
" Please, open an issue in `https://github.com/Lightning-AI/lightning/issues`"
" Please, open an issue in `https://github.com/Lightning-AI/pytorch-lightning/issues`"
" requesting this feature."
)
return closure_result
4 changes: 2 additions & 2 deletions src/lightning/pytorch/strategies/launchers/multiprocessing.py
@@ -88,7 +88,7 @@ def __init__(
def is_interactive_compatible(self) -> bool:
# The start method 'spawn' is not supported in interactive environments
# The start method 'fork' is the only one supported in Jupyter environments, with constraints around CUDA
-# initialization. For more context, see https://github.com/Lightning-AI/lightning/issues/7550
+# initialization. For more context, see https://github.com/Lightning-AI/pytorch-lightning/issues/7550
return self._start_method == "fork"

@override
@@ -111,7 +111,7 @@ def launch(self, function: Callable, *args: Any, trainer: Optional["pl.Trainer"]
if self._start_method == "spawn":
_check_missing_main_guard()
if self._already_fit and trainer is not None and trainer.state.fn == TrainerFn.FITTING:
-# resolving https://github.com/Lightning-AI/lightning/issues/18775 will lift this restriction
+# resolving https://github.com/Lightning-AI/pytorch-lightning/issues/18775 will lift this restriction
raise NotImplementedError(
"Calling `trainer.fit()` twice on the same Trainer instance using a spawn-based strategy is not"
" supported. You can work around this limitation by creating a new Trainer instance and passing the"
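A sketch of the workaround the error message points to, assuming a hypothetical `MyModel` and reusing the best checkpoint recorded by the first run:

```python
from lightning.pytorch import Trainer

# Run under `if __name__ == "__main__":` when using a spawn-based strategy.
model = MyModel()  # hypothetical LightningModule
trainer = Trainer(strategy="ddp_spawn", devices=2, max_epochs=1)
trainer.fit(model)

# A second fit() on the same spawn-based Trainer raises NotImplementedError;
# create a fresh Trainer and resume from the first run's checkpoint instead.
trainer2 = Trainer(strategy="ddp_spawn", devices=2, max_epochs=2)
trainer2.fit(model, ckpt_path=trainer.checkpoint_callback.best_model_path)
```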
2 changes: 1 addition & 1 deletion src/lightning/pytorch/strategies/launchers/xla.py
@@ -76,7 +76,7 @@ def launch(self, function: Callable, *args: Any, trainer: Optional["pl.Trainer"]
"""
if self._already_fit and trainer is not None and trainer.state.fn == TrainerFn.FITTING:
-# resolving https://github.com/Lightning-AI/lightning/issues/18775 will lift this restriction
+# resolving https://github.com/Lightning-AI/pytorch-lightning/issues/18775 will lift this restriction
raise NotImplementedError(
"Calling `trainer.fit()` twice on the same Trainer instance using a spawn-based strategy is not"
" supported. You can work around this by creating a new Trainer instance and passing the"
@@ -154,7 +154,7 @@ def check_logging(cls, fx_name: str) -> None:
if fx_name not in cls.functions:
raise RuntimeError(
f"Logging inside `{fx_name}` is not implemented."
" Please, open an issue in `https://github.com/Lightning-AI/lightning/issues`."
" Please, open an issue in `https://github.com/Lightning-AI/pytorch-lightning/issues`."
)

if cls.functions[fx_name] is None:
2 changes: 1 addition & 1 deletion src/lightning_fabric/__setup__.py
@@ -85,7 +85,7 @@ def _setup_args() -> dict[str, Any]:
},
"extras_require": _prepare_extras(),
"project_urls": {
"Bug Tracker": "https://github.com/Lightning-AI/lightning/issues",
"Bug Tracker": "https://github.com/Lightning-AI/pytorch-lightning/issues",
"Documentation": "https://pytorch-lightning.rtfd.io/en/latest/",
"Source Code": "https://github.com/Lightning-AI/lightning",
},
2 changes: 1 addition & 1 deletion src/pytorch_lightning/__setup__.py
@@ -89,7 +89,7 @@ def _setup_args() -> dict[str, Any]:
),
"extras_require": _prepare_extras(),
"project_urls": {
"Bug Tracker": "https://github.com/Lightning-AI/lightning/issues",
"Bug Tracker": "https://github.com/Lightning-AI/pytorch-lightning/issues",
"Documentation": "https://pytorch-lightning.rtfd.io/en/latest/",
"Source Code": "https://github.com/Lightning-AI/lightning",
},
4 changes: 2 additions & 2 deletions tests/tests_pytorch/callbacks/test_early_stopping.py
@@ -61,8 +61,8 @@ def on_train_epoch_end(self, trainer, pl_module):
def test_resume_early_stopping_from_checkpoint(tmp_path):
"""Prevent regressions to bugs:
-https://github.com/Lightning-AI/lightning/issues/1464
-https://github.com/Lightning-AI/lightning/issues/1463
+https://github.com/Lightning-AI/pytorch-lightning/issues/1464
+https://github.com/Lightning-AI/pytorch-lightning/issues/1463
"""
seed_everything(42)