diff --git a/docs/source/advanced/advanced_gpu.rst b/docs/source/advanced/advanced_gpu.rst
index c0c767bebbe7b..c20826545fe04 100644
--- a/docs/source/advanced/advanced_gpu.rst
+++ b/docs/source/advanced/advanced_gpu.rst
@@ -117,7 +117,7 @@ To activate parameter sharding, you must wrap your model using provided ``wrap``
 When not using Fully Sharded these wrap functions are a no-op. This means once the changes have been made, there is no need to remove the changes for other plugins.

 ``auto_wrap`` will recursively wrap `torch.nn.Modules` within the ``LightningModule`` with nested Fully Sharded Wrappers,
-signalling that we'd like to partition these modules across data parallel devices, discarding the full weights when not required (information `here `__).
+signalling that we'd like to partition these modules across data parallel devices, discarding the full weights when not required (information :class:`here `).

 ``auto_wrap`` can have varying level of success based on the complexity of your model. **Auto Wrap does not support models with shared parameters**.

@@ -182,7 +182,7 @@ Activation checkpointing frees activations from memory as soon as they are not n
 FairScales' checkpointing wrapper also handles batch norm layers correctly unlike the PyTorch implementation, ensuring stats are tracked correctly due to the multiple forward passes.

-This saves memory when training larger models however requires wrapping modules you'd like to use activation checkpointing on. See `here `__ for more information.
+This saves memory when training larger models however requires wrapping modules you'd like to use activation checkpointing on. See :class:`here ` for more information.

 .. warning::
diff --git a/docs/source/advanced/ipu.rst b/docs/source/advanced/ipu.rst
index cd5391ff44808..1f22ac534242e 100644
--- a/docs/source/advanced/ipu.rst
+++ b/docs/source/advanced/ipu.rst
@@ -114,7 +114,7 @@ PopVision Graph Analyser
     :alt: PopVision Graph Analyser
     :width: 500

-Lightning supports integration with the `PopVision Graph Analyser Tool `__. This helps to look at utilization of IPU devices and provides helpful metrics during the lifecycle of your trainer. Once you have gained access, The PopVision Graph Analyser Tool can be downloaded via the `GraphCore download website `__.
+Lightning supports integration with the `PopVision Graph Analyser Tool `__. This helps to look at utilization of IPU devices and provides helpful metrics during the lifecycle of your trainer. Once you have gained access, The PopVision Graph Analyser Tool can be downloaded via the `GraphCore download website `__.

 Lightning supports dumping all reports to a directory to open using the tool.

@@ -127,7 +127,7 @@ Lightning supports dumping all reports to a directory to open using the tool.
     trainer = pl.Trainer(ipus=8, strategy=IPUPlugin(autoreport_dir="report_dir/"))
     trainer.fit(model)

-This will dump all reports to ``report_dir/`` which can then be opened using the Graph Analyser Tool, see `Opening Reports `__.
+This will dump all reports to ``report_dir/`` which can then be opened using the Graph Analyser Tool, see `Opening Reports `__.

 .. _ipu-model-parallelism:
diff --git a/docs/source/conf.py b/docs/source/conf.py
index 8aaa06ccef8ec..2c858bc21afea 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -273,6 +273,8 @@ def _transform_changelog(path_in: str, path_out: str) -> None:
     "numpy": ("https://numpy.org/doc/stable/", None),
     "PIL": ("https://pillow.readthedocs.io/en/stable/", None),
     "torchmetrics": ("https://torchmetrics.readthedocs.io/en/stable/", None),
+    "fairscale": ("https://fairscale.readthedocs.io/en/latest/", None),
+    "graphcore": ("https://docs.graphcore.ai/en/latest/", None),
 }

 # -- Options for todo extension ----------------------------------------------
diff --git a/pytorch_lightning/plugins/precision/fully_sharded_native_amp.py b/pytorch_lightning/plugins/precision/fully_sharded_native_amp.py
index e93e936e420e6..870e658bfc9c3 100644
--- a/pytorch_lightning/plugins/precision/fully_sharded_native_amp.py
+++ b/pytorch_lightning/plugins/precision/fully_sharded_native_amp.py
@@ -21,7 +21,7 @@ class FullyShardedNativeMixedPrecisionPlugin(ShardedNativeMixedPrecisionPlugin):
     """Native AMP for Fully Sharded Training."""

     def clip_grad_by_norm(self, *_: Any, **__: Any) -> None:
-        # see https://fairscale.readthedocs.io/en/latest/api/nn/fsdp_tips.html
+        # see https://fairscale.readthedocs.io/en/latest/api/nn/fsdp.html
         # section `Gradient Clipping`, using `torch.nn.utils.clip_grad_norm_` is incorrect
         # for FSDP module. To overcome this, needs to call sharded_module.clip_grad_norm(clip_val)
         # however we rely on LightningModule's configure_sharded_model to wrap FSDP, it would be hard to
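
Note on the ``conf.py`` hunk: the two new ``intersphinx_mapping`` entries register the FairScale and Graphcore documentation inventories with ``sphinx.ext.intersphinx``, which is what lets the ``.rst`` files above reference those projects' documentation without hard-coding versioned URLs. Below is a minimal sketch of such a setup for a standalone Sphinx project (not this repository's full ``conf.py``); the mapping keys mirror the ones added in the hunk.

.. code-block:: python

    # Minimal intersphinx sketch. Assumption: a bare Sphinx project where this is
    # the entire relevant configuration; the real conf.py has many more entries.
    extensions = ["sphinx.ext.intersphinx"]

    intersphinx_mapping = {
        # key: (base URL of the external docs, path to a local objects.inv;
        # None means fetch <base URL>/objects.inv at build time)
        "fairscale": ("https://fairscale.readthedocs.io/en/latest/", None),
        "graphcore": ("https://docs.graphcore.ai/en/latest/", None),
    }

With the mapping in place, a role such as ``:class:`` with a dotted target like ``fairscale.nn.FullyShardedDataParallel`` can resolve against FairScale's published inventory instead of a raw URL, assuming that object appears in the inventory.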
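Note on the ``fully_sharded_native_amp.py`` hunk: the comment it touches explains that ``torch.nn.utils.clip_grad_norm_`` is not correct for an FSDP-wrapped module, because each rank only holds a shard of every gradient and the locally computed norm would be wrong; clipping has to go through the wrapper itself. The sketch below illustrates that call outside Lightning; the helper name is ours, it assumes a module already wrapped in FairScale's ``FullyShardedDataParallel`` inside an initialized process group, and the method is ``clip_grad_norm_`` in recent FairScale releases.

.. code-block:: python

    # Minimal sketch of the clipping approach the comment points at, not Lightning's
    # implementation. The wrapper reduces the gradient norm across all shards before
    # scaling, which a plain torch.nn.utils.clip_grad_norm_ call cannot do.
    import torch
    from fairscale.nn import FullyShardedDataParallel as FSDP


    def clip_sharded_gradients(sharded_module: FSDP, clip_val: float) -> torch.Tensor:
        # torch.nn.utils.clip_grad_norm_ would only see this rank's gradient shards,
        # so the total norm (and therefore the clipping factor) would be wrong.
        return sharded_module.clip_grad_norm_(clip_val)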