.. meta::
:description: hydra-zen can be used to design a boilerplate-free Hydra application for running PyTorch Lightning experiments.
.. _Lightning:
.. admonition:: Prerequisites
Your must install `PyTorch `_ and `PyTorch Lightning `_ in your Python environment in order to follow this
tutorial.
.. tip::
Using hydra-zen for your research project? `Cite us `_! 😊
======================================================================
Run Boilerplate-Free ML Experiments with PyTorch Lightning & hydra-zen
======================================================================
`PyTorch Lightning `_ is a library designed to
eliminate the boilerplate code that is associated with training and testing neural
networks in PyTorch. This is a natural bedfellow of Hydra and hydra-zen, which eliminate the boilerplate associated with designing software that is configurable, repeatable, and scalable.
Let's use Hydra, hydra-zen, and PyTorch Lightning to **configure and train multiple
single-layer neural networks without any boilerplate code**. For the sake of
simplicity, we will train it to simply fit :math:`\cos{x}` on
:math:`x \in [-2\pi, 2\pi]`.
In this tutorial we will do the following:
1. Define a simple neural network and `lightning module `_.
2. Create configs for our lighting module, data loader, optimizer, and trainer.
3. Define a task-function for training and testing a model.
4. Train four different models using combinations of two batch-sizes and two model-sizes (i.e. the number of neurons).
5. Analyze our models' results.
6. Load our best model using the checkpoints saved by PyTorch Lightning and the job-config saved by Hydra.
Defining Our Model
==================
Create a script called ``zen_model.py`` (or, open a Jupyter notebook and include the
following code. Here, we define our single-layer neural network and the `lightning module
`_ that describes how to train and evaluate our model.
.. code-block:: python
:caption: Contents of ``zen_model.py``
from typing import Callable, Type
import pytorch_lightning as pl
import torch as tr
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import Optimizer
from torch.utils.data import DataLoader, TensorDataset
from hydra_zen.typing import Partial
__all__ = ["UniversalFuncModule", "single_layer_nn", "train_and_eval"]
def single_layer_nn(num_neurons: int) -> nn.Module:
"""y = sum(V sigmoid(X W + b))"""
return nn.Sequential(
nn.Linear(1, num_neurons),
nn.Sigmoid(),
nn.Linear(num_neurons, 1, bias=False),
)
class UniversalFuncModule(pl.LightningModule):
def __init__(
self,
model: nn.Module,
optim: Partial[Optimizer],
dataloader: Type[DataLoader],
target_fn: Callable[[tr.Tensor], tr.Tensor],
training_domain: tr.Tensor,
):
super().__init__()
self.optim = optim
self.dataloader = dataloader
self.training_domain = training_domain
self.target_fn = target_fn
self.model = model
def forward(self, x): # type: ignore
return self.model(x)
def configure_optimizers(self):
# provide optimizer with model parameters
return self.optim(self.parameters())
def training_step(self, batch, batch_idx): # type: ignore
x, y = batch
# compute |cos(x) - model(x)|^2
return F.mse_loss(self.model(x), y)
def train_dataloader(self):
# generate dataset: x, cos(x)
x = self.training_domain.reshape(-1, 1)
y = self.target_fn(x)
return self.dataloader(TensorDataset(x, y))
def train_and_eval(
model: tr.nn.Module,
optim: Partial[Optimizer],
dataloader: Type[DataLoader],
target_fn: Callable[[tr.Tensor], tr.Tensor],
training_domain: tr.Tensor,
lit_module: Type[UniversalFuncModule],
trainer: pl.Trainer,
):
lit = lit_module(
model=model,
optim=optim,
dataloader=dataloader,
target_fn=target_fn,
training_domain=training_domain,
)
# train the model
trainer.fit(lit)
# evaluate the model over the domain to assess the fit
final_eval = lit(training_domain.reshape(-1, 1))
final_eval = final_eval.detach().cpu().numpy().ravel()
# return the final evaluation of our model:
# a shape-(N,) numpy-array
return final_eval
.. attention::
:plymi:`Type-annotations ` are **not** required by hydra-zen. However, they do enable :ref:`runtime type-checking of configured values ` for our app.
Creating Our Configs and Task Function
======================================
Create another script - named ``experiment.py`` - in the same directory as ``zen_model.py``.
Here, we will create the configs for our optimizer, model, data-loader, lightning module,
and trainer. We'll also define the task function that trains and tests our model.
.. code-block:: python
:caption: Contents of ``experiment.py``
from math import pi
import pytorch_lightning as pl
from hydra_zen import builds, make_config, make_custom_builds_fn, zen
import torch as tr
from torch.optim import Adam
from torch.utils.data import DataLoader
from zen_model import UniversalFuncModule, train_and_eval, single_layer_nn
pbuilds = make_custom_builds_fn(zen_partial=True, populate_full_signature=True)
ExperimentConfig = make_config(
seed=1,
lit_module=UniversalFuncModule,
trainer=builds(pl.Trainer, max_epochs=100),
model=builds(single_layer_nn, num_neurons=10),
optim=pbuilds(Adam),
dataloader=pbuilds(DataLoader, batch_size=25, shuffle=True, drop_last=True),
target_fn=tr.cos,
training_domain=builds(tr.linspace, start=-2 * pi, end=2 * pi, steps=1000),
)
# Wrapping `train_and_eval` with `zen` makes it compatible with Hydra as a task function
#
# We must specify `pre_call` to ensure that pytorch lightning seeds everything
# *before* any of our configs are instantiated (which will initialize the pytorch
# model whose weights depend on the seed)
pre_seed = zen(lambda seed: pl.seed_everything(seed))
task_function = zen(train_and_eval, pre_call=pre_seed)
if __name__ == "__main__":
# enables us to call
from hydra_zen import ZenStore
store = ZenStore(deferred_hydra_store=False)
store(ExperimentConfig, name="lit_app")
task_function.hydra_main(
config_name="lit_app",
version_base="1.1",
config_path=".",
)
.. admonition:: Be Mindful of What Your Task Function Returns
We *could* make this `train_and_eval` return our trained neural network, which would enable
convenient access to it, in-memory, after our Hydra job completes. However, launching this
task function in a multirun fashion will train multiple models and thus would keep *all* of
those models in-memory (and perhaps on-GPU) simultaneously!
By not returning the model from our task function, we avoid the risk of hitting out-of-memory
errors when training multiple large models.
Running Our Experiments
========================
We will use :func:`hydra_zen.launch` to run four jobs: training our model with all four combinations of:
- a batch-size of 20 and 200
- a model with 10 and 100 neurons
Open a Python console (or Jupyter notebook) in the same directory as ``experiment.py``
and run the following code.
.. code-block:: pycon
:caption: Launching four jobs from a Python console.
>>> from hydra_zen import launch
>>> from experiment import ExperimentConfig, task_function
>>> (jobs,) = launch(
... ExperimentConfig,
... task_function,
... overrides=[
... "dataloader.batch_size=20,200",
... "model.num_neurons=10,100",
... ],
... multirun=True,
... )
[2021-10-24 21:23:32,556][HYDRA] Launching 4 jobs locally
[2021-10-24 21:23:32,558][HYDRA] #0 : dataloader.batch_size=20 model.num_neurons=10
[2021-10-24 21:23:45,809][HYDRA] #1 : dataloader.batch_size=20 model.num_neurons=100
[2021-10-24 21:23:58,656][HYDRA] #2 : dataloader.batch_size=200 model.num_neurons=10
[2021-10-24 21:24:01,796][HYDRA] #3 : dataloader.batch_size=200 model.num_neurons=100
Keep this Python console open; we will be making use of ``jobs`` in order to inspect
our results.
Note that this is equivalent to running the following from the CLI:
.. code-block:: console
:caption: Launching four jobs from the CLI.
$ python experiment.py dataloader.batch_size=20,200 model.num_neurons=10,100 -m
[2021-10-24 21:23:32,556][HYDRA] Launching 4 jobs locally
[2021-10-24 21:23:32,558][HYDRA] #0 : dataloader.batch_size=20 model.num_neurons=10
[2021-10-24 21:23:45,809][HYDRA] #1 : dataloader.batch_size=20 model.num_neurons=100
[2021-10-24 21:23:58,656][HYDRA] #2 : dataloader.batch_size=200 model.num_neurons=10
[2021-10-24 21:24:01,796][HYDRA] #3 : dataloader.batch_size=200 model.num_neurons=100
Inspecting Our Results
=======================
Visualizing Our Results
-----------------------
Let's begin inspecting our results by plotting our four models on :math:`x \in [-2\pi, 2\pi]`, alongside the
target function: :math:`\cos{x}`. Continuing to work in our current Python console (or Jupyter notebook), run
the following code and verify that you see the plot shown below.
.. code-block:: pycon
:caption: Plotting our models
>>> from hydra_zen import instantiate
>>> import matplotlib.pyplot as plt
>>> from matplotlib.axes import Axes
>>> x = instantiate(ExperimentConfig.training_domain)
>>> target_fn = instantiate(ExperimentConfig.target_fn)
>>> fig, ax = plt.subplots()
>>> assert isinstance(ax, Axes)
>>> ax.plot(x, target_fn(x), ls="--", label="Target")
>>> for j in jobs:
... out = j.return_value
... ax.plot(x, out, label=",".join(s.split(".")[-1] for s in j.overrides))
...
>>> ax.grid(True)
>>> ax.legend(bbox_to_anchor=(1.04, 1), loc="upper left")
>>> plt.show()
.. image:: https://user-images.githubusercontent.com/29104956/138622935-3a3a960f-301f-477e-b5ab-7f4c741b1f9e.png
:width: 800
:alt: Plot of four trained models vs the target function
Loading the Model of Best-Fit
-----------------------------
The 100-neuron model trained with a batch-size of 20 best fits our target function.
Let's load the model weights that were saved by PyTorch Lightning during training.
Continuing our work in the same Python console, let's verify that job-1 corresponds to
our desired model. Verify that you see the following outputs.
.. code-block:: pycon
:caption: Job 1 corresponds to the 100-neuron model trained with batch-size 20.
>>> best = jobs[1]
>>> best.cfg.dataloader.batch_size
20
>>> best.cfg.model.num_neurons
100
Next, we'll load the config for this job. Recall that Hydra saves a ``.hydra/config.yaml`` file, which contains the complete configuration of this job -- we can reproduce
all aspects of it from this YAML.
.. code-block:: pycon
:caption: Loading the complete config for this job
>>> from hydra_zen import load_from_yaml, get_target, to_yaml
>>> from pathlib import Path
>>> outdir = Path(best.working_dir)
>>> cfg = load_from_yaml(outdir / ".hydra" / "config.yaml")
It is worth printing our this config to appreciate all of the exhaustive details that
it captures about this job.
.. code-block:: pycon
>>> print(to_yaml(cfg)) # fully details this job's config
seed: 1
lit_module:
path: zen_model.UniversalFuncModule
_target_: hydra_zen.funcs.get_obj
trainer:
_target_: pytorch_lightning.trainer.trainer.Trainer
max_epochs: 100
model:
_target_: zen_model.single_layer_nn
num_neurons: 100
optim:
_target_: torch.optim.adam.Adam
_partial_: true
lr: 0.001
betas:
- 0.9
- 0.999
eps: 1.0e-08
weight_decay: 0
amsgrad: false
dataloader:
_target_: torch.utils.data.dataloader.DataLoader
_partial_: true
batch_size: 20
shuffle: true
sampler: null
batch_sampler: null
num_workers: 0
collate_fn: null
pin_memory: false
drop_last: true
timeout: 0.0
worker_init_fn: null
multiprocessing_context: null
generator: null
prefetch_factor: 2
persistent_workers: false
target_fn:
path: torch.cos
_target_: hydra_zen.funcs.get_obj
training_domain:
_target_: torch.linspace
start: -6.283185307179586
end: 6.283185307179586
steps: 1000
PyTorch Lightning saved the model's trained weights as a ``.ckpt`` file in this job's
working directory. Let's load these weights and use them to instantiate our lighting
module.
.. code-block:: pycon
:caption: Loading our lighting module with trained weights
>>> from hydra_zen import zen
>>> from functools import partial
>>> *_, last_ckpt = sorted(outdir.glob("**/*.ckpt"))
>>> LitModule = get_target(cfg.lit_module)
>>> pload = partial(LitModule.load_from_checkpoint, last_ckpt)
>>> # extract top-level fields from `cfg`, instantiate them, and pass to `load_from_checkpoint`
>>> loaded = zen(pload, unpack_kwargs=True)(cfg) # type: ignore
Finally, let's double check that this loaded model behaves as-expected. Evaluating it
at :math:`-\pi/2`, :math:`0`, and :math:`\pi/2` should return, approximately, :math:`0`, :math:`1`, and :math:`0`, respectively.
.. code-block:: pycon
:caption: Checkout our loaded model's behavior
>>> import torch as tr
>>> loaded(tr.tensor([-3.1415 / 2, 0.0, 3.1415 / 2]).reshape(-1, 1))
tensor([[0.0110],
[0.9633],
[0.0364]], grad_fn=)
.. admonition:: Math Details
For the interested reader... In this toy-problem we are optimizing `arbitrary-width universal function approximators `_ to fit :math:`\cos{x}`
on :math:`x \in [-2\pi, 2\pi]`.
In mathematical notation, we want to solve the following optimization problem:
.. math::
F(\vec{v}, \vec{w}, \vec{b}; x) &= \sum_{i=1}^{N}{v_{i}\sigma(x w_i + b_i)}
\vec{v}^*, \vec{w}^*, \vec{b}^* &= \operatorname*{arg\,min}_{\vec{v}, \vec{w}, \vec {b}\in\mathbb{R}^{N}} \; \|F(\vec{v}, \vec{w}, \vec{b}; x)\ - \cos{x}\|_{2}
x &\in [-2\pi, 2\pi]
where :math:`N` – the number of "neurons" in our layer – is a hyperparameter.
.. attention:: **Cleaning Up**:
To clean up after this tutorial, delete the ``multirun`` directory that Hydra
created upon launching our app. You can find this in the same directory as your
``experiment.py`` file.