Prerequisites
You must install PyTorch and PyTorch Lightning in your Python environment in order to follow this tutorial.
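If you still need to install them, one way to do so (assuming a pip-based environment; these are the packages' PyPI names, and matplotlib is only needed for the plotting section later on) is:
$ pip install torch pytorch-lightning hydra-zen matplotlib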
Tip
Using hydra-zen for your research project? Cite us! 😊
Run Boilerplate-Free ML Experiments with PyTorch Lightning & hydra-zen
PyTorch Lightning is a library designed to eliminate the boilerplate code that is associated with training and testing neural networks in PyTorch. This is a natural bedfellow of Hydra and hydra-zen, which eliminate the boilerplate associated with designing software that is configurable, repeatable, and scalable.
Let’s use Hydra, hydra-zen, and PyTorch Lightning to configure and train multiple single-layer neural networks without any boilerplate code. For the sake of simplicity, we will train them to fit \(\cos{x}\) on \(x \in [-2\pi, 2\pi]\).
In this tutorial we will do the following:
Define a simple neural network and lightning module.
Create configs for our lightning module, data loader, optimizer, and trainer.
Define a task-function for training and testing a model.
Train four different models using combinations of two batch-sizes and two model-sizes (i.e. the number of neurons).
Analyze our models’ results.
Load our best model using the checkpoints saved by PyTorch Lightning and the job-config saved by Hydra.
Defining Our Model
Create a script called zen_model.py (or open a Jupyter notebook) and include the following code. Here, we define our single-layer neural network and the lightning module that describes how to train and evaluate our model.
from typing import Callable, Type
import pytorch_lightning as pl
import torch as tr
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import Optimizer # type: ignore
from torch.utils.data import DataLoader, TensorDataset
from hydra_zen.typing import Partial
__all__ = ["UniversalFuncModule", "single_layer_nn", "train_and_eval"]
def single_layer_nn(num_neurons: int) -> nn.Module:
    """y = sum(V sigmoid(X W + b))"""
    return nn.Sequential(
        nn.Linear(1, num_neurons),
        nn.Sigmoid(),
        nn.Linear(num_neurons, 1, bias=False),
    )


class UniversalFuncModule(pl.LightningModule):
    def __init__(
        self,
        model: nn.Module,
        optim: Partial[Optimizer],
        dataloader: Type[DataLoader],
        target_fn: Callable[[tr.Tensor], tr.Tensor],
        training_domain: tr.Tensor,
    ):
        super().__init__()
        self.optim = optim
        self.dataloader = dataloader
        self.training_domain = training_domain
        self.target_fn = target_fn
        self.model = model

    def forward(self, x):
        return self.model(x)

    def configure_optimizers(self):
        # provide optimizer with model parameters
        return self.optim(self.parameters())

    def training_step(self, batch, batch_idx):
        x, y = batch
        # compute |cos(x) - model(x)|^2
        return F.mse_loss(self.model(x), y)

    def train_dataloader(self):
        # generate dataset: x, cos(x)
        x = self.training_domain.reshape(-1, 1)
        y = self.target_fn(x)
        return self.dataloader(TensorDataset(x, y))


def train_and_eval(
    model: tr.nn.Module,
    optim: Partial[Optimizer],
    dataloader: Type[DataLoader],
    target_fn: Callable[[tr.Tensor], tr.Tensor],
    training_domain: tr.Tensor,
    lit_module: Type[UniversalFuncModule],
    trainer: pl.Trainer,
):
    lit = lit_module(
        model=model,
        optim=optim,
        dataloader=dataloader,
        target_fn=target_fn,
        training_domain=training_domain,
    )

    # train the model
    trainer.fit(lit)

    # evaluate the model over the domain to assess the fit
    final_eval = lit(training_domain.reshape(-1, 1))
    final_eval = final_eval.detach().cpu().numpy().ravel()

    # return the final evaluation of our model:
    # a shape-(N,) numpy-array
    return final_eval
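As an optional sanity check (not required by the tutorial), you can exercise the network directly to confirm its input/output shapes:

>>> import torch as tr
>>> from zen_model import single_layer_nn
>>> net = single_layer_nn(num_neurons=10)
>>> net(tr.zeros(5, 1)).shape  # a batch of five scalar inputs yields five scalar outputs
torch.Size([5, 1])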
Attention
Type-annotations are not required by hydra-zen. However, they do enable runtime type-checking of configured values for our app.
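For illustration only (not part of the tutorial's scripts), here is a sketch of how a mis-typed configured value can get caught. The name BadConf and the value "ten" are ours, and the exact error message and the point at which it is raised (config creation vs. instantiation) depend on your hydra-zen/omegaconf versions:

>>> from hydra_zen import builds, instantiate
>>> from zen_model import single_layer_nn
>>> # `num_neurons` is annotated as `int`, so a string value violates the config's type
>>> BadConf = builds(single_layer_nn, num_neurons="ten", populate_full_signature=True)
>>> instantiate(BadConf)  # expected to raise an omegaconf ValidationError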
Creating Our Configs and Task Function
Create another script, named experiment.py, in the same directory as zen_model.py.
Here, we will create the configs for our optimizer, model, data-loader, lightning module,
and trainer. We’ll also define the task function that trains and tests our model.
from math import pi
import pytorch_lightning as pl
from hydra_zen import builds, make_config, make_custom_builds_fn, zen
import torch as tr
from torch.optim import Adam # type: ignore
from torch.utils.data import DataLoader
from zen_model import UniversalFuncModule, train_and_eval, single_layer_nn
pbuilds = make_custom_builds_fn(zen_partial=True, populate_full_signature=True)
ExperimentConfig = make_config(
    seed=1,
    lit_module=UniversalFuncModule,
    trainer=builds(pl.Trainer, max_epochs=100),
    model=builds(single_layer_nn, num_neurons=10),
    optim=pbuilds(Adam),
    dataloader=pbuilds(DataLoader, batch_size=25, shuffle=True, drop_last=True),
    target_fn=tr.cos,
    training_domain=builds(tr.linspace, start=-2 * pi, end=2 * pi, steps=1000),
)
# Wrapping `train_and_eval` with `zen` makes it compatible with Hydra as a task function
#
# We must specify `pre_call` to ensure that pytorch lightning seeds everything
# *before* any of our configs are instantiated (which will initialize the pytorch
# model whose weights depend on the seed)
pre_seed = zen(lambda seed: pl.seed_everything(seed))
task_function = zen(train_and_eval, pre_call=pre_seed)
if __name__ == "__main__":
    # enables us to call our app from the CLI (see the next section)
    from hydra_zen import ZenStore

    store = ZenStore(deferred_hydra_store=False)
    store(ExperimentConfig, name="lit_app")

    task_function.hydra_main(
        config_name="lit_app",
        version_base="1.1",
        config_path=".",
    )
Be Mindful of What Your Task Function Returns
We could have train_and_eval return our trained neural network, which would give us convenient, in-memory access to it after our Hydra job completes. However, launching this task function in a multirun fashion will train multiple models and would thus keep all of those models in memory (and perhaps on-GPU) simultaneously!
By not returning the model from our task function, we avoid the risk of hitting out-of-memory errors when training multiple large models.
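Before launching anything, it can be handy to preview the fully-composed experiment config. This quick, optional check uses only hydra-zen's to_yaml and the config defined above:

>>> from hydra_zen import to_yaml
>>> from experiment import ExperimentConfig
>>> print(to_yaml(ExperimentConfig))  # prints the YAML view of the entire experiment config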
Running Our Experiments
We will use hydra_zen.launch() to run four jobs, training our model with all four combinations of:
a batch-size of 20 and 200
a model with 10 and 100 neurons
Open a Python console (or Jupyter notebook) in the same directory as experiment.py
and run the following code.
>>> from hydra_zen import launch
>>> from experiment import ExperimentConfig, task_function
>>> (jobs,) = launch(
... ExperimentConfig,
... task_function,
... overrides=[
... "dataloader.batch_size=20,200",
... "model.num_neurons=10,100",
... ],
... multirun=True,
... )
[2021-10-24 21:23:32,556][HYDRA] Launching 4 jobs locally
[2021-10-24 21:23:32,558][HYDRA] #0 : dataloader.batch_size=20 model.num_neurons=10
[2021-10-24 21:23:45,809][HYDRA] #1 : dataloader.batch_size=20 model.num_neurons=100
[2021-10-24 21:23:58,656][HYDRA] #2 : dataloader.batch_size=200 model.num_neurons=10
[2021-10-24 21:24:01,796][HYDRA] #3 : dataloader.batch_size=200 model.num_neurons=100
Keep this Python console open; we will make use of jobs to inspect our results.
Note that this is equivalent to running the following from the CLI:
$ python experiment.py dataloader.batch_size=20,200 model.num_neurons=10,100 -m
[2021-10-24 21:23:32,556][HYDRA] Launching 4 jobs locally
[2021-10-24 21:23:32,558][HYDRA] #0 : dataloader.batch_size=20 model.num_neurons=10
[2021-10-24 21:23:45,809][HYDRA] #1 : dataloader.batch_size=20 model.num_neurons=100
[2021-10-24 21:23:58,656][HYDRA] #2 : dataloader.batch_size=200 model.num_neurons=10
[2021-10-24 21:24:01,796][HYDRA] #3 : dataloader.batch_size=200 model.num_neurons=100
Inspecting Our Results
Visualizing Our Results
Let’s begin inspecting our results by plotting our four models on \(x \in [-2\pi, 2\pi]\), alongside the target function: \(\cos{x}\). Continuing to work in our current Python console (or Jupyter notebook), run the following code and verify that you see the plot shown below.
>>> from hydra_zen import instantiate
>>> import matplotlib.pyplot as plt
>>> from matplotlib.axes import Axes
>>> x = instantiate(ExperimentConfig.training_domain)
>>> target_fn = instantiate(ExperimentConfig.target_fn)
>>> fig, ax = plt.subplots()
>>> assert isinstance(ax, Axes)
>>> ax.plot(x, target_fn(x), ls="--", label="Target")
>>> for j in jobs:
... out = j.return_value
... ax.plot(x, out, label=",".join(s.split(".")[-1] for s in j.overrides))
...
>>> ax.grid(True)
>>> ax.legend(bbox_to_anchor=(1.04, 1), loc="upper left")
>>> plt.show()
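As an optional, quantitative companion to the plot, we can also score each model's fit against the target with a mean-squared error, reusing x, target_fn, and jobs from above. This is a minimal sketch; the exact numbers will depend on your training run:

>>> import numpy as np
>>> target = target_fn(x).numpy()  # shape-(1000,) array of cos(x) values
>>> for j in jobs:
...     mse = float(np.mean((np.asarray(j.return_value) - target) ** 2))
...     print(j.overrides, mse)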
Loading the Model of Best-Fit
The 100-neuron model trained with a batch-size of 20 best fits our target function. Let’s load the model weights that were saved by PyTorch Lightning during training.
Continuing our work in the same Python console, let’s confirm that jobs[1] corresponds to our desired model; verify that you see the following outputs.
>>> best = jobs[1]
>>> best.cfg.dataloader.batch_size
20
>>> best.cfg.model.num_neurons
100
Next, we’ll load the config for this job. Recall that Hydra saves a .hydra/config.yaml
file, which contains the complete configuration of this job – we can reproduce
all aspects of it from this YAML.
>>> from hydra_zen import load_from_yaml, get_target, to_yaml
>>> from pathlib import Path
>>> outdir = Path(best.working_dir)
>>> cfg = load_from_yaml(outdir / ".hydra" / "config.yaml")
It is worth printing out this config to appreciate the exhaustive detail that it captures about this job.
>>> print(to_yaml(cfg)) # fully details this job's config
seed: 1
lit_module:
  path: zen_model.UniversalFuncModule
  _target_: hydra_zen.funcs.get_obj
trainer:
  _target_: pytorch_lightning.trainer.trainer.Trainer
  max_epochs: 100
model:
  _target_: zen_model.single_layer_nn
  num_neurons: 100
optim:
  _target_: torch.optim.adam.Adam
  _partial_: true
  lr: 0.001
  betas:
  - 0.9
  - 0.999
  eps: 1.0e-08
  weight_decay: 0
  amsgrad: false
dataloader:
  _target_: torch.utils.data.dataloader.DataLoader
  _partial_: true
  batch_size: 20
  shuffle: true
  sampler: null
  batch_sampler: null
  num_workers: 0
  collate_fn: null
  pin_memory: false
  drop_last: true
  timeout: 0.0
  worker_init_fn: null
  multiprocessing_context: null
  generator: null
  prefetch_factor: 2
  persistent_workers: false
target_fn:
  path: torch.cos
  _target_: hydra_zen.funcs.get_obj
training_domain:
  _target_: torch.linspace
  start: -6.283185307179586
  end: 6.283185307179586
  steps: 1000
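Because this YAML fully describes the job, any piece of it can be re-instantiated. For example, here is a sketch of rebuilding a fresh (untrained, randomly initialized) copy of the 100-neuron model from the loaded config; the name fresh_model is ours:

>>> from hydra_zen import instantiate
>>> fresh_model = instantiate(cfg.model)  # calls zen_model.single_layer_nn(num_neurons=100)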
PyTorch Lightning saved the model’s trained weights as a .ckpt file in this job’s working directory. Let’s load these weights and use them to instantiate our lightning module.
>>> from hydra_zen import zen
>>> from functools import partial
>>> *_, last_ckpt = sorted(outdir.glob("**/*.ckpt"))
>>> LitModule = get_target(cfg.lit_module)
>>> pload = partial(LitModule.load_from_checkpoint, last_ckpt)
>>> # extract top-level fields from `cfg`, instantiate them, and pass to `load_from_checkpoint`
>>> loaded = zen(pload, unpack_kwargs=True)(cfg) # type: ignore
Finally, let’s double check that this loaded model behaves as-expected. Evaluating it at \(-\pi/2\), \(0\), and \(\pi/2\) should return, approximately, \(0\), \(1\), and \(0\), respectively.
>>> import torch as tr
>>> loaded(tr.tensor([-3.1415 / 2, 0.0, 3.1415 / 2]).reshape(-1, 1))
tensor([[0.0110],
[0.9633],
[0.0364]], grad_fn=<MmBackward>)
Math Details
For the interested reader… In this toy problem we are optimizing arbitrary-width universal function approximators to fit \(\cos{x}\) on \(x \in [-2\pi, 2\pi]\). In mathematical notation, we want to solve the following optimization problem:

\[\min_{\vec{V},\, \vec{W},\, \vec{b}}\; \sum_{x \in X} \left|\cos{x} - \sum_{i=1}^{N} V_i\, \sigma(W_i x + b_i)\right|^2\]

where \(\sigma\) is the sigmoid function, \(X\) is our set of training points drawn from \([-2\pi, 2\pi]\), and \(N\) – the number of “neurons” in our layer – is a hyperparameter.
Attention
Cleaning Up:
To clean up after this tutorial, delete the multirun directory that Hydra created upon launching our app. You can find it in the same directory as your experiment.py file.
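For example, assuming your Python console is still open in that same directory and that Hydra used its default output paths, the following sketch removes it:

>>> import shutil
>>> shutil.rmtree("multirun")  # deletes Hydra's multirun output directory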