{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ ".. meta::\n", " :description: Using the responsible AI toolbox to build workflows." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> Copyright (c) 2023 Massachusetts Institute of Technology \n", "> SPDX-License-Identifier: MIT \n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Building Workflows for Cross Validation Training and Adversarial Robustness Analysis\n", "This notebook demonstrates how the to build experiment workflows for configurable, repeatable, and scalable (CRS) experimentation. Two basic workflows will be demonstrated in this tutorial:\n", "\n", "- Cross-Validation Workflow: Performs cross-validation training that logs accuracy and loss across data folds\n", "- Robustness Curve Workflow: Loads a trained model and assesses the impact of adversarial perturbations on the model's performance; the model's performance metric is plotted against the increasing \"severity\" of the perturbation\n", "\n", "Here, \"workflow\" has a precise meaning. In the parlance of [mushin](https://mit-ll-responsible-ai.github.io/responsible-ai-toolbox/ref_mushin.html) a workflow is an API for describing how we configure, launch, and post-process one or more tasks. These workflows leverage [Hydra](https://hydra.cc/) and [hydra-zen](https://github.com/mit-ll-responsible-ai/hydra-zen) so that they are highly configurable and so that each job launched by a workflow is self-documenting and reproducible. \n", "In this tutorial, we also make use of [PyTorch Lightning](https://www.pytorchlightning.ai/) to eliminate boilerplate code associated with training and testing a PyTorch model.\n", "\n", "## Getting Started\n", "\n", "\n", "We will install the rAI-toolbox and then we will create a Jupyter notebook in which we will complete this tutorial.\n", "\n", "\n", "### Installing `rai_toolbox`\n", "\n", "\n", "To install the toolbox (along with its `mushin` capabilities) in your Python environment, run the following command in your \n", "terminal:\n", "\n", "```console\n", "$ pip install rai-toolbox[mushin]\n", "```\n", "\n", "To verify that the toolbox is installed as-expected, open a Python console and try \n", "importing ``rai_toolbox``.\n", "\n", "```python\n", ">>> import rai_toolbox\n", "```\n", "\n", "You will also need to install scikit-learn; please follow [these instructions](https://scikit-learn.org/stable/install.html#installing-scikit-learn).\n", "\n", "\n", "## Opening a Jupyter notebook\n", "\n", "If you do not have Jupyter Notebook or Jupyter Lab installed in your Python environment, please follow [these instructions](https://jupyter.org/install).\n", "Now open a terminal on your computer and [start a notebook/lab session](http://www.pythonlikeyoumeanit.com/Module1_GettingStartedWithPython/Jupyter_Notebooks.html).\n", "A file-viewer will open in an internet browser; pick a directory where you are okay with saving some PyTorch model weights. Create a notebook called `Building-Workflows.ipynb`. 
You can then follow along with this tutorial by copying, pasting, and running the code blocks below in the cells of your notebook.\n", "\n", "Note: you may also need to install the `ipywidgets` package in your Python environment so that the notebook can display widgets:\n", "\n", "```console\n", "$ pip install ipywidgets\n", "```\n", "\n", "## Imports" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "from typing import Optional, Tuple, Union\n", "\n", "import matplotlib.pyplot as plt\n", "import torch as tr" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Hydra and hydra-zen\n", "from hydra.core.config_store import ConfigStore\n", "from hydra_zen import MISSING, builds, instantiate, load_from_yaml, make_config\n", "\n", "# Lightning\n", "from pytorch_lightning import LightningModule, Trainer\n", "\n", "# sklearn and torch\n", "from sklearn.model_selection import StratifiedKFold\n", "from torch import Tensor, nn\n", "from torch.optim import Optimizer\n", "from torch.utils.data import DataLoader, Subset\n", "from torchmetrics import Accuracy\n", "from torchvision import transforms\n", "from torchvision.datasets import MNIST\n", "\n", "# rAI-toolbox\n", "from rai_toolbox._typing import Partial\n", "from rai_toolbox.mushin import load_from_checkpoint\n", "from rai_toolbox.mushin.lightning import MetricsCallback\n", "from rai_toolbox.mushin.workflows import (\n", " MultiRunMetricsWorkflow,\n", " RobustnessCurve,\n", " multirun,\n", ")\n", "\n", "from rai_toolbox.optim import L2ProjectedOptim, LinfProjectedOptim\n", "from rai_toolbox.perturbations import gradient_ascent" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Experiment Functions and Classes\n", "\n", "Here we define two neural network models: a fully-connected (linear) neural network and a convolutional neural network." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "class LinearModel(nn.Module):\n", " def __init__(self):\n", " super().__init__()\n", " self.model = nn.Sequential(\n", " nn.Flatten(1),\n", " nn.Linear(28 * 28, 256),\n", " nn.ReLU(),\n", " nn.Linear(256, 128),\n", " nn.ReLU(),\n", " nn.Linear(128, 64),\n", " nn.ReLU(),\n", " nn.Linear(64, 10),\n", " )\n", "\n", " def forward(self, x):\n", " return self.model(x)\n", "\n", "class ConvModel(nn.Module):\n", " def __init__(self):\n", " super().__init__()\n", " self.model = nn.Sequential(\n", " nn.Conv2d(1, 32, 5, padding=\"same\"),\n", " nn.BatchNorm2d(32),\n", " nn.ReLU(),\n", " nn.MaxPool2d(3),\n", " nn.Conv2d(32, 32, 3, padding=\"same\"),\n", " nn.BatchNorm2d(32),\n", " nn.ReLU(),\n", " nn.MaxPool2d(3),\n", " nn.Conv2d(32, 32, 3, padding=\"same\"),\n", " nn.BatchNorm2d(32),\n", " nn.ReLU(),\n", " nn.Conv2d(32, 10, 3),\n", " nn.Flatten(1),\n", " )\n", "\n", " def forward(self, x):\n", " return self.model(x)\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, let's define a function that takes the `MNIST` dataset and splits the data into training and validation sets using scikit-learn's [StratifiedKFold](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html). This allows us to split the dataset into \"folds\" and select the fold for each experiment." 
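] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a quick, self-contained sketch of the scikit-learn API we are about to wrap (a toy label list stands in for MNIST here): `StratifiedKFold.split` takes placeholder features plus the labels and yields one pair of index arrays per fold, with each fold preserving the class proportions of the full dataset:\n", "\n", "```python\n", ">>> from sklearn.model_selection import StratifiedKFold\n", ">>> labels = [0, 0, 0, 1, 1, 1]\n", ">>> kfold = StratifiedKFold(n_splits=3, shuffle=True, random_state=49)\n", ">>> splits = list(kfold.split(range(len(labels)), labels))\n", ">>> len(splits)  # one (train_indices, val_indices) pair per fold\n", "3\n", ">>> train_idx, val_idx = splits[0]\n", ">>> len(train_idx), len(val_idx)  # each validation fold holds one sample per class\n", "(4, 2)\n", "```"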
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "def split_dataset(\n", " dataset: MNIST, n_splits: int, fold: int, random_state: int = 49\n", ") -> Tuple[Subset, Subset]:\n", " \"\"\"Provide training and validation splits using `sklearn.model_selection.StratifiedKfold`\"\"\"\n", "\n", " kfold = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)\n", " train_indices, val_indices = list(\n", " kfold.split(range(len(dataset)), dataset.targets)\n", " )[fold]\n", " return Subset(dataset, train_indices), Subset(dataset, val_indices)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now define the [LightningModule](https://pytorch-lightning.readthedocs.io/en/latest/api/pytorch_lightning.core.LightningModule.html#pytorch_lightning.core.LightningModule) for training and testing.\n", "This describes how we:\n", "\n", "- Load our data\n", "- Process a batch of data with our model (both with and without adversarial perturbations)\n", "- Update our model's parameters during training\n", "\n", "Note that we specifically design this lightning module to log the following metrics:\n", "\n", "- Loss and accuracy for cross-validation training\n", "- Adversarial loss, adversarial accuracy, and clean accuracy for robustness analysis\n", "\n", "These metrics will be saved during each of our runs, and we will load and aggregate these metrics to analyze our results. " ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "class StandardModule(LightningModule):\n", " def __init__(\n", " self,\n", " *,\n", " model: nn.Module,\n", " dataset: MNIST,\n", " optimizer: Optional[Partial[Optimizer]] = None,\n", " perturber=None,\n", " fold: int = 0,\n", " n_splits: int = 5,\n", " batch_size: int = 100,\n", " num_workers: int = 4,\n", " ) -> None:\n", " super().__init__()\n", " self.dataset = dataset\n", " self.optimizer = optimizer\n", " self.criterion = nn.CrossEntropyLoss()\n", " self.model = model\n", " self.perturber = perturber\n", " self.n_splits = n_splits\n", " self.fold = fold\n", " self.batch_size = batch_size\n", " self.num_workers = num_workers\n", "\n", " # Metrics\n", " self.acc_metric = Accuracy(task=\"multiclass\", num_classes=10)\n", " if self.perturber:\n", " self.clean_acc_metric = Accuracy(task=\"multiclass\", num_classes=10)\n", "\n", " def forward(self, data: Tensor) -> Tensor:\n", " return self.model(data)\n", "\n", " def train_dataloader(self) -> DataLoader:\n", " train_dataset, _ = split_dataset(self.dataset, self.n_splits, self.fold)\n", " return DataLoader(\n", " train_dataset,\n", " batch_size=self.batch_size,\n", " num_workers=self.num_workers,\n", " shuffle=True,\n", " )\n", "\n", " def val_dataloader(self) -> DataLoader:\n", " _, val_dataset = split_dataset(self.dataset, self.n_splits, self.fold)\n", " return DataLoader(\n", " val_dataset, batch_size=self.batch_size, num_workers=self.num_workers\n", " )\n", "\n", " def test_dataloader(self) -> DataLoader:\n", " return DataLoader(\n", " self.dataset, batch_size=self.batch_size, num_workers=self.num_workers\n", " )\n", "\n", " def configure_optimizers(self) -> Optional[Optimizer]:\n", " if self.optimizer:\n", " return self.optimizer(self.model.parameters())\n", " return None\n", "\n", " def _step(self, batch, stage: str) -> Tensor:\n", " data_orig, target = batch\n", "\n", " if self.perturber:\n", " with tr.no_grad():\n", " output = self.model(data_orig)\n", " loss = self.criterion(output, target)\n", " acc = 
self.clean_acc_metric(output, target)\n", " self.log(f\"{stage}_clean_accuracy\", acc)\n", " \n", " inference_tensors = tr.is_inference_mode_enabled()\n", " with tr.inference_mode(mode=False), tr.enable_grad():\n", " if inference_tensors:\n", " # we need to clone in order to support grad mode\n", " data_orig = data_orig.clone()\n", " target = target.clone()\n", "\n", " data, adv_loss = self.perturber(\n", " model=self.model, data=data_orig, target=target\n", " )\n", " self.log(f\"{stage}_adversarial_loss\", adv_loss.mean().item())\n", "\n", " else:\n", " data = data_orig\n", "\n", " output = self.model(data)\n", " loss = self.criterion(output, target)\n", " acc = self.acc_metric(output, target)\n", " self.log(f\"{stage}_loss\", loss)\n", " self.log(f\"{stage}_accuracy\", acc)\n", " return loss\n", "\n", " def training_step(self, batch, batch_idx) -> Tensor:\n", " return self._step(batch, \"train\")\n", "\n", " def validation_step(self, batch, batch_idx) -> Tensor:\n", " return self._step(batch, \"val\")\n", "\n", " def test_step(self, batch, batch_idx) -> Tensor:\n", " return self._step(batch, \"test\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Configuring our experiments with hydra-zen\n", "\n", "Now we use `hydra-zen` to create \"configs\" for all of the components of our experiments.\n", "Each config describes an interface and/or object in our experiment that we want to be able to modify from run to run.\n", "They will also serve to make our work self-documenting and reproducible." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "Augmentations = builds(\n", " transforms.Compose,\n", " [builds(transforms.RandomCrop, size=28, padding=4), builds(transforms.ToTensor)],\n", ")\n", "TrainDataset = builds(\n", " MNIST, root=\"${data_dir}\", train=True, transform=Augmentations, download=True\n", ")\n", "TestDataset = builds(\n", " MNIST,\n", " root=\"${data_dir}\",\n", " train=False,\n", " transform=builds(transforms.ToTensor),\n", " download=True,\n", ")\n", "ConvModelCfg = builds(ConvModel)\n", "LinearModelCfg = builds(LinearModel)\n", "Optim = builds(tr.optim.SGD, lr=0.1, zen_partial=True)\n", "\n", "\n", "L2PGD = builds(L2ProjectedOptim, zen_partial=True)\n", "LinfPGD = builds(LinfProjectedOptim, zen_partial=True)\n", "\n", "\n", "def lr_for_pgd(epsilon, num_steps):\n", " return 2.5 * epsilon / num_steps\n", "\n", "\n", "Perturber = builds(\n", " gradient_ascent,\n", " optimizer=\"${optimizer}\",\n", " epsilon=\"${epsilon}\",\n", " steps=\"${steps}\",\n", " lr=builds(lr_for_pgd, \"${epsilon}\", \"${steps}\"),\n", " zen_partial=True,\n", " populate_full_signature=True,\n", ")\n", "\n", "PLModule = builds(\n", " StandardModule,\n", " model=\"${model}\",\n", " fold=\"${fold}\",\n", " n_splits=\"${n_splits}\",\n", " dataset=TrainDataset,\n", " optimizer=Optim,\n", " perturber=\"${perturber}\",\n", " populate_full_signature=True,\n", ")\n", "\n", "\n", "EvalPLModule = builds(\n", " StandardModule,\n", " model=\"${model}\",\n", " dataset=TestDataset,\n", " perturber=\"${perturber}\",\n", " populate_full_signature=True,\n", ")\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We configure our trainer to use `MetricsCallback`, which will instruct PyTorch Lightning to automatically save our logged metrics as a dictionary in files named \"fit_metrics.pt\" and \"test_metrics.pt\" for training and evaluation, respectively." 
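] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before configuring the trainer, here is a rough sketch of what that looks like on disk (assuming a completed run, with the current working directory set to that run's job directory): the saved file deserializes back into an ordinary dictionary of logged values.\n", "\n", "```python\n", ">>> import torch as tr\n", ">>> metrics = tr.load(\"fit_metrics.pt\")  # written by MetricsCallback during Trainer.fit\n", ">>> type(metrics)  # keys are metric names, e.g. 'train_loss', 'val_accuracy'\n", "<class 'dict'>\n", "```\n", "\n", "The cross-validation workflow defined below relies on exactly this: its task function returns `tr.load(\"fit_metrics.pt\")` so that the workflow can aggregate metrics across jobs."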
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "TrainerCfg = builds(\n", " Trainer,\n", " max_epochs=10,\n", " accelerator=\"auto\",\n", " devices=1,\n", " enable_progress_bar=False,\n", " enable_model_summary=False,\n", " callbacks=[builds(MetricsCallback)],\n", " populate_full_signature=True,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we use Hydra's [ConfigStore](https://hydra.cc/docs/tutorials/structured_config/config_store/) API to create named configuration groups that can be specified/swapped when we run our workflow.\n", "Let's make it easy to swap both models and optimizers by-name." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "cs = ConfigStore.instance()\n", "cs.store(name=\"cnn\", group=\"model\", node=ConvModelCfg)\n", "cs.store(name=\"linear\", group=\"model\", node=LinearModelCfg)\n", "cs.store(name=\"l2pgd\", group=\"optimizer\", node=L2PGD)\n", "cs.store(name=\"linfpgd\", group=\"optimizer\", node=LinfPGD)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cross Validation Workflow\n", "\n", "With all the configurations in place we can now define our first experiment workflow: train multiple models on `MNIST` data using cross-validation. First define the main experiment configuration:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "import platform\n", "\n", "Config = make_config(\n", " defaults=[\n", " \"_self_\",\n", " {\"model\": \"linear\"},\n", " ],\n", " data_dir=Path.home() / \".torch/data\",\n", " model=MISSING,\n", " module=PLModule,\n", " trainer=TrainerCfg,\n", " perturber=None,\n", " fold=0,\n", " n_splits=5,\n", ")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create `CrossValWorkflow` by inheriting from [MultiRunMetricsWorkflow](https://mit-ll-responsible-ai.github.io/responsible-ai-toolbox/generated/rai_toolbox.mushin.MultiRunMetricsWorkflow.html) to train a given model for a given a cross validation dataset (fold). The task function simply runs PyTorch Lightning's [Trainer.fit](https://pytorch-lightning.readthedocs.io/en/latest/api/pytorch_lightning.trainer.trainer.Trainer.html#pytorch_lightning.trainer.trainer.Trainer) and returns the metrics saved from [MetricsCallback](https://mit-ll-responsible-ai.github.io/responsible-ai-toolbox/generated/rai_toolbox.mushin.MetricsCallback.html). To run this workflow simply define the number of cross validation splits to use via `n_splits`." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "class CrossValWorkFlow(MultiRunMetricsWorkflow):\n", " @staticmethod\n", " def task(trainer: Trainer, module: LightningModule):\n", " trainer.fit(module)\n", " \n", " # Loads & returns a dictionary of metrics logged by PyTorch Lightning\n", " return tr.load(\"fit_metrics.pt\")\n", "\n", " def run(self, n_splits: int, **run_kwargs):\n", " fold = multirun(range(n_splits))\n", " super().run(n_splits=n_splits, fold=fold, **run_kwargs)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now run the workflow by defining the requried `n_splits` and the models (names defined in the [ConfigStore](https://hydra.cc/docs/tutorials/structured_config/config_store/) above). Additionally we define the working directory of the experiment by setting `hydra.sweep.dir` configuration via `overrides`." 
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[2023-01-27 13:08:46,529][HYDRA] Launching 4 jobs locally\n", "[2023-01-27 13:08:46,529][HYDRA] \t#0 : n_splits=2 fold=0 model=linear\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/home/justin_goodwin/.conda/envs/rai_toolbox/lib/python3.10/site-packages/hydra/_internal/core_plugins/basic_launcher.py:74: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.\n", "See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.\n", " ret = run_job(\n", "/home/justin_goodwin/.conda/envs/rai_toolbox/lib/python3.10/site-packages/lightning_fabric/plugins/environments/slurm.py:166: PossibleUserWarning: The `srun` command is available on your system but is not used. HINT: If your intention is to run Lightning on SLURM, prepend your python command with `srun` like so: srun python /home/justin_goodwin/.conda/envs/rai_toolbox/lib/pyt ...\n", " rank_zero_warn(\n", "GPU available: True (cuda), used: True\n", "TPU available: False, using: 0 TPU cores\n", "IPU available: False, using: 0 IPUs\n", "HPU available: False, using: 0 HPUs\n", "/home/justin_goodwin/.conda/envs/rai_toolbox/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/logger_connector/logger_connector.py:67: UserWarning: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `pytorch_lightning` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default\n", " warning_cache.warn(\n", "Missing logger folder: /home/justin_goodwin/projects/raiden/rai_toolbox/docs/source/tutorials/outputs/cross_validation/2023-01-27/13-08-46/0/lightning_logs\n", "LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]\n", "`Trainer.fit` stopped: `max_epochs=10` reached.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[2023-01-27 13:09:20,658][HYDRA] \t#1 : n_splits=2 fold=0 model=cnn\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/home/justin_goodwin/.conda/envs/rai_toolbox/lib/python3.10/site-packages/hydra/_internal/core_plugins/basic_launcher.py:74: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.\n", "See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.\n", " ret = run_job(\n", "/home/justin_goodwin/.conda/envs/rai_toolbox/lib/python3.10/site-packages/lightning_fabric/plugins/environments/slurm.py:166: PossibleUserWarning: The `srun` command is available on your system but is not used. 
HINT: If your intention is to run Lightning on SLURM, prepend your python command with `srun` like so: srun python /home/justin_goodwin/.conda/envs/rai_toolbox/lib/pyt ...\n", " rank_zero_warn(\n", "GPU available: True (cuda), used: True\n", "TPU available: False, using: 0 TPU cores\n", "IPU available: False, using: 0 IPUs\n", "HPU available: False, using: 0 HPUs\n", "Missing logger folder: /home/justin_goodwin/projects/raiden/rai_toolbox/docs/source/tutorials/outputs/cross_validation/2023-01-27/13-08-46/1/lightning_logs\n", "LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]\n", "`Trainer.fit` stopped: `max_epochs=10` reached.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[2023-01-27 13:09:56,220][HYDRA] \t#2 : n_splits=2 fold=1 model=linear\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/home/justin_goodwin/.conda/envs/rai_toolbox/lib/python3.10/site-packages/hydra/_internal/core_plugins/basic_launcher.py:74: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.\n", "See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.\n", " ret = run_job(\n", "/home/justin_goodwin/.conda/envs/rai_toolbox/lib/python3.10/site-packages/lightning_fabric/plugins/environments/slurm.py:166: PossibleUserWarning: The `srun` command is available on your system but is not used. HINT: If your intention is to run Lightning on SLURM, prepend your python command with `srun` like so: srun python /home/justin_goodwin/.conda/envs/rai_toolbox/lib/pyt ...\n", " rank_zero_warn(\n", "GPU available: True (cuda), used: True\n", "TPU available: False, using: 0 TPU cores\n", "IPU available: False, using: 0 IPUs\n", "HPU available: False, using: 0 HPUs\n", "Missing logger folder: /home/justin_goodwin/projects/raiden/rai_toolbox/docs/source/tutorials/outputs/cross_validation/2023-01-27/13-08-46/2/lightning_logs\n", "LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]\n", "`Trainer.fit` stopped: `max_epochs=10` reached.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "[2023-01-27 13:10:28,555][HYDRA] \t#3 : n_splits=2 fold=1 model=cnn\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/home/justin_goodwin/.conda/envs/rai_toolbox/lib/python3.10/site-packages/hydra/_internal/core_plugins/basic_launcher.py:74: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.\n", "See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.\n", " ret = run_job(\n", "/home/justin_goodwin/.conda/envs/rai_toolbox/lib/python3.10/site-packages/lightning_fabric/plugins/environments/slurm.py:166: PossibleUserWarning: The `srun` command is available on your system but is not used. 
HINT: If your intention is to run Lightning on SLURM, prepend your python command with `srun` like so: srun python /home/justin_goodwin/.conda/envs/rai_toolbox/lib/pyt ...\n", " rank_zero_warn(\n", "GPU available: True (cuda), used: True\n", "TPU available: False, using: 0 TPU cores\n", "IPU available: False, using: 0 IPUs\n", "HPU available: False, using: 0 HPUs\n", "Missing logger folder: /home/justin_goodwin/projects/raiden/rai_toolbox/docs/source/tutorials/outputs/cross_validation/2023-01-27/13-08-46/3/lightning_logs\n", "LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]\n", "`Trainer.fit` stopped: `max_epochs=10` reached.\n" ] } ], "source": [ "kfold_task = CrossValWorkFlow(Config)\n", "kfold_task.run(\n", " n_splits=2,\n", " model=multirun([\"linear\", \"cnn\"]),\n", " overrides=[\n", " \"hydra.sweep.dir=outputs/cross_validation/${now:%Y-%m-%d}/${now:%H-%M-%S}\"\n", " ],\n", ")\n", "## You can load previous experiments\n", "# kfold_task = CrossValWorkFlow().load_from_dir(\"outputs/cross_validation/2022-05-11/14-42-14\", metrics_filename=\"fit_metrics.pt\")\n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "PosixPath('/home/justin_goodwin/projects/raiden/rai_toolbox/docs/source/tutorials/outputs/cross_validation/2023-01-27/13-08-46')" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "kfold_task.working_dir" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once the workflow has finished, we can load the metrics into an [xarray.Dataset](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html) using the `to_xarray` method.\n", "This xarray dataset stores all of the metrics that were saved/returned by the tasks that we ran." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "
<xarray.Dataset>\n", "Dimensions: (fold: 2, model: 2, epoch: 10)\n", "Coordinates:\n", " * fold (fold) int64 0 1\n", " * model (model) <U6 'linear' 'cnn'\n", " * epoch (epoch) int64 0 1 2 3 4 5 6 7 8 9\n", "Data variables:\n", " train_loss (fold, model, epoch) float64 1.8 0.9345 ... 0.09954 0.01555\n", " train_accuracy (fold, model, epoch) float64 0.47 0.68 0.83 ... 1.0 0.97 1.0\n", " val_loss (fold, model, epoch) float64 1.894 1.108 ... 0.07693 0.04691\n", " val_accuracy (fold, model, epoch) float64 0.3434 0.6266 ... 0.9743 0.9844\n", "Attributes:\n", " n_splits: 2
<xarray.Dataset>\n", "Dimensions: (epsilon: 3, job_dir: 4)\n", "Coordinates:\n", " * epsilon (epsilon) int64 0 1 2\n", " * job_dir (job_dir) <U117 '/home/justin_goodwin/projects/rai...\n", " fold (job_dir) int64 0 0 1 1\n", " model (job_dir) <U6 'linear' 'cnn' 'linear' 'cnn'\n", "Data variables:\n", " test_clean_accuracy (epsilon, job_dir) float64 0.9595 0.9869 ... 0.9899\n", " test_adversarial_loss (epsilon, job_dir) float64 0.1262 0.03737 ... 14.75\n", " test_loss (epsilon, job_dir) float64 0.1262 0.03737 ... 14.75\n", " test_accuracy (epsilon, job_dir) float64 0.9595 0.9869 ... 0.0