rai_toolbox.optim.L2FrankWolfe
- class rai_toolbox.optim.L2FrankWolfe(params, *, epsilon, lr=2.0, use_default_lr_schedule=True, param_ndim=-1, div_by_zero_eps=1.1754943508222875e-38)
A Frank-Wolfe [1] optimizer that constrains each updated parameter to fall within an \(\epsilon\)-sized ball in \(L^2\) space, centered on the origin.
This parameter-transforming optimizer is useful for producing error counterfactuals and performing visual concept probing [2].
Notes
The method L2NormedGradientOptim._pre_step_transform_ is responsible for computing the negative linear minimization oracle (LMO) for a parameter and storing it on param.grad.
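For intuition, the linear minimization oracle over an \(\epsilon\)-radius \(L^2\) ball has the closed form \(-\epsilon \nabla / \lVert \nabla \rVert\), and a Frank-Wolfe step moves the parameter to a convex combination of its current value and that oracle. The following is a minimal sketch of this math in plain PyTorch, not the optimizer's internal code path; it reproduces the result shown in the Examples section below.
>>> import torch as tr
>>> g = tr.tensor([1.0, 2.0])             # gradient of the parameter
>>> epsilon, lr, k = 1.8, 2.0, 0          # ball radius, lr, and per-parameter update index
>>> s = -epsilon * g / g.norm()           # LMO over the radius-epsilon L2 ball
>>> lr_hat = lr / (lr + k)                # default schedule: 2.0 / (2.0 + 0) = 1.0
>>> (1 - lr_hat) * tr.tensor([1.0, 1.0]) + lr_hat * s  # convex-combination update
tensor([-0.8050, -1.6100])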
References
[2] Roberts, Jay, and Theodoros Tsiligkaridis. Controllably Sparse Perturbations of Robust Classifiers for Explaining Predictions and Probing Learned Concepts. (2021).
- __init__(params, *, epsilon, lr=2.0, use_default_lr_schedule=True, param_ndim=-1, div_by_zero_eps=1.1754943508222875e-38)
- Parameters:
- params : Sequence[Tensor] | Iterable[Mapping[str, Any]]
Iterable of parameters or dicts defining parameter groups.
- epsilon : float
The radius of the L2 ball to which each updated parameter will be constrained. Can be specified per parameter-group (see the sketch following this list).
- lr : float, optional (default=2.0)
Indicates the weight with which the LMO contributes to the parameter update. See use_default_lr_schedule for additional details. If use_default_lr_schedule=False then lr must be in the domain [0, 1].
- use_default_lr_schedule : bool, optional (default=True)
If True, then the per-parameter "learning rate" is scaled by \(\hat{l_r} = l_r / (l_r + k)\), where k is the update index for that parameter, which starts at 0. A numeric illustration follows this list.
- param_ndim : Union[int, None], optional (default=-1)
Determines how a parameter and its gradient are temporarily reshaped prior to being passed to both _pre_step_transform_ and _post_step_transform_. By default, the transformation broadcasts over the tensor's first dimension in a batch-like style. This can be specified per param-group.
A positive number determines the dimensionality of the tensor that the transformation will act on.
A negative number indicates the 'offset' from the dimensionality of the tensor (see "Notes" for examples).
None means that the transformation will be applied directly to the tensor without any broadcasting.
See ParamTransformingOptimizer for more details and examples.
- div_by_zero_eps : float, optional (default=torch.finfo(torch.float32).tiny)
Prevents a div-by-zero error in the learning rate schedule.
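To make the default schedule concrete, here is a small numeric illustration (plain Python, not toolbox code) of \(\hat{l_r} = l_r / (l_r + k)\) with the default lr=2.0:
>>> lr = 2.0
>>> [lr / (lr + k) for k in range(4)]  # effective step size at update indices k = 0, 1, 2, 3
[1.0, 0.6666666666666666, 0.5, 0.4]
And since epsilon can be specified per parameter-group, different parameters can be constrained to balls of different radii. This sketch assumes the standard PyTorch parameter-group dict format and that a group-level epsilon overrides the value passed to the constructor; it is an illustration, not an excerpt from the toolbox's documentation.
>>> import torch as tr
>>> from rai_toolbox.optim import L2FrankWolfe
>>> x1 = tr.zeros(2, requires_grad=True)
>>> x2 = tr.zeros(2, requires_grad=True)
>>> optim = L2FrankWolfe(
...     [{"params": [x1]}, {"params": [x2], "epsilon": 0.5}],  # assumed per-group override
...     epsilon=1.8,
... )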
Examples
Using L2FrankWolfe, we'll constrain the updated parameter to fall within an \(L^2\)-ball of radius 1.8.
>>> import torch as tr
>>> from rai_toolbox.optim import L2FrankWolfe
Creating a parameter for our optimizer to update, and our optimizer. We specify param_ndim=None so that the constraint is applied to the parameter without any broadcasting.
>>> x = tr.tensor([1.0, 1.0], requires_grad=True)
>>> optim = L2FrankWolfe([x], epsilon=1.8, param_ndim=None)
Performing a simple calculation with x and backpropagating to create a gradient.
>>> (tr.tensor([1.0, 2.0]) * x).sum().backward()
Performing a step with our optimizer uses the Frank-Wolfe algorithm to update its parameters. Note that the updated parameter falls within/on the \(L^2\)-ball of radius 1.8.
>>> optim.step()
>>> x
tensor([-0.8050, -1.6100], requires_grad=True)
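As an extra check (not part of the original example), the updated parameter sits on the boundary of the radius-1.8 ball:
>>> x.detach().norm()
tensor(1.8000)
Finally, a hedged sketch of the default param_ndim=-1 behavior: assuming the batch-style broadcasting described under "Parameters", each row of a 2D parameter is constrained to the \(L^2\)-ball independently. The tensors below are illustrative assumptions rather than an excerpt from the library's documentation.
>>> W = tr.ones(3, 2, requires_grad=True)
>>> optim = L2FrankWolfe([W], epsilon=1.8)  # param_ndim=-1 is the default
>>> (tr.arange(6.0).view(3, 2) * W).sum().backward()
>>> optim.step()
>>> W.detach().norm(dim=1)  # each row is expected to lie on/within the radius-1.8 ball
tensor([1.8000, 1.8000, 1.8000])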
Methods
__init__(params, *, epsilon[, lr, ...])